ARNodes Network Benchmarks.
In preparation for the transfer of the arnodes to the new Cisco SMB SG300 switches I decided to gather some stats and benchmark numbers so that I can compare performance before and after the move. I used arnode07 and arnode08 for that purpose. There were no SGE jobs running and the systems were totally quiescent.
Network Hardware and Configuration
First, the hardware. From lspci we can see that each system has 3 onboard NICs, eth0, eth1 and eth2:
02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
0a:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0a:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
From lshw:
  *-network
       description: Ethernet interface
       product: 82574L Gigabit Network Connection
       vendor: Intel Corporation
       physical id: 0
       bus info: pci@0000:02:00.0
       logical name: eth0
       version: 00
       serial: 00:e0:81:c2:03:f7
       size: 1GB/s
       capacity: 1GB/s
       width: 32 bits
       clock: 33MHz
       capabilities: pm msi pciexpress msix bus_master cap_list ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=e1000e driverversion=2.3.2-k duplex=full firmware=1.8-0 ip=192.168.86.207 latency=0 link=yes multicast=yes port=twisted pair speed=1GB/s
       resources: irq:17 memory:faee0000-faefffff ioport:cc00(size=32) memory:faedc000-faedffff
  *-network:0
       description: Ethernet interface
       product: 82576 Gigabit Network Connection
       vendor: Intel Corporation
       physical id: 0
       bus info: pci@0000:0a:00.0
       logical name: eth1
       version: 01
       serial: 00:e0:81:c2:04:c2
       size: 1GB/s
       capacity: 1GB/s
       width: 32 bits
       clock: 33MHz
       capabilities: pm msi msix pciexpress bus_master cap_list rom ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=igb driverversion=5.0.3-k duplex=full firmware=1.64, 0xe5ff0000 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=1GB/s
       resources: irq:24 memory:fbb20000-fbb3ffff memory:fbb00000-fbb1ffff ioport:d880(size=32) memory:fbafc000-fbafffff memory:fbac0000-fbadffff memory:fbaa0000-fbabffff memory:fba80000-fba9ffff
  *-network:1
       description: Ethernet interface
       product: 82576 Gigabit Network Connection
       vendor: Intel Corporation
       physical id: 0.1
       bus info: pci@0000:0a:00.1
       logical name: eth2
       version: 01
       serial: 00:e0:81:c2:04:c3
       size: 1GB/s
       capacity: 1GB/s
       width: 32 bits
       clock: 33MHz
       capabilities: pm msi msix pciexpress bus_master cap_list rom ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=igb driverversion=5.0.3-k duplex=full firmware=1.64, 0xe5ff0000 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=1GB/s
       resources: irq:34 memory:fbbe0000-fbbfffff memory:fbbc0000-fbbdffff ioport:dc00(size=32) memory:fbbbc000-fbbbffff memory:fbb80000-fbb9ffff memory:fbb60000-fbb7ffff memory:fbb40000-fbb5ffff
eth0 is on the private 192.168.86.0/24 network, while eth1 and eth2 are on the private (management) network 172.16.10.0/24. eth1 and eth2 are bonded together as bond0.
/etc/network/interfaces:
# eth0 - static
auto eth0
iface eth0 inet static
    address 192.168.86.208
    netmask 255.255.255.0
    broadcast 192.168.86.255
    gateway 192.168.86.1

# bonding ethernet network
auto bond0
iface bond0 inet static
    address 172.16.10.208
    netmask 255.255.255.0
    #arp_ip_target 172.16.10.254 172.16.10.2
    #gateway 172.16.10.1
    bond_mode balance-alb
    bond_miimon 100
    bond_downdelay 200
    bond_updelay 200
    slaves eth1 eth2
    #up /sbin/ifenslave bond0 eth1 eth2
    #down /sbin/ifenslave -d bond0 eth1 eth2
$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:e0:81:c1:e6:4f brd ff:ff:ff:ff:ff:ff
    inet 192.168.86.208/24 brd 192.168.86.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::2e0:81ff:fec1:e64f/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 00:e0:81:c1:e6:9e brd ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 00:e0:81:c1:e6:9f brd ff:ff:ff:ff:ff:ff
5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:e0:81:c1:e6:9e brd ff:ff:ff:ff:ff:ff
    inet 172.16.10.208/24 brd 172.16.10.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet6 fe80::2e0:81ff:fec1:e69e/64 scope link
       valid_lft forever preferred_lft forever
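As a quick sanity check before benchmarking (not part of the runs themselves), the bonding driver's status and the per-subnet routing can be inspected with the standard Linux tools:

# Bonding driver status: mode, MII status and the two slave NICs
cat /proc/net/bonding/bond0

# Negotiated speed/duplex of each slave, as seen from the host side
ethtool eth1 | grep -E 'Speed|Duplex'
ethtool eth2 | grep -E 'Speed|Duplex'

# Verify that 192.168.86.0/24 is reached via eth0 and 172.16.10.0/24 via bond0
ip route show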
NetPIPE TCP Benchmarks
Run NPtcp -i on the receiver (node08), and on the sender (node07) run NPtcp -h 192.168.86.208 for eth0 and NPtcp -h 172.16.10.208 for the bond0 interface. Here are the resulting throughput, saturation and signature plots; the plots on the left are for eth0 and those on the right for bond0.
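For completeness, this is roughly how the two runs can be scripted so that each interface's measurements end up in its own file for plotting. The -o output-file option and the np_*.out file names are assumptions for this sketch, not part of the original runs; without -o, NPtcp normally writes its results to np.out, which can be renamed between runs.

# Receiver (node08): start NPtcp listening, as above
NPtcp -i

# Sender (node07): one run per interface, each writing to its own output file
NPtcp -h 192.168.86.208 -o np_eth0.out     # eth0 path
NPtcp -h 172.16.10.208 -o np_bond0.out     # bond0 path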
Throughput Plots


Saturation Plots
Block size values versus the transfer/elapsed time.


Plotted using a log-log scale, this graph allows us to determine the saturation point, the block size threshold after which any further increase in block size results in a close-to-linear increase in transfer time. The region from the saturation point to the end of the graph is the saturation interval, the region where throughput cannot be improved by increasing the block size.
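For reference, a saturation plot like this can be regenerated from the NetPIPE output file with gnuplot. This is a minimal sketch, assuming the measurements were saved to np_eth0.out as above; the column layout (block size in column 3, elapsed time in column 1) is an assumption and should be checked against the np.out format of the NetPIPE version actually used.

# Log-log plot of block size versus elapsed time.
# Column numbers below are an assumption -- verify them against the actual
# NetPIPE output file before relying on them.
gnuplot <<'EOF'
set logscale xy
set xlabel "block size (bytes)"
set ylabel "elapsed time (s)"
set terminal png size 800,600
set output "saturation_eth0.png"
plot "np_eth0.out" using 3:1 with linespoints title "eth0"
EOF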
Signature Plots
Transfer speed versus the elapsed time.


This represents network acceleration. When the elapsed time is plotted on a logarithmic scale, the network latency can be read off as the time value where the graph takes off.
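As a rough cross-check of that latency value (not part of the NetPIPE run), a small-packet RTT measurement over the same path gives a comparable figure; roughly half the average RTT approximates the one-way latency.

# Small-packet round-trip times over the eth0 path between node07 and node08;
# about twice the one-way latency indicated by the signature plot's take-off point.
ping -c 20 -s 56 192.168.86.208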
nttcp Benchmarks
Here I show nttcp benchmarks for different buffer lengths, set with the option -l <size>. Note that I have trimmed the output quite a bit after the first run with the lowest buffer length, 4096 bytes.
eth0
$ for i in 4096 8192 16384 65536 262144 524288; do nttcp -D -v -t -T -x 8388608 -l $i 10.0.0.217; done
nttcp-1: buflen=4096, bufcnt=2048, dataport=5038/tcp
nttcp-l: transmitted 8388608 bytes
     Bytes  Real s  CPU s  Real-MBit/s    CPU-MBit/s  Calls  Real-C/s      CPU-C/s
l  8388608    0.06   0.00    1040.6088    20134.6727   2048  31756.86     614461.4
1  8388608    0.07   0.00     923.5122    20134.6727   2365  32545.72     709571.0
nttcp-1: buflen=8192, bufcnt=1024, dataport=5038/tcp
     Bytes  Real s  CPU s  Real-MBit/s    CPU-MBit/s  Calls  Real-C/s      CPU-C/s
l  8388608    0.06   0.00    1054.7728  6710886.4000   1024  16094.56  102400000.0
1  8388608    0.07   0.00     925.5629    20134.6727   1237  17060.66     371137.1
nttcp-1: buflen=16384, bufcnt=512, dataport=5038/tcp
     Bytes  Real s  CPU s  Real-MBit/s    CPU-MBit/s  Calls  Real-C/s      CPU-C/s
l  8388608    0.07   0.00    1011.3763  6710886.4000    512   7716.19   51200000.0
1  8388608    0.07   0.00     918.0419    20134.6727    779  10656.63     233723.4
nttcp-1: buflen=65536, bufcnt=128, dataport=5038/tcp
     Bytes  Real s  CPU s  Real-MBit/s    CPU-MBit/s  Calls  Real-C/s      CPU-C/s
l  8388608    0.07   0.00    1009.3228  6710886.4000    128   1925.13   12800000.0
1  8388608    0.07   0.00     919.4758    20134.6727    386   5288.69     115811.6
nttcp-1: buflen=262144, bufcnt=32, dataport=5038/tcp
     Bytes  Real s  CPU s  Real-MBit/s    CPU-MBit/s  Calls  Real-C/s      CPU-C/s
l  8388608    0.07   0.00     973.7353  6710886.4000     32    464.31    3200000.0
1  8388608    0.08   0.00     870.6278    20134.6727    307   3982.82      92109.2
nttcp-1: buflen=524288, bufcnt=16, dataport=5038/tcp
     Bytes  Real s  CPU s  Real-MBit/s    CPU-MBit/s  Calls  Real-C/s      CPU-C/s
l  8388608    0.07   0.00    1015.3702  6710886.4000     16    242.08    1600000.0
1  8388608    0.07   0.01     901.4193     5033.2906    445   5977.33      33375.8
bond0
$ for i in 4096 8192 16384 65536 262144 524288; do nttcp -D -v -t -T -x 8388608 -l $i 172.16.10.208; done
nttcp-1: buflen=4096, bufcnt=2048, dataport=5038/tcp
     Bytes  Real s  CPU s  Real-MBit/s    CPU-MBit/s  Calls  Real-C/s      CPU-C/s
l  8388608    0.07   0.00     966.8611    20134.6727   2048  29506.26     614461.4
1  8388608    0.08   0.00     886.8857    20134.6727   2233  29510.49     669967.0
nttcp-1: buflen=8192, bufcnt=1024, dataport=5038/tcp
     Bytes  Real s  CPU s  Real-MBit/s    CPU-MBit/s  Calls  Real-C/s      CPU-C/s
l  8388608    0.07   0.00     949.4477    20134.6727   1024  14487.42     307230.7
1  8388608    0.08   0.00     884.7809    20134.6727   1184  15610.17     355235.5
nttcp-1: buflen=16384, bufcnt=512, dataport=5038/tcp
     Bytes  Real s  CPU s  Real-MBit/s    CPU-MBit/s  Calls  Real-C/s      CPU-C/s
l  8388608    0.07   0.00     985.3157  6710886.4000    512   7517.36   51200000.0
1  8388608    0.07   0.00     895.5132    20134.6727    760  10141.58     228022.8
nttcp-1: buflen=65536, bufcnt=128, dataport=5038/tcp
     Bytes  Real s  CPU s  Real-MBit/s    CPU-MBit/s  Calls  Real-C/s      CPU-C/s
l  8388608    0.07   0.00     943.8393  6710886.4000    128   1800.23   12800000.0
1  8388608    0.08   0.00     881.1910  6710886.4000    385   5055.35   38500000.0
nttcp-1: buflen=262144, bufcnt=32, dataport=5038/tcp
     Bytes  Real s  CPU s  Real-MBit/s    CPU-MBit/s  Calls  Real-C/s      CPU-C/s
l  8388608    0.07   0.00     970.6369  6710886.4000     32    462.84    3200000.0
1  8388608    0.08   0.00     885.8204  6710886.4000    394   5200.70   39400000.0
nttcp-1: buflen=524288, bufcnt=16, dataport=5038/tcp
     Bytes  Real s  CPU s  Real-MBit/s    CPU-MBit/s  Calls  Real-C/s      CPU-C/s
l  8388608    0.07   0.00     984.5784  6710886.4000     16    234.74    1600000.0
1  8388608    0.08   0.01     882.9184     6711.5576    396   5209.98      39604.0
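To compare the runs at a glance, the local-side throughput can be pulled out of captured output with a little awk. This is a minimal sketch, assuming each loop's output was saved to a file (eth0.log and bond0.log are hypothetical names, e.g. produced by piping the loops above through tee) and that the figure of interest is the Real-MBit/s value on the lines labelled "l".

# Print "buffer length -> local Real-MBit/s" for every run in a captured log.
# eth0.log / bond0.log are placeholder file names, not part of the original runs.
for f in eth0.log bond0.log; do
  echo "== $f =="
  awk '/buflen=/ { match($0, /buflen=[0-9]+/); buflen = substr($0, RSTART + 7, RLENGTH - 7) }
       $1 == "l" { printf "%8s bytes: %9.2f Mbit/s\n", buflen, $5 }' "$f"
done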