RAC Interconnect Performance – Jumbo Frames

By default, Ethernet has a variable frame size of up to 1,500 bytes. The Maximum Transmission Unit (MTU) defines this upper bound and defaults to the 1,500-byte limit. When data is sent across the network, it is broken into pieces no larger than the MTU. Right away, we can see a problem with the MTU limitation for Oracle RAC's Cluster Interconnect. Many Oracle databases are configured with a database block size of 8KB. If one block needs to be transferred across the private network for Cache Fusion purposes, the 8KB block will be broken into six frames. Even with a 2KB block size, the block will be broken into two frames. Those pieces need to be reassembled when they arrive at the destination. To make matters worse, the maximum amount of data Oracle will attempt to transmit is defined by multiplying the db_block_size initialization parameter by the db_file_multiblock_read_count parameter. A block size of 8KB taken 128 blocks at a time leads to 1 megabyte of data needing to be transferred.
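As a rough, back-of-the-envelope check of that arithmetic (a sketch only; it ignores the IP and UDP header overhead that slightly reduces the usable payload per frame):

echo $(( (8192 + 1499) / 1500 ))   # ceiling of 8192 / 1500: 6 frames for one 8KB block
echo $(( 8192 * 128 ))             # 1,048,576 bytes, i.e. 1MB, for a full multiblock transfer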
Jumbo Frames allows an MTU of up to 9,000 bytes. Unfortunately, Jumbo Frames is not available on all platforms. Not only does the OS need to support Jumbo Frames, but the network cards in the servers and the switch behind the private network must support it as well. Many of today's NICs and switches do support Jumbo Frames, but Jumbo Frames is not an IEEE standard, and as such, there may be different implementations that do not all work well together. Not all configurations will support the larger MTU size. When configuring the network pieces, it is important to remember that the smallest MTU of any component in the route is the maximum MTU from point A to point B. You can have the network cards configured to support 9,000 bytes, but if the switch is configured for an MTU of 1,500 bytes, then Jumbo Frames won't be used. InfiniBand supports frames of up to 65,000 bytes.
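One quick way to test the end-to-end path MTU is a ping with the don't-fragment flag set. The command below is only a sketch, using the private hostname from the examples later in this section; the 8,972-byte payload is 9,000 bytes minus the 20-byte IP and 8-byte ICMP headers. If any component in the path is limited to a smaller MTU, ping reports that fragmentation would be needed instead of succeeding.

[root@host01 ~]# ping -M do -c 3 -s 8972 host02-priv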
It is outside the scope of this book to provide direction on how to enable Jumbo Frames in the network switch. You should talk with your network administrator, who may, in turn, have to consult the switch vendor's documentation for more details. On the OS network interface side, it is easy to configure the larger frame size. The following examples are from Oracle Linux 6. First, we need to determine which device is used for the Cluster Interconnect.
[root@host01 ~]# oifcfg getif
eth0  192.168.56.0  global  public
eth1  192.168.10.0  global  cluster_interconnect
The eth1 device supports the private network. Now we configure the larger MTU size.
[root@host01 ~]# ifconfig eth1 mtu 9000
[root@host01 ~]# vi /etc/sysconfig/network-scripts/ifcfg-eth1
In the ifcfg-eth1 file, one line is added that says MTU=9000 so that the setting persists when the server is restarted.
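The resulting file looks something like the sketch below. The surrounding entries are illustrative and will vary by installation; only the MTU line is the addition.

DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.10.1
NETMASK=255.255.255.0
ONBOOT=yes
MTU=9000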
The interface is verified to ensure the larger MTU is used.
[root@host01 ~]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 08:00:27:98:EA:FE
          inet addr:192.168.56.71  Bcast:192.168.56.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe98:eafe/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3749 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3590 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:743396 (725.9 KiB)  TX bytes:623620 (609.0 KiB)
eth1      Link encap:Ethernet  HWaddr 08:00:27:54:73:8F
          inet addr:192.168.10.1  Bcast:192.168.10.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe54:738f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:268585 errors:0 dropped:0 overruns:0 frame:0
          TX packets:106426 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1699904418 (1.5 GiB)  TX bytes:77571961 (73.9 MiB)
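The same value can also be confirmed with the iproute2 ip command, which reports the MTU in the first line of output for the interface.

[root@host01 ~]# ip link show eth1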
Notice that device eth1 has the larger MTU setting. The traceroute utility can be used to verify the largest possible packet size.
[root@host01 ~]# traceroute --mtu host02-priv 9000
traceroute to host02-priv (192.168.10.2), 30 hops max, 9000 byte packets
 1  host02-priv.localdomain (192.168.10.2)  0.154 ms F=9000  0.231 ms  0.183 ms
Next, a 9,000-byte packet is sent along the route. The -F option ensures the packet is not broken into smaller frames.
[root@host01 ~]# traceroute -F host02-priv 9000
traceroute to host02-priv (192.168.10.2), 30 hops max, 9000 byte packets
 1  host02-priv.localdomain (192.168.10.2)  0.495 ms  0.261 ms  0.141 ms
The route worked successfully. Now a packet one byte larger is sent along the route.
[root@host01 ~]# traceroute -F host02-priv 9001
too big packetlen 9001 specified
The error from the traceroute utility shows that the 9,001-byte packet is too big. These steps verify that Jumbo Frames is working. Let's verify that the change improved the usable bandwidth on the cluster interconnect. To do that, the iperf utility is used. The iperf utility can force a specific buffer length with the -l parameter. The public interface is not configured for Jumbo Frames, and no applications are connecting to the nodes, so the public network can be used as a baseline.
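These client runs assume an iperf server is already listening on host01 (iperf listens on TCP port 5001 by default, which matches the output below); a minimal way to start one is:

[root@host01 ~]# iperf -s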
[root@host02 ~]# iperf -c host01 -l 9000
------------------------------------------------------------
Client connecting to host01, TCP port 5001
TCP window size: 22.9 KByte (default)
------------------------------------------------------------
[  3] local 192.168.56.72 port 18222 connected with 192.168.56.71 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   923 MBytes   774 Mbits/sec
The same test is repeated for the private network with Jumbo Frames enabled.
[root@host02 ~]# iperf -c host01-priv -l 9000
------------------------------------------------------------
Client connecting to host01-priv, TCP port 5001
TCP window size: 96.1 KByte (default)
------------------------------------------------------------
[  3] local 192.168.10.2 port 40817 connected with 192.168.10.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.28 GBytes  1.10 Gbits/sec
Here we see that the bandwidth increased from 774 Mbits/sec to 1.10 Gbits/sec, an improvement of roughly 42 percent. Over the same 10-second interval, the amount of data transferred grew correspondingly, from 923 megabytes to 1.28 gigabytes.
If the Oracle RAC systems are using Ethernet (Gig-E or 10Gig-E) for the Cluster Interconnect, then the recommendation is to leverage Jumbo Frames for the private network. It is less common to employ Jumbo Frames for the public network interfaces. Jumbo Frames requires that all network components from end to end support the larger MTU size. In some cases, it may be tricky to diagnose issues where Jumbo Frames will not work in the system, but even then, the effort is well worth the cost.
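When chasing such issues, one simple first check is that every node reports the same MTU on its private interface. The loop below is only a sketch, assuming the two-node cluster and eth1 device from the examples above and SSH connectivity between the nodes.

[root@host01 ~]# for n in host01 host02; do ssh $n "hostname; ip link show eth1 | grep -o 'mtu [0-9]*'"; done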
