Issue:

Having a larger MTU value such as 65520 bytes (~64 KB) requires more memory pages to support IPoIB. The large frame size is derived from the HCA InfiniBand card, with driver support at the OS level. In some cases it may be necessary to reduce the frame size to avoid memory page fragmentation issues.

A value of 65520 bytes would require 17 contiguous memory pages based on the default kernel pagesize, since 65520 bytes plus the frame overhead exceeds 16 pages (65536 bytes).

IP over IB, like TCP over Ethernet, carries additional packet overhead.

IPoIB Packet Frame consists of –>

| InfiniBand Header | IPoIB Header | IP Data |

Let's assume the kernel uses a default pagesize of 4k; you can verify this value –>

# /usr/bin/getconf PAGESIZE
4096

Let's simplify the issue. With an MTU value of 7000 bytes, the frame must also accommodate the additional packet overhead, bringing the total allocation to 8192 bytes. The 7000-byte MTU plus the IPoIB packet frame overhead therefore requires 2 contiguous memory pages.

If we increased the MTU to 8192 bytes, accommodating the additional IPoIB packet frame overhead would require 3 contiguous pages. Some of that memory would be wasted: the overhead does not need a full page, but because the kernel allocates memory in whole pages, the remainder of the third page is lost.
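
The same arithmetic produces the recommendation table below. Here is a minimal bash sketch of the calculation; the OVERHEAD value is an illustrative assumption rather than an exact header size, since the real overhead depends on the IB and IPoIB headers in use –>

PAGESIZE=$(/usr/bin/getconf PAGESIZE)
OVERHEAD=128   # illustrative assumption only; not an exact IB/IPoIB header size
for MTU in 65520 62000 31000 15000 7000; do
    FRAME=$((MTU + OVERHEAD))
    PAGES=$(( (FRAME + PAGESIZE - 1) / PAGESIZE ))   # round up to whole pages
    echo "MTU=${MTU} -> frame=${FRAME} bytes -> ${PAGES} contiguous pages"
done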

Recommended reduced MTU values for Exadata, and the resulting allocation per buffer, are as follows –>

62000 will require 16 contiguous memory pages per buffer.
31000 will require 8 contiguous memory pages per buffer.
15000 will require 4 contiguous memory pages per buffer.
7000 will require 2 contiguous memory pages per buffer.

For more in-depth analysis, /proc/slabinfo can be used.
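
For example, to list the kmalloc slab caches (a sketch; cache names vary by kernel, with older SLAB kernels using size-<n> caches and SLUB kernels using kmalloc-<n>) –>

# grep -E 'size-|kmalloc-' /proc/slabinfo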

SOLUTION

Both the RDBMS and +ASM instances use the RDS protocol on Exadata, and RDS is not directly affected by MTU changes.
You can verify this by looking at the alert.log of the instance when it starts up.

E.g.

Using parameter settings in client-side pfile /u01/app/11.2.0.3/grid/dbs/init+ASM1.ora on machine <HOST_NAME>.<DOMAIN>
System parameters with non-default values:
large_pool_size          = 12M
instance_type            = "asm"
remote_login_passwordfile= "EXCLUSIVE"
asm_diskstring           = "o/*/*"
asm_power_limit          = 1
diagnostic_dest          = "/u01/app/grid"
Cluster communication is configured to use the following interface(s) for this instance
192.168.1.1
cluster interconnect IPC version:Oracle RDS/IP (generic)
IPC Vendor 1 proto 3
Version 4.1
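
A quick way to pull these lines from the log is a grep such as the one below; the alert.log path shown is hypothetical and depends on your diagnostic_dest and instance name –>

# grep -i "cluster interconnect IPC version" /u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log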

Grid Infrastructure Clusterware utilizes IPoIB, so the MTU changes are relevant only to Clusterware.
Therefore the changes need to be performed only on the DB compute nodes.

In some earlier releases of 11.2, having different MTU sizes between nodes could cause node evictions.
In my testing this was done in an environment running 11.2.0.3 with Exadata BP#20, and the change could be made in a rolling manner, i.e. one node at a time while the other nodes stayed up in the cluster, without any issues.

Exadata 11.2.0.3 BP#20 contains  –>
– DATABASE PATCH SET UPDATE 11.2.0.3.7 (INCLUDES CPUJUL2013)
– GRID INFRASTRUCTURE PATCH SET UPDATE 11.2.0.3.7 (GI COMPONENTS)

The recommendation is to apply the latest Bundle Patch before making this change, such as 11.2.0.3 EXA BP#20.

These steps require the root user!

Step 1.  Stop Grid Infrastructure

# crsctl stop crs
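
Before proceeding, you can confirm the stack is down; with Clusterware stopped, a check such as the one below should report that it cannot contact Oracle High Availability Services –>

# crsctl check crs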

Step 2.  Once Clusterware has shut down cleanly, let's proceed to changing the MTU.

PLEASE NOTE…   The example below covers the 2 HCA interface cards "ib0 and ib1" since this is an X2-2 rack. If you're running an X2-8 to X4-8 environment then the ib# change below needs to be implemented on all 4 HCA IB cards! Also, if you're running a V2 rack then the interface name for the bonded IB will be bond0.

This needs to be done on ib0, ib1 and the bond interface bondib0 as follows,
changing the old value of 65520 to the new value of 7000 –>

E.g.

# cd /etc/sysconfig/network-scripts

# cat ifcfg-ib0
DEVICE=ib0
USERCTL=no
ONBOOT=yes
MASTER=bondib0
SLAVE=yes
BOOTPROTO=none
HOTPLUG=no
IPV6INIT=no
CONNECTED_MODE=yes
MTU=7000

# cat ifcfg-ib1
DEVICE=ib1
USERCTL=no
ONBOOT=yes
MASTER=bondib0
SLAVE=yes
BOOTPROTO=none
HOTPLUG=no
IPV6INIT=no
CONNECTED_MODE=yes
MTU=7000
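
One way to make the edit itself is with sed; this sketch assumes the files currently contain exactly MTU=65520 and takes backups first –>

# cd /etc/sysconfig/network-scripts
# cp -p ifcfg-ib0 ifcfg-ib0.bak && cp -p ifcfg-ib1 ifcfg-ib1.bak
# sed -i 's/^MTU=65520$/MTU=7000/' ifcfg-ib0 ifcfg-ib1

Where a bonded interface is in use, repeat the same for ifcfg-bondib0 (see the note below).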

Please Note.
If your environment is X4 architecture with ACTIVE/ACTIVE IB, then the change below is not required, since with X4 we do not use bonding over IB but rather the native ib0 and ib1 interfaces only!

# cat ifcfg-bondib0
DEVICE=bondib0
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
IPADDR=###.###.###.###
NETMASK=###.###.###.###
NETWORK=###.###.###.###
BROADCAST=###.###.###.###
BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000 num_grat_arp=100"
IPV6INIT=no
MTU=7000

Step 3.  The node will require a reboot after making these changes.

A reboot is best advised so that the openibd service also reflects these changes correctly and to verify there are no issues with the ifcfg entries going forward.

# sync
# reboot

Step 4.  Confirm Grid Infrastructure Clusterware comes up cleanly post reboot and verify the MTU values –>


# ifconfig ib0 |grep MTU
UP BROADCAST RUNNING SLAVE MULTICAST  MTU:7000  Metric:1

# ifconfig ib1 |grep MTU
UP BROADCAST RUNNING SLAVE MULTICAST  MTU:7000  Metric:1

# ifconfig bondib0 |grep MTU
UP BROADCAST RUNNING MASTER MULTICAST  MTU:7000  Metric:1
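
On images where the ip utility is preferred over ifconfig, an equivalent check (a sketch against the same interfaces) is –>

# ip link show ib0 | grep -o 'mtu [0-9]*'
mtu 7000

And to confirm the Clusterware stack is back up –>

# crsctl check crs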

You can proceed to repeat the same steps on the remaining nodes in the cluster.

Reference: Oracle Doc ID 1586212.1
