Sunday, March 25, 2012

Node Hang on Reboot During Exadata Patching

If you have patched Exadata a few times, you probably know the dreaded situation where, during patching, some of the nodes (the cell nodes especially) refuse to come back after the reboot.

You wait and wait, knowing that all the cells have restarted successfully except one, or sometimes two. The patching completes, the imageinfo command shows that the Active image version has been updated on the cells that came back, and now if only that down cell would come up....

Eventually you restart the cell through ILOM, or you ask the SA to hard reboot it, or you hard reboot it yourself. It comes back, and you find that its Active image version still points to the older version. You sift through the logs, check the USB version, and all that stuff.

This situation is most likely caused by a lock on udev. So it's a very good idea to check for this kind of locking before cell patching with the following command:

/opt/oracle.cellos/validations/init.d/checkdeveachboot

If you find any locks, reboot the cell and then proceed with the patching. If the hang happens in the middle of patching, and you find that a cell brought up through a hard reboot still has the older image version, check for these locks, reboot the cell, and then apply the patch.
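The pre-patch check can be wrapped in a few lines of shell so that its output is kept for review. This is only a rough sketch, meant to be run as root on the cell itself. The function name run_prepatch_check and the log path are made up for illustration. This post does not describe what checkdeveachboot prints or what exit status it returns, so the wrapper does not try to interpret either one; it just saves the output and reports the status.

```shell
#!/bin/bash
# Sketch of a pre-patch check wrapper (hypothetical helper, not an Oracle tool).
# CHECK_CMD is the script named in this post. Its output format and exit
# status are not documented here, so both are only captured for manual review.
CHECK_CMD="${CHECK_CMD:-/opt/oracle.cellos/validations/init.d/checkdeveachboot}"
LOG_FILE="${LOG_FILE:-/tmp/checkdeveachboot.log}"   # assumed log location

run_prepatch_check() {
    local cmd="$1" log="$2" rc

    # Refuse to continue if the check script is missing on this node.
    if [ ! -x "$cmd" ]; then
        echo "check script not found or not executable: $cmd" >&2
        return 2
    fi

    # Run the check, keeping stdout and stderr together in the log file.
    "$cmd" > "$log" 2>&1
    rc=$?

    echo "exit status $rc, output saved to $log"
    return "$rc"
}
```

To use it, call run_prepatch_check "$CHECK_CMD" "$LOG_FILE" on each cell before starting the patch, then read the log and look for any reported locks. After a reboot, imageinfo can be used to confirm the Active image version.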
