Friday, August 20, 2010

ORA-29701: unable to connect to Cluster Manager

Just imagine that it's your third day at a new job, and you learn that somebody has deleted some of the Oracle binaries from the Oracle Home's bin directory.

It's a 6 TB data warehouse for a telco, growing rapidly at about 20 GB a day. It's a 10.2.0.3 database on Solaris 10 on a SPARC machine, using ASM. I manage this database remotely, as it sits at the client site in another country. As the database was running when the rm command was inadvertently run in the bin directory, only a few of the binaries got deleted. The resilient Oracle kept running without a frown.

I wanted to restart the instance and the whole server to see whether the database would come up, but the client didn't let me. They also didn't have a spare server from which I could copy the binaries to check whether they worked. I pressed hard for a restart, but in vain.

If something can go wrong, it will go wrong. Murphy was so right.

A couple of days ago, there was a power failure at the client's data center, and the server shut down. I mentally prepared myself for the worst. I had the list of missing binaries, so I installed Solaris 10 on VMware on my laptop, installed the Oracle 10g base release, and got the binaries. I was not sure, though, whether Oracle 10g binaries from x86 Solaris would work on SPARC Solaris.
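A list of missing files like that can be produced by comparing an intact bin directory against the damaged one. The following is only a minimal sketch, assuming a POSIX shell (ksh or bash rather than the old Solaris /bin/sh); the function name list_missing and the paths in the usage comment are hypothetical, not what was actually used on this server.

```shell
# list_missing REF_DIR DAMAGED_DIR
# Prints the name of every entry that exists in REF_DIR but not in
# DAMAGED_DIR, one per line. Both directory arguments are placeholders
# chosen for illustration.
list_missing() {
    ref_dir=$1
    damaged_dir=$2
    for path in "$ref_dir"/*; do
        # Skip the unexpanded pattern when REF_DIR is empty.
        [ -e "$path" ] || continue
        name=$(basename "$path")
        [ -e "$damaged_dir/$name" ] || printf '%s\n' "$name"
    done
}

# Hypothetical usage, with a copy of a fresh install's bin directory
# placed next to the damaged one:
#   list_missing /tmp/reference_bin "$ORACLE_HOME/bin"
```

It only compares names, so a file that exists but is truncated or corrupted would not be reported.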

After the server restart, I first tried to start ASM and the database without the missing binaries. When I tried to start ASM, it gave me this error:

ORA-29701: unable to connect to Cluster Manager

The logs showed that it was looking for the crsctl binary in the bin directory, and it wasn't there.

I copied crsctl from the x86 installation into the ORACLE_HOME/bin directory on the SPARC server, and tried starting the ASM instance again:

It started like a charm :)

1 comment:

Unknown said...

Hi,

Please be aware that this only worked because crsctl is not a binary - it is a shell script.

$ file crsctl
crsctl: Bourne shell script text executable
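The same check can be wrapped in a small helper and run on each file before copying it between architectures. This is a rough sketch, assuming a POSIX shell and that the local `file` utility includes the word "script" in its description of shell scripts (the exact wording differs between platforms); the function name is_script is made up for illustration.

```shell
# is_script FILE
# Returns 0 if `file` describes FILE as a script, 1 if it does not,
# and 2 if `file` itself fails. Matching on the word "script" is an
# assumption about the wording of the output, not a guarantee.
is_script() {
    desc=$(file "$1") || return 2
    # Drop the leading "FILE:" so the path cannot cause a false match.
    desc=${desc#"$1":}
    case $desc in
        *script*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Hypothetical usage:
#   if is_script crsctl; then echo "portable script"; else echo "check architecture"; fi
```

A file that fails this check is probably a compiled executable and should come from an installation for the same platform.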