Archive for September, 2011

Do you know where your SAN connections are?

September 30, 2011

One of the most interesting weeks ever at work. Newish guy in the networking group accidentally pulled out all the fiber connecting servers to our SAN. Did I say that he did this in the middle of the day on Tuesday?

Lessons learned

It took us 2 hours to find the right connections and plug everything back in. A lot of the documentation was out of date and wrong.
Redundancy is awesome. A 2 node cluster with 4 HBA’s only needs one for both instances to come back up. Out of 30 or so cables we needed to find the connections for only 3-4 to get everything back up and did the rest with less stress.
And Microsoft is awesome as well. Everything came back with no corruption as soon as the fiber found its home again

Duplicate data in a unique index caused transaction log growth

September 23, 2011

I’ve had a problem database for the last year where about 5GB of data caused the log file to grow to almost 500GB. This wasn’t a critical database so I never paid much attention to it until recently. The database in question was for the Symantec Anti-virus management software which worked the entire time the database was corrupt.

We called Symantec support but they couldn’t find anything wrong with the application or the database. Yet almost every day our differential backups were close to 500GB when the data files added up to less than 10GB. Since this was not a critical database we did not do log backups. When we set it up to back up the transaction log it would backup 450GB of data every single time even though we truncated the log after each backup.

I decided to move the database to another server with less I/O and one that i was building as a dedicated DBA server. It would take hours to move 500GB so I first tried to restore the database from backup. Numerous attempts from different days failed. The restore process would simply freeze at some point and then fail.

Then I decided to consult with our Windows admins who shut the Symantec software off so that I could cleanly detach the database and manually copy the files over. Since the transaction log files would take hours to copy over I decided to try to rebuild them. I’m not going to go into the details since it’s well documented, but in 10 minutes i had mounted my original data files and SQL created a new 1MB log file.

My next step was to check for corruption. This took about 5 minutes and one of the errors was duplicate data in a unique index. I don’t have the exact wording anymore but SQL even gave me the data that had duplicate values in the table.. I tried to drop and recreate the unique index but it wouldn’t let me due to duplicate data. So I deleted one of the rows, recreated the index and the database has run without issue for the last 2 weeks and the transaction log hasn’t grown past 220MB.

My theory is that SQL was constantly trying to recreate the index in the background and this was the cause of the huge growth in the transaction log.

%d bloggers like this: