Feb 2012 – The Modern Dataist

\”How long is my index rebuild going to take ?\”

Still busy, huh ? Yeah, I know what that\’s like, but since you asked…

There\’s no easy answer to this question, so I\’m going to use the standard DBA response of \’It depends\’.

Not helpful ? How about if I explain why it depends ? Or at least touch on a few things ? Would that be OK ? Thanks….

The first set of factors are what I like to call \’Hardware\’ or perhaps \’Tin\’ factors. The more poke your servers have got in terms of RAM, CPU, and IO, the less time it will take. Like, d\’uh !! 🙂

Moving swiftly on, the next group are what I like to call \’Software\’ factors. The OS you\’re running on, the version of SQLServer you have installed, etc. These don\’t have so much impact as the hardware factors, but a two-point list looks rubbish. And it\’s nice to know whether your SQL Server will support online index creation and / or index partitioning – some editions don\’t. You can check supported features of SQL2008 R2 by Edition @:

http://msdn.microsoft.com/en-us/library/cc645993.aspx

The last and most important group are what I like to call \’Data\’ factors. Or at least, I think they\’re important. And it\’s my blog.

Factor 1: Table size

A multi-million / billion row table will take longer to index than a 1,000 row one. Again, like, d\’uh !

Factor 2: Availabililty concerns

I fully understand when people want to DROP and CREATE indexes in separate batches, but bear in mind that a CREATE without ONLINE=ON will have a detrimental effect on system performance. And if you\’re replacing an index with one of a new shape, you\’d want to DROP_EXISTING=ON as well so that at least current queries have half a hope of succeeding while the index is being rebuilt.

Factor 3: Partitioning

In some cases, you can have a partitioned table and your NONCLUSTERED indexes are not partition-aligned (ie: they\’re on PRIMARY or in their own filegroup). It\’s up to you whether you partition-align your NONCLUSTERED indexes, but in a case I worked recently, it was far better to have them on the same scheme as the base table. I\’ve not actually measured performance of index creation on both constructs, but you may wish to consider both scenarios.

NOTE: If you rebuild your CLUSTERED index for a partitioned table, but forget to put the index on the partition scheme, you\’ll end up moving the ENTIRE table off it\’s partitions and into the filegroup where you created the index. THIS CAN TAKE A LONG TIME. And you\’ve taken the decision to partition, you don\’t want anything to rip it all out for you unexpectedly.

Factor 4: Index columns and uniqueness

This one is a biggie, and the reason that this post exists in the first place.

I\’ve been doing some index rebuilds, making them partition-aligned, with DROP_EXISTING=ON and ONLINE=ON, all good stuff. On a 500 million row table.

The first one I did held a small range of values in the column that was being indexed. That is too say that each value (when present in the column) was one of, say, 100 possible values for the entire table. Rebuilding that index was a breeze.

One of the later index rebuilds was on a column that was unique across all rows, without duplicates (at least, that was how it is supposed to be). And no, it wasn\’t a GUID, before you ask. So, there were 500 million distinct entries to index, and that took AN ORDER OF MAGNITUDE longer than the first one.

A further index rebuild was similar to the first, but had a couple of included columns for query performance reasons. Whilst this rebuild did not take as long as the \’uniquer\’ one, it still took a not inconsiderable amount of time compared to the simplest structure.

So, to answer the question, you should really be taking all these factors into account when planning your index maintenance / rebuild windows.

And since we\’re not all on the same hardware, using the same software, or looking at the same data, the answer has to be…

..it depends.

Good luck !

Back soon….

\”My SSIS package takes AGES to load, and I\’m not happy…\”

If you get paid by the hour, then this post probably isn\’t for you…

If you\’re under a lot of pressure to debug a hideously complex production SSIS package that\’s gone south, or you\’re just a busy DBA, or you interested in waiting a bit less for the SSIS designer in Visual Studio or BIDS to do it\’s thing, then read on. It\’ll be worth it. Perhaps….

OK, I\’m assuming here that you\’re a DBA of some sorts, not a developer. Developers know all sorts of things about Visual Studio. Who knows, some of them may even have shared its inner secrets with you. But DBAs don\’t normally get involved in that whole mess that is VS, and source control is for wimps, right ? <– Joke !! (Sheesh… some people….)

Just as a straw poll, when you open an SSIS package, how many times to you wait for the designer to validate ALL the connection strings, sources, destinations, blah blah blah ? I guess that because you\’re reading this, enough times to make you interested in how to stop it.

Here comes the knowledge. Brace yourself. No, you\’re not bracing properly, see me after class….

Once you\’ve got your VS solution open, and the package your interested in added to that solution (a debate for another time, perhaps), BEFORE you hit \’View Designer\’ on that bad boy, stop a minute. Take a chill pill. Relaaaaaaxxx, dude !!

Under the \’SSIS\’ menu on the toolbar in BIDS / VS, there\’s an option to \’Work Offline\’. Hit that up, and your designer won’t attempt to validate chapter and verse against all datasources. In some cases, you don\’t need to check that a datasource is online to be able to see that a conditional split is wrong, or that there\’s a nasty script task that does something odd when it could have used a native SSIS transform (and the one is DEFINITELY for another time). Now you can hit \’View Designer\’…..

Hey presto, your package loads in record time, and you can get on with solving that day\’s ETL related issues. And that\’s always a good thing, because a bad day with SQL Server is better than a whole lot of other days you could have had instead.

Simples.

Back soon…

Month: Feb 2012

"How long is my index rebuild going to take ?"

"My SSIS package takes AGES to load, and I\'m not happy…"