"Use the Force, Luke" – Sharing advice and great software…


I love being part of the SQL Server community. OK, so my part of it is VERY small – just me and the cat sometimes, and I'm not even sure SHE understands Data Warehouse concepts. I'm fairly sure she's not read Kimball or Inmon (but neither have I for that matter).


Anyway, one of the great things about the SQL Server community is the extent to which sharing is encouraged. We can all learn from others' missteps, and sometimes you just need someone to help out or give you some encouragement, and that's what it's all about.


I'm going to get a bit Star-Wars-y here (and apologies if you don't happen to like the prequels), and paraphrase something the adolescent Anakin says to Padme:


"Sharing, which you could describe as unconditional support, is at the very heart of a Jedi, sorry, DBA."


It's what drives some of us as humans. It's what some of us need and gain comfort from. OK, so THAT bit wasn't in Star Wars (George Lucas, if you're reading this, it's copyright of me, thank you very much). As a DBA, I get a lot out of learning and listening to others. And I love mentoring / training / conversing with others of a similar mind. If I could, I'd be at the current SQL Bits conference, but someone has to look after the shop…. 😦


Even in those places where there's not much community spirit, the world-weary geek can take solace in the fact that there are other people all over the world in exactly the same situation, and all that connects them is the Internet. I've lost count of the number of times I've been in the pit of technical despair and have found comfort and encouragement from places like SQLServerCentral, Twitter, MSDN, etc., even if I've not actually found the answer I've been looking for. Personally, I think it'd be a bit poor if all your questions could be answered correctly from Wikipedia. Sometimes you need a nudge, sometimes in the wrong direction, to help you grow in knowledge and confidence.


OK, so that's the fluffy bit done. Sorry about that.


Having been around SQL Server for, oh, AGES, I've seen 3rd party tools come and go. There are some that I'd like to have never seen in the first place, and some that I just can't get rid of that hang around like a bad smell or blood that you just can't pressure-wash off the decking over the patio. Sorry, did I type that out loud ?


BUT…


There are 3 tools that, in my opinion, are must-haves:


RedGate's SQLCompare (www.red-gate.com) – a great tool for comparing database objects, and generating change scripts. Actually, I've never used the tool to promote any code into production, but it's really nice to know that I can…

SQL Sentry's Plan Explorer (www.sqlsentry.com) – Fantastic FREE tool to aid Execution Plan analysis. What this tool doesn't know about execution plans isn't worth knowing…

SoftTree's SQLAssistant (www.softtreetech.com) – I know, it's an intellisense tool. So why would you want something you get for free in the standard toolset ? Get hold of an evaluation copy, and try it out. It's what SSMS could be (or should be, if you love it as much as I do). And it does so much more than just SQL Server.


What links these three excellent pieces of software ? They were all recommended to me by people in the community, people who shared their experience, people who want to help out and share the love.


Which is the reason we're all in this, right ? To share, to encourage, to learn ? 🙂


Feel free to comment (but keep it clean, yes ? It's a #sqlfamily blog)


Back Soon…



Identity abuse – Large deletes and the DBA


This one has been done to death on the Internet, but since I'm in a reflective mood, here's yet another approach to doing large deletes.


OK, so let\’s set the scene a little.


The table we're interested in here had some 40 million rows in it, and largely provides tracking of requests and responses based on the date they were added or updated, so our inserts are sequential, our updates less so.


As part of volume testing, a further 120 million rows were added to a range of dates already present in the table.


Like a good DBA, I took a backup of the database just before we added the rows so we had a happy regression path and could revert to a known state to run the tests again (if testing is your thing). We then liberally splattered data around the table, inserting records all over the place. It's all good.


Now the bad news – due to limitations on disk space, someone deleted my happy-path backup, and replaced it with a backup of the entire database, complete with the new rows.


Bum.


Bum bum bum bum bum. 😦


Double-bum when I was asked to remove all the 120 million records we recently inserted. The task at hand was to delete approx 80% of the data in the table, and we had a mix of test data (which we didn't want) and live data (which we did) in each date range, so it's not like I could just chop out whole days at a time.


So I just set up a loop and deleted 100,000 rows at a time (to minimise t/log effects) and churned through all the records in a couple of hours.


Simples, no ? Anyone spot how I did that ? Magic happens here, you know…..


We used dates ? Nope, mixed data precluded that.


We inspected every row to see if it could be deleted ? Nope – that would take AGES (and would be no fun AT ALL).


We took all the records we wanted to keep, and binned the entire table before squirting the rows back in ? Nope – mixed data / inspection again. And, again, it would be no fun.


We used the IDENTITY column to identify the rows we didn't need. See, I mentioned INSERTS were sequential, but the table was not partitioned in any way, shape or form. 🙂


The logic goes thusly:


We knew that after we took the cut of the database from production, the only thing that would be accessing the data would be the test scripts. Further, we knew that we'd be accessing only the data we inserted. It therefore follows that the last IDENTITY value that we had BEFORE we started inserting would mark the start point for our deletes – IDENTITY values less than this were already in the table, and everything after that was new (and it meant that we didn't have to care about which date the data was concerned with). And we knew what date the backup was from. That's what makes it simples…..
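As a sketch of capturing that start point (the variable name is mine, and `IDENT_CURRENT` is just one way to do it – a `MAX()` over the IDENTITY column would do equally well):

```sql
-- Sketch only: capture the last IDENTITY value BEFORE the test inserts.
-- (@LastKeepValue is an illustrative name, not from the original script.)
DECLARE @LastKeepValue BIGINT;
SELECT @LastKeepValue = IDENT_CURRENT('dbo.InflatedTable');
-- Every row inserted from this point on gets an IDENTITY value greater
-- than @LastKeepValue, regardless of its date, so the delete can simply
-- target values above the cutoff.
```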


Here\’s the loop, in semi-pseudo-code:


SET NOCOUNT ON;

--How many rows do we need to delete…
DECLARE @rowstodelete INT;
SELECT @rowstodelete = COUNT(*) FROM dbo.InflatedTable WITH (NOLOCK) WHERE <IdentityColumn> > <LastKeepValue>;

--And we want to know how long each chunk took….
DECLARE @starttime DATETIME;
DECLARE @timetaken INT;

--Set up simple loop
WHILE @rowstodelete > 0
BEGIN

        SET @starttime = GETDATE();

        DELETE TOP (100000) FROM --100,000 is purely arbitrary.
        dbo.InflatedTable
        WHERE <IdentityColumn> > <LastKeepValue>; --This is the last row we want to keep

        --Report on progress
        SET @timetaken = (DATEDIFF (ss, @starttime, GETDATE()));
        PRINT 'Report - ' + CAST (@rowstodelete AS VARCHAR (10)) + ' - Last 100,000 took ' + CAST (@timetaken AS VARCHAR(10)) + ' sec';

        SET @rowstodelete = @rowstodelete - 100000;

END


All you need to worry about then is your transaction log growth, but I can't give you the whole enchilada in one go.
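(For the impatient, one common pattern between chunks – an assumption on my part, not what I actually ran – looks like this:)

```sql
-- Sketch only: keeping the log in check between delete chunks.
-- In SIMPLE recovery, a CHECKPOINT lets the log space used by the
-- previous chunk be reused:
CHECKPOINT;
-- In FULL recovery, you need a log backup between chunks instead
-- (database name and path purely illustrative):
-- BACKUP LOG MyDatabase TO DISK = N'E:\Backups\MyDatabase_log.trn';
```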


Where would be the fun in that ?? 🙂


Back soon….


How to tell if your Testers are left-handed.


Something a little different this time, and purely for fun 🙂


I believe that the world is a right-handed conspiracy. As a committed Lefty, I struggle daily with things like scissors, corkscrews and peanut-butter jars. On the other hand, I'm one of the few people who can write things down and use a computer mouse at the same time. Go figure…..


I was trawling through some user-generated test data the other day (and that's about as fun as my life gets right now), and noticed something a bit odd. You see, when asked to input some data, *any* data into a text box, most people will put something like:


"This is some test text"

…or perhaps even:

"I hate testing software. It really blows…"


If your test team are *really* inventive, you might even get:

"The quick brown fox jumps over the lazy dog.The quick brown fox jumps over the lazy dog.The quick brown fox jumps over the lazy dog.The quick brown fox jumps over the lazy dog.The quick brown fox jumps over the lazy dog.The quick brown fox jumps over the lazy dog."


…but such occurrences are rare, and god forbid that you should actually get the sort of meaningful data that you would have liked. THAT'll teach you to be more stringent on your test plans 🙂


But I digress.


What I saw in my test data was this:

"jkhjkhjkh ihjkhjkh"

..and this:

"hjgjhghjgj hgjhg hjghj hjg hj hj jh hj"

..and even this:

"sdas. Zxczxcz"


This set me thinking as to why these specific characters, in a string that was supposed to be random, and I came upon the thought that these patterns were because of the dominant hand of the tester who inputted the data. Think about it. When you drum your fingers impatiently, you would normally use your dominant hand. I see no reason why this would not equally apply to idly-typed text.


The characters J, H, K and I are all on the right hand side of a standard US / UK keyboard. S, D, A, S, Z, X, and C are all on the left.


So the testers were a mix of left- and right-handed (with a strong bias for right, if the test data is to be believed). Or perhaps they were all right-handed, and a couple of them were trying to write down their input strings whilst typing… or eating a sandwich… or whatever you do at your desk that ties up a hand for a few moments.


Anywho, I do know that in the team I'm in at the moment, there's about a 75/25 split between right- and left-handed, which is higher than the statistical distribution average. Maybe left-handed people really hate testing. Or perhaps they're a little bit more creative (which is why we make such good DBAs).


Back soon…