Washing up

I’ve always hated washing dishes.

This comes from early childhood when, on Sundays, my sibling and I would be banished to the kitchen after the Family Dinner to perform the washing up, one parent having toiled for what seemed like the whole day to prepare the meal and the other having done nothing more arduous than read the newspaper and be older.

There are 3 specific things I hate about washing up:

  1. When there are things hidden in other things, like cutlery in a pan that has a lid on.
  2. When someone drops something into the bowl while you’re washing something else.
  3. When someone adds items to the queue of things to be washed after foraging elsewhere in the house.

I guess I hate these things because they change the planned task. You know how many pots there are to wash, you plan what order you want to wash things in, and you set about executing that plan. Then you find the hidden stuff that means there’s more to wash than originally planned. And then some inconsiderate git (a sibling / parent) nonchalantly adds to your workload without so much as a ‘by-your-leave’, which means your plan is now broken – but you still have to do the pots.

Fast forward some <mumble> years, and I’m a DBA working with transactions in SQL Server. And it seems SQL Server and I have something in common.

Imagine a workload submitted to SQL Server. An execution plan will be generated based on statistics and data *at the time the request was submitted*, and SQL Server will attempt to execute that plan.
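
If you want to see that up-front plan for yourself, here’s a minimal sketch that asks SQL Server for the estimated plan without running the statement – the table and column names are made up purely for illustration:

```sql
-- Ask for the estimated plan only, compiled from statistics as they stand now,
-- without executing the statement. dbo.Orders / CustomerID are hypothetical.
SET SHOWPLAN_XML ON;
GO

SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE CustomerID = 42;
GO

SET SHOWPLAN_XML OFF;
GO
```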

Now, imagine that one of the elements in the workload is a table variable. SQL Server will assume that the row count of that table is one (no longer strictly true – deferred compilation in SQL Server 2019 changed this – but the change is recent and spoils my illustration). Table variables are like a closed lunchbox in the washing up – you only know what’s in there when you have to open it to clean it. If someone else has crammed stuff in there, that will have a knock-on effect on your ability to complete the work as planned. Same for SQL Server – processing more rows than planned can be expensive.
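
Here’s a minimal sketch of the lunchbox effect, with made-up names. On versions without deferred table variable compilation, the first SELECT below is costed as though @Lunchbox holds a single row, however much has been crammed into it:

```sql
-- Cram ~10,000 rows into a table variable, then query it.
DECLARE @Lunchbox TABLE (ItemID int PRIMARY KEY);

INSERT INTO @Lunchbox (ItemID)
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- Compare 'Estimated Number of Rows' (1) with 'Actual Number of Rows' (10,000)
-- in the actual execution plan on pre-2019 versions / older compatibility levels.
SELECT ItemID
FROM @Lunchbox;

-- A statement-level recompile lets the optimiser see the true row count.
SELECT ItemID
FROM @Lunchbox
OPTION (RECOMPILE);
```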

Imagine that the workload has a lookup against a large table with wildly out-of-date statistics. When SQL Server comes to that part, the assumptions made when the execution plan was generated will be wrong, and there’s a whole lot more lifting-and-shifting to be done than was originally planned.
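
A quick sketch of how you might check for (and fix) that staleness – dbo.LargeLookup is a made-up table name:

```sql
-- How out of date are the statistics on the lookup table?
SELECT  s.name                  AS stats_name,
        sp.last_updated,
        sp.rows,
        sp.modification_counter -- changes since the statistics were last updated
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.LargeLookup');

-- Bring them up to date (FULLSCAN is the thorough-but-slower option).
UPDATE STATISTICS dbo.LargeLookup WITH FULLSCAN;
```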

Also imagine that the workload batch-processes data from a table that is constantly being added to. At the time the plan is decided on, there may be only 50,000 rows in the table, but while the batches are being fetched, a further 200,000 rows are added (yes, I know it’s a contrived and unlikely example) and those new rows will also need to be processed before the workload can complete.
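
A common defence is to note the high-water mark before you start and only process rows up to it; anything arriving afterwards waits for the next run. A minimal sketch, with made-up table and column names:

```sql
DECLARE @MaxID  bigint,
        @LastID bigint = 0,
        @Batch  int    = 5000;

-- Snapshot the boundary up front; rows added after this point have higher IDs
-- and are left for the next run.
SELECT @MaxID = MAX(QueueID) FROM dbo.WorkQueue;

WHILE @LastID < @MaxID
BEGIN
    -- Pick up the next batch, never straying past the snapshot boundary.
    SELECT TOP (@Batch) QueueID, Payload
    INTO #CurrentBatch
    FROM dbo.WorkQueue
    WHERE QueueID > @LastID
      AND QueueID <= @MaxID
    ORDER BY QueueID;

    -- ... do the real work against #CurrentBatch here ...

    SELECT @LastID = MAX(QueueID) FROM #CurrentBatch;
    DROP TABLE #CurrentBatch;
END;
```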

With SQL Server, there are a number of things you can do to avoid these problems, though:

  1. Use table variables sparingly, and only when you are sure of the size and content. In my experience, anything more than 1,000 rows in a table variable could lead to complications, so I’d use a temporary table instead. *other experiences are available
  2. Keep your table statistics up to date. Whether you do this via UPDATE STATISTICS or whatever method, I don’t mind, but keep them up to date – your batch processes will thank you later.
  3. Be mindful of transaction management. Make sure you know how much data you’re picking up and how to ignore late-arriving data. Explicit transactions, or snapshot-based options such as RCSI, might help you out (there’s a sketch after this list), but as ever ‘It Depends’.
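
On that last point, here’s a minimal sketch of the two snapshot-flavoured options – RCSI at the database level, or snapshot isolation for a single transaction – with a made-up database name:

```sql
-- Option 1: RCSI – each statement reads a consistent snapshot taken as it starts,
-- so rows committed mid-statement by other sessions aren't seen.
-- (Switching this on needs the database briefly free of other connections.)
ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON;

-- Option 2: snapshot isolation for one transaction – it sees a transactionally
-- consistent view, ignoring changes committed by others once it has started reading.
ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;
GO

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRANSACTION;

    SELECT COUNT(*) FROM dbo.WorkQueue;  -- late-arriving rows are not visible here

COMMIT TRANSACTION;
```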

With washing up, there is only 1 thing you can do – get a dishwasher.

Back soon…
