Shut up and play yer guitar…

I've had the rare luxury of being away from work this week, and boy did I need that !


Apart from the opportunity to reconnect with parts of life that had been pushed to one side somewhat (and hack down the wild Eucalyptus tree in the garden), I've also had the chance to be a bit philosophical about SQL.


Does that make me sound like a basket-case ? (Don't answer that !)


Here's the nugget that I've come up with for your consideration:


"It depends."


As I'm sure you know, this is the DBA's answer to a straight yes/no question, as in:


"Would using a CTE and a join improve performance ?"


"It depends…."


The philosophical part was wondering if there were any other walks of life where ambiguity was baked in in the same way. So I picked up my guitar and went in search of enlightenment. And a cold one (I'm on holiday, after all).


Little did I realise, but the answer was right in front of me. Literally.


Guitar chords are almost exactly like SQL, in that the chord you use depends on the sound you want to make at that point in the tune. And I'm not talking about different chords (like a G or an A Minor), but variations on the SAME chord.


As evidence of this, open your favourite search engine and look for the chord patterns for E/G# (or look at this site, if you need instant gratification).


E/G# sounds like quite a specific chord – it's an E, with an added G#. But there are over 50 ways of making that sound. And which one you use is predicated on several things:


Does it make the right sound ?
How hard is it to play ?
What are the preceding and subsequent chords ?


I could go on about how to place your fingers on the fretboard to play the chords (barre, half-barre, double-jointed, etc), but I won't. I'm sure you get the point.


So, when faced with the question about CTEs, it's perfectly valid to say "It depends…" as long as you've got a list of things to take into consideration when making the choice.


For the record, I'd look at what is actually being attempted, indexes, aggregations, table-valued parameters, CROSS APPLY…..


Maybe I should just stick to 'Hallelujah' by Leonard Cohen. The version I'm playing has an E/G#. Other versions don't. Which one is right ? I kind of think it depends….


Back soon !



As one door closes, another one opens.


Further to yesterday's post about remembering to tidy up database users, I thought I should mention something else.


It is entirely possible to create a user in a database that is associated with a Windows login WITHOUT adding the Windows login into master. And this DOES look a bit like a security hole in SQL Server (or possibly something that DBAs should be aware of, just in case).


Anyone who has 'dbo' rights to a database can create a user in that database, as we know, but they don't have rights to create logins in SQL Server. There is, however, a way of circumventing this. And here's how to do it….


In SSMS Object Explorer, choose any non-system database you like, and expand 'Security' and then 'Users'. Bring up the dialog box to create a new user….


In the UserName field, pick a valid Windows user (perhaps someone you know). Or, if you're into keeping database users and logins distinct, type your own style of user name in the box.


OK, moving on to the LoginName field, here you would normally choose the […] box to pick an existing login to associate with your sparkly new database user.


You don't have to, though. You can just type in any valid Windows login. And that's the hole.



[screenshot]



When you create the user, SQL will query Active Directory for the SID of the domain user that you entered. If it's valid, the new database user is created as normal, and the correct entry is added to sysusers.


If you check the properties of the newly created user in SSMS, you'll see that there is a mapping between the UserName and the LoginName as normal (please excuse the crappy redaction – there's a recession on):


[screenshot]


…even though there is no corresponding login in master (again, sorry for the redaction):


[screenshot]


If you go back to the database you created your new user in, and SELECT * FROM sysusers, you'll see your Windows login in there, as well as the SID that SQL helpfully picked up from AD.
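

If you prefer the newer catalog views, something along these lines shows the same thing: the stored SID, and the Windows account it resolves back to (my quick query, nothing official):

SELECT name, SUSER_SNAME(sid) AS windows_account, sid
FROM sys.database_principals
WHERE type IN ('U', 'G');   -- 'U' = Windows user, 'G' = Windows group
GO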


Going back to the post the other day, this now means that if the Windows user you used is a member of a group that has access to your server (but not the database you've just 'created' them in), they will be able to access your database as well.


….which looks like a bit of a loophole to me…


So, I guess the morals of this story are "Be careful who gets 'dbo' rights to any database", and "Keep a close eye on all your database users AT ALL TIMES".


But I suppose you're already thinking of writing a script or policy that would remove any explicit database users who don't have a corresponding login. 🙂
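

As a starting point, comparing SIDs between the database and the server gets you a candidate list. Something like this (my rough query; it will happily flag users that are there deliberately, so treat it as a to-check list rather than a to-drop list):

SELECT dp.name, dp.type_desc
FROM sys.database_principals AS dp
LEFT JOIN sys.server_principals AS sp
       ON dp.sid = sp.sid
WHERE dp.type IN ('S', 'U', 'G')   -- SQL users, Windows users, Windows groups
  AND dp.principal_id > 4          -- skip dbo, guest, INFORMATION_SCHEMA, sys
  AND sp.sid IS NULL;
GO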


Good luck with that….


Back soon !


Sharepoint – many data files make light work


I recently had the pleasure of attending a course on Sharepoint. Specifically, WSS 3.0 and MOSS 2007, if you must know, but that's not really important right now.


The important bit was this: The tutor asked the group, "How many people are here because your company is thinking of implementing MOSS 2007 / 2010, and you'd like some more information on how to do it ?".


Cue tumbleweed moment – in a group of some 14 delegates, no-one was there for that reason…..


The next question the tutor asked was, "How many people are here because your company bought a point-solution from a 3rd party that uses Sharepoint, and now you've got performance problems ?"


What's the opposite of 'tumbleweed moment' ? I'm not sure, but that's the response he got. There were a few rueful grins, a few nodded heads, and I think I heard someone say "Thank you, <> !"


I recently had the pleasure of implementing some of the performance tweaks gleaned from that course (with an exceptionally talented colleague), and thought I would share one of them with you – Splitting the ContentDBs onto multiple filegroups with multiple files.


Having plumped for moving all user-generated content into its own 4-file filegroup, this is what we did:


The process is actually quite simple, but you have to interrupt the normal process flow when creating a Content Database. The default is to let Sharepoint create the database and add all the required objects. Here's how to do it the DBA way….

 
First, you need to create the database for Sharepoint to attach to.
 
CREATE DATABASE [<<contentDBName>>]
COLLATE Latin1_General_CI_AS_KS_WS
GO


The important point here is the collation, so don't leave this line out – Sharepoint will tell you soon enough if you've got it wrong.


And you'll also need to map the Sharepoint Admin account to 'dbo':

USE <<contentDBName>>;
GO
IF EXISTS (SELECT name FROM master.sys.databases sd WHERE name = N'<<contentDBName>>'
           AND SUSER_SNAME(sd.owner_sid) = SUSER_SNAME())
    EXEC [<<contentDBName>>].dbo.sp_changedbowner @loginame = N'DOMAIN\YourSPAdminServiceAccount', @map = false;
GO


Next, ask the SP admin to attach the database to Sharepoint. This will create all the required database objects, including those you want to move. It doesn't take long, so resist the urge to go walkabout…


Once these have been done, the new database will need to be detached from Sharepoint (not SQL Server) so you can make your amendments. So ask the Admin to do that. If it's you, ask yourself nicely.


Now you have exclusive use of the database, you can add a new filegroup and data files, and then move all the user-generated content tables to the new filegroup, in this case called [UserData]. Please note that Microsoft recommend having multiple files in the PRIMARY filegroup and leaving it at that (but, hey, it's only a recommendation). Note that this part of the scripting process is in SQLCMD mode, so you'll need to change the variables as appropriate, including those associated with SIZE, MAXSIZE and FILEGROWTH, to fit your needs:


-- USE SQLCMD mode !
-- and change the DataPath and LogPath accordingly
:SETVAR SPContentDBName         <<contentDBName>>
:SETVAR DataPath                Drive:\DataPath\
:SETVAR LogPath                 Drive:\LogPath\

-- Do not change anything below this line -->
------------------------------------------------------

USE [master];
EXEC ('ALTER DATABASE ' + '$(SPContentDBName)' + ' ADD FILEGROUP [UserData]');
EXEC ('ALTER DATABASE ' + '$(SPContentDBName)' + ' ADD FILE ( NAME = ''' + '$(SPContentDBName)' + '_001'' , FILENAME = ''' + '$(DataPath)' + '$(SPContentDBName)' + '_001.NDF'' , SIZE = 102400KB , MAXSIZE = 12058624KB , FILEGROWTH = 102400KB ) TO FILEGROUP [UserData]');

EXEC ('ALTER DATABASE ' + '$(SPContentDBName)' + ' ADD FILE ( NAME = ''' + '$(SPContentDBName)' + '_002'' , FILENAME = ''' + '$(DataPath)' + '$(SPContentDBName)' + '_002.NDF'' , SIZE = 102400KB , MAXSIZE = 12058624KB , FILEGROWTH = 102400KB ) TO FILEGROUP [UserData]');

EXEC ('ALTER DATABASE ' + '$(SPContentDBName)' + ' ADD FILE ( NAME = ''' + '$(SPContentDBName)' + '_003'' , FILENAME = ''' + '$(DataPath)' + '$(SPContentDBName)' + '_003.NDF'' , SIZE = 102400KB , MAXSIZE = 12058624KB , FILEGROWTH = 102400KB ) TO FILEGROUP [UserData]');

EXEC ('ALTER DATABASE ' + '$(SPContentDBName)' + ' ADD FILE ( NAME = ''' + '$(SPContentDBName)' + '_004'' , FILENAME = ''' + '$(DataPath)' + '$(SPContentDBName)' + '_004.NDF'' , SIZE = 102400KB , MAXSIZE = 12058624KB , FILEGROWTH = 102400KB ) TO FILEGROUP [UserData]');

USE $(SPContentDBName);


Next, you'll need to script out all the tables that you want to move to the new filegroup. I did prepare a script by way of demonstration, but it's a long one and a little off-putting. Suffice it to say that you need to script out dropping all the constraints on your target tables, dropping the tables you wish to move (in this case AllDocs, AllUserData, AllLists, AllLinks, AllDocVersions and AllDocStreams), and another script to recreate those tables which is modified to place them on the [UserData] filegroup, not [PRIMARY]. Don't worry, the process to create the database objects that's run in Sharepoint doesn't put any data in these tables.
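

Once the tables have been recreated, a quick query along these lines (my own sanity check, not part of the official process) confirms that everything you moved really is sitting on [UserData]:

SELECT t.name AS TableName, ds.name AS FileGroupName
FROM sys.tables AS t
JOIN sys.indexes AS i
        ON i.object_id = t.object_id
       AND i.index_id IN (0, 1)   -- heap or clustered index, i.e. where the data actually lives
JOIN sys.data_spaces AS ds
        ON ds.data_space_id = i.data_space_id
WHERE t.name IN ('AllDocs', 'AllUserData', 'AllLists', 'AllLinks', 'AllDocVersions', 'AllDocStreams');
GO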

 

Having done that, you can then ask the SP Admin to reattach the content database to Sharepoint. And you can finally go walkabout, because what people put in Sharepoint isn't really up to you; all you want to do is make sure you've got optimal performance. Isn't it ? 🙂


In case you're interested, along with the other performance changes we made, we calculated that over 200 man-hours per day were saved by not having to wait for Sharepoint to deliver content to the end-user. OK, so it's a second or so per page, but there's over 450GB of data in > 25 content databases running off a single WebApp and a few Web Front Ends, and over 4 million documents… and a user base of over 15K !


Back soon !


Please close the door behind you…


Sometimes, you come across a situation that you know you should know the answer to, but can't actually put your finger on. But something tells you that it's fundamental knowledge. And something else then starts kicking you until you re-learn that fundamental knowledge.


Like the other day, when someone asked me "If someone is only a member of a Windows group that has access to Database A, but they are defined as a User in Database B, can they access data in Database B ?"


And the answer is "yes, they can" (but I'm guessing that they probably shouldn't be able to). And I had trouble remembering that, since it's not something I get asked very often 🙂


Presumably, this happens in the development / early production stages, when nervous developers insist on having 'dbo' permissions to their baby database in their own right (rather than having a proper NT group for support), and then their login gets dropped when the service has been in for a while and is stable.


I'm sure this happens quite a lot (and from a bit of digging, a bit more often than I originally would have thought), and those thoughtful guys at Microsoft even put a dialog box up when you delete a login advising you that you still need to delete any database user associated with the Windows login. What's not 100% clear from that message is that database permissions for the user will remain, and the end-user can still get access to the data even if their login does not explicitly exist on the SQL Server.


Is this a bit of a security hole ? Perhaps it is. But then again (as the helpful message intimates), SQL Server cannot take arbitrary decisions about who owns what object in the database (or indeed whether any object should be removed). It's down to the DBA to make those decisions according to each circumstance, and with careful consultation with the end-users – you don't really want to drop a table that other people or services rely on day-to-day….


So, next time you get asked to drop a login for a Windows user or NT group, remember to tidy up all the permissions in all the databases that the login had access to before you get your next latte / mocha / cold one. It'll save embarrassing questions about unauthorised data access later on, and it'll stop things kicking you in the head when you can't remember fundamental knowledge. Which is a win on both counts !
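

In practice the tidy-up is only a couple of statements per database. Something along these lines, assuming the login maps to a same-named user everywhere (the names below are made up):

USE [SomeUserDatabase];   -- repeat for every database the login could reach
DROP USER [YOURDOMAIN\LeaverAccount];
GO

USE [master];
DROP LOGIN [YOURDOMAIN\LeaverAccount];
GO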


Certainly, from now on I'll be double checking that all permissions are revoked when a login is dropped. Especially on the HR and Financial systems.


Back soon !


Losing your 'master' database. Only parts of the world have ended…


One of the questions that DBAs automatically know the answer to is:


"If 'master' is unavailable, then is it game over ?"


Until today, I'd have had to answer with a resounding 'YES', because that's all that I knew.


There was a situation that surprised me today, in that SQL Server CAN stay up without a 'master' database if certain conditions are met. I know, it sounds weird, but as long as nothing touches the database (for example, to authenticate an incoming connection) then it will quite happily stay up for a while without major issue.


This isn't heresy – it happened. I kid you not. And here's how….


Going back through the logs, I could see that there were no events in the error log after about 11:00 (that is, until another DBA restarted the instance to get the service back up and running). There were no 'login failed's, no 'Log was backed up', nothing. And yet there were no service incidents raised until about 12:15. So, let's dive into the Application Event log to see what was going on between 11:00 and 12:15.


It turns out that a chkdsk was run (not deliberately, I assure you, and certainly not by me) on the drive that holds the SQL binaries and the data files for 'master', 'MSDB' and 'model':

Event Type:     Information
Event Source:   Chkdsk
Event Category: None
Event ID:       26212
Date:           24/01/2011
Time:           11:00:46
User:           N/A
Computer:       ExampleServer
Description:
Chkdsk was executed in read-only mode on a volume snapshot.


Now, I've not checked up on this (my bad), but I'm fairly sure that running chkdsk on a drive that holds running instances of SQL Server is probably not recommended ! Notwithstanding that point, it was run, and a couple of minutes after this event, this was logged:


Event Type:     Information
Event Source:   Chkdsk
Event Category: None
Event ID:       26214
Date:           24/01/2011
Time:           11:02:10
User:           N/A
Computer:       ExampleServer
Description:
Chkdsk was executed in read/write mode.


So it would appear that chkdsk was executed again with the /f switch to fix inconsistency errors reported by the first run. It was almost exactly at this time that SQL events stopped being logged.


The first we knew of the problem was at about 12:15, when calls started to come in indicating that the 'master' database was unavailable.


It is my theory that the chkdsk process 'fixed' a detected error in the database physical files, which led to SQL not being able to access it properly. Reason: if there's no reason to talk to the master database for anything, then it behaves pretty much like any other database that's not being accessed – it stays dormant, can be offlined, whatever, and you only find out when you need to use it again and it's unavailable. The service that was running on the instance already had all of its database connections in place, and was already authenticated when the 'corruption' happened. I think we also lost MSDB by the same method at the same time, so no jobs could run that would invoke any xp_s that are held in master. It was only when the connections from the application got recycled (forcing the re-authentication) that there was any visible issue at all.


So, by my reckoning, that tells me that SQL Server is capable of running without a fully functioning master database. Just how long for depends on what the instance is actually doing at the time.


But I could be wrong….


Back soon !

Surely it's just text data ?


Right then.


I'm at a bit of a loss on how to start this one…. So here goes….


The last few posts have been about estimating the size of indexes (indices?), primarily driven by the challenge of having a multi-terabyte table in development. People are getting a bit nervous about what 'multi-terabyte' actually means in dollar value and wondering how to shave off a couple of terabytes here and there.


Without going into too much detail about how the table will be physicalised (but you can bet that it'll be partitioned to death), I was looking into the various ways of storing textual data in this monster, and which one would use the least amount of disk space.


While doing this, someone mentioned that SQL Server stores XML data in a double-byte format if you use the native XML datatype. I thought that this was a bit odd, since surely the database doesn't care what the text actually is, and the character set used in XML is defined within the XML itself (in the 'encoding' attribute) anyway. So I set up a simple test of this, just to make sure:


CREATE TABLE XMLTest
        (stdText VARCHAR (1000),
        xmlText XML,
        unicodeText NVARCHAR (1000))
       
GO


That gives me 3 columns to insert the same XML into using 3 different text datatypes, and since I'm a completely lazy DBA, I wanted to reuse the CREATE TABLE instead of creating 3 tables 🙂 After each test, I simply dropped the table and started again – there didn't seem a lot of point in persisting the data between tests.


Here's the XML I wanted to insert (some of you may recognise this from Sams 'Teach Yourself XML in 14 Days' – an excellent publication, I might add):


'<?xml version="1.0" encoding="UTF-8"?>
<note>
        <body>
                Remember to buy some milk on the way home from work
        </body>
</note>'


Since I'm still a completely lazy DBA, using GO 5000 is a quick way of inserting a few thousand lines. Again, I'm not interested in the uniqueness of each record, just the storage required to hold it (so shoot me).


Insert #1 goes like this:


INSERT XMLTest (stdText)
        VALUES
                ('<?xml version="1.0" encoding="UTF-8"?>
<note>
        <body>
                Remember to buy some milk on the way home from work
        </body>
</note>')
GO 5000


A quick sp_spaceused 'XMLTest' tells me that these 5000 rows take about 1480KB of disk space. OK, I'm happy with that (for now) since it's just in a normal VARCHAR column.


Moving swiftly on to the next test – inserting 5000 rows into the native XML datatype…
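

The statement is the same shape as Insert #1, just aimed at the xmlText column, and with the same XML. Something like this (note the literal stays plain varchar, no N prefix, otherwise a UTF-8 encoding declaration upsets the XML parser):

INSERT XMLTest (xmlText)
        VALUES
                ('<?xml version="1.0" encoding="UTF-8"?>
<note>
        <body>
                Remember to buy some milk on the way home from work
        </body>
</note>')
GO 5000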


This time, sp_spaceused 'XMLTest' tells me that exactly the same amount of data (albeit in a different datatype) requires about 2760KB of storage.


This in itself is odd, don't you think ? If SQL Server stores XML in double-byte, shouldn't it take twice the storage ? 2 x 1480 is 2960, not 2760….


For the sake of my sanity, I also did the same inserts into a Unicode datatype, NVARCHAR. Unicode is double-byte, after all, so I'm expecting these 5000 rows to take near-as-dammit 2960KB. Are you ?


Imagine, then, how perplexed I was after the unicode insert. This time, the sp_spaceused 'XMLTest' shows that 3400KB is used. For THE SAME DATA !!! Damn you, SQL Server storage for messing with my otherwise neatly ordered world view !!!


So, the moral of this story could be seen as:


"If you don't need to store XML data in the XML datatype (i.e. you're not planning on doing any clever XQuery stuff on it), then don't – it'll cost you storage in the long run."


Although it's not for me to second-guess your design decisions. I just thought I should mention it….

 
Back soon !


Estimating the size of a non-clustered index


After all the fun and games when doing this for a clustered index, I said I'd post up the non-clustered version if it was interesting.


Turns out, it's not that different from the clustered index version, so if you've already gone to the trouble of doing that, then you're most of the way there.


Again, there's a very good MSDN HowTo about it at:


http://msdn.microsoft.com/en-us/library/ms190620.aspx


…so I won't repeat either that or my own calculations from before. One thing to bear in mind, though, is the provision for estimating any indexes with included columns (one of the nicest indexing changes in SQL 2005, in my opinion), but it's all detailed in the HowTo.


Of course, you may not always have a clustered index on your table, but you've at least got a primary key, so that counts 🙂


Oh, and just an aside about column nullability: I don't mean to teach you to suck eggs or anything, but please give nullability a thought. When you're estimating ANY index size, it's quite important.


If a column can contain nulls, SQL Server has to keep track of it using a null bitmap. This null bitmap has a storage overhead in the index, so when dealing with very large tables with several indexes, you might want to keep nullable columns to a minimum. I'm just sayin'…..
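

To put a number on that overhead, here's the null bitmap formula from the clustered index exercise applied to a made-up 20-column index row:

-- Null_Bitmap = 2 + ((number of columns in the index row + 7) / 8), using integer division
SELECT 2 + ((20 + 7) / 8) AS Null_Bitmap_Bytes   -- = 5 bytes on every index row
GO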


Of course, there is a whole other piece on SQL 2008 Sparse Column support, but that brings with it its own idiosyncrasies when it comes to indexing and data compression, so I'll attempt to cover those off in a later post. Maybe.


Back soon !


Estimating the size of a Clustered Index


OK, let me first start by saying that I'm not an expert in SQL Server Storage by any stretch of the imagination, and I'm sure you're aware that estimating ANYTHING in SQL is a bit of a dark art. That's got that out of the way.


I recently had occasion to estimate the size of a multi-terabyte database, and thought I should factor in the Clustered Indexes (Indices?) to be a little more sure of the numbers. After wishing that I hadn't agreed to do this, I set to, attempting to make some sense out of this article:

http://msdn.microsoft.com/en-us/library/ms178085.aspx


I advise you to read this, and then lie down in a darkened room for a while. Then perhaps grab something to eat / watch a movie / take a bath / whatever you need to do to give you the strength to read it again.


Thankfully, I've distilled this for you into a form that can easily be reproduced in your Office Spreadsheet program of choice.


Storage required @ Leaf level

Rows                      1,000          Number of rows in the table
Num_Cols                  1              Total number of columns (fixed and variable length)
Fixed_Data_Size           0              Total byte-size of all fixed-length columns
Num_Variable_Cols         0              Number of variable-length columns
Max_Variable_Size         0              Maximum byte-size of all variable-length columns
Null_Bitmap               3              2 + ((Num_Cols + 7) / 8)
Variable_Data_Size        2              2 + (Num_Variable_Cols * 2) + Max_Variable_Size
Row_Size                  9              Fixed_Data_Size + Variable_Data_Size + Null_Bitmap + 4
Rows_Per_Page             736            8096 / (Row_Size + 2)
Fill_Factor               80             Index fill factor
Free_Rows_Per_Page        147.2          8096 * ((100 - Fill_Factor) / 100) / (Row_Size + 2)
Num_Leaf_Pages            2              Rows / (Rows_Per_Page - Free_Rows_Per_Page)
Leaf_Space_Used           16384          8192 * Num_Leaf_Pages

Storage for index information

Num_Key_Columns           1              Total number of key columns (fixed and variable length)
Fixed_Key_Size            0              Total byte-size of all fixed-length key columns
Num_Variable_Key_Cols     0              Number of variable-length key columns
Max_Var_Key_Size          8              Maximum byte-size of all variable-length key columns
Index_Null_Bitmap         3              2 + ((Number of columns in the index row + 7) / 8)
Variable_Key_Size         10             2 + (Num_Variable_Key_Cols * 2) + Max_Var_Key_Size
Index_Row_Size            18             Fixed_Key_Size + Variable_Key_Size + Index_Null_Bitmap + 1 + 6
Index_Rows_Per_Page       405            8096 / (Index_Row_Size + 2)
LOG function              2.607455023    LOG(Index_Rows_Per_Page)
Non_Leaf_Levels           2              1 + LOG function * (Num_Leaf_Pages / Index_Rows_Per_Page)
Num_Index_Pages           2              Num_Leaf_Pages / (Index_Rows_Per_Page^2) + Num_Leaf_Pages / (Index_Rows_Per_Page^1)
Index_Space_Used          16384          8192 * Num_Index_Pages

There are a few things to note here.

Firstly, this calculation estimates the storage required for data at the leaf level – essentially, all the data in the table. Adding the clustered index 'changes' the table into a more ordered structure (please don't shout at me about this, I'm trying to make it easy to understand, not 100% accurate – I said I wasn't an expert).


If you've already got a better way of estimating the size of a heap, then feel free to stick with that (I did in my example), but you'll need some of the numbers in the top half to calculate the bottom half !


Most of the math is fairly self-explanatory, so I'll not go through that. The article referenced above gets complicated at around Section 7, so if you split out the LOG function into a separate calculation, it'll make your life a bit easier (as in the table above).


The bit I really wanted to mention is the calculation of Num_Index_Pages. And here's the rub: you have to raise Index_Rows_Per_Page to the power of whatever the number of Non_Leaf_Levels is, then add that to the next bit, which raises to the power of Non_Leaf_Levels - 1, and you continue adding these together until the substituted number of Non_Leaf_Levels = 1. Not a problem when you have only a few Non_Leaf_Levels……


However, one table I was dealing with had > 8000 Non_Leaf_Levels. Thinking 'There goes my weekend!!', I headed out for a bite to eat so I didn't have to think about it for a while.


And then a dirty hack came to me…


I'm not proud of it, but it got me my weekend back….


And here it is……


OK, for each time you do the Num_Leaf_Pages / (Index_Rows_Per_Page ^ whatever), you then round up the answer to the nearest whole number. You do. Believe me. It's in the article. The bit beginning "For each of the summands…". See ?


What this means is that you're actually only interested in any calculation here that results in a value > 1. Everything else will just be 1. All you have to do is work out (somewhere else) which power you raise to gives the first answer > 1, and then just add all the other calculations as a single number.


For example, for a table that has 4135 Non_Leaf_Levels, your calculation becomes (and I'm using a common spreadsheeter):


4133+ROUNDUP((N19/(N28^2)),0)+ROUNDUP((N19/(N28^1)),0)


…because the previous power (3) resulted in a number less than 1.


Sneaky !
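

If you'd rather sanity-check the hack outside a spreadsheet, the same summation in T-SQL looks roughly like this (variable names and example values are mine, not from the MSDN article):

DECLARE @Num_Leaf_Pages float = 250000,        -- example values only
        @Index_Rows_Per_Page float = 405,
        @Non_Leaf_Levels int = 5,
        @Level int = 1,
        @Num_Index_Pages bigint = 0;

WHILE @Level <= @Non_Leaf_Levels
BEGIN
        -- each summand gets rounded UP to a whole number of pages, hence the CEILING
        SET @Num_Index_Pages = @Num_Index_Pages + CEILING(@Num_Leaf_Pages / POWER(@Index_Rows_Per_Page, @Level));
        SET @Level = @Level + 1;
END

SELECT @Num_Index_Pages AS Num_Index_Pages;
GO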


And I've tested it out using an actual table with real data, and the estimation is almost spot-on. There's a whole other piece in estimating the size of a non-clustered index. If that gets as exciting as this, then I'll post that up as well.


Back soon !!!


Column-level encryption – A useful example


SQL Server Books Online is a good thing. Actually, let's face it, BOL is a GREAT thing that helps any DBA do their job.


My only complaint (and it's a minor one at that) is that sometimes you just need to know how to do something, not be given pointers in the right direction. OK, OK, so a little learning doesn't hurt anyone, but when you're in a rush etc….


I had reason to start looking at encrypted columns the other day. It's the sort of thing that you know exists, but you don't really have much cause to use it day-to-day – a bit like the WorkMate that's in the workshop, but you never get round to using properly.


So, like the good DBA that I am (!), I go hunting around in BOL for information on column encryption, and what I get (because I'm in a rush and haven't got time to trawl the message boards and forums) is details on how to create a column that holds encrypted data from an EXISTING column. But I haven't got an existing column I want to have an encrypted copy of; I want to create a whole new column that has encrypted data in it from the get-go.


Cue a merry half-hour tweaking the examples and producing the following statements, which I hope you find useful:


Firstly, you're going to need a Master Key (if you don't already have one)…


CREATE MASTER KEY ENCRYPTION BY
    PASSWORD = 'YourPasswordHere'
GO  


…and you'll need a certificate to go with that…


CREATE CERTIFICATE YourCert
   WITH SUBJECT = 'Certificate for testing column encryption';
GO


…and a symmetric key…

CREATE SYMMETRIC KEY Symmetric_Key_01
    WITH ALGORITHM = DES -- The example in BOL was AES-256, but that's not supported on XP
                         -- (yes, this was done on SQL 2008 R2 Developer Edition running on
                         -- Windows XP. Sorry about that. But, hey, it works !)
    ENCRYPTION BY CERTIFICATE YourCert;
GO


(NB: Key management and encryption algorithms are in themselves huge topics, so I'll not cover the reasons why you need these things or how you should secure them / back them up, if you don't mind)


OK, now you've got all the bits you're going to need, so let's open the symmetric key so we can actively use it…


OPEN SYMMETRIC KEY Symmetric_Key_01
   DECRYPTION BY CERTIFICATE YourCert;
GO


Still not quite fully set-up yet. You'll need a table in your test database to hold the data in its encrypted format. Please note that the datatype for encrypted data is VARBINARY, and you'll need to set the length according to the data that you'll be encrypting. VARBINARY (256) is more than enough for this example…


CREATE TABLE YourTestDB.dbo.CipherTable
        (cipherData VARBINARY (256))
GO


You're now ready to insert some encrypted data into the table. What's important here is the ENCRYPTBYKEY function. You could always use the GUID of the key explicitly here, but you may expose it to people you didn't intend to, so it's probably more secure to use the Key_GUID function (feel free to disagree with me about this)…


INSERT YourTestDB.dbo.CipherTable
        (cipherData)
        VALUES 
        (EncryptByKey(Key_GUID('Symmetric_Key_01'), 'myEncryptedTextString'))
GO


OK, you now have 'myEncryptedTextString' encrypted in your table. Sure ? Really sure ? If you don't believe me, try…


SELECT cipherData FROM YourTestDB.dbo.CipherTable
GO


This should give you back a whole load of binary. Enjoy.


Wait a moment, I hear you cry, are you really sure your data's there ? Yes, it is. To read the string unencrypted, try…


SELECT CONVERT(varchar, DecryptByKey(cipherData))
 FROM YourTestDB.dbo.CipherTable
GO


And there you have your un-encrypted string. Nice.


Also, notice that you don't have to supply the symmetric key name, the certificate name, or the master key to read the data back out. That's because the symmetric key is still open in this session; in a brand new session you'd need to OPEN it again first, otherwise DecryptByKey just hands you back NULL.
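

If you don't fancy re-opening the key every time, there's also DecryptByKeyAutoCert, which opens and closes it for you via the certificate. Something along these lines (the NULL is the certificate password, which we don't have because the cert is protected by the database master key):

SELECT CONVERT(varchar(100), DecryptByKeyAutoCert(Cert_ID('YourCert'), NULL, cipherData))
 FROM YourTestDB.dbo.CipherTable
GO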


Oh, and you might want to tidy up when you've finished the demo….


DROP TABLE YourTestDB.dbo.CipherTable
GO
CLOSE SYMMETRIC KEY Symmetric_Key_01
DROP SYMMETRIC KEY Symmetric_Key_01
GO
DROP CERTIFICATE YourCert
GO
DROP MASTER KEY
GO


Hope this helps you out a bit.


Back soon !


SQL Server Codename 'Denali' CTP1


OK, so I missed all the excitement at PASS the other week. Blame that on budgetary cuts (and the fact that I'm entitled to a vacation, dammit).

The standout piece from that was the official announcement that SQL Server v.Next (AKA 'Denali') went to CTP1. If you've kept up with some of the output on Twitter etc, I don't need to tell you how important that is.

What am I most excited about ? Short answer is "I don't know yet, but I'm sure as hell going to find out in the next few weeks".

What I am thankful for is an ISP that gives me unlimited downloads – it took me 4 tries to download over the home wireless, and eventually I gave in and hard-wired into the router (but that kind of defeats the object of having wireless in the first place).

The CTP for Denali can be downloaded at this link:

http://www.microsoft.com/downloads/en/details.aspx?FamilyID=6a04f16f-f6be-4f92-9c92-f7e5677d91f9&displaylang=en

Have fun ! I know I will.

Back soon !!!