There’s a lot of focus at the moment on the GDPR, the General Data Protection Regulation that comes into force in Europe on May 25th 2018 (for further info if this is news to you, see here).
SQL Server Management Studio (SSMS) now has an add-in to classify your data (TechNet blog), but I thought this merited further digging to see how it’s implemented under the covers, and where the classification data is stored.
So, it appears that the data classification is held in the database Extended Properties. Extended Properties were introduced way back in SQL Server 2000, but it wasn’t until the SQL 2008 version of SSMS was released that they could be viewed as part of a database.
Like any good DBA who’s hacked around in SSMS before, you’ve probably wondered what goes on under the covers, and the best way to do that is to run a Profiler Trace while you fiddle about in the SSMS UI 🙂
Let’s add some classifications to our database – using good ol’ AdventureWorks:
This opens a new tab in SSMS, with some recommendations as to what the likely classifications might be. Note: no data has been written to the database properties yet – that only happens when you save the classifications.
Populating this tab runs some hard-coded T-SQL that queries the system catalog views (sys.schemas, sys.tables, sys.columns) and builds the ‘best guesses’ from the column names and datatypes. Also, note that the ‘matching’ of likely columns to types is done in several languages (I’ve omitted some for clarity here):
DECLARE @Dictionary TABLE
(
pattern NVARCHAR(128),
info_type NVARCHAR(128),
sensitivity_label NVARCHAR(128),
can_be_numeric BIT
)
INSERT INTO @Dictionary (pattern, info_type, sensitivity_label, can_be_numeric)
VALUES
('%username%' ,'Credentials' , 'Confidential' ,1),
('%pwd%' ,'Credentials' , 'Confidential' ,1),
('%password%' ,'Credentials' , 'Confidential' ,1),
('%email%' ,'Contact Info' , 'Confidential - GDPR' ,0),
('%e-mail%' ,'Contact Info' , 'Confidential - GDPR' ,0),
('%last%name%' ,'Name' , 'Confidential - GDPR' ,0),
('%first%name%' ,'Name' , 'Confidential - GDPR' ,0),
('%surname%' ,'Name' , 'Confidential - GDPR' ,0),
('%mainden%name%' ,'Name' , 'Confidential - GDPR' ,0),
('%addr%' ,'Contact Info' , 'Confidential - GDPR' ,0),
('%número%de%cartao%' ,'Credit Card' , 'Confidential' ,1),
('%numero%de%cartão%' ,'Credit Card' , 'Confidential' ,1),
('%numero%de%tarjeta%' ,'Credit Card' , 'Confidential' ,1),
('%numero%della%carta%' ,'Credit Card' , 'Confidential' ,1),
('%numero%di%carta%' ,'Credit Card' , 'Confidential' ,1),
('%numero%di%scheda%' ,'Credit Card' , 'Confidential' ,1),
('%numero%do%cartao%' ,'Credit Card' , 'Confidential' ,1),
('%numero%do%cartão%' ,'Credit Card' , 'Confidential' ,1),
('%numéro%de%carte%' ,'Credit Card' , 'Confidential' ,1),
('%nº%carta%' ,'Credit Card' , 'Confidential' ,1),
('%veiligheidsaantal%' ,'Credit Card' , 'Confidential' ,1),
('%veiligheidscode%' ,'Credit Card' , 'Confidential' ,1),
('%veiligheidsnummer%' ,'Credit Card' , 'Confidential' ,1),
('%verfalldatum%' ,'Credit Card' , 'Confidential' ,1),
('%ablauf%' ,'Credit Card' , 'Confidential' ,1),
('%data%de%expiracao%' ,'Credit Card' , 'Confidential' ,1),
('%data%de%expiração%' ,'Credit Card' , 'Confidential' ,1),
('%data%del%exp%' ,'Credit Card' , 'Confidential' ,1),
('%data%di%exp%' ,'Credit Card' , 'Confidential' ,1),
('%data%di%scadenza%' ,'Credit Card' , 'Confidential' ,1),
('%Ausweis%' ,'National ID' , 'Confidential - GDPR' ,1),
('%Identifikation%' ,'National ID' , 'Confidential - GDPR' ,1),
('%patente%di%guida%' ,'National ID' , 'Confidential - GDPR' ,1)
DECLARE @InfoTypeRanking TABLE
(
info_type NVARCHAR(128),
ranking INT
)
INSERT INTO @InfoTypeRanking (info_type, ranking)
VALUES
('Banking', 800),
('Contact Info', 200),
('Credentials', 300),
('Credit Card', 700),
('Date Of Birth', 1100),
('Financial', 900),
('Health', 1000),
('Name', 400),
('National ID', 500),
('Networking', 100),
('SSN', 600),
('Other', 1200)
DECLARE @ClassifcationResults TABLE
(
schema_name NVARCHAR(128),
table_name NVARCHAR(128),
column_name NVARCHAR(128),
info_type NVARCHAR(128),
sensitivity_label NVARCHAR(128),
ranking INT,
can_be_numeric BIT
)
INSERT INTO @ClassifcationResults
SELECT DISTINCT S.NAME AS schema_name,
T.NAME AS table_name,
C.NAME AS column_name,
D.info_type,
D.sensitivity_label,
R.ranking,
D.can_be_numeric
FROM sys.schemas S
INNER JOIN sys.tables T
ON S.schema_id = T.schema_id
INNER JOIN sys.columns C
ON T.object_id = C.object_id
INNER JOIN sys.types TP
ON C.system_type_id = TP.system_type_id
LEFT OUTER JOIN @Dictionary D
ON (D.pattern NOT LIKE '%[%]%' AND LOWER(C.name) = LOWER(D.pattern) COLLATE DATABASE_DEFAULT) OR
(D.pattern LIKE '%[%]%' AND LOWER(C.name) LIKE LOWER(D.pattern) COLLATE DATABASE_DEFAULT)
LEFT OUTER JOIN @infoTypeRanking R
ON (R.info_type = D.info_type)
WHERE (D.info_type IS NOT NULL ) AND
NOT (D.can_be_numeric = 0 AND TP.name IN ('bigint','bit','decimal','float','int','money','numeric','smallint','smallmoney','tinyint'))
SELECT DISTINCT
CR.schema_name AS schema_name,
CR.table_name AS table_name,
CR.column_name AS column_name,
CR.info_type AS information_type_name,
CR.sensitivity_label AS sensitivity_label_name
FROM @ClassifcationResults CR
INNER JOIN
(
SELECT
schema_name,
table_name,
column_name,
MIN(ranking) AS min_ranking
FROM
@ClassifcationResults
GROUP BY
schema_name,
table_name,
column_name
) MR
ON CR.schema_name = MR.schema_name
AND CR.table_name = MR.table_name
AND CR.column_name = MR.column_name
AND CR.Ranking = MR.min_ranking
ORDER BY schema_name, table_name, column_name
BTW, if anyone from Microsoft is reading this, you might want to correct the typo for “%mainden%name%”…
OK, so now we have our SSMS tab full of classification suggestion goodness. Let’s just accept the recommendations (and why wouldn’t you?), and save the definitions back to SQL Server:
In the Profiler trace that you should be running (!), you’ll see what SQL Server does when you save the classifications. It calls ‘sp_addextendedproperty’ with parameters for each of the items that need to be saved. It actually adds four properties for each classified column (type, type_id, label, and label_id), but you get the idea…
exec sp_addextendedproperty @name=N'sys_information_type_name',@level0type=N'schema',@level0name=N'Sales',@level1type=N'table',@level1name=N'SalesTaxRate',@level2type=N'column',@level2name=N'TaxRate',@value=N'Financial'
go
exec sp_addextendedproperty @name=N'sys_information_type_id',@level0type=N'schema',@level0name=N'Sales',@level1type=N'table',@level1name=N'SalesTaxRate',@level2type=N'column',@level2name=N'TaxRate',@value=N'C44193E1-0E58-4B2A-9001-F7D6E7BC1373'
go
exec sp_addextendedproperty @name=N'sys_sensitivity_label_name',@level0type=N'schema',@level0name=N'Sales',@level1type=N'table',@level1name=N'SalesTaxRate',@level2type=N'column',@level2name=N'TaxRate',@value=N'Confidential'
go
exec sp_addextendedproperty @name=N'sys_sensitivity_label_id',@level0type=N'schema',@level0name=N'Sales',@level1type=N'table',@level1name=N'SalesTaxRate',@level2type=N'column',@level2name=N'TaxRate',@value=N'331F0B13-76B5-2F1B-A77B-DEF5A73C73C2'
go
This was just to classify one column, TaxRate, as type ‘Financial’ with label ‘Confidential’.
This information is saved as a database-scoped extended property, so the classifications are completely portable on backup/restore if you were to move the database for some reason. #ftw
Sadly, these are *not* visible in the normal database extended properties UI, but you could query them directly if you wanted to extract them programmatically, using T-SQL based on what we gleaned from the trace:
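As a sketch (not the exact statement SSMS itself uses, just one built from the property names we saw in the trace, ‘sys_information_type_name’ and friends), something like this pulls the classifications back out of sys.extended_properties:

```sql
-- Sketch: read the classification metadata back out of the extended
-- properties written by SSMS. Property names are those seen in the
-- Profiler trace above; class = 1 means object/column-level properties.
SELECT SCHEMA_NAME(o.schema_id)          AS schema_name,
       o.name                            AS table_name,
       c.name                            AS column_name,
       ep.name                           AS property_name,
       CAST(ep.value AS NVARCHAR(256))   AS property_value
FROM   sys.extended_properties ep
       INNER JOIN sys.objects o ON ep.major_id = o.object_id
       INNER JOIN sys.columns c ON ep.major_id = c.object_id
                               AND ep.minor_id = c.column_id
WHERE  ep.class = 1
  AND  ep.name IN (N'sys_information_type_name', N'sys_sensitivity_label_name')
ORDER BY schema_name, table_name, column_name, property_name;
```

Run against AdventureWorks after saving the classifications above, you’d expect one row per property per classified column (so two rows for Sales.SalesTaxRate.TaxRate with the names filtered here).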
I hope that’s shown you where the classification data is held. It’s a simple (!) job to add the server name and database name to the query, so you could use it to report on all your classifications across all your databases and servers, but I’ll leave that fun to you!
Back soon !!
When I was a slip of a lad, my dad had some recordings (legitimate, I assure you) by a comedian/folk singer/now Radio 2 presenter called Mike Harding.
Whilst the subject matter was occasionally ribald for a 10-year-old, some of the songs really struck a chord, and I remember them still.
One such song was called “Some things that the grown-ups just don’t tell you”, in which a child’s inquiring mind asks such searching questions as “If you eat food all different colours, why does it all come out brown?” and “If you undo your belly-button, does your bum really fall off?”
With that in mind (and very much earwormed myself with that one), this leads me on to continually discovering things that should be easy to answer, but turn out to give you the bum’s rush when searching the internet.
The question I asked myself was:
“When using SQL 2014 In Memory Session State, which elements of the application stack require .Net 4 or above?”
Quite aside from the arguments against storing session state in the database (the least performant place for it) in the first place, my google-fu deserted me for quite some time – definitive answers are hard to come by these days; it’s all ‘alternative facts’.
As a DBA, I understand that at the database layer a recent version of .NET is required to handle the creation of the natively compiled stored procedures that the ASPStateInMemory database schema requires, but does that necessarily apply further up the application stack?
Logic would dictate not (in my view) – it’s simply a remote procedure call from the application or web service, and should therefore be agnostic as to the underlying technology. Shouldn’t it?
In a perfect world, yes, it very much should. But as we all know from recent history, the world is very much far from perfect.
ASPStateInMemory requires a new .NET session state provider. That provider is built for .NET 4, and is compiled against that version’s CLR.
You can’t point an existing session state connection at the In-Memory version without it.
So, if your applications or services are compiled against .NET 3.5, they’ll be using the wrong version of the session state provider – one written for CLR 2.0.
THIS MEANS:
If you’ve not been diligent in keeping pace with the .NET Framework versions in your development, or have existing legacy applications that would benefit from moving to In-Memory session state, you can’t use ASPStateInMemory without re-writing them for the later version of the Framework.
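To make the distinction concrete, the switch happens in web.config, where the application names the provider type it must be able to load. This is a sketch only – the provider and type names below are from my recollection of Microsoft’s In-Memory session state provider package, and the connection string is an illustrative placeholder:

```xml
<!-- Sketch only: provider/type names and connection string are illustrative -->
<system.web>
  <sessionState mode="Custom" customProvider="SqlInMemoryProvider">
    <providers>
      <!-- The provider assembly targets .NET 4, so an app compiled
           against .NET 3.5 (CLR 2.0) cannot load it - hence the whole
           stack needs re-targeting, not just the database -->
      <add name="SqlInMemoryProvider"
           type="Microsoft.Web.SessionState.SqlInMemoryProvider"
           connectionString="data source=YOURSERVER;initial catalog=ASPStateInMemory;integrated security=true" />
    </providers>
  </sessionState>
</system.web>
```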
It took me a while to find and then understand, but it appears the answer is:
“All of your application stack.”
…and *that* might be expensive…
Oh dear. Back to the drawing board then. I think I left some indexes over there… I’ll just see how fragmented they are….
Back soon !
It’s not just the data that’s big
Big Data! How cool is that! Imagine all the things that your big data can tell you!
All you have to do is implement it, right? How hard can that be??
Answer: It’s VERY hard indeed for such a simple concept.
But why is it so hard ?
Largely, it’s because of the choices you’ll have to make, the platforms you deliver on, and the suppliers you’ll have to deal with.
Sure, there are lots of big data solutions, just like there are lots of big data consultancies, but ALL of them come at a cost, no matter how ‘free’ they are.
Just by way of further reading, I recommend this:
http://dataconomy.com/understanding-big-data-ecosystem/
It’s a short article, detailing the general areas you’ll need to be considering, such as Infrastructure, Analytics and Applications, and is well worth a look.
The main thing you might realise while reading it is the sheer scale of the big data business. A few short years ago, you had your RDBMS of choice (SQL Server, DB2, Oracle,…), some flat files out of your legacy system, and some spreadsheets that Dave in Accounts was using to track how many toner cartridges he needed to order.
Now you’ve got everything. Literally everything. And some other stuff besides that you didn’t even know existed.
Picking the right platform from that lot seems like a Herculean task. Indeed, maybe there is no ‘right’ platform? And if not, how many platforms are too many? Then, it’s not just about the platform your data is currently stored in; it’s also about integrating those data sources to really deliver valuable insight to your business in a simple and flexible way.
Perhaps the question we need to be asking is ‘What does our data look like, and what does the business need it to look like?’.
From that question, you can start to see where you want to be. Getting there is still going to be tough, but at least you’ll have an idea of what success might be.
And perhaps, if you can have some fun along the way, that wouldn’t be a bad thing either 🙂
Back soon…
SQL Trace Flag 3004 and BACKUP database
Quick one today – someone asked me if there’s a starting equivalent of this message:
04/23/2014 11:46:53,Backup,Unknown,Database backed up. Database: foo creation date(time): 2013/11/07(12:19:59) pages dumped: 363 first LSN: 44:888:37 last LSN: 44:928:1 number of dump devices: 1 device information: (FILE=1 TYPE=DISK: {'C:\Temp\foo.bak'}). This is an informational message only. No user action is required.
You know, so we can track when a database backup starts, and what it’s doing while backing up… and stuff…
There’s plenty out there on trace flag 3004 for RESTORE operations, so you can identify which areas of a restore are taking a while (zeroing out and creating devices, etc.), but not much on the BACKUP side of things. All I could see was that the trace flag would output detail to the error log if invoked thusly:
DBCC TRACEON (3004,3605,-1)
Take a look at http://blogs.msdn.com/b/psssql/archive/2008/01/23/how-it-works-what-is-restore-backup-doing.aspx for further reading on that. And there are plenty of others as well, like http://jamessql.blogspot.co.uk/2013/07/trace-flag-for-backup-and-restore.html
I was a bit stumped for the BACKUP bit, so I asked the #sqlhelp tag on Twitter if there was another flag for backup. Thanks to @SQLKiwi, it turns out it’s the same one.
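So a minimal end-to-end sketch looks like this (using the same foo database and backup path as the messages in this post; note the flags are enabled globally here, so remember to turn them off afterwards):

```sql
-- Sketch: 3004 emits the detailed backup/restore steps, 3605 routes
-- trace output to the error log, and -1 applies the flags globally.
DBCC TRACEON (3004, 3605, -1);

-- Run a backup; each internal step is now written to the error log.
BACKUP DATABASE foo
TO DISK = N'C:\Temp\foo.bak'
WITH INIT;

-- Turn the flags off again so the error log doesn't fill up.
DBCC TRACEOFF (3004, 3605, -1);
```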
So, with that trace flag turned on as above, the output you get in the SQL error log is far more detailed (I’ve re-ordered the list so it reads in correct event order):
Date Source Severity Message
04/23/2014 11:47:51 spid52 Unknown BackupDatabase: Database foo
04/23/2014 11:47:51 spid52 Unknown Backup: Media open
04/23/2014 11:47:51 spid52 Unknown Backup: Media ready to backup
04/23/2014 11:47:51 spid52 Unknown Backup: Clearing differential bitmaps
04/23/2014 11:47:51 spid52 Unknown Backup: Bitmaps cleared
04/23/2014 11:47:51 spid52 Unknown BackupDatabase: Checkpoint done
04/23/2014 11:47:51 spid52 Unknown Backup: Scanning allocation bitmaps
04/23/2014 11:47:51 spid52 Unknown Backup: Done with allocation bitmaps
04/23/2014 11:47:51 spid52 Unknown BackupDatabase: Work estimates done
04/23/2014 11:47:51 spid52 Unknown Backup: Leading metadata section done
04/23/2014 11:47:51 spid52 Unknown Backup:Copying data
04/23/2014 11:47:51 spid52 Unknown Backup: DBReaderCount = 1
04/23/2014 11:47:51 spid52 Unknown Started file C:\Program Files\Microsoft SQL Server\MSSQL11.TEST1\MSSQL\DATA\foo.mdf
04/23/2014 11:47:51 spid52 Unknown Completed file C:\Program Files\Microsoft SQL Server\MSSQL11.TEST1\MSSQL\DATA\foo.mdf
04/23/2014 11:47:51 spid52 Unknown BackupDatabase: Database files done
04/23/2014 11:47:51 spid52 Unknown BackupDatabase: Log files done
04/23/2014 11:47:51 spid52 Unknown Backup: Trailing config done
04/23/2014 11:47:51 spid52 Unknown Backup: MBC done
04/23/2014 11:47:51 spid52 Unknown BackupDatabase: Writing history records
04/23/2014 11:47:51 Backup Unknown Database backed up. Database: foo creation date(time): 2013/11/07(12:19:59) pages dumped: 362 first LSN: 44:1040:37 last LSN: 44:1072:1 number of dump devices: 1 device information: (FILE=1 TYPE=DISK: {'C:\Temp\foo.bak'}). This is an informational message only. No user action is required.
04/23/2014 11:47:51 spid52 Unknown Writing backup history records
04/23/2014 11:47:51 spid52 Unknown BackupDatabase: Finished
With this detail, it’s relatively simple to alert when a BACKUP event starts (or on any part of that backup event), and you could also use it to identify which parts of your database backup are taking a long time.
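For example, a quick way to pull just the backup-related lines back out of the current error log – hedged heavily, because xp_readerrorlog is undocumented and its parameters vary between versions:

```sql
-- Sketch: xp_readerrorlog is undocumented. On recent versions the first
-- three parameters are: log file number (0 = current), log type
-- (1 = SQL Server error log), and a search string to filter on.
EXEC master.dbo.xp_readerrorlog 0, 1, N'Backup';
```

From there it’s a small step to schedule that in an Agent job and raise an alert, or diff the timestamps between steps to see where the time goes.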
Just thought I should write something up for BACKUP as well as RESTORE…
Back soon….