There’s a lot of focus at the moment on GDPR, the General Data Protection Regulations that come into force in Europe on May 25th 2018 (for further info if this is news to you, see here)
SQL Server Management Studio (SSMS) now has an add-in to classify your data (TechNet blog), but I thought this merited further digging to see how it’s implemented under the covers, and where the data for the classification is stored.
So, it appears that the data classification is held in the database Extended Properties. Extended Properties were introduced way back in SQL 2005, I think, but it wasn’t until the SQL2008 version of SSMS was released that they could be viewed as part of a database.
Like any good DBA who’s hacked around in SSMS before, you’ve probably wondered what goes on under the covers, and the best way to do that is to run a Profiler Trace while you fiddle about in the SSMS UI 🙂
Let’s add some classifications to our database – using good ol’ AdventureWorks:
This opens a new tab in SSMS, with some recommendations as to what the likely classifications might be. Note: no data has been written to the database properties yet – that only happens when you save the classifications.
Creating this table runs some hard-coded T/SQL to get stuff from INFORMATION_SCHEMA.COLUMNS and build the ‘best guesses’ from the column names and datatypes. Also, note that the ‘matching’ of likely columns to types is done in several languages (I’ve omitted some for clarity here):
DECLARE @Dictionary TABLE
(
pattern NVARCHAR(128),
info_type NVARCHAR(128),
sensitivity_label NVARCHAR(128),
can_be_numeric BIT
)
INSERT INTO @Dictionary (pattern, info_type, sensitivity_label, can_be_numeric)
VALUES
(\’%username%\’ ,\’Credentials\’ , \’Confidential\’ ,1),
(\’%pwd%\’ ,\’Credentials\’ , \’Confidential\’ ,1),
(\’%password%\’ ,\’Credentials\’ , \’Confidential\’ ,1),
(\’%email%\’ ,\’Contact Info\’ , \’Confidential – GDPR\’ ,0),
(\’%e-mail%\’ ,\’Contact Info\’ , \’Confidential – GDPR\’ ,0),
(\’%last%name%\’ ,\’Name\’ , \’Confidential – GDPR\’ ,0),
(\’%first%name%\’ ,\’Name\’ , \’Confidential – GDPR\’ ,0),
(\’%surname%\’ ,\’Name\’ , \’Confidential – GDPR\’ ,0),
(\’%mainden%name%\’ ,\’Name\’ , \’Confidential – GDPR\’ ,0),
(\’%addr%\’ ,\’Contact Info\’ , \’Confidential – GDPR\’ ,0),
(\’%número%de%cartao%\’ ,\’Credit Card\’ , \’Confidential\’ ,1),
(\’%numero%de%cartão%\’ ,\’Credit Card\’ , \’Confidential\’ ,1),
(\’%numero%de%tarjeta%\’ ,\’Credit Card\’ , \’Confidential\’ ,1),
(\’%numero%della%carta%\’ ,\’Credit Card\’ , \’Confidential\’ ,1),
(\’%numero%di%carta%\’ ,\’Credit Card\’ , \’Confidential\’ ,1),
(\’%numero%di%scheda%\’ ,\’Credit Card\’ , \’Confidential\’ ,1),
(\’%numero%do%cartao%\’ ,\’Credit Card\’ , \’Confidential\’ ,1),
(\’%numero%do%cartão%\’ ,\’Credit Card\’ , \’Confidential\’ ,1),
(\’%numéro%de%carte%\’ ,\’Credit Card\’ , \’Confidential\’ ,1),
(\’%nº%carta%\’ ,\’Credit Card\’ , \’Confidential\’ ,1),
(\’%veiligheidsaantal%\’ ,\’Credit Card\’ , \’Confidential\’ ,1),
(\’%veiligheidscode%\’ ,\’Credit Card\’ , \’Confidential\’ ,1),
(\’%veiligheidsnummer%\’ ,\’Credit Card\’ , \’Confidential\’ ,1),
(\’%verfalldatum%\’ ,\’Credit Card\’ , \’Confidential\’ ,1),
(\’%ablauf%\’ ,\’Credit Card\’ , \’Confidential\’ ,1),
(\’%data%de%expiracao%\’ ,\’Credit Card\’ , \’Confidential\’ ,1),
(\’%data%de%expiração%\’ ,\’Credit Card\’ , \’Confidential\’ ,1),
(\’%data%del%exp%\’ ,\’Credit Card\’ , \’Confidential\’ ,1),
(\’%data%di%exp%\’ ,\’Credit Card\’ , \’Confidential\’ ,1),
(\’%data%di%scadenza%\’ ,\’Credit Card\’ , \’Confidential\’ ,1),
(\’%Ausweis%\’ ,\’National ID\’ , \’Confidential – GDPR\’ ,1),
(\’%Identifikation%\’ ,\’National ID\’ , \’Confidential – GDPR\’ ,1),
(\’%patente%di%guida%\’ ,\’National ID\’ , \’Confidential – GDPR\’ ,1)
DECLARE @InfoTypeRanking TABLE
(
info_type NVARCHAR(128),
ranking INT
)
INSERT INTO @InfoTypeRanking (info_type, ranking)
VALUES
(\’Banking\’, 800),
(\’Contact Info\’, 200),
(\’Credentials\’, 300),
(\’Credit Card\’, 700),
(\’Date Of Birth\’, 1100),
(\’Financial\’, 900),
(\’Health\’, 1000),
(\’Name\’, 400),
(\’National ID\’, 500),
(\’Networking\’, 100),
(\’SSN\’, 600),
(\’Other\’, 1200)
DECLARE @ClassifcationResults TABLE
(
schema_name NVARCHAR(128),
table_name NVARCHAR(128),
column_name NVARCHAR(128),
info_type NVARCHAR(128),
sensitivity_label NVARCHAR(128),
ranking INT,
can_be_numeric BIT
)
INSERT INTO @ClassifcationResults
SELECT DISTINCT S.NAME AS schema_name,
T.NAME AS table_name,
C.NAME AS column_name,
D.info_type,
D.sensitivity_label,
R.ranking,
D.can_be_numeric
FROM sys.schemas S
INNER JOIN sys.tables T
ON S.schema_id = T.schema_id
INNER JOIN sys.columns C
ON T.object_id = C.object_id
INNER JOIN sys.types TP
ON C.system_type_id = TP.system_type_id
LEFT OUTER JOIN @Dictionary D
ON (D.pattern NOT LIKE \’%[%]%\’ AND LOWER(C.name) = LOWER(D.pattern) COLLATE DATABASE_DEFAULT) OR
(D.pattern LIKE \’%[%]%\’ AND LOWER(C.name) LIKE LOWER(D.pattern) COLLATE DATABASE_DEFAULT)
LEFT OUTER JOIN @infoTypeRanking R
ON (R.info_type = D.info_type)
WHERE (D.info_type IS NOT NULL ) AND
NOT (D.can_be_numeric = 0 AND TP.name IN (\’bigint\’,\’bit\’,\’decimal\’,\’float\’,\’int\’,\’money\’,\’numeric\’,\’smallint\’,\’smallmoney\’,\’tinyint\’))
SELECT DISTINCT
CR.schema_name AS schema_name,
CR.table_name AS table_name,
CR.column_name AS column_name,
CR.info_type AS information_type_name,
CR.sensitivity_label AS sensitivity_label_name
FROM @ClassifcationResults CR
INNER JOIN
(
SELECT
schema_name,
table_name,
column_name,
MIN(ranking) AS min_ranking
FROM
@ClassifcationResults
GROUP BY
schema_name,
table_name,
column_name
) MR
ON CR.schema_name = MR.schema_name
AND CR.table_name = MR.table_name
AND CR.column_name = MR.column_name
AND CR.Ranking = MR.min_ranking
ORDER BY schema_name, table_name, column_name
BTW, if anyone from Microsoft is reading this, you might want to correct the typo for “%mainden%name%”…
OK, so now we have our SSMS tab full of classification suggestion goodness. Let’s just accept the recommendations (and why wouldn’t you), and save the definitions back to SQL Server:
In the Profiler trace that you should be running (!), you’ll see what SQLServer does when you save the classifications. It calls ‘sp_addextendedproperty’ with parameters for each of the items that are required to be saved. It actually adds 4 properties for each one (type, type_id, label, and label_id), but you get the idea…
exec sp_addextendedproperty @name=N\’sys_information_type_name\’,@level0type=N\’schema\’,@level0name=N\’Sales\’,@level1type=N\’table\’,@level1name=N\’SalesTaxRate\’,@level2type=N\’column\’,@level2name=N\’TaxRate\’,@value=N\’Financial\’
go
exec sp_addextendedproperty @name=N\’sys_information_type_id\’,@level0type=N\’schema\’,@level0name=N\’Sales\’,@level1type=N\’table\’,@level1name=N\’SalesTaxRate\’,@level2type=N\’column\’,@level2name=N\’TaxRate\’,@value=N\’C44193E1-0E58-4B2A-9001-F7D6E7BC1373\’
go
exec sp_addextendedproperty @name=N\’sys_sensitivity_label_name\’,@level0type=N\’schema\’,@level0name=N\’Sales\’,@level1type=N\’table\’,@level1name=N\’SalesTaxRate\’,@level2type=N\’column\’,@level2name=N\’TaxRate\’,@value=N\’Confidential\’
go
exec sp_addextendedproperty @name=N\’sys_sensitivity_label_id\’,@level0type=N\’schema\’,@level0name=N\’Sales\’,@level1type=N\’table\’,@level1name=N\’SalesTaxRate\’,@level2type=N\’column\’,@level2name=N\’TaxRate\’,@value=N\’331F0B13-76B5-2F1B-A77B-DEF5A73C73C2\’
go
This was just to classify one column, Tax Rate, as type ‘Financial’ and label ‘Confidential’.
This information is saved as a database-scoped extended property, so the classifications are completely portable on backup/restore if you were to move the database for some reason. #ftw
Sadly, these are *not* available in the normal database extended properties UI, but you could query them directly if you wanted to extract them programmatically using the static T/SQL gleaned from the trace:
I hope that’s shown you where the classification data is held. It’s a simple (!) job to add the server name and database name to the query so you could use it to provide information on all your classifications across all your databases and servers, but I’ll leave that fun to you !
Back soon !!