SSMS 17.5 and GDPR data classification – where is the classification data stored ?

There’s a lot of focus at the moment on GDPR, the General Data Protection Regulations that come into force in Europe on May 25th 2018 (for further info if this is news to you, see here)
 
SQL Server Management Studio (SSMS) now has an add-in to classify your data (TechNet blog), but I thought this merited further digging to see how it’s implemented under the covers, and where the data for the classification is stored.
 
So, it appears that the data classification is held in the database Extended Properties. Extended Properties were introduced way back in SQL 2005, I think, but it wasn’t until the SQL2008 version of SSMS was released that they could be viewed as part of a database.
 
Like any good DBA who’s hacked around in SSMS before, you’ve probably wondered what goes on under the covers, and the best way to do that is to run a Profiler Trace while you fiddle about in the SSMS UI 🙂
 
Let’s add some classifications to our database – using good ol’ AdventureWorks:
 
 
 
This opens a new tab in SSMS, with some recommendations as to what the likely classifications might be. Note: no data has been written to the database properties yet – that only happens when you save the classifications.
 
Creating this table runs some hard-coded T/SQL to get stuff from INFORMATION_SCHEMA.COLUMNS and build the ‘best guesses’ from the column names and datatypes. Also, note that the ‘matching’ of likely columns to types is done in several languages (I’ve omitted some for clarity here):
 
DECLARE @Dictionary TABLE
            (
                pattern                 NVARCHAR(128),
                info_type               NVARCHAR(128),
                sensitivity_label       NVARCHAR(128),
                can_be_numeric          BIT
            )
            INSERT INTO @Dictionary (pattern, info_type, sensitivity_label, can_be_numeric)
            VALUES
                (\’%username%\’                               ,\’Credentials\’               , \’Confidential\’              ,1),
                (\’%pwd%\’                                    ,\’Credentials\’               , \’Confidential\’              ,1),
                (\’%password%\’                               ,\’Credentials\’               , \’Confidential\’              ,1),
                (\’%email%\’                                  ,\’Contact Info\’              , \’Confidential – GDPR\’       ,0),
                (\’%e-mail%\’                                 ,\’Contact Info\’              , \’Confidential – GDPR\’       ,0),
                (\’%last%name%\’                              ,\’Name\’                      , \’Confidential – GDPR\’       ,0),
                (\’%first%name%\’                             ,\’Name\’                      , \’Confidential – GDPR\’       ,0),
                (\’%surname%\’                                ,\’Name\’                      , \’Confidential – GDPR\’       ,0),
                (\’%mainden%name%\’                           ,\’Name\’                      , \’Confidential – GDPR\’       ,0),
                (\’%addr%\’                                   ,\’Contact Info\’              , \’Confidential – GDPR\’       ,0),
                (\’%número%de%cartao%\’                       ,\’Credit Card\’               , \’Confidential\’              ,1),
                (\’%numero%de%cartão%\’                       ,\’Credit Card\’               , \’Confidential\’              ,1),
                (\’%numero%de%tarjeta%\’                      ,\’Credit Card\’               , \’Confidential\’              ,1),
                (\’%numero%della%carta%\’                     ,\’Credit Card\’               , \’Confidential\’              ,1),
                (\’%numero%di%carta%\’                        ,\’Credit Card\’               , \’Confidential\’              ,1),
                (\’%numero%di%scheda%\’                       ,\’Credit Card\’               , \’Confidential\’              ,1),
                (\’%numero%do%cartao%\’                       ,\’Credit Card\’               , \’Confidential\’              ,1),
                (\’%numero%do%cartão%\’                       ,\’Credit Card\’               , \’Confidential\’              ,1),
                (\’%numéro%de%carte%\’                        ,\’Credit Card\’               , \’Confidential\’              ,1),
                (\’%nº%carta%\’                               ,\’Credit Card\’               , \’Confidential\’              ,1),
                (\’%veiligheidsaantal%\’                      ,\’Credit Card\’               , \’Confidential\’              ,1),
                (\’%veiligheidscode%\’                        ,\’Credit Card\’               , \’Confidential\’              ,1),
                (\’%veiligheidsnummer%\’                      ,\’Credit Card\’               , \’Confidential\’              ,1),
                (\’%verfalldatum%\’                           ,\’Credit Card\’               , \’Confidential\’              ,1),
                (\’%ablauf%\’                                 ,\’Credit Card\’               , \’Confidential\’              ,1),
                (\’%data%de%expiracao%\’                      ,\’Credit Card\’               , \’Confidential\’              ,1),
                (\’%data%de%expiração%\’                      ,\’Credit Card\’               , \’Confidential\’              ,1),
                (\’%data%del%exp%\’                           ,\’Credit Card\’               , \’Confidential\’              ,1),
                (\’%data%di%exp%\’                            ,\’Credit Card\’               , \’Confidential\’              ,1),
                (\’%data%di%scadenza%\’                       ,\’Credit Card\’               , \’Confidential\’              ,1),
                (\’%Ausweis%\’                                ,\’National ID\’               , \’Confidential – GDPR\’       ,1),
                (\’%Identifikation%\’                         ,\’National ID\’               , \’Confidential – GDPR\’       ,1),
                (\’%patente%di%guida%\’                       ,\’National ID\’               , \’Confidential – GDPR\’       ,1)
 
            DECLARE @InfoTypeRanking TABLE
            (
                info_type           NVARCHAR(128),
                ranking             INT
            )
            INSERT INTO @InfoTypeRanking (info_type, ranking)
            VALUES
                (\’Banking\’,                  800),
                (\’Contact Info\’,             200),
                (\’Credentials\’,              300),
                (\’Credit Card\’,              700),
                (\’Date Of Birth\’,            1100),
                (\’Financial\’,                900),
                (\’Health\’,                   1000),
                (\’Name\’,                     400),
                (\’National ID\’,              500),
                (\’Networking\’,               100),
                (\’SSN\’,                      600),
                (\’Other\’,                    1200)
 
            DECLARE @ClassifcationResults TABLE
            (
                schema_name         NVARCHAR(128),
                table_name          NVARCHAR(128),
                column_name         NVARCHAR(128),
                info_type           NVARCHAR(128),
                sensitivity_label   NVARCHAR(128),
                ranking             INT,
                can_be_numeric      BIT
            )
            INSERT INTO @ClassifcationResults
            SELECT      DISTINCT    S.NAME                 AS schema_name,
                                    T.NAME                 AS table_name,
                                    C.NAME                 AS column_name,
                                    D.info_type,
                                    D.sensitivity_label,
                                    R.ranking,
                                    D.can_be_numeric
            FROM        sys.schemas S
            INNER JOIN  sys.tables T
            ON          S.schema_id = T.schema_id
            INNER JOIN  sys.columns C
            ON          T.object_id = C.object_id
            INNER JOIN  sys.types   TP
            ON          C.system_type_id = TP.system_type_id
            LEFT OUTER JOIN @Dictionary D
            ON          (D.pattern NOT LIKE \’%[%]%\’ AND LOWER(C.name) = LOWER(D.pattern) COLLATE DATABASE_DEFAULT) OR
                        (D.pattern LIKE \’%[%]%\’ AND LOWER(C.name) LIKE LOWER(D.pattern) COLLATE DATABASE_DEFAULT)
            LEFT OUTER JOIN @infoTypeRanking R
            ON          (R.info_type = D.info_type)
            WHERE       (D.info_type IS NOT NULL ) AND
                        NOT (D.can_be_numeric = 0 AND TP.name IN (\’bigint\’,\’bit\’,\’decimal\’,\’float\’,\’int\’,\’money\’,\’numeric\’,\’smallint\’,\’smallmoney\’,\’tinyint\’))
 
            SELECT DISTINCT
               CR.schema_name AS schema_name,
               CR.table_name AS table_name,
               CR.column_name AS column_name,
               CR.info_type AS information_type_name,
               CR.sensitivity_label AS sensitivity_label_name
            FROM @ClassifcationResults CR
               INNER JOIN
                  (
                     SELECT
                        schema_name,
                        table_name,
                        column_name,
                    MIN(ranking) AS min_ranking
                     FROM
                        @ClassifcationResults
                     GROUP BY
                        schema_name,
                        table_name,
                        column_name
                  ) MR
                  ON CR.schema_name = MR.schema_name
                    AND CR.table_name = MR.table_name
                    AND CR.column_name = MR.column_name
                    AND CR.Ranking = MR.min_ranking
                  ORDER BY schema_name, table_name, column_name
 
 
BTW, if anyone from Microsoft is reading this, you might want to correct the typo for “%mainden%name%”…
 
OK, so now we have our SSMS tab full of classification suggestion goodness. Let’s just accept the recommendations (and why wouldn’t you), and save the definitions back to SQL Server:
 
 
In the Profiler trace that you should be running (!), you’ll see what SQLServer does when you save the classifications. It calls ‘sp_addextendedproperty’ with parameters for each of the items that are required to be saved. It actually adds 4 properties for each one (type, type_id, label, and label_id), but you get the idea…
 
exec sp_addextendedproperty @name=N\’sys_information_type_name\’,@level0type=N\’schema\’,@level0name=N\’Sales\’,@level1type=N\’table\’,@level1name=N\’SalesTaxRate\’,@level2type=N\’column\’,@level2name=N\’TaxRate\’,@value=N\’Financial\’
go
exec sp_addextendedproperty @name=N\’sys_information_type_id\’,@level0type=N\’schema\’,@level0name=N\’Sales\’,@level1type=N\’table\’,@level1name=N\’SalesTaxRate\’,@level2type=N\’column\’,@level2name=N\’TaxRate\’,@value=N\’C44193E1-0E58-4B2A-9001-F7D6E7BC1373\’
go
exec sp_addextendedproperty @name=N\’sys_sensitivity_label_name\’,@level0type=N\’schema\’,@level0name=N\’Sales\’,@level1type=N\’table\’,@level1name=N\’SalesTaxRate\’,@level2type=N\’column\’,@level2name=N\’TaxRate\’,@value=N\’Confidential\’
go
exec sp_addextendedproperty @name=N\’sys_sensitivity_label_id\’,@level0type=N\’schema\’,@level0name=N\’Sales\’,@level1type=N\’table\’,@level1name=N\’SalesTaxRate\’,@level2type=N\’column\’,@level2name=N\’TaxRate\’,@value=N\’331F0B13-76B5-2F1B-A77B-DEF5A73C73C2\’
go
 
This was just to classify one column, Tax Rate, as type ‘Financial’ and label ‘Confidential.
 
This information is saved as a database-scoped extended property, so the classifications are completely portable on backup/restore if you were to move the database for some reason. #ftw
 
Sadly, these are *not* available in the normal database extended properties UI, but you could query them directly if you wanted to extract them programmatically using the static T/SQL gleaned from the trace:
 
 
 
I hope that’s shown you where the classification data is held. It’s a simple (!) job to add the server name and database name to the query so you could use it to provide information on all your classifications across all your databases and servers, but I’ll leave that fun to you !
 
 
Back soon !!