SQLBits 8 Session Notes #3

Optimising MS BI

General

Scalability != Performance

Cool tools:

Xperf
xEvent
Coreinfo.exe

    Consider symbols from SQLCAT to get internal SQL function names and their calls.

    Really deep analysis of what\’s going on in the OS

Data Loading

\’Minimally Logged\’ logs allocations not all rows / pages but can be wrapped in a transaction and rolled back. TRUNCATE TABLE is minimally logged (cf Kalen Delaney)

Bulk Load is less chatty than singleton inserts – less round tripping.

INSERT myHeap WITH  (TABLOCK) SELECT….

PLEASE NOTE: HEAPS ONLY

Clustered index insert takes an exclusive RANGE lock that will expand as more data is inserted. Can lead to concurrency problems and range overlaps.

SQLServerDestination – must run IS on same server as SQLDestination <– THAT's why I don't like it ! (Uses Shared Memory connections)

/** Look up parallel load design patterns **/
/** Look up brute force hash partitioning design patterns **/
/** Look up large update design patterns **/

\”inserting into a partitioned table requires that the input is sorted by the partitioning key, even if you only hit one partition.\” <– hmm. Worried much ? Not sure… if we're not bulk loading.

Switching partitions in and out requires an exclusive lock on the table. SCH-M lock required (but only for a few ms)

Tuning SQL Server

sys.dm_os_waitstats – PAGELATCH_UP (resource id) can indicate problems at the PFS page.

Wait type RESOURCE_SEMAPHORE = waiting for memory to be allocated to process.

Fixing SOS_SCHEDULER_YIELD:
    Terminate and reconnect – don\’t use the same connection for the entire package.
    Soft NUMA (qv)
    Or give the network a kick…
    Or decrease parallelism…

BULK INSERT problem waits that indicate IO problems on source:
    IMPROVIO_WAIT
    OLEDB
    ASYNCH_NETWORK_IO

Tuning network

Affinitise NICs to CPU cores – Interrupt-Affinity Policy tool

Anecdotal evidence that 1 core in 8 is needed just to service the network on busy systems.

/**Look up Jumbo Frames**/

Discrepancy between reads and writes on NIC properties ? Could be small packet sizes (4096 bytes) – consider max of 32k (set on connection string & sql server – NETWORK PACKET SIZE)

Cubes

SSAS is not meant to be \’infinitely scaleable\’ 🙂

Overhead for each column that is extracted, but varies on the DATATYPE, not the size of the actual values in the column.
    MONEY is *sometimes* more efficient than DECIMAL – it\’s dependant on characteristics of how the specific datatype is stored.

Loose definitions:
    Large = won\’t fit in memory
    Fast Extract = saturates more than 1 NIC

Hyper-threading appropriate for SPINLOCK performance.

Partitioning a cube:
    Roughly equal size partitions
    \’Matrix\’ partitions (qv)
    Recommendation for max of ~1000 – 2000 partitions – higher would impact exploring / processing the cube

Distinct count – better performance if your partitions don\’t intersect:
    1000 – 2000, 2001 – 3000, 3001 – 4000 instead of 1000- 2500, 2000 – 8000, 1000 – 12000

DSVs can be pointed at a specific NIC (by IPAddress), so you can improve NetworkIO performance from the data source (provided you\’ve got multiple NICs)

Process Index / Aggregate:
    If MSOLAP:Memory / Quota Blocked > 0 = not enough memory for aggregations.
        Cure: Reduce AggregationMemoryMin and AggregationMemoryMax from their default values of 10% and 80%
    Increasing CoordinatorBuildMaxThreads can increase parallelism – default is 16 per partition

ProcessAdd dimension processing – don\’t assume that it\’s fast just because it only adds new rows. Size of the dimension makes a difference – this bigger, the slower.

ProcessFull vs. Incremental
    Update can take a long time.
    Always a good idea to throw in a ProcessFull every once in a while
        –  ProcessUpdate once an hour, but do a ProcessFull on the newest partitons every day.

Scaling Out
    by creating copies of the cube (cubeSync or RoboCopy) and load balance access from users to those copies.
    Note: cubeSync not really fast, better performance from RoboCopy.
    -caveat: flushing the cube from memory to disk (to RoboCopy) can make something that fits in memory very large on disk.

Parent / Child Dimensions
    Aggregates only target the [all] and leaf levels
    Performance can suffer with as few as 10K members

MOLAP dimensions + ROLAP facts are OK, but NEVER use referenced dimensions.Ever.

Natural hierarchies are very efficient.

Junk dimension – use a dimension key built from the keys of all it\’s attributes using binary addition. <– probably not good. Avoid if possible.

It\’s possible to avoid *some* types of slowly changing dimension:
    CustomerDim and CustomerDemographicDim, splitting out some of the attributes of the customer into another dimension entirely
    BUT complex when used too often
    Must be done at the relational level.

/**Look up subspace calculation / block computing**/

String Store

Every string has an overhead of 12 bytes for management
    a 1-byte string takes up more space than an 8-byte integer
    String store has a maximum size of 4GB
    Cure: on larger dimensions, prefer integer columns over strings (but I do anyway)

4GB limit applies to any store
    *.ksstore
    *.asstore
    *.bsstore

This limit is removed in SQL denali

Magic Settings

For use when you just can\’t change the design:

    –    speed up cubes
Threadpool Tuning    –    concurrency
    –    multiuser, predictability
    –    multiuser, predictability
    –    multiuser, speed
    –    multiuser, speed

Real Time Cubes

Need to agree on latency
    \”as fast as possible\” – define \’possible\’
    how do you measure it ?

Identify data classes:
    \’real\’ time  – must reside in memory. Locking/Concurrency requirement
    stale, but recent – perhaps in a different table / partition / database
    archive – perhaps in a different table / partition / database / server

Real time SNAPSHOT or RSCI isolation for low latency
BULK load or singleton INSERTs, etc, etc,…

If \”realTimeOLAP=True\”  is in the connection string, this turns off ALL caching. ROLAP data will be consistent, but any MOLAP will be affected.

DON\’T:
    Parent / Child
    Many-Many
    distinct count

Arbitrary shapes will cause massive round-tripping (Hierarchies with holes)

Resources

\”Data loading performance Guide\”
Codeplex – \’Parent/Child Naturalizer\’ (sic)

Leave a comment