Sunday, 9 January 2011

Dimensional Modelling for a Flexible Cube

It has been quite a while since I posted here; this is due partly to some personal events over the tail end of last year and partly due to a technical challenge that has been consuming my time. I am sure you are not interested in my personal life so I’ll share some information on the technical challenge. For those of you who will be enjoying the seaside air in Brighton for SQL Bits this April, this is a sneak preview of some of the material I will be presenting in one of my sessions (subject to enough votes).

I recently designed and developed a SSAS cube solution for a customer that allows measures to be calculated differently dependent upon which dimension member is being used to view them (dynamic measures).

It also allows for comparator members to be manually defined and for these members to use the dynamic measure calculation method that is defined for the “main member” even if the “comparison member” has a different calculation method. Further the “comparison members” can have anonymous aliases defined such as ‘comparator 1’, ‘comparator 2’ etc. so that when the cube is browsed the real identity of the comparison members is hidden.

All of this functionality is possible using MDX expressions to overwrite dummy measures with real values based upon the selected main member. This approach suffers from much poorer performance because SSAS cannot aggregate such measures nor can it fully utilise caching during calculations, it is also more complex a solution with the attendant impact upon development and maintenance.

But enough background, the following represents a vastly simplified data mart to support the solution. We have 2 dimensions, one dynamic, the other static; one fact table and t config tables that hang the solution together…


Config tables:

Comparison Client. This table contains 3 fields…
·         The ‘main’ client key.
·         The comparison client key. (this will match the main client key for the main client)
·         The display name. (for the main client this will be the actual name, for the comparison clients this will be an anonymous alias)

Dynamic Measures. This table contains 4 fields…
·         The client key
·         The metric group ID
·         The metric ID
·         Metric description. (this field is used to comment on how the metric is calculated to aid development and maintenance)


To manage the dynamic client dimension a view is created that joins the ‘standard’ client dimension table to the comparison config table.

/*******************************************
view to provide a star schema to the cube with the
flexibility to adopt changes to the comparison client
*********************************************/
select
      CheckSum(CONVERT(char(6), dc.Client_ID)
                  + CONVERT(char(6), dyn.Comp_Client_ID)
                  +  CONVERT(char(6), dmes.Measure_Group_ID)
                  +  CONVERT(char(6), dmes.Measure_ID)
            ) as Dim_Key      -- the dimension will need a key consisting of the client, the comparison client and the full key for the dynamic measures to be used.  (I have worked on the assumption that no key will be more than 6 digits, this is something that you would need to define against your business requirements).
      ,dc.Client_ID
      ,dyn.Comp_Client_ID
      ,dmes.Measure_Group_ID
      ,dmes.Measure_ID
      ,dyn.display_name
      -- other client attribute fields
from
      Dim_Client dc
      inner join Config_Dynamic_Client dyn on dyn.Client_ID = dc.Client_ID
      left join Config_Dynamic_Measure dmes on dmes.Client_ID = dc.Client_ID  -- joined on the main client id
       
In addition to the normal attribute fields this returns the display text from the compare table used by the cube to display the client name and the comparison client key from the comparison config table.
Note: If any of the additional attributes can be used to identify the ‘anonymous’ comparison client then real values should not be returned to the cube. Use join logic and coalesce() in your view to return a place-holding constant instead.

So if we had the following clients in the standard dimension table…

Client ID
Client Name
Client Address 1
1
Fred Flintstone
2
Barny Rubble
3
Mrs Fred Flintstone


And the following defined in the dynamic client config table

Main Client ID
Comp Client ID
Display Name
1
1
Fred Flintstone
1
2
Client 1
2
2
Barny Rubble
2
1
Client 1
2
3
Client 2
3
3
Wilma Flintstone
3
1
Client 1


Notice that in the dynamic config table each client is linked with itself (main client ID = comp client ID) and that in this case the display name matches the actual name. This config table can also be utilised to provide an alias where the data mart holds a name that you wish to overwrite, as in the case of Mrs Fred Flintstone who has been provided with her own real display name in the config table.

And the dynamic Measures configured thus…

Client ID
Measure ID
Measure Group ID
Description
1
1
1
Only count status of 2 or 7
2
2
1
Use all values
3
2
1
Use all values
1
1
2
Only use status 2,3,5,8
2
3
2
Use status 4 and over
3
2
2
Use status 2 and anything above 4



So if we apply the view to this query we will get the following results…


Dim Key
Client ID
Comp Client ID
Measure Group ID
Measure ID
Display Name
Other attributes
Hash 1
1
1
1
1
Fred Flintstone
Hash 2
1
1
2
1
Fred Flintstone
Hash 3
1
2
1
1
Client 1

Hash 4
1
2
2
1
Client 1
Hash 5
2
2
1
2
Barny Rubble
Hash 6
2
1
1
2
Client 1
Hash 7
2
1
2
3
Client 1
Hash 8
2
3
1
2
Client 2



Dim Key                               -- integer checksum of both client ids and a full descriptor of the measure
Client ID                              -- the ID for the main client
Comp Client ID                    -- the ID for the client used for comparisons
Measure Group ID               -- the ID for the dynamic measure group
Measure ID                          -- the ID for the particular calculation method used for this measure
Display Name                      -- the client name to display on reports and queries (the real name, a meaningful alias or an anonymous placeholder.)
Any other client attributes to be included.

This view is used to provide the basis for your dynamic client dimension and can be modelled in exactly the same way as any regular SSAS dimension.

Dynamic Measure

To create the dynamic measures you will need 1 view for each method of calculating a measure is required. So for Measure 1 in Measure Group 1 you would have a view with the following select statements…

/************************************************
View to provide first calculation method for measure group 1
**************************************************/

select
      1 as Measure_ID
      ,1 as Measure_Group_ID
      ,dyn.Client_ID
      ,dyn.Comp_Client_ID
      ,CheckSum(
            CONVERT(char(6), dyn.Client_ID),
            CONVERT(char(6), dyn.Comp_Client_ID),
            CONVERT(char(6), 1),          -- measure group ID
            CONVERT(char(6), 1)                 -- measure ID
      ) as Dynamic_Key
      -- standard related dimension keys
      ,mes.Measure_value1
      ,mes.measure_value2
from 
      Config_Dynamic_Client dyn
      inner join Config_Dynamic_Measure dmes on dyn.Main_Client_ID = dmes.Client_ID -- join on the comparison client ID,not the main (real client ID)
            and dmes.Measure_Group_ID = 1  -- limit to just the Measure group 1
            and mes.Measure_ID = 1  -- limit to just the calculation method #1
           
      inner join fact_Measure mes on dmes.Client_ID = mes.Client_ID
            and dyn.Comp_Client_ID = mes.Client_ID
           
where
      mes.status_code in (2, 7)     -- the conditions for this method of calculating
     
     
/************************************************
View to provide second calculation method for measure group 1 (unrestricted)
**************************************************/

select
      2 as Measure_ID
      ,1 as Measure_Group_ID
      ,dyn.Client_ID
      ,dyn.Comp_Client_ID
      ,CheckSum(
            CONVERT(char(6), dyn.Client_ID),
            CONVERT(char(6), dyn.Comp_Client_ID),
            CONVERT(char(6), 1),          -- measure group ID
            CONVERT(char(6), 2)                 -- measure ID
      ) as Dynamic_Key
      -- standard related dimension keys
      ,mes.Measure_value1
      ,mes.measure_value2
from 
      Config_Dynamic_Client dyn
      inner join Config_Dynamic_Measure dmes on dyn.Main_Client_ID = dmes.Client_ID -- join on the comparison client ID,not the main (real client ID)
            and dmes.Measure_Group_ID = 1  -- limit to just the Measure group 1
            and mes.Measure_ID = 2  -- limit to just the calculation method #1
           
      inner join fact_Measure mes on dmes.Client_ID = mes.Client_ID
            and dyn.Comp_Client_ID = mes.Client_ID
           
      -- the conditions for this method of calculating (this version of the metric uses all records)


                                … Repeat for all of the dynamic measure types …

Each measure group is then combined using a simple UNION query, again wrapped into a view. Each of these UNIONed views can then form the basis of a measure group within the cube. For the dimension usage between the dynamic measures and the dynamic dimension that you have defined use the checksum value (Dynamic_Key). From this point on cube development is completely as standard.

Initially we embarked upon this solution because we did not have any control or ownership over the ETL process. Our customer wanted to keep ownership of this phase and simply provided us with a dimensional model to our specifications within their Informix environment. That said, having built this solution there are a couple of advantages over using similar tables to manage the same functionality…

If this was done using an ETL process then the relational data mart would have to be larger in order to hold the extra table depth in the dynamic dimension and the dynamic measures, using the views this semi-duplication of rows only exists in the OLAP model where compression is better.

Using an ETL, if any of the relationships managed by the config tables the whole ETL would need to be run (or you would need to set up some clever logic to work out what had changed). You would also need to discard those parts of the data mart that where no longer valid. Using this approach you only need to reprocess the cube if anything changes.



Limitations of the solution
There are a couple of limitations to this approach…

There can only be one dynamic dimension. i.e. only one dimension may demand that measures and comparators be defined based upon which of its members are selected. The like for like comparison has the effect of bloating the cube. This will impact the hardware requirements and have an impact on performance (though negligible in comparison to the performance hit on using MDX overwrites). The amount of ‘bloating’ is broadly related to the number of comparison members that are defined – the space required for measures increases by the product of the number of comparators defined. Hence this can be controlled by setting operational limits on the number of comparators allowed.

Each measure group can have only one dynamic measure. Though there is no limit on the number of measure groups that are possible and the measures can be later grouped in display folders to mitigate the impact of this limitation.

If there is not a member of the dynamic dimension in the query, the measures will not produce reliable and valid results. (Dynamic measures will aggregate but will be using different methods to calculate them under the covers.) For this reason the ‘ALL’ member should at least be hidden on the dynamic dimension.

The full solution we implemented also provides for bucket definitions defined on a per-dynamic-member-basis that builds on the solution outlined in Chris Webb’s excellent book ‘Expert Cube Development with Microsoft SQL Server 2008 Analysis Services’ (which Chris assures me makes an excellent gift for all your family and friends.) Though I’ll leave discussing that until April at SQLBits.
Submit a session for SQLBits

-- Addition 10th Jan 2011 --

I should make it clear that all measures that intersect your dynamic dimension will need to be treated as a dynamic measure with an entry in the dynamic measure config table and an ID for the measure group and the measure, though if you are confident that it will never be calculated in more than one way you can forego the step of writting the wrapper view that UNIONs all the measure views together.

--

No comments:

Post a Comment