Saturday 14 April 2012

SCD: Slowly changing dimension

  Slowly Changing Dimensions (SCDs)
are dimensions that have data that changes slowly, rather than changing on a time-based, regular schedule.

Dealing with these issues involves SCD management methodologies referred to as Type 0 through 6. Type 6 SCDs are also sometimes called Hybrid SCDs.

The most common slowly changing dimensions are Types 1, 2, and 3.

Short summary.
  • Type 0: no action
  • Type 1: overwrite, keep only the current value
  • Type 2: use multiple rows with start/end date or effective/expire date to track whole change history
  • Type 3: track limited history with columns such as PreviousValue
  • Type 4: keep both current value and build a complete history table
  • Type 6: (1+2+3), combine method 1 , 2 and 3.

The Type 1 methodology overwrites old data with new data, and therefore does not track historical data at all.

Supplier_Key Supplier_Code Supplier_Name Supplier_State
123 ABC Acme Supply Co CA
Supplier_Key Supplier_Code Supplier_Name Supplier_State
123 ABC Acme Supply Co IL



The Type 2 method tracks historical data by creating multiple records for a given natural key in the dimensional tables with separate surrogate keys and/or different version numbers. With Type 2, we have unlimited history preservation as a new record is inserted each time a change is made.







with incremented version numbers to indicate the sequence of changes:
Supplier_Key Supplier_Code Supplier_Name Supplier_State Version
123 ABC Acme Supply Co CA 0
124 ABC Acme Supply Co IL 1
Another popular method for tuple versioning is to add 'effective date' columns.
Supplier_Key Supplier_Code Supplier_Name Supplier_State Start_Date End_Date
123 ABC Acme Supply Co CA 01-Jan-2000 21-Dec-2004
124 ABC Acme Supply Co IL 22-Dec-2004
 The Type 3 method tracks changes using separate columns. Whereas Type 2 had unlimited history preservation, Type 3 has limited history preservation, as it's limited to the number of columns designated for storing historical data

(only the previous history is stored )
Supplier_Key Supplier_Code Supplier_Name Original_Supplier_State Effective_Date Current_Supplier_State
123 ABC Acme Supply Co CA 22-Dec-2004 IL
The Type 4 method is usually referred to as using "history tables", where one table keeps the current data, and an additional table is used to keep a record of some or all changes.

Type 6 / Hybrid

The Type 6 method combines the approaches of types 1, 2 and 3 (1 + 2 + 3 = 6). One possible explanation of the origin of the term was that it was coined by Ralph Kimball during a conversation with Stephen Pace from Kalido[citation needed]. Ralph Kimball calls this method "Unpredictable Changes with Single-Version Overlay" in The Data Warehouse Toolkit[1].


PS.
Summarized from:
http://en.wikipedia.org/wiki/Slowly_changing_dimension