Slowly Changing Dimensions (SCD) – Types

Slowly Changing Dimensions: Slowly changing dimensions are dimensions in which the data changes slowly, rather than changing regularly on a time basis.

For example, you may have a customer dimension in a retail domain. Let's say the customer lives in India and does some shopping every month. Creating the sales report for the customers is easy. Now assume that the customer is transferred to the United States and shops there. How do you record such a change in your customer dimension?

You could sum or average the sales made by the customer, but then you won't get accurate insights into that customer's sales. Because the customer's salary increased after the transfer, he or she might shop more in the United States than in India. If you sum the total sales, the customer's sales might look strong even if they are only average. Alternatively, you could create a second customer record and treat the transferred customer as a new customer. However, this creates problems too.

This is why slowly changing dimensions came into the picture: they try to solve these problems so that business users can get the right insights from the data.

In earlier articles, I've covered Dimensional Modelling, Types of Dimensions, Types of Facts and Datawarehouse Design. All these topics are important for implementing a DWH in the real world, and interview questions are often asked on these concepts too, so be thorough with them.

What are SCD Types

In total, there are six types of SCD used in DWH implementations. They are as follows:

  • Type 0: This is a fixed dimension.
  • Type 1: Only the current state is maintained.
  • Type 2: History is stored.
  • Type 3: Current and previous states are stored.
  • Type 4: Combination of types 1 and 2.
  • Type 6: Hybrid type.

In the real world, Types 1, 2 and 3 are the ones mostly used; the other SCD types are used rarely.

SCD Type 0

Type 0 is a fixed dimension. The data in this dimension table never changes; it is loaded once, at the beginning of the project. An example of Type 0 is a table of business users assigned to particular regions. These business users will never change their region, so the data in this dimension never changes, and the total sales made by each business user can be reported directly.
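As a minimal Python sketch (not a prescribed implementation), a Type 0 dimension can be modelled as a read-only mapping that is loaded once. The `BusinessUser` class, the `apply_change` helper and the sample rows are hypothetical. Any change request is simply ignored, so the originally loaded values are retained.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class BusinessUser:
    # Hypothetical columns for a business-user dimension.
    user_id: int
    user_name: str
    region: str


# Loaded exactly once at the start of the project (illustrative rows only).
# MappingProxyType gives a read-only view; frozen rows cannot be edited either.
business_user_dim = MappingProxyType({
    101: BusinessUser(101, "Asha", "South"),
    102: BusinessUser(102, "Ben", "North"),
})


def apply_change(dim, user_id, new_region):
    """Type 0: an incoming change is ignored and the original row is kept."""
    return dim[user_id]


print(apply_change(business_user_dim, 101, "East").region)  # South
```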

SCD Type 1

The SCD Type 1 method is used when there is no need to store historical data in the dimension table. It overwrites the old data in the dimension table with the new data. It is also used to correct data errors in the dimension.

As an example, consider a customer table with the following data:

surrogate_key customer_id customer_name Location
------------------------------------------------
1             1           Mark          Chicago

Here the customer's location is Chicago, and the customer then moves to New York. With the Type 1 method, the data is simply overwritten, and the updated table will be:

surrogate_key customer_id customer_name Location
------------------------------------------------
1             1           Mark          New York

The advantages of Type 1 are ease of maintenance and less space occupied. The disadvantage is that no historical data is kept in the data warehouse.
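A minimal Python sketch of the Type 1 overwrite, assuming the dimension is held in memory as a list of dicts. The `scd1_update` helper is hypothetical; in a real warehouse the same effect would come from an UPDATE against the dimension table.

```python
def scd1_update(dim_rows, customer_id, new_location):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    for row in dim_rows:
        if row["customer_id"] == customer_id:
            row["location"] = new_location
    return dim_rows


customer_dim = [
    {"surrogate_key": 1, "customer_id": 1, "customer_name": "Mark", "location": "Chicago"},
]
scd1_update(customer_dim, 1, "New York")
print(customer_dim)
# [{'surrogate_key': 1, 'customer_id': 1, 'customer_name': 'Mark', 'location': 'New York'}]
```

Note that the row count and surrogate key stay the same, which is exactly why Type 1 is cheap to maintain and keeps no history.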

SCD Type 2

SCD Type 2 stores the complete history of the data in the dimension table. With Type 2 we can store unlimited history in the dimension table. In Type 2, you can store the data in three different ways:

  • Versioning
  • Flagging
  • Effective Date

SCD Type 2 Versioning

In the versioning method, a sequence number is used to represent each change. The latest sequence number always represents the current row, and the earlier sequence numbers represent past data.

As an example, let's reuse the customer who changes location. Initially the customer is in Illinois, and the data in the dimension table will look as follows:

surrogate_key customer_id customer_name Location Version
--------------------------------------------------------
1             1           Marston       Illinois  1

When the customer moves from Illinois to Seattle, a new row is inserted with the version number incremented. The dimension table will look as follows:

surrogate_key customer_id customer_name Location Version
--------------------------------------------------------
1             1           Marston       Illinois  1
2             1           Marston       Seattle   2

If the customer moves to another location again, a new record will be inserted into the dimension table with the next version number.
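The versioning steps can be sketched in Python as follows, assuming an in-memory list of dicts. The helpers `scd2_version_insert` and `current_row` are hypothetical, and generating the next surrogate key as the current maximum plus one is a simplification; a database would normally use a sequence or identity column.

```python
def scd2_version_insert(dim_rows, customer_id, new_location):
    """Type 2 (versioning): append a new row with the next version number."""
    latest = current_row(dim_rows, customer_id)
    dim_rows.append({
        # Simplified surrogate key: one past the largest key in the table.
        "surrogate_key": max(r["surrogate_key"] for r in dim_rows) + 1,
        "customer_id": customer_id,
        "customer_name": latest["customer_name"],
        "location": new_location,
        "version": latest["version"] + 1,
    })
    return dim_rows


def current_row(dim_rows, customer_id):
    """The row with the highest version number is the current one."""
    return max((r for r in dim_rows if r["customer_id"] == customer_id),
               key=lambda r: r["version"])


customer_dim = [{"surrogate_key": 1, "customer_id": 1, "customer_name": "Marston",
                 "location": "Illinois", "version": 1}]
scd2_version_insert(customer_dim, 1, "Seattle")
row = current_row(customer_dim, 1)
print(row["location"], row["version"])  # Seattle 2
```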

SCD Type 2 Flagging

In the flagging method, a flag column is created in the dimension table. The current record has the flag value 1 and the previous records have the flag value 0.

Initially, the customer dimension will look as follows:

surrogate_key customer_id customer_name Location flag
--------------------------------------------------------
1             1           Marston       Illinois  1

When the customer moves to a new location, the old records are updated with the flag value 0 and the latest record gets the flag value 1:

surrogate_key customer_id customer_name Location flag
--------------------------------------------------------
1             1           Marston       Illinois  0
2             1           Marston       Seattle   1
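A comparable sketch for the flagging method, under the same in-memory assumptions (the `scd2_flag_insert` helper and max-plus-one surrogate keys are illustrative only): the open row is demoted to flag 0 before the new row is appended with flag 1.

```python
def scd2_flag_insert(dim_rows, customer_id, new_location):
    """Type 2 (flagging): set the current row's flag to 0, insert a new row with flag 1."""
    name = None
    for row in dim_rows:
        if row["customer_id"] == customer_id and row["flag"] == 1:
            row["flag"] = 0
            name = row["customer_name"]
    dim_rows.append({
        "surrogate_key": max(r["surrogate_key"] for r in dim_rows) + 1,
        "customer_id": customer_id,
        "customer_name": name,
        "location": new_location,
        "flag": 1,
    })
    return dim_rows


customer_dim = [{"surrogate_key": 1, "customer_id": 1, "customer_name": "Marston",
                 "location": "Illinois", "flag": 1}]
scd2_flag_insert(customer_dim, 1, "Seattle")
print([(r["surrogate_key"], r["location"], r["flag"]) for r in customer_dim])
# [(1, 'Illinois', 0), (2, 'Seattle', 1)]
```

Queries for current data then only need to filter on `flag = 1`, but unlike the effective date method the flag alone cannot tell you when a change happened.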

SCD Type 2 Effective Date

In the effective date method, the period of each change is tracked using the start_date and end_date columns in the dimension table.

surrogate_key customer_id customer_name Location Start_date   End_date
-------------------------------------------------------------------------
1             1           Marston       Illinois 01-Mar-2010  20-Feb-2011
2             1           Marston       Seattle  21-Feb-2011  NULL

The NULL in End_date indicates the current version of the data, and the remaining records represent past data.
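A minimal sketch of the effective date method, assuming Python `date` values for the date columns and `None` in place of NULL. The helpers `scd2_effective_date_insert` and `row_as_of` are hypothetical; closing the old row on the day before the new start date mirrors the 20-Feb-2011 / 21-Feb-2011 rows in the table above.

```python
from datetime import date, timedelta


def scd2_effective_date_insert(dim_rows, customer_id, new_location, change_date):
    """Type 2 (effective date): close the open row, then insert a new open row."""
    name = None
    for row in dim_rows:
        if row["customer_id"] == customer_id and row["end_date"] is None:
            # Close the current row on the day before the change takes effect.
            row["end_date"] = change_date - timedelta(days=1)
            name = row["customer_name"]
    dim_rows.append({
        "surrogate_key": max(r["surrogate_key"] for r in dim_rows) + 1,
        "customer_id": customer_id,
        "customer_name": name,
        "location": new_location,
        "start_date": change_date,
        "end_date": None,  # None (NULL) marks the current version
    })
    return dim_rows


def row_as_of(dim_rows, customer_id, as_of):
    """Return the row that was valid on the given date, or None."""
    for row in dim_rows:
        if (row["customer_id"] == customer_id and row["start_date"] <= as_of
                and (row["end_date"] is None or as_of <= row["end_date"])):
            return row
    return None


customer_dim = [{"surrogate_key": 1, "customer_id": 1, "customer_name": "Marston",
                 "location": "Illinois", "start_date": date(2010, 3, 1), "end_date": None}]
scd2_effective_date_insert(customer_dim, 1, "Seattle", date(2011, 2, 21))
print(customer_dim[0]["end_date"])                                 # 2011-02-20
print(row_as_of(customer_dim, 1, date(2010, 12, 31))["location"])  # Illinois
```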

SCD Type 3

In the Type 3 method, only the current and previous states of the row are maintained in the table. To track these changes, two separate columns are created in the table. The customer dimension table in the Type 3 method will look as follows:

surrogate_key customer_id customer_name Current_Location previous_location
--------------------------------------------------------------------------
1             1           Marston       Illinois         NULL

Let's say the customer moves from Illinois to Seattle; the updated table will look as follows:

surrogate_key customer_id customer_name Current_Location previous_location
--------------------------------------------------------------------------
1             1           Marston       Seattle          Illinois

If the customer then moves from Seattle to New York, the updated table will be:

surrogate_key customer_id customer_name Current_Location previous_location
--------------------------------------------------------------------------
1             1           Marston       New York         Seattle

The Type 3 method keeps only limited history, and how much depends on the number of columns you create.
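A Type 3 change can be sketched in Python as shifting the current value into the previous column before overwriting it (the `scd3_update` helper is hypothetical). Running it for both moves reproduces the tables above, including the loss of Illinois after the second change.

```python
def scd3_update(row, new_location):
    """Type 3: move the current value to previous_location, then overwrite current."""
    row["previous_location"] = row["current_location"]
    row["current_location"] = new_location
    return row


customer = {"surrogate_key": 1, "customer_id": 1, "customer_name": "Marston",
            "current_location": "Illinois", "previous_location": None}
scd3_update(customer, "Seattle")
scd3_update(customer, "New York")  # Illinois is no longer stored anywhere
print(customer["current_location"], "|", customer["previous_location"])  # New York | Seattle
```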

SCD Type 4

SCD Type 4 is also called a fast growing dimension. Imagine tracking all these changes and storing them in a single dimension (using Type 2). Generating a report takes a long time when such a dimension table is joined with the fact table. To generate the report faster, the data in the dimension table should be kept minimal.

In Type 4, the current data is maintained in the dimension table and the history is stored in another table. This improves performance when generating reports. However, it adds the overhead of maintaining the historical data in a separate table.
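A minimal sketch of Type 4, assuming the current dimension is a dict keyed by customer_id and the history table is a list. The `scd4_update` helper and the `replaced_on` column are hypothetical choices for illustration; real history tables often carry full effective-date ranges instead.

```python
from datetime import date


def scd4_update(current_dim, history_table, customer_id, new_location, changed_on):
    """Type 4: keep one current row per customer; move superseded rows to history."""
    old_row = current_dim[customer_id]
    # Hypothetical history layout: the superseded row plus the date it was replaced.
    history_table.append({**old_row, "replaced_on": changed_on})
    # Overwrite in place (Type 1 style) so the dimension stays small for fact joins.
    current_dim[customer_id] = {**old_row, "location": new_location}


customer_dim = {1: {"surrogate_key": 1, "customer_id": 1,
                    "customer_name": "Marston", "location": "Illinois"}}
customer_history = []
scd4_update(customer_dim, customer_history, 1, "Seattle", date(2011, 2, 21))
print(customer_dim[1]["location"], customer_history[0]["location"])  # Seattle Illinois
```

Report queries join facts only to `customer_dim`, which never grows beyond one row per customer; `customer_history` is consulted only when past values are needed.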

SCD Type 6

This is a combination of Types 1, 2 and 3, and is also called the hybrid type. In this dimension, every record, including the historical ones, carries the current data in a separate current column.

This type of dimension adds a lot of complexity. Implementing it is a bit hard, and it also stores a lot of redundant data. However, it provides an easy way to compare current data with historical data.
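A sketch of how the three techniques combine, assuming an in-memory list of dicts with hypothetical `historical_location` and `current_location` columns plus Type 2 effective dates. The hypothetical `scd6_update` helper closes the open row and inserts a new one (Type 2), keeps the location as it was at the time next to the current one (Type 3 style), and overwrites `current_location` on every row (Type 1).

```python
from datetime import date, timedelta


def scd6_update(dim_rows, customer_id, new_location, change_date):
    """Type 6: Type 2 rows, a Type 3-style column pair, and a Type 1 overwrite."""
    rows = [r for r in dim_rows if r["customer_id"] == customer_id]
    for row in rows:
        # Type 1: overwrite current_location on every row, historical ones included.
        row["current_location"] = new_location
        # Type 2: close the row that was open until now.
        if row["end_date"] is None:
            row["end_date"] = change_date - timedelta(days=1)
    # Type 2: insert a new open row; historical_location keeps the value at that time.
    dim_rows.append({
        "surrogate_key": max(r["surrogate_key"] for r in dim_rows) + 1,
        "customer_id": customer_id,
        "customer_name": rows[0]["customer_name"],
        "historical_location": new_location,
        "current_location": new_location,
        "start_date": change_date,
        "end_date": None,
    })
    return dim_rows


customer_dim = [{"surrogate_key": 1, "customer_id": 1, "customer_name": "Marston",
                 "historical_location": "Illinois", "current_location": "Illinois",
                 "start_date": date(2010, 3, 1), "end_date": None}]
scd6_update(customer_dim, 1, "Seattle", date(2011, 2, 21))
print([(r["historical_location"], r["current_location"]) for r in customer_dim])
# [('Illinois', 'Seattle'), ('Seattle', 'Seattle')]
```

Because every row carries `current_location`, sales can be grouped either by the location at the time of the sale or by where the customer lives now, without a self-join; the price is rewriting every historical row on each change.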
