Slowly Changing Dimensions: Slowly changing dimensions are dimensions in which the data changes slowly, rather than changing regularly on a time basis.
For example, you may have a customer dimension in a retail domain. Let's say the customer is in India and does some shopping every month. Creating the sales report for the customers is then easy. Now assume that the customer is transferred to the United States and shops there. How do you record such a change in your customer dimension?
You could sum or average the sales made by the customers. In this case you won't get accurate insights into the sales made by the customers. As the customer's salary increases after the transfer, he or she might do more shopping in the United States than in India. If you sum the total sales, the sales made by the customer might look stronger even if they are only average. You could create a second customer record and treat the transferred customer as a new customer. However, this creates problems too.
This is why slowly changing dimensions came into the picture: they try to solve these problems so that business users can get the right insights from the data.
In the earlier articles, I covered Dimensional Modelling, Types of Dimensions, Types of Facts and Datawarehouse Design. All these topics are crucial for implementing a DWH in the real world. There will also be some interview questions on these concepts, so be thorough with them.
What are SCD Types
In total, there are 6 types of SCD that are widely used in DWH implementations. They are as follows:
- Type 0: This is a fixed dimension.
- Type 1: Maintains only the current state.
- Type 2: History is saved.
- Type 3: Current and former states are saved.
- Type 4: Combination of Types 1 and 2, with current data and history kept in separate tables.
- Type 6: Hybrid type.
In the real world, mostly Types 1, 2 and 3 are used. The other types of SCD are used rarely.
SCD Type 0
Type 0 is a fixed dimension. The data in this dimension table never changes. The data is loaded into the dimension table once, at the start of the project. An example of Type 0 is business users assigned to specific regions. These business users never change their location, so the data in this dimension never changes. The total sales made by each business user can then be generated.
SCD Type 1
The SCD Type 1 method is used when there is no need to store historical data in the dimension table. This method overwrites the old data in the dimension table with the new data. It is used to correct data errors in the dimension.
As an example, consider a customer table with the following data.
surrogate_key  customer_id  customer_name  Location
------------------------------------------------
1              1            Mark           Chicago
Here the customer's location is Chicago, and the customer then moves to another location, New York. If you use the Type 1 method, it simply overwrites the data. The data in the updated table will be:
surrogate_key  customer_id  customer_name  Location
------------------------------------------------
1              1            Mark           New York
The advantage of Type 1 is ease of maintenance and less space occupied. The disadvantage is that no historical data is kept in the data warehouse.
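The overwrite can be sketched in a few lines of Python. This is only an illustration, assuming the dimension is held as an in-memory list of dictionaries; the function name apply_scd_type1 and the lowercase column keys are hypothetical, and a real warehouse would normally do this with an UPDATE statement.

```python
# Hypothetical in-memory customer dimension, mirroring the table above.
customer_dim = [
    {"surrogate_key": 1, "customer_id": 1,
     "customer_name": "Mark", "location": "Chicago"},
]


def apply_scd_type1(dim, customer_id, new_location):
    """Overwrite the location in place; the old value is not kept anywhere."""
    for row in dim:
        if row["customer_id"] == customer_id:
            row["location"] = new_location


apply_scd_type1(customer_dim, 1, "New York")
print(customer_dim[0]["location"])  # New York
```

Note that the row count stays at one: after the call there is no trace that the customer was ever in Chicago.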
SCD Type 2
SCD Type 2 stores the entire history of the data in the dimension table. With Type 2 we can store unlimited history in the dimension table. In Type 2, you can store the data in three different ways. They are:
- Versioning
- Flagging
- Effective Date
SCD Type 2 Versioning
In the versioning method, a sequence number is used to represent the change. The latest sequence number always represents the current row and the earlier sequence numbers represent the past data.
As an example, let's use the same case of a customer who changes location. Initially the customer is in Illinois, and the data in the dimension table will look like this:
surrogate_key  customer_id  customer_name  Location  Version
--------------------------------------------------------
1              1            Marston        Illinois  1
The customer moves from Illinois to Seattle and the version number is incremented. The dimension table will look like this:
surrogate_key  customer_id  customer_name  Location  Version
--------------------------------------------------------
1              1            Marston        Illinois  1
2              1            Marston        Seattle   2
If the customer moves again to another location, a new record is inserted into the dimension table with the next version number.
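A minimal Python sketch of the versioning approach follows. It assumes an in-memory list of dictionaries and a surrogate key generated as "highest existing key plus one"; both are simplifications, and the function name apply_scd_type2_version is hypothetical.

```python
# Hypothetical in-memory dimension, mirroring the first table above.
customer_dim = [
    {"surrogate_key": 1, "customer_id": 1, "customer_name": "Marston",
     "location": "Illinois", "version": 1},
]


def apply_scd_type2_version(dim, customer_id, new_location):
    """Insert a new row with the next version number; old rows stay untouched."""
    history = [r for r in dim if r["customer_id"] == customer_id]
    latest = max(history, key=lambda r: r["version"])
    new_row = {
        # Assumption: surrogate keys are simply the next integer.
        "surrogate_key": max(r["surrogate_key"] for r in dim) + 1,
        "customer_id": customer_id,
        "customer_name": latest["customer_name"],
        "location": new_location,
        "version": latest["version"] + 1,
    }
    dim.append(new_row)
    return new_row


apply_scd_type2_version(customer_dim, 1, "Seattle")
for row in customer_dim:
    print(row["surrogate_key"], row["location"], row["version"])
```

The current row for a customer is then the one with the highest version number.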
SCD Type 2 Flagging
In the flagging method, a flag column is created in the dimension table. The current record has a flag value of 1 and the previous records have a flag value of 0.
Initially, the customer dimension will look like this:
surrogate_key  customer_id  customer_name  Location  flag
--------------------------------------------------------
1              1            Marston        Illinois  1
When the customer moves to a new location, the old records are updated with a flag value of 0 and the latest record gets a flag value of 1.
surrogate_key  customer_id  customer_name  Location  flag
--------------------------------------------------------
1              1            Marston        Illinois  0
2              1            Marston        Seattle   1
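The flagging method could be sketched in Python as below. This is an illustration under the same assumptions as before (in-memory list of dictionaries, next-integer surrogate keys); apply_scd_type2_flag is a hypothetical name.

```python
# Hypothetical in-memory dimension, mirroring the first flag table above.
customer_dim = [
    {"surrogate_key": 1, "customer_id": 1, "customer_name": "Marston",
     "location": "Illinois", "flag": 1},
]


def apply_scd_type2_flag(dim, customer_id, new_location):
    """Set the flag of the current row to 0, then insert a new row with flag 1."""
    current = None
    for row in dim:
        if row["customer_id"] == customer_id and row["flag"] == 1:
            row["flag"] = 0
            current = row
    if current is None:
        raise KeyError(f"no current row for customer {customer_id}")
    dim.append({
        # Assumption: surrogate keys are simply the next integer.
        "surrogate_key": max(r["surrogate_key"] for r in dim) + 1,
        "customer_id": customer_id,
        "customer_name": current["customer_name"],
        "location": new_location,
        "flag": 1,
    })


apply_scd_type2_flag(customer_dim, 1, "Seattle")
current_rows = [r for r in customer_dim if r["flag"] == 1]
print(current_rows[0]["location"])  # Seattle
```

Filtering on flag equal to 1 gives the current state of every customer without having to compare version numbers.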
SCD Type 2 Effective Date
In the Effective Date method, the period of the change is tracked using the start_date and end_date columns in the dimension table.
surrogate_key  customer_id  customer_name  Location  Start_date   End_date
-------------------------------------------------------------------------
1              1            Marston        Illinois  01-Mar-2010  20-Feb-2011
2              1            Marston        Seattle   21-Feb-2011  NULL
The NULL in End_date indicates the current version of the data, and the remaining records indicate the past data.
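As a rough Python sketch of the Effective Date method, the code below closes the current row the day before the change and opens a new row with an empty end date, which reproduces the dates in the table above. None stands in for NULL, and the helper names apply_scd_type2_dates and location_on are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical in-memory dimension; None plays the role of NULL.
customer_dim = [
    {"surrogate_key": 1, "customer_id": 1, "customer_name": "Marston",
     "location": "Illinois", "start_date": date(2010, 3, 1), "end_date": None},
]


def apply_scd_type2_dates(dim, customer_id, new_location, change_date):
    """Close the open row one day before the change and insert a new open row."""
    current = next(
        r for r in dim
        if r["customer_id"] == customer_id and r["end_date"] is None
    )
    current["end_date"] = change_date - timedelta(days=1)
    dim.append({
        "surrogate_key": max(r["surrogate_key"] for r in dim) + 1,
        "customer_id": customer_id,
        "customer_name": current["customer_name"],
        "location": new_location,
        "start_date": change_date,
        "end_date": None,
    })


def location_on(dim, customer_id, as_of):
    """Return the location that was valid on the given date, or None."""
    for r in dim:
        if (r["customer_id"] == customer_id
                and r["start_date"] <= as_of
                and (r["end_date"] is None or as_of <= r["end_date"])):
            return r["location"]
    return None


apply_scd_type2_dates(customer_dim, 1, "Seattle", date(2011, 2, 21))
print(location_on(customer_dim, 1, date(2010, 12, 31)))  # Illinois
print(location_on(customer_dim, 1, date(2011, 3, 1)))    # Seattle
```

The date range is what makes point-in-time questions possible: a sale can be matched to the customer row that was valid on the sale date.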
SCD Type 3
In the Type 3 method, only the current status and the previous status of the row are maintained in the table. To track these changes, two separate columns are created in the table. The customer dimension table in the Type 3 method will look like this:
surrogate_key  customer_id  customer_name  Current_Location  previous_location
--------------------------------------------------------------------------
1              1            Marston        Illinois          NULL
Let's say the customer moves from Illinois to Seattle. The updated table will look like this:
surrogate_key  customer_id  customer_name  Current_Location  previous_location
--------------------------------------------------------------------------
1              1            Marston        Seattle           Illinois
If the customer then moves from Seattle to New York, the updated table will be:
surrogate_key  customer_id  customer_name  Current_Location  previous_location
--------------------------------------------------------------------------
1              1            Marston        New York          Seattle
The Type 3 method keeps only limited history, and how much depends on the number of columns you create.
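The column shift used by Type 3 can be sketched in Python as follows, assuming a single previous_location column as in the tables above. The function name apply_scd_type3 is hypothetical.

```python
# Hypothetical in-memory dimension; None plays the role of NULL.
customer_dim = [
    {"surrogate_key": 1, "customer_id": 1, "customer_name": "Marston",
     "current_location": "Illinois", "previous_location": None},
]


def apply_scd_type3(dim, customer_id, new_location):
    """Shift the current value into the previous column, then overwrite it."""
    for row in dim:
        if row["customer_id"] == customer_id:
            row["previous_location"] = row["current_location"]
            row["current_location"] = new_location


apply_scd_type3(customer_dim, 1, "Seattle")
apply_scd_type3(customer_dim, 1, "New York")
print(customer_dim[0]["current_location"])   # New York
print(customer_dim[0]["previous_location"])  # Seattle
```

After the second move, Illinois is gone from the table, which is the limited history the method trades for keeping a single row per customer.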
SCD Type 4
SCD Type 4 is also called a fast-growing dimension. Imagine tracking all these changes and storing them in a single dimension table, as Type 2 does. It takes a lot of time to generate a report when this dimension table is joined with the fact table. To generate the report faster, the data in the dimension table should be minimal.
In Type 4, the current data is maintained in the dimension table and the history is stored in another table. This improves performance when generating the report. However, it adds the overhead of maintaining the historical data in a separate table.
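A small Python sketch of the two-table idea is shown below. It assumes both tables are in-memory lists and that each archived row is stamped with the date it was replaced; the archived_on column and the function name apply_scd_type4 are hypothetical additions for illustration.

```python
from datetime import date

# Hypothetical current-state dimension and a separate history table.
customer_dim = [
    {"customer_id": 1, "customer_name": "Marston", "location": "Illinois"},
]
customer_history = []


def apply_scd_type4(dim, history, customer_id, new_location, change_date):
    """Copy the old row into the history table, then overwrite the current row."""
    for row in dim:
        if row["customer_id"] == customer_id:
            # Copy first, so the history keeps the old location.
            history.append({**row, "archived_on": change_date})
            row["location"] = new_location


apply_scd_type4(customer_dim, customer_history, 1, "Seattle", date(2011, 2, 21))
print(len(customer_dim), customer_dim[0]["location"])          # 1 Seattle
print(len(customer_history), customer_history[0]["location"])  # 1 Illinois
```

Reports that need only the current state join against the small customer_dim table, while the history table is consulted only when past values are required.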
SCD Type 6
This is a combination of Types 1, 2 and 3, and is also called the Hybrid type. In this dimension, the current value is stored in a current column on every historical record.
This type of dimension adds a lot of complexity. Implementing this SCD type is a bit hard, and it also stores a lot of redundant data. However, it provides an easy way to compare current data with historical data.
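One possible way to picture the hybrid is the Python sketch below. It is an assumption about how the three ideas fit together, not a definitive layout: each change inserts a new dated, flagged row (Type 2), keeps the value that was valid at the time in a historical_location column (Type 3 style), and overwrites current_location on every row of that customer (Type 1). All column and function names are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical in-memory dimension; None plays the role of NULL.
customer_dim = [
    {"surrogate_key": 1, "customer_id": 1, "customer_name": "Marston",
     "historical_location": "Illinois", "current_location": "Illinois",
     "start_date": date(2010, 3, 1), "end_date": None, "flag": 1},
]


def apply_scd_type6(dim, customer_id, new_location, change_date):
    """Close the current row, refresh current_location everywhere, add a new row."""
    name = None
    for row in dim:
        if row["customer_id"] != customer_id:
            continue
        name = row["customer_name"]
        if row["flag"] == 1:
            # Type 2 part: close the row that was current until now.
            row["end_date"] = change_date - timedelta(days=1)
            row["flag"] = 0
        # Type 1 part: every historical row carries the latest value.
        row["current_location"] = new_location
    if name is None:
        raise KeyError(f"no rows for customer {customer_id}")
    dim.append({
        "surrogate_key": max(r["surrogate_key"] for r in dim) + 1,
        "customer_id": customer_id,
        "customer_name": name,
        "historical_location": new_location,
        "current_location": new_location,
        "start_date": change_date,
        "end_date": None,
        "flag": 1,
    })


apply_scd_type6(customer_dim, 1, "Seattle", date(2011, 2, 21))
for row in customer_dim:
    print(row["historical_location"], row["current_location"], row["flag"])
```

Because every row holds both values, comparing what was true at the time with what is true now needs no self-join, at the cost of updating all of a customer's rows on each change.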