Multi Language Data Warehouse Design

As the world becomes more global in nature it’s not surprising that multi language data warehouses become more and more desirable. In today's post we are going to looks at a dimensional design approach that fulfils this need whilst providing a relatively simple design pattern that can be used throughout your multi-lingual data warehouse.

Let’s start with a simple customer dimension:
 
Customer Identifier
Customer Name
Customer Type Code
Customer Type Description
Mailing Consent Indicator
Mailing Consent Description
1
Fred Blogs
H
High
1
Yes
2
John Smith
L
Low
1
Yes
3
John Doe
L
Low
0
No
5
Jane Smith
H
High
1
Yes
6
Jane Doe
L
Low
1
Yes
This dimension table is linked to our fact table via its PK (Customer Identifier). We now need to consider how this dimension table changes when we add multiple languages. In this case we are going to include 2 extra languages on top of English which are French and German.

Our initial approach is to add a language attribute to our table and just include all the multi language text into the same dimensional structure. This creates an issue about what to do with the PK. The PK of the customer dimension causes us a problem as we don’t want to have to have separate fact records for each different language version of the customer record. We therefore replicate the PK for each different language version as shown below:
 
Customer Identifier
Customer Name
Customer Type Code
Customer Type Description
Mailing Consent Indicator
Mailing Consent Description
Language Name
1
Fred Blogs
H
High
1
Yes
English
2
John Smith
L
Low
1
Yes
English
3
John Doe
L
Low
0
No
English
5
Jane Smith
H
High
1
Yes
English
6
Jane Doe
L
Low
1
Yes
English
1
Fred Blogs
H
Haut
1
Oui
French
2
John Smith
L
Bas
1
Oui
French
3
John Doe
L
Bas
0
Non
French
5
Jane Smith
H
Haut
1
Qui
French
6
Jane Doe
L
Bas
1
Oui
French
1
Fred Blogs
H
Hoch
1
Ja
German
2
John Smith
L
Niedrig
1
Ja
German
3
John Doe
L
Niedrig
0
Nein
German
5
Jane Smith
H
Hoch
1
Ja
German
6
Jane Doe
L
Niedrig
1
Ja
German

To access this table we would link the dimension to the fact table on the PK as normal and filter the dimension by the Language Name. An example piece of SQL would look something like the following:

Select FACT.VALUE,
           CUSTOMER.CUSTOMER_NAME,
           CUSTOMER.CUSTOMER_TYPE_DESCRIPTION
From FACT,
          CUSTOMER
Where FACT.CUSTOMER_IDENTIFIER = CUSTOMER.CUSTOMER_IDENTIFIER
And CUSTOMER.LANGUAGE_NAME = [Enter Language]

This provides us with a basic solution to the multi lingual problem but we will end up multiplying the storage requirements for our dimension tables by the number of languages we are supporting. To resolve this issue we create an out rigger dimension that holds the multi language text and only have the non language dependant attributes in our customer dimension as shown below:


Customer Identifier
Customer Name
Customer Type Code
Mailing Consent Indicator
Language Text Identifier
1
Fred Blogs
H
1
1
2
John Smith
L
1
2
3
John Doe
L
0
3
5
Jane Smith
H
1
1
6
Jane Doe
L
1
2


The rigger table we will call ‘customer text’ as that is what it holds. We will also only hold the unique sets of values so this table will have a much smaller size than was taken up by the multi language text in our original version of a multi language customer dimension. The only remaining problem with this approach is that we will need to create a composite key based on the Text Identifier and the Language Name, an example is shown below:


Text Identifier
Customer Type Description
Mailing Consent Description
Language Name
1
High
Yes
English
2
Low
Yes
English
3
Low
No
English
1
Haut
Oui
French
2
Bas
Oui
French
3
Bas
Non
French
1
Hoch
Ja
German
2
Neidrig
Ja
German
3
Niedrig
Nein
German

So to access our new customer dimension and it’s associated out rigger table we would use a SQL statement similar to the following:

Select FACT.VALUE,
           CUSTOMER.CUSTOMER_NAME,
           CUSTOMER_TEXT.CUSTOMER_TYPE_DESCRIPTION
From FACT,
          CUSTOMER,
          CUSTOMER_TEXT
Where FACT.CUSTOMER_IDENTIFIER = CUSTOMER.CUSTOMER.IDENTIFIER
And CUSTOMER.CUSTOMER_IDENTIFIER = CUSTOMER_TEXT.TEXT_IDENTIFIER
And CUSTOMER_TEXT.LANGUAGE_NAME = [Enter Language]

Obviously the table structures can be tided up a bit but the examples above should be enough to provide a design approach for handling multiple languages within your data warehouse.

Comments

Popular Posts