Friday, September 21, 2012

MDM Architecture (or FAD)

I've been away from the blogosphere for a while as I am currently working on a new book on the subject of Master Data Management. One of the ideas I've been developing concerns the key principles that underpin an MDM solution. I believe these principles can be summarised in four words: Flexibility, Authority, Decoupling and Security, or FADS for short. The word FADS is apt because it reminds us that unless we adhere to the underlying principles of the MDM architecture, the MDM component will become an outdated concept (or a fad).


The MDM architecture should have flexibility built into it. By this I mean the following:

·         The design of the solution should be able to handle changes to master data, master data definitions, business needs and legal requirements.

·         It should make use of industry standards to ensure that the use of multiple technologies (and changes in technology) can be accommodated. This will also make rollout easier, as less pain will be felt integrating with the variety of vendor software packages typically found across most organisations. Standards don’t need to be technology based; they may be coding standards from standards bodies such as ISO.

·         It should be based on the concept of re-use. This means that sub-components should be designed to be re-usable, and existing systems should be re-used within the MDM architecture where they can add value.

·         It should be possible to incrementally develop the MDM solution rather than have to embark on a big bang approach. This allows business value to be realised quickly whilst building towards the final end state.

·         It should be based upon consistency in the architecture. By this I mean that the architecture should be based upon the whole end state, at least at a logical and conceptual level, reducing design conflicts.


For our MDM solution to be valuable to the organisation and treated as the single, or golden, source of truth, it needs to have authority. By this I mean:

·         Not only should it handle the dissemination of master data across the organisation, it should disseminate data that everyone believes/knows is correct and has known provenance.

·         The master data should also have identifiable business value. There is no point in building an MDM system as a pure IT initiative as the value of master data is in its business usage.

·         Ownership of data needs to be clear and managed through the system.


Master data is used throughout an organisation, so it becomes important to create an architecture that can handle the variety of technologies, and changes of technology, both within the MDM environment and within the wider enterprise. A decoupled architecture gives us a framework that allows components to remain autonomous and unaware of each other. By decoupling the MDM solution we make it available as a data asset that can be used by any part of the organisation.
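To make the idea concrete, here is a minimal sketch of decoupling in code (all class and function names are invented for illustration, not taken from any particular MDM product): consumers depend on a stable interface, never on the underlying store, so the technology behind it can change without touching them.

```python
from abc import ABC, abstractmethod

class MasterDataService(ABC):
    """Stable contract that consumers depend on; storage details stay hidden."""
    @abstractmethod
    def get_customer(self, customer_id: str) -> dict: ...

class SqlMasterDataService(MasterDataService):
    """One possible backing store; it could be swapped without touching consumers."""
    def __init__(self, records: dict):
        self._records = records  # stand-in for a real database connection
    def get_customer(self, customer_id: str) -> dict:
        return self._records[customer_id]

def billing_report(service: MasterDataService, customer_id: str) -> str:
    # The consumer knows only the interface, not the technology behind it.
    customer = service.get_customer(customer_id)
    return f"Invoice for {customer['name']}"

service = SqlMasterDataService({"C001": {"name": "Acme Ltd"}})
print(billing_report(service, "C001"))  # Invoice for Acme Ltd
```

Swapping `SqlMasterDataService` for, say, a service-based implementation leaves `billing_report` untouched, which is the autonomy the principle is after.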


Under secure we have a number of areas to consider:

·         System security so that the users (and interacting systems) are only allowed to access data that they have authority to access in the mode they have authority to work in.

·         Data privacy issues need to be identified and managed, along with legal obligations such as those under data protection legislation (e.g. the UK Data Protection Act).
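As a hedged sketch of the first point above (users and systems only operating on data in the modes they are authorised for), a permission check might look like the following; the roles, data sets and access modes are invented examples, not from any real product.

```python
# Each entry maps a (role, dataset) pair to the access modes it is allowed.
PERMISSIONS = {
    ("hr_clerk", "employee_master"): {"read"},
    ("hr_admin", "employee_master"): {"read", "write"},
}

def is_allowed(role: str, dataset: str, mode: str) -> bool:
    """True only if the role holds the requested access mode on the dataset."""
    return mode in PERMISSIONS.get((role, dataset), set())

assert is_allowed("hr_admin", "employee_master", "write")
assert not is_allowed("hr_clerk", "employee_master", "write")  # read-only role
assert not is_allowed("unknown", "employee_master", "read")    # unknown role: deny
```

The important design choice is deny-by-default: anything not explicitly granted is refused.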

Thursday, August 16, 2012

Master Data Management Marketplace

There has been much noise about MDM (Master Data Management) in the IT press and among the clients I talk to. Because of this I thought it would be interesting to put a poll out to see the real state of the marketplace. I will endeavour to publish the results at some point in the future.

Monday, July 16, 2012

How much master data do you have?

Organisations are storing and creating more and more data year after year. This is an ever-increasing problem. So how much master data does a typical organisation actually have? Let’s look at two examples:
Example 1/ A local government organisation for a city or a state/county. In the UK we call these county councils. One of these probably has a budget of about £350 million per year and would likely have at least 10 different key databases for areas such as HR, accounts, social care and planning. In addition to the 10 there is probably a multitude of satellite systems sitting on people’s desks or used by small teams, but probably not known about or managed by any central IT function; let’s assume 20.
Let’s make a few assumptions about these systems which we can use to give a sense of the data we need to worry about. Assume each table within a database has on average 20 attributes, each main system has 50 tables and each satellite system has 10 tables. This means we get the following:
Main Systems
·         No of Key Database systems = 10                   
·         No of Tables = 50
·         No of Attributes = 20                                           
·         Total number of Attributes = 10,000 (10*50*20)
Satellite Systems                
·         No of Satellite database systems = 20          
·         No of Tables = 10
·         No of Attributes = 20                                           
·         Total number of Attributes = 4,000 (20*10*20)
·         Grand Total = 14,000 attributes (10,000 + 4,000)
Example 2/ What about a large corporate? If we assume the tables are of a similar size, the difference is the number of systems and the complexity of those systems (number of tables). With some estimated numbers we get something like the following:
Main Systems
·         No of Key Database systems = 25                   
·         No of Tables = 100                                                 
·         No of Attributes = 20                                           
·         Total number of Attributes = 50,000 (25*100*20)
Satellite Systems                
·         No of Satellite database systems = 100        
·         No of Tables = 10
·         No of Attributes = 20                                           
·         Total number of Attributes = 20,000 (100*10*20)
·         Grand Total = 70,000 attributes (50,000 + 20,000)
Using our two examples we can now ask how much of this data is likely to be master data. Without going into specifics, let’s assume only 10% of our data can be classified as master data. This means we have 7,000 attributes for the large corporate and 1,400 attributes for the local government organisation to worry about. Quite a lot, I think.
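The estimates above can be reproduced with a few lines of arithmetic; the figures are the assumed ones from the two examples, not measured values.

```python
def total_attributes(systems: int, tables: int, attrs_per_table: int) -> int:
    """Rough estimate: systems x tables x attributes per table."""
    return systems * tables * attrs_per_table

# Example 1: county council (10 main + 20 satellite systems)
council = total_attributes(10, 50, 20) + total_attributes(20, 10, 20)

# Example 2: large corporate (25 main + 100 satellite systems)
corporate = total_attributes(25, 100, 20) + total_attributes(100, 10, 20)

master_share = 0.10  # assume 10% of attributes qualify as master data
print(council, int(council * master_share))      # 14000 1400
print(corporate, int(corporate * master_share))  # 70000 7000
```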

How big is your enterprise? I would be interested in readers’ views on this.

Wednesday, April 11, 2012

A few Predictions for Data in 2012

At the beginning of January I wrote a blog post that gave an update on my predictions for 2011. Overall my predictions were not bad: out of a possible 80 I scored 66, which means I got approximately 82% of my predictions correct. Christmas seems a faraway memory (the snow has all gone and in fact it’s now April), so I felt it was necessary to get this year’s predictions out before we end up reaching the summer.
I’m going to stay with some of last year’s predictions and add a few extras that I believe are up-and-coming data issues. So the ones that stay are:
1/ Data Privacy and Data Security will grow in importance
2/ Hardware and software for managing data will adapt to the data tsunami from the corporate world.
3/ Demand for Data Experts will grow but will be hampered by available resource.
4/ Data Governance will be a hot topic.
My new predictions to add to the list are as follows:
5/ Big Data
There are a couple of terms that have hit the press over the last few years and grown in recognition among senior IT and even non-IT management. The first one is big data.
What is big data and why has it become so important? Big data is a general term used to describe the huge amounts of data a company creates, be it unstructured, semi-structured or structured. A few key points:
  • It’s all about being able to use and analyse these vast quantities of data.
  • The term ‘BIG’ doesn't refer to any specific volume of data.
  • It’s generally accepted that unstructured data, most of it located in text files, accounts for at least 80% of an organization’s data.
  • If left unmanaged, the sheer volume of data that’s generated each year can be costly (in terms of storage and management) for a company.
  • Unmanaged data can also pose a liability in terms of compliance or lawsuits if it cannot be found when needed.

Big data is important not because it represents any new idea; to be blunt, we had VLDBMS (Very Large Data Base Management Systems) many, many years ago. What big data provides is a new way of explaining and branding the problems that organisations face. This doesn’t mean it’s not important; I actually feel that people’s ability to comprehend a problem is just as important as technical breakthroughs.
I predict that big data will continue to grow in importance as a term that senior executives can hang their hat on (by this I mean use the term to justify project budgets).
6/ Data Scientist
The second phrase is ‘Data Scientist’. This one I struggle with a little, as really all it represents is a re-branding of the data analyst role, yet it gets heavily confused with data architecture. I believe the re-branding is really recognition that, as data volumes and complexity grow, understanding and investigating the data has become a larger and more important role.
7/ UK Commercialisation of data
Within the UK we are undergoing an information revolution as regards healthcare data. The NHS (the UK’s government-run health service) is steadily increasing the amount of data available and the ways it can be accessed. This is all part of the UK growth strategy: on one hand to encourage investment from healthcare companies in the UK due to the availability of this type of data, and on the other the government has woken up to the fact that it has a valuable asset which it should try to monetise.
8/ Government Data Sharing
Fraud detection will go big brother. The UK government and the National Fraud Authority will take us down the road of sharing data across government and private industry to enhance the detection of fraud.
The government plans to share the personal information it holds with businesses such as banks, insurance companies and credit reference agencies. This data will in all likelihood come from the Department for Work and Pensions (DWP) and HM Revenue & Customs (HMRC).
9/ The Data Cloud
The Cloud has become big news over the last few years and I assume it will grow in significance over the next few. One of the growing trends is for companies to host their data systems in the cloud. Whilst I am sure this trend will grow, I believe that over the next 18 months the honeymoon period will end and there will be a realisation that using the cloud doesn’t solve data problems, it just brings new ones. Questions around data security, the performance of environments used to host data, and contractual niceties around data ownership all start to become bigger problems.

Tuesday, January 24, 2012

Redundant Data

Redundant data can be both a benefit and a problem for organisations, but ultimately it means we have more data to manage. Some redundancy is clearly required for backup and optimisation reasons, but unknown or inconsistent redundancy is a disaster.
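As an illustrative sketch of why unknown redundancy is dangerous, the snippet below (using made-up customer records) flags duplicate rows that differ only in formatting, exactly the kind of inconsistent redundancy that tends to go unnoticed.

```python
def normalise(record: dict) -> tuple:
    """Crude normalisation so formatting differences don't hide duplicates."""
    return tuple(str(v).strip().lower() for v in record.values())

customers = [
    {"name": "John Smith", "postcode": "AB1 2CD"},
    {"name": "john smith ", "postcode": "ab1 2cd"},  # same person, different formatting
    {"name": "Jane Doe", "postcode": "XY9 8ZW"},
]

seen, duplicates = set(), []
for record in customers:
    key = normalise(record)
    if key in seen:
        duplicates.append(record)  # redundant copy of a record already seen
    seen.add(key)

print(len(duplicates))  # 1
```

A real matching engine would use far more sophisticated rules, but even this toy version shows that a naive exact comparison would have missed the duplicate.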

I would like to get some sense of the level of redundancy of data that exists in organisations today, hence the poll below. Please be kind enough to complete it. I will hopefully be posting the results on this blog in a few months.

Many thanks for your time and effort.