A recent project reminded me of the real-world fact that masterdata is never clean. We were told that the city field is a free-text box, so a "city" can turn out to be a street, a district or even a province. Of course, this creates a challenging downstream impact on your data analytics.
Masterdata POV (Point of View)
- Decide the POV of masterdata. One system's masterdata could be another system's transactional data, and vice versa.
- Use a reference point and communicate a common language of masterdata.
- Knowing your end game determines which masterdata to collect, and from which POV.
Tips to clean your masterdata
- If more than 50% of your masterdata needs cleansing, it is worthwhile to drop this masterdata.
- Know what to clean, and do not clean just for the sake of cleaning.
- Clean masterdata exhibits consistent patterns, while unclean masterdata is total chaos.
- Know your domain well to clean effectively!
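To make the 50% tip concrete, here is a minimal Python sketch of how one might estimate the share of values that need cleansing. It assumes a reference list of valid city names is available. The names `REFERENCE_CITIES`, `normalize` and `cleansing_ratio`, as well as the sample values, are hypothetical and only for illustration.

```python
# Hypothetical reference list, stored already normalized (lowercase, single spaces).
# In practice this would come from whatever reference point the team has agreed on.
REFERENCE_CITIES = {"london", "paris", "berlin"}


def normalize(value: str) -> str:
    """Collapse whitespace and ignore case so trivial variants compare equal."""
    return " ".join(value.split()).casefold()


def cleansing_ratio(values, reference):
    """Return the share of values that do not match the reference list."""
    if not values:
        return 0.0
    dirty = [v for v in values if normalize(v) not in reference]
    return len(dirty) / len(values)


# Sample free-text "city" entries: some are really a street, a district or a province.
cities = ["London", " paris ", "12 Baker Street", "Greater London", "BERLIN", "Bavaria"]

ratio = cleansing_ratio(cities, REFERENCE_CITIES)
should_drop = ratio > 0.5  # the 50% rule of thumb from the tips above

print(f"{ratio:.0%} of values need cleansing; drop candidate: {should_drop}")
```

Values that differ only by case or stray spaces are treated as clean here, in the spirit of not cleaning for the sake of cleaning. What counts as "needs cleansing" is a domain decision, so the matching rule would need adjusting for real data.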
Cleaning masterdata is an iterative process. You get better and more resilient with practice. A good data sense is also an advantage. Good luck cleaning and may the force be with you!