Tuesday, March 20, 2012

Data quality tools

Guys,

I'm trying to implement some of the data quality techniques. I use staging tables in my ETL process.

What I'd like to do is:

- check for required fields - make sure all required fields have values in all records

- check for duplicates

- check for data formats (line phone#, zip code etc)

Obviously all of this can be in SQL.

My question is what would be the best approach in SSIS to do it?

Dima.

Dima S wrote:

- check for required fields - make sure all required fields have values in all records

The Derived Column component has an ISNULL() function.

Dima S wrote:

- check for duplicates

Aggregate transform will give you all records that contains duplicates. COUNT over all the columns in which you want to check for duplicates.

Dima S wrote:

- check for data formats (line phone#, zip code etc)

Regular expressions would be a good fit here: http://blogs.conchango.com/jamiethomson/archive/2005/07/04/1754.aspx

-Jamie

|||

Thanks!

No comments:

Post a Comment