Informatica — Lookup Transformation (“Use any value”)!

For those of you who have limited/no Informatica Power Center experience before, this is one of the options you can select when you use a look up transformation in your mapping.

I’ll first explain the background as to why this is a serious problem and then move on to explaining what it has to do with the Informatica look up transformation.

Database – Dupes, Rownum/Top and Distinct

If you have worked with Databases and Queries , you might have seen the wrong usage of the DISTINCT keyword (or) “where rownum = 1” clause for Oracle and “Top” clause for SQL Server.

Developers sometimes have the tendency to eliminate duplicate rows by using these “shortcuts” instead of debugging the actual reason as to why the result set has duplicates. Sometimes, the query is generated automatically by a reporting tool which might have this option enabled. You should actually be happy if this code/logic fails soon enough and you catch it during development.

Here is a small example.

Consider the following customer dimension with just enough columns to illustrate the case that we are dealing with.

Here are the contents. As you can see, the table has names and states for different customers.

To get the current address for a given customer, the correct query would use the active_flag or the “eff_end_date” logic as below.

The following query would still give a result without an error (or) warning, but the result set is not correct and consistent. What’s even worse is the fact that you might get the correct result sometimes.

What’s all this got to do with Informatica’s Lookup Tranformation ?

Almost everything….”Use any value” is the developer’s version of saying “I dont know/care why there are duplicates..” in Informatica. Worst yet, some times, you do not realize you are doing this mistake since you don’t see the underlying code.

For those of you who haven’t worked with it before, Informatica’s lookUp transformation does what the name indicates. It does a lookup on a table based on the lookup condition. For the example above, if you need the state for a a given customer, you’ll typically do a lookup on CUSTOMER_DIM which would be something like this.

As you can see, the Default option is to “Use any Value” and not raise an error. you might get (un)lucky and see the right state sometimes, but this will eventually start throwing up errors..and Of course, this option/approach is downright wrong.

I therefore suggest you select the “Report Error” option by default and understand why you see duplicates and change the condition/ SQL Override Query if needed.

If you have come across environments where it makes sense to always use “Use any value” by default, Please post your comments and I’ll update the post with those specific scenarios if needed.

Hope this helps..