Informatica — Lookup Transformation (“Use any value”)!

For those of you who have limited/no Informatica Power Center experience before, this is one of the options you can select when you use a look up transformation in your mapping.

I’ll first explain the background as to why this is a serious problem and then move on to explaining what it has to do with the Informatica look up transformation.

Database – Dupes, Rownum/Top and Distinct

If you have worked with Databases and Queries , you might have seen the wrong usage of the DISTINCT keyword (or) “where rownum = 1” clause for Oracle and “Top” clause for SQL Server.

Developers sometimes have the tendency to eliminate duplicate rows by using these “shortcuts” instead of debugging the actual reason as to why the result set has duplicates. Sometimes, the query is generated automatically by a reporting tool which might have this option enabled. You should actually be happy if this code/logic fails soon enough and you catch it during development.

Here is a small example.

Consider the following customer dimension with just enough columns to illustrate the case that we are dealing with.

Here are the contents. As you can see, the table has names and states for different customers.

To get the current address for a given customer, the correct query would use the active_flag or the “eff_end_date” logic as below.

The following query would still give a result without an error (or) warning, but the result set is not correct and consistent. What’s even worse is the fact that you might get the correct result sometimes.

What’s all this got to do with Informatica’s Lookup Tranformation ?

Almost everything….”Use any value” is the developer’s version of saying “I dont know/care why there are duplicates..” in Informatica. Worst yet, some times, you do not realize you are doing this mistake since you don’t see the underlying code.

For those of you who haven’t worked with it before, Informatica’s lookUp transformation does what the name indicates. It does a lookup on a table based on the lookup condition. For the example above, if you need the state for a a given customer, you’ll typically do a lookup on CUSTOMER_DIM which would be something like this.

As you can see, the Default option is to “Use any Value” and not raise an error. you might get (un)lucky and see the right state sometimes, but this will eventually start throwing up errors..and Of course, this option/approach is downright wrong.

I therefore suggest you select the “Report Error” option by default and understand why you see duplicates and change the condition/ SQL Override Query if needed.

If you have come across environments where it makes sense to always use “Use any value” by default, Please post your comments and I’ll update the post with those specific scenarios if needed.

Hope this helps..
Rajesh.

  • ak

    Good point.. keep up the good work

  • Manoj

    Does this mean sometimes the lookup would return state as ‘MI’ and sometimes as ‘FL’?

    • Rajesh

      Yes. It is very much possible.
      Some of the issues I have worked on in the past have been because of this problem, where the report still shows old value in some cases and new value in others.

      • Manoj

        so u mean to say that by default it should be “Report error” correct?

        • Rajesh

          Yes.

  • Vadim

    If you loading your Type2 target incrementally, you should use active_flag = ‘Y’ in Lookup Source Filter. In that case, multiple match MUST report error, indicating you have multiple records in the target with active_flag = ‘Y’, which would be erroneous.

    For historic back-population you would have to have more complex condition with the effective dates.

    Vadim.

    • Rajesh

      Vadim,

      Either way, You de-activate the latest record and insert a new one. How you identify the latest active record is a technique that varies. Most places have active_ind and end_date and keep them in sync.

      In both cases, I believe the code should error-out when there are multiple active records.

  • Hemant Raut

    Hi Rajesh,
    I have a scenario where I need to pass one column as source to the look up table but that column is not present in the look up table i mean i need to write a look up sql override query to show the relation between these two column(i.e. source column and input port of the look up) so in look up condition it is not allowing me to set a condition with the required source column as that column is not there is look up table.
    so my question is how i can solve this and if it is possible with look up any value option will it throw an error?

    • Rajesh

      Pretty common “situation” with Informatica look-ups. You can still create a dummy source/target definition and use it in your query though. If your table has only first_name and last_name and you have the input as Name, you can create a dummy source/target with the required definition.as long as our overide query runs correctly, you should be fine.

  • Srinivasarao Kotipatruni

    Hi Rajesh,
    I have a situation where I need to pass in_org_num,in_field2, in_field3 as i nput to a lkp transformation. But, lkp table has only field2, field3 columns and not org_num column. Output shold be all the three: org_num, f2,f3 with only f2,f3 involving in the lkp condition. If a match is not found with f2,f3. Then, what is d value for org_num in output.

  • Venkatesh

    will it takes default value in unconnected lookup t/r?

    • @Venkatesh – From the documentation…”If the lookup query fails to return a value, the port receives a null value”. If you need a default value, you can wrap the lookup expression inside a IFNULL expression. Thanks for pointing this difference out. I’ll update the post.