Workflow dependency in Informatica: Part-1: Event Wait Files

This article explains how to manage basic workflow dependencies in Informatica using event wait files.
In a later post, I'll explain why load status entries are better suited for complex scenarios and real applications, and how you can simulate the same "event-wait" style behavior using them.

1. Scenario
2. Implementation
3. Other Caveats (Delete event wait files)
4. Issues and Drawbacks.
5. Alternative solution (Load Status entries).

1. Scenario

The typical use case for event wait files is when one workflow depends on another.

For example, let's say you have two workflows, wkf_Customer_Data and wkf_Sales_Data, that load Customer and Sales data respectively. Let's assume the requirement is to start wkf_Sales_Data only after wkf_Customer_Data is complete.

2. Implementation:

The first step in this case is to generate an event file (also sometimes called a trigger file) after the workflow wkf_Customer_Data completes. On Unix-based systems, you can do this by creating a Command task in Informatica and using a command like the ones shown below inside the task.
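
For instance, either of these would do; the trigger directory is just a placeholder, so use whatever standard location your environment has:

# Create an empty trigger file; its modification time is the current date/time
touch /infa/trigger/wkf_Customer_Data.done
# Or embed the run date in the file name instead
touch /infa/trigger/wkf_Customer_Data_$(date +%Y%m%d).done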

Either of the above commands creates a file carrying the current day's time stamp in the specified directory. This file acts as the trigger for the next process.

You can add this command inside a Command task from the Workflow Manager and connect it to your workflow as shown below.

The next step is to create an Event Wait task in the wkf_Sales_Data workflow that waits on this file.

When you schedule both workflows, the Sales Data workflow waits until the Customer workflow is complete and the trigger file is generated. Here are the screenshots of the Workflow Monitor and the entries from the log file.

3. Other Caveats (Delete event wait files)

The next important thing to remember is to archive or purge the trigger files, so that the next day's load does not proceed based on a stale trigger file (one that is a day/week/month old, depending on your load frequency). If you simply want to delete the file, you can do that by checking the delete option at the Event Wait task level; if you would rather archive it, a small command will do, as sketched below.
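
A minimal sketch of archiving the previous run's trigger file; the paths are placeholders:

# Move the trigger file to an archive folder, stamping it with the run date
mv /infa/trigger/wkf_Customer_Data.done /infa/trigger/archive/wkf_Customer_Data_$(date +%Y%m%d).done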

4. Issues and Drawbacks.

a. Multiple dependent workflows:

The existing command task in wkf_Customer_Data has to be changed to generate multiple trigger files, one per dependent workflow, each with a unique name. This is usually handled by naming the trigger files with the convention parent_wkf_name_dependent_wkf_name.trg. Let's say two other workflows (wkf_product_data and wkf_month_end_load) depend on the customer workflow. You would add two more commands to the customer workflow and edit the existing one, ending up with something like the commands shown below.
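
A rough sketch, with the trigger directory again being a placeholder:

# One trigger file per dependent workflow
touch /infa/trigger/wkf_customer_data_wkf_sales_data.trg
touch /infa/trigger/wkf_customer_data_wkf_product_data.trg
touch /infa/trigger/wkf_customer_data_wkf_month_end_load.trg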

Again, each of these trigger files must be deleted/purged/archived accordingly once its dependent workflow has consumed it.

b. No querying capabilities:

Questions like the ones below come up frequently and are difficult (though not impossible) to answer with trigger files, even when you archive them at the end of the load.

1. When does this workflow complete every day?
2. What is the average load time for this workflow, given that the load starts at 10:00 AM every day?
3. When does it complete on weekends and on month-ends?
4. How do we set up an alert to notify a team if the workflow hasn't completed by a given time?
5. Was this always a daily process? When was it changed to a weekly process?
6. What are the dates on which this particular workflow did not run last year?

5. Alternative solution (Load Status entries).

The solution to these and many other similar issues (when working with files!) is to use database table entries, also sometimes called load status entries.

I will discuss the implementation and share the code snippets needed for it in a later post.

-Rajesh.

  • Raveendra

    Hi,
    This is very helpful, but please let me know: what are the file extensions, i.e. .done and .trg?
    If we create a single trigger file, is the extension .done,
    and if we create multiple files, is the extension .trg?

    Please clarify this for me.

    What are the extensions for trigger files, and is the extension based on the number of trigger files we generate? Is it mandatory to specify the file extension?

    • Rajesh

      @Raveendra – The file extension does not really matter. All we are checking for is if the file exists for a given day. And it does not depend on the number of files either.
      You can use .done (or .trg) for all of the scenarios. I just wasn’t consistent in my example 😉

      • Zip

        Dear Rajesh,
        Thanks for the great post.
        I have a question, please help.
        In the same scenario, if I would like to wait for the trigger file for 2 hours and the file does not get generated, the workflow should fail.

        Regards
        ZIP

  • Sidd

    Can we abort the workflow if the file does not arrive in two hours using the workflow itself? I know there is a way to do it using a shell script.

    • Rajesh

      Hi Sidd,

      Please refer to the follow-up article at http://www.etl-developer.com/2011/04/workflow-dependency-in-informatica-part-2-load-status-entries/.
      The example I show there uses a table query to see if the count is greater than 1 and fails the workflow if the record is not found by a given point in time.

      You can implement the same behavior to wait for a file: check whether the file exists (if [ -f $FILE ]) and return zero (success) if it does. If the file does not exist and the cutoff time has passed, return a non-zero error code and select "Fail parent if this task fails" at the command task level. A rough sketch follows.
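
      A minimal sketch of that wait-with-timeout, with the file path and the two-hour window as placeholders:

      #!/bin/sh
      # Poll for the trigger file once a minute, for at most 2 hours (120 attempts)
      FILE=/infa/trigger/wkf_Customer_Data.done
      i=0
      while [ $i -lt 120 ]; do
          if [ -f "$FILE" ]; then
              exit 0    # file arrived; let the workflow continue
          fi
          sleep 60
          i=$((i + 1))
      done
      exit 1            # file never arrived; the non-zero exit fails the command task (and the parent workflow)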

      -Rajesh.

  • Martin

    What do I need to do if I want to exit a workflow when a specified file EXISTS?
    So, the other way around: I don't want to continue the workflow if I find a STOP file, for example one created during a (UNIX shell) cmp -s file1 file2 > file.diff.
    In case both files are O.K., no file.diff is created; in case the files differ, a file.diff is created and I want to exit the WF.
    Thanks,
    Martin

    • Rajesh

      If you just want to fail the workflow when there are differences, you can do that using a shell script like this.

      #!/bin/sh
      # cmp -s prints nothing; it returns a non-zero exit status when the files differ
      if ! cmp -s file1 file2; then
          exit 1 # any non-zero error code fails the task
      fi

      You can then call this using a command task in your workflow. (Check the following options: "Fail parent if this task fails", "Fail task if any command fails", "restart task".) Now every time there is a file difference, the command task returns exit code 1 and fails, and so will the parent workflow.

      If you need more control over which sessions to run when you see differences, you'll have to use pmcmd commands in your script, along the lines of the example below.
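
      For instance, something like this could start a dedicated workflow when differences are found (the service, domain, user, folder, and workflow names are all placeholders; check the pmcmd reference for your version for the exact options):

      # Start a separate "handle differences" workflow via pmcmd
      pmcmd startworkflow -sv INT_SVC_DEV -d Domain_Dev -u infa_user -p infa_pwd -f DIFF_HANDLING wkf_handle_file_diff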

  • Rajkumar

    Thanks for the nice article. In this example a Unix command is used to create the event trigger file. How could this be done in a Windows environment? I am learning Informatica on Windows, so please provide me with that information; it would be really helpful. Thanks for your help in advance.
    Regards
    R

  • Prasad

    Thanks a lot Rajesh,

    Very nice article. As I am new to Informatica, it is very helpful for me.

    Regards
    Prasad