This article explains how to manage basic workflow dependencies in Informatica using event wait files.
In a later post, I’ll explain why load status entries are better suited to complex scenarios and real applications, and how you can simulate the same “event-wait” style behavior using load status entries.
Event wait files are useful when one workflow depends on another.
For example, say you have two workflows, wkf_Customer_Data and wkf_Sales_Data, that load customer and sales data respectively, and the requirement is to start wkf_Sales_Data only after wkf_Customer_Data is complete.
The first step is to generate an event file (sometimes called a trigger file) once wkf_Customer_Data is complete. On Unix-based systems, you can do this by creating a Command task in Informatica and using the following command inside the task:
echo "done" > home/....../wkf_Customer_Data.done.ksh
This command creates a file, time-stamped with the current date and time, in the specified directory. It acts as the trigger file for the next process.
You can add this command to a Command task in the Workflow Manager and link the task to the end of your workflow.
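If you prefer a slightly more defensive version of the command, here is a minimal sketch for the Command task. It assumes a hypothetical trigger directory (TRIGGER_DIR, defaulting to /tmp/infa_triggers) and a hypothetical file name, wkf_Customer_Data.done; substitute whatever directory and name your Event Wait task actually watches.

```shell
#!/bin/sh
# Sketch only: TRIGGER_DIR and the file name are hypothetical; use the
# directory and file name your Event Wait task actually watches.
TRIGGER_DIR="${TRIGGER_DIR:-/tmp/infa_triggers}"
TRIGGER_FILE="$TRIGGER_DIR/wkf_Customer_Data.done"

# Create the directory if it does not exist yet; exit non-zero if that fails.
mkdir -p "$TRIGGER_DIR" || exit 1

# Write the completion time into the file. The downstream Event Wait task
# only cares that the file exists; the content is there for humans.
date '+%Y-%m-%d %H:%M:%S' > "$TRIGGER_FILE" || exit 1
```

If you only need an empty marker file, running touch on the file works just as well; writing the completion time into it simply makes the file easier to inspect later.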
The next step is to create an Event Wait task in the wkf_Sales_Data workflow that waits for this file.
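Conceptually, a file watch is a polling loop. The shell function below is only an illustrative approximation of that behavior, not Informatica’s implementation, and you do not need it inside PowerCenter because the Event Wait task does this for you. The directory, file name, poll interval, and timeout are all hypothetical.

```shell
#!/bin/sh
# Illustration only: a hand-rolled approximation of a file watch.
# All values below are hypothetical and can be overridden from the environment.
TRIGGER_DIR="${TRIGGER_DIR:-/tmp/infa_triggers}"
TRIGGER_FILE="$TRIGGER_DIR/wkf_Customer_Data.done"
POLL_SECONDS="${POLL_SECONDS:-30}"          # how often to check for the file
TIMEOUT_SECONDS="${TIMEOUT_SECONDS:-7200}"  # give up after this many seconds

wait_for_trigger() {
    waited=0
    # Keep checking until the trigger file appears or the timeout is reached.
    while [ ! -f "$TRIGGER_FILE" ]; do
        if [ "$waited" -ge "$TIMEOUT_SECONDS" ]; then
            echo "Timed out waiting for $TRIGGER_FILE" >&2
            return 1
        fi
        sleep "$POLL_SECONDS"
        waited=$((waited + POLL_SECONDS))
    done
    echo "Found $TRIGGER_FILE; the dependent load can start"
    return 0
}
```

For example, calling wait_for_trigger before starting a load blocks until the file appears, and returns a non-zero status if the timeout expires first.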
When both workflows are scheduled, the Sales Data workflow waits until the Customer workflow is complete and the trigger file has been generated. Here are the screenshots from the Workflow Monitor and the entries from the log files.
Orders workflow (starts at 11:07 PM and waits for the Customer trigger file until 11:34 PM):
Severity Timestamp Message
INFO 1/12/2011 11:07:28 PM Event Wait file watch task instance [ew_Customer_Data]: Execution started.
INFO 1/12/2011 11:07:28 PM Event Wait file watch task instance [ew_Customer_Data]: started file watch on node [XXXXXX].
INFO 1/12/2011 11:34:07 PM Event Wait file watch task instance [ew_Customer_Data]: Execution succeeded.
INFO 1/12/2011 11:34:07 PM Session task instance [s_m_test_Orders] is waiting to be started.
Customer workflow (completes the task that generates the trigger file at 11:34 PM):
Severity Timestamp Message
INFO 1/12/2011 11:34:07 PM Command task instance [cmd_workflow_Customer_Data_complete_trigger_file]: execution of command [cmd_workflow_complete_trigger_file] completed successfully
The next important thing to remember is to archive or purge the trigger files, so that the next day’s load does not proceed just because it finds an existing trigger file (one that is a day, week, or month old, depending on your load frequency). If you choose to delete the file, you can do that by enabling the delete-file option on the Event Wait task.
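If you would rather archive than delete, a small script like the following could run in a Command task after the Event Wait task succeeds (for example, in wkf_Sales_Data). This is a sketch: archive_trigger is a hypothetical helper, and both directories are assumptions. It moves the consumed file into an archive folder with a timestamp suffix, so the next run will not find a stale trigger.

```shell
#!/bin/sh
# Sketch: archive a consumed trigger file instead of deleting it.
# Directory names are hypothetical.
TRIGGER_DIR="${TRIGGER_DIR:-/tmp/infa_triggers}"
ARCHIVE_DIR="${ARCHIVE_DIR:-$TRIGGER_DIR/archive}"

archive_trigger() {
    src="$TRIGGER_DIR/$1"
    # Nothing to do if the file is already gone.
    [ -f "$src" ] || return 0
    mkdir -p "$ARCHIVE_DIR" || return 1
    # Append a YYYYMMDDHHMMSS suffix so each run's file is kept separately.
    mv "$src" "$ARCHIVE_DIR/$1.$(date '+%Y%m%d%H%M%S')"
}
```

For example, archive_trigger wkf_Customer_Data.done moves the file out of the watched directory. Keeping timestamped copies rather than deleting them also leaves a rough history of when each trigger arrived.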
Using trigger files does have some drawbacks:
a) Multiple dependent workflows
The existing Command task in wkf_Customer_Data must be changed to generate multiple files, each uniquely named for one dependent workflow. This is usually handled by naming the trigger files with the convention parent_wkf_name_dependant_wkf_name.trg. Say two other workflows (wkf_product_data and wkf_month_end_load) also depend on the customer workflow. You would edit the existing command and add two more commands to the customer workflow. The final commands would be…
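As a sketch (not necessarily the exact commands this setup would use), a single loop can write one trigger file per dependent workflow following this naming convention. The directory is hypothetical, and the list of dependents is taken from the example in this section: wkf_Sales_Data plus the two new workflows.

```shell
#!/bin/sh
# Sketch: one trigger file per dependent workflow, named
# <parent_wkf_name>_<dependent_wkf_name>.trg. The directory is hypothetical.
TRIGGER_DIR="${TRIGGER_DIR:-/tmp/infa_triggers}"
PARENT_WKF="wkf_Customer_Data"
# Workflows that wait on the customer load (from the example in the text).
DEPENDENT_WKFS="wkf_Sales_Data wkf_product_data wkf_month_end_load"

mkdir -p "$TRIGGER_DIR" || exit 1
for dep in $DEPENDENT_WKFS; do
    # Each dependent workflow watches its own file, so each one can
    # delete or archive its trigger without affecting the others.
    date '+%Y-%m-%d %H:%M:%S' > "$TRIGGER_DIR/${PARENT_WKF}_${dep}.trg" || exit 1
done
```

Each dependent workflow’s Event Wait task then watches its own file, for example wkf_Customer_Data_wkf_product_data.trg.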
Again, each of these trigger files must be deleted, purged, or archived accordingly.
b) No querying capabilities
Questions like the ones below are frequently asked, and they are difficult (though not impossible) to answer with trigger files, even when you archive them at the end of each load.
1. When does this workflow complete every day?
2. What is the average load time for this workflow, given that the load starts at 10:00 AM every day?
3. When does it complete on weekends and at month-end?
4. How do we set up an alert to notify a team if the workflow hasn’t completed by a given time?
5. Was this always a daily process? When was it changed to a weekly process?
6. On which dates did this workflow not run last year?
The solution to these and many similar issues with files is to use database table entries (also called load status entries).
I will discuss the implementation and post the code snippets needed to implement it in a later post.