Extending functionality using Command Tasks : Informatica.

Martin posted an interesting question in the comments section here. It belongs to a common class of questions about how we can add abilities such as file handling, processing, and similar tasks in Informatica, so I decided to extend the discussion into a post of its own.

This post explains how you can use Informatica’s command task in such scenarios. You can see another example here, where we enhance the event wait functionality to answer better queries and reduce maintenance.

Just to be clear, I’ll repeat the requirement.

Every day before the workflow runs, check the two files “file1” and “file2” for differences. If they differ, the workflow should fail and send a failure email. Otherwise, it should continue with the rest of the workflow.

This is a pretty simple task for the Unix compare command. Here is the shell script that implements the above requirement and writes details to the workflow log file. (The messages will not be visible from the Workflow Monitor, but if you check the actual workflow log on the file system, you’ll be able to see them.)
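The script itself is not reproduced in this copy of the post, so what follows is only a minimal sketch of what such a check could look like, not the original. It assumes `cmp -s` as the compare command, and the function name `compare_files` and the exit codes are hypothetical choices. It simply echoes its messages; where that output ends up depends on how your environment captures the command's output. Note that a top-level script has to finish with `exit`, because `return` is only valid inside a function.

```shell
#!/bin/sh
# Sketch only: compare two files and signal the result through the exit status.
# compare_files FILE1 FILE2 (hypothetical helper)
#   returns 0 if the files are identical
#   returns 1 if they differ
#   returns 2 if either file cannot be read
compare_files() {
    file1=$1
    file2=$2

    for f in "$file1" "$file2"; do
        if [ ! -r "$f" ]; then
            echo "ERROR: cannot read $f"
            return 2
        fi
    done

    # cmp -s prints nothing; it only sets the exit status.
    if cmp -s "$file1" "$file2"; then
        echo "INFO: $file1 and $file2 match"
        return 0
    fi

    echo "ERROR: $file1 and $file2 differ"
    return 1
}

# When called with two arguments, act as the script the workflow would run.
# A non-zero exit status is what marks the command as failed.
if [ "$#" -eq 2 ]; then
    compare_files "$1" "$2"
    exit $?
fi
```

Saved under a name of your choosing (for example a hypothetical `check_file_diffs.sh`), it would be called with the two file paths as arguments.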

The second part involves calling this script from Informatica and making sure the workflow fails when we see a diff in the files.
I’m going to extend the workflow I set up in the email task example. Here is how the initial workflow looks.

Initial workflow before adding the command task

Steps to add the Command task:

1. In Workflow Manager, open the folder that contains your workflow and then open the workflow.
2. Once you are in the Workflow Designer tab, click on Task > Create task from the menu bar.
3. Select “Command Task” from the drop-down and name it “cmd_check_file_diffs”. Click Create and then Done.
4. Double-click the newly created task and set the following properties.

General Tab: check “Fail parent if this task fails”.
Properties Tab: check “Fail task if any command fails”, Recovery Strategy: “Restart task”.
Commands Tab: add a new command and name it “Check_File_Diffs”. In the command section, enter the Unix command with its full path name.
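The command you enter is just the script's absolute path followed by its arguments. Before wiring it into the workflow, it may be worth confirming from a shell that the script exits with the status you expect, on the assumption that “Fail task if any command fails” is triggered by a non-zero exit status. A small sketch; `run_and_report` and the paths in the comment are hypothetical.

```shell
# Hypothetical command line for the Command task (paths are placeholders):
#   /path/to/scripts/check_file_diffs.sh /path/to/file1 /path/to/file2

# run_and_report CMD [ARGS...]
# Runs a command and prints the exit status the caller would see.
# The "would succeed / would fail" wording assumes that a non-zero
# exit status is what fails the task.
run_and_report() {
    status=0
    "$@" || status=$?
    if [ "$status" -eq 0 ]; then
        echo "exit status 0: the task would succeed"
    else
        echo "exit status $status: the task would fail"
    fi
    return "$status"
}
```

For example, `run_and_report /path/to/scripts/check_file_diffs.sh /path/to/file1 /path/to/file2` prints the status and passes it through unchanged.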

Adding script to command task

Connect the newly created command task to the rest of the workflow and add the link conditions as needed.

Adding command task in the workflow

Once this was done, I created the following two files to simulate a difference between them.

Run the workflow. Here is the content from the workflow log when I look at it in the file system.

As you can see, the following lines have been added to the log as part of the shell script.

To test the case where the files match, I copied file1 to file2 in Unix.
In this case, the contents of the workflow log look like this (and the workflow proceeds to the next task).
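Both test cases can also be reproduced from a shell before touching the workflow. The sketch below uses a scratch directory and made-up file contents (the contents of the real test files are not shown here), and calls `cmp -s` directly as a stand-in for whatever compare command your script uses.

```shell
# Scratch directory with hypothetical file contents.
workdir=$(mktemp -d)
printf 'alpha\nbeta\n'  > "$workdir/file1"
printf 'alpha\ngamma\n' > "$workdir/file2"

# Case 1: the files differ, so cmp -s exits with status 1.
differ_status=0
cmp -s "$workdir/file1" "$workdir/file2" || differ_status=$?

# Case 2: copy file1 over file2, after which cmp -s exits with status 0.
cp "$workdir/file1" "$workdir/file2"
match_status=0
cmp -s "$workdir/file1" "$workdir/file2" || match_status=$?

echo "differ case: $differ_status, match case: $match_status"
rm -rf "$workdir"
```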

I haven’t found any APIs in the Informatica documentation that let the developer add INFO/WARNING messages to the log from a shell script. The messages I showed above are only available on the file system and cannot be seen when you view the log from the Workflow Monitor. You can add INFO messages from Java transformations using the logInfo() method, but adding them from an external script doesn’t seem possible. If you know of any way to do that, please post it in the comments.

Finally, keep in mind that this is a task that you are developing outside Informatica. You must make sure your shell script does proper exception handling and logging, and test it thoroughly. Obviously, Informatica will not be able to help you there 🙂

  • Jesse

    You can get command errors to show in the session log if you use pre or post-session commands on a session task instead of using a command task itself.

    • Jesse – Thanks for the comment. You are right.
      But for scripts that have some complex functionality (spanning more than a couple of lines), I usually prefer writing a stand-alone script, in which case I take this approach.

      For one-liners (like the ones common in sed/awk), command tasks are definitely the way to go.

  • Goutham

    Hi,

    I created a script as given above, but it says the below:

    cannot return when not in function

    I did not understand this. Can you please tell me what went wrong?

    Thanks,
    Goutham

  • alex

    I have a script that finds and deletes files that match a certain regular expression. When I run the workflow, it runs the script, but the script fails because you can’t remove non-empty directories in Linux. Nothing gets cleaned up in the directory, and it won’t write to the log file. What am I doing wrong and how can I fix it?

    • Hi Alex,

      If you are trying to delete a non-empty directory, you can use the “-r” option (rm -r dir_name). This is recursive, so BE VERY CAREFUL about the directory path.

      However, I rarely find the need to delete directories as part of daily ETL processing. If you are trying to delete files and your script is failing, it is quite probably a permissions issue (not because the directory is non-empty). Check the permissions for the files using “ls -ltr” and change them using the “chmod” command, if necessary.
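For a cleanup like the one discussed in this thread, one way to avoid handing directories to rm at all is to restrict the match to regular files. This is a minimal sketch under stated assumptions, not Alex's script: `delete_matching_files` is a hypothetical helper, and it matches a shell glob via `find -name` rather than a regular expression.

```shell
# delete_matching_files DIR PATTERN (hypothetical helper)
#   returns 0 if every matching file was removed (or none matched)
#   returns 1 if some file could not be removed (e.g. a permissions problem)
#   returns 2 if DIR is not a directory
delete_matching_files() {
    dir=$1
    pattern=$2

    if [ ! -d "$dir" ]; then
        echo "ERROR: $dir is not a directory" >&2
        return 2
    fi

    # -type f limits the match to regular files, so rm is never handed a
    # directory. With "-exec ... {} +", find exits non-zero if any rm
    # invocation fails, which lets the caller detect the failure.
    if find "$dir" -type f -name "$pattern" -exec rm -f {} +; then
        echo "INFO: removed files matching $pattern under $dir"
        return 0
    fi

    echo "ERROR: could not remove some files matching $pattern under $dir" >&2
    return 1
}
```

Quote the pattern when calling it, for example `delete_matching_files /path/to/dir '*.tmp'`, so the shell does not expand it before find sees it. The search descends into subdirectories but leaves the directories themselves in place.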