An overview of DataStage components


The Business Intelligence (BI) market depends heavily on ETL architecture, and Extract, Transform and Load products have become far more important in the data-driven age. DataStage is one of the most widely used ETL tools for integrating data across multiple systems. Developers use it to design jobs that manage the collection, transformation, validation and loading of data from different source systems into data warehouses. DataStage supports business analysis through its user-friendly interface and by supplying the quality data needed to gain business intelligence. After IBM acquired it in 2005, the product was renamed IBM WebSphere DataStage and later IBM InfoSphere DataStage.

DataStage has four components: Administrator, Manager, Designer and Director. It is available in several editions, including Server Edition, Enterprise Edition, MVS Edition and DataStage for PeopleSoft.


DataStage Administrator provides a user interface for administering projects. It also manages global settings and maintains interactions with various systems. The Administrator's role ranges from setting up users and project properties to adding, moving and deleting projects. It specifies general server defaults and purging criteria, and it provides a command interface to the DataStage repository. It is also where job scheduling options, user privileges, parallel job defaults and job monitoring limits are configured.


DataStage Manager is the main interface to the DataStage repository and is used to view and edit its contents. Whether you want to browse the repository or store and manage reusable metadata, Manager provides these services. It displays the table and file layouts, jobs and transform routines defined in the project, and it handles the tasks related to managing the repository.


DataStage Designer provides a user-friendly graphical design interface for creating DataStage jobs or applications, and it is the module used by developers. Each job explicitly specifies the source of the data, the required transforms and the destination of the data. The extraction, cleansing, transformation, integration and loading of data are defined through a visual data flow method. Jobs are then compiled to form executable programs. DataStage Director is responsible for scheduling these executables, and the server takes care of running them.
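To make the idea of a job as "source, transforms, destination" concrete, here is a minimal Python sketch. It is not DataStage code and does not use any DataStage API: the `Job` class, the sample rows and the function names are all hypothetical, chosen only to illustrate how a data flow passes each row from a source through an ordered list of transforms into a target.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional

Row = dict


@dataclass
class Job:
    """Hypothetical stand-in for a job design: source, ordered transforms, target."""
    source: Callable[[], Iterable[Row]]
    transforms: List[Callable[[Row], Optional[Row]]] = field(default_factory=list)
    target: List[Row] = field(default_factory=list)

    def run(self) -> int:
        """Push every source row through the transforms and load survivors."""
        loaded = 0
        for row in self.source():
            for transform in self.transforms:
                row = transform(row)
                if row is None:  # a transform rejected the row
                    break
            else:
                # Reached only when no transform rejected the row.
                self.target.append(row)
                loaded += 1
        return loaded


def read_orders():
    # Assumed sample data standing in for a source system.
    yield {"id": "1", "amount": " 10.50 "}
    yield {"id": "2", "amount": "n/a"}
    yield {"id": "3", "amount": "7"}


def clean_amount(row):
    # Cleansing and validation step: reject rows whose amount is not numeric.
    try:
        return {**row, "amount": float(row["amount"].strip())}
    except ValueError:
        return None


def id_to_int(row):
    # Transformation step: convert the key to an integer.
    return {**row, "id": int(row["id"])}


job = Job(source=read_orders, transforms=[clean_amount, id_to_int])
loaded = job.run()
```

In this sketch the row with the non-numeric amount is dropped during cleansing, so only two of the three rows reach the target. A real job would express the same flow graphically as linked stages rather than as Python functions.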


DataStage Director provides the interface for scheduling the executable programs produced by compiling jobs. It is used to validate, run, schedule and monitor both server jobs and parallel jobs, which makes it central to operating parallel processing workloads. The main users of this interface are testers and operators.
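The validate, run and monitor cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `MiniDirector`, the `Status` values and the job names are hypothetical and are not part of DataStage. The sketch queues jobs, validates each one before running it, and keeps a per-job status log that an operator could inspect.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Status(Enum):
    # Hypothetical status names, not DataStage's own.
    VALIDATED = "validated"
    FINISHED = "finished"
    FAILED = "failed"


class MiniDirector:
    """Hypothetical scheduler: queue jobs, validate, run, record statuses."""

    def __init__(self) -> None:
        self.queue: List[Tuple[str, Callable[[], bool], Callable[[], None]]] = []
        self.log: Dict[str, List[Status]] = {}

    def schedule(self, name: str, validate: Callable[[], bool],
                 run: Callable[[], None]) -> None:
        # Register a job for the next run and start an empty status log.
        self.queue.append((name, validate, run))
        self.log[name] = []

    def run_all(self) -> None:
        for name, validate, run in self.queue:
            if not validate():
                # Validation failed, so the job is never started.
                self.log[name].append(Status.FAILED)
                continue
            self.log[name].append(Status.VALIDATED)
            try:
                run()
                self.log[name].append(Status.FINISHED)
            except Exception:
                self.log[name].append(Status.FAILED)
        self.queue.clear()


director = MiniDirector()
director.schedule("load_orders", validate=lambda: True, run=lambda: None)
director.schedule("load_customers", validate=lambda: False, run=lambda: None)
director.run_all()
```

After `run_all()`, `director.log` holds the status history for each job: the first job is validated and finishes, while the second fails validation and is never run.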


DataStage is designed for large volumes of data: it can collect, integrate and transform large data sets that have different data structures. It also supports Big Data and Hadoop, letting you access Big Data directly on distributed networks. It provides seamless connectivity between different data sources and applications. It also helps optimize hardware utilization and can prioritize mission-critical tasks.