Ds Starter


    Lesson 1.1: Opening the sample job

The first step in learning to design jobs is to become familiar with the structure of jobs and with the Designer client. The Designer client is your workbench and your toolbox for building jobs.

    About this task

This lesson shows you how to start the Designer client and open the sample job that is supplied with the tutorial.

    The Designer client

The Designer client gives you the tools that you need to create jobs that extract, transform, load, and check the quality of data.

The Designer client is like a workbench or a blank canvas that you use to build jobs. The Designer client has a palette that contains the tools that form the basic building blocks of a job:

• Stages connect to data sources to read or write files and to process data.

• Links connect the stages; your data flows along these links.

• Annotations provide information about the jobs that you create.

The Designer client uses a repository where you can store the objects that you are creating as part of the design process. These objects can be reused by other job designers. The sample job is an object in the repository that is included with the tutorial. The sample job uses a table definition, which is also an object in the repository.

In the design area of the Designer client, you work with the tools and objects to create your job designs. The sample job opens in a design window.

    The sample job for the tutorial

The sample job reads data from a flat file and writes it to a data set. Parallel jobs use data sets to store data as the data is worked on. These data sets can be transient and invisible to you, the designer, or you can choose to create persistent data sets. The sample job writes data to a persistent data set. The data set provides an internal staging area where the data is held until it is written to its ultimate destination in a later module. When designing jobs, you do not have to create a staging area for your data; this is simply how this tutorial was constructed.
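As a rough illustration of what the sample job does, the following Python sketch reads a flat file and lands its rows in a staging file. This is an analogy only: in DataStage the staging area is a parallel data set (a .ds file), not a CSV, and the staging file name used here is invented.

    import csv

    # Analogy for the sample job: read the bill-to flat file and land
    # the rows, unchanged, in a staging area for a later job to use.
    def stage_bill_to(source_path, staging_path):
        with open(source_path, newline="") as src, \
             open(staging_path, "w", newline="") as dst:
            writer = csv.writer(dst)
            rows = 0
            for row in csv.reader(src):
                writer.writerow(row)  # no transformation yet; staging only
                rows += 1
        return rows

    # Example (file names illustrative):
    # stage_bill_to("GlobalCo_BillTo.csv", "GlobalCo_BillTo_staged.csv")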

The data that you use in this job is the bill-to information from GlobalCo. This data becomes the bill_to dimension for the star schema.

Starting the Designer client and opening the sample job

Before you begin

    Ensure that WebSphere Application Server is running.

    About this task

    To start the Designer client and open your first job:

    Procedure


1. Select Start > Programs > IBM InfoSphere Information Server > IBM InfoSphere DataStage and QualityStage Designer.

    2. In the Attach window, type your user name and password.

3. Select the tutorial project from the Project list, and then click OK. The Designer client opens and displays the New window.

4. Click Cancel to close the New window because you are opening an existing job and not creating a new job or other object.

5. In the repository tree, open the Tutorial folder and double-click the samplejob job. All of the objects that you need for the tutorial are in this folder.

The job opens in the Designer client display area. The following figure shows the Designer client with the samplejob job open. The Tutorial folder is shown in the repository tree.

    Figure 1. Designer client


    Lesson checkpoint

    In this lesson, you opened your first job.

    You learned the following tasks:

• How to start the Designer client

• How to open a job

• Where to find the tutorial objects in the repository tree

    Lesson 1.2: Viewing and compiling the sample job


In this lesson, you view the sample job to understand its components. You compile the job to prepare it to run on your system.

    About this task

The sample job has a Sequential File stage to read data from the flat file and a Data Set stage to write data to the staging area. The two stages are joined by a link. The data that will flow between the two stages on the link was defined when the job was designed. When the job is run, the data will flow down this link.

Exploring the Sequential File stage

About this task

    To explore the Sequential File stage:

    Procedure

1. In the sample job, double-click the Sequential File stage that is named GlobalCo_billTo_flat. The stage editor opens to the Properties tab of the Output page. All parallel job stages have Properties tabs. You use the Properties tab to specify the actions that the stage performs when the job is run.

2. Look at the File property under the Source category. You use this property to specify the file that the stage will read when the job runs. In the sample job, the File property points to a file called GlobalCo_BillTo.csv. You specify the directory that contains this file when you run the job. The name of the directory has been defined as a job parameter named #tutorial_direct#; the # characters show that the name is a job parameter. Job parameters are used so that variable information (for example, a file name or directory name) can be specified when the job runs rather than when the job is designed. (A sketch of how this substitution works follows this procedure.)

3. Look at the First Line is Column Names property under the Options category. In the sample job, this property is set to True because the first line of the GlobalCo_BillTo.csv file contains the names of the columns in the file. The remaining properties have default values.

4. Click the Format tab. The Format tab looks similar to the Properties tab, but the properties that the job designer sets here describe the format of the flat file that the stage reads. In this case the file is comma-delimited, which means that each field within a row is separated by a comma character. The Format tab also specifies that the file has DOS line endings. This setting means that the file can be read even when the file resides on a UNIX system.

5. Click the Columns tab. The Columns tab is where the column metadata for the stage is defined. The column metadata defines the data that will flow down the link to the Data Set stage when the job runs. The GlobalCo_BillTo.csv file contains many columns. All of these columns have the data type VarChar. As you work through the tutorial, you will apply stricter data typing to these columns to cleanse the data.

    6. Click the View Data tab in the top right corner of the stage editor window.

7. In the Value field of the Resolve Job Parameter window, specify the name of the directory in which the tutorial data was installed, and click OK. (You have to specify the directory path whenever you view data or run the job.)

8. In the Data Browser window, click OK. A window opens that shows the first 100 rows of the data that the GlobalCo_BillTo.csv file contains (100 rows is the default setting, but you can change it).

    9. Click Close to close the Data Browser window.

    10. Click OK to close the Sequential File stage editor.
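The #name# placeholders that you saw in step 2 are resolved when the job runs. As a loose analogy (not DataStage's actual substitution code), resolving a property string against supplied parameter values might look like this:

    import re

    # Substitute run-time values into a property string that uses
    # #parameter# placeholders, such as #tutorial_direct#.
    def resolve(property_value, params):
        def repl(match):
            name = match.group(1)
            if name not in params:
                raise KeyError("no value supplied for job parameter " + name)
            return params[name]
        return re.sub(r"#([A-Za-z_][A-Za-z0-9_]*)#", repl, property_value)

    # Example (directory value illustrative):
    # resolve("#tutorial_direct#/GlobalCo_BillTo.csv",
    #         {"tutorial_direct": "C:/tutorial"})
    # -> "C:/tutorial/GlobalCo_BillTo.csv"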


Exploring the Data Set stage

About this task

    To explore the Data Set stage:

    Procedure

1. In the sample job, double-click the Data Set stage that is named GlobalCoBillTo_ds. The stage editor opens in the Properties tab of the Input page.

2. Look at the File property under the Target category. This property is used to specify the control file for the data set that the stage will write the data to when the job runs. In the sample job, the File property points to a file that is named GlobalCo_BillTo.ds. You specify the directory that contains this file when you run the job. A data set is the internal format for transferring data inside parallel jobs. Data Set stages are used to land data that will be used by another job.

3. Click the Columns tab. The column metadata for this stage is the same as the column metadata for the Sequential File stage and defines the data that the job will write to the data set.

    4. Click OK to close the stage editor.

The Data Set stage editor does not have a Format tab because the data set does not require any formatting data. Although the View Data button is available on this tab, there is no data for this stage yet. If you click the View Data button, you will receive a message that no data exists. The data gets created when the job runs.

Compiling the sample job

About this task

    To compile the sample job:

    Procedure

1. Select File > Compile. The Compile Job window opens. As the job is compiled, the window is updated with messages from the compiler.

2. When the Compile Job window displays a message that the job is compiled, click OK.

    The sample job is now compiled and ready to run.

    Lesson checkpoint

In this lesson, you explored a simple data extraction job that reads data from a file and writes it to a staging area.

    You learned the following tasks:

• How to open stage editors

• How to view the data that a stage represents

• How to compile a job so that it is ready to run

    Lesson 1.3: Running the sample job

In this lesson, you use the Director client to run the sample job and to view the log that the job produces as it runs. You also use the Designer client to look at the data set that is written by the sample job.

    About this task


You run the job from the Director client. The Director client is the operating console. You use the Director client to run and troubleshoot jobs that you are developing in the Designer client. You also use the Director client to run fully developed jobs in the production environment.

    You use the job log to debug any errors you receive when you run the job.

Running the job

About this task

    To run the job:

    Procedure

1. In the Designer client, select Tools > Run Director. Because you are logged in to the tutorial project through the Designer client, you do not need to start the Director from the start menu and log on to the project. In the Director client, the sample job has a status of compiled, which means that the job is ready to run.

    Figure 1. Director client

    2. Select the sample job in the right pane of the Director client, and select Job > Run Now.

3. In the Job Run Options window, specify the path of the project folder (for example, C:\IBM\InformationServer\Server\Projects\Tutorial) and click Run. The job status changes to Running.

    4. When the job status changes to Finished, select View > Log.

5. Examine the job log to see the type of information that the Director client reports as it runs a job. The messages that you see are either control or information type. Jobs can also have Fatal and Warning messages.

    The following figure shows the log view of the job.

    Figure 2. Job log


    6. Select File > Exit to close the Director client.

Viewing the data set

About this task

    To view the data set that the job created:

    Procedure

    1. In the sample job in the Designer client, double-click the Data Set stage to open the stage editor.

    2. In the stage editor, click View Data.

3. Click OK in the Data Browser window to accept the default settings. A window opens that shows up to 100 rows of the data written to the data set (if you want to view more than 100 rows in a data browser, change the default settings before you click OK).

    4. Click Close to close the Data Browser window.

    5. Click OK to close the Data Set stage.

    Lesson checkpoint

    In this lesson you ran the sample job and looked at the results.

    You learned the following tasks:

• How to start the Director client from the Designer client

• How to run a job and look at the log file

• How to view the data written by the job

    Lesson 2.1: Creating a job

    The first step in designing a job is to create an empty job and save it to a folder in the repository.


    Before you begin

If you closed the Designer client after completing Module 1, you will need to start the Designer client again.

    About this task

    You create a parallel job and save it to a new folder in the Tutorial folder in the repository tree.

    To create a job:

    Procedure

    1. In the Designer client, select File > New.

2. In the New window, select the Jobs folder in the left pane and then select the parallel job icon in the right pane.

    3. Click OK. A new empty job design window opens in the design area.

    4. Select File > Save.

5. In the Save Parallel Job As window, right-click the Tutorial folder and select New > Folder from the shortcut menu.

6. Type a name for the folder, for example, My Jobs, and then move to the Item name field.

7. Type the name of the job in the Item name field. Call the job populate_cc_spechand_lookupfiles.

    8. Check that the Folder path field contains the path \Tutorial\My Jobs, then click Save.

You have created a new parallel job named populate_cc_spechand_lookupfiles and saved it in the folder \Tutorial\My Jobs in the repository.

    Lesson checkpoint

    In this lesson you created a job and saved it to a specified place in the repository.

    You learned the following tasks:

• How to create a job in the Designer client.

• How to name the job and save it to a folder in the repository tree.

    Lesson 2.2: Adding stages and links to the job

You add stages and links to the job that you created. Stages and links are the building blocks that determine what the job does when it runs.

    The job design

In this lesson, you will build the first part of the job, then compile it and run it to ensure that it works correctly before you add the next part of the job design. This method of iterative job design is a good habit to get into. You ensure that each part of your job is functional before you continue with the design for the next part.

The first part of the job reads a comma-separated file that contains a series of customer numbers, a corresponding code that identifies the country in which the customers are located, and another code that specifies the customer's language. You are designing a job that reads the comma-separated file and writes the contents to a lookup table in a lookup file set. This table will be used by a subsequent job when it populates a dimension table.
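In plain code terms, the lookup table that this job builds is keyed on the customer number. The following Python sketch builds the same kind of key-to-values mapping from the source file; a DataStage lookup file set is a prebuilt, key-indexed file rather than an in-memory dictionary, so treat this only as a model of the idea.

    import csv

    # Build a lookup table keyed on CUSTOMER_NUMBER. The column names
    # match the table definition used later in this lesson.
    def build_country_lookup(path):
        lookup = {}
        with open(path, newline="") as f:
            for row in csv.DictReader(f):  # first line holds column names
                lookup[row["CUSTOMER_NUMBER"]] = (row["COUNTRY"], row["LANGUAGE"])
        return lookup

    # Example (key and values illustrative):
    # build_country_lookup("CustomerCountry.csv")["A123456"] -> ("US", "EN")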

Adding stages and linking them

Before you begin

Ensure that the job named populate_cc_spechand_lookupfiles that you created in Lesson 2.1 is open and active in the job design area. A job is active when the title bar is dark blue (if you are using the default Windows colors).

A job consists of stages linked together that describe the flow of data from a data source to a data target. A stage is a graphical representation of the data itself, or of a transformation that will be performed on that data.

    About this task

    To add the stages to your job design:

    Procedure

    1. In the Designer client palette area, click the File bar to open the file section of the palette.

2. In the file section of the palette, select the Lookup File Set stage icon and drag the stage to your open job. Position the stage on the right side of the job window.

3. In the file section of the palette, select the Sequential File stage icon and drag the stage to your open job. Position the stage on the left side of the job window.

4. Select the Sequential File stage in the job window. In the palette area, click the General bar to open the general section of the palette.

5. Select the Link icon and move your mouse pointer over to the Sequential File stage. The mouse pointer changes to a target shape.

6. Click the Sequential File stage to anchor the link and then drag the mouse pointer over to the Lookup File Set stage. A link is drawn between the two stages. Your data will flow down this link when the job runs.

    7. Rename the stages and links as follows:

    a. Select each stage or link.

    b. Right-click and select Rename.

    c. Type the new name.

Stage or link | New name
Sequential File stage | country_codes
Lookup File Set stage | country_code_lookup
Link | country_codes_data

  • 8/3/2019 Ds Starter

    10/53

8. Always use specific names for your stages and links rather than the default names that the Designer client assigns. Using specific names makes your job designs easier to document and easier to maintain.

    9. Select File > Save to save the job.

Your job design should now look something like the one shown in this figure:

Figure 1. Job design

    Specifying properties and column metadata for the Sequential File stage

You will now edit the first of the stages that you added to specify what the stage does when you run the job. You will also specify the column metadata for the data that will flow down the link that joins the two stages.

    About this task

    To edit the stages and add properties and metadata:

    Procedure

1. Double-click the country_codes Sequential File stage to open the stage editor. The editor opens in the Properties tab of the Output page.

    2. Select the File property under the Source category.

3. In the File field, type the path name for your project folder (where the data files were copied when the tutorial was set up) and add the name CustomerCountry.csv (for example, C:\IBM\InformationServer\Server\Projects\Tutorial\CustomerCountry.csv), and then press Enter. (You can browse for the path name if you prefer; click the browse button on the right of the File field.) You specified the name of the comma-separated file that the stage reads when the job runs.

4. Select the First Line is Column Names property under the Options category.

5. Click the down arrow next to the First Line is Column Names field and select True from the list. The row that contains the column names is dropped when the job reads the file.

    6. Click the Format tab.

7. In the record-level category, select the Record delimiter string property from the Available properties to add.

  • 8/3/2019 Ds Starter

    11/53

8. Select DOS format from the Record delimiter string list. This setting ensures that the file can be read when the engine tier is installed on a UNIX or Linux computer.

9. Click the Columns tab. Because the CustomerCountry.csv file contains only three columns, type the column definitions into the Columns tab. (If a file contains many columns, it is less time consuming and more accurate to import the column definitions directly from the data source.) Note that column names are case-sensitive, so use the case in the instructions.

    10. Double-click the first line of the table. Fill in the fields as follows:

    Table 1. Column definition

Column Name | Key | SQL Type | Length | Description
CUSTOMER_NUMBER | Yes | Char | 7 | Key column for the lookup; the customer ID

    11. You will use the default values for the remaining fields.

    12. Add two more rows to the table to specify the remaining two columns and fill them in as follows:

    Table 2. Additional column definitions

Column Name | Key | SQL Type | Length | Description
COUNTRY | No | Char | 2 | The code that identifies the customer's country
LANGUAGE | No | Char | 2 | The code that identifies the customer's language

13. Your Columns tab should look like the one in the following figure (if you have National Language Support installed, there is an additional field named Extended):

Figure 2. Columns tab


14. Click the Save button to save the column definitions that you specified as a table definition object in the repository. The definitions can then be reused in other jobs.

15. In the Save Table Definition window, enter the following information:

Option | Description
Data source type | Saved
Data source name | CustomerCountry.csv
Table/file name | country_codes_data
Short description | date and time of saving
Long description | Table definition for country codes source file

16. Click OK to specify the locator for the table definition. The locator identifies the table definition.

17. In the Save Table Definition As window, save the table definition in the Tutorial folder and name it country_codes_data.


18. Click the View Data button and click OK in the Data Browser window to use the default settings. The data browser shows you the data that the CustomerCountry.csv file contains. Since you specified the column definitions, the Designer client can read the file and show you the results.

19. Close the Data Browser window.

20. Click OK to close the stage editor.

21. Save the job.

Notice that a small table icon has appeared on the country_codes_data link. This icon shows that the link now has metadata. You have designed the first part of your job.

    Specifying properties for the Lookup File Set stage and running the job

In this part of the lesson, you configure the next stage in your job. You already specified the column metadata for data that will flow down the link between the two stages, so there are fewer properties to specify in this task.

    About this task

    To configure the Lookup File Set stage:

    Procedure

1. Double-click the country_code_lookup Lookup File Set stage to open the stage editor. The editor opens in the Properties tab of the Input page.

2. Select the Lookup Keys category; then double-click the Key property in the Available properties to add area.

3. In the Key field, click the down arrow, select CUSTOMER_NUMBER from the list, and press Enter. You specified that the CUSTOMER_NUMBER column will be the lookup key for the lookup table that you are creating.

    4. Select the Lookup File Set property under the Target category.

5. In the Lookup File Set field, type the path name for the lookup file set that the stage will create (for example, C:\IBM\InformationServer\Server\Projects\Tutorial\countrylookup.fs) and press Enter.

    6. Click OK to save your property settings and close the Lookup File Set stage editor.

7. Save the job, and then compile and run the job by using the techniques that you learned in Module 1.

    You have now written a lookup table that can be used by another job later on in the tutorial.

    Lesson checkpoint

    You have now designed and run your very first job.

    You learned the following tasks:

• How to add stages and links to a job

• How to set the stage properties that determine what the stage will do when you run the job

• How to specify column metadata for the job and to save the column metadata to the repository for use in other jobs

  • 8/3/2019 Ds Starter

    14/53

    Lesson 2.3: Importing metadata

You can import column metadata directly from the data source that your job will read. You store the metadata in the repository, where it can be used in any job design.

    About this task

In this lesson, you will add more stages to the job that you designed in Lesson 2.2. The stages that you add are similar to the ones that you added in Lesson 2.2. The stages read a comma-separated file that contains code numbers and corresponding special delivery instructions. The contents are again written to a lookup table that is ready to use in a later job. The finished job contains two separate data flows, and it will write data to two separate lookup file sets. Rather than type the column metadata, you import the column metadata from the source file, and use that metadata in the job design.
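Conceptually, importing sequential-file metadata means deriving column definitions from the file itself rather than typing them in. A minimal Python sketch of the idea, assuming the first line holds the column names and every column starts out as a variable-length string (as in the tutorial's sample data):

    import csv

    # Derive rudimentary column definitions from a CSV header line,
    # roughly what the sequential file importer does before you refine
    # the types. All columns start out as VarChar.
    def import_table_definition(path):
        with open(path, newline="") as f:
            header = next(csv.reader(f))
        return [{"name": name, "sql_type": "VarChar"} for name in header]

    # Example: import_table_definition("SpecialHandling.csv")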

Importing metadata into your repository

About this task

In this part of the lesson, you will import column definitions from the comma-separated file that contains the special delivery instructions. You will then save the column definitions as a table definition in the repository. Note that you can import metadata when no jobs are open in the Designer client; this procedure is independent of job design.

    Procedure

    1. In the Designer client, select Import > Table Definitions > Sequential File Definitions.

2. In the Directory field in the Import Metadata (Sequential) window, type the path name of, or browse for, the Tutorial folder in the project folder (for example, C:\IBM\InformationServer\Server\Projects\Tutorial). The importer displays any files in the directory that have the suffix .txt. If there are no files to display, you see an error message. You can ignore this message.

3. In the File Type field, select a file type of Comma Separated (*.csv). The Files list is populated with all the files in the specified directory that have the suffix .csv.

    4. In the Files list, select the SpecialHandling.csv file.

5. In the To folder field, type the folder name \Tutorial\Table Definitions to specify where to store the table definition.

    6. Click Import.

7. In the Define Sequential Metadata window, select First line is column names.

8. Click the Define tab and examine the column definitions that were derived from the SpecialHandling.csv file.

    9. Click OK.

    10. Click Close to close the Import Metadata (Sequential) window.

    The column definitions that you viewed are stored as a table definition in the repository.

    Loading column metadata from the repository

You can specify the column metadata that a stage uses by loading the metadata from a table definition in the repository.

    Before you begin


    Ensure that your job named populate_cc_spechand_lookupfiles is open and active.

    About this task

In this part of the lesson, you are consolidating the job design skills that you learned and loading the column metadata from the table definition that you imported earlier.

    Procedure

1. Add a Sequential File stage and a Lookup File Set stage to your job and link them together. Position them under the stages and link that you added earlier in this lesson.

    2. Rename the stages and link as follows:

Stage or Link | Name
Sequential File stage | special_handling
Lookup File Set stage | special_handling_lookup
Link | special_handling_data

3. Your job design should now look like the one shown in this figure:

Figure 1. Job design

4. Open the stage editor for the special_handling Sequential File stage and specify that it will read the file SpecialHandling.csv and that the first line of this file contains column names.

5. Click the Format tab.


6. In the record-level category, select the Record delimiter string property from the Available properties to add.

7. Select DOS format from the Record delimiter string list. This setting ensures that the file can be read when the engine tier is installed on a UNIX or Linux computer.

8. Click the Columns tab.

9. Click Load. You load the column metadata from the table definition that you previously saved as an object in the repository.

10. In the Table Definitions window, browse the repository tree to the folder where you stored the SpecialHandling.csv column definitions.

11. Select the SpecialHandling.csv table definition and click OK.

12. In the Selected Columns window, ensure that all of the columns appear in the Selected columns list and click OK. The column definitions appear in the Columns tab of the stage editor.

13. Close the Sequential File stage editor.

14. Open the stage editor for the special_handling_lookup stage.

15. Specify a path name for the destination file set, specify that the lookup key is the SPECIAL_HANDLING_CODE column, and then close the stage editor.

16. Save, compile, and run the job.

    Lesson checkpoint

    You have now added to your job design and learned how to import the metadata that the job uses.

    You learned the following tasks:

• How to import column metadata directly from a data source

• How to load column metadata from a definition that you saved in the repository

Lesson 2.4: Adding job parameters

When you use job parameters in your job designs, you create jobs that are more flexible and easier to reuse.

    Job parameters

Sometimes, you want to specify information when you run the job rather than when you design it. In your job design, you can specify a job parameter to represent this information. When you run the job, you are then prompted to supply a value for the job parameter.

You specified the location of four files in the job that you designed in Lesson 2.3. In each part of the job, you specified a file that contains the source data and a file to write the lookup data set to. In this lesson, you will replace all four file names with job parameters. You will then supply the actual path names of the files when you run the job.

You will save the definitions of these job parameters in a parameter set in the repository. When you want to use the same job parameters in a job later on in this tutorial, you can load them into the job design from the parameter set. Parameter sets enable the same job parameters to be used by different jobs.
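A parameter set behaves like a named bundle of parameter definitions plus one or more named sets of default values (value files). The following Python sketch is a loose analogy; the parameter and value-file names anticipate the ones used in Lesson 2.5, and the default paths shown are invented.

    # Loose analogy for a parameter set: parameter definitions plus a
    # named value file of defaults that a job run can override.
    PARAMETER_SET = {
        "name": "tutorial_lookup",
        "parameters": [
            "country_codes_source", "country_codes_lookup",
            "special_handling_source", "special_handling_lookup",
        ],
        "value_files": {
            "lookupvalues1": {  # default paths are illustrative
                "country_codes_source": "C:/tutorial/CustomerCountry.csv",
                "country_codes_lookup": "C:/tutorial/countrylookup.fs",
                "special_handling_source": "C:/tutorial/SpecialHandling.csv",
                "special_handling_lookup": "C:/tutorial/spechandlookup.fs",
            },
        },
    }

    def resolve_run_values(set_def, value_file, overrides):
        values = dict(set_def["value_files"][value_file])
        values.update(overrides)  # values supplied at run time win
        return values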

Defining job parameters

Before you begin

  • 8/3/2019 Ds Starter

    17/53

Ensure that the job named populate_cc_spechand_lookupfiles that you designed in Lesson 2.3 is open and active.

    Procedure

    1. Select Edit > Job Properties.

2. In the Job Properties window, click the Parameters tab.

3. Double-click the first line of the grid to add a new row.

    4. In the Parameter name field, type country_codes_source.

    5. In the Prompt field, type path name for the country codes file.

6. In the Type field, select a data type of Pathname.

7. In the Help Text field, type Enter the path name for the comma-separated file that contains the country code definitions.

    8. Repeat steps 3-7 to define three more job parameters containing the following entries:

    Table 1. Job parameters

Parameter Name | Prompt | Type | Help Text
country_codes_lookup | path name for the country codes lookup file set | Pathname | Enter the path name for the file that contains the country codes lookup table
special_handling_source | path name for the special handling codes file | Pathname | Enter the path name for the comma-separated file that contains the special handling code definitions
special_handling_lookup | path name for the special handling lookup file set | Pathname | Enter the path name for the file that contains the special handling lookup table

9. The Parameters tab of the Job Properties window should now look like the one in the following figure:

Figure 1. Parameters tab


10. Click OK to close the Job Properties window.

11. Select File > Save to save the job.

    Adding job parameters to your job design

    Now that you have defined the job parameters, you will add them to your job design.

    Procedure

    1. Double-click the country_codes Sequential File stage to open the stage editor.

2. Select the File property in the Source category and delete the path name that you entered in the File field.

3. Click the right arrow next to the File field, and select Insert Job Parameter from the menu.

4. Select country_codes_source from the list and press Enter. The text #country_codes_source# appears in the File field. This text specifies that the job will request the name of the file when you run the job.

5. Repeat these steps for each of the stages in the job, specifying job parameters for each of the File properties as follows:

    Table 2. Job parameters to be added to job

Stage | Property | Parameter name
country_codes_lookup stage | Lookup file set | country_codes_lookup
special_handling stage | File | special_handling_source
special_handling_lookup stage | Lookup file set | special_handling_lookup

    6. Save and recompile the job.

    Supplying values for the job parameters

    When you run the job, the Director client prompts you to supply values for the job parameters.

    Procedure

    1. Open the Director client.

    2. Select your job name and select Job > Run Now.

3. In the Parameters page of the Job Run Options window, type a path name for each of the job parameters.

    4. Click Run.

    The job runs, and uses the values that you supplied for the job parameters.

    Lesson checkpoint

You defined job parameters to represent the file names in your job and specified values for these parameters when you ran the job.

    You learned the following tasks:

• How to define job parameters

• How to add job parameters in your job design

• How to specify values for the job parameters when you run the job

    Lesson 2.5: Creating parameter sets

You can store job parameters in a parameter set in the repository. You can then reuse the job parameters in other job designs.

    About this task

In this lesson, you will create a parameter set from the job parameters that you created in Lesson 2.4. You will also supply a set of default values for the parameters in the parameter set that are also available when the parameter set is used.

    Parameter sets


You use parameter sets to define job parameters that you are likely to reuse in other jobs. Whenever you need this set of parameters in a job design, you can insert them into the job properties from the parameter set. You can also define different sets of values for each parameter set. These parameter sets are stored as files in the InfoSphere DataStage server installation directory and are available to use in your job designs or when you run jobs that use these parameter sets. If you make any changes to a parameter set object, these changes are reflected in job designs that use this object until the job is compiled. The parameters that a job is compiled with are available when the job is run. However, if you change the design after the job is compiled, the job will link to the current version of the parameter set.

You can create parameter sets from existing job parameters, or you can specify the job parameters as part of the task of creating a new parameter set.

Creating a parameter set from existing job parameters

Before you begin

    Ensure that your job is open and active.

    Procedure

    1. Select Edit > Job Properties.

    2. In the Job Properties window, click the Parameters tab.

3. In the Parameters page, use shift-click to select all of the job parameters that you defined in Lesson 2.4.

    4. Click Create Parameter Set.

5. In the General page of the Parameter Set window, type a name for the parameter set and a short description (for example, tutorial_lookup and parameter set for lookup file names).

6. Click the Parameters tab and check that all the job parameters that you specified for your job appear in this page.

    7. Click the Values tab.

    8. In the Value File name field, type lookupvalues1.

9. For each of the job parameters, specify a default path name for the file that the job parameter represents. Your Values page should now look similar to the one in the following figure:

    Figure 1. Values page


    10. Click OK, specify a repository folder in which to store the parameter set, and then click Save.

11. The Designer client asks if you want to replace the selected parameters with the parameter set that you have just created. Click No.

12. Click OK to close the Job Properties window.

    13. Save the job.

You created a parameter set that is available for another job that you will create later in this tutorial. The current job continues to use the individual parameters rather than the parameter set.

    Lesson checkpoint

    You have now created a parameter set.

    You learned the following tasks:

• How to create a parameter set from a set of existing job parameters

• How to specify a set of default values for the parameters in the parameter set

    Lesson 3.1: Designing the transformation job

You will design and run a job that performs some simple transformations on the bill_to data, and writes the results to a staging Data Set stage.

    The transformer job

The data that was read from the GlobalCo_BillTo.csv comma-separated file by the sample job in Module 1 contains a large number of columns. The dimension table that you will produce later in this tutorial requires only a subset of these columns, so you will use the transformation job to drop some of the columns.


The job will also specify some stricter data typing for the remaining columns. Stricter data typing helps to impose quality controls on the data that you are processing.

Finally, the job applies a function to one of the data columns to delete space characters that the column contains. This transformation job prepares the data in that column for a later operation.

The transformation job that you are designing uses a Transformer stage, but there are also several other types of processing stages available in the Designer client that can transform data. For example, you can use the Modify stage in your job if you want to change only the data types in a data set. Several of the processing stages can drop data columns as part of their processing. In the current job, you use the Transformer stage because you require a transformation function that you can customize. Several functions are available to use in the Transformer stage.
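To make the Transformer stage's role concrete, here is a small Python sketch of two of the operations this job performs: keeping only a subset of the columns and enforcing stricter lengths on the survivors. The column list is abbreviated; the full set and lengths appear in the tables later in this lesson.

    # Keep a subset of columns and enforce maximum lengths; rows that
    # violate the stricter typing surface as errors, which is the point
    # of tightening the types. (Abbreviated column list for brevity.)
    KEEP = {"CUSTOMER_NUMBER": 7, "CUST_NAME": 30, "REGION_CODE": 2}

    def project_and_check(row):
        out = {}
        for name, max_len in KEEP.items():
            value = row[name]
            if len(value) > max_len:
                raise ValueError(name + " exceeds " + str(max_len) + " chars")
            out[name] = value
        return out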

    Creating the transformation job and adding stages and links

In this part of the lesson, you will create your transformation job and learn a new method for performing tasks that you are already familiar with.

    Procedure

1. Create a parallel job, save it as TrimAndStrip, and store it in the Tutorial folder in the repository tree.

    2. Add two Data Set stages to the design area.

3. Name the Data Set stage on the left GlobalCoBillTo, and name the one on the right int_GlobalCoBillTo.

4. In the palette, click the Processing bar to locate the Transformer stage, and drag the stage to the design area.

5. Drop the Transformer stage between the two Data Set stages and name the Transformer stage Trim_and_Strip.

6. Right-click the GlobalCoBillTo Data Set stage and drag a link to the Transformer stage. This method of linking the stages is fast and easy. You do not need to go back to the palette and grab a link to connect each stage.

    7. Use the same method to link the Transformer stage to the int_GlobalCoBillTo Data Set stage.

8. Name the first link full_bill_to and name the second link stripped_bill_to. Your job should look like the one in the following picture:

    Figure 1. TrimAndStrip job


    Configuring the Data Set stages

In this part of the lesson, you configure the Data Set stages and learn a new method for loading column metadata.

    Procedure

    1. Open the stage editor for the GlobalCoBillTo Data Set stage.

2. Set the File property in the Source category to point to the data set that was created by the sample job in Module 1 (GlobalCoBillTo.ds), and close the stage editor.

3. In the repository window, select the GlobalCoBillToSource table definition in the Tutorial folder. Drag the table definition to the design area and drop it onto the full_bill_to link. The cursor changes shape to indicate the correct position to drop the table definition. In Lesson 2.3, you opened the stage editor and clicked Load to perform the same action. The method described in this step saves time when you are designing very large jobs.

4. Open the stage editor for the GlobalCoBillTo Data Set stage and click View Data. The data browser shows the data in the data set. You should frequently view the data after you configure a stage to verify that the File property and the column metadata are both correct.

5. Open the stage editor for the int_GlobalCoBillTo Data Set stage.

6. Set the File property in the Target category to point to a new staging data set (for example, C:\IBM\InformationServer\Server\Projects\Tutorial\int_GlobalCoBillTo.ds).

    Configuring the Transformer stage

In this part of the lesson, you specify the transformation operations that your job will perform when you run it.

    Procedure

    1. Double-click the Transformer stage to open the stage editor.

2. CTRL-click to select the following columns from the full_bill_to link in the upper left pane of the stage editor:

o CUSTOMER_NUMBER
o CUST_NAME
o ADDR_1
o ADDR_2
o CITY
o REGION_CODE
o ZIP
o TEL_NUM
o REVIEW_MONTH
o SETUP_DATE
o STATUS_CODE

3. Drag these columns from the upper left pane to the stripped_bill_to link in the upper right pane of the stage editor. You are specifying that only these columns will flow through the Transformer stage when the job is run. The remaining columns will be dropped.

4. In the stripped_bill_to column definitions at the bottom of the right pane, edit the SQL type and length fields for your columns as specified in the following table:

    Table 1. Column definitions

Column | SQL Type | Length
CUSTOMER_NUMBER | Char | 7
CUST_NAME | VarChar | 30
ADDR_1 | VarChar | 30
ADDR_2 | VarChar | 30
CITY | VarChar | 30
REGION_CODE | Char | 2
ZIP | VarChar | 10
TEL_NUM | VarChar | 10
REVIEW_MONTH | VarChar | 2
SETUP_DATE | VarChar | 12
STATUS_CODE | Char | 1

5. By specifying stricter data typing for your data, you will be able to better diagnose inconsistencies in your source data when you run the job.

6. Double-click the Derivation field for the CUSTOMER_NUMBER column in the stripped_bill_to link. The expression editor opens.

7. In the expression editor, type the following text: trim(full_bill_to.CUSTOMER_NUMBER, ' ', 'A'). The text specifies a function that deletes all the space characters from the CUSTOMER_NUMBER column on the full_bill_to link before writing it to the CUSTOMER_NUMBER column on the stripped_bill_to link. (A plain-code equivalent of this function appears after this procedure.) Your Transformer stage editor should look like the one in the following figure:


    Figure 2. Transformer stage editor

    8. Click OK to close the Transformer stage editor.

9. Open the stage editor for the int_GlobalCoBillTo Data Set stage and go to the Columns tab of the Input page. Notice that the stage editor has acquired the metadata from the stripped_bill_to link.


    10. Save and then compile your TrimAndStrip job.
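The derivation in step 7 removes every space character from the value; the 'A' option of the trim function means all occurrences of the given character. A one-line Python equivalent:

    # Equivalent of the Transformer derivation
    #   trim(full_bill_to.CUSTOMER_NUMBER, ' ', 'A')
    # where 'A' removes all occurrences of the given character.
    def trim_all(value, ch=" "):
        return value.replace(ch, "")

    assert trim_all("  12 345 ") == "12345"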

Running the transformation job

About this task

Previously, you ran jobs from the Director client. Now, you will run the job from the Designer client. This technique is useful when you are developing jobs since you do not leave the Designer client. To look at the log file, you must open the Director client.

    Procedure

1. In the Designer client, select Diagram > Show performance statistics. Additional information is shown next to the job links to provide figures for the number of rows that were transferred and the number of rows that were processed per second. This information is updated as the job runs.

2. Select File > Run and click Run in the Job Run Options window. As the TrimAndStrip job runs, the performance figures for the link are updated, and the links themselves change color to show their status.

3. When the job finishes running, open the Director client, select the TrimAndStrip job, and look at its job log. You can view the job log in the Director client even when you run the job from the Designer client.

    Lesson checkpoint

    In this lesson you learned how to design and configure a transformation job.

    You learned the following tasks:

• How to configure a Transformer stage

• How to link stages by using a different method for drawing links

• How to load column metadata into a link by using a drag-and-drop operation

• How to run a job from within the Designer client and monitor the performance of the job

    Lesson 3.2: Combining data in a job

The Designer client supports more complex jobs than the ones that you designed so far. In this lesson, you begin to build a more complex job that combines data from two different tables.

    About this task

You will base your new job on the transformation job that you created in Lesson 3.1. You will add a Lookup stage that looks up the data that you created in Lesson 2.2.

    Using a Lookup stage

Performing a lookup (search) is one way in which a job can combine data. The lookup is performed by the Lookup stage. The Lookup stage has a stream input and a reference input. The Lookup stage uses one or more key columns in the stream input to search for data in a reference table. The stage adds the data from the reference table to the stream output.

You can also combine data in a parallel job by using a Join stage. Where you use a large reference table, a job can run faster if it combines data by using a Join stage. For the job that you are designing, the reference table is small, and so a Lookup stage is preferred. The Lookup stage is most efficient where the data being looked up fits into the available physical memory.
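In code terms, a Lookup stage is essentially a hash lookup: the reference table is held in memory and each stream row is matched by key, which is why it suits reference data that fits in physical memory. A minimal Python sketch, using this job's key column:

    # Hash-lookup sketch: enrich each stream row by looking up its
    # CUSTOMER_NUMBER in an in-memory reference table. With the
    # Continue option (set later in this lesson), unmatched rows
    # pass through with COUNTRY left empty (NULL in DataStage terms).
    def lookup_country(stream_rows, reference):
        for row in stream_rows:
            row["COUNTRY"] = reference.get(row["CUSTOMER_NUMBER"])
            yield row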


You can configure Lookup stages to search for data in a lookup file set, or they can search for data in a relational database. The job will look up the data in a reference table in the lookup file set that was created in Lesson 2.2 of this tutorial. When you use lookup file sets, you must specify the lookup key column when you define the file set. You defined the key columns for the lookup tables that you use in this lesson when you created the file sets in Module 2.

    Creating a lookup job

Next, you will create a job and add some of the stages that you configured in the TrimAndStrip job that you designed and ran in Lesson 3.1.

    Before you begin

Ensure that the TrimAndStrip job that you created in Lesson 3.1 is open, and that you have a multi-window view in the design area of the Designer client. In multi-window view, you can see all the open jobs in the display area. To switch from single-window view to multi-window view, click the minimize button in the Designer client menu bar.

    Procedure

    1. Create a job, name it CleansePrepare, and save it in the tutorial folder in the repository.

2. In the TrimAndStrip job, drag the mouse cursor around the stages in the job to select them and select Edit > Copy.

3. In the CleansePrepare job, select Edit > Paste. The stages appear in the CleansePrepare job. You can now close the TrimAndStrip job.

4. Select the Processing area in the palette and drag a Lookup stage to the CleansePrepare job. Position the Lookup stage just below the int_GlobalCoBillTo stage and name it Lookup_Country.

5. Select the stripped_bill_to link, position the mouse cursor in the link's arrowhead, and drag to the Lookup stage. You moved the link with its associated column metadata to allow data to flow from the Transformer stage to the Lookup stage.

    6. Delete the int_GlobalCoBillTo Data Set stage. It will be replaced with a different Data Set stage.

7. Select the File area in the palette and drag a Lookup File Set stage to the job. Position it immediately above the Lookup stage and name it Country_Code_Fileset.

8. Draw a link from the Country_Code_Fileset Lookup File Set stage to the Lookup_Country Lookup stage and name it country_reference. The link appears as a dotted line, which indicates that the link is a reference link.

9. Drag a Data Set stage from the palette to the job and position it to the right of the Lookup stage. Name the Data Set stage temp_dataset.

    10. Draw a link from the Lookup stage to the Data Set stage and name it country_code.

    The job that you designed should look like the one in the following figure:

    Figure 1. Job design

  • 8/3/2019 Ds Starter

    28/53

Configuring the Lookup File Set stage

About this task

In the lesson section "Creating a lookup job," you copied stages from the TrimAndStrip job to the CleansePrepare job. These stages are already configured, so you need to configure only the new stages that you add in this job. One of these stages is the Country_Code_Fileset Lookup File Set stage. This stage represents the lookup file set that you created in Lesson 2.2. In this exercise, you will use the parameter set that you created in Lesson 2.5.

    Procedure

    1. Open the Job Properties window, and click the Parameters tab.

2. Select the first row in the grid, and click Add Parameter Set. Browse to the Tutorial folder, select the parameter set that you created in Lesson 2.5, and click OK.

    3. Close the Job Properties window.

4. Double-click the Country_Code_Fileset Lookup File Set stage to open the stage editor.

5. Select the Lookup File Set property in the Source category, click the right arrow next to the Lookup File Set field, and select Insert Job Parameter from the menu. A list is displayed that shows all the individual job parameters in the parameter set.

    6. In the list, select the country_codes_lookup job parameter and then press Enter.

7. Click the Columns tab and load the country_codes_data table definition, and then close the stage editor.

Configuring the Lookup stage

About this task

You specify the data that is combined in the Lookup stage. You defined the column that acts as the key for the lookup when you created the lookup file set in Module 2.

    Procedure


1. Double-click the Lookup_Country Lookup stage to open the Lookup stage editor. The Lookup stage editor is similar in appearance to the Transformer stage editor.

2. Click the title bar of the stripped_bill_to link in the left pane and drag it over to the Column Name column of the country_code link in the right pane. When the cursor changes shape, release the mouse button. All of the columns from the stripped_bill_to link appear in the country_code link.

3. Select the COUNTRY column in the country_reference link and drag it to the country_code link. The result of copying the column from the country_reference link to the country_code link is that whenever the value of the incoming CUSTOMER_NUMBER column matches the value of the CUSTOMER_NUMBER column of the lookup table, the corresponding COUNTRY value will be added to that row of data. The stage editor looks like the one in the following figure:

    Figure 2. Lookup stage editor


4. Double-click the Condition bar in the country_reference link. The Lookup Stage Conditions window opens. Select the Lookup Failure field and select Continue from the list. You are specifying that, if a CUSTOMER_NUMBER value from the stripped_bill_to link does not match any CUSTOMER_NUMBER column values in the reference table, the job continues with the next row.

    5. Close the Lookup stage editor.

    6. Open the temp_dataset Data Set stage and specify a file name for the data set.

7. Save, compile, and run the job. The Job Run Options window displays all the parameters in the parameter set.

8. In the Job Run Options window, select lookupvalues1 from the list next to the parameter set name. The parameter values are filled in with the path names that you specified when you created the parameter set.

9. Click Run to run the job, and then click View Data in the temp_dataset stage to examine the results.

    Lesson checkpoint

    With this lesson, you started to design more complex and sophisticated jobs.

    You learned the following tasks:

• How to copy stages, links, and associated configuration data between jobs

• How to combine data in a job by using a Lookup stage

    Lesson 3.3: Capturing rejected data

    This lesson shows you how to monitor rows of data that are rejected while you are processing them.

    Before you begin

    Ensure that the CleansePrepare job that you created in Lesson 3.2 is open and active.

    About this task

In the Lookup stage for the job that you created in Lesson 3.2, you specified that processing should continue on a row if the lookup operation fails. Any rows that contain CUSTOMER_NUMBER fields that were not matched in the lookup table were bypassed, and the COUNTRY column for that row was set to NULL. In this lesson, you will specify that non-matching rows are written to a reject link. The reject link captures any customer numbers that do not have an entry in the country codes table. You can examine the rejected rows and decide what action to take.
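Switching the lookup-failure action from Continue to Reject splits the stream in two instead of passing unmatched rows through with a NULL COUNTRY. A Python sketch of that behavior (function and variable names are illustrative):

    # Reject-link sketch: matched rows flow on with COUNTRY filled in;
    # unmatched rows go down the reject link (here, a separate list)
    # instead of continuing with a NULL COUNTRY.
    def lookup_with_rejects(stream_rows, reference):
        matched, rejects = [], []
        for row in stream_rows:
            country = reference.get(row["CUSTOMER_NUMBER"])
            if country is None:
                rejects.append(row)  # written to Rejected_Rows in the job
            else:
                row["COUNTRY"] = country
                matched.append(row)
        return matched, rejects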

    Procedure

1. From the File section of the palette, drag a Sequential File stage to the CleansePrepare job and position it under the Lookup_Country Lookup stage. Name the Sequential File stage Rejected_Rows.

2. Draw a link from the Lookup stage to the Sequential File stage. Name the link rejects. Because the Lookup stage already has a stream output link, the new link is designated as a reject link and is shown as a dashed line. Your job should resemble the one in the following figure:

    Figure 1. Job design


    3. Double-click the Lookup_Country Lookup stage to open the Lookup stage editor.

4. Double-click the Condition bar in the country_reference link to open the Lookup Stage Conditions window.

5. In the Lookup Stage Conditions window, select the Lookup Failure field and select Reject from the list. Close the Lookup stage editor. This step specifies that, whenever a row from the stripped_bill_to link has no matching entry in the country code lookup table, the row is rejected and written to the Rejected_Rows Sequential File stage.

6. Edit the Rejected_Rows Sequential File stage and specify a path name for the file that the stage will write to (for example, c:\tutorial\rejects.txt). This stage derives the column metadata from the Lookup stage, and you cannot alter it.

7. Save and compile the CleansePrepare job, and then run the job.

8. Open the Rejected_Rows Sequential File stage editor and click View Data to look at the rows that were rejected.

Lesson checkpoint

You learned the following tasks:

• How to add a reject link to your job

• How to configure the Lookup stage so that it rejects data where a lookup fails

Lesson 3.4: Performing multiple transformations in a single job

    You can design complex jobs that perform many transformation operations on your data.

  • 8/3/2019 Ds Starter

    33/53

    About this task

In this lesson, you will further transform your data to apply some business rules and perform another lookup of a reference table.

In the sample bill_to data, one of the columns is overloaded. The SETUP_DATE column can contain a special handling code as well as the date that the account was set up. The transformation logic that is being added to the job extracts this special handling code into a separate column. The job then looks up the text description corresponding to the code from the lookup table that you populated in Module 2 and adds the description to the output data. The transformation logic also adds a row count to the output data.
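An overloaded column can be pulled apart with ordinary string handling. In the sketch below, the SOURCE derivation mirrors the expression given later in this lesson ('GlobalCo' : COUNTRY); the rule for splitting the special handling code out of SETUP_DATE is a plausible stand-in only, since the exact layout of the overloaded column depends on the data; and the row count is a simple running number.

    # Illustrative business rules for one row. The split rule assumes a
    # 10-character date optionally followed by a handling code; treat
    # that assumption, and the field layout, as illustrative only.
    def apply_business_rules(row, recnum):
        setup = row["SETUP_DATE"]
        handling_code = 0
        if len(setup) > 10:  # date plus trailing special handling code
            setup, handling_code = setup[:10], int(setup[10:])
        row["SOURCE"] = "GlobalCo" + (row.get("COUNTRY") or "")  # e.g. GlobalCoUS
        row["RECNUM"] = str(recnum)  # running row count
        row["SETUP_DATE"] = setup
        row["SPECIAL_HANDLING_CODE"] = handling_code
        return row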

Adding new stages and links

About this task

This task adds the extra stages to the job that implement the additional transformation logic.

    Procedure

    1. Add the following stages to your CleansePrepare job:

a. Place the Transformer stage above the temp_dataset Data Set stage and name the stage Business_Rules.

b. Place the Lookup stage immediately to the right of the temp_dataset Data Set stage and name the Lookup stage Lookup_Spec_Handling.

c. Place the Lookup File Set stage immediately above the Lookup_Spec_Handling Lookup stage and name the Lookup File Set stage Special_Handling_Lookup.

d. Place the Data Set stage immediately to the right of the Lookup_Spec_Handling Lookup stage and name the Data Set stage Target.

    2. Link the stages:

a. Link the Business_Rules Transformer stage to the Lookup_Spec_Handling Lookup stage and name the link with_business_rules.

b. Link the Special_Handling_Lookup Lookup File Set stage to the Lookup_Spec_Handling Lookup stage and name the link special_handling.

c. Link the Lookup_Spec_Handling Lookup stage to the Target Data Set stage and name the link finished_data.

3. Drag the arrowhead end of the country_code link and attach it to the Business_Rules Transformer stage. The temp_dataset stage is not required for this job; therefore, you can remove it.

4. Delete the temp_dataset Data Set stage.

5. Drag the Business_Rules Transformer stage down so that it aligns horizontally with the Lookup stages. Your CleansePrepare job should now resemble the one in the following figure:

    Figure 1. Job design


Configuring the Business_Rules Transformer stage

About this task

In this exercise, you configure the Transformer stage to extract the special handling code and add a row count to the output data.

    Procedure

1. Open the Business_Rules Transformer stage editor, and click the Show/Hide Stage Variables icon to display the stage variable grid in the right pane. You will define some stage variables later in this procedure.

2. Select the following columns in the country_code input link and drag them to the with_business_rules output link:

    o CUSTOMER_NUMBER
    o CUST_NAME
    o ADDR_1
    o ADDR_2
    o CITY
    o REGION_CODE
    o ZIP
    o TEL_NUM

    3. In the metadata area for the with_business_rules output link, add the following new columns:

    Table 1. Column definitions


Column name             SQL type   Length   Nullable
SOURCE                  Char       10       No
RECNUM                  Char       10       No
SETUP_DATE              Char       10       Yes
SPECIAL_HANDLING_CODE   Integer    10       Yes

4. The new columns appear in the graphical representation of the link, but are highlighted in red because they do not yet have valid derivations.

    5. In the graphical area, double-click the Derivation field of the SOURCE column.

6. In the expression editor, type 'GlobalCo': (including the colon, which is the concatenation operator). Position your mouse pointer immediately to the right of this text, right-click, and select Input Column from the menu. Then select the COUNTRY column from the list. The completed derivation reads 'GlobalCo' : country_code.COUNTRY. When you run the job, the SOURCE column for each row will contain the two-letter country code prefixed with the text GlobalCo, for example, GlobalCoUS.

7. In the Transformer stage editor toolbar, click the Stage Properties tool on the far left. The Transformer Stage Properties window opens.

8. Click the Variables tab and, by using the techniques that you learned for defining table definitions, add the following stage variables to the grid:

    Table 2. Stage variables

Name                    SQL type   Precision
xtractSpecialHandling   Char       1
TrimDate                VarChar    10

9. When you close the Properties window, these stage variables appear in the Stage Variables area above the with_business_rules link.

10. Double-click the Derivation field of each of the stage variables in turn and type the following expressions in the expression editor:

    Table 3. Derivations

    Stage variable Expression Description

xtractSpecialHandling    If Len(country_code.SETUP_DATE) …

Lesson 4.1: Creating a data connection object

You can create a data connection object to store the information that is needed to connect to a database, and then reuse that information in connector stages.

Procedure

1. In the repository tree, right-click the tutorial folder, and then select New > Other > Data Connection from the shortcut menu.

2. In the General page of the Data Connection window, enter a name for the data connection object (for example, tutorial_connect) and provide a short description and a long description of the object.

    3. Open the Parameters page.

    4. Click the browse button next to the Connect using Stage Type field.

5. In the Open window, open the Stage Types > Parallel > Database folder, select the ODBC Connector item, and click Open. The Connection parameters grid is populated and shows the connection parameters that are required by the stage type that you selected.

6. Enter values for each of the parameters as shown in the following table:

    Table 1. Parameter values

Parameter name     Value
ConnectionString   Type the DSN name.
Username           Type the user name for connecting to the database by using the specified DSN.
Password           Type the password for connecting to the database by using the specified DSN.
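For example, with a DSN named tutorial_dsn (an illustrative name; use the DSN that your database administrator created), the completed grid might read:

    ConnectionString    tutorial_dsn
    Username            dsuser
    Password            ********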

    7. Click OK.

    8. In the Save Data Connection As window, select the tutorial folder and click Save.


Lesson checkpoint

You learned how to create a data connection object and store the object in the repository.

Lesson 4.2: Importing column metadata from a database table

You can import column metadata from a database table and store it as a table definition object in the repository.

    About this task

In Lesson 2.3, you learned how to import column metadata from a comma-delimited file. In this lesson, you will import column metadata from a database table by using the ODBC connector. When you import data by using a connector, the column definitions are saved as a table definition in the project repository and in the dynamic repository. The table definition is then available to be used by other projects and by other components in the information integration suite.

    To import column metadata by using the ODBC connector:

    Procedure

    1. Select Import > Table Definitions > Start Connector Import Wizard.

2. In the Data Source Location page of the Import Connector Metadata wizard, select the computer that hosts the database from the Host name where database resides list.

    3. Click the New location link.

4. In the Shared Metadata Management window, select your host name and click Add new database.

5. In the Add new database window, type the name of the database that has been created on the relational database for this exercise (ask your database administrator if you do not know the name of the database) and click OK. Then click Close to close the Shared Metadata Management window.

6. In the Data Source Location page, select the database from the Database name list and click Next.

    7. In the Connector Selection page, select ODBC Connector from the list and click Next.

    8. In the Connection details page, select your DSN from the Data source list and click the Load link.

9. In the Open window, open the tutorial folder, select the data connection object that you created in Lesson 4.1, and click Open. The Data source, Username, and Password fields are populated with the corresponding data in the data connection object.

10. Click the Test Connection link to ensure that you can connect to the database by using the connection details, and then click Next.

11. In the Filter page, select the schema from the Schema list (ask your database administrator if you do not know the name of the schema) and click Next.

    12. In the Selection page, select the tutorial table from the list and click Next.

    13. In the Confirm import page, review the import details, and then click Import.

    14. In the Select Folder window, select the tutorial folder and click OK.


The table definition is imported and appears in the tutorial folder. The table definition has a different icon from the table definitions that you used previously. This icon indicates that the table definition was imported by using a connector and is available to other projects and to other suite components.

Lesson checkpoint

You learned how to import column metadata from a database table by using a connector.

    Lesson 4.3: Writing to a database

In Lesson 4.3, you will use an ODBC connector to write the BillTo data that you created in Module 3 to an existing table in the database.

    Before you begin

Ensure that your database administrator ran the scripts to set up the database and the database table that you need to access in this lesson. Also ensure that the database administrator set up a DSN for you to use for the ODBC connection.

    Connectors

    Connectors are stages that you use to connect to data sources and data targets to read or write data.

The Database section of the palette in the Designer client contains many types of stages that connect to the same types of data sources or targets. For example, if you click the down arrow next to the ODBC icon in the palette, you can choose to add either an ODBC connector stage or an ODBC Enterprise stage to your job.

If your database type supports connector stages, use them because they provide the following advantages over other types of stages:

• You can create job parameters from within the connector stage, without first defining the job parameters in the job properties.
• You can save any connection information that you specify in the stage as a data connection object.
• Connectors reconcile data types between source and target to avoid runtime errors.
• Connectors generate detailed error information if they encounter problems when the job runs.

    Creating the job

    In this exercise, you will create a job to write to the database.

    Procedure

    1. Create a job, name it ODBCwrite, and save it in the tutorial folder of the repository.

2. Open the File section of the palette and add a Data Set stage to your job. Name the stage BillToSource.

3. Open the Database section of the palette and add an ODBC Connector stage to your job. Position the stage to the right of the Data Set stage and name the ODBC Connector stage BillToTarget.

    4. Link the two stages together, and name the link to_target.

Your job looks like the one in the following figure:

Figure 1. Job design


    Configuring the Data Set stage

    In this exercise, you will configure the Data Set stage to read the data set that you created in Lesson 3.4.

    About this task

In this exercise, you will use the table definition that you imported in Lesson 4.2. Notice that the column definitions are the same as the table definition that you created by editing the Transformer stage and Lookup stage in the job in Lesson 3.4.

    Procedure

    1. Double-click the BillToSource Data Set stage to open the stage editor.

2. Select the File property on the Properties tab of the Output page and set it to the data set that you created in Lesson 3.4. Use a job parameter to represent the data set file (see the example after this procedure).

    3. In the Columns page, click Load.

4. In the Table Definitions window, open the tutorial folder, select the table definition that you imported in Lesson 4.2, and click OK. The columns grid is populated with the column metadata.

    5. Click OK to close the stage editor.
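A job parameter keeps the data set path out of the job design so that the path can be supplied when the job runs. For example, if you define a job parameter named dataSetPath (a hypothetical name) whose default value is the path of the Lesson 3.4 data set, you can set the File property to the following value:

    #dataSetPath#

When the job runs, the #dataSetPath# reference is replaced with the parameter's current value.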

    Configuring the ODBC connector

In this exercise, you configure the ODBC connector to supply the information that is needed to write to the database table.

    Procedure

1. Double-click the BillToTarget stage to open the ODBC Connector editor. The connector interface is different from the stage editors that you have used so far in this tutorial.

    2. In the navigator area in the top left of the stage editor, click the stage icon to select it.

    3. In the Properties tab of the Stage page, click Load.

4. In the Open window, open the tutorial folder and select the data connection object that you created in Lesson 4.1. The Data Source, Username, and Password properties are populated with the values from the data connection object.

5. In the navigator area in the top left of the stage editor, click the link to select it.

6. In the Properties tab of the to_target page, set the Write mode property in the Usage category to Insert. The Insert statement field under the SQL property is enabled.

7. Click the Build button next to the Insert statement field, and select Build new SQL (ODBC 3.52 core syntax) from the menu. You use the SQL builder to define the SQL statement that is used to write the data to the database when you run the job.

    8. Configure the SQL builder:


a. In the Select tables area, open the tutorial folder, browse the icons that represent your database and your schema, and select the table definition that you imported in Lesson 4.2.

    b. Drag the table definition to the area to the right of the repository tree.

c. In the table definition, click Select All and drag all the columns to the Insert Columns area. The SQL builder page should look like the one in the following figure:

    Figure 2. SQL builder editor

d. Click the SQL tab to view the SQL statement; then click OK to close the SQL builder. The SQL statement is displayed in the Insert statement field, and your ODBC connector should look like the one in the following figure:


    Figure 3. ODBC Connector editor
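The statement that the SQL builder generates binds each target column to the corresponding column on the input link by using the ORCHESTRATE prefix. Assuming a target table named BILL_TO in a schema named TUTORIAL (illustrative names; your table and schema depend on the setup scripts), the statement resembles the following:

    INSERT INTO TUTORIAL.BILL_TO
      (CUSTOMER_NUMBER, CUST_NAME, ADDR_1, ADDR_2, CITY, REGION_CODE,
       ZIP, TEL_NUM, SOURCE, RECNUM, SETUP_DATE, SPECIAL_HANDLING_CODE)
    VALUES
      (ORCHESTRATE.CUSTOMER_NUMBER, ORCHESTRATE.CUST_NAME, ORCHESTRATE.ADDR_1,
       ORCHESTRATE.ADDR_2, ORCHESTRATE.CITY, ORCHESTRATE.REGION_CODE,
       ORCHESTRATE.ZIP, ORCHESTRATE.TEL_NUM, ORCHESTRATE.SOURCE,
       ORCHESTRATE.RECNUM, ORCHESTRATE.SETUP_DATE, ORCHESTRATE.SPECIAL_HANDLING_CODE)

The column list shown here is the one built in Module 3; the actual list comes from the table definition that you imported.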

    9. Click OK to close the ODBC Connector.

    10. Save, compile, and run the job.

You wrote the BillTo data to the tutorial database table. This table forms the bill_to dimension of the star schema that is being implemented for the GlobalCo delivery data in the business scenario that the tutorial is based on.


    Lesson checkpoint

    You learned how to use a connector stage to connect to and write to a relational database table.

    You learned the following tasks:

• How to configure a connector stage

• How to use a data connection object to supply database connection details
• How to use the SQL builder to define the SQL statement by accessing the database

    Lesson 5.1: Exploring the configuration file

The configuration file is the key to getting the optimum performance from the jobs that you design.

The shape and size of the computer system on which you run jobs is defined in the configuration file. When you run a job, the parallel engine organizes the resources that the job needs according to what is defined in the configuration file. When your computer system changes, you change the configuration file, not the jobs.

Unless you specify otherwise, the parallel engine uses a default configuration file that is set up when InfoSphere DataStage is installed.

    Opening the default configuration file

    You use the Configurations editor in the Designer client to view the default configuration file.

    About this task

    To open the default configuration file:

    Procedure

    1. Select Tools > Configurations.

2. In the Configuration window, select default from the list. The contents of the default configuration file are displayed.

Example configuration file

The following example shows a default configuration file from a four-processor SMP computer system.

{
  node "node1"
  {
    fastname "R101"
    pools ""
    resource disk "C:/IBM/InformationServer/Server/Datasets" {pools ""}
    resource scratchdisk "C:/IBM/InformationServer/Server/Scratch" {pools ""}
  }
  node "node2"
  {
    fastname "R101"
    pools ""
    resource disk "C:/IBM/InformationServer/Server/Datasets" {pools ""}
    resource scratchdisk "C:/IBM/InformationServer/Server/Scratch" {pools ""}
  }
}

The default configuration file is created when InfoSphere DataStage is installed. Although the system has four processors, the configuration file specifies two processing nodes. Specify fewer processing nodes than there are physical processors to ensure that your computer has processing resources available for other tasks while it runs InfoSphere DataStage jobs.

    This file contains the following fields:

    node

    The name of the processing node that this entry defines.

    fastname

The name of the node as it is referred to on the fastest network in the system. For an SMP system, all processors share a single connection to the network, so the fastname value is the same for all the nodes that you define in the configuration file.

    pools

Specifies that nodes belong to a particular pool of processing nodes. A pool of nodes typically has access to the same resource, for example, access to a high-speed network link or to a mainframe computer. The pools string is empty for both nodes, specifying that both nodes belong to the default pool.

    resource disk

Specifies the name of the directory where the processing node writes data set files. When you create a data set or file set, you specify what the controlling file is called and where it is stored, but the controlling file points to other files that store the data. These files are written to the directory that is specified by the resource disk field.

    resource scratchdisk

Specifies the name of a directory where intermediate, temporary data is stored.

Configuration files can be more complex and sophisticated than the example file and can be used to tune your system to get the best possible performance from the parallel jobs that you design.
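For example, a more complex configuration file might place a node in a named pool so that only the stages that are assigned to that pool run on it. The following sketch is illustrative; host names and paths depend on your installation:

    {
      node "node1"
      {
        fastname "R101"
        pools ""
        resource disk "C:/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "C:/IBM/InformationServer/Server/Scratch" {pools ""}
      }
      node "node3"
      {
        fastname "R101"
        pools "" "sort"
        resource disk "C:/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "C:/IBM/InformationServer/Server/Scratch2" {pools ""}
      }
    }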

    Lesson checkpoint

    In this lesson, you learned how the configuration file is used to control parallel processing.

    You learned the following concepts and tasks:

• About configuration files
• How to open the default configuration file
• What the default configuration file contains

    Lesson 5.2: Partitioning data

    When jobs run in parallel, data is partitioned so that each processor has data to process.

    About this task


In the simplest scenario, do not worry about how your data is partitioned. InfoSphere DataStage can partition your data and implement the most efficient partitioning method.

Most partitioning operations result in a set of partitions that are as near to equal size as possible, ensuring an even load across your processors.

As you perform other operations, you need to control partitioning to ensure that you get consistent results. For example, suppose that you are using an Aggregator stage to summarize your data to get the answers that you need. You must ensure that related data is grouped together in the same partition before the summary operation is performed on that partition.
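For example, suppose that you count rows for each customer on a two-node system (illustrative keys). Round-robin partitioning splits a customer's rows across partitions, so the per-partition counts are incomplete; hash partitioning on the key keeps each customer's rows together:

    Input rows (by key):  C1001, C1001, C2002, C1001, C2002

    Round-robin:          partition 0: C1001, C2002, C2002
                          partition 1: C1001, C1001
                          (rows for C1001 are split across both partitions)

    Hash on key:          partition 0: C1001, C1001, C1001  -> count = 3
                          partition 1: C2002, C2002         -> count = 2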

In this lesson, you will run the sample job that you ran in Module 1. By default, the data that is read from the file is not partitioned when it is written to the data set. You change the job so that it has the same number of partitions as there are nodes defined in your system's default configuration file.

    Viewing partitions in a data set

You need to be able to see how data in a data set is divided into partitions to determine how the data is being processed.

    About this task

This exercise teaches you how to use the data set management tool to look at data sets and how they are structured.

    To see how data sets are structured:

    Procedure

    1. Select Tools > Data Set Management.

2. In the Select from server window, browse for the GlobalCo_BillTo.ds data set file that was written by the sample job in Module 1 and click OK.

3. In the partitions section, notice that the data was written to a single partition. The Data Set Management window should look like the one in the following figure:

    Figure 1. Data Set Management window


4. Click the disk icon in the toolbar to open the Data Set viewer, and click OK.

    5. View the data in the data set to see its structure.

    6. Close the window.

Creating multiple data partitions

About this task

By default, most parallel job stages use the auto-partitioning method. The auto method determines the most appropriate partitioning method based on what occurs before and after this stage in the data flow.

The sample job reads a comma-separated file. By default, comma-separated files are read sequentially and all their data is stored in a single partition. In this exercise, you will override the default behavior and specify that the data that is read from the file will be partitioned by using the round-robin method. The round-robin method sends the first data row to the first processing node, the second data row to the second processing node, and so on.
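For example, on a two-node configuration, round-robin partitioning distributes six input rows like this:

    row 1 -> partition 0
    row 2 -> partition 1
    row 3 -> partition 0
    row 4 -> partition 1
    row 5 -> partition 0
    row 6 -> partition 1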


    To specify round-robin partitioning:

    Procedure

    1. Open the sample job that you used in Module 1.

    2. Open the GlobalCoBillTo_ds Data Set stage editor.

    3. Open the Partitioning tab of the Input page.

    4. In the Partition type field, select the round-robin partitioning method.

    5. Compile and run the job.

6. Return to the data set management tool and open the GlobalCo_BillTo.ds data set. You can see that the data set now has multiple data partitions. The following figure shows the data set partitions on the system.

    Figure 2. Data Set Management window showing multiple partitions

    Lesson checkpoint

    In this lesson, you learned some basics about data partitioning.


    You learned the following tasks:

• How to use the data set management tool to view data sets
• How to set a partitioning method for a stage

    Lesson 5.3: Changing the configuration file

In this lesson, you will create a new configuration file and see the effect of running the sample job with the new configuration file.

    About this task

This lesson demonstrates that you can quickly change configuration files to affect how parallel jobs are run. When you develop parallel jobs, first run your jobs and test the basic functionality before you start implementing parallel processing.

    Creating a configuration file

    You use the configuration editor that you used in Lesson 5.1 to create a configuration file.

    Procedure

    1. Select Tools > Configurations to open the Configurations editor.

2. Select default from the Configurations list to open the default configuration file.

    3. Click Save, and select Save configuration as from the menu.

4. In the Configuration name field of the Save Configuration As window, type a name for your new configuration. For example, type Module5.

5. In the part of the configuration editor that shows the contents of the configuration file, click and drag to select all the nodes except for the first node in your configuration file.

    6. Delete the selected entries.

7. Click Check to ensure that your configuration file is valid. The configuration editor should resemble the one in the following picture:

    Figure 1. Configurations file editor
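If you started from the default file that is shown in Lesson 5.1, the edited configuration file contains a single node:

    {
      node "node1"
      {
        fastname "R101"
        pools ""
        resource disk "C:/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "C:/IBM/InformationServer/Server/Scratch" {pools ""}
      }
    }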


    8. Click Save and select Save configuration from the menu.

Deploying the new configuration file

About this task

Now that you have created a new configuration file, you use this new file instead of the default file. You use the Administrator client to deploy the new file. You must have DataStage Administrator privileges to use the Administrator client for this purpose.

    To deploy the new configuration file:

    Procedure

1. Select Start > Programs > IBM InfoSphere Information Server > IBM InfoSphere DataStage and QualityStage Administrator.

2. In the Administrator client, click the Projects tab to open the Projects window.

    3. In the list of projects, select the tutorial project that you are currently working with.

    4. Click Properties.

    5. In the General tab of the Project Properties window, click Environment.

    6. In the Categories tree of the Environment variables window, select the Parallel node.


7. Select the APT_CONFIG_FILE environment variable, and edit the file name in the path name under the Value column heading to point to your new configuration file. The Environment variables window should resemble the one in the following picture:

    Figure 2. Environment variables window
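For example, the value might change as follows. The directory is illustrative; keep whatever path your installation uses and change only the file name:

    Before:  C:\IBM\InformationServer\Server\Configurations\default.apt
    After:   C:\IBM\InformationServer\Server\Configurations\Module5.apt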

You deployed your new configuration file. Keep the Administrator client open, because you will use it to restore the default configuration file at the end of this lesson.

    Applying the new configuration file

    Now you run the sample job again.

    About this task

You will see how the configuration file overrides other settings in your job design. Although you previously partitioned the data that is read from the GlobalCo_BillTo comma-separated file, the configuration file specifies that the system has only a single processing node available, and so no data partitioning is performed.

    To apply the configuration file:

    Procedure

    1. Open the Director client and select the sample job that you edited and ran in Lesson 5.2.

2. Click the Reset button in the toolbar to reset the job so that you can run it again.

    3. Run the sample job.


4. In the Designer client, open the data set management tool and open the GlobalCo_BillTo.ds data set. You see that the data is in a single partition because the new configuration file specifies only one processing node.

5. Reopen the Administrator client to restore the default configuration file by editing the path for the APT_CONFIG_FILE environment variable to point to the default.apt file.

    Lesson checkpoint

    You learned how to create a configuration file and use it to alter the operation of parallel jobs.

    You learned the following tasks:

• How to create a configuration file based on the default file
• How to edit the configuration file
• How to deploy the configuration file