SUPER INFORMATICA BASICS.pdf


  • 8/16/2019 SUPER INFORMATICA BASICS.pdf

    1/49

    Data Warehousing Basics

    What is Data Warehousing?

    A data warehouse is a collection of data designed to support management decision making. In other words, it is a repository of integrated information, available for querying and analyzing. According to Inmon, a well-known author of several data warehouse books, "A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision making process."

    Who needs data warehousing?

    It is needed by knowledge workers, e.g. managers, analysts, executives and any authorized person who needs information from a large-scale database.

    Types of Systems

    There are two types of systems:

    1. OLTP
    2. DSS (OLAP)

    Features | OLTP | OLAP
    Characteristic | Operational processing | Informational processing
    Orientation | Transactional | Analysis
    User | Clerk, DBA | Knowledge worker
    Function | Day-to-day operations | Long-term informational requirements
    DB design | ER based, application oriented | Star/snowflake, subject oriented
    View | Detailed, flat relational | Summarized, multidimensional
    Access | Read/write | Mostly read
    DB size | 10 MB to 100 MB | 100 MB to TB

    Data Warehouse Life Cycle

    The data warehouse life cycle comprises the following phases:

    Phase 1: Business Requirements Collection

    A business analyst is responsible for gathering requirements from the end users, for example in the following domains:

    1. Telecom
    2. Insurance
    3. Manufacturing
    4. Sales & Retail

    Phase 2: Data Modeling
    The process of designing the database, carried out by a database architect using the ERwin tool.

    Phase 3: ETL Development
    An application developer designs an ETL application by following the ETL specification, using GUI-based tools such as I…


    Phase 4: ETL Testing
    This phase is carried out by the ETL tester as well as the application developer. The following tests are carried out in the test environment:

    1. ETL unit testing
    2. System testing
    3. Performance testing
    4. UAT (User Acceptance Testing)

    Phase 5: Report Development
    Design the reports to fulfil the report requirement templates, using the following tools:

    Cognos, BO

    Phase 6: Deployment
    The process of migrating the ETL and report applications to the production environment.

    Phase 7: Maintenance
    Maintain the data warehouse in a 24x7 environment with the help of the production support team.

    Data Warehouse Design (Database Design)

    A data warehouse is designed with the following types of schemas:

    1. Star Schema
    2. Snowflake Schema
    3. Galaxy Schema

    1. Star Schema:-
    A database design which contains a centrally located fact table surrounded by dimension tables. Since the design looks like a star, it is called a star schema.

    • A fact table contains facts.

    • Facts are numeric measures.


     

    [Figure: Star Schema]

    Snowflake Schema

    The snowflake schema is a variant of the star schema in which some dimension tables are normalized, thereby further splitting the data into additional tables. The resulting schema graph forms a shape similar to a snowflake.

    Advantage:-

    • Space can be minimized by splitting the data into normalized tables.

    Disadvantage:-

    • It can hamper query performance because of the larger number of joins.

    [Figure: Snowflake Schema]

    3. Galaxy Schema (Fact Constellation Schema)

    Sophisticated applications may require multiple fact tables to share dimension tables. This type of schema can be viewed as a combination of stars, so it is called a galaxy schema or fact constellation schema.

    [Figure: Sales Fact table (sale id as primary key; customer, store, product and date ids as foreign keys) surrounded by Store, Time, Customer and Product dimensions, with additional Item and City dimensions in the snowflake version; galaxy version with Fact 1 and Fact 2 sharing dimension tables D1 to D9]


    [Figure: Galaxy Schema]

    Dimensions

    Dimension tables are sometimes called lookup or reference tables.

    1. Conformed Dimension - A dimension table that can be shared by multiple fact tables is known as a conformed dimension.

    2. Junk Dimension - A dimension holding descriptive, flag or Boolean attributes that are not used to describe the key performance indicators (known as facts) is called a junk dimension. Examples: product description, address, phone number, etc.

    3. Slowly Changing Dimension - Dimensions that change over time are called Slowly Changing Dimensions (SCDs). For instance, a product price changes over time; people change their names for some reason; country and state names may change over time. These are a few examples of slowly changing dimensions, since changes happen to them over a period of time. Slowly changing dimensions are often categorized into three types, namely Type 1, Type 2 and Type 3. The following section deals with how to capture and handle these changes over time.

    Type 1: Overwriting the old values:
    In the year 2005, if the price of the product changes to $250, then the old values of the columns "Year" and "Product Price" are updated and replaced with the new values. In Type 1 there is no way to find out the old value of the product "Product1" in year 2004, since the table now contains only the new price and year information.

    Type 2: Creating an additional record:
    In Type 2, the old values are not replaced. Instead, a new row containing the new values is added to the product table. So at any point in time, the old and new values can be retrieved and easily compared. This is very useful for reporting purposes.

    Type 3: Creating new fields:
    In Type 3, the latest update to the changed values can be seen. New columns are added to keep track of the changes, so we can see both the current price and the previous price of the product, Product1.
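The three SCD types can be contrasted in a small Python sketch. It is not Informatica code. The product dimension is a hypothetical list of dicts, and the 2004 price of 150 is made up for the example. The `current_flag` and `prev_price` columns are assumed names for the Type 2 history marker and the Type 3 extra field.

```python
from copy import deepcopy

# Hypothetical product dimension; the 2004 price (150) is an assumed value.
product_dim = [{"product": "Product1", "year": 2004, "price": 150}]


def scd_type1(rows, product, year, price):
    """Type 1: overwrite the old values in place, so history is lost."""
    out = deepcopy(rows)
    for row in out:
        if row["product"] == product:
            row["year"], row["price"] = year, price
    return out


def scd_type2(rows, product, year, price):
    """Type 2: keep the old row and add a new row with the new values."""
    out = deepcopy(rows)
    for row in out:
        if row["product"] == product:
            row["current_flag"] = False  # assumed column marking history rows
    out.append({"product": product, "year": year, "price": price,
                "current_flag": True})
    return out


def scd_type3(rows, product, year, price):
    """Type 3: add a field holding the previous value next to the current one."""
    out = deepcopy(rows)
    for row in out:
        if row["product"] == product:
            row["prev_price"] = row["price"]  # assumed extra column
            row["year"], row["price"] = year, price
    return out


# In 2005 the price changes to $250.
type1 = scd_type1(product_dim, "Product1", 2005, 250)
type2 = scd_type2(product_dim, "Product1", 2005, 250)
type3 = scd_type3(product_dim, "Product1", 2005, 250)
print(type1, type2, type3, sep="\n")
```

Only Type 2 keeps the 2004 row. Type 3 keeps just one previous value, so a second price change would overwrite `prev_price`.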


    Data Modeling

    A data model is a conceptual representation of the data structures (tables) required for a database. It is very powerful in expressing and communicating the business requirements. A data model visually represents the nature of the data, the business rules governing the data, and how the data will be organized in the database. Data modeling consists of three phases to design the database:

    1. Conceptual Modeling

    • Understand the business requirements
    • Identify the entities (tables)
    • Identify the columns (attributes)
    • Identify the relationships

    2. Logical Modeling

    • Design the tables with the required attributes.

    3. Physical Modeling

    • Implement the logical tables so that they physically exist in the database.

    Data Modeling Tools

    There are a number of data modeling tools that transform business requirements into a logical data model, and a logical data model into a physical data model. From the physical data model, these tools can be instructed to generate the SQL code for creating the database.
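The last step, going from a physical model to SQL, can be illustrated with a Python sketch that renders CREATE TABLE statements from a physical model held in a plain dict. This is a hypothetical toy and does not describe how ERwin or any other modeling tool works internally. The `generate_ddl` name and the model layout are assumptions. The Customer columns are borrowed from the mapping example later in this document.

```python
# Hypothetical physical model: table name -> list of (column, SQL type, is_primary_key).
model = {
    "customer": [
        ("id", "NUMBER(4)", True),
        ("fname", "VARCHAR(25)", False),
        ("lname", "VARCHAR(25)", False),
        ("gender", "NUMBER(1)", False),
    ],
}


def generate_ddl(model):
    """Render one CREATE TABLE statement per table in the physical model."""
    statements = []
    for table, columns in model.items():
        lines = [f"    {name} {sql_type}" for name, sql_type, _ in columns]
        keys = [name for name, _, is_pk in columns if is_pk]
        if keys:
            lines.append(f"    PRIMARY KEY ({', '.join(keys)})")
        body = ",\n".join(lines)
        statements.append(f"CREATE TABLE {table} (\n{body}\n);")
    return "\n\n".join(statements)


ddl = generate_ddl(model)
print(ddl)
```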


    INFORMATICA

    Introduction

    Informatica is a GUI-based ETL product from Informatica Corporation.

    It is a client-server technology.

    It is developed using the Java language.

    It is an integrated tool set (to design, to run, to monitor).

    Versions

    1. 5.0
    2. 6.0
    3. 7.1.1
    4. 8.1.1
    5. 8.5
    6. 8.6

    Meta Data

    Metadata is "data about data": data that describes data and other structures, such as objects, business rules and processes. Example: a table structure (column name, data type, precision, scale and keys), description.

    Mapping

    A mapping is a GUI representation of the data flow from source to target. In other words, it is the definition of the relationship and data flow between source and target objects.

    Requirements for mappings

    a) Source metadata
    b) Business logic
    c) Target metadata

    Repository

    The central database, or metadata storage place.


    [Figure: Source Database → Informatica → Target Data Warehouse, with the Repository as the working place in Informatica]


    Staging Area

    A place where data is processed before entering the warehouse.

    Source System

    A database, application, file, or other storage facility from which the data in a data warehouse is derived.

    Target System

    A database, application, file, or other storage facility to which the "transformed source data" is loaded in a data warehouse.

    Cleansing

    The process of resolving inconsistencies and fixing anomalies in source data, typically as part of the ETL process.

    Transformation

    The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.

    Transportation

    The process of moving copied or transformed data from a source to a data warehouse.

    Working Professional Divisions

    Two Flavors

    1. Informatica PowerCenter: for large-scale industries.
    2. Informatica PowerMart: for small-scale industries.

    Components of Informatica

    Client Components

    1. Designer
    2. Workflow Manager
    3. Workflow Monitor
    4. Repository Manager

    Designation | Roles
    ETL Architect | Designing the schema, ETL specification
    Developer | Developing the ETL application
    Administrator | Installation, configuration, managing, monitoring


    5. Admin Console

    Roles for Designer:
    1. Use mapping
    2. Source Analyzer
    3. Connect to the source database with ODBC
    4. Target Designer
    5. Mapping Designer
    6. Mapplet Designer
    7. Transformation Developer

    Roles for Workflow Manager:
    1. Task Developer
    2. Workflow Designer
    3. Worklet Designer


    Designer: Import source definition → Import target metadata → Design mapping (Mapping xyz) → Save → Repository

    Workflow Manager:
    1. Create session (for Mapping xyz) → Save
    2. Create workflow → Start
    -- Executed in the Informatica server.
    -- Integration Services are responsible for execution.

    Workflow Monitor: Monitoring (mapping, session)

    Admin Console

    For administrative purposes.


    Working Flow of Client Components in Informatica

     


    How can the mapping be done?

    Customer
    ID number(4) (pk)
    fname varchar(25)
    lname varchar(25)
    Gender number(1)

    CONCAT(fname, lname)    DECODE(Gender, 0, 'F', 1, 'M')
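The two expressions above can be mimicked in Python to show what each output port computes per row. The `concat` and `decode` helpers are simplified, hypothetical stand-ins for the Informatica CONCAT and DECODE functions; for example, they ignore NULL handling. The output port names `full_name` and `gender_code` are assumptions, and the customer rows are sample data.

```python
def concat(first, second):
    """Simplified CONCAT: joins two strings with no separator."""
    return first + second


def decode(value, *search_result_pairs, default=None):
    """Simplified DECODE(value, search1, result1, search2, result2, ...)."""
    searches = search_result_pairs[0::2]
    results = search_result_pairs[1::2]
    for search, result in zip(searches, results):
        if value == search:
            return result
    return default


def expression(row):
    """One output row per input row, carrying two derived (hypothetical) ports."""
    return {
        **row,
        "full_name": concat(row["fname"], row["lname"]),
        "gender_code": decode(row["gender"], 0, "F", 1, "M"),
    }


customers = [
    {"id": 1, "fname": "Ravi", "lname": "Kumar", "gender": 1},
    {"id": 2, "fname": "Anita", "lname": "Shah", "gender": 0},
]
out = [expression(c) for c in customers]
print(out)
```

Because CONCAT adds no separator, the first full name comes out as `RaviKumar`. A space would have to be concatenated explicitly.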

     


    Designer | Workflow Manager | Workflow Monitor | Repository Manager
    Create source definition | Create session for each mapping | View workflow & session status | Create, edit & delete folders
    Create target definition | Create workflow | Get session log | Create users, groups, assign permissions
    Define T/R rule | Execute workflow | |
    Design mapping | Schedule workflow | |

     

    Informatica PowerCenter Client Architecture

    Note: One workflow can contain more than one session, but one session contains only one mapping. The workflow is the upper layer of development, the session is the middle layer and the mapping is the inner layer.


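    [Figure: PowerCenter architecture with external clients, Repository Services, Integration Services and Web Services Hub around the Repository (mapping, source definition, target definition, T/R rule, session, workflow, session log, schedule info); the Integration Service extracts from the source DB, transforms in the staging area and loads the target DB (E, T, L)]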


    PowerCenter Clients

    The following PowerCenter clients get installed:

    1. Designer

    A GUI-based client component which allows you to design the plan of the ETL process, called a mapping.

    The following types of metadata objects can be created using the Designer client:
    a) Create source definition
    b) Create target definition
    c) Design mapping with or without a transformation rule

    2. Workflow Manager

    A GUI-based client component which allows you to perform the following tasks:
    a) Create a session for each mapping
    b) Create workflow
    c) Execute workflow
    d) Schedule workflow

    3. Workflow Monitor

    A GUI-based client component which provides the following:
    a) Gives the workflow and session status (Succeeded or Failed)
    b) Gets the session log from the repository
    c) Starts and stops sessions and workflows

    4. Repository Manager

    The Repository Manager is a GUI-based administrative client which allows you to create the following objects:
    a) Create, edit and delete folders, which are required to organize the metadata in the repository
    b) Create users and user groups; assign permissions and privileges

    5. PowerCenter Repository

    The PowerCenter Repository is a relational database which contains the instructions required to extract, transform and load data.

    The PowerCenter client applications access the repository database through the Repository Service.

    The repository consists of metadata which describes the different types of objects, such as source definitions, target definitions, mappings, etc.

    The Integration Service uses repository objects to extract, transform and load data.

    The repository also stores administrative information such as usernames, passwords, permissions and privileges.

    The Integration Service also creates metadata, such as the session log, the workflow and session status, and the start and finish time of the session. It stores this metadata in the repository through the Repository Service.


    6. Repository Service

    The Repository Service manages connections to the PowerCenter repository from client applications.

    The Repository Service is a multithreaded process that inserts, retrieves, deletes and updates metadata in the repository.

    The Repository Service ensures the consistency of the metadata in the repository. The following PowerCenter applications can access the Repository Service:
    a) PowerCenter Client
    b) Integration Service
    c) Web Services Hub
    d) Command line programs (for backup and recovery, for administrative purposes)

    7. Integration Service

    The Integration Service reads mapping and session information from the repository. It extracts the data from the mapping sources and stores it in memory (the staging area), where it applies the transformation rules that you configure in the mapping.

    The Integration Service loads the transformed data into the mapping targets.

    The Integration Service connects to the repository through the Repository Service to fetch the metadata.

    8. Web Services Hub

    The Web Services Hub is a web service gateway for external clients. Web service clients (Internet Explorer, Mozilla) access the Integration Service and Repository Service through the Web Services Hub. It is used to run and monitor web-enabled workflows.

    Definitions

    Session: A session is a set of instructions which performs extraction, transformation and loading. A session is created to make a mapping available for execution.

    Workflow: A workflow is a start task which contains a set of instructions to execute other tasks, such as sessions. The workflow is the top object in the PowerCenter development hierarchy.

    Schedule Workflow: Scheduling a workflow is an administrative task which specifies the date and time to run the workflow.

    The following client components communicate with the Integration Service:

    1. Workflow Manager
    2. Workflow Monitor


    Transformation

    A transformation is an object used to define business logic for processing the data.

    Transformations can be categorized in two ways:
    1. Based on the number of rows processed
    2. Based on connection

    Based on the number of rows processed, there are two types of transformation:
    1. Active transformation
    2. Passive transformation

    Active Transformation:

    A transformation which can change the number of rows as data passes from source to target is known as an active transformation. The following active transformations are used for processing the data:

    1. Source Qualifier Transformation
    2. Filter Transformation
    3. Aggregator Transformation
    4. Joiner Transformation
    5. Router Transformation
    6. Rank Transformation
    7. Sorter Transformation
    8. Update Strategy Transformation
    9. Transaction Control Transformation
    10. Union Transformation
    11. …
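One way to see the active/passive distinction is to count rows in and out. In this hypothetical Python sketch, an expression-style step (passive) always returns one row per input row, while a filter-style step (active) can return fewer. The sample rows and the 10% tax rule are illustrative only.

```python
emp = [
    {"empno": 1, "deptno": 10, "sal": 1000},
    {"empno": 2, "deptno": 30, "sal": 1600},
    {"empno": 3, "deptno": 30, "sal": 1250},
    {"empno": 4, "deptno": 20, "sal": 2975},
]


def passive_step(rows):
    """Passive: adds a derived port; the row count is unchanged."""
    return [{**r, "tax": r["sal"] * 0.10} for r in rows]


def active_step(rows):
    """Active: drops rows that fail the condition; the row count can change."""
    return [r for r in rows if r["deptno"] == 30]


print(len(emp), len(passive_step(emp)), len(active_step(emp)))  # 4 4 2
```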


    Example of Passive Transformation

    [Figure: 14 rows in (I) → 14 rows out (O)]

    Based on connection, there are two types of transformation:
    1. Connected
    2. Unconnected

    Connected: A transformation which participates in the mapping data flow (connected to the source and target) is known as a connected transformation.
    -- All active and passive transformations can be used as connected transformations.
    -- A connected transformation can receive multiple inputs and can provide multiple outputs.

    [Figure: connected transformation: Source → Tax, Annual Sal (input and output ports) → Target]

    Unconnected: A transformation which does not participate in the mapping data flow (connected neither to the source nor to the target) is known as an unconnected transformation.
    -- An unconnected transformation can receive multiple inputs but provides a single output.
    -- The following transformations can be used as unconnected transformations:
    1. Stored Procedure Transformation
    2. Lookup Transformation

    [Figure: mapping Emp → SQ_Emp → Expression Transformation (Tax = Sal*0.10; ports include SAL and COM) → T_Emp]


    Port & Types of Port

    A port represents a column of a database table or file.

    The following are the types of port:

    1. Input port
    2. Output port

    Input port: A port which can receive data is known as an input port, represented as I.

    Output port: A port which can provide data is known as an output port, represented as O.

    ETL Specification Document (Mapping Specification Document)

    A mapping specification document is an Excel sheet or Word document which contains information about the following objects:

    1. Source
    2. Target
    3. Business logic (transformation rule)

    Source | Target | Transformation Rule
    Source type | Target type | Calculate Tax (sal*0.10) for the top 3 employees based on salary in dept 30
    Source table | Target table |
    Source column | Target column |
    Format type (P, S) | Format type (P, S) |
    Description | Description |
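The transformation rule in this specification (tax at sal*0.10 for the top 3 earners in dept 30) can be prototyped in Python before the mapping is built. The sketch below is a hypothetical illustration. The sample rows are EMP-style illustrative data (six of them in dept 30), and each step is only labeled with the kind of transformation that would play that role.

```python
# (empno, deptno, sal) sample rows; illustrative values only.
emp = [
    (7369, 20, 800), (7499, 30, 1600), (7521, 30, 1250), (7566, 20, 2975),
    (7654, 30, 1250), (7698, 30, 2850), (7782, 10, 2450), (7844, 30, 1500),
    (7900, 30, 950), (7839, 10, 5000),
]


def top3_tax_dept30(rows):
    dept30 = [r for r in rows if r[1] == 30]                        # filter: dept = 30
    top3 = sorted(dept30, key=lambda r: r[2], reverse=True)[:3]     # rank: top 3 by sal
    return [(empno, sal, round(sal * 0.10, 2)) for empno, _, sal in top3]  # expression: tax


result = top3_tax_dept30(emp)
print(result)  # [(7698, 2850, 285.0), (7499, 1600, 160.0), (7844, 1500, 150.0)]
```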

    DFD

    [Figure: Emp (14 rows) → SQ_Emp (14 rows) → Filter Dept = 30 (6 rows) → Top 3 (3 rows) → Tax = Sal*0.10 (3 rows) → T_Emp]

    Filter Transformation

    This is an active transformation which allows you to filter data based on a given condition.
    -- A condition is created with three elements:

    1. Port
    2. Operator
    3. Operand


    -- The Integration Service evaluates the filter condition against each input record and returns TRUE or FALSE.
    -- It returns TRUE when the record satisfies the condition. Such records are passed on for further processing or loaded into the target.
    -- It returns FALSE when the input record does not satisfy the condition. Those records are rejected by the filter transformation.
    -- Filter transformation does not support I…
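The port/operator/operand structure of a filter condition maps naturally onto a small Python sketch. This is a hypothetical model of the behaviour described above, not Informatica's expression language. `make_condition` and the operator table are assumed names. Rows whose condition is FALSE are simply dropped.

```python
import operator

# Operator element of the condition, mapped to Python comparisons.
OPERATORS = {"=": operator.eq, "!=": operator.ne, ">": operator.gt,
             ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def make_condition(port, op, operand):
    """Build a condition from its three elements: port, operator, operand."""
    compare = OPERATORS[op]
    return lambda row: compare(row[port], operand)


def filter_rows(rows, condition):
    """Pass rows whose condition is TRUE; rows evaluating to FALSE are dropped."""
    return [row for row in rows if condition(row)]


rows = [{"ename": "A", "deptno": 10}, {"ename": "B", "deptno": 30},
        {"ename": "C", "deptno": 30}]
dept30 = filter_rows(rows, make_condition("deptno", "=", 30))
print([r["ename"] for r in dept30])  # ['B', 'C']
```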


    Aggregator Transformation

    This is an active transformation which allows you to calculate summaries for a group of records.

    An aggregator transformation is created with the following four components:

    1. Group By - Defines the group on a port for which summaries are calculated. Ex. Deptno

    2. Aggregate Expression - Aggregate expressions can be developed only in output ports, using aggregate functions such as:
    -- sum()
    -- max()
    -- avg()

    3. Sorted Input - An aggregator transformation can receive sorted data as input to improve the performance of summary calculations.
    The ports on which the group is defined need to be sorted using a sorter transformation (only the group-by ports need to be sorted by the sorter transformation).

    4. Aggregate Cache - The Integration Service creates cache memory when the session executes for the first time.
    -- The aggregate cache is stored on the server hard drive.
    -- Incremental aggregation uses the aggregate cache to improve the performance of the session.

    Incremental Aggregation

    It is the process of calculating summaries for only the new records that pass through the mapping, using the historical cache.
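Incremental aggregation can be illustrated with a Python sketch in which a plain dict stands in for the aggregate cache. The dict is keyed by the group-by port, like the index cache, and holds running totals as the data. This is a simplified, hypothetical model: it keeps sum, max and count per `deptno` and derives avg from them. The function names are assumptions.

```python
def aggregate(rows, cache=None):
    """Group by deptno, merging new rows into an existing cache if one is given."""
    # Copy the historical cache so the caller's copy is not modified.
    cache = {dept: dict(totals) for dept, totals in (cache or {}).items()}
    for row in rows:
        totals = cache.setdefault(row["deptno"], {"sum": 0, "max": None, "count": 0})
        totals["sum"] += row["sal"]
        totals["max"] = row["sal"] if totals["max"] is None else max(totals["max"], row["sal"])
        totals["count"] += 1
    return cache


def averages(cache):
    """avg() is derived from the cached sum and count."""
    return {dept: t["sum"] / t["count"] for dept, t in cache.items()}


first_run = aggregate([{"deptno": 10, "sal": 1000}, {"deptno": 20, "sal": 3000}])
# Second run: only the new record is processed, using the historical cache.
second_run = aggregate([{"deptno": 10, "sal": 2000}], cache=first_run)
print(second_run, averages(second_run))
```

The cache stores count alongside sum so that avg() can be recomputed after new rows are merged. An average on its own could not be merged correctly.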

     


    -- Use the lookup transformation to perform the following tasks:
    1. Get a related value
    2. Update slowly changing dimensions

    Difference between Expression and Aggregator Transformation

    Expression Transformation | Aggregator Transformation
    Passive transformation | Active transformation
    Expressions are calculated for each record | Expressions are calculated for a group of records

     


    Merging

    Horizontally: Joiner Transformation (Equi-join, Master outer join, Detail outer join, Full outer join)

    Vertically: Union Transformation

    Router Transformation

    Router transformation is an active transformation which allows you to apply multiple conditions to load multiple target tables.
    -- It is created with two types of group:
    1. Input group - receives the data from the source.
    2. Output group - sends the data to the target.
    Output groups are also of two types:

    1. User-defined group - allows you to apply a condition.

    2. Default group - captures the rejected records.


    Difference between Filter & Router Transformation

    Filter | Router
    Single-condition based | Multiple-condition based
    Single target | Multiple targets
    Cannot capture rejects | Captures the rejects

    DFD

    [Figure: Router Transformation with an Input group and output groups State=HR, State=DL, State=KA and Default]
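The router in the DFD can be modeled in Python as one pass over the input, with every group condition evaluated for each row. The sketch is hypothetical; the group names and the `state` port follow the figure. As in a router, a row goes to every group whose condition is true, and rows matching no group go to the default group.

```python
def route(rows, groups):
    """Read each row once and evaluate every user-defined group condition on it."""
    out = {name: [] for name in groups}
    out["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, condition in groups.items():
            if condition(row):
                out[name].append(row)
                matched = True
        if not matched:
            out["DEFAULT"].append(row)  # default group captures unmatched rows
    return out


groups = {
    "HR": lambda r: r["state"] == "HR",
    "DL": lambda r: r["state"] == "DL",
    "KA": lambda r: r["state"] == "KA",
}
rows = [{"id": 1, "state": "HR"}, {"id": 2, "state": "KA"}, {"id": 3, "state": "MH"}]
routed = route(rows, groups)
print({name: [r["id"] for r in grp] for name, grp in routed.items()})
# {'HR': [1], 'DL': [], 'KA': [2], 'DEFAULT': [3]}
```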

    Union Transformation

    Union transformation combines multiple input flows into a single output flow. It supports both homogeneous and heterogeneous sources. It is created with two groups:

    1. Input group - receives the information.
    2. Output group - sends the information to the target.

    Union transformation works like UNION ALL in Oracle.
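The following Python sketch illustrates UNION ALL semantics with heterogeneous inputs: rows from a hypothetical table (a list of dicts) and rows parsed from flat-file text with `csv.DictReader`. It assumes both flows already expose the same ports. Duplicates are kept, as with UNION ALL.

```python
import csv
import io
from itertools import chain


def union_all(*flows):
    """Concatenate input flows into one output flow, keeping duplicates."""
    return list(chain.from_iterable(flows))


table_rows = [{"id": "1", "name": "A"}, {"id": "2", "name": "B"}]  # e.g. from a database
file_rows = csv.DictReader(io.StringIO("id,name\n2,B\n3,C\n"))      # e.g. from a flat file

merged = union_all(table_rows, file_rows)
print(len(merged))  # 4: the duplicate row (2, B) is not removed
```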

     


    Use the normal property when the stored procedure is performing a calculation.

    Source Qualifier Transformation

    This is an active transformation which allows you to read data from databases and flat files (text files).

    SQL Override

    It is the process of changing the default SQL using a source filter, user-defined joins, sorting of input data and elimination of duplicates (Distinct). The Source Qualifier transformation supports SQL override when the source is a database. The above logic is then processed on the database server, so the business-logic processing is shared between the Integration Service and the database server. This improves the performance of data acquisition.

    User Defined Joins: If the two sources belong to the same database user account or the same ODBC, apply the join in the source qualifier rather than using a joiner transformation.
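The following Python sketch illustrates pushing the join, source filter, sort and distinct down to the database by running an override-style query against an in-memory SQLite database. The tables, data and query are hypothetical. In PowerCenter the override SQL is entered in the Source Qualifier properties rather than executed from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp  (empno INTEGER, deptno INTEGER, sal INTEGER);
    CREATE TABLE dept (deptno INTEGER, dname TEXT);
    INSERT INTO emp  VALUES (1, 10, 1000), (2, 30, 1600), (3, 30, 1600), (4, 30, 950);
    INSERT INTO dept VALUES (10, 'ACCOUNTING'), (30, 'SALES');
""")

# User-defined join + source filter + DISTINCT + ORDER BY, all run by the database.
override_sql = """
    SELECT DISTINCT d.dname, e.sal
    FROM emp e JOIN dept d ON e.deptno = d.deptno
    WHERE e.sal > 900
    ORDER BY e.sal DESC
"""
rows = conn.execute(override_sql).fetchall()
print(rows)  # [('SALES', 1600), ('ACCOUNTING', 1000), ('SALES', 950)]
conn.close()
```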

    Mapplet & Types of Mapplet

    A mapplet is a reusable metadata object created with business logic using a set of transformations. A mapplet is created using the Mapplet Designer tool. There are two types of mapplet:

    1. Active mapplet - created with a set of active transformations.
    2. Passive mapplet - created with a set of passive transformations.

    It can be reused in multiple mappings, with the following restrictions:
    1. When you want to use a stored procedure transformation, you should use the stored procedure transformation with the type…


    It is created in two different ways:
    i. Using the Transformation Developer
    ii. Converting a…


    </EMP>

    </EMPDETAILS>

    Normalizer Transformation

    This is an active transformation which reads data from a global file source. It is used to read files from a COBOL source. Every COBOL source definition by default associates with…


    A transaction can also be controlled at the session level by using the commit interval property.

    Sequence Generator Transformation

    This is a passive transformation which allows you to generate sequence numbers to be used as primary keys.
    -- A surrogate key is a system-generated sequence number used as the primary key to maintain history in dimension tables.

    -- A surrogate key is also known as a dimensional key, artificial key or synthetic key.

    -- A sequence generator transformation is created with two default output ports:

    i.
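A surrogate key generator is essentially a counter. The hypothetical Python class below mimics that behaviour: a start value, an increment, a "next" value and the last value issued. It then uses the class to key rows of a product dimension, including a second version of Product1 as in SCD Type 2. The class and method names are assumptions, not the transformation's actual port names.

```python
class SequenceGenerator:
    """Issues increasing integers to be used as surrogate keys."""

    def __init__(self, start=1, increment=1):
        self._next = start
        self.increment = increment
        self.current = None  # last value issued

    def next_value(self):
        self.current = self._next
        self._next += self.increment
        return self.current


seq = SequenceGenerator()
dimension = []
for product, price in [("Product1", 150), ("Product2", 90), ("Product1", 250)]:
    # Each row, including a new version of Product1, gets its own surrogate key.
    dimension.append({"product_key": seq.next_value(), "product": product, "price": price})
print(dimension)
```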


    The default update strategy expression is DD_INSERT.

    Update strategy transformation functions work on the target definition table.

    The target table should contain a primary key.

    Use the following target table options at the session level to implement an update strategy:

    i. Insert - inserts the records into the target.
    ii. Update as Update - updates the records in the target.
    iii. Delete - deletes the records from the target.
    iv. Update as Insert - for each update, inserts a new record into the target.
    v. Update else Insert - updates the record if it exists, otherwise inserts a new record into the target.

    Use an update strategy transformation to update SCDs.
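The "update else insert" behaviour can be sketched in Python with a target table modeled as a dict keyed by its primary key, which is why the target needs one. The flag constants echo the DD_ names used in update strategy expressions. Their numeric values and the helper names here are for illustration only.

```python
# Row flags named after the update strategy constants (values illustrative).
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3


def flag_row(row, target):
    """Update-strategy style expression: update if the key exists, else insert."""
    return DD_UPDATE if row["id"] in target else DD_INSERT


def apply_rows(rows, target):
    """Apply flagged rows to a target keyed by primary key; returns flag counts."""
    counts = {DD_INSERT: 0, DD_UPDATE: 0}
    for row in rows:
        flag = flag_row(row, target)
        counts[flag] += 1
        target[row["id"]] = dict(row)  # insert or overwrite by primary key
    return counts


target = {1: {"id": 1, "price": 150}}
counts = apply_rows([{"id": 1, "price": 250}, {"id": 2, "price": 90}], target)
print(counts, target)
```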

    CACHE


    [Figure: Joiner Cache, made up of an Index Cache and a Data Cache]

    LOOKUP CACHE

    How it works

    There are two types of cache memory: index cache and data cache.

    The index cache contains all port values from the lookup table where the port is part of the lookup condition.

    The data cache contains all port values from the lookup table that are not in the lookup condition and are specified as "output" ports. After the cache is loaded, values from the lookup input ports that are part of the lookup condition are compared to the index cache.

    Upon a match, the rows from the cache are included in the stream.
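The index/data cache split can be sketched in Python as two structures built once from the lookup table. A dict keyed by the condition-port values stands in for the index cache, and a list of output-port values stands in for the data cache. This is a simplified, hypothetical model. The department rows are sample data, and keeping the first row for duplicate keys is an assumed policy.

```python
def build_cache(lookup_rows, condition_ports, output_ports):
    index_cache = {}  # condition-port values -> position in data_cache
    data_cache = []   # output-port values
    for row in lookup_rows:
        key = tuple(row[p] for p in condition_ports)
        if key not in index_cache:  # keep the first row for a duplicate key
            index_cache[key] = len(data_cache)
            data_cache.append({p: row[p] for p in output_ports})
    return index_cache, data_cache


def lookup(input_values, index_cache, data_cache):
    """Compare input values to the index cache; return the matching data, or None."""
    position = index_cache.get(tuple(input_values))
    return None if position is None else data_cache[position]


dept = [{"deptno": 10, "dname": "ACCOUNTING", "loc": "NEW YORK"},
        {"deptno": 20, "dname": "RESEARCH", "loc": "DALLAS"}]
index_cache, data_cache = build_cache(dept, ["deptno"], ["dname", "loc"])
print(lookup([20], index_cache, data_cache))  # {'dname': 'RESEARCH', 'loc': 'DALLAS'}
print(lookup([99], index_cache, data_cache))  # None
```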

    Types of Lookup Cache

    When the mapping contains a lookup transformation, the Integration Service queries the lookup data and stores it in the lookup cache.

    The following are the types of cache created by the Integration Service:

    1. Static Lookup Cache
    This is the default lookup cache created by the Integration Service. It is a read-only cache and cannot be updated.

    2. Dynamic Lookup Cache

    The cache can be updated during the session run. It is particularly used when you perform a lookup on the target table to implement a slowly changing dimension.


    [Figure: sample lookup table with columns Deptno (10, 20, 30, 40), Dname and Location]


    If the lookup table is the target, the cache is changed dynamically as the target load rows are processed.
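The difference between a static and a dynamic cache shows up when the same new key arrives twice in one run. In the hypothetical Python sketch below, the cache is built from the target's existing keys. Only the dynamic variant adds each newly inserted key to the cache, so a later row with that key is treated as an update instead of a second insert. The function and parameter names are assumptions.

```python
def split_inserts_updates(source_rows, target_keys, dynamic=True):
    """Look up each source row against a cache of target keys."""
    cache = set(target_keys)  # built from the target table before the run
    inserts, updates = [], []
    for row in source_rows:
        if row["id"] in cache:
            updates.append(row)
        else:
            inserts.append(row)
            if dynamic:
                cache.add(row["id"])  # dynamic cache: updated as rows are loaded
    return inserts, updates


source = [{"id": 1}, {"id": 5}, {"id": 5}]
print(split_inserts_updates(source, [1], dynamic=True))   # ([{'id': 5}], [{'id': 1}, {'id': 5}])
print(split_inserts_updates(source, [1], dynamic=False))  # ([{'id': 5}, {'id': 5}], [{'id': 1}])
```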

     


    Performance Consideration

    A large lookup table may require more memory resources than are available. A SQL override in the lookup transformation can be used to reduce the amount of data cached.

    Persistent Lookup Cache
    The cache can be reused across multiple session runs. This improves the performance of the session.

    AGGREGATE CACHE

    How it Works

    When the session executes on the Integration Service for the first time, the Integration Service creates an aggregate cache, which is made up of an index cache and a data cache.

    The Integration Service uses the aggregate cache to perform incremental aggregation.

    This improves the performance of the session.

    There are two types of cache memory: index cache and data cache.

    All rows are loaded into the cache before any aggregation takes place.

    The index cache contains the group-by port values.

    The data cache contains the values of all variable ports and connected output ports.

     


    Perform incremental aggregation using the aggregate cache. Group on numeric ports rather than character ports.

    [Figure: Aggregate Cache, made up of an Index Cache and a Data Cache]

    SORTER CACHE

    How it Works

    If the cache size specified in the properties exceeds the available amount of memory on the Integration Service process machine, the Integration Service fails the session.

    All of the incoming data is passed into cache memory before the sort operation is performed. If the amount of incoming data is greater than the specified cache size, PowerCenter temporarily stores the data in the sorter transformation work directory.

    Key Points

    The Integration Service requires disk space of at least twice the amount of incoming data when storing data in the work directory.

    Performance Consideration

    When the source is a database, a sorter transformation in an aggregate session may perform better than an "Order by" clause in a SQL override. The source database may not be tuned with the buffer size needed for a database sort.

    Performance Consideration in Various Transformations

    Filter Transformation

    Keep the filter transformation as close to the source qualifier as possible, to filter the data early in the data flow.

    If possible, move the same condition into the source qualifier transformation.


    [Figure: sample aggregator output with columns Deptno and Sum(Sal)]


    Router Transformation

    When splitting row data based on field values, a router transformation has a performance advantage over multiple filter transformations. A row is read once into the input group but evaluated multiple times, once per group. Using multiple filter transformations, by contrast, requires the same row data to be duplicated for each filter transformation.

    Update Strategy Transformation

    The performance of the update strategy transformation can vary depending on the number of updates and inserts. In some cases there may be a performance benefit in splitting a mapping with updates and inserts into two mappings and sessions: one mapping with inserts and the other with updates.

    Expression Transformation

    Use operators instead of functions.

    Ex. Instead of using the CONCAT function, use the || operator to concatenate two string fields.

    Simplify complex expressions by defining variable ports.

    Try to avoid using aggregate functions.

    TASK and TYPES OF TASK

    A task is defined as a set of instructions. There are two types of task:

    i. Reusable Task - A task which can be defined for multiple workflows is known as a reusable task. A reusable task is created using the Task Developer tool. Ex. Session, Command, Email.

    ii.


    ii. +equential batch processing @ +ession e$ecutes one after another.

    %he above pictorial representation defines as follows

    If +@&7 is finished +ucceeded or 3ailed- then +@*7 start and so on.

    Lin' Condition

     In sequential batch processing the session e$ecuted sequentially and conditionally usinglink condition. Define the link conditions using a predefined variable called)rev%ask+tatus

    The pictorial representation above is read as follows: if S-10 succeeds, then S-20 will execute, and so on.
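    The two behaviours above can be modelled in a few lines of Python. This is a simulation, not Workflow Manager configuration: `prev_task_status` plays the role of PrevTaskStatus, and the lambdas standing in for sessions S-10, S-20 and S-30 are hypothetical. Without a link condition, each session starts once the previous one has finished, whatever its status. With a "succeeded" link condition, the branch stops at the first failure.

```python
from typing import Callable, Dict, List, Tuple

SUCCEEDED, FAILED, NOT_RUN = "SUCCEEDED", "FAILED", "NOT RUN"

def run_sequential(sessions: List[Tuple[str, Callable[[], bool]]],
                   link_on_success: bool) -> Dict[str, str]:
    status: Dict[str, str] = {}
    prev_task_status = SUCCEEDED  # no previous task before the first session
    for name, session in sessions:
        if link_on_success and prev_task_status != SUCCEEDED:
            status[name] = NOT_RUN  # link condition is false, so this branch stops here
            continue
        prev_task_status = SUCCEEDED if session() else FAILED
        status[name] = prev_task_status
    return status

# Hypothetical sessions: S-10 fails, the others would succeed.
sessions = [("S-10", lambda: False), ("S-20", lambda: True), ("S-30", lambda: True)]
unconditional = run_sequential(sessions, link_on_success=False)
conditional = run_sequential(sessions, link_on_success=True)
```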

    WORKLET and TYPES OF WORKLET

    A worklet is defined as a group of tasks. There are two types of worklet.

    i. Reusable Worklet - A worklet which can be used in multiple workflows is known as a reusable worklet.

    A reusable worklet is created using the Worklet Designer tool in the Workflow Manager.

    A worklet does not run on its own; it is executed as part of a workflow, which starts it.

    ii.


    2. Pre-/Post-Session Shell Command - You can call a Command task as the pre- or post-session shell command for a Session task.

    You can use any valid U…


    Timer Task

    With the Timer task, you can specify the period of time to wait before the Integration Service runs the next task in the workflow.

    The Timer task has two types of settings:

    i. Absolute time - You specify the time at which the Integration Service starts running the next task in the workflow.

    ii. Relative time - You instruct the Integration Service to wait for a specified period of time after the Timer task starts.

    Ex: A workflow contains two sessions. You want the Integration Service to wait 10 minutes after the first session completes before it runs the second session.

    Use a Timer task after the first session. In the Relative time setting of the Timer task, specify 10 minutes from the start time of the Timer task.
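    The arithmetic of this example as a small sketch, assuming the Timer task starts the moment the first session completes, since it is linked directly after it. The timestamp is hypothetical. With an absolute setting, the second session would instead start at a fixed clock time regardless of when the first one ended.

```python
from datetime import datetime, timedelta

def relative_start(timer_started: datetime, wait: timedelta) -> datetime:
    # Relative time: the next task starts a fixed interval after the Timer task starts.
    return timer_started + wait

# Hypothetical completion time of the first session (= start of the Timer task).
first_session_completed = datetime(2019, 8, 16, 9, 0)
second_session_start = relative_start(first_session_completed, timedelta(minutes=10))
print(second_session_start)  # 2019-08-16 09:10:00
```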

    Assignment Task

    You can assign a value to a user-defined workflow variable with the Assignment task.

    To use an Assignment task in a workflow, first create an Assignment task and add it to the workflow. Then configure the Assignment task to assign a value or expression to the user-defined variable.

    Email Task

    The Email task is used to send an email within a workflow.

     


    4. Ping Service - Verifies whether the Integration Service is running.

    5. Help - Returns the syntax for the command that you specify with help.

    6. Start Workflow - Starts the workflow on the Integration Service.

    7. Schedule Workflow - Instructs the Integration Service to schedule a workflow.
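    A hedged sketch that only builds pmcmd command lines for the commands listed above; it does not run them. It assumes the standard pmcmd options `-sv` (Integration Service), `-d` (domain), `-u`/`-p` (user and password) and `-f` (folder). The service, domain, user, folder and workflow names are hypothetical. Check the pmcmd reference for your PowerCenter version before relying on the exact options.

```python
from typing import List

def ping_service(service: str, domain: str) -> List[str]:
    # pmcmd pingservice: checks whether the Integration Service is running.
    return ["pmcmd", "pingservice", "-sv", service, "-d", domain]

def help_for(command: str) -> List[str]:
    # pmcmd help <command>: prints the syntax of that command.
    return ["pmcmd", "help", command]

def workflow_command(action: str, service: str, domain: str, user: str,
                     password: str, folder: str, workflow: str) -> List[str]:
    # startworkflow and scheduleworkflow share the same connection/folder options.
    if action not in ("startworkflow", "scheduleworkflow"):
        raise ValueError(f"unsupported action: {action}")
    return ["pmcmd", action, "-sv", service, "-d", domain,
            "-u", user, "-p", password, "-f", folder, workflow]

# Hypothetical service, domain, user, folder and workflow names.
start = workflow_command("startworkflow", "IS_DEV", "Domain_DEV",
                         "etl_user", "secret", "SALES", "wf_load_sales")
print(" ".join(start))
```

    On a machine where pmcmd is on the PATH, the list could be passed to `subprocess.run(start, check=True)`. Keeping arguments as a list avoids shell-quoting problems with folder or workflow names.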

    Before working with these commands, you have to set the environment variables for the command prompt.

    Set the Environment Variable

    1. My Computer > Right Click > Properties > Advanced > Environment Variables

    2. Click


    5. Exit - Exits PMREP from the command line.

    User Defined Function

    It lets you create customized, user-specific functions for business tasks that cannot be handled with built-in functions. User-defined functions can be private or public.

    Mapping Parameters

    A mapping parameter represents a constant value that can be defined before the mapping runs.

    A mapping parameter is created with a name, type, datatype, precision, and scale.

    A mapping parameter is defined in a parameter file, which is saved with the extension .prm.

    A mapping can be reused for various business rules by parameterizing it.

    A mapping parameter is represented by the $$ prefix.

    Parameter File Syntax

    [Folder


    [Folder.Session]
    $session parameter = Connection
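    The layout above (a bracketed heading followed by NAME=VALUE lines) can be read with a small parser. This is a sketch of the file shape described here, not Informatica's own parser. The heading `SALES.s_m_load_customers` and the parameters `$$LoadDate` and `$DBConnection_Source` are hypothetical examples. The `$$` versus `$` distinction follows the text: `$$` marks mapping parameters and a single `$` marks session parameters.

```python
from typing import Dict, Optional

def parse_param_file(text: str) -> Dict[str, Dict[str, str]]:
    """Group NAME=VALUE lines under the most recent [heading]."""
    sections: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            sections.setdefault(current, {})
        elif "=" in line and current is not None:
            name, value = line.split("=", 1)  # values may themselves contain '='
            sections[current][name.strip()] = value.strip()
    return sections

def parameter_kind(name: str) -> str:
    # $$ marks a mapping parameter; a single $ marks a session parameter.
    if name.startswith("$$"):
        return "mapping"
    if name.startswith("$"):
        return "session"
    return "other"

# Hypothetical .prm contents in the [Folder.Session] form.
PRM = """[SALES.s_m_load_customers]
$$LoadDate=2019-08-16
$DBConnection_Source=ORA_SRC
"""
params = parse_param_file(PRM)
```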

    Tracing Level

    A tracing level determines the amount of information in the session log.

    The following are the types of tracing levels:

    1.


    2. Perform session recovery if the Integration Service has issued at least one commit.

    When you start the recovery session, the Integration Service reads the ROWID of the last committed record from the OPB_SRVR_RECOVERY table.

    The Integration Service reads all the source data and starts processing from the next ROWID.
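    The recovery rule above, sketched as a Python model rather than PowerCenter internals. It assumes row ids increase in source order, and the in-memory source and target lists are hypothetical. The whole source is read again, and rows up to the last committed row id are skipped so that they are not loaded twice. When no commit was issued (`None`), every row is loaded, which amounts to a normal rerun.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, str]  # (row id, payload)

def recover(source: List[Row], last_committed_rowid: Optional[int], target: List[Row]) -> int:
    """Read the whole source again, but load only rows after the last committed row id."""
    loaded = 0
    for row in source:  # all source data is read again
        if last_committed_rowid is not None and row[0] <= last_committed_rowid:
            continue  # committed before the failure; skip to avoid duplicates
        target.append(row)
        loaded += 1
    return loaded

source = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
target = [(1, "a"), (2, "b")]  # rows 1-2 were committed before the session failed
loaded = recover(source, last_committed_rowid=2, target=target)
```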

    DEBUGGER

    The Debugger is used to debug a mapping and check its business functionality.

    Metadata Extension

    A metadata extension provides information about the developer who created an object.

    A metadata extension includes the following information:

    1. Developer


    Connect to the source database with a valid username and password. Run the SQL query on the database to verify that the data is available in the table from which it needs to be extracted.

    Expected Behavior

    -- The login to the database should be successful.

    -- The table should contain relevant data.

    Actual Behavior

    -- As expected

    Test Result

    -- Pass or Fail

    Data Load/Insert

    Ensure that records are being inserted in the target.

    Test Procedure

    i. Make sure that the target table does not contain any records.

    ii. Run the mapping and check that records are being inserted in the target table.

    Expected Behavior

    The target table should contain the inserted records.

    Actual Behavior

    -- As expected

    Test Result

    -- Pass

    Data Load/Update

    Ensure that updates are applied correctly in the target.

    Test Procedure

    i. Make sure that some records are already in the target.

    ii. Update the value of a field in a source table record that has already been loaded into the target.

    iii. Run the mapping.

    Expected Behavior

    The target table should contain the updated record.

    Actual Behavior

    -- As expected

    Test Result

    -- Pass
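    The insert and update test cases above can be automated along these lines. This is a sketch under stated assumptions: an in-memory SQLite database stands in for the source and target, and `run_mapping` is a hypothetical stand-in for the mapping under test (update existing keys, insert new ones). The table names and data are also hypothetical. Each check follows the numbered steps of its test procedure and reports Pass or Fail.

```python
import sqlite3

def setup() -> sqlite3.Connection:
    # Hypothetical source and target tables.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_customer (cust_id INTEGER PRIMARY KEY, city TEXT)")
    conn.execute("CREATE TABLE tgt_customer (cust_id INTEGER PRIMARY KEY, city TEXT)")
    return conn

def run_mapping(conn: sqlite3.Connection) -> None:
    # Stand-in for the mapping under test: update existing keys, insert new ones.
    conn.execute(
        "UPDATE tgt_customer SET city = (SELECT s.city FROM src_customer s "
        "WHERE s.cust_id = tgt_customer.cust_id) "
        "WHERE cust_id IN (SELECT cust_id FROM src_customer)"
    )
    conn.execute(
        "INSERT INTO tgt_customer (cust_id, city) "
        "SELECT cust_id, city FROM src_customer "
        "WHERE cust_id NOT IN (SELECT cust_id FROM tgt_customer)"
    )
    conn.commit()

def count_rows(conn: sqlite3.Connection, table: str) -> int:
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

def check_insert(conn: sqlite3.Connection) -> str:
    if count_rows(conn, "tgt_customer") != 0:  # i. target must start empty
        return "Fail"
    conn.executemany("INSERT INTO src_customer VALUES (?, ?)", [(1, "Pune"), (2, "Delhi")])
    run_mapping(conn)  # ii. run the mapping and check the inserted rows
    return "Pass" if count_rows(conn, "tgt_customer") == 2 else "Fail"

def check_update(conn: sqlite3.Connection) -> str:
    # i. records already exist in the target (loaded by check_insert)
    conn.execute("UPDATE src_customer SET city = 'Mumbai' WHERE cust_id = 1")  # ii.
    run_mapping(conn)  # iii.
    city = conn.execute("SELECT city FROM tgt_customer WHERE cust_id = 1").fetchone()[0]
    return "Pass" if city == "Mumbai" else "Fail"

conn = setup()
insert_result = check_insert(conn)
update_result = check_update(conn)
```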

    Incremental Data Load

    Ensure that the data from the source is properly populated into the target incrementally and without any data loss.
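    One common way to implement and check an incremental load is a watermark. The source text does not name a technique, so this is an assumption. The sketch uses a hypothetical `updated_at` column (ISO date strings, which compare correctly as text) and keeps the target in memory. Each run picks up only rows changed after the previous run's watermark. The no-data-loss check confirms that every source key reached the target and that the counts tie up.

```python
from typing import Dict, List

Row = Dict[str, object]

def incremental_extract(source: List[Row], watermark: str) -> List[Row]:
    # Only rows modified after the last successful load (hypothetical updated_at column).
    return [r for r in source if r["updated_at"] > watermark]

def load(target: Dict[object, Row], rows: List[Row]) -> None:
    for r in rows:
        target[r["id"]] = r  # upsert by key

def no_data_loss(source: List[Row], target: Dict[object, Row]) -> bool:
    # Every source key reached the target and the row counts tie up.
    return {r["id"] for r in source} == set(target) and len(source) == len(target)

source = [{"id": 1, "updated_at": "2019-08-01"}, {"id": 2, "updated_at": "2019-08-02"}]
target: Dict[object, Row] = {}

batch1 = incremental_extract(source, watermark="")  # first run: everything
load(target, batch1)
watermark = max(r["updated_at"] for r in batch1)

source.append({"id": 3, "updated_at": "2019-08-03"})  # new source row
batch2 = incremental_extract(source, watermark)       # second run: only the new row
load(target, batch2)
```

    A strict `>` comparison can miss rows that share the watermark timestamp but are committed after the extract. Real loads often re-read a small overlap window and rely on the upsert to absorb duplicates.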


    i. Perform a manual check to confirm that the source columns are properly linked to the target columns.

    Expected Behavior

    The data from the source columns should be placed in the target table accurately.

    Actual Behavior

    -- As expected

    Test Result

    -- Pass

    Verify Naming Standard

    Ensure that objects are created with industry-specific naming standards.

    Test Procedure

    i. A manual check can be performed to verify the naming standard.

    Expected Behavior

    Objects should follow the appropriate naming standards.

    Actual Behavior

    -- As expected

    Test Result

    -- Pass

    SCD Type 2 Mapping

    Ensure that surrogate keys are generated correctly for a dimensional change.

    Test Procedure

    i. Insert a new record with new values in addition to the already existing records in the source.

    ii. Change the value of a field in a source table record that has already been loaded into the target, and run the mapping.

    iii. Verify the target for appropriate surrogate keys.

    Expected Behavior

    The target table should contain appropriate surrogate keys for inserts and updates.

    Actual Behavior

    -- As expected

    Test Result

    -- Pass
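    A sketch of this test case, assuming a Type 2 style dimension in which a change creates a new version of the row. The load logic, the `current` flag and the customer data are hypothetical, and `itertools.count` stands in for a Sequence Generator. Step i (new record) and step ii (changed record) are both applied. The check then confirms that surrogate keys are unique and that each natural key has exactly one current row.

```python
from itertools import count
from typing import Dict, Iterator, List

Row = Dict[str, object]

def apply_type2(dim: List[Row], source: List[Row], next_key: Iterator[int]) -> None:
    """New customers get a new row; a changed customer gets a new version with a
    fresh surrogate key, and the old version is marked as no longer current."""
    current = {r["cust_id"]: r for r in dim if r["current"]}
    for s in source:
        old = current.get(s["cust_id"])
        if old is not None and old["city"] == s["city"]:
            continue  # unchanged: no new surrogate key
        if old is not None:
            old["current"] = False  # expire the previous version
        dim.append({"sk": next(next_key), "cust_id": s["cust_id"],
                    "city": s["city"], "current": True})

def surrogate_keys_ok(dim: List[Row]) -> bool:
    keys = [r["sk"] for r in dim]
    current_ids = [r["cust_id"] for r in dim if r["current"]]
    unique_keys = len(keys) == len(set(keys))
    # Exactly one current row per natural key: no duplicates and none missing.
    one_current = (len(current_ids) == len(set(current_ids))
                   and set(current_ids) == {r["cust_id"] for r in dim})
    return unique_keys and one_current

seq = count(1)  # stands in for a Sequence Generator
dim: List[Row] = []
apply_type2(dim, [{"cust_id": 100, "city": "Pune"}], seq)   # initial load
apply_type2(dim, [{"cust_id": 100, "city": "Mumbai"},       # changed record
                  {"cust_id": 200, "city": "Delhi"}], seq)  # new record
```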

    SYSTEM TESTING

    System testing is also called Data Validation Testing.

    System testing and acceptance testing are usually separate phases. When the timeline is tight and the budget is constrained, it might be more beneficial to combine the two phases.


    A simple technique is to count the number of records in the source table, which should tie up with


    Optimization:

    i. Joiner Transformation

    1. Use Sorted Input.

    2. Define the source that occupies the least amount of memory in the cache as the master source.

    ii. Aggregator Transformation

    1. Use Sorted Input.

    2. Use incremental aggregation with the aggregate cache.

    3. Group by simpler ports, preferably
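    Both tips can be illustrated with a Python model; this is not PowerCenter code. `hash_join` caches the master rows and streams the detail rows past the cache, so memory grows with the master side, which is why the smaller source should be master. `sorted_sum` shows why sorted input helps an aggregator: with rows ordered by the group key, each group can be finished as soon as the key changes, so only one group is held at a time. The DEPT/EMP data and both helper names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]

def hash_join(master: List[Row], detail: Iterable[Row], key: str) -> List[Row]:
    """Normal join: cache all master rows, then stream detail rows past the cache."""
    cache: Dict[object, List[Row]] = {}
    for m in master:  # memory use grows with the master side
        cache.setdefault(m[key], []).append(m)
    joined = []
    for d in detail:  # detail rows are not cached
        for m in cache.get(d[key], []):
            joined.append({**m, **d})
    return joined

def sorted_sum(rows: Iterable[Row], group_by: str, value: str) -> List[Tuple[object, float]]:
    """Sorted-input aggregation: groupby only merges adjacent rows, so input must be sorted."""
    return [(k, sum(r[value] for r in grp)) for k, grp in groupby(rows, key=itemgetter(group_by))]

dept = [{"deptno": 10, "dname": "ACCOUNTING"}, {"deptno": 20, "dname": "RESEARCH"}]
emp = [{"empno": 1, "deptno": 10, "sal": 2450.0},
       {"empno": 2, "deptno": 20, "sal": 3000.0},
       {"empno": 3, "deptno": 10, "sal": 5000.0}]

joined = hash_join(dept, emp, "deptno")  # the smaller DEPT source is the master
totals = sorted_sum(sorted(emp, key=itemgetter("deptno")), "deptno", "sal")
```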


    2. Connected Lookup: Returns multiple values (by linking output ports to another transformation).
       Unconnected Lookup: Returns one value, by checking the Return Port option for the output port that provides the return value.

    3. Connected Lookup: Executes for every record passing through the transformation.
       Unconnected Lookup: Only executed when the lookup function is called.

    4. Connected Lookup: More visible; shows where the lookup values are used.
       Unconnected Lookup: Less visible, as the lookup is called from an expression within another transformation.

    5. Connected Lookup: Default values are used.
       Unconnected Lookup: Default values are ignored.


    SQL override supported.

    Can be unconnected and invoked as needed.

    Disadvantage

    Cannot output multiple matches.

    An unconnected lookup can only have one return value.

    Does not support the "OR" condition.

    Unconnected Lookup Transformation

    An unconnected lookup transformation is not part of the data flow. It acts as a lookup that can be called from other transformations using the :LKP identifier.

    It can improve the efficiency of a mapping, because the lookup runs only when it is called.
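    A Python model of the behaviour described above, not Informatica code. The `UnconnectedLookup` class, the `lkp_dept` instance and the DEPT data are hypothetical. The lookup sits outside the row flow, runs only when an expression calls it, and returns a single value (the return port), or None when there is no match. `fill_dname` mimics an expression such as `IIF(ISNULL(dname), :LKP.lkp_dept(deptno), dname)`.

```python
from typing import Dict, Optional

class UnconnectedLookup:
    """Called on demand from an expression; returns one value (the return port)."""

    def __init__(self, table: Dict[object, Dict[str, object]], return_port: str):
        self.table = table
        self.return_port = return_port
        self.calls = 0  # counts how often the lookup actually ran

    def __call__(self, key: object) -> Optional[object]:
        self.calls += 1
        row = self.table.get(key)
        return None if row is None else row[self.return_port]

def fill_dname(row: Dict[str, object], lkp: UnconnectedLookup) -> Optional[object]:
    # Like IIF(ISNULL(dname), :LKP.lkp_dept(deptno), dname): the lookup runs only when needed.
    return row["dname"] if row["dname"] is not None else lkp(row["deptno"])

lkp_dept = UnconnectedLookup({10: {"dname": "ACCOUNTING"}, 20: {"dname": "RESEARCH"}},
                             return_port="dname")
rows = [{"deptno": 10, "dname": None},
        {"deptno": 20, "dname": "RESEARCH"},
        {"deptno": 99, "dname": None}]
results = [fill_dname(r, lkp_dept) for r in rows]
```

    Only two of the three rows trigger the lookup. A connected lookup would have run for every row passing through it.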

    High Level Design

    The following items need to be identified:

    1. Identify the source system.

    2. Identify the RDBMS.

    3. Identify the hardware requirements.

    4. Identify the ETL & OLAP software requirements.

    5. Identify the operating system requirements.


    ETL Development Life Cycle

    ETL Project Plan

    Business Requirements

    High Level Design

    Low Level Design

    ETL Development

    ETL Unit Testing

    System Testing

    Performance Testing

    ETL User Acceptance Testing

    Deployment

    Warranty, Stabilization Period

    Maintenance