Post on 05-Jul-2018
8/16/2019 SUPER INFORMATICA BASICS.pdf
1/49
Data Warehousing Basic
What is Data Warehousing?
A data warehouse is a collection of data designed to support management decision making. In other words, it is a repository of integrated information, available for querying and analysis.
According to Inmon, author of several data warehousing books, "A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process".
Who needs data warehousing?
It is needed by knowledge workers, e.g. managers, analysts, executives, and any authorized person who needs information from a large-scale database.
Types of Systems
There are two types of systems.
1. OLTP
2. DSS (OLAP)

Features        OLTP                               OLAP
Characteristic  Operational processing             Informational processing
Orientation     Transactional                      Analysis
User            Clerk, DBA                         Knowledge worker
Function        Day-to-day operation               Long-term informational requirements
DB Design       ER based, application oriented     Star/snowflake, subject oriented
View            Detailed, flat relational          Summarized, multidimensional
Access          Read/Write                         Mostly read
DB Size         10 MB to 100 MB                    100 MB to TB
Data Warehouse Life Cycle
The data warehouse life cycle comprises various phases:
Phase 1: Business Requirements Collection
A business analyst is responsible for gathering requirements from the end users, for example in the following domains:
1. Telecom
2. Insurance
3. Manufacturing
4. Sales & Retail
Phase 2: Data Modeling
It is the process of designing the database, carried out by a database architect using the ERWIN tool.
Phase 3: ETL Development
An application developer designs an ETL application by following the ETL specification using GUI-based tools. Such as I
Phase 4: ETL Testing
This phase is completed by the ETL tester as well as the application developer. The following tests are carried out in the test environment:
1. ETL unit testing
2. System testing
3. Performance testing
4. UAT (User Acceptance Testing)
Phase 5: Report Development
Design the reports by fulfilling the report requirement templates using the following tools.
Cognos
BO
Phase 6: Deployment
It is the process of migrating the ETL and report development applications to the production environment.
Phase 7: Maintenance
Maintain the data warehouse in a 24x7 environment with the help of the production support team.
Data Warehouse Design (Database Design)
A data warehouse is designed with the following types of schemas.
1. Star Schema
2. Snowflake Schema
3. Galaxy Schema
1. Star Schema:
It is a database design which contains a centrally located fact table surrounded by dimension tables. Since the database design looks like a star, it is called a star schema.
• A fact table contains facts.
• Facts are numeric measures.
[Figure: Star Schema]
2. Snowflake Schema
The snowflake schema is a variant of the star schema in which some dimension tables are normalized, thereby further splitting the data into additional tables. The resulting schema graph forms a shape similar to a snowflake.
Advantage:
• Space can be minimized by splitting the data into normalized tables.
Disadvantage:
• It can hamper query performance because more joins are needed.
[Figure: Snowflake Schema]
3. Galaxy Schema (Fact Constellation Schema)
Sophisticated applications may require multiple fact tables to share dimension tables. This type of schema can be viewed as a combination of stars, hence it is called a galaxy schema or fact constellation schema.
[Figure labels: a Sales Fact table (sale key as pk; customer, store, product and date keys as fk) surrounded by Store, Time, Customer and Product dimensions; a second version with additional Item and City dimensions; two fact tables, Fact 1 and Fact 2, sharing numbered dimension tables.]
[Figure: Galaxy Schema]
Dimensions
Dimension tables are sometimes called lookup or reference tables.
1. Conformed Dimension - A dimension table which can be shared by multiple fact tables is known as a conformed dimension.
2. Junk Dimension - Dimensions of a descriptive, flag or Boolean type which are not used to describe the key performance indicators (known as facts) are called junk dimensions. Examples: product description, address, phone number, etc.
3. Slowly Changing Dimension - Dimensions that change over time are called Slowly Changing Dimensions. For instance, a product price changes over time; people change their names for some reason; country and state names may change over time. These are a few examples of Slowly Changing Dimensions, since changes happen to them over a period of time. Slowly Changing Dimensions are often categorized into three types, namely Type 1, Type 2 and Type 3. The following section deals with how to capture and handle these changes over time.
Type 1: Overwriting the old values.
In the year 2005, if the price of the product changes to $250, then the old values of the columns "Year" and "Product Price" have to be updated and replaced with the new values. In Type 1, there is no way to find out the old value of the product "Product1" in year 2004, since the table now contains only the new price and year information.
Type 2: Creating another additional record.
In Type 2, the old values are not replaced; instead, a new row containing the new values is added to the product table. So at any point of time, the old and new values can be retrieved and easily compared. This is very useful for reporting purposes.
Type 3: Creating new fields.
In Type 3, only the latest update to the changed values can be seen. New columns are added to keep track of the changes, so that we are able to see the current price and the previous price of the product, Product1.
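The three types can be sketched in a few lines of Python. This is only an illustration of the idea, not how any ETL tool implements it; the column names and the 2004 price of 150 are assumptions made up for the example (the text only gives the new price of $250 in 2005).

```python
def scd_type1(row, new_price, new_year):
    """Type 1: overwrite the old values in place; history is lost."""
    row["price"] = new_price
    row["year"] = new_year
    return row


def scd_type2(rows, product, new_price, new_year):
    """Type 2: keep the old row and append a new row with the new values."""
    rows.append({"product": product, "price": new_price, "year": new_year})
    return rows


def scd_type3(row, new_price):
    """Type 3: keep only the previous value, in an extra column."""
    row["previous_price"] = row["price"]
    row["price"] = new_price
    return row


# Hypothetical starting row: Product1 cost 150 in 2004 (assumed value).
t1 = scd_type1({"product": "Product1", "price": 150, "year": 2004}, 250, 2005)
t2 = scd_type2([{"product": "Product1", "price": 150, "year": 2004}],
               "Product1", 250, 2005)
t3 = scd_type3({"product": "Product1", "price": 150}, 250)
```

After these calls, `t1` holds only the 2005 values, `t2` holds both the 2004 and the 2005 rows, and `t3` holds the current price plus one previous price.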
Data Modeling
A data model is a conceptual representation of the data structures (tables) required for a database and is very powerful in expressing and communicating the business requirements.
A data model visually represents the nature of the data, the business rules governing the data, and how it will be organized in the database.
Data modeling consists of three phases to design the database.
1. Conceptual Modeling
• Understand the business requirements
• Identify the entities (tables)
• Identify the columns (attributes)
• Identify the relationships
2. Logical Modeling
• Design the tables with the required attributes.
3. Physical Modeling
• Execute the logical tables so that they physically exist in the database.
Data modeling tools
There are a number of data modeling tools to transform business requirements into a logical data model, and a logical data model into a physical data model. From the physical data model, these tools can be instructed to generate SQL code for creating the database.
INFORMATICA
Introduction
Is a GUI-based ETL product from Informatica Corporation.
Is a client-server technology.
Is developed using the Java language.
Is an integrated tool set (to design, to run, to monitor).
Versions
1. 5.0
2. 6.0
3. 7.1.1
4. 8.1.1
5. 8.5
6. 8.6
Meta Data
Metadata is "data about data": data that describes data and other structures, such as objects, business rules and processes.
Example: table structure (column name, data type, precision, scale and keys), description.
Mapping
Is a GUI representation of the data flow from source to target. In other words, it is the definition of the relationship and data flow between source and target objects.
Requirements for mappings
a) Source metadata
b) Business logic
c) Target metadata
Repository
Central database or metadata storage place
[Figure: Informatica sits between the source database and the target data warehouse; the repository is the working place in Informatica.]
Staging Area
A place where data is processed before entering the warehouse.
Source System
A database, application, file, or other storage facility from which the data in a data warehouse is derived.
Target System
A database, application, file, or other storage facility to which the "transformed source data" is loaded in a data warehouse.
Cleansing
The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.
Transformation
The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.
Transportation
The process of moving copied or transformed data from a source to a data warehouse.
Working Professional Divisions
Designation     Roles
ETL Architect   Designing schema, ETL specification
Developer       Developing ETL application
Administrator   Installation, configuration, managing, monitoring
Two Flavors
1. Informatica PowerCenter - for large-scale industries.
2. Informatica PowerMart - for small-scale industries.
Components of Informatica
Client Components
1. Designer
2. Workflow Manager
3. Workflow Monitor
4. Repository Manager
5. Admin Console
Roles for Designer
1. Use Mapping
2. Source Analyzer
3. Connect to source database with ODBC
4. Target Designer
5. Mapping Designer
6. Mapplet Designer
7. Transformation Developer
Roles for Workflow Manager
1. Task Developer
2. Workflow Designer
3. Worklet Designer
Working flow of Client Components in Informatica
Designer
  Import source definition
  Import target metadata
  Design mapping (xyz)
  Save to the repository
Workflow Manager
  1. Create session for the mapping (xyz) and save
  2. Create workflow and start it
  -- Execution happens on the Informatica server.
  -- Integration services are responsible for execution.
Workflow Monitor
  Monitoring (mapping, session)
Admin Console
  For administrative purposes.
How can the mapping be done?
Customer
ID      number(4)    pk
fname   varchar(25)
lname   varchar(25)
Gender  number(1)
Transformation rules: Concat(fname, lname); Decode(Gender, 0, 'F', 1, 'M')
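The two rules above can be mimicked in plain Python to show what the mapping does to one source row. This is a minimal sketch: `decode` is a hand-written stand-in for the DECODE expression, and the target column names `name` and `gender` are hypothetical.

```python
def decode(value, *pairs, default=None):
    """Return the result paired with the first search value equal to `value`."""
    it = iter(pairs)
    for search, result in zip(it, it):  # walk the (search, result) pairs
        if value == search:
            return result
    return default


def map_customer(src):
    """Apply Concat(fname, lname) and Decode(Gender, 0, 'F', 1, 'M') to one row."""
    return {
        "ID": src["ID"],
        "name": src["fname"] + src["lname"],            # plain concatenation
        "gender": decode(src["Gender"], 0, "F", 1, "M"),
    }


row = map_customer({"ID": 1, "fname": "Asha", "lname": "Rao", "Gender": 0})
```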
Designer                  Workflow Manager                 Workflow Monitor                Repository Manager
Create source definition  Create session for each mapping  View workflow & session status  Create, edit & delete folders
Create target definition  Create workflow                  Get session log                 Create users, groups, assign permissions
Define T/R rule           Execute workflow
Design mapping            Schedule workflow

Informatica PowerCenter Client Architecture
Note: One workflow can contain more than one session, but one session contains only one mapping.
The workflow is the upper layer of the development, the session is the middle layer and the mapping is the inner layer.
[Figure: PowerCenter architecture. Labels: External Client, Web Services Hub, Repository Services, Integration Services; Repository (mapping, source definition, target definition, T/R rule, session, workflow, session log, schedule info); Source DB, Staging Area, Target DB; E, T, L.]
PowerCenter Clients
The following PowerCenter clients get installed.
1. Designer
It is a GUI-based client component which allows you to design the plan of the ETL process, called a mapping.
The following types of metadata objects can be created using the Designer client.
a) Create source definition
b) Create target definition
c) Design mapping with or without a transformation rule.
2. Workflow Manager
It is a GUI-based client component which allows you to perform the following tasks.
a) Create a session for each mapping.
b) Create workflow
c) Execute workflow
d) Schedule workflow
3. Workflow Monitor
It is a GUI-based client component which provides the following information.
a) Gives the workflow and session status (Succeeded or Failed).
b) Gets the session log from the repository.
c) Starts and stops sessions and workflows.
4. Repository Manager
The Repository Manager is a GUI-based administrative client which allows you to create the following objects.
a) Create, edit and delete folders, which are required to organize the metadata in the repository.
b) Create users and user groups, and assign permissions and privileges.
5. PowerCenter Repository
The PowerCenter Repository is a relational database (system database) which contains the instructions required to extract, transform and load data.
The PowerCenter client applications can access the repository database through the Repository Service.
The repository consists of metadata which describes the different types of objects such as source definitions, target definitions, mappings etc.
The Integration Service uses repository objects to extract, transform and load data.
The repository also stores administrative information such as usernames, passwords, permissions and privileges.
The Integration Service also creates metadata such as the session log, workflow and session status, and start and finish time of the session, and stores it in the repository through the Repository Service.
6. Repository Service
The Repository Service manages connections to the PowerCenter repository from client applications.
The Repository Service is a multithreaded process that inserts, retrieves, deletes and updates metadata in the repository.
The Repository Service ensures the consistency of the metadata in the repository.
The following PowerCenter applications can access the Repository Service.
a) PowerCenter Client
b) Integration Service
c) Web Service Hub
d) Command line program (for backup and recovery, for administrative purposes)
7. Integration Service
The Integration Service reads mapping and session information from the repository.
It extracts the data from the mapping source and stores it in memory (staging area), where it applies the transformation rules that you configure in the mapping.
The Integration Service loads the transformed data into the mapping targets.
The Integration Service connects to the repository through the Repository Service to fetch the metadata.
8. Web Service Hub
The Web Service Hub is a web service gateway for external clients.
The web service clients (Internet Explorer, Mozilla) access the Integration Service and Repository Service through the Web Service Hub.
It is used to run and monitor web-enabled workflows.
Definitions
Session: A session is a set of instructions which performs extraction, transformation and loading.
A session is created to make the mapping available for execution.
Workflow: A workflow is a start task which contains a set of instructions to execute other tasks such as sessions.
The workflow is the top object in the PowerCenter development hierarchy.
Schedule Workflow: A schedule workflow is an administrative task which specifies the date and time to run the workflow.
The following client components communicate with the Integration Service.
1. Workflow Manager
2. Workflow Monitor
Transformation
A transformation is an object used to define business logic for processing the data.
Transformations can be categorized in two ways:
1. Based upon the number of rows processed
2. Based upon connection
Based upon the number of rows processed, there are two types of transformation:
1. Active Transformation
2. Passive Transformation
Active Transformation:
A transformation which can affect the number of rows while data is going from source to target is known as an active transformation.
The following is the list of active transformations used for processing the data.
1. Source Qualifier Transformation
2. Filter Transformation
3. Aggregator Transformation
4. Joiner Transformation
5. Router Transformation
6. Rank Transformation
7. Sorter Transformation
8. Update Strategy Transformation
9. Transaction Control Transformation
10. Union Transformation
11.
Example of Passive Transformation: 14 rows in, 14 rows out (14 (I), 14 (O)).
Based on connection, there are two types of transformation:
1. Connected
2. Unconnected
Connected: A transformation which participates in the mapping data flow direction (connected to the source and target) is known as a connected transformation.
-- All active and passive transformations can be used as connected transformations.
-- A connected transformation can receive multiple inputs and can provide multiple outputs.
[Figure: source (S) to target (T) through connected transformations with input (I) and output (O) ports, e.g. Tax and Annual Sal]
Unconnected: A transformation which does not participate in the mapping data flow direction (neither connected to the source nor to the target) is known as an unconnected transformation.
-- An unconnected transformation can receive multiple inputs but provides a single output.
-- The following transformations can be used as unconnected transformations.
1. Stored Procedure Transformation
2. Lookup Transformation
[Figure: mapping with an Expression Transformation. Labels: Emp, SQ_Emp, T/R (Tax = Sal*0.10; ports SAL, COMM), T_Emp]
Port & Types of Port
A port represents a column of a database table or file.
The following are the types of port.
1. Input Port
2. Output Port
Input Port: A port which can receive data is known as an input port, which is represented as I.
Output Port: A port which can provide data is known as an output port, which is represented as O.
ETL Specification Document (Mapping Specification Document)
A mapping specification document is an Excel sheet or Word document which contains information about the following objects.
1. Source
2. Target
3. Business Logic (Transformation Rule)

Source              Target              Transformation Rule
Source Type         Target Type         Calculate Tax (sal*0.10) for the top 3
Source Table        Target Table        employees based on the salary in dept 30
Source Column       Target Column
Format Type (P, S)  Format Type (P, S)
Description         Description
DFD: Emp (14 rows) -> SQ_EMP (14 rows) -> Dept = 30 (14 (I), 6 (O)) -> Top 3 (6 (I), 3 (O)) -> Tax (Sal*0.10) (3 (I), 3 (O)) -> T_Emp
1. Filter Transformation
This is an active transformation which allows you to filter the data based on a given condition.
-- A condition is created with three elements:
1. Port
2. Operator
3. Operand
The Integration Service evaluates the filter condition against each input record and returns TRUE or FALSE.
-- The Integration Service returns TRUE when the record satisfies the condition; such records are passed on for further processing or for loading into the target.
-- The Integration Service returns FALSE when the input record does not satisfy the condition; those records are rejected by the filter transformation.
-- Filter transformation does not support "I
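A filter condition built from a port, an operator and an operand can be sketched as follows. This is an illustration in Python, not Informatica syntax; the row layout and the `filter_rows` helper are assumptions made for the example.

```python
import operator

# Supported comparison operators for the sketch.
OPS = {"=": operator.eq, "!=": operator.ne, ">": operator.gt,
       "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def filter_rows(rows, port, op, operand):
    """Evaluate `port op operand` per row; TRUE rows pass, FALSE rows are rejected."""
    passed, rejected = [], []
    for row in rows:
        if OPS[op](row[port], operand):
            passed.append(row)
        else:
            rejected.append(row)
    return passed, rejected


emp = [{"ename": "A", "deptno": 30}, {"ename": "B", "deptno": 10}]
passed, rejected = filter_rows(emp, "deptno", "=", 30)
```

Because rows can be dropped, the number of output rows may differ from the number of input rows, which is what makes the filter an active transformation.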
Aggregator Transformation
This is an active transformation which allows you to calculate summaries for a group of records.
An aggregator transformation is created with the following four components.
1. Group by - It defines the group on a port for which summaries are calculated. Ex: Deptno
2. Aggregate Expression - Aggregate expressions can be developed only in the output ports, using the following aggregate functions.
-- sum()
-- max()
-- avg()
3. Sorted Input - An aggregator transformation receives sorted data as input to improve the performance of summary calculations.
The ports on which the group is defined need to be sorted using a sorter transformation. (Only the group-by ports need to be sorted by the sorter transformation.)
4. Aggregate Cache - The Integration Service creates cache memory the first time the session executes.
-- The aggregate cache is stored on the server hard drive.
-- Incremental aggregation uses the aggregate cache to improve the performance of the session.
Incremental Aggregation
It is a process of calculating the summary for only the new records which pass through the mapping, using the historical cache.
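The idea of grouping by a port and reusing a historical cache can be sketched in Python. This is a simplified illustration under assumed names (`deptno`, `sal`, `aggregate`, `summarize`); the real aggregate cache is a pair of index and data cache files managed by the Integration Service.

```python
def aggregate(rows, cache=None):
    """Fold rows into per-group running totals; pass the old cache to add only new rows."""
    cache = {} if cache is None else cache
    for row in rows:
        entry = cache.setdefault(row["deptno"], {"sum": 0, "max": None, "count": 0})
        entry["sum"] += row["sal"]
        entry["max"] = row["sal"] if entry["max"] is None else max(entry["max"], row["sal"])
        entry["count"] += 1
    return cache


def summarize(cache):
    """Derive sum, max and avg per group from the cached running totals."""
    return {k: {"sum": v["sum"], "max": v["max"], "avg": v["sum"] / v["count"]}
            for k, v in cache.items()}


# First run builds the cache; the second run processes only the new record.
cache = aggregate([{"deptno": 10, "sal": 1000}, {"deptno": 10, "sal": 2000},
                   {"deptno": 20, "sal": 3000}])
cache = aggregate([{"deptno": 10, "sal": 4000}], cache)
result = summarize(cache)
```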
-- Use a lookup transformation to perform the following tasks.
1. Get a related value
2. Update a slowly changing dimension.
Difference between Expression and Aggregator Transformation

Expression Transformation                    Aggregator Transformation
Passive transformation                       Active transformation
Expressions are calculated for each record   Expressions are calculated for a group of records
Merging
Horizontal: Joiner Transformation (Equi-Join, Master outer join, Detail outer join, Full outer join)
Vertical: Union Transformation
Router Transformation
The router transformation is an active transformation which allows you to apply multiple conditions and to load multiple target tables.
-- It is created with two types of group.
1. Input Group - receives the data from the source.
2. Output Group - sends the data to the target.
Output groups are also of two types.
1. User-defined group - allows you to apply a condition.
2. Default group - captures the rejected records.
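The group behaviour described above can be sketched in Python: each row is read once, tested against every user-defined group, and sent to the default group when no condition matches. The `route` helper and the `state` values are assumptions for illustration.

```python
def route(rows, groups):
    """Send each row to every user-defined group it satisfies, else to DEFAULT."""
    out = {name: [] for name in groups}
    out["DEFAULT"] = []
    for row in rows:                       # each row is read once
        matched = False
        for name, condition in groups.items():
            if condition(row):
                out[name].append(row)
                matched = True
        if not matched:                    # rejected by all user-defined groups
            out["DEFAULT"].append(row)
    return out


rows = [{"id": 1, "state": "HR"}, {"id": 2, "state": "DL"}, {"id": 3, "state": "TN"}]
routed = route(rows, {
    "HR": lambda r: r["state"] == "HR",
    "DL": lambda r: r["state"] == "DL",
})
```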
Difference between Filter & Router transformation

Filter                   Router
Single condition based   Multiple condition based
Single target            Multiple targets
Cannot capture rejects   Captures the rejects

DFD: Router Transformation
Input group -> output groups State=HR, State=DL, State=KA and Default
Union Transformation
The union transformation combines multiple input flows into a single output flow.
It supports both homogeneous and heterogeneous sources.
It is created with two groups.
1. Input group - receives the information.
2. Output group - sends the information to the target.
The union transformation works like UNION ALL in Oracle.
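UNION ALL semantics mean that rows from all input flows are simply appended and duplicates are kept. A minimal Python sketch, with made-up input flows:

```python
from itertools import chain


def union_all(*flows):
    """Combine several input flows into one output flow, keeping duplicates."""
    return list(chain(*flows))


flow_a = [{"id": 1}, {"id": 2}]
flow_b = [{"id": 2}, {"id": 3}]
merged = union_all(flow_a, flow_b)
```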
Use the Normal property when the stored procedure performs a calculation.
11. Source Qualifier Transformation
This is an active transformation which allows you to read data from databases and flat files (text files).
SQL Override
It is a process of changing the default SQL using a source filter, user-defined joins, sorting of input data and elimination of duplicates (Distinct).
The Source Qualifier transformation supports SQL override when the source is a database.
The above logic gets processed on the database server.
The business logic processing is shared between the Integration Service and the database server.
This improves the performance of data acquisition.
User-Defined Joins: If the two sources belong to the same database user account or the same ODBC, then apply the joins in the Source Qualifier rather than using a Joiner transformation.
Mapplet & Types of Mapplet
A mapplet is a reusable metadata object created with business logic using a set of transformations.
A mapplet is created using the Mapplet Designer tool.
There are two types of mapplet.
1. Active mapplet - It is created with a set of active transformations.
2. Passive mapplet - It is created with a set of passive transformations.
It can be reused in multiple mappings, with the following restrictions.
1. When you want to use a stored procedure transformation, you should use the stored procedure transformation with the type
It is created in two different ways:
i. Using Transformation Developer
ii. Converting a
</EMP>
</EMPDETAILS>
Normalizer Transformation
This is an active transformation which reads data from a Global file source.
It is used to read files from COBOL sources. Every COBOL source definition is by default associated with
A transaction can also be controlled at session level by using the commit interval property.
Sequence Generator Transformation
This is a passive transformation which allows you to generate sequence numbers to be used as primary keys.
-- A surrogate key is a system-generated sequence number used as a primary key to maintain history in dimension tables.
-- A surrogate key is also known as a dimensional key, artificial key or synthetic key.
-- A sequence generator transformation is created with two default output ports.
i.
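The surrogate key idea can be illustrated with a simple counter in Python. This is a sketch only; the column name `surrogate_key` and the helper are assumptions, and the example does not model the transformation's own ports or properties.

```python
from itertools import count


def add_surrogate_keys(rows, start=1):
    """Attach a system-generated sequence number to each row as its key."""
    seq = count(start)
    return [{"surrogate_key": next(seq), **row} for row in rows]


# The same business key (product) can appear twice, which is how history is kept;
# the surrogate key still identifies each version uniquely.
dim = add_surrogate_keys([{"product": "Product1", "price": 150},
                          {"product": "Product1", "price": 250}])
```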
The default update strategy expression is DD_INSERT.
Update strategy transformation functions work on the target definition table.
The target table should contain a primary key.
Use the following target table options at session level to implement an update strategy.
i. Insert - It inserts the records in the target.
ii. Update (Update as Update) - It updates the records in the target.
iii. Delete - It deletes the records in the target.
iv. Update as Insert - For each update, it inserts a new record in the target.
v. Update else Insert - It updates the record if it exists, else it inserts a new record in the target.
Use an update strategy transformation to update an SCD.
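The difference between "Update else Insert" and "Update as Insert" can be sketched in Python against an in-memory target keyed on its primary key. The helper names and the `id` key column are assumptions for illustration.

```python
def update_else_insert(target, row, key="id"):
    """Update the record if its primary key exists, else insert it."""
    if row[key] in target:
        target[row[key]].update(row)
        return "update"
    target[row[key]] = dict(row)
    return "insert"


def update_as_insert(history, row):
    """Treat every update as a new record, so old versions are kept."""
    history.append(dict(row))
    return "insert"


target = {1: {"id": 1, "price": 150}}
first = update_else_insert(target, {"id": 1, "price": 250})   # key exists
second = update_else_insert(target, {"id": 2, "price": 80})   # new key

history = [{"id": 1, "price": 150}]
update_as_insert(history, {"id": 1, "price": 250})
```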
CACHE
[Figure: Joiner Cache, made up of Index Cache and Data Cache]
LOOKUP CACHE
How it works
There are two types of cache memory: index cache and data cache.
The index cache contains all port values from the lookup table where the port is part of the lookup condition.
The data cache contains all port values from the lookup table that are not in the lookup condition and are specified as "output" ports.
After the cache is loaded, values from the lookup input ports that are part of the lookup condition are compared to the index cache.
Upon a match, the rows from the cache are included in the stream.
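The split between index cache and data cache can be sketched with two dictionaries in Python. This is a conceptual illustration with assumed names (`build_lookup_cache`, `lookup`, the department rows); the actual caches are files and memory structures managed by the Integration Service.

```python
def build_lookup_cache(lookup_rows, condition_ports, output_ports):
    """Index cache holds condition-port values; data cache holds output-port values."""
    index_cache, data_cache = {}, {}
    for n, row in enumerate(lookup_rows):
        key = tuple(row[p] for p in condition_ports)
        index_cache.setdefault(key, []).append(n)
        data_cache[n] = {p: row[p] for p in output_ports}
    return index_cache, data_cache


def lookup(input_row, condition_ports, index_cache, data_cache):
    """Compare the input ports to the index cache; on a match return the cached rows."""
    key = tuple(input_row[p] for p in condition_ports)
    return [data_cache[n] for n in index_cache.get(key, [])]


dept = [{"deptno": 10, "dname": "HR"}, {"deptno": 20, "dname": "IT"}]
index_cache, data_cache = build_lookup_cache(dept, ["deptno"], ["dname"])
hit = lookup({"empno": 1, "deptno": 20}, ["deptno"], index_cache, data_cache)
miss = lookup({"empno": 2, "deptno": 99}, ["deptno"], index_cache, data_cache)
```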
Types of Lookup Cache
When the mapping contains a lookup transformation, the Integration Service queries the lookup data and stores it in the lookup cache.
The following are the types of cache created by the Integration Service.
1. Static Lookup Cache
This is the default lookup cache created by the Integration Service. It is a read-only cache and cannot be updated.
2. Dynamic Lookup Cache
The cache can be updated during the session run. It is used particularly when you perform a lookup on the target table while implementing a Slowly Changing Dimension.
[Figure: lookup table with columns Deptno (10, 20, 30, 40), Dname and Location]
If the lookup table is the target, the cache is changed dynamically as target load rows are processed.
Performance Consideration
A large lookup table may require more memory resources than are available. A SQL override in the lookup transformation can be used.
Persistent Lookup Cache
The cache can be reused for multiple session runs. It improves the performance of the session.
AGGREGATE CACHE
How it Works
The first time a session executes on the Integration Service, the Integration Service creates an aggregate cache which is made up of an index cache and a data cache.
The Integration Service uses the aggregate cache to perform incremental aggregation.
This improves the performance of the session.
There are two types of cache memory: index cache and data cache.
All rows are loaded into the cache before any aggregation takes place.
The index cache contains the group-by port values.
The data cache contains all port values of variable and connected output ports.
Perform incremental aggregation using the aggregate cache. Group on numeric ports rather than character ports.
[Figure: Aggregate Cache, made up of Index Cache and Data Cache]
SORTER CACHE
How it Works
If the cache size specified in the properties exceeds the available amount of memory on the Integration Service process machine, the Integration Service fails the session.
All of the incoming data is passed into cache memory before the sort operation is performed.
If the amount of incoming data is greater than the cache size specified, PowerCenter temporarily stores the data in the sorter transformation work directory.
Key Points
The Integration Service requires disk space of at least twice the amount of incoming data when storing data in the work directory.
Performance Consideration
Using a sorter transformation may improve performance over an "ORDER BY" clause in a SQL override in an aggregate session when the source is a database, because the source database may not be tuned with the buffer size needed for a database sort.
Performance Consideration in Various Transformations
Filter Transformation
Keep the filter transformation as close to the source qualifier as possible to filter the data early in the data flow.
If possible, move the same condition to the source qualifier transformation.
[Figure: aggregate result table with columns Deptno (10, 20, 30, 40) and Sum(Sal)]
Router Transformation
When splitting row data based on field values, a router transformation has a performance advantage over multiple filter transformations, because a row is read once into the input group and evaluated multiple times based on the number of groups, whereas using multiple filter transformations requires the same row data to be duplicated for each filter transformation.
Update Strategy Transformation
The update strategy transformation performance can vary depending on the number of updates and inserts. In some cases there may be a performance benefit in splitting a mapping with updates and inserts into two mappings and sessions: one mapping with inserts and the other with updates.
Expression Transformation
Use operators instead of functions.
Ex: Instead of using the CONCAT function, use the || operator to concatenate two string fields.
Simplify complex expressions by defining variable ports.
Try to avoid the use of aggregate functions.
TASK and TYPES OF TASK
A task is defined as a set of instructions. There are two types of task.
i. Reusable Task - A task which can be defined for multiple workflows is known as a reusable task. A reusable task is created using the Task Developer tool. Ex: Session, Command, Email.
ii.
ii. Sequential batch processing - Sessions execute one after another.
Without a link condition, if S-10 is finished (Succeeded or Failed), then S-20 starts, and so on.
Link Condition
In sequential batch processing, the sessions are executed sequentially and conditionally using link conditions. Define the link conditions using a predefined variable called PrevTaskStatus.
With a link condition on success, if S-10 succeeded, then S-20 will execute, and so on.
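The effect of a link condition on a sequential batch can be sketched in Python. This only models the behaviour described above; `run_sequential`, the status strings and the session callables are hypothetical, and the sketch does not use Workflow Manager syntax.

```python
SUCCEEDED, FAILED = "SUCCEEDED", "FAILED"


def run_sequential(sessions, require_success=True):
    """Run sessions in order; with a link condition, stop after the first failure."""
    executed = []
    prev_task_status = SUCCEEDED
    for name, run in sessions:
        if require_success and prev_task_status != SUCCEEDED:
            break                      # link condition evaluates to false
        prev_task_status = run()
        executed.append((name, prev_task_status))
    return executed


sessions = [("S-10", lambda: SUCCEEDED),
            ("S-20", lambda: FAILED),
            ("S-30", lambda: SUCCEEDED)]
with_link = run_sequential(sessions, require_success=True)
without_link = run_sequential(sessions, require_success=False)
```

With the link condition, S-30 never starts because S-20 failed; without it, all three sessions run one after another regardless of status.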
WORKLET and TYPE OF WORKLET
A worklet is defined as a group of tasks. There are two types of worklet.
i. Reusable Worklet - A worklet which can be defined in multiple workflows is known as a reusable worklet.
A reusable worklet is created using the Worklet Designer tool in the Workflow Manager.
A worklet can be executed using a start task known as a workflow.
ii.
2. Pre/Post-Session shell command - You can call the command task as the pre- or post-session shell command for a session task.
You can use any valid U
Timer Task
With the timer task, you can specify the period of time to wait before the Integration Service runs the next task in the workflow.
The timer task has two types of settings.
i. Absolute type - You specify the time at which the Integration Service starts running the next task in the workflow.
ii. Relative type - You instruct the Integration Service to wait for a specified period of time after the timer task.
Ex: A workflow contains two sessions. You want the Integration Service to wait 10 minutes after the first session completes before it runs the second session.
Use the timer task after the first session. In the relative time setting of the timer task, specify 10 minutes from the start time of the timer task.
Assignment Task
You can assign a value to a user-defined workflow variable with the assignment task.
To use an assignment task in the workflow, first create and add an assignment task to the workflow. Then configure the assignment task to assign a value or expression to the user-defined variable.
Email Task
The Email task is used to send an email within a workflow.
4. Ping Service - Verifies whether the Integration Service is running.
5. Help - Returns the syntax for the command that you specify with help.
6. Start Workflow - Starts the workflow on the Integration Service.
7. Schedule Workflow - Instructs the Integration Service to schedule a workflow.
Before working with these commands, you have to set the environment variable for the command prompt.
Set the Environment Variable
1. My Computer > Right Click > Properties > Advanced > Environment Variables
2. Click
5. Exit - Exits PMREP from the command line.
User Defined Function
It lets you create customized or user-specific functions to meet specific business tasks that are not possible with built-in functions.
User-defined functions can be private or public.
Mapping Parameters
A mapping parameter represents a constant value that can be defined before the mapping runs.
A mapping parameter is created with a name, type, datatype, precision and scale.
A mapping parameter is defined in a parameter file, which is saved with the extension .prm
A mapping can be reused for various business rules by parameterizing the mapping.
Mapping parameters are represented by $$.
Parameter file syntax
[Folder
[Folder.Session]
$session parameter = Connection
Tracing Level
A tracing level determines the amount of information in the session log.
The following are the types of tracing levels.
1.
2. Perform session recovery if the Integration Service has issued at least one commit.
When you start the recovery session, the Integration Service reads the ROWID of the last committed record from the OPB_SRVR_RECOVERY table.
The Integration Service reads all the source data and starts processing from the next ROWID.
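The recovery logic described above can be sketched as follows. This is a simplified Python illustration, not the Integration Service implementation: the function `recover_session` is hypothetical, and it assumes ROWIDs are increasing integers in source read order.

```python
def recover_session(source_rows, last_committed_rowid, load_row):
    """Re-read all source rows, skip those already committed, load the rest."""
    processed = []
    for rowid, row in source_rows:          # the full source is read again
        if rowid <= last_committed_rowid:   # already committed before the failure
            continue
        load_row(row)
        processed.append(rowid)
    return processed


source = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
target = ["a", "b"]            # rows 1-2 were committed before the failure
resumed = recover_session(source, last_committed_rowid=2, load_row=target.append)
print(resumed)  # [3, 4]
print(target)   # ['a', 'b', 'c', 'd']
```

The stored ROWID acts as a checkpoint: nothing committed before the failure is loaded twice, and nothing after it is skipped.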
DEBUGGER
It is used to debug the mapping to check the business functionality.
Metadata Extension
A metadata extension provides information about the developer who has created an object.
A metadata extension includes the following information.
1. Developer
Connect to the source database with a valid username and password. Run the SQL query on the database to verify that the data is available in the table from which it needs to be extracted.
Expected Behavior
-- The login to the database should be successful.
-- The table should contain relevant data.
Actual Behavior
-- As expected
Test Result
-- Pass or Fail
Data Load/Insert
Ensure that records are being inserted in the target.
Test Procedure
i. Make sure that the target table does not have any records.
ii. Run the mapping and check that records are being inserted in the target table.
Expected Behavior
-- The target table should contain the inserted records.
Actual Behavior
-- As expected
Test Result
-- Pass
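The insert test case can also be automated. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table names `src_customer` and `tgt_customer` and the function `run_mapping` are hypothetical stand-ins for the real source, target and mapping.

```python
import sqlite3


def run_mapping(conn):
    """Hypothetical stand-in for the mapping: copy every source row to the target."""
    conn.execute("INSERT INTO tgt_customer (id, name) SELECT id, name FROM src_customer")
    conn.commit()


def count_rows(conn, table):
    """Return the number of rows in the given table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE tgt_customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO src_customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

# Step i: the target must be empty before the run.
rows_before = count_rows(conn, "tgt_customer")
# Step ii: run the mapping and check that records were inserted.
run_mapping(conn)
rows_after = count_rows(conn, "tgt_customer")

source_count = count_rows(conn, "src_customer")
result = "Pass" if rows_before == 0 and rows_after == source_count else "Fail"
print(result)  # Pass
```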
Data Load/Update
Ensure that updates are properly applied in the target.
Test Procedure
i. Make sure that some records are already in the target.
ii. Update the value of some field in a source table record which has already been loaded into the target.
iii. Run the mapping.
Expected Behavior
-- The target table should contain the updated record.
Actual Behavior
-- As expected
Test Result
-- Pass
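A minimal Python sketch of the update test case follows, again with an in-memory `sqlite3` database. The table names and the function `run_update_mapping` are hypothetical; the real mapping would be run through the Integration Service.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_customer (id INTEGER PRIMARY KEY, city TEXT)")
conn.execute("CREATE TABLE tgt_customer (id INTEGER PRIMARY KEY, city TEXT)")

# Step i: a record already exists in both source and target.
conn.execute("INSERT INTO src_customer VALUES (1, 'Pune')")
conn.execute("INSERT INTO tgt_customer VALUES (1, 'Pune')")

# Step ii: change a field of the already-loaded source record.
conn.execute("UPDATE src_customer SET city = 'Delhi' WHERE id = 1")


def run_update_mapping(conn):
    """Hypothetical stand-in for the mapping: overwrite target rows from the source."""
    conn.execute(
        "UPDATE tgt_customer "
        "SET city = (SELECT s.city FROM src_customer s WHERE s.id = tgt_customer.id) "
        "WHERE id IN (SELECT id FROM src_customer)"
    )
    conn.commit()


# Step iii: run the mapping, then compare the target with the source.
run_update_mapping(conn)
target_city = conn.execute("SELECT city FROM tgt_customer WHERE id = 1").fetchone()[0]
target_count = conn.execute("SELECT COUNT(*) FROM tgt_customer").fetchone()[0]
print(target_city, target_count)  # Delhi 1
```

Checking the row count as well as the value confirms that the record was updated in place and not inserted a second time.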
Incremental Data Load
Ensure that the data from the source is properly populated into the target incrementally and without any data loss.
i. Perform a manual check to confirm that source columns are properly linked to the target columns.
Expected Behavior
-- The data from the source columns should be placed in the target table accurately.
Actual Behavior
-- As expected
Test Result
-- Pass
Verify Naming Standard
Ensure that objects are created with an industry-specific naming standard.
Test Procedure
i. A manual check can be performed to verify the naming standard.
Expected Behavior
-- Objects should be named according to the naming standard.
Actual Behavior
-- As expected
Test Result
-- Pass
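Where a manual check becomes tedious, the same verification could be scripted. The Python sketch below is an illustration only: the prefixes `m_`, `s_` and `wf_` are an assumed convention, not one stated in this document, and the object list is sample data.

```python
import re

# Assumed convention for illustration only; substitute the project's own standard.
NAMING_RULES = {
    "mapping": re.compile(r"^m_[a-z0-9_]+$"),
    "session": re.compile(r"^s_[a-z0-9_]+$"),
    "workflow": re.compile(r"^wf_[a-z0-9_]+$"),
}


def naming_violations(objects):
    """Return the (type, name) pairs whose name does not match the rule for its type."""
    return [(kind, name) for kind, name in objects
            if not NAMING_RULES[kind].match(name)]


objects = [
    ("mapping", "m_load_customer"),
    ("session", "s_m_load_customer"),
    ("workflow", "load_customer"),     # missing the assumed wf_ prefix
]
violations = naming_violations(objects)
print(violations)                            # [('workflow', 'load_customer')]
print("Pass" if not violations else "Fail")  # Fail
```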
SCD Type 1 Mapping
Ensure that surrogate keys are properly generated for a dimensional change.
Test Procedure
i. Insert a new record with new values in addition to the already existing records in the source.
ii. Change the value of some field in a source table record which has already been loaded into the target, then run the mapping.
iii. Verify the target for appropriate surrogate keys.
Expected Behavior
-- The target table should contain the appropriate surrogate key for insert and update.
Actual Behavior
-- As expected
Test Result
-- Pass
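What "appropriate surrogate keys" means can be made concrete with a small Python sketch. It assumes the usual Type 1 behavior, where a changed record is overwritten and keeps its surrogate key while a new record receives the next key; the function `apply_scd_type1` and the sample data are hypothetical.

```python
from itertools import count


def apply_scd_type1(target, source_rows, key_sequence):
    """Apply source rows to `target`, a dict keyed by natural key (Type 1: overwrite)."""
    for natural_key, attributes in source_rows:
        if natural_key in target:
            # Changed record: overwrite attributes, keep the existing surrogate key.
            target[natural_key]["attributes"] = attributes
        else:
            # New record: generate the next surrogate key.
            target[natural_key] = {"surrogate_key": next(key_sequence),
                                   "attributes": attributes}
    return target


keys = count(start=1)
target = apply_scd_type1({}, [("C100", {"city": "Pune"})], keys)

# Step i adds a new record (C200); step ii changes an already loaded one (C100).
target = apply_scd_type1(target,
                         [("C100", {"city": "Delhi"}), ("C200", {"city": "Goa"})],
                         keys)
print(target["C100"])  # {'surrogate_key': 1, 'attributes': {'city': 'Delhi'}}
print(target["C200"])  # {'surrogate_key': 2, 'attributes': {'city': 'Goa'}}
```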
SYSTEM TESTING
System testing is also called Data Validation Testing.
System testing and acceptance testing are usually separate. It might be more beneficial to combine the two phases in case of a tight timeline and budget constraints.
A simple technique is to count the number of records in the source table, which should tie up with
Optimization:
i. Joiner Transformation
1. Use Sorted Input
2. Define as the master source the source which occupies the least amount of memory in the cache.
ii. Aggregator Transformation
1. Use Sorted Input
2. Incremental aggregation with aggregate cache.
3. Group by simpler ports, preferably
2 | Returns multiple values (by linking output ports to another transformation). | Returns one value by checking the Return Port option for the output port that provides the return value.
3 | Executed for every record passing through the transformation. | Only executed when the lookup function is called.
4 | More visible, shows where the lookup values are used. | Less visible, as the lookup is called from an expression within another transformation.
5 | Default values are used. | Default values are ignored.
SQL Override supported
Can be unconnected and invoked as needed
Disadvantage
Cannot output multiple matches
Unconnected can only have one return value
Does not support "OR" condition
Unconnected Lookup Transformation
An unconnected Lookup transformation is not part of the data flow; it acts as a lookup that can be called from another transformation using the LKP identifier.
It improves the efficiency of the mapping.
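As a rough analogy in Python, an unconnected lookup behaves like a function that an expression calls only when it needs a value, and that returns a single value. The names `lkp_customer_name` and `expression` and the sample data are hypothetical; this is not Informatica expression syntax.

```python
# Lookup source: customer id -> customer name (sample data).
CUSTOMER_NAMES = {1: "Ann", 2: "Bob"}
lookup_calls = 0


def lkp_customer_name(customer_id):
    """Hypothetical unconnected lookup: returns a single value (the return port)."""
    global lookup_calls
    lookup_calls += 1
    return CUSTOMER_NAMES.get(customer_id)   # None when there is no match


def expression(row):
    """Calls the lookup only for rows that still lack a name."""
    if row["name"] is None:
        row["name"] = lkp_customer_name(row["customer_id"])
    return row


rows = [{"customer_id": 1, "name": None},
        {"customer_id": 2, "name": "Bob"}]
out = [expression(r) for r in rows]
print(out[0]["name"], lookup_calls)  # Ann 1
```

Only one of the two rows triggers the lookup, which is where the efficiency gain over a lookup executed for every record comes from.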
High Level Design
The following activities need to be identified.
1. Identify the source system.
2. Identify the RDBMS.
3. Identify the hardware requirements.
4. Identify the ETL & OLAP software requirements.
5. Identify the operating system requirements.
ETL Development Life Cycle
ETL Project Plan
Business Requirements
High Level Design
Low Level Design
ETL Development
ETL Unit Testing
System Testing
Performance Testing
ETL User Acceptance Testing
Deployment
Warranty, Stabilization Period
Maintenance