Architecture framework in support ofeffort estimation of legacy systems
modernization towards a SOAenvironment
Towards a SOA Modernization Impact Analysis
Final Version
Zdravko Anguelov
IBM Nederland B.V. verleent toestemming dat deze scriptie ter inzage wordt gegeven middels plaatsingin bibliotheken. Voor publicatie, geheel of gedeeltelijk van deze scriptie dient vooraf toestemming doorIBM Nederland B.V. te worden verleend.
Architecture framework in support ofeffort estimation of legacy systems
modernization towards a SOAenvironment
THESIS
submitted in partial fulfillment of therequirements for the degree of
MASTER OF SCIENCE
in
COMPUTER SCIENCE
by
Zdravko Anguelov
Software Engineering Research GroupDepartment of Software TechnologyFaculty EEMCS, Delft University of TechnologyDelft, the Netherlandswww.ewi.tudelft.nl
Global Business ServicesIBM Nederland B.V.
Johan Huizingalaan 765Amsterdam, The Netherlands
www.ibm.com
Architecture framework in support ofeffort estimation of legacy systems
modernization towards a SOAenvironment
Author: Zdravko AnguelovStudent id: 1232436Email: [email protected]
Abstract
Because of their poor Business/IT alignment, many legacy systems lack the flexi-bility to support rapid changes to the business processes they implement, required bytoday’s enterprises. Furthermore, after many years of maintenance, there is a need tomanage their resulting increased complexity and maximize asset utilization throughreuse. The third complicating circumstance is that these legacy systems cannot simplybe replaced as it is too expensive and risky. For these three reasons, legacy systems aremodernized towards a Service Oriented Architecture.
This thesis presents a framework for performing an impact analysis of such a mod-ernization. It supports the trade-off analysis, needed in the planning phase, for findingthe optimal selection of modernization strategies and judging their yield. The impact isexpressed through the estimation of, on the one side, the effort and, on the other side,the gain of the changes these modernization strategies entail. The thesis concentrateson one of the many types of changes in modernization – the architectural and designchanges to the software system.
The presented framework structures current approaches to modernization in a setof class definitions, system model relationships and a process description. This is doneaccording to the effort they produce, preparing them for its estimation. For this effortestimation, this thesis introduces a Rating Model for quantifying the modernizationeffort using the system models of the framework. This quantification is done throughthe identification of so-called Points of Modernization, a categorization of the modern-ization strategies and a set of effort indicator metrics.
Based on this framework, this thesis also presents an experiment. For a subjectlegacy system, concrete approaches are shown for the instantiation of the frameworkmodels and the subsequent effort estimation is done using the indicator of Scattering.
The analysis of the resulting effort and its relation to the gain show the optimal solu-tions for the modernization of the subject system. Concluding, this thesis discusses thefeasibility of the approach and the future work such as more quantitative research onthe rest of the effort indicators.
Thesis Committee:
Chair: Prof. Dr. A. van Deursen, Faculty EEMCS, TU DelftUniversity supervisor: Dr. Hans-Gerhard Gross, Faculty EEMCS, TU DelftCompany supervisor: Dr. Raymond van Diessen, IBMCommittee Member: Ir. Bernard Sodoyer, Faculty EEMCS, TU Delft
ii
Preface
This thesis is the final work of my Masters study Computer Science at the Delft Universityof Technology. It was conducted between December 2008 and December 2009 in collab-oration with IBM Amsterdam. The subject domain of the research is the modernization oflegacy systems towards a Service-Oriented environment. The existing modernization strate-gies in this domain are analyzed for the changes they require to the architecture and designof legacy systems. This has served as basis for the construction of a framework for evalu-ating the overall impact of legacy system modernization and in particular the estimation ofeffort required for performing the modernization strategies.
The contribution of this research is the organization of the modernization changes insuch a way that their effort impact can be estimated following a suggested process. Nextto this theoretical contribution, the thesis suggests concrete automated approaches for theinstantiation of the architectural models prescribed in the framework. These approaches areapplied for the purpose of effort estimation on a contemplated modernization of a subjectlegacy system. The outcome of this case study is analyzed in the context of impact analysisand trade-off analysis. This work concludes with an evaluation of the presented frameworkand with the outlines for further development of this field through future work.
The reason for performing this research is the need in IBM for a quantitative model ofthe origin and propagation of impact for the domain of SOA modernization. In particular, anestimation is needed of the formation of effort due to modernization. Such a model wouldhave the predictive power needed for a trade-off analysis of future modernization initiativesin enterprises.
This thesis had two complementing goals on the road to the needed impact analysis. Inthe first place, to do a broad investigation of the field of modernization towards SOA and laythe grounds for impact analysis, which encompasses amongst others the estimation of effort,gain and risk of the modernization. To do this, the important issues had to be identified thathave impact on the modernization process and results. After which, an inventory had to bemade of the existing approaches to modernization considering the issues they belong to andthe way they produce effort. As there was no such previous work bringing together thesemultiple domains, the literature study phase of the research was a real challenge.
On the other hand, this thesis also had as a goal to contribute quantitative results to thefield. For this purpose, we have concentrated on the issue of effort estimation. In this field,
iii
PREFACE
we investigate into detail a concrete instance of legacy system modernization and evaluatethe effort estimation results in the context of the broad goal of SOA modernization. Manyof the issues in constructing the framework were cleared through the literature research.However the experimental part helped me to get a more clear view on issues and make thewhole more precise. I hope that I have been able to find the balance between the two goalsand produce useful results to both theory and practice.
Finally, I would like to express my gratitude and give thanks to everybody that madethis thesis possible. In the first place, these are my supervisors Raymond van Diessen fromIBM and Hans-Gerhard Gross from the Software Evolution Research Lab at TU Delft. Ithank them for their support and feedback throughout all the research phases. I would alsolike to thank my people-managers at IBM, Jochem Harteveld and Schelto van Heemstra,who welcomed me into the Business Application Modernization team and helped me feelat home there. Of course, I would also like to thank all the team members themselves forhaving me and especially David Stern who helped me with the technical realization of theexperiment of this thesis. For this, I would also like to thank Marcel Wulterkens and histeam at IBM for making the experiment possible with their readiness to provide me withthe code of the subject legacy system. And finally the many others at IBM I talked to aboutprojects and issues, which opened my eyes also for the practical issues and considerationsin real modernization projects.
Zdravko AnguelovDelft, the Netherlands
January 20, 2010
iv
Contents
Preface iii
Contents v
List of Figures vii
Listings ix
1 Problem Statement 1
2 Theoretical context 52.1 Software Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52.2 Service Oriented Architecture . . . . . . . . . . . . . . . . . . . . . . . . 102.3 Existing approaches to system evolution . . . . . . . . . . . . . . . . . . . 14
2.3.1 Renaissance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152.3.2 Architecture Driven Modernization and RMM . . . . . . . . . . . 172.3.3 SMART . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4 Modernization towards Service Oriented Architectures . . . . . . . . . . . 202.4.1 Impact Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 202.4.2 Classification of modernization strategies . . . . . . . . . . . . . . 222.4.3 Architectural and design features . . . . . . . . . . . . . . . . . . . 25
3 Effort Estimation Framework – Research description 273.1 Targeting the Effort in the Software Application domain . . . . . . . . . . . 273.2 Framework foundation on the existing approaches . . . . . . . . . . . . . . 303.3 Framework configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.3.1 Class definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . 323.3.2 Relationships . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393.3.3 Control Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.4 Framework specification . . . . . . . . . . . . . . . . . . . . . . . . . . . 423.4.1 Framework Meta Model . . . . . . . . . . . . . . . . . . . . . . . 42
v
CONTENTS
3.4.2 Framework instantiation process . . . . . . . . . . . . . . . . . . . 423.4.3 Effort estimation Rating Model . . . . . . . . . . . . . . . . . . . 43
4 Legacy Logistics System – effort estimation experiment 474.1 Experiment definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474.2 Experiment planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.2.1 Context selection . . . . . . . . . . . . . . . . . . . . . . . . . . . 474.2.2 Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484.2.3 Variable selection . . . . . . . . . . . . . . . . . . . . . . . . . . . 484.2.4 Subject system description . . . . . . . . . . . . . . . . . . . . . . 494.2.5 Experiment design . . . . . . . . . . . . . . . . . . . . . . . . . . 504.2.6 Instrumentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524.2.7 Validity evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.3 Experiment operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534.3.1 Business Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534.3.2 Architectural Model . . . . . . . . . . . . . . . . . . . . . . . . . 544.3.3 Complete as-is model and Rating Model . . . . . . . . . . . . . . . 54
4.4 Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554.4.1 Rating Model indicator of Scattering . . . . . . . . . . . . . . . . . 554.4.2 Resulting Effort . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.5 Interpretation of results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574.5.1 Effort development and relationship to Gain . . . . . . . . . . . . . 574.5.2 Trade-off analysis – Gain/Effort . . . . . . . . . . . . . . . . . . . 57
4.6 Discussion and conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 584.6.1 Future work – Increase accuracy with Service Model . . . . . . . . 60
5 Research evaluation 615.1 Contribution and applications of research results . . . . . . . . . . . . . . . 615.2 Limitations of the framework . . . . . . . . . . . . . . . . . . . . . . . . . 625.3 Resulting quality of the framework . . . . . . . . . . . . . . . . . . . . . . 64
6 Summary, Conclusion and Future Work 676.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 676.2 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 686.3 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Bibliography 71
A Glossary 81
B MATLAB code used for performing the experiment 85
C Figures 89
vi
List of Figures
1.1 Simple model of system modernization . . . . . . . . . . . . . . . . . . . . . 2
2.1 ADM modernization paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1 Graph of the expected relationship between Effort and Gain . . . . . . . . . . . 283.2 Componentization- and Integration effort for different legacy sets . . . . . . . . 39
C.1 Hierarchy of structural paradigms in software architecture . . . . . . . . . . . 89C.2 Service Component Architecture (SCA) . . . . . . . . . . . . . . . . . . . . . 90C.3 Service Component Architecture using Service Data Object (SDO) . . . . . . . 90C.4 Data Access Service (DAS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90C.5 SOA Process View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91C.6 SOA Development View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91C.7 Renaissance - Incremental Process model . . . . . . . . . . . . . . . . . . . . 92C.8 Renaissance - Portfolio analysis . . . . . . . . . . . . . . . . . . . . . . . . . 92C.9 Renaissance - Data flow diagram for evolution modeling . . . . . . . . . . . . 92C.10 Renaissance - Evolution strategies . . . . . . . . . . . . . . . . . . . . . . . . 93C.11 Renaissance - Information sources for performing modernization . . . . . . . . 93C.12 Risk-management modernization approach (RMM) . . . . . . . . . . . . . . . 94C.13 ADM Transformation types in the horseshoe model . . . . . . . . . . . . . . . 94C.14 SMART process for the identification of a modernization strategy towards SOA 95C.15 Architectural goal of the modernization towards SOA . . . . . . . . . . . . . . 96C.16 Legacy systems components impacted by modernization . . . . . . . . . . . . 96C.17 Impact domains involved in system modernization (ADM) . . . . . . . . . . . 97C.18 Framework entities for modeling the legacy system . . . . . . . . . . . . . . . 97C.19 Framework entities for modeling the target SOA system . . . . . . . . . . . . . 98C.20 Framework entities for modeling the modernization process . . . . . . . . . . . 98C.21 Framework relationships between models and the contained entities . . . . . . 99C.22 Framework - impact analysis process . . . . . . . . . . . . . . . . . . . . . . . 100C.23 Framework - Rating Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101C.24 Experiment - Business Model extraction and traceability to the code level . . . 102
vii
LIST OF FIGURES
C.25 Experiment - Architectural Model extraction and traceability to the code . . . . 102C.26 Experiment - Business Process Model mapped on the Architectural Model . . . 102C.27 Experiment - Overview of the experiment software prototype design . . . . . . 103C.28 Experiment validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103C.29 Experiment - Subject system call map export . . . . . . . . . . . . . . . . . . 104C.30 Business Process for entry-point node #12 . . . . . . . . . . . . . . . . . . . . 104C.31 Business Process for entry-point node #49 . . . . . . . . . . . . . . . . . . . . 104C.32 Business Process for entry-point node #5174 . . . . . . . . . . . . . . . . . . . 105C.33 Architectural Model - complete hierarchical clustering dendrogram . . . . . . . 106C.34 Architectural Model - magnified hierarchical clustering dendrogram . . . . . . 107C.35 Architectural Model - component definition for different granularities . . . . . 108C.36 D300 – scattering distribution for granularity level of 300 . . . . . . . . . . . . 109C.37 D500 – scattering distribution for granularity level of 500 . . . . . . . . . . . . 109C.38 D1000 – scattering distribution for granularity level of 1000 . . . . . . . . . . . 110C.39 D3000 – scattering distribution for granularity level of 3000 . . . . . . . . . . . 110C.40 D5386 – scattering distribution for granularity level of 5386 . . . . . . . . . . . 111C.41 Box plot of five scattering distributios Dg with g=300, 500, 1000, 3000, 5386 . 112C.42 Experiment analysis - scattering median . . . . . . . . . . . . . . . . . . . . . 113C.43 Experiment analysis - scattering mean . . . . . . . . . . . . . . . . . . . . . . 113C.44 Experiment analysis - scattering standard deviation . . . . . . . . . . . . . . . 114C.45 Experiment analysis - ripple effect scattering . . . . . . . . . . . . . . . . . . . 115C.46 Experiment analysis - E f f ort(g) . . . . . . . . . . . . . . . . . . . . . . . . . 116C.47 Experiment analysis - difference in scattering growth between processes . . . . 117C.48 Best approximations of the Gain/Effort relationship . . . . . . . . . . . . . . . 118C.49 Trade-off analysis - Pareto frontier . . . . . . . . . . . . . . . . . . . . . . . . 119C.50 Trade-off analysis - optimal solutions . . . . . . . . . . . . . . . . . . . . . . 120C.51 Trade-off analysis - rate of change around optimal solutions . . . . . . . . . . . 121C.52 Experiment future work - Service Model mapped to Application Model . . . . 122C.53 Experiment future work - Business Model mapped to Service Model . . . . . . 122C.54 Experiment future work - building the to-be architecture using SCA . . . . . . 122C.55 Experiment future work -service encapsulation model . . . . . . . . . . . . . . 123C.56 Future work overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
viii
Listings
B.1 Business Model abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . 85B.2 Visualization of Business Process abstraction for entry-node #5174 . . . . . 86B.3 Architectural Model abstraction . . . . . . . . . . . . . . . . . . . . . . . 87B.4 Business to Architectural model mapping for all granularities . . . . . . . . 88
ix
Chapter 1
Problem Statement
Many enterprises have legacy systems that are up to several decades old. Such systems arecharacterized by the fact that they have been changed and enhanced, through maintenance,during the long period of time of their existence. The reasons for these changes, are changesin the business requirements(environment), which have to be reflected in the supporting ITsystem. This continuous evolution, with 5 to 7% functionality modification a year [51], hasmade these systems large, complex and heterogeneous. Due to the inevitable maintenance,the system’s design has deteriorated, which makes further changes even more difficult andless isolated (with greater impact on the rest of the system). Such legacy systems havebecome essential to the operation of the enterprise and cannot just be discarded and rebuiltfrom scratch. This is often too risky (functionality may get lost, too complex functionality,undocumented functionality), too expensive and takes too much time.
At the same time, enterprises have goals that require changes of their systems to bemade, ranging from technology driven maintenance to the incorporation of new businessrequirements. Two main goals are: 1) becoming agile and 2) reducing the cost of technol-ogy. Enterprises want to become more competitive by being able to react fast to marketchanges [30]. This means that it must be possible for them to change business processesfast and thus reduce the time-to-market of their products. Reducing the cost of technologycan be achieved by improving system interoperability and increasing reuse. Interoperabilityneeds to be improved internally, between the enterprise’s own applications, but also exter-nally, with other parties such as suppliers and customers. There is thus a need for creating adynamic and easily reconfigurable IT environment, which corresponds to the dynamic busi-ness environment of today. Furthermore, increasing reuse will increase efficiency and givethe enterprise a competitive edge on the market. All the above-mentioned characteristics,that will improve the agility of the system and reduce the cost of technology, are not easilyachievable in most legacy systems, because of the systems’ rigid design and heterogeneity.
Legacy System Modernization
This description of the current state of legacy systems and of the desired business organiza-tion exposes a mismatch. This is the mismatch between modern business requirements and
1
1. PROBLEM STATEMENT
the supporting legacy software systems. It is manifested through the fact that the structureof legacy IT systems is not optimal for a good IT-to-Business alignment. At the same time,industry case studies and research indicate that an architectural change of legacy systems,towards Service Oriented Architecture, can remedy this problem [8, 74, 66]. Thus, so farwe have established two facts:
1. Legacy systems do not satisfy the business needs of the enterprises using them: Theyare not easy to change at the pace of change of the business processes they support.They are also not economically efficient as there is little reuse possible due to theirdesign.
2. SOA addresses many of theses issues: SOA-based systems offer a good solution forsupporting a fast changing environment, because of better IT-to-Business alignment.SOA also structures the system in such a way that reuse is much easier.
Therefore, it is obvious that the need exists for achieving the potential solution(2.) tothe problem(1.). The process to do that is a special case of the general approach fromthe field of system evolution, described in section 2.3. The fundamental problem in thisarea is, thus, concerned with the evolution of an as-is legacy system to a to-be system(Figure 1.1), where the to-be system has to comply with the architectural principles ofService-Orientation (section 2.2).
AS-ISSystem
TO-BESystem
Figure 1.1: Simple model of system modernization
Impact Analysis and Effort Estimation
The above-mentioned modernization is carried out according to modernization strategiesthat can have far reaching consequences [8, 56, 60, 23]. The modernization impact canvary significantly in its effort, cost and duration, depending on the extensiveness of thechanges needed to reach the target system organization. This means that there is a signifi-cant investment involved for the enterprise with a high risk of loss of legacy functionality,performance etc. [91, 27]. An enterprise will find the investment needed, justified only upto a certain level, depending on the benefits it offers. Therefore, a feasibility analysis ofsuch modernization is needed prior to its execution.
Such a feasibility analysis has to weight the positive against the negative impacts ofsuch a modernization project. In its most basic form, these would be the benefits against theneeded effort. Therefore, an important prerequisite to impact analysis is the establishing ofan estimation mechanism for the effort involved. Furthermore, since it is only a preliminaryanalysis, it must not require much resources itself to perform, which means relying heavilyon automation and at hand resources such as source code. For these reasons an automatableestimation mechanism is most suitable.
2
Research Question Formulation
Modernization towards SOA has many aspects, each having some form of impact on theeventual effort needed. Hence the subtitle of this thesis – “Towards SOA modernizationimpact analysis”, as we will concentrate only on part of the issues relevant to a completeimpact analysis . As an initial division, the literature study has identified the organizational,business and technology (information and systems) aspects. From these, the aspect we aremost interested in is the technology and specifically the software architecture organization.Concentrating on the software architecture description of the as-is and to-be systems is agood choice as an indicator of the changes the modernization will require for three reasons.
In the first place, the architectural view is an abstraction from the ”how?” [53, p.42].Thus an approach based on it will give a more generic approach, applicable to a broaderset of subject systems and will not be influenced by low level implementation dependentdifferences.
Second, as is shown in section 2.2 on Service Oriented Architecture, many of the deci-sions in such systems are done on a high-abstraction level – architectural and design level.There they are more easily relatable to the business domain. The software architectural levelis, thus, where the link can be more easily made between rather different system organiza-tions such as legacy and SOA.
Finally, architectural views have as an advantage that they usually can be derived in anautomated way from the legacy source code. Finding an automated approach is of moreinterest compared to expert-driven approaches, because legacy systems display two charac-teristics, which can be most efficiently handled through automation:
1. size and complexity: due to the size and complexity of legacy systems the sought afterapproach by the research has to be repeatable and be able to handle a large volume ofinput information. This will make it applicable to projects in different contexts, basedon reuse with the same initial development investment.
2. poor understanding and documentation: due to the limited availability of other sys-tem design sources, the approach has to mainly rely on hard assets such as sourcecode and deployment configurations.
So in summary, this research is focused on how to use this architectural informationabout a legacy system to analyze the impact of modernizing towards an Service OrientedArchitecture by estimating the needed effort. This is formulated as the following researchquestion:
The modernization process of legacy systems to a SOA environment imposes systemarchitectural/design changes. How to perform an automated overall (software) changeimpact analysis for these changes?
Thesis Research Goals
The literature study has identified the following existing work relevant to the research prob-lem described above:
3
1. PROBLEM STATEMENT
• generic modernization frameworks such as ADM, RMM, Renaissance and SMART(automated vs expert-driven)• legacy system portfolio analysis for identification and prioritization of sub-system
modernization needs• SOA target environment specification guidelines• set of modernization strategies on the architectural level
From the survey of current approaches (section 2.3), it becomes apparent there is nomethod that binds them together. Such a method would make it possible to relate the appli-cation of the known modernization strategies in the context of legacy system modernizationtowards SOA. In doing so, such a methodology can be used to evaluate the impact and ex-tent of the architectural changes that are the consequence of the modernization strategies.This would give the basis for performing a SOA change impact analysis and modernizationeffort estimation. So the broad setting for the problem stated above is the identificationand targeting of the legacy system weak points, specification of target SOA architecture,and application of SOA modernization strategies as part of a broader process of systemmodernization. Thus, the sub-problems this thesis research focuses on are:• Which selection of elementary modernization changes on architectural/design level
should be used to describe the SOA modernization strategies? These strategies asdefined in section sec:strategies can be described through the software architecturemodeling entities described in section sec:architecture. A selection of these elemen-tary changes will form the basis for describing the modernization plan towards SOAof the legacy system.• How to organize these modernization changes? Are there groups of related changes?
Is there impact propagation (between groups) based on these relationships?• How to relate the changes in this organization to each other, so that the total impact of
the modernization can be quantified? What scale of measurement is most appropriateand feasible to rate this impact–nominal, ordinal, interval or ratio scale?
This report began in this chapter with a clear statement of the research problem to besolved. The purpose of the rest of the report is to describe the findings of the thesis research.In chapter 2, a selection of existing approaches is described that were identified during theliterature study and that together form the context for the further thesis research. Theseinclude software architecture modeling concepts, SOA specification concepts and modern-ization strategies. Chapter 3 describes the concrete approach suggested for tackling theproblem stated in the research question. It consists of a framework specification togetherwith its scope and the assumptions made. This approach is then applied in practice. Theexperimental setting for that and the results are presented in chapter 4. The chapter con-sists of the description of a case study legacy system and a concrete implementation of thesuggested framework used to analyze it. In the last section of that chapter the results of theexperiment are analyzed. In chapter 5 the whole research is evaluated as to whether it canbe considered successful and what its added value is. Finally in chapter 6 the conclusionsof the research as a whole and the possibilities for future work are presented.
4
Chapter 2
Theoretical context
This chapter describes the theoretical context of the research problem. It summarizes thefields of research relevant to the problem statement. These established facts, theories andnotions will function as premises for the research. They form the theoretical context thatwill be extensively referenced later on. It is needed in order to make a clear distinctionbetween background knowledge, which will not be put in question throughout the rest ofthe research, and the sought after knowledge. This will help in the identification of theaccomplishments of the performed research. The purpose of the theoretical framework isalso to mark the boundaries of the research project.
The research problem is at the intersection of four research areas: Software Archi-tecture, Service-Oriented Architectures, System Evolution, Impact Analysis. They aredescribed in the following sections with the artifacts from these domains and the methods tomanipulate them. First is the the domain of Software Architecture in section 2.1, listing theconcepts used to describe the structure of software systems. Based on this foundation, sec-tion 2.2 describes the characteristics of the specific class of Service-Oriented Systems. Insection 2.3 we present the existing approaches for overall system evolution and moderniza-tion. We conclude in section 2.4 with field of Impact Analysis and its application in legacymodernization focusing on the specific issues in transforming the architectural aspects oflegacy systems towards SOA.
2.1 Software Architecture
Complex systems can benefit from a high-level design, guiding its construction and mainte-nance [49]. Architecturing involves the specification on a high, abstract level, of a system,its elements and the way they interact with each other.
Definition 2.1 Architecture: The fundamental organization of a system, embodied in itscomponents, their relationships to each other and the environment, and the principles gov-erning its design and evolution, IEEE standard P1471–2000 [49]
Based on this definition, we can conclude that using software architecture is useful fortwo reasons. First, architectural descriptions manage to stay focused on the fundamental
5
2. THEORETICAL CONTEXT
software system characteristics. This is possible through the use of abstraction and hidingof detail. One such example of filtering out information and concentrating on a particularsystem aspect is through system views [49], such as a development view, process viewetc. [61].
Second, it is possible to distinguish between different levels of abstraction throughsoftware architecture. This can be summarized in a hierarchy of structural abstractionparadigms [53], shown in figure C.1. This demonstrates the difference in abstraction levelbetween software architecture, software design and implementation structures. The systemsarchitecture consists of early design decisions, on a higher abstraction level, meant to guidethe further more detailed system design. This view has also been adopted by the IEEE [49].
These two characteristics of software architectural descriptions make them very suit-able as models for reasoning about system modernization. Through software architecture,a level of abstraction can be found where it is possible to describe different systems or dif-ferent states of the same system using the same architectural concepts. This enables thecomparison of these instances even though they may be implemented very differently. Inthe following subsections we describe concepts on three levels of abstraction that are usedto capture software architectural views – components, patterns and frameworks.
Components and Connectors
One of the main viewpoints, to document the software architecture, is in terms of compo-nents and connectors [36, 2, 53]. These are two concepts on a medium level of abstraction(Figure C.1). There is no exact definition of a component [53, p.30], only a general under-standing about its characteristics. A component is considered the basic building block ofan application. It is an autonomous, encapsulated computational entity which accomplishespre-defined tasks through internal computation and external communication. There are nolimitations on the granularity of a component. Its size can vary from a GUI button througha collection of classes to a complex application service. Procedures, packages, objects andclusters can all be viewed as instances of the same abstraction - the software component.Components define their contract, to be used for external communication, in the form ofinterfaces. An interface thus defines the services the implementing component supplies toits requesters. Two components can communicate with each other through a connector, thatadheres to the communication definitions of the interfaces of both components. A connectoris thus the collection of rules and constraints that define the relationship between compo-nents [69]. The purpose of this representation is to decouple the functionality of a systemin independent entities - components and connectors. This way different compositions ofthese entities are made possible, through the pre-defined interfaces, and ultimately supportreusability. This decoupling makes it also possible to focus on select groups of softwarecomponents in isolation when designing or analyzing a software architecture.
Definition 2.2 Component: an autonomous, encapsulated computational entity which ac-complishes pre-defined tasks through internal computation and external communication
Definition 2.3 Connector: the collection of rules and constraints that define the relation-ship between components through their interfaces
6
Software Architecture
Component attributes are used to describe the components. Put together, they form aComponent Model [53, p.113-120]. Sample elements of this model are the specificationof the externally visible properties of the component, constraints on these properties and anevent model. Based on this component view of a system’s architecture and the componentmodels one can reason about the software architecture as a collection of components. Gar-lan et al. [37] have introduced the term architectural mismatch to denote interoperabilityproblems between components due to assumptions about:• the nature of components: functionality supply, infrastructure, control model, data
manipulation• the nature of connectors: protocols and data models• the architecture of the assemblies: constraints on interactions• the runtime construction process: order of instantiation
Further component attributes that can have influence on component ensembles interoper-ability are given by Bhuta et al. [7]. In their work the authors describe an automated methodto evaluate the compatibility of components given their component descriptions. Thesedescription consists of four four categories of attributes:• General attributes• Interface attributes: control inputs, data inputs, data outputs• Internal assumption attributes: concurrency, encapsulation, layering, preemption• Dependency attributes: communication dependency, execution language support
In system modernization, architectural transformations are part of the process (sec-tion 2.4). Mismatches between the resulting subsystems and components will have to beovercome, leading to additional effort. In an effort estimation framework, it is thus neces-sary to be able to estimate the compatibility of resulting subsystems and components. Theabove-mentioned Component Models offer a quantifying approach for this purpose.
Patterns
A further concept for describing the software architecture of a system is a pattern. Hess [45]emphasizes the application of patterns in the modernization of legacy systems. Experiencehas shown that well designed systems in the same problem domain, display similar architec-ture [42]. Groups of components and connectors display characteristic structures, that aretypical for certain commonly encountered problems. These typical groups of componentsare instances of the abstract solution to the common problem which is called a pattern. Thedefinition of a pattern we are going to use is given by Buschamann [18] as follows:
Definition 2.4 Pattern: A pattern for software architecture describes a particular recur-ring design problem that arises in specific design contexts, and represents a well-provengeneric scheme for its solution. The solution scheme is specified by describing its con-stituent components, their responsibilities and the ways in which they collaborate.
Patterns can be found on each of the abstraction levels of figure C.1. Kaisler [53, p.29]and Buschmann [18] make the distinction between architectural patterns, design patterns
7
2. THEORETICAL CONTEXT
and idioms in order of decreasing abstraction. The definition of architectural patterns ac-cording to Buschmann [18, p.12] as a means of capturing the overall structuring principlesof a system is:
Definition 2.5 Architectural pattern: An architectural pattern expresses a fundamentalstructural organization schema for software systems. It provides a set of predefined subsys-tems, specifies their responsibilities, and includes rules and guidelines for organizing therelationship between them.
The most widely spread architectural patterns are first documented by Garlan and Shawin 1994 [38] who use the term architectural styles. These architectural styles have been alsocategorized in taxonomies [83, 58], [53, p.218-327] based on similarities and differencesin their properties and application. Architectural patterns help achieve a specific globalsystem property, such as the adaptability of the user interface of a system. An overviewof the patterns on the design level is given by Gamma et al. [35]. They are categorizedas creational, structural and behavioral. The most important four from their work are alsoknown as the Gang Of Four. Design patterns are, as architectural patterns, a proven solutionto a common problem in a generic form [53]. They must be instantiated for the particularcontext they are going to be used in. The difference with architectural patterns is that theirsolution space is much more fine-grained. The definition given by Buschmann:
Definition 2.6 Design Pattern: A design pattern provides a scheme for refining the sub-systems or components of a software system, or the relationship between them. It describesa commonly-recurring structure of communicating components that solves a general designproblem within a particular context.
Patterns can be nested in each other to combine their advantages and solve the softwareengineering problem gradually at different levels. A single system may use multiple archi-tectural styles. They can be combined in multiple ways, for example hierarchically, to forma heterogeneous architecture [38].
Frameworks
Using components and patterns, a framework goes even further and specifies a partialrealization of a system. Frameworks are above the level of software architecture in theabstraction hierarchy (Figure C.1). The main goal of frameworks is to enable reuse of theentire system designs and application structures, found in these lower abstraction levels [53,p.344]. This is done by making only the fundamental choices/assumptions on architecturaland design level for a particular domain and packaging these as a ready-to-use infrastruc-ture. This supplies the guidelines for the structure of the class of solutions in the particulardomain. At the same time, it leaves open the implementation of the details to each concreteapplication using the framework. As a result, the framework determines the overall structureof the applications derived from it, but also allows for flexibility and customization.
Definition 2.7 Framework: a generic architecture complemented by an extensible set ofcomponents [53, p.343]
8
Software Architecture
The specification of a framework is built up from the following three parts:• Class definitions: collection of concept definitions [53, p.405]• Relationships: set of structural and behavioral relationships between the classes
of the framework. Together the classes and relationships capture the structure of aframework [53, p.405]• Control Model: this interactions’ specification [53, p.345, Taligent Corp.] “ties the
parts of the framework together to provide a skeletal solution for a class of prob-lems” [53, p.410]. Usually this results in an inverted role of control between theapplication and the infrastructure – the infrastructure calls/invokes application rou-tines
By definition, a framework supports Instantiation(of the framework), Refinement(for aspecific domain) and Extension(for a more general domain) [53, p.406-407]. These ca-pabilities and the above-mentioned structure of frameworks offer the following advan-tages [53]:• improved maintainability, because of the consistent reuse of the framework facilities
within each application. But also because of the consistency between applicationsusing the same framework.• improved quality and reliability, because all concrete applications based on a frame-
work use proven design/architecture and best practices.• shorter design-time, because the set of common facilities and solution structures are
supplied by the infrastructure of the framework and are reused for each new applica-tion.
Frameworks are relevant to this research for three reasons:I. frameworks complete the set of abstraction concepts for architecture description
II. our definition of Service Oriented Architecture (Definition 2.8) is based on the con-cept of a framework
III. frameworks are useful for the structuring of our own approach to effort estimation
Knowledge about the presence of a framework in the architectural description of a sys-tem is useful for modernization effort estimation. In the first place, just as patterns, frame-works indicate a degree of cohesion between the components they consist of. This cohesionis the source of resistance to change of only part of the structure. Changes due to moderniza-tion of only one part, may propagate to the rest of the structure and result in additional effort.Furthermore, frameworks are bound to contain a lot of architectural assumptions leading tointeroperability problems between different frameworks: architectural mismatches betweenthe control models, mismatch between the provided and required services by the differentframeworks [53, p.410-411].
A framework arrangement is also useful for the specification of the impact analysisand effort estimation suggested in this research. Frameworks are also used outside thedomain of software architecture. There, they can specify any generic structure togetherwith the provision of facilities for extension and customization. Also in this broader sense,the advantages and capabilities of using a framework apply. Because this research is just
9
2. THEORETICAL CONTEXT
an initial attempt in the field, we want it to have the advantages of frameworks describedabove to enable future improvements. The construction of the framework will be done instages. The set of assumptions will be separated from the set of configuration parameters.The challenge of framework design is in finding the balance between reuse and flexibility.On the one hand, the goal is to increase reusability by packaging components so that theycan be reused in as many application domains as possible. But at the same time, an equallyimportant goal is to design architectures that are easily adapted to the requirements of thetarget application domains, thus, achieving flexibility [53, p.404].
2.2 Service Oriented Architecture
This section gives a description of Service Oriented Architecture (SOA). It describes thefeatures of SOA that distinguish it from the typical legacy system architecture. The mod-ernization process this thesis considers has SOA as the target environment. Achieving everycharacterizing feature of this target environment impacts the amount of effort required. Thisis why we need to define what has to be established through modernization in order to quan-tify the resulting effort.
Enterprise Total Architecture
Businesses need to employ a holistic view of their enterprise in order to be successful. Inthis view the enterprise consists of four interrelated elements: business processes, peo-ple, information and systems brought together by a purpose and described in the so-calledEnterprise Total Architecture [17]. The success of this total architecture is directly relatedto its ability to enable the achievement of the enterprise purpose. The purpose of the ar-chitecture is, thus, to support the business processes that make the enterprise work. Thismeans that understanding these business processes is an essential prerequisite to creatingthat architecture. Service Oriented Architecture is a paradigm on how to arrange the fourabove-mentioned elements of the Total Architecture of the Enterprise in such a way that theenterprise can reach its goal. Through its guidelines on all four aspects it helps build a co-ordinated whole, that can evolve and still keep up a good overall quality. Designing a SOAaids, for example, in building an environment that integrates the business process architec-ture and the software architecture of the enterprise. There are more aspects to designing aSOA that the software and the applications. This holistic view of the enterprise by the SOAparadigm distinguishes it from the approaches used to engineer legacy IT systems.
Service Oriented Architecture challenge
A fundamental idea of the Service Oriented Architecture paradigm, as we have seen above,is the case for any ICT infrastructure, is that the business processes do not simply dependon the information systems - they define the services these systems should provide [16].In other words SOA is a Business-Driven approach [32, p.52]. The business processes in-evitably change and this leads to changes in the supporting systems. Here lies the majorchallenge for SOA - designing systems in such a way that accommodating most business
10
Service Oriented Architecture
process changes can be achieved by simply rearranging existing business services. Thus,the main aim is to achieve flexibility [52]. But this flexibility is determined by each of theabove-mentioned aspects of the Total Architecture. This means that, besides system flexi-bility, a SOA must ensure a clear and flexible organization, roles and processes. Achievingflexibility is hindered by the challenges of existing distributed systems in the enterprise,different owners and system heterogeneity [52, p.13].
SOA definition
In the following sections we give the characteristics of the Service Oriented Architectureparadigm, by giving definitions of its most important concepts, including its own designparadigm and design principles, design patterns, a distinct architectural model, and relatedconcepts and technologies. Three of the most important of these concepts are services,interoperability and loose coupling [52, p.16]. Service Oriented Architecture is still anevolving field of research and there is still no single industry-wide definition. We give thedefinition, that captures best all the aspects in a single formulation.
Definition 2.8 Service-Oriented Architecture: A service-oriented architecture is a frame-work for integrating business processes and supporting IT infrastructure as secure, stan-dardized components services - that can be reused and combined to address changingbusiness priorities [9]
Definition 2.9 Capability: A capability is a resource that may be used by a service providerto achieve a real world effect on behalf of a service consumer [67]
Definition 2.10 Business Process: A business process is a description of the tasks, partic-ipants’ roles and information needed to fulfill a business objective [67]
“So an SOA is not simply an approach to designing computer systems; it is an approachto designing the enterprise. Shared capabilities are utilized in different contexts to achieveeconomies of scale and consistency of operations and control. The computer system shouldbe designed to align with the design of the business” [26, p.32]. The technological founda-tion of SOA is the design paradigm of Service-Orientation. It is guided by a set of 8 designprinciples that will be described later on. The service is here the fundamental unit throughwhich solution logic is represented [32]. However designing a SOA is not just making surethat a system displays characteristics of integration, loose coupling and is built out of ser-vices. For a successful SOA the following four ingredients are important - Infrastructure,Architecture, Processes, Governance [52, p.18].
Services
As already mentioned, SOA is a Business-Driven approach, so the main goal is to structurethe supporting software system following the structure of the business processes. The idealend-situation is an easy and direct 1-to-1 mapping, between business and software solutionlogic, which results in improved flexibility. This results in flexibility, because the cause
11
2. THEORETICAL CONTEXT
for change is often initiated from the business. Thus, being able to follow the changes inthe business processes by only reconfiguring(reconnecting components) and not function-ality redistribution results in flexibility. This is why the concepts of services and Service-Orientation are introduced. The definition of a service is not yet settled upon [16, 30, 82].We follow:
Definition 2.11 Service: A service is an implementation of a well-defined piece of businessfunctionality, with a published interface that is discoverable and can be used by serviceconsumers when building different applications and business processes [82, Ch1.5].
Services have been mentioned in the section on software architecture as general con-cepts for functionality that a component can supply to a requesting party. This is mainly afine-grained concept on the level of function calls. In SOA, a service has a more course-grained meaning. Services form a Service Layer [30, p.23] as an abstraction above com-ponents, functioning as a convenient way of encapsulating functionality and offering it as acapability. Ideally ”services map to the business functions that are identified during businessprocess analysis“[30, p.28].
In spite of the absence of concrete definitions there is a common set of characteristicsmost associated with service orientation [75]. These are further distiled, by Erl [31], to the8 Design Principles, mentioned in the SOA definition subsection:Contracts: Services share a formal contract defining the terms of information exchange in
their interactionReusability: Services are designed to support potential reuseCoupling: Services are loosely coupledAbstraction: Services abstract underlying logic. The only part of a service that is visible to
the outside world is what is exposed via the services description and formal contract.The underlying logic is invisible and irrelevant to service requestors.
Composability: Services may compose other services. This possibility allows logic to berepresented at different levels of granularity and promotes reusability and the creationof abstraction layers
Autonomy: Services are autonomous. The logic governed by a service resides within anexplicit boundary. The service has complete autonomy within this boundary and isnot dependent on other services for the execution of this governance
Statelessness: Services should be designed to maximize statelessness even if that meansdeferring state management elsewhere
Discoverability: Services are discoverable. They should allow their descriptions to bediscovered and understood by humans and service users who may be able to makeuse of the services logic
From the principles mentioned above autonomy, loose coupling, abstraction and the need fora formal contract can be considered the core that forms the foundation for SOA. From thispoint of view, SOA is nothing more than a paradigm aimed at improving system flexibility.As such, it is appropriate for large distributed systems with different owners and a highdegree of heterogeneity [52].
12
Service Oriented Architecture
SOA Software Architecture Modeling
Two architectural views of the software architecture of Service-Oriented systems are centralto identifying modernization changes and thus effort. The first view captures the aspects ofProcess View described by Kruchten [61] and is given by Lewis et al. [64] (Figure C.5). Inthis view, central is the interoperation between Service Providers and Services Consumers.
Definition 2.12 Service Provider: a participant that offers a service that permits somecapability to be used by other participants [67]
Definition 2.13 Service Consumer: a participant that interacts with a service in order toaccess a capability to address a need [67]
Besides these entities, Infrastructure is also an important enabling concept. The SOAInfrastructure connects service consumers to service providers. It usually implements aloosely coupled, synchronous or asynchronous, message-based communication model, butother mechanisms are possible. The infrastructure contains elements to provide support forcross-cutting concerns in achieving system integration such as service discovery, security,transaction management and logging. A common SOA infrastructure is an Enterprise Ser-vice Bus (ESB) to support web service environments [64]. The Process View of SOA illus-trates the present separation of concerns on an architectural level between Service Providers,Service Consumers and Infrastructure.
The second view of the software architecture of Service-Oriented systems is from thework of Gimnich [39] (Figure C.6). It demonstrates the difference in architecture betweenService-Oriented systems and legacy systems through the Development View describedin [61]. In legacy systems the functionality from the layers 1, 2, and 4 – OperationalSystems, Service Components and Business Processes – can be found, albeit not alwaysso neatly separated. This results in direct coupling between the business processes andthe supporting software systems. SOA introduces the additional abstraction of the ServiceLayer [30] between the Business and the Components. It enables that important IT-to-Business alignment mentioned in the problem description in chapter 1 by decoupling thebusiness process from the implementing components.
There are two standards for modeling the Development View. One standard is con-cerned with modeling the application logic. The other, models the data used by the appli-cations. The first is the Service Components Architecture (SCA)1,2, used to model thelink between the elements from each of the four layers of the Development View. The SCAspecification (Figure C.2) defines how the system components are built and how to com-bine those components into complete applications. The Service Components layer of theDevelopment View is organized according to SCA through the concepts of Componentsand Composites. By adding Services and References to the them, the mapping is donefrom the Service Components layer to the Service layer. The components are linked to theservices they supply. Linking Services and References to each other through the concept ofWires, the further link upwards is done to the Business Processes layer. The wired services
1http://osoa.org2http://www.oasis-opencsa.org/
13
2. THEORETICAL CONTEXT
are aggregated into business processes. Finally this resulting implementation independentmodel can also be linked through Bindings to a specific implementation from the bottomlayer of the Development View – Operational Systems.
The second standard complementing SCA is the Service Data Objects (SDO)3. It spec-ifies an architecture for handling the business data in an implementation independent way.The goal is unified data access to heterogeneous data sources. The SDO representation ofthe data is exchanged through the wires of the SCA model (Figure C.3). The SDO architec-ture is based on the concept of disconnected Data Graphs built out of Data Objects. Infor-mation about the structure is contained as a Metadata(Figure C.4). Access to data sourcesis provided by a class of components called Data Access Services (DAS). The DAS is re-sponsible for the marshaling and unmarshaling of the implementation specific and possiblyheterogeneous data sources into the SDO representation as Data Graphs (Figure C.4).
Both architectural views, mentioned in this subsection, are needed for the estimation ofmodernization effort. They model architectural characteristics that are impacted by modern-ization. These are the structure of components and their interoperation of both applicationlogic and data (section 2.4). To manage the complexity of the effort estimation problem, the“divide and conquer” approach has to be applied. Using the above-mentioned views enablesthe division of the target SOA architecture in pieces for which the effort can be estimated inrelative isolation and in respect to different modernization concerns.
2.3 Existing approaches to system evolution
This section describes three existing approaches for performing system evolution and mod-ernization – Renaissance, ADM and SMART. Only when the steps of modernizing a sys-tem are clearly defined, can an estimation be made of the effort this process will require.The definition of these steps is what these approaches supply. They share some commonfeatures, but also complement each other in other areas. Each one emphasizes on differentaspects of the modernization process.
Renaissance has the most oversight of the three. It defines all phases of the modern-ization process. This gives a good starting point and orientation of where the effort ofmodernizing a legacy system lies. Renaissance focuses on possible modernization strate-gies, cost factors and their categorization. On the other hand, ADM and the accompanyingRMM approach focus on the modernization planning phase. They model the changes in-volved on the architectural level and the distinction between affected domains. This is animportant extension to Renaissance as modernization towards SOA consists for an impor-tant part of architectural transformations. The reason for this is that the differences that needto be overcome between legacy and SOA systems ares mainly in the software architecture.
Both Renaissance and ADM are approaches that lend themselves for automation. Thatis not the case with SMART. It is an expert-driven approach in a questionnaire form. How-ever, it has to its advantage that it is specifically developed for evaluating the modernizationtowards SOA. Therefore, parts of these three approaches will be combined for the construc-tion of the overall effort estimation framework in chapter 3. The rest is mentioned as an
3http://osoa.org/display/Main/Service+Data+Objects+Home
14
Existing approaches to system evolution
indicator of how much is left out of the scope of this research and will be mentioned in thechapter on future work, chapter 6.
2.3.1 Renaissance
Renaissance is a generic incremental approach for performing legacy system modernization.“Renaissance provides and end-to-end guidance, from project conception through to systemdeployment, for managing evolution projects” [98, p.17]. The method defines:
1. a generic process that guides practitioners through the activities of evolution projects2. practical advice and techniques on how to perform the activities of the process3. an information repository defining the information to gather during the process
The process of modernization iterates through four phases: Plan Evolution, Implement,Deliver, Deploy (Appendix C.7). In the Planning phase is decided whether to decommis-sion the system, continue to maintain it, improve through reengineering or replace it. ThisPortfolio Analysis is based on an assessment of the legacy system and its operational con-text. The system’s “technical quality” and “business value” are estimated by quantifyinga corresponding set of parameters. For the technical quality, attributes such as age of thesystem, failure rate, complexity and size are important [98, Table3.2]. The choice of furtheraction depends on the cost/risk trade-off decisions taken for the project. Once the technicalquality and business value have been quantified, Renaissance prescribes a recommendationabout further action, depending on the combination (Figure C.8).
Evolution strategies
In the case of further action, the Renaissance method identifies six evolution strategies toevolve to a new desired state, given in figure C.10. Each of them involves certain effort anhas its benefits, given in the description [98, p.50-54].
Evolution modeling
To support the modernization process, Renaissance gives the evolutionary steps that have tobe taken, together with the information they require (Appendix C.9). Two main aspects aremodeled: the Legacy System and the Target System. Each of these models consist of twoparts Context Model and Technical Model [98, Figure4.1].
Context modeling is done from four viewpoints on the system [98, Figure4.4]. To rep-resent these viewpoints a set of documents [98, Table4.1] and capturing techniques [98,Table4.2] is recommended for each of them.• Business: captures the system’s support for the business processes• Functional: describes the system functionality which implements the business pro-
cesses captured in the business view• Structural: overview of the system’s software architecture and data structures model• Environmental: describes the physical system structure such as its network organi-
zation and communication model
15
2. THEORETICAL CONTEXT
Technical modeling is done by further decomposing the documents from the structural andenvironment view for both the legacy and target system. The objectives for the techni-cal modeling are three: a) identify components which can be worked on independentlyb) provide a base to extract, adopt, and reuse components c) provide sufficient detail tosubsequently implement the target system.
Information Management
Renaissance defines six categories of information needed for the evolution process:• Business: contains Business goals, Business process description, Problem statement• Legacy system: contains the Assessment report, System documentation, Context
model, technical model• Target System: contains the Context model and Technical model• Evolution strategy: contains Possible evolution strategies, Cost Benefit Risk analy-
sis, System evolution strategy• Project Management: contains Project plan and Deployment plan• Test: contains the Test strategy plan, Test data, Test report
The first four are needed in order to plan the modernization of the system (Appendix C.11).The target system specification does not need to created into detail as it is adjusted in theImplementation and Delivery phases. The last two information sources, Project Manage-ment and Test, contribute to the effort and costs required, but are dependent on the first fourand are only made after these are ready.
Integration Models
Renaissance presents six Integration Models for the components of the legacy system [98,Fig5.7]. Each integration model is appropriate for a particular reengineering strategy [98,Table5.2]. It describes a logical structure and how to integrate parts of the legacy systemthrough it into a target system. Here we list the models and the way to achieve them sug-gested by Renaissance:• Data Integration - Direct access, Meta-data encapsulation• Service Integration - Remote Procedure Call, Message-oriented middleware• Presentation Integration - Screen scraping• Distributed Object Integration - Direct Access, Encapsulation
Cost Drivers
Renaissance gives a process for estimating the risks and costs associated with each evolutionstrategy [98, Figure3.14]. The costs are divided in three categories:
I. RIC – Reengineering Investment CostsII. OSRC – Operations and Support costs for Reengineered Components
III. OSLC – Operations and Support costs for Legacy Components
16
Existing approaches to system evolution
The costs are a consequence of effort required for modernization activities such as [98, p.25,Table3.7]:• Reverse engineering• Software development• Data migration• Incremental evolution• Target system deployment• Legacy system integration
• Integration testing• Licensing costs• Hardware• Training• Operational site activation
The possible estimation techniques for the above-mentioned costs are: a) Expert judg-ment b) Analogy-based c) Top-down d) Bottom-up e) Parametric [98, Table3.8].
2.3.2 Architecture Driven Modernization and RMM
The Risk-Management Modernization (RMM) approach gives a more detailed process de-scription (Appendix C.12) of the first two steps in a modernization track, identified by theRenaissance method – Plan Evolution and Implement. The initial portfolio analysis is sim-ilar to the one performed according to the Renaissance method [82, Figure3-2]. Once theparts of the system, that need to be modernized, are identified – the modernization candi-dates, the process continues by identifying the stakeholders and their requirements. Theseare then used in a cost-benefit analysis to “help decide which approaches, if any, should beevaluated further” [82, p.31].
ADM Horseshoe model
The actions part of the modernization are performed in the context of the ArchitectureDriven Modernization(ADM) model [60, 59]. According to ADM, the existing legacysystem as well as the target solution can be represented by models on three levels of ab-straction: Business architecture domain, Application architecture domain and Technol-ogy architecture domain. The modernization effort is then represented by a path startingfrom the Technology models of the legacy system and ending at the Technology modelsof the target solution. This path can go through the higher abstraction levels or remainlimited to the lower levels. This leads to three basic types of modernization approaches,Figure 2.1. They range from less invasive and at the same time with less modernizationimpact(Figure 2.1a), to more invasive and more difficult to perform, but with much moremodernizing impact(Figure 2.1c).
Going between the models of the different layers is done by transforming them. Thereare three types of such transformations: Formal transformations, Abstraction transforma-tions and Enhancing transformations (Appendix C.13). The path that is thus traversed hasthe form of a horseshoe and thus the name of this model – the ADM Horseshoe Model.
RMM Program understanding
In order to evaluate the modernization in terms of risk and effort, RMM includes the phaseof Program Understanding. It consists of two parallel activities of Legacy System Un-
17
2. THEORETICAL CONTEXT
(a) Technical Architecture-DrivenModernization
(b) Application/Data Architecture-Driven Modernization
(c) Business Architecture-DrivenModernization
Figure 2.1: ADM modernization paths
derstanding and Target Technology Understanding. The Legacy System Understanding iswhere the first part of the ADM modernization path is done, going from the Technologyarchitecture models to the Business architecture models – reconstruction. On the code-structure level reconstruction techniques include artifact extraction, static and dynamic anal-ysis, and slicing techniques. On the function-level, the used techniques include semanticand behavioral pattern matching, and aggregation and clustering.
Once the models of the legacy system have been constructed to the desired level of ab-straction follows the Target Technology Understanding. This is a contingency-based eval-uation of multiple technology alternatives in parallel, targeted at a sample model problem.Following this initial investigation, an evaluation is also done at the architectural level fol-lowing the ATAM approach [57]. For this purpose, use cases are used to evaluate howwell each architectural alternative supports the desired system requirements and quality at-tributes. After target architecture has been chosen, a modernization strategy is developed.This strategy targets issues about the scheduling of the phases of the modernization activi-ties, such as code migration, data migration and intermediate deployment states [82, Figure13-3]. Seacord et al. recommend the usage of Data adapters and Logic adapters as build-ing blocks for the realization of such a modernization strategy. The alternatives are analyzedagain and a choice is made about the global ordering of activities. The concrete details ofthe transformations are developed in a Code Migration Plan and a Data Migration Plan.The identification of the needed transformations in the Code Migration Plan is based on acomponentization analysis. The componentization can be based on two different cohesioncriteria, according to Seacord et al. [82, p.256]:• user-transaction based – identifies, through structural analysis on the call-graph of
the source code, four categories of program elements: root-, node-, leaf- and isolatedprogram elements. The root elements and the call tree underneath them map directlyto user transactions. This functionality is then clustered together.• business object based – closely linked data entities, extracted from the database tech-
nical design are clustered together. For each such data record set, the functionalityaround it that manipulates data entities from the set, is clustered together.
The Code Migration Plan then schedules the migration of the clusters over sequentialiterations. The goal are as few as possible adapters, necessary to accommodate for the inter-operability between already migrated and not yet migrated clusters, during the intermediatestates. Finally the complete modernization strategy if reconciled with the stakeholder re-
18
Existing approaches to system evolution
quirements and the resources needed to realize the modernization strategy are estimated,using the estimated size of the system and cost models such as COCOMO II [13].
2.3.3 SMART
The Service Migration and Reuse Technique(SMART) [63, 64] is a modernization approachdeveloped by the SEI specifically for the migration towards SOA. The method analyzes,through a questionnaire of more than 60 categories, the viability of reusing legacy com-ponents as the basis for services by answering questions like: What services make sense todevelop? What components can be mined to derive these services? What changes are neededto accomplish the migration? What migration strategies are most appropriate? What are thepreliminary estimates of cost and risk? The method contains three important elements:SMART process a systematic means to gather information about the legacy components,
the candidate services, and the target SOA environment (Appendix C.14).Service Migration Interview Guide (SMIG) contains the questions that guide the infor-
mation gathering. Its goal is to cover all known issues that could influence the migra-tion in cost and effort.
Artifact Templates output of the SMART process – Stakeholder List, Characteristics List,Migration Issues List, Business Process-Service Mapping, Service Table, ComponentTable, Notional Service-Oriented System Architecture, Service-Component Alterna-tives, Migration Strategy
SMART process activities
Establish Context Understand the business and technical context for migration. Identifystakeholders, Understand the legacy system and target SOA environment at a highlevel, Identify a set of candidate services for migration
Define Candidate Services select a small number of services from the initial list of candi-date services, that perform concrete functions, have clear inputs and outputs, and canbe reused across a variety of potential applications
Describe Existing Capability technical personnel are questioned to gather informationabout the legacy system components that contain the functionality meeting the needsof the services selected in the Define Candidate Services activity.
Describe Target SOA Environment gather information about major components of theSOA environment, impact of specific technologies and standards used in the environ-ment, guidelines for service implementation, state of target environment, interactionpatterns between services and the environment and execution environment for ser-vices. Build the so-called Notional Service-Oriented System Architecture to describethe system in terms of its components – service consumers, infrastructure, services,legacy components – and how they interact with each other, resulting in a representa-tion similar to the SOA Process View (section 2.2).
Analyze the Gap provide preliminary estimates of the effort, risk, and cost to convert thecandidate legacy components into services, given the candidate service requirementsand target SOA characteristics. The discussion of the changes that are necessary for
19
2. THEORETICAL CONTEXT
each component is used as the input to calculate these preliminary estimates. TheService-Component Alternatives artifact is created during this activity to illustratethe potential sources for functionality to satisfy service requirements.
Develop Strategy generate a strategy to address the migration issues generated by the pre-vious steps
Overall, although the SMART method has a few disadvantages, it is an important ad-dition to Renaissance and ADM. Its first disadvantage is that it is a proprietary approachand the concrete content of the questionnaires is not publicly available. But even then, theapproach is not based on automated data collection or artifacts. It does not rely on scanningand analyzing the source code of the legacy system. This means that the SMART Processmust be performed by trained experts, collecting information through interviews or manualinspection. In spite of these two disadvantages, SMART does provide indicators about thespecifics of modeling the modernization towards SOA, which the previous methods lack– Artifact Templates, Define Candidate Services, Describe Existing Capabilities. For thisreason, these will be used to extend the approaches of Renaissance and ADM.
2.4 Modernization towards Service Oriented Architectures
Previously it was described how to model legacy system architectures, target SOA environ-ments and the overall process of modernization. There is a fourth and final element neededto address the research question. This is the issue of quantifying the impact and effortinvolved in performing the modernization towards SOA in the context of the approachespresented earlier. Specifically, we will concentrate on the use of software architecture in-formation – the concrete models and changes on the architectural level, that we consider toinfluence the modernization effort.
The section begins with the fundamentals of evaluating the impact of modernization– Impact Analysis. It gives an overview of the aspects where the impact is manifested –Global, Organizational, Business and Technological. In this context, the Software Impacton legacy systems is positioned relative to the impact on the organization and business. Thispositioning of the impact on the software aspect in the Global Impact of modernization isnecessary for the definition of the research scope and the relation to future work.
The remaining two sections get into the details of the modernization towards SOA atthe software architecture level. They present the ways to organize the known moderniza-tion strategies and their architectural and design changes looking at the effort they require.Their known impact on the modernization effort will be used as a base for the overall effortestimation of this research’s framework.
2.4.1 Impact Analysis
System evolution and modernization involve performing changes to the system. The es-timation of the consequences and extent of these changes is done through the process ofImpact Analysis. We distinguish three areas that are impacted in the case of legacy systemmodernization towards SOA. We present them all here in order to make clear what is theextend of the scope of the research specified later in chapter 3.
20
Modernization towards Service Oriented Architectures
Definition 2.14 Impact Analysis: the activity of identifying what needs to be modified inorder to make a change, or to determine the consequences on the system if the change isimplemented [50]
Global Impact
The SOA modernization effort is concerned with all aspects of the Enterprise Total Architec-ture – people, business processes, information, and systems (section 2.2). A similar holisticview is adopted by Renaissance (Appendix C.16). In addition, Josuttis [52] describes fourareas of concern specific to developing a Service Oriented Architecture: Architecture, In-frastructure, Processes, and Governance. In the case of SOA as target architecture, mod-ernization changes are not only limited to the software system, but have a more globallyreaching impact.
Each of these aspects needs to be addressed in the modernization and initiates its ownseries of changes. However they are all interrelated and changes in one impact the others.One example is the relationship of the technology aspect to the business. The architecturalchoices made, limit the possible options and/or create opportunities for the business [78]. Asecond example is the relation of the technology aspect to the organization of the enterprise.Once the Service Oriented Solution is in place it has to be maintained and the facilities forthis SOA Governance have to be put in place [16, ch13], [76, §2.5], [52, p.261].
For a complete Impact Analysis of the modernization all of these aspects have to beinvolved. But we will further concentrate on the change impact in the software systemaspect. We cannot consider it in isolation and will have to make assumptions about the restand the enterprise purpose.
Impact organization aspect
The organization must be set up to enable the anticipated changes to the legacy system and atthe same time adjust to them. This is characterized by Tilley in [93, §5.6] as organizationalreadiness. The ripple-effects of the change impact will create the need for also modernizingthe organizational structures [52, ch.8]. Part of assessing the severity of this impact is thecapability assessment, an inventory of the training needs, an application usage survey, anoperational deployment considerations and the available capabilities in managing organi-zational change. Furthermore, specifically for Service-Oriented systems, it has been shownthat a different set of roles is needed for their development and maintenance [54],[16, ch7].The overall degree of impact of the modernization towards SOA is, thus, also related to itsorganizational SOA readiness – SOA maturity [16, ch6],[5],[44, p.532, ch13]. The samechanges in a different organization may lead to a different impact and effort.
Impact business aspect
The modernization towards SOA has also an impact on the business aspect of the legacyarchitecture [17, ch.9]. The design principles of Service-Orientation have clear guidelinesabout participants, roles and responsibilities in the enterprise business processes. Thus themodernization of the legacy system calls for the introduction of their clear definitions and
21
2. THEORETICAL CONTEXT
possible restructuring of orchestration and responsibilities [52, ch.7]. There exist a numberof approaches for the identification of business legacy affected by the modernization. ThisBusiness Domain Analysis helps identify the problematic points in the business processlandscape that need modernization and thus deduce the impact such as the IBM ComponentBusiness Model (CBM) [33].
Impact technology aspect
The emphasis of this research lies with the impact analysis for the technology and softwaresystems aspect. SOA is a Business-Driven paradigm so the input of the impact analysisprocess for the technology aspects must be the result of the Business Domain Analysis. Theimpact on software systems manifests itself in two different ways: architecture/design andsystem quality attributes [65, 74, 76]. There are two aspects to their quantification:
I. difference/similarity analysis• between architectural models of as-is and to-be system: data [22] or functional-
ity [2, 69]• based on system quality attributes characteristic for Service-Oriented Architec-
tures: interoperability [55, 7], compatibility [2], service-orientation [46]• reuse potential: identification reusable components [40]
II. change propagation [101]
2.4.2 Classification of modernization strategies
Besides the differences between as-is and to-be system architectures, the modernizationeffort is also determined by the modernization strategies chosen to overcome these differ-ences [48]. In order to make an estimation of the overall effort, we need to quantify thedifference between these modernization strategies in terms of effort. Existing work makesit possible to do this on an ordinal scale in two dimensions. First, we classify them ac-cording to the issues they addressed in the modernization towards SOA. Second, we use aclassification of modernization strategies according to their impact and required effort.
We use the following identified issues in modernizing towards SOA to make a classifi-cation of the modernization strategies according to which one they target:
O’Brien [76]Identification/Mining of servicesintegration of common shared servicesdevelopment SOA InfrastructureDevelopment service providers/consumersSOA GovernanceArchitectural Trade-off Analysis
Sneed [89, 90, 86]degree of dependency on the environment(none, partially, totally)identify business rules (“code striping”)achieving statelessness of servicesn:m relationship between business rules and code blocksdatamodel mapping to SOA technology(eg.XML)
22
Modernization towards Service Oriented Architectures
Bierhoff - Architectural mismatch [10]the nature of services
- functionality supply- infrastructure expectations- control model- data manipulation
communication between services- asynchronous communication- message data model
global architecture structureconstruction process
Depending on their impact, modernization strategies can be classified in two differentcategories: Black-Box modernization and White-Box modernization [82, p.9].
Black-box modernization involves examining only the external behavior of the legacysystem through its inputs and outputs. The system’s internal working mechanisms are ig-nored. There are two common motives for this approach. Either there is simply no descrip-tion of the internals available (no source code) or the internals are considered too complexand are abstracted away by considering only their effect on the environment. A commonblack-box method is wrapping. Unfortunately this approach is not always practical andoften still requires knowledge about the system component’s internals through white-boxtechniques [80].
White-box modernization is much more extensive and complex than the black-box ap-proach. It requires the expertise of software engineers on a much more deeper level and isthus also known as software reengineering.
Definition 2.15 Software reengineering: Reengineering is the systematic transformationof an existing system into a new form to realize quality improvements in operation, systemcapability, functionality, performance, or evolvability at a lower cost, schedule or risk tothe customer [82].
The reengineering process involves three phases. First, it gathers information about theinternals of the legacy system through program understanding [24]: modeling the domain,extracting information from the code, creating abstractions that describe the underling sys-tem structure. This process is also known as reverse engineering:
Definition 2.16 Reverse engineering: The process of analyzing a subject system to iden-tify the system’s components and their interrelationships and create representations of thesystem in another form or at a higher level of abstraction [24].
Then, using the knowledge from this analysis, follows software restructuring. Afterwhich the third and last phase of forward engineering is performed. This view on systemreengineering is also advocated by the Architecture Driven Modernization approach.
Definition 2.17 Software restructuring: the transformation from one representation formto another at the same relative abstraction level, while preserving the subject system’s ex-ternal behavior(functionality and semantics) [24].
23
2. THEORETICAL CONTEXT
Reengineering is often based on graphs as a representation of the system. This has alsobeen done by Cremer et al. [25] for a system written in Cobol similar to the one used in theexperiment of this research.
Within the white-box and black-box categories, we distinguish specifically for the mod-ernization towards SOA three main non-mutually exclusive categories: wrapping approaches,componentization, service extraction. They are aimed at achieving the changes neededfor Service-Orientation (Appendix C.15 [39]).
Wrapping approaches
Wrapping is a way of overcoming the mismatches between the interface offered by a soft-ware component and the interface required by its environment. This approach correspondsto the replacement of a connector between two components with a new component with aconnector to each of the components.
Definition 2.18 Wrapping: surrounding the legacy system with a software layer that hidesthe unwanted complexity of the old system and exports a modern interface [82]
Wrapping can be performed for different parts of the system and with different degreeof added functionality in the wrapper. In all cases, the application of this approach aimedat satisfying the SOA design principle of Contracts. Wrapping approaches have been ex-tensively studied [15, 23, 47, 85, 88, 89, 12, 1, 19, 21] with the following most significantexamples of different types:• Simple wrapper for encapsulation and integration of services [89]• Wrapper containing additional functionality: Logic adapter for the request-response
pattern [19, 21], Session Based, Transaction Based, Data Based [1]• Replacement with COTS components [7, 47] such as an ESB [8] complying with the
same contract• Componentization combined with wrapping towards a MVC design pattern [12]
Componentization
Componentization is a group of restructuring approaches. They are based on clustering/-grouping together functionality into components according to some criteria they share, suchas the use of the same data source or the implementation of the same concern. By doingthat, the cohesion of the code is increased according to the clustering criteria and the cou-pling to the rest of the functionality is decreased. This approach introduces autonomousand manageable units of reuse – the components – and defines clear separating lines be-tween them. The result of such restructuring makes possible the fulfillment of the SOAdesign principles of coupling, autonomy and reusability. The following work describes theimportant characteristics of the method and the variations in the approach:• Componentization according to a domain model: use domain knowledge to identify
“features” supported by the legacy system and identifying the functionality belong-ing to each feature. This code is then clustered per feature [68]
24
Modernization towards Service Oriented Architectures
• Componentization according to existing code structure (coupling, dependency ma-trix): automatic identification of components [40]• Componentization for a concern: clustering for client-server architecture [20]• Componentization for a quality: component mining for reusability [84]
Service extraction
The group of service extraction approaches is a special case of componentization. Here,extra effort is spent on making sure that the resulting components are also suitable to beused as services in the infrastructure of a Service-Oriented architecture. This means that theSOA design principles of contracts, composability, statelessness, and discoverability arecentral. The clustering criteria are chosen accordingly in the following approaches:• Combining black-box and white-box approaches: legacy system decomposition into
components and exposing them as services to be integrated into a service orientedarchitecture [103]• Service extraction as part of ADM: service extraction by following datamodel defini-
tions [41]• Business process and service extraction [104]
2.4.3 Architectural and design features
The modernization strategies described above are targeted at transforming the legacy archi-tecture into a Service-Oriented organization. In this transformation process, most entitiesfrom the legacy architecture have their analog in to Service Oriented Architecture. Thiscorrespondence can be one-to-many, many-to-one or many-to-many. For example a set oflegacy components can be chosen to form one SCA Composite exposing a set of Servicesor one legacy component can be split and each part wrapped and exposed as a Service.
However, not every entity from the legacy architecture has its SOA counterpart and theother way around. We can illustrate this by saying that the entities described in the abovetransformations form the intersection between the source and target architectures. Thereare however, entities that belong to only one of them. This means that on the one hand, theaddition may be required of architectural features not present in the legacy architecture. Onthe other hand, it may be necessary to remove architectural features that are obsolete in thetarget architecture. Here we describe these two categories according to their root cause.
Target Architecture
The first category is formed by the features that are required in the target SOA architecture,but that don’t have their analogue in the legacy system. Papazoglou [79] gives an overviewof the basic architectural implications of introducing a Service-Oriented Architecture. Hespecially emphasizes on the concept of the Enterprise Service Bus (ESB). Together with it,a set of SOA capabilities are describes such as:• Transaction capabilities• Reliable messaging capabilities• Dynamic connectivity
25
2. THEORETICAL CONTEXT
These are features that may not b present in the legacy architecture and thus have to beadded from scratch.
Next to these features, Stal [92] points out the use of certain patterns as a result ofenforcing SOA design principles. This relationship is valid both ways: the presence of theparticular pattern leads to the enforcement of the corresponding design principle and on theother hand the pursuit of a particular design principle leads to the use of the correspondingpattern. Some of the correspondences Stal mentions are:• Loose coupling⇔ Bridge- , Observer- , Reactor- , Store-and-Forward patterns• Statelessness⇔ Resource Lifecycle Manager, Activator- , Evictor patterns• Composability⇔ Coordinator pattern, Strategy pattern
Legacy Architecture
Modernization may also necessitate the removal of entities from the legacy architecture.These can be components, connectors or whole patterns these form. To estimate the impactof this, we need to know what the functionality represents that is being removed. We quan-tify how much of it is taken over by either replacement in another form or by dropping it alltogether. For this, a classification is very suitable of connectors, components and structuralrelationships.
The classification of connectors is given by Mehta et al. [69] through an extensive tax-onomy. Removing or replacing connectors can be a way of decreasing coupling, one of themain goals of SOA. This taxonomy makes is possible to identify what capability supportedby the connector is dropped that may need (partial)replacement and which other connectorsshould be considered that are close to it in the taxonomy.
A similar approach for components is based on architectural mismatches [10, 7]. Theset of properties describing the component (Component Model, section 2.1) are used toidentify mismatches and similarities between it and its replacement or between the leftover components. Thus, when components are removed, these mismatches can be used toquantify the effort needed to make the whole work together again.
Removal can also affect a whole group of components and connectors following a de-sign pattern. In such cases, it is also important to classify the removed functionality. Thereare two types of information that have influence on the effort for modernization. The firstis what feature is removed described by the pattern’s rationale. The second is which furthercomponents may be affected captured by the set of components involved in the pattern. Ar-celli [3] describes a set of patterns that are useful for this purpose as they are most relevantfor the case of modernization towards SOA.
26
Chapter 3
Effort Estimation Framework –Research description
In this chapter, we present the theoretical results of the conducted research. These results aretwofold. In the first place, this is the combination of the described existing approaches mostsuitable for the goal of modernization effort estimation. Secondly, that is the contributionof our own concepts and argumentation about how those approaches should be extendedto use in the form of an Effort Estimation Framework. This chapter consists of two parts.The first two sections specify the preconditions of the research. The following two sectionscontain the above-mentioned original contribution of this thesis.
The preconditions of the research begin with the presentation of the scope in section 3.1.Within this scope, the goal, as described in the problem statement (chapter 1), is to constructa framework that structures the process of impact analysis of the modernization towardsService Oriented Architectures of legacy systems. Continuing the research preconditions,section 3.2 presents the assumptions on which the framework is built. They form theinvariable part that is the basis for the framework.
The following two sections present the original work of this research. First, section 3.3presents the configuration of the Effort Estimation Framework. In this process the issues inmodernization effort estimation with their multiple possible solutions are considered. Foreach of them, the preferred choice is presented and how it fits in the framework. Finally,all of these configurations are combined and presented in the form of the resulting EffortEstimation Framework. Section 3.4 presents its three components Class definitions, Rela-tionships and Control Model.
3.1 Targeting the Effort in the Software Application domain
This section presents the initial scoping of the research, done by narrowing down the appli-cation domain of the framework. Three aspects are used to limit the relevant domains:
1. type of impact2. modernization impact aspect3. abstraction domains
27
3. EFFORT ESTIMATION FRAMEWORK – RESEARCH DESCRIPTION
Impact Types
Extending the definition of Impact Analysis given in Definition 2.14 on page 21, we definetwo types of impact in the case of legacy system modernization:
Definition 3.1 Gain: the potential value the changes deliver [78] (Positive impact fromthe point of view of the enterprise). Characterized by an individually defined value functionGAIN()
Definition 3.2 Effort: the cost of the changes needed to modernize (Negative impact fromthe point of view of the enterprise). Characterized by an individually defined cost functionEFFORT ()
The problem statement of this research (chapter 1) described the needed feasibility anal-ysis of a modernization project. Such an analysis would weight the benefits against the costof such a project. These two trade-off factors will be represented by the two functions de-fined above – Bene f it = GAIN(), Cost = EFFORT (). The valuations of these depend onmultiple variables, some of which we will try to identify in this research as so called Indi-cators. The effort estimation approach of this research will be based on the balancing of theabove-mentioned two Impact Types with an overall envisioned dependency of the Effort tothe desired Gain as shown in figure 3.1. In this research we will concentrate on the effortcost function and only monitor the resulting gain.
EFFORT (%)
GAIN (%)
0 10 20 30 40 50 60 70 80 90 1000102030405060708090100
Figure 3.1: Graph of the expected relationship between Effort and Gain
Here we need to make a distinction between the Business Gain and the Gain we areconsidering in this research. We explain why we make this distinction and how the twoare related. According to [81] an enterprise has a Mission from which follow its BusinessGoals. These in turn serve as a basis for the construction of a Business Strategy for achiev-ing them. Based on this we define the overall Business Gain from a modernization projectas the degree it contributes to the achievement of the set Mission, Business Goals and Busi-ness Strategy of an enterprise.At this point the mentioned enterprise goals and strategy are still only defined in the busi-ness domain and are as Bleistein [11] shows only soft goals. Examples of such goals are
28
Targeting the Effort in the Software Application domain
an increase in Return On Investment, decrease in Time To Market and increased flexibility.In order to translate them into measurable hard goals an implementation plan needs to bedefined containing the concrete Objectives to achieve. The realization of this implementa-tion plan depends on the enabling factors such as people, processes, structures and culture,and IT. We speak of alignment when these are ready for the execution of the strategy [81].Part of achieving alignment is the IT-Business alignment [71]. Bleistein suggests an ap-proach for achieving such an alignment through the translation of the Business Goals to ITrequirements [11]. For the realization of these requirements and the implementation of thebusiness strategy there are multiple choices for enabling technologies amongst which SOA.So a decision needs to be made whether the elicited requirements can be best satisfied bythe Service-Oriented approach and the advantages it offers such as flexibility and reuse orwhether there are other more suitable technologies. The Gain considered in this research isthus on the level of implementing the above-mentioned Objectives and is defined as the de-gree of gained conformance to the principles of Service Orientation. Increase in this Gainwill lead to an increased Business Gain only and only if SOA and its advantages fit thebusiness requirements resulting from the Business Goals [71]. In [81], [71] and [70] arementioned classifications of business strategies. Based on these we give an indication ofthose that will most likely experience the greatest Business Gain from the modernizationtowards a Service Oriented Architecture. These include the adaptive firm in a very chal-lenging environment, the entrepreneurial conglomerate, the innovator and companies thathave as main strategy the entering of new markets through process innovation or productdifferentiation.
Concluding, we assume for the further research that the process of Business Strategydetermination has resulted in the decision for SOA as enabling technology. Based on thisgoal, the subsequent gain is determined by the degree of achievement of Service-Orientationexpressed through the SOA design principles such as loose coupling, reusability and auton-omy described in section 2.2.
Modernization Impact Aspects
As mentioned in the context of this research (§2.4, §2.3), there are many aspects of thelegacy system where the impact of modernization is visible. This research is concentratedon the Application Software sub-part of the Technical aspect (Appendix C.16). Within thisApplication Software sub-part, we recognize three factors that influence the effort estima-tion: (1) as-is system state, (2) to-be target state, (3) modernization strategies (known fromfigure 1.1).
Modernization Domains
All the above-mentioned three factors can be viewed on three levels of abstraction as de-scribed in the Architecture Driven Modernization (§2.3.2). The impact of the modernizationon both gain and effort relative to the domain is shown in figure C.17 [60]. The researchis concentrated on the Application/Data domain together with its links to the Business do-main. The reason for this choice is that these domains are concerned with abstraction and
29
3. EFFORT ESTIMATION FRAMEWORK – RESEARCH DESCRIPTION
enhancing transformations, where the link to the Technology domain merely involves for-mal transformations [59].
3.2 Framework foundation on the existing approaches
In this section, we give the basic assumptions of this research within the scope given inthe previous section. These are based on the existing approaches presented in sections 2.3and 2.4. These assumptions further define the outlines of the proposed solution and theinvariable parts in its application.
First and foremost, we assume a framework structure of the research end-result. Theadvantages of such an organization (§ 2.1) are needed for two reasons. First, the mod-ernization process can be viewed as a transformation function between a Domain – as-islegacy systems and a Range – to-be SOA systems. Both the domain and the range of themodernization represent large sets of systems. With a framework we can specify the high-level guidelines for the effort estimation common to the modernization between differentinstances. Thus allowing its extension with additional modernization effort aspects andmore information sources. Second, a framework offers a clear division between the compo-nents of modernization effort estimation. This makes it possible to concentrate on a subpartin isolation and specialize it for a particular set of instance systems. This will leave the restunaffected and will still produce overall results.
The Effort Estimation Framework as such has to define:
Class definitions: Meta-model presented in section 3.4.1 + Rating ModelRelationships: Relationships, part of Meta-model presented in section 3.4.1Control Model: Process + Rating Model + Relationships
This research takes into account the challenge in finding the balance between reusabilityand flexibility (§2.1) of the resulting framework. In order to make the flexibility level of theframework clear, we divide its theoretical base in two parts. First, we present the rigid set ofassumptions in the rest of this section that define the frameworks broad reuse domain. Then,in section 3.3, we present a set of configuration issues that give it flexibility. These offerchoices for concrete instantiation of issues from the framework. A different set of choicescould be made there to influence the estimation outcome, but that would still preserve theoverall relationships within the framework. For example, for each step in the Control Modela concrete approach can be chosen. In the framework instantiation process there is still onelast level of implementational choices left, which we will present in chapter 4.
The assumptions of the framework that follow have two dimensions:1. the subject they are about: Legacy system, SOA system or Modernization strategies2. the framework part they belong to: Class definitions, Relationships or Control Model
We will present them grouped together according to the first one and indicate where neces-sary which framework part they are relevant for.
30
Framework foundation on the existing approaches
Overall
O-1. use models on each level expressed in (abstraction)concepts in order to make theframework possible. These models can be replaced or extended and instantiated fora particular system. As long as they supply the required information to fit in theframework. One can add indicators, concepts etc.
O-2. the framework contains 3 Domains – Business, Application, Technology – based onADM figure C.17[Class definitions, Control Model]
O-3. concepts are linked top-down to other domains’ concepts (relation ”is-implemented-by”) – makes sequential causal impact propagation analysis possible
O-4. Concepts are linked to the concrete data sources they originate from and are repre-sented in Architectural Views [Class Definitions]
O-5. portfolio analysis(fig.C.8) is performed prior to effort estimation. See also FutureWork in chapter 6
O-6. cost estimation is the phase following this estimation output. Assume cost division asgiven by Renaissance on page 16
Legacy System
L-1. the software system legacy code is the only input source (no documentation, no expertknowledge) [Class Definitions]
L-2. the considered programming paradigm is that of procedural languages (no OO andits advantages of embedded semantics). There is different research done to modellegacy systems and each is tailored at a specific paradigm. Only the OMG has so farproduced a standard (KDM) that attempts to bring all paradigms together. Currentlyit is only a standard and there are no implementations of it [Class Definitions]
Modernization
M-1. both the legacy system and the SOA target can be abstracted to the architectural level,expressed with the same concepts, where the only difference will be in the structure
M-2. 4 Phases based on Renaissance(fig. C.9) and RMM(fig. C.12) [Entities]. See alsoFuture Work in chapter 6
M-3. changes and their impact propagate top-down starting from the Business Processlevel(see fig.C.17)
M-4. model overall steps of SOA modernization based on SMART (see fig.C.14). Steps:Establish context, Define Candidate Services, Describe Existing Capability, Describetarget SOA environment, Analyze the Gap
M-5. using Renaissance modernization strategies shown in fig.C.10 as basisM-6. using Data Adapters and Logic Adapters as strategy types (see RMM on page 18M-7. set of available approaches to be: wrapping, componentization, service extraction
(see section 2.4.2)M-8. although there are also other factors, the effort is largely determined by the architec-
tural changes needed to the legacy system architecture
31
3. EFFORT ESTIMATION FRAMEWORK – RESEARCH DESCRIPTION
M-9. rating is to be done based on gap analysis between as-is and to-be, fig.C.9 [ProcessModel - Stages]
M-10. a hierarchical rating approach with modernization issues on the top and concrete mea-surable indicators on the bottom
SOA
S-1. two categories of entities – infrastructure/services, see definition 2.8 [Entities, RatingModel]
S-2. definition 2.11 ”well-defined piece of business functionality” for service identifica-tion [Service extraction methods]
S-3. SOA Design Principles(sec.2.2) are the guiding force behind architectural decisionsabout the target environment
S-4. possible SOA capabilities are the ones listed by Papazoglou in [79] [Rating Model]S-5. 2 architectural views on SOA systems – Development View (fig.C.5) and Process
View (fig.C.6) [Rating model]S-6. it is possible to describing the target Service-Oriented system in terms of SCA con-
cepts (fig.C.2)
3.3 Framework configuration
Starting in this section, we present the original work of this thesis. Here, the configurationis presented of each of the three components of the Effort Estimation Framework – Classdefinitions (§3.3.1), Relationships (§3.3.2) and Control Model (§3.3.3). The configurationis done by first presenting the possible options for addressing an issue and then giving thechoices we have made. The resulting framework is then presented in the next section 3.4.
The contribution in all three parts is two types. In the first place, concepts from existingapproaches are selected and combined through categorization. This is the case for the enti-ties selection in the Class Definitions, the model selection in the Relationships and phasesand domains in the Control Model. Secondly, this section introduces a Rating Model builton top of the selection of existing approaches. It uses the information gathered with themto produce an effort estimation of the modeled system modernization.
3.3.1 Class definitions
Configuration categorization
C-1. we organize the class definitions as a meta-model consisting of 3 sets of modelingconcepts: Legacy System, SOA system and Modernization strategies
C-2. choose for the modeling concepts on the architectural level: component, connector,interfaces (sections 2.1)
C-3. the differences between legacy systems and SOA-based ones, that cause the needfor evolution are the Service Abstraction Layer and the resulting grouping of func-
32
Framework configuration
tionality on the implementational level, mapping business functionality to softwarefunctionality (1:1)
C-4. architectural and design patterns (see section 2.1) are considered important conceptsfor the modernization towards SOA as noted by Arcelli [3] and Stal [92]. Their workshows that certain patterns can be used to achieve SOA design principles as describedin section 2.4.3. However, in the literature study there is no concrete indicator foundfor the estimation of the modernization effort needed given the presence/absence of apattern. Based on this, the use of architectural and design patterns in the frameworkis configured to be limited. On the one side, the place of patterns in the meta-model isdefined by the correspondence between their rationale and the SOA design principles.However, they cannot be included in the Rating Model until further research is doneon the relation between patterns and modernization effort such as: a) the effort neededto transform a set of legacy entities into a pattern b) the effort that is saved by reusingan existing pattern binding a set of legacy entities
C-5. use a Rating Model to organize the aggregation of the different factors to a effort es-timation output
Configuration Rating Model
The Rating Model is base on the Goal/Question/Metric method [97]. Two issues that standat the basis of the configuration of the Rating Model are the choice of an effort quantifica-tion method and the choice of a rating scale for the estimation output.
Quantification method – an effort estimation analysis could be based on:1. quality attributes and general legacy system characteristics such as size, complexity,
etc. These are easily measurable, but would produce too inaccurate results. The rea-son for that is that they do not provide enough information, and only give an indica-tion of the resistance of the system to change. What is equally important is measuringthe need, the spreading of the changes. Such an approach cannot, thus, produce anaccurate estimation of the effort, which is the product of the resistance and the need
2. a second and much more accurate approach is to try and approximate the moderniza-tion process that would take place. In doing so, giving an estimation for both resis-tance and need. This though means modeling the modernization changes followingthe ADM modernization scenario from section 2.3.2, a more extensive process
Rating scale – the effort estimation can be expressed on one of four measurement scales:1. Nominal scale: division in sets through labels2. Ordinal scale: relative order, but no magnitude of the difference3. Interval scale: difference magnitude, but no absolute zero point4. Ratio scale: rating with an absolute zero point
33
3. EFFORT ESTIMATION FRAMEWORK – RESEARCH DESCRIPTION
Points of Modernization as basis for effort estimationIn order to apply the second, more accurate, above-mentioned quantification method weneed to model the modernization process in the framework and use this approximation foreffort estimation. The base for this model is provided by the concept of a Point of Modern-ization(PoM) in the Rating Model. A PoM signifies a gap between the architectural modelsof the as-is and to-be systems. It is identified by checking the as-is architectural views forconformance with the SOA design principles(see section 2.2). A Point of Modernizationuniquely identifies the relationship between a set of legacy entities, a set of SCA to-be en-tities and the the set of strategies that will be applied to transform the one into the other.For a set of legacy components this could mean the application of multiple strategies suchas re-componentization and wrapping to achieve the desired end-system organization. Eachsuch Point of Modernization is, thus, the source of a certain amount of effort in the mod-ernization process. The way this effort is calculated from the identified PoMs is specified inthe Rating Model.
In this version of the Rating Model we limit ourselves to the change identificationthrough difference analysis (§2.4.1) for quantifying the impact of modernization. Althoughthe remaining issue of change propagation is ignored, we emphasize that there are depen-dencies between the separate Points of Modernization and their corresponding legacy andSOA entities. This will lead to certain kinds of impact propagation, such as the so-calledindirect impact [14]. For example, the application of a modernization strategy on one set oflegacy components can affect other related components as well, initially not included in anyPoM. This will result in the need for creating an additional Point of Modernization coveringthem, resulting in more overall effort. At the same time the coupling of a component to datastructures and routines that are to be changed as well can also influence the overall effort.Isolated changes may cost more effort than when they are combined as is pointed out byBayer [6]. These effects of impact propagation have to be considered before extending theRating Model.
Modernization effort classificationWe choose to organize the Rating Model by grouping the modernization strategies and theircorresponding effort according to the following prominent modernization concerns: identi-fication, componentization and integration. This categorization is based on the modern-ization strategies presented in section 2.4.2 and the modernization issues from section 2.4.2.
Strategy effort indicators and rating aggregationAs was mentioned earlier in this section, we have chosen to base the production of an effortestimation, in the Rating Model, on the discovered set of Points of Modernization. Thiseffort valuation of each PoM is determined by two factors. First, a set of indicators is usedto rate the impact of the modernization strategy chosen for the PoM within its category.Second, the category is of importance to which the modernization strategy belongs, fromthe ones described above.
The values produced by the indicators are aggregated following the Rating Model cat-egories to produce an effort estimation on the ordinal scale. The ordinal scale is the resultof the choice to order the impact of the modernization strategies only on an ordinal scale
34
Framework configuration
(as opposed to interval and ratio scales). For instance, white-box strategies such as restruc-turing have a higher effort weight than a black-box strategy such as wrapping. With thecurrent knowledge only an ordinal scale is feasible. In order to increase the accuracy bygiving an interval scale grading of categories and their strategies more research has to bedone on the relation between them. This issue is further explained in the chapter on futurework, chapter 6. The set of useful indicators for the componentization concern is given intable 3.1 and for the integration concern in table 3.2.
PROPERTY INDICATORAutonomy set of data used by given function [89]
function to (global)variable ratio [68]Separation of concerns [63] degree of scattering [102]
decomposability [20]feature-to-function ratio [68]multiplicity business rule-to-code block [89]presence of design patterns [3]
Flexibility [4] Service granularity [89]code similarity dendrogram based on features=identifier names [103]impact domain (call-graph subfunction propagation) [90]dominance tree from call graph [6]
Statelessness [89] data-usage [89, 77]number of stateless/stateful functions [68]static data containment [90]
Dependency on environment [89] Commercial software dependency [63]dependencies on operating systems, databases, filesystems [63]
Coupling number of outgoing GO-TO statements from the mined component [90]number of global/local vars [68]coupling to functionality outside of the component [103]number of cross-cutting points [102]use of dynamic code collaboration patterns [100]
Table 3.1: Componentization indicators
Among the indicators, we make a distinction between two categories – Level1 andLevel2. A Level1 indicator can only give a relative information about the effort on an ordi-nal scale. While a Level2 indicator makes it more absolute and improve the estimation to aninterval scale. This is the case, for example, with the indicators of “scattering” and “impactdomain size”, mentioned later on. Used on its own, a high scattering value gives a relativeidea about the effort compared to a low scattering value. However, only when this scatteringis combined with the absolute size of the respective impact domain (e.g. lines of code), canboth estimations be compared on an interval scale. The reason for this is that combinedthe two indicators make it possible to order a single complex change (high scattering andsmall impact domain) can be compared to a simpler but extensive change (low scatteringand large impact domain). It is furthermore obvious that using only the categorization of thestrategies without their indicator based evaluation, also only an ordinal scale of estimationcan be achieved.
35
3. EFFORT ESTIMATION FRAMEWORK – RESEARCH DESCRIPTION
PROPERTY INDICATORCoupling interfacing styles(socket, RPC, signal, pipe, file I/O) [23]
interaction model type (session based vs. request/response) [21]Abstraction interface-to-implementation coupling [23]/design patterns [3]Contracts feature composition and relations [68]
interface complexity (number of used data types) [90]interface definition [103]the input/output interface parameters of component [data-flow analy-sis] [12]unidirectional/bidirectional integration [1, 23]
Architectural mismatches [10]:= infrastructure expectations connector taxonomy [69]= data manipulation Business Object Model differences [43]= communication model communication direct/hub-and-spoke [8], protocols [7]= message data model logical data model, common message model [4]
Table 3.2: Integration indicators
Rating Model input configurationWe configure the Rating Model to use two sources of input. One with information about thelegacy system and one with information about the target system. These correspond respec-tively to the domain and range of the modernization transformation function as described inthe assumptions of section 3.2. In the same section, we also noted the fact that the domainand range encompass very large spaces of instance systems. Which exact instance fromthat space is the target design depends on the system requirements. Narrowing them downmakes it possible to produce a more accurate estimation for the specific situation at hand.We make this possible by enabling the use of input parameters for specifying the desiredend-system architecture.
We have already described the type of input information being gathered about the legacysystem in the form of Points of Modernization. But in order to be able to calibrate the Rat-ing Model also for a more specific target Service Oriented Architecture we configure itto be able to receive input about that system’s configuration. We do this by grouping theabove-mentioned effort indicators according to the SOA property they promote, see ta-bles 3.1 and 3.2. The choice for these properties is based on the SOA design principlesdescribed in section 2.2. The input about the target SOA system consists, thus, of a set ofdesired properties and the degree to which each should be achieved. The choice of propertytranslates into a choice for one of the corresponding indicators to be included in the Rat-ing Model for the evaluation of the modernization effort. The desired degree can then beused together with the resulting effort estimation for an optimal choice in the cost/benefitanalysis described in the problem statement (chapter 1) and in the scoping of the researchin section 3.1 by comparing the effort/gain predictions for different sets of input parameters.
To give an example for the concrete case of the Coupling property, we distinguish be-tween three degrees, ordered from high to low, given bellow. For each of them, we also
36
Framework configuration
show how it manifests itself on the application level through the coupling classification byMyers [72]. The rating scale for this input is also ordinal, because a higher accuracy issuperfluous as it would just be lost when used in the ordinal Rating Model.
High: high coupling between components exists both in the application logic and in thedata models. High coupling in the application logic manifests itself through theuse of global variables (Common coupling) and direct calls between components(Control/Content coupling). The coupling between data models is demonstratedby Data and Stamp coupling, where components share data type definitions, whichmakes them dependent on each other.
Medium: only high coupling between the data models of different components exists,while their application logic is loosely coupled. A loose coupling between the func-tionality of components is introduced through the use of contracts (interfaces) orthrough design patterns such as the Mediator.
Low: the components are loosely coupled both in their application logic and their datadefinitions. The additional decoupling between data models in this level is demon-strated through the use of a canonical data model to which each component canmake its own mapping for internal use. An possible architectural organization forthis level of coupling, where both the data and logic of components are decoupled,is an environment based on messaging with a canonical data model of the for thecommunication messages.
Concluding the input configuration, it can differ which indicator will be included in theRating Model or will be evaluated as part of the effort estimation. It depends on the SOAproperty requirements and the degree they should be present in the end-system. The selec-tion of these is part of the configuration and calibration process of the framework.
Rating Model effort estimation variablesHere we configure the internal workings of the Rating Model valuation. We presentedthe functions EFFORT () and GAIN() in definitions 3.1 and 3.2, which are evaluated inthe Rating Model. The variables in these functions are the input variables of the RatingModel and through them also the effort indicators. The cost/benefit analysis is performed byevaluating these two functions for a given set of input variables and weighting them againsteach other. By changing the set of used variables and their values different situations can beconsidered for finding an optimal choice.
The variables of the gain function GAIN() follow from the scope definition in sec-tion 3.1 and the configuration of the Rating Model input. It is a function of the SOA proper-ties. For the concrete valuation function we assume a simple weighted sum of their achieveddegree, where the weights are specific to the modernization goals and proportional to theirrelative importance. Thus, we write:
GAIN( f lexibility,coupling, ..) = w f ∗ f lexibility+wc ∗ coupling+ ... (3.1)
The effort function EFFORT (), however, is the one this framework concentrates on and isthus more elaborate. It is based in the Rating Model and it depends on the effort indicators
37
3. EFFORT ESTIMATION FRAMEWORK – RESEARCH DESCRIPTION
and the effort classification categories presented earlier in the configuration section. We usethe following definitions:
EFFORT (Si,N,C, I) = N(Si)⊕C(Nbi,Nii)⊕ I(Ci)⊕S(Nii) (3.2)
N(Si) = k ∗Punclassi f ied ∗ |Si|+m∗Pin f ra ∗ |Nbi|+n∗ (1−Pin f ra)∗ |Si| (3.3)
C(Nbi,Nii) = f (|Nbi|, |Nii|,scattering, |impact domain|,coupling) (3.4)
I(Ci) = f (|Ci|, implementation coupling, inter f ace,mismatch) (3.5)
S(Nii) = f (|CCPNii |,{SOA capability}) (3.6)
where
Si = set of legacy entities considered for modernization
Nbi = set of legacy entities identified as business logic for set Si
Nii = set of legacy entities identified as infrastructural logic for set Si
Ci = set of components resulting from the componentization for set Si
CCPNii = set of cross-cutting points for set of entities NiiN() = effort estimation function for the identification concern group
C() = effort estimation function for the componentization concern group
I() = effort estimation function for the integration concern group
S() = effort estimation function for the satisfaction of SOA concerns group
Punclassi f ied = percentage of unclassified legacy entities
Pin f ra = percentage entities identified as responsible for supplying
infrastructural services
The following paragraphs clarify the relations chosen above with a few important re-marks. For the EFFORT () function, equal importance is assumed for the three concerngroups and thus weights of 1 are chosen in equation (3.2). In addition to that, the opera-tor ⊕ is used to indicated the aggregation of the effort of the different concern groups. Aswas already mentioned in this section, the relation between the estimated effort for the dif-ferent groups can not be specified quantitatively yet and only an ordinal scale can be used.This is the reason that results of the separate groups cannot be added up directly yet.
The dependency specified in equation (3.3) is the result of assumption S-1. We recog-nize two subsequent levels of identification of system entities. In the first place there is adivision between classified and unclassified entities. The ratio between these is signified bythe percentage Punclassi f ied . Classified entities are those for which it is possible to establishtheir role in the system as belonging to either one of the categories on the second level –infrastructural- or business logic. The ratio between these two is given by Pin f ra. This di-vision creates three categories each of which will be treated differently in the subsequentmodernization and thus generate different effort:• Punclassi f ied ∗ |Si| – proportional to the number of unclassified entities and source of
effort for the identification of their role, possibly manually
38
Framework configuration
• Pin f ra ∗ |Nbi| – proportional to amount of business logic the infrastructure part inter-acts with• (1−Pin f ra)∗ |Si| – proportional to the number of business logic related entitiesFor each of these components, a weight coefficient should be chosen such that it fits ex-
perimental data. In equation 3.3 these are signified by k,m, and n, but the relation betweenthe three components does not need to be linear. The same holds for the function specifica-tions of C(b, i) and I(c), which only give a general indication of the dependency. We onlywant to indicate that there is a direct correlation to the variables. The concrete choice ofweight factors is left for the instantiation and calibration phase. We give an example of suchan instantiation in the next chapter with a concrete case study legacy system.
Another important notice about the effort quantification equations is that they are alldependent on the set of legacy entities considered for modernization Si. This means thatEFFORT () can be used to investigate the best phasing of the project as well. By con-sidering first a small initial set and monitoring and how the effort and gain change as thisset is extended with more legacy entities. This way an analysis similar to the one given infigure 3.2 can be produced.
set1 set2 set3 set4 set5
C(b,i)I(c)
EFFORT
Figure 3.2: Componentization- and Integration effort over increasingly larger legacy entitysets
The figure also shows the difference in effort needed for the modernization of the ap-plication logic supplying infrastructural services such as communication and logging andbusiness specific logic. The effort for the former has a high initial value, but once modern-ized these facilities requires few extra effort with the extension of the legacy entities set.The latter is characterized by a steady growth in effort.
3.3.2 Relationships
In this subsection we describe the configuration of the relations between the different mod-els of the framework. These are a selection from the information sources used to capturethe characteristics of a modernization case by the different existing approaches. As wasdescribed in the scope section 3.1, the framework is concentrated around the Business andApplication domains. So these are the domains for which models are defined as follows:• Business Model
39
3. EFFORT ESTIMATION FRAMEWORK – RESEARCH DESCRIPTION
• Architectural Model, Integration Model• Service Model• Development View, Process View• Rating Model
We keep the relation between the Business domain and the Application domain as isshown in figure C.17. All the information from the Business domain is contained in a sin-gle Business model. However, the Application domain is captured in two separate modelscontaining information about two separate concerns – Architecture model concerned withComponentization and Integration model concerned with Integration. Together they formthe as-is model of the legacy application.
To these as-is models we relate two models of the to-be system - the Service model andthe Development View/Process View. We introduce the additional Service Model based onour assumption C-3. It is positioned between the Business and Application models and hasa correspondence relation to both in a similar way to the relation depicted in figure C.6. TheDevelopment and Process views capture again application domain information, but aboutthe SOA to-be system. They are also split based on the concerns they model – Compo-nentization and Integration respectively. Both are expressed through Service ComponentArchitecture entities from figure C.2.
Finally we have the Rating Model. It consists of two parts. First is the rating aggrega-tion part, which contains the strategy and indicator entities. Second is the part containingthe Points of Modernization mentioned in the previous subsection. These are also relatedto the rest of the models, specifically the Development/Processes views and the Architec-tural/Integration models, and thus function as an interface to the rating part with input aboutthe legacy system. The complete overview of the resulting relationships configuration isshown as part of the framework in figure C.21.
3.3.3 Control Model
In the configuration of the Control Model we decide on the workflow to be followed whenusing the framework. This is why in this subsection we present the two major parts thatdefine it – the order of model construction and the choices for approaches to achievingparticular steps.
We decide for the analysis process to follow a simple pipe-and-filter configuration. Thismeans that the order of model construction is sequential, with each step using the resultsof the previous one. This is motivated by both ADM where each model has its roots in alower/higher level model and the steps described in the modernization methodologies Re-naissance and SMART, section 2.3. The configuration of the steps of the Control Modelconsists of the following four phases, which apply for all the three domains of the frame-work:Reverse Engineering First identify the available sources: code, documentation, deploy-
ment configuration, maintenance history. Then gain structural(as opposed to se-mantic) knowledge of the legacy system architecture by constructing its static(as op-posed to dynamic) model with the concepts specified in the Meta-model(table C.18).
40
Framework configuration
The result of this phase for each of the three domains is an as-is Model, Business-,Application- and Technology respectively
Restructuring Taking the as-is models, result of the Reverse Engineering phase, in eachof the domains as a starting point, restructuring is done to identify the desired end-system organization. This is done according to the SOA design principles and usingthe known requirements for the end Service Oriented Architecture
Gap Analysis The analysis consists of three important parts. First make an inventory of thedifferences by comparing as-is to to-be models in each domain in the form of Points ofModernization. Second, assign a set of possible solutions to each PoM and propagatethe impact of the change within the same or to a lower domain if necessary. And last,give a valuation to the end-result by using the decision variables in the Rating Model
Forward Engineering This phase considers the bridging options, result of the gap analysisphase. The end-system requirements and the limitations they impose are percolatedup the domains starting from the Technical domain. There, first SOA environmentrestrictions are taken into consideration and their impact on the Application domainis evaluated. In this way a refinement of the effort estimation made in the previousphase is produced
Approaches for each of the above-mentioned phases tend to divide in two groups – au-tomated vs expert-driven – as is the case with ADM and SMART described in section 2.3.We choose to use techniques that are simpler and produce more crude results, but are au-tomatable.
Business Service identification can be done top-down or bottom-up [31, 76]:1. Top-down: identify needed business services based on a business domain analysis
and map them to existing components, by searching which components capabilitycould satisfy the need
2. Bottom-up: identify the business needs by looking at the capabilities offered by thecomponents
The business domain analysis part of the top-down approach involves domain expert par-ticipation. Although it has the disadvantage to be less accurate, the bottom-up approach ispreferred in the further configuration of the framework because of its suitability for automa-tion and less supervision.
Constraint percolationTO-BE system requirement based constraints percolate in the direction opposite to the im-pact(see RMM and fig.C.12). This is recognized and part of the overall process model infigure C.22, but will not be further explored in this research.
Automated analysis typeWe have chosen for static structural analysis. This is the most basic type of analysis, but atthe same time the one most prone to automation. The analysis could be extended to dynamicand even semantic, but they require an increasing degree of expert supervision.
41
3. EFFORT ESTIMATION FRAMEWORK – RESEARCH DESCRIPTION
3.4 Framework specification
This section presents the complete framework for modernization effort estimation basedon the scope, assumptions and configuration from the previous sections of this chapter. Itconsists of a set of deliverables forming together the framework that can be instantiated,refined and extended. The contribution of the framework is twofold. In the first place, itcombines and integrates existing approaches and concepts. Secondly. these are then usedas a base for the new effort estimation Rating Model.
First we describe the meta model of the framework in subsection 3.4.1, containing theclass definitions and relationships. Following, in subsection 3.4.2, is the description of theprocess of instantiating the framework and its control model. Finally, we present the RatingModel result of the framework configuration.
3.4.1 Framework Meta Model
This meta model contains the Class Definitions and Relationships part of the framework.The Class Definitions consist of a categorization of entities from the existing approachesdivided over the three effort factors described in the scope: Legacy system (Appendix C.18),SOA target system (Appendix C.19) and Modernization strategies (Appendix C.20).
Secondly, the relationships are presented that the framework defines between the differ-ent models and the entities they contain. These were initially configured in subsection 3.3.2.Here we present them in detail in UML notation in figure C.21. For each model, we alsoshow the major entities from the meta model that it contains. Through these entities therelationships between the models are concretely defined. In the figure is also indicated thedistribution of the models over the framework domains. The layering is similar to the ADMapproach, with the Business domain on top.
3.4.2 Framework instantiation process
The instantiation process of the framework, that prepares it for the effort estimation analysis,is depicted in two figures, C.21 and C.22. Figure C.22 shows the global order in which themodels of the framework should be created. It is based on the configuration of the ControlModel and shows its 4 Phases split over the 3 domains of modernization described in thescope. This defines twelve different activities in their intersection. These are shown asnumbered green rectangles in figure C.22. Each of these steps identifies and separates out aparticular goal for a particular domain. For example, step#1 is concerned with extracting orreverse engineering the as-is legacy system in terms of the business domain entities specifiedin table C.18 of the meta-model. The figure also specifies the set of input and output modelsfor each of the above-mentioned steps. A central role is played by the PoM Model. For eachof the domains, this model contains the set of Points of Modernization identified throughthe Gap Analysis comparing the as-is to the to-be models.
It is further also important to note the indicated impact propagation direction and therequirements percolation in the opposite direction. This is based on the assumption that theBusiness Domain models are implemented through the Application Domain models. And
42
Framework specification
this means that any changes in the former may propagate and impact the later. The samealso holds for the relationship between the Application Domain models and the TechnologyDomain models. This is indicated by the assumption M-3 in section 3.2.
Compared to the higher-level specification of figure C.22, figure C.21 is a concreteinstantiation resulting from the framework configuration of section 3.3. Based on the con-figuration of the Rating Model, it shows the concrete models to be used to supply it with theneeded input. The numbers next to the inter-model relationships indicate the order in whichthe models are to be produced.
3.4.3 Effort estimation Rating Model
The Rating Model configuration was presented in section 3.3. The resulting model is shownhere in figure C.23. It is based on aggregating the effort estimated for the modernizationstrategy attached to each Point of Modernization. This aggregation is first done in the threeconcern groups – identification, componentization and integration – with their strategiesindicated in the meta model table C.20. Next to these groups, aimed at the achievement ofseparation of concerns, there is a forth group concerned with the satisfaction of the end-result system requirements that may not have a correspondence in the legacy system. Theseare the SOA capabilities also mentioned in the meta-model in table C.19. The Rating Modelthus contains three levels: at the highest level are the concerns, at the middle level the mod-ernization strategies that belong to that concern, and at the lowest level the effort indicators.These were already described in the framework configuration section. For this version ofthe Rating Model we have kept the choice of indicators limited as there is not yet enoughempirical evidence to indicate the exact relation of indicators to modernization strategiesto be able to distribute them correctly over the categories. Both effort and gain can beaggregated according to the same scheme presented here.
The configuration section of the Rating Model (3.3, “Strategy effort indicators andrating aggregation”) explained the choice for an ordinal scale of strategy rating. This orderis also depicted here in the diagram of the Rating Model by the order of the concern groupsthey belong to. In addition to their relative effort, this order also indicates the succession inwhich the groups are evaluated for an increasing accuracy of the effort estimation. Startingfrom the left with “Identification”, the strategies require increasingly less effort. This isbased on the way these strategies are implemented as was presented in section 2.4. Here westate the main points:• Identification: may require a large amount of manual effort• Componentization: great deal of restructuring, which requires also more insight into
the implementation• Integration: merely mismatch alignment through wrapping and adapter addition• Concern satisfaction: merely plugging in or removal of functionality similar to the
COTS strategy and to some degree wrappingFollowing, we will go into the rating for each of these groups and their strategies. We willconclude with the role design patterns can play in the rating process.
43
3. EFFORT ESTIMATION FRAMEWORK – RESEARCH DESCRIPTION
Identification
The result in this concern group is an estimation of the effort needed for the Reverse En-gineering activities in the modernization. As presented earlier in figure C.22, the activitiesof this phase are present in all three domains – Business, Application and Technology andthus, identification effort is required in each of them. This is why the rating model reserves aplace for strategies for each of the domains. The business category from figure C.23 is aboutidentifying Business Model entities and Technology about building a technology model ofthe system including such as operating systems, etc (see Meta-model entities figure C.18).We acknowledge these aspects and give their place in the rating model. However we con-centrate on the Application aspect, as was described in the scope in section 3.1 and give itsbuildup of strategies and indicators.
The rating for the Application Domain is based on strategies for the classification ofthe legacy entities according to two criteria. The first has to do with the role of the entity– component (COMPONENT)or connector (CONNECTOR). While the second makes adistinction based on the purpose – business application logic (APP LOGIC) or infrastruc-tural functionality (INFRASTRUCTURE). This is expressed in assumption S-1 (page 32),stating these two types of functionality. The combination of these two criteria with twoalternatives each, creates four categories for the identification of legacy entities. Once it hasbeen identified for every legacy entity in which category it belongs, the rating can be doneusing the two indicators shown in the figure – the percentage classified legacy entities andthe percentage that implements infrastructural functionality.
The reason for choosing this division is that specific business logic and more generalfunctionality can benefit best from different modernization strategies. Thus, investing effortearly on in identifying them correctly, will pay off in less effort in later phases as it would bepossible to apply the most appropriate strategy. The APP LOGIC entities are more likelyto be kept during the modernization and bring about effort in the componentization andintegration concern groups. Their functionality will most likely be encapsulated into ser-vices and integrated to work together to perform the same business processes. On the otherhand, the infrastructural facilities such as communication and resource discovery are morelikely to be replaced as a whole in favor of a new infrastructure/framework that suppliesthe same services. This, as opposed to reengineering it. The former will result in effort inthe integration and satisfaction of concerns groups, while the later will result in effort forcomponentization.
Concluding this concern group, we need to make an important remark about its effortoutput and how it will be influenced in case of an extension of the framework. Currently,it has been setup for the ordinal scale. As is shown in equation 3.3, this means usingthe enumeration of PoMs to be identified, a qualitative relationship between them suchas infrastructure entities require more effort to identify than business logic and giving aneffort estimation based on the size of this set and its composition. However, if the rating isextended to an interval or ratio scale, the degree of automation that can be applied for theextraction of the models in the Reverse Engineering phase will start influencing the effort.This is a consequence of the difference in realization effort needed between automatedand manual expert-driven approaches. This was discussed in section 3.3.3 and throughout
44
Framework specification
chapter 2. Once a unit of measure is introduced on an absolute scale such as costs, differentweights will be used when translating to it for automated and expert-driven identification.For the expert-driven part this weight factor will be much greater for two reasons: greatertime investment per PoM and greater costs per time unit.
Componentization
The componentization rating is concerned with delivering an estimation of the effort basedon only local subsystem restructuring where needed. The modernization strategies that fallin this group are the more invasive white-box modernization strategies. The effort estima-tion resulting from this category gives a local estimate about restructuring subsystems inisolation. This means that different subsets can be considered for modernization togetherby only summing their corresponding efforts.
The goal of the strategies in this category is to achieve the rough architectural descrip-tion of the application as required by the Service-Oriented paradigm. This means, sepa-ration of concerns, Business-to-IT alignment and SOA design principles. The selection ofindicators from table 3.1 was thus made such that it would measure the effort of the changesthe modernization strategies would involve.
Integration
Through this category the effort and gain for the system as a whole are estimated. This isdone by rating the impact of the integration issues between the component subsets chosenafter the componentization. The strategies here represent the possible actions that may beneeded for Points of Modernization. They are all aimed at achieving a seamless whole byconsidering the separate components as a black-box. This is why these strategies are con-sidered to have less impact than the componentization strategies. Internally there is alsoan order in the estimated amount of effort required by each strategy. In the diagram, theWrapper strategy is the one requiring the least amount and the replacement of a compo-nents with a COTS (Commercial Of The Shelve) solution has the greatest estimated effort.A selection was made from the possible indicators from table 3.2, which can be used witheach strategy.
SOA concern satisfaction
All the concern groups and their strategies described so far rate the effort in transformingexisting assets through restructuring or integration. This covers the intersection of func-tionality present in both the legacy and the SOA system. The category of SOA concernsatisfaction, on the other hand, is responsible for estimating the effort needed for handlingentities that don’t have an equivalent in either the as-is or the to-be system. This means thatthere are two strategies that could be applied – ADD or REMOVE. The former is necessaryin case one of the SOA capabilities mentioned in table C.20 is required in the target sys-tem, but has no equivalent in the legacy system. For example the addition of a messagingcapability to a system based on direct communication through API’s. The latter strategy– REMOVE – is applied to legacy entities that are obsolete in the SOA environment. For
45
3. EFFORT ESTIMATION FRAMEWORK – RESEARCH DESCRIPTION
both strategies holds that their effort depends on the number of places they interact with therest of the application as they represent cross-cutting concerns, described in section 2.2 inthe subsection on “SOA Software Architecture”. Thus, a quantifiable indicators for thesestrategies is the so called number of cross-cutting points (CCP) [102], used in equation (3.6)of the Rating Model EFFORT () function. It is also mentioned in table 3.1 as an useful in-dicator for the Componentization concerns of which SOA concern satisfaction is a specialcase.
Rating through the use of design patternsDesign patterns play an important role in the forward engineering step of the target SOA [3].In the reverse engineering part though, they are less useful. Their use is based on the fact thatthe presence of a design pattern can help to induce semantic knowledge from the structureonly by assuming that the pattern structure was used with the intent of its rationale. So thepresence of a pattern in the legacy system means two things:• there is a clear structure, so restructuring will be easier• the structure has a rationale(pattern forces). This can then be compared to the to-be
situation forces.The presence of a pattern and thus the certain degree of separation of concerns it achievesdoes not automatically mean that the involved components can be reused as is. It still mightbe the case that the whole composite would have to be rewritten from scratch. In such acase the pattern’s only added value is that it aids in the system understanding. A patterncould also indicate a less optimal solution, but one that is easier to achieve by following thepatterns structure. For example including in the component all other classes participating inthe pattern, although they do not directly have any relationship with the core functionality.
46
Chapter 4
Legacy Logistics System –effort estimation experiment
This chapter has two goals. The first is to demonstrate the instantiation of the Effort Estima-tion Framework for a concrete legacy system. The second goal is to evaluate the resultingeffort estimation and through it the effectiveness of the framework that delivered it.
The chapter begins in section 4.1 with the definition of the experiment performed toreach the above-mentioned goals. This experiment is structured according to the guidelinesby Wohlin [99] on experimentation in software engineering. Based on the experiment defi-nition, section 4.2 describes the planning for performing the experiment. This includes thehypothesis and design of the experiment, the description of the legacy system used as a sub-ject and the instrumentation. The execution of this planning is then presented in section 4.3Experiment operation. The results of the experiment are presented in section Data Analysis(4.4) noting the important facts about them. An interpretation of these results is given insection 4.5. We end the chapter with a discussion and the conclusions from the experimentin section 4.6. This last section also containing ideas about future work for extending theexperiment setting.
4.1 Experiment definition
The rest of the chapter describes the experiment with object of study the Effort EstimationFramework from chapter 3. The purpose is to evaluate the effectiveness of the suggestedframework and the effort estimation results it produces. This experiment is a single objectstudy done from the perspective of the researcher. The single subject of the study is theLegacy Logistics System undergoing the experiment through a prototype software system.
4.2 Experiment planning
4.2.1 Context selection
The experiment is executed by the research student himself with no test subjects. Profes-sional software engineers were involved, but their contribution was not related to the test
47
4. LEGACY LOGISTICS SYSTEM – EFFORT ESTIMATION EXPERIMENT
data itself. The tests are performed in an off-line situation parallel to a real life moderniza-tion project. Although the examined problem is from a real situation, it is constrained insize due to time restrictions. These constraints are both on the study object as well as on thestudy subject. For the experiment only part of the framework is considered and only part ofthe subject legacy system.
4.2.2 Hypothesis
To evaluate the effectiveness of the framework, the following null hypothesis is formulated:“According to the Effort Estimation Framework, the specified input Gain is unrelated to theframework output Effort”. The experiment is designed with the goal to deliver empiricaldata to reject this null hypothesis in favor of the alternative hypothesis. That alternative isformulated as “The Gain is related to the Effort following the Effort Estimation Frameworkin an exponential relationship”. This expectation was mentioned in figure 3.1, section 3.1.
4.2.3 Variable selection
In order to disprove the null hypothesis, the relevant parameters from the framework areselected as independent and dependent variables to monitor during the experiment.
NotationDue to the restricted setup, not all framework variables will be involved in the experiment.We introduce a notation to emphasize that only a limited subset of the parameters included inthe framework are being evaluated. The two main functions are GAIN() and EFFORT (),defined in section 3.3.1. When these are only evaluated in relation to only some of theirspecified parameters, we respectively denote them by Gain() and E f f ort().
Independent variablesThe controlled environment is created by the set of selected independent variables. The firstof theses is the input factor of the experiment – Granularity. It is chosen, because it isa measurable entity proportional to the frameworks input parameters of Flexibility [8] andComposability (section 2.2). The Effort Estimation Framework defines these on their turnto be linearly proportional to the Gain – GAIN( f lexibility,coupling, ...) (equation 3.1). Sofor the experiment we have Gain( f lexibility) as theoretical cause to be examined throughthe treatment of the independent variable granularity, denoted with g (see Appendix C.28).For each treatment of g a test will be done executing the Effort Estimation Framework withit as input. The granularity is measured as the total number of components to partition thelegacy system in and has a range of [0..5386] on a ratio scale.
The remaining set of independent variables will, however, be kept constant. Thesevariables together with their value and measurement scale are:• Legacy components subset (Si): value = all; scale = Ordinal• Rating Model concern: value = Componentization; scale = Ordinal• Modernization strategy: value = Restructure; scale = Ordinal
48
Experiment planning
• Model extraction methods: described in the experiment design; scale = Nominal
Dependent variablesThe experiment will monitor the development of two dependent variables:• Scattering: indicator from the framework Rating Model; scale = Ratio• Effort: indirect measurement based on the scattering; scale = Ratio
The function EFFORT () is defined in equation 3.2. Because this experiment evaluatesonly the Componentization concern, we use only the C(Nbi,Nii) part of the equation. Fur-thermore, assumption iii on page 50 about the supplied LLS code means that Nii = Ø fromwhich follows that Nbi = Si. So for the effort estimation in this case study holds that:
EFFORT ()≈C(Nbi,Nii) = E f f ort(scattering) (4.1)
Because the Componentization concern is considered in isolation, no aggregation ofeffort estimations is done. This makes it possible to do the experiment on the ratio scale. Thescore however is an index internal for the indicator/modernization strategy combination.It cannot be related to an index produced for a different strategy with different indicator.The relation between these two is defined only on the ordinal scale by the Rating Model.Currently, when merging multiple indicators or categories the score has to be converted tothe Ordinal scale by a custom scaling that divides the scores in a couple of groups. With the100% being the maximal total effort.
4.2.4 Subject system description
The subject legacy system for the experiment was selected through stratified random sam-pling. A set of criteria were important for the characterization of the group from whicha system was randomly chosen. These criteria are typical for the problem domain of theEffort Estimation Framework: legacy application, implemented in a procedural language,complex interdependencies, in need of modernization indicated by the maintenance team.
The chosen system is the Logistics Legacy System – LLS. It contains many subsys-tems with heterogeneous implementations and with many interdependencies. The devisionbetween them is based on their responsibility domains. Three of these subsystems are givenbellow with their corresponding implementation language and runtime environment. Theyshare the following characteristics: a) transaction oriented, b) work on a common datamodel (in DB2) and c) rely on some types of infrastructure support (CICS vs. IMS-TM).• Request Handling – CICS/Cobol• Warehouse Management – IMS/PL1• Location Planning – IMS/PL1
We concentrate on the Request Handling sub-system, which will be further referredto as RH. It is concerned with the management of incoming requests for parts and, thus,the implementation code includes request process flows and inventory management. Thecode considered as belonging to this sub-system was exported from the LLS codebase andsupplied by the maintaining team.
49
4. LEGACY LOGISTICS SYSTEM – EFFORT ESTIMATION EXPERIMENT
A brief examination of it gives the following points of consideration:i. the relation of compilation to run-time units is one-on-one. One PROGRAM/PRO-
CEDURE per file. This means that the units in a fine grain representation of the codecan be seen as both files and programs.
ii. No built in environment facility for program address passing. This has had to becustom implemented as part of the code base. This code falls into a different cate-gory when it comes to modernizing possibly to other platform/language with differentcapabilities. It contains no business functionality but utility
iii. we assumed that the code contains application logic and no infrastructural facilities
4.2.5 Experiment design
For each treatment of the granularity factor, the experiment instantiates four parts of theEffort Estimation Framework: the meta-model entities (section 3.4.1), the meta-model re-lationships (section 3.4.1), the analysis process steps and models (section 3.4.2) and theRating Model (section 3.4.3). Here follows a list of the parts used for each of them.
Meta-model entities:• Legacy – table C.18, columns Business and Application• SOA – table C.19, column Application• Strategies – table C.20, column Componentization
Meta-model relationships (figure C.21):1. Business Model: processes/activities2. Architectural Model, Integration Model: Legacy Components and Connectors3. map Business Model to Architectural Model: which ”Legacy Components” imple-
ment each ”Business Process”4. Service Model: information(cluster around data types), business(logic), general(rest) [96]5. Rating Model
Analysis process (figure C.22):• steps: 1, 2, 3, 4, 5, 6• models: AS-IS Business Model, AS-IS Software System Model, TO-BE Business
Model, TO-BE Software System Model, Business PoM Model, Software PoM Model
For each test run in the experiment, three steps are executed to build the models de-scribed above. First is constructed the AS-IS Business Model. Followed by the extractionand abstraction of the AS-IS and TO-BE Architectural models. In the third step, the twoare combined giving the PoM Model. This is used to calculate the dependent variables ofScattering and Effort. These three steps are described below.
Business ModelThe Business Model of the RH subsystem is constructed through extraction and abstractionof the source code. The end result is a representation of the business process flows inthe Business Process layer from the Development View of figure C.6. The approach is asimplified version of the one described by Zou [104]. While extracting this view we keep
50
Experiment planning
the traceability to the code level, the importance and purpose of which is described byXiao [101]. This means that the link between each business process and the PROGRAMentities implementing them is kept (figure C.24).
In the variable selection is described that the subset of legacy components consideredfor the experiment will be constant for all treatments and include all the available compo-nents. This results in a constant number of business processes that will be extracted in thisstep of the experiment.
Architectural ModelThe Architectural Model of the RH subsystem is constructed by abstracting legacy com-ponents based on the coupling between programs. The end result is a representation ofthe legacy components in the Service Components layer from the Development View offigure C.6. While extracting this view we also keep the traceability to the code level (fig-ure C.25).
Complete as-is model and PoM ModelIn this step we first build a complete as-is model of the system by linking the processesfrom the Business Process model to the components implementing them in the Architec-tural Model. Figure C.26 shows conceptually how this is done. We use the traceability linksof both models to the code level to relate the entities in them. This mapping between thetwo models is used to identify the Points of Modernization. For each business process thereis one Point of Modernization with an attached strategy of Restructuring. For each of thesePoMs we measure the effort indicator of Scattering that belongs to the Separation of con-cerns property. Scattering is defined as the number of components that implement a givenbusiness process and thus, the scattering of business functionality over legacy components.We use the following notation:
Dg = distribution of the scattering over the business processes for granularity g
Dg =[
bp s]
bpg = business process distribution component of Dg
sg = scattering component of Dg
For a given granularity level g, we use the scattering distribution Dg and use it in anaggregation function to calculate the effort for the Componentization concern. The concretespecification of the effort as a function of the granularity is as follows:
E f f ort(g) =n
∑i=1
bpg,i ∗ s2g,i (4.2)
here, n is the size of the distribution matrix Dg, bpg,i - the number of business processesfrom that distribution, sg,i - the corresponding scattering index. For the construction of thisvaluation function a choice has to be made about the second part of the product, s2
g,i. Itrepresents the scattering weight, which indicated the effort relation between different scat-tering indexes. In other words, how much more(or less) effort does a PoM with a scattering
51
4. LEGACY LOGISTICS SYSTEM – EFFORT ESTIMATION EXPERIMENT
indicator value of sn require than one with scattering of sn−1.The choice of scattering weight relation we have made here, is based on two assumptions:• same scattering index means same scattering weight. However two PoMs with the
same scattering index do not have to valuate to the same scattering weight. Think ofother factors such as size, structure etc that could influence the relation.• we assume a fully connected graph between all the implementing programs of one
business process. This means that splitting them into n+ 1 components would leadto n∗(n−1)
2 connections. This is of order O(n2) so we also assume a quadratic relationfor the scattering weights.
4.2.6 Instrumentation
The above-mentioned experiment steps will be fully automated with a prototype softwaresystem. The process of model instantiation and evaluation is implemented in this prototypein the following phases:
1. Extraction+Abstraction ->AS-IS Models:• Business Model• Application Model
– Architectural Model– Integration Model
2. Restructure ->TO-BE Models:• Service Model+SOA Infrastructure• Development View• Process View
3. Analysis• Rating Model
The flow between these phases and the data that is saved is shown in figure C.27. Mi-nor preprocessing of the source code files was needed before they could be parsed in theExtraction phase. This was done with a custom written Java processor and involved theremoval of invalid characters from the source such as EOF. For the Extraction phase itself,the Relativity Modernization Workbench R©1 was used. The extracted information was thenexported in Excel format. A second custom Java application was used to transform this datainto a format that could be imported in the tool used for the following phase of Abstraction.For it as well as for the Restructuring and Rating Model analysis, MATLAB R©2 was used.
4.2.7 Validity evaluation
Here are presented the considered threats to the validity of the experiment, mainly internaland external. The concepts under study in this experiment are shown according to [99] inAppendix C.28.
1http://www.microfocus.com/products/modernizationworkbench/index.asp2http://www.mathworks.com/products/matlab/
52
Experiment operation
None of the main threats to the internal validity apply to the designed experiment, be-cause it is automated through the software prototype. This means that the application of thetreatments does not change with time, they do not have any side effects on the subject sys-tem and there are no human factors influencing the test results. Thus, the experiment offersa high degree of certainty that the applied treatment is the cause of the observed effects. Thesingle exception is the instrumentation. It forms however no big threat as the tools used areconsidered reliable. Relativity is a commercial workbench of high quality. The main partsof the custom MATLAB code are to be found in appendix B for inspection.
The external validity of the experiment is also not jeopardized, which increases thepossibility to generalize the results. Again, because of the automated process performedby the prototype, the test data is not susceptible to the moment in time the tests are run orany characteristics of the population of subject participants. In addition, the subject legacysystem has been chosen so that is representative for the problem domain targeted by theEffort Estimation Framework (section 4.2.4). This makes any generalizations based on thetest data valid for the domain of similar legacy systems.
4.3 Experiment operation
As an input Application Model was used the call-graph of the RH subsystem. The Rela-tivity Modernization Workbench R© was used to parse the legacy Cobol code and produce acaller/callee map. That was exported in the form of a report, an excerpt of which is shownin appendix C.29. This callmap was converted to a directed graph represented as an adjencymatrix in MATLAB, which we will further refer to under as M.
The unique id of each node in this graph representation is shown in column 2 in fig-ure C.29. Each such node corresponds to a PROGRAM entity from the code. The edgesrepresent the calls between the programs. The graph contains in total 5386 nodes and 8349edges. It is important to note that the caller/callee map produced by Relativity contains onlyinformation about which program calls/is called by other programs. It does not extract theorder in which these are called. So the nesting part of the flow is preserved, but the orderwithin a program is not. This means there is no particular order (pre-, in- or post-) in whichthe tree nodes should be read, but for our purposes this is no limitation.
A second observation worth noting is that M was found to be an acyclic graph (DAG).This fact simplifies some of the algorithms used in the Business Model (4.3.1) and Archi-tectural Model (4.3.2) sections, but does not invalidate their generality.
4.3.1 Business Model
As input for this step we used the adjency matrix M. To extract an approximation of thebusiness process flows in the RH subsystem, the code from Listing B.1 was used. Accordingto it, we first identify the set of entry-nodes. These are programs that are not called by anyother program but only invoke others. These root nodes are considered the starting pointof business processes, which can be invoked from outside or through the system’s front-end. In total we identified 1026 such entry-point nodes out of a total of 5386. Once theentry-point nodes are identified, we extract the subtree of nodes from the call-graph with
53
4. LEGACY LOGISTICS SYSTEM – EFFORT ESTIMATION EXPERIMENT
root each of the entry-nodes. This produces the end-result subtrees. This is a set of size1026 containing the sets with the nodes belonging to each business process. This result isonly an approximation of the real business processes of the RH subsystem, because of thecall-graph limitations described in the introduction. This means that the extracted businessprocesses do not completely depict the order of activity execution, but that for our purposesthis limitation does not influence the end result. The traceability to the code level is kept bypreserving the ids of the nodes in the subtrees, which uniquely identify the implementingprograms.
Visualization of the extracted process flows was done with the code from Listing B.2.Three such flows with increasing size are shown in figures C.30, C.31 and C.32. The oneshown in figure C.32 contains the most nodes in the whole RH subsystem. For the resultingprocess trees, it is common to share subtrees. This could be used as an indicator for thechoice of granularity level (see subsection 4.3.2) so as to optimize reuse.
4.3.2 Architectural Model
Here as well, the adjacency matrix M was used as starting point. It was process with thecode from listing B.3 with the goal of identifying clusters of program nodes to form compo-nents. First the distances between the different programs in the call-graph was calculated,forming the DIST MATRIX. Then this information was used to calculate their hierarchi-cal clustering structure, similar to the approach of Fan-Chao [34], which clusters the nodesclosest to each other first. For the concrete implementation we used weighted clustering(WPGMA).
The result of the clustering is a dendrogram showing the way the program nodes canbe gradually clustered together forming larger components. This dendrogram, which wewill further refer to as TREE is shown in figure C.34. From this, clusters can be definedby deciding on the level of granularity at which the tree should be cut up into components.Such a sample devision is shown in figure C.35, where the different resulting componentsare colored differently.
4.3.3 Complete as-is model and Rating Model
The Rating Model is instantiated and used to estimate the effort for modernizing towardsachieving the SOA property of Composability. This is in our case the input parameter forthe Rating Model about the to-be system as specified in 3.3. Figure C.26 shows how thisstep was performed. On the left side, we have the set of subtrees and on the right we havethe TREE clustering structure. From these, we calculate the scattering using the code fromlisting B.4.
54
Data Analysis
4.4 Data Analysis
4.4.1 Rating Model indicator of Scattering
Descriptive statistics
• Relative frequency distribution for five of the treatments for the factor g:D300 (Appendix C.36), D500 (Appendix C.37), D1000 (Appendix C.38), D3000 (Ap-pendix C.39), D5386 (Appendix C.40)• box plots of the same five distributions (Appendix C.41)• plot of the median for all distributions of the treatment of g (Appendix C.42)• pot of the mean for all distributions of the treatment of g (Appendix C.43)• plot of the standard deviation for all distributions of the treatment of g (Appendix C.44)
The distribution plots show the relative frequency (y axis) of the scattering index (xaxis) in the set of business processes. All distributions display a logarithmic decay - highconcentration of low scattering and increasingly less occurrences of higher scattering in-dexes. When we compare the distributions with an increasing granularity (D300, D500 etc.),we also see that the scattering is very concentrated for low granularities, but that it spreadsout to higher values with the increasing granularity. How this happens exactly is seen forthe scattering indexes 1 to 4 in Appendix C.45, where the areas S2, S3, S4 display a rippleeffect from low to high scattering.
Furthermore, from the frequency distribution plot for the extreme case of D5386, it isnoticeable that there are no more business processes that have a scattering index of 1. Thismeans that in case of such a partitioning, every business process is implemented by two ormore separate components. How this situation arises can be seen in Appendix C.45. Atpoint P1 the scattering index 1 starts dropping and for the same granularity treatment atpoint P2 the scattering index 2 starts increasing with similar rate.
The most important fact to note about the scattering test data is based on the meanand median plots. Both plots over all treatments of the granularity show a clear increasingpattern. This indicates that there is a relationship between the treatment g and the dependentvariable of the experiment Scattering. We will test this statement as part of the hypothesistesting section (4.5.1).
Outliers
The box plots and the plot of the standard deviation show that there is an increasing numberof outliers, with an increasing deviation from the mean scattering value. Furthermore, al-though the mean value for each g increases, the scattering outliers increase at an even fasterrate. They seem to form groups with similarly large scattering. This is best seen in thebox plot for g=5386. There, we can see four groups (A, B, C, D) with similar very largescattering deviating significantly from the mean value.
Based on this observation, we decide not to dismiss these scattering outliers. Theydisplay a clear structure and are not sporadic. Furthermore, they are not the result of anythreat to the experiment validity such as measurement faults (see section 4.2.5). This means
55
4. LEGACY LOGISTICS SYSTEM – EFFORT ESTIMATION EXPERIMENT
that they supply important information. An interpretation of their presence is given in sec-tion 4.5.
4.4.2 Resulting Effort
Based on the test results for the scattering, we calculate the effort for all the treatmentsof g according to equation 4.2. The graph of the produced effort estimation is shown inAppendix C.46.
The Effort graph shows an increasing growth for an increasing granularity. It has twoimportant features. The first is the initial flat effort up to a granularity of 275 components.This is due to the fact that up to that point the total number of implementing components isso low that the implementation of every business process fits into exactly one component.This changes as the components become more fine-grained.
The second interesting feature of the effort graph is the kink around g=4250. To analyzeit further, we divide the business processes in five groups. These groups are formed basedon the scattering at g=3000. In the box plot for g=3000 (Appendix C.41) we showed thatthere are four groups with similar scattering indexes. We call them A, B, C and D fromhighest to lowest scattering. All the rest of the processes with lower scattering we put inthe fifth group – Z. The mean scattering in each of these groups of business processes isfollowed (AppendixC.47). The graph shows a sudden and sharp increase at g = 4250 in theaverage scattering for groups A,B,C and D, the ones with highest scattering.
Regression Analysis
Because we are interested in the Gain as a function of the Effort, we will analyze the in-verse of the Effort output. We investigate possible approximations of the relation betweenthe granularity and the Effort. We do this for two data sets: ALL containing all the datapoints from the Effort function, LIMIT containing all the data points up to the kink phe-nomenon described earlier starting at g=3250. The regression approximations we considerare evaluate based on their RMSE (Root Mean Squared Error). We consider three modelsfor the approximation:• Polynomials: a1 ∗ xn + ..+an ∗ x+ c for n = 1..8• Sum of sin functions: a1 ∗ sin(b1 ∗ x+ c1)+ ..+an ∗ sin(bn ∗ x+ cn) for n = 1..8• Power function: a∗ xb + c
The data about the approximations from table 4.1 shows three things:• the best approximation models on average for LIMIT and ALL are both based on
periodic functions• for ALL, the polynomial approximation delivers reasonably good results ( f =−4.065e−10∗
x2 +0.002961∗ x+454.3), but an increasing degree does not improve the fit signifi-cantly• for LIMIT, the power approximation is a also very good ( f = 0.2066∗ x0.6855 +210)
56
Interpretation of results
ALL LIMITn Polynomial Sum of sin Power Polynomial Sum of sin Power1 239.9 124.8 135.2 112 68.2 37.32 122.6 127.0 - 59.0 34.2 -3 122.0 92.6 - 42.5 33.1 -4 116.9 75.4 - 32.5 26.6 -5 95.0 58.0 - 32.0 25.2 -6 94.3 83.1 - 31.4 30.2 -7 93.0 66.7 - 29.5 28.1 -8 91.7 35.0 - 28.9 24.4 -Mean 121.9 82.8 135.2 46.0 33.8 37.3
Table 4.1: RMSE of Regression models
4.5 Interpretation of results
In this section we interpret the effort estimation results from the experiment using the dataanalysis from section 4.4.
4.5.1 Effort development and relationship to Gain
To definitively disprove the null hypothesis, we assume that Effort and Gain are not relatedto each other. This means that the independent variable of the experiment - granularity andthe dependent variable - scattering are assumed to have a normal distribution. The ANOVAanalysis of all scattering distributions Dg for all the treatments of g shows a p-value of 0.From this we conclude that the scattering is related to the granularity variable. This wasalready suggested by the increasing mean value of the scattering (Appendix C.43).
By construction in this experiment, the Effort is related to the scattering (E f f ort ∼scattering, equation 4.2) and the Gain is related to the granularity (Gain ∼ f lexibility ∼ g,section 4.2.3). When we use these two relationships together with the fact that the scatteringis related to the granularity, we can conclude that the Effort is related to the Gain followingthe Effort Estimation Framework (E f f ort ∼ scattering ∼ g ∼ Gain). This rejects the nullhypothesis.
4.5.2 Trade-off analysis – Gain/Effort
The main application of the proven relationship is to support trade-off analysis in the plan-ning phase of modernization. This becomes clear from the examination of the relationshipbetween Effort and Gain, using the regression analysis made earlier. There, we analyzedthe Gain as a function of the Effort, which means working on the inverted data points of theexperiment results.
For the LIMIT data subset the trend is modeled well enough by the power function. Onthe other hand, for the ALL set of data, the overall trend is captured by the polynomial ofsecond degree. It captures the trend in the data well and a further increase of the polynomial
57
4. LEGACY LOGISTICS SYSTEM – EFFORT ESTIMATION EXPERIMENT
degree does not significantly improve the fit. The difference between the two is that theapproximation of LIMIT has no maximum and the one of ALL does. The limit of thederivative of the power function approaches 0, This means that the approximation neverapproaches a maximum. On the other hand the quadratic function does have a maximum,because its derivative is linear decreasing function. Both approximation can be seen in thegraph in Appendix C.48.
The phenomenon at g= 4250 explains why although LIMIT has a power approximationit turns into a quadratic one when all points are considered. At that granularity, the complexprocesses with high scattering start disintegrating very fast with the increase in granularity.These processes have a high weight in the overall effort estimation and thus increase itsharply. This stops the growth that LIMIT displays and turns the overall relationship intoa quadratic function. The power function for the whole domain would mean that an everincreasing effort would deliver unlimited increase in gain. The quadratic overall relationshiphowever indicates that there is a maximum to the growth.
So the overall trend is quadratic, but for the more exact trade-off analysis the periodicapproximation is more useful. It models the local fluctuations in effort, which appear tobe periodic. This approximation by an eighth degree sum of sin functions is shown inAppendix C.48.
The trade-off analysis helps determine the best granularity level for the componentiza-tion of the legacy system. For this purpose, the optimal points have to be identified. Theseoptimal points lie on the Pareto frontier (Appendix C.49) and are defined by the optimalcombination of Effort and Gain. In this case, this means minimizing the effort and max-imizing the gain. So the utility function to minimize is ∑(1−Gain) +E f f ort(g). Theresulting graph is shown in figure C.50. It contains one global optimum (GO) and severallocal optima ordered in utility (LO1, LO2, etc). The global optimum is at the same granu-larity as the kink described earlier caused by the scattering of the more complex processesexploding. This means that beyond that point splitting the complex business processes upto increase the granularity is not worth the effort. So unless it is a requirement to reach acertain higher level of granularity, it is recommended to stay at the global optimum point.
The growth in gain per unit effort can also be used in the decision to increase or decreasethe chosen level of granularity. This growth is displayed by the derivative of the regressionapproximation (Appendix C.51).
4.6 Discussion and conclusion
Some unexpected results were observed in this experiment. This discussion presents theirpossible interpretation and consequences. Still, the results of the experiment also confirmedthe hypothesis about the effort/gain relationship and support a number of conclusions. Thesewill be presented before concluding with suggestions about future work and extension ofthe experiment setup.
In the first place, the unusual outliers were observed in the box plots of the scatteringdistributions Dg (Appendix C.41). They form groups with similar scattering much higherthan the average. These groups also have different growth rate of increasing scattering.
58
Discussion and conclusion
This seams to indicate that there is an additional factor that is related to the scattering -the complexity of the business process. This complexity is defined as the number of steps(executing Cobol PROGRAMs). More complex processes react differently to the increase ingranularity than more simple ones. This indicates than an additional experiment parameter(independent variable) is in its place - the percentage of complex processes in the legacysystem. It might be possible to use it as an effort indicator.
The second unexpected observation is in the discovered dependency between the Gainand the Effort. It does resemble to a certain degree the overall expected dependency fromfigure 3.1. However, the growth of the effort is not exponential as expected, but polynomial.One explanation for this could be that the effort growth is a system dependent characteristicrelated to the percentage of complex processes in it. As the experiment examined onlyone subject system there is not enough evidence to give a definitive answer. An extendedresearch comparing different systems could clarify this issue.
A second reason for the limited polynomial growth of the Effort could be the constrainedsetup of this experiment. It considers only one factor for the Gain and only one effort indi-cator. The absence of certain factors in the experiment that are are inversely related to thenumber of system components and are more dominant for higher granularities such as per-formance (or maintenance of the resulting components) might be the reason for the limitedgrowth of the effort. Generalizing this idea, effort indicators could be classified accordingto the degree they stimulate (negative factors) or limit (positive factors) the growth in ef-fort. This would make it possible to estimate how balanced and thus how representative theestimation is. Too many indicators from one category could make the results skewed.
Finally, the third interesting observation is related to the effort regression analysis. Therelation of the effort and thus also the scattering to the granularity seams to be periodic. Thebest approximation on average of the effort is the sum of sin functions. It has an error lowerthan polynomial and power approximations even for low degrees. An explanation for thismight be the ripple effect in the scattering development that was shown in appendix C.45.Business processes gradually ascent into higher degrees of scattering with the increasinggranularity following a periodic motion.
We can conclude that with this experiment the relationship between scattering and gran-ularity was established. Through this, also the Gain was related to the Effort in the contextof the Effort Estimation Framework. It was also shown that these scattering and effort esti-mations can be used for trade-off analysis. It was shown that there is a point where splittingfurther the processes is not lucrative any more looking at the effort/gain ratio. However,an extension of the experiment is necessary with more independent variables in order toclarify two of the observations mentioned in the discussion above. In the first place, whatis the significance of the suspected new parameter for the percentage of complex businessprocess. Secondly, are there any other parameters that would make the effort estimationdisplay exponential growth.
59
4. LEGACY LOGISTICS SYSTEM – EFFORT ESTIMATION EXPERIMENT
4.6.1 Future work – Increase accuracy with Service Model
This subsection presents the suggested approach for increasing the accuracy of the effortestimation in the experiment. It has not been tested in practice due to time constraints.The increased accuracy is based on the fact that also the service layer is being taken intoconsideration for the modernization. This more closely resembles the desired SOA end-organization. The service layer is the intermediate abstraction layer between the invokingbusiness processes and implementing components in the Development View (figure C.6.
For producing a model of this layer different identification techniques can be used. For-mal Concept Analysis(FCA) [28, 62, 96] for the abstracting the Business Object Model, as-pect mining for the identification of infrastructural services [102] and program slicing [20,73] for the extraction of the three basic service type defined by the estimation framework– Information Services, Business Services, General Services. For the extraction of the ser-vices the following encapsulation model can be used, suggested by Sneed [89] shown infigure C.55. Specifically for Information services, the approach should be to componentizebased on coupling to data types (cluster functionality around data types) and encapsulationas DAS as shown in figure C.4
Incorporation of this approach in the structure of the performed experiment is similarto the existing model extraction steps. The goal here is also to keep the traceability tothe code level. This is possible because the above-mentioned techniques are bottom-up,generating abstractions from the code. This, opposed to the top-down approaches using adomain model put together by domain experts. This relation of the service model to thecode level entities can then be used to establish their corresponding legacy components andbusiness processes. This process is shown in figures C.52 and C.53. Figure C.54 depicts thereorganization that would have to take place in order to reach a 1-to-1 mapping between theimplementing components and the business process activities.
60
Chapter 5
Research evaluation
This research was targeted at the problem stated in chapter 1, in the field of system modern-ization towards SOA. That problem is the need for effort estimation in support of decisionmaking in such modernization projects. Now that this thesis has presented the Effort Es-timation Framework in chapter 3 together with an empirical experiment in chapter 4, wecan evaluate how they contributes to achieving the goal expressed in the problem statement.First, the contribution of this research is evaluated based on the applications of the deliveredEffort Estimation Framework (§5.1). Despite its broad applicability, the framework still hasits limitations, which are evaluated in section 5.2. In conclusion, both the framework’scontribution and its limitations are taken into consideration in an evaluation of the overallquality (§5.3).
5.1 Contribution and applications of research results
The main contribution of this thesis is the structure it gives to the domain of effort estimationin modernizing towards SOA. The framework that is presented, creates the organizationnecessary to tackle to problem of effort estimation in a systematic way that did not existbefore.
In the first place, the thesis identifies the multiple dimensions relevant to forming acomplete view of the problem domain from existing research. These dimensions encom-pass approaches to system and modernization process modeling. Building on this base, aninnovative contribution is made consisting of two parts. The first innovative addition is theselection and combination of different parts of the existing approaches. A decision is madeon how to fit the complementing parts together. The second innovative addition of this re-search is the method for effort estimation. It delivers a rating based on the systematizationof the issues relevant to effort estimation and the causes of modernization effort. The centralconcept in this Rating Model is the Point of Modernization. This abstraction decouples theeffort estimation from the underlying system models created by existing approaches.
The presented Effort Estimation Framework has two main applications. The first is intrade-off analysis as part of the planning phase of modernization projects. The second is its
61
5. RESEARCH EVALUATION
use as a frame of reference for future work on impact analysis of the modernization towardsSOA.
The framework supports trade-off analysis between effort and gain, because of its con-struction. It accepts an input in the form of parameters describing the desired gain from themodernization towards SOA such as flexibility and lower coupling. On the other hand theframework produces the effort estimation as output. The framework relates effort estima-tion to gain and enables in this way the search for an optimal modernization plan with a setof requirements for both entities as a starting point.
The framework also offers a frame of reference for future work on the impact of mod-ernization towards SOA. It contains an organization of the basics of the problem domain ina form allowing extension. This reference can be used in defining a clear scope of futureresearch through its overview of the related areas. In this way, concrete areas can be studiedin isolation and afterwards the findings can be related to the whole. This can also be used inthe definition of empirical experiments and case studies. The framework eases the design(choice of independent and dependent variables) and the comparison of results.
Finally, such a frame of reference is also useful for work in the industry. It deliversguidance for modernization discussions by relating the major issues of modernizing to eachother. It can function as a checklist in assessing the impact of decisions. The framework is ata high enough level to be instantiated for many different legacy systems. Most importantly,the rational behind the construction of the framework can be traced back to the scientificsources and its applicability can be verified for a particular case at hand.
The thesis also discusses the possible influence of issues/factors that fall outside ofthe framework scope such as business gain, modernization impact on the organization andbusiness and modernization process phases and cost drivers.
5.2 Limitations of the framework
The Effort Estimation Framework is based on a set of assumptions and configurations fortwo reasons, both of which are aimed at improving the framework. The first is to narrowdown the domain of considered systems and modernization strategies so that the effort es-timation is more accurate. But at the same time also not making it too specific in order tokeep the framework reusable(the framework dilemma, section 2.1). The second reason is toenable the choice, through configuration, for the most suitable effort estimation techniques,by considering the target Service Oriented Architecture and the strategies to modernize to-wards such an environment.
Next to leading to the above-mentioned improvements, these choices of assumptionsand configurations also lead to limitations. We consider a limitation to be an ability orfeature that the framework does not have, but that is desirable in order to deliver the mostcomplete effort estimation results. Furthermore, we consider the limitations to manifestthemselves in one of three possible areas:• the application area of the framework and the supported input types• the achieved estimation accuracy of the framework• the relevancy of the output
62
Limitations of the framework
Following is the discussion of these limitations. We do this per root cause, starting with thescope and ending with the framework assumptions and configuration.
Scope limitations
Risk – the framework considers the trade-off analysis between effort and gain. This is theresult of the impact assumption in section 3.1. There is also another important moderniza-tion factor in the literature – risk [87, 94]. It is not included as a factor in the evaluations inthe Rating Model and affects the accuracy of the framework’s optimal point estimation.Business domain – the effort estimation is done in the technical domain. This makes it moreeasily quantifiable, but it also distances it from the business domain which is generally morerelevant for the enterprise strategy management. This separation between the Business gainand Technical gain was also made explicit in section 3.1 and limits the both input and outputof the framework to technical entities. This is something that the framework certainly needsto be extended in and that is why this issue is also treated in the chapter on future work 6.Technical implementation – The framework is specified in isolation from the out-of-scopedomains. The fact that such a relationship exists has been noted in section 3.1. However,because of the scope of this research, the concrete interaction between the domains hasnot been defined, such as for example impact propagation between them. Specifically, theseparation from the Technical domain creates limitations. The framework does not specifythe exact impact propagation to the implementation level an this limits the relevancy of theoutput it supplies. It still is on a relatively high-level of abstraction and not expressed inconcrete realization terms. This limitation, however, is justifiable for two reasons. The firstis that this link falls under the category of formal transformations [59] and is consideredtrivial. The second is that it is implementation technology dependent and this step should,thus, be left for the instantiation and calibration of the framework.
Framework assumptions and configuration limitations
Procedural legacy systems – another limitation on the input is due the assumption L-2 insection 3.3. The effort estimation in the Identification group of the Rating Model couldbe more accurate through a more effective legacy system modeling if extended with otherparadigms (e.g. Object-Oriented). However, this is not a serious limitation as a largestpart of the legacy systems considered for modernization do make use of the proceduralparadigm.
Business service identification approach – the approach to Business service identificationconfigured in the Control Model in section 3.3.3, limits the accuracy of the effort estimation.Because the models are extracted bottom-up from the existing code they tend to preserveits flaws such as inefficient business process flows or low level of reuse. The bottom-upapproach taken in the framework introduces, thus, an error in the estimation as it preservesthe system organization mainly in the Business domain. It would eventually be needed toremove these flaws at the cost of extra effort.
63
5. RESEARCH EVALUATION
Furthermore, the consequence of the current configuration as presented in section 3.3 arethe following limitations:• Input: the approach taken, using SOA properties as input, gives a rather limited range
of options and is possibly on a too high-level of abstraction. This could be overcomeby extending the target system specification possibilities. For this purpose, we suggestusing the approaches for the evaluation of a Service-Oriented system, which wereidentified in the literature study such as [46, 95, 8].• Output: the precision of the output is currently limited to the ordinal scale. This
is the consequence of the absence of quantitative information about the relationshipbetween the effort for different modernization strategies. This issue was described inthe “Strategy effort indicators and rating aggregation” subsection of section 3.3.1.• Accuracy: the accuracy of the effort estimation is limited due to the following:
– the level of abstraction used in the architectural models is rather high. It goesdown only to components and connectors and not to function level (section 3.3.1,configuration C-2)
– the choice not to consider the propagation of impact between domains yet. Thisconfiguration issue is described in the “Points of Modernization as basis foreffort estimation” subsection of section 3.3.1.
– the Rating Model does not take into account such an impact propagation andassumes a single effort aggregation pass in the order identification, componen-tization, integration, SOA capabilities.
– no quantitative use of pattern information. As was argued in section 3.3.1 con-figuration C-4, the use of design patterns in this estimation framework remainslimited. The extend of this limitation is described at the end of section 3.4.3.
– the absence of concrete quantitative data about the relation of the strategies andindicators to effort. The field of effort estimation for modernization towardsSOA is rather unexplored so currently the presented indicators and strategiesare only qualitatively related to effort. Each one of them has to be researchedto make this relation concrete and also how combining them together influencesthe estimation.
5.3 Resulting quality of the framework
From the description of the limitations above, we can conclude that the framework is coarsegrained in its approach and the resulting estimations. This is due to the absence of empiricalresults to support a more accurate relationship specification between indicators and effort.It does not, however, lack any essential aspects in rating the modernization effort. It offersa structure that can be further refined by systematically investigating its subparts.
The quantification method’s accuracy depends on the correctness of the relationshipbetween effort indicators, modernization strategies and identification of PoM. And also onthe weights of these relationships. These should generally be calibrated through industryexperience. But even if the method is calibrated the analysis process is generally as good asthe information it has to work with. This makes the concrete approaches used in the Reverse
64
Resulting quality of the framework
Engineering phase the weakest link. And because we configure the framework to rely onautomated evaluation, this introduces the limitations of the absence of expert knowledge.
65
Chapter 6
Summary, Conclusion andFuture Work
6.1 Summary
This thesis is aimed at the problem of impact analysis of the modernization of legacy sys-tems towards Service Oriented Architecture. Chapter 1 specified the need for a methodfor decision making specifically relying on the estimation of the effort expected for mod-ernization. The goal of the research is to establish a method for the estimation of thismodernization effort as part of overall impact analysis.
The solution proposed in this thesis is the Effort Estimation Framework. It is based onexisting work from several areas of research. Chapter 2 describes this background in the ar-eas of software architecture (§2.1), service-orientation (§2.2) and modernization approaches(§2.3, §2.4). These existing approaches are used to capture the information on which theeffort estimation is based. There are four such important sources of information. In the firstplace, these are the models of the legacy architecture and Service Oriented Architecture. Inaddition to this, the process of modernization, the strategies used and their impact is alsomodeled. In third place, the overall context of the modernization is described such as stake-holders, domains and costs-drivers to put the effort estimation into perspective. Finally, thespecific issues in the modernization of software system towards SOA are identified.
The framework itself is presented in chapter 3. It is built in several stages - assumptions(§3.1, §3.2), configuration (§3.3) and specification (§3.4). This allows the adaptation of theframework to specific domain or as a result of further research. Its contribution is the orga-nization of the existing approaches so that they can be used for effort estimation specificallyfor the modernization towards Service Oriented Architectures (§3.4.1). For the purpose ofestablishing such an effort estimation, the framework specifies on top of the approaches aRating Model (§3.4.3) and a process for gathering the required information (§3.4.2). TheRating Model relates measurements of architectural characteristics to effort and aggregatesthese estimations. Essential for this is the concept of Point of Modernization (PoM). TheRating Model offers an effort estimation output on an ordinal scale. It is based on a dis-tinction between impact types, classification of modernization concerns, ordering of themodernization strategies according to effort and a classification of measurable indicators.
67
6. SUMMARY, CONCLUSION AND FUTURE WORK
The use of the framework is presented in chapter 5. The chapter describes an empiricalexperiment that demonstrates the instantiation of part of the framework (§4.2.5, §4.3). Theexperiment evaluates the effectiveness of the framework based on the measurement of theeffort indicator of Scattering. The outcome of the experiment shows the relationship thatexists between the framework input of Gain and its output of Effort (§4.4.2). This demon-strates that the Effort Estimation Framework is effective in producing an effort estimationgiven input of required gain. This relationship is then used to demonstrate the applicationof the effort estimation for trade-off analysis (§4.5).
Finally, the Effort Estimation Framework and the research results are evaluated in chap-ter 5. The evaluation treats three aspects. The first is concerned with the application of theproposed framework. The suitability of the framework is discussed for impact and trade-off analysis (§5.1). Secondly, the limitations of the framework are discussed anticipatedas a result of the assumptions and choices made (§5.2). Finally, the overall quality of theapproach taken in the framework is confirmed (§5.3).
6.2 Conclusion
The main conclusion from this thesis is that automated estimation of modernization efforttowards SOA is feasible. The theoretical context (chapter 2) shows that there are existingmethods describing the main aspects of the problem of modernization effort estimation.What is missing is a method to bring them together. This thesis shows that they can becombined to form a structure for tackling the problem (chapter 3).
The suggested Effort Estimation Framework shows how the different aspects/methodsfrom software architecture and modernization can be integrated. They complement eachother in different areas which makes it possible to build an all-round view of the causes ofeffort in legacy modernization. The resulting framework divides the problem of effort esti-mation in separate fragments (phases, domains, responsibilities, modernization strategies).This lowers the complexity of the problem as estimations can be made for the different partsseparately and then aggregated. The Rating Model makes this possible through the Pointsof Modernization and the categorizations it uses.
Furthermore, the empirical part of the research using the Effort Estimation Framework(chapter 4) leads to establishing the following facts:• the Gain and Effort in the constructed framework are related to each other through on
the scattering and granularity variables• this relationship can be used for trade-off analysis and to find an optimal solution with
given required levels of Effort and Gain• the discovery of an unexpected parameter which could be useful - the percentage of
complex business processes• the parameters used are not enough to model all the expected characteristics of the
Effort estimation developmentWe conclude with the remark that this research is only an initial attempt and requires
much more work, both theoretical and empirical. A vision on the direction of this futurework is described in the following section.
68
Future work
6.3 Future work
This section presents the big picture around the findings of this research, emphasizing onthe future possibilities. We describe the path towards an accurate effort estimation of mod-ernization projects (Appendix C.56), the eventual goal in legacy system modernization, andposition our current work on it.
The path towards a good estimation of modernization effort is divided in stages in termsof cost. Each stage introduces a change in scale and units of measurement, increasing theaccuracy of the Rating Model. This gradation is shown on the left side in figure C.56. Inthe middle is shown the central for this research process of modernizing. Based on themodernization strategies that can be applied to perform this process, we presented a RatingModel. This model is concentrated in the stage in the estimation path - the Difference Iden-tification. The highest accuracy level achievable here based on the available information isan estimation on the ordinal scale. The availability of the source information is determinedin the preceding stage of Model Extraction and SOA specification for the legacy and targetsystems respectively. Their goals are determined during the Portfolio Analysis stage. Fol-lowing the Difference Identification stage is the Trade-off Analysis weighting the possiblemodernization strategies against each other. Finally comes the stage that translates the effortinto cost through a Cost Model. Each of these stages and their possibilities are clarified inthe following subsections.
Portfolio Analysis
The existing approach to portfolio analysis is rather limited (section 2.3). In addition tothis, the results of this essential process are not yet related to the framework. One of theassumptions of the research is that this analysis has been successfully performed and itsresults can readily used as input for the effort estimation. The future challenge is to extendthe framework by shifting its input side further in the direction of the Business Goals. Thiswill make the link possible between the Business Gain and the SOA gain (section 3.1).
A second area for improvement is the models used as input for the framework. Theyare currently fixed, but they depend on the business goals. For example, a custom selec-tion of models could be made in case of a target of greater Return on investment throughincreased reuse. The same applies for the specification of the target system configuration.The decision for or against SOA as enabling technology and the selection of desired SOAcapabilities depends on the set Business Goals. The aim is to link the business need to thetechnical need. So linking the framework to the output of the Portfolio Analysis process willlead to improved: a) legacy model selection, b) SOA capabilities specification. Altogether,this will make it possible to make a more precise estimation guided directly by the businessgoals of the company.
Decision Model
The currently used Rating Model does not support any constraints. These do exist in practiceand may mean that a sub-optimal solution must be chosen. This means that the estimation
69
6. SUMMARY, CONCLUSION AND FUTURE WORK
given actually forms and lower limit estimation of the modernization effort. Taking intoaccount constraints such as resources, technology, know-how to make a selection from theavailable modernization strategies, could lead to an increase in the estimation. These con-straints can also be seen as requirements(soft and hard) on the to-be system as for examplefixed operating system, programming language, SOA infrastructure supplier, performance.Together these form the Decision Model. The final output of the analysis based on thismodel is a modernization plan containing a chosen strategy covering all Points of Mod-ernization. The criteria for making these choices can differ depending on the goals of thecompany. For this reason we theorize that an additional Relation Model is necessary. It isan interchangeable view on the system on how to prioritize/valuate the constraints incor-porated in the Decision Model. Possible approaches are to prioritize the application of theredesign strategy to components with a high business value or to consider only black-boxstrategies for components with a high reusability degree. The Decision Model is used toquantify the differences between the solutions the different strategies suggest. This makesit possible to switch over to an interval scale. It also has to include a more elaborate gainvaluation of the impact of modernization as explained in section 3.1.
Cost Model
The effort output of the framework is intended as input to a Cost Model. This model willrelate the effort to a concrete implementation and its costs. This will make it possible toswitch to a ratio scale and use money as output. Here we come back to the types of costsinvolved in a modernization project mentioned in section 2.3. The cost items mentionedthere each belong to one of the three categories:
1. Reengineering Investment Costs (RIC)2. Operations and Support Costs for Reengineered Components (OSRC)3. Operations and Support Costs for Legacy Components (OSLC)
Each of these cost items has its origins in one of the four phases of system evolution: A).PlanEvolution, B).Implement, C).Deliver and D).Deploy.
The framework gives an effort estimation that serves as input for the combination A1 -the RIC costs in the Planning stage. The challenge for the future is to extend this frameworkwith the estimation of the effort needed for the calculation of the other possible combina-tions. In this phase it is also of importance to know how the estimates can be improved. Asthe output of this phase can be verified with real projects, there is a possibility for calibrationthrough iteration.
Extension current approach
Furthermore, the accuracy of the current approach could be improved by extending it with:• extend service model identification approximation with business-goal driven service
identification (top-down approach)• add dynamic analysis(Wu [100]): run the system and keep track of the invocation of
each business rule to obtain an indication of its business value. higher usage ->higherbusiness value
70
Bibliography
[1] W. Al Belushi and Y. Baghdadi. An approach to wrap legacy applications into webservices. June 2007.
[2] Robert Allen and David Garlan. A formal basis for architectural connection. ACMTrans. Softw. Eng. Methodol., 6(3):213–249, 1997.
[3] Francesca Arcelli, Christian Tosi, and Marco Zanoni. Can design pattern detectionbe useful for legacy systemmigration towards soa? In SDSOA ’08: Proceedings ofthe 2nd international workshop on Systems development in SOA environments, pages63–68, New York, NY, USA, 2008. ACM.
[4] A. Arsanjani, S. Ghosh, A. Allam, T. Abdollah, S. Gariapathy, and K. Holley. Soma:a method for developing service-oriented solutions. IBM Syst. J., 47(3):377–396,2008.
[5] Ali Arsanjani and Kerrie Holley. The service integration maturity model: Achiev-ing flexibility in the transformation to soa. In SCC ’06: Proceedings of the IEEEInternational Conference on Services Computing, page 515, Washington, DC, USA,2006. IEEE Computer Society.
[6] Joachim Bayer, Jean-Francois Girard, Martin Wurthner, Jean-Marc DeBaud, andMartin Apel. Transitioning legacy assets to a product line architecture. SIGSOFTSoftw. Eng. Notes, 24(6):446–463, 1999.
[7] Jesal Bhuta and Barry Boehm. A framework for identification and resolution of in-teroperability mismatches in cots-based systems. In IWICSS ’07: Proceedings of theSecond International Workshop on Incorporating COTS Software into Software Sys-tems: Tools and Techniques, page 2, Washington, DC, USA, 2007. IEEE ComputerSociety.
[8] Phil Bianco, Rick Kotermanski, Summa Technologies, and Paulo Merson. Evaluatinga service-oriented architecture. Technical Report CMU/SEI-2007-TR-015, CarnegieMellon University, Pittsburgh, PA, USA, 2007.
71
BIBLIOGRAPHY
[9] Norbert Bieberstein, Robert G. Laird, Dr. Keith Jones, and Tilak Mitra. ExecutingSOA: A Practical Guide for the Service-Oriented Architect. IBM Press, May 2008.
[10] Kevin Bierhoff, Mark Grechanik, and Edy S. Liongosari. Architectural mismatchin service-oriented architectures. In SDSOA ’07: Proceedings of the InternationalWorkshop on Systems Development in SOA Environments, page 4, Washington, DC,USA, 2007. IEEE Computer Society.
[11] Steven J. Bleistein, Karl Cox, and June Verner. Strategic alignment in requirementsanalysis for organizational it: an integrated approach. In SAC ’05: Proceedings ofthe 2005 ACM symposium on Applied computing, pages 1300–1307, New York, NY,USA, 2005. ACM.
[12] T. Bodhuin, E. Guardabascio, and M. Tortorella. Migrating cobol systems to the webby using the mvc design pattern. In WCRE ’02: Proceedings of the Ninth WorkingConference on Reverse Engineering (WCRE’02), page 329, Washington, DC, USA,2002. IEEE Computer Society.
[13] B. Boehm, B. Clark, E. Horowitz, C. Westland, R. Madachy, and R. Selby. Costmodels for future software life cycle processes: Cocomo 2.0. Annals of SoftwareEngineering, 1:57–94, December 1995.
[14] Shawn A. Bohner. Extending software change impact analysis into cots components.In SEW ’02: Proceedings of the 27th Annual NASA Goddard Software EngineeringWorkshop (SEW-27’02), page 175, Washington, DC, USA, 2002. IEEE ComputerSociety.
[15] Diego Bovenzi, Gerardo Canfora, and Anna Rita Fasolino. Enabling legacy systemaccessibility by web heterogeneous clients. In CSMR ’03: Proceedings of the Sev-enth European Conference on Software Maintenance and Reengineering, page 73,Washington, DC, USA, 2003. IEEE Computer Society.
[16] Paul Brown. Succeeding with SOA: Realizing Business Value Through Total Archi-tecture. Addison-Wesley Professional, 2007.
[17] Paul Brown. Implementing SOA: Total Architecture in Practice. Addison-WesleyProfessional, 2008.
[18] Frank Buschmann, Regine Meunier, Hans Rohnert, Peter Sommerlad, and MichaelStal. Pattern-oriented software architecture: a system of patterns. John Wiley &Sons, Inc., New York, NY, USA, 1996. TU-Bib: IN69D22Busc.
[19] G. Canfora, A.R. Fasolino, G. Frattolillo, and P. Tramontana. Migrating interactivelegacy systems to web services. Software Maintenance and Reengineering, 2006.CSMR 2006. Proceedings of the 10th European Conference on, pages 10 pp.–36,March 2006.
72
[20] Gerardo Canfora, Aniello Cimitile, Andrea De Lucia, and Giuseppe A. Di Lucca.Decomposing legacy programs: a first step towards migrating to client-server plat-forms. J. Syst. Softw., 54(2):99–110, 2000.
[21] Gerardo Canfora, Anna Rita Fasolino, Gianni Frattolillo, and Porfirio Tramontana. Awrapping approach for migrating legacy system interactive functionalities to serviceoriented architectures. J. Syst. Softw., 81(4):463–480, 2008.
[22] Satish Chandra, Jackie de Vries, John Field, Howard Hess, Manivannan Kalidasan,Komondoor V. Raghavan, Frans Nieuwerth, Ganesan Ramalingam, and Justin Xue.Using logical data models for understanding and transforming legacy business appli-cations. IBM Syst. J., 45(3):647–655, 2006.
[23] Chia-Chu Chiang. Automated software wrapping. In ACM-SE 45: Proceedings ofthe 45th annual southeast regional conference, pages 59–64, New York, NY, USA,2007. ACM.
[24] E. J. Chikofsky and J. H. Cross. Reverse engineering and design recovery: A taxon-omy. IEEE Software, 7(1):13–17, 1990.
[25] Katja Cremer, Andre Marburger, and Bernhard Westfechtel. Graph-based tools forre-engineering. Journal of Software Maintenance, 14(4):257–292, 2002.
[26] Fred A. Cummins. Building the Agile Enterprise: With SOA, BPM and MBM. Mor-gan Kaufmann Publishers Inc., San Francisco, CA, USA, September 2008. TU-Bib:IN69D33Cumm.
[27] Andrea De Lucia, Rita Francese, Giuseppe Scanniello, and Genoveffa Tortora. De-veloping legacy system migration methods and tools for technology transfer. Softw.Pract. Exper., 38(13):1333–1364, 2008.
[28] Concettina Del Grosso, Massimiliano Di Penta, and Ignacio Garcia-Rodriguezde Guzman. An approach for mining services in database oriented applications. InCSMR ’07: Proceedings of the 11th European Conference on Software Maintenanceand Reengineering, pages 287–296, Washington, DC, USA, 2007. IEEE ComputerSociety.
[29] Merriam-webster’s online dictionary, 11th edition. http://www.merriam-webster.com/dictionary/dendrogram.
[30] Mark Endrei, Jenny Ang, Ali Arsanjani, Sook Chua, Philippe Comte, Pl Krogdahl,Min Luo, and Tony Newling. Patterns: Service-oriented Architecture and Web Ser-vices. IBM Redbook, April 2004.
[31] Thomas Erl. SOA: Principles of Service Design. Prentice Hall, July 2007.
[32] Thomas Erl. SOA Design Patterns. Prentice Hall, December 2008.
73
BIBLIOGRAPHY
[33] M. Ernest and J. M. Nisavic. Adding value to the it organization with the componentbusiness model. IBM Syst. J., 46(3):387–403, 2007.
[34] Meng Fan-Chao, Zhan Den-Chen, and Xu Xiao-Fei. Business component identifica-tion of enterprise information system: A hierarchical clustering method. In ICEBE’05: Proceedings of the IEEE International Conference on e-Business Engineering,pages 473–480, Washington, DC, USA, 2005. IEEE Computer Society.
[35] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design patterns:elements of reusable object-oriented software. Addison-Wesley Longman PublishingCo., Inc., Boston, MA, USA, 1995.
[36] David Garlan. Software architecture: a roadmap. In ICSE ’00: Proceedings of theConference on The Future of Software Engineering, pages 91–101, New York, NY,USA, 2000. ACM.
[37] David Garlan, Robert Allen, and John Ockerbloom. Architectural mismatch: Whyreuse is so hard. IEEE Softw., 12(6):17–26, 1995.
[38] David Garlan and Mary Shaw. An introduction to software architecture. Technicalreport, Carnegie Mellon University, Pittsburgh, PA, USA, 1994.
[39] Rainer Gimnich. Component models in soa realization. ftp://ftp.software.ibm.com/software/emea/de/soa/sqm2007_gimnich_presentation.pdf, 2007.
[40] Eduardo Machado Goncalves, Marcilio Silva Oliveira, and Kleber Rogerio Bacili.Digitalassets discoverer: automatic identification of reusable software components.In OOPSLA ’07: Companion to the 22nd ACM SIGPLAN conference on Object-oriented programming systems and applications companion, pages 872–873, NewYork, NY, USA, 2007. ACM.
[41] I. Garcia-Rodriguez de Guzman, M. Polo, and M. Piattini. An adm approach toreengineer relational databases towards web services. In WCRE ’07: Proceedingsof the 14th Working Conference on Reverse Engineering, pages 90–99, Washington,DC, USA, 2007. IEEE Computer Society.
[42] Neil B. Harrison, Paris Avgeriou, and Uwe Zdun. Using patterns to capture architec-tural decisions. IEEE Softw., 24(4):38–45, 2007.
[43] Carsten Hentrich and Uwe Zdun. Patterns for business object model integration inprocess-driven and service-oriented architectures. In PLoP ’06: Proceedings of the2006 conference on Pattern languages of programs, pages 1–14, New York, NY,USA, 2006. ACM.
[44] Peter Herzum and Oliver Sims. Business Components Factory: A ComprehensiveOverview of Component-Based Development for the Enterprise. John Wiley & Sons,Inc., New York, NY, USA, 2000.
74
[45] H. M. Hess. Aligning technology and business: applying patterns for legacy trans-formation. IBM Syst. J., 44(1):25–45, 2005.
[46] Helge Hofmeister and Guido Wirtz. Supporting service-oriented design with met-rics. In EDOC ’08: Proceedings of the 2008 12th International IEEE EnterpriseDistributed Object Computing Conference, pages 191–200, Washington, DC, USA,2008. IEEE Computer Society.
[47] Z.-W. Hong and H. Jiau. A design pattern for reengineering windows software ap-plications into reusable corba objects. In FTDCS ’01: Proceedings of the 8th IEEEWorkshop on Future Trends of Distributed Computing Systems, page 215, Washing-ton, DC, USA, 2001. IEEE Computer Society.
[48] John Hutchinson, Gerald Kotonya, James Walkerdine, Peter Sawyer, Glen Dobson,and Victor Onditi. Migrating to soas by way of hybrid systems. IT Professional,10(1):34–42, 2008.
[49] IEEE. Recommended practice for architectural description of software intensive sys-tems, September 2000.
[50] Per Jnsson and Mikael Lindvall. Impact analysis. In Engineering and ManagingSoftware Requirements. Springer Berlin Heidelberg, 2005.
[51] Capers Jones. Assessment and control of software risks. Yourdon Press, Upper SaddleRiver, NJ, USA, 1994.
[52] Nicolai Josuttis. SOA in practice. O’Reilly, 2007.
[53] Stephen H. Kaisler. Software Paradigms. John Wiley & Sons, 2005.
[54] Mira Kajko-Mattsson, Grace A. Lewis, and Dennis B. Smith. A framework for rolesfor development, evolution and maintenance of soa-based systems. In SDSOA ’07:Proceedings of the International Workshop on Systems Development in SOA Envi-ronments, page 7, Washington, DC, USA, 2007. IEEE Computer Society.
[55] Mark Kasunic and William Anderson. Measuring systems interoperability: Chal-lenges and opportunities. Technical Report CMU/SEI-2004-TN-003, Carnegie Mel-lon University, Pittsburgh, PA, USA, 2004.
[56] Mansour Kavianpour. Soa and large scale and complex enterprise transformation.In ICSOC ’07: Proceedings of the 5th international conference on Service-OrientedComputing, pages 530–545, Berlin, Heidelberg, 2007. Springer-Verlag.
[57] Rick Kazman, Mario Barbacci, Mark Klein, S. Jeromy Carriere, and Steven G.Woods. Experience with performing architecture tradeoff analysis. In ICSE ’99: Pro-ceedings of the 21st international conference on Software engineering, pages 54–63,New York, NY, USA, 1999. ACM. ATAM.
75
BIBLIOGRAPHY
[58] Rick Kazman, Paul C. Clements, Len Bass, and Gregory D. Abowd. Classifyingarchitectural elements as a foundation for mechanism matching. In COMPSAC ’97:Proceedings of the 21st International Computer Software and Applications Confer-ence, pages 14–17, Washington, DC, USA, 1997. IEEE Computer Society.
[59] Vitaly Khusidman. Adm transformation. architecture-driven modernization andtransformation. http://www.omg.org/docs/admtf/08-06-10.pdf, 2008.
[60] Vitaly Khusidman and William Ulrich. Architecture-driven modernization: Trans-forming the enterprise. http://www.omg.org/docs/admtf/07-12-01.pdf, 2007.
[61] Phillippe Kruchten. Architecture blueprints—the “4+1” view model of software ar-chitecture. In TRI-Ada ’95: Tutorial proceedings on TRI-Ada ’91, pages 540–555,New York, NY, USA, 1995. ACM.
[62] Tobias Kuipers and Leon Moonen. Types and concept analysis for legacy systems.In IWPC ’00: Proceedings of the 8th International Workshop on Program Compre-hension, page 221, Washington, DC, USA, 2000. IEEE Computer Society.
[63] Grace Lewis, Edwin Morris, and Dennis Smith. Analyzing the reuse potential of mi-grating legacy components to a service-oriented architecture. In CSMR ’06: Proceed-ings of the Conference on Software Maintenance and Reengineering, pages 15–23,Washington, DC, USA, 2006. IEEE Computer Society.
[64] Grace A. Lewis, Edwin J. Morris, Dennis B. Smith, and Soumya Simanta. Smart:Analyzing the reuse potential of legacy components in a service-oriented architectureenvironment. Technical Report CMU/SEI-2008-TN-008, Carnegie Mellon Univer-sity, Pittsburgh, PA, USA, 2008.
[65] Lars Lundberg, Jan Bosch, Daniel Hggander, and Per olof Bengtsson. Quality at-tributes in software architecture design. In Proceedings of the IASTED 3rd Interna-tional Conference on Software Engineering and Applications, pages 353–362, 1999.
[66] C. Matthew MacKenzie, Ken Laskey, Francis G. McCabe, Peter F Brown, and Re-bekah Metz. Reference model for service oriented architecture 1.0. Technical report,OASIS, October 2006.
[67] Francis G. McCabe, Jeff A. Estefan, Ken Laskey, and Danny Thornton. Referencearchitecture for service oriented architecture version 1.0. Technical report, OASIS,April 2008.
[68] Alok Mehta and George T. Heineman. Evolving legacy system features into fine-grained components. In ICSE ’02: Proceedings of the 24th International Conferenceon Software Engineering, pages 417–427, New York, NY, USA, 2002. ACM.
[69] Nikunj R. Mehta, Nenad Medvidovic, and Sandeep Phadke. Towards a taxonomy ofsoftware connectors. In ICSE ’00: Proceedings of the 22nd international conferenceon Software engineering, pages 178–187, New York, NY, USA, 2000. ACM.
76
[70] Danny Miller and Peter H. Friesen. Archetypes of Strategy Formulation. MANAGE-MENT SCIENCE, 24(9):921–933, 1978.
[71] Abhay Nath Mishra. Business strategy and it-enabled business capabilities: Fits,misfits and firm performance. http://auapps.american.edu/˜alberto/IT/Mishra.ppt, 2008.
[72] Glenford J Myers. Reliable software through composite design. Petrocelli/Charter,1975.
[73] Ph. Newcomb and L. Markosian. Automating the modularization of large COBOLprograms: application of an enabling technology for reengineering. In Proceedingsof the 1st Working Conference on Reverse Engineering, pages 222–230, 1993. Expe-rience report using the Software Refinery to build a modularization tool for COBOL.
[74] Liam O’Brien, Len Bass, and Paulo Merson. Quality attributes and service-orientedarchitectures. Technical Report CMU/SEI-2005-TN-014, Carnegie Mellon Univer-sity, Pittsburgh, PA, USA, 2005.
[75] Liam O’Brien, Len Bass, and Paulo Merson. Quality attributes and service-orientedarchitectures. Technical Report CMU/SEI-2005-TN-014, Software Engineering In-stitute of Carnegie Mellon University, 2005.
[76] Liam O’Brien, Paul Brebner, and Jon Gray. Business transformation to soa: aspectsof the migration and performance and qos issues. In SDSOA ’08: Proceedings ofthe 2nd international workshop on Systems development in SOA environments, pages35–40, New York, NY, USA, 2008. ACM.
[77] Liam O’Brien, Dennis Smith, and Grace Lewis. Supporting migration to servicesusing software architecture reconstruction. In STEP ’05: Proceedings of the 13thIEEE International Workshop on Software Technology and Engineering Practice,pages 81–91, Washington, DC, USA, 2005. IEEE Computer Society.
[78] Ipek Ozkaya, Rick Kazman, and Mark Klein. Quality-attribute based economic val-uation of architectural patterns. In ESC ’07: Proceedings of the First InternationalWorkshop on The Economics of Software and Computation, page 5, Washington, DC,USA, 2007. IEEE Computer Society.
[79] Mike P. Papazoglou and Willem-Jan Heuvel. Service oriented architectures: ap-proaches, technologies and research issues. The VLDB Journal, 16(3):389–415, July2007.
[80] Daniel Plakosh, Scott Hissam, and Kurt Wallnau. Into the black box: A case study inobtaining visibility into commercial software. Technical Report CMU/SEI-99-TN-010, Carnegie Mellon University, Pittsburgh, PA, USA, 1999.
[81] Harvard Business School Press. The essentials of strategy. Harvard Business SchoolPress, 2006.
77
BIBLIOGRAPHY
[82] Robert C. Seacord, Daniel Plakosh, and Grace A. Lewis. Modernizing Legacy Sys-tems: Software Technologies, Engineering Process and Business Practices. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2003.
[83] Mary Shaw and Paul C. Clements. A field guide to boxology: Preliminary classifica-tion of architectural styles for software systems. In COMPSAC ’97: Proceedings ofthe 21st International Computer Software and Applications Conference, pages 6–13,Washington, DC, USA, 1996. IEEE Computer Society.
[84] Dennis B. Smith, Liam O’ Brien, and John Bergey. Using the options analysis forreengineering (oar) method for mining components for a product line. In SPLC2: Proceedings of the Second International Conference on Software Product Lines,pages 316–327, London, UK, 2002. Springer-Verlag.
[85] H. M. Sneed. Encapsulating legacy software for use in client/server systems. InWCRE ’96: Proceedings of the 3rd Working Conference on Reverse Engineering(WCRE ’96), page 104, Washington, DC, USA, 1996. IEEE Computer Society.
[86] Harry M. Sneed. Program interface reengineering for wrapping. In WCRE ’97:Proceedings of the Fourth Working Conference on Reverse Engineering (WCRE ’97),page 206, Washington, DC, USA, 1997. IEEE Computer Society.
[87] Harry M. Sneed. Risks involved in reengineering projects. In WCRE ’99: Proceed-ings of the Sixth Working Conference on Reverse Engineering, page 204, Washington,DC, USA, 1999. IEEE Computer Society.
[88] Harry M. Sneed. Wrapping legacy cobol programs behind an xml-interface. InWCRE ’01: Proceedings of the Eighth Working Conference on Reverse Engineering(WCRE’01), page 189, Washington, DC, USA, 2001. IEEE Computer Society.
[89] Harry M. Sneed. Integrating legacy software into a service oriented architecture. InCSMR ’06: Proceedings of the Conference on Software Maintenance and Reengi-neering, pages 3–14, Washington, DC, USA, 2006. IEEE Computer Society.
[90] Harry M. Sneed and Stephan H. Sneed. Creating web services from legacy hostprograms. Web Site Evolution, IEEE International Workshop on, 0:59, 2003.
[91] H.M. Sneed. Risks involved in reengineering projects. In Reverse Engineering, 1999.Proceedings. Sixth Working Conference on, pages 204–211, Oct 1999.
[92] Michael Stal. Using architectural patterns and blueprints for service-oriented archi-tecture. IEEE Softw., 23(2):54–61, 2006.
[93] S.R. Tilley and D.B. Smith. Perspectives on legacy system reengineering, 1995.
[94] Amjad Umar and Adalberto Zordan. Reengineering for service oriented architec-tures: A strategic decision model for integration versus migration. Journal of Systemsand Software, 82(3):448 – 462, 2009.
78
[95] Andre van der Hoek, Ebru Dincel, and Nenad Medvidovic. Using service utilizationmetrics to assess the structure of product line architectures. In METRICS ’03: Pro-ceedings of the 9th International Symposium on Software Metrics, page 298, Wash-ington, DC, USA, 2003. IEEE Computer Society.
[96] Arie van Deursen and Tobias Kuipers. Identifying objects using cluster and conceptanalysis. In ICSE ’99: Proceedings of the 21st international conference on Softwareengineering, pages 246–255, New York, NY, USA, 1999. ACM.
[97] R. Van Solingen and E. Berghout. The Goal/Question/Metric Method – A PracticalGuide for Quality Improvement of Software Development. McGraw-Hill PublishingCompany, 1999.
[98] Ian Warren. The Renaissance of Legacy Systems: Method Support for Software-System Evolution. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1999.
[99] Claes Wohlin, Per Runeson, Martin Host, Magnus C. Ohlsson, Bjoorn Regnell, andAnders Wesslen. Experimentation in software engineering: an introduction. KluwerAcademic Publishers, Norwell, MA, USA, 2000.
[100] Lei Wu, Houari Sahraoui, and Petko Valtchev. Program comprehension with dynamicrecovery of code collaboration patterns and roles. In CASCON ’04: Proceedings ofthe 2004 conference of the Centre for Advanced Studies on Collaborative research,pages 56–67. IBM Press, 2004.
[101] Hua Xiao, Jin Guo, and Ying Zou. Supporting change impact analysis for serviceoriented business applications. In SDSOA ’07: Proceedings of the InternationalWorkshop on Systems Development in SOA Environments, page 6, Washington, DC,USA, 2007. IEEE Computer Society.
[102] Charles Zhang and Hans-Arno. Jacobsen. Quantifying aspects in middleware plat-forms. In AOSD ’03: Proceedings of the 2nd international conference on Aspect-oriented software development, pages 130–139, New York, NY, USA, 2003. ACM.
[103] Zhuopeng Zhang and Hongji Yang. Incubating services in legacy systems for archi-tectural migration. In APSEC ’04: Proceedings of the 11th Asia-Pacific SoftwareEngineering Conference, pages 196–203, Washington, DC, USA, 2004. IEEE Com-puter Society.
[104] Ying Zou, Terence C. Lau, Kostas Kontogiannis, Tack Tong, and Ross McKeg-ney. Model-driven business process recovery. In WCRE ’04: Proceedings of the11th Working Conference on Reverse Engineering, pages 224–233, Washington, DC,USA, 2004. IEEE Computer Society.
79
Appendix A
Glossary
In this appendix we give an overview of frequently used terms and abbreviations.
Architecture: The fundamental organization of a system embodied in its components, theirrelationships to each other, and to the environment, and the principles guiding its de-sign and evolution [49]
ADM (Architecture Driven Modernization): an approach to modeling the moderniza-tion process of legacy systems. It is shown in figure C.13
Architectural description: A collection of products to document an architecture [49]
Architectural view: A representation of a whole system from the perspective of a relatedset of concerns [49]
Component: an autonomous, encapsulated computational entity which accomplishes pre-defined tasks through internal computation and external communication. This defini-tion (2.2) is explained in detail in section 2.1 on page 6.
Dendrogram: a branching diagram representing a hierarchy of categories based on de-gree of similarity or number of shared characteristics especially in biological taxon-omy1 [29]
Design pattern: a design pattern describes a commonly-recurring structure of communi-cating components that solves a general design problem within a particular context.The complete definition (2.6) and its context are explained on page 8
Design principle: an accepted industry practice with a specific design goal. The service-orientation design paradigm is comprised of a set of design principles that are applied
1http://en.wikipedia.org/wiki/Dendrogram
81
A. GLOSSARY
together to achieve the goals of service-oriented computing [32]
Development View: one of four architectural views according to Kruchten [61] that fo-cuses on the software module organization in subsystems, layers and libraries defininggrouping and visibility. This view is shown for the Service-Oriented set of systemsin figure C.6
Effort: the cost of the changes needed to modernize (see definition 3.2)
Framework: a generic architecture complemented by an extensible set of components (seedefinition 2.7)
Gain: the potential value changes deliver (see definition 3.1)
Impact: what needs to be modified in a system in order to make a change or the conse-quences on the system if the change is implemented(see definition 2.14)
Indicator: “Strategy effort indicators and rating aggregation” in section 3.3.1
Modernization strategy: an approach for handling the difference in a Point of Modern-ization and transforming its set of legacy entities into the desired set of target systementities.
PoM (Point of Modernization): an entity identifying a difference between an as-is and ato-be model, which is the base for the effort estimation analysis (see section 3.3.1 onpage 34)
Process View: one of four architectural views according to Kruchten [61] that focuses onrepresenting the runtime aspects of a system such as communication and synchro-nization and non-functional requirements such as performance, availability, systemsintegrity and fault-tolerance. This view is shown for the Service-Oriented set of sys-tems in figure C.5
Rating scale: the measurement scale used to express the effort in the Rating Model of theframework in. It can range from a nominal to a ratio scale and is explained in detailin section 3.3.1 on page 33
SCA (Service Component Architecture): a modeling standard for the architecture of Service-Oriented systems (see figure C.2)
SDO (Service Data Object): a data modeling standard for Service-Oriented systems (seefigure C.3)
Viewpoint: A specification of the conventions for constructing and using a view. A pattern
82
or template from which to develop individual views by establishing the purposes andaudience for a view and the techniques for its creation and analysis [49]
83
Appendix B
MATLAB code used for performingthe experiment
This appendix contains the code listings of the prototype implementation in MATLAB ofthe Abstraction phase of the research experiment.
Listing B.1: Business Model abstraction� �1 % IN : M − t h e a d j e c e n c y m a t r i x o f t h e g raph2 % OUT: e n t r y−nodes == a l l nodes f o r which t h e row c o n t a i n s >0 ONES and t h e column ==0 ONES3 %4 % e n t r y n o d e s = [ e n t r y n o d e i d 1 , . . . . , . . . ]5 % s u b t r e e s = [ s u b t r e e n o d e s o f e n t r y n o d e 1 , . . . , . . . ]6 e n t r y n o d e s m a s k = a r r a y f u n (@( i ) ( sum (M( i , : ) ) > 0) && ( sum (M( : , i ) ) == 0 ) , 1 : s i z e (M) ) ;7 e n t r y n o d e s = f i n d ( e n t r y n o d e s m a s k ) ;8 % f i n d t h e s u b t r e e f o r each e n t r y−node9 Ms = sp ar se (M) ;
10 s u b t r e e s = a r r a y f u n (@( i ) g r a p h t r a v e r s e (Ms , i ) , e n t r y n o d e s , ’ Uni formOutput ’ , f a l s e ) ;� �
85
B. MATLAB CODE USED FOR PERFORMING THE EXPERIMENT
Listing B.2: Visualization of Business Process abstraction for entry-node #5174� �1 M2 = z e r o s ( s i z e (M, 1 ) ) ;2 e n t r y n o d e = 5174 ;3 s u b t r e e a r r = s u b t r e e s ( f i n d ( e n t r y n o d e s == e n t r y n o d e ) ) ;4 s u b t r e e a r r = s u b t r e e a r r {1} ;5 f o r node = s u b t r e e a r r6 row = M( node , : ) ;7 c o l = M( : , node ) ;89 row new = z e r o s ( 1 , 5 3 8 6 ) ;
10 co l new = z e r o s ( 5 3 8 6 , 1 ) ;11 f o r t o n o d e = s u b t r e e a r r12 row new ( 1 , t o n o d e ) = row ( 1 , t o n o d e ) ;13 end14 f o r f rom node = s u b t r e e a r r15 co l new ( from node , 1 ) = c o l ( f rom node , 1 ) ;16 end17 M2( node , : ) = row new ;18 M2( : , node ) = co l new ;19 end20 % prune empty nodes21 i s D e l = @( i d ) ( ( sum (M2( id , : ) ) == 0) && ( sum (M2( : , i d ) ) == 0 ) ) ;22 delMask = a r r a y f u n ( i s D e l , 1 : s i z e (M2 ) ) ;23 delMask = a r r a y f u n (@( i ) n o t ( i ) , delMask ) ;24 M3 = M2( : , delMask ) ;25 M3 = M3( delMask , : ) ;26 % g e t i d s27 i d s = f i n d ( sum (M2 ) ) ;28 i d s = [ i d s , e n t r y n o d e ] ;29 i d s = s o r t ( i d s ) ;30 % c o n v e r t t o s t r i n g31 i d s s t r = c e l l ( 1 , numel ( i d s ) ) ;32 f o r i = 1 : numel ( i d s )33 i d s s t r { i } = i n t 2 s t r ( i d s ( 1 , i ) ) ;34 end35 % d i s p l a y graph36 h = view ( b i o g r a p h (M3, i d s s t r ) ) ;� �
86
Listing B.3: Architectural Model abstraction� �1 % c a l c u l a t e d i s t a n c e s between a l l nodes based on c o n n e c t i v i t y m a t r i x M2 M sparse = sp ar se (M) ;3 DIST MATRIX = z e r o s ( s i z e (M) ) ;4 f o r f rom node = 1 : s i z e (M)5 di sp ( [ ’ f rom node ’ , num2str ( f rom node ) ] ) ;6 f o r t o n o d e = 1 : s i z e (M)7 i f f rom node ˜= t o n o d e8 [ d i s t , path , p r ed ] = g r a p h s h o r t e s t p a t h ( M sparse , f rom node , t o n o d e ) ;9 DIST MATRIX ( to node , f rom node ) = d i s t ;
10 end11 end12 end1314 % f o r m a t m a t r i x t o t h e i n p u t f o r m a t o f t h e l i n k a g e f u n c t i o n15 U = t r i u ( DIST MATRIX ) ;16 U = U’ ;17 L = t r i l ( DIST MATRIX ) ;18 % s u p e r i m p o s e m a t r i c e s19 DIST MATRIX2 = L ;20 f o r row = 1 : s i z e (U ) ;21 f o r c o l = 1 : s i z e (U ) ;22 i f ( DIST MATRIX2 ( row , c o l ) == I n f )23 i f U( row , c o l ) ˜= I n f24 DIST MATRIX2 ( row , c o l ) = U( row , c o l ) ;25 e l s e26 % r e p l a c e I n f wi th a r e l a t i v e l y l a r g e number27 DIST MATRIX2 ( row , c o l ) = 1 5 ;28 end29 end30 end31 end3233 % do t h e h i e r a r c h i c a l c l u s t e r i n g f o r t h e dendrogram34 DIST MATRIX3 = s q u a r e f o r m ( DIST MATRIX2 ) ;35 TREE = l i n k a g e ( DIST MATRIX3 , ’ w e i g h t e d ’ ) ;36 [H, T ] = dendrogram (TREE , 0 ) ;� �
87
B. MATLAB CODE USED FOR PERFORMING THE EXPERIMENT
Listing B.4: Business to Architectural model mapping for all granularities� �1 % c a l c u l a t e t h e s c a t t e r i n g f o r a l l p o s s i b l e c l u s t e r i n g s2 SURF MATRIX = z e r o s ( 5 3 8 6 : 1 0 2 6 ) ;3 f o r g = 1:53864 di sp ( [ ’ g= ’ , num2str ( g ) ] ) ;5 [H, T ] = dendrogram (TREE , g ) ;6 p r o c e s s t o c o m p o n e n t m a p p i n g = a r r a y f u n (@( p r o c e s s ) a r r a y f u n (@( node ) T ( node , 1 ) ,7 s u b t r e e s { p r o c e s s } ) , 1 : 1 0 2 6 , ’ Uni formOutput ’ , f a l s e ) ;8 s c a t t e r i n g o c c u r r e n c e s = a r r a y f u n (@( p r o c e s s ) numel ( u n i qu e (9 p r o c e s s t o c o m p o n e n t m a p p i n g { p r o c e s s } ) ) , 1 : 1 0 2 6 ) ;
10 SURF MATRIX( g , : ) = s c a t t e r i n g o c c u r r e n c e s ;11 end1213 % c a l c u l a t e t h e d i s t r i b u t i o n o f t h e s c a t t e r i n g14 SURF MATRIX2 = z e r o s ( 5 3 8 6 : 4 1 0 ) ;15 f o r g = 1:538616 map = SURF MATRIX( g , : ) ;17 d i s t r i b u t i o n = z e r o s ( 1 , 4 1 0 ) ;18 s = s o r t ( un iq ue ( map ) ) ;19 f o r i = s20 d i s t r i b u t i o n ( 1 , i ) = numel ( f i n d ( map == i ) ) ;21 end22 SURF MATRIX2( g , : ) = d i s t r i b u t i o n ;23 end� �
88
Appendix C
Figures
Figures-I. Software Architecture (section 2.1)
Figure C.1: Hierarchy of structural paradigms in software architecture
89
C. FIGURES
Figures-II. Service Oriented Architecture (section 2.2)
Figure C.2: Service Component Architecture (SCA)
Figure C.3: Service Component Architecture using Service Data Object (SDO)
Figure C.4: Data Access Service (DAS)
90
C. FIGURES
Figures-III. Renaissance (section 2.3.1)
Figure C.7: Incremental Process model
C.8: Figure C.8: Portfolio analysis
Figure C.9: Data flow diagram for evolution modeling
92
Figure C.10: Renaissance - Evolution strategies
Figure C.11: Renaissance - information sources for performing modernization
93
C. FIGURES
Figures-IV. ADM (section 2.3.2)
Figure C.12: Risk-management modernization approach (RMM)
Figure C.13: ADM Transformation types in the horseshoe model
94
Figures-V. SMART (section 2.3.3)
Figure C.14: SMART process for the identification of a modernization strategy towardsSOA
95
C. FIGURES
Figures-VI. Modernization (section 2.4)
Figure C.15: Architectural goal of the modernization towards SOA
(a) Legacy system composition (b) Legacy system components relations
Figure C.16: Legacy systems components impacted by modernization
96
Figures-VII. Effort Estimation Framework (section 3)
Figure C.17: Impact domains involved in system modernization (ADM)
Business Application/Data Technology
Concepts ● Business Rule● Business Process● Business Activity● Business Object
● Component – exposed interfaces – type classification (layer, concern, relative level of
granularity ) – dependencies/constraints = on data types (used variables) = on finctionality (called components) – implementation technology binding
● Connector – connector class – intrface A – interface B
– embedded functionality (synchronoous / asynchronous) – technology binding● Interface – data input/output types
● – control input/output types
● Patterns – layers; pipe-and-filter; blackboard – Facade, Mediator, Singleton, Abstract Factory, Bridge, Decorator, Adapter, Proxy, Command, Chain of Responsibility, State, Observer
Technology stack – OS – DBMS / App Server – Middleware – Frameworks / Libraries / Runtime
environments – Build- / Test- / Deploy facilities
Data Sources Application/Data Architecture
● Module dependency graph● Data flow graph● Call graph
Deployment diagram
Architectural Views
Logical View ● Development View● Process View
Physical View
Table C.18: Framework entities for modeling the legacy system
97
C. FIGURES
Business Application/Data Technology
Concepts ● Capability● Business Process● Business Service● Process Choreography● Service Provider● Service Consumer ● Service Mediator
SCA concepts + ● Service contract/interface● Simple/Composite service● Agnostic/Specific Service● Logic-, Entity- and Utility Services● Service Task● Resource● Service referenced object● Service owned object
● SOA recomendet technology stack
Data Source ● 8x SOA Design Principles● Requirements● Industry model
● 8x SOA Design Principles● Requirements● Industry model
● 8x SOA Design Principles● Requirements
Architectural Views ● Development View● Process View
Table C.19: Framework entities for modeling the target SOA system
Separation of ConcernsSatisfaction of Concerns
Identification Componentization Integration
Architectural View
Based on: META_I – LegacyMETA_I – SOA
● Development View● Service Component Architecture ● Service Data Object
● Process View● Service Component Architecture● Service Data Object
Quality Attributes View
Concepts ● SCA_Composite● SCA_Component● SCA_Property● SCA_Service● SCA_Binding● SCA_Reference● SDO_DataGraph● SDO_DataObject
● Entitty Service● Logic Service
● SCA_Wire● SCA_Service● SCA_Reference ● SDO_DataGraph● SDO_DataObject
● Utility Services
● Communication capability● Dynamic connectivity● Routing capabilities● Endpoint discovery● Integration capabilities● Transformation capabilities● Reliable messaging capabilities● Security capabilities● Transaction capabilities● Management & monitoring● Scalability capabilities
StrategiesWHITE-BOX:
● RESTRUCTURE: keep Composites, Services, Properties; change Components, Wires● REDESIGN_WITH_REUSE: keep Components; change Composite and Wires● REPLACE_NEW
BLACK-BOX: ● WRAPPER: change interface and/or Binding● ADAPTER: data; logic● REPLACE_COTS
● ADD● REMOVE
Impact indicators SOA Design principles – Reusability, Statelessness, Autonomy, Composability, Coupling
SOA Design principles – Contracts, Abstraction,Composability, Coupling
● SOA capability availability● Architectural style choices● SOA metrics
Table C.20: Framework entities for modeling the modernization process
98
Dev
elop
men
t Vie
w +
Pro
cess
Vie
w
Rat
ing
Mod
el
#7+#
5re
uses
Arch
itect
ural
Mod
el +
Inte
grat
ion
Mod
el
Serv
ice
Mod
el
Infra
stru
ctur
e C
omp
Inte
rface
2*
Con
nect
or*
1Le
gacy
Com
pone
nt Bus
ines
s C
omp
Bus
ines
s A
ctiv
ity
Bus
ines
s Pr
oces
s 1*
Busi
ness
Rul
e
*
Bus
ines
s O
bjec
t
*1
*
Ser
vice
Gen
eral
Bus
ines
sIn
form
atio
n
#3+#
5
Ser
vice
Com
posi
te
Ser
vice
Com
pone
nt
impl
emen
ted
by
PoM
Stra
tegy
Com
plex
ityIn
dica
tor
Con
cern
#1
#2
#3
#4#5
cont
ains
cont
ains
cont
ains
Busi
ness
Mod
el
#6
#7
han
dled
by
#8W
HIT
E-BO
XB
LAC
K-B
OX
*
1
*
1
**
11
1
*
**
**
**
Inte
rface
PoM
Con
nect
orP
oM
Com
pone
ntP
oM
......
......
1
*1 *
#9
Figu
reC
.21:
Fram
ewor
kre
latio
nshi
psbe
twee
nm
odel
san
dth
eco
ntai
ned
entit
ies
99
C. FIGURES
AS-IS System
AS-ISBusiness Model
TO-BEBusiness Model
Res
truc
ture
AS-ISSoftware System Model
TO-BESoftware System Model
AS-ISTechnology Model
TO-BETechnology Model
Rev
erse
eng
inee
r
BusinessRequirements
Software System
Requirements
TechnologyRequirements
#1 #4 #7
#2 #5 #8
PoMModel
PoMModel
Gap
Ana
lysi
s
#3 #6
PoMModel
#9
Impact Propagation
ImpactPropagation
Forw
ard
Engi
neer
Modernization Strategy Choices per PoM
Modernization Strategy Choices per PoM
Modernization Strategy Choices per PoM
Overall Modernization Strategy Report
#10 #11 #12
Stak
ehol
der D
omai
ns
TO-BE constraint propagation
Impact Propagation
TO-BE constraint propagation
Impact Propagation
#13
Figure C.22: Framework - impact analysis process100
Iden
tific
atio
nIn
tegr
atio
nC
ompo
nent
izat
ion
Mod
erni
zatio
n R
atin
g
ADAP
TER
WR
APP
ER
CO
NC
ERN
SLE
VEL
IND
ICAT
OR
SLE
VEL
STR
ATEG
IES
LEVE
L
WH
ITE
-BO
XB
LAC
K-B
OX
CO
TS
Logi
cD
ata
RE
DES
IGN
RE
STR
UC
TUR
ER
EPL
AC
E_N
EW
Sat
isfa
ctio
n
1. E
FFO
RT
2. G
AIN
CO
MP
ON
ENT
/ CO
NN
EC
TOR
BU
SIN
ES
SA
PPLI
CAT
ION
TEC
HN
OLO
GY
APP
_LO
GIC
INFR
AS
TRU
CTU
RE
Per
cent
age
of c
ode
that
has
bee
n cl
assi
fied
Per
cent
age
of c
ode
that
Is in
frast
ruct
ure-
rela
ted
SO
A ca
pabi
lity
SO
A qu
ality
attr
ibut
e
AD
DR
EM
OVE
Num
ber o
f cro
ss-c
uttin
g po
ints
size
of i
mpa
ct
dom
ain
degr
ee/ty
pe
of c
oupl
ing
degr
ee o
f sca
tterin
g(L
evel
1)
inte
rface
-to-im
plem
enta
tion
coup
ling
inte
rface
co
mpl
exity
arch
itect
ural
mis
mat
ch
Figu
reC
.23:
Fram
ewor
k-R
atin
gM
odel
101
C. FIGURES
Figures-VIII. Experiment (section 4)
Experiment design (section 4.2.5)
f3()
f2()
f1()
Code
Business Model
Figure C.24: Business Model extraction and traceability to the code level
f3()
f2()
f1()
Code
Architectural Model
#2
Figure C.25: Architectural Model extraction and traceability to the code
f3()
f2()
f1()
Code
Architectural ModelBusiness Model
#3
Figure C.26: Business Process Model mapped on the Architectural Model
102
Experiment instrumentation (section 4.2.6)
Extraction Abstraction
COMPONENTIZATION [INTEGRATION][PROCESS – Business Domain]
[PROCESS – Application Domain]
Restructure
RatingAnalysis
[PROCESS – Technical Domain]
ServiceModel
EnvironementModel
Architectural Model
COMPONENTIZATION [INTEGRATION][IDENTIFICATION]
Business Model
Application Model
Integration Model
Visualisation
Development View
Process View:SOA-infrastructure components
Rating Mdoel
Figure C.27: Overview of the experiment software prototype design
Experiment validity (section 4.2.7)
GAINEFFORT
Flexibility
ScatteringGranularity Conclusion validityInternal validity
External validity
Observation
Theory
Cause construct
Construct validity
Construct validity
Effect construct
Treatment Outcome
Figure C.28: Experiment validity (adapted from [99])
103
C. FIGURES
Experiment operation (section 4.3)
Type1 ID1 Name1 Label1 RelNameUser Type2 ID2 Name2 Label2 RelName2PROGRAM 2 A1DBX3 A1DBX3 Calls Program thru Decision DECISION 6 [email protected] A1DBX3.Calls.OU-CE-FIN-OU-ID-SQL-SEL-F ResolvesToProgramEntryPROGRAM 2 A1DBX3 A1DBX3 Calls Program thru Decision DECISION 7 [email protected] A1DBX3.Calls.EDALVZ2-PK-SQL-SEL-F ResolvesToProgramEntryPROGRAM 2 A1DBX3 A1DBX3 Calls Program thru Decision DECISION 8 [email protected] A1DBX3.Calls.LOCA-EXCHG-RATE-SEL-DRV-F ResolvesToProgramEntryPROGRAM 2 A1DBX3 A1DBX3 Calls Program thru Decision DECISION 9 [email protected] A1DBX3.Calls.DEC152-TO-CHAR-CVT-F ResolvesToProgramEntryPROGRAM 2 A1DBX3 A1DBX3 Calls Program thru Decision DECISION 10 [email protected] A1DBX3.Calls.CHAR-RIGHT-ALIGN-DRV-F ResolvesToProgramEntryPROGRAM 4 AA01A AA01A Calls Program thru Decision DECISION 11 [email protected] AA01A.Calls.EDAVVH3-PN-NAME-SQL-SEL-F ResolvesToProgramEntryPROGRAM 7 AA0BW1 AA0BW1 Calls Program thru Decision DECISION 12 [email protected] AA0BW1.Calls.OU-BUS-FCT-OU-ID-SEL-DRV-F ResolvesToProgramEntryPROGRAM 7 AA0BW1 AA0BW1 Calls Program thru Decision DECISION 13 [email protected] AA0BW1.Calls.OU-CE-WL-STOCK-SQL-SEL-F ResolvesToProgramEntryPROGRAM 7 AA0BW1 AA0BW1 Calls Program thru Decision DECISION 14 [email protected] AA0BW1.Calls.R-ACWSXY-F ResolvesToProgramEntryPROGRAM 7 AA0BW1 AA0BW1 Calls Program thru Decision DECISION 15 [email protected] AA0BW1.Calls.FU-ADDR-SQL-EXIST-VAL-F ResolvesToProgramEntryPROGRAM 7 AA0BW1 AA0BW1 Calls Program thru Decision DECISION 16 [email protected] AA0BW1.Calls.FU-ADDR-SQL-EXIST-VAL-F ResolvesToProgramEntryPROGRAM 7 AA0BW1 AA0BW1 Calls Program thru Decision DECISION 17 [email protected] AA0BW1.Calls.OU-NONCE-SEL-DRV-F ResolvesToProgramEntryPROGRAM 7 AA0BW1 AA0BW1 Calls Program thru Decision DECISION 18 [email protected] AA0BW1.Calls.OU-BUS-FCT-OU-ID-SEL-DRV-F ResolvesToProgramEntryPROGRAM 7 AA0BW1 AA0BW1 Calls Program thru Decision DECISION 19 [email protected] AA0BW1.Calls.OU-REL-VW-EXIST-SQL-SEL-F ResolvesToProgramEntryPROGRAM 7 AA0BW1 AA0BW1 Calls Program thru Decision DECISION 20 [email protected] AA0BW1.Calls.OU-CE-WL-RV-SQL-SEL-F ResolvesToProgramEntryPROGRAM 7 AA0BW1 AA0BW1 Calls Program thru Decision DECISION 21 [email protected] AA0BW1.Calls.OU-NONCE-SEL-DRV-F ResolvesToProgramEntryPROGRAM 7 AA0BW1 AA0BW1 Calls Program thru Decision DECISION 22 [email protected] AA0BW1.Calls.OU-CE-WL-STOCK-SQL-SEL-F ResolvesToProgramEntryPROGRAM 7 AA0BW1 AA0BW1 Calls Program thru Decision DECISION 23 [email protected] AA0BW1.Calls.OU-NONCE-SEL-DRV-F ResolvesToProgramEntryPROGRAM 7 AA0BW1 AA0BW1 Calls Program thru Decision DECISION 24 [email protected] AA0BW1.Calls.OU-BUS-FCT-OU-ID-SEL-DRV-F ResolvesToProgramEntryPROGRAM 7 AA0BW1 AA0BW1 Calls Program thru Decision DECISION 25 [email protected] AA0BW1.Calls.EDAFVB1-ENTRY-SEL-F ResolvesToProgramEntryPROGRAM 7 AA0BW1 AA0BW1 Calls Program thru Decision DECISION 26 [email protected] AA0BW1.Calls.R-AA7NX7-F ResolvesToProgramEntry
Figure C.29: Excerpt from the call map export of the subject legacy system generated withRelativity
8 9
11
12
150 887
932 936 937 938
1878
Figure C.30: Business Process for entry-point node #12
35
38 40
42
49
58
64
166
170
206
390
759
814
955
1001 1005
10101012 1013 1039 10401041 1057 1061
1064
1066
1073
1868
18691870
1875
187618771879
1880
4749
Figure C.31: Business Process for entry-point node #49
104
C. FIGURES
53185361534349965355534251784999535051881543 936154453475297154851035240530852845280533153031545141715471542 8 11 97 145 9 932 12 150 887 937 9381419538153225317149953735319531152961483501852931507149814975131149214871486 1875167 1861541154653215320528352381495500950195010 1
4764 73329982999 253 2541118112035483549491053335044533211131115111411321134 8048054806 754 755476248111378222222232224221622252226480748094818481948024828482948034817 757 758 7611101 3963992 3983993212121222123212621273547356635673558355935543555 2 330 407 653 266 276 238 258 245 283 301 416 419 465 271 320 325 305 947 949 844863 310 313 334 57 649 540 7604872 794487448734875 792 803 805 110 8061724 520 784 798 80017231166 450 504 60 92 775101921042105 485 528 492 511 519 5231016 542 226 813487148704876 863 76311371214 287 339 383 366 371 376 303 304 5355012 6421554 644 601 640 76913341379134213431355 773133013741366137013561362133313581377 120 315 319 324 678 293 74810211060 99510041028 2241043 2251024 668 999 777 78148691008 7791035 693102510231022104410321045104610331574 318 3221047 768 81210591048 815 81010381056 208 220 454 699 747 814 698 700 658 659 718 6574893 683 734 694 660 735 661 695 685 713 716 6 7 14 829 917 17 165 185 62 671387 9071385139052285295 31619061607166410851663166515631605167216681602156416941692169316011560169816701671186316993421372235103732342234231027140714054771477513894766477416661667 460189218931890189118591896189919001908190919051907189418951897189819691975199119891990188418851888188618871889190221552158216021592167216821652166215721642169 10 16 193506350655073 413 41525302531 862 865 894 897 901 902 408361436155156 41036063613361636173618 414 449360936083336 440 4383607361036113612 41 697 43 59 212 421 663 691 802 822 405 500 570 572 593 587 5901015 522 524 529 5573331348836584175417741764155416441744178418341844181418241794180416541694172417341704171416641684167398340044003398440053986400139853996398939903987398839914152418841924195419841994196419741904191399741934194399941894134413541364145413241474146413341434144414141424137413840464063406240504053405640574054405540964149410441074110411141084109409741004139414041024103409840994101411241134105410641144115401340584024402540684026403340354059403440664022406540644023402740484049403040314028402940324036404140424039404040374038 4412381250124312451232124812361239124713351251 21118624466 40618581854185539984047405140522252214621562161216321472162 72918564847 730 873161447495264475152671455146114561463 2611868531218694769187049722129 81620132066211521482149215021512152343934513440 68238124765477618571861213021312143213221332144 321 412 417 448 459 461 129481621852329241112645023499450074274427542734278427942764277427242804283428442814282 870426942704271 880 393 597 612 958 959 963 96247604804 952 960 9531696169717031155116111621157115811591160 614 619 620 62314721474192119851988192219241923198019921982198319861987 8661961196419711963196019701965197219671968 503518351939793980 5435283529 489 531 83939784060406139813982 241 909 904 2474095 249 906 285 891 900 918 279 911 881 883 326 328 896 332 892 922 945 262 934 928 272 291 311 307504219761977 488 491409040934094 498 514 517 1272092209621112112211320242025209349302097208220832175210021012084208549402095 40251775060501621532154217021714985499153001993199420982106210721082109 188 189 190 70734923441344233423343457445754576459845993443345234444588458945904608460934453446459145924593461046113403340446064607460246033806458045814582460446054583458445774578457946004601362417015323170234543459345534703471495451954968496049584955 13 135 88920152016 56214591462 28 78 160 895 83 94 103 20 29 1124781 27 32 98 340 100 1411036 96 143 124 87 1074787 95 9931018103716041606 133 13922002212 45513731363137510495182521613241361136013711367136813651369136413724903490449055091490949084906154952335236523552341551155052275239523252305229 154794 858 126 411 74213401341 137 168 171 18 948 3947854784479148244825 374786 943 864 940 882 7447894788479047784779478047824783 153 1554797479847954796 33 114 11647924793 65 893 13436302198 29548154830484248464832483448494957483148334841484348524853481348144836483848354837 2104770484848514850476747684844484548394840 878 879 88414044777 96152815175 969 971 97449765302530150875137121515865021503248824891503153345072518652315033518752435249498851685089500150565226492552484929492849324937 306 868 840 44152425247142153465244 91347074708144014451446145349755011518451935325144714484939510751285130509253775329511051095096510051015241525352515102518553785190 2148584861 53 851 853 497 823 811 8092027202848564855 4848594854 117 122 123 11548574862 723 45 200 239 240475747581426 246 521 656 631 243 273 244 275 259 646 647 628 634 598 651 655 55314134914533535903591 174 182 179 1831030219443224323 778 175 181 184217221732176218021782179221345474393439448924894188136754759365036521883218821892174218721952196218421812183488548892192219320992190219120142202218620882102211021032086208750942089209020912094288929062890288729052888 118441344144415442244234335433620183799201938004489376237643798380537963797440744081164432443255255441144174412432943304409441044054406432643274390439237933794500552455121497343204321 3551940193119433502521319101911145114521458148514941496148848011490197819841979198111383526352735243525456345664565338836993389370045623701 148464346424639469046891146114734494688463246334634463746444645464646404641 807120611811186120711824678469311401167119011841197118812011187120411431144117618204677468146384674468746514684465246824683467946804675467615522248228222832241488122472275186450951865186750491866233623872404238822042246220622492251223022612284226222052285220748954896228622882287192919331935193419381930193219361939193722032210221122762277 2972145 595 74348605152 86136313632 365187436513647364847024703 42218721873 599 48021773579 394 9701229363336344378437943804382213721382140213921412142 715365436603351225036594822482336533655367036713668366936563657117534753663366636673352364936773676366136643665327732993283327932963297328032923289329832913282329332903278329432953281328433004612461346144622462346214624461746184619462046254626462746294628 724 7511179 7521195 725 72811711189120211831196119811681172116911851174117812191180119112171230 615 61744494451445444574476447744624463446044614452445343314332 624 626445844594450447144554470446944684456446744654464 242 506 490 512 966 509 539 494 544 554 551 888 555 89010682077207820712072206820732074207520762067208120692070207920802022202620232032206520292030203120552033206020592034206120352062206320402041203620372051205220382039205320542047204820442045204620492050205620582057 5485054 55050485057142250514949 5585052 5594951494849475053505050475046 222 248 26543574358436443634365436143624372437043714368436943664367252625892590258825912593259225945078509850822294229823222323229723212300229523012299233123022319230322962304230523202306 73942094216488342104212421942204214421542224223422442064207420848804211423342174232423042294234423118604377334443754383437343814374447444752405453624064376421342184221431543164317431844874488381638173818382138253822 63034333434371052375176343543283436 76234253426 81934533468346934643465345634583457346034613462346334663467 998146610001468145414571469146414651914191519171916440445101918191919201958220119741959195619731957229122922293233023072310231123122308231323142315230923322317231823162395239623282416241724182420232724122413241424192415232423972364236524072408236623672368232523392363239924002369239823942326240123382425242824262427233323842335238523372409241024022403238623892390239123922393 343 361 377 933 379 375 384 388 39118091810294129592960296229442961294529632965296429422943180129031802290229042907297129082967296629682970297229392940 965 968289528942897289629272946292929322931293329282938293028912893289229202921289829012899290018461847185018512429290932613262291829242919325732583259326029162917291229132910291129142915292229232334234223592372237323742340235123492350242124222341234323572344235323452355234823522358237023792378234723562360238323822346235423612381238023622377237623712375 9541107110621162117221722782279228122802227222822332234222922562231223222352237223622382270224422712245227222543431343222592267226022682269225722582239224022532255224222432118211949424945 46357735885345 101 9311002165935753586534151615173516351624965 574 92410811092 602 635356340833564356536253626 158 159 169 172 2013573 541487948983585533953381148359435951083108749314933534849353574533753363581358235833584141253444971 356358935783580 360 403 404150915361510152915201521153451061522153215281531502850551533152350261524153050155014152615274977510549815134513649875027497849844983507650835024503050295025503750365034161151115279510850905077516451145113510415581561156815701565156715661621527652781559156215691571164252601643525952615270526916151644163916301637163816311645164916571647164816201646165616541653165016511655165253095314531353165315529052925291529952985294 55107610891091 842 526107510793571 2341072 513105410701071 1764900 119 121 125 162 368 370 99 629 632 675 671 673 677 2741069 830492251921014101737683769143637783779378037811442144414432017378637873784378537883789108610881437 780 78252075224522352185380525852225220 195 19835503551 764 766111611031104 3894334 8674558 886 912345045674597 859 86044724473 898 899 579 58210031007101145574564 71122092208111011114572457345704571 252 335 9961026 942 944 9503686368711091122 5473598 596415410981099357053403572 399 838 5803568456935691058 5611108 5831112 771 790 785 804 786 787 791 793 799 795 796 797 236370253763836 401384738483837383938403841 836 84938383844121847635208 5921228121612201263 348 3504306430743014302 439430838313834383538323833 54542444245 5773829500216101628 3533619362736283483356236225351 648 652 45735213520339833963416372834813485 606369233613693369036914544454533995117352235233684368937823783 7023479347834743473347234763477368536883814382838263827 4184516 44345174518339245044521452945054506339445353790379138013802 828 834 832 845 921 847 920 831 8334756 850 44426711473265826622677267814092670267326721408267426691410266026752676 55637671142155333743375 8272657455945614560 903 905 908 910 923 946 941 9291871 6084444 60944454446219721994447455444484555455633973773381337703771377437725386381038093775377638113777150839533954395539703956396539673966396239643963395139523957396839693960396139583959 994 99710201034343711451153339534383642364336443645364616091612161616135266526816171634163316415262526352735265161816191622162916321623162416255274527552715272353135343535353235363537353335413545354635433544493851155118512251295123499051255368536653575124536353625132 3 39052823480 543 206106610951420166016611691 486 493 5844424443444254426151151815211 76 77 11141594162416341604185416141864187 18014241439142514381338142914311434142314321441143314351339 81363938685067 323 3803641364021824867 7262128390539063907 85 88 109415340824006 835 4364069196219663869407740784091412941304131 496 49940844092407140724018407041214122 525 52741164016401941174118401740204021404340444045 857359936043605 856415041574158386738703871393139323933389138923893 433 435 2218784812 29619012114 131 1781094109737033704 177 463 349 8762218221922202221 3445159 3781882108210901096 4463302467146724673 973126112691274127512701272128412851273128612811279128312821276128012891288128713521875350917791313134413261332134513471346134813491351135013251329122112241226401440674015 223 232 233 102 104 136 128 130 14411301131 213 132 156 138 151 157 209 5031006 278 2841084100935133515494348865210 7324884 147 149194119531945194919461951195219481954194419551947 108 105 1731262 164 167 300363814061277169036373672 877160813191328 575 604 576 578 581 534 567 537 5694997 770 774 885 776 78840874088 290 294 23272327241579271927262722 79 86 2192698520031965201 221 855 83726672668 98626881450 9872704 9882700264326992697270326832684244826872449264226802681268227012702275027582760275127522759 2352538256825693195257025662567 204 22918172513309625142631297329762969297429772978271127122738 216270827062748274927152717271627252718 453273127292435243424361518262924372439243825443033254515192430243127622763263026332632153727282730322627052707271327141535311531163085308430863087311430923121309331263129312831271848307918493106310731083109308031103081308230832634263726352636284828622861297547502853285628542855284928522851285028602517251827612765276727642766 227264426482649264649442645264726852686 927 930265026512735273749532736148914911525287928812878288028742875287628772864270927103071307228632865325532563253325428662867286828693251325215403057307030603036304230583050306930513059303730403041303830393066306830673044306430653055305230633053305630541415141631983199153932323233320232033200320131973225322732103211322832293230323132043222320532213223323432353245324629923214299332133216321532243244323632373212321732203219321811331135280628232796288228832797280528132814282428852825282631403177282728302831282828291538316131623188315531563187318931483154315231533151316331643157315831593160314931753150318031823181288428863165317931843186319131923176319031783166318527882789280728082809 1911733174017321736 20717371730173117341735173817391741174417431746174717511745174917501752174217481762183418351753175417561755175717581761175917601233123418362841244512401237284028432842130013101307130813161337130213111305130313061844178418111812184517891795184117801782178117831785123112951297129012981291129912961318124612491427150315041505150615141515176917781842184317631765177417701773177117761772177718041807180518081803180627702771278628032787280228043269327032653266277227733267326832633264283228342833283528372836277427752782278327782779277627772780278124702489248824712472247524762473247424772478 140 2682732273329253032292615161517252125392573257426192620 250 308 363 309 312 314 317 397 2024761 358 367254826172610261625472615 329261425462572249624992612261325002561250432473248 964 96724972501257924982502251225032580261126182515251625402542255625622560255725582541254326212622255925632600260326012602257825842583258525952596 975250625052525 99225232527 976 98325082528252425072529 984 264 288 352 985 980 333 341 3461484 990 981 991 359 982 989 280 282 423146751575189500850205180521950975138507450755088515851965286528514715148515051661500503950175040497452525035504150005038516050435197534952035199519811771357493449365225525052545256525752771252125412533094313247553133454631343095309830973099310231013100311131133112312231253124312317962981300224462447300330053000300129822983300430082984299129853010301130063020300730093013301830123016298929902994299629951255125712591260206441514752400241484156475426272628125612581593262526262433284428452846313528471592262326241729243217051706475352895288528753075310495649594923492429572958295529562953295431193120295129523117311818131814182918181819 192 197431943334478 199 251449444954492449344904491 738 765 841449844991598159916003830439544163765379243844387376643984399439144964402440344004401115136621575157615881589159049664967439644974397 5053741371337393748376137353745 8691042374637553759538537363714374337183747375434004479448036793815375337603673367436824701368346944709471047114700473315951597370653533712517937093730371151544980498649695215505817251728172653285327172747064728472944404441438543884730473146984699469547234724472147224726473247273725 27024802481247924832482 338302730213022303030313028302930233024 801 825 926 935 818 820 826 919 925 35 49101010411065 759 401063 641064 38 42 17010011876 16610671877 97711541555100510731013107810291050105210311222 9551012107415561557104010611057103910551051 5810621722170417071721172017161715171916881673171716751678168716821679167616891686109316581677171817101681171216801714171117131684167417091708 264864 749 750 821 808 3871139 767 7464868 61 8938753910391239113919 215 21739083909391739183915391639133914 90 913896389838973920 228 23138943895390339043901390238993900 196 292 3694000 327 331 336 337 277 2811502 289 302 362 364 386 736 342 357 392 372 37437954073407441254126411941204127412841234124407540762456246124642462246324572460245824592486248713151323138613881327140313171396132013361392 442 471 469 470 502 466 481 473 475 484 468 507 482 510 477 478 479111711191121316731833170316931683193319444274428443744354436443244334429443144304438443946304631466046634664466146624658465946914692199819992000200120112002201220032004200920102007200820052006393439353940394539463943394439413942393639373939393843394340434543494350434443484355435343544351435243464347 218 257 256 424 267 2693807380845864587 674 600 603 645 650 670 627 63645944595 633 65433903391 672 705 688 680 689 719 737 690 720 741 74045504596458547474748 721 731 722 684 676 679 687 2553635492736365116127836003973 5304012169516274085 710 714 681 706 709 717 704 708 71248264827 546 552 589 591 571 594 573 2143884 565 566387438833885392139303926387238773882388638893890388738883880388139273879392338733876 82439223878 8443924 523587 605 823576 54911001102392539283929 430 4313487362936213620 237 2863348 5643514 516 518 843375833474533454945224530332033493345 532 568 563 533 536 538 4514539373445284532375745094523455145241594159645344543 852333833394537453845484704470547134714471947204717471847154716369753583698 56117322142215330133403341 194 263 351372137204531 20353791271 38233293861330333043362341553823393330533064519452043864389338433853695348234843729 400333716623365538336803372337334863742349137563737 643351235113862 8173489334633103354332134194982499250855093333038423843 639 6411414141843034299430050065174429816854296532433093350330837513727332833593360 97236963694374450035004 5603353333433353332333312231225 735330 7533143315 7047254507455244834484160316693763524616831587 69647124540454133553356336434203417456851913427342834303429338033813715372637524508452545263382338334183723349335053506368137333716538444814482 607 610 6213494 611 6133307336333115371537033124304430553673495537553695365340134063402112911363496349734983501353034993500 48334093410340734243408187918804979451133681460431033264311332743094235431242384239423642974237424335923593 2054485448635033504 345374037313678373837084920511214111482162648874899515515133490370753645356371937505372535415123319535937495374338633873717535239473948370553264989384552023846424149074911491551195120 9511200 9561192499311931203 957119449121213491650225013121133573358362335173516 420 447331833693313 428 429 872 8753324332533163317452745424553469646974514451544204421441844194313431437243376337734133414 501 586 464 585 467 5883370337133783379340535073508332233234512451317004743474449021152115642405045514150845081507150685061506950705360514751492652265321342135213625362537 665 6672484248548774878 637 6382450245124652466 347 373 354 939 3951927192849264998497018151830183125342535 47 515086509951264995508050795127397639771828183718382509251042634264468546864250425142004201 298 2992452245342894290 161 16347414742 93 14650595066506450624356435943603025302622632264281628222818282038193820 701 7032492249318271839184013911397140113931394140213951912191343374338513551531105355635571925192642674268 432 434194219501293130144424443114911504655465646574665466615771585513351445151514651455139514351425140516513141359255425552936293730473048265626592661267911413994399520422043328532872511252217881794126712684649465013041312 4744913 476 508 5151852185326892690474547462519252014771478227322744897490126912692157215733971397227682769313831392020202128382839 106 14222652266320832094635463629472948112311261127112815912720272727212423242449174918314531473146380338043141314230883089 24 2534113412425942603447344846154616394939503090309151695194521427982800428742885170517252055217520452065212520951711903190431303131 753 78351835221 381 38526402655264138493850353835423034304324902491 1134866 72748652949295011991205424242534255425242541797301717982987301929881799301418002986301529974341434243434261426225522571138113834227422826632664 871 874 19 914 34 36 30 31 63 66 68 69 71 72178617871791179217901793 4523602108036013603 616 61825322533 230 2601242124412651266 744 7454963496414491470176417671768 437 456 462 458 409 44529342935105310774202420338593860307330743560356146474648323832403239263826542639228922902753275527572754275612271321132238573858 772 78932713272199519961997473847394740353935401821182518264919492140794086408040814089428542862608260949504952279027912794279528122792279347344735 848 85425982599 487 495244224442443 978 97924402441382338243249325030773078 622 625466746684500450138513852258625872604260524942495269526961380479948002734274712081209 4721235124127432746255025532744274548204821 5 916530453055306 4 915359635973366336738533854306130622581258247364737328632882799280139743975 686 75627842785281028112741274225772597494149461163116511701581158442044205465346542606260727392740130913981399140029792980256425654256425742582120212421254248424931733174426542661124112530353049 152 154304530461331135313761354310331053104 425 426 427 662 664 666 669 692317131723863386418161832183342254226385538562815282128172819386538661428143031363137163516361640285728592858287328702871287246694670320632074772477312101212 846148014931501147914813143314430753076266526664808481017661775129212941578158240074011401040084009450245032693269432753276158015832454245513821384324132433242147514764961496242914294429542924293182218231824257525764246424725492551355235533273327424672469246848884890
7 8 9 10 11 12 13 14 15
node ID
clustering index
FigureC
.33:ArchitecturalM
odel-complete
hierarchicalclusteringdendrogram
oftheL
LS
RequestH
andlingsubsystem
106
5318
5361
5343
4996
53555
34251
78499
953
50518815
43 93615
44534752
971548
5103
5240
5308
5284
5280
5331
5303
1545
1417
1547
1542 8
11
97
145 9
932 1
2 1
50 8
87 9
37 9
3814
1953
8153
2253
1714
9953
7353
1953
1152
9614
8350
1852
9315
0714
9814
9751
3114
9214
8714
86 1
8751
67 1
8615
4115
4653
2153
2052
8352
3814
9550
0950
1950
10
147
64 7
3329
9829
99 2
53 2
5411
1811
2035
4835
4949
1053
3350
4453
3211
1311
1511
1411
3211
34 8
048
0548
06 7
54 7
5547
6248
1113
7822
2222
2322
2422
1622
2522
2648
0748
0948
1848
1948
0248
2848
2948
0348
17 7
57 7
58 7
6111
01 3
9639
92 3
9839
9321
2121
2221
2321
2621
2735
4735
6635
6735
5835
5935
5435
55
2 3
30 4
07 6
53 2
66 2
76 2
38 2
58 2
45 2
83 3
01 4
16 4
19 4
65 2
71 3
20 3
25 3
05 9
47 9
49 8
448
63 3
10 3
13 3
34 5
7 6
49 5
40 7
6048
72 7
9448
7448
7348
75 7
92 8
03 8
05 1
10 8
0617
24 5
20 7
84 7
98 8
0017
2311
66 4
50 5
04 6
0 9
2 7
7510
1921
0421
05 4
85 5
28 4
92 5
11 5
19 5
2310
16 5
42 2
26 8
1348
7148
7048
76 8
63 7
6311
3712
14 2
87 3
39 3
83 3
66 3
71 3
76 3
03 3
04 5
3550
12 6
4215
54 6
44 6
01 6
40 7
6913
3413
7913
4213
4313
55 7
7313
3013
7413
6613
7013
5613
6213
3313
5813
77 1
20 3
15 3
19 3
24 6
78 2
93 7
4810
2110
60 9
9510
0410
28 2
2410
43 2
2510
24 6
68 9
99 7
77 7
8148
6910
08 7
7910
35 6
9310
2510
2310
2210
4410
3210
4510
4610
3315
74 3
18 3
2210
47 7
68 8
1210
5910
48 8
15 8
1010
3810
56 2
08 2
20 4
54 6
99 7
47 8
14 6
98 7
00 6
58 6
59 7
18 6
5748
93 6
83 7
34 6
94 6
60 7
35 6
61 6
95 6
85 7
13 7
16 6
7
14
829
917
17
165
185
62
67
1387
907
1385
1390
5228
5295
316
1906
1607
1664
1085
1663
1665
1563
1605
1672
1668
1602
1564
1694
1692
1693
1601
1560
1698
1670
1671
1863
1699
3421
3722
3510
3732
3422
3423
1027
1407
1405
4771
4775
1389
4766
4774
1666
1667
460
1892
1893
1890
1891
1859
1896
1899
1900
1908
1909
1905
1907
1894
1895
1897
1898
1969
1975
1991
1989
1990
1884
1885
1888
1886
1887
1889
1902
2155
2158
2160
2159
2167
2168
2165
2166
2157
2164
2169
10
16
193
5063
5065
5073
413
415
2530
2531
862
865
894
897
901
902
408
3614
3615
5156
410
3606
3613
3616
3617
3618
414
449
3609
3608
3336
440
438
3607
3610
3611
3612
41
697
43
59
212
421
663
691
802
822
405
500
570
572
593
587
590
1015
522
524
529
557
3331
3488
3658
4175
4177
4176
4155
4164
4174
4178
4183
4184
4181
4182
4179
4180
4165
4169
4172
4173
4170
4171
4166
4168
4167
3983
4004
4003
3984
4005
3986
4001
3985
3996
3989
3990
3987
3988
3991
4152
4188
4192
4195
4198
4199
4196
4197
4190
4191
3997
4193
4194
3999
4189
4134
4135
4136
4145
4132
4147
4146
4133
4143
4144
4141
4142
4137
4138
4046
4063
4062
4050
4053
4056
4057
4054
4055
4096
4149
4104
4107
4110
4111
4108
4109
4097
4100
4139
4140
4102
4103
4098
4099
4101
4112
4113
4105
4106
4114
4115
4013
4058
4024
4025
4068
4026
4033
4035
4059
4034
4066
4022
4065
4064
4023
4027
4048
4049
4030
4031
4028
4029
4032
4036
4041
4042
4039
4040
4037
4038
44
1238
1250
1243
1245
1232
1248
1236
1239
1247
1335
1251
211
1862
4466
406
1858
1854
1855
3998
4047
4051
4052
2252
2146
2156
2161
2163
2147
2162
729
1856
4847
730
873
1614
4749
5264
4751
5267
1455
1461
1456
1463
261
1868
5312
1869
4769
1870
4972
2129
816
2013
2066
2115
2148
2149
2150
2151
2152
3439
3451
3440
682
3812
4765
4776
1857
1861
2130
2131
2143
2132
2133
2144
321
412
417
448
459
461
129
4816
2185
2329
2411
1264
5023
4994
5007
4274
4275
4273
4278
4279
4276
4277
4272
4280
4283
4284
4281
4282
870
4269
4270
4271
880
393
597
612
958
959
963
962
4760
4804
952
960
953
1696
1697
1703
1155
1161
1162
1157
1158
1159
1160
614
619
620
623
1472
1474
1921
1985
1988
1922
1924
1923
1980
1992
1982
1983
1986
1987
866
1961
1964
1971
1963
1960
1970
1965
1972
1967
1968
50
3518
3519
3979
3980
54
3528
3529
489
531
839
3978
4060
4061
3981
3982
241
909
904
247
4095
249
906
285
891
900
918
279
911
881
883
326
328
896
332
892
922
945
262
934
928
272
291
311
307
5042
1976
1977
488
491
4090
4093
4094
498
514
517
127
2092
2096
2111
2112
2113
2024
2025
2093
4930
2097
2082
2083
2175
2100
2101
2084
2085
4940
2095
402
5177
5060
5016
2153
2154
2170
2171
4985
4991
5300
1993
1994
2098
2106
2107
2108
2109
188
189
190
707
3492
3441
3442
3342
3343
4574
4575
4576
4598
4599
3443
3452
3444
4588
4589
4590
4608
4609
3445
3446
4591
4592
4593
4610
4611
3403
3404
4606
4607
4602
4603
3806
4580
4581
4582
4604
4605
4583
4584
4577
4578
4579
4600
4601
3624
1701
5323
1702
3454
3459
3455
3470
3471
4954
5195
4968
4960
4958
4955
13
135
889
2015
2016
562
1459
1462
28
78
160
895
83
94
103
20
29
112
4781
27
32
98
340
100
141
1036
96
143
124
87
107
4787
95
993
1018
1037
1604
1606
133
139
2200
2212
455
1373
1363
1375
1049
5182
5216
1324
1361
1360
1371
1367
1368
1365
1369
1364
1372
4903
4904
4905
5091
4909
4908
4906
1549
5233
5236
5235
5234
1551
1550
5227
5239
5232
5230
5229
15
4794
858
126
411
742
1340
1341
137
168
171
18
948
39
4785
4784
4791
4824
4825
37
4786
943
864
940
882
74
4789
4788
4790
4778
4779
4780
4782
4783
153
155
4797
4798
4795
4796
33
114
116
4792
4793
65
893
134
3630
2198
295
4815
4830
4842
4846
4832
4834
4849
4957
4831
4833
4841
4843
4852
4853
4813
4814
4836
4838
4835
4837
210
4770
4848
4851
4850
4767
4768
4844
4845
4839
4840
878
879
884
1404
4777
961
5281
5175
969
971
974
4976
5302
5301
5087
5137
1215
1586
5021
5032
4882
4891
5031
5334
5072
5186
5231
5033
5187
5243
5249
4988
5168
5089
5001
5056
5226
4925
5248
4929
4928
4932
4937
306
868
840
441
5242
5247
1421
5346
5244
913
4707
4708
1440
1445
1446
1453
4975
5011
5184
5193
5325
1447
1448
4939
5107
5128
5130
5092
5377
5329
5110
5109
5096
5100
5101
5241
5253
5251
5102
5185
5378
5190
21
4858
4861
53
851
853
497
823
811
809
2027
2028
4856
4855
48
4859
4854
117
122
123
115
4857
4862
723
45
200
239
240
4757
4758
1426
246
521
656
631
243
273
244
275
259
646
647
628
634
598
651
655
553
1413
4914
5335
3590
3591
174
182
179
183
1030
2194
4322
4323
778
175
181
184
2172
2173
2176
2180
2178
2179
2213
4547
4393
4394
4892
4894
1881
3675
4759
3650
3652
1883
2188
2189
2174
2187
2195
2196
2184
2181
2183
4885
4889
2192
2193
2099
2190
2191
2014
2202
2186
2088
2102
2110
2103
2086
2087
5094
2089
2090
2091
2094
2889
2906
2890
2887
2905
2888
118
4413
4414
4415
4422
4423
4335
4336
2018
3799
2019
3800
4489
3762
3764
3798
3805
3796
3797
4407
4408
1164
4324
4325
5255
4411
4417
4412
4329
4330
4409
4410
4405
4406
4326
4327
4390
4392
3793
3794
5005
5245
5121
4973
4320
4321
355
1940
1931
1943
3502
5213
1910
1911
1451
1452
1458
1485
1494
1496
1488
4801
1490
1978
1984
1979
1981
1138
3526
3527
3524
3525
4563
4566
4565
3388
3699
3389
3700
4562
3701
148
4643
4642
4639
4690
4689
1146
1147
3449
4688
4632
4633
4634
4637
4644
4645
4646
4640
4641
807
1206
1181
1186
1207
1182
4678
4693
1140
1167
1190
1184
1197
1188
1201
1187
1204
1143
1144
1176
1820
4677
4681
4638
4674
4687
4651
4684
4652
4682
4683
4679
4680
4675
4676
1552
2248
2282
2283
2241
4881
2247
2275
1864
5095
1865
1867
5049
1866
2336
2387
2404
2388
2204
2246
2206
2249
2251
2230
2261
2284
2262
2205
2285
2207
4895
4896
2286
2288
2287
1929
1933
1935
1934
1938
1930
1932
1936
1939
1937
2203
2210
2211
2276
2277
297
2145
595
743
4860
5152
861
3631
3632
365
1874
3651
3647
3648
4702
4703
422
1872
1873
599
480
2177
3579
394
970
1229
3633
3634
4378
4379
4380
4382
2137
2138
2140
2139
2141
2142
715
3654
3660
3351
2250
3659
4822
4823
3653
3655
3670
3671
3668
3669
3656
3657
1175
3475
3663
3666
3667
3352
3649
3677
3676
3661
3664
3665
3277
3299
3283
3279
3296
3297
3280
3292
3289
3298
3291
3282
3293
3290
3278
3294
3295
3281
3284
3300
4612
4613
4614
4622
4623
4621
4624
4617
4618
4619
4620
4625
4626
4627
4629
4628
724
751
1179
752
1195
725
728
1171
1189
1202
1183
1196
1198
1168
1172
1169
1185
1174
1178
1219
1180
1191
1217
1230
615
617
4449
4451
4454
4457
4476
4477
4462
4463
4460
4461
4452
4453
4331
4332
624
626
4458
4459
4450
4471
4455
4470
4469
4468
4456
4467
4465
4464
242
506
490
512
966
509
539
494
544
554
551
888
555
890
1068
2077
2078
2071
2072
2068
2073
2074
2075
2076
2067
2081
2069
2070
2079
2080
2022
2026
2023
2032
2065
2029
2030
2031
2055
2033
2060
2059
2034
2061
2035
2062
2063
2040
2041
2036
2037
2051
2052
2038
2039
2053
2054
2047
2048
2044
2045
2046
2049
2050
2056
2058
2057
548
5054
550
5048
5057
1422
5051
4949
558
5052
559
4951
4948
4947
5053
5050
5047
5046
222
248
265
4357
4358
4364
4363
4365
4361
4362
4372
4370
4371
4368
4369
4366
4367
2526
2589
2590
2588
2591
2593
2592
2594
5078
5098
5082
2294
2298
2322
2323
2297
2321
2300
2295
2301
2299
2331
2302
2319
2303
2296
2304
2305
2320
2306
739
4209
4216
4883
4210
4212
4219
4220
4214
4215
4222
4223
4224
4206
4207
4208
4880
4211
4233
4217
4232
4230
4229
4234
4231
1860
4377
3344
4375
4383
4373
4381
4374
4474
4475
2405
4536
2406
4376
4213
4218
4221
4315
4316
4317
4318
4487
4488
3816
3817
3818
3821
3825
3822
630
3433
3434
3710
5237
5176
3435
4328
3436
762
3425
3426
819
3453
3468
3469
3464
3465
3456
3458
3457
3460
3461
3462
3463
3466
3467
998
1466
1000
1468
1454
1457
1469
1464
1465
1914
1915
1917
1916
4404
4510
1918
1919
1920
1958
2201
1974
1959
1956
1973
1957
2291
2292
2293
2330
2307
2310
2311
2312
2308
2313
2314
2315
2309
2332
2317
2318
2316
2395
2396
2328
2416
2417
2418
2420
2327
2412
2413
2414
2419
2415
2324
2397
2364
2365
2407
2408
2366
2367
2368
2325
2339
2363
2399
2400
2369
2398
2394
2326
2401
2338
2425
2428
2426
2427
2333
2384
2335
2385
2337
2409
2410
2402
2403
2386
2389
2390
2391
2392
2393
343
361
377
933
379
375
384
388
391
1809
1810
2941
2959
2960
2962
2944
2961
2945
2963
2965
2964
2942
2943
1801
2903
1802
2902
2904
2907
2971
2908
2967
2966
2968
2970
2972
2939
2940
965
968
2895
2894
2897
2896
2927
2946
2929
2932
2931
2933
2928
2938
2930
2891
2893
2892
2920
2921
2898
2901
2899
2900
1846
1847
1850
1851
2429
2909
3261
3262
2918
2924
2919
3257
3258
3259
3260
2916
2917
2912
2913
2910
2911
2914
2915
2922
2923
2334
2342
2359
2372
2373
2374
2340
2351
2349
2350
2421
2422
2341
2343
2357
2344
2353
2345
2355
2348
2352
2358
2370
2379
2378
2347
2356
2360
2383
2382
2346
2354
2361
2381
2380
2362
2377
2376
2371
2375
954
1107
1106
2116
2117
2217
2278
2279
2281
2280
2227
2228
2233
2234
2229
2256
2231
2232
2235
2237
2236
2238
2270
2244
2271
2245
2272
2254
3431
3432
2259
2267
2260
2268
2269
2257
2258
2239
2240
2253
2255
2242
2243
2118
2119
4942
4945
46
3577
3588
5345
101
931
1002
1659
3575
3586
5341
5161
5173
5163
5162
4965
574
924
1081
1092
602
635
3563
4083
3564
3565
3625
3626
158
159
169
172
201
3573
541
4879
4898
3585
5339
5338
1148
3594
3595
1083
1087
4931
4933
5348
4935
3574
5337
5336
3581
3582
3583
3584
1412
5344
4971
356
3589
3578
3580
360
403
404
1509
1536
1510
1529
1520
1521
1534
5106
1522
1532
1528
1531
5028
5055
1533
1523
5026
1524
1530
5015
5014
1526
1527
4977
5105
4981
5134
5136
4987
5027
4978
4984
4983
5076
5083
5024
5030
5029
5025
5037
5036
5034
1611
5111
5279
5108
5090
5077
5164
5114
5113
5104
1558
1561
1568
1570
1565
1567
1566
1621
5276
5278
1559
1562
1569
1571
1642
5260
1643
5259
5261
5270
5269
1615
1644
1639
1630
1637
1638
1631
1645
1649
1657
1647
1648
1620
1646
1656
1654
1653
1650
1651
1655
1652
5309
5314
5313
5316
5315
5290
5292
5291
5299
5298
5294
55
1076
1089
1091
842
526
1075
1079
3571
234
1072
513
1054
1070
1071
176
4900
119
121
125
162
368
370
99
629
632
675
671
673
677
274
1069
830
4922
5192
1014
1017
3768
3769
1436
3778
3779
3780
3781
1442
1444
1443
2017
3786
3787
3784
3785
3788
3789
1086
1088
1437
780
782
5207
5224
5223
5218
5380
5258
5222
5220
195
198
3550
3551
764
766
1116
1103
1104
389
4334
867
4558
886
912
3450
4567
4597
859
860
4472
4473
898
899
579
582
1003
1007
1011
4557
4564
711
2209
2208
1110
1111
4572
4573
4570
4571
252
335
996
1026
942
944
950
3686
3687
1109
1122
547
3598
596
4154
1098
1099
3570
5340
3572
399
838
580
3568
4569
3569
1058
561
1108
583
1112
771
790
785
804
786
787
791
793
799
795
796
797
236
3702
5376
3836
401
3847
3848
3837
3839
3840
3841
836
849
3838
3844
1218
4763
5208
592
1228
1216
1220
1263
348
350
4306
4307
4301
4302
439
4308
3831
3834
3835
3832
3833
545
4244
4245
577
3829
5002
1610
1628
353
3619
3627
3628
3483
3562
3622
5351
648
652
457
3521
3520
3398
3396
3416
3728
3481
3485
606
3692
3361
3693
3690
3691
4544
4545
3399
5117
3522
3523
3684
3689
3782
3783
702
3479
3478
3474
3473
3472
3476
3477
3685
3688
3814
3828
3826
3827
418
4516
443
4517
4518
3392
4504
4521
4529
4505
4506
3394
4535
3790
3791
3801
3802
828
834
832
845
921
847
920
831
833
4756
850
444
2671
1473
2658
2662
2677
2678
1409
2670
2673
2672
1408
2674
2669
1410
2660
2675
2676
556
3767
1142
1553
3374
3375
827
2657
4559
4561
4560
903
905
908
910
923
946
941
929
1871
608
4444
609
4445
4446
2197
2199
4447
4554
4448
4555
4556
3397
3773
3813
3770
3771
3774
3772
5386
3810
3809
3775
3776
3811
3777
1508
3953
3954
3955
3970
3956
3965
3967
3966
3962
3964
3963
3951
3952
3957
3968
3969
3960
3961
3958
3959
994
997
1020
1034
3437
1145
1153
3395
3438
3642
3643
3644
3645
3646
1609
1612
1616
1613
5266
5268
1617
1634
1633
1641
5262
5263
5273
5265
1618
1619
1622
1629
1632
1623
1624
1625
5274
5275
5271
5272
3531
3534
3535
3532
3536
3537
3533
3541
3545
3546
3543
3544
4938
5115
5118
5122
5129
5123
4990
5125
5368
5366
5357
5124
5363
5362
5132
3
390
5282
3480
543
206
1066
1095
1420
1660
1661
1691
486
493
584
4424
4434
4425
4426
1511
5181
5211
76
77
111
4159
4162
4163
4160
4185
4161
4186
4187
180
1424
1439
1425
1438
1338
1429
1431
1434
1423
1432
1441
1433
1435
1339
81
3639
3868
5067
323
380
3641
3640
2182
4867
726
2128
3905
3906
3907
85
88
109
4153
4082
4006
835
436
4069
1962
1966
3869
4077
4078
4091
4129
4130
4131
496
499
4084
4092
4071
4072
4018
4070
4121
4122
525
527
4116
4016
4019
4117
4118
4017
4020
4021
4043
4044
4045
857
3599
3604
3605
856
4150
4157
4158
3867
3870
3871
3931
3932
3933
3891
3892
3893
433
435
22
1878
4812
296
1901
2114
131
178
1094
1097
3703
3704
177
463
349
876
2218
2219
2220
2221
344
5159
378
1882
1082
1090
1096
446
3302
4671
4672
4673
973
1261
1269
1274
1275
1270
1272
1284
1285
1273
1286
1281
1279
1283
1282
1276
1280
1289
1288
1287
1352
1875
3509
1779
1313
1344
1326
1332
1345
1347
1346
1348
1349
1351
1350
1325
1329
1221
1224
1226
4014
4067
4015
223
232
233
102
104
136
128
130
144
1130
1131
213
132
156
138
151
157
209
503
1006
278
284
1084
1009
3513
3515
4943
4886
5210
732
4884
147
149
1941
1953
1945
1949
1946
1951
1952
1948
1954
1944
1955
1947
108
105
173
1262
164
167
300
3638
1406
1277
1690
3637
3672
877
1608
1319
1328
575
604
576
578
581
534
567
537
569
4997
770
774
885
776
788
4087
4088
290
294
23
2723
2724
1579
2719
2726
2722
79
86
219
2698
5200
3196
5201
221
855
837
2667
2668
986
2688
1450
987
2704
988
2700
2643
2699
2697
2703
2683
2684
2448
2687
2449
2642
2680
2681
2682
2701
2702
2750
2758
2760
2751
2752
2759
235
2538
2568
2569
3195
2570
2566
2567
204
229
1817
2513
3096
2514
2631
2973
2976
2969
2974
2977
2978
2711
2712
2738
216
2708
2706
2748
2749
2715
2717
2716
2725
2718
453
2731
2729
2435
2434
2436
1518
2629
2437
2439
2438
2544
3033
2545
1519
2430
2431
2762
2763
2630
2633
2632
1537
2728
2730
3226
2705
2707
2713
2714
1535
3115
3116
3085
3084
3086
3087
3114
3092
3121
3093
3126
3129
3128
3127
1848
3079
1849
3106
3107
3108
3109
3080
3110
3081
3082
3083
2634
2637
2635
2636
2848
2862
2861
2975
4750
2853
2856
2854
2855
2849
2852
2851
2850
2860
2517
2518
2761
2765
2767
2764
2766
227
2644
2648
2649
2646
4944
2645
2647
2685
2686
927
930
2650
2651
2735
2737
4953
2736
1489
1491
1525
2879
2881
2878
2880
2874
2875
2876
2877
2864
2709
2710
3071
3072
2863
2865
3255
3256
3253
3254
2866
2867
2868
2869
3251
3252
1540
3057
3070
3060
3036
3042
3058
3050
3069
3051
3059
3037
3040
3041
3038
3039
3066
3068
3067
3044
3064
3065
3055
3052
3063
3053
3056
3054
1415
1416
3198
3199
1539
3232
3233
3202
3203
3200
3201
3197
3225
3227
3210
3211
3228
3229
3230
3231
3204
3222
3205
3221
3223
3234
3235
3245
3246
2992
3214
2993
3213
3216
3215
3224
3244
3236
3237
3212
3217
3220
3219
3218
1133
1135
2806
2823
2796
2882
2883
2797
2805
2813
2814
2824
2885
2825
2826
3140
3177
2827
2830
2831
2828
2829
1538
3161
3162
3188
3155
3156
3187
3189
3148
3154
3152
3153
3151
3163
3164
3157
3158
3159
3160
3149
3175
3150
3180
3182
3181
2884
2886
3165
3179
3184
3186
3191
3192
3176
3190
3178
3166
3185
2788
2789
2807
2808
2809
191
1733
1740
1732
1736
207
1737
1730
1731
1734
1735
1738
1739
1741
1744
1743
1746
1747
1751
1745
1749
1750
1752
1742
1748
1762
1834
1835
1753
1754
1756
1755
1757
1758
1761
1759
1760
1233
1234
1836
2841
2445
1240
1237
2840
2843
2842
1300
1310
1307
1308
1316
1337
1302
1311
1305
1303
1306
1844
1784
1811
1812
1845
1789
1795
1841
1780
1782
1781
1783
1785
1231
1295
1297
1290
1298
1291
1299
1296
1318
1246
1249
1427
1503
1504
1505
1506
1514
1515
1769
1778
1842
1843
1763
1765
1774
1770
1773
1771
1776
1772
1777
1804
1807
1805
1808
1803
1806
2770
2771
2786
2803
2787
2802
2804
3269
3270
3265
3266
2772
2773
3267
3268
3263
3264
2832
2834
2833
2835
2837
2836
2774
2775
2782
2783
2778
2779
2776
2777
2780
2781
2470
2489
2488
2471
2472
2475
2476
2473
2474
2477
2478
140
268
2732
2733
2925
3032
2926
1516
1517
2521
2539
2573
2574
2619
2620
250
308
363
309
312
314
317
397
202
4761
358
367
2548
2617
2610
2616
2547
2615
329
2614
2546
2572
2496
2499
2612
2613
2500
2561
2504
3247
3248
964
967
2497
2501
2579
2498
2502
2512
2503
2580
2611
2618
2515
2516
2540
2542
2556
2562
2560
2557
2558
2541
2543
2621
2622
2559
2563
2600
2603
2601
2602
2578
2584
2583
2585
2595
2596
975
2506
2505
2525
992
2523
2527
976
983
2508
2528
2524
2507
2529
984
264
288
352
985
980
333
341
346
1484
990
981
991
359
982
989
280
282
423
1467
5157
5189
5008
5020
5180
5219
5097
5138
5074
5075
5088
5158
5196
5286
5285
1471
5148
5150
5166
1500
5039
5017
5040
4974
5252
5035
5041
5000
5038
5160
5043
5197
5349
5203
5199
5198
1177
1357
4934
4936
5225
5250
5254
5256
5257
5277
1252
1254
1253
3094
3132
4755
3133
4546
3134
3095
3098
3097
3099
3102
3101
3100
3111
3113
3112
3122
3125
3124
3123
1796
2981
3002
2446
2447
3003
3005
3000
3001
2982
2983
3004
3008
2984
2991
2985
3010
3011
3006
3020
3007
3009
3013
3018
3012
3016
2989
2990
2994
2996
2995
1255
1257
1259
1260
2064
4151
4752
4002
4148
4156
4754
2627
2628
1256
1258
1593
2625
2626
2433
2844
2845
2846
3135
2847
1592
2623
2624
1729
2432
1705
1706
4753
5289
5288
5287
5307
5310
4956
4959
4923
4924
2957
2958
2955
2956
2953
2954
3119
3120
2951
2952
3117
3118
1813
1814
1829
1818
1819
192
197
4319
4333
4478
199
251
4494
4495
4492
4493
4490
4491
738
765
841
4498
4499
1598
1599
1600
3830
4395
4416
3765
3792
4384
4387
3766
4398
4399
4391
4496
4402
4403
4400
4401
1151
3662
1575
1576
1588
1589
1590
4966
4967
4396
4497
4397
505
3741
3713
3739
3748
3761
3735
3745
869
1042
3746
3755
3759
5385
3736
3714
3743
3718
3747
3754
3400
4479
4480
3679
3815
3753
3760
3673
3674
3682
4701
3683
4694
4709
4710
4711
4700
4733
1595
1597
3706
5353
3712
5179
3709
3730
3711
5154
4980
4986
4969
5215
5058
1725
1728
1726
5328
5327
1727
4706
4728
4729
4440
4441
4385
4388
4730
4731
4698
4699
4695
4723
4724
4721
4722
4726
4732
4727
3725
270
2480
2481
2479
2483
2482
338
3027
3021
3022
3030
3031
3028
3029
3023
3024
801
825
926
935
818
820
826
919
925
35
49
1010
1041
1065
759
40
1063
64
1064
38
42
170
1001
1876
166
1067
1877
977
1154
1555
1005
1073
1013
1078
1029
1050
1052
1031
1222
955
1012
1074
1556
1557
1040
1061
1057
1039
1055
1051
58
1062
1722
1704
1707
1721
1720
1716
1715
1719
1688
1673
1717
1675
1678
1687
1682
1679
1676
1689
1686
1093
1658
1677
1718
1710
1681
1712
1680
1714
1711
1713
1684
1674
1709
1708
26
4864
749
750
821
808
387
1139
767
746
4868
61
89
3875
3910
3912
3911
3919
215
217
3908
3909
3917
3918
3915
3916
3913
3914
90
91
3896
3898
3897
3920
228
231
3894
3895
3903
3904
3901
3902
3899
3900
196
292
369
4000
327
331
336
337
277
281
1502
289
302
362
364
386
736
342
357
392
372
374
3795
4073
4074
4125
4126
4119
4120
4127
4128
4123
4124
4075
4076
2456
2461
2464
2462
2463
2457
2460
2458
2459
2486
2487
1315
1323
1386
1388
1327
1403
1317
1396
1320
1336
1392
442
471
469
470
502
466
481
473
475
484
468
507
482
510
477
478
479
1117
1119
1121
3167
3183
3170
3169
3168
3193
3194
4427
4428
4437
4435
4436
4432
4433
4429
4431
4430
4438
4439
4630
4631
4660
4663
4664
4661
4662
4658
4659
4691
4692
1998
1999
2000
2001
2011
2002
2012
2003
2004
2009
2010
2007
2008
2005
2006
3934
3935
3940
3945
3946
3943
3944
3941
3942
3936
3937
3939
3938
4339
4340
4345
4349
4350
4344
4348
4355
4353
4354
4351
4352
4346
4347
218
257
256
424
267
269
3807
3808
4586
4587
674
600
603
645
650
670
627
636
4594
4595
633
654
3390
3391
672
705
688
680
689
719
737
690
720
741
740
4550
4596
4585
4747
4748
721
731
722
684
676
679
687
255
3635
4927
3636
5116
1278
3600
3973
530
4012
1695
1627
4085
710
714
681
706
709
717
704
708
712
4826
4827
546
552
589
591
571
594
573
214
3884
565
566
3874
3883
3885
3921
3930
3926
3872
3877
3882
3886
3889
3890
3887
3888
3880
3881
3927
3879
3923
3873
3876
824
3922
3878
844
3924
52
3587
605
82
3576
549
1100
1102
3925
3928
3929
430
431
3487
3629
3621
3620
237
286
3348
564
3514
516
518
843
3758
3347
4533
4549
4522
4530
3320
3349
3345
532
568
563
533
536
538
451
4539
3734
4528
4532
3757
4509
4523
4551
4524
1594
1596
4534
4543
852
3338
3339
4537
4538
4548
4704
4705
4713
4714
4719
4720
4717
4718
4715
4716
3697
5358
3698
56
1173
2214
2215
3301
3340
3341
194
263
351
3721
3720
4531
203
5379
1271
382
3329
3861
3303
3304
3362
3415
5382
3393
3305
3306
4519
4520
4386
4389
3384
3385
3695
3482
3484
3729
400
3337
1662
3365
5383
3680
3372
3373
3486
3742
3491
3756
3737
643
3512
3511
3862
817
3489
3346
3310
3354
3321
3419
4982
4992
5085
5093
3330
3842
3843
639
641
1414
1418
4303
4299
4300
5006
5174
4298
1685
4296
5324
3309
3350
3308
3751
3727
3328
3359
3360
972
3696
3694
3744
5003
5004
560
3353
3334
3335
3332
3333
1223
1225
73
5330
75
3314
3315
70
4725
4507
4552
4483
4484
1603
1669
3763
5246
1683
1587
696
4712
4540
4541
3355
3356
3364
3420
3417
4568
5191
3427
3428
3430
3429
3380
3381
3715
3726
3752
4508
4525
4526
3382
3383
3418
3723
3493
3505
3506
3681
3733
3716
5384
4481
4482
607
610
621
3494
611
613
3307
3363
3311
5371
5370
3312
4304
4305
5367
3495
5375
5369
5365
3401
3406
3402
1129
1136
3496
3497
3498
3501
3530
3499
3500
483
3409
3410
3407
3424
3408
1879
1880
4979
4511
3368
1460
4310
3326
4311
3327
4309
4235
4312
4238
4239
4236
4297
4237
4243
3592
3593
205
4485
4486
3503
3504
345
3740
3731
3678
3738
3708
4920
5112
1411
1482
1626
4887
4899
5155
1513
3490
3707
5364
5356
3719
3750
5372
5354
1512
3319
5359
3749
5374
3386
3387
3717
5352
3947
3948
3705
5326
4989
3845
5202
3846
4241
4907
4911
4915
5119
5120
951
1200
956
1192
4993
1193
1203
957
1194
4912
1213
4916
5022
5013
1211
3357
3358
3623
3517
3516
420
447
3318
3369
3313
428
429
872
875
3324
3325
3316
3317
4527
4542
4553
4696
4697
4514
4515
4420
4421
4418
4419
4313
4314
3724
3376
3377
3413
3414
501
586
464
585
467
588
3370
3371
3378
3379
3405
3507
3508
3322
3323
4512
4513
1700
4743
4744
4902
1152
1156
4240
5045
5141
5084
5081
5071
5068
5061
5069
5070
5360
5147
5149
2652
2653
2134
2135
2136
2536
2537
665
667
2484
2485
4877
4878
637
638
2450
2451
2465
2466
347
373
354
939
395
1927
1928
4926
4998
4970
1815
1830
1831
2534
2535
47
51
5086
5099
5126
4995
5080
5079
5127
3976
3977
1828
1837
1838
2509
2510
4263
4264
4685
4686
4250
4251
4200
4201
298
299
2452
2453
4289
4290
161
163
4741
4742
93
146
5059
5066
5064
5062
4356
4359
4360
3025
3026
2263
2264
2816
2822
2818
2820
3819
3820
701
703
2492
2493
1827
1839
1840
1391
1397
1401
1393
1394
1402
1395
1912
1913
4337
4338
5135
5153
1105
3556
3557
1925
1926
4267
4268
432
434
1942
1950
1293
1301
4442
4443
1149
1150
4655
4656
4657
4665
4666
1577
1585
5133
5144
5151
5146
5145
5139
5143
5142
5140
5165
1314
1359
2554
2555
2936
2937
3047
3048
2656
2659
2661
2679
1141
3994
3995
2042
2043
3285
3287
2511
2522
1788
1794
1267
1268
4649
4650
1304
1312
474
4913
476
508
515
1852
1853
2689
2690
4745
4746
2519
2520
1477
1478
2273
2274
4897
4901
2691
2692
1572
1573
3971
3972
2768
2769
3138
3139
2020
2021
2838
2839
106
142
2265
2266
3208
3209
4635
4636
2947
2948
1123
1126
1127
1128
1591
2720
2727
2721
2423
2424
4917
4918
3145
3147
3146
3803
3804
3141
3142
3088
3089
24
25
3411
3412
4259
4260
3447
3448
4615
4616
3949
3950
3090
3091
5169
5194
5214
2798
2800
4287
4288
5170
5172
5205
5217
5204
5206
5212
5209
5171
1903
1904
3130
3131
753
783
5183
5221
381
385
2640
2655
2641
3849
3850
3538
3542
3034
3043
2490
2491
113
4866
727
4865
2949
2950
1199
1205
4242
4253
4255
4252
4254
1797
3017
1798
2987
3019
2988
1799
3014
1800
2986
3015
2997
4341
4342
4343
4261
4262
2552
2571
1381
1383
4227
4228
2663
2664
871
874
19
914
34
36
30
31
63
66
68
69
71
72
1786
1787
1791
1792
1790
1793
452
3602
1080
3601
3603
616
618
2532
2533
230
260
1242
1244
1265
1266
744
745
4963
4964
1449
1470
1764
1767
1768
437
456
462
458
409
445
2934
2935
1053
1077
4202
4203
3859
3860
3073
3074
3560
3561
4647
4648
3238
3240
3239
2638
2654
2639
2289
2290
2753
2755
2757
2754
2756
1227
1321
1322
3857
3858
772
789
3271
3272
1995
1996
1997
4738
4739
4740
3539
3540
1821
1825
1826
4919
4921
4079
4086
4080
4081
4089
4285
4286
2608
2609
4950
4952
2790
2791
2794
2795
2812
2792
2793
4734
4735
848
854
2598
2599
487
495
2442
2444
2443
978
979
2440
2441
3823
3824
3249
3250
3077
3078
622
625
4667
4668
4500
4501
3851
3852
2586
2587
2604
2605
2494
2495
2695
2696
1380
4799
4800
2734
2747
1208
1209
472
1235
1241
2743
2746
2550
2553
2744
2745
4820
4821
5
916
5304
5305
5306
4
915
3596
3597
3366
3367
3853
3854
3061
3062
2581
2582
4736
4737
3286
3288
2799
2801
3974
3975
686
756
2784
2785
2810
2811
2741
2742
2577
2597
4941
4946
1163
1165
1170
1581
1584
4204
4205
4653
4654
2606
2607
2739
2740
1309
1398
1399
1400
2979
2980
2564
2565
4256
4257
4258
2120
2124
2125
4248
4249
3173
3174
4265
4266
1124
1125
3035
3049
152
154
3045
3046
1331
1353
1376
1354
3103
3105
3104
425
426
427
662
664
666
669
692
3171
3172
3863
3864
1816
1832
1833
4225
4226
3855
3856
2815
2821
2817
2819
3865
3866
1428
1430
3136
3137
1635
1636
1640
2857
2859
2858
2873
2870
2871
2872
4669
4670
3206
3207
4772
4773
1210
1212
846
1480
1493
1501
1479
1481
3143
3144
3075
3076
2665
2666
4808
4810
1766
1775
1292
1294
1578
1582
4007
4011
4010
4008
4009
4502
4503
2693
2694
3275
3276
1580
1583
2454
2455
1382
1384
3241
3243
3242
1475
1476
4961
4962
4291
4294
4295
4292
4293
1822
1823
1824
2575
2576
4246
4247
2549
2551
3552
3553
3273
3274
2467
2469
2468
4888
4890
2468101214
node
ID
clustering index
Figu
reC
.34:
Arc
hite
ctur
alM
odel
-com
plet
ehi
erar
chic
alcl
uste
ring
dend
rogr
amof
the
LL
SR
eque
stH
andl
ing
subs
yste
m.T
hele
ftm
ost
node
sar
em
agni
fied.
107
C. FIGURES
53185361534349965355534251784999535051881543 936154453475297154851035240530852845280533153031545141715471542
8 11 97 145 9 932 12 150 887 937 9381419538153225317149953735319531152961483501852931507149814975131149214871486 1875167 1861541154653215320528352381495500950195010
14764 73329982999 253 2541118112035483549491053335044533211131115111411321134 8048054806 754 755476248111378222222232224221622252226480748094818481948024828482948034817 757 758 7611101 3963992 3983993212121222123212621273547356635673558355935543555
2 330 407 653 266 276 238 258 245 283 301 416 419 465 271 320 325 305 947 949 844863 310 313 334 57 649 540 7604872 794487448734875 792 803 805 110 8061724 520 784 798 80017231166 450 504 60 92 775101921042105 485 528 492 511 519 5231016 542 226 813487148704876 863 76311371214 287 339 383 366 371 376 303 304 5355012 6421554 644 601 640 76913341379134213431355 773133013741366137013561362133313581377 120 315 319 324 678 293 74810211060 99510041028 2241043 2251024 668 999 777 78148691008 7791035 693102510231022104410321045104610331574 318 3221047 768 81210591048 815 81010381056 208 220 454 699 747 814 698 700 658 659 718 6574893 683 734 694 660 735 661 695 685 713 716 6 7 14 829 917 17 165 185 62 671387 9071385139052285295 31619061607166410851663166515631605167216681602156416941692169316011560169816701671186316993421372235103732342234231027140714054771477513894766477416661667 460189218931890189118591896189919001908190919051907189418951897189819691975199119891990188418851888188618871889190221552158216021592167216821652166215721642169 10 16
2 4 6 8 10 12 14
node ID
clustering index
FigureC
.35:Com
ponentdefinitionbased
onthreshold
inhierarchicalclustering
(threshold=12)
108
Experiment analysis (section 4.4)
0 10 20 30 40 50 60 700
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Scattering
Rel
ativ
e fr
eque
ncy
(%)
g = 300
Figure C.36: D300 – scattering distribution for granularity level of 300
0 10 20 30 40 50 60 700
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Scattering
Rel
ativ
e fr
eque
ncy
(%)
g = 500
Figure C.37: D500 – scattering distribution for granularity level of 500
109
C. FIGURES
0 10 20 30 40 50 60 700
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Scattering
Rel
ativ
e fr
eque
ncy
(%)
g = 1000
Figure C.38: D1000 – scattering distribution for granularity level of 1000
0 10 20 30 40 50 60 700
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Scattering
Rel
ativ
e fr
eque
ncy
(%)
g = 3000
Figure C.39: D3000 – scattering distribution for granularity level of 3000
110
0 10 20 30 40 50 60 700
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Scattering
Rel
ativ
e fr
eque
ncy
(%)
g = 5386
Figure C.40: D5386 – scattering distribution for granularity level of 5386
111
C. FIGURES
g = 300 g = 500 g = 1000 g = 3000 g = 5386
0
50
100
150
200
250
300
350
400
Sca
tterin
g
Box plot Dg
A.
C.
D.
B.
Figure C.41: Box plot of five scattering distributios Dg with g=300, 500, 1000, 3000, 5386
112
0 500 1000 1500 2000 2500 3000 3500 4000 4500 50000
1
2
3
4
5
6
Granularity g
Med
ian
valu
e of
Sca
tterin
g
Figure C.42: Median value of the Scattering distributions for all granularities
0 500 1000 1500 2000 2500 3000 3500 4000 4500 50000
2
4
6
8
10
12
14
16
18
Granularity g
Mea
n va
lue
of S
catte
ring
Figure C.43: Mean value of the Scattering distributions for all granularities
113
C. FIGURES
0 500 1000 1500 2000 2500 3000 3500 4000 4500 50000
5
10
15
20
25
30
35
40
45
50
Granularity g
Sta
ndar
d de
viat
ion
valu
e of
Sca
tterin
g
Figure C.44: Standard deviation value of the Scattering distributions for all granularities
114
050
010
0015
0020
0025
0030
0035
0040
0045
0050
000
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.91
Gra
gula
rity
g
Relative frequency (%)
scat
terin
g in
dex
= 1
scat
terin
g in
dex
= 2
scat
terin
g in
dex
= 3
scat
tern
g in
dex
= 4
tota
l
S2
S3
S4
P2
P1
Figu
reC
.45:
Rel
ativ
edi
stri
butio
nof
scat
teri
ngin
dexe
s1,
2,3
and
4.T
hefig
ure
show
sth
eri
pple
effe
ctfr
omlo
wer
tohi
gher
gran
ular
ities
(S2→
S3→
S4).
Thi
sis
caus
edby
busi
ness
proc
esse
sge
tting
mor
esc
atte
red
over
the
impl
emen
ting
them
com
pone
nts
due
toan
incr
ease
dgr
anul
arity
.A
lso
show
nis
atu
rnin
gpo
inti
nth
esc
atte
ring
deve
lopm
ent(
P 1,P
2).
Atg
ranu
lari
tyle
vel3
650,
the
gran
ular
ityis
sohi
ghth
atth
esi
mpl
etw
oco
mpo
nent
busi
ness
proc
ess
star
tget
ting
split
115
C. FIGURES
0500
10001500
20002500
30003500
40004500
50000
0.5 1
1.5 2
2.5x 10
6
Granularity g
Effort
FigureC
.46:TotalEffortbased
onthe
Scatteringovereach
granularitylevel
116
3500
3750
4000
4250
4500
4750
5000
5000
051015202530
Gra
nula
rity
g
Cumulative percentual growth (%)
grou
p A
(15
0 <
s)
grou
p B
(80
<
s <
= 1
50)
grou
p C
(50
<
s <
= 8
0)gr
oup
D (
22
< s
<=
50)
grou
p Z
(
s <
= 2
2)
Figu
reC
.47:
Diff
eren
cein
grow
thof
the
scat
teri
ngbe
twee
ngr
oups
ofbu
sine
sspr
oces
ses
with
diff
eren
tcom
plex
ity(A
,B,C
,D,Z
).T
hehi
gher
the
busi
ness
proc
ess
com
plex
ityth
esh
arpe
rth
egr
owth
.T
his
grow
thsp
urt
ofth
esc
atte
ring
expl
ains
the
sudd
engr
owth
inth
eov
eral
lEff
orta
bove
gran
ular
ity42
50.
117
C. FIGURES
00.5
11.5
22.5
x 106
0
500
1000
1500
2000
2500
3000
3500
4000
4500
5000
Effort
g ~ Gain
g vs. Effort
ALL: sin8
ALL: x^2
LIMIT
: x^0.7
FigureC
.48:B
estapproxim
ationsthrough
regressionanalysis
ofthe
Gain/E
ffortrelationship.
The
limited
setof
points(L
IMIT
)for
Effort
<=
1.25x10
6has
aninfinitely
increasingapproxim
ation(x
0.7).H
owever
thetrend
inthe
complete
setof
datapoints
(AL
L)
isbestapproxim
atedby
afunction
with
am
aximum
(x2).
The
bestfittingapproxim
ationof
thew
holedata
setisthe
periodicsum
ofsin
functions.
118
00.
51
1.5
22.
5
x 10
6
0
1000
2000
3000
4000
5000
6000
Effo
rt in
dex
Reverse Gain
Figu
reC
.49:
Trad
e-of
fana
lysi
s-P
aret
ofr
ontie
r.T
heop
timiz
atio
nfu
nctio
nis
Min
∑E
ffor
t+(1−
Gai
n).T
heop
timal
solu
tions
are
the
ones
clos
erto
the
orig
inof
the
grap
h.
119
C. FIGURES
00.25
0.50.75
11.25
1.51.75
22.25
2.5
x 106
0.75
0.8
0.85
0.9
0.95 1
1.05
1.1
Effort index
Utility valuation
AB
LO1
LO2
LO3
LO4
LO5
LO −
Local Optim
umG
O −
Global O
ptimum
GO
FigureC
.50:O
ptimalsolutions
ofthe
trade-offanalysis
basedon
theG
ainand
Effortdata.
The
figureshow
sthe
globaloptimum
(GO
)and
severallocaloptima
inorderofvaluation
(LO1 ,LO
2 ,LO3 ,LO
4 ,LO5 ).
The
optimization
functionis
Min
∑E
ffort+(1−
Gain),as
shown
inC
.49.
120
−202468
x 10
−3
1st deriv (units Gain per 1 unit Effort)
00.
250.
50.
751
1.25
1.5
1.75
22.
252.
52.
5
x 10
6
−1
−0.
50
0.51
x 10
−7
2nd deriv
Effo
rt in
dex
LO3
LO2
GO
LO5
LO4
LO −
Loc
al O
ptim
umG
O −
Glo
bal O
ptim
um
A
Figu
reC
.51:
Trad
e-of
fana
lysi
s-r
ate
ofch
ange
arou
ndop
timal
solu
tions
121
C. FIGURES
Experiment future work (section 4.6.1)#5
f3()
f2()
f1()
Code
Architectural ModelService Model
Figure C.52: Experiment future work - Service Model mapped to the Application Modellegacy components#6
Service Model
Architectural Model
Business Model
Figure C.53: Experiment future work - Business Model mapped to the Service Model usingthe Application Model
Business Model
Architectural Model
Code Implementation
Architectural Model
Code Implementation
1:1
AS-IS TO-BE
Strategies
Service Model
Figure C.54: Experiment future work - building the to-be architectural model using SCAconcepts based on an 1-to-1 mapping of the services and the implementing components
122
C. FIGURES
Figures-IX. Future work (section 6.3)
Difference Identification
Difference Identification
Difference Quantification
Difference Quantification
AS-ISSystemAS-ISSystem
TO-BESystem
TO-BESystem Modernization
Towards accurate effort estimation
Model extractionModel extraction
SOA specificationSOA specification
Nominal scaleOrdinal scale
Interval scale
Ratio scale
Portfolio AnalysisPortfolio Analysis
RIC OSRCOSLC
OSLC
ImplementImplement
DeliverDeliver
DeployDeploy
[Cost Model]
[Relation Models][Decision Model]
Trade-off analysisTrade-off analysis
......
...
RIC - Reengineering Investment CostsOSRC - Operations and Support Costs for
Reengineered ComponentsOSLC - Operations and Support Costs for
Legacy Components
Figure C.56: Future work overview. There are several phases in the process of reaching aneffective effort estimation. This research has focused on the Difference identification phase.There are however other phases that need further work such as portfolio analysis, decisionmodels and the cost estimation base on the effort
124
Top Related