INFORMATION TECHNOLOGY

 


Semantic Designs, Inc.

Software Re-Engineering Tools for Automating Design Upgrade


Software upgrades, new tools, and the latest software bundles are so commonplace that computer users do not realize how much money and effort go into these developments. In the mid-1990s, software engineers had to spend an enormous amount of time and money to understand how a software system worked by studying the details of its implementation in each line of code (LOC) before deciding what changes were needed and how to implement them. Automating this process by using a software engineering tool would cut development time, save valuable labor resources, and improve the quality of the software system.

Semantic Designs, a small start-up company in Texas, proposed to create a software design maintenance system (DMS) as the first step in developing that engineering tool. Semantic Designs projected that using DMS would reduce software maintenance costs by a factor of two or more and would save approximately $2 billion annually.

The company’s vision for DMS involved considerable technical risk because, in 1995, no software tool existed to capture and modify a formal software design. Furthermore, no software reengineering system processed more than a few thousand LOCs; DMS was intended to overcome both limitations. Most significantly, company researchers planned to analyze software code in the same way that the human brain processes the meaning of a sentence, through a process called semantic analysis.

Semantic Designs was unable to secure sufficient capital to proceed with their project. Therefore, they applied for funding from the Advanced Technology Program (ATP) under its “Component-Based Software” focused program in 1995. The three-year ATP-funded project began in December 1995.

At the completion of the project in 1998, Semantic Designs had successfully developed its first significant DMS software technology and had validated a toolkit for the automated conversion of very large legacy systems. The Semantic Designs team developed a new programming language, called PARLANSE, for implementing DMS. Both DMS and the toolkit based on it have performed very well since 1998, and companies such as Northrop Grumman, Boeing, Raytheon, and Rockwell have used this automatic analysis tool to great success.

 COMPOSITE PERFORMANCE SCORE
               (based on a four star rating)
                 * * * *

Research and data for Status Report 95-09-0059 were collected during January – March 2006.

 

 

Software Re-Engineering Is Tedious and Costly

Software maintenance, which involves migrating and upgrading software applications, as well as complying with current industry requirements, is a major software engineering cost for businesses. In 1995, maintenance accounted for 50 to 80 percent of the $160 billion spent annually on software development in the United States. Yet there was little theory and there were few standard tools to automate the process. Legacy software systems typically lacked critical documentation about their design and architecture. Capturing software design and architecture information through a tool would not only help the engineer’s task, it would expedite the process and save a company valuable labor resources. The tool could also enhance the quality of the upgraded software, because manual software maintenance, which relied on informal design information, was generally prone to human error.

A typical legacy software system would successfully deliver results over a long period and could perform better and more efficiently if it could be adapted to the changing needs of the organization. In 1995, software engineers often had no proper documentation describing how the code implemented the original tasks for which the software was intended. For example, if engineers had to upgrade a legacy system to meet new specifications or migrate it to a newer system, they would first have to decipher the legacy code to identify and remove unusable code and then add new code. Most large-scale legacy systems would have one million or more lines of code (LOCs), equivalent to about 16,000 pages of text. If the engineer had to manually change several words and phrases in some of these million-plus lines, it could take several years to identify the applicable lines and how they interacted with the rest of the system. Once completed, it would be another lengthy project to check for human error. It was not economically feasible to undertake such a huge task unless the process could be automated.

In 1995, lacking an automated process to upgrade legacy systems, software engineers would often build new systems and abandon the legacy system, or they would continue with the old system, ignoring the need to upgrade. Both of these “solutions” to the problem of legacy system obsolescence were costly and inefficient. A more cost-effective resolution would be to use an intelligent “search-and-change” tool, a kind of extremely smart word processor, which would do the job more efficiently and more thoroughly in a much shorter timeframe.

Semantic Project Targets Automation of
Software Re-Engineering

Semantic Designs was a start-up company in Austin, Texas, with plans to create a software design maintenance system (DMS) that would be the basis of a search-and-change tool for software code. The company’s goal was to develop this tool for automating the analysis and upgrade of the legacy architecture in a software system, leaving engineers free to work on the design and construction of additional features of the software. DMS would be like a giant processor that could work with any software and multiple programming languages. Semantic Designs researchers planned to analyze software code in the same way that the human brain processes the meaning of a sentence. This process is called semantic analysis.




Through semantic analysis and the use of a new programming language, the DMS tool would be able to manipulate legacy system code and adapt it to the engineer’s concept of upgrades and additions, thus helping to “write” a better and newer version of the software in a much shorter time. For example, if one used a word processor to write the sentence, “Colorless green ideas sleep furiously,” the grammar and spelling checkers of the word processor would point out errors, if any, as one typed the sentence. But the word processor would not point out that the sentence was meaningless. On the other hand, if the word processor had semantic analysis capability, it could understand the meaning of each word, analyze word usage in the sentence, and conclude that the sentence was meaningless. That is, the word processor would be performing a sophisticated analysis on behalf of the writer, saving valuable time and resources. If this word processor worked in only one or two languages, say French and English, or if it could only analyze sentences of a definite length, its application would be limited. But if it worked in any language and with any sentence length, the application’s value would be considerably higher. The proposed DMS tool was very similar to this advanced word processor example.
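The gap between grammatical checking and semantic checking described above can be illustrated with a small sketch. It is not how DMS works internally (the report does not describe that); it is a toy, written in Python with the standard `ast` module, in which the helper names `check_syntax` and `semantic_problems` are hypothetical. The statement `total = "green ideas" - 7` is grammatical, like the example sentence, yet it has no sensible meaning because text cannot have a number subtracted from it.

```python
import ast


def check_syntax(source):
    """Return True if the source is grammatically valid (it parses)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def semantic_problems(source):
    """Flag subtractions between literal values where one side is text.

    A deliberately tiny 'semantic' check: it inspects only literal
    constants and only the subtraction operator.
    """
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.BinOp) and isinstance(node.op, ast.Sub)):
            continue
        left, right = node.left, node.right
        if isinstance(left, ast.Constant) and isinstance(right, ast.Constant):
            if isinstance(left.value, str) or isinstance(right.value, str):
                problems.append(
                    "line %d: cannot subtract %s from %s"
                    % (node.lineno,
                       type(right.value).__name__,
                       type(left.value).__name__)
                )
    return problems


if __name__ == "__main__":
    statement = 'total = "green ideas" - 7'
    print(check_syntax(statement))       # the grammar check accepts it
    print(semantic_problems(statement))  # the meaning check objects
```

A grammar-only tool stops at the first function; a tool with even this minimal semantic knowledge can report that the statement cannot mean anything useful.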

Huge Spillover Effect on Economy Is Expected

Improved methods of software maintenance could have a wide-ranging positive impact on the U.S. economy. Semantic Designs projected that by using the DMS tool, U.S. businesses could reduce software maintenance costs by a factor of two or more, saving approximately $2 billion annually. But the savings did not have to come only in upgrading legacy software systems; DMS could be used to automate part of the process of building new software and validating new code. Software components processed by DMS could also be shared across multiple software projects, thus reducing cost and raising quality. The net spillover effect of this technology on the U.S. economy would be significant only if Semantic Designs completed the research within a short time and commercialized the emerging technology soon thereafter. To accomplish this, it was imperative that the company garner outside funding.




The success of DMS would depend on the implementation of the research team’s vision of an integrated architectural system. In 1994, there were no systems to capture a formal software design or modify it. Few software reengineering systems could handle more than a few thousand LOCs. Semantic Designs planned to use DMS to process 100,000 lines or more, and its main mission of automation was based on extracting and modifying software design. Most significantly, the researchers planned to use semantic analysis to understand software code in the same way that humans understand the meaning of a sentence [1]. Successful development and implementation of the DMS concept would result in a significant technological breakthrough, but there was also a substantial chance for failure.

ATP Funds DMS Research

Venture capitalists and other investors were unwilling to fund a research project with such a high level of technical risk. Therefore, the company applied for and received funding under ATP’s “Component-Based Software” focused program in 1995. Their proposed project objectives were in line with the focused program’s emphasis on developing technologies to reuse software code and automate software development. ATP funding would enable Semantic Designs to vigorously pursue research and overcome substantial technical risks within a short timeframe.

On the day Semantic Designs officially received the ATP award in 1995, they had an empty office and no hardware or software to use in implementing any part of their research plan. They quickly procured the infrastructure and started research. From the outset, the Semantic Designs team realized that their work required a programming language that could handle the computation of large-scale software systems and run in parallel on multiprocessor workstations. Since no such parallel programming language existed at the time, Semantic Designs created PARLANSE and built a DMS processor around it. Typically, legacy software systems ran on multiple computers throughout an enterprise and covered a variety of operating languages, processors, and platforms. A crucial question was whether PARLANSE could actually scale to handle diverse software configurations simultaneously. By March 1996, the researchers had stabilized the first version of PARLANSE. It took them one year and extensive research to ensure that PARLANSE would work with Pentium-compatible code and UNICODE character systems.

After that breakthrough, Semantic Designs was successful in refining PARLANSE and making it work with the C++ programming language. But lacking adequate equipment resources, they still could not test all features of PARLANSE. At the end of 1997, when they could afford an eight-way multiple processor, the team was finally able to test PARLANSE over six to eight parallel processors. They were very satisfied with the results. Soon after, they also built a graphic interface for DMS on the Windows NT platform. In 1998, the Semantic Designs team built a high-performance PARLANSE compiler and revised the PARLANSE task manager to operate in large-scale systems. By the end of the ATP-funded project, the research team had developed DMS technology, had created the PARLANSE programming language, and had built a set of prototype tools based on DMS. PARLANSE was the first parallel programming language to work simultaneously on multiple processors and handle one million or more LOCs. The DMS tools with PARLANSE capability made the task of re-engineering large legacy systems quick and efficient.

[1] http://www.semdesigns.com/Company/Publications/Legacy%20Transformation.pdf

DMS Handles Large-Scale Systems

If the DMS vision was to have significant economic impact, it had to achieve the following:

·         Operate on software systems of considerable scale, handling as many as 1 million LOCs

·         Store the basic design data of a large software program

·         Have sufficient computational capacity to process these data within a reasonable period of time

·         Implement the work in parallel on high-end workstations

DMS achieved these goals, signifying a technological breakthrough for the Semantic Designs researchers. Indeed, PARLANSE was by itself a significant success and remains highly effective as of 2006. Using DMS, a software programmer can revise software systems and effect custom changes based on rules; using PARLANSE, DMS can automatically distribute this task evenly across a set of processors with 80 percent or higher efficiency. This is the single biggest advantage of PARLANSE. The project researchers wrote 14 journal papers and made several presentations at workshops and international symposia.
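The report does not define "efficiency." A minimal sketch follows, assuming the conventional meaning: speedup (single-processor time divided by parallel time) divided by the number of processors. The function name and the timings are hypothetical and chosen only to illustrate the 80 percent figure on an eight-processor machine.

```python
def parallel_efficiency(serial_seconds, parallel_seconds, processors):
    """Conventional parallel efficiency: speedup divided by processor count.

    Assumption: this is the sense in which the report uses 'efficiency';
    the report itself does not give a formula.
    """
    speedup = serial_seconds / parallel_seconds
    return speedup / processors


if __name__ == "__main__":
    # Hypothetical timings: a job taking 800 s on one processor
    # and 125 s when spread across 8 processors.
    efficiency = parallel_efficiency(800.0, 125.0, 8)
    print(f"speedup = {800.0 / 125.0:.1f}x, efficiency = {efficiency:.0%}")
```

Under this reading, 80 percent efficiency on eight processors means the job finishes about 6.4 times faster than on one processor, rather than the ideal 8 times.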

Prototype Toolkit Covers Several
Programming Languages

The Semantic Designs researchers developed a prototype toolkit during the ATP-funded project based on the DMS technology. This toolkit was designed to implement the DMS technology on a legacy system for software upgrade and migration. The tools in the kit covered various programming languages, such as C, C++, COBOL, FORTRAN, HTML, Java, SQL, and Visual Basic, and were intended to make software engineering quick and efficient. While these tools could be customized to suit the specifications of a legacy system, the broad categories were the formatter, source file browser, test coverage tool, profiler tool, clone detector, migration tool, and source code obfuscator. As of 2006, the company has generated enough revenue from this product to pay off the expenditure incurred for the initial commercial launch of the DMS toolkit. The two tools from this DMS toolkit described below illustrate the toolkit’s utility.

Clone Detector Tool: Most legacy software systems contained large amounts of duplicated, and hence redundant, code fragments called clones. Detecting, grouping, and ultimately removing these clones during a software upgrade was critical. The clone detector tool identified and grouped this code from various parts of the legacy system and helped remove redundancy in the software architecture. This tool was initially validated by processing 77,000 lines of COBOL code during the ATP-funded project, but it has also proved successful with other programming languages and with very large numbers of LOCs.
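The idea of detecting and grouping clones can be sketched in a few lines. The example below is not the DMS clone detector; it is a simplified stand-in that uses Python's standard `ast` module to find statements whose syntax trees are exactly identical, ignoring layout and line position. The function name `find_exact_clones`, the `min_nodes` threshold, and the sample code are all hypothetical.

```python
import ast
from collections import defaultdict

SAMPLE = '''
def net_price(price, rate):
    tax = price * rate / 100
    return price + tax

def gross_price(price, rate):
    tax = price * rate / 100
    return price + tax + 1
'''


def find_exact_clones(source, min_nodes=5):
    """Group statements whose syntax trees are identical.

    Returns a sorted list of groups; each group is the sorted list of
    line numbers where one duplicated statement starts. Statements with
    fewer than `min_nodes` tree nodes are skipped as too trivial.
    """
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.stmt):
            size = sum(1 for _ in ast.walk(node))
            if size >= min_nodes:
                # ast.dump omits line/column data by default, so two
                # statements with the same structure get the same key.
                groups[ast.dump(node)].append(node.lineno)
    return sorted(sorted(lines) for lines in groups.values() if len(lines) > 1)


if __name__ == "__main__":
    for lines in find_exact_clones(SAMPLE):
        print("duplicated statement at lines", lines)
```

In the sample, the tax computation appears in both functions and is reported as one clone group, while the two differing `return` statements are not. A production tool would also need to handle near-miss clones and multi-statement sequences, which this sketch does not attempt.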

Source Code Obfuscator: Obfuscation, or changing the appearance of the software architecture to hide the real code, is critical to protect the exclusivity of a software technology. Hence, the source code obfuscator was an important tool in the DMS toolkit. This tool analyzed the software source code and generated a protective “cover” that would be very hard to break. Any software programmer trying to uncover the source code unethically would be lost in the mesh of code on the surface of this cover and would not be able to reach the real source code. Semantic Designs protected its own technology and intellectual rights using the obfuscation technique rather than pursuing patents.
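The report does not say which transformations the obfuscator applied. One common technique, shown here purely as an illustration and not as Semantic Designs' method, is to replace meaningful identifiers with opaque ones while preserving behavior. The sketch uses Python's standard `ast` module (`ast.unparse` requires Python 3.9 or later); the class `RenameLocals`, the function `obfuscate`, and the sample are hypothetical, and the toy handles only simple functions with positional calls.

```python
import ast

SAMPLE = '''
def total_with_tax(price, rate):
    tax = price * rate / 100
    return price + tax
'''


class RenameLocals(ast.NodeTransformer):
    """Replace parameter and assigned variable names with opaque names.

    Function names are left alone so callers keep working. Toy only:
    it does not handle keyword-argument calls, globals, or nested scopes.
    """

    def __init__(self):
        self.mapping = {}

    def _opaque(self, name):
        if name not in self.mapping:
            self.mapping[name] = "v%d" % len(self.mapping)
        return self.mapping[name]

    def visit_arg(self, node):
        node.arg = self._opaque(node.arg)
        return node

    def visit_Name(self, node):
        # Rename a name when it is being assigned, or when it was already
        # seen as a parameter or local; builtins are left untouched.
        if isinstance(node.ctx, ast.Store) or node.id in self.mapping:
            node.id = self._opaque(node.id)
        return node


def obfuscate(source):
    """Return source text with local identifiers scrambled."""
    tree = RenameLocals().visit(ast.parse(source))
    return ast.unparse(tree)


if __name__ == "__main__":
    print(obfuscate(SAMPLE))
```

The obfuscated function computes the same result as the original, but the names `price`, `rate`, and `tax` that documented its intent are gone.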

Clone Detection Becomes Important
Revenue Generator

Semantic Designs sells clone detection mostly as a service but sometimes as a product; their charges depend on the size of the source code base to be processed. Typically, they sell their service to two or three large companies each year, generating about $150,000 in revenue. One company used Semantic Designs’ clone detection services on 500,000 lines of Java-based code over a 6-month period. They managed to shrink the source code base by as much as 25 percent and cut maintenance costs significantly. “We are still trying to get companies to listen to this kind of story,” said Dr. Ira Baxter, the company CEO, as he engaged in negotiations with a division of the U.S. Army for the clone detection service.


 

Although Semantic Designs has successfully promoted these tools for cutting the cost of software re-engineering, computing the actual dollar savings accruing to a company that uses these tools has been difficult. Using standard industry estimates, Semantic Designs calculated the savings as follows: On average, it costs $1 per LOC per year to maintain code; assuming the code life is about 10 years, it will cost $10 to maintain each LOC over its lifetime. Saving 25 percent of 400,000 LOCs would mean removing 100,000 lines from the source base. At $10 of maintenance cost per line, this should result in savings of $1 million. Semantic Designs charges $0.03 per LOC for clone detection; it costs another $0.50 to remove each clone manually. This adds up to $65,000 ($15,000 to find and approximately $50,000 to remove) total cost for clone detection and removal from a legacy software system with 400,000 LOCs. Comparing this cost to the potential $1 million savings in maintenance cost indicates that the return on investment could be significant.
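The calculation above can be reproduced as a short worked example. The function and parameter names are hypothetical, and the sketch assumes the $0.50 removal cost applies per removed line, which is what the $50,000 figure implies. Applying the stated $0.03 rate to 400,000 LOCs gives a detection fee of about $12,000; the report's $15,000 figure corresponds to 500,000 LOCs at the same rate, the size of the Java code base mentioned earlier, so the total below comes to about $62,000 rather than $65,000.

```python
def clone_removal_economics(total_loc, clone_fraction,
                            maintenance_per_loc_year, code_life_years,
                            detection_fee_per_loc, removal_cost_per_loc):
    """Estimate lifetime savings and one-time cost of clone removal.

    Assumption: removal cost is charged per removed line of code.
    All amounts are in dollars.
    """
    removed_loc = total_loc * clone_fraction
    savings = removed_loc * maintenance_per_loc_year * code_life_years
    detection_cost = total_loc * detection_fee_per_loc
    removal_cost = removed_loc * removal_cost_per_loc
    total_cost = detection_cost + removal_cost
    return {
        "removed_loc": removed_loc,
        "savings": savings,
        "detection_cost": detection_cost,
        "removal_cost": removal_cost,
        "total_cost": total_cost,
        "savings_to_cost_ratio": savings / total_cost,
    }


if __name__ == "__main__":
    result = clone_removal_economics(
        total_loc=400_000,
        clone_fraction=0.25,
        maintenance_per_loc_year=1.00,
        code_life_years=10,
        detection_fee_per_loc=0.03,
        removal_cost_per_loc=0.50,
    )
    for name, value in result.items():
        print(f"{name}: {value:,.2f}")
```

Either way, the estimated lifetime savings exceed the one-time cost by more than a factor of fifteen, which is the point the report is making.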

There are other benefits from using the clone detector tool. For instance, in the case of one company, Semantic Designs demonstrated that a clone occurred as many as 450 times within 6 LOCs in a legacy system. Since the clone had a “bug,” reuse of the LOC spread the bug. Removing the clones removed all occurrences of that bug within those LOCs, thus enhancing quality. This kind of indirect positive effect is harder to quantify, according to a company representative.




Since 2004, the company has been selling its source code obfuscator as an individual product, pricing it around $150 to $1,500 depending on its compatibility with different computer languages. Demand for this tool has been growing steadily, and, as of 2006, the revenue generated from the obfuscator has paid for Semantic Designs’ entire online advertising expenses.

Web service providers are potential customers of the source code obfuscator. When an online service company builds web sites with JavaScript code, some of the JavaScript has to be transmitted to the customer's web browser to operate the web service. These companies can use Semantic Designs’ tool to obfuscate the code and protect it from reverse engineering. Other potential customers are electronic circuit designers who send circuit designs of digital electronic systems to their customers in a special computer language called Verilog. Using the source code obfuscator, they can scramble the circuit design and avoid reverse engineering.

B2 Bomber Legacy Software Converted Using
DMS Technology

After the ATP-funded project ended in 1998, Semantic Designs submitted two proposals to a government agency to implement DMS further; both were turned down for lack of funds at the agency. Faced with these rejections, the company felt “an absence of identifiable paths to exhibit the ATP-funded technology to other government agencies,” observed a company spokesperson. However, in the same year, Semantic Designs won a contract from Rockwell Software to build a translator, based on DMS, for updating a proprietary software language. This was their first significant attempt at the commercialization of the DMS technology. By 2000, the company had earned more than $5 million in business directly attributable to the DMS technology. That same year, Semantic Designs converted a mission-critical software package belonging to Northrop Grumman used in B2 bombers. This software was coded in a legacy language and had to be migrated to C. DMS automatically converted the software, which then passed the extremely rigorous in-house systems tests before being released for integration in the B2 bomber system. Semantic Designs completed a similar migration of the legacy software system for F16 airplanes for Raytheon.

Semantic Designs also won a Boeing contract in 2002. Boeing used a software system called Boldstroke, designed in 1990, to operate an estimated 6,000 avionics components in 18 different airplane frames. Boeing engineers later chose another software system called Common Object Request Broker Architecture (CORBA) that was mandated by Government standards for airplane frames. But Boldstroke was not compatible with CORBA. Manual conversion of each component of Boldstroke to the CORBA system would take a full month’s labor. With 6,000 such components, manual conversion would require about 6,000 person-months (roughly 500 person-years), which made it effectively impossible. With DMS, the project could be planned, implemented, and validated by Semantic Designs in approximately two years. However, this was a proof-of-concept: “We had originally planned on converting 60 components to help Boeing demonstrate their software running in UAVs (unmanned aerial vehicles). For lots of complicated reasons, they ended up using our tool to convert just two components, and those components did end up controlling a real UAV in a rather spectacular demonstration made for the armed forces last April [2005] at White Sands. But Boeing isn't committed to actually converting the larger bulk of the components at this point; we've just demonstrated that it would be possible at very economical costs,” said a spokesperson from Semantic Designs.

A cost-benefit evaluation was important when analyzing the scope of Boeing’s migration. The team of developers from Semantic Designs and Boeing who worked on this migration project wrote an article on DMS application in the May 2005 issue of Crosstalk [2]. They said, “The measure of success is not whether a migration tool achieves 100 percent automation, but whether it saves time and money overall. Boeing felt that converting 75 percent of the code automatically would produce significant cost savings, a good rule of thumb for modest-sized projects…The code produced by the BMT [Boeing Migration Tool that was built with DMS to implement the migration] was 95 to 98 percent finished.” Thus DMS exceeded Boeing’s expectations.

The massive automated analysis and transformation of legacy systems like Boeing’s or Northrop Grumman’s have quantitative pay-off and qualitative benefits in terms of better and more efficient software. For instance, if Boeing intended to convert only two components in its Boldstroke program, they could have done it manually; to convert all major components of Boldstroke, they would need to automate the process. Similarly, failure to convert the legacy code in the B2 bomber software system would ground the airplanes. At the same time, Northrop Grumman faced a massive migration task to upgrade its archaic software system. In both cases, process automation was the only feasible solution. As a result, Semantic Designs projects that the demand for their DMS-based products will grow.

The company has actively marketed DMS tools as (1) a toolkit that large organizations can purchase to build their own custom tools, (2) a technology service for Semantic Designs to implement custom tools, and (3) a basis for developing standard software engineering tools. In 2006, Semantic Designs also won significant contracts for building the following:

·         Vector compilers using DMS for the vector super microprocessors

·         Large-scale embedded C system re-engineering

·         Boxed products for software tools in more than 20 programming languages

The company anticipates annual business growth from DMS tools of about $1.4 million from 2006 to 2008. In five years, they project the company’s annual revenues will reach about $10 million and expect to double it to $20 million within a 10-year timeframe. “The DMS tool set is the principal driver of the value proposition by providing highly automated mass analysis or change on large-scale software systems,” said the company CEO. He added, “It isn't a three- or five-year head start. Frankly, without the [ATP] funding, the DMS tools would simply not exist.” 

Conclusion

Semantic Designs applied for funding under ATP’s “Component-Based Software” focused program in 1995 to develop a technology for automating software engineering tasks. Their objective was to build a design maintenance system (DMS) to automate the capture of legacy software system architecture and to manage design changes for its migration to another system. The three-year project ended with the successful development and validation of the DMS technology and toolkit, along with the creation of PARLANSE, a parallel programming language. Semantic Designs’ proprietary system made upgrading of legacy systems technologically and economically feasible. As of 2006, Semantic Designs continues to serve large companies like Boeing, Rockwell Software, and Northrop Grumman with the DMS technology and has sold the DMS tools to many others.

[2] http://www.semdesigns.com/Company/Publications/CrosstalkArticle/CrossTalk-05-2005.html



PROJECT HIGHLIGHTS
Semantic Designs, Inc.

Project Title: Software Re-Engineering Tools for Automating Process (Automating Legacy Systems of Software: Design Maintenance System)

Project: To develop a software design maintenance system (DMS) for automating design capture and migration of legacy software with minimum interaction from engineers.

Duration: 12/1/1995 - 11/30/1998
ATP Number: 95-09-0059

Funding (in thousands):
 
ATP Final Cost            $1,915    92.7%
Participant Final Cost       150     7.3%
Total                     $2,065

Accomplishments: Semantic Designs accomplished all their technical goals within the timeframe and plan set forth in their proposal to ATP:

·          Developed and validated the design maintenance system (DMS) technology for automation of code conversion for legacy software systems

·          Developed a prototype DMS toolkit compatible with several programming languages

·          Successfully developed and validated two tools: clone detector tool and source code obfuscator tool

·          Created a parallel programming language, called PARLANSE, for use on multiple large-scale processors in parallel

·          Developed PARLANSE as the first parallel programming language to handle more than one million lines of code compared to a few thousand lines

·          Introduced semantic analysis in the task of software re-engineering for greater speed and higher quality

Commercialization Status: Semantic Designs developed a prototype toolkit based on DMS for automating the extraction, analysis, and migration of source lines of code in a software legacy system. After the ATP-funded project ended, they successfully implemented a business plan to sell this toolkit and related services to several defense contractors like Rockwell, Northrop Grumman, and Boeing. By the end of 2005, Semantic Designs had won and implemented contracts worth $5 million. They expect to grow their business by approximately $1.4 million annually through 2008, and since the product and service lines are entirely based on DMS, this business growth is an outcome of the success of DMS. The company has also generated enough revenue to pay off the expenditure incurred for the initial commercial launch of the DMS toolkit.

Outlook: The business outlook for Semantic Designs and its DMS technology is strong. The DMS toolkit has been a good revenue generator for the company. Many organizations have legacy systems that urgently need migration, and the DMS toolkit is the only one of its kind on the market today. As of 2006, Semantic Designs was seeking additional business in defense and commercial sectors. 

Composite Performance Score: * * * *

Number of Employees: 5 employees at project start, 8 as of January 2006

Focused Program: Component-Based Software, 1995

Company:
Semantic Designs, Inc.

12636 Research Blvd., Suite C214

Austin, TX 78759-2200

 

Contact: Ira Baxter

Phone: (512) 250-1018

 

Publications: Project researchers shared their findings through the following publications:

 

·          Baxter, I., and C. Pidgeon. “Software Change through Design Maintenance.” Proceedings of the International Conference on Software Maintenance, IEEE, Bari, Italy, pp. 250-259, October 1-3, 1997.

·          Mehlich, M., and I. Baxter. “Mechanical Tool Support for High Integrity Software Development.” Proceedings of Conference on High Integrity Systems, IEEE, 1997.

·          Baxter, I.D., and M. Mehlich. “Reverse Engineering is Reverse Forward Engineering.” Proceedings of the Fourth Working Conference on Reverse Engineering, IEEE, Amsterdam, Netherlands, pp. 104-113, October 6-8, 1997.

 


·          Baxter, I., A. Yahin, L. Moura, M. Sant'Anna, and L. Bier. “Clone Detection Using Abstract Syntax Trees.” Proceedings of the International Conference on Software Maintenance, IEEE, Bethesda, MD, pp. 368-377, November 16-20, 1998.

·          Baxter, Ira. Transformation Technology Bibliography, 1998.

·          Baxter, I. Tutorial on Transformation Systems. ICSR4, ICSR5 & ASE, 1998.

·          Baxter, I. “Design Reuse and Scale: Keys to Practical Code Generation and Large-Scale Software Maintenance.” Proceedings of the Third Symposium on Application-Specific Systems and Software Engineering Technology, IEEE, Richardson, TX, pp. 119-120, March 24-25, 2000.

·          Baxter, I. “Preprocessor Conditional Removal by Simple Partial Evaluation.” Proceedings of the Eighth Working Conference on Reverse Engineering, IEEE, Stuttgart, Germany, pp. 281-290, October 2-5, 2001.

·          Ricca, F., P. Tonella, and I. Baxter. “Restructuring Web Applications Via Transformation Rules.” Proceedings of the First International Workshop on Source Code Analysis & Manipulation (SCAM), IEEE, Trento, Italy, pp. 150-160, November 10, 2001.

·          Baxter, I. “Parallel Support for Source Code Analysis and Modification.” Proceedings of the Second International Workshop on Source Code Analysis & Manipulation (SCAM), IEEE, pp. 3-14, October 1, 2002.

·          Baxter, I., C. Pidgeon, and M. Mehlich. “DMS: Program Transformations for Practical Scalable Software Evolution.” Proceedings of the International Conference on Software Engineering, IEEE, pp. 625-634, May 23-28, 2004.

·          Baxter, I.D., and R.L. Akers. “Component Architecture Re-engineering by Program Transformation.” Proceedings of the Twentieth International Conference on Software Maintenance, IEEE, p. 509, September 11-14, 2005.

·          Akers, R., I. Baxter, and M. Mehlich. “Reengineering C++ Components Via Automatic Program Transformation.” Proceedings of the Twelfth Working Conference on Reverse Engineering, IEEE, pp. 13-22, November 7-11, 2005.

·          Akers, R., I. Baxter, M. Mehlich, B. Ellis, and K. Luecke. “C++ Component Model Reengineering By Automatic Transformation.” Crosstalk, 2005.

 

Presentations:

·          Baxter, I. “An Overview of Transformational Design Maintenance System.” Workshop on Transformation Systems, Durham, NC, 1996.

·          Pidgeon, C. “Organizing and Enabling Domain Engineering to Facilitate Software Maintenance.” Eighth Annual Workshop on Software Reuse, IEEE, 1997.

·          Baxter, I. “Transformation Systems: Domain-Oriented Component and Implementation Knowledge Reuse.” Ninth Annual Workshop on Software Reuse, IEEE, 1998.

·          Baxter, I. “Parallel Support for Source Code Analysis and Modification.” Keynote address, Source Code Analysis and Manipulation (SCAM), IEEE, 2002.

