James R. (Jim) Cordy

Research Projects and Support


Research Interests

  • Computer language design and implementation.
  • Software engineering, software architectures, software tools.
  • Software and document analysis and transformation.
  • Parser-driven systems, grammar engineering.
  • Rule-based programming, pattern recognition.
  • Semantic markup, the semantic web.
  • Autonomic software systems.

Current Projects

All of my current research projects are carried out in the context of the Source Transformation Group in the Software Technology Laboratory, in collaboration with Prof. T.R. Dean, Prof. J. Dingel and Prof. D. Blostein. The Source Transformation Group explores the paradigm of structured source transformation as applied to problems in software system design, analysis and maintenance, as well as computational problems in general.

The TXL Project

Prof. J.R. Cordy, Prof. T.R. Dean, Adrian Thurston, Derek Shimozawa          TXL Website

TXL is a unique programming language and source transformation system on which much of the research of the group is based. TXL is the evolving result of more than fifteen years of concentrated research on rule-based structural transformation as a paradigm for the rapid solution of complex computing problems. TXL has been widely used in research and industry, and has been particularly successful in the domain of software system analysis and maintenance.

The global goal of this project is the definition, exploration and refinement of the transformational programming paradigm. Recent work in the TXL project has concentrated on the design and formalization of a modularity concept for transformation rulesets and the exploration of higher-level specification of transformation rules (ETXL), on user interfaces for maintaining and understanding transformational programs (TETE), on the conversion of TXL to the XML standard for structured document exchange, and on agile parsing, a general method for customizing grammars for each particular analysis and transformation task.
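
As a rough illustration of the rule-based transformation idea, here is a toy fixed-point rewriter in Python. TXL itself rewrites parse trees under a grammar, not strings, so this sketch and its rules are purely hypothetical, but it shows the core paradigm: apply rules repeatedly until no rule fires.

```python
# Toy illustration of rule-based rewriting to a fixed point,
# in the spirit of transformation systems like TXL (which works
# on parse trees, not strings -- this is only a sketch).

RULES = [
    ("x + 0", "x"),   # additive identity
    ("x * 1", "x"),   # multiplicative identity
    ("x * 0", "0"),   # annihilation
]

def rewrite(term: str) -> str:
    """Apply the rules repeatedly until no rule changes the term."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            new = term.replace(lhs, rhs)
            if new != term:
                term = new
                changed = True
    return term

print(rewrite("x * 1 + 0"))  # -> x
```

Note that the second example needs two rule applications ("x * 1 + 0" to "x + 0" to "x"); termination at a fixed point is what makes the paradigm declarative rather than a one-pass substitution.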

In addition to the other projects outlined below, recent work using TXL in mathematics notation recognition and business card recognition is carried out in the Diagram Recognition Laboratory.

Next Generation Transformational Languages Project

Prof. J.R. Cordy, Adrian Thurston

The goal of the NGTL project is the design and implementation of the next generation of programming languages based on the source transformation paradigm. Based on lessons learned from TXL, ASF+SDF, Stratego and other successful practical source transformation languages, this project will undertake to extend the paradigm to accessible mainstream general purpose use. A first experiment in higher level language features for this class of language has resulted in ETXL, an extension of TXL designed by Adrian Thurston.

The Whole Website Understanding Project (sponsored by CSER)

Prof. J.R. Cordy, Prof. T.R. Dean, Mykyta Synytskyy, Lei Wang

The Whole Website Understanding Project (WWSUP) explores the analysis and design-level understanding of entire websites from their source code. The project seeks to automate an understanding that transcends boundaries between languages (HTML, style sheets, Visual Basic, JavaScript, Java, Perl, etc.), and technologies (client, server, database). The goal is to allow exploration of improvements to the architecture and abstract design of websites using refactorings that cross language and technology boundaries in order to improve website maintainability and long term evolution.

This is a long-term project with many facets and interesting challenges. Recent work has involved the integrated parsing of client-side source languages (HTML, Visual Basic, JavaScript), client-side clone detection and refactoring, Java applet design recovery, analysis and migration, and Java unique renaming and library evolution analysis.
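
The idea behind textual clone detection can be sketched as follows. This is a hypothetical toy, not the group's actual multi-language detector: normalize fragments by erasing identifier names and collapsing whitespace, then compare the normalized forms to find "near-miss" clones that differ only in naming.

```python
# Toy sketch of near-miss clone detection: normalize code fragments
# (identifier names, whitespace) and compare the normalized forms.
# Real detectors in this project work on parsed, multi-language
# sources; this keyword list and approach are illustrative only.

import re

def normalize(fragment: str) -> str:
    """Replace identifiers with a placeholder and collapse whitespace."""
    KEYWORDS = {"if", "else", "for", "while", "return", "function", "var"}
    def sub_ident(m):
        word = m.group(0)
        return word if word in KEYWORDS else "ID"
    out = re.sub(r"[A-Za-z_]\w*", sub_ident, fragment)
    return re.sub(r"\s+", " ", out).strip()

a = "function total(x) { return x + tax; }"
b = "function sum(y)   { return y + fee; }"
print(normalize(a) == normalize(b))  # near-miss clones -> True
```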

The Software Design Ontology Project (sponsored by CSER)

Prof. J.R. Cordy, Prof. T.R. Dean, Prof. D. Jin (U. Manitoba)

The Software Design Ontology Project addresses the problem of interoperability among legacy software system understanding, analysis and migration toolsets. While many different practical systems for software system understanding and analysis have been demonstrated, each uses its own unique format, technology and schema to represent recovered software design information. We are taking a constructive approach to deriving a shared "domain ontology" of software design concepts that can serve as a bridge between the different formats, schemas and tools.

Recent work in this project has resulted in a taxonomy of patterns for software exchange, and in the OASIS ontological framework for reverse engineering tool integration documented and demonstrated in Prof. Jin's recent PhD thesis.

Transformation Engineering Toolkit for Eclipse (TETE) Project (sponsored by IBM)

Prof. J.R. Cordy, Derek Shimozawa

Source transformation is rapidly becoming a mainstream programming paradigm for the solution of a wide range of problems in software engineering, database, artificial intelligence and web technology. It supports solutions in many important areas such as data mining, structured document generation, analysis and restructuring, software static analysis and refactoring, gene sequencing, document recognition, internet commerce and the semantic web. Increasingly, source transformation systems such as TXL, ASF+SDF and XSL/T are being taught and used as a modern solution technology at the upper undergraduate level.

While experienced users of source transformation systems can rapidly achieve good results, the technology can be very difficult to learn, due in part to the lack of good teaching paradigms, interfaces and support tools. The long time it takes to learn to use these source transformation tools is a real barrier to their adoption in the undergraduate curriculum. In this project we are attacking this problem head-on by creating a new kind of interface and paradigm specifically aimed at supporting learning about source transformation and source transformation tools. The Transformation Engineering Toolkit for Eclipse, or TETE for short, provides a simple, consistent interface for learning about, authoring, maintaining and interactively exploring source transformations specified using the TXL source transformation language.

Software Tuning Panels for Autonomic Control (STAC) Project (sponsored by IBM)

Prof. J.R. Cordy, Liz Dancy, Nevon Brake

One aspect of autonomic computing is the ability to identify, separate and tune parameters related to performance, security, robustness and other properties of a software system. Often the response to events affecting these properties consists of changes to tunable system parameters such as table sizes, timeout limits, restart checks and so on. One can think of these tunable parameters as a set of knobs that can be tweaked or switched to adapt the system to environmental or usage changes. In many ways these tunable parameters correspond to the switches and potentiometers on the tuning panel of many hardware devices.

If we model our software system in these hardware terms, it is immediately obvious that we have a long way to go to have something as convenient and appropriate to autonomic operation as a tuning panel. While in some kinds of software, such as database systems, tuning parameters have been explicitly identified and isolated, in other kinds of software the parameters appropriate to autonomic operation are often hidden deep within software sources for sound architectural reasons such as information hiding and separation of concerns.

In this project we plan to address this problem with the goal of leveraging existing software analysis, refactoring and transformation techniques to identify and isolate tuning and other system parameters in a separate "tuning panel" for the software system. In essence, the goal will be to provide a framework to automate the rearchitecting of software systems for more effective autonomic operation by getting the "knobs and switches" all in one place, without violating the integrity and maintainability of the system. The problem is roughly analogous to the problem of hardware layout constraints that provide for contacts or controls to be isolated at the accessible edges of a silicon chip or printed circuit board while maintaining the architectural integrity of the circuit.
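
The "knobs and switches" metaphor might be sketched like this; all parameter names here are hypothetical, not drawn from any actual system, and the point is only the architectural idea of gathering scattered tunable parameters into one place where an autonomic controller can reach them.

```python
# Illustrative sketch (hypothetical names) of the "tuning panel" idea:
# tunable parameters scattered through a system are gathered into one
# object so an autonomic controller can adjust them in one place.

from dataclasses import dataclass

@dataclass
class TuningPanel:
    table_size: int = 1024    # e.g. hash table capacity
    timeout_ms: int = 5000    # e.g. network timeout limit
    restart_checks: int = 3   # e.g. retries before restart

def on_high_load(panel: TuningPanel) -> None:
    """A controller 'turns the knobs' in response to an event."""
    panel.table_size *= 2
    panel.timeout_ms += 1000

panel = TuningPanel()
on_high_load(panel)
print(panel.table_size, panel.timeout_ms)  # -> 2048 6000
```

The rest of the system would read its parameters only through the panel object, which is the analogue of routing a circuit's control lines out to the accessible edge of the board.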

Complementary Software Validation Project

Prof. J.R. Cordy, Prof. J. Dingel, Jeremy Bradbury

Modern formal analysis tools are not only used to prove properties but also to debug software systems, a role that has traditionally been reserved for testing tools. In this project, we are exploring the complementary relationship between testing and formal analysis. We have begun with an approach to assessing testing and formal analysis tools using metrics that measure how many bugs each technique finds and how efficiently it finds them, and have designed an assessment framework that allows symmetrical comparison and evaluation of testing versus formal property checking.
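
The kind of metric such a framework might use can be illustrated as follows; this is a hypothetical formulation, not the project's actual metrics, comparing a bug-finding technique's effectiveness (fraction of seeded bugs found) with its efficiency (bugs found per unit time).

```python
# Hypothetical bug-finding metrics over a benchmark of programs
# with seeded faults, applied identically to a testing tool and
# a formal property checker for symmetrical comparison.

def effectiveness(found: int, seeded: int) -> float:
    """Fraction of seeded bugs the technique found."""
    return found / seeded

def efficiency(found: int, minutes: float) -> float:
    """Bugs found per minute of tool time."""
    return found / minutes

# e.g., one tool finds 14 of 20 seeded bugs in 35 minutes:
print(effectiveness(14, 20), efficiency(14, 35.0))  # -> 0.7 0.4
```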

Lightweight Semantic Markup Project (sponsored by ITC-IRST)

Prof. J.R. Cordy, Prof. J. Mylopoulos (U. Toronto / U. Trento), Prof. L. Mich (U. Trento), Nadia Kiyavitskaya, Nicola Zeni (U. Trento)

Semantic markup is the annotation of world-wide web or other natural language documents to assign explicit real-world semantics to portions of the document, allowing rapid identification of documents and parts of documents relevant to a particular question or purpose. Semantic markup is the essential difference between the current web and the vision of the "semantic web". Given the number and scope of documents on the world-wide web, transition to the semantic web vision cannot be achieved without large-scale, efficient automation of semantic markup. It seems clear that full natural language understanding systems will not be ready for this task for some time, and thus lightweight, approximate methods may be our best hope for this immediate and pressing need.

Legacy software source analysis is another domain that has faced an immediate and pressing need for large-scale analysis of source texts, most visibly in the "year 2000" problem only a few years ago. Some of the most successful techniques for automating solutions to that problem used "design recovery", the analysis and markup of source code according to a semantic design theory. In this project we are leveraging the highly efficient methods and tools already proven in the software analysis and markup domain as the basis of a new lightweight method for semantic analysis and markup of natural language texts, in the hope that we can attain similar performance and scalability while yielding good quality approximate results.
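
A minimal sketch of lightweight, pattern-based annotation follows, using regular-expression patterns purely for illustration; the project's actual methods are adapted from software design recovery techniques, not these patterns. The trade is full language understanding for speed, scalability and good approximate results.

```python
# Toy pattern-based semantic annotator: simple patterns tag text
# spans with approximate semantic categories in XML-style markup.
# The two patterns here are hypothetical examples only.

import re

PATTERNS = {
    "date":  r"\b\d{1,2} (January|February|March|April|May|June|July|"
             r"August|September|October|November|December) \d{4}\b",
    "money": r"\$\d+(?:,\d{3})*(?:\.\d{2})?",
}

def markup(text: str) -> str:
    """Wrap each matched span in an XML-style semantic tag."""
    for tag, pat in PATTERNS.items():
        text = re.sub(pat, lambda m: f"<{tag}>{m.group(0)}</{tag}>", text)
    return text

print(markup("Paid $1,200.00 on 7 April 2007."))
# -> Paid <money>$1,200.00</money> on <date>7 April 2007</date>.
```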

Research Support and Collaboration

My research is currently supported by the following sources.

NSERC Individual Discovery Grant, "Source Transformation Systems", five years (2007-12). Supports my mainstream research in the exploration of source transformation as a programming paradigm.

NSERC Cooperative Research and Development (CRD) Grant, "Software Engineering for Network-Centric Computing", three years (2002-05), with T.R. Dean. Part of a large Consortium for Software Engineering Research (CSER) project with J. Mylopoulos (Toronto, P.I.), K. Kontogiannis (Waterloo) and E. Stroulia (Alberta). Supports my research in web technology and software design ontology.

IBM Centre for Advanced Studies and IBM Research through Eclipse Innovation Awards, "Transformation Engineering Toolkit for Eclipse" (2004-05), "Software Tuning Panels for Autonomic Control" (2005-06, 2007-08).

SRA division of ITC-IRST, Trento, Italy (2004-05), with J. Mylopoulos. Supports my new research in lightweight semantic markup and the semantic web.

Queen's University Special Research Award, five years (2002-07), with T.P. Martin. Supports my exploratory research in new topics such as implicit invocation languages.



Last updated 7 April 2007