PhD position

Safe build of systems by static analysis of names

  • Location: IMT Atlantique, Brest
  • Start date: September/October 2021
  • Keywords: Safety of build, heterogeneous language development, naming consistency, formal model, static analysis.
  • Team : P4S/SHARP/LabSTICC (UMR 6285)
  • Contact : julien.mallet[@]imt-atlantique.fr

Announce

Applications are invited for a PhD position on using static analysis on heterogeneous languages to improve build safety. The student will work in the P4S team at IMT Atlantique & Lab-STICC (CNRS), Brest, France The position is for 3 years from September/October 2021.

Position requirements:

  • Master’s degree or equivalent in Computer Science, with a specialization in Software Engineering
  • Knowledge or special interest in static analysis (ideally)
  • Solid software development and programming skills
  • Good communication skills in English (both oral and writing)

IMT Atlantique is a French elite technological university located in the beautiful city of Brest, west of France (4 hours by train from Paris). Pleasant working conditions will be offered to the student.

To get more information and apply, please send a complete CV with a corresponding motivation letter, recommendation letter(s) and a list of both already published papers and open source contributions (if any) to Julien Mallet and Antoine Beugnard .

Context and Issues

Developing large software systems relies on many technologies and more and more frequently on many languages. Some of these languages are for programming, others for configuring, building, testing, or deploying the system. In this context, building languages such as makefile, ant, maven or gradle have a central position. They make explicit and automatic many different processes: compiling, building, testing, packaging, deploying all using many different source files for programs, resources, configurations, tests, etc.

As Mokhov et al. said in [1],

“[b]uild systems (such as MAKE) are big, complicated and used by every software developer …[but] [t]hese complex build systems use subtle algorithms, but they are often hidden away, and not the object of study.”

In this PhD proposal, we do not focus on build mechanisms, such as file dependencies, as in [1], but on a special point of view, common to all build systems and of critical importance for developers: naming.

In fact, as a consequence of this diversity of languages, references to information from one language to another one are frequent. For example, in the Android development systems, resources are described in XML files that Java programs can use. An ad hoc technique is used to build a cross-reference by generating a R.java file that links java entities to XML resources. However, when many languages and resources are used there exists no general approach that ensures the good usage of cross-references among different languages. This makes the use, verification and tuning of large heterogeneous systems very difficult.

Moreover, these dependencies depend on time. Some are introduced at development time, but required to be resolved at execution time. This means that until execution, it may be possible that no warning at all is raised.

Usually, a part of these dependencies is described in a build file such as a make (or ant, maven, gradle) file. But this is only a small part of all dependencies. Remaining unchecked dependencies may trigger unexpected runtime errors. We propose a new model to reduce this kind of risks.

The goal of this PhD is to study a general theory for describing the usage of name in heterogenous context that offers consistency verification algorithms.

We are interested in the use of names to identify entities. Resources are named. Files, images, sounds, variables, screens, activities have name when they need to be referenced. Strategies to use name may be different. A file system uses a tree, Android uses a flat naming space (R.java file), Internet URIs (protocol+name tree), programming languages use scopes, etc. Every useful entity has a name: files, resources, modules, function, computer, node, command, etc.

Then, the goal is to provide a unified model that can ensure the good use of names when, building and running heterogeneously defined systems. The expected contribution is a generic naming model where good usage of names can be checked at different stages of the development process. The approach excludes things is part of security and that the proposed approach could, with deeper studies, be applied to security aspects. For instance, a reference with no value, a configuration data with multiple values or a name too largely visible may be the source of attacks. We have identified on the CAPEC and OWASP sites some attacks that could be prevented by our approach (Intent Spoofing, Resource Location Spoofing or Contaminated Resources, Sensitive Data Exposure or XML External Entity).

The research

The overall process we imagine is 3 steps. The first step is to extract (1) local data within a single “naming model”, then everything is composed (2) in a single and global model on which a verification (3) is applied. Verification is processed at different stages of the development process (from compilation to run) against rules that are either mandatory (the semantics) either optional (design choices) such as good practices.

We rely on the theory of name resolution [2], proposed by P. Neron et al. to address the problem. This theory is applied to many languages but always independently and for explaining fine language mechanisms. We propose to generalize this approach to heterogeneous context including non-programming languages such as build languages.

We intend to define a common naming model in which each part would be projected in, including its references to external “things”. Each small model could be combined with others in order to check increasingly the global consistency of the (collection of) naming model(s). The role of this model is close to those of a linker (in the sense of compilation) but in a heterogeneous context with multiple languages and ignoring all the operational semantic of each language except its naming aspect. (The level of details would have to be explored).

Many contexts could be used as experimental platforms. As a starting point, we could develop a proof of concept tool supporting C language and a subset of make. Another option would be to use the Android platform.

A prototype would have to extract the model from all heterogeneous parts and to check its correcteness. We imagine two complementary approaches:

  • Define a formal model, formal properties and definitions. Then implement it in a prototype for validation purpose.

  • Start to implement a semi-formal model and check it rapidly against different platforms. The generaliza- tion and formalization is developed afterwards.

The choice of the strategy relies mainly of the applicant skills.

Research team

The P4S team is member of the SHARP cluster of the Lab-STICC laboratory (UMR 6285). Its name is an acronym for Processes for Safe and Secure Software and Systems.

Our interest is in methods and tools to specify and describe systems and software so that evaluation and analysis increase confidence and ensure safety and security.

We use are experts in system and process modeling. We advocate the idea that the mastering of both process and product are required to reach safety, security of built products.

This motto is a reason why the heterogeneous aspects of this thesis includes both source files and build files. In fact, if the source files describe the product, we consider the build description as an abstraction of the development process.

References

[1] A. Mokhov, N. Mitchell, S. Peyton Jones. Build Systems à la Carte, presented at ICFP 2018: Proceedings of the ACM on Programming Language, pp. 1 – 29, 2018.

[2] P. Neron, A. Tolmach, E. Visser, and G. Wachsmuth. A Theory of Name Resolution. presented at the Programming Languages and Systems - 24th European Symposium on Programmin, ESOP 2015, London, UK, 2015.

[3] CAPEC. Common Attack Pattern Enumeration and Classification, capec.mitre.org.

[4] OWASP. Open Web Application Security Project, www.owasp.org.