College of Computer and Information Science
Northeastern University
360 Huntington Avenue
Boston, Massachusetts 02115

Office: 202 West Village H, room 344
Phone: ++1-617-373-3100
Email: wahl–ät–ccs.neu.edu
                                      Thomas Wahl                   
Northeastern University

NEWS:

  • Prospective Ph.D. applicants: I am planning to expand my group in all of my research areas and am therefore looking for enthusiastic students to work with.
    First study the type of research I do (start with the overview below). Then you can contact me about it via email. For applying to our PhD program, see here. Our deadlines are typically in December.
  • Course "Engineering Reliable Software", Spring 2019: see here



Brief bio:

  • 2011–present: Full-time faculty at Northeastern University, College of Computer and Information Science
  • 2009–2011: Research Assistant, Oxford University, Department of Computer Science (formerly "Computing Laboratory"), UK
  • 2007–2009: Lecturer and Postdoc, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland
  • 2007: Ph.D., University of Texas, Austin



General research interests:

  • Software verification techniques, such as predicate abstraction, SAT- and SMT solving, decision procedures
  • Infinite-state system verification, such as arbitrarily-threaded programs or distributed systems
  • Stability, robustness, reliability of numeric computations, especially using floating-point arithmetic
  • Software analysis techniques for security verification, such as detection of information leaks (e.g. through side channels)



Overview: Some current projects

  • Parameter-aware program verification using Observation Sequences (currently with Peizun Liu):

  • Short version: Verification techniques try to analyze software (or hardware) for bugs: behaviors that are unexpected, unwanted, and can cause the software to end up in an unworkable state, crash, or simply compute wrong results. This project develops a verification technique that tries to exploit the fact that most software systems are designed over a variable number of resources. How do changes (especially incremental changes) of the resource parameter affect the behavior of the software?

    More details: Many systems are naturally designed (consciously or not) over a variable number of certain resources, which can mean physical or logical entities. Examples of such resources include:
    1. the number of physical components of a certain functional type,
    2. the number of clients that simultaneously request a service from a server,
    3. the capacity of the communication channel between components,
    4. the number of context switches permitted among concurrent threads,
    5. the recursion nesting depth (= stack size) permitted in a piece of software.
    We can expect that the behavior of the instances of the parametric design family (= the concrete designs for each parameter choice) depend on the parameter in a smooth, "continuous" way. This enables analyses of the function that maps the number of resource instances (= the parameter) to a concrete design (this function goes from the natural numbers to the design space).
    More concretely, suppose k is an integral parameter of the design. With the target property P to be verified in mind, we define a function O mapping the parameter to observations about the design of size k, such that knowledge of O(k) enables a decision on whether P holds of the instance of size k. For example, the "observation" could simply be the set of states reachable in the system of size k.
    In general, we require of function O that it be monotone: more resources means more system behaviors (observations). This is almost always the case provided the value of k is not accessible in the program defining a system instance's behavior. (If it is, the dependence on k can be arbitrary, making analysis against unbounded k in general impossible.)
    We now analyze the system design for increasing values of k. Due to monotonicity of O, consecutive observation sets O(k) are in some sort of ≤ relationship. This begs the question whether the function O converges at some parameter point k0, i.e. whether it stops increasing after reaching k0. If so, the truth value of property P has stabilized, i.e. unless we have detected a violation of P up to this point, we have proved the property for all parameter instances.

    We have applied the Observation Sequence paradigm to the undecidable problem of analyzing reachability for concurrent procedures communicating via global-scope variables [PLDI 2018]. Our technique is dubbed CUBA (Context-UnBounded Analysis). A webpage with downloads and many other resources can be found here. Ongoing work includes the design and implementation of a technique for analyzing unbounded-queue message passing systems written in the P language, such as distributed communication protocols.

  • Analyzing floating-point programs for non-portability due to compiler and hardware dependencies (currently with Yijia Gu):

  • Short version: The precise semantics of floating-point arithmetic programs, i.e. the value they compute, depends not only on the input but also on the behavior of the compiler and on the target hardware. Such dependencies infringe on the highly desirable goal of software portability: the same program run on the same inputs on different platforms can produce different results. We are investigating the consequences of these dependencies for numeric programs in general, for decision-making programs, and for floating-point compilers.

    More details: Serious doubts on the portability of numeric applications arise when platform-dependent program differences are behavioral, i.e. when they lead to changes in the control flow of a program. We have developed a method that takes a numeric procedure and determines whether a given input may cause the procedure to make different decisions, based merely on the floating-point platform on which it is compiled and executed. This method can be used in static and dynamic analyses of programs, to estimate their numeric stability [EURO-PAR 2015]. A project webpage can be found here.

    More recently, we developed a method that can partially repair numeric instability against platform changes, making numeric programs more robust. This is in principle easy, by instructing the compiler to evaluate every expression following a "strict" evaluation model, such as "left-to-right", and to ignore any precision or efficiency optimization features, such as contracted operations, special floating-point division routines, etc., across the whole program. But this can lead to inefficient code: often times numeric instability is caused only by certain critical program fragments; we don't want to take away the compiler's freedom to optimize the others, which do not contribute (much) to instability.
    Our method therefore first identifies what these critical program statements are, and then stabilizes them within a small local scope. The goal is to produce programs with a good balance between efficiency and robustness against platform changes. [VMCAI 2017]

    Most recently, we are working on a way to evaluate the "damage", in terms of precision, that compiler optimizations cause to floating-point programs. Since floating-point arithmetic expression semantics is very sensitive to the evaluation order (e.g., the addition operation is not associative), expression rewriting generally changes the value computed, at least for some inputs. This causes runtime optimizers to produce code that is not I/O equivalent to the original, violating one of the fundamental rules of code optimization. In many cases, the affect on the computed value is marginal, but not in all. How can we detect such cases, how can we decide whether the optimization is "worth it" compared to the change in output value, and how can we perhaps even repair the optimizer, so that it produces stable yet runtime-optimized code? [This is ongoing work.]
  • Compilation-dependent security properties of software (currently with Konstantinos Athanasiou):

  • Software verification techniques are arguably underused in proofing a program against security vulnerabilities. The few techniques that do exist typically operate on the source code level, which have proven useful for detecting leaks. For software side-channel attacks, which try to gain access to passwords or other secret keys by exploiting fine-grained information about the power or time consumption of program runs, source code level techniques are not reliable in actually verifying security properties. The reason is that the program undergoes many security-relevant changes during compilation, such as due to code optimizations, register allocation, and even runtime instruction scheduling. Such modifications are of course not visible, often not even predictable, at the source code level. Classical static analysis techniques thus give a false sense of security.

    This problem is well-known throughout verification but has recently received a huge amount of attention in the wake of source code-oblivious side-channel leaks such as Meltdown and Spectre. In this project we investigate various instances of this problem. Our goals are to detect them, to fix them, and to prove that certain leaks are not possible, independent of how the compiler transforms the program into an executable. [This is ongoing work.]



Research sponsors:

National Science
                                      Foundation Microsoft Research



Conference program or organizing committees:




Awards:


Design borrowed in part from Domagoj Babic