VIJAY
GANESH RESEARCH STATEMENT

The primary focus of my research is the theory and practice of mathematical or symbolic reasoning algorithms (e.g., SAT/SMT solvers or provers) aimed at AI, software engineering, security, physics, and mathematics. In this context, I currently lead the following projects: MapleSAT solver, complexity-theoretic understanding of solvers, theory and practice of string solvers, MathCheck SAT+CAS system for combinatorial mathematics, Attack Resistance, SAT-based cryptanalysis.

From 2005 to 2012, I led the STP project, a solver for a theory of bit-vectors and arrays, that played an important role in the development of dynamic symbolic execution or concolic testing. To-date I have won over 25 honors, best-paper awards, and distinctions in my research career.

Motivation

Mathematical logic reasoning
engines (aka solvers or provers) are algorithms that take as
input mathematical formulas and decide whether they have
solutions (dually, decide validity). Over last several
decades, these algorithms have had deep impacts in many
areas of software engineering, security, physics, and
mathematics. For example, SAT/SMT solvers (e.g., MiniSAT,
Z3, and STP) have enabled many formal verification, testing,
program analysis, and synthesis techniques; proof assistants
(e.g., HOL and Coq) are increasingly being used to verify
the correctness of mission-critical software; and computer
algebra systems (e.g., Maple and Mathematica) have long been
used by physicists and engineers to symbolically reason
about mathematical models. Additionally, many of these tools
(esp. proof assistants) are now being used by mathematicians
to construct proofs for long-standing open conjectures
(e.g., the Kepler conjecture).

While the dramatic impact of these tools is reason enough to continue research on this topic, I am driven by the sheer intellectual challenge of pushing the limits of "automation of mathematics" in the spirit of Leibniz and Hilbert. Despite the impossibility of this task for mathematics in general -- shown by Godel and Turing -- the work of many mathematicians and computer scientists has led to significant progress in automation of many fragments of mathematics since the 1950's. I am in pursuit of the day when solvers and provers will be a crucial part of mathematical practice, just as they already are in engineering and software development.

While the dramatic impact of these tools is reason enough to continue research on this topic, I am driven by the sheer intellectual challenge of pushing the limits of "automation of mathematics" in the spirit of Leibniz and Hilbert. Despite the impossibility of this task for mathematics in general -- shown by Godel and Turing -- the work of many mathematicians and computer scientists has led to significant progress in automation of many fragments of mathematics since the 1950's. I am in pursuit of the day when solvers and provers will be a crucial part of mathematical practice, just as they already are in engineering and software development.

Significant practical contributions

Over the last several years, I
have led the development of four award-winning solvers that
have made significant impact on software engineering,
security, and increasingly on combinatorial mathematics: MapleSAT, The Z3 String Solver, MathCheck, and STP.
MapleSAT (2015 - present) is a SAT solver for solving
Boolean formulas obtained from software engineering and
security applications. The crucial insight in MapleSAT --
key to winning gold and silver medals at the 2016 and 2017
SAT competitions -- is that solver heuristics (e.g.,
branching and restarts) can be modeled as reinforcement
learning techniques. The MapleSAT Boolean SAT solver is one
of the first solvers where machine learning and deductive
methods have been symbiotically combined for greater
efficiency.

The Z3 string solver (2013 - present) is a solver for formulas from a rich theory of string equations, regular expressions, and arithmetic. The Z3 string solver has proven to be particularly useful in the context of security applications. This solver is currently in use in many companies including Amazon, IBM, Microsoft, and Google.

Prior to developing MapleSAT and the Z3 string solver, I designed and implemented the STP bit-vector and array solver (2005 - 2012), a popular SMT solver used in software verification and testing. The STP solver played an enabling role in the development of dynamic symbolic analysis, widely considered a breakthrough technology in software engineering and security, and is currently in use in more than 100 research projects in academia and industry. STP has been used to find over a thousand security bugs in Debian Linux tools. For this work on SMT solvers and symbolic analysis, my collaborators and I won an ACM Test of Time Award at the CCS 2016 conference.

The Z3 string solver (2013 - present) is a solver for formulas from a rich theory of string equations, regular expressions, and arithmetic. The Z3 string solver has proven to be particularly useful in the context of security applications. This solver is currently in use in many companies including Amazon, IBM, Microsoft, and Google.

Prior to developing MapleSAT and the Z3 string solver, I designed and implemented the STP bit-vector and array solver (2005 - 2012), a popular SMT solver used in software verification and testing. The STP solver played an enabling role in the development of dynamic symbolic analysis, widely considered a breakthrough technology in software engineering and security, and is currently in use in more than 100 research projects in academia and industry. STP has been used to find over a thousand security bugs in Debian Linux tools. For this work on SMT solvers and symbolic analysis, my collaborators and I won an ACM Test of Time Award at the CCS 2016 conference.

Significant theoretical contributions

On the theoretical front, I
have resolved several open problems in logic and complexity
theory. The most prominent being decidability and undecidability theorems
for different fragments of theories over string equations,
regular expressions, arithmetic, and the string-integer
conversion predicate.

In a separate line of work, my students and I developed MathCheck: a SAT solver-based tool to construct counterexamples for math conjectures. The key insight behind MathCheck is that the problem of finite verification or counterexample construction for mathematical conjectures can be reduced to a combinatorial search problem. While SAT solvers are effective at combinatorial search, they are severly handicapped by the fact they don't come equipped with domain-specific knowledge about any particular mathematical domain or the mathematical-conjecture-under-test. On the other hand, Computer Algebra Systems (CAS) can be viewed as storehouses of significant mathematical knowledge, but are poor at combinatorial search. By combining the search capabilities of SAT solvers with domain-specific knowledge of CAS, we can make the problem of searching for counterexamples to combinatorial conjectures tractable. MathCheck is one such SAT+CAS combination. We used MathCheck to resolve many open problems in combinatorial mathematics. Perhaps our most prominent success was to find the smallest counterexample to the Williamson conjecture -- a question that had been open since 1944.

In another line of work, my collaborators and I recently wrote one of the first papers on the proof complexity of SMT solvers. I am also deeply interested in proof complexity of SAT solvers, and am working towards understanding the power of restarts, extension rules, and algebraic proof systems within this context.

In a separate line of work, my students and I developed MathCheck: a SAT solver-based tool to construct counterexamples for math conjectures. The key insight behind MathCheck is that the problem of finite verification or counterexample construction for mathematical conjectures can be reduced to a combinatorial search problem. While SAT solvers are effective at combinatorial search, they are severly handicapped by the fact they don't come equipped with domain-specific knowledge about any particular mathematical domain or the mathematical-conjecture-under-test. On the other hand, Computer Algebra Systems (CAS) can be viewed as storehouses of significant mathematical knowledge, but are poor at combinatorial search. By combining the search capabilities of SAT solvers with domain-specific knowledge of CAS, we can make the problem of searching for counterexamples to combinatorial conjectures tractable. MathCheck is one such SAT+CAS combination. We used MathCheck to resolve many open problems in combinatorial mathematics. Perhaps our most prominent success was to find the smallest counterexample to the Williamson conjecture -- a question that had been open since 1944.

In another line of work, my collaborators and I recently wrote one of the first papers on the proof complexity of SMT solvers. I am also deeply interested in proof complexity of SAT solvers, and am working towards understanding the power of restarts, extension rules, and algebraic proof systems within this context.

Current and future directions

All my current and future projects, at their core,
are about "automation of mathematics via solver/provers".
For example, I am not only interested in novel SAT/SMT
algorithms (see 2. and 3. below), but also interested in
their proof complexity (see 1. below). Further, I am
interested in their applications in mathematics and
physics via SAT+CAS tools like MathCheck (see 4. and 6.
below), as well as in security (see 5. and 7. below).

- Proof
complexity of SAT/SMT solvers and formal verification
algorithms: One of the most important
problem in solver and verification research is "why do
solvers work efficiently, even though the problems they
solve are traditionally considered hard (NP-complete,
PSPACE-complete,...)." One of my research goal is to
understand this phenomenon through the lens of proof
complexity theory, which is the ideal setting for a
theoretical understanding of SAT/SMT solvers and formal
verification algorithms in general. Within this context,
my collaborators and I have already made significant
progress. We have identified parameters such as mergeability of
Boolean formulas that go a long way in explaining
the structure of industrial instances, and why solvers are
efficient on such instances. The eventual goal of this
research program is to prove parameterized proof
complexity-theoretic theorems that explain precisely why
solvers are efficient for large classes of industrial
instances.

- Novel SAT/SMT algorithms via machine
learning: As mentioned above, one of the
solvers my students and I developed is the MapleSAT solver, one of the most
efficient SAT solvers today. MapleSAT combines (inductive)
reinforcement learning algorithms for branching and
restarts with (deductive) conflict analysis. It is
important to note that modern SAT solvers are algorithms
that inherently perform decisions during their search
whose rewards may only known far into the future. Such
algorithms can particularly benefit from reinforcement
learning methods as we have already demonstrated in
MapleSAT. My goal is to continue with the integration of
machine learning algorithms with deductive methods inside
solvers. Given the success of MapleSAT I expect this line
of research to be very fruitful for developing powerful
new methods for both SAT and SMT solvers.

- Novel SMT algorithms
for first-order and higher-order theories (e.g., theories over strings): I
plan to continue to work on fragments and extensions of
string theories as part of the Z3 string solver. Theories
over strings are of great importance in the context of
computer security, and developing practically efficient
algorithms for them is very challenging. The Z3 string
solver is an important development, but a lot remains to
be done. For example, there are virtually no good
practical solvers for context-free grammar constraints or
combinations of word equations, arithmetic over length,
and context-free grammars. Another theory of interest is
floating-point arithmetic, especially given its relevance
to physics simulation software.

- Combining SAT and CAS for
Combinatorial Math: As mentioned above,
one of my enduring interests is to develop effective tools
for counterexample construction and finite verification of
math conjectures. In this context, my students and I
developed the MathCheck SAT+CAS tool. We plan to
continue extending this tool, especially in the context of
the probabilistic method. The key idea underlying the
probabilistic method is the use of probability theory to
non-constructively establish the existence of large
combinatorial objects. The problem we would like to
address is "how can we
constructively establish the existence of large
combinatorial objects whose existence has been
established non-constructively via the probabilistic
method". Given the ability of SAT+CAS to search
large combinatorial spaces it is natural to use SAT+CAS
tools to constructively establish the existence of large
combinatorial objects, thus augmenting the power of the
probabilistic method.

- Attack-resistance
of security defense mechanisms: Computer
security researchers often propose defense mechanism
without providing formal guarantees of security. To
address this problem, my collaborators and I developed an
idea we call attack-resistance, wherein we formally
analyze security defense mechanisms via the idea of
computational indistinguishability from cryptography. We
say that a defense mechanism is attack-resistant provided
a resource-bounded attacker cannot computationally
distinguish the mechanism-under-analysis from an ideal
unbreakable system. We applied this to several well-known
defense mechanisms, and were able to show that these
mechanisms were indeed attack-resistant in a formal sense.
Additionally, our analysis led us to finding weaknesses in
other mechanisms. We published our first big paper on this
topic at the Euro Security & Privacy conference 2017
held in Paris, France.

- Verification and testing of
smart contracts: Since the advent of
bitcoin, the term blockchain has become commonplace and
represents an exciting new class of technologies that have
the potential to fundamentally transform many industries.
There are many blockchain technologies out there, e.g.,
Ethereum from Waterloo and Hyperledger from IBM. The term
smart contract refers to programs that represent contracts
between entities, wherein, the contract resides on the
blockchain and leverages all the features that blockchain
provides (e.g., immutability after parties commit to the
contract). Unfortunately, smart contracts are plagued by
security problems. This is a great opportunity for
verification/testing experts to apply their expertise to
the smart contract world. My students and I are
researching novel SAT and SMT solvers algorithms aimed at
security analysis of smart contracts.