FCFM - Universidad de Chile
Oficina 311 Norte (DCC)
Av. Beauchef 851
Santiago, Chile
Research Interests
I am interested in the semantics and verification of programs in general, and in particular, in the verification of probabilistic programs. More specifically, my research interests are:
- Program semantics and verification
- Probabilistic programming
- Theorem provers
- Language-based security
Supervision of Students
Are you interested in these topics and have a (reasonably) solid background in theoretical computer science? If you would like to write your master thesis on this topic, please do not hesitate to drop me a line or drop by my office.
Publications
For more detailed information about my publications, please consult my DBLP or Google Scholar entry.
A mechanized formalization of GraphQL
With T. Díaz and E. Tanter
In CPP'20: 9th ACM SIGPLAN International Conference on Certified Programs and Proofs, New Orleans, USA, 2020
GraphQL is a novel language for specifying and querying web APIs, allowing clients to flexibly and efficiently retrieve data of interest. The GraphQL language specification is unfortunately only available in prose, making it hard to develop robust formal results for this language. Recently, Hartig and Pérez proposed a formal semantics for GraphQL in order to study the complexity of GraphQL queries. The semantics is however not mechanized and leaves certain key aspects unverified. We present GraphCoQL, the first mechanized formalization of GraphQL, developed in the Coq proof assistant. GraphCoQL covers the schema definition DSL, query definitions, validation of both schema and queries, as well as the semantics of queries over a graph data model. We illustrate the application of GraphCoQL by formalizing the key query transformation and interpretation techniques of Hartig and Pérez, and proving them correct, after addressing some imprecisions and minor issues. We hope that GraphCoQL can serve as a solid formal baseline for both language design and verification efforts for GraphQL.
Runtime analysis of quantum programs: A formal approach
With A. Díaz-Caro
In PLanQC'20: 1st International Workshop on Programming Languages for Quantum Computing, New Orleans, USA, 2020
In this abstract we study the resource consumption of quantum programs. Specifically, we focus on the expected runtime of programs and, inspired by recent methods for probabilistic programs, we develop a calculus à la weakest precondition to formally and systematically derive the (exact) expected runtime of quantum programs. Notably, the calculus admits a notion of loop runtime invariant that can be readily used to derive upper bounds of their runtime. Finally, we show the applicability of our calculus analyzing the runtime of (a simplified version of) the BB84 quantum key distribution protocol.
Weakest precondition reasoning for expected runtimes of randomized algorithms
With B. L. Kaminski, J.-P. Katoen, and C. Matheja
In JACM: Journal of the ACM, Vol 65, No 5, 2018
This article presents a wp–style calculus for obtaining bounds on the expected runtime of randomized algorithms. Its application includes determining the (possibly infinite) expected termination time of a randomized algorithm and proving positive almost–sure termination—does a program terminate with probability one in finite expected time? We provide several proof rules for bounding the runtime of loops, and prove the soundness of the approach with respect to a simple operational model. We show that our approach is a conservative extension of Nielson’s approach for reasoning about the runtime of deterministic programs. We analyze the expected runtime of some example programs including the coupon collector’s problem, a one–dimensional random walk and a randomized binary search.
Conditioning in probabilistic programming
With F. Gretz, N. Jansen, B. L. Kaminski, J.-P. Katoen and A. McIver
In TOPLAS: ACM Transactions on Programming Languages and Systems, Vol 40, No 1, 2018
This paper investigates the semantic intricacies of conditioning, a main feature in probabilistic programming. Our study is based on an extension of the imperative probabilistic guarded command language pGCL with conditioning. We provide a weakest pre–condition (wp) semantics and an operational semantics. To deal with possibly diverging program behaviour we consider liberal pre–conditions. We show that diverging program behaviour plays a key role when defining conditioning. We establish that weakest pre–conditions coincide with conditional expected rewards in Markov chains—the operational semantics—and that the wp–semantics conservatively extends the existing semantics of pGCL (without conditioning). An extension of these results with non–determinism turns out to be problematic: although an operational semantics using Markov decision processes is rather straightforward, we show that providing an inductive wp–semantics in this setting is impossible. Finally, we present two program transformations which eliminate conditioning from any program. The first transformation hoists conditioning while updating the probabilistic choices in the program, while the second transformation replaces conditioning—in the same vein as rejection sampling—by a program with loops. In addition, a program transformation is presented that replaces an independent identically distributed loop by conditioning.
Weakest precondition reasoning for expected run–times of probabilistic programs
With B. L. Kaminski, J.-P. Katoen and C. Matheja
In ESOP'16: 25th European Symposium on Programming Languages and Systems, Eindhoven, The Netherlands, 2016
Best Theory Paper Award
This paper presents a wp-style calculus for obtaining bounds on the expected run-time of probabilistic programs. Its application includes determining the possibly infinite expected termination time of a probabilistic program and proving positive almost-sure termination--does a program terminate with probability one in finite expected time? We provide several proof rules for bounding the run-time of loops, and prove the soundness of the approach with respect to a simple operational model. We show that our approach is a conservative extension of Nielson's approach for reasoning about the run-time of deterministic programs. We analyze the expected run-time of some example programs including a one-dimensional random walk and the coupon collector problem.
Reasoning about recursive probabilistic programs
With B. L. Kaminski, J.-P. Katoen and C. Matheja
In LICS'16: 31st Annual ACM/IEEE Symposium on Logic in Computer Science, New York, USA, 2016
This paper presents a wp-style calculus for obtaining expectations on the outcomes of (mutually) recursive probabilistic programs. We provide several proof rules to derive one- and two-sided bounds for such expectations, and show the soundness of our wp-calculus with respect to a probabilistic pushdown automaton semantics. We also give a wp-style calculus for obtaining bounds on the expected runtime of recursive programs that can be used to determine the (possibly infinite) time until termination of such programs.
Conditioning in probabilistic programming
With F. Gretz, N. Jansen, B. L. Kaminski, J.-P. Katoen and A. McIver
In MFPS'15: 31st Conference on the Mathematical Foundations of Programming Semantics, Nijmegen, The Netherlands, 2015
In this paper, we investigate the semantic intricacies of conditioning in probabilistic programming, a major feature, e.g., in machine learning. We provide a quantitative weakest pre–condition semantics. In contrast to all other approaches, non–termination is taken into account by our semantics. We also present an operational semantics in terms of Markov models and show that expected rewards coincide with quantitative pre–conditions. A program transformation that entirely eliminates conditioning from programs is given; the correctness is shown using our semantics. Finally, we show that an inductive semantics for conditioning in non–deterministic probabilistic programs cannot exist.
Understanding probabilistic programs
With J.-P. Katoen, F. Gretz, N. Jansen and B. L. Kaminski
In Correct System Design - Symposium in Honor of Ernst-Rüdiger Olderog on the Occasion of his 60th Birthday, Oldenburg, Germany, 2015
We present two views of probabilistic programs and their relationship. An operational interpretation as well as a weakest pre-condition semantics are provided for an elementary probabilistic guarded command language. Our study treats important features such as sampling, conditioning, loop divergence, and non-determinism.
Verified indifferentiable hashing into elliptic curves
With G. Barthe, B. Grégoire, S. Heraud and S. Z. Béguelin
In JCS: Journal of Computer Security, Vol 21, No 6, 2013
Many cryptographic systems based on elliptic curves are proven secure in the Random Oracle Model, assuming there exist probabilistic functions that map elements in some domain (e.g. bitstrings) onto uniformly and independently distributed points in a curve. When implementing such systems, and in order for the proof to carry over to the implementation, those mappings must be instantiated with concrete constructions whose behavior does not deviate significantly from random oracles. In contrast to other approaches to public-key cryptography, where candidates to instantiate random oracles have been known for some time, the first generic construction for hashing into ordinary elliptic curves indifferentiable from a random oracle was put forward only recently by Brier et al. We present a machine-checked proof of this construction. The proof is based on an extension of the CertiCrypt framework with logics and mechanized tools for reasoning about approximate forms of observational equivalence, and integrates mathematical libraries of group theory and elliptic curves.
Probabilistic relational reasoning for differential privacy
With G. Barthe, B. Köpf and S. Zanella-Béguelin
In TOPLAS: ACM Transactions on Programming Languages and Systems, Vol 35, No 3, 2013
Differential privacy is a notion of confidentiality that allows useful computations on sensible data while protecting the privacy of individuals. Proving differential privacy is a difficult and error-prone task that calls for principled approaches and tool support. Approaches based on linear types and static analysis have recently emerged; however, an increasing number of programs achieve privacy using techniques that fall out of their scope. Examples include programs that aim for weaker, approximate differential privacy guarantees and programs that achieve differential privacy without using any standard mechanisms. Providing support for reasoning about the privacy of such programs has been an open problem.
We report on CertiPriv, a machine-checked framework for reasoning about differential privacy built on top of the Coq proof assistant. The central component of CertiPriv is a quantitative extension of probabilistic relational Hoare logic that enables one to derive differential privacy guarantees for programs from first principles. We demonstrate the applicability of CertiPriv on a number of examples whose formal analysis is out of the reach of previous techniques. In particular, we provide the first machine-checked proofs of correctness of the Laplacian, Gaussian, and exponential mechanisms and of the privacy of randomized and streaming algorithms from the literature.
Beyond differential privacy: Composition theorems and relational logic for f-divergences between probabilistic programs
With G. Barthe
In ICALP'13: 40th International Colloquium on Automata, Languages and Programming, Riga, Latvia, 2013
f-divergences form a class of measures of distance between probability distributions; they are widely used in areas such as information theory and signal processing. In this paper, we unveil a new connection between f-divergences and differential privacy, a confidentiality policy that provides strong privacy guarantees for private data-mining; specifically, we observe that the notion of α-distance used to characterize approximate differential privacy is an instance of the family off-divergences. Building on this observation, we generalize to arbitrary f-divergences the sequential composition theorem of differential privacy. Then, we propose a relational program logic to prove upper bounds for the f-divergence between two probabilistic programs. Our results allow us to revisit the foundations of differential privacy under a new light, and to pave the way for applications that use different instances of f-divergences.
Verified indifferentiable hashing into elliptic curves
With G. Barthe, B. Grégoire, S. Heraud and S. Z. Béguelin
In POST'12: 1st Conference on Principles of Security and Trust, Tallinn, Estonia, 2012
Many cryptographic systems based on elliptic curves are proven secure in the Random Oracle Model, assuming there exist probabilistic functions that map elements in some domain (e.g. bitstrings) onto uniformly and independently distributed points in a curve. When implementing such systems, and in order for the proof to carry over to the implementation, those mappings must be instantiated with concrete constructions whose behavior does not deviate significantly from random oracles. In contrast to other approaches to public-key cryptography, where candidates to instantiate random oracles have been known for some time, the first generic construction for hashing into ordinary elliptic curves indifferentiable from a random oracle was put forward only recently by Brier et al. We present a machine-checked proof of this construction. The proof is based on an extension of the CertiCrypt framework with logics and mechanized tools for reasoning about approximate forms of observational equivalence, and integrates mathematical libraries of group theory and elliptic curves.
Probabilistic relational reasoning for differential privacy
With G. Barthe, B. Köpf and S. Zanella-Béguelin
In POPL'12: 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Philadelphia, USA, 2012
Differential privacy is a notion of confidentiality that protects the privacy of individuals while allowing useful computations on their private data. Deriving differential privacy guarantees for real programs is a difficult and error-prone task that calls for principled approaches and tool support. Approaches based on linear types and static analysis have recently emerged; however, an increasing number of programs achieve privacy using techniques that cannot be analyzed by these approaches. Examples include programs that aim for weaker, approximate differential privacy guarantees, programs that use the Exponential mechanism, and randomized programs that achieve differential privacy without using any standard mechanism. Providing support for reasoning about the privacy of such programs has been an open problem.
We report on CertiPriv, a machine-checked framework for reasoning about differential privacy built on top of the Coq proof assistant. The central component of CertiPriv is a quantitative extension of a probabilistic relational Hoare logic that enables one to derive differential privacy guarantees for programs from first principles. We demonstrate the expressiveness of CertiPriv using a number of examples whose formal analysis is out of the reach of previous techniques. In particular, we provide the first machine-checked proofs of correctness of the Laplacian and Exponential mechanisms and of the privacy of randomized and streaming algorithms from the recent literature.
Verifiable security of Boneh-Franklin identity-based encryption
With G. Barthe and S. Zanella-Béguelin
In ProvSec'11: 5th International Conference on Provable Security, Xian, China, 2011
Identity-based encryption (IBE) allows one party to send ciphered messages to another using an arbitrary identity string as an encryptionkey.Since IBE does not require prior generation and distribution of keys, it greatly simplifies key management in public-key cryptography. Although the concept of IBE was introduced by Shamir in 1981, constructing a practical IBE scheme remained an open problem for years. The first satisfactory solution was proposed by Boneh and Franklin in 2001 and constitutes one of the most prominent applications of pairingbased cryptography. We present a game-based machine-checked reduction of the security of the Boneh-Franklin IBE scheme to the Bilinear Diffie-Hellman assumption, and analyze its tightness by providing an exact security bound. Our proof simplifies and clarifies the original proof by Boneh and Franklin and can be automatically verified by running a trusted checker.
Formally certifying the security of digital signature schemes
With S. Zanella Béguelin, B. Grégoire and G. Barthe
In S&P'09: 30th IEEE Symposium on Security and Privacy, Oakland, USA, 2009
We present two machine-checked proofs of the existential unforgeability under adaptive chosen-message attacks of the FullDomain Hash signature scheme. These proofs formalize the original argument of Bellare and Rogaway, and an optimal reduction by Coronthat provides a tighter bound on the probability of a forgery. Both proofs are developed using CertiCrypt, a general framework toformalize exact security proofs of cryptographic systems in the computational model. Since CertiCrypt is implemented on top of theCoq proof assistant, the proofs are highly trustworthy and can be verified independently and fully automatically.
PhD Thesis
Approximate relational reasoning for probabilistic programs
Adivor: Gilles Barthe
Technical University of Madrid, Madrid, Spain, 2014
The verified security methodology is an emerging approach to
build high assurance proofs about security properties of
computer systems. Computer systems are modeled as probabilistic
programs and one relies on rigorous program semantics techniques to
prove that they comply with a given security goal. In particular, it
advocates the use of interactive theorem provers or automated provers
to build fully formal machine-checked versions of these security
proofs.
The verified security methodology has proved successful in modeling
and reasoning about several standard security notions in the area of
cryptography. However, it has fallen short of covering an important
class of approximate, quantitative security notions. The
distinguishing characteristic of this class of security notions is
that they are stated as a "similarity" condition between the output
distributions of two probabilistic programs, and this similarity is
quantified using some notion of distance between probability
distributions.
This class comprises prominent security notions from
multiple areas such as private data analysis, information flow
analysis and cryptography. These include, for instance,
indifferentiability, which enables securely replacing an idealized
component of system with a concrete implementation, and differential
privacy, a notion of privacy-preserving data mining that has received
a great deal of attention in the last few years. The lack of rigorous
techniques for verifying these properties is thus an important problem
that needs to be addressed.
In this dissertation we introduce several quantitative program logics
to reason about this class of security notions. Our main theoretical
contribution is, in particular, a quantitative variant of a
full-fledged relational Hoare logic for probabilistic programs. The
soundness of these logics is fully formalized in the Coq
proof-assistant and tool support is also available through an
extension of CertiCrypt, a framework to verify cryptographic proofs
in Coq.
We validate the applicability of our approach by building fully
machine-checked proofs for several systems that were out of the
reach of the verified security methodology. These comprise, among
others, a construction to build "safe" hash functions into elliptic
curves and differentially private algorithms for several
combinatorial optimization problems from the recent literature.
Tools
I contributed in the development of the following tools:
- CertiCrypt. CertiCrypt is a framework built on top of the Coq proof assistant that
enables the machine-checked construction and verification of
language-based cryptographic proofs. It has been successfully used to prove the security of several emblematic cryptographic constructions, including encryption schemes, digital signature schemes, zero-knowledge protocols and hash functions.
- CertiPriv. CertiPriv is a framework built as an extension of CertiCrypt aimed at verifying differential privacy properties of probabilistic programs. Differential privacy is the most robust and widely accepted privacy policy for mining senstitive data. The CertiPriv framework was used to certify privacy properties of several basic (numeric) mechanisms and combinatorial optimization algorithms.