Kingdom Project

Overview

The goal of the project is to create a compelling software environment for the development and execution of applications that are adaptable to diverse delivery contexts, using formal methods to ensure reliability.

Put simply, we want to create software that doesn't suck.

Licensing

Code will be released under the GNU General Public License, version 3.

Other licenses may be available from YesLogic on request.

Focus Areas


Regular Expressions

Goals

Certified compilation of regular expressions to automata using Antimirov's partial derivatives construction.

Motivation

Regular expressions are a formalism ideally suited to specification and implementation with formal methods. They are essential for text processing and form the basis of most markup schema languages. Antimirov's construction offers a new technique for compiling regular expressions to automata that has not been widely implemented, but promises improved efficiency over the traditional approach.

References

Partial Derivatives of Regular Expressions and Finite Automata Constructions, V. Antimirov (CiteSeer)


Irregular Expressions

Goals

Formal semantics and implementation for regular expressions with backreferences and other extensions.

Motivation

Popular implementations of regular expressions typically add features such as backreferences, that are not found in the regular languages and are impossible to implement efficiently in the general case. Nevertheless, these features must be supported, and ideally would be implemented in such a way that they do not impose any performance penalty for expressions in which they are not used.


Grammars & Parsing

Goals

Factoring left-recursive grammars while preserving equivalence and implementation of top-down parsers.

Motivation

Many grammars can be parsed by top-down parsers, which are simple to implement and verify. However, left-recursive grammars must be factored first to avoid infinite recursion in the parser. The grammar given for JavaScript in the ECMA standard is a good example of a left-recursive grammar that is amenable to top-down parsing if factored.


JavaScript

Goals

Formal specification of JavaScript semantics and implementation of a certified interpreter / JIT compiler.

Motivation

JavaScript is the programming language of the modern web, powering interactive websites, online applications, games, and increasingly even the browsers themselves. Standards compliance, reliability, and security are crucial requirements for any JavaScript implementation. A certified JavaScript implementation conforming to a formal specification of the language semantics would be a strong foundation for the next generation of web user-agents.

References

ECMA-262 — 3rd edition (HTML)
ECMAScript Edition 3 Errata (HTML)
EcmaScript 3.1 Working Draft, ES4 Wiki (HTML)


Binary Data Formats

Goals

Design and implementation of a language for describing binary data formats.

Motivation

Applications often need to process binary data formats such as OpenType fonts, PNG and JPEG images, multimedia container formats and ZIP archives. Typically these formats are supported by C libraries which offer no guarantees of correctness or reliability, and over the years there have been numerous security exploits targeting these libraries.

A data description language for describing binary data formats would offer a robust solution to these problems. The language would require dependent types, as array lengths and integer widths can depend upon earlier values in the binary data. From the format description it would be possible to programmatically derive functions for safe access to the data in a binary file, eliminating the need for tedious and error-prone manual coding.

References

The Next 700 Data Description Languages, K. Fisher, Y. Mandelbaum, and D. Walker (PDF)


Unicode Encoding

Goals

Formal specification of UTF-8 encoding and implementation of encoder/decoder.

Motivation

UTF-8 decoding is simple, but surprisingly easy to get wrong, which has been the cause of several security exploits. An elegant formalisation of UTF-8 that could be used to verify implementations would be very useful. An extra complication is that implementations may be optimised to decode multiple characters simultaneously using SIMD instructions.


Future Ideas

Knuth / Plass paragraph breaking algorithm, a landmark in document layout. Perhaps this is a specific instance of general techniques dynamic programming and could be formalised as such?

Hexadecimal integer literals in Coq. Don't laugh, this is a serious issue!

Specialisation or partial evaluation allows greater use of abstraction in the code design without compromising runtime performance. For example, specifying multiple simple transformations that are combined into a more complex single-pass transformation. A common use case for this is text processing, which usually consists of a decoding step that transforms a byte sequence to a Unicode code point sequence, then a subsequent parsing step applied to the code point sequence. Expressing these as separate transformations improves clarity, but combining them can improve performance by removing the need to explicitly create the intermediate sequences. See Neil Mitchell's Supero optimisations for Haskell.

HTML parsing. This is a big one. How to create a formal spec when there is barely even an informal spec?

Are attribute grammars an appropriate formalisation for the "cascade" of formatting properties required by CSS and XSL-FO?

OpenType layout features combine glyph substitutions and context-sensitive rewrite rules in a way that can be quite inefficient if implemented naively. Is it worth compiling these into an automata of some kind?

Rasterisation has a strong geometrical basis, and should be amenable to formal treatment. For compositional imaging, the Haskell Pan library and Core Image MacOS X API offer promising directions.

IO with left-fold enumerators?


Contributors