facebook

Tau as a Generalized Blockchain

Bitcoin may abstractly be described as a decentralized machine that appends new data (transactions) to a public shared database (ledger) given certain proofs (signatures).
One immediately asks, what if we take the Blockchain algorithm and allow arbitrary data with multiple public shared databases and with arbitrary proofs? Moreover, can the protocol itself evolve with time?

Let’s break those questions into pieces:
1.  Allowing arbitrary data, Bitcoin’s protocol may already encode arbitrary data, but not efficiently. It costs way beyond customary storage networks or services, and the cost is justified – all nodes have to store and process that data.

2.  Multiple public shared databases we’d like to have many different applications over one Blockchain that secures them all, yet we don’t want users of one application to be affected by the existence of another application.

3.  Arbitrary proofs, proofs are nothing but computer programs and vice versa. At Bitcoin’s scope, the proof is the signature verification function’s output, sometimes with the addition of scripts (examples of useful scripts at https://en.bitcoin.it/wiki/Contract), yet Bitcoin’s scripts are quite weak. So we’d like to allow arbitrary code to “run on the blockchain”, a code that is able to do practically anything that a computer can do.

4.  Self-definition, Since we speak of a shared database of programs, can the network’s protocol itself, i.e. the client’s code itself, be on-chain and auto-update itself from the chain?

1 is a special case of 3 since ultimately code can contain data (and that’s basically Ethereum, Bitcoin+3). 2 can be achieved by allowing to put many programs on the blockchain, while users pick which program to run. This can be fulfilled by a blockchain containing Java/C++/Python/Precompiled programs, in other words, a decentralized appstore that auto-updates its own code (4). We call this construction Naive Tau.

Why is it naive? Because apparently it answers all requirements, but de-facto we didn’t decentralize anything, the code itself still comes from centralized hands. Since we have to either trust the program’s developers for its safety or for other guarantees, we can no longer expect users to trust all programs on the network or a subnetwork. That, even if executing the program is required in order to validate a financial transaction and get into a public ledger. This is unlike, say, Bitcoin’s blockchain, that transactions and scripts are “innocent” enough for everyone to safely publish or validate.

A popular concern among projects dealing with trustless computing is the Halting problem. How can we know that execution of a given code will ever halt, rather stuck all machines on the network? It turns out that under a Turing Complete language, it is impossible to build a software that decides whether given source-code, its execution will ever halt. Ethereum confronted this problem with so-called Gas. The code is indeed executed among all machines participating in the network once they validate the blocks, but the code’s author pays per how many computational steps the code has actually performed. If it performs too much calculations (“out of gas”), it is then aborted.

Obviously, we would like to know much more than only whether a code is halting or not, but also its exact running time given some input, without the need to actually execute it. Similarly, we would like to have custom user-defined guarantees about code without executing it, like that it does not do insecure operations (by any well-defined measures of insecurity, such as connecting to the local network or accessing private files). We will also be very happy to know that the code actually meet its own requirements – namely, that it is correct. We then won’t need tedious, full-of-holes QA process, once we have a mathematical proof that a code meets our requirements.

But it turns out that life aren’t so easy. The Halting problem over Turing machines became so famous not because its answer is “No”, but because its answer is both “Yes” and “No”! A contradiction. This situation is mildly called “undecidability”. And this can go arbitrarily far, any such contradiction can be elevated to any statement, supplying a proof that it is true, and also supplying a proof that it is false!

Because of such pathologies, we can never trust a proof over a Turing complete language. Completeness here happens to be precisely the completeness Godel spoke about, when he proved that any language that is expressive enough to describe arithmetic, must be either inconsistent or incomplete. Since Turing complete languages are also “Godel-complete”, they are provably inconsistent.

Godel, therefore, left us with two paths completeness or consistency, while the latter implies decidability, the inability to prove an incorrect statement. Completeness is the ability to prove any correct theorem. Alas, if you can prove any correct theorem, then you can prove also any incorrect theorem!

This is of course a very unpleasant situation for philosophers, logicians, mathematicians, and computer scientists. But a breaktrough came during the 70s and the 80s by Per Martin-Lof he presented the (now called) Martin-Lof Type Theory (MLTT). MLTT is a programming language with such an expressive power, that it is considered to replace the old foundations of mathematics that consist of first-order classical logic and ZFC set theory. This is as serious as it sounds, new foundations of mathematics. This is the mathematics of the 21st century.

Why are those foundations “better”?
1.  They are in a language that fits computers too. We can now reformulate all math (a process that has already began, e.g. under the project of Homotopy Type Theory) in a way that a computer can comprehend, verify our proofs, and discover new proofs and theorems.

2.  This system is decidable. Although it picks Godel’s consistency path, it turns out that the completeness it is giving up is not so painful it is only incomplete with respect to infinite codes. It cannot prove infinitely long mathematical claims or run code that assumes infinite computers (unlike Turing machines that are infinite by definition).

MLTT covers all what matters finite computers and finite math formulas (infinite sets are welcome, even construction of the reals using Dedekind cuts – after all, this math gets into a finite book). As a result, given a syntactically well-formed code in MLTT together with a proof that “this code will halt” or “this code will never perform more than 1M operations”, we can trust the proof. The decidability guarantees that only true statements can be proved. Moreover, a machine can seek for a proof itself. This is called Automated Theorem Proving, but for complex enough problems their runtime is astronomically high. Yet, a “skeleton” of a complex proof can be supplied by human, letting the machine verifying it by trying to fill the missing parts of the proof.

Back to our Blockchain construction, we can now put MLTT-compatible language on our blockchain, and we can now supply proofs for anything that we can prove, in a way that a 3rd party is able to trust our proofs. We need not to worry anymore about halting, or about security measures, or about correctness – we can simply require proofs for them all.

On Tau we take RDF-family languages’ syntax (notably Notation3 and Nquads, which consist of triples of “subject verb object”) and give them MLTT’s semantics. Equipped with an automated theorem prover that JIT-compiles the proof search (or verification, as a special case), it offers a general-purpose programming language that is decidable, so code can provide proofs about itself (without even getting into self-reference paradoxes!). The language incorporates a Blockchain and DHT as mentioned several times in other locations. Thanks to MLTT and automatic reasoning, non-naive Tau is an intelligent, aware (at the sense it can reason over its own code), safe and trustless decentralized network, while remain so even when allowing general and arbitrary code to run.

Of course, an explanation of what is Tau should come now and tell more about its features and uses, and we have discussed those in the past in various places (especially on this blog and on bitcointalk). So far on this post we concluded the Naive Tau construction, why it is doomed, and what is the correct way to perform secure multiparty computation.