Language runtimes and backwards compatibility (or why you shouldn't write a version control system in Python)

Note: This article was originally published at Planet PHP on 4 June 2012.

Software projects choose languages based on the idioms of those languages. Languages can provide mechanisms and structures to support object orientation or functional programming. Less time is spent thinking about the backwards compatibility of a programming language's runtime. While this is usually a non-issue for short-lived software like websites or software in tightly controlled environments, it becomes an issue for software projects that need to guarantee backwards compatibility for years. For example: a version control system.

The Mercurial project aims to support Python 2.4 to Python 2.7. It does not support Python 3. Why? Python 3 was a drastic change. Unicode strings became the default string type and many other things changed. The impact of the changes is similar to that of the change from PHP 4 to PHP 5. Most software projects have to adapt to these language changes. For projects that need to support long-term supported operating systems like RHEL or Solaris 9/10, this becomes an issue. You could drop Python 2.X support and tell existing users of your software to look for something else - a no-go for a version control system. You could also never support Python 3, but someday Python 2.X will reach EOL. New distribution releases might not include Python 2.X by then, while LTS operating systems might still not have Python 3. Writing software that needs to be backwards-compatible for 8 years can be a problem if it's implemented in a language such as Python, PHP or Perl.
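To make the string change concrete, here is a minimal sketch that runs on Python 3. The variable and function names are made up for illustration, and the sample word is arbitrary. It shows that text and bytes are now separate types that do not mix implicitly, which is exactly the kind of change that ripples through a tool that reads and writes files all day.

```python
# A sketch for Python 3; names here are illustrative, not from any real project.
text = "na\u00efve"               # str: a sequence of Unicode code points
data = text.encode("utf-8")       # bytes: what actually ends up on disk


def can_concatenate(a, b):
    """Return True if a + b works, False if the types refuse to mix."""
    try:
        a + b
        return True
    except TypeError:
        return False


# Five characters, but six bytes, because the accented letter needs two bytes in UTF-8.
print(len(text), len(data))

# Python 3 refuses to mix str and bytes. Python 2 would instead have tried an
# implicit ASCII conversion between its str and unicode types.
print(can_concatenate(text, data))
```

Code written against the Python 2 behaviour has to decide, for every string, whether it is text or raw bytes before it can run on Python 3.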

The source of the problem

Why is this not an issue for Java or C, but is one for Python, PHP and the like? Java and C compile to a target that is guaranteed to be stable. C compiles to machine code. An established processor architecture does not change in incompatible ways. If it's an x86 processor, it will support x86. It won't change with the next software update. If you need to support old C code that modern compilers don't understand anymore, use an old compiler. The same goes for Java. The JVM has a defined set of instructions that is kept backwards compatible. It doesn't matter which Java compiler you used; as long as it targets a class file version your JVM supports, the bytecode it produces will run there. You might still have problems with standard library implementations and feature sets, e.g. your OS uses an old libc or your JVM ships an old class library, but the language itself is not the issue.
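The JVM side of this can be made concrete. Every Java class file starts with a fixed header: a magic number followed by the minor and major version of the bytecode format it targets. The following is a rough sketch in Python that reads that header. The function name and the hand-built sample header are assumptions for illustration; a real tool would read the first eight bytes of an actual `.class` file.

```python
import struct

CLASS_MAGIC = 0xCAFEBABE  # first four bytes of every Java class file


def read_class_version(header):
    """Return (major, minor) from the first 8 bytes of a Java class file."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes")
    # Big-endian: 4-byte magic, 2-byte minor version, 2-byte major version.
    magic, minor, major = struct.unpack(">IHH", header[:8])
    if magic != CLASS_MAGIC:
        raise ValueError("not a Java class file")
    return major, minor


# Hypothetical header built by hand; major version 50 corresponds to Java 6.
sample = struct.pack(">IHH", CLASS_MAGIC, 0, 50)
print(read_class_version(sample))  # prints (50, 0)
```

Because the format version is written down in the file and specified independently of any one compiler, a JVM can decide up front whether it is able to run a given class.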

Python and PHP compile to bytecode as well, pretty much like Java, with one exception: they do it in memory, and the VM that interprets the bytecode is bundled with the compiler. This is where the backwards compatibility problem comes into play. You cannot run Python bytecode compiled on Python 3 with a Python 2 interpreter. You cannot compile with PHP 5 and run the result on PHP 4. Either your runtime is simply not able to do so (like PHP), or your VM implementation is not guaranteed to be stable. That means that in Python and PHP the underlying machine you compile for might change with the next update. Let's compare this to the x86 world. Imagine your next software update changed the x86 instruction set. You would have to recompile all your C code, some of the old C code might not compile with modern C compilers, and old C compilers might not build for the new instruction set. Sounds painful, particularly if you really care about backwards compatibility.
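You can watch CPython do this from within Python itself. The sketch below (for Python 3; the helper name is made up) compiles a snippet in memory, lists the bytecode instructions the bundled VM will execute, and prints the magic number the interpreter stamps into cached bytecode files. The exact opcodes and the magic number differ between Python versions, so no fixed output is shown here.

```python
import dis
import importlib.util
import sys


def compile_in_memory(source):
    """Compile source text to a code object without touching the disk."""
    return compile(source, "<example>", "exec")


code = compile_in_memory("answer = 6 * 7")

# The bundled VM executes the code object directly.
namespace = {}
exec(code, namespace)

# Instruction names as understood by *this* interpreter version only.
opnames = [instruction.opname for instruction in dis.get_instructions(code)]

# Four bytes that tag cached bytecode with the bytecode format of this interpreter.
magic = importlib.util.MAGIC_NUMBER

print(sys.version_info[:2], magic, opnames)
```

Run it on two different Python releases and the instruction list and magic number will usually not match, which is why cached bytecode from one version is rejected by another.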


Thinking about this for a while, I came to the conclusion that Python, PHP and others made an architectural mistake. They bundled the VM and runtime with the compiler. Thus your language version defines your runtime and the underlying machine code. If you write a new language, write down a minimum instruction set that you will always support and separate your VM from your compiler. Always support that instruction set. This can lead to interesting problems. The implementation of Java generics is a good example. Nobody thought about generics when defining the instruction set. Therefore the bytecode was not designed to retain generic type information. That's why the JVM cannot retain generic type information at runtime and generic types have to be checked at compile time. This is known as type erasure. Python and PHP developers would probably just introduce new bytecodes, not caring about BC. (Well, PHP devs would just pretend that PHP is a web language and web projects shouldn't care about BC at all ;)).
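The advice to fix a minimal instruction set and keep the VM separate from the compiler can be sketched in a few lines. This is a toy, not a real language design: the three opcodes, the tuple-based program format and the function names are all invented for illustration. The point is only that `run` knows nothing about the source language, so the compiler can evolve freely as long as it keeps emitting the same instructions.

```python
import ast

# The frozen instruction set: a hypothetical "spec" the VM promises to support forever.
PUSH, ADD, MUL = "PUSH", "ADD", "MUL"


def compile_expr(source):
    """Compiler: turn an arithmetic expression into a list of (opcode, arg) pairs."""
    tree = ast.parse(source, mode="eval")
    program = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            program.append((PUSH, node.value))
        elif isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
            # Post-order: operands first, then the operator.
            emit(node.left)
            emit(node.right)
            program.append((ADD if isinstance(node.op, ast.Add) else MUL, None))
        else:
            raise SyntaxError("unsupported expression")

    emit(tree.body)
    return program


def run(program):
    """VM: a stack machine that only understands the instruction set above."""
    stack = []
    for opcode, arg in program:
        if opcode == PUSH:
            stack.append(arg)
        elif opcode == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError("unknown opcode: %r" % (opcode,))
    return stack.pop()


program = compile_expr("1 + 2 * 3")
print(program)
print(run(program))  # prints 7
```

A newer compiler could add syntax such as parentheses-heavy macros or named constants, and old VMs would still run its output, provided everything is lowered to `PUSH`, `ADD` and `MUL`. That constraint is also where trade-offs like type erasure come from.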

If you seriously care about backwards compatibility, even for LTS systems that are 8 years old, choose a language that has an implementation which separates the VM from the compiler. Languages like Java (and probably C#) do this. The Java developers are very reluctant to define behavior that requires a new bytecode. PHP and Python are wonderful languages to write software in, but personally I am not sure if it is wise to write something like a VCS in such a language.

Long story short: language choice matters for BC. If you write your own language, please separate your VM from your compiler. Better yet (as johannes pointed out), compile to an existing VM like the JVM, CLR or LLVM.