What is Cython? An introduction to a supercharged version of Python
Fed up of how slow Python is? If you’re looking for a way to optimise Python code, especially for handling big data, you may want to invest some time in Cython.
Like animals or plants, languages live and die. A language is kept alive by its speakers and can become extinct when those speakers die themselves. Take Latin, for example, which formed the foundations of many modern Western languages but has not been widely spoken for centuries. It’s the same story for programming languages.
Programming languages rise and fall with the waves of innovation. Coding goes through trends: like any language, a programming language evolves and is built upon, surging in popularity before eventually being displaced and confined to the annals of forgotten hard drives.
Some of the code that developers used in the early days of software is but hieroglyphics on a cave wall in comparison to the code we employ today. Well-known languages, too, have fallen by the wayside, been built upon or simply withered in popularity. Pascal led to Delphi, Objective-C was largely replaced by a much cooler Swift, and Perl has been in a liminal state for years, regularly in and out of fashion.
Python is being built upon and it’s benefiting data scientists and AI programmers.
What is Cython?
Python is a high-level, general-purpose programming language, first released by Dutchman Guido van Rossum in 1991 and named after the eccentric British comedy troupe Monty Python. Cython is a portmanteau of Python and C/C++, not to be confused with CPython, the standard Python interpreter, which is itself written in C.
Cython lets Python interact with C/C++ code using a syntax that stays close to Python. Cython can massively decrease computing time: that’s the main reason that it has generated hype. Unlike ordinary Python, Cython code must be compiled before it can run.
A compiler analyses source code and translates it into a lower-level language, ultimately machine code. Generally, the result is faster for your computer to run. Python is a high-level language, designed to be read by humans. Cython takes Python-like source code and translates it into C, and a C compiler then turns that into machine code that the computer can execute directly.
Python is highly readable for humans. Its standard implementation is an interpreter rather than an ahead-of-time compiler: it converts source code into bytecode and then executes that bytecode one instruction at a time. This translation and dispatch takes time while the program is running, and many errors only surface when the offending line is actually reached.
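To see the instructions that the interpreter actually steps through, the standard-library `dis` module can list the bytecode of a function. The sketch below is illustrative only: `add_tax` is a hypothetical function invented for this example, and the exact opcodes printed vary between Python versions.

```python
import dis


def add_tax(price, rate):
    # Hypothetical example function; the names are illustrative only.
    return price * (1 + rate)


# CPython has already compiled the function body to bytecode,
# stored as a bytes object on the function's code object.
bytecode = add_tax.__code__.co_code

# dis.get_instructions yields one Instruction per bytecode operation.
instructions = list(dis.get_instructions(add_tax))

if __name__ == "__main__":
    # Print a human-readable listing; exact opcodes depend on the Python version.
    dis.dis(add_tax)
```

Each listed instruction is dispatched by the interpreter at run time, which is the overhead that compiling to native code aims to remove.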
Cython’s speed in comparison to Python depends on the code itself. The biggest gains tend to come from computationally heavy loops, particularly once the variables involved are given static C types.
How could Cython be used?
Python is commonly used in conjunction with artificial intelligence. For many developers, Python is the backbone of machine learning, and they rely on libraries like Keras, scikit-learn and TensorFlow. Python is also used for Natural Language Processing (NLP), thanks to its simple syntax and rich text-processing tools.
The ability that Cython has to speed up your Python workflows makes it extremely appealing. Take an NLP task, for example: imagine that you want to count the number of times a particular word crops up in a dataset. A Python loop can find the answer, but supposing that there are thousands of documents within a dataset, it could take a long time to get a result.
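The word-counting task described above might look like this in plain Python. This is a minimal sketch, not a reference implementation: `count_word`, the sample `documents` and the whitespace-plus-punctuation tokenisation are all assumptions made for illustration.

```python
def count_word(documents, target):
    """Count occurrences of `target` across an iterable of text documents."""
    target = target.lower()
    total = 0
    for doc in documents:
        # Naive whitespace tokenisation; a real NLP pipeline would use a proper tokeniser.
        for token in doc.lower().split():
            # Strip surrounding punctuation before comparing.
            if token.strip(".,;:!?\"'()") == target:
                total += 1
    return total


# Hypothetical sample data, standing in for a much larger dataset.
documents = [
    "Cython compiles Python-like code to C.",
    "Is Cython faster? Often, yes: Cython helps with heavy loops.",
    "Plain Python is fine for small jobs.",
]
cython_count = count_word(documents, "cython")
```

The nested loop runs once per token, so the cost grows with the total number of words in the dataset. Loops of this shape are the usual candidates for moving into Cython.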
In CPython, the standard Python implementation, there’s something called a global interpreter lock (GIL). This is essentially a lock that protects access to Python objects, preventing multiple threads from executing Python bytecode at once. CPython’s memory management is not thread-safe, so the lock is what keeps multithreaded Python programs stable.
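With Cython, the GIL can be released explicitly around code that does not touch Python objects, allowing that code to run in parallel across threads.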
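The effect of the GIL on pure Python threads can be sketched with the standard library alone. In the example below, `sum_of_squares` and the input sizes are hypothetical. On a standard CPython build the four threads take turns executing bytecode, so little or no speed-up over the sequential version should be expected, although the results are identical. The sketch deliberately makes no timing claims.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(n):
    # CPU-bound pure-Python loop; under the GIL only one thread
    # at a time can execute this bytecode.
    total = 0
    for i in range(n):
        total += i * i
    return total


# Hypothetical workload sizes, chosen only for illustration.
inputs = [50_000, 60_000, 70_000, 80_000]

# Sequential baseline.
sequential = [sum_of_squares(n) for n in inputs]

# The same work spread over four threads; pool.map preserves input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(sum_of_squares, inputs))
```

Timing both versions with `time.perf_counter` on your own machine is a simple way to check whether threads help for a given workload.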
Cython is the programmer’s language of the future when it comes to NLP
Cython code is a little longer, since a developer must declare and populate the C structures, but ultimately it runs a lot quicker. Cython code has been reported to run up to 2300 times faster than an equivalent Python implementation, though the gain depends heavily on the task. The traditional assumption with AI is that it’s conducted in laboratories by scientists on supercomputers; the truth is that you can run Python on Mac, Windows and Linux. This speed-up is very much welcome to any kind of programmer.
If there is a significant disadvantage with Cython, it is that some developers prefer not to “contaminate” the code, so to speak. Some prefer Python to stay as Python: confusing the code with C elements can complicate things.
However, in the long run, Cython exists simply to make Python easier, not to make it more complex. Once compiled, a Cython module can be imported by a Python program like any other module. If you are pre-processing a large dataset or computing analytics, Cython is a relatively easy way to speed up your workflow.
Languages come and go. As recently as 2012, the Cromarty dialect found in Northern Scotland went extinct with the passing of its last speaker. Programming languages should be thought of in a similar way: like spoken languages, they develop every day, branching off in different directions as words and other languages are added to each other.
A development like Cython even got its name in the same way that Frenglish, for example – the combination of English and French elements into one tongue – got its title. Cython is the programmer’s language of the future when it comes to NLP. What kind of language will we end up building upon it?