Optimising your code with Numba

19 April 2021
Marta López

Head of Marketing and Communication

Optimise your Python code with Numba

Discover the Python acceleration features that Numba offers. Learn how to optimise your Python code and make better use of your resources with this powerful tool.

What will we see in this article?

Topics:

  • What is Numba? 
  • How does Numba work?
  • First steps: Compiling for the CPU
  • Use Numba to compile and accelerate functions using the CPU.

What is Numba?

Numba is a just-in-time compiler for Python, specially designed to speed up your numerical functions by generating optimised machine code from Python code using LLVM. With this tool you can optimise your code and get performance similar to C and C++ without having to switch programming languages. It is imported as a regular module in your Python development environment.

You can see the code in my GitHub! 🙂

How does Numba work?

Numba accelerates numerical Python functions on both the CPU (Central Processing Unit) and the GPU (Graphics Processing Unit). Its main characteristics are:

  1. Compiled functions: Numba compiles only Python functions, not entire applications. In essence, Numba is a Python module that improves the performance of our functions.
  2. Just-in-time (dynamic translation) with @jit: Numba translates the bytecode (an intermediate, more abstract code) into machine code immediately before it is executed.
  3. Focus on numerical functions: Numba is focused on numeric data, such as int, float and complex. Support for string data is still limited.

Numba is not the only way of programming in CUDA, which is normally programmed directly in C or C++. However, Numba lets you program directly in Python and optimise your code for both the CPU and the GPU by changing just a couple of lines. Within Python there are other alternatives, such as PyCUDA. Here is an overview:

CUDA C/C++:

  1. It is the most common and flexible way of programming in CUDA.
  2. Accelerates C and C++ applications.

PyCUDA:

  1. It is the most efficient way to program in CUDA in Python.
  2. Requires inserting C code into Python.

Numba:

  1. Less efficient than PyCUDA.
  2. Allows you to write your code in Python.
  3. It also optimises the code for CPU.

First steps: Compiling CPU functions 

Apart from speeding up your functions on the GPU, Numba can also be used to optimise functions on the CPU. To do this, you simply apply a Python decorator to the functions you want to compile; these are known as decorated functions.

First of all, let's evaluate the hypot function to see how Numba works. We must apply the @jit decorator. The result is the same as that of the pure Python function, and Numba keeps the original Python implementation in the .py_func attribute.

Benchmark

It is important to measure the performance of our code and to check whether Numba really helps. To do so, we time the pure Python implementation against the Numba implementation.

The result is surprising: Python's math.hypot function is faster than the Numba version! This is because Numba adds some overhead to every function call, so for a very simple function that overhead can outweigh the gain from compilation.

Operation 

When the decorated hypot function is called for the first time, Numba compiles it in several stages. The key terms are:

  • IR: Intermediate representation.
  • Bytecode analysis: analysis of the bytecode, an intermediate code that is more abstract than machine code.
  • LLVM: Low Level Virtual Machine, an infrastructure for developing compilers.
  • NVVM: An IR compiler based on LLVM, designed to compile kernels for GPUs.

Each line of Python is preceded by several lines of IR code. The most useful part is the type annotations that Numba displays for each variable. For example, a pyobject annotation means that Numba does not recognise an object, such as the function np.sin, and falls back to handling it through Python. We can inspect these annotations for hypot using .inspect_types().

A Concrete Example: How a fractal is generated

We are going to measure the performance of code that generates fractals from the Mandelbrot set and see whether Numba helps us to optimise it.

In pure Python it takes about 4.6 seconds to generate the fractal. Now let's see if Numba improves performance when we add the @jit decorator.

We have reduced the time from about 4.6 seconds to 52.4 ms, and this has been done just by adding a decorator.

A common mistake

We have said that Numba only works with numeric functions. Although it compiles and executes Python code, there are some data types that it cannot compile yet (such as dictionaries).

Even so, a function that indexes a dictionary does not fail when it is decorated with @jit, despite what we have just said. The point is that Numba keeps two versions of the function, one in Python and one compiled by Numba, and in this case we are getting the Python one. We can check this by adding nopython=True, which makes Numba raise an error instead of falling back to Python.

Note that @jit(nopython=True) is equivalent to @njit.


This article was written by Alejandro Díaz Santos (GitHub) for IMMUNE Technology Institute.
