Performance Analysis - Numba
Number of effective sequences implemented in Numba
In the previous post I compared various languages and libraries in terms of their speed. This notebook contains the code used in the comparison, as well as some details about the choices made to improve the performance of the Numba implementation.
From Numba website: "Numba translates Python functions to optimized machine code at runtime using the industry-standard LLVM compiler library. Numba-compiled numerical algorithms in Python can approach the speeds of C or FORTRAN."
```python
# ! pip install pandas
# ! pip install numba

import pandas as pd

def get_data(path):
    fasta_df = pd.read_csv(path, lineterminator=">", header=None)
    fasta_df[['id', 'seq']] = fasta_df[0].str.split('\n', expand=True)[[0, 1]]
    return fasta_df.seq.to_numpy(dtype=str)

seqs = get_data('picked_msa.fasta')
```
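For comparison, the same parsing can be sketched with the standard library alone. `parse_fasta` is a hypothetical helper, not part of the original code; unlike `get_data`, which keeps only the first line after each header, it also joins multi-line sequences:

```python
from io import StringIO

def parse_fasta(handle):
    # hypothetical stdlib-only sketch of FASTA parsing:
    # split on ">", take the header up to the first newline,
    # and join the remaining lines into one sequence string
    ids, seqs = [], []
    for record in handle.read().split(">")[1:]:
        header, _, seq = record.partition("\n")
        ids.append(header.strip())
        seqs.append(seq.replace("\n", ""))
    return ids, seqs

fasta = ">id1\nACDE\n>id2\nACDF\n"
ids, seqs = parse_fasta(StringIO(fasta))
print(ids, seqs)  # ['id1', 'id2'] ['ACDE', 'ACDF']
```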
As a reminder, the pseudo code looks like this:
```
meff = 0
for seq1 in seqs:
    weight = 0
    for seq2 in seqs:
        if count_matches(seq1, seq2) > threshold:
            weight += 1
    meff += 1/weight
meff = meff/(len(seq1)^0.5)
```
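Filling in the details, the pseudo code can be sketched as plain Python. `get_nf_python` and `count_matches` are hypothetical reference implementations; `threshold` is treated as a fractional identity, matching the mean-based comparison in `get_nf_numba` below:

```python
from math import sqrt

def count_matches(a, b):
    # number of aligned positions where the two sequences agree
    return sum(x == y for x, y in zip(a, b))

def get_nf_python(seqs, threshold=0.8):
    # hypothetical pure-Python reference for the pseudo code above
    seq_len = len(seqs[0])
    meff = 0.0
    for seq1 in seqs:
        weight = 0
        for seq2 in seqs:
            # count seq2 as a cluster neighbour if fractional identity
            # exceeds the threshold (includes seq1 itself, so weight >= 1)
            if count_matches(seq1, seq2) / seq_len > threshold:
                weight += 1
        meff += 1.0 / weight
    return meff / sqrt(seq_len)

print(get_nf_python(["AAAA", "AAAT", "CCCC"]))  # 1.5
```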
As with the NumPy and pure-Python versions, we use the same input data. The code is closer to the pure-Python version because wrapping optimised NumPy code turned out to be slower. It seems that you are better off leaving all the optimisation to Numba.
```python
import numpy as np
from numba import jit, prange

def get_nf_numba(seqs, threshold=0.8):
    # reinterpret the unicode string array as per-character codepoints
    seqs = seqs.view(np.uint32).reshape(seqs.shape[0], -1)
    n_seqs, seq_len = seqs.shape
    is_same_cluster = np.eye(n_seqs)
    for i in prange(n_seqs):
        for j in prange(i + 1, n_seqs):
            identity = np.equal(seqs[i], seqs[j]).mean()
            is_more = np.greater(identity, threshold)
            is_same_cluster[i, j] = is_more
            is_same_cluster[j, i] = is_more
    meff = 1.0 / is_same_cluster.sum(1)
    return meff.sum() / (seq_len ** 0.5)
```
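The `view(np.uint32)` line reinterprets each fixed-width unicode string as an array of per-character codepoints, so sequences can be compared element-wise as numbers. A minimal sketch of what it does, on a toy array rather than the real data:

```python
import numpy as np

arr = np.array(["ACD", "ACE"])  # dtype '<U3', 4 bytes per character
codes = arr.view(np.uint32).reshape(arr.shape[0], -1)

print(codes.shape)                    # (2, 3): one row of codepoints per string
print(codes[0])                       # [65 67 68], i.e. 'A', 'C', 'D'
print((codes[0] == codes[1]).mean())  # fraction of matching positions: 2/3
```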
There are a couple of things that need to be done in order to utilise Numba fully. Firstly, Numba uses [just-in-time compilation](https://en.wikipedia.org/wiki/Just-in-time_compilation), so you need to wrap your functions with either the `@jit` decorator or the `jit` function. Note that the first run of a wrapped function will be slower, as Numba needs to compile the code. Secondly, there is the `nopython` option, which bypasses the Python interpreter. It has its own downsides, but it allows the code to run faster.
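The decorator form is equivalent to calling `jit` directly on a function. A toy sketch, where `my_sum` is a hypothetical function and the `try`/`except` fallback is only there so the snippet also runs where Numba is not installed:

```python
try:
    from numba import jit
except ImportError:
    # fallback: a no-op decorator factory so the sketch still runs
    def jit(*args, **kwargs):
        def wrap(f):
            return f
        return wrap

# equivalent to: my_sum = jit(my_sum, nopython=True)
@jit(nopython=True)
def my_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total

print(my_sum(10))  # 45; with Numba installed, the first call compiles
```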
```python
fn = jit(get_nf_numba, nopython=True, parallel=False)
fn(seqs[:100])  # warm-up call to trigger compilation
```

```python
%%timeit -n 3 -r 3
fn(seqs[:2500])
```
Another really nice feature of Numba is that it lets you parallelise code with a single option, as you can see below.
```python
fn = jit(get_nf_numba, nopython=True, parallel=True)
fn(seqs[:100])  # warm-up call to trigger compilation
```

```python
%%timeit -n 3 -r 3
fn(seqs[:2500])
```
Finally, if precision is less important and can be sacrificed for extra speed, there is the `fastmath` option. From the [Numba documentation](https://numba.readthedocs.io/en/stable/user/performance-tips.html?highlight=fastmath#fastmath):
“In certain classes of applications strict IEEE 754 compliance is less important. As a result it is possible to relax some numerical rigour with view of gaining additional performance. The way to achieve this behaviour in Numba is through the use of the fastmath keyword argument”
```python
fn = jit(get_nf_numba, nopython=True, parallel=True, fastmath=True)
fn(seqs[:100])  # warm-up call to trigger compilation
```

```python
%%timeit -n 3 -r 3
fn(seqs[:2500])
```
Numba was the fastest library I tried on the CPU, and it was relatively easy to get started with. Of course, there will be cases where Numba does not work, but in general it deserves serious consideration when looking for ways to improve the performance of your code.