Performance Analysis - Numba
Number of effective sequences implemented in Numba
In the previous post I compared various languages and libraries in terms of their speed. This notebook contains the code used in the comparison, as well as some details about the choices made to improve the performance of the Numba implementation.
From Numba website: "Numba translates Python functions to optimized machine code at runtime using the industry-standard LLVM compiler library. Numba-compiled numerical algorithms in Python can approach the speeds of C or FORTRAN."
```python
# ! pip install pandas
# ! pip install numba

import pandas as pd

def get_data(path):
    fasta_df = pd.read_csv(path, lineterminator=">", header=None)
    fasta_df[['id', 'seq']] = fasta_df[0].str.split('\n', expand=True)[[0, 1]]
    return fasta_df.seq.to_numpy(dtype=str)

seqs = get_data('picked_msa.fasta')
```
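For comparison, the same parsing can be sketched with the standard library alone. `parse_fasta` is a hypothetical helper, not part of the original code; unlike `get_data`, which keeps only the first line after each header, it also joins multi-line sequences:

```python
from io import StringIO

def parse_fasta(handle):
    # hypothetical stdlib-only sketch of FASTA parsing:
    # split on ">", take the header up to the first newline,
    # and join the remaining lines into one sequence string
    ids, seqs = [], []
    for record in handle.read().split(">")[1:]:
        header, _, seq = record.partition("\n")
        ids.append(header.strip())
        seqs.append(seq.replace("\n", ""))
    return ids, seqs

fasta = ">id1\nACDE\n>id2\nACDF\n"
ids, seqs = parse_fasta(StringIO(fasta))
print(ids, seqs)  # ['id1', 'id2'] ['ACDE', 'ACDF']
```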
As a reminder, the pseudo code looks like this:
```
meff = 0
for seq1 in seqs:
    weight = 0
    for seq2 in seqs:
        if count_matches(seq1, seq2) > threshold:
            weight += 1
    meff += 1/weight
meff = meff/(len(seq1)^0.5)
```
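Filling in the details, the pseudo code can be sketched as plain Python. `get_nf_python` and `count_matches` are hypothetical reference implementations; `threshold` is treated as a fractional identity, matching the mean-based comparison in `get_nf_numba` below:

```python
from math import sqrt

def count_matches(a, b):
    # number of aligned positions where the two sequences agree
    return sum(x == y for x, y in zip(a, b))

def get_nf_python(seqs, threshold=0.8):
    # hypothetical pure-Python reference for the pseudo code above
    seq_len = len(seqs[0])
    meff = 0.0
    for seq1 in seqs:
        weight = 0
        for seq2 in seqs:
            # count seq2 as a cluster neighbour if fractional identity
            # exceeds the threshold (includes seq1 itself, so weight >= 1)
            if count_matches(seq1, seq2) / seq_len > threshold:
                weight += 1
        meff += 1.0 / weight
    return meff / sqrt(seq_len)

print(get_nf_python(["AAAA", "AAAT", "CCCC"]))  # 1.5
```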
As with the NumPy and pure-Python versions, we use the same input data. The code is closer to the pure-Python version because wrapping optimised NumPy code turned out to be slower. It seems that you are better off leaving all the optimisation to Numba.
```python
import numpy as np
from numba import jit, prange

def get_nf_numba(seqs, threshold=0.8):
    # reinterpret the unicode string array as per-character codepoints
    seqs = seqs.view(np.uint32).reshape(seqs.shape[0], -1)
    n_seqs, seq_len = seqs.shape
    is_same_cluster = np.eye(n_seqs)
    for i in prange(n_seqs):
        for j in prange(i + 1, n_seqs):
            identity = np.equal(seqs[i], seqs[j]).mean()
            is_more = np.greater(identity, threshold)
            is_same_cluster[i, j] = is_more
            is_same_cluster[j, i] = is_more
    meff = 1.0 / is_same_cluster.sum(1)
    return meff.sum() / (seq_len ** 0.5)
```
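The `view(np.uint32)` line reinterprets each fixed-width unicode string as an array of per-character codepoints, so sequences can be compared element-wise as numbers. A minimal sketch of what it does, on a toy array rather than the real data:

```python
import numpy as np

arr = np.array(["ACD", "ACE"])  # dtype '<U3', 4 bytes per character
codes = arr.view(np.uint32).reshape(arr.shape[0], -1)

print(codes.shape)                    # (2, 3): one row of codepoints per string
print(codes[0])                       # [65 67 68], i.e. 'A', 'C', 'D'
print((codes[0] == codes[1]).mean())  # fraction of matching positions: 2/3
```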
There are a couple of things that need to be done in order to utilise Numba fully. Firstly, Numba uses [just-in-time compilation](https://en.wikipedia.org/wiki/Just-in-time_compilation), so you need to wrap your functions with either the `@jit` decorator or the `jit` function. Note that the first run of a wrapped function will be slower, as Numba needs to compile the code. Secondly, there is the `nopython` option, which bypasses the Python interpreter. It has its own downsides, but it allows the code to run faster.
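The decorator form is equivalent to calling `jit` directly on a function. A toy sketch, where `my_sum` is a hypothetical function and the `try`/`except` fallback is only there so the snippet also runs where Numba is not installed:

```python
try:
    from numba import jit
except ImportError:
    # fallback: a no-op decorator factory so the sketch still runs
    def jit(*args, **kwargs):
        def wrap(f):
            return f
        return wrap

# equivalent to: my_sum = jit(my_sum, nopython=True)
@jit(nopython=True)
def my_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total

print(my_sum(10))  # 45; with Numba installed, the first call compiles
```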
```python
fn = jit(get_nf_numba, nopython=True, parallel=False)
fn(seqs[:100])  # warm-up call to trigger compilation
```

```python
%%timeit -n 3 -r 3
fn(seqs[:2500])
```
Another really nice feature of Numba is that it lets you parallelise code with a single option, as you can see below.
```python
fn = jit(get_nf_numba, nopython=True, parallel=True)
fn(seqs[:100])  # warm-up call to trigger compilation
```

```python
%%timeit -n 3 -r 3
fn(seqs[:2500])
```
Finally, if precision is less important and can be sacrificed for extra speed, there is the `fastmath` option. From the [Numba documentation](https://numba.readthedocs.io/en/stable/user/performance-tips.html?highlight=fastmath#fastmath):
“In certain classes of applications strict IEEE 754 compliance is less important. As a result it is possible to relax some numerical rigour with view of gaining additional performance. The way to achieve this behaviour in Numba is through the use of the fastmath keyword argument”
```python
fn = jit(get_nf_numba, nopython=True, parallel=True, fastmath=True)
fn(seqs[:100])  # warm-up call to trigger compilation
```

```python
%%timeit -n 3 -r 3
fn(seqs[:2500])
```
Numba was the fastest library I tried on the CPU, and it was relatively easy to get started with. Of course, there will be cases where Numba does not work, but in general it deserves serious consideration when looking for ways to improve the performance of your code.