Performance Analysis - TensorFlow
Number of effective sequences implemented in TensorFlow
In the previous post I compared various languages and libraries in terms of speed. This notebook contains the code used in that comparison, as well as some details about the choices made to improve the performance of the TensorFlow implementation.
# ! pip install pandas
# ! pip install tensorflow-gpu
import pandas as pd
import numpy as np
def get_data(path):
    # Parse FASTA by treating ">" as the record terminator and "\n" as the
    # field separator, so each record becomes an (id, seq) pair.
    fasta_df = pd.read_csv(path, sep="\n", lineterminator=">", index_col=False, names=['id', 'seq'])
    return fasta_df.seq.to_numpy(dtype=str)
seqs = get_data('../data/picked_msa.fasta')
Just as a reminder, the pseudo code looks like this:
meff = 0
for seq1 in seqs:
    weight = 0
    for seq2 in seqs:
        if count_matches(seq1, seq2) > threshold:
            weight += 1
    meff += 1/weight
meff = meff / len(seq1)**0.5
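For reference, the pseudo code above can be sketched in plain NumPy via broadcasting, assuming the sequences are already an (n_seqs, seq_len) integer array. The function name `get_nf_np` and the toy array are illustrative, not part of the original code:

```python
import numpy as np

def get_nf_np(seqs, threshold=0.8):
    n_seqs, seq_len = seqs.shape
    # pairwise fraction of identical positions, shape (n_seqs, n_seqs)
    pairwise_id = (seqs[:, None, :] == seqs[None, :, :]).mean(-1)
    # cluster size of a sequence = number of neighbours above the threshold
    # (every sequence matches itself, so weights >= 1)
    weights = (pairwise_id > threshold).sum(-1)
    # Meff = sum of inverse cluster sizes, normalised by sqrt(seq_len)
    return (1.0 / weights).sum() / seq_len ** 0.5

toy = np.array([[1, 2, 3, 4],
                [1, 2, 3, 4],
                [4, 3, 2, 1]])
get_nf_np(toy)  # two identical sequences form one cluster: (0.5 + 0.5 + 1) / 2
```

This builds the full n² pairwise matrix in one shot, which is exactly what the batched TensorFlow version below avoids holding in memory all at once.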
import tensorflow as tf
@tf.function
def get_nf_tf(seqs, threshold=0.8, dtype='float16', batch_size=1):
    n_seqs, seq_len = seqs.shape
    # accumulate the sum of inverse cluster sizes in float32 for stability
    s = tf.constant(0, dtype=tf.float32)
    for i in tf.range(0, limit=n_seqs, delta=batch_size):
        batch = tf.expand_dims(seqs[i:i+batch_size], 1)
        match = tf.cast(tf.equal(seqs, batch), dtype)
        pairwise_id = tf.reduce_mean(match, -1)
        is_more = tf.cast(tf.greater(pairwise_id, threshold), dtype)
        inv_cluster_size = tf.divide(1.0, tf.reduce_sum(is_more, -1))
        s = s + tf.cast(tf.reduce_sum(inv_cluster_size), tf.float32)
    return tf.divide(s, tf.constant((seq_len**0.5), dtype=tf.float32))
seqs_ = seqs[:100]
get_nf_tf(seqs_.view(np.uint32).reshape(seqs_.shape[0], -1))
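The `.view(np.uint32)` call deserves a word: NumPy stores each character of a unicode (`<U`) array as a 4-byte code point, so viewing the array as uint32 yields one integer per character with no copying. A toy array illustrates this (assuming a little-endian machine, where the `<U` byte order matches native uint32):

```python
import numpy as np

arr = np.array(['ACGT', 'ACGA'], dtype='<U4')
# each '<U4' element is 16 bytes = 4 uint32 code points
codes = arr.view(np.uint32).reshape(arr.shape[0], -1)
codes.shape  # (2, 4); codes[0] holds ord('A'), ord('C'), ord('G'), ord('T')
```

This is what turns the string array into the (n_seqs, seq_len) integer matrix that `tf.equal` can compare elementwise.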
%%timeit -n 3 -r 3
seqs_ = seqs[:100]
get_nf_tf(seqs_.view(np.uint32).reshape(seqs_.shape[0], -1))
seqs_ = seqs[:100]
get_nf_tf(seqs_.view(np.uint32).reshape(seqs_.shape[0], -1), dtype='float32')
%%timeit -n 3 -r 3
with tf.device('/cpu:0'):
    get_nf_tf(seqs_.view(np.uint32).reshape(seqs_.shape[0], -1), dtype='float32')
A couple of points:
- float16 gives a speed-up on the GPU, but precision suffers. On the CPU, float16 is actually slower than float32.
- Tensor values cannot easily be manipulated in place. As a result, I could not exploit the symmetry of the pairwise identity matrix to halve the amount of computation.
- tf.function can give a massive speed boost.
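A quick NumPy illustration of why float16 precision suffers when accumulating many terms: float16 has a 10-bit mantissa, so once a running sum reaches 2048 the spacing between representable values is 2, and adding 1.0 no longer changes the accumulator at all:

```python
import numpy as np

acc = np.float16(0)
for _ in range(4096):
    acc += np.float16(1.0)  # exact up to 2048, then rounds back down
print(acc)  # 2048.0, not 4096.0
```

This is why the running sum `s` in `get_nf_tf` above is kept in float32 even when the elementwise matching is done in float16.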