Performance Analysis - TensorFlow
Number of effective sequences implemented in TensorFlow
In the previous post I compared various languages and libraries in terms of speed. This notebook contains the code used in that comparison, as well as some details about the choices made to improve the performance of the TensorFlow implementation.
# ! pip install pandas
# ! pip install tensorflow-gpu
import pandas as pd
import numpy as np
def get_data(path):
    # Parse FASTA by treating ">" as the record terminator and "\n" as the
    # field separator, so each record becomes an (id, seq) pair.
    fasta_df = pd.read_csv(path, sep="\n", lineterminator=">", index_col=False, names=['id', 'seq'])
    return fasta_df.seq.to_numpy(dtype=str)
seqs = get_data('../data/picked_msa.fasta')
Just as a reminder, the pseudo code looks like this:
meff = 0
for seq1 in seqs:
    weight = 0
    for seq2 in seqs:
        if count_matches(seq1, seq2) > threshold:
            weight += 1
    meff += 1/weight
meff = meff / len(seq1)**0.5
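For reference, the pseudo code above can be sketched in plain NumPy via broadcasting, assuming the sequences are already an (n_seqs, seq_len) integer array. The function name `get_nf_np` and the toy array are illustrative, not part of the original code:

```python
import numpy as np

def get_nf_np(seqs, threshold=0.8):
    n_seqs, seq_len = seqs.shape
    # pairwise fraction of identical positions, shape (n_seqs, n_seqs)
    pairwise_id = (seqs[:, None, :] == seqs[None, :, :]).mean(-1)
    # cluster size of a sequence = number of neighbours above the threshold
    # (every sequence matches itself, so weights >= 1)
    weights = (pairwise_id > threshold).sum(-1)
    # Meff = sum of inverse cluster sizes, normalised by sqrt(seq_len)
    return (1.0 / weights).sum() / seq_len ** 0.5

toy = np.array([[1, 2, 3, 4],
                [1, 2, 3, 4],
                [4, 3, 2, 1]])
get_nf_np(toy)  # two identical sequences form one cluster: (0.5 + 0.5 + 1) / 2
```

This builds the full n² pairwise matrix in one shot, which is exactly what the batched TensorFlow version below avoids holding in memory all at once.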
import tensorflow as tf
@tf.function
def get_nf_tf(seqs, threshold=0.8, dtype='float16', batch_size=1):
    n_seqs, seq_len = seqs.shape
    # accumulate the sum of inverse cluster sizes in float32 for stability
    s = tf.constant(0, dtype=tf.float32)
    for i in tf.range(0, limit=n_seqs, delta=batch_size):
        batch = tf.expand_dims(seqs[i:i+batch_size], 1)
        match = tf.cast(tf.equal(seqs, batch), dtype)
        pairwise_id = tf.reduce_mean(match, -1)
        is_more = tf.cast(tf.greater(pairwise_id, threshold), dtype)
        inv_cluster_size = tf.divide(1.0, tf.reduce_sum(is_more, -1))
        s = s + tf.cast(tf.reduce_sum(inv_cluster_size), tf.float32)
    return tf.divide(s, tf.constant((seq_len**0.5), dtype=tf.float32))
seqs_ = seqs[:100]
get_nf_tf(seqs_.view(np.uint32).reshape(seqs_.shape[0], -1))
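The `.view(np.uint32)` call deserves a word: NumPy stores each character of a unicode (`<U`) array as a 4-byte code point, so viewing the array as uint32 yields one integer per character with no copying. A toy array illustrates this (assuming a little-endian machine, where the `<U` byte order matches native uint32):

```python
import numpy as np

arr = np.array(['ACGT', 'ACGA'], dtype='<U4')
# each '<U4' element is 16 bytes = 4 uint32 code points
codes = arr.view(np.uint32).reshape(arr.shape[0], -1)
codes.shape  # (2, 4); codes[0] holds ord('A'), ord('C'), ord('G'), ord('T')
```

This is what turns the string array into the (n_seqs, seq_len) integer matrix that `tf.equal` can compare elementwise.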
%%timeit -n 3 -r 3
seqs_ = seqs[:100]
get_nf_tf(seqs_.view(np.uint32).reshape(seqs_.shape[0], -1))
seqs_ = seqs[:100]
get_nf_tf(seqs_.view(np.uint32).reshape(seqs_.shape[0], -1), dtype='float32')
%%timeit -n 3 -r 3
with tf.device('/cpu:0'):
    get_nf_tf(seqs_.view(np.uint32).reshape(seqs_.shape[0], -1), dtype='float32')
A couple of points:
- float16 gives a speed-up on the GPU, but precision suffers. On the CPU, float16 is actually slower than float32.
- Tensor values cannot easily be manipulated in place. As a result, I could not exploit the symmetry of the pairwise identity matrix to halve the amount of computation.
- tf.function can give a massive speed boost.
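A quick NumPy illustration of why float16 precision suffers when accumulating many terms: float16 has a 10-bit mantissa, so once a running sum reaches 2048 the spacing between representable values is 2, and adding 1.0 no longer changes the accumulator at all:

```python
import numpy as np

acc = np.float16(0)
for _ in range(4096):
    acc += np.float16(1.0)  # exact up to 2048, then rounds back down
print(acc)  # 2048.0, not 4096.0
```

This is why the running sum `s` in `get_nf_tf` above is kept in float32 even when the elementwise matching is done in float16.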