Performance Analysis - Julia
Number of effective sequences implemented in Julia
In the previous post I compared various languages and libraries in terms of speed. This notebook contains the code of the Julia implementation. I struggled to make it run in parallel, and I am not sure the code is optimal, but I include it for completeness.
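Reading the code, the quantity it computes for an integer-encoded alignment $x$ of $N$ sequences and $L$ positions is the following. The 0.8 threshold is the one hard-coded in `get_nf_row`, the set in the denominator includes $j = i$ because $\mathrm{id}(i,i) = 1$, and $N_f$ is only a label chosen here.

$$
N_f = \frac{1}{\sqrt{L}} \sum_{i=1}^{N} \frac{1}{\left|\{\, j : \mathrm{id}(i,j) > 0.8 \,\}\right|},
\qquad
\mathrm{id}(i,j) = \frac{1}{L} \sum_{p=1}^{L} \mathbb{1}\left[x_{ip} = x_{jp}\right]
$$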
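import Statistics
using NPZ
# npz_file_path must point to the alignment file. npzread returns an array for a .npy file,
# but a Dict of arrays for a .npz archive, in which case pick the array by its key first.
input_data = npzread(npz_file_path)
input_data = Int.(input_data)  # one row per sequence, one column per alignment position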
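# For every row after the first, return whether it is more than 80% identical to row 1
function get_nf_row(input_data)
    dim1, dim2 = size(input_data)
    # compare rows 2:dim1 with row 1, position by position
    pairwise_id = input_data[2:dim1, :] .== reshape(input_data[1, :], (1, dim2))
    # fraction of identical positions per row, as a (dim1-1) x 1 matrix
    pairwise_id = Statistics.mean(pairwise_id, dims=2)
    pairwise_id .> 0.8
end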
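The broadcast in `get_nf_row` allocates a temporary boolean matrix for every row. A loop-based variant avoids that allocation. This is a minimal sketch, not the original author's code: the name `get_nf_row_loop` and the `threshold` keyword are hypothetical. Like the original, it uses a strict `>` comparison, so a row that is exactly 80% identical is not counted.

```julia
# Hypothetical loop-based variant of get_nf_row (not from the original post).
# Returns a BitVector: entry k is true when row k+1 of `msa` is more than
# `threshold` identical to row 1.
function get_nf_row_loop(msa::AbstractMatrix{<:Integer}; threshold::Real = 0.8)
    n_rows, seq_len = size(msa)
    matches = zeros(Int, n_rows - 1)
    # Outer loop over positions (columns): Julia arrays are column-major,
    # so the inner loop over rows walks contiguous memory.
    for pos in 1:seq_len
        ref = msa[1, pos]
        for k in 2:n_rows
            matches[k - 1] += msa[k, pos] == ref  # Bool adds as 0 or 1
        end
    end
    return matches ./ seq_len .> threshold
end

msa = [1 2 3 4 5;
       1 2 3 4 5;
       1 2 3 4 0;
       9 9 9 9 9]
get_nf_row_loop(msa)
```

Row 2 is fully identical to row 1. Row 3 matches at exactly 4/5 = 0.8, so it fails the strict test. Row 4 shares nothing with row 1.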
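function get_nf_julia(input_data)
    n_seqs, seq_len = size(input_data)
    # is_same_cluster[i, j] is 1 when sequences i and j are more than 80% identical;
    # the diagonal stays 1 because every sequence belongs to its own cluster
    is_same_cluster = ones(n_seqs, n_seqs)
    # rows are strided over 24 tasks; they only run in parallel if Julia was started
    # with several threads (e.g. `julia -t 24` or JULIA_NUM_THREADS=24)
    Threads.@threads for t in 1:24
        # start at t, not 1+t: otherwise row 1 is never compared with the others
        for i in t:24:n_seqs-1
            # a view avoids copying the remaining rows on every iteration
            out = vec(get_nf_row(view(input_data, i:n_seqs, :)))
            # each i writes only entries (i, j) and (j, i) with j > i, so tasks never collide
            is_same_cluster[i, i+1:n_seqs] .= out
            is_same_cluster[i+1:n_seqs, i] .= out
        end
    end
    # weight of each sequence = 1 / (number of sequences in its cluster)
    s = 1.0 ./ sum(is_same_cluster, dims=2)
    sum(s) / sqrt(seq_len)
end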
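A plain single-threaded version is handy for checking the threaded one on small inputs. This is a minimal sketch, not the author's implementation. The name `neff_reference` and its `threshold` keyword are hypothetical, and it applies the same strict `> 0.8` identity test and the same $1/\sqrt{L}$ normalisation as `get_nf_julia`.

```julia
# Hypothetical serial reference for get_nf_julia (not from the original post).
function neff_reference(msa::AbstractMatrix{<:Integer}; threshold::Real = 0.8)
    n_seqs, seq_len = size(msa)
    cluster_size = ones(Int, n_seqs)  # every sequence counts itself
    for i in 1:n_seqs-1, j in i+1:n_seqs
        # fraction of positions where sequences i and j agree
        frac_id = count(p -> msa[i, p] == msa[j, p], 1:seq_len) / seq_len
        if frac_id > threshold
            cluster_size[i] += 1
            cluster_size[j] += 1
        end
    end
    # each sequence contributes 1 / (size of its cluster)
    return sum(1.0 ./ cluster_size) / sqrt(seq_len)
end

msa = [1 1 1 1;
       1 1 1 1;
       2 2 2 2;
       3 3 3 3]
neff_reference(msa)  # cluster sizes 2, 2, 1, 1 -> (0.5 + 0.5 + 1 + 1) / sqrt(4) = 1.5
```

Comparing `neff_reference(input_data)` with `get_nf_julia(input_data)` on a small alignment is a quick correctness check. A strided loop that starts at `1+t` never processes row 1, so sequence 1 ends up counted in every cluster and the two results disagree.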
get_nf_julia(input_data)        # warm-up call, so that @time below excludes JIT compilation
@time get_nf_julia(input_data)