haha-systems/windowpain
Windowpain is a random-access tool for reading genetic sequences from huge FASTA sequence files in milliseconds.
Windowpain provides lightning-fast random access to FASTA sequence files by using memory mapping and efficient indexing. It allows you to read specific windows of sequence data without loading the entire file into memory.
git clone https://github.com/haha-systems/windowpain.git
cd windowpain
zig build --release=fast
./windowpain index <fasta_filename> <index.json>
./windowpain read <fasta_filename> <index.json> <sequence_index> [window_size] [window_start] [--raw]
./windowpain index test.fasta test.json
./windowpain read test.fasta test.json 0 10
This will first index the test.fasta file and then read the first 10 characters of the first sequence in test.fasta and print them to the console.
Piping to stdin
./windowpain read test.fasta test.json 0 10 --raw | <other_tool>
This will read the first 10 characters of the first sequence in test.fasta and pass them to other_tool. The other_tool can be any tool that accepts raw sequence data on stdin.
If you want to iterate over a sequence, advance the position by window_size after each call (position += window_size) and pass the new position as the read command's window_start argument.
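The iteration described above can be sketched as a small shell function. This is only an illustration built on the read usage shown earlier: the read_windows name, the WINDOWPAIN variable, and the total_length parameter are hypothetical helpers, not part of windowpain. It assumes you already know the sequence length, and it does not assume anything about how windowpain treats a final window that runs past the end of the sequence.

```shell
# Path to the binary; override WINDOWPAIN if it lives elsewhere (hypothetical variable).
WINDOWPAIN="${WINDOWPAIN:-./windowpain}"

# read_windows FASTA INDEX SEQUENCE_INDEX WINDOW_SIZE TOTAL_LENGTH
# TOTAL_LENGTH is an assumption: the caller must supply the sequence length.
read_windows() {
  fasta=$1 index=$2 seq=$3 window_size=$4 total=$5

  # Refuse a non-positive window size, which would never advance the position.
  [ "$window_size" -gt 0 ] || return 1

  position=0
  while [ "$position" -lt "$total" ]; do
    # One read per window, using the documented argument order.
    "$WINDOWPAIN" read "$fasta" "$index" "$seq" "$window_size" "$position" --raw
    # position += window_size
    position=$((position + window_size))
  done
}

# Example (assumes the first sequence in test.fasta is 50 characters long):
# read_windows test.fasta test.json 0 10 50
```

Each pass issues one read call, so the whole sequence is never held in memory at once.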
The index file is generated by the index command and should be used with the read command.
The read command can optionally output the raw sequence data without formatting to stdout (--raw).