haha-systems/windowpain
Windowpain is a random-access tool for reading genetic sequences from huge FASTA sequence files in milliseconds.
Windowpain provides lightning-fast random access to FASTA sequence files by using memory mapping and efficient indexing. It allows you to read specific windows of sequence data without loading the entire file into memory.
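To make the idea concrete, here is a minimal Python sketch of indexed, memory-mapped window reads. It is not Windowpain's actual implementation or index format; the function names (build_index, read_window) and the index fields are hypothetical, and it assumes a well-formed FASTA file in which every line of a record (except possibly the last) has the same width.

```python
import mmap
import os
import tempfile


def build_index(path):
    """Scan the file once, recording where each record's sequence data lives.

    Hypothetical index layout (not Windowpain's): byte offset of the first
    base, total base count, bases per line, and bytes per line (with newline).
    """
    index = []
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            if line.startswith(b">"):
                index.append({
                    "name": line[1:].strip().decode("ascii"),
                    "start": offset + len(line),
                    "length": 0,
                    "line_bases": 0,
                    "line_width": 0,
                })
            elif index:
                rec = index[-1]
                bases = len(line.rstrip(b"\r\n"))
                if rec["line_bases"] == 0:
                    # The first sequence line defines the record's line layout.
                    rec["line_bases"] = bases
                    rec["line_width"] = len(line)
                rec["length"] += bases
            offset += len(line)
    return index


def read_window(path, index, seq_index, window_size, window_start=0):
    """Return up to window_size bases starting at window_start, via mmap."""
    rec = index[seq_index]
    end = min(window_start + window_size, rec["length"])
    if window_start >= end:
        return ""

    def byte_offset(base_pos):
        # Translate a base position into a file offset, skipping newlines.
        full_lines, col = divmod(base_pos, rec["line_bases"])
        return rec["start"] + full_lines * rec["line_width"] + col

    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Only the pages covering this slice are touched by the OS.
            raw = mm[byte_offset(window_start):byte_offset(end - 1) + 1]
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("wb", suffix=".fasta", delete=False) as tmp:
        tmp.write(b">seq1\nACGT\nACGT\nAC\n>seq2\nTTTTGGGG\n")
    try:
        idx = build_index(tmp.name)
        print(read_window(tmp.name, idx, 0, 10))
    finally:
        os.remove(tmp.name)
```

The index pass reads the whole file once; after that, each window read computes two byte offsets and slices the mapping, so the cost depends on the window size rather than the file size.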
git clone https://github.com/haha-systems/windowpain.git
cd windowpain
zig build --release=fast
./windowpain index <fasta_filename> <index.json>
./windowpain read <fasta_filename> <index.json> <sequence_index> [window_size] [window_start] [--raw]
./windowpain index test.fasta test.json
./windowpain read test.fasta test.json 0 10
These commands first index test.fasta, then read the first 10 characters of its first sequence and print them to the console.
./windowpain read test.fasta test.json 0 10 --raw | <other_tool>
This will read the first 10 characters of the first sequence in test.fasta and pass it to other_tool. The other_tool can be any tool that accepts raw sequence data on stdin.
If you want to iterate over a sequence, advance a position counter by the window size after each read (position += window_size) and pass it as the read command's window_start argument.
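As a rough sketch of that loop, the Python helper below builds one read command per window, following the argument order shown in the usage line above. The helper names (window_commands, read_windows) and the seq_length parameter are hypothetical: the sketch assumes you already know the sequence length, and it makes no claim about how the tool behaves when the last window runs past the end of the sequence.

```python
import subprocess


def window_commands(fasta, index, seq_index, seq_length, window_size,
                    binary="./windowpain"):
    """Yield one argv list per window, stepping window_start by window_size.

    seq_length is assumed to be known by the caller (hypothetical parameter).
    """
    position = 0
    while position < seq_length:
        yield [binary, "read", fasta, index,
               str(seq_index), str(window_size), str(position), "--raw"]
        position += window_size


def read_windows(fasta, index, seq_index, seq_length, window_size):
    """Run each command and yield its stdout (requires the built binary)."""
    for cmd in window_commands(fasta, index, seq_index, seq_length, window_size):
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        yield result.stdout
```

For example, read_windows("test.fasta", "test.json", 0, 25, 10) would issue three reads with window_start values 0, 10, and 20.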
The index file is generated by the index command and should be used with the read command.
With the --raw flag, the read command outputs the raw sequence data without formatting to stdout.