Converting outies to innies

Some sequencers (notably SOLiD) report mate-pair reads in the R3/F3 format, where both reads point in the forward direction. Some tools, e.g. scaffolders, insist on reads that point inward. Thus, one may want to convert reads from

------>R3       ------>F3

to

------>R3       <------F3

One option would be to flip the reads around before aligning them. However, if the reads are already aligned, this is not necessary: we can flip them on the fly while streaming the alignments.

We don’t need to touch the SEQ and QUAL fields, since SAM stores them in the orientation of the reference forward strand whichever way the read points. All we need to change is the FLAG field. On the F3 reads, we toggle the 0x10 bit, which indicates SEQ being reverse complemented. On the R3 reads, we toggle the 0x20 bit, which indicates that the mate (here the F3 read) is reverse complemented, so that both records of a pair agree. And we are done.
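To make the flag arithmetic concrete, here is a minimal shell sketch of the two toggles. The function names `flip_r3` and `flip_f3` are made up for illustration. The sketch assumes, as the snippet further down does, that R3 is flagged as first in pair (0x40) and F3 as second in pair (0x80).

```shell
# Hypothetical helpers, not part of samtools: they only illustrate the bit arithmetic.

# R3 (first in pair, 0x40): toggle 0x20, "mate is on the reverse strand".
flip_r3() { echo $(( $1 ^ 0x20 )); }

# F3 (second in pair, 0x80): toggle 0x10, "this read is on the reverse strand".
flip_f3() { echo $(( $1 ^ 0x10 )); }

# A pair with both reads on the forward strand:
#   R3 flag 65  = 0x1 (paired) + 0x40 (first in pair)
#   F3 flag 129 = 0x1 (paired) + 0x80 (second in pair)
flip_r3 65     # prints 97  = 65 + 0x20
flip_f3 129    # prints 145 = 129 + 0x10
```

Since XOR toggles the bit rather than setting it, an F3 read that was already marked as reversed ends up marked as forward, and applying the conversion twice restores the original flags.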

Here is a small code snippet that does the flipping on the fly and produces a new BAM file with the reads pointing in the right direction.

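samtools view -h outie.bam | \
gawk 'BEGIN { OFS = "\t" }
$1 ~ /^@/ { print; next }                           # header lines pass through unchanged
{
    if (and($2, 0x40))      $2 = xor($2, 0x20)      # R3 (first in pair): toggle the mate-strand bit 0x20
    else if (and($2, 0x80)) $2 = xor($2, 0x10)      # F3 (second in pair): toggle the own-strand bit 0x10
    print
}' | \
samtools view -bS - > innie.bam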
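As a quick sanity check on the result, the strand bit (0x10) of each read should agree with the mate-strand bit (0x20) of its partner. Below is a minimal sketch of that check, working on the two decimal FLAG values of a pair. `pair_consistent` is a made-up name, and the check is only meaningful if the flags in the input file were consistent to begin with.

```shell
# Hypothetical helper: succeeds if the strand bits of a pair agree.
# $1 = FLAG of one read, $2 = FLAG of its mate (decimal, as printed by samtools view).
pair_consistent() {
    a=$1; b=$2
    # 0x10 on one read must match 0x20 on the other, in both directions.
    [ $(( (a & 0x10) != 0 )) -eq $(( (b & 0x20) != 0 )) ] &&
    [ $(( (b & 0x10) != 0 )) -eq $(( (a & 0x20) != 0 )) ]
}

pair_consistent 97 145 && echo "consistent"      # R3 and F3 flags after the flip
pair_consistent 97 129 || echo "inconsistent"    # F3 flag was left unflipped
```

The FLAG values are the second column of the `samtools view innie.bam` output.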

Author: Farhat

I am a physicist turned bioinformatics researcher turned Data Scientist.
