Book review: The 24-hour mind

As a kid I thought that sleep was largely a waste of time. There was so much more we could do if we didn’t have to just lie down like a log for 8 hours or so a night. However, dreams opened up another portal and it felt that dreams redeemed sleep time. I came across The 24-Hour Mind by Rosalind Cartwright on brainpickings and the review looked interesting enough to get the book. I had read Freud’s Interpretation of Dreams a while back and found that somewhat lacking in rigor. This book is far more scientific and for the most part does not try to find meaning in individual dreams, like dreaming of a cat meaning something about your femininity. The book stays clear of the folklore surrounding dreams and that is refreshing.

The major flaw that I see in this book is that most studies are performed on small samples and the effect sizes are not particularly large. For something as complex as human sleep, and given that some of the studies described span months to a year, that means many of the conclusions drawn will likely fade as sample sizes grow larger. A notable omission in the book is lucid dreaming. Also, almost all studies referenced are on people living in the urban West. While sleep is universal, different cultures have different relationships to sleep, and it would have been interesting to learn whether cultures that have a midday siesta as a part of their routine differ in sleeping and dream patterns, or whether there are variations between people who live in sunny, equatorial climates and those living in gloomier climates or through northern winters.

The book starts off with the early days of sleep research in the 1930s, with the investigations by Freud and Jung in abnormal patients, and then goes on to the more scientific measurements of brainwaves and different sleep states in the 50s. We learn about the different stages of sleep and how sleep disorders played a role in revitalizing dream research. She writes about the differences between stages of sleep like REM, NREM, and SWS, which show up throughout the book and so are good to remember. The second chapter is on collecting dreams. The part I liked best is how she allowed children to draw to express their dreams, and I think that may be effective for adults too. The medium affects the message, and we know that something like choice of language changes how people think, so it is quite conceivable that verbally describing something and drawing it out would provide differing content. Using just text to write or talk about a dream may mean that we undervalue parts that are difficult to put into words. Also, not everyone is adept at describing what they are feeling, and drawings may bring out certain emotions better.

She talks about her lab’s method of collecting dreams by waking people in the middle of REM sleep and asking what was going on in their minds. While I found the method kind of barbaric, I couldn’t think of any better methods myself. Here she goes into the role of emotions in dreams and the difficulties with capturing that. We learn that dreams are longer and more complex toward the end of sleep and also that REM sleep is intimately connected to learning and new memory formation. This is also corroborated by studies in mice.

“Through the night, from REM to REM, new information is integrated, drawing together more and more remote associations. The dream story line gets stretched into increasingly illogical and bizarre connections.”

Next, the book looks at the connections between sleep deprivation and metabolic and psychological disorders. There are some interesting connections between lack of dreaming during REM sleep in major depression and the effect of antidepressants on REM sleep. This chapter contains one of the more interesting studies described in the book. It focuses on divorce, dreaming and depression. Rosalind looks at some 61 different subjects (low numbers but let’s let that slide) going through divorce. While some of these subjects are depressed, others are not. Following them over a year, she breaks them into 3 groups. The first group is those who were not depressed, the second group was depressed in the beginning but not a year later, and the third was depressed both at the beginning and a year later. I think with large enough numbers there should’ve been a fourth group of those who were not depressed in the beginning but were depressed a year later, but it is missing. Looking at the emotional content of the dreams of these three groups, a conclusion is drawn that those who got over the depression tended to have more positive dreams in the last half of their sleep, while those who did not had more negative dreams. My major issue with the study is that the effect size is low and I expect it to vanish in a larger study. The low sample size is understandable because getting enough volunteers would be hard and funding would not be easy to come by, but on the other hand, it throws many of the conclusions into question. Despite this, the chapter is interesting and gives the reader something to ponder over and see what one can get from one’s own dreams.

Then we get into sleep-related disorders like sleepwalking and other parasomnias. While this topic was particularly fascinating for me, as I used to be a sleepwalker, this and the following chapters were among the weakest parts of the book. Firstly, while she acknowledges that sleepwalking is quite common in childhood, she focuses to a large extent on a couple of sleep-related murder cases. She takes a personal interest in the Scott Falater case and her credibility suffers as it appears she is too attached to the outcome. There is a bunch of his dreams as an Appendix in the book and I couldn’t figure out what that has to do with the 24-hour mind. A handful of selected dreams over a span of almost a decade don’t indicate anything one way or the other. There is a lot of speculation on what his mental state was and what happened during the murder, and it is a distraction from what the book is about. There is relatively little data and mostly anecdotes, and weak ones at that. There are some good parts on what may induce people to sleepwalk and the effect of pharmaceuticals like Ambien on sleepwalking.

The 8th chapter talks about nightmares and other dream-related disorders. She looks at people with PTSD and how their sleep patterns differ from normal. It is an interesting read. The 9th chapter, on Dreaming and the Unconscious, talks about the effect of sleep and dreaming on things like problem-solving and creativity. There are references to Freud’s theories of dreaming and how dreaming may play a role in filtering what is to be remembered or not among the events that happened during the day.

The book ends, appropriately enough, by tying everything together. She asserts that dreams play a role in emotional regulation and updating the self. Given the topic, it is a bit less data driven than the other chapters but it makes some good points. You should read the book if you are interested in sleep in general. Given how much effect sleep (or its lack) has on our waking hours, it is a good read to see what happens when one is persistently sleep deprived and what to watch out for in terms of physical or emotional health.

(Sometimes) more is less

The general consensus is that when it comes to data, more is always better. You may run into issues with processing too much data, the time it takes to process it, or storage costs; otherwise, if a little bit of something is good, more is better. After all, more consists of many versions of less, so you can always work with a subset of your data. However, sometimes it does turn out that more is less. Some time back I came across an interesting paper, ‘When Less is More: “Slicing” Sequencing Data Improves Read Decoding Accuracy and De Novo Assembly Quality’, which reports on the de novo assembly of BAC clones. BAC clones are relatively short DNA fragments (100–150 kbp), and given their short size, sequencing depths in the range of 1,000x–10,000x are easy to achieve. The authors study how the assembly quality changes as the amount of sequencing data increases and find that when the depth of sequencing increases over a certain threshold, sequencing errors make the problem of decoding reads to their assemblies and the problem of de novo assembly harder, and as a consequence the quality of the solution degrades with more and more data.

The reason here is that in the presence of noise, as the data increases, it becomes increasingly hard to tell a novel sequence from a sequencing error. The solution they propose in this case is a “divide and conquer” one: slice the data into subsamples, decode each slice independently, then merge the results.
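To make the slice, decode and merge pattern concrete, here is a minimal Python sketch. It is not the paper’s implementation: the function names are made up for illustration, and decode and merge are stand-in callables for whatever per-slice analysis and combination step one actually uses. Only the random slicing is spelled out.

```python
import random

def slice_reads(reads, n_slices, seed=0):
    """Randomly partition reads into n_slices roughly equal subsamples."""
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    # Deal the shuffled reads out round-robin, one slice at a time
    return [shuffled[i::n_slices] for i in range(n_slices)]

def decode_in_slices(reads, n_slices, decode, merge):
    """Run decode on every slice independently, then merge the per-slice results.

    decode and merge are placeholders (assumptions), not functions from the paper.
    """
    per_slice = [decode(chunk) for chunk in slice_reads(reads, n_slices)]
    return merge(per_slice)
```

Each read lands in exactly one slice, so every slice sees a lower depth (and fewer absolute errors) than the full data set.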

In other situations, the choice of the wrong model can also lead to wrong conclusions. In tree inference, long branch attraction (LBA) is a form of systematic error whereby distantly related lineages are incorrectly inferred to be closely related. LBA arises when the amount of molecular change accumulated within a lineage is sufficient to cause that lineage to appear similar (thus closely related) to another long-branched lineage, solely because they have both undergone a large amount of change, rather than because they are related by descent. Thus, when we have a tree of this type, the more data that is collected, the more strongly will the inferred tree tend toward the wrong tree.

As a digression on more and less, back in 1978, a command called more was written by a University of California grad student named Daniel Halbert. It was a fairly basic pager, something that allowed you to view a file one screenful at a time. Very handy, except that if you wanted to scroll back, it would not be possible. To get around its limitations, in 1984 another developer wrote a pager which would allow you to navigate both forward and backward through the file, among other improvements. This program was called less, and less could do a lot more than more.

The dose makes the poison…

…and this goes for things that aren’t measured out in doses too. In this post, I will look at some studies regarding stress. Stress is something that we usually seek to avoid, and that seems completely sensible. However, studies show that stress may have some positive sides too. Evolution has fine-tuned us to live within a certain range of various parameters, and it isn’t good when we move out of that range, even if we move out on the arguably ‘good’ side.

We are constantly bombarded with studies that show how stress can do everything from reducing immunity to shortening lives among other deleterious effects. However, a recent study in Nature Communications “Early life stress in fathers improves behavioral flexibility in their offspring” points out that stress can also be a positive and interestingly, the effects can be seen across generations. The authors find that the pups of stressed male mice were more behaviorally flexible.

To create stress, the authors subjected the mouse pups to unpredictable maternal separation combined with unpredictable maternal stress (MSUS) for two weeks. MSUS entails taking away the pups’ mothers at unpredictable intervals and subjecting their mothers to stressful situations, such as being placed in cramped tubes or in cups of cold water. The researchers then assessed behavioral flexibility in the pups by making them complete tasks that required them to follow rapidly changing rules to get water and food. They found that mice that had been stressed early in life outperformed controls. When the researchers bred males subjected to MSUS with wild-type females, the resulting offspring similarly excelled at behavioral flexibility. There are also studies that show acute stress can increase nerve growth. That may be related to why trauma can accelerate learning.

Similar situations can be seen with ultra clean water. Ultra clean water from which minerals have been stripped out is less healthy than regular water, and hard water has been associated with better cardiovascular health. The hygiene hypothesis posits that super clean environments have contributed to an increase in allergies, as the immune system, having nothing better to do, overreacts to the presence of ordinary objects like dust or dander.

On privacy and the cloud


Another day, another breach, this time of about 5 million Gmail passwords. Last week hundreds of nude celebrity pics were stolen, allegedly from iCloud (though Apple has denied that). A New York Times story claimed that hackers had stolen over a billion passwords. Yet, remarkably, nothing significant has changed. People are not demanding better security from online services, nor are they sharing less. People seem to have internalized what Sun Microsystems’ CEO Scott McNealy said over 15 years ago, “You have zero privacy anyway, get over it.”

Also, with an increasing move towards chatting and messaging over actual talking, it has become easier to store and process conversations. Nowadays, whenever someone online says something starting with ‘between you and me’, I silently add a ‘and Google’ to it. We willingly hand over an incredible amount of information to companies like Google and Apple every day. We almost always carry around a device that knows exactly where we are at every moment of the day and sends that information to be stored forever. It keeps tabs on what you are doing and who you’re communicating with. We even upload every pic we take into the cloud, risqué or otherwise.

There is an occasional murmur when things go a bit further than people are comfortable with, but then it all dies down and privacy standards are lowered a bit more.

On mosaics and chimeras

Whenever we sequence a genome, we assume that there is one genome that an individual possesses. While we are aware that mutations may happen in, for example, cancer cells, the usual assumption is that all cells in the body contain more or less the same genome. In animals with multiple births, it is not uncommon to see chimeras produced by the merger of multiple fertilized eggs. This can be contrasted with mosaicism, which denotes the presence of two or more populations of cells with different genotypes in one individual who has developed from a single fertilized egg.

In most cases, mosaics or chimeras would not be detected unless some medical test shows it up. There have been famous cases like that of Foekje Dillema, a female athlete who was later on found to be an XX/XY chimera and stripped of her medals. In chimeric or mosaic individuals, different body cells may have different genomes. With the increase in genetic testing by parents, especially when one of their children has a genetic disorder, clinicians have to figure out when the disorder-associated mutation arose: did it spring up during the creation of the sperm or egg that contributed to the child’s genetic makeup, or did it come from the parents’ genetic makeup?

A recent paper in The American Journal of Human Genetics shows that mosaicism may be a lot more common than previously thought. From the paper’s abstract:

However, increasing sensitivity of genomic technologies has anecdotally revealed mosaicism for mutations in somatic tissues of apparently healthy parents. Such somatically mosaic parents might also have germline mosaicism that can potentially cause unexpected intergenerational recurrences. Here, we show that somatic mosaicism for transmitted mutations among parents of children with simplex genetic disease is more common than currently appreciated.

These results indicate that many of the widely used tests for identifying CNVs either fail to detect many kinds of genetic alterations or lack the precision to distinguish mosaicism from completely constitutional alterations. They also suggest that higher genome resolution, as obtained from high throughput sequencing, might allow rearrangement-specific LR-PCR to become an inexpensive yet sensitive test for CNV mosaicism. In addition, there is a need for more sensitive and specific tests for identifying disorders arising from low-level mosaicism.

On friends and family

An interesting paper in PNAS by Nicholas Christakis and James H. Fowler claims that our friends are more closely related to us than random strangers. On average, we are related to our friends at a level close to that of fourth cousins. Maybe people have to go that far from their family before they can stand being around them without being forced to.

Beyond just the average similarities across the whole genome, they find that friends tend to be most similar in genes affecting the sense of smell, and most dissimilar in genes controlling immunity. The immunity one makes sense in that it would be good to hang around with people who are not susceptible to the same kinds of infections as you are.

Another fascinating result from the study is that genes that are more similar between friends seem to be evolving faster than other genes. This could also explain why human evolution seems to have sped up over the last 30,000 years. The authors suggest that the social environment is a force driving the evolution of humans.

The authors draw their conclusions from the Framingham Heart Study dataset, which comprises a relatively homogeneous population of European ancestry. It would be interesting to see whether these results hold up in other populations. Among other interesting results that the authors have claimed is that obesity is contagious.

Maybe this explains why certain people just click as friends even though they have extremely dissimilar tastes and temperaments, and some people just don’t despite all the usual indicators favoring a pairing. With the FDA barring companies like 23andMe from marketing health-related results from their genetic tests, they could now use the troves of data they have collected to not just tell you your potential ancestry and present relatives, but also suggest new friends. Someday you may be able to type ‘suggest a friend’ into the search bar, hit I’m feeling lucky and get a match genetically guaranteed to be better than the average stranger. India, of course, is no laggard here. Genomepatri is already available and all they have to do is tie up with a marriage portal, and along with all the other data that goes into finding an ideal match, you could also be guaranteed to find someone who is not just a spouse but a friend. I, for one, welcome the new GATTACA world.

Most frequently used commands on the console

For a bioinformatician, nothing matches the speed and efficacy of the command line once you are used to it. The point and click interfaces of many tools simply don’t cut it when compared with the power and flexibility of command lines and shell scripts. More importantly, scripts can be combined with other scripts, put in version control and adapted to work with other systems.

As Matt Might says, “The continued dominance of the command line among experts is a testament to the power of linguistic abstraction: when it comes to computing, a word is worth a thousand pictures.”

So I tried to see which commands I use most often on the command line. Here is a first pass:

farhat@heracles:~$ cut -f1 -d' ' .bash_history |sort |uniq -c|sort -n|tail -20
13 mv
13 paste
14 bg
15 history|grep
17 wc
18 ~/software/bwa-0.7.5a/bwa
20 samtools
23 bedtools
28 less
32 scp
38 vi
42 rm
49 head
49 sudo
54 tail
66 screen
106 top
112 cat
397 cd
587 ls


Not surprisingly, ls and cd are there a lot. So is screen, which probably means a number of commands are missed from this list. samtools shows up less frequently than bedtools, and that may be because samtools is often part of a pipeline, which makes it appear less frequently than it is actually used. So let’s correct for that:

farhat@heracles:~$ sed 's/|/\n/g' .bash_history| sed 's/^ //' | cut -f1 -d' ' |sort |uniq -c|sort -n|tail -20
14 bg
18 ~/software/bwa-0.7.5a/bwa
23 bedtools
23 history
32 scp
38 vi
39 samtools
41 wc
42 rm
49 sudo
66 screen
67 gawk
67 tail
79 grep
84 less
106 top
111 head
115 cat
397 cd
587 ls

With that change we see that samtools does go much farther ahead than bedtools. This also indicates which commands are ripe for optimization. Even a few keystrokes saved on the most frequently used commands can save a fair bit of time.
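For anyone who prefers to do the same tally outside the shell, here is a rough Python sketch of the second pipeline. The function name count_commands is made up for this example, and it behaves slightly differently from the sed/cut version: it splits on every | (so || is split too) and ignores empty pipeline stages.

```python
from collections import Counter

def count_commands(history_lines, top=20):
    """Count the first word of every pipeline stage in a list of history lines."""
    counts = Counter()
    for line in history_lines:
        for stage in line.split("|"):   # treat each stage of a pipeline separately
            words = stage.split()
            if words:                   # skip empty stages and blank lines
                counts[words[0]] += 1
    return counts.most_common(top)      # most frequent first

example = ["ls -l", "samtools view x.bam | head", "history|grep foo", "ls"]
for command, n in count_commands(example):
    print(n, command)
```

To run it on real data, pass the lines of the history file instead of the example list.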

Converting outies to innies

Some sequencers (notably SOLiD) when doing mate pair sequencing provide reads in the R3/F3 format, where both reads are pointing in the forward direction. Some tools, e.g. scaffolders, insist on reads that point inward. Thus, one may want to convert reads from

------>R3       ------>F3

to

------>R3       <------F3

Now, one option would be to flip the reads around before aligning them. However, if the reads are already aligned this is not necessary. We can flip the reads on the stream.

We don’t need to flip the SEQ and QUAL fields since they are always in the 5′ -> 3′ direction. All that we need to do is identify the F3 reads and toggle their 0x10 flag, which indicates that SEQ is reverse complemented. That takes care of the F3 reads. On the R3 reads, we need to toggle the 0x20 flag, which records the strand of the mate, so that it agrees with the changed F3 flag. And we are done.

Here is a small code snippet that does the flipping on the fly and produces a new bam with the reads pointed in the right direction.

samtools view -h outie.bam | \
gawk 'BEGIN{OFS="\t"}{if ($1~/^@/) {print $0; next;} \
else if (and($2, 0x40)){$2=xor($2, 0x20)} \
else if (and($2, 0x80)){$2=xor($2, 0x10)} print $0}'| \
samtools view -bS - > innie.bam
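The same flag arithmetic can be written out in Python, which may be easier to read than the gawk one-liner. This is only a sketch mirroring the snippet above: reads flagged 0x40 get 0x20 toggled, reads flagged 0x80 get 0x10 toggled, and header lines pass through untouched. The function and constant names are my own, not from any library.

```python
FIRST_IN_PAIR = 0x40   # SAM flag: first read of the pair
SECOND_IN_PAIR = 0x80  # SAM flag: second read of the pair
READ_REVERSE = 0x10    # SAM flag: SEQ is reverse complemented
MATE_REVERSE = 0x20    # SAM flag: the mate's SEQ is reverse complemented

def flip_flag(flag):
    """Toggle the strand bits the same way the gawk snippet does."""
    if flag & FIRST_IN_PAIR:
        return flag ^ MATE_REVERSE   # the mate now points the other way
    if flag & SECOND_IN_PAIR:
        return flag ^ READ_REVERSE   # this read now points the other way
    return flag                      # unpaired reads are left alone

def flip_sam_line(line):
    """Return a SAM line (without trailing newline) with its FLAG field flipped."""
    line = line.rstrip("\n")
    if line.startswith("@"):         # header lines are passed through unchanged
        return line
    fields = line.split("\t")
    fields[1] = str(flip_flag(int(fields[1])))
    return "\t".join(fields)
```

Since the change is an XOR, applying it twice gives back the original flags, which is a handy sanity check.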

On the obsession with Impact Factors

There is another paper out on Causes for the Persistence of Impact Factor Mania. From the abstract:

Numerous essays have addressed the misuse of the journal impact factor for judging the value of science, but the practice continues, primarily as a result of the actions of scientists themselves. This seemingly irrational behavior is referred to as “impact factor mania”.

Goodhart’s law: When a measure becomes a target, it ceases to be a good measure.

First Law of Metridynamics: The observed metric will improve.

Even a cursory glance indicates that the impact factor mania is perfectly rational. Given the increasing distance between the administrators who decide on grants and promotions and the science they are trying to judge, a single number is easy to judge on. Deciding on the actual impact of the science is difficult and time consuming, needs deep knowledge, and you are still likely to go wrong. On the other hand, some function of the impact factor is a single, easy-to-judge number. Even if you know that the number is not particularly accurate, you can still justify an action later on based on a third party number.

Finding variation in a multiple sequence alignment

A friend of mine had a number of sequences that had been multiply aligned and wanted to find variations within those sequences. This should be a somewhat common task for biologists but I was unable to find an existing tool that does this. So I thought of writing a small tool to do this in Python. It needs Python itself and Biopython, which helps with parsing FASTA files.

I will write it with a number of comments so it is easier for non-programmers to follow. The first thing we need to do is load the relevant libraries and the FASTA sequences.

import sys
from Bio import SeqIO

We create an empty dictionary to hold the sequences with the sequence ID as key and sequence as value.

seqs={} #creating an empty dictionary for holding sequences

for seq_record in SeqIO.parse(sys.argv[1], 'fasta'):
    seqs[seq_record.id]=seq_record.seq # sequence ID is the key, the sequence is the value

The for loop iterates over the entire file while the second line adds sequences to the dictionary with the id as key and sequence itself as the value. If the FASTA file contains multiple sequences with same ID, the last one is the only one that will be recorded.

seqlen=len(seq_record.seq) # extract length from last sequence

#variation finding
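for i in range(seqlen):
    chars=[] # characters seen in column i
    for seq in seqs:
        chars.append(seqs[seq][i])
    charset=set(chars) # unique characters in the column
    if len(charset)==1: continue
    print("Column "+str(i)+" contains "+" ".join(charset))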

In the above snippet, the first line gets the length of the sequence from the last sequence (since it is a multiple alignment, we assume all lengths are the same).

The loop from lines 3-10 iterates over the entire length of the sequence, with the inner loop on lines 6-7 iterating over all the sequences. The list chars holds all the characters in a particular column. We use the set operation to identify columns that are variant. A set keeps only the unique members of a list. If the number of members of the set is 1, it means the column is invariant and we can skip it, which is done by line 9. If not, line 10 outputs the different characters in that column.
We can also make a particular sequence as the reference and output variations with respect to that.
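As a rough sketch of that idea, the function below reports, for every column, which sequences differ from a chosen reference. The name variants_vs_reference and the plain dictionary of strings are assumptions made to keep the example self-contained. With Biopython, the seqs dictionary built above should work the same way, since indexing a sequence gives back a single character.

```python
def variants_vs_reference(seqs, ref_id):
    """List the columns where any sequence differs from the reference.

    seqs maps sequence ID to an aligned sequence; all are assumed to be
    the same length. Columns are numbered from 0, as in the script above.
    """
    ref = seqs[ref_id]
    report = []
    for i, ref_char in enumerate(ref):
        # Collect only the sequences that disagree with the reference here
        diffs = {seq_id: seq[i]
                 for seq_id, seq in seqs.items()
                 if seq_id != ref_id and seq[i] != ref_char}
        if diffs:
            report.append((i, ref_char, diffs))
    return report

example = {"ref": "ACGT", "s1": "ACGA", "s2": "TCGT"}
for column, ref_char, diffs in variants_vs_reference(example, "ref"):
    print("Column", column, "reference", ref_char, "differs in", diffs)
```

Unlike the set-based version, this also tells you which sequence carries which variant.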

A run on a small file gives me the following output

farhat@palantir:~/$ python test.fa
Column 2 contains A T
Column 3 contains C T G
Column 5 contains C G
Column 6 contains T G
Column 7 contains A C T
Column 8 contains A G