Welcome to the BioComp Knowledge Hub! You can ask questions here related to bioinformatics, statistics, computational biology and similar subjects. You can also answer questions and rate other users' contributions.

Possible false-positive chase when analyzing multigene disorders

0 votes

This question is based on a real, published investigation. I am not giving references to protect the (possibly) guilty... :-)

Description of the situation: There is a disease that seems to be heritable, but it is definitely not a single-mutation-in-a-single-gene disorder. The researchers collected families with affected members and used high-throughput sequencing of both healthy and affected family members to look for mutated genes ("mutated" meaning different from the wild-type reference sequence). Mutated genes that also occurred in healthy family members were excluded from the analysis. Over 30 genes remained that were mutated in every affected individual, and the researchers based their conclusions on those genes alone.

To my paranoid mind this smells like "chasing false positives". I would like to know whether this kind of analysis is OK and, if not, what the statistically proper way to address this situation is.

asked Oct 10, 2014 in Statistics by aszodi (590 points)
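A small simulation makes the concern concrete. The sketch below assumes a hypothetical nuclear family (two unrelated parents, three children), an arbitrary phenotype fixed in advance (father and the first two children "affected"), and a made-up per-gene carrier frequency; none of these numbers come from the study. By construction, disease status is unrelated to every gene, yet the filter "mutated in every affected member, in no healthy one" still keeps a sizeable number of genes, simply because relatives share variants by descent. The function name `count_cosegregating_genes` is illustrative only.

```python
import numpy as np

def count_cosegregating_genes(n_genes=20_000, carrier_freq=0.05, seed=0):
    """Simulate one nuclear family (father, mother, 3 children) under the null:
    the phenotype is fixed in advance and unrelated to any gene.
    Return how many genes nonetheless pass the filter
    'variant present in every affected member and absent in every healthy one'."""
    rng = np.random.default_rng(seed)
    # Assumed model: each parent independently carries one heterozygous
    # variant in a given gene with probability carrier_freq.
    father = rng.random(n_genes) < carrier_freq
    mother = rng.random(n_genes) < carrier_freq
    # Each child receives each carrier parent's variant with probability 1/2.
    children = []
    for _ in range(3):
        from_father = father & (rng.random(n_genes) < 0.5)
        from_mother = mother & (rng.random(n_genes) < 0.5)
        children.append(from_father | from_mother)
    members = np.vstack([father, mother, *children])  # shape (5, n_genes)
    # Arbitrary phenotype: father and the first two children are "affected".
    affected = np.array([True, False, True, True, False])
    in_all_affected = members[affected].all(axis=0)
    in_no_healthy = ~members[~affected].any(axis=0)
    return int((in_all_affected & in_no_healthy).sum())

n_genes, f = 20_000, 0.05
hits = count_cosegregating_genes(n_genes, f)
# For this pedigree and phenotype the expected count is n_genes * f * (1 - f) / 8.
print(hits, n_genes * f * (1 - f) / 8)
```

For this pedigree a gene passes exactly when the father carries a variant, the mother does not, and it is transmitted to children 1 and 2 but not to child 3, so the expected number of passing genes is n_genes·f(1−f)/8, roughly 119 for the assumed values. The filter cannot tell co-segregation by descent from causation; pooling several families only helps if the families are genuinely unrelated.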

2 Answers

+1 vote
Definitely you will be chasing some 'false positives' in this case. The only reasonable approach is to control for family relations and for positives occurring in non-diseased members by including all of this information directly in the model. This is a classic case for random effects in a GLMM (generalized linear mixed model): a family random effect helps to determine whether an effect is family-specific or actually disease-specific.
answered Oct 15, 2014 by aposekany (520 points)
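A minimal sketch of the random-effect idea, under simplifying assumptions: a Gaussian linear mixed model instead of a full GLMM (a binary disease status would normally get a logit link, e.g. with lme4's `glmer` in R), a single random intercept per family, and simulated data with no true variant effect. Variance components are estimated by maximum likelihood over a grid of δ = σe²/σg² after eigendecomposing the covariance matrix K, in the spirit of the EMMA approach mentioned in the other answer. `fit_lmm` and all simulated numbers are hypothetical.

```python
import numpy as np

def fit_lmm(y, X, K, deltas=np.logspace(-4, 4, 81)):
    """Maximum-likelihood fit of y = X @ beta + u + e with
    u ~ N(0, s2g * K) and e ~ N(0, s2e * I).

    K is eigendecomposed once; for each candidate delta = s2e / s2g the model
    then reduces to weighted least squares. Returns
    (beta, standard errors of beta, s2g, s2e) at the best delta on the grid."""
    n = len(y)
    lam, U = np.linalg.eigh(K)
    lam = np.clip(lam, 0.0, None)      # remove tiny negative round-off
    y_r, X_r = U.T @ y, U.T @ X        # rotated data: Cov(y_r) = s2g * diag(lam + delta)
    best = None
    for d in deltas:
        w = 1.0 / (lam + d)
        XtWX = X_r.T @ (w[:, None] * X_r)
        beta = np.linalg.solve(XtWX, X_r.T @ (w * y_r))
        r = y_r - X_r @ beta
        s2g = np.sum(w * r**2) / n     # profiled ML estimate of s2g for this delta
        loglik = -0.5 * (n * np.log(2 * np.pi * s2g) + np.sum(np.log(lam + d)) + n)
        if best is None or loglik > best[0]:
            best = (loglik, d, beta, s2g, XtWX)
    _, d, beta, s2g, XtWX = best
    se = np.sqrt(np.diag(s2g * np.linalg.inv(XtWX)))
    return beta, se, s2g, s2g * d

# Hypothetical data: 20 families of 5, strong family effect, NO true variant effect.
rng = np.random.default_rng(42)
n_fam, fam_size = 20, 5
family = np.repeat(np.arange(n_fam), fam_size)
Z = (family[:, None] == np.arange(n_fam)).astype(float)  # family indicator matrix
K = Z @ Z.T                                              # random intercept per family
variant = (family < 10).astype(float)                    # carried by whole families 0-9
y = rng.normal(0, 2, n_fam)[family] + rng.normal(0, 1, family.size)
X = np.column_stack([np.ones_like(y), variant])

beta, se, s2g, s2e = fit_lmm(y, X, K)

# Naive ordinary least squares that ignores family membership, for comparison
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_ols
se_ols = np.sqrt(np.diag(resid @ resid / (len(y) - X.shape[1]) * np.linalg.inv(X.T @ X)))
print(f"variant effect: LMM {beta[1]:.2f} (SE {se[1]:.2f}), OLS {beta_ols[1]:.2f} (SE {se_ols[1]:.2f})")
```

In this balanced design the point estimate of the variant effect is the same under OLS and the mixed model; what changes is the standard error. Because the variant is shared by whole families, the family random effect makes families rather than individuals the effective replicates, and the standard error grows accordingly, which is exactly what protects against reading a family-specific signal as a disease signal.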
0 votes
Yes, it is a kind of chasing false positives. It could be a reasonable trade-off if all members/families ('samples' sounds a little strange in this respect) are related. If not, i.e. if at least some of the families are unrelated to each other, an alternative is to model the population structure explicitly in the analysis to avoid the likely large number of false positives.
One common way to do this is to formulate a linear mixed model; methods for fitting it include, for instance, EMMA, FaST-LMM, ...
The difficulty lies in specifying the linear mixed model, but this should not be hard when full-genome data are available and the family trees are known.
answered Oct 13, 2014 by APL (890 points)
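When the family trees are known, the expected genetic covariance between relatives follows from the pedigree alone. The sketch below implements the classic recursive kinship-coefficient computation, assuming founders are unrelated and non-inbred, an unknown parent is treated as an unrelated founder, and individuals are ordered so that parents precede their children; the example pedigree and the function name `kinship_matrix` are hypothetical. Twice the kinship matrix, 2Φ, is the expected additive relationship matrix and can serve as the covariance K of the random genetic effect, u ~ N(0, σg² K), in a linear mixed model. With full-genome data, K is more often estimated from the variants themselves (a genomic relationship matrix); the pedigree version gives its expectation.

```python
import numpy as np

def kinship_matrix(father, mother):
    """Kinship coefficients phi[i, j] computed recursively from a pedigree.

    father[i] and mother[i] are parent indices, or None if unknown (founder).
    Individuals must be ordered so that parents come before their children."""
    n = len(father)
    phi = np.zeros((n, n))
    for i in range(n):
        f, m = father[i], mother[i]
        # Self-kinship: 1/2 * (1 + inbreeding), inbreeding = kinship of the parents
        inbreeding = phi[f, m] if f is not None and m is not None else 0.0
        phi[i, i] = 0.5 * (1.0 + inbreeding)
        for j in range(i):
            # j precedes i, so j is not a descendant of i:
            # average j's kinship with each of i's parents
            val = 0.5 * ((phi[j, f] if f is not None else 0.0) +
                         (phi[j, m] if m is not None else 0.0))
            phi[i, j] = phi[j, i] = val
    return phi

# Hypothetical pedigree: grandparents 0 and 1; their children 2 and 3;
# unrelated spouses 4 and 5; grandchildren 6 (of 2 and 4) and 7 (of 3 and 5).
father = [None, None, 0, 0, None, None, 2, 3]
mother = [None, None, 1, 1, None, None, 4, 5]
phi = kinship_matrix(father, mother)
K = 2 * phi  # expected additive relationship matrix
print(K)
```

The values match the textbook kinship coefficients: 1/4 for full siblings (2 and 3) and parent-offspring pairs, 1/8 for grandparent-grandchild (0 and 6) and uncle-nephew (2 and 7) pairs, and 1/16 for first cousins (6 and 7).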