Recently, GenomeWeb did a pretty extensive write-up of BioGPS. I think it’s a pretty detailed and accurate view of how BioGPS fits into the sphere of online biological resources.
The only part I take a bit of issue with is the series of quotes by Larry Moran, who then followed it up with a more detailed blog post. Among his comments,
Larry Moran, a biochemist at the University of Toronto, told BioInform by e-mail that he had looked at a few of his “favorite genes” in the portal. “I don’t think it’s a very useful database,” he said, since it is a summary of information gleaned from other databases with “no attempt at annotation.”
This is a great opportunity to clarify BioGPS’s use cases and our target audience…
Larry has focused his entire scientific career on the study of HSP70 family genes. For people like Larry who only care about a handful of genes, they really don’t have a great need for gene portals. They know their genes backward and forward, and they get their information directly by following the primary literature. Relative to that, every gene portal will be missing important information.
But BioGPS targets researchers doing genome-scale science, which has become increasingly popular in the last decade or so. Suppose you’ve done a microarray experiment comparing tumor tissue to a matched control, and maybe you’ve identified 100 differentially expressed genes. You can’t pick in advance which genes you’ll find, and undoubtedly the list will include genes you’ve never studied before. What kind of resources help you quickly evaluate which are the most promising for follow-up studies? Gene portals.
And gene portals are not just for gene expression analyses. Scientists doing copy-number analysis, methylation profiling, proteomics, or functional genomics all face a similar issue. For people doing “data-driven science”, gene portals are essential for quickly learning about unfamiliar genes.
Larry goes on to say:
The point is whether taking the expression data and adding links from other sources makes BioGPS a valuable resource.
Not as far as I can see.
Well, Larry has essentially described SymAtlas (BioGPS’s precursor), and that site gets about 1.7 million hits per year worldwide. So yes, many people do think that’s useful. Moreover, BioGPS extends that model by enabling users to aggregate data from multiple gene portals, emphasizing community extensibility and user customizability. Based on his characterization above, it seems Larry hasn’t explored these features yet.
Larry’s right, we’re not attempting to do annotation, so BioGPS might not be useful to him. But he’s not our target audience either…
Larry has focused his entire scientific career on the study of HSP70 family genes. For people like Larry who only care about a handful of genes, they really don’t have a great need for gene portals. They know their genes backward and forward, and they get their information directly by following the primary literature. Relative to that, every gene portal will be missing important information.
You miss the point. There are hundreds of people who are knowledgeable about a small subset of genes. Almost all of them agree that the existing databases are not very accurate with respect to their genes.
What’s the logical conclusion?
I suppose you could conclude that the only thing wrong with biological databases is that they aren’t very accurate for those genes that have been intensively studied but they are extremely useful for all the other genes.
That doesn’t sound like smart science to me.
Large scale experiments, such as expression studies, are very important and useful but the very nature of such work means that the researchers don’t know very much about the genes they’re working with.
The goal is to connect the survey results with the experts on individual genes to see if the survey results are accurate. You don’t do this by simply linking to existing biological databases and hoping that everyone will assume the database entries are accurate, and so are the survey results.
Real science means getting down and dirty and exploring the details. Superficial isn’t going to work and it could, in fact, be very harmful.
Larry, you seem to be suggesting that because BioGPS doesn’t solve the world’s gene annotation needs, and because it doesn’t integrate scientists across the entire spectrum from focused gene families to genome-wide scans, gene portals like BioGPS aren’t useful.
Does your blender also cook eggs for you in the morning? If not, is the blender useless?
BioGPS is designed to serve researchers who are doing genome-scale science. I mentioned a few areas in the original post. To add a few more, how about genome-wide association studies, miRNA profiling, or epigenetics? Check out any issue of Nature Genetics or PLoS Genetics for some specific examples.
Generally, these studies start with unbiased genome-wide scans (“data-driven science”) as a mechanism for generating hypotheses. Researchers then use gene portals like BioGPS to prioritize candidates, and then proceed to get “down and dirty” with validation.
I absolutely agree with you that there are many researchers who specialize in small sets of genes, and I absolutely agree that their research is valuable. (Often they serve as critical collaborators or successors to the validation experiments mentioned above.) However, you seem to think that that is the only kind of valuable research (“real science”), and that data-driven science is merely “superficial”.
To believe we are going to annotate the genome just by studying the gene families we’ve traditionally studied is a bit like searching for your keys under the lamppost. You’re fighting the power (law).
We certainly aren’t dismissive of the fact that experts in specific genes or gene families have lots to contribute. In fact, our Gene Wiki effort is exactly aimed at inviting those experts to share their knowledge.
Larry, we certainly don’t claim that BioGPS is the solution to all of our gene annotation needs. But for those who are starting with data-driven genome-scale science, we think BioGPS is a useful tool for the toolbox.
Andrew, you seem to be missing the point somewhat. Larry is saying that the information in the database is wrong, not that it’s incomplete. And he’s saying that using bad data will make even good science bad. He’s not saying that genome-scale science is intrinsically bad. He’s saying that databases like yours will make it bad.
The point is that when experts look at their pet genes, the information there is wrong. Therefore, presumably, the information on less well-studied genes will also be wrong. People who use those data will reach wrong conclusions, turning their careful work into gibberish.
You say that you expect the experts to fix the bad data, but you also say that the experts are not your target audience. If so, why do you expect the experts to take the time to fix something that will never benefit them? Wishful thinking and high principles are fine, but without a tangible benefit why should anyone try to fix your database, or any of the dozens of others that are springing up, each equally optimistic and equally untested?
The concept behind these databases is a noble one, but I for one have been avoiding them, and will continue to avoid them, because they seem to be quite unreliable. Yours seems no different, and it’s not encouraging to see you misunderstanding Larry’s quite basic and perfectly reasonable point.
Ian, thanks for your comment. I think you and Larry are both raising an interesting point about reliability of gene portals and genome-scale databases. Clearly portals have varying degrees of data quality and reliability. And you’re right, users of those databases (including our gene expression data generated in a separate effort) need to critically evaluate the data.
But this interesting discussion is tangential to the point of BioGPS.
BioGPS is a content aggregator, in some ways not unlike Google News. We take content sources that our users are already consulting and provide them in a more convenient and useful form. We also enable other users to both extend and customize the site, and that enables some interesting “social networking” benefits. But ultimately, BioGPS is not about content creation, it’s about content aggregation.
I certainly respect scientists like you and Larry who don’t consult any gene annotation portals. If you don’t believe those sources and don’t need them in your research, then you won’t be using BioGPS. If Larry had said “BioGPS is not useful in my research,” no problems…
But Larry said “I don’t think it’s a very useful database,” period, implying no one would benefit from checking it out. Well, there are a lot of researchers doing genome-scale science, and they consult gene portals daily. (Have you tried to analyze microarray data without consulting one? Can’t be done…) For those users, we think BioGPS offers many useful advantages.
Again, I’m not pitting data-driven science versus gene-focused science, since both have scientific value. But I do question the relevance of strongly-worded statements by someone from one camp about tools meant for the other camp.
Said another way, I wouldn’t trust a review of a steakhouse by a vegetarian, nor a vegetarian restaurant by a Texan.
Ian,
I will take issue with your and Larry’s interpretation. The information is not wrong. Data are data.
However, the *interpretation* of the data can be misguided. We provided guidelines for interpretation of probe set information in the geneatlas papers — if levels are not sufficiently high, then you are looking at noise and should take the measurements with a grain of salt.
In our two papers, to validate these datasets we did 2000+ RT-PCR experiments (82%+ validation rate), plus dozens of northern blots and in situ hybridizations. Also, Tim Hughes’s group at Toronto did a meta-analysis of our data and their own data, collected on a different technology, and the correlation was incredible. No one has gone as far to validate these datasets as we have. Period. That being said, there is a high false negative rate, which we delve into in the manuscripts. (That is, if you bothered to read them.)
To speak to your specific point, that your genes do not display the patterns you expect: individual patterns of gene expression have been validated by hundreds of laboratories for hundreds/thousands of genes — several human disease genes and mouse loci have been cloned using these data. This is all a matter of record — citations — though factual and not subject to happy feely interpretation, they are still useful as a general measure of utility. Hundreds/thousands of happy customers, who got this data free of charge, fully informed, at no cost to the taxpayer.
To bring things back on point, these issues have NOTHING to do with BioGPS other than the atlas sets are one of 50+ datasets you can CHOOSE to aggregate (or NOT). That point seems elusive.