As part of my work at Counsyl, I interview most of our software engineering candidates who are interested in machine learning and data processing problems. I always ask prospective candidates why they’re interested in Counsyl; with surprisingly high frequency the answer is something like, “I want to work on Big Data, and I’ve heard genomics has the biggest data around, so that must be the place to be!”
This conversation has happened so many times that I decided to devote an entire tech talk to its misconceptions. In the talk, I argue that Big Data is not intrinsically interesting; instead, most of the hype is because Big Data is relevant to advertising, and advertising drives the consumer Internet. I further argue that although the total volume of data in genomics may be large, it’s better to think of genomics as a large collection of small data problems, rather than as Big Data.
I find that most of the engineers I talk to who are interested in Big Data aren’t interested in it for its own sake (people working on setting TeraSort records aside!). Instead, they’re interested in it because of what they hope it might offer: technologies that are more personalized and useful to the consumer. I argue, both to candidates and in the talk, that genomics’ small data problems actually fit this bill much better than the usual problems in the Big Data space.
You can see a video of the talk here.
Then check out our current job openings.
Imran S. Haque is the Director of Research at Counsyl. Prior to joining Counsyl, he completed his Ph.D. in computer science at Stanford, where he worked on large-scale machine learning for drug design with Vijay Pande and Daphne Koller. His code reviews mostly consist of giving the look of disapproval to questionable constructs.