Healthcare Data: Fuel for your Machine Learning ‘Rocket Ship’

Machine learning descends from the not-so-new field of pattern recognition, yet in recent times it has been gaining fresh momentum. Everyone from gigantic corporations to fledgling startups and venture capitalists loses no time in bombarding the audience with buzzwords like deep learning, neural networks, NLP, etc.

“…90% of the investors have very little idea what AI is so if you’re a founder raising money, you should sprinkle some AI into your pitch deck… Then sit back and watch the funding roll in.”

This is funny but nevertheless true, and it applies to almost all wannabe entrepreneurs and inexperienced investors.

As a computer and computational scientist, I find it upsetting when entrepreneurs and investors repeat jargon and cite generic use cases of how machine learning will benefit humanity at large. I do not want to be judgmental, but it seems they do not even comprehend what is required to get started in the first place. During our journey, we have gradually understood and started addressing the 4Ps necessary for any machine learning startup. These 4Ps are essential for a well-informed, high-quality healthcare startup. In brief, they are:

i. Problem (i.e. use cases),

ii. Product (i.e. software, hardware),

iii. Petabytes (i.e. a colossal amount of data), and

iv. People (i.e. subject matter experts, computer scientists, programmers).

Contacting subject matter experts who work in the area concerned is perhaps the best way to hit upon use cases or application areas. Tech entrepreneurs most often come from a computer science background, and so at times they forget the interdisciplinary nature of the challenge. For example, a startup in the medical imaging or radiology space that is building ML algorithms for computer-aided diagnostics should not neglect to bring in advisors who are radiologists or physicians. Working with them will help it figure out the right use cases and application areas.

One of the most difficult challenges for any ML healthcare startup is the acquisition of data. The major barriers to overcome with healthcare data are privacy and ownership. It is precisely this area that is the prime focus of this article. I have tried to compile not only the use cases but also the data sets that are available on the internet, for a better understanding of the situation.

Prof. Andrew Yan-Tak Ng rightly remarked: “…when I think about building machine learning products I think of building a rocket ship… A rocket ship is a giant engine together with a ton of fuel. Both need to be really big. If you have a lot of fuel and a tiny engine, you won’t get off the ground. If you have a huge engine and a tiny amount of fuel, you can lift up, but you probably won’t make it to orbit. So you need a big engine and a lot of fuel. …giant computers, that’s our rocket engine. And the fuel is the data.”

Presently, we are drilling data wells and building data pipelines through various products, initiatives, and collaborations to fill huge reservoirs of the so-called fuel, data, for our space journey.

Here is a list of links to healthcare data sets that could catalyze medical research and trials and so facilitate better healthcare. I would love to share it with the community.



Contagious Diseases

Clinical & Drug Discovery

Identify patients who will be admitted to a hospital within the next year using historical claims data. [Data no longer available]

Think it’s possible to make hospital visits hassle-free? [Data no longer available]



FTP for GenBank below :D






Healthcare Entrepreneur, Computational Scientist
