  LongeCity
              Advocacy & Research for Unlimited Lifespans





Hypothetical Question: Neural Networks to detect clinically significant phenotypes

machine learning

5 replies to this topic

#1 littlePawn

  • Guest
  • 61 posts
  • 11
  • Location:Australia
  • NO

Posted 06 December 2015 - 02:06 PM


I've just recently begun learning about machine learning, in particular neural networks, logistic regression, etc. In an effort to identify significant SNPs common to my own ailments (in particular essential tremor and generalized anxiety), for which no conclusive links exist, I was wondering about a potential crossover of these two fields on a large scale. In my opinion, the first step to a cure for anything, even aging, is to first be able to diagnose the disease correctly. It seems many ailments outside the standard ones (cancer, diabetes, etc.) have no known cause. Essential tremor, for example, is believed to be hereditary, yet I'm not aware of any studies showing SNPs of statistical significance that may cause it. In fact, I'd say even for the majority of diseases, we don't know the specific combinations of SNPs that may cause them.

 

Biology is the most complicated problem to date presented to mankind. I don't believe we can possibly identify the disease-causing SNPs or other gene mutations (nonsense, insertions, deletions, etc.) for all diseases in a laboratory, through guesswork, or through clinical observation alone. We must somehow employ machine learning to help solve this problem. Once we know the mutations responsible for most diseases, we will know what to target specifically for a cure, via gene therapy or whatever other technology becomes available in the future.

 

I therefore have a question, and I apologize in advance if it is simplistic, as I am new to this area of study. This is a hypothetical question, as I doubt the computational power to compute this exists yet, but given sufficient computational power and time (parallel processing on a massive cloud or something similar), would the following project lead to any useful results?

 

Basically the idea is this: train a neural network with n features, where n is the number of all known SNPs and other mutations that can be represented categorically somehow. So most likely 10,000,000+ features, a number that will grow with time. Then, let m be the number of samples. We will have a sample of, let's say, 100,000 people who have some disease X, and another 100,000 people who don't have this disease as a control. We randomly mix the data together to create a dataset of m = 200,000 samples. We categorize the output as either 0 or 1 (0 is disease, 1 is not). We then divide this dataset according to whatever the preferred standard is (let's say 60% of m as the training set, 20% as a cross-validation set, and 20% as a test set).
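A scaled-down sketch of the data layout described above, in Python with NumPy. The genotype encoding (0, 1 or 2 copies of the minor allele), the allele frequency of 0.3 and the toy sizes are assumptions for illustration, and the random genotypes here carry no disease signal. At the full scale in the post, 200,000 people x 10,000,000 SNPs at one byte per genotype would already be about 2 TB, so this only shows the shuffle and the 60/20/20 split.

```python
import numpy as np

def make_toy_cohort(n_cases, n_controls, n_snps, seed=0):
    """Hypothetical toy cohort: genotypes coded as minor-allele counts (0, 1, 2).
    Labels follow the post's convention: 0 = disease, 1 = no disease."""
    rng = np.random.default_rng(seed)
    X = rng.binomial(2, 0.3, size=(n_cases + n_controls, n_snps)).astype(np.int8)
    y = np.concatenate([np.zeros(n_cases, dtype=np.int8),
                        np.ones(n_controls, dtype=np.int8)])
    return X, y

def shuffle_and_split(X, y, seed=0):
    """Randomly mix the m samples, then cut them into 60% training,
    20% cross-validation and 20% test sets."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    order = rng.permutation(m)
    n_train, n_cv = m * 6 // 10, m * 2 // 10  # integer arithmetic avoids rounding surprises
    train, cv, test = np.split(order, [n_train, n_train + n_cv])
    return (X[train], y[train]), (X[cv], y[cv]), (X[test], y[test])

# Scaled down from 200,000 people x 10,000,000+ SNPs to something that fits in memory.
X, y = make_toy_cohort(n_cases=1000, n_controls=1000, n_snps=50)
(Xtr, ytr), (Xcv, ycv), (Xte, yte) = shuffle_and_split(X, y)
print(Xtr.shape, Xcv.shape, Xte.shape)  # (1200, 50) (400, 50) (400, 50)
```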

 

Now the question. After we have trained the neural network and chosen the appropriate regularization, hidden layers, etc., would you expect any statistically significant prediction on the 20% test set? And if so, could we then go on to identify the features (SNPs), perhaps via unsupervised clustering, that are statistically significant in terms of being a cause of disease X?
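One way to get a feel for the question is a small simulation. The sketch below is an assumption-laden stand-in, not the network from the post: it plants three hypothetical "causal" SNPs among 200 random ones, swaps the neural network for its simplest relative (L2-regularized logistic regression, effectively a network with no hidden layer, fitted by plain gradient descent), picks the regularization strength on the cross-validation set, and then ranks SNPs by the magnitude of their standardized weights instead of using unsupervised clustering.

```python
import numpy as np

def simulate(m, n_snps, causal, effect=1.0, seed=0):
    """Hypothetical toy cohort in which only the `causal` SNPs change disease risk."""
    rng = np.random.default_rng(seed)
    X = rng.binomial(2, 0.3, size=(m, n_snps)).astype(float)  # minor-allele counts
    logit = effect * (X[:, causal].sum(axis=1) - 0.6 * len(causal))
    disease = rng.random(m) < 1.0 / (1.0 + np.exp(-logit))
    return X, np.where(disease, 0, 1)  # post's convention: 0 = disease, 1 = no disease

def fit_logistic(X, y, lam, lr=0.5, epochs=300):
    """L2-regularized logistic regression fitted by batch gradient descent."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(y = 1)
        w -= lr * (X.T @ (p - y) / m + lam * w)
        b -= lr * np.mean(p - y)
    return w, b

def accuracy(w, b, X, y):
    return np.mean((X @ w + b > 0).astype(int) == y)

causal = [3, 17, 42]
X, y = simulate(4000, 200, causal)
# Rows are simulated independently (already in random order), so slices give 60/20/20.
Xtr, Xcv, Xte = X[:2400], X[2400:3200], X[3200:]
ytr, ycv, yte = y[:2400], y[2400:3200], y[3200:]
mu, sd = Xtr.mean(axis=0), Xtr.std(axis=0) + 1e-8  # standardize with training stats only
Xtr, Xcv, Xte = [(A - mu) / sd for A in (Xtr, Xcv, Xte)]

# Choose the regularization strength on the cross-validation set.
best = None
for lam in (1e-3, 1e-2, 1e-1):
    w, b = fit_logistic(Xtr, ytr, lam)
    cv_acc = accuracy(w, b, Xcv, ycv)
    if best is None or cv_acc > best[0]:
        best = (cv_acc, lam, w, b)
_, lam, w, b = best

test_acc = accuracy(w, b, Xte, yte)
top = np.argsort(-np.abs(w))[:3]  # SNPs with the largest standardized weights
print(f"lambda={lam}, test accuracy={test_acc:.3f}, top SNPs={sorted(top.tolist())}")
```

With this fixed seed the model should pick out the planted SNPs and beat the 50% chance level on the test set, but whether the margin is statistically significant would still need a proper test (for example a binomial test against 50%, or refitting on permuted labels). The per-allele effect used here is large; effect sizes reported for common variants in GWAS are usually much smaller, and weights in a deep network are far harder to interpret than in this linear model.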

 

 



#2 niner

  • Guest
  • 16,276 posts
  • 2,000
  • Location:Philadelphia

Posted 07 December 2015 - 04:41 AM

 

"Biology is the most complicated problem to date presented to mankind."

 

 

Amen. Some people don't get this. Your general idea of applying machine learning to biological data is a good one. In fact, it's the basis of Craig Venter's new company, HLI. One problem I see with your particular implementation is that you have a huge number of features but a relatively small number of cases. You will probably need a feature-selection method to reduce the dimensionality of the feature space, and/or more data. A lot of people are looking at modern variants of multi-layer neural nets, in a methodology now called "deep learning". Venter hired away one of Google's deep learning heavyweights. Anyway, it's a very cool idea. If you keep pursuing it and get good at it, you'll be very employable.
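niner's point about feature selection can be illustrated with a univariate filter, one common way to shrink the feature space before fitting a model. This sketch uses a made-up simulation in which only SNP 7 affects risk; it computes a Pearson chi-square statistic for each SNP from a 2 x 3 table of label against genotype and keeps the k highest-scoring SNPs.

```python
import numpy as np

def genotype_chi2(X, y):
    """Pearson chi-square statistic per SNP from a 2 x 3 table
    (label 0/1 x genotype 0/1/2). Larger means a stronger marginal association."""
    stats = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        table = np.array([[np.sum((y == s) & (X[:, j] == g)) for g in (0, 1, 2)]
                          for s in (0, 1)], dtype=float)
        table = table[:, table.sum(axis=0) > 0]  # drop genotypes nobody carries
        expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0) / table.sum()
        stats[j] = np.sum((table - expected) ** 2 / expected)
    return stats

def select_top_k(X, y, k):
    """Keep the k SNPs with the largest chi-square statistic (a univariate filter)."""
    return np.sort(np.argsort(-genotype_chi2(X, y))[:k])

rng = np.random.default_rng(1)
X = rng.binomial(2, 0.3, size=(2000, 500))          # toy genotypes
disease = rng.random(2000) < 0.3 + 0.2 * X[:, 7]    # hypothetical: only SNP 7 matters
y = np.where(disease, 0, 1)                         # 0 = disease, 1 = no disease
keep = select_top_k(X, y, k=10)
print(keep)  # SNP 7 should be among the 10 kept columns
```

The trade-off is that a one-SNP-at-a-time filter can discard SNPs that only matter in combination, which is exactly the kind of effect the original post hopes to find.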




#3 Danail Bulgaria

  • Guest
  • 2,212 posts
  • 421
  • Location:Bulgaria

Posted 07 December 2015 - 09:02 AM

@littlePawn

 

You certainly have written some correct things.

"first step to a cure for anything, even aging, is to first be able to diagnose the disease correctly."

Absolutely agree as someone with a medical background.

 

"even for the majority of diseases, we don't know the specific combinations of SNPs that may cause them."

Correct, as far as I know.

 

Your idea about the neural network and how it will work is already built up step by step in your mind. It only remains to sit in front of the laptop and code the network. It may turn out to do something useful.

My advice to you: pick a programming language and build it before someone steals your idea ;)



#4 littlePawn

  • Topic Starter
  • Guest
  • 61 posts
  • 11
  • Location:Australia
  • NO

Posted 08 December 2015 - 02:35 AM

@niner Thanks for the feedback. Yeah, I was thinking the sample size would be a bit of a problem, but perhaps not if some SNPs are significant enough and carry enough weight. Your idea of dimensionality reduction is great. It would be very interesting to compare PCA at different numbers of dimensions against the raw dataset. It would be amazing to see what a supercomputer or two could do with data like that.
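A minimal PCA sketch for the comparison described above, computing principal components with NumPy's SVD on a toy random genotype matrix (the data and the chosen dimensions 2, 10 and 50 are assumptions). Each reduced matrix Z could then be fed to the same classifier as the raw genotypes and the test-set results compared.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the top-k principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)                       # center every SNP column
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)               # variance ratio per component
    return Xc @ Vt[:k].T, explained[:k]

rng = np.random.default_rng(0)
X = rng.binomial(2, 0.3, size=(300, 100)).astype(float)  # toy genotype matrix
for k in (2, 10, 50):
    Z, ev = pca_project(X, k)
    print(k, Z.shape, f"variance kept: {ev.sum():.2f}")
```

Two hedged caveats: PCA keeps the directions of largest variance, which need not be the ones related to disease, and in real genotype data the top components often track ancestry, which is why GWAS pipelines commonly include them as covariates to control for population stratification.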

 

@seivtcho Thanks for the encouragement. To be honest, I'd be surprised if I was the first. And the way I see the life extension movement, forgoing immediate profit now by sharing knowledge openly, so that we live long enough to make a profit 2000+ years in the future in some special niche we're good at, is the most reasonable thing. If only mainstream corporations thought this way too... We're definitely on the verge of something exciting, I think, so long as key players keep their eyes on the bigger picture and its potential rewards instead of the immediate ones.

 

Interestingly, I notice that 23andMe and Pfizer, and perhaps other drug companies, have signed contracts. I can just imagine that this is something they would be doing. 23andMe would surely have amassed a large enough sample from both their SNP data and the questionnaires they send out asking people whether they have a disease... In effect, a huge drug company like Pfizer would have a license to print money in the not-too-distant future, especially with all the patents that will no doubt ensue. Drug targeting at the level of gene expression, perhaps by inhibiting/activating specific signal transduction pathways if we can't access the gene directly, sounds plausible, I think.

 

But even if we all miss out on the profit this may bring, if the rewards are there (disease, and potentially aging, cured by whichever companies come together), then may we live long enough to focus on finance and business a few hundred years from now instead of biology, and reap our fruit then.

 

 



#5 treonsverdery

  • Guest
  • 1,312 posts
  • 161
  • Location:where I am at

Posted 27 October 2016 - 07:23 PM

There are many approaches with big data. Another thing you could do with a hundred million medical and genomics records is to find SNPs that are rare, say carried by less than 5% of a population, then see whether any of these rare genotypes are linked to being either better or worse than well.

 

The theory is that almost everyone has some rare SNPs, some of which might be highly correlated with hyper wellness or with disease. Hyper wellness is worth finding, as people could then genetically engineer their children to be hyper well, with ultralongevity.
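The rare-variant screen described in this post could look roughly like the following sketch. It computes each SNP's minor-allele frequency (MAF), keeps SNPs under 5%, and compares how often carriers of the rare allele appear in a "worse than well" group (y = 0) versus everyone else (y = 1). The toy data, with SNP 2 planted as a rare variant that marks group 0, is invented for illustration.

```python
import numpy as np

def rare_variant_carrier_rates(X, y, maf_cutoff=0.05):
    """Find SNPs whose minor-allele frequency is below maf_cutoff and return,
    for each, the fraction of carriers in group y == 0 and in group y == 1."""
    alt_freq = X.mean(axis=0) / 2.0
    Xm = np.where(alt_freq > 0.5, 2 - X, X)      # recode so every column counts the minor allele
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    rare = np.flatnonzero((maf > 0) & (maf < maf_cutoff))
    carriers = Xm[:, rare] >= 1                  # at least one copy of the rare allele
    return rare, carriers[y == 0].mean(axis=0), carriers[y == 1].mean(axis=0)

rng = np.random.default_rng(2)
m = 5000
X = rng.binomial(2, 0.3, size=(m, 50))            # common SNPs
X[:, :5] = rng.binomial(2, 0.01, size=(m, 5))     # columns 0-4: rare SNPs (hypothetical)
y = np.where(rng.random(m) < 0.2, 0, 1)           # 0 = "worse than well", 1 = otherwise
y[X[:, 2] >= 1] = 0                               # hypothetical: SNP 2 carriers are all unwell
rare, rate0, rate1 = rare_variant_carrier_rates(X, y)
for j, r0, r1 in zip(rare, rate0, rate1):
    print(f"SNP {j}: carriers {r0:.3f} (group 0) vs {r1:.3f} (group 1)")
```

The same comparison in the other direction would flag candidate "hyper wellness" variants. A real screen would also need a significance test per SNP (Fisher's exact test is a common choice for small carrier counts) and a correction for testing many SNPs at once.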




#6 littlePawn

  • Topic Starter
  • Guest
  • 61 posts
  • 11
  • Location:Australia
  • NO

Posted 17 December 2016 - 02:14 PM


It's a great idea and one I've tried implementing. I went to various disease forums (GAD/ET etc.) to find people with symptoms identical to mine who also have 23andMe or other raw SNP data of their genome. I then connected to an SNP database such as 1000 Genomes that has actual population frequencies for polymorphisms, did a 3-way SQL table join of my genome, the other person's genome and the population frequency for the SNPs, and sorted by population frequency. I kept finding more people and joining them to the table. I think I managed to get about 5 people with the same exact self-reported symptoms, and isolated a number of SNPs with a population frequency below 0.01%. These SNPs are not currently clinically significant, but they belong to genes that are very likely involved in diseases such as GAD. Could I have discovered something significant? Unlikely, given the very low number of people; it's probably just a coincidence. But it might be more than that too. Who knows.
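The multi-genome join described above can be sketched with Python's built-in sqlite3 module. Rather than adding one more JOIN per person, this variant (an assumption, not littlePawn's actual schema) stores everyone's calls in one long genotypes table, so adding a person is just more INSERTs. The query keeps genotypes that every person shares and that are rarer than a cutoff in a population-frequency table, sorted by frequency. The SNP IDs, genotypes and frequencies are placeholders, not real data.

```python
import sqlite3

def shared_rare_genotypes(genomes, popfreq, max_freq):
    """genomes: {person: {snp_id: genotype}}; popfreq: {(snp_id, genotype): frequency}.
    Returns (snp_id, genotype, freq) rows carried identically by every person
    and rarer than max_freq in the population table, rarest first."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE genotypes (person TEXT, rsid TEXT, genotype TEXT)")
    con.execute("CREATE TABLE popfreq (rsid TEXT, genotype TEXT, freq REAL)")
    con.executemany("INSERT INTO genotypes VALUES (?, ?, ?)",
                    [(p, r, g) for p, calls in genomes.items() for r, g in calls.items()])
    con.executemany("INSERT INTO popfreq VALUES (?, ?, ?)",
                    [(r, g, f) for (r, g), f in popfreq.items()])
    rows = con.execute("""
        SELECT g.rsid, g.genotype, p.freq
        FROM genotypes AS g
        JOIN popfreq AS p ON p.rsid = g.rsid AND p.genotype = g.genotype
        WHERE p.freq < ?
        GROUP BY g.rsid, g.genotype, p.freq
        HAVING COUNT(DISTINCT g.person) = ?
        ORDER BY p.freq ASC
    """, (max_freq, len(genomes))).fetchall()
    con.close()
    return rows

# Placeholder data: three people, three made-up SNP IDs.
genomes = {
    "me":      {"snp_a": "AG", "snp_b": "CC", "snp_c": "TT"},
    "person2": {"snp_a": "AG", "snp_b": "CT", "snp_c": "TT"},
    "person3": {"snp_a": "AG", "snp_b": "CC", "snp_c": "TT"},
}
popfreq = {("snp_a", "AG"): 0.00005, ("snp_b", "CC"): 0.00008, ("snp_c", "TT"): 0.4}
result = shared_rare_genotypes(genomes, popfreq, max_freq=0.0001)  # 0.01% cutoff
print(result)  # [('snp_a', 'AG', 5e-05)]
```

Real raw-data files would need cleaning first, for example so that "AG" and "GA", or calls reported on opposite strands, compare as equal.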

 

The problem I ran into that made me stop this project is that it's simply impossible for me to acquire more datasets. I ended up getting my threads closed on some of those forums for encouraging people to sign up for a 23andMe kit so that they could contribute their genome to my project. Isn't that ironic? Forums dedicated to supporting disease sufferers banning someone who's looking for a cure.

 

Anyway, I stopped this project several months ago, but if you have any idea how we could collect more genomes from people we know have the same disease, it could continue. As I said before, I bet this is exactly what 23andMe/Pfizer are already doing. But a group of us together could surely find our own niche in this market. I know we'll all be banging our heads in 20 years' time that we missed this gold mine. It's not just about the money but about contributing to research on specific diseases that may be of interest to us.






