Breast cancer, as a primary tumour, is well known to cause secondary tumours in the brain, bone, liver and lung. The dataset, obtained from the Gene Expression Omnibus (GEO) under accession number GSE14020, contains 65 samples of different breast cancer metastases. The samples are split across two series matrix files, one per platform (GPL96 and GPL570), with different dimensions: GPL96 has 36 samples and 22,283 genes, whereas GPL570 has 29 samples and 54,675 genes. Given that X = x1, x2, x3, ..., xN represents the list of genes and y1 = Brain, y2 = Bone, y3 = Liver, y4 = Lung represent the locations of metastases, the task is to devise a framework that, given a new data point X, predicts its most likely metastasis location Y. The dataset can be accessed at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE14020
The problem is solved using several machine learning classification algorithms. Recursive Feature Elimination (RFE) is used for feature selection, and the results are compared across different train/test ratios.
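
As a rough illustration of that workflow, the sketch below runs RFE followed by a classifier at several train/test ratios using scikit-learn. It is not the code from this repository: the data is synthetic (a stand-in for the GEO expression matrix), and the helper names `make_synthetic_expression` and `evaluate_split`, the logistic regression classifier, the number of selected genes, and the specific test sizes are all assumptions made for the example.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

LABELS = ["Brain", "Bone", "Liver", "Lung"]


def make_synthetic_expression(n_samples=64, n_genes=200, seed=0):
    """Hypothetical stand-in for the expression matrix (NOT the real GSE14020 data).

    Returns X with shape (n_samples, n_genes) and integer labels y in 0..3.
    """
    rng = np.random.default_rng(seed)
    y = np.arange(n_samples) % len(LABELS)
    X = rng.normal(size=(n_samples, n_genes))
    for k in range(len(LABELS)):
        # Shift two genes per class so that a few features carry signal.
        X[y == k, 2 * k:2 * k + 2] += 3.0
    return X, y


def evaluate_split(X, y, test_size, n_features=10, seed=0):
    """Fit scaling + RFE + classifier on the training part, score on the test part."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    model = make_pipeline(
        StandardScaler(),
        # RFE drops 10% of the features per iteration until n_features remain.
        RFE(LogisticRegression(max_iter=1000),
            n_features_to_select=n_features, step=0.1),
        LogisticRegression(max_iter=1000),  # assumed classifier, for illustration
    )
    model.fit(X_tr, y_tr)
    return model.score(X_te, y_te)


X, y = make_synthetic_expression()
# Compare accuracy across several (assumed) train/test ratios.
results = {ts: evaluate_split(X, y, ts) for ts in (0.2, 0.3, 0.4)}
for ts, acc in results.items():
    print(f"test_size={ts:.1f}  accuracy={acc:.2f}")
```

Placing RFE inside the pipeline means feature selection is fitted on the training portion only, so the held-out samples do not influence which genes are kept. With the real data, `X` and `y` would come from the parsed series matrix files instead of the synthetic generator.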
The repository also contains the term paper written on this problem.