hello, I have some question about Struct LMM #28

Open
dkssud24 opened this issue Apr 25, 2019 · 1 comment

@dkssud24

dkssud24 commented Apr 25, 2019

alldata.xlsx
bedbimfam.zip
cov_agesex.txt
env_bmigroup.txt
pheno_glucose.txt

Hi @horta, I am a master's student at Kyung Hee University in Korea, majoring in bioinformatics. I was impressed by your paper and the StructLMM program; I think the idea behind StructLMM is excellent.
So I tried applying your program to part of our data, and I want to make sure it worked properly.

  1. Why do I get an error if I do not normalize or gaussianize the phenotype data?
  2. Are the pheno, cov, and env inputs assigned the way you intended (i.e., is the outcome below what you would expect)?

Outcome:

| chrom | snp | cm | pos | a0 | a1 | i | pv_int | pv |
|---|---|---|---|---|---|---|---|---|
| 1 | rs17106184 | 0.0 | 50909985 | A | G | 0 | 0.116081 | 0.039977 |

```python
import os
import pandas as pd
import numpy as np
from limix_core.util.preprocess import gaussianize
from struct_lmm import run_structlmm
from struct_lmm.utils.sugar_utils import norm_env_matrix
from pandas_plink import read_plink
import geno_sugar as gs

if __name__ == "__main__":

    # import genotype file (PLINK .bed/.bim/.fam prefix)
    # bedfile = "data_structlmm/chrom22_subsample20_maf0.10"
    bedfile = "rs17106184_N50"
    (bim, fam, G) = read_plink(bedfile)

    # subsample snps
    # Isnp = gs.is_in(bim, ("22", 17500000, 18000000))
    # G, bim = gs.snp_query(G, bim, Isnp)

    # load phenotype file
    # (np.loadtxt instead of sp.loadtxt: SciPy's NumPy aliases are deprecated)
    phenofile = "pheno_glucose.txt"
    dfp = np.loadtxt(phenofile)
    pheno = norm_env_matrix(dfp)
    # dfp = pd.read_csv(phenofile, index_col=0)
    # pheno = gaussianize(dfp.loc["BMI"].values[:, None])

    # load environment file and normalize
    envfile = "env_bmigroup.txt"
    E = np.loadtxt(envfile)
    E = norm_env_matrix(E)

    # mean as fixed effect
    # covs = np.ones((E.shape[0], 1))
    covs = "cov_agesex.txt"
    covs = np.loadtxt(covs)

    # run analysis with struct lmm
    snp_preproc = {"max_miss": 0.01, "min_maf": 0.02}
    res = run_structlmm(
        G, bim, pheno, E, covs=covs, batch_size=100, snp_preproc=snp_preproc
    )

    # export
    print("Export")
    print(res)
    # if not os.path.exists("out"):
    #     os.makedirs("out")
    # res.to_csv("out/res_structlmm.csv", index=False)
```
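On question 2, one thing worth checking is the covariate matrix: the commented-out line under `# mean as fixed effect` supplied a column of ones (an intercept), and loading `cov_agesex.txt` in its place drops that column unless the file already contains one. Below is a minimal sketch with two hypothetical helpers (`as_column`, `build_covariates`; not part of struct-lmm) that turn a 1-D phenotype into an N x 1 column and prepend the intercept. It assumes that `run_structlmm` expects samples on rows, does not add an intercept by itself, and that the phenotype, environment, and covariate files list samples in the same order as `fam`.

```python
import numpy as np


def as_column(x):
    """Hypothetical helper: return x as floats with samples on rows (1-D -> N x 1)."""
    x = np.asarray(x, dtype=float)
    return x[:, None] if x.ndim == 1 else x


def build_covariates(extra_covs, n_samples):
    """Hypothetical helper: prepend an intercept column to the covariates.

    Mirrors the 'mean as fixed effect' column of ones and keeps the
    age/sex columns after it. Raises if the row count does not match.
    """
    extra = as_column(extra_covs)
    if extra.shape[0] != n_samples:
        raise ValueError(
            f"covariates have {extra.shape[0]} rows, expected {n_samples}"
        )
    intercept = np.ones((n_samples, 1))
    return np.hstack([intercept, extra])
```

In the script above this would amount to `covs = build_covariates(np.loadtxt("cov_agesex.txt"), fam.shape[0])` and `pheno = as_column(pheno)`; if the covariate file already has a column of ones, skip the intercept.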

@horta
Collaborator

horta commented Jan 23, 2020

Hi @dkssud24 . Have you tried the new version?

Regarding question 1, struct-lmm might fail to run if your phenotype has extreme values (is that the case?). Gaussianizing the phenotype shrinks those extreme values.
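To illustrate why this helps, here is a minimal sketch of a rank-based inverse-normal transform. `rank_inverse_normal` is a hypothetical helper written for illustration; it is an assumption that `limix_core`'s `gaussianize` behaves similarly, and the glucose values are made up (not taken from `pheno_glucose.txt`).

```python
import numpy as np
from scipy.stats import norm, rankdata


def rank_inverse_normal(y):
    """Replace each value by the standard-normal quantile of its rank.

    Hypothetical helper; assumed (not verified) to be similar in spirit
    to limix_core's `gaussianize`.
    """
    y = np.asarray(y, dtype=float).ravel()
    n = y.size
    ranks = rankdata(y)  # ranks 1..n, ties get their average rank
    # (rank - 0.5) / n keeps every probability strictly inside (0, 1),
    # so norm.ppf never returns +/- infinity
    return norm.ppf((ranks - 0.5) / n)


if __name__ == "__main__":
    # made-up glucose values with one extreme outlier
    glucose = np.array([85.0, 90.0, 92.0, 95.0, 400.0])
    z = rank_inverse_normal(glucose)
    print(np.round(z, 3))
```

The outlier at 400 maps to about 1.28, the same distance from zero as the smallest value, so it no longer dominates the phenotype variance. The trade-off is that only the ordering of the original values is kept.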

Regarding question 2, we have updated the documentation, and it is now clearer how to use the inputs. Please have a look at https://github.com/limix/struct-lmm/blob/master/struct_lmm/_lmm.py
