You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hello, I am really struggling to get individual weights of SNPs from a pca analysis. I can get the weights from each individual, but I would really like to get the results from each SNP. I am also wondering why of 1.1 million SNPs in my plink file why the pca plot says it can only use 600,000. It says because they are not all on autosomes, but they are. They are in a species with 34 chromosomes. Any help would be appreciated. I am attaching my code. # Load the SNPRelate package
library(SNPRelate)
Hello, I am really struggling to get individual weights of SNPs from a pca analysis. I can get the weights from each individual, but I would really like to get the results from each SNP. I am also wondering why of 1.1 million SNPs in my plink file why the pca plot says it can only use 600,000. It says because they are not all on autosomes, but they are. They are in a species with 34 chromosomes. Any help would be appreciated. I am attaching my code. # Load the SNPRelate package
library(SNPRelate)
Set the file paths
bed.fn <- "/Users/tannervanorden/Desktop/Plink/Original_edited_2.bed"
fam.fn <- "/Users/tannervanorden/Desktop/Plink/Original_edited_2.fam"
bim.fn <- "/Users/tannervanorden/Desktop/Plink/Original_edited_2.bim"
Define the output GDS file
Specify a new GDS file name
new_gds_fn <- "/Users/tannervanorden/Desktop/Plink/PCA.gds"
Convert PLINK BED file to GDS file with cvt.chr="char"
snpgdsBED2GDS(bed.fn, fam.fn, bim.fn, new_gds_fn)
Perform PCA
genofile <- snpgdsOpen(new_gds_fn)
pca_result <- snpgdsPCA(genofile)
Extract population labels
Hendrys <- rep("Hendry's Creek", 42)
PineR <- rep('Pine & Ridge Creeks', 29)
Mill <- rep("Mill Creek", 18)
Trout = rep("Trout Creek", 4)
Mountain_Dell = rep("Mountain Dell", 2)
Birch = rep("Birch Creek", 3)
Willard = rep("Willard Creek", 30)
population_labels <- c(Hendrys, PineR, Mill, Trout, Mountain_Dell, Birch, Willard)
Create a data frame for ggplot
pca_plot_data <- data.frame(
LD1 = pca_result$eigenvect[, 2],
LD2 = pca_result$eigenvect[, 1],
Group = population_labels
)
Define population colors
population_colors <- c("darkorange", "blue", 'goldenrod1', 'red', 'chartreuse1', 'purple', 'pink', "cornflowerblue", 'bisque2', "black", "cyan", "brown", "lightgrey", "olivedrab3")
Create ggplot
pca_plot <- ggplot(pca_plot_data, aes(x = LD1, y = LD2, color = Group)) +
geom_point() +
scale_color_manual(values = population_colors) +
labs(x = "LD1", y = "LD2", title = "PCA Plot") + # Adjust title as needed
theme_linedraw()
Display the plot
print(pca_plot)
Close the GDS file
snpgdsClose(genofile)
The text was updated successfully, but these errors were encountered: