CircaDB¶ ↑

A Database of Circadian Gene Expression Profiles¶ ↑

Welcome to the CircaDB project. This is a data set of time course series data, highlighting circadian gene expression cycles through the use of the Cosopt algorithm for determining rhythmicity.

It is a Ruby on Rails application with a single controller and large data pre-set data set. Notable aspects include full-text search capability of gene annotation data via the Sphinx search engine, and data graphs dynamically produced by the Google Chart API.

If you have any questions contact the Hogenesch Lab at the University of Pennsylvania

bioinf.itmat.upenn.edu/hogeneschlab

License¶ ↑

All of the source code for data loading and the web application is licensed under the GNU General Public License (GPL-2.0). See the LICENSE file for more details.

Installing¶ ↑

Prerequisites¶ ↑

Install Ruby (www.ruby-lang.org/en/), RubyGems (rubygems.org/) and the bundler gem (gembundler.com/). Install MySQL (version >= 5.0) or have one you can connect to and create a database Install the Sphinx full text search engine (freelancing-god.github.com/ts/en/installing_sphinx.html)

Install¶ ↑

If you are only interested in reviewing the site and running a local instance, then:

Configure the config/database.yml file from the example config/database.yml.example to connect to your MySQL database.
Use the provided rake tasks to create the database

rake bootstrap:all

Visualize your own dataset¶ ↑

If you are interested in loading your own data, then you will need to load the array annotation information, add a Assay object for your experiment, then add the raw and statistic data from your experiment. See the lib/tasks/seed.rake for an example of our routines that manipulate and load data from source data files.

Examples to prepare your data can be found in /prepare_data_scripts.

You will need to find the matching annotation file for your gene chip. These can be found at websites like Affymetrix. To make the annotation files compatible with the database, they need to be in the following format (Checkout prepare_affy_annotation for an example):

%w{ gene_chip_id probeset_name genechip_name species annotation_date sequence_type
sequence_source transcript_id target_description representative_public_id
archival_unigene_cluster unigene_id genome_version  alignments
gene_title gene_symbol chromosomal_location unigene_cluster_type
ensembl entrez_gene swissprot ec omim
refseq_protein_id refseq_transcript_id flybase agi  wormbase
mgi_name rgd_name sgd_accession_number go_biological_process
go_cellular_component go_molecular_function
pathway interpro trans_membrane qtl annotation_description
annotation_transcript_cluster
transcript_assignments annotation_notes }.join(",")

Now you will need to prepare the microarray data itself. To fill the database the following format is expected (Checkout prepare_hughes_liver_data for an example), where Cubase is the link to the google API plot:
```
%w{ probe_set time_points.join(",") data_points.join(",") cubase }.join("@")
```

After the statistic test are finished you can use prepare_stats.

%w{row_names p_lomb_scargle q_lomb_scargle period_lomb_scargle phase_shift_lomb_scargle p_values_lichtenberg, q_values_lichtenberg
period_lichtenberg BH_Q_jtk ADJ_P_jtk
PER_jtk LAG_jtk AMP_jtk}.join(",")

Place your prepared datasets into /seed_data folder.

Before the dataset can be added with rake seed:fill , we will need to take a look at /lib/tasks/seed.rake. It is important to comment out the pre-existing datasets that are not required for your own database.

task :genechips => :environment do
  c = ActiveRecord::Base.connection
  c.execute "delete from gene_chips"
  g  = GeneChip.new(:slug => "Mouse430_2", :name => "Mouse Genome 430 2.0 ( Affymetrix)")
  g.save
  ...
  ### Add your gene chip here
  #g  = GeneChip.new(:slug => "SLUG-NAME", :name => "DESCRIPTION")
  #g.save
end
...
desc "Seed SLUG-NAME annotations"
task :SLUG-NAME_probesets => :environment do
  # probes
  fields = %w{ gene_chip_id probeset_name genechip_name species   annotation_date sequence_type sequence_source transcript_id   target_description representative_public_id archival_unigene_cluster  unigene_id genome_version alignments gene_title gene_symbol  chromosomal_location unigene_cluster_type ensembl entrez_gene swissprot ec   omim refseq_protein_id refseq_transcript_id flybase agi wormbase mgi_name   rgd_name sgd_accession_number go_biological_process go_cellular_component   go_molecular_function pathway interpro trans_membrane qtl   annotation_description annotation_transcript_cluster transcript_assignments   annotation_notes }
  g  = GeneChip.find(:first, :conditions => ["slug like ?", "SLUG-NAME"])
  count = 0
  buffer = []
  puts "=== Begin Probeset insert ==="
  FasterCSV.foreach("#{RAILS_ROOT}/seed_data/prepared_SLUG-NAME.csv",     :headers=> true ) do |ps|
    count += 1
    buffer << [g.id] + ps.values_at
    if count % 1000 == 0
      Probeset.import(fields,buffer)
      buffer = []
      puts count
    end
  end
  Probeset.import(fields,buffer)
  puts count
  puts "=== End Probeset insert ==="
end
...
v = [["liver","Mouse Liver 48 hour (Affymetrix)", affy_id],
   ["pituitary","Mouse Pituitary 48 hour (Affymetrix)",affy_id],
   ["NIH3T3","NIH 3T3 Immortilized Cell Line 48 hour (Affymetrix)",affy_id],
   ["WT_liver","Wild Type Liver (GNF microarray)", gnf_id],
   ["WT_muscle","Wild Type Muscle (GNF microarray)",gnf_id],
   ["WT_SCN","Wild Type SCN (GNF microarray)", gnf_id],
   ["panda_liver","Liver Panda 2002 (Affymetrix)",u74av1_id],
   ["panda_SCN_MAS4","SCN MAS4 Panda 2002 (Affymetrix)", u74av1_id],
   ["panda_SCN_gcrma","SCN gcrma Panda 2002 (Affymetrix)", u74av1_id]]
   ### ADD YOUR ASSAY HERE
   # ["SLUG-NAME","description", GENE-CHIP_id]]

#v = []
Assay.import(f,v)
...
### Add your dataset
%w{ YOUR_PREPARED_DATA_SET }.each do |etype|
count = 0
buffer = []
a = Assay.find(:first, :conditions => ["slug = ?", etype])
puts "=== Raw Data #{etype} insert starting ==="

File.open("#{RAILS_ROOT}/seed_data/#{etype}_data","r" ).each do |line|
  count += 1
  line = line.split("@")
  time_points = line[1].split(",").map {|element| element}
  data_points = line[2].split(",").map {|element| element}
  cubase = line[3]
  psid = probesets[line[0]]
  buffer << [a.id(), a.slug, psid, line[0], time_points.to_json, data_points.to_json,cubase]

  if count % 1000 == 0
    ProbesetData.import(fields,buffer)
    puts count
    buffer = []
  end
end
ProbesetData.import(fields,buffer)
puts "=== Raw Data #{etype} end (count= #{count}) ==="
...
### Add your stats
%w{ YOUR_PREPARED_DATA_SET }.each do |etype|

  count = 0
  buffer = []
  a = Assay.find(:first, :conditions => ["slug = ?", etype])
  puts "=== Stat Data #{etype} start ==="

  FasterCSV.foreach("#{RAILS_ROOT}/seed_data/hughes_#{etype}_stats") do |row|
    count += 1
    aslug, psname = 0,row[0].to_i
    psid = probesets[row[0]]
    buffer << [a.id, a.slug,psid, psid, psname] + row[1..-1].to_a
    if count % 1000 == 0
      ProbesetStat.import(fields,buffer)
      buffer = []
      puts count
    end
  end
  ProbesetStat.import(fields,buffer)
  puts "=== Stat Data #{etype} end (count = #{count}) ==="
end
...

Run:
```
rake seed:fill
```

Name		Name	Last commit message	Last commit date
Latest commit History 291 Commits
app		app
config		config
db		db
doc		doc
lib/tasks		lib/tasks
log		log
o-ri		o-ri
prepare_data_scripts		prepare_data_scripts
public		public
script		script
server		server
tmp		tmp
.gitignore		.gitignore
Gemfile		Gemfile
Gemfile.lock		Gemfile.lock
LICENSE		LICENSE
README.rdoc		README.rdoc
Rakefile		Rakefile
config.ru		config.ru
distal_colon_prep.csv		distal_colon_prep.csv
test.csv		test.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CircaDB¶ ↑

A Database of Circadian Gene Expression Profiles¶ ↑

License¶ ↑

Installing¶ ↑

Prerequisites¶ ↑

Install¶ ↑

Visualize your own dataset¶ ↑

About

Releases

Packages

Languages

License

khayer/circadb

Folders and files

Latest commit

History

Repository files navigation

CircaDB¶ ↑

A Database of Circadian Gene Expression Profiles¶ ↑

License¶ ↑

Installing¶ ↑

Prerequisites¶ ↑

Install¶ ↑

Visualize your own dataset¶ ↑

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages