
Software

Regensburg GEM Platform - Development of genetic-epidemiologic methods (GEM) and their realization in software (GWAS data quality control, interaction analyses, stratified approaches, imputation)

Prof. Dr. Iris Heid, Dr. Thomas Winkler, Dr. Mathias Gorski, Dr. Felix Günther, Kira Stanzick M.Sc.

Here you can download software that was developed for various aspects of genome-wide association study (GWAS) analyses. 


KidneyGPS

KidneyGPS - a web-based software tool to support GPS for kidney function loci

The results of Gene PrioritiSation (GPS) based on GWAS meta-analyses (GWAMA) and post-GWAMA analyses are high-dimensional and require expertise for interpretation. To give experts from other fields (e.g. medicine, physiology, biology) easier access to the relevant GWAMA and post-GWAMA results for kidney function and kidney function decline, we have developed KidneyGPS as a user-friendly web application. KidneyGPS offers search functions on genes, variants, and regions to prioritize genes and variants that are likely relevant for kidney function in humans for functional follow-up. Several options allow users to customize the presented output according to their specific needs.

Click here to access our Gene PrioritiSation tool

Releases

KidneyGPS 1.0: Web-based gene prioritisation based on data and results from Stanzick et al. 2021 (424 loci, 634 signals, 5906 genes, 38,306 variants with evidence for association with eGFR)
KidneyGPS 1.1 [March 2022]: Larger datasets for gene expression and genetic variants that modulate gene expression in the kidney (integrating eQTL-data for kidney tissue from Sheng et al., 2021)

This work is conducted within the SFB-1350.


EasyQC2 (CHARGE Gene-Lifestyle Interaction Working group)

Description

EasyQC2 is an extension of the previous EasyQC R-package that provides advanced and improved functionality. 

Please go to the CHARGE-GLI section of the website, which provides the current version of the package and material for download: www.genepi-regensburg.de/charge-gli


EasyQC (Winkler et al. 2014)

Description

EasyQC is an R-package that provides advanced functionality

(i) to perform file-level QC of single genome-wide association (GWA) data-sets;

(ii) to conduct quality control across several GWA data-sets (meta-level QC);

(iii) to simplify data-handling of large-scale GWA data-sets

One could also say that it can be used as a nonsense detector for study-specific GWA data-sets.

Download

Current version 23.8: EasyQC_23.8.tar.gz

Previously distributed version: EasyQC_9.2.tar.gz

Manual: EasyQC_9.0_Commands_140918_2.pdf

ChangeLog: EASYQC_CHANGE.log

Download – Early AMD meta-analysis cleaning material

The following EasyQC ecf-script was used for quality control of 11 Early AMD GWAS prior to meta-analysis (Winkler et al. BMC Medical Genomics 2020). The cleaning script was developed for binary outcome GWAS that were conducted with Rvtests:

     studyqc-earlyamd.ecf

The script is compatible with Rvtests output. Further details and instructions on the individual steps are given as comments in the script.

Download – 1000 Genomes / HRC cleaning material

The following material can be used for quality control of 1000 Genomes or HRC imputed GWAS result data sets.

Scripts:

     fileqc_1000G.ecf

This script can be used with the 1000G or HRC reference files listed below and incorporates different QC steps such as sanity checks, filtering, allele coding harmonization, marker harmonization, allele frequency checks, QQ plots, etc. In particular, the allele coding harmonization and the marker harmonization are essential steps prior to meta-analysis.
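To illustrate what allele coding harmonization and an allele frequency check involve, here is a minimal Python sketch. It is not taken from fileqc_1000G.ecf or from EasyQC itself: the `Variant` class, the function names, and the 0.2 frequency-difference threshold are all assumptions made for illustration, and strand flips are deliberately not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    effect_allele: str
    other_allele: str
    beta: float   # effect estimate for effect_allele
    eaf: float    # frequency of effect_allele in the study


def harmonize(v: Variant, ref_a1: str, ref_a2: str) -> Optional[Variant]:
    """Align a study variant to the reference allele coding.

    ref_a1 is the allele the reference frequency refers to.
    Returns None when the allele pair does not match the reference
    (strand flips are not resolved in this simplified sketch).
    """
    if (v.effect_allele, v.other_allele) == (ref_a1, ref_a2):
        return v
    if (v.effect_allele, v.other_allele) == (ref_a2, ref_a1):
        # Swapped coding: the effect changes sign, the frequency is mirrored.
        return Variant(ref_a1, ref_a2, -v.beta, 1.0 - v.eaf)
    return None


def af_outlier(study_eaf: float, ref_eaf: float, max_diff: float = 0.2) -> bool:
    """Flag variants whose frequency deviates strongly from the reference.

    The default threshold is a hypothetical value, not an EasyQC default.
    """
    return abs(study_eaf - ref_eaf) > max_diff
```

Without this alignment, studies that report the same variant with opposite effect alleles would contribute effects of opposite sign to the meta-analysis.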

Allele frequency reference data (all based on NCBI build 37):

The provided allele frequency reference files are using the cptid format for marker identifiers. The cptid format is automatically generated by the EasyQC function CREATECPTID. Please see the EasyQC manual for more detailed information on the format.


Allele frequency reference data for 1000G phase1 version3 imputed GWAS (based on allele frequencies given in the "legend" files from the IMPUTE website):

   Excluding X-Chr variants:

     allelefreq.1000G_EUR_p1v3.impute_legends.noDup.noX.gz

     allelefreq.1000G_AFR_p1v3.impute_legends.noDup.noX.gz

     allelefreq.1000G_AMR_p1v3.impute_legends.noDup.noX.gz

     allelefreq.1000G_ASN_p1v3.impute_legends.noDup.noX.gz

   Excluding X-Chr variants, excluding monomorphic variants:               

     allelefreq.1000G_EUR_p1v3.impute_legends.noMono.noDup.noX.v2.gz

     allelefreq.1000G_AFR_p1v3.impute_legends.noMono.noDup.noX.v2.gz

     allelefreq.1000G_AMR_p1v3.impute_legends.noMono.noDup.noX.v2.gz

     allelefreq.1000G_ASN_p1v3.impute_legends.noMono.noDup.noX.v2.gz

Allele frequency reference data for 1000G phase3 version5 imputed GWAS (based on allele frequencies given in the "legend" files from the IMPUTE website):

   Excluding monomorphic variants, excluding CNVs:

     1000GP_p3v5_legends_rbind.noDup.noMono.noCnv.noCnAll.afref.ALL.txt.gz

     1000GP_p3v5_legends_rbind.noDup.noMono.noCnv.noCnAll.afref.EUR.txt.gz

     1000GP_p3v5_legends_rbind.noDup.noMono.noCnv.noCnAll.afref.AFR.txt.gz

     1000GP_p3v5_legends_rbind.noDup.noMono.noCnv.noCnAll.afref.AMR.txt.gz

     1000GP_p3v5_legends_rbind.noDup.noMono.noCnv.noCnAll.afref.EAS.txt.gz

     1000GP_p3v5_legends_rbind.noDup.noMono.noCnv.noCnAll.afref.SAS.txt.gz

Allele frequency reference data for Haplotype Reference Consortium (HRC) imputed GWAS (based on allele frequencies given in the reference file provided by Will Rayner, http://www.well.ox.ac.uk/~wrayner/tools/#Checking):

   Excluding variants with MAC < 5 or MAF < 0.1%:

     HRC.r1-1.GRCh37.wgs.mac5.sites.tab.cptid.maf001.gz
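The exclusion rule above (minor allele count below 5 or minor allele frequency below 0.1%) can be sketched in Python as follows. This is an illustration of the filter only, not the procedure used to build the file; the function names and the input of an alternate allele count plus total allele number are assumptions.

```python
def minor_allele_stats(alt_count: int, total_alleles: int) -> tuple:
    """Return (MAC, MAF) from an alternate allele count and the total
    number of observed alleles."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    if not 0 <= alt_count <= total_alleles:
        raise ValueError("alt_count must lie between 0 and total_alleles")
    # The minor allele is whichever of the two alleles is rarer.
    mac = min(alt_count, total_alleles - alt_count)
    return mac, mac / total_alleles


def keep_variant(alt_count: int, total_alleles: int,
                 min_mac: int = 5, min_maf: float = 0.001) -> bool:
    """Keep a variant only if it passes both the MAC and the MAF threshold."""
    mac, maf = minor_allele_stats(alt_count, total_alleles)
    return mac >= min_mac and maf >= min_maf
```

Taking the minimum of the two allele counts means that a variant whose alternate allele is nearly fixed is treated the same way as one whose alternate allele is very rare.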

Mapping files (all based on NCBI build 37):

The mapping files contain chromosome and position information for marker identifiers (e.g., rsIDs) whose names do not themselves include the chromosome and position (unlike, e.g., "chr1:123:AT_A"). The files are based on imputation reference files from the MACH and the IMPUTE websites. They can be used with the EasyQC function CREATECPTID, which harmonizes marker names across studies by compiling unique cptids. Please see the EasyQC manual for more detailed information on the cptid format.

 Mapping file for 1000G phase1 version3 imputed GWAS:

   rsmid_map.1000G_ALL_p1v3.merged_mach_impute.v3.mergeindels.txt.gz

 Mapping file for 1000G phase3 version5 imputed GWAS:

   rsmid_machsvs_mapb37.1000G_p3v5.merged_mach_impute.v3.corrpos.gz

 Mapping file for HRC imputed GWAS:

   HRC.r1-1.GRCh37.wgs.mac5.sites.tab.rsid_map.gz

Change log for mapping files: CHANGE_map.log
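The idea behind using a mapping file for marker harmonization can be sketched in Python as below. This is not the CREATECPTID implementation and the "chromosome:position" output is only a simplified stand-in for the cptid format, which is defined in the EasyQC manual. The function names and the dictionary-based mapping are assumptions.

```python
from typing import Dict, List, Optional, Tuple


def position_id(chrom: str, pos: int) -> str:
    """Build a simplified position-based identifier (not the real cptid)."""
    return f"{chrom}:{pos}"


def harmonize_marker_names(
    markers: List[str],
    rs_map: Dict[str, Tuple[str, int]],
) -> Dict[str, Optional[str]]:
    """Map each marker name to a position-based identifier.

    rs_map plays the role of the mapping file: marker name -> (chrom, pos).
    Names of the form "chr<chrom>:<pos>[:...]" already carry their position
    and are parsed directly. Anything else maps to None.
    """
    result: Dict[str, Optional[str]] = {}
    for name in markers:
        if name in rs_map:
            chrom, pos = rs_map[name]
            result[name] = position_id(chrom, pos)
            continue
        parts = name[3:].split(":") if name.startswith("chr") else []
        if len(parts) >= 2 and parts[0] and parts[1].isdigit():
            result[name] = position_id(parts[0], int(parts[1]))
        else:
            result[name] = None  # unresolved; would need manual review
    return result
```

For example, with a mapping entry for a hypothetical marker "rs123" at chromosome 1, position 1000, both "rs123" and a study that names the same variant "chr1:1000:A_G" would end up with the identifier "1:1000".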

Download – GIANT QC paper (Winkler et al) material

The following material has been used for quality control and for several projects of the Genetic Investigation of ANthropometric Traits (GIANT) consortium.

Scripts:

  File-level QC scripts:

     1_filelevel_qc.gwa.ecf (for HapMap imputed data)

     1_filelevel_qc.metabochip.ecf (for genotyped Metabochip data)

  Meta-level QC script:

     2_metalevel_qc.ecf

  Meta-Analysis script (to be used with metal):

     3_metal_metaanalysis.txt

  Meta-Analysis QC scripts:

     4_metaanalysis_qc.compare.ecf

     4_metaanalysis_qc.compare_logfiles.r (R-script)

     4_metaanalysis_qc.studymeta.ecf

Reference data:

  Allele frequency reference data (based on NCBI build 36):

     AlleleFreq_HapMap_CEU.v2.txt.gz (for CEU HapMap imputed data)

     AlleleFreq_1000G_EUR_Metabochip.v1.txt.gz (for CEU genotyped Metabochip data)

  Marker harmonization reference data (based on NCBI build 36):

     SNPID_to_ChrPosID.b36_v2.txt.gz

  QT interval SNPs reference data (based on NCBI build 36):

     QTSNPs_AEL_TW.txt

Please see our QC paper "Winkler et al.: Quality control and conduct of genome-wide association meta-analyses. Nature Protocols 2014" for further details regarding these scripts and material.

Download – Exomechip cleaning material

Scripts:

  Cleaning scripts for Rvtests output:

     clean_rvtests.ecf (for Rvtests association output)

     clean_rvtests_cov.ecf (for Rvtests *Cov* output)

  Cleaning scripts for Raremetalworker output:

     clean_raremetalworker.ecf (for Raremetalworker association output)

     clean_raremetalworker_cov.ecf (for Raremetalworker *cov* output)

Reference data:

  Exomechip Allele frequency reference data:

       AFR.frequencies

       AMR.frequencies

       EUR.frequencies

       ASN.frequencies


Requirements

R 2.13 or higher.

Only UNIX/LINUX systems are supported.


Citation

If you use EasyQC please cite

"Winkler et al.: Quality control and conduct of genome-wide association meta-analyses. Nature Protocols 2014"

and (if possible) reference our webpage "www.genepi-regensburg.de/easyqc".

Thank you.


License

EasyQC is licensed under the GNU General Public License, version 3.

Copyright © 2012 by Thomas Winkler.

Although we hope that EasyQC will be very useful, it is published WITHOUT ANY WARRANTY.


Contact

If you require support for a different platform or have any further questions, please e-mail Thomas Winkler.


date of last update: 2017-02-20


EasyStrata (Winkler et al. 2014)



Description

EasyStrata is an R-package that provides advanced functionality

(i) for the evaluation of stratified GWAS;

(ii) for plotting GWAS results with a specific focus on stratification;

(iii) to simplify data-handling of large-scale GWA data-sets

Download

Version 8.6: EasyStrata_8.6.tar.gz

Command Reference / Manual: EasyStrata_8.6_Commands_140615.pdf

Alternatively, you can access the package via the CRAN R package repository: http://cran.r-project.org/web/packages/EasyStrata/

Download – Example scripts and data

The following scripts have been developed and can be used for the evaluation of stratified GWAMA results from the Genetic Investigation of ANthropometric Traits (GIANT) consortium.

Scripts:

  Plotting scripts:

     easystrata_figure1_miami.ecf (Miami-Plot for contrasting two strata)

     easystrata_supplfigure3_qqplot.ecf (QQ-Plot of multiple strata)

     easystrata_supplfigure4_scatter.ecf (Scatter-Plot of strata-specific effect sizes)

     easystrata_supplfigure5_qq_omitreported.ecf (QQ-Plot excluding known loci)

     easystrata_supplfigure6_plotspeed.ecf (Increasing plot speed)

     easystrata_supplfigure7_break_yaxis.ecf (Breaking up y-axis of Manhattan-plot)

     easystrata_supplfigure8_panel.ecf (Panel of QQ and scatter plots)

  Evaluation scripts:

     easystrata_supplpipe2A_sexdiff.ecf (Difference btw. 2 strata)

     easystrata_supplpipe2B_sexdiff_filt.ecf (Difference btw. 2 strata + overall filter)

     easystrata_supplpipe2C_joint.ecf (Joint main+interaction effect)

  Integrative genome screen script (Winkler et al. NatComm 2018):

     integrative_screen.ecf (The integrative screen script requires EasyStrata v18.1 or greater that can be downloaded here: EasyStrata_18.1.tar.gz)
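As a rough illustration of what the evaluation scripts above test, the following Python sketch computes a difference test between two strata and a joint 2-degree-of-freedom test from stratum-specific effect estimates. These are common textbook forms of the tests and are stated here as assumptions; the .ecf scripts and the EasyStrata manual are the authoritative description of what the package computes. The function names and the correlation parameter `r` are hypothetical.

```python
import math


def strata_difference(b1: float, se1: float, b2: float, se2: float,
                      r: float = 0.0) -> tuple:
    """Z-test for the difference between two stratum-specific effects.

    r is the correlation between the two estimates (0 for independent
    strata, e.g. non-overlapping samples of men and women).
    Returns (z, two-sided p-value).
    """
    var_diff = se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2
    z = (b1 - b2) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p


def joint_2df(b1: float, se1: float, b2: float, se2: float) -> tuple:
    """Joint test of both stratum effects, assuming independent strata.

    The sum of the two squared z-scores follows a chi-square distribution
    with 2 degrees of freedom, whose survival function is exp(-x / 2).
    Returns (chi-square statistic, p-value).
    """
    chi2 = (b1 / se1) ** 2 + (b2 / se2) ** 2
    return chi2, math.exp(-chi2 / 2.0)
```

The difference test has power for effects that differ between strata, whereas the joint test also picks up variants with similar effects in both strata.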

Data:

  Example mapping file:

     hapmap36.map (Hapmap b36 mapping file: SNPID, Chromosome, Position)

  Example locus annotation file:

     WAIST_2009_2010_14_reported.txt (Known waist-hip ratio loci, published by Lindgren et al 2009, Heid et al 2010)


Requirements

R 2.13 or higher. R packages 'Cairo' and 'plotrix'.


Citation

If you use EasyStrata please cite

"Winkler et al.: EasyStrata: evaluation and visualization of stratified genome-wide
association meta-analysis data. Bioinformatics 2014"

and (if possible) reference our webpage "www.genepi-regensburg.de/easystrata".

Thank you.


License

EasyStrata is licensed under the GNU General Public License, version 3.

Copyright © 2012 by Thomas Winkler.

Although we hope that EasyStrata will be very useful, it is published WITHOUT ANY WARRANTY.


Contact

If you require support for a different platform or have any further questions, please e-mail Thomas Winkler.


date of last update: 2018-04-19


MLA-bilateral (Günther et al. 2020)


mcblog

The mcblog R-package implements the maximum likelihood approach introduced in Guenther et al. (2020) for adjusting worse-entity logistic regression for bilateral disease for entity-specific misclassification using validation data. This approach can be used, for example, to adjust genetic association estimates for bilateral disease phenotypes (e.g., age-related macular degeneration) for misclassification of the disease status that arises from error-prone or suboptimal entity-specific disease classifications, provided that gold-standard classifications are available for a subset of entities.

Download

mcblog_0.0.0.9000.tar.gz

Example code

Please see the following vignette for an introduction into the usage of the R-package and illustrative examples:

introduction_mcblog.html

Reference:

Guenther, F., Brandl, C., Winkler, T. W., Wanner, V., Stark, K., Küchenhoff, H., & Heid, I. M. (2020). Chances and challenges of machine learning based disease classification in genetic association studies illustrated on age-related macular degeneration. Genetic Epidemiology.  

Contact

Felix.Guenther@stat.uni-muenchen.de


idGenerator

idGenerator is an automated tool for generating identifiers (IDs) with multiple features, particularly for modern epidemiological or clinical studies. The software enables the generation of structured IDs to facilitate study organization, layered IDs to enhance data protection, and check digits to detect entry errors. It is easy to use thanks to a user-friendly graphical user interface, and practical because it provides IDs both as standard text and as 128B barcodes. idGenerator is aimed at small to medium-sized epidemiologic or clinical studies in need of a simple yet secure concept and tool for ID creation and management. The software can be used on a standard Windows computer by study personnel without programming training.
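To show how a check digit can detect entry errors, here is a minimal Python sketch using the widely known Luhn algorithm. The text above does not state which check-digit scheme idGenerator uses, so Luhn is an assumption chosen purely for illustration, as are the function names and the restriction to numeric IDs.

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk from the rightmost payload digit; double every second digit,
    # starting with the rightmost one.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def append_check_digit(payload: str) -> str:
    """Return the ID with its check digit appended."""
    return payload + str(luhn_check_digit(payload))


def is_valid(id_with_check: str) -> bool:
    """Verify an ID whose last character is the check digit."""
    body, check = id_with_check[:-1], id_with_check[-1:]
    if not body.isdigit() or not check.isdigit():
        return False
    return luhn_check_digit(body) == int(check)
```

A scheme of this kind catches every single mistyped digit, which is the most common kind of manual entry error, at the cost of one extra character per ID.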

Download:

Contact

If you require support for a different platform or have any further questions please e-mail Matthias Olden.

date of last update: 2021-02-22


Meta-Mega pipeline

Here you can find the scripts for the parallel-processing imputation pipeline along with a detailed description.

Description: 

     MetaMega_pipeline_parallel_phasing_imputing_v2.pdf

Scripts: 

     01_phasing.pbs
     02_imputing.pbs
     03_generate_phasing_pbs_scripts.R
     04_generate_imputing_pbs_scripts.R
     05_submit.sh

Please contact mathias.gorski@ukr.de if you have questions or problems running the pipeline. 




Lehrstuhl für Genetische Epidemiologie

Institut für Epidemiologie und Präventivmedizin


Universitätsklinikum Regensburg
Franz-Josef-Strauß-Allee 11
93053 Regensburg