eigenMT: efficient multiple hypothesis testing correction for eQTLs

1. Installation

  • Download the eigenMT source: eigenMT.tgz

  • Or download the eigenMT source together with the example data: eigenMTwithTestData.tgz (warning: this is ~200 MB)

  • Extract the downloaded tar file using the following command:

    tar -xzvf eigenMT.tgz
  • This will create a directory named eigenMT. Inside this directory will be the eigenMT python script (eigenMT.py) as well as example input files for a test run of the script.

  • eigenMT runs as a stand-alone python script. It requires an installation of python (version 2.7 or higher). To download and install a python distribution, there are a few convenient options.
  • These bundled installations already include a number of modules for running eigenMT. The following are modules required for running eigenMT
  • Again, these modules should come pre-packaged in one of the bundled python installations above. If you decide to install these packages yourself, please see the listed websites for detailed instructions.

2. Input

  • An eigenMT run will have the following command format:

    python eigenMT.py --CHROM <chromosome ID> \
            --QTL <Matrix-eQTL SNP-gene tests file> \
            --GEN <Matrix-eQTL genotype matrix> \
            --GENPOS <Matrix-eQTL genotype position file> \
            --PHEPOS <Matrix-eQTL phenotype position file> \
            --var_thresh [variance explained threshold, default 0.99] \
            --cis_dist [distance threshold for detection of cis-eQTLs, default: the smaller of 1e6 and the max distance tested by Matrix-eQTL] \
            --OUT <output filename> \
            --window [window size, default 200] \
            --external [optional flag, no value; set it only if the genotype matrix was NOT the one used in the initial cis-eQTL calling]
  • The input fields are defined as follows.
    • CHROM
      • Chromosome ID. Indicates which chromosome the analysis will be performed on. Must match the ID used in the Matrix-eQTL SNP-gene tests file.
    • QTL
      • Filename for Matrix-eQTL SNP-gene tests file. The first two columns of this file will correspond to the variant and probe IDs, respectively. This file must also contain a column with the nominal P-values from the QTL tests. These three columns should be labeled in the header of the file with the following names: ‘snps’, ‘gene’, and ‘pvalue’, respectively. See qtls.txt for an example.
    • GEN
      • Genotype matrix in Matrix-eQTL format. Accepts either hard-called or dosage-based genotypes. Missing genotypes must be encoded as NA and will be imputed to the mean genotype during correction. See genotypes.txt for an example.
    • GENPOS
      • Matrix-eQTL genotype position file. See gen.positions.txt for an example.
    • PHEPOS
      • Matrix-eQTL phenotype position file. See phe.positions.txt for an example.
    • var_thresh
      • Threshold for the fraction of variance explained in the genotype correlation matrix. Default is 0.99 (99% of variance explained). Increasing this threshold will increase the estimated effective number of tests (M_eff) and decrease the accuracy of the approximation.
    • cis_dist
      • Distance threshold for calling cis-eQTLs. For example, this option could be used to restrict the discovery of significant cis-eQTLs to within 1MB of gene TSSs. Default is the smaller of 1e6 and the distance threshold used in the initial testing.
    • OUT
      • Output filename. Output format is described below.
    • window
      • Window size parameter. Determines the size of the disjoint windows into which each gene's genotype matrix is split. Default is 200 SNPs. We recommend a window size between 50 and 200 SNPs to balance accuracy and speed.
    • external
      • Flag indicating whether the genotype matrix provided was used in the initial cis-eQTL testing or is a separate genotype matrix. Without this flag, the program assumes the same genotype matrix was used for cis-eQTL testing and for multiple testing correction. If a separate genotype matrix is used, it should be representative of the population under study.
  • Descriptions of the Matrix-eQTL file formats can be found in the Matrix-eQTL documentation.
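  • The sketch below illustrates, under simplifying assumptions, how var_thresh, window, and mean imputation of missing genotypes can combine into a per-gene M_eff. Missing values are set to each SNP's mean, the gene's SNPs are cut into disjoint windows, each window's M_eff is the number of leading eigenvalues of its SNP correlation matrix needed to reach var_thresh, and the window values are summed. It uses numpy and the plain Pearson correlation matrix; eigenMT.py may estimate the correlation matrix differently, and the function names (impute_missing_to_mean, window_meff, gene_meff) are hypothetical, not part of eigenMT.

```python
import numpy as np


def impute_missing_to_mean(genotypes):
    """Replace NaN entries (Matrix-eQTL 'NA') with that SNP's mean genotype.

    genotypes: 2-D array-like, rows = SNPs, columns = samples.
    """
    g = np.array(genotypes, dtype=float)
    row_means = np.nanmean(g, axis=1)
    missing = np.isnan(g)
    # np.nonzero(missing)[0] gives the SNP (row) index of every missing entry
    g[missing] = row_means[np.nonzero(missing)[0]]
    return g


def window_meff(window, var_thresh=0.99):
    """Number of leading eigenvalues of the SNP-SNP correlation matrix
    needed to explain at least var_thresh of the total variance."""
    n_snps = window.shape[0]
    if n_snps == 1:
        return 1
    corr = np.corrcoef(window)  # assumption: plain Pearson correlation
    # eigvalsh returns ascending values; clip round-off negatives, then reverse
    eigvals = np.clip(np.linalg.eigvalsh(corr), 0.0, None)[::-1]
    explained = np.cumsum(eigvals) / eigvals.sum()
    return min(int(np.searchsorted(explained, var_thresh)) + 1, n_snps)


def gene_meff(genotypes, window=200, var_thresh=0.99):
    """Sum of per-window M_eff over disjoint windows of `window` SNPs."""
    g = impute_missing_to_mean(genotypes)
    # Assumption: drop monomorphic SNPs, whose correlations are undefined
    g = g[g.std(axis=1) > 0]
    return sum(window_meff(g[start:start + window], var_thresh)
               for start in range(0, g.shape[0], window))
```

  • Perfectly correlated SNPs collapse to a single effective test, while mutually uncorrelated SNPs each count as one; a smaller window can only split correlated blocks apart, which is why very small windows tend to overestimate M_eff.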

3. Output

  • The output file is in tab-separated format with the following columns:
    • Col 1: SNP ID
    • Col 2: GENE ID
    • Col 3: estimate of effect size BETA from Matrix-eQTL
    • Col 4: T-statistic from Matrix-eQTL
    • Col 5: p-value from Matrix-eQTL
    • Col 6: eigenMT corrected p-value
    • Col 7: estimated number of independent tests for the gene
  • Note: each tested gene will appear once in the output file with its most significant SNP and the eigenMT corrected p-value.
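  • A minimal sketch of the per-gene selection and correction, assuming the corrected p-value takes the Bonferroni form min(1, p * M_eff) with M_eff in place of the raw number of tests (an assumption; check eigenMT.py for the exact rule). It reads a tab-separated QTL file with the required 'snps', 'gene', and 'pvalue' header columns and keeps the most significant SNP per gene; top_snp_per_gene and corrected_pvalue are hypothetical names.

```python
import csv


def top_snp_per_gene(qtl_lines):
    """Map each gene to (snp, nominal p-value) of its most significant SNP.

    qtl_lines: iterable of tab-separated lines (e.g. an open qtls.txt)
    whose header includes the 'snps', 'gene' and 'pvalue' columns.
    """
    best = {}
    for row in csv.DictReader(qtl_lines, delimiter="\t"):
        gene, p = row["gene"], float(row["pvalue"])
        if gene not in best or p < best[gene][1]:
            best[gene] = (row["snps"], p)
    return best


def corrected_pvalue(nominal_p, m_eff):
    """Assumed Bonferroni-style correction with M_eff as the number of tests."""
    return min(1.0, nominal_p * m_eff)
```

  • On the example data, an open qtls.txt handle can be passed directly to top_snp_per_gene; corrected_pvalue is then applied with each gene's M_eff (Col 7 of the output).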

4. Example

  • We offer a small example for use of eigenMT. We provide a genotype matrix (genotypes.txt) and the corresponding genotype position file (gen.positions.txt) in Matrix-eQTL format, along with a phenotype position file (phe.positions.txt). This genotype matrix is for chromosome 19 for the EUR373 samples from the GEUVADIS study. We also provide a sample of the cis-eQTL tests performed for these samples using Matrix-eQTL in the file qtls.txt. This sample SNP-gene tests file covers 100 genes. The full SNP-gene tests file (qtls_full.txt) for chromosome 19 is also provided for replication of results from our paper. To run eigenMT on the example data, use the following command from the eigenMT directory:

    python eigenMT.py --CHROM 19 \
            --QTL qtls.txt \
            --GEN genotypes.txt \
            --GENPOS gen.positions.txt \
            --PHEPOS phe.positions.txt \
            --OUT exampleOut.txt
  • Note: the example takes roughly 15 minutes to run. Additionally, this example uses the default settings for window size (200 variants), variance explained (99%), and cis-distance threshold (1MB).

  • Finally, we provide an R script (compareToEmpirical.R) to visualize the example results against empirical p-values. Specifically, we provide a file (empiricalPvalues.txt) with the empirical p-values (10,000 permutations) estimated for all 1057 genes tested on chromosome 19. To perform the comparison for the example run, use the following command after generating the eigenMT results file exampleOut.txt:

    R CMD BATCH compareToEmpirical.R
  • This will produce a pdf file with 3 plots:
    • The first plot will show the untransformed empirical p-values (x-axis) compared to the untransformed eigenMT p-values (y-axis).
    • The second plot will show the same as the above on a -log10 scale.
    • The third plot will show the same as the second with the most extreme empirical p-values (empirical p-values < 1e-4) removed.
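  • For readers working outside R, the sketch below reproduces the data preparation behind the three plots in Python: pair the two p-values per gene, take -log10, and drop genes with empirical p-values below 1e-4. It is not compareToEmpirical.R; it assumes both sets of p-values have already been loaded into dictionaries keyed by gene ID (the layout of empiricalPvalues.txt is not described here), and the function names are hypothetical.

```python
import math


def paired_pvalues(eigenmt, empirical):
    """(empirical, eigenMT) p-value pairs for genes present in both dicts,
    sorted by gene ID; x-axis = empirical, y-axis = eigenMT (plot 1)."""
    genes = sorted(set(eigenmt) & set(empirical))
    return [(empirical[g], eigenmt[g]) for g in genes]


def neg_log10(pairs):
    """-log10 transform of both coordinates (plot 2); p-values must be > 0."""
    return [(-math.log10(x), -math.log10(y)) for x, y in pairs]


def drop_extreme_empirical(pairs, cutoff=1e-4):
    """Remove pairs whose empirical p-value is below the cutoff (plot 3)."""
    return [(x, y) for x, y in pairs if x >= cutoff]
```

  • The points for the third plot would be neg_log10(drop_extreme_empirical(paired_pvalues(eigenmt, empirical))).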

5. Example with user-specified cis-distance threshold

  • To increase the flexibility of cis-eQTL testing, we have included the option --cis_dist. With this option, users can first perform cis-eQTL testing at large distances from gene TSSs (or probe positions), say 10MB or more, and then test for significant cis-eQTLs in a much smaller window, say 100KB, by passing --cis_dist 1e5. If the requested distance is larger than the one used in initial testing, eigenMT will default to the threshold used in the initial testing. An example command is given below:

    python eigenMT.py --CHROM 19 \
            --QTL qtls.txt \
            --GEN genotypes.txt \
            --GENPOS gen.positions.txt \
            --PHEPOS phe.positions.txt \
            --OUT exampleOut.txt \
            --cis_dist 1e5
  • The QTL file was generated using a distance threshold of 1e6 (1MB). This command will perform multiple testing correction using our method for variants within 100KB of gene TSSs.
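  • The cis_dist behavior described in this section can be sketched as follows. The function names are hypothetical, each gene is represented by a single position (e.g. its TSS) taken from the phenotype position file, and the inclusive boundary (distance <= cis_dist) is an assumption rather than documented eigenMT behavior.

```python
def effective_cis_dist(requested, tested_max):
    """A requested --cis_dist larger than the initial testing distance
    falls back to the initial testing distance."""
    return min(requested, tested_max)


def filter_tests(tests, snp_pos, gene_pos, requested, tested_max):
    """Keep (snp, gene) tests whose SNP lies within the effective cis
    distance of the gene position (assumed inclusive boundary)."""
    dist = effective_cis_dist(requested, tested_max)
    return [(snp, gene) for snp, gene in tests
            if abs(snp_pos[snp] - gene_pos[gene]) <= dist]
```

  • For the command above, requested=1e5 and tested_max=1e6, so only SNP-gene tests within 100KB are kept; requesting 1e7 instead would fall back to 1e6.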

6. Population stratification and other covariates

  • We recommend first removing the effects of population stratification (genotype PCs, population or ancestry assignments) and other covariates (age, gender, PEER factors, etc.) from the expression matrix. We have shown that first removing these effects and then performing cis-eQTL calling on the inverse rank normalized residuals ensures the conservativeness and accuracy of eigenMT. We will soon provide an R script with example code for how to perform this normalization step prior to a run of eigenMT.
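  • Until that script is available, here is a minimal Python sketch of one way to perform this normalization; it is not the authors' procedure. Assumptions: expression is a genes x samples matrix, covariates is a samples x k matrix, covariates are removed by ordinary least squares with an intercept, and the inverse rank normal transform uses the offset (rank - 0.5) / n with tied values given average ranks (Blom's offset of 3/8 is another common choice). It requires numpy and Python 3.8+ for statistics.NormalDist; all function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def residualize(expression, covariates):
    """Regress covariates (plus an intercept) out of each gene by OLS.

    expression: genes x samples; covariates: samples x k (or 1-D for k=1).
    Returns a genes x samples matrix of residuals.
    """
    y = np.asarray(expression, dtype=float).T  # samples x genes
    x = np.column_stack([np.ones(y.shape[0]),
                         np.asarray(covariates, dtype=float)])
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    return (y - x @ beta).T


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def inverse_rank_normalize(row):
    """Map one gene's residuals to standard normal quantiles of their ranks."""
    n = len(row)
    nd = NormalDist()
    return [nd.inv_cdf((r - 0.5) / n) for r in average_ranks(list(row))]
```

  • The normalized matrix would be [inverse_rank_normalize(row) for row in residualize(expression, covariates)], which is then used as the phenotype input for Matrix-eQTL.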

7. Citation

  • When using eigenMT, please cite:

    * Davis, JR, Fresard, L, et al. eigenMT: efficient multiple hypothesis testing correction for eQTLs.