Integration of genetics and miRNA–target gene network identified disease biology implicated in tissue specificity

MicroRNAs (miRNAs) modulate the post-transcriptional regulation of target genes and are related to biology of complex human traits, but genetic landscape of miRNAs remains largely unknown. Given the strikingly tissue-specific miRNA expression profiles, we here expand a previous method to quantitatively evaluate enrichment of genome-wide association study (GWAS) signals on miRNA–target gene networks (MIGWAS) to further estimate tissue-specific enrichment. Our approach integrates tissue-specific expression profiles of miRNAs (∼1800 miRNAs in 179 cells) with GWAS to test whether polygenic signals enrich in miRNA–target gene networks and whether they fall within specific tissues. We applied MIGWAS to 49 GWASs ( n Total = 3 520 246), and successfully identified biologically relevant tissues. Further, MIGWAS could point miRNAs as candidate biomarkers of the trait. As an illustrative example, we performed differentially expressed miRNA analysis between rheumatoid arthritis (RA) patients and healthy controls ( n = 63). We identified novel biomarker miRNAs (e.g. hsa-miR-762) by integrating differentially expressed miRNAs with MIGWAS results for RA, as well as novel associated loci with significant genetic risk (rs56656810 at MIR762 at 16q11; n = 91 482, P = 3.6 × 10 −8 ). Our result highlighted that miRNA–target gene network contributes to human disease genetics in a cell type-specific manner, which could yield an efficient screening of miRNAs as promising biomarkers.

Complex human traits are products of orchestration of high dimensional multi-omics layers, including genome, epigenome, transcriptome, proteome and metabolome. Genome-wide association study (GWAS) has discovered thousands of genomic loci associated with these traits ( 1 , 2 ). Integration of such large-scale genetic data with multi-layered omics information has successfully identified cell-type specific and context-dependent regulatory mechanism of diseases ( 3 ). While previous trans-omics approach mostly focused on integration with transcriptome (e.g. RNA-seq) ( 4 ) and epigenome data (e.g. ChiP-seq and Hi-C) ( 3 , 5 ), innovative construction of the analytic pipeline to integrate additional omics layers has been warranted to further elucidate complex biology of the traits. Non-coding regions in the human genome constitute one of the unrevealed layers in biology. MicroRNAs (miRNAs), short non-coding RNA molecules of 21–25 nucleotide long, are key players in post-transcriptional gene regulation ( 6 , 7 ). Numerous studies have shown their critical role in the pathogenesis of various human diseases ( 8 , 9 ) and application of miRNAs as a biomarker or a therapeutic target is ongoing and promising ( 10 , 11 ). Nevertheless, it has been difficult to detect comprehensive association signals of miRNAs as compared with those of protein-coding genes and mRNAs, because genomic region that encodes miRNAs is relatively small. Biological roles of a miRNA should also be interpreted in a tissue specific context along with its target gene. On the arrival of recent high throughput sequencing technologies, comprehensive catalog of miRNA expression profile was created and revealed that the expression levels of miRNAs varied greatly according to tissues and were highly skewed ( 12 ). Harnessing this tremendous work, here we extended our method to quantitatively evaluate enrichment of GWAS polygenic signals on miRNA–target gene networks ( MIGWAS ; miRNA–target gene networks enrichment on GWAS ) that we have previously reported ( 13 ) to further decipher tissue-specific contribution of miRNA function in each trait. The MIGWAS enables us to study the tissue-specific landscape of post-transcriptional regulation, and to identify candidate miRNAs that are essential in pathophysiology. Our method can also conduct in silico screening of the miRNAs that can be used as novel biomarkers or therapeutic targets on the traits, which was empirically validated by the subsequent case-control analysis of differentially expressed miRNAs obtained from clinical subjects and the large-scale genetic association analysis of the lead variants.


Calculation of gene- and miRNA- P values from GWAS summary statistics
We converted GWAS SNP association signals into a gene- or miRNA- level P value (i.e. P Gene and P miRNA , respectively). Similarly to the method Segrè et al. used in MAGENTA software ( 14 ), the best P value of a set of SNP P values mapped onto each gene or miRNA was corrected for the confounding effects of physical and genetic properties of genes or miRNAs on the P value (Figure 1A ). We excluded genes and miRNAs located in the major histocompatibility complex (MHC) region to avoid the influence from its long linkage disequilibrium and complex architecture ( 15 ).

Curation of miRNA expression data and miRNA–target gene network information
Tissue-specific miRNA expression data from the FANTOM5 consortium ( 12 ) excluding candidate novel ones was downloaded from a web site (see URLs ). Expression count per million values of mature miRNAs derived from 1842 pre-miRNAs in 179 human cell types were quantile-normalized using preprocessCore v1.34.0 package in R software. These cell types were classified as 18 tissue type categories; bone ( n = 10), brain ( n = 14), cardiac ( n = 3), eye ( n = 3), fat ( n = 15), fetal ( n = 14), gastrointestinal (GI; n = 7), genitourinary (GU; n = 6), immune ( n = 22), joint ( n = 1), kidney ( n = 7), liver ( n = 4), lung ( n = 10), muscle ( n = 8), pancreas ( n = 1), skin ( n = 6), vascular ( n = 15) and others ( n = 33; Supplementary Table S1 ) based on the organs from which these cells were collected. The result of the principal component analysis describing the expression pattern of miRNAs in each cell is shown in Supplementary Figure S1 . Next, we calculated a TSI as Ludwig et al. previously described ( 16 ). In brief, the TSI for j th miRNA was calculated as \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}\begin{equation*}ts{i_j} = \frac{{\mathop \sum \nolimits_{i = 1}^N \left( {1 - {x_{j,i}}} \right)}}{{N - 1}},\end{equation*}\end{document} where N corresponds to the total number of cells measured and x j,i is the expression amount of i th cell normalized by the maximal expression of any cell for j th miRNA. In each cell type, we defined a set of highly and specifically expressed miRNAs that satisfy two conditions; (i) normalized expression value falls within the top 10 percentile of those obtained from all miRNAs and (ii) TSI > 0.7 (Figure 1B ). TSI > 0.7 threshold made it possible to include miRNAs with a wide range of expression levels and a variety of miRNAs, while retaining the power ( Supplementary Figure S2 ). This set of highly and specifically expressed miRNAs was used to partition miRNA’s heritability signal into various human cell types. We obtained four kinds of major target gene prediction algorithms; TargetScan Human, DIANA-TarBase, PITA, and miRDB (see URLs ; Supplementary Table S2 ). For each of the prediction algorithms, we made a prediction score matrix with a row corresponding to genes, and a column corresponding to miRNAs.

Quantifying the enrichment of miRNA–gene network to a trait in a specific cell type
In order to empirically evaluate cell type-specific enrichment of miRNA–target gene network to a trait, we extended MIGWAS, which we have previously described (Figure 1C ). Let \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}${{\boldsymbol{y}}_{A,{S_{i,j}}}}$\end{document} be the set of miRNA and gene pairs that satisfy the following four conditions: (i) target prediction score of the miRNA and gene is above j th threshold in i th prediction algorithm, (ii) the above mentioned P miRNA is below a nominal signal (α = 0.01), (iii) the P gene is also below α and (iv) the miRNA is included in a set of highly and specifically expressed miRNAs in cell A. \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}${\rm{n}}({{\boldsymbol{y}}_{A,{S_{i,j}}}})$\end{document} denotes the number of pairs included in the set \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}${{\boldsymbol{y}}_{A,{S_{i,j}}}}$\end{document} . We estimated the enrichment signal in cell A by comparing this metric \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}${\rm{n}}({{\boldsymbol{y}}_{A,{S_{i,j}}}})$\end{document} with the empirical null distribution using a permutation procedure (20 000 times). Under this null distribution, we assumed that there is no association among tissue-specific miRNA expression, miRNA–target gene network and GWAS signal. In each iteration step, we randomly shuffled P values of both all genes and all miRNAs while retaining the miRNA–target gene relationships, and calculated the metric of the k th permutation step as \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}${\rm{n}}({{\boldsymbol{y^\prime}}_{(k)A,{S_{i,j}}}})$\end{document} (i.e. the number of miRNA–gene pairs that met all the four condition (i)–(iv) with the shuffled P values). The significance \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}${P_{enrichment,\ A,{S_{i,j}}}}$\end{document} of the metric was defined as \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}$P( {{\rm{n}}( {{{\boldsymbol{y^\prime}}}} ) \ge {\rm{n}}( {{\boldsymbol{y}}} )} )$\end{document} . We then sequentially slid the threshold j of the i th prediction algorithm from top one percentile to 0.1 percentile with eight partitions. When integrating the statistics, we adopted the results obtained only if the mean of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}${\rm{n}}({{\boldsymbol{y^\prime}}_{(k)A,{S_{i,j}}}})$\end{document} is more than five because the estimation of the null distribution can be biased when \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}${\rm{n}}({\boldsymbol{y^\prime}})$\end{document} s are sparse. In order to compensate for the uncertainty existing within a single target prediction algorithm, we also integrated the results of four prediction algorithms to obtain one robust enrichment P value in cell A by meta-analyzing the enrichment significance as follows: \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}\begin{eqnarray*} && {P_{enrichment,A}} = \\ && \begin{array}{@{}c@{}} {\rm{\Phi }}\left( {\frac{1}{{\sqrt {{n_{algorithm}}} }}\mathop \sum \limits_{i\ = \ 1}^{\ {n_{algorithm}} = \ 4} \left( {\frac{1}{{\mathop \sum \nolimits_j {k_{{S_{i,j}}}}}}\mathop \sum \limits_{j\ = \ 1}^9 {k_{{S_{i,j}}}}{{\rm{\Phi }}^{ - 1}}\left( {{P_{enrichment,A,{S_{i,j}}}}} \right)} \right)} \right),\\ {\rm{where\ \ }}{k_{{S_{i,j}}}} = \left\{ {\begin{array}{@{}*{1}{c}@{}} {1,\ when\ mean\ y^\prime \ge 5}\\ {0,\ when\ mean\ y^\prime < 5} \end{array}} \right. \end{array} \end{eqnarray*}\end{document} Φ represents the cumulative distribution function of the normal distribution. Permutation P value < 0.05 was interpreted as statistically significant in all the following analyses. In addition to quantifying tissue specific enrichment signal, we recorded the pairs of a miRNA and a gene included in the set \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} }{}${{\boldsymbol{y}}_{A,{S_{i,j}}}}$\end{document} to be considered as tissue-specific trait associated pairs of a miRNA and a gene. We have released a python source code together with formatted miRNA expression materials to allow users to perform MIGWAS using any GWAS summary statistics on their own computers (see URLs ). A multithreading option using multiprocessing module in python is implemented to shorten the computing time (∼6.50 h when applied to summary statistics of one GWAS with ∼7 million variants for 179 cell types using four CPU cores with 2.3 GHz). Selection of the algorithms to predict miRNA–target genes, TSI thresholds, and permutation numbers can be optimized by the users.

GWAS summary statistic data sets
We collected 49 GWAS summary statistics ( Supplementary Table S3 ). Twenty-one of them were provided from public websites or collaborators. They include diseases consisting of four major categories (immune/allergy-related [ n = 4], neuropsychiatric [ n = 2], cardiovascular [ n = 1] and genitourinary [ n = 1]) and quantitative traits consisting of five major categories (anthropometric [ n = 3], metabolic [ n = 3], musculoskeletal [ n = 1], cardiovascular [ n = 2], kidney-related [ n = 2] and hematological [ n = 2]). The remaining 28 summary statistics are obtained from ongoing BioBank Japan project ( 17 ), >170 000 genome and health-related phenotype biobank of Japanese. These include diseases consisting of nine major categories (immune/allergy-related [ n = 4], metabolic [ n = 1], musculoskeletal [ n = 2], neuropsychiatric [ n = 1], eye-related [ n = 2], cardiovascular [ n = 5], lung-related [ n = 1], liver-related [ n = 2], genitourinary [ n = 2], and malignancy [ n = 7]), and quantitative traits consisting of 2 anthropometric traits (adult height, and body mass index [BMI]). Definition of the diseases and the process of patient registration are described elsewhere ( 17 , 18 ). From these summary statistics, SNP positions based on the UCSC hg19 reference and their association P values from linear regression (quantitative trait) or logistic regression (binary trait) are collected and re-formatted.

Study populations for high throughput miRNA expression sequencing
In this case-control study, Japanese patients with early rheumatoid arthritis (RA) were enrolled together with healthy Japanese volunteers. The diagnosis of RA was based on 2010 ACR/EULAR criteria ( 19 ). Written informed consent was obtained prior to the enrollment from all the individuals. In total, 30 RA patients and 33 healthy control (HC) participants were enrolled. Detailed information on the participants is in Supplementary Table S4 . This study was approved by the ethical committee of Osaka University.

MiRNA expression sequencing and differential expression analysis in RA patients
Whole blood from the subject was collected into an ethylenediaminetetraacetic acid (EDTA) tube. PBMCs were isolated using Ficoll–Paque density gradient medium. Total RNA from PBMCs was isolated using miRNeasy Micro Kit (Qiagen). Libraries for miRNA-seq were prepared using SMARTer smRNA-Seq Kit (Takara) following manufacturer instructions. RNA-seq was performed using a HiSeq 2500 (Illumina, read length of 100 bp, single-end). For QC of the sequencing data, adaptor sequences were trimmed using Cutadapt, and reads with low quality score (Phred quality score < 20 in > 80% of total bases) and reads with length >50 bp were removed. All the samples have total read counts > 1.0 × 10 6 , and thus proceeded to further analysis. Read count information is summarized in Supplementary Table S4 . Reads were aligned against the known miRNA sequences in miRBase database, using Bowtie with recommended options described in the literature ( 20 ). Differential miRNA expression analysis between RA and HC was carried out with the R package DESeq2.

Overlap enrichment analysis of differentially expressed miRNAs with the MIGWAS result
To test whether differentially expressed miRNAs between RA patients and HC significantly overlap with candidate RA-associated miRNAs obtained from the MIGWAS, we performed a permutation procedure. We randomly shuffled sample IDs of the miRNA expression table, and defined differentially expressed miRNAs (FDR-q < 0.05). The number of overlapped miRNAs included both in differentially expressed miRNAs and in the MIGWAS candidates was recorded in each permutation step. After 5000 permutation steps, empirical P value was calculated as the number of permutation steps where the number of overlapped miRNAs was equal to or exceeded that from real dataset divided by the total number of permutation steps ( = 5000 iterations).

The in silico replication study of the biomarker miRNA loci with RA risk
For each genetic locus that harbors a biomarker miRNA identified by the MIGWAS of RA and the differentially expressed miRNA analysis of the RA patients, we selected the lead SNP with the most significant association within the locus from the original trans-ethnic RA meta-analysis (19 234 RA cases and 61 565 controls) ( 21 ). By looking-up the additional two RA GWAS of Japanese (3308 RA cases and 8357 controls; Supplementary Table S5 ) ( 22 ), we conducted the in silico replication study of the lead SNPs with RA risk. Meta-analysis of the GWAS and replication study was conducted by an inverse-variance method assuming a fixed-effects model.

Integrative analysis with eQTL summary data and summary-level Mendelian Randomization (SMR) analysis
We hypothesized that miRNAs identified by the MIGWAS pipeline exert their effects on the trait by regulating the transcripts of their target genes, and that the target genes’ expression level should also have a dose dependent association with the trait. To test this hypothesis, we undertook two ways of analysis using publicly available eQTL summary data. First, we tested whether target genes of the identified miRNA are enriched in eGenes within whole blood when compared with those across all tissues using summary eQTL data from GTEx consortium ( 23 ). eGenes in each tissue were defined as genes that harbor cis -eQTL variants which associate with their expression level with FDR-q < 0.05. The enrichment was assessed by the binomial test, with the prior probability being the number of eGenes within whole blood divided by the number of eGenes across all tissues that were included within the scope of eQTL analysis, GWAS, and miRNA’s target prediction. Second, we performed SMR to show that the expression level of target genes that were identified through the above-mentioned analysis associates with the trait by causality or pleiotropy, and not by linkage. By integrating trans-ethnic GWAS summary statistics of RA ( 21 ) with CAGE eQTL data of peripheral blood ( 24 ), we did SMR analysis for 41 target genes of hsa-miR-762. We considered P HEIDI ≥ 0.05 as a threshold that we could not reject the null hypothesis that there was a single causal variant affecting both gene expression and trait variation ( 25 ).


Overview of our MIGWAS statistical methods
The overview of our MIGWAS method is shown in Figure 1 . The principal hypothesis of MIGWAS is that if the miRNA is associated with a genetic risk of a trait, the target genes of the miRNA are also associated with the trait as a miRNA functionally works by interacting with its target genes. To test this hypothesis, we first converted SNP-level association statistics in the GWAS into gene- and miRNA-level association P values (i.e. P Gene and P miRNA , respectively; Figure 1A ). Then we quantitatively evaluated enrichment of pairwise association signals of miRNAs and their target genes. We note that we set the significance threshold of P Gene and P miRNA (α = 0.01) less stringent than the typical genome-wide significant level ( 26 ), to enhance sensitivity and to comprehensively incorporate the polygenic nature of complex traits ( 27 , 28 ). In order to quantify tissue-specific contribution of miRNAs in each trait, we next partitioned enrichment of miRNA–target gene signals into each cell type separately. After normalization of miRNA expression data released by the FANTOM5 consortium ( 12 ), we defined a set of highly and tissue-specifically expressed miRNAs in a total of 179 human cells (Figure 1B ). We performed cell type-based partitioning analysis within the defined set of miRNAs of each of the 179 cell types (Figure 1C ), and thus obtained cell-type specific enrichment P value through permutation-based parallel computing. In addition our MIGWAS pipeline automatically identifies a set of trait-associated miRNA–target gene pairs for both specific tissue categories and in all tissue categories combined together.

Heterogeneous enrichment of miRNA–target gene network to human complex traits
We applied the extended MIGWAS pipeline to the 49 GWAS summary statistics to evaluate the overall contribution of miRNA–target gene networks to various complex human traits, firstly without considering tissue-specificity. The GWASs covered a wide range of traits including anthropometric ( n = 4), immune/allergy ( n = 8), metabolic ( n = 5), musculoskeletal ( n = 2), neuropsychiatric ( n = 3), eye-related ( n = 2), cardiovascular ( n = 8), lung-related ( n = 1), kidney-related ( n = 2), liver-related ( n = 2), genitourinary ( n = 3), hematological ( n = 2) and malignancy traits ( n = 7). The detailed information of the GWASs is shown in Supplementary Table S3 and the Materials and Methods section. We observed nominally significant ( P < 0.05) miRNA–target gene enrichment signal in three traits for five GWASs; height, RA and type 2 diabetes (Figure 2A ), which was consistent and robust with our previous results despite of the update on miRNA–target gene prediction algorithms ( 13 ). We note that we observed trans-ethnically consistent results for height and type 2 diabetes ( P < 0.05 in both populations), two traits for which independent summary statistics were obtained from European and Japanese populations separately ( 29 , 30 ), which empirically validates the robustness of the MIGWAS results.

Tissue specific MIGWAS results successfully identified the disease-relevant tissue and associated miRNAs
Motivated by the robustness of our MIGWAS pipeline supported by trans-ethnic consistency, next we performed the partitioned enrichment analysis using the tissue specific miRNA expression data. Each trait was analyzed considering 179 different human cell types in parallel, and we annotated each cell into 18 tissue categories from which it was collected from ( Supplementary Table S1 ). We here highlighted enrichment signal to immune-related cells in each trait as an example (Figure 2B ). Tissue specificity of the traits in immune-related cells showed the different spectrum of enrichment with the most significant enrichment in Graves’ disease ( P = 2.6 × 10 −6 ). Relatively high enrichment signals were also found in type 2 diabetes as well ( P < 5.9 × 10 −5 in both populations), where obesity-associated chronic tissue inflammation is reported to play a key contributing role ( 31 ). Figure 2C shows the strongest associated cell in each trait. These results successfully illustrated the diverse tissue-specific nature of miRNA–target gene contribution to a trait, which was not obvious through the tissue-naïve approach. In many cases the analysis identified biologically relevant cell types in each trait (e.g. immune cells in Graves’ disease, chronic hepatitis C, systemic lupus erythematosus, Crohn's disease and asthma, adipocytes in LDL cholesterol and coronary artery disease, brain tissue in age at menarche ( 32 )). Detailed cell names of top association in all the traits are described in Supplementary Table S6 . Considering the abundance of accumulated literature on miRNA’s associations in immune- and allergy- related traits ( 33 ), we highlighted detailed enrichment signals in immune- and allergy- related traits (Figure 3 ). Here again, we observed disease relevant cell types in each trait. In RA, miRNA–target gene contribution was enriched in lung, bone, and immune cells ( P = 3.6 × 10 −3 , 9.6 × 10 −3 and 1.5 × 10 −2 , respectively), which is consistent with biological understanding of the disease that pathogenic autoimmunity is first triggered in the lungs faced with environmental stimuli such as cigarette smoking and then causes bone erosion ( 34 ). We note that these disease-relevant tissues have not been identified by any previous methods to integrate with mRNA transcriptome and epigenome data despite their pivotal roles in RA pathophysiology ( 3 , 4 , 5 ), thereby demonstrating validity of our method to incorporate tissue-specific miRNA profiles. All the cell names with significant enrichment in immune-related traits are described in Supplementary Table S6 .

Differential expression of miRNAs among rheumatoid arthritis patients enriches in MIGWAS result and pinpoints novel causal mechanisms
Through the MIGWAS pipeline, we also obtained a list of candidate trait-associated miRNAs and target genes as candidate biomarkers, which was defined systematically to satisfy the criteria used in the enrichment analysis (i.e. both miRNA and its predicted target gene are associated with the trait with a nominal significance [ P value < 0.01]) with all tissue categories combined together. Given the previous studies showing the association of miRNAs in the biology of RA ( 13 , 35 , 36 ), we decided to focus on RA as a target of in vivo validation. To validate candidate novel miRNAs of which association with RA was identified by MIGWAS, we profiled miRNA expression in peripheral blood mononuclear cells (PBMCs) from 30 patients with early RA and 33 control subjects using high throughput miRNA sequencing. PBMCs were selected for the miRNA expression profiling because they have been shown to harbor much of heritability in the pathology of RA ( 3 , 32 ), and also because they can be collected in the clinical setting. We found 94 differentially expressed miRNAs with a false discovery rate (FDR) of 0.05 (Figure 4A ). Of these, four miRNAs (hsa-miR-93-5p, hsa-miR-106b-5p, hsa-miR-301b-3p and hsa-miR-762) overlapped those identified by the MIGWAS pipeline applied for the RA GWAS. Permutation procedures in which sample IDs of the miRNA expression data were randomly shuffled 5000 times revealed that the observed overlap of as many as four miRNAs was much larger than would be expected by chance (98.0-fold enrichment [four miRNAs divided by the mean simulated overlap, 0.0408], permutation P value = 0.0010 [five simulation steps with equal or more overlap than the observed overlap divided by total simulation steps], Materials and Methods for permutation procedure; Table 1 and Supplementary Figure S4 ). One of the overlapped miRNAs, hsa-miR-762, was significantly highly expressed in RA patients (log 2 fold change = 1.15 and FDR-q = 0.043) and has a prominent expression specific to immune-related cells (tissue specificity index [TSI] = 0.982, Figure 4B and C ). This would implicate the utility of hsa-miR-762 as a future target of clinical validation for a biomarker of RA. On the other hand, the other three miRNAs showed repressed expressions in the RA patients with ubiquitous expression profiles among the cell types (TSI < 0.40). While genetic variants located nearby the identified biomarker miRNAs confer suggestive associations in the original RA case-control GWAS (19 234 RA cases and 61 565 controls, P GWAS ≥ 3.2 × 10 −7 ) ( 21 ), overlap with the RA case-control miRNA-seq analysis strongly prioritizes true-positive associations of the lead variants in such loci. Thus, we conducted an in silico replication study using additional 3308 RA cases and 8357 controls ( Supplementary Table S5 ) ( 22 ). Of these, two loci that harbor the three miRNAs satisfied the genome-wide significance threshold when GWAS and the replication study was meta-analyzed (22 119 cases and 69 363 controls; P META_GWAS = 3.3 × 10 −8 for rs34130487 at MIR95-MIR106B at 7q22 and P = 3.6 × 10 −8 for rs56656810 at MIR762 at 16q11; Figure 4D , E , Table 2 , and Supplementary Table S5 ), of which RA associations were novel findings. We further listed target genes of these miRNAs (Table 1 ). As for hsa-miR-762, 41 genes were identified as potential target genes that work synergistically to cause RA (Figure 4D ). To investigate whether those target genes have an eQTL effect in immune-related cells where hsa-miR-762 is supposed to be functioning, we referred to the summary eQTL statistics in human tissues released by GTEx consortium ( 23 ). We observed that target genes of hsa-miR-762 harbor cis -eQTL variants within whole blood as a proxy of immune-related cells more frequently than expected when compared to those in all available human tissues (1.46-fold enrichment with binomial P value = 0.014). This consistently suggests that miRNAs should modulate the association with the trait through interfering the transcriptome, and that the targeted gene's association with the trait should be driven by regulation on expression within a specific trait-associated tissue. In order to clarify the connection between eQTL signals of the variants which hsa-miR-762’s target genes harbor and their neighboring association signals in RA GWAS, we performed the summary data-based Mendelian randomization (SMR) ( 25 ). This analysis can test whether the expression level of the target genes associates with RA by causality or pleiotropy, and not by linkage. In many of the target genes of hsa-miR-762, we could finemap the potentially causal or pleiotropic eQTL variants that not only existed nearby GWAS genes but also both mediated gene-expression levels and were associated with RA risk ( Supplementary Table S7 for the summary statistics). As an illustrative example, genetic variants at SYNGR1 locus, which was one of the target genes of hsa-miR-762 and showed the strongest association statistics in the SMR analysis, are associated with RA risk (upper right panel at Figure 4E ). In expression data from Consortium for the Architecture of Gene Expression (CAGE) in PBMCs ( 24 ), SYNGR1 locus harbors cis -eQTL variants (lower right panel at Figure 4E ). The SMR analysis reveals that GWAS signal of the SNP (rs2069235, P GWAS = 7.6 × 10 −13 , P eQTL = 1.3 × 10 −38 ) in SYNGR1 is strongly and significantly mediated by eQTL effect ( P SMR = 3.5 × 10 −10 ), which was not driven by linkage ( P HEIDI = 0.27). The MIGWAS pipeline and miRNA expression profiles from clinical subjects successfully pinpointed the causal mechanism of MIR762 ’s association with RA, where increased expression of hsa-miR-762 in immune-related cells interferes with transcripts such as SYNGR1 , whose expression levels are associated with causing RA.

By integrating the large scale GWAS data and miRNA–target gene prediction algorithms together with comprehensive tissue-specific miRNA expression profiles, here we successfully extended a method MIGWAS to evaluate enrichment of GWAS signals on miRNA–target gene networks and to partition them into a tissue specific context. The application of the MIGWAS pipeline to a wide range of genetics of complex human traits depicted cell type-specificity of the trait biology, as well as identification of biomarker miRNAs and novel genetic risk loci such as hsa-miR-762 encoded in the MIR762 locus at 16q11 for RA. The implementation of genetic information on miRNAs will certainly serve to help investigators elucidate pathophysiology and forward clinical application, as well as further development of novel therapeutic targets. Our MIGWAS pipeline highlights four innovative features: (i) quantitatively assessing the miRNA–target gene contribution to genetics of a trait, (ii) deciphering tissue specificity to identify causal tissues on a trait, (iii) efficiently prioritizing genetic loci for a follow-up study that harbor true-positive associations with disease risk and (iv) providing candidate biomarker miRNA–target gene lists that might contribute to the understanding of disease biology. Our work firmly supports the idea that the miRNA’s contribution to genetics of complex human traits occurs in a cell type-specific manner in the same way as gene expression ( 4 ) and epigenetic regulation ( 3 , 32 ). A list of candidate miRNA–target gene pairs provides an ideal resource for experimental validation aiming at the discovery of biomarkers or potential therapeutics. The MIGWAS enables us to pinpoint a causal miRNA–target gene network to a trait, which was not achieved by GWAS alone or previous approaches to integrate GWAS and epigenome or transcriptome data. We acknowledge that our study has several limitations. First, tissue specific MIGWAS analysis was performed based on an expression data from healthy individuals, while miRNA expression profile might be different depending on the disease status of individuals. We consider that a compromising approach to use summary expression data from healthy individuals as ours has been shown to successfully capture enrichment of the trait heritability to biologically-relevant tissue ( 37 ). However, to have further insights into the disease biology, future application of expression profiles from pathologically altered tissues or inclusion of specifically depleted miRNAs should be warranted. Second, the in silico prediction of a miRNA and a target gene confers ambiguity dependent on the algorithms. In order to obtain unbiased prediction scores and to assure the robustness of the analysis, we integrated multiple different algorithms. When compared with a recently updated database miRTarBase ( 38 ), which provides the largest amount of miRNA–target gene interactions with experimental validations to date, our approach of multiple thresholding and multiple algorithms indeed contributed to the better functional prediction in vitro ( Supplementary Figure S3 ). Further integration of such experiment-based target gene information into the MIGWAS pipeline would be warranted. Third, since our method is based on the GWAS results, enrichment signal could be inflated due to the inflated GWAS summary statistics. To address this point, we took a permutation approach to estimate the null distribution, which made the MIGWAS results robust against the inflation of the GWAS itself ( 13 ). Forth, we did not incorporate tissue-specific expression profile of genes in our method. The interaction of both miRNAs’ and their target genes’ expression levels is beyond this study because it suffers from exponential combinations and computational intensiveness. Nevertheless, future integration would contribute to deciphering fine nature of tissue specificity. In conclusion, MIGWAS demonstrated that miRNA–target gene networks contribute to human disease genetics in the context of cell type-specific expressions, and successfully identified miRNAs as promising biomarkers.

The URLs for data presented herein are as follows: MIGWAS source code, FANTOM5 consortium, The BioBank Japan Project (BBJ), miRBase, miRDB, TargetScan Human, DIANA-TarBase, PITA, GTEx consortium, CAGE eQTL data, R DESeq2 package, R preprocessCore package, SMR software, CIRCOS, Cutadapt, Bowtie,

Supplementary Material

Supplementary Data are available at NAR Online.

Japan Society for the Promotion of Science (JSPS) KAKENHI [15H05670, 15H05911, 15K14429, 16KT0196]; Japan Agency for Medical Research and Development [17ek0410047h0001, 18gm6010001h0003, 18ek0410041h0002]; Takeda Science Foundation; Nakajima Foundation; SECOM Science and Technology Foundation, Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Bioinformatics Initiative of Osaka University Graduate School of Medicine, and Inoue Foundation for Science. FANTOM5 was made possible by a Research Grant for RIKEN Omics Science Center from the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT) (to Y.H.); MEXT for the RIKEN Preventive Medicine and Diagnosis Innovation Program (to Y.H.); Grant of the Innovative Cell Biology by Innovative Technology (Cell Innovation Program) from the MEXT, Japan (to Y.H.); Grant from MEXT to the RIKEN Center for Life Science Technologies and Grant from MEXT to RIKEN Center for Integrative Medical Sciences; J.H. is an employee of TEIJIN PHARMA LIMITED; N.S. is an employee of Ono Pharmaceutical Co., Ltd. Funding for open access charge: Japan Agency for Medical Research and Development [18ek0410041h0002]. Conflict of interest statement . None declared.

Saori Sakaue, Jun Hirata, Yuichi Maeda, Eiryo Kawakami, Takuro Nii, Toshihiro Kishikawa, Kazuyoshi Ishigaki, Chikashi Terao, Ken Suzuki, Masato Akiyama, Naomasa Suita, Tatsuo Masuda, Kotaro Ogawa, Kenichi Yamamoto, Yukihiko Saeki, Masato Matsushita, Maiko Yoshimura, Hidetoshi Matsuoka, Katsunori Ikari, Atsuo Taniguchi, Hisashi Yamanaka, Hideya Kawaji, Timo Lassmann, Masayoshi Itoh, Hiroyuki Yoshitomi, Hiromu Ito, Koichiro Ohmura, Alistair R R Forrest, Yoshihide Hayashizaki, Piero Carninci, Atsushi Kumanogoh, Yoichiro Kamatani, Michiel de Hoon, Kazuhiko Yamamoto, Yukinori Okada
Nucleic Acids Research
Oxford University Press
Figure 1
Figure 2
Figure 3