Chromatin Looping Links Target Genes with Genetic Risk Loci for Dermatological Traits

Chromatin looping between regulatory elements and gene promoters presents a potential mechanism whereby disease risk variants affect their target genes. In this study, we use H3K27ac HiChIP, a method for assaying the active chromatin interactome in two cell lines: keratinocytes and skin lymphoma–derived CD8+ T cells. We integrate public datasets for a lymphoblastoid cell line and primary CD4+ T cells and identify gene targets at risk loci for skin-related disorders. Interacting genes enrich for pathways of known importance in each trait, such as cytokine response (psoriatic arthritis and psoriasis) and replicative senescence (melanoma). We show examples of how our analysis can inform changes in the current understanding of multiple psoriasis-associated risk loci. For example, the variant rs10794648, which is generally assigned to IFNLR1, was linked to GRHL3, a gene essential in skin repair and development, in our dataset. Our findings, therefore, indicate a renewed importance of skin-related factors in the risk of disease.


INTRODUCTION
GWASs have uncovered the genetic factors that contribute to disease risk for many complex disorders, including dermatological conditions such as psoriasis (Ps)  or atopic dermatitis (AD) (Paternoster et al., 2015). It is now accepted that the majority of these genetic risk factors do not influence coding sequences directly but rather regulatory regions such as enhancers and promoters, which can be highly cell-type specific (Ernst et al., 2011;Farh et al., 2015; Roadmap Epigenomics Consortium et al., 2015). Many studies have also shown that the effect of these variants is not necessarily mediated by the closest gene because they can affect distal genes through chromatin looping mechanisms ( GTEx Consortium, 2013;GTEx Consortium et al., 2017;Javierre et al., 2016;Nica and Dermitzakis, 2013;Rao et al., 2014;Võ sa et al., 2018 1 ).
Recently, there has been a growing interest in using chromatin conformation and other functional genomics techniques to investigate how these disease-associated loci lead to disease and identify affected genes. To obtain the resolution necessary to identify enhancer-promoter interactions, several techniques have been developed that couple Hi-C methods with various enrichment strategies to increase resolution and reduce sequencing cost compared with traditional Hi-C. For example, HiChIP achieves this by using chromatin immunoprecipitation to capture the genomic regions associated with a specific histone modification or protein of interest after Hi-C library preparation (Mumbach et al., , 2016, whereas capture Hi-C uses oligonucleotide baits to capture genomic regions of interest (Dryden et al., 2014). Previous studies have used these techniques to identify causal genes at GWAS loci (Cairns et al., 2016;Dryden et al., 2014;Jäger et al., 2015;Martin et al., 2016Martin et al., , 2015McGovern et al., 2016;Mumbach et al., 2017), uncovering many important mechanisms and pathways involved in various diseases. These studies have, however, mainly focused on cells derived from blood and immune cells.
Multiple publications have shown that chromatin interactions are cell-type specific and are altered during differentiation and stimulation (Burren et al., 2017;Dixon et al., 2012;Hansen et al., 2018;Mumbach et al., 2017;Rao et al., 2014;Rubin et al., 2017;Schmitt et al., 2016;Siersbaek et al., 2017). Although for many autoimmune diseases associated risk variants primarily affect immune cells, other cell types might also be involved in the disease development (Farh et al., 2015;Mahajan et al., 2018;Mizoguchi et al., 2018). Autoimmune dermatological traits such as Ps, psoriatic arthritis (PsA), and systemic sclerosis are systemic conditions with heterogeneous effects in multiple cell types and are likely to involve a complex interplay between skin-resident cells and immune cells (Albanesi et al., 2018;Mccoy et al., 2017).
Ps is recognized as an immune-mediated condition, and as such, the roles of immune cells in disease have received great attention. However, more recently, there has been a rejuvenated interest in the role of keratinocytes (KCs), which are the most predominant cells in the epidermis and are highly dysregulated in the disease in terms of proliferation and differentiation (Ni and Lai, 2020). KCs respond to T-cell signals by producing proinflammatory cytokines that further contribute to T-cell activation (Albanesi et al., 2018;Benhadou et al., 2018;Lorscheid et al., 2019). A prevalent Tcell-released signal in inflamed tissue is IFN-g. In Ps, IFN-g is released in the classical T helper type 1 pathway (Lowes et al., 2008) and accumulates in psoriatic lesions (Schlaak et al., 1994;Uyemura et al., 1993), promoting epidermal KC apoptosis (Hijnen et al., 2013).
Another recurrent feature in inflammatory skin diseases is the invasion of CD8þ T cells in the inflammation site (Antohe et al. 2019;Cai et al., 2012;Hennino et al., 2007;O'reilly et al., 2012). In Ps, these cells release IL-17, an important factor in disease pathology (Ortega et al., 2009).
In this study, we use HiChIP to map active chromatin interactions genome wide on spontaneously immortalized KCs (HaCaT), unstimulated or stimulated with IFN-g, and a CD8þ T cell line derived from a cancerous skin plaque (MyLa). In contrast to capture Hi-C, this technique specifically enriches for active regions of the genome. Moreover, it provides interactions genome wide, allowing us to discover candidate genes for a larger set of disorders and to include more recently identified loci in our analysis. We use these new datasets to study disease-associated loci for Ps, PsA, AD, melanoma, and systemic sclerosis and to identify potentially associated genes. Our analysis pipeline is available at https:// github.com/ChenfuShi/keratinocyte_gene_link and allows the possibility of using custom input variants (such as nongenome-wide significant loci or credible SNP sets).
We complement our dataset with matched RNA sequencing and Hi-C. Because the disorders studied have a significant immune component, we include in our analysis publicly available HiChIP data for naïve CD4þ T cells and a B-cell-like lymphoblastoid cell line  reprocessed with the latest computational tools to investigate the GWAS-associated loci, which might be mediated by these cell populations.
We show that the interacting genes enrich for pathways that are highly relevant to the underlying disease mechanisms. Our results can also inform the association of different genes to specific loci, and we provide examples of this for four distinct Ps loci. Therefore, our findings update our view on the underlying disease mechanisms of these traits, facilitating future studies and drug discovery.

A compendium of activity and chromatin interactions in KCs
Chromatin interactions specific for active regulatory elements, such as enhancers and promoters, were identified using HiChIP targeting the active histone mark H3K27ac. Interactions and peaks are highly cell-type specific, as shown by clustering and principal component analysis (Figure 1a and b and Supplementary Figure S1a and b). We noted some batch effects between replicates that caused an imbalance in the number of detected HiChIP loops between replicates (HaCaT cells) and observed that more variance was explained by batch than by IFN-g stimulation in HaCaT cells in the principal component analysis plots. Nevertheless, there is substantial overlap in the interaction called between replicates (Supplementary Figure S2 and Supplementary Table S1). As expected, H3K27ac peaks were strongly enriched in GWAS SNPs compared with the background in our data. Moreover, dermatological conditions such as the ones studied in this study possess a stronger enrichment in KCs than other diseases (Supplementary Figure S3).
We identified >50,000 significant interactions from each of our HiChIP datasets (summary statistics in Supplementary Tables S1 and S2). The median interaction distance was 250 kilobase (kb), which is consistent with the results derived from the public datasets ( Figure 1c and Supplementary Figure S4a). Moreover, the vast majority (90%) of interactions connected the regions within topologically associating domains identified from our matched Hi-C datasets ( Figure 1d). This is consistent with recent reports about topologically associating domains determining the scope of gene regulation (Rao et al., 2014;Rowley and Corces, 2018). We compared this new dataset with our previous study that used capture Hi-C in the same cell lines and targeted GWAS loci from a number of autoimmune disorders (Ray-Jones et al., 2020). We found significant overlap between capture Hi-C and HiChIP, although HiChIP identifies significantly more genes that are specifically active in these cell types (Figure 1c  RNA sequencing for IFN-g-stimulated HaCaT cells revealed 535 differentially expressed (447 upregulated and 88 downregulated) genes that were enriched for pathways related to IFN-g stimulation (Supplementary Figure S5c). This stimulation has a significant impact on the number of interactions that originate from these genes: genes overexpressed on stimulation have 7.08 interactions per gene in stimulated cells compared with 4.48 in unstimulated cells (Supplementary Figure S5d) (Â1.58 in these genes compared with Â1.18 overall interactions). Although it seemed that these interactions linked the regions that have increased H3K27 acetylation (mean: Â1.3, median: Â1.64, in stimulated cells compared with that in unstimulated cells), the majority of these linked peaks already have significant levels

Chromatin contacts identified by HiChIP pinpoint functionally relevant interactions
We first wanted to test whether HiChIP can successfully discover important features involved in gene regulation. To do this, we used the 535 differentially expressed genes that are activated on IFN-g stimulation and assumed that these genes are regulated through enhancers binding transcription factors (TFs) related to IFN-g response. As expected, H3K27ac peaks linked to these genes through HiChIP interactions in HaCaT were enriched for TF motifs that are known to be involved in the response to IFN-g stimulation and/or viral infection (Ramezani et al., 2018;Schroder et al., 2004) and contained similar motifs to those identified in peaks that had different H3K27ac signal in the stimulated condition (Supplementary Figure S5f). H3K27ac peaks linked through HiChIP interactions to upregulated genes specifically in stimulated cells enriched significantly for IRF2, ISRE, and IRF1 motifs compared with the H3K27ac peaks linked to these genes in both conditions (Supplementary Table S3).
Next, we wanted to assess how HiChIP datasets can recapitulate results from large expression quantitative trait loci (eQTL) studies in the discovery of genes dysregulated by GWAS variants. Using Ps loci, we tested our results from naïve T cells and GM12878 cells with the largest blood eQTL database available, eQTLgen (Võsa et al., 2018 1 ), as ground truth. Our HiChIP analysis showed a 52% recall rate and 31% specificity compared with that of the genes identified from eQTLs. Using genes that were linked in both HiChIP replicates, we obtained a 46% recall rate and 40% specificity. For comparison, Javierre et al. (2016) reported a recall rate of 25.7% with lead eQTLs.
We then compared our HaCaT KC datasets with the Genotype-Tissue Expression dataset from sun-exposed skin (GTEx Consortium, 2013), resulting in a recall rate of 56% and a specificity of 14.7% (38% and 20%, respectively, with genes reproduced from both replicates). In this study, the Genotype-Tissue Expression dataset contains relatively few eQTLs compared with the eQTLgen database owing to the limited sample size, reducing specificity in our analysis. By comparison, our previous capture Hi-C dataset generated a 21% recall rate and 11% specificity. Interestingly, 38 of 52 eQTLs linked to Ps loci identified in the skin in Genotype-Tissue Expression are recapitulated in the eQTLgen dataset. These eQTL datasets were generated from slightly different cell populations compared with those analyzed in this study. We have attempted using eQTL datasets generated from pure cell populations (for CD4þ and CD8þ T cells) such as the Kasela et al. (2017) dataset and the Database of Immune Cell Expression eQTL dataset (Schmiedel et al., 2018), but these studies produced very few hits within the studied loci owing to their limited sample size. Regardless, we obtained similar recall rates as reported earlier (22-66%).
Using chromatin conformation and functional genomics to dissect disease-associated loci We next sought to use our dataset to identify the genes whose transcription start sites either overlap or are linked by chromatin interaction to GWAS SNPs associated with PsA, Ps, AD, melanoma, and systemic sclerosis. We identified between 84 and 399 potentially linked genes for each disease studied in our combined datasets (Table 1), with a large fraction of these being recoverable using both replicates for each cell type (Supplementary Figure S6). Importantly, these genes were strongly enriched for disease-relevant pathways. For example, genes linked to melanoma loci were enriched for replicative senescence and cell-cycle pathways, whereas genes linked to Ps and PsA were linked to cytokine signaling and IL-23 (Figure 2), which is a pathway targeted by multiple novel treatments (Sakkas et al., 2019).
Although some genes were common in all the cell lines (e.g., 12% in Ps), most genes were specific to one or a few cell types (Supplementary Figure S7). Moreover, we found that most loci also implicate multiple genes, with an average of 7.1 genes per loci in Ps. Interestingly, the number of genes implicated by our method correlated significantly with the number of genes that were linked by eQTLs for the same loci (P ¼ 4.61e-13) (Supplementary Figure S8). This could be partly caused by the size of the linkage disequilibrium block of the loci, in which the lead SNP implicates a set of coinherited SNPs that can have effects on the expression of many genes.
Using these data, we can begin to provide some functional insight into the mechanisms mediating disease susceptibility. We found that for a significant number of loci, the closest protein-coding genes were not implicated, despite it being possible for linkage disequilibrium blocks to have multiple closest genes. For example, of the 44 Ps loci tested with a linkage disequilibrium block smaller than 100 kb, we linked all the closest genes for only 25 loci. For 14 loci, we did not find any evidence of the closest genes being regulated by the GWAS SNPs in any of the cell lines tested, and for 5 loci, we linked some but not all of the closest genes. Similar numbers can be seen in all the traits studied (Supplementary Table S4). We also identified some loci that do not have any proteincoding genes within 50 kb. We identified 12 of these loci across the diseases studied and provided putative linked genes for all of them in Supplementary File S1.
Looking at a locus, particularly the ETS1 locus, Ps and AD have distinct associations. However, whereas the Ps locus directly overlaps the promoter of ETS1, the association with AD is located 130 kb downstream of the gene. We observed significant chromatin interactions between the AD locus and the ETS1 promoter in naïve T, Myla, and GM12878 cells ( Supplementary Figures S9 and S10). Our data suggest a putative mechanism in which the distinct disease associations at this locus are mediated by a single gene. We follow up with the description of other examples in Ps in the next section.
HiChIP identifies genes associated with Ps loci in a celltype-specific manner The results from our analysis can be applied to augment the existing understanding of disease-associated loci and to link new genes or change the ones currently linked. In this study, we focus on four Ps-associated loci that show cell-typespecific and, to our knowledge, previously unreported interactions.
The Ps locus indexed by SNP rs73178598 is located in an intergenic region overlapping an antisense RNA, SATB1-AS1. We found a 240 kb T-cell-specific interaction present in naïve T and MyLa cells linking this locus with the promoter of SATB1 in all replicates ( Figure 3). Interestingly, for this region, there is an interaction that is present in naïve T cells and Myla but not in KCs or GM12878 with differences visible in Hi-C maps as well (Supplementary Figure S11). Silencing of SATB1 has been shown to have a similar effect to IFN-g stimulation on major histocompatibility complex chromatin organization (Kumar et al., 2007) and is an important regulator of regulatory T cells and autoimmunity (Beyer et al., 2011).
The Ps loci indexed by SNP rs9504361 is intronic to EXOC2 and is typically associated with this gene. However, H3K27ac occupancy at the locus is specific to MyLa cells in our analysis. We detected long-range interactions between this SNP and the promoters of IRF4 (all replicates) and DUSP22 (one replicate only), specifically in this cell type ( Supplementary Figures S12 and S13). IRF4 is a lymphocytespecific TF central to the activation of innate and adaptive immune systems (Huber and Lohoff, 2014), and it was found to be overexpressed in psoriatic skin lesions (Ni et al., 2012). DUSP22 is a phosphatase that might be involved in the c-Jun N-terminal kinase signaling pathway and has been shown to Another example of a gene target, to our knowledge previously unreported, can be found at the 1p36 locus (rs10794648), which to date has been associated with IFNLR1 (also known as IL28-RA) because it is the closest gene to the associated SNPs (Genetic Analysis of Psoriasis Consortium & the Wellcome Trust Case Control Consortium 2, et al., 2010;Stuart et al., 2015). IFNLR1 encodes part of a receptor for IFN-g that is presented in the epidermis and is  thought to promote an antiviral response in Ps (Lazear et al., 2015). However, our HiChIP data showed long-range interactions between the Ps SNPs at 1p36 and the distal gene GRHL3 primarily in HaCaT KCs (all replicates in unstimulated cells and combined only in IFN-g stimulated cells) (Figure 4a and Supplementary Figures S14 and S15). We further explored the chromatin profile in this region using available chromatin immunoprecipitation sequencing data for enhancer-associated marks (H3K27ac and H3K4me1) and a promoter-associated mark (H3K4me3) in progenitor, differentiating and migrating primary KCs (Klein et al., 2017). We found that GRHL3 overlaps H3K27ac and H3K4me3 marks in all conditions, indicating that the promoter is active through differentiation (Supplementary Figure S16a). Moreover, we found that the Ps-associated SNPs overlap H3K4me1 in migrating and H3K27ac in migrating and progenitor cells but not H3K4me3 in any condition (Supplementary Figure S16b). These findings suggest that the SNPs overlap an enhancer that is present in migrating and progenitor cells but are active in progenitor cells only. GRHL3 encodes a TF that stimulates migration. It is upregulated in psoriatic lesions and is required for repair of the epidermal skin barrier after immune-mediated injury (Gordon et al., 2014). Improper regulation of GRHL3 in progenitor cells could adversely affect the migration of KCs, which is known to be highly accelerated in Ps, leading to the formation of the classic skin plaques. Targets of the GRHL3 TF include further GWAS-implicated genes such as IVL (involved in KC differentiation) (Watt, 1983) and KLF4 (TF involved in KC differentiation and skin barrier formation). KLF4 was recently implicated as the likely functional target gene in the 9q31 locus in our previous study (Ray-Jones et al., 2020). Finally, the locus indexed by the Ps SNP rs73183592, to our knowledge, previously undescribed mechanism was linked through a long-range interaction spanning about 500 kb to FOXO1 (Figure 4b and Supplementary Figure S17), a gene with important functions in regulatory T cells. Interestingly, this interaction was identified primarily in KCs in our analysis (all replicates in unstimulated and one replicate only in IFN-g stimulated) but was only weakly supported in T cells because this enhancer seems to be specific to KCs. Dysregulation of this pathway was also found to be important in the development of Ps Zhang and Zhang, 2019).

DISCUSSION
Chromatin conformation and functional genomics studies have the potential to uncover the underlying mechanisms that drive the disease susceptibility of many complex traits. Although these techniques are very promising, there has been a lack of studies in disease-relevant cell types, and as recently evidenced, both chromatin interactions and gene regulation are cell type and stimulation specific (Burren et al., 2017;Dixon et al., 2012;Hansen et al., 2018;Mumbach et al., 2017;Rao et al., 2014;Rubin et al., 2017;Schmitt et al., 2016;Siersbaek et al., 2017).
In this study, we used H3K27ac HiChIP, a modern technique that allows combined analysis of both chromatin conformation and chromatin activity, to create a global study of promoter-enhancer interactions in KC and CD8þ T-cell lines. Equivalent data have so far only been generated in    immune cells with few examples in other cell populations, such as HiChIP in endometrial cancer cells (O'Mara et al., 2019) and promoter capture Hi-C in neuronal cells (Song et al., 2019), cardiomyocytes (Choy et al., 2018), and pancreatic islets (Miguel-Escalada et al., 2019). After assessing the effectiveness of these techniques in linking functional elements with genes in a cell-type-specific manner, we show their possible use in studying diseaseassociated loci. We explore disease-associated SNPs for PsA, Ps, AD, melanoma, and systemic sclerosis and identify all the genes that are linked by chromatin interactions to these variants. We show that these genes are enriched for disease-relevant pathways and provide tables and figures for all the loci (Supplementary File S1 and GitHub repository). We show how these data, using four distinct Ps-associated loci as examples, allow us to identify previously unreported mechanisms and provide functional insight into the diseases studied.
The HiChIP gene targets have the potential to be used as therapeutic targets in drug repurposing and discovery, as recently applied for other diseases (Fang et al., 2019;Martin et al., 2019). As a proof of concept, we tested the genes identified in this work by querying the DrugBank database (Wishart et al., 2008) in search of drugs that are currently used and the ones that could be repurposed. Across the diseases studied, we identified 127 genes that are targeted by approved drugs in some diseases, corresponding to 231 drugs that could be potentially repurposed (Supplementary File S2).
A limitation of this study is that it is based on immortalized cell lines. In the future, these studies should include primary tissues and compare healthy with diseased samples. This has so far been done in a limited way, but early results are showing a significant impact of disease state and genetics on chromatin conformation (Delaneau et al., 2019;Fasolino et al., 2020;Gorkin et al., 2019;Kloetgen et al., 2020). Moreover, although the techniques used in this study are an important strategy to identify potential candidate genes, they require further individual experimental validation (such as genetic perturbation techniques) before they can be used for drug targeting.
In summary, through our analysis, we present a list of potential target genes and pathways for mediating disease risk for five complex diseases, and by documenting individual loci, we highlight some mechanisms by which this risk is mediated. These genes and mechanisms represent a useful resource for further research aimed at characterizing how genetic variation has an impact on disease susceptibility for these and other complex diseases.

MATERIALS AND METHODS
For detailed methods, see the Supplementary Materials and Methods.

HiChIP experiments
HiChIP libraries were generated according to the Chang Lab protocol (Mumbach et al., 2016). Briefly, 10 million cross-linked cells were lysed, the chromatin was digested with MboI, biotinylated, and ligated. After random shearing, chromatin immunoprecipitation was performed against H3K27ac (ab4729, Abcam, Cambridge, United Kingdom). Decrosslinking and biotin pulldown preceded tagmentation with Tn5 transposase (Illumina, San Diego, CA) at 55 C for 10 minutes. For more details, see the Supplementary Materials and Methods.

Linking GWAS loci to putative gene targets
To identify the genes that were linked to disease-associated SNPs, we first identified the transcription start sites of all protein-coding transcripts in the hg38 Gencode V29 annotation (Harrow et al., 2012). We associated all transcripts for which a transcription start site was located within 5 kb of a loop. We also associated the transcripts for which the transcription start site was within 1 kb of an SNP overlapping an H3K27ac peak as identified from HiChIP data. All transcripts were then grouped by gene, and the genes were filtered by an expression level of at least 1 transcript per million in the corresponding cell line.

Data availability statement
The sequence datasets generated and analyzed during this study are available in the Gene Expression Omnibus repository under accession numbers GSE137906 (for capture Hi-C only) and GSE151193 (for HiChIP, RNA sequencing, and Hi-C with updated processed files).
The code required to reproduce the analysis in this study is available on GitHub. Figures with Hi-C and HiChIP loops for every locus for all the diseases studied in this research are also available in the same repository (https://github.com/ChenfuShi/keratinocyte_ gene_link).