2 Data
The sclerostin pQTLs and GWAS results are processed using the scripts sourced below. The linkage disequilibrium (LD) matrix of genetic variants in the SOST region (100 kb downstream and 125 kb upstream¹ of SOST) is computed using data from 1000 Genomes (1000 Genomes Project Consortium et al. 2015). All of the genetic data have been aligned to build 37 (hg19) coordinates.
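"Upstream" and "downstream" are usually defined relative to a gene's direction of transcription, so the coordinates of such a window depend on the gene's strand. The sketch below assumes that strand-relative reading. gene_window is a hypothetical helper, not one of the project's scripts, and the coordinates in the example are made up rather than those of SOST.

```r
# Hypothetical helper (not part of the project's scripts): build a window around
# a gene, with upstream/downstream taken relative to the gene's strand.
gene_window <- function(start, end, strand, upstream = 125e3, downstream = 100e3) {
  stopifnot(start <= end, strand %in% c("+", "-"))
  if (strand == "+") {
    # Plus strand: upstream lies before the gene start (lower coordinates).
    c(from = start - upstream, to = end + downstream)
  } else {
    # Minus strand: upstream lies after the gene end (higher coordinates).
    c(from = start - downstream, to = end + upstream)
  }
}

# Made-up coordinates for illustration only: from = 900000, to = 1130000.
gene_window(1000000, 1005000, strand = "-")
```

On the plus strand the same call would instead extend 125 kb below the start and 100 kb above the end.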
2.1 Datasets
2.1.1 Sclerostin pQTLs
The cis sclerostin pQTLs in Table 2.1 were extracted from Supplementary Table 2 in Zheng et al. (2023).
README
- rsid: rsID
- chr: chromosome
- pos: position (build 37)
- ea: effect allele
- oa: other allele
- eaf: effect allele frequency
- beta: effect size
- se: standard error of effect size
- pvalue: p-value
- n: number of samples
2.1.2 Study information
The GWAS datasets used in the analyses² are presented in Table 2.2.³
README
- id: dataset ID
- source: source of dataset
- pmid: PubMed ID
- author: author of study
- link: link to dataset
- trait: phenotype
- abbr: abbreviation
- ancestry: ancestry of study
- n: number of samples
- n_cases: number of cases
- n_controls: number of controls
- unit: unit of analysis (IRNT = inverse rank normal transformation)
- flag: flag indicating whether the dataset (or an equivalent) was used by Zheng et al. (2023) (Y = yes, N = no)
Citations: Xue et al. (2018); Mahajan et al. (2022); Hartiala et al. (2021); Van Der Harst and Verweij (2018); Aragam et al. (2022); Sudlow et al. (2015); Malik et al. (2018); Mishra et al. (2022); Malhotra et al. (2019); Morris et al. (2019)
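With IRNT, each trait value is replaced by the standard normal quantile of its rank, so effect sizes are expressed per standard deviation of the transformed trait. The sketch below uses the Blom offset of 3/8, one common choice. Which offset each GWAS actually used is not stated here, so treat it as an assumption. The function name irnt is hypothetical.

```r
# Hypothetical helper: rank-based inverse normal transformation (IRNT).
# Uses the Blom offset (3/8) by default; NA values are kept as NA.
irnt <- function(x, offset = 3 / 8) {
  out <- rep(NA_real_, length(x))
  ok <- !is.na(x)
  n <- sum(ok)
  r <- rank(x[ok], ties.method = "average")            # ranks among observed values
  out[ok] <- qnorm((r - offset) / (n - 2 * offset + 1)) # ranks -> normal quantiles
  out
}

# Toy values: the largest maps to a positive score, the median to 0, NA stays NA.
irnt(c(5, 1, 3, NA))
```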
2.2 Data processing
2.2.1 LD matrix
The LD correlation matrix of the SOST region (100 kb downstream and 125 kb upstream of SOST) for the European samples from 1000 Genomes Phase 3 V5b (1000 Genomes Project Consortium et al. 2015) is computed using the 01_data_ldmat.R R script located in the scripts folder.
source("./scripts/01_data_ldmat.R")
The output of the script is an Rda file saved in the data folder as 01_data_ld_mat.Rda, which contains two objects:
- ld_snp: a data.frame of variant information with the columns:
  - rsid: rsID
  - chr: chromosome
  - pos: position (build 37)
  - ref: reference allele
  - alt: alternate allele
  - af: allele frequency of the alternate allele
- ld_mat: a matrix of variant correlations (\(r\), not \(r^2\))
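The two objects can be illustrated with simulated genotypes. ld_mat holds the signed Pearson correlation between alternate-allele counts, and ld_snp holds the alternate-allele frequency. This is a minimal sketch: reading the 1000 Genomes VCFs and selecting the European samples, as 01_data_ldmat.R does, is not shown, and the toy variant names are invented.

```r
# Toy example: rows = samples, columns = variants, entries = alternate-allele
# counts (0, 1, 2). Genotypes and rsIDs are simulated, not from 1000 Genomes.
set.seed(42)
G <- matrix(rbinom(500 * 3, size = 2, prob = 0.3), nrow = 500,
            dimnames = list(NULL, c("rsA", "rsB", "rsC")))
G[, "rsB"] <- 2 - G[, "rsA"]   # rsB carries the alt allele whenever rsA does not

ld_snp_toy <- data.frame(rsid = colnames(G), af = colMeans(G) / 2)
ld_mat_toy <- cor(G)            # signed r, as stored in ld_mat
r2_toy <- ld_mat_toy^2          # r^2, e.g. for proxy thresholds
```

Here rsA and rsB have \(r = -1\) but \(r^2 = 1\): keeping the sign is what lets effect alleles be related back to the alternate alleles in ld_snp.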
2.2.2 GWAS data
SOST region
The SOST region is extracted from the GWAS datasets and harmonized using the 02_data_gwas_region.R R script located in the scripts folder. The pQTL dataset is also harmonized using this script. The effect allele is aligned to the alternate allele in ld_snp.
source("./scripts/02_data_gwas_region.R")
The output of the script is an Rda file saved in the data folder as 02_data_gwas_sost_region.Rda, which contains three objects:
- gwas: a data.frame of GWAS results from the studies above with the columns:
  - rsid: rsID
  - chr: chromosome
  - pos: position (build 37)
  - ref: reference allele
  - alt: alternate allele (effect allele)
  - af: allele frequency of the alternate allele
  - beta: effect size
  - se: standard error
  - pvalue: p-value
- pqtls: a data.frame of pQTL results with the columns:
  - rsid: rsID
  - chr: chromosome
  - pos: position (build 37)
  - ref: reference allele
  - alt: alternate allele (effect allele)
  - af: allele frequency of the alternate allele
  - beta: effect size
  - se: standard error
  - pvalue: p-value
- studies: a data.frame of GWAS study information with the columns:
  - id: dataset ID
  - source: source of dataset
  - pmid: PubMed ID
  - author: author of study
  - trait: phenotype
  - abbr: abbreviation
  - ancestry: ancestry of study
  - n: number of samples
  - n_cases: number of cases
  - n_controls: number of controls
  - unit: unit of analysis (IRNT = inverse rank normal transformation)
  - flag: flag indicating whether the dataset (or an equivalent) was used by Zheng et al. (2023) (Y = yes, N = no)
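Aligning the effect allele to the alternate allele in ld_snp amounts to flipping the sign of beta, and replacing the effect allele frequency by one minus itself, whenever the GWAS effect allele is the reference allele. The sketch below assumes input columns named like those of the pQTL table (rsid, ea, oa, eaf, beta, se). harmonise_to_alt is hypothetical and is not the implementation in 02_data_gwas_region.R.

```r
# Hypothetical helper (not the project's script): align summary statistics with
# columns rsid, ea, oa, eaf, beta, se to the alt allele of an LD reference.
harmonise_to_alt <- function(gwas, ref_panel) {
  m <- merge(gwas, ref_panel[, c("rsid", "ref", "alt")], by = "rsid")
  a <- lapply(m[c("ea", "oa", "ref", "alt")], as.character)
  same    <- a$ea == a$alt & a$oa == a$ref   # effect allele already = alt
  flipped <- a$ea == a$ref & a$oa == a$alt   # effect allele = ref: flip
  m$beta[flipped] <- -m$beta[flipped]        # effect now refers to alt
  m$eaf[flipped]  <- 1 - m$eaf[flipped]      # frequency now refers to alt
  keep <- same | flipped                     # drop allele mismatches
  data.frame(rsid = m$rsid[keep], ref = a$ref[keep], alt = a$alt[keep],
             af = m$eaf[keep], beta = m$beta[keep], se = m$se[keep],
             stringsAsFactors = FALSE)
}

# Made-up example: rs1 is aligned, rs2 needs a flip, rs3 does not match.
gwas_toy <- data.frame(rsid = c("rs1", "rs2", "rs3"), ea = c("A", "G", "T"),
                       oa = c("G", "A", "C"), eaf = c(0.2, 0.7, 0.5),
                       beta = c(0.1, 0.3, -0.2), se = c(0.01, 0.02, 0.03),
                       stringsAsFactors = FALSE)
ref_toy <- data.frame(rsid = c("rs1", "rs2", "rs3"), ref = c("G", "G", "A"),
                      alt = c("A", "A", "G"), stringsAsFactors = FALSE)
h <- harmonise_to_alt(gwas_toy, ref_toy)
```

Strand flips (for example, A/G reported as T/C) are simply dropped here. Palindromic A/T or C/G variants would need separate handling, typically based on allele frequencies, which this sketch omits.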
pQTLs
The GWAS results for the sclerostin pQTLs are extracted from 02_data_gwas_sost_region.Rda using the 03_data_gwas_pqtls.R R script located in the scripts folder. The associations of rs1107747 and rs4793023 with HDL cholesterol and triglycerides are adjusted for rs72836567 using conditional and joint (COJO) analysis (Yang et al. 2012) to account for the effects of the CD300LG gene (see Section 3.2 for further details). No suitable proxy variants (\(r^2 \geq 0.8\)) were identified for the sclerostin pQTLs missing from GCST006867.
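The adjustment rests on the least-squares identity behind COJO's joint analysis: given marginal effects \(b\), allele frequencies \(p\) and the LD correlation matrix \(R\), the joint (mutually adjusted) effects are approximately \(W^{-1} D b\), where \(D = \mathrm{diag}(2p_j(1 - p_j))\) and \(W_{jk} = \sqrt{D_j D_k}\, r_{jk}\). The sketch below assumes equal sample sizes across variants and Hardy-Weinberg equilibrium. It omits GCTA's further adjustments and the standard errors, so it illustrates the idea rather than reproducing the reported estimates. joint_effects is a hypothetical name.

```r
# Sketch of the least-squares identity underlying COJO's joint model.
# b: marginal per-allele effects, p: effect-allele frequencies, R: LD matrix (r).
# Assumes equal GWAS sample size for all variants, so n cancels out.
joint_effects <- function(b, p, R) {
  v <- 2 * p * (1 - p)               # genotype variances under HWE (the D terms)
  W <- R * outer(sqrt(v), sqrt(v))   # approximate genotype covariance matrix
  drop(solve(W, v * b))              # joint (mutually adjusted) effects
}

# Made-up example: two variants in moderate LD (r = 0.5).
R_toy <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
joint_effects(b = c(0.3, 0.2), p = c(0.25, 0.25), R = R_toy)  # about 0.267, 0.067
```

In this setting, b would hold the marginal effects of a pQTL such as rs1107747 and of rs72836567 on the same trait, p their alternate-allele frequencies and R the corresponding 2 × 2 block of ld_mat. The first element of the result is then the pQTL effect adjusted for rs72836567.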
source("./scripts/03_data_gwas_pqtls.R")
The output of the script is an Rda file saved in the data folder as 03_data_gwas_pqtls.Rda, which contains three objects:
- gwas: a data.frame of GWAS results from the studies above with the columns:
  - rsid: rsID
  - chr: chromosome
  - pos: position (build 37)
  - ref: reference allele
  - alt: alternate allele (effect allele)
  - af: allele frequency of the alternate allele
  - beta: effect size
  - se: standard error
  - pvalue: p-value
- pqtls: a data.frame of pQTL results with the columns:
  - rsid: rsID
  - chr: chromosome
  - pos: position (build 37)
  - ref: reference allele
  - alt: alternate allele (effect allele)
  - af: allele frequency of the alternate allele
  - beta: effect size
  - se: standard error
  - pvalue: p-value
- studies: a data.frame of GWAS study information with the columns:
  - id: dataset ID
  - source: source of dataset
  - pmid: PubMed ID
  - author: author of study
  - trait: phenotype
  - abbr: abbreviation
  - ancestry: ancestry of study
  - n: number of samples
  - n_cases: number of cases
  - n_controls: number of controls
  - unit: unit of analysis (IRNT = inverse rank normal transformation)
  - flag: flag indicating whether the dataset (or an equivalent) was used by Zheng et al. (2023) (Y = yes, N = no)
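The proxy search for pQTLs missing from a GWAS can be expressed as a lookup in ld_mat: keep the variants that are present in that GWAS and have \(r^2 \geq 0.8\) with the missing pQTL. The sketch below assumes that ld_mat carries rsIDs as row and column names. find_proxies and the toy matrix are hypothetical.

```r
# Hypothetical helper: candidate proxies for `target` from a signed LD matrix
# whose row/column names are rsIDs, restricted to variants present in the GWAS.
find_proxies <- function(ld_mat, target, available, threshold = 0.8) {
  empty <- data.frame(rsid = character(), r = numeric(), r2 = numeric())
  if (!target %in% rownames(ld_mat)) return(empty)
  r <- ld_mat[target, ]
  r <- r[names(r) != target & names(r) %in% available & !is.na(r)]
  r <- r[r^2 >= threshold]
  r <- r[order(r^2, decreasing = TRUE)]   # strongest proxy first
  data.frame(rsid = names(r), r = unname(r), r2 = unname(r^2),
             stringsAsFactors = FALSE)
}

# Made-up LD matrix for four variants.
ids <- paste0("rs", 1:4)
ld_toy <- matrix(c( 1.00,  0.95, -0.90, 0.50,
                    0.95,  1.00, -0.85, 0.40,
                   -0.90, -0.85,  1.00, 0.30,
                    0.50,  0.40,  0.30, 1.00),
                 nrow = 4, dimnames = list(ids, ids))
px <- find_proxies(ld_toy, "rs1", available = c("rs2", "rs3", "rs4"))
# rs2 (r = 0.95) and rs3 (r = -0.90) pass; rs4 (r^2 = 0.25) does not.
```

Because ld_mat stores signed \(r\), a proxy with negative \(r\) (rs3 in the toy example) has its alternate allele tracking the pQTL's reference allele, so its beta would need a sign flip before substitution.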
¹ An extra 25 kb was added upstream to ensure that all relevant CD300LG variants are included in the region.
² The ischemic and cardioembolic stroke GWAS results from METASTROKE (Malik et al. 2016) used by Zheng et al. (2023) were replaced with those from MEGASTROKE (Malik et al. 2018), and the UK Biobank hypertension GWAS results from OpenGWAS used by Zheng et al. (2023) were replaced with those from Pan-UKBB, due to licensing restrictions. The GWAS of coronary artery calcification (Kavousi et al. 2022) was not available (either publicly or via application) at the time of this analysis.
³ Because Zheng et al. (2023) used trans-ethnic GWAS results, we did the same. In all of the trans-ethnic analyses, the majority of samples were of European ancestry. Restricting the analyses to European-ancestry samples where possible made no material difference to the results (Section 8.1).