We quantile normalized, log2 transformed, and then centered and scaled the expression levels of each individual gene to a mean of zero and a standard deviation of one. We observed that 42 of these 44 genes show significantly higher expression (Student's t-test P < 0.001) compared to the other 13 cell types. NS: not significant.

However, neutrophil percentage is not significantly associated with gender.

We tested the stability of our neutrophil percentage prediction in the EGCUT dataset (n = 825). From the list of 100 probes showing the highest correlation with neutrophil percentage, we randomly selected a number of probes (increments of 5 probes, 1,000 permutations per increment) and repeated the neutrophil percentage prediction. When including > 10 probes, the neutrophil prediction displays a stable correlation with the actual neutrophil percentage (Spearman R ~0.75) and a near-perfect correlation with the predicted neutrophil percentage used in the meta-analysis (Spearman R ~0.99). Error bars denote standard deviation. The red line denotes the number of gene expression probes that the different cohorts in this study used to estimate neutrophil percentage.

We correlated the actual gene expression levels with age in the EGCUT dataset (n = 825), using both the log2 transformed and quantile normalized data and the data additionally corrected for 40 principal components. We observed a low but significant correlation between age and gene expression in the log2 transformed and quantile normalized data, which becomes non-significant after correcting the gene expression data for 40 principal components (the data used to determine the neutrophil interaction effect). However, gene expression levels are not significantly associated with gender.

The interaction model we used does not take heteroscedasticity into account. Therefore, we determined standard errors using the 'sandwich' package in R, which allows for the estimation of robust standard errors.
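The robust standard error estimation mentioned above can be illustrated with a minimal Python sketch that compares the classical standard error of a regression slope with a heteroscedasticity-consistent (HC0) sandwich estimate. This is not the analysis code of the study, which used the 'sandwich' package in R on the interaction model; the function name `slope_standard_errors`, the reduction to a single predictor, the choice of the HC0 variant, and the toy data are all assumptions made for illustration. The source does not state which sandwich variant was applied.

```python
import math


def slope_standard_errors(x, y):
    """Fit y = a + b*x by ordinary least squares.

    Returns (slope, classical SE of the slope, HC0 robust SE of the slope).
    Hypothetical helper for illustration; requires at least 3 samples.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    dx = [xi - x_mean for xi in x]
    sxx = sum(d * d for d in dx)

    slope = sum(d * (yi - y_mean) for d, yi in zip(dx, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Classical SE: assumes one shared residual variance for all samples.
    sigma2 = sum(e * e for e in residuals) / (n - 2)
    se_classical = math.sqrt(sigma2 / sxx)

    # HC0 sandwich SE: each sample contributes its own squared residual,
    # so unequal residual variance (heteroscedasticity) is allowed.
    meat = sum((d * e) ** 2 for d, e in zip(dx, residuals))
    se_robust = math.sqrt(meat / sxx ** 2)

    return slope, se_classical, se_robust


# Toy data (assumed): the deviation from the line grows with x.
x = list(range(1, 11))
y = [2.0 * xi + 0.3 * xi * (-1) ** xi for xi in x]
slope, se_classical, se_robust = slope_standard_errors(x, y)
print(slope, se_classical, se_robust)
```

Comparing the two standard errors, or the Z-scores derived from them, across many tests is one way to check whether ignoring heteroscedasticity changes the conclusions.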
We observed a strong correlation between the standard errors, Z-scores, and p-values produced by our model and those from a model that applies robust estimation of standard errors in the EGCUT and Fehrmann datasets.

Principal component 1 (PC1) and principal component 2 (PC2) per study. Samples with a correlation < 0.9 with PC1 (red) were excluded from analysis.

The gene expression data that were used for the interaction meta-analysis were corrected for up to 40 principal components. To retain genetic variance in the gene expression data, components that showed a significant correlation with genotypes were not removed. In the EGCUT dataset (n = 825), many of these components also strongly correlate with neutrophil percentage and inferred neutrophil percentage. However, the majority of the variance in gene expression explained by these components was removed from this dataset.

List of 58 Illumina HT12v3 probes used for calculating the estimated neutrophil percentage principal component score, and their correlation with neutrophil percentage in the EGCUT dataset (n = 825).

Summary statistics for the interaction analysis.

Results of the interaction analysis.

Summary statistics showing the effect size (correlation coefficient) in each of the tested replication datasets.

Results of the neutrophil mediated.

