- •Genes encoding highly abundant secreted proteins define adult gland types
- •Gland-specific activity of transcriptional regulators contributes to proteome diversity
- •Differential retention of fetal genes drives functional diversity in adult glands
- •Cellular heterogeneity underlies gland-specific protein secretions
Salivary proteins are essential for maintaining health in the oral cavity and proximal digestive tract, and they serve as potential diagnostic markers for monitoring human health and disease. However, their precise organ origins remain unclear. Through transcriptomic analysis of major adult and fetal salivary glands and integration with the saliva proteome, the blood plasma proteome, and transcriptomes of 28+ organs, we link human saliva proteins to their source, identify salivary-gland-specific genes, and uncover fetal- and adult-specific gene repertoires. Our results also provide insights into the degree of gene retention during gland maturation and suggest that functional diversity among adult gland types is driven by specific dosage combinations of hundreds of transcriptional regulators rather than by a few gland-specific factors. Finally, we demonstrate the heterogeneity of the human acinar cell lineage. Our results pave the way for future investigations into glandular biology and pathology, as well as saliva’s use as a diagnostic fluid.
- cell-line heterogeneity
- regulatory architecture
- developmental regulation
- salivary proteins
- salivary biomarkers
- salivary mucins
- gene regulation
Saliva is the quintessential gatekeeper at the entry to the gastrointestinal tract (
). It is a complex biofluid and exerts a multitude of important functions in the oral cavity and beyond that depend upon its repertoire of proteins. These functions include breakdown of dietary starch by the salivary enzyme amylase, provision of calcium phosphate to maintain mineralization of tooth enamel, and host defense against pathogenic microorganisms (
) while maintaining a beneficial commensal microbiome in the mouth (
). Saliva also possesses physicochemical properties keeping the oral cavity moist, well lubricated, and forming a barrier against environmental and microbial insult, functions that are equally provided by saliva proteins, especially mucins (
). Thus, variation in the saliva proteome will have important biomedical consequences (
). At the extreme, malfunctioning of the salivary glands because of, for instance, radiation treatment of head and neck cancer or the relatively common autoimmune disease Sjögren’s syndrome, results in severe complications in oral health that debilitate patient quality of life (
). Therefore, understanding how the composition of the saliva proteome is attained and regulated remains an important avenue of inquiry.
A major complication in studying saliva and harnessing its proteome profile for diagnostic applications is the complexity of this oral biofluid, because it is a mixture of components derived from multiple sources. Saliva is predominantly synthesized and secreted by three major pairs of anatomically and histologically distinct craniofacial secretory organs: the parotid, submandibular, and sublingual salivary glands (Figure 1A). Each of these gland types produces a characteristic spectrum of salivary proteins that are thought to be predominantly based on their composition of mucous and serous acinar cells. These intrinsic proteins sustain most major functions of the saliva. In addition, saliva contains extrinsic proteins that originate in other organs and systems, including the bloodstream and cells lining the oral integuments (
). A multitude of studies have been conducted to catalog salivary proteins and distinguish intrinsic from extrinsic protein components (
), including those that specifically investigated ductal secretions (
). These studies include proteomic analyses comparing whole saliva or saliva collected from the ducts of the parotid or submandibular/sublingual glands to body fluids such as plasma, urine, cerebrospinal fluid, and amniotic fluid (
). However, discrepancies among published datasets, likely due to variation in collection procedures, sample integrity, storage conditions, sample size, and analytical methods (
), have impaired the establishment of a robust catalog of saliva-typic proteins and their salivary gland origin—an outcome that has so far severely hampered the use of saliva as a physiological and pathophysiological research tool and as a reliable fluid for disease diagnosis (
To address this gap in knowledge, we sequenced the total RNA of 25 salivary gland samples collected from human adult and fetal submandibular (adult, SM; fetal, sm), sublingual (adult, SL; fetal, sl), and parotid (adult, PAR; fetal, par) glands. This dataset allowed us to analyze transcriptomes across gland type and developmental stage, compare the salivary gland transcriptome to that of other organ systems, and integrate the salivary transcriptome with available proteome data.
The Functional Specialization of Adult Salivary Glands Occurs during Late-Stage Development
To comprehensively identify gene expression differences among the three major salivary gland types, we conducted a transcriptome analysis of multiple healthy male and female SL, sl, SM, sm, PAR, and par glandular tissues (Figure 1A; Table S1). Human salivary gland development begins within 6–8 weeks, with the formation of a branched structure with clearly defined end buds (pre-acini) by 16 weeks and lumenized acini by 20 weeks (
). The period after 20 weeks is associated with cytodifferentiation and the presence of intercalated/striated ducts and is characterized as the last stage of salivary gland development (
). Salivary glands are considered fully differentiated by 28 weeks, as noted by the presence of secretory vesicles and the expression of secretory protein BPIFA1/SPLUNC1 (
). Here we used glandular tissues taken from 22 to 23 weeks of age and, based on these previous studies, define this age group as late-stage development.
We comparatively analyzed the expression levels of 167,278 transcripts consolidated into 40,882 coding and noncoding genes (Table S2). We could clearly differentiate mature glands from fetal glands without any a priori hypothesis based only on the first principal component of transcriptome data (Figure 1B). The second principal component of the transcriptome data evidently separated the mature gland types. However, the same analysis could not differentiate among fetal gland types. We verified these results using a hierarchical clustering analysis (Figure 1C), in which the transcripts of the mature glands clustered into a major branch distinct from that of the fetal glands. Moreover, we found that the transcripts of the mature glands branched up according to their glandular origin, whereas those of the fetal glands did not. We quantified these observations using Pearson correlation analysis (Figure S1). As expected for tissues composed of similar cell types, the PAR gland exhibited greater similarity in global gene expression to the SM than the SL gland.
To determine which transcripts account for the differences among the gland types, we conducted a comparative analysis of glandular transcripts (Figure 1D; Table S2) and found hundreds of transcripts that were differentially expressed among mature glands. Gene Ontology (GO) analysis of these transcripts showed that genes found predominantly expressed in the fetal tissues were significantly enriched in categories linked to growth and development, including cell cycle, cell division, and other fundamental cellular processes (Table S3). Differences among mature gland types mainly resulted from genes that, as defined by the Human Protein Atlas (https://www.proteinatlas.org/), code for secreted proteins.
We confirmed the gland-type-specific and abundant expression of a limited number of genes coding for secreted proteins that are also found in saliva in high abundance (
) (Figure 1E). Several other genes encoding abundant secreted proteins were expressed by two gland types only or by one gland type exclusively (Table 1). For example, MUC7 was highly enriched in the SL and to a lesser extent in the SM but virtually absent from the PAR gland. We also identified several genes coding for secretory proteins that had not been previously described to differ among gland types. One example is cysteine-rich secretory protein 3 (CRISP3), an early response gene that may participate in the pathophysiology of the autoimmune lesions of Sjögren’s disease (
) and is expressed by human labial glands (
), was highly expressed by the SL and to a lesser extent by the SM but absent from the PAR glands.
Table 1Genes Expressed in Abundance in Salivary Glands
|Top Transcribed Genesa Specifically Expressed in Salivary Glands
|STATH, HTN3, HTN1, AMY1, SMR3B, PRH2, ENSG00000254144, CST4, RPPH1, CST1, PRB1, PRB3, PRB4, C6orf58, MUC19, ENSG00000225840, CD24P4, RIMBP3C, LINC00273
|Top 20 Transcribed Genes in the SLb
|RN7SL1, LYZ, ZG16B, MUC7, MTRNR2L8, PIGR, MUC5B, ENSG00000254144, CRISP3, STATH, C6orf58, RPPH1, MTRNR2L1, EEF1A1, PIP, FCGBP, WDR74, DMBT1, ZNF354B, IGHA1
|Top 20 Transcribed Genes in the PARb
|RN7SL1, AMY1A, HTN3, AMY1B, PRB1, MTRNR2L8, PRB3, STATH, PRB2, PRH1, PRH2, PRB4, HTN1, CA6, ENSG00000254144, PIGR, MTRNR2L1, PRR4, EEF1A1, RPPH1
|Top 20 Transcribed Genes in the SMb
|RN7SL1, STATH, HTN3, MTRNR2L8, HTN1, AMY1A, SMR3B, ZG16B, PIGR, PRH2, MTRNR2L1, ENSG00000254144, MUC7, CST4, RPPH1, ZFHX3, PRH1, AMY1B, CST1, EEF1A1
|Additional Highly Transcribed Genes of Reported Functional Relevance in Gland Development, Physiology, or Pathologyc that Show Salivary-Gland-Type-Specific Expression
|KLK1, LRP1B, MUCL1, CNTN5, SPP1, LRP2, EDN3, AGR2, TFF1, GALNT12, GALNT13, FSTL1, COL3A1, SPARC, PLXND1, POSTN
|Additional Highly Transcribed Genes of Reported Functional Relevancec Coding for Non-Secreted Protein Products
|RDH11, MAOB, FEZF2, LIM1XB, SLC5A5, DNER, PGR, ABCC1, GALNT13, NKX2-3, MCFD2, TCN1, FURIN
|Top 10 Proteins Abundantly Found in Whole-Mouth Saliva that Likely Originate from Extrinsic Sources Such as Blood Plasma or Epithelial Linings of the Oral Cavity
|ALB, IGHG2, AZGP1, IGHG1, ACTG1, IGKV3-20, IGKV4-1, IGKV4-5, S100A9, KRT1
|Top 10 Transcribed TF Genesa in Salivary Glands
|ZFHX3, ZNF354B, LTF, XBP1, TFCP2L1, EHF, FEZF2, SON, NFIB, FOXO3, JUN, ETV1, FOS
|Additional Highly Transcribed TF Genes of Reported Functional Relevancec in Gland Development, Physiology, or Pathology
|NKX3-1, BHLHA15, FOXA1, NKX2-3, HEY1, YAP1, TP63, SOX2, SOX2, SOX9, SOX10, FOXC1, FOXD3, CBX2, SOX11, ZBTB16, KLF9
a Listed are the top 10 genes expressed in each major gland type. Because some of these genes overlap, the total of genes listed here is lower than 30. For a more systematic look into their expression in salivary glands, see Tables S2 and S4.
b Underlined gene names designate those that are predominantly expressed in the respective gland category (log2 fold change > 2).
c A more detailed description of these genes, including references, is provided in the main text.
We found that kallikrein 1 (KLK1), low-density lipoprotein receptor-related protein 1B (LRP1B), mucin-like 1 (MUCL1/SBEM;
), carbonic anhydrase (CA6;
), and C6orf46/SSSP1 (skin and saliva secreted protein 1; cell origin of protein is unknown;
) were expressed by the PAR and SM glands and absent from the SL glands, whereas contactin 5 (CNTN5) and secreted phosphoprotein 1/osteopontin (SPP1) were restricted to the PAR and SL glands, respectively. A small fraction of secreted genes was also found to be exclusively expressed by only one adult gland type. For example, transcripts for low-density lipoprotein receptor-related protein 2 (LRP2/megalin), a multiligand uptake receptor that is involved in protein reabsorption (
), were only found in PAR tissue; endothelin 3 (EDN3;
) was highly enriched in the SM; and the mucous components FCGBP (
), AGR2 (
), and trefoil factor 1 (TFF1;
), along with a gene of unknown function enriched in mucous tissues, C6orf58/LEG1 (
), were restricted to the SL. Some proteins that these genes encode are found in saliva (e.g., C6orf58/LEG1;
), whereas others have a negligible presence (e.g., LRP2/megalin), although whether this deficiency results from protein degradation or whether they are simply not secreted into the ductal lumina is yet to be determined.
We found several genes that are not secreted but still show remarkable gland-type specificity. For example, the SL gland was enriched in transcripts for retinol dehydrogenase 11 (RDH11) compared with the PAR and SM glands, whereas transcripts encoding enzymes such as the dopamine-degrading monoamine oxidase B (MAOB), transcription factors (TFs) FEZF2 (a regulator of cell differentiation;
) and LIM1XB (LIM homeobox TF 1 beta), co-transporter SLC5A5, and growth factor or steroid receptors such as DNER (Delta and Notch-like epidermal growth factor-related receptor) and progesterone receptor (PGR) were almost exclusively expressed by the SM and PAR glands. The multi-drug resistance gene ABCC1 was highly enriched in the PAR; GALNT13, an initiator of O-linked glycosylation of mucins, was enriched in the SM; and the TF NKX2-3, which is required for murine sublingual gland development (
), was almost exclusive to the SL. A few of these protein-coding genes were previously reported to be specific for other organ systems not included in the genotype-tissue expression (GTEx) database, e.g., placenta-specific protein (PLAC4).
Adult gland types also differed significantly in the expression of immune-related secretory genes. Through GO analysis (Table S3), we found distinct complement cascades and immunoglobulin production pathways (e.g., IGHV1-58 and C6) that were shared by the PAR and SM but were different from those shared by the SM and SL. The SM and SL showed enhanced levels of transcripts for genes assigned by GO to the categories “acquired immunity” (secretory immunoglobulin A [S-IgA] and immunoglobulin G [IgG]) and “innate immunity” (lysozyme, BPI, BPI-like, and PLUNC proteins; cystatins; mucins; peroxidases; statherin [STATH]; and others). This observation raises the possibility that the SM and SL glands may provide a constant background level of acquired and innate immunity in the oral cavity that is maintained independently of salivary flow stimulation through food intake and chewing activity. In that context, it is of interest that glandular inflammatory conditions (sialadenitis) show a predilection for certain gland types. For example, Heerfordt syndrome causes parotitis (
), whereas chronic sclerosing sialadenitis predominantly affects the SM glands (
The proportion of total gene transcripts encoding secreted proteins was significantly higher in mature glands than in their fetal counterparts (p < 0.05, Mann-Whitney test) (Figure S2). Yet several secretory genes present in saliva were also expressed at significant levels in fetal glands, albeit at lower levels than in adult glands (Figures 1E and 1F). Many of those genes did not match the tissue-specific expression patterns of the adult organs. For example, transcripts for MUC7 and MUC5B, which are expressed exclusively by the SM and SL glands, were expressed by all fetal gland types (Figure 1E, left panel). Such an outcome hints at the possibility of unknown functions of mucin genes during fetal development.
There Is Extensive Retention of Gene Transcripts from Fetal to Adult Stages in All Mature Gland Types but Most Pronouncedly in the SL Gland
We next analyzed our RNA sequencing (RNA-seq) datasets for genes that were retained or depleted during maturation of the salivary gland types. Overall, we found 7,166 genes were expressed at similar levels at both fetal and adult stages (Figure 2A; also see Figure 1E). These globally expressed genes are enriched for functions related to organ development and adult homeostasis and physiology (Table S3). Among the highly retained gene transcripts, we identified factors, e.g., fibroblast growth factors 1, 7, and 10 (
), that in mice are reduced in expression during salivary gland formation and are known to promote salivary gland development in these animals, suggesting species variation in gene retention during gland development.
Despite extensive similarities in gene retention among all adult gland types, the SL gland stands out, because it retains a group of additional 595 genes from fetal to adult stages (Figure 2B). The most highly expressed genes in this group are primarily related to extracellular matrix formation and function (Figure 2C; Table S3). Some genes, including those coding for collagen 1 and 3 isoforms (e.g., COL3A1), as well as SPARC/osteonectin, were retained at fetal-like transcript levels (10- to 100-fold higher than in the SM and PAR). These results suggest that the SL retains a more fetal-like extracellular matrix that may guide stem cell-mediated repair, as was suggested for other organ systems.
We also identified several highly abundant genes not related to the extracellular matrix that were retained in the SL gland compared with the PAR and SM glands. These included the extracellular glycoprotein Fst-SPARC family member follistatin-like 1 (FSTL1), which is an essential regulator of tracheal formation and lung epithelial cell maturation (
), and the receptor for semaphorin class 3 ligands plexin D1 (PLXND1), which has multiple roles during development (e.g., synaptogenesis, heart formation, and vasculogenesis) and is heavily associated with Moebius syndrome, a developmental neurological disorder that is characterized by paralysis of the facial nerves and variable other congenital anomalies (
). In regard to Moebius syndrome, patients show salivary gland dysfunction (
), although whether the tissues are affected at morphological levels is unknown. In addition, periostin (POSTN), which is highly retained in the SL, has been implicated in stem cell regulation in multiple tissues, including bone (
), heart (
), pancreas (
), and tendon (
Few fetal transcripts were absent from all mature gland types (Table 1). Those few that were included gene transcripts involved in fetal blood (e.g., hemoglobin gamma A [HBGA]), embryonic development (e.g., insulin-like growth factor 2 [IGF2]), and cell proliferation (e.g., topoisomerase [DNA] II alpha [TOP2A]), as well as several TFs known to regulate developmental processes in other organ systems (e.g., SOX11) (
). HBGA is a fetal globin gene known to be absent from adults. Thus, its low transcript level in adult glands (~20 transcripts in adults compared with ~2,500 in fetal tissue) ensures the rigor of our study.
The Diverse TF Repertoire of Mature Salivary Glands May Shape Hotspots of Hundreds of Genes with Salivary-Gland-Specific Expression
We next tested the hypothesis that TFs display gland-specific gene expression. To address this hypothesis, we investigated the expression patterns of hundreds of TFs that were (1) highly abundant in each gland type at each developmental stage, (2) showed salivary-gland-specific expression, or (3) had been previously implicated in salivary gland development, disease, or cancer (Figure 3A; Table 1; Table S4). More than 60% of known TFs (1,025 of 1,648) were expressed (>100 DESeq2 normalized counts [NCs]) by at least one of the salivary glands, with 64% (661) of them expressed in each of the fetal and mature glands and thus suggestive of conserved function during maturation and homeostasis.
Our analysis identified a host of TFs previously shown to be essential regulators of salivary gland development in mice to also be expressed in developing human glands. These include regulators of acinar cell development (e.g., SOX2, SOX9, and SOX10;
); targets of FGF10 signaling (e.g., ETV5); regulators of duct formation, such as TFCP2L1 (
) and YAP1 (
), and of basal stem cells such as TP63 (
); and a recently discovered TF that promotes salivary organoid initiation from mouse embryonic stem cells (FOXC1;
). A group of TFs, including FOXD3, CBX2, and SOX11, was found to be exclusively expressed in fetal glands, indicative of roles in human salivary gland development. This latter group of TFs is of high interest to those studying organ bioengineering, wound repair, and cancer, because multiple markers present in fetal tissue are also expressed in various cancers (e.g., SOX11;
) and are required for regeneration and de novo generation of tissues (
), yet their exact functions remained unclear due to the absence of information on fetal organs. We also found several TFs that were far less abundant at the fetal stage than at the adult stage, suggestive of adult-specific functions. Examples are BHLHA15/MIST1, the master regulator of the secretory program and secretory cell architecture (
); KLF9, a negative regulator of epithelial and tumor cell proliferation (
); and ZBTB16, which affects diverse signaling pathways, including cell cycle, differentiation, programmed cell death, and stem cell maintenance (
SL tissue, compared with PAR and SM tissue, demonstrated a 5- to 10-fold enrichment for transcripts of TFs known to regulate mucous cell formation (Table 1), including FOXA1 (
), NKX2-3 (
), and NKX3-1 (
), as well as of TFs regulating cell differentiation, including HEY1, a downstream effector of NOTCH signaling (
). In addition, we found TFs that are routinely used as markers to define gland maturity independent of gland type. Those include BHLHA15 (MIST1), which indeed shows mature-gland-specific expression yet exhibits differential expression at both mRNA and protein levels among mature gland types, with the SM gland showing the highest expression and the SL gland showing the lowest (Figures 3A and 3B). Altogether, our results suggest that rather than a few gland-specific TFs driving functional diversity, specific dosage combinations of dozens, if not hundreds, of TFs likely shape the transcriptome variation of individual adult salivary glands.
To identify genes that are expressed specifically in salivary glands, we compared transcript levels in salivary glands with those of 54 other tissues in the GTEx portal, including other epithelial organs that secrete fluids, such as the pancreas, mammary tissue (non-lactating), and intestine (
) (Figures S3–S5; Table S5). This analysis identified 188 transcripts (Figure 3C; Table S4) with observable gene expression (>100 NCs) in adult salivary glands but negligible (<10 TPM) expression in 53 other tissues and organs reported in the GTEx database.
Of the 188 genes identified as salivary gland specific, 80 are predicted to be protein coding based on RefSeq (
) (Table 1; Table S4). Besides genes encoding proteins abundantly found in saliva (e.g., HTN, MUC7, PRB, and AMY1), most of these salivary-gland-specific genes are long non-coding RNAs (108 genes) that to our knowledge have not been identified in the salivary gland context. They include LINC00273, a possible regulator of lung cancer metastasis (
), and AC092159.2, which has been suggested to play a role in metabolic processes (
). Given the multiple suggested roles of long non-coding RNAs in other organ systems, these transcripts may play a role in controlling nuclear architecture and transcription in the nucleus, as well as in modulating mRNA stability, translation, and posttranslational modifications in the cytoplasm of salivary gland cells.
), CST (
), BPIFA (
), and PRB (
) gene clusters, contain genes encoding proteins secreted in saliva (see also Figures 1D and 1E). This cohort of additional loci harboring salivary-gland-specific gene sets offers opportunities for investigating the regulation of gene candidates within these clusters in salivary gland development, homeostasis, and disease.
Transcriptional and Posttranslational Regulation of Abundant Salivary Secreted Proteins
To determine whether saliva protein abundance is mainly regulated at the transcriptional level, we compared transcript levels in each glandular tissue type with protein abundances in the corresponding glandular ductal secretions as they became available through the Human Salivary Protein Wiki (HSP-Wiki: https://salivaryproteome.nidcr.nih.gov/). Looking at the entirety of the data, we did not find a global correlation between transcript levels of secretory genes in any glands and corresponding protein levels in the respective glandular secretions (R2 < 0.1) or in the whole saliva (R2 < 0.1) (Figure 4A; Figure S6). We also noted differences among gland types in terms of how transcript and protein abundances were related. In SM/SL ductal saliva, a greater proportion of more highly abundant proteins were derived from genes with lower transcript levels in the SL gland (<104 NCs), a relationship that was not observed for the SM or PAR gland (as seen in the top-left quadrant of plots in Figure 4A). However, we did find that most highly abundant proteins in ductal salivary secretions are also highly expressed at the RNA level in their corresponding glandular tissues of origin (Figure 4A). Overall, our data indicate that the major salivary glands differ in posttranscriptional regulation and that transcript levels in salivary glands are not necessarily reflected by protein abundances in saliva, except for those proteins that occur in saliva at highest abundances. This finding suggests that most proteins in whole saliva are not derived from genes expressed in salivary glands or that salivary proteins are being affected by posttranscriptional regulations or modifications, likely including posttranscriptional modifications such as glycosylation, affecting the quantitative detectability of highly glycosylated proteins (e.g., mucins) by mass-spectrometric methods, and massive postsecretory enzymatic modifications known to affect protein abundances in whole-mouth saliva (
We next compared the most abundant proteins, ranked according to protein abundance in the human salivary proteome and according to transcript abundance by our RNA-seq analysis, with publicly available mass-spectrometry-based proteomes of 29 healthy human organ tissues from the Human Protein Atlas project (
) (Table S5). Through this comparative analysis, we delineated 14 of the top 50 secreted saliva proteins to be highly enriched in salivary glands and saliva and 5 additional proteins to be highly expressed in only one or two organs other than in salivary glands (Figure 4B). These findings are supported by our comparative analysis of salivary gland transcriptomes to the 54 tissue and organ transcriptomes in the GTEx database (Figures S3–S5). However, it has to be taken into account that the transcriptomes of some secretory tissues and organs, the pancreas being the exception, are not available in the GTEx database, including the lacrimal gland and lactating mammary gland. It is known that some abundant proteins in saliva (e.g., MUC7, lactoperoxidase [LPO], and PIP) are present in other body fluids, such as tear fluid, milk, and epithelial mucus (
). Examples for this are proteins, such as CST2, CST5, ZG16B, and SMR3B, which showed little to no protein or transcript expression in other tissues or organs, including the mammary gland, pituitary, prostate, pancreas, and lung, but were reported to be present in tear fluid (
We also found genes abundantly transcribed in salivary gland tissues (>1,000 NCs) that were not detected at the protein level in salivary secretions. This group of genes was enriched in functions related to intracellular housekeeping processes, as well as in functions typifying exocrine tissues, including vesicle-mediated transport (e.g., MCFD2), regulated exocytosis (e.g., TCN1), and cell secretion (e.g., FURIN) (Table S3). We also identified genes encoding proteins that previous proteome analyses identified in saliva but that were not detected by us at the RNA level in glandular tissues. This group of genes was enriched in functions characteristic of epithelial cells, including keratinization and cornification (e.g., KRT1 and SPRR1A). One noteworthy protein that is abundantly found in saliva (among the top 10% of proteins) but is not highly expressed (among the bottom 10%) at the RNA level in glandular tissues is albumin. This finding proves that most albumin in the whole saliva is not derived from salivary glands but rather diffuses into whole-mouth fluid via blood plasma leakage, mostly in the form of gingival crevicular fluid, as was suggested earlier (
Certain secreted proteins, which were abundantly detected at the mRNA level in glandular tissues and at the protein level in ductal saliva, such as STATH, LYZ, MUC7, and HTN1, were detectable by mass spectrometric analysis at lower amounts or not detectable in whole-mouth saliva (Figure 4A; Figure S6). Such reduction or loss can result from the proteins being proteolytically degraded once exposed to the mouth environment (
) or through adsorption to oral surfaces after secretion from salivary glands. Indeed, multiple studies have demonstrated STATH, LYZ, and HTN1 to be selectively adsorbed from saliva onto the enamel surface in the form of the acquired pellicle (
). It is also possible that mass spectrometric analysis could not quantitatively detect certain proteins in saliva due to, for example, dense glycosylation that protects them from trypsin cleavage or other molecular features that impede identification of specific peptides in the mass spectrometer (
). In that regard, a recent mass-spectrometry-based proteomic analysis of healthy PAR glands has revealed multiple proteins, including HTN1 and LYZ, to be highly expressed in the glandular tissue (
), thereby supporting our prediction of protein loss after secretion from the gland. Overall, integrating our glandular RNA-seq and mass-spectrometry-derived protein abundance data, we were able to parse out the origins of proteins present in human saliva (Figure 4C).
As stated earlier, some of the most abundant proteins in saliva, such as MUC7, MUC5B, PRB3, and S-IgA, are heavily glycosylated (
) aiding in multiple functions, such as lubrication, mucus barrier formation, and microbial binding (
). Thus, we specifically investigated the expression patterns of genes that regulate O-linked or N-linked glycosylation, as per GO categorization (Table S4). We found that each salivary gland type expresses a typical repertoire of transcripts for genes that regulate glycosylation. Focusing on the most abundantly expressed glycosylation-related genes, it became clear that the SL shows dramatically increased expression of multiple GalNAc transferase genes (GALNTs). This family of enzymes is important for the initiation of O-glycosylation, a hallmark feature of mucin proteins abundantly present in salivary gland secretions. This finding makes sense biologically, given that the SL produces the major proportion of mucin proteins in human saliva. It is also worth emphasizing the magnitude in the expression of GALNTs among salivary gland types. For example, GALNT12 is expressed ~100-fold higher in SL tissue than in the other glands. We also discovered that the expression of GALNT13 was highly specific to the SM gland. GALNT genes have been reported to be non-redundant in both animals and humans and thus likely have specialized roles in catalyzing different types of glycosylation (
). Overall, our results will become particularly important from a biomedical perspective, because the salivary glycome forms an interface with the oral microbiome (
), and abnormalities in glycosylation are discussed as biomarkers for both Sjögren’s syndrome and oral cancers (
Cellular Heterogeneity within Gland Types Underlies Gland-Specific Protein Secretion
To consolidate our previously described findings, we conducted immunofluorescence imaging of tissue sections from the three adult gland types. We found clear concordance of gland-specific expression at the protein level with RNA transcript levels for STATH, AMY1, LPO, CRISP3, MUC7, and MUC5B (Figure 5A). The expression patterns of each of these proteins are tissue specific and are concordant with previous studies describing individual gland types or gland-specific secretions (
One striking example for gland-specific expression is salivary amylase, an enzyme synthesized by serous acinar cells, that shows abundant expression at the protein level in PAR and SM glandular tissue while being virtually absent from the SL. A similar trend was found for STATH and LPO. The lower expression levels of these gene products in the SL likely result from the lower amount of serous acinar cells in this type of glandular tissue (
). However, the near-complete absence of amylase in serous acinar cells of the SL indicates that these cells in the SL are distinctly different from their counterparts in the SM and PAR. Our findings confirm the validity of using these proteins as key markers to discern SM- and PAR-gland-derived tissues from those of the SL.
A different example of gland- and cell-specific expression is MUC7, which shows abundant expression at the protein level in the serous cells of the SL gland and, to a lesser extent, in the serous cells of the SM gland while being absent from serous cells of the PAR gland (Figure 5A), matching MUC7 transcript levels from the respective glandular tissues (Figure 1E). Given this result illustrating the diversity of serous cells across gland types, we next asked whether there was also intraglandular variation in protein synthesis at the cellular level and pursued this question by combining immunostaining for amylase and MUC7. We found MUC7 enriched in subsets of serous acinar cells that were deficient in amylase expression, and we found AMY expression in other subsets of serous acinar cells that were deficient in MUC7 expression (Figure 5B). Our observation suggests that serous cells within the SM exist as distinct populations, each secreting its own repertoire of proteins. Recent single-cell RNA-seq of murine parotid salivary glands indicated acinar cell heterogeneity (
). We propose here that human acinar cells are heterogeneous with respect to secreted protein expression.
We also discovered that for synthesis of the same salivary protein, the three major gland types use different cell lineages. For example, we found that in the SL protein expression of CRISP3 paralleled that of MUC7 in being abundantly produced by acinar cells (Figure 5A). However, in the SM, which expressed lower transcript levels of CRISP3 compared with the SL, CRISP3 protein could be located in only a few acinar cells but was found predominantly in cells of the intercalated ducts. An analogous expression pattern for CRISP3 (i.e., acinar and duct cells expressing CRISP3) has been described in the murine lacrimal gland (
), but it was not known that these two cell populations can each produce the same protein even in different gland types.
To prove whether what we observed at the gland level by immunohistochemistry manifests at the protein level in salivary secretions, we conducted gel electrophoretic separation of glandular ductal secretions and western blot analysis for AMY1, MUC7, CRISP3, BPIFA2/SPLUNC2, and STATH (Figure 5C). As revealed by Coomassie blue and periodic acid Schiff stain, the combined secretions of the SM and SL (SM/SL) glands showed strikingly different patterns of protein and glycoprotein bands compared with PAR secretion, whereas whole mixed saliva showed a combination of both. The presence of AMY1 and MUC7 proteins in glandular secretions, as shown by western blotting, was consistent with transcriptomic and immunofluorescent analyses (Figure 5) and with previous reports (
). We also found BPIFA2, a protein known to exist in whole saliva (
), to be enriched in SM/SL secretion but weakly expressed in PAR secretion, supporting our transcriptome-based evidence that this protein is predominantly derived from the SM. We further found CRISP3, detectable in whole saliva as a doublet of bands, as previously shown (
), to be restricted solely to SM/SL secretions with no detectable protein in PAR ductal saliva, thus matching both our immunohistological and RNA-seq findings (Figure 5A). The CRISP3 band in SM/SL ductal secretion migrated farther during electrophoresis than the double bands in whole-mouth saliva. This outcome suggests that postsecretion enzymatic processing may have occurred, likely resulting in the alteration of CRISP3 sialylation by oral bacterial sialidases, which is known to lead to a loss of negatively charged sialic acid moieties, thus retarding the mobility of the protein in the electrophoretic field (
). It is of note that we found STATH to be present in both PAR and SM/SL ductal secretions with higher abundance in PAR saliva (Figure 5C) (
). STATH was also abundantly detected in the WS sample run on our gel. It has to be noted though that utmost precaution was taken during sample handling and preparation to minimize proteolytic degradation. When other samples, even of the same donor individual, were probed for STATH, only a faint band or no band at all could be detected in WS (data not shown) showing that enzymatic degradation affecting in particular this component can easily occur as was described earlier (
). Overall, our combined immunohistological and immunoblotting data at the protein level in ductal secretions correlate well with transcript levels in the corresponding glands and hence intimately link the fields of human salivary gland and saliva protein research.
Our analysis of the transcriptomes of mature and fetal salivary glands identified hundreds of genes that together define mature salivary glands as specialized secretory organs. We also found that fetal glands, despite the late stage of development and the glands being anatomically separate entities, could not be distinguished based on their transcriptional profiles. This indicates that developmental differentiation of glandular function and functional specialization of the three mature gland types occur during fetal development at a time point later than when the tissue samples in our study were acquired (>22 weeks). These findings pave the way for future studies dissecting mechanisms of regulation of the transcriptome during glandular development and will have significant implications for de novo organ generation.
Our results reveal the extensive level of retention of fetal genes in adult tissues in a human epithelial organ system, strongly suggesting the involvement of these genes in both developmental and homeostatic roles. The major differences between fetal and adult tissue transcriptomes manifest in the upregulation of pathways facilitating secretory and immune function in mature tissue and in the enrichment of developmentally related genes in fetal glands. However, hundreds of fetally expressed genes are retained at significant levels in mature SL tissue while being poorly retained by SM and PAR tissue. Such differential gene retention likely contributes to gland-type differences, including the more fetal-like extracellular matrix of the SL.
Our results have identified hundreds of transcripts that are specifically expressed in salivary glands compared with other major human tissues. These genes provide a robust set for diagnostic purposes both to identify specific glandular types and test for deviations in expression under pathogenic conditions such as cancer. In an attempt to further understand the regulation of these transcripts, we verified TFs that show differential expressions in fetal and mature glands. Our analysis suggests that fine-scale dosage differences of hundreds of TFs shape the expression landscape of mature gland types. Indeed, we found dozens of clusters across the genome in which multiple genes with salivary-gland-specific expression are located in proximity. It is thus plausible that these loci harbor master regulatory sequences that are bound by a specific combination of TFs, leading to salivary-gland-specific expression. Overall, our study provides an exciting next step by having identified dozens of candidate target transcripts and TFs for investigating the mechanism of gland-specific expression regulation in these loci.
Our quantitative transcriptomic and proteomic comparison also allowed us to define relationships between the level of salivary gland gene transcripts and their corresponding proteins in salivary secretions. Similar to other studies exploring the relationship between high-abundance proteins and mRNA levels, as well as to quantitative studies in other secretory cell systems (
), we found that the abundance of most highly expressed saliva proteins is regulated mainly at the transcriptional and posttranslational level. Our data also suggest differences among gland types in terms of the glycosylation machinery that likely increases protein diversity. Furthermore, we traced the origins of abundant proteins in whole-mouth saliva to their respective glands or to extrinsic sources such as blood plasma. Collectively, our study provides a robust framework for modeling the biological makeup of a major complex secretory fluid (summarized in Figure 4C).
Finally, we provide evidence about cellular heterogeneity in human salivary glands. Specifically, we were able to identify two subsets of serous acinar cells in the submandibular gland that appeared to be specialized for expressing either AMY1 or MUC7. Acinar cells have been traditionally defined in a simple binary categorization as either serous or mucous. Our data suggest that these cell types are more heterogeneous than previously acknowledged and that subsets of them are specialized to synthesize specific salivary proteins. In addition, our data revealing CRISP3 production by serous acinar cells in the SL and by ductal and serous acinar cells in the SM demonstrate that the same saliva protein can be produced by different cell lineages. These insights will have major implications for understanding the relationship between glandular secretions and protein content of these secretory fluids, and consequently, inform saliva diagnostics. Furthermore, our transcriptomic analysis of healthy human glandular tissue will have major ramifications for de novo engineering of these organs, as well as for determining the impact of the disease on salivary gland homeostasis and function.
In future studies, it will be important to investigate how salivary gland tissue differentiation, transcriptional and posttranslational regulation, and intraglandular cellular heterogeneity play their roles in human health. We also foresee that our study will open up the possibility of asking specific questions about human evolutionary adaptations to accommodate different environmental conditions, food resources, and pathogen challenges among geographically and culturally distinct human populations with respect to their composition of saliva (
Key Resources Table
|REAGENT or RESOURCE
|Rat polyclonal anti-E cadherin
|Cat# U3254; RRID: AB_477600
|Rabbit polyclonal Anti-MUC7
|Cat# HPA006411; RRID: AB_1854204
|Mouse monoclonal anti-MUC7
|Cat# 4D2-1D7; RRID: AB_10866568
|Mouse monoclonal anti-MUC5B
|Cat# ab105460; RRID: AB_10862195
|Rabbit polyclonal anti-CRISP3
|Cat# HPA054392; RRID: AB_2682472
|Rabbit polyclonal anti-AMY1A
|Cat# HPA045394; RRID: AB_2679311
|Rabbit polyclonal anti-LPO1
|Cat# HPA028688; RRID: AB_10601909
|Rabbit polyclonal anti-STATH
|Dundee Cell Products Ltd., Dundee, UK
|Rabbit polyclonal anti-BPIFA2
|Bingle et al., 2009
|Human Salivary Gland Tissues
|Laboratory of Sarah Knox
|UCSF Biospecimen Resources Program (BIOS) 17-22669
|Human Saliva – Whole Saliva
|Laboratory of Stefan Ruhl
|Human Saliva – Parotid Saliva
|Laboratory of Stefan Ruhl
|Human Saliva – Submandibular/Sublingual Saliva
|Laboratory of Stefan Ruhl
|Chemicals, Peptides, and Recombinant Proteins
|Optimal cutting temperature compound (OCT)
|RNA lysis buffer, filter cartilages and collection tubes
|Pierce bicinchoninic acid (BCA) protein assay kit
|RNaseq data – Salivary glands
|Salivary and serum proteomics data
|Human Salivary Proteome Wiki
|RNaseq data from multiple tissues
|Proteome data from multiple tissues
|Human Protein Atlas
|Software and Algorithms
|Custom codes for downstream analysis
|Bolger et al., 2014
|Bray et al., 2016
|Eden et al., 2009
Further information and requests for resources, codes, and data should be directed to and will be fulfilled by the Lead Contact, Omer Gokcumen (email@example.com).
This study did not generate new reagents. The samples used in this study are from biopsies with specific permissions and are thus restricted in their availability.
Data and code availability
Experimental Model and Subject Details
Human fetal salivary glands were harvested from post-mortem fetuses obtained from elective legal abortions with the written informed consent of the patients undergoing the procedure and the approval of the Institutional Review Board at the University of California San Francisco (IRB# 10-00768). Specimens were donated anonymously at San Francisco General Hospital. Adult human salivary gland biopsies (Table S1) were collected via the UCSF Biospecimen Resources Program (BIOS) under the institutional review body approval number 17-22669. Saliva collection was performed as approved by the University at Buffalo Health Science Institutional Review Board (Study Nr. 030-505616).
Human tissue samples
Adult salivary gland tissues were collected with informed consent from patients aged 23 to 70 years by clinicians at the University of California, San Francisco Medical Center during routine surgeries (UCSF Biorepository, the institutional review body approval number 17-22669). The UCSF pathology lab deemed the investigated tissues to be healthy. For the RNaseq library preparation, we strictly sampled from core sections of the biopsy. Additionally, we took only every third 40 μm section (3-4 sections per sample) for analysis to ensure the specificity of our sampling.
Sample collection for the adult tissues took place during oral surgery procedures that were performed independent of this project. SM and PAR tissue samples were taken during oral surgeries from individuals suffering from cancers of the head and neck. The sample collection was limited to those patients who had not received radiotherapy, chemotherapy or immunotherapy. SL samples were derived from patients with salivary duct stones. Healthy tissue regions were identified and separated from inflamed or cancerous tissues by the UCSF pathology lab. Our immunofluorescent analysis further confirmed tissue health, as determined by cell and tissue morphology and the absence of lymphocytic infiltrates. It is unlikely but remains plausible that the disease status of the patients may have altered transcript levels. For example, it is possible that certain diseases, including oral cancers, can generate widespread inflammation, biasing our results for detecting higher levels of immune-system related genes. Since all adult sublingual glands were derived from female donors, we investigated the variation of gene expression among samples of the same gland type as well as sex-specific expression differences to test for any potential biases. We observed an extremely small variation in gene expression abundances of samples of the same gland type (Figures 1B and 1C) and no sex-specific trends at the global transcriptome level (data not shown). This indicates that the differences among individual samples of the same gland type are much smaller than the gland-specific transcriptome trends we are reporting. Human fetal salivary glands were harvested from post-mortem fetuses between 22 and 24 weeks of gestation with the approval of the Institutional Review Board at the University of California San Francisco (IRB# 10-00768). Tissue was identified by location and glandular appearance. Sex was confirmed through analysis of transcript levels of male-specific genes, namely, UTY and KDM5D (
Preparation of tissue samples
For RNA analysis, tissue was frozen in liquid nitrogen and stored at −80°C. RNA isolation was conducted as described previously (
). For immunofluorescent analysis, tissue was either flash-frozen in optimal cutting temperature compound (OCT) (Tissue-Tek) and stored at −80°C, or immediately fixed with 4% paraformaldehyde (PFA) overnight at 4°C. Fixed samples were washed with PBS, cryoprotected by immersion in a 12.5%–25% stepwise sucrose gradient, and then embedded in OCT for storage at −80°C. Tissue was sectioned (12 μm thickness) immediately before immunofluorescent analysis using a cryostat (Leica).
RNA isolation and sequencing
Human adult and fetal tissue samples were mechanically homogenized using a hand-held homogenizer (Thermo Fisher Scientific) and lysed in 500 μl RNA lysis buffer (Ambion) by sonication (1 × 2-4 s pulse, Branson SFX150). RNA was isolated from 3 × 30 μm sections of human adult and fetal tissue using the RNAqueous Micro Kit (Ambion), and total RNA samples were DNase-treated (Ambion). Sample yield and integrity was analyzed using a 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). RNA sequencing was performed by standard operating procedure by GENEWIZ (https://www.genewiz.com/en) using Illumina HiSeq with a 2 × 150 bp configuration. Quality control of the obtained sequences was performed using FastQC (
) (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ Accessed 12/10/2017). The results were further reviewed by MultiQC (
). Adaptor sequences, low-quality bases from both sides of the read (3 bases or smaller), and reads with a length smaller than 36 bp were discarded by Trimmomatic (
Sample quality control
We reasoned that if any contamination existed in our samples, we expect it to be mainly derived from muscle tissue that these glands reside adjacent to. Thus, as a further precaution, we now searched transcripts of genes that are known to be muscle tissue-specific and not expressed in salivary glands (MYH3, ACTA1, PAX7, MYH8, KLHL41, SGCA, MYBPH, MYOG, MYOZ1, XIRP1, XIRP2, LDB3, TNNT3, TTN, DMD, and MYH1). Based on this analysis, we found evidence of slight contamination in two fetal submandibular gland samples and eliminated these from the analysis reported in this study. We still provided the expression data for these two samples in Table S2 for reproducibility purposes. Also, if contamination were a major factor, we would expect to see more variation among different samples of the same gland type due to varying levels of contamination. As a further quality control step, we screened sequenced tissue for genes encoding inflammatory markers including IL1b, TNF, IL17, CXCL13, and CCL21, and did not find any of these to be upregulated in any of the samples.
The immunofluorescent analysis was performed on 12-μm-thick fixed tissue sections. Sections were imaged using a line-scanning confocal microscope (Leica Sp5). Frozen adult human salivary sections were fixed with 4 % PFA at room temperature (RT) for 20 min and subsequently washed in PBS followed by permeabilization with 0.5 % Triton-X in PBS for 10 min. Tissue sections were blocked with 10% donkey serum (Jackson Laboratories) and 1% BSA (Sigma-Aldrich) in 0.05% PBS-Tween 20 for 1 h at RT. Tissue sections were incubated with the following primary antibodies overnight at RT: rabbit anti-MUC7 (1:200; Sigma HPA006411), mouse anti-MUC7 (1:500; Abcam ab105466), mouse anti-MUC5B (1:500; Abcam ab105460), rat anti-E cadherin (1:300; Sigma U3254), rabbit anti-CRISP3 (1:200; Sigma HPA054392), rabbit anti-AMY1A (1:200; Sigma HPA045394), rabbit anti-AQP5 (1:400; Millipore AB3559), rabbit anti-statherin (1:500, Ruhl laboratory), and rabbit anti-LPO1 (1:200; Sigma HPA028688). Antibodies were detected by incubating samples with Cy2-, Cy3- or Cy5-conjugated secondary Fab fragments (1:300 in 0.05% PBS-Tween-20; Jackson Laboratories) for 2 h at RT. Nuclei were detected using Hoechst 33342 (1:1000, Sigma Aldrich). Fluorescence images were obtained using a Leica Sp5 line-scanning confocal microscope.
Saliva sample collection
Saliva from healthy humans was collected following the protocol approved by the University at Buffalo Human Subjects IRB board (study # 030–505616). Informed consent was obtained from all human participants. Stimulated whole mouth saliva (WS) was collected while chewing on parafilm. Clarified whole saliva supernatant was obtained by centrifugation at 12,000 × g for 15 min at 4°C to remove particulate matter. Ductal salivary secretions were collected following the stimulation of salivary flow by application of 2% citric acid to the dorsum of the tongue. PAR saliva was collected from the orifice of Stensons’s duct using a modified Carlsen-Crittenden device, and SM/SL saliva was collected from the floor of the mouth using disposable plastic Pasteur pipettes (VWR International, Radnor, PA) after isolation of the gland orifice with absorbent cotton rolls and blockage of the PAR duct orifice. Protein concentrations in saliva samples were determined using the Pierce bicinchoninic acid (BCA) protein assay kit (Thermo Scientific, Rockland, IL), using bovine serum albumin as the standard.
SDS-PAGE and immunoblotting
Saliva samples were denatured under reducing conditions. Equal amounts of total protein (15 μg per lane for Coomassie and periodic acid Schiff stain) were subjected to separation by SDS-PAGE using 8%–16% gradient Tris-glycine mini gels. Only for the detection of statherin, because of its small molecular size, Tris-tricine gels were used for better resolution (Novex, Invitrogen, Carlsbad, CA). Proteins were stained by Coomassie blue and glycans by periodic acid Schiff stain (
). Stained gels were imaged using a flat-bed scanner in the transparent mode (ImageScanner III, GE Healthcare). Electrotransfer during immunoblotting was done using a BioRad Trans Turbo Blot apparatus. Blots were probed with the following antibodies diluted in Tris-buffered saline containing 2% milk (TBS-milk): mouse monoclonal anti-human mucin 7 (MUC7) diluted 1:500 (4D2-1D7, Abcam), rabbit polyclonal anti-human alpha-amylase 1A (AMY1) diluted 1:1,000 (HPA045394, Sigma), rabbit polyclonal anti-human parotid secretory protein (SPLUNC2B, BPIFA2) diluted 1:500 (C-20, a gift from Dr. Colin Bingle at the University at Sheffield (
), rabbit polyclonal anti-CRISP3 diluted 1:500 (HPA054392, Sigma), and polyclonal rabbit anti-human STATH (Dundee Cell Products Ltd., Dundee, UK). As secondary antibodies, Alexa Fluor 488 tagged goat anti-rabbit or anti-mouse IgG (Life Technologies) were used diluted 1:1,000 in TBS-milk. Fluorescent bands were detected using a BioRad ChemiDoc imaging system.
Quantification and Statistical Analysis
All the downstream data analysis and comparison of our dataset with publicly available databases were conducted using custom bioinformatic pipelines written on Rstudio (v1.2.1335), R(v3.5.3), and ggplot2 (
). These codes were made available through https://github.com/GokcumenLab/glabBits/tree/master/Saliva-RNASeq. We used Pearson Correlation analysis available in R base package for calculating the relationships between transcriptomes of different gland types presented in Figure S1. We used non-parametric Wilcoxon rank-sum test also available in R base package for testing differences between the expression levels of genes encoding secreted and nonsecreted proteins (Figure S2).
Filtered reads were mapped to the human transcriptome reference (hg19) from Ensembl (
) and biomaRt (
) and quantified using Kallisto (
). Differential expression analysis was performed by DESeq2 (
), which calculates the fold-change of transcription of each gene using the Wald test and a correction for multiple hypotheses from the raw reads. For this analysis, we used pairwise comparison of all fetal and adult gland types for each other. Each of these categories have at least three samples. We used an adjusted (i.e., multiple-hypotheses-corrected p value of < 0.0001) to identify genes that were upregulated (fetus < adult) and downregulated (adult < fetus) during development in each type of salivary gland. Since all adult SL tissue samples were of female background, we excluded the Y chromosome from the analyses. The remaining 40,882 genes were used for subsequent analyses. The normalized RNA abundances for 40,882 genes that we interrogated as well as comparative results are provided in Table S2. The RNA-seq data (fastq files) have been submitted to GEO https://www.ncbi.nlm.nih.gov/geo/ with the project name GSE143702.
Identifying genes with salivary gland specific expression
To identify genes that are expressed in a salivary gland-specific manner, we compared our RNaseq results from the three major salivary glands to RNaseq data available through the GTEx database that were obtained from 53 tissues collected from approximately 1,000 individuals (https://gtexportal.org/home/documentationPage#AboutSamples) (
) (Table S5). These organs include those that synthesize secreted proteins such as the pancreas, mammary gland, thyroid, and intestine. To define whether a gene was specific to salivary glands, we first identified among the non-salivary gland tissues or organs those in which the gene in question was highest expressed. We then compared this level of gene expression to the gene expression levels in each major salivary gland. To account for potential differences in the RNaseq datasets from the GTEx database due to different experimental platforms and bioinformatics data analysis processes, we compared relative expression level ranks by using a (log10 (1+normalized read counts)) transformation for each organs RNaseq dataset.
Functional categorization of genes was determined by cross-checking using Gene Ontology Resources (http://geneontology.org/) (
). Then, an enrichment analysis was conducted using GOrilla (
) applying default settings that enable multiple hypothesis testing (Table S3). Secreted protein gene annotations were obtained from the Human Protein Atlas (https://www.proteinatlas.org/) with the query: “protein_class:Predicted secreted proteins NOT protein_class:Predicted membrane proteins” (
). The glycosylated genes were defined as per Gene Ontology database. We identified genes that are as either “protein N-linked glycosylation” (GO:0006487) or “protein O-linked glycosylation” (GO:0006493), and are abundantly expressed (> 1000 normalized gene count from DESeq) in salivary gland tissues. Note that we omitted genes listed under GO subcategory “O-glycan processing” (GO:0016266), which are mostly mucin genes. The genes encode for proteins that are targets of O-glycosylation and they do not regulate glycosylation.
We used the Human Salivary Proteome Wiki (https://salivaryproteome.nidcr.nih.gov/) to obtain proteome data for the whole saliva, ductal saliva from the PAR, SM, and SL, as well as from blood plasma. The database provides mass-spectrometry-based abundance data for approximately 3,000 proteins compiled from multiple studies (
), including those that focused on ductal secretions (
). Similar to the comparison approach that we used for integrating GTEx data, we used log transformation (log10 (1+normalized abundance)) for each dataset. In addition, we used mass-spectrometry-based protein abundances available through Human Protein Atlas from 29 tissues curated recently by
(Table S5). This dataset contains abundance information on 13,000 proteins.
The authors thank William Lau for help with retrieving the protein abundance data from the NIDCR Human Salivary Proteome Wiki. We are grateful to Tasha Lea (certified pathologist assistant) and Erica Oropeza, along with the Biospecimen Resources Program (BIOS) at UCSF, for facilitating the efficient acquisition, for quality control, and for management of biospecimens. We thank Fatih Aksel for his advise on computational matters. This work was supported by the National Institute of Dental and Craniofacial Research (NIDCR) under award numbers 1R35DE028255 (to S.M.K.) and 2R01DE019807 (to S.R.), by Tobacco-Related Disease Research (TRDRP) under award 28IR-0071 (to S.M.K.), and by National Science Foundation (NSF) under award number 1714867 (to O.G).
M.S. and E.X. conducted the bioinformatics analysis. M.S. contributed to the writing, data curation, formal analysis, and visualization. S.N. conducted the RNA extraction and validation for all salivary glands. E.A.G., A.J.M., and A.G. performed the immunofluorescent analyses. L.N. and S.R. performed western blot analysis. J.C. and W.R. provided human adult tissue and clinical input. S.R., S.M.K., and O.G. were responsible for conceptualization, data curation, formal analysis, validation, visualization, supervision, resources, methodology, writing the original draft, review and editing, and project administration.
Declaration of Interests
The authors declare no competing interests.