  • Research article
  • Open access

Identification and evolution of C4 photosynthetic pathway genes in plants

Abstract

Background

NADP-malic enzyme (NADP-ME) and pyruvate orthophosphate dikinase (PPDK) are important enzymes in C4 photosynthesis. However, the evolutionary history of these genes in C4 plants, and the forces that drove it, are not completely understood.

Results

We identified 162 NADP-ME and 35 PPDK genes in 25 species and constructed a phylogenetic tree for each gene family. NADP-ME genes were classified into four branches, A1, A2, B1 and B2, whereas PPDK genes were classified into two branches, with monocots in branch I and dicots in branch II. Analyses of selective pressure on the NADP-ME and PPDK gene families identified four positively selected sites at a posterior probability threshold of 95%: 94H and 196H in the a5 branch of NADP-ME, and 95A and 559E in the e branch of PPDK. These sites were located in helix and sheet regions. Quantitative RT-PCR (qRT-PCR) analyses revealed that the expression of 6 NADP-ME and 2 PPDK genes from foxtail millet was up-regulated after exposure to light.

Conclusion

This study identified positively selected sites associated with the evolution of NADP-ME and PPDK in C4 plants. It provides information on the classification and positive selection of plant NADP-ME and PPDK genes, and the results should be useful for further research on the evolutionary history of C4 plants.

Background

Photosynthesis is the process by which plants convert solar energy into chemical energy, enabling them to produce their own food for development [1]. Photosynthesis in higher plants can be classified into C3, C4 and Crassulacean acid metabolism (CAM) types according to how carbon is fixed, which determines the initial photosynthetic products. Most land plants use the C3 pathway, whereas C4 and CAM plants evolved from C3 ancestors [2, 3]. C4 plants use CO2 more efficiently than C3 plants, which makes them better adapted to subtropical and tropical environments, low CO2 concentrations, and stressful conditions [4]. Numerous studies have sought to understand the efficiency and mechanism of carbon fixation in C4 plants [5, 6].

Among the many enzymes involved in the C4 photosynthetic pathway, PPDK and NADP-ME are considered the most important [7, 8].

PPDK is a critical enzyme that controls the photosynthetic rate in C4 plants [9]. Many PPDK genes from C4 and CAM plants have been cloned, for example those of maize and Mesembryanthemum crystallinum [10, 11]. A phylogenetic study suggested that the PPDK genes of sorghum and rice are homologous [12]. Detailed comparison of PPDK isoform sequences between the Poaceae and Arabidopsis showed that they share about 20 amino acids of the chloroplast transit peptide (cTP), indicating that the PPDK genes evolved before the divergence of monocots and dicots [12].

NADP-ME genes can be classified into photosynthetic and non-photosynthetic types. The photosynthetic type functions mostly in chloroplasts [13] and improves photosynthetic efficiency by releasing CO2 through the decarboxylation of malate in the adjacent bundle-sheath cells; in C4 plants, this CO2 is supplied to Rubisco for carbon fixation [14, 15]. Genomic and phylogenetic analyses showed that the NADP-ME gene family in the Poaceae has four branches, one of which (NADP-ME IV) is expressed in plastids. The C4-specific NADP-ME has several codons under positive selection and is independent of the NADP-ME IV family [16, 17].

Natural selection, a key factor in biological evolution, includes positive selection, purifying selection, and neutral selection [18]. The ratio of non-synonymous to synonymous substitution rates (ω = dN/dS) is typically used to infer the direction and strength of selection acting on a coding sequence. A value of ω > 1 suggests positive selection, in which new amino acid variants confer a fitness advantage; ω = 1 indicates neutral selection; and ω < 1 indicates purifying selection [19]. As an important basis of adaptive evolution, positive selection acts in a population by favoring the transmission of an advantageous mutant allele and increasing its frequency [18].
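The ω rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming dN and dS have already been estimated (for example by a codon model); the rates in the example are hypothetical, and in practice a departure of ω from 1 is judged with likelihood ratio tests rather than by comparing a point estimate with exactly 1.

```python
def classify_omega(dn: float, ds: float) -> str:
    """Label the selection regime implied by omega = dN/dS.

    Exact equality with 1 is an idealization: real estimates carry error
    and are normally tested statistically rather than read off directly.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to compute omega")
    omega = dn / ds
    if omega > 1:
        return "positive selection"
    if omega < 1:
        return "purifying selection"
    return "neutral"


# Hypothetical (dN, dS) pairs, chosen only for illustration.
for dn, ds in [(0.02, 0.22), (0.10, 0.10), (0.30, 0.15)]:
    print(f"dN={dn}, dS={ds}, omega={dn / ds:.3f}: {classify_omega(dn, ds)}")
```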

Positive selection often implies the emergence of a new function [19, 20]. During the transition from the C3 to the C4 pathway, positive selection acted mainly on key C4 photosynthetic enzymes such as Rubisco, phosphoenolpyruvate carboxylase (PEPC), NADP-ME, and PPDK [12, 21,22,23,24,25,26]. For example, two positively selected amino acid substitutions in the Rubisco large subunit (LSu), M309I and D149A, distinguish C4 Rubiscos from those of ancestral C3 species [21]. With the switch to C4, 21 amino acids of PEPC evolved under positive selection and converged to similar or identical residues in most grass C4 PEPC lineages [22]. The acquisition of C4 PEPC in sedges (Cyperaceae) was driven by positive selection on at least 16 codons [23]. Previous studies used amino acid variation to study rates of evolution in the C4 NADP-ME pathway and found a number of residues under significant positive selection [24]. During the independent evolution of NADP-ME in C4 plants, strong positive selection led to sequence convergence [25]. For example, of 29 residues compared between C4 and non-C4 NADP-MEs, residues 284, 450 and 539 were identified as having been under positive selection during the evolution of C4 NADP-ME in grasses, suggesting that they contribute to the kinetic and structural differences between the C4 and non-C4 groups [26]. Phylogenetic analysis also suggested that the maize PPDK gene and its sorghum ortholog were under significant positive selection, implying possible functional changes [12].

The molecular mechanisms underlying C4 photosynthesis remain poorly understood, and few studies have examined whether positive selection was associated with the evolution of NADP-ME and PPDK in C4 plants. The completion of whole-genome sequences of C4 plants such as sorghum and maize [27, 28], together with improved knowledge of photosynthetic pathways and their evolution, provides a solid foundation for studying the evolution and expression of key C4 enzyme genes. A comparison of the PPDK and NADP-ME gene families in C4 plants could advance knowledge of the evolutionary, functional and metabolic roles of these genes in photosynthesis. This study investigated the evolution of NADP-ME and PPDK in algal, moss, Lycopodiophyta, monocotyledon and dicotyledon species, providing new information on C4 photosynthesis.

Results

Numbers of NADP-ME and PPDK genes in plants

A total of 162 NADP-ME and 35 PPDK sequences were found in 25 species, including one algal, one moss, one Lycopodiophyta, 10 monocot (6 of them C4), and 12 dicot (1 of them C4) species (Additional file 1: Table S1; Additional file 2: Table S2). Soybean had 14 NADP-ME genes; carrot, cotton and poplar each had 9; and Selaginella moellendorffii had 3 (Additional file 1: Table S1). PPDK genes were far fewer: the largest number was 3, in the banana species Musa acuminata, and most other species had only 1 or 2 (Additional file 2: Table S2).

Analysis of conserved amino acid sequences in NADP-ME and PPDK proteins

The MEME program identified 20 conserved motifs in the NADP-ME and PPDK proteins (Additional file 3: Table S3; Additional file 4: Table S4). Among NADP-ME genes, those from algae (Cre14.g629700.t1.1, Cre14.g628650.t1.2, Cre14.g629750.t2.1, Cre01.g022500.t1.2) lacked motifs 13, 14, 15, 17 and 19. Subfamily A had two unique motifs, 17 and 19, whereas subfamily B had three, 13, 14 and 15 (Additional file 5: Figure S1). The PPDK gene in green algae lacked motif 15, whereas all other PPDK genes contained all 20 candidate motifs (Additional file 6: Figure S2).
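Finding subfamily-specific motifs reduces to set operations on the motif numbers present in each gene. The sketch below assumes "unique" means present in at least one gene of one subfamily and in no gene of the other; the gene names and motif sets are hypothetical stand-ins, not the MEME results in Additional files 3-6.

```python
# Hypothetical motif sets (MEME motif numbers 1-20 present in each gene).
# The real assignments are in Additional files 3-6 and are not reproduced here.
ALL_MOTIFS = set(range(1, 21))

genes = {
    "geneA1": {"subfamily": "A", "motifs": ALL_MOTIFS - {13, 14, 15}},
    "geneA2": {"subfamily": "A", "motifs": ALL_MOTIFS - {13, 14, 15, 19}},
    "geneB1": {"subfamily": "B", "motifs": ALL_MOTIFS - {17, 19}},
    "geneB2": {"subfamily": "B", "motifs": ALL_MOTIFS - {2, 17, 19}},
}


def unique_motifs(genes, group, other):
    """Return motifs found in at least one gene of `group` and in no gene of `other`."""
    in_group = set().union(*(g["motifs"] for g in genes.values() if g["subfamily"] == group))
    in_other = set().union(*(g["motifs"] for g in genes.values() if g["subfamily"] == other))
    return sorted(in_group - in_other)


print(unique_motifs(genes, "A", "B"))  # [17, 19]
print(unique_motifs(genes, "B", "A"))  # [13, 14, 15]
```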

Phylogeny of NADP-ME and PPDK

We constructed a phylogenetic tree of all 162 NADP-ME genes from the 25 species and found that they shared a common ancestor. The algal NADP-ME genes were the most ancient and diverged from the rest of the clade. Subfamilies A and B separated after a whole-genome duplication (Additional file 7: Figure S3). In subfamily A, the algal NADP-ME gene branched off first, and the remaining genes were classified into subfamilies A1 and A2. Within each subfamily, monocot and dicot genes clustered separately. In the monocot clades of A1 and A2, the NADP-ME genes of Musa acuminata and Ananas comosus branched off before those of the Poaceae. Within the Poaceae, NADP-ME genes of C4 plants were more closely related to each other (Fig. 1). In subfamily B, the algal NADP-ME genes again branched off first, followed by those of the land plants Physcomitrella patens and Selaginella moellendorffii. Among angiosperms, the dicot NADP-ME genes (subfamily B2) branched off first, and the monocot NADP-ME genes diverged after gene duplication to form subfamily B1, which underwent three whole-genome duplication events. As in subfamily A, the NADP-ME genes of Musa acuminata and Ananas comosus branched off earlier than their counterparts in the Poaceae, which formed four branches: NADP-ME-B-M1, NADP-ME-B-M2, NADP-ME-B-M3 and NADP-ME-B-M4 (Fig. 2). NADP-ME genes clustered closely within the C3 species group and within the C4 species group.

Fig. 1

Phylogenetic tree of the NADP-ME gene family in 25 species (A branch). The A branch further diverged into subfamilies A1 and A2. For the branch model, a1-a7 were assigned as foreground branches in the selective pressure analysis. C4 plants are marked with green circles

Fig. 2

Phylogenetic tree of the NADP-ME gene family in 25 species (B branch). The B branch further diverged into subfamilies B1 and B2. For the branch model, b1-b5 were assigned as foreground branches in the selective pressure analysis. C4 plants are marked with green circles

All 35 PPDK genes from the 25 species were used to construct a phylogenetic tree. The PPDK gene of green algae branched off first, and the remaining genes diverged into subfamilies I and II. Subfamily I consisted of monocots and subfamily II of dicots. In subfamily I, the PPDK genes of Musa acuminata and Ananas comosus branched off first, followed by those of the Poaceae. A whole-genome duplication then occurred after this divergence, forming two main branches; in one branch, barley, maize and Brachypodium distachyon showed loss of the PPDK gene or lacked a conserved PPDK structure. PPDK genes of C4 plants were also closely related (Fig. 3).

Fig. 3

Phylogenetic tree of the PPDK gene family in 25 species, including branches I and II. For the branch model, a-e were assigned as foreground branches in the selective pressure analysis. C4 plants are marked with green circles

Analysis of selection pressure on NADP-ME and PPDK genes

Selection pressures within subfamilies A and B of the NADP-ME genes were investigated separately. For subfamily A, site models M0 and M3 were compared. Under the M0 model ω was 0.091, indicating purifying selection. The chi-squared test comparing M0 and M3 gave P < 0.001, indicating that ω was not constant across sites (Table 1). For the branch model, seven branches, a1-a7, were assigned as foreground branches, and ω was < 1 for all of them. Likelihood ratio tests (LRT) showed that branches a1, a2 and a3 differed significantly from the other branches, with all ω values < 1, suggesting purifying selection (Fig. 1; Table 1). The branch-site model estimated that the proportions of sites under positive selection on branches a1-a5 were 5, 0.2, 5.8, 11 and 1.5%, respectively, whereas those on a6 and a7 were close to 0. The numbers of positively selected sites on a1-a7 were 8, 5, 14, 5, 4, 2 and 2 at a posterior probability of 0.6. The LRT indicated that branches a1, a3 and a5 differed significantly from the M1 model (P < 0.05). Interestingly, the a1 branch, ancestral to subfamilies A1 and A2, was stabilized after positive selection at both the a1 and a3 branches. In contrast, a4 and a5 still had positively selected sites following positive selection on the a2 branch. This suggests that subfamily A1 underwent different levels of positive selection on different branches. The a5 and a7 branches comprised mostly monocots and C4 plants (Fig. 1; Table 1). For subfamily B, ω values under the site models were similar to those of subfamily A. The LRT indicated that subfamily B was also under purifying selection, with ω varying among sites (Table 2).
Branches b1-b5 were under strong purifying selection, with proportions of positively selected sites of 5.5, 0, 0.6, 8.9 and 1.8% and ω values much smaller than 1 under the branch and branch-site models (Table 2). The numbers of positively selected sites on b1-b5 were 8, 0, 2, 0 and 5 at a posterior probability of 0.6 (Fig. 2; Table 2). Branch b1, the most ancient branch of NADP-ME genes in subfamily B, had 8 positively selected sites. The b3 and b5 branches had 2 and 5 positively selected sites, respectively, whereas the b2 and b4 branches, both comprising dicots, had none at a posterior probability of 0.6. This indicates that the b2 and b4 branches were more conserved than the b3 and b5 branches and that the evolutionary steps from b1 to b3 and b5 in subfamily B were rather complex (Fig. 2; Table 2).
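The likelihood ratio tests used in these comparisons take twice the log-likelihood difference between nested models and compare it with a chi-squared distribution. The sketch below implements that test with the standard library only. The log-likelihoods are hypothetical, not values from Tables 1 and 2, and the degrees of freedom (4 for M0 vs M3 with three ω classes, 1 for a branch-site test) follow common PAML practice rather than anything stated in this article.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df >= 1.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with a = df / 2, y = x / 2.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)                # exact result for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))     # exact result for df = 1
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2 * delta lnL, P-value) for nested models."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for M0 (null) and M3 (alternative).
stat, p = likelihood_ratio_test(-15230.4, -15012.9, df=4)
print(f"2dlnL = {stat:.1f}, P = {p:.3g}")
```

For the branch-site test, ω2 = 1 lies on the boundary of the parameter space; a plain χ² with 1 degree of freedom, as above, is the conservative choice, whereas a 50:50 mixture of a point mass at 0 and χ² with 1 degree of freedom halves the P-value.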

Table 1 Parameters in the site analysis, branch and branch-site analyses of NADP-ME-A
Table 2 Parameters in the site analysis, branch and branch-site analyses of NADP-ME-B

For the PPDK gene family, the LRT comparing site models M0 and M3 gave P < 0.001, indicating that ω was not constant across sites, as for the NADP-ME gene family (Table 3). Branches a-e were assigned as foreground branches in the branch model; their ω values were much smaller than 1, suggesting purifying selection. Interestingly, ω increased gradually from branch a to branch e: a (0.0006) < b (0.024) < c (0.026) < d (0.078) < e (0.284). This trend suggests that PPDK genes were under strong purifying selection in lower plants before the divergence of monocots and dicots. After monocots and dicots diverged from lower plants, the PPDK genes were duplicated. The ancestral branches of dicots and monocots (b, c, d) remained under strong purifying selection, whereas purifying selection was weaker on branch e, which contains C4 plants (Fig. 3; Table 3). The branch-site model showed that the proportion of positively selected sites was close to 0 on branches a-d but 4.4% on branch e. The numbers of positively selected sites on branches a-e were 1, 1, 0, 0 and 8 at a posterior probability of 0.6. Branch e had significantly more positively selected sites than the other four branches (P < 0.0001) (Fig. 3; Table 3).

Table 3 Parameters in the site analysis, branch and branch-site analyses of PPDK

Protein structural characteristics of NADP-ME and PPDK

Based on the phylogenetic relationships and positive selection analysis above, we conducted detailed structural and functional analyses using the protein sequence alignments of NADP-ME in the a5 branch, which contains monocots, and PPDK in the e branch, which contains C4 plants. Cre06.g268750.t1.2 (a5 branch) and Cre10.g424750.t1.2 (e branch) were used as reference sequences. Sites 94H and 196H in the a5 branch (Fig. 4) and 95A and 559E in the e branch (Fig. 5) were significantly positively selected at a posterior probability threshold of 95%. Conserved and highly conserved regions are marked in the alignments.
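The two posterior probability cut-offs used in this study (0.6 for counting sites and 95% for the sites shown in Figs. 4 and 5) amount to filtering a per-site list, such as the Bayes empirical Bayes (BEB) list that codeml reports for branch-site Model A. The sketch below applies a threshold and adds the star labels used in the figure legends; the site list is hypothetical and is not output from this study.

```python
def mark_sites(sites, threshold=0.95):
    """Keep sites whose posterior probability exceeds `threshold`.

    Each site is (position, residue, posterior_probability). Labels follow
    the figure legends: '*' for P > 0.95 and '**' for P > 0.99; sites kept
    under a lower threshold but with P <= 0.95 get no star.
    """
    kept = []
    for position, residue, prob in sites:
        if prob <= threshold:
            continue
        if prob > 0.99:
            stars = "**"
        elif prob > 0.95:
            stars = "*"
        else:
            stars = ""
        kept.append(f"{position}{residue}{stars}")
    return kept


# Hypothetical per-site posterior probabilities for one foreground branch.
example = [(12, "K", 0.62), (40, "S", 0.97), (77, "L", 0.995), (105, "T", 0.58)]
print(mark_sites(example))                 # ['40S*', '77L**']
print(mark_sites(example, threshold=0.6))  # ['12K', '40S*', '77L**']
```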

Fig. 4

Multiple alignment of the amino acid sequences of NADP-ME in the a5 branch. Cre06.g268750.t1.2, GRMZM2G085747_P05, Sobic.001G201700.1, Seita.9G200600.1, Sevir.9G199800.1, Bradi3g30230.1, HORVU1Hr1G045720.1, LOC_Os10g35960.1, Gorai.007G097100.1, Aco007622.1, and GSMUA_Achr1P00210_001 represent NADP-ME genes of Chlamydomonas reinhardtii, Zea mays, Sorghum bicolor, Setaria italica, Setaria viridis, Brachypodium distachyon, Hordeum vulgare, Oryza sativa, Gossypium raimondii, Ananas comosus, and Musa acuminata, respectively. Positively selected sites for NADP-ME in these 11 sequences were marked and displayed with ESPript 3.0 (http://espript.ibcp.fr/ESPript/cgi-bin/ESPript.cgi). Cre06.g268750.t1.2 was used as the reference sequence. Posterior probabilities (P) are indicated: *, P > 95%; **, P > 99%. Conserved regions are boxed, and highly conserved sites are in red

Fig. 5

Multiple alignment of the amino acid sequences of PPDK in the e branch. Cre10.g424750.t1.2, Seita.9G354600.1, Sevir.9G360400.1, Pahal.9G416400.1 and Sobic.001G326900.1 represent PPDK genes of Chlamydomonas reinhardtii, Setaria italica, Setaria viridis, Panicum hallii, and Sorghum bicolor, respectively. Positively selected sites for PPDK in these five sequences were marked and displayed with ESPript 3.0 (http://espript.ibcp.fr/ESPript/cgi-bin/ESPript.cgi). Cre10.g424750.t1.2 was used as the reference sequence. Posterior probabilities (P) are indicated: *, P > 95%; **, P > 99%. Conserved regions are boxed, and highly conserved sites are in red

Distribution of positively selected sites on the three-dimensional structures of NADP-ME and PPDK

Using the three-dimensional (3D) models of Seita.9G200600.1 and Seita.9G354600.1 as examples, we analyzed the positions of the positively selected sites. As shown in Fig. 6a, the positively selected sites 94H and 196H in the a5 branch of NADP-ME-A mapped to sites 148S and 370W of Seita.9G200600.1. Similarly, the positively selected sites 95A and 559E in the e branch of PPDK mapped to sites 147R and 663H of Seita.9G354600.1 (Fig. 6b). In the 3D models, yellow indicates helix regions, red indicates sheet regions, and blue marks the positively selected sites. Sites 148S, 147R and 663H were located in helix regions, and 370W was located in a sheet region (Fig. 6).
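Translating a site number on the reference sequence (for example, 94 on Cre06.g268750.t1.2) into the residue number of another sequence in the same alignment is a matter of counting non-gap characters column by column. The function below is a generic sketch of that mapping, not necessarily the procedure the authors used; the toy aligned sequences are hypothetical.

```python
def map_position(aligned_ref, aligned_target, ref_pos):
    """Map a 1-based residue number in the reference onto the target sequence.

    Both strings are rows of the same alignment, with '-' marking gaps.
    Returns (target_residue_number, target_residue), or None when the
    target has a gap in that alignment column.
    """
    if len(aligned_ref) != len(aligned_target):
        raise ValueError("aligned sequences must have the same length")
    ref_count = 0
    target_count = 0
    for ref_char, target_char in zip(aligned_ref, aligned_target):
        if target_char != "-":
            target_count += 1
        if ref_char == "-":
            continue
        ref_count += 1
        if ref_count == ref_pos:
            return None if target_char == "-" else (target_count, target_char)
    raise ValueError("ref_pos is beyond the end of the reference sequence")


# Hypothetical toy alignment: residue 3 (L) of the reference is residue 4 of the target.
print(map_position("MK-LAH", "MKSL-H", 3))  # (4, 'L')
print(map_position("MK-LAH", "MKSL-H", 4))  # None
```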

Fig. 6

Three-dimensional models of Setaria italica (a) Seita.9G200600.1 and (b) Seita.9G354600.1. Yellow indicates helix regions, red indicates sheet regions, and blue marks the positively selected sites. 148S, 147R and 663H were located in helix regions and 370W in a sheet region

Expression analysis of foxtail millet NADP-ME and PPDK genes determined by qRT-PCR

Based on the phylogenetic relationships (Figs. 1, 2 and 3), we selected 6 NADP-ME and 2 PPDK genes of foxtail millet for qRT-PCR analysis after light treatment. The expression of all these genes was up-regulated after 1 h of light exposure. Except for two NADP-ME genes, Seita.5G314300.1 and Seita.9G200600.1, all genes showed higher expression levels after 6 h of light treatment (Fig. 7; Additional file 8: Table S5).

Fig. 7

Expression profiling of (a) PPDK and (b) NADP-ME genes in foxtail millet under light treatment. Times of light exposure are shown on the x-axis

Discussion

Evolution of the NADP-ME and PPDK gene families

C4 photosynthesis evolved approximately 30 million years ago [29] and subsequently arose in 62 independent evolutionary events in angiosperms [30]. Most C4 plants are monocots, including 4600 grass and 1600 sedge species, whereas only 1600 C4 species, from 16 families, are dicots; 75% of these belong to the Chenopodiaceae, Amaranthaceae, Euphorbiaceae and Asteraceae [31]. Previous research concluded that, besides the specialized cell structure of C4 plants, the enzymes PEPC, NADP-ME and PPDK are essential for C4 photosynthesis [32, 33]. Interestingly, increases in the numbers of NADP-ME and PPDK genes occurred later in evolution. Various studies have suggested that multiple duplication events occurred during plant evolution, including the γ event that separated monocots and dicots [34], the ρ event that occurred after the divergence of grasses and pineapple but before the divergence of wheat, maize and rice [35], and the τ and σ events that occurred in the Poaceae [36].

In this study, 14 and 7 NADP-ME genes were identified in soybean and maize, respectively (Additional file 1: Table S1). Although the maize genome (2300 Mb) is more than twice the size of the soybean genome (1100 Mb) [28, 37], maize has fewer NADP-ME genes than soybean, suggesting that expansion of the NADP-ME gene family was driven not by genome duplication but by different expansion patterns after the divergence of monocots and dicots [38, 39]. Of the 35 PPDK genes identified in the 25 species in this study, most species had only one or two (Additional file 2: Table S2). Compared with NADP-ME, PPDK genes were fewer but more stable in number during evolution.

NADP-ME and PPDK genes are widely present in photosynthetic plants such as algae, mosses, ferns, gymnosperms and angiosperms [40, 41]. The phylogenetic trees constructed in this study show that NADP-ME genes branched into subfamilies A and B. The B2 branch, containing all dicot species, evolved earlier than the B1 branch, containing all monocot species, suggesting that subfamily B evolved independently after the divergence of monocots and dicots, a step known as the γ event (Fig. 2) [34]. The phylogenetic tree of the PPDK gene family showed that monocots branched off and formed subfamily I before dicots formed subfamily II, indicating that the PPDK gene family also evolved independently after the divergence of monocots and dicots [34]. Within the Poaceae, genes clustered clearly by photosynthetic type; for example, the NADP-ME and PPDK genes of C4 plants were more closely related to each other than to those of C3 plants (Figs. 1, 2 and 3). We inferred that both the NADP-ME and PPDK gene families in the Poaceae underwent independent evolution after the ρ event in monocots [36]. The closer clustering of NADP-ME and PPDK in C4 plants than in C3 species may be related to the higher photosynthetic efficiency of C4 plants.

Identification of positively selected sites and their functional significance

This study used site, branch and branch-site models to investigate selection pressure on the NADP-ME and PPDK gene families. Neither the site nor the branch models detected positively selected sites, possibly because signals of positive selection were masked by purifying selection and neutral drift [42, 43]. The branch-site model is the most accurate of the three for this purpose and can detect positive selection acting on a few sites along specific branches [44]. It detected a total of 55 sites in the NADP-ME gene family that had undergone positive selection at a posterior probability of 0.6 (Tables 1 and 2): 8, 5, 14, 5, 4, 2 and 2 sites on branches a1-a7 of subfamily A (Fig. 1; Table 1), and 8, 0, 2, 0 and 5 sites on branches b1-b5 of subfamily B (Fig. 2; Table 2). For the PPDK gene family, the branch model gave ω values much smaller than 1 for all five foreground branches, indicating strong purifying selection, and the branch-site model detected 1, 1, 0, 0 and 8 positively selected sites on branches a-e (Table 3).

Both site and branch models suggested that the NADP-ME and PPDK gene families have undergone mostly purifying selection, maintaining normal gene function. The detection of only a few positively selected sites by the more accurate branch-site model suggests that only a few beneficial mutations occurred during evolution as plants adjusted to changing environments [45]. C4 plants can fix carbon efficiently at lower CO2 concentrations than their C3 counterparts, and this ability might be related to the positively selected sites found in both the NADP-ME and PPDK families of C4 plants.

Positive selection is the retention and spread of advantageous mutations throughout a population and has long been considered synonymous with shifts in protein function [45]. Quantifying positive selection has wide-ranging implications for understanding genome function and the maintenance of genetic variation [46]. In this study, four positively selected sites were identified at a posterior probability threshold of 95%: 94H and 196H in the a5 branch of NADP-ME, and 95A and 559E in the e branch of PPDK (Figs. 4 and 5). Previous studies showed that minimal changes in primary structure were responsible for the different kinetic behavior of NADP-ME and PPDK isoforms [47, 48]. To clarify the roles of positively selected sites in C4 plant evolution and to explore their relationship with the high photosynthetic rates of C4 plants, we built 3D models of Seita.9G200600.1 and Seita.9G354600.1. As shown in Fig. 6, positively selected sites 148S, 147R and 663H were located in helix regions, whereas 370W was located in a sheet region. These positively selected sites might reflect functional divergence between C4 and C3 plants that contributes to the higher photosynthetic capacity of C4 plants. The results also indicate that amino acid sites of NADP-ME and PPDK family members changed during plant evolution at different rates, and they provide a basis for prioritizing further functional analyses of NADP-ME and PPDK.

Further analysis of the a5 branch of NADP-ME and the e branch of PPDK showed that the C4 plant genes in the a5 branch include GRMZM2G085747_P05, Sobic.001G201700.1, Sevir.9G198800.1, Pahal.9G197100.1, Aco007622.1, and Seita.9G200600.1 (Fig. 1). A previous study showed that maize GRMZM2G085747 is involved in carbon fixation via the Calvin cycle in the bundle-sheath cells of maize leaves during photosynthesis [49]. The sorghum NADP-ME gene Sobic.001G201700 showed high transcript abundance in the C4 pathway [50]. Furthermore, a comparison of one C3 and 11 C4 grass species (Poaceae) showed that the transcript abundance of Sobic.001G201700 was consistently elevated in C4 species [24]. All members of the e branch of PPDK belong to C4 plants, including Seita.9G354600.1, Sevir.9G360400.1, Pahal.9G416400.1, and Sobic.001G326900.1 (Fig. 3). A previous study reported that Sobic.001G326900 showed high transcript abundance in the C4 pathway [50]. In this study, sites 94H and 196H in the a5 branch of NADP-ME and 95A and 559E in the e branch of PPDK were identified as positively selected at a posterior probability threshold of 95% (Figs. 4 and 5). Because GRMZM2G085747 and Sobic.001G201700 in the a5 branch of NADP-ME, and Sobic.001G326900 in the e branch of PPDK, are all involved in C4 photosynthesis [24, 49, 50], our results suggest that these sites were positively selected for high photosynthetic rates during C4 evolution.

Conclusions

The 162 NADP-ME and 35 PPDK genes characterized in 25 species had highly similar motif compositions within subfamilies. Phylogenetic analysis placed the NADP-ME and PPDK genes in four and two branches, respectively. NADP-ME and PPDK genes of C4 species were more closely related to each other than those of C3 species. Analyses of selective pressure on the NADP-ME and PPDK gene families identified four positively selected sites at a posterior probability threshold of 95%: 94H and 196H in the a5 branch of NADP-ME, and 95A and 559E in the e branch of PPDK. These sites were located in helix and sheet regions. We infer that positive selection drove the evolution of NADP-ME and PPDK in C4 species. This study contributes to a better understanding of the roles of NADP-ME and PPDK in C3 and C4 species and provides insights into the evolutionary biology of C4 plants.

Methods

Dataset

Conserved NADP-ME and PPDK protein sequences of Arabidopsis and rice were obtained from the public databases UniProt (https://www.uniprot.org/) and TAIR (https://www.arabidopsis.org/). All NADP-ME and PPDK protein sequences and CDS (coding sequences) of the 25 species, representing algal, moss, Lycopodiophyta, monocotyledon and dicotyledon species, were obtained from Phytozome V12 (https://phytozome.jgi.doe.gov/pz/portal.html) and incorporated into a local database. Each sequence was compared with the NADP-ME and PPDK protein sequences of the other species and with those of Arabidopsis and rice using blastp with an E-value threshold of 1e-5. CDD and Pfam were used to check whether the sequences contained conserved NADP-ME and PPDK protein domains, and sequences with incomplete domain structures were removed.

Molecular weights and isoelectric points of the NADP-ME and PPDK protein sequences were calculated with Expasy (https://web.expasy.org/compute_pi/).

Construction of phylogenetic trees and analysis of conserved protein sequences

Multiple sequence alignments of candidate NADP-ME and PPDK protein sequences were generated with MUSCLE 3.8.31 [51]. Neighbor-joining (NJ) trees were constructed in MEGA 7.0 using the Poisson model with 1000 bootstrap replicates, pairwise deletion of gaps, and default values for other parameters [52]. Maximum likelihood (ML) trees for NADP-ME and PPDK were constructed with IQ-TREE 1.6.5 [53] using 1000 bootstrap replicates, with the substitution model selected by the Bayesian Information Criterion (BIC). The optimal model of the ML trees was estimated using the parameter M: ONLY TEST. Phylogenetic trees were visualized with FigTree.
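BIC-based model selection scores each candidate substitution model as k ln(n) - 2 ln L, where k is the number of free parameters, n the sample size and L the maximized likelihood, and keeps the lowest score. The sketch below shows only that arithmetic. It assumes n is the number of alignment columns, and the log-likelihoods and parameter counts are hypothetical rather than IQ-TREE output for these gene families.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L); lower is better."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(candidates, n_sites):
    """Pick the candidate with the lowest BIC.

    `candidates` maps a model name to (log_likelihood, number_of_free_parameters).
    """
    scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical fits on a 500-column alignment (not values from this study).
candidates = {
    "LG+G4": (-12000.0, 50),
    "JTT+G4": (-12010.0, 50),
    "LG+I+G4": (-11998.0, 51),
}
best, scores = best_model(candidates, n_sites=500)
print(best)  # LG+G4
```

In this toy case the extra invariant-sites parameter of LG+I+G4 improves ln L by 2, which is less than the ln(500)/2 ≈ 3.1 needed to offset its BIC penalty, so the simpler model wins.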

Conserved protein motifs were analyzed with MEME 2.12.0 using -nmotifs 20, -minw 10 and -maxw 50 [54], with default values for other parameters. Results were visualized with TBtools.

Analysis of natural selection pressure

Protein sequences of NADP-ME and PPDK were aligned with MUSCLE 3.8.31, and the CDS and aligned protein sequences were submitted to the online tool PAL2NAL (http://www.bork.embl.de/pal2nal/) for codon alignment. Selection pressure was calculated with PAML 4.9e, with ω < 1 indicating purifying selection, ω = 1 neutrality, and ω > 1 positive selection [55]. Three approaches were applied: (1) site-specific models, comparing models M0 and M3; (2) branch-specific models, which compare foreground branches with background branches to test for positive selection; and (3) branch-site models (Model A), which test for positively selected sites on foreground branches. Nested models were compared with likelihood ratio tests using the chi-squared distribution.
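The three model families above correspond to codeml control-file settings. The sketch below generates one control file per model. The file names are hypothetical, the non-model settings (CodonFreq, kappa, initial ω) are common choices rather than values reported in this article, and the foreground branches are assumed to be labelled "#1" in the Newick tree, as codeml expects for branch and branch-site models.

```python
# Hypothetical file names; foreground branches assumed to carry "#1" in the tree.
COMMON = {
    "seqfile": "codon_alignment.phy",
    "treefile": "tree_foreground.nwk",
    "noisy": 3,
    "verbose": 1,
    "runmode": 0,    # use the user-supplied tree
    "seqtype": 1,    # codon sequences
    "CodonFreq": 2,  # F3x4 codon frequencies
    "icode": 0,      # universal genetic code
    "fix_kappa": 0,  # estimate kappa
    "kappa": 2,      # initial kappa
}

MODELS = {
    "M0": {"model": 0, "NSsites": 0, "fix_omega": 0, "omega": 0.5},
    "M3": {"model": 0, "NSsites": 3, "ncatG": 3, "fix_omega": 0, "omega": 0.5},
    "branch_two_ratio": {"model": 2, "NSsites": 0, "fix_omega": 0, "omega": 0.5},
    "branch_site_A": {"model": 2, "NSsites": 2, "fix_omega": 0, "omega": 1.5},
    "branch_site_null": {"model": 2, "NSsites": 2, "fix_omega": 1, "omega": 1},
}


def codeml_ctl(name):
    """Return the text of a codeml control file for one of the models above."""
    settings = {**COMMON, **MODELS[name], "outfile": f"{name}.mlc"}
    return "".join(f"{key} = {value}\n" for key, value in settings.items())


print(codeml_ctl("branch_site_A"))
```

Each text can be saved as, for example, branch_site_A.ctl and run with `codeml branch_site_A.ctl`. The pairs M0/M3, branch_two_ratio against a one-ratio run (M0), and branch_site_A/branch_site_null are the nested comparisons used in the likelihood ratio tests.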

Positive selection in protein sequences and structure analysis

The aligned CDS and amino acid sequences were submitted to PAL2NAL (http://www.bork.embl.de/pal2nal/), a web tool for constructing codon alignments, and the resulting alignments were visualized with ESPript v3 (http://espript.ibcp.fr/ESPript/cgi-bin/ESPript.cgi).
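PAL2NAL threads each CDS onto its aligned protein so that every amino acid column becomes a codon column. The function below is a simplified sketch of that idea, not PAL2NAL itself. It assumes the CDS is in frame and encodes the ungapped protein residue by residue, and it performs none of the consistency checks that PAL2NAL applies. The example sequences are hypothetical.

```python
def protein_to_codon_alignment(aligned_protein, cds):
    """Replace each residue of an aligned protein with its codon from the CDS.

    Gap characters ('-') become '---'. Assumes the CDS starts in frame and
    encodes the ungapped protein residue by residue; a trailing stop codon
    (or any other leftover codons) is ignored.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    n_residues = len(aligned_protein.replace("-", ""))
    if len(codons) < n_residues:
        raise ValueError("CDS has fewer codons than the protein has residues")
    codon_iter = iter(codons)
    columns = ["---" if residue == "-" else next(codon_iter) for residue in aligned_protein]
    return "".join(columns)


# Hypothetical two-sequence protein alignment and the matching CDS.
print(protein_to_codon_alignment("M-KL", "ATGAAACTGTAA"))  # ATG---AAACTG
print(protein_to_codon_alignment("MRKL", "ATGCGTAAGCTT"))  # ATGCGTAAGCTT
```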

The full-length protein sequences of foxtail millet (Setaria italica) NADP-ME and PPDK were submitted to the I-TASSER server (https://zhanglab.ccmb.med.umich.edu/I-TASSER/) to predict their 3D structures. Positively selected sites identified at a posterior probability threshold of 95% in the branch-site model were mapped onto the 3D structures with PyMOL v2.3 (http://PyMOLwiki.org).

Plant growth and harvesting

Foxtail millet cultivar Yugu 1, used for qRT-PCR, was provided by the Anyang Institute of Agricultural Sciences, Henan. Seeds were surface-sterilized in 0.5% NaClO for 1 min, rinsed three times with sterile distilled water, plated on GM agar medium, and stratified in darkness at 4 °C for 3 days. After germination, seedlings were grown in darkness at 27 °C for 3 days and then transferred to a growth chamber at 27 °C under light (600 μmol m−2 s−1). After 0, 1 and 6 h of light treatment, leaves were collected, immediately frozen in liquid nitrogen and stored at −80 °C for RNA isolation. All samples had three biological replicates.

qRT-PCR

Primers were designed with Primer3 using cDNA sequences from Setaria italica v2.2 (phytozome.jgi.doe.gov) and are listed in Additional file 9: Table S6. qRT-PCR reactions were performed in triplicate using the SYBR® Green PCR Master Mix Kit (Applied Biosystems, GA, USA). Data acquisition and analysis were performed on an ABI7900 system (Applied Biosystems), and relative expression levels were calculated using the 2^−ΔΔCT method.
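The 2^−ΔΔCT calculation normalizes each target gene to a reference gene within the same sample (ΔCT) and then to a calibrator sample (ΔΔCT). The minimal Python sketch below assumes roughly 100% amplification efficiency for both genes (the usual assumption of this method), uses the mean of technical triplicates, and treats the 0 h sample as the calibrator. The reference gene is not named in this section, so the argument names and the Ct values in the usage block are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_expression(target_sample: Sequence[float], ref_sample: Sequence[float],
                        target_calibrator: Sequence[float],
                        ref_calibrator: Sequence[float]) -> float:
    """Fold change by the 2^-ddCt method from technical-replicate Ct values."""
    # dCt: target-gene Ct minus reference-gene Ct within the same sample
    d_ct_sample = mean(target_sample) - mean(ref_sample)
    d_ct_calibrator = mean(target_calibrator) - mean(ref_calibrator)
    # ddCt: sample dCt relative to the calibrator (e.g. 0 h) dCt
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: a target gene after 6 h of light vs the 0 h calibrator.
    fold = relative_expression([21.5, 22.0, 22.5], [17.75, 18.0, 18.25],
                               [24.5, 25.0, 25.5], [17.5, 18.0, 18.5])
    print(fold)  # 8.0
```

Here ΔCT is 4.0 in the treated sample and 7.0 in the calibrator, so ΔΔCT = −3 and the target is eight-fold up-regulated. Values above 1 indicate up-regulation relative to the calibrator.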

Availability of data and materials

All data generated or analyzed during this study are included in the manuscript and its supplementary information files.

Abbreviations

3D:

Three-dimensional

BIC:

Bayesian Information Criterion

CAM:

Crassulacean acid metabolism

CDS:

Coding sequences

LRT:

Likelihood ratio test

Lsu:

Large subunit

ML:

Maximum likelihood

NADP-ME:

NADP-malic enzyme

NJ:

Neighbor joining

PEPC:

Phosphoenolpyruvate carboxylase

PPDK:

Pyruvate orthophosphate dikinase

qRT-PCR:

Quantitative RT-PCR

References

  1. Wohlfahrt G, Gu L. The many meanings of gross photosynthesis and their implication for photosynthesis research from leaf to globe. Plant Cell Environ. 2015;38:2500–7.

  2. Aldous SH, Weise SE, Sharkey TD, Waldera-Lupa DM, Stuhler K, Mallmann J, et al. Evolution of the phosphoenolpyruvate carboxylase protein kinase family in C3 and C4 Flaveria spp. Plant Physiol. 2014;165:1076–91.

  3. Yin H, Guo HB, Weston DJ, Borland AM, Ranjan P, Abraham PE, et al. Diel rewiring and positive selection of ancient plant proteins enabled evolution of CAM photosynthesis in Agave. BMC Genomics. 2018;19:588.

  4. von Caemmerer S, Ghannoum O, Furbank RT. C4 photosynthesis: 50 years of discovery and innovation. J Exp Bot. 2017;68:97–102.

  5. Wang S, Tholen D, Zhu XG. C4 photosynthesis in C3 rice: a theoretical analysis of biochemical and anatomical factors. Plant Cell Environ. 2017;40:80–94.

  6. Yin X, Struik PC. Can increased leaf photosynthesis be converted into higher crop mass production? A simulation study for rice using the crop model GECROS. J Exp Bot. 2017;68:2345–60.

  7. Covshoff S, Szecowka M, Hughes TE, Smith-Unna R, Kelly S, Bailey KJ, et al. C4 photosynthesis in the rice paddy: insights from the noxious weed Echinochloa glabrescens. Plant Physiol. 2016;170:57–73.

  8. Chen YB, Lu TC, Wang HX, Shen J, Bu TT, Chao Q, et al. Posttranslational modification of maize chloroplast pyruvate orthophosphate dikinase reveals the precise regulatory mechanism of its enzymatic activity. Plant Physiol. 2014;165:534–49.

  9. Chastain CJ, Baird LM, Walker MT, Bergman CC, Novbatova GT, Mamani-Quispe CS, et al. Maize leaf PPDK regulatory protein isoform-2 is specific to bundle sheath chloroplasts and paradoxically lacks a Pi-dependent PPDK activation activity. J Exp Bot. 2018;69:1171–81.

  10. Fukayama H, Tsuchida H, Agarie S, Nomura M, Onodera H, Ono K, et al. Significant accumulation of C4-specific pyruvate, orthophosphate dikinase in a C3 plant, rice. Plant Physiol. 2001;127:1136–46.

  11. Matsuoka M, Furbank RT, Fukayama H, Miyao M. Molecular engineering of C4 photosynthesis. Annu Rev Plant Biol. 2001;52:297–314.

  12. Wang X, Gowik U, Tang H, Bowers JE, Westhoff P, Paterson AH. Comparative genomic analysis of C4 photosynthetic pathway evolution in grasses. Genome Biol. 2009;10:R68.

  13. Arias CL, Pavlovic T, Torcolese G, Badia MB, Gismondi M, Maurino VG, et al. NADP-dependent malic enzyme 1 participates in the abscisic acid response in Arabidopsis thaliana. Front Plant Sci. 2018;9:1637.

  14. Sonawane BV, Sharwood RE, Whitney S, Ghannoum O. Shade compromises the photosynthetic efficiency of NADP-ME less than that of PEP-CK and NAD-ME C4 grasses. J Exp Bot. 2018;69:3053–68.

  15. Yin X, Struik PC. The energy budget in C4 photosynthesis: insights from a cell-type-specific electron transport model. New Phytol. 2018;218:986–98.

  16. Gerrard Wheeler MC, Tronconi MA, Drincovich MF, Andreo CS, Flügge UI, Maurino VG. A comprehensive analysis of the NADP-malic enzyme gene family of Arabidopsis thaliana. Plant Physiol. 2005;139:39–51.

  17. Christin PA, Samaritani E, Petitpierre B, Salamin N, Besnard G. Evolutionary insights on C4 photosynthetic subtypes in grasses from genomics and phylogenetics. Genome Biol Evol. 2009;1:221–30.

  18. Barreiro LB, Quintana-Murci L. From evolutionary genetics to human immunology: how selection shapes host defence genes. Nat Rev Genet. 2010;11:17–30.

  19. Rosnow JJ, Edwards GE, Roalson EH. Positive selection of Kranz and non-Kranz C4 phosphoenolpyruvate carboxylase amino acids in Suaedoideae (Chenopodiaceae). J Exp Bot. 2014;65:3595–607.

  20. Kapralov MV, Smith JAC, Filatov DA. Rubisco evolution in C4 eudicots: an analysis of Amaranthaceae sensu lato. PLoS One. 2012;7:e52974.

  21. Kapralov MV, Kubien DS, Andersson I, Filatov DA. Changes in Rubisco kinetics during the evolution of C4 photosynthesis in Flaveria (Asteraceae) are associated with positive selection on genes encoding the enzyme. Mol Biol Evol. 2011;28:1491–503.

  22. Christin P-A, Salamin N, Savolainen V, Duvall MR, Besnard G. C4 photosynthesis evolved in grasses via parallel adaptive genetic changes. Curr Biol. 2007;17:1241–7.

  23. Besnard G, Muasya AM, Russier F, Roalson EH, Salamin N, Christin PA. Phylogenomics of C4 photosynthesis in sedges (Cyperaceae): multiple appearances and genetic convergence. Mol Biol Evol. 2009;26:1909–19.

  24. Watson-Lazowski A, Papanicolaou A, Sharwood R, Ghannoum O. Investigating the NAD-ME biochemical pathway within C4 grasses using transcript and amino acid variation in C4 photosynthetic genes. Photosynth Res. 2018;138:233–48.

  25. Wang L, Peterson RB, Brutnell TP. Regulatory mechanisms underlying C4 photosynthesis. New Phytol. 2011;190:9–20.

  26. Saigo M, Alvarez CE, Andreo CS, Drincovich MF. Plastidial NADP-malic enzymes from grasses: unraveling the way to the C4 specific isoforms. Plant Physiol Biochem. 2013;63:39–48.

  27. Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, et al. The Sorghum bicolor genome and the diversification of grasses. Nature. 2009;457:551–6.

  28. Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, et al. The B73 maize genome: complexity, diversity, and dynamics. Science. 2009;326:1112–5.

  29. Aubry S, Kelly S, Kümpers BM, Smith-Unna RD, Hibberd JM. Deep evolutionary comparison of gene expression identifies parallel recruitment of trans-factors in two independent origins of C4 photosynthesis. PLoS Genet. 2014;10:e1004365.

  30. Sage RF, Christin PA, Edwards EJ. The C4 plant lineages of planet Earth. J Exp Bot. 2011;62:3155–69.

  31. Gowik U, Westhoff P. The path from C3 to C4 photosynthesis. Plant Physiol. 2011;155:56–63.

  32. Wang Y, Bräutigam A, Weber AP, Zhu XG. Three distinct biochemical subtypes of C4 photosynthesis? A modelling analysis. J Exp Bot. 2014;65:3567–78.

  33. Jiang L, Chen YB, Zheng J, Chen Z, Liu Y, Tao Y, et al. Structural basis of reversible phosphorylation by maize pyruvate orthophosphate dikinase regulatory protein. Plant Physiol. 2016;170:732–41.

  34. Bowers JE, Chapman BA, Rong J, Paterson AH. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature. 2003;422:433–8.

  35. Ming R, Van Buren R, Wai CM, Tang H, Schatz MC, Bowers JE, et al. The pineapple genome and the evolution of CAM photosynthesis. Nat Genet. 2015;47:1435–42.

  36. McKain MR, Tang H, McNeal JR, Ayyampalayam S, Davis JI, dePamphilis CW, et al. A phylogenomic assessment of ancient polyploidy and genome evolution across the Poales. Genome Biol Evol. 2016;8:1150–64.

  37. Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, et al. Genome sequence of the palaeopolyploid soybean. Nature. 2010;463:178–83.

  38. Qin Z, Wang Y, Wang Q, Li A, Hou F, Zhang L. Evolution analysis of simple sequence repeats in plant genome. PLoS One. 2015;10:e0144108.

  39. D'Hont A, Denoeud F, Aury JM, Baurens FC, Carreel F, Garsmeur O, et al. The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature. 2012;488:213–7.

  40. Motti CA, Bourne DG, Burnell JN, Doyle JR, Haines DS, Liptrot CH, et al. Screening marine fungi for inhibitors of the C4 plant enzyme pyruvate phosphate dikinase: unguinol as a potential novel herbicide candidate. Appl Environ Microbiol. 2007;73:1921–7.

  41. Tronconi MA, Andreo CS, Drincovich MF. Chimeric structure of plant malic enzyme family: different evolutionary scenarios for NAD- and NADP-dependent isoforms. Front Plant Sci. 2018;9:565.

  42. Lawrie DS, Messer PW, Hershberg R, Petrov DA. Strong purifying selection at synonymous sites in D. melanogaster. PLoS Genet. 2013;9:e1003527.

  43. Smith MD, Wertheim JO, Weaver S, Murrell B, Scheffler K, Kosakovsky Pond SL. Less is more: an adaptive branch-site random effects model for efficient detection of episodic diversifying selection. Mol Biol Evol. 2015;32:1342–53.

  44. Gharib WH, Robinson-Rechavi M. The branch-site test of positive selection is surprisingly robust but lacks power under synonymous substitution saturation and variation in GC. Mol Biol Evol. 2013;30:1675–86.

  45. Qian J, Liu Y, Chao N, Ma C, Chen Q, Sun J, et al. Positive selection and functional divergence of farnesyl pyrophosphate synthase genes in plants. BMC Mol Biol. 2017;18:3.

  46. Williamson RJ, Josephs EB, Platts AE, Hazzouri KM, Haudry A, Blanchette M, Wright SI. Evidence for widespread positive and negative selection in coding and conserved noncoding regions of Capsella grandiflora. PLoS Genet. 2014;10:e1004622.

  47. Wei M, Li Z, Ye D, Herzberg O, Dunaway-Mariano D. Identification of domain-domain docking sites within Clostridium symbiosum pyruvate phosphate dikinase by amino acid replacement. J Biol Chem. 2000;275:41156–65.

  48. Wheeler MC, Arias CL, Tronconi MA, Maurino VG, Andreo CS, Drincovich MF. Arabidopsis thaliana NADP-malic enzyme isoforms: high degree of identity but clearly distinct properties. Plant Mol Biol. 2008;67:231–42.

  49. Li C, Huang Y, Huang R, Wu Y, Wang W. The genetic architecture of amylose biosynthesis in maize kernel. Plant Biotechnol J. 2018;16:688–95.

  50. Döring F, Streubel M, Bräutigam A, Gowik U. Most photorespiratory genes are preferentially expressed in the bundle sheath cells of the C4 grass Sorghum bicolor. J Exp Bot. 2016;67:3053–64.

  51. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–7.

  52. Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33:1870–4.

  53. Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32:268–74.

  54. Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009;37:W202–8.

  55. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24:1586–91.

Acknowledgements

We gratefully acknowledge Robert A. McIntosh (University of Sydney) for help with English editing.

Funding

This work was supported by grants from the National Key R&D Program of Shanxi Province (201803D221019–5, 2015-TN-09) and the Science & Technology Innovation Foundation of Shanxi Agricultural University (2016YJ05). The funders played no role in the design of the study; the collection, analysis and interpretation of data; or the writing of the manuscript.

Author information

Contributions

JG and PYG contributed to overall design of the experiments, provided advice for data analysis, and assisted in writing the manuscript. WPS analyzed the data, carried out the experiments and wrote the manuscript. LQY and JHG helped to perform phylogeny, selection pressure and protein modelling analyses. JMW, XYY and SQD participated in the design of experiments. All authors have read and approved the manuscript.

Corresponding authors

Correspondence to Jie Guo or Pingyi Guo.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1:

Table S1. Characterization of NADP-MEs in 25 plant species.

Additional file 2:

Table S2. Characterization of PPDKs in 25 plant species.

Additional file 3:

Table S3. Consensus sequences of motifs 1–20 in NADP-MEs.

Additional file 4:

Table S4. Consensus sequences of motifs 1–20 in PPDKs.

Additional file 5:

Figure S1. Conserved protein motifs in NADP-ME genes of 25 plant species. Motif numbers 1–20 are displayed as different colored boxes. Sequence information for each motif is provided in Additional file 3: Table S3.

Additional file 6:

Figure S2. Conserved protein motifs in PPDK genes of 25 plant species. Motif numbers 1–20 are displayed as different colored boxes. Sequence information for each motif is provided in Additional file 4: Table S4.

Additional file 7:

Figure S3. Phylogenetic tree of 162 NADP-ME genes from 25 species. Branches A1, A2, B1 and B2 are shown in red, pink, blue and green, respectively.

Additional file 8:

Table S5. The raw data of qRT-PCR.

Additional file 9:

Table S6. Primer sequences of 8 genes used for qRT-PCR validation.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Shi, W., Yue, L., Guo, J. et al. Identification and evolution of C4 photosynthetic pathway genes in plants. BMC Plant Biol 20, 132 (2020). https://doi.org/10.1186/s12870-020-02339-x

  • DOI: https://doi.org/10.1186/s12870-020-02339-x

Keywords