SNP markers for early identification of high molecular weight glutenin subunits (HMW-GSs) in bread wheat

A set of eight SNP markers was developed to facilitate the early selection of HMW-GS alleles in breeding programmes. In bread wheat (Triticum aestivum), the high molecular weight glutenin subunits (HMW-GSs) are the most important determinants of technological quality. Known to be very diverse, HMW-GSs are encoded by the tightly linked genes Glu-1-1 and Glu-1-2. Alleles that improve the quality of dough have been identified. To date, sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE) of grain proteins has been the most widely used method for their identification. To facilitate the early selection of HMW-GS alleles in breeding programmes, we developed DNA-based molecular markers. For each accession of a core collection (n = 364 lines) representative of worldwide bread wheat diversity, HMW-GSs were characterized by both genotyping and SDS-PAGE. Based on electrophoresis, we observed at least 8, 22 and 9 different alleles at the Glu-A1, Glu-B1 and Glu-D1 loci, respectively, including new variants. We designed a set of 17 single-nucleotide polymorphism (SNP) markers that were representative of the most frequent SDS-PAGE alleles at each locus. At Glu-A1 and Glu-D1, two and three marker-based haplotypes, respectively, captured the diversity of the SDS-PAGE alleles rather well. Discrepancies were found mainly for the Glu-B1 locus. However, statistical tests revealed that two markers at each Glu-B1 gene and their corresponding haplotypes were more significantly associated with the rheological properties of the dough than were the relevant SDS-PAGE alleles. To conclude, this study demonstrates that the SNP markers developed provide additional information on HMW-GS diversity. Two markers at Glu-A1, four at Glu-B1 and two at Glu-D1 constitute a useful toolbox for breeding wheat to improve end-use value.


Communicated by Aimin Zhang.

Electronic supplementary material
The online version of this article (https://doi.org/10.1007/s00122-019-03505-y) contains supplementary material, which is available to authorized users.

Introduction
Wheat is one of the three most important crops in the world with production of about 729 million tonnes in 2014 (http://faostat3.fao.org). In this context, wheat includes tetraploid species (2n = 28) such as durum wheat (Triticum turgidum ssp. durum) and hexaploid species (2n = 42) such as bread wheat (T. aestivum ssp. aestivum). Wheat is thus a major component of the human diet worldwide, often being the main source of energy. Carbohydrates from bread alone contribute about 20% of energy intake (Shewry and Hey 2015). Wheat is also an important plant source of protein providing on average 20% of the total protein in the human diet. Almost all the world's wheat production is used after industrial processing. Each type of wheat end product requires particular qualities for processing that are mainly based on the properties of dough, determined by unique combinations of cohesiveness and viscoelasticity due to gluten. As described in Shewry et al. (2002), gluten is a continuous network formed when wheat seed storage proteins (SSP) are mixed with water. Therefore, SSP concentration and composition are largely responsible for wheat end-use quality. While gluten is necessary for processing wheat-based products, it also triggers gluten-related disorders in humans, like allergies, coeliac disease and non-coeliac gluten sensitivity (Sapone et al. 2012).
The wheat SSP represent about 80% of the total protein in the grain. They mainly consist of polymeric glutenins and monomeric gliadins. According to their electrophoretic mobility, gliadins and glutenins are subdivided into several fractions. Glutenins are classified as high molecular weight or low molecular weight glutenin subunits (HMW-GSs and LMW-GSs, respectively). Glutenins, in particular HMW-GSs, confer dough elasticity, while gliadins confer viscosity (MacRitchie 1999; Shewry et al. 2002). Dough quality results from the balance between these two properties or, in other words, the gliadin-to-glutenin ratio. A pioneer study (Payne et al. 1979) reported that HMW-GSs influence dough strength and this has been confirmed by numerous studies, reviewed by Shewry (2009). The effects of HMW-GSs on the rheological properties of dough are mainly explained by their β-spiral structure, which has intrinsic elasticity (Shewry et al. 2001), and their ability to form large polymers stabilized by interchain disulphide and hydrogen bonds (see the review of Shewry et al. 2002). HMW-GSs are therefore essential for dough viscoelasticity as they can form an elastic network that acts as the backbone of gluten, making them an attractive target for genetic engineering.
LMW-GSs are encoded by multigene families located at the orthologous Glu-3 loci. HMW-GSs are encoded by the Glu-1 loci, named Glu-A1, Glu-B1 and Glu-D1, located on the long arms of the homoeologous chromosomes of group 1 (Payne et al. 1987). Each locus comprises the two tightly linked genes Glu-1-1 and Glu-1-2 that encode x-type and y-type HMW-GS, respectively. In wheat cultivars, three to five of the six genes of this small multigene family are generally expressed, as some alleles are known to be null. For example, Glu-A1-2 is rarely expressed, and silent alleles at Glu-A1-1 and Glu-B1-2 have been reported. In addition, Glu-B1-1 may be duplicated, as in the Bx7OE allele, which leads to overexpression of the Bx7 subunit (Ragupathy et al. 2008).
The glutenin coding sequence is composed of a central repetitive domain and two unique sequence termini. The size of the repetitive domain is highly variable, which explains why the proteins can be easily distinguished by differences in electrophoretic mobility in sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE). According to Shewry et al. (1992), size separation of HMW-GSs makes it possible to identify 3 (called a, b and c), 11 (a to k) and 6 (a to f) allelic forms at Glu-A1, Glu-B1 and Glu-D1 loci, respectively. Less frequent alleles have been reported, for instance, three x-type HMW-GSs at the Glu-A1 locus (Margiotta et al. 1996; Gobaa et al. 2007; Ribeiro et al. 2013). The AAC glutenin allele database (http://www.aaccnet.org/initiatives/definitions/Pages/Gluten.aspx) contains records of the alleles of over 8500 wheat genotypes from around the world, giving an overview of the wide diversity of HMW-GSs. Each HMW-GS has different effects on end-use quality. Thus, to breed for end-use value, several studies have ranked HMW-GS alleles in order of their influence on flour quality (Branlard and Dardevet 1985; Payne et al. 1987) and highlighted the Glu-D1d (Dx5 + Dy10), Glu-B1b (Bx7 + By8) and Glu-B1c (Bx7 + By9) alleles as being strongly associated with high quality (Pirozi et al. 2008).
For plant breeding purposes, SDS-PAGE separation is still used to identify HMW-GSs as well as LMW-GSs and select those associated with high quality. The SDS-PAGE method requires flour in order to extract the grain protein, which means that plants need to be grown to a late development stage before testing them. The method is nondestructive in that only half a grain is needed for testing. The interpretation of gel images or electropherograms is not completely accurate, even when performed by experienced staff. Some HMW-GSs may be confused because of their quasi-identical electrophoretic mobility. Unambiguously classifying LMW-GSs by SDS-PAGE is also difficult because of their large number and the overlap of their mobility with that of gliadins in the gel. This method is time-consuming and not suitable for high-throughput analysis. Typing glutenins with molecular markers would solve many of the drawbacks of the SDS-PAGE method. Typing with molecular markers requires low amounts of DNA that can be easily extracted from leaves of seedlings. In addition, high-throughput genotyping methods based on single-nucleotide polymorphisms (SNPs) are now available, even for polyploid wheats (Bérard et al. 2009; Akhunov et al. 2009; Wang et al. 2014; Rimbert et al. 2017).
Since the 1990s, many attempts have been made to develop molecular markers from glutenin sequences, generally based on presence/absence or amplicon-size polymorphisms, as reviewed by Gale (2005) and later by Liu et al. (2012). Despite the difficulty of capturing the complexity of these gene families or of interpreting the results obtained, especially for LMW-GSs, some of these markers have been used successfully. For instance, Jin et al. (2011), Liu et al. (2010) and more recently Iba et al. (2018) have reported that the identification of HMW-GS or LMW-GS genes by DNA-based methods agrees with the results from SDS-PAGE. Despite continued work (see for instance Liu et al. 2008; Xu et al. 2008), other than the G-A change used to discriminate between the Dx2 and Dx5 alleles (Schwarz et al. 2003), few SNPs are available to distinguish between different HMW-GSs. This may be due to the difficulty of developing gene-specific markers for a multigenic family. The repetitive domain of HMW-GS genes also limits the choice of site for marker development.
The objective of our study was to develop and validate a set of SNPs to identify the main HMW-GSs. We decided to type SNPs by a flexible method based on competitive allele-specific PCR (KASP assay). Although the design of allele-specific primers is constrained by the polymorphism location, the common primer can be designed at a gene-specific location to facilitate base calling in the hexaploid wheat genome. The SNP markers designed were genotyped in a wheat core collection (Balfourier et al. 2007). The clustering observed differed slightly from that based on the SDS-PAGE profiles. The relationship between these markers and end-use quality was established by measuring alveographic parameters. Therefore, the set of markers from HMW-GS sequences developed in this study is of significant value to wheat breeders, who will be able to characterize HMW-GSs routinely and unambiguously from DNA.

Plant materials and phenotyping
The INRA core collection includes 366 accessions from 70 different countries, chosen to represent the genetic diversity present in cultivated wheat (Balfourier et al. 2007). These accessions are either landraces from the nineteenth century or cultivars from the twentieth century. In this work, 364 lines of the core collection were used. All the seeds used for DNA extraction, provided by the Biological Resources Centre on Small Grain Cereals (http://www6.clermont.inra.fr/umr1095_eng/Teams/Experimental-Infrastructure/Biological-Resources-Centre), were obtained from self-pollinated ears. Fresh leaves of five plants per accession were pooled, and bulk genomic DNA was extracted from 100 mg of frozen leaves with the BioSprint 96 kit using a BioSprint workstation (Qiagen).
All accessions of the core collection were grown at Clermont-Ferrand in 2006 as described in Bordes et al. (2008). Seeds were harvested in bulk and phenotyped for quality traits (Bordes et al. 2008). In the present work, we used average single grain dry mass (mg DM grain⁻¹), the total quantity of protein per grain (mg N grain⁻¹) calculated from the average single grain dry mass and wholemeal flour protein concentration (mg protein g DM⁻¹), grain hardness (dimensionless) and alveographic parameters, i.e., dough strength (W, 10⁻⁴ J), tenacity (P, mm H₂O) and extensibility (L, mm). Grain hardness and wholemeal flour protein concentration were determined by near-infrared spectroscopy. These data were provided by Bordes et al. (2008).
For each accession, proteins were extracted from 10 mg of wholemeal flour ground in a Cyclotec mill (sieve 0.75 mm, mill 6800, FOSS Electric A/S, Hillerod, Denmark). HMW-GSs were fractionated in vertical slabs using SDS-PAGE with a modified protocol based on that of Singh et al. (1991). They were identified according to the numbering system developed by Payne and Lawrence (1983) with the names available at https://shigen.nig.ac.jp/wheat/komugi/.

SNP discovery and genotyping
Sequences from the main alleles of the six HMW-GS genes were retrieved from GenBank (http://www.ncbi.nlm.nih.gov/genbank/). Many sequences are available for these genes, so a first set was collated to include at least one sequence per gene encompassing, where possible, the entire coding DNA sequence (CDS), the 5′ untranslated region (UTR), the promoter region (about 1-kb region upstream of the start codon) and the 3′ UTR (Table 1, Fig. S1). Then for each HMW-GS gene class, except the silent Glu-A1-2, another set of sequences was collated either from GenBank or from data obtained by Ravel et al. (2006; (Table 1). Each of the latter five files was aligned with Clustal X (Larkin et al. 2007), and alignments were improved manually. Allelic SNPs were detected in each of these alignments. The most relevant SNPs detected to discriminate HMW-GSs were retained for genotyping. SNPs within the repetitive domain were discarded as designing specific primers in this region was deemed impracticable.
KASPar assays (LGC Genomics, LLC, Beverly, MA, USA) were developed to genotype relevant SNPs. This method is based on competitive annealing of two labelled allele-specific primers (Gao et al. 2016). The 3′ end of each allele-specific primer was designed to target the allelic SNPs detected in the alignments for each HMW-GS gene class. A third common primer was designed in the vicinity of the other two to be copy-specific (for A, B or D chromosomes). The 3′ end of the common primer targeted polymorphisms between the six HMW-GS genes deduced from the first alignment made with one sequence per HMW-GS gene. All the primers were designed with Primer3 (Untergasser et al. 2012).
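The placement of the allelic SNP at the 3′ end of each allele-specific primer can be illustrated with a minimal Python sketch. It is not the Primer3 workflow used in the study: the template sequence, the SNP position, the primer length of 20 nt and the function name are all hypothetical, and the fluorescent tails and the copy-specific common primer are left out.

```python
def allele_specific_primers(template, snp_index, alleles, length=20):
    """Build one forward primer per allele, each with the SNP base at its 3' end.

    template  : genomic sequence (5'->3') containing the SNP
    snp_index : 0-based position of the SNP in `template`
    alleles   : the two alternative bases, e.g. ("G", "A")
    length    : total primer length, SNP base included (illustrative default)
    """
    start = snp_index - (length - 1)
    if start < 0:
        raise ValueError("not enough sequence upstream of the SNP")
    # Bases shared by both primers, immediately upstream of the SNP.
    upstream = template[start:snp_index].upper()
    return {allele: upstream + allele.upper() for allele in alleles}


# Hypothetical template; "N" marks a G/A SNP at 0-based index 24.
template = "ATGGCTAAGCGGTTAGTCCTCTTTNCAGCA"
primers = allele_specific_primers(template, 24, ("G", "A"))
for allele, primer in primers.items():
    print(allele, primer)
```

The two primers differ only in their last base, which is what allows them to compete for annealing on the template.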
Primer mix was prepared as recommended by LGC Genomics and described in Gao et al. (2016). The total reaction volume was 5 μL in a 384-well plate, composed following LGC Genomics recommendations. The concentration of MgCl₂ was adjusted to 2.2 mM when the mean GC per cent of the common primer sequence was between 33 and 55% (Table 2). The amplification reaction was performed in a Veriti® thermal cycler (Applied Biosystems) programmed as recommended by LGC Genomics with a modification in the 10 touchdown cycles: the annealing temperature was decreased by 0.8 °C per cycle from 65 to 57 °C for 1 min, followed by 40 cycles of 94 °C for 20 s and 57 °C for 1 min. The LightCycler® 480 System (Roche) was used as a plate reader to detect the fluorescence signal.
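As a check on the cycling arithmetic, the annealing temperatures can be listed with a short Python sketch. It assumes one reading of the protocol, in which the first touchdown cycle anneals at 65 °C and the 0.8 °C decrement is applied after each of the 10 cycles, so that the following 40 cycles run at 57 °C. The function name and parameters are illustrative only.

```python
def annealing_schedule(start_c=65.0, step_c=0.8, n_touchdown=10,
                       final_c=57.0, n_final=40):
    """Return the annealing temperature (degrees C) of every PCR cycle."""
    # Touchdown phase: 65.0, 64.2, ... (rounded to avoid float noise).
    touchdown = [round(start_c - step_c * i, 1) for i in range(n_touchdown)]
    # Constant phase at the final annealing temperature.
    return touchdown + [final_c] * n_final


schedule = annealing_schedule()
print(len(schedule), schedule[:3], schedule[9], schedule[10])
```

Ten decrements of 0.8 °C add up to the 8 °C difference between 65 and 57 °C stated in the protocol.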

Statistical analysis and haplotype representation
The distribution observed for the HMW-GSs identified by SDS-PAGE was compared in a Chi-squared test with that reported in Shewry et al. (1992), which was derived from the 300 accessions characterized by Payne and Lawrence (1983).
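A comparison of this kind can be sketched in Python as a Chi-squared test of homogeneity on a two-row table of allele counts. Whether the study used this exact form of the test is an assumption, and the counts below are invented for illustration. The closed-form P value is valid only for 2 degrees of freedom (three allele classes), where the Chi-squared survival function reduces to exp(-x/2).

```python
import math


def chi2_homogeneity(counts_a, counts_b):
    """Pearson Chi-squared statistic and degrees of freedom for two
    collections scored for the same allele classes."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both collections need the same allele classes")
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        column = a + b
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * column / grand
            stat += (observed - expected) ** 2 / expected
    return stat, len(counts_a) - 1


def p_value_two_dof(stat):
    """Upper-tail probability of a Chi-squared variable with 2 d.f."""
    return math.exp(-stat / 2.0)


# Hypothetical counts for three allele classes in two collections.
stat, dof = chi2_homogeneity([120, 80, 140], [100, 75, 125])
print(round(stat, 3), dof, round(p_value_two_dof(stat), 3))
```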
The relevance of the main factors, i.e., marker, haplotype (i.e., the combination of alleles at each marker for a given locus) and SDS-PAGE profiles, was tested independently through ANOVA conducted with the linear model (lm) function of R version 3.3.0 (R Core Team, 2016). In these analyses, heterozygous markers were treated as missing data. In each analysis, the model included covariates to estimate the part played by the different markers (or by the haplotypes) in the explanation of the phenotypic traits under study. These covariates were the structure components of the core collection (Bordes et al. 2008), the grain hardness and the protein concentration. The structure components of the core collection were considered to avoid spurious association (Flint-Garcia et al. 2003). The grain hardness and the protein concentration were taken into account because these traits are known to modify dough viscoelasticity (Branlard et al. 2001; Eagles et al. 2006). Rare haplotypes (frequency < 2.5%) were discarded from the analysis. F tests were considered significant at α = 0.01. For haplotypes and SDS-PAGE profiles, means were compared by the Student-Newman-Keuls test function from the R library agricolae. Means were judged to be significantly different when the P values of the Student-Newman-Keuls test were < 0.05. For loci with a high level of polymorphism, the SNP combination (haplotype) was represented using sequence logos, plotted with seqLogo available in Bioconductor (Bemborm 2016), with the overall height of the stacks proportional to the information content at the position considered and the height of letters within the stack indicating the relative frequency of each nucleotide.
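The quantities displayed in a sequence logo can be computed with a few lines of Python. The sketch below uses the standard formulation for DNA, in which the information content of a position is 2 bits minus the Shannon entropy of the base frequencies; it is not a reimplementation of seqLogo, and the example columns are hypothetical.

```python
import math
from collections import Counter


def logo_column(bases, alphabet="ACGT"):
    """Return (information content in bits, letter heights) for one
    haplotype position, given the base observed in each line."""
    counts = Counter(b for b in bases if b in alphabet)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no valid base at this position")
    freqs = {b: counts[b] / n for b in alphabet}
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    info = math.log2(len(alphabet)) - entropy
    # Each letter takes a share of the stack equal to its frequency.
    heights = {b: p * info for b, p in freqs.items()}
    return info, heights


print(logo_column("AAAA"))   # fully conserved position
print(logo_column("AACC"))   # two equally frequent bases
```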

Results
In this work, we developed a series of PCR-based assays to readily identify wheat HMW-GSs based on competitive annealing of primers to SNPs using the KASP approach (Semagn et al. 2014). The HMW-GSs of 364 accessions were identified by SDS-PAGE profiling and by molecular markers, and the results obtained with the two methods were compared. At Glu-A1 and Glu-D1, the molecular markers captured the diversity of the SDS-PAGE alleles rather well. Most discrepancies were found at Glu-B1. However, two markers at each Glu-B1 gene and their corresponding haplotypes were more significantly associated with the rheological properties of the dough than were the relevant SDS-PAGE alleles.

Allelic diversity of HMW-GSs assessed by SDS-PAGE
For 364 accessions of the INRA core collection (n = 366), HMW-GSs were characterized by SDS-PAGE. The results obtained were statistically compared to previously published data (Shewry et al. 1992), and significant differences were found between the two collections. In the core collection, 4, 18 and 7 known HMW-GS alleles were detected at Glu-A1, Glu-B1 and Glu-D1, respectively (Table 3, Table S1). In addition, 10, 18 and 11 lines had previously undefined SDS-PAGE profiles at Glu-A1, Glu-B1 and Glu-D1 loci, respectively (Table S1), that may be indicative of novel alleles. Up to 8.7% of the accessions are heterozygous at least at one of the three loci. At Glu-A1, the three main alleles a, b and c were detected in 340 lines. In a Chi-squared test, no significant difference was observed (χ² = 0.15, P = 0.92) between the allele frequencies in the INRA core collection and those reported by Shewry et al. (1992). As expected, the Glu-A1c (Axnull) allele found in 41% of the core collection is the major allele. The rare allele Glu-A1t characterizes a single line, accession number 2301 from Australia (Table S1). We observed 10 lines with undefined patterns that could correspond to four novel alleles (Table S1).

Development and validation of SNP markers to establish HMW-GS composition
To establish HMW-GS composition from genomic DNA, we developed SNP markers for genotyping by a strategy based on competitive allele-specific PCR (KASP). These markers were developed based on the polymorphisms found after aligning the nucleic sequences of HMW-GS genes. The markers all target non-repetitive regions of the genes, i.e., promoters, UTRs or non-repetitive regions of the CDS (Table 2, Figure S1).
Twenty SNPs and 2 indels were found in the alignment of sequences at Glu-A1-1, which was restricted to the CDS (Figure S2). The first two SNP assays developed target G/A differences in the non-repetitive 5′ region of the CDS (Table 2). The A allele at the first marker identified Glu-A1a (Ax1), while the G allele at the second marker identified Glu-A1c (Axnull). Three haplotypes are thus expected: AA-AA, GG-AA and GG-GG for the HMW-GS Ax1, Ax2* (Glu-A1b) and Axnull, respectively.
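The three expected Glu-A1 haplotypes amount to a small lookup table, sketched below in Python. The function and variable names are hypothetical. Returning no call for a missing or heterozygous genotype mirrors the way heterozygous markers were treated as missing data in the statistical analysis.

```python
# Expected haplotypes (first marker, second marker) taken from the text.
GLU_A1_HAPLOTYPES = {
    ("AA", "AA"): "Glu-A1a (Ax1)",
    ("GG", "AA"): "Glu-A1b (Ax2*)",
    ("GG", "GG"): "Glu-A1c (Axnull)",
}


def call_glu_a1(marker1, marker2):
    """Translate two diploid SNP calls into a Glu-A1 allele.

    Returns None when a call is missing or heterozygous and
    "unexpected haplotype" for combinations outside the table.
    """
    calls = (marker1, marker2)
    for call in calls:
        if call is None or len(call) != 2 or call[0] != call[1]:
            return None
    return GLU_A1_HAPLOTYPES.get(calls, "unexpected haplotype")


print(call_glu_a1("GG", "AA"))
print(call_glu_a1("AG", "AA"))
```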
To identify SNPs in Glu-D1-1, sequences up to about 1250 nucleotides upstream of the start codon were aligned (Figure S3). This region contains seven SNPs. We designed two primer pairs to discriminate the alleles encoding the main HMW-GSs (Dx2, Dx3 and Dx5) (Table 2). The T polymorphism (or A in the alignment as allele-specific primers were designed on the reverse strand) at the marker Mk-D1-1-4 is specific to the Dx5 allele and the G nucleotide at Mk-D1-1-8 characterizes the Dx3 allele. We observed 25 SNPs and two insertion-deletions (indels) between both Glu-D1-2 sequences comprising the promoter region and the CDS (Figure S4). Allele-specific primers of the marker developed for this SNP, Mk-D1-2-1, were designed on the reverse strand. The A and G polymorphisms (T and C in the alignment, Figure S4) characterize Dy10 and Dy12 alleles, respectively. Considering these SNPs, the haplotypes expected at Glu-D1 by combining alleles at Mk-D1-1-8, Mk-D1-1-4 and Mk-D1-2-1 are AA-CC-GG for Glu-D1a (Dx2 + Dy12), GG-CC-GG for Glu-D1b (Dx3 + Dy12) and AA-TT-AA for Glu-D1d (Dx5 + Dy10). The rare Glu-D1e allele (Dx2 + Dy10) could be distinguished with this set of markers by the haplotype AA-CC-AA.
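Because some allele-specific primers were designed on the reverse strand, the base reported by the assay is the complement of the base seen in the alignment. A minimal Python sketch of this conversion follows; the helper names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement_base(base):
    """Base on the opposite strand (e.g. assay call -> alignment base)."""
    return base.translate(_COMPLEMENT)


def reverse_complement(sequence):
    """Sequence of the opposite strand, written 5'->3'."""
    return sequence.translate(_COMPLEMENT)[::-1]


# Mk-D1-1-4: T in the assay corresponds to A in the alignment.
print(complement_base("T"))
# Mk-D1-2-1: A and G in the assay correspond to T and C in the alignment.
print(complement_base("A"), complement_base("G"))
```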
Glu-B1 is known for its high level of diversity, which makes it difficult to identify HMW-GS encoded at this locus. Unsurprisingly, the alignment of sequences (including the promoter region, UTRs and CDS) reveals up to 124 SNPs plus 8 indels (> 1 bp) in the CDS of Glu-B1-1 and 94 SNPs plus 4 indels in the CDS of Glu-B1-2. Six markers were designed for each gene to reflect this diversity as much as possible (Table 2, Figures S5 and S6). The decision tree deduced from the alignment at Glu-B1-1 is as follows: T at Mk-B1-1-1 specifies the HMW-GS Bx7 or Bx17 alleles (Ravel et al. 2014); T at Mk-B1-1-4 specifies the Bx6 and Bx6.1 alleles; A (that is T on the reverse strand) at Mk-B1-1-8 specifies the Bx13 allele; and A at Mk-B1-1-9 specifies the Bx20 or Bx14 alleles. The two remaining markers Mk-B1-1-11 and Mk-B1-1-OE were designed in the promoter region to distinguish the Bx7 overexpressed allele (Bx7OE). The C polymorphism at Mk-B1-1-11 should therefore indicate this HMW-GS. However, only a few sequences were available at this location for our alignment, making it difficult to predict the behaviour of this marker with confidence. In addition, we noticed that its allele-specific primers did not match the sequence of the Bx23 HMW-GS and missing data at this marker may characterize this allele. Mk-B1-1-OE was designed at the junction of both repeats of Glu-B1-1 to again characterize Bx7OE. At Glu-B1-2, the alignment illustrated the difficulty of identifying SNP markers for By types. Indeed, there are polymorphisms even between HMW-GS sequences of a given By type. We designed six markers. Mk-B1-2-19, Mk-B1-2-11, Mk-B1-2-14 and Mk-B1-2-20 should be able to discriminate the By8 alleles. G (that is C on the reverse strand) at Mk-B1-2-19 indicates the By8 allele associated with Bx6, G at Mk-B1-2-15 should specify the By15 alleles (and some By8 alleles), while T at Mk-B1-2-18 should characterize By18.
However, the behaviour of these markers is difficult to predict due to the intra-By variability (Table S2).

Comparison of molecular markers and SDS-PAGE profiling for HMW-GS identification
At Glu-A1-1, 340 accessions have one of the three main SDS-PAGE alleles Glu-A1a, b or c (for Ax1, Ax2* and Axnull, respectively). Of these, SNP calling with at least one of the two molecular markers produced missing or heterozygous data in only 3 and 10 accessions, respectively. In the 327 accessions with complete data, results from SDS-PAGE and molecular markers converged to specify identical alleles for all but six lines (1.8%; Fig. 1). Accession 8233 has the haplotype GG-GG diagnostic of Glu-A1c even though it has a typical Glu-A1b profile. The three lines 1236, 1332 and 2153 are AA-AA (Glu-A1a) although the electrophoretic profile was characteristic of the null allele. The accessions 236 and 4482 are both scored as GG-AA and hence as Glu-A1b, but in electrophoresis they appear as Glu-A1c and Glu-A1a alleles, respectively. Each marker was responsible for half of the misclassifications (3 each), indicating that the error rate per marker is less than 1%.
At Glu-D1, 314 lines were assigned as having one of the main alleles, Glu-D1a (Dx2 + Dy12), Glu-D1b (Dx3 + Dy12) or Glu-D1d (Dx5 + Dy10). In these lines, we only observed one instance of missing data at Mk-D1-1-4. As expected, lines with the haplotypes AA-TT and GG-CC have the Dx5 and Dx3 HMW-GS, respectively (Fig. 2). However, the lines with Dx3 (n = 28) are distributed between two haplotypes, GG-CC and AA-CC (7 and 21 lines, respectively). Thus, Mk-D1-1-8 allows only 25% of the lines with Dx3 to be detected. No marker was designed for Dx4. Consequently, the lines with this allele cannot be discriminated. No data were missing for the marker Mk-D1-2-1 at Glu-D1-2 and results were perfectly consistent with SDS-PAGE identification (Fig. 2).
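The proportions quoted for Glu-A1 and Glu-D1 follow directly from the counts given in the text, as the short Python check below illustrates. The helper name is hypothetical.

```python
def percent(part, total):
    """Share of `part` in `total`, expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


# Glu-A1: 6 discordant lines among 327 with complete data.
print(round(percent(6, 327), 1))
# Glu-A1: 3 misclassifications per marker among the same 327 lines.
print(round(percent(3, 327), 2))
# Glu-D1: 7 of the 28 Dx3 lines carry the GG-CC haplotype.
print(round(percent(7, 28), 1))
```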
The SDS-PAGE allele at Glu-B1 was unambiguously identified for 318 lines. Genotyping with the six SNP markers designed at Glu-B1-1 gave missing data in one to seven lines per marker. The percentage of missing data for these markers was thus low (no more than 2%). As predicted, the marker Mk-B1-1-11 gave three alleles (Table 4). Mk-B1-1-11 data were missing for all the lines with Bx20 or Bx23. All these lines formed a third cluster with Mk-B1-1-11, showing its reproducibility. This third group is likely due to sequence dissimilarity between the allele-specific primers and the target genome region, as observed for Bx23. Therefore, lines in this cluster can be considered as off-target variants (OTVs). Taking into account the OTVs at Mk-B1-1-11, the markers at Glu-B1-1 defined nine haplotypes, most of them restricted to a given Bx type (Fig. 3). The haplotypes called H6, H7, H8 and H9 in Table 4 specify Bx13, Bx14, Bx23 and Bx20, respectively. Four haplotypes (H1, H2, H3 and H4) contain all the lines with Bx7 and Bx7OE. The haplotypes H1 and H4 each include two HMW-GS types. In H1, we observed all the Bx7OE lines plus two Bx7 lines. H4 includes the Bx7 and Bx17 types. All the Bx17 lines are found in this haplotype (Fig. 3). When the y type is considered, all the Bx7 + By9 lines, except for accession 1005, are also in H4 (Fig. 3), and all the lines with Bx7 and silent for Glu-B1-2 are in H2 (except for accession 7276). Therefore, lines with the Bx7 + By8 combination are either in H2 (62 lines, 58%) or H4 (45 lines, 41%). The haplotype H5 corresponds to Bx6 and Bx6.1. Mk-B1-1-8 and Mk-B1-1-9 are diagnostic markers as one of their alleles is strictly associated with a given Bx type. All the lines (n = 13) with T at Mk-B1-1-8 have the Bx13 allele and those with A at Mk-B1-1-9 have the Bx20 allele (n = 15) except accession 1232. The T allele at Mk-B1-1-4 identified all the lines with the HMW-GS Bx6 or Bx6.1.
At Mk-B1-1-1, 230 lines have a T polymorphism specifying Bx7 or Bx17 HMW-GSs. With this marker, 98% of the lines with these two HMW-GSs were correctly identified, apart from the accession 1236, which has C instead. In addition, accessions 1232 and 23896 have the T polymorphism despite having neither the Bx7 nor the Bx17 allele (Table S1). The misclassification of accession 1232 had already been noted with Mk-B1-1-9, and so may have been due to experimental error. As expected, Mk-B1-1-OE discriminated the lines with the Bx7OE allele, which all have the C polymorphism.
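The diagnostic alleles reported for Glu-B1-1 can be chained into a simple decision rule, sketched below in Python. The allele-to-subunit assignments come from the text, expressed as the bases reported by the assays; the order in which the markers are tested, the function name and the input format (one homozygous base per marker) are assumptions made for illustration, and Mk-B1-1-11 is left out because its behaviour is less predictable.

```python
def call_bx(alleles):
    """Suggest a Bx type from homozygous calls at the Glu-B1-1 markers.

    `alleles` maps a marker name to the base called, e.g. {"Mk-B1-1-1": "T"}.
    Markers absent from the mapping are treated as uninformative.
    """
    if alleles.get("Mk-B1-1-OE") == "C":
        return "Bx7OE"
    if alleles.get("Mk-B1-1-8") == "T":
        return "Bx13"
    if alleles.get("Mk-B1-1-9") == "A":
        return "Bx20 or Bx14"
    if alleles.get("Mk-B1-1-4") == "T":
        return "Bx6 or Bx6.1"
    if alleles.get("Mk-B1-1-1") == "T":
        return "Bx7 or Bx17"
    return "undetermined"


print(call_bx({"Mk-B1-1-1": "T", "Mk-B1-1-OE": "C"}))
print(call_bx({"Mk-B1-1-1": "T"}))
```

Testing Mk-B1-1-OE first is a design choice of this sketch: it keeps lines that could also carry T at Mk-B1-1-1 from being reported only as Bx7 or Bx17.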
There were only a few instances of missing data (for 1 to 5 lines) with four out of the six markers at Glu-B1-2. However, with Mk-B1-2-11 and Mk-B1-2-18 there were missing data for 31 lines. None of the 14 haplotypes (Table 4) defined by these markers matched the groups defined by SDS-PAGE profiles. Deducing the SDS-PAGE allele at this locus through SNP markers is clearly difficult (Fig. 4). It is nevertheless possible to say that all the lines which are A at Mk-B1-2-14 or G at Mk-B1-2-19 have the HMW-GS By8, except for two lines with the A at Mk-B1-2-14 that are silent at Glu-B1-2. As expected, all the lines with the T at Mk-B1-2-18 are By18, but 41% of the lines with By18 do not have T at this site. Although Mk-B1-2-15 was designed from a sequence representative of By15, neither of the two lines in our collection with this HMW-GS had the expected G allele. All the lines with By16 have the T polymorphism at Mk-B1-2-20 as do 34 other lines (28 of which have the silent allele at Glu-B1-2). Puzzlingly, both alleles at Mk-B1-2-11 were observed in all the By types, except By22 in six lines. Thus, this marker specifies two groups without any correspondence to a particular HMW-GS. The markers at Glu-B1-2 were analysed in terms of the haplotype at Glu-B1-1. Mk-B1-2-19 is a diagnostic marker as the G allele is strictly associated with the By8 of Glu-B1d (Bx6 + By8). This marker could be used to specify the electrophoretic alleles of lines sharing identical haplotypes at Glu-B1-1 like Bx6 and Bx6.1. In this case, T at Mk-B1-2-19 clearly indicates By22 associated with Bx6.1. Similarly, haplotypes AA-CC-CC-TT-GG-CC (H2) and AA-GG-CC-TT-GG-CC (H4) at Glu-B1-1 correspond to several SDS-PAGE alleles. For these two haplotypes, taking markers at Glu-B1-2 into account could be conclusive. For H4, as expected, T at Mk-B1-2-18 discriminates 59% of lines (10 out of 17) with the Glu-B1i (Bx17 + By18) allele.
For both haplotypes, A at Mk-B1-2-14 is clearly associated with the By8 allele of Glu-B1b. This allows the assignment of 66% of lines with the Glu-B1b allele. Finally, for haplotype H2, G at Mk-B1-2-14 and T at Mk-B1-2-20 indicate Glu-B1a (Bx7 + ByNull). Thus, correct assignment was possible with markers at Glu-A1 and Glu-D1, but some discrepancies remain at Glu-B1. Because we focused on the main SDS-PAGE alleles, the less frequent alleles could not be typed with the markers developed here. Generally, rare electrophoretic alleles are confounded with more frequent ones, such as in Comet (accession 2301), which has the rare Glu-A1t allele and was classified as Glu-A1b using molecular markers or in lines with Dx4, Dx2.2 or Dxnull, which are all confused with Dx2 (Fig. 2). Because of the discrepancies found at Glu-B1-1, the question arose as to whether SNP markers at this locus are as relevant as the SDS-PAGE alleles in terms of technological quality.
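The way Glu-B1-2 markers refine an ambiguous Glu-B1-1 haplotype can be summarised in a short Python sketch. The rules are those stated in the text for haplotypes H2, H4 and H5; the function name, the input format and the order of the tests are hypothetical, and the exceptions noted in the text (for instance the two silent lines carrying A at Mk-B1-2-14) are not modelled.

```python
def refine_glu_b1(bx_haplotype, by_alleles):
    """Combine a Glu-B1-1 haplotype label (e.g. "H2") with homozygous
    calls at Glu-B1-2 markers to suggest an SDS-PAGE allele."""
    a14 = by_alleles.get("Mk-B1-2-14")
    a18 = by_alleles.get("Mk-B1-2-18")
    a19 = by_alleles.get("Mk-B1-2-19")
    a20 = by_alleles.get("Mk-B1-2-20")

    if bx_haplotype == "H5":                      # Bx6 or Bx6.1
        if a19 == "G":
            return "Glu-B1d (Bx6 + By8)"
        if a19 == "T":
            return "Bx6.1 + By22"
    if bx_haplotype == "H4" and a18 == "T":
        return "Glu-B1i (Bx17 + By18)"
    if bx_haplotype in ("H2", "H4") and a14 == "A":
        return "Glu-B1b (Bx7 + By8)"
    if bx_haplotype == "H2" and a14 == "G" and a20 == "T":
        return "Glu-B1a (Bx7 + ByNull)"
    return "unresolved"


print(refine_glu_b1("H2", {"Mk-B1-2-14": "G", "Mk-B1-2-20": "T"}))
print(refine_glu_b1("H4", {"Mk-B1-2-18": "T"}))
```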

SNP markers in HMW-GS genes to breed for technological quality
For Glu-B1, both techniques reveal similar, but not identical, groupings. To see whether molecular markers at this locus could be useful in improving the technological value of bread, their effects on rheological parameters of dough were measured and statistically compared to those of the electrophoretic alleles using a linear model with the structure components of each line, the grain hardness and the grain protein content as covariates. Such an analysis was also performed using the haplotypes resulting from the combination of individual markers at each gene or at the locus to reflect as best as possible the polymorphism found at the Glu-B1 locus as a factor.
Significant associations at the 1% level between some markers and technological quality traits (alveograph parameters and total quantity of protein per grain) were detected. Two markers per gene were significantly associated with at least one of the traits studied, Mk-B1-1-1 and Mk-B1-1-8 at Glu-B1-1 and Mk-B1-2-11 and Mk-B1-2-14 at Glu-B1-2 (Table 5). The polymorphism T at Mk-B1-1-8 increased the quantity of protein per grain by 20%. The polymorphism T at Mk-B1-2-11 strongly increased strength, tenacity and the ratio of tenacity to extensibility. Strength, tenacity and the tenacity-to-extensibility ratio were on average 1.6, 1.35 and 1.4 times higher for the T than for the C polymorphism at this marker, respectively. This marker did not influence dough extensibility, suggesting that its action on strength is explained by the significant changes in tenacity alone.
A missing value at a single SNP marker generates an undefined haplotype. To bypass this problem, we decided to study haplotypes derived from the four significant markers only. Results from haplotype analysis confirmed preceding results (Table 6). When significant, haplotypes explained either a similar or even a higher proportion of the phenotypic variance observed than individual markers. As Glu-B1-1 and Glu-B1-2 are tightly linked, we also defined global haplotypes from the combination of haplotypes at each locus. Global haplotypes significantly affected all the traits studied except for extensibility. Global haplotypes generally affected the traits more strongly than the haplotypes made only with markers at each HMW-GS gene. For example, for dough strength the global haplotype explained the highest proportion of the phenotypic variance (13%; Table 6). On average, dough strength of the best global haplotype (TT-CC at Glu-B1-1 and TT-GG at Glu-B1-2) was more than twice that of the worst (CC-CC and CC-GG, respectively). SDS-PAGE alleles at Glu-B1 loci significantly influenced the total quantity of protein per grain, the dough strength and tenacity (Table 7). In accordance with results from global haplotype analysis, the strongest effect is observed for dough strength with up to 7% of the total phenotypic variance explained by the SDS-PAGE profile at Glu-B1-1 and Glu-B1-2. Glu-B1c (Bx7 + By9), Glu-B1b (Bx7 + By8) and Glu-B1i (Bx17 + By18) are the best alleles with strength values on average 70% higher than those of the worst group made up of the Glu-B1a (Bx7 + Bynull), Glu-B1d (Bx6 + By8), Glu-B1e (Bx20 + By20) and Glu-B1f (Bx13 + By16) alleles.
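The two ideas in this paragraph, building a global haplotype that is undefined as soon as one call is missing and measuring the share of phenotypic variance explained by a grouping, can be sketched in Python. This is a deliberately simplified one-way calculation (between-group sum of squares over total sum of squares) without the covariates used in the study's linear model; the function names and the dough strength values are invented for illustration.

```python
def global_haplotype(calls_b1_1, calls_b1_2):
    """Join marker calls at both genes; None if any call is missing."""
    calls = list(calls_b1_1) + list(calls_b1_2)
    if any(call is None for call in calls):
        return None
    return "-".join(calls_b1_1) + "/" + "-".join(calls_b1_2)


def variance_explained(groups):
    """Between-group share of the total sum of squares (eta squared)."""
    values = [v for members in groups.values() for v in members]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("trait does not vary")
    ss_between = sum(
        len(members) * (sum(members) / len(members) - grand_mean) ** 2
        for members in groups.values()
    )
    return ss_between / ss_total


# Hypothetical dough strength values grouped by global haplotype.
best = global_haplotype(("TT", "CC"), ("TT", "GG"))
worst = global_haplotype(("CC", "CC"), ("CC", "GG"))
strength = {best: [310.0, 290.0, 335.0], worst: [140.0, 155.0, 120.0]}
print(best, worst, round(variance_explained(strength), 2))
```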
At Glu-B1-2, Mk-B1-2-11 appeared to be an interesting marker to improve quality. It is one of the most strongly associated markers without being related to a given SDS-PAGE allele. This marker could be seen as a novel means to improve quality. Remarkably, the proportion of phenotypic variance explained by the model based on molecular markers combined into global haplotypes for both HMW-GS genes was about twice as high as that obtained with the model based on SDS-PAGE information.

Fig. 4 Logo sequences of haplotypes for each HMW-GS at Glu-B1-2 represented by at least two lines in the INRA core collection. The order of markers is according to the order in the sequence

HMW-GS loci in the core collection are highly polymorphic
Using SDS-PAGE on a collection of 364 lines representing the worldwide diversity of bread wheat, and taking into account the novel alleles detected, we observed at least 8, 22 and 9 alleles at the Glu-A1, Glu-B1 and Glu-D1 loci, respectively. We found that ~5% of lines exhibited heterozygosity at one or more of the six Glu-1 loci. Heterozygosity may reflect the fact that the collection includes some lines that are really a mix of seeds, which may occur when landraces are used and the seed sets are not pure enough. Heterozygosity may also result from duplication of a locus within the genome, a phenomenon that is frequent in the wheat genome (Choulet et al. 2014; Glover et al. 2015). The number of alleles detected was higher than that described by Shewry et al. (1992), who reported the presence of 3 (a, b, c), 11 (from a to k) and 6 alleles for the Glu-A1, Glu-B1 and Glu-D1 loci, respectively, in a collection of 300 lines. The latter alleles, apart from Glu-B1j (Bx21 + By21), are also present in the INRA core collection we studied. In both collections, the most frequent alleles at each locus were the same, although we observed different frequencies of alleles at Glu-B1, probably due to the different genetic structures of these two collections. Indeed, the set of lines used by Shewry et al. (1992) was not representative of the worldwide wheat diversity. In addition, we observed novel HMW-GSs at each locus, illustrating the high level of diversity of the INRA collection (Balfourier et al. 2007; Horvath et al. 2009) and validating the sampling strategy, which was based on maximizing the number of alleles at neutral markers (Schoen and Brown 1993) and the number of geographical origins.
The novel HMW-GSs found should be confirmed before being added to the catalogue of Glu-1 alleles started by Payne and Lawrence (1983), available through databases (e.g., the database of the National BioResource Project at https://shigen.nig.ac.jp/wheat/komugi/genes/macgene/2013/GeneSymbol.pdf), and named according to current nomenclature. The high level of polymorphism found in the INRA core collection proved a valuable source of material to reach our objective. This material is likely to be much more complex than that generally used by breeders.

How well do KASPar assays identify HMW-GSs compared to SDS-PAGE?
HMW-GSs and LMW-GSs strongly influence dough functionality (Shewry et al. 2002). For their identification, molecular markers have many advantages over biochemical markers. Therefore, HMW-GS and LMW-GS gene markers have been developed (see for instance Zhang et al. 2004; Wang et al. 2009, 2010; Iba et al. 2018). In addition, different systems based on PCR markers have been developed, as reported for LMW-GSs by Zhang et al. (2011). In our work, for an easy and early identification of HMW-GSs, we focused our marker development strategy on SNPs since such markers are now recognized worldwide as the best markers for high-throughput genotyping in most species, even in polyploids. Numerous strategies for SNP genotyping, reviewed by Paux et al. (2011), have given rise to several high-throughput wheat genotyping tools like high-density arrays (for instance, see Wang et al. 2014; Rimbert et al. 2017) or KASP assays (Allen et al. 2011). Here, the set of SNP markers developed is based on the KASPar technology, which requires two allele-specific primers and a common primer. The latter primer has to target a locus-specific region. This makes KASPar technology flexible and well suited for use in polyploid species or when genotyping members of multigene families. This flexibility and the relatively low cost of KASPar technology explain why it has been adopted for plant genotyping for research or breeding purposes. In wheat, it has been successfully used for fingerprinting diverse collections of accessions (Gao et al. 2016) or for marker-assisted breeding, as demonstrated by Neelam et al. (2013), who developed a diagnostic KASPar assay for a locus involved in leaf rust resistance.

The general linear model comprised the main effect haplotype, five ancestor groups, grain hardness and grain protein content as covariates. All haplotypes with a frequency < 0.025 were discarded. Differences were judged to be significant at a …
A set of KASPar markers to identify HMW-GSs should provide breeders with a more convenient and efficient tool than SDS-PAGE to select HMW-GSs associated with a high technological quality. To reach this objective, we developed 17 KASPar assays and observed a high degree of concordance between the HMW-GS identified by SDS-PAGE and molecular markers despite a few discrepancies between the two methods. Indeed, just two and three SNP markers, respectively, can be used to differentiate the three main alleles at Glu-A1 and up to three alleles at Glu-D1. However, identification of Glu-D1b is partial. The six markers at Glu-B1-1 discriminated the main Bx types known, while six markers at Glu-B1-2 can help to define alleles at Glu-B1 once the Bx type is determined. The differences observed are mainly explained by the fact that SNP markers distinguished more alleles than electrophoresis of proteins. This is the case for the HMW-GS Bx7, By8 and By18. Based on markers, the By8 associated with Bx7 differs from the By8 associated with Bx6. SNP markers discriminated three groups including lines with Bx7 in addition to the group with Bx7OE. This result is not surprising since the four alleles Glu-B1b, Glu-B1u, Glu-B1ak and Glu-B1al (Bx7 + By8, Bx7* + By8, Bx7* + By8* and Bx7OE + By8*, respectively) are difficult to distinguish using SDS-PAGE (Marchylo et al. 1992). Similarly, markers indicate that By18 does not correspond to a unique sequence. This is also the case for the Dx3 allele found in at least two molecular haplotypes. One of them is identical to that of Dx2, making it impossible to detect all the lines having the Dx3 allele with the SNP markers developed here. Discrepancies could also result from a lack of discriminating SNP markers as for Bx17, which has a molecular haplotype similar to those of some lines carrying Bx7. All SDS-PAGE silent alleles gave clear SNP alleles, confirming that they do not derive from deletions.
Indeed, the molecular basis of the silent Glu-A1-2 gene may involve stop codons that would lead to premature termination of translation (Forde et al. 1985; Bustos et al. 2000), transposon-like insertion in the coding sequence (Harberd et al. 1987) or transcriptional inactivation due to alteration probably in the distal 5′ or 3′ regions (D'Ovidio et al. 1996). Discrepancies observed in typing HMW-GSs by SDS-PAGE or molecular markers were expected as electrophoresis is based on the length polymorphism of proteins, while KASPar assays are based on nucleotide sequence polymorphism. The latter were more frequent at Glu-B1 as this locus is known for its high level of diversity with more than 69 SDS-PAGE alleles already reported (Lei et al. 2006; MacIntosh et al. 2013; Janni et al. 2017). At this locus, for Glu-B1-2, Mk-B1-2-11 clearly identified alleles which did not match any known SDS-PAGE allele. This marker thus brings novel information and confirms that the level of polymorphism observed at the gene level is higher than at the protein level. Resolution at Glu-B1 is improved by taking into account the haplotypes of both genes Glu-B1-1 and Glu-B1-2. However, the question then arises as to whether molecular haplotypes at Glu-B1 would be useful for breeding purposes even if they do not perfectly reflect the diversity profiled by SDS-PAGE.
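The two kinds of discrepancy discussed in this section (one SDS-PAGE subunit split across several marker haplotypes, and one marker haplotype shared by several subunits) can be listed with a simple cross-tabulation. The sketch below is only illustrative: the function name, the record layout and the haplotype labels `hapA` and `hapB` are hypothetical placeholders, and the toy records merely mimic the Dx2/Dx3 situation described above rather than reproduce data from the study.

```python
from collections import defaultdict


def find_discrepancies(records):
    """Cross-tabulate (SDS-PAGE subunit, SNP haplotype) pairs.

    Returns two dicts:
      split  - subunits observed with more than one SNP haplotype
      shared - SNP haplotypes observed with more than one subunit
    """
    haplotypes_by_subunit = defaultdict(set)
    subunits_by_haplotype = defaultdict(set)
    for subunit, haplotype in records:
        haplotypes_by_subunit[subunit].add(haplotype)
        subunits_by_haplotype[haplotype].add(subunit)
    split = {s: sorted(h) for s, h in haplotypes_by_subunit.items() if len(h) > 1}
    shared = {h: sorted(s) for h, s in subunits_by_haplotype.items() if len(s) > 1}
    return split, shared


# Toy records: (subunit typed by SDS-PAGE, placeholder haplotype label).
toy_records = [
    ("Dx2", "hapA"),
    ("Dx2", "hapA"),
    ("Dx3", "hapA"),  # same haplotype as Dx2: not distinguishable by markers
    ("Dx3", "hapB"),  # second haplotype for the same SDS-PAGE subunit
]
split, shared = find_discrepancies(toy_records)
```

In this toy case `split` flags Dx3 as carrying two haplotypes, while `shared` flags `hapA` as common to Dx2 and Dx3, which is the situation that prevents the markers from detecting every Dx3 line.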

A set of SNP markers at Glu-B1 to improve end-use quality
Association analysis was used to investigate the impact of SNP markers at Glu-B1 on the total quantity of protein per grain and the alveographic parameters (dough strength, tenacity, extensibility and the tenacity-to-extensibility ratio). Two markers per HMW-GS gene, and their respective haplotypes, affect all these traits except extensibility. The strongest effect is observed on dough strength. This confirms that HMW-GSs have a key role in determining dough tenacity and strength by forming the backbone of the gluten network (Shewry et al. 2001). As expected, extensibility was not modified by HMW-GSs, since this parameter depends mainly on LMW-GSs (Rasheed et al. 2014). The global haplotype defined by the combination of significant markers at Glu-B1-1 and Glu-B1-2 explained a larger proportion of the phenotypic variance than was found with the SDS-PAGE alleles. Breeding for processing quality using these four SNP markers at Glu-B1, plus the two at Glu-A1 and three at Glu-D1 (markers for the latter two genes matched the SDS-PAGE alleles well), appears more efficient than using SDS-PAGE alleles. In addition, using molecular markers will offer breeders a simple low-cost tool for selection at the plantlet stage. This time saving is a great advantage compared to SDS-PAGE, which can only be done on grains.
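To make the notion of "proportion of phenotypic variance explained" concrete, the sketch below computes it for a single grouping factor as the between-group sum of squares divided by the total sum of squares (eta squared). This is a deliberately simplified stand-in: the model used in the study also included ancestor groups, grain hardness and grain protein content, which are ignored here. The function name and the toy values are assumptions and do not come from the study.

```python
from collections import defaultdict


def variance_explained(groups, values):
    """Eta squared for one factor: SS_between / SS_total.

    groups - one label per line (e.g. a global haplotype or an SDS-PAGE allele)
    values - the trait measured on the same lines (e.g. dough strength)
    """
    if len(groups) != len(values) or not values:
        raise ValueError("groups and values must be non-empty and of equal length")
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # no phenotypic variation to explain
    by_group = defaultdict(list)
    for group, value in zip(groups, values):
        by_group[group].append(value)
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in by_group.values()
    )
    return ss_between / ss_total


# Toy comparison on invented numbers: the same trait values grouped two ways.
toy_strength = [1.0, 1.0, 3.0, 3.0]
by_haplotype = variance_explained(["h1", "h1", "h2", "h2"], toy_strength)
by_allele = variance_explained(["a1", "a2", "a1", "a2"], toy_strength)
```

Running the same function on marker-based and SDS-PAGE-based groupings of the same lines gives directly comparable proportions, which is the kind of comparison made in the text. A faithful reanalysis would need the covariates listed above.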
To conclude, the SNP markers developed here provide an easy tool for typing HMW-GSs. Generally, SNP-defined haplotypes fit SDS-PAGE allele profiles well despite a few differences. The minimum number of markers needed to select for improved processing qualities is eight, namely Mk-A1-1-12 and Mk-A1-1-13 at Glu-A1, either Mk-D1-1-4 for Glu-D1-1 or Mk-D1-2-1 for Glu-D1-2 (redundant markers, probably due to a high level of linkage disequilibrium between the two loci) and Mk-D1-1-8 at Glu-D1, plus at least four markers (out of 12) associated with rheological traits at Glu-B1. Despite the absence of polymorphism at Mk-B1-2-11, similar associations have been detected in a panel of elite lines (n = 245), showing the consistency of our results. However, HMW-GSs are not the sole determinants of the end-use performance of flour: breeders have to deal with other traits and, as far as storage proteins are concerned, with low molecular weight glutenins and gliadins. The HMW-GS markers have been tested here on a large, diverse set of wheat germplasm, so they are ready for use in large-scale and high-throughput HMW-GS screening in breeding programs.