Welcome to e-Maize Challenge

Our first "Corn Maze" is a "Maize" question.

e-Maize Challenge Winners



Shuqin Jiang and Jun Yan
China Agricultural University




Ao Li
Shanghai Clinical Research Center and Shanghai Fenglin Clinical Laboratory Company




Haixiao Dong
Washington State University and Jilin University



1 Prize

Winners of this e-Maize Challenge fall into two categories: champion and runner-up. The champion will receive 100,000 RMB (pre-tax) and the runner-up will receive 10,000 RMB (pre-tax). The winners will be invited to participate in the e-Maize Winter Camp and Seminar organized by Huazhong Agricultural University and Tsinghua University.

2 e-Maize Challenge date:

From May 1 to October 1.

3 Task

Use the genotype data to predict the phenotypes of maize.

The robustness of statistical models or methodologies will be assessed by comparing the predicted values of three traits with the real measured values.

4 The evaluation of the prediction results

The e-Maize Challenge data comprise genotype and phenotype files covering approximately 1.9 million SNPs and 6210 hybrid lines with three traits. We provide both training and test data sets:

a. Training set: used for model training and cross-validation. It includes both input data (genotype) and output data (phenotype).

b. Test set: used to test the model’s performance. It includes only input data.

Pedigree information is also provided. It gives the identifiers of the female and male parents of each hybrid line, where "f" means female and "m" means male; in total, the 6210 hybrids are derived from 207 female (f1-f207) and 30 male (m1-m30) parents.
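The pedigree structure can be sketched in a few lines of Python. This is only an illustration: it assumes every female was crossed with every male (207 x 30 = 6210, which matches the stated number of hybrids), and the `(female, male)` tuple layout is hypothetical, not the format of the official pedigree file.

```python
from itertools import product

# Parent identifiers as described above: f1-f207 (female), m1-m30 (male).
females = [f"f{i}" for i in range(1, 208)]
males = [f"m{j}" for j in range(1, 31)]

# Assumption: a full factorial cross, one hybrid per (female, male) pair.
pedigree = list(product(females, males))

print(len(pedigree))  # 6210
```

Under this assumption each hybrid shares its female parent with 29 other hybrids and its male parent with 206 others, which is the kind of relatedness a prediction model can exploit.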

4.1 The competition is evaluated as follows:

According to the availability of phenotypic data, the whole study population is divided into a training data set (Training set, blue in the figure) and a test data set for prediction (Test set, gray in the figure):

a. The Training set contains the complete SNP genotypes (input) and trait phenotypes (output) for all of its hybrids. Participants can use this data set to develop and refine their statistical algorithms.

b. The Test set contains only the SNP genotypes (input) of its hybrids. Participants need to use their algorithms to predict the phenotypes of these hybrids from the genotypes.

4.2 Result submission

The robustness of statistical models or methodologies will be assessed by comparing the predicted values of the three traits with the measured values. Because the phenotype data are continuously distributed, prediction accuracy will be judged by the Pearson correlation coefficient (PCC) between predicted and measured values, computed separately for each of the three traits.
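The PCC criterion can be illustrated with a minimal sketch. The function below is a plain textbook implementation, not the organizers' scoring script; the trait keys and all numbers are made-up toy values rather than contest data.

```python
import math

def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    if ss_p == 0 or ss_o == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(ss_p * ss_o)

# Hypothetical toy values for the three traits (not contest data).
predicted = {
    "flowering": [70.1, 72.4, 68.9, 71.0],
    "plant_height": [251.0, 263.5, 240.2, 258.8],
    "yield": [8.1, 9.4, 7.2, 8.8],
}
measured = {
    "flowering": [69.5, 73.0, 68.0, 71.8],
    "plant_height": [249.0, 266.0, 244.5, 255.1],
    "yield": [7.9, 9.9, 7.8, 8.2],
}

# One PCC per trait, as in the evaluation described above.
for trait in predicted:
    print(trait, round(pearson_r(predicted[trait], measured[trait]), 3))
```

Because the PCC is unchanged by shifting or rescaling the predictions, it rewards ranking the hybrids correctly rather than matching absolute values.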

4.3 The competitors are requested to submit 2 result files:

4.3.1 A prediction file (MS Excel): predicted values of 3 phenotype traits for hybrids in the test set (see template file “sample_results”).

4.3.2 An introduction file (MS Word or PDF): briefly describe your computational model and benchmark it. In particular, please provide the prediction accuracy for each of the 3 traits (Pearson correlation coefficients (PCCs) between predicted and measured values) and the computational time, based on N-fold cross-validation on the training set.
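The benchmark requested in 4.3.2 could be produced along the following lines. This is a sketch under stated assumptions: the data are simulated (60 hybrids, 20 markers coded 0/1/2), the "model" is a deliberately trivial stand-in (a least-squares line on each hybrid's allele-count sum), and all function names are hypothetical. Only the cross-validation loop, the per-fold PCC, and the timing are the point.

```python
import math
import random
import time

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)

def fit_baseline(X, y):
    """Toy stand-in model: least-squares line on the per-hybrid allele-count sum."""
    s = [sum(row) for row in X]
    ms, my = sum(s) / len(s), sum(y) / len(y)
    slope = (sum((a - ms) * (b - my) for a, b in zip(s, y))
             / sum((a - ms) ** 2 for a in s))
    return slope, my - slope * ms

def predict_baseline(model, X):
    slope, intercept = model
    return [intercept + slope * sum(row) for row in X]

def cross_validate(X, y, fit, predict, n_folds=5, seed=0):
    """Return (mean PCC over folds, elapsed seconds) for N-fold cross-validation."""
    order = list(range(len(y)))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    start = time.perf_counter()
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        # Fit on the other folds only, then score on the held-out fold.
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in held_out])
        scores.append(pearson_r(preds, [y[i] for i in held_out]))
    return sum(scores) / n_folds, time.perf_counter() - start

# Simulated data (not contest data): 60 hybrids, 20 markers coded 0/1/2.
rng = random.Random(1)
X = [[rng.choice((0, 1, 2)) for _ in range(20)] for _ in range(60)]
y = [0.5 * sum(row) + rng.gauss(0, 1) for row in X]

mean_pcc, seconds = cross_validate(X, y, fit_baseline, predict_baseline)
print(f"mean PCC = {mean_pcc:.3f}, time = {seconds:.4f} s")
```

In a real entry the same loop would be run once per trait, with `fit_baseline` and `predict_baseline` replaced by the actual model, so that the reported accuracy never uses a hybrid's own phenotype to predict it.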

Please send the prediction results file and the brief introduction of your prediction model to IMAZE101@163.COM


5 e-Maize Challenge Official Rules.


a. Participation in the e-Maize Challenge means submitting entries to leaderboards or final entries to any of the e-Maize Challenges.

b. Cheating by, for example, misleading other participants with decoy submissions, or using fake identities to make more submissions than allowed, is tantamount to scientific fraud. iMaize Group takes incidents of cheating extremely seriously...


6 Data download and presentation.

Using second-generation sequencing, we have completed low-coverage (1x) whole-genome sequencing of the female parents (207 lines) and male parents (30 lines) used in the crosses, and extracted about 1.9 million representative SNPs (tag-SNPs) in total, with about 150K to 280K SNP markers on each maize chromosome. These tag-SNPs are the genotype data used for this contest.

All maize hybrids in this competition have been planted in the field and phenotyped in five typical maize-growing regions of the country. We used the best linear unbiased prediction (BLUP) method to estimate the breeding value of each hybrid from the phenotypic data measured in the five environments, which corrects for experimental error between environments. To cover different degrees of heterosis, three target traits were chosen for this contest: flowering time, plant height, and yield, representing low, moderate, and high heterosis, respectively. These hybrid BLUP values are the phenotype data used for this contest.
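As a rough illustration of how BLUP pools measurements across environments, the sketch below covers only the simplest balanced case: every hybrid is measured once in each environment, the genetic and residual variances (`var_g`, `var_e`) are taken as known, and each hybrid's mean is shrunk toward the grand mean by the factor var_g / (var_g + var_e / n). The function name, the input layout, and the toy numbers are hypothetical; the organizers' actual model is not described here and is not reproduced by this sketch.

```python
def blup_balanced(obs_by_hybrid, var_g, var_e):
    """Shrunken hybrid values for a balanced hybrids-by-environments layout.

    obs_by_hybrid maps a hybrid ID to its measurements, one per environment.
    var_g and var_e are assumed known; in practice they are estimated.
    """
    counts = {len(v) for v in obs_by_hybrid.values()}
    if len(counts) != 1:
        raise ValueError("balanced data required: same number of environments per hybrid")
    n_env = counts.pop()
    all_values = [x for v in obs_by_hybrid.values() for x in v]
    grand_mean = sum(all_values) / len(all_values)
    # In a balanced layout every hybrid mean carries the same average
    # environment effect, so deviations from the grand mean are free of it.
    shrink = var_g / (var_g + var_e / n_env)
    return {
        hybrid: grand_mean + shrink * (sum(v) / n_env - grand_mean)
        for hybrid, v in obs_by_hybrid.items()
    }

# Toy example (not contest data): two hybrids, five environments each.
toy = {
    "f1_m1": [10.0, 12.0, 11.0, 13.0, 14.0],
    "f2_m1": [20.0, 22.0, 21.0, 23.0, 24.0],
}
print(blup_balanced(toy, var_g=1.0, var_e=5.0))
```

The noisier the environments are relative to the genetic variance, the smaller the shrinkage factor and the closer every hybrid's value is pulled to the overall mean.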


7 Background.

7.1 Heterosis:

Heterosis is a common phenomenon in biology in which a hybrid is superior to both of its parents in one or more traits. For example, hybrids between different strains, different breeds, or even different species are often more vigorous than their parents, showing more developed organs, larger body size, higher yield, and improved disease resistance, insect resistance, stress tolerance, viability, and fecundity (see below).

Please see Wikipedia for this keyword: Heterosis.
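Heterosis is commonly quantified relative to the parents. The sketch below computes two standard measures as percentages: mid-parent heterosis (hybrid versus the parental average) and better-parent heterosis (hybrid versus the better parent). The function names and example values are illustrative only and are not taken from the contest data; "better" is assumed to mean the larger value, which suits a trait such as yield.

```python
def mid_parent_heterosis(f1, parent_a, parent_b):
    """Percent superiority of the hybrid over the average of its parents."""
    mid_parent = (parent_a + parent_b) / 2
    return 100 * (f1 - mid_parent) / mid_parent

def better_parent_heterosis(f1, parent_a, parent_b):
    """Percent superiority of the hybrid over its better (larger) parent."""
    better_parent = max(parent_a, parent_b)
    return 100 * (f1 - better_parent) / better_parent

# Toy values: hybrid yield 9.0, parental yields 4.0 and 6.0.
print(mid_parent_heterosis(9.0, 4.0, 6.0))     # 80.0
print(better_parent_heterosis(9.0, 4.0, 6.0))  # 50.0
```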

7.2 Genetic characteristics and breeding methods of maize:

At present, almost all maize in production is hybrid, and breeding strong hybrids with high yield, early maturity, and good plant architecture has become the core of the fierce competition in the maize seed industry. Maize is not only an important food crop worldwide but also an important source of feed and bioenergy raw materials, with an annual planting area of more than 5 million mu. As one of the earliest crops in which heterosis was exploited, maize has made tremendous contributions to world food security. Because heterosis differs greatly among hybrid combinations of different germplasm, the only way to identify superior combinations under conventional breeding is large-scale crossing experiments. Limited by labor costs, funding, land area, and other factors, breeding a maize hybrid variety often takes 6-8 years.

Please see Wikipedia for these keywords: Maize, Breeding, Crossbreed.

7.3 Modern genomics changes traditional breeding methods:

In the genomic era, the cost of genotyping an individual has fallen well below the cost of phenotyping it and is still declining year by year. In the breeding of pigs, cattle, and other livestock, genomic selection has been highly effective and can double the breeding gain within the same breeding cycle. In contrast, because pedigree relationships in plants are complex, conventional genomic selection strategies still fall short. Breakthroughs in statistical methods, new algorithms, and the effective integration of massive genomic data to predict the phenotypes of plant hybrids accurately are therefore core issues that crop molecular breeding urgently needs to address.

Please see Wikipedia for these keywords: Genomics, Genotype, Phenotype, Molecular breeding.