Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction

November 07, 2022 | Molecular Plant

Introduction: Climate change and population growth necessitate a transition from traditional phenotypic selection to data-driven “smart breeding”. A research team led by CIMMYT-China and the Chinese Academy of Agricultural Sciences reviews how big data analytics and artificial intelligence are transforming crop breeding through the integration of genomic and enviromic information. The paper introduces the integrated genomic–enviromic prediction (iGEP) framework, which links genetic, environmental, and management data to better capture genotype performance across diverse environments. Drawing on examples from maize, wheat, and rice, the review positions AI-driven prediction as a core component of future smart breeding systems.

Key findings: The review presents iGEP as an extension of genomic selection, demonstrating that characterizing the environment (E) at a dimensionality comparable to that of genotypes (G) and phenotypes (P) improves prediction under strong genotype-by-environment (G×E) interactions. Across the reviewed cases, incorporating biologically informed enviromic covariates (such as temperature, radiation, and water-stress indices) raises prediction accuracy by approximately 10–30% compared with genome-only models. The authors show that machine learning and deep learning approaches capture non-linear and multi-layer biological relationships better than linear regression does. When transcriptomic and metabolomic data are used as intermediate phenotypes, maize yield prediction accuracy increases from 0.159 (genomic selection alone) to 0.245. High-throughput phenotyping (HTP) and remote sensing further enable earlier, more cost-efficient selection.
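To make the comparison between genome-only and genome-plus-enviromic prediction concrete, the following toy sketch may help. It is not the review's method: the data are simulated, plain ridge regression stands in for genomic prediction models, and a single hypothetical enviromic covariate (a stress index `w`) is assumed to modulate marker effects. All names, effect sizes, and the held-out environment are illustrative assumptions, and the accuracies it prints are unrelated to the 10–30% range reported in the reviewed cases.

```python
import math
import random

random.seed(0)
N_GENO, N_MARK = 60, 10

# Simulated marker scores (-1, 0, 1) for each genotype (assumption: toy data).
markers = [[float(random.choice((-1, 0, 1))) for _ in range(N_MARK)]
           for _ in range(N_GENO)]
a = [1.0] * N_MARK                                            # main marker effects
b = [1.0 if m % 2 == 0 else -1.0 for m in range(N_MARK)]      # G×E sensitivities

def phenotype(x, w):
    """Trait = marker main effects + w * marker-by-environment effects + noise."""
    g = sum(xi * ai for xi, ai in zip(x, a))
    gxe = w * sum(xi * bi for xi, bi in zip(x, b))
    return g + gxe + random.gauss(0.0, 0.5)

def features(x, w, with_env):
    """Genome-only: intercept + markers. With env: add w and marker*w terms."""
    row = [1.0] + x
    if with_env:
        row += [w] + [xi * w for xi in x]
    return row

def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def ridge_fit(X, y, lam=1.0):
    """Ridge regression: solve (X'X + lam*I) beta = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    return cov / (su * sv)

# Train in five environments; predict an untested, more stressful one.
train_envs, test_env = [-1.0, -0.5, 0.0, 0.5, 1.0], 2.0
train = [(x, w, phenotype(x, w)) for w in train_envs for x in markers]
test = [(x, test_env, phenotype(x, test_env)) for x in markers]

def accuracy(with_env):
    """Correlation between predicted and observed trait in the held-out environment."""
    X = [features(x, w, with_env) for x, w, _ in train]
    beta = ridge_fit(X, [yi for _, _, yi in train])
    preds = [sum(f * bk for f, bk in zip(features(x, w, with_env), beta))
             for x, w, _ in test]
    return pearson(preds, [yi for _, _, yi in test])

acc_g = accuracy(with_env=False)
acc_ge = accuracy(with_env=True)
print(f"genome-only accuracy:       {acc_g:.3f}")
print(f"genome + enviromic accuracy: {acc_ge:.3f}")
```

Because the simulated marker effects change with `w`, the genome-only model can only learn their average across the training environments, whereas the model with marker-by-covariate terms can adjust its ranking of genotypes for the new environment. Accuracy here is the Pearson correlation between predicted and observed values, a common convention in genomic prediction.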

Importantly, iGEP supports crop redesign at multiple scales. At the micro scale, it enables the redesign of genes, metabolic pathways, and regulatory networks underlying complex traits. At the macro scale, it informs the redesign of individuals, populations, and species, supporting concepts such as perennial crop development and the virtual evaluation of genotypes in untested or future environments for climate-adaptive variety targeting. To address big-data challenges, the review emphasizes overcoming the curse of dimensionality through feature selection and dimensionality reduction, as well as digitalizing breeders’ experiential knowledge via AI-assisted decision support. Transfer learning is identified as a key approach for extending iGEP to non-model crops with limited labeled data. Managing the “9 Vs” of breeding big data (including volume, velocity, variety, and veracity) remains essential for translating smart breeding into sustained genetic gains.
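As a small illustration of the feature-selection idea mentioned above, the sketch below ranks candidate covariates by their absolute correlation with a trait and keeps the top few. This univariate filter is only one simple option chosen for illustration, not a method attributed to the review; the data are simulated, and the names `CAUSAL`, `select_top_k`, and the effect sizes are assumptions.

```python
import math
import random

random.seed(1)
N_SAMPLES, N_FEATURES = 300, 50
CAUSAL = {0, 1, 2}  # hypothetical covariates that truly affect the trait

# Simulated covariate matrix and trait (assumption: toy data, strong signal).
X = [[random.gauss(0.0, 1.0) for _ in range(N_FEATURES)]
     for _ in range(N_SAMPLES)]
y = [sum(3.0 * row[j] for j in CAUSAL) + random.gauss(0.0, 1.0) for row in X]

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    return cov / (su * sv)

def select_top_k(X, y, k):
    """Return the indices of the k features most correlated (in |r|) with y."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((abs(pearson(column, y)), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])

selected = select_top_k(X, y, k=3)
X_reduced = [[row[j] for j in selected] for row in X]

print("selected covariates:", selected)
print("dimensions:", N_FEATURES, "->", len(X_reduced[0]))
```

In a real prediction pipeline, a filter like this would normally be fitted on the training folds only, since selecting features on the full data set before cross-validation inflates the apparent accuracy.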

Figure | Overview of a system using big data and AI for smart breeding.

Data are collected for envirotype (E), genotype (G), and phenotype (P). Big data are stored, processed, and sampled to build and validate models with machine and deep learning and AI-assisted deep analyses. Trained models are used for phenotype prediction and selection to develop improved varieties. Smart breeding is driven by data, computational capacity, algorithms, and knowledge. GCA, general combining ability; SCA, specific combining ability; MAS, marker-assisted selection; MARS, marker-assisted recurrent selection; GS, genomic selection; iGEP, integrated genomic-enviromic prediction.