Food Science, 2025, Vol. 46, Issue (20): 318-326. doi: 10.7506/spkx1002-6630-20250430-261

Section: Safety Detection

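Funding: National Key Research and Development Program of China (2023YFD2301604-4); Joint Guidance Project of the Natural Science Foundation of Heilongjiang Province (LH2019C075)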

Rapid and Non-destructive Identification of Wuchang Daohuaxiang Rice Using Near-Infrared Spectroscopy and t-Distributed Stochastic Neighbor Embedding

SUN Xinyue, LI Yanlong, CHEN Mingming, SONG Yan, QIAN Lili, ZUO Feng, GUAN Hai’ou, ZHANG Tao, LIU Xingquan, ZHOU Guoxin   

(1. College of Food Science, Heilongjiang Bayi Agricultural University, Daqing 163319, China; 2. College of Information and Electrical Engineering, Heilongjiang Bayi Agricultural University, Daqing 163319, China; 3. National Food and Strategic Reserves Administration, Beijing 100834, China; 4. College of Food and Health, Zhejiang Agriculture and Forestry University, Hangzhou 311300, China)
Published: 2025-10-25; Online: 2025-09-17

Abstract: This study proposed a rapid, non-destructive method for identifying Wuchang Daohuaxiang rice based on near-infrared (NIR) spectroscopy combined with machine learning algorithms. NIR spectra were collected from different rice varieties, and partial least squares regression (PLSR) models identified the first-order derivative as the optimal spectral preprocessing method. Two dimensionality reduction methods, principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE), were compared, and five machine learning models, namely artificial neural network (ANN), K-nearest neighbors (KNN), random forest (RF), decision tree (DT), and naive Bayes (NB), were built and compared for variety discrimination. The results showed that t-SNE increased the Calinski-Harabasz index by 1078.0051, indicating better clustering. After t-SNE dimensionality reduction, all five models outperformed their counterparts built without dimensionality reduction on every evaluation metric, reaching an average accuracy of 95.78%. The NB model benefited most, with its accuracy increasing by 18.89%, while the RF model performed best overall, achieving an accuracy of 98.89% and a precision of 98.96% on the prediction set. The method enables rapid, non-destructive identification of Wuchang Daohuaxiang rice, supports brand protection and the safeguarding of consumer rights, and offers a new approach for identifying other geographical indication agricultural products.
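The abstract states that PLSR selected the first-order derivative as the best preprocessing, but not how the derivative was computed. The sketch below assumes a plain central-difference derivative applied to each spectrum (many NIR workflows use a Savitzky-Golay derivative instead). The names `first_derivative` and `preprocess` and the `step` parameter (the wavelength spacing) are hypothetical, not taken from the paper.

```python
from typing import List, Sequence


def first_derivative(spectrum: Sequence[float], step: float = 1.0) -> List[float]:
    """First-order derivative of one spectrum (assumed central differences).

    Interior points use (x[i+1] - x[i-1]) / (2 * step); the two end points
    fall back to forward/backward differences so the output keeps the
    same length as the input.
    """
    n = len(spectrum)
    if n < 2:
        raise ValueError("need at least two spectral points")
    deriv = [0.0] * n
    deriv[0] = (spectrum[1] - spectrum[0]) / step      # forward difference
    deriv[-1] = (spectrum[-1] - spectrum[-2]) / step   # backward difference
    for i in range(1, n - 1):
        deriv[i] = (spectrum[i + 1] - spectrum[i - 1]) / (2 * step)
    return deriv


def preprocess(spectra: Sequence[Sequence[float]], step: float = 1.0) -> List[List[float]]:
    """Apply the derivative row by row (one row = one rice sample's spectrum)."""
    return [first_derivative(row, step) for row in spectra]
```

Because a derivative cancels constant additive offsets, two spectra that differ only by a baseline shift map to the same derivative; the trade-off is that differentiation amplifies high-frequency noise, which is why smoothing derivatives are common in practice.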
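The Calinski-Harabasz (CH) index used to compare PCA and t-SNE is the ratio of between-group to within-group dispersion, each divided by its degrees of freedom: CH = [B / (k - 1)] / [W / (n - k)], where n is the number of samples, k the number of varieties, B the size-weighted squared distance of each variety centroid from the overall mean, and W the summed squared distance of each sample from its own variety centroid. A minimal pure-Python sketch follows, assuming the index is computed on the low-dimensional coordinates (PCA scores or t-SNE embedding) with the true variety labels; `calinski_harabasz` is a hypothetical name, and scikit-learn's `sklearn.metrics.calinski_harabasz_score` computes the same ratio for non-degenerate inputs.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def _mean(points: Sequence[Sequence[float]]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(dim)]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(X: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> float:
    """CH = [B / (k - 1)] / [W / (n - k)] for n points in k labelled groups."""
    n = len(X)
    groups: Dict[Hashable, List[Sequence[float]]] = defaultdict(list)
    for x, lab in zip(X, labels):
        groups[lab].append(x)
    k = len(groups)
    if not 1 < k < n:
        raise ValueError("need 2 <= k < n")
    overall = _mean(X)
    between = 0.0  # B: size-weighted spread of group centroids around the overall mean
    within = 0.0   # W: spread of points around their own group centroid
    for members in groups.values():
        centroid = _mean(members)
        between += len(members) * _sq_dist(centroid, overall)
        within += sum(_sq_dist(p, centroid) for p in members)
    if within == 0.0:
        return float("inf")  # every group collapses onto its centroid
    return (between / (k - 1)) / (within / (n - k))
```

For example, with the four points (0, 0), (0, 2), (10, 0), (10, 2), grouping them as two tight, well-separated pairs gives a CH of 50.0, while pairing each point with a distant partner drops it to 0.08; higher values mean more compact, better-separated varieties.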
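The abstract reports accuracy and precision on the prediction set but does not say how precision was averaged over the rice varieties. The sketch below assumes macro-averaging, i.e. the unweighted mean of per-variety precision; the function names and the toy labels (`"WC"` for Wuchang Daohuaxiang rice, `"X"` and `"Y"` for other varieties) are hypothetical.

```python
from typing import Hashable, List, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted variety equals the true variety."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_precision(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean over classes of precision TP / (TP + FP).

    A class that occurs in y_true but is never predicted contributes 0.0.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    classes = set(y_true) | set(y_pred)
    per_class: List[float] = []
    for c in classes:
        # True labels of every sample the model assigned to class c.
        assigned = [t for t, p in zip(y_true, y_pred) if p == c]
        per_class.append(sum(t == c for t in assigned) / len(assigned) if assigned else 0.0)
    return sum(per_class) / len(per_class)
```

With six toy samples of which five are classified correctly (one `"WC"` sample predicted as `"X"`), accuracy is 5/6 and macro precision is 8/9: the single error lowers only the precision of `"X"`, the class the sample was wrongly assigned to.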

Key words: Wuchang Daohuaxiang rice; near-infrared spectroscopy; random forest; discrimination of similar rice varieties; t-distributed stochastic neighbor embedding

中图分类号: