Food Science ›› 2020, Vol. 41 ›› Issue (12): 273-278. doi: 10.7506/spkx1002-6630-20190213-060

• Safety Detection •

Optimization of a Predictive Model for Rapid Detection of Egg Freshness Using Visible Near-Infrared Spectra Based on Combination of Feature Selection and Feature Extraction

DUAN Yufei, WANG Qiaohua

  (1. Research and Design Institute of Agricultural Mechanical Engineering, Hubei University of Technology, Wuhan 430068, China; 2. College of Engineering, Huazhong Agricultural University, Wuhan 430070, China; 3. National Research and Development Center for Egg Processing, Wuhan 430070, China)
  • Online: 2020-06-25 Published: 2020-06-22
  • Funding:
    General Program of the National Natural Science Foundation of China (31371771; 31871863); National Science and Technology Support Program of the "Twelfth Five-Year Plan" (2015BAD19B05); Hubei University of Technology Scientific Research Start-up Fund (BSQD2017076)

Abstract: To improve the efficiency of egg freshness detection by visible/near-infrared spectroscopy and to optimize the predictive model, this study combined wavelength feature selection with feature extraction so as to exploit the strengths of both. Visible/near-infrared transmittance spectra of eggs in the range of 550–950 nm were preprocessed with the first derivative. Because redundant spectral information degrades model accuracy, competitive adaptive reweighted sampling (CARS) was used to select 45 sensitive characteristic wavelengths from the preprocessed spectra, and a support vector regression (SVR) model was built on them. For this CARS-SVR model, the correlation coefficient and root mean square error of cross-validation on the training set (Rcv and RMSECV) were 0.8805 and 8.59, and the correlation coefficient and root mean square error on the prediction set (Rp and RMSEP) were 0.8889 and 8.42. To further reduce uninformative spectral content and to improve the computational speed and stability of the model, local tangent space alignment (LTSA), a nonlinear feature extraction method, was then applied to the selected characteristic wavelengths. For the resulting CARS-LTSA model, Rcv and Rp were 0.8960 and 0.8983, and RMSECV and RMSEP were 8.04 and 8.18. Compared with the CARS-SVR model, the fused model improved on all four metrics and was further simplified by eliminating another 14 data dimensions. These results indicate that applying feature selection and feature extraction together to visible/near-infrared spectral data of eggs improves both the efficiency of spectral detection and the accuracy of the freshness prediction model, and can serve as a reference for optimizing spectral models for egg freshness detection.

Key words: egg, visible near-infrared spectrum, feature selection, feature extraction, optimization, freshness
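
The abstract compares models by two kinds of metric: a correlation coefficient (Rcv, Rp) and a root mean square error (RMSECV, RMSEP) between measured and predicted freshness values. The following is a minimal Python sketch, using only the standard library, of how such figures and the first-derivative preprocessing step might be computed. It rests on several assumptions that the abstract does not confirm: the derivative is taken here by simple finite differences, the correlation coefficient is taken to be Pearson's r, and all function names and toy numbers are hypothetical. Only the four RMSE values at the end come from the abstract.

```python
import math


def first_derivative(spectrum, step=1.0):
    """Finite-difference first derivative of one spectrum.

    Central differences are used in the interior and one-sided differences
    at the two ends. `step` is the wavelength spacing (assumed uniform).
    """
    n = len(spectrum)
    if n < 2:
        raise ValueError("need at least two spectral points")
    deriv = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        deriv.append((spectrum[hi] - spectrum[lo]) / ((hi - lo) * step))
    return deriv


def pearson_r(measured, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    squared = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(squared / len(measured))


def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


# Made-up numbers for illustration only; they are not data from the study.
toy_spectrum = [0.10, 0.12, 0.18, 0.30, 0.34]
measured = [80.0, 72.0, 65.0, 58.0, 50.0]
predicted = [78.0, 74.0, 63.0, 60.0, 49.0]

print("first derivative:", first_derivative(toy_spectrum))
print("R:", round(pearson_r(measured, predicted), 4))
print("RMSE:", round(rmse(measured, predicted), 4))

# RMSE values reported in the abstract: CARS-SVR model vs. CARS-LTSA model.
print(f"RMSECV reduction: {relative_reduction(8.59, 8.04):.1%}")
print(f"RMSEP reduction: {relative_reduction(8.42, 8.18):.1%}")
```

The last two lines express the reported RMSE changes as relative reductions, which makes the gain from adding LTSA easier to compare across the cross-validation and prediction sets.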
