Food Science ›› 2021, Vol. 42 ›› Issue (7): 35-44. doi: 10.7506/spkx1002-6630-20200505-027

• Basic Research



  • Funding:
    Key Project of the National Social Science Fund of China (19AGL021)

Data Mining Model for Food Safety Incidents Based on Structural Analysis and Semantic Similarity

CHEN Mo, ZHANG Jingxiang, HU Enhua, WU Linhai, ZHANG Yi   

  (1. College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; 2. School of Science, Jiangnan University, Wuxi 214122, China; 3. School of Biotechnology, Jiangnan University, Wuxi 214122, China; 4. Institute for Food Safety Risk Management, School of Business, Jiangnan University, Wuxi 214122, China)
  • Online: 2021-04-15  Published: 2021-05-17



Abstract: Food safety bears on public health and social stability. In this paper, we analyzed the characteristics of food safety incidents (FSIs) reported by mainstream media in China, including their spatial distribution, food categories, risk factors, and hazardous supply chain links. Based on this analysis, we constructed a semantic structure template for FSI text data and proposed the strategy of multi-layer and multi-level semantic structure of rank (MMSS-Rank) algorithm, which measures the similarity between the collected food safety data and the template, computes an overall score for each item, and then applies an appropriate threshold to the score to identify FSIs. A real-world FSI dataset was built by crawling and cleaning web data, and the classification accuracy of MMSS-Rank was compared with that of support vector machine (SVM) and semantic analysis methods. The results showed that MMSS-Rank identified FSIs in large-scale data with higher accuracy and recall than these methods, indicating that the algorithm is feasible and effective.
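The abstract only outlines MMSS-Rank: score each text against a layered semantic template, combine the layer scores into an overall score, and threshold it. The Python sketch below illustrates that general pipeline under explicit assumptions and is not the authors' algorithm. The template layers follow the features named above (food category, risk factor, supply chain link, location), but the two keyword levels per layer, every keyword, the level and layer weights, and the threshold of 0.5 are hypothetical. Whole-word keyword matching stands in for the paper's semantic similarity measure. All names (`TEMPLATE`, `composite_score`, `classify`, `evaluate`) are illustrative.

```python
import re

# Hypothetical semantic structure template (illustrative only, not the paper's).
# Each layer corresponds to one feature named in the abstract; each layer has
# two assumed levels, "specific" and "general", with assumed weights.
TEMPLATE = {
    "food_category": {
        "specific": (1.0, {"milk powder", "rice"}),
        "general": (0.5, {"dairy", "grain"}),
    },
    "risk_factor": {
        "specific": (1.0, {"melamine", "cadmium"}),
        "general": (0.5, {"additive", "heavy metal"}),
    },
    "supply_chain_link": {
        "specific": (1.0, {"processing", "retail"}),
        "general": (0.5, {"production", "sales"}),
    },
    "location": {
        "specific": (1.0, {"wuxi", "nanjing"}),
        "general": (0.5, {"jiangsu"}),
    },
}

# Assumed layer weights (they sum to 1.0) and assumed decision threshold.
LAYER_WEIGHTS = {"food_category": 0.3, "risk_factor": 0.3,
                 "supply_chain_link": 0.2, "location": 0.2}
THRESHOLD = 0.5


def contains(text: str, phrase: str) -> bool:
    """Whole-word match, so that e.g. 'rice' does not match 'price'."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def layer_score(text: str, levels: dict) -> float:
    """Weight of the highest-weighted level with a matching keyword, or 0.0."""
    matched = [w for w, keywords in levels.values()
               if any(contains(text, k) for k in keywords)]
    return max(matched, default=0.0)


def composite_score(text: str) -> float:
    """Weighted sum of per-layer scores; lies in [0, 1] with these weights."""
    text = text.lower()
    return sum(LAYER_WEIGHTS[name] * layer_score(text, levels)
               for name, levels in TEMPLATE.items())


def classify(texts, threshold=THRESHOLD):
    """Label a text as an FSI if its composite score reaches the threshold."""
    return [composite_score(t) >= threshold for t in texts]


def evaluate(predicted, actual):
    """Accuracy, precision and recall of boolean predictions vs. boolean labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    tn = sum(not p and not a for p, a in pairs)
    fp = sum(p and not a for p, a in pairs)
    fn = sum(a and not p for p, a in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    # Made-up headlines and labels, for illustration only.
    texts = [
        "Melamine found in milk powder at a Wuxi processing plant",
        "Cadmium detected in rice sold at retail in Nanjing",
        "New dairy festival opens in Jiangsu",
        "Weather forecast for Nanjing",
        "Expired dairy products sold in Jiangsu supermarkets",
    ]
    actual = [True, True, False, False, True]
    predicted = classify(texts)
    print(predicted)
    print(evaluate(predicted, actual))
```

With these assumed weights, the last headline matches only the general levels of two layers, scores about 0.25, and falls below the threshold, so it is missed. This shows how the threshold trades recall against precision; the paper's richer template and similarity measure are what would have to close that gap.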

Key words: food safety incidents; semantic analysis; semantic structure template; big data
