Food Science (食品科学) ›› 0, Vol. ›› Issue (): 0-0.

• Basic Research •

Data Mining Model for Food Safety Incidents Based on Structural Analysis and Semantic Similarity

Chen Mo (陈默)1, Zhang Jingxiang (张景祥)2, Hu Enhua (胡恩华)1, Wu Linhai (吴林海)3, Zhang Yi (张义)2

  1. Nanjing University of Aeronautics and Astronautics
  2. Jiangnan University
  3. Food Safety Research Base of Jiangsu Province, Jiangnan University
  • Received: 2020-05-05 Revised: 2021-02-24 Online: 2021-04-15 Published: 2021-04-30
  • Corresponding author: Wu Linhai E-mail: wlh6799@hotmail.com
  • Funding:
    Research on the Scientific Connotation and Design of the Food Safety System Framework

Abstract: Food safety bears directly on public health and social stability. In this paper, we analyze the characteristics of food safety incidents (FSIs) reported by mainstream media in China, including their spatial distribution, food categories, risk factors, and the supply chain links at which hazards arise. Based on this analysis, we construct a semantic structure template for FSI text data and propose a multi-layer, multi-level semantic structure of rank (MMSS-Rank) algorithm that measures the similarity between collected food safety data and the template. The similarity yields an overall score for each text (from the text layer weight, semantic template weight, and keyword density matrix), and an appropriate threshold on that score determines the precision of FSI identification. Experiments show that, compared with traditional methods, MMSS-Rank identifies FSIs in large-scale data with higher accuracy and recall, which indicates that the method is feasible and effective.

Key words: Food safety incidents, Semantic analysis, Semantic structure template, Data mining model, Big data
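The template-matching idea in the abstract can be illustrated with a minimal sketch. This is not the MMSS-Rank algorithm itself, whose details are not given here. It only shows the general pattern of scoring a text against a layered template and applying a threshold. The four layer names follow the characteristics listed in the abstract. The keywords, weights, threshold value, and the names `TemplateLayer`, `template_score`, and `is_fsi` are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateLayer:
    """One layer of a hypothetical semantic template."""
    name: str
    keywords: frozenset
    weight: float


# Hypothetical template: layer names mirror the abstract; keywords and
# weights are invented purely for illustration.
TEMPLATE = (
    TemplateLayer("location", frozenset({"nanjing", "wuxi", "jiangsu"}), 0.2),
    TemplateLayer("food_category", frozenset({"milk", "pork", "rice"}), 0.3),
    TemplateLayer("risk_factor", frozenset({"melamine", "clenbuterol", "cadmium"}), 0.3),
    TemplateLayer("supply_chain_link", frozenset({"production", "processing", "retail"}), 0.2),
)


def template_score(text, template=TEMPLATE):
    """Return the weighted share of template layers matched by the text (0 to 1)."""
    # Simple tokenizer for English text; Chinese text would need a word segmenter.
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    total_weight = sum(layer.weight for layer in template)
    matched_weight = sum(
        layer.weight for layer in template if tokens & layer.keywords
    )
    return matched_weight / total_weight


def is_fsi(text, threshold=0.5):
    """Flag a text as a candidate FSI report when its score reaches the threshold."""
    return template_score(text) >= threshold
```

Under these assumptions, a report that mentions a place, a food, a hazard, and a supply chain stage matches every layer, while unrelated text matching only one layer falls below the threshold. Raising the threshold trades recall for precision.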

CLC number: