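TY  - JOUR
A2  - Gou, Jianping
AU  - Zhang, Feng
AU  - Li, Guofan
AU  - Liu, Cong
AU  - Song, Qian
PY  - 2020
DA  - 2020/12/18
TI  - Flowchart-Based Cross-Language Source Code Similarity Detection
SP  - 8835310
VL  - 2020
AB  - Source code similarity detection has various applications in code plagiarism detection and software intellectual property protection. In computer programming teaching, students may convert source code written in one programming language into another language for their code assignment submissions. Because of syntactic differences among programming languages, existing similarity measures for source code written in the same language are not applicable to cross-language code similarity detection. Meanwhile, existing cross-language source code similarity detection approaches are vulnerable to complex code obfuscation techniques, such as replacing equivalent control structures and adding redundant statements. To solve this problem, we propose a cross-language code similarity detection (CLCSD) approach based on code flowcharts. In general, two source code fragments written in different programming languages are transformed into standardized code flowcharts (SCFCs), and their similarity is obtained by measuring the similarity of the corresponding SCFCs. More specifically, we first introduce the SCFC model as a uniform flowchart representation of source code written in different languages. The SCFC is language-independent and can therefore be used as an intermediate structure for source code similarity detection. Transformation techniques are also given to convert source code written in a specific programming language into an SCFC. Second, we propose the SCFC-SPGK algorithm, based on the shortest path graph kernel, to measure the similarity between two SCFCs. Thus, the similarity between two pieces of source code in different programming languages is given by the similarity between their SCFCs. Experimental results show that, compared with existing approaches, CLCSD has higher accuracy in cross-language source code similarity detection. Furthermore, CLCSD can not only handle common source code obfuscation techniques used by students in computer programming teaching but also obtain nearly 90% accuracy in dealing with some complex obfuscation techniques.
SN  - 1058-9244
UR  - https://doi.org/10.1155/2020/8835310
DO  - 10.1155/2020/8835310
JF  - Scientific Programming
PB  - Hindawi
KW  - 
ER  - 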