  • [Benchmark] Codeflaws: A Programming Competition Benchmark for Evaluating Automated Program Repair Tools

    Basic Information

    • Publication: ICSE'17
    • Authors: Shin Hwei Tan, Jooyong Yi, Yulis, Sergey Mechtaev, Abhik Roychoudhury
    • Language: C programs
    • Source: Codeforces programming contests (rejected/accepted submissions)
    • Description: a set of 3902 defects from 7436 programs automatically classified across 39 defect classes
    • Dataset Homepage

    Summary

    Existing benchmarks for automated program repair, such as ManyBugs and IntroClass, do not allow a thorough investigation of the relationship between fault types and the effectiveness of repair tools.
    The authors propose five criteria for a benchmark that allows extensive evaluation of repair tools:

    • C1: Diverse types of real defects.
    • C2: Large number of defects.
    • C3: Large number of programs.
    • C4: Programs that are algorithmically complex.
    • C5: Large held-out test suite for patch correctness verification.

    Overall, the authors crawled over 10000 web pages from Codeforces programming contests. For each rejected submission r, they look for an accepted submission a by the same user for the same programming problem in the crawled data, and each defect is represented by the submission pair (r, a). This yields 5544 defects. They then exclude 924 defects with inadequate held-out tests, 677 defects whose bugs could not be reproduced, and 41 defects affected by a known CIL bug in handling variable-sized multidimensional arrays, leaving the 3902 defects in the benchmark.
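
    The pairing and filtering steps can be sketched as follows. This is only an illustration under stated assumptions, not the authors' crawler: the `Submission` record, its field names, and the sample data are hypothetical, and keeping the first accepted submission per (user, problem) is an assumed tie-breaking rule that the summary does not specify.

```python
from collections import namedtuple

# Hypothetical record for one crawled submission; the field names are
# illustrative and not taken from the Codeflaws tooling.
Submission = namedtuple("Submission", "sub_id user problem verdict")


def pair_defects(submissions):
    """Pair each rejected submission r with an accepted submission a
    by the same user for the same problem, returning (r, a) id pairs."""
    accepted = {}
    for s in submissions:
        if s.verdict == "accepted":
            # Assumption: keep the first accepted submission per (user, problem).
            accepted.setdefault((s.user, s.problem), s)
    pairs = []
    for s in submissions:
        key = (s.user, s.problem)
        if s.verdict == "rejected" and key in accepted:
            pairs.append((s.sub_id, accepted[key].sub_id))
    return pairs


# Made-up sample data for illustration only.
crawled = [
    Submission(101, "alice", "1-A", "rejected"),
    Submission(102, "alice", "1-A", "accepted"),
    Submission(103, "bob", "1-A", "rejected"),  # no accepted fix by bob: dropped
]
print(pair_defects(crawled))  # [(101, 102)]

# Filtering arithmetic reported in the summary.
remaining = 5544 - 924 - 677 - 41
print(remaining)  # 3902
```

    The final count matches the 3902 defects given in the description above.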

    The defects are divided into 39 classes based on the AST-level syntactic differences between each buggy program and its patched program, computed with GumTree.

    Structure

    codeflaws
    |> 1-A-bug-18353198-18353306 (<contestid>-<problem>-bug-<buggy-submissionid>-<accepted-submissionid>)
    |===> 1-A-18353198.c (<contestid>-<problem>-<buggy-submissionid>.c)
    |===> 1-A-18353306.c (<contestid>-<problem>-<accepted-submissionid>.c)
    |===> input-neg1 (Test input files: input[0-9]+ file used by Test suite (i))
    |===> output-neg1 (Test output files: output[0-9]+ file used by Test suite (i))
    |===> heldout-input-pos1 (heldout-input[0-9]+ file used by Test suite (ii))
    |===> heldout-output-pos1 (heldout-output[0-9]+ file used by Test suite (ii))
    |===> 1-A-18353198.c.revlog (Test configuration for SPR that specifies the names of the passing/failing tests: <contestid>-<problem>-<buggy-submissionid>.c.revlog)
    |===> test-genprog.sh (Repair test script, i.e. the test suite given to repair tools for generating a repair; for the search-based repair tools GenProg, SPR, and Prophet)
    |===> test-angelix.sh (Repair test script for Angelix, which requires inserting special instrumentation)
    |===> test-valid.sh (Test script for patch validation against the held-out test suite, used to check the correctness of patches)
    |===> Makefile (Makefile for compiling the buggy submission. This contains the CFLAGS options recommended by Codeforces. To compile the accepted submission, use the command make FILENAME=10-A-13543524)
    |===> Makefile.genprog (Makefile.genprog for compiling the buggy submission using cilly. This is for GenProg experiments as GenProg works on CIL representation.)
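
    The naming scheme above is regular enough to parse mechanically. The following sketch derives the per-defect file names from a directory name; the helper `describe_defect` and the regular expression are illustrative assumptions (in particular, that contest and submission ids are digits and the problem label is alphanumeric), not part of the Codeflaws distribution.

```python
import re

# Pattern for <contestid>-<problem>-bug-<buggy-submissionid>-<accepted-submissionid>.
# Assumption: ids are digits and the problem label is alphanumeric (e.g. "A").
DEFECT_DIR = re.compile(r"^(\d+)-([A-Za-z0-9]+)-bug-(\d+)-(\d+)$")


def describe_defect(dirname):
    """Return the file names implied by a Codeflaws defect directory name."""
    m = DEFECT_DIR.match(dirname)
    if m is None:
        raise ValueError(f"not a defect directory name: {dirname!r}")
    contest, problem, buggy, accepted = m.groups()
    prefix = f"{contest}-{problem}"
    return {
        "buggy_source": f"{prefix}-{buggy}.c",
        "accepted_source": f"{prefix}-{accepted}.c",
        "spr_revlog": f"{prefix}-{buggy}.c.revlog",
    }


info = describe_defect("1-A-bug-18353198-18353306")
print(info["buggy_source"])     # 1-A-18353198.c
print(info["accepted_source"])  # 1-A-18353306.c
print(info["spr_revlog"])       # 1-A-18353198.c.revlog
```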

  • Original article: https://www.cnblogs.com/XBWer/p/9195057.html