Blayne Dreier

Expeller v1.0 – The Cheat Detector

 

April 21, 2005

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Index

 

  1. Executive Summary……………………………………………………………….1
  2. User Manual……………………………………………………………………….2
    1. Non-technical Overview…………………………………………………..2
    2. User Operations…………………………………………………………...3
    3. User Screens………………………………………………………………4
    4. Defects…………………………………………………………………….8
    5. Project Extensions………………………………………………………...8
    6. What I would do differently……………………………………………..11
  3. Technical Manual………………………………………………………………..13
    1. Installation process....................................................................................13
    2. Technical Overview..................................................................................13
    3. Source Code……………………………………………………..............20
    4. After graduation project maintainability………………………………...41

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Expeller is meant to curb the problem of source code plagiarism among students. It is a piece of software that searches through student program files and makes textual comparisons between them.

Many Computer Science students will openly admit that they do not like to program. Many of them also become frustrated and anxious due to program due dates and specific instructor-requested guidelines. Rather than go to tutoring or ask for help in understanding programming fundamentals, they become experts at cheating by copying another student’s source code, manipulating it in order to change its basic structure or appearance, and then submitting the plagiarized result as their own.

A manipulated program is not as easy for a human to detect as, for example, a copied term paper. The reader of linguistic texts can obtain an idea or feeling while he reads each paper, even if all of the papers’ topics are the same. This is not often possible when reading source code, as the code defines specific actions in a procedure-oriented language, which is not easily read and retained block-by-block. Additionally, programs written for any given programming assignment all have the same fundamental purpose, meaning that a human will be unable to distinguish one program from another, based on the function each describes.

Expeller takes a different approach. Rather than simply basing its comparison on the function of a program, it compares the style. It then outputs a minimized list of all possible plagiarized programs, along with their source, to a simple text file. This file can then be read by the user and the possible plagiarized programs can be examined. By the methods described in this paper, you will see that Expeller is surprisingly accurate by replicating and comparing the techniques used by cheating students.

User Manual

Overview

Expeller detects source code plagiarism among students by making textual comparisons between files. It then outputs the results to a simple, minimized text file for user-review. A few examinations can then be made by the user to determine whether the students are indeed believed to be guilty of cheating.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

User Operations

            A listing of all user operations is easily reached by using the ‘help’ runtime argument, which is actually a user option itself. Upon using the ‘help’ argument, the user will receive the following list of user options:

            -d <full path>

Provides the user with the ability to select the first directory in which Expeller will search for files.

     -o <output file name>

Allows the user to choose the name of the output file that will contain all of the possible cheaters.

     -f <filename or extension>

Filters the search to only include specific file types (by extension) or specific file names (such as if all of the students’ files are named Lab_1).

     -s

Flag that enables the searching of subdirectories within the specified directory in the -d argument.

-t <cheat threshold>

          Allows the user to set a threshold for which cheaters will be reported.

     help

                 Displays the help screen with a listing of these user operations.

 

 

 

 

 

Screens

            Expeller is completely text based to allow it to be used remotely, so there are only four terminal screens that the user will encounter:

            Help Screen

 

The Help Screen lists all of Expeller’s user options (described on previous page).

 

 

 

 

 

 

 

 

 

Run Screen

 

In this screen, we are running the program with various user options. The directory (–d) option tells Expeller to begin searching in /afs/uncc.edu/usr/f/cbdreier/sp/samplefiles, the output (–o) option says that the output will be in ‘cheaters.txt,’ the filter (–f) option tells Expeller to only search for files with the .java extension, the subdirectories (–s) option tells Expeller to search all subdirectories, and the threshold (–t) option says to only output files that are 85% (or greater) alike.

 

 

 

 

 

Run-time Screen

 

When Expeller runs, it displays all of the values for user configurable options. It then lets the user know that it is loading files by displaying one period for each file loaded and then displaying the total number of files loaded. It does the same for comparing files and then displays the total time that the comparisons took. Lastly, Expeller tells the user how many possible cheats it found and the output file’s name.

 

 

 

 

 

 

 

User Review Screen

 

            The last screen isn’t a screen per se (unless you are using Expeller via terminal), but a display of the results file. Here we see that the results file is telling the user to make four manual comparisons (down from 120). We have each file pair’s cheat possibility listed as well as the full path location of the offending files. At the bottom, the total number of possible cheats is displayed. This can be helpful in cases of very long cheaters files.

 

 

 

 

Defects

            There are no known defects in Expeller.

Extensions to the Project

            There is a clear direction that the project should now take. First, the modCountTokens(File file1, File file2);  method should be expanded upon by adding more regular expressions. Second, a new module should be implemented. This module should be a parsed tokenized string matching algorithm. The student that builds this module will benefit greatly by having taken a compiler construction course. In order to build this module, the following description should be followed:

            Tokens are created by categorizing statements. This is easily understood by example. If your program contains a statement: int thisNumber = 10; the token to replace it would be VARDEF which stands for ‘variable declaration.’ Similarly, a statement such as import java.io.*; would be replaced with an IMPORT token. The table found on page twelve is taken from one of the best cheat detectors in the industry, JPlag (http://www.jplag.de/). It is a generic token table, so it should not be considered plagiarism to implement it into Expeller. Additionally, one can add his/her own tokens to make this table more complete. Tokens can be added for any language, thereby expanding Expeller’s abilities.

            The easiest way to carry out the implementation of the module described above is to use Expeller’s getFileAsString(File file, boolean deleteComments); method, with deleteComments set to true. This will return a file as a string object and allow you to replace matches to regular expressions with tokens. (Tokens are simply strings with values equal to the token name.) You will find information about how to create and match regular expressions in the java.util.regex package, as well as in the modCountTokens(File file1, File file2); method of Expeller where tokens are matched with regular expressions. They are then counted and totals are compared via the percentMatch(int number1, int number2); method.

After the token strings are built by replacing all token matches by token names, an algorithm will need to be written to match the token strings of each file to the other. The programmer will need to keep in mind that he/she will need to match token strings (combinations of tokens), as well as individual tokens, between the two files. The programmer will need to devise a sophisticated algorithm, remembering that tokens in the supposed cheater’s program can be moved around without affecting its operation.

Lastly, more modules can be added that compare the two files in any fashion. However, the programmer should remember to test each module on real sample data, and throw out any modules that skew Expeller away from actual detection of cheating. This was done many times during the development of Expeller v1.0. Also, to carry a tradition and to let the current programmer know how many revisions have been made to Expeller, please add one tenth to the version number for every new senior project on which the code is based.

A note on building modules: Expeller was built to be completely modular. At minimum, all one has to do to add to the sophistication of Expeller’s results is add a module to the main code, and then run it and add its percent match to the total percent match of the compareFiles method. In order to keep Expeller additions this simplistic, when creating a method, always only allow it to accept, as parameters, two files, and always return the percent match for that module.

What I would do differently

            When I started this project, I had not taken any compiler construction courses, and therefore knew nothing of their topics. I will say that one introduction course in compiler construction has changed the direction of this project so drastically that any student that works on Expeller in the future should have first taken one of these classes.

            If I were to begin this project again, I would start with the algorithm described in the “Extensions to the Project” section of this paper. Even though the current modules of Expeller work incredibly well, seeing the results of the suggested algorithm could spark ideas for even more sophisticated modules than the ones that are currently implemented. Since the project only exists, for most students, over a short period of time (one to two semesters) brainstorming is costly as implementation tends to take the most time.

            Also, if there was more than one member in the group, I would have assigned one of the members to do some undercover investigative research. He would have asked a professor that teaches an introductory Java programming course (1214 or 1215) to sit in on about a quarter of his/her classes (especially around the time programs are due). Students would probably not be tipped off that he was not a regularly enrolled student, and he could seek out the cheaters and record their methods. They would most likely not hesitate to provide him with information, especially if they had guaranteed anonymity. Additionally, he may be able to work out some scenario where the professor will make an assignment for every student in the class to alter a program in order to fool the professor into believing that it’s an original. He could then collect this assignment from the teacher and record what the students have altered, thereby strengthening Expeller.

            Expeller works by detecting mimics of cheat methods. The most sophisticated module algorithm is useless if it does not detect real student cheating habits.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Technical Manual

Installation Process

            There is no installation process, per se. You simply copy the expeller.java file into a directory anywhere on the system that the target files are located and compile it with the system’s native Java Runtime Environment (JRE).

Technical Overview

            The most thorough way to describe Expeller in a technical manner is to explain its methods in detail. I will first describe the functional methods (the methods that allow Expeller to function as a program) and then move to the modular methods

.expeller(String[] args) – Constructor

            Since Expeller runs more as a standalone program and not simply a class, the constructor takes care of running all of the various methods in order. First, it prints out a version number. Second it gathers and dissects the argument user options using the dissectArgs(LinkedList argList) method. It then prints the set user options, and begins loading files with the growFileAndDirectoryArray() method. Fourth, it compares the files using the compareFiles() method and keeps track of the total time that the comparisons took so that it can be outputted to the user. Lastly, it tells the user how many cheats were found and what file they have been outputted to.

dissectArgs(LinkedList argList)

      Arguments are supplied to Expeller at runtime. They need to be recursively evaluated because no argument is mandatory. dissectArgs takes, as a parameter, a LinkedList of arguments, which will act as a stack, from the constructor method. It then checks to see if a help argument is first (meaning the user would like to have a display of all the user options) and then proceeds to check each other argument in the list. Invalid arguments are those that are not ‘help’ and do not start with a ‘-,’ so the validity of this is checked first. If an invalid flag is found Expeller tells the user of this error. After Expeller knows that it’s dealing with a valid argument, the letter for the argument is examined, an action (determined by the argument option letter, described in the “User Options” section) is carried out. The argument is then popped off the stack, and then the method checks to see if the stack is empty. If not, dissectArgs is called again recursively.

compareFiles()

            This method runs the comparison modules of Expeller, records percent matches from each module, computes the total percent match between the two files and then prints the result to an output file.

            Files are stored in two arrays, file1[] and file2[]. They are cross-matched via two for loops so that every file is compared against every other. Each pair of files is run through all four modules, the score is recorded, and the loop continues with the next pair. After the total results are computed, the final scores that are at or above the threshold user option are written to an output file. The file stream is then closed.

percentMatch(int number1, int number2)

            This method finds the percent match between two numbers. It uses the generic percent difference formula, (difference/average)*100, and then subtracts the result from 100. If the sum of the two numbers is equal to 0, this method returns a percent match of 100 because this would mean that both numbers are equal to 0 (this method will never receive negative parameters) and the absence of any attribute is just as important as its presence. If the sum is not zero, the method continues. After the percent match is calculated, if it is greater than 0 (percent difference is less than 100), the percent match is returned. If the percent match is less than 0 (percent difference is greater than 100), a 0 is returned.

calculateComparisons(int numberOfFiles)

            This method is used to calculate the number of comparisons that Expeller will have to make, based on the number of files that it found. The formula for comparisons based on number of files is: number of comparisons = (n-1) + (n-2) + (n-3) +……+ (n-(n-1)), n being equal to the number of files.

getDepthOfFile(File file)

            When we create the arrays for depth length comparisons in the modCountBlockSize method, we need to know the depth of each file. This method calculates that depth by incrementing the depth value for every found ‘{‘ and decrementing it for every found ‘},’ all the while keeping track of the deepest depth found.

getFileAsString(File file, Boolean deleteComments)

            This method is the heart of textual comparisons in Expeller. It takes as arguments a file and a boolean that determines whether the returned file string will contain comments.

The method reads in each line of a file and determines if a comment is contained in the line by using the regular expression, (?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*) to make matches and replace them with a blank line, and by searching for occurrences of /* and */ and removing them. If a comment is completely contained within a line, the line is removed, and if a comment is in a line along with valid code, the comment is removed and the code is saved. After all comments are removed from a line, the length is tested, if it is greater than 0, the line is added to the final file string. If not, it is forgotten and the next line is read in. Therefore, during the process of removing comments, this method also removes blank lines (white space) which are not used in comparison.

greaterOfTwo(int int1, int int2)

            This method simply returns the greater of two numbers. It is used to size arrays.

printManual()

            When a user needs to know what user option arguments that he can provide Expeller at runtime, he will use the argument ‘help.’ This will trigger the printManual method, which will print all of the available user options to the terminal. Additionally, if a user enters an invalid command, this method is run to educate him on how his error could be corrected.

growFileAndDirectoryArray()

            In order to compare files within a directory, they need to be loaded into some data structure. The easiest way to cross-compare elements is by putting them in arrays and running two for loops, one encapsulated in the other. This method searches through the directory specified at runtime, loads all of the files in that directory into an array and if the subdirectory argument has been given, it examines any directories within the initial directory and also adds files that they contain to the files array.

Additionally, the idea of the filter argument is carried out in this method. If the current found file does not match the filter pattern, it is not added to the file array.

While files are added to the array, a ‘.’ is printed for each one, to let the user know that the loading is occurring.

 

 

modCompareLength(File file1, File file2)

            This module is one of two dumb modules implemented. It simply counts the character length of the string representation of a file1 and file2, and then compares the lengths based on the percentMatch method. It is thought of as ‘dumb’ because there is no sophisticated calculation involved. However, ‘dumb’ is not to be confused with ineffective; the results are very extremely useful.

modCompareNumberOfLines(File file1, File file2)

            This method effectively does the same thing as modCompareLength, except that it compares the number of lines in each file rather than the number of characters. It is another ‘dumb’ module and its results are most important when dealing with files of very different lengths. It very easily rules those out as possible matches. So, even though modCompareLength and modCompareNumberOfLines are simplistic, in some ways, they play a greater role than the more sophisticated modules.

modCountBlockSize(File file1, File file2)

            Most cheaters will change variable names and move code around. They are looking for the easy way out and that does not include re-writing any methods. If they do, they might as well not cheat to begin with. This method counts the number of lines at each depth of a program. Depths are described in the getDepthOfFile method, and are simply thought of as a number value for every instance of ‘{‘ or ‘}.’ Any text outside of curly braces is thought to be in depth 0, any in the first ‘{‘ is depth 1 and so on.

            The number of lines at each depth is calculated and stored in an array at the index equal to the depth value. Each individual depth’s number of lines of each file is compared with that of the other file and a percent match array is formed. Then all of the percent matches are summed and divided by the number of entries in the percent match array, giving Expeller a total percent match for this module, for any two given files.

            The module breaks a file string up into lines via delimiting by the ‘\n’ character. It then determines if the line contains a ‘{‘ in order to increment the block counter. Once inside the ‘{,’ it begins to add to the array index [blockCounter] for that depth. If a ‘}’ is found, the block counter is decremented and lines succeeding it are added to the next shallow depth. If no curly braces are found on a line, one is added to the current depth.

            The percent match array is sized by the largest depth array size. Meaning if a file1 has one more depth than file2, the percent match array will have the length of file1’s block count array. The value for this index in the percent match array will be zero because one file is 100% different than another in respect to that depth value.

modCountTokens(File file1, File file2)

            When plagiarizing another student’s code, it is very easy to change variable names. However, changing the variable types, loop types, included libraries, and system calls is more than difficult; it requires knowledge of the language – something that the cheater most likely doesn’t have.

Counting generally found keywords in files and then comparing the number of matches is a very effective way of catching copies of files. However, comparing the matches of one single keyword between two files is counter-productive -- it would be easy to rewrite one form of loop. So, as more regular expressions are added to this module, the more effective it will become.

            This module works by taking an array of regular expressions, cycling through it, and counting the number of matches within each file, and then recording each match at the regular expression’s index within the expressions match array. It then calculates a percent match between each index of file1’s match array and the corresponding index of file2’s match array.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Source Code

            Java was chosen as the language for Expeller because of its usability across many platforms without the need to change any of the source code. It was determined that having all of the modules and functional code contained in one file would be better than having separate module classes, for ease of transferability (sharing). Rather than requiring the user to download and install a package on a system with which he/she may not be familiar, all that is required to “install” Expeller is the compilation of one file.

package expeller;

 

import java.io.*;

import java.util.*;

import java.util.regex.*;

 

public class expeller {

 

  /* File Array - will contain all of the files found to be compared */

     private File[] filesArray;

 

  /* Directory Array - will contain all of the directories where files    

     are found */

  private File[] directoriesArray;

 

  /* Array used to hold each file in the Linked List for them to be  

     compared against each other */

  Object[] filesListToArray;

 

  /* Holds the directory from which the program was started to know   

     where to begin searching */

  String topDir = System.getProperty("user.dir");

  /* The program will need to move through many directories

     recursively, so initially, the curDir will be the topDir */

  String curDir = topDir;

 

  /* A flag is a complete argument to the program, the '-' and whatever

     letters or words are attached to it */

  String curFlag;

  /* The argument key that is attached to the '-' */

  String curArg;

 

  /* This string can be changed to represent the current version number

     of Expeller, which will be displayed at runtime */

  final String VERSION_NUMBER = "1.0";

 

  /* Linked list that contains all of the directories to be searched */

  LinkedList directories = new LinkedList();

  /* Linked list that contains all of the files to be searched */

  LinkedList files = new LinkedList();

 

  /* Linked List that contains all of the runtime arguments to the

     program */

  LinkedList argsList = new LinkedList();

 

  /* One argument that can be given to the program is a filter, so that

     the program will only search for a specific filename or extension,

     this variable holds the filter */

  String fileNameOrTypeFilter = "No filter";

  /* Contains the output filename for the file where the possible

     cheaters will be located. By default, it's cheaters.txt, but the

     name can also be supplied as an argument at runtime */

  String outputFileName = "cheaters.txt";

 

  /* An argument can be given that tells the program whether to search

     through subdirectories */

  boolean subDirectories;

 

  /* Argument to supply the percent match threshold for which cheaters

     will be reported */

  int percentMatchThreshold = 75;

  int cheaters = 0;

 

  /* This is the number of built-in intelligent modules. Intelligent

     modules are deemed intelligent by their complexity and their

     ability to provide a sophisticated comparison of two files */

  final int NUMBER_OF_INTELLIGENT_MODULES = 2;

  /* This is the number of built-in dumb modules. Dumb modules return

     some simple comparison about the file. The two dumb modules  

     currently implemented count the lines and character lengths of

     files */

  final int NUMBER_OF_DUMB_MODULES = 2;

 

  /* This public method allows Expeller to be run as a stand alone

     program */

  public static void main(String[] args) {

 

    expeller Expeller = new expeller(args);

  }

 

  /* This constructor allows Expeller to be called from another program

     as a stand alone class */

  public expeller (String[] args) {

 

    /* Program title and version number */

    System.out.println("\nExpeller v" + VERSION_NUMBER);

 

    /* If there were command line arguments, add them to an array and

       then dissect each one */

    if(args.length != 0) {

 

      for(int i = 0; i < args.length; i++) {

 

        argsList.add(args[i]);

      }

 

      dissectArgs(argsList);

    }

 

    /* These are informational lines to tell the user which options

       he/she has selected at runtime */

    System.out.println();

    System.out.println("Root search directory: " + curDir);

    System.out.println("Filename or extension filter: " +

                        fileNameOrTypeFilter);

    System.out.println("Search subdirectories: " + subDirectories);

    System.out.println("Output file name: " + outputFileName);

    System.out.println("Percent match threshold: " +

                        percentMatchThreshold);

    System.out.println();

 

    /* This line displays to let the user know that the program is

       currently loading files */

    System.out.print("Loading files");

 

    /* This method actually builds the files and directories arrays in

       order to do recursive searching for files */

    growFileAndDirectoryArray();

 

    /* Copies the elements in the files LinkedList to an Array to

       compare each file against each other */

    filesListToArray = files.toArray();

 

    /* Output spacing at runtime */

    System.out.println();

    System.out.println();

 

    /* This line tells the user how many files have been loaded for

       comparison */

    System.out.println("(" + files.size() + " files loaded)" + "\n");

 

    /* This variable will be used to show the number of comparisons to

       the user */

    int numberOfComparisons = calculateComparisons(files.size());

 

    /* This line tells the user that file comparisons are currently

       happening */

    System.out.print("Comparing Files");

 

    /* This variable is used to hold the time before the comparisons

       begin so that it can be compared to the end time and the time

       that the comparisons took can be calculated */

    long startTime = System.currentTimeMillis();

 

    /* This method actually handles the comparison of the files */

    compareFiles();

 

    /* Holds the system time at the end of comparisons to calculate the

       time it took to compare them all */

    long endTime = System.currentTimeMillis();

 

    /* Calculates how long the comparisons took */

    long secondsForComparisons = (endTime - startTime)/1000;

 

    /* These lines tell the user how many comparisons took place and

       how long they took */

    System.out.println();

    System.out.println("(" + numberOfComparisons + " comparisons

                         completed in "

                          + secondsForComparisons + " seconds)");

    System.out.println();

 

    /* This line tells the user how many cheats were found and where

       the output file is located */

    System.out.println(

      "(Output of " + cheaters + " possible Cheats located in " +

      outputFileName + ")");

 

    System.out.println();

  }

 

  public void dissectArgs(LinkedList argsList) {

 

  /* Get the first flag from the argsList */

  curFlag = argsList.getFirst().toString();

  /* Then remove the flag so that we can get the argument next */

  argsList.removeFirst();

 

  /* If the flag is the help/? flag */

  if (curFlag.equalsIgnoreCase("help")) {

 

    /* Print the Help Menu */

    printManual();

    System.exit(0);

  }

 

  /* If the first flag starts correctly, with a "-" */

  else if(curFlag.charAt(0) == '-') {

 

    /* If it's a directory flag */

    if (curFlag.equalsIgnoreCase("-d")) {

 

      /* Try getting it off the stack */

      try {

 

        /* Take the first argument off the stack and make it curArg */

        curArg = argsList.getFirst().toString();

        /* Now remove the first argument on the top of the stack so

           that next time the first argument is pulled, it will be the

           next argument on the stack */

        argsList.removeFirst();

      }

 

      /* If the flag was thrown and no argument was supplied, catch it

         */

      catch (Exception e) {

 

        /* Print out an error letting the user know what's going on */

        System.out.println("\nNo directory argument supplied");

       

        /* Print the manual so that the user can correct his/her error

        */

        printManual();

        /* Kill the program so that the user can think about his/her

           error and re-initiate the program with valid arguments */

        System.exit(0);

      }

 

      /* Temporarily storing the directory object to check if its

         actually a directory */

      File tempCurDir = new File(curArg);

 

      /* Check if the directory object is actually a directory. If so,

         make the current directory this directory */

      if (tempCurDir.isDirectory())

        curDir = curArg;

 

        /* If the current argument is not a directory, throw error and

           exit */

      else {

 

        System.out.println("\nERROR: Directory does not exist.");

        printManual();

        System.exit(0);

      }

    }

 

    /* If the flag is a directory flag */

    else if (curFlag.equalsIgnoreCase("-f")) {

 

      try {

 

        curArg = argsList.getFirst().toString();

        argsList.removeFirst();

      }

 

      catch (Exception e) {

 

        System.out.println("\nNo file argument supplied");

        printManual();

        System.exit(0);

      }

 

      /* Pull the current argument off the stack and make it the       

         fileNameOrType variable so that we can use it to search files         

         */

      fileNameOrTypeFilter = curArg;

    }

 

    /* If the flag is the outputFile flag */

    else if (curFlag.equalsIgnoreCase("-o")) {

 

      try {

 

        curArg = argsList.getFirst().toString();

        argsList.removeFirst();

      }

 

      catch (Exception e) {

 

        System.out.println("\nNo file argument supplied");

        printManual();

        System.exit(0);

      }

 

      /* Pull the current argument off the stack and make it the

         outputFileName variable so that we can use it to output

         Cheaters */

      outputFileName = curArg;

    }

 

    /* If the flag is the percentMatchThreshold flag */

    else if (curFlag.equalsIgnoreCase("-t")) {

 

      try {

 

        curArg = argsList.getFirst().toString();

        argsList.removeFirst();

      }

 

      catch (Exception e) {

 

        System.out.println("\nNo file argument supplied");

        printManual();

        System.exit(0);

      }

 

      /* Pull the current argument off the stack and make it the

         percentMatchThreshold variable so that we can use it to output