Blayne Dreier
Expeller v1.0 The Cheat Detector
Index
Expeller is meant to curb the problem of source code plagiarism among students. It is a piece of software that searches through student program files and makes textual comparisons between them.
Many Computer Science students will openly admit that they do not like to program. Many of them also become frustrated and anxious due to program due dates and specific instructor-requested guidelines. Rather than go to tutoring or ask for help in understanding programming fundamentals, they become experts at cheating by copying another students source code, manipulating it in order to change its basic structure or appearance, and then submitting the plagiarized result as their own.
A manipulated program is not as easy for a human to detect as, for example, a copied term paper. The reader of linguistic texts can obtain an idea or feeling while he reads each paper, even if all of the papers topics are the same. This is not often possible when reading source code, as the code defines specific actions in a procedure-oriented language, which is not easily read and retained block-by-block. Additionally, programs written for any given programming assignment all have the same fundamental purpose, meaning that a human will be unable to distinguish one program from another, based on the function each describes.
Expeller takes a different approach. Rather than simply basing its comparison on the function of a program, it compares the style. It then outputs a minimized list of all possible plagiarized programs, along with their source, to a simple text file. This file can then be read by the user and the possible plagiarized programs can be examined. By the methods described in this paper, you will see that Expeller is surprisingly accurate by replicating and comparing the techniques used by cheating students.
User Manual
Overview
Expeller detects source code plagiarism among students by making textual comparisons between files. It then outputs the results to a simple, minimized text file for user-review. A few examinations can then be made by the user to determine whether the students are indeed believed to be guilty of cheating.
User Operations
A listing of all user operations is easily reached by using the help runtime argument, which is actually a user option itself. Upon using the help argument, the user will receive the following list of user options:
-d <full path>
Provides
the user with the ability to select the first directory in which Expeller will
search for files.
-o
<output file name>
Allows
the user to choose the name of the output file that will contain all of the
possible cheaters.
-f
<filename or extension>
Filters
the search to only include specific file types (by extension) or specific file
names (such as if all of the students files are named Lab_1).
-s
Flag that enables the searching of subdirectories within the specified directory in the -d argument.
-t <cheat threshold>
Allows the user to set a threshold for which
cheaters will be reported.
help
Displays the help screen with a listing of these user operations.
Screens
Expeller is completely text based to allow it to be used remotely, so there are only four terminal screens that the user will encounter:
Help Screen

The Help Screen lists all of Expellers user options (described on previous page).
Run Screen

In this screen, we are running the program with various user options. The directory (d) option tells Expeller to begin searching in /afs/uncc.edu/usr/f/cbdreier/sp/samplefiles, the output (o) option says that the output will be in cheaters.txt, the filter (f) option tells Expeller to only search for files with the .java extension, the subdirectories (s) option tells Expeller to search all subdirectories, and the threshold (t) option says to only output files that are 85% (or greater) alike.
Run-time Screen

When Expeller runs, it displays all of the values for user configurable options. It then lets the user know that it is loading files by displaying one period for each file loaded and then displaying the total number of files loaded. It does the same for comparing files and then displays the total time that the comparisons took. Lastly, Expeller tells the user how many possible cheats it found and the output files name.
User Review Screen

The last screen isnt a screen per se (unless you are using Expeller via terminal), but a display of the results file. Here we see that the results file is telling the user to make four manual comparisons (down from 120). We have each file pairs cheat possibility listed as well as the full path location of the offending files. At the bottom, the total number of possible cheats is displayed. This can be helpful in cases of very long cheaters files.
Defects
There are no known defects in Expeller.
Extensions to the Project
There is a clear direction that the project should now take. First, the modCountTokens(File file1, File file2); method should be expanded upon by adding more regular expressions. Second, a new module should be implemented. This module should be a parsed tokenized string matching algorithm. The student that builds this module will benefit greatly by having taken a compiler construction course. In order to build this module, the following description should be followed:
Tokens are created by categorizing statements. This is easily understood by example. If your program contains a statement: int thisNumber = 10; the token to replace it would be VARDEF which stands for variable declaration. Similarly, a statement such as import java.io.*; would be replaced with an IMPORT token. The table found on page twelve is taken from one of the best cheat detectors in the industry, JPlag (http://www.jplag.de/). It is a generic token table, so it should not be considered plagiarism to implement it into Expeller. Additionally, one can add his/her own tokens to make this table more complete. Tokens can be added for any language, thereby expanding Expellers abilities.
The easiest way to carry out the implementation of the module described above is to use Expellers getFileAsString(File file, boolean deleteComments); method, with deleteComments set to true. This will return a file as a string object and allow you to replace matches to regular expressions with tokens. (Tokens are simply strings with values equal to the token name.) You will find information about how to create and match regular expressions in the java.util.regex package, as well as in the modCountTokens(File file1, File file2); method of Expeller where tokens are matched with regular expressions. They are then counted and totals are compared via the percentMatch(int number1, int number2); method.
After the token strings are built by replacing all token matches by token names, an algorithm will need to be written to match the token strings of each file to the other. The programmer will need to keep in mind that he/she will need to match token strings (combinations of tokens), as well as individual tokens, between the two files. The programmer will need to devise a sophisticated algorithm, remembering that tokens in the supposed cheaters program can be moved around without affecting its operation.
Lastly, more modules can be added that compare the two files in any fashion. However, the programmer should remember to test each module on real sample data, and throw out any modules that skew Expeller away from actual detection of cheating. This was done many times during the development of Expeller v1.0. Also, to carry a tradition and to let the current programmer know how many revisions have been made to Expeller, please add one tenth to the version number for every new senior project on which the code is based.
A note on building modules: Expeller was built to be completely modular. At minimum, all one has to do to add to the sophistication of Expellers results is add a module to the main code, and then run it and add its percent match to the total percent match of the compareFiles method. In order to keep Expeller additions this simplistic, when creating a method, always only allow it to accept, as parameters, two files, and always return the percent match for that module.

What I would do differently
When I started this project, I had not taken any compiler construction courses, and therefore knew nothing of their topics. I will say that one introduction course in compiler construction has changed the direction of this project so drastically that any student that works on Expeller in the future should have first taken one of these classes.
If I were to begin this project again, I would start with the algorithm described in the Extensions to the Project section of this paper. Even though the current modules of Expeller work incredibly well, seeing the results of the suggested algorithm could spark ideas for even more sophisticated modules than the ones that are currently implemented. Since the project only exists, for most students, over a short period of time (one to two semesters) brainstorming is costly as implementation tends to take the most time.
Also, if there was more than one member in the group, I would have assigned one of the members to do some undercover investigative research. He would have asked a professor that teaches an introductory Java programming course (1214 or 1215) to sit in on about a quarter of his/her classes (especially around the time programs are due). Students would probably not be tipped off that he was not a regularly enrolled student, and he could seek out the cheaters and record their methods. They would most likely not hesitate to provide him with information, especially if they had guaranteed anonymity. Additionally, he may be able to work out some scenario where the professor will make an assignment for every student in the class to alter a program in order to fool the professor into believing that its an original. He could then collect this assignment from the teacher and record what the students have altered, thereby strengthening Expeller.
Expeller works by detecting mimics of cheat methods. The most sophisticated module algorithm is useless if it does not detect real student cheating habits.
Technical Manual
Installation Process
There is no installation process, per se. You simply copy the expeller.java file into a directory anywhere on the system that the target files are located and compile it with the systems native Java Runtime Environment (JRE).
Technical Overview
The most thorough way to describe Expeller in a technical manner is to explain its methods in detail. I will first describe the functional methods (the methods that allow Expeller to function as a program) and then move to the modular methods
.expeller(String[] args) Constructor
Since Expeller runs more as a standalone program and not simply a class, the constructor takes care of running all of the various methods in order. First, it prints out a version number. Second it gathers and dissects the argument user options using the dissectArgs(LinkedList argList) method. It then prints the set user options, and begins loading files with the growFileAndDirectoryArray() method. Fourth, it compares the files using the compareFiles() method and keeps track of the total time that the comparisons took so that it can be outputted to the user. Lastly, it tells the user how many cheats were found and what file they have been outputted to.
dissectArgs(LinkedList
argList)
Arguments are supplied to Expeller at runtime. They need to be recursively evaluated because no argument is mandatory. dissectArgs takes, as a parameter, a LinkedList of arguments, which will act as a stack, from the constructor method. It then checks to see if a help argument is first (meaning the user would like to have a display of all the user options) and then proceeds to check each other argument in the list. Invalid arguments are those that are not help and do not start with a -, so the validity of this is checked first. If an invalid flag is found Expeller tells the user of this error. After Expeller knows that its dealing with a valid argument, the letter for the argument is examined, an action (determined by the argument option letter, described in the User Options section) is carried out. The argument is then popped off the stack, and then the method checks to see if the stack is empty. If not, dissectArgs is called again recursively.
compareFiles()
This method runs the comparison modules of Expeller, records percent matches from each module, computes the total percent match between the two files and then prints the result to an output file.
Files are stored in two arrays, file1[] and file2[]. They are cross-matched via two for loops so that every file is compared against every other. Each pair of files is run through all four modules, the score is recorded, and the loop continues with the next pair. After the total results are computed, the final scores that are at or above the threshold user option are written to an output file. The file stream is then closed.
percentMatch(int
number1, int number2)
This method finds the percent match between two numbers. It uses the generic percent difference formula, (difference/average)*100, and then subtracts the result from 100. If the sum of the two numbers is equal to 0, this method returns a percent match of 100 because this would mean that both numbers are equal to 0 (this method will never receive negative parameters) and the absence of any attribute is just as important as its presence. If the sum is not zero, the method continues. After the percent match is calculated, if it is greater than 0 (percent difference is less than 100), the percent match is returned. If the percent match is less than 0 (percent difference is greater than 100), a 0 is returned.
calculateComparisons(int
numberOfFiles)
This method is used to calculate the number of comparisons that Expeller will have to make, based on the number of files that it found. The formula for comparisons based on number of files is: number of comparisons = (n-1) + (n-2) + (n-3) + + (n-(n-1)), n being equal to the number of files.
getDepthOfFile(File
file)
When we create the arrays for depth length comparisons in the modCountBlockSize method, we need to know the depth of each file. This method calculates that depth by incrementing the depth value for every found { and decrementing it for every found }, all the while keeping track of the deepest depth found.
getFileAsString(File
file, Boolean deleteComments)
This method is the heart of textual comparisons in Expeller. It takes as arguments a file and a boolean that determines whether the returned file string will contain comments.
The method reads in each line of a file and determines if a comment is contained in the line by using the regular expression, (?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*) to make matches and replace them with a blank line, and by searching for occurrences of /* and */ and removing them. If a comment is completely contained within a line, the line is removed, and if a comment is in a line along with valid code, the comment is removed and the code is saved. After all comments are removed from a line, the length is tested, if it is greater than 0, the line is added to the final file string. If not, it is forgotten and the next line is read in. Therefore, during the process of removing comments, this method also removes blank lines (white space) which are not used in comparison.
greaterOfTwo(int
int1, int int2)
This method simply returns the greater of two numbers. It is used to size arrays.
printManual()
When a user needs to know what user option arguments that he can provide Expeller at runtime, he will use the argument help. This will trigger the printManual method, which will print all of the available user options to the terminal. Additionally, if a user enters an invalid command, this method is run to educate him on how his error could be corrected.
growFileAndDirectoryArray()
In order to compare files within a directory, they need to be loaded into some data structure. The easiest way to cross-compare elements is by putting them in arrays and running two for loops, one encapsulated in the other. This method searches through the directory specified at runtime, loads all of the files in that directory into an array and if the subdirectory argument has been given, it examines any directories within the initial directory and also adds files that they contain to the files array.
Additionally, the idea of the filter argument is carried out in this method. If the current found file does not match the filter pattern, it is not added to the file array.
While files are added to the array, a . is printed for each one, to let the user know that the loading is occurring.
modCompareLength(File
file1, File file2)
This module is one of two dumb modules implemented. It simply counts the character length of the string representation of a file1 and file2, and then compares the lengths based on the percentMatch method. It is thought of as dumb because there is no sophisticated calculation involved. However, dumb is not to be confused with ineffective; the results are very extremely useful.
modCompareNumberOfLines(File
file1, File file2)
This method effectively does the same thing as modCompareLength, except that it compares the number of lines in each file rather than the number of characters. It is another dumb module and its results are most important when dealing with files of very different lengths. It very easily rules those out as possible matches. So, even though modCompareLength and modCompareNumberOfLines are simplistic, in some ways, they play a greater role than the more sophisticated modules.
modCountBlockSize(File
file1, File file2)
Most cheaters will change variable names and move code around. They are looking for the easy way out and that does not include re-writing any methods. If they do, they might as well not cheat to begin with. This method counts the number of lines at each depth of a program. Depths are described in the getDepthOfFile method, and are simply thought of as a number value for every instance of { or }. Any text outside of curly braces is thought to be in depth 0, any in the first { is depth 1 and so on.
The number of lines at each depth is calculated and stored in an array at the index equal to the depth value. Each individual depths number of lines of each file is compared with that of the other file and a percent match array is formed. Then all of the percent matches are summed and divided by the number of entries in the percent match array, giving Expeller a total percent match for this module, for any two given files.
The module breaks a file string up into lines via delimiting by the \n character. It then determines if the line contains a { in order to increment the block counter. Once inside the {, it begins to add to the array index [blockCounter] for that depth. If a } is found, the block counter is decremented and lines succeeding it are added to the next shallow depth. If no curly braces are found on a line, one is added to the current depth.
The percent match array is sized by the largest depth array size. Meaning if a file1 has one more depth than file2, the percent match array will have the length of file1s block count array. The value for this index in the percent match array will be zero because one file is 100% different than another in respect to that depth value.
modCountTokens(File
file1, File file2)
When plagiarizing another students code, it is very easy to change variable names. However, changing the variable types, loop types, included libraries, and system calls is more than difficult; it requires knowledge of the language something that the cheater most likely doesnt have.
Counting generally found keywords in files and then comparing the number of matches is a very effective way of catching copies of files. However, comparing the matches of one single keyword between two files is counter-productive -- it would be easy to rewrite one form of loop. So, as more regular expressions are added to this module, the more effective it will become.
This
module works by taking an array of regular expressions, cycling through it, and
counting the number of matches within each file, and then recording each match
at the regular expressions index within the expressions match array. It then
calculates a percent match between each index of file1s match array and the
corresponding index of file2s match array.
Source Code
Java was chosen as the language for Expeller because of its usability across many platforms without the need to change any of the source code. It was determined that having all of the modules and functional code contained in one file would be better than having separate module classes, for ease of transferability (sharing). Rather than requiring the user to download and install a package on a system with which he/she may not be familiar, all that is required to install Expeller is the compilation of one file.
package expeller;
import java.io.*;
import java.util.*;
import java.util.regex.*;
public class expeller {
/* File Array - will contain all of the files
found to be compared */
private
File[] filesArray;
/* Directory Array - will contain all of the
directories where files
are found */
private File[] directoriesArray;
/* Array used to hold each file in the Linked
List for them to be
compared against each other */
Object[] filesListToArray;
/* Holds the directory from which the program
was started to know
where to begin searching */
String topDir =
System.getProperty("user.dir");
/* The
program will need to move through many directories
recursively, so initially, the curDir will
be the topDir */
String curDir = topDir;
/* A flag is a complete argument to the
program, the '-' and whatever
letters or words are attached to it */
String curFlag;
/* The argument key that is attached to the
'-' */
String curArg;
/* This string can be changed to represent the
current version number
of Expeller, which will be displayed at
runtime */
final String VERSION_NUMBER =
"1.0";
/* Linked list that contains all of the
directories to be searched */
LinkedList directories = new LinkedList();
/* Linked list that contains all of the files
to be searched */
LinkedList files = new LinkedList();
/* Linked List that contains all of the
runtime arguments to the
program */
LinkedList argsList = new LinkedList();
/* One argument that can be given to the
program is a filter, so that
the program will only search for a
specific filename or extension,
this
variable holds the filter */
String fileNameOrTypeFilter = "No
filter";
/* Contains the output filename for the file
where the possible
cheaters will be located. By default, it's
cheaters.txt, but the
name can also be supplied as an argument
at runtime */
String outputFileName =
"cheaters.txt";
/* An argument can be given that tells the
program whether to search
through subdirectories */
boolean subDirectories;
/* Argument to supply the percent match
threshold for which cheaters
will be reported */
int percentMatchThreshold = 75;
int cheaters = 0;
/* This is the number of built-in intelligent
modules. Intelligent
modules are deemed intelligent by their
complexity and their
ability to provide a sophisticated
comparison of two files */
final int NUMBER_OF_INTELLIGENT_MODULES = 2;
/* This is the number of built-in dumb
modules. Dumb modules return
some simple comparison about the file. The
two dumb modules
currently implemented count the lines and
character lengths of
files */
final int NUMBER_OF_DUMB_MODULES = 2;
/* This public method allows Expeller to be
run as a stand alone
program */
public static void main(String[] args) {
expeller Expeller = new expeller(args);
}
/* This constructor allows Expeller to be
called from another program
as a stand alone class */
public expeller (String[] args) {
/* Program title and version number */
System.out.println("\nExpeller v"
+ VERSION_NUMBER);
/* If there were command line arguments,
add them to an array and
then dissect each one */
if(args.length != 0) {
for(int i = 0; i < args.length; i++) {
argsList.add(args[i]);
}
dissectArgs(argsList);
}
/*
These are informational lines to tell the user which options
he/she has selected at runtime */
System.out.println();
System.out.println("Root search
directory: " + curDir);
System.out.println("Filename or
extension filter: " +
fileNameOrTypeFilter);
System.out.println("Search
subdirectories: " + subDirectories);
System.out.println("Output file name:
" + outputFileName);
System.out.println("Percent match
threshold: " +
percentMatchThreshold);
System.out.println();
/* This line displays to let the user know
that the program is
currently loading files */
System.out.print("Loading
files");
/* This method actually builds the files
and directories arrays in
order to do recursive searching for
files */
growFileAndDirectoryArray();
/* Copies the elements in the files
LinkedList to an Array to
compare each file against each other */
filesListToArray = files.toArray();
/* Output spacing at runtime */
System.out.println();
System.out.println();
/* This line tells the user how many files
have been loaded for
comparison */
System.out.println("(" +
files.size() + " files loaded)" + "\n");
/* This variable will be used to show the
number of comparisons to
the user */
int numberOfComparisons =
calculateComparisons(files.size());
/* This line tells the user that file
comparisons are currently
happening */
System.out.print("Comparing
Files");
/* This variable is used to hold the time
before the comparisons
begin so that it can be compared to the
end time and the time
that the comparisons took can be
calculated */
long startTime = System.currentTimeMillis();
/* This method actually handles the
comparison of the files */
compareFiles();
/* Holds the system time at the end of
comparisons to calculate the
time it took to compare them all */
long endTime = System.currentTimeMillis();
/* Calculates how long the comparisons took
*/
long secondsForComparisons = (endTime -
startTime)/1000;
/* These lines tell the user how many
comparisons took place and
how long they took */
System.out.println();
System.out.println("(" +
numberOfComparisons + " comparisons
completed in "
+ secondsForComparisons + "
seconds)");
System.out.println();
/* This line tells the user how many cheats
were found and where
the
output file is located */
System.out.println(
"(Output of " + cheaters +
" possible Cheats located in " +
outputFileName + ")");
System.out.println();
}
public void dissectArgs(LinkedList argsList)
{
/* Get the first flag from the argsList */
curFlag = argsList.getFirst().toString();
/* Then remove the flag so that we can get
the argument next */
argsList.removeFirst();
/* If the flag is the help/? flag */
if
(curFlag.equalsIgnoreCase("help")) {
/* Print the Help Menu */
printManual();
System.exit(0);
}
/* If the first flag starts correctly, with a
"-" */
else if(curFlag.charAt(0) == '-') {
/* If it's a directory flag */
if
(curFlag.equalsIgnoreCase("-d")) {
/* Try getting it off the stack */
try {
/* Take the first argument off the
stack and make it curArg */
curArg =
argsList.getFirst().toString();
/* Now remove the first argument on the
top of the stack so
that next time the first argument is
pulled, it will be the
next argument on the stack */
argsList.removeFirst();
}
/* If the flag was thrown and no argument
was supplied, catch it
*/
catch (Exception e) {
/* Print out an error letting the user
know what's going on */
System.out.println("\nNo directory
argument supplied");
/* Print the manual so that the user
can correct his/her error
*/
printManual();
/* Kill the program so that the user
can think about his/her
error and re-initiate the program
with valid arguments */
System.exit(0);
}
/* Temporarily storing the directory
object to check if its
actually a directory */
File tempCurDir = new File(curArg);
/* Check if the directory object is
actually a directory. If so,
make the current directory this
directory */
if (tempCurDir.isDirectory())
curDir = curArg;
/* If the current argument is not a
directory, throw error and
exit */
else {
System.out.println("\nERROR:
Directory does not exist.");
printManual();
System.exit(0);
}
}
/* If the flag is a directory flag */
else if (curFlag.equalsIgnoreCase("-f"))
{
try {
curArg =
argsList.getFirst().toString();
argsList.removeFirst();
}
catch (Exception e) {
System.out.println("\nNo file
argument supplied");
printManual();
System.exit(0);
}
/* Pull the current argument off the
stack and make it the
fileNameOrType variable so that we can
use it to search files
*/
fileNameOrTypeFilter = curArg;
}
/* If the flag is the outputFile flag */
else if
(curFlag.equalsIgnoreCase("-o")) {
try {
curArg =
argsList.getFirst().toString();
argsList.removeFirst();
}
catch (Exception e) {
System.out.println("\nNo file
argument supplied");
printManual();
System.exit(0);
}
/* Pull the current argument off the
stack and make it the
outputFileName variable so that we can
use it to output
Cheaters */
outputFileName = curArg;
}
/* If the flag is the percentMatchThreshold
flag */
else if
(curFlag.equalsIgnoreCase("-t")) {
try {
curArg =
argsList.getFirst().toString();
argsList.removeFirst();
}
catch (Exception e) {
System.out.println("\nNo file
argument supplied");
printManual();
System.exit(0);
}
/* Pull the current argument off the
stack and make it the
percentMatchThreshold variable so that
we can use it to output
only files with percentMatch at or
above this threshold */
percentMatchThreshold =
Integer.valueOf(curArg).intValue();
}
/* If the flag is a subDirectory flag */
else if
(curFlag.equalsIgnoreCase("-s")) {
/* Say that we are searching
subDirectories */
subDirectories = true;
}
/* If it's not any of these, it's an
invalid flag. Print the
manual, throw an error, and quit. */
else {
System.out.println("\nERROR: Invalid
flag supplied.");
printManual();
System.exit(0);
}
/* If there are more arguments in the list,
recursively call this
method and parse them as well */
if (argsList.size() != 0) {
dissectArgs(argsList);
}
}
/* If the flag does not start with a
"-", throw an error, print the
manual and quit */
else {
System.out.println("\nERROR: Invalid
flag supplied. Flags must start" +
" with a
\"-\".");
printManual();
System.exit(0);
}
}
/* This method will take the
files list array and compare each file
with each other */
public void compareFiles() {
/* This number will keep track of the
matching percentage of each
pair of files */
int percentMatch = 0;
/* This variable is only used to tell the
programmer the
percentMatch for each module for
debugging
The tempPercentMatch outputs will be
left in the program (but
commented out) for easy debugging
purposes upon addition of new
modules */
int tempPercentMatch = 0;
/* Here we calculate the number of modules
we currently have
installed DUMB MODULES are divided in
half because their percent
match is not worth as much as an
intelligent module, in fact,
it's worth about half */
int numberOfModulesDenominator =
NUMBER_OF_INTELLIGENT_MODULES +
(NUMBER_OF_DUMB_MODULES/2);
/* We must put this section in a try/catch
statement because we
will be writing to a file, which throws
exceptions */
try {
/* Set up the output stream for the file
to be written to
outputFileName will be cheaters.txt by
default, unless
otherwise specified by the user at
runtime as an argument */
FileWriter outStream = new
FileWriter(outputFileName);
BufferedWriter outputFile = new
BufferedWriter(outStream);
PrintWriter outFile = new
PrintWriter(outputFile);
/* Begin output to cheaters.txt */
outFile.println();
outFile.println("Files to look
at:");
/* These two loops will traverse the
array and compare every two
files */
for (int x = 0; x <
filesListToArray.length - 1; x++) {
for (int y = (x + 1); y <
filesListToArray.length; y++) {
/* Reset the percentMatch and
tempPercentMatch for this pair
*/
percentMatch = 0;
tempPercentMatch = 0;
/* Create two files from the two that
we are currently
looking at in the array */
File file1 = new
File(filesListToArray[x].toString());
File file2 = new File(filesListToArray[y].toString());
/* Prints the two files to be
compared on every iteration
Left in for future debugging */
//System.out.println("\n" +
file1.getName() + " " +
file2.getName());
/* Run the modules, one-by-one, each
one adding to the total
percentMatch */
tempPercentMatch =
modCompareLength(file1, file2);
percentMatch += tempPercentMatch*0.5;
//System.out.println("compare
length percent match: " +
//tempPercentMatch);
tempPercentMatch =
modCompareNumberOfLines(file1, file2);
percentMatch += tempPercentMatch*0.5;
//System.out.println("compare
lines percent match: " +
//tempPercentMatch);
tempPercentMatch =
modCountTokens(file1, file2);
percentMatch += tempPercentMatch;
//System.out.println("count
tokens percent match: " +
//tempPercentMatch);
tempPercentMatch =
modCountBlockSize(file1, file2);
/* We need to have another variable
for this percent match
because if the file comes back
dirty (not a real
compilable code file) the percent
match will be thrown out
*/
int modCountBlockSizePercentMatch =
tempPercentMatch;
/* If the module's percent match is
not thrown out for these
two files, add the
tempPercentMatch to the percentMatch
total */
if(modCountBlockSizePercentMatch !=
-1)
percentMatch += tempPercentMatch;
//System.out.println("count
block size percent match: " +
//tempPercentMatch);
/* If the
modCountBlockSizePercentMatch is valid, we'll need
to divide the total percent match
by it to get a 0-100
average, otherwise divide by one
less to represent the
missing modules percentMatch */
if(modCountBlockSizePercentMatch !=
-1)
percentMatch /=
numberOfModulesDenominator;
else
percentMatch /=
(numberOfModulesDenominator - 1);
//System.out.println("Total FILE
percent match: " +
percentMatch);
/* If the two files' percentMatch is
greater than the
percentMatch threshold, print them
out to the screen */
if (percentMatch >=
percentMatchThreshold) {
/* These lines will print to the
screen any two files that
have a percent match that is the same or
higher than the
threshold Left in for future
debugging */
//System.out.print(filesListToArray[x].toString() + " " +
//filesListToArray[y].toString() + " ");
//System.out.println(percentMatch);
/* These lines write out to file
every two files which
match is at or above the
threshold */
outFile.println();
outFile.println(" Cheat Possibility: " + percentMatch +
"%");
outFile.println();
outFile.println(" " + filesListToArray[x].toString());
outFile.println(" " + filesListToArray[y].toString());
outFile.println();
/* Increment the cheaters value for
output at the end of
the file and on the screen to
tell the user how many
cheaters we found */
cheaters++;
}
/* For every comparison made, output
a '.' to the screen to
let the user know that the system
is working */
System.out.print(".");
}
}
/* Print total number of possible cheats
to file */
outFile.println();
outFile.println(cheaters + "
possible cheats");
outFile.println();
/* Close the output file, which will in
turn write out all data
in the input stream */
outFile.close();
}
catch (IOException exception) {
System.out.println(exception);
}
}
/* This is a simple method to calculate the
number of comparisons
with a give number of files. The formula
used is [(n - 1) + (n
2) + ... + (n - (n - 1)] n = number of
files */
public int calculateComparisons(int
numberOfFiles) {
int numberOfComparisons = 0;
for(int i = numberOfFiles - 1; i > 0;
i--) {
numberOfComparisons += i;
}
return numberOfComparisons;
}
/* Simple method that prints the user manual
*/
public void printManual() {
System.out.println("\nUsage: java
expeller <options>");
System.out.println("where possible
options include:");
System.out.println(" -d <full path> Start search" +
" in this
directory");
System.out.println(" -o <output file name> Output cheaters" +
" to this
file");
System.out.println(" -f <filename or extension> Only search for" +
" a specific
file\n
" +
"name or all files with a\n " +
" specific extension");
System.out.println(" -s Include" +
" subdirectories in
the search");
System.out.println(" -t <cheat threshold> Threshold for" +
" considered
cheaters");
System.out.println(" help/? This menu");
System.out.println();
System.out.println("NOTE: Spaces not
supported in arguments");
System.out.println();
}
/* This method recursively adds files and
directories to their
respective arrays. The recursion comes in
when sub directories
exist */
public void growFileAndDirectoryArray() {
/* Creates a new file object that is
actually the directory from
which the program is run.*/
File file = new File(curDir);
/* Creates a new fileFilter object to
filter only files with the
user specified extension */
FileFilter fileFilter = new FileFilter() {
public boolean accept(File file) {
if(fileNameOrTypeFilter.equals("No
filter"))
fileNameOrTypeFilter = "";
/* If this file object is not a
directory (is a file) and its
extension is the user specified
extension, then return true
to add it to the file array, else
return false and don't add
it to the array */
if (!file.isDirectory() &&
file.getName().endsWith(fileNameOrTypeFilter)
&& file.isFile()) {
return true;
}
else {
return false;
}
}
};
/* Creates a new directoryFilter object to
filter only directories
*/
FileFilter directoryFilter = new
FileFilter() {
public boolean accept(File file) {
/* If this file object is a directory
then return true to add
it to the directory array, else
return false and don't add
it to the array */
if (file.isDirectory()) {
return true;
}
else {
return false;
}
}
};
/* Puts all of the files that comply with
the filter in a file
array */
filesArray = file.listFiles(fileFilter);
/* Puts all of the directories that comply
with the filter in a
directory array */
directoriesArray =
file.listFiles(directoryFilter);
/* Try here because some directories may be
hidden or we may not
have permission to view them */
try {
/* Copies all file locations in the array
to the linked list */
for (int i = 0; i < filesArray.length;
i++) {
/* For every file that is found in
curDir, print a "." */
System.out.print(".");
/* Add every file that is found to the
filesArray */
files.add(filesArray[i]);
}
/* Copies all directory locations in the
array to the linked list
*/
for (int i = 0; i <
directoriesArray.length; i++) {
directories.add(directoriesArray[i]);
}
}
/* Catch the exception that a directory is
hidden or not accessible
because of permissions */
catch(Exception e) {
System.out.println("\nDirectory not
accessible: " + curDir);
System.out.println();
}
/* If there are any directories in the
linked list, we're going to
go into them and search for more files
and directories */
if(directories.size() != 0) {
/* Set the current directory to the
last directory on the
linked list so that we can search
inside it by calling this
method recursively*/
curDir =
directories.getLast().toString();
/* Take the now curDir off the list so
that we don't search it
again */
directories.removeLast();
/* Call the current method recursively
so that we can search
the new curDir for files and
directories */
if(subDirectories == true)
growFileAndDirectoryArray();
}
}
/* This method will return the file as a text
string so that it can
be manipulated and checked. The
deleteComments option indicates
whether you want comments taken out of the
file (it's better to
compare two files without comments since
anything can be put in
comments to throw comparisons off */
public String getFileAsString(File file,
boolean deleteComments) {
/* This variable holds the complete file in
a string as it's read
from the file itself */
String fileAsString = "";
/* Holds each line of the file as it's read
in, and then added to
the
fileAsString */
String currentLine = "";
/* This variable exists to tell Expeller if
the current line is
within a commented segment (the last
line ended without closing
the commented section */
boolean inComment = false;
try {
FileInputStream fileInputStream = new
FileInputStream(file);
InputStreamReader fileInputStreamReader =
new InputStreamReader(
fileInputStream);
BufferedReader fileBufferedReader = new
BufferedReader(
fileInputStreamReader);
/* While there are lines to be read */
while(fileBufferedReader.ready()) {
currentLine =
fileBufferedReader.readLine();
/* If we are currently in a comment and
the comment doesn't end
on this line (there are no end
commend delimiters in this
line), continue the search and this
line will be omitted */
if(inComment == true &&
(currentLine.indexOf("*/") == -1)) {
continue;
}
/* The comment must have ended so set
the inComment flag to
false */
inComment = false;
/* We don't care about leading or
trailing whitespace, so we
trim the line */
currentLine = currentLine.trim();
/* In this section, we will take the
comments out of the file
*/
if(deleteComments == true) {
/* Here we have a regular expression
that should catch all
comments in the file and replace
them with a blank line
("") */
currentLine = currentLine.replaceAll(
"(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)",
"");
/* If we have comments that are not
completely contained on
one line (such as this comment),
we have to signal that we
are inside a comment with the
inComment flag. If the
comment doesn't start at the
beginning of the line, we
still want the text from position
0 to where the comment
begins */
if (currentLine.indexOf("/*")
!= -1) {
currentLine =
currentLine.substring(0, currentLine.indexOf(
("/*"))).
trim();
inComment = true;
}
/* If the current line does contain
an end comment delimiter,
we'll want the text after the
delimiter */
if
(currentLine.indexOf("*/") != -1)
currentLine =
currentLine.substring(currentLine.indexOf("*/")
+ 2,
currentLine.length());
}
/* In the case that a fully commented
line with
[asterik][star] text [asterik][star]
or [slash][slash] is
found, we will replace it with a
blank line, and if so, we
don't want to add it to the
fileAsString because whitespace
doesn't count */
if(currentLine.length() > 0)
fileAsString += currentLine +
"\n";
}
}
catch (Exception e){
e.printStackTrace();
}
return fileAsString;
}
/* This method is used to get the depth of
the file pertaining to
curly braces any information completely
outside of curly braces is
in depth0, anything inside the class call
is depth1, and so on.
Knowing this depth will be helpful in
determining how big our
depth comparison array should be */
public int getDepthOfFile(File file) {
/* This is the depth of any given
succession of opening curly
braces */
int depth = 0;
/* This is the deepest depth found so far
*/
int deepestDepth = 0;
/* Here we're getting the file as a string
and deleting comments
from it */
String fileAsString = getFileAsString(file,
true);
/* We are converting the file string to a
character array so that
we can search through it for the opening
curly brace. We use
this way rather than the StringTokenizer
way due to StringTokenizer's lack of ability to
search for multiple tokens */
char[] fileAsCharArray =
fileAsString.toCharArray();
/* In this for loop we search for open
braces, and for every one,
add one to the depth. If the depth is
greater than the deepest
depth, we make the two equal. We also
subtract one from the
current depth if a closing curly brace
is found to show that we
are leaving a block */
for(int i = 0; i <
fileAsCharArray.length; i ++) {
if(fileAsCharArray[i] == '{') {
depth++;
if(depth > deepestDepth)
deepestDepth = depth;
}
if(fileAsCharArray[i] == '}') {
depth--;
}
}
return deepestDepth;
}
/* Simple method that returns the greater of
two numbers. It is used
to build arrays when there are two
elements involved (since an
array cannot grow) */
public int greaterOfTwo(int int1, int int2) {
if(int1 > int2)
return int1;
else
return int2;
}
/* This dumb module will compare the lengths
of two files */
public int modCompareLength(File file1, File
file2) {
/* Percent match for this module and these
two files */
int percentMatchThisModule = 0;
int file1Length = (int)file1.length();
int file2Length = (int)file2.length();
percentMatchThisModule =
percentMatch(file1Length, file2Length);
return percentMatchThisModule;
}
/* This dumb module will compare the number
of lines in two files */
public int modCompareNumberOfLines(File
file1, File file2) {
int percentMatchThisModule = 0;
int file1NumberOfLines = 0;
int file2NumberOfLines = 0;
String file1AsString =
getFileAsString(file1, true);
String file2AsString =
getFileAsString(file2, true);
/* We simply take the files as strings and
delimit them by the end
of line character in order to count the
number of lines */
StringTokenizer file1AsStringTokenizer =
new
StringTokenizer(file1AsString,
"\n");
StringTokenizer file2AsStringTokenizer =
new
StringTokenizer(file2AsString,
"\n");
file1NumberOfLines = file1AsStringTokenizer.countTokens();
file2NumberOfLines =
file2AsStringTokenizer.countTokens();
percentMatchThisModule =
percentMatch(file1NumberOfLines, file2NumberOfLines);
return percentMatchThisModule;
}
/* This module is an 'intelligent' module. It
counts the number of
lines within any given block */
public int modCountBlockSize(File file1, File
file2) {
/* Here we build arrays to hold the size of
each block in a file.
We add one to the size to account for
text outside of any braces
(depth0) */
int file1BlockSizes[] = new
int[getDepthOfFile(file1) + 1];
int file2BlockSizes[] = new
int[getDepthOfFile(file2) + 1];
/* Array to hold the percent matches for
each depth's lines. It
needs to be the greater size of the two
depths */
int percentMatches[] =
new
int[greaterOfTwo(file1BlockSizes.length,
file2BlockSizes.length)];
/* Used to hold the depth that we are
currently in */
int blockCounter = 0;
/* Holds the percent match so far and the
total percent match for
this module */
int percentMatchesTotal = 0;
int percentMatchThisModule = 0;
/* Used to hold the current line being
looked at from the file */
String file1CurrentLine;
String file2CurrentLine;
/* Both files as strings */
String file1AsString =
getFileAsString(file1, true);
String file2AsString =
getFileAsString(file2, true);
/* We delimit each fileAsString by the \n
symbol to get one line at
a time */
StringTokenizer file1AsStringTokenizer =
new StringTokenizer(file1AsString,
"\n");
StringTokenizer file2AsStringTokenizer =
new StringTokenizer(file2AsString,
"\n");
/* Constant loop to search through the file
*/
while(true) {
/* Try to get the next line */
try {
file1CurrentLine =
file1AsStringTokenizer.nextToken();
}
/* If the file has no (or in some systems
no) lines, we'll need
to abort this comparison */
catch(NoSuchElementException e) {
break;
}
/* Here we're gong to check if this line
contains the beginning
of a block (has opening curly braces)
*/
if
(file1CurrentLine.indexOf("{") != -1 &&
file1CurrentLine.indexOf("}") == -1) {
/* If so, increment the block counter
to show that we have
entered a deeper depth */
blockCounter++;
/* Add one to this depth's location in
the fileBlockSizes array
to show that this depth contains
another line */
file1BlockSizes[blockCounter]++;
continue;
}
/* If we are leaving a block */
if
(file1CurrentLine.indexOf("}") != -1 &&
file1CurrentLine.indexOf("{") == -1) {
/* decrement the block counter */
blockCounter--;
/* If the block counter is less than
zero, that means there
were curly braces that don't close
out a block (invalid
file), so we return a -1 to indicate
that this module's
percent match should not be used in
the total comparison */
if(blockCounter < 0)
return -1;
/* Even tho this line is the end of a
block, it still counts,
so add one to it's location in the
BlockSizes array */
file1BlockSizes[blockCounter]++;
continue;
}
/* This catches all lines that have an
opening and closing
braces, such as a very short method.
We don't add to the
blockCounter here because it would
then be able to fool
Expeller by putting each of your
methods in a single line. We
just count these lines as part of the
block that we are
currently in */
if
((file1CurrentLine.indexOf("}") == -1 &&
file1CurrentLine.indexOf("{") == -1)
|| (file1CurrentLine.indexOf("}")
!= -1 &&
file1CurrentLine.indexOf("{") != -1)) {
file1BlockSizes[blockCounter]++;
continue;
}
}
/* Reset the block counter for file 2 and
do the same thing to it
*/
blockCounter = 0;
while(true) {
try {
file2CurrentLine =
file2AsStringTokenizer.nextToken();
}
catch(NoSuchElementException e) {
break;
}
if
(file2CurrentLine.indexOf("{") != -1 &&
file2CurrentLine.indexOf("}")
== -1) {
blockCounter++;
file2BlockSizes[blockCounter]++;
continue;
}
if
(file2CurrentLine.indexOf("}") != -1 &&
file2CurrentLine.indexOf("{")
== -1) {
blockCounter--;
if(blockCounter < 0)
return -1;
file2BlockSizes[blockCounter]++;
continue;
}
if
((file2CurrentLine.indexOf("}") == -1 &&
file2CurrentLine.indexOf("{")
== -1)
||
(file2CurrentLine.indexOf("}") != -1 &&
file2CurrentLine.indexOf("{")
!= -1)) {
file2BlockSizes[blockCounter]++;
continue;
}
}
/* Here we will compute percent matches for
all of the methods'
lengths and put our results in a percentMatches
array */
for(int i = 0; i <
percentMatches.length; i++) {
try {
percentMatches[i] =
percentMatch(file1BlockSizes[i],
file2BlockSizes[i]);
}
catch(ArrayIndexOutOfBoundsException e) {
/* If a certain file has a depth X, and
another file has the
depth Y, the percentMatch array will
have length equal to
the 'deeper' file. The more shallow
file will not have array
locations for the deeper depths, so
accessing those
locations will throw an exception.
We know why this
exception is thrown (the depth
doesn't exist), so we fill
those percent matches with 0,
because a non-existent depth
compared with any depth other than
non-existent (which isn't
even a valid comparison because the
percentMatches length is
always as long as the deepest depth
file), should be 0 */
while(i < percentMatches.length) {
percentMatches[i] = 0;
i++;
}
break;
}
}
/* Add up all of the depths percent matches
for a total percent
match of this module */
for(int j = 0; j <
percentMatches.length; j++) {
percentMatchesTotal += percentMatches[j];
}
/* We'll need to divide the modules percent
match by the number of
percent matches in order to get a 0-100
scale */
percentMatchThisModule =
percentMatchesTotal /
percentMatches.length;
return percentMatchThisModule;
}
/* This module counts Regular Expressions as
Tokens and compares them
to how many times they were used in
another file. */
public int modCountTokens(File file1, File
file2) {
int numberOfRegularExpressions = 19;
/* This string, along with the entries
below, will be all of the
regular expressions searched for */
String expressionsToSearchFor[] = new
String[numberOfRegularExpressions];
expressionsToSearchFor[0] = "(for)(
*)(\\()";
expressionsToSearchFor[1] = "(while)(
*)(\\()";
expressionsToSearchFor[2] = "(if)(
*)(\\()";
expressionsToSearchFor[3] = "(else)(
*)(if)";
expressionsToSearchFor[4] = "(else)(
*)(\\{)";
expressionsToSearchFor[5] =
"(System.out.println)";
expressionsToSearchFor[6] = "(break)(
*)(;)";
expressionsToSearchFor[7] =
"(continue)( *)(;)";
expressionsToSearchFor[8] =
"(==)";
expressionsToSearchFor[9] =
"(\\+)(\\+)";
expressionsToSearchFor[10] = "(\\-)(\\-)";
expressionsToSearchFor[11] = "(int)(
*)[a-zA-Z][a-zA-Z0-9]*";
expressionsToSearchFor[12] =
"(\\);)";
expressionsToSearchFor[13] =
"(.)";
expressionsToSearchFor[14] =
"[a-zA-Z][a-zA-Z0-9]*\\(\\);";
expressionsToSearchFor[15] = "(!)";
expressionsToSearchFor[16] =
"(\\{)";
expressionsToSearchFor[17] =
"(\\})";
expressionsToSearchFor[18] =
"(||)";
int percentMatchThisModule = 0;
/* Keeps track of the percent match between
the two files for any
given iteration of the search loop */
int percentMatchThisRegularExpression;
/* In these arrays, we'll store the number
of each expression */
int expressionsInFile1[] = new
int[numberOfRegularExpressions];
int expressionsInFile2[] = new
int[numberOfRegularExpressions];
/* These are the required class variables
to perform string
matching with regular expressions */
Pattern pattern;
Matcher matcherFile1;
Matcher matcherFile2;
for(int i = 0; i <
numberOfRegularExpressions; i++) {
percentMatchThisRegularExpression = 0;
/* We compile a pattern from each regular
expression that we are
searching for */
pattern =
Pattern.compile(expressionsToSearchFor[i]);
/* Matcher's created with each file matched
against the regular
expression of this iteration */
matcherFile1 =
pattern.matcher(getFileAsString(file1, true));
matcherFile2 =
pattern.matcher(getFileAsString(file1, true));
/* While finding this pattern in each of
these two files, add 1
to the location in each of their
'found' array */
while(matcherFile1.find())
expressionsInFile1[i]++;
while(matcherFile2.find())
expressionsInFile2[i]++;
/* We add this iterations percent match
to the total percent
match for this module */
percentMatchThisRegularExpression =
percentMatch(expressionsInFile1[i],expressionsInFile2[i]);
percentMatchThisModule +=
percentMatchThisRegularExpression;
}
/* We divide the percentMatchThisModule by
the number of Regular
Expressions and we get a percentMatch on
a scale of 1-100 */
percentMatchThisModule /=
numberOfRegularExpressions;
/* We return the percentMatchThisModule to
be included in the total
percent match */
return percentMatchThisModule;
}
/* This is the method that computes the
percent match of two numbers.
The equation that is used is 100 - percent
difference (which is
difference/average) */
public int percentMatch(int number1, int
number2) {
int difference = number1 - number2;
int sum = number1 + number2;
/* If the two numbers added together are
equal to 0 (if both
numbers are 0 because we won't ever have
negative numbers),
return a 100 percent match because
non-existence is just as
important as existence */
if(sum == 0)
return 100;
/* If one number b is greater than number
a, we'll end up with a
negative difference, so reverse the sign
*/
if(difference < 0)
difference *= -1;
/* We set up our equation */
int numerator = difference;
double denominator = (sum/2.0);
double equation = numerator/denominator;
/* We actually just calculated percent
difference, so to get
percent match we'll need to subtract
percent difference from
100. We cast this as an int because it's
possible for the
denominator to be a decimal number */
int percentMatch = (int)(100 -
(equation*100));
/* If the percent match is greater than 0
(the percent difference
is less than 100), return the percent
match */
if(percentMatch > 0)
return percentMatch;
/* If the percent match is less than or
equal to zero (the percent
difference is greater than 100), return
0 */
else
return 0;
}
}
After-graduation Project Maintainability
Expeller is truly
maintenance-free. Updates are desired, but not needed. One simply needs to keep
a copy of the source code to compile on whatever machines the future has
in-store.
The source code to
Expeller should be openly available. The strongest encryption is one with a
publicly available key. In Expellers case, the key to understanding and
thereby circumventing it is the source code. However, if Expeller becomes
sophisticated enough, having the source code will not aid at all in
circumvention. By open-sourcing the code, Expeller will undoubtedly draw the
attention of intelligent programmers and become nearly invincible to plagiarism
attacks. My intention is to host Expeller and all related informational
material on my website (http://www.blaynedreier.com/)
in hopes of allowing it to grow.