Leveraging the Internet for Exhaustive Plagiarism Detection

Jacobi Petrucciani

August 29, 2017

In higher education, having the internet as a resource is both a blessing and a curse. Nearly any subject can be researched online, turning up an almost unlimited amount of material. People who like to teach themselves can do so successfully. Unfortunately for computer science instructors, the internet is also the perfect place to share, find, and commandeer code. Because assignments in computer science courses can take hours or even days to solve, and answers can be found easily online or taken from fellow students, plagiarism runs rampant.

Computer science is a complex subject, and learning to code well takes mental preparation. Many students turn to the internet for assistance. Here, they can learn the inner workings of computers and how humans interact with them. Those who have the time (and willpower) can obtain all sorts of knowledge by using the internet. Although those in computer science need to master online research to be successful in the field, some students still feel the need to cheat because they aren’t willing to put the necessary effort into their coursework.

Plagiarism is a major hurdle in higher education. It undermines the learning of the students who take part. It also damages the quality of their education, the reputation of the university, and the standing of those who graduate from that university. In 2000, the Center for Academic Integrity found that 80% of college students cheated during their enrollment [1]. In the 17 years since, the internet’s resources have continued to grow, creating endless opportunities for cheating.

As the internet becomes an ever more powerful resource, plagiarism worsens. How can universities handle plagiarism on such a large scale in computer science? TAs can review each student’s code, but is that really the best use of resources on hand? TAs need to be available to help students actually learn, not just to enforce a school’s plagiarism policy. With the number of computer science students steadily increasing each year, the amount of work required to grade and check for plagiarism is becoming almost impossible to manage without the help of software.

Detecting plagiarism in code is quite complex. Simply comparing two code files 1:1 doesn’t always work. Students can share their code with others or find it online, and once they’ve obtained the code, they can alter it in ways that don’t affect the output or results of the program. These modifications make it more difficult to determine that the code was copied. A 1:1 check can be easily gamed by making slight adjustments, like changing the variable names or shifting the code structure by a few lines to make it seem unique. With tiny changes, a student can make code look original when in reality it’s still practically someone else’s work. This is where most automated plagiarism checkers fall short.
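To make the renaming problem concrete, here is a minimal Python sketch. It is an illustration only, not a description of how Mimir Classroom or any other checker works. It compares two submissions twice: once as raw text, and once after replacing every identifier with a placeholder and dropping comments. The function names (normalized_tokens, similarity) and the two sample submissions are hypothetical, made up for this example.

```python
import difflib
import io
import keyword
import tokenize

# Two hypothetical submissions: the second only renames things and adds a comment.
ORIGINAL = """\
def total(values):
    result = 0
    for v in values:
        result += v
    return result
"""

RENAMED = """\
def add_up(numbers):
    # sum everything
    acc = 0
    for n in numbers:
        acc += n
    return acc
"""

LAYOUT = (tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT)
IGNORED = (tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER)


def normalized_tokens(source):
    """Tokenize Python source, hiding identifier names and comments."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in IGNORED:
            continue  # comments and blank lines do not change behavior
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")  # every variable/function name looks the same
        elif tok.type in LAYOUT:
            tokens.append(tokenize.tok_name[tok.type])  # keep block structure
        else:
            tokens.append(tok.string)  # keywords, operators, literals
    return tokens


def similarity(a, b):
    """Similarity ratio in [0, 1] between two sequences."""
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


raw_score = similarity(ORIGINAL, RENAMED)
normalized_score = similarity(normalized_tokens(ORIGINAL), normalized_tokens(RENAMED))
print(f"raw text similarity:   {raw_score:.2f}")
print(f"normalized similarity: {normalized_score:.2f}")
```

For this pair, the raw text comparison reports less than a full match, while the normalized comparison sees two identical token streams. A sketch like this still misses reordered statements or restructured loops, which is why renaming is only the simplest case of the problem described above.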

The team at Mimir utilizes a much more comprehensive approach when checking for plagiarism within Mimir Classroom. The platform combines metrics and proven methods of checking file structure and coding style to give accurate readouts detailing why two files look similar and why a plagiarism flag was raised.


This approach allows the platform to check for many kinds of plagiarism, not only blatant copy-and-paste. Algorithms in the platform also learn from instructor feedback, so they constantly improve based on the instructor’s classes. Our tools scan common coding sites, including GitHub and Stack Overflow, for potential sources of plagiarism.

While no system is perfect, our team is constantly working to ensure that the best detection is being provided through the newest and most accurate methods that exist. Continual improvement of the quality of our detection software combined with feedback from instructors using our platform allows us to stay on top of the latest trends in plagiarism. Our goal is to act as a powerful deterrent, while encouraging students to learn and implement the concepts on their own.