UW Ellbogen CTL:  Instructional Computing Services

Last update 18 May, 2009; R. Hill

What's all this about plagiarism and what help can technology offer?

 

Instructors who assign writing projects occasionally find themselves uncomfortably suspicious of the originality of student work, and, having heard that the Internet is a rich source of plagiarized material, wonder how to search for a possible source of that work.  Many Internet sites do indeed provide self-published commentary and fiction, many provide documents as a public service by government agencies, and some have been set up to sell papers and essays outright (www.duenow.com).

The staff of the Ellbogen Center for Teaching and Learning does not, in general, recommend the use of software to detect plagiarism, but rather recommends that students be given developmental assignments, turning in progress reports, prospectuses, drafts, and bibliographies, both to discourage plagiarism and to encourage good research and revision.  The Council of Writing Program Administrators explains this issue in a position statement (http://wpacouncil.org/positions/WPAplagiarism.pdf) that makes sense to us.

We are investigating purpose-built software to assess both its benefits in general and the features of specific products.  For the time being, we recommend

Note that in the tests below, all of which involved copying from Internet sources, a Google search performed as well as any other tool, even on altered versions of the text.

Any mechanical text search is a simple tool only (with limitations implied by the test results in the Appendix below), and should play a minor part in assessment.  We maintain that both positive matches and negative results should lead to the same step-- a conversation with the student about both the subject matter and the writing process-- and that this conversation is the only reliable indicator of what's going on, anyway.    Please speak to Jane Nelson, Director of the Ellbogen Center, for more advice.

 

TESTS from MAY 2009

Test RESULTS for SeeSources

  1. Report on recombinant DNA, www.ai.mit.edu/research/abstracts/abstracts2000/ps/z-abelson.ps, also used in Tests of 2002 as #6 below
    >>  Submitted last two paragraphs in plain text, after extraction from PostScript file
    >>>>  Result:  Source found in PDF paper and derivative abstracts web page.
  2. Report on recombinant DNA, #6 below, varied by substitution of a few words and phrases
    >>  Submitted variant plain text
    >>>>  Result:  Source revealed (as original PDF document)
  3. Actual paper submitted to Nursing faculty
    >>  Submitted in full (as uploaded Word document)
    >>>>  Result:  Sources found in both PDF brochure and web page from national health care organization, a source already suspected by Nursing faculty member
  4. Column "Hunger Strike" by Ed Griffin-Nolan, Syracuse New Times, July 30, 2008(?)
    >>  Submitted page 3 of 4 pages, plain text
    >>>>  Result:  Source revealed in newspaper's archives
  5. Column by Betsy Wade, 1992, New York Times, "Lots of Pomp, and a Little Happenstance," varied by substitution of a few words
    >>  Submitted two paragraphs, with slight changes
    >>>>  Result:  Source revealed in newspaper's archives

Test Results for Doc Cop

  1. Report on recombinant DNA, www.ai.mit.edu/research/abstracts/abstracts2000/ps/z-abelson.ps, from Tests of 2002, #6 below
    >>  Submitted last two paragraphs in plain text, after extraction from PostScript file
    >>>>  Result:  Source revealed (as original PDF document) on clicking Google search link under first string found
  2. Report on recombinant DNA, #6 below, varied by substitution of a few words and phrases
    >>  Submitted variant plain text
    >>>>  Result:  Source revealed (as original PDF document) on clicking Google search link under first string found
  3. Actual paper submitted to Nursing faculty
    >>  Submitted ~500 words made up of selected sentences
    >>>>  Result:  Source revealed (as referenced PDF document described above) on clicking Google search link under first string found; 2% of submission.  (Note that results are shown for all strings, but that only the strings found are relevant to the source search).
  4. Column "Hunger Strike" by Ed Griffin-Nolan, Syracuse New Times, July 30, 2008(?)
    >>  Submitted page 3 of 4 pages, plain text
    >>>>  Result:  Source NOT FOUND
  5. Column by Betsy Wade, 1992, New York Times, "Lots of Pomp, and a Little Happenstance," varied by substitution of a few words
    >>  Submitted initial lines, with slight changes
    >>>>  Result:  Source revealed in newspaper's archives on clicking Google search link for first string found

Test Results for Google Search Engine

  1. Report on recombinant DNA, www.ai.mit.edu/research/abstracts/abstracts2000/ps/z-abelson.ps, from Tests of 2002, #6 below
    >>  Submitted first 256 characters in search box
    >>>>  Result:  Source revealed (as original PDF document) at top of list
  2. Report on recombinant DNA, #6 below, varied by substitution of two initial words
    >>  Submitted first 256 characters in search box
    >>>>  Result:  Source revealed (as original PDF document) at top of list
  3. Actual paper submitted to Nursing faculty
    >>  Submitted sentence from middle of paper, truncated to Google limit
    >>>>  Result:  Source revealed (as original PDF document described above) at top of list
  4. Column "Hunger Strike" by Ed Griffin-Nolan, Syracuse New Times, July 30, 2008(?)
    >>  Submitted page 3 of 4 pages, plain text
    >>>>  Result:  Source revealed in the paper's archives
  5. Column by Betsy Wade, 1992, New York Times, "Lots of Pomp, and a Little Happenstance," varied by substitution of a few words
    >>  Submitted initial lines, with slight changes
    >>>>  Result:  Source revealed in newspaper's archives
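
The Google tests above amount to pasting the opening of a passage into the search box. A minimal Python sketch of that step follows. The function names, the trimming at a word boundary, and the optional quoting for an exact-phrase search are assumptions made for illustration, and the 256-character figure is simply the length used in these tests, not a documented Google limit.

```python
from urllib.parse import urlencode

# Length used in the tests above; an assumption, not a documented limit.
MAX_QUERY_CHARS = 256


def build_query(text, max_chars=MAX_QUERY_CHARS, exact=False):
    """Collapse whitespace and keep whole words from the start of text.

    The result never exceeds max_chars, including the two quotation
    marks added when exact=True.
    """
    budget = max_chars - 2 if exact else max_chars
    kept = []
    length = 0
    for word in text.split():
        extra = len(word) + (1 if kept else 0)  # +1 for the joining space
        if length + extra > budget:
            break
        kept.append(word)
        length += extra
    query = " ".join(kept)
    return f'"{query}"' if exact else query


def search_url(query):
    """Return a Google search URL for the query (hypothetical helper)."""
    return "https://www.google.com/search?" + urlencode({"q": query})
```

Calling search_url(build_query(passage)) gives a link that can be opened in a browser. Passing exact=True wraps the text in quotation marks, which is closer to the "exact phrase" searches in the 2002 tests further down.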

 

Out-of-date

 

 


Among the many services that compare text submissions to Internet documents are Turnitin and EVE2.  UW found Turnitin inadequate, largely because of its dubious writing model, in which a student presumably submits successive refinements of a paper until it "passes" the plagiarism test, and selected EVE2 instead.  However, as of 2009, we cannot make EVE2 yield dependable results, and we no longer support or provide this software.

TESTS of 2002  (Adapted from 20 September 2002 workshop notes)
NOW OUT-OF-DATE

 

TEST RESULTS-- EVE2

EVE2 does not provide direct links between passages from the submitted text and source documents found. In other words, to discover plagiarism, the instructor would have to look through the source documents found, perhaps with a string search, to spot exact duplication. The following tests use these EVE2 settings:  Quick search, 50% cutoff
  1. A Robert Frost essay appearing as an HTML (text) file at www.robertfrost.org/essay.html
    >> Submitted as "The Most of Rhodora" by Lilliwhite Hands
    >>>> Result: Original source (HTML essay) found. 30% match
  2. A cut-and-paste combination of two reviews of the book "Xanthippic Dialogues" by Roger Scruton, one at www.geocities.com/Athens/Ithaca/2564/scruton.htm and one at www.staugustine.net/review.html
    >> Submitted as "Xanthippe's Presence" by Constant Cadger
    >>>> Result: One original source, the Geocities page, was found; the St. Augustine Press page no longer on the web. 43% match
  3. A short report, four paragraphs with headings, by someone named Justin on a Geocities page at http://www.geocities.com/lizards_312/Justins-Universe-dense-objects.html
    >> Submitted as "Afterlives of Stars" by Justice Knott (verbatim, in full)
    >>>> Result: Original source (Geocities page) found. 75% match.
  4. An essay from David Corker at the University of East Anglia (American Studies) on metaphor, available at http://www.uea.ac.uk/eas/People/corker/In%20Defence%20of%20Metaphor.htm
    >> Submitted as "Metaphor Rules" by Diablo Corker (verbatim, in full)
    >>>> Result: Original source found (UEA faculty page). 100% match.
  5. The first seven pages of an essay on tourism in national parks from a Dutch university source, in PDF form, by Jan van der Straaten, found by searching for "environment rain forest species" in Google, at http://greywww.kub.nl:2080/greyfiles/worc/1996/doc/17.pdf
    >> Submitted as "What's All This Then About National Parks" by Margy Bargy
    >>>> Result: Original PS source found ("greywww.kub.nl"), but with a low match level, either because only the first few pages were submitted or because the PostScript file would contain many extraneous printer-language commands. (Also found my own notes for this workshop!) 12% match.
  6. A report on recombinant DNA in PostScript form (text with embedded commands) at www.ai.mit.edu/research/abstracts/abstracts2000/ps/z-abelson.ps
    >> Submitted as "Recombining" by Joe Schmo
    >>>> Result: Found 11 sources, mostly MIT sites, including the original, but not the original abstract. 67% match.
  7. A brief extract from a longer observation on Chaucer's Clerk's Tale, from http://www.richardhay.com/chaucer.html
    >> Submitted with slight variations in wording, and a couple of additional sentences inserted.
    >>>> Result: Not found, possibly because the 50% match criterion was not met due to the brevity of the extract relative to the original document.

EVE2 succeeded in most cases, finding obvious Internet documents along with original PostScript sources, amateur pages on commercial servers, and the problematic overseas university page. We advise, however, that the instructor never use its results alone as a basis for judging any given essay. As the last test shows, mechanical matching driven by parameters can yield false negatives, and false positives can be generated by earlier versions of the test document or by lengthy quotation.
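
The false negative in test 7 can be illustrated with a small Python sketch. How EVE2 actually computes its match percentage is not known to us, so what follows is only an assumed stand-in: it measures what share of one text's five-word sequences also appear in another, then applies a cutoff like the 50% setting used above. All function names are hypothetical.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def share_found(part, whole, size=5):
    """Fraction of part's word sequences that also appear in whole."""
    part_set = shingles(part, size)
    if not part_set:
        return 0.0
    return len(part_set & shingles(whole, size)) / len(part_set)


def passes_cutoff(score, cutoff=0.5):
    """Apply a match threshold such as the 50% cutoff (assumed semantics)."""
    return score >= cutoff


# A 100-word stand-in "source" and a verbatim 20-word extract from it.
source = " ".join(f"w{i}" for i in range(100))
extract = " ".join(f"w{i}" for i in range(20))

print(share_found(extract, source))  # share of the extract found in the source
print(share_found(source, extract))  # share of the source covered by the extract
```

Under this assumed measure, the extract is found in full, yet it covers only about a sixth of the source. A tool that scored matches relative to the source document would therefore reject even a verbatim brief extract at a 50% cutoff. Substituted words lower the score further, since one changed word alters up to five of the five-word sequences that contain it.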

TEST RESULTS-- TURNITIN

(March 2002)  Turnitin failed half of the tests submitted, especially for PDF and PostScript files, commercial servers available to the public, and overseas sources. See notes for the workshop of March 2002 for the full story. The company's claim that the use of paper mills will be revealed remains untested, as we balked at purchasing such a paper for test submission.

TEST RESULTS-- SEARCH ENGINES

  1. (March 2002) This extract from the Jan van der Straaten PDF paper (see above) was typed into the search phrase window of various search engines.
    "The disadvantages of this development are increasingly being recognised by politicians, particularly within the European Union. In recent European documents, such as the Fifth Action Programme, it is argued that the traditional development of the countryside should be stopped and that a sustainable development of society should result in limitations to the 'normal' economic development of regions."

    Results:
    Google (Advanced, exact phrase): "404 Not Found"
    Alta Vista (Advanced, exact phrase): "Found 0 results"
    MSN Search for exact string cut off at "... European doc," then failed.
    Excite: No exact phrase search.
    Dogpile: No exact phrase search.

  2. (March 2002) This shorter extract, a single sentence from the van der Straaten paper, was submitted.
    "In recent European documents, such as the Fifth Action Programme, it is argued that the traditional development of the countryside should be stopped and that a sustainable development of society should result in limitations to the 'normal' economic development of regions."

    Results:
    Google (Advanced, exact phrase): Successful; found van der Straaten paper.
    Google was able to find this source because it translates PDF documents into HTML as it inspects them.
    Alta Vista (Advanced, exact phrase): "Found 0 results"

  3. A single sentence from the essay on Chaucer (see above) was submitted to Google.

    Results: Google immediately found the original Richard Hay piece.