I have a case where a professor is charged with plagiarism, that is, taking a large number of pages from another professor's book and publishing them. The suspect professor's laptop has been seized. I have both the laptop and the plagiarized text of the book. The prosecutor is asking whether or not the laptop has the whole or any part of the text of the book.
A preliminary search amongst text, zip, pdf and image files did not turn up the text as a file. I have also checked the deleted files, with no positive results. In case it might have been deleted, I chose 10 keywords from the text and ran a search. No positive result.
Now I am thinking about the possibility that the suspect might have copied only some part of the text, which is still plagiarism, and my keywords might not be from those parts. In that case the keyword search cannot guarantee that the laptop does not hold even the slightest part of the book.
It does not seem practical to choose a keyword from each paragraph, which would make hundreds of keywords. So, what do you think is the best method to establish whether the suspect laptop does or does not have any part of the book?
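Hundreds of keywords are only impractical by hand. Below is a minimal sketch of automating the idea, assuming the book text has first been transcribed or OCR'd into plain text and that the laptop data is available as raw bytes (for example a chunk of an image file). The function names `paragraph_phrases` and `search_raw` and the choice of the first five words of each paragraph are illustrative assumptions, not a feature of any particular forensic tool. The search is exact and case-sensitive, so it will miss text that is compressed, encrypted, re-wrapped or edited.

```python
import re

def paragraph_phrases(text, words_per_phrase=5):
    """Take the first N words of every paragraph as a search phrase."""
    phrases = []
    for para in re.split(r"\n\s*\n", text):
        words = para.split()
        if len(words) >= words_per_phrase:
            phrases.append(" ".join(words[:words_per_phrase]))
    return phrases

def search_raw(data, phrases, encodings=("utf-8", "utf-16-le")):
    """Return {phrase: [(encoding, byte_offset), ...]} for every hit in raw bytes."""
    hits = {}
    for phrase in phrases:
        for enc in encodings:
            needle = phrase.encode(enc)
            pos = data.find(needle)
            while pos != -1:
                hits.setdefault(phrase, []).append((enc, pos))
                pos = data.find(needle, pos + 1)
    return hits

# Hypothetical usage: 'book' stands in for the transcribed text,
# 'raw' stands in for bytes read from the laptop image.
book = "The quick brown fox jumps over the lazy dog today.\n\nShort one.\n\nAnother paragraph with enough words to count here."
raw = b"xxxx" + "The quick brown fox jumps".encode("utf-16-le") + b"yyy" + b"Another paragraph with enough words"
for phrase, found in search_raw(raw, paragraph_phrases(book)).items():
    print(phrase, found)
```

Searching in both UTF-8 and UTF-16LE matters because Windows applications often store text as UTF-16, which a plain ASCII keyword search would not match.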
The prosecutor is asking whether or not the laptop has the whole or any part of the text of the book.
This is an exercise in textual analysis, and it very probably needs the appropriate tools. There's one system called 'PAIRwise' that identifies possible plagiarism between one suspect text and a database of known sources. I think it looks for exact matches of six or more contiguous words between two documents. I've seen a demo of it, but have not had reason to use it myself.
It's available for free download, but it needs a bit of setting up to work.
Just Google for 'PAIRwise download'.
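To illustrate the "six or more contiguous words" idea, here is a minimal sketch of word-run matching between two texts. This is not PAIRwise's actual implementation, only the general technique as described above; the function names and the lowercase, punctuation-stripping normalisation are assumptions made for the example.

```python
import re

def shingles(text, n=6):
    """Return the set of all runs of n consecutive words (lowercased)."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_runs(source, suspect, n=6):
    """Runs of n contiguous words that appear in both texts."""
    return sorted(shingles(source, n) & shingles(suspect, n))

source = "It has been observed that cats like to stay over the table"
suspect = "We note it has been observed that cats prefer high places"
print(shared_runs(source, suspect))
```

Both texts have to be available as computer-readable text for this to work, so photocopies would need to be OCR'd or transcribed first.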
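One important check is to determine how the Professor writes books, i.e. does he use Word or some other WP package? If Word, which version? Word 2003 (.doc) files will have more or less straight text, whereas DOCX has the text in Zip containers, all XML based, so a raw keyword search will not see it unless the tool unpacks the container first.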
Cloud computing where the data is elsewhere?
Encryption and password protection. Does (s)he use that?
Determine the Professor's normal mode of operation and make sure your search routine finds key phrases in these documents.
Check all file signatures - has he renamed a .doc to be a .plag??
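The signature check and the DOCX point can be sketched as follows. The magic numbers listed are the well-known ones for these formats; the labels, the function names and the crude tag-stripping are assumptions for illustration. A renamed file is identified by its header rather than its extension, and a DOCX has its `word/document.xml` part unpacked so the text can be keyword-searched.

```python
import html
import io
import re
import zipfile

# Well-known file signatures (magic numbers) at offset 0.
SIGNATURES = [
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # e.g. Word 97-2003 .doc
    (b"PK\x03\x04", "zip"),                          # e.g. .docx, .odt, .zip
    (b"%PDF", "pdf"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"II*\x00", "tiff"),
    (b"MM\x00*", "tiff"),
    (b"BM", "bmp"),
]

def identify(data):
    """Guess the real file type from the leading bytes, ignoring the extension."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"

def docx_text(data):
    """Extract plain text from DOCX bytes, or None if it is not a DOCX."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        if "word/document.xml" not in zf.namelist():
            return None
        xml = zf.read("word/document.xml").decode("utf-8", errors="replace")
    xml = xml.replace("</w:p>", "\n")               # one line per paragraph
    return html.unescape(re.sub(r"<[^>]+>", "", xml))

# Hypothetical example: a tiny DOCX-like container built in memory,
# as if it had been found on disk under the name "notes.plag".
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("word/document.xml",
                "<w:document><w:body><w:p><w:r><w:t>stolen text</w:t></w:r></w:p></w:body></w:document>")
sample = buf.getvalue()
print(identify(sample), repr(docx_text(sample)))
```

Most forensic suites already do signature analysis; the sketch is only meant to show what the check amounts to.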
There is something I do not understand.
At least here plagiarism is when you publish as yours something that has been extensively "extracted" from an ALREADY PUBLISHED book/work.
If this is the case, it is well possible that the "bad" professor simply typed his book while reading a paperback copy of the "original author".
And usually plagiarism, especially at academic level, is more "conceptual" than "textual".
Practical (layman's) example, the sentence
It has been observed that cats like to stay over the table because they don't like to stay on the ground.
may well be a breakthrough in zoology, when published in "Feline Monthly" 😯 .
If another professor writes the sentence
Mammals of the Felinae subfamily of the family Felidae - and particularly the Felis catus - due to their aversion to lower places do indulge in the habit of residing usually on desks, benches and similia.
that would be a plagiarism even if very few of the "same" words are in both sentences.
What you can at most prove by doing a plain "text search" is that the "bad" professor STOLE a file containing the "original author's" book….
…. which has nothing much to do with a plagiarism suspicion.
jaclaz
jaclaz,
The bad professor steals the text from another prof's unpublished texts, not from a published book. Moreover the bad professor publishes stolen text in his book, acting before the victim.
athulin,
I have checked PAIRwise; however, it requires the suspect text in the form of a computer file to compare against a database, whereas I only have photocopied hard copies of the stolen text. Thanks anyway.
One other way seems to be examining the text linguistically, in terms of the author's style of wording, based on the authors' previously published works. I have talked to a forensic linguist and he said the author's wording style could help in determining who is the real owner of a piece of text.
One other way seems to be examining the text linguistically, in terms of the author's style of wording, based on the authors' previously published works. I have talked to a forensic linguist and he said the author's wording style could help in determining who is the real owner of a piece of text.
Yes, that is exactly the direction I was suggesting.
jaclaz,
The bad professor steals the text from another prof's unpublished texts, not from a published book. Moreover the bad professor publishes stolen text in his book, acting before the victim.
That would create (here) a primary charge for the actual stealing of the original AFAIK, and that is what was perplexing me.
If the "stolen original" was NOT in the form of a file, it is however unlikely that the "bad professor" typed it "as is" on his PC and then "transformed it" into his own work. It is more probable that either the original never "entered" the laptop or it did so in the form of scans, in which case you need to search for .bmp, .jpg and also "scanned" .pdf's, for which no "text search" is available.
Once you have recovered any and all pictures, a quick and dirty way to find actual scans is using colour mapping. This tool is handy for initial "supervising":
http//
jaclaz
If the professor kept the original files as TIFs or as image-only PDFs, your text search will not find the original pages. You will need to find the TIFs and PDFs, OCR them and then search.
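One caveat on the "then search" step: OCR output usually contains misread characters, so an exact keyword search can miss a genuine hit. A minimal sketch of a tolerant comparison follows, using Python's standard `difflib`. It assumes the OCR has already been done by whatever tool you use and the result is plain text; the function names and any threshold you apply to the score are assumptions for illustration.

```python
import difflib
import re

def normalize(text):
    """Lowercase and collapse everything except letters and digits to spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def best_window(passage, ocr_text):
    """Slide a passage-sized word window over the OCR text.

    Returns (similarity ratio between 0 and 1, word offset of the best window).
    """
    p_words = normalize(passage).split()
    o_words = normalize(ocr_text).split()
    target = " ".join(p_words)
    best = (0.0, -1)
    for i in range(max(1, len(o_words) - len(p_words) + 1)):
        window = " ".join(o_words[i:i + len(p_words)])
        ratio = difflib.SequenceMatcher(None, target, window).ratio()
        if ratio > best[0]:
            best = (ratio, i)
    return best

passage = "cats like to stay over the table"
ocr_text = "lt has been observed that cats 1ike to stay ovcr the tab1e because they dislike the ground"
ratio, offset = best_window(passage, ocr_text)
print(round(ratio, 2), offset)
```

A high ratio only flags a candidate page for a human to read; it is not proof of anything by itself.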
and my keywords might not be from those parts
Use key words from suspicious parts of the text.
Of course, if the professor is stealing concepts (rather than phrasing), that could make it more difficult to find the alleged originals. The originals being unpublished also makes things more difficult.
Might you be going about this the wrong way? Let the professors and academics sort out questions of language and concept. Focus on what evidence you can find on the professor's computer.
Dom
-woohoo, first post.
I vote for a duel.
I think a duel with épée or rapier between the two professors is appropriate.