To Catch a Plagiarist

Joshua T. Katz May 1, 2024

The plagiarism wars have begun. Claudine Gay is out as president of Harvard, in large part because of conduct that the Harvard Corporation and Gay herself refuse to describe with the p-word, and the coming months will probably be painful for quite a few people who write for a living.

As a result of outrage on both left and right (the former often seems intent on bringing down established universities and other institutions from the inside, the latter from the outside), we can be certain that a number of scholars, journalists, speechwriters, and pundits—men, women, black, white, young, old, Democratic, Republican—will be hit with credible charges of plagiarism. Although few cases are likely to be as remarkable in their bang-for-the-buck as Gay’s, many writers are wondering whether the inadvertent omission of a quotation mark decades ago will pop up in an AI search and destroy a career. In an article for The Atlantic, Ian Bogost describes the unnerving process of checking his own work himself, and I would not be surprised if someone, somewhere, were right now busily uploading into a plagiarism bot everything I’ve published on Homer, Old Irish, and the dismal state of higher education. (If you are doing this, I hope you’ll take the time actually to read the work.)

So, how do you catch a plagiarist? Of course there is plagiarism software, which computer science departments have been successfully using for years to detect programming assignments that students have copied from others, sometimes with light modifications. These days, both teachers and editors of scholarly journals are increasingly putting work written in a natural language—in short, essays—through plagiarism detection programs. The expansion of AI in everyday life will normalize such efforts.

But there are also old-fashioned methods of detecting plagiarism, and we should not abandon them. For one thing, software applied to work that was definitely composed by flesh-and-blood people still yields “false positives” (putative instances of plagiarism that aren’t), through which a human must laboriously comb. Furthermore, such software does not yet appear to be especially good at reliably determining what was written by man and what by machine.

Two decades ago, before text-matching software was widely available, I served on a Princeton student–faculty committee that investigated dozens of cases of suspected plagiarism by undergraduates, many of them in essays for classes in the humanities and social sciences on such topics as the War of 1812, To the Lighthouse, and the Japanese economy. I was struck by just how easy it often was to spot instances of plagiarism, even before I had seen the copied text. The giveaways fell—and, I expect, still fall—into three categories: inconsistent typography, inconsistent punctuation, and broader stylistic inconsistencies.

First, inconsistent typography. It was astonishing to me how frequently students submitted work in which a sentence or paragraph was formatted differently from the rest of the paper. In such cases, a quick search would usually reveal that just those words had been copied and pasted from some online source. An essay written in twelve-point type would suddenly have a sentence in eleven-point type. Or, out of nowhere, words would appear in dark gray Calibri rather than in Times New Roman and standard black. The spacing between lines in one section would be subtly different from the spacing everywhere else. In one memorable instance, quotation marks and apostrophes in an essay were “curly”—except in the plagiarized sections, where they were “straight.”
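For readers who want to try the quotation-mark test on a plain-text file, here is a rough sketch in Python. It is only an illustration, not the method the committee used (we relied on our eyes). It assumes that paragraphs are separated by blank lines, and the function names are hypothetical. Font size, typeface, and line spacing do not survive in plain text, so only the curly-versus-straight signal is checked.

```python
from collections import Counter

# Curly (typographic) and straight (typewriter) quote characters.
CURLY = set("\u2018\u2019\u201c\u201d")
STRAIGHT = set("'\"")


def quote_style(paragraph):
    """Return 'curly', 'straight', 'mixed', or None if the paragraph has no quotes."""
    has_curly = any(ch in CURLY for ch in paragraph)
    has_straight = any(ch in STRAIGHT for ch in paragraph)
    if has_curly and has_straight:
        return "mixed"
    if has_curly:
        return "curly"
    if has_straight:
        return "straight"
    return None


def flag_odd_quote_paragraphs(text):
    """Return indexes of paragraphs whose quote style differs from the majority style.

    Assumption: paragraphs are separated by a blank line.
    """
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    styles = [quote_style(p) for p in paragraphs]
    # Only paragraphs with one unambiguous style vote for the majority.
    counts = Counter(s for s in styles if s in ("curly", "straight"))
    if not counts:
        return []
    majority = counts.most_common(1)[0][0]
    return [i for i, s in enumerate(styles) if s is not None and s != majority]
```

A flagged paragraph is a prompt for a closer look, nothing more: a writer who pastes a properly credited block quotation will trip the same wire.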

Second, inconsistent punctuation. Some students regularly use the Oxford comma; others don’t. Unfortunately, increasingly many students have no conception of consistency, which is bad news but not a matter of plagiarism. However, for those who do, when one striking sentence has an Oxford comma and no other sentence with the form “X, Y(,) and Z” does, experience shows that something will turn up when that sentence is googled. I can say similar things about the use of lowercase or capital letters after a colon and any employment at all of the semicolon.
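The Oxford-comma test can likewise be roughed out in a few lines of Python. This is a sketch under loud assumptions: the regular expression is a crude approximation of the pattern “X, Y(,) and Z,” it will misread sentences such as “However, the cat and dog slept,” and the function names are hypothetical. It simply lists the sentences whose serial-comma habit is in the minority.

```python
import re
from collections import Counter

# Rough pattern for a series: a comma, an item, an optional comma, then and/or.
# Group 1 captures the optional comma before the conjunction.
SERIES = re.compile(r",\s+[^,.;:!?]+?(,?)\s+(?:and|or)\s+\w")


def serial_comma_style(sentence):
    """Return 'oxford', 'no-oxford', or None if no series is detected."""
    match = SERIES.search(sentence)
    if match is None:
        return None
    return "oxford" if match.group(1) else "no-oxford"


def flag_minority_series(text):
    """Return sentences whose serial-comma style is the less common one in the text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    styled = [(s, serial_comma_style(s)) for s in sentences]
    counts = Counter(style for _, style in styled if style)
    if len(counts) < 2:
        # Only one style (or none) appears, so nothing stands out.
        return []
    # The last entry of most_common() is the least frequent style;
    # if the two styles are tied, the choice is arbitrary.
    minority = counts.most_common()[-1][0]
    return [s for s, style in styled if style == minority]
```

As with any such heuristic, the output is a list of sentences worth googling, not a verdict.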

Finally, there are things that just don’t make stylistic sense. In an essay on World War II written by an American, you don’t expect to find instances of the locution “the Second World War”—unless they’re between quotation marks because they’ve been taken from a properly credited source. You also don’t expect to find, as I once did, a sentence beginning with the word “Whilst” but including the word “honor”—because the American student, copying the sentence from a British publication, knew enough to change “honour” to “honor” but not enough to change the conjunction.
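The “Whilst” and “honor” collision can also be hunted mechanically, at least in caricature. The Python sketch below flags any sentence that contains both a British marker and an American spelling. The two word lists are tiny, hypothetical samples chosen for illustration, and a real check would need far fuller lists and a tolerance for quoted material.

```python
import re

# Hypothetical, deliberately small marker lists for illustration only.
BRITISH_MARKERS = {"whilst", "amongst", "honour", "colour", "labour", "centre"}
AMERICAN_MARKERS = {"honor", "color", "labor", "center"}


def mixed_dialect_sentences(text):
    """Return sentences that contain both a British and an American marker word."""
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        # Compare whole lowercase words, so 'honor' does not match inside 'honorable'.
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & BRITISH_MARKERS and words & AMERICAN_MARKERS:
            flagged.append(sentence)
    return flagged
```

The check works sentence by sentence because that is where a half-converted borrowing betrays itself; an essay that is British or American throughout raises no flag.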

Some readers may view these last paragraphs as quaint, and at some level they are. But even in the age of automated plagiarism detectors, these old-fashioned methods have their use: On occasion, a plagiarist claims not to have deliberately copied but rather to have internalized another’s language and accidentally reproduced it. This must indeed sometimes happen. With certain typographical and stylistic inconsistencies, however, anyone can tell at once that that’s not the case.

This brings me to a larger question: What is plagiarism? Two issues deserve attention. One, to which I will return, is whether we should learn to speak differently of different kinds of plagiarism, more or less as the law distinguishes among first-, second-, third-, and fourth-degree criminal offenses.

The other concerns AI. No one should forget that in the months before the shake-up at Harvard, academic dishonesty was already on everyone’s mind because of ChatGPT. More people in the United States googled “plagiarism” in the last week of April and first week of May 2023 than in the first two weeks of December, when Gay’s plagiarism was exposed. According to a poll conducted by the online magazine Intelligent, within weeks of the debut of ChatGPT, “30% of college students ha[d already] used ChatGPT on written homework.” Reliable statistics are hard to come by, in part because there are so many other AI-powered tools (for instance, Claude and Grok), but it is hard to imagine that the percentage has not been rising in the 2023–24 academic year.

Figuring out how to sustain academic integrity in an environment more and more dominated by AI—which, of course, also powers plagiarism-detection software—needs to be a top priority for administrators and teachers at all educational levels. We must decide whether what might be called conventional plagiarism is fundamentally the same as using AI to do what is supposed to be one’s own work. I admit that I don’t yet know what exactly I think but point readers to an essay in The Atlantic by Matteo Wong (I will assume he wrote it himself) titled “What if We Held ChatGPT to the Same Standard as Claudine Gay?” Noting that AI depends on copyrighted materials, Wong concludes that “the technology [is] guilty of mind-boggling levels of plagiarism.” Maybe so. But it is not obvious that for an individual to steal another person’s work (the Latin word plagiarius means “kidnapper”) is the same sort of offense as passing off as one’s own the work of an anonymous bot.

Another important question is how plagiarism should be punished. Does intent matter? What about magnitude? How should we assess the differences between copying one or two short sentences and copying one or two long paragraphs? Between copying a paragraph for one essay and copying dozens of paragraphs for dozens of essays?

I believe it is wrong to suspend undergraduates for comparatively minor academic infractions—and I feel more strongly about this than I did when I sat on that committee at Princeton. Anyone who disagrees with me on this point should at least recognize that it is hypocritical of universities like Harvard and Princeton to punish students harshly while downplaying the plagiaristic behavior of senior faculty. As Aaron Sibarium has pointed out, when she was dean of the Faculty of Arts and Sciences, Claudine Gay “watered down [Harvard’s] policy on research misconduct” so that faculty—but not students—“could be sanctioned only if they plagiarized ‘knowingly, intentionally, or recklessly.’” I am not deaf to arguments in favor of this sort of change for everyone, faculty and students alike. But if Gay’s record of verbal theft doesn’t count as “reckless,” we are redefining that term as well as “plagiarism.”

And one more question: What kind of offense is plagiarism? In January, the philosopher Kathleen Stock wrote an article titled “Plagiarism is not a Sin,” harshly condemning plagiaristic practice but arguing that “[t]he infringement is intellectual not moral.” To my eyes, it is both.

Simply put, plagiarism is theft. Yes, there is some truth to Stock’s assertion that “[w]ords are public property anyway. It’s not like you are stealing possessions from people”: Because intellectual property is not considered a possession, the law treats it differently from personal property; additionally, although I am not terribly sympathetic to them, there are philosophical objections to the very idea that intellectual property deserves robust protection. But words do matter. Consider how assiduously the Harvard Corporation and Claudine Gay worked to avoid the p-word: They spoke instead of “inadequate citation,” “duplicative language without appropriate attribution,” and “material [that] duplicated other scholars’ language, without proper attribution.” In this case, one particular word, “plagiarism,” mattered so much that they were willing to defy their English Sprachgefühl in order to avoid it.

It is telling that Harvard’s euphemisms compound the ugliness of plagiarism with the ugliness of deliberately obscurant bureaucratese. The bigger problem here is that we—parents, teachers, journalists, administrators, the wider public—often fail to model good linguistic practice, especially when it comes to inculcating in children an appreciation for the beauty and power of words. A proper education involves reading widely, admiring good sentences and scoffing at bad ones, writing draft after draft of one’s own compositions, and generally attending to how rhetoric shapes argument and narrative.

There are very few occasions—terse emergency instructions present one—when one person’s language should be interchangeable with another’s. You may or may not like my style, but for better or for worse, it is mine. If I suddenly began to sound like someone else, or produced what the technology writer Anna Wiener has dubbed “garbage language,” I hope that those who know me would notice. To judge by the dreck that so many people churn out, in some cases even duplicate, no one has taught them about style (the word is related to “stylus,” with both going back to Latin stilus, “spiked writing instrument”) or pointed out to them that (as I put it last year in the New Criterion) “[t]he sentences I write don’t sound as good in your mouth”—or on your page—“for much the same reason that your shirt doesn’t quite fit on me.”

Language reflects reality imperfectly, but it’s by far the best medium we have to express what is, as well as what was and what might yet be. Simply put, we are logocentric creatures. Style matters because it shows our interlocutors that we take our—and their—verbal expressions seriously. And for many of us, an appreciation of words is ultimately an appreciation of logos, of the Word.

If you believe, with John, that the Word is God, then to abuse it is a sin. But certain kinds of linguistic abuse can be a moral failure even for nonbelievers, who should strive to employ words faithfully, even though they do not have faith. Now and again, all of us do violence to and, maybe, also with language. We curse, prevaricate, belittle, and engage in sophistry. And sometimes we may, intentionally or not, take someone else’s phrase or thought as our own.

What we need now are honest discussions of issues that are colliding in new and forceful ways: of how to instill a love of language in the young; of when, if ever, language (or its absence: silence) may be called violence; of the future of authorship and personal style in the age of AI; and of what plagiarism is, how it should be punished, and how those who transgress may redeem themselves.

Language is a gift, whether or not you hold it to be divine. Language deserves to be appreciated, cultivated, and delighted in. It is high time that we recommit ourselves to logos.

Joshua T. Katz is senior fellow at the American Enterprise Institute.

Image by Microbiz Mag on Wikimedia Commons, licensed via Creative Commons. Image cropped.

Featured in the May 2024 issue of First Things.

