Was Shakespeare really a chatbot?

ChatGPT is increasingly being used to generate content ranging from speeches and parts of dissertations to classically styled drama. A new classifier tool has been developed that attempts to distinguish human-written text from text generated by ChatGPT.

Unfortunately, OpenAI’s new classifier tool for detecting AI-generated text was shown to be imperfect within a few hours of its launch.

Sebastian Raschka, an AI and ML researcher and educator at Lightning AI, began testing the OpenAI Text Classifier with text snippets from a book he published in 2015. Three different passages received varying results: the tool reported that it was “unclear” whether the book’s preface was written by AI, but the foreword was “possibly” AI and a paragraph from the first chapter was “likely” AI.

When text from Shakespeare’s Macbeth was run through the classifier, it was rated “likely” AI-generated. Given that Macbeth was written over 400 years ago, it is a fair bet that Shakespeare did not use AI. When asked if he was surprised by the results, Raschka said, “Yes and no.” Macbeth was not written in modern English or, indeed, in the English that was spoken in Tudor times, so the classifier may not have been trained on anything similar with which to compare it.

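OpenAI admits that its classifier is far from reliable. The tool is a GPT model fine-tuned via supervised learning to perform binary classification, with a training dataset consisting of human-written and AI-written text passages. In OpenAI’s own evaluation it correctly labeled only about 26% of AI-written text as “likely AI-written”, while wrongly flagging about 9% of human-written text.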

But OpenAI also says the tool can still be useful as a complement to other methods of determining the source of a text, rather than as the primary decision-making tool.
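A little arithmetic suggests why the tool should not decide on its own. The sketch below assumes the rates OpenAI reported at launch (about 26% of AI-written text labeled “likely AI-written”, about 9% of human-written text wrongly given that label); the function name and the shares of AI-written text in the pool are hypothetical, chosen only for illustration.

```python
def flag_precision(true_positive_rate, false_positive_rate, ai_share):
    """Share of flagged texts that really are AI-written (Bayes' rule)."""
    flagged_ai = true_positive_rate * ai_share
    flagged_human = false_positive_rate * (1.0 - ai_share)
    return flagged_ai / (flagged_ai + flagged_human)

TPR = 0.26  # reported share of AI-written text labeled "likely AI-written"
FPR = 0.09  # reported share of human-written text wrongly given that label

# Hypothetical pools in which 5%, 25% and 50% of the texts are AI-written.
for ai_share in (0.05, 0.25, 0.50):
    precision = flag_precision(TPR, FPR, ai_share)
    print(f"{ai_share:.0%} AI-written: {precision:.0%} of flagged texts are really AI")
```

The rarer AI-written text is in the pool being checked, the larger the share of flags that land on human authors, which is one reason to treat a “likely” label as a prompt for further checking rather than a verdict.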

OpenAI says it is making the classifier publicly available “to get feedback on whether imperfect tools like this one are useful,” adding that its team will continue working on detecting AI-generated text and “hope to share improved methods in the future.”

OpenAI is not alone in attempting to detect AI-generated text. There is an abundance of similar classifier tools available to unleash on text. GPTZero, for example, provides a score that must then be interpreted by the user.

Raschka recently said: “GPTZero does not recommend whether the text was AI-generated or not. Instead, it only returns the ‘perplexity score’ for a relative comparison between texts. This is nice because it forces users to compare similar texts critically instead of blindly trusting a predicted label.”
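For readers unfamiliar with the term, perplexity measures how surprised a language model is by a text: it is the exponential of the average negative log-probability per word. The sketch below is a toy illustration only. It swaps in a tiny smoothed word-count model for the large language model a real detector would use, and the function names and sample sentences are made up for this example.

```python
import math
from collections import Counter

def train_unigram(corpus_tokens):
    """Build a smoothed word-frequency model (a toy stand-in for a real LLM)."""
    counts = Counter(corpus_tokens)
    total = len(corpus_tokens)
    vocab = len(counts) + 1  # +1 reserves some probability for unseen words

    def prob(token):
        # Add-one smoothing so unseen words never get probability zero.
        return (counts[token] + 1) / (total + vocab)

    return prob

def perplexity(tokens, prob):
    """exp of the average negative log-probability per token."""
    if not tokens:
        raise ValueError("need at least one token")
    nll = -sum(math.log(prob(t)) for t in tokens)
    return math.exp(nll / len(tokens))

# Made-up reference text the toy model "knows".
reference = "the cat sat on the mat and the dog sat on the rug".split()
prob = train_unigram(reference)

familiar = "the cat sat on the rug".split()
unfamiliar = "tomorrow and tomorrow and tomorrow creeps".split()

ppl_familiar = perplexity(familiar, prob)
ppl_unfamiliar = perplexity(unfamiliar, prob)
print(f"familiar text:   {ppl_familiar:.1f}")
print(f"unfamiliar text: {ppl_unfamiliar:.1f}")
```

Text that resembles what the model has seen scores lower. Perplexity-based detectors generally read low scores as more machine-like, so the number only means something relative to comparable texts, which is the point Raschka makes.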

DetectGPT “perturbs” the text under scrutiny, making small changes to it and comparing how probable a language model finds the altered version. If the probability of the new text is noticeably lower than that of the original, the text is judged to be AI-generated; if it is approximately the same, the text is judged to be human-generated. The problem is that such an approach involves using a specific LLM (large language model), which “may not be representative of the AI model used to generate the text in question.”
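The perturbation idea can be sketched in a few lines. This is a toy under stated assumptions, not DetectGPT itself: the scoring model here is a made-up table of word probabilities (`TOY_PROBS`) and the perturbation is a random one-word swap, whereas the real method relies on large language models for both steps. All names in the block are hypothetical.

```python
import math
import random

# Hypothetical toy "language model": fixed probabilities for six words.
TOY_PROBS = {"the": 0.40, "cat": 0.25, "sat": 0.15,
             "mat": 0.10, "dog": 0.06, "rug": 0.04}

def toy_log_prob(text):
    """Log-probability of a text under the toy model (unknown words get a floor)."""
    return sum(math.log(TOY_PROBS.get(word, 1e-6)) for word in text.split())

def toy_perturb(text, rng):
    """Swap one randomly chosen word for a different vocabulary word."""
    words = text.split()
    i = rng.randrange(len(words))
    choices = [w for w in TOY_PROBS if w != words[i]]
    words[i] = rng.choice(choices)
    return " ".join(words)

def perturbation_discrepancy(text, log_prob, perturb, n=20, seed=0):
    """Original log-probability minus the mean over n perturbed copies."""
    rng = random.Random(seed)
    original = log_prob(text)
    perturbed = [log_prob(perturb(text, rng)) for _ in range(n)]
    return original - sum(perturbed) / n

# A text sitting at the toy model's probability peak: any change lowers it.
peak = perturbation_discrepancy("the the the the", toy_log_prob, toy_perturb)
# A text far from the peak: changes tend to raise its probability instead.
off_peak = perturbation_discrepancy("rug rug rug rug", toy_log_prob, toy_perturb)
print(f"peak text discrepancy:     {peak:.2f}")
print(f"off-peak text discrepancy: {off_peak:.2f}")
```

A clearly positive discrepancy is the signal read as “AI-generated”; where to draw the line is a threshold that would have to be tuned, and the result depends entirely on which model does the scoring, which is the weakness noted above.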

More at: https://bit.ly/3JDv2ex