Understanding Breaking News and Detecting AI Text

Reading the news is crucial to understanding the world, but it can also be taxing. Given the multitude of negative stories — even and especially as events unfold — it's easy to get overwhelmed by breaking news and hard to make sense of it in the moment.

News, like science, changes over time with a better, more complete picture. Yet as news breaks, an incomplete picture is formed. Many readers flock to social media anecdotes and secondary/tertiary sources to fill in that picture in the moment, which often leads to the spreading of inaccurate, misleading, and sometimes deceptive information.

The following playbook is decidedly unoriginal. The steps were lifted from On the Media's Breaking News Consumer Handbook. Though these first appeared in 2013 and the media landscape has changed drastically since then — pivots to video, away from video, and back again, among many other goofy cash-grabs — they still hold true, and every news consumer should return to them each time news breaks (particularly the negative kind) or gets pushed to their phone or inbox.

(As I believe On the Media is one of the best regular examinations of the news media ecosystem, I invite you to donate to their parent organization, WNYC, to keep them going.)

In the immediate aftermath, news outlets will get it wrong.

This does not mean all news outlets, but time has shown that even the most trusted sources may report details that, later on, don't pan out. The killing of Charlie Kirk comes to mind, as many details about the killer in the first hour and the following 24 were wrongly reported, partially due to incorrect reporting from the FBI. This is why I take initial reports more as "possibilities" than straight facts.

Don't trust anonymous sources

"Something happened" says one source, who declined to be named.

In recent incidents, I've seen this appear in live blogs and breaking news stories, only to be retracted or negated shortly thereafter. Anonymous sources rarely hold weight, especially as news breaks.

Don't trust stories that cite another news outlet as the source of information

I often find this to be a good rule not just for breaking news, but most news. Nearly every reporter and publication relies heavily on scoops to draw eyeballs these days, and unless a story citing said scoops actually builds on the initial reporting, I often find that it adds little to no value. In the moment of breaking news, however, always try to find the primary source breaking the story — and understand that they still may not have a complete picture.

There's almost never a second shooter

Almost is the keyword here: in the recent shooting in Bondi, Australia, there was in fact a second shooter. Yet in mass shootings, which unfortunately make up far too much of our news cycle, a second shooter is exceedingly rare.

Pay attention to the language the media uses

As On the Media writes:
• “We are getting reports”… could mean anything.
• “We are seeking confirmation”… means they don’t have it.
• “(News outlet) has learned”… means it has a scoop or is going out on a limb.

Look for news outlets close to the incident

This means paying attention to local news and reporters. With shrinking budgets, national news media cannot be everywhere, and though local news is sadly shrinking too, local reporters add important color and context to specific incidents.

Compare multiple sources

This should be one's MO not just for breaking news, but all news. For instance, Gothamist does a great job writing about local New York news, but specialty publications like Chalkbeat will dig deeper into stories related to schools. Publications and journalists cannot cover all details and angles, so it's best to see which other trusted sources are covering the same story.

Big news brings out the fakers and photoshoppers

On the Media originally drafted this rule 13 years ago, yet while the core warning remains valid, the stakes have changed significantly: we must now contend with an immediate deluge of deepfakes accompanying every breaking news event. A recent example occurred during the capture of Maduro in Venezuela, where AI-generated fabrications flooded X/Twitter within ten minutes of the arrest announcement. This rapid proliferation of deepfakes is more or less a guarantee for any story with even moderate visibility. Given this reality, readers should severely limit trust in audio and visual content to materials released by verified primary sources and recognized journalists.

Beware reflexive reposting

The more you share breaking, ever-changing, and often unverified news, the greater the potential you have for spreading mis/disinformation. Use the above steps and wait for greater verification of facts before sharing, lest you add to the confusion and misinform others.


Remember: the odds of fully understanding a story in the moment are slim to none. You are more likely to get an accurate picture 12-48 hours removed than you are in the first few hours. By reading diligently, tracking developments as they come, and following the steps above, you're far more likely to avoid mis/disinformation, come to a better understanding, and hold a more informed opinion of the whole story.


Detecting AI Text

A lot of digital ink is spilled on LinkedIn regarding how to detect AI-generated text. Almost all of it is wrong. Though large language models like to pepper in em-dashes and use "it's not x; it's y" statements, so do humans. (As you can tell, I love em-dashes — and all dashes, too — to interject and add on to thoughts.) The many different pointers and indicators of what may or may not be generative text are as sound as phrenology.

Case in point: one of the paragraphs in this newsletter is fully AI generated. I wrote this newsletter word by word, fed it into Gemini, and asked it to replace one of the paragraphs with a completely rewritten paragraph that said the same thing. Think of this as kind of my own little Rosenhan experiment, only with infinitely lower stakes. (Feel free to reach out with your own assumptions as to which one.)

The point is, manual detection of AI generated text is a fool's errand. Large language models are not smart, but they are really good at autocomplete. They predict the most likely next word given everything that came before, with mind-numbing accuracy. As models improve, this prediction and the knowledge base it pulls from get better and bigger. Yet how they arrange text is still mathematical in nature, and the underlying math is more or less the same as it was years ago.
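To make "really good at autocomplete" concrete, here is a deliberately tiny sketch of next-word prediction: a bigram model that counts which word tends to follow which, then greedily extends a prompt. (Real LLMs use neural networks over far longer contexts; the corpus, function names, and everything else here are invented for illustration.)

```python
from collections import defaultdict, Counter

def train_bigrams(text):
    """Count, for each word, which words follow it and how often."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def autocomplete(counts, word, n=5):
    """Greedily extend `word` by always picking the most frequent follower."""
    out = [word]
    for _ in range(n):
        followers = counts.get(out[-1])
        if not followers:
            break  # dead end: this word never appeared mid-sentence
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)

corpus = ("the news breaks fast and the news changes fast "
          "and the picture gets better over time")
model = train_bigrams(corpus)
print(autocomplete(model, "the", n=3))
```

The model has no understanding of news or pictures; it only knows what statistically tends to come next. An LLM is this idea scaled up by many orders of magnitude.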

If you want to detect text, you're better off using an AI text detection model. (Transparently, I happen to work for a platform that builds and sells one, though we are decidedly not for consumer usage.) Yet unless you are a professor or someone who needs to check for cheating or for moral/ethical reasons, it unfortunately doesn't really matter anymore. Billions of people worldwide use large language models for writing and a multitude of other tasks, many with the aim of obfuscating the fact that they used them in the first place. To detect all text would be quite the Sisyphean task.
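For a rough sense of the statistical intuition such detectors build on: text that a model finds very "predictable" is more likely machine-arranged. The toy below scores how surprising each word is relative to a reference word-frequency table. This is an invented sketch of the perplexity idea only, not how any real detector (including my employer's) actually works; real tools use trained neural models.

```python
import math
from collections import Counter

def avg_surprise(text, reference):
    """Mean negative log-probability of each word in `text`, using simple
    word frequencies from `reference`. Lower = more statistically typical,
    the rough intuition behind perplexity-based detection."""
    freqs = Counter(reference.lower().split())
    total = sum(freqs.values())
    words = text.lower().split()
    score = 0.0
    for w in words:
        # Add-one smoothing so unseen words don't get probability zero.
        p = (freqs[w] + 1) / (total + len(freqs))
        score += -math.log(p)
    return score / len(words)

reference = ("the news breaks fast and the news changes fast "
             "and the picture gets better over time")
# Common, predictable words score low; rare words score high.
print(avg_surprise("the news", reference))
print(avg_surprise("zyzzyva qwerty", reference))
```

The serious versions of this idea replace the frequency table with a full language model, but the principle is the same: measure how well text matches the math that would have generated it.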

There's also the question of when something stops being AI generated text. Virgil Abloh used to say that if you change 3% of something, it becomes an original work. I think 3% is a stretch, to put it mildly, but if you change around, say, 30% or more of AI generated text, it breaks the mathematical pattern that arranged the text in the first place. This not only makes it far harder for text detectors to reliably flag said text, but may involve enough human meddling for it to be considered not-so-generated anymore.

This is not an endorsement of LLM writing. I feel that if you don't put forth the effort in writing something, I as the reader should not put forth the effort in reading it. Yet sparring over what is AI generated and what is not on a shallow web filled with AI generated text is a losing battle. There are better things to do, like read the news!

App of the Week: WhisperNotes

Years ago, OpenAI open sourced their speech-to-text model, Whisper, and put it on GitHub. Regardless of how you feel about ChatGPT and its maker, Whisper still holds up as one of the best free ways to quickly transcribe voice notes locally — as in, on your computer and only on your computer. (NVIDIA's Parakeet is awesome, but I need to spend more time with it.)

WhisperNotes is the best interface for that model. On Mac, iPhone, and iPad, it allows you to either record audio and transcribe it in seconds or load an audio file and transcribe it. All of this is done locally — it works in a Faraday cage — and without connecting to the internet, save for initially downloading the model. If privacy is important to you but you still need to transcribe interviews and voice memos, this app does it with a high degree of accuracy and in seconds. It's $5 with no subscription, which means you could probably ditch that Otter AI subscription like I did.