Countering AI disinformation and deep fakes with digital signatures

According to The Economist, disinformation campaigns (often state-sponsored) use “AI to rewrite real news stories”:

In early March [2024] a network of websites, dubbed CopyCop, began publishing stories in English and French on a range of contentious issues. They accused Israel of war crimes, amplified divisive political debates in America over slavery reparations and immigration and spread nonsensical stories about Polish mercenaries in Ukraine… the stories had been taken from legitimate news outlets and modified using large language models.

Deep fakes of still images and now video clips are similarly based on legitimate original photos and video. Detecting such fakery can be challenging.

Disinformation comes from publishers (social media posters, newspapers, bloggers, commenters, journalists, photographers, etc.) who invent or misquote factual claims or evidence. Ultimately, we trust publishers based on their reputation – for most of us an article published by chicagotribune.com is given more credence than one published by infowars.com.

An obvious partial solution (that I haven’t seen discussed) is for publishers to digitally sign their output, identifying themselves as the party whose reputation backs the claims, and perhaps including a permanent URL where the original version could be accessed for verification.
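
To make that concrete, here’s a rough sketch in Python of publisher-side signing and reader-side verification, using Ed25519 from the cryptography package. The payload layout and field names are my own placeholders, not any existing standard:

```python
# Sketch: a publisher signs an article (plus a permanent URL for the original),
# and a reader verifies it against the publisher's public key.
# Requires the 'cryptography' package; the payload layout is purely illustrative.
import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey, Ed25519PublicKey)

def _payload(text: str, canonical_url: str) -> bytes:
    return json.dumps({"text": text, "canonical_url": canonical_url},
                      sort_keys=True).encode("utf-8")

def sign_article(key: Ed25519PrivateKey, text: str, canonical_url: str) -> dict:
    return {"text": text,
            "canonical_url": canonical_url,
            "signature": key.sign(_payload(text, canonical_url)).hex()}

def verify_article(public_key: Ed25519PublicKey, signed: dict) -> bool:
    try:
        public_key.verify(bytes.fromhex(signed["signature"]),
                          _payload(signed["text"], signed["canonical_url"]))
        return True
    except InvalidSignature:
        return False

# Usage
key = Ed25519PrivateKey.generate()
article = sign_article(key, "Story text...", "https://example-news.test/story/123")
assert verify_article(key.public_key(), article)
```

In practice the public key would have to be discoverable from the publisher’s identity – say, published at a well-known path on their domain, much as DKIM keys are published in DNS – but that wiring is omitted here.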

Publishers who wish to remain anonymous could sign with a nym (a pseudonym: a unique identifier under the control of an author – for example, an email address or a unique domain name not publicly connected with an individual); this would enable anonymous sources and casual social media posters to maintain reputations.

Web browsers (or extensions) could automatically confirm the claimed publisher identity (or flag fakery), and automatically sign social media posts, comments, and blog posts. All that’s needed is a consensus standard on how to encode such digital signatures – the sort of thing that W3C and similar organizations produce routinely.

Third-party rating services could assign trust scores to publishers. Again, a simple consensus standard could allow a web browser to automatically retrieve ratings from such services. (People with differing views will likely trust different rating services.) Rating services will want to keep in mind that individual posters may sometimes build a reputation only to later “spend” it on a grand deception; commercial publishers whose income depends on their reputation may be more trustworthy.
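
If such a standard existed, browser-side retrieval could be as trivial as the sketch below; the endpoint URL, query parameter, and response field are all invented for illustration:

```python
# Sketch: fetch a publisher's trust score from a (hypothetical) rating service.
# The endpoint and JSON fields are invented; a real standard would define them.
import requests

def get_trust_score(publisher_domain: str,
                    rating_service: str = "https://ratings.example") -> float | None:
    resp = requests.get(f"{rating_service}/v1/score",
                        params={"publisher": publisher_domain}, timeout=5)
    if resp.ok:
        return resp.json().get("score")   # e.g. 0.0 (untrusted) .. 1.0 (trusted)
    return None
```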

Posts missing such signatures, or signed by publishers with poor trust scores, could be automatically flagged as unreliable or propaganda.

Signatures could be conveyed in a custom HTML wrapper that stays invisible to readers, even in browsers unable to parse it – there’s no need to sprinkle “BEGIN PGP SIGNED MESSAGE” at the start of every article.
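
One possible encoding (again, my own sketch rather than an existing standard) is a script element with a custom type – browsers neither execute nor render script elements whose type they don’t recognize, so the signature block stays invisible whether or not the browser knows what to do with it:

```python
# Sketch: embed a signature block in a page without changing what readers see.
# The type "application/x-article-signature" is invented for illustration.
import json

def wrap_with_signature(article_html: str, signed: dict) -> str:
    block = json.dumps({"publisher": "example-news.test",        # assumed identity field
                        "canonical_url": signed["canonical_url"],
                        "signature": signed["signature"]})
    return (f'<script type="application/x-article-signature">{block}</script>\n'
            f'{article_html}')
```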

Signatures can be layered – a photo could be signed by the camera that captured the original (manufacturer, serial number), by the photographer (name, nym, unique email address), and by the publisher, all at the same time; the same applies to text news articles.
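
A sketch of how that layering might nest, with each party signing everything beneath it; the roles and field names are assumptions, not a real metadata format:

```python
# Sketch: layered signatures - the camera signs the raw image hash, the
# photographer signs the camera's layer, and the publisher signs the
# photographer's layer. All identities and field names are invented.
import hashlib
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def add_layer(key: Ed25519PrivateKey, signer: dict, inner: dict) -> dict:
    payload = json.dumps(inner, sort_keys=True)
    return {"signer": signer, "payload": payload,
            "signature": key.sign(payload.encode("utf-8")).hex()}

camera_key, photog_key, publisher_key = (Ed25519PrivateKey.generate() for _ in range(3))
raw_image = b"...raw image bytes..."

camera_layer = add_layer(camera_key,
                         {"role": "camera", "make": "ExampleCam", "serial": "SN0001"},
                         {"image_sha256": hashlib.sha256(raw_image).hexdigest()})
photographer_layer = add_layer(photog_key,
                               {"role": "photographer", "nym": "photos@example.test"},
                               camera_layer)
publisher_layer = add_layer(publisher_key,
                            {"role": "publisher", "domain": "example-news.test"},
                            photographer_layer)
```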

When a new article is created by mixing/editing previously published material from multiple sources, the new article’s publisher could sign it (taking responsibility for the content as a whole) while wrapping all the pre-existing signatures. A browser could, if the user wished and the sources remained available, generate a revision history showing the original sources and the editorial changes (rewording, mixing, cropping, etc.). Trust scores could be automatically generated by AI review of changes from the sources.

Video could be signed on a per-frame basis as well as a whole-clip or partial-clip basis. Per-frame signatures could include consecutive frame numbers (or timestamps), enabling trivial detection of selective editing used to produce out-of-context false impressions.
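
A sketch of per-frame signing, with the frame index included in each signed payload so that cutting or reordering frames breaks the numbering; the frame representation and field names are placeholders:

```python
# Sketch: sign each frame together with its index, so cutting, reordering,
# or splicing frames breaks the numbering and is easy to detect.
import hashlib
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def sign_frames(key: Ed25519PrivateKey, frames: list[bytes]) -> list[dict]:
    signed = []
    for i, frame in enumerate(frames):
        payload = json.dumps({"frame_index": i,
                              "frame_sha256": hashlib.sha256(frame).hexdigest()})
        signed.append({"frame_index": i,
                       "signature": key.sign(payload.encode("utf-8")).hex()})
    return signed

def indices_are_contiguous(clip: list[dict]) -> bool:
    # A clip cut from the middle of a longer video shows a jump in frame_index.
    indices = [f["frame_index"] for f in clip]
    if not indices:
        return True
    return indices == list(range(indices[0], indices[0] + len(indices)))
```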

If there’s a desire for immutability or verifiable timestamps, articles (or signed article hashes) could be stored on a public blockchain.

Somebody…please pursue this?

Using AI to moderate online discussions

This brief post is here solely as prior art, to make it more difficult for someone to patent these ideas. I’m probably too late (the idea is so obvious the patent office likely has a dozen applications already), but I’m trying.

GOALS

One of my pet projects for years has been finding a way to promote civil discussion online. As everyone knows, most online discussion takes place in virtual cesspits – Facebook, Twitter, the comments sections of most news articles, etc. Social media and the ideological bubbles it promotes have been blamed for political polarization and ennui of young people around the world. I won’t elaborate on this – others have done that better than I can.

The problem goes back at least to the days of Usenet – even then I was interested in crowdsourced voting systems where “good” posts would get upvoted and “bad” ones downvoted in various ways, together with collaborative filtering on a per-reader basis to show readers the posts they’ll value most. I suppose many versions of this must have been tried by now; certainly sites like Stack Exchange and Reddit have made real efforts. The problem persists, so these solutions are at best incomplete. And of course some sites have excellent quality comments (I’m thinking of https://astralcodexten.substack.com/ and https://www.overcomingbias.com/), but these either have extremely narrow audiences or the hosts spend vast effort on manual moderation.

My goal (you may not share it) is to enable online discussion that’s civil and rational. Discussion that consists of facts and reasoned arguments, not epithets and insults. Discussion that respects the Principle of Charity. Discussion where people try to seek truth and attempt to persuade rather than bludgeon those who disagree. Discussion where facts matter. I think such discussions are more fun for the participants (they are for me), more informative to readers, and lead to enlightenment and discovery.

SHORT VERSION

Here’s the short version: When a commenter (let’s say on a news article, editorial, or blog post) drafts a post, the post content is reviewed by an AI (an LLM such as GPT, as is currently all the rage) for conformity with “community values”. These values are set by the host of the discussion – the publication, website, etc. The host describes the values to the AI, in plain English, in a prompt to the AI. My model is that the “community values” reflect the kind of conversations the host wants to see on their platform – polite, respectful, rational, fact-driven, etc. Or not, as the case may be. My model doesn’t involve “values” that shut down rational discussion or genuine disagreement (“poster must claim Earth is flat”, “poster must support Republican values”…), although I suppose some people may want to try that.

The commenter drafts a post in the currently-usual way, and clicks the “post” button. At that point the AI reviews the text of the comment (possibly along with the conversation so far, for context) and decides whether the comment meets the community values for the site. If so, the comment is posted.

If not, the AI explains to the poster what was wrong with the comment – it was insulting, it was illogical, it was…whatever. And perhaps offers a restatement or alternative wording. The poster may then modify their comment and try again. Perhaps they can also argue with the AI to try to convince it to change its opinion.
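
Here’s a minimal sketch of that review loop. The llm callable stands in for whatever model the host uses, and the prompt wording, the “reply OK or explain” convention, and the function names are all my own assumptions:

```python
# Sketch: review a draft comment against the host's community-values prompt.
# `llm` is any callable that takes a prompt string and returns the model's text;
# the "reply 'OK' or explain" convention is invented for illustration.
from typing import Callable

COMMUNITY_VALUES = (
    "Comments on this site must be polite, charitable, fact-driven, "
    "and free of personal insults."          # written by the host, in plain English
)

def review_comment(llm: Callable[[str], str], draft: str, thread: str = "") -> dict:
    prompt = (f"Community values:\n{COMMUNITY_VALUES}\n\n"
              f"Conversation so far:\n{thread}\n\n"
              f"Draft comment:\n{draft}\n\n"
              "If the draft meets the community values, reply with exactly 'OK'. "
              "Otherwise explain what is wrong and suggest an alternative wording.")
    reply = llm(prompt).strip()
    if reply == "OK":
        return {"approved": True}                      # post the comment
    return {"approved": False, "feedback": reply}      # show the explanation to the poster
```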

IMPORTANT ELABORATIONS

The above is the shortest and simplest version of the concept.

One reasonable objection is that this is, effectively, a censorship mechanism. As described, it is, but limited to a single host site. I don’t have a problem with that, since the Internet is full of discussions and people are free to leave sites they find too constraining.

Still, there are many ways to modify the system to remove or loosen the censorship aspect, and perhaps those will work better. Below are a couple I’ve thought of.

OVERRIDE SYSTEMS

If the AI says a post doesn’t meet the local standards, the poster can override the AI and post the comment anyway.

Such overrides would be allowed only if the poster has sufficient “override points”, which are consumed each time a poster overrides the AI (perhaps a fixed number per post, or perhaps a variable number based on how far out of spec the AI deems the post to be); once they’re out of points, they can’t override anymore.
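
A sketch of the points bookkeeping; the cost rule (flat, or scaled by how far out of spec the AI judged the post) is a placeholder the host would tune:

```python
# Sketch: override-point bookkeeping. The cost rule is a placeholder -
# a flat cost, scaled up when the AI judges the post to be further out of spec.
class OverrideAccount:
    def __init__(self, points: int = 0):
        self.points = points

    def grant(self, amount: int) -> None:
        # Weekly allocation, host gift, reputation reward, purchase, etc.
        self.points += amount

    def try_override(self, severity: float, base_cost: int = 10) -> bool:
        cost = round(base_cost * max(1.0, severity))   # assumed scaling rule
        if self.points >= cost:
            self.points -= cost
            return True           # post goes up despite the AI's objection
        return False              # out of points: no override
```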

Override points might be acquired:

  • so many per unit time (each user gets some fixed allocation weekly), or
  • by posting things approved of by the AI or by readers, or
  • by seniority on the site, or
  • by reputation (earned somehow), or
  • by gift of the host (presumably to trusted people), or
  • by buying them with money, or
  • some combination of these.

Regarding buying them with money: a poster could effectively place a bet against the AI on the outcome of a human moderator’s review. Comments posted this way go online and also to a human moderator, who independently decides whether the AI was right. If so, the site keeps the money. If the moderator sides with the poster, the points (or money) are returned.

The expenditure of override points is also valuable feedback to the site host who drafts the “community values” prompt – the host can see which posts required how many override points (and why, according to the AI), and decide whether to modify the prompt.

READER-SIDE MODERATION

Another idea (credit here to Richard E.) is that all comments are posted, just with different ratings, and readers see whatever they’ve asked to see based on the ratings (and perhaps other criteria).

The AI rates the comment on multiple independent scales – for example, politeness, logic, rationality, fact content, charity, etc. – each scale defined in an AI prompt by the host. The host offers a default set of thresholds or preferences for what readers see, but readers are free to change those as they see fit.

(Letting readers define their own scales is possible but computationally expensive – each comment would need to be rated by the AI for each reader, rather than just once when posted.)
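
A sketch of rating once at posting time and filtering per reader; the scale names, default thresholds, and JSON-reply convention are all placeholders, and llm again stands in for the host’s model:

```python
# Sketch: rate each comment once on the host-defined scales, then let each
# reader filter by their own thresholds. Scale names, defaults, and the
# JSON-reply convention are invented; `llm` is any prompt-in, text-out callable.
import json
from typing import Callable

SCALES = ["politeness", "logic", "fact_content", "charity"]   # defined by the host
HOST_DEFAULT_THRESHOLDS = {scale: 0.5 for scale in SCALES}

def rate_comment(llm: Callable[[str], str], comment: str) -> dict[str, float]:
    prompt = (f"Rate this comment from 0.0 to 1.0 on each of {SCALES}. "
              f"Reply with a JSON object only.\n\nComment:\n{comment}")
    return json.loads(llm(prompt))

def visible_to(ratings: dict[str, float], thresholds: dict[str, float]) -> bool:
    # A reader sees the comment only if it clears every threshold they've set.
    return all(ratings.get(scale, 0.0) >= t for scale, t in thresholds.items())
```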

In this model there could also be a points system that allows posters to modify their ratings, if they want to promote something the AI (or readers) would prefer not to see.

The only good patent is an expired patent

Update – 2024-05-09:

In the 14 years since I posted this, I’ve changed my mind. Without patent protection against low-effort knock-offs, it can be really difficult for small firms to get VC funding. At the very least, I overstated the case.

Somewhere else I’ve said, “Everything is more complicated than it seems.” This is one of those cases.


…indeed the best thing about patents is that they eventually expire. (And I say this as an inventor with multiple issued patents.)

See, for instance, http://news.cnet.com/8301-10784_3-57374941-87/litigation-lunacy-silicon-valleys-lost-its-collective-mind/?part=rss&subj=news&tag=2547-1_3-0-20.

Or http://en.wikipedia.org/wiki/Nest_Labs#Litigation.

And in general: http://www.techdirt.com/.

Having said that, I guess I have to also say that I’m not against patents in principle, only in practice.

I usually try to stay out of politics on this blog, but this affects us nerds in particular.

If we could somehow have a working patent system that limited patents to truly original and (above all) non-obvious inventions, ideally ones that involved genuine investment (instead of off-the-cuff ideas), then I’d be in favor of that. But the current system is supposed to do that already, and it fails miserably.

The result is worse than no patent system at all. Ironically, the current patent system, which is supposed to encourage innovation, instead stifles it – the risk of company-killing lawsuits over genuinely independent inventions (and therefore, in my book, obvious ones) far outweighs any encouragement.

In my view, to qualify as non-obvious, an applicant should be required to show that her invention solves a long-standing (not recent) problem which other people have had ample opportunity to solve, but have been unable to. Too many modern patents are obvious solutions to new problems which either never existed before (because a new technology raises new problems) or which only recently became solvable because of new technology. For example, sending voice over the Internet is obvious once you have an Internet; nobody should get a patent on that just because they were the first to do it. Heatsinking an LED for domestic lighting is likewise obvious – no patent should issue simply because nobody did it before, since LED use for domestic lighting is a new application and the problem never came up before.

Apparently somebody was issued a patent on using a rotating mirror to scan a laser in a polymer 3D printer – because it’s a “new application” of the invention of the rotating mirror. There’s at least one guy who is therefore using a rotating prism instead, to work around the patent. In his own words: “even though this would have been obvious for any bachelor physics student looking into this topic”. Using a rotating mirror (or prism) this way is not in any sense “invention” – it’s workaday engineering.

Solutions should be considered obvious if they appear very quickly after appearance of the problem, or if multiple independent “inventors” come up with the same solution over a short period of time.

While I’d prefer real reform of the patent system along these lines – which would reject 98%+ of currently issued patents (including most of mine) – political reality seems to make that unlikely. Given the choice between the current system and no patent system at all, I’d choose none.