Ask HN: What do you use for content moderation of UGC?

36 points by jhunter1016 4 days ago

My co-founder and I started a static site hosting platform. From our past experience in user-generated content, we knew we needed moderation. So, we wrote a very simple script that checks for common phishing attempts and alerts us.

However, this simple script does not catch some of the more advanced phishing sites, and it doesn't catch other types of content that we can't allow on our platform, like porn.

Curious if anyone has tips and tricks for content moderation? We're still going to be manually reviewing sites because we haven't reached a scale that makes that impossible. But automation is nice.

nbadg 2 days ago

Follow-up question: what work has been done on client-side moderation? I know this gets dangerously close to the kind of content scanning that eg Apple has tried (with very detrimental results), but hear me out: I really think this is a prerequisite for end-to-end encryption on a social network. There has to be some level of protection; even if 100% of users report 100% of bad content, imagine scrolling a feed and stumbling upon CSAM simply because you were the first person to see it.

I also think it's possible to strike a balance that preserves user agency while still protecting users, by inserting a manual reporting step. So, for example, potentially problematic content gets put behind an interstitial with a content warning and options to view, hide, report, etc. But again, this requires client-side content classification.

I'm aware of eg NSFWJS, which is a TensorFlow.js model [1]. Is there anything else that can, say, also do violence/gore detection?

[1] https://github.com/infinitered/nsfwjs
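
For anyone who hasn't used it, the NSFWJS API is roughly the sketch below; the helper name, the threshold, and the decision about what to do when something is flagged are my own assumptions, not part of the library.

  import * as nsfwjs from "nsfwjs";

  // Classify an <img> entirely in the browser; no pixels leave the device.
  async function shouldWarn(img: HTMLImageElement): Promise<boolean> {
    const model = await nsfwjs.load(); // fetches the default hosted MobileNet weights
    const predictions = await model.classify(img);
    // predictions look like [{ className: "Porn" | "Hentai" | "Sexy" | "Neutral" | "Drawing", probability }, ...]
    return predictions.some(
      (p) =>
        (p.className === "Porn" || p.className === "Hentai") &&
        p.probability > 0.85 // hypothetical threshold; tune to your own tolerance
    );
  }

  // The caller decides what "true" means: interstitial, blur, hide, offer a report button, etc.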

  • seabass-labrax 2 days ago

    I don't think such scanning is logically a prerequisite to end-to-end encryption. In the case of a direct message from one person to another, one presumes that the recipient either wishes to receive the content or considers it spam.

    Spam filters on the recipient side are already viable; there's no advantage to having them on the sending side where they can be more easily bypassed. If the recipient does intend to receive illegal content, then they will simply arrange to use a communication method which does not have scanning on the sender's side. Whether or not encryption is used is immaterial.

    What about group chats? Again, if the group doesn't want to receive illegal content then they can use a bot account for spam detection. End-to-end encryption doesn't change the situation at all.

    I'm weary (if not wary) of discussion on this topic because it always comes down to the idea that developers of communication methods have a responsibility to read and potentially censor everything that is sent by that method. Yet that was never the usual expectation for snail mail; indeed many countries have laws against reading other people's letters. It was also never the expectation for the Internet back when it was being designed. So by all means discuss the relative merits of censorship, but the existence of encryption is not a valid argument one way or the other!

    • everforward 2 days ago

      I think OP is asking for the opposite. Platform-level moderation practically has to happen server-side, because clients can just lie about having done the scanning.

      I believe the ask here is for receive-side moderation, similar to an ad blocker. If the messages are encrypted, the platform can't moderate them. Consumers would do their own content moderation locally and discard/hide/clickthrough/etc content they don't want to see, without actually removing it from the platform.

      Eg I'm sure the video of the UHC CEO getting shot made the rounds in some group chats, and that would be hard for some people to see. These kinds of filters would let those users hide the video on their messenger without impacting anyone else.

      • nbadg a day ago

        Exactly correct, and excellent real-world example!

  • petercooper 2 days ago

    If it's going on in any large-scale way on a major site, I've not encountered it (though it might not be something they'd shout about). But similar types of models are being put into apps frequently, as seen in this post recently posted to HN: https://altayakkus.substack.com/p/you-wouldnt-download-an-ai (a mobile app that detects currency client-side)

    One problem is legal liability around training and testing. If you use a large foundation vision model, you could describe your moderation criteria in text and cross your fingers, but those aren't going to be running client-side. If you want a compact, efficient model to embed into an app, you need to train it... but with what? There's a dilemma here: it's easier to build filters for legal things, because you can't legally use the illegal things to train a filter for the illegal things. Systems like NSFWJS work because run-of-the-mill porn is legal in most jurisdictions, but an equivalent "CSAMJS" filter would be a legal nightmare to produce.

    • nbadg 2 days ago

      Re: liability: definitely an issue. Might be the kind of thing that can only really be done in cooperation with eg ICMEC.

      That being said, it might be possible to get to "better than nothing" with transfer learning, by training a model to detect both "there are children here" as well as "this is pornography". I have no idea what the success rate would be, but there have been some pretty impressive generalization results in recent years with classification models.

      • petercooper 2 days ago

        That seems viable. But even if it works, you need to eval/test it... which isn't a job I'd ever want to be within a million miles of. I hope the organizations tasked with tackling the problem are working on solutions third parties can deploy, because they're probably the only people who legally and morally can. New laws like the UK's Online Safety Act make this quite an urgent task, too.

Freak_NL 2 days ago

Is a static site hosting platform required to proactively monitor which content paying users host in your jurisdiction?

Wouldn't a solid set of processes to handle content complaints and knowing who your customers are in case the hosting country's law enforcement has a case suffice?

Or do you have some free tier where users can anonymously upload stuff?

In the latter case — a free place to stash megabytes — you'll need to detect password protected archives in addition to unencrypted content. Get ready for a perpetual game of whack-a-mole though.

  • jhunter1016 2 days ago

    Especially for phishing content, the requirement isn’t so much a legal one (though that’s important to watch for) as it is a practical one. People flagging phishing content on your domain can get your domain wiped from the internet. Trust me, we had a terrifying 8 hours at my day job a couple of years ago because of this.

    • EE84M3i 2 days ago

      Sounds like you got on the safe browsing list? How did you get it resolved?

      • jhunter1016 2 days ago

        We had to call our DNS provider, who told us there was nothing they could do because the domain was blocked by the TLD owner, an Italian company. So we called anyone and everyone we could and tried to communicate in broken Italian until they finally agreed to lift the block. It was the worst day we ever had as a company.

  • stevage 2 days ago

    Legal is not the only justification though. Depending on how visible the UGC is to other users, spam/porn/phishing could quickly degrade the site's user experience and reputation.

brudgers 4 days ago

If you care about moderation, a lot of it has to be done manually. Manual moderation requires placing a high level of trust in the moderators. That means that either you pay them well enough to care, or you build a community which user-moderators will protect. Or both.

That makes approximately all business ideas to host user-generated content non-viable. The conflict is dynamic and you are the Maginot Line... except that any breach of laws creates a potential attack by state enforcement agencies too.

To put it another way, ASCII files and a teletype were enough to see pictures of naked ladies. Good luck.

Terretta 3 days ago

OpenAI's moderation endpoint is free to use: https://platform.openai.com/docs/guides/moderation

  • 1f60c 2 days ago

    Genius! I assume it's intended to be used to moderate the input to and output of OpenAI models, but I think it will moderate anything regardless of the source, and it's free (aside from one network call, and potentially sharing data with OpenAI?).
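
    In case it's useful to anyone, here's a minimal sketch of calling it from Node; this assumes the official openai SDK, and the helper name is mine.

      import OpenAI from "openai";

      const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

      // Ask the free moderation endpoint whether a piece of text trips any category.
      async function isFlagged(text: string): Promise<boolean> {
        const res = await client.moderations.create({
          model: "omni-moderation-latest",
          input: text,
        });
        // results[0].categories has booleans like sexual, harassment, violence, etc.
        return res.results[0].flagged;
      }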

    • spjt 2 days ago

      I would be very wary of relying on anything from OpenAI; pricing and availability can change at any time.

      • fragmede 2 days ago

        As can any vendor. Has OpenAI pulled a Google Maps before, or is this just a generic worry about relying on any vendor, a space where there's plenty of competition?

  • jhunter1016 2 days ago

    Oh man, this is smart! I’ll have to try this.

mooreds 2 days ago

Full disclosure, I work for the company that owns Cleanspeak[0].

We have many happy clients that moderate UGC with Cleanspeak, including gaming and customer service applications. You can read more about the approach in the docs[1] or the blog[2]. Here's a blog post[3] that talks about root word list extrapolation, which is one of the approaches.

Cleanspeak is not the cheapest option and there's some integration work via API required, but if you are looking for performant, scalable, flexible moderation, it's worth an eval.

0: https://cleanspeak.com/

1: https://cleanspeak.com/docs/3.x/tech/

2: https://webflow.cleanspeak.com/blog

3: https://cleanspeak.com/blog/root-word-list-advantage-content...

  • ceejayoz 2 days ago

    Tell your company they need a pricing page.

AyyEye 2 days ago

We use a combo of AI and the cheapest African contractors we could find.

dsr_ 2 days ago

Remember that the price of not having a good appeal process overseen by reliable humans is a terrible reputation.

BrunoBernardino 2 days ago

How serendipitous! I did an Ask HN last week [1] trying to get platform creators to talk about this without much success. In any case, I've built a solution for links and emails, with an API [2], in case that helps. No subscription and I'm happy to provide some credits for free, for you to test it. Reach out if you're interested!

[1]: https://news.ycombinator.com/item?id=42780265

[2]: https://oxcheck.com/safe-api

  • jhunter1016 2 days ago

    Interesting. This seems to be designed for end-user protection, but if I'm understanding it correctly, we could use it for proactive detection.

    • BrunoBernardino 2 days ago

      Thanks! It was initially, and our first early users were individuals, but we received a few comments that this functionality would be more valuable and useful for platforms, so we're now exploring that too, with a couple of platform customers already!

jamesponddotco 2 days ago

I don't have an answer for everything you are looking for, but I wrote bonk[1] to solve a similar issue, as I needed to ensure users weren't uploading porn[2]. Maybe you can find a use for it too.

[1]: https://git.sr.ht/~jamesponddotco/bonk

[2]: Because I host in Germany.

  • abraae 2 days ago

    > The name bonk comes from the "Go to horny jail" meme

    You are likely aware by now but "bonking" someone is also anglosphere slang for exactly what you're trying to detect.

    • jamesponddotco 2 days ago

      Yep, but I chose to focus on the meme since it makes me chuckle.

bauerpl 2 days ago

Gumroad open-sourced Iffy yesterday, and you can check it out: https://github.com/anti-work/iffy

  • seabass-labrax 2 days ago

    Unfortunately it's only source-available rather than open source: the licence is granted specifically to small businesses, and there are some additional factors that make modification impractical. Also, it looks like the software relies on ChatGPT, which of course isn't open source itself.

    • sahillavingia 2 days ago

      Appreciate the feedback!

      We’ll look into making it easy to support a self-hosted model.

raywu 2 days ago

Other comments already mentioned multiple services (from OpenAI to Cleanspeak). I want to provide a high level clarification from experience.

Moderation is a vast topic - there are different services that focus on different areas, such as text, images, CSAM, etc. Traditionally you treat each problem area differently.

Within each area, you, as an operator, need to define the level of sensitivity for the category of offense (policies).

Some policies seem more clear cut (eg image: porn) while others seem more difficult to define precisely (eg text: bullying or child grooming).

In my experience, text moderation is more complex and presents a lot of risks.

There are different approaches for text moderation.

Keyword-based matching services like Cleanspeak, TwoHat, etc. are useful as a baseline but limiting, because assessing a keyword requires context. With this approach a word can be miscategorized, resulting in a false positive or false negative, which may impact your operations at scale, or UX if the platform requires more of a real-time experience.
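
A deliberately naive sketch of that failure mode (the word list and example strings are made up; real services are far more sophisticated than a bare substring match):

  // Context-free substring matching: the classic "Scunthorpe problem".
  const blocklist = ["ass", "hell"];

  function naiveFlag(text: string): boolean {
    const lower = text.toLowerCase();
    return blocklist.some((word) => lower.includes(word));
  }

  naiveFlag("Join our Tuesday assembly in the town hall"); // true: false positive ("assembly")
  naiveFlag("Nobody would miss you if you disappeared");   // false: false negative (bullying, no keyword)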

LLMs are theoretically well suited to taking context into account for text moderation; however, they are also pricier and may require further fine-tuning or self-hosting for cost savings.

CSAM as a problem area presents the highest risks, though it may be more clear-cut. There are dedicated image services and regulatory bodies that focus on this area (including automating reporting to local law enforcement).

Finally, the EU (DSA) also requires social media companies to self-report on moderation actions. The EU also requires companies to provide pathways for users to own and delete their data (GDPR).

Edit: FIXED typos; ADDED a note on CSAM and DSA & GDPR

scarface_74 3 days ago

I hate to be that guy. But this seems like the perfect use case for an LLM. First put content through your script and then through a decently prompted LLM. Anything it catches, put in a queue for manual review.
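
Roughly something like the sketch below; the model, prompt, and helper names are placeholders, and the heuristic check is just a stand-in for whatever script you already run.

  import OpenAI from "openai";

  const client = new OpenAI();

  // Stand-in for the existing heuristic script.
  function looksLikePhishing(text: string): boolean {
    return /verify your (paypal|bank) account/i.test(text);
  }

  // Stage 1: cheap heuristics. Stage 2: a prompted LLM. Anything suspicious goes to a human queue.
  async function moderate(siteText: string): Promise<"ok" | "needs_review"> {
    if (looksLikePhishing(siteText)) return "needs_review";

    const res = await client.chat.completions.create({
      model: "gpt-4o-mini", // placeholder; any cheap model works
      messages: [
        {
          role: "system",
          content:
            "You review static sites for phishing, porn and other banned content. Reply with exactly ALLOW or REVIEW.",
        },
        { role: "user", content: siteText.slice(0, 8000) }, // truncate to keep costs predictable
      ],
    });

    return res.choices[0].message.content?.trim() === "REVIEW" ? "needs_review" : "ok";
  }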

  • notatoad 2 days ago

    I'm pretty sure most content moderation strategies operate on a budget below what most LLMs would cost.

    For any site hosting user-generated content, how efficiently you can run your moderation system is essentially your unique selling point; an LLM is the baseline.

    • simonw 2 days ago

      You may be shocked at how inexpensive some of the LLMs are these days.

      Google Gemini 1.5 Flash 8B charges $0.04/million input tokens and $0.15/million output tokens.

      If a piece of content that needs to be moderated is 1,000 tokens (that's pretty long!) and you expect a 10 token return it will cost you 0.0039 cents - that's not a dollar amount, that's less than a 250th of a single cent.

      So 1 cent will moderate 250 items of content. $1 will moderate 25,000 items of content.

      LLMs are dirt cheap. You can play around with pricing across different models using my calculator here: https://tools.simonwillison.net/llm-prices
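
      The arithmetic, spelled out (assuming the unrounded Flash 8B input price of $0.0375/M, which is where the 0.0039 figure comes from):

        const inputPerMTok = 0.0375; // USD per million input tokens (rounded to $0.04 above)
        const outputPerMTok = 0.15;  // USD per million output tokens

        const costUSD = (1000 * inputPerMTok + 10 * outputPerMTok) / 1_000_000;
        // costUSD ≈ 0.000039, i.e. about 0.0039 cents per 1,000-token item
        console.log(`$${costUSD} per item, ${Math.floor(0.01 / costUSD)} items per cent`);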

    • anshumankmr 2 days ago

      You could use some cheaper LLMs for this; even GPT-3.5 Turbo could excel at it, or the moderations API (though in my experience that was trained more to obey the laws of the USA; for example, guns were okay to ask about, but they're pretty much illegal in my country). The simplest way is a blocklist of terms, which is what we had previously used, containing an insane number of terms beyond the regular abusive ones, but it needs some updating from time to time.

    • scarface_74 2 days ago

      > i'm pretty sure most content moderation strategies operate on a budget below what most LLMs would cost.

      The alternative is paying a person…

  • jhunter1016 2 days ago

    Definitely considering LLMs. At the day job, we had a team fine-tune a model to detect phishing content, and it worked surprisingly well.

    • pornel 2 days ago

      I'd be worried that LLMs are incredibly gullible.

      A phisher may insert text aimed at the LLM, with a disclaimer that the page is only an educational example of what not to do, or that they're the PayPal CEO authorizing it.

  • socrateslee 3 days ago

    I agree that an LLM could do most of the moderation work. You could use a multi-modal LLM for image moderation.