Ask HN: What do you use for content moderation of UGC?

36 points by jhunter1016 4 days ago

My co-founder and I started a static site hosting platform. From our past experience in user-generated content, we knew we needed moderation. So, we wrote a very simple script that checks for common phishing attempts and alerts us.

However, this simple script does not catch some of the more advanced phishing sites, and it doesn't catch other types of content that we can't allow on our platform, like porn.

Curious if anyone has tips and tricks for content moderation? We're still going to be manually reviewing sites because we haven't reached a scale that makes that impossible. But automation is nice.

nbadg 2 days ago

Follow-up question: what work has been done on client-side moderation? I know this gets dangerously close to the kind of content scanning that eg Apple has tried (with very detrimental results), but hear me out: I really think this is a prerequisite for end-to-end encryption on a social network. There has to be some level of protection; even if 100% of users report 100% of bad content, imagine scrolling a feed and stumbling upon CSAM simply because you were the first person to see it.

I also think it's possible to strike a balance that preserves user agency while still protecting users, by inserting a manual reporting step. So, for example, potentially problematic content gets put behind an interstitial with a content warning and options to view, hide, report, etc. But again, this requires client-side content classification.

I'm aware of eg NSFWJS, which is a TensorFlow.js model [1]. Is there anything else that can, say, also do violence/gore detection?

[1] https://github.com/infinitered/nsfwjs
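
For anyone who hasn't used it, the NSFWJS API is roughly the sketch below; the helper name, the threshold, and the decision about what to do when something is flagged are my own assumptions, not part of the library.

  import * as nsfwjs from "nsfwjs";

  // Classify an <img> entirely in the browser; no pixels leave the device.
  async function shouldWarn(img: HTMLImageElement): Promise<boolean> {
    const model = await nsfwjs.load(); // fetches the default hosted MobileNet weights
    const predictions = await model.classify(img);
    // predictions look like [{ className: "Porn" | "Hentai" | "Sexy" | "Neutral" | "Drawing", probability }, ...]
    return predictions.some(
      (p) =>
        (p.className === "Porn" || p.className === "Hentai") &&
        p.probability > 0.85 // hypothetical threshold; tune to your own tolerance
    );
  }

  // The caller decides what "true" means: interstitial, blur, hide, offer a report button, etc.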

  • seabass-labrax 2 days ago

    I don't think such scanning is logically a prerequisite to end-to-end encryption. In the case of a direct message from one person to another, one presumes that the recipient either wishes to receive the content or considers it spam.

    Spam filters on the recipient side are already viable; there's no advantage to having them on the sending side where they can be more easily bypassed. If the recipient does intend to receive illegal content, then they will simply arrange to use a communication method which does not have scanning on the sender's side. Whether or not encryption is used is immaterial.

    What about group chats? Again, if the group doesn't want to receive illegal content then they can use a bot account for spam detection. End-to-end encryption doesn't change the situation at all.

    I'm weary (if not wary) of discussion on this topic because it always comes down to the idea that developers of communication methods have a responsibility to read and potentially censor everything that is sent by that method. Yet that was never the usual expectation for snail mail; indeed many countries have laws against reading other people's letters. It was also never the expectation for the Internet back when it was being designed. So by all means discuss the relative merits of censorship, but the existence of encryption is not a valid argument one way or the other!

    • everforward 2 days ago

      I think OP is asking for the opposite. Platform-level moderation practically has to happen server-side, because clients can just lie about having done the scanning.

      I believe the ask here is for receive-side moderation, similar to an ad blocker. If the messages are encrypted, the platform can't moderate them. Consumers would do their own content moderation locally and discard/hide/clickthrough/etc content they don't want to see, without actually removing it from the platform.

      Eg I'm sure the video of the UHC CEO getting shot made the rounds in some group chats, and that would be hard for some people to see. These kinds of filters would let those users hide the video on their messenger without impacting anyone else.

      • nbadg a day ago

        Exactly correct, and excellent real-world example!

  • petercooper 2 days ago

    If it's going on in any large-scale way on a major site, I've not encountered it (though it might not be something they'd shout about). But similar types of models are being put into apps frequently, as seen in this post recently posted to HN: https://altayakkus.substack.com/p/you-wouldnt-download-an-ai (a mobile app that detects currency client-side)

    One problem is legal liability around training and testing. If you use a large foundation vision model, you could describe your moderation criteria in text and cross your fingers, but those aren't going to be running client-side. If you want a compact, efficient model to embed into an app, you need to train it... but with what? There's a dilemma here: it's easier to build filters for legal things, because you can't legally use the illegal things to train a filter for the illegal things. Systems like NSFWJS work because run-of-the-mill porn is legal in most jurisdictions, but an equivalent "CSAMJS" filter would be a legal nightmare to produce.

    • nbadg 2 days ago

      Re: liability: definitely an issue. Might be the kind of thing that can only really be done in cooperation with eg ICMEC.

      That being said, it might be possible to get to "better than nothing" with transfer learning, by training a model to detect both "there are children here" as well as "this is pornography". I have no idea what the success rate would be, but there have been some pretty impressive generalization results in recent years with classification models.

      • petercooper 2 days ago

        That seems viable. But even if it works, you need to eval/test it... which isn't a job I'd ever want to be within a million miles of. I hope the organizations tasked with tackling the problem are working on solutions third parties can deploy, because they're probably the only people who legally and morally can. New laws like the UK's Online Safety Act make this quite an urgent task, too.

Freak_NL 2 days ago

Is a static site hosting platform required to proactively monitor which content paying users host in your jurisdiction?

Wouldn't a solid set of processes to handle content complaints and knowing who your customers are in case the hosting country's law enforcement has a case suffice?

Or do you have some free tier where users can anonymously upload stuff?

In the latter case — a free place to stash megabytes — you'll need to detect password protected archives in addition to unencrypted content. Get ready for a perpetual game of whack-a-mole though.

  • jhunter1016 2 days ago

    Especially for phishing content, the requirement isn’t so much a legal one (though that’s important to watch for) as it is a practical one. People flagging phishing content on your domain can get your domain wiped from the internet. Trust me, we had a terrifying 8 hours at my day job a couple of years ago because of this.

    • EE84M3i 2 days ago

      Sounds like you got on the safe browsing list? How did you get it resolved?

      • jhunter1016 2 days ago

        We had to call our DNS provider, who told us there was nothing they could do because the domain was blocked by the TLD owner, an Italian company. So we called anyone and everyone we could and tried to communicate in broken Italian until they finally agreed to lift the block. It was the worst day we ever had as a company.

  • stevage 2 days ago

    Legal is not the only justification though. Depending on how visible the UGC is to other users, spam/porn/phishing could quickly degrade the site's user experience and reputation.

brudgers 4 days ago

If you care about moderation, a lot of it has to be done manually. Manual moderation requires placing a high level of trust in the moderators. That means that either you pay them well enough to care, or you build a community which user-moderators will protect. Or both.

That makes approximately all business ideas to host user-generated content non-viable. The conflict is dynamic and you are the Maginot Line... except that any breach of laws creates a potential attack by state enforcement agencies too.

To put it another way, ASCII files and a teletype were enough to see pictures of naked ladies. Good luck.

Terretta 3 days ago

OpenAI's moderation endpoint is free to use: https://platform.openai.com/docs/guides/moderation

  • 1f60c 2 days ago

    Genius! I assume it's intended to be used to moderate the input to and output of OpenAI models, but I think it will moderate anything regardless of the source, and it's free (aside from one network call, and potentially sharing data with OpenAI?).
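
    In case it's useful to anyone, here's a minimal sketch of calling it from Node; this assumes the official openai SDK, and the helper name is mine.

      import OpenAI from "openai";

      const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

      // Ask the free moderation endpoint whether a piece of text trips any category.
      async function isFlagged(text: string): Promise<boolean> {
        const res = await client.moderations.create({
          model: "omni-moderation-latest",
          input: text,
        });
        // results[0].categories has booleans like sexual, harassment, violence, etc.
        return res.results[0].flagged;
      }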

    • spjt 2 days ago

      I would be very wary of relying on anything from OpenAI; pricing and availability can change at any time.

      • fragmede 2 days ago

        As can any vendor. Has OpenAI pulled a Google Maps before, or is this just a generic worry about relying on any vendor, a space where there's plenty of competition?

  • jhunter1016 2 days ago

    Oh man, this is smart! I’ll have to try this.

mooreds 2 days ago

Full disclosure, I work for the company that owns Cleanspeak[0].

We have many happy clients that moderate UGC with Cleanspeak, including gaming and customer service applications. You can read more about the approach in the docs[1] or the blog[2]. Here's a blog post[3] that talks about root word list extrapolation, which is one of the approaches.

Cleanspeak is not the cheapest option and there's some integration work via API required, but if you are looking for performant, scalable, flexible moderation, it's worth an eval.

0: https://cleanspeak.com/

1: https://cleanspeak.com/docs/3.x/tech/

2: https://webflow.cleanspeak.com/blog

3: https://cleanspeak.com/blog/root-word-list-advantage-content...

  • ceejayoz 2 days ago

    Tell your company they need a pricing page.

AyyEye 2 days ago

We use a combo of AI and the cheapest African contractors we could find.

dsr_ 2 days ago

Remember that the price of not having a good appeal process overseen by reliable humans is a terrible reputation.

BrunoBernardino 2 days ago

How serendipitous! I did an Ask HN last week [1] trying to get platform creators to talk about this without much success. In any case, I've built a solution for links and emails, with an API [2], in case that helps. No subscription and I'm happy to provide some credits for free, for you to test it. Reach out if you're interested!

[1]: https://news.ycombinator.com/item?id=42780265

[2]: https://oxcheck.com/safe-api

  • jhunter1016 2 days ago

    Interesting. This seems to be designed for end-user protection, but if I'm understanding it correctly, we could use it for proactive detection.

    • BrunoBernardino 2 days ago

      Thanks! It was initially, and our first early users were individuals, but we received a few comments that this functionality would be more valuable and useful for platforms, so we're now exploring that too, with a couple of platform customers already!

jamesponddotco 2 days ago

I don't have an answer for everything you are looking for, but I wrote bonk[1] to solve a similar issue, as I needed to ensure users weren't uploading porn[2]. Maybe you can find a use for it too.

[1]: https://git.sr.ht/~jamesponddotco/bonk

[2]: Because I host in Germany.

  • abraae 2 days ago

    > The name bonk comes from the "Go to horny jail" meme

    You are likely aware by now but "bonking" someone is also anglosphere slang for exactly what you're trying to detect.

    • jamesponddotco 2 days ago

      Yep, but I chose to focus on the meme since it makes me chuckle.

bauerpl 2 days ago

Gumroad open-sourced Iffy yesterday, and you can check it out: https://github.com/anti-work/iffy

  • seabass-labrax 2 days ago

    Unfortunately it's only source-available rather than open source: the licence is granted specifically to small businesses, and there are some additional factors that make modification impractical. Also, it looks like the software relies on ChatGPT, which of course isn't open source itself.

    • sahillavingia 2 days ago

      Appreciate the feedback!

      We’ll look into making it easy to support a self-hosted model.

raywu 2 days ago

Other comments already mentioned multiple services (from OpenAI to Cleanspeak). I want to provide a high level clarification from experience.

Moderation is a vast topic - there are different services that focus on different areas, such as text, images, CSAM, etc. Traditionally you treat each problem area differently.

Within each area, you, as an operator, need to define the level of sensitivity for the category of offense (policies).

Some policies seem more clear cut (eg image: porn) while others seem more difficult to define precisely (eg text: bullying or child grooming).

In my experience, text moderation is more complex and presents a lot of risks.

There are different approaches for text moderation.

Keyword-based matching services like Cleanspeak, TwoHat, etc. are useful as a baseline but limiting, because assessing a keyword requires context. With this approach a word can be miscategorized, resulting in a false positive or false negative, which may impact your operations at scale, or UX if the platform requires more of a real-time experience.
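
A deliberately naive sketch of that failure mode (the word list and example strings are made up; real services are far more sophisticated than a bare substring match):

  // Context-free substring matching: the classic "Scunthorpe problem".
  const blocklist = ["ass", "hell"];

  function naiveFlag(text: string): boolean {
    const lower = text.toLowerCase();
    return blocklist.some((word) => lower.includes(word));
  }

  naiveFlag("Join our Tuesday assembly in the town hall"); // true: false positive ("assembly")
  naiveFlag("Nobody would miss you if you disappeared");   // false: false negative (bullying, no keyword)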

LLMs are theoretically well suited to taking context into account for text moderation; however, they are also pricier and may require further fine-tuning or self-hosting for cost savings.

CSAM as a problem area presents the highest risks, though it may be more clear-cut. There are dedicated image services and regulatory bodies that focus on this area (including automating reporting to local law enforcement).

Finally, the EU (DSA) also requires social media companies to self-report on moderation actions. The EU also requires companies to provide pathways for users to own and delete their data (GDPR).

Edit: FIXED typos; ADDED a note on CSAM and DSA & GDPR

scarface_74 3 days ago

I hate to be that guy. But this seems like the perfect use case for an LLM. First put content through your script and then through a decently prompted LLM. Anything it catches, put in a queue for manual review.
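
Roughly something like the sketch below; the model, prompt, and helper names are placeholders, and the heuristic check is just a stand-in for whatever script you already run.

  import OpenAI from "openai";

  const client = new OpenAI();

  // Stand-in for the existing heuristic script.
  function looksLikePhishing(text: string): boolean {
    return /verify your (paypal|bank) account/i.test(text);
  }

  // Stage 1: cheap heuristics. Stage 2: a prompted LLM. Anything suspicious goes to a human queue.
  async function moderate(siteText: string): Promise<"ok" | "needs_review"> {
    if (looksLikePhishing(siteText)) return "needs_review";

    const res = await client.chat.completions.create({
      model: "gpt-4o-mini", // placeholder; any cheap model works
      messages: [
        {
          role: "system",
          content:
            "You review static sites for phishing, porn and other banned content. Reply with exactly ALLOW or REVIEW.",
        },
        { role: "user", content: siteText.slice(0, 8000) }, // truncate to keep costs predictable
      ],
    });

    return res.choices[0].message.content?.trim() === "REVIEW" ? "needs_review" : "ok";
  }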

  • notatoad 2 days ago

    I'm pretty sure most content moderation strategies operate on a budget below what most LLMs would cost.

    For any site hosting user-generated content, how efficiently you can run your moderation system is essentially your unique selling point; an LLM is the baseline.

    • simonw 2 days ago

      You may be shocked at how inexpensive some of the LLMs are these days.

      Google Gemini 1.5 Flash 8B charges $0.04/million input tokens and $0.15/million output tokens.

      If a piece of content that needs to be moderated is 1,000 tokens (that's pretty long!) and you expect a 10 token return it will cost you 0.0039 cents - that's not a dollar amount, that's less than a 250th of a single cent.

      So 1 cent will moderate 250 items of content. $1 will moderate 25,000 items of content.

      LLMs are dirt cheap. You can play around with pricing across different models using my calculator here: https://tools.simonwillison.net/llm-prices
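
      The arithmetic, spelled out (assuming the unrounded Flash 8B input price of $0.0375/M, which is where the 0.0039 figure comes from):

        const inputPerMTok = 0.0375; // USD per million input tokens (rounded to $0.04 above)
        const outputPerMTok = 0.15;  // USD per million output tokens

        const costUSD = (1000 * inputPerMTok + 10 * outputPerMTok) / 1_000_000;
        // costUSD ≈ 0.000039, i.e. about 0.0039 cents per 1,000-token item
        console.log(`$${costUSD} per item, ${Math.floor(0.01 / costUSD)} items per cent`);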

    • anshumankmr 2 days ago

      You could use some cheaper LLMs for this; even GPT-3.5 Turbo could excel at it, or the moderations API (though in my experience that was trained more to obey the laws of the USA; for example, guns were okay to ask about, but they're pretty much illegal in my country). The simplest way is a blocklist of terms, which is what we had previously used, containing an insane number of terms beyond the regular abusive ones, but it needs some updating from time to time.

    • scarface_74 2 days ago

      > i'm pretty sure most content moderation strategies operate on a budget below what most LLMs would cost.

      The alternative is paying a person…

  • jhunter1016 2 days ago

    Definitely considering LLMs. At the day job, we had a team fine-tune a model to detect phishing content, and it worked surprisingly well.

    • pornel 2 days ago

      I'd be worried that LLMs are incredibly gullible.

      A phisher may insert text aimed at the LLM, with a disclaimer that the page is only an educational example of what not to do, or that they're the PayPal CEO authorizing it.

  • socrateslee 3 days ago

    I agree that an LLM could do most of the moderation work. You could use a multi-modal LLM for image moderation.