AI-Assisted Content Moderation for Independent Web Applications

Building automated moderation for small web platforms is far more nuanced than simply applying a toxicity score. In this article I describe a practical approach to combining the Perspective API with a Google Cloud Conversational Agent to create a more robust moderation workflow.

While large social platforms invest heavily in trust and safety teams, independent developers must balance automation, fairness, and computational cost. My objective was to design a lightweight moderation layer for my Guest Book feature that could:

  • Reject extreme profanity
  • Filter nonsensical or spam-like input
  • Prevent unduly hostile or abusive submissions
  • Return structured validation responses to the application

This page demonstrates the architectural approach and allows you to test the moderation pipeline interactively.

Limitations of Threshold-Based Toxicity Scoring

The Perspective API provides probabilistic toxicity scoring based on machine learning models. However, threshold-based moderation alone can be insufficient for small platforms where contextual tone, repetition, or nonsensical input must also be evaluated.

During implementation, I observed that even aggressive threshold values would still allow content that was critical in tone but not technically "toxic" according to the model. This highlighted a broader issue: moderation is not purely a sentiment classification problem.
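To make the threshold problem concrete, here is a minimal sketch of threshold-only moderation. The request and response shapes follow the Perspective API's comments:analyze endpoint; the 0.8 threshold and the sample score are illustrative values, not recommendations, and the actual HTTP call is omitted.

```javascript
// Build the request body the Perspective API's comments:analyze
// endpoint expects (TOXICITY is the only attribute requested here).
function buildAnalyzeRequest(text) {
  return {
    comment: { text },
    languages: ['en'],
    requestedAttributes: { TOXICITY: {} },
  };
}

// Decide whether a response exceeds the toxicity threshold.
// 0.8 is an illustrative cutoff, not a recommended value.
function exceedsThreshold(analyzeResponse, threshold = 0.8) {
  const score = analyzeResponse.attributeScores.TOXICITY.summaryScore.value;
  return score >= threshold;
}

// Example response, abbreviated to only the fields used above.
const sampleResponse = {
  attributeScores: {
    TOXICITY: { summaryScore: { value: 0.12, type: 'PROBABILITY' } },
  },
};

console.log(exceedsThreshold(sampleResponse)); // false: 0.12 is well under 0.8
```

The weakness is visible immediately: everything below the threshold passes, regardless of whether the text is hostile in tone, spam, or nonsense.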

Architectural Solution: Conversational Agent as a Moderation Engine

Instead of relying solely on sentiment scoring, I implemented a Google Cloud Conversational Agent using a structured playbook. The agent evaluates both the name and message fields and returns deterministic output parameters to the application.

Conceptually, the playbook acts as a rule engine layered on top of probabilistic AI scoring.
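The layering can be sketched roughly as follows. Both check functions here are hypothetical stand-ins (a trivial heuristic in place of a Perspective API call, and a stub in place of the agent), written only to show the control flow of a probabilistic gate followed by a rule-based verdict.

```javascript
// Stand-in for a Perspective API score check; the "score" here is a
// trivial heuristic, not a real model output.
function toxicityGate(message) {
  const pseudoScore = message.includes('!!!') ? 0.9 : 0.1;
  return pseudoScore < 0.8; // pass when under the threshold
}

// Stand-in for the conversational agent's structured verdict.
function agentVerdict(name, message) {
  if (message.trim().length === 0) {
    return { result: 0, resultMessage: 'Empty message.' };
  }
  return { result: 1, resultMessage: 'Name and message accepted.' };
}

// The probabilistic gate runs first; the rule engine has the final say.
function moderate(name, message) {
  if (!toxicityGate(message)) {
    return { result: 0, resultMessage: 'Message failed the toxicity check.' };
  }
  return agentVerdict(name, message);
}

console.log(moderate('hal', 'A perfectly pleasant note.'));
```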

Rule-Based Evaluation Logic

The moderation instructions function similarly to a structured switch statement found in traditional programming languages. Each condition evaluates a specific failure case and exits immediately upon match.

This deterministic branching ensures predictable outcomes and avoids ambiguity that can arise from purely probabilistic systems.

The Conversational Agent

Goal

Determine if the message in the guest book is worth saving by examining the name and message parameters supplied.

Instructions
  - Do NOT greet the user. Examine the name and message supplied by the user.
  - If the message is too negative in tone:
    - Set the result parameter to 0
    - Set the resultMessage parameter to "The message is too negative or unduly critical."
    - Exit this playbook
  - If the name or the message contains any extreme profanity:
    - Set the result parameter to 0
    - Set the resultMessage parameter to "Sorry no profanity please."
    - Exit this playbook
  - If the name is just nonsense characters:
    - Set the result parameter to 0
    - Set the resultMessage parameter to "I do not believe that is a proper name."
    - Exit this playbook
  - If the message doesn't make any sense, or the message is just the same character repeated:
    - Set the result parameter to 0
    - Set the resultMessage parameter to "That message doesn't seem to make any sense."
    - Exit this playbook
  - Set the result parameter to 1
  - Set the resultMessage parameter to "Name and message accepted."
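The playbook's early-exit branching could be sketched in JavaScript roughly as the chain below. The tone, profanity, and nonsense predicates are deliberately crude stand-ins for judgments the agent's language model actually makes; only the control flow and the returned parameters mirror the playbook.

```javascript
// Rough JavaScript mirror of the playbook's early-exit rule chain.
// The predicates are simplified stand-ins for the agent's judgments.
const PROFANITY = ['badword1', 'badword2']; // placeholder list

const looksLikeNonsense = (s) => !/[aeiou]/i.test(s);            // crude heuristic
const isRepeatedChar = (s) => /^(.)\1+$/.test(s.trim());         // e.g. "aaaaaa"
const tooNegative = (s) => /\b(hate|awful|terrible)\b/i.test(s); // stand-in

function evaluateSubmission(name, message) {
  if (tooNegative(message)) {
    return { result: 0, resultMessage: 'The message is too negative or unduly critical.' };
  }
  if (PROFANITY.some((w) => (name + ' ' + message).toLowerCase().includes(w))) {
    return { result: 0, resultMessage: 'Sorry no profanity please.' };
  }
  if (looksLikeNonsense(name)) {
    return { result: 0, resultMessage: 'I do not believe that is a proper name.' };
  }
  if (isRepeatedChar(message) || looksLikeNonsense(message)) {
    return { result: 0, resultMessage: "That message doesn't seem to make any sense." };
  }
  return { result: 1, resultMessage: 'Name and message accepted.' };
}
```

Each condition returns immediately on match, so exactly one of the five outcomes is ever produced, which is the switch-like determinism described above.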

System Integration

The web application submits moderation requests to the agent using the Dialogflow CX Node.js client library. The input is formatted as:

name:Douglas,message:This is the message for the agent to evaluate

The agent responds with two structured parameters:

  • result: 1 when the submission is accepted, 0 when it is rejected
  • resultMessage: a human-readable explanation returned to the application

This design allows the front-end to remain stateless while the moderation logic is centralized within the cloud agent.
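Two small pieces of that integration can be sketched as pure functions: formatting the guest-book fields into the single query string, and unwrapping the protobuf-style parameter struct the agent returns (the struct shape matches the console output below). The actual request is sent with the Dialogflow CX client library's SessionsClient, which is omitted here.

```javascript
// Format the guest-book fields into the single query string the agent
// expects, e.g. "name:hal,message:Hello there".
function formatQuery(name, message) {
  return `name:${name},message:${message}`;
}

// Unwrap { kind, stringValue | numberValue } entries into plain values.
// The input shape matches the Parameters block in the console output.
function decodeParameters(fields) {
  const out = {};
  for (const [key, value] of Object.entries(fields)) {
    out[key] = value[value.kind];
  }
  return out;
}

// Example: the parameters struct from the console output.
const fields = {
  resultMessage: { stringValue: 'Name and message accepted.', kind: 'stringValue' },
  result: { numberValue: 1, kind: 'numberValue' },
};

console.log(formatQuery('hal', 'Hello there'));
// name:hal,message:Hello there
console.log(decodeParameters(fields));
// { resultMessage: 'Name and message accepted.', result: 1 }
```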

The instructions essentially act like a switch statement in C or JavaScript (yes, C had them decades before JavaScript; I was there).

Back-end console output:

input:name:hal,message:This message will be allowed as it expresses a positive sentiment.
Sending query "name:hal,message:This message will be allowed as it expresses a positive sentiment." to session ...
Dialogflow CX Response for session:
Current Page: N/A
Matched Intent: N/A
Parameters: {
  "resultMessage": {
    "stringValue": "Name and message accepted.",
    "kind": "stringValue"
  },
  "result": {
    "numberValue": 1,
    "kind": "numberValue"
  }
}
Fulfillment Messages: []

I have purposely removed the session IDs, but you can see that the website submits the name (hardcoded as 'hal' for this purpose) and message, and receives the two output parameters. There is no fulfillment text.

Conclusion

Effective moderation for small platforms requires more than toxicity scoring. A hybrid approach combining probabilistic AI classification with structured rule evaluation provides greater control and transparency.

While this implementation is intentionally lightweight, the same pattern can scale to include logging, appeal workflows, audit trails, and human review queues.

Use the Test button above to observe how the moderation pipeline responds to different inputs.
