douglascolquitt.com - Content Moderation

Why the Perspective API alone isn't sufficient to automate content moderation.

I asked ChatGPT to generate some code to moderate entries made to my Guest Book. It obligingly produced some workable front-end and back-end code using the Perspective API.

I soon discovered that, regardless of the Perspective API threshold value, it still allowed input that was critical of my site or of me as the developer.
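Perspective scores attributes such as TOXICITY. It does not measure how the writer feels about a particular target, so a courteous but critical comment can score low whatever threshold is chosen. The sketch below is a minimal illustration of that kind of threshold check. It assumes the documented comments:analyze response shape (attributeScores, then TOXICITY, then summaryScore.value). The function name, the 0.7 default threshold and the sample score are hypothetical, since the ChatGPT-generated code itself isn't shown here.

```python
# Hypothetical helper: decide whether a comment passes a Perspective API check.
# Assumes `response` is the parsed JSON body returned by the
# comments:analyze endpoint with the TOXICITY attribute requested.


def passes_toxicity_check(response: dict, threshold: float = 0.7) -> bool:
    """Return True if the TOXICITY summary score is below `threshold`."""
    score = response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
    return score < threshold


# Made-up response for a polite but critical comment: the wording is not
# toxic, so its score is low and it passes even a strict threshold.
polite_criticism = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.05, "type": "PROBABILITY"}}
    }
}

print(passes_toxicity_check(polite_criticism))       # True
print(passes_toxicity_check(polite_criticism, 0.1))  # True
```

Lowering the threshold only helps while the critical comments are also rude. Criticism phrased politely never gets near it, which is the gap the agent described next is meant to fill.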

My solution was to create a Google Cloud Conversational Agent with a single Routine Playbook having the following instructions:

- Do NOT greet the user. Examine the name and message supplied by the user.
- If the message is too negative in tone:
    - Set the result parameter to 0
    - Set the resultMessage parameter to "The message is too negative or unduly critical."
    - Exit this playbook
- If the name or the message contains any extreme profanity:
    - Set the result parameter to 0
    - Set the resultMessage parameter to "Sorry no profanity please."
    - Exit this playbook
- If the name is just nonsense characters:
    - Set the result parameter to 0
    - Set the resultMessage parameter to "I do not believe that is a proper name."
    - Exit this playbook
- If the message doesn't make any sense or the message is just the same character repeated:
    - Set the result parameter to 0
    - Set the resultMessage parameter to "That message doesn't seem to make any sense."
    - Exit this playbook
- Set the result parameter to 1
- Set the resultMessage parameter to "Name and message accepted."

The agent has two output parameters:

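- result: 1 if the name and message are accepted, 0 if they are rejected
- resultMessage: the feedback text to show the user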

It has no input parameters. Simply submitting the name and comment as a single query in the format "name:Douglas,message:This is the message for the agent to evaluate" seems to work just fine.
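Because the agent has no input parameters, the back end only needs to build one query string. Here is a minimal sketch of that step in the "name:...,message:..." format. The function name is hypothetical. No escaping is attempted for commas or colons inside the message. That is an assumption based on the agent, a language model, coping with the loose format, as the post reports.

```python
# Hypothetical helper that builds the single query string sent to the agent.
# Everything travels in the query text because the agent has no input parameters.


def build_agent_query(name: str, message: str) -> str:
    """Combine the guest book name and message into one query string."""
    # Trim surrounding whitespace so stray spaces do not reach the agent.
    return f"name:{name.strip()},message:{message.strip()}"


print(build_agent_query("Douglas", "This is the message for the agent to evaluate"))
# name:Douglas,message:This is the message for the agent to evaluate
```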

In addition to catching negative sentiment about the site or its developer, the agent catches "rubbish" input such as random characters. Although it is not demonstrated here, it also rejects a "rubbish" name.

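The instructions essentially act like a switch statement in C or JavaScript with a break after each case: the first condition that matches sets the parameters and exits. (Yes, C had switch statements decades before JavaScript; I was there.)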
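Here is a rough Python equivalent of the playbook's control flow that makes the analogy concrete. The rules are checked in order and the first match returns a rejection, just as each step ends with "Exit this playbook". The real agent makes these judgments with a language model, so the predicates here are crude, hypothetical stand-ins: a placeholder blocklist, a "name contains no letters" test and a repeated-character test. The tone check is left out because it has no simple rule-based equivalent.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    """One playbook step: a rejection condition and its resultMessage."""
    applies: Callable[[str, str], bool]
    message: str


# Hypothetical placeholder blocklist; the real agent judges profanity itself.
BLOCKED_WORDS = {"badword"}


def has_blocked_word(text: str) -> bool:
    """Crude profanity stand-in: any whitespace-separated word in the blocklist."""
    return any(word in BLOCKED_WORDS for word in text.lower().split())


def is_repeated_char(text: str) -> bool:
    """True if the text, ignoring whitespace, is empty or one character repeated."""
    return len(set("".join(text.split()))) <= 1


# Order matters, as in the playbook: the first matching rule wins.
RULES = [
    Rule(lambda name, msg: has_blocked_word(name) or has_blocked_word(msg),
         "Sorry no profanity please."),
    Rule(lambda name, msg: not any(c.isalpha() for c in name),
         "I do not believe that is a proper name."),
    Rule(lambda name, msg: is_repeated_char(msg),
         "That message doesn't seem to make any sense."),
]


def moderate(name: str, message: str) -> tuple[int, str]:
    """Return (result, resultMessage) using first-match-wins rules."""
    for rule in RULES:
        if rule.applies(name, message):
            return 0, rule.message  # like "Exit this playbook", or break in a switch
    return 1, "Name and message accepted."


print(moderate("hal", "This message will be allowed as it expresses a positive sentiment."))
# (1, 'Name and message accepted.')
print(moderate("hal", "zzzzzzzz"))
# (0, "That message doesn't seem to make any sense.")
```

Keeping the checks as an ordered list mirrors how the playbook reads top to bottom. Reordering the list changes which message a user sees when more than one rule would match.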

Back-end console output:

input:name:hal,message:This message will be allowed as it expresses a positive sentiment.
Sending query "name:hal,message:This message will be allowed as it expresses a positive sentiment." to session ...
Dialogflow CX Response for session:
  Current Page: N/A
  Matched Intent: N/A
  Parameters: {
  "resultMessage": {
    "stringValue": "Name and message accepted.",
    "kind": "stringValue"
  },
  "result": {
    "numberValue": 1,
    "kind": "numberValue"
  }
}
  Fulfillment Messages: []

I have deliberately removed the session IDs, but you can see that the website submits the name (hard-coded as 'hal' for this test) along with the message and receives the two output parameters. There is no fulfillment text.
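In the log, each parameter is wrapped in a typed value whose "kind" field names the key that holds the actual data, for example "numberValue" or "stringValue". The sketch below turns the logged Parameters object back into a plain (result, resultMessage) pair. It assumes the back end sees the parameters in exactly the shape logged, and the function names are hypothetical. Rejecting the entry when the agent sets no result ("fail closed") is an assumption of this sketch, not something the post describes. If your client library already unwraps these values, the unwrap step is unnecessary.

```python
import json

# The Parameters object from the console output above, reflowed onto fewer lines.
LOGGED_PARAMETERS = """{
  "resultMessage": {"stringValue": "Name and message accepted.", "kind": "stringValue"},
  "result": {"numberValue": 1, "kind": "numberValue"}
}"""


def unwrap(field: dict):
    """Return the value stored under the key named by the field's "kind"."""
    return field[field["kind"]]


def read_moderation_result(parameters: dict) -> tuple[int, str]:
    """Extract (result, resultMessage) from the agent's output parameters."""
    if "result" not in parameters:
        # Assumption of this sketch: fail closed if the agent set no result.
        return 0, "Unable to validate the entry."
    # numberValue may arrive as a float such as 1.0, so normalise to int.
    result = int(unwrap(parameters["result"]))
    message = unwrap(parameters["resultMessage"]) if "resultMessage" in parameters else ""
    return result, message


print(read_moderation_result(json.loads(LOGGED_PARAMETERS)))
# (1, 'Name and message accepted.')
```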

I still need to reword the instructions a little to allow some modest swearing, and perhaps to give the user more granular feedback when their input is rejected, while keeping the token count low.

Content Moderation With AI
