Why the Perspective API alone isn't sufficient to automate content moderation.
I asked ChatGPT to generate some code to moderate entries made to my guest book. It obligingly produced some workable front-end and back-end code using the Perspective API.
I soon discovered that, whatever Perspective API threshold value I used, it still allowed input that was critical of my site or of me as the developer.
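A Perspective API check of this kind usually comes down to comparing one attribute's summary score against a threshold. The sketch below assumes the response shape of the comments:analyze method (attributeScores, then the attribute name, then summaryScore.value, a probability from 0 to 1); the function name and the 0.7 default are hypothetical. Attributes such as TOXICITY estimate how rude or disrespectful a comment is rather than whether it is critical, so a politely worded complaint can score low and pass wherever the threshold is set.

```python
from typing import Any, Mapping


def passes_perspective_check(
    response: Mapping[str, Any],
    attribute: str = "TOXICITY",
    threshold: float = 0.7,  # hypothetical threshold; tune for your site
) -> bool:
    """Return True if the comment's summary score is below the threshold.

    `response` is the parsed JSON body returned by comments:analyze, which
    nests scores as attributeScores -> <ATTRIBUTE> -> summaryScore -> value.
    """
    score = response["attributeScores"][attribute]["summaryScore"]["value"]
    return score < threshold


# Illustrative response (the score is made up, not a real API result) for a
# civil but critical comment: only toxicity is measured, so it passes.
example = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.08, "type": "PROBABILITY"}}
    },
    "languages": ["en"],
}
print(passes_perspective_check(example))  # True
```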
My solution was to create a Google Cloud Conversational Agent with a single Routine Playbook having the goal "Determine if the message in the guest book is worth saving by examining the name and message parameters supplied" and the following instructions:
- Do NOT greet the user. Examine the name and message supplied by the user.
- If the message is too negative in tone:
  - Set the result parameter to 0
  - Set the resultMessage parameter to "The message is too negative or unduly critical."
  - Exit this playbook
- If the name or the message contains any extreme profanity:
  - Set the result parameter to 0
  - Set the resultMessage parameter to "Sorry no profanity please."
  - Exit this playbook
- If the name is just nonsense characters:
  - Set the result parameter to 0
  - Set the resultMessage parameter to "I do not believe that is a proper name."
  - Exit this playbook
- If the message doesn't make any sense or the message is just the same character repeated:
  - Set the result parameter to 0
  - Set the resultMessage parameter to "That message doesn't seem to make any sense."
  - Exit this playbook
- Set the result parameter to 1
- Set the resultMessage parameter to "Name and message accepted."
My site then calls this agent using the Google dialogflow-cx client library.
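Each call to the agent is made against a session resource. A Dialogflow CX session name has the form projects/<project>/locations/<location>/agents/<agent>/sessions/<session>, and the client libraries provide helpers for it (session_path in Python, projectLocationAgentSessionPath in Node.js). The sketch below builds the same string by hand with a fresh random session ID; the function names, the placeholder IDs and the choice of one session per guest book submission are assumptions. Agents outside the "global" location also need the regional API endpoint.

```python
import uuid


def new_session_path(project_id: str, location_id: str, agent_id: str) -> str:
    """Build a Dialogflow CX session name with a fresh random session ID.

    Assumption: one session per guest book submission, so earlier entries
    cannot influence how the agent judges later ones.
    """
    session_id = str(uuid.uuid4())  # 36 characters, the CX maximum length
    return (
        f"projects/{project_id}/locations/{location_id}"
        f"/agents/{agent_id}/sessions/{session_id}"
    )


def api_endpoint(location_id: str) -> str:
    """Agents outside the 'global' location are served by a regional endpoint."""
    if location_id == "global":
        return "dialogflow.googleapis.com"
    return f"{location_id}-dialogflow.googleapis.com"


# Placeholder IDs for illustration only.
print(new_session_path("my-project", "us-central1", "my-agent-id"))
print(api_endpoint("us-central1"))  # us-central1-dialogflow.googleapis.com
```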
The agent has two output parameters:
- result
- resultMessage
It does not have any input parameters, because simply submitting the name and message to be validated as a single line of text in the format name:Douglas,message:This is the message for the agent to evaluate seems to work just fine.
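Because there are no input parameters, the back end only has to combine the two form fields into that one line of text. A minimal sketch follows; the helper name is hypothetical, and collapsing runs of whitespace (including line breaks) so the query stays on one line is an assumption added here.

```python
def build_query_text(name: str, message: str) -> str:
    """Combine the guest book fields into the single text query for the agent.

    Format: name:<name>,message:<message>. Collapsing whitespace is an
    assumption so that multi-line messages become a single line.
    """
    clean_name = " ".join(name.split())
    clean_message = " ".join(message.split())
    return f"name:{clean_name},message:{clean_message}"


print(build_query_text("Douglas", "This is the message for the agent to evaluate"))
# name:Douglas,message:This is the message for the agent to evaluate
```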
In addition to catching negative sentiment about the site or its developer, the agent catches "rubbish" input such as random characters and, although it is not demonstrated here, it also rejects a "rubbish" name.
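The instructions basically act like a switch statement in C or JavaScript (yes, C had them decades before JavaScript; I was there): the checks are tried in order, the first one that matches sets the two output parameters and exits, and the final two steps act as the default case.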
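The same first-match-wins control flow can be written as ordinary code. In this sketch only the repeated-character check is implemented literally; the tone, profanity and nonsense-name judgments are made by the agent's language model, so they appear as hypothetical placeholder predicates that always return False. It illustrates the structure of the playbook, not a replacement for it.

```python
from typing import Callable, List, Tuple

Check = Callable[[str, str], bool]  # (name, message) -> True if the rule fires


def is_repeated_character(message: str) -> bool:
    """True for messages such as 'aaaaaa' (one character repeated)."""
    stripped = message.strip()
    return len(stripped) > 1 and len(set(stripped)) == 1


def moderate(name: str, message: str, rules: List[Tuple[Check, str]]) -> Tuple[int, str]:
    """Try rules in order; the first match returns, like a case with break."""
    for check, reason in rules:
        if check(name, message):
            return 0, reason
    # Default case: nothing matched, so the entry is accepted.
    return 1, "Name and message accepted."


# Hypothetical placeholders standing in for the agent's LLM judgments.
def too_negative(name: str, message: str) -> bool:
    return False


def has_extreme_profanity(name: str, message: str) -> bool:
    return False


def name_is_nonsense(name: str, message: str) -> bool:
    return False


RULES: List[Tuple[Check, str]] = [
    (too_negative, "The message is too negative or unduly critical."),
    (has_extreme_profanity, "Sorry no profanity please."),
    (name_is_nonsense, "I do not believe that is a proper name."),
    (lambda n, m: is_repeated_character(m), "That message doesn't seem to make any sense."),
]

print(moderate("hal", "aaaaaaaa", RULES))
# (0, "That message doesn't seem to make any sense.")
print(moderate("hal", "Lovely site!", RULES))
# (1, 'Name and message accepted.')
```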
Back-end console output:

```
input:name:hal,message:This message will be allowed as it expresses a positive sentiment.
Sending query "name:hal,message:This message will be allowed as it expresses a positive sentiment." to session ...
Dialogflow CX Response for session:
Current Page: N/A
Matched Intent: N/A
Parameters: {
  "resultMessage": {
    "stringValue": "Name and message accepted.",
    "kind": "stringValue"
  },
  "result": {
    "numberValue": 1,
    "kind": "numberValue"
  }
}
Fulfillment Messages: []
```
I have purposely removed the session IDs, but you can see that the website submits the name (hard-coded as 'hal' for this purpose) together with the message and receives the two output parameters. There is no fulfillment text.
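In the logged Parameters object each value is a protobuf Value whose kind field names the member that is set (stringValue, numberValue and so on). A minimal sketch of reducing that to the two things the site needs follows; it assumes the object has been parsed as JSON exactly as logged, and the function name is hypothetical. Treating a missing or unexpected result as a rejection is a design choice made here, so an entry is saved only when the agent explicitly returns 1.

```python
import json
from typing import Any, Dict, Tuple


def read_verdict(parameters: Dict[str, Any]) -> Tuple[bool, str]:
    """Extract (accepted, resultMessage) from the logged Parameters object.

    Each entry looks like {"numberValue": 1, "kind": "numberValue"}.
    """

    def unwrap(value: Dict[str, Any]) -> Any:
        kind = value.get("kind")
        return value.get(kind) if kind else None

    result = unwrap(parameters.get("result", {}))
    message = unwrap(parameters.get("resultMessage", {}))
    # Fail closed: only an explicit result of 1 saves the entry.
    accepted = result == 1
    return accepted, (message if isinstance(message, str) else "")


logged = json.loads("""
{
  "resultMessage": {"stringValue": "Name and message accepted.", "kind": "stringValue"},
  "result": {"numberValue": 1, "kind": "numberValue"}
}
""")
print(read_verdict(logged))  # (True, 'Name and message accepted.')
```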