- Violence and Hate
- Sexual Content
- Criminal Planning
- Guns and Illegal Weapons
- Regulated or Controlled Substances
- Self-Harm
- Profanity
To enable content moderation, include the `layerup.content_moderation` guardrail in your request. This will analyze the content and flag any harmful elements based on the predefined categories above. If harmful content is detected, you can take appropriate action, such as filtering the content, alerting a moderator, or rejecting the response altogether.
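Below is a minimal sketch of how this might look in Python. It assumes a Layerup Security SDK client (`LayerupSecurity`) with an `execute_guardrails` method and an `all_safe` field in the response; the exact class, method, and response shape are assumptions, so adjust them to match the SDK version you have installed.

```python
# Minimal sketch — client name, method signature, and response fields are assumptions.
from layerup_security import LayerupSecurity

# Hypothetical placeholder API key.
layerup = LayerupSecurity(api_key="YOUR-LAYERUP-API-KEY")

messages = [
    {"role": "user", "content": "Untrusted user input goes here."},
]

# Run the content moderation guardrail on the messages before
# forwarding them to your LLM.
security_response = layerup.execute_guardrails(
    ["layerup.content_moderation"],  # guardrail name from this section
    messages,
)

# Assumed response field indicating whether every guardrail passed.
if security_response.get("all_safe"):
    # Safe to send `messages` on to your LLM.
    pass
else:
    # Harmful content detected: filter it, alert a moderator,
    # or reject the request altogether.
    raise ValueError("Request blocked by layerup.content_moderation")
```

The same check can be applied to model responses before they are returned to the user, so that both inbound and outbound content pass through the guardrail.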
Our content moderation model is an essential tool for creating a safe environment in which users can interact with LLMs without being exposed to harmful content.