Content moderation is critical to maintaining the integrity and safety of interactions with LLMs. Layerup Security has developed a custom model specifically designed to detect harmful content in LLM-generated responses. Our model evaluates responses against several categories of harmful content, ensuring they adhere to our strict guidelines for safe and respectful communication. The categories our model can detect include:
- Violence and Hate
- Sexual Content
- Criminal Planning
- Guns and Illegal Weapons
- Regulated or Controlled Substances
- Self-Harm
- Profanity
To use the content moderation model, include the `layerup.content_moderation` guardrail when executing guardrails. This will analyze the content and flag any harmful elements based on the predefined categories. If harmful content is detected, you can take appropriate action, such as filtering the content, alerting a moderator, or rejecting the response altogether.
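
As a rough sketch, the flow below assumes a Layerup Security Python client exposing an `execute_guardrails` call; the exact class name, method signature, and response fields (`all_safe`, `offending_guardrail`) are illustrative assumptions, not a definitive API reference:

```python
import os

# Assumed client import; the actual package/module name may differ.
from layerup_security import LayerupSecurity

# Assumed constructor taking an API key from the environment.
layerup = LayerupSecurity(api_key=os.environ["LAYERUP_API_KEY"])

# Conversation to check, in standard role/content message format.
messages = [
    {"role": "user", "content": "Tell me about your weekend."},
    {"role": "assistant", "content": "<LLM response to be moderated>"},
]

# Run the content moderation guardrail against the LLM response.
security_response = layerup.execute_guardrails(
    ["layerup.content_moderation"],
    messages,
)

# The response shape below is an assumption for illustration.
if not security_response["all_safe"]:
    # Harmful content detected: filter it, alert a moderator,
    # or reject the response altogether.
    print("Response rejected:", security_response["offending_guardrail"])
else:
    print("Response passed content moderation.")
```

In practice, the violation-handling branch is where you would plug in your own policy, such as returning a canned refusal to the user or logging the flagged response for human review.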
Our content moderation model is an essential tool for creating a safe environment for users to interact with LLMs without exposure to harmful content.

