41.5 Bedrock Guardrails: Content Filtering and PII Redaction
Right, let’s talk about guardrails. You’ve got this incredibly powerful, creative, borderline-ungovernable model sitting in Bedrock. It’s like a genius intern who’s read the entire internet—the good parts, the weird parts, and the parts that would get you a visit from HR. You need to let them do their brilliant work, but you also need to stop them from accidentally writing a sonnet about your company’s AWS secret keys. That’s where Bedrock Guardrails come in. They’re your system of polite, but firm, bouncers for generative AI.
Think of it this way: the model’s native content filters are your first line of defense. They’re baked in by the model providers (Anthropic, AI21, etc.) to catch the blatantly bad stuff. But they’re generic. Guardrails are your second line of defense. You configure them. They let you enforce rules specific to your application, your company’s policies, and your unique definition of “inappropriate.” This is where you move from using AI to applying it responsibly.
The Two Big Levers: Denied Topics and Content Filters
A guardrail primarily gives you two powerful, and slightly different, tools to control the conversation.
Denied Topics are your high-level, conversational stop signs. You don’t write regex here; you describe concepts in plain English. For instance, you could create a denied topic named “Financial Advice” with the description “Discussing or recommending specific investments, stock picks, or financial strategies.” If a user asks, “Should I invest my life savings in dogecoin?”, the guardrail will kick in before the request even gets to the model, because it recognizes the prompt itself violates your rule. It’s a semantic, intent-based filter.
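As a rough sketch, the “Financial Advice” topic could be written as a topicPolicyConfig entry like the one below. The field names follow the boto3 create_guardrail request shape as I understand it; the example phrases are invented for illustration and are not required.

```python
# Hypothetical denied-topic entry for the "Financial Advice" example.
financial_advice_topic = {
    "name": "Financial Advice",
    # Plain-English description of the concept to refuse
    "definition": (
        "Discussing or recommending specific investments, stock picks, "
        "or financial strategies."
    ),
    # Optional sample prompts (illustrative) that show what the topic looks like
    "examples": [
        "Should I invest my life savings in dogecoin?",
        "Which stocks should I buy this week?",
    ],
    "type": "DENY",
}

# This dict would be passed as topicPolicyConfig=... to create_guardrail
topic_policy_config = {"topicsConfig": [financial_advice_topic]}
```

Nothing here is a pattern or keyword list: the definition and the examples are all the guardrail gets to work with, so write them the way you would brief a human reviewer.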
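Content Filters, on the other hand, are classifier-based scrubs for broad categories of harmful content: hate, insults, sexual content, violence, misconduct, and prompt attacks. They work on both the input (what the user says) and the output (what the model generates), and you set a strength (NONE, LOW, MEDIUM, or HIGH) per category and per direction. Exact strings are a different job. A list of your competitors’ names or internal project codenames goes in a word filter, and a regex pattern to catch those pesky social security numbers goes in a sensitive information filter. Those two are less about the intent and more about the presence of specific strings.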
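For reference, here is a minimal sketch of a contentPolicyConfig. In the create_guardrail request, content filters are named categories with a strength per direction, not string lists. The helper name build_content_policy and the MEDIUM default are my own assumptions for illustration.

```python
# Content filter categories accepted by create_guardrail (besides PROMPT_ATTACK).
CATEGORIES = ["HATE", "INSULTS", "SEXUAL", "VIOLENCE", "MISCONDUCT"]


def build_content_policy(strength="MEDIUM"):
    """Build a contentPolicyConfig dict with one strength for every category."""
    if strength not in ("NONE", "LOW", "MEDIUM", "HIGH"):
        raise ValueError(f"unknown strength: {strength}")
    filters = [
        {"type": category, "inputStrength": strength, "outputStrength": strength}
        for category in CATEGORIES
    ]
    # Prompt attacks are an input-side concern, so the output strength stays NONE
    filters.append(
        {"type": "PROMPT_ATTACK", "inputStrength": strength, "outputStrength": "NONE"}
    )
    return {"filtersConfig": filters}


content_policy_config = build_content_policy("HIGH")
```

A higher strength makes a category trigger more readily. It does not change what happens to the message once the filter fires.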
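Here’s the kicker, and a place where the abstraction leaks a bit: you can’t just say “redact all PII.” You have to define what “PII” means to you, one entry per type. Bedrock ships built-in entity types for the common cases (email addresses, US social security numbers, and so on), and anything it doesn’t know about needs a regex pattern of your own. Each entry also gets its own action: block the whole message, or just mask the match. Let’s build one.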
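Before the full build, here is a minimal sketch of what “one entry per type” looks like when you lean on the built-in entity types. The helper pii_policy is hypothetical; EMAIL and US_SOCIAL_SECURITY_NUMBER are entity type names from the create_guardrail request, and BLOCK and ANONYMIZE are its two actions.

```python
def pii_policy(entity_types, action="ANONYMIZE"):
    """Build a sensitiveInformationPolicyConfig from built-in PII entity types."""
    if action not in ("BLOCK", "ANONYMIZE"):
        raise ValueError(f"unknown action: {action}")
    return {
        "piiEntitiesConfig": [
            {"type": entity_type, "action": action} for entity_type in entity_types
        ]
    }


# Mask emails and SSNs but let the rest of the message through
sensitive_info_config = pii_policy(["EMAIL", "US_SOCIAL_SECURITY_NUMBER"])
```

There is no catch-all entry. If a type is not in the list, the guardrail does not look for it.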
Implementing a PII Redaction Guardrail
Let’s say we want to stop the model from spitting out email addresses and US social security numbers. First, we define our guardrail. Note the action on each pattern: BLOCK rejects the message entirely, while ANONYMIZE would just mask the offending text.
import boto3
import json

bedrock_client = boto3.client('bedrock', region_name='us-east-1')

# Denied topics and content filters are omitted: this guardrail only cares about PII,
# and the model's own filters handle profanity, insults, and misconduct.
guardrail_identifier = bedrock_client.create_guardrail(
    name="MyApp-PII-Guardrail",
    description="Blocks emails and SSNs from being generated.",
    blockedInputMessaging="This request contains blocked content.",
    blockedOutputsMessaging="Sorry, I can't generate that as it contains restricted information.",
    # Regex patterns live under the sensitive information policy, not the word policy
    sensitiveInformationPolicyConfig={
        'regexesConfig': [
            {
                'name': 'email',
                'description': 'Email addresses',
                'pattern': r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',
                'action': 'BLOCK'
            },
            {
                'name': 'SSN',
                'description': 'US social security numbers, hyphenated',
                'pattern': r'\b\d{3}-\d{2}-\d{4}\b',
                'action': 'BLOCK'
            }
        ]
        # Built-in 'piiEntitiesConfig' types (EMAIL, US_SOCIAL_SECURITY_NUMBER) also exist;
        # we're handling this via regex for demo
    }
)['guardrailId']  # The response key is 'guardrailId'

# Snapshot the working draft as a numbered version
guardrail_version = bedrock_client.create_guardrail_version(
    guardrailIdentifier=guardrail_identifier,
    description='Initial version with email and SSN filters'
)['version']

print(f"Guardrail ID: {guardrail_identifier}, Version: {guardrail_version}")
Now, to use it, you simply reference its ID (or ARN) and version when you invoke a model. This is the beautiful part: it works across all supported models in Bedrock.
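# Invoke a model with the guardrail applied
from botocore.exceptions import ClientError

bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')
prompt = "Generate a fake user profile for testing, including a name, email, and SSN."
body = json.dumps({
    # Claude v2 text completions expect the Human/Assistant turn format
    "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
    "max_tokens_to_sample": 300
})
model_id = 'anthropic.claude-v2'
try:
    response = bedrock_runtime.invoke_model(
        modelId=model_id,
        body=body,
        guardrailIdentifier=guardrail_identifier,
        guardrailVersion=guardrail_version
    )
    result = json.loads(response['body'].read())
    # A guardrail hit is not an exception: the call succeeds and the body is flagged
    if result.get('amazon-bedrock-guardrailAction') == 'INTERVENED':
        print(f"GUARDRAIL TRIPPED: {result.get('completion')}")
        # Handle the blocked request gracefully
    else:
        print(result.get('completion'))
except ClientError as e:
    # Genuine API failures: bad model ID, throttling, missing permissions
    print(f"INVOCATION FAILED: {e}")
If your prompt or the model’s response contains an email or an SSN in the format you specified, the guardrail intervenes. The call itself still succeeds: instead of the model’s text you get back your blocked message, and the response body carries amazon-bedrock-guardrailAction set to INTERVENED. No data leaks.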
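As far as I can tell from Bedrock’s documented behavior, an intervention is reported in the response body rather than raised as an error. So it can help to wrap that check in one small function. The sketch below is hypothetical: guardrail_intervened is my own helper name, and the two sample bodies are hand-written stand-ins, not captured responses.

```python
import json


def guardrail_intervened(raw_body):
    """Return (intervened, payload) for an InvokeModel response body."""
    payload = json.loads(raw_body)
    intervened = payload.get("amazon-bedrock-guardrailAction") == "INTERVENED"
    return intervened, payload


# Stand-in bodies for illustration; real ones come from response['body'].read()
blocked_body = json.dumps({
    "completion": "Sorry, I can't generate that as it contains restricted information.",
    "amazon-bedrock-guardrailAction": "INTERVENED",
})
clean_body = json.dumps({
    "completion": "Here is a profile with placeholder values.",
    "amazon-bedrock-guardrailAction": "NONE",
})

blocked, blocked_payload = guardrail_intervened(blocked_body)
clean, clean_payload = guardrail_intervened(clean_body)
```

Using .get() keeps the helper safe for responses made without a guardrail, where the key is simply absent.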
The Rough Edges and Best Practices
Now, the honest part. This is powerful, but it’s not magic.
- Regex is a Sharp Tool: My SSN regex above? It’s naive. It won’t catch SSNs without hyphens, and it will false-positive on any other number that happens to be written in the same 3-2-4 hyphenated shape. You need to craft your patterns carefully. Test them extensively. This is your responsibility.
- The Action Setting is Crucial: On a sensitive information filter, setting the action to ANONYMIZE instead of BLOCK will merely mask the matched text instead of blocking the entire message. This is often what you want for PII! You can let the conversation continue while still protecting data. The example above blocks outright for dramatic effect. Don’t confuse this with inputStrength and outputStrength: those belong to content filters and control how readily each category is flagged, not whether text gets redacted.
- Version Everything: You must specify a guardrailVersion when you invoke. Pinning a numbered version prevents an updated guardrail from accidentally breaking your live application. You update the version in your code deliberately, just like any other dependency.
- It’s Not Free: Enabling guardrails adds latency. You’re adding another entire processing step. For most applications, it’s negligible and worth it, but if you’re counting milliseconds, you need to benchmark.
- Combine with Native Filters: Use the model’s built-in filters for the broad stuff (hate speech, violence) and your custom guardrails for the specific stuff (your company’s secrets, your unique compliance needs). It’s a layered defense.
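To make the regex warning concrete, here is a small test harness you could run before a pattern ever reaches a guardrail. It uses Python’s re module, which may not behave identically to the engine Bedrock uses, so treat it as a first pass. SSN_LOOSE and the test strings are hypothetical, made up for illustration.

```python
import re

# The hyphen-only pattern from the guardrail example
SSN_NAIVE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Hypothetical looser variant: hyphens or spaces are optional
SSN_LOOSE = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")

# Text mapped to whether it SHOULD be flagged as an SSN
CASES = {
    "123-45-6789": True,
    "123456789": True,            # SSN typed without hyphens
    "123 45 6789": True,          # SSN typed with spaces
    "call 555-0100": False,       # phone extension
    "tracking 987654321": False,  # some other 9-digit number
}


def mismatches(pattern, cases):
    """Return the inputs where the pattern disagrees with the expectation."""
    return [
        text for text, expected in cases.items()
        if bool(pattern.search(text)) != expected
    ]


naive_failures = mismatches(SSN_NAIVE, CASES)
loose_failures = mismatches(SSN_LOOSE, CASES)
print("naive misses:", naive_failures)
print("loose false positives:", loose_failures)
```

Neither pattern gets a clean sheet. The naive one misses the unhyphenated forms, and the loose one flags the tracking number. That tradeoff is the thing to test for, with inputs drawn from your own data.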
The goal isn’t to strangle the model into uselessness. It’s to create a safe, contained space where it can be maximally helpful without being horrifying. Guardrails are how you move from a cool demo to a production-worthy application. Now go build something, and for goodness sake, keep your SSNs to yourself.