What to monitor in bedrock and opensearch in aws?
Monitoring Amazon Bedrock and Amazon OpenSearch Service in AWS is crucial for ensuring performance, availability, and cost-effectiveness. Here's a breakdown of what to monitor in each service:
🧠 Amazon Bedrock – What to Monitor
Because Bedrock is a fully managed service for foundation models (FMs), there are no hosts to watch. Observability is limited to a small but critical set of signals from CloudWatch, CloudTrail, and Service Quotas.
✅ Key Metrics (via CloudWatch)
- Invocation Metrics
  - Invocations: Number of successful requests to the model invocation APIs (for example InvokeModel and Converse).
  - InvocationClientErrors: Invocations that failed with a client-side error, including bad input or malformed request payloads.
  - InvocationServerErrors: Invocations that failed with a server-side error.
  - InvocationLatency: Latency of invocations (in milliseconds).
  - InvocationThrottles: Invocations rejected for exceeding throttling limits.
- Throughput Metrics
  - InputTokenCount: Number of input tokens used.
  - OutputTokenCount: Number of output tokens generated.
  - Total tokens: Not published as its own metric; sum InputTokenCount and OutputTokenCount with CloudWatch metric math.
- Cost and Quota Monitoring
  - Monitor InputTokenCount and OutputTokenCount to track usage-based costs.
  - Use Service Quotas to monitor limits on invocation rates or model-specific quotas.
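Since Bedrock reports token usage through the InputTokenCount and OutputTokenCount metrics in AWS/Bedrock, a rough cost estimate is simple arithmetic over those totals. The sketch below assumes you have already summed both metrics for a billing period. The `TokenPricing` type and the prices in the usage example are hypothetical placeholders, not real rates for any model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Hypothetical per-1,000-token prices in USD; substitute your model's real rates."""
    input_per_1k: float
    output_per_1k: float


def total_tokens(input_tokens: int, output_tokens: int) -> int:
    """Total tokens is derived, not a published metric: input plus output."""
    return input_tokens + output_tokens


def estimate_cost(input_tokens: int, output_tokens: int, pricing: TokenPricing) -> float:
    """Estimate spend in USD from token totals and per-1,000-token prices."""
    return (
        (input_tokens / 1000) * pricing.input_per_1k
        + (output_tokens / 1000) * pricing.output_per_1k
    )


if __name__ == "__main__":
    pricing = TokenPricing(input_per_1k=0.5, output_per_1k=1.5)  # placeholder prices
    print(total_tokens(2_000_000, 500_000))
    print(estimate_cost(2_000_000, 500_000, pricing))
```

Comparing this estimate against the bill in Cost Explorer is a reasonable sanity check, since actual pricing can vary by model and purchase option.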
🔔 Alerts & Best Practices
- Alert on high latency or invocation errors.
- Set thresholds for monthly token usage to control costs.
- Use CloudTrail to audit who is calling which model.
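As an illustration of the latency alert, here is a minimal sketch that builds the arguments for CloudWatch's `put_metric_alarm` call on the Bedrock `InvocationLatency` metric. The alarm name, threshold, period, and SNS topic ARN are assumptions chosen for the example; the parameter building is kept separate from the boto3 call so it can be inspected without AWS credentials.

```python
def latency_alarm_params(model_id: str, threshold_ms: float, sns_topic_arn: str) -> dict:
    """Build put_metric_alarm arguments for average Bedrock invocation latency."""
    return {
        "AlarmName": f"bedrock-latency-{model_id}",  # hypothetical naming scheme
        "Namespace": "AWS/Bedrock",
        "MetricName": "InvocationLatency",
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "Statistic": "Average",
        "Period": 300,              # evaluate 5-minute windows (assumed)
        "EvaluationPeriods": 3,     # require 3 consecutive breaches (assumed)
        "Threshold": threshold_ms,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # idle periods do not trigger the alarm
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(params: dict) -> None:
    """Create or update the alarm; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    boto3.client("cloudwatch").put_metric_alarm(**params)


if __name__ == "__main__":
    params = latency_alarm_params(
        model_id="example.model-v1",  # placeholder model ID
        threshold_ms=2000,
        sns_topic_arn="arn:aws:sns:us-east-1:123456789012:alerts",  # placeholder ARN
    )
    print(params)
```

The same builder can be adapted for error alerts by swapping the metric name and using the `Sum` statistic.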
🔍 Amazon OpenSearch – What to Monitor
Amazon OpenSearch Service requires deeper observability because you are responsible for sizing and operating the cluster. Monitor cluster health, performance, security, and resource usage.
✅ Key Metrics (via CloudWatch / OpenSearch Dashboards)
1. Cluster Health
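- ClusterStatus.green / ClusterStatus.yellow / ClusterStatus.red: The cluster should be green. Yellow means one or more replica shards are unassigned; red means at least one primary shard is unassigned.
- Nodes: Number of nodes in the cluster.
- Shards.activePrimary / Shards.active: Active primary shards and total active shards.
- Shards.unassigned: Shards not allocated to any node.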
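CloudWatch reports cluster status as three separate metrics (ClusterStatus.green, ClusterStatus.yellow, ClusterStatus.red), each with a value of 1 when the cluster is in that state. A small sketch of collapsing them into one label, worst state first; the function name and the "unknown" fallback are choices made for this example.

```python
def cluster_status(green: float, yellow: float, red: float) -> str:
    """Collapse the three ClusterStatus.* metric values into a single label.

    Red is checked first so the most severe state wins if datapoints from
    different minutes are mixed together.
    """
    if red >= 1:
        return "red"
    if yellow >= 1:
        return "yellow"
    if green >= 1:
        return "green"
    return "unknown"  # no datapoints, e.g. metrics missing for the window


if __name__ == "__main__":
    print(cluster_status(green=0, yellow=1, red=0))
```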
2. Search and Indexing Performance
- SearchLatency: High latency can affect user experience.
- SearchRate: Search requests per minute.
- IndexingLatency: Time to index documents.
- IndexingRate: Indexing operations per minute.
3. Node/Instance Metrics
- JVMMemoryPressure: JVM heap pressure; should ideally be < 75%.
- CPUUtilization: High values indicate contention.
- FreeStorageSpace: Monitor EBS disk usage.
- Instance-level health: Watch for node failures, for example a drop in the Nodes metric.
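The node-level checks can be sketched as a simple threshold function. The 75% JVM limit comes from the guidance above; the CPU limit and the minimum free storage are assumed values for illustration, and the sketch assumes FreeStorageSpace is supplied in MiB.

```python
def node_warnings(
    jvm_pressure_pct: float,
    cpu_pct: float,
    free_storage_mib: float,
    *,
    jvm_limit: float = 75.0,        # from the guidance above
    cpu_limit: float = 80.0,        # assumed threshold
    min_free_mib: float = 20480.0,  # assumed floor (20 GiB)
) -> list[str]:
    """Return a human-readable warning for every metric past its threshold."""
    warnings = []
    if jvm_pressure_pct >= jvm_limit:
        warnings.append(f"JVMMemoryPressure {jvm_pressure_pct:.0f}% >= {jvm_limit:.0f}%")
    if cpu_pct >= cpu_limit:
        warnings.append(f"CPUUtilization {cpu_pct:.0f}% >= {cpu_limit:.0f}%")
    if free_storage_mib <= min_free_mib:
        warnings.append(f"FreeStorageSpace {free_storage_mib:.0f} MiB <= {min_free_mib:.0f} MiB")
    return warnings


if __name__ == "__main__":
    for line in node_warnings(jvm_pressure_pct=82, cpu_pct=45, free_storage_mib=15000):
        print(line)
```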
4. Storage & IOPS
- EBS volume metrics such as BurstBalance, ReadIOPS, WriteIOPS, ReadLatency, and WriteLatency.
5. Slow Logs
- Enable and analyze slow search and slow indexing logs.
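Slow logs only capture requests that exceed thresholds configured per index, so turning on log publishing alone produces nothing. A minimal sketch of the settings body you might send to `PUT /<index>/_settings`; the threshold values are assumptions to tune for your workload, and the function name is made up for this example.

```python
import json


def slowlog_settings(
    query_warn: str = "5s",
    fetch_warn: str = "1s",
    index_warn: str = "10s",
) -> dict:
    """Build index settings that enable WARN-level slow logs (assumed thresholds)."""
    return {
        "index.search.slowlog.threshold.query.warn": query_warn,
        "index.search.slowlog.threshold.fetch.warn": fetch_warn,
        "index.indexing.slowlog.threshold.index.warn": index_warn,
    }


if __name__ == "__main__":
    # Print the JSON body; send it with your preferred signed HTTP client.
    print(json.dumps(slowlog_settings(), indent=2))
```

Starting with generous thresholds and tightening them later keeps log volume, and therefore CloudWatch Logs cost, under control.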
6. Security Monitoring
- Track failed logins or unauthorized access attempts.
- Review audit logs (via Cognito/IAM or custom integration).
- Monitor fine-grained access control policies (if enabled).
📈 Tools & Tips
| Tool | Use |
|---|---|
| CloudWatch Dashboards | Custom dashboards for both Bedrock & OpenSearch |
| AWS CloudTrail | Monitor API usage for auditing |
| OpenSearch Dashboards | Visualize logs, search metrics |
| Service Quotas | Set alarms for Bedrock limits |
| AWS Cost Explorer | Track token usage costs (Bedrock) |
🎯 Summary
| Category | Bedrock | OpenSearch |
|---|---|---|
| Invocation Errors | ✅ | ✅ |
| Latency | ✅ | ✅ |
| Usage Quotas | ✅ | ❌ |
| Token Consumption | ✅ | ❌ |
| CPU & Memory | ❌ | ✅ |
| JVM Pressure | ❌ | ✅ |
| Disk Space | ❌ | ✅ |
| Indexing/Search Performance | ❌ | ✅ |
| Security Events | ✅ (via IAM) | ✅ (with fine-grained auth) |
Here are the CloudWatch namespace names for both Amazon Bedrock and Amazon OpenSearch Service:
🧠 Amazon Bedrock
- CloudWatch Namespace: AWS/Bedrock

Example Metrics in AWS/Bedrock:
- Invocations
- InvocationClientErrors
- InvocationServerErrors
- InvocationLatency
- InvocationThrottles
- InputTokenCount
- OutputTokenCount
🔍 Amazon OpenSearch Service
- CloudWatch Namespace: AWS/ES

(Note: AWS retained ES from the older "Elasticsearch Service" name.)
Example Metrics in AWS/ES:
- ClusterStatus.green
- ClusterStatus.red
- JVMMemoryPressure
- CPUUtilization
- FreeStorageSpace
- SearchLatency
- IndexingRate
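To show how the namespace is used in practice, here is a sketch that builds CloudWatch `get_metric_data` queries for a domain in AWS/ES. OpenSearch Service metrics are scoped by the `DomainName` and `ClientId` (AWS account ID) dimensions. The domain name, account ID, period, and statistic below are placeholder assumptions, and pagination of results is ignored for brevity.

```python
from datetime import datetime, timedelta, timezone


def es_metric_query(query_id: str, metric_name: str, domain_name: str,
                    account_id: str, stat: str = "Maximum", period: int = 60) -> dict:
    """Build one MetricDataQueries entry for an AWS/ES domain metric."""
    return {
        "Id": query_id,  # must start with a lowercase letter
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/ES",
                "MetricName": metric_name,
                "Dimensions": [
                    {"Name": "DomainName", "Value": domain_name},
                    {"Name": "ClientId", "Value": account_id},
                ],
            },
            "Period": period,
            "Stat": stat,
        },
        "ReturnData": True,
    }


def fetch_domain_metrics(domain_name: str, account_id: str,
                         metric_names: list[str], minutes: int = 15) -> dict:
    """Fetch recent values per metric; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the query builder works without boto3

    ids = {f"m{i}": name for i, name in enumerate(metric_names)}
    queries = [es_metric_query(qid, name, domain_name, account_id)
               for qid, name in ids.items()]
    end = datetime.now(timezone.utc)
    resp = boto3.client("cloudwatch").get_metric_data(
        MetricDataQueries=queries,
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
    )
    return {ids[r["Id"]]: r["Values"] for r in resp["MetricDataResults"]}


if __name__ == "__main__":
    # Placeholder domain and account ID.
    print(es_metric_query("m0", "ClusterStatus.red", "my-domain", "123456789012"))
```

The same pattern works for AWS/Bedrock by changing the namespace and using the `ModelId` dimension.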