What to monitor in Bedrock and OpenSearch in AWS?
Monitoring Amazon Bedrock and Amazon OpenSearch Service in AWS is crucial for ensuring performance, availability, and cost-effectiveness. Here's a breakdown of what to monitor in each service:
🧠 Amazon Bedrock – What to Monitor
Since Bedrock is a fully managed service for foundation models (FMs), you have limited but critical observability and monitoring options via CloudWatch, CloudTrail, and Service Quotas.
✅ Key Metrics (via CloudWatch)
- Invocation Metrics
  - Invocations: Total number of requests made to the model.
  - InvocationClientErrors / InvocationServerErrors: Invocations that failed with a client-side error (for example, a bad input/request payload) or a server-side error.
  - InvocationLatency: Latency of invocations (in milliseconds).
  - InvocationThrottles: Requests rejected due to exceeding throttling limits.
- Throughput Metrics
  - InputTokenCount: Number of input tokens used.
  - OutputTokenCount: Number of output tokens generated.
  - Total tokens: Not published as a separate metric; compute it as InputTokenCount + OutputTokenCount (for example, with CloudWatch metric math).
- Cost and Quota Monitoring
  - Monitor InputTokenCount and OutputTokenCount to track usage-based costs.
  - Use Service Quotas to monitor limits on invocation rates or model-specific quotas.
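To make the cost tracking concrete, here is a minimal sketch that turns summed input and output token counts into an estimated spend. The `TokenPrice` helper and the prices are placeholders chosen for illustration, not real Bedrock pricing; actual prices vary by model and region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical per-1,000-token prices in USD (not real Bedrock pricing)."""
    input_per_1k: float
    output_per_1k: float


def estimate_token_cost(input_tokens: int, output_tokens: int, price: TokenPrice) -> float:
    """Estimate spend from summed input and output token counts."""
    input_cost = (input_tokens / 1000) * price.input_per_1k
    output_cost = (output_tokens / 1000) * price.output_per_1k
    return round(input_cost + output_cost, 6)


# Placeholder prices; look up the real ones for your model and region.
example_price = TokenPrice(input_per_1k=0.003, output_per_1k=0.015)
print(estimate_token_cost(2_000_000, 500_000, example_price))
```

Feeding this with the token sums for a billing period gives a rough number to compare against a budget threshold.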
🔔 Alerts & Best Practices
- Alert on high latency or invocation errors.
- Set thresholds for monthly token usage to control costs.
- Use CloudTrail to audit who is calling which model.
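The alerting advice above can be sketched as CloudWatch alarm definitions. This is a rough example under a few assumptions: the metric names (`InvocationLatency`, `InvocationServerErrors`, `InvocationThrottles`) and the `ModelId` dimension are what AWS/Bedrock publishes as far as I know, so confirm them in your CloudWatch console; the model ID and thresholds are hypothetical. The builder is plain Python, and `create_alarms` only needs `boto3` and credentials when you actually call it.

```python
from typing import Any, Dict, List


def bedrock_alarm_params(model_id: str, metric_name: str, statistic: str,
                         threshold: float, period_seconds: int = 300) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"bedrock-{metric_name}-{model_id}",
        "Namespace": "AWS/Bedrock",
        "MetricName": metric_name,
        # Assumption: Bedrock metrics are dimensioned by ModelId.
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "Statistic": statistic,
        "Period": period_seconds,
        "EvaluationPeriods": 3,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }


def create_alarms(alarms: List[Dict[str, Any]]) -> None:
    """Send the alarm definitions to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without it

    cloudwatch = boto3.client("cloudwatch")
    for params in alarms:
        cloudwatch.put_metric_alarm(**params)


MODEL_ID = "example.model-v1"  # hypothetical model ID
alarms = [
    bedrock_alarm_params(MODEL_ID, "InvocationLatency", "Average", 5000),  # 5 s, illustrative
    bedrock_alarm_params(MODEL_ID, "InvocationServerErrors", "Sum", 0),
    bedrock_alarm_params(MODEL_ID, "InvocationThrottles", "Sum", 0),
]
for alarm in alarms:
    print(alarm["AlarmName"], alarm["ComparisonOperator"], alarm["Threshold"])
```

Calling `create_alarms(alarms)` would register them; add an `AlarmActions` entry (for example, an SNS topic ARN) if you want notifications.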
🔍 Amazon OpenSearch – What to Monitor
Amazon OpenSearch requires deeper observability because you're managing a cluster. You should monitor cluster health, performance, security, and resource usage.
✅ Key Metrics (via CloudWatch / OpenSearch Dashboards)
1. Cluster Health
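- ClusterStatus: Should be GREEN (YELLOW = one or more replica shards unassigned; RED = one or more primary shards unassigned). CloudWatch reports this as three metrics: ClusterStatus.green, ClusterStatus.yellow, and ClusterStatus.red.
- Nodes: Number of nodes in the cluster.
- Shards.activePrimary / Shards.active
- Shards.unassigned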
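As a small illustration of reading cluster health, the sketch below collapses the three `ClusterStatus.*` metrics into a single label. It assumes each metric reports 1 when the cluster is in that state and 0 otherwise; the function name and the input shape (a mapping of metric name to latest value) are my own choices for the example.

```python
from typing import Mapping


def classify_cluster_status(latest: Mapping[str, float]) -> str:
    """Map the latest ClusterStatus.* datapoints to a single status label.

    Checks the worst state first so RED wins if several values are set.
    """
    if latest.get("ClusterStatus.red", 0) >= 1:
        return "RED"
    if latest.get("ClusterStatus.yellow", 0) >= 1:
        return "YELLOW"
    if latest.get("ClusterStatus.green", 0) >= 1:
        return "GREEN"
    return "UNKNOWN"  # no datapoints, e.g. metrics not yet published


print(classify_cluster_status({"ClusterStatus.green": 1, "ClusterStatus.red": 0}))
print(classify_cluster_status({"ClusterStatus.yellow": 1}))
```

In practice you would usually alarm directly on `ClusterStatus.red` being at least 1; the helper is mainly useful for dashboards or status pages.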
2. Search and Indexing Performance
- SearchLatency: High latency can affect user experience.
- SearchRate: Search requests per minute.
- IndexingLatency: Time to index documents.
- IndexingRate: Indexing operations per minute.
3. Node/Instance Metrics
- JVMMemoryPressure: JVM heap pressure; should ideally be < 75%.
- CPUUtilization: High values indicate contention.
- FreeStorageSpace: Free disk space on data nodes; use it to monitor EBS disk usage.
- Instance health: Watch for instance-level issues, for example a drop in the node count.
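The node-level checks above can be sketched as a simple threshold function. Only the 75% JVM figure comes from the guidance above; the CPU and free-storage thresholds are hypothetical values picked for the example, and the storage inputs are assumed to be in megabytes.

```python
from typing import List

JVM_PRESSURE_MAX_PERCENT = 75.0    # guideline from the list above
CPU_MAX_PERCENT = 80.0             # hypothetical threshold
FREE_STORAGE_MIN_FRACTION = 0.25   # hypothetical: warn below 25% free


def node_warnings(jvm_pressure_percent: float, cpu_percent: float,
                  free_storage_mb: float, total_storage_mb: float) -> List[str]:
    """Return a human-readable warning for each threshold that is breached."""
    warnings = []
    if jvm_pressure_percent >= JVM_PRESSURE_MAX_PERCENT:
        warnings.append(f"JVMMemoryPressure at {jvm_pressure_percent:.0f}%")
    if cpu_percent >= CPU_MAX_PERCENT:
        warnings.append(f"CPUUtilization at {cpu_percent:.0f}%")
    if total_storage_mb > 0 and free_storage_mb / total_storage_mb < FREE_STORAGE_MIN_FRACTION:
        free_percent = 100 * free_storage_mb / total_storage_mb
        warnings.append(f"FreeStorageSpace at {free_percent:.0f}% of capacity")
    return warnings


print(node_warnings(82.0, 40.0, 10_000, 100_000))
print(node_warnings(50.0, 50.0, 50_000, 100_000))
```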
4. Storage & IOPS
- EBS metrics like BurstBalance, ReadIOPS, and WriteIOPS.
5. Slow Logs
- Enable and analyze slow search and slow indexing logs.
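Slow logs are controlled per index through threshold settings. The sketch below builds a settings body you could send with `PUT /<index>/_settings`; the threshold values and the index name are illustrative assumptions, so tune them to your workload. On the managed service you also need to enable slow log publishing to CloudWatch Logs on the domain itself.

```python
import json
from typing import Dict


def slow_log_settings(search_warn: str = "5s", search_info: str = "2s",
                      index_warn: str = "10s") -> Dict[str, str]:
    """Build index settings that turn on search and indexing slow logs."""
    return {
        "index.search.slowlog.threshold.query.warn": search_warn,
        "index.search.slowlog.threshold.query.info": search_info,
        "index.search.slowlog.threshold.fetch.warn": search_warn,
        "index.indexing.slowlog.threshold.index.warn": index_warn,
    }


INDEX_NAME = "my-index"  # hypothetical index name
print(f"PUT /{INDEX_NAME}/_settings")
print(json.dumps(slow_log_settings(), indent=2))
```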
6. Security Monitoring
- Track failed logins or unauthorized access attempts.
- Audit logs (via Cognito/IAM or custom integration).
- Monitor fine-grained access control policies (if enabled).
📈 Tools & Tips
Tool | Use |
---|---|
CloudWatch Dashboards | Custom dashboards for both Bedrock & OpenSearch |
AWS CloudTrail | Monitor API usage for auditing |
OpenSearch Dashboards | Visualize logs, search metrics |
Service Quotas | Set alarms for Bedrock limits |
AWS Cost Explorer | Track token usage costs (Bedrock) |
🎯 Summary
Category | Bedrock | OpenSearch |
---|---|---|
Invocation Errors | ✅ | ✅ |
Latency | ✅ | ✅ |
Usage Quotas | ✅ | ❌ |
Token Consumption | ✅ | ❌ |
CPU & Memory | ❌ | ✅ |
JVM Pressure | ❌ | ✅ |
Disk Space | ❌ | ✅ |
Indexing/Search Performance | ❌ | ✅ |
Security Events | ✅ (via IAM) | ✅ (with fine-grained auth) |
Here are the CloudWatch namespace names for both Amazon Bedrock and Amazon OpenSearch Service:
🧠 Amazon Bedrock
- CloudWatch Namespace: AWS/Bedrock
Example Metrics in AWS/Bedrock:
- Invocations
- InvocationClientErrors
- InvocationServerErrors
- InvocationLatency
- InputTokenCount
- OutputTokenCount
🔍 Amazon OpenSearch Service
- CloudWatch Namespace: AWS/ES (Note: AWS retained ES from the older "Elasticsearch Service" name.)
Example Metrics in AWS/ES:
- ClusterStatus.green
- ClusterStatus.red
- JVMMemoryPressure
- CPUUtilization
- FreeStorageSpace
- SearchLatency
- IndexingRate
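To show how the two namespaces are used in practice, here is a minimal sketch that builds `MetricDataQueries` entries for CloudWatch `get_metric_data`. The helper names, the domain name, and the account ID are hypothetical; the `DomainName` and `ClientId` dimensions for AWS/ES are what I understand the service uses, so verify them for your domain. Only `fetch_last_hour` touches AWS, and it imports `boto3` lazily.

```python
from typing import Any, Dict, List


def metric_queries(namespace: str, metric_names: List[str], dimensions: List[Dict[str, str]],
                   id_prefix: str, stat: str = "Average",
                   period_seconds: int = 300) -> List[Dict[str, Any]]:
    """Build one MetricDataQueries entry per metric name."""
    return [
        {
            "Id": f"{id_prefix}{index}",  # must start with a lowercase letter
            "Label": name,
            "MetricStat": {
                "Metric": {"Namespace": namespace, "MetricName": name,
                           "Dimensions": dimensions},
                "Period": period_seconds,
                "Stat": stat,
            },
            "ReturnData": True,
        }
        for index, name in enumerate(metric_names)
    ]


def fetch_last_hour(queries: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Fetch the last hour of data (needs boto3 and AWS credentials)."""
    from datetime import datetime, timedelta, timezone
    import boto3

    end = datetime.now(timezone.utc)
    return boto3.client("cloudwatch").get_metric_data(
        MetricDataQueries=queries, StartTime=end - timedelta(hours=1), EndTime=end)


# Hypothetical domain name and account ID.
es_dimensions = [{"Name": "DomainName", "Value": "my-domain"},
                 {"Name": "ClientId", "Value": "123456789012"}]
queries = (
    metric_queries("AWS/ES", ["JVMMemoryPressure", "CPUUtilization"], es_dimensions, "es")
    + metric_queries("AWS/Bedrock", ["Invocations"], [], "br", stat="Sum")
)
for query in queries:
    print(query["Id"], query["MetricStat"]["Metric"]["Namespace"], query["Label"])
```

The distinct ID prefixes keep query IDs unique when both namespaces are combined in one request.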