Monitoring
Overslash exposes a Prometheus metrics endpoint, structured JSON logs, and a public status page that operators can mirror or reuse. The metrics surface counts actions, approvals, secret reads, OAuth refreshes, and HTTP error rates by service — enough to alert on stuck approvals, failing connections, and unusual write volume.
Prometheus metrics
Metrics are exposed at GET /internal/metrics in Prometheus text format, on the API's normal port. The endpoint is mounted outside auth and rate limiting so a scraper can reach it without credentials — keep it on an internal network or restrict it at your proxy.
A minimal scrape config:
scrape_configs:
  - job_name: 'overslash'
    metrics_path: /internal/metrics
    static_configs:
      - targets: ['overslash:8080']

Every metric is prefixed overslash_. They are grouped below by area.
HTTP
| Metric | Type | Labels |
|---|---|---|
| overslash_http_requests_total | counter | method, path, status |
| overslash_http_request_duration_seconds | histogram | method, path |
| overslash_http_requests_in_flight | gauge | none |
Action execution
| Metric | Type | Labels |
|---|---|---|
| overslash_action_executions_total | counter | template_key, mode, status |
| overslash_action_execution_duration_seconds | histogram | template_key, mode |
| overslash_action_validations_total | counter | template_key, mode, outcome |
| overslash_action_validation_duration_seconds | histogram | template_key, mode |
| overslash_outbound_http_total | counter | template_key, status_class |
| overslash_outbound_http_duration_seconds | histogram | template_key, status_class |
Approvals
| Metric | Type | Labels |
|---|---|---|
| overslash_approval_events_total | counter | event, identity_kind |
| overslash_approval_resolution_duration_seconds | histogram | decision |
| overslash_approvals_pending | gauge | none |
OAuth
| Metric | Type | Labels |
|---|---|---|
| overslash_oauth_events_total | counter | provider, flow, status |
| overslash_oauth_token_refresh_duration_seconds | histogram | provider, status |
Permissions & rate limiting
| Metric | Type | Labels |
|---|---|---|
| overslash_permission_checks_total | counter | decision, layer |
| overslash_rate_limit_decisions_total | counter | scope, decision |
Search & secrets
| Metric | Type | Labels |
|---|---|---|
| overslash_search_queries_total | counter | mode, status |
| overslash_secret_operations_total | counter | op, status |
Webhooks
| Metric | Type | Labels |
|---|---|---|
| overslash_webhook_deliveries_total | counter | event_type, status, final |
| overslash_webhook_delivery_attempts | histogram | event_type, outcome |
Database & background tasks
| Metric | Type | Labels |
|---|---|---|
| overslash_db_pool_connections | gauge | state (active/idle) |
| overslash_background_task_ticks_total | counter | task, status |
| overslash_background_task_duration_seconds | histogram | task |
| overslash_background_task_last_success_timestamp | gauge | task |
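For ad-hoc checks outside Prometheus, the text format served at /internal/metrics can be read with a small script. The following is a minimal sketch, not an official client: `fetch_metrics` and `parse_metrics` are hypothetical helper names, the sample values are made up, and the parser handles only simple sample lines (it skips comments, ignores timestamps, and does not unescape label values).

```python
import re
import urllib.request
from typing import Dict, List, Tuple

# (metric name, labels, value)
Sample = Tuple[str, Dict[str, str], float]

# One sample line: name, optional {labels}, then the value.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(base_url: str, timeout: float = 5.0) -> str:
    """Fetch the raw metrics text. base_url is an assumption, e.g. http://overslash:8080."""
    with urllib.request.urlopen(base_url + "/internal/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text: str) -> List[Sample]:
    """Parse sample lines, skipping blank lines and # HELP / # TYPE comments."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Illustrative input only: the values below are invented.
SAMPLE_TEXT = """\
# TYPE overslash_approvals_pending gauge
overslash_approvals_pending 3
overslash_http_requests_total{method="GET",path="/health",status="200"} 42
"""

parsed = parse_metrics(SAMPLE_TEXT)
```

In a real deployment you would pass the output of `fetch_metrics` to `parse_metrics`; for anything beyond a quick check, a maintained Prometheus client or parser library is the safer choice.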
Structured logs
Overslash emits structured logs via tracing. Control verbosity with RUST_LOG — a global level or per-target filters:
RUST_LOG=info
RUST_LOG=info,overslash_metrics=debug # per-crate override

Run behind a log collector that parses the output and ships it to your aggregator.
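If you want to inspect the output before it reaches a collector, a few lines of Python can filter it by severity. This sketch assumes each log line is a JSON object with a top-level `level` field (TRACE, DEBUG, INFO, WARN, ERROR); that layout is an assumption about how the JSON output is configured, so check it against what your deployment actually emits. `filter_logs`, the sample lines, and the `overslash_api` target are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator

# Severity order, lowest to highest.
LEVELS = {"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3, "ERROR": 4}


def filter_logs(lines: Iterable[str], min_level: str = "WARN") -> Iterator[Dict]:
    """Yield parsed log records at or above min_level; skip lines that are not JSON objects."""
    threshold = LEVELS[min_level]
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a panic message or other plain-text output
        if not isinstance(record, dict):
            continue
        level = str(record.get("level", "")).upper()
        if LEVELS.get(level, -1) >= threshold:
            yield record


# Invented sample lines, for illustration only.
sample_lines = [
    '{"level":"INFO","target":"overslash_api","fields":{"message":"started"}}',
    '{"level":"ERROR","target":"overslash_api","fields":{"message":"refresh failed"}}',
    "not json at all",
]

kept = list(filter_logs(sample_lines, min_level="WARN"))
```

To use it on a live stream, pipe the process output into a script that passes `sys.stdin` as `lines`.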
Health checks
Two unauthenticated endpoints, mounted outside auth and rate limiting and safe to poll frequently:
| Endpoint | Meaning |
|---|---|
| GET /health | Liveness: always returns 200 once the process is up. |
| GET /ready | Readiness: returns 200 when the app is initialised (migrations done, pool connected). |
Wire /ready to your load balancer / Kubernetes readinessProbe and /health to the livenessProbe.
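Outside an orchestrator, for example in a deploy script or an integration test, the same readiness gate can be approximated by polling /ready until it returns 200. This is a minimal sketch: `http_ok` and `wait_until_ready` are hypothetical helpers, and the attempt count and delay are arbitrary defaults rather than recommended values.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True only if a GET to url answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, non-2xx response, malformed reply.
        return False


def wait_until_ready(
    check: Callable[[], bool],
    attempts: int = 30,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A call might look like `wait_until_ready(lambda: http_ok("http://overslash:8080/ready"))`, where the host and port are taken from the scrape config example and will differ in your environment. Passing `check` and `sleep` as parameters keeps the retry loop testable without a running server.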
Status page
Live production health: status.overslash.com.
Recommended alerts
A starting set, expressed against the metrics above:
- Stuck background task: time() - max by (task) (overslash_background_task_last_success_timestamp) > 300. A task that hasn't succeeded in 5 minutes is wedged.
- Pending approvals piling up: sustained high overslash_approvals_pending, or growth without resolutions in overslash_approval_events_total{event="approved"}.
- HTTP 5xx ratio: sum(rate(overslash_http_requests_total{status=~"5.."}[5m])) / sum(rate(overslash_http_requests_total[5m])) above your threshold.
- OAuth refresh failures: rate(overslash_oauth_events_total{flow="refresh",status="failure"}[5m]) > 0. Failing refreshes mean connections will start breaking.
- Webhook delivery failures: rate(overslash_webhook_deliveries_total{status="failed"}[15m]) > 0.
- Secret operation errors / denials: watch overslash_secret_operations_total{status="error"} and {status="denied"} for misconfiguration or abuse.
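To make the arithmetic behind two of these rules concrete, the sketch below evaluates them over already-parsed samples, each a `(name, labels, value)` tuple. It is an illustration of the logic, not a replacement for Prometheus alerting: `stuck_tasks` and `error_ratio` are hypothetical helpers, the task names and values are invented, and `error_ratio` compares two raw scrapes without handling counter resets the way `rate()` does.

```python
import re
from typing import Dict, Iterable, List, Tuple

# (metric name, labels, value)
Sample = Tuple[str, Dict[str, str], float]


def stuck_tasks(samples: Iterable[Sample], now: float, max_age: float = 300.0) -> List[str]:
    """Tasks whose most recent success is more than max_age seconds before now."""
    latest: Dict[str, float] = {}
    for name, labels, value in samples:
        if name != "overslash_background_task_last_success_timestamp":
            continue
        task = labels.get("task", "")
        latest[task] = max(latest.get(task, float("-inf")), value)
    return sorted(task for task, ts in latest.items() if now - ts > max_age)


def _request_counts(samples: Iterable[Sample]) -> Tuple[float, float]:
    """Sum overslash_http_requests_total overall and for 5xx statuses."""
    total = errors = 0.0
    for name, labels, value in samples:
        if name != "overslash_http_requests_total":
            continue
        total += value
        if re.fullmatch(r"5..", labels.get("status", "")):
            errors += value
    return total, errors


def error_ratio(before: Iterable[Sample], after: Iterable[Sample]) -> float:
    """Share of requests between two scrapes that ended in a 5xx status."""
    total_before, errors_before = _request_counts(before)
    total_after, errors_after = _request_counts(after)
    requests = total_after - total_before
    return (errors_after - errors_before) / requests if requests > 0 else 0.0


# Invented data: task "a" last succeeded 400 s ago, task "b" 110 s ago.
TS = "overslash_background_task_last_success_timestamp"
REQ = "overslash_http_requests_total"
stuck = stuck_tasks([(TS, {"task": "a"}, 1000.0), (TS, {"task": "b"}, 1290.0)], now=1400.0)

# Between scrapes: 100 new requests, 10 of them 5xx.
ratio = error_ratio(
    before=[(REQ, {"status": "200"}, 100.0), (REQ, {"status": "500"}, 2.0)],
    after=[(REQ, {"status": "200"}, 190.0), (REQ, {"status": "500"}, 12.0)],
)
```

With the invented data, only task "a" exceeds the 300-second threshold, and the error ratio is 10 errors out of 100 new requests.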