What BytePort handles without paging you

BytePort autonomously detects, diagnoses, and remediates these pager classes. We ship a new one every week.

10 Pager classes owned
12 Signal adapters
<5m Median MTTR

Autonomy matrix

What BytePort does and when it stops to escalate. Updated June 2, 2026.

Pager class Detection signals Autonomous actions Escalation criteria Runbook
CrashLoopBackOff (v2)
kubernetes
crashloopbackoff container_restart_threshold_exceeded (3/10/30 sub-severity) pod_not_ready_oom (exit 137 at startup) liveness_probe_failure readiness_probe_failure
Rollback bad image tag (1/service/hour) · bump memory limit 1.5× for OOM-at-startup (1/pod/24h) · loosen liveness probe initialDelaySeconds · hold pod restart + wait for dependency healthcheck · audit missing ConfigMap/Secret (never auto-creates Secrets)
Unknown classification after 2 diagnostic passes · memory bump ceiling hit (1/pod/24h limit) · dependency healthcheck times out · rollback rate limit reached (1/service/hour) · confidence <0.70
v0.1.30

Source
OOMKilled / Pod Memory Pressure
kubernetes
pod_oom_killed memory_pressure_high node_memory_pressure cgroup_oom_event jvm_heap_exhausted gomemlimit_exceeded
Bump memory limit (≤2×, 1/pod/24h) · trigger HPA scale-out · cordon+drain memory-pressured node · surface runtime misconfig as PR proposal
Genuine memory leak (rising RSS, no traffic correlation) · confidence <0.70 · second limit-bump within 24h · verify fails twice
v0.1.29

Source
Node NotReady / Kubelet Failure
kubernetes
node_not_ready node_disk_pressure node_memory_pressure node_pid_pressure kubelet_unhealthy node_network_unavailable
Restart kubelet or container runtime · prune disk/images · restart CNI daemonset pod · cordon+drain (max 1 node concurrent)
Hardware / cloud failure (kernel OOM/MCE) · confidence <0.70 · control-plane node · PDB violation during drain
v0.1.11

Source
TLS Certificate Expiry
tls
cert_expiring_soon (30/14/7/1d) cert_expired cert_chain_invalid ocsp_stapling_failure acme_challenge_failure
Renew via certbot / cert-manager / ACM · hot-reload nginx/envoy/haproxy/traefik/k8s ingress · verify chain and OCSP staple
Pinned cert · wildcard without dns-01 provider · rate-limited domain · commercial or internal CA · chain validation failure
v0.1.27

Source
Failed Deploy Auto-Rollback
deploys
deploy_health_regression readiness_probe_failure_post_deploy deploy_5xx_spike canary_slo_burn rollout_stuck
kubectl rollout undo to last-known-good revision (max 1 rollback/deployment/hour) · watch pod recovery · emit postmortem
Regression predates deploy (false attribution) · rollback itself fails · confidence <0.70 · verify fails twice
v0.1.26

Source
Database Connection Pool Exhaustion
database
db_pool_saturated db_connection_timeout db_too_many_connections idle_in_transaction_backlog pgbouncer_pool_full
Terminate idle-in-transaction sessions (cap 20/min) · cancel long-running queries · reload pgbouncer · rolling-restart leaking deployment
Kill cap (20/min) reached · superuser or pgbouncer session targeted · verify fails (util still >70%) · confidence <0.70
v0.1.25

Source
CrashLoopBackOff
kubernetes
CrashLoopBackOff container_restart_threshold_exceeded (>5/10min) liveness_probe_failure init_container_failure
Bump memory request 1.5× for OOM-at-startup · rollback bad image tag · loosen probe timing · cordon node on pressure eviction
External dependency failure in init container · missing ConfigMap/Secret · confidence <0.70 · daily resource-bump limit hit
v0.1.24

Source
Host Memory Pressure / OOM
host
oom_killed memory_pressure_high (≥90% RSS) swap_thrash pod_evicted_memory jvm_heap_exhausted
Heap snapshot + restart with backoff · trigger HPA · patch limit (≤2×) · cordon+drain noisy-neighbor node
Genuine memory leak confirmed · stateful workload not in allowlist · verify fails (RSS stays >70%) · confidence <0.70
v0.1.23

Source
CPU Saturation
infrastructure
cpu_throttle_high load_avg_spike process_cpu_saturation scheduler_queue_backlog
Annotate HPA for scale-out · kill runaway process if confidence ≥0.85 · throttle cron fanout · surface over-provisioned limit as alert
No HPA present and no runaway process identified · kill confidence <0.85 · cluster-wide saturation (>80% nodes affected)
v0.1.3

Source
Disk Pressure
infrastructure
disk_usage_high (≥85%) inode_exhaustion disk_io_saturation disk_pressure (NodeCondition)
Delete old log files and WAL archives · prune dangling container images · clean package cache · alert on inode exhaustion
Cleanup insufficient (disk still >90% after prune) · inode exhaustion persists · confidence <0.70
v0.1.1

Source

Coming next

Pager classes actively in development. Ship cadence: one per week.

DNS Resolution Failures
Next

CoreDNS failures, NXDOMAIN storms, UDP truncation, search-domain misconfig. Auto-restart CoreDNS, flush ndots cache, escalate on cluster-wide outage.

Ingress Controller Crash
Next

nginx-ingress / envoy pod crash, admission webhook failure, wildcard route misconfiguration. Rolling restart with config validation gate.

Queue Backup / Consumer Lag
Soon

Kafka consumer lag, RabbitMQ queue depth, SQS age-of-oldest-message breach. Scale consumer pods, dead-letter storm detection, poison-pill quarantine.

Replication Lag / DB Failover
Soon

Postgres streaming replication lag, MySQL replica delay, read-replica promotion. Auto-failover gate with split-brain detection and quorum check.

Cert Rotation Chain Failure
Planned

Intermediate CA rotation breaking leaf cert verification. Chain rebuild, trust-store reload, zero-downtime rotation via cert-manager CRD.

Don't see your pager class?

Tell us which alert woke you up last week. We'll prioritize it for next week's release.