Reference

What BytePort handles without paging you

BytePort autonomously detects, diagnoses, and remediates these pager classes. We ship a new one every week.

10 Pager classes owned

12 Signal adapters

<5m Median MTTR

Install now → Browse runbooks 5-min quickstart

Autonomy matrix

What BytePort does and when it stops to escalate. Updated June 2, 2026.

Pager class	Detection signals	Autonomous actions	Escalation criteria	Runbook
CrashLoopBackOff (v2) kubernetes	crashloopbackoff container_restart_threshold_exceeded (3/10/30 sub-severity) pod_not_ready_oom (exit 137 at startup) liveness_probe_failure readiness_probe_failure	Rollback bad image tag (1/service/hour) · bump memory limit 1.5× for OOM-at-startup (1/pod/24h) · loosen liveness probe initialDelaySeconds · hold pod restart + wait for dependency healthcheck · audit missing ConfigMap/Secret (never auto-creates Secrets)	Unknown classification after 2 diagnostic passes · memory bump ceiling hit (1/pod/24h limit) · dependency healthcheck times out · rollback rate limit reached (1/service/hour) · confidence <0.70	v0.1.30 Source
OOMKilled / Pod Memory Pressure kubernetes	pod_oom_killed memory_pressure_high node_memory_pressure cgroup_oom_event jvm_heap_exhausted gomemlimit_exceeded	Bump memory limit (≤2×, 1/pod/24h) · trigger HPA scale-out · cordon+drain memory-pressured node · surface runtime misconfig as PR proposal	Genuine memory leak (rising RSS, no traffic correlation) · confidence <0.70 · second limit-bump within 24h · verify fails twice	v0.1.29 Source
Node NotReady / Kubelet Failure kubernetes	node_not_ready node_disk_pressure node_memory_pressure node_pid_pressure kubelet_unhealthy node_network_unavailable	Restart kubelet or container runtime · prune disk/images · restart CNI daemonset pod · cordon+drain (max 1 node concurrent)	Hardware / cloud failure (kernel OOM/MCE) · confidence <0.70 · control-plane node · PDB violation during drain	v0.1.11 Source
TLS Certificate Expiry tls	cert_expiring_soon (30/14/7/1d) cert_expired cert_chain_invalid ocsp_stapling_failure acme_challenge_failure	Renew via certbot / cert-manager / ACM · hot-reload nginx/envoy/haproxy/traefik/k8s ingress · verify chain and OCSP staple	Pinned cert · wildcard without dns-01 provider · rate-limited domain · commercial or internal CA · chain validation failure	v0.1.27 Source
Failed Deploy Auto-Rollback deploys	deploy_health_regression readiness_probe_failure_post_deploy deploy_5xx_spike canary_slo_burn rollout_stuck	kubectl rollout undo to last-known-good revision (max 1 rollback/deployment/hour) · watch pod recovery · emit postmortem	Regression predates deploy (false attribution) · rollback itself fails · confidence <0.70 · verify fails twice	v0.1.26 Source
Database Connection Pool Exhaustion database	db_pool_saturated db_connection_timeout db_too_many_connections idle_in_transaction_backlog pgbouncer_pool_full	Terminate idle-in-transaction sessions (cap 20/min) · cancel long-running queries · reload pgbouncer · rolling-restart leaking deployment	Kill cap (20/min) reached · superuser or pgbouncer session targeted · verify fails (util still >70%) · confidence <0.70	v0.1.25 Source
CrashLoopBackOff kubernetes	CrashLoopBackOff container_restart_threshold_exceeded (>5/10min) liveness_probe_failure init_container_failure	Bump memory request 1.5× for OOM-at-startup · rollback bad image tag · loosen probe timing · cordon node on pressure eviction	External dependency failure in init container · missing ConfigMap/Secret · confidence <0.70 · daily resource-bump limit hit	v0.1.24 Source
Host Memory Pressure / OOM host	oom_killed memory_pressure_high (≥90% RSS) swap_thrash pod_evicted_memory jvm_heap_exhausted	Heap snapshot + restart with backoff · trigger HPA · patch limit (≤2×) · cordon+drain noisy-neighbor node	Genuine memory leak confirmed · stateful workload not in allowlist · verify fails (RSS stays >70%) · confidence <0.70	v0.1.23 Source
CPU Saturation infrastructure	cpu_throttle_high load_avg_spike process_cpu_saturation scheduler_queue_backlog	Annotate HPA for scale-out · kill runaway process if confidence ≥0.85 · throttle cron fanout · surface over-provisioned limit as alert	No HPA present and no runaway process identified · kill confidence <0.85 · cluster-wide saturation (>80% nodes affected)	v0.1.3 Source
Disk Pressure infrastructure	disk_usage_high (≥85%) inode_exhaustion disk_io_saturation disk_pressure (NodeCondition)	Delete old log files and WAL archives · prune dangling container images · clean package cache · alert on inode exhaustion	Cleanup insufficient (disk still >90% after prune) · inode exhaustion persists · confidence <0.70	v0.1.1 Source

Coming next

Pager classes actively in development. Ship cadence: one per week.

DNS Resolution Failures

CoreDNS failures, NXDOMAIN storms, UDP truncation, search-domain misconfig. Auto-restart CoreDNS, flush ndots cache, escalate on cluster-wide outage.

Ingress Controller Crash

nginx-ingress / envoy pod crash, admission webhook failure, wildcard route misconfiguration. Rolling restart with config validation gate.

Queue Backup / Consumer Lag

Soon

Kafka consumer lag, RabbitMQ queue depth, SQS age-of-oldest-message breach. Scale consumer pods, dead-letter storm detection, poison-pill quarantine.

Replication Lag / DB Failover

Soon

Postgres streaming replication lag, MySQL replica delay, read-replica promotion. Auto-failover gate with split-brain detection and quorum check.

Cert Rotation Chain Failure

Planned

Intermediate CA rotation breaking leaf cert verification. Chain rebuild, trust-store reload, zero-downtime rotation via cert-manager CRD.

Don't see your pager class?

Tell us which alert woke you up last week. We'll prioritize it for next week's release.

Request a pager class → Open a GitHub issue