BytePort — Autonomous Infrastructure Ops

What it handles today

Seven runbooks. Zero pages.

Every card below is a live runbook — a real action BytePort takes when a signal breaches threshold. Not a recommendation. A fix.

Browse the full runbook catalog →

live

Disk pressure

Signal: disk_usage > 85%

Vacuums journalctl logs, rotates stale log files, prunes Docker layers, sweeps /tmp. Resolves without restart in 90% of cases.

avg resolution: 38s

live

Memory pressure

Signal: memory_used > 90%

Identifies the leaking process via cgroup stats. Restarts it using the byteport.restart=safe label so only opted-in services are touched.

avg resolution: 52s

live

CPU pressure

Signal: cpu_pct > 95% sustained

Applies cgroup CPU throttle via byteport.throttle=safe label. Buys headroom without killing the process, then files a postmortem with root cause.

avg resolution: 29s

live

Host unhealthy

Signal: host_unhealthy = 1

Runs connectivity checks, service probes, and NTP drift detection. Restarts unhealthy services in dependency order, escalates if the host itself is unreachable.

avg resolution: 71s

live

Pod crashloop

Signal: pod_crashloop_count > 3

Classifies crash cause (OOM, config error, missing dep). Safe-restarts the pod with back-off guard. Requires byteport.io/allow-remediation: "true" annotation on the namespace.

avg resolution: 44s

live

Deployment rollback

Signal: failed deploy within 30 min

Reads the byteport.io/allow-rollback annotation. If set, triggers kubectl rollout undo and notifies the commit author via Slack or email.

avg resolution: 18s

live

OOM-killed pod

Signal: oom_killed_count > 0

Distinguishes memory leak from under-provisioned limits. If limits look correct, restarts with debug logging enabled. If leak signature detected, patches the resource limit and files a postmortem.

avg resolution: 61s

All runbooks support --dry-run. Global cooldown prevents re-triggering within 5 minutes of a fix. Full audit log written to the DB.

Get started

Running in 60 seconds.

Point it at your signal source. Give it a notification target. It watches from there.

bash

npm install @byteport/agent

# minimum required env vars
export BYTEPORT_API_KEY=your_api_key
export PROMETHEUS_URL=http://prometheus:9090    # or DATADOG_API_KEY
export POSTMARK_SERVER_TOKEN=your_token        # or SLACK_WEBHOOK_URL

node -e "require('@byteport/agent').start()"

bash

docker run -d \
  -e BYTEPORT_API_KEY=your_api_key \
  -e PROMETHEUS_URL=http://prometheus:9090 \
  -e POSTMARK_SERVER_TOKEN=your_token \
  byteport/agent:latest

yaml

helm repo add byteport https://byteport.polsia.app/charts
helm repo update

# Default install is dry-run safe — never mutates without your permission
helm install byteport byteport/byteport-agent \
  --namespace byteport-agent \
  --create-namespace

# Confirm it's watching (logs appear within 30s)
kubectl logs -n byteport-agent \
  -l app.kubernetes.io/name=byteport-agent --tail=20 -f

Full quickstart → View on GitHub

The autonomous ops engineer that fixes infrastructure before your on-call gets paged.

Watch it run

Seven runbooks. Zero pages.

Disk pressure

Memory pressure

CPU pressure

Host unhealthy

Pod crashloop

Deployment rollback

OOM-killed pod

Running in 60 seconds.

Works with what you already run.

What BytePort fixed recently.

Want this watching your infra? →