📄 EC2 Inference

📅 Timeline & Action Plan (2–3 Weeks)

🔹 Week 1 — Assessment & Containerization

1.1 Analyze Current SageMaker Setup (Day 1–2)

Deliverables

  • Model type (PyTorch / sklearn / custom)

  • Inference entrypoint (inference.py)

  • Input / Output schema

  • Dependencies (requirements.txt)

  • Model artifacts location (S3)

Checklist

  • model.tar.gz structure

  • preprocessing / postprocessing logic

  • threshold / config JSON

  • latency target (ms)
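
A quick way to cover the "model.tar.gz structure" item is to list what the archive contains before writing the Dockerfile. The sketch below is illustrative only: the file names inside the demo archive are hypothetical, and a real run would read the archive downloaded from S3 instead of building one in memory.

```python
import io
import tarfile


def list_artifact_members(archive_bytes: bytes) -> list[str]:
    """Return the sorted file paths stored inside a gzipped model archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())


def build_demo_archive() -> bytes:
    """Build a tiny in-memory archive so the sketch runs without S3 access."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        # Hypothetical layout; the real archive may differ.
        for name, payload in [
            ("model.pth", b"weights"),
            ("code/inference.py", b"# entrypoint"),
            ("code/requirements.txt", b"torch"),
        ]:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    for path in list_artifact_members(build_demo_archive()):
        print(path)
```

Comparing this listing with what inference.py actually loads shows which files the new image has to ship or download.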

1.2 Build Docker Image for Inference (Day 3–4)

Tasks

  • Create a Dockerfile

  • Decouple the inference logic from the SageMaker SDK

  • Mount or download the model from S3

Deliverables

  • Dockerfile

  • app.py or main.py

  • requirements.txt

  • Docker image that runs locally
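
One way to decouple the inference logic is to keep the handler functions that SageMaker framework containers conventionally call (model_fn, input_fn, predict_fn, output_fn) and drive them from a plain Python class. The sketch below assumes the existing inference.py follows that convention. The linear scorer, the "features" request key, the "score" response key and the InferenceService name are all placeholders, not the real model or schema.

```python
import json
from typing import Any


# Handler names follow the SageMaker framework-container convention;
# the bodies are placeholders standing in for the real model code.
def model_fn(model_dir: str) -> dict[str, Any]:
    """Load the model once. Placeholder: fixed weights instead of real artifacts."""
    return {"weights": [0.5, 0.25], "bias": 0.1, "model_dir": model_dir}


def input_fn(request_body: str, content_type: str) -> list[float]:
    """Deserialize the request body into model input."""
    if content_type != "application/json":
        raise ValueError(f"unsupported content type: {content_type}")
    return [float(x) for x in json.loads(request_body)["features"]]


def predict_fn(features: list[float], model: dict[str, Any]) -> float:
    """Run the (placeholder) model on the parsed input."""
    return sum(w * x for w, x in zip(model["weights"], features)) + model["bias"]


def output_fn(prediction: float, accept: str) -> str:
    """Serialize the prediction into the response body."""
    return json.dumps({"score": round(prediction, 6)})


class InferenceService:
    """SDK-free wrapper: load the model at startup, then serve each request."""

    def __init__(self, model_dir: str) -> None:
        self.model = model_fn(model_dir)

    def handle(self, body: str, content_type: str = "application/json") -> str:
        features = input_fn(body, content_type)
        prediction = predict_fn(features, self.model)
        return output_fn(prediction, "application/json")


if __name__ == "__main__":
    service = InferenceService("/opt/ml/model")
    print(service.handle('{"features": [2, 4]}'))
```

Because the wrapper imports nothing from the SageMaker SDK, app.py or main.py can call it from whichever web framework is chosen.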

1.3 Define API Contract (Day 5)

Tasks

  • Freeze request/response JSON

  • Match the output format of the existing SageMaker endpoint

Deliverables

  • API Spec (JSON Example)

  • Versioned endpoint /v1/predict
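
Freezing the contract can be as simple as encoding it in typed structures and rejecting anything that deviates. The sketch below is a minimal example under assumed field names ("features", "score", "label", "model_version"); the real schema should be copied from the current SageMaker request and response.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PredictRequest:
    features: list[float]  # hypothetical request field


@dataclass(frozen=True)
class PredictResponse:
    score: float           # hypothetical response fields
    label: str
    model_version: str


def parse_request(raw: str) -> PredictRequest:
    """Validate a /v1/predict body against the frozen request schema."""
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != {"features"}:
        raise ValueError("request must be an object with exactly one key: 'features'")
    features = data["features"]
    if not isinstance(features, list) or not features:
        raise ValueError("'features' must be a non-empty list")
    for value in features:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError("'features' must contain only numbers")
    return PredictRequest(features=[float(v) for v in features])


def serialize_response(response: PredictResponse) -> str:
    """Serialize with sorted keys so output is stable across releases."""
    return json.dumps(asdict(response), sort_keys=True)


if __name__ == "__main__":
    print(parse_request('{"features": [1, 2.5]}'))
    print(serialize_response(PredictResponse(0.9, "positive", "v1")))
```

Running the same sample payloads through the SageMaker endpoint and through this serializer, then diffing the JSON, gives a direct check that the output format still matches.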

🔹 Week 2 — EC2 Deployment & Integration

2.1 Provision EC2 + Docker Runtime (Day 6–7)

Recommended

  • EC2: c6i.large (x86_64) / c6g.large (Graviton, arm64; needs an arm64 image) for CPU inference

  • OS: Amazon Linux 2023

  • Install: Docker + docker-compose

Deliverables

  • EC2 instance

  • Security Group (Allow 80 / 443 / 8080)

2.2 Deploy Container on EC2 (Day 8–9)

Tasks

  • Pull image / build on EC2

  • Use docker-compose

  • Set restart policy

Deliverables

  • Production container running

  • Health check /health
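
The /health endpoint only needs to answer quickly with a success status so that Docker or a load balancer can probe it. The sketch below uses only the Python standard library and binds to a random local port for the demo. The response body {"status": "ok"} is an assumption, and a production app would more likely expose this route from its web framework.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/health":
            self._send(200, {"status": "ok"})
        else:
            self._send(404, {"error": "not found"})

    def _send(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging in this demo


def check_health(port: int) -> tuple[int, dict]:
    """Probe /health the way an external checker would."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/health")
        resp = conn.getresponse()
        return resp.status, json.loads(resp.read())
    finally:
        conn.close()


def run_demo() -> tuple[int, dict]:
    """Start the server on a free port, probe it once, then shut it down."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return check_health(server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

A stricter variant could return a failure status until the model has finished loading, so the restart policy does not mask a container that is up but unable to predict.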

2.3 Connect API / System (Day 10)

Tasks

  • Update the Node.js backend to point to EC2

  • Feature flag (SageMaker vs EC2)

Deliverables

  • Toggle switch (env-based)

  • Fallback plan
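
The env-based toggle and the fallback can live in one small routing function. The sketch below is written in Python for illustration, although the calling backend is Node.js. The INFERENCE_BACKEND variable name and the two callables are hypothetical, and catching every exception is a simplification that a real client would narrow to timeouts and connection errors.

```python
import os
from typing import Callable

Backend = Callable[[dict], dict]


def choose_backend(env: dict[str, str]) -> str:
    """Read the toggle; unknown or missing values fall back to SageMaker."""
    value = env.get("INFERENCE_BACKEND", "sagemaker").strip().lower()
    return value if value in {"sagemaker", "ec2"} else "sagemaker"


def predict_with_fallback(
    payload: dict,
    env: dict[str, str],
    call_ec2: Backend,
    call_sagemaker: Backend,
) -> dict:
    """Route to EC2 when enabled, falling back to SageMaker on failure."""
    if choose_backend(env) == "ec2":
        try:
            return call_ec2(payload)
        except Exception:
            # Simplification for the sketch: any EC2 failure triggers fallback.
            return call_sagemaker(payload)
    return call_sagemaker(payload)


if __name__ == "__main__":
    def demo_ec2(payload: dict) -> dict:
        return {"backend": "ec2"}

    def demo_sagemaker(payload: dict) -> dict:
        return {"backend": "sagemaker"}

    print(predict_with_fallback({}, dict(os.environ), demo_ec2, demo_sagemaker))
```

Defaulting to SageMaker whenever the variable is missing or misspelled keeps a configuration mistake from silently moving traffic to EC2.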

🔹 Week 3 — Testing, Optimization & Cutover

3.1 Load & Performance Testing (Day 11–13)

Metrics

  • p95 latency

  • CPU / RAM usage

  • Throughput (req/sec)

Deliverables

  • Benchmark report

  • Instance sizing decision
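
For the benchmark report, p95 latency and throughput can be derived from the raw per-request timings that a load-test tool records. The sketch below uses the nearest-rank percentile definition, which is one of several common conventions, so its numbers can differ slightly from a tool that interpolates. The function names and the sample data are illustrative.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(latencies_ms: list[float], wall_clock_s: float) -> dict[str, float]:
    """Summarize one load-test run from per-request latencies and total duration."""
    if wall_clock_s <= 0:
        raise ValueError("wall_clock_s must be positive")
    return {
        "requests": len(latencies_ms),
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "throughput_rps": len(latencies_ms) / wall_clock_s,
    }


if __name__ == "__main__":
    # Synthetic latencies of 1..100 ms collected over a 20-second run.
    demo = [float(ms) for ms in range(1, 101)]
    print(summarize(demo, wall_clock_s=20.0))
```

Comparing p95_ms against the latency target captured in step 1.1 at each candidate instance size gives a concrete basis for the sizing decision.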

3.2 Hardening & Ops (Day 14–15)

Tasks

  • Auto-restart

  • Log rotation

  • Basic monitoring (CloudWatch Agent / Prometheus)

  • IAM role (S3 read-only)

Deliverables

  • Runbook

  • Alarm thresholds
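
The S3 read-only IAM role can be scoped to the model bucket alone. The sketch below generates such a policy document as JSON. The bucket name and prefix are hypothetical, and the policy assumes the container only needs to list the bucket and download objects.

```python
import json


def s3_read_only_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy document allowing read-only access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListModelBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                # ListBucket applies to the bucket ARN itself.
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                "Sid": "ReadModelArtifacts",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # GetObject applies to object ARNs under the bucket.
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
        ],
    }


if __name__ == "__main__":
    # "example-model-bucket" and "models/" are placeholder values.
    print(json.dumps(s3_read_only_policy("example-model-bucket", "models/"), indent=2))
```

Attaching the policy to the instance role, rather than baking access keys into the image, lets the container pick up credentials from the instance profile.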

3.3 Production Cutover (Day 16–17)

Steps

  • Switch traffic 10% → 50% → 100%

  • Monitor errors & latency

  • Keep the SageMaker endpoint running as a rollback path for 1–2 weeks

Deliverables

  • Production sign-off
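
The staged 10% → 50% → 100% switch can be implemented in the caller by hashing a stable request key into one of 100 buckets. The sketch below is one possible approach, not necessarily how the backend will do it. The key (for example a user ID) and the function names are assumptions, and a weighted load balancer or DNS record would be an alternative.

```python
import hashlib


def bucket_for(request_key: str) -> int:
    """Map a stable key (for example a user ID) to a bucket in [0, 100)."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(request_key: str, ec2_percent: int) -> str:
    """Send the lowest ec2_percent buckets to EC2 and the rest to SageMaker."""
    if not 0 <= ec2_percent <= 100:
        raise ValueError("ec2_percent must be between 0 and 100")
    return "ec2" if bucket_for(request_key) < ec2_percent else "sagemaker"


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(10_000)]
    for percent in (10, 50, 100):
        share = sum(route(k, percent) == "ec2" for k in keys) / len(keys)
        print(f"target {percent}% -> observed {share:.1%}")
```

Because the bucket depends only on the key, a caller routed to EC2 at 10% stays on EC2 at 50% and 100%, which makes errors easier to attribute during each stage.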