
F-06 real-data success after the Phase A redesign

Outcome

After a complete redesign of the F-06 architecture (Phase A: 4 parallel agents + 1 integration agent + 1 fix agent), 3 collectors ran on Nikolay's real data without a single 429.

Step | Collector | Duration | Records | Outcome
---- | --------- | -------- | ------- | -------
1 | wb.token.verify (ping, UNLIMITED) | 106 ms | 1 (token metadata) | Personal token validated, JWT decoded, cached
2 | wb.common.seller_info | 193 ms | 1 | "ИП Горюнов Н.А." retrieved
3 | wb.statistics.sales | 584 ms | 1,431 sales / 1.14 MB | 30 days of real sales; incremental cursor advanced

Total: ~21 seconds wall-clock. Zero 429s. Zero retries. Zero bans.

What made this possible (architecture changes)

Per research/BP-zero-ban-design-synthesis-2026-05-17.md Phase A:

  1. PostgreSQL persistent rate bucket (etl.rate_bucket table): survives process restarts, unlike the in-memory dict that caused the first ban
  2. Safety margin of 17%: effective_interval = nominal × 1.176 (proven by 17 days of wb-tools-supply-booking uptime)
  3. Preemptive backpressure: parse X-Ratelimit-Remaining on every response → slow down when it falls below 20%
  4. Circuit breaker per (token, endpoint): N consecutive 429/5xx → OPEN for 1h, then a half-open probe → close
  5. JWT token type detection: telling Personal and Base tokens apart is critical (a Base token gets 1 seller-info request per 24h)
  6. Probe-free token verification: common-api/ping (UNLIMITED) for cold-start validation
  7. Separate sessions for rate_limiter/circuit_breaker: each opens its own short-lived session, so neither breaks the runner's atomic transaction
  8. Sticky proxy binding per (account, endpoint): a consistent IP is not flagged as fraud
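Items 2 and 3 combine into a simple pacing rule. Below is a minimal in-memory sketch, assuming a fixed nominal interval per endpoint; the real limiter persists its state in etl.rate_bucket (item 1), whereas here `next_allowed_at` is just an attribute. The class name `IntervalLimiter`, the `limit` parameter (the quota that X-Ratelimit-Remaining is compared against) and the 2× slowdown factor are hypothetical and are not taken from persistent_rate_limiter.py.

```python
import time
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

SAFETY_FACTOR = 1.176          # item 2: effective_interval = nominal × 1.176
BACKPRESSURE_THRESHOLD = 0.20  # item 3: slow down when < 20% of quota remains
BACKPRESSURE_SLOWDOWN = 2.0    # hypothetical: how much to stretch the interval


@dataclass
class IntervalLimiter:
    """In-memory sketch; the real limiter keeps this state in etl.rate_bucket."""
    nominal_interval: float        # seconds between calls per the documented quota
    limit: Optional[int] = None    # hypothetical: quota that Remaining is compared to
    clock: Callable[[], float] = time.monotonic
    sleep: Callable[[float], None] = time.sleep
    next_allowed_at: float = 0.0
    slowdown: float = 1.0

    @property
    def effective_interval(self) -> float:
        return self.nominal_interval * SAFETY_FACTOR * self.slowdown

    def acquire(self) -> float:
        """Wait until the next call is allowed, reserve a slot, return seconds waited."""
        now = self.clock()
        wait = max(0.0, self.next_allowed_at - now)
        if wait > 0:
            self.sleep(wait)
        # The following slot starts one effective interval after this call's slot.
        self.next_allowed_at = max(now, self.next_allowed_at) + self.effective_interval
        return wait

    def observe(self, headers: Mapping[str, str]) -> None:
        """Preemptive backpressure from X-Ratelimit-Remaining (exact-case lookup here;
        real HTTP clients expose case-insensitive header maps)."""
        remaining = headers.get("X-Ratelimit-Remaining")
        if remaining is None or not self.limit:
            return
        ratio = int(remaining) / self.limit
        self.slowdown = BACKPRESSURE_SLOWDOWN if ratio < BACKPRESSURE_THRESHOLD else 1.0
```

Injecting `clock` and `sleep` keeps the sketch testable without real waiting, in the spirit of the "use mocks for testing" lesson.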

Path to success (timeline)

  • 17:00: initial F-06 verification attempt → 429 → 23h ban on common-api (Base token)
  • 17:30: Nikolay asks to "research how to never get 429"
  • 18:00: 3 parallel research agents (archeologist + industry + empirical)
  • 19:00: synthesis BP doc + Phase A plan
  • 19:30: Nikolay supplies a Personal token and approves Phase A
  • 19:30-20:00: 4 parallel Phase A agents (migration + rate_limiter + circuit_breaker + integration)
  • 20:10: integration agent commits da607be (124 tests pass)
  • 20:15: bug found, a mid-transaction commit in rate_limiter breaks the runner
  • 20:30: fix agent applies the separate-sessions pattern (commit f3cdcf5)
  • 20:45: Phase B real-data probe → 3 collectors succeed

Total elapsed: ~4 hours (including 30 min research + 90 min Phase A + 15 min fix + 5 min probe). Original plan: 5-8 days.

Tracking (continuous learning)

Plan | Actual | Speedup
---- | ------ | -------
5-8 days (40-64h) | ~4h (including the ban incident + redesign) | 10-15x

Note: the speedup already includes the sloppiness penalty (BP1+BP2 should have been read BEFORE writing code, not after the ban). Had they been read up front, it would have taken ~2-3h (~20x).

What changed in code (final state)

New files (Phase A)

  • apps/api/alembic/versions/0003_rate_bucket_circuit_state_token_metadata.py
  • apps/api/src/razmakh_api/etl/wb/persistent_rate_limiter.py
  • apps/api/src/razmakh_api/etl/wb/circuit_breaker.py
  • apps/api/src/razmakh_api/etl/wb/collectors/token_verify.py
  • apps/api/tests/test_persistent_rate_limiter.py (19 tests)
  • apps/api/tests/test_circuit_breaker.py (18 tests)
  • apps/api/tests/test_wb_client_integration.py (11 tests)
  • apps/api/tests/test_token_verify.py (24 tests)
  • apps/api/tests/test_runner_integration.py (3 tests, including a regression test)
  • research/BP1-wbpulse-netnik-scheduler-archeology-2026-05-17.md
  • research/BP2-zero-ban-polling-industry-patterns-2026-05-17.md
  • research/BP-zero-ban-design-synthesis-2026-05-17.md

Modified

  • apps/api/src/razmakh_api/etl/wb/client.py: circuit breaker + rate limiter integration + token invalidation
  • apps/api/src/razmakh_api/etl/runner.py: module-singleton limiter/breaker injection via ctx.extra
  • apps/api/src/razmakh_api/etl/wb/collectors/{sales,seller_info}.py: use the new infrastructure
  • apps/api/src/razmakh_api/etl/wb/rate_limiter.py: DEPRECATED marker
  • scripts/run_wb_collector.py: references PERSONAL_TOKEN and runs token_verify first

Test count

  • F-04 baseline: 7 tests (RLS)
  • F-05: 26 tests (manifest)
  • F-06 Phase A.1-A.4: 75 new tests (rate_limiter + circuit_breaker + WB client + token_verify + runner regression)
  • Total project: 124 tests pass on real PG via VPS integration

Lessons learned (process)

  1. Read research files BEFORE implementation: a repeat of the lesson from f06-skipped-research-halturность. Research quality > implementation speed.
  2. Never burst on production tokens: even 5 probes in 10 minutes will trigger a tarpit. Use mocks for testing.
  3. Mid-transaction commits are an anti-pattern: every component should either own its own session lifecycle OR rely on the outer transaction (one contract, not a mix).
  4. Personal vs Base tokens matter: 1/24h vs 5 RPS; explicitly verify the token type in the WB cabinet UI during onboarding.
  5. Token bans last up to 23h+: the circuit breaker MUST honor the full X-Ratelimit-Retry header.
  6. Re-seed integration data carefully: pytest integration tests TRUNCATE core.organization, so every pytest run wipes the seed. Fix: tests should use unique fixture rows (slug test-{uuid}), not nikolay-main.
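Lesson 5 and item 4 of the architecture list can be sketched as a per-(token, endpoint) breaker. This is a hypothetical in-memory illustration, not the code in circuit_breaker.py: the threshold N=3 is an assumed value, `retry_after` is assumed to be the X-Ratelimit-Retry value already parsed into seconds, and a server-provided window longer than the 1h default always wins, so a 23h ban is never cut short.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

OPEN_SECONDS = 3600.0   # item 4: OPEN for 1h after N consecutive failures
DEFAULT_THRESHOLD = 3   # hypothetical value of N


@dataclass
class _BreakerState:
    failures: int = 0        # consecutive 429/5xx responses
    open_until: float = 0.0  # 0.0 means closed
    probing: bool = False    # a half-open probe is in flight


class CircuitBreaker:
    """Per-(token, endpoint) breaker; in-memory sketch, not circuit_breaker.py."""

    def __init__(self, clock: Callable[[], float] = time.time,
                 threshold: int = DEFAULT_THRESHOLD) -> None:
        self._clock = clock
        self._threshold = threshold
        self._states: Dict[Tuple[str, str], _BreakerState] = {}

    def _state(self, token_id: str, endpoint: str) -> _BreakerState:
        return self._states.setdefault((token_id, endpoint), _BreakerState())

    def allow(self, token_id: str, endpoint: str) -> bool:
        st = self._state(token_id, endpoint)
        if st.open_until == 0.0:
            return True                    # closed
        if self._clock() < st.open_until or st.probing:
            return False                   # open, or a probe is already out
        st.probing = True                  # half-open: let exactly one probe through
        return True

    def record(self, token_id: str, endpoint: str, status: int,
               retry_after: Optional[float] = None) -> None:
        """Feed back the HTTP status; retry_after is X-Ratelimit-Retry in seconds."""
        st = self._state(token_id, endpoint)
        if status != 429 and status < 500:
            # Any non-429, non-5xx response closes the breaker.
            st.failures, st.open_until, st.probing = 0, 0.0, False
            return
        st.failures += 1
        tripped = st.probing or st.failures >= self._threshold
        cooldown = OPEN_SECONDS if tripped else 0.0
        if retry_after:
            cooldown = max(cooldown, retry_after)  # never cut the server's window short
        if cooldown > 0:
            st.open_until = self._clock() + cooldown
        st.probing = False
```

Keying state on the (token, endpoint) pair means a ban on one endpoint does not block calls to other endpoints made with the same token.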

Open follow-ups (for future agents)

  • #1 GH issue F-06.1: ETL run lifecycle observability gap (resolved via A.4: _create_run now makes runs visible immediately). Can be closed after verification in F-10.
  • #2 GH issue F-06.2: WB token circuit breaker (resolved via Phase A.3). Close after 7 days of production observation.
  • F-06.3 NEW: pytest integration tests must not TRUNCATE production seed orgs. Use fixture rows with a UNIQUE slug per test.
  • F-06.4 NEW: automate token-type verification; alert if the token expires in < 30 days (WB tokens have a 180-day lifecycle).
  • F-06.5 NEW: replace PG storage for etl.rate_bucket with Redis for multi-worker phase 1.5+ (avoids PG advisory lock contention).
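For F-06.4 and the JWT type detection in Phase A, the token payload can be inspected locally without any API call. This sketch assumes the WB token carries the standard `exp` claim (Unix seconds) and that `acc == 3` marks a Personal token, as in the Sources note "Personal acc=3"; the meaning of other `acc` values is not known here, so they are only reported as non-Personal. The function names are hypothetical, and the signature is deliberately not verified: this is inspection, not authentication.

```python
import base64
import json
import time
from typing import Any, Dict, Optional

EXPIRY_ALERT_DAYS = 30  # F-06.4: alert when fewer than 30 days remain
PERSONAL_ACC = 3        # assumption: "Personal acc=3" from the Sources section


def decode_jwt_payload(token: str) -> Dict[str, Any]:
    """Return the JWT payload claims WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected header.payload.signature")
    payload = parts[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def token_report(token: str, now: Optional[float] = None) -> Dict[str, Any]:
    """Summarize token type and remaining lifetime for onboarding/alerting."""
    claims = decode_jwt_payload(token)
    now = time.time() if now is None else now
    days_left = (claims["exp"] - now) / 86400  # assumes `exp` is Unix seconds
    return {
        "personal": claims.get("acc") == PERSONAL_ACC,
        "days_left": days_left,
        "expiry_alert": days_left < EXPIRY_ALERT_DAYS,
    }
```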

Sources

  • Nikolay's real Personal WB API token (Personal acc=3, supplier 157628200)
  • Real WB API responses from common-api/ping, common-api/seller-info, statistics-api/sales
  • BP1 archeology of wb-tools production code (17 дней uptime proof)
  • BP2 industry patterns (Stripe, AWS, Twitter, WB engineering blog)
  • Results from the 6 Phase A agents