zlacker

This spec reads like what we've been building at toran.sh - transparent proxy for AI API calls with observability.

The core idea: you create a "toran" (read-only inspection endpoint) bound to a single upstream. Point your client at the toran URL instead of the API directly. No SDK changes, no code changes - just swap the base URL. It shows exactly what went over the wire in real time.

For the multi-provider setup you're describing (OpenAI, Anthropic, Google, etc.), you'd create separate torans for each upstream. Auth passthrough works because the toran is transparent - it just forwards headers.

We're still early (focused on the "see what's happening" problem before tackling rate limiting/policy), but if the visibility piece would help with your setup, happy to give you access and hear how it compares to litellm+langfuse for your use case.