Budget guard
Stop a runaway customer from blowing up your AI bill.
The budget guard is the enforcement half of Margint. Analytics without a kill switch just tells you how much money you already lost.
How it works
- In the dashboard, configure a budget — workspace-wide, per-customer, or per-feature.
- Wrap hot LLM calls in guardedCall() (TS) / guarded_call() (Python).
- Before the call, the SDK hits a budget-check endpoint (cached for 60 s).
- If the customer is over budget, the SDK throws BudgetExceededError, and you handle it however you like.
The check is fast. The endpoint is backed by Redis and typically responds in single-digit milliseconds.
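To make the check-then-call flow concrete, here is a minimal sketch of what a guarded call with a 60-second cache could look like. It is not the SDK's implementation: `GuardSketch`, `check_budget`, and the local `BudgetExceededError` are hypothetical stand-ins so the example runs on its own, and the cache key of (customer, feature) is an assumption.

```python
import time
from typing import Callable, Dict, Tuple, TypeVar

T = TypeVar("T")

CACHE_TTL_SECONDS = 60.0  # the 60 s cache lifetime described above


class BudgetExceededError(Exception):
    """Local stand-in for the SDK's error, so this sketch is self-contained."""


class GuardSketch:
    """Illustrative only: check a budget through a TTL cache, then run the call."""

    def __init__(
        self,
        check_budget: Callable[[str, str], bool],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # check_budget is a hypothetical stand-in for the budget-check endpoint.
        # It returns True while the customer is still within budget.
        self._check_budget = check_budget
        self._clock = clock
        # Assumption: results are cached per (customer_id, feature) pair.
        self._cache: Dict[Tuple[str, str], Tuple[float, bool]] = {}

    def _within_budget(self, customer_id: str, feature: str) -> bool:
        key = (customer_id, feature)
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] < CACHE_TTL_SECONDS:
            return cached[1]  # fresh cache entry: skip the endpoint
        allowed = self._check_budget(customer_id, feature)
        self._cache[key] = (now, allowed)
        return allowed

    def guarded_call(self, customer_id: str, feature: str, fn: Callable[[], T]) -> T:
        # The wrapped function only runs when the check passes.
        if not self._within_budget(customer_id, feature):
            raise BudgetExceededError(f"{customer_id} is over budget for {feature}")
        return fn()
```

If the cache behaves like this sketch, a customer who crosses the limit could keep making calls until the cached result expires, which is worth keeping in mind when sizing budgets.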
Budget actions
Every budget has an action:
- warn: the budget is over, but the call still runs. The dashboard surfaces the breach and notifications fire. Use for monitoring-only rollouts.
- block: the SDK throws BudgetExceededError. The call does not run. Use this to actually stop spend.
For graceful degradation (fall back to a cheaper model on breach), set the budget to block and handle the thrown error in your own code — see the example below.
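The difference between the two actions can be sketched in a few lines. This is an illustration of the behavior described above, not SDK code: `BudgetAction`, `run_with_action`, and the `notify` callback are hypothetical names, and `notify` merely stands in for the dashboard notification.

```python
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class BudgetAction(Enum):
    WARN = "warn"
    BLOCK = "block"


class BudgetExceededError(Exception):
    """Local stand-in for the SDK's error, so this sketch is self-contained."""


def run_with_action(
    over_budget: bool,
    action: BudgetAction,
    fn: Callable[[], T],
    notify: Callable[[str], None],
) -> T:
    """Apply a budget action, then run fn if the action allows it."""
    if over_budget:
        if action is BudgetAction.BLOCK:
            # block: the call never runs
            raise BudgetExceededError("budget exceeded")
        # warn: surface the breach, but let the call through
        notify("budget exceeded; call allowed because the action is warn")
    return fn()
```

With `warn`, an over-budget call returns its normal result and `notify` is invoked once; with `block`, `fn` is never called.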
Scopes
Budgets stack. A request is allowed only if every applicable budget is satisfied.
- Workspace-wide — total spend across all customers / features.
- Per-customer — one customer, all features.
- Per-feature — one feature across all customers.
- Per-customer-per-feature — most specific.
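The stacking rule can be sketched as "collect every budget that applies, then require all of them to pass". The `Budget` class, its dollar fields, and the strict `spent < limit` comparison below are assumptions made for illustration; the actual data model is not described on this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Budget:
    """Hypothetical budget record; None in a scope field means 'all'."""

    limit_usd: float
    spent_usd: float
    customer_id: Optional[str] = None  # None: applies to every customer
    feature: Optional[str] = None      # None: applies to every feature

    def applies_to(self, customer_id: str, feature: str) -> bool:
        return (
            self.customer_id in (None, customer_id)
            and self.feature in (None, feature)
        )

    def satisfied(self) -> bool:
        # Assumption: a budget is satisfied while spend is strictly below the limit.
        return self.spent_usd < self.limit_usd


def is_allowed(budgets: List[Budget], customer_id: str, feature: str) -> bool:
    """A request passes only if every applicable budget is satisfied."""
    return all(
        b.satisfied() for b in budgets if b.applies_to(customer_id, feature)
    )


budgets = [
    Budget(limit_usd=1000, spent_usd=400),                  # workspace-wide
    Budget(limit_usd=50, spent_usd=50, customer_id="acme"), # per-customer
    Budget(limit_usd=200, spent_usd=10, feature="agent"),   # per-feature
]
```

Here `is_allowed(budgets, "acme", "agent")` is `False` because the per-customer budget is exhausted, even though the workspace and feature budgets have room. `is_allowed(budgets, "globex", "agent")` is `True`.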
Example: cap a customer's agent usage
    import { Margint, BudgetExceededError } from '@margint-ai/sdk'

    const m = new Margint({ apiKey: process.env.MARGINT_API_KEY! })

    // Runs inside your request handler; `openai`, `user`, and `prompt` come from your app.
    try {
      const res = await m.guardedCall(
        { customerId: user.id, feature: 'agent' },
        () => openai.chat.completions.create({
          model: 'gpt-4o',
          messages: [{ role: 'user', content: prompt }]
        })
      )
      return res
    } catch (err) {
      if (err instanceof BudgetExceededError) {
        // Decide how to respond: serve a cached result, show a modal, etc.
        return { error: 'You\'ve hit your monthly agent quota.' }
      }
      throw err
    }
Example: graceful downgrade
    from margint import BudgetExceededError

    # `m` is your Margint client and `client` is your OpenAI client, both created elsewhere.
    def ask(user_id, prompt):
        try:
            return m.guarded_call(
                customer_id=user_id,
                feature="chat",
                fn=lambda: client.chat.completions.create(
                    model="gpt-4o",
                    messages=[{"role": "user", "content": prompt}],
                ),
            )
        except BudgetExceededError:
            # The caller never sees the breach: fall back to a cheaper model
            return client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": prompt}],
            )
What guardedCall does not do
- It does not auto-track the call. To get both checking and tracking, call a wrapped client inside the guardedCall closure:

    const guarded = m.wrap(openai, { customerId: user.id, feature: 'agent' })
    const res = await m.guardedCall(
      { customerId: user.id, feature: 'agent' },
      () => guarded.chat.completions.create({ ... })
    )

- It does not replay queued requests when the budget resets. If the call was blocked, the caller sees the error, and it's up to your code to retry.
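Since blocked calls are not replayed, any retry has to live in your own code. One possible shape is a small in-memory queue that defers blocked calls and retries them when you decide the budget has probably reset. `DeferredCalls` is a hypothetical helper, not part of the SDK, and the local `BudgetExceededError` is a stand-in so the sketch runs without it. Each queued item is assumed to be a zero-argument function that performs the guarded call.

```python
from collections import deque
from typing import Any, Callable, Deque, List, Optional


class BudgetExceededError(Exception):
    """Local stand-in for the SDK's error, so this sketch is self-contained."""


class DeferredCalls:
    """Illustrative caller-side queue for calls that were blocked by a budget."""

    def __init__(self) -> None:
        self._pending: Deque[Callable[[], Any]] = deque()

    def pending_count(self) -> int:
        return len(self._pending)

    def run_or_defer(self, call: Callable[[], Any]) -> Optional[Any]:
        """Run the call now; if it is blocked, queue it and return None."""
        try:
            return call()
        except BudgetExceededError:
            self._pending.append(call)
            return None

    def retry_pending(self) -> List[Any]:
        """Retry each queued call once; calls that are still blocked stay queued."""
        results: List[Any] = []
        for _ in range(len(self._pending)):
            call = self._pending.popleft()
            try:
                results.append(call())
            except BudgetExceededError:
                self._pending.append(call)
        return results
```

You would call `retry_pending()` from your own scheduler, for example shortly after the budget period rolls over. An in-memory queue loses its contents on restart, so a durable job queue may be a better fit for anything that must not be dropped.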