Budget guard

Stop a runaway customer from blowing up your AI bill.

The budget guard is the enforcement half of Margint. Analytics without a kill switch just tells you how much money you already lost.

How it works

  1. In the dashboard, configure a budget — workspace-wide, per-customer, or per-feature.
  2. Wrap hot LLM calls in guardedCall() (TS) / guarded_call() (Python).
  3. Before the call, the SDK hits a budget-check endpoint (results are cached for 60 s).
  4. If the customer is over budget, the SDK throws BudgetExceededError — you handle it however you like.

The check is fast. The endpoint is backed by Redis and typically responds in single-digit milliseconds.
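To make the 60-second cache concrete, here is a simplified sketch of how such a client-side TTL cache could work. This is an illustration, not the SDK's actual implementation; `cachedBudgetCheck` and `fetchCheck` are hypothetical names standing in for the SDK's internals:

```typescript
// Illustrative TTL cache for budget-check results (hypothetical, not the real SDK).
type CheckResult = { allowed: boolean }

const TTL_MS = 60_000
const cache = new Map<string, { result: CheckResult; expiresAt: number }>()

async function cachedBudgetCheck(
  key: string,                            // e.g. `${customerId}:${feature}`
  fetchCheck: () => Promise<CheckResult>  // the real HTTP call in the SDK
): Promise<CheckResult> {
  const hit = cache.get(key)
  if (hit && hit.expiresAt > Date.now()) return hit.result  // fresh: skip the network
  const result = await fetchCheck()
  cache.set(key, { result, expiresAt: Date.now() + TTL_MS })
  return result
}
```

The practical consequence: a customer can briefly overshoot a budget by up to 60 seconds of spend before a blocked check takes effect.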

Budget actions

Every budget has an action:

  • warn — budget is over, but the call still runs. The dashboard surfaces the breach and notifications fire. Use for monitoring-only rollouts.
  • block — SDK throws BudgetExceededError. The call does not run. Use this to actually stop spend.

For graceful degradation (fall back to a cheaper model on breach), set the budget to block and handle the thrown error in your own code — see the example below.

Scopes

Budgets stack. A request is allowed only if every applicable budget is satisfied.

  • Workspace-wide — total spend across all customers / features.
  • Per-customer — one customer, all features.
  • Per-feature — one feature across all customers.
  • Per-customer-per-feature — most specific.
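The stacking rule can be sketched as pure logic. The types and function names below are hypothetical (the real evaluation happens server-side); an absent scope field means "applies to all":

```typescript
// Hypothetical sketch of the stacking rule: a request passes only if
// every budget whose scope applies to it is under its limit.
type Scope = { customerId?: string; feature?: string }  // absent field = "all"
type Budget = { scope: Scope; limitUsd: number; spentUsd: number }
type Request = { customerId: string; feature: string }

function applies(scope: Scope, req: Request): boolean {
  return (scope.customerId === undefined || scope.customerId === req.customerId)
      && (scope.feature === undefined || scope.feature === req.feature)
}

function isAllowed(budgets: Budget[], req: Request): boolean {
  // Every applicable budget must be satisfied — one breach blocks the request.
  return budgets
    .filter(b => applies(b.scope, req))
    .every(b => b.spentUsd < b.limitUsd)
}
```

So a customer under their own cap can still be blocked by a breached workspace-wide budget, and vice versa.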

Example: cap a customer's agent usage

import OpenAI from 'openai'
import { Margint, BudgetExceededError } from '@margint-ai/sdk'

const openai = new OpenAI()
const m = new Margint({ apiKey: process.env.MARGINT_API_KEY! })

try {
  const res = await m.guardedCall(
    { customerId: user.id, feature: 'agent' },
    () => openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: prompt }]
    })
  )
  return res
} catch (err) {
  if (err instanceof BudgetExceededError) {
    // Decide how to respond — serve a cached result, show a modal, etc.
    return { error: 'You\'ve hit your monthly agent quota.' }
  }
  throw err
}

Example: graceful downgrade

import os

from margint import Margint, BudgetExceededError
from openai import OpenAI

m = Margint(api_key=os.environ["MARGINT_API_KEY"])
client = OpenAI()

def ask(user_id, prompt):
    try:
        return m.guarded_call(
            customer_id=user_id,
            feature="chat",
            fn=lambda: client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": prompt}],
            ),
        )
    except BudgetExceededError:
        # Caller doesn't know — fall back to a cheaper model
        return client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )

What guardedCall does not do

  • It does not auto-track the call. Wrap the client inside the guardedCall closure if you want both checking and tracking:
    const guarded = m.wrap(openai, { customerId: user.id, feature: 'agent' })
    const res = await m.guardedCall(
      { customerId: user.id, feature: 'agent' },
      () => guarded.chat.completions.create({ ... })
    )
    
  • It does not replay queued requests when the budget resets. If the call was blocked, the caller sees the error — it's up to your code to retry.