
Fintech

IBKR Futures Automation

NLP-driven command interface for futures and options trading — Claude as the command parser, risk gates before every order.

Customer

Self — proof-of-concept for retail trading automation

Timeline

2025–Present

Status

Working prototype

Capability

Agentic AI · Fintech · Prototype

Stack

Python · ib_insync · React · Claude API · NLP · Fintech

Outcome

Working: NLP command interface (parse → risk gate → submit)
99.5%+: target eval accuracy before live deployment
<2s: parse + risk check, latency budget per order
Phase 1: paper trading (current stage)

Customer Context

Who they are and what world they live in

Retail futures and options trading — specifically vertical spreads on micro contracts — requires precise multi-leg order entry across a complex brokerage API. The workflow is: identify a trade setup, calculate position sizing based on account risk tolerance, enter a multi-leg order with specific strikes and expirations, monitor position, and exit on target or stop. Every step that requires switching between mental calculation and the brokerage UI is a place where mistakes happen.
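The position-sizing step above can be sketched as simple fixed-fractional risk arithmetic. This is an illustrative sketch, not the project's actual sizing policy: the 1% risk figure and the example account size are assumptions; the $5-per-point multiplier is the CME contract specification for MES (Micro E-mini S&P 500).

```python
# Sketch of the position-sizing step: fixed-fractional risk.
# The 1% risk figure is an illustrative assumption, not the project's policy.

def contracts_for_risk(account_equity: float, risk_pct: float,
                       stop_points: float, point_value: float) -> int:
    """Max contracts such that a full stop-out loses at most risk_pct of equity."""
    risk_dollars = account_equity * risk_pct
    loss_per_contract = stop_points * point_value
    return int(risk_dollars // loss_per_contract)

# MES (Micro E-mini S&P 500) is $5 per index point per the CME contract spec.
print(contracts_for_risk(25_000, 0.01, 10, 5.0))  # risk $250, $50/contract -> 5
```

Doing this calculation in code rather than mentally is exactly the "switching between mental calculation and the brokerage UI" failure mode the workflow is meant to remove.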

The Problem

The fuzzy ask, translated

The stated goal was 'automate the trading workflow.' The real design challenge: how do you build a natural-language command interface for a domain where a misinterpreted command costs real money? The LLM has to be right, not just helpful. And 'right' means: correct symbol, correct strike, correct expiration, correct quantity, correct order type — with a risk gate that prevents execution if any parameter is outside bounds.

The Constraints

Time · Budget · Regulatory · Technical · Organizational

01

Real-money trading API — ib_insync has no sandbox mode for futures; once connected to a live account, mistakes execute against real positions

02

Zero tolerance for command misinterpretation — 'buy 2 MES calls at 5400' must parse exactly, not approximately

03

Real-time risk gates — position sizing, account exposure, and margin checks must run before any order touches the API

04

Latency budget — options orders on micro futures are time-sensitive; the NLP parse + risk check must complete in under 2 seconds

05

Phase-based development: paper trading → micro contracts ($50 margin) → scaled positions

Architecture Decisions

What I chose. What I rejected. Why.

Command parsing

Chosen

Claude as NLP parser with strict structured output schema — command → JSON with explicit fields for symbol, action, quantity, strike, expiration, order type

Rejected

Regex-based parser / traditional NLP

Why

Trading commands have infinite natural-language variation. Regex breaks on the second person who uses it. Claude's schema-constrained output gives structured JSON from any valid command phrasing — and refuses to parse ambiguous commands rather than guessing.
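The "refuses to parse ambiguous commands rather than guessing" behavior can be enforced on the receiving side as well: validate the structured JSON before it goes anywhere near the risk gate. A minimal sketch, with illustrative field names (not the project's actual schema):

```python
# Sketch of the strict command schema plus a "refuse rather than guess" check.
# Field names and allowed values are illustrative assumptions.
REQUIRED = {"symbol", "action", "quantity", "strike", "expiration", "order_type"}
ACTIONS = {"buy", "sell"}

def validate_parse(parsed: dict) -> dict:
    """Accept a parsed command only if every field is present and unambiguous."""
    missing = REQUIRED - parsed.keys()
    if missing:
        raise ValueError(f"refusing ambiguous command, missing: {sorted(missing)}")
    if parsed["action"] not in ACTIONS:
        raise ValueError(f"unknown action: {parsed['action']}")
    if not (isinstance(parsed["quantity"], int) and parsed["quantity"] > 0):
        raise ValueError("quantity must be a positive integer")
    return parsed

cmd = {"symbol": "MES", "action": "buy", "quantity": 2,
       "strike": 5400, "expiration": "2025-06-20", "order_type": "limit"}
print(validate_parse(cmd)["symbol"])  # MES
```

Raising on a missing or malformed field means a partial parse can never fall through to order submission with defaults filled in.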

Risk gate placement

Chosen

Risk validation as a blocking step between command parse and order submission — account exposure check, position sizing rules, margin check, all gates must pass

Rejected

Post-execution risk monitoring

Why

The risk gate is the actual hard problem. Post-execution monitoring is a loss management tool, not a risk management tool. Every order that violates risk rules must be blocked before touching the API.
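The blocking-gate pattern can be sketched as a checklist where a single failure stops the order. Gate names and the limit values here are illustrative assumptions, not the project's real thresholds:

```python
# Sketch of the blocking risk gate: every check must pass before an order
# reaches the API. Gate names and limit values are illustrative.

def risk_gate(order: dict, account: dict, limits: dict):
    """Run all gates; return (ok, failed_gate_names). Any failure blocks."""
    checks = [
        ("position_size", order["quantity"] <= limits["max_contracts"]),
        ("exposure", account["open_contracts"] + order["quantity"]
                     <= limits["max_total_contracts"]),
        ("margin", order["quantity"] * limits["margin_per_contract"]
                   <= account["available_margin"]),
    ]
    failed = [name for name, ok in checks if not ok]
    if failed:
        return False, failed  # the order never touches the API
    return True, []

order = {"quantity": 2}
account = {"open_contracts": 1, "available_margin": 500}
limits = {"max_contracts": 3, "max_total_contracts": 4, "margin_per_contract": 50}
print(risk_gate(order, account, limits))  # (True, [])
```

Returning the names of the failed gates, rather than a bare boolean, makes the rejection explainable back to the user in the command interface.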

Development phasing

Chosen

Paper trading (no real orders) → micro contracts (real orders, minimum size) → scaled

Rejected

Full-scale testing

Why

The only way to verify that the system behaves correctly under real API conditions is to run it with real orders at minimum position size. Paper trading tests the logic; micro contracts test the API integration, error handling, and latency under production conditions.
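The phase progression can be made a hard constraint in code rather than a matter of discipline: each phase caps order size and whether live orders are allowed at all. The phase names mirror the plan above; the contract caps are illustrative assumptions:

```python
# Sketch of phase gating: each development phase caps order size and whether
# live orders are permitted at all. Contract caps are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class Phase:
    name: str
    live_orders: bool
    max_contracts: int

PHASES = {
    "paper": Phase("paper", live_orders=False, max_contracts=0),
    "micro": Phase("micro", live_orders=True, max_contracts=1),
    "scaled": Phase("scaled", live_orders=True, max_contracts=5),
}

def allowed(phase: Phase, quantity: int) -> bool:
    """In paper trading nothing is submitted live; later phases cap size."""
    return phase.live_orders and quantity <= phase.max_contracts

print(allowed(PHASES["paper"], 1), allowed(PHASES["micro"], 1))  # False True
```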

The Hard Problem

The one thing that almost broke the deployment

The LLM command parser is not the hard problem. The hard problem is the eval harness for command interpretation accuracy. How do you measure whether 'sell 3 ES put spreads at 5300/5200 for 20 points' is being parsed correctly across 200 command variations before deploying against a live account? Without an eval harness, you're flying blind.

The Fix

Building the eval harness now before scaling. 200 command/intent pairs, automated comparison of parsed JSON against ground truth, coverage across all supported order types and common variations. The system does not move out of paper trading until eval accuracy on the test set exceeds 99.5% and all edge cases (ambiguous expirations, mid-sentence corrections, multi-leg abbreviations) have known behavior.
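The harness itself is small; the value is in the test set. A minimal sketch of the comparison loop, with a stub standing in for the Claude call and a one-case test set standing in for the 200 command/intent pairs:

```python
# Sketch of the eval harness: compare parsed JSON field-by-field against
# ground truth across a fixed command set. The parser stub and single case
# are illustrative; the real set is 200 command/intent pairs.

def evaluate(parser, cases):
    """Return (accuracy, failures); a case passes only on an exact match."""
    failures = []
    for command, expected in cases:
        got = parser(command)
        if got != expected:
            failures.append((command, expected, got))
    accuracy = 1 - len(failures) / len(cases)
    return accuracy, failures

def stub_parser(command):
    # Stand-in for the Claude call with schema-constrained output.
    table = {
        "buy 2 MES calls at 5400":
            {"symbol": "MES", "action": "buy", "quantity": 2, "strike": 5400},
    }
    return table.get(command)

cases = [
    ("buy 2 MES calls at 5400",
     {"symbol": "MES", "action": "buy", "quantity": 2, "strike": 5400}),
]
acc, fails = evaluate(stub_parser, cases)
print(acc)  # 1.0
```

Exact-match comparison is deliberate: a parse that gets the symbol right but the strike wrong is a failure, not partial credit.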

Production Reality

What I had to fix in week 2

ib_insync's async event loop and the Claude API's HTTP client do not share an event loop gracefully. Early builds had race conditions between incoming market data events and the async NLP parse calls. The fix was to separate them: market data runs in a dedicated ib_insync event-loop thread; NLP parsing runs in a separate thread pool fed by a queue; order submission is synchronous on the ib_insync thread.
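The separation pattern, stripped of ib_insync and the Claude client, reduces to a thread pool plus a queue. This is a generic sketch of that pattern under those assumptions, with a stand-in for the blocking API call:

```python
# Sketch of the loop-separation pattern, without ib_insync: the market-data
# side never blocks on NLP; parse jobs run in a thread pool and hand results
# back through a queue. parse_command is a stand-in for the Claude call.
import queue
from concurrent.futures import ThreadPoolExecutor

parse_results: "queue.Queue[dict]" = queue.Queue()
nlp_pool = ThreadPoolExecutor(max_workers=2)  # blocking API calls live here

def parse_command(text: str) -> dict:
    # Stand-in for the blocking NLP API call.
    return {"raw": text, "parsed": True}

def submit_parse(text: str) -> None:
    # Called from the market-data side; returns immediately, never blocks it.
    nlp_pool.submit(lambda: parse_results.put(parse_command(text)))

submit_parse("buy 2 MES calls at 5400")
print(parse_results.get(timeout=5)["parsed"])  # True
```

Keeping the two runtimes in separate threads with a queue between them is the standard way to stop one library's event loop from starving or racing the other's.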

Lessons Carried Forward

What this taught me that I apply to every deployment

01

The risk gate is the actual hard problem in trading automation — LLM command parsing is the easy part

02

Build the eval harness before scaling — 200 command/intent pairs reveal edge cases that a demo never surfaces

03

Separate async event loops for I/O-bound tasks that don't share a runtime — ib_insync and aiohttp are not friends on the same loop

04

Phase-based development is not optional when real money is involved — paper trading tests logic, micro contracts test the API
