port-sqlite-to-rust · best: 100% · GPT-5.4 xhigh · 5 hrs · $52
Build a SQLite-compatible SQL engine in Rust from scratch. Match the reference engine's behavior across 5.8 million sqllogictest records spanning DDL, joins, subqueries, and the indexed milestones that separate naive implementations from real query planners.
The agent starts with a scaffolded Cargo project: a working JSONL request/response driver,
type-system stubs, an empty parser with all the AST types pre-declared, and an executor with
"not implemented" stubs for every SQL statement variant. The full SQLite source amalgamation
(~256,000 lines of C) sits in reference/ as study material.
The job is to fill in the parser, the executor, the storage layer, the scalar and aggregate
functions, and an index-aware query planner — to the point where the engine matches real SQLite's
behavior on the sqllogictest corpus. Linking against rusqlite or shelling out
to sqlite3 is explicitly forbidden.
Milestones unlock sequentially. The next milestone is only attempted when the current one passes at least 60%. The first eight milestones are pure SQL semantics (~11k records total). The next seven add indexed operations (~2M records). The final three are the long tail: ~3.7M records of scalar expressions, aggregates, and scaled joins.
| # | Milestone | Suite | Tests |
|---|---|---|---|
| 1 | DDL Foundations | ddl_foundations | 69 |
| 2 | UPDATE Mutations | update_mutations | 27 |
| 3 | IN / NOT IN with NULLs | membership_null_logic | 267 |
| 4 | SELECT: CASE and Subqueries | select_case_subqueries | 1,031 |
| 5 | SELECT: NULLs and COALESCE | select_nulls_coalesce | 1,031 |
| 6 | SELECT: EXISTS and Correlated | select_exists_correlated | 3,351 |
| 7 | SELECT: UNION / EXCEPT / INTERSECT | select_set_operations | 3,857 |
| 8 | SELECT: Multi-Table Joins | select_multitable_joins | 1,436 |
| 9 | Indexed DELETE and Filter | indexed_delete_filter | 160,942 |
| 10 | Indexed IN / NOT IN | indexed_membership | 132,907 |
| 11 | Indexed BETWEEN | indexed_range | 124,618 |
| 12 | Indexed ORDER BY | indexed_ordering | 808,122 |
| 13 | Indexed Predicate Commutativity | indexed_commutativity | 514,321 |
| 14 | Indexed Randomized Predicates | indexed_randomized | 208,168 |
| 15 | Indexed Views | indexed_views | 109,229 |
| 16 | Scalar Expressions | expressions_scalar | 1,200,366 |
| 17 | Aggregate Functions | aggregates | 1,292,350 |
| 18 | Multi-Table Joins (scaled) | joins_multitable | 1,244,323 |
| Total | 5,806,415 |
Every production sqlite run, sorted by peak milestone-completion percentage. "Wall" is when the last useful state was recorded — for runs that ended in a platform failure, this is when the failure happened, not the configured budget.
| Model | Agent | Budget | Wall | Peak % | Tests | Cost | Tokens | Outcome |
|---|---|---|---|---|---|---|---|---|
| GPT-5.4 (xhigh) | codex | 12 hrs | 5 hrs | 100% | 5,806,415 / 5,806,415 | $52 | 139 M | all tests pass |
| GPT-5.5 (xhigh) | codex | 12 hrs | 8 hrs 54 min | 99.91% | 5,803,177 / 5,806,415 | $258 | 386 M | platform bug |
| GPT-5.4-mini | codex | 2 hrs | 2 hrs | 68.75% | 1,783,123 / 5,806,415 | $9 | 64 M | time budget |
| GPT-5.4-mini | codex | 12 hrs | 7 hrs 42 min | 58.17% | 431,937 / 5,806,415 | $20 | 162 M | model gave up |
| GPT-5.5 (xhigh) | codex | 12 hrs | 1 hr 24 min | 44.17% | 164,453 / 5,806,415 | $33 | 48 M | platform crash |
| DeepSeek-v4-pro | opencode | 12 hrs | 4 hrs 54 min | 36.59% | 8,038 / 5,806,415 | $4.61 | 155 M | model gave up |
| GPT-5.4-mini | codex | 12 hrs | 42 min | 14.44% | 325 / 5,806,415 | $3 | 23 M | platform crash |
| GPT-5.4-mini | codex | 2 hrs | 18 min | 0% | 0 / 5,806,415 | $2.39 | 22.7 M | sandbox lost |
Tokens are total cumulative reported tokens: Codex input includes cached reads; OpenCode Go cache reads/writes are added from provider usage. Terminal labels come from platform state.