This is Part 1 of a series on madsql, a deterministic CLI tool for SQL dialect migration. The series covers split-statements, infer-schema, and convert in the order you’d reach for them on a real migration project.
The instinct on a migration project is to jump straight to conversion. You have a T-SQL script, you need Oracle SQL, so you run the converter. This works fine for small, clean input. It breaks down on real-world migration assets (vendor drops, application SQL bundles, legacy schema scripts), where the first thing you actually need to know is what’s in the file.
split-statements is the right first step. It breaks a multi-statement SQL file into one numbered file per statement without changing the dialect. That gives you a clear inventory of the work ahead, isolates any parse problems before conversion begins, and produces a stable set of individually-addressable statements you can replay, review, or investigate selectively.
When to Use split-statements
Reach for split-statements when:
- You’ve received a large SQL script from a vendor or another team and need to understand what’s in it before doing anything else
- You want to commit individual statements to source control rather than one monolithic file
- A conversion run is failing and you need to identify exactly which statement index is the problem
- You need to prepare SQL for selective replay against a target database, running statement 7 without running 1 through 6
- You want to split now and convert later, keeping the two operations separate and auditable
It’s also useful as a pure diagnostic: if split-statements produces clean output for a file, you have confidence the dialect parser can handle that input before you invest in a full conversion run.
Basic Usage
Split a single file:
madsql split-statements --in ./migration.sql --out ./split
--out is always required: split-statements writes one file per statement and needs a directory to write into.
Provide a dialect hint when the file uses dialect-specific syntax that might confuse the parser. For T-SQL:
madsql split-statements --source tsql --in ./migration.sql --out ./split
Split a directory tree; the relative structure is preserved under --out:
madsql split-statements --source tsql --in ./sql --out ./split
Read from stdin when you want to split a quick ad-hoc payload:
cat migration.sql | madsql split-statements --source tsql --out ./split
Understanding the Output Layout
Given an input file at ./examples/input/mssql/example_queries.sql, the output lands at:
./split/
└── examples/
    └── input/
        └── mssql/
            └── example_queries/
                ├── 0001_stmt.sql
                ├── 0002_stmt.sql
                ├── 0003_stmt.sql
                ├── 0004_stmt.sql
                ├── 0005_stmt.sql
                ├── 0006_stmt.sql
                ├── 0007_stmt.sql
                ├── 0008_stmt.sql
                └── 0009_stmt.sql
Three things are worth noting about this layout:
First, the relative path of the input file is preserved under --out. If you’re splitting multiple files from a directory tree, every file lands in its own subdirectory with its own numbered statements, no collisions, no flattening.
Second, statement indexes are padded with leading zeros (0001, 0002, …) so directory listings sort correctly regardless of statement count.
Third, and critically for debugging: indexes are stable even when statements fail. If statement 4 fails during a later conversion run, 0001_stmt.sql, 0002_stmt.sql, and 0003_stmt.sql still exist. 0004_stmt.sql is absent. You know immediately which statement to investigate without reading through a log file.
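Because the indexes are stable, spotting the gap can be scripted. Here’s a minimal POSIX shell sketch of my own (not part of madsql; it only assumes the four-digit NNNN_stmt.sql naming shown above) that prints the first missing index in a split directory:

```shell
# Print the first missing NNNN_stmt.sql index in a split directory.
# Assumes the four-digit zero-padded naming described above.
find_missing_stmt() {
  dir="$1"
  i=1
  while :; do
    f=$(printf '%s/%04d_stmt.sql' "$dir" "$i")
    if [ ! -e "$f" ]; then
      printf '%04d\n' "$i"
      return 0
    fi
    i=$((i + 1))
  done
}
```

Pointed at the example directory above with 0004_stmt.sql absent, this would print 0004; with all nine files present, it prints the next unused index.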
Formatting the Output
Compact single-line SQL is the default. Use --pretty when you want multiline, formatted output, which is useful for code review:
madsql split-statements --source tsql --in ./migration.sql --out ./split --pretty
Note that pretty rendering depends on SQLGlot being able to parse the statement. When madsql falls back to sqlparse for boundary detection (covered below), pretty rendering isn’t available for that input.
The sqlparse Fallback
madsql tries SQLGlot first for parsing and boundary detection. If SQLGlot can’t parse a split-only input (typically because of unsupported syntax or non-standard constructs), madsql falls back to sqlparse for boundary detection on that input.
The fallback matters for two reasons:
- You still get per-statement files even when full dialect parsing fails. The files won’t be pretty-printed, but the boundaries are correct.
- The --report flag will note which files used fallback parsing. That’s a useful signal: a file that needed sqlparse fallback is a candidate for closer review before conversion.
To surface this clearly, always add --report to split runs on files you haven’t inspected before:
madsql split-statements --source tsql --in ./migration.sql --out ./split --report
The Markdown report is written to ./split with a timestamp prefix:
YYYYMMDD-HHMMSS-madsql-split-statements-report.md.
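Since the timestamp prefix sorts lexicographically, the most recent report is easy to pick up in a script. A small helper of my own (not a madsql feature):

```shell
# Return the path of the most recent split report in a directory.
# Timestamp prefixes sort lexicographically, so a plain sort suffices.
latest_report() {
  ls "$1"/*-madsql-split-statements-report.md 2>/dev/null | sort | tail -n 1
}
```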
Combining split-statements With Schema Inference
You can generate a schema artifact as a side output of the split run, without running a separate command. This is useful when you want a quick schema inventory at the same time as your statement breakdown:
madsql split-statements --source tsql --in ./migration.sql --out ./split --infer-schema --infer-schema-format json
The JSON schema artifact lands at the base of --out with a deterministic name: inferred_schema-tsql.json. For DDL output instead:
madsql split-statements --source tsql --in ./migration.sql --out ./split --infer-schema
This writes inferred_schema-tsql.sql. Schema inference is covered in depth in Part 2 of this series.
Practical Example: Pre-Migration Script Inventory
You’ve received a large T-SQL schema and data script from another team. Before converting anything, split it to understand what you’re working with:
madsql split-statements --source tsql --in ./vendor-drop.sql --out ./inventory --pretty --report
This gives you:
- One numbered file per statement in ./inventory/
- A Markdown report showing total statement count, success rate, and any fallback parsing usage
From the numbered files you can immediately answer: how many statements are DDL vs DML, are there any statements the parser couldn’t handle cleanly, and which indexes correspond to the tables you care about most.
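The first of those questions, how many statements are DDL vs DML, can be answered with a rough shell tally over the numbered files. This is a leading-keyword heuristic of my own, not a madsql feature:

```shell
# Rough DDL vs DML tally over split statement files.
# Heuristic: classify each file by its leading keyword.
count_kinds() {
  dir="$1"
  ddl=0 dml=0 other=0
  for f in "$dir"/*_stmt.sql; do
    [ -e "$f" ] || continue
    kw=$(head -c 64 "$f" | tr '[:lower:]' '[:upper:]')
    case "$kw" in
      CREATE*|ALTER*|DROP*|TRUNCATE*) ddl=$((ddl + 1)) ;;
      INSERT*|UPDATE*|DELETE*|MERGE*|SELECT*) dml=$((dml + 1)) ;;
      *) other=$((other + 1)) ;;
    esac
  done
  echo "DDL=$ddl DML=$dml other=$other"
}
```

It’s deliberately crude (comments or leading whitespace before the keyword will land in `other`), but it gives a first-pass breakdown in seconds.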
Once you’re satisfied with the split output, convert is the next step; you can point it at the split directory rather than the original file if you want to convert selectively.
Common Pitfalls
Forgetting --source on dialect-specific files. T-SQL batch separators (GO), Oracle block terminators (/), and other dialect-specific constructs can trip up the parser when no dialect hint is given. If the statement count looks wrong or the split output looks merged, add --source.
Expecting pretty output from fallback-parsed statements. If a statement needed sqlparse fallback, the output won’t be reformatted; it’s the raw boundary-detected text. The --report flag tells you which files this applies to.
Not using --report on unfamiliar input. The Markdown report is low cost and high value on files you haven’t seen before. Make it a habit.
Files outside the current working directory. Absolute input paths outside the current working directory are mapped under an _external/... prefix in the output so the layout stays deterministic. If your output looks like it’s under an unexpected subdirectory, this is why.
Re-running without --overwrite. madsql won’t overwrite existing output files by default. Add --overwrite when you need to replace a previous run’s output.
CI Integration Notes
split-statements is lightweight and fast, which makes it a good candidate for early-stage CI checks on SQL repositories.
A useful pattern: run split-statements as a parse validation step before conversion. If it exits with code 1, the JSON error report tells you exactly which file and which statement index failed, in a machine-readable form that’s easy to surface in a CI summary.
madsql split-statements --source tsql --in ./sql --out ./split --errors split-errors.json --report --continue
Exit codes:
- 0: all statements split cleanly
- 1: completed with parse errors recorded
- 2: fatal CLI misuse (invalid flags, missing required arguments)
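In a CI script, those codes can be mapped to readable log lines with a small wrapper. This is a sketch; the wrapped command stands in for whatever madsql invocation you use:

```shell
# Map madsql's documented exit codes to log lines for a CI summary.
# "$@" is the full command to run (e.g. a madsql split-statements call).
run_split() {
  "$@"
  status=$?
  case "$status" in
    0) echo "split clean" ;;
    1) echo "completed with parse errors; inspect the --errors report" ;;
    2) echo "CLI misuse: check flags and required arguments" >&2 ;;
    *) echo "unexpected exit code $status" >&2 ;;
  esac
  return "$status"
}
```

The wrapper preserves the original exit code, so the CI job still fails or passes exactly as madsql decided.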
To keep a CI job green while still capturing diagnostics for review:
madsql split-statements --source tsql --in ./sql --out ./split --errors split-errors.json --ignore-errors
--ignore-errors returns exit code 0 even when errors are recorded.
Diagnostics still go to stderr and to split-errors.json.
Next in the series, Part 2: infer-schema, recovering CREATE TABLE DDL from a SQL workload when your source schema is incomplete or missing.