Step-by-Step Guide to Using madsql Split-Statements

This is Part 1 of a series on madsql, a deterministic CLI tool for SQL dialect migration. The series covers split-statements, infer-schema, and convert in the order you’d reach for them on a real migration project.

The instinct on a migration project is to jump straight to conversion. You have a T-SQL script, you need Oracle SQL, so you run the converter. This works fine for small, clean input. It breaks down on real-world migration assets (vendor drops, application SQL bundles, legacy schema scripts), where the first thing you actually need to know is what’s in the file.

split-statements is the right first step. It breaks a multi-statement SQL file into one numbered file per statement without changing the dialect. That gives you a clear inventory of the work ahead, isolates any parse problems before conversion begins, and produces a stable set of individually-addressable statements you can replay, review, or investigate selectively.


When to Use split-statements

Reach for split-statements when:

  • You’ve received a large SQL script from a vendor or another team
    and need to understand what’s in it before doing anything else
  • You want to commit individual statements to source control rather than one monolithic file
  • A conversion run is failing and you need to identify exactly which
    statement index is the problem
  • You need to prepare SQL for selective replay against a target
    database, running statement 7 without running 1 through 6
  • You want to split now and convert later, keeping the two operations
    separate and auditable

It’s also useful as a pure diagnostic: if split-statements produces clean output for a file, you have confidence the dialect parser can handle that input before you invest in a full conversion run.
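The selective-replay case above can be sketched as a small loop over the numbered statement files. The split directory name and the client invocation are assumptions; substitute your own paths and tool:

```shell
# Hypothetical selective replay of statements 7 through 9. The split
# directory name is assumed; the client call is commented out so you
# can plug in your own (sqlcmd, sqlplus, psql, ...).
for n in 7 8 9; do
  f=$(printf './split/vendor-drop/%04d_stmt.sql' "$n")
  echo "would replay $f"
  # sqlcmd -S "$TARGET_SERVER" -i "$f"
done
```

Because every statement lives in its own numbered file, the range is just a loop bound: statement 7 runs without 1 through 6 ever being touched.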


Basic Usage

Split a single file:

madsql split-statements \
  --in ./migration.sql \
  --out ./split

--out is always required: split-statements writes one file per statement and needs a directory to write into.

Provide a dialect hint when the file uses dialect-specific syntax that might confuse the parser. For T-SQL:

madsql split-statements \
  --source tsql \
  --in ./migration.sql \
  --out ./split

Split a directory tree; the relative structure is preserved under --out:

madsql split-statements \
  --source tsql \
  --in ./sql \
  --out ./split

Read from stdin when you want to split a quick ad-hoc payload:

cat migration.sql | madsql split-statements --source tsql --out ./split

Understanding the Output Layout

Given an input file at ./examples/input/mssql/example_queries.sql, the output lands at:

./split/
└── examples/
    └── input/
        └── mssql/
            └── example_queries/
                ├── 0001_stmt.sql
                ├── 0002_stmt.sql
                ├── 0003_stmt.sql
                ├── 0004_stmt.sql
                ├── 0005_stmt.sql
                ├── 0006_stmt.sql
                ├── 0007_stmt.sql
                ├── 0008_stmt.sql
                └── 0009_stmt.sql

Three things are worth noting about this layout:

First, the relative path of the input file is preserved under --out. If you’re splitting multiple files from a directory tree, every file lands in its own subdirectory with its own numbered statements, no collisions, no flattening.

Second, statement indexes are padded with leading zeros (0001, 0002, …) so directory listings sort correctly regardless of statement count.

Third, and critically for debugging: indexes are stable even when statements fail. If statement 4 fails during a later conversion run, 0001_stmt.sql, 0002_stmt.sql, and 0003_stmt.sql still exist. 0004_stmt.sql is absent. You know immediately which statement to investigate without reading through a log file.
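That gap-hunting step is easy to script. A minimal sketch, assuming only the documented zero-padded NNNN_stmt.sql naming (the example directory is from the layout above):

```shell
# Locate the first missing statement index under a split directory.
# A gap usually marks the statement that failed during a later run.
first_gap() {
  dir=$1
  n=1
  while [ -e "$(printf '%s/%04d_stmt.sql' "$dir" "$n")" ]; do
    n=$((n + 1))
  done
  echo "$n"
}

echo "first missing index: $(first_gap ./split/examples/input/mssql/example_queries)"
```

In the scenario above (0001 through 0003 present, 0004 absent), first_gap prints 4, the index to investigate.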


Formatting the Output

Compact single-line SQL is the default. Use --pretty when you want multiline formatted output, which is useful for code review:

madsql split-statements \
  --source tsql \
  --in ./migration.sql \
  --out ./split \
  --pretty

Note that pretty rendering depends on SQLGlot being able to parse the statement. When madsql falls back to sqlparse for boundary detection (covered below), pretty rendering isn’t available for that input.


The sqlparse Fallback

madsql tries SQLGlot first for parsing and boundary detection. If SQLGlot can’t parse a split-only input (typically because of unsupported syntax or non-standard constructs), madsql falls back to sqlparse for boundary detection on that input.

The fallback matters for two reasons:

  • You still get per-statement files even when full dialect parsing
    fails. The files won’t be pretty-printed, but the boundaries are
    correct.
  • The --report flag will note which files used fallback parsing.
    That’s a useful signal: a file that needed sqlparse fallback is a
    candidate for closer review before conversion.

To surface this clearly, always add --report to split runs on files you haven’t inspected before:

madsql split-statements \
  --source tsql \
  --in ./migration.sql \
  --out ./split \
  --report

The Markdown report is written to ./split with a timestamp prefix:
YYYYMMDD-HHMMSS-madsql-split-statements-report.md.
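Because the prefix is a sortable timestamp, the newest report can be picked up without parsing dates. A sketch (report directory assumed to be ./split):

```shell
# The YYYYMMDD-HHMMSS prefix sorts chronologically, so the
# lexicographically last filename is the most recent report.
latest=$(ls ./split/*-madsql-split-statements-report.md 2>/dev/null | tail -n 1)
echo "latest report: ${latest:-none found}"
```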


Combining split-statements With Schema Inference

You can generate a schema artifact as a side output of the split run, without running a separate command. This is useful when you want a quick schema inventory at the same time as your statement breakdown:

madsql split-statements \
  --source tsql \
  --in ./migration.sql \
  --out ./split \
  --infer-schema \
  --infer-schema-format json

The JSON schema artifact lands at the base of --out with a deterministic name: inferred_schema-tsql.json. For DDL output instead:

madsql split-statements \
  --source tsql \
  --in ./migration.sql \
  --out ./split \
  --infer-schema

This writes inferred_schema-tsql.sql. Schema inference is covered in depth in Part 2 of this series.


Practical Example: Pre-Migration Script Inventory

You’ve received a large T-SQL schema and data script from another team. Before converting anything, split it to understand what you’re working with:

madsql split-statements \
  --source tsql \
  --in ./vendor-drop.sql \
  --out ./inventory \
  --pretty \
  --report

This gives you:

  • One numbered file per statement in ./inventory/
  • A Markdown report showing total statement count, success rate, and
    any fallback parsing usage

From the numbered files you can immediately answer: how many statements are DDL vs DML, are there any statements the parser couldn’t handle cleanly, and which indexes correspond to the tables you care about most.
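The DDL-vs-DML question can be answered mechanically from the numbered files. A rough sketch (first-keyword matching only; the patterns and directory are illustrative, and paths are assumed to contain no spaces):

```shell
# Rough DDL vs DML tally over split statement files: classify each file
# by its first keyword. Extend the patterns (MERGE, TRUNCATE, GRANT, ...)
# for your workload.
tally() {
  ddl=0; dml=0; other=0
  for f in $(find "$1" -name '*_stmt.sql' 2>/dev/null | sort); do
    first=$(awk 'NF { print toupper($1); exit }' "$f")
    case $first in
      CREATE|ALTER|DROP)           ddl=$((ddl + 1)) ;;
      SELECT|INSERT|UPDATE|DELETE) dml=$((dml + 1)) ;;
      *)                           other=$((other + 1)) ;;
    esac
  done
  echo "DDL: $ddl  DML: $dml  other: $other"
}

tally ./inventory
```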

Once you’re satisfied with the split output, convert is the next step; you can point it at the split directory rather than the original file if you want to convert selectively.


Common Pitfalls

Forgetting --source on dialect-specific files. T-SQL batch separators (GO), Oracle block terminators (/), and other dialect-specific constructs can trip up the parser when no dialect hint is given. If the statement count looks wrong or the split output looks merged, add --source.

Expecting pretty output from fallback-parsed statements. If a statement needed sqlparse fallback, the output won’t be reformatted; it’s the raw boundary-detected text. The --report flag tells you which files this applies to.

Not using --report on unfamiliar input. The Markdown report is low cost and high value on files you haven’t seen before. Make it a habit.

Files outside the current working directory. Absolute input paths outside the current working directory are mapped under an _external/... prefix in the output so the layout stays deterministic. If your output looks like it’s under an unexpected subdirectory, this is why.

Re-running without --overwrite. madsql won’t overwrite existing output files by default. Add --overwrite when you need to replace a previous run’s output.


CI Integration Notes

split-statements is lightweight and fast, which makes it a good candidate for early-stage CI checks on SQL repositories.

A useful pattern: run split-statements as a parse validation step before conversion. If it exits with code 1, the JSON error report tells you exactly which file and which statement index failed, machine-readable and easy to surface in a CI summary.

madsql split-statements \
  --source tsql \
  --in ./sql \
  --out ./split \
  --errors split-errors.json \
  --report \
  --continue

Exit codes:

  • 0 — all statements split cleanly
  • 1 — completed with parse errors recorded
  • 2 — fatal CLI misuse (invalid flags, missing required arguments)
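In a CI script those codes can drive the job outcome directly. A sketch, with a stub standing in for the real invocation so the branching is visible:

```shell
# Sketch of branching on the documented exit codes in CI. run_split is
# a stub here (returns 1 to simulate recorded parse errors); in a real
# job it would be the madsql invocation shown above.
run_split() { return 1; }

run_split
rc=$?
case $rc in
  0) echo "split clean" ;;
  1) echo "parse errors recorded; see split-errors.json" ;;
  2) echo "fatal: CLI misuse" >&2 ;;
esac
```

With the stub returning 1, this prints the parse-errors branch; in CI you would typically surface split-errors.json in the job summary at that point.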

To keep a CI job green while still capturing diagnostics for review:

madsql split-statements \
  --source tsql \
  --in ./sql \
  --out ./split \
  --errors split-errors.json \
  --ignore-errors

--ignore-errors returns exit code 0 even when errors are recorded.
Diagnostics still go to stderr and to split-errors.json.


Next in the series, Part 2: infer-schema, recovering CREATE TABLE DDL from a SQL workload when your source schema is incomplete or missing.

