sieve

Filter and transform a mysqldump stream.

This command processes mysqldump output, potentially filtering or transforming the output based on the provided command line options.

sieve effective works in two modes:

  • streaming; mysqldump is read from --input-file and written to stdout possibly with different output depending on the provided options.
  • directory; mysqldump is read from --input-file and split into separate files in the requested directory. This allows converting a large dump in a file-per-table easily. Files output in this mode are additionally filtered through --compress-command and are processed through gzip --fast by default so the output is compressed on disk by default.

Usage

Usage: dbsake sieve [options]

  Filter and transform mysqldump output.

  sieve can extract single tables from a mysqldump file and perform useful
  transformations, such as adding indexes after the table data is loaded for
  InnoDB tables, where such indexes can be created much more efficiently
  than the default incremental rebuild that mysqldump performs.

Options:
  -F, --format <name>             Select the output format (directory, stream)
  -C, --directory <path>          Specify output directory when
                                  --format=directory
  -i, --input-file <path>         Specify input file to process instead of
                                  stdin
  -z, --compress-command <name>   Specify compression command when
                                  --format=directory
  -t, --table <glob>              Only output tables matching the given glob
                                  pattern
  -T, --exclude-table <glob>      Excludes tables matching the given glob
                                  pattern
  --defer-indexes                 Add secondary indexes after loading table
                                  data
  --defer-foreign-keys            Add foreign key constraints after loading
                                  table data
  --write-binlog / --no-write-binlog
                                  Include SQL_LOG_BIN = 0 in output to disable
                                  binlog
  --table-schema / --no-table-schema
                                  Include/exclude table schema from output.
  --table-data / --no-table-data  Include/exclude table data from output
  --routines / --no-routines      Include / exclude database routines from
                                  output
  --events / --no-events          Include / exclude database events from
                                  output
  --triggers / --no-triggers      Include/exclude table triggers from output
  --master-data / --no-master-data
                                  Uncomment/comment CHANGE MASTER in input, if
                                  present
  -O, --to-stdout                 Force output on stdout, even to a terminal.
  -?, --help                      Show this message and exit.

Example

$ mysqldump --routines sakila | dbsake sieve --format=directory --directory=backups/
$ tree backups
backups
└── sakila
    ├── actor.sql.gz
    ├── address.sql.gz
    ├── category.sql.gz
    ├── city.sql.gz
    ├── country.sql.gz
    ├── customer.sql.gz
    ├── film_actor.sql.gz
    ├── film_category.sql.gz
    ├── film.sql.gz
    ├── film_text.sql.gz
    ├── inventory.sql.gz
    ├── language.sql.gz
    ├── payment.sql.gz
    ├── rental.sql.gz
    ├── routines.ddl.gz
    ├── staff.sql.gz
    ├── store.sql.gz
    └── views.ddl.gz

1 directory, 18 files

Options

Changed in version 2.0.0: Renamed split-mysqldump to sieve; Significant rewrite of functionality.

Changed in version 2.0.0: Remove –regex option in favor of -t/–table and -T/–exclude-table option which accepts globs.

-F, --format <name>

Output file format. Must be one of ‘stream’ or ‘directory’. If set to ‘stream’, output will be written on stdout. Unless –force is also specified the sieve command with refuse to write to a terminal.

If set to ‘directory’, output will be written to the path specified by the --directory option, with a file per table.

New in version 2.0.0.

-C, --directory <output directory>

Path where the sieve command should create output files. Ignored if --format is set to ‘stream’. The sieve command will create this path if it does not already exist.

Defaults to ‘.’ - the current working directory.

-i, --input-file <path>

Input file to read mysqldump input from. Default to “-” and reads from stdin. This must be an uncompressed data source, so to process an already compressed .sql.gz file you might run it through “zcat backup.sql.gz | dbsake sieve [options…]”

New in version 2.0.0.

-z, --compress-command <command>

Filter output files through this command. If --format is not set to ‘directory’, then this option is ignored. The sieve command will detect most common compression command and create an appropriate extension on the output files. For example, –compress-command=gzip will create .sql.gz files under the path specified by --directory.

Defaults to “gzip -1”.

Changed in version 2.0.0: -f/–filter-command was renamed to -z/–compress-command

-t, --table <glob pattern>

f --table is specified, then only tables matching the provided glob pattern will be included in the output of the sieve command. Each table is qualified by the database name in “database.table” format and then compared against the glob pattern. For example, to include all tables in the “mysql” database you would specify –table=”mysql.*”.

This option may be specified multiple times and sieve will include any table that matches at least one of these options so long as the table does not also match an --exclude-table option.

If no –table options are provided, all tables are included in the output that do not otherwise match an --exclude-table pattern.

New in version 2.0.0.

-T, --exclude-table <glob pattern>

If --exclude-table is specified, then only tables not matching the provided glob pattern will be included in the output of the sieve command. Each table is qualified by the database name in “database.table” format and then compared against the glob pattern. For example, to exclude the mysql.user table from output you would specify the option: “–exclude-table=mysql.user”.

This option may be specified multiple times and sieve will include any table that matches at least one of these options so long as the table does not also match an --exclude-table option.

If no --exclude-table options are provided, all tables are included in the output that match at least one --table pattern, or all output is included if neither --exclude-table or --table options are provided.

New in version 2.0.0.

--defer-indexes

This option rewrites the output of CREATE TABLE statements and arranges for secondary indexes to be created after the table data is loaded. This causes an additional ALTER TABLE statement to be output after the table data section of each table, when there is at least one secondary index to be added.

If there are foreign key constraints on the table, associated indexes will not be deferred unless the --defer-foreign-keys option is also specified.

This option only applies to InnoDB tables and is only efficient on MySQL 5.1+ (if the innodb plugin is enabled) or on MySQL 5.5+ (default InnoDB engine), where the fast alter path may be used.

--defer-foreign-keys

This option rewrites the output of CREATE TABLE statements and adds foreign key constraints after the table data is loaded. This is primarily useful to allow deferring secondary indexes with associated foreign keys.

This option only makes sense if reloading a dump into MySQL 5.6+, othrewise adding indexes will require a full table rebuild and will end up being much slower than just reloading the mysqldump unaltered.

--write-binlog / --no-write-binlog

If --no-write-binlog is set, sieve will output a SET SQL_LOG_BIN=0 SQL command to the beginning of the dump to avoid writing to the binary log when reloading the resulting output. Use the option with care, as the resulting dump will not replicate to a slave if this option is set.

New in version 2.0.0.

--table-schema / --no-table-schema

If --no-table-schema is used, sieve will not output any CREATE TABLE statements and will not output any CREATE VIEW statements. Only table data, routines and events will be output (as dictated by other options).

New in version 2.0.0.

--table-data / --no-table-data

If --skip-table-data is set, sieve will not output any table data sections and only output DDL. Reloading such a dump will result in empty tables.

New in version 2.0.0.

--master-data / --no-master-data

If the --master-data option is set, any commented out CHANGE MASTER statements will be uncommented in the output. This is useful of setting up a replication slave from a backup created using –master-data=2.

If the --no-master-data option is set, any CHANGE MASTER statements will be commented out in the output, ensuring no CHANGE MASTER is run. This is useful for dumps created with –master-data[=1].

New in version 2.0.0.

--routines / --no-routines

Include or exclude routines from the output, if routines were found in the input file. By default routines are not excluded and will ony be excluded if the –no-routines option is specified. The –routines option used to cancel a previous –no-routines option.

New in version 2.0.0.

--events / --no-events

Include or exclude events from the output, if events were found in the input file. By default events are not excluded and will ony be excluded if the –no-events option is specified. The –events option can be used to cancel a previous –no-events option.

New in version 2.0.0.

--triggers / --no-triggers

Include or exclude table triggers from the output, if triggers were found in the input file. By default triggers are included for any output tables (subject to table filtering). –no-triggers will disable output for all triggers and –triggers can be used to cancel the effects of an earlier –no-triggers option.

New in version 2.0.0.

-O, --to-stdout

The --to-stdout option will force output to be written to stdout even if stdout appears to be an active terminal. This can be useful in cases when filtering the mysqldump output or when not outputing large amounts of data and want to read it directly on the terminal. By default, the sieve command will abort if it detects that it would output to a terminal and –to-stdout is not used.