Skip to content

Commit

Permalink
Merge pull request #66 from github/execute-on-replica
Browse files Browse the repository at this point in the history
--migrate-on-replica
  • Loading branch information
Shlomi Noach authored Jun 15, 2016
2 parents 574c372 + 1226fa8 commit 60fef6b
Show file tree
Hide file tree
Showing 8 changed files with 35 additions and 17 deletions.
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -43,14 +43,14 @@ It is still OK to connect `gh-ost` directly on master; you will need to confirm
Newcomer? We think you would enjoy building trust with this tool. You can ask `gh-ost` to simulate a migration on a replica -- this will not affect data on master and will not actually do a complete migration. It will operate on a replica, and end up with two tables: the original (untouched), and the migrated. You will have your chance to compare the two and verify the tool works to your satisfaction.

```
gh-ost --conf=.my.cnf --database=mydb --table=mytable --verbose --alter="engine=innodb" --execute --initially-drop-ghost-table --initially-drop-old-table -max-load=Threads_connected=30 --switch-to-rbr --chunk-size=2500 --cut-over=two-step --exact-rowcount --test-on-replica --verbose
gh-ost --conf=.my.cnf --database=mydb --table=mytable --verbose --alter="engine=innodb" --execute --initially-drop-ghost-table --initially-drop-old-table -max-load=Threads_connected=30 --switch-to-rbr --chunk-size=2500 --exact-rowcount --test-on-replica --verbose
```
Please read more on [testing on replica](testing-on-replica.md)

#### Executing on master

```
gh-ost --conf=.my.cnf --database=mydb --table=mytable --verbose --alter="engine=innodb" --execute --initially-drop-ghost-table --initially-drop-old-table -max-load=Threads_connected=30 --switch-to-rbr --chunk-size=2500 --cut-over=two-step --exact-rowcount --verbose
gh-ost --conf=.my.cnf --database=mydb --table=mytable --verbose --alter="engine=innodb" --execute --initially-drop-ghost-table --initially-drop-old-table -max-load=Threads_connected=30 --switch-to-rbr --chunk-size=2500 --exact-rowcount --verbose
```

Note: "executing on master" does not mean you need to _connect_ to the master. `gh-ost` is happy if you connect to a replica; it then figures out the identity of the master and makes the connection itself.
Expand Down
2 changes: 1 addition & 1 deletion build.sh
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
#!/bin/bash
#
#
RELEASE_VERSION="0.9.1"
RELEASE_VERSION="0.9.2"

buildpath=/tmp/gh-ost
target=gh-ost
Expand Down
2 changes: 1 addition & 1 deletion doc/command-line-flags.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ password=123456

##### cut-over

Required. You are asked to explicitly state which cut-over algorithm you wish to use. Please see more discussion on [cut-over](cut-over.md)
Optional. Default is `safe`. See more discussion in [cut-over](cut-over.md)

##### exact-rowcount

Expand Down
8 changes: 5 additions & 3 deletions doc/cut-over.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,9 @@ The [facebook OSC](https://www.facebook.com/notes/mysql-at-facebook/online-schem

`gh-ost` supports various types of cut-over options:

- `--cut-over=safe` - this is the default value if not specified otherwise. The "safe" cut-over algorithm is depicted [here](https://github.com/github/gh-ost/issues/65). It is essentially a multi-step, locking solution, where queries block until table swapping is complete -- when all operates normally. If, for some reason, connections participating in the cut-over are killed throughout the process, the worst-case scenario is a table-outage: a brief moment where the table ceases to exist, followed by an immediate rollback.
- `--cut-over=two-step` - this method is similar to the one taken by the facebook OSC. It's non-blocking but also non-atomic. The original table is first renames and pushed aside, then the ghost table is renamed to take its place. In between the two renames there's a brief period of time where your table just does not exist, and queries will fail.
- `--cut-over=voluntary-lock` - as depicted in [Solving the Facebook-OSC non-atomic table swap problem](http://code.openark.org/blog/mysql/solving-the-facebook-osc-non-atomic-table-swap-problem), this solution uses voluntary MySQL locks, and makes for a blocking swap, where your queries do not fail, but block until operation is complete. This effect is desired. There is danger in this solution, since connection failure of the two sessions involved in creating the lock, would result in a premature swap of the tables, hence with potentially corrupted data.
- We are working at this time on a blocking, safe, atomic solution, using wait conditions and via User Defined Functions which will need to be dynamically loaded onto your MySQL server. See [udf-ghost-wait-condition](https://github.com/openark/udf-ghost-wait-condition)
- With [`--test-on-replica`](testing-on-replica.md) there is no table swap (**NOTE**: this is going to change. We will do cut-over on replica in exact same way as on master so as to invoke exact flow)

Also note:
- With `--migrate-on-replica` the cut-over is executed in exactly the same way as on master.
- With `--test-on-replica` the replication is first stopped; then the cut-over is executed just as on master, but then reverted (tables rename forth then back again).
16 changes: 11 additions & 5 deletions doc/testing-on-replica.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,18 +24,20 @@ Apply `--test-on-replica --host=<a.replica>`.
- So... everything

`gh-ost` will sync the ghost table with the original table.
- When it is satisfied, it will issue a `STOP SLAVE IO_THREAD`, effectively stopping replication
- When it is satisfied, it will issue a `STOP SLAVE`, stopping replication
- Will finalize last few statements
- Will terminate. No table swap takes place. No table is dropped.
- Will swap tables via normal [cut-over](cut-over.md), and immediately revert the swap.
- Will terminate. No table is dropped.

You are now left with the original table **and** the ghost table. They _should_ be identical.
You are now left with the original table **and** the ghost table. When using a trivial `alter` statement, such as `engine-innodb`, both tables _should_ be identical.

You now have the time to verify the tool works correctly. You may checksum the entire table data if you like.
- e.g.
`mysql -e 'select * from mydb.mytable order by id' | md5sum`
`mysql -e 'select * from mydb._mytable_gst order by id' | md5sum`
- or of course only select the shared columns before/after the migration
- We use the trivial `engine=innodb` for `alter` when testing. This way the resulting ghost table is identical in structure to the original table (including indexes) and we expect data to be completely identical. We use `md5sum` on the entire dataset to confirm the test result.
- When adding/dropping columns, you will want to use the explicit list of shared columns before/after migration. This list is printed by `gh-ost` at the beginning of the migration.

### Cleanup

Expand All @@ -47,13 +49,17 @@ It's your job to:

Simple:
```shell
$ gh-osc --host=myhost.com --conf=/etc/gh-ost.cnf --database=test --table=sample_table --alter="engine=innodb" --chunk-size=2000 --max-load=Threads_connected=20 --initially-drop-ghost-table --initially-drop-old-table --cut-over=voluntary-lock --test-on-replica --verbose --execute
$ gh-osc --host=myhost.com --conf=/etc/gh-ost.cnf --database=test --table=sample_table --alter="engine=innodb" --chunk-size=2000 --max-load=Threads_connected=20 --initially-drop-ghost-table --initially-drop-old-table --test-on-replica --verbose --execute
```

Elaborate:
```shell
$ gh-osc --host=myhost.com --conf=/etc/gh-ost.cnf --database=test --table=sample_table --alter="engine=innodb" --chunk-size=2000 --max-load=Threads_connected=20 --switch-to-rbr --initially-drop-ghost-table --initially-drop-old-table --cut-over=voluntary-lock --test-on-replica --postpone-cut-over-flag-file=/tmp/ghost-postpone.flag --exact-rowcount --allow-nullable-unique-key --verbose --execute
$ gh-osc --host=myhost.com --conf=/etc/gh-ost.cnf --database=test --table=sample_table --alter="engine=innodb" --chunk-size=2000 --max-load=Threads_connected=20 --switch-to-rbr --initially-drop-ghost-table --initially-drop-old-table --test-on-replica --postpone-cut-over-flag-file=/tmp/ghost-postpone.flag --exact-rowcount --allow-nullable-unique-key --verbose --execute
```
- Count exact number of rows (makes ETA estimation very good). This goes at the expense of paying the time for issuing a `SELECT COUNT(*)` on your table. We use this lovingly.
- Automatically switch to `RBR` if replica is configured as `SBR`. See also: [migrating with SBR](migrating-with-sbr.md)
- allow iterating on a `UNIQUE KEY` that has `NULL`able columns (at your own risk)

### Further notes

Do not confuse `--test-on-replica` with `--migrate-on-replica`; the latter performs the migration and _keeps it that way_ (does not revert the table swap nor stops replication)
1 change: 1 addition & 0 deletions go/base/context.go
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,7 @@ type MigrationContext struct {

Noop bool
TestOnReplica bool
MigrateOnReplica bool
OkToDropTable bool
InitiallyDropOldTable bool
InitiallyDropGhostTable bool
Expand Down
11 changes: 10 additions & 1 deletion go/cmd/gh-ost/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,9 @@ func main() {
flag.BoolVar(&migrationContext.NullableUniqueKeyAllowed, "allow-nullable-unique-key", false, "allow gh-ost to migrate based on a unique key with nullable columns. As long as no NULL values exist, this should be OK. If NULL values exist in chosen key, data may be corrupted. Use at your own risk!")

executeFlag := flag.Bool("execute", false, "actually execute the alter & migrate the table. Default is noop: do some tests and exit")
flag.BoolVar(&migrationContext.TestOnReplica, "test-on-replica", false, "Have the migration run on a replica, not on the master. At the end of migration tables are not swapped; gh-ost issues `STOP SLAVE` and you can compare the two tables for building trust")
flag.BoolVar(&migrationContext.TestOnReplica, "test-on-replica", false, "Have the migration run on a replica, not on the master. At the end of migration replication is stopped, and tables are swapped and immediately swap-revert. Replication remains stopped and you can compare the two tables for building trust")
flag.BoolVar(&migrationContext.MigrateOnReplica, "migrate-on-replica", false, "Have the migration run on a replica, not on the master. This will do the full migration on the replica including cut-over (as opposed to --test-on-replica)")

flag.BoolVar(&migrationContext.OkToDropTable, "ok-to-drop-table", false, "Shall the tool drop the old table at end of operation. DROPping tables can be a long locking operation, which is why I'm not doing it by default. I'm an online tool, yes?")
flag.BoolVar(&migrationContext.InitiallyDropOldTable, "initially-drop-old-table", false, "Drop a possibly existing OLD table (remains from a previous run?) before beginning operation. Default is to panic and abort if such table exists")
flag.BoolVar(&migrationContext.InitiallyDropGhostTable, "initially-drop-ghost-table", false, "Drop a possibly existing Ghost table (remains from a previous run?) before beginning operation. Default is to panic and abort if such table exists")
Expand Down Expand Up @@ -127,6 +129,13 @@ func main() {
if migrationContext.AllowedRunningOnMaster && migrationContext.TestOnReplica {
log.Fatalf("--allow-on-master and --test-on-replica are mutually exclusive")
}
if migrationContext.AllowedRunningOnMaster && migrationContext.MigrateOnReplica {
log.Fatalf("--allow-on-master and --migrate-on-replica are mutually exclusive")
}
if migrationContext.MigrateOnReplica && migrationContext.TestOnReplica {
log.Fatalf("--migrate-on-replica and --test-on-replica are mutually exclusive")
}

switch *cutOver {
case "safe", "default", "":
migrationContext.CutOverType = base.CutOverSafe
Expand Down
8 changes: 4 additions & 4 deletions go/logic/migrator.go
Original file line number Diff line number Diff line change
Expand Up @@ -118,7 +118,7 @@ func (this *Migrator) shouldThrottle() (result bool, reason string) {
if time.Duration(lag) > time.Duration(this.migrationContext.MaxLagMillisecondsThrottleThreshold)*time.Millisecond {
return true, fmt.Sprintf("lag=%fs", time.Duration(lag).Seconds())
}
if this.migrationContext.TestOnReplica && (atomic.LoadInt64(&this.allEventsUpToLockProcessedInjectedFlag) == 0) {
if (this.migrationContext.TestOnReplica || this.migrationContext.MigrateOnReplica) && (atomic.LoadInt64(&this.allEventsUpToLockProcessedInjectedFlag) == 0) {
replicationLag, err := mysql.GetMaxReplicationLag(this.migrationContext.InspectorConnectionConfig, this.migrationContext.ThrottleControlReplicaKeys, this.migrationContext.ReplictionLagQuery)
if err != nil {
return true, err.Error()
Expand Down Expand Up @@ -666,11 +666,11 @@ func (this *Migrator) initiateInspector() (err error) {
if this.migrationContext.ApplierConnectionConfig, err = this.inspector.getMasterConnectionConfig(); err != nil {
return err
}
if this.migrationContext.TestOnReplica {
if this.migrationContext.TestOnReplica || this.migrationContext.MigrateOnReplica {
if this.migrationContext.InspectorIsAlsoApplier() {
return fmt.Errorf("Instructed to --test-on-replica, but the server we connect to doesn't seem to be a replica")
return fmt.Errorf("Instructed to --test-on-replica or --migrate-on-replica, but the server we connect to doesn't seem to be a replica")
}
log.Infof("--test-on-replica given. Will not execute on master %+v but rather on replica %+v itself",
log.Infof("--test-on-replica or --migrate-on-replica given. Will not execute on master %+v but rather on replica %+v itself",
this.migrationContext.ApplierConnectionConfig.Key, this.migrationContext.InspectorConnectionConfig.Key,
)
this.migrationContext.ApplierConnectionConfig = this.migrationContext.InspectorConnectionConfig.Duplicate()
Expand Down

0 comments on commit 60fef6b

Please sign in to comment.