*instinctools regularly holds events where it gathers leading experts from various IT areas. A recent meeting was devoted to Golang using practice. One of the online meetup speakers was the leading software *instinctools engineer Alexander Rudyak.
Alexander spoke about common approaches and main data migration tools. The migration topic is quite extensive, so in his speech Alexander focused on the data schema migration aspect , in particular, on how data evolves within one application. (Database schema migration (data evolution). We share with you the highlights from the speech.
How it could be done
There are several common data migration types . There is no strict division. A specific data transfer case can refer, for example, to as migration using a connector (Connect to database and make DDL changes), so to migration via SQL review using a DBA engineer (SQL review and actual changes by DBA engineer), or use a third approach − Database as code. Alexander dwelled on it in more detail, since, according to most data migration specialists, this is the most reliable and most functional of all data migration methods.
Problems that crop up with DB
From the pool of problems that arise when working with database migrations, there are a few of the most common ones.
Many databases in different environments. If the system is large, there is often a situation where the databases are on different versions and in different environments.
Team working on different features that require different data changes. A fairly common problem is the interaction between team members or working groups that develop different features. If you have to change the schema, conflicts can arise in the data and in the interests of the working groups.
Database schema drift. Perhaps one of the most common problems that results from schema changes is Database schema drift. In this case, we are talking about the difference between the expected and actual database schema. For example, there is a table in the database with different columns, but the table titles or individual columns do not match the documentation.
This adds a lot of complexity to the migration preparation, as we make incorrect assumptions about what our data consists of.
Microservice architecture & zero downtime rollouts (databases are stateful)
Professionals who work with data within a microservice architecture, especially in Go, often face a large number of product requirements, including the need to run services 24/7 while maintaining zero downtime for them.
The peculiarity lies in the fact that microservices easily scale horizontally, but do not save their state (process information, but do not store it).
Database as Code
Initially, the Database as Code approach was used in infrastructure and DevOps communities. In this approach, the data structure is described as a configuration file. It does not matter on which platform (Azure, Google cloud, Amazon, etc.) it is launched – you will deploy almost the same set of units with the same set of systems, as it was written in a single file.
Exactly the same approach is used with databases and schemas – the described set of commands and expressions becomes the code base part, which makes it possible to review and version schemas. In addition, this makes it possible to use continuous integration with delivery workflow, which allows you to set up separate automated processes.
With Database as Code, the data is part of the code and we know exactly how the schema currently looks. In other words, during the migration process, Database as Code is the only correct source for databases.
Database as Code: Pros & Cons
Like any solution, Database as Code has its pros and cons
Learning curve. The first and probably the biggest disadvantage of this approach is that in order to apply it, you need to learn how to do it 🙂
Hard to set up initially. Discipline, care, and resource planning are essential to the Database as Code success. These qualities are almost the main components of how successfully the migration process will be implemented.
Slowed process means that we need to go through more stages of approvals, reviews and rollout of our changes before they get into real conditions.
Database as Code Approaches Classification
|Generate changes from diff between schemas
– SSOT in repository in a single file
– A lot of tools
– Support of popular application frameworks with specific programming languages
There are two main approaches in Database as Code.
The State-based approach is based on the desired state of the data schema. It uses code written in one file – a declarative declaration of a schema, operations, constraints, indexes, etc. As a rule, the code is stored in a separate file, and we can easily describe or find out its current state in the project.
The basic State-based principle is that a so-called diff file is generated based on two different versions of files. This is a set of instructions explaining how to move from version-N to version-N+1. The file is generated automatically, based on the differences between the two schema versions.
The Migration-based approach implies that the data schema working process is iterative – it is broken down into small steps and each step is added to the project. This approach is based on the fact that we change the database schema using diffs, i.e. changes are not generated automatically, but are written manually, or using an additional graphical editor.
As a rule, two directions of migration are used in Migration-based. The direct one, when a new change is rolled onto the previous version and a new version is created, and the reverse one is useful in situations where something went wrong and there was a need to return to the previous version.
The difficulty in implementing this approach lies in the fact that we are almost constantly in the process of evolution, and sometimes it is difficult to understand what certain changes can lead to.
But the Migration-based approach has undeniable advantages.
The main one is a large number of tools, developed documentation and recommendations that make it relatively easy to use this approach in your projects.
In addition, this type of migration is supported by popular programming languages, within which separate frameworks and libraries have been developed that allow you to make migrations with ready-made code.
How to deal with migrations from developers’ view
From the developer’s point of view, the migration process can be divided into two directions.
The first one is Database migration focused software where migrations are implemented in isolation from the project. In this case, special software is used that does not allow you to focus on one programming language or implementation, and also makes it possible to use a separate tool for managing migration.
Framework/language dependent library is a direction in which a separate programming language is used to create libraries, code and project files.
Among the popular tools for carrying out migrations “in isolation from the project”, the following can be distinguished.
Liquibase. Many developers and professionals who work with Java are familiar with Liquibase. Liquibase is a Migration-based utility that contains a whole system with lots of versioning and running options.
Flyway is an alternative to Liquibase. The tool is similar in functionality, contains a platform for migrations and is also popular in the Java world.
Bytebase uses a combined (Migration-based + State-based) approach, i.e. it uses a single file to store the database schema, but still allows for additional changes that can be added manually.
Sqitch is a migration-based utility that also uses versioning and incremental migrations.
What can be recommended for migration in the framework itself?
|Framework/language dependent libraries
– gorm (SB)
– ent (SB)
|SQL/other based versioned migrations:
– golang-migrate/migrate (MB)
– pressly/goose (MB)
– jackc/tern (MB)
– adlio/schema (MB)
– atlas (SB with its own DDL)
ORM or using migration directly in the service.
In this approach, the code that allows you to migrate data is embedded in the service, and ORM systems are often used as a helper.
The most famous in the community are gorm and ent. Gorm is the more popular system, but ent (used in the Facebook development) is already breathing down the neck. Both systems are based on the State-based approach.
SQL/other based versioned migrations or other types of migrations (not ORM) allow us to directly control the tools we plan to use in the migration process.
Among the most common, it is worth highlighting golang-migrate/migrate. It is a popular Migration-based tool with a large number of drivers available to support different databases. Golang-migrate/migrate uses a special table with all migration versions and allows you to roll back changes directly from the interface forward or backward.
You should also pay attention to the State-based atlas. At the material preparation time, the tool had already had a library prototype in Go and a separate utility in order to run service migrations.
A few approaches that enable to go through migrations easier
1. Stick to the tool you’ve already chosen, as replacing it can be more resource intensive than maintaining it.
2. Avoid irreversible changes – in case we suddenly need to return the data to the previous state. These include many operations, in particular deleting columns, records, tables, and other destructive actions.
3. Check your migrations for operations that could be potentially dangerous, such as creating constraints or updating databases that contain a large number of records. This will help avoid blocking access to data for a long period of time.
- Gitlab migration style guide: https://docs.gitlab.com/ee/development/migration_style_guide.html
- Microservice migrations: https://moduscreate.com/blog/microservices-databases-migrations
- Strong migration recommendations: https://github.com/ankane/strong_migrations#checks