
Historical – Adding Roles to MySQL


Feed: Planet MySQL
Author: Mark Callaghan

This post was shared on code.google.com many years ago, but code.google.com has since been shut down. It describes work done by my team at Google. I am interested in the history of technology, and with some spare time I have been able to republish it.

I added support for roles to MySQL circa 2008. They arrived upstream with MySQL 8 in 2018; I wasn’t able to wait. I enjoyed the project more than expected. It wasn’t hard in terms of algorithms or performance, but I had to be careful to avoid security bugs, and the upstream code was well written. I had a similar experience implementing BINARY_FLOAT and BINARY_DOUBLE at Oracle. There I got to learn the IEEE 754 standard and had to go out of my way to catch all of the corner cases. Plus I enjoyed working with Minghui Yang, who did the PL/SQL part of it.

MySQL roles and mapped users

The access control model in MySQL does not scale for a deployment with thousands of accounts and thousands of tables. The problems are that similar privileges must be specified for many accounts, and that the only way to limit an account’s access to a table is to grant privileges at the table or column level, in which case the mysql.user table has millions of entries.

Privileges may be associated once with a role, and then many accounts may be mapped to that role. When many accounts have the same privileges, this avoids the need to specify the privileges for each account.

We have implemented mapped users in the MySQL access control model. These are used to simulate roles and solve one of these problems. A mapped user provides authentication credentials and is mapped to a role for access control. A new table, mysql.mapped_user, has been added to define mapped users. Entries in an existing table, mysql.user, are reused as roles when there are entries from mysql.mapped_user that reference them.

To avoid confusion:

  • mapped user – one row in mysql.mapped_user
  • role – one row in mysql.user referenced by at least one row in mysql.mapped_user

This provides several features:

  • multiple passwords per account
  • manual password expiration
  • roles
  • transparent to users (mysql -uuser -ppassword works regardless of whether authentication is done using entries in mysql.mapped_user or mysql.user)

Use Case


Create a role account in mysql.user. Then create thousands of private accounts in mysql.mapped_user that map to the role. By "map to" I mean that the value of mysql.mapped_user.Role is the account name for the role.
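
A minimal sketch of this flow, using hypothetical names (readonly, mapped_alice, mydb) and the create mapped user command described later in this post:

-- create the role once, as a normal account in mysql.user, and grant its privileges
CREATE USER readonly;
GRANT SELECT ON mydb.* TO readonly;

-- create private accounts that authenticate with their own passwords
-- but inherit the role's privileges
CREATE MAPPED USER mapped_alice IDENTIFIED BY 'alice_secret' ROLE readonly;
CREATE MAPPED USER mapped_bob IDENTIFIED BY 'bob_secret' ROLE readonly;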

Implementation

Authentication in MySQL is implemented using the mysql.user table. mysqld sorts these entries, and when a connection is attempted, the first entry in the sorted list that matches the account name and hostname/IP of the client is used for authentication. A challenge-response protocol is run using the password hash for that entry.
A new table is added to support mapped users. This table does not have columns for privileges. Instead, each row references an account name from mysql.user that provides the privileges. The new table has a subset of the columns from mysql.user:

  • User – the name for this mapped user
  • Role – the name of the account in mysql.user from which this account gets its privileges
  • Password – the password hash for authenticating a connection
  • PasswordChanged – the timestamp when this entry was last updated or created. This is intended to support manual password expiration, via a script that deletes all entries where PasswordChanged is less than the cutoff (a sketch follows the DDL below).
  • ssl_type, ssl_cipher, x509_issuer, x509_subject – values for SSL authentication; note that code has yet to be added in the server to handle these values

DDL for the new table:

CREATE TABLE mapped_user (
  User char(16) binary DEFAULT '' NOT NULL,
  Role char(16) binary DEFAULT '' NOT NULL,
  Password char(41) character set latin1 collate latin1_bin DEFAULT '' NOT NULL,
  PasswordChanged timestamp DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP NOT NULL,
  ssl_type enum('','ANY','X509','SPECIFIED') character set utf8 NOT NULL DEFAULT '',
  ssl_cipher blob NOT NULL,
  x509_issuer blob NOT NULL,
  x509_subject blob NOT NULL,
  PRIMARY KEY (User, Role, Password)
) engine=MyISAM
CHARACTER SET utf8 COLLATE utf8_bin
comment='Mapped users';
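
Given the PasswordChanged column, manual password expiration can be a simple periodic job. A minimal sketch, assuming a 90-day policy (the cutoff is an arbitrary choice):

DELETE FROM mysql.mapped_user
WHERE PasswordChanged < NOW() - INTERVAL 90 DAY;
FLUSH PRIVILEGES;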

Authentication


Entries from mysql.mapped_user are used to authenticate connection attempts only when authentication fails with entries in mysql.user. The failure may have occurred because there was no entry in mysql.user for the user/host or because the password was wrong. If authentication succeeds using an entry in mysql.mapped_user, the mysql.mapped_user.Role column in that entry and the client’s hostname/IP are used to search mysql.user for a matching entry. If one is found, that entry provides the privileges for the connection. By "provides the privileges" I mean that:

  • the values of mysql.user.User and mysql.user.Host are used to search the other privilege tables
  • the global privileges stored in mysql.user for the matching entry are used

The mysql.mapped_user table supports multiple passwords per account. When a user tries to create a connection with a username that is in the mysql.mapped_user table, and there are multiple entries with a matching value in mysql.mapped_user.User, then authentication is attempted for one entry at a time, using the password hash in mysql.mapped_user.Password, until authentication succeeds or there are no more entries. Note that the order in which the entries from mysql.mapped_user are checked is not defined, but this is only an issue when there are entries in mysql.mapped_user with the same value for User and different values for Role, and that deployment model should not be used. Also note that this does not require additional RPCs during client authentication.

Entries are ignored from mysql.mapped_user when:

  • Role is the empty string
  • User is the empty string
  • Password is the empty string

There is no constraint between the values in mysql.mapped_user.Role and mysql.user.User. Thus, a bogus mapping (where Role references an account that does not exist in mysql.user) can be created. In that case, the entry in mysql.mapped_user cannot be used to create connections, and attempts to use it will get access denied errors.

There is a primary key index on mysql.mapped_user, but that is not sufficient to enforce all of the integrity constraints that are needed. Entries with the same values for User and Role but different passwords are allowed (the primary key only forces the passwords to be different). Entries with the same value for User but different values for Role should not be allowed. However, this can only be enforced with a check constraint on the table, and MySQL does not enforce check constraints. We can write a tool to find such entries.
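
A minimal sketch of such a check, flagging any User that maps to more than one Role:

SELECT User, COUNT(DISTINCT Role) AS role_count
FROM mysql.mapped_user
GROUP BY User
HAVING COUNT(DISTINCT Role) > 1;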

SQL Interfaces

Mapped users can be added via the create mapped user command, which is similar to create user but extended to support options for SSL connections. They can be dropped by the drop mapped user command, which is similar to drop user. These commands update internal data structures and update the mysql.mapped_user table. There is no need to run flush privileges with these commands.

The following have been changed to print the value of mysql.mapped_user.User rather than the value of mysql.user.User when a role is used to create a connection.

  • error messages related to access control
  • select current_user()
  • select user()
  • show user_statistics
  • show processlist

The output of show grants has not been changed and will display the privileges for the role (the entry in mysql.user).

set password = password(STRING) fails for accounts that use a role. The only way to change a password for an entry in mysql.mapped_user is by an insert statement.

show processlist with roles displays the role for connections from mapped users rather than the mapped user name. show processlist displays the value from mysql.mapped_user.

show user_statistics with roles displays statistics aggregated by role for connections from mapped users. show user_statistics displays values aggregated by the value from mysql.mapped_user.

Mapped users can be created by inserting into mysql.mapped_user and then running FLUSH PRIVILEGES. They are also created by the create mapped user command. An example is create mapped user mapped_readonly identified by 'password' role readonly.

Mapped users can be dropped by deleting from mysql.mapped_user and then running FLUSH PRIVILEGES. They are also dropped by the drop mapped user command. An example is drop mapped user foo. This drops all entries from mysql.mapped_user with that user name. A delete statement must be used to drop an entry matching either (username, role) or (username, role, password).

select user() displays the value of the mapped user name when connected as a mapped user. select current_user() displays the value of the role when connected as a mapped user. This is done because current_user() is defined to return the name of the account used for access control.

make user delayed is done on the value of the account name. It does not matter whether the account is listed in mysql.user or mysql.mapped_user.

mysql.mapped_user does not have columns for resource limits such as max connections and max QPS. Limits are enforced per role.

This feature is only supported when the configuration variable mapped_users is used (add to /etc/my.cnf). This feature is disabled by default. Also, the mysql.mapped_user table must exist. This table does not exist in our current deployment. It must be created before the feature is enabled. The scripts provided by MySQL to create the system databases will create the table, but we do not use those scripts frequently.

The value of the mysql.user.Host column applies to any mapped users trying to create a connection. This can be used to restrict clients to connect from prod or corp hosts.

Open Requests

  • Add a unique index on (User, Password)
  • Add an email column to mysql.mapped_user
  • Inherit limits (hostname/IP address from which connections are allowed, connection limits, max queries per minute limit) from the mysql.user table.
  • Implement support for SSL — the mysql.mapped_user table has columns for SSL authentication. Code has not been added to the server to handle them.


Understanding MariaDB architecture


Feed: MariaDB Knowledge Base Article Feed.

MariaDB architecture is partly different from the architecture of traditional DBMSs, like SQL Server. Here we will examine the main components that a new MariaDB DBA needs to know. We will also discuss a bit of history, because this may help in understanding MariaDB's philosophy and certain design choices.

This section is an overview of the most important components. More information is included in specific sections of this migration guide, and in other pages of the MariaDB Knowledge Base (see the links scattered through the text).

Storage engines

MariaDB was born from the source code of MySQL, in 2008. Therefore, its history begins with MySQL.

MySQL was born at the beginning of the 90s. Back then, compared to its competitors, MySQL was lightweight, simple to install, and easy to learn. While it had a very limited set of features, it was also fast in certain common operations. And it was open source. These characteristics made it suitable for backing the simple websites that existed at that time.

The web evolved rapidly, and the same happened to MySQL. Being open source helped a lot in this respect, because the community needed functionality that wasn't supported at the time.

MySQL was probably the first database system to support a pluggable storage engine architecture. Basically, this means that MySQL knows very little about creating or populating a table, reading from it, or building proper indexes and caches. It delegates all these operations to a special plugin type called a storage engine.

One of the first plugins developed by third parties was InnoDB. It is very fast, and it adds two important features that are not supported otherwise: transactions and foreign keys.

Note that when MariaDB asks a storage engine to write or read a row, the storage engine could theoretically do anything. This led to the creation of very interesting alternative engines, like BLACKHOLE (which doesn’t write or read any data, acting like the /dev/null file in Linux), or CONNECT (which can read and write to files written in many different formats, or remote DBMSs, or some other special data sources).

Nowadays InnoDB is the default MariaDB storage engine, and it is the best choice for most use cases. But for particular needs, sometimes using a different storage engine is desirable. In case of doubts about the best storage engine to use for a specific case, check the Choosing the Right Storage Engine page.

When we create a table, we specify its storage engine. It is possible to convert an existing table to another storage engine, though this is a blocking operation that requires a complete table copy. 3rd party storage engines can also be installed while MariaDB is running.
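
A minimal sketch of both operations, using a hypothetical table:

-- choose the engine when creating the table
CREATE TABLE log_archive (id INT PRIMARY KEY, msg TEXT) ENGINE = InnoDB;

-- convert it to another engine later (blocking: the whole table is copied)
ALTER TABLE log_archive ENGINE = Aria;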

Note that it is perfectly possible to use tables with different storage engines in the same transaction (even if some engines are not transactional). It is even possible to use different engines in the same query, for example with JOINs and subqueries.

The binary log

As we mentioned, different tables can be built using different storage engines. It is important to notice that not all engines are transactional, and that different engines implement the transaction logs in different ways. For this reason, MariaDB cannot replicate data from a master to a slave using an equivalent of SQL Server transactional replication.

Instead, it needs a global mechanism to log the changes that are applied to data. This mechanism is the binary log, often abbreviated to binlog.

The binary log can be written in two formats:

  • STATEMENT logs SQL statements that modify data;
  • ROW logs a reference to the rows that have been modified, if any (usually it's the primary key), and the new values that have been added or modified, in a binary format.

In most cases, STATEMENT is slower because the SQL statement needs to be re-executed by the slave, and because certain statements may produce a different result on the slave (think about queries that use LIMIT without ORDER BY, or the CURRENT_TIMESTAMP() function). But there are exceptions, and in any case DDL statements are always logged as STATEMENT to avoid flooding the binary log. Therefore, the binary log may well contain both ROW and STATEMENT entries. One can even set binlog_format=MIXED to log changes as STATEMENT, except when they may produce different results on a slave.

See Binary Log Formats.
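
A minimal sketch of inspecting and changing the format at runtime (it can also be set globally, or in the configuration file):

SHOW VARIABLES LIKE 'binlog_format';
SET SESSION binlog_format = 'MIXED';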

The binary log allows:

  • replication, on the master;
  • on a slave, allowing it to be promoted to master;
  • incremental backups;
  • flashback, which means seeing data as they were at a point in time in the past;
  • restoring a backup and re-applying the binary log, with the exception of a data change that caused problems (human mistake, application bug, SQL injection);
  • Change Data Capture (CDC), by streaming the binary log to technologies like Apache Kafka.

If you don’t plan to use any of these features on a server, it is possible to disable the binary log to slightly improve the performance.

The binary log can be inspected using the mysqlbinlog utility, which comes with MariaDB.

Plugins

As mentioned, storage engines are a special type of plugin, but others exist. For example, plugins can add authentication methods, new features, SQL syntax, functions, informative tables, and more.

Many plugins are installed by default, or available but not installed by default. They can be installed or uninstalled at runtime with SQL statements, like INSTALL PLUGIN, UNINSTALL PLUGIN and others; see Plugin SQL Statements. 3rd party plugins can be made available for installation by simply copying them to the plugin_dir.
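
A minimal sketch, using the METADATA_LOCK_INFO plugin mentioned below as an example:

INSTALL SONAME 'metadata_lock_info';
SELECT * FROM information_schema.METADATA_LOCK_INFO;
UNINSTALL SONAME 'metadata_lock_info';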

It is important to note that different plugins may have different maturity levels. It is possible to prevent the installation of plugins we don’t consider production-ready by setting the plugin_maturity system variable. For plugins that are distributed with MariaDB, the maturity level is determined by the MariaDB team based on the bugs reported and fixed.

Some plugins are developed by 3rd parties, and some of these are even included in official MariaDB distributions – the ones available on mariadb.org. All plugins distributed with MariaDB are maintained by the MariaDB company or the MariaDB Foundation.

In MariaDB every authentication method (including the default one) is provided by an authentication plugin. A user can be required to use a certain authentication plugin. This gives us great flexibility and control. Windows users may be interested in gssapi (which supports Windows authentication, including Kerberos and NTLM) and named_pipe (which uses named pipe impersonation).

Other plugins that can be very useful include userstat, which includes statistics about resources and table usage, and METADATA_LOCK_INFO, which provides information about metadata locks.

Thread pool

MariaDB supports a thread pool. It works differently on UNIX and on Windows. On Windows it is enabled by default, and its implementation is quite similar to SQL Server's: it uses the native Windows CreateThreadpool API.
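
On UNIX-like systems the pool is enabled by setting thread_handling=pool-of-threads in my.cnf (a startup-only setting). A minimal sketch of checking it from SQL:

-- confirm which thread handling model is active
SHOW VARIABLES LIKE 'thread_handling';
-- inspect thread pool activity
SHOW STATUS LIKE 'Threadpool%';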

SQL Server features not available in MariaDB


Feed: MariaDB Knowledge Base Article Feed.

When planning a migration between different DBMSs, one of the most important aspects to consider is that the new database system will probably lack some features supported by the old one. This is not relevant for all users. The most widely used features are supported by most DBMSs. However, it is important to make a list of unsupported features and check which of them are currently used by applications. In most cases it is possible to implement such features on the application side, or simply stop using them.

This page has a list of SQL Server features that are not supported in MariaDB. The list is not exhaustive.

Introduced in SQL Server 2016 or older

  • Full outer joins (see the workaround sketch after this list)
  • GROUP BY CUBE
  • MERGE
  • User-Defined Types
  • Rules
  • Triggers on DDL and login; INSTEAD OF triggers, DISABLE TRIGGER
  • Cursors advanced features
    • Global cursors
    • DELETE CURRENT OF, UPDATE CURRENT OF (MariaDB cursors are read-only)
    • Specifying a direction (MariaDB cursors can only advance by one row)
  • Synonyms
  • Queues
  • XML indexes, XML schema collection, XQuery
  • User access to system functionalities, for example:
    • Running system commands (xp_cmdshell())
    • Sending emails (sp_send_dbmail())
    • Sending HTTP requests
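
Some of these gaps can be worked around in SQL. For example, a full outer join can be emulated with a left join plus the unmatched rows of a right join; a minimal sketch, assuming tables t1 and t2 joined on id:

SELECT t1.*, t2.* FROM t1 LEFT JOIN t2 ON t1.id = t2.id
UNION ALL
SELECT t1.*, t2.* FROM t1 RIGHT JOIN t2 ON t1.id = t2.id
WHERE t1.id IS NULL;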

Introduced in SQL Server 2017

  • Adaptive joins
  • Graph SQL
  • External libraries (MariaDB only supports procedural SQL and PL/SQL)

Introduced in SQL Server 2019

  • External languages (MariaDB only supports procedural SQL and PL/SQL)


Hands-on SQL in PostgreSQL


Feed: Databasejournal.com – Feature Database Articles.

In this article you will learn about SQL support in PostgreSQL, as well as how commands are typically entered using the PostgreSQL interactive terminal, psql. The hands-on part deals with CRUD operations, viewing data, and other basic SQL commands in PostgreSQL. PostgreSQL is an advanced ORDBMS package and can be thought of as an open source rival to none other than Oracle. Many intricate database concepts are incorporated in this system, but for the sake of brevity and simplicity we'll stick to hands-on SQL and its related ideas only.

SQL (Structured Query Language) is the standard way to communicate with database servers in a Relational Database Management System (RDBMS). It originated in the 70s as a domain-specific language for conducting database queries. The name was initially SEQUEL (Structured English Query Language), later changed to SQL due to trademark issues. The definition and intuitive capabilities of SQL made it popular, and it became an integral part of the RDBMSs supplied by most vendors; PostgreSQL is no exception. SQL was standardized as ANSI X3.135 in 1986 and later adopted by ISO as ISO 9075-1987. The ISO committee revises it periodically; the most recent revision was in 2016, in nine parts (ISO/IEC 9075-1:2016, ISO/IEC 9075-2:2016, ISO/IEC 9075-3:2016, ISO/IEC 9075-4:2016, ISO/IEC 9075-9:2016, ISO/IEC 9075-10:2016, ISO/IEC 9075-11:2016, ISO/IEC 9075-14:2016, ISO/IEC 9075-19:2016). Donald Chamberlin and Ray Boyce are the well-known progenitors of the language.

SQL in PostgreSQL

PostgreSQL is an Object Relational Database Management System (ORDBMS) based upon POSTGRES version 4.2, a project of the University of California, Berkeley CS Department, which pioneered many new concepts that were later adopted by commercial database systems. PostgreSQL is open source, and the current version (version 12) supports most of the SQL:2016 core standard features (160 of 179), plus a long list of optional features. This broadly means that standard SQL syntax can be used to communicate with a database managed by PostgreSQL.

PostgreSQL uses a client/server communication model; the server runs continually, waits for client requests to come in, and responds with the appropriate results. Note that the PostgreSQL server runs as an independent process on the computer; users cannot interact with it directly. Client interfaces have been designed specifically for user interaction. As in many modern database systems, there is a GUI called pgAdmin that makes it easier for developers to handle the administrative side of PostgreSQL. The GUI also accommodates the psql client application interface where SQL can be written.


Figure 1 – PgAdmin4 GUI

Otherwise, SQL shell (psql) can be used to write SQL commands as follows.

Figure 2 – SQL Shell (psql)

Getting Started with psql

At this point I assume that:

  • PostgreSQL is installed properly
  • The server is up and running
  • It has been configured properly

Note: These are easy steps mostly accomplished during installation phases. If in doubt, do not forget to consult the manual. It has all the information to make PostgreSQL functional.

PostgreSQL controls several databases. From the point of view of the server, these databases are storage areas for data repositories, such as an employee database, a payroll database, an inventory database, and so on. A database is empty when first created; later, tables are created and relationships are added among them using DDL (Data Definition Language). The DDL part of SQL is used to specify the database schema structure, while the DML (Data Manipulation Language) part of SQL is used to access, retrieve, insert, and modify data in the database.

  • Typical DDL commands of SQL: CREATE, ALTER, DROP, RENAME, TRUNCATE etc.
  • Typical DML commands of SQL: SELECT, INSERT, UPDATE, DELETE, MERGE, CALL etc.

We start a psql session and connect to the database – in our case, postgres. To keep things simple, I'll use the SQL Shell (psql) as the SQL interface to the PostgreSQL server.

Hands on SQL in psql

Once you have connected, the prompt should appear as shown in Figure 2. To test it, try some basic SQL commands (you may use lowercase or uppercase letters; case does not matter unless you are typing a string in quotes):

postgres=# SELECT CURRENT_USER;

 current_user 
--------------
 postgres
(1 row)

postgres=# SELECT CURRENT_TIMESTAMP;

        current_timestamp         
----------------------------------
 2019-11-22 12:20:37.242123+05:30
(1 row)

Multiline query:

postgres=# SELECT
postgres-# 10-8+6
postgres-# ;

Result

?column? 
----------
        8
(1 row)

Notice how the prompt changes from postgres=# to postgres-# in a multiline query. A SQL command, as always, ends with a semicolon (;). psql maintains a query buffer that we can use to our advantage while typing queries. For example, you can use \p to see the content of the query buffer. You can also use the left and right arrow keys to move around the text you have entered at the prompt, and the up and down arrow keys to retrieve previously typed queries. The buffer can be erased with \r. To quit psql, type \q. Type \? to get more such commands.
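
A short session illustrating the buffer commands, as a sketch:

postgres=# SELECT 1+2
postgres-# \p
SELECT 1+2
postgres-# ;
 ?column?
----------
        3
(1 row)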

CREATE database

There are two ways to create a database in PostgreSQL. The simplest one is as follows:

CREATE DATABASE employees;

There is a wrapper around this SQL command, called createdb. This command is located in the bin directory where PostgreSQL is installed. For example, you can create a database in the terminal as follows.

createdb -h localhost -p 5432 -U postgres employees
password ****

This will prompt for the password of the admin user.

You can then find out the list of databases available on the server by typing the following in the SQL Shell (psql).

postgres=# \l

Note: To switch between databases from SQL Shell (psql) terminal, you can write:

postgres=# \c employees

This would change the current database from the default postgres to a new database you may have created – say, employees. The prompt will change as follows to indicate the change.

employees=#

CREATE table

The mathematical basis of a relational database is that all data stored in it are arranged in a uniform structure. This structure is made visible in the form of a table. Tables are the foundation of a relational database management system (RDBMS). Each table has a name signifying the entity it represents; rows denote records of the entity, and columns denote its properties.

In PostgreSQL, we may loosely call a table a relation or class, a row a record or tuple, and a column a field or attribute.

Let’s create one in psql.

CREATE TABLE employee
(
   emp_id INTEGER NOT NULL,
   birth_date DATE,
   email CHAR(20) ,    
   emp_name CHAR(30),
   join_date DATE,
   phone CHAR(10),
   CONSTRAINT employee_pkey PRIMARY KEY (emp_id)
);

This would create the employee table with the following columns:

emp_id | birth_date | email | emp_name | join_date | phone

The type associated with each column name indicates the column type and length. For example, CHAR(10) means the column holds a maximum of 10 characters. There are many column types supported by PostgreSQL, such as DATE, INTEGER, BIGINT, etc. You can find the information on a specific table as follows:

postgres=# \d employee

Insert data

You must use single quotes to insert character data; double quotes will not work. Numeric data is inserted without any quotes. For dates, you must also use single quotes. Let's insert some data.

INSERT INTO employee VALUES (101,'10-10-1987','abc@gmail.com','Sameer Rawat','1-3-2015','9876543210');

When using the INSERT command in this form, you must make sure that each piece of data matches the receiving column in order. Otherwise, you must write a more elaborate query that names the columns explicitly, as follows:

INSERT INTO employee(emp_id, emp_name, birth_date, email, join_date, phone) VALUES (102,'Arvind Verma','2-3-1988','av@gmail.com','2-3-2014','1234567890');

Viewing data

Now, to make sure both records have been stored correctly in the database, you can enter the following command.

The only command to retrieve data from the database is SELECT. This command tells the database that we want to retrieve data. But what data? The asterisk (*) means all the data. But from where? The FROM clause tells us that it is from the employee table. The ORDER BY emp_id ASC LIMIT 100 clause denotes that the first hundred rows, sorted by emp_id in ascending order, will be returned.

SELECT * FROM employee ORDER BY emp_id ASC LIMIT 100;

Understand that, SELECT has a large number of variations. Here you’ll see only a few.

SELECT COUNT(emp_id) FROM employee;

SELECT emp_name, phone FROM employee;

 

SELECT emp_name, birth_date FROM employee WHERE emp_id < 103;

 

Removing data

Removing data from a database table is pretty simple. The DELETE command can remove any row in the table. Also, you can use the same command to remove all the rows from the table.

DELETE FROM employee WHERE emp_id = 101;

This will delete a single record whose emp_id is 101. Now, if you write the DELETE command as follows, it will delete all records in the table in our case, because every emp_id you have inserted is greater than 100.

DELETE FROM employee WHERE emp_id > 100;

Modifying data

The UPDATE command can be used to modify existing data in the table. Suppose, you want to change the phone number of the employee record with emp_id 101, you can do so in the following manner.

UPDATE employee SET phone = '1122334455' WHERE emp_id = 101;

The SET phone = '...' clause sets the new value on the particular record selected by the WHERE clause. The record is modified accordingly.

Destroying table

Destroying a table is actually very simple. It is done with the help of the DROP TABLE command.

DROP TABLE employee;

This would completely remove the table from the database. That's all.

Conclusion

Here we have touched on some of the key ideas of the PostgreSQL database and how to interact with the server using trusty SQL commands. PostgreSQL is an excellent open source ORDBMS package, an alternative to none other than Oracle in its features. SQL being the primary interactive language, you have glimpsed briefly how to use it in PostgreSQL. You can do almost anything with SQL in PostgreSQL, as you may have already learned. Stay tuned; we'll touch upon many more interesting concepts in PostgreSQL. Happy learning.

# # #

Webinar: Time Series Data Capture & Analysis in MemSQL 7.0


Feed: MemSQL Blog.
Author: Floyd Smith.

With the MemSQL 7.0 release, MemSQL has added more special-purpose features, making it even easier to manage time series data within our best-of-breed operational database. These new features allow you to structure queries on time series data with far fewer lines of code and with less complexity. With time series features in MemSQL, we make it easier for any SQL user, or any tool that uses SQL, to work with time series data, while making expert users even more productive. In a recent webinar (view the recording here), Eric Hanson described the new features and how to use them.

The webinar begins with an overview of MemSQL, then describes how customers have been using MemSQL for time series data for years, prior to the MemSQL 7.0 release. Then there’s a description of the time series features that MemSQL has added, making it easier to query and manage time series data, and a Q&A section at the end.

Introducing MemSQL

MemSQL is a very high-performance scalable SQL relational database system. It’s really good for scalable operations, both for transaction processing and analytics on tabular data. Typically, it can be as much as 10 times faster, and three times more cost-effective, than legacy database providers for large volumes under high concurrency.

We like to call MemSQL the No-Limits Database because of its amazing scalability. It’s the cloud-native operational database that’s built for speed and scale. We have capabilities to support operational analytics. So, operational analytics is when you have to deliver very high analytical performance in an operational database environment where you may have concurrent updates and queries running at an intensive, demanding level. Some people like to say that it’s when you need “Analytics with an SLA.”

Now, I know that everybody thinks they have an SLA when they have an analytical database, but when you have a really demanding SLA like requiring interactive, very consistent response time in an analytical database environment, under fast ingest, and with high concurrency, that’s when MemSQL really shines.

We also support predictive ML and AI capabilities. For example, we’ve got some built-in functions for vector similarity matching. Some of our customers were using MemSQL in a deep learning environment to do things like face and image matching and customers are prototyping applications based on deep learning like fuzzy text matching. The built-in dot product and Euclidean distance functions we have can help you make those applications run with very high performance. (Nonprofit Thorn is one organization that uses these ML and AI-related capabilities at the core of their app, Spotlight, which helps law enforcement identify trafficked children. – Ed.)

Also, people are using MemSQL when they need to move to cloud or replace legacy relational database systems. When they reach some sort of inflection point, like they know they need to move to cloud, they want to take advantage of the scalability of the cloud, they want to consider a truly scalable product, and so they’ll look at MemSQL. Also, when it comes time to re-architect the legacy application – if, say, the scale of data has grown tremendously, or is expected to change in the near future, people really may decide they need to find a more scalable and economical platform for their relational data, and that may prompt them to move to MemSQL.

Here are examples of the kinds of workloads and customers we support: Half of the top 10 banks in North America, two of the top three telecommunications companies in North America, over 160 million streaming media users, 12 of the Fortune 50 largest companies in the United States, and technology leaders from Akamai to Uber.

If you want to think about MemSQL and how it’s different from other database products, you can think of it as a very modern, high-performance, scalable SQL relational database. We have all three: speed, scale, and SQL. We get our speed because we compile queries to machine code. We also have in-memory data structures for operational applications, an in-memory rowstore structure, and a disk-based columnstore structure.

MemSQL is the No-Limits Database

We compile queries to machine code and we use vectorized query execution on our columnar data structure. That gives us tremendous speed on a per-core basis. We’re also extremely scalable. We’re built for the cloud. MemSQL is a cloud-native platform that can gang together multiple computers to handle the work for a single database, in a very elegant and high-performance fashion. There’s no real practical limit to scale when using MemSQL.

Finally, we support SQL. There are some very scalable database products out there in the NoSQL world that are fast for certain operations, like put and get-type operations that can scale. But if you try to use these for sophisticated query processing, you end up having to host a lot of the query processing logic in the application, even to do simple things like joins. It can make your application large and complex and brittle – hard to evolve.

So SQL, and the relational data model, were invented by E. F. Codd back around 1970 for a reason: to separate your query logic from the physical data structures in your database, and to provide a non-procedural query language that makes it easier to find the data that you want from your data set. The benefits that were put forth when the relational model was invented are still true today.

We’re firmly committed to relational database processing and non-procedural query languages with SQL. There’s tremendous benefits to that, and you can have the best of both. You can have speed, and you can have scale, along with SQL. That’s what we provide.

How does MemSQL fit into the rest of your data management environment? MemSQL provides tremendous support for analytics, application systems like dashboards, ad-hoc queries, and machine learning. Also other types of applications like real-time decision-making apps, Internet of Things apps, dynamic user experiences. The kind of database technology that was available before couldn’t provide the real-time analytics that are necessary to give the truly dynamic user experience people are looking for today; we can provide that.

MemSQL architectural chart CDC and data types

We also provide tremendous capabilities for fast ingest and change data capture (CDC). We have the ability to stream data into MemSQL from multiple sources like file systems and Kafka. We have a feature called Pipelines, which is very popular, to automatically load data from file folders, AWS S3, Kafka. You can transform data as it’s flowing into MemSQL, with very little coding. We support a very high performance and scalable bulk load system.
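
As a sketch of how brief a pipeline definition can be (the Kafka endpoint, topic, and table names here are hypothetical):

CREATE PIPELINE tick_pipeline AS
LOAD DATA KAFKA 'kafka-host/tick-topic'
INTO TABLE tick;

START PIPELINE tick_pipeline;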

We have support for a large variety of data types including relational data, standard structured data types, key-value, JSON, geospatial, time-oriented data, and more. We run everywhere. You can run MemSQL on-premises, you can run it in the cloud as a managed database platform, or as a service in our new Helios system, which just was delivered in September.

We also allow people to self-host in the cloud. If they want full control over how their system is managed, they can self-host on all the major cloud providers and also run in containers; so, wherever you need to run, we are available.

I mentioned scalability earlier, and I wanted to drill into that a little bit to illustrate how our platform is organized. MemSQL presents itself to the database client application as just a database. You have a connection string, you connect, you set your connection to use us as a database, and you can start submitting SQL statements. It's a single system image. The application doesn't really know that MemSQL is distributed – but, underneath the sheets, it's organized as you see in this diagram.

MemSQL node and leaf architecture

There are one or more aggregator nodes, which are front-end nodes that the client application connects to. Then, there can be multiple back-end nodes. We call them leaf nodes. The data is horizontally partitioned across the leaf nodes – some people call this sharding. Each leaf node has one or more partitions of data. Those partitions are defined based on some data definition language (DDL); when you create your table, you define how to shard the data across nodes.

MemSQL’s query processor knows how to take a SQL statement and divide it up into smaller units of work across the leaf nodes, and final assembly results is done by the aggregator node. Then, the results are sent back for the client. As you need to scale, you can add additional leaf nodes and rebalance your data, so that it’s easy to scale the system up and down as needed.
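
A minimal sketch of that DDL (hypothetical table; the SHARD KEY clause controls how rows are distributed across the leaf-node partitions):

CREATE TABLE events (
  id BIGINT,
  ts DATETIME(6),
  payload JSON,
  SHARD KEY (id)
);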

How Customers Have Used MemSQL for Time Series Data

So with that background on MemSQL, let’s talk about using MemSQL for time series data. First of all, for those of you who are not really familiar with time series, a time series is simply a time-ordered sequence of events of some kind. Typically, each time series entry has, at least, a time value and some sort of data value that’s taken at that time. Here’s an example time series of pricing of a stock over time, over like an hour and a half or so period.

MemSQL time series stock prices

You can see that the data moves up and down as you advance in time. Typically, data at any point in time is closely correlated to the immediately previous point in time. Here’s another example, of flow rate. People are using MemSQL for energy production, for example, in utilities. They may be storing and managing data representing flow rates. Here’s another example, a long-term time series of some health-oriented data from the US government, from the Centers for Disease Control, about chronic kidney disease over time.

These are just three examples of time series data. Virtually every application that’s collecting business events of any kind has a time element to it. In some sense, almost all applications have a time series aspect to them.

Let’s talk about time series database use cases. It’s necessary, when you’re managing time-oriented data, to store new time series events or entries, to retrieve the data, to modify time series data – to delete or append or truncate the data, or in some cases, you may even update the data to correct an error. Or you may be doing some sort of updating operation where you are, say, accumulating data for a minute or so. Then, once the data has sort of solidified or been finalized, you will no longer update it. There are many different modification scenarios for time series data.

Another common operation on time series data is to do things like convert an irregular time series to a regular time series. For example, data may arrive with a random sort of arrival process, and the spacing between events may not be equal, but you may want to convert that to a regular time series. Like maybe data arrives every 1 to 10 seconds, kind of at random. You may want to create a time series which has exactly 1 data point every 15 seconds. That’s an example of converting from an irregular to a regular time series.

MemSQL time series use cases

Another kind of operation on time series is to downsample. That means you may have a time series with one tick every second; maybe you want to have one tick every minute. That's downsampling. Another common operation is smoothing. You may have some simple smoothing capability, like a five-second moving average of a time series, where you average together the previous five seconds' worth of data from the series, or a more complex kind of smoothing – say, where you fit a curve through the data to smooth it, such as a spline curve. There are many, many more kinds of time series use cases.

A little history about how MemSQL has been used for time series is important to give, for context. Customers already use MemSQL for time series event data extensively, using our previously shipped releases, before the recent shipment of MemSQL 7.0 and its time series-specific features. Lots of our customers store business events with some sort of time element. We have quite a few customers in the financial sector that are storing financial transactions in MemSQL. Of course, each of these has a time element to it, recording when the transaction occurred.

MemSQL Time series plusses

Also, lots of our customers have been using us for Internet of Things (IoT) events. For example, in utilities, in energy production, media and communications, and web and application development – for example, advertising applications. As I mentioned before, MemSQL is really tremendous for fast and easy streaming. With our Pipelines capability, it's fast and easy to load data, and we have very high-performance insert data manipulation language (DML). You can do millions of inserts per second on a MemSQL cluster.

We have a columnstore storage mechanism with tremendous compression – typically in the range of 5x to 10x, compared to raw data. It's easy to store a very large volume of historical data in a columnstore table in MemSQL. Because of the capabilities that MemSQL provides – high scalability, high-performance SQL, fast and easy ingest, and high compression with columnar data storage – MemSQL has been a really attractive destination for people that are managing time series data.

New Time Series Features in MemSQL 7.0

(For more on what’s in MemSQL 7.0, see our release blog post, our deep dive into resiliency features, and our deep dive into MemSQL SingleStore. We also have a blog post on our time series features. – Ed.)

Close to half of our customers are using time series in some form, or they look at the data they have as time series. What we wanted to do for the 7.0 release was to make time series querying easier. We looked at some of our customers’ applications, and some internal applications we had built on MemSQL for historical monitoring. We saw that, while the query language is very powerful and capable, it looked like some of the queries could be made much easier.

MemSQL easy time series queries

We wanted to provide a very brief syntax to let people write common types of queries – to do things like downsampling, or converting irregular time series to regular time series – and to make that really easy. We wanted to let more typical developers do things they couldn't do before with SQL because it was just too hard, and let experts do more, and do it faster, so they could spend more time on other parts of their application rather than writing tricky queries to extract information from time series.

That said, we were not trying to be the ultimate time series specialty package. For example, if you need curve fitting, or very complex kinds of smoothing, or you need to add together two different time series, we're not really trying to make those use cases as easy and fast as they can be. We're looking at a conventional ability to manage large volumes of time series data, ingest the time series fast, and handle typical and common query use cases through SQL easily. That's what we want to provide. If you need some of those specialty capabilities, you probably want to consider a more specialized time series product like kdb+ or something similar.

Throughout the rest of the talk, I’m going to be referring a few times to an example based on candlestick charts. A candlestick chart is a typical kind of chart used in the financial sector to show high, low, open, and close data for a security, during some period of time – like an entire trading day, or by minute, or by hour, et cetera.

MemSQL time series candlestick chart

This graphic shows a candlestick chart with high, low, open, close graphic so that the little lines at the top and bottom show the high and low respectively. Then, the box shows the open and close. Just to start off with, I wanted to show a query using MemSQL 6.8 to calculate information that is required to render a candlestick chart like you see here.

MemSQL time series old and new code

On the left side, this is a query that works in MemSQL 6.8 and earlier to produce a candlestick chart from a simple series of financial trade or transaction events. On the right-hand side, that’s how you write the exact same query in MemSQL 7.0. Wow. Look at that. It’s about one third as many characters as you see on the left, and also it’s much less complex.

On the left, you see you’ve got a common table expression with a nested select statement that’s using window functions, sort of a relatively complex window function, and several aggregate functions. It’s using rank, and then using a trick to pick out the top-ranked value at the bottom. Anyway, that’s a challenging query to write. That’s an expert-level query, and even experts struggle a little bit with that. You might have to refer back to the documentation.

I’ll go over this again in a little more detail, but just please remember this picture. Look how easy it is to manage time series data to produce a simple candlestick chart on the right compared to what was required previously. How did we enable this? We provide some new time series functions and capabilities in MemSQL 7.0 that allowed us to write that query more easily.

New MemSQL time series functions

We provide three new built-in functions: FIRST(), LAST(), and TIME_BUCKET(). FIRST() and LAST() are aggregate functions that provide the first or last value in a time window or group, based on some time period that defines an ordering. I’ll say more about those in a few minutes. TIME_BUCKET() is a function that maps a timestamp to a one-minute or five-minute or one-hour window, or one-day window, et cetera. It allows you to do it in a very easy way with a very brief syntax, that’s fairly easy to learn and remember.

Finally, we’ve added a new designation called the SERIES TIMESTAMP column designation, which allows you to mark one of your columns as the time column for your time series. That allows some shorthand notations that I’ll talk about more.

Time series timestamp example

Here's a very simple example table that holds time series data for financial transactions. We've got a ts column that's a datetime(6), marked as the series timestamp. The datetime(6) type is a standard datetime with six places to the right of the decimal point – accurate down to the microsecond. Symbol is a stock symbol, a character string of up to five characters. Price is a decimal with up to 18 digits, 4 of them to the right of the decimal point. So, a very simple time series table for financial information.
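
Reconstructed as DDL from the description above (a sketch; the SERIES TIMESTAMP designation is the new column marker discussed earlier):

CREATE TABLE tick (
  ts DATETIME(6) SERIES TIMESTAMP,
  symbol CHAR(5),
  price DECIMAL(18,4)
);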

In the examples that follow, I'm going to use this simple data set. We've got two made-up stocks, ABC and XYZ, that have some data that arrived in a single day, February 18th of next year, within a period of a few minutes. We'll use that data in the examples.

Let's look in more detail at the old way of querying time series data with MemSQL, using window functions. I want to, for each symbol, for each hour, produce high, low, open, and close. This uses a window function that partitions by symbol and time bucket, ordered by timestamp, with rows between unbounded preceding and unbounded following. "Unbounded" means that any aggregates we calculate over this window will be over the entire window.

Old code for time series with SQL

Then, we compute the rank, which is the serial number based on the sort order: 1, 2, 3, 4, 5 – one is first, two is second, and so forth. Then, the minimum and maximum over the window, and first value and last value over the window. First value and last value are the very original value and the very final value in the window, based on the sort order of the window. Then, you see the expression that takes the Unix timestamp of ts, divides it by 60 times 60, and multiplies by 60 times 60 again.

This is a trick that people who manage time series data with SQL have learned. Basically, you can divide a timestamp by a window width using integer division, then multiply by the window width again, and that will chunk up a fine-grained timestamp into a coarser grain aligned on a window boundary. In this case, it's 60 times 60. Then, finally, in the select block at the end, you're selecting the timestamp from above, the symbol, min price, max price, first, and last – but the step above produced an entry for every single point in the series, and we really only want one. We pick out the top-ranked one.
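
A sketch reconstructing the 6.8-era pattern just described (a hypothetical reconstruction; the exact query on the slide may differ):

WITH ranked AS (
  SELECT
    FROM_UNIXTIME(UNIX_TIMESTAMP(ts) DIV (60*60) * (60*60)) AS ts_bucket,
    symbol,
    RANK() OVER (PARTITION BY UNIX_TIMESTAMP(ts) DIV (60*60), symbol
                 ORDER BY ts) AS r,
    MIN(price) OVER (PARTITION BY UNIX_TIMESTAMP(ts) DIV (60*60), symbol) AS min_price,
    MAX(price) OVER (PARTITION BY UNIX_TIMESTAMP(ts) DIV (60*60), symbol) AS max_price,
    FIRST_VALUE(price) OVER (PARTITION BY UNIX_TIMESTAMP(ts) DIV (60*60), symbol
                             ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING
                             AND UNBOUNDED FOLLOWING) AS first_price,
    LAST_VALUE(price) OVER (PARTITION BY UNIX_TIMESTAMP(ts) DIV (60*60), symbol
                            ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING
                            AND UNBOUNDED FOLLOWING) AS last_price
  FROM tick
)
SELECT ts_bucket, symbol, min_price, max_price, first_price, last_price
FROM ranked
WHERE r = 1;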

Anyway, this is tricky. I mean, this is the kind of thing that will take an expert user from several minutes to many minutes to write, with references back to the documentation. Can we do better than this? How can we do better? We introduced FIRST() and LAST() as regular aggregate functions, in order to enable this kind of use case with less code. We've got a very basic example: select first(price, ts) from tick. The second argument to the FIRST() aggregate is a timestamp, and it's optional.

If it's not present, then we infer that you meant to use the series timestamp column of the table that you're querying. The top one is the full notation, but in the bottom query you say select first(price), last(price) from tick. Those implicitly use the series timestamp column ts as the time argument, the second argument to those aggregate functions. It just makes the query easier to write. You don't have to remember to explicitly put the series time value in the right place when you use those functions.
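
Both notations side by side, as a sketch against the tick table:

-- full notation: the time argument is explicit
SELECT symbol, FIRST(price, ts), LAST(price, ts) FROM tick GROUP BY symbol;

-- shorthand: the SERIES TIMESTAMP column ts is inferred
SELECT symbol, FIRST(price), LAST(price) FROM tick GROUP BY symbol;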

Next, we have a new function for time bucketing. You don’t have to write that tricky divide, and then that multiply kind of expression that I showed you before. Much, much easier to use, more intuitive. Time bucket takes a bucket width, and that’s a character string like 5m, for five minutes, 1h for one hour, and so forth. Then, two optional arguments – the time and the origin.

New code with MemSQL Time Series functions

The time is optional, just like before. If you don't specify it, then we implicitly use the series timestamp column from the table that you're querying. Then, origin allows you to provide an offset. For example, if you want to do time bucketing but start at 8:00 AM every day – you want to bucket by day, but start your day at 8 AM instead of midnight – then you can put in an origin argument.

Again, this is far easier than the tricky math expression that we used for that candlestick query before. Here's an example of using origin, with an 8 AM origin. We've got this table t, where ts is the series timestamp and v is a value that's a double-precision float. You see the query there in the middle: select time_bucket('1d', ts, …), where you pick a date near the timestamps that you're working with and provide… that's your origin. It's an 8 AM origin.

You can see down below that the day bucket boundaries start at 8 AM. Normally you're not going to need to use an origin, but if you do need an offset, you can do that. Again, let's look at the new way of writing the candlestick chart query. We say select time_bucket('1h'), which is a one-hour bucket, then the symbol, the minimum price, the maximum price, the first price, and the last price.

Notice that in first and last and time bucket, we don’t even have to refer to the timestamp column in the original data set, because it’s implicit. Some of you may have worked with specialty products for managing web events like Splunk or Azure Kusto, and so this concept of using a time bucket function or a bucket function with an easy notation like this, you may be familiar with that from those kind of systems.

One of the reasons people like those products so much for the use cases they're designed for is that it's really easy to query the data. The queries are very brief. We tried to bring that brevity for time series data to SQL with this new capability, with the series timestamp that's an implicit argument to these functions. Then, just group by 2, 1 – the symbol and the time bucket – and order by 2, 1. So, a very simple query expression.
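
Putting it together, the new candlestick query as described (a sketch; the column aliases are mine):

SELECT TIME_BUCKET('1h') AS ts_bucket, symbol,
       MIN(price) AS low_price,
       MAX(price) AS high_price,
       FIRST(price) AS open_price,
       LAST(price) AS close_price
FROM tick
GROUP BY 2, 1
ORDER BY 2, 1;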

Just to recap: MemSQL for several years has been great for time series ingest and storage, and people loved it for that. We have very fast ingest; powerful SQL capability, with time-oriented functions as part of our window function capability; high-performance query processing based on compilation to machine code and vectorization; as well as scalability through scale-out, and the ability to support high concurrency, where you've got lots of writers and readers concurrently working on the same data set. Not to mention, we provide transaction support and easy manageability, and we're built for the cloud.

Now, given all the capabilities we already had, we’re making it even easier to query time series data with this new brief syntax, these new functions, first, last, and time bucket in the series timestamp concept, that allows you to write queries very briefly, without having to reference, repeatedly and redundantly, to the time column in your table.

Time series functions recap

This lets non-expert users do more than they could before – things they just weren't capable of before with time series data – and it makes expert users more productive. I'd like to invite you to try MemSQL for free today, or contact Sales. Try it for free by using our free version, or go on Helios and do an eight-hour free trial. Either way, you can try MemSQL for no charge. Thank you.

Q&A: MemSQL and Time Series

Q. What’s the best way to age out old data from a table storing time series data?

A. The life cycle management of time series data is really important in any kind of time series application. One of the things you need to do is eliminate or purge old data. It’s really pretty easy to do that in MemSQL. All you have to do is run a delete statement periodically to delete the old data. Some other database products have time-oriented partitioning capabilities, and their delete is really slow, so they require you to, for instance, swap out an old partition once a month or so to purge old data from a large table. In MemSQL, you don’t really need to do that, because our delete is really, really fast. We can just run a delete statement to delete data prior to a certain time, whenever you need to remove old data.
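
A minimal sketch of such a purge, assuming a 30-day retention window on the tick table from earlier:

DELETE FROM tick WHERE ts < NOW() - INTERVAL 30 DAY;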

Q. Can you have more than one time series column in a table?

A. You can only designate one column in a table as the series timestamp. However, you can have multiple time columns in a table, and if you want to use different columns, you can use those columns explicitly with our new built-in time functions – FIRST(), LAST(), and TIME_BUCKET(). There's an optional time argument, so if you have a secondary time column on a table that's not your primary series timestamp, but you want to use it with some of those functions, you can do it. You just have to name the time column explicitly in the FIRST(), LAST(), and TIME_BUCKET() functions.

Q. Does it support multi-tenancy?

A. Does it support multi-tenancy? Sure. MemSQL supports any number of concurrent users, up to a very high number of concurrent queries. You can have multiple databases on a single cluster, and each application can have its own database if you want, so you can have multi-tenant applications running on the same cluster.

Q. Does MemSQL keep a local copy of the data ingested or does it only keep references? If MemSQL keeps a local copy, how is it kept in sync with external sources?

A. MemSQL is a database system: you create tables, you insert data into the tables, you query the data, and you can update or delete it. When you add a record to MemSQL, a copy of that record, the record itself, is kept in MemSQL. It doesn’t store data by reference; it stores copies of the data. If you want to keep it in sync with external sources, then as the external values change, you’ll need to update the records that represent that information in MemSQL.

Q. How can you compute a moving average on a time series in MemSQL?

A. Sure, you can compute a moving average; it depends on how you want to do it. If you just want to average the data in each time bucket, you can use AVG to do that. If you want a true moving average, you can use window functions, computing an average over a window as it moves. For example, you can average over a window from three preceding rows to the current row to average the last four values.
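A sketch of the windowed form, again over the hypothetical tick table, averaging each row with the three rows before it:

SELECT ts, symbol, price,
       AVG(price) OVER (
           PARTITION BY symbol
           ORDER BY ts
           ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
       ) AS moving_avg
FROM tick;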

Q. Did you mention anything about Python interoperability? In any event, what Python interface capabilities do you offer?

A. We do have Python interoperability, in that client applications can connect to MemSQL and insert data, query data, and so forth from just about any popular programming language. We support connectivity to applications through drivers that are MySQL wire protocol-compatible. Essentially, any application software that can connect to a MySQL database and insert data, update data, and so forth, can also connect to MemSQL.
We have drivers for Python that allow you to write a Python application and connect it to MemSQL. In addition, in our Pipeline capability, we support what are called transforms. Those are programs or scripts that can be applied to transform batches of information that are flowing into MemSQL through the Pipeline. You can write transforms in Python as well.

Q. Do I need to add indexes to be able to run fast select queries on time series data, with aggregations?

A. So, depending on the nature of the queries, how much data you have, and how much hardware you have, you may or may not need indexes to make certain queries run fast. It really depends on your data and your queries. If you have very large data sets, high-selectivity queries, and a lot of concurrency, you’re probably going to want to use indexes. We support indexes on our rowstore table type, both ordered indexes and hash indexes.

Then, for our columnstore table type, we have a primary sort key, which is like an index in some ways, as well as support for secondary hash indexes. However, the ability to shard your data across multiple nodes in a large cluster and use columnstore data storage structures with very fast vectorized query execution makes it possible to run queries with response times of a fraction of a second, on very large data sets, without an index.
That can make things easier for you as an application developer: you can let the power of your computing cluster and database software do the work, and not have to be so clever about defining your indexes. Again, it really depends on the application.

Q. Can you please also talk about encryption and data access roles, management for MemSQL?

A. With respect to encryption, for those customers that want to encrypt their data at rest, we recommend that they use Linux file system capabilities or cloud storage platform capabilities to do that, to encrypt the data through the storage layer underneath the database system.
Then, with respect to access control, MemSQL has a comprehensive set of data access capabilities. You can grant permission to access tables and views to different users or groups. We support single sign-on through a number of different mechanisms. We have a pretty comprehensive set of access control policies. We also support row-level security.

Q. What kind of row locking will I struggle with when using many transactions, selects, updates, and deletes at once?

A. MemSQL has multi-version concurrency control, so readers don’t block writers and vice versa. Write-write conflicts are usually handled at row-level lock granularity.

Q. How expensive is it to reindex a table?

A. CREATE INDEX is typically fast. I have not heard of customers having problems with it.

Q. Your reply on moving averages seems to pertain to simple moving averages, but how would you do exponential or weighted moving averages, where a window function may not be appropriate?

A. For that, you’d have to do it in the client application or in a stored procedure, or consider using a different time series tool.

Q. Are there any utilities available for migrating time series data to or from existing datastores like Informix?

A. For straight relational table migration, yes. But you’d probably have to do some custom work to move data from a time series DataBlade in Informix to regular tables in MemSQL.

Q. Does series timestamp accept integer data type or it has to be datetime data type?

A. The data type must be TIME, DATETIME, or TIMESTAMP; an integer column won’t work. TIMESTAMP is not recommended because it has implied update behavior.

Q. Any plans to support additional aggregate functions with the time series functions? (e.g. we would have liked to get percentiles like first/last without the use of CTEs)

A. PERCENTILE_CONT and PERCENTILE_DISC work in MemSQL 7.0 as regular aggregates. If you want other aggregates, let us know.

Q. Where can I find more info on AI (ML & DL) in MemSQL?

A. See the documentation for the DOT_PRODUCT and EUCLIDEAN_DISTANCE functions, our past webinar recordings on this topic, and this blog post: https://www.memsql.com/blog/memsql-data-backbone-machine-learning-and-ai/

Q. Can time series data be associated with asset context and queried in asset context. (Like a tank, with temperature, pressure, etc., within the asset context of the tank name.)

A. A time series record can have one timestamp and multiple fields, so you could use regular string fields for the asset context and numeric fields for the metrics you want to plot and aggregate.

Q. Guessing the standard role-based security model exists to restrict access to time series data.

A. Yes.

(End of Q&A)

We invite you to learn more about MemSQL at https://www.memsql.com, or give us a try for free at https://www.memsql.com/free.

Maximizing Database Query Efficiency for MySQL – Part Two


Feed: Planet MySQL
;
Author: Severalnines
;

This is the second part of a two-part series blog for Maximizing Database Query Efficiency In MySQL. You can read part one here.

Using Single-Column, Composite, Prefix, and Covering Index

Tables that frequently receive high traffic must be properly indexed. It’s not only important to index your tables; you also need to determine and analyze the types of queries and data retrieval you need for each specific table before deciding which indexes it requires. Let’s go over these types of indexes and how you can use them to maximize your query performance.

Single-Column Index

An InnoDB table can contain a maximum of 64 secondary indexes. A single-column index (or full-column index) is an index assigned to one particular column. A column that contains distinct values is a good candidate for an index. A good index must have high cardinality and statistics so the optimizer can choose the right query plan. To view the distribution of indexes, you can check with the SHOW INDEXES syntax, just like below:

root[test]#> SHOW INDEXES FROM users_account\G
*************************** 1. row ***************************
        Table: users_account
   Non_unique: 0
     Key_name: PRIMARY
 Seq_in_index: 1
  Column_name: id
    Collation: A
  Cardinality: 131232
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:
*************************** 2. row ***************************
        Table: users_account
   Non_unique: 1
     Key_name: name
 Seq_in_index: 1
  Column_name: last_name
    Collation: A
  Cardinality: 8995
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:
*************************** 3. row ***************************
        Table: users_account
   Non_unique: 1
     Key_name: name
 Seq_in_index: 2
  Column_name: first_name
    Collation: A
  Cardinality: 131232
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:
3 rows in set (0.00 sec)

You can also inspect the information_schema.index_statistics or mysql.innodb_index_stats tables.
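For example, a quick look at the persistent statistics InnoDB keeps per index (column names as in MySQL 5.6 and later):

SELECT index_name, stat_name, stat_value, stat_description
FROM mysql.innodb_index_stats
WHERE database_name = 'test'
  AND table_name = 'users_account';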

Compound (Composite) or Multi-Part Indexes

A compound index (commonly called a composite index) is a multi-part index composed of multiple columns. MySQL allows up to 16 columns in a composite index. Exceeding the limit returns an error like below:

ERROR 1070 (42000): Too many key parts specified; max 16 parts allowed

A composite index provides a boost to your queries, but it requires that you have a clear understanding of how you are retrieving the data. For example, a table with a DDL of…

CREATE TABLE `user_account` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `last_name` char(30) NOT NULL,
  `first_name` char(30) NOT NULL,
  `dob` date DEFAULT NULL,
  `zip` varchar(10) DEFAULT NULL,
  `city` varchar(100) DEFAULT NULL,
  `state` varchar(100) DEFAULT NULL,
  `country` varchar(50) NOT NULL,
  `tel` varchar(16) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `name` (`last_name`,`first_name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

…which contains the composite index `name`. The composite index improves query performance when those keys are referenced as used key parts. For example, see the following:

root[test]#> explain format=json select * from users_account where last_name='Namuag' and first_name='Maximus'\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1.20"
    },
    "table": {
      "table_name": "users_account",
      "access_type": "ref",
      "possible_keys": [
        "name"
      ],
      "key": "name",
      "used_key_parts": [
        "last_name",
        "first_name"
      ],
      "key_length": "60",
      "ref": [
        "const",
        "const"
      ],
      "rows_examined_per_scan": 1,
      "rows_produced_per_join": 1,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "1.00",
        "eval_cost": "0.20",
        "prefix_cost": "1.20",
        "data_read_per_join": "352"
      },
      "used_columns": [
        "id",
        "last_name",
        "first_name",
        "dob",
        "zip",
        "city",
        "state",
        "country",
        "tel"
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)

The used_key_parts field shows that the query plan selected exactly our desired columns, covered by our composite index.

Composite indexing has its limitations as well. With certain conditions in the query, the optimizer cannot use all of the columns that are part of the key.

The documentation says, “The optimizer attempts to use additional key parts to determine the interval as long as the comparison operator is =, <=>, or IS NULL. If the operator is >, <, >=, <=, !=, <>, BETWEEN, or LIKE, the optimizer uses it but considers no more key parts. For the following expression, the optimizer uses = from the first comparison. It also uses >= from the second comparison but considers no further key parts and does not use the third comparison for interval construction…”. Basically, this means that even if you have a composite index on two columns, the sample query below does not cover both fields:

root[test]#> explain format=json select * from users_account where last_name>='Zu' and first_name='Maximus'\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "34.61"
    },
    "table": {
      "table_name": "users_account",
      "access_type": "range",
      "possible_keys": [
        "name"
      ],
      "key": "name",
      "used_key_parts": [
        "last_name"
      ],
      "key_length": "60",
      "rows_examined_per_scan": 24,
      "rows_produced_per_join": 2,
      "filtered": "10.00",
      "index_condition": "((`test`.`users_account`.`first_name` = 'Maximus') and (`test`.`users_account`.`last_name` >= 'Zu'))",
      "cost_info": {
        "read_cost": "34.13",
        "eval_cost": "0.48",
        "prefix_cost": "34.61",
        "data_read_per_join": "844"
      },
      "used_columns": [
        "id",
        "last_name",
        "first_name",
        "dob",
        "zip",
        "city",
        "state",
        "country",
        "tel"
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)

In cases like this, when your queries use ranges rather than constant or reference lookups, avoid relying on composite indexes. They just waste memory and buffer space and degrade the performance of your queries.

Prefix Indexes

A prefix index indexes a column using only a defined number of its leading characters, and only that portion (the prefix data) is stored in the buffer. Prefix indexes can reduce pressure on your buffer pool as well as your disk space, since they do not need the full length of the column. What does this mean? Let’s take an example and compare the impact of a full-length index versus a prefix index.

root[test]#> create index name on users_account(last_name, first_name);
Query OK, 0 rows affected (0.42 sec)
Records: 0  Duplicates: 0  Warnings: 0

root[test]#> \! du -hs /var/lib/mysql/test/users_account.*
12K     /var/lib/mysql/test/users_account.frm
36M     /var/lib/mysql/test/users_account.ibd

We created a full-length composite index, which consumes a total of 36MiB of tablespace for the users_account table. Let’s drop it, rebuild the table to reclaim the space, and then add a prefix index.

root[test]#> drop index name on users_account;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

root[test]#> alter table users_account engine=innodb;
Query OK, 0 rows affected (0.63 sec)
Records: 0  Duplicates: 0  Warnings: 0

root[test]#> \! du -hs /var/lib/mysql/test/users_account.*
12K     /var/lib/mysql/test/users_account.frm
24M     /var/lib/mysql/test/users_account.ibd

root[test]#> create index name on users_account(last_name(5), first_name(5));
Query OK, 0 rows affected (0.42 sec)
Records: 0  Duplicates: 0  Warnings: 0

root[test]#> \! du -hs /var/lib/mysql/test/users_account.*
12K     /var/lib/mysql/test/users_account.frm
28M     /var/lib/mysql/test/users_account.ibd

Using the prefix index, the tablespace holds only 28MiB; that’s 8MiB less than using the full-length index. That’s great to hear, but it doesn’t mean the prefix index is performant and serves what you need.

If you decide to add a prefix index, you must first identify the type of query for data retrieval you need. Creating a prefix index helps you use the buffer pool more efficiently, so it does help your query performance, but you also need to know its limitations. For example, let’s compare the performance when using a full-length index and a prefix index.

Let’s create a full-length index using a composite index:

root[test]#> create index name on users_account(last_name, first_name);
Query OK, 0 rows affected (0.45 sec)
Records: 0  Duplicates: 0  Warnings: 0

root[test]#> EXPLAIN format=json select last_name from users_account where last_name='Namuag' and first_name='Maximus Aleksandre'\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1.61"
    },
    "table": {
      "table_name": "users_account",
      "access_type": "ref",
      "possible_keys": [
        "name"
      ],
      "key": "name",
      "used_key_parts": [
        "last_name",
        "first_name"
      ],
      "key_length": "60",
      "ref": [
        "const",
        "const"
      ],
      "rows_examined_per_scan": 3,
      "rows_produced_per_join": 3,
      "filtered": "100.00",
      "using_index": true,
      "cost_info": {
        "read_cost": "1.02",
        "eval_cost": "0.60",
        "prefix_cost": "1.62",
        "data_read_per_join": "1K"
      },
      "used_columns": [
        "last_name",
        "first_name"
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)

root[test]#> flush status;
Query OK, 0 rows affected (0.02 sec)

root[test]#> pager cat -> /dev/null; select last_name from users_account where last_name='Namuag' and first_name='Maximus Aleksandre'\G
PAGER set to 'cat -> /dev/null'
3 rows in set (0.00 sec)

root[test]#> nopager; show status like 'Handler_read%';
PAGER set to stdout
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 1     |
| Handler_read_last     | 0     |
| Handler_read_next     | 3     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
7 rows in set (0.00 sec)

The result reveals that it is, in fact, using a covering index ("using_index": true) and that the index is used properly: Handler_read_key is incremented, and it does an index scan, as Handler_read_next is incremented.

Now, let’s try the same approach using a prefix index:

root[test]#> create index name on users_account(last_name(5), first_name(5));
Query OK, 0 rows affected (0.22 sec)
Records: 0  Duplicates: 0  Warnings: 0

root[test]#> EXPLAIN format=json select last_name from users_account where last_name='Namuag' and first_name='Maximus Aleksandre'\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "3.60"
    },
    "table": {
      "table_name": "users_account",
      "access_type": "ref",
      "possible_keys": [
        "name"
      ],
      "key": "name",
      "used_key_parts": [
        "last_name",
        "first_name"
      ],
      "key_length": "10",
      "ref": [
        "const",
        "const"
      ],
      "rows_examined_per_scan": 3,
      "rows_produced_per_join": 3,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "3.00",
        "eval_cost": "0.60",
        "prefix_cost": "3.60",
        "data_read_per_join": "1K"
      },
      "used_columns": [
        "last_name",
        "first_name"
      ],
      "attached_condition": "((`test`.`users_account`.`first_name` = 'Maximus Aleksandre') and (`test`.`users_account`.`last_name` = 'Namuag'))"
    }
  }
}
1 row in set, 1 warning (0.00 sec)

root[test]#> flush status;
Query OK, 0 rows affected (0.01 sec)

root[test]#> pager cat -> /dev/null; select last_name from users_account where last_name='Namuag' and first_name='Maximus Aleksandre'\G
PAGER set to 'cat -> /dev/null'
3 rows in set (0.00 sec)

root[test]#> nopager; show status like 'Handler_read%';
PAGER set to stdout
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 1     |
| Handler_read_last     | 0     |
| Handler_read_next     | 3     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
7 rows in set (0.00 sec)

MySQL reveals that it does use the index properly, but noticeably, there’s a cost overhead compared to the full-length index. That’s obvious and explainable, since the prefix index does not cover the whole length of the field values. Using a prefix index is not a replacement for, nor an alternative to, full-length indexing. It can also produce poor results when used inappropriately, so you need to determine the type of query and data you need to retrieve.

Covering Indexes

Covering indexes don’t require any special syntax in MySQL. A covering index in InnoDB refers to the case when all fields selected in a query are covered by an index. The query does not need to do a sequential read over the disk to read the data in the table; it uses only the data in the index, significantly speeding up the query. For example, take our query from earlier:

select last_name from users_account where last_name='Namuag' and first_name='Maximus Aleksandre'\G

As mentioned earlier, it is a covering index. When you have well-planned tables for storing your data and have created your indexes properly, try as much as possible to design your queries to leverage covering indexes, so that you benefit from the result. This can help you maximize the efficiency of your queries and results in great performance.
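As a sketch, if this lookup also needed dob, extending the composite index would keep the read index-only; the index name and column choice here are illustrative:

ALTER TABLE users_account ADD INDEX name_dob (last_name, first_name, dob);

-- All selected columns are in the index, so no table rows are read.
SELECT last_name, first_name, dob
FROM users_account
WHERE last_name = 'Namuag' AND first_name = 'Maximus Aleksandre';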

Leverage Tools That Offer Advisors or Query Performance Monitoring

Organizations often initially tend to go first to GitHub and find open-source software that can offer great benefits. For simple advisories that help you optimize your queries, you can leverage the Percona Toolkit. For a MySQL DBA, the Percona Toolkit is like a Swiss Army knife.

For operations where you need to analyze how you are using your indexes, you can use pt-index-usage.

pt-query-digest is also available, and it can analyze MySQL queries from logs, the processlist, and tcpdump. In fact, the most important tool for analyzing and inspecting bad queries is pt-query-digest. Use this tool to aggregate similar queries together and report on those that consume the most execution time.

For archiving old records, you can use pt-archiver. To inspect your database for duplicate indexes, take advantage of pt-duplicate-key-checker. You might also benefit from pt-deadlock-logger: although deadlocks are not a cause of an underperforming, inefficient query but rather of a poor implementation, they still impact query efficiency. If you need table maintenance that adds indexes online without affecting the database traffic going to a particular table, you can use pt-online-schema-change. Alternatively, you can use gh-ost, which is also very useful for schema migrations.

If you are looking for enterprise features bundled with query performance monitoring, alarms and alerts, dashboards or metrics that help you optimize your queries, and advisors, ClusterControl may be the tool for you. ClusterControl offers many features that show you Top Queries, Running Queries, and Query Outliers. Check out the blog MySQL Query Performance Tuning, which guides you through monitoring your queries with ClusterControl.

Conclusion

You’ve arrived at the end of our two-part blog series. We covered the factors that cause query degradation and how to resolve them in order to maximize your database queries. We also shared some tools that can benefit you and help solve your problems.

Collect and distribute high-resolution crypto market data with ECS, S3, Athena, Lambda, and AWS Data Exchange


Feed: AWS Big Data Blog.

This is a guest post by Floating Point Group. In their own words, “Floating Point Group is on a mission to bring institutional-grade trading services to the world of cryptocurrency.”

The need and demand for financial infrastructure designed specifically for trading digital assets may not be obvious. There’s a rather pervasive narrative that these coins and tokens are effectively natively digital counterparts to traditional assets such as currencies, commodities, equities, and fixed income. This narrative often manifests in the form of pithy one-liners recycled by pundits attempting to communicate the value proposition of various projects in the space (such as, “Bitcoin is just a currency with an algorithmically controlled, tamper-proof monetary policy,” or, “Ether is just a commodity like gasoline that you can use to pay for computational work on a global computer.”). Unsurprisingly, we at FPG often hear the question, “What’s so special about cryptocurrencies that they warrant dedicated financial services? Why do we need solutions for problems that have already been solved?”

The truth is that these assets and the widespread public interest surrounding them are entirely unprecedented. The decentralized ledger technology that serves as an immutable record of network transactions, the clever use of proof-of-work algorithms to economically incentivize rational actors to help uphold the security of the network (the proof-of-work concept dates back at least as far as 1993, but it was not until bitcoin that the technology showed potential for widespread adoption), the irreversible nature of transactions that poses unique legal challenges in cases such as human error or extortion, the precariousness of self-custody (third-party custody solutions don’t exactly have track records that inspire trust), the regulatory uncertainties that come with the difficulty of both classifying these assets as well as arbitrating their exchange which must ultimately be reconciled by entities like the IRS, SEC, and CFTC—it is all very new, and very weird. With 24-hour market volume regularly exceeding $100 billion, we decided to direct our focus towards problems related specifically to trading these assets. Granted, crypto trading has undoubtedly matured since the days of bartering for bitcoin in web forums and witnessing 10% price spreads between international exchanges. But there is still a long path ahead.

One major pain point we are aiming to address for institutional traders involves liquidity (or, more precisely, the lack thereof). Simply put, the buying and selling of cryptocurrencies occurs across many different trading venues (exchanges), and liquidity (the offers to buy or sell a certain quantity of an asset at a certain price) continues to become more fragmented as new exchanges emerge. So say you’re trying to buy 100 bitcoins. You must buy from people who are willing to sell. As you take the best (cheapest) offers, you’re left with increasingly expensive offers. By the time you fill your order (in this example, buy all 100 bitcoins), you may have paid a much higher average price than, say, the price you paid for the first bitcoin of your order. This phenomenon is referred to as slippage. One easy way to minimize slippage is by expanding your search for offers. So rather than looking at the offers on just one exchange, look at the offers across hundreds of exchanges. This process, traditionally referred to as smart order routing (SOR), is one of the core services we provide. Our SOR service allows traders to easily submit orders that our system can match against the best offers available across multiple trading venues by actively monitoring liquidity across dozens of exchanges.

Fanning out large orders in search of the best prices is a rather intuitive and widely applicable concept—roughly 75% of equities are purchased and sold via SOR. But the value of such a service for crypto markets is particularly salient: a perpetual cycle of new exchanges surging in popularity while incumbents falter has resulted in a seemingly incessant fragmentation of liquidity across trading venues—yet traders tend to assume an exchange-agnostic mindset, concerned exclusively with finding the best price for a given quantity of an asset.

Access to both real-time and historical market data is essential to the functionality of our SOR service. The highest resolution data we could hope to obtain for a given market would include every trade and every change applied to the order book, effectively allowing us to recreate the state of a market at any given point in time. The updates provided through the WebSocket streams are not sufficient for reconstructing order books. We also need to periodically fetch snapshots of the order books and store those, which we can do using an exchange’s REST API. We can fetch a snapshot and apply the corresponding updates from the streams to “replay” the order book.

Fortunately, this data is freely available, because many exchanges offer real-time feeds of market data via WebSocket APIs. We found several third-party vendors selling subscriptions to these data sets, typically in the form of CSV dumps delivered at a weekly or monthly cadence. This presented the question of build vs. buy. Given that we felt capable of building a robust and reliable system for ingesting real-time market data in a relatively short amount of time and at a fraction of the cost of purchasing the data from a vendor, we were already leaning in favor of building. Further investigation made buying look like an increasingly unattractive option. Disclaimers that multiple vendors issued about their inability to guarantee data quality and consistency did not inspire confidence. Inspecting sample data sets revealed that some essential fields provided in the original data streams were missing—fields necessary for achieving our goal of recreating the state of a market at an arbitrary point in time. We also recognized that a weekly or monthly delivery schedule would restrict our ability to explore relatively recent market data.

This post provides a high-level overview of how we ingest and store real-time market data and how we use the AWS Data Exchange API to organize and publish our data sets programmatically. Our system’s functionality extends well beyond data ingestion, normalization, and persistence; we run dedicated services for data validation, caching the most recent trade and order book for every market, computing and storing derivative metrics, and other services that help safeguard data accuracy and minimize the latency of our trading systems.

Data ingestion

The WebSocket streams we connect to for data consumption are often the same APIs responsible for providing real-time updates to an exchange’s trading dashboard.

WebSocket connections transmit data as discrete messages. We can inspect the content of individual messages as they stream into the browser. For example, the following screenshot shows a batch of order book updates.

The updates are expressed as arrays of bids and asks that were either added to the book or removed from it. Client-side code processes each update, resulting in a real-time rendering of the market’s order book. In practice, our data ingestion service (Ingester) does not read a single stream, but rather thousands of different streams, covering various data feeds for all markets across multiple exchanges. All the connections required for such broad coverage and the resulting flood of incoming data raise some obvious concerns about data loss. We’ve taken several measures to mitigate such concerns, including a redundant system design that allows us to spin up an arbitrary number of instances of the Ingester service. Like most of our microservices, Ingester is a Dockerized service run on Amazon ECS and deployed via Terraform.

All these instances consume the same data feeds as each other while a downstream mechanism handles deduplication (this is covered in more detail later in this post). We also set up Amazon CloudWatch alerts to notify us when we detect non-contiguous messages, indicating a gap in the incoming data. The alerts don’t directly mitigate data loss, but they do serve the important function of prompting an investigation.

Ingester builds up separate buffers of incoming messages, split out by data-type/exchange/market. Then, after a fixed time interval, each buffer is flushed into Amazon S3 as a gzipped JSON file. The buffer-flush cycle repeats.

The following screenshot shows a portion of the file content.

This code snippet is a single, pretty-printed JSON record from the file in the screenshot above.

{
   "event_type":"trade",
   "timestamp":1571980320422,
   "ticker_pair":"BTCUSDT",
   "trade_id":194230159,
   "price":"7405.69000000",
   "quantity":"3.20285300",
   "buyer_order_id":730178987,
   "seller_order_id":730178953,
   "trade_timestamp":1571980320417,
   "buyer_market_maker":false,
   "M":true
}

Ingester handles additional functionality, such as applying pre-defined mappings of venue-specific field names to our internal field names. Data normalization is one of many processes necessary to enable our systems to build a holistic understanding of market dynamics.

As with most distributed system designs, our services are written with horizontal scalability as a first-order priority. We took the same approach in designing our data ingestion service, but it has some features that make it a bit different than the archetypical horizontally scalable microservice. The most common motivations for adjusting the number of instances of a given service are load-balancing and throttling throughput. Either your system is experiencing backpressure and a consumer service scales to alleviate that pressure, or the consumer is over-provisioned and you scale down the number of instances for the sake of parsimony. For our data ingestion service, however, our motivation for running multiple instances is to minimize data loss via redundancy. The CPU usage for each instance is independent of instance count, because each instance does identical work.

For example, rather than helping alleviate backpressure by pulling messages from a single queue, each instance of our data ingestion service connects to the same WebSocket streams and performs the same amount of work. Another somewhat unusual and confounding aspect of horizontally scaling our data ingestion service is related to state: we batch records in memory and flush the records to S3 every minute (based on the incoming message’s timestamp, not the system timestamp, because those would be inconsistent). Redundancy is our primary measure for minimizing data loss, but we also need each instance to write the files to S3 in such a way that we don’t end up with duplicate records. Our first thought was that we’d need a mechanism for coordinating activity across the instances, such as maintaining a cache that would allow us to check if a record had already been persisted. But we realized that we could perform this deduplication without any coordination between instances at all. Most of the message streams we consume publish messages with sequence IDs. We can combine the sequence IDs with the incoming message timestamp to achieve our deduplication mechanism: we can deterministically generate the same exact file names containing the exact same data by writing our service code to check that the message added to the batch has the appropriate sequence ID relative to the previous message in the batch and using the timestamp on the incoming message to determine the exact start and end of each batch (we typically get a UNIX timestamp and check when we’ve rolled over to the next clock minute). This allows us to simply rely on a key collision in S3 for deduplication.

AWS suggests a similar solution for a slightly different problem, relating to Amazon Kinesis Data Streams. For more information, see Handling Duplicate Records.

With this scheme, even if records are processed more than one time, the resulting Amazon S3 file has the same name and has the same data. The retries only result in writing the same data to the same file more than one time.

After we store the data, we can perform simple analytics queries on the billions of records we’ve stored in S3 using Amazon Athena, a query service that requires minimal configuration and zero infrastructure overhead. Athena has a concept of partitions (inherited from one of its underlying services, Apache Hive). Partitions are mappings between virtual columns (in our case: pair, year, month, and day) and the S3 directories in which the corresponding data is stored.

S3’s file system is not actually hierarchical. Files are prepended with long key prefixes that are rendered as directories in the AWS console when browsing a bucket’s contents. This has some non-trivial performance consequences when querying or filtering on large data sets.

The following screenshot illustrates a typical directory path.

By pointing Athena directly to a particular subset of data, a well-defined partitioning scheme can drastically reduce query run times and costs. Though the ability to perform ad hoc business analytics queries is primarily a convenience, taking the time to choose a sane multi-level partitioning scheme for Athena based on some of our most common access patterns seemed worthwhile. A poorly designed partition structure can result in Athena unnecessarily scanning huge swaths of data and ultimately render the service unusable.
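To illustrate, a partitioned table over the ingested trade records might be declared roughly as follows; the table name, columns, bucket path, and SerDe choice are assumptions for the sketch, not our actual configuration:

CREATE EXTERNAL TABLE trades (
    event_type      string,
    ticker_pair     string,
    trade_id        bigint,
    price           string,
    quantity        string,
    trade_timestamp bigint
)
PARTITIONED BY (pair string, year int, month int, day int)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://example-market-data/trades/';

With such a scheme, a query filtered on pair, year, month, and day scans only the matching S3 prefixes; note that partitions still have to be registered (for example, with ALTER TABLE ... ADD PARTITION or MSCK REPAIR TABLE) before Athena can prune to them.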

Data publication

Our pipeline for transforming thousands of small gzipped JSON files into clean CSVs and loading them into AWS Data Exchange involves three distinct jobs, each expressed as an AWS Lambda function.

Job 1

Job 1 is initiated shortly after midnight UTC by a cron-scheduled CloudWatch event. As mentioned previously, our data ingestion service’s batching mechanism flushes each batch to S3 at a regular time interval. A timestamp on the incoming message (applied server-side) determines the rollover from one interval to the next, as opposed to the ingestion service’s system timestamp, so in the rare case that a non-trivial amount of time elapses between the consumption of the final message of batch n and the first message of batch n+1, we kick off the first Lambda function 20 minutes after midnight UTC to minimize the likelihood of omitting data pending write.

Job 1 formats values for the date and data source into an Athena query template and outputs the query results as a CSV to a specified prefix path in S3. (Every Athena query produces a .metadata file and a CSV file of the query results, though DDL statements do not output a CSV.) This PUT request to S3 triggers an S3 event notification.

We run a full replica data ingestion system as an additional layer of redundancy. Using the coalesce conditional expression, the Athena query in Job 1 merges data from our primary system with the corresponding data from our replica system, and fills in any gaps while deduplicating redundant records.

We experimented fairly extensively with AWS Glue and PySpark for the ETL-related work performed in Job 1. When we realized that we could merge all the small source files into one, join the primary and replica data sets, and sort the results with a single Athena query, we decided to stick with this seemingly simpler and more elegant approach.

The following is a sketch of the shape one of our Athena query templates takes.
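The table names, pair filter, and columns below are illustrative stand-ins; COALESCE merges primary and replica rows as described above, with partition pruning pushed into each side before the merge:

WITH p AS (
    SELECT * FROM primary_trades
    WHERE pair = 'BTCUSDT' AND year = 2019 AND month = 10 AND day = 25
),
r AS (
    SELECT * FROM replica_trades
    WHERE pair = 'BTCUSDT' AND year = 2019 AND month = 10 AND day = 25
)
SELECT
    COALESCE(p.trade_id, r.trade_id)               AS trade_id,
    COALESCE(p.trade_timestamp, r.trade_timestamp) AS trade_timestamp,
    COALESCE(p.price, r.price)                     AS price,
    COALESCE(p.quantity, r.quantity)               AS quantity
FROM p
FULL OUTER JOIN r ON p.trade_id = r.trade_id
ORDER BY trade_timestamp, trade_id;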

Job 2

Job 2 is triggered by the S3 event notification from Job 1. Job 2 simply copies the query results CSV file to a different key within the same S3 bucket.

The motivation for this step is twofold. First, we cannot dictate the name of an Athena query results CSV file; it is automatically set to the Athena query ID. Second, when adding an S3 object as an asset to an AWS Data Exchange revision, the asset’s name is automatically set to the S3 object’s key. So to dictate how the CSV file name appears in AWS Data Exchange, we must first rename it, which we accomplish by copying it to a specified S3 key.

Job 3

Job 3 handles all work related to AWS Data Exchange and AWS Marketplace Catalog via their respective APIs. We use boto3, AWS’s Python SDK, to interface with these APIs. The AWS Marketplace Catalog API is necessary for adding data set revisions to products that have already been published. For more information, see Tutorial: Adding New Data Set Revisions to a Published Data Product.

Our code explicitly defines mappings with the following structure:

data source / DataSet / Product

The following code shows how we configure relationships between data sources, data sets, and products.

Our data sources are typically represented by a trading venue and data type combination (such as Binance trades or CoinbasePro order books). Each new file for a given data source is delivered as a single asset within a single new revision for a particular data set.

An S3 trigger kicks off the Lambda function. The trigger is scoped to a specified prefix that maps to a single data set. The function alias feature of AWS Lambda allows us to define the unique S3 triggers for each data set while reusing the same underlying Lambda function. Job 3 carries out the following steps (note that steps 1 through 5 refer to the AWS Data Exchange API while steps 6 and 7 refer to the AWS Marketplace Catalog API):

  1. Submits a request to create a new revision for the corresponding data set via CreateRevision.
  2. Adds the file that was responsible for triggering the Lambda function to the newly created revision via CreateJob using the IMPORT_ASSETS_FROM_S3 job type. To submit this job, we need to supply a few values: the S3 bucket and key values for the file are pulled from the Lambda event message, while the RevisionID argument comes from the response to the CreateRevision call in the previous step.
  3. Kicks off the job with StartJob, sourcing the JobID argument from the response to the CreateJob call in the previous step.
  4. Polls the job’s status via GetJob (using the job ID from the response to the StartJob call in the previous step) to check that our file (the asset) was successfully added to the revision.
  5. Finalizes the revision via UpdateRevision.
  6. Requests a description of the marketplace entity using DescribeEntity, passing in the product ID stored in our hardcoded mappings as the EntityID.
  7. Kicks off the entity ChangeSet via StartChangeSet, passing in the entity ID from the DescribeEntity response in the previous step as EntityID, the revision ARN parsed from the response to our earlier call to CreateRevision as RevisionArn, and the data set ARN as DataSetArn, which we fetch at the start of the code’s runtime using the AWS Data Exchange API’s GetDataSet.

Here’s a thin wrapper class we wrote to carry out the steps detailed above:

from time import sleep
import logging
import json

import boto3

from config import (
    DATA_EXCHANGE_REGION,
    MARKETPLACE_CATALOG_REGION,
    LambdaS3TriggerMappings
)

logger = logging.getLogger()


class CustomDataExchangeClient:
    def __init__(self):
        self._de_client = boto3.client('dataexchange', region_name=DATA_EXCHANGE_REGION)
        self._mc_client = boto3.client('marketplace-catalog', region_name=MARKETPLACE_CATALOG_REGION)
    
    def _get_s3_data_source(self, bucket, prefix):
        return LambdaS3TriggerMappings[(bucket, prefix)]

    # Job State can be one of: WAITING | IN_PROGRESS | ERROR | COMPLETED | CANCELLED | TIMED_OUT
    def _wait_for_de_job_completion(self, job_id):
        while True:
            get_job_resp = self._de_client.get_job(JobId=job_id)
            if get_job_resp['State'] == 'COMPLETED':
                logger.info(f"Job '{job_id}' succeeded:nt{get_job_resp}")
                break
            elif get_job_resp['State'] in ('ERROR', 'CANCELLED'):
                raise Exception(f"Job '{job_id}' failed:nt{get_job_resp}")
            else:
                sleep(5)
                logger.info(f"Still waiting on job {job_id}...")
        return get_job_resp

    # ChangeSet Status can be one of: PREPARING | APPLYING | SUCCEEDED | CANCELLED | FAILED
    def _wait_for_mc_change_set_completion(self, change_set_id):
        while True:
            describe_change_set_resp = self._mc_client.describe_change_set(
                Catalog='AWSMarketplace',
                ChangeSetId=change_set_id
                )
            if describe_change_set_resp['Status'] == 'SUCCEEDED':
                logger.info(
                    f"ChangeSet '{change_set_id}' succeeded:nt{describe_change_set_resp}"
                )
                break
            elif describe_change_set_resp['Status'] in ('FAILED', 'CANCELLED'):
                raise Exception(
                    f"ChangeSet '{change_set_id}' failed:nt{describe_change_set_resp}"
                )
            else:
                sleep(1)
                logger.info(f"Still waiting on ChangeSet {change_set_id}...")
        return describe_change_set_resp

    def process_s3_event(self, s3_event):
        source_bucket = s3_event['Records'][0]['s3']['bucket']['name']
        source_key = s3_event['Records'][0]['s3']['object']['key']
        source_prefix = '/'.join(source_key.split('/')[0:-1])
        s3_data_source = self._get_s3_data_source(source_bucket, source_prefix)
        obj_name = source_key.split('/')[-1]
        
        s3_data_source.validate_object_name(obj_name)
        
        for data_set in s3_data_source.lambda_s3_trigger_target_data_sets:
            # Create revision
            create_revision_resp = self._de_client.create_revision(
                DataSetId=data_set.id,
                Comment=obj_name
            )
            logger.debug(create_revision_resp)
            revision_id = create_revision_resp['Id']
            revision_arn = create_revision_resp['Arn']

            # Create job
            create_job_resp = self._de_client.create_job(
                Type='IMPORT_ASSETS_FROM_S3',
                Details={
                    'ImportAssetsFromS3': {
                      'AssetSources': [
                          {
                              'Bucket': source_bucket,
                              'Key': source_key
                          },
                      ],
                      'DataSetId': data_set.id,
                      'RevisionId': revision_id
                    }
                }
            )
            logger.debug(create_job_resp)

            # Start job
            job_id = create_job_resp['Id']
            start_job_resp = self._de_client.start_job(JobId=job_id)
            logger.debug(start_job_resp)

            # Wait for Data Exchange job completion
            get_job_resp = self._wait_for_de_job_completion(job_id)
            logger.debug(get_job_resp)

            # Finalize revision
            update_revision_resp = self._de_client.update_revision(
                DataSetId=data_set.id,
                RevisionId=revision_id,
                Finalized=True
            )
            logger.debug(update_revision_resp)

            # Ensure revision finalization succeeded
            finalized_status = update_revision_resp['Finalized']
            if finalized_status is not True:
                raise Exception(f"Failed to finalize revision:n{update_revision_resp}")

            # Publish the new revision to each product associated with the data set
            for product in data_set.products:
                # Describe the AWS Marketplace entity corresponding to the Data Exchange product
                describe_entity_resp = self._mc_client.describe_entity(
                    Catalog='AWSMarketplace',
                    EntityId=product.id
                )
                logger.debug(describe_entity_resp)

                entity_type = describe_entity_resp['EntityType']
                entity_id = describe_entity_resp['EntityIdentifier']

                # Isolate the target data set in the DescribeEntity response
                describe_entity_resp_data_sets = json.loads(describe_entity_resp['Details'])['DataSets']
                describe_entity_resp_data_set = list(
                    filter(lambda ds: ds['DataSetArn'] == data_set.arn, describe_entity_resp_data_sets)
                )
                # We should get the data set of interest in describe_entity_resp and only that data set
                assert len(describe_entity_resp_data_set) == 1

                # Start a ChangeSet to add the newly finalized revision to an existing product
                start_change_set_resp = self._mc_client.start_change_set(
                    Catalog='AWSMarketplace',
                    ChangeSet=[
                        {
                            "ChangeType": "AddRevisions",
                            "Entity": {
                                "Identifier": entity_id,
                                "Type": entity_type
                            },
                            "Details": json.dumps({
                                "DataSetArn": data_set.arn,
                                "RevisionArns": [revision_arn]
                            })
                        }
                    ]
                )
                logger.debug(start_change_set_resp)

                # Wait for the ChangeSet workflow to complete
                change_set_id = start_change_set_resp['ChangeSetId']
                describe_change_set_resp = self._wait_for_mc_change_set_completion(change_set_id)
                logger.debug(describe_change_set_resp)

The following screenshot shows the S3 trigger for Job 3.

The following screenshot shows an example of CloudWatch logs for Job 3.

The following screenshot shows a CloudWatch alarm for Job 3.

Finally, we can verify that our revisions were successfully added to their corresponding data sets and products through the AWS console.

AWS Data Exchange allows you to create private offers for your AWS account IDs, providing a convenient means of checking that revisions show up in each product as expected.

Conclusion

This post demonstrated how you can integrate AWS Data Exchange into an existing data pipeline frictionlessly. We’re pleased to have been invited to participate in the AWS Data Exchange private preview, and even more pleased with the service itself, which has proven to be a sophisticated yet natural extension of our system.

I want to offer special thanks to both Kyle Patsen and Rafic Melhem of the AWS Data Exchange team for generously fielding my questions (and patiently enduring my ramblings) for the better part of the past year. I also want to thank Lucas Adams for helping me design the system discussed in this post and, more importantly, for his unwavering vote of confidence.

If you are interested in learning more about FPG, don’t hesitate to contact us.

MariaDB Transactions and Isolation Levels for SQL Server Users


Feed: MariaDB Knowledge Base Article Feed.
Author: .

This page explains how transactions work in MariaDB, and highlights the main differences between MariaDB and SQL Server transactions.

Note that XA transactions are handled in a completely different way and are not covered in this page. See XA Transactions.

Missing Features

These SQL Server features are not available in MariaDB:

  • Autonomous transactions;
  • Distributed transactions.

Transactions, Storage Engines and the Binary Log

In MariaDB, transactions are optionally implemented by storage engines. The default storage engine, InnoDB, fully supports transactions. Other transactional storage engines include MyRocks and TokuDB. Most storage engines are not transactional, and therefore should not be considered general-purpose engines.

Writing into a non-transactional table inside a transaction can still be useful. The reason is that a metadata lock is acquired on the table for the duration of the transaction, so that ALTER TABLE statements are queued.

It is possible to write into transactional and non-transactional tables within a single transaction. It is important to remember that non-transactional engines have the following limitations:

  • In case of rollback, changes to non-transactional engines won’t be undone. We will receive warning `1196`, which reminds us of this.
  • Data in transactional tables cannot be changed by other connections in the middle of a transaction, but data in non-transactional tables can.
  • In case of a crash, committed data written into a transactional table can always be recovered, but this is not necessarily true for non-transactional tables.

If the binary log is enabled, writing into different transactional storage engines in a single transaction, or writing into transactional and non-transactional engines inside the same transaction, implies some extra work for MariaDB. It needs to perform a two-phase commit to be sure that changes to the different tables are logged in the correct order. This affects performance.

Transaction Syntax

The first read or write to an InnoDB table starts a transaction. No data access is possible outside a transaction.

By default autocommit is on, which means that each SQL statement is committed automatically as soon as it completes. We can disable it and commit transactions manually:

SET SESSION autocommit := 0;
SELECT ... ;
DELETE ... ;
COMMIT;

Whether autocommit is enabled or not, we can start transactions explicitly, and they will not be automatically committed:

START TRANSACTION;
SELECT ... ;
DELETE ... ;
COMMIT;

BEGIN can also be used to start a transaction, but it will not work in stored procedures.

Read-only transactions are also available using START TRANSACTION READ ONLY. This is a small performance optimisation. MariaDB will issue an error when trying to write data in the middle of a read-only transaction.

Only DML statements are transactional and can be rolled back. This may change in a future version, see MDEV-17567 – Atomic DDL and MDEV-4259 – transactional DDL.

Changing autocommit and explicitly starting a transaction will implicitly commit the active transaction, if any. DDL statements, and several other statements, implicitly commit the active transaction. See SQL statements That Cause an Implicit Commit for the complete list of these statements.

A rollback can also be triggered implicitly, when certain errors occur.

You can experiment with transactions to check in which cases they implicitly commit or rollback. The in_transaction system variable can help: it is set to 1 when a transaction is in progress, or 0 when no transaction is in progress.
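For example, a quick check from the client:

START TRANSACTION;
SELECT @@in_transaction;   -- returns 1
COMMIT;
SELECT @@in_transaction;   -- returns 0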

This section only covers the basic syntax for transactions. Many more options are available. For more information, see Transactions.

Constraint Checking

MariaDB supports the standard constraints, including primary key, unique, foreign key, and CHECK constraints.

In some databases, constraints can temporarily be violated during a transaction, with their enforcement deferred to commit time. SQL Server does not support this, and always validates data against constraints at the end of each statement.

MariaDB does something different: it always checks constraints after each row change. In some cases this policy makes statements fail with an error, even if those statements would work on SQL Server.

For example, suppose you have an id column that is the primary key, and you need to increase its values for some reason:

MariaDB [test]> SELECT id FROM customer;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
|  4 |
|  5 |
+----+

MariaDB [test]> UPDATE customer SET id = id + 1;
ERROR 1062 (23000): Duplicate entry '2' for key 'PRIMARY'

The reason this happens is that, as the first thing it does, MariaDB tries to change 1 to 2, but a value of 2 is already present in the primary key.

A solution is using this non-standard syntax:

MariaDB [test]> UPDATE customer SET id = id + 1 ORDER BY id DESC;
Query OK, 5 rows affected (0.00 sec)
Rows matched: 5  Changed: 5  Warnings: 0

Changing the ids in reverse order won’t duplicate any value.

Similar problems can happen with CHECK constraints and foreign keys. To solve them, we can use a different approach:

SET SESSION check_constraint_checks := 0;
-- run some queries
-- that temporarily violate a CHECK clause
SET SESSION check_constraint_checks := 1;

SET SESSION foreign_key_checks := 0;
-- run some queries
-- that temporarily violate a foreign key
SET SESSION foreign_key_checks := 1;

These last solutions temporarily disable CHECK constraints and foreign keys. Note that, while this may solve practical problems, it is dangerous because:

  • This doesn’t disable a single CHECK or foreign key; it disables all of them, including others that you don’t expect to violate.
  • This doesn’t defer the constraint checks; it simply disables them for a while. This means that if you insert some invalid values, they will not be detected.

See check_constraint_checks and foreign_key_checks system variables.

Isolation Levels and Locks

Locking reads

In MariaDB, the locks acquired by a read do not depend on the isolation level (with one exception: under SERIALIZABLE, plain reads acquire shared locks).

As a general rule:

  • Plain SELECTs are not locking; they acquire snapshots instead.
  • To force a read to acquire a shared lock, use SELECT ... LOCK IN SHARE MODE.
  • To force a read to acquire an exclusive lock, use SELECT ... FOR UPDATE.
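
A minimal sketch of the three flavours (the account table is illustrative):

START TRANSACTION;
-- Plain read: no row locks; reads from a consistent snapshot
SELECT balance FROM account WHERE id = 1;
-- Shared lock: concurrent transactions can read the row but not modify it
SELECT balance FROM account WHERE id = 1 LOCK IN SHARE MODE;
-- Exclusive lock: concurrent transactions can neither lock nor modify the row
SELECT balance FROM account WHERE id = 1 FOR UPDATE;
COMMIT;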

Changing the Isolation Level

The default isolation level, in MariaDB, is REPEATABLE READ. This can be changed with the tx_isolation system variable.

Applications developed for SQL Server and later ported to MariaDB may run with READ COMMITTED without problems. Using a stricter level would reduce scalability. To use READ COMMITTED by default, add the following line to the MariaDB configuration file:

tx_isolation = 'READ-COMMITTED'

It is also possible to change the default isolation level for the current session:

SET SESSION tx_isolation := 'READ-COMMITTED';

Or just for one transaction, by issuing the following statement before starting a transaction:

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

For more details, see How Isolation Levels are Implemented in MariaDB and InnoDB Transactions.


Using Referential Constraints with Partitioned Tables in InnoDB


Feed: Planet MySQL
;
Author: MySQL Performance Blog
;

One of our support customers approached us with the following problem the other day:

They could not create a table with an FK relation! So, of course, we asked to see the parent table definition, which was:

The parent table is partitioned! This immediately explained the problem; partitioned tables cannot be part of an FK relationship, as described (in point 10) here – MySQL Error Code 1215: “Cannot add foreign key constraint”.

Quoting the official MySQL manual for completeness:

Partitioned tables using the InnoDB storage engine do not support foreign keys. More specifically, this means that the following two statements are true:

  • No definition of an InnoDB table employing user-defined partitioning may contain foreign key references; no InnoDB table whose definition contains foreign key references may be partitioned.
  • No InnoDB table definition may contain a foreign key reference to a user-partitioned table; no InnoDB table with user-defined partitioning may contain columns referenced by foreign keys.

So, after verifying it was impossible to guarantee referential integrity using CONSTRAINTs, we turned to an old alternative from the MyISAM era of MySQL: using a set of triggers that would intercept the DML statements before they execute, and verify whether the parent row actually exists.

So for this, we would create child_table without the constraint:

And then we create 4 triggers: BEFORE INSERT and BEFORE UPDATE on the child table, and BEFORE UPDATE and BEFORE DELETE on the parent table. A sketch of the first one is shown below.
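
A minimal sketch of the child-table BEFORE INSERT trigger, assuming a parent_table with primary key id and a child_table with a parent_id column (all names here are illustrative, not the original definitions):

DELIMITER $$
CREATE TRIGGER child_table_bi BEFORE INSERT ON child_table
FOR EACH ROW
BEGIN
  -- Reject the insert if no matching parent row exists
  IF NOT EXISTS (SELECT 1 FROM parent_table WHERE id = NEW.parent_id) THEN
    SIGNAL SQLSTATE '45000'
      SET MESSAGE_TEXT = 'FK violation: parent row not found';
  END IF;
END$$
DELIMITER ;

The BEFORE UPDATE trigger on the child table performs the same check against NEW.parent_id, while the two parent-table triggers verify that no child rows still reference OLD.id before allowing the update or delete.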

Testing the Triggers:

Populate parent_table:

Test insert:

So far so good! For child rows with valid parent ids, inserts are accepted; for invalid ones, the trigger rejects the insert.

Test Update:

Test Delete:

For both delete and update, we verified the triggers work as expected, enforcing FK integrity.

Insert a new row into parent_table which we should be able to delete without failing the “constraint” (as it will have no child rows):

Unfortunately, the non-standard REPLACE INTO is not compatible with the above method, as it actually consists of two operations – a DELETE and a subsequent INSERT INTO, and doing the DELETE on the parent table for a referenced row would trigger the FK error:

REPLACE INTO the child_table should work without issues.

On the other hand, INSERT … ON DUPLICATE KEY UPDATE will work as expected, as the trigger on UPDATE fires correctly and prevents breaking referential integrity.

For convenience, the FK triggers can be disabled for the session; this would be the equivalent of SET foreign_key_checks=0. You can disable them by setting a variable, as sketched below.
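
A sketch, assuming the trigger bodies test a user-defined session variable before enforcing the check (the @fk_triggers_disabled name is purely illustrative):

-- Disable the trigger-based checks for this session only
SET @fk_triggers_disabled = 1;
-- ... run statements that would otherwise be rejected ...
SET @fk_triggers_disabled = NULL;

-- Each trigger would wrap its check accordingly, e.g.:
--   IF @fk_triggers_disabled IS NULL AND NOT EXISTS (...) THEN
--     SIGNAL SQLSTATE '45000' ...;
--   END IF;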

Disclaimer:

The above is a proof of concept and while it should work for the vast majority of uses, there are two cases that are not checked by the triggers and will break referential integrity: TRUNCATE TABLE parent_table and DROP TABLE parent_table, as they do not execute the DELETE trigger and hence allow all child rows to become invalid at once.

And in general, DDL operations which can break referential integrity (for example, ALTER TABLE modifying a column type or name) are not handled, as these operations don’t fire triggers of any kind. The method also relies on you writing the correct query to find the parent rows; for example, if you have a parent table with a multi-column primary key, you must check all the columns in the WHERE condition of the triggers.

Also, keep in mind the added performance impact: triggers add overhead, so please make sure to measure the impact on the response time of DML against these two tables. Please test thoroughly before deploying to production!

Why and When You Need Transactional DDL in Your Database (Corporate Blog)


Author: mkysel.

We typically talk about transactions in the context of Data Manipulation Language (DML), but the same principles apply when we talk about Data Definition Language (DDL). As databases increasingly include transactional DDL, we should stop and think about the history of transactional DDL. Transactional DDL can help with application availability by allowing you to perform multiple modifications in a single operation, making software upgrades simpler. You’re less likely to find yourself dealing with a partially upgraded system that requires your database administrator (DBA) to go in and fix everything by hand, losing hours of their time and slowing your software delivery down.

Why Do You Care?

If you make a change to application code and something doesn’t work, you don’t want to have to deal with a complicated recovery. You want the database to be able to roll it back automatically so you get back to a working state very rapidly. Today, very often people don’t write code for databases, they have frameworks that do it (Hibernate, for example). This makes it impossible for a software engineer to write the code properly, or roll it back, because they don’t work on that level. When you make changes using rolling application upgrades, it is simpler, less likely to fail, and more obvious what to do when you do experience a failure.

With transactional DDL, it’s far less likely that you will have to deal with a partially upgraded system that has essentially ground your application to a stop. Partial upgrades like that may require your database administrator (DBA) to go in and fix everything by hand, losing hours of their time and slowing your software delivery down. With transactional DDL, you can roll back to the last working upgrade and resolve the issues rapidly, without taking your software delivery system or your application offline.

A Short Explanation of DML and DDL

Essentially, DML statements are structured query language (SQL) statements that we use to manipulate data — as you might have guessed. Specifically, the DML class includes the INSERT, UPDATE, and DELETE SQL statements. Sometimes, we refer to these three statements as WRITE DML, and we call the SELECT statement READ DML. The standard does not differentiate between read and write, but for this article, it is an important distinction.

On the other hand, DDL is a family of SQL language elements used to define the database structure, particularly database schemas. The CREATE, ALTER, and DROP commands are common examples of DDL SQL statements, but DDL language elements may include operations with databases, tables, columns, indexes, views, stored procedures, and constraints.

Next, let’s start by defining a transaction as a sequence of commands collected together into a single logical unit, which is then executed as a single step. With this definition, if the execution of a transaction is interrupted, the transaction isn’t executed at all. Because a transaction must be ACID (Atomic, Consistent, Isolated, and Durable), when a transaction executes several statements, some of which are DDL, it treats them as a single operation that can either be rolled back or committed. This means that you will never leave the database in a temporary, inconsistent state. Historically, databases haven’t provided transactional DDL statements, and even today not all databases provide truly transactional DDL. In most cases, this functionality comes with limitations.

Now, what does true “transactional DDL” mean? It means that all statements should be ACID, regardless of whether they are DML or DDL statements. In practice, with most databases, DDL statements break the transactionality of the enclosing transaction and cause anomalies.

A Brief History of DDL

Originally, the idea of a data definition language was introduced as part of the Codasyl database model. CODASYL is the Conference/Committee on Data Systems Languages, and was formed as a consortium in 1959 to guide development of a standard programming language, which resulted in COBOL, as well as a number of technical standards. CODASYL also worked to standardize database interfaces, all part of a goal from its members to promote more effective data systems analysis, design, and implementation.

In 1969 CODASYL’s Data Base Task Group (DBTG) published its first language specifications for their data model: a data definition language for defining the database schema, another DDL to define application views of the database, and (you guessed it) a data manipulation language that defined verbs to request and update data in the database. Later, DDL came to refer to a subset of SQL for declaring tables, columns, data types, and constraints; SQL-92 introduced a schema manipulation language and schema information tables to query schemas. In SQL:2003 these information tables were specified as SQL/Schemata.

Transactionality of DDL, however, is not part of the ANSI SQL standard. Section 17.1 of ANSI SQL 2016 only specifies the grammar and the supported isolation levels. It does not specify how a transaction should behave or what ‘transactional’ means.

Why Isn’t Transactional DDL Universally Provided?

There’s no reason why DDL statements shouldn’t be transactional, but in the past, databases haven’t provided this functionality. In part that’s because transactional DDL implies that DDL must happen in isolation from other transactions that happen concurrently. That means that the metadata of the modified table must be versioned. To correctly process metadata changes, we need to be able to roll back DDL changes that were aborted due to a transaction rollback. That’s not easy — in fact it’s a complex algorithmic task that requires the database to support metadata delta (diff), which corresponds to DDL changes within the current transaction of each connection to the database. This delta exists before the transaction is closed, and so it could be rolled back as a single transaction, or in parts in RDBMSs that support multi-level transactions or savepoints. Essentially, it’s not universally provided because it’s hard to do correctly.


Let’s return to our concept of WRITE DML (update, delete, insert) vs. READ DML. You might ask yourself how these statements collide in a system that supports multiple concurrent transactions while a DDL transaction is ongoing. Ideally the set of transactions that collide is as small as possible. A SELECT statement does not collide with an INSERT statement. There is no reason why this should be any different in the context of DDL. A DDL statement such as ALTER TABLE should not prevent a SELECT statement from executing concurrently. This is a common pattern in database design.

For DDL to be transactional you need to support multiple versions concurrently. Similar to multiversion concurrency control (MVCC), readers don’t block writers and writers don’t block readers. Without MVCC it’s hard to have transactional DDL. Traditionally, databases started with a locking system instead of MVCC. That implementation wasn’t suited to transactional DDL, which is why around 2005 there was a big shift towards MVCC — to provide concurrent access to the database, and to implement transactional memory in programming languages.

MVCC provides the semantics we might naturally desire. Read DML can proceed while conflicting writes (write DML and DDL) are executed concurrently.

Write DML and DDL in a Live System

We have established that Read DML (SELECT) can happily proceed regardless of what else is executing concurrently in the system. Write DML (INSERT, UPDATE, DELETE) is not allowed to execute on a table that is being concurrently modified by DDL. Explaining the expected behavior of these conflicts based on the isolation levels of all concurrent transactions is beyond the scope of this article.

To simplify the discussion, we state that write DML and DDL are mutually exclusive if executed on the same resource. This results in operations blocking each other. If you have a long-running DDL transaction, such as a rolling upgrade of your application, write DML will be prevented for a long period of time.

So even though the DDL is transactional, it can still lead to database downtime and maintenance windows. Or does it?

Always Online, Always Available

The database industry is moving towards an always online, always available model. Earlier databases resulted in a message saying that something wasn’t available — that was essentially because you grabbed a lock in a database for a long period of time. That isn’t an option in an always online, always available model.

Customers, and therefore organizations, require applications to be online and available all the time. That means that transactional DDL is mandatory for the new world, and not only for the applications to run the way customers require them to. It’s also mandatory for organizations adopting DevOps and continuous integration and continuous delivery (CI/CD) models. Specifically, that’s because without transactional DDL, application developers cannot safely and easily make database schema changes (along with their app changes) online. That means that organizations moving to microservices, DevOps, and CI/CD have an essential new requirement: a database that supports online transactional DDL.

I personally consider the term ONLINE misleading. The database is not offline while it holds a lock on a resource. A more appropriate term would have been LOCK FREE DDL. That is, metadata modification can happen without locking out concurrent DML.

The Availability vs. Simplicity Tradeoff

We said that write DML cannot happen concurrently with DDL to avoid ACID violations. Now, what happens to a system that has to be always up and needs to execute long-running DDL statements? Luckily enough, most DDL statements do not take a long time. Adding or removing columns from a table takes a constant amount of time, regardless of how large the table is. If the set of changes is small enough, it is OK to lock DML out for a short period of time.

But there are some DDL statements that need to process every row in the table, and hence can take a long time on large tables. CREATE INDEX is a prime example of a long-running statement. Given that index creation can take multiple hours, it is not an acceptable option for an always online, always available application.

Specifically, for index creation, NuoDB and other databases implement an ONLINE or CONCURRENT version (I would have preferred to call it a LOCK FREE version). This version allows DBAs to create indexes without locking the table or requiring a maintenance window — an extremely important capability in a 24×7 availability world. However, these capabilities do not come for free. Online versions of common DDL statements tend to be slightly slower than their LOCKING versions, have complicated failure modes and hard-to-understand ACID semantics, and cannot be part of a larger multi-statement DDL transaction.
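
As a concrete illustration from another system, PostgreSQL exposes this concurrent flavor of index creation (the table and index names are illustrative):

-- Builds the index without taking a lock that blocks concurrent writes.
-- Note: it cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY orders_customer_idx ON orders (customer_id);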

Interestingly enough, sometimes the speed of execution is not the primary concern, in which case you may not want to use the online version. You might have an application that requires complex changes to a database schema, and some LOCKING is a viable tradeoff for a much simpler upgrade procedure.

Atomicity, one of the four guarantees of ACID, states that: “Each transaction is treated as a single ‘unit,’ which either succeeds completely or fails completely.”

This becomes a very desirable quality if we think of transactions as a set of many DDL statements. An example would be: alter a table; create a log table with a similar name; insert a few rows into other tables. We already know that if DDL is not treated transactionally, you could end up with new rows in the other tables while neither the CREATE nor the ALTER succeeded. Or you could end up with just the CREATE and no ALTER.

So, if you have an application that assumes that when the log table exists (the CREATE), the ALTER has also happened, you might run into subtle bugs in production if the upgrade did not fully complete. Transactional DDL gives database administrators the ability to perform multiple modifications (such as the example above) in a single operation.
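
A minimal sketch of such an upgrade as a single transaction in a database with transactional DDL (all table and column names are illustrative):

START TRANSACTION;
-- Schema change
ALTER TABLE orders ADD COLUMN audit_id BIGINT;
-- New log table with a similar name
CREATE TABLE orders_log (audit_id BIGINT, changed_at TIMESTAMP);
-- A few rows inserted into other tables
INSERT INTO schema_version (version) VALUES (42);
COMMIT; -- all three changes become visible atomically, or none do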

For developers, the strong isolation guarantees of transactional DDL make the development of applications easier. An application can only observe the database in state A (before the upgrade) or in state B (after the upgrade) and will never see partial results. This reduces the required test matrix and increases confidence in the rolling upgrade procedure. Now that is easy to code against.

The tradeoff between the simplicity of rolling upgrades (LOCKING) and the always-available, always-online approach (NOT TRANSACTIONAL) has been known to the industry since InterBase introduced MVCC to the commercial market.

Choose Transactional DDL

Databases have changed a lot since 1959, and there have been many changes in customer expectations for user experience and application availability since then. Transactional DDL helps you avoid a scenario in which your application is no longer available, and gives your DBAs some peace of mind, knowing they won’t have to painstakingly repair the database to bring your software delivery back up to speed. Today, many databases offer transactional DDL, which will help you resolve immediate issues quickly by rolling back to the last working upgrade. In order to meet the always available requirements of today, choose a database that offers transactional DDL. But keep in mind that modern always-available, always-online applications require a database that not only simplifies upgrade scenarios, but also limits downtime due to long-running metadata modifications.


Originally published in The New Stack.

Hans-Juergen Schoenig: PostgreSQL: You might need to increase max_locks_per_transaction


Feed: Planet PostgreSQL.

“out of shared memory”: Some of you might have seen that error message in PostgreSQL already. But what does it really mean, and how can you prevent it? The problem is actually not as obscure as it might seem at first glance. max_locks_per_transaction is the critical configuration parameter you need to use to avoid trouble.

“out of shared memory”: When it happens

Most of the shared memory used by PostgreSQL is of a fixed size. This is true for the I/O cache (shared buffers) and for many other components as well. One of those components has to do with locking. If you touch a table inside a transaction, PostgreSQL has to track your activity to ensure that a concurrent transaction cannot drop the table you are about to touch. Tracking activity is important because you want to make sure that a DROP TABLE (or some other DDL) has to wait until all reading transactions have terminated. The trouble is, you have to store information about tracked activity somewhere – and this point is exactly what you have to understand.

Let us run a simple script:

BEGIN;

SELECT 'CREATE TABLE a' || id || ' (id int);' 
       FROM generate_series(1, 20000) AS id;

\gexec

What this script does is to start a transaction and generate 20,000 CREATE TABLE statements. It simply generates SQL which is then automatically executed (\gexec treats the result of the previous SQL statement as input).

Let us see what the SELECT statement produced …

BEGIN
          ?column?          
----------------------------
 CREATE TABLE a1 (id int);
 CREATE TABLE a2 (id int);
 CREATE TABLE a3 (id int);
 CREATE TABLE a4 (id int);
 CREATE TABLE a5 (id int);
...

And now let us see what PostgreSQL does:

...
CREATE TABLE
CREATE TABLE
ERROR:  out of shared memory
HINT:  You might need to increase max_locks_per_transaction.
ERROR:  current transaction is aborted, commands ignored until end of transaction block
ERROR:  current transaction is aborted, commands ignored until end of transaction block
ERROR:  current transaction is aborted, commands ignored until end of transaction block
ERROR:  current transaction is aborted, commands ignored until end of transaction block
...

After a few thousand tables, PostgreSQL will error out: “out of shared memory”. What you can see is that we created all those tables in a single transaction. PostgreSQL had to lock them and eventually ran out of memory. Remember: The database is using a fixed-size shared memory field to store those locks.

The logical question is: What is the size of this memory field? Two parameters come into play:

test=# SHOW max_connections;
 max_connections
-----------------
 100
(1 row)

test=# SHOW max_locks_per_transaction;
 max_locks_per_transaction
---------------------------
 64
(1 row)

The number of locks we can keep in shared memory is max_connections x max_locks_per_transaction; with the default values shown above, that is 100 x 64 = 6,400 locks. Keep in mind that row level locks are NOT relevant here. You can easily do a …

SELECT * FROM billions_of_rows FOR UPDATE;

… without running out of memory because row locks are stored on disk and not in RAM. Therefore the number of tables is relevant – not the number of rows.

Inspecting pg_locks

How can you figure out what is currently going on? To demonstrate what you can do, I have prepared a small example:

test=# CREATE TABLE t_demo (id int);
CREATE TABLE

First of all, you can create a simple table.
As you might know, in PostgreSQL names are not relevant at all. Internally, only numbers count. To fetch the object ID of a simple table, try the following statement:

test=# SELECT oid, relkind, relname
		FROM 	pg_class
 		WHERE relname = 't_demo';
  oid   | relkind | relname
--------+---------+---------
 232787 | r       | t_demo
(1 row)

In my example, the object id is 232787. Let us figure out where this number pops up:

test=# BEGIN;
BEGIN
test=# SELECT * FROM t_demo;
 id
----
(0 rows)

test=# \x
Expanded display is on.
test=# SELECT * FROM pg_locks WHERE relation = '232787';
-[ RECORD 1 ]------+----------------
locktype           | relation
database           | 187812
relation           | 232787
page               |
tuple              |
virtualxid         |
transactionid      |
classid            |
objid              |
objsubid           |
virtualtransaction | 3/6633
pid                | 106174
mode               | AccessShareLock
granted            | t
fastpath           | t

Since we are reading from the table, you can see that PostgreSQL has to keep an ACCESS SHARE LOCK which only ensures that the table cannot be dropped or modified (= DDL) in a way that harms concurrent SELECT statements.
The more tables a transaction touches, the more entries pg_locks will have. In case of heavy concurrency, multiple entries can become a problem.
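
To get an overview of how many locks each backend currently holds, you can aggregate pg_locks, for example:

SELECT pid, count(*)
  FROM pg_locks
 GROUP BY pid
 ORDER BY count(*) DESC;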

PostgreSQL partitioning and how it relates to “out of shared memory”

If you are running a typical application, out of memory errors are basically rare because the overall number of relevant locks is usually quite low. However, if you are heavily relying on excessive partitioning, life is different. In PostgreSQL, a partition is basically a normal table – and it is treated as such. Therefore, locking can become an issue.

Let us take a look at the following example:

BEGIN;

CREATE TABLE t_part (id int) PARTITION BY LIST (id);

SELECT 'CREATE TABLE t_part_' || id
	|| ' PARTITION OF t_part FOR VALUES IN ('
	|| id || ');'
FROM 	generate_series(1, 1000) AS id;

\gexec

SELECT count(*) FROM t_part;

First of all, a parent table is created. Then, 1000 partitions are added. For the sake of simplicity, each partition is only allowed to hold exactly one row – but let’s not worry about that for now. Following that, a simple SELECT statement is executed – such a statement is guaranteed to read all partitions.

The following listing shows which SQL the script has generated to create partitions:

                              ?column?                              
--------------------------------------------------------------------
 CREATE TABLE t_part_1 PARTITION OF t_part FOR VALUES IN (1);
 CREATE TABLE t_part_2 PARTITION OF t_part FOR VALUES IN (2);
 CREATE TABLE t_part_3 PARTITION OF t_part FOR VALUES IN (3);
 CREATE TABLE t_part_4 PARTITION OF t_part FOR VALUES IN (4);
 CREATE TABLE t_part_5 PARTITION OF t_part FOR VALUES IN (5);
...

After running the

SELECT count(*) FROM t_part

statement, the important observation is now:

SELECT 	count(*)
FROM 	pg_locks
WHERE 	mode = 'AccessShareLock';
 count
-------
  1004
(1 row)

PostgreSQL already needs more than 1000 locks to do this. Partitioning will therefore increase the usage of this shared memory field and make “out of memory” errors more likely. If you are using partitioning HEAVILY, it can make sense to change max_locks_per_transaction.
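
A minimal sketch of raising the limit in postgresql.conf (the value is illustrative; changing this parameter requires a restart):

# postgresql.conf
max_locks_per_transaction = 256        # default is 64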

Finally …

In case you are interested in Data Science and Machine Learning, you can check out Kevin Speyer’s post on “Reinforcement Learning” which can be found here.

Julien Rouhaud: pg qualstats 2: Global index advisor


Feed: Planet PostgreSQL.

Coming up with good index suggestions can be a complex task. It requires
knowledge of both application queries and database specificities. Over the
years, multiple projects have tried to solve this problem, one of them being
PoWA with its version 3, with the help of the pg_qualstats extension.
It can give pretty good index suggestions, but it requires installing and
configuring PoWA, while some users wanted only the global index advisor.
In such cases, and for simplicity, the algorithm used in PoWA is now available
in pg_qualstats version 2 without requiring any additional component.

What is pg_qualstats

A simple way to explain pg_qualstats is to say that it’s like
pg_stat_statements working at the predicate level.

The extension saves useful statistics for WHERE and JOIN clauses:
which table and column a predicate refers to, the number of times the
predicate has been used, the number of executions of the underlying operator,
whether it comes from an index scan or not, its selectivity, the constant
values used, and much more.

You can deduce many things from such information. For instance, if you examine
the predicates that contain references to different tables, you can find which
tables are joined together, and how selective those join conditions are.

Global suggestion?

As I mentioned, the global index advisor added in pg_qualstats 2 uses the same
approach as the one in PoWA, so the explanation here describes both tools.
The only difference is that with PoWA you’ll likely get a better suggestion,
as more predicates will be available, and you can also choose for which time
interval you want to detect missing indexes.

The important thing here is that the suggestion is performed globally,
considering all interesting predicates at the same time. This approach differs
from all the other approaches I have seen, which only consider a single query
at a time. I believe that a global approach is better, as it’s possible to
reduce the total number of indexes, maximizing the usefulness of multi-column
indexes.

How global suggestion is done

The first step is to gather all predicates that could benefit from a new index.
This is easy to get with pg_qualstats: by filtering the predicates coming from
sequential scans, executed many times, that filter many rows (both in number of
rows and in percentage), you get a perfect list of predicates that likely miss
an index (or, alternatively, the list of poorly written queries in certain
cases). For instance, let’s consider an application which uses those 4
predicates:

List of all predicates found

Next, we build the full set of paths with each AND-ed predicate that contains
other, possibly also AND-ed, predicates. Using the same 4 predicates, we would
get those paths:

Build all possible paths of predicates

Once all the paths are built, we just need to find the best path in order to
know the best index to suggest. The scoring is for now done by giving each
node of each path a weight corresponding to the number of simple predicates it
contains, and summing the weights along each path. This is very simple and
makes it possible to prefer a smaller number of indexes that optimize as many
queries as possible. With our simple example, we get:

Weight all paths and choose the highest score

Of course, other scoring approaches could be used to take into account other
parameters and give possibly better suggestions, for instance combining the
number of executions or the predicate selectivity. If the read/write ratio for
each table is known (this is available using powa-archivist), it would also
be possible to adapt the scoring method to limit index suggestions for
write-mostly tables. With this algorithm, all of that could be added quite
easily.

Once the best path is found, we can generate an index DDL! As the order of the
columns can be important, this is done by taking the columns of each node in
ascending weight order. In our example, we would generate this index:

CREATE INDEX ON t1 (id, ts, val);

Once an index is found, we simply remove the contained predicates from the
global list of predicates and start again from scratch until there are no
predicates left.

Additional details and caveat

Of course, this is a simplified version of the suggestion algorithm. Some
other information is required. For instance, the list of predicates is
actually expanded with operator classes and access methods depending on the
column types and operators, to make sure that the suggested indexes are
valid. If multiple index methods are found for a best path, btree is chosen
in priority.

This brings another consideration: this approach is mostly designed for
btree indexes, for which the column order is critical. Some other access
methods don’t require a specific column order, and for those it could be
possible to get better index suggestions if the column order parameter
weren’t considered.

Another important point is that the operator classes and access methods are
not hardcoded but retrieved at execution time using the local catalogs.
Therefore, you can get different (and possibly better) results if you make
sure that optional operator classes are present when using the index advisor.
This could be the btree_gist or btree_gin extensions, but also other access
methods. It’s also possible that some type/operator combination doesn’t have
any associated access method recorded in the catalogs. In this case, those
predicates are returned separately as a list of unoptimizable predicates that
should be manually analyzed.

Finally, as pg_qualstats doesn’t consider expression predicates, this advisor
can’t suggest indexes on expressions, for instance if you’re using full-text
search.

Usage example

A simple set-returning function is provided, with optional parameters, that
returns a jsonb value:

CREATE OR REPLACE FUNCTION pg_qualstats_index_advisor (
    min_filter integer DEFAULT 1000,
    min_selectivity integer DEFAULT 30,
    forbidden_am text[] DEFAULT '{}')
    RETURNS jsonb

The parameter names are self-explanatory:

  • min_filter: how many tuples a predicate should filter on average to be
    considered for the global optimization; 1000 by default.
  • min_selectivity: how selective a predicate should be on average to be
    considered for the global optimization; 30% by default.
  • forbidden_am: list of access methods to ignore. None by default, although
    for PostgreSQL 9.6 and prior, hash indexes will internally be discarded,
    as those are only safe since version 10.

Using pg_qualstats regression tests, let’s see a simple example:

CREATE TABLE pgqs AS SELECT id, 'a' val FROM generate_series(1, 100) id;
CREATE TABLE adv (id1 integer, id2 integer, id3 integer, val text);
INSERT INTO adv SELECT i, i, i, 'line ' || i from generate_series(1, 1000) i;
SELECT pg_qualstats_reset();
SELECT * FROM adv WHERE id1 < 0;
SELECT count(*) FROM adv WHERE id1 < 500;
SELECT * FROM adv WHERE val = 'meh';
SELECT * FROM adv WHERE id1 = 0 and val = 'meh';
SELECT * FROM adv WHERE id1 = 1 and val = 'meh';
SELECT * FROM adv WHERE id1 = 1 and id2 = 2 AND val = 'meh';
SELECT * FROM adv WHERE id1 = 6 and id2 = 6 AND id3 = 6 AND val = 'meh';
SELECT * FROM adv WHERE val ILIKE 'moh';
SELECT COUNT(*) FROM pgqs WHERE id = 1;

And here’s what the function returns:

SELECT v
  FROM jsonb_array_elements(
    pg_qualstats_index_advisor(min_filter => 50)->'indexes') v
  ORDER BY v::text COLLATE "C";
                               v
---------------------------------------------------------------
 "CREATE INDEX ON public.adv USING btree (id1)"
 "CREATE INDEX ON public.adv USING btree (val, id1, id2, id3)"
 "CREATE INDEX ON public.pgqs USING btree (id)"
(3 rows)

SELECT v
  FROM jsonb_array_elements(
    pg_qualstats_index_advisor(min_filter => 50)->'unoptimised') v
  ORDER BY v::text COLLATE "C";
        v
-----------------
 "adv.val ~~* ?"
(1 row)

Version 2 of pg_qualstats is not released yet, but feel free to test it and
report any issue you may find!

DevOps for databases: “DataOps”


Feed: James Serra’s Blog.
Author: James Serra.

DevOps, a set of practices that combines software development (Dev) and information-technology operations (Ops), has become a very popular way to shorten the systems development life cycle and provide continuous delivery of applications (“software”). The implementation of continuous delivery and DevOps to data analytics has been termed DataOps, which is the topic of this blog.

Databases are more difficult to manage than applications from a development perspective. Applications, generally, do not concern themselves with state.  For any given “release” or build an application can be deployed and overlaid over the previous version without needing to maintain any portion of the previous application.  Databases are different. It’s much harder to deploy the next version of your database if you need to be concerned with maintaining “state” in the database. 

So what is the “state” you need to be concerned with maintaining?

Lookup data is the simple example.  Almost every database has tables that are used for allowable values, lookup data, and reference data.  If you need to change that data for a new release, how do you do that?  What happens if the customer or user has already changed that data?  How do you migrate that data? 

Another example: a table undergoes a major schema migration.  New columns are added and the table is split and normalized among new tables.  How do we write the migration code to ensure it runs exactly once or runs multiple times without side effects (using scripts that are “idempotent”)? 
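
As a sketch of what an idempotent migration step can look like in T-SQL (the table and column names are illustrative):

-- Add the column only if it is not already there, so the script can
-- safely be run more than once
IF NOT EXISTS (
    SELECT 1 FROM sys.columns
    WHERE object_id = OBJECT_ID('dbo.Customer') AND name = 'Email'
)
BEGIN
    ALTER TABLE dbo.Customer ADD Email NVARCHAR(256) NULL;
END;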

Other objects that require state to be considered during an upgrade:

  • Indexes: what happens if an index is renamed or an included column is added?  What happens if the DBA adds a new emergency index?  Will your DevOps tool remove it since it isn’t in an official build? 
  • Keys: if you change a primary key, will that change require the PK to be dropped and recreated?  If so, what happens to the foreign keys?

In most cases, database objects like functions, views, and stored procedures have no state considerations and can be re-deployed during every release. 

So how do you overcome these “state” difficulties, especially if you are aiming towards frequent releases and agile, collaborative development?

The first step is to make a major decision when including databases in your DevOps processes, and that is how you will store the data model. There are two options:

Migration-based deployment: Sometimes called transformation-based deployment, this is the most common option today and is a very traditional way to work with databases during development. At some point you create an initial database (a “seed” database that is a single migration script stored inside source control), and after that you keep every script that’s needed to bring the database schema up to the current point (you can use SQL Server Management Studio to create the scripts). Those migration scripts will have an incremental version number and will often include data fixes or new values for reference tables, along with the Data Definition Language (DDL) required for the schema changes. So basically you are migrating the database from one state to another. The system of truth in a migration-based approach is the database itself.  There are a few problems with this option:

  • Deployments keep taking longer as more and more scripts need to be applied when upgrading a database. A way around this is to create new seed databases on a regular basis to avoid starting with the very first database
  • A lot of wasted time can happen with large databases when dealing with, for example, the design of an index. If the requirements keep changing, a large index can be added to the database, then deleted, then reapplied slightly differently (i.e. adding a new column to it), and this can be repeated many times
  • There is no data model that shows what the database should really look like. The only option is to look at the freshly updated database
  • Upgrade scripts can break if schema drift occurs. This could happen if a patch was made to a production server and those changes didn’t make it back to the development environment or were not implemented the same way as was done in the production environment
  • Upgrade scripts can also break if not run in the correct order

State-based deployment: With this option you store the data model by taking a snapshot of the current state of the database and putting it in source control, and using comparison tools to figure out what needs to be deployed (i.e. doing a schema compare between your repository and the target database). Every table, stored procedure, view, and trigger will be saved as separate sql files which will be the real representation of the state of your database object. This is a much faster option as the only changes deployed are those that are needed to move from the current state to the required state (usually via a DACPAC). This is what SQL Server Data Tools (SSDT) for Visual Studio does with its database projects that includes schema comparison and data comparison tools, or you can use a product like SQL Compare from Red-Gate. Using the example above of creating an index, in this option you simply create the final index instead of creating and modifying it multiple times. In a state-based approach the system of truth is the source code itself.  Another good thing is that you do not have to deal with ALTER scripts with a state-based approach – the schema/data compare tool takes care of generating the ALTER scripts and runs it against the target database without any manual intervention. So the developer just needs to keep the database structure up-to-date and the tools will do all the work. The end result is there is much less work needed with this option compared to the migration-based deployment.
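
As a sketch of what the state-based publish step can look like with the SqlPackage tool (the file, server, and database names are illustrative):

SqlPackage /Action:Publish ^
    /SourceFile:"MyDatabase.dacpac" ^
    /TargetServerName:"myserver.example.com" ^
    /TargetDatabaseName:"MyDatabase"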

While it may seem state-based deployment is always the way to go, migration-based deployment may make more sense in scenarios where you need more fine-grained control over the scripts, since with state-based deployment you are not able to modify the difference script. And having control over the scripts allows you to write better scripts than you think the schema compare would generate. Other reasons are: by making the change a first-class artifact, you can “build once, deploy often” (as opposed to something new that is generated prior to each deployment); you encourage small, incremental changes (per Agile/DevOps philosophy); and it’s much easier to support parallel development strategies with migrations – in part because the migrations themselves are small, incremental changes (i.e. the ability to deploy different features or development branches to target databases, that is, environments like stage and production).

Once you figure out which deployment method you will use, the next step is to learn the options for version control and an automated build process. Check out these blogs for help: Automating Builds from Source Control for the WideWorldImporters Database (state-based approach), Database Development in Visual Studio using SQL Change Automation: Getting Started (migration-based approach), Deploying Database changes with Redgate SQL Change Automation and Azure DevOps (lab), Introduction to SQL Server database continuous integration, Basic Database Continuous Integration and Delivery (CI/CD) using Visual Studio Team Services (VSTS), Continuous database deployments with Azure DevOps, Why You Should Use a SSDT Database Project For Your Data Warehouse.

One last point of clarification: DataOps focuses on keeping track of database objects such as tables, stored procedures, views, and triggers, while DevOps focuses on source control for application code. These are usually done separately (i.e. separate projects and pipelines in Azure DevOps). Also usually done separately are Azure Databricks (see CI/CD with Databricks and Azure DevOps), Power BI (see The future of content lifecycle management in Power BI), Azure Data Factory (see Continuous integration and delivery (CI/CD) in Azure Data Factory, Azure data Factory –DevOps, and Azure DevOps Pipeline Setup for Azure Data Factory (v2), and Azure Analysis Services (see Azure Analysis Services deploy via DevOps)). There is also a whole separate category for machine learning operations (MLOps), see the MLOps workshop by Dave Wetnzel (who helped with this blog). If you have any interest in these topics, please comment below and based on the interest I will follow up with additional blog posts.

More info:

Managing Schemas And Source Control For Databases

DevOps: Why Don’t Database Developers Use Source Control?

DevOps: Should databases use migration-based or state-based deployment?

Database Delivery – State based vs Migration based

Evolutionary Database Design

Migration-base vs State-based database development approach

DataBase DevOps Migration-Based vs. State-Based

A strategy for implementing database source control

Using Oracle EXPAND_SQL_TEXT


Feed: Databasejournal.com – Feature Database Articles.
Author: .

Database tables are not static entities; besides the usual insert/update/delete events, occasional DDL can be executed to add columns, drop columns or add needed constraints or indexes. The first two items can create problems with stored procedures, packages, functions and possibly triggers by changing the number of columns that need to be processed when explicit variables are used. If the programmer used a record variable (as shown below) then no issues are likely to be seen:


SPLEEBO @ gwankus > 
SPLEEBO @ gwankus > --
SPLEEBO @ gwankus > -- Code to display all employee information
SPLEEBO @ gwankus > --
SPLEEBO @ gwankus > -- This should succeed
SPLEEBO @ gwankus > --
SPLEEBO @ gwankus > declare
  2  	     cursor get_emp_info is
  3  	     select * From emp;
  4  begin
  5  	     for emp_rec in get_emp_info loop
  6  		     dbms_output.put_line(emp_rec.empno||' '||emp_rec.ename||' '||emp_rec.job||' '||emp_rec.mgr||' '||emp_rec.hiredate||' '||emp_rec.sal||' '||emp_rec.comm||' '||emp_rec.deptno);
  7  	     end loop;
  8  end;
  9  /
7369 SMITH CLERK 7902 17-DEC-80 800  20                                         
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30                                  
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30                                   
7566 JONES MANAGER 7839 02-APR-81 2975  20                                      
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30                                
7698 BLAKE MANAGER 7839 01-MAY-81 2850  30                                      
7782 CLARK MANAGER 7839 09-JUN-81 2450  10                                      
7788 SPLEEBO ANALYST 7566 09-DEC-82 3000  20                                      
7839 KING PRESIDENT  17-NOV-81 5000  10                                         
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30                                   
...
6100 MILLER CLERK 7782 23-SEP-97 1300  10                                       
SPLEEBO @ gwankus > 

If explicit variables are used, then it’s likely the code will fail because of the change. It can be disconcerting to see the following in an error message:


PLS-00394: wrong number of values in the INTO list of a FETCH statement 

Of course, investigation will reveal an ‘alter table …’ statement was executed prior to the failure, or a describe on the affected table will report a different number of columns than were present when the code was originally written. Once this fact is known, fixing the problem is up to the developer who wrote it; they will have choices on how to go about making such changes.

Looking at an example using explicitly coded variables and an added column, let’s explore using a procedure from the DBMS_UTILITY package, EXPAND_SQL_TEXT, to generate a complete column list from the modified table to use as a reference for code changes. First, the original code and the error it generates:


SPLEEBO @ gwankus > 
SPLEEBO @ gwankus > --
SPLEEBO @ gwankus > -- Code to display all employee information
SPLEEBO @ gwankus > --
SPLEEBO @ gwankus > -- Fails because not enough variables are declared
SPLEEBO @ gwankus > -- and populated
SPLEEBO @ gwankus > --
SPLEEBO @ gwankus > declare
  2  	     v_empno emp.empno%type;
  3  	     v_ename emp.ename%type;
  4  	     v_job   emp.job%type;
  5  	     v_mgr   emp.mgr%type;
  6  	     v_hiredate      emp.hiredate%type;
  7  	     v_sal   emp.sal%type;
  8  	     v_comm  emp.comm%type;
  9  	     v_deptno	     emp.deptno%type;
 10  
 11  	     cursor get_emp_info is
 12  	     select * From emp;
 13  begin
 14  	     open get_emp_info;
 15  	     loop
 16  		     fetch get_emp_info into v_empno, v_ename, v_job, v_mgr, v_hiredate, v_sal, v_comm, v_deptno;
 17  		     exit when get_emp_info%notfound;
 18  		     dbms_output.put_line(v_empno||' '||v_ename||' '||v_job||' '||v_mgr||' '||v_hiredate||'  '||v_sal||' '||v_comm||' '||v_deptno);
 19  	     end loop;
 20  end;
 21  /
		fetch get_emp_info into v_empno, v_ename, v_job, v_mgr, v_hiredate, v_sal, v_comm, v_deptno;
		*
ERROR at line 16:
ORA-06550: line 16, column 3: 
PLS-00394: wrong number of values in the INTO list of a FETCH statement 
ORA-06550: line 16, column 3: 
PL/SQL: SQL Statement ignored 


SPLEEBO @ gwankus > 
SPLEEBO @ gwankus > 

Create a copy of the original code (to preserve it should it be needed) and use DBMS_UTILITY.EXPAND_SQL_TEXT to generate, as a comment, the expanded results of a ‘select * from …’ query against the modified table. The procedure requires that a CLOB variable be declared to hold the results of the procedure call, and also requires the specific ‘select *’ query to operate on. The code shown below generates this output as a comment and can be reused by editing the table name in the supplied query:


SPLEEBO @ gwankus > --
SPLEEBO @ gwankus >-- Expand the 'select *' query to see all
SPLEEBO @ gwankus >-- of the returned columns
SPLEEBO @ gwankus >--
SPLEEBO @ gwankus >-- Add the output to the failing script
SPLEEBO @ gwankus >-- to facilitate corrective edits
SPLEEBO @ gwankus >--
SPLEEBO @ gwankus >spool new_query.sql
SPLEEBO @ gwankus >declare
  2    l_clob clob;
  3  begin
  4    dbms_utility.expand_sql_text (
  5      input_sql_text  => 'select * from emp',
  6      output_sql_text => l_clob
  7    );
  8
  9    dbms_output.put_line('/*');
 10    dbms_output.put_line(lower(l_clob));
 11    dbms_output.put_line('*/');
 12  end;
 13  /

/*                                                                              
select "a1"."empno" "empno","a1"."ename" "ename","a1"."job" "job","a1"."mgr"    
"mgr","a1"."hiredate" "hiredate","a1"."sal" "sal","a1"."comm"                   
"comm","a1"."deptno" "deptno","a1"."term_dt" "term_dt" from "scott"."emp" "a1"  
*/                                                                              

PL/SQL procedure successfully completed.

SPLEEBO @ gwankus > spool off

Create a working copy and append the output of the above query to it. Then, open the modified file in the editor of choice to make the necessary changes:


SPLEEBO @ gwankus > --
SPLEEBO @ gwankus > -- Copy the original script to preserve code
SPLEEBO @ gwankus > --
SPLEEBO @ gwankus > !cp emp_info_pl_orig.sql emp_info_pl.sql

SPLEEBO @ gwankus > --
SPLEEBO @ gwankus > -- Add the output generated above as a comment
SPLEEBO @ gwankus > -- for reference purposes
SPLEEBO @ gwankus > --
SPLEEBO @ gwankus > !cat new_query.sql >> emp_info_pl.sql

SPLEEBO @ gwankus > 
SPLEEBO @ gwankus > --
SPLEEBO @ gwankus > -- Edit the script copy to fix the issue by
SPLEEBO @ gwankus > -- adding the necessary variable declaration
SPLEEBO @ gwankus > -- and editing the code to populate it
SPLEEBO @ gwankus > --
SPLEEBO @ gwankus > !vi emp_info_pl.sql

declare
        v_empno emp.empno%type;
        v_ename emp.ename%type;
        v_job   emp.job%type;
        v_mgr   emp.mgr%type;
        v_hiredate      emp.hiredate%type;
        v_sal   emp.sal%type;
        v_comm  emp.comm%type;
        v_deptno        emp.deptno%type;
        v_term_dt        emp.term_dt%type;

        cursor get_emp_info is
        select * From emp;
begin
        open get_emp_info;
        loop
                fetch get_emp_info into v_empno, v_ename, v_job, v_mgr, v_hiredate, v_sal, v_comm, v_deptno, v_term_dt;
                exit when get_emp_info%notfound;
                dbms_output.put_line(v_empno||' '||v_ename||' '||v_job||' '||v_mgr||' '||v_hiredate||' '||v_sal||' '||v_comm||' '||v_deptno||' '||v_term_dt);
        end loop;
end;
/

/*
select "a1"."empno" "empno","a1"."ename" "ename","a1"."job" "job","a1"."mgr"
"mgr","a1"."hiredate" "hiredate","a1"."sal" "sal","a1"."comm"
"comm","a1"."deptno" "deptno","a1"."term_dt" "term_dt" from "scott"."emp" "a1"
*/

Test the changes to ensure everything works as expected:


SPLEEBO @ gwankus > 
SPLEEBO @ gwankus > set head on feedback on pagesize 60
SPLEEBO @ gwankus > --
SPLEEBO @ gwankus > -- Run modified code
SPLEEBO @ gwankus > --
SPLEEBO @ gwankus > -- The anonymous block now completes
SPLEEBO @ gwankus > -- without error
SPLEEBO @ gwankus > --
SPLEEBO @ gwankus > @emp_info_pl
SPLEEBO @ gwankus > declare
  2  	     v_empno emp.empno%type;
  3  	     v_ename emp.ename%type;
  4  	     v_job   emp.job%type;
  5  	     v_mgr   emp.mgr%type;
  6  	     v_hiredate      emp.hiredate%type;
  7  	     v_sal   emp.sal%type;
  8  	     v_comm  emp.comm%type;
  9  	     v_deptno	     emp.deptno%type;
 10  	     v_term_dt	      emp.term_dt%type;
 11  
 12  	     cursor get_emp_info is
 13  	     select * From emp;
 14  begin
 15  	     open get_emp_info;
 16  	     loop
 17  		     fetch get_emp_info into v_empno, v_ename, v_job, v_mgr, v_hiredate, v_sal, v_comm, v_deptno, v_term_dt;
 18  		     exit when get_emp_info%notfound;
 19  		     dbms_output.put_line(v_empno||' '||v_ename||' '||v_job||' '||v_mgr||' '||v_hiredate||' '||v_sal||' '||v_comm||' '||v_deptno||' '||v_term_dt);
 20  	     end loop;
 21  end;
 22  /
7369 SMITH CLERK 7902 17-DEC-80 800  20 31-DEC-99                               
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 31-DEC-99                        
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 31-DEC-99                         
7566 JONES MANAGER 7839 02-APR-81 2975  20 31-DEC-99                            
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 31-DEC-99                      
7698 BLAKE MANAGER 7839 01-MAY-81 2850  30 31-DEC-99                            
7782 CLARK MANAGER 7839 09-JUN-81 2450  10 31-DEC-99                            
7788 SPLEEBO ANALYST 7566 09-DEC-82 3000  20 31-DEC-99                            
7839 KING PRESIDENT  17-NOV-81 5000  10 31-DEC-99                               
...
6100 MILLER CLERK 7782 23-SEP-97 1300  10 31-DEC-99                             

PL/SQL procedure successfully completed.

SPLEEBO @ gwankus > --

This same technique can be used on tables with a relatively large number of columns:


SPLEEBO @ gwankus > --
SPLEEBO @ gwankus > -- Let's take another example
SPLEEBO @ gwankus > --
SPLEEBO @ gwankus > -- Create a table with 21 columns
SPLEEBO @ gwankus > -- and populate it
SPLEEBO @ gwankus > --
SPLEEBO @ gwankus > @lotsa_cols
SPLEEBO @ gwankus > create table lotsacols(
  2  a1      number,
  3  a2      number,
  4  a3      number,
  5  a4      number,
  6  a5      number,
  7  a6      number,
  8  a7      number,
  9  a8      number,
 10  a9      number,
 11  a10     number,
 12  a11     number,
 13  a12     number,
 14  a13     number,
 15  a14     number,
 16  a15     number,
 17  a16     number,
 18  a17     number,
 19  a18     number,
 20  a19     number,
 21  a20     number,
 22  a21     number);

Table created.

SPLEEBO @ gwankus > 
SPLEEBO @ gwankus > begin
  2  	     for z in 1..1000 loop
  3  		     insert into lotsacols(a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,a21)
  4  		     values(mod(z,3)+1,mod(z,13)+1,mod(z,21)+1,mod(z,34)+1,mod(z,47)+1,mod(z,53)+1,
mod(z,67)+1,mod(z,79)+1,mod(z,81)+1,mod(z,97)+1,mod(z,3)+1,mod(z,7)+1,mod(z,6)+1,mod(z,2)+1,
mod(z,9)+1,mod(z,8)+1,mod(z,101)+1,mod(z,407)+1,mod(z,313)+1,mod(z,271)+1,mod(z,133)+1);
  5  	     end loop;
  6  
  7  	     commit;
  8  end;
  9  /

PL/SQL procedure successfully completed.

SPLEEBO @ gwankus > 
SPLEEBO @ gwankus > --
SPLEEBO @ gwankus > -- Rather than do a DESC on the table
SPLEEBO @ gwankus > -- use expand_sql_text to generate the
SPLEEBO @ gwankus > -- column list and spool it to a file for
SPLEEBO @ gwankus > -- later use
SPLEEBO @ gwankus > --
SPLEEBO @ gwankus > -- Edit that file to create a working block
SPLEEBO @ gwankus > -- of PL/SQL to generate results from the
SPLEEBO @ gwankus > -- table data
SPLEEBO @ gwankus > --

/*
select "a1"."a1" "a1","a1"."a2" "a2","a1"."a3" "a3","a1"."a4" "a4","a1"."a5"
"a5","a1"."a6" "a6","a1"."a7" "a7","a1"."a8" "a8","a1"."a9" "a9","a1"."a10"
"a10","a1"."a11" "a11","a1"."a12" "a12","a1"."a13" "a13","a1"."a14"
"a14","a1"."a15" "a15","a1"."a16" "a16","a1"."a17" "a17","a1"."a18"
"a18","a1"."a19" "a19","a1"."a20" "a20","a1"."a21" "a21" from
"scott"."lotsacols" "a1"
*/

declare
        cursor get_lotsa is
        select * From lotsacols;
begin
        dbms_output.put_line('Your lucky LOTTO numbers are: ');
        for lotsa in get_lotsa loop
                dbms_output.put_line(lotsa.a1||' '||lotsa.a6||' '||lotsa.a7||' '||lotsa.a13||' '||lotsa.a17||' '||lotsa.a20);
        end loop;
end;
/

SPLEEBO @ gwankus > --
SPLEEBO @ gwankus > -- Execute the code
SPLEEBO @ gwankus > --

Your lucky LOTTO numbers are:
2 50 8 5 7 209
3 51 9 6 8 210
1 52 10 1 9 211
...
2 53 11 2 10 212
1 32 51 1 14 179
2 33 52 2 15 180
3 34 53 3 16 181

PL/SQL procedure successfully completed.

Using EXPAND_SQL_TEXT can be easier than generating a table listing using DESC and spooling the results as it creates a smaller file that can easily be incorporated into a change procedure. Since the expanded SQL text is generated as a comment, it can remain after the edits are completed in the event further code changes are required or desired.

The choice is the developer’s to make, but it certainly seems easier to let Oracle generate usable output in a somewhat automated fashion to facilitate code edits. In the end it’s whatever the developer is comfortable with that matters most. But it might be worth investigating EXPAND_SQL_TEXT to put that reference information in the script being edited and possibly avoid getting lost between two screens of code. That could save editing time.

# # #

See articles by David Fitzjarrell

Why and When You Need Transactional DDL in Your Database

$
0
0

Author: mkysel.

We typically talk about transactions in the context of Data Manipulation Language (DML), but the same principles apply when we talk about Data Definition Language (DDL). As databases increasingly include transactional DDL, we should stop and think about the history of transactional DDL. Transactional DDL can help with application availability by allowing you perform multiple modifications in a single operation, making software upgrades simpler. You’re less likely to find yourself dealing with a partially upgraded system that requires your database administrator (DBA) to go in and fix everything by hand, losing hours of their time and slowing your software delivery down.

Why Do You Care?

If you make a change to application code and something doesn’t work, you don’t want to have to deal with a complicated recovery. You want the database to be able to roll it back automatically so you get back to a working state very rapidly. Today, very often people don’t write code for databases; they have frameworks that do it (Hibernate, for example). This makes it impossible for a software engineer to write the code properly, or roll it back, because they don’t work at that level. Making changes using rolling application upgrades is simpler, less likely to fail, and makes it more obvious what to do when you do experience a failure.

With transactional DDL, it’s far less likely that you will have to deal with a partially upgraded system that has essentially ground your application to a stop. Partial upgrades like that may require your database administrator (DBA) to go in and fix everything by hand, losing hours of their time and slowing your software delivery down. With transactional DDL, you can roll back to the last working upgrade and resolve the issues rapidly, without taking your software delivery system or your application offline.

A Short Explanation of DML and DDL

Essentially, DML statements are structured query language (SQL) statements that we use to manipulate data — as you might have guessed. Specifically, the DML class includes the INSERT, UPDATE, and DELETE SQL statements. Sometimes, we refer to these three statements as WRITE DML, and we call the SELECT statement READ DML. The standard does not differentiate between read and write, but for this article, it is an important distinction.

On the other hand, DDL is a family of SQL language elements used to define the database structure, particularly database schemas. The CREATE, ALTER, and DROP commands are common examples of DDL SQL statements, but DDL language elements may include operations with databases, tables, columns, indexes, views, stored procedures, and constraints.

Next, let’s start by defining a transaction as a sequence of commands collected together into a single logical unit, and a transaction is then executed as a single step. With this definition, if the execution of a transaction is interrupted, the transaction isn’t executed. Because a transaction must be ACID — Atomic, Consistent, Isolated, and Durable, that means that when a transaction executes several statements, some of which are DDL, it treats them as a single operation that can either be rolled back or committed. This means that you will never leave the database in a temporary, non-consistent state. Historically, databases haven’t provided the functionality of transactional DDL statements, but even today not all databases provide the functionality of truly transactional DDL. In most cases, this functionality comes with limitations.
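As a minimal sketch of what this buys you, assuming a database with transactional DDL (the table and column names here are hypothetical), a failed upgrade can be undone in one step:

START TRANSACTION;
ALTER TABLE orders ADD COLUMN audited BOOLEAN;
CREATE TABLE orders_log (order_id INT, changed_at TIMESTAMP);
INSERT INTO orders_log VALUES (1, CURRENT_TIMESTAMP);
ROLLBACK;

After the ROLLBACK, neither the new column, the new table, nor the inserted row exists. With non-transactional DDL, the ALTER and CREATE would typically have been committed implicitly and would survive.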

Now, what does true “transactional DDL” mean? It means that all statements should be ACID, regardless of whether they are DML or DDL statements. In practice, with most databases, DDL statements break the transactionality of the enclosing transaction and cause anomalies.

A Brief History of DDL

Originally, the idea of a data definition language was introduced as part of the Codasyl database model. CODASYL is the Conference/Committee on Data Systems Languages, and was formed as a consortium in 1959 to guide development of a standard programming language, which resulted in COBOL, as well as a number of technical standards. CODASYL also worked to standardize database interfaces, all part of a goal from its members to promote more effective data systems analysis, design, and implementation.

In 1969 CODASYL’s Data Base Task Group (DBTG) published its first language specifications for their data model: a data definition language for defining the database schema, another DDL to define application views of the database, and (you guessed it) a data manipulation language that defined verbs to request and update data in the database. Later DDL was used to refer to a subset of SQL to declare tables, columns, data types, and constraints, and SQL-92 introduced a schema manipulation language and schema information tables to query schemas. In SQL:2003 these information tables were specified as SQL/Schemata.

Transactionality of DDL, however, is not part of the ANSI SQL standard. Section 17.1 of ANSI SQL 2016 only specifies the grammar and the supported isolation levels. It does not specify how a transaction should behave or what ‘transactional’ means.

Why Isn’t Transactional DDL Universally Provided?

There’s no reason why DDL statements shouldn’t be transactional, but in the past, databases haven’t provided this functionality. In part that’s because transactional DDL implies that DDL must happen in isolation from other transactions that happen concurrently. That means that the metadata of the modified table must be versioned. To correctly process metadata changes, we need to be able to roll back DDL changes that were aborted due to a transaction rollback. That’s not easy — in fact it’s a complex algorithmic task that requires the database to support metadata delta (diff), which corresponds to DDL changes within the current transaction of each connection to the database. This delta exists before the transaction is closed, and so it could be rolled back as a single transaction, or in parts in RDBMSs that support multi-level transactions or savepoints. Essentially, it’s not universally provided because it’s hard to do correctly.

Organizations moving to microservices, DevOps, and CI/CD have an essential new requirement: a database that supports online transactional DDL.

Let’s return to our concept of WRITE DML (update, delete, insert) vs. READ DML. You might ask yourself how these statements collide in a system that supports multiple concurrent transactions while a DDL transaction is ongoing. Ideally the set of transactions that collide is as small as possible. A SELECT statement does not collide with an INSERT statement. There is no reason why this should be any different in the context of DDL. A DDL statement such as ALTER TABLE should not prevent a SELECT statement from executing concurrently. This is a common pattern in database design.

For DDL to be transactional you need to support multiple versions concurrently. Similar to multiversion concurrency control (MVCC), readers don’t block writers and writers don’t block readers. Without MVCC it’s hard to have transactional DDL. Traditionally, databases started with a locking system instead of MVCC. That implementation wasn’t suited to transactional DDL, which is why around 2005 there was a big shift towards MVCC — to provide concurrent access to the database, and to implement transactional memory in programming languages.

MVCC provides the semantics we might naturally desire. Read DML can proceed while conflicting writes (write DML and DDL) are executed concurrently.

Write DML and DDL in a Live System

We have established that Read DML (SELECT) can happily proceed regardless of what else is executing concurrently in the system. Write DML (INSERT, UPDATE, DELETE) is not allowed to execute on a table that is being concurrently modified by DDL. Explaining the expected behavior of these conflicts based on the Isolation Levels of all concurrent transactions is beyond the scope of this article.

To simplify the discussion, we state that write DML and DDL are mutually exclusive if executed on the same resource. This results in operations blocking each other. If you have a long-running DDL transaction, such as a rolling upgrade of your application, write DML will be prevented for a long period of time.

So even though the DDL is transactional, it can still lead to database downtime and maintenance windows. Or does it?

Always Online, Always Available

The database industry is moving towards an always online, always available model. Earlier databases resulted in a message saying that something wasn’t available — that was essentially because you grabbed a lock in a database for a long period of time. That isn’t an option in an always online, always available model.

Customers, and therefore organizations, require applications to be online and available all the time. That means that transactional DDL is mandatory for the new world, and not only for the applications to run the way customers require them to. It’s also mandatory for organizations adopting DevOps and continuous integration and continuous delivery models (CI/CD). Specifically, that’s because without transactional DDL, applications developers cannot safely and easily make database schema changes (along with their app changes) online. That means that for organizations moving to microservices, DevOps, and CI/CD, they have an essential new requirement — a database that supports online transactional DDL.

I personally consider the term ONLINE misleading. The database is not offline while it holds a lock on a resource. A more appropriate term would have been LOCK FREE DDL. That is, metadata modification can happen without locking out concurrent DML.

The Availability vs. Simplicity Tradeoff

We said that write DML cannot happen concurrently with DDL to avoid ACID violations. Now, what happens to a system that has to be always up and needs to execute long-running DDL statements? Luckily enough, most DDL statements do not take a long time. Adding or removing columns from a table takes a constant amount of time, regardless of how large the table is. If the set of changes is small enough, it is OK to lock DML out for a short period of time.

But there are some DDL statements that need to process every row in the table and hence it can take a long time to process large tables. CREATE INDEX is a prime example of a long-running statement. Given that index creation can take multiple hours, it is not an acceptable option for an always online, always available application.

Specifically, for index creation, NuoDB and other databases implement an ONLINE or CONCURRENT version (I would have preferred to call it a LOCK FREE version). This version allows DBAs to create indexes without locking the table or requiring a maintenance window — an extremely important capability in a 24×7 availability world. However, these capabilities do not come for free. Online versions of common DDL statements tend to be slightly slower than their LOCKING versions, have complicated failure modes, and have hard-to-understand ACID semantics. They also cannot be part of a larger multistatement DDL transaction.
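As a sketch of how this surfaces in SQL (syntax varies by product; the index, table, and column names here are hypothetical), PostgreSQL spells the lock-free variant CREATE INDEX CONCURRENTLY, while MySQL/InnoDB exposes similar online behavior through ALGORITHM and LOCK clauses:

CREATE INDEX CONCURRENTLY idx_orders_customer ON orders (customer_id);

ALTER TABLE orders ADD INDEX idx_orders_customer (customer_id), ALGORITHM=INPLACE, LOCK=NONE;

Notably, PostgreSQL refuses to run CREATE INDEX CONCURRENTLY inside a transaction block, which illustrates the point that these online versions cannot be part of a larger multistatement DDL transaction.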

Interestingly enough, sometimes the speed of execution is not the primary concern, in which case you may not want to use the online version. You might have an application that requires complex changes to a database schema, and some LOCKING is a viable tradeoff for a much simpler upgrade procedure.

Atomicity, one of the four guarantees of ACID, states that: “Each transaction is treated as a single ‘unit,’ which either succeeds completely or fails completely.”

This becomes a very desirable quality if we think of transactions as a set of many DDL statements. An example would be: alter a table; create a log table with a similar name; insert a few rows into other tables. We already know that if DDL is not treated transactionally, you could end up with the new rows in the other tables while neither the CREATE nor the ALTER succeeded. Or you could end up with just the CREATE and no ALTER.

So, if you have an application that assumes that if there is a log table (the CREATE) it can also assume that the ALTER has happened, you might run into subtle bugs in production if the upgrade did not fully complete. Transactional DDL gives database administrators the ability to perform multiple modifications (such as the example above) in a single operation.
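Sketched in SQL, again assuming a database with transactional DDL and hypothetical table names, the example above becomes a single atomic unit:

START TRANSACTION;
ALTER TABLE accounts ADD COLUMN last_audit TIMESTAMP;
CREATE TABLE accounts_log (account_id INT, changed_at TIMESTAMP);
INSERT INTO app_metadata (key_name, value) VALUES ('schema_version', '2');
COMMIT;

Either all four statements take effect or none do, so code that sees accounts_log can safely assume the ALTER also happened.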

For developers, the strong Isolation guarantees of transactional DDL make the development of applications easier. An application can only observe the database in state A (before the upgrade) or in state B (after the upgrade) and will never see partial results. This reduces the required test matrix and increases confidence in the rolling upgrade procedure. Now that is easy to code against.

The tradeoff between the simplicity of rolling upgrades that LOCKING DDL provides and the always-available, always-online behavior that NOT TRANSACTIONAL (lock-free) DDL provides has been known to the industry since InterBase introduced MVCC to the commercial market.

Choose Transactional DDL

Databases have changed a lot since 1959, and there have been many changes in customer expectations for user experience and application availability since then. Transactional DDL helps you avoid a scenario in which your application is no longer available, and gives your DBAs some peace of mind, knowing they won’t have to painstakingly repair the database to bring your software delivery back up to speed. Today, many databases offer transactional DDL, which will help you resolve immediate issues quickly by rolling back to the last working upgrade. In order to meet the always available requirements of today, choose a database that offers transactional DDL. But keep in mind that modern always-available, always-online applications require a database that not only simplifies upgrade scenarios, but also limits downtime due to long-running metadata modifications.

Feature image via Pixabay.

Originally published in The New Stack.



Moving from MySQL 5.7 to MySQL 8.0 – What You Should Know


Feed: Planet MySQL
;
Author: Severalnines
;

April 2018 was a milestone for the MySQL world: MySQL 8.0 was released then, and more than a year later it’s probably time to consider migrating to this new version.

MySQL 8.0 has important performance and security improvements, and, as with any migration to a new database version, there are several things to take into account before going into production to avoid serious issues like data loss, excessive downtime, or even a rollback during the migration task.

In this blog, we’ll mention some of the new MySQL 8.0 features, some deprecated stuff, and what you need to keep in mind before migrating.

What’s New in MySQL 8.0?

Let’s now summarize some of the most important features mentioned in the official documentation for this new MySQL version.

  • MySQL incorporates a transactional data dictionary that stores information about database objects.
  • An atomic DDL statement combines the data dictionary updates, storage engine operations, and binary log writes associated with a DDL operation into a single, atomic transaction.
  • The MySQL server automatically performs all necessary upgrade tasks at the next startup to upgrade the system tables in the mysql schema, as well as objects in other schemas such as the sys schema and user schemas. It is not necessary for the DBA to invoke mysql_upgrade.
  • It supports the creation and management of resource groups, and permits assigning threads running within the server to particular groups so that threads execute according to the resources available to the group. 
  • Table encryption can now be managed globally by defining and enforcing encryption defaults. The default_table_encryption variable defines an encryption default for newly created schemas and general tablespaces. Encryption defaults are enforced by enabling the table_encryption_privilege_check variable.
  • The default character set has changed from latin1 to utf8mb4.
  • It supports the use of expressions as default values in data type specifications. This includes the use of expressions as default values for the BLOB, TEXT, GEOMETRY, and JSON data types.
  • Error logging was rewritten to use the MySQL component architecture. Traditional error logging is implemented using built-in components, and logging using the system log is implemented as a loadable component.
  • A new type of backup lock permits DML during an online backup while preventing operations that could result in an inconsistent snapshot. The new backup lock is supported by LOCK INSTANCE FOR BACKUP and UNLOCK INSTANCE syntax. The BACKUP_ADMIN privilege is required to use these statements.
  • MySQL Server now permits a TCP/IP port to be configured specifically for administrative connections. This provides an alternative to the single administrative connection that is permitted on the network interfaces used for ordinary connections even when max_connections connections are already established.
  • It supports invisible indexes. This index is not used by the optimizer and makes it possible to test the effect of removing an index on query performance, without removing it.
  • Document Store for developing both SQL and NoSQL document applications using a single database.
  • MySQL 8.0 makes it possible to persist global, dynamic server variables using the SET PERSIST command instead of the usual SET GLOBAL one. 
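For example, the last item lets a setting survive restarts without editing my.cnf; a small sketch (the variable and value are illustrative):

SET PERSIST max_connections = 500;

SELECT * FROM performance_schema.persisted_variables;

The persisted value is stored in mysqld-auto.cnf in the data directory and can be inspected through the performance_schema.persisted_variables table.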

MySQL Security and Account Management

As there are many improvements related to security and user management, we’ll list them in a separate section.

  • The grant tables in the mysql system database are now InnoDB tables. 
  • The new caching_sha2_password authentication plugin is now the default authentication method in MySQL 8.0. It implements SHA-256 password hashing, but uses caching to address latency issues at connect time. It provides more secure password encryption than the mysql_native_password plugin, and provides better performance than sha256_password.
  • MySQL now supports roles, which are named collections of privileges. Roles can have privileges granted to and revoked from them, and they can be granted to and revoked from user accounts. 
  • MySQL now maintains information about password history, enabling restrictions on reuse of previous passwords (see the example after this list).
  • It enables administrators to configure user accounts such that too many consecutive login failures due to incorrect passwords cause temporary account locking. 
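As a sketch, the password-history and account-locking features can be combined at account-creation time (the user name and thresholds are illustrative; the login-failure clauses require MySQL 8.0.19 or later):

CREATE USER 'app'@'%' IDENTIFIED BY 'S3cretPass!' PASSWORD HISTORY 5 FAILED_LOGIN_ATTEMPTS 3 PASSWORD_LOCK_TIME 2;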

InnoDB enhancements

As the previous point, there are also many improvements related to this topic, so we’ll list them in a separate section too.

  • The current maximum auto-increment counter value is written to the redo log each time the value changes, and saved to an engine-private system table on each checkpoint. These changes make the current maximum auto-increment counter value persistent across server restarts.
  • When encountering index tree corruption, InnoDB writes a corruption flag to the redo log, which makes the corruption flag crash-safe. InnoDB also writes in-memory corruption flag data to an engine-private system table on each checkpoint. During recovery, InnoDB reads corruption flags from both locations and merges results before marking in-memory table and index objects as corrupt.
  • A new dynamic variable, innodb_deadlock_detect, may be used to disable deadlock detection. On high concurrency systems, deadlock detection can cause a slowdown when numerous threads wait for the same lock. At times, it may be more efficient to disable deadlock detection and rely on the innodb_lock_wait_timeout setting for transaction rollback when a deadlock occurs (see the example after this list).
  • InnoDB temporary tables are now created in the shared temporary tablespace, ibtmp1.
  • mysql system tables and data dictionary tables are now created in a single InnoDB tablespace file named mysql.ibd in the MySQL data directory. Previously, these tables were created in individual InnoDB tablespace files in the mysql database directory.
  • By default, undo logs now reside in two undo tablespaces that are created when the MySQL instance is initialized. Undo logs are no longer created in the system tablespace.
  • The new innodb_dedicated_server variable, which is disabled by default, can be used to have InnoDB automatically configure the following options according to the amount of memory detected on the server: innodb_buffer_pool_size, innodb_log_file_size, and innodb_flush_method. This option is intended for MySQL server instances that run on a dedicated server. 
  • Tablespace files can be moved or restored to a new location while the server is offline using the innodb_directories option. 
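As an example of the deadlock detection item above, on a high-concurrency system you might disable detection and rely on lock wait timeouts instead (the timeout value is illustrative):

SET GLOBAL innodb_deadlock_detect = OFF;

SET GLOBAL innodb_lock_wait_timeout = 50;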

Now, let’s take a look at some of the features that you shouldn’t use anymore in this new MySQL version.

What is Deprecated in MySQL 8.0?

The following features are deprecated and will be removed in a future version.

  • The utf8mb3 character set is deprecated. Please use utf8mb4 instead.
  • Because caching_sha2_password is the default authentication plugin in MySQL 8.0 and provides a superset of the capabilities of the sha256_password authentication plugin, sha256_password is deprecated.
  • The validate_password plugin has been reimplemented to use the server component infrastructure. The plugin form of validate_password is still available but is deprecated.
  • The ENGINE clause for the ALTER TABLESPACE and DROP TABLESPACE statements.
  • The PAD_CHAR_TO_FULL_LENGTH SQL mode.
  • AUTO_INCREMENT support is deprecated for columns of type FLOAT and DOUBLE (and any synonyms). Consider removing the AUTO_INCREMENT attribute from such columns, or convert them to an integer type.
  • The UNSIGNED attribute is deprecated for columns of type FLOAT, DOUBLE, and DECIMAL (and any synonyms). Consider using a simple CHECK constraint instead for such columns.
  • FLOAT(M,D) and DOUBLE(M,D) syntax to specify the number of digits for columns of type FLOAT and DOUBLE (and any synonyms) is a nonstandard MySQL extension. This syntax is deprecated.
  • The nonstandard C-style &&, ||, and ! operators that are synonyms for the standard SQL AND, OR, and NOT operators, respectively, are deprecated. Applications that use the nonstandard operators should be adjusted to use the standard operators.
  • The mysql_upgrade client is deprecated because its capabilities for upgrading the system tables in the mysql system schema and objects in other schemas have been moved into the MySQL server.
  • The mysql_upgrade_info file, which is created in the data directory and used to store the MySQL version number.
  • The relay_log_info_file system variable and --master-info-file option are deprecated. Previously, these were used to specify the name of the relay log info log and master info log when relay_log_info_repository=FILE and master_info_repository=FILE were set, but those settings have been deprecated. The use of files for the relay log info log and master info log has been superseded by crash-safe slave tables, which are the default in MySQL 8.0.
  • The use of the MYSQL_PWD environment variable to specify a MySQL password is deprecated.

And now, let’s take a look at some of the features that you must stop using in this MySQL version.

What Was Removed in MySQL 8.0?

The following features have been removed in MySQL 8.0.

  • The innodb_locks_unsafe_for_binlog system variable was removed. The READ COMMITTED isolation level provides similar functionality.
  • Using GRANT to create users. Instead, use CREATE USER. Following this practice makes the NO_AUTO_CREATE_USER SQL mode immaterial for GRANT statements, so it too is removed, and an error now is written to the server log when the presence of this value for the sql_mode option in the options file prevents mysqld from starting.
  • Using GRANT to modify account properties other than privilege assignments. This includes authentication, SSL, and resource-limit properties. Instead, establish such properties at account-creation time with CREATE USER or modify them afterward with ALTER USER.
  • IDENTIFIED BY PASSWORD 'auth_string' syntax for CREATE USER and GRANT. Instead, use IDENTIFIED WITH auth_plugin AS 'auth_string' for CREATE USER and ALTER USER, where the 'auth_string' value is in a format compatible with the named plugin.
  • The PASSWORD() function. Additionally, PASSWORD() removal means that SET PASSWORD … = PASSWORD('auth_string') syntax is no longer available.
  • The old_passwords system variable.
  • The FLUSH QUERY CACHE and RESET QUERY CACHE statements.
  • These system variables: query_cache_limit, query_cache_min_res_unit, query_cache_size, query_cache_type, query_cache_wlock_invalidate.
  • These status variables: Qcache_free_blocks, Qcache_free_memory, Qcache_hits, Qcache_inserts, Qcache_lowmem_prunes, Qcache_not_cached, Qcache_queries_in_cache, Qcache_total_blocks.
  • These thread states: checking privileges on cached query, checking query cache for a query, invalidating query cache entries, sending cached result to the client, storing result in the query cache, Waiting for query cache lock.
  • The tx_isolation and tx_read_only system variables have been removed. Use transaction_isolation and transaction_read_only instead.
  • The sync_frm system variable has been removed because .frm files have become obsolete.
  • The secure_auth system variable and --secure-auth client option have been removed. The MYSQL_SECURE_AUTH option for the mysql_options() C API function was removed.
  • The log_warnings system variable and --log-warnings server option have been removed. Use the log_error_verbosity system variable instead.
  • The global scope for the sql_log_bin system variable was removed. sql_log_bin has session scope only, and applications that rely on accessing @@GLOBAL.sql_log_bin should be adjusted.
  • The unused date_format, datetime_format, time_format, and max_tmp_tables system variables are removed.
  • The deprecated ASC or DESC qualifiers for GROUP BY clauses are removed. Queries that previously relied on GROUP BY sorting may produce results that differ from previous MySQL versions. To produce a given sort order, provide an ORDER BY clause.
  • The parser no longer treats \N as a synonym for NULL in SQL statements. Use NULL instead. This change does not affect text file import or export operations performed with LOAD DATA or SELECT … INTO OUTFILE, for which NULL continues to be represented by \N.
  • The client-side --ssl and --ssl-verify-server-cert options have been removed. Use --ssl-mode=REQUIRED instead of --ssl=1 or --enable-ssl. Use --ssl-mode=DISABLED instead of --ssl=0, --skip-ssl, or --disable-ssl. Use --ssl-mode=VERIFY_IDENTITY instead of the --ssl-verify-server-cert option.
  • The mysql_install_db program has been removed from MySQL distributions. Data directory initialization should be performed by invoking mysqld with the --initialize or --initialize-insecure option instead. In addition, the --bootstrap option for mysqld that was used by mysql_install_db was removed, and the INSTALL_SCRIPTDIR CMake option that controlled the installation location for mysql_install_db was removed.
  • The mysql_plugin utility was removed. Alternatives include loading plugins at server startup using the --plugin-load or --plugin-load-add option, or at runtime using the INSTALL PLUGIN statement.
  • The resolveip utility is removed. nslookup, host, or dig can be used instead.

There are a lot of new, deprecated, and removed features. You can check the official website for more detailed information.

Considerations Before Migrating to MySQL 8.0

Let’s mention now some of the most important things to consider before migrating to this MySQL version.

Authentication Method

As we mentioned, caching_sha2_password is now the default authentication method in MySQL 8.0, so you should check whether your application/connector supports it. If not, let’s see how you can change the default authentication method and the user authentication plugin back to ‘mysql_native_password’.

To change the default authentication method, edit the my.cnf configuration file, and add/edit the following line:

$ vi /etc/my.cnf

[mysqld]

default_authentication_plugin=mysql_native_password

To change the user authentication plugin, run the following command with a privileged user:

$ mysql -p

ALTER USER 'username'@'hostname' IDENTIFIED WITH 'mysql_native_password' BY 'password';

Anyway, these changes aren’t a permanent solution as the old authentication could be deprecated soon, so you should take it into account for a future database upgrade.

Roles are also an important feature here. You can reduce individual privilege management by assigning privileges to a role and adding the corresponding users to it.

For example, you can create new roles for the marketing and developer teams:

$ mysql -p

CREATE ROLE 'marketing', 'developers';

Assign privileges to these new roles:

GRANT SELECT ON *.* TO 'marketing';

GRANT ALL PRIVILEGES ON *.* TO 'developers';

And then, assign the role to the users:

GRANT 'marketing' TO 'marketing1'@'%';

GRANT 'marketing' TO 'marketing2'@'%';

GRANT 'developers' TO 'developer1'@'%';

And that’s it. You’ll have the following privileges:

SHOW GRANTS FOR 'marketing1'@'%';

+-------------------------------------------+

| Grants for marketing1@%                   |

+-------------------------------------------+

| GRANT USAGE ON *.* TO `marketing1`@`%`    |

| GRANT `marketing`@`%` TO `marketing1`@`%` |

+-------------------------------------------+

2 rows in set (0.00 sec)

SHOW GRANTS FOR 'marketing';

+----------------------------------------+

| Grants for marketing@%                 |

+----------------------------------------+

| GRANT SELECT ON *.* TO `marketing`@`%` |

+----------------------------------------+

1 row in set (0.00 sec)

Character Sets

As the new default character set is utf8mb4, you should make sure you’re not relying on the implicit default character set, as it has changed.

To avoid some issues, you should specify the character_set_server and the collation_server variables in the my.cnf configuration file.

$ vi /etc/my.cnf

[mysqld]

character_set_server=latin1

collation_server=latin1_swedish_ci

MyISAM Engine

The MySQL privilege tables in the mysql schema have been moved to InnoDB. You can still create a table with engine=MyISAM, and it will work as before, but copying a MyISAM table into a running MySQL server will not work because it will not be discovered.
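A quick way to spot user tables still using MyISAM before the upgrade is to query the information schema (a sketch; adjust the excluded schemas to your environment):

SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE ENGINE = 'MyISAM' AND TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');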

Partitioning

There must be no partitioned tables that use a storage engine that does not have native partitioning support. You can run the following query to verify this point.

$ mysql -p

SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE ENGINE NOT IN ('innodb', 'ndbcluster') AND CREATE_OPTIONS LIKE '%partitioned%';

If you need to change the engine of a table, you can run:

ALTER TABLE table_name ENGINE = INNODB;

Upgrade Check

As a last step, you can run the mysqlcheck command using the check-upgrade flag to confirm if everything looks fine.

$ mysqlcheck -uroot -p --all-databases --check-upgrade

Enter password:

mysql.columns_priv                                 OK

mysql.component                                    OK

mysql.db                                           OK

mysql.default_roles                                OK

mysql.engine_cost                                  OK

mysql.func                                         OK

mysql.general_log                                  OK

mysql.global_grants                                OK

mysql.gtid_executed                                OK

mysql.help_category                                OK

mysql.help_keyword                                 OK

mysql.help_relation                                OK

mysql.help_topic                                   OK

mysql.innodb_index_stats                           OK

mysql.innodb_table_stats                           OK

mysql.password_history                             OK

mysql.plugin                                       OK

mysql.procs_priv                                   OK

mysql.proxies_priv                                 OK

mysql.role_edges                                   OK

mysql.server_cost                                  OK

mysql.servers                                      OK

mysql.slave_master_info                            OK

mysql.slave_relay_log_info                         OK

mysql.slave_worker_info                            OK

mysql.slow_log                                     OK

mysql.tables_priv                                  OK

mysql.time_zone                                    OK

mysql.time_zone_leap_second                        OK

mysql.time_zone_name                               OK

mysql.time_zone_transition                         OK

mysql.time_zone_transition_type                    OK

mysql.user                                         OK

sys.sys_config                                     OK

world_x.city                                       OK

world_x.country                                    OK

world_x.countryinfo                                OK

world_x.countrylanguage                            OK

There are several things to check before performing the upgrade. You can check the official MySQL documentation for more detailed information.

Upgrade Methods

There are different ways to upgrade MySQL 5.7 to 8.0. You can upgrade in place, or even create a replication slave on the new version so you can promote it later.

But before upgrading, step 0 must be backing up your data. The backup should include all the databases, including the system databases. So, if there is any issue, you can roll back as soon as possible.

Another option, depending on the available resources, is creating a cascading replication setup MySQL 5.7 -> MySQL 8.0 -> MySQL 5.7, so that after promoting the new version, if something goes wrong, you can promote the slave node running the old version back. But this could be dangerous if there is some issue with the data, so a backup is a must beforehand.

Whichever method is used, a test environment is necessary to verify that the application works without any issues on the new MySQL 8.0 version.

Conclusion

More than 1 year after the MySQL 8.0 release, it is time to start thinking about migrating from your old MySQL version. Luckily, as the end of support for MySQL 5.7 is 2023, you have time to create a migration plan and test the application behavior without rushing. Spending some time on that testing step is necessary to avoid any issues after migrating.

Automatic Schema Synchronization in NDB Cluster 8.0: Part 2


Feed: Planet MySQL
;
Author: MySQL High Availability
;

In part 1, we took a brief, high-level look at the various protocols and mechanisms used to keep the Data Dictionary (DD) of MySQL servers connected to a MySQL Cluster in synchronization with each other and with the NDB Dictionary. More specifically, we explored the problems with the implementation of user-triggered synchronization in the NDB Cluster 7.x versions. These concerns are addressed in NDB Cluster 8.0 through a new feature: Automatic Schema Synchronization (or auto schema sync for short).

A new component called Metadata Change Monitor has been introduced to detect any NDB metadata changes. This component runs in the background and compares the contents of the NDB Dictionary with that of the MySQL server’s DD at fixed, user-configurable intervals of time. The Metadata Change Monitor detects any mismatch i.e. the scenario where a metadata object exists in NDB Dictionary and is missing from the MySQL server DD and vice versa. The metadata objects checked for mismatches are:

  • Logfile groups
  • NDB tablespaces
  • Databases (or schemata) containing NDB tables
  • NDB tables

The Metadata Change Monitor submits any mismatched objects detected to a queue, from which they are eventually synchronized by the NDB Event Handling component, making the discovery and synchronization of mismatched objects asynchronous by design. The NDB Event Handling component picks up an object from the head of the queue and attempts to synchronize it by either creating or deleting the object in the MySQL server’s DD depending on whether it exists in NDB Dictionary or not. The rate of schema object synchronization is throttled to avoid any significant performance overhead.

Usability

The primary goal for auto schema sync in terms of improving usability is to remove the need for users to perform a manual step in order for metadata changes made using native NdbApi to be visible in MySQL servers. By default the Metadata Change Monitor component polls for mismatches every 60 seconds which ensures that all metadata changes are eventually propagated to the MySQL servers without any user intervention. The feature can be enabled or disabled by setting the ndb_metadata_check MySQL server system variable to 1 or 0 while the interval can be tweaked using the ndb_metadata_check_interval system variable. The shorter the interval, the quicker the mismatches will be detected and synchronized but this also results in more resource utilization which is a trade-off the user will have to be wary of.

There are a couple of MySQL server status variables, Ndb_metadata_detected_count and Ndb_metadata_synced_count, which contain the number of objects detected and synchronized, respectively.
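Both counters (and the Ndb_metadata_blacklist_size variable mentioned later in this article) can be inspected with a single pattern match:

SHOW GLOBAL STATUS LIKE 'Ndb_metadata%';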

The above mechanism ensures that the metadata is eventually present in the MySQL server’s DD and also serves as an option to fall back on for certain failed schema distribution or schema synchronization attempts. It is, however, not a drop-in replacement for the erstwhile SHOW TABLES behaviour. There’s still a valid use case where, for example, an application needs to restore metadata using the ndb_restore utility and then ensure that all the metadata is now present in the MySQL server before continuing with further processing. In such cases, the eventual consistency achieved by the polling Metadata Change Monitor and synchronization of the queue is not ideal as it would require additional application logic to see if the metadata is present or polling the above status variables until the desired state is detected. To solve this, a new MySQL server system variable called ndb_metadata_sync was introduced.

The usage is neatly summarized in the MySQL manual which is quoted verbatim below for the sake of convenience:

Setting this variable causes the change monitor thread to override any values set for ndb_metadata_check or ndb_metadata_check_interval, and to enter a period of continuous change detection. When the thread ascertains that there are no more changes to be detected, it stalls until the binary logging thread has finished synchronization of all detected objects. ndb_metadata_sync is then set to false, and the change monitor thread reverts to the behavior determined by the settings for ndb_metadata_check and ndb_metadata_check_interval.

This can be demonstrated with the help of a small example as follows:

Assume that the above metadata is backed up using the ndb_mgm client (skipped for the sake of brevity) and then the database ‘db1’ is dropped using the MySQL client. The ndb_restore utility can be used to create the metadata in the NDB Dictionary but not in the DD of the MySQL server. Rather than waiting for periodic polling to find the mismatch and synchronize the schema, a user can simply set the ndb_metadata_sync variable to true and wait until it is automatically flipped back to its default value of false.
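In SQL terms, the wait can be sketched as follows; the change monitor flips the variable back to false once synchronization completes:

SET GLOBAL ndb_metadata_sync = true;

SELECT @@global.ndb_metadata_sync;

Repeat the SELECT until it returns 0.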

Global Locks

In the NDB Cluster 7.x implementation, a global lock is taken which spans the entire duration of the synchronization activity. With auto schema sync, it is now held only for multiple, short intervals. The NDB Event Handling component acquires (and releases) this global lock on a per object basis. An important thing to note is that a try-lock strategy is employed when it comes to acquiring this lock. This, coupled with the fact that the locks are short-lived, makes auto schema sync less intrusive and less likely to affect other DDL changes that may be taking place in parallel.

No additional overhead during SHOW TABLES

In NDB Cluster 8.0, the SHOW TABLES query does just that and no more. The additional synchronization and resource contention in terms of locking that occurs as a side-effect in NDB Cluster 7.x versions has been completely removed.

Design concern

The Metadata Change Monitor component is used to simply detect any mismatches and submit it to the NDB Event Handling component. It is the NDB Event Handling component that is actually responsible for acquiring the appropriate global and metadata locks while modifying the MySQL server’s DD. This is in line with the design of the schema synchronization and schema distribution protocols therefore aligning the 3 different mechanisms from a design perspective. From a code point of view, this also enabled removal of code since the functionality is encapsulated in a single place.

One interesting design challenge for this feature is the scenario when the NDB Event Handling component fails to synchronize an object due to a permanent error in execution. In such cases, the same mismatch could be detected again and again by the Metadata Change Monitor along with (possibly) successive failed attempts by the NDB Event Handling component. This is prevented by maintaining a blacklist of objects that the NDB Event Handling component has failed to synchronize. On failure, the object is placed in the blacklist. The user is then expected to resolve the mismatch by attempting to discover the object using SELECT or SHOW queries or triggering a reconnection of the MySQL server to the MySQL Cluster in more extreme cases. The number of objects present in the blacklist can be checked using the Ndb_metadata_blacklist_size variable.

For as long as an object exists in the blacklist, it’s ignored by the Metadata Change Monitor in subsequent iterations. The validation of the objects in the blacklist is done by the Metadata Change Monitor at the beginning of the next detection cycle. Each object in the blacklist is checked to see if the mismatch still exists. If it doesn’t, then the object is removed from the blacklist and is considered to be a viable candidate for automatic schema synchronization from that point onwards. If the mismatch still exists, then the object is ignored for another detection cycle and will continue to be ignored until the user manually intervenes to correct the mismatch.

Summary

From a user point of view, the main change as a result of auto schema sync in NDB Cluster 8.0 is how metadata restored using the ndb_restore utility is propagated to the DD of the MySQL server.

In 7.x versions, users are expected to issue the following query in order to synchronize changes:
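SHOW TABLES;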

In 8.0, users can simply wait for the periodic polling and synchronization of the changes to occur. The polling period can be changed by tweaking the ndb_metadata_check_interval MySQL server system variable:
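SET GLOBAL ndb_metadata_check_interval = 30;

(Here 30 seconds is just an illustrative value; the default is 60.)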

Alternatively, in 8.0, the user can set the ndb_metadata_sync MySQL server system variable to true and wait until it is automatically flipped back to false:
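SET GLOBAL ndb_metadata_sync = true;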

There’s more work planned in the area with increased functionality and exposing more details to the user on top of the wishlist. As with any new feature, early feedback from the community is vital and much appreciated!



Automatic Schema Synchronization in NDB Cluster 8.0: Part 1


Feed: Planet MySQL
;
Author: MySQL High Availability
;

Data nodes are the distributed, sharded storage core of MySQL NDB Cluster. Their data is usually accessed by MySQL Servers (also called SQL nodes in NDB parlance). Each MySQL server has its own transactional Data Dictionary (DD) where all the metadata describing tables, databases, tablespaces, logfile groups, foreign keys, and other objects is stored for use by the MySQL server. The MySQL server DD, introduced in version 8.0, has enabled improvements such as atomic and crash-safe DDL and the INFORMATION_SCHEMA implementation, among other things. At the storage engine level, NDB has its own distributed data dictionary describing all of the schema objects, which can be modified directly using the native NdbApi.

From an NDB Cluster perspective, the NDB Dictionary is viewed as the source of truth while each MySQL server’s DD is equivalent to a cached copy whose overlapping contents need to be kept in synchronization with that of the NDB Dictionary. This synchronization is achieved by the ndbcluster storage engine plugin through the following three mechanisms:

  1. Schema Synchronization: This occurs every time a MySQL server reconnects to the Cluster. The schema synchronization mechanism ensures that the DD of the MySQL server is updated with any NDB metadata changes that might have occurred while the MySQL server was not connected to the Cluster. Note that no changes are made to metadata in the NDB Dictionary in this phase; it remains read-only until the synchronization concludes.
  2. Schema Distribution: While a MySQL server is connected to the Cluster, we rely on the schema distribution mechanism to ensure that all connected MySQL servers remain in synchronized states. This is done by ensuring that all DDL changes involving NDB metadata are distributed across all connected MySQL servers.
  3. User-triggered Synchronization: Unlike the first 2 mechanisms, which are executed automatically in the background, this requires the user to take action and trigger a synchronization of metadata. In NDB Cluster 7.x versions, this is useful after the ndb_restore utility is used to restore metadata in the NDB Dictionary. Such changes then have to be reflected in the DD of the MySQL server, and the user must manually trigger a synchronization: either on a larger scale by issuing a SHOW TABLES query, or on a per-table basis via the “table discovery” mechanism. Table discovery can be triggered by any statement that opens the table, such as SELECT or SHOW CREATE TABLE (see the examples just after this list).
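
For example, after restoring metadata on the NDB side, either of the following causes a connected MySQL server to pick up a restored table (the table name here is hypothetical):

SHOW TABLES;                           -- synchronizes on a larger scale
SELECT * FROM restored_tbl LIMIT 1;    -- per-table discovery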

In MySQL 8.0, the MySQL Server data dictionary was reimplemented, storing schema information in InnoDB tables and using InnoDB transactions to give transactional behaviour to MySQL Server data dictionary DDL operations. For NDB, the introduction of the transactional DD in MySQL 8.0 involved large changes to the internal workings of schema synchronization and distribution, including improvements to the respective protocols. Most of this schema synchronization work is done automatically in the background and has little or no impact on the user. The user-triggered synchronization, on the other hand, is visibly different, and we took the chance to review its behaviour and indeed rework it in its entirety in NDB Cluster 8.0 (which is now GA!).

In NDB Cluster 7.x versions, issuing a SHOW TABLES command performs the equivalent of a schema synchronization, comparing the contents of the data directory with those of the NDB Dictionary and correcting any mismatch detected. This is less than ideal for the following reasons:

  • Usability: The user is expected to issue an additional query after restoring metadata to the NDB Dictionary to ensure that the metadata is also visible in the MySQL server. This can become tedious with larger configurations since it has to be done on every MySQL server connected to the Cluster.
  • Global locks: This requires acquiring and holding global locks which prevents other metadata changes from occurring during the synchronization.
  • Additional work done by SHOW TABLES: SHOW TABLES is meant to be a simple read query but instead performs additional metadata changes and uses more resources than one would expect.
  • Design concern: The user thread performs synchronization which is primarily the responsibility of the NDB Event Handling component.

This functionality in NDB Cluster 7.x versions relied on the presence of .frm files, which have been removed with the advent of the MySQL server DD in MySQL 8.0. This gave us the chance to wipe the slate clean in NDB Cluster 8.0 and look at how to approach the problem afresh. Read the follow-up post for more details about Automatic Schema Synchronization in NDB Cluster 8.0!



Understanding Query Execution in Relational Database System Architecture


Feed: Databasejournal.com – Feature Database Articles.
Author: .

A Relational Database Management System (RDBMS) is built from numerous complex algorithms and data structures just to store and retrieve information properly. Its complexity is almost akin to that of an operating system, with many features functioning in sync, almost in real time. A modern RDBMS has built-in facilities for memory management, file buffering, network communication, etc. These form the basic architecture of the RDBMS package. This article provides a glimpse of what goes on behind the scenes from the moment a user submits a query until the result is obtained from the database.

Understanding RDBMS

An RDBMS package is typically a database server that serves multiple clients via communication pathways such as sockets and pipes, under the aegis of a network protocol. In a standalone database application, the client communicates with the database via programmatic interfaces; in such a case the database server becomes part of the client application, or vice versa. Sometimes the database is contained within an embedded system as a slave to the host system. Generally, in a large database application, the RDBMS server is separated from the concerns of the application by hosting the server in a remote location, and the business logic interacts with the database server over the network as required. Regardless, the logic for query processing remains the same, be it an embedded database application, a network application, or a standalone application.

Database Connectors

Applications connect to the database using a set of protocols called database connectors. Open Database Connectivity (ODBC) is a well-known database connector that an application can use to connect to almost any database. There are also vendor-specific database connectors; MySQL, for example, supports connectors for Java (JDBC), PHP, Python, .NET, etc. These implementations mostly support communication over network protocols. The connector APIs are designed to transfer SQL commands to the database server and retrieve information upon request by the client. The connectors typically consist of a database driver and client access APIs.

Query Interface

Queries are nothing more than questions put to the database according to the syntax and semantics of a standard query language called SQL (Structured Query Language). The database server understands the language and replies according to the query submitted. According to the semantics of SQL, queries fall into two types. The first type is the Data Definition Language (DDL) query, which is typically used to create and manage the objects of the database itself, such as creating and altering tables, defining indexes, and managing constraints. The second type, the Data Manipulation Language (DML) query, is used to work on the data of the database. This includes actions such as selecting, updating, and deleting data in the database tables.
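
As a quick sketch of the distinction, using a hypothetical employee table:

-- DDL: defines and alters database objects
CREATE TABLE employee (id INT PRIMARY KEY, name VARCHAR(50), dept VARCHAR(30), hire_date DATE);
ALTER TABLE employee ADD COLUMN salary DECIMAL(10,2);

-- DML: works on the data held in those objects
SELECT name FROM employee WHERE dept = 'sales';
UPDATE employee SET dept = 'support' WHERE id = 7;
DELETE FROM employee WHERE id = 9;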

A typical SELECT query may be written as follows, where square brackets ([]) denote optional clauses and the lowercase notation depicts user-defined values.

SELECT [ DISTINCT ] columns
FROM tables
[ WHERE expression ]
[ GROUP BY columns ] 
[ HAVING expression ] 
[ ORDER BY columns ] ;
  • The DISTINCT keyword removes duplicate records from the final result.
  • The FROM clause names the tables from which the references in the other clauses are drawn.
  • The WHERE clause filters rows of the referenced tables according to the expression.
  • The GROUP BY clause groups the result rows by the specified columns.
  • The HAVING clause applies a filter to the groups.
  • The ORDER BY clause sorts the result.
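
Putting the clauses together, a hypothetical query against the employee table sketched earlier might read:

SELECT dept, COUNT(*) AS headcount
FROM employee
WHERE hire_date >= '2019-01-01'
GROUP BY dept
HAVING COUNT(*) > 5
ORDER BY headcount DESC;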

Query Processing

Once the client submits a query statement to the database server over the network protocol, it is first interpreted and then executed. Interpretation deciphers the meaning of the query; this is done by parsing the SQL statement and breaking it into elements before executing it. Interpretation is a two-step process: the logical plan describes what the query is supposed to do, and the physical plan describes how to implement the query.

The physical plan of the query is handled by the database system’s query execution engine. A tree structure is created in which each node represents a query operator and its children represent the inputs (tables or intermediate results) involved in the operation. The query passes through several phases: parsing, validation, optimization, plan generation/compilation and, finally, execution.

  • Parsing breaks the SQL statement into parts, validates it, and translates the logical query (the SQL text) into a query tree according to the syntactical scheme of relational algebra. This is the logical plan.
  • The logical query is then translated into a physical plan. There can be many such plans, and the query optimizer finds the best one according to, say, estimated execution performance. This is done by taking the relational algebra tree into the optimizer’s search space, expanding it by forming alternative execution plans, and finally choosing the best among them. The result is akin to the code-generation part of compiling SQL. The statistics needed to optimize the plan are obtained from the database system’s catalog, which contains information about the number of tuples, the stored relations referenced by the query, and many other things. The optimizer finally copies the optimal plan from its memory structure and sends it to the query execution engine. The query execution engine executes the plan, using database relations as input, and generates a new table of rows and columns that matches the query criteria.

Note that the plan is optimal, or near optimal, only within the search space of the optimizer; the interpretation of a SQL query by the RDBMS is not so simple after all. Optimization is a costly affair because it analyses alternative execution plans, and a single query can have a vast number of possible plans. Optimization therefore consumes additional processing time, impacting both the query optimizer and the query execution engine, and with them the overall database response time.
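
Most database systems let the user inspect the plan the optimizer settled on. In MySQL, for instance, prefixing a query with EXPLAIN reports the chosen plan rather than executing the query (the table is the hypothetical one from earlier):

EXPLAIN SELECT name FROM employee WHERE dept = 'sales';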

Conclusion

This is just a glimpse of the overall SQL query execution process. In short, parsing breaks the SQL statement into elements, which then pass through a validation phase that checks for errors, verifies the syntax against the SQL standard, and identifies the query operation. The parser transforms the query into an intermediate form recognized by the optimizer, which generates an efficient query execution plan. The execution engine then takes the optimized query and executes it. The result obtained from the execution is finally returned to the client.

# # #
