The Horror of NOLOGGING in Oracle

May 3, 2019, 8:33 am

≫ Next: Introducing the MemSQL Kubernetes Operator

≪ Previous: How to create multiple accounts for an app?

Feed: Databasejournal.com – Feature Database Articles.
Author: .

Transactions are the lifeblood of a relational database and Oracle is no exception. The key to this concept is the ability to replay those transactions should the need arise to recover ‘lost’ data due to a database crash or, worse, corruption requiring a restore and recovery. Yes, recording the transactions does take time but that time is usually minimal when compared to the overall transaction time. Some, however, feel the need to suppress such logging to ‘speed things up’, and Oracle allows that to happen in certain cases. Disabling logging ensures that affected transactions will not be recoverable, putting recovery at risk. In this article, you’ll see a scenario (that should NEVER happen in production) and learn what can be done to discover unrecoverable data files and how to fix such an issue.

Redo generation can be ‘successfully’ suppressed for a small set of statements, including CREATE TABLE AS SELECT, INSERT INTO … AS SELECT and CREATE INDEX. When such commands are executed the associated redo statements will not be found in the redo logs (and, later, the archive logs) so they cannot be ‘replayed’ during recovery and won’t appear in the logs at standby sites which makes the standby ‘out of synch’ with the primary (not a good situation since it may be necessary at some point to activate the standby database as production and it won’t match the primary). Executing an example may help illustrate this; a table is created as the trip down Nologging Lane begins:


SQL> create table t_rex tablespace yerg as
  2   select rownum as t_id, 'Jnorg floppneqt aspleezius vung' as t_col
  3   from dual connect by level <= 1e5;

Table created.

SQL>

Let’s backup that datafile now, before going any further:



C:Users>rman target sys/%%%%%%%%%%%

Recovery Manager: Release 12.1.0.2.0 - Production on Thu Apr 13 10:56:18 2017

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

connected to target database: FNORG (DBID=3119604680)

RMAN> @fnorg_yerg_hot_backup_rman.rmn

RMAN> run {
2> allocate channel d1 type disk;
3> backup
4> tablespace yerg
5> format 'c:backupsfnorgfnorg_yerg_hot_%t_%s_%p';
6> }
using target database control file instead of recovery catalog
allocated channel: d1
channel d1: SID=129 device type=DISK

Starting backup at 13-APR-17
channel d1: starting full datafile backup set
channel d1: specifying datafile(s) in backup set
input datafile file number=00007 name=C:APPDNS-DDFORADATAFNORGYERG01.DBF
channel d1: starting piece 1 at 13-APR-17
channel d1: finished piece 1 at 13-APR-17
piece handle=C:BACKUPSFNORGFNORG_YERG_HOT_941194592_3_1 tag=TAG20170413T105631 comment=NONE
channel d1: backup set complete, elapsed time: 00:00:03
Finished backup at 13-APR-17
released channel: d1

RMAN>
RMAN> **end-of-file**

RMAN>

All of our changes have now been preserved, although since the table was created, LOGGING recovery would have restored it without the current backup. You can verify this with the following query:


SQL> select file#,to_char(UNRECOVERABLE_TIME,'yyyy-mm-dd:hh24:mi:ss')
  2  from v$datafile where file#=(select file_id from dba_data_files where tablespace_name = 'YERG');

     FILE# TO_CHAR(UNRECOVERAB
---------- -------------------
         7

SQL>

It’s time to throw a monkey-wrench into the works; Let’s create an index on table T_REX and do it NOLOGGING:


SQL> create index t_rex_idx on t_rex(t_id) tablespace yerg nologging;

Index created.

SQL>

Run that same unrecoverable query again:


SQL> select file#,to_char(UNRECOVERABLE_TIME,'yyyy-mm-dd:hh24:mi:ss')
  2  from v$datafile where file#=(select file_id from dba_data_files where tablespace_name = 'YERG');

     FILE# TO_CHAR(UNRECOVERAB
---------- -------------------
         7 2017-04-13:11:10:52

SQL>

For even more proof, you can ask RMAN to report unrecoverable items:


RMAN> report unrecoverable;

using target database control file instead of recovery catalog
Report of files that need backup due to unrecoverable operations
File Type of Backup Required Name
---- ----------------------- -----------------------------------
7    full or incremental     C:APPORADATAFNORGYERG01.DBF

RMAN>

It’s a good thing we checked on this; it’s not a step a DBA usually takes in their day-to-day duties, but it might be a good check to run on occasion just to verify recovery would succeed for a given database or tablespace. If this is left in its current state, a restore and recover of the tablespace from the current backup will not restore the index (which should be obvious given the NOLOGGING creation of it). Any backup of the database or tablespace prior to the NOLOGGING changes will essentially be useless. [Such changes should be limited to the Development database and developers who perform NOLOGGING operations should discuss this with the DBA prior to implementing such a strategy so that objects and/or data can be recovered if need be. On the other hand, the developers should have such tasks scripted so they can be repeated at will which may make the life of the DBA easier.] Thinking the ‘unthinkable’ the developer’s code was migrated to production ‘as-is’, leaving the NOLOGGING directive intact. Luckily, she noticed this and notified the DBA so a current tablespace backup could be taken:


RMAN> @fnorg_yerg_hot_backup_rman.rmn

RMAN> run {
2> allocate channel d1 type disk;
3> backup
4> tablespace yerg
5> format 'c:backupsfnorgfnorg_yerg_hot_%t_%s_%p';
6> }
allocated channel: d1
channel d1: SID=246 device type=DISK

Starting backup at 13-APR-17
channel d1: starting full datafile backup set
channel d1: specifying datafile(s) in backup set
input datafile file number=00007 name=C:APPDNS-DDFORADATAFNORGYERG01.DBF
channel d1: starting piece 1 at 13-APR-17
channel d1: finished piece 1 at 13-APR-17
piece handle=C:BACKUPSFNORGFNORG_YERG_HOT_941197231_4_1 tag=TAG20170413T114031 comment=NONE
channel d1: backup set complete, elapsed time: 00:00:03
Finished backup at 13-APR-17
released channel: d1

RMAN>
RMAN> **end-of-file**

Checking again for unrecoverable objects you get a different report:


RMAN> report unrecoverable;

Report of files that need backup due to unrecoverable operations
File Type of Backup Required Name
---- ----------------------- -----------------------------------

RMAN>

NOLOGGING operations are intentional, as the NOLOGGING keyword needs to be added to any DDL statement executed in the database. There is no user-accessible setting that will put the entire database in NOLOGGING mode. It should not be left to the DBA to discover such actions so corrective measures can be taken, and it certainly should not be acceptable practice to execute DDL in production databases NOLOGGING.

Yes, sometimes the unthinkable happens, but it should be a rare (or, ideally, non-existent) occurrence in a properly managed production environment. Being aware that such possibilities exist puts the DBA in a better position to ensure the database is properly backed up so recovery will not encounter issues should that need arise. Forewarned is forearmed, it is said, and that’s very true in situations like this; since the primary responsibility of the DBA is backup and recovery above all else it might be a good idea to add a check for unrecoverable objects and datafiles, just to be safe. You never know when the unimaginable may happen.

↧

Introducing the MemSQL Kubernetes Operator

May 8, 2019, 5:59 am

≫ Next: Troubleshooting Data Differences in a MySQL Database Cluster

≪ Previous: The Horror of NOLOGGING in Oracle

Feed: MemSQL Blog.
Author: Carl Sverre.

Kubernetes has taken the world by storm, transforming how applications are developed, deployed, and maintained. For a time, managing stateful services with Kubernetes was difficult, but that has changed dramatically with recent innovations in the community. Building on that work, MemSQL is pleased to announce the availability of our MemSQL Kubernetes Operator, and our certification by Red Hat to run on the popular OpenShift container management platform.

Kubernetes has quickly become one of the top three most-loved platforms by developers. Now, with the MemSQL Kubernetes Operator, technology professionals have an easy way to deploy and manage an enterprise-grade operational database with just a few commands.

The new Operator is certified by Red Hat to run MemSQL software on Red Hat OpenShift, or you can run it with any Kubernetes distribution you choose. Running MemSQL on Kubernetes gives data professionals the highest level of deployment flexibility across hybrid, multi-cloud, or on-premises environments. As Julio Tapia, director of the Cloud Platforms Partners Ecosystem for Red Hat, put it in our press release, services in a Kubernetes-native infrastructure “‘just work’ across any cloud where Kubernetes runs.”

As a cloud-native database, MemSQL is a natural fit for Kubernetes. MemSQL is a fully distributed database, deploys and scales instantly, and is configured quickly and easily using the native MemSQL API. MemSQL customers have requested the Kubernetes Operator, and several participated in testing prior to this release.

The majority of MemSQL customers today deploy MemSQL on one or more public cloud providers. Now, with the Kubernetes Operator, they can deploy on any public or private infrastructure more easily.

How to Use The MemSQL Kubernetes Operator

You use the MemSQL Kubernetes Operator like other standard Kubernetes tools. Use the Kubernetes command-line interface (CLI) and the Kubernetes API to interact with the Operator and manage the application. The task of managing the cluster is greatly simplified. DevOps and administration teams can also use the Operator to implement partial or full automation.

The Operator enables you to create, read, update, and delete MemSQL clusters. Among the options you specify (see here for details):

The cluster size. Cluster size is defined in units, where one unit of cluster size is equal to one leaf node.
The memory and CPU assigned. This is defined as height, where one unit of height equals 8 vCPUs and 32GB of RAM.
The redundancy level. Level 1 is no redundancy, level 2 is one redundant copy (recommended for production use).
The storage size. How much disk space you want to reserve.

Because Kubernetes is a declarative, rather than imperative, environment, you describe the state of the cluster that you want Kubernetes to create and maintain. Kubernetes then maintains that state for you. The commands and operations are the same across all the major public clouds, private clouds, and on-premises installations as well.

The minimum cluster size you should specify is a single leaf unit with height 1, three aggregator units (automatically created, with height 0.5), and redundancy level 2. When you create the cluster, a DDL endpoint is returned to you. You connect to the cluster using the DDL endpoint.

The MemSQL Kubernetes Operator does not currently include the ability to split and merge partitions. You will need to perform this function manually, outside of Kubernetes. We expect to include partition management in a future release.

Next Steps

If you’re already a MemSQL customer, you can begin using the Kubernetes Operator today. Access it here.

If you are not already a customer, you’ll find MemSQL a great fit for a wide range of operational analytics use cases. Try MemSQL for free today or contact us to learn how we can help you.

↧

Troubleshooting Data Differences in a MySQL Database Cluster

May 9, 2019, 9:40 am

≫ Next: Small Files, Big Foils: Addressing the Associated Metadata and Application Challenges

≪ Previous: Introducing the MemSQL Kubernetes Operator

Feed: Planet MySQL
;
Author: Continuent
;

Overview

The Skinny

From time to time we are asked how to check whether or not there are data discrepancies between Master/Slave nodes within a MySQL (or MariaDB) cluster that’s managed with Tungsten Clustering. This is always a challenging task, not least because we hope and believe that our replication mechanism would avoid such occurrences, that said there can be factors outside of our control that can appear to “corrupt” data – such as inadvertent execution of DML against a slave using a root level user account.

Tungsten Replicator, the core replication component in our Tungsten Clustering solution for MySQL (& MariaDB), is just that, a replicator – it takes transactions from the binary logs and replicates them around. The replicator isn’t a data synchronisation tool in that respect, the replicator won’t/can’t compare tables – this is by design and is one of the many benefits in the product that avoids a) us being tightly coupled to the database and b) avoids the inherent performance impact of what could be incredibly resource consuming processes, add to that the complications of how to confidently complete such checks in extremely active environments.

Agenda

What’s Here?

The following steps walk through the recommended methods for troubleshooting based on a 3-node cluster using MySQL 5.7 community edition, managed by Tungsten Clustering 6.0

What’s Not Here?

There are a number of tools that can help identify and fix data drift, and even structural differences, however a lot of them assume native MySQL replication is in place and therefore just the default usage of such products within a Tungsten Clustering environment can cause further issues, and may even fail completely.

In this blog post, I am not going to cover the best practices to avoid data drift nor the rights and wrongs of it, what I will cover is how to utilise existing third-party tools designed for doing such tasks.

Identify Structural Differences

Simple, Yet Effective

If you suspect that there are differences to a table structure, a simple method to resolve this will be to compare schema DDL, mysqldump offers an easy and fast way to extract DDL without row data, then using simple OS commands we can identify such differences.

Extract DDL on the Master node, specifying the schema in place of :

 mysqldump -u root -p –no-data -h localhost –databases >master.sql

mysqldump –u root –p —no–data –h localhost —databases >master.sql
Repeat the same on the Slave node(s):

 mysqldump -u root -p –no-data -h localhost –databases >slave.sql

mysqldump –u root –p —no–data –h localhost —databases >slave.sql
Now, using diff, you can compare the results:

 diff master.sql slave.sql

diff master.sql slave.sql

Using the output of diff, you can then craft the necessary DDL statements to re-align your structure

Identify Data Differences

The Real Challenge

The first challenge when looking at data differences is that in busy environments, and especially if you are running a Multi-Site Multi-Master (pre v6) or Composite Multi-Master (v6) Topology, then you may well be presented with false positives due to the constant changing environment.

It is possible to use pt-table-checksum from the Percona Toolkit to identify data differences, providing you use the syntax described below for bypassing the native replication checks.

First of all, it is advisable to familiarise yourself with the product by reading through the providers own documentation here:
https://www.percona.com/doc/percona-toolkit/

Once you are ready, ensure you install the latest version of the persona toolkit on all nodes, or at least ensure the version you install is compatible with your release of MySQL.

Next, execute the following on the Master node:

shell> pt-table-checksum –set-vars innodb_lock_wait_timeout=500 –recursion-method=none –ignore-databases=mysql –ignore-databases-regex=tungsten* h=localhost,u=tungsten,p=secret

shell> pt–table–checksum —set–vars innodb_lock_wait_timeout=500 —recursion–method=none —ignore–databases=mysql —ignore–databases–regex=tungsten* h=localhost,u=tungsten,p=secret

It is important to include the ignore-database options – we do not want to compare the mysql schema, nor do we want to compare any tungsten tracking schemas.

You can add additional schemas to these options if necessary within your environment.

On first run, this will create a database called percona, and within that database a table called checksums. The process will gather checksum information on every table in every database excluding any listed using the ignore options mentioned previously. The tables and the processes will replicate through Tungsten Replicator and therefore you can now query these tables on the slave nodes, the following is an example SELECT that you can use:

SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM percona.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;

SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks

FROM percona.checksums

WHERE (

master_cnt <> this_cnt

OR master_crc <> this_crc

OR ISNULL(master_crc) <> ISNULL(this_crc))

GROUP BY db, tbl;

This SELECT will return any tables that it detects are different, it won’t show you the differences, or indeed how many, this is just a basic check.

To identify and fix the changes, you could use then use pt-table-sync (Also within the Percona Toolkit), however this product would by default assume native replication and also try and fix the problems for you. This assumption is unavoidable, therefore within a Tungsten Clustering environment we need to supply the --print switch. This won’t execute the statement but will instead, display them on the screen (Or redirect to a file) and then from here you can gather the SQL needed to be executed to fix the mistakes and process this manually.

The output should be reviewed carefully to determine whether you want to manually patch the data, if there are significant differences, then you may need to consider using tungsten_provision_slave to reprovision a node instead.
To use pt-table-sync, first identify the tables with differences on each slave, in this example, the SELECT statement above identified that there was a data difference on the departments table within the employees database on db2. Execute the pt-table-sync script on the master, passing in the database name, table name and the slave host that the difference exists on:

shell> pt-table-sync –databases employees –tables departments –print h=db1,u=tungsten,p=secret,P=13306 h=db2

shell> pt–table–sync —databases employees —tables departments —print h=db1,u=tungsten,p=secret,P=13306 h=db2

The first h= option should be the Master (also the node you run the script from) the second h= option relates to the slave that the difference exist on.

Executing the script will output SQL statements that can be used to patch the data, for example the above statement produces the following output:

UPDATE `employees`.`departments` SET `dept_name`=’Sales’ WHERE `dept_no`=’d007′ LIMIT 1 /*percona-toolkit src_db:employees src_tbl:departments src_dsn:P=13306,h=db1,p=…,u=tungsten dst_db:employees dst_tbl:departments dst_dsn:P=13306,h=db2,p=…,u=tungsten lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:24524 user:tungsten host:db1*/;

UPDATE `employees`.`departments` SET `dept_name`=‘Sales’ WHERE `dept_no`=‘d007’ LIMIT 1 /*percona–toolkit src_db:employees src_tbl:departments src_dsn:P=13306,h=db1,p=...,u=tungsten dst_db:employees dst_tbl:departments dst_dsn:P=13306,h=db2,p=...,u=tungsten lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:24524 user:tungsten host:db1*/;

The UPDATE staments could now be issued directly on the slave to correct the problem.

Warning

Remember, at the start I mentioned one way data drift can happen is due to the inadvertent execution of DML on a slave, which is highly unrecommended, however in the following examples I contradict myself and suggest the only way to fix the data is to actually do just that. Care should be taken, and ALWAYS ensure you have a FULL backup, it would be recommended to place the cluster into MAINTENANCE mode and shun the slave node before making any changes so as not to cause any potential interruption to connected clients!

Summary

The Wrap-Up

In this blog post we discussed how to check whether or not there are data discrepancies between Master/Slave nodes within a cluster.

To learn about Continuent solutions in general, check out https://www.continuent.com/solutions

The Library

Please read the docs!

For more information about troubleshooting data differences in Tungsten clusters, please visit http://docs.continuent.com/tungsten-clustering-6.0/troubleshooting-data.html.

Tungsten Clustering is the most flexible, performant global database layer available today – use it underlying your SaaS offering as a strong base upon which to grow your worldwide business!

For more information, please visit https://www.continuent.com/solutions

Want to learn more or run a POC? Contact us.

↧

Small Files, Big Foils: Addressing the Associated Metadata and Application Challenges

May 9, 2019, 3:08 pm

≫ Next: AWS Glue crawlers now support existing Data Catalog tables as sources

≪ Previous: Troubleshooting Data Differences in a MySQL Database Cluster

Feed: Hadoop – Cloudera Engineering Blog.
Author: Shelby Khan.

Small files are a common challenge in the Apache Hadoop world and when not handled with care, they can lead to a number of complications. The Apache Hadoop Distributed File System (HDFS) was developed to store and process large data sets over the range of terabytes and petabytes. However, HDFS stores small files inefficiently, leading to inefficient Namenode memory utilization and RPC calls, block scanning throughput degradation, and reduced application layer performance. In this blog post, we will define the issue of small file storage and examine ways to tackle it while keeping the complications at bay.
What are Small Files?

A small file is one which is significantly smaller than the default Apache Hadoop HDFS default block size (128MB by default in CDH). One should note that it is expected and inevitable to have some small files on HDFS. These are files like library jars, XML configuration files, temporary staging files, and so on. But when small files become a significant part of datasets, the problems arise. Hence, in this section, we shall discuss why it is a good goal to have a file size as close to a multiple of the HDFS block size as possible.

Hadoop’s storage and application layers are not designed to function efficiently with a large number of small files. Before we get to the implications of this, let’s review how HDFS stores files.

In HDFS, data and metadata are separate entities. Files are split into blocks that are stored and replicated on the DataNodes’ local file systems across the cluster. The HDFS namespace tree and associated metadata are maintained as objects in the NameNode’s memory (and backed up to disk), each of which occupies approximately 150 bytes, as a rule of thumb. This arrangement is described in more detail in the public documentation here.

The two scenarios below illustrate the small files issue:

Scenario 1 (1 large file of 192MiB):

Scenario 2 (192 small files, 1MiB each):

Scenario 1 has one file which is 192MB which is broken down to 2 blocks of size 128MB and 64MB. After replication, the total memory required to store the metadata of a file is = 150 bytes x (1 file inode + (No. of blocks x Replication Factor)).

According to this calculation, the total memory required to store the metadata of this file on the Namenode = 150 x (1 + (2 x 3)) = 1050 Bytes.

In contrast, scenario 2 has 192 1 MB files. These files are then replicated across the cluster. The total memory required by the Namenode to store the metadata of these files = 150 x (192 + (192 x 3)) = 115200 Bytes.

Hence, we can see that we require more than 100x memory on the Namenode heap to store the multiple small files as opposed to one big 192MB file.

Effects on the Storage Layer

When a NameNode restarts, it must load the filesystem metadata from local disk into memory. This means that if the namenode metadata is large, restarts will be slower. The NameNode must also track changes in the block locations on the cluster. Too many small files can also cause the NameNode to run out of metadata space in memory before the DataNodes run out of data space on disk. The datanodes also report block changes to the NameNode over the network; more blocks means more changes to report over the network.

More files mean more read requests that need to be served by the NameNode, which may end up clogging NameNode’s capacity to do so. This will increase the RPC queue and processing latency, which will then lead to degraded performance and responsiveness. An overall RPC workload of close to 40K~50K RPCs/s is considered high.

Effects on Application Layer

In general, having a large number of small files results in more disk seeks while running computations through an analytical SQL engine like Impala or an application framework like MapReduce or Spark.

MapReduce/Spark

In Hadoop, a block is the most granular unit of data on which computation can be performed. Thus, it affects the throughput of an application. In MapReduce, an individual Map task is spawned for each block that must be read. Hence, a block with very little data can degrade performance, increase Application Master bookkeeping, task scheduling, and task creation overhead since each task requires its own JVM process.

This concept is similar for Spark, in which each “map” equivalent task within an executor reads and processes one partition at a time. Each partition is one HDFS block by default. Hence, a single concurrent task can run for every partition in a Spark RDD. This means that if you have a lot of small files, each file is read in a different partition and this will cause a substantial task scheduling overhead compounded by lower throughput per CPU core.

MapReduce jobs also create 0 byte files such as _SUCCESS and _FAILURE. These files do not account for any HDFS blocks but they still register as an inode entry in the Namenode heap which uses 150 bytes each as described earlier. An easy and effective way to clear these files is by using the below HDFS command:

<br>
hdfs dfs -ls -R <path> | awk ‘$1 !~ /^d/ && $5 == “0” { print $8 }’ | xargs -n100 hdfs dfs –rm</path>

hdfs dfs –ls –R <path> | awk ‘$1 !~ /^d/ && $5 == “0” { print $8 }’ | xargs –n100 hdfs dfs –rm

This will move those files to the .Trash location from where it will be cleared out automatically once the trash retention policy takes effect.

Note: This should not be done while your workloads are running on the specified path since it may cause applications to fail if they have dependencies on these files to know when the jobs complete or fail.

Impala—Effects on the Catalog Server

Impala is a high-performance query engine, which caches the HDFS namespace information in the Catalog Server for faster metadata access. Below is an architecture diagram detailing the way the Impala catalog is maintained and distributed across the service.

As seen with complications around NameNode metadata management, a similar issue arises with the metadata that Impala needs to maintain in the Catalog Server. The catalog size is a function of the number and size of objects maintained in the Catalog Server. These objects with their estimated average memory usage are described in the table below:

Object	Memory Usage
Table	5KB
Partition	2KB
Column	100B
Incremental Stats	400B* (per column per partition)
File	750B
File Block	300B

*Can go as high as 1.4KB/Column/Partition

Example: If there are 1000 tables with 200 partitions each and 10 files per partitions, the Impala Catalog Size will be at least (excluding table stats and table width):

<br>
#tables * 5KB + #partitions * 2kb + #files * 750B + #file_blocks * 300B = 5MB + 400MB + 1.5GB + 600MB = ~ <strong>2.5GB</strong>

#tables * 5KB + #partitions * 2kb + #files * 750B + #file_blocks * 300B = 5MB + 400MB + 1.5GB + 600MB = ~ 2.5GB

The larger the Impala Catalog Size the higher its memory footprint. Large metadata in the HMS for Hive/Impala is not advised as it needs to keep track of more files, causing:

Longer Metadata loading time
Longer StateStore topic update time
Slow DDL statement operations
Longer query plan distribution time

In addition to the issues related to the metadata, each disk read is single threaded by default in Impala which can cause a significant overhead in I/O with small files. Further, if the table is stored in the parquet file format, each physical file needs to be opened/closed twice; that is, once for the read footer and again for the column data.

How Do Small Files Originate?

Let us discuss some of the common mistakes that may give birth to insidious small files.

Streaming Data

Data ingested incrementally and in small batches can end up creating a large number of small files over a period of time. Near-real-time requirements for streaming data, with small windows (every few minutes or hours) that do not create much data will cause this problem. Below is a typical streaming ETL ingest pipeline into HDFS.

Large Number of Mappers/Reducers

MapReduce jobs and Hive queries with large number of mappers or reducers can generate a number of files on HDFS proportional to the number of mappers (for Map-Only jobs) or reducers (for MapReduce jobs). Large number of reducers with not enough data being written to HDFS will dilute the result set to files that are small, because each reducer writes one file. Along the same lines, data skew can have a similar effect in which most of the data is routed to one or a few reducers, leaving the other reducers with little data to write, resulting in small files.

Over-Partitioned Tables

An over-partitioned table is a partitioned Hive table with a small amount of data (< 256 MB) per partition. The Hive Metastore Server (HMS) API call overhead increases with the number of partitions that a table maintains. This in return leads to deteriorated performance. In these cases, consider reviewing the partition design and reducing the partition granularity, for example from daily to monthly partitions.

Over-Parallelizing

In a Spark job, depending on the number of partitions mentioned in a write task, a new file gets written per partition. This is similar to having a new file getting created for each reduce task in the MapReduce framework. The more Spark partitions, the more files are written. Control the number of partitions to curb the generation of small files.

File Formats and Compression

Using of inefficient file formats, for example TextFile format and storing data without compression compounds the small file issue, affecting performance and scalability in different ways:

Reading data from very wide tables (tables with a large number of columns) stored as non-columnar formats (TextFile, SequenceFile, Avro) requires that each record be completely read from disk, even if only a few columns are required. Columnar formats, like Parquet, allow the reading of only the required columns from disk, which can significantly improve performance
Use of inefficient file formats, especially uncompressed ones, increases the HDFS space usage and the number of blocks that need to be tracked by the NameNode. If the files are small in size, it means the data is split into a larger number of files thereby increasing the amount of associated metadata to be stored.

Identifying Small Files

FSImage and fsck

Because the NameNode stores all the metadata related to the files, it keeps the entire namespace image in RAM. This is the persistent record of the image stored in the NameNode’s local native filesystem – fsimage. Thus we can analyze the fsimage or the fsck output to identify paths with small files.

The fields available in the fsimage are:

<br>
Path, Replication, ModificationTime, AccessTime, PreferredBlockSize, BlocksCount, FileSize, NSQUOTA, DSQUOTA, Permission, UserName, GroupName

Path, Replication, ModificationTime, AccessTime, PreferredBlockSize, BlocksCount, FileSize, NSQUOTA, DSQUOTA, Permission, UserName, GroupName

The fsimage can be processed in an application framework like MapReduce or Spark and even loaded into a Hive table for easy SQL access.

Another approach is using the fsck output and parsing that to load it into a Hive table for analysis. There are a few variants of this approach; here is a public project that uses PySpark and Hive to achieve this. It aggregates the total number of blocks, average block size and total file size at each HDFS path which can then be queried in Hive or Impala.

Cloudera Navigator

Cloudera Navigator is a data governance product with audit, lineage, metadata management, data stewardship and policy enforcement features.

The Navigator Search and Analytics tabs can be used to identify small files easily. The HDFS search filters in the left panel allows to filter for files under a specific size or range. The new version of Cloudera Navigator (2.14.x) even has an in-built Dashboard widget to identify small files as shown below.

Ways to Tackle Small Files

Preventative

Streaming Ingest Use-Case

As mentioned earlier, ingesting streaming data usually leads to creating small files. Tweaking the rate of ingest, window, or dstream size (Spark) can help alleviate some of the issues. But usually to meet near-real-time analytics demands, some architectural changes need to be introduced in the HDFS ingestion pipeline with respect to intermediate compaction jobs, maintaining multiple landing directories, and active/passive versions of table data. This is discussed in more detail in this Cloudera Engineering blog.

For near-real-time analytical processing, HBase and Kudu are better choices for storage layers, based on the data type (unstructured vs structured), append/update frequency and data usage patterns (random reads vs aggregations).

Batch Ingest Use-Case

For batch ingest pipelines, a good choice is a regularly scheduled compaction job, which compacts files after landing into HDFS. The file compaction tools mentioned later in this blog would be good candidates for this.

Over-Partitioned Tables

We should aim to have partitions with a significant volume of data so that the files within each partition are large. While deciding on the granularity of the partitions, consider the volume of data that will be stored per partition. Plan for partitions that have large files (~256MB or larger with Parquet), even if it means having less granular partitions, such as monthly instead of daily. For example, keeping the number of partitions within 10K-30K during the lifetime of a table is a good guideline to follow.

For tables that have small data volumes (few hundred MBs), consider creating a non-partitioned table. It can be more efficient to scan all the (small) table’s data stored in a single file than having to deal with thousands of files scattered throughout multiple partitions with tiny number of bytes.

Creating buckets for your table can also reduce the number of small files by essentially fixing the number of reducers and output files generated.

Spark Over-Parallelizing

When writing data to HDFS in Spark, repartition or coalesce the partitions before writing to disk. The number of partitions defined in those statements will determine the number of output files. Checking the output of the Spark Job and verifying the number of files created and throughput achieved is highly recommended.

Prescriptive

HDFS File Compaction Tools

The most obvious solution to small files is to run a file compaction job that rewrites the files into larger files in HDFS. A popular tool for this is FileCrush. There are also other public projects available such as the Spark compaction tool.

Re-Create Table in Hive

To ensure a good balance between performance and efficient storage, create tables using the PARQUET file format and ensure that data compression is enabled when writing data to them.

If you have an existing Hive table that has a large number of small files, you can re-write the table with the below configuration settings applied before re-writing:

<br>
set hive.exec.compress.output=true;
<p>set hive.exec.parallel = true;</p>
<p>set parquet.compression=snappy;</p>
<p>set hive.merge.mapfiles=true;</p>
<p>set hive.merge.mapredfiles=true;</p>
<p>set hive.merge.smallfiles.avgsize = 134217728;        –128M</p>
<p>set hive.merge.size.per.task = 268435456;             –256M</p>
<p>set hive.optimize.sort.dynamic.partition = true;</p>
<p>set parquet.blocksize= 268435456;                     –256M</p>
<p>set dfs.block.size=268435456;                         –256M</p>
<p><strong> </strong></p>

set hive.exec.compress.output=true;

set hive.exec.parallel = true;

set parquet.compression=snappy;

set hive.merge.mapfiles=true;

set hive.merge.mapredfiles=true;

set hive.merge.smallfiles.avgsize = 134217728; —128M

set hive.merge.size.per.task = 268435456; —256M

set hive.optimize.sort.dynamic.partition = true;

set parquet.blocksize= 268435456; —256M

set dfs.block.size=268435456; —256M

Note: The average size and parquet block sizes specified here are for representation purposes only and should be changed based on the application and needs. Details on the Hive configuration properties can be found on the official Apache Hive page.

There are two ways to do this:

You can run a CREATE TABLE AS SELECT (CTAS) statement to create the target table, as long as the target table is not partitioned, is not external, and is not bucketed.
To overcome those limitations, instead of a direct CTAS, you can run a CREATE TABLE LIKE (CTL) statement to copy the source table schema to create the target table and then use an INSERT OVERWRITE SELECT statement to load the data from the source table to the target table.
Note: you will need to enable non-strict dynamic partition mode in Hive if the data is being inserted without a static partition name defined. This can be done by setting

 hive.exec.dynamic.partition.mode=nonstrict

hive.exec.dynamic.partition.mode=nonstrict

The partition column(s) must be the last column(s) in the select statement for dynamic partitions to work in this context.

Consider the following simplified example:

<br>
create external table target_tbl like source_tbl
<p>stored as parquet</p>
<p>location <hdfs_path>‘;</hdfs_path></p>
<p>set hive.exec.dynamic.partition.mode=nonstrict;</p>
<p>insert overwrite table target_tbl partition (partition_col)</p>
<p>select * from source_tbl;</p>

create external table target_tbl like source_tbl

stored as parquet

location <hdfs_path>‘;

set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table target_tbl partition (partition_col)

select * from source_tbl;

Similar CTAS can be executed in Impala as well, but if the query runs with multiple fragments on different nodes you will get one file per fragment. To avoid this, you could restrict Impala to run the query on a single node using set num_nodes=1 but this approach is not recommended since it removes parallelism and causes slow inserts, degrading the performance, and could cause the daemon to run out of memory if writing a large table.

Additionally, the number of reducers can be configured directly as well using the mapred.reduce.tasks setting. The number of files created will be equal to the number of reducers used. Setting an optimal reducer value depends on the volume of the data being written.

Conclusion

Prevention is better than cure. Hence, it is critical to review application design and catch users in the act of creating small files. Having a reasonable number of small files might be acceptable, but too many of them can be detrimental to your cluster. Eventually leading to irritation, tears, and extended hours at work. Therefore, Happy Cluster, Happy Life!

Have any questions or want to connect with other users? Check out the Cloudera Community

Shashank Naik is a Senior Solutions Consultant at Cloudera.
Bhagya Gummalla is a Solutions Consultant at Cloudera.

↧

AWS Glue crawlers now support existing Data Catalog tables as sources

May 14, 2019, 3:35 am

≫ Next: Flashback Recovery in MariaDB/MySQL Servers

≪ Previous: Small Files, Big Foils: Addressing the Associated Metadata and Application Challenges

Feed: Recent Announcements.

With this release, crawlers can now take existing tables as sources, detect changes to their schema and update the table definitions, and register new partitions as new data becomes available. This is useful if you want to import existing table definitions from an external Apache Hive Metastore into the AWS Glue Data Catalog and use crawlers to keep these tables up-to-date as your data changes. You can also use this feature if you are creating new table definitions using AWS Glue APIs or Apache Hive DDL statements and want to use crawlers to update the tables going forward.

↧

Flashback Recovery in MariaDB/MySQL Servers

May 21, 2019, 7:07 pm

≫ Next: Automated Refactoring of a New York Times Mainframe to AWS with Modern Systems

≪ Previous: AWS Glue crawlers now support existing Data Catalog tables as sources

Feed: Planet MySQL
;
Author: MyDBOPS
;

In this blog, we will see how to do a flashback recovery or rolling back the data in MariaDB, MySQL and Percona.

As we know the saying “All humans make mistakes”, following that in Database environment the data modified accidentally can bring havoc to any organisations.

Recover the lost data

The data can be recovered from the latest full backup or incremental backup when data size is huge it could take hours to restore it.
From backup of Binlogs.
Data can also be recovered from delayed slaves, this case would be helpful when the mistake is found immediately, within the period of delay.

We can use anyone of the above ways or other that can help to recover the lost data, but what really matters is, What is the time taken to rollback or recover the data? and How much downtime was taken to get back to the initial state ?

To overcome this disaster mysqlbinlog (MariaDB 10.2) has a very useful option i.e –flashback that comes along with binary of MariaDB server 10.2 linux ,debian and ubuntu ,though it comes with MariaDB server, it works well with Oracle Mysql servers and Percona flavour of MySQL.

What is Flashback?

Restoring back the data to the previous snapshot in a MySQL database or in a table is called Flashback.

Flashback options help us to undo the executed row changes(DML events).

For instance, it can change DELETE events to INSERTs and vice versa, and also it will swap WHERE and SET parts of the UPDATE events.

Prerequisites for using flashback :

binlog_format = ROW
binlog_row_image = FULL

Flash back uses the mysqlbinlog to create the rollback statements and it needs a FULL image (Minimal is not supported). Let us simulate a few test cases where flashback comes as a boon for recovering data.

For simulating the test cases I am using employees table and Mariadb version 10.2

MariaDB [employees]> select @@version;
+---------------------+
| @@version           |
+---------------------+
| 10.2.23-MariaDB-log |
+---------------------+
1 row in set (0.02 sec)

Table structure :

MariaDB [employees]> show create table employeesG
*************************** 1. row ***************************
       Table: employees
Create Table: CREATE TABLE `employees` (
  `emp_no` int(11) NOT NULL,
  `birth_date` date NOT NULL,
  `first_name` varchar(14) NOT NULL,
  `last_name` varchar(16) NOT NULL,
  `gender` enum('M','F') NOT NULL,
  `hire_date` date NOT NULL,
  PRIMARY KEY (`emp_no`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Case 1: Rollback the Deleted data.

Consider the data is deleted was from employees table where first_name =’Chirstian’ .

MariaDB [employees]> select COUNT(*) from employees where first_name ='Chirstian';
+----------+
| COUNT(*) |
+----------+
|      226 |
+----------+
1 row in set (0.07 sec)

MariaDB [employees]> delete from employees where first_name ='Chirstian';
Query OK, 226 rows affected (0.15 sec)

To revert the data to the previous state ,we need to decode the binlog and fetch the right start and stop position of the delete event happened on employees table. Start position should be taken exactly after BEGIN and Stop position is before the final COMMIT.

[root@vm3 vagrant]# mysqlbinlog -v --base64-output=DECODE-ROWS /var/lib/mysql/mysql-bin.000007 > mysql-bin.000007.txt

BEGIN
/*!*/;
# at 427
# at 501
#190417 17:49:49 server id 1  end_log_pos 501 CRC32 0xc7f1c84b  Annotate_rows:
#Q> delete from employees where first_name ='Chirstian'
#190417 17:49:49 server id 1  end_log_pos 569 CRC32 0x6b1b5c98  Table_map: `employees`.`employees` mapped to number 29
# at 569
#190417 17:49:49 server id 1  end_log_pos 7401 CRC32 0x6795a972         Delete_rows: table id 29 flags: STMT_END_F
### DELETE FROM `employees`.`employees`
### WHERE
###   @1=10004
###   @2='1954:05:01'
###   @3='Chirstian'
###   @4='Koblick'
###   @5=1
###   @6='1986:12:01'
# at 23733
#190417 17:49:49 server id 1  end_log_pos 23764 CRC32 0xf9ed5c3e        Xid = 455
### DELETE FROM `employees`.`employees`
### WHERE
### @1=498513
### @2='1964:10:01'
### @3='Chirstian'
### @4='Mahmud'
### @5=1
### @6='1992:06:03'
# at 7401
COMMIT/*!*/;
# at 23764
#190417 17:49:49 server id 1  end_log_pos 23811 CRC32 0x60dfac86        Rotate to mysql-bin.000008  pos: 4
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;

Once the count is verified the from the taken positions, we can prepare rollback statements as a sql file using flashback as below

[root@vm3 vagrant]# mysqlbinlog  -v --flashback --start-position=427 --stop-position=7401 /var/lib/mysql/mysql-bin.000007  > insert.sql

Below is the comparison of conversion from Delete to Insert for a single record:

### DELETE FROM `employees`.`employees`
### WHERE
### @1=498513
### @2='1964:10:01'
### @3='Chirstian'
### @4='Mahmud'
### @5=1
### @6='1992:06:03'

### INSERT INTO `employees`.`employees`
### SET
### @1=498513
### @2='1964:10:01'
### @3='Chirstian'
### @4='Mahmud'
### @5=1
### @6='1992:06:03'

MariaDB [employees]> source insert.sql
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

And the count is verified after the data load.

MariaDB [employees]> select COUNT(*) from employees where first_name ='Chirstian';
+----------+
| COUNT(*) |
+----------+
|      226 |
+----------+
1 row in set (0.06 sec)

Case 2 : Rollbacking the Updated data.

The data was updated based on below conditions

MariaDB [employees]> select COUNT(*) from employees where first_name ='Chirstian' and gender='M';
+----------+
| COUNT(*) |
+----------+
|      129 |
+----------+
1 row in set (0.14 sec)

MariaDB [employees]> update employees set gender='F' where first_name ='Chirstian' and gender='M';
Query OK, 129 rows affected (0.16 sec)
Rows matched: 129  Changed: 129  Warnings: 0

MariaDB [employees]> select COUNT(*) from employees where first_name ='Chirstian' and gender='M';
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+
1 row in set (0.07 sec)

To rollback the updated data, the same steps to be followed as in case 1.

[root@vm3 vagrant]# mysqlbinlog -v --flashback --start-position=427 --stop-position=8380 /var/lib/mysql/mysql-bin.000008 > update.sql


MariaDB [employees]> source update.sql
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.00 sec)

MariaDB [employees]> select COUNT(*) from employees where first_name ='Chirstian' and gender='M';
+----------+
| COUNT(*) |
+----------+
|      129 |
+----------+
1 row in set (0.06 sec)

In the above two cases by using flashback option we were able to change Event Type statements from DELETE to INSERT and Update_Event statements by Swapping the SET part and WHERE part.

Rollbacking data based on Time

There may be chance where DBA’s get the request from team to rollback the deleted /updated data. In those cases DBA flashback can be used with –start-datetime and –stop-datetime options.

Let us consider we get an information that data was deleted at approximate datetime 2019-05-17 4:15:00 and below query was executed.

MariaDB [employees]> delete from employees where emp_no between 10101 and 10210;
Query OK, 110 rows affected (0.00 sec)

MariaDB [employees]> select count(*) from employees where emp_no between 10101 and 10210;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)

By Analysing available binary logs , we can fetch the required binlog for the provided datetime.

-rw-rw----. 1 mysql mysql 1.1G May 16 03:11 mysql-bin.000025
-rw-rw----. 1 mysql mysql 1.1G May 16 03:36 mysql-bin.000026
-rw-rw----. 1 mysql mysql 1.1G May 16 04:02 mysql-bin.000027
-rw-rw----. 1 mysql mysql 1.1G May 16 05:10 mysql-bin.000028
-rw-rw----. 1 mysql mysql 1.1G May 16 05:32 mysql-bin.000028

To recover the data based on datetime, we should use date and time local timezone.

mysql-bin.000027 is the binlog which is nearest time to 4:15:00

snap from binlog

BEGIN
/*!*/;
# at 662711655
# at 662711736
#190516  4:39:42 server id 1  end_log_pos 662711736 CRC32 0xcd7cd191    Annotate_rows:
#Q> delete from employees where emp_no between 10101 and 10210
#190516  4:39:42 server id 1  end_log_pos 662711804 CRC32 0x77719f68    Table_map: `employees`.`employees` mapped to number 29
# at 662711804
#190516  4:39:42 server id 1  end_log_pos 662714885 CRC32 0xde765d28    Delete_rows: table id 29 flags: STMT_END_F
### DELETE FROM `employees`.`employees`
### WHERE
###   @1=10101
###   @2='1952:04:15'
###   @3='Perla'
###   @4='Heyers'
###   @5=2
###   @6='1992:12:28'
### DELETE FROM `employees`.`employees`
### WHERE
###   @1=10102
###   @2='1959:11:04'
###   @3='Paraskevi'
###   @4='Luby'
###   @5=2
###   @6='1994:01:26'
### DELETE FROM `employees`.`employees`
### WHERE
###   @1=10210
###   @2='1958:01:24'
###   @3='Yuping'
###   @4='Alpin'
###   @5=1
###   @6='1994:05:10'
# at 662714885
#190517  4:39:42 server id 1  end_log_pos 662714916 CRC32 0xf9ba3c0a    Xid = 2860541
COMMIT/*!*/;
# at 662714916

The from the binlog we can see actual delete was happened at 4:39:42

[root@vm3]# mysqlbinlog --flashback --start-datetime="2019-05-17 04:39:00"  /var/lib/mysql/mysql-bin.000027  -v --database=employees --table=employees  > insert.sql

MariaDB [employees]> source insert.sql
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.00 sec)

MariaDB [employees]> select count(*) from employees where emp_no between 10101 and 10210;
+----------+
| count(*) |
+----------+
|      110 |
+----------+
1 row in set (0.04 sec)

Using Flashback in MySQL Community/Percona server.

To check the compatibility of mysqlbinlog tool with Mysql/Percona variants , i have also tested the with MySQL version 5.7.24 .

mysql> select @@version;
+---------------+
| @@version     |
+---------------+
| 5.7.24-27-log |
+---------------+
1 row in set (0.00 sec)

mysql> select count(*) from employees where emp_no between 10001 and 10110;
+----------+
| count(*) |
+----------+
|      110 |
+----------+
1 row in set (0.00 sec)

mysql> delete from employees where emp_no between 10001 and 10110;
Query OK, 110 rows affected (0.00 sec)

Flashback option comes only with Mariadb 10.2 server and above ,so i have copied the binlog which contains my lost transactions to a new server with Mariadb binaries installed.

Mariadb server (binaries) can also be installed in a local system (linux and Debian and Ubuntu) or the binlogs can be copied to pre-installed Mariadb server.

Snap of delete events from MySQL 5.7 binlog

BEGIN
/*!*/;
# at 336
#190506 10:04:25 server id 11  end_log_pos 404 CRC32 0x2b6e0318         Table_map: `employees`.`employees` mapped to number 255
# at 404
#190506 10:04:25 server id 11  end_log_pos 3459 CRC32 0x64cf7a16        Delete_rows: table id 255 flags: STMT_END_F
### DELETE FROM `employees`.`employees`
### WHERE
###   @1=10001
###   @2='1953:09:02'
###   @3='Georgi'
###   @4='Facello'
###   @5=1
###   @6='1986:06:26'
### DELETE FROM `employees`.`employees`
### WHERE
###   @1=10002
###   @2='1964:06:02'
###   @3='Bezalel'
###   @4='Simmel'
###   @5=2
###   @6='1985:11:21'

mysqlbinlog --flashback -v --start-position=336 --stop-position=3459 mysql-bin.000024 > insert_57.sql

mysql> source insert_57.sql
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.00 sec)

mysql> select count(*) from employees where emp_no between 10001 and 10110;
+----------+
| count(*) |
+----------+
|      110 |
+----------+
1 row in set (0.00 sec)

mysql> select @@version;
+---------------+ 
| @@version     | 
+---------------+ 
| 5.7.24-27-log |
+---------------+ 
1 row in set (0.00 sec)

The MariaDB flashback option works fine with MySQL variants too.

Flashback with MySQL GTID

The flashback works fine with the conversion of delete into insert statements in GTID, but it fails to execute the when the data is loaded.

Snap of binlog from GTID enabled MySQL server

SET @@SESSION.GTID_NEXT= '7d60dbfb-29dd-11e9-a71a-080027dfb17a:64'/*!*/;
# at 259
#190506 11:36:29 server id 11  end_log_pos 336 CRC32 0xaa55b2c9         Query   thread_id=29022 exec_time=0     error_code=0
SET TIMESTAMP=1557142589/*!*/;
SET @@session.pseudo_thread_id=29022/*!*/;
SET @@session.sql_mode=1436549152/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!C latin1 *//*!*/;
SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.collation_server=8/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 336
#190506 11:36:29 server id 11  end_log_pos 404 CRC32 0x8e10c960         Table_map: `employees`.`employees` mapped to number 255
# at 404
#190506 11:36:29 server id 11  end_log_pos 3459 CRC32 0x02d82af3        Delete_rows: table id 255 flags: STMT_END_F
### DELETE FROM `employees`.`employees`
### WHERE
###   @1=10001
###   @2='1953:09:02'
###   @3='Georgi'
###   @4='Facello'
###   @5=1
###   @6='1986:06:26'

mysqlbinlog --flashback -v --start-position=336 --stop-position=3459 mysql-bin.000029  > gtid.sql

mysql> source gtid.sql
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON.
ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON.
ERROR 1782 (HY000): @@SESSION.GTID_NEXT cannot be set to ANONYMOUS when @@GLOBAL.GTID_MODE = ON.
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

In this case, we need to disable the GTID mode and restore data,ta , which in painful for production servers.

Limitations of Flashback

It Doesn’t support DDL ( DROP/TRUNCATE or other DDL’s)
It Doesn’t support encrypted binlog
It Doesn’t support compressed binlog

Use cases of flashback

When we are not aware of exact start and stop positions , use –start-datetime and –stop-datetime
Also, we can transform the changes for a particular database or a table using the options –database(-d) and table (-T)
For more Binlog options refer Link

Key Takeaways:

To reverse the mishandled operations from binary logs.
No need to stop the server to carry out this operation.
When the data is small to revert back, flashback process is very faster than recovering the data from Full Backup.
Point in time recovery (PITR) becomes easy.

There is a similar open source tool called binlog2sql developed by US team review DBA team (Shanghai) , which works on the same principle of mysqlbinlog flashback.

This tool can also be used based on convienient and usecase.

Photo by Jiyeon Park on Unsplash

Automated Refactoring of a New York Times Mainframe to AWS with Modern Systems

May 22, 2019, 10:08 am

≫ Next: Percona Server for MySQL 5.7.26-29 Is Now Available

≪ Previous: Flashback Recovery in MariaDB/MySQL Servers

Feed: AWS Partner Network (APN) Blog.
Author: Phil de Valence.

By Michael Buzzetti, Principal Engineer at The New York Times
By Barry Tait, Director of Modernization and Cloud Strategies at Modern Systems
By Phil de Valence, Solutions Architect for Mainframe Modernization at AWS

The New York Times had a critical business workload running on a mainframe as the core IT system supporting the daily Home Delivery Platform of its newspaper.

They collaborated with Modern Systems, an AWS Partner Network (APN) Select Technology Partner, to successfully transform their legacy COBOL-based application into a modern Java-based application, which today runs on Amazon Web Services (AWS).

Using innovative automated refactoring, the application was modernized to object-oriented code, and the data was transformed from legacy indexed-files to a relational database.

This post describes the project, including the automated refactoring process and AWS architecture, as well as key lessons learned, business outcomes, and future technology plans.

The New York Times Context and Objectives

The New York Times is an American newspaper with worldwide influence and readership. Founded in 1851, the paper has won 125 Pulitzer Prizes, more than any other newspaper. The Times is ranked 17th in the world by circulation and second in the United States.

The company’s core-business application managed daily home delivery of the newspaper since 1979, supporting a line of business worth more than $500 million annually. It represented years of accumulated experience and knowledge, and yet it significantly resisted modification and evolution.

In addition, the IBM Z mainframe running the z/OS operating system was expensive to operate in comparison to more modern platforms that had evolved at the company. It needed modernization to reduce operating costs and enable the convergence of the Digital Platform with the Home Delivery Platform.

An attempt to manually rewrite the home delivery application between 2006 and 2009 failed. In 2015, an evaluation of alternate approaches determined that a second attempt at redeveloping the application would have been much more expensive, and an alternative emulator re-hosting would have continued to lock-up data in proprietary technology.

With mounting pressure to quickly lower costs, the chosen strategy was to migrate code and data with automated refactoring. This approach promised functional equivalence, lower operational cost, and easier integration with modern technologies.

Source Mainframe Workload

The mainframe application, named CIS on the mainframe and rebranded to Aristo after the migration, executed business-critical functionality such as billing, invoicing, customer accounts, delivery routing, product catalog, pricing, and financial reporting.

CIS was a z/OS-based CICS/COBOL application with a BMS-based 3270 interface accessing VSAM KSDS business data. Batch processing was supported by JCL jobs with CA7 for job scheduling.

In 2015, CIS had grown to more than two million lines of COBOL code, 600 batch jobs, and 3,500 files sent daily to downstream consumers and systems. It consumed around 3 TB of hot data made up of 2 TB of VSAM files, and 1 TB of QSAM sequential files. It used 20 TB of backup cold storage.

Automated Conversion with Refactoring

The New York Times selected an automated conversion approach, which retains functional equivalence and critical business logic while converting core legacy applications to maintainable, refactored object-oriented Java. The code is analyzed during the assessment to determine cloud-readiness and the required effort to obtain the desired level of elasticity (i.e. horizontal scalability and vertical scalability) required by the application workloads.

Modern Systems’ COBOL-to-Universal Solution (CTU) software solution supports typical mainframe-based COBOL application components, including CICS, JCL, and common utilities, such as IDCAMS and SORT, in addition to data stores like DB2 for z/OS, IMS, VSAM, and IDMS databases.

The New York Times used CTU and followed an eight-step methodology:

Figure 1 – Automated Refactoring steps.

Step 1 performs an automated inventory of the mainframe and populates a repository of components to be migrated.
Step 2 consists of a detailed analysis of the applications, data model, architecture preferences, coding styles, database connections, error handling, and refactoring options. All of this leads to the definition of how to piecemeal the code transformation with work packets and the overall test strategy.
In Step 3, for each work packet, the data model is defined and created in the target database.
Step 4 automatically generates programs and processes for unloading, transforming, validating, and loading of data from the source data store to the target database.
In Step 5, Modern Systems’ CTU is used to reverse-engineer the COBOL code into an intermediate language, and then to forward-engineer the target Java code.
Step 6 performs regression tests for each work packet, making sure there is functional equivalence between the source mainframe programs and the new Java code.
Step 7 is the user acceptance test execution process.
In Step 8, once these tests are successful, the cutover to production takes place.

Using Modern Systems’ CTU, the resulting Java application became object-oriented and separated into three layers: presentation logic, business logic, and data access.

To maintain the same user experience, CICS BMS Maps were migrated to equivalent web pages which mimic the original 3270 screens as closely as possible. Each COBOL program was converted to a Java class, and JCL was converted to JSR-352 XML using the Spring Batch runtime for Java. VSAM KSDS files were migrated to a relational Oracle database. During the refactoring process, all VSAM records were analyzed and a DDL generated for each of the best layouts chosen.

The table in Figure 2 shows the technology mapping between the legacy mainframe stack and the target AWS stack.

Figure 2 – Source and target technology mapping.

Replacement components were developed in situations where legacy application dependencies were not supported by the Modern Systems toolset (e.g. REXX, GVEXPORT), when off-the-shelf software packages were not available as a substitute, or if it made more sense to make use of capabilities of the new environment (vendor database backups and restore points, file system snapshots, etc.).

Functional Equivalence Testing

It was critical to have functional equivalence between the Java application and the COBOL application. Component groups assembled related components that would be runnable and testable together as a single entity through existing externally accessible interfaces, such as web services, user interface screens, database tables, and files.

Testing accounted for approximately 70-80 percent of the time spent on the project. The testing process was broken down into stages, with each stage progressively increasing in scope and level of difficulty in isolating the root-cause of test failures.

Stage 1 Pre-Delivery Test: Performed by Modern Systems prior to delivering the refactored code.
Stage 2 Data Migration Validation: Verified the data is the same between the source VSAM and the target relational database.
Stage 3 Component Group Test: Verified the functional behavior with one batch job, one or more screen, one Web Service call.
Stage 4 Batch Process Regression Test: Using static test data (the exact same test data every day) to verify the end-to-end batch process and perform regression testing.
Stage 5 Batch Process Comparison Test: Using dynamic test data to verify the end-to-end batch with the current day data, and comparing the output between legacy and modernized systems.
Stage 6 Batch Process Performance Test: Using production data.
Stage 7 System Integration: This was done in collaboration with other teams and systems within the organization. New transactions are entered via client systems, processed by the batch, and flow to downstream consumers via reports and file feeds.

For these stages, test coverage needed to be high. It can be very time consuming and complex to create test cases, especially for batch jobs. Automation was critical to launch and analyze test cases repeatedly and rapidly.

Target AWS Architecture

As the project progressed, The New York Times pivoted on its overall data center strategy to make cloud the preferred deployment environment. After less than a year of running in a private data center, Aristo was migrated to AWS. The team had gained significant knowledge of what a successful migration looked like, enabling a migration to AWS with minimal impact to the business.

Figure 3 – Target Aristo AWS architecture components.

As shown in Figure 3, once migrated to AWS, the system was broken up into four main components:

Front End system provides internal operators the ability to manage home delivery subscriptions.
API system provides SOAP web services to other systems.
Reporting system builds reports for finance department.
Batch is the main system where all of the nightly jobs execute. The jobs can run on any instance and the source and destination of the job data is stored on Amazon Elastic File System (Amazon EFS).

Both the API and Front End systems are within Auto Scaling Groups, providing the ability to respond to a large number of requests. In steady state, the API system uses four m5.xlarge instances and the Front End system uses two m5.xlarge instances. Reporting also uses two m5.xlarge instances. The Batch system is much larger with three m5.4xlarge instances due to the heavy computation needed to run the jobs.

In order to speed up delivery of releases, a Continuous Integration and Continuous Delivery (CI/CD) pipeline was created including:

Gradle for building and packaging artifacts.
Artifactory for storage and promotion of artifacts.
Jenkins for deployment and orchestration.
Ansible for configuration management.
Hashicorp Vault for secrets management.

Migration Timeline

The transformation of the application powering the Home Delivery Platform began in 2015. It was a two-phase process.

Automated Refactoring

This phase lasted around two years and included both the COBOL-to-Java transformation as well as the VSAM-to-relational database conversion, resulting in the Aristo application being launched in production on-premises.

Around the end of this phase, The New York Times announced its cloud strategy impacting the future of Aristo platform and triggering the next phase.

AWS Migration and Enhancements

Once the application was tested and stabilized, the work began in August 2017 to move the application to the AWS Cloud. This was an eight-month project, which also included the following changes:

From Oracle RAC to Oracle EE.
From Isilon to EFS.
Upgraded Control-M from version 7 to version 8.
Upgraded from FTP to SFTP/S3.
Rebuilt CI/CD pipeline (from Puppet to Ansible).

Figure 4 – Mainframe to AWS migration timeline.

Once in production on AWS in March 2018, Aristo benefited from maintenance and enhancements including promotion code table expansion, premium Home Delivery (HD) with new digital and paper offerings, and AWS cost optimizations.

Looking ahead, The New York Times is focused on these future improvements:

Breaking down the application monolith into microservices.
Continuing convergence of the digital subscription platform capabilities, including payments, product catalog, customer accounts, financial accounting.
Developing new Java-based services alongside the converted code.
Easing access to business data.
Increasing the use of cloud-native technologies and AWS managed services.
AWS Backup for Amazon EFS volumes.

Lessons Learned

The most significant lesson learned in the project was around testing, which ended up being the most time-consuming and underestimated part of the project by far (70-80 percent of the time).

Test cases need to be granular enough and automated.

With the mainframe in operation for more than 35 years, the COBOL application had accumulated a fair amount of obsolete code due to a lack of adequate maintenance. It’s a best practice to identify this code and remove it, which reduces the amount of refactoring and testing work to do.

Mainframes are typically backend processing systems for other servers. Aristo generated more than 3,500 data file feeds and reports for downstream consumers daily. Having a good inventory of all the consumers and interfaces facilitates the modernization and harmonization of the communications.

For Java code maintenance or new feature developments, it’s a good practice to cross-train existing mainframe COBOL developers with Java skills. This allows them to provide both functional insight and knowledge about specific coding standards for the application, such as naming conventions or overall code structure.

Gaining a deep application understanding during the analysis and planning phase is important in order to define work packets, which are about the same size and complexity and allow reusing learnings for later packets. For example, it’s good to start with migrating the batch jobs that are often of high complexity and demanding.

If The New York Times had its cloud strategy already in place before starting the mainframe migration, the company would have chosen to migrate the mainframe directly to AWS, avoiding the extra work for designing and implementing the on-premises Aristo deployment.

Project Benefits for The New York Times

While the modernization project began as a cost-cutting exercise, ultimately The New York Times took methodical and incremental steps toward more cutting edge technology adoption. All of this was done in an effort to improve customer service and gain competitive advantage in a unique industry that has seen significant market dynamic shifts since the project first began in 2015.

This project allowed convergence on a common technology stack (Java and Oracle on AWS) joining the Digital Subscription Platform that is now run, built, and maintained by the same Subscription Platforms group within The New York Times Technology organization.

The team is accelerating how it builds software on the new platform by adopting an agile methodology and CI/CD pipeline. In addition, there is now easier access to data gaining business and technology insights, and more rapid use of cloud-native technologies.

Aristo went live on August 28, 2017. During the first year, it has billed over half a billion dollars in subscription revenue, processed nearly 6.5 million transactions, and continued to route the daily paper to The New York Times’ home delivery subscribers across the United States.

Remarkably, Aristo today costs 70 percent less to operate per year than it did to run on the mainframe in 2015, giving The New York Times a significant cost savings.
.

Modern Systems – APN Partner Spotlight

Modern Systems is an APN Select Technology Partner. They are a legacy modernization company with proven expertise across all areas of legacy code and data migration, infrastructure, operations, monitoring, and maintenance—both during and after the transition.

Contact Modern Systems | Solution Overview | Buy on Marketplace

*Already worked with Modern Systems? Rate this Partner

*To review an APN Partner, you must be an AWS customer that has worked with them directly on a project.

↧

Percona Server for MySQL 5.7.26-29 Is Now Available

May 27, 2019, 9:42 am

≫ Next: Michael Paquier: Postgres 12 highlight – Table Access Methods and blackholes

≪ Previous: Automated Refactoring of a New York Times Mainframe to AWS with Modern Systems

Feed: Planet MySQL
;
Author: MySQL Performance Blog
;

Percona announces the release of Percona Server for MySQL 5.7.26-29 on May 27, 2019 (downloads are available here and from the Percona Software Repositories). This release merges changes of MySQL 5.7.26, including all the bug fixes in it. Percona Server for MySQL 5.7.26-29 is now the current GA release in the 5.7 series. All of Percona’s software is open-source and free.

New Features:

Bug Fixes:

TokuDB storage engine would assert on load when used with jemalloc 5.x. Bug fixed #5406.
a read-write workload on compressed InnoDB tables could cause an assertion error. Bug fixed #3581.
using TokuDB or MyRocks native partitioning and index_merge access method could lead to a server crash. Bugs fixed #5206, #5562.
a stack buffer overrun could happen if the redo log encryption with key rotation was enabled. Bug fixed #5305.
TokuDB and MyRocks native partitioning handler objects were allocated from a wrong memory allocator. Memory was released only on shutdown and concurrent access to global memory allocator caused memory corruptions and therefore crashes. Bugs fixed #5508, #5525.
enabling redo log encryption resulted in redo log being written unencrypted. Bug fixed #5547.
if there are multiple row versions in InnoDB, reading one row from PK may have O(N) complexity and reading from secondary keys may have O(N^2) complexity. Bugs fixed #4712, #5450 (upstream #84958).
setting the log_slow_verbosity to include innodb value and enabling the slow_query_log could lead to a server crash. Bug fixed #4933.
the page cleaner could sleep for a long time when the system clock was adjusted to an earlier point in time. Bug fixed #5221 (upstream #93708).
executing SHOW BINLOG EVENT from an invalid position could result in a segmentation fault on 32bit machines. Bug fixed #5243.
BLOB entries in the binary log could become corrupted in a case when a database with Blackhole tables served as an intermediate binary log server in a replication chain. Bug fixed #5353 (upstream #93917).
when Audit Log Plugin was enabled, the server could use a lot of memory when handling large queries. Bug fixed #5395.
XtraDB changed page tracking was missing pages changed by the in-place DDL. Bug fixed #5447.
innodb_encrypt_tables variable accepted FORCE option only inside quotes as a string. Bug fixed #5538.
enabling redo log encryption and XtraDB changed page tracking together would result in the error log flooded with decryption errors. Bug fixed #5541.
system keyring keys initialization wasn’t thread safe. Bugs fixed #5554.
when using the Docker image, if the root passwords set in the mounted .cnf configuration file and the one specified with MYSQL_ROOT_PASSWORD option are different, password from the MYSQL_ROOT_PASSWORD option will be used. Bug fixed #5573.
long running ALTER TABLE ADD INDEX could cause a semaphore wait > 600 assertion. Bug fixed #3410 (upstream #82940).

Other bugs fixed: #5537, #5007 (upstream #93164), #5018, #5561, #5570, #5578, #5610, #5441, and #5442.

This release also contains the fixes for the following security issues: CVE-2019-2632, CVE-2019-1559, CVE-2019-2628, CVE-2019-2581, CVE-2019-2683, CVE-2019-2592, CVE-2019-262, and CVE-2019-2614.

Find the release notes for Percona Server for MySQL 5.7.26-29 in our online documentation. Report bugs in the Jira bug tracker.

↧

Michael Paquier: Postgres 12 highlight – Table Access Methods and blackholes

June 3, 2019, 10:16 pm

≫ Next: Exposing MyRocks Internals Via System Variables: Part 7, Use Case Considerations

≪ Previous: Percona Server for MySQL 5.7.26-29 Is Now Available

Feed: Planet PostgreSQL.

Postgres is very nice when it comes to extending with custom plugins, with
many set of facilities available, like:

After a heavy refactoring of the code, Postgres 12 ships with a basic
infrastructure for
table access methods
which allows to customize how table data is stored and accessed. By default,
all tables in PostgreSQL use the historical heap, which works on a page-based
method of 8kB present in segment files of 1GB (default sizes), with full
tuple versions stored. This means, in simple words, that even updating one
attribute of a tuple requires storing a full new version. This makes the
work related to vacuum and autovacuum more costly as well. Well, the goal
of this post is not to discuss about that, and there is
documentation
on the matter. So please feel free to refer to it.

Table access methods are really cool, because they basically allow to plugin
directly into Postgres a kind of equivalent to MySQL storage engines, making
it possible to implement things like columnar storage, which is something
where heap is weak at. It is possible to roughly classify what is possible
to do into two categories:

Access method going through the storage manager of Postgres, which makes
use of the existing shared buffer layer, with the exiting paging format.
This has two advantages: backups and checksums are normally, and mostly,
automatically supported.
Access method not going through Postgres, which has the advantage to not
rely on Postgres shared buffers (page format can be a problem as well),
making it possible to rely fully on the OS cache. Note that it is then
up to you to add support for checksumming, backups, and such.

Access methods could make a comparison with foreign data wrappers, but the
reliability is much different, one big point being that they are fully
transactional with the backend they work with, which is usually a big deal
for applications, and have transparent DDL and command support (if
implemented in the AM).

Last week at PGCon in Ottawa, there were two talks on the matter by:

The presentation slides are attached directly on those links, and these will
give you more details about the feature. Note that there have been recent
discussions with new AMs, like zheap or zstore (names beginning by ‘z’
because that’s a cool letter to use in a name). It is also limiting to not
have pluggable WAL (generic WAL can be used but that’s limited and not
performance-wise), but this problem is rather hard to tackle as contrary
to table AMs, WAL require registering callbacks out of system catalogs, and
resource manager IDs (understand a category of WAL records) need to have hard
values. Note that TIDs may also become of problem depending on the AM.

There is a large set of callbacks defining what a table AM is (42 as of when
writing this post), and the interface may change in the future, still this
version provides a very nice first cut.

On the flight back from Ottawa, I took a couple of hours to look at this
set of callbacks and implemented a template for table access methods called
blackhole_am.
This AM is mainly here as a base for creating a new plugin, and it has the
property to send to the void any data on a table making use of it. Note that
creating a table access method requires
CREATE ACCESS METHOD,
which is embedded directly in an extension here:

=# CREATE EXTENSION blackhole_am;
CREATE EXTENSION
=# dx+ blackhole_am
   Objects in extension "blackhole_am"
           Object description
-----------------------------------------
 access method blackhole_am
 function blackhole_am_handler(internal)
(2 rows)

Then a table can be defined to use it, throwing away any data:

=# CREATE TABLE blackhole_tab (id int) USING blackhole_am;
CREATE TABLE
=# INSERT INTO blackhole_tab VALUES (generate_series(1,100));
INSERT 0 100
=# SELECT * FROM blackhole_tab;
 id
----
(0 rows)

Note that there is a parameter controlling the default table access
method, called default_table_access_method, enforcing the value of
the USING clause to it. “heap” is the default. This feature opens a
lot of doors and possibilities, so have fun with it.

↧

Exposing MyRocks Internals Via System Variables: Part 7, Use Case Considerations

June 4, 2019, 6:00 am

≫ Next: DDL Queries on Foreign Key Columns in MySQL/PXC

≪ Previous: Michael Paquier: Postgres 12 highlight – Table Access Methods and blackholes

Feed: Planet MySQL
;
Author: The Pythian Group
;

(In the previous post, Part 6, we covered Replication.)

In this final blog post, we conclude our series of exploring MyRocks by taking a look at use case considerations. After all, having knowledge of how an engine works is really only applicable if you feel like you’re in a good position to use it.

Advantages of MyRocks

Let’s start by talking about some of the advantages of MyRocks.

Compression

MyRocks will typically do a good job of reducing the physical footprint of your data. As I mentioned in my previous post in this series about compression, you have the ability to configure compression down to the individual compaction layers for each column family. You also get the advantage of the fact that data isn’t updated once it’s written to disk. Compaction, which was also previously covered, takes data changes and merges them together. The result of this is less free space within data pages, meaning that the size of the data file on disk is a better representation of the size of the data set.

You can read more about this on this page of the MyRocks wiki.

I noted in my previous post on compression that a common mistake that I’ve seen in the past is the adoption of technologies or engines where there is an ability to reduce your storage footprint, particularly in moments where you may be running out of disk space. Fight that urge and consider all the facts in regard to the engine before adopting.

Write-Optimized

If your use case is write-heavy, then MyRocks may be a good solution for you given the fact that it writes data in a log structure based on data changes and then relies on compaction to clean up the data after the fact as it makes its way through compaction layers, thus creating deferred write amplification. This removes a lot of random reads and writes that would be required for storage engines that are reliant on b-tree.

You can read more about write optimization with MyRocks on this page of code.fb.com.

You can even take this a step further by saying that MyRocks is write-optimized at the cluster level considering it supports read free replication.

Better performance when the active data set doesn’t fit in memory

There are benchmark tests that have been performed that show that MyRocks can outperform InnoDB in the case when the active data set (the data most commonly accessed) does not fit entirely in memory. For InnoDB, this would mean that there is data that is frequently accessed that cannot all fit in the buffer pool. For MyRocks, we know that different caches are used for writes and reads, but for all intents and purposes we can say that the case would be when there is data that is frequently accessed that doesn’t fit in the block cache.

You can read more about this by checking out this blog post by Vadim Tkachenko on the Percona Blog.

Backups

When I originally drafted this installment of the blog series, I actually had backups listed as a drawback. The reason is that while you were able to do hot backups of MyRocks datasets using myrocks_hotbackup, you would not be able to use that tool to backup any data that was stored using any other storage engine. This was a common problem with TokuDB enabled instances and thus forced us to fall back to snapshot-based backups when we came across these systems.

However, on May 9th, 2019, Percona announced in their 8.0.6 release of xtrabackup that MyRocks would be supported in the product, allowing you to create backups for systems that used both InnoDB and MyRocks. This is especially important considering that in MySQL 8.0, the data dictionary is now stored in database tables within MySQL, replacing the existing .frm method, and those tables are stored using InnoDB.

Drawbacks of MyRocks

Now let’s cover some of the drawbacks and limitations of MyRocks

Range Lookups

MyRocks gives you some fantastic advantages when it comes to reads in your database that are filtered based on const operators like ‘IN’ and ‘=’ given that it can use bloom filters to increase the velocity of getting that data to disk by invalidating data files without needing to access them beyond reading the filter blocks if they aren’t already in memory.

This changes when you need to look for data in ranges. There is no way to take advantage of standard bloom filters for this, and you may need to spend a lot of time decompressing data and loading it into memory to find what you need. This issue is exacerbated in the case that you need to do full table scans given that you will need to access multiple versions of the table data across the various compaction layers and then take on the overhead of the system having to determine the latest version of the data for each record.

That’s not to say that range lookups in your use case would completely invalidate MyRocks as a potential solution for you. There are optimizations of RocksDB that allow for more efficient range lookups, and even prefix bloom filters that can be beneficial. You can read more about this by reading this blog post on why RocksDB was selected as the underlying storage engine for CockroachDB.

No Online DDL

Currently MyRocks does not support Online DDL, whereas InnoDB has supported this to some degree for quite some time.

You can read more about this limitation by checking out this bug in the MyRocks github project.

No Foreign Keys

Foreign keys are currently not supported in MyRocks. To be completely candid though, this isn’t a deal breaker for me. When I was a database developer and schema designer, I was a big-time supporter of foreign keys because I’m a control freak and don’t like developers messing with the reliability of my data relationships.

Now that I’ve been on the DBA side for a while, I can see the overhead that foreign keys create. They offer that lovely peace of mind when it comes to data relationships, but at a noteworthy cost. Even more so when you apply them to something like Galera.

Given the overhead cost of foreign keys, I’m seeing more and more cases where they are being removed unless data relationships are incredibly critical. You may find that you can live without them completely and, as such, you may not be dissuaded from using MyRocks.

No Transportable Tablespaces

Having the ability to physically move your data at the table level without having to do a logical export / import doesn’t really seem like a big deal until you need it.

This is going to be a difficult challenge to overcome considering how you have multiple tables in a single column family, plus the complexities that come with having multiple compaction layers. I have a feeling that a lack of transportable tablespaces with MyRocks is likely something we’re going to have to deal with for a long time.

Select For Update Not Supported With Repeatable Read Isolation Level

“Select for update” is not available with repeatable read, but is available with read-committed. This is important as you need to understand how this will impact your explicit exclusive lock requests.

So when do I consider it?

I would suggest considering MyRocks if you’re working with….

A large OLTP dataset where your active data set doesn’t fit into memory
Write-intensive workloads
High concurrency reads that don’t filter on range

Conclusion

In this blog series, we took a look at MyRocks internals in an attempt to understand how they work at a high level as well as understanding the variables and metrics that are associated with them. We also covered the strengths and weaknesses that you should take into mind when considering this technology.

MyRocks has grabbed the attention of the community in a big way in the last couple of years but the debate will go on as to whether or not it’s ready for the big leagues, or if it will be an InnoDB killer, or if it will fizzle out entirely within the context of MySQL, etc. The important thing to know is that there has been a focus on this engine, it’s something you should be aware of, and you now have an option to include log structures merge indexes in your data ecosystem without having to stray too far away from the SQL language you’re already comfortable with.

Keep an eye out for use cases in your environment. Keep the limitations in mind. Don’t adopt just because it compresses your data. Above all else, be sure to test, test, and then test some more before a production implementation.

Thank you very much for taking the time to read this blog series. There was a lot of time and effort that went into the research and lab work that supported what has been written here so I can only hope that it will be useful to you as you continue your exploration of log-structured merge capabilities that are now available for your use.

Also, I would like to add a big thank you to my colleague Sandeep Varupula here at Pythian and George Lorch of Percona for helping fact check the installments in this series.

In case you missed the previous posts, here they are:

Part 1: Data Writing

Part 2: Data Flushing

Part 3: Compaction

Part 4: Compression and Bloom Filters

Part 5: Data Reads

Part 6: Replication

Interested in working with Peter? Schedule a tech call.

↧

DDL Queries on Foreign Key Columns in MySQL/PXC

June 4, 2019, 7:44 am

≫ Next: How pt-online-schema-change Handles Foreign Keys

≪ Previous: Exposing MyRocks Internals Via System Variables: Part 7, Use Case Considerations

Feed: Planet MySQL
;
Author: MySQL Performance Blog
;

Recently, I received a support request where the customer wanted to convert an INTEGER column to BIGINT on two tables. These tables are related by a foreign key, and it is a 3 node PXC cluster. These tables are 20GB and 82 GB in size and DDL’s on such tables in a production environment is always a challenge. We have options like direct ALTER or using pt-online-schema-change to get this done, but this is a very special case where none of them will be able to do this DDL. To demonstrate why this is so, please follow the table schema and the example below about DDL queries on foreign key columns. In the end, I will discuss an easy workaround, too.

Please take a look at table schema before reading further. In the below schema, the product_catalog_id column from the product_details table refers to catalog_id column from the product_catalog table. Both these fields are INT(11) and the customer wanted to convert them to BIGINT:

<br>
mysql> show create table product_catalog G<br>
*************************** 1. row ***************************<br>
       Table: product_catalog<br>
Create Table: CREATE TABLE `product_catalog` (<br>
  `catalog_id` int(11) unsigned NOT NULL AUTO_INCREMENT,<br>
  …..<br>
  PRIMARY KEY (`catalog_id`),<br>
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci<br>
1 row in set (0.00 sec)<br>
mysql> show create table product_details G<br>
*************************** 1. row ***************************<br>
       Table: product_details<br>
Create Table: CREATE TABLE `product_details` (<br>
  `product_id` int(11) NOT NULL AUTO_INCREMENT,<br>
  `product_catalog_id` int(11) unsigned NOT NULL,<br>
  …..<br>
  PRIMARY KEY (`product_id`),<br>
  KEY `fk_audit_detail_audit_header_idx` (`product_catalog_id`),<br>
  CONSTRAINT `product_catalog_ibfk_1` FOREIGN KEY (`product_catalog_id`) REFERENCES `product_catalog` (`catalog_id`) ON DELETE NO ACTION ON UPDATE NO ACTION<br>
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci<br>
1 row in set (0.00 sec)

mysql> show create table product_catalog G

*************************** 1. row ***************************

Table: product_catalog

Create Table: CREATE TABLE `product_catalog` (

`catalog_id` int(11) unsigned NOT NULL AUTO_INCREMENT,

PRIMARY KEY (`catalog_id`),

) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

1 row in set (0.00 sec)

mysql> show create table product_details G

*************************** 1. row ***************************

Table: product_details

Create Table: CREATE TABLE `product_details` (

`product_id` int(11) NOT NULL AUTO_INCREMENT,

`product_catalog_id` int(11) unsigned NOT NULL,

PRIMARY KEY (`product_id`),

KEY `fk_audit_detail_audit_header_idx` (`product_catalog_id`),

CONSTRAINT `product_catalog_ibfk_1` FOREIGN KEY (`product_catalog_id`) REFERENCES `product_catalog` (`catalog_id`) ON DELETE NO ACTION ON UPDATE NO ACTION

) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

1 row in set (0.00 sec)

Changing a column from INT to BIGINT is an ONLINE ALTER. But in this case, it is not an ONLINE ALTER as the primary key is defined in this column. I started with a direct ALTER and later with pt-online-schema-change. Let’s have a look at how these tools react to this DDL change.

Direct Alter:

Considering the tables are close to 100GB in size together, a direct ALTER is not a good choice especially with a PXC cluster, and also on a standard deployment, it would block queries on metadata lock. But let’s see how the direct ALTER reacts here. I will first alter the child table and then the parent table.

<br>
mysql> ALTER TABLE product_details MODIfY COLUMN product_catalog_id BIGINT UNSIGNED NOT NULL  ;<br>
ERROR 1832 (HY000): Cannot change column ‘product_catalog_id’: used in a foreign key constraint ‘product_details_ibfk_1’

mysql> ALTER TABLE product_details MODIfY COLUMN product_catalog_id BIGINT UNSIGNED NOT NULL ;

ERROR 1832 (HY000): Cannot change column ‘product_catalog_id’: used in a foreign key constraint ‘product_details_ibfk_1’

It failed with Error 1832. Let’s try changing the column in the parent table first.

<br>
mysql> ALTER TABLE product_catalog MODIfY COLUMN catalog_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT  ;<br>
ERROR 1833 (HY000): Cannot change column ‘catalog_id’: used in a foreign key constraint ‘product_details_ibfk_1’ of table ‘DB255525.product_details’

mysql> ALTER TABLE product_catalog MODIfY COLUMN catalog_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT ;

ERROR 1833 (HY000): Cannot change column ‘catalog_id’: used in a foreign key constraint ‘product_details_ibfk_1’ of table ‘DB255525.product_details’

This time, it is Error 1833. Both these errors state that it cannot change the column involved in a foreign key. The reason is that a foreign key is defined only between two identical columns. Changing any of those column data types would result in an error.

pt-online-schema-change:

It is always recommended to use pt-online-schema-change for DDL’s in PXC cluster deployments, provided additional disk space is available. You can refer to this blog post to know when to use this tool. pt-osc works by creating a new table with the required change in place, and copies data to the new table. The challenge comes in when there are child tables referring some column in this parent table. Presence of foreign keys complicates the job of pt-osc.

There are two ways the tool handles the foreign key constraints on the child table when the parent table is renamed as part of pt-osc. Below explains each of those cases.

alter-foreign-keys-method=auto

<br>
# pt-online-schema-change –user=root –password=root –host=172.23.0.2 –alter-foreign-keys-method=auto –alter “MODIFY COLUMN product_catalog_id BIGINT UNSIGNED NOT NULL” D=DB1,t=product_details –execute<br>
….<br>
….<br>
2019-05-28T09:19:16 Dropping new table…<br>
2019-05-28T09:19:16 Dropped new table OK.<br>
`DB255525`.`product_details` was not altered.<br>
Error altering new table `DB1`.`_product_details_new`: DBD::mysql::db do failed: Cannot change column ‘product_catalog_id’: used in a foreign key constraint ‘_product_details_ibfk_1’ [for Statement “ALTER TABLE `DB255525`.`_product_details_new` MODIFY COLUMN product_catalog_id BIGINT UNSIGNED NOT NULL”] at /usr/bin/pt-online-schema-change line 9271.

# pt-online-schema-change –user=root –password=root –host=172.23.0.2 –alter-foreign-keys-method=auto –alter “MODIFY COLUMN product_catalog_id BIGINT UNSIGNED NOT NULL” D=DB1,t=product_details –execute

2019–05–28T09:19:16 Dropping new table...

2019–05–28T09:19:16 Dropped new table OK.

`DB255525`.`product_details` was not altered.

Error altering new table `DB1`.`_product_details_new`: DBD::mysql::db do failed: Cannot change column ‘product_catalog_id’: used in a foreign key constraint ‘_product_details_ibfk_1’ [for Statement “ALTER TABLE `DB255525`.`_product_details_new` MODIFY COLUMN product_catalog_id BIGINT UNSIGNED NOT NULL”] at /usr/bin/pt–online–schema–change line 9271.

It failed while renaming the table _product_details_new to product_details. The reason is that it would have BIGINT datatype in child table and INT datatype in the parent table if the rename succeeds – which is not allowed in MySQL. Foreign keys between different data types are not allowed.

alter-foreign-keys-method=rebuild_constraints

<br>
# pt-online-schema-change –user=root –password=root –host=172.23.0.2 –alter-foreign-keys-method=rebuild_constraints –alter “MODIFY COLUMN product_catalog_id BIGINT UNSIGNED NOT NULL” D=DB1,t=product_details –execute<br>
…..<br>
…..<br>
Created new table DB1._product_details_new OK.<br>
Altering new table…<br>
2019-05-28T09:27:24 Dropping new table…<br>
2019-05-28T09:27:25 Dropped new table OK.<br>
`DB255525`.`product_details` was not altered.<br>
Error altering new table `DB1`.`_product_details_new`: DBD::mysql::db do failed: Cannot change column ‘product_catalog_id’: used in a foreign key constraint ‘_product_details_ibfk_1’ [for Statement “ALTER TABLE `DB255525`.`_product_details_new` MODIFY COLUMN product_catalog_id BIGINT UNSIGNED NOT NULL”] at /usr/bin/pt-online-schema-change line 9271.

# pt-online-schema-change –user=root –password=root –host=172.23.0.2 –alter-foreign-keys-method=rebuild_constraints –alter “MODIFY COLUMN product_catalog_id BIGINT UNSIGNED NOT NULL” D=DB1,t=product_details –execute

Created new table DB1._product_details_new OK.

Altering new table...

2019–05–28T09:27:24 Dropping new table...

2019–05–28T09:27:25 Dropped new table OK.

`DB255525`.`product_details` was not altered.

In this case, the child table is rebuilt to point to the correct parent table using an ALTER, but it failed again for the same reason. So, none of the options like direct ALTER nor pt-online-schema-change are working for this particular change.

DDL Queries on Foreign Key Columns Workaround

Even disabling foreign key checks don’t work here, as that trick only works when it comes to data, not the schema changes. This has been reported independently under Percona Server as well as in PXC branches in JIRA. You can see more information about these here and here. There is one easy yet simple workaround that I would suggest. That is to drop the foreign key constraint on the child table, run the DDL on both the child and the parent tables, and finally redefine the foreign key constraint.

As you can see, the integrity constraint is compromised for the duration for this workaround. Be sure to keep the server in read-only mode and not allow any changes to these tables, as it might lead to inconsistent data between the parent and child tables.

<br>
Step #1:<br>
mysql> ALTER TABLE product_details DROP FOREIGN KEY product_details_ibfk_1 ;<br>
Query OK, 0 rows affected (0.95 sec)<br>
Records: 0 Duplicates: 0 Warnings: 0<br>
Step #2:<br>
# pt-online-schema-change –user=root –password=XXXXXXX –host=172.23.0.2 –alter “MODIFY COLUMN product_catalog_id BIGINT UNSIGNED NOT NULL” D=DB1,t=product_details –execute<br>
……<br>
Successfully altered `DB1`.`product_details`.<br>
Step #3:<br>
# pt-online-schema-change –user=root –password=XXXXXXX –host=172.23.0.2 –alter “MODIFY COLUMN catalog_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT” D=DB1,t=product_catalog –execute<br>
…….<br>
Successfully altered `DB1`.`product_catalog`.<br>
Step #4:<br>
mysql> ALTER TABLE product_details ADD FOREIGN KEY product_details_ibfk_1 (product_catalog_id) REFERENCES `product_catalog`(`catalog_id`) ON DELETE NO ACTION ON UPDATE NO ACTION ;<br>
Query OK, 0 rows affected (1.55 sec)<br>
Records: 0 Duplicates: 0 Warnings: 0

mysql> ALTER TABLE product_details DROP FOREIGN KEY product_details_ibfk_1 ;

Query OK, 0 rows affected (0.95 sec)

Records: 0 Duplicates: 0 Warnings: 0

# pt-online-schema-change –user=root –password=XXXXXXX –host=172.23.0.2 –alter “MODIFY COLUMN product_catalog_id BIGINT UNSIGNED NOT NULL” D=DB1,t=product_details –execute

Successfully altered `DB1`.`product_details`.

Step #3:

# pt-online-schema-change –user=root –password=XXXXXXX –host=172.23.0.2 –alter “MODIFY COLUMN catalog_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT” D=DB1,t=product_catalog –execute

Successfully altered `DB1`.`product_catalog`.

mysql> ALTER TABLE product_details ADD FOREIGN KEY product_details_ibfk_1 (product_catalog_id) REFERENCES `product_catalog`(`catalog_id`) ON DELETE NO ACTION ON UPDATE NO ACTION ;

Query OK, 0 rows affected (1.55 sec)

Records: 0 Duplicates: 0 Warnings: 0

Summary:

Foreign keys can only be defined and operated between two identical columns. Due to this constraint, DDL queries on columns involved in foreign keys are still a problem in MySQL/PXC, especially when the tables are huge. This workaround, with little downtime for writes, is the only quick way to get this done without spending time on complex logic building and implementation that involves changes on both the DB and the application.

Photo by Vanessa Bucceri on Unsplash

↧

How pt-online-schema-change Handles Foreign Keys

June 7, 2019, 6:30 am

≫ Next: An Overview of PostgreSQL to MySQL Cross Replication

≪ Previous: DDL Queries on Foreign Key Columns in MySQL/PXC

Feed: Planet MySQL
;
Author: MySQL Performance Blog
;

Foreign key related issues are very common when dealing with DDL changes in MySQL using Percona toolkit. In this blog post, I will explain how the tool (pt-online-schema-change) handles foreign key constraints when executing a DDL change.

First of all, I would like to explain why foreign keys have to be handled at all before writing more about the “How”. Foreign key constraints are aware of table rename operations. In other words, if the parent table is renamed, the child table automatically knows it and changes the foreign key constraint accordingly. Please have a look at the below example, and you can see the table name is automatically updated in the child table after the rename operation on the parent table:

<br>
mysql> show create table prd_details G<br>
*************************** 1. row ***************************<br>
       Table: prd_details<br>
Create Table: CREATE TABLE `prd_details` (<br>
  `product_id` int(11) NOT NULL AUTO_INCREMENT,<br>
  `product_catalog_id` int(11) unsigned NOT NULL,<br>
  ……<br>
  CONSTRAINT `prd_details_ibfk_1` FOREIGN KEY (`product_catalog_id`) REFERENCES `product_catalog` (`catalog_id`) ON DELETE NO ACTION ON UPDATE NO ACTION<br>
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci<br>
1 row in set (0.00 sec)i<br>
mysql> RENAME TABLE product_catalog TO product_cat ;<br>
Query OK, 0 rows affected (0.15 sec)<br>
mysql> show create table prd_details G<br>
*************************** 1. row ***************************<br>
       Table: prd_details<br>
Create Table: CREATE TABLE `prd_details` (<br>
  `product_id` int(11) NOT NULL AUTO_INCREMENT,<br>
  `product_catalog_id` int(11) unsigned NOT NULL,<br>
  ……<br>
  CONSTRAINT `prd_details_ibfk_1` FOREIGN KEY (`product_catalog_id`) REFERENCES `product_cat` (`catalog_id`) ON DELETE NO ACTION ON UPDATE NO ACTION<br>
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

mysql> show create table prd_details G

*************************** 1. row ***************************

Table: prd_details

Create Table: CREATE TABLE `prd_details` (

`product_id` int(11) NOT NULL AUTO_INCREMENT,

`product_catalog_id` int(11) unsigned NOT NULL,

......

CONSTRAINT `prd_details_ibfk_1` FOREIGN KEY (`product_catalog_id`) REFERENCES `product_catalog` (`catalog_id`) ON DELETE NO ACTION ON UPDATE NO ACTION

) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

1 row in set (0.00 sec)i

mysql> RENAME TABLE product_catalog TO product_cat ;

Query OK, 0 rows affected (0.15 sec)

mysql> show create table prd_details G

*************************** 1. row ***************************

Table: prd_details

Create Table: CREATE TABLE `prd_details` (

`product_id` int(11) NOT NULL AUTO_INCREMENT,

`product_catalog_id` int(11) unsigned NOT NULL,

......

CONSTRAINT `prd_details_ibfk_1` FOREIGN KEY (`product_catalog_id`) REFERENCES `product_cat` (`catalog_id`) ON DELETE NO ACTION ON UPDATE NO ACTION

) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

Well, that is indeed very nice and to be expected. But please allow me to explain how this becomes a problem when dealing with DDL changes using pt-online-schema-change. The tool implements the DDL changes as mentioned below. Please keep in mind that these are just to give an idea of how the tool works, as there would be more internal things going on in reality.

Let’s take an example ALTER for this case:

Query:

<br>
ALTER TABLE T1 MODIFY COLUMN c1 BIGINT UNSIGNED NOT NULL ;

ALTER TABLE T1 MODIFY COLUMN c1 BIGINT UNSIGNED NOT NULL ;

pt-online-schema-change steps for the above alter:

Create a similar table _T1_new
Modify the column c1 to BIGINT in the table _T1_new
Define triggers on table T1 so that changes to data on the original table will be applied to _T1_new as well.
Copy the data from table T1 to _T1_new.
Swap the tables
Drop triggers.

All looks good so far. Now let’s see why these steps create a problem, with a close look at Step #5 (Swap the tables).

Without foreign keys: Swapping of these tables is done as below, which looks nice.

Rename T1 —> T1_old
Rename _T1_new –> T1
If everything right, drop the table T1_old
Drop triggers on the new T1 table

With foreign keys: As I mentioned earlier, if there are any child tables with foreign keys to table T1, after renaming, they would automatically point to T1_old but not the new T1. Pt online schema change has to ensure the child table refers to a correct parent table by the end of this DDL change.

Rename T1 —> T1_old =====➤ The child table refers to T1_old automatically.
Rename _T1_new —> T1

In this case, the foreign keys in the child table are still referring to the old table T1_old which don’t have the schema change in place. If you drop T1_old, child table CT1 ends up pointing to a table that doesn’t exist. That’s a very bad situation. Now let’s talk about how the tool handles this.

How does pt-online-schema-change handle this?

The tool comes up with an option named --alter-foreign-keys-method This option supports two values at a high level and below you can see what are those and how they will work.

alter-foreign-keys-method=drop_swap

With this value, it won’t swap as mentioned in the steps. Rather, it drops the old table and then renames the new table with the change in place.

Disable foreign key checks for the session (FOREIGN_KEY_CHECKS=0)
Drop the table T1_old
Rename the new table _T1_new –> T1

The good thing here is that it is quick, but the bad thing is that it’s not reliable. If something goes wrong with renaming, it ends up with the same problem of referring to an unexisting table.

alter-foreign-keys-method=rebuild_constraints

This is the preferred approach for the reason it maintains the consistency of the schema and its relations. In this approach, before dropping the old table, it runs ALTER on all the child tables to drop existing FK and re-add new FK constraints that points to the columns from the new table (with the schema change in place). Below sequence of bullet points explains the same.

Rename T1 –> T1_old
Rename _T1_new –> T1
ALTER on child table to adjust the foreign key so that it points to T1 rather T1_old.

<br>
ALTER TABLE child_table DROP FOREIGN KEY `fk_name`, ADD CONSTRAINT `_fk_name` FOREIGN KEY  (`child_table_column`) REFERENCES _T1_new (`parent_table_column`)

ALTER TABLE child_table DROP FOREIGN KEY `fk_name`, ADD CONSTRAINT `_fk_name` FOREIGN KEY (`child_table_column`) REFERENCES _T1_new (`parent_table_column`)

Drop the table T1_old
Drop triggers from the new T1 table.

I would like to mention that the current implementation to rebuild the child table can be improved by making use of the INPLACE ALTER which I hope would probably be available in upcoming releases. You can see more information about this in the existing bug report here. I will discuss in brief about the two other options available, which are derived based on the above two. Let’s have a quick look.

auto: If this value is used, it leaves the decision up to the tool itself to choose from the two (drop_swap/rebuild_constraints) options available. If the number of rows in the child table is small, it uses rebuild_constraints; otherwise, it goes with the drop_swap approach. For this reason, this option should always be chosen carefully as it can end up with unexpected results when choosing drop_swap. Below is an example log snippet which explains this behavior:

<br>
# pt-online-schema-change –user=root –password=xxxxxxx –alter-foreign-keys-method=auto  –alter “MODIFY COLUMN header_id BIGINT unsigned NOT NULL AUTO_INCREMENT” D=DB1,t=T1 –execute<br>
………..<br>
Copying `DB1`.`T1`:  75% 00:18 remain<br>
2019-05-28T12:49:41 Copied rows OK.<br>
2019-05-28T12:49:41 Max rows for the rebuild_constraints method: 5588<br>
Determining the method to update foreign keys…<br>
2019-05-28T12:49:41   `DB1`.`child_of_T1`: too many rows: 197076; must use drop_swap<br>
2019-05-28T12:49:41 Drop-swapping tables…<br>
………..

# pt-online-schema-change –user=root –password=xxxxxxx –alter-foreign-keys-method=auto –alter “MODIFY COLUMN header_id BIGINT unsigned NOT NULL AUTO_INCREMENT” D=DB1,t=T1 –execute

...........

Copying `DB1`.`T1`: 75% 00:18 remain

2019–05–28T12:49:41 Copied rows OK.

2019–05–28T12:49:41 Max rows for the rebuild_constraints method: 5588

Determining the method to update foreign keys...

2019–05–28T12:49:41 `DB1`.`child_of_T1`: too many rows: 197076; must use drop_swap

2019–05–28T12:49:41 Drop–swapping tables...

...........

none: If this value is used, it is similar to drop_swap but without swapping. In other words, it just drops the original table and leaves the child tables in a state which they point to a table that doesn’t exist. In this case, DBA’s have need to fix the leftover job.

Photo by Silas Köhler on Unsplash

↧

An Overview of PostgreSQL to MySQL Cross Replication

June 10, 2019, 7:19 am

≫ Next: An Overview of PostgreSQL & MySQL Cross Replication

≪ Previous: How pt-online-schema-change Handles Foreign Keys

Feed: Planet MySQL
;
Author: Severalnines
;

This blog is aimed at explaining an overview of cross replication between PostgreSQL and MySQL, and further discussing the methods of configuring cross replication between the two database servers. Traditionally, the databases involved in a cross replication setup are called heterogeneous databases, which is a good approach to move away from one RDBMS server to another.

Both PostgreSQL and MySQL databases are conventionally RDBMS databases but they also offer NoSQL capability with added extensions to have the best of both worlds. This article focuses on the discussion of replication between PostgreSQL and MySQL from an RDBMS perspective.

An exhaustive explanation about internals of replication is not within the purview of this blog, however, some foundational elements shall be discussed to give the audience an understanding of how is replication configured between database servers, advantages, limitations and perhaps some known use cases.

In general replication between two identical database servers is achieved either in binary mode or query mode between a master node (otherwise called publisher, primary or active) and a slave node (subscriber, standby or passive). The aim of replication is to provide a real time copy of the master database on the slave side, where the data is transferred from master to slave, thereby forming an active-passive setup because the replication is only configured to occur one way. On the other hand, replication between two databases can be configured both ways so the data can also be transferred from slave back to master, establishing an active-active configuration. All of this can be configured between two or more identical database servers which may also include a cascading replication. The configuration of active-active or active-passive really depends on the business need, availability of such features within the native configuration or utilizing external solutions to configure and applicable trade-offs.

The above mentioned configuration can be accomplished with diverse database servers, wherein a database server can be configured to accept replicated data from another completely different database server and still maintain real time snapshot of the data being replicated. Both MySQL and PostgreSQL database servers offer most of the configurations discussed above either in their own nativity or with the help of third party extensions including binary log method, disk block method, statement based and row based methods.

The requirement to configure a cross replication between MySQL and PostgreSQL really comes in as a result of a one time migration effort to move away from one database server to another. As both the databases use different protocols so they cannot directly talk to each other. In order to achieve that communication flow, there is an external open source tool such as pg_chameleon.

Background of pg_chameleon

pg_chameleon is a MySQL to PostgreSQL replication system developed in Python 3. It uses an open source library called mysql-replication which is also developed using Python. The functionality involves pulling row images of MySQL tables and storing them as JSONB objects into a PostgreSQL database, which is further decoded by a pl/pgsql function and replaying those changes against the PostgreSQL database.

Features of pg_chameleon

Multiple MySQL schemas from the same cluster can be replicated to a single target PostgreSQL database, forming a many-to-one replication setup
The source and target schema names can be non-identical
Replication data can be pulled from MySQL cascading replica
Tables that fail to replicate or generate errors are excluded
Each replication functionality is managed with the help of daemons
Controlled with the help of parameters and configuration files based on YAML construct

Demo

Host	vm1	vm2
OS version	CentOS Linux release 7.6 x86_64	CentOS Linux release 7.5 x86_64
Database server with version	MySQL 5.7.26	PostgreSQL 10.5
Database port	3306	5433
ip address	192.168.56.102	192.168.56.106

To begin with, prepare the setup with all the prerequisites needed to install pg_chameleon. In this demo Python 3.6.8 is installed, creating a virtual environment and activating it for use.

$> wget https://www.python.org/ftp/python/3.6.8/Python-3.6.8.tar.xz
$> tar -xJf Python-3.6.8.tar.xz
$> cd Python-3.6.8
$> ./configure --enable-optimizations
$> make altinstall

Following a successful installation of Python3.6, further additional requirements are met such as creating and activating a virtual environment. In addition to that pip module upgraded to the latest version and it is used to install pg_chameleon. In the commands below, pg_chameleon 2.0.9 was deliberately installed whereas the latest version is a 2.0.10. This is done in order to avoid any newly introduced bugs in the updated version.

$> python3.6 -m venv venv
$> source venv/bin/activate
(venv) $> pip install pip --upgrade
(venv) $> pip install pg_chameleon==2.0.9

The next step is to invoke the pg_chameleon (chameleon is the command) with set_configuration_files argument to enable pg_chameleon to create default directories and configuration files.

(venv) $> chameleon set_configuration_files
creating directory /root/.pg_chameleon
creating directory /root/.pg_chameleon/configuration/
creating directory /root/.pg_chameleon/logs/
creating directory /root/.pg_chameleon/pid/
copying configuration  example in /root/.pg_chameleon/configuration//config-example.yml

Now, create a copy of config-example.yml as default.yml to make it the default configuration file. A sample configuration file used for this demo is provided below.

$> cat default.yml
---
#global settings
pid_dir: '~/.pg_chameleon/pid/'
log_dir: '~/.pg_chameleon/logs/'
log_dest: file
log_level: info
log_days_keep: 10
rollbar_key: ''
rollbar_env: ''

# type_override allows the user to override the default type conversion into a different one.
type_override:
  "tinyint(1)":
    override_to: boolean
    override_tables:
      - "*"

#postgres  destination connection
pg_conn:
  host: "192.168.56.106"
  port: "5433"
  user: "usr_replica"
  password: "pass123"
  database: "db_replica"
  charset: "utf8"

sources:
  mysql:
    db_conn:
      host: "192.168.56.102"
      port: "3306"
      user: "usr_replica"
      password: "pass123"
      charset: 'utf8'
      connect_timeout: 10
    schema_mappings:
      world_x: pgworld_x
    limit_tables:
#      - delphis_mediterranea.foo
    skip_tables:
#      - delphis_mediterranea.bar
    grant_select_to:
      - usr_readonly
    lock_timeout: "120s"
    my_server_id: 100
    replica_batch_size: 10000
    replay_max_rows: 10000
    batch_retention: '1 day'
    copy_max_memory: "300M"
    copy_mode: 'file'
    out_dir: /tmp
    sleep_loop: 1
    on_error_replay: continue
    on_error_read: continue
    auto_maintenance: "disabled"
    gtid_enable: No
    type: mysql
    skip_events:
      insert:
        - delphis_mediterranea.foo #skips inserts on the table delphis_mediterranea.foo
      delete:
        - delphis_mediterranea #skips deletes on schema delphis_mediterranea
      update:

The configuration file used in this demo is the sample file that comes with pg_chameleon with minor edits to suit the source and destination environments, and a summary of different sections of the configuration file follows.

The default.yml configuration file has a “global settings” section that control details such as lock file location, logging locations and retention period, etc. The section that follows next is the “type override” section which is a set of rules to override types during replication. A sample type override rule is used by default which converts a tinyint(1) to a boolean value. The next section is the destination database connection details section which in our case is a PostgreSQL database, denoted by “pg_conn”. The final section is the source section which has all the details of source database connection settings, schema mapping between source and destination, any tables to skip including timeout, memory and batch size settings. Notice the “sources” denoting that there can be multiple sources to a single destination to form a many-to-one replication setup.

A “world_x” database is used in this demo which is a sample database with 4 tables containing sample rows, that MySQL community offers for demo purposes, and it can be downloaded from here. The sample database comes as a tar and compressed archive along with instructions to create it and import rows in it.

A dedicated user is created in both the MySQL and PostgreSQL databases with the same name as usr_replica that is further granted additional privileges on MySQL to have read access to all the tables being replicated.

mysql> CREATE USER usr_replica ;
mysql> SET PASSWORD FOR usr_replica='pass123';
mysql> GRANT ALL ON world_x.* TO 'usr_replica';
mysql> GRANT RELOAD ON *.* to 'usr_replica';
mysql> GRANT REPLICATION CLIENT ON *.* to 'usr_replica';
mysql> GRANT REPLICATION SLAVE ON *.* to 'usr_replica';
mysql> FLUSH PRIVILEGES;

A database is created on the PostgreSQL side that will accept changes from MySQL database, which is named as “db_replica”. The “usr_replica” user in PostgreSQL is automatically configured as an owner of two schemas such as “pgworld_x” and “sch_chameleon” that contain the actual replicated tables and catalog tables of replication respectively. This automatic configuration is done by the create_replica_schema argument, indicated further below.

postgres=# CREATE USER usr_replica WITH PASSWORD 'pass123';
CREATE ROLE
postgres=# CREATE DATABASE db_replica WITH OWNER usr_replica;
CREATE DATABASE

The MySQL database is configured with a few parameter changes in order to prepare it for replication, as shown below, and it requires a database server restart for the changes to take effect.

$> vi /etc/my.cnf
binlog_format= ROW
binlog_row_image=FULL
log-bin = mysql-bin
server-id = 1

At this point, it is significant to test the connectivity to both the database servers to ensure there are no issues when pg_chameleon commands are executed.

On the PostgreSQL node:

$> mysql -u usr_replica -Ap'admin123' -h 192.168.56.102 -D world_x

On the MySQL node:

$> psql -p 5433 -U usr_replica -h 192.168.56.106 db_replica

The next three commands of pg_chameleon (chameleon) is where it sets the environment up, adds a source and initializes a replica. The “create_replica_schema” argument of pg_chameleon creates the default schema (sch_chameleon) and replication schema (pgworld_x) in the PostgreSQL database as has already been discussed. The “add_source” argument adds the source database to the configuration by reading the configuration file (default.yml), which in this case is “mysql”, while the “init_replica” initializes the configuration based on the settings of the configuration file.

$> chameleon create_replica_schema --debug
$> chameleon add_source --config default --source mysql --debug
$> chameleon init_replica --config default --source mysql --debug

The output of the above three commands is self explanatory indicating the success of each command with an evident output message. Any failures or syntax errors are clearly mentioned in simple and plain messages, thereby suggesting and prompting corrective actions.

The final step is to start the replication with “start_replica”, the success of which is indicated by an output hint as shown below.

$> chameleon start_replica --config default --source mysql 
output: Starting the replica process for source mysql

The status of replication can be queried with the “show_status” argument while errors can be viewed with ‘show_errors” argument.

$> chameleon show_status --source mysql  
OUTPUT: 
  Source id  Source name    Type    Status    Consistent    Read lag    Last read    Replay lag    Last replay
-----------  -------------  ------  --------  ------------  ----------  -----------  ------------  -------------
          1  mysql          mysql   running   No            N/A                      N/A

== Schema mappings ==
Origin schema    Destination schema
---------------  --------------------
world_x          pgworld_x

== Replica status ==
---------------------  ---
Tables not replicated  0
Tables replicated      4
All tables             4
Last maintenance       N/A
Next maintenance       N/A
Replayed rows
Replayed DDL
Skipped rows
---------------------  ---
$> chameleon show_errors --config default 
output: There are no errors in the log

As discussed earlier that each of the replication functionality is managed with the help of daemons, which can be viewed by querying the process table using Linux “ps” command, exhibited below.

$>  ps -ef|grep chameleon
root       763     1  0 19:20 ?        00:00:00 /u01/media/mysql_samp_dbs/world_x-db/venv/bin/python3.6 /u01/media/mysq l_samp_dbs/world_x-db/venv/bin/chameleon start_replica --config default --source mysql
root       764   763  0 19:20 ?        00:00:01 /u01/media/mysql_samp_dbs/world_x-db/venv/bin/python3.6 /u01/media/mysq l_samp_dbs/world_x-db/venv/bin/chameleon start_replica --config default --source mysql
root       765   763  0 19:20 ?        00:00:00 /u01/media/mysql_samp_dbs/world_x-db/venv/bin/python3.6 /u01/media/mysq l_samp_dbs/world_x-db/venv/bin/chameleon start_replica --config default --source mysql

No replication setup is complete until it is put to the “real-time apply” test, which has been simulated as below. It involves creating a table and inserting a couple of records in the MySQL database, subsequently, the “sync_tables” argument of pg_chameleon is invoked to update the daemons to replicate the table along with its records to the PostgreSQL database.

mysql> create table t1 (n1 int primary key, n2 varchar(10));
Query OK, 0 rows affected (0.01 sec)
mysql> insert into t1 values (1,'one');
Query OK, 1 row affected (0.00 sec)
mysql> insert into t1 values (2,'two');
Query OK, 1 row affected (0.00 sec)

$> chameleon sync_tables --tables world_x.t1 --config default --source mysql
Sync tables process for source mysql started.

The test is confirmed by querying the table from PostgreSQL database to reflect the rows.

$> psql -p 5433 -U usr_replica -d db_replica -c "select * from pgworld_x.t1";
 n1 |  n2
----+-------
  1 | one
  2 | two

If it is a migration project then the following pg_chameleon commands will mark the end of the migration effort. The commands should be executed after it is confirmed that rows of all the target tables have been replicated across, and the result will be a cleanly migrated PostgreSQL database without any references to the source database or replication schema (sch_chameleon).

$> chameleon stop_replica --config default --source mysql 
$> chameleon detach_replica --config default --source mysql --debug

Optionally the following commands will drop the source configuration and replication schema.

$> chameleon drop_source --config default --source mysql --debug
$> chameleon drop_replica_schema --config default --source mysql --debug

Pros of Using pg_chameleon

Simple to setup and less complicated configuration
Painless troubleshooting and anomaly detection with easy to understand error output
Additional adhoc tables can be added to the replication after initialization, without altering any other configuration
Multiple sources can be configured for a single destination database, which is useful in consolidation projects to merge data from one or more MySQL databases into a single PostgreSQL database
Selected tables can be skipped from being replicated

Cons of Using pg_chameleon

Only supported from MySQL 5.5 onwards as Origin database and PostgreSQL 9.5 onwards for destination database
Requires every table to have a primary or unique key, otherwise, the tables get initialized during the init_replica process but they will fail to replicate
One way replication, i.e., MySQL to PostgreSQL. Thereby limiting its use to only an active-passive setup
The source database can only be a MySQL database while support for PostgreSQL database as source is experimental with further limitations (click here to learn more)

pg_chameleon Summary

The replication approach offered by pg_chameleon is favourable to a database migration of MySQL to PostgreSQL. However, one of the significant limitations of one-way replication can discourage database professionals to adopt it for anything other than migration. This drawback of unidirectional replication can be addressed using yet another open source tool called SymmetricDS.

In order to study the utility more in detail, please refer to the official documentation here. The command line reference can be obtained from here.

PostgreSQL Management & Automation with ClusterControl

Learn about what you need to know to deploy, monitor, manage and scale PostgreSQL

An Overview of SymmetricDS

SymmetricDS is an open source tool that is capable of replicating any database to any other database, from the popular list of database servers such as Oracle, MongoDB, PostgreSQL, MySQL, SQL Server, MariaDB, DB2, Sybase, Greenplum, Informix, H2, Firebird and other cloud based database instances such as Redshift and Azure etc. Some of the offerings include database and file synchronization, multi-master replication, filtered synchronization, and transformation. The tool is developed using Java, requiring a standard edition (version 8.0 or above) of either JRE or JDK. The functionality involves data changes being captured by triggers at source database and routing it to a participating destination database as outgoing batches

Features of SymmetricDS

Platform independent, which means two or more dissimilar databases can communicate with each other, any database to any other database
Relational databases achieve synchronization using change data capture while file system based systems utilize file synchronization
Bi-directional replication using Push and Pull method, which is accomplished based on set rules
Data transfer can also occur over secure and low bandwidth networks
Automatic recovery during the resumption of a crashed node and automatic conflict resolution
Cloud ready and contains powerful extension APIs

Demo

SymmetricDS can be configured in one of the two options:

A master (parent) node that acts as a centralized intermediary coordinating data replication between two slave (child) nodes, in which the communication between the two child nodes can only occur via the parent.
An active node (node1) can replicate to and from another active node (node2) without any intermediary.

In both the options, the communication between the nodes happens via “Push” and “Pull” events. In this demo, an active-active configuration between two nodes will be explained. The full architecture can be exhaustive, so the readers are encouraged to check the user guide available here to learn more about the internals of SymmetricDS.

Installing SymmetricDS is as simple as downloading the open source version of zip file from here and extracting it in a convenient location. The details of install location and version of SymmetricDS in this demo are as per the table below, along with other details pertaining to database versions, Linux versions, ip addresses and communication port for both the participating nodes.

Host	vm1	vm2
OS version	CentOS Linux release 7.6 x86_64	CentOS Linux release 7.6 x86_64
Database server version	MySQL 5.7.26	PostgreSQL 10.5
Database port	3306	5832
ip address	192.168.1.107	192.168.1.112
SymmetricDS version	SymmetricDS 3.9	SymmetricDS 3.9
SymmetricDS install location	/usr/local/symmetric-server-3.9.20	/usr/local/symmetric-server-3.9.20
SymmetricDS node name	corp-000	store-001

The install home in this case is “/usr/local/symmetric-server-3.9.20” which will be the home directory of SymmetricDS, which contains various other sub-directories and files. Two of the sub-directories that are of importance now are “samples” and “engines”. The samples directory contains node properties configuration file samples in addition to sample SQL scripts to kick start a quick demo.

The following three node properties configuration files can be seen in the “samples” directory with names indicating the nature of node in a given setup.

corp-000.properties
store-001.properties
store-002.properties

As SymmetricDS comes with all the necessary configuration files to support a basic 3 node setup (option 1), it is convenient to use the same configuration files to setup a 2 node setup (option 2) as well. The intended configuration file is copied from the “samples” directory to the “engines” on host vm1, and it looks like below.

$> cat engines/corp-000.properties
engine.name=corp-000
db.driver=com.mysql.jdbc.Driver
db.url=jdbc:mysql://192.168.1.107:3306/replica_db?autoReconnect=true&useSSL=false
db.user=root
db.password=admin123
registration.url=
sync.url=http://192.168.1.107:31415/sync/corp-000
group.id=corp
external.id=000

The name of this node in SymmetricDS configuration is “corp-000” with the database connection handled with mysql jdbc driver using the connection string as stated above along with login credentials. The database to connect is “replica_db” and the tables will be created during the creation of sample schema. The “sync.url” denotes the location to contact the node for synchronization.

The node 2 on host vm2 is configured as “store-001” with the rest of the details as configured in the node.properties file, shown below. The “store-001” node runs a PostgreSQL database, with “pgdb_replica” as the database for replication. The “registration.url” enables host “vm2” to communicate with host “vm1” to pull configuration details.

$> cat engines/store-001.properties
engine.name=store-001
db.driver=org.postgresql.Driver
db.url=jdbc:postgresql://192.168.1.112:5832/pgdb_replica
db.user=postgres
db.password=admin123
registration.url=http://192.168.1.107:31415/sync/corp-000
group.id=store
external.id=001

The pre-configured default demo of SymmetricDS contains settings to setup a bi-directional replication between two database servers (two nodes). The steps below are executed on host vm1 (corp-000), which will create a sample schema having 4 tables. Further, execution of “create-sym-tables” with “symadmin” command will create the catalog tables that store and control the rules and direction of replication between nodes. Finally, the demo tables are loaded with sample data.

vm1$> cd /usr/local/symmetric-server-3.9.20/bin
vm1$> ./dbimport --engine corp-000 --format XML create_sample.xml
vm1$> ./symadmin --engine corp-000 create-sym-tables
vm1$> ./dbimport --engine corp-000 insert_sample.sql

The demo tables “item” and “item_selling_price” are auto-configured to replicate from corp-000 to store-001 while the sale tables (sale_transaction and sale_return_line_item) are auto-configured replicate from store-001 to corp-000. The next step is to create the sample schema in the PostgreSQL database on host vm2 (store-001), in order to prepare it to receive data from corp-000.

vm2$> cd /usr/local/symmetric-server-3.9.20/bin
vm2$> ./dbimport --engine store-001 --format XML create_sample.xml

It is important to verify the existence of demo tables and SymmetricDS catalog tables in the MySQL database on vm1 at this stage. Note, the SymmetricDS system tables (tables with prefix “sym_”) are only available in the corp-000 node at this point of time, because that is where the “create-sym-tables” command was executed, which will be the place to control and manage the replication. In addition to that, the store-001 node database will only have 4 demo tables with no data in it.

The environment is now ready to start the “sym” server processes on both the nodes, as show below.

vm1$> cd /usr/local/symmetric-server-3.9.20/bin
vm1$> sym 2>&1 &

The log entries are both sent to a background log file (symmetric.log) under a logs directory in the SymmetricDS install location as well as to the standard output. The “sym” server can now be initiated on store-001 node.

vm2$> cd /usr/local/symmetric-server-3.9.20/bin
vm2$> sym 2>&1 &

The startup of “sym” server process on host vm2 will create the SymmetricDS catalog tables in the PostgreSQL database as well. The startup of “sym” server process on both the nodes will get them to coordinate with each other to replicate data from corp-000 to store-001. After a few seconds, querying all the four tables on either side will show the successful replication results. Alternatively, an initial load can also be sent to the store-001 node from corp-000 with the below command.

vm1$> ./symadmin --engine corp-000 reload-node 001

At this point, a new record is inserted into the “item” table in MySQL database at corp-000 node (host: vm1) and it can be verified to have successfully replicated to the PostgreSQL database at store-001 node (host: vm2). This shows the “Pull” event of data from corp-000 to store-001.

mysql> insert into item values ('22000002','Jelly Bean');
Query OK, 1 row affected (0.00 sec)

vm2$> psql -p 5832 -U postgres pgdb_replica -c "select * from item" 
 item_id  |   name
----------+-----------
 11000001 | Yummy Gum
 22000002 | Jelly Bean
(2 rows)

The “Push” event of data from store-001 to corp-000 can be achieved by inserting a record into the “sale_transaction” table and confirming it to replicate through.

pgdb_replica=# insert into "sale_transaction" ("tran_id", "store_id", "workstation", "day", "seq") values (1000, '001', '3', '2007-11-01', 100);
vm1$> [root@vm1 ~]#  mysql -uroot -p'admin123' -D replica_db -e "select * from sale_transaction";
+---------+----------+-------------+------------+-----+
| tran_id | store_id | workstation | day        | seq |
+---------+----------+-------------+------------+-----+
|     900 | 001      | 3           | 2012-12-01 |  90 |
|    1000 | 001      | 3           | 2007-11-01 | 100 |
|    2000 | 002      | 2           | 2007-11-01 | 200 |
+---------+----------+-------------+------------+-----+

This marks the successful configuration of bidirectional replication of demo tables between a MySQL and PostgreSQL database. Whereas, the configuration of replication for newly created user tables can be achieved using the following steps. An example table “t1” is created for the demo and the rules of its replication are configured as per the procedure below. The steps only configure the replication from corp-000 to store-001.

mysql> create table  t1 (no integer);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into sym_channel (channel_id,create_time,last_update_time) 
values ('t1',current_timestamp,current_timestamp);
Query OK, 1 row affected (0.01 sec)

mysql> insert into sym_trigger (trigger_id, source_table_name,channel_id,
last_update_time, create_time) values ('t1', 't1', 't1', current_timestamp,
current_timestamp);
Query OK, 1 row affected (0.01 sec)

mysql> insert into sym_trigger_router (trigger_id, router_id,
Initial_load_order, create_time,last_update_time) values ('t1',
'corp-2-store-1', 1, current_timestamp,current_timestamp);
Query OK, 1 row affected (0.01 sec)

After this, the configuration is notified about the schema change of adding a new table by invoking the symadmin command with “sync-triggers” argument which will recreate the triggers to match table definitions. Subsequently, execute “send-schema” to send schema changes out to store-001 node, following which the replication of “t1” table will be configured successfully.

vm1$> ./symadmin -e corp-000 --node=001 sync-triggers    
vm1$> ./symadmin send-schema -e corp-000 --node=001 t1

Pros of Using SymmetricDS

Effortless installation and configuration including a pre-configured set of parameter files to build either a 3-node or a 2-node setup
Cross platform database enabled and platform independent including servers, laptops and mobile devices
Replicate any database to any other database, whether on-prem, WAN or cloud
Capable of optimally handling a couple of databases to several thousand databases to replicate data seamlessly
A commercial version of the software offers GUI driven management console with an excellent support package

Cons of Using SymmetricDS

Manual command line configuration may involve defining rules and direction of replication via SQL statements to load catalog tables, which may be inconvenient to manage
Setting up a large number of tables for replication will be an exhaustive effort, unless some form of scripting is utilized to generate the SQL statements defining rules and direction of replication
Plenty of logging information cluttering the logfile, thereby requiring periodic logfile maintenance to not allow the logfile to fill up the disk

SymmetricDS Summary

SymmetricDS offers the ability to setup bi-directional replication between 2 nodes, 3 nodes and so on for several thousand nodes to replicate data and achieve file synchronization. It is a unique tool that performs many of the self-healing maintenance tasks such as the automatic recovery of data after extended periods of downtime in a node, secure and efficient communication between nodes with the help of HTTPS and automatic conflict management based on set rules, etc. The essential feature of replicating any database to any other database makes SymmetricDS ready to be deployed for a number of use cases including migration, version and patch upgrade, distribution, filtering and transformation of data across diverse platforms.

The demo was created by referring to the official quick-start tutorial of SymmetricDS which can be accessed from here. The user guide can be found here, which provides a detailed account of various concepts involved in a SymmetricDS replication setup.

↧

An Overview of PostgreSQL & MySQL Cross Replication

June 12, 2019, 11:15 pm

≫ Next: Percona XtraDB Cluster 5.6.44-28.34 Is Now Available

≪ Previous: An Overview of PostgreSQL to MySQL Cross Replication

Feed: Planet MySQL
;
Author: Severalnines
;

Background of pg_chameleon

Features of pg_chameleon

Multiple MySQL schemas from the same cluster can be replicated to a single target PostgreSQL database, forming a many-to-one replication setup
The source and target schema names can be non-identical
Replication data can be pulled from MySQL cascading replica
Tables that fail to replicate or generate errors are excluded
Each replication functionality is managed with the help of daemons
Controlled with the help of parameters and configuration files based on YAML construct

Demo

Host	vm1	vm2
OS version	CentOS Linux release 7.6 x86_64	CentOS Linux release 7.5 x86_64
Database server with version	MySQL 5.7.26	PostgreSQL 10.5
Database port	3306	5433
ip address	192.168.56.102	192.168.56.106

To begin with, prepare the setup with all the prerequisites needed to install pg_chameleon. In this demo Python 3.6.8 is installed, creating a virtual environment and activating it for use.

$> wget https://www.python.org/ftp/python/3.6.8/Python-3.6.8.tar.xz
$> tar -xJf Python-3.6.8.tar.xz
$> cd Python-3.6.8
$> ./configure --enable-optimizations
$> make altinstall

$> python3.6 -m venv venv
$> source venv/bin/activate
(venv) $> pip install pip --upgrade
(venv) $> pip install pg_chameleon==2.0.9

The next step is to invoke the pg_chameleon (chameleon is the command) with set_configuration_files argument to enable pg_chameleon to create default directories and configuration files.

(venv) $> chameleon set_configuration_files
creating directory /root/.pg_chameleon
creating directory /root/.pg_chameleon/configuration/
creating directory /root/.pg_chameleon/logs/
creating directory /root/.pg_chameleon/pid/
copying configuration  example in /root/.pg_chameleon/configuration//config-example.yml

Now, create a copy of config-example.yml as default.yml to make it the default configuration file. A sample configuration file used for this demo is provided below.

$> cat default.yml
---
#global settings
pid_dir: '~/.pg_chameleon/pid/'
log_dir: '~/.pg_chameleon/logs/'
log_dest: file
log_level: info
log_days_keep: 10
rollbar_key: ''
rollbar_env: ''

# type_override allows the user to override the default type conversion into a different one.
type_override:
  "tinyint(1)":
    override_to: boolean
    override_tables:
      - "*"

#postgres  destination connection
pg_conn:
  host: "192.168.56.106"
  port: "5433"
  user: "usr_replica"
  password: "pass123"
  database: "db_replica"
  charset: "utf8"

sources:
  mysql:
    db_conn:
      host: "192.168.56.102"
      port: "3306"
      user: "usr_replica"
      password: "pass123"
      charset: 'utf8'
      connect_timeout: 10
    schema_mappings:
      world_x: pgworld_x
    limit_tables:
#      - delphis_mediterranea.foo
    skip_tables:
#      - delphis_mediterranea.bar
    grant_select_to:
      - usr_readonly
    lock_timeout: "120s"
    my_server_id: 100
    replica_batch_size: 10000
    replay_max_rows: 10000
    batch_retention: '1 day'
    copy_max_memory: "300M"
    copy_mode: 'file'
    out_dir: /tmp
    sleep_loop: 1
    on_error_replay: continue
    on_error_read: continue
    auto_maintenance: "disabled"
    gtid_enable: No
    type: mysql
    skip_events:
      insert:
        - delphis_mediterranea.foo #skips inserts on the table delphis_mediterranea.foo
      delete:
        - delphis_mediterranea #skips deletes on schema delphis_mediterranea
      update:

mysql> CREATE USER usr_replica ;
mysql> SET PASSWORD FOR usr_replica='pass123';
mysql> GRANT ALL ON world_x.* TO 'usr_replica';
mysql> GRANT RELOAD ON *.* to 'usr_replica';
mysql> GRANT REPLICATION CLIENT ON *.* to 'usr_replica';
mysql> GRANT REPLICATION SLAVE ON *.* to 'usr_replica';
mysql> FLUSH PRIVILEGES;

postgres=# CREATE USER usr_replica WITH PASSWORD 'pass123';
CREATE ROLE
postgres=# CREATE DATABASE db_replica WITH OWNER usr_replica;
CREATE DATABASE

The MySQL database is configured with a few parameter changes in order to prepare it for replication, as shown below, and it requires a database server restart for the changes to take effect.

$> vi /etc/my.cnf
binlog_format= ROW
binlog_row_image=FULL
log-bin = mysql-bin
server-id = 1

At this point, it is significant to test the connectivity to both the database servers to ensure there are no issues when pg_chameleon commands are executed.

On the PostgreSQL node:

$> mysql -u usr_replica -Ap'admin123' -h 192.168.56.102 -D world_x

On the MySQL node:

$> psql -p 5433 -U usr_replica -h 192.168.56.106 db_replica

$> chameleon create_replica_schema --debug
$> chameleon add_source --config default --source mysql --debug
$> chameleon init_replica --config default --source mysql --debug

The final step is to start the replication with “start_replica”, the success of which is indicated by an output hint as shown below.

$> chameleon start_replica --config default --source mysql 
output: Starting the replica process for source mysql

The status of replication can be queried with the “show_status” argument while errors can be viewed with ‘show_errors” argument.

$> chameleon show_status --source mysql  
OUTPUT: 
  Source id  Source name    Type    Status    Consistent    Read lag    Last read    Replay lag    Last replay
-----------  -------------  ------  --------  ------------  ----------  -----------  ------------  -------------
          1  mysql          mysql   running   No            N/A                      N/A

== Schema mappings ==
Origin schema    Destination schema
---------------  --------------------
world_x          pgworld_x

== Replica status ==
---------------------  ---
Tables not replicated  0
Tables replicated      4
All tables             4
Last maintenance       N/A
Next maintenance       N/A
Replayed rows
Replayed DDL
Skipped rows
---------------------  ---
$> chameleon show_errors --config default 
output: There are no errors in the log

As discussed earlier that each of the replication functionality is managed with the help of daemons, which can be viewed by querying the process table using Linux “ps” command, exhibited below.

$>  ps -ef|grep chameleon
root       763     1  0 19:20 ?        00:00:00 /u01/media/mysql_samp_dbs/world_x-db/venv/bin/python3.6 /u01/media/mysq l_samp_dbs/world_x-db/venv/bin/chameleon start_replica --config default --source mysql
root       764   763  0 19:20 ?        00:00:01 /u01/media/mysql_samp_dbs/world_x-db/venv/bin/python3.6 /u01/media/mysq l_samp_dbs/world_x-db/venv/bin/chameleon start_replica --config default --source mysql
root       765   763  0 19:20 ?        00:00:00 /u01/media/mysql_samp_dbs/world_x-db/venv/bin/python3.6 /u01/media/mysq l_samp_dbs/world_x-db/venv/bin/chameleon start_replica --config default --source mysql

mysql> create table t1 (n1 int primary key, n2 varchar(10));
Query OK, 0 rows affected (0.01 sec)
mysql> insert into t1 values (1,'one');
Query OK, 1 row affected (0.00 sec)
mysql> insert into t1 values (2,'two');
Query OK, 1 row affected (0.00 sec)

$> chameleon sync_tables --tables world_x.t1 --config default --source mysql
Sync tables process for source mysql started.

The test is confirmed by querying the table from PostgreSQL database to reflect the rows.

$> psql -p 5433 -U usr_replica -d db_replica -c "select * from pgworld_x.t1";
 n1 |  n2
----+-------
  1 | one
  2 | two

$> chameleon stop_replica --config default --source mysql 
$> chameleon detach_replica --config default --source mysql --debug

Optionally the following commands will drop the source configuration and replication schema.

$> chameleon drop_source --config default --source mysql --debug
$> chameleon drop_replica_schema --config default --source mysql --debug

Pros of Using pg_chameleon

Simple to setup and less complicated configuration
Painless troubleshooting and anomaly detection with easy to understand error output
Additional adhoc tables can be added to the replication after initialization, without altering any other configuration
Multiple sources can be configured for a single destination database, which is useful in consolidation projects to merge data from one or more MySQL databases into a single PostgreSQL database
Selected tables can be skipped from being replicated

Cons of Using pg_chameleon

Only supported from MySQL 5.5 onwards as Origin database and PostgreSQL 9.5 onwards for destination database
Requires every table to have a primary or unique key, otherwise, the tables get initialized during the init_replica process but they will fail to replicate
One way replication, i.e., MySQL to PostgreSQL. Thereby limiting its use to only an active-passive setup
The source database can only be a MySQL database while support for PostgreSQL database as source is experimental with further limitations (click here to learn more)

pg_chameleon Summary

In order to study the utility more in detail, please refer to the official documentation here. The command line reference can be obtained from here.

PostgreSQL Management & Automation with ClusterControl

Learn about what you need to know to deploy, monitor, manage and scale PostgreSQL

An Overview of SymmetricDS

Features of SymmetricDS

Platform independent, which means two or more dissimilar databases can communicate with each other, any database to any other database
Relational databases achieve synchronization using change data capture while file system based systems utilize file synchronization
Bi-directional replication using Push and Pull method, which is accomplished based on set rules
Data transfer can also occur over secure and low bandwidth networks
Automatic recovery during the resumption of a crashed node and automatic conflict resolution
Cloud ready and contains powerful extension APIs

Demo

SymmetricDS can be configured in one of the two options:

A master (parent) node that acts as a centralized intermediary coordinating data replication between two slave (child) nodes, in which the communication between the two child nodes can only occur via the parent.
An active node (node1) can replicate to and from another active node (node2) without any intermediary.

Host	vm1	vm2
OS version	CentOS Linux release 7.6 x86_64	CentOS Linux release 7.6 x86_64
Database server version	MySQL 5.7.26	PostgreSQL 10.5
Database port	3306	5832
ip address	192.168.1.107	192.168.1.112
SymmetricDS version	SymmetricDS 3.9	SymmetricDS 3.9
SymmetricDS install location	/usr/local/symmetric-server-3.9.20	/usr/local/symmetric-server-3.9.20
SymmetricDS node name	corp-000	store-001

The following three node properties configuration files can be seen in the “samples” directory with names indicating the nature of node in a given setup.

corp-000.properties
store-001.properties
store-002.properties

$> cat engines/corp-000.properties
engine.name=corp-000
db.driver=com.mysql.jdbc.Driver
db.url=jdbc:mysql://192.168.1.107:3306/replica_db?autoReconnect=true&useSSL=false
db.user=root
db.password=admin123
registration.url=
sync.url=http://192.168.1.107:31415/sync/corp-000
group.id=corp
external.id=000

$> cat engines/store-001.properties
engine.name=store-001
db.driver=org.postgresql.Driver
db.url=jdbc:postgresql://192.168.1.112:5832/pgdb_replica
db.user=postgres
db.password=admin123
registration.url=http://192.168.1.107:31415/sync/corp-000
group.id=store
external.id=001

vm1$> cd /usr/local/symmetric-server-3.9.20/bin
vm1$> ./dbimport --engine corp-000 --format XML create_sample.xml
vm1$> ./symadmin --engine corp-000 create-sym-tables
vm1$> ./dbimport --engine corp-000 insert_sample.sql

vm2$> cd /usr/local/symmetric-server-3.9.20/bin
vm2$> ./dbimport --engine store-001 --format XML create_sample.xml

The environment is now ready to start the “sym” server processes on both the nodes, as show below.

vm1$> cd /usr/local/symmetric-server-3.9.20/bin
vm1$> sym 2>&1 &

vm2$> cd /usr/local/symmetric-server-3.9.20/bin
vm2$> sym 2>&1 &

vm1$> ./symadmin --engine corp-000 reload-node 001

mysql> insert into item values ('22000002','Jelly Bean');
Query OK, 1 row affected (0.00 sec)

vm2$> psql -p 5832 -U postgres pgdb_replica -c "select * from item" 
 item_id  |   name
----------+-----------
 11000001 | Yummy Gum
 22000002 | Jelly Bean
(2 rows)

The “Push” event of data from store-001 to corp-000 can be achieved by inserting a record into the “sale_transaction” table and confirming it to replicate through.

pgdb_replica=# insert into "sale_transaction" ("tran_id", "store_id", "workstation", "day", "seq") values (1000, '001', '3', '2007-11-01', 100);
vm1$> [root@vm1 ~]#  mysql -uroot -p'admin123' -D replica_db -e "select * from sale_transaction";
+---------+----------+-------------+------------+-----+
| tran_id | store_id | workstation | day        | seq |
+---------+----------+-------------+------------+-----+
|     900 | 001      | 3           | 2012-12-01 |  90 |
|    1000 | 001      | 3           | 2007-11-01 | 100 |
|    2000 | 002      | 2           | 2007-11-01 | 200 |
+---------+----------+-------------+------------+-----+

mysql> create table  t1 (no integer);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into sym_channel (channel_id,create_time,last_update_time) 
values ('t1',current_timestamp,current_timestamp);
Query OK, 1 row affected (0.01 sec)

mysql> insert into sym_trigger (trigger_id, source_table_name,channel_id,
last_update_time, create_time) values ('t1', 't1', 't1', current_timestamp,
current_timestamp);
Query OK, 1 row affected (0.01 sec)

mysql> insert into sym_trigger_router (trigger_id, router_id,
Initial_load_order, create_time,last_update_time) values ('t1',
'corp-2-store-1', 1, current_timestamp,current_timestamp);
Query OK, 1 row affected (0.01 sec)

vm1$> ./symadmin -e corp-000 --node=001 sync-triggers    
vm1$> ./symadmin send-schema -e corp-000 --node=001 t1

Pros of Using SymmetricDS

Effortless installation and configuration including a pre-configured set of parameter files to build either a 3-node or a 2-node setup
Cross platform database enabled and platform independent including servers, laptops and mobile devices
Replicate any database to any other database, whether on-prem, WAN or cloud
Capable of optimally handling a couple of databases to several thousand databases to replicate data seamlessly
A commercial version of the software offers GUI driven management console with an excellent support package

Cons of Using SymmetricDS

Manual command line configuration may involve defining rules and direction of replication via SQL statements to load catalog tables, which may be inconvenient to manage
Setting up a large number of tables for replication will be an exhaustive effort, unless some form of scripting is utilized to generate the SQL statements defining rules and direction of replication
Plenty of logging information cluttering the logfile, thereby requiring periodic logfile maintenance to not allow the logfile to fill up the disk

SymmetricDS Summary

↧

Percona XtraDB Cluster 5.6.44-28.34 Is Now Available

June 19, 2019, 9:11 am

≫ Next: SQL Triggers Tutorial With Example | Triggers in SQL

≪ Previous: An Overview of PostgreSQL & MySQL Cross Replication

Feed: Planet MySQL
;
Author: MySQL Performance Blog
;

Percona is glad to announce the release of Percona XtraDB Cluster 5.6.44-28.34 on June 19, 2019. Binaries are available from the downloads section or from our software repositories.

Percona XtraDB Cluster 5.6.44-28.34 is now the current release, based on the following:

All Percona software is open-source and free.

Bugs Fixed

PXC-2480: In some cases, Percona XtraDB Cluster could not replicate CURRENT_USER() used in the ALTER statement. USER() and CURRENT_USER() are no longer allowed in any ALTER statement since they fail when replicated.
PXC-2487: The case when a DDL or DML action was in progress from one client and the provider was updated
from another client could result in a race condition.
PXC-2490: Percona XtraDB Cluster could crash when binlog_space_limit was set to a value other than zero during wsrep_recover mode.
PXC-2497: The user can set the preferred donor by setting the wsrep_sst_donor variable. An IP address is not valid as the value of this variable. If the user still used an IP address, an error message was produced that did not provide sufficient information. The error message has been improved to suggest that the user check the value of the wsrep_sst_donor for an IP address.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

↧

SQL Triggers Tutorial With Example | Triggers in SQL

June 24, 2019, 1:51 am

≫ Next: Percona XtraDB Cluster 5.7.26-31.37 Is Now Available

≪ Previous: Percona XtraDB Cluster 5.6.44-28.34 Is Now Available

Feed: Planet MySQL
;
Author: Krunal Lathiya
;

SQL Triggers Tutorial With Example | Triggers in SQL is today’s topic. SQL trigger is invoked by the database automatically when any change in the event occurs. It is known as a particular type of stored procedure because triggers are called directly, which is not done in case of stored procedures. The only difference between SQL triggers and stored procedures is that stored procedure needs to be called explicitly, whereas SQL triggers are called implicitly. SQL triggers are invoked when a row is inserted in a table, or any columns are being updated.

Content Overview

SQL Triggers Tutorial With Example

Triggers are the stored programs, which are automatically fired or executed when some events take place. It is like event-based programming. Triggers are written to be executed in the response to any of the following events.

The database manipulation (DML) statement (DELETE, INSERT, or UPDATE)
The database definition (DDL) statement (CREATE, ALTER, or DROP).
The database operation (SERVERERROR, LOGIN, LOGOUT, STARTUP, or SHUTDOWN).

Triggers can be defined on the table, view, schema, or database with which an event is associated.

The trigger is a stored procedure in the database which automatically invokes whenever the special event in a database occurs. For example, the trigger can be invoked when the row is inserted into the specified table or when the certain table columns are being updated.

#Syntax

Create trigger [trigger name] 
[before/after]
{insert/update/delete}
On [table_name]
[for each row]
[trigger_body]

#Parameters

create trigger [trigger name]: Used for creating or replacing an already created trigger with new trigger name.
[before | after]: This statement is used for specifying when the trigger will be executed. Before trigger is used to run the triggers before the triggering statement is run. After a trigger is used to run the triggers after the triggering statement is run.
{insert | update | delete}: This specifies the operation which we want to perform in the tables.
on [table_name]: This is the name of the table.
[for each row]: This statement is associated with row triggers, i.e., triggers will be invoked when any row is affected.
[trigger_body]: It provides the operation to be performed at the time trigger is fired.

Let’s understand by this with an example.

Consider a table: (Students)

Field	Type	NULL	Key	Default	Extra
Roll No	Int(2)	NO	PRI	NULL
Name	Varchar(40)	YES		NULL
English	Int(2)	YES		NULL
Physics	Int(2)	YES		NULL
Chemistry	Int(2)	YES		NULL
Maths	Int(2)	YES		NULL
Total	Int(2)	YES		NULL
Per	Int(2)	YES		NULL

#Query: (SQL TRIGGER)

create trigger marks
before insert
on
students
for each row
set new.total=new.english+new.physics+new.chemistry+new.maths, new.per=(new.total/400)*100;

#Explanation to the above query

Name of the trigger is the marks. Now total and percentage will be automatically calculated and stored in the table students as soon as we insert any rows in the table.

Notice, that we have set total using a new keyword because we are not dealing with the old rows instead, we are inserting new rows so after total all other statements are suffixed with new keyword with a dot operator.

Now we are going to insert values.

insert into students values(1,"Shouvik",83,79,80,50,0,0);

#Explanation to the above query

We have inserted the values in the same format as we used to add. We have set the total and percentage values as 0, which will be computed and updated automatically by triggering statements.

#Output

#Advantages of using SQL triggers

It is used for checking the integrity of the data.
Triggers can be used for catching errors in any fields.
SQL triggers are a better alternative for running scheduled tasks, i.e. by using SQL triggers we don’t have to wait for running the tasks planned because the triggers get automatically invoked whenever there is any change in the table.
SQL triggers are also used for official inspection of data in the tables.

#Disadvantages of using SQL triggers

SQL triggers are used for providing an extended validation, and they cannot be used for replacing all the validation which can be done only by the application layer.
SQL triggers are executed from the client applications, which will be challenging to figure out what is happening in the database layer.
It increases the overhead of the database server.

#Another Example of SQL Trigger

If we want to start with, we will be using the CUSTOMERS table. See the following customers table with its columns and values.

Select * from customers;  

+----+----------+-----+-----------+----------+ 
| ID | NAME     | AGE | ADDRESS   | SALARY   | 
+----+----------+-----+-----------+----------+ 
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 | 
|  2 | Khilan   |  25 | Delhi     |  1500.00 | 
|  3 | kaushik  |  23 | Kota      |  2000.00 | 
|  4 | Chaitali |  25 | Mumbai    |  6500.00 | 
|  5 | Hardik   |  27 | Bhopal    |  8500.00 | 
|  6 | Komal    |  22 | MP        |  4500.00 | 
+----+----------+-----+-----------+----------+

The following program creates the row-level trigger for the customer’s table that would fire for the INSERT or UPDATE or DELETE operations performed on the CUSTOMERS. The trigger will display the salary difference between old values and new values.

CREATE OR REPLACE TRIGGER display_salary_changes 
BEFORE DELETE OR INSERT OR UPDATE ON customers 
FOR EACH ROW 
WHEN (NEW.ID > 0) 
DECLARE 
   sal_diff number; 
BEGIN 
   sal_diff := :NEW.salary  - :OLD.salary; 
   dbms_output.put_line('Old salary: ' || :OLD.salary); 
   dbms_output.put_line('New salary: ' || :NEW.salary); 
   dbms_output.put_line('Salary difference: ' || sal_diff); 
END; 
/

When the above code is executed at the SQL prompt, it produces the following output.

Trigger created.

The following points need to be considered here.

OLD and NEW references are not available for the table-level triggers; instead, you can use them for the record-level triggers.
If you want to query a table in the same trigger, then you should use a AFTER keyword, because the triggers can query a table or change it again only after an initial change is applied and the table is back in the consistent state.
The above trigger has been written in such a way that it will fire before any DELETE or INSERT or UPDATE operation on a table, but you can write your trigger on the single or multiple operations. For example BEFORE DELETE, which will fire whenever a record will be deleted using the DELETE operation on the table.

#Triggering the Trigger

Let us execute some Data Manipulation operations on the CUSTOMERS table. Here is one INSERT statement, which will a create the new record in the database table.

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY) 
VALUES (7, 'Kriti', 22, 'HP', 7500.00 );

When the record is created in the CUSTOMERS table, the above create the trigger, display_salary_changes will be fired, and it will display the following output.

Old salary: 
New salary: 7500 
Salary difference:

Because this is a new record, the old salary is not available, and the above result comes as null. Let us execute one more DML operation on the CUSTOMERS table. The UPDATE statement will modify the existing record in the table.

UPDATE customers 
SET salary = salary + 500 
WHERE id = 2;

When the record is updated in the CUSTOMERS table, the above create a trigger, the display_salary_changes will be fired, and it will show the following result.

Old salary: 1500 
New salary: 2000 
Salary difference: 500

Finally, SQL Triggers Tutorial With Example | Triggers in SQL is over.

↧

Percona XtraDB Cluster 5.7.26-31.37 Is Now Available

June 26, 2019, 11:37 am

≫ Next: MIN/MAX Optimization and Asynchronous Global Index Maintenance

≪ Previous: SQL Triggers Tutorial With Example | Triggers in SQL

Feed: Planet MySQL
;
Author: MySQL Performance Blog
;

Percona is glad to announce the release of Percona XtraDB Cluster 5.7.26-31.37 on June 26, 2019. Binaries are available from the downloads section or from our software repositories.

Percona XtraDB Cluster 5.7.26-31.37 is now the current release, based on the following:

All Percona software is open-source and free.

Bugs Fixed

PXC-2480: In some cases, Percona XtraDB Cluster could not replicate CURRENT_USER() used in the ALTER statement. USER() and CURRENT_USER() are no longer allowed in any ALTER statement since they fail when replicated.
PXC-2487: The case when a DDL or DML action was in progress from one client and the provider was updated
from another client could result in a race condition.
PXC-2490: Percona XtraDB Cluster could crash when binlog_space_limit was set to a value other than zero during wsrep_recover mode.
PXC-2491: SST could fail if the donor had encrypted undo logs.
PXC-2497: The user can set the preferred donor by setting the wsrep_sst_donor variable. An IP address is not valid as the value of this variable. If the user still used an IP address, an error message was produced that did not provide sufficient information. The error message has been improved to suggest that the user check the value of the wsrep_sst_donor for an IP address.
PXC-2537: Nodes could crash after an attempt to set a password using mysqladmin

Other bugs fixed: PXC-2276, PXC-2292, PXC-2476, PXC-2560

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

↧

MIN/MAX Optimization and Asynchronous Global Index Maintenance

June 28, 2019, 1:27 am

≫ Next: MySQL Tutorial – Learn Step by Step

≪ Previous: Percona XtraDB Cluster 5.7.26-31.37 Is Now Available

Feed: Striving for Optimal Performance.

In this short post I would like to point out a non-obvious issue that one of my customers recently hit. On the one hand, it’s a typical case where the query optimizer generates a different (suboptimal) execution plan even though nothing relevant (of course, at first sight only) was changed. On the other hand, in this case after some time the query optimizer automatically gets back to the original (optimal) execution plan.

Let’s have a look at the issue with the help of a test case…

The test case is based on a range partitioned table:

CREATE TABLE t
PARTITION BY RANGE (d) 
(
  PARTITION t_q1_2019 VALUES LESS THAN (to_date('2019-04-01','yyyy-mm-dd')),
  PARTITION t_q2_2019 VALUES LESS THAN (to_date('2019-07-01','yyyy-mm-dd')),
  PARTITION t_q3_2019 VALUES LESS THAN (to_date('2019-10-01','yyyy-mm-dd')),
  PARTITION t_q4_2019 VALUES LESS THAN (to_date('2020-01-01','yyyy-mm-dd'))
)
AS
SELECT rownum AS n, to_date('2019-01-01','yyyy-mm-dd') + rownum/(1E5/364) AS d, rpad('*',10,'*') AS p
FROM dual
CONNECT BY level 
The partitioned table has a global partitioned index (but the behaviour would be the same with a non-partitioned index):
CREATE INDEX i ON T (n) GLOBAL PARTITION BY HASH (n) PARTITIONS 16
The query hitting the issue contains a MIN (or MAX) function:
SELECT min(n) FROM t
Its execution plan is the following and, as expected, uses the MIN/MAX optimization:
--------------------------------------------
| Id  | Operation                   | Name |
--------------------------------------------
|   0 | SELECT STATEMENT            |      |
|   1 |  SORT AGGREGATE             |      |
|   2 |   PARTITION HASH ALL        |      |
|   3 |    INDEX FULL SCAN (MIN/MAX)| I    |
--------------------------------------------
One day the data stored in the oldest partition is no longer needed and, therefore, it’s dropped (a truncate would lead to the same behaviour). Note that to avoid the invalidation of the index, the UPDATE INDEXES clause is added:
ALTER TABLE t DROP PARTITION t_q1_2019 UPDATE INDEXES
After that operation the query optimizer generates another (suboptimal) execution plan. The index, since it’s valid, it’s used. But, the MIN/MAX optimization is not:
---------------------------------------
| Id  | Operation              | Name |
---------------------------------------
|   0 | SELECT STATEMENT       |      |
|   1 |  SORT AGGREGATE        |      |
|   2 |   PARTITION HASH ALL   |      |
|   3 |    INDEX FAST FULL SCAN| I    |
---------------------------------------
And, even worse (?), few hours later the query optimizer gets back to the original (optimal) execution plan.
The issue is caused by the fact that, as of version 12.1.0.1, Oracle Database optimizes the way DROP/TRUNCATE PARTITION statements that use the UPDATE INDEXES clause are carried out. The index maintenance, to make the DROP/TRUNCATE PARTITION statements faster, is delayed and decoupled from the execution of the DDL statement itself. It’s done asynchronously. For detailed information about that feature have a look to the documentation.
To avoid the issue, you have to make sure to immediately carry out the index maintenance after the execution of the DROP/TRUNCATE PARTITION statement. For that purpose, you can run the following SQL statement:
execute dbms_part.cleanup_gidx(user, 'T')
In summary, even though an index is valid and can be used by some row source operations, if it contains orphaned index entries caused by the asynchronously maintenance of global indexes, it cannot be used by INDEX FULL SCAN (MIN/MAX). A final remark, the same is not true for the INDEX RANGE SCAN (MIN/MAX). In fact, that row source operation can be carried out also in case orphaned index entries exist.

↧

MySQL Tutorial – Learn Step by Step

June 30, 2019, 10:56 am

≫ Next: SQL Truncate Table Statement Tutorial With Example

≪ Previous: MIN/MAX Optimization and Asynchronous Global Index Maintenance

Feed: Planet MySQL
;
Author: Meenakshi Agarwal
;

We bring you the best MySQL tutorial to learn all Basic to Advanced concepts step by step. This post covers all MySQL building blocks such as DDL, DML, DCL, and TCL. DDLs are commands to create, drop, alter, truncate, rename, comment databases, and tables.

Also, DMLs are select, insert, update, delete commands to manipulate data. Moreover, you’ll also find the DCLs on this page such as grant and invoke to manage rights and permissions. After that, there are TCLs to control the transactions.

So, let’s start by telling you that the father of MySQL is Michael Widenius, who named it after his daughter My. He also later founded Maria DB and again called it by his other daughter’s name Maria.

MySQL tutorial to learn step by step

What is the purpose of MySQL?

MySQL is a lightweight, quick, and simple relational database software. Since it is an opensource, so it doesn’t incur any cost for using. It is quite popular in the database world and often used with PHP and Apache Tomcat.

Nowadays, many small to medium companies use it to power up their backends. C and C++ are the programming languages used to develop MySQL.

Why is MySQL so popular?

The success of MySQL is time-proven. It has gained expert’s trust after delivering into the field. However, below are some points defining the reasons for its popularity.

It is an opensource, entirely free to use and hence get the impetus.
It has a laudable set of features that put it ahead of the paid solutions.
The design of MySQL is robust and capable of handling big data and complex queries.
You can install it on a variety of platforms such as Windows, Linux, Mac OS X, etc.
You can develop solutions around it using many languages such as C, C++, Java, PHP, etc.
It complies to the most common SQL programming standards.
Most WordPress (Best CMS) installations use MySQL as their backend. It is one of the biggest reason for its rapid use in Web development.

Moreover, you can store millions of record in a MySQL table and can extend as per the below limit.

unsigned int largest value is 4,294,967,295
unsigned bigint largest value is 18,446,744,073,709,551,615

MySQL tutorial | DDL commands

In this section, you’ll learn about the following MySQL commands:

Create, Alter, Drop, Truncate, and Rename

However, you should first know how to add comments while writing commands.

MySQL allows placing comments in the following two way:

Commenting a single line

You can add a double hyphen (–) at the beginning of a statement to comment it out. The compiler won’t execute the code starting from a “–” till the end of the line (EOL).

Check out the below example:

--Creating a database using a DDL command
CREATE SCHEMA Employee;

Commenting multi-line

To add a multi-line comment or commenting a block of code, you can use a pair of “/*” and “*/” symbols. It is the same as a programmer does in C/C++.

Go through the below example:

/* Query:
   Write a simple query that returns all records
   of the Employee table. */
SELECT * FROM Employee;

It is a good practice to attribute your code with useful comments. And you now know how to do that. After this, you can step on the DDL commands.

CREATE

You can use the CREATE statement for creating a schema, tables, or an index. Let’s find out how:

a) CREATE SCHEMA command

This statement results in the creation of a database.

--Syntax:
CREATE SCHEMA Name_of_Database;

--Example:
CREATE SCHEMA EmployeesInfo;

Note:- In MySQL, a schema is similar to a database. You can replace the DATABASE from SCHEMA keyword and use CREATE SCHEMA in place of CREATE DATABASE.

You can check the previous command result using the following:

--Syntax
SHOW SCHEMAS;

b) CREATE TABLE command

This statement results in the creation of a table in a database.

--Syntax:
CREATE TABLE name_of_table (
field1 data_type,
field2 data_type,
field3 data_type,
....
fieldN data_type);

--Example:
CREATE TABLE Employees
(
EmpID int,
EmpFirstName varchar(255),
EmpLastName varchar(255),
SpouseName varchar(255),
Residence varchar(255),
PinCode int,
State varchar(255)
);

c) CREATE TABLE AS command

This statement results in the creation of a table from a pre-existing table. Hence, the output table will have a similar structure and fields. You can though specify which columns to copy or not.

--Syntax:
CREATE TABLE table_out AS
SELECT field1, field2,...,fieldN
FROM preexisting_table
WHERE ....;

--Example:
CREATE TABLE EmployeeNames AS
SELECT EmpID, EmpFname, EmpLname
FROM Employees;

ALTER

The ALTER statement modifies a table structure. It usually does the following operations:

Add,
Modify, or
Delete constraints or columns.

ALTER TABLE command

You can issue this command to change the properties of a table such as add, update or remove constraints and fields from a table.

--Syntax:
ALTER TABLE name_of_table
ADD field_name data_type;

--Example:
ALTER TABLE Employees
ADD EmpAge int;

DROP

The DROP command can perform operations like delete a database, tables, and fields.

DROP SCHEMA command

You can issue this command drop an entire schema.

--Syntax:
DROP SCHEMA name_of_schema;

--Example:
DROP SCHEMA EmployeeDetails;

DROP TABLE command

You can issue this command drop an entire table along with its data.

--Syntax:
DROP TABLE name_of_table;

--Example:
DROP TABLE Employees;

TRUNCATE

This statement performs operations such as to clear the data inside a table. However, it doesn’t remove the table altogether.

--Syntax:
TRUNCATE TABLE name_of_table;

--Example:
TRUNCATE TABLE EmployeesInfo;

RENAME

This statement allows us to change the name of a table or tables.

Note: It works such that you can rename multiple tables in one go.

--Syntax:
RENAME TABLE
table_one TO table_one_new
[, table_two TO table_two_new] ...

--Example:
RENAME EmployeesInfo TO EmpInfo;

Next, we are discussing the different database keys you should know while working with MySQL. So, let’s go ahead with this.

MySQL tutorial | Database keys used in tables

You should be familiar with the following five database keys:

Primary key

It represents a column in a table which has all non-null and unique values. You can choose such a field as the primary key. Also, there could be more than one such columns, but you need to select one of them.

Candidate key

It is a key that has the potential to become a primary key. It means the candidate keys can also uniquely identify the rows of a table. There could be more than one such keys.

Super key

This key has similar attributes as a candidate key; it should be able to distinctly identifies a record.

In conclusion, we can say that a candidate key is a superkey, but the reverse is not valid.

Foreign key

A foreign key is one which exists as a primary key in another table. It could have null values and also have duplicates.

Alternate key

These are candidate keys which remain after selecting the primary key.

Database Constraints

Similar to database keys, they also have multiple constraints. You can enable them for columns in a table. Let’s check out some of the essential ones.

Not Null

It restricts a column not to have a null value.

Unique

It makes a column accept only unique values.

Check

The column will only accept values satisfying the check condition.

Default

If the column is empty, then it gets the default value.

Index

It enables faster access to database records.

The above two were must-know concepts to understand. Hope, you now have got the desired clarity. After this, let’s go through the DML commands.

MySQL tutorial | DML commands

In this section, we are covering DML (data manipulation) commands. You can use these to view, modify, and delete records in a table. These are as follows:

USE; INSERT; UPDATE; DELETE; SELECT

USE

The USE command sets up a database for any later use of DML commands.

--Syntax:
USE name_of_database;

--Example:
USE EmployeesDB;

INSERT

You can issue INSERT command to add a new record to a table. You have two options to trigger this statement.

See below is the first option. It requires to specify the column names along with their values.

--Syntax1:
INSERT INTO target_table (field1, field2, field3, ..., fieldN)
VALUES (data1, data2, data3, ..., dataN);

The second option doesn’t require you to provide the column names. Check below:

--Syntax2:
INSERT INTO target_table
VALUES (data1, data2, data3, ..., dataN);

--Example:
INSERT INTO Employees(EmpID, EmpFirstName, EmpLastName, SpouseName, Residence, State, PinCode, Salary)
VALUES (11011, 'John','Langer', 'Maria Langer', 'Wall Street', 'NewYork', 10005, 45000);

INSERT INTO Employees
VALUES (11011, 'Ben','Stokes', 'Sally Stokes', '10 Downing Street', 'LONDON', 6355, 55000);

UPDATE

This MySQL command makes you change a record or multiple records with new values in a table. Check out the below example.

--Syntax:
UPDATE target_table
SET field1 = data1, field2 = data2, ...
WHERE check_condition;

--Example:
UPDATE EmpInfo
SET EmpFirstName = 'James', PinCode= 10006
WHERE EmpID = 11011;

DELETE

This command removes one or more records matching the given condition in a table. You can use it to delete the tuples that have lost relevance.

--Syntax:
DELETE FROM target_table
WHERE check_condition;

--Example:
DELETE FROM EmpInfo
WHERE EmpFirstName='John';

SELECT

The select command fetches one or more records depending on the specified condition. Hence, you can run this statement to view a part of the table containing the tuples.

It provides two options to execute. The first one makes you customize the result set by specifying columns. Check the below MySQL Select examples.

--Syntax:
SELECT field1, field2, ..., fieldN
FROM target_table;

The second option is where the select command returns all records from the table. It uses the asterisk (*) symbol, which represents ALL.

--Syntax:
SELECT * FROM target_table;

--Examples:
SELECT EmpFirstName, Residence FROM EmpInfo;
SELECT * FROM EmpInfo;

Above were some simple usage of the Select statement. Besides, you can mix up the following keywords with the SELECT command.

DISTINCT, ORDER BY, GROUP BY, HAVING Clauses

These SELECT clauses help you filer the result set. So, we’ll now demonstrate how to use these with SELECT.

a) SELECT DISTINCT command

This command ensures that the result set includes unique values excluding the duplicates. So, it helps you view all different tuples in a table.

--Syntax:
SELECT DISTINCT field1, field2, ..., fieldN
FROM target_table;

--Example:
SELECT DISTINCT Age FROM Employees;

b) ORDER BY command

When used with SELECT, the ORDER BY command forces the result to appear in ascending order by default. You can though make it return in descending order by appending DESC at the end.

Please note that you can order records by multiple columns one after the other.

--Syntax:
SELECT field1, field2, ..., fieldN
FROM target_table
ORDER BY field1, field2, ..., fieldN ASC|DESC;

--Examples:
SELECT EmpFirstName, EmpLastName FROM EmpInfo
ORDER BY State; 

SELECT * FROM EmpInfo
ORDER BY State DESC;

SELECT * FROM EmpInfo
ORDER BY State DESC, EmpLastName ASC;

c) GROUP BY command

The SELECT … GROUP BY command collects results from a set of rows and groups by one or more field. You may need this when using aggregate functions (COUNT/MAX/MIN/SUM/AVG).

Please note that you can group tuples by multiple columns one after the other.

--Syntax:
SELECT field1, field2,...,fieldN
FROM target_table
WHERE condition
GROUP BY some_fields
ORDER BY some_fields;

--Example:
SELECT COUNT(EmpID), State
FROM Employees
GROUP BY State
ORDER BY COUNT(EmpID) ASC;

d) SELECT with HAVING clause

The HAVING clause works as a replacement of the WHERE for SELECT statement when used with GROUP BY on some column. You can provide the necessary condition along with it.

Check out the below example.

--Syntax:
SELECT field1, field2,...,fieldN
FROM target_table
WHERE condition
GROUP BY some_fields
HAVING condition
ORDER BY some_fields;

--Example:
SELECT COUNT(EmpID), State
FROM EmpInfo
GROUP BY State
HAVING COUNT(Salary) > 50000;

MySQL Tutorial | DCL Commands

In this section, you will see the description and details DCL (Data Control) commands. Their purpose is to set rights and privileges for MySQL schema. These are:

GRANT, REVOKE

GRANT

The GRANT statement assigns privileges to users for accessing the database.

--Syntax:
GRANT privileges ON target_table TO user;

--Example:
GRANT CREATE ANY TABLE TO server_name;

REVOKE

The REVOKE statement withdraws privileges and prevents users from accessing the database.

--Syntax:
REVOKE privileges ON target_table FROM user;

--Example:
REVOKE DELETE ON *.* FROM EmpInfo;

Now, we’ve come to the final part of this MySQL tutorial. In this, we will describe the TCL Commands.

MySQL Tutorial | TCL Commands

In this section, we’ll cover the transaction commands of the database. These are:

COMMIT, ROLLBACK, SAVEPOINT, RELEASE SAVEPOINT, SET TRANSACTION

COMMIT

The COMMIT statement confirms all database transactions that are pending at that time.

--Syntax:
COMMIT;

--Example:
DELETE FROM EmpInfo WHERE Salary <= 5000;
COMMIT;

ROLLBACK

The ROLLBACK statement cancels all database transactions that were part of the most recent COMMIT or ROLLBACK.

--Syntax:
ROLLBACK;

--Example:
DELETE FROM EmpInfo WHERE Salary <= 5000;
ROLLBACK;

SAVEPOINT

The SAVEPOINT statement makes a TAG (a savepoint) for the set of transactions in which to ROLLBACK. After this, you can any time move back to the state by issuing ROLLBACK to the TAG.

--Syntax:
SAVEPOINT TAG_NAME; --Command for creating the SAVEPOINT
ROLLBACK TO TAG_NAME; --Command for rolling back to the tag

--Example:
SAVEPOINT TAG1;
DELETE FROM EmpInfo WHERE Salary <= 5000;
SAVEPOINT TAG2;

RELEASE SAVEPOINT

The RELEASE SAVEPOINT statement dismisses a savepoint created earlier.

--Syntax:
RELEASE SAVEPOINT TAG_NAME;

--Example:
RELEASE SAVEPOINT TAG2;

SET TRANSACTION

The SET TRANSACTION statement assigns an isolation level to a database transaction.

--Syntax:
SET [GLOBAL | SESSION] TRANSACTION [ READ WRITE | READ ONLY ];

The GLOBAL transaction enables the transaction level for all subsequent sessions, whereas the SESSION does it for the current session. If none is specified, then the isolation level defaults to the next transaction performed under the current session.

We believe that the above MySQL tutorial would provide you enough material to get comfortable working. If you now like to practice MySQL and build queries, then do go through the below post.

SQL Queries for Practice

↧

SQL Truncate Table Statement Tutorial With Example

July 10, 2019, 10:27 pm

≫ Next: Diagnostic Data Processing on Cloudera Altus

≪ Previous: MySQL Tutorial – Learn Step by Step

Feed: Planet MySQL
;
Author: Krunal Lathiya
;

SQL Truncate Table Statement Tutorial With Example is today’s topic. The DROP TABLE the command deletes a table in the database. Be careful before removing a table and deleting table results in loss of all information stored in the table. The TRUNCATE TABLE command deletes the data inside a table, but not the table itself.

SQL Truncate Table Statement

TRUNCATE statement is the Data Definition Language (DDL) operation that is used to mark the extents of the table for deallocation (empty for reuse).

The result of this operation quickly deletes all the data from the table, typically bypassing the number of integrity enforcing mechanisms. The truncate was officially introduced in SQL:2008 standard.

SQL Truncate performs the same function as a DELETE statement without a WHERE clause.

Okay, now let’s see the syntax of SQL Truncate Table.

TRUNCATE TABLE table_name

Okay, now let’s see the following example.

See the following table.

I have already created and inserted the data into the Apps table.

Now, we will truncate the Apps table using the following query.

TRUNCATE TABLE Apps;

So, it has removed all the data

Now, again type the following query and try to fetch all the records.

Select * from Apps

You will see that there are data left inside the table.

#DROP TABLE and TRUNCATE TABLE

The significant difference between the Drop Table and Truncate Table is that DROP table deletes the table itself, whereas the Truncate Table does not eliminate the Table itself, it removes the data from the table.

You might choose to truncate the table instead of dropping the table and recreating it.

Truncating the table can be faster and does not affect any of the table’s indexes, triggers, and dependencies. It is also a quick way to clear out the records from the table if you don’t need to worry about the rolling back.

So, DROP TABLE command to delete the complete table, but it would remove the entire table structure form the database, and you will need to re-create that table once again if you wish you store some data still or in the future.

Table or Database deletion using a DROP statement cannot be rolled back, so it must be used wisely.

#DELETE TABLE Vs. TRUNCATE TABLE

Truncate table is much faster and it uses lesser resources than the DELETE TABLE statement.

If the table contains a primary key column, the counter for that column will be reset to a first value. For example, we have ID INT IDENTITY(1, 1) which contains the 100 rows/records, and we performed the TRUNCATE Statement table on ID. That truncate statement will delete all the rows from ID, and reset the IDENTITY to 1.

If you are using the DELETE TABLE query and remove one id, then next id will be removed id + 1. Means it does not restructure the whole data.

If the particular id row is gone, then it is gone. That id will not assign to another row in case of DELETE TABLE.

The SQL Truncate Table query deletes all the rows from a specified table, but the table structure, constraints, columns, indexes will remain the same.

Finally, SQL Truncate Table Statement Tutorial With Example is over.

↧