Channel: DDL – Cloud Data Architect

Luis M Carril : Backup and restore data in PostgreSQL foreign tables using pg_dump


Feed: Planet PostgreSQL.

In this article, we describe a recent enhancement to pg_dump, contributed by Swarm64, that more fully supports the backup and restoration of foreign tables. The contribution has been committed to PostgreSQL 13.

The PostgreSQL foreign-data wrapper (FDW)

The PostgreSQL foreign-data-wrapper interface is an extensibility feature that allows PostgreSQL to integrate data from another data source, such as another PostgreSQL instance, an Oracle server, or a CSV file, among others. Queries against a foreign table access the external data source and return results as if they had been read from a native PostgreSQL table.

The Swarm64 DA PostgreSQL accelerator extension speeds up PostgreSQL query performance in a number of ways, including the use of a compressed, columnar storage format that we store in a foreign table.

Foreign table backup and restore prior to PostgreSQL 13

Foreign tables extend PostgreSQL in many useful ways. Unfortunately, prior to PostgreSQL 13, pg_dump does not fully support foreign tables: it correctly dumps the DDL commands needed to re-create the foreign table definition, but it never dumps any rows. If you restore such a backup, the foreign tables are defined again, but they contain no data.

Foreign table backup and restore in PostgreSQL 13

To make database administration easier for our customers, we developed a patched version of pg_dump, available as of Swarm64 DA 3.1.0 and PostgreSQL 13, that lets you dump the contents of foreign tables using the new --include-foreign-data option. This option instructs pg_dump to perform a COPY (SELECT * FROM foreign_table) TO on every table that uses a foreign server named in the option.

For example, the following database test_db has a Swarm64 foreign table foo:

CREATE EXTENSION swarm64da;
CREATE FOREIGN TABLE foo(a int) server swarm64da_server;
INSERT INTO foo SELECT * FROM generate_series(1,10);

You can create the backup dump.txt file using the new option:

$ pg_dump -d test_db --include-foreign-data=swarm64da_server > dump.txt

The backup contains the necessary statements to recreate the table and insert the data that is in table foo, as seen in the following excerpt of dump.txt:

--
-- Name: foo; Type: FOREIGN TABLE; Schema: public; Owner: postgres
--

CREATE FOREIGN TABLE public.foo (
	a integer
)
SERVER swarm64da_server;

ALTER FOREIGN TABLE public.foo OWNER TO postgres;
--
-- Data for Name: foo; Type: TABLE DATA; Schema: public; Owner: postgres
--

COPY public.foo (a) FROM stdin;
1
2
3
4
5
6
7
8
9
10
\.

--
-- PostgreSQL database dump complete
--

To restore the dump in a new database named new_db, apply the dump using the psql command:

$ psql new_db < dump.txt

If your setup has a different foreign-data wrapper installed, you can issue the option multiple times to selectively dump the content of tables backed by different foreign servers:

$ pg_dump -d test_db --include-foreign-data=swarm64da_server --include-foreign-data=other_fdw_server

Some FDWs are read-only, such as the file_fdw wrapper shipped with PostgreSQL. Do not use --include-foreign-data with a read-only foreign server, because the resulting backup cannot be restored. To help avoid this situation, the --include-foreign-data option requires you to explicitly name the foreign servers whose data should be dumped. Unfortunately, this cannot be prevented automatically, because pg_dump has no way of knowing whether a foreign-data wrapper is read-only.
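
For example, a table backed by the read-only file_fdw (the server and file names below are illustrative) can be dumped with --include-foreign-data, but the resulting backup cannot be restored, because the generated COPY ... FROM stdin fails against a wrapper that does not support writes:

CREATE EXTENSION file_fdw;
CREATE SERVER csv_server FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE prices (item text, price numeric)
    SERVER csv_server
    OPTIONS (filename '/tmp/prices.csv', format 'csv');
-- pg_dump --include-foreign-data=csv_server would emit the table's rows,
-- but replaying the resulting COPY against this table fails, because
-- file_fdw is read-only.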

The --include-foreign-data option for pg_dump is available as of Swarm64 DA 3.1.0 and PostgreSQL 13. If you are using an earlier version of PostgreSQL, you can apply the patch, which can easily be adapted to PostgreSQL 12 and 11.

To download the patch, go to the Swarm64 GitHub repository or find it on the PostgreSQL hackers mailing list.

I’d like to thank the PostgreSQL reviewers and my colleague Ursula Kallio for their contributions.


Build a Simplified ETL and Live Data Query Solution using Redshift Federated Query


Feed: AWS Big Data Blog.

You may have heard the saying that the best ETL is no ETL. Amazon Redshift now makes this possible with Federated Query. In its initial release, this feature lets you query data in Amazon Aurora PostgreSQL or Amazon RDS for PostgreSQL using Amazon Redshift external schemas. Federated Query also exposes the metadata from these source databases through system views and driver APIs, which allows business intelligence tools like Tableau and Amazon QuickSight to connect to Amazon Redshift and query data in PostgreSQL without having to make local copies. This enables a new data warehouse pattern—live data query—in which you can seamlessly retrieve data from PostgreSQL databases, or combine it in a late binding view that brings together operational PostgreSQL data, analytical Amazon Redshift local data, and historical Amazon Redshift Spectrum data in an Amazon S3 data lake.

Simplified ETL use case

For this ETL use case, you can simplify the familiar upsert pattern with a federated query. You can bypass the need for incremental extracts in Amazon S3 and the subsequent load via COPY by querying the data in place within its source database. This change can be a single line of code that replaces the COPY command with a query to an external table. See the following code:

BEGIN;
CREATE TEMP TABLE staging (LIKE ods.store_sales);
-- replace the following COPY from S3 
COPY staging FROM 's3://yourETLbucket/daily_store_sales/' 
     IAM_ROLE 'arn:aws:iam:::role/' DELIMITER '|' COMPUPDATE OFF;
-- with this federated query to load staging data from PostgreSQL source
INSERT INTO staging SELECT * FROM pg.store_sales p
	WHERE p.last_updated_date > (SELECT MAX(last_updated_date) FROM ods.store_sales);
DELETE FROM ods.store_sales USING staging s WHERE ods.store_sales.id = s.id;
INSERT INTO ods.store_sales SELECT * FROM staging;
DROP TABLE staging;
COMMIT;

In the preceding example, the table pg.store_sales resides in PostgreSQL, and you use a federated query to retrieve fresh data to load into a staging table in Amazon Redshift, keeping the actual delete and insert operations unchanged. This pattern is likely the most common application of federated queries.

Setting up an external schema

The external schema pg in the preceding example was set up as follows:

CREATE EXTERNAL SCHEMA IF NOT EXISTS pg                                                                         
FROM POSTGRES                                                                                                           
DATABASE 'dev' 
SCHEMA 'retail'                                                                                     
URI 'database-1.cluster-ro-samplecluster.us-east-1.rds.amazonaws.com'                                                    
PORT 5432                                                                                                               
IAM_ROLE 'arn:aws:iam::555566667777:role/myFederatedQueryRDS'                                                           
SECRET_ARN 'arn:aws:secretsmanager:us-east-1:555566667777:secret:MyRDSCredentials-TfzFSB'

If you’re familiar with the CREATE EXTERNAL SCHEMA command from using it in Spectrum, note some new parameter options to enable federated queries.

FROM POSTGRES                                                                                                           
DATABASE 'dev' 
SCHEMA 'retail'

Whereas Amazon Redshift Spectrum references an external data catalog that resides within AWS Glue, Amazon Athena, or Hive, this code points to a Postgres catalog. Also, expect more keywords used with FROM, as Amazon Redshift supports more source databases for federated querying. By default, if you do not specify SCHEMA, it defaults to public.

Within the target database, you identify DATABASE ‘dev’ and SCHEMA ‘retail’, so any query against an Amazon Redshift table pg.<table_name> gets issued to PostgreSQL as a request for retail.<table_name> in the dev database. For Amazon Redshift, query predicates are pushed down and run entirely in PostgreSQL, which reduces the result set returned to Amazon Redshift for subsequent operations (see the example query after the configuration notes below). Going further, the query planner derives cardinality estimates for external tables to optimize joins between Amazon Redshift and PostgreSQL. From the preceding example:

URI 'database-1.cluster-ro-samplecluster.us-east-1.rds.amazonaws.com'                                                    
PORT 5432

The URI and PORT parameters that reference both the PostgreSQL endpoint and port are self-explanatory, but there are a few things to consider in your configuration:

  • Use a read replica endpoint in Aurora or Amazon RDS for PostgreSQL to reduce load on the primary instance.
  • Set up your Amazon RDS for PostgreSQL instance, Aurora serverless or provisioned instances, and Amazon Redshift clusters to use the same VPC and subnet groups. That way, you can add the security group for the cluster to the inbound rules of the security group for the Aurora or Amazon RDS for PostgreSQL instance.
  • If both Amazon Redshift and Aurora or Amazon RDS for PostgreSQL are on different VPCs, set up VPC peering. For more information, see What is VPC Peering?
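
As a minimal illustration of the predicate pushdown mentioned above (reusing the pg.store_sales table from the ETL example; the literal date is arbitrary), the filter in the following query is evaluated inside PostgreSQL, so only matching rows are returned to Amazon Redshift:

-- The WHERE predicate is pushed down and executed in PostgreSQL;
-- only qualifying rows travel back to Amazon Redshift.
SELECT id, last_updated_date
FROM pg.store_sales
WHERE last_updated_date > '2020-01-01';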

Configuring AWS Secrets Manager for remote database credentials

To retrieve AWS Secrets Manager remote database credentials, our example uses the following code:

IAM_ROLE 'arn:aws:iam::555566667777:role/myFederatedQueryRDS'                                                           
SECRET_ARN 'arn:aws:secretsmanager:us-east-1:555566667777:secret:MyRDSCredentials-TfzFSB'

These two parameters are interrelated because the SECRET_ARN is also embedded in the IAM policy for the role.

If a service like Secrets Manager didn’t exist and you wanted to issue a federated query from Amazon Redshift to PostgreSQL, you would need to supply the database credentials to the CREATE EXTERNAL SCHEMA command via a parameter like CREDENTIALS, which you also use with the COPY command. However, this hardcoded approach doesn’t take into account that the PostgreSQL credentials could expire.

You avoid this problem by keeping PostgreSQL database credentials within Secrets Manager, which provides a centralized service to manage secrets. Because Amazon Redshift retrieves and uses these credentials, they are transient and not stored in any generated code and are discarded after query execution.

Storing credentials in Secrets Manager takes up to a few minutes. To store a new secret, complete the following steps:

  1. On the Secrets Manager console, choose Secrets.
  2. Choose Store a new secret.
  3. In the Store a new secret section, complete the following:
  • Supply your PostgreSQL database credentials
  • Name the secret; for example, MyRDSCredentials
  • Configure rotation (you can enable this at a later time)
  • Optionally, copy programmatic code for accessing your secret using your preferred programming languages (which is not needed for this post)
  4. Choose Next.

You can also retrieve the credentials easily.

  1. On the Secrets Manager console, choose your secret.
  2. Choose Retrieve secret value.

The following screenshot shows you the secret value details.

This secret is now an AWS resource referenced via a secret ARN. See the following screenshot.

Setting up an IAM role

You can now pull everything together by embedding the secret ARN into an IAM policy, naming the policy, and attaching it to an IAM role. See the following code:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AccessSecret",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetResourcePolicy",
                "secretsmanager:GetSecretValue",
                "secretsmanager:DescribeSecret",
                "secretsmanager:ListSecretVersionIds"
            ],
            "Resource": "arn:aws:secretsmanager:us-east-1:555566667777:secret:MyRDSCredentials-TfzFSB"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetRandomPassword",
                "secretsmanager:ListSecrets"
            ],
            "Resource": "*"
        }
    ]
}

The following screenshot shows the details of the IAM role called myFederatedQueryRDS, which contains the MyRDSSecretPolicy policy. It’s the same role that’s supplied in the IAM_ROLE parameter of the CREATE EXTERNAL SCHEMA DDL.

Finally, attach the same IAM role to your Amazon Redshift cluster.

  1. On the Amazon Redshift console, choose your cluster.
  2. From the Actions drop-down menu, choose Manage IAM roles.
  3. Choose and add the IAM role you just created.

You have now completed the following steps:

  1. Create an IAM policy and role
  2. Store your PostgreSQL database credentials in Secrets Manager
  3. Create an Amazon Redshift external schema definition that uses the secret and IAM role to authenticate with a PostgreSQL endpoint
  4. Apply a mapping between an Amazon Redshift database and schema to a PostgreSQL database and schema so Amazon Redshift may issue queries to PostgreSQL tables.

You only need to complete this configuration one time.
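
One way to confirm the setup (a sketch that assumes access to the Amazon Redshift system views) is to look for the new federated schema in svv_external_schemas:

-- Federated (FROM POSTGRES) schemas appear here alongside Spectrum schemas
SELECT schemaname, databasename, esoptions
FROM svv_external_schemas
WHERE schemaname = 'pg';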

Querying live operational data

This section explores another use case: querying operational data across multiple source databases. In this use case, a global online retailer has databases deployed by different teams across distinct geographies:

  • Region us-east-1 runs serverless Aurora PostgreSQL.
  • Region us-west-1 runs provisioned Aurora PostgreSQL, which is also configured as a global database with a read replica in us-east-1.
  • Region eu-west-1 runs an Amazon RDS for PostgreSQL instance with a read replica in us-east-1.

Serverless and provisioned Aurora PostgreSQL and Amazon RDS for PostgreSQL are visible in the Amazon RDS console in Region us-east-1. See the following screenshot:

For this use case, assume that you configured the read replicas for Aurora and Amazon RDS to share the same VPC and subnets in us-east-1 with the local serverless Aurora PostgreSQL. Furthermore, you have already created secrets for each of these instances’ credentials, along with a more permissive IAM policy, MyCombinedRDSSecretPolicy, that allows Amazon Redshift to retrieve the value of any Amazon RDS secret within any Region. Be mindful of security in actual production use, however, and explicitly specify the resource ARNs for each secret in separate statements in your IAM policy. See the following code:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AccessSecret",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetResourcePolicy",
                "secretsmanager:GetSecretValue",
                "secretsmanager:DescribeSecret",
                "secretsmanager:ListSecretVersionIds"
            ],
            "Resource": "arn:aws:secretsmanager:*:555566667777:secret:*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetRandomPassword",
                "secretsmanager:ListSecrets"
            ],
            "Resource": "*"
        }
    ]
}

External schema DDLs in Amazon Redshift can then reference the combined IAM role and individual secret ARNs. See the following code:

CREATE EXTERNAL SCHEMA IF NOT EXISTS useast
FROM POSTGRES
DATABASE 'dev'
URI 'us-east-1-aurora-pg-serverless.cluster-samplecluster.us-east-1.rds.amazonaws.com'
PORT 5432
IAM_ROLE 'arn:aws:iam::555566667777:role/MyCombinedRDSFederatedQuery'
SECRET_ARN 'arn:aws:secretsmanager:us-east-1:555566667777:secret:MyEastUSAuroraServerlessCredentials-dXOlEq'
;

CREATE EXTERNAL SCHEMA IF NOT EXISTS uswest
FROM POSTGRES
DATABASE 'dev'
URI 'global-aurora-pg-west-coast-stores-instance-1.samplecluster.us-east-1.rds.amazonaws.com'
PORT 5432
IAM_ROLE 'arn:aws:iam::555566667777:role/MyCombinedRDSFederatedQuery'
SECRET_ARN 'arn:aws:secretsmanager:us-west-1:555566667777:secret:MyWestUSAuroraGlobalDBCredentials-p3sV9m'
;

CREATE EXTERNAL SCHEMA IF NOT EXISTS europe
FROM POSTGRES
DATABASE 'dev'
URI 'eu-west-1-postgres-read-replica.samplecluster.us-east-1.rds.amazonaws.com'
PORT 5432
IAM_ROLE 'arn:aws:iam::555566667777:role/MyCombinedRDSFederatedQuery'
SECRET_ARN 'arn:aws:secretsmanager:eu-west-1:555566667777:secret:MyEuropeRDSPostgresCredentials-mz2u9L'
;

This late binding view abstracts the underlying queries to TPC-H lineitem test data within all PostgreSQL instances. See the following code:

CREATE VIEW global_lineitem AS
SELECT 'useast' AS region, * from useast.lineitem
UNION ALL
SELECT 'uswest', * from uswest.lineitem
UNION ALL
SELECT 'europe', * from europe.lineitem
WITH NO SCHEMA BINDING
;

Amazon Redshift can query live operational data across multiple distributed databases and aggregate results into a unified view with this feature. See the following code:

dev=# SELECT region, extract(month from l_shipdate) as month,
      sum(l_extendedprice * l_quantity) - sum(l_discount) as sales
      FROM global_lineitem
      WHERE l_shipdate >= '1997-01-01'
      AND l_shipdate < '1998-01-01'
      AND month < 4
      GROUP BY 1, 2
      ORDER BY 1, 2
;
 region | month |      sales
--------+-------+------------------
 europe |     1 | 16036160823.3700
 europe |     2 | 15089300790.7200
 europe |     3 | 16579123912.6700
 useast |     1 | 16176034865.7100
 useast |     2 | 14624520114.6700
 useast |     3 | 16645469098.8600
 uswest |     1 | 16800599170.4600
 uswest |     2 | 14547930407.7000
 uswest |     3 | 16595334825.9200
(9 rows)

If you examine Remote PG Seq Scan in the following query plan, you see that predicates are pushed down for execution in all three PostgreSQL databases. Unlike your initial simplified ETL use case, no ETL is performed because data is queried and filtered in place. See the following code:

dev=# EXPLAIN SELECT region, extract(month from l_shipdate) as month,
      sum(l_extendedprice * l_quantity) - sum(l_discount) as sales
FROM global_lineitem
WHERE l_shipdate >= '1997-01-01'
AND l_shipdate < '1998-01-01'
AND month < 4
GROUP BY 1, 2
ORDER BY 1, 2
;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 XN Merge  (cost=1000000060145.67..1000000060146.17 rows=200 width=100)
   Merge Key: derived_col1, derived_col2
   ->  XN Network  (cost=1000000060145.67..1000000060146.17 rows=200 width=100)
         Send to leader
         ->  XN Sort  (cost=1000000060145.67..1000000060146.17 rows=200 width=100)
               Sort Key: derived_col1, derived_col2
               ->  XN HashAggregate  (cost=60136.52..60138.02 rows=200 width=100)
                     ->  XN Subquery Scan global_lineitem  (cost=20037.51..60130.52 rows=600 width=100)
                           ->  XN Append  (cost=20037.51..60124.52 rows=600 width=52)
                                 ->  XN Subquery Scan "*SELECT* 1"  (cost=20037.51..20041.51 rows=200 width=52)
                                       ->  XN HashAggregate  (cost=20037.51..20039.51 rows=200 width=52)
                                             ->  XN PG Query Scan lineitem  (cost=0.00..20020.84 rows=1667 width=52)
                                                   ->  Remote PG Seq Scan useast.lineitem  (cost=0.00..20000.00 rows=1667 width=52)
                                                         Filter: ((l_shipdate < '1998-01-01'::date) AND (l_shipdate >= '1997-01-01'::date) AND ("date_part"('month'::text, l_shipdate) < 4))
                                 ->  XN Subquery Scan "*SELECT* 2"  (cost=20037.51..20041.51 rows=200 width=52)
                                       ->  XN HashAggregate  (cost=20037.51..20039.51 rows=200 width=52)
                                             ->  XN PG Query Scan lineitem  (cost=0.00..20020.84 rows=1667 width=52)
                                                   ->  Remote PG Seq Scan uswest.lineitem  (cost=0.00..20000.00 rows=1667 width=52)
                                                         Filter: ((l_shipdate < '1998-01-01'::date) AND (l_shipdate >= '1997-01-01'::date) AND ("date_part"('month'::text, l_shipdate) < 4))
                                 ->  XN Subquery Scan "*SELECT* 3"  (cost=20037.51..20041.51 rows=200 width=52)
                                       ->  XN HashAggregate  (cost=20037.51..20039.51 rows=200 width=52)
                                             ->  XN PG Query Scan lineitem  (cost=0.00..20020.84 rows=1667 width=52)
                                                   ->  Remote PG Seq Scan europe.lineitem  (cost=0.00..20000.00 rows=1667 width=52)
                                                         Filter: ((l_shipdate < '1998-01-01'::date) AND (l_shipdate >= '1997-01-01'::date) AND ("date_part"('month'::text, l_shipdate) < 4))
(24 rows)

Combining the data lake, data warehouse, and live operational data

In this next use case, you join Amazon Redshift Spectrum historical data with current data in Amazon Redshift and live data in PostgreSQL. You use a 3TB TPC-DS dataset and unload data from 1998 through 2001 from the store_sales table in Amazon Redshift to Amazon S3. The unloaded files are stored in Parquet format with ss_sold_date_sk as partitioning key.

To access this historical data via Amazon Redshift Spectrum, create an external table. See the following code:

CREATE EXTERNAL TABLE spectrum.store_sales_historical
(
  ss_sold_time_sk int ,
  ss_item_sk int ,
  ss_customer_sk int ,
  ss_cdemo_sk int ,
  ss_hdemo_sk int ,
  ss_addr_sk int ,
  ss_store_sk int ,
  ss_promo_sk int ,
  ss_ticket_number bigint,
  ss_quantity int ,
  ss_wholesale_cost numeric(7,2) ,
  ss_list_price numeric(7,2) ,
  ss_sales_price numeric(7,2) ,
  ss_ext_discount_amt numeric(7,2) ,
  ss_ext_sales_price numeric(7,2) ,
  ss_ext_wholesale_cost numeric(7,2) ,
  ss_ext_list_price numeric(7,2) ,
  ss_ext_tax numeric(7,2) ,
  ss_coupon_amt numeric(7,2) ,
  ss_net_paid numeric(7,2) ,
  ss_net_paid_inc_tax numeric(7,2) ,
  ss_net_profit numeric(7,2)
)
PARTITIONED BY (ss_sold_date_sk int)
STORED AS PARQUET
LOCATION 's3://mysamplebucket/historical_store_sales/';   

The external spectrum schema is defined as the following:

CREATE EXTERNAL SCHEMA spectrum
FROM data catalog DATABASE 'spectrumdb'
IAM_ROLE 'arn:aws:iam::555566667777:role/mySpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

Instead of an Amazon S3 read-only policy, the IAM role mySpectrumRole contains both AmazonS3FullAccess and AWSGlueConsoleFullAccess policies, in which the former allows Amazon Redshift writes to Amazon S3. See the following code:

UNLOAD ('SELECT * FROM tpcds.store_sales WHERE ss_sold_date_sk < 2452276')
TO 's3://mysamplebucket/historical_store_sales/'
IAM_ROLE 'arn:aws:iam::555566667777:role/mySpectrumRole'
FORMAT AS PARQUET
PARTITION BY (ss_sold_date_sk) ALLOWOVERWRITE;

To make partitioned data visible, the ALTER TABLE ... ADD PARTITION command needs to specify all partition values. For this use case, 2450816 through 2452275 correspond to dates 1998-01-02 through 2001-12-31, respectively. To generate these DDLs quickly, use the following code:

WITH partitions AS (SELECT * FROM generate_series(2450816, 2452275))
SELECT 'ALTER TABLE spectrum.store_sales_historical ADD PARTITION (ss_sold_date_sk=' || generate_series || ') '
    || 'LOCATION ''s3://mysamplebucket/historical_store_sales/ss_sold_date_sk=' || generate_series || '/'';'
FROM partitions;

You can run the generated ALTER TABLE statements individually or as a batch to make partition data visible. See the following code:

ALTER TABLE spectrum.store_sales_historical 
ADD PARTITION (ss_sold_date_sk=2450816)
LOCATION 's3://mysamplebucket/historical_store_sales/ss_sold_date_sk=2450816/';
-- repeated for all partition values

The three combined sources in the following view consist of historical data in Amazon S3 for 1998 through 2001, current data local to Amazon Redshift for 2002, and live data for two months of 2003 in PostgreSQL. When you create this late binding view, you have to re-order Amazon Redshift Spectrum external table columns because the previous UNLOAD operation specifying ss_sold_date_sk as partition key shifted that column’s order to last. See the following code:

CREATE VIEW store_sales_integrated AS
SELECT * FROM uswest.store_sales_live
UNION ALL
SELECT * FROM tpcds.store_sales_current
UNION ALL
SELECT ss_sold_date_sk, ss_sold_time_sk, ss_item_sk, ss_customer_sk, ss_cdemo_sk, 
       ss_hdemo_sk, ss_addr_sk, ss_store_sk, ss_promo_sk, ss_ticket_number, 
       ss_quantity, ss_wholesale_cost, ss_list_price, ss_sales_price, 
       ss_ext_discount_amt, ss_ext_sales_price, ss_ext_wholesale_cost, 
       ss_ext_list_price, ss_ext_tax, ss_coupon_amt, ss_net_paid, 
       ss_net_paid_inc_tax, ss_net_profit
FROM spectrum.store_sales_historical
WITH NO SCHEMA BINDING;

You can now run a query on the view to aggregate date and join tables across the three sources. See the following code:

dev=# SELECT extract(year from b.d_date), count(a.ss_sold_date_sk)
FROM store_sales_integrated a
JOIN tpcds.date_dim b on (a.ss_sold_date_sk = b.d_date_sk)
GROUP BY 1
ORDER BY 1
;
 date_part |   count
-----------+------------
      1998 | 1632403114
      1999 | 1650163390
      2000 | 1659168880
      2001 | 1641184375
      2002 | 1650209644
      2003 |   17994540
(6 rows)

Time: 77624.926 ms (01:17.625)

This federated query ran on a two-node DC2.8XL cluster and took 1 minute and 17 seconds to join store sales in Amazon S3, PostgreSQL, and Amazon Redshift with the date dimension table in Amazon Redshift, aggregating and sorting row counts by year. The following code shows the query plan:

dev=# EXPLAIN SELECT extract(year from b.d_date), count(a.ss_sold_date_sk)
FROM store_sales_integrated a
JOIN tpcds.date_dim b on (a.ss_sold_date_sk = b.d_date_sk)
GROUP BY 1
ORDER BY 1;

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 XN Merge  (cost=1461036320912.29..1461036321094.91 rows=73049 width=8)
   Merge Key: "date_part"('year'::text, b.d_date)
   ->  XN Network  (cost=1461036320912.29..1461036321094.91 rows=73049 width=8)
         Send to leader
         ->  XN Sort  (cost=1461036320912.29..1461036321094.91 rows=73049 width=8)
               Sort Key: "date_part"('year'::text, b.d_date)
               ->  XN HashAggregate  (cost=461036314645.93..461036315011.18 rows=73049 width=8)
                     ->  XN Hash Join DS_DIST_ALL_NONE  (cost=913.11..428113374829.91 rows=6584587963204 width=8)
                           Hash Cond: ("outer".ss_sold_date_sk = "inner".d_date_sk)
                           ->  XN Subquery Scan a  (cost=0.00..263498674836.70 rows=6584587963204 width=4)
                                 ->  XN Append  (cost=0.00..197652795204.66 rows=6584587963204 width=4)
                                       ->  XN Subquery Scan "*SELECT* 1"  (cost=0.00..539836.20 rows=17994540 width=4)
                                             ->  XN PG Query Scan store_sales_live  (cost=0.00..359890.80 rows=17994540 width=4)
                                                   ->  Remote PG Seq Scan uswest.store_sales_live  (cost=0.00..179945.40 rows=17994540 width=4)
                                       ->  XN Subquery Scan "*SELECT* 2"  (cost=0.00..33004193.28 rows=1650209664 width=4)
                                             ->  XN Seq Scan on store_sales_current  (cost=0.00..16502096.64 rows=1650209664 width=4)
                                       ->  XN Subquery Scan "*SELECT* 3"  (cost=0.00..197619251175.18 rows=6582919759000 width=4)
                                             ->  XN Partition Loop  (cost=0.00..131790053585.18 rows=6582919759000 width=4)
                                                   ->  XN Seq Scan PartitionInfo of spectrum.store_sales_historical  (cost=0.00..10.00 rows=1000 width=4)
                                                   ->  XN S3 Query Scan store_sales_historical  (cost=0.00..131658395.18 rows=6582919759 width=0)
                                                         ->  S3 Seq Scan spectrum.store_sales_historical location:"s3://mysamplebucket/historical_store_sales" format:PARQUET (cost=0.00..65829197.59 rows=6582919759 width=0)
                           ->  XN Hash  (cost=730.49..730.49 rows=73049 width=8)
                                 ->  XN Seq Scan on date_dim b  (cost=0.00..730.49 rows=73049 width=8)
(23 rows)

The query plan shows full sequential scans running on the three source tables, with the returned row counts totaling approximately 8.2 billion. Because Amazon Redshift Spectrum does not generate statistics for external tables, you manually set the numRows property to the row count for the historical data in Amazon S3. See the following code:

ALTER TABLE spectrum.store_sales_historical SET TABLE PROPERTIES ('numRows' = '6582919759');
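
To confirm that the property took effect, one option (a sketch; the exact columns available may vary by Amazon Redshift version) is to inspect the table in the svv_external_tables system view:

SELECT schemaname, tablename, parameters
FROM svv_external_tables
WHERE schemaname = 'spectrum'
  AND tablename  = 'store_sales_historical';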

You can join with another dimension table local to Amazon Redshift, this time the 30 million row customer table, and filter by column c_birth_country. See the following code:

dev=# SELECT extract(year from b.d_date), count(a.ss_sold_date_sk)
FROM store_sales_integrated a
JOIN tpcds.date_dim b on (a.ss_sold_date_sk = b.d_date_sk)
JOIN tpcds.customer c on (a.ss_customer_sk = c.c_customer_sk)
AND c.c_birth_country = 'UNITED STATES'
GROUP BY 1
ORDER BY 1
;
 date_part |  count
-----------+---------
      1998 | 7299277
      1999 | 7392156
      2000 | 7416905
      2001 | 7347920
      2002 | 7390590
      2003 |   81627
(6 rows)

Time: 77878.586 ms (01:17.879)

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 XN Merge  (cost=1363288861214.20..1363288861396.83 rows=73049 width=8)
   Merge Key: "date_part"('year'::text, b.d_date)
   ->  XN Network  (cost=1363288861214.20..1363288861396.83 rows=73049 width=8)
         Send to leader
         ->  XN Sort  (cost=1363288861214.20..1363288861396.83 rows=73049 width=8)
               Sort Key: "date_part"('year'::text, b.d_date)
               ->  XN HashAggregate  (cost=363288854947.85..363288855313.09 rows=73049 width=8)
                     ->  XN Hash Join DS_DIST_ALL_NONE  (cost=376252.50..363139873158.03 rows=29796357965 width=8)
                           Hash Cond: ("outer".ss_sold_date_sk = "inner".d_date_sk)
                           ->  XN Hash Join DS_BCAST_INNER  (cost=375339.39..362394963295.79 rows=29796357965 width=4)
                                 Hash Cond: ("outer".ss_customer_sk = "inner".c_customer_sk)
                                 ->  XN Subquery Scan a  (cost=0.00..263498674836.70 rows=6584587963204 width=8)
                                       ->  XN Append  (cost=0.00..197652795204.66 rows=6584587963204 width=8)
                                             ->  XN Subquery Scan "*SELECT* 1"  (cost=0.00..539836.20 rows=17994540 width=8)
                                                   ->  XN PG Query Scan store_sales_live  (cost=0.00..359890.80 rows=17994540 width=8)
                                                         ->  Remote PG Seq Scan uswest.store_sales_live  (cost=0.00..179945.40 rows=17994540 width=8)
                                             ->  XN Subquery Scan "*SELECT* 2"  (cost=0.00..33004193.28 rows=1650209664 width=8)
                                                   ->  XN Seq Scan on store_sales_current  (cost=0.00..16502096.64 rows=1650209664 width=8)
                                             ->  XN Subquery Scan "*SELECT* 3"  (cost=0.00..197619251175.18 rows=6582919759000 width=8)
                                                   ->  XN Partition Loop  (cost=0.00..131790053585.18 rows=6582919759000 width=8)
                                                         ->  XN Seq Scan PartitionInfo of spectrum.store_sales_historical  (cost=0.00..10.00 rows=1000 width=4)
                                                         ->  XN S3 Query Scan store_sales_historical  (cost=0.00..131658395.18 rows=6582919759 width=4)
                                                               ->  S3 Seq Scan spectrum.store_sales_historical location:"s3://mysamplebucket/historical_store_sales" format:PARQUET (cost=0.00..65829197.59 rows=6582919759 width=4)
                                 ->  XN Hash  (cost=375000.00..375000.00 rows=135755 width=4)
                                       ->  XN Seq Scan on customer c  (cost=0.00..375000.00 rows=135755 width=4)
                                             Filter: ((c_birth_country)::text = 'UNITED STATES'::text)
                           ->  XN Hash  (cost=730.49..730.49 rows=73049 width=8)
                                 ->  XN Seq Scan on date_dim b  (cost=0.00..730.49 rows=73049 width=8)
(28 rows)

Query performance hardly changed from the previous query. Because the query only scans one column (ss_sold_date_sk), it benefits from Parquet’s columnar structure for the historical data subquery. By contrast, if the historical data were stored as CSV, all of the data would be scanned, which would degrade performance significantly.

Additionally, the TPC-DS model does not store date values in the store_sales fact table. Instead, a foreign key references the date_dim table. If you plan on implementing something similar but frequently filter by a date column, consider adding that column to the fact table, making it a sort key, and also using it as the partitioning column in Amazon Redshift Spectrum. That way, Amazon Redshift can more efficiently skip blocks for local data and prune partitions for Amazon S3 data, and it can also push the filtering criteria down to Amazon Redshift Spectrum.
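
For example, a local fact table designed along those lines might look like the following sketch (hypothetical table name, column subset only):

CREATE TABLE tpcds.store_sales_by_date (
    ss_sold_date    date,           -- the actual date, not just the date_dim surrogate key
    ss_item_sk      int,
    ss_customer_sk  int,
    ss_net_paid     numeric(7,2)
)
DISTKEY (ss_item_sk)
SORTKEY (ss_sold_date);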

Conclusion

Applications of live data integration in real-world scenarios include data discovery, data preparation for machine learning, operational analytics, IoT telemetry analytics, fraud detection, and compliance and security audits. Whereas Amazon Redshift Spectrum extends the reach of Amazon Redshift into the AWS data lake, Federated Query extends its reach into operational databases and beyond.

For more information about data type differences between these databases, see Data Type Differences Between Amazon Redshift and Supported RDS PostgreSQL or Aurora PostgreSQL Databases. For more information about accessing federated data with Amazon Redshift, see Limitations and Considerations When Accessing Federated Data with Amazon Redshift.


About the Authors

Tito Mijares is a Data Warehouse Specialist Solutions Architect at AWS. He helps AWS customers adopt and optimize their use of Amazon Redshift. Outside of AWS, he enjoys playing jazz guitar and working on audio recording and playback projects.

Entong Shen is a Senior Software Development Engineer for AWS Redshift. He has been working on MPP databases for over 8 years and has focused on query optimization, statistics and SQL language features such as stored procedures and federated query. In his spare time, he enjoys listening to music of all genres and working in his succulent garden.

Niranjan Kamat is a software engineer on the Amazon Redshift query processing team. His focus of PhD research was in interactive querying over large databases. In Redshift, he has worked in different query processing areas such as query optimization, analyze command and statistics, and federated querying. In his spare time, he enjoys playing with his three year old daughter, practicing table tennis (was ranked in top 10 in Ohio, USATT rating 2143), and chess.

Sriram Krishnamurthy is a Software Development Manager for AWS Redshift Query Processing team. He is passionate about Databases and has been working on Semi Structured Data Processing and SQL Compilation & Execution for over 15 years. In his free time, you can find him on the tennis court, often with his two young daughters in tow.

Laurenz Albe: PostgreSQL v13 new feature: tuning autovacuum on insert-only tables


Feed: Planet PostgreSQL.

vacuuming insert-only carpets
© Laurenz Albe 2020

Most people know that autovacuum is necessary to get rid of dead tuples. These dead tuples are a side effect of PostgreSQL’s MVCC implementation. So many people will be confused when they read that from PostgreSQL v13 on, commit b07642dbc adds support for autovacuuming insert-only tables (also known as “append-only tables”).

This article explains the reasons behind that and gives some advice on how to best use the new feature. It will also explain how to achieve similar benefits in older PostgreSQL releases.

Note that all that I say here about insert-only tables also applies to insert-mostly tables, which are tables that receive only a few updates and deletes.

How insert-triggered autovacuum works

From v13 on, PostgreSQL will gather statistics on how many rows were inserted since a table last received a VACUUM. You can see this new value in the new “n_ins_since_vacuum” column of the pg_stat_all_tables catalog view (and in pg_stat_user_tables and pg_stat_sys_tables).
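
For example, a simple monitoring query (nothing beyond the standard statistics views, on v13 or later) shows which tables have accumulated the most inserts since they were last vacuumed:

-- tables with the most rows inserted since their last vacuum
SELECT relname,
       n_ins_since_vacuum,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_ins_since_vacuum DESC
LIMIT 10;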

Autovacuum runs on a table whenever that count exceeds a certain value. This value is calculated from the two new parameters “autovacuum_vacuum_insert_threshold” (default 1000) and “autovacuum_vacuum_insert_scale_factor” (default 0.2) as follows:

insert_threshold + insert_scale_factor * reltuples

where reltuples is the estimate for the number of rows in the table, taken from the pg_class catalog.
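
For example, with the default settings, a table whose reltuples estimate is 1,000,000 gets autovacuumed once 1000 + 0.2 * 1,000,000 = 201,000 rows have been inserted since the last VACUUM.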

Like other autovacuum parameters, you can override autovacuum_vacuum_insert_threshold and autovacuum_vacuum_insert_scale_factor with storage parameters of the same name for individual tables. You can disable the new feature by setting autovacuum_vacuum_insert_threshold to -1.

You can use “toast.autovacuum_vacuum_insert_threshold” and “toast.autovacuum_vacuum_insert_scale_factor” to change the parameters for the associated TOAST table.

Use case 1: “anti-wraparound” vacuum on insert-only tables

Why do insert-only tables need VACUUM?

PostgreSQL stores transaction IDs in the xmin and xmax system columns to determine which row version is visible to which query. These transaction IDs are unsigned 4-byte integer values, so after slightly more than 4 billion transactions the counter hits the upper limit. Then it “wraps around” and starts again at 3.

As described in this blog post, that method would cause data loss after about 2 billion transactions. So old table rows must be “frozen” (marked as unconditionally visible) before that happens. This is one of the many jobs of the autovacuum daemon.

Why anti-wraparound vacuum on insert-only tables can be a problem

The problem is that PostgreSQL only triggers such “anti-wraparound” runs once the oldest unfrozen table row is more than 200 million transactions old. For an insert-only table, this is normally the first time ever that autovacuum runs on a table. There are two potential problems with that:

  • The anti-wraparound vacuum doesn’t get done in time. Then, one million transactions before it would suffer data corruption, PostgreSQL will not accept any new transactions. You have to start it in single-user mode and run VACUUM manually. You can find a description of such cases in this and this blog.
  • Different from other autovacuum runs, anti-wraparound autovacuum will not give up when it blocks a concurrent transaction. This will block even short operations that require an ACCESS EXCLUSIVE lock (like DDL statements on the table). Such a blocked operation will block all other access to the table, and processing comes to a standstill. You can find such a case described in this blog.

How to protect yourself from disruptive anti-wraparound vacuums

From PostgreSQL v13 on, the default settings should already protect you from this problem. This was indeed the motivation behind the new feature.

If it is truly an insert-only table, you should also set the vacuum_freeze_min_age storage option to 0 for that table, so that autovacuum freezes as many tuples as it can:

ALTER TABLE mytable SET (
   vacuum_freeze_min_age = 0
);

With a table that also receives some updates and deletes, you should use a higher value of vacuum_freeze_min_age to avoid unnecessary I/O.

For PostgreSQL versions older than v13, you can achieve a similar effect by triggering anti-wraparound vacuum earlier, so that it becomes less disruptive. For example, if you want to vacuum a table every 100000 transactions, you can set these storage parameters:

ALTER TABLE mytable SET (
   autovacuum_freeze_max_age = 100000,
   vacuum_freeze_min_age = 0
);

Use case 2: index-only scans on insert-only tables

How index-only scans work in PostgreSQL

As mentioned above, each row contains the information for which transactions it is visible. However, the index does not contain this information. Now if you consider an SQL query like this:

SELECT count(*) FROM mytable WHERE id < 100;

where you have an index on id, all the information you need is available in the index. So you should not need to fetch the actual table row (“heap fetch”), which is the expensive part of an index scan. But unfortunately you have to visit the table row anyway, just to check if the index entry is visible or not.

To work around that, PostgreSQL has a shortcut that makes index-only scans possible: the visibility map. This data structure stores two bits per 8kB table block, one of which indicates if all rows in the block are visible to all transactions. If a query scans an index entry and finds that the block containing the referenced table row is all-visible, it can skip checking visibility for that entry.

So you can have index-only scans in PostgreSQL if most blocks of a table are marked all-visible in the visibility map.

The problem with index-only scans on insert-only tables

Since VACUUM removes dead tuples, which is required to make a table block all-visible, it is also VACUUM that updates the visibility map. So to have most blocks all-visible in order to get an index-only scan, VACUUM needs to run on the table often enough.

Now if a table receives enough UPDATEs or DELETEs, you can set autovacuum_vacuum_scale_factor to a low value like 0.005. Then autovacuum will keep the visibility map in good shape.
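
For example, a per-table setting along those lines looks like this:

ALTER TABLE mytable SET (
   autovacuum_vacuum_scale_factor = 0.005
);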

But with an insert-only table, it is not as simple to get index-only scans before PostgreSQL v13. One report of a problem related to that is here.

How to get index-only scans on insert-only tables

From PostgreSQL v13 on, all you have to do is to lower autovacuum_vacuum_insert_scale_factor on the table:

ALTER TABLE mytable SET (
   autovacuum_vacuum_insert_scale_factor = 0.005
);

In older PostgreSQL versions, this is more difficult. You have two options:

  • schedule regular VACUUM runs with cron or a different scheduler
  • set autovacuum_freeze_max_age low for that table, so that autovacuum processes it often enough

Use case 3: hint bits on insert-only tables

In PostgreSQL, the first query that reads a newly created row has to consult the commit log to figure out if the transaction that created the row was committed or not. It then sets a hint bit on the row that persists that information. That way, the first reader saves future readers the effort of checking the commit log.

As a consequence, the first reader of a new row “dirties” (modifies in memory) the block that contains it. If a lot of rows were recently inserted in a table, that can cause a performance hit for the first reader. Therefore, it is considered good practice in PostgreSQL to VACUUM a table after you insert (or COPY) a lot of rows into it.
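
A minimal sketch of that practice (the file path is only illustrative):

COPY mytable FROM '/path/to/bulk_data.csv' (FORMAT csv);
-- an explicit VACUUM sets hint bits (and the visibility map) right away,
-- so the first readers don't have to
VACUUM (ANALYZE) mytable;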

But people don’t always follow that recommendation. Also, if you want to write software that supports several database systems, it is annoying to have to add special cases for individual systems. With the new feature, PostgreSQL automatically vacuums insert-only tables after large inserts, so you have one less thing to worry about.

Future work

During the discussion for the new feature we saw that there is still a lot of room for improvement. Autovacuum is already quite complicated (just look at the many configuration parameters) and still does not do everything right. For example, truly insert-only tables would benefit from freezing rows right away. On the other hand, for tables that receive some updates or deletes as well as for table partitions that don’t live long enough to reach wraparound age, such aggressive freezing can lead to unnecessary I/O activity.

One promising idea Andres Freund propagated was to freeze all tuples in a block whenever the block becomes dirty, that is, has to be written anyway.

The fundamental problem is that autovacuum serves so many different purposes. Basically, it is the silver bullet that should solve all of the problems of PostgreSQL’s MVCC architecture. That is why it is so complicated. However, it would take a major redesign to improve that situation.

Conclusion

While it seems to be an oxymoron at first glance, autovacuum for insert-only tables mitigates several problems that large databases used to suffer from.

In a world where people collect “big data”, it becomes even more important to keep such databases running smoothly. With careful tuning, that was possible even before PostgreSQL v13. But autovacuum is not simple to tune, and many people lack the required knowledge. So it is good to have new autovacuum functionality that takes care of more potential problems automatically.

Unexpected slow ALTER TABLE in MySQL 5.7


Feed: Planet MySQL.
Authors: Alexander Rubin and Alexandre Vaniachine | April 23, 2020 | Posted In: Intermediate Level, MySQL, Percona Server for MySQL

Usually one would expect that ALTER TABLE with ALGORITHM=COPY will be slower than the default ALGORITHM=INPLACE. In this blog post, we describe a case where this is not so.

One of the reasons for this behavior is a lesser-known limitation of ALTER TABLE (with the default ALGORITHM=INPLACE): it avoids redo logging. As a result, all dirty pages of the altered table/tablespace have to be flushed before the ALTER TABLE can complete.

Some history

A long time ago, all “ALTER TABLE” (DDLs) operations in MySQL were implemented by creating a new table with the new structure, then copying the content of the original table to the new table, and finally renaming the table. During this operation the table was locked to prevent data inconsistency.

Then, for InnoDB tables, new algorithms were introduced which do not involve a full table copy, and some operations do not take the table-level lock: first the online add index algorithm was introduced for InnoDB, then non-blocking column adds and other online DDLs. For the list of all online DDLs in MySQL 5.7, you can refer to this document.

The problem

Online DDLs are great for common operations like adding or dropping a column; however, we have found that they can be significantly slower than expected. For example, adding a field to a large table on a “beefy” server with 128 GB of RAM can take an unexpectedly long time.

In one of our “small” Percona Servers, it took a little more than 5 min to add a column to the 13 GB InnoDB table. Yet on another “large” Percona Server, where the same table was 30 GB in size, it took more than 4 hours to add the same column.

Investigating the issue

After verifying that the disk I/O throughput is the same on both servers, we investigated the reason for such a large difference in the duration of the ALTER TABLE helios ADD COLUMN query, using Percona Monitoring and Management (PMM) to record and review performance.

On the smaller server, where ALTER TABLE was faster, the relevant PMM monitoring plots show:

In our Percona Server version 5.7, ALTER TABLE helios ADD COLUMN  was executed in place. On the left, we can observe a steady rate of the table rebuild, followed by four spikes corresponding to rebuilding of the four indices.

What is also interesting is that ALTER TABLE with ALGORITHM=INPLACE (which is the default when adding a field) needs to force flushing of all dirty pages and wait until that is done. This is a much less known fact and very sparsely documented. The reason is that undo and redo logging is disabled for this operation:

No undo logging or associated redo logging is required for ALGORITHM=INPLACE. These operations add overhead to DDL statements that use ALGORITHM=COPY.
https://dev.mysql.com/doc/refman/5.7/en/innodb-online-ddl-operations.html

In this situation the only option is to flush all dirty pages, otherwise the data can become inconsistent. There’s a special treatment to be seen for ALTER TABLE in Percona Server for MySQL.

Back to our situation – during table rebuild, InnoDB buffer pool becomes increasingly dirty:

The graph shows a peak at about 9 GB, corresponding to the table data size. Originally we were under the impression that, as dirty pages are flushed to disk, the volume of in-memory dirty pages decreases at the rate determined by the Percona adaptive flushing algorithm. It turns out that flushing by ALTER and adaptive flushing are unrelated: both happen concurrently. Flushing by ALTER is single-page flushing, done by iterating over the pages in the flush list and flushing pages of the desired space_id one by one. That probably explains why a server with more RAM can be slower to flush: it has to scan a larger list.
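
If you want to observe this behavior on your own server while an ALTER TABLE is running, the standard InnoDB status counters are sufficient (a simple check that does not require PMM):

-- dirty pages still waiting to be flushed, versus the total buffer pool size
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_total';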

After the last buffer pool I/O request (from the last index build) ends, the algorithm increases the rate of flushing for the remaining dirty pages. The ALTER TABLE finishes when there are no more dirty pages left in the memory.

You can see the six-fold increase in the I/O rate clearly in the plot below:

In contrast, on the “large” server, ALTER TABLE behaved differently, although at the beginning it proceeded in a similar way:

On the left, we can observe a steady rate of the table rebuild, followed by four spikes corresponding to rebuilding of the four table indices. During table rebuild the buffer pool became increasingly dirty:

Following the 21 GB of table data, there are four kinks corresponding to the four index builds. It takes about twenty minutes to complete this part of the ALTER TABLE processing of the 30 GB table. To some degree this is comparable to the roughly four minutes needed for the similar part of the ALTER TABLE processing of the 13 GB table. However, the adaptive flushing algorithm behaved differently on that server: it took more than four hours to complete the flushing of dirty pages from memory.

This is because in contrast to the “small” server, the buffer pool I/O remained extremely low:

This is not a hardware limitation, as PMM monitoring shows that at other times, the “large” server demonstrated ten times higher buffer pool I/O rates, e.g.:

Conclusion

Beware of the slower performance of ALTER TABLE … ADD COLUMN (the default algorithm is INPLACE). On a large server the difference can be significant: the smaller the buffer pool, the smaller the flush list and the faster the flushing, because the ALTER TABLE has a smaller flush list to iterate over. In some cases it may be better (and offer more predictable timing) to use ALTER TABLE with ALGORITHM=COPY.
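
For reference, here is a sketch of forcing the copy algorithm explicitly (the column name is illustrative; note that ALGORITHM=COPY requires at least a shared lock, so concurrent writes are blocked for the duration):

ALTER TABLE helios
    ADD COLUMN new_flag TINYINT NOT NULL DEFAULT 0,
    ALGORITHM=COPY, LOCK=SHARED;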

About VirtualHealth

VirtualHealth created HELIOS, the first SaaS solution purpose-built for value-based healthcare. Utilized by some of the most innovative health plans in the country to manage millions of members, HELIOS streamlines person-centered care with intelligent case and disease management workflows, unmatched data integration, broad-spectrum collaboration, patient engagement, and configurable analytics and reporting. Named one of the fastest-growing companies in North America by Deloitte in 2018 and 2019, VirtualHealth empowers healthcare organizations to achieve enhanced outcomes, while maximizing efficiency, improving transparency, and lowering costs. For more information, visit www.virtualhealth.com.
The content in this blog is provided in good faith by members of the open source community. Percona has not edited or tested the technical content. Views expressed are the authors’ own. When using the advice from this or any other online resource test ideas before applying them to your production systems, and always secure a working back up.

Alexander Rubin

Alexander has worked with MySQL since 2000 as a DBA and Application Developer. He has performed MySQL consulting as a principal consultant/architect for over 13 years, starting with MySQL AB in 2006, then Sun Microsystems, then Oracle and then Percona. Alex has helped many customers design large, scalable, and highly available MySQL systems and optimize MySQL performance. He has also helped customers design Big Data stores with Apache Hadoop and related technologies. Currently, Alexander is a Director of Data Architecture at VirtualHealth.

Alexandre Vaniachine

Alexandre is an Open Source enthusiast experienced in troubleshooting and resolving scalability issues across the full software stack.
In industry and academia, he scaled up data processing from terabytes to petabytes while keeping data losses below an acceptable level.
Early in his career, Alexandre pioneered the deployment of MySQL databases on virtual machines. Alexandre is a Senior MySQL DBA at VirtualHealth.

Fun with Bugs #97 – On MySQL Bug Reports I am Subscribed to, Part XXXI


Feed: Planet MySQL.
Author: Valeriy Kravchuk

Time for the next blog post about MySQL bugs! The previous one covered some bugs I considered interesting in March; this one will be about community bug reports that I’ve picked up since March 26, 2020. I’d rather review the bugs fixed in MySQL 5.7.30 instead, but it is still not released, even though we know it must get some important security fixes, based on the recently published “Oracle Critical Patch Update Advisory – April 2020”.

As usual, I am mostly interested in replication, InnoDB, the optimizer, and a few more categories. Here is the list:

  • Bug #99082 – “Problem with replication: XA transaction, temp tables and row based binlog“. As noted by Libor Laichmann, creating temporary tables inside an XA transaction leads to broken replication in MySQL 5.7.29, with a somewhat misleading error message. This is unfortunate. We do not yet see any evidence of whether the 5.6 and 8.0 branches are similarly affected.
  • Bug #99094 – “coredump when install information schema plugin“. The bug reporter, Lou Shuai, tried to create a simple plugin for the information_schema and it crashed MySQL server 8.0.19 upon an installation attempt. Both the plugin source code and a patch for the bug were contributed.
  • Bug #99100 – “GROUP BY will throw table is full when temptable memory allocation exceed limit“. More fun with the TempTable storage engine that I blamed previously. This bug was reported by Dean Zhou, who performed a detailed analysis in gdb and suggested a fix. It took some effort for the bug reporter to get it “Verified” as a regression bug in MySQL 8.0.19 (without a “regression” tag, surely).
  • Bug #99101 – “SELECT .. LOCK IN SHARE MODE requires LOCK TABLES grant“. Simple regression in MySQL 8.0.11+ (or incompatible change in behavior, if you prefer) vs MySQL 5.7 was found and reported by Matthew Boehm.
  • Bug #99136 – “TempTable wastes 1MB for each connection in thread cache“. Nikolai Ikhalainen demonstrated this additional memory usage in MySQL 8.0.16 comparing to 5.7 quite clearly.
  • Bug #99174 – “Prepare transaction won’t rollback if server crash after online ddl prepare stage“. This bug was reported by Zhang Xiaojian. Additional debugging code (one DBUG_EXECUTE_IF statement) was added to the source to demonstrate the problem easily, and it caused some questionable arguments of this kind:

    “First of all, changing our server code in order to cause a bug, can not be considered a repeatable test case.”

    But the bug reporter provided more details and a clear test case, and did not agree with the above. The bug was soon “Verified”, even though no attempts were made in public to check (or explain) whether 5.7 may be similarly affected, so we still have to wonder if this is a regression.

  • Bug #99180 – “Accessing freed memory in perfschema when aggregating status vars“. Let me just quote Manuel Ung:

    “When aggregate_thread_status is called for other threads, it’s possible for that thread to have exited and freed the THD between the time we check that the thread was valid, until the time we call get_thd_status_var.”

    Ironically, he had to add some conditional my_sleep() call to server code to get a repeatable test case, and this was NOT a problem for a good Oracle engineer to verify the bug immediately. There are still bugs in my beloved Performance Schema. Who could imagine that?

  • Bug #99200 – “CREATE USER get stuck on SHOW PROCESSLIST and ps.threads on slave“. So, a slave may disclose some sensitive information. As demonstrated by Marcelo Altmann, if a query has been rewritten by the parser because it contains sensitive information, it won’t be cleaned up when the slave’s SQL thread applies it, making it visible in SHOW PROCESSLIST and performance_schema.threads. Both 5.7.29 and 8.0.19 are affected. The bug reporter contributed fixes for both versions. See also another Performance Schema bug that he reported and contributed a fix for, Bug #99204 – “performance_schema threads table PROCESSLIST_INFO incorrect behaviour“.
  • Bug #99205 – “xa prepare write binlog while execute failed“. Then the XA PREPARE from the binary log is executed on the slave and… we are in trouble. The bug was reported by Phoenix Zhang. Unfortunately, it is still not clear from this verified bug report whether MySQL 5.6 and 5.7 are similarly affected (probably they are).
  • Bug #99206 – “lock_wait_timeout is taking twice time while adding new partition in table“. Nice finding by Lalit Choudhary. MySQL 8.0.19 is not affected by this bug.
  • Bug #99244 – “bad performance in hash join because join with no join condition between tables“. This optimizer bug (wrong join order when new hash join optimization is used) was reported by Chunyang Xu. But note also a simple test case and perf outputs contributed later by Shane Bester.
  • Bug #99257 – “Inconsistent output and wrong ORDER BY Sorting for query with LIMIT“. Yet another 5.7-only optimizer bug found by Lalit Choudhary. My quick test presented in a comment shows that MySQL 5.6.27 produced correct results, so this is a regression bug.
  • Bug #99273 – “Item_ref in Having Clause points to invalid outer table field“. This may lead to wrong results for simple enough queries. This regression bug that affects both MySQL 5.7.29 and 8.0.19 was reported by Shanshan Ying.
  • Bug #99286 – “Concurrent update cause crash in row_search_mvcc“. This great bug report, with code analysis, a suggested fix, and a test case with detailed instructions, was created by Zkong Kong. It was promptly verified, but I still miss any documented attempt to check (by running the test case or by code analysis) whether the bug applies only to 5.7.29 or MySQL 8.0.x is also potentially affected. For now I’ll try to remember this assertion line:
    InnoDB: Assertion failure in thread 47758491551488 in file rem0rec.cc line 586

    and “crash with row_search_mvcc in backtrace”. Who knows when I’ll hit something similar, and what fork/version it will be…

Rainy birthday at the seaside in Cap-d’Ail. Almost 15 years of my 50 were spent checking new MySQL bug reports almost every day.

To summarize:

  1. We still see many regression bugs in recent versions of MySQL 5.7.x and 8.0.x, often without “regression” tag.
  2. There are cases when the bug is verified, but there are no clearly documented checks of whether all GA versions are affected.
  3. XA transactions are (and have always been) a real disaster for modern MySQL versions in replication environments – all kinds of replication breakage and inconsistencies are possible.
  4. Check the “MySQL Bug Reporter Hall of Fame” if you want to know who from the MySQL Community has contributed a lot of bug reports over the last 10 years.

Marco Slot: How the Citus distributed query executor adapts to your Postgres workload


Feed: Planet PostgreSQL.

In one of our recent releases of the open source Citus extension, we overhauled the way Citus executes distributed SQL queries—with the net effect being some huge improvements in terms of performance, user experience, Postgres compatibility, and resource management. The Citus executor is now able to dynamically adapt to the type of distributed SQL query, ensuring fast response times both for quick index lookups and big analytical queries.

We call this new Citus feature the “adaptive executor” and we thought it would be useful to walk through what the Citus adaptive executor means for Postgres and how it works.

Why we needed to improve upon the original Citus query executors

In a distributed database like Citus, there are two ways of handling a Postgres query. Either:

  • the query contains enough information to determine in which shard (i.e. in which Postgres table on the worker) the data is stored—and to route the query to the right node with minimal overhead, or
  • the query is parallelized across all shards to handle large data volumes

Citus supports both models for handling SQL queries in Postgres, which has allowed Citus to scale both multi-tenant SaaS applications as well as real-time analytics applications, including time series dashboards.

To handle the requirements of these different types of PostgreSQL workloads, Citus traditionally had multiple executor code paths. Previously:

  • the Citus router executor was optimized for routing queries directly to the right node, with minimal overhead, and
  • the Citus real-time executor was optimized for large parallel computations by opening a connection per shard, to query many shards using all available cores

Because the original Citus real-time executor was not designed with distributed transactions in mind, it did not have the ability to reuse connections for multiple shards on the same worker.

On the other hand, because the original Citus router executor was designed for minimal overhead, it explicitly reused connections for all shards.

Using different executors in the same transaction block used to complicate migrations from PostgreSQL to Citus. Moreover, there is a large class of queries that do span across shards, but do not benefit from multi-core parallelism, yet still pay the overhead of establishing many connections per Citus worker node.

In one of our recent Citus open source releases, we delivered a single executor that can handle different workloads and solves all the shortcomings of the real-time and router executor in a single unified code path: The Citus adaptive executor.

Introducing the Citus Adaptive Executor

The Citus adaptive executor uses a dynamic pool of connections to each worker node to execute Postgres queries on the shards (SQL tasks).

What this means is: the Citus adaptive executor can execute multiple tasks over a single connection per Citus worker node, to minimize overhead or to parallelize queries across connections per worker node to use multiple cores—giving you the ability to use hundreds of cores and combine the memory of many servers.

The Citus executor parallelizes not only SELECT queries, but also DML (e.g. UPDATE), DDL (e.g. CREATE INDEX) and other utility statements (e.g. VACUUM) across multiple worker nodes and multiple cores. Moreover, these parallelized statements can be part of bigger transaction blocks or stored procedures that are executed as one distributed Postgres transaction.

To ensure queries can always see the result of preceding statements in a transaction block, the adaptive executor first checks whether SQL tasks need to be assigned to a particular connection. If there were preceding writes on the shard(s) that the task accesses, then the executor assigns the SQL task to the connection that did the write—and otherwise the executor assigns the task to the pool to be executed as soon as a connection becomes available.

By dynamically adjusting the pool size to the SQL query, the Citus adaptive executor can handle:

  • queries that are routed to a single worker node,
  • queries that are parallelized across all worker nodes over a single connection (= one core) per node, as well as
  • queries that are parallelized across many cores per worker.

The adaptive executor also enables users to limit the number of connections per worker that a query can open, to handle distributed PostgreSQL queries with high concurrency.
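For example, to cap how many connections a single query may open per worker for the current session, there is a setting along these lines (the GUC name, citus.max_adaptive_executor_pool_size, is my recollection of the adaptive executor’s knob rather than something stated in this post):

-- Cap the connections the executor may open per worker node for this session.
SET citus.max_adaptive_executor_pool_size TO 4;
-- Subsequent multi-shard queries in this session use at most 4 connections per worker.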

The not-so-secret sauce behind the Adaptive Executor

Part of what makes the executor adaptive is a technique we call “slow start”, which is inspired by the famous TCP Slow Start algorithm that allows TCP to adapt to the available bandwidth and avoid congestion.

Every 10 milliseconds, the number of new connections that the adaptive executor can open to a worker node will double, if there are tasks remaining for that worker node. That way, if a distributed query does a quick ~1ms index lookup on each shard, the executor will never open more than a single connection per node. On the other hand, if we’re dealing with an analytical query that takes several seconds, the executor will quickly parallelize it across all available cores by running tasks in parallel over multiple connections per node.

So the Citus adaptive executor does not need to guess how long a SQL query will take, but rather will adapt to each query’s runtime.

In the upcoming Citus 9.3 open source release, the adaptive executor will also factor in the total number of connections it makes to each worker node. When it approaches the connection limit on a worker (max_connections), the executor will avoid opening additional connections for parallelism, such that the total number of connections to each worker never exceeds the total number of connections that the client is making to the coordinator. That ensures you can run many parallel queries concurrently without any issues.

Finally, while this might not seem important to most users, as a developer I really like the Citus adaptive executor’s well-documented code and especially its elegant state machines. The extensive code comments and state machines help ensure we keep one of the core parts of Citus working smoothly. We were also able to eliminate many inefficiencies, so we’ve seen performance improvements across the board as well as better error messages.

The Citus adaptive executor gives you faster queries, higher concurrency, stronger SQL support, & more

All of us on our Citus open source engine team at Microsoft—spread between Amsterdam, Vancouver, and Istanbul—were excited to launch the Citus adaptive executor. Not just because this new feature represents exciting distributed systems work, but because of the benefits the adaptive executor brings to Citus open source users—as well as to Azure Database for PostgreSQL users who are using the Hyperscale (Citus) to scale out Postgres horizontally on Azure.

The list of Citus adaptive executor benefits includes:

  • Improved performance for distributed (multi-shard) SQL queries
  • Improved support for transactions in Postgres that mix single-shard and multi-shard queries (incl. SELECT, DML, DDL, COPY)
  • Improved error messages for multi-shard Postgres queries
  • Up to 20X faster response times for simple multi-shard queries (e.g. Postgres index lookups on non-distribution column)
  • Better resource management by configuring the maximum number of connections that Citus can make to a worker node, and the maximum number of connections Citus can keep across transactions
  • Better fault tolerance when running out of Postgres database connections on the worker node

One of the biggest benefits for Citus open source users is that the adaptive executor will simplify migrations from single-node PostgreSQL to a Citus distributed database cluster. That’s because almost all multi-statement transactions will now just work, and there is no risk of Citus operations getting a lot slower when queries do not have a distribution column filter.

Another benefit is that the Citus adaptive executor opens the door to enabling new use cases, for which the overhead of opening a database connection per shard would have been prohibitive before.

For instance, with the adaptive executor, search applications can now take advantage of parallelism when doing linear searching across all database records—while also being able to run fast index lookups with minimal overhead and high concurrency.

Our teammate begriffs has created some useful getting-started tutorials for Citus

If you want to learn more about how the Citus executors run queries, our Citus Docs has a good SQL reference section on query processing.

And if you want to roll up your sleeves and explore whether scaling out Postgres with Citus is a good fit for you and your application, the 2 best places to start are:

  • Download Citus packages and dive into one of the tutorials that our awesome documentation expert Joe Nelson aka begriffs created, such as the multi-tenant tutorial or the real-time analytics tutorial in our Citus open source documentation.

  • Try out Citus on Microsoft Azure, by following the quickstart docs to provision a Hyperscale (Citus) server group (often called a cluster) on Azure Database for PostgreSQL; there are also useful Postgres tutorials on docs.microsoft.com, including this one on designing a multi-tenant database and another tutorial on designing a real-time analytics dashboard with Hyperscale (Citus).

We can’t wait to see what you build with Citus

Once you get started, remember to join our Citus slack as there is lots of good community Q&A on the Citus Slack every single day. And just in case you don’t already know about it, a quick plug for the Citus newsletter: we curate our Citus technical newsletter each month with useful Postgres and Citus developer content, mostly links to blog posts or videos—with the goal of being useful, and never boring.

Kirk Roybal: Oracle to PostgreSQL: ANSI outer join syntax in PostgreSQL


Feed: Planet PostgreSQL.

We find ourselves at the third article in the Oracle migration series. This time, we look at the strange (+) operator that modifies the WHERE clause criteria in Oracle. Like everything else, PostgreSQL has a solution for that.
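For reference, a minimal schema matching the examples below might look like this (an assumption, since the article does not show its sample DDL):

CREATE TABLE persons (
    id         integer PRIMARY KEY,
    last_name  text,
    first_name text
);

CREATE TABLE places (
    id        integer PRIMARY KEY,
    location  text,
    person_id integer REFERENCES persons (id)
);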

RIGHT JOIN

Oracle supports, and many developers use, a pre-ANSI outer join syntax that applies the (+) operator to the qualification (WHERE) clause.

Typically, that looks something like this:

SELECT *
FROM persons, places
WHERE persons.id(+) = places.person_id

The objective of this syntax is a right outer join. In set theory terms, this is the subset including all places, regardless of person.

The result of a small sample would look like this:

 id | last_name | first_name | id | location | person_id
----+-----------+------------+----+----------+-----------
  1 | (NULL)    | (NULL)     |  1 | Dallas   | (NULL)
  2 | Roybal    | Kirk       |  2 | London   | 2
  3 | Riggs     | Simon      |  3 | Paris    | 3

This syntax is unsupported in PostgreSQL.

To achieve the same result, you would use the standard SQL syntax for outer joins.

SELECT *
FROM persons
RIGHT JOIN places
ON persons.id = places.person_id;

SQL also provides a clarifying keyword, OUTER. This clarifier is completely optional, as any RIGHT JOIN is by definition an OUTER join.

FULL JOIN

Similarly, using the Oracle syntax for a full join does not work in PostgreSQL.

SELECT *
FROM persons, places
WHERE persons.id(+) = places.person_id(+);

The objective of this syntax is a full list of persons and places whether a person is associated with a place or not.

The result would look like this:

 id | last_name | first_name | id     | location | person_id
----+-----------+------------+--------+----------+-----------
  1 | (NULL)    | (NULL)     |  1     | Dallas   | (NULL)
  2 | Roybal    | Kirk       |  2     | London   | 2
  3 | Riggs     | Simon      |  3     | Paris    | 3
  4 | Andrew    | Dunstan    | (NULL) | (NULL)   | (NULL)

Using PostgreSQL syntax, the query would be written thusly:

SELECT *
FROM persons
FULL JOIN places
ON persons.id = places.person_id;

Again, the OUTER keyword is completely optional.

CROSS JOIN

One distinct advantage of using keywords rather than implicit relationships is that you are not able to accidentally create a cross product.

The syntax:

SELECT *
FROM persons
LEFT JOIN places;

Will result in an error:

ERROR:  syntax error at or near ";"

Indicating that the statement is not complete at the line ending marker “;”.

PostgreSQL will create the cross join product using the ANSI syntax.

SELECT *
FROM persons, places;

 id | last_name | first_name | id | location | person_id
----+-----------+------------+----+----------+-----------
  1 | Dunstan   | Andrew     |  1 | Dallas   | (null)
  1 | Dunstan   | Andrew     |  2 | London   | 2
  1 | Dunstan   | Andrew     |  3 | Paris    | 3
  1 | Dunstan   | Andrew     |  4 | Madrid   | (null)
  2 | Roybal    | Kirk       |  1 | Dallas   | (null)
  2 | Roybal    | Kirk       |  2 | London   | 2
  2 | Roybal    | Kirk       |  3 | Paris    | 3
  2 | Roybal    | Kirk       |  4 | Madrid   | (null)
  3 | Riggs     | Simon      |  1 | Dallas   | (null)
  3 | Riggs     | Simon      |  2 | London   | 2
  3 | Riggs     | Simon      |  3 | Paris    | 3
  3 | Riggs     | Simon      |  4 | Madrid   | (null)
  6 | Wong      | Mark       |  1 | Dallas   | (null)
  6 | Wong      | Mark       |  2 | London   | 2
  6 | Wong      | Mark       |  3 | Paris    | 3
  6 | Wong      | Mark       |  4 | Madrid   | (null)

This is more likely a coding error than the intended result.

To get this functionality intentionally, it is recommended to use the CROSS JOIN statement.

SELECT *
FROM persons
CROSS JOIN places;

Thus making it unambiguous what was meant in the statement.

NATURAL JOIN

PostgreSQL supports the NATURAL JOIN syntax, but a bit under protest.

SELECT *
FROM persons
NATURAL JOIN places;

This produces the following result.

 id | last_name | first_name | parent_id | location | person_id
----+-----------+------------+-----------+----------+-----------
  1 | Dunstan   | Andrew     | (null)    | Dallas   | (null)
  2 | Roybal    | Kirk       | 1         | London   | 2
  3 | Riggs     | Simon      | 1         | Paris    | 3

However, this syntax is a problem. In our example, the “id” columns in the two tables have nothing to do with each other. This join has produced a result, but one with completely irrelevant content.

Additionally, you may have a query that initially presents the correct result, but that subsequent DDL statements silently affect.

Consider:

ALTER TABLE persons ADD COLUMN places_id bigint;
ALTER TABLE places ADD COLUMN places_id bigint;
ALTER TABLE persons ADD COLUMN person_id bigint;

Now what column is the NATURAL JOIN using? The choices are id, places_id, person_id, and all of the above. I’ll leave the answer as an exercise to the reader.

This syntax is a time bomb for your code. Just don’t use it.

Ok, so you’re not convinced. Well, then at least have some sane coding conventions. For the parent table, name the identity column “myparenttable_id”. When referencing it from child relations, use the same name, “myparenttable_id”.  Never name anything “id”, and never make a reference to a column with a different name. Ah, forget it. Just don’t do this.

You may be tempted to disambiguate the previous puzzle by using the USING keyword. That would look like this:

SELECT *
FROM persons
JOIN places
USING (id);

But the USING keyword can only take advantage of exact name matches across tables. Which again, in our example is just dead wrong.

The best practice for PostgreSQL is simply to avoid relying on column naming conventions for join logic, and to spell out the join condition explicitly.

Summary

These keyword techniques (vs. operators) are also available on Oracle. They are more cross-platform, and less ambiguous. That alone would make them best practices.

Added to that, they expose logical errors when improperly used. For any development in PostgreSQL, we unilaterally recommend using explicit keywords.

Enforce Primary Key constraints on Replication


Feed: Planet MySQL
Author: MySQL High Availability

In this post, we introduce a configuration option that controls whether replication channels allow the creation of tables without primary keys. This continues our recent work on replication security, where we allowed users to enforce privilege checks, and/or enforce row-based events.

In 8.0.20, we introduce a new option for the CHANGE MASTER TO statement: REQUIRE_TABLE_PRIMARY_KEY_CHECK. This enables a replication channel to select its own policy when executing queries that create or alter table definitions and their primary keys.

Enforcing primary keys on table definitions is important, for example, when replicating using row-based logging, where table keys play an important role in replica performance. The tool in the server for enforcing this policy is the variable sql_require_primary_key. In the context of replication, the value of this variable will be sent together with all queries that change a table structure, also known as DDL, and so the replica will follow whatever restrictions were in place on the primary.
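For reference, the underlying server option is an ordinary system variable; a minimal sketch:

-- Reject any CREATE TABLE or ALTER TABLE that leaves a table without a primary key.
SET GLOBAL sql_require_primary_key = ON;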

However, if the operator of the replica does not control or trust the primary server, it does not suffice to follow the restrictions defined there. For this reason, this behavior can now be influenced with the value of REQUIRE_TABLE_PRIMARY_KEY_CHECK.

This parameter can be set on a channel to:

  • ON: the replication channel always uses the value ON for the sql_require_primary_key system variable in replication operations, requiring a primary key in all create and alter table operations.
  • OFF: the replication channel always uses the value OFF for the sql_require_primary_key system variable in replication operations, so that a primary key is never required when creating or altering tables, even if the primary enforced such restrictions.
  • STREAM: the default; the replication channel uses whatever value is replicated from the primary for each transaction. This preserves the previous server behavior.

Usage and advantages

The first use case for this new addition is in scenarios where there is no tight control over the primary instance from which the data originates. In such cases, REQUIRE_TABLE_PRIMARY_KEY_CHECK=ON ensures that no primary keys are removed from your table definitions, which could otherwise cause performance issues.

This feature is also particularly interesting in multi source replication scenarios. It allows for a more uniform behavior across replication channels from different primaries, keeping a consistent value for sql_require_primary_key.
Using ON safeguards against the loss of primary keys when multiple primaries update the same set of tables and there was a mistake on one of them. Using OFF allows primaries that can manipulate primary keys to work alongside primaries that cannot.

This feature also has advantages when using privilege checks in the replication channel, as setting REQUIRE_TABLE_PRIMARY_KEY_CHECK to a value different from STREAM means the configured user account for PRIVILEGE_CHECKS_USER no longer needs privileges to manipulate sql_require_primary_key. If set to STREAM, besides the basic privileges to create or alter a table, the privilege checks user is required to have session administration level privileges to replicate any query that executes one of these actions in the replica.

Configuration

To explicitly change how a channel handles primary key check policies, you need to stop the replica SQL thread first.
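A minimal sketch, using a hypothetical channel name channel_1:

STOP SLAVE SQL_THREAD FOR CHANNEL 'channel_1';
CHANGE MASTER TO REQUIRE_TABLE_PRIMARY_KEY_CHECK = ON FOR CHANNEL 'channel_1';
START SLAVE SQL_THREAD FOR CHANNEL 'channel_1';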

Observability

The Performance Schema tables related to the slave applier status were enhanced to display the status of the new CHANGE MASTER TO … statement option, REQUIRE_TABLE_PRIMARY_KEY_CHECK:
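For example, a query along these lines should show the configured value per channel (the table and column names are my assumption of how the 8.0.20 Performance Schema exposes it, not something spelled out in this post):

SELECT CHANNEL_NAME, REQUIRE_TABLE_PRIMARY_KEY_CHECK
  FROM performance_schema.replication_applier_configuration;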

Some notes on usage

This feature is affected by RESET SLAVE ALL, but not by RESET SLAVE.

Also, while the Group Replication plugin does enforce every query to be executed in a table with a primary key, the check does not depend on sql_require_primary_key and is less restrictive. Read more on that here.

Summary

This feature is a new tool to secure your replication streams in complex and diverse environments, while also allowing you better control over the privileges you give your replication applier users.

We hope this new feature will allow you to create more secure solutions with the MySQL server. Feel free to test it, and tell us your opinion.




Nawaz Ahmed: My Favorite PostgreSQL Extensions – Part Two


Feed: Planet PostgreSQL.

This is the second part of my blog “My Favorite PostgreSQL Extensions”, wherein I introduced you to two PostgreSQL extensions, postgres_fdw and pg_partman. In this part, I will explore three more.

pgAudit

The next PostgreSQL extension of interest serves the purpose of satisfying auditing requirements of various government, financial, and other certifying bodies such as ISO, BSI, and FISCAM. The standard logging facility which PostgreSQL offers natively with log_statement = all is useful for monitoring, but it does not provide the details required to comply with, or pass, an audit. The pgAudit extension focuses on the details of what happened under the hood while a database was satisfying an application request.

An audit trail or audit log is created and updated by a standard logging facility provided by PostgreSQL, which provides detailed session and/or object audit logging. The audit trail created by pgAudit can get enormous in size depending on audit settings, so care must be taken to decide beforehand what and how much auditing is required. A brief demo in the following section shows how pgAudit is configured and put to use.

The log trail is created within the PostgreSQL database cluster log found in the PGDATA/log location, but the audit log messages are prefixed with an “AUDIT: ” label to distinguish between regular database background messages and audit log records.

Demo

The official documentation of pgAudit explains that there exists a separate version of pgAudit for each major version of PostgreSQL, in order to support new functionality introduced in every PostgreSQL release. The version of PostgreSQL in this demo is 11, so the version of pgAudit will be from the 1.3.X branch. pgaudit.log is the fundamental parameter to set; it controls which classes of statements will be logged. It can be set with SET at the session level or within the postgresql.conf file to be applied globally.
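Before pgaudit.log can be set, the extension itself has to be loaded and created; a typical setup looks roughly like this (packaging and restart details vary by platform):

-- in postgresql.conf (requires a server restart): shared_preload_libraries = 'pgaudit'
CREATE EXTENSION pgaudit;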

postgres=# set pgaudit.log = 'read, write, role, ddl, misc';

SET



cat $PGDATA/pgaudit.log

pgaudit.log = 'read, write, role, ddl, misc'



db_replica=# show pgaudit.log;

         pgaudit.log

------------------------------

 read, write, role, ddl, misc

(1 row)



2020-01-29 22:51:49.289 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,3,1,MISC,SHOW,,,show pgaudit.log;,



db_replica=# create table t1 (f1 integer, f2 varchar);

CREATE TABLE



2020-01-29 22:52:08.327 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,4,1,DDL,CREATE TABLE,,,"create table t1 (f1 integer, f2 varchar);",



db_replica=#  insert into t1 values (1,'one');

INSERT 0 1

db_replica=#  insert into t1 values (2,'two');

INSERT 0 1

db_replica=#  insert into t1 values (3,'three');

INSERT 0 1

2020-01-29 22:52:19.261 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,5,1,WRITE,INSERT,,,"insert into t1 values (1,'one');",

20-01-29 22:52:38.145 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,6,1,WRITE,INSERT,,,"insert into t1 values (2,'two');",

2020-01-29 22:52:44.988 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,7,1,WRITE,INSERT,,,"insert into t1 values (3,'three');",



db_replica=# select * from t1 where f1 >= 2;

 f1 |  f2

----+-------

  2 | two

  3 | three

(2 rows)



2020-01-29 22:53:09.161 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,9,1,READ,SELECT,,,select * from t1 where f1 >= 2;,



db_replica=# grant select on t1 to usr_replica;

GRANT



2020-01-29 22:54:25.283 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,13,1,ROLE,GRANT,,,grant select on t1 to usr_replica;,



db_replica=# alter table t1 add f3 date;

ALTER TABLE



2020-01-29 22:55:17.440 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,23,1,DDL,ALTER TABLE,,,alter table t1 add f3 date;,



db_replica=# checkpoint;

CHECKPOINT



2020-01-29 22:55:50.349 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,33,1,MISC,CHECKPOINT,,,checkpoint;,



db_replica=# vacuum t1;

VACUUM



2020-01-29 22:56:03.007 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,34,1,MISC,VACUUM,,,vacuum t1;,



db_replica=# show log_statement;

 log_statement

---------------

 none



2020-01-29 22:56:14.740 AEDT 4710 db_replica postgres [local] psql LOG:  AUDIT: SESSION,36,1,MISC,SHOW,,,show log_statement;,

The log entries shown in the demo above would normally only be written to the server background logfile when the parameter log_statement is set; in this case it is not configured, yet the audit messages are written by virtue of the pgaudit.log parameter, as evidenced in the demo. There are more powerful options available to fulfill all your database auditing requirements within PostgreSQL, which can be configured by following the official documentation of pgaudit here or on the github repository.

pg_repack

This is a favourite extension among many PostgreSQL engineers that are involved directly with managing and keeping the general health of a PostgreSQL cluster. The reason for that will be discussed a little later but this extension offers the functionality to remove database bloat within a PostgreSQL database, which is one of the nagging concerns among very large PostgreSQL database clusters requiring database re-org. 

As a PostgreSQL database undergoes constant and heavy WRITES (updates & deletes), the old data is marked as deleted while the new version of the row gets inserted, but the old data is not actually wiped from the data block. This requires a periodic maintenance operation called vacuuming, an automated procedure that executes in the background and clears all the “marked as deleted” rows. This process is sometimes referred to as garbage collection in colloquial terms.

The vacuuming process generally gives way to regular database operations during busier times. The least restrictive manner of vacuuming in favour of database operations results in a large number of “marked as deleted” rows, causing databases to grow out of proportion, which is referred to as “database bloat”. There is a forceful vacuuming process called VACUUM FULL, but that results in acquiring an exclusive lock on the database object being processed, stalling database operations on that object.


It is for this reason that pg_repack is a hit among PostgreSQL DBAs and engineers: it does the job of a normal vacuuming process but offers the efficiency of VACUUM FULL without acquiring an exclusive lock on the database object; in short, it works online. The official documentation here explains more about the other methods of reorganizing a database, but a quick demo as below will put things in the appropriate light for better understanding. There is a requirement that the target table must have at least one column defined as a PRIMARY KEY, which is a general norm in most production database setups.

Demo

The basic demo shows the installation and usage of pg_repack in a test environment. This demo uses version 1.4.5 of pg_repack, which is the latest version of this extension at the time of publishing this blog. A demo table t1 initially has 1,000,000 rows and undergoes a massive delete operation, which removes every 5th row of the table. An execution of pg_repack shows the size of the table before and after.

mydb=# CREATE EXTENSION pg_repack;

CREATE EXTENSION



mydb=# create table t1 (no integer primary key, f_name VARCHAR(20), l_name VARCHAR(20), d_o_b date);

CREATE TABLE

mydb=# insert into t1 (select generate_series(1,1000000,1),'a'||

mydb(# generate_series(1,1000000,1),'a'||generate_series(1000000,1,-1),

mydb(# cast( now() - '1 year'::interval * random()  as date ));

INSERT 0 1000000



mydb=# SELECT pg_size_pretty( pg_total_relation_size('t1'));

 pg_size_pretty

----------------

 71 MB

(1 row)



mydb=# CREATE or replace FUNCTION delete5() RETURNS void AS $$

mydb$# declare

mydb$# counter integer := 0;

mydb$# BEGIN

mydb$#

mydb$#  while counter <= 1000000

mydb$# loop

mydb$# delete from t1 where no=counter;

mydb$# counter := counter + 5;

mydb$# END LOOP;

mydb$# END;

mydb$# $$ LANGUAGE plpgsql;

CREATE FUNCTION

The delete5 function deletes 200,000 rows from the t1 table, using a counter that increments in steps of 5.

mydb=# select delete5();

 delete5

------



(1 row)

mydb=# SELECT pg_size_pretty( pg_total_relation_size('t1'));

 pg_size_pretty

----------------

 71 MB

(1 row)



$ pg_repack -t t1 -N -n -d mydb -p 5433

INFO: Dry run enabled, not executing repack

INFO: repacking table "public.t1"



$ pg_repack -t t1 -n -d mydb -p 5433

INFO: repacking table "public.t1"



mydb=# SELECT pg_size_pretty( pg_total_relation_size('t1'));

 pg_size_pretty

----------------

 57 MB

(1 row)

As shown above, the original size of the table does not change after executing the delete5 function, which shows that the dead rows still occupy space in the table. The execution of pg_repack clears those ‘marked as deleted’ rows from the t1 table, bringing the size of t1 down to 57 MB. One other good thing about pg_repack is the option for a dry run with the -N flag, using which you can check what will be executed during an actual run.

HypoPG

The next interesting extension is similar to a popular concept called invisible indexes among proprietary database servers. The HypoPG extension enables a DBA to see the effect of introducing a hypothetical index (which does not exist) and whether it will improve the performance of one or more queries, hence the name HypoPG.

The creation of a hypothetical index does not require any CPU or disk resources; however, it consumes a connection’s private memory. As the hypothetical index is not stored in any database catalog tables, there is no impact of table bloat. For this reason, a hypothetical index cannot be used in an EXPLAIN ANALYZE statement, while a plain EXPLAIN is a good way to assess if a potential index will be used by a given problematic query. Here is a quick demo to explain how HypoPG works.

Demo

I am going to create a table containing 100000 rows using generate_series and execute a couple of simple queries to show the difference in cost estimates with and without hypothetical indexes.

olap=# CREATE EXTENSION hypopg;

CREATE EXTENSION



olap=# CREATE TABLE stock (id integer, line text);

CREATE TABLE



olap=# INSERT INTO stock SELECT i, 'line ' || i FROM generate_series(1, 100000) i;

INSERT 0 100000



olap=# ANALYZE STOCK;

ANALYZE



olap=#  EXPLAIN SELECT line FROM stock WHERE id = 1;

                       QUERY PLAN

---------------------------------------------------------

 Seq Scan on stock  (cost=0.00..1791.00 rows=1 width=10)

   Filter: (id = 1)

(2 rows)

olap=# SELECT * FROM hypopg_create_index('CREATE INDEX ON stock (id)') ;

 indexrelid |       indexname

------------+-----------------------

      25398 | <25398>btree_stock_id

(1 row)



olap=# EXPLAIN SELECT line FROM stock WHERE id = 1;

                                     QUERY PLAN

------------------------------------------------------------------------------------

 Index Scan using <25398>btree_stock_id on stock  (cost=0.04..8.06 rows=1 width=10)

   Index Cond: (id = 1)

(2 rows)



olap=# EXPLAIN ANALYZE SELECT line FROM stock WHERE id = 1;

                                             QUERY PLAN

----------------------------------------------------------------------------------------------------

 Seq Scan on stock  (cost=0.00..1791.00 rows=1 width=10) (actual time=0.028..41.877 rows=1 loops=1)

   Filter: (id = 1)

   Rows Removed by Filter: 99999

 Planning time: 0.057 ms

 Execution time: 41.902 ms

(5 rows)



olap=# SELECT indexname, pg_size_pretty(hypopg_relation_size(indexrelid))

olap-#   FROM hypopg_list_indexes() ;

       indexname       | pg_size_pretty

-----------------------+----------------

 <25398>btree_stock_id | 2544 kB

(1 row)



olap=# SELECT pg_size_pretty(pg_relation_size('stock'));

 pg_size_pretty

----------------

 4328 kB

(1 row)

The above exhibit shows how the estimated total cost can be reduced from 1791 to 8.06 by adding an index to the “id” field of the table to optimize a simple query. It also proves that the index is not really used when the query is executed with EXPLAIN ANALYZE, which executes the query in real time. There is also a way to find out approximately how much disk space the hypothetical index would occupy, using the hypopg_relation_size function of the extension.

HypoPG has a few other functions to manage hypothetical indexes. In addition, it offers a way to find out if partitioning a table will improve the performance of queries fetching a large dataset. This hypothetical partitioning option of the HypoPG extension can be explored further by referring to the official documentation.
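For instance, hypothetical indexes can be dropped individually or all at once; a quick sketch, with function names as I recall them from the HypoPG documentation:

SELECT hypopg_drop_index(25398);  -- drop a single hypothetical index by its indexrelid
SELECT hypopg_reset();            -- remove all hypothetical indexes for this connection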

Conclusion

As stated in part one, PostgreSQL has evolved over the years, only getting bigger, better, and faster, with rapid development both in the native source code and in plug-and-play extensions. An open source version of the new PostgreSQL can be most suitable for plenty of IT shops that are running one of the major proprietary database servers, in order to reduce their IT CAPEX and OPEX.

There are plenty of PostgreSQL extensions that offer features ranging from monitoring to high-availability and from scaling to dumping binary datafiles into human readable format. It is hoped that the above demonstrations have shed enormous light on the maximum potential and power of a PostgreSQL database.

Explainable Churn Analysis with MemSQL and Fiddler


Feed: MemSQL Blog.
Author: Floyd Smith.

MemSQL and Fiddler Labs are working together to offer the power of MemSQL to users of Fiddler’s toolset for explainable AI – and to offer Fiddler’s explainability tools to the many MemSQL customers who are already using, or moving to, operational AI. To this end, the two companies are offering new, efficient ways to connect MemSQL self-managed software, and the MemSQL Helios managed service in the cloud, to Fiddler’s toolset. 

MemSQL is very well-suited to the demands of operationalizing AI – that is, powering machine learning models and AI applications as they’re put into production. MemSQL processes relational data, JSON data, time series data, geospatial data, and more, with blazing fast ingest speeds, eye-popping transaction performance, unmatched query responsiveness, and high concurrency. There are many resources available which demonstrate this, but among the best is this webinar on machine learning and AI from our own Eric Hanson.  

Fiddler provides a vital need, as AI moves out of labs and into the real world: Explainable AI. With Fiddler, you can describe why your AI models reached a given conclusion – why did one person get a loan, and another get selected for a clinical trial of a new drug? You need to have answers to these kinds of questions, beyond “the model said so.” With Fiddler, business analytics and data science teams can build and deploy models that are inherently explainable, and provide explainability even for models that do not have it built in from the start. 

In this blog post, we show how the MemSQL database and Fiddler work together to solve a knotty business problem: reducing churn among services customers. Solving this single problem cost-effectively can go far toward improving profitability in your business. 

A version of this blog post also appears on the Fiddler Labs website

Reducing Churn

In today’s turbulent economy, customer needs are changing swiftly, causing business disruptions. As a leader, it’s more important than ever to understand the ‘why’ behind customers’ actions, so you can empower your teams to build successful products and services. Having the right infrastructure and tools is critical to enable your teams to respond to these dynamic needs quickly. 

Analyzing, predicting and monitoring churn accurately is critical for every data science or business intelligence team, especially in times like these. 

By complementing MemSQL’s industry-leading capability of enabling fast access to data at scale – across both streaming and historical datasets – with Fiddler’s Explainable AI Platform, which provides visibility, insights, and actionable analytics, business intelligence and analytics teams are perfectly positioned to respond to shifting customer needs and to ensure that customers are well served. 

Challenges with Churn Analysis

There are a common set of analytics challenges to address when monitoring customer churn: 

  • Descriptive analytics – Identifying possible reasons for customer churn
  • Diagnostic analytics – Ascribing actual customer churn to specific reasons
  • Predictive analytics – Predicting churn and the reasons for it 
  • Prescriptive analytics – Identifying potential actions to reduce future churn

Solution

To begin with, all customer data needs to be effectively organized in one place, to enable teams to leverage AI-powered technologies to model the churn. The database needs to be able to handle streaming data in real time, so analytics are performed on the “latest and greatest” data. 

MemSQL’s fast streaming database is an ideal platform for organizing this data. MemSQL provides unmatched processing capabilities for both transactions and queries, vital for operational analytics, machine learning, and AI. By streaming customer data into MemSQL, users get all the interactive query capabilities, along with the ability to keep the data up-to-date within milliseconds. 

We can then run churn analytics on this by connecting it with Fiddler,  an explainable AI platform that helps data scientists and analysts build trust with AI decisions inside their organizations. Cutting-edge explainability algorithms help business users make sense of AI, getting answers to cause-and-effect questions on the drivers behind a prediction.

BI teams regularly iterate on multiple models to predict churn. Fiddler allows comparison of performance of multiple models to identify the most effective one for a given task. The Explainable AI Platform offers a lens to assess model performance and validate models. Since the precision of the churn model not only impacts performance but also decision-making, customers would like to iterate on the models, and monitor several versions of the model, to help in identifying problems and solutions. 

Integrating MemSQL with Fiddler – As Easy as 1-2-3!

While many analytics tasks bring in data from a CSV file, data used in machine learning generally resides in a database like MemSQL. Bringing this data into Fiddler’s Explainable AI Platform as an ML dataset is the first step in the AI/ML development workflow.

Explainable churn analysis with MemSQL and Fiddler.

There are a few ways to bring data into Fiddler. It can be imported directly from any database which Fiddler supports, such as MemSQL; uploaded as a CSV file in the browser; or loaded directly from a file store such as AWS S3. 

1. Preparing Data in MemSQL

For the purpose of this blog post, we used the popular Telco Churn Dataset from Kaggle as an example. Let’s assume this Telco company saves all customer data in MemSQL, in a database named churn_example and a table named telco_customer_churn. Here’s the DDL; you can also access the DDL on Github. The sample dataset is available in an S3 bucket.

DROP DATABASE IF EXISTS churn_example;
CREATE DATABASE churn_example;
USE churn_example;

CREATE TABLE telco_customer_churn
(
    customerID TEXT,
    gender TEXT,
    SeniorCitizen BOOLEAN,
    Partner TEXT,
    Dependents TEXT,
    tenure INT,
    PhoneService TEXT,
    MultipleLines TEXT,
    InternetService TEXT,
    OnlineSecurity TEXT,
    OnlineBackup TEXT,
    DeviceProtection TEXT,
    TechSupport TEXT,
    StreamingTV TEXT,
    StreamingMovies TEXT,
    Contract TEXT,
    PaperlessBilling TEXT,
    PaymentMethod TEXT,
    MonthlyCharges DECIMAL(13, 4),
    TotalCharges DECIMAL(13, 4),
    Churn TEXT,
    PRIMARY KEY (customerID)
);

For the purposes of this tutorial, we will populate telco_customer_churn with the information from the Kaggle Telco Churn dataset. This can be done by creating a MemSQL Pipeline to load the data from S3. 

CREATE OR REPLACE PIPELINE `telco_customer_churn` AS
    LOAD DATA S3 'download.memsql.com/first-time/WA_Fn-UseC_-Telco-Customer-Churn.csv'
    CONFIG '{"region": "us-east-1"}'
    SKIP DUPLICATE KEY ERRORS
    INTO TABLE `telco_customer_churn`
    FIELDS
        TERMINATED BY ','
        OPTIONALLY ENCLOSED BY '"'
    IGNORE 1 LINES;

START PIPELINE `telco_customer_churn` FOREGROUND;

Once the data is in place, run SELECT * FROM telco_customer_churn LIMIT 10 to validate the data and the column names.
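For example (the expected row count is an assumption based on the public Kaggle dataset, not something verified here):

SELECT * FROM telco_customer_churn LIMIT 10;
SELECT COUNT(*) FROM telco_customer_churn;  -- roughly 7,043 rows if the full dataset loaded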

2. Connecting MemSQL to Fiddler

To add MemSQL as a data source, we need the authentication information to construct the database URI. We can add ‘MemSQL’ as the type of database in Fiddler and furnish the rest of the details, such as the hostname, port, username, password, and the database to connect to, and then add the connector.

The settings are validated via a connection to the database. The ability to add and remove database connectors is an administrator-privileged operation, whereas using data from the connectors themselves does not require administrator privileges.

Importing data from MemSQL into Fiddler.

Once the connector for MemSQL is in place, users can then import data using the connector into Fiddler. To do this, begin Fiddler’s dataset upload workflow, to add this data as a dataset for churn analysis. Select the data source that was just added, then enter the SQL query to select the data to be imported into Fiddler. 

In the background, a database connection is established, the SQL query is run, and its results are ingested into Fiddler. The data is then parsed and validated to infer the column names and data types, which are presented to the user for adjustment as needed.

3. Analyzing Churn using Explainable AI

Next, start analyzing the data. Glean more insights about the features, such as their mean and variance, and also look at the statistical covariance across all the features. Fiddler’s Explainable AI Platform allows us to analyze the features using feature distribution and mutual information charts to visualize their statistical dependencies, among other details.

In order to leverage Explainable AI, Fiddler offers 2 options: 

  1. Bring in a custom pre-trained model 
  2. Build an interpretable model 

Data scientists can use option #1 to bring in their own ML models, built on open-source or custom ML platforms, and then use Fiddler to explain them. To do this, Fiddler offers a Python library that data scientists can use to upload a pre-trained model. 

Alternatively, they can follow option #2, and use Fiddler to build an interpretable model on the platform. Once the models are ingested, Fiddler uses sophisticated explainability algorithms to compute the causal drivers for model predictions. And the explanations are presented in a collection of dashboards, consumable by business users as well as data scientists. 

For example, an Account Manager in a Customer Success team can use the dashboard below to understand why this customer is likely to churn, with a probability of 75%. 

Top five reasons why this customer is likely to churn.

As shown in the picture above, the top five reasons why this customer is likely to churn are: 

  1. Short tenure (only 4 months) on the telecom service 
  2. Lack of online security in her package
  3. Being on a month-to-month contract
  4. Not having tech support
  5. And paying high monthly charges of $76

Using this information, the Account Manager can then intervene and fiddle with the inputs, and examine what-if scenarios. For example, they can see what would happen if they tried a couple of actions. 

  1. Offer the customer “TechSupport”
  2. Reduce her Monthly Charges from $76 to $60
Two actions would reduce the customer’s likelihood to churn from 75% to 40%.

In addition to simple, business user-facing explanations, Fiddler also supports advanced slicing and explanation capabilities to go deeper into the data and the models – for instance, to understand why a cohort such as low-tenure users is churning.

Slicing to explain a cohort of high churn, low tenure users.

4. Using MemSQL and Fiddler Together in Production

Once a churn model is operationalized, Fiddler can be connected to a live MemSQL database to continuously monitor, predict, and explain the churn model.  

After the model is live, users can monitor the performance in production and close the feedback loop. Fiddler will connect with MemSQL to score the model in a continuous manner and monitor performance. That way our users can track business KPIs and performance metrics, and set up alerts when something goes out of the ordinary. Fiddler’s Explainable Monitoring features help analysts and data scientists to keep track of the following:

  • Feature Attributions: Outputs of explainability algorithms that allow further investigation, helping to understand which features are the most important causal drivers for model predictions within a given time frame. 
  • Data Drift: Track the data coming from MemSQL, so that analysts and data scientists can get visibility into any training-serving skew. 
  • Outliers: The prediction time series from the model outputs, and outliers that are automatically detected for egregious high-churn or low-churn predictions. 
Monitoring dashboard showing Outliers (orange dots) in Churn Prediction.

MemSQL is well-suited to work in tandem with an Explainable AI Platform like Fiddler:

  • Speed: the faster the database runs, the more up-to-date monitoring is, and more explanations can be tested against more data in a given time frame. This increases the functional value of Fiddler. 
  • Scale: MemSQL, as a fully distributed database, can easily be scaled out as needed. There’s no barrier to handling more data, or to speeding up processing of existing data volumes. 
  • SQL: MemSQL is a relational database with native ANSI SQL support. It fully preserves schema, which serves as a valuable input to Fiddler, while connecting to the wide range of BI tools that depend on SQL. (Also, see our take on NoSQL vs NewSQL.)  
  • Spark: In addition to supporting a wide range of monitoring and analytics tools, including Fiddler, MemSQL also boasts the MemSQL Spark Connector 3.0, speeding model processing via predicate pushdown. 
  • Kafka: Streaming data platforms such as Kafka are often used to bring the “latest and greatest” data to machine learning models, without delay. Kafka partitions map directly to MemSQL leaf node partitions, allowing very rapid, parallel ingest and exactly-once processing. (A sketch of a Kafka pipeline follows this list.)
  • Converged data: In addition to relational data, MemSQL handles semi-structured JSON data, full-text search for unstructured data, geospatial data, and time series data, with specific time series functionality
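As a rough sketch of the Kafka path mentioned in the list above, a pipeline definition might look like this (the broker address and topic are placeholders, and the CSV field handling simply mirrors the earlier S3 example):

CREATE PIPELINE telco_customer_churn_kafka AS
    LOAD DATA KAFKA 'kafka-broker:9092/telco-churn'
    SKIP DUPLICATE KEY ERRORS
    INTO TABLE telco_customer_churn
    FIELDS TERMINATED BY ','
    OPTIONALLY ENCLOSED BY '"';

START PIPELINE telco_customer_churn_kafka;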

In addition to downloadable, self-managed software that runs on all platforms, from on-premises to any cloud, MemSQL offers MemSQL Helios, an elastic managed service in the cloud. With Helios, you can minimize the time to stand up, and the operational effort to run, an advanced, feature-rich database. 

Like MemSQL, Fiddler works with a wide range of BI tools. You simply export data from Fiddler to a range of tools. The data from Fiddler can then be integrated into dashboards, reports, and the answers to interactive queries. 

Interactive churn dashboard.

Conclusion

The seamless integration between MemSQL and Fiddler enables easy import of data from inside MemSQL to Fiddler’s Explainable AI Platform, for ML insights in minutes. Data science and analytics teams working on customer churn can upload their pre-trained models, or quickly generate interpretable models on Fiddler. 

Once the models are in place, Fiddler users can easily create interactive dashboards for users in the business. They can also export explanations into their favorite BI tool of choice. 

Business users, such as account managers, can then self-serve to understand why a customer is likely to churn, run what-if scenarios, and fiddle with data to see what actions they can take to save a customer from churning. 

You can try MemSQL for free or contact MemSQL for more information. Or contact Fiddler for a trial in the cloud. 

Ernst-Georg Schmid: MQTT as transport for PostgreSQL events


Feed: Planet PostgreSQL.
MQTT has become a de-facto standard for the transport of messages between IoT devices. As a result, a plethora of libraries and MQTT message brokers have become available. Can we use this to transport messages originating from PostgreSQL?

As message broker we use Eclipse Mosquitto, which is dead simple to set up if you don’t have to change the default settings. Such a default installation is neither secure nor highly available, but for our demo it will do just fine. The event generators are written in Python3 with Eclipse paho mqtt for Python.

There are at least two ways to generate events from a PostgreSQL database, pg_recvlogical and NOTIFY / LISTEN. Both have their advantages and shortcomings.

pg_recvlogical:

  • Configured on server and database level
  • Generates comprehensive information about everything that happens in the database
  • No additional programming necessary
  • Needs plugins to decode messages, e.g. into JSON
  • Filtering has to be done later, e.g. by the decoder plugin

NOTIFY / LISTEN:

  • Configured on DDL and table level
  • Generates exactly the information and format you program into the triggers
  • Filtering can be done before sending the message
  • Needs trigger programming
  • The message size is limited to 8000 bytes

Examples for both approaches can be found here. The NOTIFY / LISTEN example lacks a proper decoder, but this makes it a good exercise to start with. The pg_recvlogical example needs the wal2json plugin, which can be found here, and the proper setup, which is also explained in the Readme. Please note that the slot used in the example is mqtt_slot, not test_slot:

pg_recvlogical -d postgres --slot mqtt_slot --create-slot -P wal2json

Otherwise, setup.sql should generate all objects to run both examples.
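To make the NOTIFY / LISTEN idea concrete, here is a minimal sketch of the database side (the table and channel names are made up, not taken from the linked examples): a trigger publishes each inserted row as JSON, and a client that has issued LISTEN mqtt_events, such as the Python/paho generator, can forward the payloads to Mosquitto.

CREATE OR REPLACE FUNCTION notify_mqtt() RETURNS trigger AS $$
BEGIN
    -- keep the payload under the 8000 byte NOTIFY limit mentioned above
    PERFORM pg_notify('mqtt_events', row_to_json(NEW)::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sensor_data_notify
    AFTER INSERT ON sensor_data
    FOR EACH ROW EXECUTE FUNCTION notify_mqtt();  -- use EXECUTE PROCEDURE on PostgreSQL < 11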

RSqLParser – tool to parse your SQL queries.


Feed: R-bloggers.
Author: Rimi.

[This article was first published on R – FordoX, and kindly contributed to R-bloggers.]


A slow-performing query is a ticking bomb that can lead to an explosion, i.e. a huge performance overhead in your application, at any time, especially when there is load on the database servers. Knowing the ins and outs of your SQL query is of utmost importance in defusing the bomb.

This is not the only scenario where knowing your SQL is important. From your slow query logs, you might want to find the most used tables and the times when a particular table gets the most hits, in order to do some analysis. This information can help you decide upon a time to take dumps or fire ALTER queries on the table.

Say, for instance, you have a relatively large SQL query embedded in your application code which has tens of bind variables scattered here and there. For debugging purposes, you might want to replace those variables with your chosen values and fire the query in a particular SQL execution tool which does not support dynamic bind variable replacement.

To cater to all these needs, I felt there was a need for a SQL parser in R and came up with this package – RSqlParser, inspired by Java’s JSqlParser. This tool will come in handy for carrying out many kinds of analysis on SQL queries.

With this package, you can design your own free tool to identify the reasons for your poorly performing queries, or to address your various other use cases.

RSqlParser is a non-validating SQL parser. It expects syntactically correct SQL statements. It can be used to get various components of SQL statements.

Currently, it supports only SELECT statements.

library(RSqlParser)

Methods

There are currently 4 methods in the package:

  • get_all_bind_variables: Get the bind variables in the sql.
  • get_all_select_cols_with_alias: Get the names of the selected columns in the sql.
  • get_all_subqueries: Get the subqueries in the sql.
  • get_all_tables_with_alias: Get the names of the tables with alias present in the sql.

There are many more methods waiting to be released in upcoming versions of the package. Not only that, but in upcoming versions the package should be able to parse all DML and DDL statements.

Till then, if you are facing any issue using the package, please let me know.




Webinar June 25: How to Avoid Pitfalls in Schema Upgrade with Percona XtraDB Cluster


Feed: Planet MySQL
Author: MySQL Performance Blog

In this webinar, Sveta Smirnova, MySQL Engineer at Percona, will uncover nuances of Percona XtraDB Cluster (PXC) schema upgrades and point out details you need to give extra attention to.

Percona XtraDB Cluster (PXC) is a 100% synchronized cluster with regard to DML operations. This is ensured by the optimistic locking model and the ability to roll back a transaction which cannot be applied on all nodes. However, DDL operations are not transactional in MySQL. This adds complexity when you need to change the schema of the database. Changes made by DDL may affect the results of queries, therefore all modifications must replicate on all nodes prior to the next data access. For operations that run momentarily this can be easily achieved, but schema changes may take hours to apply. Therefore, in addition to the safest synchronous blocking schema upgrade method, TOI, PXC supports a more relaxed, though not safe, method: RSU. RSU (Rolling Schema Upgrade) is advertised to be non-blocking, but you still need to take care of updates running while you are performing such an upgrade. Surprisingly, even updates on unrelated tables and schemas can cause an RSU operation to fail.
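For context, the upgrade method is selected through the wsrep_OSU_method variable; a minimal sketch of an RSU-style change on one node (the table name is illustrative):

SET SESSION wsrep_OSU_method = 'RSU';
ALTER TABLE t1 ADD COLUMN created_at DATETIME;
SET SESSION wsrep_OSU_method = 'TOI';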

Please join Sveta Smirnova on Thursday, June 25, at 12 pm EDT for her webinar “How to Avoid Pitfalls in Schema Upgrade with Percona XtraDB Cluster”.

Register Now

If you can’t attend, sign up anyway and we’ll send you the slides and recording afterward.

Claire Giordano: Release notes for Citus 9.3, the extension that scales out Postgres horizontally

$
0
0

Feed: Planet PostgreSQL.

Our latest release to the Citus open source extension to Postgres is Citus 9.3.

If you’re a regular reader of the Citus Blog, you already know Citus transforms Postgres into a distributed database, distributing your data and SQL queries across multiple servers. This post—heavily inspired by the internal release notes that lead engineer Marco Slot circulated internally—is all about what’s new & notable in Citus 9.3.

And if you’re chomping at the bit to get started and try out Citus open source, just go straight to downloading the Citus open source packages for 9.3. Or head over to the Citus documentation to learn more.

Because Citus 9.3 improves our SQL support for window functions in Citus, we decided to add a few “windows” to the racecar in the Citus 9.3 release graphic below.

Citus 9.3 racecar graphic now has “windows”, because of the window function support added in Citus 9.3.

For those who prefer bullets, a summary of all things new in Citus 9.3

Citus 9.3 builds on top of all the HTAP performance goodness in Citus 9.2 and brings you advanced support for distributed SQL, operational improvements, and things that make it easier to migrate from single-node Postgres to a distributed Citus cluster.

Before we dive into what’s new in Citus 9.3, here’s a quick overview of the major themes:

  • ADVANCED DISTRIBUTED SQL SUPPORT
    • Full support for Window functions (enabling more advanced analytics use cases)
    • Improved shard pruning
    • INSERT..SELECT with sequences
    • Support for reference tables on the Citus coordinator
  • OPERATIONAL IMPROVEMENTS
    • Adaptive connection management
    • Propagation of ALTER ROLE across the cluster
    • Faster, low-memory pg_dump
    • Local data truncation function

Window functions

Window functions are a powerful SQL feature that enable you to run algorithms (do transformations) on top of your Postgres query results, in relation to the current row. Window function support for cross-shard queries had become one of the top feature requests from our Citus analytics users.

Prior to Citus 9.3, Postgres window functions were always supported for router (e.g. single tenant) queries in Citus—and we also supported window functions in SQL queries across shards that used PARTITION BY distribution_column.

The good news: As of Citus 9.3, Citus now has full support for Postgres window functions, to enable more advanced analytics use cases.

If you’re not yet familiar with Postgres window functions, here’s a simple example. The SQL query below uses window functions (via the OVER syntax) to answer 3 questions:

  • for the current person, whose birthday is right before that person’s birthday?
  • how many people have a birthday in the same year as this person?
  • how many people have a birthday in the same month as this person?
SELECT
   name,
   birth_date,
   lag(name) OVER (ORDER BY extract(doy from birth_date)) AS previous_birthday,
   count(*) OVER (PARTITION BY extract(year from birth_date)) AS same_year,
   count(*) OVER (PARTITION BY extract(month from birth_date)) AS same_month
FROM
   birthdays
ORDER BY
   name;

Marco Slot reminded me: another way to understand window functions is to think about the Postgres planner, and exactly where in the process the window functions are handled.

This sketch from Julia Evans is a super useful way to visualize the order that things run in a PostgreSQL query. You can see in Julia’s zine that the SELECT is not at the beginning but rather, the SELECT is run after doing a GROUP BY.

Tweet: SQL queries run in this order
Source: This tweet from Julia Evans, @b0rk on Twitter.

For those of you who work with analytics, Postgres window functions can be incredibly powerful because you don’t have to write a whole new algorithm or rewrite your SQL query to use CTEs (common table expressions).

Improved shard pruning

We received an amazing open source contribution on the Citus github repo from one of our Citus enterprise users who wanted their Postgres queries to be faster. Many of their SQL queries only need to access a small subset of shards, but when the WHERE clause contained expressions involving OR, some of their SQL queries were still querying all shards.

By expanding the shard pruning logic, our Citus customer (thank you, Markus) has made it work with arbitrary Boolean expressions. So now, these types of Postgres queries will go to fewer shards (in this case, to just 2 or 3 shards instead of 32). The net result: faster query performance.
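For example (the table and columns below are hypothetical, with tenant_id as the distribution column), a filter that combines the distribution column with OR can now be pruned to just the matching shards instead of hitting all of them:

SELECT count(*)
FROM page_views
WHERE (tenant_id = 4 AND view_type = 'click')
   OR (tenant_id = 7 AND view_type = 'impression');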

INSERT..SELECT with sequences

One of the most powerful commands in the Citus extension to Postgres is INSERT..SELECT, because it can be used for parallel, distributed data transformations that run as one distributed transaction.

Citus 9.3 enables inserting into a distributed table with a sequence, which was one of the few previous limitations of INSERT..SELECT on distributed tables.
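As a minimal sketch (table, column, and source names are hypothetical), a distributed table whose key column is backed by a sequence can now be filled with INSERT..SELECT:

CREATE TABLE events (event_id bigserial, tenant_id int, payload text);
SELECT create_distributed_table('events', 'tenant_id');

-- event_id values are drawn from the sequence during the distributed INSERT..SELECT
INSERT INTO events (tenant_id, payload)
SELECT tenant_id, payload FROM raw_events;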

Support for reference tables on the Citus coordinator node

To take advantage of distributed tables in Citus, it’s important that your SQL queries can be routed or parallelized along the distribution column. (If you’re still learning about how sharding in Citus works, Joe’s documentation on choosing the distribution column is a good place to start.)

Anyway, the need to route queries along the distribution column (sometimes called the “distribution key” or the “sharding key”) means that if you’re migrating an application originally built on single-node PostgreSQL over to Citus, you might need to make a few data model and query changes.

One of the ways Citus 9.3 simplifies migrations from single-node Postgres to a distributed Citus cluster is by improving support for using a mixture of different table types in Citus: reference tables, distributed tables, and local tables (sometimes referred to as “standard Postgres tables.”)

With Citus 9.3, reference tables can now be JOINed with local tables on the Citus coordinator.

This was sort of possible before, but it has become a lot better in Citus 9.3: you can now keep the reference table on the Citus coordinator node and JOIN between the local table and the reference table on the coordinator itself.

And if your application reads from the reference table on the Citus coordinator node, there won’t be a roundtrip to any of the Citus worker nodes, because you’ll be able to read directly from the coordinator (unless you actually want to round robin to balance the load across the cluster.)

In the example SQL below, after adding the coordinator to the Citus metadata, JOINs between local tables and reference tables just work. (And yes, we’re taking steps to remove references to “master” from Citus and to rename master_add_node to something else. Naming matters, stay tuned.)

-- Add coordinator to Citus metadata on the coordinator
SELECT master_add_node('10.0.0.4', 5432, groupid := 0);

-- Joins between local tables and reference tables just work
BEGIN;
INSERT INTO local_table VALUES (1,2);
SELECT * FROM local_table JOIN reference_table ON (x = a);
END;

An example of how reference tables on the Citus coordinator can help

But what if you have tables like the ones in the schema below, in which:

  • clicks joins with ads
  • ads joins with publishers
  • ads joins with campaigns

Prior to Citus 9.3, for the example I explain above, you would have had to make all the tables either a distributed table or a reference table in order to enable these joins, like this:

Diagram 1: Table Schema for this scenario, prior to Citus 9.3

Now, with Citus 9.3, because you can add a reference table onto the Citus coordinator node, you can JOIN local tables (aka standard Postgres tables) on the Citus coordinator with Citus reference tables.

Imagine if the clicks table is the only really big table in this database. Maybe the size and scale of the clicks table is the reason you’re considering Citus, and maybe the clicks table only needs to JOIN with the ads table.

If you make the ads table a reference table, then you can JOIN all the shards in the distributed clicks table with the ads reference table. And maybe everything else is not that big and we can just keep the rest of the tables as local tables on the Citus coordinator node. This way, any Postgres query that hits those local tables, any DDL, well, we don’t have to change anything, because the query is still being handled by Postgres, by the Citus coordinator acting as a Postgres server.

Diagram 2: Table Schema for this scenario, enabled by Citus 9.3 and the ability to put a reference table on the coordinator

One piece we don’t have yet is support for foreign keys between the reference table and the local tables, but we are considering making that feature available in the future. (Do you think that would be useful? You can always reach us on our Citus public slack.)

Interesting side-effect: distributed tables can sit entirely on Citus coordinator node

One interesting side effect of this new Citus 9.3 feature: you can now have distributed tables that sit entirely on the coordinator. You can add the coordinator to the metadata and create a distributed table where all the shards are on the coordinator, and the Postgres queries will just work. Just think how useful that can be for testing, since you can run your tests against that single Citus coordinator node.

Adaptive connection management (a super awesome operational improvement)

Those of you who already use Citus to scale out Postgres realize that Citus does much more than just shard your data across the nodes in a database cluster. Citus also distributes the SQL queries themselves across the relevant shards and to the worker nodes in the database cluster.

The way Citus does this: Citus parallelizes your distributed SQL queries across multiple cores per worker node, by opening multiple Postgres connections to each worker during the SQL query execution, to query the shards on each Citus worker node in parallel.

Unlike the built-in parallel query mechanism in PostgreSQL, Citus continues to parallelize queries under high concurrency. However, connections in PostgreSQL are a scarce resource and prior to Citus 9.3, distributed SQL queries could fail when the coordinator exceeded the Postgres connection limit on the Citus worker nodes.

The good news is, in Citus 9.3, the Citus adaptive executor now tracks and limits the number of outgoing connections to each worker (configurable using citus.max_shared_pool_size) in a way that achieves fairness between SQL queries, to avoid any kind of starvation issues.

Effectively, the connection management in the Citus adaptive executor is now adaptive to both the type of PostgreSQL query it is running, as well as the load on the system. And if you have a lot of analytical SQL queries running at the same time, the adaptive connection management in Citus will now magically keep working without needing to set up PgBouncer.
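For example (the value is purely illustrative, and assuming the setting can be applied with a configuration reload), you could cap the outgoing connections per worker like this:

ALTER SYSTEM SET citus.max_shared_pool_size = 100;
SELECT pg_reload_conf();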

Propagation of ALTER ROLE across the cluster

In Citus 9.3, Citus now automatically propagates commands like ALTER ROLE current_user SET..TO.. to all current and new workers. This gives you a way to reliably set configurations across all nodes in the cluster. (N.B. You do need to be the database superuser to take advantage of this new feature.)
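For instance (the parameter and value here are just an illustration), a per-role setting issued once on the coordinator now reaches every current and future worker:

ALTER ROLE current_user SET work_mem TO '256MB';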

Faster, low-memory pg_dump for Citus

Citus stores the results of PostgreSQL queries on shards in a tuple store (an in-memory data structure that overflows into a file) on the Citus coordinator. The PostgreSQL executor can then process these results on the Citus coordinator as if they came from a regular Postgres table.

However, prior to Citus 9.3, the approach of writing to a tuple store sometimes caused issues when using pg_dump to get the entire table’s contents, since the full distributed table might not fit on the Citus coordinator’s disk.

In Citus 9.3, we changed the implementation of the COPY <table> TO STDOUT command, in order to stream tuples directly from the workers to the client without storing them.

Bottom line, if you need to pg_dump your Citus distributed database, now you can stream your Citus database to your client directly, without using a lot of memory or storage on the Citus coordinator node.

Local data truncation function

When calling create_distributed_table on a table on the coordinator that already contains data, the data is copied into newly created shards, but for technical reasons it is not immediately removed from the local table on the Citus coordinator node.

The presence of the local data could later cause issues, but the procedure for removing the old data was prone to mistakes and sometimes led to operational issues. In Citus 9.3, we introduced a new function, truncate_local_data_after_distributing_table, to make it easier to clean up the local data—which saves space and makes sure you won’t run into situations where you cannot create a Postgres constraint because the local data does not match it.
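As a sketch (table and column names are hypothetical), the cleanup is now a single function call after distributing the table:

SELECT create_distributed_table('clicks', 'ad_id');

-- later, once you no longer need the local copy on the coordinator
SELECT truncate_local_data_after_distributing_table('clicks');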

Helping you scale out Postgres with Citus 9.3

By the time I publish this post, our Citus distributed engineering team will be well on their way to Citus 9.4. I can’t wait to find out what our Postgres team will bring to us next.

With the advanced support for distributed SQL, operational improvements, and things that make it easier to migrate from single-node Postgres to a distributed Citus cluster—well, with Citus 9.3, your window functions will just work, you don’t have to worry about deciding whether every table needs to be distributed or not, you can do pg_dumps more easily, adaptive connection management will improve your operational experience… In short, we think Citus 9.3 is pretty darn awesome. I hope you do, too.

Haroon .: RESTful CRUD API using PostgreSQL and Spring Boot – Part 2

$
0
0

Feed: Planet PostgreSQL.

This article is an extension of the previous article, which was a kickstart for building an application using Spring Boot and PostgreSQL. Java has no built-in feature that maps class objects to database tables; therefore, we use an Object-Relational Mapper (ORM). JPA is an abstraction above JDBC that works as an ORM, mapping Java objects to database entities using metadata. Querying the database for all the user operations is another task of JPA, for which it uses the concept of a “repository”. Consequently, with JPA the development of your product gets considerably faster, helped along by the auto-configuration that Spring Boot aims to provide for Spring applications.

Note: You should read the documentation for Spring Boot objects using the package and type name provided in the code samples below.

As mentioned, entities are mapped to tables in the database, where every entity instance holds the data of one row in the table. Following this convention, we will use the @Entity annotation to map our POJO (Plain Old Java Object) to a table in the database and store its data there.

package com.demos.crud.data.models;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "people")
public class Person {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)

    @Column(name = "id")
    public Long id;

    @Column(name = "name")
    public String name;

    @Column(name = "role")
    public String role;

    public Person() {}

    public Person(long id, String name, String role) {

        this.id = id;
        this.name = name;
        this.role = role;
    }

    @Override
    public String toString() {

        StringBuilder builder = new StringBuilder();

        builder.append(String.valueOf(id));
        builder.append(", ");
        builder.append(name);
        builder.append(", ");
        builder.append(role);

        return builder.toString();
    }
}

The @Entity annotation is defined at the class level; it makes JPA aware of this class as a database entity and helps create the table and schema in the database on startup. Like other ORMs, JPA also requires a uniquely identifying primary key, which we have set to automatic generation (1, 2, 3, and so on). For different data types, the JPA provider generates the identities in different ways.

Creating the repository

CRUD support in Spring is available through the CrudRepository type, which offers a CRUD implementation over the underlying database structure.

You can create a new CrudRepository by linking the entity type and the type of primary key using Java generics as:

package com.demos.crud.data.repositories;

import com.demos.crud.data.models.Person;
import org.springframework.data.repository.CrudRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface PersonRepository extends CrudRepository<Person, Long> {
}

That is the entire JPA repository code you need to write to provide ORM support in your application.

Connection settings for repository

Spring allows you to configure application settings from a file called application.properties. You can configure the connection string and connection pooling (see part 1 of this series) for PostgreSQL.

## default connection pool
spring.datasource.hikari.connectionTimeout=20000
spring.datasource.hikari.maximumPoolSize=5
spring.jpa.generate-ddl=false

## PostgreSQL
spring.datasource.url=jdbc:postgresql://172.17.0.2:5432/postgres
spring.datasource.username=postgres
spring.datasource.password=mysecretpassword

# spring.jpa.properties.hibernate.default_schema=techwriting
spring.datasource.hikari.schema=techwriting

# drop and recreate the table on startup; good for testing, comment this out in production
spring.jpa.hibernate.ddl-auto=create

We configure the connection pool to allow up to 5 connections, and we set the PostgreSQL connection string. Spring Boot JPA will read these settings and create a PostgreSQL ORM. Read more about application.properties here.

Creating a service

Although this step is not strictly necessary, it is a good security practice to create a service around your ORM and PostgreSQL connection to prevent unvalidated access to your database engine. We do that by creating an interface:

package com.demos.crud.data.services;

import java.util.List;
import com.demos.crud.data.models.Person;

public interface PeopleService {
    List<Person> findAllPeople();
    Person findById(long id);
    Person insert(Person p);
    boolean delete(long id);
    boolean update(Person p);
}

This interface has one job: expose the repository functions, and only the methods that the application needs. We will create a concrete implementation of the service as a class and use that class in our controllers later.

package com.demos.crud.data.services;

import java.util.List;
import java.util.Optional;
import com.demos.crud.data.models.Person;
import com.demos.crud.data.repositories.PersonRepository;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class PeopleServiceImpl implements PeopleService {

    // The repository is injected by Spring's dependency injection
    @Autowired
    private PersonRepository repository;

    @Override
    public List<Person> findAllPeople() {
        return (List<Person>) repository.findAll();
    }

    @Override
    public Person insert(Person p) {
        return repository.save(p);
    }

    @Override
    public boolean delete(long id) {
        try {
            repository.deleteById(id);
            return true;
        } catch (Exception e) {
            System.out.println(e.getMessage());
            return false;
        }
    }

    @Override
    public Person findById(long id) {
        Optional<Person> result = repository.findById(id);
        if (result.isPresent()) {
            return result.get();
        } else {
            return null;
        }
    }

    @Override
    public boolean update(Person p) {
        try {
            repository.save(p);
            return true;
        } catch (Exception e) {
            System.out.println(e.getMessage());
            return false;
        }
    }
}

This class will be used through its interface in our application; Spring Boot injects the implementation automatically via autowiring (dependency injection). Note that we also use @Autowired to inject the repository into this class.

Generating schema of the database

When you launch the Spring Boot application, JPA automatically creates the table in the schema you specify (see application.properties above). You can verify this through a PostgreSQL admin panel such as OmniDB by providing the connection details and connecting.

Note that if you are connecting to an existing database or schema, you should disable Spring's schema generation to prevent data loss. PostgreSQL has the columns created for us, with their types specified by JPA. The SQL for this table was provided in the previous part of this article as well.
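For reference, the generated table looks roughly like the following (a sketch only; the exact DDL Hibernate emits depends on the dialect and the ID generation strategy):

CREATE TABLE techwriting.people (
    id   bigint NOT NULL,
    name varchar(255),
    role varchar(255),
    PRIMARY KEY (id)
);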

For this demo, we will use a RestController to create a RESTful API that demonstrates connecting the Java application to PostgreSQL. The controller has this structure:

package com.demos.crud.controllers.apis;

import java.util.List;
import com.demos.crud.data.models.Person;
import com.demos.crud.data.services.PeopleService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.DeleteMapping;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.PutMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController()
@RequestMapping("/api/people")
public class PeopleApiController {

    @Autowired
    PeopleService peopleService;
    
    // Routes here
}

This API will listen on /api/people for incoming requests and will have the PeopleService instance injected by the platform. 

Adding the routes

You can create routes in your application to receive requests and provide responses. I have created the following routes in the demo REST controller (add them inside the controller structure shown above, where the “// Routes here” comment is):

@GetMapping("")
public List<Person> getAllPeople() {
    return peopleService.findAllPeople();
}

@GetMapping("{id}")
public Person getPerson(@PathVariable long id) {
    return peopleService.findById(id);
}

@PostMapping("")
public String addPerson(@RequestBody Person person) {

    if(person != null) {
        peopleService.insert(person);
        return "Added a person";
    } else {
        return "Request does not contain a body";
    }
}

@DeleteMapping("{id}")
public String deletePerson(@PathVariable("id") long id) {

    if(id > 0) {
        if(peopleService.delete(id)) {
            return "Deleted the person.";
        } else {
            return "Cannot delete the person.";
        }
    }
    return "The id is invalid for the person.";
}

@PutMapping("")
public String updatePerson(@RequestBody Person person) {
    if(person != null) {
        peopleService.update(person);
        return "Updated person.";
    } else {
        return "Request does not contain a body";
    }
}

You can add more routes as you need. I have created routes for reading, creating, updating, and deleting. Read more about routing in Spring applications here. The routes that we created do the following:

  1. getAllPeople is responsible for reading all the people from database and returning them as a List of Person type.
  2. getPerson accepts a variable from the URL path and uses that as an ID to query the database for a Person. If the person is not found, a null value is returned. You can use a simple if…else block to send a custom response.
  3. addPerson corresponds to a POST HTTP request and retrieves the Person details sent in the HTTP request body. Read next parts to learn how to send an object with the request.
  4. deletePerson corresponds to a DELETE HTTP request method and requests the repository to delete a person record from database with the specified ID.
  5. updatePerson corresponds to a PUT HTTP request method and uses the same function (“save”) as used in the addPerson function to update the record. Read the next part to learn the difference in these two functions.

In the next step, you will call the HTTP API endpoints on the application to perform CRUD operations on PostgreSQL database and review the state changes using OmniDB. 

CRUD Operations

I will use Postman to test the API. You can use command-line or other tools to send the requests to the API as well. In the Postman, input the API endpoints and make the request. The default endpoint for our Java application is http://localhost:8080 and our API is hosted at /api/people/.

Create a record

Before sending any requests, let us quickly review the state of our database: the table does not contain any records yet.

Now, to add a new record to our database, we issue a request to our API endpoint with the POST HTTP method, passing the JSON representation of our data in the request body for the Java code to extract.

In the next section we see the state of our database after this operation, with the record added to the table.

Read the record(s)

The entity has been added to the database; we can confirm it in the OmniDB admin panel using this SQL:

SELECT t.id
     , t.name
     , t.role
  FROM techwriting.people t
 ORDER BY t.id

You can see that the person record we added is now found in the database. And you can also confirm that this works using the RESTful API method: 

The web application returns the objects from the database. We only have a single person in the database so far. Go ahead: execute a few more HTTP calls to create up to 5 records, and see how the Java application returns the data from the database as a nice JSON document.

Updating a record

You can update a record by using the PUT method on the endpoint and passing an object as the payload. So far, we have a record in our database with the role “Technical Writer”. To update this record’s role to “Software Engineer”, we can use the updatePerson route behind the PUT HTTP method and send the record as the request payload:

We sent the full details for the record; note the ID, since the ORM uses the primary key to decide whether to create a new record or update an existing one. Postman shows that we were able to update the person. We can check the record in OmniDB by rerunning the SQL command shared above, which results in the following data grid:

This shows the record was updated in the database as well. Our database now contains a record that we would like to delete; in the next section we will take a look at deleting records.

Deleting a record

You can use the Java methods to delete the objects from the database by:

  1. Passing the object (Person in this case)
  2. Passing the ID (Long object in our case)
  3. Iterable instance

You can use Iterables and transactions to reduce the load on the database, passing a batch of records to be processed in a single request.

For sake of simplicity, we do a deletion based on the id of the Person (see the Java code for delete route above). 

The route captures the ID “1” and JPA uses this ID to delete the records from PostgreSQL database. You can rerun the SQL (provided above) to verify the objects are deleted:

You can now check using the GET request on the API endpoint to get the list of people as well. Likewise, it returns an empty list:

This shows that each operation you perform can create new records, update them, or remove them.

In this blog, we covered the PostgreSQL side of Spring Boot and the JPA ORM. We created the entities with JPA and linked them to a PostgreSQL database. We had JPA create the tables and records in the database for us. We also created the JPA repository for CRUD operations and a custom service on top of the repository as a security layer to prevent direct access to our JPA resources.

We created the API controller and linked it with the JPA repository using Spring’s dependency injection. We used Postman to perform CRUD operations on the database. You learnt how easy it is to implement a database and data access layer on top of PostgreSQL instances. Java JPA provides type mapping from POJOs to PostgreSQL types. We learnt that we can perform CRUD operations and use HTTP methods as a design principle to control which action is performed by our API.

You learnt how to connect a PostgreSQL database to your Java application; you can use the code provided above (and in the previous part) to develop your own applications.


Faster, Better, Stronger, InnoDB in MariaDB Server 10.5

$
0
0

Feed: Clustrix Blog.
Author: Marko Mäkelä.

InnoDB is the default storage engine for MariaDB Enterprise Server and MariaDB Community Server. While originally based on the MySQL implementation of InnoDB, MariaDB has been diverging from the original for years in order to give MariaDB users a better experience. For example by implementing persistent AUTO_INCREMENT in 10.2, and trx_sys.mutex scalability improvements, instant ADD COLUMN in 10.3, and instant DROP COLUMN and other ALTER TABLE improvements in 10.4. MariaDB Server 10.5 continues on this path.

MariaDB Server 10.5 includes significant improvements to the InnoDB storage engine. By making better use of hardware resources in InnoDB, we’ve improved performance and scalability and made backup and recovery faster and easier.

Generally speaking the improvements can be grouped into three categories: changes to configuration parameters; changes to background tasks; and changes to the redo log and recovery.

Changes to Configuration Parameters

We have changed how some configuration parameters behave. We have deprecated some and hardwired others.

  • innodb_buffer_pool_instances=1 Hardwired at 1. In our tests, a single buffer pool provides the best performance in almost every instance.
  • innodb_page_cleaners=1 Hardwired at 1. We only have 1 buffer pool instance so only 1 cleaner is needed.
  • innodb_log_files_in_group=1 (only ib_logfile0) Hardwired at 1. In our benchmarks, 1 showed a performance improvement. In 10.2 you could set it to 1 but the default was 2. In 10.5 you can only have a single file.
  • innodb_undo_logs=128 (the “rollback segment” component of DB_ROLL_PTR) Hardwired at 128 which has been the default for a long time. It can no longer be changed.
  • For MariaDB Enterprise Server only, innodb_log_file_size and innodb_purge_threads were made dynamic. They can be changed with SET GLOBAL without restarting the server (see the example after this list).
  • Cleanup of InnoDB Data Scrubbing code. The background operations for scrubbing freed data have been removed, and related configuration parameters have been deprecated. When scrubbing is enabled or page_compressed tables are being used, the contents of freed pages will be zeroed out or freed to the file system by the normal page flushing mechanism.
  • SHOW GLOBAL STATUS improvement. SHOW GLOBAL STATUS now shows several variables from SHOW ENGINE INNODB STATUS. This simplifies monitoring of these parameters.
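As a quick illustration of the dynamic parameters and the SHOW GLOBAL STATUS improvement mentioned above (the log size below is arbitrary):

-- MariaDB Enterprise Server only: resize the redo log without a restart
SET GLOBAL innodb_log_file_size = 4294967296;

-- counters that previously required SHOW ENGINE INNODB STATUS
SHOW GLOBAL STATUS LIKE 'Innodb%';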

Changes to Background Tasks

We addressed some maintenance debt by making changes to a number of background tasks.

  • Eliminated background merges of the change buffer. The change buffer was created some 20 years ago for reasons that made sense at the time, but a lot has changed since then. Asynchronous merges of the change buffer have been recognized as a cause of problematic I/O spikes and random crashes. The change buffer intends to speed up modifications to secondary indexes when the page is not in the buffer pool. We will now only merge buffered changes on demand, when the affected secondary index page must be read.
  • Single buffer pool. In extensive testing, the single buffer performs best in almost every case.
    • with a single buffer pool, it is easier to employ std::atomic data members
    • single buffer pool only requires a single page cleaner, single flush list of dirty pages
    • In a write-heavy workload on a tiny buffer pool, we observed some increased contention on the buffer pool mutex, but that was resolved by making the buffer pool more scalable. A refactored, cache-aware hash table implementation with simple std::atomic based read-write locks addressed this scalability problem.
  • Background tasks use thread pool.  Background tasks now use a thread pool rather than separate internal threads. The internal thread pool dynamically creates or destroys threads based on the number of active tasks. In particular, the I/O threads have been cleaned up.

    This simplifies management because it is easier to configure the maximum number of tasks of a kind than to configure the number of threads. Also, in MariaDB Enterprise Server you can set  innodb_purge_threads dynamically with SET GLOBAL.

    There are still some tasks that use dedicated threads, such as background rollback of incomplete transactions and crash recovery. And there are still some tasks that have yet to be moved to the thread pool, for example, the encryption key rotation thread.

  • Innodb-wide r/w lock for locking concurrent DDL replaced by metadata locks(MDL) on table name. Now we use metadata locks when executing some background operations. We acquire the lock to prevent users from executing something like DROP TABLE while a purge of history is executing. This used to be covered by dict_operation_lock covering any InnoDB table.  Acquiring only MDL on the table name improves scalability.

Changes to Redo Log and Recovery

We have improved the Redo Log record format to make it more compact and extensible which also makes backup and recovery faster and more reliable. The changes improve the Redo Log and Recovery while still guaranteeing atomicity, consistency, isolation, and durability (ACID), and full crash safety of user transactions.

  • Freed pages log record. Creating log records that indicate when data pages have been freed, for example, when you are dropping an index or performing a large delete of records in a data file accomplishes a couple of things:
    • We now avoid writes of freed pages after DROP (or rebuild). The new log records indicate that the pages will be freed so the contents will no longer be written to the data files.
    • Recovery is faster since recovery can skip reading those pages when it sees that the page was freed.
  • New log record format. The new log record format explicitly encodes lengths which makes the physical format easy to parse and minimizes copying because a record can never exceed innodb_page_size. This makes for stronger and faster backup and recovery.
  • Improved validation of logical records for UNDO_APPEND, INSERT, DELETE. The special logical records improve performance during normal operations and backup and recovery due to fewer writes and reads of data pages. Improved validation detects corrupted data more reliably.
  • Improved group commit. Reduces contention and improves scalability and write speed.
    • group_commit_lock introduced for more efficient synchronization of redo log writing and flushing. It reduces CPU consumption on log_write_up_to(). It also reduces spurious wakeups, and improves the throughput in write-intensive benchmarks.
  • Optional libpmem interface added for improved performance on Intel ® Optane™ DC Persistent Memory Module. Requires compiling from source.
  • Optimized memory management on recovery (or mariabackup --prepare)

Conclusion

The changes we’ve made to the InnoDB storage engine in MariaDB Server 10.5 provide significant performance improvements, scalability and easier management while supporting in-place upgrades from older versions. It will provide a greatly improved experience to our users now and going forward.

Download MariaDB Community Server 10.5 and give it a try yourself to see the difference.

More Information

KafkaCDC

$
0
0

Feed: MariaDB Knowledge Base Article Feed.

Overview

The KafkaCDC module reads data changes in MariaDB via replication and converts
them into JSON objects that are then streamed to a Kafka broker.

DDL events (CREATE TABLE, ALTER TABLE) are streamed as JSON objects in the
following format (example created by CREATE TABLE test.t1(id INT)):

{
  "namespace": "MaxScaleChangeDataSchema.avro",
  "type": "record",
  "name": "ChangeRecord",
  "table": "t2",              // name of the table
  "database": "test",         // the database the table is in
  "version": 1,               // schema version, incremented when the table format changes
  "gtid": "0-3000-14",        // GTID that created the current version of the table
  "fields": [
    {
      "name": "domain",       // First part of the GTID
      "type": "int"
    },
    {
      "name": "server_id",    // Second part of the GTID
      "type": "int"
    },
    {
      "name": "sequence",     // Third part of the GTID
      "type": "int"
    },
    {
      "name": "event_number", // Sequence number of the event inside the GTID
      "type": "int"
    },
    {
      "name": "timestamp",    // UNIX timestamp when the event was created
      "type": "int"
    },
    {
      "name": "event_type",   // Event type
      "type": {
        "type": "enum",
        "name": "EVENT_TYPES",
        "symbols": [
          "insert",           // The row that was inserted
          "update_before",    // The row before it was updated
          "update_after",     // The row after it was updated
          "delete"            // The row that was deleted
        ]
      }
    },
    {
      "name": "id",           // Field name
      "type": [
        "null",
        "long"
      ],
      "real_type": "int",     // Field type
      "length": -1,           // Field length, if found
      "unsigned": false       // Whether the field is unsigned
    }
  ]
}

The domain, server_id and sequence fields contain the GTID that this event
belongs to. The event_number field is the sequence number of events inside the
transaction starting from 1. The timestamp field is the UNIX timestamp when
the event occurred. The event_type field contains the type of the event, one
of:

  • insert: the event is the data that was added to MariaDB
  • delete: the event is the data that was removed from MariaDB
  • update_before: the event contains the data before an update statement modified it
  • update_after: the event contains the data after an update statement modified it

All remaining fields contain data from the table. In the example event this
would be the id field.

DML events (INSERT, UPDATE, DELETE) are streamed as JSON objects that
follow the format specified in the DDL event. The objects are in the following
format (example created by INSERT INTO test.t1 VALUES (1)):

{
  "domain": 0,
  "server_id": 3000,
  "sequence": 20,
  "event_number": 1,
  "timestamp": 1580485945,
  "event_type": "insert",
  "id": 1
}

The router stores table metadata in the MaxScale data directory. The
default value is /var/lib/maxscale/<service name>. If data for a table
is replicated before a DDL event for it is replicated, the CREATE TABLE
will be queried from the master server.

During shutdown, the Kafka event queue is flushed. This can take up to 60
seconds if the network is slow or there are network problems.

Configuration

The servers parameter defines the set of servers where the data is replicated
from. The replication will be done from the first master server that is found.

The user and password of the service will be used to connect to the
master. This user requires the REPLICATION SLAVE grant.
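As a sketch (credentials and host pattern mirror the example configuration further below; adjust them to your environment), the replication user could be created like this:

CREATE USER 'maxuser'@'%' IDENTIFIED BY 'maxpwd';
GRANT REPLICATION SLAVE ON *.* TO 'maxuser'@'%';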

The KafkaCDC service must not be configured to use listeners. If a listener is
configured, all attempts to start a session will fail.

Parameters

bootstrap_servers

The list of Kafka brokers to use in host:port format. Multiple values
can be separated with commas. This is a mandatory parameter.

topic

The Kafka topic where the replicated events will be sent. This is a
mandatory parameter.

enable_idempotence

Enable idempotent producer mode. This feature requires Kafka version 0.11 or
newer to work and is disabled by default.

When enabled, the Kafka producer enters a strict mode which avoids event
duplication due to broker outages or other network errors. In HA scenarios where
there are more than two MaxScale instances, event duplication can still happen
as there is no synchronization between the MaxScale instances.

The Kafka C library,
librdkafka,
describes the parameter as follows:

When set to true, the producer will ensure that messages are successfully
produced exactly once and in the original produce order. The following
configuration properties are adjusted automatically (if not modified by the
user) when idempotence is enabled: max.in.flight.requests.per.connection=5 (must
be less than or equal to 5), retries=INT32_MAX (must be greater than 0),
acks=all, queuing.strategy=fifo.

timeout

The connection and read timeout for the replication stream. The default
value is 10 seconds.

Example Configuration

The following configuration defines the minimal setup for streaming replication
events from MariaDB into Kafka as JSON:

# The server we're replicating from
[server1]
type=server
address=127.0.0.1
port=3306
protocol=MariaDBBackend

# The monitor for the server
[MariaDB-Monitor]
type=monitor
module=mariadbmon
servers=server1
user=maxuser
password=maxpwd
monitor_interval=5000

# The MariaDB-to-Kafka CDC service
[Kafka-CDC]
type=service
router=kafkacdc
servers=server1
user=maxuser
password=maxpwd
bootstrap_servers=127.0.0.1:9092
topic=my-cdc-topic

Limitations

  • The KafkaCDC module provides at-least-once semantics for the generated
    events. This means that each replication event is delivered to kafka at least
    once but there can be duplicate events in case of failures.

Haroon .: Bulk transactions with RESTful CRUD API using PostgreSQL and Spring Boot

$
0
0

Feed: Planet PostgreSQL.

A typical database-oriented application setup only has to work with single-operation SQL or RESTful execution: your customers might only be sending INSERT or UPDATE queries one at a time. We covered this approach in part 2 of our series, where we created a simple RESTful API that allows users to perform CRUD operations on a PostgreSQL database. More advanced, high-velocity solutions require an efficient approach that can handle thousands of database operations per second, per client.

Bulk or batch operations make sense when you are handling loads of data from a single origin, say, IoT or smart home devices. In the next section, you will learn why batching is needed when you are running a query multiple times.

Bulk Operations

In a database, each operation you perform, such as a SELECT or INSERT query, takes a trip to the database engine to be executed and committed. In a system with hundreds of queries each second, it can be difficult to maintain performance and server efficiency for the database as well as the web application. To solve this, we combine multiple queries and send them to the database in a single go. With PostgreSQL, you should be aware of INSERT statements that add multiple rows in a single go:

INSERT INTO table_name (col1, col2, col3) 
VALUES 
(val1, val2, val3),
(val4, val5, val6),
(val7, val8, val9);

Executing this INSERT statement is equivalent to the following three INSERT statements in terms of result:

INSERT INTO table_name (col1, col2, col3) 
VALUES (val1, val2, val3);

INSERT INTO table_name (col1, col2, col3)
VALUES (val4, val5, val6);

INSERT INTO table_name (col1, col2, col3) 
VALUES (val7, val8, val9);

Their primary difference is how the queries are sent to the server; read Inserting Data to learn more about this approach. The former approach inserts three records in a single round trip, whereas the latter sends three queries to the database engine to insert three records. PostgreSQL implicitly wraps those three queries in three separate transactions, adding extra overhead; read more on Transactions for details.
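If you do need to send the statements separately, one way to avoid the per-statement commit overhead is to wrap them in a single explicit transaction:

BEGIN;
INSERT INTO table_name (col1, col2, col3) VALUES (val1, val2, val3);
INSERT INTO table_name (col1, col2, col3) VALUES (val4, val5, val6);
INSERT INTO table_name (col1, col2, col3) VALUES (val7, val8, val9);
COMMIT;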

Using the JPA model we created in previous posts, we can update it to support batch operations while continuing to support single record operations. JPA repository methods that we used previously, such as:

  1. Save
  2. Delete
  3. Update

All offer overloaded methods that can be used to work on a list of records. The overloaded methods provide support for Iterable<T> as an argument instead of a single entity type.

Updating Model

Note: If you have been following part 1 and part 2 of this blog, then you should update the existing classes for the model as well as the service and MVC routes. If you are reading this part first, you can copy the code into your existing application.

We will rewrite the Person model to use sequence-based ID generation for the entity.

package com.demos.crud.data.models;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "people")
public class Person {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    @Column(name = "id")
    public Long id;

    @Column(name = "name")
    public String name;

    @Column(name = "role")
    public String role;

    public Person() {}

    public Person(long id, String name, String role) {
        this.id = id;
        this.name = name;
        this.role = role;
    }

    @Override
    public String toString() {

        StringBuilder builder = new StringBuilder();
        builder.append(String.valueOf(id));
        builder.append(", ");
        builder.append(name);
        builder.append(", ");
        builder.append(role);

        return builder.toString();
    }
}

This Entity will be used by JPA to enable batch insertions. 

Updating Service and Implementation

Before we connect the Spring Boot MVC layer to our JPA repository, we need to front it with a service that exposes only the necessary functions and hides sensitive resources such as the JPA repositories. To handle this, we create a service interface containing the methods that our MVC layer will use:

package com.demos.crud.data.services;

import java.util.List;
import com.demos.crud.data.models.Person;

public interface PeopleService {
    List<Person> findAllPeople();
    Person findById(long id);
    List<Person> findByIds(List<Long> id);

    Person insert(Person p);
    List<Person> insertAll(List<Person> p);

    boolean delete(Long id);
    boolean deleteAll(List<Person> ids);

    boolean update(Person p);
    boolean updateAll(List<Person> p);
}

Our PeopleService exposes CRUD functions for single entities as well as for lists of entities, where we apply batch operations. We will add the implementation in a Java class that will be injected into the MVC RESTful controllers. Create a new class PeopleServiceImpl and implement the PeopleService interface to connect to the JPA repository (to learn how to create the repository, please check part 2).

package com.demos.crud.data.services;

import java.util.List;
import java.util.Optional;
import com.demos.crud.data.models.Person;
import com.demos.crud.data.repositories.PersonRepository;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class PeopleServiceImpl implements PeopleService {

    @Autowired
    private PersonRepository repository;

    @Override
    public List<Person> findAllPeople() {
    return (List<Person>)repository.findAll();
    }

    @Override
    public Person findById(long id) {
        Optional<Person> result = repository.findById(id);
        if (result.isPresent()) {
            return result.get();
        } else {
            return null;
        }
    }

    @Override
    public List<Person> findByIds(List<Long> ids) {
        return (List<Person>)repository.findAllById(ids);
    }

    @Override
    public Person insert(Person p) {
        return repository.save(p);
    }

    @Override
    public List<Person> insertAll(List<Person> p) {
        return (List<Person>)repository.saveAll(p);
    }

    @Override
    public boolean delete(Long id) {
        try {
            repository.deleteById(id);
            return true;
        } catch (Exception e) {
            System.out.println(e.getMessage());
            return false;
        }
    }

    @Override
    public boolean deleteAll(List<Person> ids) {

        try {
            repository.deleteAll(ids);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    @Override
    public boolean update(Person p) {

        try {
            repository.save(p);
            return true;
        } catch (Exception e) {
            System.out.println(e.getMessage());
            return false;
        }
    }

    @Override
    public boolean updateAll(List<Person> p) {
        try {
            repository.saveAll(p);
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}

This is where we connect our service to the JPA repository and call its functions. A few things to note here:

  1. JPA repository uses Java’s Iterable<T> type as parameter for functions with lists.
  2. JPA repository methods that take an Iterable<T> return an Iterable<T> of the entities that have been persisted in the database.
  3. You can name methods of your implementation however you like them, but it is helpful to name them like repository’s methods. But you should avoid exposing the repository to RESTful controllers for security reasons.

In the code above, we have used the List<T> type and passed it to the methods that require an Iterable<T>. This is possible because Iterable<T> is a parent of List<T>, and List<T> is a widely used container type in Java applications.

Now that we have written the underlying DAO service, next step is to connect the service to our RESTful controller and create HTTP routes that support HTTP methods and their respective operations in the system. Two things to keep in mind while developing a batch-supportive endpoint:

  1. Batch endpoints are expected to take in a huge amount of data. You can accept the input as a batch, or batch the data after accepting it from the endpoint. We select the former: we accept the input as a batch (a List<T>).
  2. It is best to create a separate endpoint for single-entity operations and a separate endpoint for batch operations. This helps your application make better decisions, perform better on batch tasks, and do less work by executing a single SQL statement for single-entity operations.

With these points in mind, we write the operations for the RESTful controller:

package com.demos.crud.controllers.apis;

import java.util.ArrayList;
import java.util.List;
import com.demos.crud.data.models.Person;
import com.demos.crud.data.services.PeopleService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.DeleteMapping;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.PutMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController()
@RequestMapping("/api/people")
public class PeopleApiController {

    private static final String REQUEST_NO_BODY = "Request does not contain a body";

    @Autowired
    PeopleService peopleService;

    @GetMapping("")
    public List<Person> getAllPeople() {
        return peopleService.findAllPeople();
    }

    @GetMapping("{id}")
    public Person getPerson(@PathVariable long id) {
        return peopleService.findById(id);
    }

    @PostMapping("")
    public String addPerson(@RequestBody Person person) {
        if(person != null) {
            peopleService.insert(person);
            return "Added a person";
        } else {
            return REQUEST_NO_BODY;
        }
    }

    @PostMapping("bulk")
    public String addPeople(@RequestBody List<Person> people) {
        if(people != null && !people.isEmpty()) {
            peopleService.insertAll(people);
            return String.format("Added %d people.", people.size());
        } else {
            return REQUEST_NO_BODY;
        }
    }

    @DeleteMapping("{id}")
    public String deletePerson(@PathVariable("id") long id) {
        if(id > 0) {
            if(peopleService.delete(id)) {
                return "Deleted the person.";
            } else {
                return "Cannot delete the person.";
            }
        }
        return "The id is invalid for the person.";
    }

    @DeleteMapping("bulk")
    public String deletePeople(@RequestBody List<Person> ids) {
        if(!ids.isEmpty()) {
            if(peopleService.deleteAll(ids)) {
                return "Deleted the person.";
            } else {
                return "Cannot delete the person.";
            }
        }
        return "The request should contain a list of people to be deleted.";
    }

    @PutMapping("")
    public String updatePerson(@RequestBody Person person) {
        if(person != null) {
            peopleService.update(person);
            return "Updated person.";
        } else {
            return REQUEST_NO_BODY;
        }
    }

    @PutMapping("bulk")
    public String updatePeople(@RequestBody List<Person> people) {
        if(people != null) {
            peopleService.updateAll(people);
            return "Updated people.";
        } else {
            return REQUEST_NO_BODY;
        }
    }
}

The controller contains an additional “/bulk” endpoint for each HTTP verb that operates on a list of objects. 

The last thing to configure is the application.properties file, where we enable a batch_size for Hibernate operations. This is the number of statements that are sent to the database in one batch. Append these configuration entries to enable it:

spring.jpa.properties.hibernate.jdbc.batch_size=5
spring.jpa.properties.hibernate.order_inserts=true

You can configure the batch_size anywhere from 2 to 50 based on your system configuration. Note that a larger batch size will require more memory and might cause memory overflow. The complete file is:

## default connection pool
spring.datasource.hikari.connectionTimeout=20000
spring.datasource.hikari.maximumPoolSize=5
spring.jpa.generate-ddl=false

## PostgreSQL
spring.datasource.url=jdbc:postgresql://<host>:<port>/<database>
spring.datasource.username=<username>
spring.datasource.password=<password>

# Database schema
spring.datasource.hikari.schema=techwriting
spring.jpa.properties.hibernate.jdbc.batch_size=5
spring.jpa.properties.hibernate.order_inserts=true

# drop and recreate table again, you should comment this in production
spring.jpa.hibernate.ddl-auto=create

Note that the connection pool settings shown here are ad hoc values for a test workload and use case, and are not recommended for your production workloads. You should consult your DBA and Ops team to decide on connection pooling settings.

In the next section you will see how we can pass the data to these controllers and how Spring Boot will parse the data so framework can process it.

We will use Postman to send requests to the Java web application, and OmniDB as an admin panel for PostgreSQL to review the state of the database tables. Before we start, this is the state of our database tables:

We will now send requests to our Java application’s endpoints and show how they mutate the table.

Bulk Insertion

We can use the POST endpoint to insert the data in the database. The endpoint to be used is “http://localhost:8080/api/people/bulk” and we submit a JSON array with details of people. I created a list with 11 elements:

[
{
"name": "Person 1",
"role": "Software Engineer"
},
// redundant part removed
{
"name": "Person 11",
"role": "Software Engineer"
}
]

Executing the code would create the records in the database.

Our code executed and saved 11 people from the list to the database. We can review the PostgreSQL table for the content using the following SQL in OmniDB:

SELECT t.id ,
       t.name ,
       t.role
FROM techwriting.people t
ORDER BY t.id

Now the database contains 11 records that we just created in the statement above; see “Verifying Batch Operations” to understand how the batch statements are created in JPA.

Bulk Updating

To perform a bulk update, we will use the same endpoint but send a PUT request. For testing, we will send an HTTP body with two people and their updated details:

[
  {
    "id": 2,
    "name": "Updated Name",
    "role": "Update Role"
  },
  {
    "id": 3,
    "name": "Updated Name",
    "role": "Update Role"
  }
]

This request should update two people (with IDs 2 and 3).

We can now verify the data in our database table using the SQL statement again:

The table now contains the updated records.

Bulk Deletion

Similarly, we can delete records in batch using the same endpoint with the HTTP DELETE verb. In the delete request we need to post the person records that we want to delete: JPA does not provide a delete method that accepts a list of the ID type (Long in Person’s case), so we need to send a List<Person>. A sketch of the SQL this produces follows the payload below.

[
  {
    "id": 1,
    "name": "Person 1",
    "role": "Software Engineer"
  },
  {
    "id": 2,
    "name": "Person 2",
    "role": "Software Engineer"
  }
]

Executing the code would now delete the records from the table:

We can check the database table for the state:

Based on this, you can see that the data has been deleted. Now the interesting bit is to verify if the statements were executed in batch.

We have seen how to write code that performs batch operations. To verify whether JPA actually sends the queries in batches, we can change the application.properties file to log query executions and whether they were batched. Add this line to the end of your application.properties file to show query statistics:

# enable the query statistics
spring.jpa.properties.hibernate.generate_statistics=true

First, if we use the IDENTITY generation mode for the primary key in the Person entity, we get the following log for our SQL statements (e.g. for the bulk insert operation):

1024993 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
469016 nanoseconds spent preparing 11 JDBC statements;
81414030 nanoseconds spent executing 11 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
438674 nanoseconds spent executing 1 flushes (flushing a total of 11 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)

If we change the generation strategy to SEQUENCE (or another JPA-supported mode), as we have done in our model above, we see:

5330717 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
14316641 nanoseconds spent preparing 12 JDBC statements;
1764945 nanoseconds spent executing 11 JDBC statements;
9864966 nanoseconds spent executing 3 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
34051820 nanoseconds spent executing 1 flushes (flushing a total of 11 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)

In this approach, you can see that 3 JDBC batches were executed. We had 11 elements in our request and a batch_size of 5, so three batches (5 + 5 + 1) were created for the inserts.

The reason is that with the IDENTITY generation mode, JPA cannot predict the primary key value; it has to be captured from the database engine. JPA therefore sends each INSERT to the database individually to capture the new primary key, and Hibernate implicitly disables batching. To use the batch features, change the primary key value generation strategy.
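
To make the difference concrete, here is a rough sketch of the statement flow for the 11-row insert under each strategy. The sequence name is an assumption; Hibernate derives the real one from your entity configuration, and the number of nextval calls depends on the allocation size.

-- IDENTITY: each INSERT must round-trip on its own so the generated id can be
-- returned, which is why Hibernate silently disables JDBC batching.
INSERT INTO techwriting.people (name, role) VALUES ('Person 1', 'Software Engineer');
-- ... repeated once per row: 11 single-statement executions.

-- SEQUENCE: ids are fetched up front, so the INSERTs are identical in shape and
-- can be grouped into batches of 5 (our batch_size), giving 3 batches for 11 rows.
SELECT nextval('people_id_seq');   -- hypothetical sequence name
INSERT INTO techwriting.people (id, name, role) VALUES (1, 'Person 1', 'Software Engineer');
-- ... the remaining INSERTs are sent as JDBC batches of up to 5 statements.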

You can enable this setting and then rerun the code above to see how batches are executed from JPA’s perspective.

Read this article to learn more about best practices for primary keys in bulk operations.

In this article, you learned about bulk operations against a PostgreSQL database and how JPA supports them in Spring Boot. You learned how to add bulk support to your JPA-based services, which Java types to use, and how JPA maps them to PostgreSQL types internally. You also learned how to write RESTful APIs and share data between client and server applications.

We concluded by sharing some tips and showing how to verify batch execution with JPA.

The WARP storage engine beta: columnar storage for MySQL 8 with automatic bitmap indexing

Feed: Planet MySQL
Author: Justin Swanhart

Oracle MySQL is in need of a columnar storage engine for analytics workloads. A columnar engine (or column store) stores data vertically, that is, it stores all the data associated with a column together, instead of the traditional RDBMS storage method of storing entire rows together, either in an index-organized manner, like InnoDB, or in a heap, like MyISAM.

Columnar storage has the benefit of reducing IO when only a subset of the row is accessed in a query, because only the data for the accessed columns must be read from disk (or cache) instead of entire rows. Most columnar stores do not support indexes, but WARP does.

WARP is open source

You can find the WARP source code release on GitHub.  Binaries can be provided upon request.  Simply open an issue for your desired Linux distribution, and I will make them available as soon as I can.

WARP is beta quality

WARP is based on Fastbit, which is currently version 2, and is used in production in a number of large scientific applications, such as grid computing analysis of particle accelerator data, working with genomic data, and other applications.  

WARP has been tested by a variety of alpha users. It is likely that there are still bugs or missing features in the MySQL storage engine interface, so WARP is not recommended for production-critical data. You should verify WARP's results against the same data in another storage engine to check for correctness.

Bugs and feature requests can be reported on the GitHub issues page, at the GitHub link provided above.  

Support and consulting for WARP implementations is available through Swanhart Technical Services, as well as generic MySQL training and consulting services.  I will provide information about those options in another blog post.

Bitmap Indexing

While columnar storage is uncommon among open source SQL RDBMSs, bitmap indexing is not available at all. Bitmap indexes have characteristics that make them ideal for queries that traditional btree indexes cannot answer efficiently (or at all), but they are not sorted, so they do not provide all of the capabilities of btree indexes, such as pre-calculated sorting.

WARP provides both columnar storage and automatic bitmap indexing of columns used in filters.  The end user doesn’t have to pick which specific columns to index.  Compressed bitmap indexes are automatically created to support the queries run against the database.  It is possible to exclude columns from automatic indexing if desired.  

The WARP storage engine

WARP SE is an acronym for Word Aligned Relational Partitioned Storage Engine. WARP is a storage engine for MySQL 8.0, currently supporting MySQL 8.0.19. Data is partitioned both vertically and horizontally, automatically.

WARP does not currently support distributing data over more than one node. WARP FS, a distributed file system to support this feature, is in development but not yet available for public testing.

The WARP engine is based on an existing “nosql” database called Fastbit (https://sdm.lbl.gov/fastbit/).  Fastbit is an open source columnar engine which features WAH (word aligned hybrid) compressed bitmap indexes.  WARP is based on a slightly modified version of the Fastbit 2.0.3 code base.  When you compile WARP, the Fastbit source code is automatically retrieved, patched, and compiled, as part of the build process.

Technically pluggable

The WARP engine does not require any modification to the MySQL source code, thus it is technically a pluggable engine, but the current source tree compiles it into the server like the InnoDB storage engine.  This is because MySQL currently changes the storage engine interface in minor releases, so older versions of plugins, even for the same major release, may no longer work in later point releases.  This is unfortunate, but that is a topic for another blog post.

WARP storage engine in detail

For my examples, I am going to use the Star Schema Benchmark (SSB) test data, which creates a star schema. Star schemas are very common in data marts. The fact table is usually a very large table, and dimension tables are smaller “lookup” tables that are joined to the fact table.

The following table is the “fact” table:

The DDL associated with the LINEORDER table of the Star Schema Benchmark
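
That screenshot shows the table definition; for reference, here is a sketch based on the standard SSB lineorder columns. The column names follow the SSB specification, the data types are approximate, and the ENGINE=WARP clause is my assumption about how the engine is selected:

CREATE TABLE lineorder (
  LO_OrderKey      BIGINT NOT NULL,
  LO_LineNumber    TINYINT NOT NULL,
  LO_CustKey       INT NOT NULL,
  LO_PartKey       INT NOT NULL,
  LO_SuppKey       INT NOT NULL,
  LO_OrderDate     INT NOT NULL,
  LO_OrderPriority CHAR(15),
  LO_ShipPriority  CHAR(1),
  LO_Quantity      TINYINT,
  LO_ExtendedPrice DECIMAL(18,2),
  LO_OrdTotalPrice DECIMAL(18,2),
  LO_Discount      TINYINT,
  LO_Revenue       DECIMAL(18,2),
  LO_SupplyCost    DECIMAL(18,2),
  LO_Tax           TINYINT,
  LO_CommitDate    INT,
  LO_ShipMode      CHAR(10)
) ENGINE=WARP;   -- no indexes declared; WARP builds bitmap indexes on demand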

A note about indexes

You do not need to define indexes ahead of time

You will note that the table does not have any indexes defined on it.  Indexes are automatically created and maintained transparently by WARP when a column is utilized in a filter (WHERE clause).  

It is, however, possible to explicitly define indexes for columns, which will then allow an EXPLAIN plan to show index usage. Keep in mind, however, that WARP may use indexes (even combining multiple indexes) in ways MySQL cannot, so the EXPLAIN plan does not always accurately represent the execution plan of the WARP storage engine. It is possible to exclude columns from being indexed automatically, but that will be discussed in a future blog post.

You do not need indexes for joins in MySQL 8

MySQL 8.0.19+ supports HASH joins, which are efficient for joining tables. Counter-intuitively, using bitmap indexes for joins may result in slower queries than using HASH joins, so it is not recommended to explicitly create indexes on columns that will be used in joins. Doing so causes MySQL to choose nested loop joins instead of HASH joins.

If it is desirable to have a UNIQUE key in a dimension table, it is suggested to store the column twice, once without KEY and once with, and to join on the column without the key. This limitation may change in a future version of WARP.
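
A minimal sketch of that workaround, using an illustrative supplier dimension table (the table, column names, and engine clause are mine, not from the original post):

CREATE TABLE supplier (
  S_SuppKey    INT NOT NULL,        -- join the fact table on this copy (no key declared)
  S_SuppKey_UK INT NOT NULL,        -- duplicate of S_SuppKey that carries the UNIQUE key
  S_Name       VARCHAR(25),
  S_City       VARCHAR(10),
  UNIQUE KEY (S_SuppKey_UK)
) ENGINE=WARP;

-- Joining on the unindexed copy lets MySQL pick a hash join:
-- SELECT ... FROM lineorder l JOIN supplier s ON s.S_SuppKey = l.LO_SuppKey;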

Updates are not “in place”

WARP does not update column data in place.  When a row is deleted, the deleted RID is marked in the sparse bitmap (described below).  When a row is updated, the original version of the row is marked as deleted, and a new row is appended to the table.  This means that the old version of a row may be in a different partition than the new version.  Rebuilding a table with OPTIMIZE TABLE or ALTER TABLE .. FORCE will remove the deleted rows.  

The primary reason for this is that WAH compressed bitmaps are extremely expensive to update, but disk space is relatively cheap.  The second reason is that strings are stored adjacent to each other in a column, and increasing the length of a string would require relocating the whole row anyway.  

Filling in “holes” is fairly simple in a row store, but much more complicated in a column store.
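
To reclaim the space held by old row versions, rebuild the table with either of the statements mentioned above, for example on the lineorder table used later in this post:

OPTIMIZE TABLE lineorder;
-- or, equivalently for the purpose of purging deleted rows:
ALTER TABLE lineorder FORCE;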

WARP RIDs (row ids)

RID stands for “rowid”. Every row in a WARP table has an associated RID. Note that RID values are not necessarily contiguous: a table may contain gaps in RID values, so a WARP RID is a logical row address. RID values are 64-bit integers.

The physical offset into a column in a partition is the physical RID of the row in that partition.  But since a table may have many partitions, it is not possible to refer to a physical RID from the storage layer.  If a table is modified or rebuilt with ALTER TABLE, the deleted rows are removed, and physical RID values will change, but logical RID values will not.

RID values are not visible or usable from the MySQL storage layer.  They are only used internally by the storage engine.

Data in the data directory

MySQL stores data on disk in a “data directory”.  Each schema (aka database) is a directory in the data directory.  In this case, the tables for the ssb_warp schema are located in /data/mysql_data/ssb_warp.

The .sdi file is conceptually similar to the .FRM file that would be found in older versions of MySQL, prior to MySQL 8.  MySQL 8 has a transactional data dictionary, and has eliminated .FRM files.  

Files and directories inside of the table directory

The table data itself is stored on disk in a table directory associated with the table.  In this case the data for the “lineorder” table is located in /data/mysql_data/ssb_warp/lineorder.data

Inside the table’s data directory, you will find additional directories that represent each partition of the table. WARP automatically partitions tables into horizontal partitions with a default maximum size of one million rows. This is currently configurable on a per-instance, not per-table, basis.

WARP generally reads an entire column in a partition into memory at once.  Every non-empty table has at minimum, one partition.

In addition to the partitions, which are directories, you will find a “-part.txt” file and a “deleted.rids” file.

-part.txt

The “-part.txt” file is a .ini-like file which contains information about the table, including the data types for each column. You must not delete or modify this file manually.

The “-part.txt” file also includes the index specifications for bitmap indexes.  WARP supports a wide variety of different bitmap indexing methods.  The default index method is suitable for a wide variety of data, but it is possible to choose the bitmap index characteristics via a comment on a column in the table DDL.  This will be explored in a future blog post.

deleted.rids sparse bitmap file

The “deleted.rids” file is a simple sparse bitmap which indicates which rows in the table have been deleted. Simple sparse bitmaps are stored on disk, can represent a bitmap of any size (up to the maximum file size on the filesystem), and do not need to be read completely into memory for querying or modification.

Each RID is represented by one bit in the sparse bitmap.  This bitmap is not WAH compressed because updating WAH bitmaps is very expensive.  Instead it is a sparse file which means the OS will only use space in the bitmap for any blocks in which a deleted RID is written.  Updates to the sparse bitmap are logged with a WAL (write ahead log) and are atomic.

If you have a 1 billion row table with only one deleted row, the “deleted.rids” file will appear to be large, but will in fact only take up 4K (or your filesystem block size) of space on disk. Take care to preserve sparse files when backing up and restoring data.

The sparse bitmap header-only C++ library is available for use in other projects here.

Column data files

WARP stores each column in an individual data file. The data in the column is analogous to a raw C array. On disk, each column data file is named after the ordinal position of the column in the table, starting with zero, and prefixed by the letter c. Thus, a 17-column table will have seventeen data files, c0 through c16.

Files in a partition

The “r” file is the column which contains the WARP RID.  This column is fetched with every query, and each row RID is compared against the deleted rid simple sparse bitmap and skipped if the row has been marked as deleted.

NULL marker data files

NULL markers are stored in data files named nX, where X is an integer representing the ordinal position of the column.

Because a SQL RDBMS supports the concept of NULL values, when a column is NULLABLE a NULL marker is created. When a column value is NULL, a 0 is stored in the column data file and a 1 is stored in the NULL marker; otherwise the marker stores a 0.

The NULL marker is essentially a hidden TINYINT column, thus NULL values take up the storage size of the column, plus one extra byte.  This is necessary because Fastbit does not support NULL values in the traditional RDBMS sense.  

The IS NULL and IS NOT NULL operators query the NULL markers, and bitmap indexes are automatically created for the markers to support these operations when necessary.

Index files

When WARP creates indexes, they are stored in additional files, depending on whether the data is character or integer based. For integer data, there will be one “.idx” file, named cX.idx, where X is again the ordinal position of the column in the table. For data stored as strings, “cX.sp” files are created, again named after the ordinal position of the column. It is possible to delete “.idx” or “.sp” files; they will be recreated automatically as necessary.

Locking and ACID compliance

WARP currently uses table-level locking, and supports concurrent inserts and querying data while loading. WARP is partially ACID compliant: loading data commits periodically, and committed data is visible to new queries. WARP does not support transactions (yet).

It is recommended to add an integer column to tables representing a batch of data, and if the data loading fails, or is interrupted, to delete the affected batch.  The next version of WARP will support atomic loading so that this will no longer be necessary.  

DELETE and UPDATE queries block other queries.  DELETE and UPDATE queries are uncommon in analytics systems.  WARP does support DELETE and UPDATE so that “slowly changing dimensions” are supported, which is not possible on column stores like ClickHouse.

DELETE and UPDATE operations are not in place. Old row versions are maintained in the table until it is rebuilt with OPTIMIZE TABLE. Thus WARP sits somewhere between MyISAM and InnoDB in terms of ACID compliance, and the goal is full ACID compliance in a future release.

A quick look at performance and automatic indexing

The lineorder table referenced above has ~70M rows:

A COUNT(*) query with no WHERE clause
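
The query behind that screenshot is simply a full count of the table:

-- Counts all ~70 million rows; no WHERE clause, so no bitmap index can help here.
SELECT COUNT(*) FROM lineorder;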

The LO_OrderKey and LO_SuppKey columns are integers. Each order has on average 7 line items, but the number of lines differs (the SSB data is based on the TPC-H data generator).

WARP reads only the necessary column data from disk

You will notice that the following query is a lot faster than the COUNT(*) query. Because WARP is a column store, only the LO_OrderKey column had to be read from disk. In addition, an index was automatically constructed on the LO_OrderKey column.

A filter is used on the LO_OrderKey column which constructs an index

Automatic indexing for filtered columns

The first time you query a column and use a filter (WHERE clause) on it, IO will be done to read the data from disk, and an index will be automatically constructed.  Here is a filter for a single order:
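
A sketch of such a filter; the actual order key value and select list from the screenshot are not reproduced here, so treat them as illustrative:

-- First run against LO_OrderKey: reads the column from disk and builds its bitmap index.
SELECT * FROM lineorder WHERE LO_OrderKey = 1;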

A 39K index is created for the 8.0 megabytes of data in partition 0

You will notice that the index is small.  WARP uses WAH compressed bitmaps. 

This is the first partition in the table, but almost every partition (save the last) contains the same amount of data, because each partition has 1M rows. Thus the size of the LO_OrderKey column on disk is ~544MB, but the size of the index is only 4.67MB.

A btree index in InnoDB requires the whole column to be copied (and sorted) along with the primary key for the row.  So an InnoDB index on this same data would be ~577MB.

Read free index updates

When a row is appended to a table, only the last block of the index has to be updated, so inserting new rows does not incur the performance penalty of maintaining btree indexes. Indexes use very little space and have a negligible impact on insertion and modification performance.

Querying the column again with the automatic index

The bitmap index will be used the next time a filter is applied to the LO_OrderKey column.  This query selects a different order from the table:
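
For example (the key value 100 comes from the screenshot caption below; the select list is illustrative):

-- Resolved from the bitmap index built by the previous query, so no column scan is needed.
SELECT * FROM lineorder WHERE LO_OrderKey = 100;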

Searching for LO_OrderKey = 100 uses the indexes with a 0.03 second response

WARP didn’t have to do any IO for the query, because the data is already cached from the previous query.  The size of the cache is configurable and defaults to 4GB.  But more importantly, the bitmap index was utilized, and instead of scanning the whole table (which takes 55 seconds) the query is resolved in 0.03 seconds.

Unlike most MySQL engines, WARP can use more than one index to answer a query

Bitmap indexes support bitmap intersection for fast query results that cannot be satisfied with btree indexes. MySQL does support “index intersect” operations in some cases, but they are still not as fast as bitmap index intersection, and every btree index slows down data manipulation operations significantly.

I first query the LO_SuppKey column to create an index on it. Note, I could just run the final query that references both columns, but again, I want to demonstrate the indexing performance and show that the intersection returns the right results.
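
A sketch of that priming query; the supplier key value is taken from the disjunction example that follows, and the select list is illustrative:

-- First touch of LO_SuppKey: reads the column and builds its bitmap index
-- (per the text below, 2974 rows match this supplier key).
SELECT COUNT(*) FROM lineorder WHERE LO_SuppKey = 9988;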

It takes 26.34 seconds to read the column from disk and construct the index

I will use a disjunction (an OR clause) to combine results from both of the indexes. 

There are no rows with both LO_OrderKey = 100 and LO_SuppKey = 9988, so the query result should include the 2974 rows where LO_SuppKey = 9988 plus the five rows where LO_OrderKey = 100, and indeed it does:
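
The combined query looks roughly like this (whether the screenshot counts rows or selects them is not visible here, so the select list is illustrative):

-- Bitmap OR of the two per-column indexes; per the text this returns
-- 2974 + 5 = 2979 rows in about 0.05 seconds.
SELECT COUNT(*)
FROM lineorder
WHERE LO_OrderKey = 100
   OR LO_SuppKey = 9988;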

Bitmap intersection returns results in 0.05 seconds for an OR query that btree indexes can not efficiently resolve

Conclusion

This concludes the introduction to the WARP storage engine.  Questions and comments are welcome.  Again, if you would like binaries for your preferred Linux distribution, open a GitHub issue requesting them, and I will make them available.

Protect your data using ProxySQL Firewall

Feed: Planet MySQL
Author: René Cannaò

ProxySQL Firewall Overview

ProxySQL’s flexible query rules engine has many uses, from read/write splitting and sharding to creating firewall blacklists. This is why ProxySQL is loved by both performance-minded and security-minded engineers.

Starting in ProxySQL 2.0.9, ProxySQL has another Security feature: the Firewall Whitelist.

Modeled on MySQL Enterprise Firewall, this allows a security-conscious administrator to tune access to only allow certain queries.

Imagine a situation where your webapp gets hacked, which exposes your user’s database credentials.

If your webapp connects directly to the database, the malicious user can do what they want to your data with the same permissions your webapp has.

So perhaps they can’t just DROP TABLE because you’ve smartly removed DDL permissions from your application user. But they can likely do DELETE FROM mytable after saving a copy to hold for ransom.

ProxySQL Firewall can prevent this situation.

Implementing ProxySQL Firewall

The documentation explains really well how to implement ProxySQL Firewall, but here are the high-level steps on how it works:

Prerequisite: Point your webapp at ProxySQL rather than MySQL, if you haven’t already, and configure ProxySQL with the users and servers needed to connect to the backend MySQL database.

Step 1. Allow ProxySQL to record the application’s traffic for a period of time to ensure all application queries are captured.
Protip: ProxySQL does this automatically through stats_mysql_query_digest and (if configured) stats_history.history_mysql_query_digest.
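
For example, from the ProxySQL Admin interface you can review what has been captured for the application user. The table and column names below are taken from ProxySQL’s stats schema as I understand it; adjust the username to match yours:

-- Normalized query shapes recorded for the application user.
SELECT DISTINCT digest, digest_text
FROM stats_history.history_mysql_query_digest
WHERE username = 'myapp';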

Step 2. Add your app user to the mysql_firewall_whitelist_users table through the ProxySQL Admin interface. Let’s say our app user is myapp:
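
A sketch of that step from the Admin interface, with the column list assumed from the mysql_firewall_whitelist_users table definition:

-- Put the application user into PROTECTING mode: only whitelisted digests will be allowed.
INSERT INTO mysql_firewall_whitelist_users (active, username, client_address, mode, comment)
VALUES (1, 'myapp', '', 'PROTECTING', 'application user');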

In this example, PROTECTING means ProxySQL will stop all other traffic that is not defined in the mysql_firewall_whitelist_rules for this user in the following step.

It’s also important to note that any user not in this table will be blocked once the firewall whitelist is enabled below. You can get around this by adding an entry into mysql_firewall_whitelist_users for every user you want to allow, with mode=’OFF’.

Step 3. Add all digests captured in Step 1 to the mysql_firewall_whitelist_rules table. This assumes you’ve configured stats_history.history_mysql_query_digest, as this persists stats across ProxySQL restarts.
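
A sketch of populating the rules from the recorded digests; the column lists are assumed from the ProxySQL firewall and stats tables, and keeping digest_text as the rule comment is just a convenience:

-- Whitelist every digest recorded for myapp.
INSERT INTO mysql_firewall_whitelist_rules
  (active, username, client_address, schemaname, flagIN, digest, comment)
SELECT DISTINCT 1, username, '', schemaname, 0, digest, digest_text
FROM stats_history.history_mysql_query_digest
WHERE username = 'myapp';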

Step 4. Load your configuration to runtime.

Step 5. Enable the ProxySQL Firewall
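
A sketch of the last two steps; the LOAD/SAVE ... FIREWALL commands and the mysql-firewall_whitelist_enabled variable name are as I understand them from the ProxySQL 2.0.9 documentation:

-- Step 4: push the whitelist configuration to runtime and persist it.
LOAD MYSQL FIREWALL TO RUNTIME;
SAVE MYSQL FIREWALL TO DISK;

-- Step 5: turn the firewall whitelist on.
SET mysql-firewall_whitelist_enabled='true';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;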

Congratulations! Now the malicious user can only perform exact queries that your application can perform.

The previous DELETE FROM mytable statement would fail, because your application doesn’t allow doing DELETEs without WHERE clauses, right?

Keeping firewall rules up to date

If you are familiar with ProxySQL query rules, you’ll notice that the mysql_firewall_whitelist_rules table behaves a little bit differently.

You have to populate it with exact digests, rather than relying on regular expressions. If you have thousands of mysql_query_rules, you were likely doing this already though.

This means that you will need to update the firewall rules any time an application deployment introduces new queries or changes query digests.

Our recommendation is to implement a process that records application digest changes before it hits production, such as a staging environment, and then update the production firewall rules as part of the deployment process.

Conclusion

If you want to be absolutely certain that the queries hitting your database from your application are not malicious, ProxySQL Firewall can help.

There is a bit of friction in keeping this maintained, but if the security of your data is paramount, the peace of mind ProxySQL Firewall provides is well worth it.

Authored by: Derek Downey
