There is an insert query that is loading data into table A from table B. Table B has 3,000 million records.
The query had been running for 4 hours when the user forcefully cancelled it from the Pivotal Greenplum Command Center, but it is still running in the backend.
I tried running the commands below:
pg_cancel_backend(pid) / pg_terminate_backend(pid)
Both return true but have no effect in real time.
How should this be dealt with? Is restarting the database the only option?
Thanks
Check for errors in pg_log, and try running pstack against the process on the segment host to see what it is doing.
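In case it is useful, here is a minimal sketch of how you could locate the lingering backend and its session id before pstack-ing the matching process, assuming the psycopg2 driver and a Greenplum 6-era pg_stat_activity (column names differ on older releases); the connection details and the query filter are placeholders, not your real environment.

import psycopg2

# Placeholder connection details for the Greenplum master.
conn = psycopg2.connect(host="gp-master", dbname="mydb", user="gpadmin")
conn.autocommit = True
cur = conn.cursor()

# List sessions whose query text looks like the stuck insert; sess_id is
# useful because the segment-side processes typically show "con<sess_id>"
# in their ps output, which tells you which PID to pstack on the segment host.
cur.execute("""
    SELECT pid, sess_id, state, query
    FROM pg_stat_activity
    WHERE query ILIKE 'insert into%'
""")
for pid, sess_id, state, query in cur.fetchall():
    print(pid, sess_id, state, query[:80])
    # pg_terminate_backend() returns true as soon as the signal is delivered,
    # even if the backend takes a long time to actually exit.
    cur.execute("SELECT pg_terminate_backend(%s)", (pid,))
    print(cur.fetchone())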
I'm currently migrating around 200 tables in BigQuery (BQ) from one dataset (FROM_DATASET) to another (TO_DATASET). Each of these tables has a _TABLE_SUFFIX corresponding to a date (I have three years of data for each table). Each suffix typically contains between 5 GB and 80 GB of data.
I'm doing this with a Python script that asks BQ, for each table and each suffix, to run the following query:
-- example table=T_SOME_TABLE, suffix=20190915
CREATE OR REPLACE TABLE `my-project.TO_DATASET.T_SOME_TABLE_20190915`
COPY `my-project.FROM_DATASET.T_SOME_TABLE_20190915`
Everything works except for three tables (and all their suffixes), where the copy job fails for each _TABLE_SUFFIX with this error:
An internal error occurred and the request could not be completed. This is usually caused by a transient issue. Retrying the job with back-off as described in the BigQuery SLA should solve the problem: https://cloud.google.com/bigquery/sla. If the error continues to occur please contact support at https://cloud.google.com/support. Error: 4893854
Retrying the job after some time actually works, but of course it slows down the process. Does anyone have an idea of what the problem might be?
Thanks.
It turned out that those three problematic tables were some legacy ones with lots of columns. In particular, the BQ GUI shows this warning for two of them:
"Schema and preview are not displayed because the table has too many
columns and may cause the BigQuery console to become unresponsive"
This was probably the issue.
In the end, I managed to migrate everything by implementing a backoff mechanism to retry failed jobs.
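In case it helps, this is roughly the retry loop I mean, assuming the google-cloud-bigquery client; the table names follow the example above, and the retry count and delays are arbitrary.

import time
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

def copy_with_backoff(table, suffix, max_retries=5):
    # Same CREATE OR REPLACE TABLE ... COPY statement as in the question.
    sql = (
        f"CREATE OR REPLACE TABLE `my-project.TO_DATASET.{table}_{suffix}` "
        f"COPY `my-project.FROM_DATASET.{table}_{suffix}`"
    )
    delay = 10  # seconds; doubled after each failed attempt
    for attempt in range(max_retries):
        try:
            client.query(sql).result()  # waits for the job and raises on error
            return
        except Exception as exc:
            print(f"{table}_{suffix}: attempt {attempt + 1} failed: {exc}")
            time.sleep(delay)
            delay *= 2
    raise RuntimeError(f"giving up on {table}_{suffix} after {max_retries} attempts")

copy_with_backoff("T_SOME_TABLE", "20190915")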
There is a PL/SQL package that is called by a job. I ran the job and was monitoring its progress by querying a log table that gets populated for each record. There are 20k records to be processed. The code ran reasonably fast for the first 8-10k records; after that, the speed slowed down considerably. The code is still running and is very slow now. I checked all the active sessions and there are no issues. Is there anything that can be done to speed it up without killing the job?
Try issuing a commit for every 5k records processed.
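As a rough illustration of the idea only (the real change would go inside the package's own PL/SQL loop), here is a minimal sketch in Python with the python-oracledb driver; the connection details, table, and per-record work are placeholders, not the actual package logic.

import oracledb

# Placeholder connection details.
conn = oracledb.connect(user="app", password="app_pw", dsn="dbhost/ORCLPDB1")
cur = conn.cursor()

processed = 0
# Placeholder driving query standing in for whatever the package iterates over.
for (record_id,) in cur.execute("SELECT record_id FROM records_to_process"):
    work = conn.cursor()
    # Placeholder per-record processing step.
    work.execute(
        "UPDATE records_to_process SET status = 'DONE' WHERE record_id = :1",
        [record_id],
    )
    processed += 1
    if processed % 5000 == 0:
        conn.commit()   # release undo/locks in batches instead of one huge transaction

conn.commit()           # commit the final partial batch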
Recently we faced a problem when fetching data from 3 different data sources over a DB link. It was running fine when we were fetching 16 columns from the tables by joining the three sources, but after we increased the column count from 16 to 50, the query takes too much time.
We are fetching the data from 3 different data sources, call them A (Singapore), B (Malaysia), and C (India), and creating a view that combines these 3 regions. The view is published to the front-end team (Tableau team) to perform visualization over that data.
Any suggestions on how to solve the problem? I am planning the alternatives below:
Applying the /*+ DRIVING_SITE */ hint so that the query runs on the remote server, which has up-to-date statistics (see the sketch after this list).
Creating a materialized view (MV) on the local server and refreshing it overnight, though it will not have fully up-to-date data.
Creating an MV on the local server, partitioning it, and refreshing a partition whenever changes occur at the remote site; to be alerted to those changes, I plan to build a queuing system or use DBMS_PIPE if that helps.
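For alternative 1, this is the kind of query I have in mind, as a minimal sketch assuming hypothetical DB links sgp_link, mys_link, and ind_link, placeholder table and column names, and the python-oracledb driver; DRIVING_SITE names the alias whose site should execute the join, so that only the final result travels back over the link.

import oracledb

# Placeholder connection to the local database that owns the DB links.
conn = oracledb.connect(user="app", password="app_pw", dsn="localhost/ORCLPDB1")
cur = conn.cursor()

# DRIVING_SITE(a) asks Oracle to execute the statement at the site owning
# alias "a" (Singapore here), so the join runs remotely and only the result
# set comes back over the DB link.
cur.execute("""
    SELECT /*+ DRIVING_SITE(a) */
           a.order_id, a.amount, b.amount, c.amount
    FROM   orders@sgp_link a
    JOIN   orders@mys_link b ON b.order_id = a.order_id
    JOIN   orders@ind_link c ON c.order_id = a.order_id
""")
for row in cur.fetchmany(10):
    print(row)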
I'm trying to insert 50 million rows from a Hive table into a SQL Server table using the Spark SQL JDBC writer. Below is the line of code that I'm using to insert the data:
mdf1.coalesce(4).write.mode(SaveMode.Append).jdbc(connectionString, "dbo.TEST_TABLE", connectionProperties)
The Spark job fails after processing 10 million rows with the error below:
java.sql.BatchUpdateException: The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions.
But the same job succeeds if I use the line of code below:
mdf1.coalesce(1).write.mode(SaveMode.Append).jdbc(connectionString, "dbo.TEST_TABLE", connectionProperties)
I'm trying to open 4 parallel connections to SQL Server to optimize performance, but the job keeps failing with a "cannot acquire locks" error after processing 10 million rows. Also, if I limit the dataframe to just a few million rows (fewer than 10 million), the job succeeds even with four parallel connections.
Can anybody tell me whether Spark SQL can be used to export huge volumes of data into an RDBMS, and whether I need to make any configuration changes on the SQL Server table?
Thanks in Advance.
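For completeness, here is roughly what the full write looks like in PySpark, with the connection URL, credentials, and source table as placeholders; the documented JDBC writer option batchsize is included only as a knob I am considering experimenting with, not as a known fix.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-to-sqlserver").enableHiveSupport().getOrCreate()
mdf1 = spark.table("default.source_table")        # placeholder Hive source table

(mdf1.coalesce(4)                                  # 4 partitions -> 4 parallel JDBC connections
     .write
     .format("jdbc")
     .mode("append")
     .option("url", "jdbc:sqlserver://dbhost:1433;databaseName=mydb")  # placeholder
     .option("dbtable", "dbo.TEST_TABLE")
     .option("user", "spark_user")                 # placeholder credentials
     .option("password", "secret")
     .option("batchsize", 10000)                   # rows per JDBC batch insert
     .save())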
I have this simple query, which works fine in Hive 0.8 on IBM BigInsights 2.0:
SELECT * FROM patient WHERE hr > 50 LIMIT 5
However, when I run this query using Hive 0.12 on BigInsights 3.0, it runs forever and returns no results.
The scenario is actually the same for the following query and many others:
INSERT OVERWRITE DIRECTORY '/Hospitals/dir' SELECT p.patient_id FROM
patient1 p WHERE p.readingdate='2014-07-17'
If I exclude the WHERE part, everything is fine in both versions.
Any idea what might be wrong with Hive 0.12 or BigInsights 3.0 when a WHERE clause is included in the query?
When you use a WHERE clause in a Hive query, Hive runs a MapReduce job to return the results. That's why the query usually takes longer: without the WHERE clause, Hive can simply return the content of the file that represents the table in HDFS.
You should check the status of the MapReduce job triggered by your query to find out whether an error occurred. You can do that by going to the Application Status tab in the BigInsights web console and clicking on Jobs, or by going to the JobTracker web interface. If you see any failed tasks for that job, check the logs of that particular task to find out what error occurred. After fixing the problem, run the query again.