Does snowflake incur charges when I abort long running queries?
Yes, because you pay for every second the warehouse is running (after the initial 60 seconds, which you are billed for just for starting the warehouse).
You also get billed if the long-running query hits the execution timeout limit (the default is something like 4 hours): you pay for those minutes and still have no answer.
But if the warehouse was already running many queries and you run yet another query and then abort it after a while, the new query itself adds no extra charge, since the warehouse was running anyway. At the same time, though, it makes the other queries run fractionally slower.
CPU is charged for the time the Virtual Warehouse is up and running, not per individual query. If you abort a query and suspend the VWH, you are only charged for the time the VWH was in a running state.
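As a rough sketch of how to limit the exposure (the warehouse name my_wh and the timeout value are made up for illustration), you can suspend the warehouse once it is no longer needed, let it auto-suspend quickly, and cap how long any statement may run:

-- suspend the warehouse so it stops accruing charges
ALTER WAREHOUSE my_wh SUSPEND;
-- auto-suspend after 60 seconds of inactivity so idle time is not billed
ALTER WAREHOUSE my_wh SET AUTO_SUSPEND = 60;
-- cap statement runtime (here 1 hour) so a runaway query is cancelled instead of billing for hours
ALTER SESSION SET STATEMENT_TIMEOUT_IN_SECONDS = 3600;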
Related
I have a query that is already optimized; it is meant to run in under a few minutes, partly due to the use of external tables. Sometimes it runs in a few minutes; other times it can take up to 1 hour 30 minutes. The execution time is inconsistent.
I had a look at the wait stats, and the query seems to be spending a long time on the wait category 'Unknown'.
Does anybody have any idea about the 'Unknown' wait category in Azure SQL Database, or what is causing the inconsistent execution time?
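For reference, one way to drill into per-query wait categories in Azure SQL Database is through Query Store. This is only a sketch and assumes Query Store is enabled on the database:

-- aggregate waits per query and category to see where the time is going
SELECT TOP 20
       q.query_id,
       ws.wait_category_desc,
       SUM(ws.total_query_wait_time_ms) AS total_wait_ms
FROM sys.query_store_wait_stats AS ws
JOIN sys.query_store_plan AS p ON p.plan_id = ws.plan_id
JOIN sys.query_store_query AS q ON q.query_id = p.query_id
GROUP BY q.query_id, ws.wait_category_desc
ORDER BY total_wait_ms DESC;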
Here is the situation I am in:
In Azure, we have an elastic database pool that is a Premium 1000 eDTU pool. We have 6 databases. One of the databases runs merges and aggregations on reporting data at regular intervals and has been chugging along for several weeks with max eDTU usage around 60% and the average around 40%.
Monday this week, everything started going to crap. We rolled NO CHANGES to ANYTHING. Looking at the DB metrics in the Azure portal, as well as the sys.dm_db_resource_stats view, our CPU% is getting hammered at 99-100% for blocks of up to an hour, while the memory in that view stays consistently around 10%.

We have reports that run off of this database, and some of the queries (when run in Management Studio) will bring the server to its knees. These are not poorly-written or gigantic queries either. For example, we have one query that runs two CTEs and joins the results together for the final output. The first returns 30 records and the second returns 20 (each returning 4-5 columns). When I parameterize the query and simply select the individual result sets, it runs in <1s; however, when I do the simple join, I had to kill the execution because it had been running for 10 minutes and was taking up 99.9% of the eDTUs in the pool. Please take my word for it that this is a simple join on a tiny result set.
I replaced the CTE's with table variables and got the same result. I then replaced the table variables with temp tables (#tables) and voila! The query ran in ~2s...
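For illustration only, the temp-table rewrite described above looks roughly like this (the table and column names are invented, not the actual reporting schema):

-- materialize each small result set into a temp table first
SELECT ReportId, Metric1 INTO #first FROM dbo.SourceA;
SELECT ReportId, Metric2 INTO #second FROM dbo.SourceB;

-- then join the two temp tables for the final output
SELECT f.ReportId, f.Metric1, s.Metric2
FROM #first AS f
JOIN #second AS s ON s.ReportId = f.ReportId;

DROP TABLE #first;
DROP TABLE #second;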
Is anybody aware of a cap or throttling? Here is a graph of the database usage stats for the past 8 days or so:
As you can see, early on 6/27 things just started going haywire... Any insight would be greatly appreciated.
We've been experiencing timeouts and long-running queries with BigQuery. The table is 1 row (8 bytes) and the query is taking 50 seconds on average.
This causes issues with our applications, which time out after 10 seconds. They don't expect a query over 1 row to take that long.
Is there something wrong with BigQuery today?
Some example job ids:
job_TlXh5ISWUc6QXpf3HQ3KC-iRdRw
job_4OLNwAFN_j6mqMsnw2q8UUAJ528
job_3bZjpkVOfb55PZbldCwot53HqWA
job_V_EJzsuM9hjikBM-zQ_hCoEJNu8
job_66awPpXPPfd7WrDuRzYf7o3bKPs
There was a temporary issue yesterday afternoon where some queries experienced added latency. I looked at several of your queries, and they all appear to have run during that time.
The issue was related to a blocking call that we were making to a monitoring service that was timing out. We've since fixed the issue. We're in the process of conducting an internal post-mortem to figure out how to prevent issues like this in the future.
I have an application running on a Postgres database. Sometimes, when I have about 8-10 people working on the application, CPU usage soars to between 99-100%. The application was built on the CodeIgniter framework, which I believe makes provision for closing database connections whenever they are not needed. What could be the solution to this problem? I would appreciate any suggestions. Thank you.
Basically, what the users do in the application is run insert queries at a very fast rate; a person could run between 70-90 insert queries in a minute.
I came across a similar kind of issue. The reason was that some transactions were getting stuck and running for a long time, so CPU utilization increased to 100%. The following command helped to find the connections that had been running the longest:
SELECT max(now() - xact_start) FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'active');
This command shows the amount of time a connection has been running; this time should not be greater than an hour. Killing the connection that had been running for a very long time, or was stuck at some point, worked for me. I followed this post for monitoring and solving my issue; the post includes lots of useful commands for monitoring this situation.
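For reference, a sketch of those follow-up steps (the pid 12345 is just a placeholder; use the pid returned by the query):

-- list the longest-running connections with their pid, state and query text
SELECT pid, now() - xact_start AS duration, state, query
FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'active')
ORDER BY duration DESC;

-- terminate the offending backend once it has been identified
SELECT pg_terminate_backend(12345);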
You need to find out what PostgreSQL is doing. Relevant resources:
Monitoring in general
Monitoring queries
Finding slow queries
Once you find what the slow or most common queries are, use EXPLAIN to make sure they are being executed efficiently.
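As a minimal sketch (the table and filter are made up for illustration; EXPLAIN itself is standard PostgreSQL):

-- show the planner's chosen plan without running the query
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;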
Here are some cases we have encountered that cause high CPU usage in Postgres.
Incorrect indexes are used in the query
Check the query plan - through EXPLAIN, we can inspect the query plan; if an index is used by the query, an Index Scan will appear in the plan output.
Solution: add the corresponding index for the query SQL to reduce CPU usage
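A sketch of what that looks like (the table and column names are hypothetical):

-- before: the plan shows a Seq Scan because no suitable index exists
EXPLAIN SELECT * FROM events WHERE user_id = 42;

-- add an index on the filtered column
CREATE INDEX idx_events_user_id ON events (user_id);

-- after: the plan should now show an Index Scan (or Bitmap Index Scan) using idx_events_user_id
EXPLAIN SELECT * FROM events WHERE user_id = 42;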
Query with sort operation
Check EXPLAIN (analyze, buffers) - if memory is insufficient for the sort operation, temporary files are used to do the sorting, and high CPU usage comes up.
Note: DO NOT run EXPLAIN (analyze) on a busy production system, as it actually executes the query behind the scenes to provide more accurate planner information, and its impact is significant.
Solution: Tune up the work_mem and sorting operations
Sample: Tune sorting operations in PostgreSQL with work_mem
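A rough sketch of that tuning step (the table, the ORDER BY column, and the 64MB figure are illustrative only):

-- check whether the sort spills to disk: look for "Sort Method: external merge  Disk: ..." in the output
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM events ORDER BY created_at;

-- raise work_mem for the current session and re-check; an in-memory quicksort should appear instead
SET work_mem = '64MB';
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM events ORDER BY created_at;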
Long-running transactions
Find long-running transactions through
SELECT pid
, now() - pg_stat_activity.query_start AS duration, query, state
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '2 minutes';
Solution:
Kill the long-running transaction with SELECT pg_terminate_backend(pid)
Optimize the transaction or query SQL through corresponding indexes.
We are using Hive for Ad-hoc querying and have a Hive table which is partitioned on two fields (date,id).
Now, for each date there are around 1400 ids, so around that many partitions are added on a single day. The actual data resides in S3. The issue we are facing is that if we do a select count(*) for a month from the table, it takes quite a long time (approx. 1 hr 52 min) just to launch the map-reduce job.
When I run the query in Hive verbose mode, I can see that it is spending this time deciding how many mappers to spawn (calculating splits). Is there any way I can reduce this lag before the map-reduce job launches?
This is one of the log messages that is being logged during this lag time:
13/11/19 07:11:06 INFO mapred.FileInputFormat: Total input paths to process : 1
13/11/19 07:11:06 WARN httpclient.RestS3Service: Response '/Analyze%2F2013%2F10%2F03%2F465' - Unexpected response code 404, expected 200
This is probably because with an over-partitioned table the query planning phase takes a long time. Worse, the query planning phase itself might take longer than the query execution phase.
One way to overcome this problem would be to tune up your metastore. But the better solution would be to devise an efficient schema and get rid of unnecessary partitions. Trust me, you really don't want too many small partitions.
As an alternative you could also try setting hive.input.format to org.apache.hadoop.hive.ql.io.CombineHiveInputFormat before you issue your query.
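In a Hive session that would look roughly like this (the table name and partition filter are placeholders, not taken from the question):

-- combine many small input files/partitions into fewer splits before the job is planned
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
SELECT COUNT(*) FROM my_table WHERE dt >= '2013-10-01' AND dt < '2013-11-01';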
HTH