I've run some experiments with Hive on Tez. I have a query that 80% of the time runs for about 45 seconds, and the rest of the time runs for about 24 seconds. What could that occasional speedup be attributed to?
Does snowflake incur charges when I abort long running queries?
Yes. You pay for every second the warehouse is running (and the first 60 seconds are billed just for starting the warehouse, regardless).
You also get billed if a long-running query hits the execution timeout limit (the default is something like 4 hours): you pay for those minutes and still have no answer.
But if the warehouse was already running many queries, and you start yet another query and abort it after a while, the warehouse was running anyway, so the new query itself adds no extra charge. At the same time, though, it makes the other queries run fractionally slower.
CPU is charged for the time the Virtual Warehouse is up and running, not per individual query. If you abort a query and suspend the VWH, you're only charged for the time the VWH was in a running state.
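For illustration, a minimal sketch of those levers in Snowflake SQL, assuming a warehouse named my_wh (the name and the values are placeholders, not from the posts above):

-- Cap how long any statement may run on this warehouse (seconds).
ALTER WAREHOUSE my_wh SET STATEMENT_TIMEOUT_IN_SECONDS = 3600;
-- Have the warehouse suspend itself after 60 seconds of inactivity.
ALTER WAREHOUSE my_wh SET AUTO_SUSPEND = 60;
-- Or suspend it manually once the queries you care about have finished.
ALTER WAREHOUSE my_wh SUSPEND;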
I have a query that is already optimized; it is meant to run in a few minutes, partly due to the use of external tables. Sometimes it runs in a few minutes; other times it can take up to 1 hour 30 minutes. The execution time is inconsistent.
I had a look at the wait stats, and the query seems to be spending a long time in the Wait Category 'Unknown'.
Does anybody have any idea about the 'Unknown' wait category in Azure SQL Database, or what is causing the inconsistent execution time?
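A sketch of how such wait stats can be pulled per category from Query Store, assuming Query Store is enabled and the slow query's query_id is known (42 below is a placeholder, not a real id from this post):

-- Aggregate Query Store wait times by category for one query.
SELECT ws.wait_category_desc,
       SUM(ws.total_query_wait_time_ms) AS total_wait_ms
FROM sys.query_store_wait_stats AS ws
JOIN sys.query_store_plan AS p ON p.plan_id = ws.plan_id
WHERE p.query_id = 42  -- replace with the query_id of the slow query
GROUP BY ws.wait_category_desc
ORDER BY total_wait_ms DESC;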
We've been experiencing timeouts and long-running queries with BigQuery. The table is 1 row (8 bytes), and the query is taking 50 seconds on average.
This causes issues with our applications, which time out after 10 seconds; they don't expect a query over 1 row to take that long.
Is there something wrong with BigQuery today?
Some example job ids:
job_TlXh5ISWUc6QXpf3HQ3KC-iRdRw
job_4OLNwAFN_j6mqMsnw2q8UUAJ528
job_3bZjpkVOfb55PZbldCwot53HqWA
job_V_EJzsuM9hjikBM-zQ_hCoEJNu8
job_66awPpXPPfd7WrDuRzYf7o3bKPs
There was a temporary issue yesterday afternoon where some queries experienced added latency. I looked at several of your queries, and they all appeared to have run during that time.
The issue was related to a blocking call that we were making to a monitoring service that was timing out. We've since fixed the issue. We're in the process of conducting an internal post-mortem to figure out how to prevent issues like this in the future.
Using a test script, the average time to complete an insert is a few milliseconds. But about 3% of the time, the insert takes between 0.5 and 3 seconds to complete. If I run the same query 1000 times, about 970 finish in under 10 ms, while 30 take over 500 ms.
I'm running a fairly recent build of Raspbian from a few months ago and SQLite 3.8.4.
The process doing the inserts jumps from about 5% CPU usage to 10% when the slow inserts happen, but otherwise the CPU usage is normal.
How can I find out what's going on here? How would I know whether SQLite is waiting on the OS to write, waiting to acquire a lock, or something else?
Edit: Here is the table schema:
create table n (id INTEGER PRIMARY KEY,f TEXT,l TEXT);
And here is the query I'm running:
insert into n (f,l) values ('john','smith');
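One experiment (an assumption, not a diagnosis) is to check whether the tail latencies disappear when the journal and sync settings are relaxed, which would point at fsync/SD-card stalls rather than lock contention; note that these pragmas weaken durability guarantees:

-- Assumption: the 0.5-3 s stalls are fsync-related. Switch to WAL and relaxed sync,
-- then re-run the 1000-insert test and compare the tail latencies.
PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;
insert into n (f,l) values ('john','smith');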
We are using Hive for ad-hoc querying and have a Hive table which is partitioned on two fields (date, id).
For each date there are around 1,400 ids, so about that many partitions are added every day. The actual data resides in S3. The issue we are facing: if we run a select count(*) for a month against this table, it takes a very long time (approx. 1 hr 52 min) just to launch the MapReduce job.
When I run the query in Hive verbose mode, I can see that it spends this time deciding how many mappers to spawn (calculating splits). Is there any way to reduce this lag before the MapReduce job launches?
This is one of the log messages that is being logged during this lag time:
13/11/19 07:11:06 INFO mapred.FileInputFormat: Total input paths to process : 1
13/11/19 07:11:06 WARN httpclient.RestS3Service: Response '/Analyze%2F2013%2F10%2F03%2F465' - Unexpected response code 404, expected 200
This is probably because, with an over-partitioned table, the query planning phase takes a long time. Worse, the query planning phase itself might take longer than the query execution phase.
One way to mitigate this would be to tune your metastore. But the better solution is to devise an efficient schema and get rid of unnecessary partitions; trust me, you really don't want that many small partitions.
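As a sketch of what a coarser layout could look like (the table and column names here are hypothetical, assuming most queries filter by date), partitioning only by date and bucketing by id keeps a month of data at roughly 30 partitions instead of ~42,000:

-- Hypothetical coarser schema: one partition per day, ids spread across buckets.
CREATE TABLE events_by_date (
  id STRING,
  payload STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (id) INTO 64 BUCKETS;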
As an alternative you could also try setting hive.input.format to org.apache.hadoop.hive.ql.io.CombineHiveInputFormat before you issue your query.
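For example, a rough sketch of that setting in front of the count (the table name and date predicate are placeholders):

-- Combine many small input splits into fewer mapper tasks before planning the job.
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
SELECT COUNT(*) FROM my_table WHERE `date` >= '2013-10-01' AND `date` < '2013-11-01';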
HTH