BigQuery query run generates a temp table that is not used again - google-bigquery

I have a question regarding how BigQuery works under the hood. I read in the documentation that when you run the exact same query twice, BigQuery reuses cached results from a temporary table that was generated the first time the query ran.
When you run a duplicate query, BigQuery attempts to reuse cached results. To retrieve data from the cache, the duplicate query text must be the same as the original query.
But when I run the same query twice in a row, the second run is no faster. When I query the generated temp table directly, the response is instant.
Example:
Running the same original query twice:
SELECT * FROM `project.dataset.table`
First time duration: 56 sec
Second time duration: 1 min 25 sec
Querying the generated temp table directly:
SELECT * FROM `project._e552111cf62868acff9032079d48966c879f.b011854_b431_4889_92500bb0569489d`
Duration: 0 sec
Why is BigQuery not reusing the cached results from the temp table? Am I missing something?
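One way to verify whether a given run was actually served from cache is to check the cache_hit flag on the job. A minimal sketch, assuming the dataset lives in the US multi-region (swap region-us for your dataset's location):

SELECT creation_time, job_id, cache_hit, total_bytes_processed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
WHERE job_type = 'QUERY'
ORDER BY creation_time DESC
LIMIT 10;

A run served from cache shows cache_hit = true and total_bytes_processed = 0; if the second run shows cache_hit = false, the cache was not used at all.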

Related

Identify query run time

I have a query that returns results very fast, in seconds. But when I want to fetch all the rows, it takes several hours.
If my definition of how long a query takes to run is the time to fetch all rows, how can one measure this besides actually fetching all the rows?
Would SELECT COUNT(*) on all rows be a good indicator of how long it would take to fetch all rows?
SELECT COUNT(*) is likely to do a table scan to return the total number of records.
Depending on what is in the table and how it is indexed, the COUNT(*) would most likely return faster than running a SELECT *.
You could run some baselines on your table by using SET STATISTICS TIME ON and SET STATISTICS IO ON.
I would also suggest running with client statistics.
Also, try running a TOP 100, 1000, and 10000 with the above turned on, as sketched below.
When I performance tune, I like to look at both the actual execution plan and the estimated execution plan.
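A minimal sketch of such a baseline in SQL Server (dbo.YourTable is a placeholder for your table); the Messages tab will then report elapsed time, CPU time, and logical reads for each statement:

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT TOP (100) * FROM dbo.YourTable;
SELECT TOP (1000) * FROM dbo.YourTable;
SELECT TOP (10000) * FROM dbo.YourTable;
SELECT COUNT(*) FROM dbo.YourTable;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;

Comparing how the timings grow with the row count gives a rough extrapolation of the full fetch cost without actually fetching every row.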

SQL takes more time to fetch records

I am using SQL Server 2012. I have one table containing 1.7 million records. I am just selecting all records using (SELECT * FROM table_name), but it takes 1 hour 2 minutes to fetch them.
What should I do to fetch the records quickly?
All you can do is limit your result set, using TOP (100) or a WHERE clause that targets your desired data, instead of
SELECT * FROM table
This will give you only the relevant, limited data in less time.
Another thing you can do is select only the columns you need, which still gives you the desired results while significantly reducing the fetch time.
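For example, a minimal sketch combining both suggestions (the table and column names here are placeholders):

SELECT TOP (100) order_id, customer_name, order_date
FROM table_name
WHERE order_date >= '2020-01-01'
ORDER BY order_date DESC;

Limiting rows with TOP/WHERE and trimming the column list both cut the amount of data that has to cross the network, which is usually where the time goes on a plain SELECT *.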

Simple select from table takes 24 seconds in SQL Server 2014

I have a table named [cwbOrder] that currently has 1,277,469 rows. I am using SQL Server 2014 and I am doing these tests on a UAT environment; on production this query takes a little longer.
If I try selecting all of the rows using:
SELECT * FROM cwbOrder
It takes 24 seconds to retrieve all of the data from the table. I have read about how important it is to index columns used in predicates (WHERE), but I still cannot understand how a simple select can take 24 seconds.
Using this table in other more complex queries generates a lot of extra workload, even though I have created the JOINs on indexed columns. Additionally, I have selected only 2 columns from this table and then JOINed it to another table, and this operation still takes a significantly long time. As an example, please consider the query below:
Below I have attached the index structure of both tables, to illustrate the matter:
PK_cwbOrder is the index on the id_cwbOrder column in the cwbOrder table.
Edit 1: I have added the execution plan for the query in which I join the cwbOrder table with the cwbAction table.
Is there any way, considering the information above, that I can make this query faster?
There are many reasons why such a select could be slow:
The row size or number of rows could be very large, requiring a lot of time to transmit the results.
Other operations on the table could have locks on the table.
The database server or network could be very busy.
The "table" could really be a view that is running a complicated query.
You can test different aspects. For instance:
SELECT TOP 10 <one column here>
FROM cwbOrder o
This returns a very small result set and reads just a small part of the table. The next query, by contrast, reads the entire table but returns a small result set:
SELECT COUNT(*)
FROM cwbOrder o
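If you suspect the view case above, a quick sanity check (assuming the default dbo schema) is:

SELECT OBJECTPROPERTY(OBJECT_ID('dbo.cwbOrder'), 'IsView') AS is_view;

This returns 1 if cwbOrder is actually a view running a query underneath, and 0 if it is a plain table.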

Fastest way to do SELECT * WHERE not null

I'm wondering what the fastest way is to get all non-null rows. I've thought of these:
SELECT * FROM table WHERE column IS NOT NULL
SELECT * FROM table WHERE column = column
SELECT * FROM table WHERE column LIKE '%'
(I don't know how to measure execution time in SQL and/or Hive, and from repeatedly trying on a 4M-row table in pgAdmin, I get no noticeable difference.)
You will never notice any difference in performance when running those queries on Hive, because these operations are quite simple and run on mappers that execute in parallel.
Initializing/starting mappers takes far more time than any possible difference in the execution of these queries, and it adds a lot of noise to the total execution time, because mappers may be waiting for resources and not running at all.
But you can try to measure time, see this answer about how to measure execution time: https://stackoverflow.com/a/44872319/2700344
SELECT * FROM table WHERE column IS NOT NULL is the most straightforward (understandable/readable), though all of the queries are correct.
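If you want a concrete number on the Postgres side (since the 4M-row test was in pgAdmin), EXPLAIN ANALYZE runs the query and reports the actual execution time; a sketch with placeholder names:

EXPLAIN ANALYZE
SELECT * FROM my_table WHERE my_column IS NOT NULL;

Note that EXPLAIN ANALYZE measures server-side execution only; it does not include the time to transfer the rows to the client.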

Working of WHERE condition in SAS PROC SQL while connecting to another database

I am working with a table of more than 30 million records. The table is in Sybase and I am working in SAS. There is a feed_key (numeric) variable which contains the timestamp for each record entry. I want to pull records for a particular time frame.
proc sql;
  connect to sybase (user="id" pass="password" server=concho);
  create table table1 as
  select * from connection to sybase
  (
    select a.feed_key as feed_key,
           a.cm15,
           a.country_cd,
           a.se10,
           convert(char(10), a.se10) as se_num,
           a.trans_dt,
           a.appr_deny_cd,
           a.approval_cd,
           a.amount
    from abc.xyz a
    where a.country_cd in ('ABC')
      and a.appr_deny_cd in ('0','1','6')
      and a.approval_cd not in ('123456')
      and feed_key > 12862298
  );
  disconnect from sybase;
quit;
It pulls the same number of records irrespective of whether I include the feed_key condition, and it takes almost the same time to execute the query (16 minutes without the feed_key condition and 15 minutes with it).
Please clarify how the WHERE clause works in this case.
I believe the feed_key condition should make the query run much faster, since more than 80% of the records do not match it.
If you're getting the same number of records back, it'll take the same amount of time to process the query.
This is because the I/O (transferring data back to SAS and storing it) is the most time-consuming part of the operation. This is why the lack of index doesn't make a big impact on the total time.
If you adjust your query so that it returns fewer rows, you will get faster processing.
You can tell when this is the case by looking at the SAS log, which shows how much time was used by the CPU (the rest is I/O):
NOTE: PROCEDURE SQL used (Total process time):
real time 11.07 seconds
cpu time 1.67 seconds
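One way to confirm whether the feed_key predicate is filtering at all is to push a count down to Sybase first, so only the aggregate crosses the wire. A minimal sketch reusing the pass-through pattern from the question:

proc sql;
  connect to sybase (user="id" pass="password" server=concho);
  /* only the single count row is returned to SAS */
  select * from connection to sybase
  (
    select count(*) as match_count
    from abc.xyz a
    where a.feed_key > 12862298
  );
  disconnect from sybase;
quit;

If match_count is close to the full table size, the condition simply is not excluding the 80% of records you expected it to.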