A simple count query takes a surprisingly long time to complete.
Am I doing something wrong?
SELECT COUNT(*) FROM `TABLE`
(if someone from BigQuery sees this, the job ID is:
southamerica-east1.bquxjob_6a6df24d_16dfdbe0b54)
There are multiple reasons why a query can run slowly in BigQuery; as mentioned in the comments, if your table is an external table, that could be part of the problem as well. If latency is critical for you and your queries are extremely simple, you might want to consider using Cloud SQL, which is a real-time (transactional) database.
BigQuery is normally used for larger, more complex queries over very large datasets. If you have a support package, you might want to reach out to the Google Cloud Support team and have them look at the query to understand why it is running so slowly.
Another workaround, just in case you only want to know the number of rows, could be to query the metadata:
SELECT table_id, row_count, size_bytes
FROM `PROJECT_ID.DATASET.__TABLES__`
WHERE table_id = 'your_table'
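If it is useful, the same metadata table can also be queried without the filter to list row counts for every table in the dataset; this is a minimal sketch using the same placeholder project and dataset names, and it reads only metadata rather than the table data:
-- Sketch: row counts for every table in the dataset (placeholders as above)
SELECT table_id, row_count
FROM `PROJECT_ID.DATASET.__TABLES__`
ORDER BY row_count DESC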
We have a view in Athena which is partitioned on processing_date (data type: string, format 20201231).
We are looking for data in 2020.
For exploration, we need all the columns.
Query :
select * from online_events_dw_view
where from_iso8601_date(processing_date) > from_iso8601_date('20191231')
Error :
Query exhausted resources at this scale factor
Is there any better way to optimize the query?
You are applying a function to the partition column; chances are high that this leads to Athena scanning all the data, which is why you run into the problem.
Why not simply: processing_date like '2020%'
Maybe also try with a LIMIT 1000 to reduce the amount of data if you are just exploring the columns.
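Putting both together, a minimal sketch of the pruned query (view and column names taken from the question):
-- Filter directly on the partition string so Athena can prune partitions
SELECT *
FROM online_events_dw_view
WHERE processing_date LIKE '2020%'
LIMIT 1000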
The error "Query exhausted resources at this scale factor" is most often caused when sorting result sets with a lot of columns.
Since you don't post the view SQL, there is no way to say for sure whether that is the problem in your case, but it's almost always wide rows and sorting, so I assume there is an ORDER BY in your view. Try removing it and see if the query executes without error.
Is there any better way to optimize the query?
You need to post much more information for us to be able to help you. Without the SQL for the view it is impossible to say anything. Also post the SQL for all involved tables, and give some context about partitioning, the amount of data, the file formats, etc.
I'm trying to avoid the processing cost in BigQuery when creating a table from a query result, e.g.:
select * from xxx where ....
...and write into a destination table.
Is there a way to do it using one of the BigQuery tools?
No.
Whenever you query, you will pay for how much data you touch in the query, no matter what.
Remember, your first 1 TB per month is free in BigQuery, so maybe this is not really a problem for you.
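For completeness, you can still write the result straight into a destination table; you just cannot avoid being billed for the bytes the SELECT scans. A minimal sketch (the table names and the date filter are placeholders, not taken from the question):
-- Sketch: materialize a query result into a destination table.
-- The SELECT below is still billed for the data it scans.
CREATE TABLE `PROJECT_ID.DATASET.destination_table` AS
SELECT *
FROM `PROJECT_ID.DATASET.source_table`
WHERE event_date >= DATE '2020-01-01'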
I have a performance issue with using scalar user-defined functions (UDFs) in queries.
There is a UDF fn_get(i int) which returns a scalar value. It holds a lot of logic and performs normal scalar operations.
Actually
SELECT *,fn_get(i) FROM #temp1;
fetches 10,000 results within 3 seconds and displays them in the Studio UI.
Whereas,
SELECT *,fn_get(i) INTO #temp2 FROM #temp1;
inserts the same 10,000 results into table #temp2 but takes more than 4 minutes.
I don't know why the difference is so enormous (3 seconds vs. 4 minutes :O).
I'm not sure if this is the right way to ask a question here. Any guidance on improving the query performance would be of great help.
Michael is correct: if the table is very wide, I would expect a massive performance hit from inserting the entire table width, so your first step should be to select/insert only the int value and see what the performance looks like.
After that, the other thing I'd like you to try is switching your UDF to one that is very simple (maybe just multiply by 10) and see if it performs just as slowly.
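A minimal sketch of that test (fn_times_ten and #temp3 are hypothetical names; the syntax assumes the Watcom-SQL dialect used by SQL Anywhere/IQ):
-- Hypothetical trivial UDF for comparison
CREATE FUNCTION fn_times_ten(i INT)
RETURNS INT
BEGIN
    RETURN i * 10;
END;

-- Re-run the slow pattern with the trivial UDF instead of fn_get
SELECT *, fn_times_ten(i) INTO #temp3 FROM #temp1;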
I ask you to test that because one interesting thing I have experienced with UDFs in Sybase IQ is that if you use an operation that is not supported by IQ but is supported by the ASA store, you will cross the engine boundary. This can also happen if you created your UDF "in system", which means it lives in the ASA store. If your #temp2 table is in the IQ store, the data movement would be: read from IQ, move to ASA to perform the data operations, then finally move back to IQ (slowly) to write to your temp table. In my experience, data moves very quickly from the IQ engine to the ASA engine, but much, much slower going the other direction.
This is why I believe the select was quick (it came directly from the ASA store after data ops) and the insert is almost 100x slower.
Sybase IQ is an OLAP-tuned columnar database, which means that out of the box it is tuned for reads, not writes. So it would be normal for read performance to greatly outpace write performance, even on the same data set.
Many things can affect write performance; storage type, I/O bandwidth, caching, and indexing are a few of the factors.
To get more detailed information on the particulars of your query, you should take a look at the execution plan. This will help break down where the system is spending time.
SAP has a detailed document on Sybase IQ execution/query plans. It may not be for the specific version of IQ you are running, but the information will be generally applicable.
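If it helps, IQ can be asked to write query plans out for you. This is only a sketch using the Query_Plan options, so check that document for the options available in your version:
-- Ask IQ to generate query plans (the HTML plan is usually easiest to read)
SET TEMPORARY OPTION Query_Plan = 'ON';
SET TEMPORARY OPTION Query_Plan_As_Html = 'ON';

-- Then run the slow statement again and inspect the generated plan
SELECT *, fn_get(i) INTO #temp2 FROM #temp1;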
Note: It is highly discouraged to use select * (ever) in a columnar database. The data is split and organized by column, so reassembling an entire row is a very costly procedure. Unless you absolutely need every column in the row, you should always specify which columns. It is also just general SQL best practice to always specify columns in your query, even if you are retrieving all of the columns.
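As a minimal sketch of that change, using the names from the question and assuming only the int column is actually needed:
-- Insert only the columns that are needed instead of SELECT *
SELECT i, fn_get(i) INTO #temp2 FROM #temp1;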
I am a noob at Riak, and have been trying to test the query aspect of Riak using riakc in Erlang, but I cannot find any example of how to query the database in the old SQL way, only how to get a single value out of a single field. I think I am missing something, but all I really want is standard SQL queries with the matching riakc code.
SELECT * FROM bucket;
SELECT * FROM bucket LIMIT 10, 100;
SELECT id, name FROM bucket;
SELECT * FROM bucket WHERE name="john" AND surname LIKE "Ste%";
SELECT * FROM bucket LEFT JOIN bucket2 ON bucket.id = bucket2.id2;
I assume there is no direct correlation in how you write these, but I was hoping there is a standard way, and that there is somewhere with a simple-to-understand explanation of these queries in riakc (or even just Riak).
I have looked at MapReduce but found it confusing for such simple queries.
Riak is a NoSQL database, more specifically a key-value database, and there is no query language like SQL available. When working with Riak, you need to model and query your data in a completely different way compared to how you use a relational database in order to get the most from it. Trying to model and query your data in a relational manner, e.g. by extensive use of secondary indexes or by trying to use map/reduce as a real-time query language, generally results in very poor performance and scalability. A good and useful discussion about Riak development anti-patterns can be found here.
We’re having a problem we were hoping the good folks of Stack Overflow could help us with. We’re running SQL Server 2008 R2 and are having problems with a query that takes a very long time to run on a moderate set of data, about 100,000 rows. We're using CONTAINS to search through XML files and LIKE on another column to support leading wildcards.
We’ve reproduced the problem with the following small query that takes about 35 seconds to run:
SELECT something FROM table1
WHERE (CONTAINS(TextColumn, '"WhatEver"') OR
DescriptionColumn LIKE '%WhatEver%')
Query plan: (screenshot not included)
If we modify the query above to use UNION instead, the running time drops from 35 seconds to less than 1 second. We would like to avoid using this approach to solve the issue.
SELECT something FROM table1 WHERE CONTAINS(TextColumn, '"WhatEver"')
UNION
SELECT something FROM table1 WHERE DescriptionColumn LIKE '%WhatEver%'
Query plan: (screenshot not included)
The column that we’re using CONTAINS to search through is of type image and contains XML files anywhere from 1 KB to 20 KB in size.
We have no good theories as to why the first query is so slow, so we were hoping someone here would have something wise to say on the matter. The query plans don’t show anything out of the ordinary as far as we can tell. We've also rebuilt the indexes and statistics.
Is there anything blatantly obvious we’re overlooking here?
Thanks in advance for your time!
Why are you using DescriptionColumn LIKE '%WhatEver%' instead of CONTAINS(DescriptionColumn, '"WhatEver"')?
CONTAINS is a full-text predicate and will use the SQL Server full-text engine to filter the search results, whereas LIKE is a "normal" SQL Server keyword, so SQL Server will not use the full-text engine to assist with that part of the query. In this case, because the LIKE term begins with a wildcard, SQL Server will be unable to use any indexes to help with the query either, which will most likely result in a table scan and/or poorer performance than using the full-text engine.
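A minimal sketch of that rewrite (it assumes DescriptionColumn is also covered by a full-text index, which the question does not confirm):
-- Let the full-text engine handle both columns in a single predicate
SELECT something
FROM table1
WHERE CONTAINS((TextColumn, DescriptionColumn), '"WhatEver"')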
It's difficult to tell for certain without an execution plan; however, my guess at what's happening would be:
The UNION variation of the query is performing a table scan against table1. The table scan is not fast, but because there are relatively few rows in the table it does not perform that slowly (compared to a 35-second benchmark).
In the OR variation of the query, SQL Server first uses the full-text engine to filter based on the CONTAINS and then performs a RID lookup on each matching row in the result to filter based on the LIKE predicate. However, for some reason SQL Server has massively underestimated the number of rows (this can happen with certain types of predicate) and so goes on to perform several thousand RID lookups, which ends up being incredibly slow (a table scan would have been much quicker).
To really understand what's going on, you need to get a query plan.
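If it helps, one way to capture an actual execution plan from a query window is sketched below; SET STATISTICS XML is standard SQL Server, and the query is the one from the question:
-- Returns the actual execution plan as XML alongside the results
SET STATISTICS XML ON;

SELECT something FROM table1
WHERE (CONTAINS(TextColumn, '"WhatEver"') OR
       DescriptionColumn LIKE '%WhatEver%');

SET STATISTICS XML OFF;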
Did you guys try this:
SELECT *
FROM table
WHERE CONTAINS((column1, column2, column3), '"*keyword*"')
Instead of this:
SELECT *
FROM table
WHERE CONTAINS(column1, '"*keyword*"')
OR CONTAINS(column2, '"*keyword*"')
OR CONTAINS(column3, '"*keyword*"')
The first one is a lot faster.
I just ran into this. This is reportedly a bug in SQL Server 2008 R2:
http://www.arcomit.co.uk/support/kb.aspx?kbid=000060
Your approach of using a UNION of two selects instead of an OR is the workaround they recommend in that article.