How to circumvent BigQuery's 20 concurrent queries limitation? - google-bigquery

Wondering if anyone knows... or have ran into this.. there's a 20 concurrent queries limitation for BigQuery.
https://developers.google.com/bigquery/quota-policy#queries
Is there a way to disable the limit? Our MapReduce tasks needs many concurrent queries in order to complete within a reasonable amount of time.

We have a similar problem. There is no way how to change this from your side. Also "upgrading your plan" as #Dominik suggest won't help.
You have to contact Google directly, explain your problem (business case) and if it is valid they can increase your quota limits (for certain Google Cloud project)

Related

How to troubleshoot suspended queries in Azure Synapse?

Currently, I encounter an issue of suspended queries in Azure Synapse when executing from ADF (Store procedures call).
Also, I followed the suggestion in the link below for troubleshooting the issue:
Delete due to sensitive informations
The troubleshoot queries returned as below:
I checked if the transaction lock is the issue as I killed a few suspending or running queries which they ran for more than 15 hours. I also checked for the rest of the queries running but there is nothing would cause the transaction lock. I tried to run the store procedure manually from Azure Data Studio which is blocked as mentioned above and It took 40 seconds to complete.
While the suspending query from ADF, it took nearly an hour to finish.
Any suggestion to troubleshoot this issue is much appreciated.
Thanks
There a number of factors you must always consider when tuning queries in Azure Synapse Analytics, dedicated SQL pools:
DWU - what DWU is your pool at? Lower DWUs mean lower concurrent users and lower performance and should not be used for any kind of performance tuning. Crank it up temporarily to rule this out as a problem, bearing in mind changing this disconnects any active queries. Also bear in mind, not all queries respond to higher DWU.
Resource class - what resource class is associated with the user executing these queries? Remember the default is smallrc, and the admin user always has smallrc. Understand static and dynamic resource classes. DMV sys.dm_pdw_exec_requests will give you useful information on this. Trial with your workload to find the sweetspot between performance and concurrency v resource class. Encourage your dev team to use labels in their queries: OPTION ( LABEL = 'some informative label' )
Table geometry - this is the distribution (ROUND_ROBIN|HASH|REPLICATE) of your table and the indexing choice (CLUSTERED COLUMNSTORE|CLUSTERED INDEX|HEAP). Clustered columnstore and round robin are the defaults but they are not always appropriate. Consider what is appropriate for your tables.
If you work through those and still have an issue you can start to look at statistics and workload classification for starters, but gather information on the points above should give you a good idea.
If you are just doing single value INSERTs, then don't. Dedicated SQL pools are terrible with these. Convert these to load from a file in a single INSERT / COPY INTO.

Allowing many users to view stale BigQuery data query results concurrently

If I have a BigQuery dataset with data that I would like to make available to 1000 people (where each of these people would only be allowed to view their subset of the data, and is OK to view a 24hr stale version of their data), how can I do this without exceeding the 50 concurrent queries limit?
In the BigQuery documentation there's mention of 50 concurrent queries being permitted which give on-the-spot accurate data, which I would surpass if I needed them to all be able to view on-the-spot accurate data - which I don't.
In the documentation there is mention of Batch jobs being permitted and saving of results into destination tables which I'm hoping would somehow allow a reliable solution for my scenario, but am having difficulty finding information on how reliably or frequently those batch jobs can be expected to run, and whether or not someone querying results that exist in those destination tables is in itself counting towards the 50 concurrent users limit.
Any advice appreciated.
Without knowing the specifics of your situation and depending on how much data is in the output, I would suggest putting your own cache in front of BigQuery.
This sounds kind of like a dashboading/reporting solution, so I assume there is a large amount of data going in and a relatively small amount coming out (per-user).
Run one query per day with a batch script to generate your output (grouped by user) and then export it to GCS. You can then break it up into multiple flat files (or just read it into memory on your frontend). Each user hits your frontend, you determine which part of the output to serve up to them and respond.
This should be relatively cheap if you can work off the cached data and it is small enough that handling the BigQuery output isn't too much additional processing.
Google Cloud Functions might be an easy way to handle this, if you don't want the extra work of setting up a new VM to host your frontend.

List all the queries made to BigQuery with their processed bytes

I would like to know if there is a method in the BigQuery API or any other way where i can list all the queries made and their processed bytes. Something like what is listed in the Activity Page but with the processedBytes field:
https://console.cloud.google.com/home/activity?project=coherent-server-125913
We are having a problem with billing. Suddenly our BigQuery Analysis Costs have increased a lot and we think we are being charged like 20 times more than expected (we check all the responses from BigQuery API and save the processedBytes field, taking into account that the minimum charge is of 10MB).
The only way we can solve this difference is listing all the requests and comparing to our numbers to see if we arenĀ“t measuring something or if we are doing something wrong. We have opened a billing support ticket and they have redirected me to Stackoverflow for asking the question as they think that is a technical issue.
Thanks in advance!
Instead of checking totalBytesProcessed - you should try checking totalBytesBilled and billingTier (see here)
You might jumped to high billing tiers - just guess
The best place to check would be the BigQuery logs.
This is going to tell you what queries were run, who ran them, what date/time they were run, the total bytes billed etc.
Logs can be a bit tedious to look through but BigQuery allows you to stream BigQuery logs into a BigQuery table and you can then query said table to identify expensive queries.
I've done this and it works really well to give you visibility on your BQ charges. The process of how to do this is outlined in more detail here: https://www.reportsimple.com.au/post/google-bigquery

how to raise Google Big Query daily query quota

We are running a batch process, and hitting the daily query quota of 20,000.
Is there a way to raise the limit?
thanks.
The query-per-day limit (currently 40k / day) is one that we're generally happy to raise. In general, it is there to prevent abuse scenarios (people who use BigQuery as a calculator, as in SELECT 17 + 32). If you're running real queries over non-trivial sized data, we will almost certainly be willing to raise this ceiling.
If you've got a contact with Google Cloud Support, please let them know your project ID.
If you do not have a support contact, you can indicate your project ID here, or e-mail me (tigani#google.com) and I will route the request appropriately.

Big query is to slow

I am just starting with biquery, my DB is small (10K of rows 1 table) and my queries are simple count and group by.
Its takes and average of 3-4 sec per request but sometimes its jumps to 10 and event 15sec
I am querying from amazon linux server in Irland using the BQ tool.
Is it possible to get results faster (under 1sec) so I will be able to present my webpages faster.
1) Big Query is a highly scalable database, before being a "super fast" database. It's designed to process HUGE amount of data distributing the processing among several different machines using a technique named Dremel. Because it's designed to use several machines and parallel processing, you should expect to have super-scalability with a good performance.
2) BigQuery is an asset when you want to analyze billions of rows.
For example: analyzing all the wikipedia revisions in 5-10 seconds isn't bad, is it? But even a much smaller table would take about the same time, even if has 10k rows.
3) Under this size, you'll be better off using more traditional data storage solutions such as Cloud SQL or the App Engine Datastore. If you want to keep SQL capability, Cloud SQL is the best guess.
Sybase IQ is often installed in a single database and it doesn't use Dremel. That said, it's going to be faster than Big Query in many scenarios...as designed.
4) Certainly the performance differ from a dedicated environment. You get your dedicated environment for 20K$ a month.
That's the expected behaviour. In BigQuery you are using a shared infrastructure, so depending on the use at the moment you will get better or worse response time. Actually batch queries (those not needing interactivity) are encouraged and rewarded by not adding up to your quota.
You typically don't use BigQuery as your main database to show data in your web application. Depending on what you want to do, BigQuery can be a Big Data storage and you should have another intermediate store where you could store computed results to display to your users. Or maybe in your use case you don't really need BigQuery and there is a better solution.
In any case, you are not going to be able to avoid a few seconds wait (even if you go Premium, you get more guarantees about the service, but in no case a service fast enough as to be your main backend for a webapp)