Getting "Backend error. Job aborted" while trying to export a big query table to GCS - google-bigquery

For the past couple of weeks I have been consistently getting a "Backend error. Job aborted" error while trying to export a BigQuery table to Google Cloud Storage in CSV format.
The table was created with a bq SELECT * statement (using the allowLargeResults option).
The target bucket name does not appear to be the problem, either.
Here's a sample extract.
Errors:
Backend error. Job aborted.
Job ID: kiwiup.com:kiwi-bigquery:job_mk90xJqtyinbzRqIfWVjM2mHLP0
Start Time: 2:53pm, 8 Aug 2014
End Time: 8:53pm, 8 Aug 2014
The job runs for almost six hours before failing; previously it completed in a couple of minutes. Any help would be appreciated.

Your export job hit a timeout. We're currently investigating why; the date of your job coincides with a bandwidth issue we were having that should have been resolved. We're currently adding more instrumentation and monitoring so it will be easier to debug in the future.
As a workaround, if you give multiple extraction URI patterns, BigQuery will spin up more workers in parallel. See the "Multiple Wildcard URIs" example here.
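For illustration, here is a minimal sketch of an extract job with multiple wildcard URI patterns, using today's google-cloud-bigquery Python client; the table and bucket names are placeholders:

from google.cloud import bigquery

# Sketch: export one table to GCS with several wildcard URI patterns so
# BigQuery can shard the extract across more parallel workers.
client = bigquery.Client()

destination_uris = [
    "gs://my-bucket/export/part-a-*.csv",
    "gs://my-bucket/export/part-b-*.csv",
    "gs://my-bucket/export/part-c-*.csv",
]

extract_job = client.extract_table(
    "my-project.my_dataset.my_table",  # placeholder table reference
    destination_uris,
)
extract_job.result()  # block until the export finishes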

As Jordan said, this coincided with a bandwidth problem. Sorry for the inconvenience.
In some cases, providing multiple wildcard URIs will increase parallelism, but this applies only to fairly large tables (tens of GB), and it can actually decrease parallelism. Multiple wildcard URIs are designed to support Hadoop jobs, not to control parallelism.

Related

Query Terminating in Redshift

We are migrating our database from SQL Server 2012 to Amazon Redshift.
The front end of our application is developed in MicroStrategy (MSTR) which fires the queries on Redshift.
Although the application is working fine in Production (on SQL Server 2012), we have run into a strange issue in our PoC Environment on Redshift.
When we kick off a dashboard in MSTR, the query from the dashboard hits Redshift and completes successfully without any issues.
But when we stress test the application by running all the dashboards simultaneously, that particular dashboard's query terminates in Redshift. The database does not throw any error message, which is why we cannot troubleshoot why the query is terminating.
Can anyone please suggest how we should go about solving this problem?
Thank you
The problem might be that a timeout is configured on the queue you are sending the query to, as part of the WLM (workload management) configuration.
Redshift is designed differently from other databases: it is optimized for analytical queries, so it does not cache query results the way an OLTP database would. The other difference is that you have a predefined concurrency level (also part of WLM - http://docs.aws.amazon.com/redshift/latest/mgmt/workload-mgmt-config.html). Each concurrency slot has resources allocated to complete big queries quickly, but this limits the number of queries that can run concurrently. The default is 5, and you can increase it up to 50; the recommendation is to go no higher than 15-20, since at 50 each query gets only 2% of the cluster's resources, instead of 20% (with 5) or 5% (with 20).
The combination of these two differences means that if you connect many dashboards, each one sends its queries to Redshift and competes for resources (and without caching, each query runs again and again), so queries might time out or simply be too slow for an interactive dashboard.
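To confirm whether WLM is what terminated the queries, a diagnostic sketch along these lines can help; it uses psycopg2 with placeholder connection details and reads Redshift's system tables (stl_query for aborted queries, and stl_wlm_rule_action in case query monitoring rules are configured):

import psycopg2

# Sketch: check whether recent queries were aborted, e.g. by a WLM
# timeout. All connection parameters are placeholders.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="mydb", user="admin", password="***",
)
cur = conn.cursor()

# Queries aborted in the last 24 hours (covers WLM timeouts and cancels).
cur.execute("""
    SELECT query, starttime, endtime, LEFT(querytxt, 60)
    FROM stl_query
    WHERE aborted = 1
      AND starttime > DATEADD(hour, -24, GETDATE())
    ORDER BY starttime DESC
""")
for row in cur.fetchall():
    print(row)

# Actions taken by WLM query monitoring rules, if any are configured.
cur.execute("SELECT query, rule, action, recordtime FROM stl_wlm_rule_action")
for row in cur.fetchall():
    print(row)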
Please make sure that you are using the Redshift-optimized drivers for MicroStrategy, which send queries to Redshift with the above characteristics in mind.
You can also consider putting an RDS instance between your dashboards and Redshift, holding the aggregated data your dashboards need; it can serve that summary data with in-memory caching and higher concurrency. There is an interesting pattern you can implement with pg-bouncer (see here) that lets you route some queries (the analytical ones) to Redshift and others (the aggregated dashboard ones) to a PostgreSQL instance.

Time-out occurred while waiting for buffer latch type 3 while processing MOLAP cube

This is the error I get from the log while trying to process a SQL Server 2012 MOLAP cube:
"Time-out occurred while waiting for buffer latch type 3 for page (1:2044928), database ID 2.; 42000." Source="Microsoft SQL Server 2012 Analysis Services" HelpFile="" ErrorCode="3240034318" Description="Errors in the OLAP storage engine: An error occurred while processing the 'Measurement' partition of the measure group for the 'PE cube' cube from the Cube database."
I scripted the processing task in XMLA and execute it via an SSAS Command in an Agent Job.
The first step is a Process Update of all dimensions, which succeeds; but when I then run Process Data on the cube, the load fails and this error pops up.
I first tried processing with an SSIS package, but that crashed the whole server instead of just failing the job. This leads me to believe it is a performance issue, but the machine running the job is an Azure VM with 16 processors and 112 GB of RAM, so I don't know where to look. I also tried running the job with no other activity on the server, but that did not help.
The disk containing the SSAS instance still has 500 GB free.
The measure group is querying a table containing 180 million records.
While processing the cube on a dev server with far less data there are no issues. I once succeeded in doing a Process Full of the whole cube directly within SSAS, but via DTEXEC, SSISDB, or SSDT the processing crashes the server.
Earlier I got different time-out errors, but after setting the SSAS ExternalCommandTimeOut, ExternalConnectionTimeOut and ForceCommitTimeout properties to 0, these no longer occur.
I have tried multiple processing settings, but because I believe this is a performance issue, I tried to make the processing as light on resources as possible.
Processing Settings:
Object: Cube; Option: Process Data;
Processing Order: Sequential with Separate Transactions.
Writeback Table Option: Use Existing;
Do not process affected objects.
Update:
I processed the measure group that triggered the error on its own; it did not finish, and in the Activity Monitor I saw a lot of waits of type IO_COMPLETION and CXPACKET. When querying sys.dm_exec_requests I see a SELECT with wait_type IO_COMPLETION that has been running for a long time with a very large number of reads.
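As a side note, a small monitoring sketch like the following (pyodbc; the connection string is a placeholder) can poll sys.dm_exec_requests for waiting requests and their wait types while the cube is processing:

import pyodbc

# Sketch: show requests that are currently waiting, with their wait types
# (e.g. IO_COMPLETION) and cumulative reads.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=master;Trusted_Connection=yes;"  # placeholder
)
cursor = conn.cursor()
cursor.execute("""
    SELECT session_id, status, command, wait_type, wait_time,
           total_elapsed_time, reads
    FROM sys.dm_exec_requests
    WHERE wait_type IS NOT NULL
    ORDER BY total_elapsed_time DESC
""")
for row in cursor.fetchall():
    print(row)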
Last night I tried to process all measure groups except the one that triggered the error earlier, but unfortunately the whole server crashed again...
Update2:
We have looked into upgrading to premium storage, but this means switching from an A11 to a DS- or GS-series VM. That means resizing the whole VM, which hosts live solutions, resulting in downtime plus the effort of restoring the VHDs and replacing the current OS disk, which contains parts of live solutions.
Another option we identified is partitioning, or improving the underlying queries behind the measures. Unfortunately that is far more effort than anticipated; a quick workaround for now would help a lot in selling a long-term improvement.
Update3:
We have been in contact with Microsoft, and they advise migrating from the A11 VM to a D14 v2 and upgrading to premium storage disks. This will be our next step, to be executed this coming Friday. After the migration I will update or close this post.
If any information is missing, please let me know. Any suggestions that would help me pinpoint the problem would be much appreciated!
The upgrade to a VM better suited to the workload (DS14 v2) and the move to P30 premium storage disks have resolved the issues. The problem was not in the way the cube was processed or configured, but in the hardware used.

Abort a table import stuck in 'pending'

Similar questions have been asked, but none are exactly what I am looking for.
The problem: on some occasions, importing a table from Google Cloud Storage into BigQuery gets stuck in a 'pending' state for hours if not days. Tables that get stuck in this state never seem to come out of it, or at least we didn't bother waiting that long. I know it's not a queue issue, since in the meantime we can import other tables just fine. No errors are returned by BigQuery.
My question: in this situation, and in general, how can we safely abort/cancel an import to BigQuery without having the table quietly import on us without our knowing? This would apply to any table regardless of its state, as long as it hasn't finished importing.
Thanks.
You may be hitting load job rate limits. For example, if you try to start more than two load jobs per minute for the same table, the load jobs against that table will be deferred, while load jobs against other tables may continue at normal speed.
There are per-project limits on the rate at which load jobs can be started, and on the number of load jobs that can be running per project at any one time. If you send jobs faster than this, we'll queue them, but as you've noticed, our queueing is not a fair queue and can start newer jobs before older ones.
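While you wait, you can at least observe what is queued. A minimal sketch using today's google-cloud-bigquery Python client (assuming project and credentials are configured) lists the jobs sitting in the pending state:

from google.cloud import bigquery

# Sketch: list jobs currently queued in the "pending" state.
client = bigquery.Client()
for job in client.list_jobs(state_filter="pending"):
    print(job.job_id, job.job_type, job.created)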
Aborting pending jobs is a commonly requested feature. If you file a feature request here, that will help us prioritize it.

What are the best practices to run Hive queries?

Hive queries usually take some time to execute, anywhere from a few minutes to several hours. If several hundred Java clients are executing Hive queries, those clients could potentially be waiting a long time for results and may time out due to network issues. Is there an asynchronous feature in Hive that can be used instead of the synchronous behavior?
What are the best practices to mitigate such issues?
The Hadoop job scheduler provides guaranteed capacity for production jobs and good response times for interactive jobs, while allocating resources fairly between users. You can check the following blog post:
http://blog.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/
There is no asynchronous feature in Hive.
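One client-side mitigation, independent of Hive itself, is to submit the blocking queries from a worker pool so callers are not tied up waiting on each one. A minimal sketch, where run_hive_query is a hypothetical stand-in for your existing JDBC/Thrift call:

from concurrent.futures import ThreadPoolExecutor

# Sketch: run blocking Hive queries from a worker pool so clients can
# submit work and collect results later. run_hive_query is a hypothetical
# placeholder for an existing JDBC/Thrift-based call.
def run_hive_query(sql):
    # Placeholder: execute sql over your Hive connection and return rows.
    raise NotImplementedError

executor = ThreadPoolExecutor(max_workers=8)
future = executor.submit(run_hive_query, "SELECT COUNT(*) FROM my_table")

# ... do other work while the query runs ...

rows = future.result(timeout=3600)  # collect the result when needed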

Loading from Google Cloud Storage to BigQuery seems slow

I'm running a test using BigQuery. Basically I have 50,000 files, each of which is about 27 MB in size on average; some are larger, some smaller.
Timing each file upload reveals:
real 0m49.868s
user 0m0.297s
sys 0m0.173s
Using something similar to:
time bq load --encoding="UTF-8" --field_delimiter="~" dataset gs://project/b_20130630_0003_1/20130630_0003_4565900000.tsv schema.json
Running "bq ls -j" and subsequently "bq show -j " reveals that I have the following errors:
Job Type State Start Time Duration Bytes Processed
load FAILURE 01 Jul 22:21:18 0:00:00
Errors encountered during job execution. Exceeded quota: too many imports per table for this table
After checking the database, the rows seem to have loaded fine, which is puzzling since, given the error, I would have expected nothing to have been loaded. The problem is that I really don't understand how I reached my quota limit, since I've only just started uploading files recently and thought the limit was 200,000 requests.
All the data is currently in Google Cloud Storage, so I would expect the loading to be fairly quick, since the interaction is between Cloud Storage and BigQuery, both of which are in the cloud.
By my calculations, the entire load is going to take 50,000 × 49 seconds, i.e. roughly 28 days.
Kinda hoping these numbers are wrong.
Thanks.
The quota limit per table is 1000 loads per day. This is to encourage people to batch their loads, since we can generate a more efficient representation of the table if we can see more of the data at once.
BigQuery can perform load jobs in parallel. Depending on the size of your load, a number of workers will be assigned to your job. If your files are large, those files will be split among workers; alternatively, if you pass multiple files, each worker may process a different file. So the time it takes to load one file is not indicative of the time it takes to run a load job with multiple files.
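Putting the two answers together: batch many files into a single load job instead of starting one job per file. One job stays well under the per-table quota, and BigQuery can still split the files among parallel workers. A minimal sketch with today's google-cloud-bigquery Python client, reusing the bucket path from the question and otherwise placeholder names:

from google.cloud import bigquery

# Sketch: load many GCS files in ONE load job via a wildcard URI
# (an explicit list of URIs also works).
client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    field_delimiter="~",
    encoding="UTF-8",
    autodetect=True,  # or supply an explicit schema, as in the question
)

load_job = client.load_table_from_uri(
    "gs://project/b_20130630_0003_1/*.tsv",  # matches many files at once
    "my_dataset.my_table",  # placeholder destination
    job_config=job_config,
)
load_job.result()  # one job, many files, processed in parallel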