Backend error when loading gzip CSV - google-bigquery

I got a "Backend error. Job aborted." message; the job details are below.
I know this question has been asked before, but I still need some help to resolve it.
What happens if this occurs in production? We want to run periodic loads every 5 minutes.
Thanks in advance.
Errors:
Backend error. Job aborted.
Job ID: job_744a2b54b1a343e1974acdae889a7e5c
Start Time: 4:32pm, 30 Aug 2012
End Time: 5:02pm, 30 Aug 2012
Destination Table: XXXXXXXXXX
Source URI: gs://XXXXX/XXXXXX.csv.Z
Delimiter: ,
Max Bad Records: 99999999

This job hit an internal error. Since you ran this job, BigQuery has been updated to a new version, and a number of internal errors have been fixed. Can you retry your job?
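For the 5-minute production loads, a reasonable pattern is to treat backendError/internalError results as transient and retry the load with a short backoff. Here is a minimal sketch using the current google-cloud-bigquery Python client; the bucket, file, and table names are placeholders, not your actual job:
import time
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.CSV)

for attempt in range(1, 4):  # retry a couple of times before giving up
    job = client.load_table_from_uri(
        "gs://my-bucket/my-file.csv.gz",  # placeholder source URI
        "my_dataset.my_table",            # placeholder destination table
        job_config=job_config,
    )
    try:
        job.result()  # blocks until the load finishes, raises on failure
        break
    except Exception:
        reason = (job.error_result or {}).get("reason")
        if reason in ("backendError", "internalError") and attempt < 3:
            time.sleep(30 * attempt)  # simple backoff before the next retry
            continue
        raise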

Related

Setting Cloudwatch Event Cron Job

I'm a little confused by the cron documentation for CloudWatch Events. My goal is to create a cron job that runs every day at 9am, 5pm, and 11pm EST. Does this look correct, or did I get it wrong? It seems like CloudWatch uses 24-hour UTC time, so I tried to convert my EST times to UTC.
I thought I had it right, but I got the following error when trying to deploy the CloudFormation template via sam deploy:
Parameter ScheduleExpression is not valid. (Service: AmazonCloudWatchEvents; Status Code: 400; Error Code: ValidationException
What is wrong with my cron job? I appreciate any help!
(SUN-SAT, 4,0,6)
UPDATE:
The template below gets the same error, Parameter ScheduleExpression is not valid:
Events:
  CloudWatchEvent:
    Type: Schedule
    Properties:
      Schedule: cron(0 0 9,17,23 ? * * *)
MemorySize: 128
Timeout: 100
You have to specify a value for all six required cron fields.
This should satisfy all your requirements:
0 4,14,22 ? * * *
Generated using:
https://www.freeformatter.com/cron-expression-generator-quartz.html
There are a lot of other cron generators you can find online.
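If you want to check an expression without a full sam deploy, one option (just a sketch; the rule name is hypothetical) is to let the CloudWatch Events API validate it directly, for example with boto3:
import boto3

events = boto3.client("events")
# A ValidationException here means the expression is still malformed.
events.put_rule(
    Name="daily-9am-5pm-11pm-est",                 # hypothetical rule name
    ScheduleExpression="cron(0 4,14,22 ? * * *)",  # minutes hours day-of-month month day-of-week year, in UTC
)
In the SAM template, the same six-field expression goes into the Schedule property. Converting 9am, 5pm, and 11pm EST to UTC gives hours 14, 22, and 4.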

Talend (7.0.1) - Cannot modify mapred.job.name at runtime

I am having some trouble running a simple tHiveCreateTable job in Talend Open Studio for Big Data (screenshot of the failing job omitted here).
The Hive connection is fine, and the job worked until Ranger was activated in the cluster.
After Ranger was enabled, I started getting the following log:
[statistics] connecting to socket on port 3345
[statistics] connected
Error while processing statement: Cannot modify mapred.job.name at runtime. It is not in list of params that are allowed to be modified at runtime
[statistics] disconnected
This error occurs whether the job uses Tez or MapReduce, and the exception is thrown in the following line of the automatically generated code:
// For MapReduce Mode
stmt_tHiveCreateTable_1.execute("set mapred.job.name=" + queryIdentifier);
Do you know of any solution or workaround for this?
Thanks in advance.
It is possible to stop Talend 7 jobs from setting mapreduce.job.name and hive.query.name at runtime.
Edit the file
{talend_install_dir}/plugins/org.talend.designer.components.localprovider_7.1.1.20181026_1147/components/templates/Hive/SetQueryName.javajet
and comment out lines 6 and 11 like this:
// stmt_<%=cid %>.execute("set mapred.job.name=" + queryIdentifier_<%=cid %>);
// stmt_<%=cid %>.execute("set hive.query.name=" + queryIdentifier_<%=cid %>);
It solved this issue for me.

Unexpected. Please try again, error message

I am trying to import a 2.8 GB file from Google Cloud Storage into BigQuery, and the job failed with:
Unexpected. Please try again.
Here is the rest of the job output:
Job ID: aerobic-forge-504:job_a6H1vqkuNFf-cJfAn544yy0MfxA
Start Time: 5:12pm, 3 Jul 2014
End Time: 7:12pm, 3 Jul 2014
Destination Table: aerobic-forge-504:wr_dev.phone_numbers
Source URI: gs://fls_csv_files/2014-6-11_Global_0A43E3B1-2E4A-4CA9-BD2A-012B4D0E4C69.txt
Source Format: CSV
Allow Quoted Newlines: true
Allow Jagged Rows: true
Ignore Unknown Values: true
Schema:
area: INTEGER
number: INTEGER
The job failed due to a timeout; there is a maximum of 2 hours allowed for processing, after which the import job is killed. I'm not sure why the import was so slow; from what I can tell it only processed at about 100 KB/sec, which is far slower than expected. It is quite possible that the error is transient.
In the future, you can speed up the import by setting allow_quoted_newlines to false, which will allow BigQuery to process the import in parallel. Alternatively, you can partition the file yourself and pass multiple file paths in the job.
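With today's google-cloud-bigquery Python client, that configuration would look roughly like the sketch below (the 2014-era API was different, so treat this as illustrative only):
from google.cloud import bigquery

client = bigquery.Client(project="aerobic-forge-504")
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    allow_quoted_newlines=False,   # lets BigQuery split the file and load it in parallel
    allow_jagged_rows=True,
    ignore_unknown_values=True,
    schema=[
        bigquery.SchemaField("area", "INTEGER"),
        bigquery.SchemaField("number", "INTEGER"),
    ],
)
job = client.load_table_from_uri(
    "gs://fls_csv_files/2014-6-11_Global_0A43E3B1-2E4A-4CA9-BD2A-012B4D0E4C69.txt",
    "aerobic-forge-504.wr_dev.phone_numbers",
    job_config=job_config,
)
job.result()  # waits for the load to finish
load_table_from_uri also accepts a list of URIs, which is how you would pass the partitioned files.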
Can you try again and let us know whether it works?

PDI Error occured while trying to connect to the database

I got the following error while executing a PDI job.
I do have the MySQL driver in place (libext/JDBC). Can someone tell me what the reason for the failure might be?
Despite the error while connecting to the database, my database is up and I can access it from the command prompt.
Error occured while trying to connect to the database
Error connecting to database: (using class org.gjt.mm.mysql.Driver)
Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
ERROR 03-08 11:05:10,595 - stepname- Error initializing step [Update]
ERROR 03-08 11:05:10,595 - stepname - Step [Update.0] failed to initialize!
INFO 03-08 11:05:10,595 - stepname - Finished reading query, closing connection.
ERROR 03-08 11:05:10,596 - stepname - Unable to prepare for execution of the transformation
ERROR 03-08 11:05:10,596 - stepname - org.pentaho.di.core.exception.KettleException:
We failed to initialize at least one step. Execution can not begin!
Thanks
Is this a long-running query by any chance? Or, in the PDI world, it can happen because your step kicks off at the start of the transformation, waits for something to do, and if nothing comes along before the net write timeout expires you'll see this error.
If so, your problem is caused by a timeout that MySQL uses, which frequently needs increasing from its default of 10 minutes.
See here:
http://wiki.pentaho.com/display/EAI/MySQL
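As an illustration of the kind of adjustment that page describes, you can inspect and raise the MySQL net write timeout before running the transformation; the host, credentials, and chosen value below are placeholders, and SET GLOBAL requires sufficient privileges:
import mysql.connector  # assumes mysql-connector-python is installed

conn = mysql.connector.connect(
    host="db-host", user="etl_user", password="secret", database="staging"  # placeholders
)
cur = conn.cursor()
cur.execute("SHOW VARIABLES LIKE 'net_write_timeout'")
print(cur.fetchone())                               # current value, in seconds
cur.execute("SET GLOBAL net_write_timeout = 3600")  # raise well past the longest expected wait
conn.close()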

Errors encountered during job execution. Unexpected. Please try again

I'm running a simple query that reads from a five-column table with five million rows and copies it to a new table, adding an extra constant column.
The query keeps failing with the following message:
u'status': {u'errorResult': {u'message': u'Unexpected. Please try again.',
u'reason': u'internalError'},
u'errors': [{u'message': u'Unexpected. Please try again.',
u'reason': u'internalError'}],
u'state': u'DONE'}}
Checking the jobs in BigQuery I get:
Errors encountered during job execution. Unexpected. Please try again.
The following jobIds had this problem:
job_6eae11c7792446a1b600fe5554071d32 query FAILURE 02 Aug 09:46:04 0:01:00
job_92ff013841574399a459e8c4296d7c73 query FAILURE 02 Aug 09:45:10 0:00:49
job_a1a11ee7bbec4c08b5e58b91b27aafad query FAILURE 02 Aug 09:43:40 0:00:55
job_496f8af99da94d8292f90580f73af64e query FAILURE 02 Aug 09:42:46 0:00:51
How can I fix this problem?
This is a known bug with large result sets ... we've got a fix and will release it today.