Failure importing data from BigQuery to GCS - google-bigquery

Dear support at Google,
We recently noticed that many of the GAP site import jobs that extract and upload data from Google BigQuery to Google Cloud Storage (GCS) have been failing since April 4th. The same jobs ran fine before April 4th. After investigating, we believe this is an issue on the BigQuery side rather than in our jobs. The error details returned by the BigQuery API during the upload are shown below:
216769 [main] INFO  org.mortbay.log  - Dataset : 130288123
217495 [main] INFO  org.mortbay.log  - Job is PENDING waiting 10000 milliseconds...
227753 [main] INFO  org.mortbay.log  - Job is PENDING waiting 10000 milliseconds...
237995 [main] INFO  org.mortbay.log  - Job is PENDING waiting 10000 milliseconds...
Heart beat
248208 [main] INFO  org.mortbay.log  - Job is PENDING waiting 10000 milliseconds..
258413 [main] INFO  org.mortbay.log  - Job is PENDING waiting 10000 milliseconds...
268531 [main] INFO  org.mortbay.log  - Job is RUNNING waiting 10000 milliseconds...
Heart beat
278675 [main] INFO  org.mortbay.log  - An internal error has occurred
278675 [main] INFO  org.mortbay.log  - ErrorProto : null
 
As the log shows, the job fails with an internal error and ErrorProto: null.
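For context, this is the shape of the extract-and-poll flow our job performs, sketched here against the current google-cloud-bigquery Java client; our production job uses an older API, so the table name and client setup below are illustrative assumptions only:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.TableId;

public class ExtractToGcs {
    public static void main(String[] args) throws InterruptedException {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        // Hypothetical table and bucket names, for illustration only.
        TableId table = TableId.of("lithe-creek-712", "130288123", "ga_sessions_20160404");
        String destinationUri = "gs://example-bucket/export-*.csv.gz";

        // Submit the extract job and block until it finishes (the older API
        // polls the job status in a loop, as in the log above).
        Job job = bigquery.getTable(table).extract("CSV", destinationUri);
        Job completed = job.waitFor();

        if (completed.getStatus().getError() != null) {
            // This is where we observe "An internal error has occurred" / ErrorProto: null.
            System.err.println("Extract failed: " + completed.getStatus().getError());
        } else {
            System.out.println("Extract succeeded");
        }
    }
}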
 
Our google account: ea.eadp#gmail.com
 
Our Google BigQuery projects:
Origin-BQ: origin-bq-1
Pulse-web: lithe-creek-712
The import failures occur on the following datasets:
 
In Pulse-web, lithe-creek-712:
101983605
130288123
48135564
56570684
57740926
64736126
64951872
72220498
72845162
73148296
77517207
86821637
 
 
Please look into this and let us know if you have any updates.
Thank you very much; we look forward to hearing back from you.
 
Thanks

Related

Parallel processing Flux::groupBy hangs

I'm using Reactor 3.4.18 and have a question about Flux.groupBy. I generate 1000 integers and split them into 100 groups. I expect each group to be processed on a separate thread, but it hangs after a few integers are processed.
@Test
void shouldGroupByKeyAndProcessInParallel() {
    final Scheduler scheduler = Schedulers.newParallel("group", 1000);
    StepVerifier.create(Flux.fromStream(IntStream.range(0, 1000).boxed())
            .groupBy(integer -> integer % 100)
            .flatMap(groupedFlux -> groupedFlux
                    .subscribeOn(scheduler) // this line doesn't help
                    .doOnNext(integer -> log.info("processing {}:{}", groupedFlux.key(), integer)),
                2)
        )
        .expectNextCount(1000)
        .verifyComplete();
}
test execution logs:
10:47:58.670 [main] DEBUG reactor.util.Loggers - Using Slf4j logging framework
10:47:58.846 [group-1] INFO com.huawei.hwclouds.coney.spike.FluxGroupByTest - processing 0:0
10:47:58.866 [group-1] INFO com.huawei.hwclouds.coney.spike.FluxGroupByTest - processing 1:1
10:47:58.867 [group-1] INFO com.huawei.hwclouds.coney.spike.FluxGroupByTest - processing 0:100
10:47:58.867 [group-1] INFO com.huawei.hwclouds.coney.spike.FluxGroupByTest - processing 1:101
10:47:58.867 [group-1] INFO com.huawei.hwclouds.coney.spike.FluxGroupByTest - processing 0:200
10:47:58.867 [group-1] INFO com.huawei.hwclouds.coney.spike.FluxGroupByTest - processing 1:201
-------- start hanging ----------
I changed the flatMap concurrency to 2 to speed up reproducing the issue. I expected a low flatMap concurrency to only slow the overall processing down, not to make it hang.
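For comparison, here is a minimal variant of the test above with the flatMap concurrency raised to cover all 100 groups; this is only a sketch based on my reading that groupBy needs every group to be consumed concurrently, not a confirmed explanation of the hang:

@Test
void shouldGroupByKeyAndProcessInParallelWithFullConcurrency() {
    final Scheduler scheduler = Schedulers.newParallel("group", 1000);
    StepVerifier.create(Flux.fromStream(IntStream.range(0, 1000).boxed())
            .groupBy(integer -> integer % 100)
            // concurrency of 100 matches the number of groups, so no group is left unsubscribed
            .flatMap(groupedFlux -> groupedFlux
                    .subscribeOn(scheduler)
                    .doOnNext(integer -> log.info("processing {}:{}", groupedFlux.key(), integer)),
                100)
        )
        .expectNextCount(1000)
        .verifyComplete();
}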

BigQuery Transfer: Google Ads (formerly AdWords): Transfer job is successful but no data

I'm trying to set up a transfer with the following configuration:
Source: Google Ads (formerly AdWords)
Destination dataset: app_google_ads
Schedule (UTC): every day 08:24
Notification Cloud Pub/Sub topic: None
Email notifications: None
Data source details
Customer ID: xxx-xxx-xxxx
Exclude removed/disabled items: None
I get no errors during the transfer, but my dataset is empty. Why?
12:02:00 PM Summary: succeeded 72 jobs, failed 0 jobs.
12:01:04 PM Job 77454333956:adwords_5cdace41-0000-2184-a73e-001a11435098 (table p_VideoConversionStats_2495318378$20190502) completed successfully
12:00:04 PM Job 77454333956:adwords_5cdace37-0000-2184-a73e-001a11435098 (table p_HourlyAccountStats_2495318378$20190502) completed successfully
12:00:04 PM Job 77454333956:adwords_5cd88a2b-0000-2117-b857-089e082679e4 (table p_HourlyCampaignStats_2495318378$20190502) completed successfully
12:00:04 PM Job 77454333956:adwords_5cd0ba27-0000-2c7c-aed0-f40304362f4a (table p_AudienceBasicStats_2495318378$20190502) completed successfully
12:00:04 PM Job 77454333956:adwords_5cd907f8-0000-2e16-a735-089e082678cc (table p_KeywordStats_2495318378$20190502) completed successfully
12:00:04 PM Job 77454333956:adwords_5cd88a32-0000-2117-b857-089e082679e4 (table p_ShoppingProductConversionStats_2495318378$20190502) completed successfully
12:00:04 PM Job 77454333956:adwords_5cce5c09-0000-28bd-86d3-f4030437b908 (table p_AdBasicStats_2495318378$20190502) completed successfully
etc
Update: I had AdBlock enabled in my browser, which prevented the Google Ads tables from showing up in my dataset. After turning it off, the tables are visible and everything works.
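For anyone hitting the same symptom, a small sketch of how to confirm the transferred tables exist independently of the web UI; this uses the google-cloud-bigquery Java client, and the client setup is an assumption on my side (only the dataset name app_google_ads comes from the configuration above):

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Table;

public class ListTransferredTables {
    public static void main(String[] args) {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        // List every table the Google Ads transfer wrote into the destination dataset,
        // bypassing the browser UI (and any content blocker) entirely.
        for (Table table : bigquery.listTables("app_google_ads").iterateAll()) {
            System.out.println(table.getTableId().getTable());
        }
    }
}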

RavenDB Restore slow

I've been testing our DR Process for a new application and am finding that RavenDB restore is taking an unexpected and unacceptable amount of time. I need to know if there is something wrong with my process or if there is a way of improving performance.
For the 70MB database I am restoring it is taking > 8 hours.
After stopping the RavenDB Windows service, and following the RavenDB documentation at http://ravendb.net/docs/server/administration/backup-restore, I'm using the following command:
d:\RavenDB\Server>Raven.Server.exe -src "D:\Backups\RavenDB\2013-11-25_2330\MyDatabase\RavenDB.Backup" -dest "D:\RavenDB\Server\Database\Databases\" -restore
I get progress reporting like this:
Request #10,306: POST - 72 ms - <system> - 201 - /admin/backup
Request #10,307: GET - 21 ms - <system> - 200 - /docs/Raven/Backup/Status
Request #10,308: GET - 0 ms - <system> - 200 - /docs/Raven/Backup/Status
Request #10,309: POST - 1,150 ms - MyDatabase - 201 - /admin/backup
Request #10,310: GET - 32 ms - MyDatabase - 200 - /docs/Raven/Backup/Status
etc
However, I have not yet had confirmation of a successful restore.

PDI Error occured while trying to connect to the database

I got the following error while executing a PDI job.
I do have the MySQL driver in place (libext/JDBC). Can someone tell me what the reason for the failure might be?
Despite the error while connecting to the DB, the database is up and I can access it from the command prompt.
Error occured while trying to connect to the database
Error connecting to database: (using class org.gjt.mm.mysql.Driver)
Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
ERROR 03-08 11:05:10,595 - stepname- Error initializing step [Update]
ERROR 03-08 11:05:10,595 - stepname - Step [Update.0] failed to initialize!
INFO 03-08 11:05:10,595 - stepname - Finished reading query, closing connection.
ERROR 03-08 11:05:10,596 - stepname - Unable to prepare for execution of the transformation
ERROR 03-08 11:05:10,596 - stepname - org.pentaho.di.core.exception.KettleException:
We failed to initialize at least one step. Execution can not begin!
Thanks
Is this a long-running query by any chance? Alternatively, in the PDI world this can happen because your step kicks off at the start of the transformation, waits for something to do, and if nothing comes along before the net write timeout you'll see this error.
If so, your problem is caused by a timeout that MySQL uses, which frequently needs increasing from the default of 10 minutes.
See here:
http://wiki.pentaho.com/display/EAI/MySQL
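If that timeout is the culprit, it can be raised on the JDBC connection itself; in PDI the same parameters can be added on the connection's Options tab. A minimal sketch assuming MySQL Connector/J is the driver sitting in libext/JDBC (host, database, credentials, and the chosen value are illustrative):

import java.sql.Connection;
import java.sql.DriverManager;

public class ConnectWithLongerTimeout {
    public static void main(String[] args) throws Exception {
        // sessionVariables raises MySQL's net_write_timeout (in seconds) for this session only.
        String url = "jdbc:mysql://dbhost:3306/mydb?sessionVariables=net_write_timeout=3600";
        try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
            System.out.println("Connected: " + conn.isValid(5));
        }
    }
}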

Backend error, when loading gzip csv

I got the "backend error. Job aborted" , job ID below.
I know this question was asked, but I still need some help to try & resolve this.
what happen if this happen in production,we want to have a 5min periodic loads?
thanks in advance
Errors:
Backend error. Job aborted.
Job ID: job_744a2b54b1a343e1974acdae889a7e5c
Start Time: 4:32pm, 30 Aug 2012
End Time: 5:02pm, 30 Aug 2012
Destination Table: XXXXXXXXXX
Source URI: gs://XXXXX/XXXXXX.csv.Z
Delimiter: ,
Max Bad Records: 99999999
This job hit an internal error. Since you ran this job, BigQuery has been updated to a new version, and a number of internal errors have been fixed. Can you retry your job?
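Regarding the production concern above: for a 5-minute periodic load, the usual defensive pattern is to retry jobs that fail with transient backend/internal errors before giving up. A minimal sketch with the google-cloud-bigquery Java client; the bucket, table, and retry counts are illustrative assumptions, not details from the original job:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.TableId;

public class PeriodicLoadWithRetry {
    public static void main(String[] args) throws InterruptedException {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        LoadJobConfiguration config = LoadJobConfiguration.newBuilder(
                        TableId.of("my_dataset", "my_table"),
                        "gs://my-bucket/data.csv.gz",
                        FormatOptions.csv())
                .setMaxBadRecords(0)
                .build();

        // Retry a few times with exponential backoff on transient backend errors.
        for (int attempt = 1; attempt <= 3; attempt++) {
            Job job = bigquery.create(JobInfo.of(config)).waitFor();
            if (job != null && job.getStatus().getError() == null) {
                System.out.println("Load succeeded on attempt " + attempt);
                return;
            }
            System.err.println("Attempt " + attempt + " failed: "
                    + (job == null ? "job no longer exists" : job.getStatus().getError()));
            Thread.sleep((long) (Math.pow(2, attempt) * 1000));
        }
        throw new IllegalStateException("Load kept failing; investigate before the next cycle");
    }
}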