Pydrill Heapmemory Issue while Executing Spark submit commnads

Pydrill Heapmemory Issue while Executing Spark submit commnads - apache-spark-sql

Exception :
[MainThread ] [ERROR] : Message : TransportError(500, '{\n "errorMessage" : "RESOURCE ERROR: There is not enough heap memory to run this query using the web interface. \n\nPlease try a query with fewer columns or with a filter or limit condition to limit the data returned. \nYou can also try an ODBC/JDBC client. \n\n[Error Id: 8dc824fc-b1b5-4352-85fe-84e2eb5ff71d on

Related

BigQuery generated table from gSheet is unaccessible through JDBC driver

I'm trying to connect BigQuery to a BI Tool (Cognos Analytics) through a JDBC driver.
It all works OK, until I try to read a gSheet generated table. Then I get the following error:
Data source adapter error: java.sql.SQLException: [Simba][BigQueryJDBCDriver](100033) Error getting job status.
[Simba][BigQueryJDBCDriver](100033) Error getting job status.
[Simba][BigQueryJDBCDriver](100033) Error getting job status.
400 Bad Request
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"message" : "Error while reading table: <GSHEET-TABLE>, error message: Failed to read the spreadsheet. Error code: PERMISSION_DENIED",
"reason" : "invalid"
} ],
"message" : "Error while reading table: <GSHEET-TABLE>, error message: Failed to read the spreadsheet. Error code: PERMISSION_DENIED",
"status" : "INVALID_ARGUMENT"
}
My connection string is the following:
jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=<MY-PROJECT-ID>;OAuthType=0;OAuthServiceAcctEmail=<CLIENT-EMAIL-FROM-JSON>;OAuthPvtKeyPath=<PATH-TO-JSON>;Timeout=60; RequestGoogleDriveScope=1;
The Service Account has full Admin ownership of the entire project.
Does somebody know how to access gSheets generated tables?

How do I get more CPU seconds allocated in BigQuery when using the java API

I get the following error :
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"message" : "Query exceeded resource limits. 6326.429689712539 CPU seconds were used, and this query must use less than 5100.0 CPU seconds.",
"reason" : "billingTierLimitExceeded"
} ],
"message" : "Query exceeded resource limits. 6326.429689712539 CPU seconds were used, and this query must use less than 5100.0 CPU seconds.",
"status" : "INVALID_ARGUMENT"
}
when I attempt to run a query in bigquery using the java API. I cannot find how I raise this limit. How do I get more cpu seconds allocated? How do I change my billing Tier for BigQuery only?

Dataflow insert into BigQuery fails with large number of files for asia-northeast1 location

I am using Cloud Storage Text to BigQuery template on Cloud Composer.
The template is kicked from Python google api client.
The same program
works fine in US location (for Dataflow and BigQuery).
fails in asia-northeast1 location.
works fine with the fewer (less than 10000) input files in asia-northeast location.
Does anybody have an idea about this?
I want to execute in the asia-northeast location for business reason.
More details about failure:
The program worked until "ReifyRenameInput", and the failed .
dataflow job failed
with the error message below:
java.io.IOException: Unable to insert job: beam_load_textiotobigquerydataflow0releaser0806214711ca282fc3_8fca2422ccd74649b984a625f246295c_2a18c21953c26c4d4da2f8f0850da0d2_00000-0, aborting after 9 .
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl.startJob(BigQueryServicesImpl.java:231)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl.startJob(BigQueryServicesImpl.java:202)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl.startCopyJob(BigQueryServicesImpl.java:196)
at org.apache.beam.sdk.io.gcp.bigquery.WriteRename.copy(WriteRename.java:144)
at org.apache.beam.sdk.io.gcp.bigquery.WriteRename.writeRename(WriteRename.java:107)
at org.apache.beam.sdk.io.gcp.bigquery.WriteRename.processElement(WriteRename.java:80)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException:
404 Not Found { "code" : 404, "errors" : [ { "domain" : "global", "message" : "Not found: Dataset pj:datasetname", "reason" : "notFound" } ], "message" : "Not found: Dataset pj:datasetname" }
(pj and dataset name are not real name, and they are project name and dataset name for outputTable parameter)
Although the error message said the dataset is not found, the dataset surely existed.
Moreover, some new tables which seems to be tempory tables were created in the dataset after the program.

This is a known issue related to your Beam SDK version according to this public issue tracker. The Beam 2.5.0 SDK version doesn't have this issue.

com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found

When using a Talend bigquery input component (BQ java api) to read from bigquery, I get the following error (for a long running job) -
Exception in component tBigQueryInput_4
com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
{
"code" : 404,
"errors" : [ {
"domain" : "global",
"message" : "Not found: Table rand-cap:_f000fcf374688fc5e7da50a4c0c04ba228d993c3.anon0849eba05949a62962f218a0433d6ee82bf13a7b",
"reason" : "notFound"
} ],
"message" : "Not found: Table rand-cap:_f000fcf374688fc5e7da50a4c0c04ba228d993c3.anon0849eba05949a62962f218a0433d6ee82bf13a7b"
}
Is this because of the "temporary" table that bq creates when querying results not being available after 24hrs. Or is it because rate limit was exceeded since I am querying a large table ?
In either case, how can I find more details on this error and what steps should I take to prevent this ?
Thank you !

This seems to be a problem in Talend, there are other users describing your issue: https://www.talendforge.org/forum/viewtopic.php?id=44734
Google Bigquery has a property i.e. Allowlargeresults but its not there in TBigqueryinput.
Hi there - I am currently using Talend open studio v6.1.1 and this issue still exists.

Frequent 503 errors raised from BigQuery Streaming API

Streaming data into BigQuery keeps failing due to the following error, which occurs more frequently recently:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 503 Service Unavailable
{
"code" : 503,
"errors" : [ {
"domain" : "global",
"message" : "Connection error. Please try again.",
"reason" : "backendError"
} ],
"message" : "Connection error. Please try again."
}
at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:312)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1049)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:410)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:343)
Relevant question references:
Getting high rate of 503 errors with BigQuery Streaming API
BigQuery - BackEnd error when loading from JAVA API

We (the BigQuery team) are looking into your report of increased connection errors. From the internal monitoring, there hasn't been global a spike in connection errors in the last several days. However, that doesn't mean that your tables, specifically, weren't affected.
Connection errors can be tricky to chase down, because they can be caused by errors before they get to the BigQuery servers or after they leave. The more information your can provide, the easier it is for us to diagnose the issue.
The best practice for streaming input is to handle temporary errors like this to retry the request. It can be a little tricky, since when you get a connection error you don't actually know whether the insert succeeded. If you include a unique insertId with your data (see the documentation here), you can safely resend the request (within the deduplication window period, which I think is 15 minutes) without worrying that the same row will get added multiple times.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas