I'm using AWS Glue jobs for the very first time, so it is normal that my job does not work, but I can't see any detailed log about what is wrong, because when I click the "Error Logs" link or the "Logs" link I always get these messages in AWS CloudWatch:
* Log group does not exist
The specific log group: /aws-glue/jobs/error does not exist in this account or region.
* An error occurred while describing log streams.
c.substring is not a function
How can I see AWS Glue Logs? Doesn't AWS Glue create the log group automatically?
Check whether your IAM role has the AWSGlueServiceRole policy attached to it or not; once that is in place, the logs are written to the right location before the failure occurs.
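If you want to check (and fix) this programmatically, here is a minimal sketch with boto3; the role name is a placeholder for whatever role your Glue job actually uses:

    import boto3

    # Placeholder: replace with the role configured on your Glue job.
    ROLE_NAME = "MyGlueJobRole"

    iam = boto3.client("iam")

    # List the managed policies attached to the role and look for AWSGlueServiceRole.
    attached = iam.list_attached_role_policies(RoleName=ROLE_NAME)["AttachedPolicies"]
    names = [p["PolicyName"] for p in attached]
    print(names)

    if "AWSGlueServiceRole" not in names:
        # Attach the AWS-managed service policy so Glue can create the
        # /aws-glue/jobs/* log groups and write to CloudWatch.
        iam.attach_role_policy(
            RoleName=ROLE_NAME,
            PolicyArn="arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole",
        )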
I was following this step by step to connect data from BQ to AWS Glue and store it in S3. Everything works OK until I try to run the job, which keeps failing with:
An error occurred while calling o76.getSource. The type of table {datasetName}.{table_name} is currently not supported: EXTERNAL
I can't seem to find any similar error online, and I can't find further helpful info in the log either. It seems to be stuck on an issue with the BQ table. I was following exactly what the author did in the blog, with the key-value pair to indicate the project ID and dataset/table (the image refers to the blog author's table name).
Does anybody know what's causing this?
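For reference, the source definition in my job script looks roughly like this (a sketch; the connection name and the project/dataset/table values stand in for the ones used in the blog):

    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # Sketch of the BigQuery source; "bq-connection", the project ID and the
    # dataset/table are placeholders for the values from the blog post.
    dyf = glue_context.create_dynamic_frame.from_options(
        connection_type="marketplace.spark",
        connection_options={
            "connectionName": "bq-connection",
            "parentProject": "my-gcp-project",
            "table": "my_dataset.my_table",
        },
    )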
I just started a new project on Google Cloud and set up some BigQuery datasets and tables. I now want to set up some scheduled queries. I have already enabled the BigQuery Data Transfer API. My query is valid (it's just SELECT * FROM table). I can't find anything about this error online.
See screenshot
UPDATE: I've experimented a bit and it seems to be an organization-wide issue. All projects, new and old, within my organization get this same error when trying to schedule a query. I tried with a project in a different organization and did not have the issue. What could be causing this error for ALL projects in an organization?
UPDATE 2:
When querying a table that is not empty, the error changes to "Error creating scheduled query: Yn" instead of "Error creating scheduled query: er" (which appears when the scheduled query would have queried an empty table).
I faced the same issue as you, and basically I just needed to run the query first before creating the scheduled query... And that did the trick.
From the BQ FAQs:
"Scheduled queries use features of BigQuery Data Transfer Service. Verify that you have completed all actions required in Enabling BigQuery Data Transfer Service."
Basically, what this means is that you need to enable the Data Transfer API in your project, AND give the user who creates the scheduled query a BigQuery Admin role so that they have the right permissions to access that transfer service.
If done right, you should get a popup when creating the scheduled query asking you to confirm that the Data Transfer Service has access to your user account (if you block popups you might not see this message and get stuck).
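Once that is in place, creating the scheduled query also works from the client library; a minimal sketch with the Python Data Transfer client (project, location, dataset, table and schedule are placeholders):

    from google.cloud import bigquery_datatransfer

    client = bigquery_datatransfer.DataTransferServiceClient()

    # Placeholders: adjust project, location, dataset and table to your own.
    parent = "projects/my-project/locations/us"

    transfer_config = bigquery_datatransfer.TransferConfig(
        destination_dataset_id="my_dataset",
        display_name="my_scheduled_query",
        data_source_id="scheduled_query",
        params={
            "query": "SELECT * FROM `my-project.my_dataset.my_table`",
            "destination_table_name_template": "my_table_copy",
            "write_disposition": "WRITE_TRUNCATE",
        },
        schedule="every 24 hours",
    )

    # This call fails with a permission error if the Data Transfer API is not
    # enabled or the caller lacks the BigQuery Admin role.
    created = client.create_transfer_config(
        parent=parent, transfer_config=transfer_config
    )
    print(created.name)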
If this error only occurs in your organisation, I believe it might be caused by an organisation policy on Google Cloud. I would encourage you to double-check whether there is any org policy causing this error. If that's not the case, open a support ticket with GCP.
What worked for me was signing in through Incognito Mode with just my account and attempting to save the scheduled query. I have multiple Google Accounts signed in at one time, and for whatever reason BigQuery throws this generic error after authorization is successful and BigQuery is granted the access it requested.
You need to make sure that you are creating the query under the targeted project, not in any other project, because otherwise it won't appear.
Also, you need to enable the API, as mentioned in one of the answers above.
This eventually worked for me when I ran it in an incognito window.
After an Apache Beam (Google Cloud Dataflow 2.0) job ends, we get a ready-made command at the end of the logs, bq show -j --format=prettyjson --project_id=<My_Project_Id> 00005d2469488547749b5129ce3_0ca7fde2f9d59ad7182953e94de8aa83_00001-0, which can be run from the Google Cloud SDK command prompt.
Basically it shows all the information like job start time, end time, number of bad records, number of records inserted, etc.
I can see this information on the Cloud SDK console, but where is this information stored?
I checked the Stackdriver logs; they only have data up to the previous day, and even then not the complete information that is shown on the Cloud SDK console.
If I want to export this information and load it into BigQuery, where can I get it?
Update: This is possible, and I found the information when I added the filter resource.type="bigquery_resource" in the Stackdriver logs viewer, but it shows timestamp information like CreateTime, StartTime and EndTime as 1970-01-01T00:00:00Z.
You can export these logs into a Google Cloud Storage bucket. From Stackdriver, click on "Create Export" and then create a sink, providing a sink name and a sink destination, which is the bucket path. The next time a job starts, all the logs get exported, and you can use those logs further.
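The same sink can also be created from the Python logging client; a minimal sketch (the sink name and bucket are placeholders, and the filter is the one from the question):

    from google.cloud import logging

    client = logging.Client(project="my-project")  # placeholder project

    # Export only BigQuery job log entries to a Cloud Storage bucket.
    sink = client.sink(
        "bigquery-job-logs",                                   # placeholder sink name
        filter_='resource.type="bigquery_resource"',
        destination="storage.googleapis.com/my-log-bucket",    # placeholder bucket
    )
    sink.create()

    # Note: the sink's writer identity still needs write access to the bucket
    # before entries start arriving.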
I am trying to export a table from BigQuery to Google Cloud Storage from the console/command line. The console job runs for a few minutes and errors out without any error code, and the command line job, after running for some time, gives the error below:
BigQuery error in extract operation: Error processing job 'data-flow-experiment:bqjob_r308ff0f73d1820a6_00000157f77e8ab9_1': Backend error. Job aborted.
The job ID from the command line is given above.
Billing is enabled for the project, and the BigQuery service is also enabled.
Also I get the below error when I try to create a bucket in the Google Cloud Storage:
AccessDeniedException: 403 The account for the specified project is read only.
This is despite the IAM user I am using having owner access; I have created buckets with this account previously and have also extracted tables in the past.
Please guide.
For the BigQuery issue:
Do you happen to have a timestamp column which has out-of-range values (say, far, far into the future)?
If so, you can just wait two more days, as the fix is rolling out.
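If you want to check, here is a quick sketch (the dataset, table, and timestamp column names are placeholders) that counts rows whose timestamps sit near the end of the supported range:

    from google.cloud import bigquery

    client = bigquery.Client(project="data-flow-experiment")  # project from the question

    # Placeholders: replace dataset, table and column with your own names.
    sql = """
    SELECT COUNT(*) AS suspicious_rows
    FROM `data-flow-experiment.my_dataset.my_table`
    WHERE my_timestamp_column >= TIMESTAMP "9999-01-01 00:00:00"
    """

    row = list(client.query(sql).result())[0]
    print(row.suspicious_rows)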
I am trying to use the bq command line tool to load data into BigQuery from a GCS bucket, and I receive the following error message:
BigQuery error in load operation: Access Denied: Job mythical-maxim-293:bqjob_r11765e0cd9ceb52b_000001427694f0e1_1: RUN_JOB
I was using a service account (with a private key) for authentication. I followed these links to grant the service account the right access level:
https://developers.google.com/bigquery/loading-data-into-bigquery
https://developers.google.com/bigquery/access-control
The service account email was granted WRITE access on the BigQuery dataset and READ access on the GCS bucket.
Note: Adding the service account email as a writer to the project solved the issue, but this is not feasible in my case. I am not allowed to request project-level write access, only BigQuery and GCS (read-only).
Thanks!
In order to run the job, the service account must be given at least READ permissions on the project. This is because whoever runs a job in the project can do things that cost the project owner money (e.g. run queries).
To add the service account to the project, go to https://cloud.google.com/console, then click on "Permissions", then "Add member".
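If you prefer to do this programmatically, here is a rough sketch using the Cloud Resource Manager API; roles/viewer corresponds to the READ-level access described above, and the service account email is hypothetical:

    from googleapiclient import discovery

    # The project ID comes from the error message in the question;
    # the service account email is a placeholder.
    PROJECT_ID = "mythical-maxim-293"
    MEMBER = "serviceAccount:my-loader@mythical-maxim-293.iam.gserviceaccount.com"

    crm = discovery.build("cloudresourcemanager", "v1")

    # Read-modify-write of the project IAM policy: add a viewer (READ) binding.
    policy = crm.projects().getIamPolicy(resource=PROJECT_ID, body={}).execute()
    policy.setdefault("bindings", []).append(
        {"role": "roles/viewer", "members": [MEMBER]}
    )
    crm.projects().setIamPolicy(
        resource=PROJECT_ID, body={"policy": policy}
    ).execute()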
You must provide the WRITE permission on the dataset.
https://cloud.google.com/bigquery/loading-data-into-bigquery#access
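With the current Python client, granting that dataset-level WRITE access looks roughly like this (the dataset name and service account email are placeholders):

    from google.cloud import bigquery

    client = bigquery.Client(project="mythical-maxim-293")  # project from the question

    dataset = client.get_dataset("my_dataset")  # placeholder dataset

    # Add a WRITER entry for the service account to the dataset's access list.
    entries = list(dataset.access_entries)
    entries.append(
        bigquery.AccessEntry(
            role="WRITER",
            entity_type="userByEmail",
            entity_id="my-loader@mythical-maxim-293.iam.gserviceaccount.com",
        )
    )
    dataset.access_entries = entries
    client.update_dataset(dataset, ["access_entries"])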
This is bad, as WRITE permission implies that you also have READ permission. But in BigQuery, READ is paid and Load is free; access to a paid service should not be necessary in order to do a free task.
Google must correct this.