HIVE_METASTORE_ERROR persists after removing problematic column from schema

I am trying to query my CloudTrail logging bucket with Athena. I have already run a crawler against the bucket and it populated a few tables. When I run a simple "preview table" query, I get the following error:
HIVE_METASTORE_ERROR:
com.amazonaws.services.datacatalog.model.InvalidInputException: Error: : expected at the position 121 of 'struct<roleArn:string,roleSessionName:string,durationSeconds:int,keySpec:string,keyId:string,encryptionContext:struct<aws\:cloudtrail\:arn:string,aws\:s3\......
I narrowed the problem down to one column and removed it completely from my schema.
After removing it from the schema in AWS Glue and rerunning the preview table query, I still get the same error at the same position. I tried again in a different browser and got the same result. How can this be? Am I missing something?
Please provide any advice.
Thanks in advance!

Related

Why am I getting an error when scheduling a query on Google BigQuery?

When trying to schedule a query in BQ, I am getting the following error:
Error code 3 : Query error: Not found: Dataset was not found in location EU at [2:1]
Is this a permissions issue?
This sounds like a case of the scheduled query being configured to run in a different region than either the referenced tables, or the destination table of the query.
Put another way, BigQuery requires a consistent location for reading and writing, and does not allow a query in location A to write results in location B.
https://cloud.google.com/bigquery/docs/scheduling-queries has some additional information about this.
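A quick way to reason about the location rule described above is to list the location of every dataset the scheduled query touches and compare it with the location the query runs in. The following is only a sketch in plain Python: the dataset names and locations are hypothetical, the helper `find_location_mismatches` is not part of any BigQuery library, and comparing locations case-insensitively is an assumption made for convenience.

```python
def find_location_mismatches(query_location, dataset_locations):
    """Return the names of datasets whose location differs from the query's.

    dataset_locations maps a dataset name to its location string.
    Locations are compared case-insensitively (an assumption of this sketch).
    """
    wanted = query_location.strip().upper()
    return sorted(
        name
        for name, location in dataset_locations.items()
        if location.strip().upper() != wanted
    )


# Hypothetical datasets, for illustration only.
datasets = {"source_dataset": "US", "destination_dataset": "EU"}
print(find_location_mismatches("EU", datasets))
```

Any dataset the helper reports would need to be moved or copied, or the scheduled query would need to run in that dataset's location instead.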

When I run snowflake stage query I get aws error

I've created an s3 linked stage on snowflake called csv_stage with my aws credentials, and the creation was successful.
Now I'm trying to query the stage like below
select t.$1, t.$2 from @sandbox_ra.public.csv_stage/my_file.csv t
However the error I'm getting is
Failure using stage area. Cause: [The AWS Access Key Id you provided is not valid.]
Any idea why? Do I have to pass something in the query itself?
Thanks for your help!
Ultimately let's say my s3 location has 3 different csv files. I would like to load each one of them individually to different snowflake tables. What's the best way to go about doing this?
Regarding the last part of your question: You can load multiple files with one COPY INTO-command by using the file names or a certain regex-pattern. But as you have 3 different files for 3 different tables you also have to use three different COPY INTO-commands.
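As a rough illustration of "one COPY INTO per table", the statements could be generated from a file-to-table mapping. This is a sketch only: the file and table names below are made up, `build_copy_statements` is a hypothetical helper, and the generated SQL omits a FILE_FORMAT clause that your data may well need.

```python
def build_copy_statements(stage, file_to_table):
    """Build one COPY INTO statement per (file, table) pair.

    stage is the stage name without the leading '@'.
    file_to_table maps a file name in the stage to its target table.
    """
    statements = []
    for file_name, table in file_to_table.items():
        statements.append(
            f"COPY INTO {table} FROM @{stage} FILES = ('{file_name}');"
        )
    return statements


# Hypothetical file and table names, for illustration only.
mapping = {
    "customers.csv": "customers",
    "orders.csv": "orders",
    "products.csv": "products",
}
for statement in build_copy_statements("sandbox_ra.public.csv_stage", mapping):
    print(statement)
```

Each printed statement would then be run separately in Snowflake, one per target table.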
Regarding querying your stage you can find some more hints in these questions:
Missing List-permissions on AWS - Snowflake - Failure using stage area. Cause: [The AWS Access Key Id you provided is not valid.] and
https://community.snowflake.com/s/question/0D50Z00008EKjkpSAD/failure-using-stage-area-cause-access-denied-status-code-403-error-code-accessdeniedhow-to-resolve-this-error
https://aws.amazon.com/de/premiumsupport/knowledge-center/access-key-does-not-exist/
I found out that the AWS credentials I provided were wrong. After fixing them, the query worked.
This approach imports data into a Snowflake table from a public S3 bucket:
COPY INTO SNOW_SCHEMA.table_name FROM 's3://test-public/new/solution/file.csv'

Error loading table on BigQuery dashboard, but queries work fine

I clicked a table in the BigQuery dashboard and got an error (the screenshot is not reproduced here).
However, I can get data when I do a select on this table. (That means the table does exist)
I already have the highest admin privilege so it shouldn't be a permission issue.
I created this table with a Python script that collects data, writes it to a CSV file, and uploads the CSV file to BigQuery every day. After creating the table, I changed the schema once, both in the script and in the dashboard. I'm not sure whether that's the cause, but the table loading error first appeared several days after the schema change.
If you have an ad-blocking extension installed, it might be the root cause of this issue. Try disabling it and then loading the table again.
Hope it helps.

Redshift drop/create/select query failing in Data Pipeline

I'm trying to run a daily migration script in Redshift using Data Pipeline.
The script works as expected when I run it directly using SQL Workbench/J, but fails when triggered through Data Pipeline.
I have reproduced the problem with this simple code:
drop table if exists image_stg;
create table image_stg (like image_full);
select * from image_stg;
When I run it in Data Pipeline, I get this error:
[Amazon](500310) Invalid operation: relation "image_stg" does not exist;
I also got this error once, for the exact same code, without changing anything:
[Amazon](500310) Invalid operation: Relation with OID 108425 does not exist.;
I've found this thread on the AWS forums, but it didn't help: Pipeline started failing on simple Redshift SqlActivity and temp table
What is causing this error? Is there a workaround?
I've contacted Amazon, and it looks like a problem in Data Pipeline.
They did suggest a workaround that seems to work in my case: Change the JDBC connection string from jdbc:redshift://… to jdbc:postgresql://… .
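If the connection string is built or stored in code, the prefix swap could look like the sketch below. The host, port, and database in the example URL are hypothetical, and `use_postgresql_driver_url` is just an illustrative helper, not part of Data Pipeline or any driver.

```python
def use_postgresql_driver_url(jdbc_url):
    """Swap the jdbc:redshift:// prefix for jdbc:postgresql://.

    URLs that do not start with the Redshift prefix are returned unchanged.
    """
    redshift_prefix = "jdbc:redshift://"
    if jdbc_url.startswith(redshift_prefix):
        return "jdbc:postgresql://" + jdbc_url[len(redshift_prefix):]
    return jdbc_url


# Hypothetical endpoint, for illustration only.
print(use_postgresql_driver_url("jdbc:redshift://example-host:5439/dev"))
```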
I had the same problem when creating a temporary table in Redshift via Data Pipeline, but changing the connection string from jdbc:redshift://… to jdbc:postgresql://… didn't work for me. My last resort is to create the table as a physical table and drop it after use, all through the pipeline.

Stale BigQuery table after load job

I've run into a situation where a BigQuery table has become stale. I can't even run a count query on it. This occurred right after I ran the first load job.
For each query I run I get an error:
Error: Unexpected. Please try again.
See for example Job IDs: job_OnkmhMzDeGpAQvG4VLEmCO-IzoY, job_y0tHM-Zjy1QSZ84Ek_3BxJ7Zg7U
The error is "illegal field name". It looks like the field 69860107_VID is causing it. BigQuery doesn't support column rename, so if you want to change the schema you'll need to recreate the table.
I've filed a bug to fix the internal error -- this should have been blocked when the table was created.
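To catch this kind of problem before a load job, field names can be checked up front. The sketch below assumes the naming rule that a field name must start with a letter or underscore and contain only letters, digits, and underscores; check the current BigQuery documentation before relying on it. The helper `invalid_field_names` and the sample names other than 69860107_VID are made up for illustration.

```python
import re

# Assumed rule: starts with a letter or underscore, then letters, digits,
# or underscores only.
_VALID_FIELD_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def invalid_field_names(names):
    """Return the names that do not satisfy the assumed naming rule."""
    return [name for name in names if not _VALID_FIELD_NAME.match(name)]


# 69860107_VID starts with a digit, so it is reported as invalid.
print(invalid_field_names(["user_id", "69860107_VID", "_VID_69860107"]))
```

Under that assumption, a name such as 69860107_VID fails because of the leading digit, while a prefixed variant like _VID_69860107 would pass.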