Redshift drop/create/select query failing in Data Pipeline

I'm trying to run a daily migration script in Redshift using Data Pipeline.
The script works as expected when I run it directly using SQL Workbench/J, but fails when triggered through Data Pipeline.
I have reproduced the problem with this simple code:
drop table if exists image_stg;
create table image_stg (like image_full);
select * from image_stg;
When I run it in Data Pipeline, I get this error:
[Amazon](500310) Invalid operation: relation "image_stg" does not exist;
I also got this error once, for the exact same code, without changing anything:
[Amazon](500310) Invalid operation: Relation with OID 108425 does not exist.;
I've found this thread on the AWS forums, but it didn't help: Pipeline started failing on simple Redshift SqlActivity and temp table
What is causing this error? Is there a workaround?

I've contacted Amazon, and it looks like a problem in Data Pipeline.
They suggested a workaround that seems to work in my case: change the JDBC connection string from jdbc:redshift://… to jdbc:postgresql://….
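For anyone who wants to check the same idea outside Data Pipeline, here is a minimal sketch of the three statements over the Postgres protocol, assuming the psycopg2 driver and placeholder endpoint and credentials (Redshift listens on the Postgres wire protocol, which is why a jdbc:postgresql:// URL works at all):
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439,      # Redshift's default port
    dbname="mydb",  # placeholder database
    user="master",
    password="...",
)
conn.autocommit = True
cur = conn.cursor()
# All three statements run on the same connection, so the new table is
# visible to the SELECT that follows.
cur.execute("drop table if exists image_stg;")
cur.execute("create table image_stg (like image_full);")
cur.execute("select * from image_stg;")
print(cur.fetchall())  # empty list: the table exists but has no rows yet
conn.close()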

I had the same problem when creating a temporary table in Redshift via Data Pipeline, but the workaround of changing the connection string from jdbc:redshift://… to jdbc:postgresql://… didn't work for me. My last resort is to create the table as a physical table and drop it after use, all through Data Pipeline, as sketched below.
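A compact sketch of that last resort, assuming a psycopg2 connection with placeholder credentials and a hypothetical per-run table name to avoid clashes between overlapping runs:
import uuid
import psycopg2

conn = psycopg2.connect(host="cluster-endpoint", port=5439, dbname="mydb",
                        user="master", password="...")  # placeholders
conn.autocommit = True
cur = conn.cursor()

stg = "image_stg_" + uuid.uuid4().hex  # hypothetical per-run name
cur.execute("create table %s (like image_full);" % stg)
try:
    cur.execute("select * from %s;" % stg)  # the real migration work goes here
finally:
    cur.execute("drop table if exists %s;" % stg)  # drop the physical table after use
    conn.close()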

Related

HIVE_METASTORE_ERROR persists after removing problematic column from schema

I am trying to query my CloudTrail logging bucket using Athena. I already deployed a crawler into the bucket and managed to populate a few tables. When I try running a simple "preview table" query, I get the following error:
HIVE_METASTORE_ERROR:
com.amazonaws.services.datacatalog.model.InvalidInputException: Error: : expected at the position 121 of 'struct<roleArn:string,roleSessionName:string,durationSeconds:int,keySpec:string,keyId:string,encryptionContext:struct<aws\:cloudtrail\:arn:string,aws\:s3\......
I narrowed down the column name in question and removed it completely from my schema.
After removing it from the schema in AWS Glue and rerunning the preview table query, I still get the same error at the same position. I tried again in a different browser but get the same error. How can this be? Am I missing something?
Any advice would be appreciated.
Thanks in advance!

Apache Spark SQL table overwrite issue

I am using the code below to create a table from a dataframe in Databricks and run into an error.
df.write.saveAsTable("newtable")
This works fine the very first time, but for re-usability, if I rewrite it like below,
df.write.mode(SaveMode.Overwrite).saveAsTable("newtable")
I get the following error.
Error Message:
org.apache.spark.sql.AnalysisException: Can not create the managed table newtable. The associated location dbfs:/user/hive/warehouse/newtable already exists
The SQL config 'spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation' was removed in the version 3.0.0. It was removed to prevent loosing of users data for non-default value.
Related: What are the differences between saveAsTable and insertInto in different SaveMode(s)?
Run the following command to fix the issue:
dbutils.fs.rm("dbfs:/user/hive/warehouse/newtable/", true)
Or set the flag spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation to true:
spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation","true")
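Putting both fixes together, a hedged PySpark sketch; spark and dbutils are predefined in a Databricks notebook, the warehouse path is the default managed-table location for newtable, and (per the quoted error) the legacy flag was removed in 3.0.0, so it only helps on Spark versions before 3.0:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a")], ["id", "val"])  # stand-in dataframe

df.write.saveAsTable("newtable")  # first run succeeds

# Fix 1: remove the leftover files, then overwrite.
# dbutils is only predefined on Databricks.
dbutils.fs.rm("dbfs:/user/hive/warehouse/newtable/", True)
df.write.mode("overwrite").saveAsTable("newtable")

# Fix 2 (Spark < 3.0 only; the config was removed in 3.0.0):
spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", "true")
df.write.mode("overwrite").saveAsTable("newtable")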

BigQuery returns an Unknown error after creating a table with the '_ads' suffix

I tried both the API and the GUI to create this empty table, and both failed.
I have created many tables via the API just fine; only the name organizes_ads has a problem.
The same create process and schema can create organizes_ads_0 but not organizes_ads.
If I try to get this table via the API, it returns:
{"error":{"code":-1,"message":"A network error occurred, and the request could not be completed."}}
I intend to use this name because it's a table name replicated from another source, so it would be awkward to hard-code a different name as a workaround.
[UPDATE] I also found that any table name with the suffix _ads is broken (so there is nothing wrong with the schema).
This error can be caused by an ad blocker.
I created a table with the _ads suffix, and with the ad blocker enabled I got the same error: Unknown error response from the server.
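The blocker filters requests made by the browser, not the service itself, so creating the table from outside the browser is one way to confirm. A minimal sketch, assuming the google-cloud-bigquery Python client, application-default credentials, and a placeholder project, dataset, and schema:
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials
schema = [bigquery.SchemaField("id", "STRING")]  # placeholder schema
table = bigquery.Table("my-project.my_dataset.organizes_ads", schema=schema)
client.create_table(table)  # no browser request, so no ad-blocker filter applies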

Error while viewing table after inserting data into Redshift table

I have a table in Redshift into which I am inserting data from S3.
I viewed the table before inserting the data, and it returned a blank table.
However, after inserting data into the Redshift table, I get the error below when running select * from the table.
The command that copies data into the table from S3 runs successfully without any error.
java.lang.NoClassDefFoundError: com/amazon/jdbc/utils/DataTypeUtilities$NumericRepresentation error in redshift
What could be the possible cause, and is there a solution?
I have faced this java.lang.NoClassDefFoundError when the JDBC connection properties are set incorrectly.
If you are using the Postgres driver, make sure the URL uses the postgresql:// prefix, e.g. jdbc:postgresql://HostName:5439/.
Let me know if this works.
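To illustrate keeping the driver and the URL prefix in agreement from a script, here is a hedged sketch using the jaydebeapi package; the host, credentials, and jar paths are placeholders, and the Redshift class name assumes the 4.2 JDBC driver:
import jaydebeapi

# Postgres driver pairs with the jdbc:postgresql:// prefix.
conn = jaydebeapi.connect(
    "org.postgresql.Driver",
    "jdbc:postgresql://HostName:5439/mydb",
    ["user", "password"],
    "/path/to/postgresql.jar",
)
conn.close()

# The Redshift driver pairs with jdbc:redshift:// instead; mixing one
# driver's jar with the other's URL prefix is the kind of mismatch that
# surfaces as a NoClassDefFoundError.
# conn = jaydebeapi.connect(
#     "com.amazon.redshift.jdbc42.Driver",
#     "jdbc:redshift://HostName:5439/mydb",
#     ["user", "password"],
#     "/path/to/redshift-jdbc42.jar",
# )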

Stale BigQuery table after load job

I've run into a situation where a BigQuery table has become stale. I can't even run a count query on it. This occurred right after I ran the first load job.
For each query I run I get an error:
Error: Unexpected. Please try again.
See for example Job IDs: job_OnkmhMzDeGpAQvG4VLEmCO-IzoY, job_y0tHM-Zjy1QSZ84Ek_3BxJ7Zg7U
The error is "illegal field name". It looks like the field 69860107_VID is causing it. BigQuery doesn't support column rename, so if you want to change the schema you'll need to recreate the table.
I've filed a bug to fix the internal error -- this should have been blocked when the table was created.
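Since BigQuery field names must start with a letter or underscore, 69860107_VID is rejected, and the recreate step could look like the following sketch, assuming the google-cloud-bigquery Python client; the project, dataset, and table ids are placeholders, and vid_69860107 is only an illustrative legal rename:
from google.cloud import bigquery

client = bigquery.Client()
# Drop the stale table and recreate it with a legal field name.
client.delete_table("my-project.my_dataset.stale_table", not_found_ok=True)
schema = [bigquery.SchemaField("vid_69860107", "STRING")]  # illustrative rename
client.create_table(bigquery.Table("my-project.my_dataset.stale_table", schema=schema))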