Error while viewing table after inserting data into Redshift table

I have a table in Redshift into which I am inserting data from S3.
I queried the table before inserting the data and it returned an empty result.
The command that copies the data from S3 into the table runs successfully without any error.
However, after the insert, running select * from table fails with the error below:
java.lang.NoClassDefFoundError: com/amazon/jdbc/utils/DataTypeUtilities$NumericRepresentation
What could be the possible cause of this, and how can I fix it?

I have hit this java.lang.NoClassDefFoundError when the JDBC connection properties were set incorrectly.
If you are using the PostgreSQL driver, make sure the connection URL uses the jdbc:postgresql:// scheme,
e.g. jdbc:postgresql://HostName:5439/
Let me know if this works.
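
As a rough illustration of the check described above, here is a minimal Python sketch that verifies a JDBC URL's scheme matches the configured driver class. The helper name and the driver-to-prefix mapping are assumptions for illustration only; confirm the class names against the driver JAR you actually have on the classpath.

```python
# Hypothetical helper: checks that a JDBC URL starts with the scheme
# expected by the configured driver class.
# Assumption: this mapping is illustrative, not an exhaustive or official list.
EXPECTED_PREFIX = {
    "org.postgresql.Driver": "jdbc:postgresql://",
    "com.amazon.redshift.jdbc.Driver": "jdbc:redshift://",
}


def url_matches_driver(driver_class: str, jdbc_url: str) -> bool:
    """Return True if jdbc_url uses the scheme expected for driver_class."""
    prefix = EXPECTED_PREFIX.get(driver_class)
    if prefix is None:
        raise ValueError(f"unknown driver class: {driver_class}")
    return jdbc_url.startswith(prefix)


# PostgreSQL driver with a matching URL, then with a mismatched one.
print(url_matches_driver("org.postgresql.Driver", "jdbc:postgresql://HostName:5439/"))
print(url_matches_driver("org.postgresql.Driver", "jdbc:redshift://HostName:5439/"))
```

A mismatch reported by a check like this would be the first thing to rule out before looking at the table or the COPY command.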

Related

Truncating tables in Azure Data Factory Pre-Copy script?

I am building a pipeline that copies data into Azure SQL DB, and before each copy I need to truncate the destination table, but I can't figure out the pre-copy script.
For now I put the code below in the pre-copy script, but that is wrong: it runs before every copy (5 times) and truncates all the tables except the last one. So I guess I need to make it parameterized:
truncate table [dbo].[Global_data.csv]
truncate table [dbo].[Option_data.csv]
truncate table [dbo].[State_data.csv]
truncate table [dbo].[Status_data.csv]
truncate table [dbo].[Target_data.csv]
Also, please see my source parameters:
ADLSv2 container: @pipeline().parameters.SourceContainer
ADLSv2 directory: @pipeline().parameters.SourceDirectory
ADLSv2 filename: @item().name
Sink TableName: @item().name
So I'm guessing that my pre-copy script must be something like truncate table @item().name, but this resulted in an error:
ErrorCode=SqlOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A database operation failed with the following error: 'Incorrect syntax near '@item'.',Source=,''Type=System.Data.SqlClient.SqlException,Message=Incorrect syntax near '@item'.,Source=.Net SqlClient Data Provider,SqlErrorNumber=102,Class=15,ErrorCode=-2146232060,State=1,Errors=[{Class=15,Number=102,State=1,Message=Incorrect syntax near '@item'.,},],'
When I use TRUNCATE TABLE [@{item()}], I get the error below 5 times (once for each table):
ErrorCode=SqlOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A database operation failed with the following error: 'Cannot find the object "{"name":"StateMetadata.csv","type":"File"}" because it does not exist or you do not have permissions.',Source=,''Type=System.Data.SqlClient.SqlException,Message=Cannot find the object "{"name":"StateMetadata.csv","type":"File"}" because it does not exist or you do not have permissions.,Source=.Net SqlClient Data Provider,SqlErrorNumber=4701,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=4701,State=1,Message=Cannot find the object "{"name":"StateMetadata.csv","type":"File"}" because it does not exist or you do not have permissions.,},],'
Please use TRUNCATE TABLE [@{item().name}] so that only the file name, not the whole item object, is interpolated into the script.
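
To see why interpolating the whole item fails while interpolating its name works, here is a small Python sketch that only mimics the substitution; it is not Data Factory code. The item shape follows the object shown in the second error message, the file names come from the question, and both function names are hypothetical.

```python
import json

# Hypothetical stand-in for the items a ForEach loop iterates over.
# Shape ({"name": ..., "type": "File"}) is taken from the error message above.
items = [
    {"name": name, "type": "File"}
    for name in ["Global_data.csv", "Option_data.csv", "State_data.csv"]
]


def script_from_whole_item(item: dict) -> str:
    # Interpolating the whole object yields its JSON text, which SQL then
    # treats as a (non-existent) table name.
    return "TRUNCATE TABLE [" + json.dumps(item, separators=(",", ":")) + "]"


def script_from_item_name(item: dict) -> str:
    # Interpolating only the name property yields a usable identifier.
    return f"TRUNCATE TABLE [{item['name']}]"


for item in items:
    print(script_from_whole_item(item))
    print(script_from_item_name(item))
```

Each iteration then truncates only its own table, which is the parameterized behaviour the question asks for.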

When creating a BigQuery table I'm getting an error message about hive partitioning

I'm creating a table from a CSV text file on my machine. I'm getting this message - Hive partitioned loads require that the HivePartitioningOptions be set, with both a sourceUriPrefix and mode provided. No sourceUriPrefix was provided.
How can I fix this?

Presto failed: com.facebook.presto.spi.type.VarcharType

I created a table with three columns: id, name, position.
Then I stored the data in S3 in ORC format using Spark.
When I query select * from person it returns everything.
But when I query from presto, I get this error:
Query 20180919_151814_00019_33f5d failed: com.facebook.presto.spi.type.VarcharType
I have found the answer to the problem. When I stored the data in S3, the files contained one more column that was not defined for the table in the Hive metastore.
So when Presto tried to query the data, it found a varchar where it expected an integer.
This might also happen if a record has a type different from what is defined in the metastore.
I had to delete my data and import it again without that extra, unneeded column.
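
A check like the one sketched below, run before writing the files, could catch this kind of mismatch early. It is a minimal Python sketch that assumes columns are matched by position; the column names come from the question, while the types and the function name are assumptions made for illustration.

```python
# Assumed metastore schema: names from the question, types are guesses.
METASTORE_SCHEMA = [("id", "int"), ("name", "string"), ("position", "string")]


def schema_mismatches(file_schema, table_schema=METASTORE_SCHEMA):
    """Return readable differences between file columns and table columns,
    comparing them position by position."""
    problems = []
    for i, (fcol, ftype) in enumerate(file_schema):
        if i >= len(table_schema):
            problems.append(f"extra column in file: {fcol} ({ftype})")
            continue
        tcol, ttype = table_schema[i]
        if (fcol, ftype) != (tcol, ttype):
            problems.append(
                f"position {i}: file has {fcol} ({ftype}), "
                f"table expects {tcol} ({ttype})"
            )
    # Columns the table defines but the file does not provide.
    for tcol, ttype in table_schema[len(file_schema):]:
        problems.append(f"missing column in file: {tcol} ({ttype})")
    return problems


# A file written with one additional, undeclared column ("dept" is made up).
print(schema_mismatches(METASTORE_SCHEMA + [("dept", "string")]))
```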

Redshift drop/create/select query failing in Data Pipeline

I'm trying to run a daily migration script in Redshift using Data Pipeline.
The script works as expected when I run it directly using SQL Workbench/J, but fails when triggered through Data Pipeline.
I have reproduced the problem with this simple code:
drop table if exists image_stg;
create table image_stg (like image_full);
select * from image_stg;
When I run it in Data Pipeline, I get this error:
[Amazon](500310) Invalid operation: relation "image_stg" does not exist;
I also got this error once, for the exact same code, without changing anything:
[Amazon](500310) Invalid operation: Relation with OID 108425 does not exist.;
I've found this thread on the AWS forums, but it didn't help: Pipeline started failing on simple Redshift SqlActivity and temp table
What is causing this error? Is there a workaround?
I've contacted Amazon, and it looks like a problem in Data Pipeline.
They did suggest a workaround that seems to work in my case: Change the JDBC connection string from jdbc:redshift://… to jdbc:postgresql://… .
I had the same problem when creating a temporary table in Redshift via Data Pipeline, but the workaround of changing the connection string from jdbc:redshift://… to jdbc:postgresql://… didn't work for me. My last resort is to create the table as a physical table through the pipeline and drop it after use.

Stale BigQuery table after load job

I've run into a situation where a BigQuery table has become stale. I can't even run a count query on it. This occurred right after I ran the first load job.
For each query I run I get an error:
Error: Unexpected. Please try again.
See for example Job IDs: job_OnkmhMzDeGpAQvG4VLEmCO-IzoY, job_y0tHM-Zjy1QSZ84Ek_3BxJ7Zg7U
The error is "illegal field name". It looks like the field 69860107_VID is causing it. BigQuery doesn't support column rename, so if you want to change the schema you'll need to recreate the table.
I've filed a bug to fix the internal error -- this should have been blocked when the table was created.
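
Until such a check exists on the service side, field names can be validated before the table is created. The Python sketch below assumes the classic naming rule (letters, digits and underscores only, starting with a letter or underscore), under which 69860107_VID would be rejected for its leading digit; the rule as coded and both function names are assumptions, so check the current BigQuery documentation before relying on it.

```python
import re

# Assumption: classic rule for field names, i.e. only letters, digits and
# underscores, and the first character must be a letter or an underscore.
_LEGAL_FIELD_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_legal_field_name(name: str) -> bool:
    """Return True if name satisfies the assumed naming rule."""
    return _LEGAL_FIELD_NAME.fullmatch(name) is not None


def sanitize_field_name(name: str) -> str:
    """Replace disallowed characters and prefix an underscore if needed."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    if not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned


print(is_legal_field_name("69860107_VID"))
print(sanitize_field_name("69860107_VID"))
```

Because the column cannot be renamed in place, a sanitized name like this would be applied when recreating the table.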