Error while importing data into Redshift - SQL

I wanted to unload from one database (production) and reload into another database (QA) in Redshift, both having exactly the same schema.
I issued the S3 load command as follows:
copy table(col1,col2,col3,col4) from 's3://<bucket_path>/<file_name>.gzip' CREDENTIALS 'aws_access_key_id=<your_key>;aws_secret_access_key=<your_secret>' delimiter '|' gzip NULL AS 'null_string';
I got the following error:
ERROR: Failed writing body (0 != XXX) Cause: Failed to inflate invalid or incomplete deflate data. zlib error code: -3
error: Failed writing body (0 != XXX) Cause: Failed to inflate invalid or incomplete deflate data. zlib error code: -3
code: 9001
context: S3 key being read : s3://<some_s3_bucket>/<some_s3_bucket_file>
query: XXXXX
location: table_s3_scanner.cpp:355
process: query1_23 [pid=2008]
-----------------------------------------------

This happens when COPY is told to expect a gzip file (the gzip option) but the object in S3 cannot actually be read as gzip.
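One way to rule this out (a sketch only, reusing the placeholder table, bucket, and credential values from the question) is to regenerate the files with UNLOAD's GZIP option and then load them back with a matching COPY. UNLOAD writes one or more numbered objects under the prefix you give it, and COPY can read them all by pointing at that same prefix:
-- Unload from the production cluster, compressing with GZIP so the objects are valid gzip files
UNLOAD ('SELECT col1, col2, col3, col4 FROM table')
TO 's3://<bucket_path>/<file_name>'
CREDENTIALS 'aws_access_key_id=<your_key>;aws_secret_access_key=<your_secret>'
DELIMITER '|' GZIP NULL AS 'null_string';

-- Load into the QA cluster with the same delimiter, GZIP, and NULL settings
COPY table (col1, col2, col3, col4)
FROM 's3://<bucket_path>/<file_name>'
CREDENTIALS 'aws_access_key_id=<your_key>;aws_secret_access_key=<your_secret>'
DELIMITER '|' GZIP NULL AS 'null_string';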
If I’ve made a bad assumption please comment and I’ll refocus my answer.

How to Validate a Non-JSON response body using Karate (2)

This is a continuation of How to validate Non-JSON response body using Karate.
Details: When the API POST call is made and the employee already exists in the DB, an error response is returned in the response body as follows, which is not valid JSON/string format:
{"error":{"text":SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'NewEmp' for key 'employee_name_unique'}}
My aim is to validate that this error response is thrown as expected.
I tried the solution provided in How to validate Non-JSON response body using Karate, but it did not work as expected. Below are the details:
I do not understand how to use the * provided in the solution to my previous question. Could you please explain how to use the *?
Karate Feature:
Scenario: Testing non-string response
Given url 'dummy.restapiexample.com/api/v1/create'
And request {"name":"PutTest8","salary":"123","age":"23"}
When method POST
Then status 200
* string temp = response
And match temp contains 'error'
The above throws the following error:
line 20:4 mismatched input '*' expecting <EOF>
17:43:46.230 [main] ERROR com.intuit.karate.core.FeatureParser - syntax error: mismatched input '*' expecting <EOF>
17:43:46.235 [main] ERROR com.intuit.karate.core.FeatureParser - not a valid feature file: src/test/java/learnKarate/postcall.feature - mismatched input '*' expecting <EOF>
NOTE: I also tried to 'assert' the response, which also failed, with the error below.
Then assert $ contains 'error'
Error:
com.intuit.karate.exception.KarateException: postcall.feature:29 - javascript evaluation failed: $ contains 'error', <eval>:1:2 Expected ; but found contains
$ contains 'error'
^ in <eval> at line number 1 at column number 2
at ✽.Then assert $ contains 'error' (postcall.feature:29)
There is something seriously wrong with your example or your environment. The * is just a replacement for Given, When, etc. For example, paste this into a new Scenario; this works for me:
* def response = 'error'
* string temp = response
And match temp contains 'error'
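Folding that into the scenario from the question would look something like the sketch below (the endpoint and payload are taken from the question; the http:// scheme and the 200 status are assumptions):
Scenario: Testing non-string response
Given url 'http://dummy.restapiexample.com/api/v1/create'
And request {"name":"PutTest8","salary":"123","age":"23"}
When method POST
Then status 200
# convert the raw response to a string, then do a plain substring match
* string temp = response
* match temp contains 'error'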
Since you seem to be stuck, it is time for you to follow this process: https://github.com/intuit/karate/wiki/How-to-Submit-an-Issue
All the best!

How to construct S3 URL for copying to Redshift?

I am trying to import a CSV file into a Redshift cluster. I have successfully completed the example in the Redshift documentation. Now I am trying to COPY from my own CSV file.
This is my command:
copy frontend_chemical from 's3://awssampledb/mybucket/myfile.CSV'
credentials 'aws_access_key_id=xxxxx;aws_secret_access_key=xxxxx'
delimiter ',';
This is the error I see:
An error occurred when executing the SQL command:
copy frontend_chemical from 's3://awssampledb/mybucket/myfile.CSV'
credentials 'aws_access_key_id=XXXX...'
[Amazon](500310) Invalid operation: The specified S3 prefix 'mybucket/myfile.CSV' does not exist
Details:
-----------------------------------------------
error: The specified S3 prefix 'mybucket/myfile.CSV' does not exist
code: 8001
context:
query: 3573
location: s3_utility.cpp:539
process: padbmaster [pid=2432]
-----------------------------------------------;
Execution time: 0.7s
1 statement failed.
I think I'm constructing the S3 URL wrong, but how should I do it?
My Redshift cluster is in the US East (N Virginia) region.
The Amazon Redshift COPY command can be used to load multiple files in parallel.
For example:
Bucket = mybucket
The files are in the bucket under the path data
Then refer to the contents as:
s3://mybucket/data
The COPY command would then be:
COPY frontend_chemical
FROM 's3://mybucket/data'
CREDENTIALS 'aws_access_key_id=xxxxx;aws_secret_access_key=xxxxx'
DELIMITER ',';
This will load all files under the data path. You can also refer to a specific file by including it in the path, e.g. s3://mybucket/data/file.csv.
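Applied to the command in the question, the first path component after s3:// has to be a bucket you own, not the awssampledb bucket from the documentation example. A hedged guess at the corrected command, assuming the file sits at the top level of a bucket named mybucket:
COPY frontend_chemical
FROM 's3://mybucket/myfile.CSV'
CREDENTIALS 'aws_access_key_id=xxxxx;aws_secret_access_key=xxxxx'
DELIMITER ',';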

Google BigQuery - bq load failure displays a file number; how do I get the file name?

I'm running the following bq command:
bq load --source_format=CSV --skip_leading_rows=1 --max_bad_records=1000 --replace raw_data.order_20150131 gs://raw-data/order/order/2050131/* order.json
and I am getting the following message when loading data into bq:
*************************************
Waiting on bqjob_r4ca10491_0000014ce70963aa_1 ... (412s) Current status: DONE
BigQuery error in load operation: Error processing job
'orders:bqjob_r4ca10491_0000014ce70963aa_1': Too few columns: expected
11 column(s) but got 1 column(s). For additional help: http://goo.gl/RWuPQ
Failure details:
- File: 844 / Line:1: Too few columns: expected 11 column(s) but got
1 column(s). For additional help: http://goo.gl/RWuPQ
**********************************
The message displays only the file number.
I checked the files' contents and most of them are good.
gsutil ls and the cloud console, on the other hand, display file names.
How can I tell which file it is from the file number?
There seems to be some weird spacing introduced in the question, but if the desired path to ingest is "*/order.json", that won't work: you can only use "*" at the end of the path when ingesting data into BigQuery.
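In other words, the wildcard has to be the last element of the URI. A sketch of what the corrected invocation might look like, assuming order.json is the schema file and the CSV shards sit directly under that prefix:
# wildcard at the end of the GCS path; quoted so the local shell does not expand it
bq load --source_format=CSV --skip_leading_rows=1 --max_bad_records=1000 --replace \
  raw_data.order_20150131 "gs://raw-data/order/order/2050131/*" order.json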

FAILED: Error in metadata:

When I try to show tables from Hive databases, the following error is displayed.
I granted permissions on the warehouse and tables, yet the error still appears:
hive> show tables;
FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Thanks in advance.
This error occurs when the Hive CLI is terminated improperly.
Solution:
Exit from Hive and run the 'jps' command. A process named RunJar will be listed; kill it using 'kill -9 <pid>'.
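A minimal sketch of those steps at the shell (the pid will differ on your machine):
jps                # lists local JVM processes; look for the one named RunJar
kill -9 <pid>      # replace <pid> with the process id printed next to RunJar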
That's it, you are done.

Unexpected error while loading data

I am getting an "Unexpected" error. I tried a few times, and I still could not load the data. Is there any other way to load data?
gs://log_data/r_mini_raw_20120510.txt.gz to 567402616005:myv.may10c
Errors:
Unexpected. Please try again.
Job ID: job_4bde60f1c13743ddabd3be2de9d6b511
Start Time: 1:48pm, 12 May 2012
End Time: 1:51pm, 12 May 2012
Destination Table: 567402616005:myvserv.may10c
Source URI: gs://log_data/r_mini_raw_20120510.txt.gz
Delimiter: ^
Max Bad Records: 30000
Schema:
zoneid: STRING
creativeid: STRING
ip: STRING
Update:
I am using the file that can be found here:
http://saraswaticlasses.net/bad.csv.zip
bq load -F '^' --max_bad_records=30000 mycompany.abc bad.csv id:STRING,ceid:STRING,ip:STRING,cb:STRING,country:STRING,telco_name:STRING,date_time:STRING,secondary:STRING,mn:STRING,sf:STRING,uuid:STRING,ua:STRING,brand:STRING,model:STRING,os:STRING,osversion:STRING,sh:STRING,sw:STRING,proxy:STRING,ah:STRING,callback:STRING
I am getting an error "BigQuery error in load operation: Unexpected. Please try again."
The same file works from Ubuntu, while it does not work from CentOS 5.4 (Final).
Does the OS encoding need to be checked?
The file you uploaded has an unterminated quote. Can you delete that line and try again? I've filed an internal BigQuery bug to be able to handle this case more gracefully.
$grep '"' bad.csv
3000^0^1.202.218.8^2f1f1491^CN^others^2012-05-02 20:35:00^^^^^"Mozilla/5.0^generic web browser^^^^^^^^
When I run a load from my workstation (Ubuntu), I get a warning about the line in question. Note that if you were using a larger file, you would not see this warning, instead you'd just get a failure.
$bq show --format=prettyjson -j job_e1d8636e225a4d5f81becf84019e7484
...
"status": {
"errors": [
{
"location": "Line:29057 / Field:12",
"message": "Missing close double quote (\") character: field starts with: <Mozilla/>",
"reason": "invalid"
}
]
My suspicion is that you have rows or fields in your input data that exceed the 64 KB limit. Perhaps re-check the formatting of your data, check that it is gzipped properly, and if all else fails, try importing uncompressed data. (One possibility is that the entire compressed file is being interpreted as a single row/field that exceeds the aforementioned limit.)
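If you want to check the gzip angle first, something like this at the shell (run against a local copy of the file named in the source URI) will tell you whether the archive decompresses cleanly:
gzip -t r_mini_raw_20120510.txt.gz               # exits non-zero if the gzip stream is corrupt
gunzip -c r_mini_raw_20120510.txt.gz | head -3   # eyeball the first few decompressed rows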
To answer your original question, there are a few other ways to import data: you could upload directly from your local machine using the command-line tool or the web UI, or you could use the raw API. However, all of these mechanisms (including the Google Storage import that you used) funnel through the same CSV parser, so it's possible that they'll all fail in the same way.