Note: I am looking specifically for an approach that writes to a local file.
I am using SQL Workbench and I'm connected to an AWS Redshift instance (which is based on PostgreSQL). I would like to run a query and export the result from AWS Redshift to a local CSV or text file. I have tried:
SELECT transaction_date ,
Variable 1 ,
Variable 2 ,
Variable 3 ,
Variable 4 ,
Variable 5
From xyz
into OUTFILE 'C:/filename.csv'
But I get the following error:
ERROR: syntax error at or near "'C:/filename.csv'"
Position: 148
into OUTFILE 'C:/filename.csv'
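Redshift cannot write to your local filesystem, and INTO OUTFILE is MySQL syntax, so the export has to happen on the client side. If you are using SQL Workbench/J, its WbExport command can write the result of the statement that follows it to a local file. A rough sketch (the flags are from memory and should be checked against the WbExport documentation; the column names are the placeholders from the query above):
-- WbExport applies to the next statement in the script;
-- verify the exact flag names in the WbExport docs.
WbExport -type=text
         -file='C:/filename.csv'
         -delimiter=','
         -header=true;
SELECT transaction_date,
       variable_1,   -- placeholder column names
       variable_2,
       variable_3,
       variable_4,
       variable_5
FROM xyz;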
I have 4 CSV files, each with 500,000 rows. I am trying to import the CSV data into my Exasol database, but I get an error on the date column, and I also have a problem with the unwanted first column in the files.
Here is an example CSV file:
Unnamed: 0, time, lat, lon, nobs_cloud_day
0, 2006-03-30, 24.125, -119.375, 22.0
1, 2006-03-30, 24.125, -119.125, 25.0
The table I created to import csv to is
CREATE TABLE cloud_coverage_CONUS (
index_cloud DECIMAL(10,0)
,"time" DATE -- PRIMARY KEY
,lat DECIMAL(10,6)
,lon DECIMAL(10,6)
,nobs_cloud_day DECIMAL (3,1)
)
The command to import is
IMPORT INTO cloud_coverage_CONUS FROM LOCAL CSV FILE 'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part0.csv';
But I get this error:
SQL Error [42636]: java.sql.SQLException: ETL-3050: [Column=0 Row=0] [Transformation of value='Unnamed: 0' failed - invalid character value for cast; Value: 'Unnamed: 0'] (Session: 1750854753345597339) while executing '/* add path to the 4 csv files, that are in the cloud database folder*/ IMPORT INTO cloud_coverage_CONUS FROM CSV AT 'https://27.1.0.10:59205' FILE 'e12a96a6-a98f-4c0a-963a-e5dad7319fd5' ;'; 04509 java.sql.SQLException: java.net.SocketException: Connection reset by peer: socket write error
Alternatively I use this table (without the first column):
CREATE TABLE cloud_coverage_CONUS (
"time" DATE -- PRIMARY KEY
,lat DECIMAL(10,6)
,lon DECIMAL(10,6)
,nobs_cloud_day DECIMAL (3,1)
)
And use this import code:
IMPORT INTO cloud_coverage_CONUS FROM LOCAL CSV FILE 'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part0.csv'(2 FORMAT='YYYY-MM-DD', 3 .. 5);
But I still get this error:
SQL Error [42636]: java.sql.SQLException: ETL-3052: [Column=0 Row=0] [Transformation of value='time' failed - invalid value for YYYY format token; Value: 'time' Format: 'YYYY-MM-DD'] (Session: 1750854753345597339) while executing '/* add path to the 4 csv files, that are in the cloud database folder*/ IMPORT INTO cloud_coverage_CONUS FROM CSV AT 'https://27.1.0.10:60350' FILE '22c64219-cd10-4c35-9e81-018d20146222' (2 FORMAT='YYYY-MM-DD', 3 .. 5);'; 04509 java.sql.SQLException: java.net.SocketException: Connection reset by peer: socket write error
(I actually do want to ignore the first column in the files.)
How can I solve this issue?
Solution:
IMPORT INTO cloud_coverage_CONUS FROM LOCAL CSV FILE 'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part0.csv' (2 .. 5) ROW SEPARATOR = 'CRLF' COLUMN SEPARATOR = ',' SKIP = 1;
I did not realise that MySQL syntax is different from Exasol's.
Looking at the first error message, a few things stand out. First we see this:
[Column=0 Row=0]
This tells us the problem is with the very first value in the file. This brings us to the next thing, where the message even tells us what value was read:
Transformation of value='Unnamed: 0' failed
So it's failing to convert Unnamed: 0. You also provided the table definition, where we see the first column in the table is a decimal type.
This makes sense. Unnamed: 0 is not a decimal. For this to work, the CSV data MUST align with the data types for the columns in the table.
But we can also see that this looks like a header row. Assuming everything else matches, we can fix it by telling the database to skip this first row. I'm not familiar with Exasol, but based on the documentation I believe the correct code will look like this:
IMPORT INTO cloud_coverage_CONUS
FROM LOCAL CSV FILE 'D:\uni\BI\project 1\AOL_DB_ANALYSIS_TASK1\datasets\cloud\cfc_us_part0.csv'
(2 FORMAT='YYYY-MM-DD', 3 .. 5)
ROW SEPARATOR = 'CRLF'
COLUMN SEPARATOR = ','
SKIP = 1;
I'm using Azure Synapse/SQL Pools/Data Warehouse (insert any other brand names I may have missed!) to load data from Azure blob store.
I'm doing this via an external table using polybase.
I want to capture the source file for each row of data.
I've tried to test using OPENROWSET, but this does not appear to work:
SELECT
*,
x.filename() AS [filename]
FROM
OPENROWSET(
WITH (
DATA_SOURCE = [Analytics_AzureStorage],
LOCATION = N'2022/06/21',
FILE_FORMAT = [CompressedTSV]
)
) x
Msg 103010, Level 16, State 1, Line 1
Parse error at line: 5, column: 5: Incorrect syntax near 'OPENROWSET'.
How can I load the filename to a table in the Azure Warehouse Synapse Pool?
Edit:
The OPENROWSET function is not supported in dedicated SQL pool.
which explains why it does not work. Is there a COPY/PolyBase equivalent command for getting the file name?
Your syntax is wrong; the WITH clause should come later.
Have a look at the syntax here: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-openrowset
OPENROWSET
( { BULK 'unstructured_data_path' , [DATA_SOURCE = <data source name>, ]
FORMAT= ['PARQUET' | 'DELTA'] }
)
[WITH ( {'column_name' 'column_type' }) ]
[AS] table_alias(column_alias,...n)
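Applied to the query in the question, a sketch following that clause order might look like the following. This only applies to a serverless SQL pool (per the edit, OPENROWSET is not supported in a dedicated pool), and the inline CSV options are assumptions standing in for the [CompressedTSV] external file format, which OPENROWSET does not take as a parameter:
SELECT
    x.*,
    x.filename() AS [filename]              -- filename()/filepath() are available in serverless pools
FROM OPENROWSET(
        BULK '2022/06/21/*',                -- path pattern, relative to the data source
        DATA_SOURCE = 'Analytics_AzureStorage',
        FORMAT = 'CSV',                     -- OPENROWSET takes inline CSV options, not a FILE_FORMAT object
        FIELDTERMINATOR = '\t'              -- assumed stand-in for the [CompressedTSV] format
    ) AS x;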
I want to store current_day - 1 in a variable in Hive. I know there are already previous threads on this topic, but the solutions provided there first recommend defining the variable outside Hive in a shell environment and then using that variable inside Hive.
Storing result of query in hive variable
I first got current_date - 1 using:
select date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd'),1);
Then I tried two approaches:
1. set date1 = ( select date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd'),1);
and
2. set hivevar:date1 = ( select date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd'),1);
Both approaches throw an error:
ParseException line 1:82 cannot recognize input near 'select' 'date_sub' '(' in expression specification
With approach (1), when I print the variable, the text of the select query is stored in it instead of yesterday's date. Approach (2) throws "{hivevar:dt_chk} is undefined".
I am new to Hive and would appreciate any help. Thanks.
Hive doesn't offer a straightforward way to store a query result in a variable. You have to use the shell option along with hiveconf.
date1=$(hive -e "set hive.cli.print.header=false; select date_sub(from_unixtime(unix_timestamp(),'yyyy-MM-dd'),1);")
hive -hiveconf "date1"="$date1" -f hive_script.hql
Then in your script you can reference the newly created variable date1:
select '${hiveconf:date1}'
After lots of research, this is probably the best way to set a variable from the output of a SQL query:
INSERT OVERWRITE LOCAL DIRECTORY '<home path>/config/date1'
select CONCAT('set hivevar:date1=',date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd'),1)) from <some table> limit 1;
source <home path>/config/date1/000000_0;
You will then be able to use ${date1} in your subsequent SQLs.
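For example (the table and column here are hypothetical, purely to show the substitution):
-- hypothetical table/column, just to illustrate referencing the variable
SELECT *
FROM my_events
WHERE event_date = '${date1}';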
Here we had to use <some table> limit 1 because Hive has a bug with INSERT OVERWRITE when no source table is specified.
I have an SQL table with 3 columns: qqyy (quarter), cobrand_id, and sum. I would like to write a script in Amazon Redshift (running PostgreSQL 8.0.2) that exports the above table to a CSV file transposed. By transposed I mean I would like to create a new column for each cobrand (there are 4 distinct values in the cobrand_id column) in the CSV file. The original post illustrated the desired output with an image (the values were just illustrative).
When I try:
COPY temp_08.jwn_calc TO 'P:/SQL_New/products_199.csv' DELIMITER ',' CSV HEADER;
I get the error: [42601] ERROR: syntax error at or near "HEADER" Position: 74.
When I remove "CSV HEADER", I get the error: [0A000] ERROR: COPY TO file from Xen-tables not supported
TRANSPOSING
To transpose the data, you'll have to write a query that specifically names each column, such as:
SELECT
qqyy as "Quarter",
SUM(CASE WHEN cobrand_id = 10001372 THEN sum END) as "10001372",
SUM(CASE WHEN cobrand_id = 10005244 THEN sum END) as "10005244",
SUM(CASE WHEN cobrand_id = 10005640 THEN sum END) as "10005640",
SUM(CASE WHEN cobrand_id = 10006164 THEN sum END) as "10006164"
FROM input_table
GROUP BY qqyy
ORDER BY qqyy
SAVING
The COPY command in Amazon Redshift can load data from:
Amazon S3
Amazon DynamoDB
An Amazon EMR cluster
A Linux host running SSH
If you wish to load data into Redshift, you should place a CSV (or a zipped CSV) into an Amazon S3 bucket and use the COPY command to import the data.
If you wish to export data from Redshift, use the UNLOAD command to create zipped CSV files in Amazon S3. It is not possible to download results directly to your computer via the UNLOAD command. Alternatively, the SQL client that runs locally on your computer might have the ability to save query results to a file.
The error you received is due to the fact that you attempted to access the filesystem of the Redshift host computer (P:/SQL_New/products_199.csv). This is not permitted, since you have no login access to the host computer.
If you already have an SQL query that transforms the data to what you want, then use the UNLOAD command to export it:
UNLOAD ('SELECT...FROM...') TO 's3://my-bucket/output.csv' CREDENTIALS ...
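A fuller sketch with common UNLOAD options, reusing the transposing query above (the bucket name and IAM role ARN are placeholders, and IAM_ROLE is just one of the supported authorization forms):
UNLOAD ('SELECT qqyy,
                SUM(CASE WHEN cobrand_id = 10001372 THEN sum END),
                SUM(CASE WHEN cobrand_id = 10005244 THEN sum END),
                SUM(CASE WHEN cobrand_id = 10005640 THEN sum END),
                SUM(CASE WHEN cobrand_id = 10006164 THEN sum END)
         FROM input_table
         GROUP BY qqyy
         ORDER BY qqyy')
TO 's3://my-bucket/products_199_'        -- prefix; Redshift appends part-file suffixes
IAM_ROLE 'arn:aws:iam::123456789012:role/my-unload-role'   -- placeholder ARN
DELIMITER ','
GZIP                                     -- zipped CSV parts
PARALLEL OFF;                            -- write a single file (up to 6.2 GB per file)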
If you need to run this in a script, you can use psql, format the query to print CSV, and write the result to a file. Something like:
psql -t -A -h HOST -p 5439 -U USER -d DBNAME -o "P:/SQL_New/products_199.csv" -c \
"SELECT
qqyy || ',' ||
SUM(CASE WHEN cobrand_id = 10001372 THEN sum END) || ',' ||
SUM(CASE WHEN cobrand_id = 10005244 THEN sum END) || ',' ||
SUM(CASE WHEN cobrand_id = 10005640 THEN sum END) || ',' ||
SUM(CASE WHEN cobrand_id = 10006164 THEN sum END)
FROM input_table
GROUP BY qqyy
ORDER BY qqyy"
If you are scheduling this script, you will need to configure your password in ~/.pgpass (one entry per line, in the format hostname:port:database:username:password).
I have created a Hive table stored in the Avro file format. I am trying to load the same Hive table using the Pig commands below:
pig -useHCatalog;
hive_avro = LOAD 'hive_avro_table' using org.apache.hive.hcatalog.pig.HCatLoader();
I get a "failed to read from hive_avro_table" error when I try to display hive_avro using the DUMP command.
Please help me resolve this issue. Thanks in advance.
create table hivecomplex
(name string,
phones array<INT>,
deductions map<string,float>,
address struct<street:string,zip:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '$'
MAP KEYS TERMINATED BY '#'
STORED AS AVRO
;
hive> select * from hivecomplex;
OK
John [650,999,9999] {"pf":500.0} {"street":"pleasantville","zip":88888}
Time taken: 0.078 seconds, Fetched: 1 row(s)
Now for the Pig side:
pig -useHCatalog;
a = LOAD 'hivecomplex' USING org.apache.hive.hcatalog.pig.HCatLoader();
dump a;
ne.util.MapRedUtil - Total input paths to process : 1
(John,{(650),(999),(9999)},[pf#500.0],(pleasantville,88888))