Transposing and then exporting a table to a CSV file - sql

I have an SQL table with 3 columns (qqyy, cobrand_id and sum).
I would like to write a script in Amazon Redshift (running PostgreSQL 8.0.2) that exports this table to a CSV file, transposed. By transposed I mean that each cobrand (there are 4 distinct values in the cobrand_id column) becomes its own column in the CSV file.
When I try:
COPY temp_08.jwn_calc TO 'P:/SQL_New/products_199.csv' DELIMITER ',' CSV HEADER;
I get the error: [42601] ERROR: syntax error at or near "HEADER" Position: 74.
When I remove "CSV HEADER", I get the error: [0A000] ERROR: COPY TO file from Xen-tables not supported

TRANSPOSING
To transpose the data, you'll have to write a query that specifically names each column, such as:
SELECT
qqyy as "Quarter",
SUM(CASE WHEN cobrand_id = 10001372 THEN sum END) as "10001372",
SUM(CASE WHEN cobrand_id = 10005244 THEN sum END) as "10005244",
SUM(CASE WHEN cobrand_id = 10005640 THEN sum END) as "10005640",
SUM(CASE WHEN cobrand_id = 10006164 THEN sum END) as "10006164"
FROM input_table
GROUP BY qqyy
ORDER BY qqyy
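To see what the conditional aggregation does row by row, here is a minimal Python sketch of the same pivot. It assumes the rows arrive as (qqyy, cobrand_id, sum) tuples. The function name transpose and any sample quarters are hypothetical; only the four cobrand IDs come from the query above.

```python
from collections import defaultdict

# The four cobrand IDs used in the query above; each one becomes a CSV column.
COBRANDS = [10001372, 10005244, 10005640, 10006164]
HEADER = ["Quarter"] + [str(c) for c in COBRANDS]


def transpose(rows):
    """Pivot (qqyy, cobrand_id, sum) tuples into one row per quarter.

    Mirrors SUM(CASE WHEN cobrand_id = X THEN sum END) ... GROUP BY qqyy.
    """
    totals = defaultdict(dict)
    for qqyy, cobrand_id, value in rows:
        quarter = totals[qqyy]  # every quarter gets a row, as with GROUP BY qqyy
        if cobrand_id in COBRANDS:
            quarter[cobrand_id] = quarter.get(cobrand_id, 0) + value
    # A cobrand with no rows in a quarter stays None, like SQL's NULL.
    return [[q] + [totals[q].get(c) for c in COBRANDS] for q in sorted(totals)]
```

A cobrand with no rows in a given quarter comes out as None, just as SUM over no matching rows yields NULL in SQL. That is also why each additional cobrand needs its own CASE column: the set of output columns is fixed by the query text.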
SAVING
The COPY command in Amazon Redshift can load data from:
* Amazon S3
* Amazon DynamoDB
* An Amazon EMR cluster
* A Linux host running SSH
If you wish to load data into Redshift, you should place a CSV file (or a compressed CSV file, e.g. gzipped) into an Amazon S3 bucket and use the COPY command to import the data.
If you wish to export data from Redshift, use the UNLOAD command to create (optionally compressed) CSV files in Amazon S3. It is not possible to download results from Redshift directly via the UNLOAD command. Alternatively, the SQL client that runs locally on your computer might be able to save query results to a file.
The error you received occurs because you attempted to write to the filesystem of the Redshift host computer (P:/SQL_New/products_199.csv). This is not permitted, since you have no login access to the host computer.
If you already have an SQL query that transforms the data into the shape you want, then use the UNLOAD command to export it. Note that the S3 path is used as a file-name prefix, and UNLOAD may write several files:
UNLOAD ('SELECT...FROM...') TO 's3://my-bucket/output.csv' CREDENTIALS ...

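If you need to run this in a script, you can use psql, format the query to print CSV, and write the result to a file. Here -t suppresses the column header and row-count footer, -A turns off column alignment (otherwise each line is padded with spaces), and COALESCE keeps a quarter that is missing one cobrand from turning into an empty line. Something like:
psql -t -A -h HOST -p 5439 -U USER -d DBNAME -o "P:/SQL_New/products_199.csv" -c \
"SELECT
qqyy || ',' ||
COALESCE(SUM(CASE WHEN cobrand_id = 10001372 THEN sum END)::varchar, '') || ',' ||
COALESCE(SUM(CASE WHEN cobrand_id = 10005244 THEN sum END)::varchar, '') || ',' ||
COALESCE(SUM(CASE WHEN cobrand_id = 10005640 THEN sum END)::varchar, '') || ',' ||
COALESCE(SUM(CASE WHEN cobrand_id = 10006164 THEN sum END)::varchar, '')
FROM input_table
GROUP BY qqyy
ORDER BY qqyy"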
If you are scheduling this script, you will need to store your password in ~/.pgpass so that psql does not prompt for it.
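Gluing values together with || ',' || produces malformed CSV as soon as a value contains a comma. If Python is available on the machine that runs the script, one alternative is to let the csv module handle quoting. This is a sketch, not the answer's method: export_query_to_csv is a hypothetical helper that works with any DB-API connection (for example a psycopg2 connection to Redshift, or Python's built-in sqlite3 for a quick local try).

```python
import csv


def export_query_to_csv(conn, query, path):
    """Run `query` on a DB-API connection and write the result to `path` as CSV."""
    cur = conn.cursor()
    try:
        cur.execute(query)
        # DB-API: the first item of each description entry is the column name.
        header = [col[0] for col in cur.description]
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)  # quotes fields containing commas or quotes
            writer.writerow(header)
            writer.writerows(cur)  # None (SQL NULL) is written as an empty field
    finally:
        cur.close()
```

Because cursor.description carries the column aliases, running the transposing query through this helper puts "Quarter" and the four cobrand IDs in the first CSV row, followed by one row per quarter.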

Related

How to omit query name returned from sql to CSV

I'm doing CSV exports from a database table with Postgres.
The script I call from Linux consists of 4 main SQL statements, but I only want to save the output of the 3rd step to the CSV. Below is how the export process is executed:
Executed via a Linux bash script:
$ sudo -u postgres -H sh -c "psql -d openstreetmap -f extractTagsFromEditorToCsv.sql" > $CSV_OUTPUT_FILE
Contents of the extractTagsFromEditorToCsv.sql file:
ALTER TABLE temp_tags
ADD COLUMN IF NOT EXISTS ah_edited boolean default TRUE;
INSERT INTO temp_tags
SELECT DISTINCT ON (way_id, k)
way_id, k, v, version, ah_edited
FROM way_tags
ORDER BY way_id, k, version desc;
COPY
(SELECT temp_tags.way_id, temp_tags.k, temp_tags.v, temp_tags.version, TRUE
FROM temp_tags
JOIN ways ON temp_tags.way_id = ways.way_id AND temp_tags.version = ways.version
JOIN changesets on ways.changeset_id=changesets.id
JOIN users ON changesets.user_id=users.id
WHERE (k like '%maxspeed:backward%'
OR k like '%maxspeed:forward%'
OR k like '%maxspeed%')
AND ((users.email like '%ah.com%' AND ways.changeset_id != 0)
OR temp_tags.ah_edited = TRUE)) TO STDOUT (format csv, delimiter ';', header false);
DROP TABLE IF EXISTS temp_tags;
As a result, I get a file that looks like this, for example:
CREATE TABLE
ALTER TABLE
INSERT 0 13426
8135845;maxspeed;501;10;t
DROP TABLE
A CSV reader cannot parse that file correctly.
My question is: is it possible to omit the SQL command names printed in the output? I could split the script into three separate scripts and export the CSV only from the middle one, but maybe there is a way to do it in a single SQL script file.
Expected output result would be like:
8135845;maxspeed;501;10;t
Thank you in advance.
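psql's -q (--quiet) option should suppress these command status lines (CREATE TABLE, INSERT 0 13426, ...), so adding it to the psql call is the first thing to try. If the file has already been produced, a hedged Python workaround is to keep only rows with the expected number of fields. strip_status_lines is a hypothetical helper; it assumes every data row has exactly five ;-separated fields, as in the sample, and that no status line contains a semicolon.

```python
import csv
import io


def strip_status_lines(text, n_fields=5, delimiter=";"):
    """Keep only rows of `text` that have exactly `n_fields` fields.

    Status lines such as 'INSERT 0 13426' contain no delimiter, so they
    parse as one-field rows and are dropped; blank lines are dropped too.
    """
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerows(row for row in rows if len(row) == n_fields)
    return out.getvalue()
```

Parsing with csv.reader rather than counting semicolons keeps quoted values that themselves contain a semicolon intact.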

export data from db2 from all tables in N schemas into CSV with column names

I'm trying to export a bunch of DB2 tables to CSV, with column names. I don't see any straightforward way to do this. I followed this to get the data I want, but I have to execute that over hundreds of tables. Is there a way to dynamically get all the columns and tables, given N schema names?
I also tried this, which exports all tables in a schema to CSV, but it doesn't give me column names. If someone could show me how to change this script to include column names in the CSVs, my work is done.
The server is running Red Hat Linux Server.
Using files
The following db2 command generates the export script:
export to exp.sql of del modified by nochardel
select
x'0a'||'export to file_header of del modified by nochardel VALUES '''||columns||''''
||x'0a'||'export to file_data of del messages messages.msg select '||columns||' from '||tabname_full
||x'0a'||'! cat file_header file_data > '||tabname_full||'.csv'
from
(
select rtrim(c.tabschema)||'.'||c.tabname as tabname_full, listagg(c.colname, ', ') as columns
from syscat.tables t
join syscat.columns c on c.tabschema=t.tabschema and c.tabname=t.tabname
where t.tabschema='SYSIBM' and t.type='T'
group by c.tabschema, c.tabname
--fetch first 10 row only
)
;
It's better to place the command above in a file such as gen_exp.sql and run it to produce the export script:
db2 -tf gen_exp.sql
The export script exp.sql consists of 3 commands for each table:
* db2 export command to get a comma separated list of columns
* db2 export command to get table data
* concatenation command to collect both outputs above to a single file
You run this script as follows:
db2 -vf exp.sql -z exp.sql.log
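The string-building done by the generator query can be restated in Python, which is handy for checking what exp.sql will contain before running it against hundreds of tables. This is a sketch: gen_export_script and its input tuples (shaped like SYSCAT.COLUMNS rows) are hypothetical, and it mirrors the listagg and concatenation above rather than calling DB2.

```python
def gen_export_script(column_rows):
    """Build the three CLP commands per table that the generator query emits.

    column_rows: (tabschema, tabname, colname) tuples, e.g. as fetched
    from SYSCAT.COLUMNS.
    """
    tables = {}  # dicts keep insertion order, so columns stay in input order
    for schema, table, column in column_rows:
        full_name = schema.rstrip() + "." + table  # rtrim(c.tabschema)||'.'||c.tabname
        tables.setdefault(full_name, []).append(column)
    commands = []
    for full_name, cols in tables.items():
        columns = ", ".join(cols)  # listagg(c.colname, ', ')
        commands += [
            f"export to file_header of del modified by nochardel VALUES '{columns}'",
            f"export to file_data of del messages messages.msg select {columns} from {full_name}",
            f"! cat file_header file_data > {full_name}.csv",
        ]
    return "\n".join(commands)
```

Because the header and the data export are built from the same columns string, the column order in the header row always matches the data, whatever order listagg returns.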
Using a pipe
gen_exp_sh.sql:
export to exp.sh of del modified by nochardel
select
x'0a'||'echo "'||columns||'" > '||filename
||x'0a'||'db2 "export to pipe_data of del messages messages.msg select '||columns||' from '||tabname_full||'" >/dev/null 2>&1 </dev/null &'
||x'0a'||'cat pipe_data >> '||filename
from
(
select
rtrim(c.tabschema)||'.'||c.tabname as tabname_full
, rtrim(c.tabschema)||'.'||c.tabname||'.csv' as filename
, listagg(c.colname, ', ') as columns
from syscat.tables t
join syscat.columns c on c.tabschema=t.tabschema and c.tabname=t.tabname
where t.tabschema='SYSIBM' and t.type='T'
group by c.tabschema, c.tabname
--fetch first 10 row only
)
;
Run it as follows:
db2 -tf gen_exp_sh.sql
The export shell script exp.sh consists of 3 commands for each table:
* echo command to write a comma separated list of columns to a file
* db2 export command to get table data to a pipe (started in a background)
* simple cat command to read from the pipe and add data to the same file with the columns list
Usage:
You must create the pipe first and then source the export script with the "dot space" notation. This is important: the db2 commands in exp.sh need the database connection opened in your current shell.
mkfifo pipe_data
db2 connect to mydb ...
. ./exp.sh
rm -f pipe_data
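The pipe trick in exp.sh can be hard to picture. The Python sketch below reproduces the same sequence for one table (Unix only, since it uses os.mkfifo): write the header, start a background writer on the FIFO, and append whatever the reader receives. export_via_pipe is hypothetical, and the writer thread merely stands in for the background db2 export.

```python
import os
import tempfile
import threading


def export_via_pipe(header, data_lines, out_path):
    """Mimic exp.sh for one table: header file, background FIFO writer, appending reader."""
    with tempfile.TemporaryDirectory() as tmp:
        fifo = os.path.join(tmp, "pipe_data")
        os.mkfifo(fifo)  # like `mkfifo pipe_data`
        with open(out_path, "w") as out:  # like `echo "columns" > file`
            out.write(header + "\n")

        def writer():
            # Stand-in for `db2 "export to pipe_data ..." &`.
            # Opening a FIFO for writing blocks until a reader opens it.
            with open(fifo, "w") as p:
                for line in data_lines:
                    p.write(line + "\n")

        t = threading.Thread(target=writer)  # like the trailing `&`
        t.start()
        # Like `cat pipe_data >> file`: read until the writer closes the FIFO.
        with open(fifo) as p, open(out_path, "a") as out:
            for line in p:
                out.write(line)
        t.join()
```

The FIFO never holds the full table on disk: the data goes straight from the writer into the CSV file that already contains the header line.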
Try this great tool: https://www.sql-workbench.eu/. It's universal, and you can transfer data between any types of database engines.

Reading SQL query arguments from external csv file

I have following SQL query:
select * from P_SPC_LOT L,P_SPC_CHART_POINT CP
where L.spcs_id = CP.spcs_id and
(L.route like 'FL%' or L.route like 'RE%' or L.route like 'FE%') and
L.operation in ('123','234','456') and
L.data_collection_time > current_date -7 and rownum <100000
I am interested in reading the L.operation values from an external .csv file with an "Operation" column. How can that be done? Also, will reading from an external file slow down the query? I don't have DBA write privileges, so I'm looking for a way to use the CSV as a temporary table that is not part of the DB.
Import the CSV file into a table, and use it as follows (assuming the CSV file is loaded into a table named ops with a column named operation):
select *
from
P_SPC_LOT L
inner join P_SPC_CHART_POINT CP on L.spcs_id = CP.spcs_id
where
(L.route like 'FL%' or L.route like 'RE%' or L.route like 'FE%')
and L.operation in (select operation from ops)
and L.data_collection_time > current_date -7
and rownum <100000;
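If importing into a table is not an option, another approach is to read the CSV on the client side and pass the values as bind variables. A minimal Python sketch, assuming the header is literally "Operation" and an Oracle-style driver that accepts positional :1, :2 placeholders (the rownum in the query suggests Oracle); read_operations and operations_filter are hypothetical names.

```python
import csv


def read_operations(f):
    """Return the non-empty values of the "Operation" column of an open CSV file."""
    values = ((row.get("Operation") or "").strip() for row in csv.DictReader(f))
    return [v for v in values if v]


def operations_filter(ops):
    """Build an IN-list fragment with one positional bind variable per value.

    Uses Oracle-style :1, :2, ... placeholders (an assumption about the driver).
    """
    if not ops:
        raise ValueError("no operations found in the CSV file")
    placeholders = ", ".join(f":{i}" for i in range(1, len(ops) + 1))
    return f"L.operation in ({placeholders})", list(ops)
```

The fragment replaces L.operation in ('123','234','456') in the query, and the returned list is passed as the bind values. Oracle limits an IN list to 1,000 expressions, so for a longer file the imported-table approach above is the safer choice.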

Exporting SQL Query to a local text file

This is for an approach that WRITES to a local file.
I am using SQL WorkBench and I'm connected to an AWS Redshift instance (which uses postgresql). I would like to run the query and have data exported from AWS Redshift to a local csv or text file. I have tried:
SELECT transaction_date ,
Variable 1 ,
Variable 2 ,
Variable 3 ,
Variable 4 ,
Variable 5
From xyz
into OUTFILE 'C:/filename.csv'
But I get the following error:
ERROR: syntax error at or near "'C:/filename.csv'"
Position: 148
into OUTFILE 'C:/filename.csv'

What's the best way to copy a subset of a table's rows from one database to another in Postgres?

I've got a production DB with, say, ten million rows. I'd like to extract the 10,000 or so rows from the past hour off of production and copy them to my local box. How do I do that?
Let's say the query is:
SELECT * FROM mytable WHERE date > '2009-01-05 12:00:00';
How do I take the output, export it to some sort of dump file, and then import that dump file into my local development copy of the database -- as quickly and easily as possible?
Source:
psql -c "COPY (SELECT * FROM mytable WHERE ...) TO STDOUT" > mytable.copy
Destination:
psql -c "COPY mytable FROM STDIN" < mytable.copy
This assumes mytable has the same schema and column order in both the source and destination. If this isn't the case, you could try STDOUT CSV HEADER and STDIN CSV HEADER instead of STDOUT and STDIN, but I haven't tried it.
If you have any custom triggers on mytable, you may need to disable them on import:
psql -c "ALTER TABLE mytable DISABLE TRIGGER USER; \
COPY mytable FROM STDIN; \
ALTER TABLE mytable ENABLE TRIGGER USER" < mytable.copy
source server:
BEGIN;
CREATE TEMP TABLE mmm_your_table_here AS
SELECT * FROM your_table_here WHERE your_condition_here;
COPY mmm_your_table_here TO 'u:\\source.copy';
ROLLBACK;
your local box:
-- your_destination_table_here must be created first on your box
COPY your_destination_table_here FROM 'u:\\source.copy';
article: http://www.postgresql.org/docs/8.1/static/sql-copy.html
From within psql, you just use copy with the query you gave us, exporting this as a CSV (or whatever format), switch database with \c and import it.
Look into \h copy in psql.
With the constraint you added (not being superuser), I cannot find a pure-SQL solution. But doing it in your favorite language is quite simple. You open a connection to the "old" database, another one to the new database, you SELECT in one and INSERT in the other. Here is a tested-and-working solution in Python.
#!/usr/bin/env python3
"""
Copy a *part* of a database to another one. See
<http://stackoverflow.com/questions/414849/whats-the-best-way-to-copy-a-subset-of-a-tables-rows-from-one-database-to-anoth>
With PostgreSQL, the only pure-SQL solution is to use COPY, which is
not available to the ordinary user.
Stephane Bortzmeyer <bortzmeyer#nic.fr>
"""
import psycopg2

table_name = "Tests"
# List here the columns you want to copy. Yes, "*" would be simpler
# but also more brittle.
names = ["id", "uuid", "date", "domain", "broken", "spf"]
constraint = "date > '2009-01-01'"

old_db = psycopg2.connect("dbname=dnswitness-spf")
new_db = psycopg2.connect("dbname=essais")
old_cursor = old_db.cursor()
old_cursor.execute("SET TRANSACTION READ ONLY")  # Security: never write to the source
new_cursor = new_db.cursor()

old_cursor.execute("SELECT %s FROM %s WHERE %s" %
                   (",".join(names), table_name, constraint))
print("%i rows retrieved" % old_cursor.rowcount)

# Named placeholders such as %(id)s, filled from a dict for each row.
placeholders = ",".join("%%(%s)s" % name for name in names)
command = "INSERT INTO %s (%s) VALUES (%s)" % \
          (table_name, ",".join(names), placeholders)
# psycopg2 opens a transaction implicitly; commit() below ends it.
for row in old_cursor.fetchall():
    new_cursor.execute(command, dict(zip(names, row)))
new_db.commit()

old_cursor.close()
new_cursor.close()
old_db.close()
new_db.close()
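The same SELECT-then-INSERT idea can hand all fetched rows to the driver in a single executemany call instead of one execute per row. A sketch, not the original author's code: copy_subset is hypothetical, it uses only plain DB-API cursor calls, and it can be tried against two in-memory sqlite3 databases as stand-ins for the old and new PostgreSQL databases (pass placeholder="%s" when using psycopg2).

```python
def copy_subset(src, dst, table, columns, where, placeholder="?"):
    """Copy the rows of `table` matching `where` from `src` to `dst`.

    `src` and `dst` are DB-API connections. `table`, `columns` and `where`
    are pasted into the SQL text, so they must come from trusted code.
    `placeholder` is "?" for sqlite3 and "%s" for psycopg2.
    """
    cols = ", ".join(columns)
    src_cur = src.cursor()
    src_cur.execute(f"SELECT {cols} FROM {table} WHERE {where}")
    rows = src_cur.fetchall()  # fine for the ~10,000 rows in the question
    src_cur.close()
    marks = ", ".join([placeholder] * len(columns))
    dst_cur = dst.cursor()
    dst_cur.executemany(f"INSERT INTO {table} ({cols}) VALUES ({marks})", rows)
    dst.commit()
    dst_cur.close()
    return len(rows)
```

As with the script above, the destination table must already exist with compatible columns.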