Reading SQL query arguments from an external CSV file

I have the following SQL query:
select * from P_SPC_LOT L,P_SPC_CHART_POINT CP
where L.spcs_id = CP.spcs_id and
(L.route like 'FL%' or L.route like 'RE%' or L.route like 'FE%') and
L.operation in ('123','234','456') and
L.data_collection_time > current_date -7 and rownum <100000
I am interested in reading the L.operation values from an external .csv file that has an "Operation" column. How can that be done? Also, will reading from an external file slow the query down? I don't have DBA write privileges, so I am looking for a way to use the CSV as a temporary table that is not part of the DB.

Import the CSV file into a table, and use it as follows (assuming the CSV file is loaded into a table named ops with a column named operation):
select *
from
P_SPC_LOT L
inner join P_SPC_CHART_POINT CP on L.spcs_id = CP.spcs_id
where
(L.route like 'FL%' or L.route like 'RE%' or L.route like 'FE%')
and L.operation in (select operation from ops)
and L.data_collection_time > current_date -7
and rownum <100000;
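
If you are allowed to create a small table in your own schema (this needs only CREATE TABLE in your schema, not DBA rights), one way to get the CSV into the ops table is SQL*Loader. The file name operations.csv, the column length, and the credentials below are placeholders rather than anything from your environment; a minimal sketch:

-- one-time setup in your own schema
CREATE TABLE ops (operation VARCHAR2(20));

-- ops.ctl: SQL*Loader control file (operations.csv is a placeholder name)
LOAD DATA
INFILE 'operations.csv'
APPEND
INTO TABLE ops
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(operation)

-- shell: skip=1 skips the "Operation" header row
sqlldr userid=your_user/your_password control=ops.ctl skip=1

A small lookup table used through the IN subquery should not noticeably slow the query; the bulk of the work is still the join between P_SPC_LOT and P_SPC_CHART_POINT.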

Related

Partitioning Data in SQL On-Demand with Blob Storage as Data Source

In Amazon Redshift there is a way to create a partition key when using your S3 bucket as a data source. Link.
I am attempting to do something similar in Azure Synapse using the SQL On-Demand service.
Currently I have a storage account that is partitioned such that it follows this scheme:
- Sales (folder)
  - 2020-10-01 (folder)
    - File 1
    - File 2
  - 2020-10-02 (folder)
    - File 3
    - File 4
To create a view and pull in all 4 files I ran the command:
CREATE VIEW testview3 AS
SELECT * FROM OPENROWSET (
BULK 'Sales/*/*.csv',
FORMAT = 'CSV', PARSER_VERSION = '2.0',
DATA_SOURCE = 'AzureBlob',
FIELDTERMINATOR = ',',
FIRSTROW = 2
) AS tv1;
If I run a query of SELECT * FROM [myview] I receive data from all 4 files.
How can I go about creating a partition key so that I could run a query such as
SELECT * FROM [myview] WHERE folderdate > 2020-10-01
so that I can only analyze data from Files 3 and 4?
I know I can edit my OPENROWSET BULK statement but I want to be able to get all the data from my container at first and then constrain searches as needed.
Serverless SQL can parse partitioned folder structures using the filename function (when you want to load a specific file or files) and the filepath function (when you want to load all files in a given path). More information on syntax and usage is available in the online documentation.
In your case, you can read all files from '2020-10-01' onwards using filepath syntax such as filepath(1) > '2020-10-01'.
To expand on the answer from Raunak, I ended up with the following syntax for my query.
DROP VIEW IF EXISTS testview6
GO
CREATE VIEW testview6 AS
SELECT *,
r.filepath(1) AS [date]
FROM OPENROWSET (
BULK 'Sales/*/*.csv',
FORMAT = 'CSV', PARSER_VERSION = '2.0',
DATA_SOURCE = 'AzureBlob',
FIELDTERMINATOR = ',',
FIRSTROW = 2
) AS [r]
WHERE r.filepath(1) IN ('2020-10-02');
You can adjust the granularity of your partitioning by adding extra wildcards (*) and r.filepath(x) expressions.
For instance, you can write the query like this:
DROP VIEW IF EXISTS testview6
GO
CREATE VIEW testview6 AS
SELECT *,
r.filepath(1) AS [year],
r.filepath(2) as [month]
FROM OPENROWSET (
BULK 'Sales/*-*-01/*.csv',
FORMAT = 'CSV', PARSER_VERSION = '2.0',
DATA_SOURCE = 'AzureBlob',
FIELDTERMINATOR = ',',
FIRSTROW = 2
) AS [r]
WHERE r.filepath(1) IN ('2020')
AND r.filepath(2) IN ('10');
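
If the goal is the original SELECT * FROM [myview] WHERE folderdate > '2020-10-01' style of query, a sketch along the same lines (the view name salesview and the column name folderdate are made up here) is to leave the WHERE out of the view, expose the path segment as a column, and filter at query time:

DROP VIEW IF EXISTS salesview
GO
CREATE VIEW salesview AS
SELECT *,
r.filepath(1) AS [folderdate]
FROM OPENROWSET (
BULK 'Sales/*/*.csv',
FORMAT = 'CSV', PARSER_VERSION = '2.0',
DATA_SOURCE = 'AzureBlob',
FIELDTERMINATOR = ',',
FIRSTROW = 2
) AS [r]
GO
SELECT * FROM salesview WHERE [folderdate] > '2020-10-01';

Because folderdate comes from r.filepath(1), filtering on it should let serverless SQL skip the folders that fall outside the range.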

How to omit query name returned from sql to CSV

I'm doing CSV exports from a database table with Postgres.
The script I call via Linux consists of 4 main SQL queries, but what I want to save in the CSV is only the 3rd step. Below is how the export process is executed.
Executed via a Linux bash script:
$sudo -u postgres -H sh -c "psql -d openstreetmap -f extractTagsFromEditorToCsv.sql" > $CSV_OUTPUT_FILE
Contents of the extractTagsFromEditorToCsv.sql file:
ALTER TABLE temp_tags
ADD COLUMN IF NOT EXISTS ah_edited boolean default TRUE;
INSERT INTO temp_tags
SELECT DISTINCT ON (way_id, k)
way_id, k, v, version, ah_edited
FROM way_tags
ORDER BY way_id, k, version desc;
COPY
(SELECT temp_tags.way_id, temp_tags.k, temp_tags.v, temp_tags.version, TRUE
FROM temp_tags
JOIN ways ON temp_tags.way_id = ways.way_id AND temp_tags.version = ways.version
JOIN changesets on ways.changeset_id=changesets.id
JOIN users ON changesets.user_id=users.id
WHERE (k like '%maxspeed:backward%'
OR k like '%maxspeed:forward%'
OR k like '%maxspeed%')
AND ((users.email like '%ah.com%' AND ways.changeset_id != 0)
OR temp_tags.ah_edited = TRUE)) TO STDOUT (format csv, delimiter ';', header false);
DROP TABLE IF EXISTS temp_tags;
As a result I receive a file which, for example, looks like this:
CREATE TABLE
ALTER TABLE
INSERT 0 13426
8135845;maxspeed;501;10;t
DROP TABLE
That file isn't parsed correctly by a CSV reading tool.
My question is: is it possible to omit the SQL command name printed in the output? Perhaps I could divide that script into 3 separate files and export the CSV from 'the middle' one with only the extracting query, but maybe there is some way to do it in one SQL script file.
Expected output result would be like:
8135845;maxspeed;501;10;t
Thank you in advance.
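
No answer is recorded for this question here, but one possible workaround (a sketch, not from the thread; the /tmp path and file name are assumptions) is to let the COPY write the CSV file itself instead of sending it to STDOUT, so the command tags psql prints never mix with the data. A server-side COPY ... TO 'file' writes as the database server user and requires superuser rights, which the postgres role used in the bash command has:

-- same SELECT as in the script above, but written straight to a file
COPY
(SELECT temp_tags.way_id, temp_tags.k, temp_tags.v, temp_tags.version, TRUE
 FROM temp_tags
 JOIN ways ON temp_tags.way_id = ways.way_id AND temp_tags.version = ways.version
 JOIN changesets ON ways.changeset_id = changesets.id
 JOIN users ON changesets.user_id = users.id
 WHERE (k LIKE '%maxspeed:backward%'
        OR k LIKE '%maxspeed:forward%'
        OR k LIKE '%maxspeed%')
   AND ((users.email LIKE '%ah.com%' AND ways.changeset_id != 0)
        OR temp_tags.ah_edited = TRUE)
) TO '/tmp/extracted_tags.csv' (format csv, delimiter ';', header false);

The > $CSV_OUTPUT_FILE redirection in the bash command is then unnecessary; the ALTER TABLE / INSERT / DROP TABLE tags still go to stdout but no longer end up in the CSV.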

Dynamically inject the current date into PostgreSQL query with copy export

Is there a way I can dynamically inject the date in YYYYMMDD format into this PostgreSQL query that I am using with copy inside psql?
copy (SELECT
course.id, course.area, course.status, course.type,
customer.firstname, customer.lastname, customer.email
FROM course
JOIN customer_course
ON customer_course.course_id = course.id
JOIN customer
ON customer.id = customer_course.customer_id
WHERE course.type LIKE '%heathland%'
AND course.status = 'open'
) TO '~/Dropbox/CRMPicco/prod-customer-courses-' . (SELECT NOW()::date) . '.csv' WITH CSV;
When I run that query I get the following error:
ERROR: syntax error at or near "."
LINE 1: ...ace%' AND course.status = 'open' ) TO STDOUT . (SELECT ...
It would be advantageous if I could create this CSV with the current date without having to manually edit the query.
In psql you can set a variable to the name of the output file:
postgres=#> select '~/Dropbox/CRMPicco/prod-customer-courses-'||current_date||'.csv' as fname
postgres=#> \gset
postgres=#> copy (...) to :'fname';
Note the missing ; after the first SELECT; you have to terminate that statement with \gset instead. The variable name is defined by the column alias of the source query.
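
Combining that with the query from the question, and using to_char to get the YYYYMMDD format asked for, it might look like the sketch below. The output path is a made-up example; a server-side copy writes the file as the database server user, so the path must be absolute and writable by the server, unlike the ~/Dropbox path in the question.

select '/var/lib/postgresql/exports/prod-customer-courses-'
       || to_char(current_date, 'YYYYMMDD') || '.csv' as fname
\gset
copy (
    SELECT course.id, course.area, course.status, course.type,
           customer.firstname, customer.lastname, customer.email
    FROM course
    JOIN customer_course ON customer_course.course_id = course.id
    JOIN customer ON customer.id = customer_course.customer_id
    WHERE course.type LIKE '%heathland%'
    AND course.status = 'open'
) to :'fname' with csv;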

Export data from all tables in N schemas in DB2 into CSV with column names

I'm trying to export a bunch of DB2 tables to CSV, with column names. I don't see any straightforward way to do this. I followed this to get the data I want, but I have to execute that over hundreds of tables. Is there a way to dynamically get all the columns and tables given N schema names?
I also tried this, which exports all tables in a schema to CSV, but it doesn't give me column names. So if someone could show me how to change that script to get column names in the CSVs, my work is done.
The server is running Red Hat Linux Server.
Using files
The following db2 command generates the export script:
export to exp.sql of del modified by nochardel
select
x'0a'||'export to file_header of del modified by nochardel VALUES '''||columns||''''
||x'0a'||'export to file_data of del messages messages.msg select '||columns||' from '||tabname_full
||x'0a'||'! cat file_header file_data > '||tabname_full||'.csv'
from
(
select rtrim(c.tabschema)||'.'||c.tabname as tabname_full, listagg(c.colname, ', ') as columns
from syscat.tables t
join syscat.columns c on c.tabschema=t.tabschema and c.tabname=t.tabname
where t.tabschema='SYSIBM' and t.type='T'
group by c.tabschema, c.tabname
--fetch first 10 row only
)
;
It's better to place the command above into a file like gen_exp.sql and run it to produce the export script:
db2 -tf gen_exp.sql
The export script exp.sql consists of 3 commands for each table:
* db2 export command to get a comma separated list of columns
* db2 export command to get table data
* concatenation command to collect both outputs above to a single file
You run this script as follows:
db2 -vf exp.sql -z exp.sql.log
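
For illustration, one generated block in exp.sql would look roughly like this (SYSIBM.SOMETABLE and the columns COL1, COL2 are made-up placeholders):

-- SYSIBM.SOMETABLE, COL1 and COL2 are placeholders
export to file_header of del modified by nochardel VALUES 'COL1, COL2'
export to file_data of del messages messages.msg select COL1, COL2 from SYSIBM.SOMETABLE
! cat file_header file_data > SYSIBM.SOMETABLE.csv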
Using pipe
gen_exp_sh.sql:
export to exp.sh of del modified by nochardel
select
x'0a'||'echo "'||columns||'" > '||filename
||x'0a'||'db2 "export to pipe_data of del messages messages.msg select '||columns||' from '||tabname_full||'" >/dev/null 2>&1 </dev/null &'
||x'0a'||'cat pipe_data >> '||filename
from
(
select
rtrim(c.tabschema)||'.'||c.tabname as tabname_full
, rtrim(c.tabschema)||'.'||c.tabname||'.csv' as filename
, listagg(c.colname, ', ') as columns
from syscat.tables t
join syscat.columns c on c.tabschema=t.tabschema and c.tabname=t.tabname
where t.tabschema='SYSIBM' and t.type='T'
group by c.tabschema, c.tabname
--fetch first 10 row only
)
;
Run it as follows:
db2 -tf gen_exp_sh.sql
The export shell script exp.sh consists of 3 commands for each table:
* echo command to write a comma separated list of columns to a file
* db2 export command to get table data to a pipe (started in a background)
* simple cat command to read from the pipe and add data to the same file with the columns list
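
Again for illustration, one generated block in exp.sh would look roughly like this (SYSIBM.SOMETABLE, COL1 and COL2 are placeholders):

# SYSIBM.SOMETABLE, COL1 and COL2 are placeholders
echo "COL1, COL2" > SYSIBM.SOMETABLE.csv
db2 "export to pipe_data of del messages messages.msg select COL1, COL2 from SYSIBM.SOMETABLE" >/dev/null 2>&1 </dev/null &
cat pipe_data >> SYSIBM.SOMETABLE.csv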
Usage:
You must create the pipe first and then source the export script (the dot-space-script notation is important):
mkfifo pipe_data
db2 connect to mydb ...
. ./exp.sh
rm -f pipe_data
Try this great tool: https://www.sql-workbench.eu/. It's universal and you can transfer data between any kind of database engine.

What's the best way to copy a subset of a table's rows from one database to another in Postgres?

I've got a production DB with, say, ten million rows. I'd like to extract the 10,000 or so rows from the past hour off of production and copy them to my local box. How do I do that?
Let's say the query is:
SELECT * FROM mytable WHERE date > '2009-01-05 12:00:00';
How do I take the output, export it to some sort of dump file, and then import that dump file into my local development copy of the database -- as quickly and easily as possible?
Source:
psql -c "COPY (SELECT * FROM mytable WHERE ...) TO STDOUT" > mytable.copy
Destination:
psql -c "COPY mytable FROM STDIN" < mytable.copy
This assumes mytable has the same schema and column order in both the source and destination. If this isn't the case, you could try STDOUT CSV HEADER and STDIN CSV HEADER instead of STDOUT and STDIN, but I haven't tried it.
If you have any custom triggers on mytable, you may need to disable them on import:
psql -c "ALTER TABLE mytable DISABLE TRIGGER USER; \
COPY mytable FROM STDIN; \
ALTER TABLE mytable ENABLE TRIGGER USER" < mytable.copy
source server:
BEGIN;
CREATE TEMP TABLE mmm_your_table_here AS
SELECT * FROM your_table_here WHERE your_condition_here;
COPY mmm_your_table_here TO 'u:\\source.copy';
ROLLBACK;
your local box:
-- your_destination_table_here must be created first on your box
COPY your_destination_table_here FROM 'u:\\source.copy';
article: http://www.postgresql.org/docs/8.1/static/sql-copy.html
From within psql, you just use copy with the query you gave us, export it as CSV (or whatever format), switch databases with \c, and import it.
Look into \h copy in psql.
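
A minimal sketch of that flow, assuming the destination database is named devdb, the destination table already exists, and both databases are reachable from the same psql session:

-- connected to the production database
\copy (SELECT * FROM mytable WHERE date > '2009-01-05 12:00:00') to 'mytable.csv' with (format csv, header)
\c devdb
\copy mytable from 'mytable.csv' with (format csv, header)

Unlike server-side COPY, \copy reads and writes files on the client, so it does not need superuser rights.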
With the constraint you added (not being superuser), I do not find a pure-SQL solution. But doing it in your favorite language is quite simple. You open a connection to the "old" database, another one to the new database, you SELECT in one and INSERT in the other. Here is a tested-and-working solution in Python.
#!/usr/bin/python
"""
Copy a *part* of a database to another one. See
<http://stackoverflow.com/questions/414849/whats-the-best-way-to-copy-a-subset-of-a-tables-rows-from-one-database-to-anoth>
With PostgreSQL, the only pure-SQL solution is to use COPY, which is
not available to the ordinary user.
Stephane Bortzmeyer <bortzmeyer#nic.fr>
"""
table_name = "Tests"
# List here the columns you want to copy. Yes, "*" would be simpler
# but also more brittle.
names = ["id", "uuid", "date", "domain", "broken", "spf"]
constraint = "date > '2009-01-01'"
import psycopg2
old_db = psycopg2.connect("dbname=dnswitness-spf")
new_db = psycopg2.connect("dbname=essais")
old_cursor = old_db.cursor()
old_cursor.execute("""SET TRANSACTION READ ONLY""") # Security
new_cursor = new_db.cursor()
old_cursor.execute("""SELECT %s FROM %s WHERE %s """ % \
                   (",".join(names), table_name, constraint))
print "%i rows retrieved" % old_cursor.rowcount
new_cursor.execute("""BEGIN""")
placeholders = []
namesandvalues = {}
for name in names:
    placeholders.append("%%(%s)s" % name)
for row in old_cursor.fetchall():
    i = 0
    for name in names:
        namesandvalues[name] = row[i]
        i = i + 1
    command = "INSERT INTO %s (%s) VALUES (%s)" % \
              (table_name, ",".join(names), ",".join(placeholders))
    new_cursor.execute(command, namesandvalues)
new_cursor.execute("""COMMIT""")
old_cursor.close()
new_cursor.close()
old_db.close()
new_db.close()