Imagine that you have the following data in a CSV:
Name, Age, Gender
Jake, 40, M
Bill, 17, M
Suzie, 21, F
Is it possible to exclude the Age variable when importing the above CSV? My current approach is to simply use the cut shell command.
Update
iluvcapra has a great answer for small CSVs. However, for very large CSVs this approach is inefficient. For example, imagine that the above CSV were very large, 30 GB let's say. Loading all that Age data only to immediately discard it is a waste of time. With this in mind, is there a more efficient way to load subsets of columns into SQLite databases?
I suspect that the best option is to use the shell command cut to cull out the unnecessary columns. Is that intuition correct? Is it common to use shell commands to pre-process CSV files into more SQLite-friendly versions?
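For concreteness, the kind of cut-based pre-processing I have in mind looks roughly like this (file, database, and table names are placeholders, and it assumes plain comma separators with no quoted fields):
cut -d, -f1,3 data.csv > data_noage.csv
sqlite3 people.db <<EOF
.mode csv
.import data_noage.csv names_genders
EOF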
Create a temporary table with the age column, and then use an INSERT... SELECT to move the data from the temporary table into your main one:
CREATE TEMP TABLE _csv_import (name text, age integer, gender text);
.separator ","
.import file.csv _csv_import
INSERT INTO names_genders (name, gender) SELECT name, gender
FROM _csv_import WHERE 1;
DROP TABLE _csv_import;
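A sketch of running this end to end from the shell (file and database names are placeholders; note that .import will also load the CSV's header row as data unless you strip it first):
sqlite3 people.db <<EOF
CREATE TABLE IF NOT EXISTS names_genders (name text, gender text);
CREATE TEMP TABLE _csv_import (name text, age integer, gender text);
.separator ","
.import file.csv _csv_import
INSERT INTO names_genders (name, gender) SELECT name, gender FROM _csv_import;
DROP TABLE _csv_import;
EOF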
EDIT: Inserting into a view with a phantom age column:
CREATE VIEW names_ages_genders AS
SELECT name, 0 AS age, gender FROM names_genders;
CREATE TRIGGER lose_age
INSTEAD OF INSERT ON names_ages_genders
BEGIN
INSERT INTO names_genders (name, gender)
VALUES (NEW.name, NEW.gender);
END;
This will create a view called names_ages_genders that will say everybody is zero years old, and will silently drop the age field from any INSERT statement called on it. Not tested! (I'm actually not sure .import can import into views.)
If you wish to avoid reading more than necessary into SQLite, and if you wish to avoid the hazards of using standard text-processing tools (such as cut and awk) on CSV files, one possibility would be to use your favorite csv2tsv converter (*) along the following lines:
csv2tsv input.csv | cut -f 1,3- > tmp.tsv
cat << EOF | sqlite3 demo.db
drop table if exists demo;
.mode csv
.separator "\t"
.import tmp.tsv demo
EOF
/bin/rm tmp.tsv
Note, though, that if input.csv has literal tabs or newlines or escaped double-quotes, then
whether the above will have the desired effect will depend on the csv2tsv that is used.
(*) csv2tsv
In case you don't have ready access to a suitable csv2tsv converter, here is a simple python3 script that does the job, handling embedded literal newlines, tabs, and the two-character sequences "\t" and "\n", in the CSV:
#!/usr/bin/env python3
# Take care of embedded tabs and newlines in the CSV
import csv, re, sys

if len(sys.argv) > 3 or (len(sys.argv) > 1 and sys.argv[1] == '--help'):
    sys.exit("Usage: " + sys.argv[0] + " [input.csv [output.tsv]]")

csv.field_size_limit(sys.maxsize)

if len(sys.argv) == 3:
    out=open(sys.argv[2], 'w+')
else:
    out=sys.stdout

if len(sys.argv) == 1:
    csvfile=sys.stdin
else:
    csvfile=open(sys.argv[1])

# Escape pre-existing "\t" and "\n" two-character sequences, then encode
# literal tabs and newlines as "\t" and "\n"
def edit(s):
    s=re.sub(r'\\t', r'\\\\t', s)
    s=re.sub(r'\\n', r'\\\\n', s)
    s=re.sub('\t', r'\\t', s)
    return re.sub('\n', r'\\n', s)

reader = csv.reader(csvfile, dialect='excel')
for row in reader:
    line=""
    for s in row:
        s=edit(s)
        if len(line) == 0:
            line = s
        else:
            line += '\t' + s
    print(line, file=out)
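Assuming the script above is saved as csv2tsv and made executable (chmod +x csv2tsv), the pipeline shown earlier can be used unchanged:
./csv2tsv input.csv output.tsv
./csv2tsv input.csv | cut -f 1,3- > tmp.tsv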
Related
I'm doing CSV exports from a database table with Postgres.
The script I call from Linux is made up of 4 main SQL queries, but what I want to save in the CSV is only the output of the 3rd step. Below I show how the export process is executed.
Executed via a Linux bash script:
$sudo -u postgres -H sh -c "psql -d openstreetmap -f extractTagsFromEditorToCsv.sql" > $CSV_OUTPUT_FILE
Contents of the extractTagsFromEditorToCsv.sql file:
ALTER TABLE temp_tags
ADD COLUMN IF NOT EXISTS ah_edited boolean default TRUE;
INSERT INTO temp_tags
SELECT DISTINCT ON (way_id, k)
way_id, k, v, version, ah_edited
FROM way_tags
ORDER BY way_id, k, version desc;
COPY
(SELECT temp_tags.way_id, temp_tags.k, temp_tags.v, temp_tags.version, TRUE
FROM temp_tags
JOIN ways ON temp_tags.way_id = ways.way_id AND temp_tags.version = ways.version
JOIN changesets on ways.changeset_id=changesets.id
JOIN users ON changesets.user_id=users.id
WHERE (k like '%maxspeed:backward%'
OR k like '%maxspeed:forward%'
OR k like '%maxspeed%')
AND ((users.email like '%ah.com%' AND ways.changeset_id != 0)
OR temp_tags.ah_edited = TRUE)) TO STDOUT (format csv, delimiter ';', header false);
DROP TABLE IF EXISTS temp_tags;
As a result I receive a file which, for example, looks like:
CREATE TABLE
ALTER TABLE
INSERT 0 13426
8135845;maxspeed;501;10;t
DROP TABLE
That file isn't parsed correctly by a CSV reading tool.
My question is: is it possible to omit the SQL command tags printed in the output? Perhaps I could divide the script into 3 separate files and export the CSV from 'the middle' one, containing only the extracting query, but maybe there is some way to do it in one SQL script file.
Expected output result would be like:
8135845;maxspeed;501;10;t
Thank you in advance.
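One option that may help here (a suggestion only; I have not run it against this exact script) is psql's -q / --quiet flag, which suppresses informational output such as the CREATE TABLE and INSERT 0 13426 command tags, so that only the COPY ... TO STDOUT data reaches the redirected file:
sudo -u postgres -H sh -c "psql -q -d openstreetmap -f extractTagsFromEditorToCsv.sql" > $CSV_OUTPUT_FILE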
I'm trying to export a bunch of DB2 tables to CSV, with column names. I don't see any straightforward way to do this. I followed this to get the data I want, but I have to execute that over hundreds of tables. Is there a way to dynamically get all the columns and tables given N schema names?
I also tried this, which exports all tables in a schema to CSV, but it doesn't give me the column names. So if someone could show me how to change this script to get column names in the CSVs, my work is done.
The server is running Red Hat Linux Server.
Using files
The following db2 command generates the export script:
export to exp.sql of del modified by nochardel
select
x'0a'||'export to file_header of del modified by nochardel VALUES '''||columns||''''
||x'0a'||'export to file_data of del messages messages.msg select '||columns||' from '||tabname_full
||x'0a'||'! cat file_header file_data > '||tabname_full||'.csv'
from
(
select rtrim(c.tabschema)||'.'||c.tabname as tabname_full, listagg(c.colname, ', ') as columns
from syscat.tables t
join syscat.columns c on c.tabschema=t.tabschema and c.tabname=t.tabname
where t.tabschema='SYSIBM' and t.type='T'
group by c.tabschema, c.tabname
--fetch first 10 row only
)
;
It's better to place the command above in some file like gen_exp.sql and run it to produce the export script:
db2 -tf gen_exp.sql
The export script exp.sql consists of 3 commands for each table (a sketch of one generated block follows this list):
* db2 export command to get a comma separated list of columns
* db2 export command to get table data
* concatenation command to collect both outputs above to a single file
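For an imaginary table MYSCHEMA.MYTABLE with columns ID and NAME (placeholder names; the query above actually selects tables from the SYSIBM schema), each generated block of exp.sql would look roughly like this:
export to file_header of del modified by nochardel VALUES 'ID, NAME'
export to file_data of del messages messages.msg select ID, NAME from MYSCHEMA.MYTABLE
! cat file_header file_data > MYSCHEMA.MYTABLE.csv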
You run this script as follows:
db2 -vf exp.sql -z exp.sql.log
Using pipe
gen_exp_sh.sql:
export to exp.sh of del modified by nochardel
select
x'0a'||'echo "'||columns||'" > '||filename
||x'0a'||'db2 "export to pipe_data of del messages messages.msg select '||columns||' from '||tabname_full||'" >/dev/null 2>&1 </dev/null &'
||x'0a'||'cat pipe_data >> '||filename
from
(
select
rtrim(c.tabschema)||'.'||c.tabname as tabname_full
, rtrim(c.tabschema)||'.'||c.tabname||'.csv' as filename
, listagg(c.colname, ', ') as columns
from syscat.tables t
join syscat.columns c on c.tabschema=t.tabschema and c.tabname=t.tabname
where t.tabschema='SYSIBM' and t.type='T'
group by c.tabschema, c.tabname
--fetch first 10 row only
)
;
Run it as follows:
db2 -tf gen_exp_sh.sql
The export shell script exp.sh consists of 3 commands for each table:
* echo command to write a comma separated list of columns to a file
* db2 export command to get table data to a pipe (started in a background)
* simple cat command to read from the pipe and add data to the same file with the columns list
Usage:
You must create the pipe first and then source the export script (the dot-space-script notation is important):
mkfifo pipe_data
db2 connect to mydb ...
. ./exp.sh
rm -f pipe_data
Try to use this great tool: https://www.sql-workbench.eu/. It's universal and you can transfer data between any type of database engine.
I'm trying to export data from PostgreSQL to CSV.
First I created the query and tried exporting from pgAdmin with File -> Export to CSV. The CSV is wrong, as it contains, for example:
The header: Field1;Field2;Field3;Field4
Now, the rows begin well, except that the last field is put on another line:
Example:
Data1;Data2;Data3;
Data4;
The problem is I get an error when trying to import the data into another server.
The data is from a view I created.
I also tried
COPY view(field1,field2...) TO 'C:\test.csv' DELIMITER ',' CSV HEADER;
It exports the same file.
I just want to export the data to another server.
Edit:
When trying to import the CSV I get the error:
ERROR : Extra data after the last expected column. Context Copy
actions, line 3: <<"Data1, data2 etc.">>
So the first line is the header, the second line is the first row with data minus the last field, which is on the 3rd line, alone.
In order to export the file to another server you have two options:
Creating a shared folder between the two servers, so that the database also has access to this directory.
COPY (SELECT field1,field2 FROM your_table) TO '[shared directory]' DELIMITER ',' CSV HEADER;
Triggering the export from the target server using the STDOUT of COPY. Using psql you can achieve this by running the following command:
psql yourdb -c "COPY (SELECT * FROM your_table) TO STDOUT" > output.csv
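On the target server the dump can then be loaded with a matching COPY ... FROM (a sketch; yourtargetdb is a placeholder, and note that plain COPY TO STDOUT produces PostgreSQL's tab-delimited text format despite the .csv file name):
psql yourtargetdb -c "COPY your_table FROM STDIN" < output.csv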
EDIT: Addressing the issue of fields containing line feeds (\n)
In case you want to get rid of the line feeds, use the REPLACE function.
Example:
SELECT E'foo\nbar';
?column?
----------
foo +
bar
(1 row)
Removing the line feed:
SELECT REPLACE(E'foo\nbaar',E'\n','');
replace
---------
foobaar
(1 row)
So your COPY should look like this:
COPY (SELECT field1,REPLACE(field2,E'\n','') AS field2 FROM your_table) TO '[shared directory]' DELIMITER ',' CSV HEADER;
The export procedure described above is OK, e.g.:
t=# create table so(i int, t text);
CREATE TABLE
t=# insert into so select 1,chr(10)||'aaa';
INSERT 0 1
t=# copy so to stdout csv header;
i,t
1,"
aaa"
t=# create table so1(i int, t text);
CREATE TABLE
t=# copy so1 from stdout csv header;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> i,t
1,"
aaa"
>> >> >> \.
COPY 1
t=# select * from so1;
i | t
---+-----
1 | +
| aaa
(1 row)
I would like to split a line such as:
name1=value1,name2=value2, .....,namen=valuen
to produce two lines as follows:
name1,name2, .....,namen
value1,value2, .....,valuen
The goal being to construct an SQL insert along the lines of:
input="name1=value1,name2=value2, .....,namen=valuen"
namescsv=$( echo $input | sed 's/=[^,]*//g' )
valuescsv=$( echo $input | ?????? )
INSERT INTO table_name ( $namescsv ) VALUES ( $valuescsv )
I'd like to do this as simply as possible - perl, awk, or multiple pipes through tr, cut, etc. seem too complicated. Given that the names part seems simple enough, I figure there must be something similar for the values but can't work it out.
You can just invert your character match:
echo $input | sed 's/[^,]*=//g'
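Putting that together with the names extraction from the question gives, for the sample input:
input="name1=value1,name2=value2,namen=valuen"
namescsv=$( echo "$input" | sed 's/=[^,]*//g' )
valuescsv=$( echo "$input" | sed 's/[^,]*=//g' )
echo "INSERT INTO table_name ( $namescsv ) VALUES ( $valuescsv )"
# INSERT INTO table_name ( name1,name2,namen ) VALUES ( value1,value2,valuen )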
I think your best bet is still sed -re 's/[^=,]*=([^,]*)/\1/g', though I guess the input would have to match your table exactly.
Note that in some RDBMS you can use the following syntax:
INSERT INTO table_name SET name=value, name2=value2, ...;
http://dev.mysql.com/doc/refman/5.5/en/insert.html
The following shell script does what you are asking for and takes care of escaping (not only because of injection, but you may want to insert values with quotes in them):
_IFS="$IFS"; IFS=","
line="name1=value1,name2=value2,namen=valuen";
for pair in $line; do
names="$names,${pair%=*}"
values="$values,'$(escape_sql "${pair#*=}")'"
done
IFS="$_IFS"
echo "INSERT INTO table_name ( ${names#,} ) VALUES ( ${values#,} )"
Output:
INSERT INTO table_name ( name1,name2,namen ) VALUES ( 'value1','value2','valuen' )
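escape_sql itself is not shown above; a minimal stand-in that just doubles single quotes (the standard SQL string escape) could be:
escape_sql () { printf '%s' "$1" | sed "s/'/''/g"; }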
I've got a production DB with, say, ten million rows. I'd like to extract the 10,000 or so rows from the past hour off of production and copy them to my local box. How do I do that?
Let's say the query is:
SELECT * FROM mytable WHERE date > '2009-01-05 12:00:00';
How do I take the output, export it to some sort of dump file, and then import that dump file into my local development copy of the database -- as quickly and easily as possible?
Source:
psql -c "COPY (SELECT * FROM mytable WHERE ...) TO STDOUT" > mytable.copy
Destination:
psql -c "COPY mytable FROM STDIN" < mytable.copy
This assumes mytable has the same schema and column order in both the source and destination. If this isn't the case, you could try STDOUT CSV HEADER and STDIN CSV HEADER instead of STDOUT and STDIN, but I haven't tried it.
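That (untested) CSV variant would look something like this:
psql -c "COPY (SELECT * FROM mytable WHERE ...) TO STDOUT CSV HEADER" > mytable.csv
psql -c "COPY mytable FROM STDIN CSV HEADER" < mytable.csv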
If you have any custom triggers on mytable, you may need to disable them on import:
psql -c "ALTER TABLE mytable DISABLE TRIGGER USER; \
COPY mytable FROM STDIN; \
ALTER TABLE mytable ENABLE TRIGGER USER" < mytable.copy
source server:
BEGIN;
CREATE TEMP TABLE mmm_your_table_here AS
SELECT * FROM your_table_here WHERE your_condition_here;
COPY mmm_your_table_here TO 'u:\\source.copy';
ROLLBACK;
your local box:
-- your_destination_table_here must be created first on your box
COPY your_destination_table_here FROM 'u:\\source.copy';
article: http://www.postgresql.org/docs/8.1/static/sql-copy.html
From within psql, you just use copy with the query you gave us, exporting this as a CSV (or whatever format), switch database with \c and import it.
Look into \h copy in psql.
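For example, a rough sketch using psql's client-side \copy (which does not require superuser rights; the file name and target database are placeholders):
-- connected to the source database in psql
\copy (SELECT * FROM mytable WHERE date > '2009-01-05 12:00:00') to 'mytable.csv' csv header
\c mydevdb
\copy mytable from 'mytable.csv' csv header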
With the constraint you added (not being superuser), I do not find a pure-SQL solution. But doing it in your favorite language is quite simple. You open a connection to the "old" database, another one to the new database, you SELECT in one and INSERT in the other. Here is a tested-and-working solution in Python.
#!/usr/bin/python

"""
Copy a *part* of a database to another one. See
<http://stackoverflow.com/questions/414849/whats-the-best-way-to-copy-a-subset-of-a-tables-rows-from-one-database-to-anoth>

With PostgreSQL, the only pure-SQL solution is to use COPY, which is
not available to the ordinary user.

Stephane Bortzmeyer <bortzmeyer#nic.fr>
"""

table_name = "Tests"
# List here the columns you want to copy. Yes, "*" would be simpler
# but also more brittle.
names = ["id", "uuid", "date", "domain", "broken", "spf"]
constraint = "date > '2009-01-01'"

import psycopg2

old_db = psycopg2.connect("dbname=dnswitness-spf")
new_db = psycopg2.connect("dbname=essais")
old_cursor = old_db.cursor()
old_cursor.execute("""SET TRANSACTION READ ONLY""") # Security
new_cursor = new_db.cursor()

old_cursor.execute("""SELECT %s FROM %s WHERE %s """ % \
                   (",".join(names), table_name, constraint))
print "%i rows retrieved" % old_cursor.rowcount

new_cursor.execute("""BEGIN""")
placeholders = []
namesandvalues = {}
for name in names:
    placeholders.append("%%(%s)s" % name)
for row in old_cursor.fetchall():
    i = 0
    for name in names:
        namesandvalues[name] = row[i]
        i = i + 1
    command = "INSERT INTO %s (%s) VALUES (%s)" % \
              (table_name, ",".join(names), ",".join(placeholders))
    new_cursor.execute(command, namesandvalues)
new_cursor.execute("""COMMIT""")
old_cursor.close()
new_cursor.close()
old_db.close()
new_db.close()