pgAdmin4: Importing a CSV - sql

I am trying to import a CSV using pgAdmin4. I created the table using the query,
CREATE TABLE i210_2017_02_18
(
PROBE_ID character varying(255),
SAMPLE_DATE timestamp without time zone,
LAT numeric,
LON numeric,
HEADING integer,
SPEED integer,
PROBE_DATA_PROVIDER character varying(255),
SYSTEM_DATE timestamp without time zone
)
The header and first line of my CSV are:
PROBE_ID,SAMPLE_DATE,LAT,LON,HEADING,SPEED,PROBE_DATA_PROVIDER,SYSTEM_DATE
841625st,2017-02-18 00:58:19,34.11968,-117.80855,91.0,9.0,FLEET53,2017-02-18 00:58:58
When I try to use the import dialogue, the process fails with Error Code 1:
ERROR: invalid input syntax for type timestamp: "SAMPLE_DATE"
CONTEXT: COPY i210_2017_02_18, line 1, column sample_date: "SAMPLE_DATE"
Nothing seems wrong to me - any ideas?

According to your table structure, this import will fail in the columns HEADING and SPEED, since their values have decimals and you declared them as INTEGER. Either remove the decimals or change the column type to e.g. NUMERIC.
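If you go the type-change route, a minimal sketch using the table and column names from the question:
ALTER TABLE i210_2017_02_18
ALTER COLUMN HEADING TYPE numeric,
ALTER COLUMN SPEED TYPE numeric;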
Having said that, just try this from pgAdmin (assuming the file and the database are on the same server):
COPY i210_2017_02_18 FROM '/home/jones/file.csv' CSV HEADER;
In case you're dealing with a remote server, try this using psql from your console:
$ cat file.csv | psql yourdb -c "COPY i210_2017_02_18 FROM STDIN CSV HEADER;"
You can also check this answer.
In case you really want to stick to the pgAdmin import tool, which I discourage, just select the Header option and the proper Delimiter.

Have you set the Header option to TRUE in the Import settings? That should work.

Step 1: Create a table.
You can use a query or the dashboard to create the table.
Step 2: Create the exact number of columns present in the CSV file.
I would recommend creating the columns using the dashboard.
Step 3: Click on your table_name in pgAdmin and you will see an option for import/export.
Step 4: Provide the path to your CSV file, and remember to set the delimiter to comma.

How can I write a stored procedure that imports data from a CSV file and populates the table?
Take a look at this short article.
The solution is paraphrased here:
Create your table:
CREATE TABLE zip_codes
(ZIP char(5), LATITUDE double precision, LONGITUDE double precision,
CITY varchar, STATE char(2), COUNTY varchar, ZIP_CLASS varchar);
Copy data from your CSV file to the table:
COPY zip_codes FROM '/path/to/csv/ZIP_CODES.txt' WITH (FORMAT csv);
If you don't have permission to use COPY (which works on the db server), you can use \copy instead (which works in the db client). Using the same example as Bozhidar Batsov:
Create your table:
CREATE TABLE zip_codes
(ZIP char(5), LATITUDE double precision, LONGITUDE double precision,
CITY varchar, STATE char(2), COUNTY varchar, ZIP_CLASS varchar);
Copy data from your CSV file to the table:
\copy zip_codes FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV
Note that \copy ... must be written on one line and without a ; at the end!
You can also specify the columns to read:
\copy zip_codes(ZIP,CITY,STATE) FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV
See the documentation for COPY:
Do not confuse COPY with the psql instruction \copy. \copy invokes COPY FROM STDIN or COPY TO STDOUT, and then fetches/stores the data in a file accessible to the psql client. Thus, file accessibility and access rights depend on the client rather than the server when \copy is used.
And note:
For identity columns, the COPY FROM command will always write the column values provided in the input data, like the INSERT option OVERRIDING SYSTEM VALUE.
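A brief illustration of that note, with a made-up table and file path:
CREATE TABLE items (
id bigint GENERATED ALWAYS AS IDENTITY,
name text
);
-- If items.csv supplies explicit id values, COPY keeps them rather than
-- generating new ones, just as INSERT ... OVERRIDING SYSTEM VALUE would.
COPY items (id, name) FROM '/path/to/items.csv' CSV HEADER;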
One quick way of doing this is with the Python Pandas library (version 0.15 or above works best). This will handle creating the columns for you, although the choices it makes for data types might not be what you want. If it doesn't quite do what you want, you can always use the generated 'create table' code as a template.
Here's a simple example:
import pandas as pd
df = pd.read_csv('mypath.csv')
df.columns = [c.lower() for c in df.columns] # PostgreSQL doesn't like capitals or spaces
from sqlalchemy import create_engine
engine = create_engine('postgresql://username:password@localhost:5432/dbname')
df.to_sql("my_table_name", engine)
And here's some code that shows you how to set various options:
# Set it so the raw SQL output is logged
import logging
import sqlalchemy  # needed for sqlalchemy.types below
logging.basicConfig()
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)
df.to_sql("my_table_name2",
          engine,
          if_exists="append",  # Options are 'fail', 'replace', 'append'; default 'fail'
          index=False,         # Do not output the index of the dataframe
          dtype={'col1': sqlalchemy.types.NUMERIC,
                 'col2': sqlalchemy.types.String})  # Datatypes should be SQLAlchemy types
Most other solutions here require that you create the table in advance/manually. This may not be practical in some cases (e.g., if you have a lot of columns in the destination table), so the approach below may come in handy.
Given the path and column count of your CSV file, you can use the following function to load your data into a temp table that will be renamed to target_table:
The top row is assumed to have the column names.
create or replace function data.load_csv_file
(
    target_table text,
    csv_path     text,
    col_count    integer
)
returns void as $$
declare
    iter      integer; -- dummy integer to iterate columns with
    col       text;    -- variable to keep the column name at each iteration
    col_first text;    -- first column name, e.g. top left corner on a csv file or spreadsheet
begin
    create table temp_table ();

    -- add just enough number of columns
    for iter in 1..col_count
    loop
        execute format('alter table temp_table add column col_%s text;', iter);
    end loop;

    -- copy the data from csv file
    execute format('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_path);

    iter := 1;
    col_first := (select col_1 from temp_table limit 1);

    -- update the column names based on the first row which has the column names
    for col in execute format('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
    loop
        execute format('alter table temp_table rename column col_%s to %s', iter, col);
        iter := iter + 1;
    end loop;

    -- delete the columns row
    execute format('delete from temp_table where %s = %L', col_first, col_first);

    -- change the temp table name to the name given as parameter, if not blank
    if length(target_table) > 0 then
        execute format('alter table temp_table rename to %I', target_table);
    end if;
end;
$$ language plpgsql;
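A hedged usage sketch; the table name, file path, and column count below are placeholders:
select data.load_csv_file('my_table', '/path/to/file.csv', 4);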
You could also use pgAdmin, which offers a GUI to do the import. That's shown in this SO thread. The advantage of using pgAdmin is that it also works for remote databases.
Much like the previous solutions though, you would need to have your table on the database already. Everyone has their own approach, but I usually open the CSV file in Excel, copy the headers, paste special with transposition onto a different worksheet, place the corresponding data type in the next column, and then copy and paste that into a text editor together with the appropriate SQL table creation query, like so:
CREATE TABLE my_table (
/* Paste data from Excel here for example ... */
col_1 bigint,
col_2 bigint,
/* ... */
col_n bigint
);
COPY table_name FROM 'path/to/data.csv' DELIMITER ',' CSV HEADER;
Create a table first.
Then use the COPY command to load the CSV data into the table:
copy table_name (C1,C2,C3....)
from 'path to your CSV file' delimiter ',' csv header;
NOTE:
The columns and their order are specified by C1, C2, C3, ... in the SQL.
The header option just skips the first line of the input; it does not match columns by name.
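For example (a sketch with hypothetical names): if the table is products(id, name, price) but the CSV stores the columns as name,price,id, the column list must follow the CSV order, since header only skips the first line:
copy products (name, price, id)
from '/path/to/products.csv' delimiter ',' csv header;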
As Paul mentioned, import works in pgAdmin:
Right-click on table → Import
Select a local file, format, and encoding.
Here is a German pgAdmin GUI screenshot:
You can do a similar thing with DbVisualizer (I have a license; I am not sure about the free version).
Right-click on a table → Import Table Data...
Use this SQL code:
copy table_name(attribute1,attribute2,attribute3...)
from 'E:\test.csv' delimiter ',' csv header
The header keyword lets the DBMS know that the CSV file has a header with attributes.
For more, visit Import CSV File Into PostgreSQL Table.
This is a personal experience with PostgreSQL, and I am still waiting for a faster way.
Create a table skeleton first if the file is stored locally:
drop table if exists ur_table;
CREATE TABLE ur_table
(
id serial NOT NULL,
log_id numeric,
proc_code numeric,
date timestamp,
qty int,
name varchar,
price money
);
COPY
ur_table(id, log_id, proc_code, date, qty, name, price)
FROM '\path\xxx.csv' DELIMITER ',' CSV HEADER;
If the PostgreSQL server doesn't have permission to access the \path\xxx.csv file (for example, because the file is on your client machine rather than on the database server), you will have to import the .csv file through the pgAdmin built-in functionality.
Right click the table name and choose import.
If you still have the problem, please refer to this tutorial: Import CSV File Into PostgreSQL Table
How to import CSV file data into a PostgreSQL table
Steps:
Connect to the PostgreSQL database in the terminal:
psql -U postgres -h localhost
Create a database:
create database mydb;
Create a user:
create user siva with password 'mypass';
Connect to the database:
\c mydb;
Create a schema:
create schema trip;
Create a table:
create table trip.test(VendorID int, passenger_count int, trip_distance decimal, RatecodeID int, store_and_fwd_flag varchar, PULocationID int, DOLocationID int, payment_type decimal, fare_amount decimal, extra decimal, mta_tax decimal, tip_amount decimal, tolls_amount int, improvement_surcharge decimal, total_amount decimal
);
Import the CSV file data into PostgreSQL (note that the COPY column list takes column names only, not types):
COPY trip.test(VendorID, passenger_count, trip_distance, RatecodeID, store_and_fwd_flag, PULocationID, DOLocationID, payment_type, fare_amount, extra, mta_tax, tip_amount, tolls_amount, improvement_surcharge, total_amount) FROM '/home/Documents/trip.csv' DELIMITER ',' CSV HEADER;
Query the table data:
select * from trip.test;
You can also use pgfutter or, even better, pgcsv.
These tools create the table columns for you, based on the CSV header.
pgfutter is quite buggy, so I'd recommend pgcsv.
Here's how to do it with pgcsv:
sudo pip install pgcsv
pgcsv --db 'postgresql://localhost/postgres?user=postgres&password=...' my_table my_file.csv
In Python, you can use this code for automatic PostgreSQL table creation with column names:
import pandas, csv
from io import StringIO
from sqlalchemy import create_engine
def psql_insert_copy(table, conn, keys, data_iter):
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        s_buf = StringIO()
        writer = csv.writer(s_buf)
        writer.writerows(data_iter)
        s_buf.seek(0)
        columns = ', '.join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = '{}.{}'.format(table.schema, table.name)
        else:
            table_name = table.name
        sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(table_name, columns)
        cur.copy_expert(sql=sql, file=s_buf)

engine = create_engine('postgresql://user:password@localhost:5432/my_db')
df = pandas.read_csv("my.csv")
df.to_sql('my_table', engine, schema='my_schema', method=psql_insert_copy)
It's also relatively fast. I can import more than 3.3 million rows in about 4 minutes.
You can create a Bash file, import.sh (assuming your CSV is tab-delimited):
#!/usr/bin/env bash
USER="test"
DB="postgres"
TABLE_NAME="user"
CSV_DIR="$(pwd)/csv"
FILE_NAME="user.txt"
echo $(psql -d $DB -U $USER -c "\copy $TABLE_NAME from '$CSV_DIR/$FILE_NAME' DELIMITER E'\t' csv" 2>&1 |tee /dev/tty)
And then run this script.
You can use the Pandas library if the file is not very large.
Be careful when iterating over Pandas dataframes; I am doing it here only to demonstrate the possibility. One could also consider the pd.DataFrame.to_sql() function when copying from a dataframe to an SQL table.
Assuming you have created the table you want, you could:
import psycopg2
import pandas as pd
data=pd.read_csv(r'path\to\file.csv', delimiter=' ')
#prepare your data and keep only relevant columns
data.drop(['col2', 'col4','col5'], axis=1, inplace=True)
data.dropna(inplace=True)
print(data.iloc[:3])
conn=psycopg2.connect("dbname=db user=postgres password=password")
cur=conn.cursor()
for index, row in data.iterrows():
    cur.execute('''insert into my_table (col1,col3,col6)
                   VALUES (%s,%s,%s)''', (row['col1'], row['col3'], row['col6']))
cur.close()
conn.commit()
conn.close()
print('\n db connection closed.')
DBeaver Community Edition (dbeaver.io) makes it trivial to connect to a database, then import a CSV file for upload to a PostgreSQL database. It also makes it easy to issue queries, retrieve data, and download result sets to CSV, JSON, SQL, or other common data formats.
It is a FOSS multi-platform database tool for SQL programmers, DBAs and analysts that supports all popular databases: MySQL, PostgreSQL, SQLite, Oracle, DB2, SQL Server, Sybase, MS Access, Teradata, Firebird, Hive, Presto, etc. It's a viable FOSS competitor to TOAD for Postgres, TOAD for SQL Server, or Toad for Oracle.
I have no affiliation with DBeaver. I love the price (FREE!) and full functionality, but I wish they would open up this DBeaver/Eclipse application more and make it easy to add analytics widgets to DBeaver / Eclipse, rather than requiring users to pay for the $199 annual subscription just to create graphs and charts directly within the application. My Java coding skills are rusty and I don't feel like taking weeks to relearn how to build Eclipse widgets, (only to find that DBeaver has probably disabled the ability to add third-party widgets to the DBeaver Community Edition.)
You have three options to import CSV files into PostgreSQL:
First, using the COPY command through the command line (a short sketch follows below).
Second, using the pgAdmin tool’s import/export.
Third, using a cloud solution like Skyvia, which gets the CSV file from an online location like an FTP source or cloud storage like Google Drive.
You can check out the article that explains all of these here.
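For the first option, a minimal sketch reusing the table from the original question and assuming a server-side file path:
COPY i210_2017_02_18
FROM '/path/to/file.csv'
WITH (FORMAT csv, HEADER true, DELIMITER ',');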
Create a table and add the required columns that are used in the CSV file.
Open pgAdmin and right-click on the target table you want to load. Select Import and update the following settings in the File Options section:
Browse to your file in Filename.
Select CSV as the format.
Set Encoding to ISO_8859_5.
Now go to Misc. options, check Header, and click Import.
If you need a simple mechanism to import from text/parse multiline CSV content, you could use:
CREATE TABLE t -- OR INSERT INTO tab(col_names)
AS
SELECT
t.f[1] AS col1
,t.f[2]::int AS col2
,t.f[3]::date AS col3
,t.f[4] AS col4
FROM (
SELECT regexp_split_to_array(l, ',') AS f
FROM regexp_split_to_table(
$$a,1,2016-01-01,bbb
c,2,2018-01-01,ddd
e,3,2019-01-01,eee$$, '\n') AS l) t;
DBFiddle Demo
I created a small tool that imports a CSV file into PostgreSQL super easily. It is just one command, and it will create and populate the tables; unfortunately, at the moment, all automatically created fields use the type TEXT:
csv2pg users.csv -d ";" -H 192.168.99.100 -U postgres -B mydatabase
The tool can be found on https://github.com/eduardonunesp/csv2pg
These are some great answers, but they were overcomplicated for me. I just need to load a CSV file into PostgreSQL without creating a table first.
Here is my way:
Libraries:
import pandas as pd
import os
import psycopg2 as pg
from sqlalchemy import create_engine
Use an environment variable to get your password:
password = os.environ.get('PSW')
Create the engine:
engine = create_engine(f"postgresql+psycopg2://postgres:{password}@localhost:5432/postgres")
The breakdown of the engine string:
engine = create_engine(dialect+driver://username:password@host:port/database)
Breakdown:
postgresql+psycopg2 = dialect+driver
postgres = username
password = the password from my environment variable. You can type the password inline if needed, but that is not recommended.
localhost = host
5432 = port
postgres = database
Get your CSV file path. I had to specify an encoding; the reason why can be found here.
data = pd.read_csv(r"path", encoding='unicode_escape')
Send the data to PostgreSQL:
data.to_sql('test', engine, if_exists='replace')
Breakdown:
test = the name you want the table to have
engine = the engine created above, i.e. our connection
if_exists = will replace the old table if it exists. Use this with caution.
All Together:
import pandas as pd
import os
import psycopg2 as pg
from sqlalchemy import create_engine
password = os.environ.get('PSW')
engine = create_engine(f"postgresql+psycopg2://postgres:{password}@localhost:5432/postgres")
data = pd.read_csv(r"path", encoding='unicode_escape')
data.to_sql('test', engine, if_exists='replace')

Query contains parameters but import file contains different values [importing csv to Teradata SQL]

I am using Teradata SQL to import a CSV file. I clicked import to activate the import operation, then typed the following
insert into databasename.tablename values(?,?,?,...)
I made sure to specify the database name as well as what I want the table to be named, and I put 13 commas--the number of columns in my CSV file.
It gives me the following error:
Query contains 13 parameters but Import file contains 1 data values
I have no idea what the issue is.
The default delimiter used by your SQL Assistant doesn't match the one used in the CSV, so it doesn't recognise all the columns.
In SQL Assistant, go to Tools >> Options >> Export/Import and choose the proper delimiter so it matches the one in your CSV.

Hue on Cloudera - NULL values (importing file)

Yesterday I installed the Cloudera QuickStart VM 5.8. After importing files into the database through Hue, some tables ended up with NULL values in an entire column. In the previous steps, the data was displayed correctly as it should be imported.
Can you run the command describe formatted table_name in the Hive shell to see what the field delimiter is, and then go to the warehouse directory and check whether the delimiter in the data and in the table definition is the same? I am sure it will not be the same; that is why you see NULLs.
I am assuming you have imported the data into the default warehouse directory.
Then you can do one of the following:
1) Delete your Hive table and create it again with the correct delimiter as it is in the actual data (row format delimited fields terminated by "your delimiter") and give the location of your data file; a hedged sketch follows below.
or
2) Delete the imported data and run the Sqoop import again, setting fields-terminated-by to the delimiter used in the Hive table definition.
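A minimal sketch of option 1; the table, column, and location names are made up, so adjust them to your data:
CREATE EXTERNAL TABLE my_table (
col_0 STRING,
col_1 STRING,
col_2 STRING,
col_3 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/warehouse/my_table';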
Also check the data types of the second (col_1) and third (col_2) columns in the original database you are exporting from.
This cannot be a case of a missing delimiter; otherwise the fourth column (col_3) would not have been populated correctly, which it is.

Pentaho | Issue with CSV file to Table output

I am working in Pentaho Spoon. I have a requirement to load CSV file data into one table.
I have used , as the delimiter in the CSV file. I can see the correct data in the preview of the CSV file input step. But when I try to insert the data with the Table output step, I get a data truncation error.
This is because I have values like the following in one of my columns:
"2,ABC Squere".
As you can see, I have "," in my column value, so it is truncating and throwing an error. How do I solve this problem?
I want to upload data into the table with values like this.
Here is one way of doing it
test.csv
--------
colA,colB,colC
ABC,"2,ABC Squere",test
See the settings below. The key is to use " as the enclosure and , as the delimiter.
You can change the delimiter, say to a pipe, and also keep the data as quoted text like "1,Name"; this will be treated as a single column.

DB2 SQL query returns some type of converted results when exporting to file

I have a shell script which queries a DB2 db and exports the output to a file. When I run the SQL statement without exporting, I get the following:
su - myid -c 'db2 connect to mydb;db2 -v "select COL1"; db2 connect reset;'
Sample Output
COL 1
x'20A0E2450080000'
x'50D24520E100GDS00'
x'10H0EFJ10080000'
x'50A0GH0080000'
x'80RHE1008B0000'
x'70A50E1F4008000'
x'10F329EF09BB0'
But when I export my results using the exact same query, I get the following:
su - myid -c 'db2 connect to mydb;db2 -v "EXPORT TO '/tmp/query_results.out' OF DEL MODIFIED BY COLDEL: select COL1 from MYTABLE"; db2 connect reset;'
Sample Output
hôª"
"xàÓ °á
"èÅ °á
hôª"
"é# °á
hôª"
"é« °á
hôª"
"éÅ °á
hôª"
"""ÒYá  á
hôª"
"#sYá  á
hôª"
I'm assuming this is due to the single quote characters. Since they are both preceded by another character, I have not been able to add '\' in front of them. I've also attempted to run the substr function within the query, but I still get the same result, only shorter. I'm sure there must be something I am overlooking, so after several days of trying on my own (and failing), I'm turning to you guys. Any help would be greatly appreciated.
*Edit: Just wanted to add that my actual select statement includes more than one column, and the others are displayed correctly. So out of several columns, only one is displaying bad data.
"I'm assuming this is due to the single quote characters" -- No. This particular column contains binary data, either BLOB or VARCHAR FOR BIT DATA. If it is BLOB, specify LOBS TO in the EXPORT command, this way BLOBs will be written to binary files. If it is VARCHAR FOR BIT DATA, you can either convert it to BLOB on export (export to ... lobs to ... select blob(your_column)...) or export it as hex(your_column), depending on what you're planning to do with the export later.
Another alternative for VARCHAR FOR BIT DATA would be to export your table using the IXF format instead of DEL, which will preserve binary strings.
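A hedged sketch of the hex variant, reusing the column, table, and file names from the question (adjust the path and names to your environment):
EXPORT TO /tmp/query_results.out OF DEL MODIFIED BY COLDEL:
SELECT HEX(COL1) AS COL1 FROM MYTABLE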