how to import csv/data in postgresql

Hi, I know many questions have already been asked about this, but my case is somewhat different.
I have a CSV file containing millions of records. I tried the following command to copy from the CSV file into my table:
copy "client_data" from '/home/mike/Desktop/client_data.txt' with delimiter ',' CSV;
But the problem arises because the data in the CSV file is inconsistent.
Lines like the following work like a charm:
12/12/12 20:17:35,304000000,"123","1"
12/12/12 20:17:36,311000000,"123","2"
12/12/12 20:17:36,814000000,"123","2"
12/12/12 20:17:36,814000000,"123","2"
12/12/12 20:17:37,317000000,"123",".1"
12/12/12 20:17:38,863000000,"123","TS"
12/12/12 20:17:39,835000000,"123","2"
12/12/12 20:17:40,337000000,"123","1"
but hundreds of rows look something like this:
12/12/12 20:20:03,790000000,"123","1
{'""}__{""'} /""'\
( $AMZA./)#FRIDI
{__}""'{__} /) (\. ,,DON,,"
12/12/12 20:20:30,501000000,"123","INAM NIKALTA NHE HE KITNE SAWALO K JAWB DAY
/G\A\,':/\,':/S\K,':\"
12/12/12 20:22:55,928000000,"123","PAKISTAN KI BUNYAAD
2=QUAID-E-AZAM"
12/12/12 20:22:56,431000000,"123","QUIED E AZAM
MOHAMMAD ALI JINNAH
[KFK FEROZ]"
which are unparseable because of line breaks, commas, invalid characters, etc.
Is there any way to parse these and load the data into a Postgres table efficiently?
Below is the table structure:
create table "client_data" (
date_stamp text,
points bigint,
msisdn character varying(13),
data text
)
with (OIDS = false);
alter table "client_data" owner to postgres;

I did it using the MySQL import option, as MySQL had a more sophisticated and relatively easy approach for importing data.
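For anyone who wants to stay with PostgreSQL, one possible pre-processing step is sketched below in Python. It assumes the failing rows are otherwise well-formed CSV (quoted fields that contain line breaks, commas, or doubled quotes) and rewrites the data with one record per physical line before COPY is run. The helper name flatten_multiline_csv and the choice to replace line breaks with a space are illustrative assumptions, not something from the original post.

```python
import csv
import io

# A few rows from the question; the last field of two records spans several lines.
SAMPLE = (
    '12/12/12 20:17:35,304000000,"123","1"\n'
    '12/12/12 20:22:55,928000000,"123","PAKISTAN KI BUNYAAD\n'
    '2=QUAID-E-AZAM"\n'
    '12/12/12 20:22:56,431000000,"123","QUIED E AZAM\n'
    'MOHAMMAD ALI JINNAH\n'
    '[KFK FEROZ]"\n'
)


def flatten_multiline_csv(text: str) -> str:
    """Re-emit CSV so that every record occupies exactly one physical line."""
    # newline="" lets the csv module see embedded line breaks unchanged.
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip completely empty lines
        # Join the pieces of a multi-line field with a single space.
        writer.writerow([" ".join(field.splitlines()) for field in row])
    return out.getvalue()


if __name__ == "__main__":
    print(flatten_multiline_csv(SAMPLE), end="")
```

The rewritten output can then be fed to the same COPY ... CSV command from the question. For a file with millions of rows, the same loop can read from and write to file objects instead of strings.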

Related

POSTGRESQL invalid input syntax for type integer

I am a novice PostgreSQL user. I found an interesting database on the Internet that consists of 3 tables, each of them in CSV format. I could copy the first one into my SQL database without trouble, but now I have the following problem:
ERROR: invalid input syntax for type integer: "43950100,43950010,TUR,GUNES Senol (TUR),S,1,"RUSTU ",GK,"
CONTEXT: COPY worldcupsplayers, line 25645, column roundid: "43950100,43950010,TUR,GUNES Senol (TUR),S,1,"RUSTU ",GK,"
Every row in a table has the same structure, as follows:
43950100,43950010,TUR,GUNES Senol (TUR),S,1,"RUSTU ",GK,
I am wondering why this particular row (line 25645) causes a problem when it looks like the other ones.
I am trying to copy it with this command:
COPY worldcupsplayers(RoundID, MatchID, team_initials, coach_name, line_up, shirt_number, player_name, position, event)
FROM 'C:\Users\Public\worldcup\worldcupplayers.csv'
DELIMITER ','
CSV HEADER;
[table structure]
UPDATE: rows above and below
25643 - 43950100,43950008,SVN,KATANEC Srecko (SVN),N,23,BULAJIC,,
25644 - 43950100,43950010,BRA,SCOLARI Luiz Felipe (BRA),S,1,MARCOS,GK,
25645 - 43950100,43950010,TUR,GUNES Senol (TUR),S,1,"RUSTU ",GK,
25646 - 43950100,43950010,BRA,SCOLARI Luiz Felipe (BRA),S,2,CAFU,C,
As you can see, they all look the same: every row has the same structure and the same commas.
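One way to investigate a row like this is to parse the raw file with a CSV reader and list every record whose field count differs from the expected nine. The sketch below is in Python and rests on an assumption: that in the raw file line 25645 is wrapped in one pair of quotes with the inner quotes doubled, which would explain why COPY reports the whole line as the value of roundid. The function name find_bad_rows and the sample text are hypothetical.

```python
import csv
import io

# ASSUMPTION: the middle line is quoted as a whole in the raw file.
SAMPLE = (
    '43950100,43950010,BRA,SCOLARI Luiz Felipe (BRA),S,1,MARCOS,GK,\n'
    '"43950100,43950010,TUR,GUNES Senol (TUR),S,1,""RUSTU "",GK,"\n'
    '43950100,43950010,BRA,SCOLARI Luiz Felipe (BRA),S,2,CAFU,C,\n'
)


def find_bad_rows(text: str, expected_fields: int):
    """Return (line_number, fields) for records with an unexpected field count."""
    reader = csv.reader(io.StringIO(text, newline=""))
    bad = []
    for row in reader:
        if len(row) != expected_fields:
            # reader.line_num is the number of physical lines read so far.
            bad.append((reader.line_num, row))
    return bad


if __name__ == "__main__":
    for line_no, fields in find_bad_rows(SAMPLE, expected_fields=9):
        print(line_no, len(fields), fields)
```

If the report shows a single field holding the entire line, opening the file in a plain text editor (not a spreadsheet) at that line should reveal the stray quotes.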

Regex comparison in Oracle between 2 varchar columns (from different tables)

I am trying to find a way to capture relevant errors from the Oracle alert log. I have one table (ORA_BLACKLIST) with column values as below (these are the values I want to ignore from V$DIAG_ALERT_EXT).
Below is sample data from the ORA_BLACKLIST table. This table can grow as additional errors to ignore from the alert log are added.
ORA-07445%[kkqctdrvJPPD
ORA-07445%[kxsPurgeCursor
ORA-01013%
ORA-27037%
ORA-01110
ORA-2154
V$DIAG_ALERT_EXT contains a MESSAGE_TEXT column which contains sample text like below.
ORA-01013: user requested cancel of current operation
ORA-07445: exception encountered: core dump [kxtogboh()+22] [SIGSEGV] [ADDR:0x87] [PC:0x12292A56]
ORA-07445: exception encountered: core dump [java_util_HashMap__get()] [SIGSEGV]
ORA-00600: internal error code arguments: [qercoRopRowsets:anumrows]
I want to write a query something like the one below to ignore the blacklisted errors and capture only the relevant info.
select
dae.instance_id,
dae.container_name,
err_count,
dae.message_level
from
ORA_BLACKLIST ob,
V$DIAG_ALERT_EXT dae
where
group by .....;
Can someone suggest a way or sample code to achieve it?
I should have provided the exact contents of the blacklist table. It currently contains some (Perl) regexes, and I want to convert them to Oracle-style regexes and compare them with the message_text column of v$diag_alert_ext. Below are sample Perl regexes from my blacklist table.
ORA-0(,|$| )
ORA-48913
ORA-00060
ORA-609(,|$| )
ORA-65011
ORA-65020 ORA-31(,|$| )
ORA-7452 ORA-959(,|$| )
ORA-3136(,|)|$| )
ORA-07445.[kkqctdrvJPPD
ORA-07445.[kxsPurgeCursor

The values in your blacklist table look like LIKE patterns, not regular expressions.
You can write a query like this:
select dae.* -- or whatever columns you want
from V$DIAG_ALERT_EXT dae
where not exists (select 1
from ORA_BLACKLIST ob
where dae.message_text like ob.<column name>
);
This will not perform particularly well if the tables are large, since every message has to be compared against every pattern.
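To see which messages such a NOT EXISTS ... LIKE filter would keep, the matching rule can be simulated outside the database. The Python sketch below translates LIKE patterns (% for any run of characters, _ for one character, no ESCAPE clause) into regular expressions and applies them to the sample data from the question. The function names are illustrative. Note that LIKE has to match the whole string, so a pattern without a trailing % only matches messages that end exactly there.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern (% and _ wildcards) into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any run of characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def not_blacklisted(messages, blacklist):
    """Keep the messages that match none of the LIKE patterns."""
    compiled = [like_to_regex(p) for p in blacklist]
    # fullmatch mirrors LIKE, which must match the entire value.
    return [m for m in messages if not any(rx.fullmatch(m) for rx in compiled)]


BLACKLIST = [
    "ORA-07445%[kkqctdrvJPPD",
    "ORA-07445%[kxsPurgeCursor",
    "ORA-01013%",
    "ORA-27037%",
    "ORA-01110",
    "ORA-2154",
]

MESSAGES = [
    "ORA-01013: user requested cancel of current operation",
    "ORA-07445: exception encountered: core dump [kxtogboh()+22] [SIGSEGV] [ADDR:0x87] [PC:0x12292A56]",
    "ORA-07445: exception encountered: core dump [java_util_HashMap__get()] [SIGSEGV]",
    "ORA-00600: internal error code arguments: [qercoRopRowsets:anumrows]",
]

if __name__ == "__main__":
    for message in not_blacklisted(MESSAGES, BLACKLIST):
        print(message)
```

With this sample data only the ORA-01013 message is filtered out. Entries such as ORA-01110 would need a trailing % to match a full alert log line.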

import a txt file with 2 columns into different columns in SQL Server Management Studio

I have a txt file containing numerous items in the following format
DBSERVER: HKSER
DBREPLICAID: 51376694590
DBPATH: redirect.nsf
DBTITLE: Redirect AP
DATETIME: 09.03.2015 09:44:21 AM
READS: 1
Adds: 0
Updates: 0
Deletes: 0
DBSERVER: HKSER
DBREPLICAID: 21425584590
DBPATH: redirect.nsf
DBTITLE: Redirect AP
DATETIME: 08.03.2015 09:50:20 PM
READS: 2
Adds: 0
Updates: 0
Deletes: 0
.
.
.
.
please see the source capture here
I would like to import the txt file into the following format in SQL
1st column  2nd column   3rd column    4th column   5th column              .....
DBSERVER    DBREPLICAID  DBPATH        DBTITLE      DATETIME                ......
HKSER       51376694590  redirect.nsf  Redirect AP  09.03.2015 09:44:21 AM
HKSER       21425584590  redirect.nsf  Redirect AP  08.03.2015 01:08:07 AM
please see the output capture here
Thanks a lot!

You can dump that file into a temporary table with just a single text column. Once it is imported, loop through that table with a cursor, storing the content in variables, and insert a new row into the real target table after each complete block of lines (nine per item in the sample above).
It's not the most elegant solution, but it's simple and it will do the job.

Using BULK INSERT, you can load the headers and the data into two separate columns; then, using a dynamic SQL query, you can create a table and insert the data as required.

For something like this I'd probably use SSIS.
The idea is to create a Script Component (as a Transformation).
You'll need to define your output columns manually (e.g. DBSERVER String (100)).
The source is your file (read normally).
The idea is that you build your rows line by line, then add the full row to the output buffer, e.g.
Output0Buffer.AddRow();
Then write the rows to your destination.
If all files have a common format, you can wrap the whole thing in a Foreach Loop.
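The line-by-line row building that these answers describe can also be sketched outside SQL Server. The Python below is a minimal illustration, assuming every item starts with a DBSERVER line and every data line has the form KEY: value. The function name parse_blocks and the first_key parameter are hypothetical.

```python
SAMPLE = """\
DBSERVER: HKSER
DBREPLICAID: 51376694590
DBPATH: redirect.nsf
DBTITLE: Redirect AP
DATETIME: 09.03.2015 09:44:21 AM
READS: 1
Adds: 0
Updates: 0
Deletes: 0
DBSERVER: HKSER
DBREPLICAID: 21425584590
DBPATH: redirect.nsf
DBTITLE: Redirect AP
DATETIME: 08.03.2015 09:50:20 PM
READS: 2
Adds: 0
Updates: 0
Deletes: 0
"""


def parse_blocks(text: str, first_key: str = "DBSERVER"):
    """Turn repeated 'KEY: value' blocks into a list of dicts, one per item."""
    records = []
    current = None
    for line in text.splitlines():
        # Split on the first colon only, so times like 09:44:21 stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # ignore lines that are not 'KEY: value'
        key = key.strip()
        if key == first_key:
            current = {}          # a new item starts here
            records.append(current)
        if current is not None:
            current[key] = value.strip()
    return records


if __name__ == "__main__":
    for record in parse_blocks(SAMPLE):
        print(record)
```

Each dict then corresponds to one row of the target table and could be written out as CSV for a regular bulk import.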

Exporting data containing line feeds as CSV from PostgreSQL

I'm trying to export data from PostgreSQL to CSV.
First I created the query and tried exporting from pgAdmin with File -> Export to CSV. The CSV is wrong; for example, it contains:
The header: Field1;Field2;Field3;Field4
The rows start out fine, except that the last field is put on another line:
Example :
Data1;Data2;Data3;
Data4;
The problem is that I get an error when trying to import the data on another server.
The data is from a view I created.
I also tried
COPY view(field1,field2...) TO 'C:\test.csv' DELIMITER ',' CSV HEADER;
It exports the same file.
I just want to export the data to another server.
Edit:
When trying to import the csv i get the error :
ERROR : Extra data after the last expected column. Context Copy
actions, line 3: <<"Data1, data2 etc.">>
So the first line is the header, the second line is the first row with data minus the last field, which is on the 3rd line, alone.

To export the file to another server, you have two options:
Create a shared folder between the two servers, so that the database also has access to this directory:
COPY (SELECT field1,field2 FROM your_table) TO '[shared directory]' DELIMITER ',' CSV HEADER;
Trigger the export from the target server using the STDOUT of COPY. Using psql, you can achieve this by running the following command:
psql yourdb -c "COPY (SELECT * FROM your_table) TO STDOUT WITH CSV HEADER" > output.csv
EDIT: Addressing the issue of fields containing line feeds (\n)
If you want to get rid of the line feeds, use the REPLACE function.
Example:
SELECT E'foo\nbar';
?column?
----------
foo +
bar
(1 row)
Removing the line feed:
SELECT REPLACE(E'foo\nbaar',E'\n','');
replace
---------
foobaar
(1 row)
So your COPY should look like this:
COPY (SELECT field1,REPLACE(field2,E'\n','') AS field2 FROM your_table) TO '[shared directory]' DELIMITER ',' CSV HEADER;

The export procedure described above is OK, e.g.:
t=# create table so(i int, t text);
CREATE TABLE
t=# insert into so select 1,chr(10)||'aaa';
INSERT 0 1
t=# copy so to stdout csv header;
i,t
1,"
aaa"
t=# create table so1(i int, t text);
CREATE TABLE
t=# copy so1 from stdout csv header;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> i,t
1,"
aaa"
>> >> >> \.
COPY 1
t=# select * from so1;
i | t
---+-----
1 | +
| aaa
(1 row)
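The same round trip can be reproduced outside the database to confirm that a quoted field containing a line feed is valid CSV, as long as the importing side parses the file as CSV and does not split it on physical lines. This is a small Python sketch; the helper names to_csv and from_csv are illustrative.

```python
import csv
import io


def to_csv(rows) -> str:
    """Serialize rows to CSV text; fields with line feeds get quoted."""
    buf = io.StringIO(newline="")
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def from_csv(text: str):
    """Parse CSV text back into a list of rows."""
    # newline="" keeps embedded line feeds inside quoted fields intact.
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    rows = [["i", "t"], ["1", "\naaa"]]   # same data as the psql session above
    text = to_csv(rows)
    print(repr(text))
    print(text.count("\n"), "physical lines,", len(from_csv(text)), "records")
```

The file has three physical lines but only two records (header plus one row), which is why an importer that is not in CSV mode reports extra or missing columns on line 3.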

How to format Oracle SQL text-only select output

I am using Oracle SQL (in SQLDeveloper, so I don't have access to SQLPLUS commands such as COLUMN) to execute a query that looks something like this:
select assigner_staff_id as staff_id, active_flag, assign_date,
complete_date, mod_date
from work where assigner_staff_id = '2096';
The results it gives me look something like this:
STAFF_ID               ACTIVE_FLAG ASSIGN_DATE               COMPLETE_DATE             MOD_DATE
---------------------- ----------- ------------------------- ------------------------- -------------------------
2096                   F           25-SEP-08                 27-SEP-08                 27-SEP-08 02.27.30.642959000 PM
2096                   F           25-SEP-08                 25-SEP-08                 25-SEP-08 01.41.02.517321000 AM
2 rows selected
This can very easily produce a very wide and unwieldy textual report when I'm trying to paste the results as a nicely formatted quick-n-dirty text block into an e-mail or problem report, etc. What's the best way to get rid of all the extra white space in the output columns when I'm using just plain-vanilla Oracle SQL? So far my web searches haven't turned up much, as all the results show how to do it using formatting commands like COLUMN in SQLPLUS (which I don't have).

In your statement, you can specify the type of output you're looking for:
select /*csv*/ col1, col2 from table;
select /*Delimited*/ col1, col2 from table;
There are other formats available, such as xml, html, text, loader, etc.
You can change the formatting of these particular options under Tools > Preferences > Database > Utilities > Export.
Be sure to choose Run Script rather than Run Statement.
* This is for Oracle SQL Developer v3.2.

What are you using to get the results? The output you pasted looks like it's coming from SQL*PLUS. It may be that whatever tool you are using to generate the results has some method of modifying the output.
By default, Oracle sizes each output column to the width of the title or the width of the column data, whichever is wider.
If you want to make the columns narrower, you will need to either rename them or convert them to text and use substr() to shrink them.
select substr(assigner_staff_id, 1, 8) as staff_id,
active_flag as Flag,
to_char(assign_date, 'DD/MM/YY'),
to_char(complete_date, 'DD/MM/YY'),
mod_date
from work where assigner_staff_id = '2096';

What you can do with SQL is limited by your tool. SQL*Plus has commands to format the columns, but they are not very easy to use.
One quick approach is to paste the output into Excel and format it there, or just attach the spreadsheet. Some tools will save the output directly as a spreadsheet.

Nice question. I really had to think about it.
One thing you could do is change your SQL so that it only returns the narrowest usable columns.
e.g. (I'm not very hot on oracle syntax, but something similar should work):
select substring( convert(varchar(4), assigner_staff_id), 1, 4 ) as id,
active_flag as act, -- use shorter column name
-- etc.
from work where assigner_staff_id = '2096';
Does that make sense?
If you were doing this on unix/linux, I would suggest running it from the command line and piping it through an awk script.
If I've misunderstood, then please update your question and I'll have another go :)

If you don't have a lot of rows returned, I'll often use Tom Kyte's print_table function.
SQL> set serveroutput on
SQL> execute print_table('select * from all_objects where rownum < 3');
OWNER : SYS
OBJECT_NAME : /1005bd30_LnkdConstant
SUBOBJECT_NAME :
OBJECT_ID : 27574
DATA_OBJECT_ID :
OBJECT_TYPE : JAVA CLASS
CREATED : 22-may-2008 11:41:13
LAST_DDL_TIME : 22-may-2008 11:41:13
TIMESTAMP : 2008-05-22:11:41:13
STATUS : VALID
TEMPORARY : N
GENERATED : N
SECONDARY : N
-----------------
OWNER : SYS
OBJECT_NAME : /10076b23_OraCustomDatumClosur
SUBOBJECT_NAME :
OBJECT_ID : 22390
DATA_OBJECT_ID :
OBJECT_TYPE : JAVA CLASS
CREATED : 22-may-2008 11:38:34
LAST_DDL_TIME : 22-may-2008 11:38:34
TIMESTAMP : 2008-05-22:11:38:34
STATUS : VALID
TEMPORARY : N
GENERATED : N
SECONDARY : N
-----------------
PL/SQL procedure successfully completed.
SQL>
If it's lots of rows, I'll just do the query in SQL Developer and save it as xls; business types love Excel for some reason.

Why not just use the "cast" function?
select
(cast(assigner_staff_id as VARCHAR2(4))) AS STAFF_ID,
(cast(active_flag as VARCHAR2(1))) AS A,
(cast(assign_date as VARCHAR2(10))) AS ASSIGN_DATE,
(cast(COMPLETE_date as VARCHAR2(10))) AS COMPLETE_DATE,
(cast(mod_date as VARCHAR2(10))) AS MOD_DATE
from work where assigner_staff_id = '2096';
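If post-processing outside the database is acceptable, the column widths can also be recomputed from the data itself once the rows have been fetched or exported. The Python sketch below shows the idea; format_table is a hypothetical helper, and the rows are the values from the question with a shortened header for the flag column.

```python
def format_table(headers, rows):
    """Return text lines where each column is only as wide as its widest value."""
    cells = [[str(value) for value in row] for row in rows]
    widths = [
        max([len(header)] + [len(row[i]) for row in cells])
        for i, header in enumerate(headers)
    ]

    def render(values):
        # Pad every cell to its column width, then drop trailing spaces.
        return " ".join(v.ljust(w) for v, w in zip(values, widths)).rstrip()

    lines = [render(headers), render(["-" * w for w in widths])]
    lines.extend(render(row) for row in cells)
    return lines


if __name__ == "__main__":
    headers = ["STAFF_ID", "FLAG", "ASSIGN_DATE", "COMPLETE_DATE"]
    rows = [
        (2096, "F", "25-SEP-08", "27-SEP-08"),
        (2096, "F", "25-SEP-08", "25-SEP-08"),
    ]
    print("\n".join(format_table(headers, rows)))
```

The result is a compact block that pastes cleanly into an e-mail, at the cost of an extra step after running the query.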
