Oracle PL/SQL: Import multiple delimited files into a table

I have multiple files (f1.log, f2.log, f3.log, etc.).
Each file has data in a ;- and =-delimited format (records are separated by ; and fields by =), e.g.
data of f1:
1=a;2=b;3=c
data of f2:
1=p;2=q;3=r
I need to read all these files and import the data into a table in this format:
filename number data
f1 1 a
f1 2 b
f1 3 c
f2 1 p
[...]
I am new to SQL. Can you please guide me on how to do it?

Use SQL*Loader to get the files into a table. Assuming you have a table created a bit like:
create table FLOG
(
FILENAME varchar2(1000)
,NUM varchar2(1000)
,DATA varchar2(1000)
);
Then you can use the following control file:
LOAD DATA
INFILE 'f1.log' "str ';'"
truncate INTO TABLE flog
fields terminated by '=' TRAILING NULLCOLS
(
filename constant 'f1'
,num char
,data char
)
However, you will need a different control file for each file. This can be handled by generating the control file dynamically with a shell script. A sample shell script:
cat >flog.ctl <<_EOF
LOAD DATA
INFILE '$1.log' "str ';'"
APPEND INTO TABLE flog
fields terminated by '=' TRAILING NULLCOLS
(
filename constant '$1'
,num char
,data char
)
_EOF
sqlldr <username>/<password>@<instance> control=flog.ctl data=$1.log
Saved as flog.sh, it can then be run like:
./flog.sh f1
./flog.sh f2
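If there are many log files, a small wrapper loop can drive the script once per file; a sketch, assuming the files all follow the fN.log naming used above and sit in the current directory:
for f in f*.log; do
./flog.sh "${f%.log}"
done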

Related

Match multiline SQL statement in pgdump

I have a PostgreSQL database dump made by pg_dump version 9.5.2, which contains DDL and also INSERT INTO statements for each table in the given database. The dump looks like this:
SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
CREATE TABLE unimportant_table (
id integer NOT NULL,
col1 character varying
);
CREATE TABLE important_table (
id integer NOT NULL,
col2 character varying NOT NULL,
unimportant_col character varying NOT NULL
);
INSERT INTO unimportant_table VALUES (123456, 'some data split into
- multiple
- lines
just for fun');
INSERT INTO important_table VALUES (987654321, 'some important data', 'another crap split into
- lines');
...
-- thousands of inserts into both tables
The dump file is really large and it is produced by another company, so I am not able to influence the export process. I need to create 2 files from this dump:
All DDL statements (all statements that don't start with INSERT INTO)
All INSERT INTO important_table statements (I want to restore only some tables from the dump)
If all statements were on a single line, without newline characters in the data, it would be very easy to create the 2 SQL scripts with grep, for example:
grep -v '^INSERT INTO .*;$' my_dump.sql > ddl.sql
grep -o '^INSERT INTO important_table .*;$' my_dump.sql > important_table.sql
# Create empty structures
psql < ddl.sql
# Import only one table for now
psql < important_table.sql
At first I was thinking about using grep, but I did not find out how to process multiple lines at once; then I tried sed, but it returns only single-line inserts. I also used https://regex101.com/ to work out the right regular expressions, but I don't know how to combine them with grep or sed:
^(?!(INSERT INTO)).*$ -- for ddl
^INSERT INTO important_table(\s|[[:alnum:]])*;$ -- for inserts
I found the similar question pcregrep multiline SQL match, but there is no answer. Also, I don't mind whether the solution works with grep, sed or whatever you suggest, but it should work on Ubuntu 18.04.4 LTS.
Here is a bash-based solution that uses perl one-liners to prepare your SQL dump data for the subsequent grep statements.
In my approach, the goal is to get one SQL statement onto one line by means of a script that I called prepare.sh. It got a little more complicated because I wanted to accommodate semicolons and quotes within your insert data strings (these, along with the line breaks, are represented by their hex codes in the intermediate output):
EDIT: In response to @32cupo's comment, below is a modified set of scripts that avoids xargs with large data sets (although I don't have huge dump files to test it with):
#!/bin/bash
# Join each SQL statement onto a single line: mark statement-terminating
# semicolons, hex-encode backslashes, newlines and quotes, then turn the
# statement markers back into real newlines.
perl -pne 's/;(?=\s*$)/__ENDOFSTATEMENT__/g' \
| perl -pne 's/\\/\\\\x5c/g' \
| perl -pne 's/\n/\\\\x0a/g' \
| perl -pne 's/"/\\\\x22/g' \
| perl -pne 's/'\''/\\\\x27/g' \
| perl -pne 's/__ENDOFSTATEMENT__/;\n/g'
Then, a separate script (called ddl.sh) includes your grep statement for the DDL (and, with the help of the loop, only feeds smaller chunks (lines) into xargs):
#!/bin/bash
while read -r line; do
<<<"$line" xargs -I{} echo -e "{}"
done < <(grep -viE '^(\\\\x0a)*insert into')
Another separate script (called important_table.sh) includes your grep statement for the inserts into important-table:
#!/bin/bash
while read -r line; do
<<<"$line" xargs -I{} echo -e "{}"
done < <(grep -iE '^(\\\\x0a)*insert into important_table')
Here is the set of scripts in action (please also note that I spiced up your insert data with some semicolons and quotes):
~/$ cat dump.sql
SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
CREATE TABLE unimportant_table (
id integer NOT NULL,
col1 character varying
);
CREATE TABLE important_table (
id integer NOT NULL,
col2 character varying NOT NULL,
unimportant_col character varying NOT NULL
);
INSERT INTO unimportant_table VALUES (123456, 'some data split into
- multiple
- lines
;just for fun');
INSERT INTO important_table VALUES (987654321, 'some important ";data"', 'another crap split into
- lines;');
...
-- thousands of inserts into both tables
~/$ cat dump.sql | ./prepare.sh | ./ddl.sh >ddl.sql
~/$ cat ddl.sql
SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
CREATE TABLE unimportant_table (
id integer NOT NULL,
col1 character varying
);
CREATE TABLE important_table (
id integer NOT NULL,
col2 character varying NOT NULL,
unimportant_col character varying NOT NULL
);
...
-- thousands of inserts into both tables
~/$ cat dump.sql | ./prepare.sh | ./important_table.sh > important_table.sql
~/$ cat important_table.sql
INSERT INTO important_table VALUES (987654321, 'some important ";data"', 'another crap split into
- lines;');

Postgres copy to TSV file with header

I have a function like so -
CREATE
OR REPLACE FUNCTION ind (bucket text) RETURNS table (
middle character varying (100),
last character varying (100)
) AS $body$ BEGIN return query
select
fname as first,
lname as last
from all_records
; END;
$body$ LANGUAGE PLPGSQL;
How do I output the results of select ind ('Mob') into a tsv file?
I want the output to look like this -
first last
MARY KATHERINE
You can use the COPY command
example:
COPY (select * from ind('Mob')) TO '/tmp/ind.tsv' CSV HEADER DELIMITER E'\t';
The file '/tmp/ind.tsv' will contain your data.
Postgres doesn't allow copy with header for tsv for some reason.
If you're using a Linux-based system you can do it with a script like this:
#create file with tab delimited column list (use \t between each column name)
echo -e "user_id\temail" > user_output.tsv
#now you can append the results of your query to that file by copying to STDOUT
psql -h your_host_name -d your_database_name -c "\copy (SELECT user_id, email FROM my_user_table) to STDOUT;" >> user_output.tsv
Alternatively, if your script is long and you don't want to pass it in with the -c option, you can use the same approach from a .sql file; use "--quiet" to avoid notices being written into your file:
psql --quiet -h your_host_name -d your_database_name -f your_sql_file.sql >> user_output.tsv

SQL: Loading a CSV file with BULK statement causing problems with Hebrew strings

I'm trying to insert a very large CSV file into a table on SQL Server.
In the table itself the fields are defined as nvarchar, but when I try to use the BULK statement to load the file, all the Hebrew fields come out as gibberish.
When I use the INSERT statement everything is OK, but the BULK one gets it all wrong. I even tried to put the strings into the CSV file in the N'string' form, but they just ended up in the table as N'gibberish'.
The reason I'm not using plain INSERT is that the file contains more than 250K rows.
This is the statement that I'm using. The delimiter is '|' on purpose:
BULK INSERT [dbo].[SomeTable]
FROM 'C:\Desktop\csvfilesaved.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '|',
ROWTERMINATOR = '\n',
ERRORFILE = 'C:\Desktop\Error.csv',
TABLOCK
)
And this is a two-row sample of the CSV file:
2017-03|"מחוז ש""ת דן"|בני 18 עד 24|זכר|א. לא למד|ב. קלה|יהודים|ב. בין 31 ל-180 יום||הנדסאים, טכנאים, סוכנים ובעלי משלח יד נלווה|1|0|0|1|0|0
2017-03|"מחוז ש""ת דן"|בני 18 עד 24|זכר|א. לא למד|ג. בינונית|יהודים|ב. בין 31 ל-180 יום||עובדי מכירות ושירותים|1|0|0|1|0|0
Thanks!

Import array type into HANA?

I am importing data into SAP HANA using CSV files.
When I try to import a column which has an array type, it results in the following error:
ARRAY type is not compatible with PARAMETER TYPE
For example:
CREATE COLUMN TABLE "SCHEMA"."TABLE"
( "ID" INT,
"SUBJECTS" INT ARRAY)
The above query creates the table and when I run
INSERT INTO "SCHEMA"."TABLE" VALUES (1,ARRAY(1,2,3))
It inserts successfully into the HANA database.
But when I try
INSERT INTO "SCHEMA"."TABLE" VALUES (1,"{1,2,3}")
It does not work. So how can I import the array values in the CSV file into the array column in the HANA database?
Array storage types can currently only be created by using the ARRAY() function.
You could construct the INSERT statement in a loop but you still need to construct the ARRAY() call for every record.
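For illustration, the generated statements would simply repeat the pattern from the question for every row of the CSV file (the values here are made up):
INSERT INTO "SCHEMA"."TABLE" VALUES (1, ARRAY(1, 2, 3));
INSERT INTO "SCHEMA"."TABLE" VALUES (2, ARRAY(4, 5, 6));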
Ok, here's the example you asked for.
By now you should understand that there is no simple IMPORT command that would automatically insert arrays into a HANA table.
That leaves you with two options as I see it:
1. You write a loader program that reads your CSV file, parses the array data {..., ..., ...} and makes INSERT statements with ARRAY() functions out of it.
or
2. You load the data in two steps:
2.1 Load the data from the CSV as-is and put the array data into a CLOB column.
2.2 Add the array columns to the table and run a loop that takes the CLOB data, replaces the curly brackets with normal brackets and creates a dynamic SQL statement.
Like so:
create column table array_imp_demo as (
select owner_name , object_type, to_clob( '{'|| string_agg ( object_oid, ', ') || '}' )array_text
from ownership
group by owner_name, object_type);
select top 10 * from array_imp_demo;
/*
OWNER_NAME OBJECT_TYPE ARRAY_TEXT
SYS TABLE {142540, 142631, 142262, 133562, 185300, 142388, 133718, 142872, 133267, 133913, 134330, 143063, 133386, 134042, 142097, 142556, 142688, 142284, 133577, 185357, 142409, 133745, 142902, 133308, 133948, 134359, 143099, 133411, 134135, 142118, 142578, 142762, 142306, 133604, 142429, 133764, 142928, 133322, 133966, 134383, 143120, 133443, 134193, 142151, 142587, 142780, 142327, 133642, 142448, 133805, 142967, 185407, 133334, 133988, 134624, 143134, 133455, 134205, 142173, 142510, 142606, 142798, 142236, 133523, 142359, 133663, 142465, 133825, 142832, 133175, 133866, 134269, 143005, 133354, 134012, 134646, 143148, 133497, 134231, 142195, 142526, 142628, 142816, 142259, 133551, 142382, 133700, 142493, 133855, 142862, 133235, 133904, 134309, 143051, 133373, 134029, 142082, 143306, 133513, 134255, 142216, 142553, 142684, 142281, 133572, 185330, 142394, 133738, 142892, 133299, 133929, 134351, 143080, 133406, 134117, 142115, 142576, 142711, 142303, 133596, 142414, 133756, 142922, 185399, 133319, 133958, 134368, 143115,
SYSTEM TABLE {154821, 146065, 146057, 173960, 174000, 173983, 144132, 173970}
_SYS_STATISTICS TABLE {151968, 146993, 147245, 152026, 147057, 147023, 151822, 147175, 147206, 151275, 151903, 147198, 147241, 151326, 151798, 152010, 147016, 147039, 147068, 151804, 151810, 147002, 147202, 151264, 151850, 147186, 147114, 147228, 151300, 151792, 151992, 147030, 147062, 151840, 147181, 147210, 151289, 151754, 149419, 150274, 147791, 150574, 147992, 149721, 150672, 148132, 149894, 147042, 148434, 149059, 150071, 147518, 148687, 149271, 150221, 150877, 147716, 148913, 150388, 147957, 149675, 150634, 148102, 149829, 148267, 148979, 149976, 147494, 148663, 151107, 149224, 150178, 147667, 148855, 149532, 150306, 147925, 149602, 150598, 148080, 149792, 148179, 149926, 147417, 148547, 149186, 150107, 147568, 148804}
_SYS_EPM TABLE {143391, 143427, 143354, 143385, 143411, 143376, 143398, 143367, 143414, 143344, 175755}
_SYS_REPO TABLE {145086, 151368, 171680, 145095, 152081, 169183, 151443, 149941, 154366, 143985, 145116, 152104, 151496, 169029, 177728, 179047, 145065, 169760, 178767, 151659, 169112, 169258, 153736, 177757, 174923, 145074, 150726, 151697, 169133, 178956, 145083, 169171, 168992, 145092, 177665, 169178, 151378, 169271, 178881, 174911, 154128, 143980, 145101, 152098, 151481, 177720, 152504, 145062, 151570, 169102, 154058, 145071, 170733, 151687, 169130, 145080, 171629, 169166, 178846, 145089, 149588, 151373, 177614, 143976, 145098, 152087, 151458, 149955, 178907, 154386, 145059, 169605, 151529, 169035, 178579, 151176, 179178, 145068, 150709, 151670, 169124, 174905, 177778, 154244, 145077, 170883, 169158, 144072, 152681, 144285, 154415, 144620, 145268, 168884, 144895, 143512, 151428, 168774, 143750, 152337, 168558, 144114, 149559, 152719, 144327, 144674, 145508, 168924, 144939, 143578, 152135, 143793, 152392, 168587, 144151, 152753, 144370, 144720, 145722, 168960, 144990, 143626, 152174, 143832, 152435, 168620, 144188, 152785,
_SYS_TASK TABLE {146851, 146847, 146856, 146231, 146143, 146839, 146854, 146834, 146430, 146464, 146167, 146505, 146205, 146257, 146384, 146313, 146420, 146356, 146454, 146155, 146495, 146193, 146525, 146244, 146368, 146296, 146410, 146346, 146443, 146148, 146484, 146181, 146517, 146234, 146275, 146395, 146325}
_SYS_AFL TABLE {177209, 176741, 176522, 177243, 176777, 176692, 176201, 177294, 176929, 177328, 176967, 177383, 177105, 177015, 176826, 177143, 177056, 176866, 177215, 176748, 176550, 176474, 177249, 176784, 176699, 176218, 146871, 177300, 176935, 177335, 176973, 177389, 177111, 177021, 176835, 177156, 177062, 176883, 185427, 177221, 176755, 176572, 176487, 177261, 176790, 176706, 176398, 177306, 176941, 177341, 176979, 177397, 177119, 177033, 176843, 177162, 177069, 176889, 177228, 176762, 176589, 177267, 176799, 176713, 176416, 177313, 176953, 177348, 176991, 177404, 177127, 177040, 176849, 177173, 177075, 176895, 177195, 176732, 176507, 177234, 176768, 177274, 176805, 176720, 176448, 177285, 176921, 177319, 176959, 177354, 176997, 177375, 177095, 177007, 176818, 177411, 177133, 177047, 176858, 177179, 177081, 176911, 177201, 176738, 176519, 177241, 176775, 176690, 176196, 177280, 176814, 176727, 176464, 177291, 176927, 177326, 176965, 177360, 177003, 177381, 177103, 177013, 176824, 177141, 177053, 176864, 177191, 177091,
_SYS_XB TABLE {146971, 146957}
DEVDUDE TABLE {167121, 165374, 182658, 156616, 173111, 181568, 174901, 183413, 184432, 183498, 183470, 184464, 155821, 173102, 183613, 184495, 155857, 166744, 180454, 184547, 156234, 172096, 166765, 165548, 184649, 183399, 184357, 184577, 183477, 181594, 183537, 181572, 167201, 184685, 185467, 183406, 184422, 184610, 183491, 155842, 172923, 157723, 182636, 167895, 183463, 184454, 183505, 165542, 183606, 184488, 155849, 172749, 157626, 184527, 183449, 166759, 184627, 182827, 184347, 184568, 157619, 172118, 183530, 181556, 167137, 184670, 182642, 184411, 184598, 183484, 155835, 183456, 183593, 181584, 167328, 183421, 184443, 183586, 184474, 155828, 166392, 183620, 184517, 183435, 183442, 183512, 166753, 184557, 156598, 172106, 183523, 166771, 166568, 184660, 182630, 184381, 184587, 183428, 157681, 182649, 167264, 168236}
ADMIN TABLE {158520, 158925, 158982, 158492, 158571, 158583, 158560, 158541, 158744}
*/
Ok, let's just assume that this is the data that you managed to load from your CSV file into a table and that the ARRAY_DATA column is a big CLOB.
Now, the data is in the table, but you need to put it into an ARRAY column.
alter table array_imp_demo add (array_data integer array);
The inconvenient part follows now: creating a new UPDATE command for each record to transform the CLOB data into a proper ARRAY() function call.
Be aware that this update command only works correctly because (OWNER_NAME, OBJECT_TYPE) are used as the key in this case.
do begin
declare vsqlmain nvarchar (50) :='UPDATE array_imp_demo set array_data = ARRAY ';
declare vsql nvarchar(5000);
declare cursor c_upd for
select OWNER_NAME, OBJECT_TYPE,
replace (replace (ARRAY_TEXT, '{', '(' ), '}', ')' ) array_data
from array_imp_demo;
for cur_row as c_upd do
vsql = :vsqlmain || cur_row.array_data;
vsql = :vsql || ' WHERE owner_name = ''' || cur_row.owner_name || '''';
vsql = :vsql || ' AND object_type = ''' || cur_row.object_type || '''';
exec :vsql;
end for;
end;
select * from array_imp_demo;
OWNER_NAME OBJECT_TYPE ARRAY_TEXT ARRAY_DATA
SYS TABLE {142540, 142631, 142262,... 142540, 142631, 142262,...
SYSTEM TABLE {154821, 146065, 146057,... 154821, 146065, 146057,...
_SYS_STATISTICS TABLE {151968, 146993, 147245,... 151968, 146993, 147245,...
_SYS_EPM TABLE {143391, 143427, 143354,... 143391, 143427, 143354,...
_SYS_REPO TABLE {145086, 151368, 171680,... 145086, 151368, 171680,...
After that you can drop the CLOB column.
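A minimal sketch of that cleanup, assuming the intermediate CLOB column is still named ARRAY_TEXT as above:
alter table array_imp_demo drop (array_text);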
Ok, that's about it from my side.
Have you tried using the CTL (control file) method to import the data? I hope that might help.

BCP/ Bulk Insert Fails (tab delimited file)

I have been trying to import data (tab-delimited) into SQL Server. The source data is exported from IBM Cognos. Data can be downloaded from: sample data
I have tried BCP / BULK INSERT, but it did not help. The original data file contains a header row (which needs to be skipped).
==================================
Schema:
CREATE TABLE [dbo].[DIM_Assessment](
[QueryType] [nvarchar](4000) NULL,
[QueryDate] [nvarchar](4000) NULL,
[APUID] [nvarchar](4000) NULL,
[AssessmentID] [nvarchar](4000) NULL,
[ICDCode] [nvarchar](4000) NULL,
[ICDName] [nvarchar](4000) NULL,
[LoadDate] [nvarchar](4000) NULL
) ON [PRIMARY]
GO
=============================
Format File generated using the following command
bcp [dbname].dbo.dim_assessment format nul -c -f C:\config\dim_assessment.Fmt -S <IP> -U sa -P Pwd
Content of the format file:
11.0
7
1 SQLCHAR 0 8000 "\t" 1 QueryType SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 8000 "\t" 2 QueryDate SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 8000 "\t" 3 APUID SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 8000 "\t" 4 AssessmentID SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR 0 8000 "\t" 5 ICDCode SQL_Latin1_General_CP1_CI_AS
6 SQLCHAR 0 8000 "\t" 6 ICDName SQL_Latin1_General_CP1_CI_AS
7 SQLCHAR 0 8000 "\r\n" 7 LoadDate SQL_Latin1_General_CP1_CI_AS
=============================
I tried importing the data using BCP / BULK INSERT; however, none of them worked.
bcp [dbname].dbo.dim_assessment IN C:\dim_assessment.dat -f C:\config\dim_assessment.Fmt -S <IP> -U sa -P Pwd
BULK INSERT dim_assessment FROM '\\dbserver\DIM_Assessment.dat'
WITH (
DATAFILETYPE = 'char',
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\r\n'
);
GO
Thank you in advance for your help!
Your input file is in a terrible format.
Your format file and your BULK INSERT command both state that the end of a row should be a carriage return and line feed combination, and that there are seven columns of data. However, if you open your CSV file in Notepad you will quickly see that the carriage returns and line feeds are not observed correctly in Windows (meaning they must be something other than precisely \r\n). You can also see that there aren't actually seven columns of data, but five:
QueryType QueryDate APUID AssessmentID ICDCode ICDName LoadDate
PPIC 2013-11-20 10:23:14 11431 10963 Tremors
PPIC 2013-11-20 10:23:14 11431 11299 THUMB PAIN
PPIC 2013-11-20 10:23:14 11431 11348 Environmental allergies
...
Just looking at it visually you can tell it isn't right, and you need to get a better source file before throwing it over the wall at SQL Server and expecting it to handle it smoothly.
I just saved your file as .csv and bulk inserted it with the following statement.
BULK INSERT dim_assessment FROM 'C:\Blabla\TestFile.csv'
WITH (
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
);
GO
Returned Message
(22587 row(s) affected)
Loaded data: just notice that some data from the ICDName column has overflowed into the LoadDate column. Use the | pipe character as the delimiter instead and run the same bulk insert statement with FIELDTERMINATOR = '|' (sketched below), and happy days.
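A minimal sketch of that variant, assuming the source file has been re-exported with | as the field separator (path reused from above):
BULK INSERT dim_assessment FROM 'C:\Blabla\TestFile.csv'
WITH (
FIRSTROW = 2,
FIELDTERMINATOR = '|',
ROWTERMINATOR = '\n'
);
GO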
Opening the file via Excel shows the following:
There are indeed 7 column headers
Only the first six of them are populated
Columns 1, 2 and 3 hold identical values
There is some confusing data, where the fifth column can be either empty, or filled with numbers, or filled with text.
I guess that, in these conditions, bulk insert might not work properly. As Excel seems to manage your file in quite a clean way, you should think about an extra step, from CSV to Excel and then to your database.
Ok, so, this was a seemingly simple task: push delimited data from a flat file to SQL Server. I thought BCP was the way to go (I used it earlier and was successful).
A quick rundown of what was suggested:
a. fix the source file
b. saving source data in native excel format
c. saving source data as pipe-delimited data
I tried all the options; they added multiple steps to my process, but it was doable.
Then I stumbled upon the Invoke-Sqlcmd and Import-Csv cmdlets in PowerShell. It turns out I can import the data using PowerShell directly. It is a bit slow at this time, but I can live with that for now.
$DATA=IMPORT-CSV dim_assessment.CSV -Delimiter "`t"
FOREACH ($LINE in $DATA)
{
$QueryType="`'"+$Line.QueryType+"`'"
$QueryDate="`'"+$Line.QueryDate+"`'"
$APUID="`'"+$Line.APUID+"`'"
$AssessmentID="`'"+$Line.AssessmentID+"`'"
$ICDCode="`'"+$Line.ICDCode+"`'"
$ICDName=$Line.ICDName
$ICDName = $ICDName.replace("'","''")
$ICDName="`'"+$ICDName+"`'"
$LoadDate="`'"+$Line.LoadDate+"`'"
$SQLHEADER="INSERT INTO [dim_assessment] ([QueryType],[QueryDate],[APUID],[AssessmentID],[ICDCode],[ICDName],[LoadDate])"
$SQLVALUES="VALUES ($QueryType,$QueryDate,$APUID,$AssessmentID,$ICDCode,$ICDName,$LoadDate)"
$SQLQUERY=$SQLHEADER+$SQLVALUES
Invoke-Sqlcmd -Query $SQLQuery -ServerInstance HA -U sa -P Pwd
}
Thanks for all your help!