LOAD DATA INFILE (*.csv) - ignore empty cells - sql

I'm about to import a large (500 MB) *.csv file to a MySQL database.
This is what I have so far:
LOAD DATA INFILE '<file>'
REPLACE
INTO TABLE <table-name>
FIELDS
TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
IGNORE 1 LINES ( #Header
<column-name1>,
<column-name2>,
...
);
I have a problem with one of the columns (its data type is int) - I get this error message:
Error Code: 1366 Incorrect integer value: ' ' for column at row
I looked at this line in the *.csv file. The cell that causes the error contains just a single space (like this: ...; ;...).
How can I tell SQL to ignore whitespace in this column?
As the *.csv file is very big, and I have to import even bigger ones afterwards, I'd like to avoid editing the *.csv file; I'm looking for a SQL solution.

Add a SET clause like so:
LOAD DATA INFILE 'file.txt'
INTO TABLE t1
(column1, @var1)
SET column2 = @var1/100;
You need to replace the @var1/100 with an expression that handles the space and converts it to NULL or 0 or 42... whatever suits your data.
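For example, a minimal sketch of such an expression (IF is a standard MySQL function; treating the blank as 0 is an arbitrary choice):
LOAD DATA INFILE 'file.txt'
INTO TABLE t1
(column1, @var1)
SET column2 = IF(@var1 = ' ', 0, @var1); -- a lone space becomes 0, everything else is loaded as-is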

This answer was originally included in the question as an edit by @speendo; I have converted it into a proper answer.
The solution is:
LOAD DATA INFILE '<file>'
REPLACE
INTO TABLE <table-name>
FIELDS
TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
IGNORE 1 LINES ( #Header
<column-name1>,
<column-name2>,
@var1, #the variable that causes the problem
...
)
SET <column-name-of-problematic-column> = CASE
WHEN @var1 = ' ' THEN NULL
ELSE @var1
END
;
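If the blanks should simply become NULL, a more compact variant of the same idea is possible (a sketch using MySQL's TRIM and NULLIF; it assumes the target column is nullable and also catches fields padded with several spaces):
LOAD DATA INFILE '<file>'
REPLACE
INTO TABLE <table-name>
FIELDS
TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
IGNORE 1 LINES
(<column-name1>, <column-name2>, @var1)
SET <column-name-of-problematic-column> = NULLIF(TRIM(@var1), '');
-- TRIM strips the spaces, NULLIF turns the resulting empty string into NULL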

Related

Remove a special character that exists between 2 characters

I have the following string:
6103951001#136,00#S0#0#99999999#8000010000#10.12.2019#
31.10.2019#"MATZOURAKIS IOANNISMROA118#OSPh"#99470##APE A 54226#K
What I want is to delete the HORIZONTAL_TAB special character (displayed as #) that appears between the double quotes; here is the relevant part of the string: "MATZOURAKIS IOANNISMROA118#OSPh"
How can I do this?
Thanks
PS. I am using the following to upload the data from a tab-delimited text file:
data: data_table type standard table of char255,
      wa_data_table like line of data_table,
      lv_file type string.
lv_file = p_file.
cl_gui_frontend_services=>gui_upload(
  exporting
    filename = lv_file
    filetype = 'ASC'
  changing
    data_tab = data_table ).
Now I am doing the following in order to catch the problem
loop at data_table into wa_data_table.
find all occurrences of '"' in wa_data_table match count lv_count.
if sy-subrc = 0 and lv_count = 2.
* REPLACE ALL OCCURRENCES OF REGEX
* '(#(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$))' IN wa_data_table WITH ' '.
split wa_data_table at '"'
into split_data1 split_data2 split_data3.
replace all occurrences of cl_abap_char_utilities=>horizontal_tab
in split_data2 with ' '.
concatenate split_data1 split_data2 split_data3
into wa_data_table.
endif.
endloop.
I think we must handle cl_abap_char_utilities=>horizontal_tab not as the character # but in another way.
You can use the following in ABAP to find this character and replace it:
data : lv_test type string VALUE '6103951001#136,00#S0#0#99999999#8000010000#10.12.2019# 31.10.2019#"MATZOURAKIS IOANNISMROA118#OSPh"#99470##APE A 54226#K'.
REPLACE ALL OCCURRENCES OF REGEX '(#(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$))' IN lv_test WITH ''.
write lv_test.
This is the correct solution for this problem. The lookahead matches a tab only when an odd number of double quotes follows it before the end of the line, i.e. only tabs inside a quoted field; the second statement then strips the quote characters themselves.
replace all occurrences of regex
'(\t(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$))'
in table data_table with ' '.
replace all occurrences of regex '["]' in table data_table with ''.

Replace Function - Handling single quotes and forward slashes in SQL SERVER 2008

I want to replace some data from my database where single quotes and slashes are present.
The line below is exactly how it appears in the database and I only want to remove 'F/D', from the record.
('P/P','F/D','DFC','DTP')
I've been using variations of
UPDATE tablename SET columnname = REPLACE(columnname, '''F/D,''', '')
WHERE RECORDID = XXXXX
I've also been using variations of
UPDATE tablename SET columnname = REPLACE(columnname, 'F/D,', '')
WHERE RECORDID = XXXXX
Seems like it should be a simple fix but I haven't had any luck yet - all suggestions are appreciated.
The reason yours doesn't work is that you aren't including the quotes correctly. You are looking for F/D, and 'F/D,' while in your data it is 'F/D',.
If it's simply 'F/D' you want removed from all values, then you also need to remove a comma and the quotes. This method removes 'F/D' and then any double commas (in case 'F/D' is in the middle of the string).
declare @var varchar(64) = '(''P/P'',''F/D'',''DFC'',''DTP'')'
select replace(replace(@var,'''F/D''',''),',,',',')
--update tablename
--set columnname = replace(replace(columnname,'''F/D''',''),',,',',')
--where RECORDID = 1324
If you want to replace the second element in the string, here is a way:
select
@var
--find the location of the first comma
,charindex(',',@var,0)
--find the location of the second comma
,charindex(',',@var,charindex(',',@var) + 1)
--Put it all together, using STUFF to replace the values between this range with nothing
,stuff(@var,charindex(',',@var,0),charindex(',',@var,charindex(',',@var) + 1) - charindex(',',@var,0),'')
Your first version should work fine if the comma is in the right place:
UPDATE tablename
SET columnname = REPLACE(columnname, '''F/D'',', '')
WHERE RECORDID = XXXXX;
Note that this will not replace 'F/D' if it is the first or last element in the value. If that is an issue, I would suggest that you ask another question.
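That said, if 'F/D' can appear in any position, a sketch along these lines covers the middle, first, last, and only-element cases (hypothetical; it assumes the value is always a parenthesised, comma-separated list like the sample):
UPDATE tablename
SET columnname = REPLACE(REPLACE(REPLACE(columnname,
                     ',''F/D''', ''),  -- middle or last element: remove with its leading comma
                     '''F/D'',', ''),  -- first element: remove with its trailing comma
                     '''F/D''', '')    -- only element
WHERE RECORDID = XXXXX;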

SQL*Loader - Handling formatted dates

I'm developing a solution where I'll receive a spool file and need to insert it into a table.
I always use SQL*Loader and it fits well, but I've never used it with dates. I'm getting the error shown below.
Control File
OPTIONS (ERRORS=999999999, ROWS=999999999)
load data
infile 'spool.csv'
append
into table A_CONTROL
fields terminated by ","
TRAILING NULLCOLS
(
AStatus,
ASystem,
ADate,
AUser
)
spool.csv
foo,bar,2015/01/12 13:22:21,User
But when I run the loader, I get this error:
Column Name Position Len Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
AStatus FIRST * , CHARACTER
ASystem NEXT * , CHARACTER
ADate NEXT * , CHARACTER
AUser NEXT * , CHARACTER
Record 1: Rejected - Error on table A_CONTROL, column ADate.
ORA-01861: literal does not match format string
Table A_CONTROL:
0 Rows successfully loaded.
1 Row not loaded due to data errors.
0 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
Convert the string to date for insertion.
OPTIONS (ERRORS=999999999, ROWS=999999999)
load data
infile 'spool.csv'
append
into table A_CONTROL
fields terminated by ","
TRAILING NULLCOLS
(
AStatus,
ASystem,
ADate "TO_DATE(:ADate,'YYYY/MM/DD HH24:MI:SS')",
AUser
)
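Equivalently, the conversion can be declared as a field datatype instead of a SQL expression; DATE with a format mask is standard SQL*Loader syntax (sketch of the field list only):
(
AStatus,
ASystem,
ADate DATE "YYYY/MM/DD HH24:MI:SS",
AUser
)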

Import csv date changes to 0000-00-00

LOAD DATA INFILE 'filename.csv' INTO TABLE table_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n' IGNORE 1 LINES
(Date,col2,col3,col4,col5,col6,col7,@dummy_variable)
Set dummy_variable = 0
The file loads fine, but the date column reads 0000-00-00.
The date in the csv is in the style dd/mm/yyyy and can't be changed in the csv file.
You need to convert the date to MySQL format with STR_TO_DATE.
Something like below should work.
LOAD DATA INFILE 'filename.csv' INTO TABLE table_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n' IGNORE 1 LINES
(@myDate,col2,col3,col4,col5,col6,col7,@dummy_variable)
Set dummy_variable = 0, Date = str_to_date(@myDate,'%d/%m/%Y')
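If some rows may have an empty date field, a variant of the same SET clause avoids the zero-date problem for those rows as well (a sketch; NULLIF is a standard MySQL function):
Set dummy_variable = 0, Date = str_to_date(NULLIF(@myDate, ''), '%d/%m/%Y')
-- NULLIF turns an empty field into NULL, so the column is set to NULL instead of 0000-00-00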

Detect \n character saved in SQLite table?

I designed a table with a column whose data contains the \n character (as a separator; I used this instead of a comma or anything else). It must be saving the \n characters OK, because after loading the table into a DataTable object I can split the values into arrays of strings with the separator '\n', like this:
DataTable dt = LoadTable("myTableName");
DataRow dr = dt.Rows[0]; //suppose this row has the data with \n character.
string[] s = dr["myColumn"].ToString().Split(new char[]{'\n'}, StringSplitOptions.RemoveEmptyEntries);//This gives result as I expect, e.g an array of 2 or 3 strings depending on what I saved before.
That means '\n' does exist in my table column. But when I tried selecting only the rows that contain the \n character in myColumn, no rows were returned, like this:
--use charindex
SELECT * FROM MyTable WHERE CHARINDEX('\n',MyColumn,0) > 0
--use like
SELECT * FROM MyTable WHERE MyColumn LIKE '%\n%'
I wonder if my queries are wrong?
I've also tested with both '\r\n' and '\r' but the result was the same.
How can I detect which rows contain the '\n' character in my table? I need this in order to select the rows I want (this is by design, since I chose '\n' as the separator).
Thank you very much in advance!
Since \n is the ASCII linefeed character try this:
SELECT *
FROM MyTable
WHERE MyColumn LIKE '%' || X'0A' || '%'
Sorry this is just a guess; I don't use SQLite myself.
Maybe you should just be looking for carriage returns if you aren't storing the "\n" literal in the field. Something like
SELECT *
FROM table
WHERE column LIKE '%
%'
or select * from table where column like '%'+char(13)+'%' or column like '%'+char(10)+'%'
(Not sure if char(13) and char(10) work in SQLite.)
UPDATED: I just found someone's solution here. They recommend replacing the carriage returns, so if you want to strip them you could
update yourtable set yourCol = replace(yourcol, '
', ' ');
The following should do it for you
SELECT *
FROM your_table
WHERE your_column LIKE '%' + CHAR(10) + '%'
If you want to test for carriage return use CHAR(13) instead or combine them.
I've found a solution myself. There are few ways to convert an ASCII code to a character in SQLite at the moment (the CHAR function is not supported, and using '\n' or '\r' directly doesn't work). But we can convert using the CAST function and a hex string (specified by prefixing the string with X or x) in SQLite, like this:
-- use CHARINDEX
SELECT * FROM MyTable WHERE CHARINDEX(CAST(x'0A' AS text),MyColumn,0) > 0
-- use LIKE
SELECT * FROM MyTable WHERE MyColumn LIKE '%' || CAST(x'0A' AS text) || '%'
The hex string '0A' is ASCII code 10, the line feed '\n'. I also tried '0D' (ASCII 13, '\r'), but it didn't match; evidently the separator is stored as a plain line feed rather than a carriage return.
Hope this helps others! Thanks!
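As a quick diagnostic, SQLite's built-in hex() function shows which byte is actually stored (using the table and column names from the question):
SELECT hex(MyColumn) FROM MyTable LIMIT 5;
-- a stored line feed appears as 0A in the output, a carriage return as 0D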