Selecting all rows from Informix table containing some null columns

I am using Perl DBD::ODBC to connect to an Informix database whose schema I did not previously know. I have successfully discovered the schema by querying the system catalog (the tabname and colname columns of systables and syscolumns). I am now iterating over each of those tables, extracting everything in them to load into another model. What I am finding is that rows with null columns stop the select query: no further rows are returned. For example, if a table looked like this, with an optionally null lastseen column (of whatever data type):
ID  username  lastseen
--  --------  ----------
1   joe       1234567890
2   bob       1098765432
3   mary
4   jane      1246803579
then select * from mytable (or specifying all column names individually) stops returning rows at the mary row.
I do have this working by using NVL as follows:
select nvl(id, ''), nvl(username, ''), nvl(lastseen, '') from mytable
And that's okay, but my question is: Is there a simpler Informix syntax to allow nulls to come into my result set, something as simple as NULLS OK or something that I am missing? Alternatively, some database handle option to allow the same?
Here is an example of my Perl with the nvl() hack, in case it's relevant:
my %tables = (
    users => [ qw(id username lastseen) ],
);
foreach my $tbl (sort keys %tables) {
    # Wrap every column in nvl(col, '') so NULLs come back as empty strings
    my $sql = 'select ' . join(',', map { "nvl($_, '')" } @{ $tables{$tbl} }) . " from $tbl";
    # sql like: select nvl(a, ''), nvl(b, ''), ...
    my $sth = $dbh->prepare($sql);
    $sth->execute;
    while (defined(my $row = $sth->fetchrow_arrayref)) {
        # do ETL stuff with $row
    }
}
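For comparison, here is a minimal Python sketch of the same loop. Python's DB-API maps NULL to None much as DBI maps it to undef. It assumes an in-memory SQLite database standing in for Informix and uses COALESCE, the standard-SQL counterpart of NVL; build_select is a hypothetical helper. It shows what a driver that handles NULL correctly should return, not how DBD::ODBC actually behaves against Informix.

```python
import sqlite3

def build_select(table, columns, wrap_nulls=True):
    # Mirror the Perl loop: optionally wrap each column in COALESCE(col, ''),
    # the standard-SQL counterpart of Informix NVL(col, '').
    # Names are assumed to come from the system catalog, never from user input,
    # because identifiers cannot be passed as bound parameters.
    cols = [f"COALESCE({c}, '')" if wrap_nulls else c for c in columns]
    return f"SELECT {', '.join(cols)} FROM {table}"

# In-memory SQLite stands in for Informix (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, username TEXT, lastseen INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "joe", 1234567890), (2, "bob", 1098765432),
     (3, "mary", None), (4, "jane", 1246803579)],
)

tables = {"users": ["id", "username", "lastseen"]}
for tbl in sorted(tables):
    # Plain select: the NULL arrives as None and the remaining rows still follow.
    plain = conn.execute(build_select(tbl, tables[tbl], wrap_nulls=False)).fetchall()
    # Wrapped select: the NULL arrives as '' instead.
    wrapped = conn.execute(build_select(tbl, tables[tbl])).fetchall()
    print(plain)
    print(wrapped)
```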

After a failed attempt at installing DBD::Informix, I came back to this and found that, for some reason, enabling LongTruncOk on the database handle allowed all rows, including those with null columns, to be selected. I don't imagine this is the root of the issue, but it worked here.
However, this solution seems to have collided with an unrelated tweak to the locale settings to support non-ASCII characters. I added DB_LOCALE=en_us.utf8 and CLIENT_LOCALE=en_us.utf8 to my connection string to prevent selects from similarly breaking on non-ASCII characters (i.e., in a result set of, say, 500 rows where the 300th row had a non-ASCII character, the trailing 200 rows would not be returned). With the locales set this way and LongTruncOk enabled on the dbh, all rows are returned (without the NVL hack), but null columns contain bytes left over from previous rows, in no pattern that is obvious to me. When I leave the locale settings out of the connection string and set LongTruncOk, rows with null columns are selected correctly, but rows with UTF-8 characters break.
So if you don't have a charset issue, LongTruncOk alone may work for you. For my purposes, I have had to keep using the NVL workaround for nulls and specify the locales for the non-ASCII characters.

Check this, in particular the section "NULL in Perl". It seems that with this driver there is no simple way to handle this problem.


Query to ignore rows which have non hex values within field

Initial situation
I have a relatively large table (about 700,000 records) in which an nvarchar field "MediaID" mostly contains media IDs in proper hexadecimal notation (as it should).
Within my "sequential" query (each query depends on the output of the previous one; this is all pure T-SQL) I have to convert these hexadecimal values into decimal bigint values in order to do further calculations and filtering on them in the subsequent queries.
--> So far, no problem. The "sequential" query works fine.
Problem
Unfortunately, some of these media IDs contain non-hex characters, most probably because of typing errors by the people who entered them or import errors from the previous business system.
Because of these non-hex characters, the whole query (of course) fails when the conversion hits an error.
For my current purpose, such rows must be skipped/ignored, as they are clearly wrong and cannot be used (no media/data carriers in use with the current business system can have non-hex IDs).
Manual editing of the data is not an option, as there are too many errors and it is not clear what the data should be replaced with.
Challenge
To create a query that returns only records with a valid hex value in the MediaID field.
(Unfortunately, my SQL skills are not enough to create the above query. Your help is highly appreciated.)
The relevant section of the larger query looks like this (xxxx is where your help comes in :-))
select
pureMediaID
, mediaID
, CUSTOMERID
,CONTRACT_CUSTOMERID
from
(
select concat('0x', Replace(Ltrim(Replace(mediaID, '0', ' ')), ' ', '0')) AS pureMediaID
--, CUSTOMERID
, *
from M_T_CONTRACT_CUSTOMERS
where mediaID is not null
and mediaID like '0%'
and xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
) as inner1
EDIT: As requested, here are some good and some bad values:
Good:
4335463357
4335459809
1426427996
4335463509
4335515039
4335465134
4427370396
4335415661
4427369036
4335419089
004BB03433
004e7cf9c6
00BD23133
00EE13D8C1
00CCB5522C
00C46522C
00dbbe3433
Bad:
4564589+
AB6B8BFC.8
7B498DFCnm
DB218DFChb
d<tgfh8CFC
CB9E8AFCzj
B458DFCjhl
rytzju8DFC
BFCtdsjshj
DB9888FCgf
9BC08CFCyx
EB198DFCzj
4B628CFChj
7B2B8DFCgg
After I upgraded the compatibility level of the database to SQL Server 2016 (it was below 2012 before), I could use TRY_CONVERT with the same syntax as the original CONVERT function, as donPablo pointed out. With that, the query ran all the way through, and every MediaID that is not a valid hex value is converted to NULL. Really, really nice.
Exactly what I needed.
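The behavior described above (bad input becomes NULL instead of aborting the run) can be sketched outside SQL Server as well. This is a minimal Python analogue, not T-SQL: try_hex_to_bigint is a hypothetical helper, and treating values above the signed 64-bit maximum as invalid is an assumption about how a bigint target should be handled.

```python
import re

HEX_RE = re.compile(r"[0-9A-Fa-f]+")
BIGINT_MAX = 2**63 - 1  # largest value a signed 64-bit bigint can hold

def try_hex_to_bigint(media_id):
    """Return the decimal value of a hex MediaID, or None if it is not valid hex.

    A rough analogue of TRY_CONVERT: bad input yields None (NULL)
    instead of an error that aborts the whole run.
    """
    if media_id is None or not HEX_RE.fullmatch(media_id):
        # int(s, 16) on its own is too lenient: it also accepts a '0x' prefix,
        # a leading sign, surrounding whitespace and underscores.
        return None
    value = int(media_id, 16)
    # Treat values that would overflow a bigint as invalid as well.
    return value if value <= BIGINT_MAX else None

for media_id in ["004BB03433", "00dbbe3433", "4564589+", "7B498DFCnm"]:
    print(media_id, try_hex_to_bigint(media_id))
```

The regex check comes first because Python's int() accepts inputs such as "+45" or "0x45" that are not plain hex IDs.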
Unfortunately, the solution from ALICE... didn't work for me, as it (strangely) also returned records that had the "+" character in them.
Edit: The comment Alice... added, where you create a calculated field like this:
CASE WHEN "KEY" LIKE '%[^0-9A-F]%' THEN 0 ELSE 1 end as xyz
and then filter in the next query like this:
where xyz = 1
also works with SQL instances with a compatibility level below SQL 2012.
A great addition for people who still have to work with older SQL instances.
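An option (although not ideal in terms of performance) is to check the characters in MediaID with a CASE expression and a LIKE pattern (T-SQL LIKE supports character sets such as [0-9A-F], not full regular expressions).
Hexadecimal values can contain only the digits 0-9 and the letters A-F, so flag any value that contains a character outside that set (note the ^, which negates the set; without it, the pattern matches any value containing at least one hex character):
CASE WHEN MediaID LIKE '%[^0-9A-F]%' THEN 0 ELSE 1 END
I would recommend writing a function that first checks whether a MediaID is hexadecimal, and then running the conversion query only on the values that pass.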
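Why "+" values slipped through is easy to reproduce: a pattern that looks for one hex character accepts almost anything, while the negated set rejects any value with a foreign character. This Python sketch uses regular expressions to mimic the two T-SQL LIKE patterns; the function names are hypothetical and the regexes are only equivalents of the LIKE logic, not T-SQL.

```python
import re

def buggy_is_hex(media_id):
    # Same logic as: MediaID LIKE '%[0-9A-F]%'
    # True as soon as the value contains AT LEAST ONE hex character.
    return re.search(r"[0-9A-Fa-f]", media_id) is not None

def fixed_is_hex(media_id):
    # Same logic as: MediaID NOT LIKE '%[^0-9A-F]%'
    # True only if the value contains NO character outside 0-9 / A-F.
    # Lowercase a-f is listed explicitly here; the T-SQL version relies on a
    # case-insensitive collation for that. Note that '' also passes.
    return re.search(r"[^0-9A-Fa-f]", media_id) is None

for media_id in ["004e7cf9c6", "4564589+", "AB6B8BFC.8"]:
    print(media_id, buggy_is_hex(media_id), fixed_is_hex(media_id))
```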

Can 2 character length variables cause SQL injection vulnerability?

I am taking text input from the user and converting it into 2-character strings (2-grams).
For example
RX480 becomes
"rx","x4","48","80"
Now, if I query the server directly like below, can a user somehow perform SQL injection?
select *
from myTable
where myVariable in ('rx', 'x4', '48', '80')
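The same IN lookup can be written so the question of length never arises: generate one placeholder per bigram and bind the values. This is a minimal sketch in Python with an in-memory SQLite database; the table and column names come from the question, while the sample rows and the helpers bigrams and find_matches are hypothetical.

```python
import sqlite3

def bigrams(text):
    # "RX480" -> ["rx", "x4", "48", "80"]
    t = text.lower()
    return [t[i:i + 2] for i in range(len(t) - 1)]

def find_matches(conn, text):
    grams = bigrams(text)
    if not grams:
        return []
    # One "?" placeholder per bigram; the values are sent separately from
    # the SQL text, so a quote in the input cannot change the statement.
    placeholders = ", ".join("?" for _ in grams)
    sql = f"SELECT myVariable FROM myTable WHERE myVariable IN ({placeholders})"
    return [row[0] for row in conn.execute(sql, grams)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (myVariable TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?)", [("rx",), ("48",), ("zz",), ("')",)])

print(find_matches(conn, "RX480"))
print(find_matches(conn, "')480"))  # the quote is just data here
```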
SQL injection is not a matter of length.
It happens when someone adds code to your existing query. They do this by sending the malicious extra code as a form submission (or something similar). When your SQL code executes, it doesn't realize that there is more than one thing to do. It just executes what it's told.
You could start with a simple query like:
select *
from thisTable
where something=$something
So if $something contained "; DROP TABLE employees;", you could end up with a query that looks like:
select *
from thisTable
where something=; DROP TABLE employees;
This is a contrived example, but it shows more or less why it's dangerous. The first statement will fail, but who cares? The second one will actually work. And if you had a table named "employees", well, you don't anymore.
Two characters are sufficient here to cause an error in the query and possibly reveal some information about it. For example, try the string ')480 and watch how your application behaves.
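To see the ')480 case concretely, here is a small Python sketch that builds the IN list by string concatenation, as the question proposes, and runs it against an in-memory SQLite database (a stand-in for the real server; the helper names are hypothetical).

```python
import sqlite3

def bigrams(text):
    t = text.lower()
    return [t[i:i + 2] for i in range(len(t) - 1)]

def unsafe_query(text):
    # String concatenation, as in the question: each bigram is pasted
    # between single quotes without any escaping.
    in_list = ", ".join(f"'{g}'" for g in bigrams(text))
    return f"SELECT * FROM myTable WHERE myVariable IN ({in_list})"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (myVariable TEXT)")

sql = unsafe_query("')480")
print(sql)

failed = False
try:
    conn.execute(sql)
except sqlite3.Error as exc:
    # The bigram "')" closes the string literal early, so the statement
    # no longer parses.
    failed = True
    print("query failed:", exc)
```

The error message itself is what can leak information about the query to the user.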
Although not much of an answer, this really doesn't fit in a comment.
Your code scans a table checking to see if a column value matches any pair of consecutive characters from a user supplied string. Expressed in another way:
declare @SearchString as VarChar(10) = 'Voot';
select Buffer, case
    when DataLength( Buffer ) != 2 then 0 -- NB: Len() right trims.
    when PatIndex( '%' + Buffer + '%', @SearchString ) != 0 then 1
    else 0 end as Match
  from ( values
    ( 'vo' ), ( 'go' ), ( 'n ' ), ( 'po' ), ( 'et' ), ( 'ry' ),
    ( 'oo' ) ) as Samples( Buffer );
In this case you could simply pass the value of @SearchString as a parameter and avoid the issue of the IN clause.
Alternatively, the character pairs could be passed as a table-valued parameter and used with IN: where Buffer in ( select CharacterPair from @CharacterPairs ).
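The single-parameter idea can be sketched in Python with SQLite, where length() plays the role of DataLength() and instr() the role of PatIndex(). This is an approximation: lower() on both sides is an assumption standing in for a case-insensitive collation, and the Samples table simply holds the sample values from the T-SQL above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Samples (Buffer TEXT)")
conn.executemany("INSERT INTO Samples VALUES (?)",
                 [("vo",), ("go",), ("n ",), ("po",), ("et",), ("ry",), ("oo",)])

# The whole search string is ONE bound parameter; no IN list is built.
# length() = 2 mirrors the DataLength() check (SQLite does not trim 'n '),
# instr() > 0 mirrors PatIndex() != 0.
sql = """
SELECT Buffer FROM Samples
WHERE length(Buffer) = 2
  AND instr(lower(:search), lower(Buffer)) > 0
"""
matches = [row[0] for row in conn.execute(sql, {"search": "Voot"})]
print(matches)
```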
As far as SQL injection goes, limiting the text to character pairs does preclude adding complete statements. It does, as others have noted, allow for corrupting the query and causing it to fail. That, in my mind, constitutes a problem.
I'm still trying to imagine a use-case for this rather odd pattern matching. It won't match a column value longer (or shorter) than two characters against a search string.
There definitely should be a canonical answer to all these innumerable "if I have [some special kind of data treatment], will my query still be vulnerable?" questions.
First of all, you should ask yourself why you are looking to buy yourself such an indulgence. What is the reason? Why do you want to add an exception to your data processing? Why separate your data into the sheep and the goats, telling yourself "this data is safe, I won't process it properly; that data is unsafe, I'll have to do something"?
The only reason such a question could even arise is your application architecture. Or, rather, the lack of one. Only in spaghetti code, where user input is added directly to the query, can such a question ever occur. Otherwise, your database layer should be able to process any kind of data, totally ignorant of its nature, origin, or alleged "safety".

SQL Select to keep out fields that are NULL

I am trying to connect a FileMaker database to a Firebird SQL database in both directions: importing into FileMaker and exporting back to Firebird.
So far it works using the MBS Plug-in, but FileMaker Pro 13 cannot handle NULL.
That means that, for example, Timestamp fields that are empty (NULL) produce a "0" value.
For a timestamp, that means something like 01.01.1889 00:00:00.
So my idea was to simply ignore fields containing NULL.
But this is where my limited knowledge stops.
First I thought I could do this with WHERE, but that excludes entire records:
SELECT * FROM TABLE WHERE FIELD IS NOT NULL
Also I tried to filter it later on like this:
If (IsEmpty (MBS("SQL.GetFieldAsDateTime"; $command; "FIELD") ) = 0 ; MBS("SQL.GetFieldAsDateTime"; $command; "FIELD"))
With no result either.
This is a direct answer to halfbit's suggestion, which is correct but not for this SQL dialect. To provide a replacement value in a query when a field is NULL, you need to use COALESCE(x, y): if x is NULL, y is used, and if y is also NULL, the result is NULL. That's why I commonly use it as COALESCE(table.field, '') so that a constant is always output when table.field happens to be NULL.
select COALESCE(null,'Hello') as stackoverflow from rdb$database
You can use COALESCE() for more than two arguments, I just used two for conciseness.
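The difference between filtering with WHERE and substituting with COALESCE can be shown in a short Python sketch. It uses in-memory SQLite as a stand-in for Firebird (COALESCE is standard SQL, and the answer runs it on Firebird via rdb$database); the orders table and shipped_at column are hypothetical. In Firebird the replacement value may need to match the column's type, or the column may need a CAST to a string type; that is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, shipped_at TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "2016-03-01 10:00:00"), (2, None)])

# WHERE ... IS NOT NULL drops the whole record (the problem in the question).
filtered = conn.execute(
    "SELECT id, shipped_at FROM orders WHERE shipped_at IS NOT NULL ORDER BY id"
).fetchall()

# COALESCE keeps every record and substitutes a constant for the NULL.
substituted = conn.execute(
    "SELECT id, COALESCE(shipped_at, '') FROM orders ORDER BY id"
).fetchall()

# With more than two arguments, the first non-NULL one wins.
first_non_null = conn.execute("SELECT COALESCE(NULL, NULL, 'Hello')").fetchone()[0]

print(filtered)
print(substituted)
print(first_non_null)
```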
I don't know this particular SQL dialect, but
SELECT field1, field2, value(field, 0), ... FROM TABLE
should help you:
value returns its first argument (your field) if it is NOT NULL, or the second argument if it is.

Searching for a specific text value in a column in SQLite3

Suppose I have a table named 'Customer' with many columns, and I want to display all customers whose last name is 'Thomas' (Lastname = 'Thomas'). The following query returns an empty result (no rows), and it doesn't show any error either.
SELECT * FROM Customer WHERE Lastname = 'Thomas';
However, executing the following query gives me the correct result:
SELECT * FROM Customer WHERE Lastname LIKE '%Thomas%';
I would like to know what the problem is with my first query. I am using sqlite3 with npm. Below is the output of the '.show' command (in case the problem is with the configuration).
sqlite> .show
echo: off
explain: off
headers: on
mode: column
nullvalue: ""
output: stdout
separator: "|"
stats: off
width:
1. Use LIKE instead of =
2. Use TRIM to make sure there aren't spaces messing things up
so the query will be
SELECT * FROM Customer WHERE trim(Lastname) LIKE 'Thomas';
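A quick way to check this diagnosis is to reproduce it with hypothetical data. In SQLite, = uses the BINARY collation by default, so it is case-sensitive and trailing spaces count, whereas LIKE is case-insensitive for ASCII letters. The Python sketch below assumes rows with a trailing space, a lowercase variant, and a longer name; it is not the asker's actual data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Lastname TEXT)")
# Hypothetical rows: a trailing space, a lowercase variant, a longer name.
conn.executemany("INSERT INTO Customer VALUES (?)",
                 [("Thomas ",), ("thomas",), ("Thomason",)])

def names(sql):
    return sorted(row[0] for row in conn.execute(sql))

# BINARY collation: neither "Thomas " nor "thomas" equals 'Thomas'.
exact = names("SELECT Lastname FROM Customer WHERE Lastname = 'Thomas'")
# trim() removes the space; LIKE ignores ASCII case.
trimmed = names("SELECT Lastname FROM Customer WHERE trim(Lastname) LIKE 'Thomas'")
# The substring match also picks up longer names such as "Thomason".
contains = names("SELECT Lastname FROM Customer WHERE Lastname LIKE '%Thomas%'")

print(exact, trimmed, contains)
```

Running SELECT length(Lastname), hex(Lastname) on the real rows is one way to see what is actually stored.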
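Depending on your column types, you probably don't need point 2, since, as the MySQL manual says (this applies to MySQL; in SQLite, = with the default BINARY collation does take trailing spaces into account):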
All MySQL collations are of type PADSPACE. This means that all CHAR
and VARCHAR values in MySQL are compared without regard to any
trailing spaces
But point 1 could be the solution. In fact, if you want to avoid problems, you should compare strings with LIKE instead of =.
If you still have problems, you will probably have to use collations.
SELECT *
FROM t1
WHERE k LIKE _latin1 'Müller' COLLATE latin1_german2_ci; #using your real table collation
More information here. But specifically with 'Thomas' you shouldn't need it, since it doesn't contain any special characters.

SQL - Conditionally joining two columns in same table into one

I am working with a table that contains two versions of stored information. To simplify, one column contains the old description of a file run, while another column contains the updated standard for displaying run files. It gets more complicated in that the old column can itself contain multiple standards. The table:
Old Column            New Column
Desc: LGX/101/rpt     null
null                  Home
Print: LGX/234/rpt    null
null                  Print
null                  Page
I need to combine the two columns into one, but I also need to delete the "Print: " and "Desc: " string from the beginning of the old column values. Any suggestions? Let me know if/when I'm forgetting something you need to know!
(I am writing in Cache SQL, but I'd just like a general approach to my problem, I can figure out the specifics past that.)
EDIT: the condition is: if substr(oldcol,1,6) = 'desc: ' then substr(oldcol,7),
else if substr(oldcol,1,7) = 'print: ' then substr(oldcol,8), etc., so as to remove the "desc: " and "print: " prefixes and sanitize the data somewhat.
EDIT2: I want to make the table look like this:
Col
LGX/101/rpt
Home
LGX/234/rpt
Print
Page
It's difficult to understand exactly what you are looking for. Does the above represent before/after, or both columns that need combining/merging?
My guess is that COALESCE might help you. It takes a list of arguments and returns the first non-NULL one.
It looks like you want the value from the new column if the old one is NULL, and from the old column if the new one is NULL. To do that, you can use a CASE expression in your SQL. I know CASE is supported by MySQL, but I'm not sure whether it will help you here.
SELECT (CASE WHEN old_col IS NULL THEN new_col ELSE old_col END) as val FROM table_name
This will grab new_col if old_col is NULL, otherwise it will grab old_col.
You can remove the "Print:" and "Desc:" prefixes with a combination of the CHARINDEX and SUBSTRING functions. Here it goes:
SELECT CASE WHEN CHARINDEX(':',COALESCE(OldCol,NewCol)) > 0 THEN
-- LTRIM drops the space that follows the colon
LTRIM(SUBSTRING(COALESCE(OldCol,NewCol),CHARINDEX(':',COALESCE(OldCol,NewCol))+1,8000))
ELSE
COALESCE(OldCol,NewCol)
END AS Newcolvalue
FROM [SchemaName].[TableName]
CHARINDEX gives the position of the character or string you are searching for.
So you get the position of ":" in the computed value (the COALESCE part) and pass that position + 1 to SUBSTRING as the start, so that it returns the part after the ":". Now you have a string without "Desc:" and "Print:".
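For readers on another engine, here is roughly the same query run against in-memory SQLite from Python, so the result can be checked against the desired output in the question. In SQLite, instr() takes the place of CHARINDEX and substr(x, n) returns everything from position n on (1-based). The files table, its id column used for ordering, and the sample rows are assumptions built from the question's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, old_col TEXT, new_col TEXT)")
conn.executemany("INSERT INTO files (old_col, new_col) VALUES (?, ?)", [
    ("Desc: LGX/101/rpt", None), (None, "Home"),
    ("Print: LGX/234/rpt", None), (None, "Print"), (None, "Page"),
])

# COALESCE picks whichever column is filled; if it contains a colon, keep
# only the part after it, and ltrim() removes the space after the colon.
sql = """
SELECT CASE
         WHEN instr(COALESCE(old_col, new_col), ':') > 0
         THEN ltrim(substr(COALESCE(old_col, new_col),
                           instr(COALESCE(old_col, new_col), ':') + 1))
         ELSE COALESCE(old_col, new_col)
       END AS col
FROM files
ORDER BY id
"""
combined = [row[0] for row in conn.execute(sql)]
print(combined)
```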
Hope this helps.