SQL string comparison: how to ignore blank spaces

I have prepared an SQL query that I will have to run on several databases (Oracle and Sybase) where some data might be stored differently.
I have noticed that one of the differences in data storage is the blank string.
For example, in the column PRODUCT_TYPE below, please have a look at the second record:
This "empty string" (the data type is CHAR(15)) circled in red is equal to '' in some of the databases, whereas it's equal to ' ' in others. The number of spaces is never constant, and there are several fields that behave this way.
So, since I need to filter on these "empty strings", I should change the following statement in my WHERE clause:
WHERE PRODUCT_TYPE = ''
...because the above will treat the ' ' string as different from '', even though, functionally speaking, it is not.
I would hence like to write the statement so that it "ignores white spaces", i.e. ' ' is equal to '', which is equal to ' ', etc.
How should I do this change in order to make it work?
I have tried the simple replacing approach:
WHERE REPLACE(PRODUCT_TYPE,' ','') = ''
...but it doesn't seem to work, probably because I should use a different character.
For sake of testing, inside the ' below there is a copied-pasted example of what I find in these "empty strings":
' '
Ideally, it should be a "non-specific SQL" solution since I will have to run the same query on both Oracle and Sybase RDBMS. Any idea?

You can use trim on the column.
where trim(product_type) is null
The above is not DBMS-independent, since Sybase does not provide the trim function.
However, the below approach will work both in Sybase and Oracle:
where rtrim(ltrim(product_type)) is null
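A minimal sketch of the blank test, runnable with Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption): unlike Oracle, and per the answer unlike Sybase, it keeps '' distinct from NULL, so RTRIM(LTRIM(...)) of an all-blank value yields '' and the IS NULL test alone misses it. Adding an = '' branch keeps the predicate working on both kinds of engine; on Oracle that branch never evaluates to true, so it is harmless there. The table name products and its rows are hypothetical.

```python
import sqlite3

# Hypothetical table mirroring the question's CHAR(15) column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, product_type CHAR(15))")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(1, "WIDGET"), (2, ""), (3, "   "), (4, None), (5, " X ")],
)


def ids(where_clause):
    """Return the ids of the rows matching the given WHERE clause."""
    rows = conn.execute(f"SELECT id FROM products WHERE {where_clause} ORDER BY id")
    return [row[0] for row in rows]


# Oracle-style test: only catches blanks where trimming to nothing yields NULL.
only_null = ids("RTRIM(LTRIM(product_type)) IS NULL")

# Portable test: the = '' branch covers engines that keep '' distinct from NULL.
portable = ids(
    "RTRIM(LTRIM(product_type)) IS NULL OR RTRIM(LTRIM(product_type)) = ''"
)

print(only_null)  # [4]
print(portable)   # [2, 3, 4]
```

Row 4 (a genuine NULL) matches as well; on Oracle that is unavoidable, since the engine cannot tell the two apart.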

You can use the REPLACE statement you've tried, but you should test for IS NULL instead of = '': Oracle treats the empty string that REPLACE returns as NULL, and NULL is never equal to anything.
WHERE REPLACE(PRODUCT_TYPE,' ','') is null
See also:
null vs empty string in Oracle

The simple (and non-DBMS specific) answer is:
Do not use CHAR(15).
char(n) is a fixed length data type. So no matter what you store in there, the value will always be padded to the defined length. If you store a single character, the DBMS will store that single character and 14 spaces.
Change your columns to use varchar(15) and you should not have any problems.
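To see why the fixed-length type causes the trouble, here is a toy Python model (an assumption for illustration, not any vendor's storage code) of how a CHAR(n) value is blank-padded when stored, and how blank-padded comparison then makes '' and any run of spaces equal while a VARCHAR-style comparison does not. Oracle, for example, uses blank-padded comparison when both operands are CHAR values or text literals. The helper names are hypothetical.

```python
def store_char(value: str, n: int) -> str:
    """Model CHAR(n) storage: reject over-long values, blank-pad the rest to n."""
    if len(value) > n:
        raise ValueError(f"value too long for CHAR({n})")
    return value.ljust(n)


def char_equal(a: str, b: str) -> bool:
    """Blank-padded comparison: pad the shorter operand with spaces first."""
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)


def varchar_equal(a: str, b: str) -> bool:
    """Non-padded comparison: trailing spaces are significant."""
    return a == b


stored = store_char("A", 15)
print(len(stored), repr(stored.rstrip()))  # 15 'A'
print(char_equal("", "   "))               # True
print(varchar_equal("", "   "))            # False
```

The sketch raises on overflow rather than silently truncating, which keeps the model honest about the declared width.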

Related

Column_name > " "?

I came across a query today where an IIF statement contained the comparison column > " ". I have never seen this before and I am not even sure what it is doing. I am familiar with comparing strings with < and >, e.g. "B" < "W" returns True.
However what exactly does comparing a column to a string of spaces do?
The line in the select statement:
,IIf([CD] > " " AND [DT]=#12/31/9999#,[MT],[O]) AS [Final]
Can someone explain exactly what this comparison is doing and why would you want this?
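With QUOTED_IDENTIFIER on (the usual default), SQL Server treats double quotes as identifier delimiters, so your logic would be checking that CD is larger than a column whose name consists only of spaces. That seems highly unlikely.
My guess is that you are using MS Access, because the date constant #12/31/9999# is not valid in SQL Server. In SQL Server this would be:
IIF([CD] > ' ' AND [DT] = '99991231', [MT], [O]) AS [Final]
This is checking that the string is not blank. SQL Server pads the shorter string with spaces before comparing, so a value made up only of spaces compares equal to ' ' and fails the test, while a value passes only if its first non-space character sorts after a space, which is true of ordinary printable text. It is actually a rather obscure way of doing this; more typically LTRIM() would be involved.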
I might further speculate that CD is exactly five characters long. In this case, the logic is checking that it is not all spaces. This would be particularly applicable if the type were CHAR(5) NOT NULL.
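A small Python model of what [CD] > ' ' does, assuming blank-padded comparison (the shorter string is padded with spaces before comparing, as SQL Server does). Python's own string comparison is not padded, which shows why the outcome depends on that rule. The function name and sample values are hypothetical.

```python
def padded_compare(a: str, b: str) -> int:
    """Compare two strings after padding the shorter one with spaces.

    Returns -1, 0 or 1, like a SQL comparison under blank-padding rules.
    """
    width = max(len(a), len(b))
    a, b = a.ljust(width), b.ljust(width)
    return (a > b) - (a < b)


samples = ["ABCDE", "  ABC", "     ", ""]
for cd in samples:
    padded = padded_compare(cd, " ") > 0  # [CD] > ' ' with blank padding
    unpadded = cd > " "                    # plain Python comparison, no padding
    print(repr(cd), padded, unpadded)
# 'ABCDE' True True
# '  ABC' True True
# '     ' False True
# '' False False
```

Under padding, the leading-blank value '  ABC' still passes, so the test rejects only empty or all-blank values, which fits the CHAR(5) NOT NULL reading.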

sas sql : filter out corrupted row

I need to copy data from one table to another and filter out corrupted rows.
I have a column with dates, and sometimes I have rows like this: " . " (a random number of spaces and one dot).
How can I make my SQL ignore these rows?
I tried using
where (trim(put(DatesOfRun) not like '.'
and multiple other variants of
"where not like"
or
"where <>"
but all of them gave me errors like
"Expression using equals (=) has components that are of different
data types."
or
ERROR 22-322: Syntax error, expecting one of the following:
and a long list of operators
First, you need to confirm whether this is a character or a numeric field. A period (.) is how SAS displays null (missing, in SAS speak) for numerics, so it's entirely possible you have a numeric field.
where not missing(DatesOfRun)
or
where DatesOfRun is not null
Either of those should do it, if it's numeric.
If it is character, then it's fairly simple.
where not (strip(DatesOfRun) = '.')
TRIM only removes trailing blanks; STRIP removes them from both sides.
It's also possible you have non-breaking spaces or other characters that will break the latter approach. If the STRIP version runs without error but doesn't actually filter out those rows, you may want to use a data step and put that variable to the log using the $HEX32. format (with an appropriate width, two times the maximum number of characters), and see what comes out. If you don't recognize the characters or don't know how to handle ASCII codes, come back here and ask a new question with that information.
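The same filtering idea sketched in Python, purely as an analogy (SAS itself is not involved): str.rstrip(" ") behaves like SAS TRIM and str.strip(" ") like STRIP, for ordinary blanks only. The hex dump plays the role of the $HEX. format and exposes a non-breaking space (byte A0 in Latin-1), which blank-stripping does not remove. The sample values are hypothetical.

```python
rows = ["2015-03-01", "   .  ", ".", " \u00a0. ", "2015-03-02"]


def is_missing_marker(value: str) -> bool:
    """True if the value is a lone dot surrounded only by ordinary blanks."""
    return value.strip(" ") == "."


# Trailing-only trimming (like TRIM) keeps the leading blanks, so it misses this one.
print("   .  ".rstrip(" ") == ".")  # False

kept = [v for v in rows if not is_missing_marker(v)]
print(kept)  # ['2015-03-01', ' \xa0. ', '2015-03-02']

# Hex dump of the surviving odd value, like PUT var $HEX. in a data step.
for v in kept:
    if not v[:1].isdigit():
        print(v.encode("latin-1").hex().upper())  # 20A02E20
```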
Just to clarify, you are trying to ignore results where the DatesOfRun column contains the character '.'? If so, you may want to use wildcard operators if the '.' can appear in random locations, such as '.%' or '%.%'
Also, check the data type of the DatesOfRun column; this could influence the results as well.
Two WHERE conditions could potentially solve your issue; try this WHERE clause and see if it throws an error:
WHERE DatesOfRun is not null
AND DatesOfRun not like '%.%'

Various ways to query strings?

I'm working on an app that involves a lot of carefully designed strings. I'm in the process of designing the string format and for that I need to know what's possible and what's not when I'm querying the same data.
Which of these are possible with MySQL, and how do I accomplish them?
1. Results which contain this exact string -- not case sensitive
2. Results which contain this exact string -- case sensitive
3. Results which contain a similar string -- not case sensitive
4. Results which contain a similar string -- but individual characters must be of the same case
1. Results which contain this exact string -- not case sensitive
2. Results which contain this exact string -- case sensitive
These can both be accomplished. See the page documenting string functions in MySQL, in particular INSTR.
Case sensitivity is determined by the collation of a column. If you want values in a column to be compared in a case-sensitive fashion, then you give it a case-sensitive collation, as in the following example:
ALTER TABLE MyTable ADD MyColumn VARCHAR(10) CHARACTER SET latin1 COLLATE latin1_general_cs NOT NULL
Conversely, if you want values in a column to be compared in a case-insensitive fashion, then give it a case-insensitive collation.
If you might want values in a column to be compared either way, then there are ways to do that too, though it's slightly more complicated.
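A runnable sketch of the two "contains" cases using Python's sqlite3 as a stand-in for MySQL (an assumption: in MySQL the behaviour follows the column collation as described above, whereas in SQLite INSTR always compares exactly). Lower-casing both sides gives a case-insensitive search, and a per-query COLLATE clause overrides the default comparison for a single expression. The table and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO notes VALUES (?, ?)",
    [(1, "Hello World"), (2, "hello world"), (3, "HELLO there"), (4, "goodbye")],
)


def ids(sql, *params):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql, params)]


# 1. Contains, case-insensitive: lower-case both sides before searching.
ci = ids("SELECT id FROM notes WHERE INSTR(LOWER(body), LOWER(?)) > 0 ORDER BY id", "hello")

# 2. Contains, case-sensitive: INSTR compares characters exactly in SQLite.
cs = ids("SELECT id FROM notes WHERE INSTR(body, ?) > 0 ORDER BY id", "hello")

# Per-query override: NOCASE makes this one equality test case-insensitive.
eq = ids("SELECT id FROM notes WHERE body = ? COLLATE NOCASE ORDER BY id", "HELLO WORLD")

print(ci)  # [1, 2, 3]
print(cs)  # [2]
print(eq)  # [1, 2]
```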
3. Results which contain a similar string -- not case sensitive
4. Results which contain a similar string -- but individual characters must be of the same case
Depends what exactly you mean by "similar", but for some values of "similar" yes this is available. You will probably find it useful to consult the page I linked above.
SELECT this FROM that WHERE LOWER(string) = LOWER("blablabla");
SELECT this FROM that WHERE string = "blablabla";
SELECT this FROM that WHERE LOWER(string) LIKE LOWER("blablabla");
SELECT this FROM that WHERE string LIKE "blablabla";
Hope that's right.

MS-Access - why can't this update query fill empty cells?

In MS-Access database, table called NewTable3
colname is a text column containing a lot of empty cells or blanks.
I want to put a ? character in the empty cells. When I run the query
UPDATE NewTable3 SET colname = '?' WHERE ISNULL(colname) ;
it updates 0 records. Why? What is wrong with this query?
Two quick things:
1) Try putting the colname in square brackets.
2) Remember that empty cells (Nulls) and empty strings ("") are different.
Together:
UPDATE NewTable3 SET [colname] = "?" WHERE ISNULL([colname]) OR [colname] = "";
Also, are you running the query in Access itself, or just using the Access engine and using the data in another program/via a VBA script? It can make a difference.
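A minimal sketch of the corrected UPDATE, run through Python's sqlite3 as a stand-in for the Access engine (an assumption: SQLite keeps NULL and '' distinct, as Access does when zero-length strings are allowed). Incidentally, ? is also SQLite's anonymous-parameter placeholder, so the literal question mark is passed as a bound value rather than written into the SQL. The table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NewTable3 (id INTEGER, colname TEXT)")
conn.executemany(
    "INSERT INTO NewTable3 VALUES (?, ?)",
    [(1, "abc"), (2, None), (3, ""), (4, "xyz")],
)

# Checking only for NULL misses the zero-length string in row 3.
null_only = conn.execute(
    "SELECT COUNT(*) FROM NewTable3 WHERE colname IS NULL"
).fetchone()[0]

# Match both kinds of "empty"; the replacement text is bound as a parameter.
cur = conn.execute(
    "UPDATE NewTable3 SET colname = ? WHERE colname IS NULL OR colname = ''",
    ("?",),
)
print(null_only, cur.rowcount)  # 1 2

result = conn.execute("SELECT colname FROM NewTable3 ORDER BY id").fetchall()
print(result)  # [('abc',), ('?',), ('?',), ('xyz',)]
```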
EDIT:
Based on @onedaywhen's prodding, I now see that I never fully absorbed the original question, which was asking about replacing Nulls with the literal ? character. This is insane and not helpful or useful. If you don't have a meaningful default value for the field, then LEAVE IT NULL. If you want to distinguish between Null (unknown) and Blank (i.e., known to be blank), you can allow zero-length strings (ZLS) and change the Nulls to ZLS.
My original post follows, since I think it is useful for people who might get to this crazy question needing to do things properly:
In total, all the answers in this thread end up solving all the problems with the original SQL statement, but they do so incompletely, so I'll compile them all together in an attempt to create a comprehensive correct answer.
@Wim Hollebrandse wisely points out that a parameter needs brackets, but posts the SQL as:
UPDATE NewTable3 SET colname = '[?]' WHERE ISNULL(colname);
This is incorrect, in that the quotes will cause what's inside them to be treated literally instead of evaluated as a parameter, so you'll end up with all your fields updated to the literal value "[?]". The correct syntax would be:
UPDATE NewTable3 SET colname = [?] WHERE ISNULL(colname);
@GuinnessFan points out a problem in the WHERE clause, suggesting that the result of IsNull() needs to be compared to True in order for the WHERE clause to work. In other words, this:
WHERE IsNull(NewTable3.colname)
...should be this:
WHERE IsNull(NewTable3.colname)=True
But since both expressions evaluate the same way, they are entirely equivalent. @GuinnessFan is correct, though, that this is the best syntax:
WHERE NewTable3.colname Is Null
@mavnn points out that the fields may be "empty" while not being Null, which is a very common problem. I believe on principle (and consistent with my understanding of the official SQL standards) that fields should be initialized as Null and should not allow zero-length strings. It is certainly possible in some applications that one might want to distinguish Null, i.e., value not yet supplied, from blank (zero-length string), i.e., value known to be blank. But if that's part of the application design, then the user should know that criteria on such fields need to consider whether one or both should be included (i.e., both Null and <>"" or one or the other).
From my point of view, it was unfortunate that the old default for text fields (where AllowZLS defaulted to FALSE) was changed in Access 2003 to allow ZLS's by default. This means that many people who don't notice that AllowZLS is set to TRUE when they create their tables end up with ZLS's stored in their text fields without intending to do so (and importing a table from a previous version also defaults to TRUE).
While testing for Null and ="" will make the WHERE clause that is seeking all "empty" fields work as expected, the permanent fix is to change the field definition to disallow ZLS's. But do note that changing AllowZLS to FALSE does not clear the existing ZLS's -- you have to run a SQL UPDATE to remove them.
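The clean-up step described above can be sketched with sqlite3 standing in for Access (an assumption). The UPDATE turns existing zero-length strings into NULL, and a CHECK constraint plays roughly the role of AllowZLS = False for new rows; in Access you would set that property in table design rather than with this DDL. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, middle_name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "Ann"), (2, ""), (3, None), (4, "")],
)

# One-off fix: convert the zero-length strings already stored into NULL.
fixed = conn.execute(
    "UPDATE people SET middle_name = NULL WHERE middle_name = ''"
).rowcount

# For new data, reject zero-length strings. NULL still passes, because a CHECK
# only fails when its expression is false, not when it evaluates to NULL.
conn.execute(
    "CREATE TABLE people_strict (id INTEGER, middle_name TEXT CHECK (middle_name <> ''))"
)
conn.execute("INSERT INTO people_strict SELECT id, middle_name FROM people")

try:
    conn.execute("INSERT INTO people_strict VALUES (5, '')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(fixed, rejected)  # 2 True
```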
Last of all, when using parameters, it is better to declare them so that the values the user can input are restricted to appropriate values. If the field is numeric, you can limit it to numeric values; if a date, to date values; if text or memo, to text:
PARAMETERS [User Prompt] Long;
UPDATE MyTable SET LongIntegerColumn = [User Prompt]
PARAMETERS [User Prompt] DateTime;
UPDATE MyTable SET DateColumn = [User Prompt]
PARAMETERS [User Prompt] Text ( 255 );
UPDATE MyTable SET TextColumn = [User Prompt]
Note that with Text(255) as your parameter type, anything supplied by the user is truncated to 255 characters, even if it's longer than that (it would be a pretty unusual situation where you'd need that). For values longer than that (such as memo fields), you omit the text length declaration:
PARAMETERS [User Prompt] Text;
UPDATE MyTable SET TextColumn = [User Prompt]
In any event, I think so-called anonymous parameters are not too helpful, as you aren't leveraging the power of parameters to restrict data type of input criteria.
Try:
UPDATE NewTable3 SET colname = '[?]' WHERE ISNULL(colname);
The question mark is used for anonymous parameters, so you need to escape it as above. Note that I have not tried this.
UPDATE NewTable3 SET NewTable3.colname = "?"
WHERE (((NewTable3.colname) Is Null));
To keep your function: WHERE (((IsNull([NewTable3].[colname]))=True));
I don't believe that replacing the NULL value with your own 'magic' value ? will cause you anything but further pain.
Here's hoping you may draw inspiration from this article:
How To Handle Missing Information Without Using (some magic value)

Why does Oracle 9i treat an empty string as NULL?

I know that it does consider '' as NULL, but that doesn't do much to tell me why this is the case. As I understand the SQL specifications, '' is not the same as NULL -- one is a valid datum, and the other is indicating the absence of that same information.
Feel free to speculate, but please indicate if that's the case. If there's anyone from Oracle who can comment on it, that'd be fantastic!
I believe the answer is that Oracle is very, very old.
Back in the olden days before there was a SQL standard, Oracle made the design decision that empty strings in VARCHAR/VARCHAR2 columns were NULL and that there was only one sense of NULL (there are relational theorists that would differentiate between data that has never been prompted for, data where the answer exists but is not known by the user, data where there is no answer, etc. all of which constitute some sense of NULL).
By the time that the SQL standard came around and agreed that NULL and the empty string were distinct entities, there were already Oracle users that had code that assumed the two were equivalent. So Oracle was basically left with the options of breaking existing code, violating the SQL standard, or introducing some sort of initialization parameter that would change the functionality of a potentially large number of queries. Violating the SQL standard (IMHO) was the least disruptive of these three options.
Oracle has left open the possibility that the VARCHAR data type would change in a future release to adhere to the SQL standard (which is why everyone uses VARCHAR2 in Oracle since that data type's behavior is guaranteed to remain the same going forward).
Tom Kyte, VP at Oracle:
A ZERO length varchar is treated as NULL.
'' is not treated as NULL.
'' when assigned to a char(1) becomes ' ' (char types are blank padded strings).
'' when assigned to a varchar2(1) becomes '' which is a zero length string and a zero length string is NULL in Oracle (it is no longer '')
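Tom Kyte's rules can be restated as a toy Python model (an assumption for illustration, not Oracle code): None stands for NULL, assigning to CHAR(n) blank-pads the value, and assigning to VARCHAR2(n) turns a zero-length result into NULL. The function names are hypothetical.

```python
from typing import Optional


def assign_char(value: Optional[str], n: int) -> Optional[str]:
    """Model assignment to CHAR(n): blank-pad the value to length n."""
    if value is None:
        return None
    if len(value) > n:
        raise ValueError(f"value too large for CHAR({n})")
    return value.ljust(n)


def assign_varchar2(value: Optional[str], n: int) -> Optional[str]:
    """Model assignment to VARCHAR2(n): a zero-length string becomes NULL."""
    if value is None or value == "":
        return None
    if len(value) > n:
        raise ValueError(f"value too large for VARCHAR2({n})")
    return value


print(repr(assign_char("", 1)))       # ' '
print(repr(assign_varchar2("", 1)))   # None
print(repr(assign_varchar2("a", 1)))  # 'a'
```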
Oracle documentation alerts developers to this problem, going back at least as far as version 7.
Oracle chose to represent NULLS by the "impossible value" technique. For example, a NULL in a numeric location will be stored as "minus zero", an impossible value. Any minus zeroes that result from computations will be converted to positive zero before being stored.
Oracle also chose, erroneously, to consider the VARCHAR string of length zero (the empty string) to be an impossible value, and a suitable choice for representing NULL. It turns out that the empty string is far from an impossible value. It's even the identity under the operation of string concatenation!
Oracle documentation warns database designers and developers that some future version of Oracle might break this association between the empty string and NULL, and break any code that depends on that association.
There are techniques to flag NULLS other than impossible values, but Oracle didn't use them.
(I'm using the word "location" above to mean the intersection of a row and a column.)
I suspect this makes a lot more sense if you think of Oracle the way earlier developers probably did -- as a glorified backend for a data entry system. Every field in the database corresponded to a field in a form that a data entry operator saw on his screen. If the operator didn't type anything into a field, whether that's "birthdate" or "address" then the data for that field is "unknown". There's no way for an operator to indicate that someone's address is really an empty string, and that doesn't really make much sense anyways.
According to the official 11g docs:
Oracle Database currently treats a character value with a length of zero as null. However, this may not continue to be true in future releases, and Oracle recommends that you do not treat empty strings the same as nulls.
Possible reasons
val IS NOT NULL is more readable than val != ''
No need to check both conditions val != '' and val IS NOT NULL
Empty string is the same as NULL simply because it's the "lesser evil" compared to the situation where the two (empty string and NULL) are not the same.
In languages where NULL and empty String are not the same, one has to always check both conditions.
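In such languages the double check usually ends up in a helper. A minimal Python sketch (hypothetical names) of the two common options: test for both None and '' everywhere, or normalize '' to None once at the boundary, which mirrors Oracle's single notion of NULL.

```python
from typing import Optional


def is_null_or_empty(value: Optional[str]) -> bool:
    """The two-condition check: absent or zero-length."""
    return value is None or value == ""


def normalize_empty(value: Optional[str]) -> Optional[str]:
    """Collapse '' to None, mimicking Oracle's treatment of empty strings."""
    return None if value == "" else value


inputs = ["abc", "", None]
print([is_null_or_empty(v) for v in inputs])  # [False, True, True]
print([normalize_empty(v) for v in inputs])   # ['abc', None, None]
```

Normalizing once means later code only ever needs the single `is None` test.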
Example from a book:
set serveroutput on;
DECLARE
empty_varchar2 VARCHAR2(10) := '';
empty_char CHAR(10) := '';
BEGIN
IF empty_varchar2 IS NULL THEN
DBMS_OUTPUT.PUT_LINE('empty_varchar2 is NULL');
END IF;
IF '' IS NULL THEN
DBMS_OUTPUT.PUT_LINE(''''' is NULL');
END IF;
IF empty_char IS NULL THEN
DBMS_OUTPUT.PUT_LINE('empty_char is NULL');
ELSIF empty_char IS NOT NULL THEN
DBMS_OUTPUT.PUT_LINE('empty_char is NOT NULL');
END IF;
END;
/
Because not treating it as NULL isn't particularly helpful, either.
If you make a mistake in this area on Oracle, you usually notice right away. In SQL Server, however, it will appear to work, and the problem only appears when someone enters an empty string instead of NULL (perhaps from a .NET client library, where null is different from "", but you usually treat them the same).
I'm not saying Oracle is right, but it seems to me that both ways are approximately equally bad.
Indeed, I have had nothing but difficulties in dealing with Oracle, including invalid datetime values (cannot be printed, converted or anything, just looked at with the DUMP() function) which are allowed to be inserted into the database, apparently through some buggy version of the client as a binary column! So much for protecting database integrity!
Oracle handling of NULLs links:
http://digitalbush.com/2007/10/27/oracle-9i-null-behavior/
http://jeffkemponoracle.com/2006/02/empty-string-andor-null.html
First of all, null and null string were not always treated as the same by Oracle. A null string is, by definition, a string containing no characters. This is not at all the same as a null. NULL is, by definition, the absence of data.
Five or six years or so ago, null string was treated differently from null by Oracle. While, like null, null string was equal to everything and different from everything (which I think is fine for null, but totally WRONG for null string), at least length(null string) would return 0, as it should since null string is a string of zero length.
Currently in Oracle, length(null) returns null which I guess is O.K., but length(null string) also returns null which is totally WRONG.
I do not understand why they decided to start treating these 2 distinct "values" the same. They mean different things and the programmer should have the capability of acting on each in different ways. The fact that they have changed their methodology tells me that they really don't have a clue as to how these values should be treated.