Compare strings with trailing spaces in Firebird SQL?

I have an existing database with a table with a string[16] key field.
There are rows whose key ends with a space: "16 ".
I need to allow the user to change "16 " to e.g. "16", but also do a unique key check (i.e. that the table does not already have a record with key="16").
I run the following query:
select * from plu__ where store=100 and plu_num = '16'
It returns the row with key="16 "!
How do I check for unique key so that keys with trailing spaces are not included?
EDIT: The DDL and the char_length
CREATE TABLE PLU__
(
PLU_NUM Varchar(16),
CAPTION Varchar(50),
...

string[16] - there is no such datatype in Firebird. There are CHAR(16) and VARCHAR(16) (and BLOB SUB_TYPE TEXT, but that is improbable here). So you omit some crucial points about your system: you do not work with Firebird directly, but through some undisclosed intermediate layer, and no one knows how opaque or transparent it is.
I suspect you or your system chose the CHAR datatype instead of VARCHAR, where all data is right-padded with spaces to the maximum length. Or maybe the COLLATION of the column/table/database is such that trailing spaces do not matter. Note also that, per the SQL standard, Firebird compares two strings as if the shorter one were right-padded with spaces, so '16' = '16 ' evaluates to true even for VARCHAR columns.
Additionally, you may simply be wrong. You claim that the selected row does contain the trailing blank, but I do not see it. For example, add CHAR_LENGTH(plu_num) to the columns in your SELECT and see what is there.
Additionally, if plu_num is a number, should it not be an INTEGER or BIGINT rather than text?
The bottom of your screenshot shows "(NONE)". I suspect that is the "connection charset". This is allowed for backward compatibility with programs made 20 years ago, but it is quite dangerous today. You have to consult your system documentation on how to set the connection charset to UTF-8 or Windows-1250 or something meaningful.
"How do I check for unique key so that keys with trailing spaces are not included?" you do not. You just can not do it reliably, because of different transactions and different programs making simultaneous connections. You would check it, decide you are clear, but right before you would insert your row - some other computer would insert it too. That gap can not be crossed that way, between your two commands of checking and inserting - anyone else can do it too. It is called race conditions.
You have to ask the server to do the checks.
For example, you can introduce a UNIQUE CONSTRAINT on the pair of columns (store, plu_num). That way the server will refuse to store two rows with the same values in those columns.
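A minimal sketch (the constraint name UQ_PLU_STORE_NUM is made up; adjust names to your schema):
ALTER TABLE PLU__
  ADD CONSTRAINT UQ_PLU_STORE_NUM UNIQUE (STORE, PLU_NUM);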
Additionally, is it even normal to have values with spaces? Convert the field to an integer datatype and be safe.
Or, if you want to keep it textual and non-numeric, you still can:
Introduce a CHECK CONSTRAINT requiring that trim(plu_num) is not distinct from plu_num (or, if plu_num is declared to the server as a NOT NULL column, then trim(plu_num) = plu_num). That way the server will refuse to store any value with spaces before or after the text.
In case the datatype or the collation of the column makes no difference when comparing texts with and without trailing spaces (and in case you cannot change that datatype or collation), you may try adding guard tokens, like ('+' || trim(plu_num) || '+') = ('+' || plu_num || '+'), as sketched below.
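A minimal sketch of such a constraint (the constraint name is made up; Firebird 2.x syntax assumed). The '+' tokens make both compared values end in a non-space character, so the SQL rule that ignores trailing spaces in comparisons can no longer hide them:
ALTER TABLE PLU__
  ADD CONSTRAINT CHK_PLU_NUM_NO_SPACES
  CHECK (('+' || TRIM(PLU_NUM) || '+') IS NOT DISTINCT FROM ('+' || PLU_NUM || '+'));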
Or, instead of that CHECK CONSTRAINT, you can proactively remove those spaces: set a BEFORE INSERT OR UPDATE trigger on the table that does NEW.plu_num = TRIM(NEW.plu_num), as sketched below.
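A minimal sketch of such a trigger (Firebird 2.1+ syntax for BEFORE INSERT OR UPDATE assumed; the trigger name is made up):
SET TERM ^ ;
CREATE TRIGGER PLU__TRIM_BIU FOR PLU__
ACTIVE BEFORE INSERT OR UPDATE POSITION 0
AS
BEGIN
  -- normalize the key before it is stored
  NEW.PLU_NUM = TRIM(NEW.PLU_NUM);
END^
SET TERM ; ^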
Documentation:
https://www.firebirdsql.org/refdocs/langrefupd20-distinct.html
http://www.firebirdtest.com/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-ddl-tbl.html#fblangref25-ddl-tbl-constraints
http://www.firebirdtest.com/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-ddl-tbl.html#fblangref25-ddl-tbl-altradd
http://www.firebirdtest.com/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-ddl-trgr.html
http://www.firebirdtest.com/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-datatypes-chartypes.html
Also, a bit more verbose, via http://www.translate.ru:
http://firebirdsql.su/doku.php?id=constraint
http://firebirdsql.su/doku.php?id=alter_table
You may also check http://www.firebirdfaq.org/cat3/
Additionally, if you add the constraints to an existing table with invalid data entered before you introduced those checks, you might trap yourself in a "non-restorable backup" situation. You would have to check for that and sanitize your old data to abide by the newly introduced constraints.
Option #4 is explained in detail below. Still, this seems to be bad database design! One should not just "let people edit the number to remove trailing blanks"; one should design the database so that there are no numbers with trailing blanks and no way to insert them into the database.
CREATE TABLE "_NEW_TABLE" (
ID INTEGER NOT NULL,
TXT VARCHAR(10)
);
Select id, txt, '_'||txt||'_', char_length(txt) from "_NEW_TABLE"
ID  TXT  CONCATENATION  CHAR_LENGTH
1   1    _1_            1
2   2    _2_            1
4   1    _1 _           2
5   2    _2 _           2
7   1    _ 1_           2
8   2    _ 2_           2
Select id, txt, '_'||txt||'_', char_length(txt) from "_NEW_TABLE"
where txt = '2'
ID  TXT  CONCATENATION  CHAR_LENGTH
2   2    _2_            1
5   2    _2 _           2
Select id, txt, '_'||txt||'_', char_length(txt) from "_NEW_TABLE"
where txt || '+' = '2+' -- WARNING - this PROHIBITS index use on txt column, if there is any
ID  TXT  CONCATENATION  CHAR_LENGTH
2   2    _2_            1
Select id, txt, '_'||txt||'_', char_length(txt) from "_NEW_TABLE"
where txt = '2' and char_length(txt) = char_length('2')

Related

Query to ignore rows which have non hex values within field

Initial situation
I have a relatively large table (ca. 0.7 Mio records) where an nvarchar field "MediaID" contains largely media IDs in proper hexadecimal notation (as they should).
Within my "sequential" query (each query depends on the output of the query before, this is all in pure T-SQL) I have to convert these hexadecimal values into decimal bigint values in order to do further calculations and filtering on these calculated values for the subsequent queries.
--> So far, no problem. The "sequential" query works fine.
Problem
Unfortunately, some of these Media IDs do contain non-hex characters - most probably because of typing errors by the people who added them, or through import errors from the previous business system.
Because of these non-hex chars, the whole query fails (of course) because the conversion hits an error.
For my current purpose, such rows must be skipped/ignored as they are clearly wrong and cannot be used (there are no medias / data carriers in use with the current business system which can have non-hex character IDs).
Manual editing of the data is not an option as there are too many errors and it is not clear with what the data must be replaced.
Challenge
To create a query which only returns records which have valid hex values within the media ID field.
(Unfortunately, my SQL skills are not enough to create the above query. Your help is highly appreciated.)
The relevant section of the larger query looks like this (xxxx is where your help comes in :-))
select
pureMediaID
, mediaID
, CUSTOMERID
,CONTRACT_CUSTOMERID
from
(
select concat('0x', Replace(Ltrim(Replace(mediaID, '0', ' ')), ' ', '0')) AS pureMediaID
--, CUSTOMERID
, *
from M_T_CONTRACT_CUSTOMERS
where mediaID is not null
and mediaID like '0%'
and xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
) as inner1
EDIT: As per request I have added here some good and some bad data:
Good:
4335463357
4335459809
1426427996
4335463509
4335515039
4335465134
4427370396
4335415661
4427369036
4335419089
004BB03433
004e7cf9c6
00BD23133
00EE13D8C1
00CCB5522C
00C46522C
00dbbe3433
Bad:
4564589+
AB6B8BFC.8
7B498DFCnm
DB218DFChb
d<tgfh8CFC
CB9E8AFCzj
B458DFCjhl
rytzju8DFC
BFCtdsjshj
DB9888FCgf
9BC08CFCyx
EB198DFCzj
4B628CFChj
7B2B8DFCgg
After I upgraded the compatibility level of the SQL instance to SQL 2016 (it was below 2012 before), I could use TRY_CONVERT with the same syntax as the original CONVERT function, as donPablo has pointed out. With that, the query runs fully through, and every MediaID which is not a correct hex value gets nicely converted into a NULL value - really, really nice.
Exactly what I needed.
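A hedged sketch of that pattern (TRY_CONVERT requires compatibility level 110 / SQL Server 2012 or higher; binary style 1 expects the '0x' prefix and an even number of hex digits, so odd-length IDs also come back as NULL):
SELECT mediaID,
       TRY_CONVERT(BIGINT,
           TRY_CONVERT(VARBINARY(8), CONCAT('0x', mediaID), 1)) AS mediaID_dec
FROM M_T_CONTRACT_CUSTOMERS
WHERE mediaID IS NOT NULL;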
Unfortunately, the solution of ALICE... didn't work out for me as this was also (strangely) returning records which had the "+" character within them.
Edit: The added comment of Alice... where you create a calculated field like this:
CASE WHEN "KEY" LIKE '%[^0-9A-F]%' THEN 0 ELSE 1 end as xyz
and then filter in the next query like this:
where xyz = 1
works also with SQL Instances with compatibility level < SQL 2012.
A great addition for people who still have to work with older SQL instances.
An option (although not ideal in terms of performance) is to check the characters in the MediaID through a CASE expression and a LIKE character class.
Hexadecimal values cannot contain characters other than the digits 0 to 9 and the letters A to F, so a value is invalid as soon as any character falls outside that set:
CASE WHEN MediaID LIKE '%[^0-9A-F]%' THEN 0 ELSE 1 END
I would recommend writing a function that can be used to evaluate MediaID first and checks whether it is hexadecimal, and then running the query for conversion.
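A minimal sketch of such a helper; the name dbo.IsHex is hypothetical:
CREATE FUNCTION dbo.IsHex (@value NVARCHAR(50))
RETURNS BIT
AS
BEGIN
    -- Valid hex: non-empty, and no character outside 0-9 / A-F.
    RETURN CASE
        WHEN @value IS NULL OR @value = N'' THEN 0
        WHEN @value LIKE N'%[^0-9A-Fa-f]%' THEN 0
        ELSE 1
    END;
END;
It could then be used as WHERE dbo.IsHex(mediaID) = 1, keeping in mind that a scalar function in a predicate is not free on a table of 0.7 million rows.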

Selecting all rows from Informix table containing some null columns

I am using Perl DBD::ODBC to connect to an Informix database which I was previously blind to the schema of. I have successfully discovered the schema via querying tabname and colname tables. I am now iterating over each of those tables extracting everything in them to load into another model. What I am finding is that null columns bail out of the select query. E.g. if a table looked like this, with an optionally null lastseen column (of whatever data type):
ID  username  lastseen
--  --------  ----------
1   joe       1234567890
2   bob       1098765432
3   mary
4   jane      1246803579
then select * from mytable (or specifying all column names individually) stops at the mary row.
I do have this working by using NVL as follows:
select nvl(id, ''), nvl(username, ''), nvl(lastseen, '') from mytable
And that's okay, but my question is: Is there a simpler Informix syntax to allow nulls to come into my result set, something as simple as NULLS OK or something that I am missing? Alternatively, some database handle option to allow the same?
Here is an example of my Perl with the nvl() hack, in case it's relevant:
my %tables = (
users => [
qw(id username lastseen)
]
);
foreach my $tbl (sort keys %tables) {
my $sql = 'select ' . join(',', map { "nvl($_, '')" } @{$tables{$tbl}}) . " from $tbl";
# sql like: select nvl(a, ''), nvl(b, ''), ...
my $sth = $dbh->prepare($sql);
$sth->execute;
while(defined(my $row = $sth->fetchrow_arrayref)) {
# do ETL stuff with $row
}
}
After a balked attempt at installing DBD::Informix, I came back to this and found that, for some reason, enabling LongTruncOk on the database handle did allow all rows, including those with null columns, to be selected. I don't imagine this is the root of the issue, but it worked here.
However, this solution seems to have collided with an unrelated tweak to locales to support non-ASCII characters. I added DB_LOCALE=en_us.utf8 and CLIENT_LOCALE=en_us.utf8 to my connection string to prevent selects from similarly breaking when encountering non-ASCII characters (i.e., in a result set of, say, 500 rows where the 300th row had a non-ASCII character, the trailing 200 rows would not be returned). With the locales set this way and LongTruncOk enabled on the dbh, all rows are returned (without the NVL hack), but null columns have bytes added to them from previous rows, and not in any pattern that is obvious to me. When I leave the locale settings off the connection string and set LongTruncOk, rows with null columns are selected correctly, but rows with UTF-8 characters break.
So if you don't have a charset issue, perhaps just LongTruncOk would work for you. For my purposes, I have had to continue using the NVL workaround for nulls and specify the locales for characters.
Check this - section NULL in Perl. It seems that with this driver there is no simple way to handle this problem.

Oracle SQL SELECT unpopulated records with the "NOT NULL" constraint

In order to retrieve the data I need, I must SELECT records where there is no data. Some of the rows are unpopulated(?).
Many of these "unpopulated" rows are set with the Not Null constraint. This makes things difficult! I cannot simply search for NULL rows because they are NOT NULL.
I have been able to select or exclude unpopulated rows with a few methods. These methods seem to randomly work or not work.
Example: select or exclude records where st.sart_code, st.sart_hold, st.sart_status, or st.sart_date is unpopulated.
SELECT
sp.sprite_id, sp.sprite_last, sp.sprite_first,
st.sart_code
/* 4 data retrieval methods are listed below.
For st.sart_code, I have substituted:
st.sart_hold, st.sart_status, and st.sart_date in the methods 2-4*/
FROM
sprite sp
JOIN sart st
on sp.sprite_pidm = st.sart_pidm
METHOD 1 - select records with rows that do not have the value EVEA -- st.sart_code could contain multiple values for one sp.sprite_id. This is a checklist of items. I am looking for records that do not have EVEA in the checklist
Varchar2 type with a Not Null constraint
WHERE
Sp.sprite_change_ind is null
and
st.sart_pidm NOT IN
(SELECT st.sart_pidm
FROM sart st
WHERE st.sart_code = 'EVEA')
METHOD 2 - select records with rows that do not have the value A2 -- st.sart_hold could contain multiple values for one sp.sprite_id. st.sart_hold may be blank/unpopulated (record has no holds) or contain several different holds. The values are account hold types. I am looking for records that do not have that particular "A2" hold.
Varchar2 type with a Not Null constraint
EDIT I just realized that this works ONLY if there is at least one hold already. If the person has no holds, this script will not select the records (even though the person also has no A2 hold).
WHERE
Sp.sprite_change_ind is null
group by sp.sprite_id, sp.sprite_last, sp.sprite_first, st.sart_hold
having sum(case when st.sart_hold = 'A2' then 1 else 0 end) = 0;
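A hedged sketch that addresses the EDIT above: an anti-join with NOT EXISTS also keeps people who have no holds at all, because it only removes those who do have an 'A2' hold (table and column names follow the question):
WHERE
Sp.sprite_change_ind is null
and NOT EXISTS
(SELECT 1
 FROM sart h
 WHERE h.sart_pidm = sp.sprite_pidm
 AND h.sart_hold = 'A2')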
METHOD 3 - select records with rows that have no value for st.sart_status -- st.sart_status could contain only 1 of 3 possible values or NO value for one sp.sprite_id. The values are file statuses. I am looking for records that have no status
Varchar2 type with a Not Null constraint
WHERE
Sp.sprite_change_ind is null
and
trim(st.sart_status) is null
METHOD 4 - select records with rows that are NOT missing ANY values in st.sart_date (all date fields in list are populated) -- st.sart_date could either contain a date or be blank/unpopulated for one sp.sprite_id. The value is a received date for a checklist item. I am excluding ANY record that has no date for any of the checklist items (there may be many items with corresponding dates).
Date type with a Not Null constraint
This is a little different, so I am including the first part again.
with MYVIEW AS
(
SELECT
sp.sprite_id AS Per_ID,
sp.sprite_last,
sp.sprite_first,
st.sart_date as RECEIVED_DATE
FROM
sprite sp
JOIN sart st
on sp.sprite_pidm = st.sart_pidm
WHERE
Sp.sprite_change_ind is null
)
Select
Per_ID as "ID",
max(sprite_last) as "Last",
max(sprite_first) as "First"
FROM MYVIEW
GROUP BY Per_ID
HAVING SUM(NVL2(RECEIVED_DATE,0,1)) = 0
My questions: I have had a difficult time finding methods of working with Not Null constraint fields.
EDIT: How do I see what is in the "not null" constrained field when it is not populated?
Why do the methods above not always work when looking for unpopulated fields? Do certain methods only work with certain data types (varchar2, number, date)? Or does it have to do with the type of JOIN I use? Something else?
Are there other methods out there someone could please direct me to? Any guidance would be greatly appreciated!
What is the correct terminology for "selecting records where there are unpopulated fields of [ColumnName DataType() NOT NULL]?" If I knew the terminology for what I am trying to ask, I could search for it.
NOTE My scripts are usually MUCH more involved than the examples above. I usually have at least 3 joins and many WHERE clauses.
Please let me know if this question is too involved! I am new here. :-)
Probably more a long comment than an answer, but since there isn't much activity here...
From the comment thread:
- Oracle SQL SELECT blank records with the “NOT NULL” constraint (Auntie Anita)
- How do you have blanks if they are not null - is that partly what you're asking? (Alex Poole)
- That is one of the problems. (Auntie Anita)
A few things to know:
In Oracle, the empty string '' is the same thing as NULL for VARCHAR/CHAR. That's a departure from "standard" SQL, which makes a distinction between empty strings and NULL strings.
TRIM will return NULL for NULL, empty, or space-only strings.
But strings composed of spaces/invisible characters are not null, even if they contain only the character CHR(0) (aka NUL, with only one L).
And TRIM does not remove invisible characters, only spaces.
To convince yourself, try those:
select NVL2(CAST('' AS VARCHAR2(20)), 'NOT NULL','NULL') FROM DUAL
select NVL2(CAST('' AS CHAR(20)), 'NOT NULL','NULL') FROM DUAL
select NVL2(TRIM(' '), 'NOT NULL','NULL') FROM DUAL
select NVL2(' ', 'NOT NULL','NULL') FROM DUAL
select NVL2(CHR(10), 'NOT NULL','NULL') FROM DUAL
select NVL2(CHR(0), 'NOT NULL','NULL') FROM DUAL
select NVL2(TRIM(' '||CHR(10)), 'NOT NULL','NULL') FROM DUAL
select NVL2(TRIM(' '||CHR(0)), 'NOT NULL','NULL') FROM DUAL
So, my guess is your "not null empty fields" in fact contain either some invisible characters -- or maybe even a single CHR(0). This is quite possible as in some languages, the NUL character is used as string terminator -- and might have sneaked into your DB at data import time for empty/missing values. Intentionally or not.
To check for that, you might want to try RAWTOHEX to examine your suspect data fields. In the following example, notice how the middle NUL character is lurking unnoticed when displayed as a string. But not in the raw hex dump:
SQL> select 'abc' || chr(0) || 'def' AS str,
RAWTOHEX('abc' || CHR(0) || 'def') AS hex FROM DUAL
STR      HEX
abcdef   61626300646566
               ^^
Is there something special here? Yes!
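If that is your case, a hedged sketch for hunting such rows down (table and column names follow the question; TRANSLATE maps NUL, TAB, LF and CR to spaces so that TRIM can then eat them):
SELECT st.sart_pidm,
       RAWTOHEX(st.sart_status) AS status_hex
FROM sart st
WHERE st.sart_status IS NOT NULL
AND TRIM(TRANSLATE(st.sart_status,
                   CHR(0) || CHR(9) || CHR(10) || CHR(13),
                   '    ')) IS NULL;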
Please let me know if this question is too involved! I am new here. :-)
:D "StackOverflow" is usually much more efficient if you are able to narrow down your issue. Ideally providing some reproducible case (formelly know as SSCCE or MCVE).
Take time to examine your data closely, and if needed, don't hesitate to post another, more focused question.

Update column in postgresql

I found out that I have a character varying column with mistakes in a database with over 4 million records. It contains numbers. Each number has to have 12 digits, but for some reason a lot of those numbers ended up having 10 digits.
The good news is that the only thing I have to do, is prepend '55' to each cell that only has 10 digits and starts with the number '22', leaving the ones with 12 digits untouched.
My objective is this:
UPDATE
table
SET
column = CONCAT( '55', column )
WHERE
LENGTH( column ) = 10 AND column LIKE( '22%');
I am thinking of using this:
UPDATE
telephones
SET
telephone_number = CONCAT( '55', telephone_number )
WHERE
LENGTH( telephone_number ) = 10 AND telephone_number LIKE( '22%');
Am I doing it right? If not, what would be the correct way to do it
What if, instead of a string, the numbers were stored as BIGINT and the same rules apply: each is still 10 digits long, meaning the number is lower than 3,000,000,000 and bigger than 2,000,000,000, and they all need to end up starting with 55?
The answer is: yes, that's right. You can play around with a sample database here on SQL Fiddle. That one uses the BIGINT type. Also see this one by @gmm, which uses the VARCHAR form. Both work just like you've described them using your original syntax.
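For the BIGINT variant, a hedged sketch of the arithmetic: a 10-digit number starting with '22' lies between 2200000000 and 2299999999, and prepending the digits '55' to a 10-digit number is the same as adding 550000000000:
UPDATE telephones
SET telephone_number = telephone_number + 550000000000
WHERE telephone_number BETWEEN 2200000000 AND 2299999999;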

PostgreSQL ORDER BY issue - natural sort

I've got a Postgres ORDER BY issue with the following table:
em_code  name
EM001    AAA
EM999    BBB
EM1000   CCC
To insert a new record into the table, I:
1. Select the last record with SELECT * FROM employees ORDER BY em_code DESC
2. Strip the alphabetic prefix from em_code using a regexp and store it in ec_alpha
3. Cast the remaining part to an integer, ec_num
4. Increment it by one: ec_num++
5. Pad with sufficient zeros and prefix it with ec_alpha again
When em_code reaches EM1000, the above algorithm fails: the first step returns EM999 instead of EM1000, so it generates EM1000 as the new em_code again, breaking the unique key constraint.
Any idea how to select EM1000?
Since Postgres 9.6, it is possible to specify a collation which will sort columns with numbers naturally.
https://www.postgresql.org/docs/10/collation.html
-- First create a collation with numeric sorting
CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');
-- Alter table to use the collation
ALTER TABLE "employees" ALTER COLUMN "em_code" type TEXT COLLATE numeric;
Now just query as you would otherwise.
SELECT * FROM employees ORDER BY em_code
On my data, I get results in this order (note that it also sorts foreign numerals):
Value
0
0001
001
1
06
6
13
۱۳
14
One approach you can take is to create a naturalsort function for this. Here's an example, written by Postgres legend RhodiumToad.
create or replace function naturalsort(text)
returns bytea language sql immutable strict as $f$
select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'),'\x00')
from regexp_matches($1, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;
Source: http://www.rhodiumtoad.org.uk/junk/naturalsort.sql
To use it simply call the function in your order by:
SELECT * FROM employees ORDER BY naturalsort(em_code) DESC
The reason is that the string sorts alphabetically (instead of numerically like you would want it) and 1 sorts before 9.
You could solve it like this:
SELECT * FROM employees
ORDER BY substring(em_code, 3)::int DESC;
It would be more efficient to drop the redundant 'EM' from your em_code - if you can - and save an integer number to begin with.
Answer to question in comment
To strip any and all non-digits from a string:
SELECT regexp_replace(em_code, E'\\D','','g')
FROM employees;
\D is the regular expression class-shorthand for "non-digits".
'g' as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.
After replacing every non-digit with the empty string, only digits remain.
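For example, to order by the numeric part directly (a hedged sketch; NULLIF guards against codes with no digits at all, where the cast to int would otherwise fail):
SELECT *
FROM employees
ORDER BY NULLIF(regexp_replace(em_code, E'\\D', '', 'g'), '')::int DESC;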
This always comes up in questions and in my own development, and I finally tired of tricky ways of doing it. I broke down and implemented it as a PostgreSQL extension:
https://github.com/Bjond/pg_natural_sort_order
It's free to use, MIT license.
Basically it just normalizes the numerics within strings (zero-prepending them) such that you can create an index column for full-speed natural sorting. The readme explains.
The advantage is you can have a trigger do the work and not your application code. It will be calculated at machine-speed on the PostgreSQL server and migrations adding columns become simple and fast.
You can use just this line:
ORDER BY length(substring(em_code FROM '[0-9]+')), em_code
I wrote about this in detail in this related question:
Humanized or natural number sorting of mixed word-and-number strings
(I'm posting this answer as a useful cross-reference only, so it's community wiki).
I came up with something slightly different.
The basic idea is to create an array of tuples (integer, string) and then order by these. The magic number 2147483647 is int32_max, used so that strings are sorted after numbers.
ORDER BY ARRAY(
SELECT ROW(
CAST(COALESCE(NULLIF(match[1], ''), '2147483647') AS INTEGER),
match[2]
)
FROM REGEXP_MATCHES(col_to_sort_by, '(\d*)|(\D*)', 'g')
AS match
)
I thought about another way of doing this that uses less DB storage than padding and is faster than calculating on the fly.
https://stackoverflow.com/a/47522040/935122
I've also put it on GitHub
https://github.com/ccsalway/dbNaturalSort
The following solution is a combination of various ideas presented in another question, as well as some ideas from the classic solution:
create function natsort(s text) returns text immutable language sql as $$
select string_agg(r[1] || E'\x01' || lpad(r[2], 20, '0'), '')
from regexp_matches(s, '(\D*)(\d*)', 'g') r;
$$;
The design goals of this function were simplicity and pure string operations (no custom types and no arrays), so it can easily be used as a drop-in solution, and is trivial to be indexed over.
Note: If you expect numbers with more than 20 digits, you'll have to replace the hard-coded maximum length 20 in the function with a suitable larger length. Note that this will directly affect the length of the resulting strings, so don't make that value larger than needed.
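A hedged usage sketch for the function above; since it is declared IMMUTABLE, an expression index (the index name is made up) can back such queries:
SELECT * FROM employees ORDER BY natsort(em_code);
CREATE INDEX employees_natsort_idx ON employees (natsort(em_code));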