Integer comparison as string - sql

I have an integer column and I want to find numbers that start with specific digits.
For example, these values match when I look for '123':
1234567
123456
1234
These do not match:
23456
112345
0123445
Is converting the integers to strings before doing a string comparison the only way to handle this?
I am currently using PostgreSQL's regexp_replace(text, pattern, replacement) on the numbers, which is very slow and inefficient.
I have a large amount of data to process this way, and I am looking for the most economical way to do it.
PS. I am not asking how to cast an integer to a string.

Are you looking for a match at the start of the value?
You might create a functional index like this:
CREATE INDEX my_index ON mytable(CAST(stuff AS TEXT));
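It should be used by a LIKE query that filters on the same expression (CAST(stuff AS TEXT) LIKE '123%'), but I didn't test it. In PostgreSQL with a non-C locale, the index may also need the text_pattern_ops operator class before LIKE prefix matches can use it.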

As a standard principle (IMHO), a database design should use a number type if and only if the field is:
A number you could sensibly perform maths on
A reference code within the database - keys etc
If it's a number in some other context - phone numbers, IP addresses etc - store it as text.
This sounds to me like your '123' is conceptually a string that just happens to only contain numbers, so if possible I'd suggest altering the design so it's stored as such.
Otherwise, I can't see a sensible way to do the comparison while treating it as a number, so you'll need to convert it to a string on the fly with something like
SELECT * FROM mytable WHERE CAST(CheckVar AS TEXT) LIKE '123%'

The best way for performance is to store them as strings with an index on the column and use LIKE '123%'. Most other methods of solving this will likely involve a full table scan.
If you aren't allowed to change the table, you could try the following, but it's not pretty:
WHERE col = 123
OR col BETWEEN 1230 AND 1239
OR col BETWEEN 12300 AND 12399
etc...
This might also result in a table scan, though. You can avoid that by converting each OR branch into its own SELECT and combining them with UNION ALL to get the final result.
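The "etc..." in the range approach above follows a simple pattern: each step multiplies the lower bound by 10 and the upper bound by 10 plus 9. A minimal sketch of that pattern in Python, assuming a positive prefix and a known maximum number of digits (prefix_ranges and starts_with are hypothetical helper names, not part of any library):

```python
def prefix_ranges(prefix: int, max_digits: int) -> list:
    """Return inclusive (low, high) ranges covering every positive integer
    of at most max_digits digits whose decimal form starts with prefix."""
    ranges = []
    low, high = prefix, prefix
    while len(str(low)) <= max_digits:
        ranges.append((low, high))
        # One more trailing digit: 123 -> 1230..1239 -> 12300..12399 ...
        low, high = low * 10, high * 10 + 9
    return ranges


def starts_with(value: int, prefix: int, max_digits: int = 10) -> bool:
    """Pure integer comparison, no string conversion of the tested value."""
    return any(lo <= value <= hi for lo, hi in prefix_ranges(prefix, max_digits))


ranges_for_123 = prefix_ranges(123, 5)
```

Each tuple corresponds to one BETWEEN clause (or one SELECT in the UNION ALL variant), so the list can be used to generate the WHERE clause for whatever maximum width the column allows.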

Query to ignore rows which have non hex values within field

Initial situation
I have a relatively large table (about 0.7 million records) where an nvarchar field "MediaID" mostly contains media IDs in proper hexadecimal notation (as it should).
Within my "sequential" query (each query depends on the output of the one before it, all in pure T-SQL) I have to convert these hexadecimal values into decimal bigint values so that the subsequent queries can calculate with and filter on them.
--> So far, no problem. The "sequential" query works fine.
Problem
Unfortunately, some of these media IDs contain non-hex characters, most probably because of typing errors by the people who entered them or import errors from the previous business system.
Because of these non-hex characters, the whole query fails (of course) as soon as the conversion hits an error.
For my current purpose, such rows must be skipped/ignored, as they are clearly wrong and cannot be used (no media / data carriers in use with the current business system can have IDs with non-hex characters).
Manual editing of the data is not an option, as there are too many errors and it is not clear what the data should be replaced with.
Challenge
To create a query that only returns records with valid hex values in the media ID field.
(Unfortunately, my SQL skills are not good enough to write this query. Your help is highly appreciated.)
The relevant section of the larger query looks like this (xxxx is where your help comes in :-))
select
pureMediaID
, mediaID
, CUSTOMERID
,CONTRACT_CUSTOMERID
from
(
select concat('0x', Replace(Ltrim(Replace(mediaID, '0', ' ')), ' ', '0')) AS pureMediaID
--, CUSTOMERID
, *
from M_T_CONTRACT_CUSTOMERS
where mediaID is not null
and mediaID like '0%'
and xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
) as inner1
EDIT: As per request I have added here some good and some bad data:
Good:
4335463357
4335459809
1426427996
4335463509
4335515039
4335465134
4427370396
4335415661
4427369036
4335419089
004BB03433
004e7cf9c6
00BD23133
00EE13D8C1
00CCB5522C
00C46522C
00dbbe3433
Bad:
4564589+
AB6B8BFC.8
7B498DFCnm
DB218DFChb
d<tgfh8CFC
CB9E8AFCzj
B458DFCjhl
rytzju8DFC
BFCtdsjshj
DB9888FCgf
9BC08CFCyx
EB198DFCzj
4B628CFChj
7B2B8DFCgg
After I upgraded the compatibility level of the SQL instance to SQL2016 (it was below 2012 before), I could use try_convert with the same syntax as the original convert function, as donPablo pointed out. With that, the query runs all the way through, and every MediaID that is not a correct hex value is converted to NULL. Really, really nice.
Exactly what I needed.
Unfortunately, the solution from ALICE... didn't work for me, as it (strangely) also returned records that had the "+" character in them.
Edit: The comment added by Alice..., where you create a calculated field like this:
CASE WHEN "KEY" LIKE '%[^0-9A-F]%' THEN 0 ELSE 1 end as xyz
and then filter in the next query like this:
where xyz = 1
also works with SQL instances with compatibility level < SQL 2012.
A great addition for people who still have to work with older SQL instances.
An option (although not ideal in terms of performance) is to check the characters in the MediaID with a CASE expression and a LIKE pattern.
Hexadecimal values cannot contain characters other than the letters A-F and the digits 0-9.
CASE WHEN MediaID LIKE '%[0-9A-F]%' THEN 1 ELSE 0 END
I would recommend writing a function that first checks whether MediaID is hexadecimal, and then running the conversion query only on the rows that pass.
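The difference between the two LIKE patterns in this thread is the difference between "contains at least one hex character" ('%[0-9A-F]%') and "contains no non-hex character" (NOT LIKE '%[^0-9A-F]%'). That would explain why rows such as 4564589+ still came back with the first pattern. A minimal sketch of both checks in Python, using regular expressions as a stand-in for the T-SQL patterns and assuming case-insensitive matching (all function names here are hypothetical):

```python
import re

HEX_ONLY = re.compile(r"[0-9A-Fa-f]+")
ANY_HEX_CHAR = re.compile(r"[0-9A-Fa-f]")


def contains_hex_char(media_id: str) -> bool:
    """Mirrors LIKE '%[0-9A-F]%': true if ANY character is a hex digit."""
    return ANY_HEX_CHAR.search(media_id) is not None


def is_hex(media_id: str) -> bool:
    """Mirrors NOT LIKE '%[^0-9A-F]%' on a non-empty value: ALL characters are hex digits."""
    return HEX_ONLY.fullmatch(media_id) is not None


def try_hex_to_int(media_id: str):
    """Rough analogue of try_convert: the integer value, or None for invalid input."""
    return int(media_id, 16) if is_hex(media_id) else None
```

With the sample data from the question, is_hex accepts 004BB03433 and rejects 4564589+, while contains_hex_char accepts both.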

Implementing Greater Than operation using SQL wildcard

I have some serialized data inside a relational database table, like:
ID | VALUE
60 | "A=18, D=78"
70 | "D=4, A=18"
80 | "A=21, C=44"
The system can perform queries for searching a particular value using wildcards:
LIKE '%A=18%' (returns the ID:60 and ID:70 registers)
I now need to implement a Greater Than operator in a similar way.
Is it possible using wildcards?
Thanks!
No, that is not possible. The pattern is treated as a string literal.
When you write LIKE '%A=10%', A=10 is treated as text to match, not as an expression to evaluate.
So if you write LIKE '%A>10%', it takes A>10 as a string, performs no math on it, and returns only rows that contain that literal text. In your case it would return nothing.
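To make the distinction concrete: a numeric comparison needs the number extracted from the text first, which a wildcard cannot do. A minimal sketch in Python, assuming the serialized format is always comma-separated KEY=integer pairs as in the sample rows (parse_pairs and ids_where_greater are hypothetical helpers):

```python
def parse_pairs(value: str) -> dict:
    """Turn 'A=18, D=78' into {'A': 18, 'D': 78}."""
    pairs = {}
    for item in value.split(","):
        key, _, number = item.strip().partition("=")
        pairs[key] = int(number)
    return pairs


def ids_where_greater(rows: dict, key: str, threshold: int) -> list:
    """Return the IDs whose serialized value has key > threshold (numeric comparison)."""
    return [
        row_id
        for row_id, value in rows.items()
        if key in parse_pairs(value) and parse_pairs(value)[key] > threshold
    ]


rows = {60: "A=18, D=78", 70: "D=4, A=18", 80: "A=21, C=44"}
matching_ids = ids_where_greater(rows, "A", 18)
```

The same idea applies inside the database: the value has to be split and cast to a number before any greater-than comparison can be made.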

Manipulating a record data

I am looking for a way to take data from one table and manipulate it and bring it to another table using an SQL query.
I have a Column called NumberStuff that has data like this in it:
INC000000315482
I need to cut off the INC portion of the number and convert it into an integer and store it into a Column in another table so that it ends up looking like this:
315482
Any help would be much appreciated!
Another approach is to use the Replace function, either in TSQL or as a Derived Column Expression in SSIS.
TSQL
SELECT REPLACE(T.MyColumn, 'INC', '') AS ReplacedINC
SSIS
REPLACE([MyColumn], "INC", "")
This removes the character-based data. Converting to a numeric type before storing it in the target table is then optional; otherwise you can let the implicit conversion happen.
Here is the simplest version of what you need:
select cast(right(column,6) as int) from table
Are you doing this in an SSIS statement? Is it always the last 6 digits?
This one is a little less dependent on your formatting: it works for any length, trims the first 3 characters, and the cast drops the leading zeros.
select cast(SUBSTRING('INC000000315482',4,LEN('INC000000315482') - 3) as int)
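The substring-then-cast idea above can be checked outside the database. A minimal sketch in Python, assuming every value starts with the literal prefix INC followed only by digits (ticket_number is a hypothetical helper name):

```python
def ticket_number(raw: str, prefix: str = "INC") -> int:
    """Strip the text prefix and convert the rest to an integer.

    The integer conversion discards the leading zeros, just as the
    cast to int does in the SQL versions.
    """
    if not raw.startswith(prefix):
        raise ValueError(f"expected a value starting with {prefix!r}: {raw!r}")
    return int(raw[len(prefix):])


example = ticket_number("INC000000315482")
```

Unlike the right(column, 6) variant, this does not depend on the number always having six significant digits.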

Sorting '£' (pound symbol) in sql

I am trying to sort £ along with other special characters, but it's not sorting properly.
I want that string to be sorted along with other strings starting with special characters. For example I have four strings:
&!##
££$$
abcd
&#$%.
Right now it sorts in the order: &!##, &#$%, abcd, ££$$.
I want it in the order: &!##, &#$%, ££$$, abcd.
I have used order by replace(column,'£','*') so that it sorts along with strings starting with *. This seems to work when querying the DB directly, but when used in code and deployed, the £ gets replaced by �, i.e. the query becomes replace(column,'�','*'), and the sort is not as expected.
How can I resolve this? Is there any other way to sort the pound symbol (£)? Any help would be greatly appreciated.
You seem to have two problems: performing the actual sort, and (possibly) how the £ symbol appears in the results in your code. Without knowing anything about your code, client or environment it's rather hard to guess what you might need to change, but I'd start by looking at your NLS_LANG and other NLS settings at the client end. @amccausl's link might be useful, but it depends on what you're doing. I suspect you'll find different values in nls_session_parameters when queried from SQL*Plus and from your code, which may give you some pointers.
The sorting itself is slightly clearer now. Have a look at the docs for Linguistic Sorting and String Searching and NLSSORT.
You can do something like this (with a CTE to generate your data):
with tmp_tab as (
select '&!##' as value from dual
union all select '££$$' from dual
union all select 'abcd' from dual
union all select '&#$%' from dual
)
select * from tmp_tab
order by nlssort(value, 'NLS_SORT = WEST_EUROPEAN')
VALUE
------
&!##
&#$%
££$$
abcd
4 rows selected.
You can get the sort values supported by your configuration with select value from v$nls_valid_values where parameter = 'SORT', but WEST_EUROPEAN seems to do what you want, for this sample data anyway.
You can see the default sorting in your current session with select value from nls_session_parameters where parameter = 'NLS_SORT'. (You can change that with an ALTER SESSION, but it's only letting me do that with some values, so that may not be helpful here).
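The order the asker sees is consistent with a binary sort, which compares character code points: £ is U+00A3, which is above every ASCII letter, so ££$$ lands after abcd. A minimal sketch in Python that reproduces both the observed order and the effect of the replace(column,'£','*') workaround (this illustrates code-point ordering only; it does not emulate Oracle's linguistic sorts):

```python
values = ["&!##", "££$$", "abcd", "&#$%"]

# Plain sorted() compares strings by Unicode code point, like a binary sort.
binary_order = sorted(values)

# The asker's workaround: sort as if every pound sign were an asterisk.
workaround_order = sorted(values, key=lambda s: s.replace("£", "*"))

# Code points that decide the ordering: '&' < '*' < 'a' < '£'.
code_points = {ch: ord(ch) for ch in "&*a£"}
```

Here binary_order matches the order reported in the question, and workaround_order matches the order the asker wants.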
You need to make sure your application code is all proper UTF-8 (see http://htmlpurifier.org/docs/enduser-utf8.html for more details)
It seems like your issue is with the database character set, or with a difference in character sets between the app and the database. On the Oracle side, you can check with:
select value from sys.nls_database_parameters where parameter='NLS_CHARACTERSET';
If this comes back as an ASCII character set (like US7ASCII), you may have issues storing the data properly. Even with that character set, you should be able to insert the values and retrieve them sorted (binary sort) by using nvarchar2 and unistr (assuming they conform to your NLS_NCHAR_CHARACTERSET; see the query above but change the parameter), like:
create table test1(val nvarchar2(100));
insert into test1(val) values (unistr('\00a3')); -- pound currency
insert into test1(val) values (unistr('\00a5')); -- yen currency
insert into test1(val) values ('$'); -- dollar currency
commit;
select * from test1
order by val asc;
-- will give symbols in order: dollar('\0024'), pound ('\00a3'), yen ('\00a5')
I will say that I would not resort to using the national character set. I would probably change the database character set to fit the needs of my data, as supporting two different character sets isn't ideal, but it's available anyway.
If you have no issues storing or retrieving the data on the database side, then your app/client character set is probably different from your database's.
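A character-set mismatch of this kind would also explain the � in the deployed query. A minimal sketch in Python, assuming (purely for illustration) that one side writes the pound sign as Latin-1 and the other reads it as UTF-8, or the reverse:

```python
pound = "£"

latin1_bytes = pound.encode("latin-1")  # a single byte, 0xA3
utf8_bytes = pound.encode("utf-8")      # two bytes, 0xC2 0xA3

# Latin-1 bytes read as UTF-8: 0xA3 alone is not valid UTF-8, so a
# lenient decoder substitutes U+FFFD, the replacement character.
read_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# UTF-8 bytes read as Latin-1: every byte maps to some character,
# so the result is two characters instead of one.
read_as_latin1 = utf8_bytes.decode("latin-1")
```

Once the character has been turned into U+FFFD, replace(column,'£','*') no longer finds anything to replace, which matches the behaviour described in the question.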
Use nchar(163), which is the code point of £. It will work.
select nchar(163)

how to rearrange a data

I have a table like this:-
Item Model
------------------------
A 10022009
B 10032006
C 05081997
I need to rearrange/convert the Model column into this format:-
Item Model
------------------------
A 20090210
B 20060310
C 19970805
The Model column is character.
Thanks
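You can try the following (|| is the standard string concatenation operator; + on strings only concatenates in some DBMSs):
UPDATE MyTable
SET Model = substr(Model, 5, 4) || substr(Model, 3, 2) || substr(Model, 1, 2)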
The right way to do this, assuming those are date fields (and they certainly look like them), is to put that data into a date type column, not a string type column.
Then you can use the DBMS-provided date/time manipulation functions as they were meant to be used, including being able to extract them in the format and order that you want.
Normally, I would have proposed a simple textual change with substrings but, since you're going to change the data anyway, the best thing to do is bite the bullet and change the schema so all your problems disappear (not just one of them).
If you want to keep it as a string type, the syntax to use depends on your DBMS. It's likely to be one of the following:
substring (column, start, length) # substr for Oracle, I think.
substring (column FROM start FOR length)
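Both suggestions in this thread, reordering substrings and treating the values as dates, can be checked against the sample rows. A minimal sketch in Python, assuming Model is always eight characters in day-month-year order, as the expected output implies (the function names are hypothetical):

```python
from datetime import datetime


def reorder_by_slicing(model: str) -> str:
    """Same positions as substr(Model, 5, 4), substr(Model, 3, 2), substr(Model, 1, 2)."""
    return model[4:8] + model[2:4] + model[0:2]


def reorder_as_date(model: str) -> str:
    """Parse as a real date first, so impossible values raise ValueError."""
    return datetime.strptime(model, "%d%m%Y").strftime("%Y%m%d")


samples = ["10022009", "10032006", "05081997"]
sliced = [reorder_by_slicing(m) for m in samples]
parsed = [reorder_as_date(m) for m in samples]
```

The slicing version rearranges any eight characters blindly, while the date version rejects values such as 31022009, which is one argument for storing the column as a date type.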