SQL REPLACE function, how to replace single letter - sql

My code is as follows:
REPLACE(REPLACE(cc.contype,'x','y'),'y','z') as ContractType,
This REPLACE's correctly what I would like, but it unfortunatley changes all "z's" to "y's" when I would like
x > y
y > z
Does this make sense? I would not like all of the new Y's to then change again in my second REPLACE function. In Microsoft Access, I would do this with the following
Iif(cc.contype = x, y, iif(cc.contype = y, x))
But I am not sure how to articulate this in SQL, would it be best I do this kind of thing in the client side language?
Many thanks.
EDIT: Have also tried with no luck:
CASE WHEN SUBSTRING(cc.contype, 1, 1) = 'C'
THEN REPLACE(cc.contype, 'C', 'Signed')
CASE WHEN SUBSTRING(cc.contype, 1, 1) = 'E'
THEN REPLACE(cc.contype, 'E', 'Estimate') as ContractType,

Try doing it the other way round if you don't want the new "y"'s to become "z"'s:
REPLACE(REPLACE(cc.contype,'y','z'),'x','y') as ContractType

Not that I'm a big fan of the performance killing process of handling sub-columns, but it appears to me you can do that just by reversing the order:
replace(replace(cc.contype,'y','z'),'x','y') as ContractType,
This will transmute all the y characters to z before transmuting the x characters to y.
If you're after a more general solution, you can do unioned queries like:
select 'Signed: ' || cc.contype as ContractType
wherecc.contype like 'C%' from wherever
union all select 'Estimate: ' || cc.contype as ContractType
where cc.contype like 'E%' from wherever
without having to mess about with substrings at all (at the slight cost of prefixing the string rather than modifying it, and adding any other required conditions as well, of course). This will usually be much more efficient than per-row functions.
Some DBMS' will actually run these sub-queries in parallel for efficiency.
Of course, the ideal solution is to change your schema so that you don't have to handle sub-columns. Separate the contype column into two, storing the first character into contype_first and contype_rest.
Then whenever you want the full contype:
select contype_first || contype_rest ...
For your present query, you could then use a lookup table:
lookup_table:
first char(1) primary key
description varchar(20)
containing:
first description
----- -----------
C Signed:
E Estimate:
and the query:
select lkp.description || cc.contype_rest
from lookup_table lkp, real_table cc
where lkp.first = cc.first ...
Both these queries are likely to be blazingly fast compared to one that does repeated string substitutions on each row.
Even if you can't replace the single column with two independent columns, you can at least create the two new ones and use an insert/update trigger to keep them in sync. This gives you the old way and a new improved way for accessing the contype information.
And while this technically violates 3NF, that's often acceptable for performance reasons, provided you understand and mitigate the risks (with the triggers).

How about
REPLACE(REPLACE(REPLACE(cc.contype,'x','ahhhgh'),'y','z'),'ahhhgh','y') as ContractType,
ahhhgh can be replaced with whatever you like.

Related

Is there a way to check if one bit is set and another is not at the same time?

Sorry this has probably been asked before but it's rather hard to search for. I'm trying to figure if some bits have been set and some have not been set in a single operation. Is this possible?
e.g. I want to check if the fifth bit is off but either the second or third bit are on.
For the SQL situation you suggested, you could just mask out the three bits and perform an IN check for the patterns you accept, e.g.
SELECT * FROM table
WHERE (field & 0x16) IN (0x2, 0x4);
or if you accept both second and third bits being set at once, you can just do a range check, since all three combinations you accept exhibit no overlap with the range of the not accepted options:
SELECT * FROM table
WHERE (field & 0x16) BETWEEN 0x2 AND 0x6;
I'll note that while this works, it's not great style for SQL (or any language really). In practice, you'd probably want to stick with named single BIT (NOT NULL) fields, which provides more information to SQL (potentially allowing indexed searches and the like). Writing
SELECT * FROM table
WHERE (field2 = 1 OR field3 = 1) AND field5 = 0;
-- or for mutually exclusive
WHERE field2 + field3 = 1 AND field5 = 0;
is not meaningfully longer, and it's significantly more clear what you're doing (assuming you have more useful field names of course).

Efficient way to ignore whitespace in DB2?

I am running queries in a large IBM DB2 database table (let's call it T) and have found that the cells for column Identifier tend to be padded not just on the margins, but in between as well, as in: ' ID1 ID2 '. I do not have rights to update this DB, nor would I, given a number of factors. However, I want a way to ignore the whitespace AT LEAST on the left and right, even if I need to simply add a couple of spaces in between. The following queries work, but are slow, upwards of 20 seconds slow....
SELECT * FROM T WHERE Identifier LIKE '%ID1%ID2%';
SELECT * FROM T WHERE TRIM(Identifier) LIKE 'ID1%ID2';
SELECT * FROM T WHERE TRIM(Identifier) = 'ID1 ID2';
SELECT * FROM T WHERE LTRIM(RTRIM(Identifier)) = 'ID1 ID2';
SELECT * FROM T WHERE LTRIM(Identifier) LIKE 'ID1 ID2%';
SELECT * FROM T WHERE LTRIM(Identifier) LIKE 'ID1%ID2%';
SELECT * FROM T WHERE RTRIM(Identifier) LIKE '%ID1 ID2';
SELECT * FROM T WHERE RTRIM(Identifier) LIKE '%ID1%ID2';
Trying to query something like "Select * FROM T WHERE REPLACE(Identifier, ' ', '')..." of course just freezes up Access until I Ctrl+Break to end the operation. Is there a better, more efficient way to ignore the whitespace?
================================
UPDATE:
As #Paul Vernon describes below, "Trailing spaces are ignored in Db2 for comparison purpose, so you only need to consider the leading and embedded spaces."
This led me to generate combinations of spaces before 'ID1' and 'ID2' and select the records using the IN clause. The number of combinations means that the query is slower than if I knew the exact match. This is how it looks in my Java code with Jdbc (edited to make it more generic to the key issue):
private static final int MAX_LENGTH = 30;
public List<Parts> queryMyTable(String ID1, String ID2) {
String query="SELECT * FROM MYTABLE WHERE ID IN (:ids)";
final Map<String, List<String>> parameters = getIDCombinations(ID1, ID2);
return namedJdbcTemplate.query(query,parameters,new PartsMapper());
}
public static List<String> getIDCombinations(String ID1, String ID2) {
List<String> combinations = new ArrayList<>();
final int literalLength = ID1.length() + ID2.length();
final int maxWhitespace = MAX_LENGTH - literalLength;
combinations.add(ID1+ID2);
for(int x = 1; x <= maxWhitespace; x++){
String xSpace = String.format("%1$"+x+"s", "");
String idZeroSpaceBeforeBase = String.format("%s%s%s",ID1,xSpace,ID2);
String idZeroSpaceAfterBase = String.format("%s%s%s",xSpace,ID1,ID2);
combinations.add(idZeroSpaceBeforeBase);
combinations.add(idZeroSpaceAfterBase);
for(int y = 1; (x+y) <= maxWhitespace; y++){
String ySpace = String.format("%1$"+y+"s", "");
String id = String.format("%s%s%s%s",xSpace,ID1,ySpace,ID2);
combinations.add(id);
}
}
return combinations;
}
Trailing spaces are ignored in Db2 for comparison purpose, so you only need to consider the leading and embedded spaces.
Assuming there is an index on the Identifier, your only option (if you can't change the data, or add a functional index or index a generated column), is probably something like this
SELECT * FROM T
WHERE
Identifier = 'ID1 ID2'
OR Identifier = ' ID1 ID2'
OR Identifier = ' ID1 ID2'
OR Identifier = 'ID1 ID2'
OR Identifier = ' ID1 ID2'
OR Identifier = ' ID1 ID2'
which the Db2 optimize might implement as 6 index lookups, which would be faster than a full index or table scan
You could also try this
SELECT * FROM T
WHERE
Identifier LIKE 'ID1 %ID2'
OR Identifier LIKE ' ID1 %ID2'
OR Identifier LIKE ' ID1 %ID2'
which the Db2 optimize might implement as 3 index range scans,
In both examples add more lines to cover the maximum number of leading spaces you have in your data if needed. In the first example add more lines for the embeded spaces too if needed
Index on the expression REGEXP_REPLACE(TRIM(Identifier), '\s{2,}', ' ') and the following query should make Db2 use this index:
SELECT *
FROM T
WHERE REGEXP_REPLACE(TRIM(Identifier), '\s{2,}', ' ') = 'ID1 ID2'
If you need to search excluding leading and trailing spaces, then no traditional indexes can help you with that, at least as you show the case. To make the query fast, the options I can see are:
Full Text Search
You can use a "full text search" solution. DB2 does include this functionality, but I don't remember if it's included by default in the license or is sold separately. In any case, it requires a bit of indexing or periodic re-indexing of the data to make sure the search is up to date. It's worth the effort if you really need it. You'll need to change your app, since the mechanics are different.
Index on extra, clean column
Another solution is to index the column without the leading or trailing spaces. But you'll need to create an extra column; on a massive table this operation can take some time. The good news is that once is created then there's no more delay. For example:
alter table t add column trimmed_id varchar(100)
generated always as (trim(identifier));
Note: You may need to disable/enable integrity checks on the table before and after this clause. DB2 is picky about this. Read the manual to make sure it works. The creation of this column will take some time.
Then, you need to index it:
create index ix1 on t (trimmed_id);
The creation of the index will also take some time, but it should be faster than the step above.
Now, it's ready. You can query your table by using the new column instead of the original one (that's still there), but this time, you can forget about leading and traling spaces. For example:
SELECT * FROM T WHERE trimmed_id LIKE 'ID1%ID2';
The only wildcard now shows up in the middle. This query will be much faster than reading the whole table. In fact, the longer the string ID1 is, the faster the query will be, since the selectivity will be better.
Now, if ID2 is longer than ID1 then you can reverse the index to make it fast.

Can 2 character length variables cause SQL injection vulnerability?

I am taking a text input from the user, then converting it into 2 character length strings (2-Grams)
For example
RX480 becomes
"rx","x4","48","80"
Now if I directly query server like below can they somehow make SQL injection?
select *
from myTable
where myVariable in ('rx', 'x4', '48', '80')
SQL injection is not a matter of length of anything.
It happens when someone adds code to your existing query. They do this by sending in the malicious extra code as a form submission (or something). When your SQL code executes, it doesn't realize that there are more than one thing to do. It just executes what it's told.
You could start with a simple query like:
select *
from thisTable
where something=$something
So you could end up with a query that looks like:
select *
from thisTable
where something=; DROP TABLE employees;
This is an odd example. But it does more or less show why it's dangerous. The first query will fail, but who cares? The second one will actually work. And if you have a table named "employees", well, you don't anymore.
Two characters in this case are sufficient to make an error in query and possibly reveal some information about it. For example try to use string ')480 and watch how your application will behave.
Although not much of an answer, this really doesn't fit in a comment.
Your code scans a table checking to see if a column value matches any pair of consecutive characters from a user supplied string. Expressed in another way:
declare #SearchString as VarChar(10) = 'Voot';
select Buffer, case
when DataLength( Buffer ) != 2 then 0 -- NB: Len() right trims.
when PatIndex( '%' + Buffer + '%', #SearchString ) != 0 then 1
else 0 end as Match
from ( values
( 'vo' ), ( 'go' ), ( 'n ' ), ( 'po' ), ( 'et' ), ( 'ry' ),
( 'oo' ) ) as Samples( Buffer );
In this case you could simply pass the value of #SearchString as a parameter and avoid the issue of the IN clause.
Alternatively, the character pairs could be passed as a table parameter and used with IN: where Buffer in ( select CharacterPair from #CharacterPairs ).
As far as SQL injection goes, limiting the text to character pairs does preclude adding complete statements. It does, as others have noted, allow for corrupting the query and causing it to fail. That, in my mind, constitutes a problem.
I'm still trying to imagine a use-case for this rather odd pattern matching. It won't match a column value longer (or shorter) than two characters against a search string.
There definitely should be a canonical answer to all these innumerable "if I have [some special kind of data treatment] will be my query still vulnerable?" questions.
First of all you should ask yourself - why you are looking to buy yourself such an indulgence? What is the reason? Why do you want add an exception to your data processing? Why separate your data into the sheep and the goats, telling yourself "this data is "safe", I won't process it properly and that data is unsafe, I'll have to do something?
The only reason why such a question could even appear is your application architecture. Or, rather, lack of architecture. Because only in spaghetti code, where user input is added directly to the query, such a question can be ever occur. Otherwise, your database layer should be able to process any kind of data, being totally ignorant of its nature, origin or alleged "safety".

What applications are there for NULLIF()?

I just had a trivial but genuine use for NULLIF(), for the first time in my career in SQL. Is it a widely used tool I've just ignored, or a nearly-forgotten quirk of SQL? It's present in all major database implementations.
If anyone needs a refresher, NULLIF(A, B) returns the first value, unless it's equal to the second in which case it returns NULL. It is equivalent to this CASE statement:
CASE WHEN A <> B OR B IS NULL THEN A END
or, in C-style syntax:
A == B || A == null ? null : A
So far the only non-trivial example I've found is to exclude a specific value from an aggregate function:
SELECT COUNT(NULLIF(Comment, 'Downvoted'))
This has the limitation of only allowing one to skip a single value; a CASE, while more verbose, would let you use an expression.
For the record, the use I found was to suppress the value of a "most recent change" column if it was equal to the first change:
SELECT Record, FirstChange, NULLIF(LatestChange, FirstChange) AS LatestChange
This was useful only in that it reduced visual clutter for human consumers.
I rather think that
NULLIF(A, B)
is syntactic sugar for
CASE WHEN A = B THEN NULL ELSE A END
But you are correct: it is mere syntactic sugar to aid the human reader.
I often use it where I need to avoid the Division by Zero exception:
SELECT
COALESCE(Expression1 / NULLIF(Expression2, 0), 0) AS Result
FROM …
Three years later, I found a material use for NULLIF: using NULLIF(Field, '') translates empty strings into NULL, for equivalence with Oracle's peculiar idea about what "NULL" represents.
NULLIF is handy when you're working with legacy data that contains a mixture of null values and empty strings.
Example:
SELECT(COALESCE(NULLIF(firstColumn, ''), secondColumn) FROM table WHERE this = that
SUM and COUNT have the behavior of turning nulls into zeros. I could see NULLIF being handy when you want to undo that behavior. If fact this came up in a recent answer I provided. If I had remembered NULLIF I probably would have written the following
SELECT student,
NULLIF(coursecount,0) as courseCount
FROM (SELECT cs.student,
COUNT(os.course) coursecount
FROM #CURRENTSCHOOL cs
LEFT JOIN #OTHERSCHOOLS os
ON cs.student = os.student
AND cs.school <> os.school
GROUP BY cs.student) t

Integer comparison as string

I have an integer column and I want to find numbers that start with specific digits.
For example they do match if I look for '123':
1234567
123456
1234
They do not match:
23456
112345
0123445
Is the only way to handle the task by converting the Integers into Strings before doing string comparison?
Also I am using Postgre regexp_replace(text, pattern, replacement) on numbers which is very slow and inefficient way doing it.
The case is that I have large amount of data to handle this way and I am looking for the most economical way doing this.
PS. I am not looking a way how to cast integer into string.
Are you looking for a match at the start of the value?
You might create a functional index like this:
CREATE INDEX my_index ON mytable(CAST(stuff AS TEXT));
It should be used by your LIKE query, but I didn't test it.
As a standard principle (IMHO), a database design should use a number type if and only if the field is:
A number you could sensibly perform maths on
A reference code within the database - keys etc
If it's a number in some other context - phone numbers, IP addresses etc - store it as text.
This sounds to me like your '123' is conceptually a string that just happens to only contain numbers, so if possible I'd suggest altering the design so it's stored as such.
Otherwise, I can't see a sensible way to do the comparison using it as numbers, so you'll need to convert it to strings on the fly with something like
SELECT * FROM Table WHERE CheckVar LIKE '''' + to_char(<num>,'999') + '%'
The best way for performance is to store them as strings with an index on the column and use LIKE '123%'. Most other methods of solving this will likely involve a full table scan.
If you aren't allowed to change the table, you could try the following, but it's not pretty:
WHERE col = 123
OR col BETWEEN 1230 AND 1239
OR col BETWEEN 12300 AND 12399
etc...
This might also result in a table scan though. You can solve by converting the OR to multiple selects and then UNION ALL them to get the final result.