Efficient way to ignore whitespace in DB2?

I am running queries against a large IBM DB2 table (let's call it T) and have found that the values in the Identifier column tend to be padded not just at the ends, but in between as well, as in: ' ID1 ID2 '. I do not have rights to update this DB, nor would I, given a number of factors. However, I want a way to ignore the whitespace AT LEAST on the left and right, even if I need to simply add a couple of spaces in between. The following queries work, but are slow, upwards of 20 seconds:
SELECT * FROM T WHERE Identifier LIKE '%ID1%ID2%';
SELECT * FROM T WHERE TRIM(Identifier) LIKE 'ID1%ID2';
SELECT * FROM T WHERE TRIM(Identifier) = 'ID1 ID2';
SELECT * FROM T WHERE LTRIM(RTRIM(Identifier)) = 'ID1 ID2';
SELECT * FROM T WHERE LTRIM(Identifier) LIKE 'ID1 ID2%';
SELECT * FROM T WHERE LTRIM(Identifier) LIKE 'ID1%ID2%';
SELECT * FROM T WHERE RTRIM(Identifier) LIKE '%ID1 ID2';
SELECT * FROM T WHERE RTRIM(Identifier) LIKE '%ID1%ID2';
Trying to query something like "Select * FROM T WHERE REPLACE(Identifier, ' ', '')..." of course just freezes up Access until I Ctrl+Break to end the operation. Is there a better, more efficient way to ignore the whitespace?
================================
UPDATE:
As @Paul Vernon describes below, "Trailing spaces are ignored in Db2 for comparison purposes, so you only need to consider the leading and embedded spaces."
This led me to generate combinations of spaces before 'ID1' and 'ID2' and select the records using the IN clause. The number of combinations means the query is slower than if I knew the exact match. This is how it looks in my Java code with JDBC (edited to make it more generic to the key issue):
private static final int MAX_LENGTH = 30;

public List<Parts> queryMyTable(String ID1, String ID2) {
    String query = "SELECT * FROM MYTABLE WHERE ID IN (:ids)";
    // bind the generated space/ID combinations to the :ids named parameter
    final Map<String, List<String>> parameters =
            Collections.singletonMap("ids", getIDCombinations(ID1, ID2));
    return namedJdbcTemplate.query(query, parameters, new PartsMapper());
}

public static List<String> getIDCombinations(String ID1, String ID2) {
    List<String> combinations = new ArrayList<>();
    final int literalLength = ID1.length() + ID2.length();
    final int maxWhitespace = MAX_LENGTH - literalLength;
    combinations.add(ID1 + ID2);
    for (int x = 1; x <= maxWhitespace; x++) {
        String xSpace = String.format("%1$" + x + "s", "");
        String idZeroSpaceBeforeBase = String.format("%s%s%s", ID1, xSpace, ID2);
        String idZeroSpaceAfterBase = String.format("%s%s%s", xSpace, ID1, ID2);
        combinations.add(idZeroSpaceBeforeBase);
        combinations.add(idZeroSpaceAfterBase);
        for (int y = 1; (x + y) <= maxWhitespace; y++) {
            String ySpace = String.format("%1$" + y + "s", "");
            String id = String.format("%s%s%s%s", xSpace, ID1, ySpace, ID2);
            combinations.add(id);
        }
    }
    return combinations;
}
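For illustration, the :ids list produced by getIDCombinations effectively makes the statement behave like this (spacing shown literally, list truncated):
SELECT * FROM MYTABLE WHERE ID IN ('ID1ID2', 'ID1 ID2', ' ID1ID2', ' ID1 ID2', 'ID1  ID2', ...)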

Trailing spaces are ignored in Db2 for comparison purposes, so you only need to consider the leading and embedded spaces.
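A quick way to see that padded-comparison behaviour for yourself (a standalone check against the SYSIBM.SYSDUMMY1 dummy table; if trailing blanks are ignored as described, the result is 'equal'):
SELECT CASE WHEN 'ID1 ID2   ' = 'ID1 ID2' THEN 'equal' ELSE 'not equal' END
FROM SYSIBM.SYSDUMMY1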
Assuming there is an index on Identifier, your only option (if you can't change the data, add a functional index, or index a generated column) is probably something like this:
SELECT * FROM T
WHERE
Identifier = 'ID1 ID2'
OR Identifier = ' ID1 ID2'
OR Identifier = '  ID1 ID2'
OR Identifier = 'ID1  ID2'
OR Identifier = ' ID1  ID2'
OR Identifier = '  ID1  ID2'
which the Db2 optimizer might implement as 6 index lookups, which would be faster than a full index or table scan.
You could also try this
SELECT * FROM T
WHERE
Identifier LIKE 'ID1 %ID2'
OR Identifier LIKE ' ID1 %ID2'
OR Identifier LIKE '  ID1 %ID2'
which the Db2 optimizer might implement as 3 index range scans.
In both examples, add more lines to cover the maximum number of leading spaces in your data, if needed. In the first example, add more lines for the embedded spaces too, if needed.

An index on the expression REGEXP_REPLACE(TRIM(Identifier), '\s{2,}', ' ') and the following query should make Db2 use that index:
SELECT *
FROM T
WHERE REGEXP_REPLACE(TRIM(Identifier), '\s{2,}', ' ') = 'ID1 ID2'
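For reference, a minimal sketch of what creating that expression-based index could look like (this assumes a Db2 version that supports indexes on expressions; the index name is arbitrary):
CREATE INDEX ix_t_identifier_clean
    ON T (REGEXP_REPLACE(TRIM(Identifier), '\s{2,}', ' '))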

If you need to search excluding leading and trailing spaces, then no traditional index can help you, at least for the case as you show it. To make the query fast, the options I can see are:
Full Text Search
You can use a "full text search" solution. DB2 does include this functionality, but I don't remember whether it's included by default in the license or sold separately. In any case, it requires some indexing, or periodic re-indexing of the data, to make sure the search stays up to date. It's worth the effort if you really need it. You'll need to change your app, since the mechanics are different.
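As a rough, hedged illustration only, querying through Db2 Text Search could look something like this (it assumes the feature is installed and a text search index has been created on Identifier; the exact setup syntax is in the Db2 Text Search documentation):
-- hypothetical: requires a Db2 Text Search index on Identifier
SELECT * FROM T WHERE CONTAINS(Identifier, 'ID1 ID2') = 1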
Index on extra, clean column
Another solution is to index the column without the leading or trailing spaces. But you'll need to create an extra column; on a massive table this operation can take some time. The good news is that once it's created there's no further delay. For example:
alter table t add column trimmed_id varchar(100)
generated always as (trim(identifier));
Note: You may need to disable/enable integrity checks on the table before and after this clause. DB2 is picky about this. Read the manual to make sure it works. The creation of this column will take some time.
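As a hedged sketch only (option names vary by Db2 version, so check the SET INTEGRITY documentation), the bracketing around the ALTER could look like this:
SET INTEGRITY FOR t OFF;
-- run the ALTER TABLE ... ADD COLUMN ... GENERATED ALWAYS AS (...) shown above
SET INTEGRITY FOR t IMMEDIATE CHECKED FORCE GENERATED;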
Then, you need to index it:
create index ix1 on t (trimmed_id);
The creation of the index will also take some time, but it should be faster than the step above.
Now, it's ready. You can query your table using the new column instead of the original one (which is still there), but this time you can forget about leading and trailing spaces. For example:
SELECT * FROM T WHERE trimmed_id LIKE 'ID1%ID2';
The only wildcard now shows up in the middle. This query will be much faster than reading the whole table. In fact, the longer the string ID1 is, the faster the query will be, since the selectivity will be better.
Now, if ID2 is longer than ID1 then you can reverse the index to make it fast.

Related

postgres where equals any value

I have a query where I filter a column based on a where clause, like so:
select * from table where col = val
What value do I set for val so that there is no filtering in the where clause (in which case the where clause is redundant)?
What value do I set for val so that there is no filtering in the where clause?
It's impossible.
You might instead use
query = "SELECT * FROM TABLE"
if val is not None:
    query += " WHERE col = :val"
None is a common sentinel value, but feel free to use another.
Consider switching from = equality to LIKE.
Then a % wildcard will arrange for an unfiltered blind query.
query = "SELECT * FROM table WHERE col LIKE :val"
Pro:
You still get to exploit any index there might be on col, for values of val ending with a % wildcard.
Cons:
The program behavior is clearly different, e.g. there might be a UNIQUE KEY on col, and the revised SELECT can now return multiple rows.
Your data might contain wildcard characters, which now need escaping.
Your users may have more flexibility now to pose queries that you didn't want them to.
Any % characters that are not at the end of val may disable the index, leading to unexpectedly long query times / result set sizes.
If col can't be null, you can use col = col.
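For example (assuming col is declared NOT NULL, this tautology keeps the WHERE clause in place while matching every row; NULLs would be filtered out, hence the NOT NULL requirement):
SELECT * FROM table WHERE col = col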

Postgresql, tsquery doesn't work with part of string

I'm using Postgres' tsquery function to search in a field that might contain letters in multiple languages as well as numbers.
It seems that in every case the search works for part of the searched phrase up to a point, then stops working until you write the full phrase.
For example:
Searching for the name '15339' returns the right row when the search term is '15339', but with '153' it won't.
Searching for Al-Alamya, the term 'al-' works and returns the row, but adding letters after that, for example 'al-alam', won't return it until I finish writing the full name ('Al-Alamya').
my query:
SELECT *
FROM (SELECT DISTINCT ON ("consumer_api_spot"."id") "consumer_api_spot"."id",
"consumer_api_spot"."name",
FROM "consumer_api_spot"
INNER JOIN "consumer_api_account" ON ("consumer_api_spot"."account_id" = "consumer_api_account"."id")
INNER JOIN "users_user" ON ("consumer_api_account"."id" = "users_user"."account_id")
WHERE (
users_user.id = 53 AND consumer_api_spot.active
AND
"consumer_api_spot"."vectorized_name" ## tsquery('153')
)
GROUP BY "consumer_api_spot"."id"
) AS "Q"
LIMIT 50 OFFSET 0
If you check the documentation, you'll find more information about what you can specify as a tsquery. Tsqueries support grouping, combining using Boolean operators, and also prefix matching, which is probably what you want. An example from the docs:
Also, lexemes in a tsquery can be labeled with * to specify prefix matching:
SELECT 'super:*'::tsquery;
This query will match any word in a tsvector that begins with “super”.
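For instance, a standalone check of prefix matching (not part of the original query; it uses the 'english' configuration on both sides):
SELECT to_tsvector('english', 'superhero') @@ to_tsquery('english', 'super:*');  -- returns true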
So in your query you should change tsquery('153') to tsquery('153:*').
Btw. I don't know exactly how you constructed your database schema, but you can add a tsvector index for a column using a GIN index. I will assume that you generate the "consumer_api_spot"."vectorized_name" column from the "consumer_api_spot"."name" column. If that's the case, you can create a tsvector index for that column like this:
CREATE INDEX gin_name on consumer_api_spot using gin (to_tsvector('english',name))
And then you could change this query:
"consumer_api_spot"."vectorized_name" ## tsquery('153')
into this:
to_tsvector('english', "consumer_api_spot"."name") @@ to_tsquery('english', '153:*')
and get a potential speed benefit, because the query would utilize an index.
Note about 'english': you cannot omit the language when creating the index, but it won't have an effect on queries in other languages or queries with numbers. However, be careful: the language must be the same when creating the index and when performing the query so that PostgreSQL can use the index.

How to replace where clause dynamically in query (BIRT)?

In my report query I have a where clause that needs to be replaced dynamically based on the data chosen in the front end.
The query is something like :
where ?=?
I already have code to replace the value - I created a report parameter and linked it to the value ? in the query.
Example:
where name=?
Any value of name that comes from the front end replaces the ? in the where clause - this works fine.
But now I need to replace the entire clause (where ?=?). Should I create two parameters and link them to both '?' placeholders?
No, unfortunately most database engines do not allow using a query parameter for a dynamic column name. This is for security reasons.
So you need to keep an arbitrary column name in the query:
where name=?
And then in "beforeOpen" script of the dataset replace 'name' with a report parameter value:
this.queryText=this.queryText.replace("name",params["myparameter"].value);
To prevent SQL injection, I recommend testing the value of the parameter in this script. There are many ways to do this, but a whitelist is the strongest check, for example:
var column = params["myparameter"].value;
if (column == "name" || column == "id" || column == "account" || column == "mycolumnname") {
    this.queryText = this.queryText.replace("name", column);
}
In addition to Dominique's answer and your comment: you'll just need slightly more advanced logic.
For example, you could name your dynamic column-name/value pairs (column1, value1), (column2, value2) and so on. In the static text of the query, make sure to have bind variables for value1, value2 and so on (for example, with Oracle SQL, using the following syntax):
with params as (
select :value1 as value1,
:value2 as value2 ...
from dual
)
select ...
from params, my_table
where 1=1
and ... static conditions....
Then, in the beforeOpen script, append conditions to the query text in a loop as needed (the loop is left as an exercise to the reader, and don't forget to check the column names for security reasons!):
this.queryText += " and " + column_name[i] + "= params.value" + i;
This way you can still use bind variables for the comparison values.
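To make the mechanism concrete, here is roughly what the effective query could look like after the beforeOpen script has appended two conditions (account and name are hypothetical column names chosen for illustration; the static part is the Oracle-style skeleton from above):
with params as (
  select :value1 as value1,
         :value2 as value2
    from dual
)
select ...
  from params, my_table
 where 1=1
   and account = params.value1
   and name = params.value2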

Searching for a specific text value in a column in SQLite3

Suppose I have a table named 'Customer' with many columns and I want to display all customers whose name ends with 'Thomas' (Lastname = 'Thomas'). The following query returns an empty result (no rows) and doesn't show any error:
SELECT * FROM Customer WHERE Lastname = 'Thomas';
while executing the following query gives me the correct result:
SELECT * FROM Customer WHERE Lastname LIKE '%Thomas%';
I would like to know what the problem is with my first query. I am using sqlite3 with npm. Below is the result of the '.show' command (just in case the problem is with the config).
sqlite> .show
echo: off
explain: off
headers: on
mode: column
nullvalue: ""
output: stdout
separator: "|"
stats: off
width:
1. Use LIKE instead of =
2. Trim to ensure that there aren't spaces messing things up
so the query will be
SELECT * FROM Customer WHERE trim(Lastname) LIKE 'Thomas';
Depending on your types, you probably don't need point 2 since, as can be read in the MySQL manual:
All MySQL collations are of type PADSPACE. This means that all CHAR
and VARCHAR values in MySQL are compared without regard to any
trailing spaces
But point 1 could be the solution. Actually, if you want to avoid problems, you should compare strings with LIKE instead of =.
If you still have problems, you will probably have to use collations.
SELECT *
FROM t1
WHERE k LIKE _latin1 'Müller' COLLATE latin1_german2_ci; #using your real table collation
More information here. But specifically with 'Thomas' you shouldn't need it, since it doesn't contain any special characters.

SQL REPLACE function, how to replace single letter

My code is as follows:
REPLACE(REPLACE(cc.contype,'x','y'),'y','z') as ContractType,
This REPLACEs correctly what I would like, but it unfortunately also changes all of the new "y"s to "z"s, when what I would like is
x > y
y > z
Does this make sense? I would not like all of the new Y's to then change again in my second REPLACE function. In Microsoft Access, I would do this with the following
IIf(cc.contype = x, y, IIf(cc.contype = y, z, cc.contype))
But I am not sure how to articulate this in SQL; would it be best to do this kind of thing in the client-side language?
Many thanks.
EDIT: Have also tried with no luck:
CASE WHEN SUBSTRING(cc.contype, 1, 1) = 'C'
THEN REPLACE(cc.contype, 'C', 'Signed')
CASE WHEN SUBSTRING(cc.contype, 1, 1) = 'E'
THEN REPLACE(cc.contype, 'E', 'Estimate') as ContractType,
Try doing it the other way round if you don't want the new "y"'s to become "z"'s:
REPLACE(REPLACE(cc.contype,'y','z'),'x','y') as ContractType
Not that I'm a big fan of the performance-killing process of handling sub-columns, but it appears to me you can do that just by reversing the order:
replace(replace(cc.contype,'y','z'),'x','y') as ContractType,
This will transmute all the y characters to z before transmuting the x characters to y.
If you're after a more general solution, you can do unioned queries like:
select 'Signed: ' || cc.contype as ContractType
    from wherever
    where cc.contype like 'C%'
union all
select 'Estimate: ' || cc.contype as ContractType
    from wherever
    where cc.contype like 'E%'
without having to mess about with substrings at all (at the slight cost of prefixing the string rather than modifying it, and adding any other required conditions as well, of course). This will usually be much more efficient than per-row functions.
Some DBMSs will actually run these sub-queries in parallel for efficiency.
Of course, the ideal solution is to change your schema so that you don't have to handle sub-columns. Separate the contype column into two, storing the first character into contype_first and contype_rest.
Then whenever you want the full contype:
select contype_first || contype_rest ...
For your present query, you could then use a lookup table:
lookup_table:
    first       char(1) primary key
    description varchar(20)
containing:
    first  description
    -----  -----------
    C      Signed:
    E      Estimate:
and the query:
select lkp.description || cc.contype_rest
from lookup_table lkp, real_table cc
where lkp.first = cc.first ...
Both these queries are likely to be blazingly fast compared to one that does repeated string substitutions on each row.
Even if you can't replace the single column with two independent columns, you can at least create the two new ones and use an insert/update trigger to keep them in sync. This gives you the old way and a new improved way for accessing the contype information.
And while this technically violates 3NF, that's often acceptable for performance reasons, provided you understand and mitigate the risks (with the triggers).
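A rough sketch of such a trigger, in MySQL-flavoured syntax (other DBMSs need different trigger syntax, a matching BEFORE UPDATE trigger would be needed as well, and real_table plus the column names come from the example above):
CREATE TRIGGER contype_sync_ins
BEFORE INSERT ON real_table
FOR EACH ROW
SET NEW.contype_first = SUBSTRING(NEW.contype, 1, 1),
    NEW.contype_rest  = SUBSTRING(NEW.contype, 2);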
How about
REPLACE(REPLACE(REPLACE(cc.contype,'x','ahhhgh'),'y','z'),'ahhhgh','y') as ContractType,
ahhhgh can be replaced with any placeholder string you like, as long as it never appears in your data.