Query speed and expressions with constant value - sql

Is
SELECT * FROM Table WHERE ID=2*2
slower than
SELECT * FROM Table WHERE ID=4
I mean: is 2*2 calculated once or is it calculated when checking each record?
You may ask why write 2*2 and not just simply 4?
Well actually in my real case I have some numbers which represents characters building some text (somebody had a "great" idea of saving array of numeric data as string).
So my query would be like:
SELECT * FROM Table where value1=(CHAR(20)+CHAR(8)+CHAR(32))
I could calculate this string on my side but there could be possible problems with encoding, passing characters like /n or /r in a query etc.
Will this (CHAR(20)+CHAR(8)+CHAR(32)) slow my query if I use it instead of plain string text like '#!_'

it will be the same performance with a constant expression like 2*2. However if you start comparing ID/2 = 2. It will slow down the query because it has to calculate for all rows.

Related

Real number comparison for trigram similarity

I am implementing trigram similarity for word matching in column comum1. similarity() returns real. I have converted 0.01 to real and rounded to 2 decimal digits. Though there are rank values greater than 0.01, I get no results on screen. If I remove the WHERE condition, lots of results are available. Kindly guide me how to overcome this issue.
SELECT *,ROUND(similarity(comum1,"Search_word"),2) AS rank
FROM schema.table
WHERE rank >= round(0.01::real,2)
I have also converted both numbers to numeric and compared, but that also didn't work:
SELECT *,ROUND(similarity(comum1,"Search_word")::NUMERIC,2) AS rank
FROM schema.table
WHERE rank >= round(0.01::NUMERIC,2)
LIMIT 50;
The WHERE clause can only reference input column names, coming from the underlying table(s). rank in your example is the column alias for a result - an output column name.
So your statement is illegal and should return with an error message - unless you have another column named rank in schema.table, in which case you shot yourself in the foot. I would think twice before introducing such a naming collision, while I am not completely firm with SQL syntax.
And round() with a second parameter is not defined for real, you would need to cast to numeric like you tried. Another reason your first query is illegal.
Also, the double-quotes around "Search_word" are highly suspicious. If that's supposed to be a string literal, you need single quotes: 'Search_word'.
This should work:
SELECT *, round(similarity(comum1,'Search_word')::numeric,2) AS rank
FROM schema.table
WHERE similarity(comum1, 'Search_word') > 0.01;
But it's still pretty useless as it fails to make use of trigram indexes. Do this instead:
SET pg_trgm.similarity_threshold = 0.01; -- set once
SELECT *
FROM schema.table
WHERE comum1 % 'Search_word';
See:
Finding similar strings with PostgreSQL quickly
That said, a similarity of 0.01 is almost no similarity. Typically, you need a much higher threshold.

Why would SQL truncate the right-side of a string when using the right function?

I have a database with MANY out-of-date file locations. The difference between the out-of-date file locations and the correct locations is simply the left-side of the address. So, I am attempting to take the left-side off and replace it with the correct string. But, I can't get there because my query is altering the right-side of the address.
This query is made using "vfpoledb."
SELECT RIGHT(LINK,LEN(LINK)-8) ,LEN(LINK)-8,RIGHT(LINK,77),LINK
FROM LINKSTORE
WHERE DOCLBL = "V46145002A"
This query returns the following:
EXP1:
\SHARES\DATA\QMS\QMS DATA\TRACKING FILES\REMOTEENTRIES\V46145 216447
EXP2:
77
EXP3:
\SHARES\DATA\QMS\QMS DATA\TRACKING FILES\REMOTEENTRIES\V46145 216447-002A.PDF
LINK:
\\SERVER\SHARES\DATA\QMS\QMS DATA\TRACKING FILES\REMOTEENTRIES\V46145 216447-002A.PDF
I don't understand why EXP1 and EXP3 are giving different results. EXP3 is what I'm looking for EXP1 to return. If I could get that, I could append the correct left-hand-side and create an update query to fix everything.
Edit:
Even when changing the query to:
SELECT RIGHT(LINK,LEN(LINK)) ,LEN(LINK)-8,RIGHT(LINK,77),LINK
FROM LINKSTORE
WHERE DOCLBL = "V46145002A"
The link still cuts off at the same point, which is odd because expression_3 which still uses Right(), but manually provides the length instead of using Len() does not do this.
Furthermore, it seems that when I run the query to include all results:
SELECT RIGHT(LINK,LEN(LINK)) ,LEN(LINK)-8,RIGHT(LINK,77),LINK
FROM LINKSTORE
WHERE 1=1
All values returned by Exp1 are equal in length even though Exp2 and Link are different in size.
So back to the problem, how can I run a query to replace the left-side with the correct server if I can't separate them out?
OK this is tricky, I did some Foxpro 20 years ago but don't have it to hand.
Your SELECT statement looks OK to me. In the comments under the question Thomas G created this DbFiddle which shows that in a 'normal' dbms, your SELECT statement gives the result you are expecting: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=37047d2b7efb91aaa029fa0fb98eea24
So the problem must be something FoxPro/dBase specific rather than a problem with your SELECT statement.
Reading up I see people say that with FoxPro always use ALLTRIM() when using RIGHT() or LEN() on table fields because the data gets returned padded with spaces. I don't see how that would cause the exact bug you're seeing but you could try this maybe:
SELECT RIGHT(ALLTRIM(LINK),LEN(ALLTRIM(LINK))-8) ,LEN(ALLTRIM(LINK))-8,RIGHT(ALLTRIM(LINK),77),ALLTRIM(LINK)
FROM LINKSTORE
WHERE DOCLBL = "V46145002A"
edit: OK I got a better idea - are there other rows in your result set?
According to this: https://www.tek-tips.com/viewthread.cfm?qid=1706948 ... when you do SELECT (expr) in FoxPro whatever the length of the expr in the first row becomes that max length for that 'field' and so all subsequent rows get truncated to that length. Makes sense in a crazy 1970s sort of way.
So perhaps you have a row of data above the one we are talking about which comes out at 68 chars long and so every subsequent value gets truncated to that length.
The way around it is to pad your expression results with CAST or PADR:
SELECT PADR(RIGHT(ALLTRIM(LINK),LEN(ALLTRIM(LINK))-8),100),LEN(ALLTRIM(LINK))-8,PADR(RIGHT(ALLTRIM(LINK),77),100),LINK
FROM LINKSTORE
WHERE DOCLBL = "V46145002A"
Or same without the ALLTRIM()
SELECT PADR(RIGHT(LINK,LEN(LINK)-8),100),LEN(LINK)-8,PADR(RIGHT(LINK,77),100),LINK
FROM LINKSTORE
WHERE DOCLBL = "V46145002A"

Query to ignore rows which have non hex values within field

Initial situation
I have a relatively large table (ca. 0.7 Mio records) where an nvarchar field "MediaID" contains largely media IDs in proper hexadecimal notation (as they should).
Within my "sequential" query (each query depends on the output of the query before, this is all in pure T-SQL) I have to convert these hexadecimal values into decimal bigint values in order to do further calculations and filtering on these calculated values for the subsequent queries.
--> So far, no problem. The "sequential" query works fine.
Problem
Unfortunately, some of these Media IDs do contain non-hex characters - most probably because there was some typing errors by the people which have added them or through import errors from the previous business system.
Because of these non-hex chars, the whole query fails (of course) because the conversion hits an error.
For my current purpose, such rows must be skipped/ignored as they are clearly wrong and cannot be used (there are no medias / data carriers in use with the current business system which can have non-hex character IDs).
Manual editing of the data is not an option as there are too many errors and it is not clear with what the data must be replaced.
Challenge
To create a query which only returns records which have valid hex values within the media ID field.
(Unfortunately, my SQL skills are not enough to create the above query. Your help is highly appreciated.)
The relevant section of the larger query looks like this (xxxx is where your help comes in :-))
select
pureMediaID
, mediaID
, CUSTOMERID
,CONTRACT_CUSTOMERID
from
(
select concat('0x', Replace(Ltrim(Replace(mediaID, '0', ' ')), ' ', '0')) AS pureMediaID
--, CUSTOMERID
, *
from M_T_CONTRACT_CUSTOMERS
where mediaID is not null
and mediaID like '0%'
and xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
) as inner1
EDIT: As per request I have added here some good and some bad data:
Good:
4335463357
4335459809
1426427996
4335463509
4335515039
4335465134
4427370396
4335415661
4427369036
4335419089
004BB03433
004e7cf9c6
00BD23133
00EE13D8C1
00CCB5522C
00C46522C
00dbbe3433
Bad:
4564589+
AB6B8BFC.8
7B498DFCnm
DB218DFChb
d<tgfh8CFC
CB9E8AFCzj
B458DFCjhl
rytzju8DFC
BFCtdsjshj
DB9888FCgf
9BC08CFCyx
EB198DFCzj
4B628CFChj
7B2B8DFCgg
After I did upgrade the compatibility level of the SQL instance to SQL2016 (it was below 2012 before) I could use try_convert with same syntax as the original convert function as donPablo has pointed out. With that the query could run fully through and every MediaID which is not a correct hex value gets nicely converted into a null value - really, really nice.
Exactly what I needed.
Unfortunately, the solution of ALICE... didn't work out for me as this was also (strangely) returning records which had the "+" character within them.
Edit: The added comment of Alice... where you create a calculated field like this:
CASE WHEN "KEY" LIKE '%[^0-9A-F]%' THEN 0 ELSE 1 end as xyz
and then filter in the next query like this:
where xyz = 1
works also with SQL Instances with compatibility level < SQL 2012.
Great addition for people which still have to work with older SQL instances.
An option (although not ideal in terms of performance) is to check the characters in the MediaID through a case statement and regular expression
Hexadecimals cannot contain characters other than A-F and numbers between 0 and 9
CASE WHEN MediaID LIKE '%[0-9A-F]%' THEN 1 ELSE 0 END
I would recommend writing a function that can be used to evaluate MediaID first and checks if it is hexadecimal and then running the query for conversion

Natural or Human Sort order

I have been working on this on for months. I just cannot get the natural (True alpha-numeric) results. I am shocked that I cannot get them as I have been able to in RPG since 1992 with EBCDIC.
I am looking for any solution in SQL, VBS or simple excel or access. Here is the data I have:
299-8,
3410L-87,
3410L-88,
420-A20,
420-A21,
420A-40,
4357-3,
AN3H10A,
K117GM-8,
K129-1,
K129-15,
K271B-200L,
K271B-38L,
K271D-200EL,
KD1051,
KD1062,
KD1092,
KD1108,
KD1108,
M8000-3,
MS24665-1,
SK271B-200L,
SAYA4008
The order I am looking for is the true alpha-numeric order as below:
AN3H10A,
KD1051,
KD1062,
KD1092,
KD1108,
KD1108,
K117GM-8,
K129-1,
K129-15,
MS24665-1,
M8000-3,
SAYA4008,
SK271B-200L
The inventory is 7800 records so I have had some problems with processing power as well.
Any help would be appreciated.
Jeff
In native Excel, you can add multiple sorting columns to return the ASCII code for each character, but if the character is a number, then add a large number to the code (e.g 1000).
Then sort on each of the helper columns, including the first column in the table, but not in the sort.
The formula:
=IFERROR(CODE(MID($A1,COLUMNS($A:A),1))+AND(CODE(MID($A1,COLUMNS($A:A),1))>=48,CODE(MID($A1,COLUMNS($A:A),1))<=57)*1000,"")
The Sort dialog:
The results:
You can implement a similar algorithm using VBA, and probably SQL also. I dunno about VBS or Access.
You could try using format for left padding the string in order by
select column
from my_table
order by Format(column, "0000000000")
Add a sorting column:
, iif (left(fieldname, 1) between '0' and '9', 1, 0) sortField
etc
order by sortField, FieldName
Lets say you have your data in column "A". If you put this formula in column "B" =IFERROR(IF(LEFT(A1,1)+1>0,"ZZZZZZZ "&A1,A1),A1), it will automatically add Z in front of all numerical values, so that they will naturally appear after all alphabetical values when you sort A-Z. later you can find&replace that funny ZZZZZZ string...
There a number of approaches, but likely the least amount of work is to build two columns that split out the delimiter (-) in this case.
You then “pad” the results (spaces, or 0) right justified, and then sort on the two columns.
So in the query builder we have this:
SELECT Field1,
Format(
Mid(field1,1,IIf(InStr(field1,"-")=0,50,InStr(field1,"-")-1)),
">##########") AS Expr1,
Format(
Mid(field1,IIf(InStr(field1,"-")=0,99,InStr(field1,"-")+1)),
">##########") AS Expr2
FROM Data
When we run the above raw query we get this:
So now in the query builder, simply sort on the first derived column, and then sort on the 2nd derived column.
Eg this:
Run the query, and we get this result:
Edit:
Looking at you desired results, it looks like above sort is wrong. We have to RIGHT just and pad with 0’s.
So this 2nd try:
SELECT Field1,
Left(Mid(field1,1,IIf(InStr(field1,"-")=0,30,InStr(field1,"-")-1))
& String(30,"0"),30) AS Expr1,
Left(Mid(field1,IIf(InStr(field1,"-")=0,99,InStr(field1,"-")+1))
& String(30,"0"),30) AS Expr2
FROM Data
The results are thus this:
Given your small table size, then the above query should perform quite well.

Return rows where first character is non-alpha

I'm trying to retrieve all columns that start with any non alpha characters in SQlite but can't seem to get it working. I've currently got this code, but it returns every row:
SELECT * FROM TestTable WHERE TestNames NOT LIKE '[A-z]%'
Is there a way to retrieve all rows where the first character of TestNames are not part of the alphabet?
Are you going first character only?
select * from TestTable WHERE substr(TestNames,1) NOT LIKE '%[^a-zA-Z]%'
The substr function (can also be called as left() in some SQL languages) will help isolate the first char in the string for you.
edit:
Maybe substr(TestNames,1,1) in sqllite, I don't have a ready instance to test the syntax there on.
Added:
select * from TestTable WHERE Upper(substr(TestNames,1,1)) NOT in ('A','B','C','D','E',....)
Doesn't seem optimal, but functionally will work. Unsure what char commands there are to do a range of letters in SQLlite.
I used 'upper' to make it so you don't need to do lower case letters in the not in statement...kinda hope SQLlite knows what that is.
try
SELECT * FROM TestTable WHERE TestNames NOT LIKE '[^a-zA-Z]%'
SELECT * FROM NC_CRIT_ATTACH WHERE substring(FILENAME,1,1) NOT LIKE '[A-z]%';
SHOULD be a little faster as it is
A) First getting all of the data from the first column only, then scanning it.
B) Still a full-table scan unless you index this column.