SQLite, check if TEXT field has any alphabetical chars in it - sql

Okay, so I have a huge list of entries, and in one of the columns (for simplicity let's call it num there's a number, something like 123456780000 (they are all the same length and format), but sometimes there are fields that look something like this
12345678E000
or
12345678H000
Now, I need to delete all the rows in which the num column is not entirely numeric. The type of num is TEXT, not INTEGER. So the above examples should be deleted, while 123456780000 should not.
I have tried two solutions, of which one works but is inelegant and messy, and the other one doesn't work at all.
The first thing I tried is
DELETE FROM MY_TABLE WHERE abs(num) == 0.0
Because according to the documentation, abs(X) returns exactly 0.0 if a TEXT value is given and is unconvertable to an real number. So I was thinking it should let all the "numbers-only" pass and delete the ones with a character in it. But it doesn't do a thing, it doesn't delete even a single row.
The next thing I tried is
DELETE FROM MY_TABLE WHERE num LIKE "%A%" OR "%B%" OR "%C%"
Which seems to work, but the database is large and I am not sure which characters can appear, and while I could just do "%D%" OR "%E%" OR "%F%" OR ... with the entire alphabet, this feels inelegant and messy. And I actually want to learn something about the SQLite language.
My question, finally, is: how do I solve this problem in a nice and simple way? Perhaps there's something I'm doing wrong with the abs(X) solution, or is there another way that I do not know of/thought of?
Thanks in advance.
EDIT:
According to a comment I tried SELECT abs(num) FROM MY_TABLE WHERE num like '%A%'
and it returned the following
12345678.0
That's strange. It seems it has split the number where the alphabetical appeared. The documentation claimed it would return 0.0 if it couldn't convert it to a number. Hmm..

You can use GLOB in SQLite with a range to single them out:
SELECT *
FROM MY_TABLE
WHERE num GLOB '*[A-Za-z]*'
See it in use with this fiddle: http://sqlfiddle.com/#!7/4bc21/10
For example, for these records:
num
----------
1234567890
0987654321
1000000000
1000A00000
1000B00000
1000c00000
GLOB '*[A-Za-z]*' will return these three:
num
----------
1000A00000
1000B00000
1000c00000
You can then translate that to the appropriate DELETE:
DELETE
FROM MY_TABLE
WHERE num GLOB '*[A-Za-z]*'

Related

Why would SQL truncate the right-side of a string when using the right function?

I have a database with MANY out-of-date file locations. The difference between the out-of-date file locations and the correct locations is simply the left-side of the address. So, I am attempting to take the left-side off and replace it with the correct string. But, I can't get there because my query is altering the right-side of the address.
This query is made using "vfpoledb."
SELECT RIGHT(LINK,LEN(LINK)-8) ,LEN(LINK)-8,RIGHT(LINK,77),LINK
FROM LINKSTORE
WHERE DOCLBL = "V46145002A"
This query returns the following:
EXP1:
\SHARES\DATA\QMS\QMS DATA\TRACKING FILES\REMOTEENTRIES\V46145 216447
EXP2:
77
EXP3:
\SHARES\DATA\QMS\QMS DATA\TRACKING FILES\REMOTEENTRIES\V46145 216447-002A.PDF
LINK:
\\SERVER\SHARES\DATA\QMS\QMS DATA\TRACKING FILES\REMOTEENTRIES\V46145 216447-002A.PDF
I don't understand why EXP1 and EXP3 are giving different results. EXP3 is what I'm looking for EXP1 to return. If I could get that, I could append the correct left-hand-side and create an update query to fix everything.
Edit:
Even when changing the query to:
SELECT RIGHT(LINK,LEN(LINK)) ,LEN(LINK)-8,RIGHT(LINK,77),LINK
FROM LINKSTORE
WHERE DOCLBL = "V46145002A"
The link still cuts off at the same point, which is odd because expression_3 which still uses Right(), but manually provides the length instead of using Len() does not do this.
Furthermore, it seems that when I run the query to include all results:
SELECT RIGHT(LINK,LEN(LINK)) ,LEN(LINK)-8,RIGHT(LINK,77),LINK
FROM LINKSTORE
WHERE 1=1
All values returned by Exp1 are equal in length even though Exp2 and Link are different in size.
So back to the problem, how can I run a query to replace the left-side with the correct server if I can't separate them out?
OK this is tricky, I did some Foxpro 20 years ago but don't have it to hand.
Your SELECT statement looks OK to me. In the comments under the question Thomas G created this DbFiddle which shows that in a 'normal' dbms, your SELECT statement gives the result you are expecting: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=37047d2b7efb91aaa029fa0fb98eea24
So the problem must be something FoxPro/dBase specific rather than a problem with your SELECT statement.
Reading up I see people say that with FoxPro always use ALLTRIM() when using RIGHT() or LEN() on table fields because the data gets returned padded with spaces. I don't see how that would cause the exact bug you're seeing but you could try this maybe:
SELECT RIGHT(ALLTRIM(LINK),LEN(ALLTRIM(LINK))-8) ,LEN(ALLTRIM(LINK))-8,RIGHT(ALLTRIM(LINK),77),ALLTRIM(LINK)
FROM LINKSTORE
WHERE DOCLBL = "V46145002A"
edit: OK I got a better idea - are there other rows in your result set?
According to this: https://www.tek-tips.com/viewthread.cfm?qid=1706948 ... when you do SELECT (expr) in FoxPro whatever the length of the expr in the first row becomes that max length for that 'field' and so all subsequent rows get truncated to that length. Makes sense in a crazy 1970s sort of way.
So perhaps you have a row of data above the one we are talking about which comes out at 68 chars long and so every subsequent value gets truncated to that length.
The way around it is to pad your expression results with CAST or PADR:
SELECT PADR(RIGHT(ALLTRIM(LINK),LEN(ALLTRIM(LINK))-8),100),LEN(ALLTRIM(LINK))-8,PADR(RIGHT(ALLTRIM(LINK),77),100),LINK
FROM LINKSTORE
WHERE DOCLBL = "V46145002A"
Or same without the ALLTRIM()
SELECT PADR(RIGHT(LINK,LEN(LINK)-8),100),LEN(LINK)-8,PADR(RIGHT(LINK,77),100),LINK
FROM LINKSTORE
WHERE DOCLBL = "V46145002A"

Clickhouse, column values to array

I want to make a query then turn the values for each of it's columns into arrays, I've tried finding a way to do this, but until now it has alluded me.
The query is a simple select:
SELECT a,b,c FROM X
Instead of the usual result of say (in the default format):
val_a_1, val_b_1, val_c_1
----------------
val_a_2, val_b_2, val_c_2
-----------------
val_a_3, val_b_3, val_c_3
I want to get an array for each columns, namely:
[val_a_1, val_a_2, val_a_3], [val_b_1, val_b_2, val_b_3] , [val_c_1, val_c_2, val_c_3]
Is this at all possible ?
Nevermind, this can easily be done with groupArray as:
SELECT groupArray(a), groupArray(b), groupArray(c) FROM X
It completely slipped my mind and google didn't help...
leaving this here in case there's a better option or anyone stumbles upon it when searching.

select using wildcard to find ending in two character then numeric

I am querying to find things ending in "ST" followed by a number 1 - 999.
SELECT NUMBER WHERE NUMBER LIKE '%ST -- works correctly to return everything ending in "ST"
SELECT NUMBER WHERE NUMBER LIKE '%[1-999] -- works correctly to return everything ending in 1 - 999
SELECT NUMBER WHERE NUMBER LIKE '%ST[1-999] -- doesn't work - returns nothing
Also tried:
SELECT NUMBER WHERE NUMBER LIKE '%ST%[1-999] -- works, but also returns things like "GRASTNT3" that have extra things between the "ST" and the number
Can anyone help this struggling beginner?
Thanks!
The problem is that [1-999] doesn't mean what you think it does.
SQL Server interprets that as a set of values (1-9, 9, 9) which basically means that if there's more than 1 digit after the ST, the entry won't be returned.
So far as I can tell, your best bet is:
SELECT NUMBER WHERE
NUMBER LIKE '%ST[1-9][0-9][0-9]' OR
NUMBER LIKE '%ST[1-9][0-9]' OR
NUMBER LIKE '%ST[1-9]'
(assuming that your numbers don't have leading zeros - if they do, replace the ones with more zeros)
You need to do
SELECT NUMBER WHERE
NUMBER LIKE '%ST[1-9][0-9][0-9]'
OR NUMBER LIKE '%ST[1-9][0-9]'
OR NUMBER LIKE '%ST[1-9]';
The group in the the [] is a Char/NChar not an Int.
Better still normalise and type your data, so you have an ST bit and an int column for the number.
If you find you need to define different filters on variable string data, consider Full Text Searching or another Lucene related technology depending on your RDBMS.

How can I SELECT DISTINCT on the last, non-numerical part of a mixed alphanumeric field?

I have a data set that looks something like this:
A6177PE
A85506
A51SAIO
A7918F
A810004
A11483ON
A5579B
A89903
A104F
A9982
A8574
A8700F
And I need to find all the ENDings where they are non-numeric. In this example, that means PE, AIO, F, ON, B and F.
In pseudocode, I'm imagining I need something like
SELECT DISTINCT X FROM
(SELECT SUBSTR(COL,[SOME_CLEVER_LOGIC]) AS X FROM TABLE);
Any ideas? Can I solve this without learning regexp?
EDIT: To clarify, my data set is a lot larger than this example. Also, I'm only interested in the part of the string AFTER the numeric part. If the string is "A6177PE" I want "PE".
Disclaimer: I don't know Oracle SQL. But, I think something like this should work:
SELECT DISTINCT X FROM
(SELECT SUBSTR(COL,REGEXP_INSTR(COL, "[[:ALPHA:]]+$")) AS X FROM TABLE);
REGEXP_INSTR(COL, "[[:ALPHA:]]+$") should return the position of the first of the characters at the end of the field.
For readability, I'd recommend using the REGEXP_SUBSTR function (If there are no performance issues of course, as this is definitely slower than the accepted solution).
...also similar to REGEXP_INSTR, but instead of returning the position of the substring, it returns the substring itself
SELECT DISTINCT SUBSTR(MY_COLUMN,REGEXP_SUBSTR("[a-zA-Z]+$")) FROM MY_TABLE;
(:alpha: is supported also, as #Audun wrote )
Also useful: Oracle Regexp Support (beginning page)
For example
SELECT SUBSTR(col,INSTR(TRANSLATE(col,'A0123456789','A..........'),'.',-1)+1)
FROM table;

Return rows where first character is non-alpha

I'm trying to retrieve all columns that start with any non alpha characters in SQlite but can't seem to get it working. I've currently got this code, but it returns every row:
SELECT * FROM TestTable WHERE TestNames NOT LIKE '[A-z]%'
Is there a way to retrieve all rows where the first character of TestNames are not part of the alphabet?
Are you going first character only?
select * from TestTable WHERE substr(TestNames,1) NOT LIKE '%[^a-zA-Z]%'
The substr function (can also be called as left() in some SQL languages) will help isolate the first char in the string for you.
edit:
Maybe substr(TestNames,1,1) in sqllite, I don't have a ready instance to test the syntax there on.
Added:
select * from TestTable WHERE Upper(substr(TestNames,1,1)) NOT in ('A','B','C','D','E',....)
Doesn't seem optimal, but functionally will work. Unsure what char commands there are to do a range of letters in SQLlite.
I used 'upper' to make it so you don't need to do lower case letters in the not in statement...kinda hope SQLlite knows what that is.
try
SELECT * FROM TestTable WHERE TestNames NOT LIKE '[^a-zA-Z]%'
SELECT * FROM NC_CRIT_ATTACH WHERE substring(FILENAME,1,1) NOT LIKE '[A-z]%';
SHOULD be a little faster as it is
A) First getting all of the data from the first column only, then scanning it.
B) Still a full-table scan unless you index this column.