Querying on text values being cast into dates in PostgreSQL - sql

I was fiddling around with one of our databases earlier today and was curious about how to do something in psql. Let's say I have a query like the following (with value1 being a text column in mytable):
SELECT * FROM mytable WHERE value1::date < '2013-10-24'::date;
This works fine if all the rows contain castable date strings. The second it finds a string that CANNOT be cast into a date, an error like the following is thrown:
ERROR: invalid input syntax for type date: "C"
This makes sense and should happen. But is there a way to modify the above query so that if we come to a row where value1 would trigger this error, it would just move on and skip over that row? I'm asking more out of curiosity than an actual need for an answer, and digging around on the web hasn't produced much (although that could be due to the keywords I'm using, of course).

You can use a LIKE pattern or a regular expression to pre-filter only the value1 strings that look like dates:
SELECT * FROM mytable WHERE
value1 like '____-__-__'
and value1::date < '2013-10-24'::date;
SELECT * FROM mytable WHERE
value1 similar to '[1-2][0-9]{3}-[0-1][0-9]-[0-3][0-9]'
and value1::date < '2013-10-24'::date;
Here it is in SQLfiddle - http://sqlfiddle.com/#!1/06916/6

Technically we can't assume left-to-right evaluation order of the conditions in a WHERE clause, which means that in a clause such as:
WHERE value1 ~ '^\d{4}-\d{2}-\d{2}$' AND value1::date < '2013-10-24'::date
the planner may decide to evaluate value1::date first and the execution will error out before testing the regexp. Should it estimate that the cast plus comparison is faster than the regexp test, it's a perfectly reasonable choice to make.
I don't think the current PostgreSQL code is sophisticated enough to do that specific rearrangement, but this problem is covered by the documentation in Expression Evaluation Rules, which recommends using CASE to conditionally avoid evaluating problematic expressions.
Following this advice, the query would look like:
SELECT * FROM mytable WHERE
CASE WHEN value1 ~ '^\d{4}-\d{2}-\d{2}$'
THEN value1::date < '2013-10-24'::date
ELSE false
END;
Also, if the content matches the date format but happens to be an invalid date (e.g. 2013-01-32), the query will still fail. If this is a concern, you should encapsulate the cast in a function that traps the error:
create function cast_date(text) returns date as $$
begin
  return $1::date;
exception when others then
  -- any value that cannot be parsed as a date yields null instead of an error
  return null;
end;
$$ language plpgsql;
and replace the test with cast_date(value1) < '2013-10-24'::date
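Used in the original query, that gives something like this sketch (rows whose value1 cannot be parsed yield NULL, so the comparison is not true and those rows are simply skipped):
-- non-date strings produce NULL and drop out of the result
SELECT * FROM mytable
WHERE cast_date(value1) < '2013-10-24'::date;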

Maybe this will work:
SELECT * FROM mytable WHERE value1 ~ '^\d{4}-\d{2}-\d{2}$' AND
value1::date < '2013-10-24'::date
The regular expression will check whether value1 is in the needed format, and if it's not, the cast to date shouldn't happen.

Related

Search Through All Between Values SQL

I have data in the following structure:
_ID    _BEGIN   _END
7003   99210    99217
7003   10225    10324
7003   111111
I want to look through every _BEGIN and _END and return all rows where the input value falls within that range, including the boundary values themselves (i.e. if 10324 is the input, row 2 would be returned).
I have tried this filter, but it does not work:
where #theInput between a._BEGIN and a._END
--THIS WORKS
where convert(char(7),'10400') >= convert(char(7),a._BEGIN)
--BUT ADDING THIS BREAKS AND RETURNS NOTHING
AND convert(char(7),'10400') < convert(char(7),a._END)
The less-than (<) and greater-than (>) operators work on xCHAR data types without any syntax error, but the comparison may go semantically wrong. Look at these examples:
1 - SELECT 'ab' BETWEEN 'aa' AND 'ac' # returns TRUE
2 - SELECT '2' BETWEEN '1' AND '10' # returns FALSE
The character '2', stored in an xCHAR type, compares greater than any string starting with '1' (such as '10').
So you should CAST the types here. [Example is in MySQL; for standard compatibility, change UNSIGNED to INTEGER]
WHERE CAST(#theInput as UNSIGNED)
BETWEEN CAST(a._BEGIN as UNSIGNED) AND CAST(a._END as UNSIGNED)
You'd be better off changing the column types to avoid this ambiguity in later use.
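A minimal sketch of that change in MySQL syntax, assuming the table is named myTable and both columns contain only digits (or NULL):
-- hypothetical table name; back up the data before altering the types
ALTER TABLE myTable
  MODIFY _BEGIN INT UNSIGNED,
  MODIFY _END   INT UNSIGNED;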
This would be the obvious answer...
SELECT *
FROM <YOUR_TABLE_NAME> a
WHERE #theInput between a._BEGIN and a._END
If the data is a string (assuming so here, since we don't know which DB this is), you could add this:
Declare #searchArg VARCHAR(30) = CAST(#theInput as VARCHAR(30));
SELECT *
FROM <YOUR_TABLE_NAME> a
WHERE #searchArg between a._BEGIN and a._END
If you care about performance and you've got a lot of data and indexes, you won't want to include function calls on the column values. You could inline this conversion, but converting the variable up front ensures that your predicates stay sargable.
SELECT * FROM myTable a
WHERE (CAST(#theInput AS char(7)) >= a._BEGIN AND CAST(#theInput AS char(7)) <= a._END);
I also saw several of the same type of questions:
SQL "between" not inclusive
MySQL "between" clause not inclusive?
When I do queries like this, I usually start with just a greater-than or less-than comparison on one side of the range and work from there. Maybe that can help. I'm very slow, but I do lots of trial and error.
Or, use Tony's convert.
I suppose you can convert them to anything appropriate for your program, numeric or text.
Also, see here, http://technet.microsoft.com/en-us/library/aa226054%28v=sql.80%29.aspx.
I am not convinced you cannot do your CAST in the SELECT.
Nick, here is a MySQL version from SO, MySQL "between" clause not inclusive?

SQL conversion failed when converting

The situation is the following:
A column xy is defined as varchar(25). In a view (SQL Server Management Studio 2008) I filtered out all values containing letters (-> is not like '%[A-Z]%') and converted the column to int (cast(xy as int)).
If I now try to make comparisons with that column (e.g. where xy < 1000), I get a conversion error, and the message contains a value that should have been filtered out by "is not like '%[A-Z]%'". What's wrong??
Thanks for your help in advance...
This works (it filters out, for example, the value 'G8111'):
SELECT unid
FROM CD_UNITS AS a INNER JOIN DEF_STATION AS b ON a.STATION = b.STATION
WHERE (b.CURENT = 'T') and UNID like '%[A-Z]%'
But when I put that in a view and run a select on it:
select * from my_view where xy < 3000
the system says 'Conversion failed when converting the varchar value 'G8111' to data type int.', but 'G8111' should have been filtered out by the query above...
The optimizer does crazy things at times, so despite the fact that an "inner" filter1 "should" protect you, the optimizer may still push the conversion lower down than the filter and cause such errors.
The only semi-documented place where it will not do this is within a CASE expression:
The CASE statement(sic) evaluates its conditions sequentially and stops with the first condition whose condition is satisfied. In some situations, an expression is evaluated before a CASE statement receives the results of the expression as its input.
...
You should only depend on order of evaluation of the WHEN conditions for scalar expressions (including non-correlated sub-queries that return scalars), not for aggregate expressions
So the only way that should currently work would be:
CASE WHEN xy NOT LIKE '%[^0-9]%' THEN CONVERT(int,xy) END < 1000
This also uses a double-negative with LIKE to ensure that it only attempts the conversion when the value only contains digits.
1Whether this be in a subquery, a CTE, a View, or even just considering the logical processing order of SELECT and WHERE clauses. Within a single query, the optimizer can and will push conversion operations past filters.
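Applied to the select from the question, that guard would look something like this sketch:
-- the CASE prevents CONVERT from ever running on values containing non-digits
SELECT *
FROM my_view
WHERE CASE WHEN xy NOT LIKE '%[^0-9]%' THEN CONVERT(int, xy) END < 3000;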

SQL pattern matching

I have a question related to SQL.
I want to match two fields for similarity and return a percentage of how similar they are.
For example if I have a field called doc, which contains the following
This is my first assignment in SQL
and in another field I have something like
My first assignment in SQL
I want to know how I can check the similarity between the two and return the percentage.
I did some research and wanted a second opinion; also, I never asked for source code. I've looked at Soundex(), Difference(), and fuzzy string matching using the Levenshtein distance algorithm.
You didn't say which version of Oracle you are using. This example is based on version 11g.
You can use the edit_distance function of the utl_match package to determine how many characters you need to change in order to turn one string into another. The greatest function returns the largest value in the list of passed-in parameters. Here is an example:
-- sample of data
with t1(col1, col2) as(
select 'This is my first assignment in SQL', 'My first assignment in SQL ' from dual
)
-- the query
select trunc(((greatest(length(col1), length(col2)) -
(utl_match.edit_distance(col2, col1))) * 100) /
greatest(length(col1), length(col2)), 2) as "%"
from t1
result:
%
----------
70.58
Addendum
As @jonearles correctly pointed out, it is much simpler to use the edit_distance_similarity function of the utl_match package.
with t1(col1, col2) as(
select 'This is my first assignment in SQL', 'My first assignment in SQL ' from dual
)
select utl_match.edit_distance_similarity(col1, col2) as "%"
from t1
;
Result:
%
----------
71
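Against real table columns instead of the inline sample, the same call would look roughly like this (the table and column names here are assumed):
-- hypothetical table with two text columns to compare;
-- edit_distance_similarity returns the similarity as a value from 0 to 100
SELECT d.doc, d.other_doc,
       utl_match.edit_distance_similarity(d.doc, d.other_doc) AS similarity_pct
FROM documents d;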

How can I SELECT DISTINCT on the last, non-numerical part of a mixed alphanumeric field?

I have a data set that looks something like this:
A6177PE
A85506
A51SAIO
A7918F
A810004
A11483ON
A5579B
A89903
A104F
A9982
A8574
A8700F
And I need to find all the ENDings where they are non-numeric. In this example, that means PE, AIO, F, ON, B and F.
In pseudocode, I'm imagining I need something like
SELECT DISTINCT X FROM
(SELECT SUBSTR(COL,[SOME_CLEVER_LOGIC]) AS X FROM TABLE);
Any ideas? Can I solve this without learning regexp?
EDIT: To clarify, my data set is a lot larger than this example. Also, I'm only interested in the part of the string AFTER the numeric part. If the string is "A6177PE" I want "PE".
Disclaimer: I don't know Oracle SQL. But, I think something like this should work:
SELECT DISTINCT X FROM
(SELECT SUBSTR(COL, REGEXP_INSTR(COL, '[[:alpha:]]+$')) AS X FROM TABLE);
REGEXP_INSTR(COL, '[[:alpha:]]+$') should return the position of the first of the alphabetic characters at the end of the field.
For readability, I'd recommend using the REGEXP_SUBSTR function (if there are no performance issues, of course, as this is definitely slower than the accepted solution).
...also similar to REGEXP_INSTR, but instead of returning the position of the substring, it returns the substring itself
SELECT DISTINCT REGEXP_SUBSTR(MY_COLUMN, '[a-zA-Z]+$') FROM MY_TABLE;
([[:alpha:]] is supported as well, as @Audun wrote.)
Also useful: Oracle Regexp Support (beginning page)
For example
SELECT SUBSTR(col,INSTR(TRANSLATE(col,'A0123456789','A..........'),'.',-1)+1)
FROM table;
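Plugged into the DISTINCT query from the question's pseudocode, and skipping values that have no trailing letters, that could look like this sketch (column and table names are assumed):
-- the inner expression is the TRANSLATE/INSTR trick from above;
-- values with no trailing letters come back as NULL and are filtered out
SELECT DISTINCT x
FROM (SELECT SUBSTR(col, INSTR(TRANSLATE(col, 'A0123456789', 'A..........'), '.', -1) + 1) AS x
      FROM mytable)
WHERE x IS NOT NULL;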

Return rows where first character is non-alpha

I'm trying to retrieve all rows whose TestNames value starts with a non-alpha character in SQLite, but I can't seem to get it working. I've currently got this code, but it returns every row:
SELECT * FROM TestTable WHERE TestNames NOT LIKE '[A-z]%'
Is there a way to retrieve all rows where the first character of TestNames is not part of the alphabet?
Are you going by the first character only?
select * from TestTable WHERE substr(TestNames,1) NOT LIKE '%[^a-zA-Z]%'
The substr function (also available as left() in some SQL dialects) will help isolate the first character of the string for you.
edit:
Maybe substr(TestNames,1,1) in SQLite; I don't have an instance handy to test the syntax on.
Added:
select * from TestTable WHERE Upper(substr(TestNames,1,1)) NOT in ('A','B','C','D','E',....)
Doesn't seem optimal, but it will work functionally. I'm unsure what character commands SQLite has for expressing a range of letters.
I used 'upper' so you don't need to include lowercase letters in the NOT IN list...kinda hope SQLite knows what that is.
try
SELECT * FROM TestTable WHERE TestNames LIKE '[^a-zA-Z]%'
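Note that bracket character classes like this are a T-SQL extension; SQLite's LIKE treats the brackets as literal characters (which is why the original query returns every row). SQLite's GLOB operator does support them, so a sketch using GLOB would be:
-- GLOB is case-sensitive and supports [] classes and * wildcards
SELECT * FROM TestTable WHERE TestNames NOT GLOB '[A-Za-z]*';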
SELECT * FROM NC_CRIT_ATTACH WHERE substring(FILENAME,1,1) NOT LIKE '[A-z]%';
SHOULD be a little faster, as it is:
A) first getting only the first character of the column, then scanning that;
B) still a full-table scan unless you index this column.