Like operator condition in postgresql data issue - sql

I have below data in the table
HJ-DEF-ABCF010-ABC18-09-17-D
GHJ-ABC-ABFV006-ABC18-09-18-R
OH-DEF-ABFCRT2037-ABC17-01-18-R
I want to populate the value in another column like
HJ-DEF-ABCF010-ABC18-09-17-D BET
GHJ-ABC-ABFV006-ABD18-09-18-R BET
OH-DEF-ABFCRT2037-ABCD17-01-18-R BET
As the mapping for
ABC18 is BET
ABD18 is BET
ABCD17 is BET
I was using the below sql query for that
select col1,case
when col1 like '%-ABC[1-2][0-9]-%' then BET
when col1 like '%-ABD[1-2][0-9]-%' then BET
when col1 like '%-ABCD[1-2][0-9]-%' then BET
else - end form table
which is working fine in SQl sever but in Pogresql we can't use [1-2] to find out the expected digit in the position. Any suggestion or idea how to achieve in Posgresql.

It works just fine. However, you think -- for some reason -- that LIKE supports character classes and other regular-expression-like features. That is simply not true in Postgres, nor in any other database other than SQL Server, Sybase, and MS Access.
Just use regular expressions.
For the original version:
SELECT *
FROM Customers
WHERE Customerid ~ '^[1-2][1-1]';
For the edited version:
SELECT *
FROM Customers
WHERE Customerid ~ '[-].{3}[1-2]1';

You can actually make your current logic work with LIKE, with a bit of effort:
SELECT *
FROM Customers
WHERE Customerid LIKE '11%' OR Customerid LIKE '21%';
You seem to be using perhaps SQL Server or Sybase enhanced LIKE syntax. Postgres does not support this, but it does support its own ~ regex operator. See Gordon's answer for more on that.
The edit to your question can use this:
SELECT *
FROM Customers
WHERE Customerid ~ '-[^-]{3}[1-2]7';
Demo

Used with SIMILAR TO instead of like. select col1,case when col1 SIMILAR TO '%-AB([CD]|CD)[1-2][0-9]-%' then BET else - end form table thnx #WiktorStribiżew

Related

Add column with substring of other column in SQL (Snowflake)

I feel like this should be simple but I'm relatively unskilled in SQL and I can't seem to figure it out. I'm used to wrangling data in python (pandas) or Spark (usually pyspark) and this would be a one-liner in either of those. Specifically, I'm using Snowflake SQL, but I think this is probably relevant to a lot of flavors of SQL.
Essentially I just want to trim the first character off of a specific column. More generally, what I'm trying to do is replace a column with a substring of the same column. I would even settle for creating a new column that's a substring of an existing column. I can't figure out how to do any of these things.
On obvious solution would be to create a temporary table with something like
CREATE TEMPORARY TABLE tmp_sub AS
SELECT id_col, substr(id_col, 2, 10) AS id_col_sub FROM table1
and then join it back and write a new table
CREATE TABLE table2 AS
SELECT
b.id_col_sub as id_col,
a.some_col1, a.some_col2, ...
FROM table1 a
JOIN tmp_sub b
ON a.id_col = b.id_col
My tables have roughly a billion rows though and this feels extremely inefficient. Maybe I'm wrong? Maybe this is just the right way to do it? I guess I could replace the CREATE TABLE table2 AS... to INSERT OVERWRITE INTO table1 ... and at least that wouldn't store an extra copy of the whole thing.
Any thoughts and ideas are most welcome. I come at this humbly from the perspective of someone who is baffled by a language that so many people seem to have mastery over.
I'm not sure the exact syntax/functions in Snowflake but generally speaking there's a few different ways of achieving this.
I guess the general approach that would work universally is using the SUBSTRING function that's available in any database.
Assuming you have a table called Table1 with the following data:
+-------+-----------------------------------------+
Code | Desc
+-------+-----------------------------------------+
0001 | 1First Character Will be Removed
0002 | xCharacter to be Removed
+-------+-----------------------------------------+
The SQL code to remove the first character would be:
select SUBSTRING(Desc,2,len(desc)) from Table1
Please note that the "SUBSTRING" function may vary according to different databases. In Oracle for example the function is "SUBSTR". You just need to find the Snowflake correspondent.
Another approach that would work at least in SQLServer and MySQL would be using the "RIGHT" function
select RIGHT(Desc,len(Desc) - 1) from Table1
Based on your question I assume you actually want to update the actual data within the table. In that case you can use the same function above in an update statement.
update Table1 set Desc = SUBSTRING(Desc,2,len(desc))
You didn't try this?
UPDATE tableX
SET columnY = substr(columnY, 2, 10 ) ;
-Paul-
There is no need to specify the length, as is evidenced from the following simple test harness:
SELECT $1
,SUBSTR($1, 2)
,RIGHT($1, -2)
FROM VALUES
('abcde')
,('bcd')
,('cdef')
,('defghi')
,('e')
,('fg')
,('')
;
Both expressions here - SUBSTR(<col>, 2) and RIGHT(<col>, -2) - effectively remove the first character of the <col> column value.
As for the strategy of using UPDATE versus INSERT OVERWRITE, I do not believe that there will be any difference in performance or outcome, so I might opt for the UPDATE since it is simpler. So, in conclusion, I would use:
UPDATE tableX
SET columnY = SUBSTR(columnY, 2)
;

Sub-Queries in Sybase SQL

We have an application which indexes data using user-written SQL statements. We place those statements within parenthesis so we can limit that query to a certain criteria. For example:
select * from (select F_Name from table_1)q where ID > 25
Though we have discovered that this format does not function using a Sybase database. Reporting a syntax error around the parenthesis. I've tried playing around on a test instance but haven't been able to find a way to achieve this result. I'm not directly involved in the development and my SQL knowledge is limited. I'm assuming the 'q' is to give the subresult an alias for the application to use.
Does Sybase have a specific syntax? If so, how could this query be adapted for it?
Thanks in advance.
Sybase ASE is case sensitive w.r.t. all identifiers and the query shall work:
as per #HannoBinder query :
select id from ... is not the same as select ID from... so make sure of the case.
Also make sure that the column ID is returned by the Q query in order to be used in where clause .
If the table and column names are in Upper case the following query shall work:
select * from (select F_NAME, ID from TABLE_1) Q where ID > 25

One select for multiple records by composite key

Such a query as in the title would look like this I guess:
select * from table t where (t.k1='apple' and t.k2='pie') or (t.k1='strawberry' and t.k2='shortcake')
... --10000 more key pairs here
This looks quite verbose to me. Any better alternatives? (Currently using SQLite, might use MYSQL/Oracle.)
You can use for example this on Oracle, i assume that if you use regular concatenate() instead of Oracle's || on other DB, it would work too (as it is simply just a string comparison with the IN list). Note that such query might have suboptimal execution plan.
SELECT *
FROM
TABLE t
WHERE
t.k1||','||t.k2 IN ('apple,pie',
'strawberry,shortcake' );
But if you have your value list stored in other table, Oracle supports also the format below.
SELECT *
FROM
TABLE t
WHERE (t.k1,t.k2) IN ( SELECT x.k1, x.k2 FROM x );
Don't be afraid of verbose syntax. Concatenation tricks can easily mess up the selectivity estimates or even prevent the database from using indexes.
Here is another syntax that may or may not work in your database.
select *
from table t
where (k1, k2) in(
('apple', 'pie')
,('strawberry', 'shortcake')
,('banana', 'split')
,('raspberry', 'vodka')
,('melon', 'shot')
);
A final comment is that if you find yourself wanting to submit 1000 values as filters you should most likely look for a different approach all together :)
select * from table t
where (t.k1+':'+t.k2)
in ('strawberry:shortcake','apple:pie','banana:split','etc:etc')
This will work in most of the cases as it concatenate and finds in as one column
off-course you need to choose a proper separator which will never come in the value of k1 and k2.
for e.g. if k1 and k2 are of type int you can take any character as separator
SELECT * FROM tableName t
WHERE t.k1=( CASE WHEN t.k2=VALUE THEN someValue
WHEN t.k2=otherVALUE THEN someotherValue END)
- SQL FIDDLE

SQL Contains Question

Can someone explain this to me? I have two queries below with their results.
query:
select * from tbl where contains([name], '"*he*" AND "*ca*"')
result-set:
Hertz Car Rental
Hemingyway's Cantina
query:
select * from tbl where contains([name], '"*he*" AND "*ar*"')
result-set:
nothing
The first query is what I would expect, however I would expect the second query to return "Hertz Car Rental". Am I fundamentally misunderstanding how '*' works in full-text searching?
Thanks!
I think SQL Server is interpreting your strings as prefix_terms. The asterisk is not a plain old wildcard specifier. Fulltext and Contains are word oriented. For what you are trying to do, you would be better off using plain old LIKE instead of CONTAINS.
http://msdn.microsoft.com/en-us/library/ms187787.aspx
"*" only works as a suffix. If you use it as a prefix, the table needs to be scanned no matter what and the index is useless. At that point, you might as well do
Select * From Table Where (Name Like '%he%') And (Name Like '%ar%')
I would try replacing * with % to see how it goes.
select * from tbl where contains([name], '"%he%" AND "%ar%"')

Make an SQL request more efficient and tidy?

I have the following SQL query:
SELECT Phrases.*
FROM Phrases
WHERE (((Phrases.phrase) Like "*ing aids*")
AND ((Phrases.phrase) Not Like "*getting*")
AND ((Phrases.phrase) Not Like "*contracting*"))
AND ((Phrases.phrase) Not Like "*preventing*"); //(etc.)
Now, if I were using RegEx, I might bunch all the Nots into one big (getting|contracting|preventing), but I'm not sure how to do this in SQL.
Is there a way to render this query more legibly/elegantly?
Just by removing redundant stuff and using a consistent naming convention your SQL looks way cooler:
SELECT *
FROM phrases
WHERE phrase LIKE '%ing aids%'
AND phrase NOT LIKE '%getting%'
AND phrase NOT LIKE '%contracting%'
AND phrase NOT LIKE '%preventing%'
You talk about regular expressions. Some DBMS do have it: MySQL, Oracle... However, the choice of either syntax should take into account the execution plan of the query: "how quick it is" rather than "how nice it looks".
With MySQL, you're able to use regular expression where-clause parameters:
SELECT something FROM table WHERE column REGEXP 'regexp'
So if that's what you're using, you could write a regular expression string that is possibly a bit more compact that your 4 like criteria. It may not be as easy to see what the query is doing for other people, however.
It looks like SQL Server offers a similar feature.
Sinec it sounds like you're building this as you go to mine your data, here's something that you could consider:
CREATE TABLE Includes (phrase VARCHAR(50) NOT NULL)
CREATE TABLE Excludes (phrase VARCHAR(50) NOT NULL)
INSERT INTO Includes VALUES ('%ing aids%')
INSERT INTO Excludes VALUES ('%getting%')
INSERT INTO Excludes VALUES ('%contracting%')
INSERT INTO Excludes VALUES ('%preventing%')
SELECT
*
FROM
Phrases P
WHERE
EXISTS (SELECT * FROM Includes I WHERE P.phrase LIKE I.phrase) AND
NOT EXISTS (SELECT * FROM Excludes E WHERE P.phrase LIKE E.phrase)
You are then always just running the same query and you can simply change what's in the Includes and Excludes tables to refine your searches.
Depending on what SQL server you are using, it may support REGEX itself. For example, google searches show that SQL Server, Oracle, and mysql all support regex.
You could push all your negative criteria into a short circuiting CASE expression (works Sql Server, not sure about MSAccess).
SELECT *
FROM phrases
WHERE phrase LIKE '%ing aids%'
AND CASE
WHEN phrase LIKE '%getting%' THEN 2
WHEN phrase LIKE '%contracting%' THEN 2
WHEN phrase LIKE '%preventing%' THEN 2
ELSE 1
END = 1
On the "more efficient" side, you need to find some criteria that allows you to avoid reading the entire Phrases column. Double sided wildcard criteria is bad. Right sided wildcard criteria is good.