Complicated text compare in SQL - sql

Suppose I have a table result
---------------------------------------------------------
coupon id| required_product_ids|used_product_in_this_year
---------------------------------------------------------
1 |1,2,3,10 |2,3,4,5,6,7,8,9,10,12,13
How can I check in SQL whether used_product_in_this_year contains at least one of the required_product_ids?
I tried a few things with the SQL LIKE keyword but did not succeed.

There is no native SQL construct for performing this type of comparison.
To find a single value in a comma-separated list, MySQL provides the FIND_IN_SET function. But to handle a comma-separated list of values and check whether each one is in another list, each separate value would need to be supplied to FIND_IN_SET, and that would be unwieldy.
If the hard and fast requirement is to handle this comparison in a SQL statement, I'd recommend writing a function to do the comparison.
DELIMITER $$
CREATE FUNCTION upity_halo_rpi(upity VARCHAR(4000), rpi VARCHAR(4000))
RETURNS INT DETERMINISTIC
BEGIN
-- TODO: extract first element of upity
-- TODO: check if element is in rpi list
-- if it is found in the list
RETURN TRUE;
-- otherwise, split off next element
-- loop through all elements
-- if the loop completes without finding a match, fall through
RETURN FALSE;
END$$
DELIMITER ;
With the function written, and thoroughly tested, it could be used in a SQL statement. To return a column that indicates that the row "has at least one"...
SELECT t.coupon id
, t.required_product_ids
, t.used_product_in_this_year
, upity_halo_rpi(t.used_product_in_this_year,t.required_product_ids) AS halo
FROM result t
To return:
coupon id required_product_ids used_product_in_this_year halo
--------- -------------------- ------------------------ -----
1 1,2,3,10 2,3,4,5,6,7,8,9,10,12,13 1
I'm not going to write the function. I'm just demonstrating a possible approach. One possible answer to "how" this type of comparison operation could be performed within a SQL statement.
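To make the loop in the function skeleton concrete, here is a minimal sketch of the same logic in Python rather than MySQL. The name has_at_least_one is hypothetical, and the sketch assumes both lists are plain comma-separated values with no quoting or escaping.

```python
def has_at_least_one(used_csv, required_csv):
    """Return True if any element of used_csv also appears in required_csv."""
    # Build the lookup set once, ignoring empty elements and stray spaces.
    required = {item.strip() for item in required_csv.split(",") if item.strip()}
    # Walk the first list element by element, as the skeleton's comments describe.
    for item in used_csv.split(","):
        if item.strip() in required:
            return True   # found a match: stop early
    return False          # loop finished without a match


print(has_at_least_one("2,3,4,5,6,7,8,9,10,12,13", "1,2,3,10"))  # True
print(has_at_least_one("100", "10"))                             # False
```

Comparing whole elements is what keeps 100 from matching 10, which is the trap with a plain LIKE.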

This is how you can do it, without changing your database structure.
In MYSQL (Tested):
select * from TableName
where concat(',', used_product_in_this_year, ',') regexp concat(',',replace(required_product_ids,',',',|,'),',')
This uses a regular expression built from your table's columns with a few MySQL string functions.
I don't recommend your database structure, but I like puzzles and this one was fun, thanks for the challenge.
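To see why the pattern works, the sketch below rebuilds the same strings in Python. It is only an illustration of the string manipulation, not MySQL itself. The function name regexp_overlap is hypothetical, and the sketch assumes the ids are plain digits, so nothing in them needs regex escaping.

```python
import re


def regexp_overlap(used_csv, required_csv):
    # Mirrors concat(',', replace(required_product_ids, ',', ',|,'), ',')
    pattern = "," + required_csv.replace(",", ",|,") + ","
    # Mirrors concat(',', used_product_in_this_year, ',')
    haystack = "," + used_csv + ","
    return re.search(pattern, haystack) is not None


print("," + "1,2,3,10".replace(",", ",|,") + ",")             # ,1,|,2,|,3,|,10,
print(regexp_overlap("2,3,4,5,6,7,8,9,10,12,13", "1,2,3,10"))  # True
print(regexp_overlap("20,30", "2,3"))                          # False
```

Wrapping both sides in commas means every alternative has to match a whole element, so 2 does not match inside 20.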

Related

Oracle SQL Stored Procedure with Oracle reserved words passed to variable

I have a stored procedure that gets passed a string of values separated with spaces, which then does a search in the table and returns data where a column has any of those values. All went well until a user needed to pass 'INDEX END UNKNOWN PROCESS' which didn't return anything, even though there is data with those values:
CREATE OR REPLACE PROCEDURE Searches
(
QUEUE IN TYPES.CHAR50,
P_CURSOR IN OUT SYS_REFCURSOR
)
AS
BEGIN
OPEN P_CURSOR FOR
SELECT *
FROM tablez t
WHERE /* If the subquery returns UNKNOWN, END, PROCESS, INDEX which are Oracle reserved words the main query won't return any results */
/* In order to pass this inconsistency, I concatenated XYZ to both sides when using IN Clause */
CONCAT(LTRIM(RTRIM(t.QUEUECD)),'XYZ') IN ( SELECT CONCAT(LTRIM(RTRIM(tr.prom)),'XYZ')
FROM ( SELECT regexp_substr(QUEUE,'[^ ]+', 1, LEVEL) prom
FROM dual
CONNECT BY regexp_substr(QUEUE, '[^ ]+', 1, LEVEL) IS NOT NULL
) tr
)
;
END Searches;
So I changed the code to use regexp_substr, and it only returned values once I concatenated 'XYZ' on both sides of the comparison. But this is a temporary fix, because QUEUECD is an indexed column in the database and using CONCAT in the WHERE clause leads to performance issues on large data sets.
Do you have any suggestions how to improve the performance or pass the list of values in a different way?
Thank you!
Oracle SQL Stored Procedure with Oracle reserved words passed to variable
It looks like a different problem. There is no way to pass "reserved words" as the value of a variable: when you have a varchar variable, its value is just text, nothing more.
I made a sample table and tested the query without concatenating 'XYZ', and I didn't have any such problem. Maybe there are whitespace or non-printable characters at the beginning or end of some records?
Regarding:
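The whitespace hypothesis can be illustrated outside the database. The Python sketch below uses the same '[^ ]+' pattern as the procedure to split the input, then compares it against some made-up QUEUECD values, one of them blank-padded. The data and the name split_queue are assumptions for illustration only; this is not a diagnosis of the actual table.

```python
import re


def split_queue(queue):
    # Same pattern the procedure passes to regexp_substr: runs of non-space characters.
    return re.findall(r"[^ ]+", queue)


tokens = split_queue("INDEX END UNKNOWN PROCESS")

# Hypothetical column values; "END   " simulates a blank-padded record.
rows = ["INDEX", "END   ", "OTHER"]

matches_raw = [r for r in rows if r in tokens]
matches_trimmed = [r.strip() for r in rows if r.strip() in tokens]

print(tokens)           # ['INDEX', 'END', 'UNKNOWN', 'PROCESS']
print(matches_raw)      # ['INDEX']
print(matches_trimmed)  # ['INDEX', 'END']
```

The words themselves are ordinary strings here; only the padding changes the result of the comparison.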
Do you have any suggestions how to improve the performance or pass the list of values in a different way?
Yes. Pass collection (nested table) as parameter. For example:
create or replace type T_TAB_STRING as table of varchar2(4000);
Next, change type of QUEUE from TYPES.CHAR50 to T_TAB_STRING.
Then you can use table() expression to unnest collection inside query like that:
SELECT *
FROM tablez t
WHERE t.QUEUECD IN ( SELECT /*+ DYNAMIC_SAMPLING(tr, 2) */
*
FROM TABLE(QUEUE) tr
)
;
The dynamic sampling hint forces the database to check how many elements the collection contains. Without it, the database assumes the collection is the size of one block (usually 8k), so the CBO may choose a full scan instead of an index scan.
If you cannot use that hint, or it doesn't work for some reason, there is another way to help the CBO with collections in queries: implementing the Extensible Optimizer interface for that collection. Adrian Billington describes how to do it in this article.

Pass table name used in FROM to function automatically?

Working with PostgreSQL 9.6.3. I am new to functions in databases.
Let's say there are multiple tables of item numbers. Each one has the item number, the item cost and several other columns which are factored into the "additional cost". I would like to put the calculation into a function so I can call it for any of these tables.
So instead of:
SELECT
itemnumber,
itemname,
base,
CASE
WHEN labor < 100 AND overhead < .20 THEN
WHEN .....
WHEN .....
WHEN .....
.....
END AS add_cost,
gpm
FROM items1;
I can just do:
SELECT
itemnumber,
itemname,
base,
calc_add_cost(),
gpm
FROM items1;
If I want to be able to use it on any of the item tables, I guess I would need to set a table_name parameter that the function takes since adding the table name into the function would be undesirable to say the least.
calc_add_cost(items1)
However, is there a simpler way such that when I call calc_add_cost() it will just use the table name from the FROM clause?
SELECT ....., calc_add_cost(items1) FROM items1
Just seems redundant.
I did come across a few topics with titles that sounded like they addressed what I was hoping to accomplish, but upon reviewing them it looked like they were a different issue.
You can even emulate a "computed field" or "generated column" like you had in mind. Basics here:
Store common query as column?
Simple demo for one table:
CREATE OR REPLACE FUNCTION add_cost(items1) -- function name = default col name
RETURNS numeric AS
$func$
SELECT
CASE
WHEN $1.labor < 100 AND $1.overhead < .20 THEN numeric '1'
-- WHEN .....
-- WHEN .....
-- WHEN .....
ELSE numeric '0' -- ?
END;
$func$
LANGUAGE sql IMMUTABLE;
Call:
SELECT *, t.add_cost FROM items1 t;
Note the table-qualification in t.add_cost. I only demonstrate this syntax variant since you have been asking for it. My advice is to use the less confusing standard syntax:
SELECT *, add_cost(t) AS add_cost FROM items1 t; -- column alias is also optional
However, SQL is a strictly typed language. If you define a particular row type as input parameter, it is bound to this particular row type. Passing various whole table types is more sophisticated, but still possible with polymorphic input type.
CREATE OR REPLACE FUNCTION add_cost(ANYELEMENT) -- function name = default col name
RETURNS numeric AS
$func$
SELECT
CASE
WHEN $1.labor < 100 AND $1.overhead < .20 THEN numeric '1'
-- WHEN .....
-- WHEN .....
-- WHEN .....
ELSE numeric '0' -- ?
END;
$func$
LANGUAGE sql IMMUTABLE;
Same call for any table that has the columns labor and overhead with matching data type.
dbfiddle here
Also see the related simple case passing simple values here:
How to put part of a SELECT statement into a Postgres function
For even more complex requirements - like also returning various row types - see:
Refactor a PL/pgSQL function to return the output of various SELECT queries

Conditionally delete item inside an Array Field PostgreSQL

I'm building a kind of dictionary app and I have a table for storing words like below:
id | surface_form | examples
-----------------------------------------------------------------------
1 | sounds | {"It sounds as though you really do believe that",
| | "A different bell begins to sound midnight"}
Where surface_form is of type CHARACTER VARYING and examples is an array field of CHARACTER VARYING
Since the examples are generated automatically from another API, they might not contain the exact surface_form. Now I want to keep in examples only the sentences that contain the exact surface_form. For instance, in the given example, only the first sentence is kept because it contains sounds; the second should be omitted because it only contains sound.
The problem is that I am stuck on how to write a query and/or PL/pgSQL stored procedure to update the examples column so that it only has the desired sentences.
This query skips unwanted array elements:
select id, array_agg(example) new_examples
from a_table, unnest(examples) example
where surface_form = any(string_to_array(example, ' '))
group by id;
id | new_examples
----+----------------------------------------------------
1 | {"It sounds as though you really do believe that"}
(1 row)
Use it in update:
with corrected as (
select id, array_agg(example) new_examples
from a_table, unnest(examples) example
where surface_form = any(string_to_array(example, ' '))
group by id
)
update a_table
set examples = new_examples
from corrected
where examples <> new_examples
and a_table.id = corrected.id;
Test it in rextester.
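For readers who want to check the filtering rule without a database, here is a small Python sketch of the same condition, surface_form = any(string_to_array(example, ' ')). The function name keep_matching is hypothetical.

```python
def keep_matching(surface_form, examples):
    # Split each sentence on single spaces and keep it only if one of the
    # resulting tokens equals surface_form exactly.
    return [ex for ex in examples if surface_form in ex.split(" ")]


examples = [
    "It sounds as though you really do believe that",
    "A different bell begins to sound midnight",
]
print(keep_matching("sounds", examples))
# ['It sounds as though you really do believe that']
```

Because the comparison is on whole space-separated tokens, a word with punctuation attached (for example "sounds,") would not count as a match under this rule.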
Maybe you have to change the table design. This is what PostgreSQL's documentation says about the use of arrays:
Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.
Documentation:
https://www.postgresql.org/docs/current/static/arrays.html
The most compact solution (but not necessarily the fastest) is to write a function that you pass a regular expression and an array and which then returns a new array that only contains the items matching the regex.
create function get_matching(p_values text[], p_pattern text)
returns text[]
as
$$
declare
l_result text[] := '{}'; -- make sure it's not null
l_element text;
begin
foreach l_element in array p_values loop
-- adjust this condition to whatever you want
if l_element ~ p_pattern then
l_result := l_result || l_element;
end if;
end loop;
return l_result;
end;
$$
language plpgsql;
The if condition is only an example. You need to adjust that to whatever you exactly store in the surface_form column. Maybe you need to test on word boundaries for the regex or a simple strpos() would do - your question is unclear about that.
Cleaning up the table then becomes as simple as:
update the_table
set examples = get_matching(examples, surface_form);
But the whole approach seems flawed to me. It would be a lot more efficient if you stored the examples in a properly normalized data model.
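The difference between a plain substring test and a word-boundary test, mentioned above, matters when one surface form is a prefix of another. The Python sketch below shows it; contains_word is a hypothetical helper, and note that Python spells the word boundary as \b, whereas PostgreSQL regular expressions use \y (or \m and \M).

```python
import re


def contains_word(sentence, word):
    # re.escape guards against regex metacharacters in the stored word.
    return re.search(r"\b" + re.escape(word) + r"\b", sentence) is not None


sentence = "It sounds as though you really do believe that"

print("sound" in sentence)               # True: substring match is too loose
print(contains_word(sentence, "sound"))  # False: "sounds" is a different word
print(contains_word(sentence, "sounds")) # True
```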
In SQL, you have to remember two things.
Tuple elements are immutable but rows are mutable via updates.
SQL is declarative, not procedural
So you cannot "conditionally" "delete" a value from an array. You have to think about the question differently. You have to create a new array following a specification. That specification can conditionally include values (using case statements). Then you can overwrite the tuple with the new array.
It looks like one way would be to update the array with only the valid elements, selected using LIKE or a regular expression.
https://www.postgresql.org/docs/current/static/arrays.html
If you want to keep the array elements that contain surface_form, filter the entries with substring(....,...) is not null.
First you unnest the array, hold only items that match, and then array_agg the stored items
Here is a little query you can run to test without any table.
SELECT
id,
surface_form,
(SELECT array_agg(examples_matching)
FROM unnest(surfaces.examples) AS examples_matching
WHERE substring(examples_matching, surfaces.surface_form) IS NOT NULL)
FROM
(SELECT
1 AS id,
'example' :: TEXT AS surface_form,
ARRAY ['example form', 'test test','second example form'] :: TEXT [] AS examples
) surfaces;
You can select data in temp table using
Then update temp table using update query on row number
Merge value using
This merge value you can update in original table
For Example
Suppose you create temp table
Temp (id int, element character varying)
Then update Temp table and nest it.
Finally update original table
Here is the query you can directly try to execute in editor
CREATE TEMP TABLE IF NOT EXISTS temp_element (
id bigint,
element character varying);
TRUNCATE TABLE temp_element;
insert into temp_element select row_number() over (order by p),p from (
select unnest(ARRAY['It sounds as though you really do believe that',
'A different bell begins to sound midnight']) as P)t;
update temp_element set element = 'It sounds as though you really'
where element = 'It sounds as though you really do believe that';
--update table
select array_agg(r) from ( select element from temp_element)r

PostgreSQL function check if field is CSV

I can accomplish this with PHP in the end, but it would be more elegant to have it in the SQL. I have no choice but to use PostgreSQL for this project and I have never used it before, so...
There is a table 'test_results' that contains the columns:
sample_id(text) | test_result(text) | sessiontime(bigint)
Another table has information that includes the sample_id, but some have had multiple tests run. When that happens the sample_id field is populated with a CSV list of sample_ids. Not all of these sample_ids exist in the test_results table. There is also no way of knowing how many tests have been run.
If there is only one sample_id it will be in the table and should be returned. Otherwise the field of CSV needs to split and checked to see if it exists and since only one test_result need be returned the one with the latest sessiontime(which is epochtime) need be returned.
I have been over this many ways and my code has now become a jumble of unworkable ...
Guidance would be appreciated. I can always go back and do it in the PHP if I need...
EDIT TO BE CLEAR.. SOMETHING LIKE THIS:
DROP FUNCTION get_test_results(text);
CREATE OR REPLACE FUNCTION get_test_results(sample_id TEXT) returns
table(test_results text) as $$
BEGIN
IF position("," in sample_id) THEN
-----DO SOMETHING to
ELSE
SELECT test_results FROM test_results WHERE sample_id = sample_id ORDER BY sessiontime DESC;
END IF;
END
$$ LANGUAGE plpgsql;
This is not functioning yet. It needs to split the list with split_part(sample_id, ','::text, 1), get all the results, and then return only the one with the most recent sessiontime.
PostgreSQL is an excellent choice and very versatile for things like this.
First off, to determine whether your sample_id is a single value or a list of values:
-- (sample_id ~ '^ *\d+ *$') returns true if there is one number only
SELECT CASE WHEN sample_id ~ '^ *\d+ *$' THEN sample_id::int END
Then, to open up the list of ids in a comma-separated list of samples you can unnest the array returned by string_to_array:
SELECT i
FROM unnest(string_to_array(sample_id, ',')::int[]) i
You can use that for either single or multiple numbers (since there is just one value, you'll get only one row).
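Putting the pieces together, the overall logic (split the field, keep the ids that exist in test_results, return the row with the latest sessiontime) can be sketched in Python as follows. The sample data and the name latest_result are made up for illustration; the real lookup would of course run against the table.

```python
def latest_result(sample_id_field, test_results):
    # Works for a single id and for a comma-separated list alike.
    ids = {s.strip() for s in sample_id_field.split(",") if s.strip()}
    # Not every id in the list is guaranteed to exist in test_results.
    candidates = [r for r in test_results if r["sample_id"] in ids]
    if not candidates:
        return None
    # sessiontime is epoch time, so the largest value is the most recent.
    return max(candidates, key=lambda r: r["sessiontime"])["test_result"]


# Hypothetical rows standing in for the test_results table.
test_results = [
    {"sample_id": "101", "test_result": "negative", "sessiontime": 1500000000},
    {"sample_id": "102", "test_result": "positive", "sessiontime": 1500000500},
    {"sample_id": "200", "test_result": "pending", "sessiontime": 1500000900},
]

print(latest_result("101", test_results))          # negative
print(latest_result("101,102,999", test_results))  # positive
print(latest_result("999", test_results))          # None
```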

SQL Checking Substring in a String

I have a table with column mapping which store record: "IV=>0J,IV=>0Q,IV=>2,V=>0H,V=>0K,VI=>0R,VI=>1,"
What is the sql to check whether or not a substring is in column mapping.
so, I would like this:
if I have "IV=>0J" would return true, because IV=>0J is exact in string "mapping"
if I have "IV=>01" would return false. And so on...
I try this:
SELECT * FROM table WHERE charindex('IV=>0J',mapping)
But when I have "IV=>0", it returns TRUE. But, it should return FALSE.
Thank You..
You can search with commas included. Just also add one at beginning and end of mapping:
SELECT * FROM table WHERE charindex(',IV=>0J,',',' + mapping + ',') <> 0
or
SELECT * FROM table WHERE ',' + mapping + ',' LIKE '%,IV=>0J,%'
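The effect of wrapping both strings in commas is easy to check outside the database. This Python sketch applies the same idea to the value from the question; has_mapping is a hypothetical name.

```python
def has_mapping(mapping, entry):
    # Surround both sides with commas so only whole entries can match.
    return ("," + entry + ",") in ("," + mapping + ",")


mapping = "IV=>0J,IV=>0Q,IV=>2,V=>0H,V=>0K,VI=>0R,VI=>1,"

print("IV=>0" in mapping)              # True: the naive substring test
print(has_mapping(mapping, "IV=>0"))   # False
print(has_mapping(mapping, "IV=>0J"))  # True
print(has_mapping(mapping, "IV=>01"))  # False
```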
This should do the trick:
SELECT * FROM table
WHERE
mapping LIKE '%,IV=>0J,%'
OR mapping LIKE '%,IV=>0J'
OR mapping LIKE 'IV=>0J,%'
OR mapping = 'IV=>0J'
But you should really normalize the database - you are currently violating the principle of atomicity, and therefore the 1NF. Your current difficulties in querying and the future difficulties with performance that you are about to encounter all stem from this root problem...
While you can search by including a comma in the string, this is a bad design for several reasons.
You are unable to take advantage of indexing
You force a full scan of the table, which will lead to bad performance AND excessive blocking.
You have to make sure that there is always a leading or a trailing comma (depends on what you expect in your LIKE expression).
You are no longer able to edit a single entry, you'll have to replace the entire string each time you want to change even a single mapping.
You open yourself to a concurrency nightmare if more than one user tries to update different mappings that just happen to be stored in the same column.
Your table isn't even in 1st normal form any more, which is why you have such difficulties
You should normalize your mapping column by extracting the data to a separate mapping table with at least the From and To columns you require. You can then add these columns to an index and rewrite your query so that it uses only a single index seek.
You can also add the ID values of your source table to the Mappings table and the index. This will allow you to convert the lookup for a source row to a join between the two tables that takes advantage of indexing
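As a rough sketch of what the normalized layout could look like, the Python snippet below uses the standard-library sqlite3 module as a stand-in for the real database. The table name mappings, the columns source_id, from_code and to_code, and the index name are all assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per mapping instead of one delimited string per source row.
conn.execute("CREATE TABLE mappings (source_id INTEGER, from_code TEXT, to_code TEXT)")
conn.execute("CREATE INDEX ix_mappings ON mappings (from_code, to_code, source_id)")

raw = "IV=>0J,IV=>0Q,IV=>2,V=>0H,V=>0K,VI=>0R,VI=>1,"
# Split the old column value into (source_id, from_code, to_code) tuples.
rows = [(1, *pair.split("=>")) for pair in raw.split(",") if pair]
conn.executemany("INSERT INTO mappings VALUES (?, ?, ?)", rows)


def has_mapping(from_code, to_code):
    # An equality lookup on both columns, which the index can serve directly.
    cur = conn.execute(
        "SELECT 1 FROM mappings WHERE from_code = ? AND to_code = ? LIMIT 1",
        (from_code, to_code),
    )
    return cur.fetchone() is not None


print(has_mapping("IV", "0J"))  # True
print(has_mapping("IV", "0"))   # False
```

With one row per mapping, the exact-match question from the original post becomes a simple equality test, and a single mapping can be updated without rewriting the whole string.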
charindex returns the position of the text, not a Boolean.
To check whether the text exists, compare the result to 0:
SELECT * FROM table WHERE charindex('IV=>0J',mapping) <> 0
I think you're missing something here: the CHARINDEX function does not return TRUE or FALSE.
It returns the starting position of the substring inside the searched string, or 0 if the substring is not present.
So your query should read,
SELECT * FROM table WHERE charindex('IV=>0J',mapping) > 0