Challenging Firebird Recursive CTE issue - sql

This might not be a straightforward Firebird question, but I'm hoping there's a feature I'm unaware of that can help me beyond plain-vanilla SQL.
I have two tables. The first is a list of names of "critical parameters," and the second relates certain object IDs, critical parameter names, and critical parameter values:
CREATE TABLE CRITICALPARAMS
(
PARAM Varchar(32) NOT NULL,
INDX INTEGER NOT NULL,
CONSTRAINT PK_CRITICALPARAMS_1 PRIMARY KEY (PARAM),
CONSTRAINT UNQ_CRITICALPARAMS_1 UNIQUE (INDX)
);
CREATE TABLE CRITICALPARAMVALS
(
ID INTEGER NOT NULL,
PARAM Varchar(32) NOT NULL,
VAL Float NOT NULL,
CONSTRAINT PK_CRITICALPARAMVALS_1 PRIMARY KEY (ID,PARAM)
);
Let's suppose we have a four-dimensional space:
insert into CRITICALPARAMS values ('a', 1);
insert into CRITICALPARAMS values ('b', 2);
insert into CRITICALPARAMS values ('c', 3);
insert into CRITICALPARAMS values ('foo', 4);
...and a handful of objects in that space:
insert into CRITICALPARAMVALS values (1, 'a', 0.0);
insert into CRITICALPARAMVALS values (1, 'b', 0.0);
insert into CRITICALPARAMVALS values (1, 'c', 2.0);
insert into CRITICALPARAMVALS values (1, 'foo', 99.0);
insert into CRITICALPARAMVALS values (2, 'a', 0.0);
insert into CRITICALPARAMVALS values (2, 'b', 0.0);
insert into CRITICALPARAMVALS values (2, 'c', 2.0);
insert into CRITICALPARAMVALS values (2, 'foo', 99.0);
insert into CRITICALPARAMVALS values (3, 'a', 0.0);
insert into CRITICALPARAMVALS values (3, 'b', 0.0);
insert into CRITICALPARAMVALS values (3, 'c', 1.0);
insert into CRITICALPARAMVALS values (3, 'foo', 98.0);
insert into CRITICALPARAMVALS values (4, 'a', 0.0);
insert into CRITICALPARAMVALS values (4, 'b', 0.0);
insert into CRITICALPARAMVALS values (4, 'c', 1.0);
insert into CRITICALPARAMVALS values (4, 'foo', 98.0);
insert into CRITICALPARAMVALS values (5, 'a', 0.0);
insert into CRITICALPARAMVALS values (5, 'b', 0.0);
insert into CRITICALPARAMVALS values (5, 'c', 2.0);
insert into CRITICALPARAMVALS values (5, 'foo', 98.0);
The problem is to partition the critical parameter space, grouping together all object IDs that have the same parameter values. We can think of using a "seed" object ID, and asking what other IDs belong to the same partition as the seed object.
In our example, objects 1 and 2 form a partition, 3 and 4 form another, and 5 forms a third. All five objects are equal in the critical parameters a and b, but differ in parameters c and foo.
Is there any way to solve this using plain-vanilla SQL? How about a recursive CTE?
I've solved the problem crudely, using EXECUTE STATEMENT in a stored procedure, looping through the seed's critical parameter values and manually constructing a big SQL statement with as many WHERE clauses as critical parameters, but that solution doesn't scale as I go up to around 500-1000 critical parameters (or more!).
My current attempt has petered out at the following point -- I first define a view that can give me a partition along a single critical parameter (TEST_FLOAT_EQ is a selectable stored proc that compares two floats for 'good enough!' equality):
CREATE VIEW VGROUPIDBYPARAM (SEEDID, GROUPMEMBERID, CRITPARAMINDX)
AS
select a.id as seedid, b.id as groupmemberid, c.INDX as critparamindx
from CRITICALPARAMVALS a
join CRITICALPARAMVALS b on a.PARAM=b.param and (exists (select isequal from TEST_FLOAT_EQ(a.val, b.val, 1e-5) where ISEQUAL=1))
join CRITICALPARAMS c on b.param=c.PARAM;
...and then I want to use the VGROUPIDBYPARAM view inductively, in something like the following partially-complete select:
SELECT a1.SEEDID, a6.GROUPMEMBERID
FROM VGROUPIDBYPARAM a1
join VGROUPIDBYPARAM a2 on a1.SEEDID=a2.SEEDID and a1.GROUPMEMBERID=a2.GROUPMEMBERID
join VGROUPIDBYPARAM a3 on a1.SEEDID=a3.SEEDID and a2.GROUPMEMBERID=a3.GROUPMEMBERID
join VGROUPIDBYPARAM a4 on a1.SEEDID=a4.SEEDID and a3.GROUPMEMBERID=a4.GROUPMEMBERID
join VGROUPIDBYPARAM a5 on a1.SEEDID=a5.SEEDID and a4.GROUPMEMBERID=a5.GROUPMEMBERID
join VGROUPIDBYPARAM a6 on a1.SEEDID=a6.SEEDID and a5.GROUPMEMBERID=a6.GROUPMEMBERID
...
where a1.CRITPARAMINDX=1
and a2.CRITPARAMINDX=2
and a3.CRITPARAMINDX=3
and a4.CRITPARAMINDX=4
and a5.CRITPARAMINDX=5
and a6.CRITPARAMINDX=6
...
At the end of this inductive process (which I'm hoping a recursive CTE can imitate), the only surviving records that made it through the pile of JOINS have group member ID's belong to the same partition as the seed ID.
Many thanks to anyone that can help me solve this efficiently!

To solve the problem, I would start with this simple query (count matching dimensions in other objects):
SELECT
CPV1.ID AS ID1,
CPV2.ID AS ID2,
COUNT(*)
FROM
CRITICALPARAMVALS CPV1
INNER JOIN CRITICALPARAMVALS CPV2 ON CPV1.ID <> CPV2.ID
AND CPV1.PARAM = CPV2.PARAM
AND CPV1.VAL = CPV2.VAL
GROUP BY
CPV1.ID, CPV2.ID
With the following output:
As you can see, the interesting rows are marked with yellow background.
To filter only those rows we should add this condition:
HAVING COUNT(*) = (SELECT COUNT(*) FROM CRITICALPARAMS)
We can think of using a "seed" object ID, and asking what other IDs
belong to the same partition as the seed object.
The final query to answer the above question, with :SEED parameter, looks like this:
SELECT
CPV2.ID
FROM
CRITICALPARAMVALS CPV1
INNER JOIN CRITICALPARAMVALS CPV2 ON CPV1.ID <> CPV2.ID
AND CPV1.PARAM = CPV2.PARAM
AND CPV1.VAL = CPV2.VAL
WHERE CPV1.ID = :SEED
GROUP BY
CPV1.ID, CPV2.ID
HAVING COUNT(*) = (SELECT COUNT(*) FROM CRITICALPARAMS)
It should perform well even for big data sets.

Related

Why selecting a single attribute returns less rows than selecting all columns in oracle SQL

The tables created and the queries made are not the primary focus of this question, what confuses me is that why the first query and the second query returns different numbers of rows
drop table Reserves;
drop table Sailors;
drop table Boats;
create table Sailors (
sid char(1) not null,
sname char(1) not null,
rating int,
age int not null,
primary key (sid)
);
create table Boats (
bid char(1) not null,
bname char(1) not null,
color varchar(5),
primary key (bid)
);
create table Reserves (
sid char(1) not null,
bid char(1) not null,
rdate int not null,
primary key (sid, bid, rdate),
foreign key (sid) references Sailors(sid)
on delete cascade,
foreign key (bid) references Boats(bid)
on delete cascade
);
------------------------------------------------------------------------------
-- Insert values
insert into Sailors values ('1', 'q', 90, 24);
insert into Sailors values ('0', 's', 60, 22);
insert into Sailors values ('2', 'd', 80, 20);
insert into Sailors values ('3', 'w', 70, 18);
insert into Sailors values ('4', 'a', 60, 19);
insert into Sailors values ('5', 'l', 80, 17);
insert into Sailors values ('6', 'o', 90, 18);
insert into Sailors values ('7', 'q', 70, 20);
insert into Sailors values ('8', 'd', 60, 16);
insert into Sailors values ('9', 'i', 80, 22);
insert into Boats values ('0', 'U', 'red');
insert into Boats values ('1', 'P', 'red');
insert into Boats values ('2', 'Q', 'blue');
insert into Boats values ('3', 'C', 'green');
insert into Boats values ('4', 'L', 'blue');
insert into Boats values ('5', 'O', 'blue');
insert into Boats values ('6', 'A', 'red');
insert into Boats values ('7', 'C', 'red');
insert into Boats values ('8', 'Y', 'green');
insert into Boats values ('9', 'N', 'blue');
insert into Reserves values ('0', '0', 3);
insert into Reserves values ('0', '1', 2);
insert into Reserves values ('0', '2', 1);
insert into Reserves values ('0', '2', 3);
insert into Reserves values ('1', '0', 4);
insert into Reserves values ('3', '2', 2);
insert into Reserves values ('4', '0', 3);
insert into Reserves values ('4', '0', 1);
insert into Reserves values ('4', '1', 3);
insert into Reserves values ('4', '6', 4);
insert into Reserves values ('4', '7', 1);
insert into Reserves values ('5', '8', 2);
insert into Reserves values ('5', '9', 2);
insert into Reserves values ('7', '4', 4);
insert into Reserves values ('7', '5', 1);
insert into Reserves values ('8', '3', 2);
insert into Reserves values ('9', '3', 3);
insert into Reserves values ('9', '0', 1);
insert into Reserves values ('9', '6', 1);
insert into Reserves values ('9', '8', 2);
commit;
select *
from Sailors join Boats on color='red' natural left outer join Reserves
where rdate is null;
select sid
from Sailors join Boats on color='red' natural left outer join Reserves
where rdate is null;
I want to find the sid of the sailors who have not ordered all the red boats, the first query above returns the correct rows I am expecting, nonethless the second query returns only rows with sid=2 and sid=6, despite the two queries are identical. sailors with sid 2 and 6 are the only sailors who have not booked any boat.
As far as I can tell, this looks like a bug in Oracle's implementation of natural [...] join. I will do some testing to see if it affects inner joins too.
Instead of natural join, one can use the syntax left|right|inner join USING(...) and giving the list of column names in the using clause. The list of columns should be the list of ALL columns that have the same name in the two members of the join.
Very simple experimentation with the data you provided (+1 even for that alone) shows that the results are the same as if we had written using(sid, bid) in the first query, but only using(sid) in the second. If in either query - regardless of what it selects, whether * or sid - you use the using syntax, you get the same number of rows in the output as either your first or your second query, depending on what you put in the using clause.
So, what Oracle does for the second query is simply wrong. I can only speculate, but I believe Oracle looks at the SELECT clause first, and perhaps at other clauses, to see what columns it needs to retrieve from each table. (For example, this tells the optimizer what indexes it could use, etc.) And at this step, Oracle decides - in your second query - that it doesn't need bid. Then when it translates "natural" join to its own internal code, it doesn't throw in bid as a join column. Which is wrong - and which is why I called this a "bug".
IMPORTANT NOTE: Others have commented that both queries are "wrong" in that they do not solve the problem you were trying to solve. That may be entirely true. I didn't even look at your problem specification; here I am answering your question, which is valid regardless of the problem - which is, why do the two queries produce different numbers of rows. Even if they are "wrong" for your use case, they should essentially give "the same" wrong answer, not "different" wrong answers. That is the only thing I discussed above.
To be honest, the query is ... errr... not correct
For example it gives you sailors with sid = 0 and sid = 1. These sailors bought red boats
This is SQL you need to display sailors that did not buy a red boat.
select sl.*
from Sailors sl
where not exists (select 1
from Reserves rs
join Boats bs on rs.bid = bs.bid
where bs.color = 'red' and rs.sid = sl.sid);
These 2 queries will give you what you asked for in your last comment. I've UNION'd them together to give you a single list of SIDs but obviously you can run them independently if you want 22 separate lists.
-- Sailors with no reservations
select s.sid
from sailors s
where s.sid not in (
select sid from reserves)
union
-- Sailors with reservations but not of red boats
select s.sid
from sailors s
where s.sid not in (
Select distinct r.sid
from reserves r
inner join boats b on r.bid = b.bid
where b.color = 'red');

PL/SQL update all records except with max value

Please help with SQL query. I've got a table:
CREATE TABLE PCDEVUSER.tabletest
(
id INT PRIMARY KEY NOT NULL,
name VARCHAR2(64),
pattern INT DEFAULT 1 NOT NULL,
tempval INT
);
Let's pretend it was filled with values:
INSERT INTO TABLETEST (ID, NAME, PATTERN, TEMPVAL) VALUES (1, 'A', 1, 10);
INSERT INTO TABLETEST (ID, NAME, PATTERN, TEMPVAL) VALUES (2, 'A', 1, 20);
INSERT INTO TABLETEST (ID, NAME, PATTERN, TEMPVAL) VALUES (3, 'A', 2, 10);
INSERT INTO TABLETEST (ID, NAME, PATTERN, TEMPVAL) VALUES (5, 'A', 2, 20);
INSERT INTO TABLETEST (ID, NAME, PATTERN, TEMPVAL) VALUES (4, 'A', 2, 30);
And I need to update all records (grouped by pattern) with NO MAX value TEMPVALUE. So as result I have to update records with Ids (1, 3, 5). Records with IDs (2, 4) has max values in there PATTERN group.
HELP PLZ
This select statement will help you get the IDs you need :
SELECT
*
FROM
(SELECT
id
,name
,pattern
,tempval
,MAX(tempval) OVER (PARTITION BY pattern) max_tempval
FROM
tabletest
)
WHERE 1=1
AND tempval != max_tempval
;
You should be able to build an update statement around that easily enough
Something like this:
update tabletest t
set ????
where t.tempval < (select max(tempval) from tabletest tt where tt.pattern = t.pattern);
It is unclear what values you want to set. The ???? is for the code that sets the values.

SQL merge statement with multiple conditions

I have a requirement with some business rules to implement on SQL (within a PL/SQL block): I need to evaluate such rules and according to the result perform the corresponding update, delete or insert into a target table.
My database model contains a "staging" and a "real" table. The real table stores records inserted in the past and the staging one contains "fresh" data coming from somewhere that needs to be merged into the real one.
Basically these are my business rules:
Delta between staging MINUS real --> Insert rows into the real
Delta between real MINUS staging--> Delete rows from the real
Rows which PK is the same but any other fields different: Update.
(Those "MINUS" will compare ALL the fields to get equality and distinguise the 3rd case)
I haven't figured out the way to accomplish such tasks without overlapping between rules by using a merge statement: Any suggestion for the merge structure? Is it possible to do it all together within the same merge?
Thank you!
If I understand you task correctly following code should do the job:
--drop table real;
--drop table stag;
create table real (
id NUMBER,
col1 NUMBER,
col2 VARCHAR(10)
);
create table stag (
id NUMBER,
col1 NUMBER,
col2 VARCHAR(10)
);
insert into real values (1, 1, 'a');
insert into real values (2, 2, 'b');
insert into real values (3, 3, 'c');
insert into real values (4, 4, 'd');
insert into real values (5, 5, 'e');
insert into real values (6, 6, 'f');
insert into real values (7, 6, 'g'); -- PK the same but at least one column different
insert into real values (8, 7, 'h'); -- PK the same but at least one column different
insert into real values (9, 9, 'i');
insert into real values (10, 10, 'j'); -- in real but not in stag
insert into stag values (1, 1, 'a');
insert into stag values (2, 2, 'b');
insert into stag values (3, 3, 'c');
insert into stag values (4, 4, 'd');
insert into stag values (5, 5, 'e');
insert into stag values (6, 6, 'f');
insert into stag values (7, 7, 'g'); -- PK the same but at least one column different
insert into stag values (8, 8, 'g'); -- PK the same but at least one column different
insert into stag values (9, 9, 'i');
insert into stag values (11, 11, 'k'); -- in stag but not in real
merge into real
using (WITH w_to_change AS (
select *
from (select stag.*, 'I' as action from stag
minus
select real.*, 'I' as action from real
)
union (select real.*, 'D' as action from real
minus
select stag.*, 'D' as action from stag
)
)
, w_group AS (
select id, max(action) as max_action
from w_to_change
group by id
)
select w_to_change.*
from w_to_change
join w_group
on w_to_change.id = w_group.id
and w_to_change.action = w_group.max_action
) tmp
on (real.id = tmp.id)
when matched then
update set real.col1 = tmp.col1, real.col2 = tmp.col2
delete where tmp.action = 'D'
when not matched then
insert (id, col1, col2) values (tmp.id, tmp.col1, tmp.col2);

sql query to join two tables and a boolean flag to indicate whether it contains any words from third table

I have 3 tables with the following schema
create table main (
main_id int PRIMARY KEY,
secondary_id int NOT NULL
);
create table secondary (
secondary_id int NOT NULL,
tags varchar(100)
);
create table bad_words (
words varchar(100) NOT NULL
);
insert into main values (1, 1001);
insert into main values (2, 1002);
insert into main values (3, 1003);
insert into main values (4, 1004);
insert into secondary values (1001, 'good word');
insert into secondary values (1002, 'bad word');
insert into secondary values (1002, 'good word');
insert into secondary values (1002, 'other word');
insert into secondary values (1003, 'ugly');
insert into secondary values (1003, 'bad word');
insert into secondary values (1004, 'pleasant');
insert into secondary values (1004, 'nice');
insert into bad_words values ('bad word');
insert into bad_words values ('ugly');
insert into bad_words values ('worst');
expected output
----------------
1, 1000, good word, 0 (boolean flag indicating whether the tags contain any one of the words from the bad_words table)
2, 1001, bad word,good word,other word , 1
3, 1002, ugly,bad word, 1
4, 1003, pleasant,nice, 0
I am trying to use case to select 1 or 0 for the last column and use a join to join the main and secondary table, but getting confused and stuck. Can someone please help me with a query ? These tables are stored in redshift and i want query compatible with redshift.
you can use the above schema to try your query in sqlfiddle
EDIT: I have updated the schema and expected output now by removing the PRIMARY KEY in secondary table so that easier to join with the bad_words table.
You can use EXISTS and a regex comparison with \m and \M (markers for beginning and end of a word, respectively):
with
main(main_id, secondary_id) as (values (1, 1000), (2, 1001), (3, 1002), (4, 1003)),
secondary(secondary_id, tags) as (values (1000, 'very good words'), (1001, 'good and bad words'), (1002, 'ugly'),(1003, 'pleasant')),
bad_words(words) as (values ('bad'), ('ugly'), ('worst'))
select *, exists (select 1 from bad_words where s.tags ~* ('\m'||words||'\M'))::int as flag
from main m
join secondary s using (secondary_id)
select main_id, a.secondary_id, tags, case when c.words is not null then 1 else 0 end
from main a
join secondary b on b.secondary_id = a.secondary_id
left outer join bad_words c on c.words like b.tags
SELECT m.main_id, m.secondary_id, t.tags, t.is_bad_word
FROM srini.main m
JOIN (
SELECT st.secondary_id, st.tags, exists (select 1 from srini.bad_words b where st.tags like '%'+b.words+'%') is_bad_word
FROM
( SELECT secondary_id, LISTAGG(tags, ',') as tags
FROM srini.secondary
GROUP BY secondary_id ) st
) t on t.secondary_id = m.secondary_id;
This worked for me in redshift and produced the following output with the above mentioned schema.
1 1001 good word false
3 1003 ugly,bad word true
2 1002 good word,other word,bad word true
4 1004 pleasant,nice false

Oracle PIVOT, twice?

I have been trying to move away from using DECODE to pivot rows in Oracle 11g, where there is a handy PIVOT function. But I may have found a limitation:
I'm trying to return 2 columns for each value in the base table. Something like:
SELECT somethingId, splitId1, splitName1, splitId2, splitName2
FROM (SELECT somethingId, splitId
FROM SOMETHING JOIN SPLIT ON ... )
PIVOT ( MAX(splitId) FOR displayOrder IN (1 AS splitId1, 2 AS splitId2),
MAX(splitName) FOR displayOrder IN (1 AS splitName1, 2 as splitName2)
)
I can do this with DECODE, but I can't wrestle the syntax to let me do it with PIVOT. Is this even possible? Seems like it wouldn't be too hard for the function to handle.
Edit: is StackOverflow maybe not the right Overflow for SQL questions?
Edit: anyone out there?
From oracle-developer.net it would appear that it can be done like this:
SELECT somethingId, splitId1, splitName1, splitId2, splitName2
FROM (SELECT somethingId, splitId
FROM SOMETHING JOIN SPLIT ON ... )
PIVOT ( MAX(splitId) ,
MAX(splitName)
FOR displayOrder IN (1 AS splitName1, 2 as splitName2)
)
I'm not sure from what you provided what the data looks or what exactly you would like. Perhaps if you posted the decode version of the query that returns the data you are looking for and/or the definition for the source data, we could better answer your question. Something like this would be helpful:
create table something (somethingId Number(3), displayOrder Number(3)
, splitID Number(3));
insert into something values (1, 1, 10);
insert into something values (2, 1, 11);
insert into something values (3, 1, 12);
insert into something values (4, 1, 13);
insert into something values (5, 2, 14);
insert into something values (6, 2, 15);
insert into something values (7, 2, 16);
create table split (SplitID Number(3), SplitName Varchar2(30));
insert into split values (10, 'Bob');
insert into split values (11, 'Carrie');
insert into split values (12, 'Alice');
insert into split values (13, 'Timothy');
insert into split values (14, 'Sue');
insert into split values (15, 'Peter');
insert into split values (16, 'Adam');
SELECT *
FROM (
SELECT somethingID, displayOrder, so.SplitID, sp.splitname
FROM SOMETHING so JOIN SPLIT sp ON so.splitID = sp.SplitID
)
PIVOT ( MAX(splitId) id, MAX(splitName) name
FOR (displayOrder, displayOrder) IN ((1, 1) AS split, (2, 2) as splitname)
);