SQL Server: Find the group of records existing in another group of records

I'm new to SQL Server and am looking for a way to find out whether a group is contained in another group.
The query result should be grp_id 2, because its items 'A' and 'B' are both included in groups 3 and 5.
The result should be the grp_id of the groups that are included in other groups. With this result I'll update another table, joined on grp_id.
The result should be:
+----+
| id |
+----+
| 2 |
+----+
I'm stuck because I can't find a way to compare the groups in SQL. My idea was to use a bitwise comparison, but for that I would have to add up the value of each item into a single field. I think there could be an easier way.
Thank you and best regards!
Eric
create table tmp_grpid (grp_id int);
create table tmp_grp (grp_id int, item_val nvarchar(10));
insert into tmp_grpid(grp_id) values (1);
insert into tmp_grpid(grp_id) values (2);
insert into tmp_grpid(grp_id) values (3);
insert into tmp_grpid(grp_id) values (4);
insert into tmp_grpid(grp_id) values (5);
--
insert into tmp_grp(grp_id, item_val) values (1, 'A');
insert into tmp_grp(grp_id, item_val) values (2, 'A');
insert into tmp_grp(grp_id, item_val) values (2, 'B');
insert into tmp_grp(grp_id, item_val) values (3, 'A');
insert into tmp_grp(grp_id, item_val) values (3, 'B');
insert into tmp_grp(grp_id, item_val) values (3, 'C');
insert into tmp_grp(grp_id, item_val) values (4, 'A');
insert into tmp_grp(grp_id, item_val) values (4, 'C');
insert into tmp_grp(grp_id, item_val) values (4, 'D');
insert into tmp_grp(grp_id, item_val) values (5, 'A');
insert into tmp_grp(grp_id, item_val) values (5, 'B');
insert into tmp_grp(grp_id, item_val) values (5, 'E');

Geez!
Technically speaking, group 1 is found in all the other groups, right? So with the sample data the query below returns 1 as well as 2.
First we need to know how many items belong to each group; that is what the first subselect does, with a count of elements per group. Then a cross join of the table to itself, with the condition that the values are the same AND that the groups are different, counts how many items each group shares with each other group. Joining the two on equal counts keeps only the groups whose every item appears in some other group. Hope this helps.
select distinct dist_grpid
from
  (select grp_id, count(*) cc from tmp_grp group by grp_id) g
  inner join
  (
    select dist.grp_id dist_grpid, tmp_grp.grp_id, count(*) cc
    from
      tmp_grp dist
      cross join tmp_grp
    where
      dist.item_val = tmp_grp.item_val and
      dist.grp_id != tmp_grp.grp_id
    group by
      dist.grp_id,
      tmp_grp.grp_id
  ) cj on g.grp_id = cj.dist_grpid and g.cc = cj.cc
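To see which group is contained in which, the same counting idea can be run against the question's sample data. The following is only a sketch: it uses Python's built-in sqlite3 module instead of SQL Server, the function name find_subset_pairs is made up for illustration, and it assumes that no (grp_id, item_val) pair is stored twice.

```python
import sqlite3

def find_subset_pairs():
    """Return (subset_grp_id, superset_grp_id) pairs for the question's sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tmp_grp (grp_id int, item_val text)")
    rows = [
        (1, 'A'),
        (2, 'A'), (2, 'B'),
        (3, 'A'), (3, 'B'), (3, 'C'),
        (4, 'A'), (4, 'C'), (4, 'D'),
        (5, 'A'), (5, 'B'), (5, 'E'),
    ]
    conn.executemany("insert into tmp_grp values (?, ?)", rows)
    sql = """
        select s.grp_id as subset_id, o.grp_id as superset_id
        from tmp_grp s
        -- pair every item with the same item in a different group
        join tmp_grp o on o.item_val = s.item_val and o.grp_id <> s.grp_id
        -- number of items in the candidate subset group
        join (select grp_id, count(*) as cc from tmp_grp group by grp_id) g
          on g.grp_id = s.grp_id
        group by s.grp_id, o.grp_id, g.cc
        -- all items of s were found in o
        having count(*) = g.cc
        order by s.grp_id, o.grp_id
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

pairs = find_subset_pairs()
print(pairs)
```

Group 1 shows up as a subset of every other group, so if only "real" multi-item groups such as group 2 are wanted, an extra condition like g.cc > 1 would have to be added; whether that matches the intent is for the asker to decide.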

Related

How to select distinct multi-column values in Oracle SQL?

I am trying to get distinct values with multi column select.
Sample table:
CREATE TABLE DUP_VALUES (ID NUMBER, NAME VARCHAR2(64));
INSERT INTO DUP_VALUES values (1, 'TEST1');
INSERT INTO DUP_VALUES values (2, 'TEST1');
INSERT INTO DUP_VALUES values (3, 'TEST2');
INSERT INTO DUP_VALUES values (4, 'TEST2');
INSERT INTO DUP_VALUES values (5, 'TEST1');
INSERT INTO DUP_VALUES values (6, 'TEST1');
INSERT INTO DUP_VALUES values (7, 'TEST1');
I want to get
ID NAME
1 TEST1
3 TEST2
I tried with SELECT DISTINCT ID, NAME FROM DUP_VALUES
But, I got all values, because ID is unique.
Use aggregation:
select min(id) as id, name
from dup_values
group by name;
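The difference between the two queries can be checked quickly. This is a sketch using Python's sqlite3 module rather than Oracle, so the column types are SQLite's and the variable names are only illustrative; the data is the sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dup_values (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO dup_values VALUES (?, ?)",
    [(1, 'TEST1'), (2, 'TEST1'), (3, 'TEST2'), (4, 'TEST2'),
     (5, 'TEST1'), (6, 'TEST1'), (7, 'TEST1')],
)

# DISTINCT applies to the whole (id, name) pair, so every row survives.
distinct_rows = conn.execute(
    "SELECT DISTINCT id, name FROM dup_values"
).fetchall()

# Grouping by name and taking the smallest id keeps one row per name.
grouped_rows = conn.execute(
    "SELECT MIN(id) AS id, name FROM dup_values GROUP BY name ORDER BY MIN(id)"
).fetchall()

print(len(distinct_rows))
print(grouped_rows)
```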

Sample observations per group without replacement in SQL

Using the provided table I would like to sample let's say 2 users per day so that users assigned to the two days are different. Of course the problem I have is more sophisticated, but this simple example gives the idea.
drop table if exists test;
create table test (
user_id int,
day_of_week int);
insert into test values (1, 1);
insert into test values (1, 2);
insert into test values (2, 1);
insert into test values (2, 2);
insert into test values (3, 1);
insert into test values (3, 2);
insert into test values (4, 1);
insert into test values (4, 2);
insert into test values (5, 1);
insert into test values (5, 2);
insert into test values (6, 1);
insert into test values (6, 2);
The expected results would look like this:
create table results (
user_id int,
day_of_week int);
insert into results values (1, 1);
insert into results values (2, 1);
insert into results values (3, 2);
insert into results values (6, 2);
You can use window functions. Here is an example . . . although the details do depend on your database (functions for random numbers vary by database):
select t.*
from (select t.*, row_number() over (partition by day_of_week order by random()) as seqnum
      from test t
     ) t
where seqnum <= 2;
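Note that the query above samples each day independently, so the same user can be drawn for both days. One possible way to keep the days disjoint is to shuffle the users once and hand out consecutive slices of that shuffled list to the days. The sketch below does this with Python's sqlite3 module (it needs an SQLite build with window functions, 3.25 or later); the function name sample_disjoint and the parameter k are made up for illustration, and it assumes there are at least k users per day to go around.

```python
import sqlite3

def sample_disjoint(k):
    """Pick k users per day so that no user is used for more than one day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table test (user_id int, day_of_week int)")
    conn.executemany(
        "insert into test values (?, ?)",
        [(u, d) for u in range(1, 7) for d in (1, 2)],
    )
    sql = """
        with ranked_users as (
            -- one random position per distinct user
            select user_id, row_number() over (order by random()) as rn
            from (select distinct user_id from test)
        ),
        days as (
            -- number the days 1, 2, ...
            select day_of_week, row_number() over (order by day_of_week) as dn
            from (select distinct day_of_week from test)
        )
        select r.user_id, d.day_of_week
        from ranked_users r
        -- day number dn receives shuffled positions (dn-1)*k+1 .. dn*k
        join days d on r.rn between (d.dn - 1) * :k + 1 and d.dn * :k
        -- keep only combinations that really exist in the table
        join test t on t.user_id = r.user_id and t.day_of_week = d.day_of_week
        order by d.day_of_week, r.user_id
    """
    rows = conn.execute(sql, {"k": k}).fetchall()
    conn.close()
    return rows

print(sample_disjoint(2))
```

If some users are not available on every day, the last join drops those rows and a day can end up with fewer than k users, so a more sophisticated real-world case would need extra handling.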

SQL merge statement with multiple conditions

I have a requirement with some business rules to implement on SQL (within a PL/SQL block): I need to evaluate such rules and according to the result perform the corresponding update, delete or insert into a target table.
My database model contains a "staging" and a "real" table. The real table stores records inserted in the past and the staging one contains "fresh" data coming from somewhere that needs to be merged into the real one.
Basically these are my business rules:
Delta between staging MINUS real --> Insert rows into the real
Delta between real MINUS staging --> Delete rows from the real
Rows whose PK is the same but with any other field different --> Update
(Those "MINUS" operations compare ALL the fields to check equality and to distinguish the 3rd case)
I haven't figured out how to accomplish these tasks with a merge statement without the rules overlapping. Any suggestion for the merge structure? Is it possible to do it all together within the same merge?
Thank you!
If I understand your task correctly, the following code should do the job:
--drop table real;
--drop table stag;
create table real (
id NUMBER,
col1 NUMBER,
col2 VARCHAR(10)
);
create table stag (
id NUMBER,
col1 NUMBER,
col2 VARCHAR(10)
);
insert into real values (1, 1, 'a');
insert into real values (2, 2, 'b');
insert into real values (3, 3, 'c');
insert into real values (4, 4, 'd');
insert into real values (5, 5, 'e');
insert into real values (6, 6, 'f');
insert into real values (7, 6, 'g'); -- PK the same but at least one column different
insert into real values (8, 7, 'h'); -- PK the same but at least one column different
insert into real values (9, 9, 'i');
insert into real values (10, 10, 'j'); -- in real but not in stag
insert into stag values (1, 1, 'a');
insert into stag values (2, 2, 'b');
insert into stag values (3, 3, 'c');
insert into stag values (4, 4, 'd');
insert into stag values (5, 5, 'e');
insert into stag values (6, 6, 'f');
insert into stag values (7, 7, 'g'); -- PK the same but at least one column different
insert into stag values (8, 8, 'g'); -- PK the same but at least one column different
insert into stag values (9, 9, 'i');
insert into stag values (11, 11, 'k'); -- in stag but not in real
merge into real
using (WITH w_to_change AS (
         select *
         from (select stag.*, 'I' as action from stag
               minus
               select real.*, 'I' as action from real
              )
         union (select real.*, 'D' as action from real
                minus
                select stag.*, 'D' as action from stag
               )
       )
       , w_group AS (
         select id, max(action) as max_action
         from w_to_change
         group by id
       )
       select w_to_change.*
       from w_to_change
       join w_group
         on w_to_change.id = w_group.id
        and w_to_change.action = w_group.max_action
      ) tmp
on (real.id = tmp.id)
when matched then
  update set real.col1 = tmp.col1, real.col2 = tmp.col2
  delete where tmp.action = 'D'
when not matched then
  insert (id, col1, col2) values (tmp.id, tmp.col1, tmp.col2);
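Before running the merge, it can help to check which rule each row falls under. The sketch below classifies the sample rows into the three business rules using Python's sqlite3 module. SQLite has no MERGE statement, so this only reports the planned action per id; the table names real_tbl and stag_tbl and the function name classify_deltas are made up for illustration.

```python
import sqlite3

def classify_deltas():
    """Return (id, action) for every row that differs between staging and real."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table real_tbl (id int, col1 int, col2 text)")
    conn.execute("create table stag_tbl (id int, col1 int, col2 text)")
    conn.executemany("insert into real_tbl values (?, ?, ?)", [
        (1, 1, 'a'), (2, 2, 'b'), (3, 3, 'c'), (4, 4, 'd'), (5, 5, 'e'),
        (6, 6, 'f'), (7, 6, 'g'), (8, 7, 'h'), (9, 9, 'i'), (10, 10, 'j'),
    ])
    conn.executemany("insert into stag_tbl values (?, ?, ?)", [
        (1, 1, 'a'), (2, 2, 'b'), (3, 3, 'c'), (4, 4, 'd'), (5, 5, 'e'),
        (6, 6, 'f'), (7, 7, 'g'), (8, 8, 'g'), (9, 9, 'i'), (11, 11, 'k'),
    ])
    sql = """
        -- rule 1: key only in staging
        select s.id, 'INSERT' as action
        from stag_tbl s left join real_tbl r on r.id = s.id
        where r.id is null
        union all
        -- rule 2: key only in real
        select r.id, 'DELETE'
        from real_tbl r left join stag_tbl s on s.id = r.id
        where s.id is null
        union all
        -- rule 3: same key, some other column differs
        -- (IS NOT is SQLite's NULL-safe inequality)
        select s.id, 'UPDATE'
        from stag_tbl s join real_tbl r on r.id = s.id
        where s.col1 is not r.col1 or s.col2 is not r.col2
        order by 1
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

print(classify_deltas())
```

Because each branch is keyed on the primary key, an id can land in only one of the three buckets, which is the non-overlap the question asks about.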

SQL Query count

Hi there, I have these tables:
Recipe = (idR, recipeTitle, prepText, cuisineType, mealType)
Ingredient = (idI, ingrDesc)
RecipIngr = (idR*, idI*)
and I'm trying to query a list of ingrDesc values with a count of how many recipes each ingrDesc is in. I want to list only those ingrDesc values that occur more than 10 times.
Here's what I have:
SELECT a.idI, a.recipeTitle
FROM Recipe a
INNER JOIN recpingr b
ON a.idr = b.idr
WHERE a.preptext = '>10'
Any help would be appreciated, as I don't know how to carry on with this query.
Use GROUP BY with HAVING:
SELECT i.idI, i.ingrDesc, COUNT(*)
FROM Ingredient i
INNER JOIN RecipIngr ri ON i.idI = ri.idI
GROUP BY i.idI, i.ingrDesc
HAVING COUNT(*) > 10
You need to use a group by clause and having. I have created a quick sample here but my sample data does not go up to 10 so I used any ingredient that was used more than once (> 1).
Here is the sample data:
create table dbo.recipe (
idR int not null,
recipeTitle varchar(100) not null,
prepText varchar(4000) null,
cuisineType varchar(100) null,
mealType varchar(100) null
)
go
insert into dbo.recipe values (1, 'Eggs and Bacon', 'Prep Text 1', 'American', 'Breakfast')
insert into dbo.recipe values (2, 'Turkey Sandwich', 'Prep Text 2', 'American', 'Lunch')
insert into dbo.recipe values (3, 'Roast Beef Sandwich', 'Prep Text 3', 'American', 'Lunch')
go
create table dbo.ingredient (
idI int not null,
ingrDesc varchar(200) not null
)
go
insert into dbo.ingredient values (1, 'Large Egg')
insert into dbo.ingredient values (2, 'Bacon');
insert into dbo.ingredient values (3, 'Butter');
insert into dbo.ingredient values (4, 'Sliced Turkey');
insert into dbo.ingredient values (5, 'Lettuce');
insert into dbo.ingredient values (6, 'Tomato');
insert into dbo.ingredient values (7, 'Onion');
insert into dbo.ingredient values (8, 'Bread');
insert into dbo.ingredient values (9, 'Mustard');
insert into dbo.ingredient values (10, 'Horseradish');
insert into dbo.ingredient values (11, 'Sliced Roast Beef');
go
create table dbo.recipingr(
idR int not null,
idI int not null
)
go
insert into dbo.recipingr values (1, 1);
insert into dbo.recipingr values (1, 2);
insert into dbo.recipingr values (2, 4);
insert into dbo.recipingr values (2, 5);
insert into dbo.recipingr values (2, 6);
insert into dbo.recipingr values (2, 7);
insert into dbo.recipingr values (2, 8);
insert into dbo.recipingr values (2, 9);
insert into dbo.recipingr values (3, 11);
insert into dbo.recipingr values (3, 10);
insert into dbo.recipingr values (3, 8);
insert into dbo.recipingr values (3, 6);
insert into dbo.recipingr values (3, 5);
go
Here is the query:
select
i.ingrDesc,
count(*) ingrCount
from
dbo.recipe r
inner join dbo.recipingr ri on ri.idR = r.idR
inner join dbo.ingredient i on i.idI = ri.idI
group by
i.ingrDesc
having
count(*) > 1

Help with a SQL Query

My tables:
suggestions:
suggestion_id|title|description|user_id|status|created_time
suggestion_comments:
scomment_id|text|user_id|suggestion_id
suggestion_votes:
user_id|suggestion_id|value
Where value is the number of points assigned to a vote.
I'd like to be able to SELECT:
suggestion_id, title, the number of comments and the SUM of values for that suggestion.
sorted by SUM of values. LIMIT 30
Any ideas?
You may want to try using sub queries, as follows:
SELECT s.suggestion_id,
       (
         SELECT COUNT(*)
         FROM suggestion_comments sc
         WHERE sc.suggestion_id = s.suggestion_id
       ) num_of_comments,
       (
         SELECT SUM(sv.value)
         FROM suggestion_votes sv
         WHERE sv.suggestion_id = s.suggestion_id
       ) sum_of_values
FROM suggestions s;
Test case:
CREATE TABLE suggestions (suggestion_id int);
CREATE TABLE suggestion_comments (scomment_id int, suggestion_id int);
CREATE TABLE suggestion_votes (user_id int, suggestion_id int, value int);
INSERT INTO suggestions VALUES (1);
INSERT INTO suggestions VALUES (2);
INSERT INTO suggestions VALUES (3);
INSERT INTO suggestion_comments VALUES (1, 1);
INSERT INTO suggestion_comments VALUES (2, 1);
INSERT INTO suggestion_comments VALUES (3, 2);
INSERT INTO suggestion_comments VALUES (4, 2);
INSERT INTO suggestion_comments VALUES (5, 2);
INSERT INTO suggestion_comments VALUES (6, 3);
INSERT INTO suggestion_votes VALUES (1, 1, 3);
INSERT INTO suggestion_votes VALUES (2, 1, 5);
INSERT INTO suggestion_votes VALUES (3, 1, 1);
INSERT INTO suggestion_votes VALUES (1, 2, 4);
INSERT INTO suggestion_votes VALUES (2, 2, 2);
INSERT INTO suggestion_votes VALUES (1, 3, 5);
Result:
+---------------+-----------------+---------------+
| suggestion_id | num_of_comments | sum_of_values |
+---------------+-----------------+---------------+
| 1 | 2 | 9 |
| 2 | 3 | 6 |
| 3 | 1 | 5 |
+---------------+-----------------+---------------+
3 rows in set (0.00 sec)
UPDATE: @Naktibalda's solution is an alternative that avoids sub queries.
I was typing the same query as potatopeelings.
But there is an issue:
The result set after the joins contains M*N rows for each suggestion (M = number of comments, N = number of votes, each counted as at least 1).
To avoid that, you have to count distinct comment ids and divide the sum of votes by the number of comments.
SELECT
  s.*,
  COUNT(DISTINCT c.scomment_id) AS comment_count,
  SUM(v.value)/GREATEST(COUNT(DISTINCT c.scomment_id), 1) AS total_votes
FROM suggestions AS s
LEFT JOIN suggestion_comments AS c ON s.suggestion_id = c.suggestion_id
LEFT JOIN suggestion_votes AS v ON s.suggestion_id = v.suggestion_id
GROUP BY s.suggestion_id
ORDER BY total_votes DESC
LIMIT 30
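Another way around the M*N issue is to aggregate each child table on its own before joining, so every suggestion meets at most one row per side. The sketch below tries this with Python's sqlite3 module on the test case from the earlier answer. Suggestion 4, which has no comments or votes, is added here only to show the zero case and is not part of the original data; the function name suggestion_totals is also made up.

```python
import sqlite3

def suggestion_totals():
    """Return (suggestion_id, num_of_comments, sum_of_values), highest sum first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table suggestions (suggestion_id int)")
    conn.execute("create table suggestion_comments (scomment_id int, suggestion_id int)")
    conn.execute("create table suggestion_votes (user_id int, suggestion_id int, value int)")
    # suggestion 4 is an added, hypothetical row with no comments and no votes
    conn.executemany("insert into suggestions values (?)", [(1,), (2,), (3,), (4,)])
    conn.executemany("insert into suggestion_comments values (?, ?)",
                     [(1, 1), (2, 1), (3, 2), (4, 2), (5, 2), (6, 3)])
    conn.executemany("insert into suggestion_votes values (?, ?, ?)",
                     [(1, 1, 3), (2, 1, 5), (3, 1, 1), (1, 2, 4), (2, 2, 2), (1, 3, 5)])
    sql = """
        select s.suggestion_id,
               coalesce(c.comment_cnt, 0) as num_of_comments,
               coalesce(v.vote_sum, 0)    as sum_of_values
        from suggestions s
        -- one row per suggestion: number of comments
        left join (select suggestion_id, count(*) as comment_cnt
                   from suggestion_comments
                   group by suggestion_id) c
               on c.suggestion_id = s.suggestion_id
        -- one row per suggestion: sum of vote values
        left join (select suggestion_id, sum(value) as vote_sum
                   from suggestion_votes
                   group by suggestion_id) v
               on v.suggestion_id = s.suggestion_id
        order by sum_of_values desc
        limit 30
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

print(suggestion_totals())
```

The same statement should carry over to MySQL largely unchanged, since it only relies on derived tables, COALESCE, ORDER BY and LIMIT, but that has not been verified here.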