Match rows that include one of each at least once in SQL - sql

I have a users table:
ID Name OID TypeID
1 a 1 1
2 b 1 2
3 c 1 3
4 d 2 1
5 e 2 1
6 f 2 2
7 g 3 2
8 h 3 2
9 i 3 2
For this table, I want to filter by OID and TypeID so that I only get the rows for an OID whose rows include all of 1, 2, and 3 in TypeID.
For example, where OID=1 we have 1, 2, and 3 in TypeID, so those rows should be returned. I shouldn't get the rows with IDs 4-6, because they share an OID but their TypeID values do not include all of 1, 2, and 3.

You can do:
select oid
from users t
where typeid in (1,2,3)
group by oid
having count(*) = 3;
If an oid can contain duplicate typeid values, use count(distinct typeid) instead.
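The query above returns only the matching oid values. To get the full rows the question asks for, the grouped query can be used as a filter. This is a minimal sketch, assuming the table is called users as in the question text; it uses the count(distinct typeid) variant so repeated typeid values do not inflate the count.

```sql
-- Sketch only: assumes the table from the question is named users.
select u.*
from users u
where u.oid in (
    select oid
    from users
    where typeid in (1, 2, 3)
    group by oid
    having count(distinct typeid) = 3  -- distinct guards against repeated typeid values
);
```

With the sample data, only oid 1 has all three TypeID values, so this should return the rows with IDs 1, 2, and 3.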

You could use exists:
select oid from users t1
where exists ( select 1 from users t2 where t2.oid = t1.oid
group by t2.oid
having count(distinct t2.TypeID) = 3
)
This assumes TypeID only takes the values 1, 2, and 3.

If you are using SQL Server, you can try this.
DECLARE @SampleData TABLE(ID INT, Name VARCHAR(5), OID INT, TypeID INT)
INSERT INTO @SampleData VALUES
(1 , 'a', 1, 1),
(2 , 'b', 1, 2),
(3 , 'c', 1, 3),
(4 , 'd', 2, 1),
(5 , 'e', 2, 1),
(6 , 'f', 2, 2),
(7 , 'g', 3, 2),
(8 , 'h', 3, 2),
(9 , 'i', 3, 2)
SELECT * FROM @SampleData D
WHERE NOT EXISTS (
SELECT * FROM @SampleData D1
RIGHT JOIN (VALUES (1),(2),(3)) T(TypeID) ON D1.TypeID = T.TypeID
AND D.OID = D1.OID
WHERE D1.TypeID IS NULL
)
Result:
ID Name OID TypeID
----------- ----- ----------- -----------
1 a 1 1
2 b 1 2
3 c 1 3

Related

how to calculate consecutive difference using values of two columns? [duplicate]

I'm working on a data structure with a list of positive or negative results for each person.
Sample data (id is an identity):
id person result
1 1 0
2 1 1
3 1 1
4 2 1
5 2 0
6 1 1
7 1 0
8 2 0
9 2 0
10 2 0
With this I would like to count the maximum number of consecutive result = 1 for each person. The result in this sample would be
person max_count
1 3
2 1
I have tried using ROW_NUMBER() OVER (PARTITION BY) like this
SELECT person,
ROW_NUMBER() OVER (PARTITION BY person, result ORDER BY id) AS max_count
FROM TABLE
but it gives me a cumulative count instead of a consecutive one.
What should I do to perform a consecutive count? Any hint would be appreciated. Thanks in advance.
This looks like a classic gaps-and-islands problem.
Examine intermediate results of each CTE in the query below to understand what is going on.
Sample data
I added person 3 with two sequences of positive results, so that we could find the longest sequence.
DECLARE @T TABLE (id int, person int, result int);
INSERT INTO @T (id, person, result) VALUES
(1 , 1, 0),
(2 , 1, 1),
(3 , 1, 1),
(4 , 2, 1),
(5 , 2, 0),
(6 , 1, 1),
(7 , 1, 0),
(8 , 2, 0),
(9 , 2, 0),
(10, 2, 0),
(11, 3, 0),
(12, 3, 1),
(13, 3, 1),
(14, 3, 1),
(15, 3, 1),
(16, 3, 0),
(17, 3, 1),
(18, 3, 1),
(19, 3, 0),
(20, 3, 0);
Query
WITH
CTE_RowNumbers
AS
(
SELECT
id, person, result
,ROW_NUMBER() OVER (PARTITION BY person ORDER BY ID) AS rn1
,ROW_NUMBER() OVER (PARTITION BY person, result ORDER BY ID) AS rn2
FROM @T
)
)
,CTE_Groups
AS
(
SELECT
id, person, result
,rn1-rn2 AS GroupNumber
FROM CTE_RowNumbers
)
,CTE_GroupSizes
AS
(
SELECT
person
,COUNT(*) AS GroupSize
FROM CTE_Groups
WHERE
result = 1
GROUP BY
person
,GroupNumber
)
SELECT
person
,MAX(GroupSize) AS max_count
FROM CTE_GroupSizes
GROUP BY person
ORDER BY person;
Result
+--------+-----------+
| person | max_count |
+--------+-----------+
| 1 | 3 |
| 2 | 1 |
| 3 | 4 |
+--------+-----------+
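To see why rn1 - rn2 separates the islands, it can help to print the intermediate columns for a single person. The following is a sketch, not part of the original answer: it inlines only person 1's rows from the sample data through a VALUES list (the CTE name T is just for illustration) and shows both row numbers and their difference.

```sql
-- Sketch only: person 1's rows from the sample data, inlined for illustration.
WITH T (id, person, result) AS
(
    SELECT id, person, result
    FROM (VALUES (1, 1, 0), (2, 1, 1), (3, 1, 1), (6, 1, 1), (7, 1, 0)) AS v (id, person, result)
)
SELECT
    id
    ,result
    ,ROW_NUMBER() OVER (PARTITION BY person ORDER BY id) AS rn1
    ,ROW_NUMBER() OVER (PARTITION BY person, result ORDER BY id) AS rn2
    ,ROW_NUMBER() OVER (PARTITION BY person ORDER BY id)
     - ROW_NUMBER() OVER (PARTITION BY person, result ORDER BY id) AS GroupNumber
FROM T
ORDER BY id;
-- id  result  rn1  rn2  GroupNumber
-- 1   0       1    1    0
-- 2   1       2    1    1
-- 3   1       3    2    1
-- 6   1       4    3    1
-- 7   0       5    2    3
```

The three rows with result = 1 share GroupNumber 1 because no other row for person 1 falls between them, so grouping by person and GroupNumber yields a group of size 3.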
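By using CASE and SUM we can get the same numbers for the original sample data. Note that this counts all rows with result = 1 per person rather than the longest consecutive run; it matches the expected output only because each person in this sample has a single run of positive results.
DECLARE @T TABLE (id int, person int, result int);
INSERT INTO @T (id, person, result) VALUES
(1 , 1, 0),
(2 , 1, 1),
(3 , 1, 1),
(4 , 2, 1),
(5 , 2, 0),
(6 , 1, 1),
(7 , 1, 0),
(8 , 2, 0),
(9 , 2, 0),
(10, 2, 0)
select
person,
SUM(CASE WHEN RESULT = 1 then 1 else 0 END) AS total_count
from @T
GROUP BY person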

Update column with a dynamic sequence with Row_number()

I tried to update a column (Y) of a table (A) in MSSQL with an ascending sequence that resets itself when the value of another column (X) of the same table changes.
Table A at the beginning:
id X Y
1 1 1
2 1 1
3 2 1
4 2 1
5 2 1
6 3 1
As it should be after the script:
id X Y
1 1 1
2 1 2
3 2 1
4 2 2
5 2 3
6 3 1
I tried with row_number(), but in the loop it modifies all the rows.
With a counter and a variable to increment:
UPDATE dbo.A
SET "Y" = @MyInc
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY "Id" ASC) AS row_num_Id
, Id
, X
, Y
FROM dbo.A) AS sub
WHERE row_num_Id = @MyCounter;
This will give you the results you want
CREATE TABLE #T (
Id INT NOT NULL,
X INT NOT NULL,
Y INT NOT NULL
)
INSERT INTO #T(Id, X, Y)
VALUES
(1, 1, 1),
(2, 1, 1),
(3, 2, 1),
(4, 2, 1),
(5, 2, 1),
(6, 3, 1);
GO
WITH WithRowNumbers AS (
SELECT
Id,
X,
ROW_NUMBER() OVER (PARTITION BY X ORDER BY Id) As RowNumber
FROM #T
)
UPDATE T
SET Y = WRN.RowNumber
FROM WithRowNumbers AS WRN
INNER JOIN #T AS T ON T.Id = WRN.Id
SELECT * FROM #T
Or, as @CharlieFace mentions, you can simplify even more and update the CTE directly, as the CTE is like a view of the original table. For this to work, the CTE must also select the Y column.
UPDATE T
SET Y = T.RowNumber
FROM WithRowNumbers AS T;
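The simplified form relies on the CTE exposing the column being updated. A complete version might look like the sketch below, written for SQL Server; #Demo is a hypothetical temp table that mirrors table A from the question, so the snippet can be run on its own.

```sql
-- Sketch only: #Demo is a hypothetical temp table mirroring table A from the question.
CREATE TABLE #Demo (Id INT NOT NULL, X INT NOT NULL, Y INT NOT NULL);
INSERT INTO #Demo (Id, X, Y)
VALUES (1, 1, 1), (2, 1, 1), (3, 2, 1), (4, 2, 1), (5, 2, 1), (6, 3, 1);

-- The CTE must expose Y so that the UPDATE can write to it.
WITH WithRowNumbers AS (
    SELECT
        Y,
        ROW_NUMBER() OVER (PARTITION BY X ORDER BY Id) AS RowNumber
    FROM #Demo
)
UPDATE WithRowNumbers
SET Y = RowNumber;

SELECT Id, X, Y FROM #Demo ORDER BY Id;
-- Y is now 1, 2, 1, 2, 3, 1 for Id 1 through 6.
```

Updating through the CTE avoids the extra join back to the base table, at the cost of being specific to engines that allow updatable CTEs.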

Count over Partition by with one condition (/don't count the NULL values)

I want to count how many houses are within a building. Dataset like the following:
BuildingID, HouseID
1, 1
1, 2
1, 3
2, 4
2, 5
2, 6
NULL, 7
NULL, 8
With the following code it shows the total count of the houses, however, houses 7 and 8 don't have a building, so it shouldn't count anything.
SELECT BuildingID
, HouseID
, COUNT(HouseID) OVER (PARTITION BY BuildingID) AS 'Houses in Building'
FROM BUILDING
The result I get:
BuildingID, HouseID, Houses in Building
1, 1, 3
1, 2, 3
1, 3, 3
2, 4, 3
2, 5, 3
2, 6, 3
NULL, 7, 2
NULL, 8, 2
The result I want:
BuildingID, HouseID, Houses in Building
1, 1, 3
1, 2, 3
1, 3, 3
2, 4, 3
2, 5, 3
2, 6, 3
NULL, 7, NULL --or 0
NULL, 8, NULL --or 0
Any suggestions?
Just count the BuildingID. The COUNT function does not count nulls so it'll work:
COUNT(BuildingID) OVER (PARTITION BY BuildingID) AS 'Houses in Building'
Note that it assumes that HouseID is not null.
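The difference between counting rows and counting a nullable column can be checked in isolation. This is a small sketch using an inline VALUES list (the alias v and the three sample values are made up for illustration):

```sql
-- Sketch only: one non-null and two NULL BuildingID values.
SELECT
    COUNT(*) AS AllRows,                   -- counts every row
    COUNT(BuildingID) AS NonNullBuildingID -- skips rows where BuildingID is NULL
FROM (VALUES (1), (NULL), (NULL)) AS v (BuildingID);
-- AllRows = 3, NonNullBuildingID = 1
```

Within the NULL partition of the original query every BuildingID is NULL, which is why COUNT(BuildingID) returns 0 there.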
You could simply use a case expression to only show a count where the BuildingID is not null, or you could change your count to be COUNT(BuildingID) rather than COUNT(HouseID), since COUNT ignores NULL values and so gives 0 for the NULL partition. Both yield your required results:
DECLARE @Building TABLE (BuildingID INT, HouseID INT);
INSERT @Building (BuildingID, HouseID)
VALUES
(1, 1), (1, 2), (1, 3), (2, 4), (2, 5),
(2, 6), (NULL, 7), (NULL, 8);
SELECT BuildingID,
HouseID,
CountBuildingID = COUNT(BuildingID) OVER (PARTITION BY BuildingID),
CaseExpression = CASE WHEN BuildingID IS NOT NULL THEN COUNT(HouseID) OVER (PARTITION BY BuildingID) END
FROM @Building
ORDER BY HouseID;
OUTPUT
BuildingID HouseID CountBuildingID CaseExpression
-------------------------------------------------------
1 1 3 3
1 2 3 3
1 3 3 3
2 4 3 3
2 5 3 3
2 6 3 3
NULL 7 0 NULL
NULL 8 0 NULL
You can also try the following self-join option:
WITH your_table (BuildingID, HouseID)
AS
(
SELECT 1, 1 UNION ALL
SELECT 1, 2 UNION ALL
SELECT 1, 3 UNION ALL
SELECT 2, 4 UNION ALL
SELECT 2, 5 UNION ALL
SELECT 2, 6 UNION ALL
SELECT NULL, 7 UNION ALL
SELECT NULL, 8
)
SELECT A.BuildingID,A.HouseID,COUNT(A.BuildingID)Count
FROM your_table A
LEFT JOIN your_table B ON A.BuildingID = B.BuildingID
GROUP BY A.BuildingID,A.HouseID
Output:
BuildingID HouseID Count
1 1 3
1 2 3
1 3 3
2 4 3
2 5 3
2 6 3
NULL 7 0
NULL 8 0
You can use a CASE expression inside the COUNT function, like below:
COUNT(CASE WHEN BuildingID IS NOT NULL THEN HouseID END) OVER (PARTITION BY BuildingID) AS 'Houses in Building'

SQL aggregation on the latest output per machine for each time

I have the following table:
ID machine app output time
1 1 A 12 1
2 1 B 15 1
3 1 B 8 3
4 1 A 11 4
5 2 C 14 4
6 2 D 17 4
For each app, I want to get the latest output available at each point in time, and then aggregate these results per machine using AVG.
So for the table on top, the data before aggregation should be:
time machine app latest
1 1 A 12
1 1 B 15
3 1 A 12
3 1 B 8
4 1 A 11
4 1 B 8
4 2 C 14
4 2 D 17
And the aggregated result should be:
time machine avg
1 1 =(12+15)/2
3 1 =(12+8)/2
4 1 =(11+8)/2
4 2 =(14+17)/2
What is the correct way to approach this problem?
It is not as simple as I first thought, but I think this works just as you want. I renamed the time column to ts, like this:
CREATE TABLE Table1
(ID int, machine int, app char(1), output int, ts int)
;
INSERT INTO Table1
(ID,machine,app,output, ts)
VALUES
(1, 1, 'A', 12, 1),
(2, 1, 'B', 15, 1),
(3, 1, 'B', 8, 3),
(4, 1, 'A', 11, 4),
(5, 2, 'C', 14, 4),
(6, 2, 'D', 17, 4)
;
And here is the query:
WITH
times as
(
SELECT distinct ts FROM Table1
),
machine_apps as
(
SELECT DISTINCT machine,app FROM Table1
),
grid as
(
SELECT
ts,machine,app
FROM
times
CROSS JOIN machine_apps
),
last_outputs as
(
SELECT
g.ts,
g.app,
g.machine,
max(t.ts) as last_time
FROM
grid g
JOIN Table1 t ON (t.app = g.app AND t.machine = g.machine AND t.ts <= g.ts)
GROUP BY
g.ts,
g.app,
g.machine
)
SELECT
l.ts,
l.machine,
AVG(1.0 * t.output) as avg -- 1.0 * avoids integer averaging on engines such as SQL Server
FROM
last_outputs l
LEFT JOIN Table1 t ON (t.app = l.app AND t.machine = l.machine AND t.ts = l.last_time)
GROUP BY
l.ts,
l.machine
ORDER BY
l.ts,
l.machine
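To check the step before averaging, the "latest output per app at each time" rows from the question can be listed directly. The following is a sketch under the same Table1 definition; it swaps the CTEs for a correlated subquery, and it assumes there is at most one row per machine, app, and ts.

```sql
-- Sketch only: lists the latest output per (ts, machine, app) before aggregation.
SELECT g.ts, g.machine, g.app, t.output AS latest
FROM (
    -- every observed time paired with every observed machine/app
    SELECT DISTINCT a.ts, b.machine, b.app
    FROM Table1 a
    CROSS JOIN Table1 b
) AS g
JOIN Table1 t
  ON t.machine = g.machine
 AND t.app = g.app
 AND t.ts = (
        -- most recent reading for this machine/app at or before g.ts
        SELECT MAX(t2.ts)
        FROM Table1 t2
        WHERE t2.machine = g.machine
          AND t2.app = g.app
          AND t2.ts <= g.ts
     )
ORDER BY g.ts, g.machine, g.app;
```

Combinations with no reading yet (for example machine 2 at time 1) drop out because the subquery returns NULL, which matches the eight rows shown in the question's pre-aggregation table.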