Select distinct rows where all values are in group - sql

I have a table that stores a one-to-many relationship (caseid to code). One caseid can have many codes.
I would like to select all rows where all the codes for a given caseid are contained within a group of codes. If a caseid is associated with any code that is not in this group, then exclude it, regardless of whether all the other codes are in the group.
I would then like to build a table where each unique caseid has a single row and four Boolean columns (one for each code I am looking for) denoting if that code is present.
Here is my query so far:
select distinct(caseid), _43280, _43279, _43282, _43281 from
(select caseid,
0 < countif(code = "43280") as _43280,
0 < countif(code = "43279") as _43279,
0 < countif(code = "43282") as _43282,
0 < countif(code = "43281") as _43281
from mytable
inner join (
select caseid, logical_and(code in ('43280', '43279', '43282', '43281')) as include,
from mytable
group by caseid
having include
)
using(caseid)
group by caseid
order by caseid)
An example table may be:
caseid | code
1 43280
1 43279
1 43282
2 43280
2 43279
2 43282
2 99999
3 43280
3 43279
3 43282
It should come out as:
caseid | _43280 | _43279 | _43282 | _43281
1 TRUE TRUE TRUE FALSE
3 TRUE TRUE TRUE FALSE

You can use conditional aggregation as follows:
select caseid,
logical_or(code = '43280') code_43280,
logical_or(code = '43279') code_43279,
logical_or(code = '43282') code_43282,
logical_or(code = '43281') code_43281
from mytable
group by caseid
having not logical_or(code not in ('43280', '43279', '43282', '43281'))
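As a rough way to try the conditional-aggregation idea outside BigQuery, here is a minimal sketch using Python's built-in sqlite3 module and the sample data from the question. SQLite has no logical_or, so MAX over a 0/1 comparison stands in for it; that substitution is an assumption of this sketch, not part of the answer above.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (caseid INTEGER, code TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "43280"), (1, "43279"), (1, "43282"),
     (2, "43280"), (2, "43279"), (2, "43282"), (2, "99999"),
     (3, "43280"), (3, "43279"), (3, "43282")],
)

# MAX(<comparison>) is 1 if any row in the group satisfies the comparison,
# which mirrors logical_or. The HAVING clause drops any caseid that has at
# least one code outside the allowed group.
query = """
SELECT caseid,
       MAX(code = '43280') AS code_43280,
       MAX(code = '43279') AS code_43279,
       MAX(code = '43282') AS code_43282,
       MAX(code = '43281') AS code_43281
FROM mytable
GROUP BY caseid
HAVING MAX(code NOT IN ('43280', '43279', '43282', '43281')) = 0
ORDER BY caseid
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Here caseid 2 is dropped because of code 99999, and the remaining rows carry 1/0 flags instead of TRUE/FALSE.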

Below is for BigQuery Standard SQL and using BQ Scripting
#standardsql
create temp table data as
select caseid, array_agg(code) as codes,
from `project.dataset.table` t
left join unnest(['43280', '43279', '43282', '43281']) test_code
on code = test_code
group by caseid
having countif(test_code is null) = 0;
execute immediate (
select """
select caseid, """ ||
string_agg("""max(if(code = '""" || code || """', true, false)) as _""" || replace(code, '.', '_'), ', ')
|| """
from data, unnest(codes) code
group by caseid
"""
from unnest(['43280', '43279', '43282', '43281']) code
);
Applied to the sample data from your question, the output is:
[Image: query output for the sample data]

How to return a value with COALESCE when a SELECT statement returns nothing?

Hello Stack Overflow community!
I have this code:
(SELECT COALESCE(id, 0) FROM members WHERE user_id = 1225282512438558720)
UNION ALL
(SELECT COALESCE(id, 0) FROM channels WHERE channel_id = 720694686028791971)
UNION ALL
(SELECT COALESCE(id, 0) FROM guilds WHERE guild_id = 831900150115991605);
Each statement may or may not return a value (because there may be no row matching the WHERE clause).
My problem is that if, for example, the first statement returns nothing, then Postgres returns this:
coalesce
----------
6
1
but I want Postgres to return this:
coalesce
----------
NULL
6
1
How can I do that?
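This query returns nothing because no rows satisfy the filter criteria, and COALESCE is only evaluated for rows that exist. To get a row from an empty result set you need an aggregate: an aggregate function without GROUP BY always returns exactly one row, and max() over zero rows yields NULL. So you need: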
(SELECT max(id) FROM members WHERE user_id = 1225282512438558720)
UNION ALL
(SELECT max(id) FROM channels WHERE channel_id = 720694686028791971)
UNION ALL
(SELECT max(id) FROM guilds WHERE guild_id = 831900150115991605);
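To see the difference in behaviour, here is a small sketch using Python's sqlite3 module. The table contents and the short ID values (10, 20, 30) are made up for illustration, and SQLite is used only because it is easy to run; the same aggregate rule applies in Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members  (id INTEGER, user_id INTEGER);
CREATE TABLE channels (id INTEGER, channel_id INTEGER);
CREATE TABLE guilds   (id INTEGER, guild_id INTEGER);
-- members is deliberately left empty so its WHERE clause matches nothing.
INSERT INTO channels VALUES (6, 20);
INSERT INTO guilds   VALUES (1, 30);
""")

# Without an aggregate, the empty branch simply contributes no row.
plain = conn.execute("""
SELECT id FROM members  WHERE user_id = 10
UNION ALL
SELECT id FROM channels WHERE channel_id = 20
UNION ALL
SELECT id FROM guilds   WHERE guild_id = 30
""").fetchall()

# With max(), each branch returns exactly one row, NULL when nothing matched.
aggregated = conn.execute("""
SELECT max(id) FROM members  WHERE user_id = 10
UNION ALL
SELECT max(id) FROM channels WHERE channel_id = 20
UNION ALL
SELECT max(id) FROM guilds   WHERE guild_id = 30
""").fetchall()

print(plain)
print(aggregated)
```

The first result has two rows, while the second has three, with None (NULL) standing in for the missing member.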
For multiple columns or results, you can left-join a single-row table to each of your inputs:
WITH dual AS (select 1)
(SELECT COALESCE(id, 0) FROM dual LEFT JOIN members ON user_id = 1225282512438558720)
UNION ALL
(SELECT COALESCE(id, 0) FROM dual LEFT JOIN channels ON channel_id = 720694686028791971)
UNION ALL
(SELECT COALESCE(id, 0) FROM dual LEFT JOIN guilds ON guild_id = 831900150115991605);
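A minimal sketch of the single-row left join, again with Python's sqlite3 module. The name column and the sample values are hypothetical, added only to show that several columns come back as NULL together when nothing matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members  (id INTEGER, name TEXT, user_id INTEGER);
CREATE TABLE channels (id INTEGER, name TEXT, channel_id INTEGER);
-- members stays empty; channels has one matching row.
INSERT INTO channels VALUES (6, 'general', 20);
""")

# dual always has exactly one row, so each LEFT JOIN yields at least one
# output row; unmatched branches produce NULL in every selected column.
rows = conn.execute("""
WITH dual(x) AS (SELECT 1)
SELECT m.id, m.name FROM dual LEFT JOIN members  m ON m.user_id = 10
UNION ALL
SELECT c.id, c.name FROM dual LEFT JOIN channels c ON c.channel_id = 20
""").fetchall()

print(rows)
```

Unlike the max() approach, a branch that matches several rows returns all of them here.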

Update column with count of row value in another column in SQL Server

I have no idea why I haven't found a solution to this on here but: I am trying to COUNT how many times the Serial_Parent appears during the packing phase so I can use this in other calculations.
This works perfectly:
SELECT
Serial_Parent, COUNT(Serial_Parent)
FROM
mfng_data
WHERE
Machine_Process = 'Packing'
GROUP BY
Serial_Parent;
However, when I try to UPDATE a column with this COUNT, I fail miserably: it counts all the rows in the table and saves that total in every row, so 2,134,222 appears in each row.
I've tried this:
UPDATE mfng_data
SET Count_Serial_Parent = (SELECT COUNT(*)
FROM mfng_data
WHERE Machine_Process = 'Packing'
AND Serial_Parent = Serial_Parent)
WHERE Serial_Parent = Serial_Parent;
I've also tried this:
UPDATE mfng_data
SET Count_Serial_Parent = (SELECT COUNT(Serial_Parent)
FROM mfng_data
WHERE Machine_Process = 'Packing'
AND Serial_Parent = Serial_Parent);
Sample data:
Spec: 12373 Rev: -6 M35846 M358461 M3584610 M35846101 NULL NULL NULL M35846101 6808
Spec: 12373 Rev: -6 M35846 M358461 M3584610 M35846102 NULL NULL NULL M35846102 6808
Spec: 16692 Rev: -4 K45678 K456781 K4567810 K45678101 NULL NULL NULL K45678101 3964
Spec: 16692 Rev: -4 K45678 K456782 K4567820 K45678201 NULL NULL NULL K45678201 3978
Spec: 16693 Rev: -4 K45678 K456782 K4567820 K45678202 NULL NULL NULL K45678202 3806
Desired result (M35846 will appear twice so it will list "2" for each row entry)
Serial_Parent Count_Serial_Parent
----------------------------------
M35846 2
M35846 2
J39384 52 --> 52 rows will show "52" and so on below
M35488 10
K4448 4
M35927 8
K45678 3
You need table aliases and qualified column names:
UPDATE m
SET Count_Serial_Parent = (SELECT COUNT(*)
FROM mfng_data m2
WHERE m2.Machine_Process = 'Packing' AND
m2.Serial_Parent = m.Serial_Parent
)
FROM mfng_data m;
The subquery needs to be correlated to the outer query. A condition such as Serial_Parent = Serial_Parent basically always evaluates to true because it is referring to the column of the table referenced in the subquery.
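The difference between the uncorrelated and the correlated subquery can be reproduced with a small sketch in Python's sqlite3 module. The four sample rows and the 'Molding' process name are invented for illustration, and SQLite's UPDATE syntax differs slightly from SQL Server's (no UPDATE ... FROM alias is used here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mfng_data ("
    "Serial_Parent TEXT, Machine_Process TEXT, Count_Serial_Parent INTEGER)"
)
conn.executemany(
    "INSERT INTO mfng_data (Serial_Parent, Machine_Process) VALUES (?, ?)",
    [("M35846", "Packing"), ("M35846", "Packing"),
     ("K45678", "Packing"), ("K45678", "Molding")],
)

# Uncorrelated: both sides of Serial_Parent = Serial_Parent resolve to the
# inner table, so every row receives the total number of Packing rows.
conn.execute("""
UPDATE mfng_data
SET Count_Serial_Parent = (SELECT COUNT(*)
                           FROM mfng_data
                           WHERE Machine_Process = 'Packing'
                             AND Serial_Parent = Serial_Parent)
""")
uncorrelated = conn.execute(
    "SELECT DISTINCT Count_Serial_Parent FROM mfng_data"
).fetchall()

# Correlated: the alias m2 makes the inner reference explicit, and the outer
# row is addressed through the qualified name mfng_data.Serial_Parent.
conn.execute("""
UPDATE mfng_data
SET Count_Serial_Parent = (SELECT COUNT(*)
                           FROM mfng_data AS m2
                           WHERE m2.Machine_Process = 'Packing'
                             AND m2.Serial_Parent = mfng_data.Serial_Parent)
""")
correlated = conn.execute("""
SELECT DISTINCT Serial_Parent, Count_Serial_Parent
FROM mfng_data
ORDER BY Serial_Parent
""").fetchall()

print(uncorrelated)
print(correlated)
```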
However, the better approach is an updatable CTE:
with toupdate as (
select m.*,
sum(case when m.Machine_Process = 'Packing' then 1 else 0 end) over (partition by m.serial_parent) as new_Count_Serial_Parent
from mfng_data m
)
update toupdate
set Count_Serial_Parent = new_Count_Serial_Parent;
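The windowed sum inside the CTE can be inspected on its own. The sketch below runs only the SELECT part in SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later); updating through a CTE is a SQL Server feature and is not reproduced here. The sample rows and the 'Molding' process name are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mfng_data (Serial_Parent TEXT, Machine_Process TEXT)")
conn.executemany(
    "INSERT INTO mfng_data VALUES (?, ?)",
    [("M35846", "Packing"), ("M35846", "Packing"),
     ("K45678", "Packing"), ("K45678", "Molding")],
)

# Every row receives the number of Packing rows that share its Serial_Parent,
# including rows whose own process is not Packing.
rows = conn.execute("""
SELECT Serial_Parent,
       Machine_Process,
       SUM(CASE WHEN Machine_Process = 'Packing' THEN 1 ELSE 0 END)
           OVER (PARTITION BY Serial_Parent) AS new_Count_Serial_Parent
FROM mfng_data
ORDER BY Serial_Parent, Machine_Process
""").fetchall()

for row in rows:
    print(row)
```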
Another way (using a toy table -- hope you get the idea):
DECLARE @t TABLE (Serial_Parent int, Count_Serial_Parent int)
INSERT INTO @t (Serial_Parent, Count_Serial_Parent) VALUES
(1,0), (1,0),(2,0),(2,0),(2,0)
SELECT * FROM @t
UPDATE m -- mfng_data
SET m.Count_Serial_Parent = j.Count_Serial_Parent
FROM @t m -- mfng_data
JOIN (
SELECT Serial_Parent, COUNT(*) Count_Serial_Parent
FROM @t
GROUP BY Serial_Parent
) j ON j.Serial_Parent = m.Serial_Parent
SELECT * FROM @t

SQL count DISTINCT ONCE user_id multiple attributes

Hello there, I can't manage to get a good result for the following case:
I have a table which is like this:
UserID | Label
-------- ------
1 | Private
1 | Public
2 | Private
3 | Hidden
4 | Public
5 | Hidden
I want the following classification, depending on the labels assigned to a user:
Private and Hidden are treated the same: let's say Business
Public: BtoC
Public and Private and/or Hidden: both
So in the end I have a count(DISTINCT UserID) of
Business 3
BtoC 1
both 1
I have tried to use CASE WHEN, but it doesn't work. My current full query looks like this:
SELECT gen_month,
count(DISTINCT cu.id) as leads,
a.label
FROM generate_series(DATE_TRUNC('month', CURRENT_DATE::date - 96*INTERVAL '1 month'), CURRENT_DATE::date, '1 month') m(gen_month)
LEFT OUTER JOIN company_user AS cu
ON (date_trunc('month', cu.creation_date) = date_trunc('month', gen_month))
LEFT JOIN user u
ON u.user_id = cu.id
LEFT join user_account_status as uas
on cu.id = uas.user_id
LEFT JOIN account as a
on uas.account_id = a.id
where gen_month >= DATE_TRUNC('month',NOW() - INTERVAL '5 months')
group by m.gen_month, a.label
order by gen_month
So my main problem now is that a user is counted once under every label they have.
How can I make a user_id count only once, under the condition CASE WHEN user_id appears with Public and (Private or Hidden) THEN count(DISTINCT user_id) as Both?
Addition: it is for MySQL/MariaDB and PostgreSQL, but I would be happy with Postgres first.
This is not integrated into your full query, but to count users per category you can use:
with the_table(UserID , Label) as(
select 1 ,'Private' union all
select 1 ,'Public' union all
select 2 ,'Private' union all
select 3 ,'Hidden' union all
select 4 ,'Public' union all
select 5 ,'Hidden'
)
select result, count(*) from (
select UserID, case when min(Label) = 'Public' then 'BtoC' when max(Label) in('Private','Hidden') then 'Business' else 'both' end as result
from the_table
group by UserID
) t
group by result
with
my_table(user_id, label) as (values
(1,'Private'),
(1,'Public'),
(2,'Private'),
(3,'Hidden'),
(4,'Public'),
(5,'Hidden')),
t as (
select
user_id,
string_agg('{'||label||'}', '') as labels
from my_table
group by user_id),
tt as (
select
user_id,
labels,
case
when
position('{Public}' in labels) > 0 and (position('{Private}' in labels) > 0 or position('{Hidden}' in labels) > 0) then 'Both'
when
position('{Private}' in labels) > 0 or position('{Hidden}' in labels) > 0 then 'Business'
when
position('{Public}' in labels) > 0 then 'BtoC'
end as kind
from t)
select kind, count(*) from tt group by kind;
For MariaDB use GROUP_CONCAT() instead of PostgreSQL string_agg().
Note that the CASE expression checks its conditions in order of appearance and returns the value for the first satisfied condition.
PS: Using PostgreSQL's arrays the conditions would be more elegant.
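As an alternative to string matching, the same classification can be sketched with per-user flags computed by conditional aggregation. This is a different technique from the two answers above, shown here in SQLite through Python's sqlite3 module; in PostgreSQL, bool_or(label = 'Public') would play the role of MAX(label = 'Public').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (user_id INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "Private"), (1, "Public"), (2, "Private"),
     (3, "Hidden"), (4, "Public"), (5, "Hidden")],
)

# The inner query reduces each user to one category; the CASE branches are
# tested in order, so 'both' must come before 'Business'.
query = """
SELECT kind, COUNT(*) AS users
FROM (
    SELECT user_id,
           CASE
               WHEN MAX(label = 'Public') = 1
                    AND MAX(label IN ('Private', 'Hidden')) = 1 THEN 'both'
               WHEN MAX(label IN ('Private', 'Hidden')) = 1 THEN 'Business'
               ELSE 'BtoC'
           END AS kind
    FROM my_table
    GROUP BY user_id
) per_user
GROUP BY kind
"""
counts = dict(conn.execute(query).fetchall())
print(counts)
```

Because each user is reduced to a single row before counting, no user can be counted under more than one category.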

3 Statements into a view

I have 3 SQL statements that I would like to combine into a view that returns 3 columns, each representing a count.
Here are my statements
SELECT Count(*)
FROM PlaceEvents
WHERE PlaceID = {placeID} AND EndDateTimeUTC >= GETUTCDATE()
SELECT Count(*)
FROM PlaceAnnouncements
WHERE PlaceID = {placeID}
SELECT Count(*)
FROM PlaceFeedback
WHERE PlaceID = {placeID} AND IsSystem = 0
I know how to create a basic view, but how do I create one that exposes those 3 counts plus PlaceID as a column to use for filtering?
I would like to run the following to return the proper data:
SELECT *
FROM vMyCountView
WHERE PlaceID = 1
CREATE VIEW vMyCountView AS
(...) AS ActiveEvents,
(...) AS Announcements,
(...) AS UserFeedback,
PlaceID
I'd rather use a function than a view.
This allows you to pass in any parameters you like (I assumed PlaceID is an INT) and deal with them within your query. It is just as easy to use as a view:
CREATE FUNCTION MyCountFunction(@PlaceID INT)
RETURNS TABLE
AS
RETURN
SELECT
(SELECT Count(*) FROM PlaceEvents WHERE PlaceID = @PlaceID AND EndDateTimeUTC >= GETUTCDATE()) AS ActiveEvents
,(SELECT Count(*) FROM PlaceAnnouncements WHERE PlaceID = @PlaceID) AS Announcements
,(SELECT Count(*) FROM PlaceFeedback WHERE PlaceID = @PlaceID AND IsSystem = 0) AS UserFeedback
,@PlaceID AS PlaceID;
GO
And this is how you call it. You can use this for JOINs or with APPLY also...
SELECT * FROM dbo.MyCountFunction(3);
You can combine them as multiple select sub-queries.
CREATE VIEW vMyCountView AS
SELECT
(SELECT Count(*) FROM PlaceEvents
WHERE PlaceID = s.placeID AND EndDateTimeUTC >= GETUTCDATE()) AS ActiveEvents,
(SELECT Count(*) FROM PlaceAnnouncements
WHERE PlaceID = s.placeID) AS Announcements,
(SELECT Count(*) FROM PlaceFeedback
WHERE PlaceID = s.placeID AND IsSystem = 0) AS UserFeedback,
placeID
from Sometable s
By definition, a view is a single SELECT statement. You can use joins, unions and so on if that makes sense for your business logic, provided CREATE VIEW is the only statement in the batch.
You can make a view like that with GROUP BY:
SELECT
PlaceId
, Count(peId) AS ActiveEvents
, COUNT(paId) AS Announcements
, COUNT(fbId) AS UserFeedback
FROM (
SELECT PlaceId, 1 AS peId, NULL AS paId, NULL AS fbId
FROM PlaceEvents
WHERE EndDateTimeUTC >= GETUTCDATE()
UNION ALL
SELECT PlaceId, NULL AS peId, 1 AS paId, NULL AS fbId
FROM PlaceAnnouncements
UNION ALL
SELECT PlaceId, NULL AS peId, NULL AS paId, 1 AS fbId
FROM PlaceFeedback
WHERE IsSystem = 0
) src
GROUP BY PlaceId
The idea behind this select, which is very easy to make into a view, is to select items from three tables into one for counting, and then group them all at once.
If you have two active events, one announcement, and three feedbacks for place ID 123, the three inner selects would produce this:
PlaceId peId paId fbId
------- ---- ---- ----
123 1 NULL NULL
123 1 NULL NULL
123 NULL 1 NULL
123 NULL NULL 1
123 NULL NULL 1
123 NULL NULL 1
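To see what the outer GROUP BY makes of those six rows, here is a sketch in SQLite via Python's sqlite3 module. The timestamps, the extra expired event, and the extra system feedback row are invented test data, and datetime('now') stands in for SQL Server's GETUTCDATE().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PlaceEvents        (PlaceId INTEGER, EndDateTimeUTC TEXT);
CREATE TABLE PlaceAnnouncements (PlaceId INTEGER);
CREATE TABLE PlaceFeedback      (PlaceId INTEGER, IsSystem INTEGER);

-- Two active events plus one expired event that the filter must skip.
INSERT INTO PlaceEvents VALUES (123, '2999-01-01 00:00:00'),
                               (123, '2999-06-01 00:00:00'),
                               (123, '2000-01-01 00:00:00');
INSERT INTO PlaceAnnouncements VALUES (123);
-- Three user feedback rows plus one system row that the filter must skip.
INSERT INTO PlaceFeedback VALUES (123, 0), (123, 0), (123, 0), (123, 1);
""")

# COUNT(column) ignores NULLs, so each counter only sees the rows that came
# from its own branch of the UNION ALL.
rows = conn.execute("""
SELECT PlaceId,
       COUNT(peId) AS ActiveEvents,
       COUNT(paId) AS Announcements,
       COUNT(fbId) AS UserFeedback
FROM (
    SELECT PlaceId, 1 AS peId, NULL AS paId, NULL AS fbId
    FROM PlaceEvents
    WHERE EndDateTimeUTC >= datetime('now')
    UNION ALL
    SELECT PlaceId, NULL, 1, NULL FROM PlaceAnnouncements
    UNION ALL
    SELECT PlaceId, NULL, NULL, 1 FROM PlaceFeedback WHERE IsSystem = 0
) src
GROUP BY PlaceId
""").fetchall()

print(rows)
```

One caveat of this shape: a place with no rows in any of the three tables does not appear in the result at all, rather than appearing with zero counts.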

Find matching sets in a database table

I have a junction table in a (SQL Server 2014) database with columns FirstID and SecondID. Given a specific FirstID, I'd like to find all other FirstIDs from the table that have an equivalent set of SecondIDs (even if that set is empty).
Sample Data:
FirstId SecondId
1 1
1 2
2 3
3 1
3 2
... ...
In the case of the sample data, if I specified FirstID = 1, then I'd expect 3 to appear in the result set.
I've tried the following so far, which works pretty well except for empty sets:
SELECT FirstSecondEqualSet.FirstId
FROM FirstSecond FirstSecondOriginal
INNER JOIN FirstSecond FirstSecondEqualSet ON FirstSecondOriginal.SecondId = FirstSecondEqualSet.SecondId
WHERE FirstSecondOriginal.FirstId = @FirstId
AND FirstSecondEqualSet.FirstId != @FirstId
GROUP BY FirstSecondEqualSet.FirstId
HAVING COUNT(1) = (SELECT COUNT(1) FROM FirstSecond WHERE FirstSecond.FirstId = @FirstId)
I think it's somehow related to Relational Division with no Remainder (RDNR). See this great article by Dwain Camps for reference.
DECLARE @firstId INT = 1
SELECT
f2.FirstId
FROM FirstSecond f1
INNER JOIN FirstSecond f2
ON f2.SecondId = f1.SecondId
AND f1.FirstId <> f2.FirstId
WHERE
f1.FirstId = @firstId
GROUP BY f2.FirstId
HAVING COUNT(*) = (SELECT COUNT(*) FROM FirstSecond WHERE FirstId = @firstId)
Here is one approach. It counts the number of values for each firstid and then joins on the secondid.
select fs2.firstid
from (select fs1.*, count(*) over (partition by firstid) as numseconds
from firstsecond fs1
where fs1.firstid = @firstid
) fs1 join
(select fs2.*, count(*) over (partition by firstid) as numseconds
from firstsecond fs2
) fs2
on fs1.secondid = fs2.secondid and fs1.numseconds = fs2.numseconds
group by fs2.firstid
having count(*) = max(fs1.numseconds);
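The set-equality idea (every SecondId matches and both sets have the same size) can be tried out with the sketch below, which uses SQLite through Python's sqlite3 module. It is a variant written with plain aggregates rather than window functions. FirstId 4 is an extra made-up row set that is a strict superset of the target. The sketch assumes there are no duplicate (FirstId, SecondId) pairs, and it does not cover the empty-set case, since a FirstId without rows never appears in the junction table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FirstSecond (FirstId INTEGER, SecondId INTEGER)")
conn.executemany(
    "INSERT INTO FirstSecond VALUES (?, ?)",
    [(1, 1), (1, 2),
     (2, 3),
     (3, 1), (3, 2),
     (4, 1), (4, 2), (4, 3)],  # superset of FirstId 1, must not match
)

# c = candidate rows, t = rows of the target FirstId.
# COUNT(*) is the candidate's set size; COUNT(t.SecondId) is how many of its
# members also belong to the target. Equal sets need both counts to equal
# the target's set size.
query = """
SELECT c.FirstId
FROM FirstSecond c
LEFT JOIN FirstSecond t
       ON t.SecondId = c.SecondId AND t.FirstId = :first_id
WHERE c.FirstId <> :first_id
GROUP BY c.FirstId
HAVING COUNT(*) = COUNT(t.SecondId)
   AND COUNT(*) = (SELECT COUNT(*) FROM FirstSecond WHERE FirstId = :first_id)
ORDER BY c.FirstId
"""
matches = conn.execute(query, {"first_id": 1}).fetchall()
print(matches)
```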