Count Joins from Multiple Tables - sql

For reference, I am using Postgres 9.2.23.
I have several tables where one table (user_group) is related to some other tables (e.g. posts, group_invites, and a few others). There is also a groups table, but it doesn't hold any data that I need for the purposes of these queries.
Table user_group:
fk_user_group_id, fk_user_id, fk_group_id, fk_invite_id, user_status, ...
Table message:
pk_message_id, fk_user_id, fk_group_id, child_message_id, ...
Table group_prospective_user:
pk_prospective_user_id, fk_group_id, ...
I want to get some statistics for each of the related tables for a list of specified group ids if the user is a member of the group.
Right now I do this with one query for each related table, eg:
select
"public"."user_group"."fk_group_id" as "groupId",
count(case
when (
"public"."message"."child_message_id" is null
and "public"."message"."pk_message_id" is not null
) then "public"."message"."pk_message_id"
end) as "numDiscussions",
count("public"."message"."pk_message_id") as "numDiscussionPosts"
from "public"."user_group"
left outer join "public"."message"
on "public"."message"."fk_group_id" = "public"."user_group"."fk_group_id"
where (
"public"."user_group"."fk_group_id" in (
1, 11, 23, 530, 1070
)
and "public"."user_group"."role" in (
'ADMINISTRATOR', 'MODERATOR', 'MEMBER'
)
and "public"."user_group"."fk_user_id" = 17517
)
group by "public"."user_group"."fk_group_id"
And for invites:
select
"public"."user_group"."fk_group_id" as "groupId",
count(case
when "public"."prospective_user"."status" = 1 then "public"."prospective_user"."pk_prospective_user_id"
end) as "numInviteesExternal"
from "public"."user_group"
left outer join "public"."prospective_user"
on "public"."prospective_user"."fk_group_id" = "public"."user_group"."fk_group_id"
where (
"public"."user_group"."fk_group_id" in (
1, 11, 23, 530, 6176
)
and "public"."user_group"."role" in (
'ADMINISTRATOR', 'MODERATOR', 'MEMBER'
)
and "public"."user_group"."fk_user_id" = 17517
)
group by "public"."user_group"."fk_group_id"
The query to count the number of group invites is very similar to the above query. Only the count(case when ...) expression and the join condition change.
Each of the queries to these tables shares the same logic for checking the groups of which the current user is an active member. Is there an efficient way to merge multiple similar queries like this into a single query?
I tried using multiple LEFT JOINs with select count distinct, but that ran into performance issues on groups with both lots of messages, and lots of invites. Is there a way to easily/efficiently do this with, say, a subquery?

The answer from user @Parfait was the most scalable solution I could find. I based my queries on this tutorial: https://www.sqlteam.com/articles/using-derived-tables-to-calculate-aggregate-values.
While this isn't perfect, and results in a bunch of subqueries running, it does get me all the data at once, and with a single trip to the DB.
It ended up like this:
select
"groups"."groupId",
coalesce(
"members"."member_count",
0
) as "numActiveMembers",
coalesce(
"members"."invitee_count",
0
) as "numInviteesInternal",
coalesce(
"discussions"."discussions_count",
0
) as "numDiscussions",
coalesce(
"discussions"."posts_count",
0
) as "numDiscussionPosts"
from (
select "public"."user_group"."fk_group_id" as "groupId"
from "public"."user_group"
where (
"public"."user_group"."fk_group_id" in (
1, 2, 3, 4, 5
)
and "public"."user_group"."role" = 'ADMINISTRATOR'
and "public"."user_group"."fk_user_id" = 123
)
group by "public"."user_group"."fk_group_id"
) as "groups"
left outer join (
select
"public"."user_group"."fk_group_id" as "members_group_id",
count(distinct case
when "public"."user_group"."role" in (
'ADMINISTRATOR', 'MODERATOR', 'MEMBER'
) then "public"."user_group"."pk_user_group_id"
end) as "member_count",
count(distinct case
when "public"."user_group"."role" = 'INVITEE' then "public"."user_group"."pk_user_group_id"
end) as "invitee_count"
from "public"."user_group"
group by "public"."user_group"."fk_group_id"
) as "members"
on "members_group_id" = "groupId"
left outer join (
select
"public"."message"."fk_group_id" as "discussions_group_id",
count(case
when (
"public"."message"."child_message_id" is null
and "public"."message"."pk_message_id" is not null
) then "public"."message"."pk_message_id"
end) as "discussions_count",
count("public"."message"."pk_message_id") as "posts_count"
from "public"."message"
group by "public"."message"."fk_group_id"
) as "discussions"
on "discussions_group_id" = "groupId"
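For anyone who wants to try the derived-table idea without a Postgres instance, here is a minimal sketch using Python's built-in sqlite3 module. The schema is a cut-down, hypothetical version of the tables above and the rows are made up; it only illustrates that each related table is aggregated once per group before being joined, so the joins cannot multiply rows.

```python
import sqlite3

# Hypothetical, simplified schema and made-up rows (not the original Postgres data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_group (pk_user_group_id INTEGER PRIMARY KEY,
                         fk_user_id INT, fk_group_id INT, role TEXT);
CREATE TABLE message (pk_message_id INTEGER PRIMARY KEY,
                      fk_group_id INT, child_message_id INT);
INSERT INTO user_group (fk_user_id, fk_group_id, role) VALUES
  (123, 1, 'ADMINISTRATOR'), (7, 1, 'MEMBER'), (8, 1, 'INVITEE'),
  (123, 2, 'ADMINISTRATOR');
INSERT INTO message (fk_group_id, child_message_id) VALUES
  (1, NULL), (1, 1), (1, 1);
""")

query = """
SELECT g.group_id,
       COALESCE(m.member_count, 0),
       COALESCE(m.invitee_count, 0),
       COALESCE(d.discussions_count, 0),
       COALESCE(d.posts_count, 0)
FROM (SELECT fk_group_id AS group_id            -- groups the user administers
      FROM user_group
      WHERE fk_group_id IN (1, 2)
        AND role = 'ADMINISTRATOR'
        AND fk_user_id = 123
      GROUP BY fk_group_id) AS g
LEFT JOIN (SELECT fk_group_id,                  -- one row per group
                  COUNT(CASE WHEN role IN ('ADMINISTRATOR', 'MODERATOR', 'MEMBER')
                             THEN 1 END) AS member_count,
                  COUNT(CASE WHEN role = 'INVITEE' THEN 1 END) AS invitee_count
           FROM user_group
           GROUP BY fk_group_id) AS m ON m.fk_group_id = g.group_id
LEFT JOIN (SELECT fk_group_id,                  -- one row per group
                  COUNT(CASE WHEN child_message_id IS NULL THEN 1 END) AS discussions_count,
                  COUNT(*) AS posts_count
           FROM message
           GROUP BY fk_group_id) AS d ON d.fk_group_id = g.group_id
ORDER BY g.group_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Group 2 has no messages, so its discussion counts come back as 0 through coalesce rather than NULL.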

Related

Maximum number of statements within CTE

WITH abidAccount AS
(
SELECT
[ID], AzureBlobInsertDate = MAX(AzureBlobInsertDate)
FROM
[dba].[CurAccounts]
GROUP BY
ID
),
recentAccount AS
(
SELECT ca.*
FROM [dba].[CurAccounts] ca
JOIN abidAccount aa ON aa.[id] = ca.[ID]
AND aa.azureblobInsertDate = ca.azureBlobInsertDate
),
abidDevice AS
(
SELECT deviceID, azureBlobInsertDate = MAX(azureBlobInsertDate)
FROM [dba].[CurrentDevices]
GROUP BY DeviceID
),
recentDevice AS
(
SELECT cd.*
FROM [RZRExploreLayer3].[CurrentDevices] cd
JOIN abidDevice ad ON ad.DeviceID = cd.DeviceID
AND ad.azureBlobInsertDate = cd.azureblobinsertdate
)
SELECT
rd.deviceId,
rd.[DeviceReturned],
rd.accountNumber,
rd.[DeviceFormat],
rd.[DeviceLabel],
MAX(ra.azureBlobInsertDate) AS AzureBlobInsertDate
FROM
recentAccount ra
JOIN
recentDevice rd ON ra.[id] = rd.accountNumber
WHERE
rd.deviceReturned NOT LIKE 'Null'
GROUP BY
rd.deviceId, rd.[DeviceReturned], rd.accountNumber,
rd.[DeviceFormat], rd.[DeviceLabel]
/* deviceID, rd.[DeviceReturned], rd.accountNumber, ra.azureBlobInsertDate */
HAVING
COUNT(1) > 1
How do I combine multiple CTEs into one query?
My query is attempting to determine whether there are duplicate records and, if so, keep only the record with the max(AzureBlobInsertDate) and remove the other duplicates, then combine all the results from the CurAccounts and Devices tables.
Any assistance you can offer is greatly appreciated.
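As a rough illustration of the pattern in the query above (several CTEs listed after a single WITH, separated by commas, each keeping only the latest row per key), here is a small sketch that runs on SQLite through Python's sqlite3 module rather than on SQL Server. The table names, column names and rows are simplified and hypothetical.

```python
import sqlite3

# Hypothetical, simplified tables with made-up rows (SQLite, not SQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INT, insert_date TEXT);
CREATE TABLE devices (device_id INT, account_number INT, insert_date TEXT);
INSERT INTO accounts VALUES (1, '2020-01-01'), (1, '2020-02-01'), (2, '2020-01-15');
INSERT INTO devices VALUES (10, 1, '2020-01-05'), (10, 1, '2020-03-01'),
                           (20, 2, '2020-01-20');
""")

query = """
WITH latest_account AS (          -- newest insert date per account
    SELECT id, MAX(insert_date) AS insert_date
    FROM accounts
    GROUP BY id
),
recent_account AS (               -- full account row matching that date
    SELECT a.*
    FROM accounts a
    JOIN latest_account la ON la.id = a.id AND la.insert_date = a.insert_date
),
latest_device AS (                -- newest insert date per device
    SELECT device_id, MAX(insert_date) AS insert_date
    FROM devices
    GROUP BY device_id
),
recent_device AS (                -- full device row matching that date
    SELECT d.*
    FROM devices d
    JOIN latest_device ld ON ld.device_id = d.device_id
                         AND ld.insert_date = d.insert_date
)
SELECT rd.device_id, rd.account_number, rd.insert_date, ra.insert_date
FROM recent_account ra
JOIN recent_device rd ON rd.account_number = ra.id
ORDER BY rd.device_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Each device and each account appears once in the output, carrying only its most recent insert date.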

error incorporating a select within a IFNULL in MariaDB

I'm creating a view in MariaDB and I'm having trouble making it work for a couple of fields. Currently this is working:
( SELECT DISTINCT IFNULL(grades.`grade`,'No Grade')
FROM `table` grades
WHERE userinfo.`id` = grades.`id`
AND grades.`Item Name` = 'SOMEINFO'
) 'SOMENAME',
But I need to add a select where the 'No Grade' is, in the following form:
( SELECT DISTINCT IFNULL( grades.`grade`,
SELECT IF( EXISTS
( SELECT *
FROM `another_table`
WHERE userid = 365
AND courseid = 2
), 'Enrolled', 'Not enrolled'
)
)
FROM `table` grades
WHERE userinfo.`id` = grades.`id`
AND grades.`Item Name` = 'SOMEINFO'
) 'SOMENAME',
I know that
SELECT IF( EXISTS( SELECT *
FROM `another_table`
WHERE userid = 365
AND courseid = 2
),
'Enrolled', 'Not enrolled'
)
is working too, but now the whole thing is giving me an error, so any suggestions would be greatly appreciated.
Thanks
This looks like a subquery:
(SELECT DISTINCT IFNULL(grades.`grade`,
SELECT IF( EXISTS (SELECT *
FROM `another_table`
WHERE userid = 365 AND courseid = 2
), 'Enrolled', 'Not enrolled'
)
)
FROM `table` grades
WHERE userinfo.`id` = grades.`id` AND
grades.`Item Name` = 'SOMEINFO'
) as SOMENAME,
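You are using a subquery in a position where a scalar subquery is expected. A scalar subquery returns one column in at most one row, and when it is used as a function argument (as in the second argument of IFNULL here) it needs its own enclosing parentheses.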
Unfortunately, there is no easy way to do what you want in MySQL, because of the restrictions on views. I would advise you to rewrite the logic so the exists is handled using a left join in the from clause.
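To make the left join suggestion concrete, here is a minimal sketch run on SQLite through Python's sqlite3 module, not on MariaDB. The table and column names (userinfo, grades, enrolments, item_name) and the rows are hypothetical, and the enrolment check is joined on the user's own id for course 2 instead of the hard-coded userid = 365, which is an assumption about what the view is meant to show.

```python
import sqlite3

# Hypothetical tables and made-up rows (SQLite, not MariaDB).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE userinfo (id INT);
CREATE TABLE grades (id INT, item_name TEXT, grade TEXT);
CREATE TABLE enrolments (userid INT, courseid INT);
INSERT INTO userinfo VALUES (1), (2), (3);
INSERT INTO grades VALUES (1, 'SOMEINFO', 'A'), (2, 'SOMEINFO', NULL);
INSERT INTO enrolments VALUES (2, 2);
""")

query = """
SELECT u.id,
       COALESCE(g.grade,
                CASE WHEN e.userid IS NOT NULL
                     THEN 'Enrolled' ELSE 'Not enrolled' END) AS somename
FROM userinfo u
LEFT JOIN grades g                -- the grade, if any, for this item
       ON g.id = u.id AND g.item_name = 'SOMEINFO'
LEFT JOIN enrolments e            -- replaces the EXISTS subquery
       ON e.userid = u.id AND e.courseid = 2
ORDER BY u.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because the enrolment test lives in the from clause, the select list no longer needs a nested subquery at all.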

SQL query top 2 columns of joined table?

I am having no luck attempting to get the top (x number) of rows from a joined table. I want the top 2 resources (ordered by name) which in this case should be Katie and Simon and regardless of what I've tried, I can't seem to get it right. You can see below what I've commented out - and what looks like it should work (but doesn't). I cannot use a union. Any ideas?
select distinct
RTRESOURCE.RNAME as Resource,
RTTASK.TASK as taskname,
SUM(distinct SOTRAN.QTY2BILL) AS quantitytobill
from SOTRAN AS SOTRAN
INNER JOIN RTTASK AS RTTASK ON sotran.taskid = rttask.taskid
left outer JOIN RTRESOURCE AS RTRESOURCE ON rtresource.keyno=sotran.resid
WHERE sotran.phantom<>'y' and sotran.pgroup = 'L' and sotran.timesheet = 'y' and sotran.taskid >0
AND RTRESOURCE.KEYNO in ('193','159','200')
AND ( SOTRAN.ADDDATE>='8/15/2015 12:00:00 AM' AND SOTRAN.ADDDATE<'9/3/2015 11:59:59 PM' )
-- and RTRESOURCE.RNAME in ( select distinct top 2 RTRESOURCE.RNAME from RTRESOURCE order by RTRESOURCE.RNAME)
-- and ( select count(*) from RTRESOURCE RTRESOURCE2 where RTRESOURCE2.RNAME = RTRESOURCE.RNAME ) <= 2
GROUP BY RTRESOURCE.rname,RTTASK.task,RTTASK.taskid,RTTASK.mdsstring
ORDER BY Resource,taskname
You should provide a schema.
But let's assume your query works. You can wrap it in a CTE.
WITH yourQuery as (
SELECT *
FROM <your big join query>
), maxBill as (
SELECT Resource, Max(quantitytobill) as Bill
FROM yourQuery
GROUP BY Resource
)
SELECT top 2 *
FROM maxBill
ORDER BY Bill
If you want the top 2 alphabetically:
WITH yourQuery as (
SELECT *
FROM <your big join query>
), Names as (
SELECT distinct Resource
FROM yourQuery
)
SELECT top 2 *
FROM Names
ORDER BY Resource
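If the goal is the detail rows for the first two resource names rather than just the names, one option is to join the names CTE back to the query result. The sketch below is only an illustration run on SQLite through Python's sqlite3 module, so it uses LIMIT 2 where SQL Server would use TOP 2; the billing table stands in for the output of the big join query and its rows are made up.

```python
import sqlite3

# "billing" is a hypothetical stand-in for the result of the big join query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE billing (resource TEXT, taskname TEXT, quantitytobill REAL);
INSERT INTO billing VALUES
  ('Simon', 'Design', 2.0), ('Katie', 'Build', 3.0),
  ('Tom', 'Test', 1.0), ('Katie', 'Test', 4.0);
""")

query = """
WITH names AS (                   -- first two resource names alphabetically
    SELECT DISTINCT resource
    FROM billing
    ORDER BY resource
    LIMIT 2                       -- SQLite spelling; SQL Server would use TOP 2
)
SELECT b.resource, b.taskname, b.quantitytobill
FROM billing b
JOIN names n ON n.resource = b.resource
ORDER BY b.resource, b.taskname
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Tom is third alphabetically, so his row is filtered out while every task for Katie and Simon is kept.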

How to compare one Master table against several child tables

I have one master table (PO_BreakOutAll) with ~3000 rows made up of only two columns (PO_ID, PO_LN_NO) which together make up the primary key. I also have several other tables each with a subset of the data from the master table (or so it is supposed to be). All the tables are the same schema as the master table.
All tables have this exact schema:
PO_ID char(5) PK
PO_LN_NO int PK
I need to do two different types of comparisons for validation and to find duplicates.
First to make sure that every row in the master table exists in one, and only one, of the other child tables.
Second I need to make sure that no row is duplicated across any of the child tables. The same row can exist in two or more child tables and I need to find them.
I can do each table in a separate query but have not figured out how to write one query that compares all the child tables at once.
Here is what I have so far but it does not work:
SELECT a.PO_ID as all_PO,
a.PO_LN_NO,
c.PO_ID as Cummings_PO,
c.PO_LN_NO,
f.PO_ID as filter_PO,
f.PO_LN_NO,
fo.PO_ID as fixedObl_PO,
fo.PO_LN_NO
FROM
PO_BreakOutAll a
LEFT OUTER JOIN
PO_Cummins c ON (c.PO_ID = a.PO_ID AND c.PO_LN_NO = a.PO_LN_NO)
LEFT OUTER JOIN
PO_Filters f ON (f.PO_ID = a.PO_ID AND f.PO_LN_NO = a.PO_LN_NO)
LEFT OUTER JOIN
PO_FixedOblig fo ON (fo.PO_ID = a.PO_ID AND fo.PO_LN_NO = a.PO_LN_NO)
I think @gordon linoff has an overall solution. If you want to work with the CTE paradigm, here is an example based on your Fiddle that answers the duplicate question:
WITH CTE (PO_ID,PO_LN_NO,TableName) AS
(SELECT
PO_ID,
PO_LN_NO,
'Cummings' as TableName
FROM PO_Cummins
UNION ALL
SELECT
PO_ID,
PO_LN_NO,
'Filters' as TableName
FROM PO_Filters
UNION ALL
SELECT
PO_ID,
PO_LN_NO,
'Office' as TableName
FROM PO_Office )
SELECT
PO_BreakOutAll.PO_ID,
PO_BreakOutAll.PO_LN_NO,
CHILD_DATA.TABLENAME AS DUP_TABLENAME
FROM
PO_BreakOutAll
INNER JOIN (
SELECT PO_ID, PO_LN_NO, COUNT(1) AS DUP_COUNTER
FROM CTE
GROUP BY PO_ID, PO_LN_NO
HAVING COUNT(1) > 1
) DUPS ON DUPS.PO_ID = PO_BreakOutAll.PO_ID AND DUPS.PO_LN_NO = PO_BreakOutAll.PO_LN_NO
INNER JOIN (
SELECT PO_ID, PO_LN_NO, TABLENAME
FROM CTE
) CHILD_DATA
ON CHILD_DATA.PO_ID = PO_BreakOutAll.PO_ID AND CHILD_DATA.PO_LN_NO = PO_BreakOutAll.PO_LN_NO
ORDER BY PO_BreakOutAll.PO_ID, PO_BreakOutAll.PO_LN_NO, DUP_TABLENAME
I wouldn't use join for this; I would use union all. Here is a way to get a count of how the records overlap among the tables:
select isAll, isCummins, isFilters, isOblig, count(*)
from (select PO_ID, PO_LN_NO, sum(isAll) as isAll, sum(isCummins) as isCummins,
sum(isFilters) as isFilters, sum(isOblig) as isOblig
from ((select PO_ID, PO_LN_NO, 1 as isAll, 0 as isCummins, 0 as isFilters, 0 as isOblig
from PO_BreakOutAll
) union all
(select PO_ID, PO_LN_NO, 0, 1, 0, 0
from PO_Cummins
) union all
(select PO_ID, PO_LN_NO, 0, 0, 1, 0
from PO_Filters
) union all
(select PO_ID, PO_LN_NO, 0, 0, 0, 1
from PO_FixedOblig
)
) t
group by PO_ID, PO_LN_NO
) t
group by isAll, isCummins, isFilters, isOblig;
If you want to find rows that fail your test, just use the subquery with where conditions:
select PO_ID, PO_LN_NO, sum(isAll) as isAll, sum(isCummins) as isCummins,
sum(isFilters) as isFilters, sum(isOblig) as isOblig
from ((select PO_ID, PO_LN_NO, 1 as isAll, 0 as isCummins, 0 as isFilters, 0 as isOblig
from PO_BreakOutAll
) union all
(select PO_ID, PO_LN_NO, 0, 1, 0, 0
from PO_Cummins
) union all
(select PO_ID, PO_LN_NO, 0, 0, 1, 0
from PO_Filters
) union all
(select PO_ID, PO_LN_NO, 0, 0, 0, 1
from PO_FixedOblig
)
) t
group by PO_ID, PO_LN_NO
having sum(isAll) <> 1 or
(sum(isAll) = 1 and (sum(isCummins) + sum(isFilters) + sum(isOblig) <> 1)
);
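Here is a small runnable sketch of the same union all idea, using Python's sqlite3 module with made-up rows. It assumes the master table contributes only to the isAll flag, and it drops the parentheses around each select because SQLite does not accept them inside a compound select; otherwise the logic follows the query above.

```python
import sqlite3

# Made-up rows: one good row, one row in two child tables, one row in no
# child table, and one child row that is missing from the master table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PO_BreakOutAll (PO_ID TEXT, PO_LN_NO INT);
CREATE TABLE PO_Cummins (PO_ID TEXT, PO_LN_NO INT);
CREATE TABLE PO_Filters (PO_ID TEXT, PO_LN_NO INT);
CREATE TABLE PO_FixedOblig (PO_ID TEXT, PO_LN_NO INT);
INSERT INTO PO_BreakOutAll VALUES ('A0001', 1), ('A0001', 2), ('A0002', 1);
INSERT INTO PO_Cummins VALUES ('A0001', 1), ('A0001', 2);
INSERT INTO PO_Filters VALUES ('A0001', 2);
INSERT INTO PO_FixedOblig VALUES ('A0009', 1);
""")

query = """
SELECT PO_ID, PO_LN_NO,
       SUM(isAll), SUM(isCummins), SUM(isFilters), SUM(isOblig)
FROM (
    SELECT PO_ID, PO_LN_NO, 1 AS isAll, 0 AS isCummins, 0 AS isFilters, 0 AS isOblig
    FROM PO_BreakOutAll
    UNION ALL
    SELECT PO_ID, PO_LN_NO, 0, 1, 0, 0 FROM PO_Cummins
    UNION ALL
    SELECT PO_ID, PO_LN_NO, 0, 0, 1, 0 FROM PO_Filters
    UNION ALL
    SELECT PO_ID, PO_LN_NO, 0, 0, 0, 1 FROM PO_FixedOblig
) t
GROUP BY PO_ID, PO_LN_NO
-- keep only rows that are not in the master exactly once,
-- or not in exactly one child table
HAVING SUM(isAll) <> 1
    OR SUM(isCummins) + SUM(isFilters) + SUM(isOblig) <> 1
ORDER BY PO_ID, PO_LN_NO
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The row ('A0001', 1) passes both checks and is therefore absent from the output; the other three rows each fail one of them.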

SQL Server GROUP BY troubles!

I'm getting a frustrating error in one of my SQL Server 2008 queries. It parses fine, but crashes when I try to execute. The error I get is the following:
Msg 8120, Level 16, State 1, Line 4
Column 'customertraffic_return.company' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
SELECT *
FROM (SELECT ctr.sp_id AS spid,
Substring(ctr.company, 1, 20) AS company,
cci.email_address AS tech_email,
CASE
WHEN rating IS NULL THEN 'unknown'
ELSE rating
END AS rating
FROM customer_contactinfo cci
INNER JOIN customertraffic_return ctr
ON ctr.sp_id = cci.sp_id
WHERE cci.email_address <> ''
AND cci.email_address NOT LIKE '%hotmail%'
AND cci.email_address IS NOT NULL
AND ( region LIKE 'Europe%'
OR region LIKE 'Asia%' )
AND SERVICE IN ( '1', '2' )
AND ( rating IN ( 'Premiere', 'Standard', 'unknown' )
OR rating IS NULL )
AND msgcount >= 5000
GROUP BY ctr.sp_id,
cci.email_address) AS a
WHERE spid NOT IN (SELECT spid
FROM customer_exclude)
GROUP BY spid,
tech_email
Well, the error is pretty clear, no??
You're selecting those columns in your inner SELECT:
spid
company
tech_email
rating
and you're grouping only by two of those (GROUP BY ctr.sp_id, cci.email_address).
Either you need to group by all four of them (GROUP BY ctr.sp_id, cci.email_address, company, rating), or you need to apply an aggregate function (SUM, AVG, MIN, MAX) to the other two columns (company and rating).
Or maybe using a GROUP BY here is totally the wrong way to go - what is it you're really trying to do here??
The inner query:
SELECT ctr.sp_id AS spid,
Substring(ctr.company, 1, 20) AS company,
cci.email_address AS tech_email,
CASE
WHEN rating IS NULL THEN 'unknown'
ELSE rating
END AS rating
FROM customer_contactinfo cci
INNER JOIN customertraffic_return ctr
ON ctr.sp_id = cci.sp_id
WHERE cci.email_address <> ''
AND cci.email_address NOT LIKE '%hotmail%'
AND cci.email_address IS NOT NULL
AND ( region LIKE 'Europe%'
OR region LIKE 'Asia%' )
AND SERVICE IN ( '1', '2' )
AND ( rating IN ( 'Premiere', 'Standard', 'unknown' )
OR rating IS NULL )
AND msgcount >= 5000
GROUP BY ctr.sp_id,
cci.email_address
has 4 non-aggregate things in the select (sp_id, company, email_address, rating) and you only group on two of them, so it is throwing an error on the first one it sees.
So you either need to not group by any of them or group by all of them.
I suggest replacing the * with a fully specified column list.
You can either group by all selected columns or use the other columns (not in the group by clause) in an aggregate function (like sum).
You cannot: select a,b,c from bla group by a,b
But you can: select a,b,sum(c) from bla group by a,b
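The two valid shapes can be tried on a toy table. This is a sketch using Python's sqlite3 module with the hypothetical bla table from the lines above and made-up rows. Note that SQLite, unlike SQL Server, tolerates a non-grouped column in the select list, so it will not reproduce Msg 8120; only the two forms that SQL Server also accepts are shown.

```python
import sqlite3

# Toy table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bla (a TEXT, b TEXT, c INT);
INSERT INTO bla VALUES ('x', 'p', 1), ('x', 'p', 2), ('x', 'q', 5);
""")

# Option 1: aggregate the column that is not in the GROUP BY clause.
summed = conn.execute(
    "SELECT a, b, SUM(c) FROM bla GROUP BY a, b ORDER BY a, b"
).fetchall()

# Option 2: group by every selected column.
grouped = conn.execute(
    "SELECT a, b, c FROM bla GROUP BY a, b, c ORDER BY a, b, c"
).fetchall()

print(summed)
print(grouped)
```

Option 1 collapses the two ('x', 'p') rows into one total, while option 2 keeps them apart because c is part of the grouping key.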