Count rows in hierarchy structure including subtrees - sql

I'm trying to use a recursive query to get the number of events for every category, including its subcategories. I have three tables: ContentTabs (a hierarchical table), Events, and the intermediate table RelEventsToContentTabs, so it's a simple many-to-many relationship.
The problem is that a query such as the one below gives me the number of events for every category, but without the events of its subcategories.
I'm using SQL Server 2008.
Any ideas?
WITH ContentTabsStructure (Id, Name)
AS
(
SELECT Id, Name,parentId FROM ContentTabs
WHERE Id =1
UNION ALL
SELECT ct.Id, ct.Name,ct.parentId FROM ContentTabs AS ct
INNER JOIN ContentTabsStructure AS cts
ON ct.ParentId = cts.Id
)
SELECT cts.id,cts.Name, Count(distinct e.id) as NumberOfEvents
FROM ContentTabsStructure cts
INNER JOIN RelEventsToContentTabs etct
ON cts.id = etct.contentTabId
INNER JOIN Events e
ON etct.eventId = e.id
GROUP BY cts.id,cts.Name

You can also include parentId in the CTE column list:
WITH ContentTabsStructure (Id, Name, parentId)
so that every row carries its parent, and then add something like the following to the selected columns to count the events of each category's direct subcategories:
, NumberOfSubCatagoryEvents = ISNULL(
(
SELECT COUNT(DISTINCT e1.id)
FROM ContentTabsStructure cts1
INNER JOIN RelEventsToContentTabs etct1
ON cts1.id = etct1.contentTabId
INNER JOIN Events e1
ON etct1.eventId = e1.id
WHERE cts1.parentId = cts.id
), 0 )
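To count events over a whole subtree rather than only over direct children, one option is to let the recursive CTE produce every (ancestor, descendant) pair and then group by the ancestor. The following is a minimal sketch, not the poster's query: it runs on an in-memory SQLite database through Python's sqlite3 module so it can be tried without a server, the sample rows are invented, and the CTE name Tree with its columns AncestorId and DescendantId is hypothetical. The Events table is left out because the event id in RelEventsToContentTabs is enough for counting. In SQL Server the RECURSIVE keyword is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ContentTabs (Id INTEGER PRIMARY KEY, Name TEXT, ParentId INTEGER);
CREATE TABLE RelEventsToContentTabs (EventId INTEGER, ContentTabId INTEGER);

-- Invented sample hierarchy: Root(1) -> A(2) -> A1(3), Root(1) -> B(4)
INSERT INTO ContentTabs VALUES (1, 'Root', NULL), (2, 'A', 1), (3, 'A1', 2), (4, 'B', 1);
-- Event 10 is attached to both A and A1, so DISTINCT matters
INSERT INTO RelEventsToContentTabs VALUES (10, 2), (11, 3), (12, 4), (10, 3);
""")

QUERY = """
WITH RECURSIVE Tree (AncestorId, DescendantId) AS (
    -- anchor: every category is its own descendant
    SELECT Id, Id FROM ContentTabs
    UNION ALL
    -- step: extend each known pair by one level of children
    SELECT t.AncestorId, ct.Id
    FROM Tree AS t
    JOIN ContentTabs AS ct ON ct.ParentId = t.DescendantId
)
SELECT c.Id, c.Name, COUNT(DISTINCT r.EventId) AS NumberOfEvents
FROM ContentTabs AS c
JOIN Tree AS t ON t.AncestorId = c.Id
LEFT JOIN RelEventsToContentTabs AS r ON r.ContentTabId = t.DescendantId
GROUP BY c.Id, c.Name
ORDER BY c.Id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because the anchor member starts from every category instead of only Id = 1, each category gets its own subtree total, and the LEFT JOIN keeps categories that have no events at all.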

Related

Group by and Having aggregation

I'm trying to determine who the top scorer is in each World Cup group (this is a personal project).
I have the data, but I'm having a hard time using COUNT, GROUP BY and HAVING to accomplish what I need.
I need to count Messi's goals (top scorer) and group by each of the groups, so that I get the highest scorer of each group.
For now I just have the joins:
select * from zonas
left join goles_zonas on (zonas.id = goles_zonas.Id_zona)
inner join goles on (goles.id = goles_zonas.id_gol)
inner join jugadores on (goles.id_jugador = jugadores.id)
Instead of displaying all columns with SELECT *, select only the columns that act as grouping keys, that is, the columns that distinguish one group of rows from another, and apply the aggregate (here COUNT) to each group. The goal id is not a grouping key, since grouping by it would put every goal in its own group:
SELECT goles_zonas.Id_zona, goles.id_jugador, COUNT(1) AS number_of_goal
FROM zonas
left join goles_zonas on (zonas.id = goles_zonas.Id_zona)
inner join goles on (goles.id = goles_zonas.id_gol)
inner join jugadores on (goles.id_jugador = jugadores.id)
GROUP BY goles_zonas.Id_zona, goles.id_jugador
Every column in the SELECT list that is not aggregated has to appear in the GROUP BY clause.
If you also want to display columns that are not part of the grouping keys, you can join the grouped result back to the table like this:
SELECT goles_zonas.*, x.* FROM (
SELECT goles_zonas.Id_zona, goles.id_jugador, COUNT(1) AS number_of_goal
FROM zonas
left join goles_zonas on (zonas.id = goles_zonas.Id_zona)
inner join goles on (goles.id = goles_zonas.id_gol)
inner join jugadores on (goles.id_jugador = jugadores.id)
GROUP BY goles_zonas.Id_zona, goles.id_jugador ) x
LEFT JOIN goles_zonas on (x.Id_zona = goles_zonas.Id_zona)
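The grouped counts still have to be reduced to one top scorer per group. A possible way to do that, sketched here on SQLite through Python's sqlite3 module, is to count goals per (zone, player) and keep only the rows whose count equals the maximum for their zone. The column jugadores.nombre, the CTE name conteo and all sample rows are assumptions made for the illustration, and the zonas table is left out because goles_zonas already carries the zone id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jugadores (id INTEGER PRIMARY KEY, nombre TEXT);  -- nombre is assumed
CREATE TABLE goles (id INTEGER PRIMARY KEY, id_jugador INTEGER);
CREATE TABLE goles_zonas (Id_zona INTEGER, id_gol INTEGER);

-- Invented sample data: zone 1 has three goals, zone 2 has one
INSERT INTO jugadores VALUES (1, 'Messi'), (2, 'Jugador B'), (3, 'Jugador C');
INSERT INTO goles VALUES (1, 1), (2, 1), (3, 2), (4, 3);
INSERT INTO goles_zonas VALUES (1, 1), (1, 2), (1, 3), (2, 4);
""")

QUERY = """
WITH conteo AS (
    -- goals per player within each zone
    SELECT gz.Id_zona, g.id_jugador, COUNT(*) AS number_of_goal
    FROM goles_zonas AS gz
    JOIN goles AS g ON g.id = gz.id_gol
    GROUP BY gz.Id_zona, g.id_jugador
)
SELECT c.Id_zona, j.nombre, c.number_of_goal
FROM conteo AS c
JOIN jugadores AS j ON j.id = c.id_jugador
-- keep only the rows that reach the maximum of their own zone
WHERE c.number_of_goal = (SELECT MAX(c2.number_of_goal)
                          FROM conteo AS c2
                          WHERE c2.Id_zona = c.Id_zona)
ORDER BY c.Id_zona
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

If two players tie for the lead in a zone, this form returns both of them, which is usually what a top scorer listing should do.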

Get some data which corresponds to the maximum date

I have these 3 tables:
Table ORG:
Fields:historyid, personid
Table PERSON:
Fields: id
Table HISTORY:
Fields: id,date,personid
Both HISTORY and ORG are linked to PERSON with a 1:N relationship. ORG is also linked to HISTORY with a 1:N relationship. For each person, I want to get just one row from table ORG: the one that corresponds to the HISTORY row with the highest date. The following SQL finds the highest date for a given person. However, I do not know how to combine it with the requirement above.
SELECT ash1.id
FROM
(SELECT * FROM history a WHERE a.personid=person.id) ash1
LEFT JOIN
(SELECT * FROM history b WHERE b.personid=person.id) ash2
ON ash1.personid=ash2.personid
AND ash1.date < ash2.date
WHERE ash2.date IS NULL
I think you can do it by using MAX() and GROUP BY:
SELECT
o.historyid AS o_hist,
o.personid AS o_per,
h.id AS h_id,
MAX(h.date) AS h_date,
h.personid AS h_person
FROM
org o
LEFT JOIN
person p ON p.id = o.personid
LEFT JOIN
history h ON h.id = o.historyid AND h.personid = p.id
GROUP BY o_per
Try the query below:
;WITH CTE_HighestHistory
AS (SELECT PersonID, MAX([Date]) AS HDate
FROM History
GROUP BY PersonID)
SELECT o.*, h.*
FROM org o
LEFT JOIN History h ON o.Historyid=h.Id and o.PersonID=h.PersonId
INNER JOIN CTE_HighestHistory ch ON h.Personid=ch.Personid and h.[Date]=ch.HDate
WHERE EXISTS(SELECT 1 FROM Person p WHERE p.Id=o.PersonID )
There are multiple ways to approach this, depending on the database. However, your data structure is awkward. Why does org have historyid? That doesn't really make sense to me.
In any case, based on your description, this should work:
select o.*, h.*
from org o join
history h
on h.personid = o.personid
where h.date = (select max(h2.date)
from history h2
where h2.personid = h.personid
);
You might want to start the FROM clause as:
from (select distinct personid from org) o
That way each person appears only once, even if they are repeated in the org table.
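The correlated-subquery approach can be checked on a small data set. The sketch below runs it on SQLite through Python's sqlite3 module. The sample rows are invented, dates are stored as ISO-8601 text (an assumption that makes plain comparison work in SQLite), and the select list is narrowed to three columns so the output is easy to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY);
CREATE TABLE history (id INTEGER PRIMARY KEY, date TEXT, personid INTEGER);
CREATE TABLE org (historyid INTEGER, personid INTEGER);

-- Invented sample data: person 1 has two history rows, person 2 has one
INSERT INTO person VALUES (1), (2);
INSERT INTO history VALUES (1, '2020-01-01', 1), (2, '2021-06-01', 1), (3, '2019-03-01', 2);
INSERT INTO org VALUES (1, 1), (2, 1), (3, 2);
""")

QUERY = """
SELECT o.personid, h.id AS historyid, h.date
FROM (SELECT DISTINCT personid FROM org) AS o
JOIN history AS h ON h.personid = o.personid
-- keep only the history row carrying the latest date of that person
WHERE h.date = (SELECT MAX(h2.date)
                FROM history AS h2
                WHERE h2.personid = h.personid)
ORDER BY o.personid
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Person 1 appears twice in org but only once in the result, paired with its most recent history row. If a person had two history rows sharing the same latest date, both would be returned.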

Selecting a count of items and a count of all children for those items

I am writing a query that reports on information about a location, including the total number of classes occurring at that location and the total number of attendees who are registered for those classes. I need the results in a single row.
I am looking for the most efficient and/or most readable way to perform this query.
The cleanest query that I've come up with is the following:
SELECT Location.Id AS LocationId,
--additional columns from location or joined tables...
ClassStatistics.TotalClasses,
ClassStatistics.TotalRegistrants
FROM Locations AS Location
OUTER APPLY
(
SELECT
COUNT(*) AS TotalClasses,
SUM(TotalRegistrantsInClass) AS TotalRegistrants
FROM
(
SELECT
Class.Id AS ClassId,
COUNT(*) AS TotalRegistrantsInClass
FROM
Classes AS Class
LEFT OUTER JOIN
Attendees AS Attendee
ON
Attendee.ClassId = Class.Id
WHERE
Class.LocationId= Location.Id
GROUP BY
Class.Id
) AS AttendeeTotalsByClass
) AS ClassStatistics
WHERE
Location.Id = 1
Is this sort of query acceptable in practice, or have I missed some magic to make it more efficient?
You should just join directly to the Classes and Attendees tables. There is no need for all the sub-queries.
SELECT L.Id AS LocationId,
--additional columns from location or joined tables...
COUNT(DISTINCT C.ID) AS TotalClasses,
COUNT(A.ID) AS TotalRegistrants
FROM Locations AS L
INNER JOIN
Classes C
ON C.LocationId= L.Id
LEFT OUTER JOIN
Attendees A
ON A.ClassId = C.Id
WHERE L.Id = 1
GROUP BY
L.Id
--, plus any additional non-aggregated columns from the SELECT list...
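A small worked example shows why the two counts differ in form: COUNT(DISTINCT C.Id) undoes the row multiplication caused by joining attendees, while COUNT(A.Id) skips the NULL rows produced for classes without attendees. This sketch uses SQLite through Python's sqlite3 module with invented rows, and the helper location_stats is hypothetical. It also deviates from the answer in one labeled way: Classes is attached with a LEFT JOIN so that a location without classes still yields a row of zeros.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Locations (Id INTEGER PRIMARY KEY);
CREATE TABLE Classes (Id INTEGER PRIMARY KEY, LocationId INTEGER);
CREATE TABLE Attendees (Id INTEGER PRIMARY KEY, ClassId INTEGER);

-- Invented sample data: location 1 has three classes, location 2 has none
INSERT INTO Locations VALUES (1), (2);
INSERT INTO Classes VALUES (10, 1), (11, 1), (12, 1);
-- class 10 has three attendees, class 11 has one, class 12 has none
INSERT INTO Attendees VALUES (100, 10), (101, 10), (102, 10), (103, 11);
""")

QUERY = """
SELECT L.Id AS LocationId,
       COUNT(DISTINCT C.Id) AS TotalClasses,     -- each class counted once
       COUNT(A.Id)          AS TotalRegistrants  -- NULLs from empty classes are ignored
FROM Locations AS L
LEFT JOIN Classes AS C ON C.LocationId = L.Id    -- LEFT JOIN is a deviation from the answer
LEFT JOIN Attendees AS A ON A.ClassId = C.Id
WHERE L.Id = ?
GROUP BY L.Id
"""

def location_stats(location_id):
    """Return (LocationId, TotalClasses, TotalRegistrants) for one location."""
    return conn.execute(QUERY, (location_id,)).fetchone()

print(location_stats(1))
print(location_stats(2))
```

With an INNER JOIN to Classes, as in the answer, the call for location 2 would return no row at all instead of zeros, so the choice depends on what the report should show for empty locations.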

Left outer join two levels deep in Postgres results in cartesian product

Given the following 4 tables:
CREATE TABLE events ( id, name )
CREATE TABLE profiles ( id, event_id )
CREATE TABLE donations ( amount, profile_id )
CREATE TABLE event_members( id, event_id, user_id )
I'm attempting to get a list of all events, along with a count of any members, and a sum of any donations. The issue is the sum of donations is coming back wrong (appears to be a cartesian result of donations * # of event_members).
Here is the SQL query (Postgres)
SELECT events.name, COUNT(DISTINCT event_members.id), SUM(donations.amount)
FROM events
LEFT OUTER JOIN profiles ON events.id = profiles.event_id
LEFT OUTER JOIN donations ON donations.profile_id = profiles.id
LEFT OUTER JOIN event_members ON event_members.event_id = events.id
GROUP BY events.name
The sum(donations.amount) is coming back = to the actual sum of donations * number of rows in event_members. If I comment out the count(distinct event_members.id) and the event_members left outer join, the sum is correct.
As I explained in an answer to the referenced question you need to aggregate before joining to avoid a proxy CROSS JOIN. Like:
SELECT e.name, e.sum_donations, m.ct_members
FROM (
SELECT e.id AS event_id, e.name, SUM(d.amount) AS sum_donations
FROM events e
LEFT JOIN profiles p ON p.event_id = e.id
LEFT JOIN donations d ON d.profile_id = p.id
GROUP BY 1, 2
) e
LEFT JOIN (
SELECT m.event_id, count(DISTINCT m.id) AS ct_members
FROM event_members m
GROUP BY 1
) m USING (event_id);
If event_members.id is the primary key, then id is guaranteed to be unique in the table and you can drop DISTINCT from the count:
count(*) AS ct_members
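The effect is easy to reproduce with a handful of rows. The sketch below uses SQLite through Python's sqlite3 module rather than Postgres, purely so that it runs without a server. The column types, the derived-table aliases and the sample data (one event with two donations and three members) are assumptions. It runs the original single-pass query and an aggregate-before-join variant side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE profiles (id INTEGER PRIMARY KEY, event_id INTEGER);
CREATE TABLE donations (amount INTEGER, profile_id INTEGER);
CREATE TABLE event_members (id INTEGER PRIMARY KEY, event_id INTEGER, user_id INTEGER);

-- Invented sample data: one event, one profile, two donations, three members
INSERT INTO events VALUES (1, 'Alpha');
INSERT INTO profiles VALUES (1, 1);
INSERT INTO donations VALUES (100, 1), (50, 1);
INSERT INTO event_members VALUES (1, 1, 7), (2, 1, 8), (3, 1, 9);
""")

# Joining both child tables first pairs every donation with every member.
NAIVE = """
SELECT e.name, COUNT(DISTINCT m.id), SUM(d.amount)
FROM events AS e
LEFT JOIN profiles AS p ON p.event_id = e.id
LEFT JOIN donations AS d ON d.profile_id = p.id
LEFT JOIN event_members AS m ON m.event_id = e.id
GROUP BY e.name
"""

# Aggregating each branch to one row per event before joining avoids that.
FIXED = """
SELECT e.name, m.ct_members, d.sum_donations
FROM events AS e
LEFT JOIN (
    SELECT p.event_id, SUM(d.amount) AS sum_donations
    FROM profiles AS p
    JOIN donations AS d ON d.profile_id = p.id
    GROUP BY p.event_id
) AS d ON d.event_id = e.id
LEFT JOIN (
    SELECT event_id, COUNT(*) AS ct_members
    FROM event_members
    GROUP BY event_id
) AS m ON m.event_id = e.id
"""

naive = conn.execute(NAIVE).fetchall()
fixed = conn.execute(FIXED).fetchall()
print(naive)  # the donation sum is tripled: 150 * 3 members
print(fixed)  # the donation sum is the real total
```

In the naive form the 2 donation rows are combined with the 3 member rows into 6 joined rows, so the sum becomes 450 while COUNT(DISTINCT ...) still hides the duplication on the member side.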
You seem to have these two independent structures (-[ means 1-N association):
events -[ profiles -[ donations
events -[ event members
I wrapped the second one into a subquery:
SELECT events.name,
member_count.the_member_count,
SUM(donations.amount)
FROM events
LEFT OUTER JOIN profiles ON events.id = profiles.event_id
LEFT OUTER JOIN donations ON donations.profile_id = profiles.id
LEFT OUTER JOIN (
SELECT
event_id,
COUNT(*) AS the_member_count
FROM event_members
GROUP BY event_id
) AS member_count
ON member_count.event_id = events.id
GROUP BY events.name, member_count.the_member_count
Of course you get a cartesian product between donations and event_members for every event: both are bound only to the event, and there is no join relation between donations and event_members other than the event id, which means that every member matches every donation.
When you run your query, you ask for all events (let's say there are two, event Alpha and event Beta) and then JOIN with the members. Let's say that there is a member Alice who participates in both events.
SELECT events.name, COUNT(DISTINCT event_members.id), SUM(donations.amount)
FROM events
LEFT OUTER JOIN profiles ON events.id = profiles.event_id
LEFT OUTER JOIN donations ON donations.profile_id = profiles.id
LEFT OUTER JOIN event_members ON event_members.event_id = events.id
GROUP BY events.name
On each row you asked the total for Alice's donations. If Alice donated 100 USD, then you asked for:
Alpha Alice 100USD
Beta Alice 100USD
So it's not surprising that when asking for the sum total Alice comes out as having donated 200 USD.
If you wanted the sum of all donations, you'd be better off using two separate queries. Trying to do everything in a single query, while possible, would be a classic SQL antipattern (actually the one in chapter #18, "Spaghetti Query"):
Unintended Products
One common consequence of producing all your
results in one query is a Cartesian product. This happens when two of
the tables in the query have no condition restricting their
relationship. Without such a restriction, the join of two tables pairs
each row in the first table to every row in the other table. Each such
pairing becomes a row of the result set, and you end up with many more
rows than you expect.

i want to modify this SQL statement to return only distinct rows of a column

select
picks.`fbid`,
picks.`time`,
categories.`name` as cname,
options.`name` as oname,
users.`name`
from
picks
left join categories
on (categories.`id` = picks.`cid`)
left join options
on (options.`id` = picks.oid)
left join users
on (users.fbid = picks.`fbid`)
order by
time desc
That query returns a result in which the same fbid appears in several rows.
My question is: I would like to modify the query to select only DISTINCT fbids (perhaps only the first row per fbid, sorted by time).
Can someone help with this?
select
p2.fbid,
p2.time,
c.`name` as cname,
o.`name` as oname,
u.`name`
from
( select p1.fbid,
min( p1.time ) FirstTimePerID
from picks p1
group by p1.fbid ) as FirstPerID
JOIN Picks p2
on FirstPerID.fbid = p2.fbid
AND FirstPerID.FirstTimePerID = p2.time
LEFT JOIN Categories c
on p2.cid = c.id
LEFT JOIN Options o
on p2.oid = o.id
LEFT JOIN Users u
on p2.fbid = u.fbid
order by
time desc
I don't know why you originally had LEFT JOINs, as it appears that all picks must be associated with a valid category, option and user. If that is the case, I would remove the LEFT and change them to INNER JOINs instead.
The inner query grabs, for each fbid, the first entry time, which results in a single row per fbid. The outer query then re-joins to the picks table on the same fbid and time, and continues with the category, options and users joins for that single entry.
There are two options: you could write a GROUP BY clause, or you could write a nested query joined back to the table to get the pertinent info.
Nested aliased table:
SELECT
n.fBId
FROM
MyTable t
INNER JOIN
(SELECT DISTINCT fBId
FROM MyTable) n
ON n.fBId = t.fBId
Or the GROUP BY option:
SELECT fBId FROM MyTable
GROUP BY fBId
select picks.`fbid`, picks.`time`, categories.`name` as cname,
options.`name` as oname, users.`name` from picks left join categories
on (categories.`id` = picks.`cid`) left join options on (options.`id` = picks.oid)
left join users on (users.fbid = picks.`fbid`)
GROUP BY picks.`fbid` order by time desc
select
picks.fbid,
MIN(picks.time) as first_time,
MAX(picks.time) as last_time
from
picks
group by
picks.fbid
order by
MIN(picks.time) desc
However, if you want only distinct fbid's you cannot display cname and other columns at the same time.