Distinctly sum a column on a joined table? - sql

This is a simple problem, and I'm not sure if it's possible here. Here's the problem:
=> http://sqlfiddle.com/#!12/584f1/7
Explanation:
A ticket belongs to an attendee
An attendee has a revenue
I need to group the tickets by section and get the total revenue.
A plain join double counts revenue, because 2 tickets can belong to the same attendee. I'd like to get the sum of the revenue, but count each attendee only once.
In my sqlfiddle example, I'd like to see:
section | total_revenue
--------+--------------
A       | 40     (40 is correct, but I'm getting 50...)
B       | null
C       | 40
I'd like to solve this without the use of subqueries. I need a scalable solution that will let me do this for multiple columns on different joins in a single query, so I'm open to any suggestion that accomplishes this.
Thanks for your help.

Here is a version using row_number():
select section,
       sum(revenue) Total
from
(
    select t.section, a.revenue,
           row_number() over(partition by a.id, t.section order by a.id) rn
    from tickets t
    left join attendees a
        on t.attendee_id = a.id
) src
where rn = 1
group by section
order by section;
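To see the double counting and the row_number() fix side by side, here is a minimal sketch that runs both queries through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The rows are hypothetical and were chosen to reproduce the numbers in the question. They are not the fiddle's actual data.

```python
import sqlite3

# Hypothetical data: attendee 1 holds two tickets in section A, and the
# section B ticket has no attendee.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attendees (id INTEGER PRIMARY KEY, revenue INTEGER);
    CREATE TABLE tickets (id INTEGER PRIMARY KEY, section TEXT,
                          attendee_id INTEGER);
    INSERT INTO attendees VALUES (1, 10), (2, 30), (3, 40);
    INSERT INTO tickets VALUES
        (1, 'A', 1), (2, 'A', 1), (3, 'A', 2), (4, 'B', NULL), (5, 'C', 3);
""")

# Plain join: attendee 1's revenue is added once per ticket.
naive = conn.execute("""
    SELECT t.section, SUM(a.revenue)
    FROM tickets t
    LEFT JOIN attendees a ON t.attendee_id = a.id
    GROUP BY t.section
    ORDER BY t.section
""").fetchall()

# row_number() keeps a single row per (attendee, section) before summing.
deduped = conn.execute("""
    SELECT section, SUM(revenue) AS total
    FROM (SELECT t.section, a.revenue,
                 ROW_NUMBER() OVER (PARTITION BY a.id, t.section
                                    ORDER BY a.id) AS rn
          FROM tickets t
          LEFT JOIN attendees a ON t.attendee_id = a.id) src
    WHERE rn = 1
    GROUP BY section
    ORDER BY section
""").fetchall()

print(naive)    # [('A', 50), ('B', None), ('C', 40)]
print(deduped)  # [('A', 40), ('B', None), ('C', 40)]
```

The ORDER BY a.id inside the window is arbitrary. Every row in a partition belongs to the same attendee, so whichever row gets rn = 1 carries the same revenue.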

Again, without subquery:
Key element is to add PARTITION BY to the window function(s):
SELECT DISTINCT
t.section
-- ,sum(count(*)) OVER (PARTITION BY t.section) AS tickets_count
,sum(min(a.revenue)) OVER (PARTITION BY t.section) AS attendees_revenue
FROM tickets t
LEFT JOIN attendees a ON a.id = t.attendee_id
GROUP BY t.attendee_id, t.section
ORDER BY t.section;
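Here, you GROUP BY t.attendee_id, t.section before running the result through the window function. You then use PARTITION BY t.section in the window function, because this time you want the results partitioned by section. Each group holds exactly one attendee within one section, so min(a.revenue) picks that attendee's revenue once. sum() OVER then adds those values up per section, and DISTINCT collapses the identical rows that the window function returns for every group in a section.
Uncomment the tickets_count line if you want a count of tickets, too.
Otherwise, it works similarly to my answer to your previous question, i.e., the rest of the explanation there applies.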

You can do this:
select t.section, sum(d.revenue)
from
(
SELECT DISTINCT section, attendee_id FROM tickets
) t
left join attendees d on t.attendee_id = d.id
group by t.section
order by t.section;
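Here is a minimal sketch of this DISTINCT-first approach, again using Python's sqlite3 module. The data is hypothetical but follows the same pattern: attendee 1 holds two tickets in section A, and the section B ticket has no attendee.

```python
import sqlite3

# Hypothetical data, not the fiddle's rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attendees (id INTEGER PRIMARY KEY, revenue INTEGER);
    CREATE TABLE tickets (id INTEGER PRIMARY KEY, section TEXT,
                          attendee_id INTEGER);
    INSERT INTO attendees VALUES (1, 10), (2, 30), (3, 40);
    INSERT INTO tickets VALUES
        (1, 'A', 1), (2, 'A', 1), (3, 'A', 2), (4, 'B', NULL), (5, 'C', 3);
""")

# De-duplicate (section, attendee_id) pairs first, then join and sum.
result = conn.execute("""
    SELECT t.section, SUM(d.revenue)
    FROM (SELECT DISTINCT section, attendee_id FROM tickets) t
    LEFT JOIN attendees d ON t.attendee_id = d.id
    GROUP BY t.section
    ORDER BY t.section
""").fetchall()

print(result)  # [('A', 40), ('B', None), ('C', 40)]
```

The de-duplication works on (section, attendee_id) pairs. As a result, an attendee with tickets in two different sections is still counted once in each of those sections.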

Related

How to Rank Based on Multiple Columns

I'm trying to score people in Microsoft Access based on the count they have for a particular category.
There are 7 possible categories a person can have against them, and I want to assign each person a score from 1-7, with 1 assigned to the highest-scoring category and 7 to the lowest. They might not have an answer for every category, in which case that category can be ignored.
The aim would be to have an output result as shown in this image:
I've tried a few different things, including partition over and joins, but none have worked. To be honest I think I'm way off the mark with the queries I've been trying. I've tried to write the code in SQL from scratch, and used query builder.
Any help is really appreciated!
As an email can have duplicate counts, you will need two subqueries for this:
SELECT
    Score.email,
    Score.category,
    Score.[Count],
    (Select Count(*) From Score As T Where
        T.email = Score.email And
        T.[Count] >= Score.[Count]) -
    (Select Count(*) From Score As S Where
        S.email = Score.email And
        S.[Count] = Score.[Count] And
        S.category > Score.category) AS Rank
FROM
    Score
ORDER BY
    Score.email,
    Score.[Count] DESC,
    Score.category;
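Here is a minimal sketch that runs this query in SQLite through Python's sqlite3 module. SQLite accepts Access-style [bracket] identifiers, so the SQL is nearly unchanged; [Rank] is bracketed here only as a precaution. The Score rows are hypothetical, and a@example.com has two categories tied at Count = 5.

```python
import sqlite3

# Hypothetical Score rows with a tie at Count = 5 for a@example.com.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Score (email TEXT, category TEXT, [Count] INTEGER);
    INSERT INTO Score VALUES
        ('a@example.com', 'cat1', 5), ('a@example.com', 'cat2', 3),
        ('a@example.com', 'cat3', 5), ('a@example.com', 'cat4', 1),
        ('b@example.com', 'cat2', 2);
""")

# Rank = (rows with Count >= mine) - (tied rows whose category sorts after mine).
ranked = conn.execute("""
    SELECT Score.email, Score.category, Score.[Count],
           (SELECT COUNT(*) FROM Score AS T
             WHERE T.email = Score.email AND T.[Count] >= Score.[Count])
         - (SELECT COUNT(*) FROM Score AS S
             WHERE S.email = Score.email AND S.[Count] = Score.[Count]
               AND S.category > Score.category) AS [Rank]
    FROM Score
    ORDER BY Score.email, Score.[Count] DESC, Score.category
""").fetchall()

for row in ranked:
    print(row)
```

The tie goes to the alphabetically earlier category (cat1 ranks 1, cat3 ranks 2), because the second subquery subtracts the tied rows whose category sorts later.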
For categories with equal Count values for the same email, the following will rank the records alphabetically descending by Category name (since this is what is shown in your example):
select t.email, t.category, t.count,
    (
        select count(*) from YourTable u
        where t.email = u.email and
            ((t.count = u.count and t.category <= u.category) or t.count < u.count)
    ) as rank
from YourTable t
order by t.email, t.count desc, t.category desc
Change both references of YourTable to the name of your table.
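Running the same hypothetical data through this single-subquery version shows the opposite tie-break: within a tie, the category that sorts last alphabetically gets the better rank. This is again a sketch using Python's sqlite3 module.

```python
import sqlite3

# Hypothetical rows; a@example.com has cat1 and cat3 tied at count = 5.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (email TEXT, category TEXT, count INTEGER);
    INSERT INTO YourTable VALUES
        ('a@example.com', 'cat1', 5), ('a@example.com', 'cat2', 3),
        ('a@example.com', 'cat3', 5), ('a@example.com', 'cat4', 1),
        ('b@example.com', 'cat2', 2);
""")

# Rank = number of rows for the same email that beat or tie this row,
# where a tie only counts if the other category sorts at or after mine.
ranked = conn.execute("""
    SELECT t.email, t.category, t.count,
           (SELECT COUNT(*) FROM YourTable u
             WHERE t.email = u.email
               AND ((t.count = u.count AND t.category <= u.category)
                    OR t.count < u.count)) AS rank
    FROM YourTable t
    ORDER BY t.email, t.count DESC, t.category DESC
""").fetchall()

for row in ranked:
    print(row)
```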

SQL: Take 1 value per grouping

I have a very simplified table / view like below to illustrate the issue:
The stock column represents the current stock quantity of the style at the retailer. The reason the stock column is included is to avoid joins for reporting. (the table is created for reporting only)
I want to query the table to get what is currently in stock, grouped by stylenumber (across retailers). Like:
select stylenumber, sum(sold) as sold, Max(stock) as stockcount
from MGTest
group by stylenumber
I expect to get Stylenumber, Total Sold, Most Recent Stock Total:
A, 6, 15
B, 1, 6
But with Max(Stock) I get 10, and with Sum(Stock) I get 25...
I have tried with over(partition.....) also without any luck...
How do I solve this?
I would answer this using window functions:
SELECT Stylenumber, Date, TotalStock
FROM (SELECT M.Stylenumber, M.Date, SUM(M.Stock) as TotalStock,
             ROW_NUMBER() OVER (PARTITION BY M.Stylenumber ORDER BY M.Date DESC) as seqnum
      FROM MGTest M
      GROUP BY M.Stylenumber, M.Date
     ) m
WHERE seqnum = 1;
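Here is a sketch of this query using Python's sqlite3 module (SQLite 3.25+ for window functions). The question does not show the original table, so the MGTest rows below are hypothetical. They are chosen so that style A gives MAX(Stock) = 10 and SUM(Stock) = 25, as described, with 15 in stock on the most recent date. Note that this query returns the latest stock total but not the Sold total.

```python
import sqlite3

# Hypothetical MGTest rows: Stock is the retailer's current stock of the
# style, repeated on each of that retailer's rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MGTest (RetailerId INTEGER, Stylenumber TEXT, Date TEXT,
                         Sold INTEGER, Stock INTEGER);
    INSERT INTO MGTest VALUES
        (1, 'A', '2023-03-01', 2, 10),
        (1, 'A', '2023-03-02', 1, 10),
        (2, 'A', '2023-03-02', 3, 5),
        (1, 'B', '2023-03-02', 1, 6);
""")

# Sum stock per (style, date), then keep only the newest date per style.
latest = conn.execute("""
    SELECT Stylenumber, Date, TotalStock
    FROM (SELECT M.Stylenumber, M.Date, SUM(M.Stock) AS TotalStock,
                 ROW_NUMBER() OVER (PARTITION BY M.Stylenumber
                                    ORDER BY M.Date DESC) AS seqnum
          FROM MGTest M
          GROUP BY M.Stylenumber, M.Date) m
    WHERE seqnum = 1
    ORDER BY Stylenumber
""").fetchall()

print(latest)  # [('A', '2023-03-02', 15), ('B', '2023-03-02', 6)]
```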
The query is a bit tricky since you want a cumulative total of the Sold column, but only the total of the Stock column for the most recent date. I didn't actually try running this, but something like the query below should work. However, because of the shape of your schema this isn't the most performant query in the world since it is scanning your table multiple times to join all of the data together:
SELECT MDate.Stylenumber, MDate.TotalSold, MStock.TotalStock
FROM (SELECT M.Stylenumber, MAX(M.Date) MostRecentDate, SUM(M.Sold) TotalSold
      FROM [MGTest] M
      GROUP BY M.Stylenumber) MDate
INNER JOIN (SELECT M.Stylenumber, M.Date, SUM(M.Stock) TotalStock
            FROM [MGTest] M
            GROUP BY M.Stylenumber, M.Date) MStock
    ON MDate.Stylenumber = MStock.Stylenumber
   AND MDate.MostRecentDate = MStock.Date;
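Here is a sketch of the join approach against hypothetical MGTest rows, run with Python's sqlite3 module (which also accepts the [MGTest] brackets). The first query reproduces the MAX/SUM problem from the question. The second returns both totals.

```python
import sqlite3

# Hypothetical rows chosen to match the numbers quoted in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MGTest (RetailerId INTEGER, Stylenumber TEXT, Date TEXT,
                         Sold INTEGER, Stock INTEGER);
    INSERT INTO MGTest VALUES
        (1, 'A', '2023-03-01', 2, 10),
        (1, 'A', '2023-03-02', 1, 10),
        (2, 'A', '2023-03-02', 3, 5),
        (1, 'B', '2023-03-02', 1, 6);
""")

# The problem: MAX(Stock) misses retailer 2, SUM(Stock) counts retailer 1 twice.
naive = conn.execute("""
    SELECT Stylenumber, SUM(Sold), MAX(Stock), SUM(Stock)
    FROM MGTest GROUP BY Stylenumber ORDER BY Stylenumber
""").fetchall()

# Join total sold and latest date per style to the stock total on that date.
result = conn.execute("""
    SELECT MDate.Stylenumber, MDate.TotalSold, MStock.TotalStock
    FROM (SELECT M.Stylenumber, MAX(M.Date) MostRecentDate, SUM(M.Sold) TotalSold
          FROM [MGTest] M
          GROUP BY M.Stylenumber) MDate
    INNER JOIN (SELECT M.Stylenumber, M.Date, SUM(M.Stock) TotalStock
                FROM [MGTest] M
                GROUP BY M.Stylenumber, M.Date) MStock
        ON MDate.Stylenumber = MStock.Stylenumber
       AND MDate.MostRecentDate = MStock.Date
    ORDER BY MDate.Stylenumber
""").fetchall()

print(naive)   # [('A', 6, 10, 25), ('B', 1, 6, 6)]
print(result)  # [('A', 6, 15), ('B', 1, 6)]
```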
If you don't want to use joins, you can do something like this:
SELECT B.Stylenumber, SUM(B.Sold), SUM(B.Stock)
FROM (SELECT Stylenumber AS 'Stylenumber', SUM(Sold) AS 'Sold', MAX(Stock) AS 'Stock'
      FROM MGTest A
      GROUP BY RetailerId, Stylenumber) B
GROUP BY B.Stylenumber
My solution, like Gordon Linoff's, uses window functions, but in my case everything revolves around the RANK window function.
SELECT stylenumber, sold, SUM(stock) totalstock
FROM (
    SELECT
        stylenumber,
        SUM(sold) OVER(PARTITION BY stylenumber) sold,
        RANK() OVER(PARTITION BY stylenumber ORDER BY [Date] DESC) r,
        stock
    FROM MGTest
) T
WHERE r = 1
GROUP BY stylenumber, sold
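Here is a sketch of the RANK approach using Python's sqlite3 module and hypothetical MGTest rows. RANK gives every row on the newest date rank 1, so the outer SUM adds up the stock of all retailers on that date. An ORDER BY is added for a stable result.

```python
import sqlite3

# Hypothetical rows chosen to match the numbers quoted in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MGTest (RetailerId INTEGER, Stylenumber TEXT, Date TEXT,
                         Sold INTEGER, Stock INTEGER);
    INSERT INTO MGTest VALUES
        (1, 'A', '2023-03-01', 2, 10),
        (1, 'A', '2023-03-02', 1, 10),
        (2, 'A', '2023-03-02', 3, 5),
        (1, 'B', '2023-03-02', 1, 6);
""")

# Per row: the style's total sold, plus the rank of the row's date
# (newest = 1, ties share rank 1); then sum stock over the rank-1 rows.
result = conn.execute("""
    SELECT stylenumber, sold, SUM(stock) totalstock
    FROM (SELECT stylenumber,
                 SUM(sold) OVER (PARTITION BY stylenumber) sold,
                 RANK() OVER (PARTITION BY stylenumber ORDER BY [Date] DESC) r,
                 stock
          FROM MGTest) T
    WHERE r = 1
    GROUP BY stylenumber, sold
    ORDER BY stylenumber
""").fetchall()

print(result)  # [('A', 6, 15), ('B', 1, 6)]
```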

SQL Query: Find the name of the company that has been assigned the highest number of patents

Using this query, I can find the company assignee number of the company with the most patents, but I can't seem to print the company name.
SELECT count(*), patent.assignee
FROM Patent
GROUP BY patent.assignee
HAVING count(*) =
(SELECT max(count(*))
FROM Patent
Group by patent.assignee);
COUNT(*) --- ASSIGNEE
9 19715
9 27895
Nesting above query into
SELECT company.compname
FROM company
WHERE ( company.assignee = ( *above query* ) );
would give the error "too many values", since there are two companies with the most patents but the query above can only supply one assignee number to the WHERE clause. How do I solve this problem? I need to print the names of BOTH companies, with assignee numbers 19715 and 27895. Thank you.
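You have started down the path of using nested queries. All you need to do is remove COUNT(*) from the subquery's select list (the "too many values" error comes from that second column) and compare with IN instead of =, since two assignees can match: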
SELECT company.compname
FROM company
WHERE company.assignee IN
      (SELECT patent.assignee
       FROM Patent
       GROUP BY patent.assignee
       HAVING count(*) = (SELECT max(count(*))
                          FROM Patent
                          GROUP BY patent.assignee
                         )
      );
I wouldn't write the query this way. The use of max(count(*)) is particularly jarring, but it is valid Oracle syntax.
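max(count(*)) is Oracle-specific, so here is a sketch of the same IN query in a more portable form. The maximum is taken over a derived table of per-assignee counts instead. It runs under Python's sqlite3 module. The company names and patent rows are hypothetical, with assignees 19715 and 27895 tied for the most patents.

```python
import sqlite3

# Hypothetical data: 19715 and 27895 each hold 3 patents, 30000 holds 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (assignee INTEGER PRIMARY KEY, compname TEXT);
    CREATE TABLE patent (id INTEGER PRIMARY KEY, assignee INTEGER);
    INSERT INTO company VALUES
        (19715, 'Acme'), (27895, 'Globex'), (30000, 'Initech');
    INSERT INTO patent (assignee) VALUES
        (19715), (19715), (19715), (27895), (27895), (27895), (30000);
""")

# MAX over a derived table of counts replaces the nested max(count(*)).
names = [row[0] for row in conn.execute("""
    SELECT company.compname
    FROM company
    WHERE company.assignee IN
          (SELECT patent.assignee
           FROM patent
           GROUP BY patent.assignee
           HAVING COUNT(*) = (SELECT MAX(cnt)
                              FROM (SELECT COUNT(*) AS cnt
                                    FROM patent
                                    GROUP BY assignee) counts))
    ORDER BY company.compname
""")]

print(names)  # ['Acme', 'Globex']
```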
Applying an aggregate function to another aggregate function (like max(count(*))) is not allowed in many databases, but I believe using the ALL operator instead, plus a join to get the company name, would solve your problem.
Try this:
SELECT COUNT(*), p.assignee, c.compname
FROM Patent p
JOIN Company c ON c.assignee = p.assignee
GROUP BY p.assignee, c.compname
HAVING COUNT(*) >= ALL -- this predicate will return those rows
( -- for which the comparison holds true
SELECT COUNT(*) -- for all instances.
FROM Patent -- it can only be true for the highest count
GROUP BY assignee
);
Assuming you have Oracle, I thought about this a bit differently:
select
c.compname
from
company c
join
(
select
assignee,
dense_rank() over (order by count(1) desc) rnk
from
patent
group by
assignee
) p
on p.assignee = c.assignee
where
p.rnk = 1
;
I like this because it lets you find any rank. For example, if you want the top 3, you would just change p.rnk = 1 to p.rnk <= 3. If you want 10th place, you just change it to p.rnk = 10. Adding the total count and rank to the results would be easy from here, too. Overall, I think it's more versatile.
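Here is a sketch of the DENSE_RANK idea using Python's sqlite3 module, showing how the rank filter turns into a top-N query. There is one deliberate difference: the counts are computed in an inner derived table before DENSE_RANK is applied, which keeps the sketch portable. In Oracle, the single-level form above works as written. The function name top_companies and the data are hypothetical.

```python
import sqlite3

# Hypothetical data: 19715 and 27895 each hold 3 patents, 30000 holds 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (assignee INTEGER PRIMARY KEY, compname TEXT);
    CREATE TABLE patent (id INTEGER PRIMARY KEY, assignee INTEGER);
    INSERT INTO company VALUES
        (19715, 'Acme'), (27895, 'Globex'), (30000, 'Initech');
    INSERT INTO patent (assignee) VALUES
        (19715), (19715), (19715), (27895), (27895), (27895), (30000);
""")

def top_companies(n):
    """Companies whose patent count is among the n highest distinct counts."""
    return conn.execute("""
        SELECT c.compname, p.cnt, p.rnk
        FROM company c
        JOIN (SELECT assignee, cnt,
                     DENSE_RANK() OVER (ORDER BY cnt DESC) AS rnk
              FROM (SELECT assignee, COUNT(*) AS cnt
                    FROM patent
                    GROUP BY assignee) counts) p
          ON p.assignee = c.assignee
        WHERE p.rnk <= ?
        ORDER BY p.rnk, c.compname
    """, (n,)).fetchall()

print(top_companies(1))  # [('Acme', 3, 1), ('Globex', 3, 1)]
print(top_companies(2))  # [('Acme', 3, 1), ('Globex', 3, 1), ('Initech', 1, 2)]
```

Because DENSE_RANK leaves no gaps after ties, rnk <= 2 here means "the two highest distinct counts", not "two companies".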

Highest Record for a set user

Hope someone can help.
I have been trying a few queries but I do not seem to be getting the desired result.
I need to identify the highest "claimed" users within my table without discarding the columns from the final report.
The user can have more than one record in the table, however the data will be completely different as only the user will match.
The below query only provides me the count per user without giving me the details.
SELECT User, count (*) total_record
FROM mytable
GROUP BY User
ORDER BY count(*) desc
Table:
mytable
Column 1 = User, Column 2 = Ref Number, Column 3 = Date
The first column is the identifier, but the data in the other columns will differ. The result therefore needs to list the users in descending order of their record count, from the highest claimed user (with all of that user's rows) down to the least claimed user.
User|Ref Num|Date
1|a|20150317
1|b|20150317
2|c|20150317
3|d|20150317
4|e|20150317
1|f|20150317
4|e|20150317
The below data is how the values should be returned.
User|Ref Num|Date|Count
1|a|20150317|3
1|b|20150317|3
1|f|20150317|3
2|c|20150317|1
3|d|20150317|1
4|e|20150317|2
4|e|20150317|2
Hope it makes sense.
Thank you
As you're using MSSQL you can use the OVER() clause like so:
SELECT [user], mt.ref_num, mt.[date], COUNT(mt.[user]) OVER(PARTITION BY mt.[user])
FROM myTable mt
More about the OVER clause can be found here: https://msdn.microsoft.com/en-us/library/ms189461.aspx
As per your comment you can use the wildcard * like so:
SELECT mt.*, COUNT(mt.[user]) OVER(PARTITION BY mt.[user])
FROM myTable mt
This would get you every column as well as the result of the count.
If you want to order by the number of record for each user, then use window functions instead of aggregation:
SELECT t.*
FROM (SELECT t.*, count(*) OVER (partition by [user]) as cnt
      FROM mytable t
     ) t
ORDER BY cnt DESC, [user];
Note that I added user to the order by so users with the same count will appear together in the list.
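Here is a minimal sketch of this window-function query, run through Python's sqlite3 module (SQLite 3.25+) on the sample rows from the question. The column names ref_num and date are assumptions standing in for Ref Num and Date. ref_num is added to the ORDER BY only to make the row order within each user deterministic.

```python
import sqlite3

# The sample rows from the question.
rows = [(1, 'a', '20150317'), (1, 'b', '20150317'), (2, 'c', '20150317'),
        (3, 'd', '20150317'), (4, 'e', '20150317'), (1, 'f', '20150317'),
        (4, 'e', '20150317')]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable ([user] INTEGER, ref_num TEXT, [date] TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

# COUNT(*) OVER attaches each user's row count to every row of that user,
# without collapsing the rows the way GROUP BY would.
result = conn.execute("""
    SELECT t.*
    FROM (SELECT t.*, COUNT(*) OVER (PARTITION BY t.[user]) AS cnt
          FROM mytable t) t
    ORDER BY cnt DESC, [user], ref_num
""").fetchall()

for r in result:
    print(r)
```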
You could use an outer apply if your version of SQL Server supports it:
SELECT [User], [Ref Num], Date, total_record
FROM mytable M
OUTER APPLY (
SELECT count(*) total_record
FROM mytable
WHERE [user] = M.[user]
GROUP BY [user]
) oa
ORDER BY total_record desc, [user]
Note that user is a reserved keyword in MSSQL and you need to enclose it in either brackets [user] or double-quotes "user".
This would produce an output like:
user|Ref Num|Date|total_record
1|a|2015-03-17|3
1|b|2015-03-17|3
1|f|2015-03-17|3
4|e|2015-03-17|2
4|e|2015-03-17|2
2|c|2015-03-17|1
3|d|2015-03-17|1
Note that the answers using the count(*) OVER (partition by [user]) construct are more efficient though.
The simplest way is to use a window function:
SELECT t.*, COUNT(*) OVER (PARTITION BY t.[user]) AS total_record
FROM nameoftable t -- t is an alias
ORDER BY t.[user], t.ref_num
This also seems to fit your need.
This is the old way of doing it. Where possible you should use OVER but as other people have answered with that I thought I'd throw this one into the mix.
SELECT
T.[User]
,T.[Ref Num]
,T.[Date]
,(SELECT count(*) from [myTable] T2 where T2.[User] = T.[USER]) as [Count]
FROM [mytable] T
ORDER BY [Count] DESC
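The correlated-subquery version can be checked the same way. Here is a minimal sketch using Python's sqlite3 module and the sample rows from the question; SQLite accepts the [bracket] names, including [Ref Num]. T.[User] and T.[Ref Num] are added to the ORDER BY only to make the output order deterministic.

```python
import sqlite3

# The sample rows from the question.
rows = [(1, 'a', '20150317'), (1, 'b', '20150317'), (2, 'c', '20150317'),
        (3, 'd', '20150317'), (4, 'e', '20150317'), (1, 'f', '20150317'),
        (4, 'e', '20150317')]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable ([User] INTEGER, [Ref Num] TEXT, [Date] TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

# The subquery is re-evaluated for each outer row, counting that user's rows.
result = conn.execute("""
    SELECT T.[User], T.[Ref Num], T.[Date],
           (SELECT COUNT(*) FROM myTable T2 WHERE T2.[User] = T.[User]) AS [Count]
    FROM mytable T
    ORDER BY [Count] DESC, T.[User], T.[Ref Num]
""").fetchall()

for r in result:
    print(r)
```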

Get records with the newest date in Oracle

I need to find the email of the last person who performed an action on a post. The database structure is a little complicated, for several reasons that don't matter here.
SELECT u.address
FROM text t
JOIN post p ON (p.pid=t.pid)
JOIN node n ON (n.nid=p.nid)
JOIN user u ON (t.login=u.login)
WHERE n.nid='123456'
AND p.created IN (
    SELECT max(p.created)
    FROM text t
    JOIN post p ON (p.pid=t.pid)
    JOIN node n ON (n.nid=p.nid)
    WHERE n.nid='123456');
I would like to know if there is a way to use the max function, or any other way to get the latest date, without writing a subquery that is almost the same as the main query.
Thank you very much
You can use a window function (aka "analytical" function) to calculate the max date.
Then you can select all rows where the created date equals the max. date.
select address
from (
    SELECT u.address,
           p.created,
           max(p.created) over () as max_date
    FROM text t
    JOIN post p ON (p.pid=t.pid)
    JOIN node n ON (n.nid=p.nid)
    JOIN user u ON (t.login=u.login)
    WHERE n.nid='123456'
) t
where created = max_date;
The over() clause is empty because you didn't use a GROUP BY in your question. But if you need, e.g., the max date per address, then you could use
max(p.created) over (partition by u.address) as max_date
The partition by works like a group by, except that the rows are not collapsed: every row keeps its own columns plus the maximum of its partition.
You can also extend that query to work for more than one n.nid. In that case, you have to include it in the partition:
max(p.created) over (partition by n.nid, ....) as max_date
Btw: if n.nid is a numeric column, you should not compare it to a string literal. '123456' is a string, 123456 is a number.
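Here is a sketch of the MAX() OVER () approach and its per-node variant, using Python's sqlite3 module (SQLite 3.25+). The table layouts and rows are hypothetical and reduced to the columns the query uses. Node '123456' has two posts, the newer one by bob, and node '999' has a later post by carol.

```python
import sqlite3

# Hypothetical schema and rows; nid is TEXT to match the '123456' literal.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (nid TEXT PRIMARY KEY);
    CREATE TABLE post (pid INTEGER PRIMARY KEY, nid TEXT, created TEXT);
    CREATE TABLE text (pid INTEGER, login TEXT);
    CREATE TABLE user (login TEXT PRIMARY KEY, address TEXT);
    INSERT INTO node VALUES ('123456'), ('999');
    INSERT INTO post VALUES (1, '123456', '2015-01-01 10:00'),
                            (2, '123456', '2015-02-01 09:00'),
                            (3, '999', '2015-03-01 08:00');
    INSERT INTO text VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO user VALUES ('alice', 'alice@example.com'),
                            ('bob', 'bob@example.com'),
                            ('carol', 'carol@example.com');
""")

# One node: OVER () computes the max over all rows left after the WHERE.
latest_one = [r[0] for r in conn.execute("""
    SELECT address
    FROM (SELECT u.address, p.created,
                 MAX(p.created) OVER () AS max_date
          FROM text t
          JOIN post p ON (p.pid = t.pid)
          JOIN node n ON (n.nid = p.nid)
          JOIN user u ON (t.login = u.login)
          WHERE n.nid = '123456') t
    WHERE created = max_date
""")]

# Every node: PARTITION BY n.nid computes a separate max per node.
latest_per_node = [r[0] for r in conn.execute("""
    SELECT address
    FROM (SELECT u.address, p.created,
                 MAX(p.created) OVER (PARTITION BY n.nid) AS max_date
          FROM text t
          JOIN post p ON (p.pid = t.pid)
          JOIN node n ON (n.nid = p.nid)
          JOIN user u ON (t.login = u.login)) t
    WHERE created = max_date
    ORDER BY address
""")]

print(latest_one)       # ['bob@example.com']
print(latest_per_node)  # ['bob@example.com', 'carol@example.com']
```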
SELECT address
FROM (
    SELECT u.address,
           row_number() OVER (PARTITION BY n.nid ORDER BY p.created DESC) AS rn
    FROM text t
    JOIN post p ON (p.pid=t.pid)
    JOIN node n ON (n.nid=p.nid)
    JOIN user u ON (t.login=u.login)
    WHERE n.nid='123456'
)
WHERE rn = 1;
The ROW_NUMBER function numbers the rows in descending order of p.created, and PARTITION BY n.nid restarts the numbering for each n.nid, so rn = 1 is the newest row for each node.
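This sketch uses Python's sqlite3 module and hypothetical data to confirm that rn = 1 picks bob's newer post on node '123456'. There is one difference from the MAX() OVER () approach. If two posts share the newest created value, ROW_NUMBER keeps only one of them (which one is unspecified), whereas comparing against the max returns both.

```python
import sqlite3

# Hypothetical schema and rows; nid is TEXT to match the '123456' literal.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (nid TEXT PRIMARY KEY);
    CREATE TABLE post (pid INTEGER PRIMARY KEY, nid TEXT, created TEXT);
    CREATE TABLE text (pid INTEGER, login TEXT);
    CREATE TABLE user (login TEXT PRIMARY KEY, address TEXT);
    INSERT INTO node VALUES ('123456'), ('999');
    INSERT INTO post VALUES (1, '123456', '2015-01-01 10:00'),
                            (2, '123456', '2015-02-01 09:00'),
                            (3, '999', '2015-03-01 08:00');
    INSERT INTO text VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO user VALUES ('alice', 'alice@example.com'),
                            ('bob', 'bob@example.com'),
                            ('carol', 'carol@example.com');
""")

# Number each node's rows newest-first and keep the first one.
latest = [r[0] for r in conn.execute("""
    SELECT address
    FROM (SELECT u.address,
                 ROW_NUMBER() OVER (PARTITION BY n.nid
                                    ORDER BY p.created DESC) AS rn
          FROM text t
          JOIN post p ON (p.pid = t.pid)
          JOIN node n ON (n.nid = p.nid)
          JOIN user u ON (t.login = u.login)
          WHERE n.nid = '123456') numbered
    WHERE rn = 1
""")]

print(latest)  # ['bob@example.com']
```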