Find row that has the largest sum


I have data for tutors. I recorded hours spent tutoring by month in the SESSION table. I need to know who had the most tutoring hours in March, 2006.
TABLE TUTOR
tutorID
-------
1
2
TABLE SESSION
tutorID | Hours | Month
--------|-------|------
1       | 2     | March
1       | 1     | March
2       | 1     | March
Expected Output:
TutorID
-------
1

I would suggest:
select top 1 sum(Hours), tutorID
from SESSION
where Month like 'March'
group by tutorID
order by sum(Hours) DESC

Use two CTEs.
The first returns the sum of hours for each tutor.
The second returns the maximum of the sums returned by the first CTE.
Finally, the select statement returns only the tutors from the first CTE whose sum of hours equals that maximum, so tutors tied for the most hours are all returned.
with
sumcte as (
select tutorID, sum(hours) sumhours
from session
where month = 'March' -- here there should be another condition for the year?
group by tutorID
),
maxcte as (
select max(sumhours) maxhours from sumcte
)
select tutorid from sumcte
where sumhours = (select maxhours from maxcte)
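The two-CTE query above can be tried end to end with a small sketch like the one below. It uses Python's built-in sqlite3 module rather than the asker's database, and it adds a hypothetical tutor 3 (not in the original data) who ties with tutor 1 in March, to show that the equality filter returns every tied tutor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session (tutorID INTEGER, hours INTEGER, month TEXT);
    INSERT INTO session VALUES
        (1, 2, 'March'), (1, 1, 'March'), (2, 1, 'March'),
        -- tutor 3 is a hypothetical addition that creates a tie in March
        (3, 3, 'March'), (3, 5, 'April');
""")

query = """
WITH sumcte AS (
    SELECT tutorID, SUM(hours) AS sumhours
    FROM session
    WHERE month = 'March'
    GROUP BY tutorID
),
maxcte AS (
    SELECT MAX(sumhours) AS maxhours FROM sumcte
)
SELECT tutorID
FROM sumcte
WHERE sumhours = (SELECT maxhours FROM maxcte)
ORDER BY tutorID
"""

# Every tutor whose March total equals the maximum March total
top_tutors = [row[0] for row in conn.execute(query)]
print(top_tutors)
```

With a "top 1" style query only one of the tied tutors would come back, which is the main practical difference between the two answers.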

Related

Finding the average when values are missing using SQL

I'm using Presto but any flavor of SQL will do.
I have a table in that format.
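Group_id | event_id | month | party    | time_interval
---------|----------|-------|----------|--------------
1        | 1        | Jan   | Player A | 1 hour
1        | 1        | Jan   | Player A | 2 hours
1        | 1        | Jan   | Player B | 1 hour
1        | 1        | Jan   | Player B | 1 hour
1        | 2        | Jan   | Player A | 3 hours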
I need to get the average per group_id, per month, per party.
Here's how my average should be calculated:
(total number of hours per group, per month, per party) / (total number of events per group, per month)
Here's the output I expect, for clarity's sake:
Group_id | month | party    | avg_time_interval
---------|-------|----------|------------------
1        | Jan   | Player A | 3 hours
1        | Jan   | Player B | 1 hour
Now here's the tricky part. For the first row everything makes perfect sense. We have 6 hours across both events, which we divide by 2 distinct events to get an average of 3.
For the 2nd row, however, we get 1 hour instead of 2. Player B has no time recorded for the second event, so we should assume that the interval there was 0. This means there are still 2 unique events for that group_id and month, so the 2 hours total should be divided by 2 and not by 1.
The missing data makes this much more complicated than it should be. Otherwise, I believe running the following would have solved it:
SELECT Group_id , month, party, total/num_cases FROM(
SELECT Group_id , month, party, SUM(time_interval) AS total, COUNT(DISTINCT(event_id)) AS num_cases
FROM table
GROUP BY Group_id , month, party
)

You can find the count of distinct event_id values grouped by group_id and month, then join this with your table as follows:
SELECT T.Group_id, T.month, T.party
,SUM(T.time_interval)*1.0/ MAX(D.eid) AS avg_time_interval
FROM tbl T
JOIN
(
SELECT Group_id, month,
COUNT(DISTINCT event_id) AS eid
FROM tbl GROUP BY Group_id, month
) D
ON T.Group_id=D.Group_id AND
T.month=D.month
GROUP BY T.Group_id,T.month,T.party
ORDER BY T.Group_id,T.month,T.party
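As a rough check of the join approach above, the sketch below loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module. It assumes time_interval is stored as a plain number of hours (the question shows it as text such as "1 hour"), and the table name tbl is taken from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (
        group_id INTEGER, event_id INTEGER, month TEXT,
        party TEXT, time_interval REAL  -- assumed: hours as a number
    );
    INSERT INTO tbl VALUES
        (1, 1, 'Jan', 'Player A', 1),
        (1, 1, 'Jan', 'Player A', 2),
        (1, 1, 'Jan', 'Player B', 1),
        (1, 1, 'Jan', 'Player B', 1),
        (1, 2, 'Jan', 'Player A', 3);
""")

query = """
SELECT T.group_id, T.month, T.party,
       SUM(T.time_interval) * 1.0 / MAX(D.eid) AS avg_time_interval
FROM tbl T
JOIN (
    -- distinct events per group and month, regardless of party
    SELECT group_id, month, COUNT(DISTINCT event_id) AS eid
    FROM tbl
    GROUP BY group_id, month
) D
  ON T.group_id = D.group_id AND T.month = D.month
GROUP BY T.group_id, T.month, T.party
ORDER BY T.group_id, T.month, T.party
"""

averages = conn.execute(query).fetchall()
for row in averages:
    print(row)
```

Player B's 2 hours are divided by the 2 events counted for the whole group and month, which is what treats the missing event as a zero.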

select distinct Group_id
,month
,party
,total_hours_per_party/max(dns_rnk) over() as avg_time_interval
from (
select Group_id
,month
,party
,sum(time_interval) over(partition by party) as total_hours_per_party
,dense_rank() over(order by event_id) as dns_rnk
from t
) t
Group_id | month | party    | avg_time_interval
---------|-------|----------|------------------
1        | Jan   | Player A | 3
1        | Jan   | Player B | 1

Retrieve Customers with a Monthly Order Frequency greater than 4

I am trying to optimize the query below so that it fetches all customers who have had a monthly order frequency of 4 or more in each of the past three months.
Customer ID | Feb | Mar | Apr
------------|-----|-----|----
0001        | 4   | 5   | 6
0002        | 3   | 2   | 4
0003        | 4   | 2   | 3
In the above table, only the customer with Customer ID 0001 should be picked, as they consistently have 4 or more orders in a month.
Below is the query I have written. It pulls all customers with an average purchase frequency of 4 over the last 90 days, but it does not check that there were consistently 4 or more purchases in each of the last three months.
Query:
SELECT distinct lines.customer_id Customer_ID, (COUNT(lines.order_id)/90) PurchaseFrequency
from fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE LOWER(product.country_code)= "IN"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY Customer_ID
HAVING PurchaseFrequency >=4;
I tried to use window functions, but I'm not sure whether they are needed in this case.

I would count the orders per month instead of computing the average, and then keep the customers whose count is at least 4 in each of the last three months.
I also think you should select your interval using "month(CURRENT_DATE()) - 3" instead of a 90-day window. If needed, handle the case where the current date falls in January, February, or March, in which case you have to go back to October, November, or December of the previous year.
I'm not familiar with Google BigQuery, so I can't write your query, but I hope this helps.
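The idea described above (count per month, then keep customers who reach the threshold in every month) can be sketched as follows. This is not BigQuery: it runs on SQLite via Python's sqlite3 module, the orders table and its columns are hypothetical stand-ins, and a fixed February to April 2022 range is assumed instead of deriving the months from the current date.

```python
import sqlite3

# Orders per customer and month, copied from the table in the question
monthly_orders = {
    "0001": {"02": 4, "03": 5, "04": 6},
    "0002": {"02": 3, "03": 2, "04": 4},
    "0003": {"02": 4, "03": 2, "04": 3},
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id TEXT, order_id INTEGER, order_date TEXT)"
)
order_id = 0
for customer, months in monthly_orders.items():
    for month, n_orders in months.items():
        for _ in range(n_orders):
            order_id += 1
            conn.execute(
                "INSERT INTO orders VALUES (?, ?, ?)",
                (customer, order_id, f"2022-{month}-15"),
            )

query = """
WITH per_month AS (
    SELECT customer_id,
           strftime('%Y-%m', order_date) AS order_month,
           COUNT(order_id) AS n_orders
    FROM orders
    WHERE order_date >= '2022-02-01' AND order_date < '2022-05-01'
    GROUP BY customer_id, strftime('%Y-%m', order_date)
)
SELECT customer_id
FROM per_month
WHERE n_orders >= 4          -- keep only months that reach the threshold
GROUP BY customer_id
HAVING COUNT(*) = 3          -- all three months must qualify
ORDER BY customer_id
"""

consistent = [row[0] for row in conn.execute(query)]
print(consistent)
```

Grouping on year and month together, rather than on the month number alone, avoids mixing months from different years when the window crosses a year boundary.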

I found a solution using a WITH clause, as below:
WITH filtered_orders AS (
    select
        distinct customer_id ID,
        extract(MONTH from date) Order_Month,
        count(order_id) CountofOrders
    from customer_order_lines lines
    where EXTRACT(YEAR FROM date) = 2022 AND EXTRACT(MONTH FROM date) IN (2,3,4)
    group by ID, Order_Month
    having CountofOrders>=4)
select distinct ID
from filtered_orders
group by ID
having count(Order_Month) =3;

One option is to first count the orders by month and then keep only the customers whose count is at or above your threshold in every month:
WITH ORDERS_BY_MONTH AS (
    SELECT
        DATE_TRUNC(lines.date, MONTH) PurchaseMonth,
        lines.customer_id Customer_ID,
        COUNT(lines.order_id) PurchaseFrequency
    FROM fct_customer_order_lines lines
    LEFT JOIN product_table product
        ON lines.entity_id= product.entity_id
        AND lines.vendor_id= product.vendor_id
    -- UPPER here: LOWER(...) can never equal the uppercase literal "IN"
    WHERE UPPER(product.country_code)= "IN"
        AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
        AND lines.date < CURRENT_DATE()
    GROUP BY PurchaseMonth, Customer_ID
)
SELECT
    Customer_ID,
    AVG(PurchaseFrequency) AvgPurchaseFrequency
FROM ORDERS_BY_MONTH
GROUP BY Customer_ID
HAVING COUNT(1) = COUNTIF(PurchaseFrequency >= 4)

SQL Grouping with No Duplicates

Here is the output. No problem here; it is exactly what I want. I added DISTINCT ID to remove duplicates, and that works within each grouped month.
MN | CNT
====================
1 | 1
10 | 2
11 | 5
12 | 5
SELECT EXTRACT(MONTH FROM TRUNC(HDATE)) as MN, COUNT(DISTINCT ID) as CNT
FROM Schema.TRAVEL
WHERE (ARR = '2' OR ARR = '3')
AND
HDATE BETWEEN to_date('2015-10-01', 'yyyy-mm-dd') AND to_date('2016-09-30', 'yyyy-mm-dd')
GROUP BY EXTRACT(MONTH FROM TRUNC(HDATE));
But I can still have duplicates that span more than one month. If I have a record in October and another in November with the same ID, I want to count it only once. That is my issue.
So over the course of a year, or any other time period, an ID gets counted only once, but I still need to keep the monthly groupings and output. How can I do that?

In other words, you want to count each id in the first month where it appears.
SELECT EXTRACT(MONTH FROM TRUNC(HDATE)) as MN, COUNT(DISTINCT ID) as CNT
FROM (SELECT id, MIN(HDATE) as HDATE
      FROM Schema.TRAVEL t
      WHERE ARR IN ('2', '3') AND
            HDATE BETWEEN DATE '2015-10-01' AND DATE '2016-09-30'
      GROUP BY id
     ) t
GROUP BY EXTRACT(MONTH FROM TRUNC(HDATE));
Note: If an id appears before '2015-10-01', this will still count the id in the first month it appears after that date. If you don't want such an id counted at all, move the HDATE comparison to the outer query.
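The "count each id in the first month where it appears" idea can be illustrated with the sketch below. It runs on SQLite through Python's sqlite3 module instead of Oracle, so strftime stands in for EXTRACT, dates are ISO strings, and the travel rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE travel (id INTEGER, arr TEXT, hdate TEXT);
    INSERT INTO travel VALUES
        (1, '2', '2015-10-05'), (1, '3', '2015-11-10'),  -- first seen in October
        (2, '2', '2015-11-01'), (2, '2', '2015-11-20'),  -- first seen in November
        (3, '3', '2016-01-15'),                          -- first seen in January
        (4, '1', '2015-10-07'), (4, '2', '2015-12-01');  -- October row filtered out by arr
""")

query = """
SELECT CAST(strftime('%m', first_hdate) AS INTEGER) AS mn,
       COUNT(*) AS cnt
FROM (
    -- one row per id, carrying the earliest qualifying date
    SELECT id, MIN(hdate) AS first_hdate
    FROM travel
    WHERE arr IN ('2', '3')
      AND hdate BETWEEN '2015-10-01' AND '2016-09-30'
    GROUP BY id
) AS firsts
GROUP BY CAST(strftime('%m', first_hdate) AS INTEGER)
ORDER BY mn
"""

first_month_counts = conn.execute(query).fetchall()
print(first_month_counts)
```

Because the inner query already yields one row per id, a plain COUNT(*) is enough in the outer query; the DISTINCT is no longer doing any work.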

Get the first occurrence of the result in each specified group

I have this query in SQL Server 2012:
select sum(user_number),
       sum(media_number),
       month_name
from (
    select TOP 100
        count(distinct a.answer_group_guid) as 'user_number',
        count(distinct a.media_guid) as 'media_number',
        datename(mm,answer_datetime) as 'month_name',
        year(answer_datetime) as 'year'
    from tb_answers as a
    left outer join tb_media as m
        ON m.user_guid = 'userguid' and m.media_guid=a.media_guid
    where m.user_guid = 'userguid'
    group by concat(year(answer_datetime),'',month(answer_datetime)),datename(mm,answer_datetime),year(answer_datetime)
    order by year(answer_datetime) desc
) as aa
group by month_name,year
order by month_name desc,year desc;
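It gets this result:
user_number | media_number | month_name
------------|--------------|-----------
5           | 1            | September
2           | 1            | October
1           | 1            | October
1           | 1            | August
But I need only the first occurrence of the October month, as:
user_number | media_number | month_name
------------|--------------|-----------
5           | 1            | September
2           | 1            | October
1           | 1            | August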

You simply need to use a ranking function like ROW_NUMBER(). Use it to number the records, partitioning by month_name, and select only the records that are number 1 in each partition.
Add this to the select list of your query:
ROW_NUMBER() OVER(PARTITION BY month_name ORDER BY XXX) as RowNumber
This numbers the rows that share the same month_name consecutively, starting at 1, in the order specified by XXX.
NOTE: choose the order in XXX to decide which row of each month is number 1 and is therefore returned by the query.
Then select from the resulting query, filtering by RowNumber = 1:
SELECT Q.user_number, Q.media_number, Q.month_name
FROM (
    -- your query + RowNumber
) Q
WHERE Q.RowNumber = 1
NOTE: if you need some ordering in your result, move the ORDER BY out of the subselect and write it after the WHERE Q.RowNumber = 1
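A minimal sketch of the ROW_NUMBER() pattern described above, run on SQLite (3.25 or later, for window function support) via Python's sqlite3 module. The monthly table stands in for the asker's aggregated subquery, the year values are assumptions since the question does not show them, and "most recent year first" is only one possible choice for XXX.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- stand-in for the aggregated result in the question; years are assumed
    CREATE TABLE monthly (user_number INTEGER, media_number INTEGER,
                          month_name TEXT, year INTEGER);
    INSERT INTO monthly VALUES
        (5, 1, 'September', 2016),
        (2, 1, 'October',   2016),
        (1, 1, 'October',   2015),
        (1, 1, 'August',    2016);
""")

query = """
SELECT Q.user_number, Q.media_number, Q.month_name
FROM (
    SELECT user_number, media_number, month_name,
           ROW_NUMBER() OVER (PARTITION BY month_name
                              ORDER BY year DESC) AS RowNumber
    FROM monthly
) Q
WHERE Q.RowNumber = 1
ORDER BY Q.user_number DESC
"""

first_per_month = conn.execute(query).fetchall()
for row in first_per_month:
    print(row)
```

Only the 2016 October row survives, because it is numbered 1 within the October partition.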

SQL Query to fetch number of employees joined over a calendar year, broken down per month

I'm trying to find the number of employees who joined over a calendar year, broken down on a monthly basis. So if 15 employees joined in January, 30 in February, and so on, the output I'd like would be:
Month | Employees
------|-----------
Jan | 15
Feb | 30
I've come up with a query to fetch it for a particular month
SELECT * FROM (
SELECT COUNT(EMP_NO), EMP_JN_DT
FROM EMP_REG WHERE
EMP_JN_DT between '01-NOV-09' AND '30-NOV-09'
GROUP BY EMP_JN_DT )
ORDER BY 2
How do I extend this for the full calendar year?

SELECT Trunc(EMP_JN_DT,'MM') Emp_Jn_Mth,
Count(*)
FROM EMP_REG
WHERE EMP_JN_DT between date '2009-01-01' AND date '2009-12-31'
GROUP BY Trunc(EMP_JN_DT,'MM')
ORDER BY 1;
If nobody joined in a particular month, no row is returned for that month. To overcome this, you'd have to outer join the above to a list of months in the required year.
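The outer join to a list of months could look roughly like the sketch below. It is written for SQLite (run through Python's sqlite3 module), not Oracle: the month list comes from a recursive CTE, strftime replaces Trunc, and the emp_reg rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp_reg (emp_no INTEGER, emp_jn_dt TEXT);
    INSERT INTO emp_reg VALUES
        (1, '2009-01-05'), (2, '2009-01-20'),
        (3, '2009-03-11'),
        (4, '2010-01-02');  -- outside 2009, must not be counted
""")

query = """
WITH RECURSIVE months(m) AS (
    SELECT 1
    UNION ALL
    SELECT m + 1 FROM months WHERE m < 12
)
SELECT months.m, COUNT(e.emp_no) AS employees
FROM months
LEFT JOIN emp_reg e
       ON CAST(strftime('%m', e.emp_jn_dt) AS INTEGER) = months.m
      AND e.emp_jn_dt BETWEEN '2009-01-01' AND '2009-12-31'
GROUP BY months.m
ORDER BY months.m
"""

# Maps month number (1..12) to the number of joiners in 2009
joined_per_month = dict(conn.execute(query).fetchall())
print(joined_per_month)
```

The year filter sits in the ON clause rather than in WHERE, so months with no joiners keep their row and COUNT(e.emp_no) reports 0 for them.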

SELECT to_char(EMP_JN_DT,'MON') "Month", COUNT(EMP_NO) "Employees"
FROM EMP_REG
WHERE EMP_JN_DT between date '2009-01-01' AND date '2009-12-31'
GROUP BY to_char(EMP_JN_DT,'MON')
ORDER BY MIN(EMP_JN_DT);

http://www.techonthenet.com/oracle/functions/extract.php
There is a function, EXTRACT, that returns the month. You just need to put it in the GROUP BY.

The number of employees in January can be selected in the following way:
SELECT EXTRACT(MONTH FROM HIREDATE) AS MONTH1, COUNT(*)
FROM employee
WHERE EXTRACT(MONTH FROM HIREDATE)=1
GROUP BY EXTRACT(MONTH FROM HIREDATE)
