SQL: Running total by record groups

I'm trying to create a running total based on groups of records. My data looks like this:
Country | Genre | Month | Amt
A       | X     | 1     | 5
A       | X     | 2     | 3
B       | X     | 1     | 8
B       | Y     | 1     | 10
A       | X     | 1     | 12
I'd like to do a running sum on Amt grouped by unique occurrences of Genre and Month for each Country. So my desired output is:
Country | Genre | Month | Amt | RunAmt
A       | X     | 1     | 5   | 5
A       | X     | 2     | 3   | 3
B       | X     | 1     | 8   | 8
B       | Y     | 1     | 10  | 10
A       | X     | 1     | 12  | 17
The last row gets 17 because Genre = X and Month = 1 has occurred previously for country A.
My attempt has been sum(amt) over (partition by country, genre, month order by month rows between unbounded preceding and current row), but it seems to do a running sum over every row, not over unique occurrences by country. Any help would be appreciated.

Try this
SELECT *,
       Sum(amt)
         OVER(
           partition BY country, genre, month
           ORDER BY amt) as RunAmt
FROM   ( VALUES ('A','X',1,5),
                ('A','X',2,3),
                ('B','X',1,8),
                ('B','Y',1,10),
                ('A','X',1,12) ) tc(Country, Genre, Month, Amt)

Try this..
select *,
       sum(amt) over
       (
         partition by country, Genre, month order by amt
       ) as RunAmt
from TableName
Here is the SQLFiddle for you that I tried.
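Note that ordering by amt only reproduces the desired output because the smaller amount happens to come first in the sample; the table has no column that records row order. Below is a minimal sketch of the same window sum ordered by an explicit sequence column instead. The seq column and the sales table name are assumptions (neither is in the question), and the query is run through Python's sqlite3 module only to keep the example self-contained; window functions need SQLite 3.25 or later.

```python
import sqlite3

def running_totals():
    """Running sum of amt per (country, genre, month), in insertion order."""
    conn = sqlite3.connect(":memory:")
    # seq is a hypothetical column that records the original row order.
    conn.execute(
        "CREATE TABLE sales (seq INTEGER, country TEXT, genre TEXT, "
        "month INTEGER, amt INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
        [
            (1, "A", "X", 1, 5),
            (2, "A", "X", 2, 3),
            (3, "B", "X", 1, 8),
            (4, "B", "Y", 1, 10),
            (5, "A", "X", 1, 12),
        ],
    )
    rows = conn.execute(
        """
        SELECT country, genre, month, amt,
               SUM(amt) OVER (PARTITION BY country, genre, month
                              ORDER BY seq
                              ROWS BETWEEN UNBOUNDED PRECEDING
                                       AND CURRENT ROW) AS run_amt
        FROM sales
        ORDER BY seq
        """
    ).fetchall()
    conn.close()
    return rows

for row in running_totals():
    print(row)
```

With a real ordering column, the result no longer depends on how the amounts happen to compare within a group.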

Related

Extract column from SQL table based on another column in the same table

I'm using PostgreSQL.
The PURCHASES table looks like this:
ID | CUSTOMER_ID | YEAR
1  | 1           | 2011
2  | 2           | 2012
3  | 2           | 2012
4  | 1           | 2013
5  | 3           | 2014
6  | 3           | 2014
7  | 3           | 2015
I need to extract the ID of the purchase with the latest date/year for each customer.
For example, for CUSTOMER_ID 1 the year is 2013, which corresponds to ID 4.
I need to get ONE column as the returned data structure.
PS. I'm stuck on this seemingly simple task.
If you want one row per customer, you can use distinct on:
select distinct on (customer_id) id
from purchases
order by customer_id, year desc;
This returns one column which is an id from the most recent year for that customer.
This should work, but doesn't look too pretty...
SELECT DISTINCT ON(CUSTOMER_ID) ID FROM PURCHASES P
WHERE (CUSTOMER_ID,YEAR) =
(SELECT CUSTOMER_ID,MAX(YEAR) FROM PURCHASES WHERE CUSTOMER_ID = P.CUSTOMER_ID
GROUP BY CUSTOMER_ID);
So for the input
ID | CUSTOMER_ID | YEAR
1  | 1           | 2011
2  | 2           | 2012
3  | 2           | 2012
4  | 1           | 2013
5  | 3           | 2014
6  | 3           | 2014
7  | 3           | 2015
It will return
id
4
2
7
Meaning:
For the lowest CUSTOMER_ID (1), the id is 4 (year 2013).
Next, for CUSTOMER_ID 2, the id is 2 (year 2012).
Lastly, for CUSTOMER_ID 3, the id is 7 (year 2015).
The idea behind this:
1. Group by CUSTOMER_ID.
2. For each group, select max(year).
3. While looping over all records, if CUSTOMER_ID and YEAR equal those from step 2, select the ID from this record.
Without DISTINCT ON(CUSTOMER_ID) it would return 2 records for CUSTOMER_ID = 2, because both of its rows have the maximum year 2012.
If you replace the first line
SELECT DISTINCT ON(CUSTOMER_ID) ID FROM PURCHASES P
with
SELECT DISTINCT ON(CUSTOMER_ID) * FROM PURCHASES P
you will see the whole selected row for each customer, which makes this clearer.
Use the row_number() analytic function with partition by customer_id, ordering by year descending. With ID added as a tie-breaker, when a customer has several rows in the latest year the query returns the lowest ID for that customer (here 4, 2 and 7 respectively):
WITH P2 AS
(
  SELECT ROW_NUMBER() OVER (PARTITION BY CUSTOMER_ID ORDER BY YEAR DESC, ID) AS RN,
         *
  FROM PURCHASES
)
SELECT ID FROM P2 WHERE RN = 1;
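To check the tie behaviour on the sample data, here is a minimal sketch that runs a ROW_NUMBER query with id as an explicit tie-breaker in the window's ORDER BY. It uses Python's sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the example is self-contained; SQLite has no DISTINCT ON, but it supports ROW_NUMBER from version 3.25).

```python
import sqlite3

def latest_purchase_ids():
    """Return one purchase id per customer: latest year, lowest id on ties."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (id INTEGER, customer_id INTEGER, year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO purchases VALUES (?, ?, ?)",
        [
            (1, 1, 2011),
            (2, 2, 2012),
            (3, 2, 2012),
            (4, 1, 2013),
            (5, 3, 2014),
            (6, 3, 2014),
            (7, 3, 2015),
        ],
    )
    rows = conn.execute(
        """
        WITH p2 AS (
            SELECT id, customer_id,
                   ROW_NUMBER() OVER (PARTITION BY customer_id
                                      ORDER BY year DESC, id) AS rn
            FROM purchases
        )
        SELECT id FROM p2 WHERE rn = 1 ORDER BY customer_id
        """
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(latest_purchase_ids())
```

Without the id tie-breaker, either of customer 2's rows (ids 2 and 3, both year 2012) could legitimately be numbered first.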

How to get MAX Hike in Min month?

Below is the table:
Name | Hike% | Month
------------------------
A    | 7     | 1
A    | 6     | 2
A    | 8     | 3
b    | 4     | 1
b    | 7     | 2
b    | 7     | 3
The result should be:
Name | Hike% | Month
------------------------
A    | 8     | 3
b    | 7     | 2
Here is one way of doing this:
SELECT Name, [Hike%], Month
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY [Hike%] DESC, Month) rn
FROM yourTable
) t
WHERE rn = 1
ORDER BY Name;
If you instead want to return multiple records per name, in the case where two or more records are tied for the greatest hike%, replace ROW_NUMBER with RANK and remove Month from the ORDER BY (otherwise Month still breaks the tie and only one row gets rank 1).
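The difference between the two ranking functions can be seen on the sample data with a small sketch. It runs through Python's sqlite3 module rather than SQL Server, and the hikes table and hike column names are assumptions chosen to avoid the bracketed [Hike%] identifier. For RANK to report ties, its ORDER BY must not include Month as a tie-breaker.

```python
import sqlite3

SAMPLE = [
    ("A", 7, 1), ("A", 6, 2), ("A", 8, 3),
    ("b", 4, 1), ("b", 7, 2), ("b", 7, 3),
]

def top_hikes(window_expr):
    """Return rows whose ranking expression evaluates to 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hikes (name TEXT, hike INTEGER, month INTEGER)")
    conn.executemany("INSERT INTO hikes VALUES (?, ?, ?)", SAMPLE)
    # window_expr is trusted, hard-coded SQL in this sketch, not user input.
    rows = conn.execute(
        f"""
        SELECT name, hike, month
        FROM (SELECT *, {window_expr} AS rn FROM hikes) t
        WHERE rn = 1
        ORDER BY name, month
        """
    ).fetchall()
    conn.close()
    return rows

# One row per name: highest hike, earliest month on ties.
first_only = top_hikes(
    "ROW_NUMBER() OVER (PARTITION BY name ORDER BY hike DESC, month)"
)
# Every row tied for the highest hike.
with_ties = top_hikes("RANK() OVER (PARTITION BY name ORDER BY hike DESC)")

print(first_only)
print(with_ties)
```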
Use a correlated subquery:
select Name, min(Hike) as Hike, min(Month) as Month
from
(
  select * from tablename a
  where Hike in (select max(Hike) from tablename b where a.name = b.name)
) A
group by Name
You can use something similar to the below:
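SELECT t.Name, t.Hike, MIN(t.Month) AS Month
FROM tablename t
JOIN (SELECT Name, MAX(Hike) AS Hike FROM tablename GROUP BY Name) m
  ON m.Name = t.Name AND m.Hike = t.Hike
GROUP BY t.Name, t.Hike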
Hope this helps :)

Sum of Distinct count in Oracle

I have the following table:
CUST_PRODUCT_DTL
Cust_ID | Product_ID | QTY
1       | 10         | 5
2       | 10         | 2
3       | 10         | 5
1       | 11         | 5
2       | 12         | 1
How can I get the total distinct CUST_ID and the total distinct PRODUCT_ID from the above table in Oracle 11g?
The query below doesn't work:
SELECT SUM(COUNT(DISTINCT cust_id)), product_id
FROM CUST_PRODUCT_DTL
WHERE
GROUP BY product_id , cust_id
The desired result I am looking at is
Total Unique Cust_id: 3
Total Unique Product_id:3
SUM is not involved, nor do you need the GROUP BY. Your desired output contains only one row. You just want two COUNT(DISTINCT ...) expressions:
select count(distinct cust_id) as total_distinct_cust_id,
       count(distinct product_id) as total_distinct_prod_id
from cust_product_dtl;
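As a quick check against the sample rows, the two distinct counts can be run through Python's sqlite3 module. SQLite is used here as an assumption in place of Oracle; COUNT(DISTINCT ...) behaves the same way for this query.

```python
import sqlite3

def distinct_totals():
    """Return (distinct customers, distinct products) for the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cust_product_dtl "
        "(cust_id INTEGER, product_id INTEGER, qty INTEGER)"
    )
    conn.executemany(
        "INSERT INTO cust_product_dtl VALUES (?, ?, ?)",
        [(1, 10, 5), (2, 10, 2), (3, 10, 5), (1, 11, 5), (2, 12, 1)],
    )
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT cust_id)    AS total_distinct_cust_id,
               COUNT(DISTINCT product_id) AS total_distinct_prod_id
        FROM cust_product_dtl
        """
    ).fetchone()
    conn.close()
    return row

print(distinct_totals())
```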

SQL - Overall average Points

I have a table like this:
[challenge_log]
User_id | challenge | Try | Points
==============================================
1       | 1         | 1   | 5
1       | 1         | 2   | 8
1       | 1         | 3   | 10
1       | 2         | 1   | 5
1       | 2         | 2   | 8
2       | 1         | 1   | 5
2       | 2         | 1   | 8
2       | 2         | 2   | 10
I want the overall average points. To do so, I believe I need 3 steps:
Step 1 - Get the MAX value (of points) of each user in each challenge:
User_id | challenge | Points
===================================
1       | 1         | 10
1       | 2         | 8
2       | 1         | 5
2       | 2         | 10
Step 2 - SUM all the MAX values of each user:
User_id | Points
===================
1       | 18
2       | 15
Step 3 - The average:
AVG = SUM (Points from step 2) / number of users = 16.5
Can you help me find a query for this?
You can get the overall average by dividing the total number of points by the number of distinct users. However, you need the maximum per challenge, so the sum is a bit more complicated. One way is with a subquery (the * 1.0 avoids integer division in databases that truncate it):
select sum(Points) * 1.0 / count(distinct user_id)
from (select user_id, challenge, max(Points) as Points
      from challenge_log
      group by user_id, challenge
     ) cl;
You can also do this with one level of aggregation, by finding the maximum in the where clause:
select sum(Points) * 1.0 / count(distinct user_id)
from challenge_log cl
where not exists (select 1
                  from challenge_log cl2
                  where cl2.user_id = cl.user_id and
                        cl2.challenge = cl.challenge and
                        cl2.points > cl.points
                 );
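To confirm that the subquery approach gives the 16.5 from the question, here is a minimal sketch run through Python's sqlite3 module. The try_no column name is an assumption standing in for Try, and the multiplication by 1.0 is there because SQLite, like several other databases, truncates integer division.

```python
import sqlite3

def overall_average():
    """Sum of each user's best score per challenge, divided by user count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE challenge_log "
        "(user_id INTEGER, challenge INTEGER, try_no INTEGER, points INTEGER)"
    )
    conn.executemany(
        "INSERT INTO challenge_log VALUES (?, ?, ?, ?)",
        [
            (1, 1, 1, 5), (1, 1, 2, 8), (1, 1, 3, 10),
            (1, 2, 1, 5), (1, 2, 2, 8),
            (2, 1, 1, 5),
            (2, 2, 1, 8), (2, 2, 2, 10),
        ],
    )
    (average,) = conn.execute(
        """
        SELECT SUM(points) * 1.0 / COUNT(DISTINCT user_id)
        FROM (SELECT user_id, challenge, MAX(points) AS points
              FROM challenge_log
              GROUP BY user_id, challenge) cl
        """
    ).fetchone()
    conn.close()
    return average

print(overall_average())
```

The inner query reproduces step 1 (best score per user and challenge); the outer query folds steps 2 and 3 into a single division.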
Try these on for size.
Overall Mean
select avg( Points ) as mean_score
from challenge_log
Per-Challenge Mean
select challenge ,
avg( Points ) as mean_score
from challenge_log
group by challenge
If you want to compute the mean of each user's highest score per challenge, you're not exactly raising the level of complexity very much:
Overall Mean
select avg( high_score )
from ( select user_id ,
              challenge ,
              max( Points ) as high_score
       from challenge_log
       group by user_id , challenge
     ) t
Per-Challenge Mean
select challenge ,
       avg( high_score )
from ( select user_id ,
              challenge ,
              max( Points ) as high_score
       from challenge_log
       group by user_id , challenge
     ) t
group by challenge
After step 1 do
SELECT USER_ID, AVG(POINTS)
FROM STEP1
GROUP BY USER_ID
You can combine step 1 and 2 into a single query/subquery as follows:
Select BestShot.[User_ID], AVG(cast (BestShot.MostPoints as money))
from (select tLog.Challenge, tLog.[User_ID], MostPoints = max(tLog.points)
from dbo.tmp_Challenge_Log tLog
Group by tLog.User_ID, tLog.Challenge
) BestShot
Group by BestShot.User_ID
The subquery determines the most points for each user/challenge combo, and the outer query takes these max values and uses the AVG function to return the average value of them. The last Group By tells SQL to average all the values across each User_ID.

How can I find consecutive active weeks in SQL?

What I would like to do is find the number of consecutive weeks that someone is active on Sundays and assign them a value. They have to participate in at least 2 races a day to be counted as active for the week.
If they are active for 2 consecutive weeks I would like to assign a value of 100, 3 consecutive weeks a value of 200, 4 consecutive weeks a value of 300, and continuing up to 9 consecutive weeks.
My difficulty is not determining consecutive weeks, but breaks in between consecutive dates. Suppose the following dataset:
CustomerID | RaceDate  | Races
1          | 2/2/2014  | 2
1          | 2/9/2014  | 5
1          | 2/16/2014 | 3
1          | 2/23/2014 | 3
1          | 3/2/2014  | 4
1          | 3/9/2014  | 3
1          | 3/16/2014 | 3
2          | 2/2/2014  | 2
2          | 2/9/2014  | 3
2          | 3/2/2014  | 2
2          | 3/9/2014  | 4
2          | 3/16/2014 | 3
CustomerID 1 would have 7 consecutive weeks for a value of 600.
The hard part for me is CustomerID 2. They would have 2 consecutive weeks AND 3 consecutive weeks. So their total value would be 100 + 200 = 300.
I would like to be able to do this with any different combination of consecutive weeks.
Any help please?
EDIT: I am using SQL Server 2008 R2.
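When looking for sequential values, there is a simple observation that helps. If you number each customer's dates in order and subtract that many weeks from each date, the result is constant within a run of consecutive weeks and changes whenever a week is skipped. You can use this as a grouping mechanism: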
select CustomerId, min(RaceDate) as seqStart, max(RaceDate) as seqEnd,
       count(*) as NumDaysRaced
from (select t.*,
             dateadd(week, - row_number() over (partition by customerID order by RaceDate),
                     RaceDate) as grp
      from table t
      where races >= 2
     ) t
group by CustomerId, grp;
You can then use this to get your final "points":
select CustomerId,
       sum(case when NumDaysRaced > 1 then (NumDaysRaced - 1) * 100 else 0 end) as Points
from (select CustomerId, min(RaceDate) as seqStart, max(RaceDate) as seqEnd,
             count(*) as NumDaysRaced
      from (select t.*,
                   dateadd(week, - row_number() over (partition by customerID order by RaceDate),
                           RaceDate) as grp
            from table t
            where races >= 2
           ) t
      group by CustomerId, grp
     ) c
group by CustomerId;
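The same idea can be tried on the sample data with a minimal sketch run through Python's sqlite3 module. This is an adaptation, not the SQL Server query itself: SQLite has no dateadd, so the sketch subtracts 7 days per row number from julianday(race_date), and the race_log table name, the snake_case column names and the ISO date strings are assumptions made for the example.

```python
import sqlite3

def streak_points():
    """Points per customer: (weeks - 1) * 100 for each run of active weeks."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE race_log (customer_id INTEGER, race_date TEXT, races INTEGER)"
    )
    conn.executemany(
        "INSERT INTO race_log VALUES (?, ?, ?)",
        [
            (1, "2014-02-02", 2), (1, "2014-02-09", 5), (1, "2014-02-16", 3),
            (1, "2014-02-23", 3), (1, "2014-03-02", 4), (1, "2014-03-09", 3),
            (1, "2014-03-16", 3),
            (2, "2014-02-02", 2), (2, "2014-02-09", 3), (2, "2014-03-02", 2),
            (2, "2014-03-09", 4), (2, "2014-03-16", 3),
        ],
    )
    rows = conn.execute(
        """
        WITH active AS (
            -- grp stays constant while dates advance exactly one week per row
            SELECT customer_id,
                   julianday(race_date)
                     - 7 * ROW_NUMBER() OVER (PARTITION BY customer_id
                                              ORDER BY race_date) AS grp
            FROM race_log
            WHERE races >= 2
        ),
        streaks AS (
            SELECT customer_id, grp, COUNT(*) AS weeks
            FROM active
            GROUP BY customer_id, grp
        )
        SELECT customer_id,
               SUM(CASE WHEN weeks > 1 THEN (weeks - 1) * 100 ELSE 0 END) AS points
        FROM streaks
        GROUP BY customer_id
        ORDER BY customer_id
        """
    ).fetchall()
    conn.close()
    return rows

print(streak_points())
```

Customer 2 ends up with two groups (a 2-week run and a 3-week run), so the points from each run are added together.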