Count rows within each group when condition is satisfied Sql Server

Count rows within each group when condition is satisfied Sql Server - sql

I have at table which looks like below:
ID Date IsFull
1 2020-01-05 0
1 2020-02-05 0
1 2020-02-25 1
1 2020-03-01 1
1 2020-03-20 1
I want to display how many months for ID = 1
have sum(isfull)/count(*) > .6 in a given month (More than 60% of the times in that month isfull = 1)
So the final output should
ID HowManyMonths
1 1 --------(Only month 3----2 out 2 cases)
If the question changes to sum(isfull)/count(*) > .4
then the final output should be
ID HowManyMonths
1 2 --------(Month 2 and Month 3)
Thanks!!

You can do this with two levels of aggregation:
select id, count(*) howManyMonths
from (
select id
from mytable
group by id, year(date), month(date)
having avg(1.0 * isFull) > 0.6
) t
group by id
The subquery aggregates by id, year and month, and uses a having clause to filter on groups that meet the success rate (avg() comes handy for this). The outer query counts how many month passed the target rate for each id.

Related

GBQ SQL: How to find first instance of X value and pull a corresponding row

I have a table that records the history of each ID per LOCATION. This table is updated each day to keep track of the history of any change in a certain row(ID). Note: The date field is not in chronological order.
ID Count Date (datetime type)
1 20 2020-01-15T12:00:00.000
1 16 2020-03-15T12:00:00.000
1 13 2020-04-15T12:00:00.000
1 4 2020-05-15T12:00:00.000
1 0 2020-06-15T12:00:00.000
2 20 2020-01-15T12:00:00.000
2 10 2020-02-15T12:00:00.000
3 12 2020-01-15T12:00:00.000
3 10 2020-02-15T12:00:00.000
3 0 2020-03-15T12:00:00.000
For each unique ID, I need to pull the first instance (oldest date) when the Count value is zero. If a unique ID does not have an instance where it Count value is zero, I need to pull the most current Count value.
Here's what my results should look like below:
ID Count Date (datetime type)
1 0 2020-06-15T12:00:00.000
2 10 2020-02-15T12:00:00.000
3 0 2020-03-15T12:00:00.000
I can't seem to wrap my head around how to code this in Google BigQuery.

Below is for BigQuery Standard SQL
#standardSQL
SELECT AS VALUE
CASE COUNTIF(count = 0)
WHEN 0 THEN ARRAY_AGG(t ORDER BY date DESC LIMIT 1)[OFFSET(0)]
ELSE ARRAY_AGG(t ORDER BY count, date LIMIT 1)[OFFSET(0)]
END
FROM `project.dataset.table` t
GROUP BY id
if to apply to sample data in your question - output is
Row id count date
1 1 0 2020-05-15 12:00:00 UTC
2 2 10 2020-03-15 12:00:00 UTC
3 3 0 2020-06-15 12:00:00 UTC

Do you just want the last row for each id?
One method is row_number():
select t.*
from (select t.*,
row_number() over (partition by id
order by case when count = 0 then date end nulls last,
date desc
) as seqnum
from t
) t
where seqnum = 1;
But I also like using aggregation in BigQuery:
select (array_agg(t order by date desc limit 1))[ordinal(1)]
from t
group by id;

Use aggregation only on rows where count(ID) is greater than one

Hi I have the following table
Cash_table
ID Cash Rates
1 50 3
2 100 4
3 70 10
3 60 10
4 13 7
5 20 8
5 10 10
6 10 5
What I want as a result is to cumulate all the entries that have a Count(id)>1 like this:
ID New_Cash New_Rates
1 50 3
2 100 4
3 (70+60)/(10+10) 10+10
4 13 7
5 (20+10)/(8+10) 8+10
6 10 5
So I only want to change the rows where Count(id)>1 and leave the rest like it was.
For the rows with count(id)>1 I want to sum up the rates and take the sum of the cash and divide it by the sum of the rates. The Rates alone aren't a problem since I can sum them up and group by id and get the desired result.
The problem is with the cash column:
I am trying to do it with a case statement but it isn't working:
select id, sum(rates) as new_rates, case
when count(id)>1 then sum(cash)/nullif(sum(rates),0))
else cash
end as new_cash
from Cash_table
group by id

You only need group by id and aggregate:
select
id,
sum(cash) / (case count(*) when 1 then 1 else sum(rates) end) as new_cash,
sum(rates) as new_rates
from Cash_table
group by id
order by id
See the demo.

You can aggregate rate and cash columns by sum() function with grouping by id
select
id,
sum(cash)/decode( sum( nvl(rates,0) ), 0 ,1, sum( nvl(rates,0) )) as new_cash,
sum(rates) as new_rates
from cash_table
group by id
there's no nullif() function in Oracle, use nvl() instead
switch case part ( where decode() function is used ) against the
possibility of division by zero

SQL - Overall average Points

I have a table like this:
[challenge_log]
User_id | challenge | Try | Points
==============================================
1 1 1 5
1 1 2 8
1 1 3 10
1 2 1 5
1 2 2 8
2 1 1 5
2 2 1 8
2 2 2 10
I want the overall average points. To do so, i believe i need 3 steps:
Step 1 - Get the MAX value (of points) of each user in each challenge:
User_id | challenge | Points
===================================
1 1 10
1 2 8
2 1 5
2 2 10
Step 2 - SUM all the MAX values of one user
User_id | Points
===================
1 18
2 15
Step 3 - The average
AVG = SUM (Points from step 2) / number of users = 16.5
Can you help me find a query for this?

You can get the overall average by dividing the total number of points by the number of distinct users. However, you need the maximum per challenge, so the sum is a bit more complicated. One way is with a subquery:
select sum(Points) / count(distinct userid)
from (select userid, challenge, max(Points) as Points
from challenge_log
group by userid, challenge
) cl;
You can also do this with one level of aggregation, by finding the maximum in the where clause:
select sum(Points) / count(distinct userid)
from challenge_log cl
where not exists (select 1
from challenge_log cl2
where cl2.userid = cl.userid and
cl2.challenge = cl.challenge and
cl2.points > cl.points
);

Try these on for size.
Overall Mean
select avg( Points ) as mean_score
from challenge_log
Per-Challenge Mean
select challenge ,
avg( Points ) as mean_score
from challenge_log
group by challenge
If you want to compute the mean of each users highest score per challenge, you're not exactly raising the level of complexity very much:
Overall Mean
select avg( high_score )
from ( select user_id ,
challenge ,
max( Points ) as high_score
from challenge_log
) t
Per-Challenge Mean
select challenge ,
avg( high_score )
from ( select user_id ,
challenge ,
max( Points ) as high_score
from challenge_log
) t
group by challenge

After step 1 do
SELECT USER_ID, AVG(POINTS)
FROM STEP1
GROUP BY USER_ID

You can combine step 1 and 2 into a single query/subquery as follows:
Select BestShot.[User_ID], AVG(cast (BestShot.MostPoints as money))
from (select tLog.Challenge, tLog.[User_ID], MostPoints = max(tLog.points)
from dbo.tmp_Challenge_Log tLog
Group by tLog.User_ID, tLog.Challenge
) BestShot
Group by BestShot.User_ID
The subquery determines the most points for each user/challenge combo, and the outer query takes these max values and uses the AVG function to return the average value of them. The last Group By tells SQL to average all the values across each User_ID.

How can I find consecutive active weeks in SQL?

What I would like to do is find the number of consecutive weeks that someone is active on Sundays and assign them a value. They have to participate in at least 2 races a day to be counted as active for the week.
If they are active for 2 consecutive weeks I would like to assign a value of 100, 3 consecutive weeks a value of 200, 4 consecutive weeks a value of 300, and continuing up to 9 consecutive weeks.
My difficulty is not determining consecutive weeks, but breaks in between consecutive dates. Suppose the following dataset:
CustomerID RaceDate Races
1 2/2/2014 2
1 2/9/2014 5
1 2/16/2014 3
1 2/23/2014 3
1 3/2/2014 4
1 3/9/2014 3
1 3/16/2014 3
2 2/2/2014 2
2 2/9/2014 3
2 3/2/2014 2
2 3/9/2014 4
2 3/16/2014 3
CustomerID 1 would have 7 consecutive weeks for a value of 600.
The hard part for me is CustomerID 2. They would have 2 consecutive weeks AND 3 consecutive weeks. So their total value would be 100 + 200 = 300.
I would like to be able to do this with any different combination of consecutive weeks.
Any help please?
EDIT: I am using SQL Server 2008 R2.

When looking for sequential values, there is a simple observation that helps. If you subtract a sequence from the dates then the value is a constant. You can use this as a grouping mechanism
select CustomerId, min(RaceDate) as seqStart, max(RaceDate) as seqEnd,
count(*) as NumDaysRaced
from (select t.*,
dateadd(week, - row_number() over (partition by customerID, RaceDate),
RaceDate) as grp
from table t
where races >= 2
) t
group by CustomerId, grp;
You can then use this to get your final "points":
select CustomerId,
sum(case when NumDaysRaced > 1 then (NumDaysRaced - 1) * 100 else 0 end) as Points
from (select CustomerId, min(RaceDate) as seqStart, max(RaceDate) as seqEnd,
count(*) as NumDaysRaced
from (select t.*,
dateadd(week, - row_number() over (partition by customerID, RaceDate),
RaceDate) as grp
from table t
where races >= 2
) t
group by CustomerId, grp
) c
group by CustomerId;

SQL query to return rows in multiple groups

I have a SQL table with data in the following format:
REF FIRSTMONTH NoMONTHS VALUE
--------------------------------
1 2 1 100
2 4 2 240
3 5 4 200
This shows a quoted value which should be delivered starting on the FIRSTMONTH and split over NoMONTHS
I want to calculate the SUM for each month of the potential deliveries from the quoted values.
As such I need to return the following result from a SQL server query:
MONTH TOTAL
------------
2 100 <- should be all of REF=1
4 120 <- should be half of REF=2
5 170 <- should be half of REF=2 and quarter of REF=3
6 50 <- should be quarter of REF=3
7 50 <- should be quarter of REF=3
8 50 <- should be quarter of REF=3
How can I do this?

You are trying extract data from what should be a many to many relationship.
You need 3 tables. You should be able to write a JOIN or GROUP BY select statement from there. The tables below don't use the same data values as yours, and are merely intended for a structural example.
**Month**
REF Month Value
---------------------
1 2 100
2 3 120
etc.
**MonthGroup**
REF
---
1
2
**MonthsToMonthGroups**
MonthREF MonthGroupREF
------------------
1 1
2 2
2 3

The first part of this query gets a set of numbers between the start and the end of the valid values
The second part takes each month value, and divides it into the monthly amount
Then it is simply a case of grouping each month, and adding up all of the monthly amounts.
select
number as month, sum(amount)
from
(
select number
from master..spt_values
where type='p'
and number between (select min(firstmonth) from yourtable)
and (select max(firstmonth+nomonths-1) from yourtable)
) numbers
inner join
(select
firstmonth,
firstmonth+nomonths-1 as lastmonth,
value / nomonths as amount
from yourtable) monthly
on numbers.number between firstmonth and lastmonth
group by number

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Count rows within each group when condition is satisfied Sql Server - sql

Related

GBQ SQL: How to find first instance of X value and pull a corresponding row

Use aggregation only on rows where count(ID) is greater than one

SQL - Overall average Points

How can I find consecutive active weeks in SQL?

SQL query to return rows in multiple groups

Categories

Resources