Grouping data on SQL Server

I have this table in SQL Server:
Year Month Quantity
----------------------------
2015 January 10
2015 February 20
2015 March 30
2014 November 40
2014 August 50
How can I identify the different years and months by adding two more columns, one that gives each distinct year a group number and one that numbers the months sequentially within each year, as in this example?
Year Month Quantity Group Subgroup
------------------------------------------------
2015 January 10 1 1
2015 February 20 1 2
2015 March 30 1 3
2014 November 40 2 1
2014 August 50 2 2

You can use DENSE_RANK to calculate the groups for you:
SELECT t1.*, DENSE_RANK() OVER (ORDER BY Year DESC) AS [Group],
DENSE_RANK() OVER (PARTITION BY Year ORDER BY DATEPART(month, Month + ' 01 2010')) AS [SubGroup]
FROM t1
ORDER BY 4, 5
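Here is a minimal sketch for trying the DENSE_RANK approach without a SQL Server instance. It assumes SQLite 3.25+ (for window functions), run through Python's built-in sqlite3 module as a stand-in. SQLite has no DATEPART, so a hypothetical month_names lookup table replaces the `Month + ' 01 2010'` conversion. The result columns are named grp/subgrp because GROUP is a reserved word.

```python
import sqlite3

# SQLite stands in for SQL Server here (assumption for a self-contained demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (Year INTEGER, Month TEXT, Quantity INTEGER);
INSERT INTO t1 VALUES
  (2015, 'January', 10), (2015, 'February', 20), (2015, 'March', 30),
  (2014, 'November', 40), (2014, 'August', 50);

-- Hypothetical lookup table: month name -> month number.
CREATE TABLE month_names (Month TEXT PRIMARY KEY, MonthNo INTEGER);
INSERT INTO month_names VALUES
  ('January', 1), ('February', 2), ('March', 3), ('April', 4),
  ('May', 5), ('June', 6), ('July', 7), ('August', 8),
  ('September', 9), ('October', 10), ('November', 11), ('December', 12);
""")

rows = conn.execute("""
SELECT t1.Year, t1.Month, t1.Quantity,
       -- Same year -> same group number; most recent year first.
       DENSE_RANK() OVER (ORDER BY t1.Year DESC) AS grp,
       -- Months numbered in calendar order within each year.
       DENSE_RANK() OVER (PARTITION BY t1.Year ORDER BY m.MonthNo) AS subgrp
FROM t1
JOIN month_names AS m ON m.Month = t1.Month
ORDER BY grp, subgrp
""").fetchall()

for row in rows:
    print(row)
```

Because the subgroup follows calendar order, 2014 comes out as August (1), November (2). The question's sample lists November first, which appears to follow row order rather than the calendar.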

To associate group and subgroup with a number you can do this:
WITH RankedTable AS (
SELECT year, month, quantity,
ROW_NUMBER() OVER (PARTITION BY year ORDER BY DATEPART(month, month + ' 01 2010')) AS rn
FROM yourtable)
SELECT year, month, quantity,
SUM(CASE WHEN rn = 1 THEN 1 ELSE 0 END) OVER (ORDER BY year DESC) AS year_group,
rn AS subgroup
FROM RankedTable
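Here, the ROW_NUMBER() OVER clause numbers the months within each year, and SUM() ... OVER computes a running total of the rows with rn = 1 (one such row per year).
Because all rows of the same year are peers under the default window frame, they share the same year_group value.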
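The ROW_NUMBER plus running SUM idea can be sketched the same way. This also assumes SQLite via Python's sqlite3 rather than SQL Server. The hypothetical month_names table supplies calendar order, because ordering by the month name itself would sort alphabetically (February, January, March). Ordering by year DESC makes 2015 group 1, as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourtable (year INTEGER, month TEXT, quantity INTEGER);
INSERT INTO yourtable VALUES
  (2015, 'January', 10), (2015, 'February', 20), (2015, 'March', 30),
  (2014, 'November', 40), (2014, 'August', 50);
-- Hypothetical lookup table: month name -> month number.
CREATE TABLE month_names (month TEXT PRIMARY KEY, month_no INTEGER);
INSERT INTO month_names VALUES
  ('January', 1), ('February', 2), ('March', 3), ('April', 4),
  ('May', 5), ('June', 6), ('July', 7), ('August', 8),
  ('September', 9), ('October', 10), ('November', 11), ('December', 12);
""")

rows = conn.execute("""
WITH RankedTable AS (
  SELECT y.year, y.month, y.quantity,
         -- 1, 2, 3, ... per year in calendar order.
         ROW_NUMBER() OVER (PARTITION BY y.year ORDER BY m.month_no) AS rn
  FROM yourtable AS y
  JOIN month_names AS m ON m.month = y.month
)
SELECT year, month, quantity,
       -- Running count of rn = 1 rows; rows of one year are peers
       -- under the default RANGE frame, so they share one value.
       SUM(CASE WHEN rn = 1 THEN 1 ELSE 0 END) OVER (ORDER BY year DESC) AS year_group,
       rn AS subgroup
FROM RankedTable
ORDER BY year_group, subgroup
""").fetchall()

for row in rows:
    print(row)
```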


Get Last Value of previous row partition in SQL

In my data set, each customer has some orders on different dates.
For each customer and each month, I want to find the city of the customer's last order in the previous month.
For example, here is my data for one of the customers:
customer  year  month  day  order id  city id
---------------------------------------------
1544      2022  2      6    413       9
1544      2022  2      17   39        10
1544      2022  3      5    115       21
1544      2022  5      29   2153      4
1544      2022  5      30   955       9
The result should look like this:
customer  year  month  city of last order of prev month (prevCity)
------------------------------------------------------------------
1544      2022  2      null or 9
1544      2022  3      10
1544      2022  5      21
(The first row of the table above is not my concern for now.)
I wrote my query using LAST_VALUE, like this:
select customer,
year,
month,
last_value(City) over (partition by customer, year, month order by created_at desc
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as prevCity
from table1
but the result is wrong.
How can I correct this?
Use the window function lag() over() in combination with the WITH TIES clause:
Select top 1 with ties
customer
,year
,month
,LastCityID = lag([city id],1) over (partition by customer order by year, month,day)
From YourTable
order by row_number() over (partition by customer,year,month order by year, month,day)
Or, a nudge more performant:
with cte as (
Select *
,LastCityID = lag([city id],1) over (partition by customer order by year, month,day)
,RN = row_number() over (partition by customer,year,month order by year, month,day)
From YourTable
)
Select customer
,year
,month
,LastCityID
From cte
Where RN =1
Results
customer year month LastCityID
1544 2022 2 NULL
1544 2022 3 10
1544 2022 5 21
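This sketch shows why LAG plus ROW_NUMBER works. It runs the CTE variant in SQLite via Python's sqlite3 as a stand-in for SQL Server. TOP ... WITH TIES is not available in SQLite, so only the CTE form is shown. The column city_id replaces `[city id]`, and the table name orders is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer INTEGER, year INTEGER, month INTEGER,
                     day INTEGER, order_id INTEGER, city_id INTEGER);
INSERT INTO orders VALUES
  (1544, 2022, 2, 6, 413, 9), (1544, 2022, 2, 17, 39, 10),
  (1544, 2022, 3, 5, 115, 21), (1544, 2022, 5, 29, 2153, 4),
  (1544, 2022, 5, 30, 955, 9);
""")

rows = conn.execute("""
WITH cte AS (
  SELECT o.*,
         -- City of the customer's previous order, in date order.
         LAG(city_id, 1) OVER (PARTITION BY customer ORDER BY year, month, day) AS last_city_id,
         -- 1 for the first order of each month.
         ROW_NUMBER() OVER (PARTITION BY customer, year, month ORDER BY day) AS rn
  FROM orders AS o
)
-- The order just before a month's first order is the last order of an earlier month.
SELECT customer, year, month, last_city_id
FROM cte
WHERE rn = 1
ORDER BY customer, year, month
""").fetchall()

for row in rows:
    print(row)
```

There are no April orders, so the order before May's first order falls in March. The query therefore returns 21 for May, matching the expected result.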

Cumulative sum of previous rows for each partition

I want to calculate the cumulative sum of monthly orders for each customer in my database.
For example, I have this data:
customer  year  month  no_orders
--------------------------------
1544      2022  4      5
1544      2022  4      1
1544      2022  12     1
1544      2023  1      3
And the result should be the same as below:
customer  year  month  cumulative no_orders
-------------------------------------------
1544      2022  4      0
1544      2022  12     6
1544      2023  1      7
I used lag() and then, in a second step, sum() over (), but my result was wrong.
How can I solve this problem?
As @Larnu advises in the comments:
Seems like you need to do several steps here. Aggregate (and group) into months first, and then use a cumulative SUM but have the window not include the current row.
Some SQL to implement this idea is below:
SELECT customer,
year,
month,
cumulative_no_orders = ISNULL(SUM(SUM(no_orders))
OVER (
PARTITION BY customer
ORDER BY year, month
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
, 0)
FROM YourTable
GROUP BY customer,
year,
month
It first does the aggregation
SELECT customer,
year,
month,
sum_no_orders = SUM(no_orders)
FROM YourTable
GROUP BY customer, year, month
to return the following
customer  year  month  sum_no_orders
------------------------------------
1544      2022  4      6
1544      2022  12     1
1544      2023  1      3
and then calculates the running total of sum_no_orders from previous rows on top of that.
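Here is a sketch of the same two steps written out explicitly. The monthly aggregation goes in a CTE, and the running total over the preceding rows is computed on top of it. It assumes SQLite via Python's sqlite3, where COALESCE replaces SQL Server's ISNULL, and uses a hypothetical table name, orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer INTEGER, year INTEGER, month INTEGER, no_orders INTEGER);
INSERT INTO orders VALUES
  (1544, 2022, 4, 5), (1544, 2022, 4, 1),
  (1544, 2022, 12, 1), (1544, 2023, 1, 3);
""")

rows = conn.execute("""
WITH monthly AS (
  -- Step 1: one row per customer and month.
  SELECT customer, year, month, SUM(no_orders) AS sum_no_orders
  FROM orders
  GROUP BY customer, year, month
)
-- Step 2: running total of the earlier months only (current row excluded).
SELECT customer, year, month,
       COALESCE(SUM(sum_no_orders) OVER (
         PARTITION BY customer
         ORDER BY year, month
         ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS cumulative_no_orders
FROM monthly
ORDER BY customer, year, month
""").fetchall()

for row in rows:
    print(row)
```

The frame ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING is empty for each customer's first month. SUM therefore returns NULL there, and COALESCE turns it into 0.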
You could also try this:
SELECT
customer
, year
, month
, ISNULL(LAG(no_orders) OVER (PARTITION BY customer ORDER BY customer, year, month),0) Cum_Orders
FROM (
SELECT
DISTINCT customer, year, month
, SUM(no_orders) OVER (PARTITION BY customer ORDER BY customer, year, month) no_orders
FROM ABC
) a
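The second approach can be checked the same way, again assuming SQLite via Python's sqlite3. The inner running SUM uses the default frame (RANGE up to the current row), under which rows of the same month are peers. Both April rows therefore get the whole-month total 6 before DISTINCT collapses them. LAG then shifts each month's running total down by one row. The alias running_total is a rename of no_orders for clarity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ABC (customer INTEGER, year INTEGER, month INTEGER, no_orders INTEGER);
INSERT INTO ABC VALUES
  (1544, 2022, 4, 5), (1544, 2022, 4, 1),
  (1544, 2022, 12, 1), (1544, 2023, 1, 3);
""")

rows = conn.execute("""
SELECT customer, year, month,
       -- Previous month's running total = orders before this month.
       COALESCE(LAG(running_total) OVER (PARTITION BY customer ORDER BY year, month), 0) AS cum_orders
FROM (
  -- DISTINCT is applied after the window, so peer rows of one month collapse to one.
  SELECT DISTINCT customer, year, month,
         SUM(no_orders) OVER (PARTITION BY customer ORDER BY year, month) AS running_total
  FROM ABC
) AS a
ORDER BY customer, year, month
""").fetchall()

for row in rows:
    print(row)
```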

Selecting records that have low numbers consecutively

I have the following table (using BigQuery):
id   year  month  day  rating
-----------------------------
111  2020  11     30   4
111  2020  12     01   4
112  2020  11     30   5
113  2020  11     30   5
Is there a way to select the ids whose ratings are low on two or more consecutive records (low meaning a rating below 4.5)?
For example, my desired output is:
id   year  month  day  rating
-----------------------------
111  2020  11     30   4
111  2020  12     01   4
If you want all rows, then you need to look at both the previous rating and the next rating:
SELECT t.*
FROM (SELECT t.*,
LAG(rating) OVER (PARTITION BY id ORDER BY year, month, day ASC) AS prev_rating,
LEAD(rating) OVER (PARTITION BY id ORDER BY year, month, day ASC) AS next_rating,
FROM dataset.table t
) t
WHERE (rating < 4.5 and prev_rating < 4.5) OR
(rating < 4.5 and next_rating < 4.5)
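Here is a sketch of the LAG/LEAD filter in SQLite via Python's sqlite3, as a stand-in for BigQuery. It uses a hypothetical table, ratings, plus a hypothetical extra id 114 whose two low ratings are separated by a high one, so id 114 should not be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ratings (id INTEGER, year INTEGER, month INTEGER, day INTEGER, rating REAL);
INSERT INTO ratings VALUES
  (111, 2020, 11, 30, 4), (111, 2020, 12, 1, 4),
  (112, 2020, 11, 30, 5), (113, 2020, 11, 30, 5);
-- Hypothetical extra id: low, high, low (no two consecutive lows).
INSERT INTO ratings VALUES
  (114, 2020, 11, 28, 4), (114, 2020, 11, 29, 5), (114, 2020, 11, 30, 4);
""")

rows = conn.execute("""
SELECT id, year, month, day, rating
FROM (
  SELECT r.*,
         LAG(rating)  OVER (PARTITION BY id ORDER BY year, month, day) AS prev_rating,
         LEAD(rating) OVER (PARTITION BY id ORDER BY year, month, day) AS next_rating
  FROM ratings AS r
) AS t
-- Keep a low rating if its previous or next rating is also low.
WHERE rating < 4.5 AND (prev_rating < 4.5 OR next_rating < 4.5)
ORDER BY id, year, month, day
""").fetchall()

for row in rows:
    print(row)
```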
Below is for BigQuery Standard SQL
select * except(grp, seq_len)
from (
select *, sum(1) over(partition by id, grp) seq_len
from (
select *,
countif(rating >= 4.5) over(partition by id order by year, month, day) grp
from `project.dataset.table`
)
where rating < 4.5
)
where seq_len > 1
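The gaps-and-islands idea behind the second query can be sketched in SQLite via Python's sqlite3. SQLite has no COUNTIF, so SUM(CASE ...) counts the high ratings seen so far. That running count stays constant across a run of low ratings, so it identifies each run. The run length is counted per (id, grp), so runs from different ids are never pooled. The table ratings and id 114 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ratings (id INTEGER, year INTEGER, month INTEGER, day INTEGER, rating REAL);
INSERT INTO ratings VALUES
  (111, 2020, 11, 30, 4), (111, 2020, 12, 1, 4),
  (112, 2020, 11, 30, 5), (113, 2020, 11, 30, 5),
  (114, 2020, 11, 28, 4), (114, 2020, 11, 29, 5), (114, 2020, 11, 30, 4);
""")

rows = conn.execute("""
WITH marked AS (
  -- grp = number of high ratings so far; constant within a run of lows.
  SELECT r.*,
         SUM(CASE WHEN rating >= 4.5 THEN 1 ELSE 0 END)
           OVER (PARTITION BY id ORDER BY year, month, day) AS grp
  FROM ratings AS r
),
low_runs AS (
  -- Length of each run of low ratings (window is evaluated after WHERE).
  SELECT m.*, COUNT(*) OVER (PARTITION BY id, grp) AS seq_len
  FROM marked AS m
  WHERE rating < 4.5
)
SELECT id, year, month, day, rating
FROM low_runs
WHERE seq_len > 1
ORDER BY id, year, month, day
""").fetchall()

for row in rows:
    print(row)
```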

SQL - use only clients that are present in all months

I have a dataset with different clients and their sales counts. Over time, some clients get added to and deleted from the data. How do I make sure that, when I look at the sales counts, I only use the clients that were in the data set the whole time? I.e., if a client doesn't have a record for 2018-03, then I don't want that client to be part of the query at all. If a client doesn't have a record for 2020-03, then I also don't want that client to be part of the query.
For example, the following query:
select DATE_PART (y, sold_date)as year, DATE_PART (mm, sold_date) as month, count(distinct(client))
from sales_data
where sold_date > '2018-01-01'
group by year, month
order by year,month
Yields
year month count
2018 1 78
2018 2 83
2018 3 80
2018 4 83
2018 5 84
2018 6 81
2018 7 83
2018 8 90
2018 9 89
2018 10 95
2018 11 94
2018 12 97
2019 1 102
2019 2 103
2019 3 102
2019 4 105
2019 5 103
2019 6 104
2019 7 104
2019 8 106
2019 9 106
2019 10 108
2019 11 109
2019 12 104
2020 1 104
2020 2 102
2020 3 103
2020 4 98
2020 5 97
2020 6 79
So I want to use only the clients that appear in every month; there should be no more than 78 of them, because there cannot be more such clients than in the smallest month (2018-01).
FYI, I am using Amazon Redshift here, but I am OK with a query that's RDBMS-agnostic or works for SQL Server/Oracle/MySQL/PostgreSQL; I am just interested in a pattern for solving this issue effectively.
If I'm understanding what you want correctly, and if this is just a one-off query, you could use a correlated subquery in the where clause:
SELECT
DATE_PART(y, s.sold_date) AS year,
DATE_PART(mm, s.sold_date) AS month,
COUNT(DISTINCT s.client)
FROM
sales_data AS s
WHERE
EXISTS (
SELECT sd.client FROM sales_data AS sd WHERE DATE_PART(y,
sd.sold_date) = 2018 AND DATE_PART(mm, sd.sold_date) = 1 AND
sd.client = s.client
) AND
s.sold_date > '2018-01-01'
GROUP BY
year,
month
ORDER BY
DATE_PART(y, s.sold_date),
DATE_PART(mm, s.sold_date)
Presence in all months can be checked with a 2-step aggregation:
1. group the sales data by customer, keeping only customers that appear in every month
2. group the sales data joined to (1) by year, month
like this (=12 can be a dynamic expression, depending on how much history you have):
with
stable_customers as (
select client
from sales_data
group by 1
having count(distinct date_trunc('month', sold_date)) = 12
)
select
DATE_PART(y, sold_date) as year
,DATE_PART(mm, sold_date) as month
,count(distinct client)
from sales_data
join stable_customers
using (client)
where sold_date > '2018-01-01'
group by year, month
order by year, month
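Here is a sketch of the two-step aggregation in SQLite via Python's sqlite3. Redshift's DATE_TRUNC/DATE_PART become strftime in this version. Instead of a hard-coded 12, the HAVING clause compares against the number of distinct months in the whole table, which is one possible form of the dynamic expression mentioned above. The sample clients A, B, and C are hypothetical; B is missing 2018-02.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_data (client TEXT, sold_date TEXT);
INSERT INTO sales_data VALUES
  ('A', '2018-01-05'), ('A', '2018-02-10'), ('A', '2018-03-15'),
  ('B', '2018-01-07'), ('B', '2018-03-02'),
  ('C', '2018-01-20'), ('C', '2018-02-03'), ('C', '2018-02-25'), ('C', '2018-03-30');
""")

rows = conn.execute("""
WITH stable_clients AS (
  -- Step 1: clients active in as many distinct months as the whole table.
  SELECT client
  FROM sales_data
  GROUP BY client
  HAVING COUNT(DISTINCT strftime('%Y-%m', sold_date)) =
         (SELECT COUNT(DISTINCT strftime('%Y-%m', sold_date)) FROM sales_data)
)
-- Step 2: per-month client counts, restricted to the stable clients.
SELECT year, month, COUNT(DISTINCT client) AS clients
FROM (
  SELECT s.client,
         CAST(strftime('%Y', s.sold_date) AS INTEGER) AS year,
         CAST(strftime('%m', s.sold_date) AS INTEGER) AS month
  FROM sales_data AS s
  JOIN stable_clients AS sc ON sc.client = s.client
) AS monthly
GROUP BY year, month
ORDER BY year, month
""").fetchall()

for row in rows:
    print(row)
```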
Use window functions. Unfortunately, SQL Server does not support count(distinct) as a window function. Fortunately, there is a simple work-around using dense_rank():
select year, month, count(distinct client)
from (select sd.*, year, month,
(dense_rank() over (order by year, month) +
dense_rank() over (order by year desc, month desc)
) as num_months,
(dense_rank() over (partition by client order by year, month) +
dense_rank() over (partition by client order by year desc, month desc)
) as num_months_client
from sales_data sd cross apply
(values (year(sold_date), month(sold_date))) v(year, month)
where sd.sold_date > '2018-01-01'
) sd
where num_months_client = num_months
group by year, month
order by year, month;
Note: This looks at all months that are in the data. If all clients are missing 2019-03, then that month is not considered at all.
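The DENSE_RANK trick works because, for every row, ascending rank + descending rank = number of distinct values + 1. Comparing the two sums therefore compares a client's month count with the overall month count. This sketch runs in SQLite via Python's sqlite3, with a CTE replacing CROSS APPLY (VALUES ...). It uses the same hypothetical clients, where B misses 2018-02.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_data (client TEXT, sold_date TEXT);
INSERT INTO sales_data VALUES
  ('A', '2018-01-05'), ('A', '2018-02-10'), ('A', '2018-03-15'),
  ('B', '2018-01-07'), ('B', '2018-03-02'),
  ('C', '2018-01-20'), ('C', '2018-02-03'), ('C', '2018-02-25'), ('C', '2018-03-30');
""")

rows = conn.execute("""
WITH sd AS (
  SELECT client,
         CAST(strftime('%Y', sold_date) AS INTEGER) AS year,
         CAST(strftime('%m', sold_date) AS INTEGER) AS month
  FROM sales_data
),
ranked AS (
  SELECT client, year, month,
         -- (distinct months overall) + 1, on every row
         DENSE_RANK() OVER (ORDER BY year, month)
           + DENSE_RANK() OVER (ORDER BY year DESC, month DESC) AS num_months,
         -- (distinct months for this client) + 1
         DENSE_RANK() OVER (PARTITION BY client ORDER BY year, month)
           + DENSE_RANK() OVER (PARTITION BY client ORDER BY year DESC, month DESC) AS num_months_client
  FROM sd
)
SELECT year, month, COUNT(DISTINCT client) AS clients
FROM ranked
WHERE num_months_client = num_months
GROUP BY year, month
ORDER BY year, month
""").fetchall()

for row in rows:
    print(row)
```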

select best attribute of a row SQL oracle

YEAR MONTH BALANCE SSN
2016 1 3175 34/1043/03T
2016 1 2984 93/1194/07T
2016 1 2269 39/3149/00T
2015 12 3172 36/1011/03T
2015 12 2984 22/1224/07T
2015 12 2169 12/3143/00T
For example, I have this table, with rows for every month of every year, and I need to choose the best SSN and balance (the row with the highest balance) for each month of each year. Here, for example, I would like to obtain this from my query:
YEAR MONTH BALANCE SSN
2016 1 3175 34/1043/03T
2015 12 3172 36/1011/03T
What can I do?
You can do this in several ways. A very Oracle-ish way is to use KEEP:
select year, month,
max(balance) as balance,
max(SSN) keep (dense_rank first order by balance desc) as ssn
from t
group by year, month;
Like most DBMSes, Oracle supports ROW_NUMBER/RANK:
select *
from
(
select year, month, balance, SSN,
row_number()
over (partition by year, month
order by balance desc) as rn
from tab
) dt
where rn = 1
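Here is a sketch of the ROW_NUMBER version in SQLite via Python's sqlite3, using the sample rows from the question. Oracle's KEEP has no SQLite equivalent, so only this form is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab (year INTEGER, month INTEGER, balance INTEGER, ssn TEXT);
INSERT INTO tab VALUES
  (2016, 1, 3175, '34/1043/03T'), (2016, 1, 2984, '93/1194/07T'),
  (2016, 1, 2269, '39/3149/00T'), (2015, 12, 3172, '36/1011/03T'),
  (2015, 12, 2984, '22/1224/07T'), (2015, 12, 2169, '12/3143/00T');
""")

rows = conn.execute("""
SELECT year, month, balance, ssn
FROM (
  SELECT year, month, balance, ssn,
         -- rn = 1 marks the highest balance within each (year, month).
         ROW_NUMBER() OVER (PARTITION BY year, month ORDER BY balance DESC) AS rn
  FROM tab
) AS dt
WHERE rn = 1
ORDER BY year DESC, month DESC
""").fetchall()

for row in rows:
    print(row)
```

If two rows tie on balance, ROW_NUMBER returns just one of them, chosen arbitrarily. Using RANK() in its place would return all tied rows.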