SQL Ranking Groups Ordered by Date Before Ranking Rule Is Applied

I'm trying to get a rank over groups of integers when ordered by date. Some of the groups will have the same value but be separated by other groups. For this reason I can't use DENSE_RANK, as it puts the integers of the same value together. The values of 10 below would belong to the same group in DENSE_RANK; I would like them to be in ranked groups 2 and 4. Thanks for any help.
| ID | Date | IntValue | DesiredRankResult |
| 1 | 01 Jan | 10 | 4 |
| 1 | 02 Jan | 10 | 4 |
| 1 | 03 Jan | 20 | 3 |
| 1 | 04 Jan | 20 | 3 |
| 1 | 05 Jan | 10 | 2 |
| 1 | 06 Jan | 10 | 2 |
| 1 | 07 Jan | 30 | 1 |

You can do this using lead() and a cumulative sum. I think it looks like this:
select t.*,
       sum(case when next_intval = intval then 0 else 1 end) over (partition by id order by date desc) as DesiredRankResult
from (select t.*,
             lead(intval) over (partition by id order by date) as next_intval
      from t
     ) t;
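To check the lead() plus cumulative-sum idea against the sample data, here is a minimal sketch that runs the same query in SQLite through Python's sqlite3 module. It assumes an SQLite build with window functions (3.25 or later), and the year in the ISO date strings is made up, since the question only gives day and month.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
# Dates are ISO strings (the year is an assumption) so they sort chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, date text, intval integer)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        (1, "2021-01-01", 10), (1, "2021-01-02", 10),
        (1, "2021-01-03", 20), (1, "2021-01-04", 20),
        (1, "2021-01-05", 10), (1, "2021-01-06", 10),
        (1, "2021-01-07", 30),
    ],
)

query = """
select date, intval,
       -- a row opens a new group (counting from the latest date backwards)
       -- whenever the following row has a different value or does not exist
       sum(case when next_intval = intval then 0 else 1 end)
           over (partition by id order by date desc) as rank_result
from (select t.*,
             lead(intval) over (partition by id order by date) as next_intval
      from t) as s
order by date
"""
ranks = [row[2] for row in conn.execute(query)]
print(ranks)
```

The two runs of 10 end up in groups 4 and 2, because the comparison is with the neighbouring row rather than with every row that has the same value.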

Related

SQL Find max no of consecutive months over a period of last 12 Months

I am trying to write a query in SQL to find the maximum number of consecutive months over the last 12 months, excluding June and July.
For example, I have an initial table as follows:
+---------+--------------+-----------+------------+
| id | Payment | amount | Date |
+---------+--------------+-----------+------------+
| 1 | CJ1 | 70000 | 11/3/2020 |
| 1 | 1B4 | 36314000 | 12/1/2020 |
| 1 | I21 | 119439000 | 1/12/2021 |
| 1 | 0QO | 9362100 | 2/2/2021 |
| 1 | 1G0 | 140431000 | 2/23/2021 |
| 1 | 1G | 9362100 | 3/2/2021 |
| 1 | g5d | 9362100 | 4/6/2021 |
| 1 | rt5s | 13182500 | 4/13/2021 |
| 1 | fgs5 | 48598 | 5/18/2021 |
| 1 | sd8 | 42155 | 5/25/2021 |
| 1 | wqe8 | 47822355 | 7/20/2021 |
| 1 | cbg8 | 4589721 | 7/27/2021 |
| 1 | jlk8 | 4589721 | 8/3/2021 |
| 1 | cxn9 | 4589721 | 10/5/2021 |
| 1 | qwe | 45897210 | 11/9/2021 |
| 1 | mmm | 45897210 | 12/16/2021 |
+---------+--------------+-----------+------------+
I have written below query:
SELECT customer_number, year, month,
month - lag(month) OVER(partition by customer_number ORDER BY year, month) as previous_month_indicator
FROM
(
SELECT DISTINCT Month(date) as month, Year(date) as year, CUSTOMER_NUMBER
FROM Table1
WHERE Month(date) not in (6,7)
and TO_DATE(date,'yyyy-MM-dd') >= DATE_SUB('2021-12-31', 425)
and customer_number = 1
) As C
and I get this output
+-----------------+------+-------+--------------------------+
| customer_number | year | month | previous_month_indicator |
+-----------------+------+-------+--------------------------+
| 1 | 2020 | 11 | null |
| 1 | 2020 | 12 | 1 |
| 1 | 2021 | 1 | -11 |
| 1 | 2021 | 2 | 1 |
| 1 | 2021 | 3 | 1 |
| 1 | 2021 | 4 | 1 |
| 1 | 2021 | 5 | 1 |
| 1 | 2021 | 8 | 3 |
| 1 | 2021 | 10 | 2 |
| 1 | 2021 | 11 | 1 |
+-----------------+------+-------+--------------------------+
What I want is to get a view like this
Expected output
+-----------------+------+-------+--------------------------+
| customer_number | year | month | previous_month_indicator |
+-----------------+------+-------+--------------------------+
| 1 | 2020 | 11 | 1 |
| 1 | 2020 | 12 | 1 |
| 1 | 2021 | 1 | 1 |
| 1 | 2021 | 2 | 1 |
| 1 | 2021 | 3 | 1 |
| 1 | 2021 | 4 | 1 |
| 1 | 2021 | 5 | 1 |
| 1 | 2021 | 8 | 1 |
| 1 | 2021 | 9 | 0 |
| 1 | 2021 | 10 | 1 |
| 1 | 2021 | 11 | 1 |
+-----------------+------+-------+--------------------------+
Since June and July don't count, August should be treated as the month directly after May. September has no record, so it appears as 0 and breaks the chain of consecutive months.
My final desired output is the maximum number of consecutive months in which transactions were made, which in the above case is 8 (Nov 2020 to Aug 2021).
Final Desired Output:
+-----------------+-------------------------+
| customer_number | Max_consecutive_months |
+-----------------+-------------------------+
| 1 | 8 |
+-----------------+-------------------------+
CTEs can break this down a little easier. In the code below, the payment_streak CTE is the key bit; the start_of_streak field is first marking rows that count as the start of a streak, and then taking the maximum over all previous rows (to find the start of this streak).
The last SELECT is only comparing these two dates, computing how many months are between them (excluding June/July), and then finding the best streak per customer.
WITH payments_in_context AS (
SELECT customer_number,
date,
lag(date) OVER (PARTITION BY customer_number ORDER BY date) AS prev_date
FROM Table1
WHERE EXTRACT(month FROM date) NOT IN (6,7)
),
payment_streak AS (
SELECT
customer_number,
date,
max(
CASE WHEN (prev_date IS NULL)
OR (EXTRACT(month FROM date) <> 8
AND (date - prev_date >= 62
OR MOD(12 + EXTRACT(month FROM date) - EXTRACT(month FROM prev_date), 12) > 1))
OR (EXTRACT(month FROM date) = 8
AND (date - prev_date >= 123
OR EXTRACT(month FROM prev_date) NOT IN (5,8)))
THEN date END
) OVER (PARTITION BY customer_number ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
as start_of_streak
FROM payments_in_context
)
SELECT customer_number,
max( 1 +
10*(EXTRACT(year FROM date) - EXTRACT(year FROM start_of_streak))
+ (EXTRACT(month FROM date) - EXTRACT(month FROM start_of_streak))
+ CASE WHEN (EXTRACT(month FROM date) > 7 AND EXTRACT(month FROM start_of_streak) < 6)
THEN -2
WHEN (EXTRACT(month FROM date) < 6 AND EXTRACT(month FROM start_of_streak) > 7)
THEN 2
ELSE 0 END
) AS max_consecutive_months
FROM payment_streak
GROUP BY 1;
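The arithmetic in the final SELECT treats every year as having ten countable months once June and July are dropped. That idea can be checked outside the database with a small Python sketch. The helper names `month_index` and `max_consecutive_months` are hypothetical, and the sketch applies no 12-month window filter; it only numbers the countable months consecutively and looks for the longest run.

```python
from datetime import date

def month_index(year: int, month: int) -> int:
    """Consecutive number for a countable month (June and July must be filtered out first)."""
    # Jan..May map to positions 0..4, Aug..Dec to positions 5..9
    position = month - 1 if month < 6 else month - 3
    return 10 * year + position

def max_consecutive_months(dates) -> int:
    """Longest run of consecutive countable months that contain at least one date."""
    indexes = sorted({month_index(d.year, d.month) for d in dates if d.month not in (6, 7)})
    best = run = 0
    prev = None
    for i in indexes:
        run = run + 1 if prev is not None and i == prev + 1 else 1
        best = max(best, run)
        prev = i
    return best

# Payment dates for customer 1, copied from the question's table.
payments = [
    date(2020, 11, 3), date(2020, 12, 1), date(2021, 1, 12), date(2021, 2, 2),
    date(2021, 2, 23), date(2021, 3, 2), date(2021, 4, 6), date(2021, 4, 13),
    date(2021, 5, 18), date(2021, 5, 25), date(2021, 7, 20), date(2021, 7, 27),
    date(2021, 8, 3), date(2021, 10, 5), date(2021, 11, 9), date(2021, 12, 16),
]
longest = max_consecutive_months(payments)
print(longest)
```

With this numbering, May and August of the same year are neighbours, so the streak from Nov 2020 to Aug 2021 counts as 8 months.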
You can use a recursive cte to generate all the dates in the twelve month timespan for each customer id, and then find the maximum number of consecutive dates excluding June and July in that interval:
with recursive cte(id, m, c) as (
select cust_id, min(date), 1 from payments group by cust_id
union all
select c.id, c.m + interval 1 month, c.c+1 from cte c where c.c <= 12
),
dts(id, m, f) as (
select c.id, c.m, c.c = 1 or exists
(select 1 from payments p where p.cust_id = c.id and extract(month from p.date) = extract(month from (c.m - interval 1 month)) and extract(year from p.date) = extract(year from (c.m - interval 1 month)))
from cte c where extract(month from c.m) not in (6,7)
),
result(id, f, c) as (
select d.id, d.f, (select sum(d.id = d1.id and d1.m < d.m and d1.f = 0)+1 from dts d1)
from dts d where d.f != 0
)
select r1.id, max(r1.s)-1 from (select r.id, r.c, sum(r.f) s from result r group by r.id, r.c) r1 group by r1.id

How to get count of particular column value from total number of records and display difference in two different columns in SQL Server

I am trying to get the difference between the total number of records and a column (Is_Registered) to get month-wise metrics of how many users were registered in a particular month and how many are pending.
Actual Data
| Inserted On | IsRegistered |
+-------------+--------------+
| 10-01-2020 | 1 |
| 15-01-2020 | 1 |
| 17-01-2020 | null |
| 17-02-2020 | 1 |
| 21-02-2020 | null |
| 04-04-2020 | null |
| 18-04-2020 | null |
| 19-04-2020 | 1 |
Expected output: as shown in the actual data, out of 8 users (records), 2 are registered in January and 6 are not; in February a total of 3 are registered (January's 2 plus February's 1) and 5 are not, and so on.
| Year | Month | Registered | Not Registered |
| -------- | -------------- | ----------- | -------------- |
| 2020 | January | 2 | 6 |
| 2020 | Feb | 3 | 5 |
| 2020 | April | 4 | 4 |
But when a new record is added in a new month, it should not change the earlier rows of the output. For example, after adding a new record with month May and IsRegistered as NULL, the Not Registered value should be as shown below, because the new record belongs to a new month.
| Year | Month | Registered | Not Registered |
| -------- | -------------- | ----------- | -------------- |
| 2020 | January | 2 | 6 |
| 2020 | Feb | 3 | 5 |
| 2020 | April | 4 | 4 |
| 2020 | May | 4 | 5 |
And if the new record has month as May and Is_Registered as 1(true) then the output should be as follows
| Year | Month | Registered | Not Registered |
| -------- | -------------- | ----------- | -------------- |
| 2020 | January | 2 | 6 |
| 2020 | Feb | 3 | 5 |
| 2020 | April | 4 | 4 |
| 2020 | May | 5 | 4 |
I managed to write a query but didn't get the expected output. What changes do I have to make to get the expected output?
select year(dateinserted) as [Year], datename(month,dateinserted) as [Month],
coalesce(sum(cast(isregistered as int)), 0) as Authenticated,
sum(case when isregistered is null then 1 else 0 end) as UnAuthenticated
from table_name where IsRegistered is not null
group by year(dateinserted), datename(month,dateinserted)
order by year(dateinserted), month(min(dateinserted));
Output I got after executing above query -
| Year | Month | Registered | Not Registered |
| -------- | -------------- | ----------- | -------------- |
| 2020 | January | 2 | 1 |
| 2020 | Feb | 1 | 1 |
| 2020 | April | 1 | 2 |
Hmmm . . . You seem to want a cumulative sum of the counts (the values are 1 or NULL, so count() works). For the second column, take the difference between that and the total number of rows:
select year(dateinserted) as [Year],
datename(month, dateinserted) as [Month],
count(isregistered) as registered_in_month,
sum(count(isregistered)) over (order by min(dateinserted)) as registered_up_to_month,
sum(count(*)) over () - sum(count(isregistered)) over (order by min(dateinserted)) as not_yet_registered
from table_name
group by year(dateinserted), datename(month, dateinserted)
order by year(dateinserted), month(min(dateinserted));
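To try the cumulative-count idea without SQL Server, the following sketch runs an equivalent query in SQLite through Python's sqlite3 module. Two things differ from the answer and are assumptions of this sketch: the month is derived with strftime('%Y-%m', ...) because SQLite has no year() or datename(), and the per-month counts are computed in a derived table first, with the window functions applied on top of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table_name (dateinserted text, isregistered integer)")
conn.executemany(
    "insert into table_name values (?, ?)",
    [
        ("2020-01-10", 1), ("2020-01-15", 1), ("2020-01-17", None),
        ("2020-02-17", 1), ("2020-02-21", None),
        ("2020-04-04", None), ("2020-04-18", None), ("2020-04-19", 1),
    ],
)

query = """
select ym,
       -- running total of registrations up to and including this month
       sum(registered) over (order by ym) as registered_up_to_month,
       -- all rows in the table minus the running total
       sum(total) over () - sum(registered) over (order by ym) as not_yet_registered
from (select strftime('%Y-%m', dateinserted) as ym,
             count(isregistered) as registered,  -- count() skips NULLs
             count(*) as total
      from table_name
      group by strftime('%Y-%m', dateinserted)) as per_month
order by ym
"""
result = conn.execute(query).fetchall()
print(result)
```

Because the "not yet registered" column subtracts from the total row count of the whole table, adding a row in a later month also changes that column for the earlier months.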
You can use a self-join and an analytic function as follows:
select year(t.dateinserted) as yr,
       datename(month, t.dateinserted) as mnth,
       sum(count(t.isregistered))
           over (order by min(t.dateinserted)) as registered,
       tt.cnt - sum(count(t.isregistered))
           over (order by min(t.dateinserted)) as not_registered
from your_table t
join (select t.*,
             count(*) over () as cnt
      from your_table t) tt on t.dateinserted = tt.dateinserted
group by year(t.dateinserted), datename(month, t.dateinserted), tt.cnt
order by year(t.dateinserted), month(min(t.dateinserted));

If column value is 1 then get same value for the following date

I have data like Table A. I'm trying to write SQL code that changes the Flag column value to 1 wherever the previous month's (column Date) Flag value is 1 for the same ID.
**Table A**
ID | Date | Flag |
+--------+-----------+------+
| 1 | Jan 20 | 1 |
| 1 | Feb 20 | 0 |
| 1 | Mar 20 | 0 |
| 2 | Jan 20 | 0 |
| 2 | Feb 20 | 1 |
| 2 | Mar 20 | 0 |
+--------+-----------+------+
I want results like this:
ID | Date | Flag |
+--------+-----------+------+
| 1 | Jan 20 | 1 |
| 1 | Feb 20 | 1 |
| 1 | Mar 20 | 0 |
| 2 | Jan 20 | 0 |
| 2 | Feb 20 | 1 |
| 2 | Mar 20 | 1 |
+--------+-----------+------+
I'd really appreciate it if someone could help me.
In a select you can do:
select t.*,
(case when flag = 0 and lag(flag) over (partition by id order by date) = 1
then 1
else flag
end) as imputed_flag
from t;
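A quick way to confirm the lag() expression against Table A is to run it in SQLite through Python's sqlite3 module. This is only a sketch: it assumes SQLite 3.25 or later for window functions, and it stores each month as the ISO date of its first day, which is an assumption about how "Jan 20" is represented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, date text, flag integer)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        (1, "2020-01-01", 1), (1, "2020-02-01", 0), (1, "2020-03-01", 0),
        (2, "2020-01-01", 0), (2, "2020-02-01", 1), (2, "2020-03-01", 0),
    ],
)

query = """
select id, date,
       -- lag() reads the ORIGINAL flag of the previous row, so a 1 is carried
       -- forward by exactly one row and does not cascade further
       case when flag = 0 and lag(flag) over (partition by id order by date) = 1
            then 1
            else flag
       end as imputed_flag
from t
order by id, date
"""
flags = [row[2] for row in conn.execute(query)]
print(flags)
```

Note that lag() looks at the previous row per ID, not strictly the previous calendar month, so the two are the same only when no month is missing.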
If you want to update the flag, you can use:
with toupdate as (
select t.*,
lag(flag) over (partition by id order by date) as prev_flag
from t
)
update toupdate
set flag = 1
where prev_flag = 1;
If you want to update the record then use exists as follows:
Update your_table t
Set t.flag = 1
Where t.flag = 0
And exists
(Select 1 from your_table tt
Where t.id = tt.id
And DATEADD(month, 1, tt.date) = t.date
And tt.flag = 1)

Create a temp table with multiple conditions

I'm struggling with creating a temporary table with multiple conditions.
Let's call this main table A. I want to pull data from this table and output each distinct account with its last purchase date and last payment date to a temporary table.
+---+--------+-----------+----------+
| | Acct | Trans_Date|Trans_code|
+---+--------+-----------+----------+
| 1 | ABC | July 31 | Purchase |
| 2 | ABC | Nov 5 | Payment |
| 3 | DEF | Mar 1 | Purchase |
| 4 | ABC | June 5 | Purchase |
| 5 | GFH | Feb 7 | Payment |
| 6 | GFH | Mar 9 | Purchase |
| 7 | DEF | Aug 8 | Payment |
| 8 | GFH | Mar 9 | Purchase |
| 9 | DEF | Aug 8 | Payment |
+---+--------+-----------+----------+
Output result
+---+-------+----------------+--------------+
| | Acct | Last_trans_date|Last_transpay |
+---+-------+----------------+--------------+
| 1 | ABC | July 31 | Nov 5 |
| 2 | DEF | Mar 1 | Aug 8 |
| 3 | GFH | Mar 9 | Feb 7 |
+---+-------+----------------+--------------+
I read that using a WITH clause could be an option, but I'm struggling to understand it.
You can use conditional aggregation like so:
select acct,
max(case when trans_code = 'Purchase' then trans_date end) as last_purchase,
max(case when trans_code = 'Payment' then trans_date end) as last_payment
from mytable
group by acct
The syntax to insert the result of a query to another table varies across databases. In many of them, you can use:
create table newtable as
select ... -- above query
SQL Server is a notable exception, where you would need:
select ...
into newtable
from ...
group by ...
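Here is a small sketch of the whole flow (conditional aggregation plus saving the result as a temporary table), run in SQLite through Python's sqlite3 module. The temp table name `last_trans` is hypothetical, and the dates are stored as ISO strings with an assumed year so that max() picks the latest one; with text such as "July 31" the comparison would be alphabetical instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (acct text, trans_date text, trans_code text)")
conn.executemany(
    "insert into mytable values (?, ?, ?)",
    [
        ("ABC", "2020-07-31", "Purchase"), ("ABC", "2020-11-05", "Payment"),
        ("DEF", "2020-03-01", "Purchase"), ("ABC", "2020-06-05", "Purchase"),
        ("GFH", "2020-02-07", "Payment"), ("GFH", "2020-03-09", "Purchase"),
        ("DEF", "2020-08-08", "Payment"), ("GFH", "2020-03-09", "Purchase"),
        ("DEF", "2020-08-08", "Payment"),
    ],
)

# SQLite accepts "create [temp] table ... as select"; other databases differ.
conn.execute("""
create temp table last_trans as
select acct,
       -- each CASE yields NULL for the other transaction type, and max() ignores NULLs
       max(case when trans_code = 'Purchase' then trans_date end) as last_purchase,
       max(case when trans_code = 'Payment' then trans_date end) as last_payment
from mytable
group by acct
""")
result = conn.execute("select * from last_trans order by acct").fetchall()
print(result)
```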
You can use conditional aggregation:
select acct,
max(case when trans_code = 'Purchase' then trans_date end) as last_purchase,
max(case when trans_code = 'Payment' then trans_date end) as last_payment
from t
group by acct;
You can then insert this into an existing table or use the appropriate mechanism for your database to save the result as a new table.

get records where one column has values within range across records with same column names

With a table like the one below:
+------+-----+------+----------+-----------+
| city | day | hour | car_name | car_count |
+------+-----+------+----------+-----------+
| 1 | 12 | 00 | corolla | 8 |
| 1 | 12 | 00 | city | 9 |
| 1 | 12 | 00 | amaze | 3 |
| 1 | 13 | 00 | corolla | 17 |
| 1 | 13 | 00 | city | 2 |
| 1 | 13 | 00 | amaze | 8 |
| 1 | 14 | 00 | corolla | 3 |
| 1 | 14 | 00 | amaze | 1 |
+------+-----+------+----------+-----------+
I need to find the city, day, hour where the car_count for all car_names is >= 3 and <= 10.
Expected result:
| city | day | hour |
+------+-----+------+
| 1 | 12 | 00 |
Use group by and having.
select city,day,hour
from tablename
group by city,day,hour
having sum(case when car_count>=3 and car_count<=10 then 1 else 0 end) = count(*)
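The query above can be tried against the sample rows with a short sketch using Python's sqlite3 module. The hour column is stored as text here to keep the leading zero, which is an assumption about the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tablename "
    "(city integer, day integer, hour text, car_name text, car_count integer)"
)
conn.executemany(
    "insert into tablename values (?, ?, ?, ?, ?)",
    [
        (1, 12, "00", "corolla", 8), (1, 12, "00", "city", 9), (1, 12, "00", "amaze", 3),
        (1, 13, "00", "corolla", 17), (1, 13, "00", "city", 2), (1, 13, "00", "amaze", 8),
        (1, 14, "00", "corolla", 3), (1, 14, "00", "amaze", 1),
    ],
)

query = """
select city, day, hour
from tablename
group by city, day, hour
-- keep a group only if the number of in-range rows equals the number of rows
having sum(case when car_count >= 3 and car_count <= 10 then 1 else 0 end) = count(*)
order by city, day, hour
"""
result = conn.execute(query).fetchall()
print(result)
```

Day 13 is dropped because of the counts 17 and 2, and day 14 because of the count 1, so only day 12 remains.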
In Postgres, you can use bool_and():
select city, day, hour
from t
group by 1, 2, 3
having bool_and(car_count >= 3 and car_count <= 10)
You can group by city, day and hour with the having condition sum(your condition) = count(your condition).
Basically, we create a flag for each row that satisfies the condition 3 <= car_count <= 10. We then sum the flags and count them; if the sum and the count are equal, the condition was true for all the cars for that city, day and hour.
create table want as
select city,day,hour from have
group by city,day,hour
having sum(car_count>=3 and car_count<=10)=count(car_count>=3 and car_count<=10);
Please let me know in case of any queries.