SQL Server: group by consecutive values

I have this table:
CREATE TABLE yourtable
(
HevEvenementID INT,
HjvNumeSequJour INT,
HteTypeEvenID INT
);
INSERT INTO yourtable
VALUES (12074, 1, 66), (12074, 2, 66), (12074, 3, 5),
(12074, 4, 7), (12074, 5, 17), (12074, 6, 17),
(12074, 7, 17), (12074, 8, 17), (12074, 9, 17), (12074, 10, 5)
I need to group by consecutive HteTypeEvenID. Right now I am doing this:
SELECT
HevEvenementID,
MAX(HjvNumeSequJour) AS HjvNumeSequJour,
HteTypeEvenID
FROM
(SELECT
HevEvenementID,
HjvNumeSequJour,
HteTypeEvenID
FROM
yourtable y) AS s
GROUP BY
HevEvenementID, HteTypeEvenID
ORDER BY
HevEvenementID,HjvNumeSequJour, HteTypeEvenID
which returns this:
HevEvenementID HjvNumeSequJour HteTypeEvenID
---------------------------------------------
12074 2 66
12074 4 7
12074 9 17
12074 10 5
I need to group consecutive runs of the same HteTypeEvenID instead, to get this result:
HevEvenementID HjvNumeSequJour HteTypeEvenID
----------------------------------------------
12074 2 66
12074 3 5
12074 4 7
12074 9 17
12074 10 5
Any suggestions?

In SQL Server, you can do this with aggregation and difference of row numbers:
select HevEvenementID,
max(HjvNumeSequJour) as HjvNumeSequJour,
HteTypeEvenID
from (select t.*,
row_number() over (partition by HevEvenementID order by HjvNumeSequJour) as seqnum_1,
row_number() over (partition by HevEvenementID, HteTypeEvenID order by HjvNumeSequJour) as seqnum_2
from yourtable t
) t
group by HevEvenementID, HteTypeEvenID, (seqnum_1 - seqnum_2)
order by HevEvenementID, max(HjvNumeSequJour);
I think the best way to understand how this works is by staring at the results of the subquery. You will see how the difference between the two values defines the groups of adjacent values.
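To try the difference-of-row-numbers trick without a SQL Server instance, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later); SQLite is only a stand-in for SQL Server, and the helper name group_consecutive is hypothetical. If you want to stare at the subquery as suggested, select seqnum_1 and seqnum_2 from the inner query instead.

```python
import sqlite3

# Sample data from the question: (HevEvenementID, HjvNumeSequJour, HteTypeEvenID)
ROWS = [
    (12074, 1, 66), (12074, 2, 66), (12074, 3, 5), (12074, 4, 7),
    (12074, 5, 17), (12074, 6, 17), (12074, 7, 17), (12074, 8, 17),
    (12074, 9, 17), (12074, 10, 5),
]

QUERY = """
SELECT HevEvenementID,
       MAX(HjvNumeSequJour) AS HjvNumeSequJour,
       HteTypeEvenID
FROM (SELECT y.*,
             -- position of the row within the whole event
             ROW_NUMBER() OVER (PARTITION BY HevEvenementID
                                ORDER BY HjvNumeSequJour) AS seqnum_1,
             -- position of the row among rows of the same type
             ROW_NUMBER() OVER (PARTITION BY HevEvenementID, HteTypeEvenID
                                ORDER BY HjvNumeSequJour) AS seqnum_2
      FROM yourtable y) AS s
-- the difference stays constant within a run of adjacent rows of one type
GROUP BY HevEvenementID, HteTypeEvenID, seqnum_1 - seqnum_2
ORDER BY HevEvenementID, HjvNumeSequJour
"""


def group_consecutive(rows):
    """Collapse each run of consecutive equal HteTypeEvenID values into one row."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE yourtable "
                    "(HevEvenementID INT, HjvNumeSequJour INT, HteTypeEvenID INT)")
        con.executemany("INSERT INTO yourtable VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in group_consecutive(ROWS):
        print(row)
```

The two 5 rows (sequence 3 and 10) get differences 2 and 8, so they land in separate groups, which is exactly what the plain GROUP BY in the question could not do.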

Related

Display Average Billing Amount For Each Customer only between years 2019-2021

Question: Display the average billing amount for each customer, considering only the years 2019 to 2021.
If a customer has no billing amount for one of those years, count that year as 0.
Output:
Customer_ID | Customer_Name | AVG_Billed_Amount
-------------------------------------------------------------------------
1 | A | 87.00
2 | B | 200.00
3 | C | 183.00
Explanation:
If a customer has no billing records for one of these 3 years, treat that year as one record with Billed_Amount = 0.
For example, customer C has no record for 2020, so C's average is
(250 + 300 + 0) / 3 = 183.33 (or 183.00).
The temp table has the following data:
DROP TABLE IF EXISTS #TEMP;
CREATE TABLE #TEMP
(
Customer_ID INT
, Customer_Name NVARCHAR(100)
, Billing_ID NVARCHAR(100)
, Billing_creation_Date DATETIME
, Billed_Amount INT
);
INSERT INTO #TEMP
SELECT 1, 'A', 'ID1', TRY_CAST('10-10-2020' AS DATETIME), 100 UNION ALL
SELECT 1, 'A', 'ID2', TRY_CAST('11-11-2020' AS DATETIME), 150 UNION ALL
SELECT 1, 'A', 'ID3', TRY_CAST('12-11-2021' AS DATETIME), 100 UNION ALL
SELECT 2, 'B', 'ID4', TRY_CAST('10-11-2019' AS DATETIME), 150 UNION ALL
SELECT 2, 'B', 'ID5', TRY_CAST('11-11-2020' AS DATETIME), 200 UNION ALL
SELECT 2, 'B', 'ID6', TRY_CAST('12-11-2021' AS DATETIME), 250 UNION ALL
SELECT 3, 'C', 'ID7', TRY_CAST('01-01-2018' AS DATETIME), 100 UNION ALL
SELECT 3, 'C', 'ID8', TRY_CAST('05-01-2019' AS DATETIME), 250 UNION ALL
SELECT 3, 'C', 'ID9', TRY_CAST('06-01-2021' AS DATETIME), 300
-----------------------------------------------------------------------------------
Here, 'A' has 3 transactions: two in 2020 (100 + 150) and one in 2021 (100), but none in 2019 (so Billed_Amount = 0 for that year).
So the average is calculated as (100 + 150 + 100 + 0) / 4.
DECLARE @BILL_DATE DATE = (SELECT Billing_creation_date FROM #temp GROUP BY customer_id, Billing_creation_date); /* This throws an error, because @BILL_DATE cannot hold multiple values. */
The output should look like this:
Customer_ID | Customer_Name | AVG_Billed_Amount
-------------------------------------------------
1 | A | 87.00
2 | B | 200.00
3 | C | 183.00
You just need a formula to count the number of missing years.
That's 3 - COUNT(DISTINCT YEAR(Billing_creation_Date)).
Then the average = SUM(Billed_Amount) / (COUNT(*) + (3 - COUNT(DISTINCT YEAR(Billing_creation_Date)))).
SELECT
Customer_ID,
Customer_Name,
SUM(Billed_Amount) * 1.0
/
(COUNT(*) + 3 - COUNT(DISTINCT YEAR(Billing_creation_Date)))
AS AVG_Billed_amount
FROM
#temp
WHERE
Billing_creation_Date >= '2019-01-01'
AND Billing_creation_Date < '2022-01-01'
GROUP BY
Customer_ID,
Customer_Name
Demo: https://dbfiddle.uk/ILcfiGWL
Note: the WHERE clause in another answer here would cause a table scan, because it hides the filtered column behind a function (YEAR()). Writing the WHERE clause as a half-open date range, as above, allows an index range seek if the column is indexed.
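Here is a minimal sketch of the same "missing years" formula run through Python's sqlite3 module, assuming SQLite 3.25 or later. SQLite has no YEAR(), so strftime('%Y', ...) stands in for it; the dates are rewritten as ISO strings (assuming the question's TRY_CAST values are month-day-year, though only the year matters here); the table name bills, the helper average_with_missing_years and its years parameter are hypothetical.

```python
import sqlite3

# Rows from the question's #TEMP table, dates rewritten as ISO strings.
BILLS = [
    (1, "A", "ID1", "2020-10-10", 100),
    (1, "A", "ID2", "2020-11-11", 150),
    (1, "A", "ID3", "2021-12-11", 100),
    (2, "B", "ID4", "2019-10-11", 150),
    (2, "B", "ID5", "2020-11-11", 200),
    (2, "B", "ID6", "2021-12-11", 250),
    (3, "C", "ID7", "2018-01-01", 100),
    (3, "C", "ID8", "2019-05-01", 250),
    (3, "C", "ID9", "2021-06-01", 300),
]

QUERY = """
SELECT Customer_ID,
       Customer_Name,
       -- * 1.0 avoids integer division; the denominator adds one zero-amount
       -- record per year in the range that has no bills
       SUM(Billed_Amount) * 1.0
         / (COUNT(*) + :years - COUNT(DISTINCT strftime('%Y', Billing_creation_Date)))
         AS AVG_Billed_Amount
FROM bills
-- half-open range on the raw column rather than a function of it
WHERE Billing_creation_Date >= :start AND Billing_creation_Date < :stop
GROUP BY Customer_ID, Customer_Name
ORDER BY Customer_ID
"""


def average_with_missing_years(rows, start="2019-01-01", stop="2022-01-01", years=3):
    """Average per customer, counting each year without bills as one 0 record."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE bills (Customer_ID INT, Customer_Name TEXT, "
                    "Billing_ID TEXT, Billing_creation_Date TEXT, Billed_Amount INT)")
        con.executemany("INSERT INTO bills VALUES (?, ?, ?, ?, ?)", rows)
        result = con.execute(
            QUERY, {"start": start, "stop": stop, "years": years}).fetchall()
    finally:
        con.close()
    # round for display only
    return [(cid, name, round(avg, 2)) for cid, name, avg in result]


if __name__ == "__main__":
    for row in average_with_missing_years(BILLS):
        print(row)
```

This gives 87.5 for A rather than the 87.00 shown in the question, which looks like integer truncation. A customer with no bills at all between 2019 and 2021 does not appear, because there are no rows left to group.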
Here is a query that can do that:
select s.Customer_ID, s.Customer_Name, sum(Billed_amount)/ ( 6 - count(1)) as AVG_Billed_Amount from (
select Customer_ID, Customer_Name, sum(Billed_Amount) as Billed_amount
from TEMP
where year(Billing_creation_Date) between 2019 and 2021
group by Customer_ID, Customer_Name, year(Billing_creation_Date)
) as s
group by s.Customer_ID, s.Customer_Name;
According to your description, customer C's average will be 137.5000, not 183.00, since 2018 is not counted and 2020 is missing.

Finding the largest subsets of consecutive rows with a maximum gap size (gaps and islands)

I'm trying to solve a SQL puzzle. The goal is to find subsets in which the gap between consecutive rows is less than some maximum. Think of (say) searching for suspicious credit card behaviour by looking for n transactions within m minutes.
I'm using Postgres 9.6, but a correct solution to the puzzle sticks to ANSI SQL:2008.
Input
t amt
1 10
4 10
16 40
20 10
30 50
60 5
61 5
62 5
63 5
72 5
90 30
create table d(t int, amt int);
insert into d
values (1, 10),
(4, 10),
(16, 40),
(20, 10),
(30, 50),
(60, 5),
(61, 5),
(62, 5),
(63, 5),
(72, 5),
(90, 30);
Expected Output
All subsequences such that the difference of t with the previous row is less than 10.
start_t end_t cnt total
1 4 2 20
16 20 2 50
30 30 1 50
60 72 5 25
90 90 1 30
Notes
I've tried the "difference of row_number" (Tabibitosan method), but the fact that t is not necessarily consecutive foiled my efforts.
Thank you for your help!
Flag the start of each group, then aggregate by group:
select min(t) start_t, max(t) end_t, count(*) cnt, sum(amt) total
from (
select t, amt, sum(flag) over(order by t) grp
from (
select t, amt, case when t - lag(t, 1, t-11) over(order by t) >= 10 then 1 end flag
from d
) t
) t
group by grp
order by start_t;
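The same flag-then-running-sum idea can be checked with Python's sqlite3 module (SQLite 3.25 or later assumed, standing in for Postgres). This sketch swaps the answer's lag(t, 1, t-11) default for COALESCE(..., :max_gap), which also flags the first row, and makes the gap a parameter; the helper name islands and its max_gap parameter are hypothetical.

```python
import sqlite3

# Table d from the question: (t, amt)
D = [(1, 10), (4, 10), (16, 40), (20, 10), (30, 50),
     (60, 5), (61, 5), (62, 5), (63, 5), (72, 5), (90, 30)]

QUERY = """
SELECT MIN(t) AS start_t, MAX(t) AS end_t, COUNT(*) AS cnt, SUM(amt) AS total
FROM (SELECT t, amt,
             -- running count of group starts = group number
             SUM(flag) OVER (ORDER BY t) AS grp
      FROM (SELECT t, amt,
                   -- 1 when the row starts a new group: it is the first row,
                   -- or the gap to the previous t is at least :max_gap
                   CASE WHEN COALESCE(t - LAG(t) OVER (ORDER BY t), :max_gap) >= :max_gap
                        THEN 1 ELSE 0 END AS flag
            FROM d) AS flagged) AS numbered
GROUP BY grp
ORDER BY start_t
"""


def islands(rows, max_gap=10):
    """Group rows whose t differs from the previous row's t by less than max_gap."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE d (t INT, amt INT)")
        con.executemany("INSERT INTO d VALUES (?, ?)", rows)
        return con.execute(QUERY, {"max_gap": max_gap}).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in islands(D):
        print(row)
```

Because the group number is a running count of flags rather than a row-number difference, it does not care whether t is consecutive, which is what defeated the Tabibitosan attempt.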

Cumulative sum of a column

I have a table with the following data.
COUNTRY LEVEL NUM_OF_DUPLICATES
US 9 6
US 8 24
US 7 12
US 6 20
US 5 39
US 4 81
US 3 80
US 2 130
US 1 178
US 0 254
I wrote a query that calculates the cumulative sum of the rows and got the output below.
COUNTRY LEVEL NUM_OF_DUPLICATES POOL
US 9 6 6
US 8 24 30
US 7 12 42
US 6 20 62
US 5 39 101
US 4 81 182
US 3 80 262
US 2 130 392
US 1 178 570
US 0 254 824
Now I want to filter the data and keep only the rows where POOL <= 300. If POOL does not contain the value 300 exactly, I should also take the first value after 300. In the example above, POOL has no value of 300, so the next value after 300 is 392. So I need a query that pulls the records with POOL <= 392 (for this example), which will yield this output:
COUNTRY LEVEL NUM_OF_DUPLICATES POOL
US 9 6 6
US 8 24 30
US 7 12 42
US 6 20 62
US 5 39 101
US 4 81 182
US 3 80 262
US 2 130 392
Please let me know your thoughts. Thanks in advance.
declare @t table(Country varchar(5), Level int, Num_of_Duplicates int)
insert into @t(Country, Level, Num_of_Duplicates)
values
('US', 9, 6),
('US', 8, 24),
('US', 7, 12),
('US', 6, 20),
('US', 5, 39),
('US', 4, 81),
('US', 3, 80),
('US', 2, 130/*-92*/),
('US', 1, 178),
('US', 0, 430);
select *, sum(Num_of_Duplicates) over(partition by country order by Level desc) as Pool,
(sum(Num_of_Duplicates) over(partition by country order by Level desc)-Num_of_Duplicates) / 300 as flag,--any row which starts before 300 will have flag=0
--or
case when sum(Num_of_Duplicates) over(partition by country order by Level desc)-Num_of_Duplicates < 300 then 1 else 0 end as startsbefore300
from @t;
select *
from
(
select *, sum(Num_of_Duplicates) over(partition by country order by Level desc) as Pool
from @t
) as t
where Pool - Num_of_Duplicates < 300 ;
The logic here is quite simple:
Calculate the running sum POOL value up to the current row.
Filter rows so that the previous row's running total is < 300. You can get that total either by subtracting the current row's value or by using a second running sum.
If the total up to the current row is exactly 300, the previous row's total is less, so this row is included.
If the current row's total is more than 300 but the previous row's total is less, this row is also included.
All later rows are excluded.
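The rule above (keep a row while the total before it is still under the threshold) can be sketched with Python's sqlite3 module, assuming SQLite 3.25 or later as a stand-in for SQL Server. The table name dupes, the helper pool_until and its threshold parameter are hypothetical; the sample data uses 130 and 254 for levels 2 and 0 so that it matches the question's POOL column.

```python
import sqlite3

# (Country, Level, Num_of_Duplicates), consistent with the question's POOL values
ROWS = [
    ("US", 9, 6), ("US", 8, 24), ("US", 7, 12), ("US", 6, 20), ("US", 5, 39),
    ("US", 4, 81), ("US", 3, 80), ("US", 2, 130), ("US", 1, 178), ("US", 0, 254),
]

QUERY = """
SELECT Country, Level, Num_of_Duplicates, Pool
FROM (SELECT Country, Level, Num_of_Duplicates,
             -- running total, highest level first
             SUM(Num_of_Duplicates) OVER (PARTITION BY Country
                                          ORDER BY Level DESC
                                          ROWS UNBOUNDED PRECEDING) AS Pool
      FROM dupes) AS running
-- keep a row if the total *before* it is still below the threshold
WHERE Pool - Num_of_Duplicates < :threshold
ORDER BY Country, Level DESC
"""


def pool_until(rows, threshold=300):
    """Rows up to and including the first one whose running total reaches threshold."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE dupes (Country TEXT, Level INT, Num_of_Duplicates INT)")
        con.executemany("INSERT INTO dupes VALUES (?, ?, ?)", rows)
        return con.execute(QUERY, {"threshold": threshold}).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in pool_until(ROWS):
        print(row)
```

With a threshold of 30, which the running total hits exactly at level 8, the result stops at POOL = 30, matching the "exactly 300" case described above.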
It's unclear what ordering you want. I've ordered by the NUM_OF_DUPLICATES column ascending, but you may want something else.
SELECT
COUNTRY,
LEVEL,
NUM_OF_DUPLICATES,
POOL
FROM (
SELECT *,
POOL = SUM(NUM_OF_DUPLICATES) OVER (ORDER BY NUM_OF_DUPLICATES ROWS UNBOUNDED PRECEDING)
-- alternative calculation
-- ,POOLPrev = SUM(NUM_OF_DUPLICATES) OVER (ORDER BY NUM_OF_DUPLICATES ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
FROM YourTable
) t
WHERE POOL - NUM_OF_DUPLICATES < 300;
-- you could also use POOLPrev above (it is NULL on the first row, so test ISNULL(POOLPrev, 0) < 300)
I used two temp tables to get the answer.
DECLARE @t TABLE(Country VARCHAR(5), [Level] INT, Num_of_Duplicates INT)
INSERT INTO @t(Country, Level, Num_of_Duplicates)
VALUES ('US', 9, 6),
('US', 8, 24),
('US', 7, 12),
('US', 6, 20),
('US', 5, 39),
('US', 4, 81),
('US', 3, 80),
('US', 2, 130),
('US', 1, 178),
('US', 0, 254);
SELECT
Country
,Level
, Num_of_Duplicates
, SUM (Num_of_Duplicates) OVER (ORDER BY id) AS [POOL]
INTO #temp_table
FROM
(
SELECT
Country,
level,
Num_of_Duplicates,
-- every row has Country = 'US', so ordering by country would number rows arbitrarily
ROW_NUMBER() OVER (ORDER BY [Level] DESC) AS id
FROM @t
) AS A
SELECT
[POOL],
ROW_NUMBER() OVER (ORDER BY [POOL] ) AS [rank]
INTO #Temp_2
FROM #temp_table
WHERE [POOL] >= 300
SELECT *
FROM #temp_table WHERE
[POOL] <= (SELECT [POOL] FROM #Temp_2 WHERE [rank] = 1 )
DROP TABLE #temp_table
DROP TABLE #Temp_2

How to write this query in SQL with category and date sum

I have a table in MySQL with the following data:
DATE Category AMOUNT
2016-1-1 A 12
2016-1-1 B 10
2016-1-2 A 5
2016-1-3 C 1
2016-2-1 A 5
2016-2-1 B 6
2016-2-2 A 7
2016-2-3 C 3
How can I get the following result?
MONTH TOTAL Category-A Category-B Category-C
2016 Jan 28 17 10 1
2016 Feb 21 12 6 3
If you're using MySQL, this would work:
SELECT
DATE_FORMAT(DATE, '%Y %b') AS MONTH,
SUM(AMOUNT) AS TOTAL,
SUM(IF(CATEGORY='A', AMOUNT, 0)) AS `Category-A`,
SUM(IF(CATEGORY='B', AMOUNT, 0)) AS `Category-B`,
SUM(IF(CATEGORY='C', AMOUNT, 0)) AS `Category-C`
FROM your_table
GROUP BY MONTH;
For other database engines you might have to change that a little.
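As an example of such a change, here is a minimal sketch of the same conditional aggregation in SQLite via Python's sqlite3 module. SQLite has neither DATE_FORMAT nor IF, so it uses strftime and the portable CASE form; because SQLite's strftime has no month-name format, months come out as '2016-01' rather than '2016 Jan'. The column names sale_date, category, amount, the table sales and the helper monthly_totals are hypothetical, and the dates are zero-padded so strftime can parse them.

```python
import sqlite3

# Question data with zero-padded ISO dates: (sale_date, category, amount)
SALES = [
    ("2016-01-01", "A", 12), ("2016-01-01", "B", 10),
    ("2016-01-02", "A", 5), ("2016-01-03", "C", 1),
    ("2016-02-01", "A", 5), ("2016-02-01", "B", 6),
    ("2016-02-02", "A", 7), ("2016-02-03", "C", 3),
]

QUERY = """
SELECT strftime('%Y-%m', sale_date) AS month,
       SUM(amount) AS total,
       -- conditional aggregation: each column sums only its own category
       SUM(CASE WHEN category = 'A' THEN amount ELSE 0 END) AS "Category-A",
       SUM(CASE WHEN category = 'B' THEN amount ELSE 0 END) AS "Category-B",
       SUM(CASE WHEN category = 'C' THEN amount ELSE 0 END) AS "Category-C"
FROM sales
GROUP BY strftime('%Y-%m', sale_date)
ORDER BY month
"""


def monthly_totals(rows):
    """One row per month: total plus one column per category."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE sales (sale_date TEXT, category TEXT, amount INT)")
        con.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in monthly_totals(SALES):
        print(row)
```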
DECLARE @Table1 TABLE
(DATE varchar(8), CAT varchar(1), AMOUNT int)
;
INSERT INTO @Table1
(DATE, CAT, AMOUNT)
VALUES
('2016-1-1', 'A', 12),
('2016-1-1', 'B', 10),
('2016-1-2', 'A', 5),
('2016-1-3', 'C', 1),
('2016-2-1', 'A', 5),
('2016-2-1', 'B', 6),
('2016-2-2', 'A', 7),
('2016-2-3', 'C', 3)
;
Select Months,
[A][Category-A],
[B][Category-B],
[C][Category-C],
SUM([A]+[B]+[C])TOTAL
from (
select CAST(year(DATE) AS VARCHAR)+' '+CONVERT(CHAR(3), DATENAME(MONTH, date))Months,
CAT,
AMOUNT
from @Table1)T
PIVOT (SUM(AMOUNT) FOR cat IN ([A],[B],[C]))P
GROUP BY Months,[A],[B],[C]
ORDER BY CONVERT(CHAR(3), DATENAME(MONTH, Months)) desc

Calculate Returns in Oracle SQL

I have the following data:
Date Sec ID Price
01-Jan-2014, 1, 100
02-Jan-2014, 1, 111
03-Jan-2014, 1, 90
04-Jan-2014, 1, 121
01-Jan-2014, 2, 10
02-Jan-2014, 2, 11
03-Jan-2014, 2, 9
04-Jan-2014, 2, 12
I am using the LAG function in the query below, but I'm not getting the right results:
select sec_id,date_of_data,price,
LAG(sec_id,1) over (order by sec_id) as prev_sec_id,
LAG(date_of_data,1) over (order by sec_id) as prev_date,
LAG(price,1) over (order by sec_id) as prev_price,
price/LAG(price,1) over (order by sec_id)-1 as price_return
from eqa.asset_mkt_price_ts
where sec_id in (1,2)
and date_of_data between '01-Jan-2014' and '04-Jan-2014';
The results are:
Date Sec ID Price Prev Sec ID Prev Price
01-Jan-2014, 1, 100, NULL, NULL
02-Jan-2014, 1, 111, 1, 100
03-Jan-2014, 1, 90, 1, 111
04-Jan-2014, 1, 121, 1, 90
01-Jan-2014, 2, 10, 1, 121 ----- Issue Case
02-Jan-2014, 2, 11, 2, 10
03-Jan-2014, 2, 9, 2, 11
04-Jan-2014, 2, 12, 2, 12
As seen above, the results are wrong: for Sec ID 2, the previous price is taken from Sec ID 1, which is not correct.
I hope an expert here can help me.
Thanks
Hitesh
You need to replace order by sec_id with partition by sec_id order by date_of_data. Using order by sec_id produces a single analytic window over the whole input table ordered by sec_id; the order of rows within each sec_id is then unpredictable, and LAG always returns the previous row, even when a new sec_id group has just started.
Partitioning by sec_id gives two analytic windows, so the lag function works as you would like it to:
select x.*
from
(select sec_id,date_of_data,price,
LAG(sec_id,1) over (partition by sec_id order by date_of_data) as prev_sec_id,
LAG(date_of_data,1) over (partition by sec_id order by date_of_data) as prev_date,
LAG(price,1) over (partition by sec_id order by date_of_data) as prev_price,
price/LAG(price,1) over (partition by sec_id order by date_of_data)-1 as price_return
from eqa.asset_mkt_price_ts
where sec_id in (1,2)) x where x.price_return < 0.3;
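To see the effect of the partition, here is a minimal sketch that runs the partitioned LAG through Python's sqlite3 module (SQLite 3.25 or later assumed, standing in for Oracle), without the answer's extra price_return < 0.3 filter. Dates are stored as ISO strings, and the helper price_returns with its start/stop parameters is hypothetical; price * 1.0 avoids SQLite's integer division.

```python
import sqlite3

# (date_of_data, sec_id, price) from the question
PRICES = [
    ("2014-01-01", 1, 100), ("2014-01-02", 1, 111),
    ("2014-01-03", 1, 90), ("2014-01-04", 1, 121),
    ("2014-01-01", 2, 10), ("2014-01-02", 2, 11),
    ("2014-01-03", 2, 9), ("2014-01-04", 2, 12),
]

QUERY = """
SELECT sec_id, date_of_data, price,
       -- PARTITION BY restarts LAG for each security, so the first row of
       -- every sec_id has no previous price (NULL) instead of another sec_id's
       LAG(price) OVER (PARTITION BY sec_id ORDER BY date_of_data) AS prev_price,
       price * 1.0 / LAG(price) OVER (PARTITION BY sec_id ORDER BY date_of_data) - 1
         AS price_return
FROM asset_mkt_price_ts
WHERE sec_id IN (1, 2)
  AND date_of_data BETWEEN :start AND :stop
ORDER BY sec_id, date_of_data
"""


def price_returns(rows, start="2014-01-01", stop="2014-01-04"):
    """Day-over-day return per security; NULL (None) on each security's first day."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE asset_mkt_price_ts "
                    "(date_of_data TEXT, sec_id INT, price INT)")
        con.executemany("INSERT INTO asset_mkt_price_ts VALUES (?, ?, ?)", rows)
        return con.execute(QUERY, {"start": start, "stop": stop}).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in price_returns(PRICES):
        print(row)
```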