How to exclude data based on the weeks in oracle - sql

CREATE TABLE states_tab (
id NUMBER(10),
states VARCHAR2(50),
action VARCHAR2(50),
schedule_time DATE,
CONSTRAINT pk_states_tab PRIMARY KEY ( id )
);
INSERT INTO states_tab VALUES(1,'Albania','Rejected',TO_DATE('07-03-22','DD-MM-RR'));
INSERT INTO states_tab VALUES(2,'Albania','Approved',TO_DATE('07-03-22','DD-MM-RR'));
INSERT INTO states_tab VALUES(3,'Albania','Rejected',TO_DATE('28-02-22','DD-MM-RR'));
INSERT INTO states_tab VALUES(4,'Albania','Approved',TO_DATE('21-02-22','DD-MM-RR'));
INSERT INTO states_tab VALUES(5,'Albania','Reviewed',TO_DATE('14-02-22','DD-MM-RR'));
INSERT INTO states_tab VALUES(6,'Albania','Reviewed',TO_DATE('14-02-22','DD-MM-RR'));
INSERT INTO states_tab VALUES(7,'Albania','Reviewed',TO_DATE('07-02-22','DD-MM-RR'));
commit;
Hi Team,
Above is some sample data from which I need to extract rows based on their dates. For example, my sample data runs from 7th Feb to 7th March, but I need only the past 4 weeks' data, i.e. back to 14th Feb. So I need to exclude any data older than the 4th week. Below is my attempt.
SELECT
states,
schedule_time,
SUM(decode(action, 'Rejected', 1, 0)) reject_count,
SUM(decode(action, 'Approved', 1, 0)) approve_count,
SUM(decode(action, 'Reviewed', 1, 0)) review_count
FROM
states_tab
GROUP BY
states,
schedule_time
ORDER BY schedule_time DESC ;
The above query returns all the records, but I need to exclude the records that are older than the 4th week counting back from 7th Mar 2022.
Expected Output:
+---------+----------------+---------------+----------------+---------------+
| States | Schedule_TIME | REJECT_COUNT | APPROVE_COUNT | REVIEW_COUNT |
+---------+----------------+---------------+----------------+---------------+
| Albania | 07-03-22 | 1 | 1 | 0 |
| Albania | 28-02-22 | 1 | 0 | 0 |
| Albania | 21-02-22 | 0 | 1 | 0 |
| Albania | 14-02-22 | 0 | 0 | 2 |
+---------+----------------+---------------+----------------+---------------+
I just need to exclude the records older than the 4th week from my existing query. The rest of the output from my attempt is as expected, but how can I exclude the data older than the 4th week?
Note: I need only the past 4 weeks of data counting back from the current date, no matter how many records are present. This is what I am not able to achieve.
Tool used: Oracle SQL Developer(18c)

Sorry to inform you, but what you got actually is 4 weeks back. The result you expect is 3 weeks back.
SQL> select sysdate,
2 trunc(sysdate) - (4 * 7) four_weeks,
3 trunc(sysdate) - (3 * 7) three_Weeks
4 from dual;
SYSDATE FOUR_WEEKS THREE_WEEK
---------- ---------- ----------
07.03.2022 07.02.2022 14.02.2022
Anyway - whichever period you need, you'll get data if you filter it, and that's done with the WHERE clause (see line #8):
SQL> SELECT
2 states,
3 schedule_time,
4 SUM(decode(action, 'Rejected', 1, 0)) reject_count,
5 SUM(decode(action, 'Approved', 1, 0)) approve_count,
6 SUM(decode(action, 'Reviewed', 1, 0)) review_count
7 FROM states_tab
8 where schedule_time >= trunc(sysdate) - (3 * 7)
9 GROUP BY
10 states,
11 schedule_time
12 ORDER BY schedule_time DESC ;
STATES SCHEDULE_T REJECT_COUNT APPROVE_COUNT REVIEW_COUNT
--------------- ---------- ------------ ------------- ------------
Albania 07.03.2022 1 1 0
Albania 28.02.2022 1 0 0
Albania 21.02.2022 0 1 0
Albania 14.02.2022 0 0 2
SQL>
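To make the date arithmetic concrete, here is a minimal Python sketch (not Oracle code) of the same filter-then-count logic. The `weekly_counts` helper is hypothetical, and the fixed `today` value is an assumption that stands in for TRUNC(SYSDATE) on the day the answer was written.

```python
from collections import Counter
from datetime import date, timedelta

# Sample rows from the question: (state, action, schedule_date).
rows = [
    ("Albania", "Rejected", date(2022, 3, 7)),
    ("Albania", "Approved", date(2022, 3, 7)),
    ("Albania", "Rejected", date(2022, 2, 28)),
    ("Albania", "Approved", date(2022, 2, 21)),
    ("Albania", "Reviewed", date(2022, 2, 14)),
    ("Albania", "Reviewed", date(2022, 2, 14)),
    ("Albania", "Reviewed", date(2022, 2, 7)),
]


def weekly_counts(rows, today, weeks_back):
    """Count actions per (state, date) for rows dated on or after the cutoff."""
    cutoff = today - timedelta(weeks=weeks_back)
    counts = {}
    for state, action, day in rows:
        if day >= cutoff:  # plays the role of the WHERE clause
            counts.setdefault((state, day), Counter())[action] += 1
    return cutoff, counts


today = date(2022, 3, 7)  # assumption: the "current date" used in the answer
cutoff3, counts3 = weekly_counts(rows, today, 3)
cutoff4, counts4 = weekly_counts(rows, today, 4)
print(cutoff3, len(counts3))  # three weeks back keeps the 14 Feb rows as the oldest
print(cutoff4, len(counts4))  # four weeks back also keeps the 7 Feb row
```

Going back three weeks gives the four dates in the expected output, while going back four weeks also lets the 7 Feb row through.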

Related

How to pivot column data into a row where a maximum qty total cannot be exceeded?

Introduction:
I have come across an unexpected challenge. I'm hoping someone can help, and I'm interested in the best way to manipulate the data for this problem.
Scenario:
I need to combine column data associated with two different ID columns. Each row I have holds an item_id and the quantity for that item_id. Please see below for an example.
+-------+-------+-------+---+
|cust_id|pack_id|item_id|qty|
+-------+-------+-------+---+
| 1 | A | 1 | 1 |
| 1 | A | 2 | 1 |
| 1 | A | 3 | 4 |
| 1 | A | 4 | 0 |
| 1 | A | 5 | 0 |
+-------+-------+-------+---+
I need to manipulate the data shown above so that 24 rows (for 24 item_ids) are combined into a single row. In the example above I have chosen 5 items to keep things simple. The output format I want, assuming 5 item_ids, is shown below.
+---------+---------+---+---+---+---+---+
| cust_id | pack_id | 1 | 2 | 3 | 4 | 5 |
+---------+---------+---+---+---+---+---+
| 1 | A | 1 | 1 | 4 | 0 | 0 |
+---------+---------+---+---+---+---+---+
However, here's the condition that is making this troublesome. The maximum total quantity for each row must not exceed 5. If the total quantity exceeds 5 a new row associated to the cust_id and pack_id must be created for the rest of the item_id quantities. Please see below for the desired output.
+---------+---------+---+---+---+---+---+
| cust_id | pack_id | 1 | 2 | 3 | 4 | 5 |
+---------+---------+---+---+---+---+---+
| 1 | A | 1 | 1 | 3 | 0 | 0 |
| 1 | A | 0 | 0 | 1 | 0 | 0 |
+---------+---------+---+---+---+---+---+
Notice how the quantities of item_ids 1, 2 and 3 summed together equal 6. This exceeds the maximum total quantity of 5 for each row. The second row holds the remainder; in this case only item_id 3 has a quantity of 1 left over.
Note, if a 2nd row needs to be created, the total quantity displayed in that row also cannot exceed 5. There is a known item_id limit of 24, but there is no known limit on the quantity associated with each item_id.
Here's an approach that comes a bit from left field.
One option would have been a recursive CTE, building the rows one by one.
Instead, I've taken an approach where I
1. Create a new (virtual) table with 1 row per item unit (so if the quantities add up to 6 items, there will be 6 rows)
2. Group those rows into groups of 5 (I've called these rn_batches)
3. Pivot those (based on counts per item per rn_batch)
Each of these steps is relatively simple.
Creating one row per item unit is done using an INNER JOIN to a numbers table with n <= the relevant quantity.
The grouping then just assigns rn_batch = 1 for the first 5 rows, rn_batch = 2 for the next 5 rows, etc., until there are no more rows left for that order (based on cust_id/pack_id).
Here is the code
/* Data setup */
CREATE TABLE #Order (cust_id int, pack_id varchar(1), item_id int, qty int, PRIMARY KEY (cust_id, pack_id, item_id))
INSERT INTO #Order (cust_id, pack_id, item_id, qty) VALUES
(1, 'A', 1, 1),
(1, 'A', 2, 1),
(1, 'A', 3, 4),
(1, 'A', 4, 0),
(1, 'A', 5, 0);
/* Pivot results */
WITH Nums(n) AS
(SELECT (c * 100) + (b * 10) + (a) + 1 AS n
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) A(a)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) B(b)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) C(c)
),
ItemBatches AS
(SELECT cust_id, pack_id, item_id,
FLOOR((ROW_NUMBER() OVER (PARTITION BY cust_id, pack_id ORDER BY item_id, N.n)-1) / 5) + 1 AS rn_batch
FROM #Order O
INNER JOIN Nums N ON N.n <= O.qty
)
SELECT *
FROM (SELECT cust_id, pack_id, rn_batch, 'Item_' + LTRIM(STR(item_id)) AS item_desc
FROM ItemBatches
) src
PIVOT
(COUNT(item_desc) FOR item_desc IN ([Item_1], [Item_2], [Item_3], [Item_4], [Item_5])) pvt
ORDER BY cust_id, pack_id, rn_batch;
And here are results
cust_id pack_id rn_batch Item_1 Item_2 Item_3 Item_4 Item_5
1 A 1 1 1 3 0 0
1 A 2 0 0 1 0 0
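For readers who find the set-based version hard to follow, the same expand, batch and count steps can be sketched in plain Python. This is only an illustration of the logic, not a replacement for the T-SQL; the `pivot_in_batches` function and its parameters are hypothetical names.

```python
from collections import Counter, defaultdict

MAX_PER_ROW = 5  # maximum total quantity allowed in one output row


def pivot_in_batches(orders, item_ids, max_per_row=MAX_PER_ROW):
    """orders: iterable of (cust_id, pack_id, item_id, qty) tuples."""
    # Step 1: one list entry per unit, ordered by item_id within each order.
    units = defaultdict(list)
    for cust_id, pack_id, item_id, qty in sorted(orders):
        units[(cust_id, pack_id)].extend([item_id] * qty)

    result = []
    for (cust_id, pack_id), items in units.items():
        # Step 2: slice the units into consecutive batches of max_per_row.
        for start in range(0, len(items), max_per_row):
            batch = Counter(items[start:start + max_per_row])
            # Step 3: pivot, producing one count per item_id column.
            result.append((cust_id, pack_id, start // max_per_row + 1,
                           [batch[i] for i in item_ids]))
    return result


orders = [(1, "A", 1, 1), (1, "A", 2, 1), (1, "A", 3, 4),
          (1, "A", 4, 0), (1, "A", 5, 0)]
batches = pivot_in_batches(orders, item_ids=[1, 2, 3, 4, 5])
for row in batches:
    print(row)
```

As in the SQL, an order whose quantities are all zero produces no rows, because there are no units to batch.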
Here's a db<>fiddle with additional data in the #Order table, the answer above, and the same processing with each step separated.
Notes
This approach (with the virtual numbers table) assumes a maximum of 1,000 for a given item in an order. If you need more, you can easily extend that numbers table by adding additional CROSS JOINs.
While I am in awe of the coders who made SQL Server and how it determines execution plans in milliseconds, for larger datasets I give SQL Server zero chance of accurately predicting how many rows will be in each step. As such, for performance, it may work better to split the code into parts (including temp tables), similar to the db<>fiddle example.

How can I see point in time rolling five week counts of distinct values?

I am trying to see the point in time rolling five week count of distinct employees paid. For example, in week 48 I would need to see the count of distinct employees paid in weeks 44 through 48. I think I have to include something like "WHERE Week_Number BETWEEN Week_Number -5 AND Week_Number" but am not sure how to make this work. The output should just be the Year, Week Number, and count of distinct employee IDs.
SELECT Week_Number,
Year,
Account,
count(distinct EmployeeID) as 'EmployeeCount'
FROM [Table]
GROUP BY Week_Number, Year, Account
I assume that you have a data table like this:
YearNumber | WeekNumber | Account | EmployeeID
----------------------------------------------
2019 | 51 | 101 | 1
2019 | 48 | 101 | 2
And this is the result you want to see:
YearNumber | WeekNumber | Account | Quantity
----------------------------------------------
2019 | 48 | 101 | 1
2019 | 49 | 101 | 1
2019 | 50 | 101 | 1
2019 | 51 | 101 | 2
2019 | 52 | 101 | 2
2020 | 1 | 101 | 1
2020 | 2 | 101 | 1
2020 | 3 | 101 | 1
Each payment counts toward the five windows ending in its own week and in the four weeks after it. One employee is paid in week 48 and another in week 51, so their windows on account 101 overlap in weeks 51 and 52; in the other weeks only one employee falls inside the window.
To also answer your question in the comment: this - I think - is a good way to provide a sample data and expected result when you ask on SO.
The query which helped me produce the results above:
SELECT
d.Year + IIF((d.Week + n.Number - 1) >= 52, 1, 0) AS Year,
(d.Week + n.Number - 1) % 52 + 1 AS Week,
d.AccountID,
COUNT(DISTINCT d.EmployeeID) AS Quantity -- DISTINCT: an employee paid twice within one window counts once
FROM Data d
-- repeat each payment row for the week it was paid and the four weeks after it
CROSS APPLY (SELECT * FROM Number n WHERE Number BETWEEN 0 AND 4) n
GROUP BY
d.Year + IIF((d.Week + n.Number - 1) >= 52, 1, 0), -- Year
(d.Week + n.Number - 1) % 52 + 1, -- Week
d.AccountID
This uses a Number table, which is simply a table containing the integers; such a table helps a lot in queries like this. The code also has minimal handling for the year rollover, but be aware that you may need to handle years containing 53 weeks.
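The expansion trick can be checked outside the database with a small Python sketch. It assumes 52-week years, as the query does, and counts distinct employee IDs per window; `rolling_distinct` and the tuple layout are hypothetical names chosen for illustration.

```python
from collections import defaultdict

WEEKS_PER_YEAR = 52  # assumption: 53-week years are ignored, as in the query
WINDOW = 5           # a payment contributes to five consecutive weekly windows


def rolling_distinct(payments, window=WINDOW):
    """payments: iterable of (year, week, account, employee_id) tuples."""
    seen = defaultdict(set)
    for year, week, account, employee_id in payments:
        for offset in range(window):
            # A payment in week w falls inside the windows ending in w .. w+4.
            index = week - 1 + offset
            target = (year + index // WEEKS_PER_YEAR,
                      index % WEEKS_PER_YEAR + 1,
                      account)
            seen[target].add(employee_id)  # a set keeps each employee once
    return {key: len(ids) for key, ids in sorted(seen.items())}


payments = [(2019, 51, 101, 1), (2019, 48, 101, 2)]
counts = rolling_distinct(payments)
for (year, week, account), quantity in counts.items():
    print(year, week, account, quantity)
```

Using a set per window is what makes the count distinct: an employee paid in two weeks that fall into the same window is still counted once.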

Oracle SQL - Efficiently calculate number of concurrent phone calls

I know that this question is essentially a duplicate of an older question I asked but quite a few things changed since I asked that question so I thought I'd ask a new question about it.
I have a table that holds phone call records which has the following fields:
END: Holds the timestamp of when a call ended - Data Type: DATE
LINE: Holds the phone line that was used for a call - Data Type: NUMBER
CALLDURATION: Holds the duration of a call in seconds - Data Type: NUMBER
The table has entries like this:
END LINE CALLDURATION
---------------------- ------------------- -----------------------
25/01/2012 14:05:10 6 65
25/01/2012 14:08:51 7 1142
25/01/2012 14:20:36 5 860
I need to create a query that returns the number of concurrent phone calls based on the data from that table. The query should calculate that number in different intervals. What I mean by that is that the results of the query should only contain a new entry whenever a call was started or ended. As long as the number of concurrent phone calls stays the same there should not be any additional entry in the output.
To make this more clear, here is an example of everything the query should return based on the example entries from the previous table:
TIMESTAMP LINE CALLDURATION STATUS CURRENTLYUSEDLINES
---------------------- ----- ------------- ------- -------------------
25/01/2012 13:49:49 7 1142 1 1
25/01/2012 14:04:05 6 65 1 2
25/01/2012 14:05:10 6 65 -1 1
25/01/2012 14:06:16 5 860 1 2
25/01/2012 14:08:51 7 1142 -1 1
25/01/2012 14:20:36 5 860 -1 0
I got the following example query from a colleague but unfortunately I do not fully understand it and it also does not work exactly as it should because for calls with a duration of 0 seconds it would sometimes have "-1" in the CURRENTLYUSEDLINES-column:
SELECT COALESCE (SUM (STATUS) OVER (ORDER BY END ROWS BETWEEN UNBOUNDED PRECEDING AND 0 PRECEDING), 0) CURRENTLYUSEDLINES
FROM (SELECT END - CALLDURATION / 86400 AS TIMESTAMP,
LINE,
CALLDURATION,
1 AS STATUS
FROM t_calls
UNION ALL
SELECT END,
LINE,
CALLDURATION,
-1 AS STATUS
FROM t_calls) t
ORDER BY 1;
Now I am supposed to make that query work like in the example but I'm not sure how to do that.
Could someone help me out with this or at least explain this query so I can try fixing it myself?
I think this will solve your problem:
SELECT TIMESTAMP,
SUM(SUM(STATUS)) OVER (ORDER BY TIMESTAMP) as CURRENTLYUSEDLINES
FROM ((SELECT END - CALLDURATION / (24*60*60) AS TIMESTAMP,
COUNT(*) AS STATUS
FROM t_calls
GROUP BY END - CALLDURATION / (24*60*60)
) UNION ALL
(SELECT END, - COUNT(*) AS STATUS
FROM t_calls
GROUP BY END
)
) t
GROUP BY TIMESTAMP
ORDER BY 1;
This is a slight simplification of your query. Because everything is aggregated per timestamp, you may get 0s but never negative values.
You are getting negative values because the "ends" of some calls are processed before their "begins" when both share a timestamp. This version does all the work for a timestamp at once, because there is only one row per timestamp.
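The idea of netting starts against ends per timestamp can be illustrated with a short Python sketch. It is not Oracle code; the `concurrent_calls` function and the tuple layout are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import accumulate


def concurrent_calls(calls):
    """calls: iterable of (end_time, line, duration_seconds) tuples."""
    net = defaultdict(int)
    for end, _line, duration in calls:
        net[end - timedelta(seconds=duration)] += 1  # a call starts
        net[end] -= 1                                 # a call ends
    stamps = sorted(net)
    # Running total of the net change, one value per distinct timestamp.
    running = accumulate(net[s] for s in stamps)
    return list(zip(stamps, running))


calls = [
    (datetime(2012, 1, 25, 14, 5, 10), 6, 65),
    (datetime(2012, 1, 25, 14, 8, 51), 7, 1142),
    (datetime(2012, 1, 25, 14, 20, 36), 5, 860),
]
timeline = concurrent_calls(calls)
for stamp, used in timeline:
    print(stamp, used)
```

A call with a duration of 0 seconds adds +1 and -1 to the same timestamp, so its net change is 0 and the running total cannot dip below zero because of it.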
You can use an UNPIVOT (using a similar technique to my answer here):
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( END, LINE, CALLDURATION ) AS
SELECT CAST( TIMESTAMP '2012-01-25 14:05:10' AS DATE ), 6, 65 FROM DUAL UNION ALL
SELECT CAST( TIMESTAMP '2012-01-25 14:08:51' AS DATE ), 7, 1142 FROM DUAL UNION ALL
SELECT CAST( TIMESTAMP '2012-01-25 14:20:36' AS DATE ), 5, 860 FROM DUAL;
Query 1:
SELECT p.*,
SUM( status ) OVER ( ORDER BY dt, status DESC ) AS currentlyusedlines
FROM (
SELECT end - callduration / 86400 As dt,
t.*
FROM table_name t
)
UNPIVOT( dt FOR status IN ( dt As 1, end AS -1 ) ) p
Results:
| LINE | CALLDURATION | STATUS | DT | CURRENTLYUSEDLINES |
|------|--------------|--------|----------------------|--------------------|
| 7 | 1142 | 1 | 2012-01-25T13:49:49Z | 1 |
| 6 | 65 | 1 | 2012-01-25T14:04:05Z | 2 |
| 6 | 65 | -1 | 2012-01-25T14:05:10Z | 1 |
| 5 | 860 | 1 | 2012-01-25T14:06:16Z | 2 |
| 7 | 1142 | -1 | 2012-01-25T14:08:51Z | 1 |
| 5 | 860 | -1 | 2012-01-25T14:20:36Z | 0 |
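The key detail in this query is the tie-break in ORDER BY dt, status DESC, which puts a start (+1) ahead of an end (-1) at the same instant. A minimal Python sketch of that ordering follows; `call_events` is a hypothetical helper, not part of the answer.

```python
from datetime import datetime, timedelta


def call_events(calls):
    """calls: iterable of (end_time, line, duration_seconds) tuples."""
    events = []
    for end, line, duration in calls:
        events.append((end - timedelta(seconds=duration), 1, line))  # start
        events.append((end, -1, line))                               # end
    # Sort by time; at equal times the +1 rows come before the -1 rows.
    events.sort(key=lambda e: (e[0], -e[1]))
    used = 0
    timeline = []
    for stamp, status, line in events:
        used += status
        timeline.append((stamp, line, status, used))
    return timeline


calls = [
    (datetime(2012, 1, 25, 14, 5, 10), 6, 65),
    (datetime(2012, 1, 25, 14, 8, 51), 7, 1142),
    (datetime(2012, 1, 25, 14, 20, 36), 5, 860),
]
events = call_events(calls)
for event in events:
    print(event)
```

Unlike the aggregated version, this keeps one output row per start and per end, so the line and status columns stay available.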

Oracle SQL: How to eliminate redundant recursive calls in CTE

The below set represents the sales of a product in consecutive weeks.
22,19,20,23,16,14,15,15,18,21,24,10,17
...
weekly sales table
date sales
week-1 : 22
week-2 : 19
week-3 : 20
...
week-12 : 10
week-13 : 17
I need to find the longest run of non-decreasing sales figures over consecutive weeks, i.e. week-6 to week-11, represented by 14,15,15,18,21,24.
I am trying to use a recursive CTE to move forward to the next week(s) to check whether the sales value is equal or higher. As long as the value is equal or higher, I keep moving to the next week, recording the ROWNUMBER of the anchor member (which represents the starting week number) and the week number of the iterated row. With this approach, there are redundant recursive calls. For example, when the cte is called for week-2, it iterates week-3 and week-4, as the sales value in each of those weeks is higher than in the week before. After week-2, the cte should next be called for week-5, as week-3 and week-4 have already been visited.
Basically, if I have already visited a row of filt_coll in my recursive calls, I do not want it to be passed to the CTE again. The rows marked as redundant should not be found and the values for actualweek column should be unique.
I know the sql below does not give a solution to my problem of finding the longest run of higher values. I can work out that from the max count of startweek column. For now, I am trying to figure out how to eliminate the redundant recursive calls.
START_WEEK | SALES | SALESLAG | SALESLEAD | ACTUALWEEK
1 | 22 | 0 | -3 | 1
2 | 19 | -3 | 1 | 2
2 | 20 | 1 | 3 | 3
2 | 23 | 3 | -7 | 4
3 | 20 | 1 | 3 | 3 <-(redundant)
3 | 23 | 3 | -7 | 4 <-(redundant)
4 | 23 | 3 | -7 | 4 <-(redundant)
6 | 14 | -2 | 1 | 6
...
with
-- begin test data
raw_data (sales) as
(
select '22,19,20,23,16,14,15,15,18,21,24,10,17' from dual
)
,
derived_tbl(week, sales) as
(
select level, regexp_substr(sales, '([[:digit:]]+)(,|$)', 1, level, null, 1)
from raw_data connect by level <= regexp_count(sales,',')+1
)
-- end test data
,
coll(week, sales, saleslag, saleslead) as
(
select week, sales,
nvl(sales - (lag(sales) over (order by week)), 0),
nvl((lead(sales) over (order by week) - sales), 0)
from derived_tbl
)
,
filt_coll(week, sales, saleslag, saleslead) as
(
select week, sales, saleslag, saleslead
from coll
where not (saleslag < 0 and saleslead < 0)
)
,
cte(startweek, sales, saleslag, saleslead, actualweek) as
(
select week, sales, saleslag, saleslead, week from filt_coll
-- where week not in (select week from cte)
-- *** want to achieve the effect of the above commented out line
union all
select cte.startweek, cl.sales, cl.saleslag, cl.saleslead, cl.week
from filt_coll cl, cte
where cl.week = cte.actualweek + 1 and cl.sales >= cte.sales
)
select * from cte
order by 1,actualweek
;
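For comparison, the "visit each week only once" behaviour the question is after can be sketched procedurally in Python. This is not a recursive CTE and not a fix for the SQL above; it only illustrates that a run's start can be carried forward so no week is revisited. The function name is hypothetical.

```python
def longest_non_decreasing_run(sales):
    """Return (start_week, end_week) of the longest run, weeks numbered from 1.

    For an empty list the result is (1, 0), meaning "no run".
    """
    best_start, best_len = 0, 0
    start = 0  # index where the current run began
    for i in range(len(sales)):
        if i > 0 and sales[i] < sales[i - 1]:
            start = i  # sales dropped, so a new run starts here
        length = i - start + 1
        if length > best_len:
            best_start, best_len = start, length
    return best_start + 1, best_start + best_len


sales = [22, 19, 20, 23, 16, 14, 15, 15, 18, 21, 24, 10, 17]
run = longest_non_decreasing_run(sales)
print(run)
```

Each week is examined exactly once, which is the property the commented-out "where week not in (select week from cte)" line is trying to achieve.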

How can I identify groups of consecutive dates in SQL?

I'm trying to write a function that identifies groups of consecutive dates and measures the size of each group.
I've been doing this procedurally in Python until now but I'd like to move it into SQL.
For example, the list
Bill 01/01/2011
Bill 02/01/2011
Bill 03/01/2011
Bill 05/01/2011
Bill 07/01/2011
should be output into a new table as:
Bill 01/01/2011 3
Bill 02/01/2011 3
Bill 03/01/2011 3
Bill 05/01/2011 1
Bill 07/01/2011 1
Ideally this should also be able to account for weekends and public holidays - the dates in my table will always be Mon-Fri (I think I can solve this by making a new table of working days and numbering them in sequence). Someone at work suggested I try a CTE. I'm pretty new to this, so I'd appreciate any guidance anyone could provide! Thanks.
You can do this with a clever application of window functions. Consider the following:
select name, date, row_number() over (partition by name order by date)
from t
This adds a row number, which in your example would simply be 1, 2, 3, 4, 5. Now subtract that row number (as a number of days) from the date: within a run of consecutive dates both increase by one per row, so the result is a constant value for the group.
select name, date,
dateadd(d, - row_number() over (partition by name order by date), date) as val
from t
Finally, you want the number of rows in each sequence. I would also add a sequence identifier (for instance, to distinguish between the last two).
select name, date,
count(*) over (partition by name, val) as NumInSeq,
dense_rank() over (partition by name order by val) as SeqID
from (select name, date,
dateadd(d, - row_number() over (partition by name order by date), date) as val
from t
) t
Somehow, I missed the part about weekdays and holidays. This solution does not solve that problem.
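Since the question mentions doing this procedurally in Python, here is a minimal sketch of the same date-minus-row-number trick in that language. It assumes the sample dates are in dd/mm/yyyy format (1, 2, 3, 5 and 7 January 2011); `streak_sizes` is a hypothetical name.

```python
from collections import Counter
from datetime import date, timedelta


def streak_sizes(rows):
    """rows: iterable of (name, date). Returns (name, date, size_of_run) tuples."""
    keyed = []
    row_number = {}
    for name, day in sorted(rows):
        row_number[name] = row_number.get(name, 0) + 1
        # Consecutive dates minus consecutive row numbers share one anchor date.
        anchor = day - timedelta(days=row_number[name])
        keyed.append((name, day, anchor))
    sizes = Counter((name, anchor) for name, _day, anchor in keyed)
    return [(name, day, sizes[(name, anchor)]) for name, day, anchor in keyed]


# Assumption: the question's dates are dd/mm/yyyy.
rows = [("Bill", date(2011, 1, d)) for d in (1, 2, 3, 5, 7)]
result = streak_sizes(rows)
for name, day, size in result:
    print(name, day.strftime("%d/%m/%Y"), size)
```

The anchor date plays the role of the val column: rows with the same name and anchor belong to the same run.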
The following query accounts for weekends and holidays. The query has a provision to include the holidays on the fly, though to keep the query clearer I materialized the holidays into an actual table.
CREATE TABLE tx
(n varchar(4), d date);
INSERT INTO tx
(n, d)
VALUES
('Bill', '2006-12-29'), -- Friday
-- 2006-12-30 is Saturday
-- 2006-12-31 is Sunday
-- 2007-01-01 is New Year's Holiday
('Bill', '2007-01-02'), -- Tuesday
('Bill', '2007-01-03'), -- Wednesday
('Bill', '2007-01-04'), -- Thursday
('Bill', '2007-01-05'), -- Friday
-- 2007-01-06 is Saturday
-- 2007-01-07 is Sunday
('Bill', '2007-01-08'), -- Monday
('Bill', '2007-01-09'), -- Tuesday
('Bill', '2012-07-09'), -- Monday
('Bill', '2012-07-10'), -- Tuesday
('Bill', '2012-07-11'); -- Wednesday
create table holiday(d date);
insert into holiday(d) values
('2007-01-01');
/* query should return 7 consecutive good
attendance(from December 29 2006 to January 9 2007) */
/* and 3 consecutive attendance from July 9 2012 to July 11 2012. */
Query:
with first_date as
(
-- get the monday of the earliest date
select dateadd( ww, datediff(ww,0,min(d)), 0 ) as first_date
from tx
)
,shifted as
(
select
tx.n, tx.d,
diff = datediff(day, fd.first_date, tx.d)
- (datediff(day, fd.first_date, tx.d)/7 * 2)
from tx
cross join first_date fd
union
select
xxx.n, h.d,
diff = datediff(day, fd.first_date, h.d)
- (datediff(day, fd.first_date, h.d)/7 * 2)
from holiday h
cross join first_date fd
cross join (select distinct n from tx) as xxx
)
,grouped as
(
select *, grp = diff - row_number() over(partition by n order by d)
from shifted
)
select
d, n, dense_rank() over (partition by n order by grp) as nth_streak
,count(*) over (partition by n, grp) as streak
from grouped
where d not in (select d from holiday) -- remove the holidays
Output:
| D | N | NTH_STREAK | STREAK |
-------------------------------------------
| 2006-12-29 | Bill | 1 | 7 |
| 2007-01-02 | Bill | 1 | 7 |
| 2007-01-03 | Bill | 1 | 7 |
| 2007-01-04 | Bill | 1 | 7 |
| 2007-01-05 | Bill | 1 | 7 |
| 2007-01-08 | Bill | 1 | 7 |
| 2007-01-09 | Bill | 1 | 7 |
| 2012-07-09 | Bill | 2 | 3 |
| 2012-07-10 | Bill | 2 | 3 |
| 2012-07-11 | Bill | 2 | 3 |
Live test: http://www.sqlfiddle.com/#!3/815c5/1
The main logic of the query is to remove two days for every full week that has elapsed, so that the weekdays become consecutive numbers. This is done by dividing the day number by 7 (integer division), multiplying the result by two, and subtracting that from the original number. For example, if a given date falls on the 15th day, the adjustment is 15/7 * 2 == 4; subtract 4 from the original number, 15 - 4 == 11, so the 15th day becomes the 11th. Likewise the 8th day becomes the 6th: 8 - (8/7 * 2) == 6.
Weekends are not in attendance(e.g. 6,7,13,14)
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15
Applying the computation to all the weekday numbers will yield these values:
1 2 3 4 5
6 7 8 9 10
11
Holidays need to be slotted into the attendance so that consecutiveness can be determined easily; they are then removed in the final query. The attendance above yields 11 consecutive days of good attendance.
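The weekend compression and the holiday slotting can be tried out with a small Python sketch of the same arithmetic. It is an illustration only, and it assumes every holiday falls on or after the Monday of the earliest attended week; the function names are hypothetical.

```python
from datetime import date, timedelta


def working_day_index(day, first_monday):
    """Day offset from first_monday with two days removed per completed week."""
    offset = (day - first_monday).days
    return offset - (offset // 7) * 2


def attendance_streaks(attended, holidays):
    """Return lists of attended dates, one list per unbroken working-day streak."""
    first = min(attended)
    first_monday = first - timedelta(days=first.weekday())  # Monday of that week
    holiday_set = set(holidays)
    groups = {}
    # Holidays are slotted in so they do not break a streak.
    for row_number, day in enumerate(sorted(set(attended) | holiday_set), start=1):
        grp = working_day_index(day, first_monday) - row_number
        groups.setdefault(grp, []).append(day)
    streaks = []
    for days in groups.values():
        kept = [d for d in days if d not in holiday_set]  # remove the holidays
        if kept:
            streaks.append(kept)
    return streaks


attended = [date(2006, 12, 29), date(2007, 1, 2), date(2007, 1, 3),
            date(2007, 1, 4), date(2007, 1, 5), date(2007, 1, 8),
            date(2007, 1, 9), date(2012, 7, 9), date(2012, 7, 10),
            date(2012, 7, 11)]
holidays = [date(2007, 1, 1)]
streaks = attendance_streaks(attended, holidays)
print([len(s) for s in streaks])
```

With the sample data, the New Year holiday bridges 29 December and 2 January, so the first streak covers seven attended days and the July dates form a second streak of three.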
Query logic's detailed explanation here: http://www.ienablemuch.com/2012/07/monitoring-perfect-attendance.html