In my table contracts I do have all contracts and orders (order belong to particular contracts defined by parent id pid). Contracts and orders are distinguished by id_type;
1 = contract (active at the beginning)
2 = deactivation order (contract becomes inactive)
3 = reactivation order (contract becomes active again)
Contracts can be deactivated or reactivated many times. Also, contracts can be deactivated and never reactivated again.
table of records:
id | pid | id_type | start_date
===+=====+=========+===========
20 | | 1 | 2021-01-01 --> contract 20 started and active
38 | 20 | 2 | 2021-02-15 --> contract 20 temporarily deactivated
42 | 20 | 3 | 2021-02-25 --> contract 20 activated again
54 | 20 | 2 | 2021-04-01 --> contract 20 temporarily deactivated
95 | 20 | 3 | 2021-04-15 --> contract 20 activated again
30 | | 1 | 2021-01-12 --> contract 30 started and active
I need SQL query which will return whether the contract is active or deactivated on a given date.
For example for date 2021-02-20 I should get that contract 20 is inactive.
I made some tries with LAG/LEAD functions but without success.
You can get the most recent row on or before a particular date using:
select t.*
from (select t.*,
row_number() over (partition by coalesce(pid, id) order by start_date desc) as seqnum
from t
where start_date <= date '2021-02-20'
) t
where seqnum = 1;
If you only want the status and date, then you can also use group by and keep:
select coalesce(pid, id), max(start_date),
max(id_type) keep (dense_rank first order by start_date desc) as id_type
from t
where start_date <= date '2021-02-20'
group by coalesce(pid, id)
Related
I’m looking for some kind of SQL window function that calculate values based on a calculated value from a previous iteration when looping over the window. I’m not looking for ‘lag’ which will just take the original value of the previous row.
Here is the case: We have web analytics sessions. We would like to attribute to each session to the last relevant channel. There are 3 channels: direct, organic and paid. However, they have different priorities: paid will always be relevant. Organic will only be relevant if there was no paid channel in the last 30 days and direct would only be relevant if there was not paid or organic channel in the last 30 days
So in the example table we would like to calculate the values in column ‘attributed’ based on channel and the date columns. Note, the data is there for several users so this should be calculated per user.
+─────────────+───────+──────────+─────────────+
| date | user | channel | attributed |
+─────────────+───────+──────────+─────────────+
| 2022-01-01 | 123 | direct | direct |
| 2022-01-14 | 123 | paid | paid |
| 2022-02-01 | 123 | direct | paid |
| 2022-02-12 | 123 | direct | paid |
| 2022-02-13 | 123 | organic | paid |
| 2022-03-08 | 123 | direct | direct |
| 2022-03-10 | 123 | paid | paid |
+─────────────+───────+──────────+─────────────+
So in the table above row 1 is attributed direct because it’s the first line. The second then is paid as this has priority to direct. It stays paid for the next 2 sessions as direct has lower priority, then it switches to organic as the paid attribution is older than 30 days. The last one is then paid again as it has a higher priority than organic.
I would know how to solve it if you could decide whether a new channel needs to be attributed only based on the current row and the previous. I added here the SQL to do it:
with source as ( -- example data
select cast("2022-01-01" as date) as date, 123 as user, "direct" as channel
union all
select "2022-01-14", 123, "paid"
union all
select "2022-02-01", 123, "direct"
union all
select "2022-02-12", 123, "direct"
union all
select "2022-02-13", 123, "organic"
union all
select "2022-03-08", 123, "direct"
union all
select "2022-03-10", 123, "paid"
),
flag_new_channel as( -- flag sessions that would override channel informaton ; this only works statically here
select *,
case
when lag(channel) over (partition by user order by date) is null then 1
when date_diff(date,lag(date) over (partition by user order by date),day)>30 then 1
when channel = "paid" then 1
when channel = "organic" and lag(channel) over (partition by user order by date)!='paid' then 1
else 0
end flag
from source
qualify flag=1
)
select s.*,
f.channel attributed_channel,
row_number() over (partition by s.user, s.date order by f.date desc) rn -- number of flagged previous sessions
from source s
left join flag_new_channel f
on s.date>=f.date
qualify rn=1 --only keep the last flagged session at or before the current session
However, this would for example set "organic" in row 5 because it doesn't know "paid" is still relevant.
+─────────────+───────+──────────+─────────────────────+
| date | user | channel | attributed_channel |
+─────────────+───────+──────────+─────────────────────+
| 2022-01-01 | 123 | direct | direct |
| 2022-01-14 | 123 | paid | paid |
| 2022-02-01 | 123 | direct | paid |
| 2022-02-12 | 123 | direct | paid |
| 2022-02-13 | 123 | organic | organic |
| 2022-03-08 | 123 | direct | organic |
| 2022-03-10 | 123 | paid | paid |
+─────────────+───────+──────────+─────────────────────+
Any ideas? Not sure recursive queries can help or udfs. I’m using BigQuery usually but if you know solutions in other dialects it would still be interesting to know.
Here's one approach:
Updated: Corrected. I lost track of your correct / expected result, due to the confusing story.
For PostgreSQL, we can do something like this (with CTE and window functions):
The fiddle for PG 14
pri - provides a table of (channel, priority) pairs
cte0 - provides the test data
cte1 - determines the minimum priority over the last 30 days per user
final - the final query expression obtains the attributed channel name
WITH pri (channel, pri) AS (
VALUES ('paid' , 1)
, ('organic' , 2)
, ('direct' , 3)
)
, cte0 (date, xuser, channel) AS (
VALUES
('2022-01-01'::date, 123, 'direct')
, ('2022-01-14' , 123, 'paid')
, ('2022-02-01' , 123, 'direct')
, ('2022-02-12' , 123, 'direct')
, ('2022-02-13' , 123, 'organic')
, ('2022-03-08' , 123, 'direct')
, ('2022-03-10' , 123, 'paid')
)
, cte1 AS (
SELECT cte0.*
, pri.pri
, MIN(pri) OVER (PARTITION BY xuser ORDER BY date
RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW
) AS mpri
FROM cte0
JOIN pri
ON pri.channel = cte0.channel
)
SELECT cte1.*
, pri.channel AS attributed
FROM cte1
JOIN pri
ON pri.pri = cte1.mpri
;
The result:
date
xuser
channel
pri
mpri
attributed
2022-01-01
123
direct
3
3
direct
2022-01-14
123
paid
1
1
paid
2022-02-01
123
direct
3
1
paid
2022-02-12
123
direct
3
1
paid
2022-02-13
123
organic
2
1
paid
2022-03-08
123
direct
3
2
organic
2022-03-10
123
paid
1
1
paid
Our data table looks like this:
Machine Name
Lot Number
Qty
Load TxnDate
Unload TxnDate
M123
ABC
500
10/1/2020
10/2/2020
M741
DEF
325
10/1/2020
M123
ZZZ
100
10/5/2020
10/7/2020
M951
AAA
550
10/5/2020
10/9/2020
M123
BBB
550
10/7/2020
I need to create an SQL query that shows the currently loaded Lot number - Machines with no Unload TxnDate - and the last loaded Lot number based on the unload TxnDate.
So in the example, when I run a query for M123, the result will show:
Machine Name
Lot Number
Qty
Load TxnDate
Unload TxnDate
M123
ZZZ
100
10/5/2020
10/7/2020
M123
BBB
550
10/7/2020
As you can see although Machine Name has 3 records, the results only show the currently loaded and the last loaded. Is there anyway to replicate this? The Machine Name is dynamic, so my user can enter the Machine Name and see the results the machine based on the missing Unload TxnDate and the last Unload Txn Date
You seem to want the last two rows. That would be something like this:
select t.*
from t
where machine_name = 'M123'
order by load_txn_date desc
fetch first 2 rows only;
Note: not all databases support the first first clause. Some spell it limit, or select top, or even something else.
If you want two rows per machine, one option uses window functions:
select *
from (
select t.*,
row_number() over(
partition by machine_name, (case when unload_txn_date is null then 0 else 1 end)
order by coalesce(unload_txn_date, load_txn_date) desc
) rn
from mytable t
) t
where rn = 1
The idea is to separate rows between those that have an unload date, and those that do not. We can then bring the top record per group.
For your sample data, this returns:
Machine_Name | Lot_Number | Qty | Load_Txn_Date | Unload_Txn_Date | rn
:----------- | :--------- | --: | :------------ | :-------------- | -:
M123 | BBB | 550 | 2020-10-07 | null | 1
M123 | ZZZ | 100 | 2020-10-05 | 2020-10-07 | 1
M741 | DEF | 325 | 2020-10-01 | null | 1
M951 | AAA | 550 | 2020-10-05 | 2020-10-09 | 1
You might use the following query, presuming that you're on a database having Window(or Analytic) Function
WITH t AS
(
SELECT COALESCE(Unload_Txn_Date -
LAG(Load_Txn_Date) OVER
(PARTITION BY Machine_Name ORDER BY Load_Txn_Date DESC),0) AS lg,
MAX(CASE WHEN Unload_Txn_Date IS NULL THEN Load_Txn_Date END) OVER
(PARTITION BY Machine_Name) AS mx,
t.*
FROM tab t
), t2 AS
(
SELECT DENSE_RANK() OVER (ORDER BY mx DESC NULLS LAST) AS dr, t.*
FROM t
WHERE mx IS NOT NULL
)
SELECT Machine_Name,Lot_Number,Qty,Load_Txn_Date,Unload_Txn_Date
FROM t2
WHERE dr = 1 AND lg = 0
ORDER BY Load_Txn_Date
where if previous row's Unload_Txn_Date is equal to the current Load_Txn_Date, then it's accepted that there's no interruption will occur for the job, while determining the last Unload Txn Dates with no unload date values per each machine. And then, the result set returns through filtering by the values derived from the window functions within the penultimate query.
Demo
I have an issue we want to create a request to add customer number of account by period.
For each account I have : accountid, customerid, createddate and deleteddate
select accountid,customerid, createddate , deleteddate from account
where customerid = 1
This customer have 4 accounts :
accountid | customerid | createddate | deleteddate
2145 | 6641 | 2018-12-12 10:39:16.457 | 2020-03-26 00:00:12.540
2718 | 6641 | 2020-02-11 15:04:51.643 | 2020-03-26 00:00:04.947
2825 | 46818 | 2020-04-14 15:28:30.400 | 2020-04-29 15:58:30.651
2851 | 46818 | 2020-06-05 12:41:45.790 | NULL
so i want a chart for current year to get the nb of account of the customer not for each month but for each modification
For exemple 02/01/2020 I will have 1 account
03/01/2020 I will have 0 account
It is possible to do that or something like that in SQL ? And how can I do it if it's possible.
get the nb of account of the customer not for each month but for each modification
Is this what you want?
select
x.customer_id,
x.modifdate,
sum(x.cnt) over(partition by x.customer_id order by x.modifdate) no_active_accounts
from mytable t
cross apply (
values (customer_id, createddate, 1), (customer_id, deleteddate, -1)
) as x(customer_id, modifdate, cnt)
where modifdate is not null
For each customer, this generates one record everytime an account is created or deleted, with the modification date and the running count of active accounts.
I have a SOME_DELTA table which records all party related transactions with amount change
Ex.:
PARTY_ID | SOME_DATE | AMOUNT
--------------------------------
party_id_1 | 2019-01-01 | 100
party_id_1 | 2019-01-15 | 30
party_id_1 | 2019-01-15 | -60
party_id_1 | 2019-01-21 | 80
party_id_2 | 2019-01-02 | 50
party_id_2 | 2019-02-01 | 100
I have a case where where MVC controller accepts map someMap(party_id, some_date) and I need to get part_id list with summed amount till specific some_date
In this case if I send mapOf("party_id_1" to Date(2019 - 1 - 15), "party_id_2" to Date(2019 - 1 - 2))
I should get list of party_id with summed amount till some_date
Output should look like:
party_id_1 | 70
party_id_2 | 50
Currently code is:
select sum(amount) from SOME_DELTA where party_id=:partyId and some_date <= :someDate
But in this case I need to iterate through map and do multiple DB calls for summed amount for eatch party_id till some_date which feels wrong
Is there a more delicate way to get in one select query? (to avoid +100 DB calls)
You can use a lateral join for this:
select map.party_id,
c.amount
from (
values
('party_id_1', date '2019-01-15'),
('party_id_2', date '2019-01-02')
) map (party_id, cutoff_date)
join lateral (
select sum(amount) amount
from some_delta sd
where sd.party_id = map.party_id
and sd.some_date <= map.cutoff_date
) c on true
order by map.party_id;
Online example
I have two tables that the first one stores task data (task name, create date, assign_to etc) and the second table stores task history data e.g operation_date,task completed, task rejected etc. (Task and Task_history tables)
Company creates tasks and assign them to employees, then employees accepted tasks and complete them.
Task create_date column specify the sequence of the task to do, both operation_date and completed status columns specify the sequence of the task complementation.
I need a query for reporting in employee detail that Does An Employee complete the tasks in a sequence specified at the beginning ? How many tasks completed accordance with the given sequence ?
I tried a query for status completed tasks that order tables for task_creation and operation_date for an employee for a given day. Then, add the rownum for select queries then join two tables. If rownums are equals, employee completes the task for given sequence else not. But the query result was not like what I expected. Rownums displaying like that, r_h--> 1,2,3 ; r_t--> 1,15,17
SELECT *
FROM (SELECT W.id, w.create_date, ROWNUM as r_t
FROM wfm_task_1 W where W.task_status = 3
ORDER BY W.create_date ASC) TASK_SEQ LEFT OUTER JOIN
( SELECT H.wfm_task, H.record_date, ROWNUM as r_h
FROM wfm_task_history H
WHERE H.task_status = 3
AND H.record_date BETWEEN (TO_DATE ('12.07.2013',
'DD.MM.YYYY')
- 1)
AND (TO_DATE ('12.07.2013',
'DD.MM.YYYY')
+ 1)
ORDER BY H.record_date ASC) HISTORY_SEQ
ON TASK_SEQ.id = HISTORY_SEQ.wfm_task
Sample dataset
wfm_task (ID, CREATION_DATE, TASK_NAME)
49361 | 06.07.2013 11:50:00 | missionx
49404 | 10.07.2013 13:01:00 | missiony
49407 | 11.07.2013 11:02:00 | missiona
49108 | 01.07.2013 21:02:00 | missionb
task_history (ID,WFM_TASK,OP_DATE, STATUS)
98 | 49361 | 12.07.2013 15:19:19 | 3
92 | 49404 | 12.07.2013 11:10:50 | 3
90 | 49407 | 12.07.2013 11:06:58 | 3
78 | 49108 | 03.07.2013 11:02:00 | 1
result (WFM_TASK,RECORD_DATE,R_H,ID,CREATE_DATE,R_T)
49361 | 12.07.2013 15:19:19 | 3 | 49361 | 06.07.2013 11:50:00 | 15
49404 | 12.07.2013 11:10:50 | 2 | 49404 | 10.07.2013 13:01:00 | 17
49407 | 12.07.2013 11:06:58 | 1 | 49407 | 11.07.2013 11:02:00 | 1
Status 3 = completed. I want to find that are the tasks completed by an order. I check that task complete order is likely to task creation order.
You'll probably have to use ROW_NUMBER function instead of ROWNUM.
SELECT a.id, a.create_date,
row_number() over (order by a.create_date) r_t,
b.record_date,
row_number() over (order by b.record_date) r_h
from wfm_task a left outer join task_history b
on a.id = b.wfm_task
where b.status = 3
and b.record_date between date'2013-07-12' - 1 and date'2013-07-12' + 1
Demo here.