I have a table with the following columns:
Transaction_id (T_id): Distinct id generated for each transaction
Date (Dt): Date of the transaction
Account_id (Ac_id): The id of the account from which the transaction is made
Org_id (O_id): The id given to an organization. One organization can have multiple accounts, so different account ids can share the same org_id
Sample table:
T_id | Dt      | Ac_id | O_id
-----------------------------
101  | 23/4/22 | 1     | A
102  | 06/7/22 | 3     | C
103  | 01/8/22 | 2     | A
104  | 13/3/22 | 6     | B
The question is to mark the o_id where transactions were made in the past 90 days as 1 and the others as 0.
Output
T_id | Dt      | Ac_id | O_id | Mark
------------------------------------
101  | 23/4/22 | 1     | A    | 0
102  | 06/7/22 | 3     | C    | 1
103  | 01/8/22 | 2     | A    | 1
104  | 13/3/22 | 6     | B    | 0
The query I am using is:
Select *,
Case when datediff('day', Dt, current_date()) between 0 and 90 then '1'
Else '0'
End as Mark
From Table1
Desired Output:
T_id | Dt      | Ac_id | O_id | Mark
------------------------------------
101  | 23/4/22 | 1     | A    | 1
102  | 06/7/22 | 3     | C    | 1
103  | 01/8/22 | 2     | A    | 1
104  | 13/3/22 | 6     | B    | 0
For o_id 'A', the mark I want is 1 on every row, because at least one of its transactions falls within the past 90 days, irrespective of other transactions made more than 90 days ago.
I have to join this output to another table, so every o_id with at least one transaction in the past 90 days needs to be marked as '1'.
Please help me with it quickly.
The easiest approach is to compare the current date against the windowed MAX of the transaction date, partitioned by o_id:
SELECT *,
CASE
WHEN DATEDIFF('day', (MAX(Dt) OVER(PARTITION BY o_id)), CURRENT_DATE()) <= 90
THEN 1
ELSE 0
END AS Mark
FROM Tab;
Sample data:
ALTER SESSION SET DATE_INPUT_FORMAT = 'DD/MM/YYYY';
CREATE OR REPLACE TABLE tab(t_id INT,
Dt Date,
Ac_id INT,
O_id TEXT)
AS
SELECT 101, '23/04/2022' ,1 ,'A' UNION
SELECT 102, '06/07/2022' ,3 ,'C' UNION
SELECT 103, '01/08/2022' ,2 ,'A' UNION
SELECT 104, '13/03/2022' ,6 ,'B';
Snowflake natively supports the BOOLEAN data type, so the entire query could be just:
SELECT *,
DATEDIFF('day', (MAX(Dt) OVER(PARTITION BY o_id)), CURRENT_DATE()) <= 90 AS Mark
FROM tab
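The windowed-MAX idea above can be sketched outside the database as well. This is only an illustration in Python, not part of the original answer: the function name mark_recent_orgs and the fixed "current date" of 2022-08-15 are assumptions made so the result is reproducible.

```python
from datetime import date

# Sample rows from the question: (t_id, dt, ac_id, o_id).
rows = [
    (101, date(2022, 4, 23), 1, "A"),
    (102, date(2022, 7, 6), 3, "C"),
    (103, date(2022, 8, 1), 2, "A"),
    (104, date(2022, 3, 13), 6, "B"),
]


def mark_recent_orgs(rows, today, window_days=90):
    """Append a mark to each row: 1 if the row's o_id has any transaction
    within window_days of today, else 0 (hypothetical helper)."""
    # Equivalent of MAX(Dt) OVER (PARTITION BY o_id): latest date per o_id.
    latest = {}
    for _, dt, _, o_id in rows:
        if o_id not in latest or dt > latest[o_id]:
            latest[o_id] = dt
    # Every row of an organization gets the same mark.
    return [
        (t_id, dt, ac_id, o_id,
         1 if (today - latest[o_id]).days <= window_days else 0)
        for t_id, dt, ac_id, o_id in rows
    ]


# Assumed "current date", fixed here instead of CURRENT_DATE().
marked = mark_recent_orgs(rows, today=date(2022, 8, 15))
for row in marked:
    print(row)
```

With that assumed date, both rows of o_id 'A' are marked 1 because its latest transaction (01/8/22) is inside the window, while 'B' is marked 0.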
Create a subquery that identifies all the distinct o_id values with a recent transaction, and use that to build the main result.
The subquery would be:
select o_id from tab
group by o_id
having datediff('day', max(Dt), current_date()) between 0 and 90;
Then your main query becomes:
Select *,'1' as Mark
From Tab
where o_id in
(select x.o_id from (select o_id, max(Dt)
from tab
group by o_id
having datediff('day', max(Dt), current_date()) between 0 and 90) x)
union all
select *,'0' as Mark
from Tab
where o_id not in
(select x.o_id from (select o_id, max(Dt)
from tab
group by o_id
having datediff('day', max(Dt), current_date()) between 0 and 90) x);
I have a requirement to insert a record into an SCD-2 table. The database we are using is Oracle 12c. The situation is as follows:
Current record set in SCD2 table -
Prod_Id Begin_Version_dt End_version_dt
'1234', '2020-03-10', '2020-04-09'
'1234', '2020-04-10', '2020-05-10'
'1234', '2020-05-11', '9999-12-31'
A record came in Prod transaction table as below -
Prod_Id Trans_dt
'1234', '2020-05-15'
The updated record set in SCD2 should be -
Prod_Id Begin_Version_dt End_version_dt
'1234', '2020-03-10', '2020-04-09'
'1234', '2020-04-10', '2020-05-10'
'1234', '2020-05-11', '2020-05-14'
'1234', '2020-05-15', '9999-12-31'
I have tried using the LEAD and LAG functions, but they do not give me the extra record. Any pointers would be a great help.
You need two statements: the first inserts the new record, and the second updates the end dates, as follows:
Creating the sample data
SQL> CREATE TABLE yourTable AS (
2 SELECT '1234' AS Prod_Id, date '2020-03-10' AS Begin_Version_dt, date '2020-04-09' AS End_version_dt FROM dual UNION ALL
3 SELECT '1234', date '2020-04-10', date '2020-05-10' FROM dual UNION ALL
4 SELECT '1234', date '2020-05-11', date '9999-12-31' FROM dual
5 );
Table created.
SQL>
SQL> CREATE TABLE prod_table as
2 (SELECT 1234 AS PROD_ID, DATE '2020-05-15' AS TRANS_DATE FROM DUAL);
Table created.
Current view of data:
SQL> SELECT * FROM YOURTABLE ORDER BY BEGIN_VERSION_DT;
PROD BEGIN_VER END_VERSI
---- --------- ---------
1234 10-MAR-20 09-APR-20
1234 10-APR-20 10-MAY-20
1234 11-MAY-20 31-DEC-99
SQL> select * from prod_table;
PROD_ID TRANS_DAT
---------- ---------
1234 15-MAY-20
Queries that you are looking for
SQL> INSERT INTO yourTable Y
2 SELECT PROD_ID, TRANS_DATE, TRANS_DATE FROM PROD_TABLE;
1 row created.
SQL>
SQL>
SQL> UPDATE YOURTABLE YY
2 SET END_VERSION_DT = COALESCE(
3 (SELECT LD_BEGIN_DT
4 FROM (SELECT Y.BEGIN_VERSION_DT,
5 Y.PROD_ID,
6 LEAD(Y.BEGIN_VERSION_DT)
7 OVER (PARTITION BY Y.PROD_ID
8 ORDER BY Y.BEGIN_VERSION_DT) - 1 AS LD_BEGIN_DT
9 FROM YOURTABLE Y) Y
10 WHERE YY.PROD_ID = Y.PROD_ID
11 AND YY.BEGIN_VERSION_DT = Y.BEGIN_VERSION_DT),
12 DATE '9999-12-31');
4 rows updated.
Updated data:
SQL> SELECT * FROM YOURTABLE ORDER BY BEGIN_VERSION_DT;
PROD BEGIN_VER END_VERSI
---- --------- ---------
1234 10-MAR-20 09-APR-20
1234 10-APR-20 10-MAY-20
1234 11-MAY-20 14-MAY-20
1234 15-MAY-20 31-DEC-99
SQL>
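The insert-then-update logic above (each end date becomes the next begin date minus one day, and the last version stays open) can be sketched in Python as follows. This is a minimal illustration for a single Prod_Id, not the answer's implementation; add_version and OPEN_END are hypothetical names.

```python
from datetime import date, timedelta

# Open-ended marker used by the SCD2 table in the question.
OPEN_END = date(9999, 12, 31)


def add_version(versions, trans_dt):
    """Add a version starting at trans_dt and recompute every end date
    as (next begin date - 1 day), like LEAD(begin) - 1 in the UPDATE."""
    begins = sorted([begin for begin, _ in versions] + [trans_dt])
    result = []
    for idx, begin in enumerate(begins):
        if idx + 1 < len(begins):
            end = begins[idx + 1] - timedelta(days=1)
        else:
            # No following version: keep the record open.
            end = OPEN_END
        result.append((begin, end))
    return result


# Current versions of Prod_Id '1234': (begin_version_dt, end_version_dt).
current = [
    (date(2020, 3, 10), date(2020, 4, 9)),
    (date(2020, 4, 10), date(2020, 5, 10)),
    (date(2020, 5, 11), OPEN_END),
]

updated = add_version(current, date(2020, 5, 15))
for begin, end in updated:
    print(begin, end)
```

As in the UPDATE statement, every row's end date is recomputed, so the sketch assumes that the begin dates alone define the version boundaries.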
You may try using LEAD with a default value:
SELECT
Prod_Id,
Begin_Version_dt,
COALESCE(End_Version_dt,
LEAD(Begin_Version_dt, 1, date '9999-12-31')
OVER (PARTITION BY Prod_Id ORDER BY Begin_Version_dt)) AS End_Version_dt
FROM yourTable
ORDER BY
Prod_Id,
Begin_Version_dt;
The logic here is that we select the end version date if it is non-NULL. If no end version date is available, we take the begin date of the next record in the sequence via LEAD. If that is not available either, we default to reporting 9999-12-31 as the end version date.
I hope I can describe my challenge in an understandable way.
I have two tables on an Oracle Database 12c which look like this:
Table name "Invoices"
I_ID | invoice_number | creation_date | i_amount
------------------------------------------------------
1 | 10000000000 | 01.02.2016 00:00:00 | 30
2 | 10000000001 | 01.03.2016 00:00:00 | 25
3 | 10000000002 | 01.04.2016 00:00:00 | 13
4 | 10000000003 | 01.05.2016 00:00:00 | 18
5 | 10000000004 | 01.06.2016 00:00:00 | 12
Table name "payments"
P_ID | reference | received_date | p_amount
------------------------------------------------------
1 | PAYMENT01 | 12.02.2016 13:14:12 | 12
2 | PAYMENT02 | 12.02.2016 15:24:21 | 28
3 | PAYMENT03 | 08.03.2016 23:12:00 | 2
4 | PAYMENT04 | 23.03.2016 12:32:13 | 30
5 | PAYMENT05 | 12.06.2016 00:00:00 | 15
So I want a select statement (maybe with Oracle analytic functions, though I am not really familiar with them) where the payments are summed up, ordered by date, until the amount of an invoice is reached. If the sum of, for example, two payments is more than the invoice amount, the remainder of the last payment should be used for the next invoice.
In this example the result should be like this:
invoice_number | reference | used_pay_amount | open_inv_amount
----------------------------------------------------------
10000000000 | PAYMENT01 | 12 | 18
10000000000 | PAYMENT02 | 18 | 0
10000000001 | PAYMENT02 | 10 | 15
10000000001 | PAYMENT03 | 2 | 13
10000000001 | PAYMENT04 | 13 | 0
10000000002 | PAYMENT04 | 13 | 0
10000000003 | PAYMENT04 | 4 | 14
10000000003 | PAYMENT05 | 14 | 0
10000000004 | PAYMENT05 | 1 | 11
It would be nice if there is a solution with a "simple" select statement.
Thanks in advance for your time.
Oracle Setup:
CREATE TABLE invoices ( i_id, invoice_number, creation_date, i_amount ) AS
SELECT 1, 100000000, DATE '2016-01-01', 30 FROM DUAL UNION ALL
SELECT 2, 100000001, DATE '2016-02-01', 25 FROM DUAL UNION ALL
SELECT 3, 100000002, DATE '2016-03-01', 13 FROM DUAL UNION ALL
SELECT 4, 100000003, DATE '2016-04-01', 18 FROM DUAL UNION ALL
SELECT 5, 100000004, DATE '2016-05-01', 12 FROM DUAL;
CREATE TABLE payments ( p_id, reference, received_date, p_amount ) AS
SELECT 1, 'PAYMENT01', DATE '2016-01-12', 12 FROM DUAL UNION ALL
SELECT 2, 'PAYMENT02', DATE '2016-01-13', 28 FROM DUAL UNION ALL
SELECT 3, 'PAYMENT03', DATE '2016-02-08', 2 FROM DUAL UNION ALL
SELECT 4, 'PAYMENT04', DATE '2016-02-23', 30 FROM DUAL UNION ALL
SELECT 5, 'PAYMENT05', DATE '2016-05-12', 15 FROM DUAL;
Query:
WITH total_invoices ( i_id, invoice_number, creation_date, i_amount, i_total ) AS (
SELECT i.*,
SUM( i_amount ) OVER ( ORDER BY creation_date, i_id )
FROM invoices i
),
total_payments ( p_id, reference, received_date, p_amount, p_total ) AS (
SELECT p.*,
SUM( p_amount ) OVER ( ORDER BY received_date, p_id )
FROM payments p
)
SELECT invoice_number,
reference,
LEAST( p_total, i_total )
- GREATEST( p_total - p_amount, i_total - i_amount ) AS used_pay_amount,
GREATEST( i_total - p_total, 0 ) AS open_inv_amount
FROM total_invoices
INNER JOIN
total_payments
ON ( i_total - i_amount < p_total
AND i_total > p_total - p_amount );
Explanation:
The two sub-query factoring (WITH ... AS ()) clauses just add an extra virtual column to the invoices and payments tables with the cumulative sum of the invoice/payment amount.
You can associate a range with each invoice (or payment) as the cumulative amount owing (paid) before the invoice (payment) was placed and the cumulative amount owing (paid) after. The two tables can then be joined where there is an overlap of these ranges.
The open_inv_amount is the positive difference between the cumulative amount invoiced and the cumulative amount paid.
The used_pay_amount is slightly more complicated: it is the difference between the lower of the current cumulative invoice and payment totals and the higher of the previous cumulative invoice and payment totals.
Output:
INVOICE_NUMBER REFERENCE USED_PAY_AMOUNT OPEN_INV_AMOUNT
-------------- --------- --------------- ---------------
100000000 PAYMENT01 12 18
100000000 PAYMENT02 18 0
100000001 PAYMENT02 10 15
100000001 PAYMENT03 2 13
100000001 PAYMENT04 13 0
100000002 PAYMENT04 13 0
100000003 PAYMENT04 4 14
100000003 PAYMENT05 14 0
100000004 PAYMENT05 1 11
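The range-overlap idea from the explanation can be checked with a short Python sketch. It is only an illustration of the same arithmetic, not a replacement for the query; the function name allocate is hypothetical, and the inputs are assumed to be already ordered by date.

```python
from itertools import accumulate


def allocate(invoices, payments):
    """Pair each invoice with the payments whose cumulative ranges overlap it.

    invoices: list of (invoice_number, amount), ordered by creation date.
    payments: list of (reference, amount), ordered by received date.
    Returns rows of (invoice_number, reference, used_pay_amount, open_inv_amount).
    """
    i_totals = list(accumulate(amount for _, amount in invoices))
    p_totals = list(accumulate(amount for _, amount in payments))
    rows = []
    for (inv_no, i_amt), i_total in zip(invoices, i_totals):
        for (ref, p_amt), p_total in zip(payments, p_totals):
            # Same overlap test as the join condition in the query.
            if i_total - i_amt < p_total and i_total > p_total - p_amt:
                used = min(p_total, i_total) - max(p_total - p_amt, i_total - i_amt)
                open_amt = max(i_total - p_total, 0)
                rows.append((inv_no, ref, used, open_amt))
    return rows


invoices = [(100000000, 30), (100000001, 25), (100000002, 13),
            (100000003, 18), (100000004, 12)]
payments = [("PAYMENT01", 12), ("PAYMENT02", 28), ("PAYMENT03", 2),
            ("PAYMENT04", 30), ("PAYMENT05", 15)]

for row in allocate(invoices, payments):
    print(row)
```

Like the join, the nested loop compares every invoice with every payment, so the work grows with the product of the two table sizes.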
Update:
Based on mathguy's method of using UNION to join the data, I came up with a different solution re-using some of my code.
WITH combined ( invoice_number, reference, i_amt, i_total, p_amt, p_total, total ) AS (
SELECT invoice_number,
NULL,
i_amount,
SUM( i_amount ) OVER ( ORDER BY creation_date, i_id ),
NULL,
NULL,
SUM( i_amount ) OVER ( ORDER BY creation_date, i_id )
FROM invoices
UNION ALL
SELECT NULL,
reference,
NULL,
NULL,
p_amount,
SUM( p_amount ) OVER ( ORDER BY received_date, p_id ),
SUM( p_amount ) OVER ( ORDER BY received_date, p_id )
FROM payments
ORDER BY 7,
2 NULLS LAST,
1 NULLS LAST
),
filled ( invoice_number, reference, i_prev, i_total, p_prev, p_total ) AS (
SELECT FIRST_VALUE( invoice_number ) IGNORE NULLS OVER ( ORDER BY ROWNUM ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ),
FIRST_VALUE( reference ) IGNORE NULLS OVER ( ORDER BY ROWNUM ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ),
FIRST_VALUE( i_total - i_amt ) IGNORE NULLS OVER ( ORDER BY ROWNUM ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ),
FIRST_VALUE( i_total ) IGNORE NULLS OVER ( ORDER BY ROWNUM ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ),
FIRST_VALUE( p_total - p_amt ) IGNORE NULLS OVER ( ORDER BY ROWNUM ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ),
COALESCE(
p_total,
LEAD( p_total ) IGNORE NULLS OVER ( ORDER BY ROWNUM ),
LAG( p_total ) IGNORE NULLS OVER ( ORDER BY ROWNUM )
)
FROM combined
),
vals ( invoice_number, reference, upa, oia, prev_invoice ) AS (
SELECT invoice_number,
reference,
COALESCE( LEAST( p_total, i_total ) - GREATEST( p_prev, i_prev ), 0 ),
GREATEST( i_total - p_total, 0 ),
LAG( invoice_number ) OVER ( ORDER BY ROWNUM )
FROM filled
)
SELECT invoice_number,
reference,
upa AS used_pay_amount,
oia AS open_inv_amount
FROM vals
WHERE upa > 0
OR ( reference IS NULL AND invoice_number <> prev_invoice AND oia > 0 );
Explanation:
The combined sub-query factoring clause joins the two tables with a UNION ALL and generates the cumulative totals for the amounts invoiced and paid. The final thing it does is order the rows by their ascending cumulative total (and if there are ties it will put the payments, in order created, before the invoices).
The filled sub-query factoring clause will fill the previously generated table so that if a value is null then it will take the value from the next non-null row (and if there is an invoice with no payments then it will find the total of the previous payments from the preceding rows).
The vals sub-query factoring clause applies the same calculations as my previous query (see above). It also adds the prev_invoice column to help identify invoices which are entirely unpaid.
The final SELECT takes the values and filters out the unnecessary rows.
Here is a solution that doesn't require a join. This is important if the amount of data is significant. I did some testing on my laptop (nothing commercial), using the free edition (XE) of Oracle 11.2. Using MT0's solution, the query with the join takes about 11 seconds if there are 10k invoices and 10k payments. For 50k invoices and 50k payments, the query took 287 seconds (almost 5 minutes). This is understandable, since joining two 50k tables requires 2.5 billion comparisons.
The alternative below uses a union. It uses lag() and last_value() to do the work the join does in the other solution. This union-based solution, with 50k invoices and 50k payments, took less than 0.5 seconds on my laptop (!)
I simplified the setup a bit; i_id, invoice_number and creation_date are all used for one purpose only: to order the invoice amounts. I use just an inv_id (invoice id) for that purpose, and similarly a pmt_id for payments.
For testing purposes, I created tables invoices and payments like so:
create table invoices (inv_id, inv_amt) as
(select level, trunc(dbms_random.value(20, 80)) from dual connect by level <= 50000);
create table payments (pmt_id, pmt_amt) as
(select level, trunc(dbms_random.value(20, 80)) from dual connect by level <= 50000);
Then, to test the solutions, I use the queries to populate a CTAS, like this:
create table bal_of_pmts as
[select query, including the WITH clause but without the setup CTE's, comes here]
In my solution, I show both the allocation of payments to one or more invoices and the payment of invoices from one or more payments; the output discussed in the original post only covers half of this information, but for symmetry it makes more sense to me to show both halves. The output (for the same inputs as in the original post) looks like this, with my version of inv_id and pmt_id:
    INV_ID       PAID     UNPAID     PMT_ID       USED  AVAILABLE
---------- ---------- ---------- ---------- ---------- ----------
         1         12         18        101         12          0
         1         18          0        103         18         10
         2         10         15        103         10          0
         2          2         13        105          2          0
         2         13          0        107         13         17
         3         13          0        107         13          4
         4          4         14        107          4          0
         4         14          0        109         14          1
         5          1         11        109          1          0
         5         11          0                    11
Notice how the left half is what the original post requested. There is an extra row at the end. Notice the NULL payment id for a "payment" of 11: that shows how much of the last invoice is left uncovered. If there were an invoice with id = 6, for an amount of, say, 22, then there would be one more row, showing the entire amount (22) of that invoice as "paid" from a payment with no id, meaning it is actually not covered (yet).
The query may be a little easier to understand than the join approach. To see what it does, it may help to look closely at intermediate results, especially the CTE c (in the WITH clause).
with invoices (inv_id, inv_amt) as (
select 1, 30 from dual union all
select 2, 25 from dual union all
select 3, 13 from dual union all
select 4, 18 from dual union all
select 5, 12 from dual
),
payments (pmt_id, pmt_amt) as (
select 101, 12 from dual union all
select 103, 28 from dual union all
select 105, 2 from dual union all
select 107, 30 from dual union all
select 109, 15 from dual
),
c (kind, inv_id, inv_cml, pmt_id, pmt_cml, cml_amt) as (
select 'i', inv_id, sum(inv_amt) over (order by inv_id), null, null,
sum(inv_amt) over (order by inv_id)
from invoices
union all
select 'p', null, null, pmt_id, sum(pmt_amt) over (order by pmt_id),
sum(pmt_amt) over (order by pmt_id)
from payments
),
d (inv_id, paid, unpaid, pmt_id, used, available) as (
select last_value(inv_id) ignore nulls over (order by cml_amt desc),
cml_amt - lead(cml_amt, 1, 0) over (order by cml_amt desc),
case kind when 'i' then 0
else last_value(inv_cml) ignore nulls
over (order by cml_amt desc) - cml_amt end,
last_value(pmt_id) ignore nulls over (order by cml_amt desc),
cml_amt - lead(cml_amt, 1, 0) over (order by cml_amt desc),
case kind when 'p' then 0
else last_value(pmt_cml) ignore nulls
over (order by cml_amt desc) - cml_amt end
from c
)
select inv_id, paid, unpaid, pmt_id, used, available
from d
where paid != 0
order by inv_id, pmt_id
;
In most cases, CTE d is all we need. However, if the cumulative sum for several invoices is exactly equal to the cumulative sum for several payments, my query would add a row with paid = unpaid = 0. (MT0's join solution does not have this problem.) To cover all possible cases, and not have rows with no information, I had to add the filter for paid != 0.
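To see why avoiding the join scales better, here is a minimal Python sketch of a single-pass allocation. It is not a translation of the query above but a two-pointer sweep that produces the same invoice/payment pairs; allocate_sweep is a hypothetical name, amounts are assumed to be non-negative, and leftover payment amounts are simply not reported.

```python
def allocate_sweep(invoices, payments):
    """Allocate payments to invoices in one pass over both lists.

    invoices: list of (inv_id, amount), in allocation order.
    payments: list of (pmt_id, amount), in allocation order.
    Returns rows of (inv_id, pmt_id, amount); pmt_id is None for the
    part of an invoice that no payment covers.
    """
    rows = []
    i = j = 0
    inv_left = invoices[0][1] if invoices else 0
    pmt_left = payments[0][1] if payments else 0
    while i < len(invoices) and j < len(payments):
        amount = min(inv_left, pmt_left)
        if amount > 0:
            rows.append((invoices[i][0], payments[j][0], amount))
        inv_left -= amount
        pmt_left -= amount
        if inv_left == 0:  # invoice fully paid, move to the next one
            i += 1
            if i < len(invoices):
                inv_left = invoices[i][1]
        if pmt_left == 0:  # payment used up, move to the next one
            j += 1
            if j < len(payments):
                pmt_left = payments[j][1]
    while i < len(invoices):  # invoices left without any payment
        if inv_left > 0:
            rows.append((invoices[i][0], None, inv_left))
        i += 1
        if i < len(invoices):
            inv_left = invoices[i][1]
    return rows


invoices = [(1, 30), (2, 25), (3, 13), (4, 18), (5, 12)]
payments = [(101, 12), (103, 28), (105, 2), (107, 30), (109, 15)]

for row in allocate_sweep(invoices, payments):
    print(row)
```

Each loop iteration finishes at least one invoice or one payment, so the number of steps is at most the number of invoices plus the number of payments, rather than their product.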
I have table consisting of these fields:
id | date_from | date_to | price | status
----------------------------------------------------------
CK1 22-12-2012 29-12-2012 800 1
CK1 22-12-2012 29-12-2012 1200 1
CK2 24-12-2012 30-12-2012 1400 0
CK2 24-12-2012 30-12-2012 1800 1
CK2 24-12-2012 30-12-2012 2200 1
How do I create an SQL select that groups results by ID, DATE_FROM, DATE_TO, picks the lowest price where status == 1, and also counts how many records were grouped?
So result would be
id | date_from | date_to | price | count
CK1 22-12-2012 29-12-2012 800 2
CK2 24-12-2012 30-12-2012 1800 2
And is there maybe a way to find out how many records were not grouped because of status == 0? This is not very important; I am just wondering whether the number of uncounted records can be found for each group of records.
Your description doesn't match what you want the result to be.
This will match your description, i.e. give you the lowest price where status is 1, and count the number of records in the group:
select id, date_from, date_to, min(case status when 1 then price end) as price, count(*) as count
from TheTable
group by id, date_from, date_to
Result:
id | date_from | date_to | price | count
CK1 22-12-2012 29-12-2012 800 2
CK2 24-12-2012 30-12-2012 1800 3
This will give you the result that you asked for, i.e. keep only the records where status is 1, get the lowest price, and get the number of records in each group after filtering:
select id, date_from, date_to, min(price) as price, count(*) as count
from TheTable
where status = 1
group by id, date_from, date_to
Result:
id | date_from | date_to | price | count
CK1 22-12-2012 29-12-2012 800 2
CK2 24-12-2012 30-12-2012 1800 2
To get the number of records where the status is 0, you need to use the first method, where you don't filter out those records. If the status only can be 0 or 1, you can simply use sum(status) to get the number of records where the status is 1, and count(case status when 0 then 1 end) or sum(1 - status) to get the number of records where the status is 0.
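The counting options described above can be illustrated with a small Python sketch over the sample rows from the question. It mirrors the first method (no filtering, conditional aggregation) and is only an illustration; summarize is a hypothetical name.

```python
from collections import defaultdict

# Sample rows from the question: (id, date_from, date_to, price, status).
rows = [
    ("CK1", "22-12-2012", "29-12-2012", 800, 1),
    ("CK1", "22-12-2012", "29-12-2012", 1200, 1),
    ("CK2", "24-12-2012", "30-12-2012", 1400, 0),
    ("CK2", "24-12-2012", "30-12-2012", 1800, 1),
    ("CK2", "24-12-2012", "30-12-2012", 2200, 1),
]


def summarize(rows):
    """Per (id, date_from, date_to): lowest price among status = 1 rows,
    total row count, status = 1 count and status = 0 count."""
    groups = defaultdict(list)
    for id_, date_from, date_to, price, status in rows:
        groups[(id_, date_from, date_to)].append((price, status))
    result = []
    for key, items in groups.items():
        active_prices = [price for price, status in items if status == 1]
        # Like MIN(CASE status WHEN 1 THEN price END): None if no such row.
        min_price = min(active_prices) if active_prices else None
        result.append(key + (min_price, len(items), len(active_prices),
                             len(items) - len(active_prices)))
    return result


summary = summarize(rows)
for row in summary:
    print(row)
```

For CK2 this reports a total of 3 records, of which 2 have status 1 and 1 has status 0, which is the "uncounted" number asked about in the question.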
Something like this should do the trick:
select ID, DATE_FROM, DATE_TO, MIN(price), COUNT(*) from my_table where status = 1 group by ID, DATE_FROM, DATE_TO
One remark: is it normal that an ID field can have the same value?
See the MSDN article on "group by" and another one on devguru.
Maybe something like this:
test data
CREATE TABLE #tbl
(
id VARCHAR(100),
date_from VARCHAR(100),
date_to VARCHAR(100),
price INT,
status INT
)
INSERT INTO #tbl
VALUES
('CK1','22-12-2012','29-12-2012',800,1),
('CK1','22-12-2012','29-12-2012',1200,1),
('CK2','22-12-2012','29-12-2012',1400,0),
('CK2','22-12-2012','29-12-2012',1800,1),
('CK2','22-12-2012','29-12-2012',2200,1)
Query
;WITH CTE
AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY tbl.id ORDER BY tbl.price ASC) AS RowNbr,
SUM(status) OVER(PARTITION BY tbl.id) AS [count],
tbl.*
FROM
#tbl AS tbl
WHERE
status=1
)
SELECT
CTE.id,
CTE.date_from,
CTE.date_to,
CTE.price,
CTE.[count]
FROM
CTE
WHERE
CTE.RowNbr=1