Teradata Correlated subquery - sql

I've been facing an issue for two days with this query:
select distinct a.id,
a.amount as amount1,
(select max (a.date) from t1 a where a.id=t.id and a.cesitc='0' and a.date<t.date) as date1,
t.id, t.amount as amount2, t.date as date2
from t1 a
inner join t1 t on t.id = a.id and a.cevexp in ('0', '1' )
and exists (select t.id from t1 t
where t.id= a.id and t.amount <> a.amount and t.date > a.date)
and t.cesitc='1' and t.dafms='2015-07-31' and t.date >='2015-04-30' and '2015-07-31' >= t.daefga
and '2015-07-31' <= t.daecga and t.cevexp='1' and t.amount >'1'
Some details: the goal is to compare the difference in valuation of assets (id); the second column (a.amount / amount1) is the one that needs to be corrected.
I would like a.amount / amount1 to be correlated with my subquery 'date1', which is currently not the case. The same criteria have to be applied to find the correct amount1.
The query currently returns results like this:
Id Amount1 Date1 id amount2 date2
1 100 04/03/2014 1 150 30/06/2015
1 102 04/03/2014 1 150 30/06/2015
1 170 04/03/2014 1 150 30/06/2015
Amount1 matches every Date1 < date2 instead of only max(date1) < date2, which is why I get several amount1 rows.
Thanks in advance for a helping hand :)
Have a good day!

You can access the previous row's data using a windowed aggregate function. There's no LEAD/LAG in Teradata, but it's easy to rewrite.
This will return the correct data for your example:
SELECT t.*,
MIN(amount) -- previous amount
OVER (PARTITION BY Id
ORDER BY date_valuation, dafms DESC
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_amount,
MIN(date_valuation) -- previous date
OVER (PARTITION BY Id
ORDER BY date_valuation, dafms DESC
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_date
FROM test5 AS t
QUALIFY cesitc = '1' -- return only the current row
If it doesn't work as expected, you need to add more details about the applied logic.
Btw, if a column is a DECIMAL you shouldn't add quotes: 150 instead of '150'. And there's only one recommended way to write a date, using a date literal, e.g. DATE '2015-07-31'.
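As a minimal, self-contained sketch of the pattern (the table and column names below are hypothetical, not the OP's schema):
-- Hypothetical table "valuations" with one asset and three valuations:
-- id  date_valuation  amount  cesitc
-- 1   2014-03-04      170     '0'
-- 1   2015-03-31      102     '0'
-- 1   2015-06-30      150     '1'   <-- the "current" row
SELECT t.*,
MIN(amount) -- the one-row frame makes any aggregate act like LAG
OVER (PARTITION BY id
ORDER BY date_valuation
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_amount
FROM valuations t
QUALIFY cesitc = '1' -- keeps only the 2015-06-30 row, with prev_amount = 102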

The final query:
SELECT a.id, a.mtvbie, a.date_valuation, t.id,
MIN(t.amount) -- previous amount
OVER (PARTITION BY t.Id
ORDER BY t.date_valuation, t.dafms DESC
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_amount,
MIN(t.date_valuation) -- previous date
OVER (PARTITION BY t.Id
ORDER BY t.date_valuation, t.dafms DESC
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_date
FROM test5 t
inner join test5 a on a.id=t.id
where t.amount <> a.amount and a.cesitc='1' and a.date_valuation > t.date_valuation and a.dafms ='2015-07-31' and another criteria....
QUALIFY row_number() over (partition by a.id order by a.cogarc) = 1

Related

How to select only decreasing delta in values with increasing delta also present in data?

I have a warehouse which reports stock. Stock diminishes (orders) and increases (supplies). I have only stock values, nothing else. Say I have this data, sorted by time descending:
time   stock
00:11  7144   <--- current
00:10  7280
00:09  7416   <--- note increase, means new supply arrived
00:08  2259
00:07  2333
00:06  2538
00:05  2999
00:04  3074
00:03  3104   <--- start
I need to derive orders excluding supply, so max(qty) - min(qty) does not work; the calculation has to account for the sudden increase and return only the diminishing deltas. So for the given values I expect: orders = (3104 - 2259) + (7416 - 7144).
How would you approach this task?
Thanks.
You can join two "adjacent" rows and check that the stock of the earlier row is more than the stock of the later row to find your orders. Then sum over the difference of these stocks.
Assuming that the column time is unique, you can use the following query:
SELECT SUM(t1.stock - t2.stock) AS sum_of_orders
FROM my_table t1
INNER JOIN my_table t2 ON t2.time > t1.time
AND NOT EXISTS(
SELECT 1
FROM my_table t3
WHERE t3.time > t1.time AND t3.time < t2.time)
WHERE t1.stock > t2.stock;
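If you want to try it, here is a minimal setup sketch for the sample data (the table name my_table matches the query above; the column types are assumptions):
CREATE TABLE my_table (time CHAR(5) NOT NULL, stock INT NOT NULL); -- hh:mm kept as text, which still orders correctly here
INSERT INTO my_table VALUES ('00:03', 3104);
INSERT INTO my_table VALUES ('00:04', 3074);
INSERT INTO my_table VALUES ('00:05', 2999);
INSERT INTO my_table VALUES ('00:06', 2538);
INSERT INTO my_table VALUES ('00:07', 2333);
INSERT INTO my_table VALUES ('00:08', 2259);
INSERT INTO my_table VALUES ('00:09', 7416);
INSERT INTO my_table VALUES ('00:10', 7280);
INSERT INTO my_table VALUES ('00:11', 7144);
-- Expected result: sum_of_orders = (3104 - 2259) + (7416 - 7144) = 1117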
This is a type of gaps-and-islands problem. You can identify the "islands" by when the stock increases. Then within each island you want the max and min. You can summarize the islands using:
select min(stock), max(stock)
from (select t.*,
sum(case when prev_stock > stock then 0 else 1 end) over (order by time) as grp
from (select t.*,
lag(stock) over (order by time) as prev_stock
from t
) t
) t
group by grp;
Then one more summary gives the total you want:
select sum(max_stock - min_stock)
from (select min(stock) as min_stock, max(stock) as max_stock
from (select t.*,
sum(case when prev_stock > stock then 0 else 1 end) over (order by time) as grp
from (select t.*,
lag(stock) over (order by time) as prev_stock
from t
) t
) t
group by grp
) t;
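Worked through by hand on the sample data: the lag()/cumulative-sum subquery assigns grp = 1 to the rows from 3104 down to 2259 and grp = 2 to the rows from 7416 down to 7144, the middle aggregation yields the pairs (min 2259, max 3104) and (min 7144, max 7416), and the final sum is (3104 - 2259) + (7416 - 7144) = 845 + 272 = 1117, matching the expected result.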
Here is a db<>fiddle.

Find the true start end dates for customers that have multiple accounts in SQL Server 2014

I have a checking account table that contains columns Cust_id (customer id), Open_Date (start date), and Closed_Date (end date). There is one row for each account. A customer can open multiple accounts at any given point. I would like to know how long the person has been a customer.
eg 1:
CREATE TABLE [Cust]
(
[Cust_id] [varchar](10) NULL,
[Open_Date] [date] NULL,
[Closed_Date] [date] NULL
)
insert into [Cust] values ('a123', '10/01/2019', '10/15/2019')
insert into [Cust] values ('a123', '10/12/2019', '11/01/2019')
Ideally I would like to insert this into a table with just one row that says this person has been a customer from 10/01/2019 to 11/01/2019 (as he opened his second account before he closed his previous one).
Similarly eg 2:
insert into [Cust] values ('b245', '07/01/2019', '09/15/2019')
insert into [Cust] values ('b245', '10/12/2019', '12/01/2019')
I would like to see 2 rows in this case- one that shows he was a customer from 07/01 to 09/15 and then again from 10/12 to 12/01.
Can you point me to the best way to get this?
I would approach this as a gaps-and-islands problem. You want to group together runs of adjacent rows whose periods overlap.
Here is one way to solve it using lag() and a cumulative sum(). Every time the open date is greater than the closed date of the previous record, a new group starts.
select
cust_id,
min(open_date) open_date,
max(closed_date) closed_date
from (
select
t.*,
sum(case when not open_date <= lag_closed_date then 1 else 0 end)
over(partition by cust_id order by open_date) grp
from (
select
t.*,
lag(closed_date) over (partition by cust_id order by open_date) lag_closed_date
from cust t
) t
) t
group by cust_id, grp
In this db fiddle with your sample data, the query produces:
cust_id | open_date | closed_date
:------ | :--------- | :----------
a123 | 2019-10-01 | 2019-11-01
b245 | 2019-07-01 | 2019-09-15
b245 | 2019-10-12 | 2019-12-01
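To trace it on the sample data: for a123 the second row has lag_closed_date = 2019-10-15 and open_date = 2019-10-12, so open_date <= lag_closed_date holds, grp does not increment, and both rows collapse into one island. For b245 the second row opens on 2019-10-12, after the previous close on 2019-09-15, so grp increments and two separate islands (and output rows) result.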
I would solve this with recursion. While this is certainly very heavy, it should accommodate even the most complex account timings (assuming your data has such). However, if the sample data provided is as complex as you need to solve for, I highly recommend sticking with the solution provided above. It is much more concise and clear.
WITH x (cust_id, open_date, closed_date, lvl, grp) AS (
SELECT cust_id, open_date, closed_date, 1, 1
FROM (
SELECT cust_id
, open_date
, closed_date
, row_number()
OVER (PARTITION BY cust_id ORDER BY closed_date DESC, open_date) AS rn
FROM cust
) AS t
WHERE rn = 1
UNION ALL
SELECT cust_id, open_date, closed_date, lvl, grp
FROM (
SELECT c.cust_id
, c.open_date
, c.closed_date
, x.lvl + 1 AS lvl
, x.grp + CASE WHEN c.closed_date < x.open_date THEN 1 ELSE 0 END AS grp
, row_number() OVER (PARTITION BY c.cust_id ORDER BY c.closed_date DESC) AS rn
FROM cust c
JOIN x
ON x.cust_id = c.cust_id
AND c.open_date < x.open_date
) AS t
WHERE t.rn = 1
)
SELECT cust_id, min(open_date) AS first_open_date, max(closed_date) AS last_closed_date
FROM x
GROUP BY cust_id, grp
ORDER BY cust_id, grp
I would also add the caveat that I don't run on SQL Server, so there could be syntax differences that I didn't account for. Hopefully they are minor, if present.
you can try something like this:
select distinct
cust_id,
(select min(Open_Date)
from Cust as b
where b.cust_id = a.cust_id and
a.Open_Date <= b.Closed_Date and
a.Closed_Date >= b.Open_Date
),
(select max(Closed_Date)
from Cust as b
where b.cust_id = a.cust_id and
a.Open_Date <= b.Closed_Date and
a.Closed_Date >= b.Open_Date
)
from Cust as a
So, for every row you're selecting the minimal and maximal dates from all overlapping ranges; the distinct then filters out the duplicates.
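For example, for customer a123 the first row (2019-10-01 to 2019-10-15) overlaps the second (2019-10-12 to 2019-11-01), so both rows see the same set of overlapping accounts and both produce min(Open_Date) = 2019-10-01 and max(Closed_Date) = 2019-11-01; the distinct then collapses them into a single output row.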

How to get the validity date range of a price from individual daily prices in SQL

I have some prices for the month of January.
Date,Price
1,100
2,100
3,115
4,120
5,120
6,100
7,100
8,120
9,120
10,120
Now, the output I need is a non-overlapping date range for each price.
price,from,To
100,1,2
115,3,3
120,4,5
100,6,7
120,8,10
I need to do this using SQL only.
For now, if I simply group by and take min and max dates, I get the below, which is an overlapping range:
price,from,to
100,1,7
115,3,3
120,4,10
This is a gaps-and-islands problem. The simplest solution is the difference of row numbers:
select price, min(date), max(date)
from (select t.*,
row_number() over (order by date) as seqnum,
row_number() over (partition by price order by date) as seqnum2
from t
) t
group by price, (seqnum - seqnum2)
order by min(date);
Why this works is a little hard to explain. But if you look at the results of the subquery, you will see how the adjacent rows are identified by the difference in the two values.
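For the sample data, worked out by hand, the subquery produces the following (the last column is seqnum - seqnum2, which the outer query groups on):
date,price,seqnum,seqnum2,seqnum-seqnum2
1,100,1,1,0
2,100,2,2,0
3,115,3,1,2
4,120,4,1,3
5,120,5,2,3
6,100,6,3,3
7,100,7,4,3
8,120,8,3,5
9,120,9,4,5
10,120,10,5,5
Grouping by price and (seqnum - seqnum2) then yields exactly the five ranges in the desired output.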
SELECT Lag.price,Lag.[date] AS [From], MIN(Lead.[date]-Lag.[date])+Lag.[date] AS [to]
FROM
(
SELECT [date],[Price]
FROM
(
SELECT [date],[Price],LAG(Price) OVER (ORDER BY DATE,Price) AS LagID FROM #table1 A
)B
WHERE CASE WHEN Price <> ISNULL(LagID,1) THEN 1 ELSE 0 END = 1
)Lag
JOIN
(
SELECT [date],[Price]
FROM
(
SELECT [date],Price,LEAD(Price) OVER (ORDER BY DATE,Price) AS LeadID FROM [#table1] A
)B
WHERE CASE WHEN Price <> ISNULL(LeadID,1) THEN 1 ELSE 0 END = 1
)Lead
ON Lag.[Price] = Lead.[Price]
WHERE Lead.[date]-Lag.[date] >= 0
GROUP BY Lag.[date],Lag.[price]
ORDER BY Lag.[date]
Another method using ROWS UNBOUNDED PRECEDING
SELECT price, MIN([date]) AS [from], [end_date] AS [To]
FROM
(
SELECT *, MIN([abc]) OVER (ORDER BY DATE DESC ROWS UNBOUNDED PRECEDING ) end_date
FROM
(
SELECT *, CASE WHEN price = next_price THEN NULL ELSE DATE END AS abc
FROM
(
SELECT a.* , b.[date] AS next_date, b.price AS next_price
FROM #table1 a
LEFT JOIN #table1 b
ON a.[date] = b.[date]-1
)AA
)BB
)CC
GROUP BY price, end_date
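To sketch how this behaves on the sample data: abc is non-NULL only on dates 2, 3, 5, 7 and 10 (the last day of each price run, where the next day's price differs or does not exist); MIN(abc) OVER (ORDER BY DATE DESC ROWS UNBOUNDED PRECEDING) then carries each run's last day back to every row of that run as end_date, and grouping by price and end_date gives one row per run with MIN(date) as the start, reproducing the desired ranges.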

Calculating difference in rows for many columns in SQL (Access)

What's up guys. I have another question regarding using SQL for analysis. I have a table built like this.
ID Date Value
1 31.01.2019 10
1 30.01.2019 5
2 31.01.2019 20
2 30.01.2019 10
3 31.01.2019 30
3 30.01.2019 20
There are many different IDs and many different dates. What I would like as output is an additional column that gives me the difference from the previous date for each ID, so that I can analyze the change of values between days for each category (ID). To do that I need to avoid the query computing the difference of the last day WHERE ID = 1 minus the first day WHERE ID = 2.
Desired Output:
ID Date Difference to previous Days
1 31.01.2019 5
2 31.01.2019 10
3 31.01.2019 10
In the end I want to find outliers, i.e. days where the difference in value between two days is very large. Does anyone have a solution? If it is not possible with Access, I am open to solutions with Excel, but Access should be the first choice as it is more scalable.
Greetings and thanks in advance!
With a self join:
select t1.ID, t1.[Date],
t1.[Value] - t2.[Value] as [Difference to previous Day]
from tablename t1 inner join tablename t2
on t2.[ID] = t1.[ID] and t2.[Date] = t1.[Date] - 1
Results:
ID Date Difference to previous Day
1 31/1/2019 5
2 31/1/2019 10
3 31/1/2019 10
Edit.
For the case that there are gaps between your dates:
select
t1.ID, t1.[Date], t1.[Value] - t2.[Value] as [Difference to previous Day]
from (
select t.ID, t.[Date], t.[Value],
(select max(tt.[Date]) from tablename as tt where tt.ID = t.ID and tt.[Date] < t.[Date]) as prevdate
from tablename as t
) as t1 inner join tablename as t2
on t2.ID = t1.ID and t2.[Date] = t1.prevdate
In your example data, each id has rows for the same two dates and the values are increasing. If this is generally true, then you can simply use aggregation:
select id, max(date), max(value) - min(value)
from t
group by id;
If the values might not be increasing, but the dates are the same, then you can use conditional aggregation:
select id,
max(date),
(max(iif(date = "31.01.2019", value, null)) -
max(iif(date = "30.01.2019", value, null))
) as diff
from t
group by id;
Note: Your date looks like it is using a bespoke format, so I am just doing the comparison as a string.
If previous date is exactly one day before, you can use a join:
select t.*,
(t.value - tprev.value) as diff
from t left join
t as tprev
on t.id = tprev.id and t.date = dateadd("d", 1, tprev.date);
If the previous date is an arbitrary earlier date in the table, then you can use a correlated subquery:
select t.*,
(t.value -
(select top 1 tprev.value
from t as tprev
where tprev.id = t.id and tprev.date < t.date
order by tprev.date desc
)
) as diff
from t;
You can use a self join with an additional condition using a sub-query to determine the previous date
SELECT t.ID, t.Date, t.Value - prev.Value AS Diff
FROM
dtvalues AS t
INNER JOIN dtvalues AS prev
ON t.ID = prev.ID
WHERE
prev.[Date] = (SELECT MAX(x.[Date]) FROM dtvalues x WHERE x.ID=t.ID AND x.[Date]<t.[Date])
ORDER BY t.ID, t.[Date];
You could also include the where condition into the join condition, but the query designer would not be able to handle the query anymore. Like this, you can still edit the query in the query designer.

Oracle SQL: Show entries from component tables once apiece

My objective is to produce a dataset that shows a boatload of data from, in total, just shy of 50 tables, all in the same Oracle SQL database schema. Each table except the first consists of, as far as the report I'm building cares, two elements:
A foreign-key identifier that matches a row on the first table
A date
There may be many rows on one of these tables corresponding to one case, and it will NOT be the same number of rows from table to table.
My objective is to have each row in the first table show up as many times as needed to display all the results from the other tables once. So, something like this (except on a lot more tables):
CASE_FILE_ID INITIATED_DATE INSPECTION_DATE PAYMENT_DATE ACTION_DATE
------------ -------------- --------------- ------------ -----------
1000         10-JUL-1986    14-JUL-1987     10-JUL-1986
1000                        14-JUL-1988     10-JUL-1987
1000                        14-JUL-1989     10-JUL-1988
1000                                        10-JUL-1989
My current SQL code (shrunk down to five tables, but the rest all follow the same format as T1-T4):
SELECT DISTINCT
A.CASE_FILE_ID,
T1.DATE AS INITIATED_DATE,
T2.DATE AS INSPECTION_DATE,
T3.DATE AS PAYMENT_DATE,
T4.DATE AS ACTION_DATE
FROM
RECORDS.CASE_FILE A
LEFT OUTER JOIN RECORDS.INITIATE T1 ON A.CASE_FILE_ID = T1.CASE_FILE_ID
LEFT OUTER JOIN RECORDS.INSPECTION T2 ON A.CASE_FILE_ID = T2.CASE_FILE_ID
LEFT OUTER JOIN RECORDS.PAYMENT T3 ON A.CASE_FILE_ID = T3.CASE_FILE_ID
LEFT OUTER JOIN RECORDS.ACTION T4 ON A.CASE_FILE_ID = T4.CASE_FILE_ID
ORDER BY
A.CASE_FILE_ID
The problem is, the output this produces results in distinct combinations; so in the above example (where I added a 'WHERE' clause of A.CASE_FILE_ID = '1000'), instead of four rows for case 1000, it'd show twelve (1 Initiated Date * 3 Inspection Dates * 4 Payment Dates = 12 rows). Suffice it to say, as the number of tables increases, this would get very prohibitive in both display and runtime, very quickly.
What is the best way to get an output loosely akin to the ideal above, where any one date is only shown once? Failing that, is there a way to get it to only show as many lines for one CASE_FILE as it needs to show all the dates, even if some dates repeat within that?
There isn't a good way, but there are two ways. One method involves subqueries for each table and complex outer joins. The second involves subqueries and union all. Let's go with that one:
SELECT CASE_FILE_ID,
MAX(INITIATED_DATE) as INITIATED_DATE,
MAX(INSPECTION_DATE) as INSPECTION_DATE,
MAX(PAYMENT_DATE) as PAYMENT_DATE,
MAX(ACTION_DATE) as ACTION_DATE
FROM ((SELECT A.CASE_FILE_ID, NULL as INITIATED_DATE, NULL as INSPECTION_DATE,
NULL as PAYMENT_DATE, NULL as ACTION_DATE,
1 as seqnum
FROM RECORDS.CASE_FILE A
) UNION ALL
(SELECT CASE_FILE_ID, DATE as INITIATED_DATE, NULL as INSPECTION_DATE,
NULL as PAYMENT_DATE, NULL as ACTION_DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.INITIATE
) UNION ALL
(SELECT CASE_FILE_ID, NULL as INITIATED_DATE, DATE as INSPECTION_DATE,
NULL as PAYMENT_DATE, NULL as ACTION_DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.INSPECTION
) UNION ALL
(SELECT CASE_FILE_ID, NULL as INITIATED_DATE, NULL as INSPECTION_DATE,
DATE as PAYMENT_DATE, NULL as ACTION_DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.PAYMENT
) UNION ALL
(SELECT CASE_FILE_ID, NULL as INITIATED_DATE, NULL as INSPECTION_DATE,
NULL as PAYMENT_DATE, DATE as ACTION_DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.ACTION
)
) a
GROUP BY CASE_FILE_ID, seqnum;
Hmmm, a closely related solution is easier to maintain:
SELECT CASE_FILE_ID,
MAX(CASE WHEN type = 'INITIATED' THEN DATE END) as INITIATED_DATE,
MAX(CASE WHEN type = 'INSPECTION' THEN DATE END) as INSPECTION_DATE,
MAX(CASE WHEN type = 'PAYMENT' THEN DATE END) as PAYMENT_DATE,
MAX(CASE WHEN type = 'ACTION' THEN DATE END) as ACTION_DATE
FROM ((SELECT A.CASE_FILE_ID, NULL as TYPE, NULL as DATE,
1 as seqnum
FROM RECORDS.CASE_FILE A
) UNION ALL
(SELECT CASE_FILE_ID, 'INITIATED', DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.INITIATE
) UNION ALL
(SELECT CASE_FILE_ID, 'INSPECTION', DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.INSPECTION
) UNION ALL
(SELECT CASE_FILE_ID, 'PAYMENT', DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.PAYMENT
) UNION ALL
(SELECT CASE_FILE_ID, 'ACTION', DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.ACTION
)
) a
GROUP BY CASE_FILE_ID, seqnum;
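To see why the final GROUP BY lines things up, take case 1000 from the question: its single INITIATE row gets seqnum 1, its three INSPECTION rows get seqnum 1-3, and its four PAYMENT rows get seqnum 1-4, so grouping by (CASE_FILE_ID, seqnum) produces four rows with each date appearing exactly once in its own column, instead of the 1 * 3 * 4 = 12 rows the plain join produces.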