Filling in for missing latest data with last available data

I have two tables, one (market_cap_data) with month_end_date, id, market_cap fields:
month_end_date id market_cap
2012-12-31 123456 5000
2011-12-31 123456 4000
and a second table (start_date_table) with month_end_date, id, start_date fields:
month_end_date id start_date
2011-12-31 123456 1980-12-31
I want to combine the two tables, but the start_date_table data ends a year before the market_cap_data table. Where start_date_table has no data for the latest months, I want to fill in the most recent start_date. For example, instead of an outer join result like:
month_end_date id market_cap start_date
2012-12-31 123456 5000 NULL
2011-12-31 123456 4000 1980-12-31
I want it to look like
month_end_date id market_cap start_date
2012-12-31 123456 5000 1980-12-31
2011-12-31 123456 4000 1980-12-31
Tried a bunch of different things but can't figure it out.
Any help would be appreciated!

SELECT
    m.month_end_date,
    m.id,
    m.market_cap,
    CASE
        WHEN s.start_date IS NOT NULL THEN s.start_date
        ELSE (SELECT MAX(s2.start_date) FROM start_date_table s2 WHERE s2.id = m.id)
    END AS start_date
FROM market_cap_data m
LEFT JOIN start_date_table s
    ON m.id = s.id
   AND m.month_end_date = s.month_end_date
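To check the query against the sample rows in the question, here is a small sketch that runs it unchanged (apart from an added ORDER BY) in SQLite through Python's built-in sqlite3 module. SQLite is an assumption for illustration, since the question doesn't name a DBMS, and the column types below are guesses.

```python
import sqlite3

# In-memory database with the sample rows from the question (ISO dates stored as text).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE market_cap_data (month_end_date TEXT, id INTEGER, market_cap INTEGER);
CREATE TABLE start_date_table (month_end_date TEXT, id INTEGER, start_date TEXT);
INSERT INTO market_cap_data VALUES
    ('2012-12-31', 123456, 5000),
    ('2011-12-31', 123456, 4000);
INSERT INTO start_date_table VALUES ('2011-12-31', 123456, '1980-12-31');
""")

# The answer's query; only the ORDER BY is added, for a stable row order.
query = """
SELECT
    m.month_end_date,
    m.id,
    m.market_cap,
    CASE
        WHEN s.start_date IS NOT NULL THEN s.start_date
        ELSE (SELECT MAX(s2.start_date) FROM start_date_table s2 WHERE s2.id = m.id)
    END AS start_date
FROM market_cap_data m
LEFT JOIN start_date_table s
    ON m.id = s.id
   AND m.month_end_date = s.month_end_date
ORDER BY m.month_end_date DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The CASE expression is equivalent to COALESCE(s.start_date, (SELECT MAX(s2.start_date) ...)), which most databases also accept.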

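I think you would benefit from a CASE expression. This is not tested, as I don't have a fiddle to validate against:
-- T-SQL (assumed dialect). A function can't take a table name as a parameter
-- without dynamic SQL, so the table is fixed here and the lookup is per id.
CREATE FUNCTION dbo.get_latest_start_date (@id INT)
RETURNS DATE
AS
BEGIN
    RETURN (SELECT MAX(start_date) FROM start_date_table WHERE id = @id);
END;
GO

-- Use the function only where the join found no start_date
SELECT
    m.month_end_date,
    m.id,
    m.market_cap,
    CASE WHEN s.start_date IS NULL
         THEN dbo.get_latest_start_date(m.id)
         ELSE s.start_date
    END AS start_date
FROM market_cap_data m
LEFT JOIN start_date_table s
    ON s.id = m.id
   AND s.month_end_date = m.month_end_date;
This should fill the NULL start_date values with the latest start_date for the same id.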
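Note that the first answer falls back to MAX(start_date) for the id, which is the largest start date rather than necessarily the one from the most recent snapshot. If you want the last available row carried forward, one option is a correlated subquery that takes the start_date from the latest month_end_date on or before each market-cap row. A sketch, again in SQLite via Python's sqlite3 (an assumption for illustration; LIMIT 1 would be TOP 1 in SQL Server or FETCH FIRST 1 ROW ONLY in Oracle 12c and later):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE market_cap_data (month_end_date TEXT, id INTEGER, market_cap INTEGER);
CREATE TABLE start_date_table (month_end_date TEXT, id INTEGER, start_date TEXT);
INSERT INTO market_cap_data VALUES
    ('2012-12-31', 123456, 5000),
    ('2011-12-31', 123456, 4000);
INSERT INTO start_date_table VALUES ('2011-12-31', 123456, '1980-12-31');
""")

# Last observation carried forward: for each market-cap row, take start_date
# from the most recent start_date_table row for the same id whose
# month_end_date is on or before the market-cap row's month_end_date.
query = """
SELECT
    m.month_end_date,
    m.id,
    m.market_cap,
    (SELECT s.start_date
       FROM start_date_table s
      WHERE s.id = m.id
        AND s.month_end_date <= m.month_end_date
      ORDER BY s.month_end_date DESC
      LIMIT 1) AS start_date
FROM market_cap_data m
ORDER BY m.month_end_date DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With a single start_date row per id, both approaches give the same result; they can differ once an id has several start_date values over time. A market-cap row with no earlier start_date row gets NULL.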

Related

Joining a transaction fact table to a periodic snapshot table in SQL using the nearest date

I am using Redshift on AWS and have two tables. The first is a list of transactions:
cust_ID  order_date  product
100      2022/05/01  A
101      2022/05/01  A
100      2022/05/05  B
101      2022/05/07  B
The second is a snapshot table which has customer attributes for each customer at a specific point in time. Though the second table has rows for most dates, it doesn't have rows for every customer at every date.
cust_ID  as_of_date  favourite_colour
100      2022/05/01  blue
100      2022/05/02  red
100      2022/05/05  green
100      2022/05/07  red
101      2022/05/01  blue
101      2022/05/04  red
101      2022/05/05  green
101      2022/05/08  yellow
How can I join the tables such that the transaction table has the customer attributes either on the date of the order itself, or if the transaction date is not available in table 2, at the nearest available date before the transaction?
An example of the desired output would be:
cust_ID  order_date  product  Favourite_colour  as_of_date
100      2022/05/01  A        blue              2022/05/01
101      2022/05/01  A        blue              2022/05/01
100      2022/05/05  B        green             2022/05/05
101      2022/05/07  B        green             2022/05/05
Joining by cust_ID and order_date = as_of_date doesn't work due to edge cases where the order_date/id combination is not in the second table.
I've also tried something like:
with snapshot as (
SELECT
row_number() OVER(PARTITION BY cust_ID ORDER BY as_of_date DESC) as row_number,
cust_ID,
favourite_color,
as_of_date
FROM table2 t2
INNER JOIN table1 t1
ON t1.cust_ID = t2.cust_ID
AND t2.as_of_date <= t1.order_date
)
SELECT * FROM snapshot
WHERE row_number = 1
However, this doesn't handle cases where the same customer has multiple transactions in table 1. When I check the count of the resulting table, the number of distinct cust_IDs is the same as count(*) so it seems like the resulting table is only retaining one transaction per customer.
Any help would be appreciated.

Using your provided table inputs, I tested this solution in DB Fiddle and it works for your desired output.
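with my_cte as (
    select
        t.cust_id,
        t.order_date,
        t.product,
        a.favourite_colour,
        a.as_of_date,
        row_number() over (partition by t.cust_id, t.order_date order by a.as_of_date desc) as ranked
    from transactions t
    left join attribs a
        on a.cust_id = t.cust_id
       and a.as_of_date <= t.order_date  -- in ON, not WHERE, so orders with no earlier snapshot are kept
)
select cust_id, order_date, product, favourite_colour, as_of_date
from my_cte
where ranked = 1
order by order_date, cust_id;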
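The ROW_NUMBER() idea can be checked locally against the sample data. This sketch uses SQLite through Python's sqlite3 rather than Redshift (an assumption for illustration; SQLite needs version 3.25 or later for window functions). It puts the as_of_date <= order_date condition in the LEFT JOIN's ON clause, so a transaction with no earlier snapshot would still appear, with NULL attributes.

```python
import sqlite3

# Sample data from the question; YYYY/MM/DD text sorts in date order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (cust_id INTEGER, order_date TEXT, product TEXT);
CREATE TABLE attribs (cust_id INTEGER, as_of_date TEXT, favourite_colour TEXT);
INSERT INTO transactions VALUES
    (100, '2022/05/01', 'A'), (101, '2022/05/01', 'A'),
    (100, '2022/05/05', 'B'), (101, '2022/05/07', 'B');
INSERT INTO attribs VALUES
    (100, '2022/05/01', 'blue'), (100, '2022/05/02', 'red'),
    (100, '2022/05/05', 'green'), (100, '2022/05/07', 'red'),
    (101, '2022/05/01', 'blue'), (101, '2022/05/04', 'red'),
    (101, '2022/05/05', 'green'), (101, '2022/05/08', 'yellow');
""")

query = """
WITH ranked_snapshots AS (
    SELECT
        t.cust_id, t.order_date, t.product, a.favourite_colour, a.as_of_date,
        -- rank 1 = the latest snapshot on or before this order
        ROW_NUMBER() OVER (
            PARTITION BY t.cust_id, t.order_date
            ORDER BY a.as_of_date DESC
        ) AS ranked
    FROM transactions t
    LEFT JOIN attribs a
        ON a.cust_id = t.cust_id
       AND a.as_of_date <= t.order_date
)
SELECT cust_id, order_date, product, favourite_colour, as_of_date
FROM ranked_snapshots
WHERE ranked = 1
ORDER BY order_date, cust_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Partitioning by (cust_id, order_date) assumes a customer places at most one order per day; if that doesn't hold, partition by a unique transaction key instead (hypothetical here, since the sample table has none).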

Hive - Using Lateral View Explode with Joined Table

I am building some analysis and need to prep the data by joining two tables, then unpivoting the date fields to create one record for each "date_type". I have been trying to work with the lateral view explode(array()) function, but I can't figure out how to do this with columns from two separate tables. Any help would be appreciated; I'm open to completely different methods.
TableA:
loan_number  app_date
123          07/09/2022
456          07/11/2022
TableB:
loan_number  funding_date  amount
123          08/13/2022    12000
456          08/18/2022    10000
Desired Result:
loan_number  date_type     date_value  amount
123          app_date      07/09/2022  12000
456          app_date      07/11/2022  10000
123          funding_date  08/13/2022  12000
456          funding_date  08/18/2022  10000
Here is some sample code related to the example above that I was trying to make work:
SELECT
b.loan_number,
b.amount,
Date_Value
FROM TableA as a
LEFT JOIN
TableB as b
ON a.loan_number=b.loan_number
LATERAL VIEW explode(array(to_date(a.app_date),to_date(b.funding_date)) Date_List AS Date_value

There's no need for lateral view explode; just use UNION ALL. Try the following:
with base_data as (
select
a.loan_number,
a.app_date,
b.funding_date,
b.amount
from
tableA a
join
tableB b on a.loan_number = b.loan_number
)
select
loan_number,
'app_date' as date_type,
app_date as date_value,
amount
from
base_data
union all
select
loan_number,
'funding_date' as date_type,
funding_date as date_value,
amount
from
base_data
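The UNION ALL approach can be tried locally on the sample data. This sketch uses SQLite through Python's sqlite3 instead of Hive (an assumption for illustration; the SQL is the same apart from an added ORDER BY) and keeps the dates as plain text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (loan_number INTEGER, app_date TEXT);
CREATE TABLE tableB (loan_number INTEGER, funding_date TEXT, amount INTEGER);
INSERT INTO tableA VALUES (123, '07/09/2022'), (456, '07/11/2022');
INSERT INTO tableB VALUES (123, '08/13/2022', 12000), (456, '08/18/2022', 10000);
""")

# Join once, then emit one row per date column with UNION ALL (a manual unpivot).
query = """
WITH base_data AS (
    SELECT a.loan_number, a.app_date, b.funding_date, b.amount
    FROM tableA a
    JOIN tableB b ON a.loan_number = b.loan_number
)
SELECT loan_number, 'app_date' AS date_type, app_date AS date_value, amount
FROM base_data
UNION ALL
SELECT loan_number, 'funding_date' AS date_type, funding_date AS date_value, amount
FROM base_data
ORDER BY date_type, loan_number
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each additional date column to unpivot needs one more UNION ALL branch.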

How to get the MAX value of unique column in sql and aggregate other?

I want to get the row with the max 'date', grouped only by the unique 'id', without aggregating the other columns.
I tried this query
(but it doesn't work, because MAX() is applied to each of the other columns independently):
SELECT id,
MAX(num),
MAX(date),-- I just want the max of this column
MAX(product_name),
MAX(other_columns)
FROM TB
GROUP BY id
Table:
id num date product_name other_columns
123 0001 2021-12-01 exit 12315413
123 0002 2021-12-02 entry 65481328
333 0001 2021-12-03 entry 13848136
333 ASDV 2021-12-04 exit 1325165
Expected Result:
id num date product_name
123 0002 2021-12-02 entry
333 ASDV 2021-12-04 exit
How to do that?

A subquery with an inner join can take care of this in a fairly DBMS-agnostic way.
SELECT
    t.id,
    t.num,
    t.date,
    t.product_name
FROM tb t
INNER JOIN (
    SELECT
        id,
        MAX(date) AS date
    FROM tb
    GROUP BY id
) s ON t.id = s.id AND t.date = s.date
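Here is a sketch that runs the same join-back technique on the sample rows, using SQLite through Python's sqlite3 (an assumption for illustration; the column types are guesses).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tb (id INTEGER, num TEXT, date TEXT, product_name TEXT, other_columns INTEGER);
INSERT INTO tb VALUES
    (123, '0001', '2021-12-01', 'exit', 12315413),
    (123, '0002', '2021-12-02', 'entry', 65481328),
    (333, '0001', '2021-12-03', 'entry', 13848136),
    (333, 'ASDV', '2021-12-04', 'exit', 1325165);
""")

# The derived table s finds the latest date per id; joining back on (id, date)
# returns the whole row carrying that date, so no other column is aggregated.
query = """
SELECT t.id, t.num, t.date, t.product_name
FROM tb t
INNER JOIN (
    SELECT id, MAX(date) AS date
    FROM tb
    GROUP BY id
) s ON t.id = s.id AND t.date = s.date
ORDER BY t.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If two rows share the maximum date for the same id, both are returned.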

subquery calculate days between dates

Sub query, SQL, Oracle
I'm new to subqueries and hoping to get some assistance. My thought was that the subquery would run first and then the outer query would execute based on the subquery filter of trans_code = 'ABC'. The query works, but it pulls dates from all transaction codes (trans_code 'ABC', 'DEF', etc.).
The end goal is to calculate the number of days between dates.
The table structure is:
acct_num effective_date
1234 01/01/2020
1234 02/01/2020
1234 03/01/2020
1234 04/01/2021
I want to execute a query to look like this:
account Effective_Date Effective_Date_2 Days_Diff
1234 01/01/2020 02/01/2020 31
1234 02/01/2020 03/01/2020 29
1234 03/01/2020 04/01/2021 396
1234 04/01/2021 0
Query:
SELECT t3.acct_num,
t3.trans_code,
t3.effective_date,
MIN (t2.effective_date) AS effective_date2,
MIN (t2.effective_date) - t3.effective_date AS days_diff
FROM (SELECT t1.acct_num, t1.trans_code, t1.effective_date
FROM lawd.trans t1
WHERE t1.trans_code = 'ABC') t3
LEFT JOIN lawd.trans t2 ON t3.acct_num = t2.acct_num
WHERE t3.acct_num = '1234' AND t2.effective_date > t3.effective_date
GROUP BY t3.acct_num, t3.effective_date, t3.trans_code
ORDER BY t3.effective_date asc
TIA!

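Use the LEAD() analytic function. The trans_code filter goes in the WHERE clause, which is applied before LEAD() is evaluated:
select t.*,
       lead(t.effective_date) over (partition by t.acct_num order by t.effective_date) as next_effective_date,
       nvl(lead(t.effective_date) over (partition by t.acct_num order by t.effective_date)
           - t.effective_date, 0) as days_diff
from lawd.trans t
where t.trans_code = 'ABC'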
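To see LEAD() on the sample account, here is a sketch in SQLite via Python's sqlite3 rather than Oracle (an assumption for illustration). The dates are converted to ISO text, and the 'DEF' row is hypothetical, added only to show that the filter excludes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trans (acct_num TEXT, trans_code TEXT, effective_date TEXT);
INSERT INTO trans VALUES
    ('1234', 'ABC', '2020-01-01'),
    ('1234', 'ABC', '2020-02-01'),
    ('1234', 'ABC', '2020-03-01'),
    ('1234', 'ABC', '2021-04-01'),
    ('1234', 'DEF', '2020-01-15');  -- hypothetical row that the filter should ignore
""")

# WHERE runs before LEAD(), so the "next date" comes only from ABC rows.
# SQLite has no DATE - DATE arithmetic, so julianday() converts to day numbers.
query = """
SELECT
    acct_num,
    effective_date,
    LEAD(effective_date) OVER (PARTITION BY acct_num ORDER BY effective_date)
        AS next_effective_date,
    COALESCE(
        CAST(julianday(LEAD(effective_date) OVER (PARTITION BY acct_num ORDER BY effective_date))
             - julianday(effective_date) AS INTEGER),
        0
    ) AS days_diff
FROM trans
WHERE trans_code = 'ABC'
ORDER BY acct_num, effective_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In Oracle itself, subtracting one DATE from another already returns a number of days, so julianday() is only needed in SQLite.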

Count two Columns with two Where Clauses

I know it's just late in the day and my brain is just fried....
Using Teradata, I need to COUNT DISTINCT MEMBERS that haven't had a TRANS in the past six months and also COUNT the number of TRANS they had historically (prior to the six months). We can just assume the cutoff date is 01/01/2012. All the data is contained in a single table.
For example:
Member | Tran Date
123    | 01/01/2011
789    | 06/01/2011
123    | 10/31/2011
678    | 04/03/2011
789    | 06/01/2012
So 2 members had a total of 3 transactions dated prior to 1/1/2012 with no transactions later than 1/1/2012.
In this example, my result would be:
MEMBERS | TRANS
2 | 3

Try this solution:
SELECT
COUNT(DISTINCT member_id) AS MEMBERS,
COUNT(*) AS TRANS
FROM
tbl
WHERE
member_id NOT IN
(
SELECT DISTINCT member_id
FROM tbl
WHERE trans_date > DATE '2012-01-01'
)
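Running this NOT IN query on the sample rows gives the expected 2 members and 3 transactions. The sketch below uses SQLite through Python's sqlite3 instead of Teradata (an assumption for illustration), with the dates rewritten as ISO text and compared as a plain string literal so that string order matches date order; tbl, member_id and trans_date are the answer's placeholder names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (member_id INTEGER, trans_date TEXT);
INSERT INTO tbl VALUES
    (123, '2011-01-01'), (789, '2011-06-01'), (123, '2011-10-31'),
    (678, '2011-04-03'), (789, '2012-06-01');
""")

# Exclude every member with any transaction after the cutoff, then count
# the remaining members and all of their (necessarily older) transactions.
query = """
SELECT COUNT(DISTINCT member_id) AS members, COUNT(*) AS trans
FROM tbl
WHERE member_id NOT IN (
    SELECT DISTINCT member_id
    FROM tbl
    WHERE trans_date > '2012-01-01'
)
"""
members, trans = conn.execute(query).fetchone()
print(members, trans)
```

Be aware that NOT IN returns no rows at all if the subquery produces a NULL member_id; NOT EXISTS does not have that problem.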

You can't do it with a single flat SELECT; use subqueries. This is T-SQL because I am unfamiliar with Teradata.
DECLARE @CUTOFF DATETIME = DATEADD(MONTH, -6, GETDATE()); -- 6 months ago
SELECT COUNT(X.MEMBERID) AS MEMBERS, SUM(X.TRANSCOUNT) AS TRANS
FROM (
    SELECT DISTINCT
        M.MEMBERID,
        (SELECT COUNT(*) FROM TRANSDATA T WHERE T.MEMBERID = M.MEMBERID) AS TRANSCOUNT
    FROM MEMBER M
    WHERE NOT EXISTS (
        SELECT * FROM TRANSDATA T
        WHERE T.MEMBERID = M.MEMBERID
          AND T.TRANDATE > @CUTOFF
    )
) AS X;
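The NOT EXISTS idea in this answer also works directly on the single table the question describes, with no separate MEMBER table. A sketch in SQLite via Python's sqlite3 (an assumption for illustration), using the hypothetical names tbl, member_id and trans_date from the previous answer, the fixed 2012-01-01 cutoff from the question, and ISO-formatted dates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (member_id INTEGER, trans_date TEXT);
INSERT INTO tbl VALUES
    (123, '2011-01-01'), (789, '2011-06-01'), (123, '2011-10-31'),
    (678, '2011-04-03'), (789, '2012-06-01');
""")

# Keep a transaction only if the same member has no transaction after the cutoff.
query = """
SELECT COUNT(DISTINCT t.member_id) AS members, COUNT(*) AS trans
FROM tbl t
WHERE NOT EXISTS (
    SELECT 1
    FROM tbl later
    WHERE later.member_id = t.member_id
      AND later.trans_date > '2012-01-01'
)
"""
members, trans = conn.execute(query).fetchone()
print(members, trans)
```

For a cutoff relative to today, the literal could be replaced with DATEADD(MONTH, -6, GETDATE()) in T-SQL or ADD_MONTHS(CURRENT_DATE, -6) in Teradata.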
