SQL fill missing date and null value - sql

I am using SAS Enterprise Guide 8.3 to connect to IBM DB2.
I want to join two tables and fill in the missing dates and values.
I have a full calendar date table from 1/1/2021 up to yesterday.
Each ID can work 5 to 7 days a week.
There is no target for Sunday.
4/3/2022 is a Sunday.
My code is:
SELECT t2.ID,
t1.CAL_DT, COALESCE(t2.EXPRESS,0) AS EXPRESS, COALESCE(t2.OTHRES,0) AS OTHRES , COALESCE(t2.CPRO_RPT,0) AS Total
FROM WORK.QUERY_FOR_DATE t1
left outer JOIN WORK.QUERY_FOR_CPRO_0000 t2 on t1.cal_dt = t2.cal_dt
ORDER BY t2.ID asc ,t1.CAL_DT asc;
Sample tables are below.
Table 1.
ID   Date      Express  Others  Total
-------------------------------------
001  4/1/2022  0        2       2
001  4/2/2022  2        3       5
001  4/4/2022  1        2       3
001  4/5/2022  2        2       4
002  4/1/2022  0        3       3
002  4/4/2022  3        3       6
002  4/5/2022  1        2       3
003  4/1/2022  3        3       6
003  4/2/2022  4        4       8
003  4/3/2022  1        1       2
003  4/4/2022  3        4       7
003  4/6/2022  2        4       6
Table 2.
ID   Date      Target
---------------------
001  4/1/2022  4
001  4/2/2022  4
001  4/4/2022  4
001  4/5/2022  4
002  4/1/2022  6
002  4/2/2022  6
002  4/4/2022  6
002  4/5/2022  6
003  4/1/2022  8
003  4/2/2022  8
003  4/4/2022  8
003  4/5/2022  8
I want the result in Table 3.
ID   Date      Express  Others  Total  Target
---------------------------------------------
001  4/1/2022  0        2       2      4
001  4/2/2022  2        3       5      4
001  4/3/2022  0        0       0      0
001  4/4/2022  1        2       3      4
001  4/5/2022  2        2       4      4
002  4/1/2022  0        3       3      6
002  4/2/2022  0        0       0      6
002  4/3/2022  0        0       0      0
002  4/4/2022  3        3       6      6
002  4/5/2022  1        2       3      6
003  4/1/2022  3        3       6      8
003  4/2/2022  4        4       8      8
003  4/3/2022  1        1       2      0
003  4/4/2022  3        4       7      8
003  4/5/2022  2        4       6      8

I would cross join your date table with the distinct IDs to build every ID/date combination, then left join your CPRO table back onto that set, so you get a full set of IDs and dates along with your real data. You could also join to the MIN and MAX dates of the CPRO table to drop the calendar dates that fall outside your data.
SELECT X.ID, X.CAL_DT
, COALESCE(T3.EXPRESS,0) EXPRESS
, COALESCE(T3.OTHRES,0) OTHRES
, COALESCE(T3.CPRO_RPT,0) CPRO_RPT
FROM ( SELECT DISTINCT T2.ID, T1.CAL_DT FROM (WORK.QUERY_FOR_DATE t1 CROSS JOIN WORK.QUERY_FOR_CPRO_0000 t2)) X
LEFT OUTER JOIN WORK.QUERY_FOR_CPRO_0000 T3
ON X.ID = T3.ID
AND X.CAL_DT = T3.CAL_DT
You could also add another join to keep your date range correct if you didn't want to hard code it in a where clause.
SELECT
X.ID
, X.CAL_DT
, COALESCE(T3.EXPRESS,0) EXPRESS
, COALESCE(T3.OTHRES,0) OTHRES
, COALESCE(T3.CPRO_RPT,0) CPRO_RPT
FROM
(
SELECT DISTINCT T2.ID, T1.CAL_DT
FROM (WORK.QUERY_FOR_DATE t1
CROSS JOIN WORK.QUERY_FOR_CPRO_0000 t2)) X
LEFT OUTER JOIN WORK.QUERY_FOR_CPRO_0000 T3
ON X.ID = T3.ID
AND X.CAL_DT = T3.CAL_DT
INNER JOIN
(SELECT MIN(CAL_DT) AS MINDATE, MAX(CAL_DT) AS MAXDATE
FROM WORK.QUERY_FOR_CPRO_0000) AS DATEPARAM
ON X.CAL_DT BETWEEN MINDATE AND MAXDATE

Since you have a Calendar table, you may try the following:
Select B.id, C.CAL_DT, COALESCE(D.express, 0) express, COALESCE(D.others, 0) others,
COALESCE(D.total, 0) total, COALESCE(E.target, 0) target
From Calendar C
Cross Join (Select Distinct id From Table1) B
Left Join table1 D
On D.id = B.id And D.date_ = C.CAL_DT
Left Join table2 E
On E.id = B.id And E.date_ = C.CAL_DT
Order By B.id, C.CAL_DT
First, you have to pair each id from table1 with every day in your calendar table; that is done by cross joining the calendar with the distinct ids from table1.
Then left join that result to table1 and table2, so that ids with no entry for a specific date come back with null values.
The COALESCE function is used to replace null values with 0.
See a demo using DB2 from db<>fiddle.
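If you want to try the idea without a DB2 connection, here is a minimal sketch that runs the same cross join plus left joins against an in-memory SQLite database through Python's sqlite3 module. The table names calendar, table1 and table2 and the column date_ follow the answer above; the ISO date strings and the small subset of the sample rows are my own assumptions, and SQLite syntax may differ slightly from DB2.

```python
import sqlite3

# In-memory SQLite stand-in for the DB2 tables (illustrative only).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE calendar (cal_dt TEXT);
CREATE TABLE table1 (id TEXT, date_ TEXT, express INT, others INT, total INT);
CREATE TABLE table2 (id TEXT, date_ TEXT, target INT);
INSERT INTO calendar VALUES ('2022-04-01'),('2022-04-02'),('2022-04-03'),
                            ('2022-04-04'),('2022-04-05');
INSERT INTO table1 VALUES
  ('001','2022-04-01',0,2,2),('001','2022-04-02',2,3,5),
  ('002','2022-04-01',0,3,3),('002','2022-04-04',3,3,6);
INSERT INTO table2 VALUES
  ('001','2022-04-01',4),('001','2022-04-02',4),
  ('002','2022-04-01',6),('002','2022-04-02',6),('002','2022-04-04',6);
""")

fill_sql = """
SELECT b.id, c.cal_dt,
       COALESCE(d.express, 0), COALESCE(d.others, 0),
       COALESCE(d.total, 0),   COALESCE(e.target, 0)
FROM calendar c
CROSS JOIN (SELECT DISTINCT id FROM table1) b      -- every id on every day
LEFT JOIN table1 d ON d.id = b.id AND d.date_ = c.cal_dt
LEFT JOIN table2 e ON e.id = b.id AND e.date_ = c.cal_dt
ORDER BY b.id, c.cal_dt
"""
filled = con.execute(fill_sql).fetchall()

for row in filled:
    print(row)
```

Each id gets one row per calendar day, and days with no data in table1 or table2 show 0 instead of NULL.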

Related

SQL Server : multiple rows single line

I would like to get the representation of one record based on the primary key value from multiple tables. As shown below, each table can have multiple values based on this primary key value.
TABLE-1
ID  NAME
--------
1   AA
2   BB
3   CC
4   DD
5   EE
TABLE-2
ID  SCHOOL  AUT
---------------
1   11      A
2   11      A
2   12      B
3   11      A
4   12      A
4   13      B
5   13      A
TABLE-3
ID  TC
-------
1   101
2   102
2   103
2   104
3   105
4   106
4   107
5   108
The result below is the value obtained with an OUTER JOIN.
SELECT
T1.ID, T2.SCHOOL, T3.TC, T2.AUT
FROM
T1
LEFT OUTER JOIN
T2 ON T1.ID = T2.ID
LEFT OUTER JOIN
T3 ON T1.ID = T3.ID
ORDER BY
T1.ID ASC
ID  SCHOOL  TC   AUT
--------------------
1   11      101  A
2   11      102  A
2   12      102  B
2   11      103  A
2   12      103  B
2   11      104  A
2   12      104  B
3   11      105  A
4   12      106  A
4   13      106  B
4   12      107  A
4   13      107  B
5   13      108  A
How can I get the result like below?
ID  SCHOOL  TC1  TC2  TC3
-------------------------
1   11      101
2   11      102  103  104
3   11      105
4   12      106  107
5   13      108
The important point is that the result should show SCHOOL only for the rows where AUT is 'A'.
I would appreciate it if you could show me the query.
From your desired results, it looks like you just need to use row_number in combination with conditional aggregation. Your sample data seems a little inadequate; I can't see any requirement for table1 at all.
Try the following:
with t as (
select t2.id,t2.school,t3.tc, Row_Number() over(partition by t2.id order by t3.tc) col
from t2 join t3 on t2.id=t3.id
where aut='A'
)
select id,school,
max(case when col=1 then tc end) TC1,
max(case when col=2 then tc end) TC2,
max(case when col=3 then tc end) TC3
from t
group by id, school
Example SQL Fiddle
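For anyone who wants to check the row_number plus conditional aggregate approach locally, here is a rough sketch that runs it on the sample data in an in-memory SQLite database via Python's sqlite3. It assumes SQLite 3.25 or newer (needed for window functions); the ORDER BY at the end and the variable names are mine, not part of the original answer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t2 (id INT, school INT, aut TEXT);
CREATE TABLE t3 (id INT, tc INT);
INSERT INTO t2 VALUES (1,11,'A'),(2,11,'A'),(2,12,'B'),(3,11,'A'),
                      (4,12,'A'),(4,13,'B'),(5,13,'A');
INSERT INTO t3 VALUES (1,101),(2,102),(2,103),(2,104),(3,105),
                      (4,106),(4,107),(5,108);
""")

pivot_sql = """
WITH t AS (
  SELECT t2.id, t2.school, t3.tc,
         -- number the TC values 1, 2, 3... within each id
         ROW_NUMBER() OVER (PARTITION BY t2.id ORDER BY t3.tc) AS col
  FROM t2 JOIN t3 ON t2.id = t3.id
  WHERE t2.aut = 'A'
)
SELECT id, school,
       MAX(CASE WHEN col = 1 THEN tc END) AS tc1,
       MAX(CASE WHEN col = 2 THEN tc END) AS tc2,
       MAX(CASE WHEN col = 3 THEN tc END) AS tc3
FROM t
GROUP BY id, school
ORDER BY id
"""
pivot_rows = con.execute(pivot_sql).fetchall()

for row in pivot_rows:
    print(row)
```

Columns with no value for an id come back as None (NULL), which corresponds to the blank cells in the desired result.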
SELECT
T1.ID, T2.SCHOOL,
GROUP_CONCAT(T3.TC),
GROUP_CONCAT(T2.AUT)
FROM
T1
LEFT OUTER JOIN
T2 ON T1.ID = T2.ID
LEFT OUTER JOIN
T3 ON T1.ID = T3.ID
WHERE
T2.AUT = 'A'
GROUP BY
T1.ID, T2.SCHOOL
ORDER BY
T1.ID ASC
Note that GROUP_CONCAT concatenates the values of each group into a single string.
EDIT: oh my, I hadn't seen that it's a SQL Server question!
Just replace GROUP_CONCAT with STRING_AGG if you're using SQL Server 2017 or newer. STRING_AGG also needs a separator argument, e.g. STRING_AGG(T3.TC, ',').

Get all records from tbl1 and matching (or the next close) records from tbl2, match is based on Client# and Date from tbl1 to tbl2

I'm trying to match each record in tbl1 to the same or next greater record in tbl2.
The criteria are:
Extract all records from tbl1 and the close or equal match from tbl2
Find matching records in tbl2 (match is based on ClientNo and/or date)
The tbl2 match should be based on a date greater than or equal to the tbl1 date.
Results should not have any duplicates from tbl1 or tbl2
The first match should be the first date2 in tbl2 that is greater than or equal to date1 in tbl1
If there is more than one record on the same date, then it should pick the next greater or equal date based on RefNo in tbl2.
tbl1 contains
RecNo ClientNo Date1
-----------------------------
4 1001 2/6/2017
3 1001 2/4/2018
1 1001 2/5/2018
2 1001 2/5/2018
5 1002 3/8/2018
9 1002 3/9/2018
10 1002 4/11/2019
tbl2 contains
RecNo ClientNo Date2 RefNo
-----------------------------------
1 1001 2/5/2017 1
4 1001 2/5/2018 2
2 1001 2/5/2018 4
3 1001 2/6/2018 5
5 1002 3/9/2018 1
6 1002 4/10/2019 2
Query result
RecNoTbl1 ClientNo Date1 RecNoTbl2 Date2 RefNo
---------------------------------------------------------------
4 1001 2/6/2017 4 2/5/2018 2
3 1001 2/4/2018 2 2/5/2018 4
1 1001 2/5/2018 3 2/6/2018 5
2 1001 2/5/2018 NULL NULL NULL
5 1002 3/8/2018 5 3/9/2018 1
9 1002 3/9/2018 6 4/10/2019 2
10 1002 4/11/2019 NULL NULL NULL
I tried with ROW OVER PARTITION but that didn't work.
You could use a left join query like this:
Select tb1.*, tb2.* from tbl1 tb1
Left join tbl2 tb2 ON tb1.ClientNo = tb2.ClientNo AND tb2.Date2 >= tb1.Date1
Note that this returns every tbl2 row on or after each tbl1 date, so it still has to be narrowed down to one match per row.
For unique values you can use DISTINCT, for example: select distinct tb1.ClientNo
And sorry if am not really detailed about things. I am posting from phone.
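To make the matching rules from the question concrete, here is a small procedural sketch in Python rather than SQL. It reads the criteria as a greedy pass: tbl1 rows are processed per client in date order, and each one takes the earliest tbl2 row (Date2 >= Date1, ties broken by RefNo) that has not been used yet. The function name, the M/D/YYYY reading of the dates and the RecNo tie-break for tbl1 rows on the same date are my assumptions.

```python
from datetime import date

# Sample rows from the question: (RecNo, ClientNo, Date1)
tbl1 = [(4, 1001, date(2017, 2, 6)), (3, 1001, date(2018, 2, 4)),
        (1, 1001, date(2018, 2, 5)), (2, 1001, date(2018, 2, 5)),
        (5, 1002, date(2018, 3, 8)), (9, 1002, date(2018, 3, 9)),
        (10, 1002, date(2019, 4, 11))]
# (RecNo, ClientNo, Date2, RefNo)
tbl2 = [(1, 1001, date(2017, 2, 5), 1), (4, 1001, date(2018, 2, 5), 2),
        (2, 1001, date(2018, 2, 5), 4), (3, 1001, date(2018, 2, 6), 5),
        (5, 1002, date(2018, 3, 9), 1), (6, 1002, date(2019, 4, 10), 2)]


def match_next_greater(left, right):
    """Greedy match: each left row takes the earliest unused right row
    for the same client with Date2 >= Date1, ties broken by RefNo."""
    used = set()      # RecNo values of tbl2 rows already matched
    result = []
    # Process tbl1 by client, then date, then RecNo (tie-break is assumed).
    for rec1, client, d1 in sorted(left, key=lambda r: (r[1], r[2], r[0])):
        candidates = sorted(
            (r for r in right
             if r[1] == client and r[2] >= d1 and r[0] not in used),
            key=lambda r: (r[2], r[3]))
        if candidates:
            rec2, _, d2, ref = candidates[0]
            used.add(rec2)
            result.append((rec1, client, d1, rec2, d2, ref))
        else:
            # No tbl2 row left for this tbl1 row: keep it with NULLs.
            result.append((rec1, client, d1, None, None, None))
    return result


matches = match_next_greater(tbl1, tbl2)
for m in matches:
    print(m)
```

On the sample data this gives the same tbl1/tbl2 pairing as the expected query result. Because each match depends on which tbl2 rows were already taken, a plain join cannot express it directly; in SQL it would probably need a cursor or a recursive query.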

SQL Server: Find rows in Table1 not in Table2 but need data from both tables

I need to find missing rows, however, I need data from BOTH tables to be returned. I checked google but did not find a similar question.
Table1
thetime real-time
1 1 pm
2 5 pm
3 7 pm
4 9 pm
5 11 pm
Table2
thedate transaction_num thetime
1/1/2000 111 1
1/1/2000 111 4
1/1/2000 111 5
2/1/2000 111 2
2/1/2000 111 4
2/1/2000 222 1
2/1/2000 222 5
I need to select the date and transaction_num from Table2 for every time in Table1 that is missing from Table2, so the result of the select statement should list the date and transaction number for each missing time:
thedate transaction_num thetime
1/1/2000 111 2
1/1/2000 111 3
2/1/2000 111 1
2/1/2000 111 3
2/1/2000 111 5
2/1/2000 222 2
2/1/2000 222 3
2/1/2000 222 4
This is the code I have but it is giving me a multi-part binding error:
select t2.thedate, t2.transaction_num, t1.thetime
from table2 t2
where not exists(select t1.thetime
from table1 t1
where t2.thetime = t1.thetime)
Does anyone know how to solve this or can point me to an answer?
Most questions in stack overflow for missing rows involve returning data from one table but I need it for 2 tables.
Thank you
It seems all the transaction_nums on all dates should have all the times associated with them. Else it would be treated as missing.
To do this, you can initially cross join the distinct date and transaction_num from table2 and thetime from table1. Then left join on this derived table to get the missing rows.
select tt.thedate, tt.transaction_num,tt.thetime
from (
select * from (
(select distinct thedate,transaction_num from table2) a cross join
(select distinct thetime from table1) b
)
) tt
left join table2 t2 on t2.transaction_num=tt.transaction_num and t2.thetime=tt.thetime and tt.thedate=t2.thedate
where t2.transaction_num is null and t2.thedate is null and t2.thetime is null
Sample Demo
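Here is a quick way to check the cross join plus left join approach against the sample data, sketched with an in-memory SQLite database through Python's sqlite3. The column name real_time, the ISO date strings (reading the dates as M/D/YYYY) and the ORDER BY are my assumptions; the query logic is the same as above, with a single IS NULL test since any non-matching row has all t2 columns NULL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table1 (thetime INT, real_time TEXT);
CREATE TABLE table2 (thedate TEXT, transaction_num INT, thetime INT);
INSERT INTO table1 VALUES (1,'1 pm'),(2,'5 pm'),(3,'7 pm'),(4,'9 pm'),(5,'11 pm');
INSERT INTO table2 VALUES
  ('2000-01-01',111,1),('2000-01-01',111,4),('2000-01-01',111,5),
  ('2000-02-01',111,2),('2000-02-01',111,4),
  ('2000-02-01',222,1),('2000-02-01',222,5);
""")

missing_sql = """
SELECT tt.thedate, tt.transaction_num, tt.thetime
FROM (SELECT a.thedate, a.transaction_num, b.thetime
      FROM (SELECT DISTINCT thedate, transaction_num FROM table2) a
      CROSS JOIN (SELECT DISTINCT thetime FROM table1) b) tt  -- all expected rows
LEFT JOIN table2 t2
  ON  t2.thedate = tt.thedate
  AND t2.transaction_num = tt.transaction_num
  AND t2.thetime = tt.thetime
WHERE t2.thetime IS NULL                                      -- keep only the gaps
ORDER BY tt.thedate, tt.transaction_num, tt.thetime
"""
missing = con.execute(missing_sql).fetchall()

for row in missing:
    print(row)
```

The eight rows returned correspond to the eight missing times listed in the question.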

SQL - Add missing data in left outer join query

I have following data
Components
componentid title
1 houseRent
2 medical
3 Travelling Allowance
empPayrollMaster
MasterID EmployeeID SalaryMonthID
1 101 1
2 102 1
3 103 1
empPayrollDetail
DetailID MasterID ComponentID amount
1 1 1 100
2 1 2 500
3 2 1 300
4 2 3 250
5 3 1 150
6 3 2 350
7 3 3 450
Required Output
EmployeeID MasterID ComponentID amount
101 1 1 100
101 1 2 500
101 1 3 0
102 2 1 300
102 2 2 0
102 2 3 250
103 3 1 150
103 3 2 350
103 3 3 450
To get the required output, if I do a left outer join between Components and empPayrollDetail, I get NULL in the EmployeeID, MasterID and amount columns. How can I modify the left join to get the required output?
You need to do a CROSS JOIN on Components and empPayrollMaster to generate first all combination of employees and components. Then, do a LEFT JOIN on empPayrollDetail to achieve the result, using ISNULL(amount, 0) for NULL amounts.
SQL Fiddle
SELECT
epm.EmployeeID,
epm.MasterID,
c.ComponentID,
amount = ISNULL(epd.amount, 0)
FROM empPayrollMaster epm
CROSS JOIN Components c
LEFT JOIN empPayrollDetail epd
ON epd.MasterID = epm.MasterID
AND epd.ComponentID = c.ComponentID
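As a sanity check, the same CROSS JOIN then LEFT JOIN pattern can be run on the sample data with an in-memory SQLite database via Python's sqlite3. This is only a sketch: SQLite has no ISNULL(x, y) with SQL Server's meaning, so COALESCE is used instead, and the ORDER BY and variable names are my additions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Components (ComponentID INT, title TEXT);
CREATE TABLE empPayrollMaster (MasterID INT, EmployeeID INT, SalaryMonthID INT);
CREATE TABLE empPayrollDetail (DetailID INT, MasterID INT, ComponentID INT, amount INT);
INSERT INTO Components VALUES (1,'houseRent'),(2,'medical'),(3,'Travelling Allowance');
INSERT INTO empPayrollMaster VALUES (1,101,1),(2,102,1),(3,103,1);
INSERT INTO empPayrollDetail VALUES
  (1,1,1,100),(2,1,2,500),(3,2,1,300),(4,2,3,250),
  (5,3,1,150),(6,3,2,350),(7,3,3,450);
""")

payroll_sql = """
SELECT epm.EmployeeID, epm.MasterID, c.ComponentID,
       COALESCE(epd.amount, 0) AS amount       -- 0 where no detail row exists
FROM empPayrollMaster epm
CROSS JOIN Components c                        -- every employee x every component
LEFT JOIN empPayrollDetail epd
  ON epd.MasterID = epm.MasterID AND epd.ComponentID = c.ComponentID
ORDER BY epm.EmployeeID, c.ComponentID
"""
payroll = con.execute(payroll_sql).fetchall()

for row in payroll:
    print(row)
```

Every employee gets one row per component, and the two combinations with no detail row (101 with component 3, 102 with component 2) show an amount of 0.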
Try this
select empPayrollMaster.EmployeeID,empPayrollMaster.MasterID,
Components.componentid,isnull(empPayrollDetail.amount,0)
from empPayrollMaster
left join Components
on empPayrollMaster.EmployeeID is not null
left join empPayrollDetail
on empPayrollDetail.MasterID = empPayrollMaster.MasterID
and empPayrollDetail.ComponentID = Components.componentid
Try this way
select c.EmployeeID,c.MasterID,c.ComponentID,isnull(d.amount,0) as amount from (
select * from Components a
Cross join empPayrollMaster b) c
left outer join empPayrollDetail d on d.componentid =c.componentid and d.MasterID = c.MasterID
As you want the component amount for each employee in the master table, you should use isnull(empPayrollDetail.amount,0) or, as @Turophile pointed out, the SQL standard function coalesce(empPayrollDetail.amount,0) for the amount column.
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
INNER JOIN Orders
ON Customers.CustomerID=Orders.CustomerID
ORDER BY Customers.CustomerName;

How to extract data from table2 which is closest to the DATE FIELD of table1?

I have following two table Diagnose & Exercise
I would like to extract Exercise date closest to the Diagnose_Date and it should be 1 row from exercise table.
I have tried left join with DATEDIFF function in where condition
SELECT D.ID,D.Diagnose_Date,D.Type1,D.SubType1,E.Exercise_Date,E.Field1,E.Field2,E.Field3
FROM Diagnose D
LEFT JOIN Exercise E
ON D.ID=E.ID
WHERE DATEDIFF(DAY,[Diagnose_Date],[Exercise_Date]) BETWEEN -30 AND 30
Any help would be appreciated.
Thanks in advance.
Diagnose Table
------------------------------------------
ID Diagnose_Date Type1 SubType1
------------------------------------------
1 10/01/2010 01 1.1
2 20/02/2012 02 2.2
3 30/03/2013 01 1.2
------------------------------------------
Exercise Table
------------------------------------------
ID Exercise_Date Field1 Field2 Field3
------------------------------------------
1 01/01/2010 x y z
2 10/02/2012 a b c
2 01/04/2012 e f f
3 01/03/2013 x y z
3 05/04/2013 a b c
3 01/06/2013 x y z
------------------------------------------
Expected Result should be :
------------------------------------------------------------------------
ID Diagnose_Date Exercise_Date Type1 SubType1 Field1 Field2 Field3
------------------------------------------------------------------------
1 10/01/2010 01/01/2010 01 1.1 x y z
2 20/02/2012 10/02/2012 02 2.2 a b c
3 30/03/2013 05/04/2013 01 1.2 a b c
-------------------------------------------------------------------------
First, in a CTE, for each diagnosis get the smallest time interval between the diagnose date and all the exercise dates associated with that diagnosis.
WITH MIN_DATES_CTE(ID, DATE_DIFF)
AS (
SELECT E.ID, MIN(ABS(DATEDIFF(DAY,[Diagnose_Date],[Exercise_Date])))
FROM Exercise E
INNER JOIN Diagnose D ON D.ID = E.ID
GROUP BY E.ID
)
Then, in the same statement, join Diagnose and Exercise by ID and the smallest time interval:
SELECT D.ID,D.Diagnose_Date,D.Type1,D.SubType1,E.Exercise_Date,E.Field1,E.Field2,E.Field3
FROM Diagnose D
LEFT JOIN Exercise E ON D.ID = E.ID
INNER JOIN MIN_DATES_CTE ON MIN_DATES_CTE.ID = E.ID
WHERE ABS(DATEDIFF(DAY,[Diagnose_Date],[Exercise_Date])) = MIN_DATES_CTE.DATE_DIFF
I'm assuming you're just matching ANY single diagnose entry with ANY single exercise entry based on their dates being closest to each other.
Here's my line of thinking:
Do a cross join of diagnoses and exercises (every diagnosis paired with every exercise), ordered by absolute date difference, ascending.
SELECT
D.ID,
D.Date,
E.ID,
E.Date,
ABS(DATEDIFF(day, D.Date, E.Date)) Diff
FROM Diagnosis D, Exercise E
ORDER BY Diff
You'll get a result like this:
ID Date ID Date Diff
3 2013-03-30 5 2013-03-25 5
2 2012-02-20 2 2012-02-10 10
3 2013-03-30 4 2013-03-01 29
2 2012-02-20 3 2012-04-01 41
3 2013-03-30 6 2013-06-01 63
1 2010-10-01 1 2010-01-01 273
3 2013-03-30 3 2012-04-01 363
2 2012-02-20 4 2013-03-01 375
2 2012-02-20 5 2013-03-25 399
3 2013-03-30 2 2012-02-10 414
2 2012-02-20 6 2013-06-01 467
1 2010-10-01 2 2012-02-10 497
1 2010-10-01 3 2012-04-01 548
2 2012-02-20 1 2010-01-01 780
1 2010-10-01 4 2013-03-01 882
1 2010-10-01 5 2013-03-25 906
1 2010-10-01 6 2013-06-01 974
3 2013-03-30 1 2010-01-01 1184
Now you can see which dates are closest to each other, along with the number of days between them.
Of course, you won't use this list as is, but from it you can select the first row:
SELECT TOP 1
D.ID,
D.Date,
E.ID,
E.Date,
ABS(DATEDIFF(day, D.Date, E.Date)) Diff
FROM Diagnosis D, Exercise E
ORDER BY Diff
Now you can plug this statement in a LEFT join, so you can singly select a date matching another.
Like this:
SELECT
fD.ID,
fD.Date,
fE.ID,
fE.Date
FROM
Diagnosis fD
LEFT JOIN Exercise fE
ON fE.ID = (SELECT TOP 1 E.ID
FROM Diagnosis D, Exercise E
WHERE D.ID = fD.ID
ORDER BY ABS(DATEDIFF(day, D.Date, E.Date)))
Which gives the result:
ID Date ID Date
1 2010-10-01 1 2010-01-01
2 2012-02-20 2 2012-02-10
3 2013-03-30 5 2013-03-25
You can use OUTER APPLY
SELECT d.ID,
d.Diagnose_Date,
d.Type1,
d.SubType1,
e.Exercise_Date,
e.Field1,
e.Field2,
e.Field3
FROM Diagnose d
OUTER APPLY
( SELECT TOP 1 Exercise_Date, Field1, Field2, Field3
FROM Exercise e
WHERE d.ID = e.ID
AND DATEDIFF(DAY, d.[Diagnose_Date], e.[Exercise_Date]) BETWEEN -30 AND 30
ORDER BY ABS(DATEDIFF(DAY, d.[Diagnose_Date], e.[Exercise_Date]))
) e;
Example on SQL Fiddle
I have done more testing on this and found that a method using ROW_NUMBER() is the most efficient:
WITH CTE AS
( SELECT d.ID,
d.Diagnose_Date,
d.Type1,
d.SubType1,
e.Exercise_Date,
e.Field1,
e.Field2,
e.Field3,
RowNumber = ROW_NUMBER() OVER (PARTITION BY d.ID ORDER BY ABS(DATEDIFF(DAY,[Diagnose_Date],[Exercise_Date])))
FROM Diagnose D
LEFT JOIN Exercise E
ON D.ID = E.ID
)
SELECT ID,
Diagnose_Date,
Type1,
SubType1,
EID = ID,
Exercise_Date,
Field1,
Field2,
Field3
FROM CTE
WHERE RowNumber = 1;
I have compared this with my first solution and the answer with the most upvotes for comparison. The results are as follows:
OUTER APPLY
Cost relative to batch: 34%
--------------------------------------------------
Table 'Exercise'. Scan count 3, logical reads 3
Table 'Diagnose'. Scan count 1, logical reads 1
--------------------------------------------------
Total. Scan count 4, logical reads 4
SELF JOIN WITH AGGREGATES (Highest voted so far)
Cost relative to batch: 51%
--------------------------------------------------
Table 'Worktable'. Scan count 0, logical reads 0
Table 'Exercise'. Scan count 2, logical reads 4
Table 'Diagnose'. Scan count 2, logical reads 2
--------------------------------------------------
Total. Scan count 4, logical reads 6
ROW_NUMBER()
Cost relative to batch: 15%
--------------------------------------------------
Table 'Exercise'. Scan count 1, logical reads 3
Table 'Diagnose'. Scan count 1, logical reads 1
--------------------------------------------------
Total. Scan count 2, logical reads 4
Examples on SQL Fiddle
So the ROW_NUMBER solution has the lowest IO statistics and the lowest estimated cost.
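If you want to see the ROW_NUMBER logic return the expected rows without a SQL Server instance, here is a rough equivalent on an in-memory SQLite database through Python's sqlite3. It is only a sketch: DATEDIFF is replaced by a difference of SQLite's julianday() values, the dates are rewritten as ISO strings (reading the question's dates as DD/MM/YYYY), window functions need SQLite 3.25 or newer, and it says nothing about performance on SQL Server.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Diagnose (ID INT, Diagnose_Date TEXT, Type1 TEXT, SubType1 TEXT);
CREATE TABLE Exercise (ID INT, Exercise_Date TEXT,
                       Field1 TEXT, Field2 TEXT, Field3 TEXT);
INSERT INTO Diagnose VALUES
  (1,'2010-01-10','01','1.1'),(2,'2012-02-20','02','2.2'),
  (3,'2013-03-30','01','1.2');
INSERT INTO Exercise VALUES
  (1,'2010-01-01','x','y','z'),
  (2,'2012-02-10','a','b','c'),(2,'2012-04-01','e','f','f'),
  (3,'2013-03-01','x','y','z'),(3,'2013-04-05','a','b','c'),
  (3,'2013-06-01','x','y','z');
""")

closest_sql = """
WITH ranked AS (
  SELECT d.ID, d.Diagnose_Date, e.Exercise_Date, e.Field1,
         -- rank each diagnosis's exercises by distance in days
         ROW_NUMBER() OVER (
           PARTITION BY d.ID
           ORDER BY ABS(julianday(e.Exercise_Date) - julianday(d.Diagnose_Date))
         ) AS rn
  FROM Diagnose d
  LEFT JOIN Exercise e ON e.ID = d.ID
)
SELECT ID, Diagnose_Date, Exercise_Date, Field1
FROM ranked
WHERE rn = 1
ORDER BY ID
"""
closest = con.execute(closest_sql).fetchall()

for row in closest:
    print(row)
```

Each diagnosis keeps only its nearest exercise row, which matches the expected result in the question (for example ID 3 pairs with the 05/04/2013 exercise).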
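Using only standard SQL (note that this picks the latest exercise on or before the diagnose date, which is not necessarily the closest one):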
SELECT D.ID, D.Diagnose_Date, D.Type1, D.SubType1, E.Exercise_Date, E.Field1, E.Field2, E.Field3
FROM Diagnose D
LEFT JOIN Exercise E
ON E.ID=D.ID AND
E.Exercise_Date=(SELECT MAX(Exercise_Date) FROM Exercise WHERE Exercise.ID=D.ID AND Exercise.Exercise_Date<=D.Diagnose_Date)