Running tally in SQL

Need help tallying fork truck training completions at work. Here is an example of the tables I have, and the table I need to create:
table 1:
date        is_work_day
2023-01-25  1
2023-01-26  1
2023-01-27  1
2023-01-28  0
2023-01-29  1
2023-01-30  0
table 2:
employee_id  training_passed  test_date
001          1                2023-01-25
002          1                2023-01-26
003          0                2023-01-26
004          1                2023-01-26
005          0                2023-01-27
006          1                2023-01-29
need table:
date        cumulative_passed_training
2023-01-26  2
2023-01-27  2
2023-01-29  3
The table should count the total passed trainings, but only starting on 2023-01-26 and should only show dates that are work days. Any help would be greatly appreciated.
I think I need to JOIN the two tables, and then SUM the training_passed column, but am unsure how to get it to start at a certain date, and how to make it only show work days on the final table.

JOIN on the date column and add the passed-test filter as a JOIN condition. Also GROUP BY the date so you get one count per date:
select t1.date, count(t2.employee_id) as passed_training
from table1 t1
join table2 t2 on t1.date = t2.test_date
and t2.training_passed = 1
group by t1.date
Because this is an INNER JOIN, it would make no difference if you put the condition
t2.training_passed = 1
in a WHERE clause instead of the JOIN condition.
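The query above returns a per-date count rather than the running total shown in the question. One possible extension, sketched here with Python's built-in sqlite3 module only so that it can be run as-is, uses a LEFT JOIN (so work days without a pass still appear), filters on is_work_day and the start date, and then applies a windowed SUM. The table and column names come from the question; the SQLite engine, the ISO text dates, the CTE name daily, and window-function support (SQLite 3.25 or later) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (date TEXT, is_work_day INTEGER);
INSERT INTO table1 VALUES
  ('2023-01-25', 1), ('2023-01-26', 1), ('2023-01-27', 1),
  ('2023-01-28', 0), ('2023-01-29', 1), ('2023-01-30', 0);
CREATE TABLE table2 (employee_id TEXT, training_passed INTEGER, test_date TEXT);
INSERT INTO table2 VALUES
  ('001', 1, '2023-01-25'), ('002', 1, '2023-01-26'), ('003', 0, '2023-01-26'),
  ('004', 1, '2023-01-26'), ('005', 0, '2023-01-27'), ('006', 1, '2023-01-29');
""")

RUNNING_TALLY_SQL = """
WITH daily AS (
    -- One row per work day on or after the start date, with that day's pass count.
    -- The LEFT JOIN keeps work days on which nobody passed (count 0).
    SELECT t1.date AS date, COUNT(t2.employee_id) AS passed
    FROM table1 t1
    LEFT JOIN table2 t2
           ON t2.test_date = t1.date
          AND t2.training_passed = 1
    WHERE t1.is_work_day = 1
      AND t1.date >= '2023-01-26'
    GROUP BY t1.date
)
-- Running total of the daily counts, ordered by date.
SELECT date, SUM(passed) OVER (ORDER BY date) AS cumulative_passed_training
FROM daily
ORDER BY date
"""

running_tally = conn.execute(RUNNING_TALLY_SQL).fetchall()
for row in running_tally:
    print(row)
```

Note that with a LEFT JOIN the training_passed = 1 filter has to stay in the ON clause; moving it to WHERE would drop the work days that have no passing test.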

Related

Is there a way to backfill missing rows in SQL table with a reference table?

Let's say I have a table for 3 days (2023-01-01 to 2023-01-03) of data that looks like this
Name  date        budget
A     2023-01-01  10
A     2023-01-03  15
B     2023-01-02  12
B     2023-01-03  17
I want to calculate the average budget over 3 days, but as you can see both A and B have missing data - A misses data from 2023-01-02 and B misses data from 2023-01-01. Instead of ignoring these missing data, I want to fill in data from a reference table which might look like
date        budget
2023-01-01  12
2023-01-02  15
2023-01-03  20
So that I should end up with something like
Name  date        budget
A     2023-01-01  10
A     2023-01-03  15
A     2023-01-02  15
B     2023-01-02  12
B     2023-01-03  17
B     2023-01-01  12
So the 1/2 data for A and 1/1 data for B are taken from the reference table now. Is there a way to do this in SQL? Thanks!
I'm thinking of cross join but I'm not entirely sure how that would solve the problem
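You can use a cross join to build every name/date combination, then a left join back to the original table, falling back to the reference budget where no row exists:
select t.name, rt.date, coalesce(t1.budget, rt.budget) as budget
from (select distinct t.name from tbl t) t
cross join ref_tbl rt
left join tbl t1 on t1.name = t.name and t1.date = rt.date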
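As a rough check of the cross join plus left join approach, the sketch below loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module and then computes the per-name average the question was after. The table names tbl and ref_tbl follow the answer; the SQLite engine, the alias n, and the ISO text dates are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (name TEXT, date TEXT, budget INTEGER);
INSERT INTO tbl VALUES
  ('A', '2023-01-01', 10), ('A', '2023-01-03', 15),
  ('B', '2023-01-02', 12), ('B', '2023-01-03', 17);
CREATE TABLE ref_tbl (date TEXT, budget INTEGER);
INSERT INTO ref_tbl VALUES
  ('2023-01-01', 12), ('2023-01-02', 15), ('2023-01-03', 20);
""")

# Every (name, reference date) pair; the real budget wins, else the reference one.
FILLED_SQL = """
SELECT n.name AS name, rt.date AS date,
       COALESCE(t1.budget, rt.budget) AS budget
FROM (SELECT DISTINCT name FROM tbl) AS n
CROSS JOIN ref_tbl AS rt
LEFT JOIN tbl AS t1
       ON t1.name = n.name
      AND t1.date = rt.date
"""

filled = conn.execute(FILLED_SQL + " ORDER BY name, date").fetchall()

# Average over the backfilled rows, one value per name.
averages = conn.execute(
    "SELECT name, AVG(budget) FROM (" + FILLED_SQL + ") GROUP BY name ORDER BY name"
).fetchall()

for row in filled:
    print(row)
print(averages)
```

Each name ends up with exactly three rows, so the average is taken over the full three-day range rather than only the days that happened to be present.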

SQLite function to combine tables with min() and max() function with dates?

In SQLite, I am trying to combine both tables. Specifically, I am trying to find a way to combine lab result dates with 0-7 days of follow-up for diagnosis dates (minimum 0 day, like the same day, to maximum 7 days). I have attached the tables here (note: not real ID, ENCID, lab result date, and diag_date numbers). Is there a possible way to combine both tables without the first row (of Table 1) attached to DIAG_DATE of 11/19/2020 in SQLite? If not, what about in Python?
Table 1
ID ENCID LAB RESULT DATE
1 098 10/29/2020
1 098 11/17/2020
1 098 11/15/2020
1 098 11/12/2020
1 098 11/19/2020
Table 2
ID ENCID DIAG_DATE
1 098 11/19/2020
1 098 10/01/2021
My goal:
Table 3
ID ENCID LAB_RESULT_DATE DIAG_DATE
1 098 11/12/2020 11/19/2020
1 098 11/15/2020 11/19/2020
1 098 11/17/2020 11/19/2020
1 098 11/19/2020 11/19/2020
Here is my SQLite code below (I am aware this is not right):
CREATE TABLE table3 AS
SELECT *
FROM table1
JOIN table2
WHERE table1.ID=table2.ID AND table1.ENCID=table2.ENCID AND DIAG_DATE >= LAB_RESULT_DATE
HAVING MAX(DIAG_DATE)>MIN(LAB_RESULT_DATE)
ORDER BY table1.ID ASC
You can join both tables on their ENCID and on the dates. DATE() only understands ISO-formatted values (YYYY-MM-DD), as in the output below, so this assumes the dates are stored that way.
You need to check whether the time frame in the second ON condition is wide enough to capture all dates and times; otherwise adjust the lower bound by adding a modifier such as '-10 seconds'.
SELECT t1.*, t2."DIAG_DATE"
FROM tab1 t1
JOIN tab2 t2
ON t1."ENCID" = t2."ENCID"
AND t1."LAB RESULT DATE" BETWEEN DATE(t2."DIAG_DATE", '-7 day') AND t2."DIAG_DATE"
ID ENCID LAB RESULT DATE DIAG_DATE
1 98 2020-11-17 01:00:00 2020-11-19 01:00:00
1 98 2020-11-15 01:00:00 2020-11-19 01:00:00
1 98 2020-11-12 01:00:00 2020-11-19 01:00:00
1 98 2020-11-19 01:00:00 2020-11-19 01:00:00
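If the dates really are stored as MM/DD/YYYY text, as they appear in the question, they would need to be rearranged into ISO form before SQLite's date functions can compare them. The sketch below is one way to do that from Python's sqlite3 module; the helper iso_expr, the underscore column name LAB_RESULT_DATE, and the extra join on ID (taken from the question's own attempt) are assumptions for illustration, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (ID INTEGER, ENCID TEXT, LAB_RESULT_DATE TEXT);
INSERT INTO table1 VALUES
  (1, '098', '10/29/2020'), (1, '098', '11/17/2020'), (1, '098', '11/15/2020'),
  (1, '098', '11/12/2020'), (1, '098', '11/19/2020');
CREATE TABLE table2 (ID INTEGER, ENCID TEXT, DIAG_DATE TEXT);
INSERT INTO table2 VALUES
  (1, '098', '11/19/2020'), (1, '098', '10/01/2021');
""")


def iso_expr(column: str) -> str:
    """Hypothetical helper: SQL expression turning MM/DD/YYYY text into YYYY-MM-DD."""
    return (
        f"substr({column}, 7, 4) || '-' || "
        f"substr({column}, 1, 2) || '-' || substr({column}, 4, 2)"
    )


lab = iso_expr("t1.LAB_RESULT_DATE")
diag = iso_expr("t2.DIAG_DATE")

# Keep lab results dated from 7 days before the diagnosis up to the diagnosis day.
FOLLOW_UP_SQL = f"""
SELECT t1.ID, t1.ENCID, t1.LAB_RESULT_DATE, t2.DIAG_DATE
FROM table1 t1
JOIN table2 t2
  ON t1.ID = t2.ID
 AND t1.ENCID = t2.ENCID
 AND {lab} BETWEEN date({diag}, '-7 days') AND {diag}
ORDER BY {lab}
"""

table3 = conn.execute(FOLLOW_UP_SQL).fetchall()
for row in table3:
    print(row)
```

The 10/29/2020 lab result falls outside the seven-day window and the 10/01/2021 diagnosis has no lab results near it, so neither appears in the result.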

Join tables on dates, with dirty date field

In AWS Athena, I am trying to join two tables in the db using the date, but one of the tables (table2) is not clean, and contains values that are not dates, as shown below.
| table2.date |
| ---- |
|6/02/2021|
|9/02/2021|
|1431 BEL & 1628 BEL."|
|15/02/2021|
|and failed to ....|
|18/02/2021|
|19/02/2021|
I am not able to have any influence in cleaning this table up.
My current query is:
SELECT *
FROM table1
LEFT JOIN table2
ON table1.operation_date = cast(date_parse(table2."date",'%d/%m/%Y') as date)
LIMIT 10;
I've tried using regexp_like(col, '[a-z]'), but this still leaves the values that are numerical, but not dates.
How do I get the query to ignore the values that are not dates?
You can wrap the conversion expression in the try function, which resolves to NULL when the conversion fails.
select
try(date_parse(col, '%d/%m/%Y'))
from(values
('6/02/2021'),
('9/02/2021'),
('1431 BEL & 1628 BEL.'),
('15/02/2021'),
('and failed to ....'),
('18/02/2021'),
('19/02/2021')
) as t(col)
#  _col0
1  2021-02-06 00:00:00.000
2  2021-02-09 00:00:00.000
3
4  2021-02-15 00:00:00.000
5
6  2021-02-18 00:00:00.000
7  2021-02-19 00:00:00.000
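Applied to the question's join, the same idea would presumably mean wrapping the date_parse call inside the ON clause with try(...), so that rows with unparseable text compare as NULL and simply never match. Athena itself cannot be run locally, so the sketch below only mirrors that "NULL on failure" behaviour in plain Python; the function name try_parse_date is hypothetical and is not an Athena or Presto API.

```python
from datetime import date, datetime
from typing import Optional


def try_parse_date(text: str, fmt: str = "%d/%m/%Y") -> Optional[date]:
    """Return the parsed date, or None when the text does not match fmt."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        # Same role as try(...) resolving to NULL on a failed conversion.
        return None


raw_values = [
    "6/02/2021",
    "9/02/2021",
    "1431 BEL & 1628 BEL.",
    "15/02/2021",
    "and failed to ....",
    "18/02/2021",
    "19/02/2021",
]

parsed = [try_parse_date(value) for value in raw_values]

# Only the successfully parsed dates can take part in an equality join.
joinable = [d for d in parsed if d is not None]

for raw, result in zip(raw_values, parsed):
    print(repr(raw), "->", result)
```

The two free-text values come back as None, matching the empty rows 3 and 5 in the output above.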

Many to many join with filter

I have two tables like so -
Table 1 -
patient admit_dt discharge_dt
323 2020-01-09 2020-02-01
323 2020-02-18 2020-02-27
231 2020-02-13 2020-02-17
Table 2 -
patient admit_dt discharge_dt
323 2020-02-05 2020-02-07
231 2020-02-23 2020-02-28
The output I am needing is
patient
323
The logic is - if one patient goes from table 1 into table 2 and ends up back in table 1 within 30 days, we want to count them in the output.
Patient 231 is not included in the result because they didn't go back to table 1.
If I understand correctly, you can join table 1 to table 2 and then back to table 1:
select t1.patient
from table1 t1
join table2 t2
on t2.patient = t1.patient
and t2.admit_dt > t1.discharge_dt
join table1 tt1
on tt1.patient = t1.patient
and tt1.admit_dt > t2.discharge_dt;
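The query above does not yet encode the 30-day limit from the question. A minimal sketch of one way to add it follows, run against the sample rows in SQLite through Python's sqlite3 module. It assumes the 30 days are measured from the table 2 discharge to the return admission in table 1, which the question does not state explicitly; the alias back, the DISTINCT, and the julianday arithmetic are likewise choices made for this illustration and are SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (patient INTEGER, admit_dt TEXT, discharge_dt TEXT);
INSERT INTO table1 VALUES
  (323, '2020-01-09', '2020-02-01'),
  (323, '2020-02-18', '2020-02-27'),
  (231, '2020-02-13', '2020-02-17');
CREATE TABLE table2 (patient INTEGER, admit_dt TEXT, discharge_dt TEXT);
INSERT INTO table2 VALUES
  (323, '2020-02-05', '2020-02-07'),
  (231, '2020-02-23', '2020-02-28');
""")

RETURNED_SQL = """
SELECT DISTINCT t1.patient
FROM table1 t1
JOIN table2 t2
  ON t2.patient = t1.patient
 AND t2.admit_dt > t1.discharge_dt      -- moved from table 1 into table 2
JOIN table1 back
  ON back.patient = t1.patient
 AND back.admit_dt > t2.discharge_dt    -- later returned to table 1
 -- Assumed reading of "within 30 days": table 2 discharge to table 1 return.
 AND julianday(back.admit_dt) - julianday(t2.discharge_dt) <= 30
"""

returned = conn.execute(RETURNED_SQL).fetchall()
print(returned)
```

DISTINCT keeps a patient from being listed more than once when several stays satisfy the conditions.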

How do i join the last record from one table where the date is older than other table?

This is my first post here, and the first problem I haven't been able to find a solution to on my own. I have a MainTable that contains the fields: Date, MinutesActiveWork (and other fields that are not relevant). I have a second table that contains the fields: ID, id_Workarea, GoalOfActiveMinutes, GoalActiveFrom.
I want to make a query that returns all records from MainTable, along with the goal that was active on each record's date.
Example:
Maintable (Date = dd/mm/yyyy)
ID Date ActvWrkMin WrkAreaID
1 01-01-2019 45 1
2 02-01-2019 50 1
3 03-01-2019 48 1
GoalTable:
ID id_Workarea Goal GlActvFrm
1 1 45 01-01-2019
2 2 90 01-01-2019
3 1 50 03-01-2019
What I want from my query:
IDMain Date ActvWrkMin Goal WrkAreaID
1 01-01-2019 45 45 1
2 02-01-2019 50 45 1
3 03-01-2019 48 50 1
The query that I have now is really close to what I want. The problem is that it outputs every goal whose start date is earlier than the date from MainTable (it makes sense why, but I don't know what criteria to add to fix it). Like so:
IDMain Date ActvWrkMin Goal WrkAreaID
1 01-01-2019 45 45 1
2 02-01-2019 50 45 1
3 03-01-2019 48 45 1 <-- Don't want this one
3 03-01-2019 48 50 1
My query
SELECT tblMain.Date, tblMain.ActiveWorkMins, tblGoal.Goal
FROM VtblSumpMain AS tblMain LEFT JOIN (
SELECT VtblGoalsForWorkareas.idWorkArea, VtblGoalsForWorkareas.Goal, VtblGoalsForWorkareas.GoalActiveFrom (THIS IS THE DATE FIELD)
FROM VtblGoalsForWorkareas
WHERE VtblGoalsForWorkareas.idWorkArea= 1) AS tblGoal ON tblMain.Date > tblGoal.GoalActiveFrom
ORDER BY tblMain.Date
(I know I could do this pretty simply with DLookup, but that is just not fast enough.)
Thanks for any advice!
For this, I think you have to use a correlated subquery that picks the most recent goal on or before each row's date, as shown below.
select tblMain.id, tblMain.[Date], tblMain.ActvWrkMin, tblMain.WrkAreaID,
(select top 1 gtbl.Goal
from GoalTable as gtbl
where gtbl.id_workarea = 1
and tblMain.[Date] >= gtbl.glActvFrm
order by gtbl.glActvFrm desc) as Goal
from Maintable as tblMain
I hope this will solve your issue.
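The same "latest goal on or before the date" lookup can be tried out on the sample rows with the sketch below, which uses SQLite through Python's sqlite3 module. SQLite has no TOP, so ORDER BY ... DESC LIMIT 1 stands in for it here. The dates are stored as ISO text rather than Access date values, and the subquery matches on each row's WrkAreaID instead of the fixed work area 1; both are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Maintable (ID INTEGER, Date TEXT, ActvWrkMin INTEGER, WrkAreaID INTEGER);
INSERT INTO Maintable VALUES
  (1, '2019-01-01', 45, 1),
  (2, '2019-01-02', 50, 1),
  (3, '2019-01-03', 48, 1);
CREATE TABLE GoalTable (ID INTEGER, id_Workarea INTEGER, Goal INTEGER, GlActvFrm TEXT);
INSERT INTO GoalTable VALUES
  (1, 1, 45, '2019-01-01'),
  (2, 2, 90, '2019-01-01'),
  (3, 1, 50, '2019-01-03');
""")

ACTIVE_GOAL_SQL = """
SELECT m.ID, m.Date, m.ActvWrkMin,
       -- Most recent goal for this row's work area that started on or before m.Date.
       (SELECT g.Goal
        FROM GoalTable g
        WHERE g.id_Workarea = m.WrkAreaID
          AND g.GlActvFrm <= m.Date
        ORDER BY g.GlActvFrm DESC
        LIMIT 1) AS Goal,
       m.WrkAreaID
FROM Maintable m
ORDER BY m.Date
"""

rows = conn.execute(ACTIVE_GOAL_SQL).fetchall()
for row in rows:
    print(row)
```

Because the subquery returns at most one goal per row, the duplicate row for 03-01-2019 from the original LEFT JOIN no longer appears.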