I need to figure out when each person will complete a task, based on a work calendar whose dates are not consecutive. I have the data in two tables. T1:
Name DaysRemaining Complete
Joe 3
Mary 2
and T2
Date Count
6/1/2018
6/8/2018
6/10/2018
6/15/2018
Now if Joe has 3 days remaining I would like to count 3 records forward from today in T2 and return the date to the Complete column. If today is 6/1/2018 I would want the Update query to return 6/10/2018 to the Complete column for Joe.
My thought was to update T2.Count daily with a query that starts numbering at today and auto-increments from there. After that I could join T1 and T2 on DaysRemaining and Count. I can do the join, but I haven't found a working way to update T2.Count with an auto-increment. Any better ideas? I am using a linked SharePoint table, so creating a new field each time is not an option.
I think this will work (MS Access doesn't support CROSS JOIN, so the tables are listed with a comma):
select t1.*, t2.[Date]
from t1, t2
where t1.daysremaining = (select count(*)
from t2 as tt2
where tt2.[Date] <= t2.[Date] and tt2.[Date] >= Date()
);
This is an expensive query and one that is easier to express and more efficient in almost any other database.
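To check the counting logic outside Access, here is a minimal Python sketch. It assumes the T2 dates are available as a list and that today counts as the first record, as in the question's example; the function name `completion_date` is made up for illustration.

```python
from datetime import date

def completion_date(calendar, today, days_remaining):
    """Return the N-th calendar date on or after `today` (today counts as 1).

    Returns None when the calendar does not hold enough upcoming dates.
    """
    upcoming = sorted(d for d in calendar if d >= today)
    if days_remaining < 1 or days_remaining > len(upcoming):
        return None
    return upcoming[days_remaining - 1]

# Sample data from the question: T2 dates, T1 names with DaysRemaining.
calendar = [date(2018, 6, 1), date(2018, 6, 8), date(2018, 6, 10), date(2018, 6, 15)]
remaining = {"Joe": 3, "Mary": 2}
today = date(2018, 6, 1)

complete = {name: completion_date(calendar, today, n) for name, n in remaining.items()}
for name, when in complete.items():
    print(name, when)
```

With the sample data this yields 6/10/2018 for Joe, which is the value the question expects in the Complete column.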
I have 2 tables with epoch values. One with multiple samples per minute such as:
id | First_name | epoch_time
1 | Paul | 1650317420
2 | Jeff | 1650317443
3 | Raul | 1650317455
And one with 1 sample per minute:
id | Home | epoch_time
1 | New York | 1650317432
What I would like to do is join on the closest timestamp between the two tables: find the closest values between tables 1 and 2, then populate a result from both. I'd like to populate the 'Home' field and keep the rest of the records from table 1 as is, such as:
id | Name | Home | epoch_time
1 | Paul | New York | 1650317420
2 | Jeff | New York | 1650317443
3 | Raul | New York | 1650317455
The problem is the actual join. The ID is not unique, which is why I need to not only join on ID but also scan for the closest epoch time between the 2 tables. I cannot use correlated subqueries because Presto doesn't support them.
Answered my own question. It was as simple as using LEAD() to attach the timestamp of the next minute sample to each row, and then using BETWEEN in the join so that each table 1 row matches the minute sample whose window (the current sample looking ahead up to 59 seconds) contains it. Such that:
WITH tbl1 AS (
    SELECT
        *
    FROM table_1
),
tbl2 AS (
    SELECT
        *,
        LEAD(epoch_time) OVER (
            PARTITION BY
                name,
                home
            ORDER BY
                epoch_time
        ) - 1 AS next_time
    FROM table_2
)
SELECT
    t1.Id,
    t1.Name,
    t2.Home,
    t1.epoch_time
FROM tbl1 t1
LEFT JOIN tbl2 t2
    ON t1.Id = t2.Id
    AND t1.epoch_time BETWEEN t2.epoch_time AND t2.next_time
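For comparison, the "closest timestamp" matching that the question describes can be sketched outside SQL. This is a different technique from the LEAD()/BETWEEN window above: it uses Python's `bisect` to look at the neighbour on both sides and pick the nearer one. It matches on time alone and ignores the id column, and the second table 2 sample (Boston) is hypothetical, added only so there is more than one candidate.

```python
from bisect import bisect_left

def closest_sample(sorted_times, t):
    """Return the value in `sorted_times` nearest to `t` (ties go to the earlier one)."""
    i = bisect_left(sorted_times, t)
    if i == 0:
        return sorted_times[0]
    if i == len(sorted_times):
        return sorted_times[-1]
    before, after = sorted_times[i - 1], sorted_times[i]
    return before if t - before <= after - t else after

# Table 1 rows from the question: (id, name, epoch_time).
table_1 = [(1, "Paul", 1650317420), (2, "Jeff", 1650317443), (3, "Raul", 1650317455)]
# Table 2 rows: (id, home, epoch_time). The second row is a hypothetical extra sample.
table_2 = [(1, "New York", 1650317432), (2, "Boston", 1650317492)]

home_by_time = {t: home for _, home, t in table_2}
sample_times = sorted(home_by_time)

# Keep every table 1 row and add the home of the nearest minute sample.
joined = [(i, name, home_by_time[closest_sample(sample_times, t)], t)
          for i, name, t in table_1]
for row in joined:
    print(row)
```

All three rows pick up New York here, because each timestamp is closer to 1650317432 than to the hypothetical later sample.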
I'm struggling with the following problem. I want to write a query in Spark that, for every row of an existing table, runs a query based on that row's column value.
Table can be simplified like this:
job_id | start_date | end_date
1 | 1-1-2000 | 2-1-2000
2 | 1-1-2000 | 3-1-2000
3 | 2-1-2000 | 4-1-2000
4 | 5-1-2000 | 7-1-2000
I want to create a query that adds another column counting how many jobs are active (already started and not yet ended) at each row's start date.
The output for this table should look as follows:
job_id | start_date | end_date | jobs_active_at_start
1 | 1-1-2000 | 2-1-2000 | 2 (active jobs id - 1,2)
2 | 1-1-2000 | 3-1-2000 | 2 (active jobs id - 1,2)
3 | 2-1-2000 | 4-1-2000 | 3 (active jobs id - 1,2,3)
4 | 5-1-2000 | 7-1-2000 | 1 (only job 4 is active)
I tried a correlated subquery:
%sql
SELECT
t1.id,
(SELECT COUNT(*) FROM table t2 WHERE t2.start_date <= t1.start_date AND t2.end_date >= t1.start_date)
FROM table t1
But Databricks returned an error:
AnalysisException: Correlated column is not allowed in predicate
I guess this method isn't very efficient either.
What is the best approach to tackle such a problem?
You can just join the table to itself on the dates.
select
t1.job_id,
t1.start_date,
t1.end_date,
count(t2.job_id) as jobs_active_at_start
from
Table1 t1
inner join Table1 t2
on t2.start_date <= t1.start_date AND t2.end_date >= t1.start_date
group by
t1.job_id,
t1.start_date,
t1.end_date;
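The self-join can be checked against the sample data with a small script. This is only a sketch using Python's built-in `sqlite3` rather than Spark; the table name `jobs` is made up, and the dates are assumed to be day-month-year and are stored as ISO strings so that string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_id INTEGER, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [
        (1, "2000-01-01", "2000-01-02"),
        (2, "2000-01-01", "2000-01-03"),
        (3, "2000-01-02", "2000-01-04"),
        (4, "2000-01-05", "2000-01-07"),
    ],
)

# Join each job (t1) to every job (t2) whose interval covers t1's start date,
# then count the matches per t1 row.
counts = conn.execute(
    """
    SELECT t1.job_id, COUNT(t2.job_id) AS jobs_active_at_start
    FROM jobs t1
    INNER JOIN jobs t2
        ON t2.start_date <= t1.start_date AND t2.end_date >= t1.start_date
    GROUP BY t1.job_id, t1.start_date, t1.end_date
    ORDER BY t1.job_id
    """
).fetchall()
print(counts)  # [(1, 2), (2, 2), (3, 3), (4, 1)]
```

Every job matches at least itself in the join, so an inner join never drops a row here.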
I want to combine duplicate records into a single row. The surviving record will be updated with info from the duplicate if available. In the example below, I want to retain ID 500 and fill in its missing data from its duplicate record, ID 501.
CURRENT DATA:
ID Group Name Identifier1 Identifier2 Birthday
500 1 Christopher Col asdf NULL NULL
501 2 Christopher Col asdf qwerty 2/18/1987
502 1 Mickey vbnx tyui 1/25/1998
503 2 Minnie ghjk erty 4/23/2003
EXPECTED RESULT:
ID Group Name Identifier1 Identifier2 Birthday
500 1 Christopher Col asdf qwerty 2/18/1987
502 1 Mickey vbnx tyui 1/25/1998
503 2 Minnie ghjk erty 4/23/2003
This will need two steps if there is only one duplicate:
1. Update the primary record
UPDATE T1
SET T1.Identifier2 = ISNULL(T1.Identifier2, T2.Identifier2),
    T1.Birthday = ISNULL(T1.Birthday, T2.Birthday)
FROM [Table] T1
INNER JOIN [Table] T2 ON T2.ID <> T1.ID AND T2.Identifier1 = T1.Identifier1
WHERE T1.ID = (SELECT MIN(T3.ID) FROM [Table] T3 WHERE T3.Identifier1 = T1.Identifier1)
I presume that 'Identifier1' is the unique way to identify the records; otherwise you need to change the join condition. Add any other fields that need filling in to the SET list.
2. Delete all secondary data
WITH Ranked AS (SELECT ROW_NUMBER() OVER (PARTITION BY Identifier1 ORDER BY ID) AS rn FROM [Table])
DELETE FROM Ranked WHERE rn > 1
This script deletes every record that is not the first one when partitioned by the identifier.
PS: This will only work with T-SQL.
PPS: These are just dummy scripts with no guarantee that they will work, but I hope they give you an idea of how to approach your goal.
Please test them yourself before implementing.
select
t1.id,
t1.[Group],
t1.Name,
t1.Identifier1,
isnull(t1.Identifier2, t2.Identifier2) Identifier2,
isnull(t1.Birthday, t2.Birthday) Birthday
from
#temp t1
left join #temp t2 on t1.Name = t2.Name and t1.[Group] < t2.[Group]
where not exists (select 1 from #temp t0 where t0.Name = t1.Name and t0.[Group] < t1.[Group])
You can also find the answer to the question you asked in the comments.
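The merge rule itself (keep the lowest ID per duplicate group and fill its NULLs from the duplicates) can be sketched in a few lines of Python. This assumes, as one of the answers does, that Identifier1 identifies duplicates; `merge_duplicates` is a made-up helper name, and NULL is represented as `None`.

```python
rows = [
    {"ID": 500, "Group": 1, "Name": "Christopher Col", "Identifier1": "asdf",
     "Identifier2": None, "Birthday": None},
    {"ID": 501, "Group": 2, "Name": "Christopher Col", "Identifier1": "asdf",
     "Identifier2": "qwerty", "Birthday": "2/18/1987"},
    {"ID": 502, "Group": 1, "Name": "Mickey", "Identifier1": "vbnx",
     "Identifier2": "tyui", "Birthday": "1/25/1998"},
    {"ID": 503, "Group": 2, "Name": "Minnie", "Identifier1": "ghjk",
     "Identifier2": "erty", "Birthday": "4/23/2003"},
]

def merge_duplicates(rows, key="Identifier1"):
    """Keep the lowest-ID row per key and fill its None fields from later duplicates."""
    survivors = {}
    for row in sorted(rows, key=lambda r: r["ID"]):
        # The first row seen for a key becomes the survivor (copied, not mutated).
        kept = survivors.setdefault(row[key], dict(row))
        for field, value in row.items():
            if kept[field] is None:
                kept[field] = value
    return list(survivors.values())

merged = merge_duplicates(rows)
for row in merged:
    print(row)
```

On the sample data this keeps IDs 500, 502 and 503, with ID 500 picking up Identifier2 and Birthday from ID 501.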
I have two tables left joined. The query is grouped by the left table's ID column. The right table has a date column called close_date. The problem is, if there are any right table records that have not been closed (thus having a close_date of 0000-00-00), then I do not want any of the left table records to be shown, and if there are NO right table records with a close_date of 0000-00-00, I would like only the right table record with the MAX close date to be returned.
So for simplicity's sake, let's say the tables look like this:
Table1
id
1
2
Table2
table1_id | close_date
1 | 0000-00-00
1 | 2010-01-01
2 | 2010-01-01
2 | 2010-01-02
I would like the query to only return this:
Table1.id | Table2.close_date
2 | 2010-01-02
I tried to come up with an answer using aliased CASES and aggregate functions, but I could not search by the result, and I was attempting not to make a 3 mile long query to solve the problem. I looked through a few of the related posts on here, but none seem to meet the criteria of this particular case.
Use:
SELECT t1.id,
       MAX(t2.close_date)
  FROM TABLE1 t1
  JOIN TABLE2 t2 ON t2.table1_id = t1.id
 WHERE NOT EXISTS(SELECT NULL
                    FROM TABLE2 t
                   WHERE t.table1_id = t1.id
                     AND t.close_date = '0000-00-00')
GROUP BY t1.id
The '0000-00-00' should be implicitly converted by MySQL to a DATETIME. If not, cast the value to DATETIME.
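The NOT EXISTS approach can be tried against the sample tables with a short script. This is a sketch using Python's built-in `sqlite3` instead of MySQL, so the dates are plain text and '0000-00-00' is just a string; it groups by `t1.id` so that MAX is computed per id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (table1_id INTEGER, close_date TEXT);
    INSERT INTO table1 VALUES (1), (2);
    INSERT INTO table2 VALUES
        (1, '0000-00-00'), (1, '2010-01-01'),
        (2, '2010-01-01'), (2, '2010-01-02');
    """
)

# Skip any id that still has an open ('0000-00-00') record,
# then take the latest close_date for the ids that remain.
rows = conn.execute(
    """
    SELECT t1.id, MAX(t2.close_date)
    FROM table1 t1
    JOIN table2 t2 ON t2.table1_id = t1.id
    WHERE NOT EXISTS (SELECT NULL
                      FROM table2 t
                      WHERE t.table1_id = t1.id
                        AND t.close_date = '0000-00-00')
    GROUP BY t1.id
    """
).fetchall()
print(rows)  # [(2, '2010-01-02')]
```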
Try:
select table1_id, close_date from table2
where close_date = (select max(close_date) from table2) or close_date = '0000-00-00'
I have a table in Oracle 10 that is defined like this:
LOCATION HOUR STATUS
--------------------------------------
10 12/10/09 5:00PM 1
10 12/10/09 6:00PM 1
10 12/10/09 7:00PM 2
10 12/10/09 8:00PM 1
10 12/10/09 9:00PM 3
10 12/10/09 10:00PM 3
10 12/10/09 11:00PM 3
This table continues for various locations and for a small number of status values. Each row covers one hour for one location. Data is collected from a particular location over the course of that hour, and processed in chunks. Sometimes the data is available, sometimes it isn't, and that information is encoded in the status. I am trying to find runs of a particular status, so that I could convert the above table into something like:
LOCATION STATUS START END
-----------------------------------------------------------
10 1 12/10/09 5:00PM 12/10/09 7:00PM
10 2 12/10/09 7:00PM 12/10/09 8:00PM
10 1 12/10/09 8:00PM 12/10/09 9:00PM
10 3 12/10/09 9:00PM 12/11/09 12:00AM
Basically, I want to condense the table into rows that define each stretch of a particular status. I have tried various tricks, like using lead/lag to figure out where the starts and ends are, but all of them have failed. The only trick that works so far is going through the values one by one programmatically, which is slow. Any ideas for doing it directly in Oracle? Thanks!
Here's an ANSI SQL solution:
select t1.location
, t1.status
, min(t1.hour) AS "start" -- first of stretch of same status
, coalesce(t2.hour, max(t1.hour) + INTERVAL '1' HOUR) AS "end"
from t_intervals t1 -- base table, this is what we are condensing
left join t_intervals t2 -- finding the first datetime after a stretch of t1
on t1.location = t2.location -- demand same location
and t1.hour < t2.hour -- demand t1 before t2
and t1.status != t2.status -- demand different status
left join t_intervals t3 -- finding rows not like t1, with hour between t1 and t2
on t1.location = t3.location
and t1.status != t3.status
and t1.hour < t3.hour
and t2.hour > t3.hour
where t3.status is null -- demand that t3 does not exist, in other words, t2 marks a status transition
group by t1.location -- condense on location, status.
, t1.status
, t2.hour -- this pins the status transition
order by t1.location
, t1.status
, min(t1.hour)
OK, I apologize for not knowing Oracle syntax, but I hope that the Sybase version below is clear enough.
(I split it into 3 queries creating 2 temp tables for readability, but you can recombine them as subqueries. I don't know how to add or subtract 1 hour in Oracle; dateadd(hh, ...) does it in Sybase.)
SELECT * INTO #START_OF_PERIODS
FROM T
WHERE NOT EXISTS (
SELECT 1 FROM T T_BEFORE
WHERE T.LOCATION = T_BEFORE.LOCATION
AND T.STATUS = T_BEFORE.STATUS
AND T.HOUR = dateadd(hh, 1, T_BEFORE.HOUR)
)
SELECT * INTO #END_OF_PERIODS
FROM T
WHERE NOT EXISTS (
SELECT 1 FROM T T_AFTER
WHERE T.LOCATION = T_AFTER.LOCATION
AND T.STATUS = T_AFTER.STATUS
AND T.HOUR = dateadd(hh, -1, T_AFTER.HOUR)
)
SELECT T1.LOCATION, T1.STATUS, T1.HOUR AS 'START', MIN(T2.HOUR) AS 'END'
FROM #START_OF_PERIODS T1, #END_OF_PERIODS T2
WHERE T1.LOCATION = T2.LOCATION
AND T1.STATUS = T2.STATUS
AND T1.HOUR <= T2.HOUR
GROUP BY T1.LOCATION, T1.STATUS, T1.HOUR
-- May need to add T2.LOCATION, T2.STATUS to GROUP BY???
Ever thought about a stored procedure? I think that would be the most readable solution.
Basic idea:
Run a select statement that gives you the rows in the right order for one location.
Iterate over the result line by line and write a new 'run' record every time the status changes and when reaching the end of the result set.
You need to test whether it is also the fastest way. Depending on the number of records, this might not be an issue at all.
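The row-by-row idea can be sketched as follows. This is Python rather than an Oracle stored procedure, purely to show the loop; it assumes the rows arrive sorted by location and hour with no missing hours, and `condense_runs` is a made-up name.

```python
from datetime import datetime, timedelta

def condense_runs(rows):
    """Collapse hourly (location, hour, status) rows into status runs.

    Assumes rows are sorted by location, then hour, with no missing hours.
    Each run ends one hour after its last row, as in the question's output.
    """
    runs = []
    current = None  # [location, status, start_hour, last_hour]
    for location, hour, status in rows:
        if current is not None and current[0] == location and current[1] == status:
            current[3] = hour  # same status at the same location: extend the run
            continue
        if current is not None:
            # Status (or location) changed: write out the finished run.
            runs.append((current[0], current[1], current[2],
                         current[3] + timedelta(hours=1)))
        current = [location, status, hour, hour]
    if current is not None:
        # End of the result set: write out the final run.
        runs.append((current[0], current[1], current[2],
                     current[3] + timedelta(hours=1)))
    return runs

# Sample data from the question: location 10, 5:00PM to 11:00PM on 12/10/09.
statuses = [1, 1, 2, 1, 3, 3, 3]
rows = [(10, datetime(2009, 12, 10, 17 + i), s) for i, s in enumerate(statuses)]

runs = condense_runs(rows)
for run in runs:
    print(run)
```

On the sample data this produces four runs, the last one ending at midnight on 12/11/09.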