I'm struggling with the following problem. I want to write a query in Spark that, for every row of an existing table, runs a lookup based on that row's column values.
The table can be simplified like this:
job_id | start_date | end_date
-------+------------+---------
1      | 1-1-2000   | 2-1-2000
2      | 1-1-2000   | 3-1-2000
3      | 2-1-2000   | 4-1-2000
4      | 5-1-2000   | 7-1-2000
I want a query that adds another column counting how many jobs are active (already started and not yet ended) at each row's start date.
The output for this table should look as follows:
job_id | start_date | end_date | jobs_active_at_start
-------+------------+----------+------------------------------
1      | 1-1-2000   | 2-1-2000 | 2 (active job ids: 1, 2)
2      | 1-1-2000   | 3-1-2000 | 2 (active job ids: 1, 2)
3      | 2-1-2000   | 4-1-2000 | 3 (active job ids: 1, 2, 3)
4      | 5-1-2000   | 7-1-2000 | 1 (only job 4 is active)
I tried a correlated subquery:
%sql
SELECT
t1.id,
(SELECT COUNT(*) FROM table t2 WHERE t2.start_date <= t1.start_date AND t2.end_date >= t1.start_date)
FROM table t1
But Databricks returned an error:
AnalysisException: Correlated column is not allowed in predicate
I guess this method isn't very efficient either.
What is the best approach to tackle such a problem?
You can just join the table to itself on the dates.
select
t1.job_id,
t1.start_date,
t1.end_date,
count(t2.job_id) as jobs_active_at_start
from
Table1 t1
inner join Table1 t2
on t2.start_date <= t1.start_date AND t2.end_date >= t1.start_date
group by
t1.job_id,
t1.start_date,
t1.end_date;
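The self-join above can be checked against the sample data from the question. The following is only a minimal sketch: it assumes an in-memory SQLite database (via Python's standard sqlite3 module) as a stand-in for Spark, and it rewrites the dates as ISO strings so that string comparison matches date order.

```python
import sqlite3

# Assumption: SQLite stands in for Spark SQL; the join logic is the same.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (job_id INTEGER, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        (1, "2000-01-01", "2000-01-02"),
        (2, "2000-01-01", "2000-01-03"),
        (3, "2000-01-02", "2000-01-04"),
        (4, "2000-01-05", "2000-01-07"),
    ],
)

# For each job t1, match every job t2 whose interval covers t1.start_date.
rows = conn.execute(
    """
    SELECT t1.job_id, t1.start_date, t1.end_date,
           COUNT(t2.job_id) AS jobs_active_at_start
    FROM Table1 t1
    INNER JOIN Table1 t2
      ON t2.start_date <= t1.start_date AND t2.end_date >= t1.start_date
    GROUP BY t1.job_id, t1.start_date, t1.end_date
    ORDER BY t1.job_id
    """
).fetchall()

active_counts = {job_id: n for job_id, _, _, n in rows}
print(active_counts)
```

Every job matches at least itself, so the inner join never drops a row here.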
I have 2 tables with epoch values. One with multiple samples per minute such as:
id | First_name | epoch_time
---+------------+-----------
1  | Paul       | 1650317420
2  | Jeff       | 1650317443
3  | Raul       | 1650317455
And one with 1 sample per minute:
id | Home     | epoch_time
---+----------+-----------
1  | New York | 1650317432
What I would like to do is join the two tables on the closest timestamp. Ideally, I would find the closest epoch_time values between tables 1 and 2 and then combine fields from both. I'd like to populate the 'Home' field and keep the rest of the records from table 1 as is, such as:
id | Name | Home     | epoch_time
---+------+----------+-----------
1  | Paul | New York | 1650317420
2  | Jeff | New York | 1650317443
3  | Raul | New York | 1650317455
The problem is the actual join. The ID is not unique, so I need to join not only on ID but also on the closest epoch time between the two tables. I cannot use correlated subqueries, since Presto doesn't support them.
Answered my own question. It was as simple as first using LEAD() to compute, for each minute sample, the time of the next sample, and then using BETWEEN in the join so that each minute sample covers the interval up to just before the next one (looking ahead roughly 59 seconds). Such that:
WITH tbl1 AS (
SELECT
*
FROM table_1
),
tbl2 AS (
SELECT
*,
LEAD(epoch_time) OVER (
PARTITION BY
name,
home
ORDER BY
epoch_time
) - 1 AS next_time
FROM table_2
)
SELECT
t1.Id,
t1.Name,
t2.Home,
t1.epoch_time
FROM tbl1 t1
LEFT JOIN tbl2 t2
ON t1.Id = t2.Id
AND t1.epoch_time BETWEEN t2.epoch_time AND t2.next_time
I need to figure out when each person will complete a task, based on a work calendar whose dates are not consecutive. I have the data in two tables, T1:
Name | DaysRemaining | Complete
-----+---------------+---------
Joe  | 3             |
Mary | 2             |
and T2:
Date      | Count
----------+------
6/1/2018  |
6/8/2018  |
6/10/2018 |
6/15/2018 |
Now, if Joe has 3 days remaining, I would like to count 3 records forward from today in T2 and return that date to the Complete column. If today is 6/1/2018, I would want the update query to return 6/10/2018 to the Complete column for Joe.
My thought is that I could update T2.Count daily with a query that starts at today and auto-increments from there. After that, I could join T1 and T2 on DaysRemaining and Count. I can do the join, but I haven't found a working way to update T2.Count with an auto-increment. Any better ideas? I am using a linked SharePoint table, so creating a new field each time is not an option.
I think this will work:
select t1.*, t2.date
from t1, t2 -- ms access doesn't support cross join
where t1.daysremaining = (select count(*)
from t2 as tt2
where tt2.date <= t2.date and tt2.date >= Date()
);
This is an expensive query and one that is easier to express and more efficient in almost any other database.
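As a rough check of the counting idea, here is a minimal sketch that runs an equivalent query on the sample data. It assumes SQLite (through Python's sqlite3 module) rather than MS Access, a fixed "today" of 2018-06-01 passed as a parameter, ISO date strings, and a hypothetical column name WorkDate in place of Date. It counts today itself as day 1, which is what the example in the question expects.

```python
import sqlite3

# Assumption: SQLite instead of MS Access; WorkDate is an illustrative name.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE T1 (Name TEXT, DaysRemaining INTEGER);
    CREATE TABLE T2 (WorkDate TEXT);
    INSERT INTO T1 VALUES ('Joe', 3), ('Mary', 2);
    INSERT INTO T2 VALUES ('2018-06-01'), ('2018-06-08'),
                          ('2018-06-10'), ('2018-06-15');
    """
)

today = "2018-06-01"  # fixed so the result is reproducible

# For each calendar date, count the work dates from today up to that date
# (inclusive) and keep the date whose count equals DaysRemaining.
rows = conn.execute(
    """
    SELECT t1.Name, t2.WorkDate AS Complete
    FROM T1 t1, T2 t2
    WHERE t2.WorkDate >= :today
      AND t1.DaysRemaining = (SELECT COUNT(*)
                              FROM T2 tt2
                              WHERE tt2.WorkDate <= t2.WorkDate
                                AND tt2.WorkDate >= :today)
    ORDER BY t1.Name
    """,
    {"today": today},
).fetchall()

completion = dict(rows)
print(completion)
```

With this data Joe's third work date from today is 2018-06-10 and Mary's second is 2018-06-08.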
I have a database that currently looks like this
Date     | valid_entry | profile
---------+-------------+--------
1/6/2015 | 1           | 1
3/6/2015 | 2           | 1
3/6/2015 | 2           | 2
5/6/2015 | 4           | 4
I am trying to fetch the dates, but I need the query to also return dates that do not exist in the table, such as 2/6/2015.
This is a sample of what I need:
Date     | valid_entry
---------+------------
1/6/2015 | 1
2/6/2015 | 0
3/6/2015 | 2
3/6/2015 | 2
4/6/2015 | 0
5/6/2015 | 4
My query:
select date, count(valid_entry)
from database
where profile = 1
group by 1;
This query only returns the dates that exist in the table. Is there a way to make the query also return the dates that do not exist there?
You can generate a list of all dates that are between the start and end date from your source table using generate_series(). These dates can then be used in an outer join to sum the values for all dates.
with all_dates (date) as (
select dt::date
from generate_series( (select min(date) from some_table), (select max(date) from some_table), interval '1' day) as x(dt)
)
select ad.date, sum(coalesce(st.valid_entry,0))
from all_dates ad
left join some_table st on ad.date = st.date
group by ad.date, st.profile
order by ad.date;
some_table is your table with the sample data you have provided.
Based on your sample output, you also seem to want group by date and profile, otherwise there can't be two rows with 2015-06-03. You also don't seem to want where profile = 1 because that as well wouldn't generate two rows with 2015-06-03 as shown in your sample output.
SQLFiddle example: http://sqlfiddle.com/#!15/b0b2a/2
Unrelated, but: I hope that the column names are only made up. date is a horrible name for a column. For one because it is also a keyword, but more importantly it does not document what this date is for. A start date? An end date? A due date? A modification date?
You need a calendar table for this. In this case you can create an inline table with the dates required, then LEFT JOIN your table to it:
select t.d, count(db.valid_entry)
from (
SELECT DATE '2015-06-01' AS d UNION ALL SELECT DATE '2015-06-02' UNION ALL SELECT DATE '2015-06-03' UNION ALL
SELECT DATE '2015-06-04' UNION ALL SELECT DATE '2015-06-05' UNION ALL SELECT DATE '2015-06-06') AS t
left join database AS db on t.d = db."date" and db.profile = 1
group by t.d;
Note: Predicate profile = 1 should be applied in the ON clause of the LEFT JOIN operation. If it is placed in the WHERE clause instead then LEFT JOIN essentially becomes an INNER JOIN.
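The calendar idea from both answers can be sketched end to end on the sample data. This is only an illustration under stated assumptions: SQLite (via Python's sqlite3 module) has no generate_series() by default, so a recursive CTE builds the date list instead; the table and column names entries and entry_date are made up; dates are ISO strings; and the profile = 1 filter sits in the ON clause as the note above recommends.

```python
import sqlite3

# Assumption: SQLite stand-in; entries / entry_date are illustrative names.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entries (entry_date TEXT, valid_entry INTEGER, profile INTEGER);
    INSERT INTO entries VALUES
        ('2015-06-01', 1, 1),
        ('2015-06-03', 2, 1),
        ('2015-06-03', 2, 2),
        ('2015-06-05', 4, 4);
    """
)

# Bounds of the calendar: first and last date present in the table.
first_day, last_day = conn.execute(
    "SELECT MIN(entry_date), MAX(entry_date) FROM entries"
).fetchone()

rows = conn.execute(
    """
    WITH RECURSIVE calendar(d) AS (
        SELECT :first_day
        UNION ALL
        SELECT date(d, '+1 day') FROM calendar WHERE d < :last_day
    )
    SELECT c.d, COALESCE(SUM(e.valid_entry), 0) AS total
    FROM calendar c
    LEFT JOIN entries e
      ON e.entry_date = c.d AND e.profile = 1   -- filter in ON, not WHERE
    GROUP BY c.d
    ORDER BY c.d
    """,
    {"first_day": first_day, "last_day": last_day},
).fetchall()

for day, total in rows:
    print(day, total)
```

Because the profile filter is part of the join condition, dates with no matching profile 1 rows (including 2015-06-05, which only has a profile 4 row) still appear with a total of 0.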
I have three tables:
datess, deletess, sample
I wrote a query like this:
DELETE FROM sample
WHERE sample_date_key IN
(SELECT date_key FROM datess WHERE s_date BETWEEN '2015-02-18' and DATE'2015-02-25');
But I have 2 rows and two columns in deletess:
start_date | end_date
------------+----------
2015-02-18 | 2015-02-18
2015-01-18 | 2015-01-18
I want to delete all the rows in sample with dates between start_date and end_date in deletess.
I tried the code below but got an error:
ERROR: more than one row returned by a subquery used as an expression
(SELECT date_key FROM datess WHERE s_date BETWEEN (SELECT start_date FROM deletess) AND (SELECT end_date FROM deletess));
I appreciate any help. Thanks!
You can do it using WHERE EXISTS to check whether any row matches the condition:
DELETE FROM sample
WHERE sample_date_key IN
(
SELECT date_key
FROM datess
WHERE EXISTS (
SELECT 1
FROM deletess
WHERE s_date BETWEEN start_date
AND end_date
)
);
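To see the EXISTS form in action, here is a minimal sketch that runs it against a few made-up rows. Assumptions: SQLite via Python's sqlite3 module, ISO date strings, and invented contents for datess and sample (the question only shows the contents of deletess). Table aliases d and x are added purely for readability.

```python
import sqlite3

# Assumption: SQLite stand-in; datess and sample rows are invented test data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE datess (date_key INTEGER, s_date TEXT);
    CREATE TABLE deletess (start_date TEXT, end_date TEXT);
    CREATE TABLE sample (sample_date_key INTEGER);
    INSERT INTO datess VALUES (1, '2015-01-18'), (2, '2015-02-01'), (3, '2015-02-18');
    INSERT INTO deletess VALUES ('2015-02-18', '2015-02-18'), ('2015-01-18', '2015-01-18');
    INSERT INTO sample VALUES (1), (2), (3);
    """
)

# Delete every sample row whose date falls inside any range in deletess.
conn.execute(
    """
    DELETE FROM sample
    WHERE sample_date_key IN (
        SELECT d.date_key
        FROM datess d
        WHERE EXISTS (
            SELECT 1
            FROM deletess x
            WHERE d.s_date BETWEEN x.start_date AND x.end_date
        )
    )
    """
)

remaining = [r[0] for r in conn.execute(
    "SELECT sample_date_key FROM sample ORDER BY sample_date_key"
)]
print(remaining)
```

EXISTS only asks whether some range matches, so it does not matter how many rows deletess holds, which is what the scalar subquery in the question tripped over.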
A join is often preferable to IN:
DELETE s
FROM sample s
INNER JOIN datess d
ON s.sample_date_key = d.date_key
INNER JOIN deletess sd
ON sd.start_date <= d.s_date
AND d.s_date <= sd.end_date;
Let's say I have an "Employee_Sal" table with the following columns:
Employee_Sal (wage_No, From_Date, To_Date, Amt)
No | From_Date | To_Date   | Amt
---+-----------+-----------+------
1  | 01/7/2015 | 25/7/2015 | 40000
2  | 26/7/2015 | 05/8/2015 | 38000
3  | 03/8/2015 | 12/8/2015 | 59000
Here, I want to list records such as the 2nd and 3rd, where the date ranges overlap.
The rule is that the current record's To_Date should be earlier than the next record's From_Date.
If we compare records 1 and 2, it's fine.
If we compare records 2 and 3, 05/8/2015 < 03/8/2015 is false, so the rule is violated.
I want to find such records using an SQL query.
Any suggestions or help would be appreciated.
EDITED:
I want to compare each record with its NEXT RECORD ONLY, not with all records in the table.
If wage_No is guaranteed to have no gaps, try a self join, where t1 is the "current record" and t2 is the "next record":
select t1.*
from Employee_Sal t1
join Employee_Sal t2
on t1.wage_No = t2.wage_No - 1
where t2.From_Date < t1.TO_DATE
A gap-safe way is to use a correlated sub-select to find the next wage_No:
select t1.*
from Employee_Sal t1
join Employee_Sal t2
on t2.wage_No = (select min(wage_No) from Employee_Sal t3
where t3.wage_No > t1.wage_No)
where t2.From_Date < t1.TO_DATE
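Or, if no later wage_No at all is allowed to have a From_Date earlier than the current To_Date, use NOT EXISTS. As written, this returns the records that satisfy the rule; use EXISTS instead to list the violating ones: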
select t1.*
from Employee_Sal t1
where not exists (select 1 from Employee_Sal t2
where t2.wage_No > t1.wage_No
and t2.From_Date < t1.TO_DATE)
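The gap-safe variant can be tried on the three sample rows from the question. This is a minimal sketch under a few assumptions: SQLite via Python's sqlite3 module stands in for the unnamed database, and the dates are rewritten as ISO strings so that string comparison follows date order.

```python
import sqlite3

# Assumption: SQLite stand-in; dates converted from dd/m/yyyy to ISO format.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Employee_Sal (wage_No INTEGER, From_Date TEXT, To_Date TEXT, Amt INTEGER);
    INSERT INTO Employee_Sal VALUES
        (1, '2015-07-01', '2015-07-25', 40000),
        (2, '2015-07-26', '2015-08-05', 38000),
        (3, '2015-08-03', '2015-08-12', 59000);
    """
)

# t2 is the record with the smallest wage_No greater than t1's, i.e. the
# "next record" even when wage_No has gaps. Keep t1 when t2 starts too early.
rows = conn.execute(
    """
    SELECT t1.wage_No
    FROM Employee_Sal t1
    JOIN Employee_Sal t2
      ON t2.wage_No = (SELECT MIN(t3.wage_No)
                       FROM Employee_Sal t3
                       WHERE t3.wage_No > t1.wage_No)
    WHERE t2.From_Date < t1.To_Date
    ORDER BY t1.wage_No
    """
).fetchall()

overlapping = [r[0] for r in rows]
print(overlapping)
```

The last record has no successor, so the sub-select yields NULL for it and the join simply drops it.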