SQL - Find a specific date depending on condition from a table - sql

I have 2 tables having schema as given below
EmployeeDetails - (employeeID,timestamp,status)
EmployeeActivity - (employeeID, timestamp, activity)
Status field is either 0 or 1
What I want to do is find the timestamp at which status goes from 1 to 0, then use that date as a parameter to find the activity done after it in the second table.
I am sorry that I cannot provide a sample query at this point, since I am not sure whether this can be done in a single query or whether I will need PL/SQL.
Any help would be appreciated.
Thanks
EDIT
Sample Data
Table 1
employeeID | timestamp | status
1 | 01-NOV-13 | 1
2 | 01-NOV-13 | 1
1 | 02-NOV-13 | 0
2 | 02-NOV-13 | 1
1 | 03-NOV-13 | 0
2 | 03-NOV-13 | 0
Table 2
employeeID | timestamp | activity
1 | 01-NOV-13 | 1
2 | 01-NOV-13 | 1
1 | 02-NOV-13 | 0
2 | 02-NOV-13 | 1
1 | 03-NOV-13 | 1
2 | 03-NOV-13 | 0
Result
employeeID | timestamp | activity
1 | 03-NOV-13 | 1
This is the output because employeeID 1 has activity after its status went from 1 to 0.

WITH previous_statuses AS (
  -- Pair each row with the same employee's previous status, ordered by time.
  SELECT employeeID,
         timestamp_val,
         status,
         LAG( status ) OVER ( PARTITION BY employeeID ORDER BY timestamp_val ) AS previous_status
  FROM   employeeDetails
),
changed_statuses AS (
  -- Keep only the rows where the status went from 1 to 0.
  SELECT employeeID,
         timestamp_val
  FROM   previous_statuses
  WHERE  status = 0
  AND    previous_status = 1
)
-- Activity recorded after the status change (strictly later timestamp).
SELECT a.employeeID,
       a.timestamp_val,
       a.activity
FROM   employeeActivity a
       INNER JOIN
       changed_statuses s
       ON (   a.employeeID    = s.employeeID
          AND a.timestamp_val > s.timestamp_val );
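To check the idea against the sample data, here is a minimal sketch that runs the LAG-based query in an in-memory SQLite database from Python. SQLite is only a stand-in for Oracle, the dates are rewritten as ISO strings so they sort correctly as text, and the join uses a strictly later timestamp because the question asks for activity after the change. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical in-memory copy of the sample data (ISO dates, SQLite instead of Oracle).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employeeDetails  (employeeID INT, timestamp_val TEXT, status INT);
CREATE TABLE employeeActivity (employeeID INT, timestamp_val TEXT, activity INT);
INSERT INTO employeeDetails VALUES
  (1,'2013-11-01',1),(2,'2013-11-01',1),
  (1,'2013-11-02',0),(2,'2013-11-02',1),
  (1,'2013-11-03',0),(2,'2013-11-03',0);
INSERT INTO employeeActivity VALUES
  (1,'2013-11-01',1),(2,'2013-11-01',1),
  (1,'2013-11-02',0),(2,'2013-11-02',1),
  (1,'2013-11-03',1),(2,'2013-11-03',0);
""")

QUERY = """
WITH previous_statuses AS (
  SELECT employeeID, timestamp_val, status,
         LAG(status) OVER (PARTITION BY employeeID ORDER BY timestamp_val) AS previous_status
  FROM employeeDetails
),
changed_statuses AS (
  SELECT employeeID, timestamp_val
  FROM previous_statuses
  WHERE status = 0 AND previous_status = 1
)
SELECT a.employeeID, a.timestamp_val, a.activity
FROM employeeActivity a
JOIN changed_statuses s
  ON a.employeeID = s.employeeID
 AND a.timestamp_val > s.timestamp_val   -- activity strictly after the 1 -> 0 change
ORDER BY a.employeeID, a.timestamp_val
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Employee 1 changes status on 02-NOV and has one later activity row; employee 2 changes on 03-NOV, the last day in the data, so it contributes nothing.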

Related

Partition & consecutive in SQL

fellow stackers
I have a data set like so:
+---------+------+--------+
| user_id | date | metric |
+---------+------+--------+
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 1 | 3 | 1 |
| 2 | 1 | 1 |
| 2 | 2 | 1 |
| 2 | 3 | 0 |
| 2 | 4 | 1 |
+---------+------+--------+
I am looking to flag those customers who have 3 consecutive "1"s in the metric column. I have a solution as below.
select distinct user_id
from (
    select user_id
          ,metric +
           ifnull( lag(metric, 1) OVER (PARTITION BY user_id ORDER BY date), 0 ) +
           ifnull( lag(metric, 2) OVER (PARTITION BY user_id ORDER BY date), 0 )
           as consecutive_3
    from df
) b
where consecutive_3 = 3
While it works, it is not scalable: one can imagine what the above query would look like if I were looking for 50 consecutive 1s.
May I ask if there is a scalable solution? Any cloud SQL will do. Thank you.
If you only want such users, you can use a sum(). Assuming that metric is only 0 or 1:
select user_id,
       (case when max(metric_3) = 3 then 1 else 0 end) as flag_3
from (select df.*,
             sum(metric) over (partition by user_id
                               order by date
                               rows between 2 preceding and current row
                              ) as metric_3
      from df
     ) df
group by user_id;
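By changing the windowing clause, you can easily expand this to as many adjacent 1s as you like: for N in a row, use rows between N-1 preceding and current row, and compare the maximum against N.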
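As a rough illustration of how the frame size generalizes, the sketch below builds the same windowed sum for an arbitrary run length n and runs it on the sample data in SQLite from Python. The helper name users_with_run is hypothetical, and the approach assumes metric is only 0 or 1 and that each user has one row per date with no gaps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE df (user_id INT, date INT, metric INT);
INSERT INTO df VALUES
  (1,1,1),(1,2,1),(1,3,1),
  (2,1,1),(2,2,1),(2,3,0),(2,4,1);
""")

def users_with_run(conn, n):
    """Return user_ids that have at least n consecutive rows with metric = 1.

    Hypothetical helper: the frame covers the current row plus the n - 1
    rows before it, so a frame sum of n means n ones in a row.
    """
    n = int(n)  # the frame size is interpolated into the SQL, so force an integer
    sql = f"""
    SELECT user_id
    FROM (SELECT user_id,
                 SUM(metric) OVER (PARTITION BY user_id
                                   ORDER BY date
                                   ROWS BETWEEN {n - 1} PRECEDING AND CURRENT ROW) AS run_sum
          FROM df)
    GROUP BY user_id
    HAVING MAX(run_sum) = {n}
    ORDER BY user_id
    """
    return [row[0] for row in conn.execute(sql)]

print(users_with_run(conn, 3))
print(users_with_run(conn, 2))
```

Only the frame bound and the comparison value change with n, so the query text stays the same size whether n is 3 or 50.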

SQL: Select single item per name with multiple criteria

I'm trying to select a single item per value in a "Name" column according to several criteria.
The criteria I want to use look like this:
Only include results where IsEnabled = 1
Return the single result with the lowest priority (we're using 1 to mean "top priority")
In case of a tie, return the result with the newest Timestamp
I've seen several other questions that ask about returning the newest timestamp for a given value, and I've been able to adapt that to return the minimum value of Priority - but I can't figure out how to filter off of both Priority and Timestamp.
Here is the question that's been most helpful in getting me this far.
Sample data:
+------+------------+-----------+----------+
| Name | Timestamp | IsEnabled | Priority |
+------+------------+-----------+----------+
| A | 2018-01-01 | 1 | 1 |
| A | 2018-03-01 | 1 | 5 |
| B | 2018-01-01 | 1 | 1 |
| B | 2018-03-01 | 0 | 1 |
| C | 2018-01-01 | 1 | 1 |
| C | 2018-03-01 | 1 | 1 |
| C | 2018-05-01 | 0 | 1 |
| C | 2018-06-01 | 1 | 5 |
+------+------------+-----------+----------+
Desired output:
+------+------------+-----------+----------+
| Name | Timestamp | IsEnabled | Priority |
+------+------------+-----------+----------+
| A | 2018-01-01 | 1 | 1 |
| B | 2018-01-01 | 1 | 1 |
| C | 2018-03-01 | 1 | 1 |
+------+------------+-----------+----------+
What I've tried so far (this gets me only enabled items with lowest priority, but does not filter for the newest item in case of a tie):
SELECT DATA.Name, DATA.Timestamp, DATA.IsEnabled, DATA.Priority
FROM MyData AS DATA
INNER JOIN (
    SELECT MIN(Priority) Priority, Name
    FROM MyData
    GROUP BY Name
) AS Temp ON DATA.Name = Temp.Name AND DATA.Priority = Temp.Priority
WHERE IsEnabled = 1
Here is a SQL fiddle as well.
How can I enhance this query to only return the newest result in addition to the existing filters?
Use row_number():
select d.*
from (select d.*,
             -- lowest priority first; newest timestamp breaks ties
             row_number() over (partition by name order by priority, timestamp desc) as seqnum
      from mydata d
      where isenabled = 1
     ) d
where seqnum = 1;
The most effective way that I've found for these problems is using CTEs and ROW_NUMBER():
WITH CTE AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Priority, Timestamp DESC) rn
    FROM MyData
    WHERE IsEnabled = 1
)
SELECT Name, Timestamp, IsEnabled, Priority
FROM CTE
WHERE rn = 1;
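For reference, here is a small sketch that runs the ROW_NUMBER() approach against the sample data, using SQLite from Python as a stand-in for the original database. The ordering inside the window is what encodes the rules: Priority ascending picks the top priority, and Timestamp DESC picks the newest row among ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyData (Name TEXT, Timestamp TEXT, IsEnabled INT, Priority INT);
INSERT INTO MyData VALUES
  ('A','2018-01-01',1,1),('A','2018-03-01',1,5),
  ('B','2018-01-01',1,1),('B','2018-03-01',0,1),
  ('C','2018-01-01',1,1),('C','2018-03-01',1,1),
  ('C','2018-05-01',0,1),('C','2018-06-01',1,5);
""")

QUERY = """
WITH ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY Name
                            ORDER BY Priority, Timestamp DESC) AS rn
  FROM MyData
  WHERE IsEnabled = 1          -- disabled rows never compete for rank 1
)
SELECT Name, Timestamp, IsEnabled, Priority
FROM ranked
WHERE rn = 1
ORDER BY Name
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

For name C the two enabled priority-1 rows tie, and the descending timestamp puts 2018-03-01 first, which matches the desired output.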

Efficient ROW_NUMBER increment when column matches value

I'm trying to find an efficient way to derive the column Expected below from only Id and State. What I want is for the number Expected to increase each time State is 0 (ordered by Id).
+----+-------+----------+
| Id | State | Expected |
+----+-------+----------+
| 1 | 0 | 1 |
| 2 | 1 | 1 |
| 3 | 0 | 2 |
| 4 | 1 | 2 |
| 5 | 4 | 2 |
| 6 | 2 | 2 |
| 7 | 3 | 2 |
| 8 | 0 | 3 |
| 9 | 5 | 3 |
| 10 | 3 | 3 |
| 11 | 1 | 3 |
+----+-------+----------+
I have managed to accomplish this with the following SQL, but the execution time is very poor when the data set is large:
WITH Groups AS
(
SELECT Id, ROW_NUMBER() OVER (ORDER BY Id) AS GroupId FROM tblState WHERE State=0
)
SELECT S.Id, S.[State], S.Expected, G.GroupId FROM tblState S
OUTER APPLY (SELECT TOP 1 GroupId FROM Groups WHERE Groups.Id <= S.Id ORDER BY Id DESC) G
Is there a simpler and more efficient way to produce this result? (In SQL Server 2012 or later)
Just use a cumulative sum:
select s.*,
       sum(case when state = 0 then 1 else 0 end) over (order by id) as expected
from tblState s;
Another method uses a correlated subquery (note the <= so that a row with State = 0 counts itself):
select t.*,
       (select count(*)
        from tblState t1
        where t1.id <= t.id and t1.state = 0
       ) as expected
from tblState t;
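The two approaches can be compared on the sample data with a short sketch, run here in SQLite from Python rather than SQL Server. The subquery variant in this sketch uses an inclusive comparison (t1.Id <= t.Id) so that a row with State = 0 counts itself, which is what the Expected column requires for Id 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblState (Id INT, State INT)")
states = [0, 1, 0, 1, 4, 2, 3, 0, 5, 3, 1]          # State for Id 1..11
conn.executemany("INSERT INTO tblState VALUES (?, ?)", enumerate(states, start=1))

# Running count of zero states: one pass over the rows in Id order.
WINDOW_SQL = """
SELECT Id,
       SUM(CASE WHEN State = 0 THEN 1 ELSE 0 END) OVER (ORDER BY Id) AS Expected
FROM tblState
ORDER BY Id
"""

# Same result via a correlated subquery, re-counting for every row.
SUBQUERY_SQL = """
SELECT t.Id,
       (SELECT COUNT(*) FROM tblState t1
        WHERE t1.Id <= t.Id AND t1.State = 0) AS Expected
FROM tblState t
ORDER BY t.Id
"""

window_rows = conn.execute(WINDOW_SQL).fetchall()
subquery_rows = conn.execute(SUBQUERY_SQL).fetchall()
print([expected for _, expected in window_rows])
```

Both queries return the same groups; the window version avoids rescanning earlier rows for each Id, which is why it tends to behave better on large tables.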

Check the first value according to first date

I have two tables
guid | id | Status
-----| -----| ----------
1 | 123 | 0
2 | 456 | 3
3 | 789 | 0
The other table is
id | modified date | Status
------| --------------| ----------
1 | 26-08-2017 | 3
1 | 27-08-2017 | 0
1 | 01-09-2017 | 0
1 | 02-09-2017 | 0
2 | 26-08-2017 | 3
2 | 01-09-2017 | 0
2 | 02-09-2017 | 3
3 | 01-09-2017 | 0
3 | 02-09-2017 | 3
3 | 03-09-2017 | 0
Every time the status in the first table changes for an id, the modified date and status are also recorded in the second table. For example, the status of id 1 was changed 4 times.
I want to join both tables and select those ids whose status was 0 on their first modified date.
In this example it should return only id 3, because only id 3 has a status of 0 on its first modified date, 01-09-2017. Ids 1 and 2 have status 3 on their first modified date.
Any help would be appreciated.
Try the query below (assuming the first table is A and the second table is B):
;with cte as (
    select a.id, b.Status, row_number() over (partition by a.id order by [modified date] asc) as row_num
    from A
    inner join B
        on a.id = b.id
)
select * from cte
where status = 0 and row_num = 1
I think this will do what you're looking for.
WITH cte
AS (SELECT id
         , ROW_NUMBER() OVER (PARTITION BY (id) ORDER BY [modified date]) RN
         , Status
    FROM SecondTable
   )
SELECT *
FROM FirstTable
JOIN cte ON FirstTable.id = cte.id
        AND RN = 1
WHERE cte.Status = 0
Just expand out the * and return what fields you need.
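A quick way to sanity-check the "first row per id" logic is to run it on the history table alone. The sketch below does that in SQLite from Python; the column name modified_date and the ISO date strings are assumptions made for the example, since dd-mm-yyyy text would not sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SecondTable (id INT, modified_date TEXT, Status INT);
INSERT INTO SecondTable VALUES
  (1,'2017-08-26',3),(1,'2017-08-27',0),(1,'2017-09-01',0),(1,'2017-09-02',0),
  (2,'2017-08-26',3),(2,'2017-09-01',0),(2,'2017-09-02',3),
  (3,'2017-09-01',0),(3,'2017-09-02',3),(3,'2017-09-03',0);
""")

QUERY = """
WITH cte AS (
  SELECT id, Status,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY modified_date) AS rn
  FROM SecondTable
)
SELECT id
FROM cte
WHERE rn = 1        -- earliest modification for each id
  AND Status = 0    -- keep it only if that first status was 0
ORDER BY id
"""

ids = [row[0] for row in conn.execute(QUERY)]
print(ids)
```

Filtering on rn = 1 before checking the status matters: ids 1 and 2 have later rows with status 0, but their earliest row has status 3, so they are excluded.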

Best Hive SQL query for this

I have 2 tables, something like this. I'm running a Hive query, and window functions seem pretty limited in Hive.
Table dept
id | name |
1 | a |
2 | b |
3 | c |
4 | d |
Table time (built by a heavy-load query, so the process becomes very slow if I need to join it to another newly created time table.)
id | date | first | last |
1 | 1992-01-01 | 1 | 1 |
2 | 1993-02-02 | 1 | 2 |
2 | 1993-03-03 | 2 | 1 |
3 | 1993-01-01 | 1 | 3 |
3 | 1994-01-01 | 2 | 2 |
3 | 1995-01-01 | 3 | 1 |
I need to retrieve something like this:
SELECT d.id, d.name,
       t.date AS firstdate,
       td.date AS lastdate
FROM dbo.dept d LEFT JOIN dbo.time t ON d.id=t.id AND t.first=1
     LEFT JOIN time td ON d.id=td.id AND td.last=1
What is the most optimized way to do this?
A GROUP BY operation that will be done in a single map-reduce job:
select id
      ,max(name) as name
      ,max(case when first = 1 then `date` end) as firstdate
      ,max(case when last = 1 then `date` end) as lastdate
from (select id
            ,null as name
            ,`date`
            ,first
            ,last
      from time
      where first = 1
         or last = 1
      union all
      select id
            ,name
            ,null as `date`
            ,null as first
            ,null as last
      from dept
     ) t
group by id
;
+----+------+------------+------------+
| id | name | firstdate | lastdate |
+----+------+------------+------------+
| 1 | a | 1992-01-01 | 1992-01-01 |
| 2 | b | 1993-02-02 | 1993-03-03 |
| 3 | c | 1993-01-01 | 1995-01-01 |
| 4 | d | (null) | (null) |
+----+------+------------+------------+
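-- Filter time before the left join: a WHERE on t.first / t.last after the join
-- would drop departments with no matching rows (id 4).
select d.id
      ,max(d.name) as name
      ,max(case when t.first = 1 then t.`date` end) as firstdate
      ,max(case when t.last = 1 then t.`date` end) as lastdate
from dept d left join
     (select id, `date`, first, last
      from time
      where first = 1 or last = 1
     ) t on d.id = t.id
group by d.id
;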
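One detail worth checking with a LEFT JOIN plus conditional aggregation is where the first/last filter is applied. The sketch below compares filtering in WHERE (after the join) with filtering in a derived table (before the join), using SQLite from Python as a stand-in for Hive. Identifiers such as first, last, date and time are double-quoted defensively in this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (id INT, name TEXT);
CREATE TABLE "time" (id INT, "date" TEXT, "first" INT, "last" INT);
INSERT INTO dept VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d');
INSERT INTO "time" VALUES
  (1,'1992-01-01',1,1),
  (2,'1993-02-02',1,2),(2,'1993-03-03',2,1),
  (3,'1993-01-01',1,3),(3,'1994-01-01',2,2),(3,'1995-01-01',3,1);
""")

SELECT_LIST = """
SELECT d.id,
       MAX(d.name) AS name,
       MAX(CASE WHEN t."first" = 1 THEN t."date" END) AS firstdate,
       MAX(CASE WHEN t."last"  = 1 THEN t."date" END) AS lastdate
FROM dept d
"""

# Filter applied after the join: the NULL-extended row for dept 4 fails it.
where_rows = conn.execute(SELECT_LIST + """
LEFT JOIN "time" t ON d.id = t.id
WHERE t."first" = 1 OR t."last" = 1
GROUP BY d.id ORDER BY d.id
""").fetchall()

# Filter applied before the join, inside a derived table: dept 4 survives.
join_rows = conn.execute(SELECT_LIST + """
LEFT JOIN (SELECT * FROM "time" WHERE "first" = 1 OR "last" = 1) t ON d.id = t.id
GROUP BY d.id ORDER BY d.id
""").fetchall()

print([row[0] for row in where_rows])
print(join_rows[-1])
```

With the filter in WHERE, department 4 disappears because its joined columns are all NULL; with the filter applied before the join, it is kept with NULL dates, matching the result table of the UNION ALL approach.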