BigQuery avoid null data and merge rows - google-bigquery

In Google BigQuery, I have data sets with values dispersed between int_value and double_value. How can I merge the rows shown below?
-------------------------------------------------------------------------
|user_id | params.string_value | params.int_value | params.double_value |
-------------------------------------------------------------------------
| 12 | null | null | 121 |
| 12 | Tom | null | null |
| 12 | null | null | 141 |
| 12 | Kim | null | null |
| 13 | null | null | 961 |
| 13 | Jack | null | null |
| 14 | null | null | 31 |
| 14 | Jerry | null | null |
-------------------------------------------------------------------------
Result needed
-------------------------------------------------------------------------
|user_id | params.string_value | params.int_value | params.double_value |
-------------------------------------------------------------------------
| 12 | Tom | null | 121 |
| 12 | Kim | null | 141 |
| 13 | Jack | null | 961 |
| 14 | Jerry | null | 31 |
-------------------------------------------------------------------------
There can be multiple rows for the same user_id, each with a different params.string_value, params.int_value, or params.double_value.
I want to merge all the rows which share the same user_id in BigQuery.

Below is for BigQuery Standard SQL
#standardSQL
SELECT user_id, STRUCT(string_value, int_value, double_value) params
FROM (
  SELECT user_id,
    ARRAY_AGG(params.string_value IGNORE NULLS) string_values,
    ARRAY_AGG(params.int_value IGNORE NULLS) int_values,
    ARRAY_AGG(params.double_value IGNORE NULLS) double_values
  FROM `project.dataset.table`
  GROUP BY user_id
)
LEFT JOIN UNNEST(string_values) string_value WITH OFFSET
LEFT JOIN UNNEST(int_values) int_value WITH OFFSET USING(OFFSET)
LEFT JOIN UNNEST(double_values) double_value WITH OFFSET USING(OFFSET)
Applied to the sample data from your question:
WITH `project.dataset.table` AS (
SELECT 12 user_id, STRUCT<string_value STRING, int_value INT64, double_value FLOAT64>(NULL, NULL, 121) AS params UNION ALL
SELECT 12, STRUCT('Tom', NULL, NULL) UNION ALL
SELECT 12, STRUCT(NULL, NULL, 141) UNION ALL
SELECT 12, STRUCT('Kim', NULL, NULL) UNION ALL
SELECT 13, STRUCT(NULL, NULL, 961) UNION ALL
SELECT 13, STRUCT('Jack', NULL, NULL) UNION ALL
SELECT 14, STRUCT(NULL, NULL, 31) UNION ALL
SELECT 14, STRUCT('Jerry', NULL, NULL)
)
the result is:
Row  user_id  params.string_value  params.int_value  params.double_value
1    12       Tom                  null              121.0
2    12       Kim                  null              141.0
3    13       Jack                 null              961.0
4    14       Jerry                null              31.0
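The query above works by collecting the non-null values of each column per user and then pairing them positionally via the LEFT JOINs on OFFSET. A small Python sketch of the same zip-by-offset idea (the function name and data layout are just for illustration):

```python
# Per user, collect the non-null values of each column, then pair them
# positionally - the LEFT JOIN ... WITH OFFSET USING(OFFSET) in the SQL
# does the same positional pairing.
from itertools import zip_longest

def merge_params(rows):
    # rows: (user_id, string_value, int_value, double_value) tuples
    by_user = {}
    for uid, s, i, d in rows:
        cols = by_user.setdefault(uid, ([], [], []))
        if s is not None: cols[0].append(s)
        if i is not None: cols[1].append(i)
        if d is not None: cols[2].append(d)
    merged = []
    for uid, (strs, ints, dbls) in by_user.items():
        # zip_longest pads the shorter lists with None, matching the
        # NULLs the LEFT JOINs produce for missing offsets
        for s, i, d in zip_longest(strs, ints, dbls):
            merged.append((uid, s, i, d))
    return merged
```

Running this over the sample rows collapses the four rows for user 12 into two merged rows, just like the SQL result.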

You can use the MAX function:
SELECT user_id,
       MAX(params.string_value) AS string_value,
       MAX(params.int_value) AS int_value,
       MAX(params.double_value) AS double_value
FROM your_dataset.your_table
GROUP BY user_id
MAX does not consider NULL values. Neither does MIN, so you can use that one as well!
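To see the NULL-skipping behavior concretely, here is a hypothetical Python helper mirroring the rule (Python's built-in max does not skip None, so the filtering has to be explicit):

```python
# SQL's MAX/MIN aggregate over non-NULL values only; a group containing
# only NULLs yields NULL. Emulated by filtering None before aggregating.
def sql_max(values):
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None
```

So for user 12's string column, sql_max over [None, "Tom", None, "Kim"] returns "Tom", exactly as MAX(params.string_value) would.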

Related

converting NULLs to 0 in an ARRAY join

I have a table that has nulls and I want to replace them with 0s. The table was generated by a join between a table ('Table_A') and an array ('Table_B').
Current table:
Date | Sessions | ID | City
------+----------+-------------+-------------
06-02 | 1 | 107 | Cardiff
| | 102 | Paris
06-03 | NULL | NULL | NULL
11-12 | 1 | 105 | Amsterdam
| | 107 | Cardiff
| | 103 | Rome
27-06 | NULL | NULL | NULL
Desirable Output:
Date | Sessions | ID | City
------+----------+-------------+-------------
06-02 | 1 | 107 | Cardiff
| | 102 | Paris
06-03 | 0 | 0 | 0
11-12 | 1 | 105 | Amsterdam
| | 107 | Cardiff
| | 103 | Rome
27-06 | 0 | 0 | 0
Below is my current code. I can't remove the 'ignore nulls' because it wouldn't allow me to do the join.
select date, Sessions,
array_agg(a.ID ignore nulls) as ID, array_agg(City ignore nulls) as City
from Table_B b, unnest (ID) as ID_un
left join Table_A a on ID_un = cast(a.ID as string)
group by 1, 2
...
Current Table
WITH sample_data AS (
SELECT '06-02' Date, 1 Sessions, [107, 102] ID, ['Cardiff', 'Paris'] City UNION ALL
SELECT '06-03', NULL, [], [] UNION ALL
SELECT '11-12', 1, [105, 107, 103], ['Amsterdam', 'Cardiff', 'Rome'] UNION ALL
SELECT '27-06', NULL, NULL, NULL
)
Note that an empty array is displayed as null in the output.
Desired Output
I have a table that has nulls and I want to replace them with 0s.
The query below replaces an empty or null array with [0] or ['0'] depending on its type.
SELECT Date,
COALESCE(Sessions, 0) AS Sessions,
IF(ARRAY_LENGTH(ID) = 0 OR ID IS NULL, [0], ID) AS ID,
IF(ARRAY_LENGTH(City) = 0 OR City IS NULL, ['0'], City) AS City,
-- below is a little bit concise notation of above two.
-- IF(ARRAY_LENGTH(ID) > 0, ID, [0]) AS ID,
-- IF(ARRAY_LENGTH(City) > 0, City, ['0']) AS City,
FROM sample_data;
Query results:
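The replacement logic can be summarized in a few lines of Python (a sketch only; the function name is made up): both a NULL array and an empty array get swapped for a one-element placeholder, which is what the IF(ARRAY_LENGTH(...) > 0, ...) form expresses.

```python
# Emulates IF(ARRAY_LENGTH(arr) > 0, arr, [fallback]):
# NULL (None) arrays and empty arrays both become a placeholder array.
def fill_array(arr, fallback):
    return arr if arr else [fallback]
```

For example, fill_array(None, 0) and fill_array([], 0) both give [0], while a populated array like [107, 102] passes through unchanged.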

Count of records by category, including zeros

I have a table in the following format:
---------------------------------------------------------
| Id | user_name | submitted | reviewed | returned |
---------------------------------------------------------
| 1 | tom | 01-01-2020 | 02-01-2020 | |
| 2 | mary | 01-15-2020 | | |
| 3 | joe | 01-25-2020 | 02-07-2020 | 03-04-2020 |
| 4 | tom | 01-07-2020 | | |
| 5 | tom | 01-04-2020 | | |
| 6 | mary | 01-16-2020 | | |
| 7 | joe | 02-08-2020 | 02-08-2020 | 03-07-2020 |
| 8 | mary | 01-05-2020 | 01-20-2020 | 03-19-2020 |
| 9 | joe | 01-21-2020 | 02-09-2020 | |
---------------------------------------------------------
I want to write a query that counts the Submitted, Reviewed, and Returned records for each user, where "Submitted" is any record where the submitted date is not null and reviewed and returned are null. "Reviewed" is any record where the submitted and reviewed dates are not null and the returned date is null. "Returned" is any record where the submitted, reviewed, and returned dates are all not null.
The desired output would be as follows:
-----------------------------------------------------
| user_name | # Submitted | # Reviewed | # Returned |
-----------------------------------------------------
| joe | 0 | 1 | 2 |
| mary | 2 | 0 | 1 |
| tom | 2 | 1 | 0 |
-----------------------------------------------------
I tried doing three separate count queries grouped by user_name, but those miss the zeros. I'm very new to SQL, so any help would be greatly appreciated.
Just use count(). Based on your sample data, you can look at each column individually:
select user_name,
count(submitted) as num_submitted,
count(reviewed) as num_reviewed,
count(returned) as num_returned
from t
group by user_name;
There are no examples, for instance, where returned is non-NULL and either of the other columns is NULL.
If that is actually possible, you could use conditional aggregation:
select user_name,
count(submitted) as num_submitted,
sum(case when submitted is not null and reviewed is not null then 1 else 0 end) as num_reviewed,
sum(case when submitted is not null and reviewed is not null and returned is not null then 1 else 0 end) as num_returned
from t
group by user_name;
You could also use count() and play games with arithmetic:
select user_name,
count(submitted) as num_submitted,
count(day(submitted) + day(reviewed)) as num_reviewed,
count(day(submitted) + day(reviewed) + day(returned)) as num_returned
from t
group by user_name;
This works because day() returns NULL if the value is NULL. And + returns NULL if any value is NULL.
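The NULL-propagation trick can be spelled out in Python (sql_count and null_add are hypothetical helpers, not part of any SQL API): count(expr) counts non-NULL results, and since + yields NULL whenever any operand is NULL, count(a + b) ends up counting rows where both a and b are non-NULL.

```python
# count(expr) counts non-NULL values; NULL propagates through "+",
# so count(a + b) counts rows where both a and b are non-NULL.
def sql_count(values):
    return sum(1 for v in values if v is not None)

def null_add(*vals):
    return None if any(v is None for v in vals) else sum(vals)

# (submitted, reviewed, returned) as day-of-month numbers, None = NULL
rows = [(1, 2, None), (15, None, None), (25, 7, 4)]
num_reviewed = sql_count(null_add(s, r) for s, r, _ in rows)     # both dates set
num_returned = sql_count(null_add(s, r, t) for s, r, t in rows)  # all three set
```

With these three sample rows, num_reviewed is 2 and num_returned is 1, matching what the arithmetic trick produces in SQL.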
Try this:
DECLARE @DataSource TABLE
(
     [id] INT
    ,[user_name] NVARCHAR(128)
    ,[submitted] DATE
    ,[reviewed] DATE
    ,[returned] DATE
);
INSERT INTO @DataSource ([id], [user_name], [submitted], [reviewed], [returned])
VALUES (1, 'tom', '01-01-2020', '02-01-2020', NULL)
      ,(2, 'mary', '01-15-2020', NULL, NULL)
      ,(3, 'joe', '01-25-2020', '02-07-2020', '03-04-2020')
      ,(4, 'tom', '01-07-2020', NULL, NULL)
      ,(5, 'tom', '01-04-2020', NULL, NULL)
      ,(6, 'mary', '01-16-2020', NULL, NULL)
      ,(7, 'joe', '02-08-2020', '02-08-2020', '03-07-2020')
      ,(8, 'mary', '01-05-2020', '01-20-2020', '03-19-2020')
      ,(9, 'joe', '01-21-2020', '02-09-2020', NULL);
SELECT [user_name]
      ,SUM(IIF([returned] IS NULL AND [reviewed] IS NULL AND [submitted] IS NOT NULL, 1, 0)) AS [# Submitted]
      ,SUM(IIF([returned] IS NULL AND [reviewed] IS NOT NULL AND [submitted] IS NOT NULL, 1, 0)) AS [# Reviewed]
      ,SUM(IIF([returned] IS NOT NULL AND [reviewed] IS NOT NULL AND [submitted] IS NOT NULL, 1, 0)) AS [# Returned]
FROM @DataSource
GROUP BY [user_name];

Linear extrapolate values down to 0 from variable starting points

I want to build a query which allows me to flexibly extrapolate a number linearly down to age 0, starting from the last known value. The table (see below) has two columns, Age and Volume. My last known volume is 321.60 at age 11; how can I linearly extrapolate 321.60 down to age 0 in annual steps? I would also like to design the query so that the starting age can change; for example, in another scenario the last known volume is at age 27. I have been experimenting with the LEAD function; with it I can extrapolate the volume at age 10, but it does not let me extrapolate all the way down to 0. How can I design a query which (A) allows me to linearly extrapolate to age 0 and (B) is flexible, allowing different starting points for the extrapolation?
SELECT [age],
       [volume],
       CONCAT(CASE WHEN volume IS NULL
                   THEN (LEAD(volume, 1, 0) OVER (ORDER BY age)) / (age + 1) * age
              END, volume) AS 'Extrapolate'
FROM tbl_volume
+-----+--------+-------------+
| Age | Volume | Extrapolate |
+-----+--------+-------------+
| 0 | NULL | NULL |
| 1 | NULL | NULL |
| 2 | NULL | NULL |
| 3 | NULL | NULL |
| 4 | NULL | NULL |
| 5 | NULL | NULL |
| 6 | NULL | NULL |
| 7 | NULL | NULL |
| 8 | NULL | NULL |
| 9 | NULL | NULL |
| 10 | NULL | 292.363 |
| 11 | 321.60 | 321.60 |
| 12 | 329.80 | 329.80 |
| 13 | 337.16 | 337.16 |
| 13 | 343.96 | 343.96 |
| 14 | 349.74 | 349.74 |
+-----+--------+-------------+
If I assume that the value is 0 at age 0, then you can use simple arithmetic. This seems to work in your case:
select t.*,
coalesce(t.volume, t.age * (t2.volume / t2.age)) as extrapolated_volume
from t cross join
(select top (1) t2.*
from t t2
where t2.volume is not null
order by t2.age asc
) t2;
Here is a db<>fiddle
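The arithmetic in that query is just a line through the origin: each missing age gets age * (first_known_volume / first_known_age). A small Python sketch of the same computation (the function name is illustrative only):

```python
# Line through the origin: volume(age) = age * slope, where the slope
# comes from the first (youngest) known data point - matching the
# "value is 0 at age 0" assumption.
def extrapolate(rows):
    # rows: (age, volume) pairs ordered by age; volume may be None
    first_age, first_vol = next((a, v) for a, v in rows if v is not None)
    slope = first_vol / first_age
    return [(a, v if v is not None else a * slope) for a, v in rows]
```

Because the line passes through the origin, the method works no matter whether the first known value sits at age 11 or age 27, which gives the flexibility the question asks for.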
You can use a windowing function with an empty over() for this kind of thing. As a trivial example:
create table t(j int, k decimal(3,2));
insert t values (1, null), (2, null), (3, 3), (4, 4);
select j, j * avg(k / j) over ()
from t
Note that avg() ignores nulls.

Converting rows from a table into days of the week

What I thought was going to be a fairly easy task is becoming a lot more difficult than I expected. We have several tasks that get performed sometimes several times per day, so we have a table that gets a row added whenever a user performs the task. What I need is a snapshot of the month with the initials and time of the person that did the task like this:
The 'activity log' table is pretty simple, it just has the date/time the task was performed along with the user that did it and the scheduled time (the "Pass Time" column in the image); this is the table I need to flatten out into days of the week.
Each 'order' can have one or more 'pass times' and each pass time can have zero or more initials for that day. For example, for pass time 8:00, it can be done several times during that day or not at all.
I have tried standard joins to get the orders and the scheduled pass times with no issues, but getting the days of the week is escaping me. I have tried creating a function to get all the initials for the day and just creating
'select FuncCall() as 1, FuncCall() as 2', etc. for each day of the week but that is a real performance suck.
Does anyone know of a better technique?
Update: I think the comment about PIVOT looks promising, but not quite sure because everything I can find uses an aggregate function in the PIVOT part. So if I have the following table:
create table #MyTable (OrderName nvarchar(10),DateDone date, TimeDone time, Initials nvarchar(4), PassTime nvarchar(8))
insert into #MyTable values('Order 1','2018/6/1','2:00','ABC','1st Pass')
insert into #MyTable values('Order 1','2018/6/1','2:20','DEF','1st Pass')
insert into #MyTable values('Order 1','2018/6/1','4:40','XYZ','2nd Pass')
insert into #MyTable values('Order 1','2018/6/3','5:00','ABC','1st Pass')
insert into #MyTable values('Order 1','2018/6/4','4:00','QXY','2nd Pass')
insert into #MyTable values('Order 1','2018/6/10','2:00','ABC','1st Pass')
select * from #MyTable
pivot () -- Can't figure out what goes here since all examples I see have an aggregate function call such as AVG...
drop table #MyTable
I don't see how to get this output since I am not aggregating anything other than the initials column:
Something like this?
DECLARE #taskTable TABLE(ID INT IDENTITY,Task VARCHAR(100),TaskPerson VARCHAR(100),TaskDate DATETIME);
INSERT INTO #taskTable VALUES
('Task before June 2018','AB','2018-05-15T12:00:00')
,('Task 1','AB','2018-06-03T13:00:00')
,('Task 1','CD','2018-06-04T14:00:00')
,('Task 2','AB','2018-06-05T15:00:00')
,('Task 1','CD','2018-06-06T16:00:00')
,('Task 1','EF','2018-06-06T17:00:00')
,('Task 1','EF','2018-06-06T18:00:00')
,('Task 2','GH','2018-06-07T19:00:00')
,('Task 1','CD','2018-06-07T20:00:00')
,('After June 2018','CD','2018-07-15T21:00:00');
SELECT p.*
FROM
(
SELECT t.Task
,ROW_NUMBER() OVER(PARTITION BY t.Task,CAST(t.TaskDate AS DATE) ORDER BY t.TaskDate) AS Taskindex
,CONCAT(t.TaskPerson,' ',CONVERT(VARCHAR(5),t.TaskDate,114)) AS Content
,DAY(TaskDate) AS ColumnName
FROM #taskTable t
WHERE YEAR(t.TaskDate)=2018 AND MONTH(t.TaskDate)=6
) tbl
PIVOT
(
MAX(Content) FOR ColumnName IN([1],[2],[3],[4],[5],[6],[7],[8],[9],[10]
,[11],[12],[13],[14],[15],[16],[17],[18],[19],[20]
,[21],[22],[23],[24],[25],[26],[27],[28],[29],[30],[31])
) P
ORDER BY P.Task,Taskindex;
The result
+--------+-----------+------+------+----------+----------+----------+----------+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| Task | Taskindex | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
+--------+-----------+------+------+----------+----------+----------+----------+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| Task 1 | 1 | NULL | NULL | AB 13:00 | CD 14:00 | NULL | CD 16:00 | CD 20:00 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+--------+-----------+------+------+----------+----------+----------+----------+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| Task 1 | 2 | NULL | NULL | NULL | NULL | NULL | EF 17:00 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+--------+-----------+------+------+----------+----------+----------+----------+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| Task 1 | 3 | NULL | NULL | NULL | NULL | NULL | EF 18:00 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+--------+-----------+------+------+----------+----------+----------+----------+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| Task 2 | 1 | NULL | NULL | NULL | NULL | AB 15:00 | NULL | GH 19:00 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+--------+-----------+------+------+----------+----------+----------+----------+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
The first trick is to use the day's index (DAY()) as the column name. The second trick is ROW_NUMBER(). This adds a running index per task and day, thus replicating the rows per index; otherwise you'd get just one entry per day.
Your input tables will be more complex, but I think this shows the principle...
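The two tricks can be sketched outside SQL as well. This hypothetical Python function buckets entries by (task, day), numbers duplicates within a day (the ROW_NUMBER() part), and places each entry in the day column of the row matching its running index (the PIVOT part):

```python
# Bucket entries by (task, day), number duplicates per day, then lay
# each one out in a 31-column row keyed by (task, running_index).
from collections import defaultdict

def pivot_by_day(entries, days=31):
    # entries: (task, day_of_month, content) tuples, e.g. ("Task 1", 3, "AB 13:00")
    buckets = defaultdict(list)
    for task, day, content in entries:
        buckets[(task, day)].append(content)
    rows = defaultdict(lambda: [None] * days)  # key: (task, task_index)
    for (task, day), contents in sorted(buckets.items()):
        for idx, content in enumerate(contents, start=1):
            rows[(task, idx)][day - 1] = content
    return dict(rows)
```

A day with three entries for the same task thus produces three rows with task_index 1, 2, 3, mirroring the Taskindex column in the SQL result.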
UPDATE: So we have to get it even slicker :-D
WITH prepareData AS
(
SELECT t.Task
,t.TaskPerson
,t.TaskDate
,CONVERT(VARCHAR(10),t.TaskDate,126) AS TaskDay
,DAY(t.TaskDate) AS TaskDayIndex
,CONVERT(VARCHAR(5),t.TaskDate,114) AS TimeContent
FROM #taskTable t
WHERE YEAR(t.TaskDate)=2018 AND MONTH(t.TaskDate)=6
)
SELECT p.*
FROM
(
SELECT t.Task
,STUFF((
SELECT ', ' + CONCAT(x.TaskPerson,' ',TimeContent)
FROM prepareData AS x
WHERE x.Task=t.Task
AND x.TaskDay= t.TaskDay
ORDER BY x.TaskDate
FOR XML PATH(''),TYPE
).value(N'.',N'nvarchar(max)'),1,2,'') AS Content
,t.TaskDayIndex
FROM prepareData t
GROUP BY t.Task, t.TaskDay,t.TaskDayIndex
) p--tbl
PIVOT
(
MAX(Content) FOR TaskDayIndex IN([1],[2],[3],[4],[5],[6],[7],[8],[9],[10]
,[11],[12],[13],[14],[15],[16],[17],[18],[19],[20]
,[21],[22],[23],[24],[25],[26],[27],[28],[29],[30],[31])
) P
ORDER BY P.Task;
The result
+--------+------+------+----------+----------+----------+------------------------------+----------+------+
| Task | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
+--------+------+------+----------+----------+----------+------------------------------+----------+------+
| Task 1 | NULL | NULL | AB 13:00 | CD 14:00 | NULL | CD 16:00, EF 17:00, EF 18:00 | CD 20:00 | NULL |
+--------+------+------+----------+----------+----------+------------------------------+----------+------+
| Task 2 | NULL | NULL | NULL | NULL | AB 15:00 | NULL | GH 19:00 | NULL |
+--------+------+------+----------+----------+----------+------------------------------+----------+------+
This uses a well-discussed XML trick within a correlated sub-query to get all entries for the same day together as one value. With this combined content you can go the normal PIVOT path. The aggregate will not actually compute anything, as there is certainly just one value per cell.
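Stripped of the XML machinery, the STUFF(... FOR XML PATH('')) construct is plain string aggregation. A Python sketch of that step (function name hypothetical):

```python
# Join all entries for one (task, day) into a single comma-separated
# cell, so the subsequent PIVOT sees exactly one value per cell.
from collections import defaultdict

def string_agg(entries):
    # entries: (task, day, content) tuples
    grouped = defaultdict(list)
    for task, day, content in entries:
        grouped[(task, day)].append(content)
    return {key: ", ".join(vals) for key, vals in grouped.items()}
```

On SQL Server 2017+ the same step can be written with the built-in STRING_AGG function instead of the FOR XML PATH workaround.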

SQL - Selecting employees that have ended most recent contract but have other contracts open

I've been going around in circles trying to figure this one out.
I'm trying to select employees who have ended their most recent contract but have an active contract still open from previous.
For example, an employee has several contracts (some may be temporary or part time - this is irrelevant) but ends their most recent contract, however, they still continue to be in their older contracts.
Please see the table below as to what I'm trying to achieve - with relevant fields:
+------+-------------+-------------+------------+------------+
| ID | CONTRACT_ID | EMPLOYEE_ID | START_DATE | END_DATE |
+------+-------------+-------------+------------+------------+
| 4321 | 974 | 321 | 21/01/2004 | 31/12/2016 |
+------+-------------+-------------+------------+------------+
| 4322 | 1485 | 321 | 09/01/2009 | 31/08/2014 |
+------+-------------+-------------+------------+------------+
| 4323 | NULL | 321 | 25/07/2009 | 31/01/2010 |
+------+-------------+-------------+------------+------------+
| 4324 | 2440 | 321 | 01/06/2012 | NULL |
+------+-------------+-------------+------------+------------+
| 4325 | 7368 | 321 | 01/01/2017 | NULL |
+------+-------------+-------------+------------+------------+
| 4326 | 7612 | 321 | 14/02/2017 | 06/06/2017 |
+------+-------------+-------------+------------+------------+
Here is the code I currently have, which is not bringing back the correct data:
select
cond.EMPLOYEE_ID
,cond.END_DATE
from
contracts as cond
join
(select
EMPLOYEE_ID
,START_DATE
,END_DATE
from
contracts
where
END_DATE is null) a on a.EMPLOYEE_ID = cond.employee_id and a.START_DATE <
cond.END_DATE
group by cond.end_date, cond.EMPLOYEE_ID
having
max(cond.START_DATE) is not null AND cond.END_DATE is not null
This is what the code results in (example):
+------+-------------+-------------+------------+------------+
| ID | CONTRACT_ID | EMPLOYEE_ID | START_DATE | END_DATE |
+------+-------------+-------------+------------+------------+
| 1234 | NULL | 123 | 03/12/2014 | 26/10/2015 |
+------+-------------+-------------+------------+------------+
| 1235 | NULL | 123 | 30/10/2015 | 28/01/2016 |
+------+-------------+-------------+------------+------------+
| 1236 | NULL | 123 | 06/11/2015 | 28/01/2016 |
+------+-------------+-------------+------------+------------+
| 1237 | 1234 | 123 | 07/03/2016 | NULL |
+------+-------------+-------------+------------+------------+
| 1238 | NULL | 123 | 04/04/2017 | 13/04/2017 |
+------+-------------+-------------+------------+------------+
| 1239 | NULL | 123 | 18/04/2017 | NULL |
+------+-------------+-------------+------------+------------+
As you can see, the most recent contract does not have an end date but there is an open contract.
Any help much appreciated.
Using cross apply() to get the most recent start_date and end_date, plus the count of open contracts via the windowed aggregate count() over():
select
c.id
, c.contract_id
, c.employee_id
, start_date
, end_date
, max_start_date = x.start_date
, max_end_date = x.end_date
, x.open_contracts
from contracts c
cross apply (
select top 1
i.start_date
, i.end_date
, open_contracts = count(case when i.end_date is null then 1 end) over(partition by i.employee_id)
from contracts i
where i.employee_id = c.employee_id
order by i.start_date desc
) x
where x.end_date is not null
and x.open_contracts > 0
order by c.employee_id, c.start_date asc
test setup with some additional cases:
create table contracts (id int, contract_id int, employee_id int, start_date date, end_date date);
insert into contracts values
(4321, 974, 321, '20040121', '20161231')
,(4322, 1485, 321, '20090109', '20140831')
,(4323, null, 321, '20090725', '20100131')
,(4324, 2440, 321, '20120601', null)
,(4325, 7368, 321, '20170101', null)
,(4326, 7612, 321, '20170214', '20170606')
,(1, 1, 1, '20160101', null)
,(2, 2, 1, '20160701', '20161231')
,(3, 3, 1, '20170101', null) /* most recent is open, do not return */
,(4, 4, 2, '20160101', '20170630')
,(5, 5, 2, '20160701', '20161231')
,(6, 6, 2, '20170101', '20170630') /* most recent is closed, no others open, do not return */
,(7, 7, 3, '20160101', '20170630')
,(8, 8, 3, '20160701', null)
,(9, 9, 3, '20170101', '20170630') /* most recent is closed, one other open, return */
;
rextester demo: http://rextester.com/BUYKJ77928
returns:
+------+-------------+-------------+------------+------------+----------------+--------------+----------------+
| id | contract_id | employee_id | start_date | end_date | max_start_date | max_end_date | open_contracts |
+------+-------------+-------------+------------+------------+----------------+--------------+----------------+
| 7 | 7 | 3 | 2016-01-01 | 2017-06-30 | 2017-01-01 | 2017-06-30 | 1 |
| 8 | 8 | 3 | 2016-07-01 | NULL | 2017-01-01 | 2017-06-30 | 1 |
| 9 | 9 | 3 | 2017-01-01 | 2017-06-30 | 2017-01-01 | 2017-06-30 | 1 |
| 4321 | 974 | 321 | 2004-01-21 | 2016-12-31 | 2017-02-14 | 2017-06-06 | 2 |
| 4322 | 1485 | 321 | 2009-01-09 | 2014-08-31 | 2017-02-14 | 2017-06-06 | 2 |
| 4323 | NULL | 321 | 2009-07-25 | 2010-01-31 | 2017-02-14 | 2017-06-06 | 2 |
| 4324 | 2440 | 321 | 2012-06-01 | NULL | 2017-02-14 | 2017-06-06 | 2 |
| 4325 | 7368 | 321 | 2017-01-01 | NULL | 2017-02-14 | 2017-06-06 | 2 |
| 4326 | 7612 | 321 | 2017-02-14 | 2017-06-06 | 2017-02-14 | 2017-06-06 | 2 |
+------+-------------+-------------+------------+------------+----------------+--------------+----------------+
I'm not a SQL Server expert, but you might try something similar to this (SYSDATE is Oracle syntax; on SQL Server you would use GETDATE()):
SELECT *
FROM contracts cont
WHERE cont.end_date IS NOT NULL
AND cont.end_date <= SYSDATE
AND NOT EXISTS (SELECT *
FROM contracts recent
WHERE recent.employee_id = cont.employee_id
AND recent.start_date > cont.start_date)
AND EXISTS (SELECT *
FROM contracts openc
WHERE openc.employee_id = cont.employee_id
AND (openc.end_date IS NULL OR openc.end_date > SYSDATE))
The first 2 conditions search for closed contracts.
The next one ("NOT EXISTS") makes sure the selected contract is the most recent one.
The last part assures there are other open contracts.
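The condition the answers above are all encoding can be restated per employee as: the most recent contract (by start date) is closed, and at least one earlier contract is still open. A Python sketch of that predicate (the function name and tuple layout are made up for illustration):

```python
# Flag employees whose most recent contract (by start date) is closed
# while at least one earlier contract is still open (end date is None).
from collections import defaultdict

def employees_to_flag(contracts):
    # contracts: (employee_id, start, end) tuples; end is None when open
    by_emp = defaultdict(list)
    for emp, start, end in contracts:
        by_emp[emp].append((start, end))
    flagged = []
    for emp, cs in by_emp.items():
        cs.sort()                      # order by start date
        most_recent_end = cs[-1][1]
        others_open = any(end is None for _, end in cs[:-1])
        if most_recent_end is not None and others_open:
            flagged.append(emp)
    return flagged
```

Applied to the three extra employees in the test setup above, only employee 3 is flagged: the most recent contract is closed and an older one remains open.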
Try this dude.
SELECT [EMPLOYEE_ID]
FROM [contracts]
WHERE [END_DATE] IS NULL
AND [EMPLOYEE_ID] IN (SELECT B.[EMPLOYEE_ID] FROM (
SELECT * FROM (
SELECT RowN = Row_Number() over (partition by [EMPLOYEE_ID] ORDER BY[START_DATE] DESC)
, [EMPLOYEE_ID]
, [CONTRACT_ID]
, [END_DATE]
FROM [contracts]
) A
WHERE A.[END_DATE] IS NOT NULL
AND A.[RowN] = 1) B)
You can do this with ROW_NUMBER() and a CTE
See it in action: http://rextester.com/HQVXF56741
In the code below, I changed the dateformat which you may not have to do.
set dateformat dmy
declare @table table (ID int, CONTRACT_ID int, EMPLOYEE_ID int, [START_DATE] datetime, END_DATE datetime)
insert into @table
values
(4321,974,321,'21/01/2004','31/12/2016'),
(4322,1485,321,'09/01/2009','31/08/2014'),
(4323,NULL,321,'25/07/2009','31/01/2010'),
(4324,2440,321,'01/06/2012',NULL),
(4325,7368,321,'01/01/2017',NULL),
(4326,7612,321,'14/02/2017','06/06/2017')
--this applies a row_number to each contract per employee
--the most recent contract (by start date) gets a 1
;with cte as(
select
    EMPLOYEE_ID
    ,ID
    ,row_number() over (partition by EMPLOYEE_ID order by [START_DATE] desc) as ContractRecency
from @table)
--this will return all contracts that are open which aren't the most recent for the employee
select
    t.*
from
    @table t
where
    t.END_DATE is null
    and t.ID not in (select ID from cte where ContractRecency = 1)
set dateformat mdy