PostgreSQL - access/use joined subquery result in another joined subquery

I have a database with tables for
equipment we service (table e, field e_id)
contracts on the equipment (table c, fields c_id, e_id, c_start, c_end)
maintenance we have performed in the past (table m, e_id, m_id, m_date)
I am trying to build a query that will show me all equipment records, if it is currently in contract with the start/end date, and a count of any maintenance performed since the start date of the contract.
I have a subquery to get the current contract (this table is large and has a new line for each contract revision), but I can't work out how to use the result of the contract subquery to return the maintenance visits since that date without returning multiple lines.
select
e.e_id,
c2.c_id,
c2.c_start,
c2.c_end,
m2.count
from e
left join (
select
c_id,
c_start,
c_end,
e_id
...other things and filtering by joining the table to itself
from c
) as c2 on c2.e_id = e.e_id
I would also like to be able to add this
m-subquery v1
left join (
select
count(*),
e_id
from m
where m.m_date >= c2.c_start
) as m2 on m2.e_id = e.e_id
But I'm unable to access c2.C_start from within the second subquery.
I am able to return this table by joining outside the subquery, but this returns multiple lines.
m-subquery v2
left join (
select
e_id,
m_date
from m
) as m2 on m2.e_id = e.e_id and m2.m_date >= c2.c_start
Is there a way to:
Get the subquery field c2.c_start into the m-subquery v1?
Aggregate the result of the m-subquery v2 without using group by (there are a lot of columns in the main select query)?
Do this differently?
I've seen LATERAL, which I think might be what I need, but I have tried the keyword in front of both subqueries, individually and together, and it never let me use c2.c_start inside.
I am a little averse to using GROUP BY, mainly because the BI analyst at work says "slap a group by on it" whenever there are duplicates in reports, rather than trying to understand the business process/database properly. I feel that a GROUP BY on the main query shouldn't be needed when I know for certain that the e table has one record per e_id, and the mess of naming probably 59 out of 60 columns in the GROUP BY might make the query less maintainable.
Thanks,
Sam

Since not every RDBMS supports LATERAL, I would like to present the following general solution. You can make use of CTEs (WITH queries) to help structure the query and reuse partial results. E.g., in the following code you can think of current_contracts as a kind of virtual table that exists only during query execution.
Part 1: DDLs and test data
DROP TABLE IF EXISTS e;
CREATE TABLE e
(
e_id INTEGER
);
DROP TABLE IF EXISTS c;
CREATE TABLE c
(
c_id INTEGER,
e_id INTEGER,
c_start DATE,
c_end DATE
);
DROP TABLE IF EXISTS m;
CREATE TABLE m
(
e_id INTEGER,
m_id INTEGER,
m_date DATE
);
INSERT INTO e VALUES (101),(102),(103);
INSERT INTO c VALUES
(201, 101, DATE '2021-01-01', DATE '2021-12-31'),
(202, 102, DATE '2021-03-01', DATE '2021-04-15'),
(203, 102, DATE '2021-04-16', DATE '2021-04-30'),
(204, 103, DATE '2003-01-01', DATE '2003-12-31'),
(205, 103, DATE '2021-04-01', DATE '2021-04-30');
INSERT INTO m VALUES
(101, 301, DATE '2021-01-01'),
(101, 302, DATE '2021-02-01'),
(101, 303, DATE '2021-03-01'),
(102, 304, DATE '2021-04-02'),
(102, 305, DATE '2021-04-03'),
(103, 306, DATE '2021-04-03');
Part 2: the actual query
WITH
-- find currently active contracts per equipment:
-- we assume there is 0 or 1 contract active per equipment at any time
current_contracts AS
(
SELECT *
FROM c
WHERE c.c_start <= CURRENT_DATE -- only active contracts
AND c.c_end >= CURRENT_DATE -- only active contracts
),
-- count maintenance visits during the (single) active contract per equipment, if any:
current_maintenance AS
(
SELECT m.e_id, COUNT(*) AS count_m_per_e -- a count of maintenance visits per equipment
FROM m
INNER JOIN current_contracts cc
ON cc.e_id = m.e_id -- match maintenance to current contracts via equipment
AND cc.c_start <= m.m_date -- only maintenance that was done during the current contract
GROUP BY m.e_id
)
-- bring the parts together for our result:
-- we start with equipment and use LEFT JOINs to assure we retain all equipment
SELECT
e.*,
cc.c_start, cc.c_end,
CASE WHEN cc.e_id IS NOT NULL THEN 'yes' ELSE 'no' END AS has_contract,
COALESCE(cm.count_m_per_e, 0) -- to replace NULL when no contract is active
FROM e
LEFT JOIN current_contracts cc
ON cc.e_id = e.e_id
LEFT JOIN current_maintenance cm
ON cm.e_id = e.e_id
ORDER BY e.e_id;
Please note that your real pre-processing logic for contracts and maintenance visits may be more complex, e.g. due to overlapping periods of active contracts per equipment.
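For completeness, since the question asks about LATERAL: in PostgreSQL specifically, a lateral subquery may reference columns of tables that appear earlier in the FROM list, so the maintenance count can use the joined contract's start date directly. The following is only a sketch against the test tables above and, like the CTE version, assumes at most one active contract per equipment:
SELECT
    e.e_id,
    cc.c_id,
    cc.c_start,
    cc.c_end,
    cm.count_m
FROM e
LEFT JOIN c AS cc
       ON cc.e_id = e.e_id
      AND cc.c_start <= CURRENT_DATE
      AND cc.c_end >= CURRENT_DATE
LEFT JOIN LATERAL (
    SELECT COUNT(*) AS count_m       -- may reference cc.c_start because of LATERAL
    FROM m
    WHERE m.e_id = e.e_id
      AND m.m_date >= cc.c_start     -- a NULL c_start (no active contract) simply yields a count of 0
) AS cm ON TRUE
ORDER BY e.e_id;
No GROUP BY is needed in the outer query, because the lateral subquery aggregates per equipment on its own.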

Related

How to do an as-of-join in SQL (Snowflake)?

I am looking to join two time-ordered tables, such that the events in table1 are matched to the "next" event in table2 (within the same user). I am using SQL / Snowflake for this.
For argument's sake table1 is "notification_clicked" events and table2 is "purchases"
This is one way to do it:
WITH partial_result AS (
SELECT
userId, notificationId, notificationTimeStamp, transactionId, transactionTimeStamp
FROM table1 CROSS JOIN table2
WHERE table1.userId = table2.userId
AND notificationTimeStamp <= transactionTimeStamp)
SELECT *
FROM partial_result
QUALIFY ROW_NUMBER() OVER(
PARTITION BY userId, notificationId ORDER BY transactionTimeStamp ASC
) = 1
It is not super readable, but is this "the" way to do this?
If you're doing an AsOf join against small tables, you can use a regular Venn diagram type of join. If you're running it against large tables, a regular join will lead to an intermediate cardinality explosion before the filter.
For large tables, this is the highest performance approach I have to date. Rather than treating an AsOf join like a regular Venn diagram join, we can treat it like a special type of union between two tables with a filter that uses the information from that union. The sample SQL does the following:
Unions the A and B tables so that the Entity and Time come from both tables and all other columns come from only one table. Rows from the other table specify NULL for these values (measures 1 and 2 in this case). It also projects a source column for the table. We'll use this later.
In the unioned table, it uses a LAG function on windows partitioned by the Entity and ordered by the Time. For each row with a source indicator from the A table, it lags back to the first Time with source in the B table, ignoring all values in the A table.
with A as
(
select
COLUMN1::int as "E", -- Entity
COLUMN2::int as "T", -- Time
COLUMN4::string as "M1" -- Measure (could be many)
from (values
(1, 7, 1, 'M1-1'),
(1, 8, 1, 'M1-2'),
(1, 41, 1, 'M1-3'),
(1, 89, 1, 'M1-4')
)
), B as
(
select
COLUMN1::int as "E", -- Entity
COLUMN2::int as "T", -- Time
COLUMN4::string as "M2" -- Different measure (could be many)
from (values
(1, 6, 1, 'M2-1'),
(1, 12, 1, 'M2-2'),
(1, 20, 1, 'M2-3'),
(1, 35, 1, 'M2-4'),
(1, 57, 1, 'M2-5'),
(1, 85, 1, 'M2-6'),
(1, 92, 1, 'M2-7')
)
), UNIONED as -- Unify schemas and union all
(
select 'A' as SOURCE_TABLE -- Project the source table
,E as AB_E -- AB_ means it's unified
,T as AB_T
,M1 as A_M1 -- A_ means it's from A
,NULL::string as B_M2 -- Make columns from B null for A
from A
union all
select 'B' as SOURCE_TABLE
,E as AB_E
,T as AB_T
,NULL::string as A_M1 -- Make columns from A null for B
,M2 as B_M2
from B
)
select AB_E as ENTITY
,AB_T as A_TIME
,lag(iff(SOURCE_TABLE = 'A', null, AB_T)) -- Lag back to
ignore nulls over -- previous B row
(partition by AB_E order by AB_T) as B_TIME
,A_M1 as M1_FROM_A
,lag(B_M2) -- Lag back to the previous non-null row.
ignore nulls -- The A sourced rows will already be NULL.
over (partition by AB_E order by AB_T) as M2_FROM_B
from UNIONED
qualify SOURCE_TABLE = 'A'
;
This will perform orders of magnitude faster for large tables because the highest intermediate cardinality is guaranteed to be the cardinality of A + B.
To simplify this refactor, I wrote a stored procedure that generates the SQL given the paths to table A and B, the entity column in A and B (right now limited to one, but if you have more it will get the SQL started), the order by (time) column in A and B, and finally the list of columns to "drag through" the AsOf join. It's rather lengthy so I posted it on Github and will work later to document and enhance it:
https://github.com/GregPavlik/AsOfJoin/blob/main/StoredProcedure.sql
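As a rough illustration only, the same union-then-window idea mapped onto the notification/purchase tables from the question might look like the sketch below (column names are taken from the question; untested). Because we want the next purchase after each notification rather than the previous one, LEAD ... IGNORE NULLS replaces LAG:
with unioned as (
    select 'N' as source_table,          -- notifications
           userId,
           notificationTimeStamp as ts,
           notificationId,
           null as transactionId,
           null as transactionTimeStamp
    from table1
    union all
    select 'P' as source_table,          -- purchases
           userId,
           transactionTimeStamp as ts,
           null as notificationId,
           transactionId,
           transactionTimeStamp
    from table2
)
select userId,
       notificationId,
       ts as notificationTimeStamp,
       lead(transactionId) ignore nulls
            over (partition by userId order by ts, source_table) as transactionId,
       lead(transactionTimeStamp) ignore nulls
            over (partition by userId order by ts, source_table) as transactionTimeStamp
from unioned
qualify source_table = 'N';              -- keep only the notification rows
Adding source_table to the ORDER BY makes a purchase with the same timestamp as the notification count as "next", matching the <= comparison in the original query.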

Is this a bug, or does Snowflake not fully support correlated subqueries in a WHERE EXISTS clause?

Snowflake is throwing an error for an EXISTS clause if a filter condition depends on coalescing columns from both the outer table and the subquery table. The query will run if I remove the outer-table column from the COALESCE or replace the COALESCE with the long-form equivalent logic.
I'm seeing this error, specifically SQL compilation error: Unsupported subquery type cannot be evaluated, for what I would consider to be a fairly straightforward WHERE EXISTS clause. This would work in every (recent) SQL variant that I've used (e.g., SQL Server, Postgres), so I'm a little concerned that Snowflake doesn't support it. Am I missing something?
I found what seems to be a similar question at Snowflake's Community in 2019, where Snowflake was failing when the EXISTS clause included a WHERE filter condition that referenced a column from an outer query for something other than joining the tables. There was not a clear solution there.
Snowflake's documentation on its limited support for subqueries says that it supports both correlated and uncorrelated subqueries for "EXISTS, ANY / ALL, and IN subqueries in WHERE clauses".
So why is it failing on this EXISTS clause? Is what I'm seeing a bug, or is this a Snowflake limitation that is not clearly documented?
Code to reproduce the issue:
CREATE OR REPLACE TEMPORARY TABLE Employee (
Emp_SK INT NOT NULL
);
CREATE OR REPLACE TEMPORARY TABLE Employee_X_Pay_Rate (
Emp_SK INT NOT NULL, Pay_Rate_SK INT NOT NULL, Start_Date TIMESTAMP_NTZ NOT NULL, End_Date TIMESTAMP_NTZ NOT NULL
);
CREATE OR REPLACE TEMPORARY TABLE Employee_X_Location (
Emp_SK INT NOT NULL, Location_SK INT NOT NULL, Start_Date TIMESTAMP_NTZ NOT NULL, End_Date TIMESTAMP_NTZ NULL
);
INSERT INTO Employee
VALUES (1);
INSERT INTO Employee_X_Pay_Rate
VALUES
(1, 1, '2018-01-01', '2019-03-31')
,(1, 2, '2019-04-01', '2021-03-31')
,(1, 3, '2021-04-01', '2099-12-31')
;
INSERT INTO Employee_X_Location
VALUES
(1, 101, '2018-01-01', '2019-12-31')
,(1, 102, '2020-01-01', '2020-12-31')
,(1, 103, '2021-01-01', NULL)
;
SET Asof_Date = TO_DATE('2021-05-31', 'yyyy-mm-dd'); -- changing this to TO_TIMESTAMP makes no difference
SELECT
emp.Emp_SK
,empPay.Pay_Rate_SK
,$Asof_Date AS Report_Date
,empPay.Start_Date AS Pay_Start_Date
,empPay.End_Date AS Pay_End_Date
FROM Employee emp
INNER JOIN Employee_X_Pay_Rate empPay
ON emp.Emp_SK = empPay.Emp_SK
AND $Asof_Date BETWEEN empPay.Start_Date AND empPay.End_Date
WHERE EXISTS (
SELECT 1 FROM Employee_X_Location empLoc
WHERE emp.Emp_SK = empLoc.Emp_SK
-- Issue: Next line fails. empLoc.End_Date can be null
AND $Asof_Date BETWEEN empLoc.Start_Date AND COALESCE(empLoc.End_Date, empPay.End_Date)
);
The query will run if I replace the issue line with either of the following.
-- Workaround 1
AND (
$Asof_Date >= empLoc.Start_Date
AND ($Asof_Date <= empLoc.End_Date OR (empLoc.End_Date IS NULL AND $Asof_Date <= empPay.End_Date))
)
-- Workaround 2
AND $Asof_Date BETWEEN empLoc.Start_Date AND COALESCE(empLoc.End_Date, CURRENT_DATE)
I see this still happens, and I just noticed you already know about swapping empPay.End_Date to CURRENT_DATE, which is how I would have written it.
It does make the correlated subquery more complex, because now you are mixing in two tables instead of one.
When CURRENT_DATE is used, the SQL is the same as:
SELECT
s.emp_sk
,ep.pay_rate_sk
,TO_DATE('2021-05-31') AS report_date
,ep.start_date AS pay_start_date
,ep.end_date AS pay_end_date
FROM (
SELECT
e.emp_sk
FROM employee e
WHERE EXISTS (
SELECT 1
FROM employee_x_location AS el
WHERE e.emp_sk = el.emp_sk
AND TO_DATE('2021-05-31') BETWEEN el.start_date AND COALESCE(el.end_date, CURRENT_DATE)
)
) AS s
JOIN employee_x_pay_rate AS ep
ON s.emp_sk = ep.emp_sk
AND TO_DATE('2021-05-31') BETWEEN ep.start_date AND ep.end_date;
So, to demonstrate complex versus simple correlation, swap the employee table for employee_x_pay_rate in the sub-select, like so:
SELECT
e.emp_sk
FROM Employee_X_Pay_Rate e
WHERE EXISTS (
SELECT 1
FROM employee_x_location AS el
WHERE e.emp_sk = el.emp_sk
AND TO_DATE('2021-05-31') BETWEEN el.start_date AND COALESCE(el.end_date, CURRENT_DATE)
)
This works, but using a value from that table does not:
SELECT
e.emp_sk
FROM Employee_X_Pay_Rate e
WHERE EXISTS (
SELECT 1
FROM employee_x_location AS el
WHERE e.emp_sk = el.emp_sk
AND TO_DATE('2021-05-31') BETWEEN el.start_date AND COALESCE(el.end_date, e.End_Date)
)
IFNULL(el.end_date, e.End_Date) and NVL(el.end_date, e.End_Date) both fail as well.
But you can restructure the code to move the COALESCE into a CTE, and then use WHERE EXISTS like so:
WITH r_emp_pay AS (
SELECT
empPay.Emp_SK
,empPay.Pay_Rate_SK
,empPay.Start_Date
,empPay.End_Date
FROM Employee_X_Pay_Rate AS empPay
WHERE TO_DATE('2021-05-31', 'yyyy-mm-dd') BETWEEN empPay.Start_Date AND empPay.End_Date
), r_emp_loc AS (
SELECT
empLoc.Emp_SK
,empLoc.Start_Date
,empLoc.End_Date
,COALESCE(empLoc.End_Date, empPay.End_Date) as col_end_date
FROM Employee_X_Location empLoc
JOIN r_emp_pay empPay
ON empPay.Emp_SK = empLoc.Emp_SK
WHERE TO_DATE('2021-05-31', 'yyyy-mm-dd') BETWEEN empLoc.Start_Date AND COALESCE(empLoc.End_Date, CURRENT_DATE)
)
SELECT
emp.Emp_SK
,empPay.Pay_Rate_SK
,TO_DATE('2021-05-31', 'yyyy-mm-dd') AS Report_Date
,empPay.Start_Date AS Pay_Start_Date
,empPay.End_Date AS Pay_End_Date
FROM Employee emp
JOIN r_emp_pay empPay
ON emp.Emp_SK = empPay.Emp_SK
WHERE EXISTS (
SELECT 1 FROM r_emp_loc empLoc
WHERE emp.Emp_SK = empLoc.Emp_SK
AND TO_DATE('2021-05-31', 'yyyy-mm-dd') BETWEEN empLoc.Start_Date AND empLoc.col_end_date
);
gives:
EMP_SK  PAY_RATE_SK  REPORT_DATE  PAY_START_DATE           PAY_END_DATE
1       3            2021-05-31   2021-04-01 00:00:00.000  2099-12-31 00:00:00.000
I have been looking into this, and it looks like Snowflake supports a correlated subquery in the WHERE clause if it is convinced that the inner subquery returns a scalar result. So in short, using functions like COUNT, ANY_VALUE or DISTINCT can give the result.
Consider the following query -
SELECT * FROM Department D
INNER JOIN (select * from Employee) E
ON D.DepartmentID = E.DepartmentID
where EXISTS( SELECT distinct 1 FROM EMPLOYEES E1 WHERE E1.DEPARTMENTID=D.DEPARTMENTID );
(ignore the similar tables, they were used for testing).
My requirement was to have TOP 1 in the WHERE EXISTS clause. However, since Snowflake does not support it, I managed the same using DISTINCT.
That being said, your query might work with the same WHERE EXISTS (SELECT DISTINCT 1 ...), as you only need to know whether records exist or not.
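Applied to the query from the question, that would just mean the following (untested; whether the planner then accepts the correlated COALESCE is exactly the open question):
SELECT
    emp.Emp_SK
    ,empPay.Pay_Rate_SK
    ,$Asof_Date AS Report_Date
    ,empPay.Start_Date AS Pay_Start_Date
    ,empPay.End_Date AS Pay_End_Date
FROM Employee emp
INNER JOIN Employee_X_Pay_Rate empPay
    ON emp.Emp_SK = empPay.Emp_SK
    AND $Asof_Date BETWEEN empPay.Start_Date AND empPay.End_Date
WHERE EXISTS (
    SELECT DISTINCT 1 -- the only change: DISTINCT added
    FROM Employee_X_Location empLoc
    WHERE emp.Emp_SK = empLoc.Emp_SK
    AND $Asof_Date BETWEEN empLoc.Start_Date AND COALESCE(empLoc.End_Date, empPay.End_Date)
);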
References (Snowflake Community Forum):
https://community.snowflake.com/s/question/0D53r00009mIxwYCAS/sql-compilation-error-unsupported-subquery-type-cannot-be-evaluated?t=1626899830543
https://community.snowflake.com/s/question/0D50Z00008BDZz0SAH/subquery-in-select-clause

How to merge two tables and average them (hourly vs daily tables)

I have the following tables:
CREATE TABLE a (DATE TEXT, PRICE INTEGER);
INSERT INTO a VALUES
("2019-04-27", 10), ("2019-04-29",20), ("2019-04-30",30), ("2019-05-01",40);
CREATE TABLE b (DATE TEXT, PRICE INTEGER);
INSERT INTO b VALUES
("2019-04-27 01:00", 1), ("2019-04-27 02:30",3), ("2019-04-27 18:00",2),
("2019-04-28 17:00",2), ("2019-04-28 21:00",5),
("2019-04-29 17:00",50), ("2019-04-29 21:00",10),
("2019-04-30 17:00",10), ("2019-04-30 21:00",20),
("2019-05-01 17:00",40), ("2019-05-01 21:00",10),
("2019-05-02 17:00",10), ("2019-05-02 21:00",6);
I need to merge these two tables, so that table b is averaged to daily values and the result has 2 columns: the date (all dates need to be there) and the price (NULL if there are no observations for that date). I tried several left joins, but I do not know how to tackle the problem that I cannot average the hourly data to daily.
Could you help?
Please execute the query as per the SQL Fiddle below:
select DATE(c.date) as date, avg(c.price) as avg_price
from
(select date, price
from a
union all
select date, price
from b
) as c
group by DATE(c.date);
I suspect that you want a result set with two columns. I'm not a fan of having the date be in a string datatype, but you can use string functions for what you want:
select date, sum(price_a) as price_a, sum(price_b) as price_b
from (select a.date, a.price as price_a, null as price_b
from a
union all
select substr(b.date, 1, 10), null, price
from b
) ab
group by date;
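If instead you want to keep table a's rows as the driving list and attach b's daily average (NULL where b has no rows on that date), the left-join attempt mentioned in the question can be written by pre-aggregating b per day in a derived table. A sketch, using the same substr() trick as above:
select a.date,
       a.price,
       b_daily.avg_price_b
from a
left join (
    select substr(b.date, 1, 10) as day,   -- truncate "YYYY-MM-DD hh:mm" to the day
           avg(b.price) as avg_price_b
    from b
    group by substr(b.date, 1, 10)
) as b_daily
    on b_daily.day = a.date;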

How to create a "history" table in calaculation view for each day based on "time dimension"

I have a very HANA/SQLScript-specific problem. I need to create a history view, for each day, in the form of a scripted calculation view, based on a table with date gaps.
I tried cross joins with "_SYS_BI"."M_TIME_DIMENSION" and using functions.
In my experience cross joins are ideal for cumulative sums, but not for showing a single value.
HANA doesn't allow columns to be used as input parameters to table functions.
Scalar functions cannot be used in a calculation view either. I can activate such a view, but I cannot read data from it, even though the function works fine standalone:
Business Partner Discount Table:
CREATE COLUMN TABLE "BP_DISCOUNT" ("BP_CODE" VARCHAR(50) NOT NULL,
"DATE" DATE,
"DISCOUNT" DECIMAL(21,6));
insert into "BP_DISCOUNT" values('abc','20190101','0');
insert into "BP_DISCOUNT" values('abc','20190105','5');
insert into "BP_DISCOUNT" values('abc','20190110','10');
The function that I wanted to use:
CREATE FUNCTION bp_discountF (BP_Code varchar(50), Date_D date)
RETURNS discount decimal(21,6)
LANGUAGE SQLSCRIPT
SQL SECURITY INVOKER
AS
BEGIN
select "DISCOUNT" into DISCOUNT
from "BP_DISCOUNT" d
where
:BP_CODE = d."BP_CODE" and d."DATE"<= :DATE_D order by d."DATE" desc limit 1;
END;
My goal is to create a view that shows the discount value for every possible day, based on the most recent value.
It must be in the form of a scripted calculation view so that I can join it to more complex sales report views.
The expected result, to be joined on BP and document date, is:
...
'abc', '20190101', 0;
'abc', '20190102', 0;
'abc', '20190103', 0;
'abc', '20190104', 0;
'abc', '20190105', 5;
'abc', '20190106', 5;
'abc', '20190107', 5;
'abc', '20190108', 5;
'abc', '20190109', 5;
'abc', '20190110', 10;
'abc', '20190111', 10;
..
you could try it like this:
DO BEGIN
times=
select DATE_SAP from M_TIME_DIMENSION
where DATE_SAP between '20190101' and '20190110';
dates=
select * from :times
left join bp_discount
on DATE <= DATE_SAP
order by DATE_SAP;
listdates=
select DATE_SAP, BP_CODE, max(DATE) as DATE
from :dates
group by DATE_SAP, BP_CODE
order by DATE_SAP;
select ld.DATE, ld.DATE_SAP, ld.BP_CODE, bpd.DISCOUNT from :listdates as ld
inner join bp_discount as bpd
on ld.BP_CODE = bpd.BP_CODE and ld.DATE = bpd.DATE
order by DATE_SAP;
END;
times is just the dates you need; to make it easier I just selected the needed ones.
In dates, you get a table of every date from your discount table paired with every date from the time dimension table that is bigger than (or equal to) it.
Now you want the max DATE each time, because of course 06.01.2019 is also bigger than 01.01.2019, but at that date you want the reference to 05.01.2019. So you select MAX(DATE) and you get 01.01.2019 for every day before 05.01.2019, which is the next date in your bp_discount list.
Since you cannot group by the discount, you join it in as a last step, and you should have the table you need.
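If you need this as a reusable object that a calculation view can consume, rather than an anonymous DO block, the same logic can be wrapped in a SQLScript table function. The following is an untested sketch along those lines; the function name and parameter handling are assumptions, and the date range that was hard-coded above becomes two input parameters (adjust the returned column types to match M_TIME_DIMENSION and BP_DISCOUNT if needed):
CREATE FUNCTION "BP_DISCOUNT_HISTORY" (from_date NVARCHAR(8), to_date NVARCHAR(8))
RETURNS TABLE ("DATE_SAP" NVARCHAR(8), "BP_CODE" VARCHAR(50), "DISCOUNT" DECIMAL(21,6))
LANGUAGE SQLSCRIPT
SQL SECURITY INVOKER
AS
BEGIN
    times =
        select DATE_SAP from "_SYS_BI"."M_TIME_DIMENSION"
        where DATE_SAP between :from_date and :to_date;
    dates =
        select * from :times
        left join "BP_DISCOUNT"
        on "DATE" <= DATE_SAP;
    listdates =
        select DATE_SAP, "BP_CODE", max("DATE") as "DATE"
        from :dates
        group by DATE_SAP, "BP_CODE";
    RETURN
        select ld.DATE_SAP, ld."BP_CODE", bpd."DISCOUNT"
        from :listdates as ld
        inner join "BP_DISCOUNT" as bpd
        on ld."BP_CODE" = bpd."BP_CODE" and ld."DATE" = bpd."DATE";
END;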

avoiding group by for column used in datediff?

As the database is currently constructed, I can only use a date field of a certain table in a DATEDIFF function if I also group by it, and that table is also part of a count aggregation (not the date field itself, but the entity whose date field is not null). The GROUP BY in the end messes up the counting, since that one entry is counted on its own / as its own group.
In some detail:
Our lead recruiter wants a report that shows the sum of applications and conducted interviews per opening. So far, no problem. Additionally, he would like to see the total duration per opening, from making it public to signing a new employee, and of course only if the opening could already be filled.
I have 4 tables to join:
table 1 holds the data of the opening
table 2 has the single applications
table 3 has the interview data of the applications
table 4 has the data regarding the publication of the openings (with the date when a certain opening was made public)
The problem is the duration requirement. Table 4 holds the starting point, and in table 2 one (or no) applicant per opening has a date field filled with the time he returned a signed contract, which is when the opening counts as filled. When I use that field in a DATEDIFF I'm forced to also put that column in the GROUP BY clause, and that results in 2 rows per opening: 1 row has all the numbers as wanted, and the second row is always that one person who has an entry in that date field...
So far I haven't come up with a way of avoiding that problem, except for explaining to the colleague that he gets his time-to-fill number in another report.
SELECT
table1.col1 as NameOfProject,
table1.col2 as Company,
table1.col3 as OpeningType,
table1.col4 as ReasonForOpening,
count (table2.col2) as NumberOfApplications,
sum (case when table2.colSTATUS = 'withdrawn' then 1 else 0 end) as NumberOfApplicantsWhoWithdrew,
sum (case when table3.colTypeInterview = 'PhoneInterview' then 1 else 0 end) as NumberOfPhoneInterview,
...more sum columns...,
table1.finished, -- shows '1' if the opening is filled
DATEDIFF(day, table4.colValidFrom, table2.colContractReceived) as DaysToCompletion
FROM
table2 left join table3 on table2.REF_NR = table3.REF_NR
join table1 on table2.PROJEKT = table1.KBEZ
left join table4 on table1.REFNR = table4.PRJ_REFNR
GROUP BY
table2.colContractReceived,
-- ...and all other columns except the ones in aggregate (sum and count) functions go in the GROUP BY section
ORDER BY NameOfProject
Here is a short rebuild of what it looks like. First a row where the opening is not filled and all aggregations come out in one row as wanted. The next project/opening shows up double, because the field used in the datediff is grouped independently...
project   company  no_of_applications  no_of_phoneinterview  no_of_personalinterview  ...  time_to_fill_in_days  filled?
2018_312  comp a   27                  4                     2                        ...  null                  0
2018_313  comp b   54                  7                     4                        ...  null                  0
2018_313  comp b   1                   1                     1                        ...  42                    1
I'd be glad to get any idea how to solve this. Thanks for considering my request!
(During the 'translation' of all the specific column and table names I might have built in a syntax error here and there, but the query worked well except for that unwanted extra aggregation per filled opening.)
If I've understood your requirement properly, I believe the issue is that you need to show the number of days between the starting point and the time at which an applicant responded to an opening, while only returning a single row per opening: if the position was filled, show the filled row, otherwise show the unfilled row.
I've achieved this result by assuming that you count a position as filled using the "ContractsReceived" column. This may be wrong, but the principle should still provide what you are looking for.
I've essentially wrapped your query in a subquery, applied a ranking ordered by the ContractsReceived column descending and partitioned by the project, and then in the outer query I filter for the first instance of this ranking.
Even if my assumption about the column structure and data types is wrong, this should provide you with a model to work with.
The only issue you might have with this ranking solution is if you want to aggregate over both rows within one (so include all of the summed columns for both the position filled and position not filled row per project). If this is the case let me know and we can work around that.
Please let me know if you have any questions.
declare @table1 table (
REFNR int,
NameOfProject nvarchar(20),
Company nvarchar(20),
OpeningType nvarchar(20),
ReasonForOpening nvarchar(20),
KBEZ int
);
declare @table2 table (
NumberOfApplications int,
Status nvarchar(15),
REF_NR int,
ReturnedApplicationDate datetime,
ContractsReceived bit,
PROJEKT int
);
declare @table3 table (
TypeInterview nvarchar(25),
REF_NR int
);
declare @table4 table (
PRJ_REFNR int,
StartingPoint datetime
);
insert into @table1 (REFNR, NameOfProject, Company, OpeningType, ReasonForOpening, KBEZ)
values (1, '2018_312', 'comp a' ,'Permanent', 'Business growth', 1),
(2, '2018_313', 'comp a', 'Permanent', 'Business growth', 2),
(3, '2018_313', 'comp a', 'Permanent', 'Business growth', 3);
insert into @table2 (NumberOfApplications, Status, REF_NR, ReturnedApplicationDate, ContractsReceived, PROJEKT)
values (27, 'Processed', 4, '2018-04-01 08:00', 0, 1),
(54, 'Withdrawn', 5, '2018-04-02 10:12', 0, 2),
(1, 'Processed', 6, '2018-04-15 15:00', 1, 3);
insert into @table3 (TypeInterview, REF_NR)
values ('Phone', 4),
('Phone', 5),
('Personal', 6);
insert into @table4 (PRJ_REFNR, StartingPoint)
values (1, '2018-02-25 08:00'),
(2, '2018-03-04 15:00'),
(3, '2018-03-04 15:00');
select * from
(
SELECT
RANK()OVER(Partition by NameOfProject, Company order by ContractsReceived desc) as rowno,
table1.NameOfProject,
table1.Company,
table1.OpeningType,
table1.ReasonForOpening,
case when ContractsReceived >0 then datediff(DAY, StartingPoint, ReturnedApplicationDate) else null end as TimeToFillInDays,
ContractsReceived Filled
FROM
@table2 table2 left join @table3 table3 on table2.REF_NR = table3.REF_NR
join @table1 table1 on table2.PROJEKT = table1.KBEZ
left join @table4 table4 on table1.REFNR = table4.PRJ_REFNR
group by NameOfProject, Company, OpeningType, ReasonForOpening, ContractsReceived,
StartingPoint, ReturnedApplicationDate
) x where rowno=1
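If you do want the filled and not-filled rows folded into a single row per project (the caveat mentioned above), one option is to drop the ranking and use conditional aggregation instead, so the contract-received date never has to appear in the GROUP BY. A reduced sketch against the same sample table variables (interview counts omitted; column names as declared above):
SELECT
    table1.NameOfProject,
    table1.Company,
    table1.OpeningType,
    table1.ReasonForOpening,
    sum(table2.NumberOfApplications) as NumberOfApplications,
    max(cast(table2.ContractsReceived as int)) as Filled,
    max(case when table2.ContractsReceived = 1
             then datediff(day, table4.StartingPoint, table2.ReturnedApplicationDate)
        end) as TimeToFillInDays        -- stays NULL for openings that were never filled
FROM @table2 table2
JOIN @table1 table1 ON table2.PROJEKT = table1.KBEZ
LEFT JOIN @table4 table4 ON table1.REFNR = table4.PRJ_REFNR
GROUP BY table1.NameOfProject, table1.Company, table1.OpeningType, table1.ReasonForOpening
ORDER BY table1.NameOfProject;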