Several joins in a query: is it possible to replace them to gain performance? - sql

I have a table of 10 million rows in which I am trying to find the first/last maintainer of some machines (id), depending on some dates and on the status the machine had. My query uses six joins; is there a preferred alternative?
EDIT: The original table has an index. I am trying to optimise the query by replacing the joins, if that is possible.
An example is available on SQL Fiddle.
EDIT (added additional information below):
Example table:
CREATE TABLE vendor_info (
id INT,
datestamp INT,
statuz INT,
maintainer VARCHAR(25));
INSERT INTO vendor_info VALUES (1, 20180101, 0, 'Jay');
INSERT INTO vendor_info VALUES (2, 20180101, 0, 'Eric');
INSERT INTO vendor_info VALUES (3, 20180101, 1, 'David');
INSERT INTO vendor_info VALUES (1, 20180201, 1, 'Jay');
INSERT INTO vendor_info VALUES (2, 20180201, 0, 'Jay');
INSERT INTO vendor_info VALUES (3, 20180201, 1, 'Jay');
INSERT INTO vendor_info VALUES (1, 20180301, 1, 'Jay');
INSERT INTO vendor_info VALUES (2, 20180301, 1, 'David');
INSERT INTO vendor_info VALUES (3, 20180301, 1, 'Eric');
Query and desired output:
SELECT
id
, MIN(datestamp) AS min_datestamp
, MAX(datestamp) AS max_datestamp
, MAX(case when statuz = 0 then datestamp end) AS max_s0_date
, MAX(case when statuz = 1 then datestamp end) AS max_s1_date
, MIN(case when statuz = 0 then datestamp end) AS min_s0_date
, MIN(case when statuz = 1 then datestamp end) AS min_s1_date
INTO vendor_dates
FROM vendor_info
GROUP BY id;
SELECT
vd.id
, v1.maintainer AS first_maintainer
, v2.maintainer AS last_maintainer
, v3.maintainer AS last_s0_maintainer
, v4.maintainer AS last_s1_maintainer
, v5.maintainer AS first_s0_maintainer
, v6.maintainer AS first_s1_maintainer
FROM vendor_dates vd
LEFT JOIN vendor_info v1 ON vd.id = v1.id AND vd.min_datestamp = v1.datestamp
LEFT JOIN vendor_info v2 ON vd.id = v2.id AND vd.max_datestamp = v2.datestamp
LEFT JOIN vendor_info v3 ON vd.id = v3.id AND vd.max_s0_date = v3.datestamp
LEFT JOIN vendor_info v4 ON vd.id = v4.id AND vd.max_s1_date = v4.datestamp
LEFT JOIN vendor_info v5 ON vd.id = v5.id AND vd.min_s0_date = v5.datestamp
LEFT JOIN vendor_info v6 ON vd.id = v6.id AND vd.min_s1_date = v6.datestamp;

Adding an index (a clustered primary key) to vendor_info reduces the duration of your second query from over 300 ms to under 30 ms, averaged over repeated runs:
PRIMARY KEY CLUSTERED (id, datestamp)
Changing the two-step process into a CTE reduces the total duration further, to well under 15 ms averaged over repeated runs.
The CTE method lets the query optimiser use the new primary key.
CREATE TABLE vendor_info (
id INT,
datestamp INT,
statuz INT,
maintainer VARCHAR(25),
PRIMARY KEY CLUSTERED (id, datestamp)
);
INSERT INTO vendor_info VALUES (1, 20180101, 0, 'Jay');
INSERT INTO vendor_info VALUES (2, 20180101, 0, 'Eric');
INSERT INTO vendor_info VALUES (3, 20180101, 1, 'David');
INSERT INTO vendor_info VALUES (1, 20180201, 1, 'Jay');
INSERT INTO vendor_info VALUES (2, 20180201, 0, 'Jay');
INSERT INTO vendor_info VALUES (3, 20180201, 1, 'Jay');
INSERT INTO vendor_info VALUES (1, 20180301, 1, 'Jay');
INSERT INTO vendor_info VALUES (2, 20180301, 1, 'David');
INSERT INTO vendor_info VALUES (3, 20180301, 1, 'Eric');
WITH vendor_dates AS
(SELECT
id
, MIN(datestamp) AS min_datestamp
, MAX(datestamp) AS max_datestamp
, MAX(case when statuz = 0 then datestamp end) AS max_s0_date
, MAX(case when statuz = 1 then datestamp end) AS max_s1_date
, MIN(case when statuz = 0 then datestamp end) AS min_s0_date
, MIN(case when statuz = 1 then datestamp end) AS min_s1_date
FROM vendor_info
GROUP BY id
)
SELECT
vd.id
, v1.maintainer AS first_maintainer
, v2.maintainer AS last_maintainer
, v3.maintainer AS last_s0_maintainer
, v4.maintainer AS last_s1_maintainer
, v5.maintainer AS first_s0_maintainer
, v6.maintainer AS first_s1_maintainer
FROM vendor_dates vd
LEFT JOIN vendor_info v1 ON vd.id = v1.id AND vd.min_datestamp = v1.datestamp
LEFT JOIN vendor_info v2 ON vd.id = v2.id AND vd.max_datestamp = v2.datestamp
LEFT JOIN vendor_info v3 ON vd.id = v3.id AND vd.max_s0_date = v3.datestamp
LEFT JOIN vendor_info v4 ON vd.id = v4.id AND vd.max_s1_date = v4.datestamp
LEFT JOIN vendor_info v5 ON vd.id = v5.id AND vd.min_s0_date = v5.datestamp
LEFT JOIN vendor_info v6 ON vd.id = v6.id AND vd.min_s1_date = v6.datestamp;

Check the following query.
WITH
a AS (
SELECT
id, datestamp, maintainer, statuz,
MIN(datestamp) OVER(PARTITION BY id) AS fm,
MAX(datestamp) OVER(PARTITION BY id) AS lm,
MIN(datestamp) OVER(PARTITION BY id, statuz) AS fZm,
MAX(datestamp) OVER(PARTITION BY id, statuz) AS lZm
FROM vendor_info
)
SELECT
id,
MIN(IIF(datestamp = fm, maintainer, NULL)) AS first_maintainer,
MAX(IIF(datestamp = lm, maintainer, NULL)) AS last_maintainer,
MAX(IIF(datestamp = lZm AND statuz = 0, maintainer, NULL)) AS last_s0_maintainer,
MAX(IIF(datestamp = lZm AND statuz = 1, maintainer, NULL)) AS last_s1_maintainer,
MIN(IIF(datestamp = fZm AND statuz = 0, maintainer, NULL)) AS first_s0_maintainer,
MIN(IIF(datestamp = fZm AND statuz = 1, maintainer, NULL)) AS first_s1_maintainer
FROM a
GROUP BY id;
It can be tested on SQL Fiddle.

I haven't had time yet to generate 10 million test records, but try this with an index on (id, datestamp). I have hopes for it, as the execution plan looked good. Edit: with 50 million records that I generated, it looked pretty fast as long as the (id, datestamp) index (or another suitable index) is there.
SELECT tID.id, V1.first_maintainer, V2.last_maintainer, V3.last_s0_maintainer, V4.last_s1_maintainer, V5.first_s0_maintainer, V6.first_s1_maintainer
FROM (SELECT DISTINCT ID from vendor_info) tID
OUTER APPLY
(SELECT TOP 1 vi1.maintainer first_maintainer
FROM vendor_info vi1
WHERE vi1.id = tID.id
ORDER BY vi1.datestamp ASC) V1
OUTER APPLY
(SELECT TOP 1 vi2.maintainer last_maintainer
FROM vendor_info vi2
WHERE vi2.id = tID.id
ORDER BY vi2.datestamp DESC) V2
OUTER APPLY
(SELECT TOP 1 vi3.maintainer last_s0_maintainer
FROM vendor_info vi3
WHERE vi3.statuz = 0 AND vi3.id = tID.id
ORDER BY vi3.datestamp DESC) V3
OUTER APPLY
(SELECT TOP 1 vi4.maintainer last_s1_maintainer
FROM vendor_info vi4
WHERE vi4.statuz = 1 AND vi4.id = tID.id
ORDER BY vi4.datestamp DESC) V4
OUTER APPLY
(SELECT TOP 1 vi5.maintainer first_s0_maintainer
FROM vendor_info vi5
WHERE vi5.statuz = 0 AND vi5.id = tID.id
ORDER BY vi5.datestamp ASC) V5
OUTER APPLY
(SELECT TOP 1 vi6.maintainer first_s1_maintainer
FROM vendor_info vi6
WHERE vi6.statuz = 1 AND vi6.id = tID.id
ORDER BY vi6.datestamp ASC) V6

I'd go with Andrei Odegov's answer.
The perfect solution would be an aggregation function that gives you the name for the maximum or minimum date, like Oracle's KEEP FIRST/LAST. SQL Server doesn't feature such function, so using window functions as shown by Andrei Odegov seems the best solution.
If this is still too slow, it may be worth a try to concatenate days and names and look for MIN/MAX of these (e.g. '20180101Eric' < '20180201Jay'), then extract the names. A lot of string manipulation, but simple aggregation, and you must read the whole table anyway.
WITH vi AS
(
SELECT
id,
statuz,
CONVERT(VARCHAR, datestamp) + maintainer AS date_and_name
FROM vendor_info
)
SELECT
id
, SUBSTRING(MIN(date_and_name), 9, 100) AS first_maintainer
, SUBSTRING(MAX(date_and_name), 9, 100) AS last_maintainer
, SUBSTRING(MAX(case when statuz = 0 then date_and_name end), 9, 100) AS last_s0_maintainer
, SUBSTRING(MAX(case when statuz = 1 then date_and_name end), 9, 100) AS last_s1_maintainer
, SUBSTRING(MIN(case when statuz = 0 then date_and_name end), 9, 100) AS first_s0_maintainer
, SUBSTRING(MIN(case when statuz = 1 then date_and_name end), 9, 100) AS first_s1_maintainer
FROM vi
GROUP BY id
ORDER BY id;
(If you store the dates as dates and not as integers as shown in your SQL fiddle, then you'll have to change CONVERT and maybe SUBSTRING accordingly.)
SQL fiddle: http://sqlfiddle.com/#!18/9ee2c7/46
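The concatenation trick can also be tried outside SQL Server. The following is a minimal sketch that assumes SQLite (through Python's sqlite3 module) as a stand-in engine, so CONVERT and SUBSTRING become CAST, || and SUBSTR; the sample rows are the ones from the question.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server, so the syntax differs slightly
# (CAST ... AS TEXT and || instead of CONVERT and +, SUBSTR instead of SUBSTRING).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vendor_info (id INT, datestamp INT, statuz INT, maintainer VARCHAR(25));
INSERT INTO vendor_info VALUES
  (1, 20180101, 0, 'Jay'),  (2, 20180101, 0, 'Eric'),  (3, 20180101, 1, 'David'),
  (1, 20180201, 1, 'Jay'),  (2, 20180201, 0, 'Jay'),   (3, 20180201, 1, 'Jay'),
  (1, 20180301, 1, 'Jay'),  (2, 20180301, 1, 'David'), (3, 20180301, 1, 'Eric');
""")

# The 8-digit date prefix makes string order equal to date order, so MIN/MAX of
# the concatenated value picks the row with the earliest/latest date.
query = """
WITH vi AS (
  SELECT id, statuz, CAST(datestamp AS TEXT) || maintainer AS date_and_name
  FROM vendor_info
)
SELECT id,
  SUBSTR(MIN(date_and_name), 9) AS first_maintainer,
  SUBSTR(MAX(date_and_name), 9) AS last_maintainer,
  SUBSTR(MAX(CASE WHEN statuz = 0 THEN date_and_name END), 9) AS last_s0_maintainer,
  SUBSTR(MAX(CASE WHEN statuz = 1 THEN date_and_name END), 9) AS last_s1_maintainer,
  SUBSTR(MIN(CASE WHEN statuz = 0 THEN date_and_name END), 9) AS first_s0_maintainer,
  SUBSTR(MIN(CASE WHEN statuz = 1 THEN date_and_name END), 9) AS first_s1_maintainer
FROM vi
GROUP BY id
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Machine 3 never has status 0, so its two s0 columns come back as NULL (None in Python), the same as the LEFT JOIN version would give.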

Also it's possible to use UNPIVOT/JOIN/PIVOT:
WITH
a AS (
SELECT
id, statuz,
MIN(datestamp) AS fzm, MAX(datestamp) AS lzm,
MIN(MIN(datestamp)) OVER(PARTITION BY id) AS fm,
MAX(MAX(datestamp)) OVER(PARTITION BY id) AS lm
FROM vendor_info
GROUP BY id, statuz
),
b AS (
SELECT
v.id,
up.[type] + IIF(up.[type] IN('fm', 'lm'), '', STR(up.statuz, 1)) AS p,
v.maintainer
FROM a
UNPIVOT(datestamp FOR [type] IN(fm, lm, fzm, lzm)) AS up
JOIN vendor_info v
ON up.id = v.id AND up.datestamp = v.datestamp
)
SELECT
id,
fm AS first_maintainer, lm AS last_maintainer,
lzm0 AS last_s0_maintainer, lzm1 AS last_s1_maintainer,
fzm0 AS first_s0_maintainer, fzm1 AS first_s1_maintainer
FROM b
PIVOT(MIN(maintainer) FOR p IN(fm, lm, lzm0, lzm1, fzm0, fzm1)) AS p;
It can be tested on SQL Fiddle.

Related

SQL to select the 'first' date a project was made inactive for all projects

I am trying to work out the SQL I would need to select certain records, here is an example of what I'm trying to do:
Project number   Active/Inactive   Date
----------------------------------------
1                A                 1/1/20
1                I                 3/1/20
1                A                 5/1/20
1                I                 7/1/20
1                I                 9/1/20
2                I                 1/1/19
2                A                 5/1/19
3                A                 1/3/20
3                I                 3/3/20
3                I                 5/3/20
Note: A=Active project, I=Inactive.
What I would like to do is for each project where the project is currently inactive (i.e. the latest date for the project in the above table is set to I), return the row of the longest time ago it was made inactive, but NOT before it was last active (hope this is understandable!). So for the above table the following would be returned:
Project number   Active/Inactive   Date
----------------------------------------
1                I                 7/1/20
3                I                 3/3/20
So proj number 1 is inactive and the earliest time it was made inactive (after the last time it was active) is 7/1/20. Project 2 is not selected as it is currently active. Project 3 is inactive and the earliest time it was made inactive (after the last time it was active) is 3/3/20.
Thanks.
You could use the 'row_number' function to help you.
create TABLE #PROJECT(ProjectNumber int, [Status] varchar(1), [Date] date)
INSERT INTO #PROJECT VALUES
(1 ,'A' ,'1/1/20'),
(1 ,'I' ,'3/1/20'),
(1 ,'A' ,'5/1/20'),
(1 ,'I' ,'7/1/20'),
(1 ,'I' ,'9/1/20'),
(2 ,'I' ,'1/1/19'),
(2 ,'A' ,'5/1/19'),
(3 ,'A' ,'1/3/20'),
(3 ,'I' ,'3/3/20'),
(3 ,'I' ,'5/3/20')
select * from
(SELECT
row_number() over (partition by projectNumber order by [date]) as [index]
,*
FROM
#PROJECT
WHERE
[STATUS] = 'I'
) as a where [index] = 1
Using some effective date joins, this should work. I am using SQL Server. Create your tables and set up the same data set you provided:
CREATE TABLE dbo.PROJECTS
(
PROJ_NUM int NULL,
STTS char(1) NULL,
STTS_DT date NULL
) ON [PRIMARY]
GO
INSERT INTO dbo.PROJECTS values (1, 'A', '1/1/20');
INSERT INTO dbo.PROJECTS values (1, 'I', '3/1/20');
INSERT INTO dbo.PROJECTS values (1, 'A', '5/1/20');
INSERT INTO dbo.PROJECTS values (1, 'I', '7/1/20');
INSERT INTO dbo.PROJECTS values (1, 'I', '9/1/20');
INSERT INTO dbo.PROJECTS values (2, 'I', '1/1/19');
INSERT INTO dbo.PROJECTS values (2, 'A', '5/1/19');
INSERT INTO dbo.PROJECTS values (3, 'A', '1/3/20');
INSERT INTO dbo.PROJECTS values (3, 'I', '3/3/20');
INSERT INTO dbo.PROJECTS values (3, 'I', '5/3/20');
Write a sub-query that filters out just to the projects that are INACTIVE:
-- sub-query that gives you projects that are inactive
SELECT PROJ_NUM, STTS, STTS_DT FROM dbo.PROJECTS CURRSTTS
WHERE STTS_DT = (SELECT MAX(STTS_DT) FROM dbo.PROJECTS ALLP WHERE ALLP.PROJ_NUM = CURRSTTS.PROJ_NUM)
AND CURRSTTS.STTS = 'I'
;
Write another sub-query that provides you the last active status date for each project:
-- sub-query that gives you last active status date for each project
SELECT PROJ_NUM, STTS, STTS_DT FROM dbo.PROJECTS LASTACTV
WHERE STTS_DT = (SELECT MAX(STTS_DT) FROM dbo.PROJECTS ALLP WHERE ALLP.PROJ_NUM = LASTACTV.PROJ_NUM AND ALLP.STTS = 'A')
;
Combine those two sub-queries into a query that gives you the list of inactive projects with their last active status date:
-- sub-query using the 2 above to show only inactive projects with last active stts date
SELECT CURRSTTS.PROJ_NUM, CURRSTTS.STTS, CURRSTTS.STTS_DT, LASTACTV.STTS_DT AS LASTACTV_STTS_DT FROM dbo.PROJECTS CURRSTTS
INNER JOIN
(SELECT PROJ_NUM, STTS, STTS_DT FROM dbo.PROJECTS LASTACTV
WHERE STTS_DT = (SELECT MAX(STTS_DT) FROM dbo.PROJECTS ALLP WHERE ALLP.PROJ_NUM = LASTACTV.PROJ_NUM AND ALLP.STTS = 'A'))
LASTACTV ON CURRSTTS.PROJ_NUM = LASTACTV.PROJ_NUM
WHERE CURRSTTS.STTS_DT = (SELECT MAX(STTS_DT) FROM dbo.PROJECTS ALLP WHERE ALLP.PROJ_NUM = CURRSTTS.PROJ_NUM)
AND CURRSTTS.STTS = 'I'
Add one more layer to the query that selects the MIN(STTS_DT) that is greater than the LASTACTV_STTS_DT:
-- final query that uses above sub-query
SELECT P.PROJ_NUM, P.STTS, P.STTS_DT
FROM dbo.PROJECTS P
INNER JOIN (
SELECT CURRSTTS.PROJ_NUM, CURRSTTS.STTS, CURRSTTS.STTS_DT, LASTACTV.STTS_DT AS LASTACTV_STTS_DT FROM dbo.PROJECTS CURRSTTS
INNER JOIN
(SELECT PROJ_NUM, STTS, STTS_DT FROM dbo.PROJECTS LASTACTV
WHERE STTS_DT = (SELECT MAX(STTS_DT) FROM dbo.PROJECTS ALLP WHERE ALLP.PROJ_NUM = LASTACTV.PROJ_NUM AND ALLP.STTS = 'A'))
LASTACTV ON CURRSTTS.PROJ_NUM = LASTACTV.PROJ_NUM
WHERE CURRSTTS.STTS_DT = (SELECT MAX(STTS_DT) FROM dbo.PROJECTS ALLP WHERE ALLP.PROJ_NUM = CURRSTTS.PROJ_NUM)
AND CURRSTTS.STTS = 'I'
) SUB ON SUB.PROJ_NUM = P.PROJ_NUM
WHERE P.STTS_DT = (SELECT MIN(STTS_DT) FROM dbo.PROJECTS ALLP WHERE ALLP.PROJ_NUM = P.PROJ_NUM AND ALLP.STTS_DT > SUB.LASTACTV_STTS_DT)
The result I get back matches your desired result.
"Greatest-n-per-group" is the term to look up when you run across a problem like this again. Here is a query that will get what you need in PostgreSQL.
Note that I changed your status column to a boolean, but you will get the gist.
with most_recent_projects as (
select project_number, max(date) date from testtable group by project_number
),
currently_inactive_projects as (
select t.project_number, t.date from testtable t join most_recent_projects mrp on t.project_number = mrp.project_number and t.date = mrp.date where not t.active
),
last_active_date as (
select project_number, date from (
select t.project_number, rank() OVER (
PARTITION BY t.project_number
ORDER BY t.date DESC), t.date
from currently_inactive_projects cip join testtable t on t.project_number = cip.project_number where t.active) t1 where rank = 1
)
-- oldest inactive -- ie, result
select t.project_number, t.active, min(t.date) from last_active_date lad join testtable t on lad.project_number = t.project_number and t.date > lad.date group by t.project_number, t.active;
This is a variation of the "gaps and islands" problem.
The query could look like this:
SELECT
num,
status,
MIN(date) AS date
FROM (
SELECT
*,
MAX(group_id) OVER (PARTITION BY num) AS max_group_id
FROM (
SELECT
*,
SUM(CASE WHEN status = prev_status THEN 0 ELSE 1 END) OVER (PARTITION BY num ORDER BY date) AS group_id
FROM (
SELECT
*,
LAG(status) OVER (PARTITION BY num ORDER BY date) AS prev_status
FROM projects
) groups
) islands
) q
WHERE status = 'I' AND group_id = max_group_id
GROUP BY num, status
ORDER BY num
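To see the island numbering at work, here is a small sketch that runs the same idea against the sample data. It assumes SQLite (through Python's sqlite3 module, with window-function support) as the engine, reads the question's dates as day/month/year, and stores them as ISO strings so that text order matches date order. The CTE names with_prev, islands and ranked are only illustrative.

```python
import sqlite3

# Assumptions: SQLite 3.25+ stands in for the real database, and the dates in
# the question are day/month/year, stored here as ISO strings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (num INT, status TEXT, date TEXT);
INSERT INTO projects VALUES
  (1, 'A', '2020-01-01'), (1, 'I', '2020-01-03'), (1, 'A', '2020-01-05'),
  (1, 'I', '2020-01-07'), (1, 'I', '2020-01-09'),
  (2, 'I', '2019-01-01'), (2, 'A', '2019-01-05'),
  (3, 'A', '2020-03-01'), (3, 'I', '2020-03-03'), (3, 'I', '2020-03-05');
""")

query = """
WITH with_prev AS (
  -- previous status within each project, NULL for the first row
  SELECT num, status, date,
         LAG(status) OVER (PARTITION BY num ORDER BY date) AS prev_status
  FROM projects
),
islands AS (
  -- the running count of status changes gives each run of equal statuses one id
  SELECT num, status, date,
         SUM(CASE WHEN status = prev_status THEN 0 ELSE 1 END)
           OVER (PARTITION BY num ORDER BY date) AS group_id
  FROM with_prev
),
ranked AS (
  SELECT num, status, date, group_id,
         MAX(group_id) OVER (PARTITION BY num) AS max_group_id
  FROM islands
)
-- keep only the last island, and only when that island is inactive
SELECT num, status, MIN(date) AS date
FROM ranked
WHERE status = 'I' AND group_id = max_group_id
GROUP BY num, status
ORDER BY num
"""
result = conn.execute(query).fetchall()
print(result)
```

With the sample data this returns project 1 with 2020-01-07 and project 3 with 2020-03-03; project 2 is dropped because its last island is active.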
Another approach using CTEs
WITH last_status AS (
SELECT
*
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY num ORDER BY date DESC) AS rn
FROM projects
) rns
WHERE rn = 1
),
last_active AS (
SELECT
num,
MAX(date) AS date
FROM projects
WHERE status = 'A'
GROUP BY num
),
last_inactive AS (
SELECT
p.num,
MIN(p.date) AS date
FROM projects p
WHERE p.status = 'I'
AND (
EXISTS (
SELECT 1 FROM last_active la
WHERE la.num = p.num AND la.date < p.date
)
OR NOT EXISTS (
SELECT 1 FROM last_active la
WHERE la.num = p.num
)
)
GROUP BY num
)
SELECT
ls.num,
ls.status,
li.date
FROM last_status ls
JOIN last_inactive li ON li.num = ls.num
WHERE ls.status = 'I'
You can check a working demo with both queries here

SQL - Getting Sum of 'X' Consecutive Values where X is an Integer in another Row (With Categories)

Say for example, I wanted to SUM all the values from the current row until the provided count. See table below:
For example:
Category A, Row 1: 10+15+25 = 50 (because it adds Rows 1 to 3 due to Count)
Category A, Row 2: 15+25+30+40 = 110 (because it adds Rows 2 to 5 due to count)
Category A, Row 5: 40+60 = 100 (because it adds Rows 5 and 6; the count is 5, but the category ends at Row 6, so it sums all available data, which is Rows 5 and 6 only, giving a value of 100).
Same goes for Category B.
How do I do this?
You can do this using window functions:
with tt as (
select t.*,
sum(quantity) over (partition by category order by rownumber) as running_quantity,
max(rownumber) over (partition by category) as max_rownumber
from t
)
select tt.*,
coalesce(tt2.running_quantity, ttlast.running_quantity) - tt.running_quantity + tt.quantity
from tt left join
tt tt2
on tt2.category = tt.category and
tt2.rownumber = tt.rownumber + tt.count - 1 left join
tt ttlast
on ttlast.category = tt.category and
ttlast.rownumber = ttlast.max_rownumber
order by category, rownumber;
I can imagine that under some circumstances this would be much faster -- particularly if the count values are relatively large. For small values of count, the lateral join is probably faster, but it is worth checking if performance is important.
Actually, a pure window functions approach is probably the best approach:
with tt as (
select t.*,
sum(quantity) over (partition by category order by rownumber) as running_quantity
from t
)
select tt.*,
(coalesce(lead(tt.running_quantity, tt.count - 1) over (partition by tt.category order by tt.rownumber),
first_value(tt.running_quantity) over (partition by tt.category order by tt.rownumber desc)
) - tt.running_quantity + tt.quantity
)
from tt
order by category, rownumber;
Here is a db<>fiddle.
Try this:
CREATE TABLE #DataSource
(
[Category] CHAR(1)
,[Row Number] BIGINT
,[Quantity] INT
,[Count] INT
);
INSERT INTO #DataSource ([Category], [Row Number], [Quantity], [Count])
VALUES ('A', 1, 10, 3)
,('A', 2, 15, 4)
,('A', 3, 25, 2)
,('A', 4, 30, 1)
,('A', 5, 40, 5)
,('A', 6, 60, 2)
--
,('B', 1, 12, 2)
,('B', 2, 13, 3)
,('B', 3, 17, 1)
,('B', 4, 11, 2)
,('B', 5, 10, 5)
,('B', 6, 7, 3);
SELECT *
FROM #DataSource E
CROSS APPLY
(
SELECT SUM(I.[Quantity])
FROM #DataSource I
WHERE I.[Row Number] <= E.[Row Number] + E.[Count] - 1
AND I.[Row Number] >= E.[Row Number]
AND E.[Category] = I.[Category]
) DS ([Sum]);
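The worked examples from the question (50, 110 and 100 for Category A) can be checked with a small sketch. It assumes SQLite through Python's sqlite3 module, where a correlated scalar subquery plays the role of CROSS APPLY; the table name t and the column names row_no and cnt are hypothetical stand-ins for [Row Number] and [Count].

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; a correlated subquery replaces
# CROSS APPLY. Table and column names (t, row_no, cnt) are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (category TEXT, row_no INT, quantity INT, cnt INT);
INSERT INTO t VALUES
  ('A', 1, 10, 3), ('A', 2, 15, 4), ('A', 3, 25, 2),
  ('A', 4, 30, 1), ('A', 5, 40, 5), ('A', 6, 60, 2),
  ('B', 1, 12, 2), ('B', 2, 13, 3), ('B', 3, 17, 1),
  ('B', 4, 11, 2), ('B', 5, 10, 5), ('B', 6, 7, 3);
""")

# For each row, sum the quantities of rows row_no .. row_no + cnt - 1 in the
# same category; rows past the end of the category simply do not exist.
query = """
SELECT e.category, e.row_no, e.quantity, e.cnt,
       (SELECT SUM(i.quantity)
        FROM t AS i
        WHERE i.category = e.category
          AND i.row_no BETWEEN e.row_no AND e.row_no + e.cnt - 1) AS window_sum
FROM t AS e
ORDER BY e.category, e.row_no
"""
sums_by_category = {}
for category, row_no, quantity, cnt, window_sum in conn.execute(query):
    sums_by_category.setdefault(category, []).append(window_sum)
print(sums_by_category)
```

For Category A this gives 50, 110, 55, 30, 100 and 60, which agrees with the three examples spelled out in the question.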

How can I split a varchar column to different columns

I have a database table that holds user-specified data for customer orders.
Instead of making a column per custom field, the writer of the software made a three-column system like this:
orderline_ID Field_ID Value
--------------------------------
1 1 50
1 2 today
1 3 green
2 1 80
2 2 next week
2 3 60
I want this data sorted like this:
Orderline_ID 1 2 3
----------------------------------------
1 50 today green
2 80 next week 60
so I can join it in another query I use.
But the code I wrote came up like
Orderline_ID 1 2 3
-----------------------------------------
1 50 NULL NULL
1 NULL today NULL
1 NULL NULL green
2 80 NULL NULL
2 NULL next week NULL
2 NULL NULL 60
and when I sort by Orderline_ID it results in an error.
The code I used:
SELECT
fldVerkoopOrderRegelID,
(SELECT VOG.fldWaarde
WHERE (VOG.fldVeldNummer = 1) AND (VOG.fldWaarde IS NOT NULL)) AS [aantal vaten],
(SELECT VOG.fldWaarde
WHERE (VOG.fldVeldNummer = 2) AND (VOG.fldWaarde IS NOT NULL)) AS [Vat nett0],
(SELECT VOG.fldWaarde
WHERE (VOG.fldVeldNummer = 3) AND (VOG.fldWaarde IS NOT NULL)) AS [Vat bruto],
(SELECT VOG.fldWaarde
WHERE (VOG.fldVeldNummer = 4) AND (VOG.fldWaarde IS NOT NULL)) AS [cust product code],
(SELECT VOG.fldWaarde
WHERE (VOG.fldVeldNummer = 5) AND (VOG.fldWaarde IS NOT NULL)) AS [extra text],
(SELECT VOG.fldWaarde
WHERE (VOG.fldVeldNummer = 6) AND (VOG.fldWaarde IS NOT NULL)) AS [HS code]
FROM
dbo.tblVerkoopOrderIngaveGegeven AS VOG
WHERE
(fldVerkoopOrderRegelID IS NOT NULL)
This is achievable using left joins.
select t1.orderline_id, t1.Value, t2.Value, t3.Value
from tblVerkoopOrderIngaveGegeven t1
left join tblVerkoopOrderIngaveGegeven t2 on t2.orderline_id = t1.orderline_id and t2.field_id = 2
left join tblVerkoopOrderIngaveGegeven t3 on t3.orderline_id = t1.orderline_id and t3.field_id = 3
where t1.field_id = 1
You can select the distinct order IDs and then left join the table three times, once for each field you need (1, 2, 3):
CREATE TABLE #Orders (
[Orderline_ID] INT,
[Field_ID] INT,
[Value] VARCHAR(MAX)
)
INSERT INTO #Orders SELECT 1, 1, '50'
INSERT INTO #Orders SELECT 1, 2, 'today'
INSERT INTO #Orders SELECT 1, 3, 'green'
INSERT INTO #Orders SELECT 2, 1, '80'
INSERT INTO #Orders SELECT 2, 2, 'next week'
INSERT INTO #Orders SELECT 2, 3, '60'
SELECT
[T].[Orderline_ID],
[T1].[C1],
[T2].[C2],
[T3].[C3]
FROM
(SELECT DISTINCT [Orderline_ID] FROM #Orders ) AS [T]
LEFT JOIN (SELECT [Orderline_ID], [Field_ID], [Value] AS [C1] FROM #Orders) AS [T1] ON ([T].[Orderline_ID] = [T1].[Orderline_ID] AND [T1].[Field_ID] = 1)
LEFT JOIN (SELECT [Orderline_ID], [Field_ID], [Value] AS [C2] FROM #Orders) AS [T2] ON ([T].[Orderline_ID] = [T2].[Orderline_ID] AND [T2].[Field_ID] = 2)
LEFT JOIN (SELECT [Orderline_ID], [Field_ID], [Value] AS [C3] FROM #Orders) AS [T3] ON ([T].[Orderline_ID] = [T3].[Orderline_ID] AND [T3].[Field_ID] = 3)
Using PIVOT is also a way to achieve this.
SELECT orderline_ID,
[1] AS [total barrels],
[2] AS [vat netto],
[3] AS [vat bruto]
FROM
(
SELECT orderline_ID, Field_ID, [Value]
FROM YourSaleOrderInputDataTable
WHERE Field_ID IN (1, 2, 3) -- optional criteria
) AS src
PIVOT
(
MAX([Value])
FOR Field_ID IN ([1], [2], [3])
) AS pvt
ORDER BY orderline_ID;
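On engines without PIVOT, the same reshaping is commonly written as conditional aggregation. The sketch below assumes SQLite through Python's sqlite3 module and uses a hypothetical table name orders with the sample rows from the question; the output column names field_1 to field_3 are placeholders.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; the table name "orders" and the
# output column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orderline_id INT, field_id INT, value TEXT);
INSERT INTO orders VALUES
  (1, 1, '50'), (1, 2, 'today'),     (1, 3, 'green'),
  (2, 1, '80'), (2, 2, 'next week'), (2, 3, '60');
""")

# Each CASE keeps the value for one field and NULL otherwise; MAX then
# collapses the rows of an order line into a single row.
query = """
SELECT orderline_id,
       MAX(CASE WHEN field_id = 1 THEN value END) AS field_1,
       MAX(CASE WHEN field_id = 2 THEN value END) AS field_2,
       MAX(CASE WHEN field_id = 3 THEN value END) AS field_3
FROM orders
GROUP BY orderline_id
ORDER BY orderline_id
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

The GROUP BY is what removes the staircase of NULLs shown in the question: without it, every source row produces its own output row.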

TSQL Pivoting multiple columns

Code:
CREATE TABLE #Employee
(
[Employee_Id] INT IDENTITY(1, 1)
, [Code] NVARCHAR(10)
) ;
INSERT INTO #Employee
VALUES ( N'E1' ), ( N'E2' ), ( N'E3' ) ;
CREATE TABLE #Contact
(
[Employee_Id] INT
, [PhoneType] CHAR(1)
, [PhoneNumber] VARCHAR(20)
, [IsMainNumber] BIT
) ;
INSERT INTO #Contact
VALUES (1, 'M', '1234567890', 1), (1, 'H', '1234567891', 0),
(1, 'M', '1234567892', 0), (1, 'B', '1234567893', 0),
(2, 'M', '2234567890', 0), (2, 'H', '2234567891', 1),
(2, 'B', '2234567892', 0), (2, 'M', '2234567893', 0),
(3, 'M', '3234567890', 0), (3, 'H', '3234567891', 0),
(3, 'M', '3234567892', 0), (3, 'B', '3234567893', 1);
SELECT
[E].[Employee_Id],
[E].[Code],
[COA].[MainPhoneNumber],
[COA].[NonMainNumber]
FROM
#Employee AS [E]
OUTER APPLY
(SELECT
MAX (IIF([C].[IsMainNumber] = 1, [C].[PhoneNumber], NULL)) [MainPhoneNumber],
MAX (IIF([C].[IsMainNumber] = 0, [C].[PhoneNumber], NULL)) [NonMainNumber]
FROM
#Contact AS [C]
WHERE
[E].[Employee_Id] = [C].[Employee_Id]
GROUP BY
[C].[Employee_Id]) AS [COA] ;
Current output
Employee_Id Code MainPhoneNumber NonMainNumber
1 E1 1234567890 1234567893
2 E2 2234567891 2234567893
3 E3 3234567893 3234567892
Goal
I need to return the MAX main phone number and its phone type and MAX non-main phone number and its phone type. I'm able to get the MAX main/non-main phone numbers, but need to somehow get their phone types. I don't want to make two additional joins based on Employee_Id and PhoneNumber and get the type, because original table is huge and that would slow things down a lot. Trying to figure out an alternative that performs well.
Desired Output
Employee_Id Code MainPhoneType MainPhoneNumber NonMainPhoneType NonMainNumber
1 E1 M 1234567890 B 1234567893
2 E2 H 2234567891 M 2234567893
3 E3 B 3234567893 M 3234567892
It seems you need two APPLY operators:
select e.Employee_Id, e.Code,
c.PhoneType as MainPhoneType, c.PhoneNumber as MainPhoneNumber,
c1.PhoneType as NonMainPhoneType, c1.PhoneNumber as NonMainNumber
from #Employee e outer apply
(select top (1) c.PhoneType, c.PhoneNumber
from #Contact c
where c.Employee_Id = e.Employee_Id and
c.IsMainNumber = 1
order by c.phonetype
) c outer apply
(select top (1) c1.PhoneType, c1.PhoneNumber
from #Contact c1
where c1.Employee_Id = e.Employee_Id and
c1.IsMainNumber = 0
order by c1.phonetype
) c1;
If you don't want to JOIN twice, you can dump the relevant contacts into a temp table with a suitable index, for example an index on
#temp (Employee_Id, IsMainNumber) include (PhoneType, PhoneNumber)
insert into #temp (Employee_Id, PhoneType, PhoneNumber, IsMainNumber)
select Employee_Id, PhoneType, PhoneNumber, IsMainNumber
from (select *, row_number() over (partition by Employee_Id, IsMainNumber order by PhoneType) as seq
from #Contact
) c
where seq = 1
Now you don't need to use #Contact again:
select e.*, m.*
from #Employee e cross apply
(select max(case when t.IsMainNumber = 1 then t.PhoneType end) as MainPhoneType,
max(case when t.IsMainNumber = 1 then t.PhoneNumber end) as MainPhoneNumber,
max(case when t.IsMainNumber = 0 then t.PhoneType end) as NonMainPhoneType,
max(case when t.IsMainNumber = 0 then t.PhoneNumber end) as NonMainNumber
from #temp t
where t.Employee_Id = e.Employee_Id
) m;
Not really sure how you determine which nonMainNumber is the one you want. Seems that most of your sample data has several rows that could be returned. I will leave that exercise to you. Here is how you could use some conditional aggregation for this.
select x.Employee_Id
, x.Code
, MainPhoneType = max(case when x.RowNum = 1 then x.PhoneType end)
, MainPhoneNumber = max(case when x.RowNum = 1 then x.PhoneNumber end)
, NonMainPhoneType = max(case when x.RowNum = 2 then x.PhoneType end)
, NonMainPhoneNumber = max(case when x.RowNum = 2 then x.PhoneNumber end)
from
(
select e.Employee_Id
, e.Code
, c.PhoneType
, c.PhoneNumber
, RowNum = ROW_NUMBER() over(partition by e.Employee_Id order by c.IsMainNumber desc, c.PhoneType) --Not sure how you determine the non MainNumber when there are several to pick from
from #Employee e
join #Contact c on c.Employee_Id = e.Employee_Id
) x
group by x.Employee_Id
, x.Code
You can do this with conditional aggregation:
select e.Employee_Id, e.Code,
max(case when seqnum = 1 and c.PhoneType = 'M' then c.PhoneType end) as MainPhoneType,
max(case when seqnum = 1 and c.PhoneType = 'M' then c.PhoneNumber end) as MainPhoneNumber,
max(case when seqnum = 1 and c.PhoneType <> 'M' then c.PhoneType end) as NonMainPhoneType,
max(case when seqnum = 1 and c.PhoneType <> 'M' then c.PhoneNumber end) as NonMainPhoneNumber
from #Employee e join
(select c.*,
row_number() over (partition by c.Employee_Id,
(case when PhoneType = 'M' then 'M' end)
order by c.PhoneNumber desc
) as seqnum
from #Contact c
) c
on c.Employee_Id = e.Employee_Id
group by e.Employee_Id, e.Code;
The key idea in this logic is the partition by clause. It divides the two types of phones into two groups -- with 'M' for "main" and NULL for all else.
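As a cross-check against the desired output in the question, here is a sketch that partitions on IsMainNumber rather than on PhoneType and orders by the phone number descending, on the assumption that "MAX phone number" is the selection rule. It runs on SQLite through Python's sqlite3 module; the lowercase table and column names are illustrative renamings of #Employee and #Contact.

```python
import sqlite3

# Assumptions: SQLite 3.25+ stands in for SQL Server, and the wanted row per
# group is the one with the largest phone number.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (employee_id INT, code TEXT);
INSERT INTO employee VALUES (1, 'E1'), (2, 'E2'), (3, 'E3');
CREATE TABLE contact (employee_id INT, phone_type TEXT, phone_number TEXT, is_main_number INT);
INSERT INTO contact VALUES
  (1, 'M', '1234567890', 1), (1, 'H', '1234567891', 0),
  (1, 'M', '1234567892', 0), (1, 'B', '1234567893', 0),
  (2, 'M', '2234567890', 0), (2, 'H', '2234567891', 1),
  (2, 'B', '2234567892', 0), (2, 'M', '2234567893', 0),
  (3, 'M', '3234567890', 0), (3, 'H', '3234567891', 0),
  (3, 'M', '3234567892', 0), (3, 'B', '3234567893', 1);
""")

query = """
WITH ranked AS (
  -- seqnum = 1 marks the largest number among main and among non-main rows
  SELECT employee_id, phone_type, phone_number, is_main_number,
         ROW_NUMBER() OVER (PARTITION BY employee_id, is_main_number
                            ORDER BY phone_number DESC) AS seqnum
  FROM contact
)
SELECT e.employee_id, e.code,
       MAX(CASE WHEN r.is_main_number = 1 THEN r.phone_type END)   AS main_phone_type,
       MAX(CASE WHEN r.is_main_number = 1 THEN r.phone_number END) AS main_phone_number,
       MAX(CASE WHEN r.is_main_number = 0 THEN r.phone_type END)   AS non_main_phone_type,
       MAX(CASE WHEN r.is_main_number = 0 THEN r.phone_number END) AS non_main_number
FROM employee e
LEFT JOIN ranked r ON r.employee_id = e.employee_id AND r.seqnum = 1
GROUP BY e.employee_id, e.code
ORDER BY e.employee_id
"""
phones = conn.execute(query).fetchall()
for row in phones:
    print(row)
```

The contact table is read once, and because at most two rows per employee survive the seqnum = 1 filter, the phone type and number in each output pair always come from the same source row.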

Best SQL query to retrieve the data which has all required data

I have a transaction table with item details for each company. I want to write a query to retrieve the companies only having item numbers 1,2 and 3 (according to my sample code in below). Selected companies should have all 1,2,3 items. If some company has only item 1, then it shouldn't come. How can I write this?
CREATE TABLE #TmpTran
(
ID BIGINT IDENTITY,
COMPANY_ID BIGINT,
ITEM_NAME VARCHAR(50),
ITEM_NUMBER INT
)
INSERT INTO #TmpTran (COMPANY_ID, ITEM_NAME, ITEM_NUMBER)
VALUES (1, 'ABC', 1), (1, 'DEF', 2), (1, 'HIJ', 3),
(2, 'KLM', 4), (2, 'KLM', 5), (2, 'ABC', 1)
How can I get only Company 1 data using WHERE or JOIN query?
You can do this with group by and having:
select company_id
from #tmptran tt
where item_number in (1, 2, 3)
group by company_id
having count(distinct item_number) = 3;
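A quick way to convince yourself that the filter plus the distinct count behaves as intended is to run it on the sample rows. This sketch assumes SQLite through Python's sqlite3 module and a regular table named tmptran in place of the #TmpTran temp table.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server, so a regular table named
# tmptran replaces the #TmpTran temp table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tmptran (company_id INT, item_name TEXT, item_number INT);
INSERT INTO tmptran VALUES
  (1, 'ABC', 1), (1, 'DEF', 2), (1, 'HIJ', 3),
  (2, 'KLM', 4), (2, 'KLM', 5), (2, 'ABC', 1);
""")

# The WHERE clause discards other items; a company qualifies only when all
# three wanted item numbers remain, hence the distinct count of 3.
query = """
SELECT company_id
FROM tmptran
WHERE item_number IN (1, 2, 3)
GROUP BY company_id
HAVING COUNT(DISTINCT item_number) = 3
"""
companies = [company_id for (company_id,) in conn.execute(query)]
print(companies)
```

Company 2 has item 1 but not items 2 and 3, so its distinct count after filtering is 1 and it is excluded.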
Another way (more flexible approach)
select company_id
from #tmptran tt
group by company_id
having count(case when item_number = 1 then 1 end) > 0
and count(case when item_number = 2 then 1 end) > 0
and count(case when item_number = 3 then 1 end) > 0;
select tt.company_id
from #tmptran tt
where tt.item_number in (1, 2, 3)
group by tt.company_id
having max(case tt.item_number when 1 then 1 end) +
max(case tt.item_number when 2 then 1 end) +
max(case tt.item_number when 3 then 1 end) = 3
You said you have a lot of fields. Probably the easiest for the reader to follow would be something like:
select distinct tt.company_id
from #tmptran tt
where tt.item_number in (1, 2, 3)
and exists(select 1
from #tmptran ttSub
where ttSub.company_id = tt.company_id and ttSub.item_number = 1)
and exists(select 1
from #tmptran ttSub
where ttSub.company_id = tt.company_id and ttSub.item_number = 2)
and exists(select 1
from #tmptran ttSub
where ttSub.company_id = tt.company_id and ttSub.item_number = 3)