Selecting a group of dates in SQL Server - sql

I have a history table that captures updates to a certain object and, in addition to other information, captures the time this update happened. What I would like to do is SELECT the MIN(LogDate) corresponding to a certain ActionTaken column.
More specifically, the history table may have other (more recent) rows where ActionTaken = 1, but I want to capture the date ActionTaken became 1.
Example:
SELECT MIN(LogDate) AS FirstActionDate
FROM HistoryTable
WHERE ID = 123
AND FirstActionTaken = 1
SELECT MIN(LogDate) AS SecondActionDate
FROM HistoryTable
WHERE ID = 123
AND SecondActionTaken = 1
SELECT MIN(LogDate) AS ThirdActionDate
FROM HistoryTable
WHERE ID = 123
AND ThirdActionTaken = 1
This works well, and I receive the proper dates without issue. Where I'm running into trouble is then going to select the MAX(LogDate) from this group:
SELECT MAX(LogDate) AS LastActionDate
FROM HistoryTable
WHERE ID = 123
AND LogDate IN
(
( SELECT MIN(LogDate) AS FirstActionDate
FROM HistoryTable
WHERE ID = 123
AND FirstActionTaken = 1 ),
( SELECT MIN(LogDate) AS SecondActionDate
FROM HistoryTable
WHERE ID = 123
AND SecondActionTaken = 1 ),
( SELECT MIN(LogDate) AS ThirdActionDate
FROM HistoryTable
WHERE ID = 123
AND ThirdActionTaken = 1 )
)
This also works, but I hate doing it this way. I could save out the previous statements into variables and just SELECT MAX() from those; it would certainly be more readable, but what would the JOIN syntax look like for this query?
Is there a way to combine the first three SELECT statements into one that returns all three dates and isn't an unreadable mess?
How can I grab the most recent LogDate (as a separate column) from this result set and without the (seemingly unnecessary) repeating SELECT statements?
EDIT:
Here are a few links I've found in relation to the answers that have been given so far:
Data Normalization
Using OUTER/CROSS APPLY
UNPIVOT (and others)
Hopefully these will help with others looking for solutions to similar problems!

This would be easier with a normalized data structure. Here is one method that uses conditional aggregation to calculate the three minimum dates. Then it takes the maximum of those values:
SELECT v.dt
FROM (SELECT MIN(CASE WHEN FirstActionTaken = 1 THEN LogDate END) AS d1,
MIN(CASE WHEN SecondActionTaken = 1 THEN LogDate END) AS d2,
MIN(CASE WHEN ThirdActionTaken = 1 THEN LogDate END) AS d3
FROM HistoryTable
WHERE ID = 123
) ht OUTER APPLY
(SELECT MAX(dt) as dt
FROM (VALUES (d1), (d2), (d3) ) v(dt)
) v;
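To try the idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are invented, dates are stored as ISO text so MIN/MAX compare correctly, and because SQLite has no OUTER APPLY the three values are stacked with UNION ALL instead (an assumption of this sketch, not the query above).

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE HistoryTable (
    ID INTEGER, LogDate TEXT,
    FirstActionTaken INTEGER, SecondActionTaken INTEGER, ThirdActionTaken INTEGER
);
INSERT INTO HistoryTable VALUES
    (123, '2015-01-01', 1, 0, 0),
    (123, '2015-06-01', 1, 0, 0),
    (123, '2015-12-01', 1, 1, 0),
    (123, '2016-02-01', 1, 1, 0),
    (456, '2014-03-01', 1, 1, 1);
""")

QUERY = """
WITH firsts AS (
    -- One pass over the table: CASE yields NULL for non-matching rows,
    -- and MIN ignores NULLs.
    SELECT MIN(CASE WHEN FirstActionTaken  = 1 THEN LogDate END) AS d1,
           MIN(CASE WHEN SecondActionTaken = 1 THEN LogDate END) AS d2,
           MIN(CASE WHEN ThirdActionTaken  = 1 THEN LogDate END) AS d3
    FROM HistoryTable
    WHERE ID = ?
)
SELECT d1, d2, d3,
       -- Stack the three values into rows so the MAX aggregate can skip NULLs.
       (SELECT MAX(dt)
        FROM (SELECT d1 AS dt FROM firsts
              UNION ALL SELECT d2 FROM firsts
              UNION ALL SELECT d3 FROM firsts)) AS last_dt
FROM firsts
"""

row = conn.execute(QUERY, (123,)).fetchone()
print(row)
```

The third action never happened for ID 123 in this sample, so d3 is NULL and the latest first-occurrence date is the one for the second action.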

EDIT 2
Based on new information that can be gleaned from OP's own answer (about how to define the latest action date), the query can be further simplified to simply this:
select coalesce(
min(case when ThirdActionTaken = 1 then LogDate end),
min(case when SecondActionTaken = 1 then LogDate end),
min(case when FirstActionTaken = 1 then LogDate end)
) as LastActionDate
from HistoryTable
where id = 123
Unpivot can also be used:
select max(ActionDate)
from (select min(case when FirstActionTaken = 1 then LogDate end) as FirstActionDate,
min(case when SecondActionTaken = 1 then LogDate end) as SecondActionDate,
min(case when ThirdActionTaken = 1 then LogDate end) as ThirdActionDate
from HistoryTable
where id = 123) t
unpivot (ActionDate for ActionDates in (FirstActionDate, SecondActionDate, ThirdActionDate)) unpvt
EDIT: Short explanation
This answer is very similar to Gordon's in that it uses conditional aggregation to get the 3 minimum dates in one query.
So the following part of the query:
select min(case when FirstActionTaken = 1 then LogDate end) as FirstActionDate,
min(case when SecondActionTaken = 1 then LogDate end) as SecondActionDate,
min(case when ThirdActionTaken = 1 then LogDate end) as ThirdActionDate
from HistoryTable
where id = 123
...might return something like...
FirstActionDate SecondActionDate ThirdActionDate
--------------- ---------------- ---------------
2015-01-01 2015-12-01 (null)
Then, the unpivot clause is what "unpivots" the 3 columns into a result set with a single column and one row per value. Note that UNPIVOT drops NULLs, so the (null) ThirdActionDate does not produce a row:
ActionDate
----------
2015-01-01
2015-12-01
Once the results are in this format, then a simple max aggregate function (select max(ActionDate)) can be applied to get the max value of the remaining rows.
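The unpivot step can be imitated in engines that lack UNPIVOT. The sketch below uses Python's sqlite3 with a hypothetical one-row table t holding the three dates from the sample output; UNION ALL plays the role of UNPIVOT, and the IS NOT NULL filter mirrors the fact that SQL Server's UNPIVOT leaves NULL values out of its result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-row input mirroring the sample output above.
conn.execute(
    "CREATE TABLE t (FirstActionDate TEXT, SecondActionDate TEXT, ThirdActionDate TEXT)"
)
conn.execute("INSERT INTO t VALUES ('2015-01-01', '2015-12-01', NULL)")

UNPIVOTED = """
SELECT ActionDates, ActionDate FROM (
    SELECT 'FirstActionDate' AS ActionDates, FirstActionDate AS ActionDate FROM t
    UNION ALL SELECT 'SecondActionDate', SecondActionDate FROM t
    UNION ALL SELECT 'ThirdActionDate', ThirdActionDate FROM t
)
WHERE ActionDate IS NOT NULL   -- UNPIVOT drops NULLs in the same way
ORDER BY ActionDate
"""

# One row per (column name, value) pair.
rows = conn.execute(UNPIVOTED).fetchall()
# The final step of the answer: MAX over the unpivoted column.
latest = conn.execute(f"SELECT MAX(ActionDate) FROM ({UNPIVOTED})").fetchone()[0]
print(rows, latest)
```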

You can use a UNION to join the 3 queries for your IN statement.
Something like
SELECT
MAX(ht1.LogDate) AS LastActionDate
FROM
HistoryTable ht1
WHERE
ht1.ID = 123
AND ht1.LogDate IN (SELECT
MIN(LogDate) AS FirstActionDate
FROM
HistoryTable ht2
WHERE
ht2.ID = ht1.ID
AND ht2.FirstActionTaken = 1
UNION
SELECT
MIN(LogDate) AS FirstActionDate
FROM
HistoryTable ht2
WHERE
ht2.ID = ht1.ID
AND ht2.SecondActionTaken = 1
UNION
SELECT
MIN(LogDate) AS FirstActionDate
FROM
HistoryTable ht2
WHERE
ht2.ID = ht1.ID
AND ht2.ThirdActionTaken = 1)

You can solve this problem without using PIVOT. The following code extends your initial code to store the MIN values into variables and then calculates the max value among them:
DECLARE @FirstActionDate DATETIME = NULL;
DECLARE @SecondActionDate DATETIME = NULL;
DECLARE @ThirdActionDate DATETIME = NULL;
DECLARE @LastActionDate DATETIME = NULL;
SELECT @FirstActionDate = MIN(LogDate)
FROM HistoryTable
WHERE ID = 123
AND FirstActionTaken = 1
SELECT @SecondActionDate = MIN(LogDate)
FROM HistoryTable
WHERE ID = 123
AND SecondActionTaken = 1
SELECT @ThirdActionDate = MIN(LogDate)
FROM HistoryTable
WHERE ID = 123
AND ThirdActionTaken = 1
-- calculate @LastActionDate as the greatest of @FirstActionDate, @SecondActionDate and @ThirdActionDate.
-- The IS NULL checks keep a NULL in an earlier variable from hiding a later date.
SET @LastActionDate = @FirstActionDate;
IF (@LastActionDate IS NULL OR @SecondActionDate > @LastActionDate) SET @LastActionDate = @SecondActionDate;
IF (@LastActionDate IS NULL OR @ThirdActionDate > @LastActionDate) SET @LastActionDate = @ThirdActionDate;
SELECT @FirstActionDate AS [FirstActionDate]
, @SecondActionDate AS [SecondActionDate]
, @ThirdActionDate AS [ThirdActionDate]
, @LastActionDate AS [LastActionDate]
If you want the absolute last action date, you can change the original code to just a single statement, as follows:
SELECT MAX(LogDate) AS [LastActionDate]
, MIN(CASE WHEN FirstActionTaken = 1 THEN LogDate ELSE NULL END) AS [FirstActionDate]
, MIN(CASE WHEN SecondActionTaken = 1 THEN LogDate ELSE NULL END) AS [SecondActionDate]
, MIN(CASE WHEN ThirdActionTaken = 1 THEN LogDate ELSE NULL END) AS [ThirdActionDate]
FROM HistoryTable
WHERE ID = 123

My own attempt at refactoring the final SELECT statement:
SELECT MIN(ht2.LogDate) AS FirstActionDate,
MIN(ht3.LogDate) AS SecondActionDate,
MIN(ht4.LogDate) AS ThirdActionDate,
COALESCE (
MIN(ht4.LogDate),
MIN(ht3.LogDate),
MIN(ht2.LogDate)
) AS LastActionDate
FROM HistoryTable ht
INNER JOIN HistoryTable ht2
ON ht2.ID = ht.ID AND ht2.FirstActionTaken = 1
INNER JOIN HistoryTable ht3
ON ht3.ID = ht.ID AND ht3.SecondActionTaken = 1
INNER JOIN HistoryTable ht4
ON ht4.ID = ht.ID AND ht4.ThirdActionTaken = 1
WHERE ht.ID = 123
GROUP BY ht.ID
This joins back to HistoryTable once for each xActionTaken column and selects the MIN(LogDate) from each. Then, we walk backwards through the results (ThirdAction, SecondAction, FirstAction) and return the first non-NULL one we find as LastActionDate.
Admittedly this is a bit messy, but I thought it would be good to show another alternative to retrieving the same data.
Also worth noting for performance:
After running my answer against the UNPIVOT and OUTER APPLY methods, SSMS Execution Plan shows that UNPIVOT and OUTER APPLY are roughly equal (taking approx. 50% of the execution time each).
When comparing my method to either of these, my method takes approx. 88% of the execution time, where UNPIVOT/OUTER APPLY only take 12% - so both UNPIVOT and OUTER APPLY execute much faster (at least in this instance).
The reason that my method takes so much longer is that SQL Server does a table scan of HistoryTable each time I join back to it, for a total of 4 scans. With the other two methods, this action is only performed once.

Related

SQL to return 1 or 0 depending on values in a column's audit trail

If I were to have a table such as the one below:
id_  last_updated_by
1    robot
1    human
1    robot
2    robot
3    robot
3    human
Using SQL, how could I group by the ID and create a new column to indicate whether a human has ever updated the record like this:
id_  last_updated_by  updated_by_human
1    robot            1
2    robot            0
3    robot            1
UPDATE
I'm currently doing the following, though I'm not sure how efficient this is. Selecting the latest record and then merging it with my calculated column via a sub-select.
SELECT MAIN.TRANSACTION_ID,
MAIN.CREATED_DATE,
MAIN.CREATED_BY_USER_ID,
MAIN.OWNER_USER_ID,
STP.TOUCHED_BY_HUMAN_
FROM (
SELECT TRANSACTION_ID,
CREATED_DATE,
CREATED_BY_USER_ID,
OWNER_USER_ID
FROM TABLE_NAME
WHERE CREATED_DATE >= CAST('{start_date} 00:00:00' AS TIMESTAMP)
AND CREATED_DATE <= CAST('{end_date} 23:59:59' AS TIMESTAMP)
QUALIFY row_number() OVER (partition by TRANSACTION_ID order by End_Dt desc) = 1
) MAIN
LEFT JOIN (
SELECT TRANSACTION_ID,
CASE
WHEN CREATED_BY_USER_ID IN ('ROBOT', 'MACHINE') OR
CREATED_BY_USER_ID LIKE 'N%' OR
CREATED_BY_USER_ID IS NULL
THEN 0
ELSE 1 END AS CREATED_BY_HUMAN,
CASE
WHEN OWNER_USER_ID IN ('ROBOT', 'MACHINE') OR
OWNER_USER_ID LIKE 'N%' OR
OWNER_USER_ID IS NULL
THEN 0
ELSE 1 END AS OWNED_BY_HUMAN,
CASE
WHEN CREATED_BY_HUMAN = 0 AND
OWNED_BY_HUMAN = 0
THEN 0
ELSE 1 END AS TOUCHED_BY_HUMAN_
FROM TABLE_NAME
WHERE CREATED_DATE >= CAST('{start_date} 00:00:00' AS TIMESTAMP)
AND CREATED_DATE <= CAST('{end_date} 23:59:59' AS TIMESTAMP)
QUALIFY row_number() OVER (partition by TRANSACTION_ID order by TOUCHED_BY_HUMAN_ desc) = 1
) STP
ON MAIN.TRANSACTION_ID = STP.TRANSACTION_ID
If I'm following your problem, then something like this should work.
SELECT
t.*
,CASE WHEN a.id IS NOT NULL THEN 1 ELSE 0 END AS updated_by_human
FROM table t
LEFT JOIN (SELECT DISTINCT id FROM table WHERE last_updated_by = 'human') a ON t.id = a.id
That takes care of the updated_by_human field, but if you also need to reduce the records in table (only keeping a subset) then you need more information to do that.
Exists clauses are usually not that performant but if your data isn't big this should work.
select id_,
IF (EXISTS (SELECT 1 FROM table_name t2 WHERE t2.last_updated_by = 'human' and t2.id_ = t1.id_), 1, 0) AS updated_by_human
from table_name t1;
Here is another way, which returns only the ids that a human has updated:
SELECT t1.id_
FROM table_name t1
GROUP BY t1.id_
HAVING COUNT(*) > 0
AND MAX(CASE t1.last_updated_by WHEN 'human' THEN 1 ELSE 0 END) = 1;
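If the flag is wanted as a column for every id rather than as a filter, the same MAX(CASE ...) expression can go in the SELECT list. A minimal sketch with Python's sqlite3, loading the six rows from the question into a table assumed to be called table_name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_name (id_ INTEGER, last_updated_by TEXT);
INSERT INTO table_name VALUES
    (1, 'robot'), (1, 'human'), (1, 'robot'),
    (2, 'robot'),
    (3, 'robot'), (3, 'human');
""")

# MAX over a 0/1 CASE is 1 as soon as any row in the group matches.
flags = conn.execute("""
SELECT id_,
       MAX(CASE WHEN last_updated_by = 'human' THEN 1 ELSE 0 END) AS updated_by_human
FROM table_name
GROUP BY id_
ORDER BY id_
""").fetchall()
print(flags)
```

This reads the table once and returns one row per id, including ids that only robots have touched.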
Since you didn't specify which column determines the newest record for a given id, I assume there is a column that tracks the insert/modify timestamp (which is pretty standard table design); let's call it last_update_timestamp. If you don't have one, I'd strongly suggest adding it, since an audit trail without a timestamp does not make much sense.
Given that your table is named updating_trail:
SELECT updating_trail.*, last_update_trail.modified_by_human
FROM updating_trail
INNER JOIN (
-- for each id_, find the latest last_update_timestamp and a flag that is 1 if any record has last_updated_by = 'human', else 0
SELECT updating_trail.id_, MAX(last_update_timestamp) AS most_recent_update_ts, MAX(CASE WHEN updating_trail.last_updated_by = 'human' THEN 1 ELSE 0 END) AS modified_by_human
FROM updating_trail
GROUP BY updating_trail.id_
) last_update_trail
ON updating_trail.id_ = last_update_trail.id_ AND updating_trail.last_update_timestamp = last_update_trail.most_recent_update_ts;
This gives:
id_  last_updated_by  last_update_timestamp     modified_by_human
1    robot            2021-10-19T20:00:00.000Z  1
2    robot            2021-10-19T17:00:00.000Z  0
3    robot            2021-10-19T16:00:00.000Z  1
Check out this sample db fiddle I created for you
This is a 1:1 translation of your query to conditional aggregation:
SELECT TRANSACTION_ID,
CREATED_DATE,
CREATED_BY_USER_ID,
OWNER_USER_ID,
Max(CASE
WHEN CREATED_BY_USER_ID IN ('ROBOT', 'MACHINE') OR
CREATED_BY_USER_ID LIKE 'N%' OR
CREATED_BY_USER_ID IS NULL
THEN 0
ELSE 1
END) Over (PARTITION BY TRANSACTION_ID) AS CREATED_BY_HUMAN
FROM Table_Name
WHERE CREATED_DATE >= Cast('{start_date} 00:00:00' AS TIMESTAMP)
AND CREATED_DATE <= Cast('{end_date} 23:59:59' AS TIMESTAMP)
QUALIFY Row_Number() Over (PARTITION BY TRANSACTION_ID ORDER BY End_Dt DESC) = 1
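QUALIFY is not available in every engine. As a rough sketch of the same pattern elsewhere, the window functions can be computed in a derived table and filtered in the outer query. The example below uses Python's sqlite3 (window functions need SQLite 3.25 or later) on a simplified, hypothetical updating_trail table with invented timestamps rather than the real transaction table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE updating_trail (id_ INTEGER, last_updated_by TEXT, last_update_timestamp TEXT);
INSERT INTO updating_trail VALUES
    (1, 'robot', '2021-10-19T18:00:00.000Z'),
    (1, 'human', '2021-10-19T19:00:00.000Z'),
    (1, 'robot', '2021-10-19T20:00:00.000Z'),
    (2, 'robot', '2021-10-19T17:00:00.000Z'),
    (3, 'human', '2021-10-19T15:00:00.000Z'),
    (3, 'robot', '2021-10-19T16:00:00.000Z');
""")

QUERY = """
SELECT id_, last_updated_by, updated_by_human
FROM (
    SELECT id_, last_updated_by,
           -- Flag computed over the whole partition, so every row carries it.
           MAX(CASE WHEN last_updated_by = 'human' THEN 1 ELSE 0 END)
               OVER (PARTITION BY id_) AS updated_by_human,
           -- rn = 1 marks the most recent row per id_.
           ROW_NUMBER() OVER (PARTITION BY id_
                              ORDER BY last_update_timestamp DESC) AS rn
    FROM updating_trail
) ranked
WHERE rn = 1   -- plays the role of QUALIFY
ORDER BY id_
"""

latest_rows = conn.execute(QUERY).fetchall()
print(latest_rows)
```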

Get single row depending of conditional

I have a simple select query with some joins like:
SELECT
[c].[column1]
, [c].[column2]
FROM [Customer] AS [c]
INNER JOIN ...
So I do a left join with my principal table as:
LEFT JOIN [Communication] AS [com] ON [c].[CustomerGuid] = [com].[ComGuid]
This relationship is 1 to *: one customer can have multiple communications.
So in my select I want to get the value 1 or 0 depending on a condition:
Condition:
If the Communication table has a row with ComTypeKey = 3 and another row with ComTypeKey = 4, return 1, otherwise 0.
So I tried something like:
SELECT
[c].[column1]
, [c].[column2]
, IIF([com].[ComTypeKey] = 3 AND [com].[ComTypeKey] = 4,1,0)
FROM [Customer] AS [c]
INNER JOIN ...
LEFT JOIN [Communication] AS [com] ON [c].[CustomerGuid] = [com].[ComGuid]
But it returns two rows, because there are 2 rows in Communication. What I want is a single row with value 1 if my condition is true.
If you have multiple rows you need GROUP BY, then count the relevant keys and subtract 1 to get (1, 0)
SELECT
[c].[column1]
, [c].[column2]
, COUNT(CASE WHEN [ComTypeKey] IN (3,4) THEN 1 END) - 1 as FLAG_CONDITION
FROM [Customer] AS [c]
INNER JOIN ...
LEFT JOIN [Communication] AS [com]
ON [c].[CustomerGuid] = [com].[ComGuid]
GROUP BY
[c].[column1]
, [c].[column2]
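Note that COUNT(...) - 1 only yields 0 or 1 when each customer has exactly one or two matching rows and never two of the same type. A variant that tests for each key separately may be safer; here is a hedged sketch with Python's sqlite3, where the table contents and the has_3_and_4 alias are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerGuid TEXT, column1 TEXT);
CREATE TABLE Communication (ComGuid TEXT, ComTypeKey INTEGER);
INSERT INTO Customer VALUES ('c1', 'Alice'), ('c2', 'Bob'), ('c3', 'Carol');
-- Alice has types 3 and 4, Bob has type 3 twice, Carol has no communications.
INSERT INTO Communication VALUES ('c1', 3), ('c1', 4), ('c2', 3), ('c2', 3);
""")

QUERY = """
SELECT c.column1,
       CASE WHEN MAX(CASE WHEN com.ComTypeKey = 3 THEN 1 ELSE 0 END) = 1
             AND MAX(CASE WHEN com.ComTypeKey = 4 THEN 1 ELSE 0 END) = 1
            THEN 1 ELSE 0 END AS has_3_and_4
FROM Customer AS c
LEFT JOIN Communication AS com ON c.CustomerGuid = com.ComGuid
GROUP BY c.CustomerGuid, c.column1
ORDER BY c.column1
"""

result = conn.execute(QUERY).fetchall()
print(result)
```

With the COUNT(...) - 1 version, Bob would get 1 (two type-3 rows) and Carol would get -1; here both get 0.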
I'm not really sure I understand.
This will literally find if both values 3 and 4 exist for that CustomerGuid, and only select one of them in that case - not filtering out any record otherwise.
If this is not what you want, providing sample data with the expected result would remove the ambiguity.
SELECT Field1,
Field2,
...
FieldN
FROM (SELECT TMP.*,
CASE WHEN hasBothValues = 1 THEN
ROW_NUMBER() OVER ( PARTITION BY CustomerGuid ORDER BY 1 )
ELSE 1
END AS interim_rn
FROM (SELECT TD.*,
MAX(CASE WHEN Value1 = '3' THEN 1 ELSE 0 END) OVER
( PARTITION BY CustomerGuid ) *
MAX(CASE WHEN Value1 = '4' THEN 1 ELSE 0 END) OVER
( PARTITION BY CustomerGuid ) AS hasBothValues
FROM TEST_DATA TD
) TMP
) TMP2
WHERE interim_rn = 1

Comparing rows with another rows in a single SQL Server Table

I've a table with the following rows of data.
EngID Tower Billing Amt
100 ICS Y 5000
100 EDT Y 7777
100 ICS N 2000
and I want the result set to be consolidated by Tower & Eng ID and the amount put into the appropriate column (Invoiced or Not Invoiced) based on Billing criteria. So, below is how the final result set should look for the above table:
EngID Tower Inv Amt (Amt when Billing = Y) Non-Invoiced Amt (Billing=N)
100 ICS 5000 2000
100 EDT 7777
I'm able to get the 1st row of the result set by using the below query:
Select Temp1.Tower, Temp1. EngID, Temp2.InvoiceAmt as [Inv Amt], Temp1.InvoiceAmt AS [Non-Invoiced Amt] from
(
SELECT EngID, TOWER,BILLING, InvoiceAmt,RANK() OVER (PARTITION BY EngID, TOWER ORDER BY BILLING) AS RNK
FROM [GDF].[dbo].[Sample] ) Temp1 INNER JOIN (SELECT EngID, TOWER,Billing,InvoiceAmt, RANK() OVER (PARTITION BY EngID, TOWER ORDER BY BILLING) AS RNK
FROM [GDF].[dbo].[Sample] ) Temp2 ON
Temp1.EngID = Temp2.EngID
AND (Temp1.Tower = Temp2.Tower AND Temp1.Billing < Temp2.Billing)
However, struggling to get the second row result. My plan is to get the two rows through two separate queries and then do a union to combine the results.
One method is conditional aggregation:
select s.engid, s.tower,
sum(case when s.billing = 'Y' then amt end) as billing_y,
sum(case when s.billing = 'N' then amt end) as billing_n
from gdf.dbo.sample s
group by s.engid, s.tower;
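To check the result against the sample rows from the question, here is a small sketch using Python's sqlite3. The table name sample and the output aliases are assumptions; SQLite is only a stand-in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (EngID INTEGER, Tower TEXT, Billing TEXT, Amt INTEGER);
INSERT INTO sample VALUES
    (100, 'ICS', 'Y', 5000),
    (100, 'EDT', 'Y', 7777),
    (100, 'ICS', 'N', 2000);
""")

# Each SUM only sees the amounts whose Billing flag matches;
# a group with no matching rows sums to NULL (None in Python).
pivoted = conn.execute("""
SELECT EngID, Tower,
       SUM(CASE WHEN Billing = 'Y' THEN Amt END) AS inv_amt,
       SUM(CASE WHEN Billing = 'N' THEN Amt END) AS non_inv_amt
FROM sample
GROUP BY EngID, Tower
ORDER BY Tower
""").fetchall()
print(pivoted)
```

EDT has no Billing = 'N' row, so its non-invoiced amount comes back as NULL, matching the blank cell in the desired output.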
Try this:
select engid, tower,
sum(case when billing = 'Y' then amt end) Inv_amt,
sum(case when billing = 'N' then amt end) Non_Inv_amt
from my_table
group by
engid,
tower;
We can also do it using OUTER APPLY as below:
select A.EngID,
sum(A.Amt) as [Inv Amt (Amt when Billing = Y)],
sum(B.Amt) as [Non-Invoiced Amt (Billing=N)]
from #test A
outer apply(select b.Amt from #test B where A.EngID = b.EngID and b.tower = a.tower and B.Billing = 'n') B
where a.billing = 'y'
group by A.EngID, A.Tower
Simple LEFT JOIN:
select A.EngID,
sum(A.Amt) as [Inv Amt (Amt when Billing = Y)],
sum(B.Amt) as [Non-Invoiced Amt (Billing=N)]
from #test A
left join #test B on A.EngID = b.EngID
and b.tower = a.tower
and B.Billing = 'n'
where a.billing = 'y'
group by A.EngID, A.Tower
This code will give the desired result without any complexity. Hope this solves your problem.
WITH Mycte
AS
(
Select ENGID,Tower,Case When Billing='Y' Then ISNULL(SUM(Amt),0) END AS Inv_Amt,
Case When Billing='N' Then ISNULL(SUM(Amt),0) END AS Non_Inv_Amt from #Sample
group by ENGID,Tower,Billing
)
Select ENGID,Tower,SUM(Inv_Amt) AS Inv_Amt,SUM(Non_Inv_Amt) AS Non_Inv_Amt from mycte
group by ENGID,Tower

How to get first and last record in HiveSQL if key is different

I need to get the first and last record for a user if one of the key fields is different over time using a Hive table:
This is some sample data:
UserID EntryDate Activity
a3324 1/1/16 walk
a3324 1/2/16 walk
a3324 1/3/16 walk
a3324 1/4/16 run
a5613 1/1/16 walk
a5613 1/2/16 walk
a5613 1/3/16 walk
a5613 1/4/16 walk
And I'm looking for output preferably like this:
a3324 1/1/16 walk 1/4/16 run
Or at least like this:
a3324 walk run
I start writing code like this:
SELECT UserID, MINIMUM(EntryDate), MAXIMUM(EntryDate), Activity
FROM
SELECT UserID, DISTINCT Activity
GROUP BY UserID
HAVING Count(Activity) > 1
But I know that's not it.
I'd also like to be able to specify the cases where the original activity was Walk and the second activity was Run perhaps in the Where clause.
Can you help with an approach?
Thanks
You can use lag/lead to get a solution:
SELECT * FROM (
select UserID, EntryDate, Activity,
lead(Activity, 1) over (partition by UserID order by EntryDate) as nextActivity
from table) as A
where Activity <> nextActivity
SELECT
t.UserId
,MIN(CASE WHEN t.RowNumAsc = 1 THEN t.EntryDate END) as MinEntryDate
,MIN(CASE WHEN t.RowNumAsc = 1 THEN t.Activity END) as MinActivity
,MAX(CASE WHEN t.RowNumDesc = 1 THEN t.EntryDate END) as MaxEntryDate
,MAX(CASE WHEN t.RowNumDesc = 1 THEN t.Activity END) as MaxActivity
FROM
(
SELECT
UserId
,EntryDate
,Activity
,ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY EntryDate) as RowNumAsc
,ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY EntryDate DESC) as RowNumDesc
FROM
Table
) t
WHERE
t.RowNumAsc = 1
OR t.RowNumDesc = 1
GROUP BY
t.UserId
Looks like window functions are supported (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics), so using two row numbers, one for EntryDate ascending and another for descending, together with conditional aggregation should get you to the answer.
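Here is a runnable sketch of the two-row-number approach using Python's sqlite3 in place of Hive (SQLite 3.25 or later for window functions). The table name activity_log is hypothetical, the dates are rewritten in ISO form so they sort correctly as text, and a HAVING clause is added to keep only users whose first and last activity differ, as the question asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity_log (UserID TEXT, EntryDate TEXT, Activity TEXT);
INSERT INTO activity_log VALUES
    ('a3324', '2016-01-01', 'walk'), ('a3324', '2016-01-02', 'walk'),
    ('a3324', '2016-01-03', 'walk'), ('a3324', '2016-01-04', 'run'),
    ('a5613', '2016-01-01', 'walk'), ('a5613', '2016-01-02', 'walk'),
    ('a5613', '2016-01-03', 'walk'), ('a5613', '2016-01-04', 'walk');
""")

QUERY = """
SELECT UserID,
       MIN(CASE WHEN rn_asc  = 1 THEN EntryDate END) AS first_date,
       MIN(CASE WHEN rn_asc  = 1 THEN Activity  END) AS first_activity,
       MAX(CASE WHEN rn_desc = 1 THEN EntryDate END) AS last_date,
       MAX(CASE WHEN rn_desc = 1 THEN Activity  END) AS last_activity
FROM (
    SELECT UserID, EntryDate, Activity,
           ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY EntryDate)      AS rn_asc,
           ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY EntryDate DESC) AS rn_desc
    FROM activity_log
) t
WHERE rn_asc = 1 OR rn_desc = 1
GROUP BY UserID
-- Keep only users whose first and last activity differ.
HAVING MIN(CASE WHEN rn_asc = 1 THEN Activity END)
    <> MAX(CASE WHEN rn_desc = 1 THEN Activity END)
"""

changed = conn.execute(QUERY).fetchall()
print(changed)
```

To restrict the result to a specific transition such as walk to run, the HAVING clause could compare the two expressions against 'walk' and 'run' instead.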
And if you don't want to use Analytic Functions (window functions) you can use self left joins and conditional aggregation:
SELECT
t.UserId
,MIN(CASE WHEN mn.UserId IS NULL THEN t.EntryDate END) as MinEntryDate
,MIN(CASE WHEN mn.UserId IS NULL THEN t.Activity END) as MinActivity
,MAX(CASE WHEN mx.UserId IS NULL THEN t.EntryDate END) as MaxEntryDate
,MAX(CASE WHEN mx.UserId IS NULL THEN t.Activity END) as MaxActivity
FROM
Table t
LEFT JOIN Table mn
ON t.UserId = mn.UserId
AND t.EntryDate > mn.EntryDate
LEFT JOIN Table mx
ON t.UserId = mx.UserId
AND t.EntryDate < mx.EntryDate
WHERE
mn.UserId IS NULL
OR mx.UserId IS NULL
GROUP BY
t.UserId
Or a correlated Sub Query way:
SELECT
UserId
,MIN(EntryDate) as MinEntryDate
,(SELECT
Activity
FROM
Activity a
WHERE
u.UserId = a.UserId
AND a.EntryDate = MIN(u.EntryDate)
LIMIT 1
) as MinActivity
,MAX(EntryDate) as MaxEntryDate
,(SELECT
Activity
FROM
Activity a
WHERE
u.UserId = a.UserId
AND a.EntryDate = MAX(u.EntryDate)
LIMIT 1
) as MaxActivity
FROM
Activity u
GROUP BY
UserId

2 Rows to 1 Row - Nested Query

I have a response column that stores 2 different values for a same product based on question 1 and question 2. That creates 2 rows for each product but I want only one row for each product.
Example:
select Product, XNumber from MyTable where QuestionID IN ('Q1','Q2')
result shows:
Product XNumber
Bat abc
Bat abc12
I want it to display like below:
Product Xnumber1 Xnumber2
Bat abc abc12
Please help.
Thanks.
If you always have two different values you can try this:
SELECT a.Product, a.XNumber as XNumber1, b.XNumber as XNumber2
FROM MyTable a
INNER JOIN MyTable b
ON a.Product = b.Product
WHERE a.QuestionId = 'Q1'
AND b.QuestionId = 'Q2'
I assume that XNumber1 is the result for Q1 and Xnumber2 is the result for Q2.
This will work best if you don't have answers for both Q1 and Q2 for all ids
SELECT a.Product, b.XNumber as XNumber1, c.XNumber as XNumber2
FROM (SELECT DISTINCT Product FROM MyTable) a
LEFT JOIN MyTable b ON a.Product = b.Product AND b.QuestionID = 'Q1'
LEFT JOIN MyTable c ON a.Product = c.Product AND c.QuestionID = 'Q2'
This is one way to achieve your expected results. However, it relies on knowing that only xNumber abc and abc12 are the values. If this is not the case, then a dynamic pivot would be likely needed.
SELECT product, max(case when XNumber = 'abc' then xNumber end) as XNumber1,
max(Case when xNumber = 'abc12' then xNumber end) as xNumber2
FROM MyTable
GROUP BY Product
The problem is that SQL needs to know how many columns will be in the result at the time it compiles the SQL. Since the number of columns could be dependent on the data itself (2 rows vs 5 rows) it can't complete the request. Using Dynamic SQL you can find out the number of rows, then pass those values in as the column names which is why the dynamic SQL works.
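When the set of questions is fixed (Q1 and Q2 here), the pivot can key on QuestionID instead of on the XNumber values, so the values do not have to be known in advance. A minimal sketch with Python's sqlite3; the 'Ball' row is invented to show a product that is missing one answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (Product TEXT, QuestionID TEXT, XNumber TEXT);
INSERT INTO MyTable VALUES
    ('Bat',  'Q1', 'abc'),
    ('Bat',  'Q2', 'abc12'),
    ('Ball', 'Q1', 'xyz');
""")

# One output column per known question; MAX simply picks the single
# non-NULL value in each group.
one_row_each = conn.execute("""
SELECT Product,
       MAX(CASE WHEN QuestionID = 'Q1' THEN XNumber END) AS XNumber1,
       MAX(CASE WHEN QuestionID = 'Q2' THEN XNumber END) AS XNumber2
FROM MyTable
WHERE QuestionID IN ('Q1', 'Q2')
GROUP BY Product
ORDER BY Product
""").fetchall()
print(one_row_each)
```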
This will get you two columns, the first will be the product, and the 2nd will be a comma delimited list of xNumbers.
SELECT DISTINCT T.Product,
xNumbers = Stuff((SELECT DISTINCT ', ' + T1.XNumber
FROM MyTable T1
WHERE t.Product = T1.Product
FOR XML PATH ('')),1,2,'')
FROM MyTable T
To get what you want, we need to know how many columns there will be, what to name them, and how to determine which value goes into which column
Been using rank() a lot in current code we have been working on at my day job. So this fun variant came to mind for your solution.
Using rank to get the 1st, 2nd, and 3rd possible item identifier then grouping them to create a simulated pivot
DECLARE @T TABLE (PRODUCT VARCHAR(50), XNumber VARCHAR(50))
INSERT INTO @T VALUES
('Bat','0-12345-98765-6'),
('Bat','0-12345-98767-2'),
('Bat','0-12345-98768-1'),
('Ball','0-12345-98771-6'),
('Ball','0-12345-98772-7'),
('Ball','0-12345-98777-9'),
('Hat','0-12345-98711-6'),
('Hat','0-12345-98712-3'),
('Tee','0-12345-98465-1')
SELECT
PRODUCT,
MAX(CASE WHEN I = 1 THEN XNumber ELSE '' END) AS Xnumber1,
MAX(CASE WHEN I = 2 THEN XNumber ELSE '' END) AS Xnumber2,
MAX(CASE WHEN I = 3 THEN XNumber ELSE '' END) AS Xnumber3
FROM
(
SELECT
PRODUCT,
XNumber,
RANK() OVER(PARTITION BY PRODUCT ORDER BY XNumber) AS I
FROM @T
) AS DATA
GROUP BY
PRODUCT