How can I improve this SQL query?

I ran into an interesting SQL problem today, and while I came up with a solution that works, I doubt it's the best or most efficient answer. I defer to the experts here: help me learn something and improve my query! The RDBMS is SQL Server 2008 R2, and the query is part of an SSRS report that will run against about 100,000 rows.
Essentially I have a list of IDs, each of which can have multiple values associated with it; the values are Yes, No, or some other string. For ID x: if any of the values is Yes, x should be Yes; if they are all No, it should be No; if they contain any value other than Yes or No, display that value. I only want to return 1 row per ID, with no duplicates.
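To pin the rules down before writing any SQL, here is a small Python sketch of the intended one-row-per-ID result. It assumes that when an ID has several "other" values, the alphabetically smallest one is shown (the question doesn't say which to pick); `resolve` is a hypothetical helper name, and the sample rows are the ones from the test case below.

```python
from collections import defaultdict


def resolve(rows):
    """Collapse (id, val) pairs into one value per id.

    Rules from the question: any 'Y' wins; all 'N' gives 'N';
    otherwise show an "other" value. Assumption: if there are several
    other values, the alphabetically smallest one is shown.
    """
    by_id = defaultdict(set)
    for id_, val in rows:
        by_id[id_].add(val)

    result = {}
    for id_, vals in sorted(by_id.items()):
        if 'Y' in vals:
            result[id_] = 'Y'
        elif vals == {'N'}:
            result[id_] = 'N'
        else:
            # No 'Y' and not all 'N': ignore any 'N' and pick an other value.
            result[id_] = min(vals - {'N'})
    return result


# Sample data from the question's test case.
rows = [(10, 'Y'), (11, 'N'), (11, 'N'), (12, 'Y'), (12, 'Y'), (12, 'Y'),
        (13, 'N'), (14, 'Y'), (14, 'N'), (15, 'Y'), (16, 'Y'),
        (17, 'F'), (18, 'P')]
print(resolve(rows))
```

A reference like this can be handy for checking any of the SQL answers against extra edge cases, such as an ID that mixes N with another value.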
The simplified version and test case:
CREATE TABLE #tempTable ( ID int, Val varchar(1) )
INSERT INTO #tempTable ( ID, Val ) VALUES ( 10, 'Y')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 11, 'N')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 11, 'N')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 12, 'Y')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 12, 'Y')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 12, 'Y')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 13, 'N')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 14, 'Y')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 14, 'N')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 15, 'Y')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 16, 'Y')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 17, 'F')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 18, 'P')
SELECT DISTINCT t.ID, COALESCE(t2.Val, t3.Val, t4.Val)
FROM #tempTable t
LEFT JOIN
(
SELECT ID, Val
FROM #tempTable
WHERE Val = 'Y'
) t2 ON t.ID = t2.ID
LEFT JOIN
(
SELECT
ID, Val FROM #tempTable
WHERE Val = 'N'
) t3 ON t.ID = t3.ID
LEFT JOIN
(
SELECT ID, Val
FROM #tempTable
WHERE Val <> 'Y' AND Val <> 'N'
) t4 ON t.ID = t4.ID
Thanks in advance.

Let's answer an easier problem: for each id, get the Val which is last in the alphabet. This will work if Y and N are the only values. And the query is much simpler:
SELECT t.ID, MAX(t.Val) FROM t GROUP BY t.ID;
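So, reduce your case to the simple case. Use an enum (if your DB supports it; SQL Server does not) or break the value codes out into another table with a collation (sort order) column. In this case, give Y the smallest number (e.g., 1), N the largest (e.g., 999), and each other possible value its own distinct number in between, and you want the smallest. N has to rank below the other values because an ID with only N and some other value should display that other value, and the numbers have to be distinct so that joining back on the minimum returns a single Val. Then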
SELECT ID, c.Val FROM
(SELECT t.ID, MIN(codes.collation) AS mx
FROM t join codes on t.Val = codes.Val GROUP BY t.ID) AS q
JOIN codes c ON mx=c.collation;
Here codes has two columns, Val and Collation.
You can also do this with a CTE and ROW_NUMBER(), as long as you have the values ordered as you want them. Either approach has one join to a small lookup table and should be considerably cheaper than 3 self-joins.
WITH q AS (SELECT t.id, t.Val,
ROW_NUMBER() OVER (PARTITION BY t.id ORDER BY codes.collation) AS r
FROM t JOIN codes ON t.Val=codes.Val)
SELECT q.id, q.Val FROM q WHERE r=1;
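The lookup-table idea can be tried outside SQL Server with Python's built-in sqlite3 module. This is a sketch under stated assumptions: SQLite stands in for SQL Server, the sort column is named `sort_order` (playing the role of the answer's Collation column), the ranks are Y=1, F=2, P=3, N=999, and ID 19 (N plus F) is a hypothetical extra row showing why N must rank after the other values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ID INTEGER, Val TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (10, 'Y'), (11, 'N'), (11, 'N'), (12, 'Y'), (12, 'Y'), (12, 'Y'),
    (13, 'N'), (14, 'Y'), (14, 'N'), (15, 'Y'), (16, 'Y'),
    (17, 'F'), (18, 'P'),
    (19, 'N'), (19, 'F'),  # hypothetical: N mixed with another value
])

# Hypothetical lookup table: the smallest sort_order wins.
# Y first, N last, every other code gets its own distinct rank in between.
conn.execute("CREATE TABLE codes (Val TEXT PRIMARY KEY, sort_order INTEGER UNIQUE)")
conn.executemany("INSERT INTO codes VALUES (?, ?)",
                 [('Y', 1), ('F', 2), ('P', 3), ('N', 999)])

result = conn.execute("""
    SELECT q.ID, c.Val
    FROM (SELECT t.ID, MIN(codes.sort_order) AS mn
          FROM t JOIN codes ON t.Val = codes.Val
          GROUP BY t.ID) AS q
    JOIN codes AS c ON q.mn = c.sort_order
    ORDER BY q.ID
""").fetchall()
print(result)
```

Because every code has a unique rank, joining back on the minimum rank returns exactly one Val per ID.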

I'd change it to this just to make it easier to read:
SELECT DISTINCT t.ID, COALESCE(t2.Val, t3.Val, t4.Val)
FROM #tempTable t
LEFT JOIN #tempTable t2 ON t.ID = t2.ID and t2.Val = 'Y'
LEFT JOIN #tempTable t3 ON t.ID = t3.ID and t3.Val = 'N'
LEFT JOIN #tempTable t4 ON t.ID = t4.ID and t4.Val <> 'Y' AND t4.Val <> 'N'
This gives the same results as your query.
I also looked at the execution plans for both, and they looked exactly the same, so I doubt you'd see any performance difference.

Try this:
;WITH a AS (
SELECT
ID,
SUM(CASE Val WHEN 'Y' THEN 1 ELSE 0 END) AS y,
SUM(CASE Val WHEN 'N' THEN 0 ELSE 1 END) AS n,
MIN(CASE WHEN Val IN ('Y','N') THEN NULL ELSE Val END) AS first_other
FROM #tempTable
GROUP BY ID
)
SELECT
ID,
CASE WHEN y > 0 THEN 'Y' WHEN n = 0 THEN 'N' ELSE first_other END AS Val
FROM a
If there are any 'Y' values, then y (the number of 'Y' rows) will be greater than 0.
If all values are 'N', then n (which counts the rows that are not 'N') will be zero.
Otherwise, take the first value that is neither 'Y' nor 'N' (the alphabetically smallest one).
This way the result can be determined with only one pass through the table.
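The one-pass conditional aggregation can be run in SQLite from Python to check it against the sample data. In this sketch the table is a plain `tempTable` (SQLite has no `#` temp-table syntax), the column `n` is renamed `non_n` to say what it counts, and ID 19 is a hypothetical extra row mixing N with F.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tempTable (ID INTEGER, Val TEXT)")
conn.executemany("INSERT INTO tempTable VALUES (?, ?)", [
    (10, 'Y'), (11, 'N'), (11, 'N'), (12, 'Y'), (12, 'Y'), (12, 'Y'),
    (13, 'N'), (14, 'Y'), (14, 'N'), (15, 'Y'), (16, 'Y'),
    (17, 'F'), (18, 'P'),
    (19, 'N'), (19, 'F'),  # hypothetical: N mixed with another value
])

query = """
WITH a AS (
    SELECT ID,
           SUM(CASE Val WHEN 'Y' THEN 1 ELSE 0 END) AS y,      -- number of 'Y' rows
           SUM(CASE Val WHEN 'N' THEN 0 ELSE 1 END) AS non_n,  -- number of non-'N' rows
           MIN(CASE WHEN Val IN ('Y', 'N') THEN NULL ELSE Val END) AS first_other
    FROM tempTable
    GROUP BY ID
)
SELECT ID,
       CASE WHEN y > 0 THEN 'Y' WHEN non_n = 0 THEN 'N' ELSE first_other END AS Val
FROM a
ORDER BY ID
"""
result = conn.execute(query).fetchall()
print(result)
```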

I'm reading your spec like this:
1. if any value for an ID is Y, then Y
2. if all values for an ID are N, then N
3. else display the value (other than Y or N)
Eliminate rows per (1):
delete from #tempTable
where not Val='Y' and ID in (
select distinct ID
from #tempTable
where Val='Y'
)
Select distinct to eliminate multiple N's per (2):
select distinct * from #tempTable
Group the various "other" values to get a single row per ID (C here is the de-duplicated table, defined as a CTE in the full query below):
SELECT A.Id, AllVals =
SubString(
(SELECT ', ' + B.Val
FROM C as B
WHERE A.Id = B.Id
FOR XML PATH ( '' ) ), 3, 1000)
FROM C as A
GROUP BY Id
Entire runnable query:
CREATE TABLE #tempTable (ID int, Val char(1))
INSERT INTO #tempTable ( ID, Val ) VALUES ( 10, 'Y')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 11, 'N')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 11, 'N')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 12, 'Y')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 12, 'Y')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 12, 'Y')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 13, 'N')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 14, 'Y')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 14, 'N')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 15, 'Y')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 16, 'Y')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 17, 'F')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 18, 'P')
INSERT INTO #tempTable ( ID, Val ) VALUES ( 18, 'F')
delete from #tempTable
where not Val='Y' and ID in (
select distinct ID
from #tempTable
where Val='Y'
);
WITH C as (select distinct * from #tempTable)
SELECT A.Id, AllVals =
SubString(
(SELECT ', ' + B.Val
FROM C as B
WHERE A.Id = B.Id
FOR XML PATH ( '' ) ), 3, 1000)
FROM C as A
GROUP BY Id
OUTPUT:
Id AllVals
10 Y
11 N
12 Y
13 N
14 Y
15 Y
16 Y
17 F
18 F, P

Related

Need to return an ID which has start and END in sql server

I have a scenario wherein I need to find the ID which only has start and END in it. Below is the table for reference.
CREATE TABLE #T ( ID int, Name varchar(100))
Insert into #T values (1,'Start')
Insert into #T values (1,'END')
Insert into #T values (1,'Stuart')
Insert into #T values (1,'robin')
Insert into #T values (2,'Start')
Insert into #T values (2,'END')
Insert into #T values (3,'Start')
Insert into #T values (4,'END')
I want the Output as:
ID Name
2 Start
2 END
I want only those IDs that have Start and END and nothing else.
What I tried so far:
SELECT * FROM #T t
WHERE EXISTS (SELECT * FROM #T WHERE id = t.id AND name = 'start')
AND EXISTS (SELECT * FROM #T WHERE id = t.id AND name = 'END')
But my query is giving ID 1 as well.
Can someone please help me fix the problem?
I presume your issue is that ID 1 also has 'Stuart' and 'robin' rows?
If so, you can add a similar check to the WHERE clause, e.g.:
SELECT * FROM #T t
WHERE EXISTS (SELECT * FROM #T WHERE id = t.id AND name = 'start')
AND EXISTS (SELECT * FROM #T WHERE id = t.id AND name = 'END')
AND NOT EXISTS (SELECT * FROM #T WHERE id = t.id AND name NOT IN ('start','END'))
Note that you may want to consider:
What happens if you have two 'start' rows or two 'end' rows (e.g., start-start-end)? Can you even have two 'start' rows (e.g., start-start)?
What happens if you have a blank/NULL (e.g., start-NULL-end)?
EDIT: removed 'What happens if they're out of order (e.g., end-start)?' as a question as there is no sorting in the data at all (e.g., not even an implicit sort).
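Here is a sketch of the EXISTS / NOT EXISTS filter run in SQLite from Python, using the question's rows. Assumptions: the table is called `names` with aliases `o` (outer) and `i` (inner), and because SQLite compares text case-sensitively by default (unlike SQL Server's usual case-insensitive collations), the literals use the exact stored spelling 'Start' and 'END'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (ID INTEGER, Name TEXT)")
conn.executemany("INSERT INTO names VALUES (?, ?)", [
    (1, 'Start'), (1, 'END'), (1, 'Stuart'), (1, 'robin'),
    (2, 'Start'), (2, 'END'), (3, 'Start'), (4, 'END'),
])

only_start_end = conn.execute("""
    SELECT o.ID, o.Name
    FROM names AS o
    WHERE EXISTS (SELECT 1 FROM names AS i WHERE i.ID = o.ID AND i.Name = 'Start')
      AND EXISTS (SELECT 1 FROM names AS i WHERE i.ID = o.ID AND i.Name = 'END')
      -- reject IDs that have any name other than Start/END
      AND NOT EXISTS (SELECT 1 FROM names AS i
                      WHERE i.ID = o.ID AND i.Name NOT IN ('Start', 'END'))
    ORDER BY o.ID, o.Name
""").fetchall()
print(only_start_end)
```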
You can use a CTE that computes per-ID counts and keeps only IDs with exactly one Start, exactly one End, and a total count of 2.
CREATE TABLE #T ( ID int, Name varchar(100))
Insert into #T values (1,'Start')
Insert into #T values (1,'END')
Insert into #T values (1,'Stuart')
Insert into #T values (1,'robin')
Insert into #T values (2,'Start')
Insert into #T values (2,'END')
Insert into #T values (3,'Start')
Insert into #T values (4,'END')
;WITH CTE_Total_StartEnd AS
(
select id, count(*) AS Total_Cnt
, COUNT( case when Name IN ('Start') THEN 1 END) as start_cnt
, COUNT( case when Name IN ('End') THEN 1 END) as end_cnt
from #t
group by id
having COUNT( case when Name IN ('Start') THEN 1 END) =1 and
COUNT( case when Name IN ('End') THEN 1 END) = 1 and
count(*) = 2
)
SELECT t.* from #t t
inner join CTE_Total_StartEnd as c
ON c.id = t.id
+----+-------+
| ID | Name |
+----+-------+
| 2 | Start |
| 2 | END |
+----+-------+
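The count-based CTE can be sketched in SQLite from Python as well. Assumptions: the table is named `names`, the literals use the stored spelling 'END' because SQLite comparisons are case-sensitive by default, and ID 5 is a hypothetical extra ID with END twice, which the per-name counts exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (ID INTEGER, Name TEXT)")
conn.executemany("INSERT INTO names VALUES (?, ?)", [
    (1, 'Start'), (1, 'END'), (1, 'Stuart'), (1, 'robin'),
    (2, 'Start'), (2, 'END'), (3, 'Start'), (4, 'END'),
    (5, 'Start'), (5, 'END'), (5, 'END'),  # hypothetical: END appears twice
])

rows = conn.execute("""
    WITH cte AS (
        SELECT ID
        FROM names
        GROUP BY ID
        -- COUNT ignores the NULLs produced when the CASE does not match
        HAVING COUNT(CASE WHEN Name = 'Start' THEN 1 END) = 1
           AND COUNT(CASE WHEN Name = 'END' THEN 1 END) = 1
           AND COUNT(*) = 2
    )
    SELECT n.ID, n.Name
    FROM names AS n
    JOIN cte AS c ON c.ID = n.ID
    ORDER BY n.ID, n.Name
""").fetchall()
print(rows)
```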
You can also do this with GROUP BY, like below:
WITH cte AS
(
SELECT 1 AS id , 'Start' AS name
UNION ALL
SELECT 1 AS id ,'END' AS name
UNION ALL
SELECT 1 AS id ,'Stuart' AS name
UNION ALL
SELECT 1 AS id ,'robin' AS name
UNION ALL
SELECT 2 AS id ,'Start' AS name
UNION ALL
SELECT 2 AS id ,'END' AS name
UNION ALL
SELECT 3 AS id ,'Start' AS name
UNION ALL
SELECT 4 AS id ,'END' AS name
)
SELECT T.ID,SUM(T.VAL)AS SUM
FROM
(
SELECT id,name , CASE WHEN name='Start' THEN 1
WHEN name='END' THEN 2
ELSE 3
END AS VAL
FROM cte
)T
GROUP BY T.ID
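HAVING SUM(T.VAL) = 3 AND COUNT(*) = 2 -- without the row count, a single 'other' row (3) or three 'Start' rows (1+1+1) would also match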
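Why the weighted sum needs a row count can be checked in SQLite from Python. This sketch adds two hypothetical IDs to the question's rows: ID 5 with a single 'robin' row (weight 3 on its own) and ID 6 with three 'Start' rows (1 + 1 + 1). With exactly two rows, 3 can only come from Start + END.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (ID INTEGER, Name TEXT)")
conn.executemany("INSERT INTO names VALUES (?, ?)", [
    (1, 'Start'), (1, 'END'), (1, 'Stuart'), (1, 'robin'),
    (2, 'Start'), (2, 'END'), (3, 'Start'), (4, 'END'),
    (5, 'robin'),                              # hypothetical: weight 3 alone
    (6, 'Start'), (6, 'Start'), (6, 'Start'),  # hypothetical: 1 + 1 + 1
])

weighted = """
SELECT ID
FROM (SELECT ID,
             CASE WHEN Name = 'Start' THEN 1
                  WHEN Name = 'END' THEN 2
                  ELSE 3 END AS VAL
      FROM names) AS w
GROUP BY ID
HAVING SUM(VAL) = 3 {extra}
ORDER BY ID
"""
sum_only = [r[0] for r in conn.execute(weighted.format(extra=""))]
with_count = [r[0] for r in conn.execute(weighted.format(extra="AND COUNT(*) = 2"))]
print(sum_only, with_count)
```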
Could you please try this? Note that I added a COLLATE clause at the end of each comparison to make it case-sensitive (see the question "SQL Server check case-sensitivity?").
SELECT * FROM #T t
WHERE EXISTS (SELECT * FROM #T WHERE id = t.id AND name = 'start' COLLATE SQL_Latin1_General_CP1_CS_AS)
AND EXISTS (SELECT * FROM #T WHERE id = t.id AND name = 'END' COLLATE SQL_Latin1_General_CP1_CS_AS)

Insert multiple row different column value

This is my existing table:
In this table, each user has their own data for each Status. Every user is guaranteed to have a Status 1 row.
Now, 3 statuses must be stored for every user.
I am trying to make every user have all 3 statuses by inserting new rows for the user that copy their Status 1 data, such that:
User Ali currently only has Status 1 and its data, so I need to insert a new row for Ali with Status 2, copying along the data from Status 1, and again insert a new row for Ali with Status 3, copying along the data from Status 1.
User John currently only has Status 1 and 2, so I need to insert a new row for John with Status 3, copying along the data from Status 1.
Continue the same pattern with the other users.
Expected result:
I would use CROSS JOIN and NOT EXISTS:
with data as
(
select name,
column1,
column2
from your_table
where status = 1
), cross_join_data as
(
select d1.name, t.status, d1.column1, d1.column2
from data d1
cross join
(
select 1 status
union
select 2 status
union
select 3 status
) t
where not exists (
select 1
from your_table d2
where d2.name = d1.name and
d2.status = t.status
)
)
select *
from your_table
union all
select *
from cross_join_data
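As a stand-in for the missing demo, here is the CROSS JOIN / NOT EXISTS query run in SQLite from Python. The rows are the OP's values as listed in a later answer (Ali, John, Ming, Mei); `your_table` and its column names follow this answer's placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (name TEXT, status INTEGER, column1 TEXT, column2 TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", [
    ('Ali', 1, '100', '90'),
    ('John', 1, '20', '200'), ('John', 2, '80', '90'),
    ('Ming', 1, '54', '345'),
    ('Mei', 1, '421', '123'), ('Mei', 3, '24', '344'),
])

filled = conn.execute("""
    WITH data AS (                      -- each user's Status 1 data
        SELECT name, column1, column2 FROM your_table WHERE status = 1
    ), cross_join_data AS (             -- every missing (user, status) pair
        SELECT d1.name, t.status, d1.column1, d1.column2
        FROM data AS d1
        CROSS JOIN (SELECT 1 AS status UNION SELECT 2 UNION SELECT 3) AS t
        WHERE NOT EXISTS (
            SELECT 1 FROM your_table AS d2
            WHERE d2.name = d1.name AND d2.status = t.status
        )
    )
    SELECT name, status, column1, column2 FROM your_table
    UNION ALL
    SELECT name, status, column1, column2 FROM cross_join_data
    ORDER BY name, status
""").fetchall()
for row in filled:
    print(row)
```

Existing rows keep their own data (e.g., John's Status 2 and Mei's Status 3), and only the missing statuses are filled from Status 1.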
This should work (IIF requires SQL Server 2012 or later):
with cte as (
select
[Name], coalesce(max(iif([Status]=1, [Column1], null)), max(iif([Status]=2, [Column1], null)), max(iif([Status]=3, [Column1], null))) col1
, coalesce(max(iif([Status]=1, [Column2], null)), max(iif([Status]=2, [Column2], null)), max(iif([Status]=3, [Column2], null))) col2
from
MyTable
group by [Name]
)
--insert into MyTable
select
cte.[Name], nums.n, cte.col1, cte.col2
from
cte
cross join (values (1),(2),(3)) nums(n)
left join MyTable on cte.[Name]=MyTable.[Name] and n=MyTable.[Status]
where
MyTable.[Status] is null
This works if the data column is not nullable:
CREATE TABLE #table (name varchar(10), status int, data int);
insert into #table values
('a', 1, 2)
, ('a', 2, 5)
, ('a', 3, 7)
, ('b', 1, 5)
, ('b', 2, 6)
, ('c', 1, 3)
select stats.status as statusStats
, tn.name as nameTN
, t.status as statusData, t.name, t.data
, ISNULL(t.data, t1.data) as 'fillInData'
from (values (1),(2),(3)) as stats(status)
cross join (select distinct name from #table) tn
left join #table t
on t.status = stats.status
and t.name = tn.name
join #table t1
on t1.name = tn.name
and t1.status = 1
order by tn.name, stats.status
Here is what I would do:
CREATE TABLE #existingtable (Name VARCHAR(50), Status INT, Column1 VARCHAR (10), Column2 VARCHAR(10));
INSERT INTO #existingtable (Name,Status,Column1,Column2) Values('Ali',1,'100','90');
INSERT INTO #existingtable (Name,Status,Column1,Column2) Values('John',1,'20','200');
INSERT INTO #existingtable (Name,Status,Column1,Column2) Values('John',2,'80','90');
INSERT INTO #existingtable (Name,Status,Column1,Column2) Values('Ming',1,'54','345');
INSERT INTO #existingtable (Name,Status,Column1,Column2) Values('Mei',1,'421','123');
INSERT INTO #existingtable (Name,Status,Column1,Column2) Values('Mei',3,'24','344');
SELECT * FROM #existingtable;
WITH CTE (Name,Column1,Column2)
AS
(
SELECT DISTINCT NAME,COLUMN1,COLUMN2
FROM #existingtable
WHERE Status = 1
)
, CTE2 (NAME,Status,Column1,Column2)
AS
(
SELECT NAME,1 AS STATUS,COLUMN1,COLUMN2
FROM CTE
UNION
SELECT NAME,2 AS STATUS,COLUMN1,COLUMN2
FROM CTE
UNION
SELECT NAME,3 AS STATUS,COLUMN1,COLUMN2
FROM CTE
)
INSERT INTO #existingtable (Name,Status,Column1,Column2)
SELECT C.Name,C.Status,C.Column1,C.Column2
FROM CTE2 AS C
LEFT JOIN #existingtable AS E
ON C.NAME = E.Name
AND C.Status = E.Status
WHERE E.Status IS NULL
SELECT * FROM #existingtable
ORDER BY Name, status
This has 2 edits. The initial edit added a WHERE clause to the CTE.
The second edit added the values provided by the OP.

Select only distinct values from two columns from a table

If I have a table such as
1 A
1 B
1 A
1 B
2 C
2 C
And I want to select the distinct values across both columns, so that I would get
1
2
A
B
C
How should I write my query? Is the only way to concatenate the columns and wrap them in a DISTINCT?
You could use a union to create a table of all values from both columns:
select col1 as BothColumns
from YourTable
union
select col2
from YourTable
Unlike union all, union removes duplicates, even if they come from the same side of the union.
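There's no need for DISTINCT with UNION, since UNION already removes duplicates. Try this; if id is an int column, the CAST keeps SQL Server from trying to convert the letters to int, which would fail: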
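A quick SQLite check from Python of the UNION versus UNION ALL behaviour, using the question's six rows. Both columns are stored as TEXT here (like the CHAR(1) version in a later answer), so no CAST is needed; `YourTable`, `col1` and `col2` are the placeholder names used in the answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (col1 TEXT, col2 TEXT)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?)",
                 [('1', 'A'), ('1', 'B'), ('1', 'A'),
                  ('1', 'B'), ('2', 'C'), ('2', 'C')])

# UNION removes duplicates, including duplicates within the same side.
union_rows = [r[0] for r in conn.execute(
    "SELECT col1 FROM YourTable UNION SELECT col2 FROM YourTable ORDER BY 1")]
# UNION ALL keeps every row from both sides.
union_all_rows = conn.execute(
    "SELECT col1 FROM YourTable UNION ALL SELECT col2 FROM YourTable").fetchall()
print(union_rows)
print(len(union_all_rows))
```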
select cast(id as char(1)) from test
union
select val from test
Please try:
Select Col1 from YourTable
union
Select Col2 from YourTable
UNION removes duplicate records (where all columns in the results are the same), UNION ALL does not.
Please see the question "What is the difference between UNION and UNION ALL?"
For more than two columns, you can use UNPIVOT:
SELECT distinct DistValues
FROM
(SELECT Col1, Col2, Col3
FROM YourTable) p
UNPIVOT
(DistValues FOR Dist IN
(Col1, Col2, Col3)
)AS unpvt;
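SQLite has no UNPIVOT, so this sketch swaps in a common manual equivalent: cross join each row with a small list of column numbers and use CASE to pick the matching column, then take DISTINCT. `YourTable` and `Col1`..`Col3` follow the answer's placeholders; the three sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Col1 TEXT, Col2 TEXT, Col3 TEXT)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)",
                 [('1', 'A', 'x'), ('1', 'B', 'x'), ('2', 'A', 'y')])  # hypothetical rows

dist_values = [r[0] for r in conn.execute("""
    SELECT DISTINCT CASE n.k WHEN 1 THEN t.Col1
                             WHEN 2 THEN t.Col2
                             ELSE t.Col3 END AS DistValues
    FROM YourTable AS t
    -- one copy of every row per column being unpivoted
    CROSS JOIN (SELECT 1 AS k UNION ALL SELECT 2 UNION ALL SELECT 3) AS n
    ORDER BY DistValues
""")]
print(dist_values)
```

Unlike UNPIVOT, this form needs the column list written twice (in the CASE and in the numbers list), but it works in engines without UNPIVOT.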
Try this one -
CREATE TABLE #temp
(
Col1 INT
, Col2 NVARCHAR(50)
)
INSERT INTO #temp (Col1, Col2)
VALUES (1, 'ab5defg'), (2, 'ae4eii')
SELECT disword = (
SELECT DISTINCT dt.ch
FROM (
SELECT ch = SUBSTRING(t.mtxt, n.number + 1, 1)
FROM [master].dbo.spt_values n
CROSS JOIN (
SELECT mtxt = (
SELECT CAST(Col1 AS VARCHAR(10)) + Col2
FROM #temp
FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'
)
) t
WHERE [type] = N'p'
AND number <= LEN(mtxt) - 1
) dt
FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'
)
Or try this -
CREATE TABLE #temp
(
a CHAR(1), b CHAR(1)
)
INSERT INTO #temp (a, b)
VALUES
('1', 'A'), ('1', 'B'), ('1', 'A'),
('1', 'B'), ('2', 'C'), ('2', 'C')
SELECT a
FROM #temp
UNION
SELECT b
FROM #temp
Because the values you want are in different columns, you can use UNION like below:
select distinct tarCol from
(select distinct column1 as tarCol from YourTable
union
select distinct column2 from YourTable) as tarTab
You can use something like this to get the distinct values of multiple columns, labeled with the column each came from (this uses MySQL-style backtick quoting):
(SELECT DISTINCT `enodeb` as res,
"enodeb" as columnname
FROM `raw_metrics`)
UNION
(SELECT DISTINCT `interval` as res,
"interval" as columnname
FROM `raw_metrics`)

Select item type and its base type in SQL Server?

I have a simple table :
CREATE TABLE #t (item INT, itemType INT)
insert INTO #t SELECT 1000,0
insert INTO #t SELECT 1000,3
insert INTO #t SELECT 1000,5
insert INTO #t SELECT 1000,6
insert INTO #t SELECT 2000,0
insert INTO #t SELECT 2000,3
insert INTO #t SELECT 2000,5
insert INTO #t SELECT 2000,6
insert INTO #t SELECT 3000,0
insert INTO #t SELECT 3000,10
insert INTO #t SELECT 4000,11
I want to select all items where itemType = 3, and for each such row also provide its base itemType row (if it exists), which is itemType = 0.
For example :
for itemType = 3
1000,0 should be provided --why ? because table also has 1000 + itemType 0
1000,3 should be provided --why ? because we looked for itemType=3
2000,0 should be provided --why ? because table has found 2000,3 and this 2000 also has itemType=0
2000,3 should be provided --why ? because we looked for itemType=3
for itemType = 10
3000,0 should be provided --why ? because table has found 3000,10 and this 3000 also has itemType=0
3000,10 should be provided --why ? because we looked for itemType=10
for itemType = 11
4000,11 should be provided --why ? because we looked for itemType=11 ( notice , there isn't itemType 0 , so only itself).
I started doing :
;with cte as(
SELECT * FROM #t
)
select * from cte where itemType=3
In summary: if the itemType is found, provide the row itself plus its zero-type row (if it exists), and do the same for its siblings (sample #1).
At first I thought I couldn't use UNION here because the CTE isn't recognized there, but it turns out it is possible.
How can I solve it ?
To avoid evaluating the itemType = 3 query twice, you can do a self outer join and then use CROSS APPLY ... VALUES to unpivot:
DECLARE @itemType INT = 3;
WITH T(item1, itemType1, item2, itemType2 )
AS (SELECT *
FROM #t T1
LEFT JOIN #t T2
ON T1.item = T2.item
AND T2.itemType = 0
AND T1.itemType <> 0
WHERE T1.itemType = @itemType)
SELECT item,
itemType
FROM T
CROSS APPLY ( VALUES (item1, itemType1),
(item2, itemType2) ) v(item, itemType)
WHERE item IS NOT NULL
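DECLARE @findtype INT = 3;
WITH results (item, itemType) AS
(
SELECT t.item, @findtype
FROM #t t
WHERE t.itemType = @findtype
UNION ALL
SELECT t.item, 0
FROM #t t
INNER JOIN results r on r.item = t.item
WHERE t.itemType = 0
AND r.itemType <> 0 -- stop after one level; otherwise the itemType 0 rows keep joining to themselves
)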
SELECT *
FROM results;
WITH recordList
AS
(
SELECT item, itemType
FROM SampleTable
WHERE itemType = 11 -- change here
)
SELECT item, itemType FROM recordList
UNION
SELECT a.item, a.itemType
FROM SampleTable a
INNER JOIN recordList b
ON a.item = b.item
WHERE a.itemtype = 0
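The CTE + UNION approach can be exercised in SQLite from Python for all three examples in the question. In this sketch the table is `SampleTable` (as in the answer), the item type is passed as a `?` parameter, and `lookup` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SampleTable (item INTEGER, itemType INTEGER)")
conn.executemany("INSERT INTO SampleTable VALUES (?, ?)", [
    (1000, 0), (1000, 3), (1000, 5), (1000, 6),
    (2000, 0), (2000, 3), (2000, 5), (2000, 6),
    (3000, 0), (3000, 10), (4000, 11),
])

QUERY = """
WITH recordList AS (
    SELECT item, itemType FROM SampleTable WHERE itemType = ?
)
SELECT item, itemType FROM recordList
UNION                                   -- add the itemType 0 row of each match
SELECT a.item, a.itemType
FROM SampleTable AS a
JOIN recordList AS b ON a.item = b.item
WHERE a.itemType = 0
ORDER BY item, itemType
"""


def lookup(item_type):
    """Rows of the requested itemType plus their itemType 0 base rows."""
    return conn.execute(QUERY, (item_type,)).fetchall()


for wanted in (3, 10, 11):
    print(wanted, lookup(wanted))
```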

interesting t-sql exercise

I am trying to solve a T-SQL exercise.
I need to update first table with values from second by joining by id. If I can not join then use value from default ID (default iD is the Id that is null)
Please run it to see the problem:
CREATE TABLE #t (
[id] INT
,val int
)
insert into #t values (null, null)
insert into #t values (2, null)
insert into #t values (3, null)
insert into #t values (4, null)
CREATE TABLE #t2 (
[id] INT
,val int
)
insert into #t2 values (null, 11)
insert into #t2 values (2, 22)
insert into #t2 values (3, 33)
select * from #t
select * from #t2
update t
set t.val = t2.val
from #t as t join #t2 as t2
on t.id = t2.id
or
(
(t.id is null or t.id not in (select id from #t2))
and t2.id is null
)
select * from #t
Here is the result:
--#t
id val
---------------
NULL NULL
2 NULL
3 NULL
4 NULL
--#t2
id val
---------------
NULL 11
2 22
3 33
--#t after update
id val
---------------
NULL 11
2 22
3 33
4 NULL
How can I make val in the last row equal 11, like this?
4 11
This solution left joins to #t2 twice and then does a COALESCE.
The ON clause of the second join matches only the rows that failed the first join, and then looks for the "default" row (the one whose id is NULL).
UPDATE t
set t.val = COALESCE(t2.val,t3.val)
from #t as t
LEFT join #t2 as t2
on t.id = t2.id
LEFT JOIN #t2 t3
ON t2.id is null and t3.id is null
Try this for the update:
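The double LEFT JOIN can be previewed in SQLite from Python. To keep the sketch portable it runs the joins as a SELECT that shows the value each #t row would receive, rather than as an UPDATE ... FROM; the tables are plain `t` and `t2` with the question's rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val INTEGER)")
conn.execute("CREATE TABLE t2 (id INTEGER, val INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(None, None), (2, None), (3, None), (4, None)])
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [(None, 11), (2, 22), (3, 33)])

preview = conn.execute("""
    SELECT t.id, COALESCE(t2.val, t3.val) AS new_val
    FROM t
    LEFT JOIN t2 ON t.id = t2.id                          -- normal match by id
    LEFT JOIN t2 AS t3 ON t2.id IS NULL AND t3.id IS NULL -- no match: use the default row
    ORDER BY t.id
""").fetchall()
print(preview)
```

Both the NULL-id row and id 4 fall through to the default value 11, because NULL = NULL is never true in the first join.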
update t
set t.val = t2.val
from #t as t join #t2 as t2
on t.id = t2.id
or
(
(t.id is null or not exists (select * from #t2 where id = t.id))
and t2.id is null
)
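The problem is the NOT IN operator combined with NULL values: because the subquery returns a NULL id, t.id NOT IN (select id from #t2) is never true, so the row with id 4 never matches. Filtering the NULLs out of the subquery would also work: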
update t
set t.val = t2.val
from #t as t join #t2 as t2
on t.id = t2.id
or
(
(t.id is null or t.id not in (select id from #t2 where id is not null))
and t2.id is null
)
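The NOT IN pitfall is easy to reproduce in SQLite from Python. The sketch compares NOT IN with and without a NULL in the list, NOT IN over a subquery that returns a NULL, and NOT EXISTS; SQLite reports UNKNOWN as `None` and true as `1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 4 is not in the list, but the NULL makes the result UNKNOWN, not true.
with_null = conn.execute("SELECT 4 NOT IN (2, 3, NULL)").fetchone()[0]
# Without the NULL, the comparison behaves as expected.
without_null = conn.execute("SELECT 4 NOT IN (2, 3)").fetchone()[0]

conn.execute("CREATE TABLE t2 (id INTEGER, val INTEGER)")
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [(None, 11), (2, 22), (3, 33)])
# Same problem when the NULL comes from a subquery.
not_in_subquery = conn.execute("SELECT 4 NOT IN (SELECT id FROM t2)").fetchone()[0]
# NOT EXISTS only asks whether a matching row exists, so the NULL does not matter.
not_exists = conn.execute("SELECT NOT EXISTS (SELECT 1 FROM t2 WHERE id = 4)").fetchone()[0]

print(with_null, without_null, not_in_subquery, not_exists)
```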
Here's a technique that may help.
Start with the kind of simple code you want to be writing:
MERGE INTO #t AS target
USING source
ON target.id = source.id
WHEN MATCHED THEN
UPDATE
SET val = source.val;
Then write a table expression (source) that satisfies the requirements.
Requirement 1: "joining by id"
-- simple existential quantification e.g.
SELECT id, val
FROM #t2 AS T2
WHERE id IN ( SELECT id FROM #t )
Requirement 2: "If I can not join then use value from default ID (default iD is the Id that is null)"
-- first find the id values in the target that do not exist in the source:
SELECT id
FROM #t
EXCEPT
SELECT id
FROM #t2
then cross join the result with the row from the source where Id is null:
SELECT DT1.id, T2.val
FROM ( SELECT id
FROM #t
EXCEPT
SELECT id
FROM #t2 ) AS DT1,
#t2 AS T2
WHERE T2.id IS NULL
At this point you will want to query some test data to ensure each query satisfies its respective requirement.
Union the above two results to form a single table expression:
SELECT id, val
FROM #t2 AS T2
WHERE id IN ( SELECT id
FROM #t )
UNION
SELECT DT1.id, T2.val
FROM ( SELECT id
FROM #t
EXCEPT
SELECT id
FROM #t2 ) AS DT1,
#t2 AS T2
WHERE T2.id IS NULL
Then plug the table expression into the MERGE boilerplate code:
WITH source
AS
(
SELECT id, val
FROM #t2 AS T2
WHERE id IN ( SELECT id
FROM #t )
UNION
SELECT DT1.id, T2.val
FROM ( SELECT id
FROM #t
EXCEPT
SELECT id
FROM #t2 ) AS DT1,
#t2 AS T2
WHERE T2.id IS NULL
)
MERGE INTO #t AS target
USING source
ON target.id = source.id
WHEN MATCHED THEN
UPDATE
SET val = source.val;
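SQLite has no MERGE, so this sketch (run from Python) only checks the "source" table expression against the question's data, which is the step the answer suggests testing before plugging it in. The alias `dflt` for the default row is an illustrative rename of the answer's `T2`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val INTEGER)")
conn.execute("CREATE TABLE t2 (id INTEGER, val INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(None, None), (2, None), (3, None), (4, None)])
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [(None, 11), (2, 22), (3, 33)])

source_rows = conn.execute("""
    -- Requirement 1: rows of t2 whose id also exists in t
    SELECT id, val FROM t2 WHERE id IN (SELECT id FROM t)
    UNION
    -- Requirement 2: ids of t missing from t2, paired with the default (NULL id) value
    SELECT dt1.id, dflt.val
    FROM (SELECT id FROM t EXCEPT SELECT id FROM t2) AS dt1
    CROSS JOIN t2 AS dflt
    WHERE dflt.id IS NULL
    ORDER BY id
""").fetchall()
print(source_rows)
```

In this sketch the row with a NULL id does not appear in the source: EXCEPT treats the two NULL ids as equal, and IN never matches NULL. Since target.id = source.id cannot match NULLs either, that target row would be left unchanged by the MERGE.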