SQL ORDER BY CSV input parameter

I have a sample table and a query that show the issue:
CREATE TABLE test
(
ID INT IDENTITY(1, 1),
NAME VARCHAR(250),
VALUE float
)
INSERT INTO test(NAME,[VALUE])VALUES('A',100)
INSERT INTO test(NAME,[VALUE])VALUES('B',200)
INSERT INTO test(NAME,[VALUE])VALUES('C',200)
SELECT * FROM test WHERE ID IN (2,1,3)
ID NAME VALUE
----------- --------- ----------------
1 A 100
2 B 200
3 C 200
Question: when I pass (2,1,3) in the WHERE clause, the result should come back in that same order, as below:
ID NAME VALUE
----------- --------- ----------------
2 B 200
1 A 100
3 C 200

I have no idea why you would expect the results to be in the order of the in list. SQL works with unordered sets; so there is no intrinsic ordering -- unless it is explicitly done with an order by clause.
It looks like you are using SQL Server. You can do what you want with a join:
SELECT t.*
FROM test t JOIN
(VALUES (2, 1), (1, 2), (3, 3)
) v(id, ordering)
ON t.id = v.id
ORDER BY ordering;
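
For anyone who wants to try the idea outside SQL Server, here is a minimal sketch using Python's built-in sqlite3 module. It splits the CSV parameter on the client, pairs each ID with its position, and joins on that list so the ORDER BY can use the position. The function name fetch_in_list_order is made up for illustration, and a CTE stands in for the v(id, ordering) derived table because SQLite does not accept that aliasing syntax.

```python
import sqlite3


def fetch_in_list_order(conn, csv_ids):
    """Return rows of test whose id is in csv_ids, in the order the ids were given."""
    ids = [int(part) for part in csv_ids.split(",")]
    # One "(?, ?)" pair per id: the id itself and its 1-based position in the list.
    pairs = ", ".join("(?, ?)" for _ in ids)
    params = [x for pos, id_ in enumerate(ids, start=1) for x in (id_, pos)]
    sql = f"""
        WITH v(id, ordering) AS (VALUES {pairs})
        SELECT t.id, t.name, t.value
        FROM test AS t
        JOIN v ON t.id = v.id
        ORDER BY v.ordering
    """
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT, value REAL)")
conn.executemany(
    "INSERT INTO test (name, value) VALUES (?, ?)",
    [("A", 100), ("B", 200), ("C", 200)],
)
rows = fetch_in_list_order(conn, "2,1,3")
```

The point is the same as in the T-SQL above: the ordering has to be carried as data and named in ORDER BY, since the IN list itself implies no order.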

If you grab a copy of DelimitedSplit8K you could do this:
-- assuming these values come in as a parameter:
DECLARE @searchString varchar(100) = '2,1,3';
-- solution using delimitedSplit8K
SELECT t.ID, t.Name, t.VALUE
FROM dbo.test t
JOIN dbo.delimitedSplit8K(@searchString,',') s ON s.item = t.id
ORDER BY s.itemNumber;
Results:
ID Name VALUE
----------- ----- ------
2 B 200
1 A 100
3 C 200
A nice property of this technique is that, if you examine the query execution plan, there is no Sort operator.

Related

SQL Server Pivot/Map column values to rows

I've made schema changes/improvements to a table, but I need to ensure that I don't lose any existing data and it is 'migrated' across to the new schema and conforms to its design.
The existing schema is designed as follows:
ID FK_ID ShowChartX ShowChartY ShowChartZ
-- ----- ---------- ---------- ----------
1 2 1 0 1
The columns of ShowChartX, ShowChartY, and ShowChartZ are of type BIT (boolean).
I've now created a standalone table that keeps a record/reference of each chart. Each Chart record has a Chart_ID - the aim here is to use an ID for each type of chart instead of horizontally scaling a 'ShowChart' column for each type of chart going forward. Essentially, I would like to map all columns of 'ShowChart' to their actual Chart_ID key in the table I mention below:
The new schema would look like this:
ID FK_ID Chart_ID
-- ----- --------
1 2 1
2 2 2
I've started looking at Pivot/Unpivot, but I'm not sure if it's the correct operation. Could anyone please point me in the right direction here? Thanks in advance!
This will UNPIVOT the data. You can also join the charts table by name to get the chart_id and check for differences with the new table:
DECLARE @DataSource TABLE
(
[ID] INT
,[FK_ID] INT
,[ShowChartX] BIT
,[ShowChartY] BIT
,[ShowChartZ] BIT
);
INSERT INTO @DataSource ([ID], [FK_ID], [ShowChartX], [ShowChartY], [ShowChartZ])
VALUES (1, 2, 1, 0, 1);
SELECT [ID]
,[FK_ID]
,[column] AS [chart_name]
FROM @DataSource DS
UNPIVOT
(
[value] FOR [column] IN ([ShowChartX], [ShowChartY], [ShowChartZ])
) UNPVT
WHERE [value] = 1;
For checking for differences it's pretty easy to use EXCEPT - for example:
SELECT *
FROM T1
EXCEPT
SELECT *
FROM T2;
to get the records that are in T1 but not in T2, and then the reverse:
SELECT *
FROM T2
EXCEPT
SELECT *
FROM T1;
Thanks to @gotqn for the table definition and values.
The same result can be achieved using CROSS APPLY. Here, I am deriving Chart_ID from ChartType, as I don't have the ChartTypes reference table. Ideally, you would join with ChartTypes to get the corresponding Chart_ID.
DECLARE @DataSource TABLE
(
[ID] INT
,[FK_ID] INT
,[ShowChartX] BIT
,[ShowChartY] BIT
,[ShowChartZ] BIT
);
INSERT INTO @DataSource ([ID], [FK_ID], [ShowChartX], [ShowChartY], [ShowChartZ])
VALUES (1, 2, 1, 0, 1);
SELECT id,
fk_id,
CASE charttype
WHEN 'ChartX' THEN 1
WHEN 'ChartY' THEN 3
WHEN 'ChartZ' THEN 2
END AS Chart_ID
FROM @DataSource
CROSS apply (VALUES('ChartX', showchartx),
('ChartY', showcharty),
('ChartZ', showchartz)) AS t(charttype, isavailable)
WHERE isavailable <> 0;
Result set
+----+-------+----------+
| ID | FK_ID | Chart_ID |
+----+-------+----------+
| 1 | 2 | 1 |
| 1 | 2 | 2 |
+----+-------+----------+
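
As a rough illustration of the "join with the chart table" variant mentioned in both answers, here is a sketch using Python's built-in sqlite3 module. SQLite has neither UNPIVOT nor CROSS APPLY, so a UNION ALL plays that role here. The charts table, its column names and its ID values are assumptions for the example (the IDs simply mirror the CASE expression above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data_source (
        id INTEGER, fk_id INTEGER,
        show_chart_x INTEGER, show_chart_y INTEGER, show_chart_z INTEGER
    );
    INSERT INTO data_source VALUES (1, 2, 1, 0, 1);

    -- Hypothetical lookup table; the real chart table may look different.
    CREATE TABLE charts (chart_id INTEGER PRIMARY KEY, chart_name TEXT);
    INSERT INTO charts VALUES (1, 'ChartX'), (3, 'ChartY'), (2, 'ChartZ');
""")

rows = conn.execute("""
    SELECT u.id, u.fk_id, c.chart_id
    FROM (
        -- One branch per flag column turns the columns into rows.
        SELECT id, fk_id, 'ChartX' AS chart_name, show_chart_x AS is_shown FROM data_source
        UNION ALL
        SELECT id, fk_id, 'ChartY', show_chart_y FROM data_source
        UNION ALL
        SELECT id, fk_id, 'ChartZ', show_chart_z FROM data_source
    ) AS u
    JOIN charts AS c ON c.chart_name = u.chart_name
    WHERE u.is_shown <> 0
    ORDER BY u.id, c.chart_id
""").fetchall()
```

Only the flags that are set survive the WHERE filter, so the sample row yields one output row per enabled chart.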

How can I get a random number generated in a CTE not to change in JOIN?

The problem
I'm generating a random number for each row in a table #Table_1 in a CTE, using this technique. I'm then joining the results of the CTE on another table, #Table_2. Instead of getting a random number for each row in #Table_1, I'm getting a new random number for every resulting row in the join!
CREATE TABLE #Table_1 (Id INT)
CREATE TABLE #Table_2 (MyId INT, ParentId INT)
INSERT INTO #Table_1
VALUES (1), (2), (3)
INSERT INTO #Table_2
VALUES (1, 1), (2, 1), (3, 1), (4, 1), (1, 2), (2, 2), (3, 2), (1, 3)
;WITH RandomCTE AS
(
SELECT Id, (ABS(CHECKSUM(NewId())) % 5)RandomNumber
FROM #Table_1
)
SELECT r.Id, t.MyId, r.RandomNumber
FROM RandomCTE r
INNER JOIN #Table_2 t
ON r.Id = t.ParentId
The results
Id MyId RandomNumber
----------- ----------- ------------
1 1 1
1 2 2
1 3 0
1 4 3
2 1 4
2 2 0
2 3 0
3 1 3
The desired results
Id MyId RandomNumber
----------- ----------- ------------
1 1 1
1 2 1
1 3 1
1 4 1
2 1 4
2 2 4
2 3 4
3 1 3
What I tried
I tried to obscure the logic of the random number generation from the optimizer by casting the random number to VARCHAR, but that did not work.
What I don't want to do
I'd like to avoid using a temporary table to store the results of the CTE.
How can I generate a random number for a table and preserve that random number in a join without using temporary storage?
This seems to do the trick:
WITH CTE AS(
SELECT Id, (ABS(CHECKSUM(NewId())) % 5)RandomNumber
FROM #Table_1),
RandomCTE AS(
SELECT Id,
RandomNumber
FROM CTE
GROUP BY ID, RandomNumber)
SELECT *
FROM RandomCTE r
INNER JOIN #Table_2 t
ON r.Id = t.ParentId;
It looks like SQL Server is aware that, outside the CTE, RandomNumber is effectively just NEWID() with some additional functions wrapped around it (DB<>Fiddle), and hence it still generates a new value for each row of the join. The GROUP BY clause in the second CTE therefore forces the data engine to compute a value for RandomNumber so that it can perform the GROUP BY.
Per the quote in this answer
The optimizer does not guarantee timing or number of executions of
scalar functions. This is a long-established tenet. It's the
fundamental 'leeway' that allows the optimizer enough freedom to gain
significant improvements in query-plan execution.
If it is important for your application that the random number be evaluated once and only once you should calculate it up front and store it into a temp table.
Anything else is not guaranteed and so is irresponsible to add into your application's code base - as even if it works now it may break as a result of a schema change/execution plan change/version upgrade/CU install.
For example Lamu's answer breaks if a unique index is added to #Table_1 (Id)
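
To make the "calculate it up front and store it" advice concrete, here is a small sketch using Python's built-in sqlite3 module rather than T-SQL. The table names table_1, table_2 and random_per_row are stand-ins chosen for the example. Each random number is generated exactly once, written to a temporary table, and only then joined, so the join can only repeat stored values.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER);
    CREATE TABLE table_2 (my_id INTEGER, parent_id INTEGER);
    INSERT INTO table_1 VALUES (1), (2), (3);
    INSERT INTO table_2 VALUES
        (1, 1), (2, 1), (3, 1), (4, 1), (1, 2), (2, 2), (3, 2), (1, 3);
    CREATE TEMP TABLE random_per_row (id INTEGER PRIMARY KEY, random_number INTEGER);
""")

# Generate one number in 0..4 per source row and materialize it before joining.
ids = [row[0] for row in conn.execute("SELECT id FROM table_1")]
conn.executemany(
    "INSERT INTO random_per_row VALUES (?, ?)",
    [(id_, random.randrange(5)) for id_ in ids],
)

rows = conn.execute("""
    SELECT r.id, t.my_id, r.random_number
    FROM random_per_row AS r
    JOIN table_2 AS t ON r.id = t.parent_id
    ORDER BY r.id, t.my_id
""").fetchall()
```

Because the values are stored before the join runs, every child row of a given Id sees the same RandomNumber regardless of how the query is planned.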
How about not using a real random number at all? Use rand() with a seed:
WITH RandomCTE AS (
SELECT Id,
CONVERT(INT, RAND(ROW_NUMBER() OVER (ORDER BY NEWID()) * 999999) * 5) as RandomNumber
FROM #Table_1
)
SELECT r.Id, t.MyId, r.RandomNumber
FROM RandomCTE r INNER JOIN
#Table_2 t
ON r.Id = t.ParentId;
The seed argument to rand() is pretty awful. Values of the seed near each other produce similar initial values, which is the reason for the multiplication.
Here is the db<>fiddle.

SQL Server : build valid tree filtering invalid branches

I have a table with following data:
ID ParentID Name
-----------------------
1 NULL OK1
2 1 OK2
3 2 OK3
5 4 BAD1
6 5 BAD2
So I need to take only those rows that either have ParentID = NULL or are valid children of such rows (i.e. OK3 is valid because it's linked to OK2, which is linked to OK1, which has ParentID = NULL and is therefore valid).
But BAD1 and BAD2 are not valid, because they are not linked to a row that leads back to a NULL parent.
The best solution I have figured out is a procedure plus a function, and the function is called as many times as the maximum number of link levels in the table.
Can anybody suggest a better solution for this task?
All you need is love, and a basic recursive CTE :-)
Create and populate sample data (Please save us this step in future questions):
DECLARE @T as table
(
ID int,
ParentID int,
Name varchar(4)
)
INSERT INTO @T VALUES
(1, NULL, 'OK1'),
(2, 1, 'OK2'),
(3, 2, 'OK3'),
(5, 4, 'BAD1'),
(6, 5, 'BAD2')
The CTE and query:
;WITH CTE AS
(
SELECT ID, ParentId, Name
FROM @T
WHERE ParentId IS NULL
UNION ALL
SELECT T1.ID, T1.ParentId, T1.Name
FROM @T T1
INNER JOIN CTE T2 ON T1.ParentID = T2.ID
)
SELECT *
FROM CTE
Results:
ID ParentId Name
----------- ----------- ----
1 NULL OK1
2 1 OK2
3 2 OK3
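
The same recursive walk can be tried outside SQL Server. Below is a sketch using Python's built-in sqlite3 module; the table name t and the snake_case column names are stand-ins for the example, and SQLite spells the construct WITH RECURSIVE. The anchor member selects the roots (parent_id IS NULL) and the recursive member keeps adding children of rows already found, so branches that never reach a root are never visited.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, parent_id INTEGER, name TEXT);
    INSERT INTO t VALUES
        (1, NULL, 'OK1'),
        (2, 1, 'OK2'),
        (3, 2, 'OK3'),
        (5, 4, 'BAD1'),
        (6, 5, 'BAD2');
""")

rows = conn.execute("""
    WITH RECURSIVE valid(id, parent_id, name) AS (
        -- Anchor: rows with no parent are the valid roots.
        SELECT id, parent_id, name FROM t WHERE parent_id IS NULL
        UNION ALL
        -- Recursive step: children whose parent is already known to be valid.
        SELECT child.id, child.parent_id, child.name
        FROM t AS child
        JOIN valid ON child.parent_id = valid.id
    )
    SELECT id, parent_id, name FROM valid ORDER BY id
""").fetchall()
```

BAD1 and BAD2 drop out without any explicit filter, because their chain starts at a parent (ID 4) that does not exist in the table.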

Count value in output with normal rows

I have two tables named TEST and STEPS which are related by Test-Id column.
I am able to get all required columns by doing a join as below.
select t.id,t.name,s.step_no,s.step_data
from test t,steps s
where t.id = s.testid
What I require is that, apart from these columns, I also get the total count of rows for each match.
Fiddle: http://sqlfiddle.com/#!6/794508/1
Current Output:
ID NAME STEP_NO STEP_DATA
-- ---- ------- ---------
1 TC1 1 Step 1
1 TC1 2 Step 2
1 TC1 3 Step 3
2 TC2 1 Step 1
Required Output:
ID NAME STEP_NO STEP_DATA COUNT
-- ---- ------- --------- -----
1 TC1 1 Step 1 3
1 TC1 2 Step 2 3
1 TC1 3 Step 3 3
2 TC2 1 Step 1 1
Where count is the total number of rows from the STEPS table for each Id in TEST table.
Please let me know if you need any information.
You could just add count(*) over ... to your query:
SELECT
t.id,
t.name,
s.step_no,
s.step_data,
[count] = COUNT(*) OVER (PARTITION BY s.testid)
FROM
test t,
steps s
WHERE
t.id = s.testid
You can read more about the OVER clause here:
OVER Clause (Transact-SQL)
Please consider also getting into the habit of
always specifying the schema for your tables, e.g.
test -> dbo.test
using the proper JOIN syntax, i.e. instead of
FROM
a, b
WHERE
a.col = b.col
do
FROM a
INNER JOIN b
ON a.col = b.col
ending your statements with a semicolon.
So, taking all those points into account, we could rewrite the above query like this:
SELECT
t.id,
t.name,
s.step_no,
s.step_data,
[count] = COUNT(*) OVER (PARTITION BY s.testid)
FROM
dbo.test AS t
INNER JOIN
dbo.steps AS s
ON
t.id = s.testid
;
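
For a quick way to experiment with the windowed count, here is a sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added; the alias step_count is chosen for the example. The count is computed per testid partition and repeated on every row of that partition, without collapsing the rows the way GROUP BY would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE steps (id INTEGER PRIMARY KEY, testid INTEGER, step_no INTEGER, step_data TEXT);
    INSERT INTO test (name) VALUES ('TC1'), ('TC2');
    INSERT INTO steps (testid, step_no, step_data) VALUES
        (1, 1, 'Step 1'), (1, 2, 'Step 2'), (1, 3, 'Step 3'), (2, 1, 'Step 1');
""")

rows = conn.execute("""
    SELECT t.id, t.name, s.step_no, s.step_data,
           COUNT(*) OVER (PARTITION BY s.testid) AS step_count
    FROM test AS t
    JOIN steps AS s ON t.id = s.testid
    ORDER BY t.id, s.step_no
""").fetchall()
```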
select t.id,t.name,s.step_no,s.step_data,counts.count
from test t
join steps s ON t.id = s.testid
join (select testid, count(*) as count
from steps
group by testid) counts ON t.id = counts.testid
See if this works:
DECLARE @test TABLE
(
id int identity primary key,
name varchar(20)
);
INSERT INTO @test VALUES('TC1'), ('TC2');
DECLARE @steps TABLE
(
id int identity primary key,
testid int,
step_no int,
step_data varchar(100)
);
INSERT INTO @steps(testid,step_no,step_data) VALUES
(1,1,'Step 1'), (1,2,'Step 2'),(1,3,'Step 3'),(2,1,'Step 1');
-- COUNT(*) rather than SUM(testid): we want the number of step rows per test
select t.id,t.name,s.step_no,s.step_data,(select COUNT(*) from @steps where testid = s.testid) AS [count]
from @test t,@steps s
where t.id = s.testid

Fixing duplicate rows in a table

I have a table like below
DECLARE @ProductTotals TABLE
(
id int,
value nvarchar(50)
)
which has the following values
1, 'abc'
2, 'abc'
1, 'abc'
3, 'abc'
I want to update this table so that it has the following values
1, 'abc'
2, 'abc_1'
1, 'abc'
3, 'abc_2'
Could someone help me out with this?
Use a cursor to move over the table and try to insert every row in a second temporary table. If you get a collision (technically with a select), you can run a second query to get the maximum number (if any) that's appended to your item.
Once you know what maximum number is used (use isnull to cover the case of the first duplicate) just run an update over your original table and keep going with your scan.
Are you looking to remove duplicates, or just change the values so they aren't duplicates?
To change the values, use:
update producttotals
set value = 'abc_1'
where id =2;
update producttotals
set value = 'abc_2'
where id =3;
To find duplicate rows, run:
select id, value
from producttotals
group by id, value
having count(*) > 1;
Assuming SQL Server 2005 or greater
DECLARE @ProductTotals TABLE
(
id int,
value nvarchar(50)
)
INSERT INTO @ProductTotals
VALUES (1, 'abc'),
(2, 'abc'),
(1, 'abc'),
(3, 'abc')
;WITH CTE as
(SELECT
ROW_NUMBER() OVER (Partition by value order by id) rn,
id,
value
FROM
@ProductTotals),
new_values as (
SELECT
pt.id,
pt.value,
pt.value + '_' + CAST( ROW_NUMBER() OVER (partition by pt.value order by pt.id) as varchar) new_value
FROM
@ProductTotals pt
INNER JOIN CTE
ON pt.id = CTE.id
and pt.value = CTE.value
WHERE
pt.id NOT IN (SELECT id FROM CTE WHERE rn = 1)) --remove any with the lowest ID for the value
UPDATE
pt
SET
pt.value = nv.new_value
FROM
@ProductTotals pt
inner join new_values nv
ON pt.id = nv.id and pt.value = nv.value
SELECT * FROM @ProductTotals
Will produce the following
id value
----------- --------------------------------------------------
1 abc
2 abc_1
1 abc
3 abc_2
Explanation of the SQL
The first CTE assigns a row number within each value, ordered by id, so the numbering restarts whenever it sees a new value
rn id value
-------------------- ----------- --------
1 1 abc
2 1 abc
3 2 abc
4 3 abc
The second CTE, called new_values, ignores any IDs that are associated with an rn of 1. So the rows with rn 1 and rn 2 are removed, because they share the same ID. It also uses ROW_NUMBER() again to determine the suffix number for new_value
id value new_value
----------- ------ -------------
2 abc abc_1
3 abc abc_2
The final statement just updates the Old value with the new value
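
The renaming rule described above can also be checked in a few lines of plain Python. This is only a sketch of the intended logic, not a translation of the query: the function name suffix_duplicates is made up, rows with the lowest id for a value keep that value unchanged, and every other distinct id gets a running suffix in id order.

```python
def suffix_duplicates(rows):
    """Append _1, _2, ... to values whose id is not the lowest id for that value."""
    # Lowest id seen for each value; rows carrying that id are left alone.
    first_id = {}
    for id_, value in rows:
        first_id[value] = min(id_, first_id.get(value, id_))

    # Number the remaining distinct (id, value) pairs per value, in id order.
    suffix_for = {}
    counters = {}
    for id_, value in sorted(set(rows)):
        if id_ != first_id[value]:
            counters[value] = counters.get(value, 0) + 1
            suffix_for[(id_, value)] = counters[value]

    result = []
    for id_, value in rows:
        if (id_, value) in suffix_for:
            value = f"{value}_{suffix_for[(id_, value)]}"
        result.append((id_, value))
    return result


product_totals = [(1, "abc"), (2, "abc"), (1, "abc"), (3, "abc")]
updated = suffix_duplicates(product_totals)
```

Both rows with id 1 stay as 'abc', which matches the result set shown above.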