Related
I have a T-SQL query to do in the most efficient way as possible.
Here is an example of my table:
A
B
C
D
E
1
x, y
z
NULL
NULL
2
x
NULL
NULL
y
3
y
z
NULL
NULL
4
a
NULL
b
x
Now, I need to do a query to classify my best matching records. Let's say that I need to take the top 3 records that match the more of the values 'x' & 'y' (it could be more than 2 values) into the columns B, C, D, E
A
NumberOfMatches
Comment
1
2
Because Column B contains x, y
2
2
Because Column B contains x & Column E contains y
3
1
Because Column B contains y
4
1
Because Column E contains x
Could you help me to find a good way to do this query?
I would highly recommend that you change how your data is stored, storing delimited strings to record multiple values is a recipe for disaster. If you have a one to many relationship use a child table with a foreign key to the main table. There is very rarely a reason to store delimited data like this, and when you have to query it and manipulate it you realise why it is not advised.
Assuming that each of your columns are the same and can hold many values, you'll need to split all of them, which you can do using this:
SELECT t.A, upvt.Col, Value = TRIM(ss.value)
FROM #T AS t
CROSS APPLY (VALUES ('B', t.B), ('C', t.C), ('D', t.D), ('E', t.E)) upvt (Col, Value)
CROSS APPLY STRING_SPLIT(upvt.Value, ',') AS ss;
Which gives:
A
Col
Value
1
B
x
1
B
y
1
C
z
2
B
x
2
E
y
3
B
y
3
C
z
4
B
a
4
D
b
4
E
x
With this normalised data, you can then just do a simple WHERE Value IN ('x', 'y') along with GROUP BY and COUNT(*):
IF OBJECT_ID(N'tempdb..#T ', 'U') IS NOT NULL
DROP TABLE #T ;
CREATE TABLE #T (A INT, B VARCHAR(4), C VARCHAR(4), D VARCHAR(4), E VARCHAR(4));
INSERT #T(A, B, C, D, E)
VALUES
(1, 'x, y', 'z', NULL, NULL),
(2, 'x', NULL, NULL, 'y'),
(3, 'y', 'z', NULL, NULL),
(4, 'a', NULL, 'b', 'x');
SELECT t.A, NumberOfMatches = COUNT(*)
FROM #T AS t
CROSS APPLY (VALUES (t.B), (t.C), (t.D), (t.E)) upvt (Value)
CROSS APPLY STRING_SPLIT(upvt.Value, ',') AS ss
WHERE TRIM(ss.value) IN ('x', 'y')
GROUP BY t.A
ORDER BY COUNT(*) DESC, t.A;
If you can't normalize the model you can use this query to get your expected result:
select a, count(*) numberOfMatches,
concat('Because column ', string_agg(columnName, ', ')) AS comment
from (
select a, columnName, trim(value) targetValue from (
select a, columnName, value result
from tbl unpivot (value for columnName in ([B], [C], [D], [E])) up
) t
outer apply string_split(result,',')
) r
where targetValue in ('x','y')
group by a
-- Result
/*
a numberOfMatches comment
1 2 Because column B, B
2 2 Because column B, E
3 1 Because column B
4 1 Because column E
*/
Update
You can use this custom function in order to implement string_split in SQL Server prior versions.
IF OBJECT_ID('[dbo].[STRING_SPLIT]','IF') IS NULL BEGIN
EXEC ('CREATE FUNCTION [dbo].[STRING_SPLIT] () RETURNS TABLE AS RETURN SELECT 1 X')
END
GO
ALTER FUNCTION [dbo].[STRING_SPLIT]
(
#string nvarchar(MAX),
#separator nvarchar(MAX)
)
RETURNS TABLE WITH SCHEMABINDING
AS RETURN
WITH X(N) AS (SELECT 'Table1' FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) T(C)),
Y(N) AS (SELECT 'Table2' FROM X A1, X A2, X A3, X A4, X A5, X A6, X A7, X A8) , -- Up to 16^8 = 4 billion
T(N) AS (SELECT TOP(ISNULL(LEN(#string),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 N FROM Y),
Delim(Pos) AS (SELECT t.N FROM T WHERE (SUBSTRING(#string, t.N, LEN(#separator+'x')-1) LIKE #separator OR t.N = 0)),
Separated(value) AS (SELECT SUBSTRING(#string, d.Pos + LEN(#separator+'x')-1, LEAD(d.Pos,1,2147483647) OVER (ORDER BY (SELECT NULL)) - d.Pos - LEN(#separator))
FROM Delim d
WHERE #string IS NOT NULL)
SELECT s.value
FROM Separated s
WHERE s.value <> #separator
GO
Update 2
Just 1 if the field has x,y at the same time:
select a, count(distinct columnName) numberOfMatches,
concat('Because column ', string_agg(columnName, ', ')) AS comment
from (
select a, columnName, trim(value) targetValue from (
select a, columnName, value result
from tbl unpivot (value for columnName in ([B], [C], [D], [E])) up
) t
outer apply string_split(result,',')
) r
where targetValue in ('x','y')
group by a
/* another alternative */
select a, count(*) numberOfMatches,
concat('Because column ', string_agg(columnName, ', ')) AS comment
from (
select a, columnName, value targetValue
from tbl unpivot (value for columnName in ([B], [C], [D], [E])) up
) t
where targetValue like '%x%' or targetValue like '%y%'
group by a
-- Result
/*
a numberOfMatches comment
1 1 Because column B
2 2 Because column B, E
3 1 Because column B
4 1 Because column E
*/
I've two database tables, one called "Headers" and one called "Rows".
The structure is:
Header: IDPK | Description
Row: IDPK | IDPK_Header | Item_ID | Qty
I need to do a query that says: "From a Header, IDPK find another header that have the same number of rows and the same item ID and quantity".
For example:
Header Rows
IDPK Description IDPK Item_ID Qty
1 'Test1' 1 'A' 10
1 'Test1' 2 'B' 20
2 'Test2' 3 'A' 10
2 'Test2' 4 'B' 20
3 'Test3' 5 'A' 5
3 'Test3' 6 'B' 20
4 'Test4' 7 'A' 10
Header Test1 match Test2 but not Test3 and Test4
The problem is that the number of rows must be exactly the same. I try with ALL operator but without luck.
How I can do the query with an eye for the performance? The two tables can be very huge (~500.000 records).
Assuming there are no duplicates:
with r as (
select r.*, count(*) over (partition by idpk_header) as num_items
from rows r
)
select r1.idpk_header, r2.idpk_header
from r r1 join
r r2
on r1.item_id = r1.item_id and r2.qty = r1.qty and r2.num_items = r1.num_items
group by r1.idpk_header, r2.idpk_header, r1.num_items
having count(*) = r1.num_items;
Basically, this does a self-join on the items, so you only get matches. The on validates that the two have the same number of items. And the having guarantees that all match.
Note: This version returns each match of the header to itself. That is a nice check. You can of course filter this out in the on or a where clause.
If you do have duplicate items, you can simply replace r with:
select idpk_header, item_id, sum(qty) as qty,
count(*) over (partition by idpk_header) as num_items
from rows r
group by idpk_header, item_id;
I woul suggest using a forxml query in order to create the list of items per IDPK. Next I would search for matching item lists and quantities. See following example:
DECLARE #Headers TABLE(
IDPK INT,
Description NVARCHAR(100)
)
DECLARE #Rows TABLE(
IDPK INT,
ITEMID NVARCHAR(1),
Qty INT
)
INSERT INTO #Headers VALUES
(1, 'Test1'),
(2, 'Test2'),
(3, 'Test3'),
(4, 'Test4'),
(5, 'Test5')
INSERT INTO #Rows VALUES
(1, 'A', 10),
(1, 'B', 20),
(2, 'A', 10),
(2, 'B', 20),
(3, 'A', 5 ),
(3, 'B', 20),
(4, 'C', 10),
(5, 'A', 10),
(5, 'C', 20)
;
WITH cteHeaderRows AS(
SELECT IDPK
,ItemIDs=STUFF(
(
SELECT ',' + CAST(ITEMID AS VARCHAR(MAX))
FROM #Rows t2
WHERE t2.IDPK = t1.IDPK
ORDER BY ITEMID, QTY
FOR XML PATH('')
),1,1,''
)
,Qtys=STUFF(
(
SELECT ',' + CAST(Qty AS VARCHAR(MAX))
FROM #Rows t2
WHERE t2.IDPK = t1.IDPK
ORDER BY ITEMID, QTY
FOR XML PATH('')
),1,1,''
)
FROM #Rows t1
GROUP BY IDPK
),
cteFilter AS(
SELECT h1.IDPK AS IDPK1, h2.IDPK AS IDPK2
FROM cteHeaderRows h1
JOIN cteHeaderRows h2 ON h1.IDPK != h2.IDPK AND h1.ItemIDs = h2.ItemIDs AND h2.Qtys = h1.Qtys
)
SELECT DISTINCT h.IDPK, h.Description, r.ItemID, r.Qty
FROM #Headers h
JOIN cteFilter f ON f.IDPK1 = h.IDPK
JOIN #Rows r ON r.IDPK = f.IDPK1
ORDER BY 1,3,4
We have a requirement to assign row number to all rows using following rule
Row if pinned should have same row number
Otherwise sort it by GMD
Example:
ID GMD IsPinned
1 2.5 0
2 0 1
3 2 0
4 4 1
5 3 0
Should Output
ID GMD IsPinned RowNo
5 3 0 1
2 0 1 2
1 2.5 0 3
4 4 1 4
3 2 0 5
Please Note row number for Id's 2 and 4 stayed intact as they are pinned with values of 2 and 4 respectively even though the GMD are not in any order
Rest of rows Id's 1, 3 and 5 row numbers are sorted using GMD desc
I tried using RowNumber SQL 2012 however, it is pushing pinned items from their position
Here's a set-based approach to solving this. Note that the first CTE is unnecessary if you already have a Numbers table in your database:
declare #t table (ID int,GMD decimal(5,2),IsPinned bit)
insert into #t (ID,GMD,IsPinned) values
(1,2.5,0), (2, 0 ,1), (3, 2 ,0), (4, 4 ,1), (5, 3 ,0)
;With Numbers as (
select ROW_NUMBER() OVER (ORDER BY ID) n from #t
), NumbersWithout as (
select
n,
ROW_NUMBER() OVER (ORDER BY n) as rn
from
Numbers
where n not in (select ID from #t where IsPinned=1)
), DataWithout as (
select
*,
ROW_NUMBER() OVER (ORDER BY GMD desc) as rn
from
#t
where
IsPinned = 0
)
select
t.*,
COALESCE(nw.n,t.ID) as RowNo
from
#t t
left join
DataWithout dw
inner join
NumbersWithout nw
on
dw.rn = nw.rn
on
dw.ID = t.ID
order by COALESCE(nw.n,t.ID)
Hopefully my naming makes it clear what we're doing. I'm a bit cheeky in the final SELECT by using a COALESCE to get the final RowNo when you might have expected a CASE expression. But it works because the contents of the DataWithout CTE is defined to only exist for unpinned items which makes the final LEFT JOIN fail.
Results:
ID GMD IsPinned RowNo
----------- --------------------------------------- -------- --------------------
5 3.00 0 1
2 0.00 1 2
1 2.50 0 3
4 4.00 1 4
3 2.00 0 5
Second variant that may perform better (but never assume, always test):
declare #t table (ID int,GMD decimal(5,2),IsPinned bit)
insert into #t (ID,GMD,IsPinned) values
(1,2.5,0), (2, 0 ,1), (3, 2 ,0), (4, 4 ,1), (5, 3 ,0)
;With Numbers as (
select ROW_NUMBER() OVER (ORDER BY ID) n from #t
), NumbersWithout as (
select
n,
ROW_NUMBER() OVER (ORDER BY n) as rn
from
Numbers
where n not in (select ID from #t where IsPinned=1)
), DataPartitioned as (
select
*,
ROW_NUMBER() OVER (PARTITION BY IsPinned ORDER BY GMD desc) as rn
from
#t
)
select
dp.ID,dp.GMD,dp.IsPinned,
CASE WHEN IsPinned = 1 THEN ID ELSE nw.n END as RowNo
from
DataPartitioned dp
left join
NumbersWithout nw
on
dp.rn = nw.rn
order by RowNo
In the third CTE, by introducing the PARTITION BY and removing the WHERE clause we ensure we have all rows of data so we don't need to re-join to the original table in the final result in this variant.
this will work:
CREATE TABLE Table1
("ID" int, "GMD" number, "IsPinned" int)
;
INSERT ALL
INTO Table1 ("ID", "GMD", "IsPinned")
VALUES (1, 2.5, 0)
INTO Table1 ("ID", "GMD", "IsPinned")
VALUES (2, 0, 1)
INTO Table1 ("ID", "GMD", "IsPinned")
VALUES (3, 2, 0)
INTO Table1 ("ID", "GMD", "IsPinned")
VALUES (4, 4, 1)
INTO Table1 ("ID", "GMD", "IsPinned")
VALUES (5, 3, 0)
SELECT * FROM dual
;
select * from (select "ID","GMD","IsPinned",rank from(select m.*,rank()over(order by
"ID" asc) rank from Table1 m where "IsPinned"=1)
union
(select "ID","GMD","IsPinned",rank from (select t.*,rank() over(order by "GMD"
desc)-1 rank from (SELECT * FROM Table1)t)
where "IsPinned"=0) order by "GMD" desc) order by rank ,GMD;
output:
2 0 1 1
5 3 0 1
1 2.5 0 2
4 4 1 2
3 2 0 3
Can you try this query
CREATE TABLE Table1
(ID int, GMD numeric (18,2), IsPinned int);
INSERT INTO Table1 (ID,GMD, IsPinned)
VALUES (1, 2.5, 0),
(2, 0, 1),
(3, 2, 0),
(4, 4, 1),
(5, 3, 0)
select *, row_number () over(partition by IsPinned order by (case when IsPinned =0 then GMD else id end) ) [CustOrder] from Table1
This took longer then I thought, the thing is row_number would take a part to resolve the query. We need to differentiate the row_numbers by id first and then we can apply the while loop or cursor or any iteration, in our case we will just use the while loop.
dbo.test (you can replace test with your table name)
1 2.5 False
2 0 True
3 3 False
4 4 True
6 2 False
Here is the query I wrote to achieve your result, I have added comment under each operation you should get it, if you have any difficultly let me know.
Query:
--user data table
DECLARE #userData TABLE
(
id INT NOT NULL,
gmd FLOAT NOT NULL,
ispinned BIT NOT NULL,
rownumber INT NOT NULL
);
--final result table
DECLARE #finalResult TABLE
(
id INT NOT NULL,
gmd FLOAT NOT NULL,
ispinned BIT NOT NULL,
newrownumber INT NOT NULL
);
--inserting to uer data table from the table test
INSERT INTO #userData
SELECT t.*,
Row_number()
OVER (
ORDER BY t.id ASC) AS RowNumber
FROM test t
--creating new table for ids of not pinned
CREATE TABLE #ids
(
rn INT,
id INT,
gmd FLOAT
)
-- inserting into temp table named and adding gmd by desc
INSERT INTO #ids
(rn,
id,
gmd)
SELECT DISTINCT Row_number()
OVER(
ORDER BY gmd DESC) AS rn,
id,
gmd
FROM #userData
WHERE ispinned = 0
--declaring the variable to loop through all the no pinned items
DECLARE #id INT
DECLARE #totalrows INT = (SELECT Count(*)
FROM #ids)
DECLARE #currentrow INT = 1
DECLARE #assigningNumber INT = 1
--inerting pinned items first
INSERT INTO #finalResult
SELECT ud.id,
ud.gmd,
ud.ispinned,
ud.rownumber
FROM #userData ud
WHERE ispinned = 1
--looping through all the rows till all non-pinned items finished
WHILE #currentrow <= #totalrows
BEGIN
--skipping pinned numers for the rows
WHILE EXISTS(SELECT 1
FROM #finalResult
WHERE newrownumber = #assigningNumber
AND ispinned = 1)
BEGIN
SET #assigningNumber = #assigningNumber + 1
END
--getting row by the number
SET #id = (SELECT id
FROM #ids
WHERE rn = #currentrow)
--inserting the non-pinned item with new row number into the final result
INSERT INTO #finalResult
SELECT ud.id,
ud.gmd,
ud.ispinned,
#assigningNumber
FROM #userData ud
WHERE id = #id
--going to next row
SET #currentrow = #currentrow + 1
SET #assigningNumber = #assigningNumber + 1
END
--getting final result
SELECT *
FROM #finalResult
ORDER BY newrownumber ASC
--dropping table
DROP TABLE #ids
Output:
I have a table with 2 columns OLD_VALUE and NEW_VALUE and 5 rows. 1st row has values (A,B). Other row values can be (B,C),(C,D),(E,D),(D,F). I want to update all the old values with the new value (how a vlookup in excel would work) The Final Result Required: The newest value in the above example would be D,F. i.e. D points to F. E and C point to D. B points to C and A points to B. D pointing to F is the last and newest and there are no more successions after D,F. So (OLD_VALUE,NEW_VALUE)->(A,F), (B,F), (C,F), (D,F), (E,F). I want 5 rows with the NEW_VALUE as 'F'. The level of successions can be ranging from 1 to x.
This is the table I have used for the script:
declare #t as table(old_value char(1), new_value char(1));
insert into #t values('A','B')
insert into #t values('B','C')
insert into #t values('C','D')
insert into #t values('E','D')
insert into #t values('D','F')
This needs to be done with a recursive CTE. First, you will need to define an anchor for the CTE. The anchor in this case should be the record with the latest value. This is how I define the anchor:
select old_value, new_value, 1 as level
from #t
where new_value NOT IN (select old_value from #t)
And here is the recursive CTE I used to locate the latest value for each row:
;with a as(
select old_value, new_value, 1 as level
from #t
where new_value NOT IN (select old_value from #t)
union all
select b.old_value, a.new_value, a.level + 1
from a INNER JOIN #t b ON a.old_value = b.new_value
)
select * from a
Results:
old_value new_value level
--------- --------- -----------
D F 1
C F 2
E F 2
B F 3
A F 4
(5 row(s) affected)
I think a recursive CTE like the following is what you're looking for (where the parent is the row whose second value does not exist as a first value elsewhere). If there's no parent(s) to anchor to, this would fail (e.g. if you had A->B, B->C, C->A, you'd get no result), but it should work for your case:
DECLARE #T TABLE (val1 CHAR(1), val2 CHAR(2));
INSERT #T VALUES ('A', 'B'), ('B', 'C'), ('C', 'D'), ('E', 'D'), ('D', 'F');
WITH CTE AS
(
SELECT val1, val2
FROM #T AS T
WHERE NOT EXISTS (SELECT 1 FROM #T WHERE val1 = T.val2)
UNION ALL
SELECT T.val1, CTE.val2
FROM #T AS T
JOIN CTE
ON CTE.val1 = T.val2
)
SELECT *
FROM CTE;
Currently I have a system that is dumping data into a table with the format:
Table1
Id#, row#, row_dump
222, 1, “set1 = aaaa set2 =aaaaaa aaaa dd set4=1111”
I want to take the row dump and transpose it into rows and insert it into another table of the format:
Table2
Id#, setting, value
222, ‘set1’,’aaa’
222, ‘set2’,’aaaaaa aaaa dd’
222, ‘set4’,’1111’
Is there a way to make a trigger in MSSQL that will parse this string on insert in Table1 and insert it into Table2 properly?
All of the examples I’ve found required a common delimiter. ‘=’ separates the setting from the value, space(s) separate a value from a setting but a value could have spaces in it (settings do not have spaces in them so the last word before the equal sign is the setting name but there could be spaces between the setting name and equal sign).
There could be 1-5 settings and values in any given row. The values can have spaces. There may or may not be space between the setting name and the ‘=’ sign.
I have no control over the original insert process or format as it is used for other purposes.
You could use 'set' as a delimiter. This is a simple sample. It obviously may have to be molded to your environment.
use tempdb
GO
IF OBJECT_ID('dbo.fn_TVF_Split') IS NOT NULL
DROP FUNCTION dbo.fn_TVF_Split;
GO
CREATE FUNCTION dbo.fn_TVF_Split(#arr AS NVARCHAR(2000), #sep AS NCHAR(3))
RETURNS TABLE
AS
RETURN
WITH
L0 AS (SELECT 1 AS C UNION ALL SELECT 1) --2 rows
,L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B) --4 rows (2x2)
,L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B) --16 rows (4x4)
,L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B) --256 rows (16x16)
,L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B) --65536 rows (256x256)
,L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B) --4,294,967,296 rows (65536x65536)
,Nums AS (SELECT row_number() OVER (ORDER BY (SELECT 0)) AS N FROM L5)
SELECT
(n - 1) - LEN(REPLACE(LEFT(#arr, n-1), #sep, N'')) + 1 AS pos,
SUBSTRING(#arr, n, CHARINDEX(#sep, #arr + #sep, n) - n) AS element
FROM Nums
WHERE
n <= LEN(#arr) + 3
AND SUBSTRING(#sep + #arr, n, 3) = #sep
AND N<=100000
GO
declare #t table(
Id int,
row int,
row_dump varchar(Max)
);
insert into #t values(222, 1, 'set1 = aaaa set2 =aaaaaa aaaa dd set4=1111')
insert into #t values(111, 2, ' set1 =cx set2 =4444set4=124')
DECLARE #t2 TABLE(
Id int,
Setting VARCHAR(6),
[Value] VARCHAR(50)
)
insert into #t2 (Id,Setting,Value)
select
Id,
[Setting]='set' + left(LTRIM(element),1),
[Value]=RIGHT(element,charindex('=',reverse(element))-1)
from #t t
cross apply dbo.fn_TVF_Split(row_dump,'set')
where pos > 1
order by
id asc,
'set' + left(LTRIM(element),1) asc
select *
from #t2
Update
You could do something like this. It is not optimal and could probably be better handled in the transformation tool or application. Anyway here we go.
Note: You will need the split function I posted before.
declare #t table(
Id int,
row int,
row_dump varchar(Max)
);
insert into #t values(222, 1, 'set1 = aaaa set2 =aaaaaa aaaa dd set3=abc set4=1111 set5=7373')
insert into #t values(111, 2, 'set1 =cx set2 = 4444 set4=124')
DECLARE #t2 TABLE(
Id int,
Setting VARCHAR(6),
[Value] VARCHAR(50)
)
if OBJECT_ID('tempdb.dbo.#Vals') IS NOT NULL
BEGIN
DROP TABLE #Vals;
END
CREATE TABLE #Vals(
Id INT,
Row INT,
Element VARCHAR(MAX),
pos int,
value VARCHAR(MAX)
);
insert into #Vals
select
Id,
row,
element,
pos,
Value=STUFF(LEFT(element,len(element) - CHARINDEX(' ',reverse(element))),1,1,'')
from(
select
Id,
row,
row_dump = REPLACE(REPLACE(REPLACE(row_dump,'= ','='),' =','='),'=','=|')
from #t
) AS t
cross apply dbo.fn_TVF_Split(row_dump,'=')
where pos >=1 and pos < 10
insert into #t2 (Id,Setting,Value)
select
t1.Id,
Setting =
(
SELECT TOP 1
CASE WHEN t2.pos = 1
THEN LTRIM(RTRIM(t2.element))
ELSE LTRIM(RTRIM(RIGHT(t2.element,CHARINDEX(' ',REVERSE(t2.element)))))
END
FROM #Vals t2
where
t2.Id = t1.id
and t2.row = t1.row
and t2.pos < t1.pos
ORDER BY t2.pos DESC
),
t1.Value
from #Vals t1
where t1.pos > 1 and t1.pos < 10
order by t1.id,t1.row,t1.pos
select * from #t2