Comparing rows in a SQL table

I have a table which tracks changes to an entity, and I'm trying to extract the changes. The structure is more or less this:
| RowNumber | Value | SourceID| TargetID |
| 1 | A | 100 | 50 |
| 2 | B | 100 | 100 |
| 3 | C | 200 | 100 |
My select is
select t1.Value as Old, t2.Value as New from MyTable t1
inner join MyTable t2 on t1.SourceID = t2.TargetID
where t1.value != t2.value
Which gives me :
|Old|New|
|A | B |
|A | C |
|B | C |
The problem is that the data changed from A->B and then from B->C; it never actually changed from A->C. I can't for the life of me find a way of doing this in one query, although I realise that a cursor going through the rows in order could achieve it.
Is this possible in one query?

You can use the ROW_NUMBER window function to keep, for each row, only the first later row that matches.
Example:
declare @MyTable table (RowNumber int primary key identity(1,1), [Value] varchar(30), SourceID int, TargetID int);
insert into @MyTable ([Value], SourceID, TargetID) values
('A', 100, 50),
('B', 100, 100),
('C', 200, 100);
-- For each row t1, keep only the earliest later row t2 whose TargetID matches t1.SourceID
SELECT Old, New
FROM
(
    select
        t1.[Value] as Old,
        t2.[Value] as New,
        row_number() over (partition by t1.RowNumber order by t2.RowNumber) as RN
    from @MyTable t1
    join @MyTable t2
        on t2.TargetID = t1.SourceID AND t2.RowNumber > t1.RowNumber
) q
WHERE RN = 1;
Returns:
Old New
A B
B C

Related

How to add items from another table based on a string aggregated column

I have 2 tables like this
[Table 1]:
|cust_id| tran |item   |
| ------| -----|-------|
| id1   | 123  |a,b,c  |
| id2   | 234  |b,b    |
| id3   | 345  |c,d,a,b|
[Table 2]:
| item  | value |
| ----- | ----- |
| a | 1 |
| b | 2 |
| c | 3 |
| d | 4 |
I want to create a target value by looking up each item of table 1 in table 2, using BigQuery.
|cust_id| tran |item   |target|
| ------| -----|-------|------|
| id1   | 123  |a,b,c  | 6    |
| id2   | 234  |b,b    | 4    |
| id3   | 345  |c,d,a,b| 10   |
What can I try next?
Consider the simple approach below:
select *,
( select sum(value)
from unnest(split(item)) item
join table2
using (item)
) target
from table1
if applied to sample data in your question - output is
Try the following:
select t1.cust_id
, t1.tran
, t1.item
, sum(t2.value) as target
from table_1 t1
, UNNEST(split(t1.item ,',')) as item_unnested
LEFT JOIN table_2 t2
on item_unnested=t2.item
group by t1.cust_id
, t1.tran
, t1.item
With your data it gives the following:
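The lookup that the queries above perform can also be sketched in plain Python, which makes the arithmetic easy to check by hand. This is only an illustration on the sample data from the question; the names `values`, `table1` and `target` are made up for the sketch and are not part of BigQuery.

```python
# Sample data copied from the question.
values = {"a": 1, "b": 2, "c": 3, "d": 4}          # table 2: item -> value
table1 = [
    ("id1", 123, "a,b,c"),
    ("id2", 234, "b,b"),
    ("id3", 345, "c,d,a,b"),
]

def target(items: str) -> int:
    """Split the comma-separated list and sum the looked-up values.

    Repeated items count once per occurrence, and items with no match
    contribute nothing to the sum.
    """
    return sum(values.get(name, 0) for name in items.split(","))

result = [(cust_id, tran, item, target(item)) for cust_id, tran, item in table1]

for row in result:
    print(row)
```

Note that `b,b` counts the value of `b` twice, which is the same behaviour the unnest-and-join queries have.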
Create an intermediate table that splits the item column values into rows, and join that table with table2.
Try the following:
--A cursor is used to split the item data row by row
--#temp is a temporary table
create table #temp (id varchar(10), trans varchar(10), item varchar(10), item1 varchar(10));
DECLARE @item varchar(10);
DECLARE @id varchar(10);
DECLARE @trans varchar(10);
DECLARE item_cursor CURSOR FOR
SELECT *
FROM table1;
OPEN item_cursor
FETCH NEXT FROM item_cursor
INTO @id, @trans, @item
WHILE @@FETCH_STATUS = 0
BEGIN
    --one row per split item, keeping the original columns alongside it
    insert into #temp
    SELECT @id, @trans, @item, value
    FROM STRING_SPLIT(@item, ',')
    FETCH NEXT FROM item_cursor
    INTO @id, @trans, @item
END
CLOSE item_cursor;
DEALLOCATE item_cursor;
--select * from #temp
select t.id as cust_id, t.trans, t.item, sum(cast(t2.value as int)) as target
from #temp t
JOIN table2 t2
on t.item1 = t2.item
group by t.id, t.trans, t.item;
Cursors: https://www.c-sharpcorner.com/article/cursors-in-sql-server/
Temporary tables: https://www.sqlservertutorial.net/sql-server-basics/sql-server-temporary-tables/
String split function: https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql

Split Row and Paste to Different Tables Based on Column

I have a table like this. The table is populated each time an order is completed. One order can have one or many compartments.
+---------+-------+-------------+------+
| OrderID | Plant | Compartment | Qty |
+---------+-------+-------------+------+
| 91 | 12 | 1 | 2000 |
| 91 | 12 | 2 | 2000 |
| 91 | 12 | 3 | 2000 |
| 90 | 12 | 1 | 3000 |
| 89 | 12 | 1 | 5000 |
+---------+-------+-------------+------+
Please help write an SQL script that takes the above and splits it into two new tables like so:
Table 1
+---------+-------+
| OrderID | Plant |
+---------+-------+
| 91 | 12 |
| 90 | 12 |
| 89 | 12 |
+---------+-------+
Table 2
+---------+-------------+------+
| OrderID | Compartment | Qty |
+---------+-------------+------+
| 91 | 1 | 2000 |
| 91 | 2 | 2000 |
| 91 | 3 | 2000 |
| 90 | 1 | 3000 |
| 89 | 1 | 5000 |
+---------+-------------+------+
I've tried using the DISTINCT keyword as suggested:
SELECT * FROM table
WHERE [OrderID] = (SELECT DISTINCT OrderID from table where (COMPARTMENT = '1'))
This returns the error:
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
If the script can keep track of already processed rows so as to avoid duplication each time it runs, that would be the icing on the cake.
This is how I'd do it:
-- just for debugging: your table is my @table table variable
set nocount on
declare @table table (OrderId int, Plant int, Compartment int, Qty int)
insert into @table values (89, 12, 1, 5000)
insert into @table values (90, 12, 1, 3000)
insert into @table values (91, 12, 3, 2000)
insert into @table values (91, 12, 2, 2000)
insert into @table values (91, 12, 1, 2000)
insert into @table values (91, 12, 2, 3000)
set nocount off
--select * from @table
-- now here come the selections:
if (exists (select *
            from INFORMATION_SCHEMA.TABLES
            where TABLE_SCHEMA = 'dbo' -- or your schema
              and TABLE_NAME = 'THeader')) -- or your header table
    -- table exists: insert only the headers that are not there yet
    insert into THeader
    select distinct t1.OrderId, t1.Plant
    from @table t1
    left join THeader t2 on t1.OrderId = t2.OrderId and t1.Plant = t2.Plant
    where t2.OrderId is null
else
    -- table does not exist yet, so there is nothing to join against
    select distinct t1.OrderId, t1.Plant
    into THeader
    from @table t1
if (exists (select *
            from INFORMATION_SCHEMA.TABLES
            where TABLE_SCHEMA = 'dbo' -- or your schema
              and TABLE_NAME = 'TChilds')) -- or your child table
    -- table exists: insert only the child rows that are not there yet
    insert into TChilds
    select distinct t1.OrderId, t1.Compartment, t1.Qty
    from @table t1
    left join TChilds t2 on t1.OrderId = t2.OrderId and t1.Compartment = t2.Compartment and t1.Qty = t2.Qty
    where t2.OrderId is null
else
    -- table does not exist yet, so there is nothing to join against
    select distinct t1.OrderId, t1.Compartment, t1.Qty
    into TChilds
    from @table t1
-- just for debugging
select * from THeader
select * from TChilds
Edit: Even so, I see your master table as a child table, from which you only need to create the header table. That is enough: your table can serve as my TChilds table, with OrderId + Plant as the key. (I don't know what Plant means in this table.)

How to modify duplicate strings in SQL Server?

I'm dealing with truncating a column of string values in a table from 15 characters to 10 characters (this is the new max length I want to permit for the column).
There is a unique key on a pair of columns in the table, this one being one of them.
Because of the truncation, there is a possibility that this could be violated.
For example:
| ID | C1 | C2 |
| -- | --------------- | -- |
| 1 | 123456789012345 | 1 |
| 2 | 123456789012346 | 1 |
| 3 | 123456789012345 | 2 |
| 4 | 123456789012346 | 2 |
Let's say I have a unique key on C1 and C2. C1 is currently varchar(15), but for reasons that are beyond my control, it's being changed to varchar(10).
I have to truncate the values in C1 to strings of length 10. But if I just do so mindlessly, I'll obviously end up (in the example above) violating the unique key constraint.
So, I know how to find all the duplicates using something like:
select
t1.ID,
LEFT(t1.C1, 10) as C1,
t1.C2
INTO
#ColumnDuplicates
FROM
t t1
join t t2 on
t1.ID <> t2.ID
AND LEFT(t1.C1, 10) = LEFT(t2.C1, 10)
WHERE
t1.C2 = t2.C2
SELECT * FROM #ColumnDuplicates
Referring to the table above, this query would get me:
| ID | C1 | C2 |
| -- | ---------- | -- |
| 1 | 1234567890 | 1 |
| 2 | 1234567890 | 1 |
| 3 | 1234567890 | 2 |
| 4 | 1234567890 | 2 |
Now here's where I'm not sure how to do the next step. What I need to do is somehow get to this:
| ID | C1 | C2 |
| -- | ---------- | -- |
| 1 | 123456_001 | 1 |
| 2 | 123456_002 | 1 |
| 3 | 123456_001 | 2 |
| 4 | 123456_002 | 2 |
Effectively, I want to find all the duplicate C1 values for each C2 value, and then change the last 4 characters to a _[0-9][0-9][0-9] pattern, and progressively number those duplicates from 000 (or 001, I don't really care which is used as the starting point) through to a maximum of 999. This will give me space to deal with around 999 duplicates per C2 value, which I am quite sure based on my familiarity with the data I'm working with will not be an issue.
And then I can easily just use this temporary table to update the C1 values in the main table I am modifying.
My knowledge of SQL at the moment is quite basic, so I don't really know how to accomplish this.
If you are lucky, you can look at duplicates in the first six characters. I say lucky because this assumes you never have more than 999 such duplicates:
with toupdate as (
select t.*,
row_number() over (partition by left(c1, 6), c2 order by c2) as seqnum,
count(*) over (partition by left(c1, 6), c2) as cnt
from t
)
update toupdate
set c1 = (case when cnt > 1
then concat(left(c1, 6), '_', format(seqnum, '000'))
else left(c1, 10)
end);
The above is a little pessimistic with respect to duplicates. It probably makes sense to filter out known singletons before using row_number():
with toupdate as (
select t.*,
row_number() over (partition by left(c1, 6), c2,
(case when cnt10 > 1 then 1 else 2 end)
order by c2
) as seqnum,
count(*) over (partition by left(c1, 6), c2,
(case when cnt10 > 1 then 1 else 2 end)
) as cnt6
from (select t.*,
count(*) over (partition by left(c1, 10), c2) as cnt10
from t
) t
)
update toupdate
set c1 = (case when cnt10 > 1
then concat(left(c1, 6), '_', format(seqnum, '000'))
else left(c1, 10)
end);
You can use an updatable CTE to achieve this:
CREATE TABLE dbo.YourTable (ID int NOT NULL,
C1 varchar(15) NOT NULL,
C2 int NOT NULL);
CREATE UNIQUE INDEX YourIndex ON dbo.YourTable (C1,C2);
GO
INSERT INTO dbo.YourTable (ID, C1, C2)
VALUES (1,'123456789012345',1),
(2,'123456789012346',1),
(3,'123456789012345',2),
(4,'123456789012346',2);
GO
WITH CTE AS(
SELECT C1,
LEFT(YT.C1,6) + '_' + RIGHT(CONCAT('000',ROW_NUMBER() OVER (ORDER BY YT.C1, YT.C2 ASC)),3) AS NewC1
FROM dbo.YourTable YT
WHERE LEN(YT.C1) > 10) --Unsure if that WHERE is needed
UPDATE CTE
SET C1 = NewC1;
GO
DROP INDEX YourIndex ON dbo.YourTable; --Has to be dropped to alter
ALTER TABLE dbo.YourTable ALTER COLUMN C1 varchar(10) NOT NULL;
GO
CREATE UNIQUE INDEX YourIndex ON dbo.YourTable (C1,C2); --Recreate
GO
SELECT *
FROM dbo.YourTable;
GO
DROP TABLE dbo.YourTable;
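To preview the renaming before touching the real table, the numbering scheme from the question can be tried out on a throwaway in-memory database. The sketch below uses Python's bundled sqlite3 module rather than SQL Server, so `substr` and `printf` stand in for `LEFT` and `FORMAT`; it assumes an SQLite build with window functions (3.25 or later). The fifth row is an extra, made-up value added only to show that rows without a collision are simply truncated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER PRIMARY KEY, c1 TEXT NOT NULL, c2 INTEGER NOT NULL);
INSERT INTO t VALUES
  (1, '123456789012345', 1),
  (2, '123456789012346', 1),
  (3, '123456789012345', 2),
  (4, '123456789012346', 2),
  (5, 'ABCDEFGHIJKLMNO', 1);  -- hypothetical row with no collision
""")

rows = conn.execute("""
WITH counted AS (
  -- cnt10 > 1 means the row collides with another row once cut to 10 characters
  SELECT id, c1, c2,
         COUNT(*) OVER (PARTITION BY substr(c1, 1, 10), c2) AS cnt10
  FROM t
), numbered AS (
  -- number the colliding rows separately from the singletons
  SELECT id, c1, c2, cnt10,
         ROW_NUMBER() OVER (PARTITION BY substr(c1, 1, 6), c2, cnt10 > 1
                            ORDER BY id) AS seqnum
  FROM counted
)
SELECT id,
       CASE WHEN cnt10 > 1
            THEN substr(c1, 1, 6) || '_' || printf('%03d', seqnum)
            ELSE substr(c1, 1, 10)
       END AS new_c1,
       c2
FROM numbered
ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

Because this is only a SELECT, it can be compared against the desired result first and turned into an UPDATE afterwards.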

SQL Loop and Join

I have a table:
Vers | Rev
3 | A
7 | B
13 | C
And a second table:
Info | Version
aab | 1
adr | 2
bhj | 3
bgt | 4
nnh | 4
ggt | 7
I need to have a table:
Info | Version | Rev
aab | 1 | A
adr | 2 | A
bhj | 3 | A
bgt | 4 | B
nnh | 4 | B
ggt | 7 | B
How do I achieve the final table?
Rev A is for versions 1-3, Rev B is for versions 4-7, Rev C is for versions 8-13.
If I were trying to do this with VB Excel, I would add a 1 in a new column. Then get the first Vers value (3) - second Vers value (7) then output 4....
Then I would use some logic If <= new column and >= Vers write Rev.
I don't know how to do this in SQL and I need to!
Try this; you can do it by joining the tables:
select
t2.Info Info
,t2.Version Version
,t1.Rev Rev
from table1 t1,table2 t2
where t2.Version=t1.Vers;
Use outer apply:
select t2.*, t1.rev
from table2 t2 outer apply
(select top (1) t1.*
from table1 t1
where t2.version <= t1.vers
order by t1.vers asc
) t1;
This gets the "next" version in table1 relative to each version in table2.
You can also do this with a subquery:
SELECT *
, (SELECT TOP 1 b.rev
FROM Table1 b
WHERE a.version <= b.vers
ORDER BY b.vers)
FROM Table2 a
Or a third version:
declare @t1 table(V int, R char(1))
insert @t1 values (3,'A'),(7,'B'),(13,'C')
declare @t2 table(I char(3), V int)
insert @t2 values ('aab',1),('adr',2),('bhj',3),('bgt',4),('nnh',4),('ggt',7)
-- join each version to the smallest Vers that is >= it
select t2.*, t1.R
from @t2 t2
join @t1 t1 on t1.V>=t2.V and not exists(select * from @t1 t3 where t3.V>=t2.v and t3.V<t1.V)
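The spreadsheet idea from the question (work out where each Rev starts, then test whether a version falls inside that range) can also be written directly as a range join. The sketch below is one possible take on it, run through Python's sqlite3 module so it can be tried without a SQL Server instance. It assumes versions start at 1 and an SQLite build with window functions (3.25 or later); the column names `first_version` and `last_version` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (vers INTEGER, rev TEXT);
INSERT INTO table1 VALUES (3, 'A'), (7, 'B'), (13, 'C');
CREATE TABLE table2 (info TEXT, version INTEGER);
INSERT INTO table2 VALUES
  ('aab', 1), ('adr', 2), ('bhj', 3), ('bgt', 4), ('nnh', 4), ('ggt', 7);
""")

rows = conn.execute("""
WITH ranges AS (
  -- each Rev covers (previous Vers + 1) .. Vers; the first one starts at 1
  SELECT rev,
         COALESCE(LAG(vers) OVER (ORDER BY vers), 0) + 1 AS first_version,
         vers AS last_version
  FROM table1
)
SELECT t2.info, t2.version, r.rev
FROM table2 t2
LEFT JOIN ranges r
  ON t2.version BETWEEN r.first_version AND r.last_version
ORDER BY t2.version, t2.info
""").fetchall()

for row in rows:
    print(row)
```

The LEFT JOIN keeps versions above the highest Vers in the output with a NULL Rev instead of dropping them.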

How to query the previous record that is in another table?

I have a view that shows something like the following:
View VW
| ID | DT | VAL|
|----|------------|----|
| 1 | 2016-09-01 | 7 |
| 2 | 2016-08-01 | 5 |
| 3 | 2016-07-01 | 8 |
I have a table with historical data that has something like:
Table HIST
| ID | DT | VAL|
|----|------------|----|
| 1 | 2016-06-27 | 4 |
| 1 | 2016-06-29 | 3 |
| 1 | 2016-07-15 | 0 |
| 1 | 2016-09-12 | 8 |
| 2 | 2016-05-05 | 3 |
What I need is to add another column to my view with a boolean that means "the immediately previous record exists in history and has a related value greater than zero".
The expected output is the following:
| ID | DT | VAL| FLAG |
|----|------------|----|------|
| 1 | 2016-09-01 | 7 | false| -- previous is '2016-07-15' and value is zero. '2016-09-12' in hist is greater than '2016-09-01' in view, so it is not the previous
| 2 | 2016-08-01 | 5 | true | -- previous is '2016-05-05' and value is 3
| 3 | 2016-07-01 | 8 | false| -- there is no previous value in HIST table
What I have tried
I've used the query below. It works for small loads, but performs badly in production because my view is extremely complex and the historical table is very large. Is it possible to query this without using the view multiple times? (If so, the performance should be better and I won't see any more timeouts.)
You can test here http://rextester.com/l/sql_server_online_compiler
create table vw (id int, dt date, val int);
insert into vw values (1, '2016-09-01', 7), (2, '2016-08-01', 5), (3, '2016-07-01', 8);
create table hist (id int, dt date, val int);
insert into hist values (1, '2016-06-27', 4), (1, '2016-06-29', 3), (1, '2016-07-15', 0), (1, '2016-09-12', 8), (2, '2016-05-05', 3);
select vw.id, vw.dt, vw.val, (case when hist_with_flag.flag = 'true' then 'true' else 'false' end)
from vw
left join
(
select hist.id, (case when hist.val > 0 then 'true' else 'false' end) flag
from
(
select hist.id, max(hist.dt) as dt
from hist
inner join vw on vw.id = hist.id
where hist.dt < vw.dt
group by hist.id
) hist_with_max_dt
inner join hist
on hist.id = hist_with_max_dt.id and hist.dt = hist_with_max_dt.dt
) hist_with_flag
on vw.id = hist_with_flag.id
You can use OUTER APPLY in order to get the immediately previous record:
SELECT v.ID, v.DT, v.VAL,
IIF(t.VAL IS NULL OR t.VAL = 0, 'false', 'true') AS FLAG
FROM Vw AS v
OUTER APPLY (
SELECT TOP 1 VAL, DT
FROM Hist AS h
WHERE v.ID = h.ID AND v.DT > h.DT
ORDER BY h.DT DESC) AS t
Please try this query; it returns the same result as yours and should perform well:
SELECT vw.id, MAX(vw.dt) dt,
MAX(vw.val) val,
case when MAX(h.val) > 0 then 'true' else 'false' END flag
FROM vw
OUTER APPLY(SELECT MAX(dt) dt FROM hist WHERE vw.id = hist.id
AND dt<vw.dt GROUP BY hist.id) t
LEFT JOIN hist h ON vw.id = h.id AND h.dt = t.dt
GROUP BY vw.id
You can avoid multiple JOINs by using a simple CTE with ROW_NUMBER.
;with cte_1
as
(select vw.id, vw.dt, vw.val,hist.val HistVal,hist.dt HistDt,ROW_NUMBER()OVER (PARTITION BY vw.id,vw.dt ORDER BY vw.id,vw.dt,hist.dt desc) RNO
FROM vw
left join hist
on hist.id = vw.id and hist.dt < vw.dt
)
SELECT Id,Dt,Val,case when ISNULL(HistVal,0)=0 THEN 'FALSE' ELSE 'TRUE' END as FLAG
FROM cte_1 WHERE RNO=1
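For checking the expected flags without a SQL Server instance, the "latest earlier history row" lookup can be reproduced with Python's sqlite3 module. This is a sketch on the sample data only, not a statement about production performance: the correlated subquery with `ORDER BY ... LIMIT 1` plays the role of `OUTER APPLY (SELECT TOP 1 ...)`, and the dates are stored as ISO text, which sorts chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vw (id INTEGER, dt TEXT, val INTEGER);
INSERT INTO vw VALUES
  (1, '2016-09-01', 7), (2, '2016-08-01', 5), (3, '2016-07-01', 8);
CREATE TABLE hist (id INTEGER, dt TEXT, val INTEGER);
INSERT INTO hist VALUES
  (1, '2016-06-27', 4), (1, '2016-06-29', 3), (1, '2016-07-15', 0),
  (1, '2016-09-12', 8), (2, '2016-05-05', 3);
""")

rows = conn.execute("""
SELECT v.id, v.dt, v.val,
       CASE WHEN (SELECT h.val            -- value of the latest earlier hist row
                  FROM hist h
                  WHERE h.id = v.id AND h.dt < v.dt
                  ORDER BY h.dt DESC
                  LIMIT 1) > 0
            THEN 'true'
            ELSE 'false'                  -- no earlier row (NULL) or value of 0
       END AS flag
FROM vw v
ORDER BY v.id
""").fetchall()

for row in rows:
    print(row)
```

The view is referenced only once here, which is the property the question asks for.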