Stuck finding combination differences - sql

I am really stuck on a problem: finding out whether the combination of two columns changes between rows. The row values are as follows:
Serial code
D03L30225 A1
D03L30225 A1
D03L30225 A1
D03L30225 A1
D03L30225 A1
D03L30225 A1
D03L30225 A1
D03L30225 A1
D03L30225 A2
Say there was another entry with code A2 at the end. Is there a way of detecting that the serial/code combination is different?
I have tried window functions such as rank with partition by, without success.

This should work for you. One thing to note is that you have to order by something. Perhaps what I have ordered by is not correct for your situation, but you need something there.
IF OBJECT_ID('tempdb..#Test', 'U') IS NOT NULL DROP TABLE #Test;
create table #Test
(
Serial varchar(10),
code char(2)
)
insert into #Test values ('D03L30225', 'A1')
insert into #Test values ('D03L30225', 'A1')
insert into #Test values ('D03L30225', 'A1')
insert into #Test values ('D03L30225', 'A2')
;
with cte as
(
select rownum = row_number() over (order by Serial, code), Serial, code
from #Test
)
select curr.Serial, curr.code,
case
when curr.code <> prev.code then
1
else
0
end as 'DifferenceFlag'
from cte curr
left join cte prev on prev.rownum = curr.rownum - 1
If you are using SQL Server 2012 or higher, you could use the LAG function. We are still on SQL Server 2008 R2, so when I needed to do something similar recently, I used the self-join method shown above.
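For readers on SQL Server 2012 or later, the LAG variant mentioned above might look roughly like this. It is only a sketch: the table variable @Test and its Id column are made up for illustration, and Id is assumed to be whatever column defines the row order in your real table.

```sql
-- Sketch only: LAG requires SQL Server 2012 or later.
-- @Test and its Id column are hypothetical; Id supplies the row order.
declare @Test table
(
    Id int primary key,
    Serial varchar(10),
    code char(2)
);

insert into @Test (Id, Serial, code)
values (1, 'D03L30225', 'A1'),
       (2, 'D03L30225', 'A1'),
       (3, 'D03L30225', 'A1'),
       (4, 'D03L30225', 'A2');

select Serial, code,
       case
           -- LAG returns NULL on the first row of each Serial, so that row is flagged 0
           when lag(code) over (partition by Serial order by Id) <> code then 1
           else 0
       end as DifferenceFlag
from @Test
order by Id;
```

With this sample data only the last row (code A2) gets DifferenceFlag = 1. No self join is needed, because LAG reads the previous row directly.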

According to your comment, I assume that you want to add another column, third_column, to your table and set its value according to each change in the pair (Serial, Code).
If that's true, you could use this:
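ALTER TABLE
table_name
ADD
third_column numeric(18,0);
GO
-- GO (the SSMS/sqlcmd batch separator) ends the batch, so the UPDATE compiles after the column exists
UPDATE t
SET t.third_column = t1.rwn
FROM table_name AS t
INNER JOIN
(select
serial, code
,row_number() over (order by serial, code) - 1 as rwn
from
table_name
group by
serial, code
-- no ORDER BY here: SQL Server rejects ORDER BY in a derived table without TOP
) AS t1
ON
t.serial = t1.serial and t.code = t1.code;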

I might write the code slightly differently. The first query below just lists the codes that have more than one serial number. The second flags every row of a code when that code contains multiple serial numbers.
The other solutions provided will give you a proper row number. In any case, I don't know if this will help, but good luck!
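select
code,
count(distinct serial) cnt_serial
from table_name
group by code
having
count(distinct serial) > 1
OR
-- SQL Server rejects DISTINCT inside a windowed aggregate, so compare min and max instead
select
code,
serial,
case when min(serial) over (partition by code) <> max(serial) over (partition by code) then 'Y' end fl_code_has_dup
from table_name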

Related

SQL detect change in row

I have data from SQL Server, attached:
select * from log
What I want to do is check whether there are any changes in code for the column name. If you look at the data from table log, the code changes 2 times (to B02, then to B03).
I want to retrieve the first row of each change, every time the code changes. In this sample, the first changed rows are in the red box, so I want the result to be row 5 and row 9.
I've tried to use partition like code below:
select a.name,a.code from(
select name,code,row_number() over(partition by code order by name) as rank from log)a
where a.rank=1
and got a result like this.
However, I don't want the first row to be retrieved. It is the first value and I don't need it. I just want to retrieve the changes indicated by the column code. Please help if you know how to do it.
Please note, I can't write the query using a filter such as where code <> 'B01', because in this case I don't know what the first value is.
Please assume the first value is the data that was first inserted into the table.
Use lag to get the previous row's value (assuming id specifies ordering) and get the rows where it is different from the current row's value.
create table #log (id int identity(1,1) not null, name nvarchar(100), code nvarchar(100));
insert into #log(name,code) values ('SARUMA','B01'), ('SARUMA','B01'), ('SARUMA','B01'), ('SARUMA','B01');
insert into #log(name,code) values ('SARUMA','B02'), ('SARUMA','B02'), ('SARUMA','B02'), ('SARUMA','B02');
insert into #log(name,code) values ('SARUMA','B03'), ('SARUMA','B03');
select name
,code
from (
select l.*
,lag(code) over (
partition by name order by id
) as prev_code
from #log l
) l
where prev_code <> code
create table #log (name nvarchar(100), code nvarchar(100));
insert into #log values ('SARUMA','B01'), ('SARUMA','B01'), ('SARUMA','B01'), ('SARUMA','B01');
insert into #log values ('SARUMA','B02'), ('SARUMA','B02'), ('SARUMA','B02'), ('SARUMA','B02');
insert into #log values ('SARUMA','B03'), ('SARUMA','B03');
-- remove duplicates
with Singles (name, code)
AS (
select distinct name, code from #log
),
-- At first you need an order, in time? By alphanumerical code? Otherwise you cannot decide which is the first item you want to remove
-- So I added an identity ordering, but it is preferable to use a physical column
OrderedSingles (name, code, id)
AS (
select *, row_number() over(order by name)
from Singles
)
-- Now self-join to get the next one, if the index is sequential you can join id = id+1
-- and take the join columns
select distinct ii.name, ii.Code
from OrderedSingles i
inner join OrderedSingles ii
on i.Name = ii.Name and i.Code <> ii.Code
where i.id < ii.Id;
I think that your original post was pretty close, though you would want the windowing function to be on the [NAME] column, not the code. Please see my modifications, below. I've also changed the predicate to be >1, as 1 would be the original record.
SELECT
a.[name]
,a.[code]
FROM (
SELECT
[name]
,[code]
,ROW_NUMBER() OVER(PARTITION BY [name] order by [name], [code]) AS [rank]
FROM log)a
WHERE a.rank>1
NOTE: you may want to not use NAME as a field, since it is a reserved word. Additionally, RANK is a reserved word as well, and you've used it to alias the ROW_NUMBER in the nested query. You may want to use another non-reserved word for that - personally, I use RANKED for that purpose.

How to use a special while loop in tsql, do while numeric

I'm loading some quite nasty data through Azure Data Factory.
This is how the data looks after being loaded, consisting of 2 parts:
1. Metadata of a test
2. Actual measurements of the test -> the measurement is numeric
Imagine I have about 10 such 'packages' of 1. Metadata + 2. Measurements.
What I would like it to be / what I'm looking for is the following:
The number column with 1, 2, ... is what I'm looking for!
Imagine my screenshot could go no further, but this goes along until id=10.
I guess a while loop is necessary here...
Query before:
SELECT Field1 FROM Input
Query after:
SELECT GeneratedId, Field1 FROM Input
Thanks a lot in advance!
EDIT: added a hint:
Here is a solution, this requires SQL-SERVER 2012 or later.
Start by getting an Id column on your data. If you can do this previous to the script that would be even better, but if not, try something like this...
CREATE TABLE #InputTable (
Id INT IDENTITY(1, 1),
TestData NVARCHAR(MAX) )
INSERT INTO #InputTable (TestData)
SELECT Field1 FROM Input
Now create a query to get the GeneratedId of each package as well as the Id where they start and end. You can do this by getting all the records LIKE 'title%' since that is the first record of each package, then using ROW_NUMBER, Id, and LEAD for the GeneratedId, StartId, and EndId respectively.
SELECT
GeneratedId = ROW_NUMBER() OVER(ORDER BY (Id)),
StartId = Id,
EndId = LEAD(Id) OVER (ORDER BY (Id))
FROM #InputTable
WHERE TestData LIKE 'title%'
Lastly, join this to the input in order to get all the records, with the correct GeneratedId.
SELECT
package.GeneratedId, i.TestData
FROM (
SELECT
GeneratedId = ROW_NUMBER() OVER(ORDER BY (Id)),
StartId = Id,
EndId = LEAD(Id) OVER (ORDER BY (Id))
FROM #InputTable
WHERE TestData LIKE 'title%' ) package
INNER JOIN #InputTable i
ON i.Id >= package.StartId
AND (package.EndId IS NULL OR i.Id < package.EndId)
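As a possible alternative to the join above, a running count of the 'title%' rows assigns the same GeneratedId in a single pass. This is a different technique from the one in the answer, shown only as a sketch. The table variable @InputTable and its sample rows are invented for illustration, and the ordered window SUM needs SQL Server 2012 or later.

```sql
-- Sketch only: the sample rows are hypothetical stand-ins for the real input.
declare @InputTable table (Id int primary key, TestData nvarchar(100));

insert into @InputTable (Id, TestData)
values (1, N'title: test A'), (2, N'operator: x'), (3, N'1.5'), (4, N'2.5'),
       (5, N'title: test B'), (6, N'operator: y'), (7, N'3.5');

select
    -- every 'title%' row adds 1 to the running total, which starts a new package
    GeneratedId = sum(case when TestData like 'title%' then 1 else 0 end)
                  over (order by Id rows unbounded preceding),
    TestData
from @InputTable
order by Id;
```

Rows 1 to 4 get GeneratedId 1 and rows 5 to 7 get GeneratedId 2. Any rows before the first 'title%' row would get 0, so this relies on every package starting with a title row, just like the join version.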

Select column values from DB for which the subsequent row does not have a specified value

I have a table, say MyTable, with two columns, Id and Data, and the following records in it:
Id Data
----------
1. ABCDE00
2. DEFGH11
3. CCCCC21
4. AAAAA00
5. BBBBB10
6. vvvvv00
7. xxxxx88
Now I want all the records whose Data ends with the string 00 and whose subsequent row does not have Data ending with 11.
So my output using this condition should be like this:
1. AAAAA00
2. vvvvv00
Any help would be appreciated.
This answer makes some assumptions:
You have a column specifying the ordering. Let me call it id.
By "subsequent row" you mean the row with the next highest id.
You are using SQL Server 2012+.
In that case, lead() does what you want:
select t.*
from (select t.*, lead(data) over (order by id) as next_data
from t
) t
where data like '%00' and (next_data not like '%11' or next_data is null);
Earlier versions of SQL Server have alternative methods for calculating next_data.
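One such alternative, as a rough sketch, is OUTER APPLY with TOP (1) to fetch the next row. The table variable @t below is a made-up stand-in for the real table, and the multi-row VALUES list assumes SQL Server 2008 or later.

```sql
-- Sketch for versions before SQL Server 2012, where LEAD is not available.
-- @t is a hypothetical stand-in for the real table.
declare @t table (id int primary key, data varchar(100));

insert into @t (id, data)
values (1, 'ABCDE00'), (2, 'DEFGH11'), (3, 'CCCCC21'), (4, 'AAAAA00'),
       (5, 'BBBBB10'), (6, 'vvvvv00'), (7, 'xxxxx88');

select t.id, t.data
from @t t
outer apply
(
    -- the row with the next highest id, if there is one
    select top (1) n.data as next_data
    from @t n
    where n.id > t.id
    order by n.id
) nx
where t.data like '%00'
  and (nx.next_data not like '%11' or nx.next_data is null);
```

For the sample data this returns the rows with id 4 (AAAAA00) and id 6 (vvvvv00). Because it uses n.id > t.id rather than id + 1, it also copes with gaps in the id sequence.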
If anyone is not using SQL Server 2012, they can try this:
declare @t table(id int identity(1,1),col1 varchar(100))
insert into @t values
('ABCDE00')
,('DEFGH11')
,('CCCCC21')
,('AAAAA00')
,('BBBBB10')
,('vvvvv00')
,('xxxxx88')
;With CTE as
(
-- End00 = 1 when col1 ends with '00' (the reversed string starts with '00')
select *,case when CHARINDEX('00',reverse(col1))=1 then 1 end
End00 from @t
)
,CTE1 as
(
-- rows whose previous row ends with '00' and which do not themselves end with '11'
select a.id,a.col1 from cte A
where exists
(select id from cte b where a.id=b.id+1 and b.end00 is not null)
and CHARINDEX('11',reverse(a.col1))<>1
)
-- return the row just before each CTE1 row
select a.id,a.col1 from cte A
where exists
(select id from cte1 b where a.id=b.id-1 )

SQL Join on sequence number

I have 2 tables (A, B). They each have a different column that is basically an order or a sequence number. Table A has 'Sequence' and the values range from 0 to 5. Table B has 'Index' and the values are 16740, 16744, 16759, 16828, 16838, and 16990. Unfortunately I do not know the significance of these values. But I do believe they will always match in sequential order. I want to join these tables on these numbers where 0 = 16740, 1 = 16744, etc. Any ideas?
Thanks
You could use a case expression to convert table a's values to table b's values (or vice versa) and join on that:
SELECT *
FROM a
JOIN b ON a.[sequence] = CASE b.[index] WHEN 16740 THEN 0
WHEN 16744 THEN 1
WHEN 16759 THEN 2
WHEN 16828 THEN 3
WHEN 16838 THEN 4
WHEN 16990 THEN 5
ELSE NULL
END;
@Mureinik has a great example. If you end up adding more numbers down the road, putting this mapping into a new table might be a good idea.
CREATE TABLE C(
AInfo INT,
BInfo INT
)
INSERT INTO C(AInfo,BInfo) VALUES(0,16740)
INSERT INTO C(AInfo,BInfo) VALUES(1,16744)
etc
Then you can join all the tables.
If the values are in ascending order as per your example, you can use the ROW_NUMBER() function to achieve this:
;with cte AS (SELECT *, ROW_NUMBER() OVER(ORDER BY [Index])-1 RN
FROM B)
SELECT *
FROM cte
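To finish the idea, the numbered rows can then be joined back to table A on Sequence. The following is only a sketch: the table variables @A and @B and their AValue/BValue columns are invented for illustration, and it assumes the Index values really do sort in the same order as Sequence.

```sql
-- Sketch only: @A, @B, AValue and BValue are hypothetical.
declare @A table ([Sequence] int, AValue varchar(10));
declare @B table ([Index] int, BValue varchar(10));

insert into @A values (0, 'a0'), (1, 'a1'), (2, 'a2');
insert into @B values (16740, 'b0'), (16744, 'b1'), (16759, 'b2');

with cte as
(
    -- number B's rows 0, 1, 2, ... in ascending [Index] order
    select *, row_number() over (order by [Index]) - 1 as RN
    from @B
)
select a.[Sequence], cte.[Index], a.AValue, cte.BValue
from @A a
join cte on cte.RN = a.[Sequence];
```

Here Sequence 0 pairs with Index 16740, 1 with 16744 and 2 with 16759. If a row is ever missing from either table, every later pair shifts, so the mapping table suggested above is the safer choice when the correspondence matters.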

SQL Server: row present in one query, missing in another

Ok so I think I must be misunderstanding something about SQL queries. This is a pretty wordy question, so thanks for taking the time to read it (my problem is right at the end, everything else is just context).
I am writing an accounting system that works on the double-entry principle -- money always moves between accounts, and a transaction is 2 or more TransactionParts rows decrementing one account and incrementing another.
Some TransactionParts rows may be flagged as tax related so that the system can produce a report of total VAT sales/purchases etc, so it is possible that a single Transaction may have two TransactionParts referencing the same Account -- one VAT related, and the other not. To simplify presentation to the user, I have a view to combine multiple rows for the same account and transaction:
create view Accounting.CondensedEntryView as
select p.[Transaction], p.Account, sum(p.Amount) as Amount
from Accounting.TransactionParts p
group by p.[Transaction], p.Account
I then have a view to calculate the running balance column, as follows:
create view Accounting.TransactionBalanceView as
with cte as
(
select ROW_NUMBER() over (order by t.[Date]) AS RowNumber,
t.ID as [Transaction], p.Amount, p.Account
from Accounting.Transactions t
inner join Accounting.CondensedEntryView p on p.[Transaction]=t.ID
)
select b.RowNumber, b.[Transaction], a.Account,
coalesce(sum(a.Amount), 0) as Balance
from cte a, cte b
where a.RowNumber <= b.RowNumber AND a.Account=b.Account
group by b.RowNumber, b.[Transaction], a.Account
For reasons I haven't yet worked out, a certain transaction (ID=30) doesn't appear on an account statement for the user. I confirmed this by running
select * from Accounting.TransactionBalanceView where [Transaction]=30
This gave me the following result:
RowNumber Transaction Account Balance
-------------------- ----------- ------- ---------------------
72 30 23 143.80
As I said before, there should be at least two TransactionParts for each Transaction, so one of them isn't being presented in my view. I assumed there must be an issue with the way I've written my view, and run a query to see if there's anything else missing:
select [Transaction], count(*)
from Accounting.TransactionBalanceView
group by [Transaction]
having count(*) < 2
This query returns no results -- not even for Transaction 30! Thinking I must be an idiot, I ran the following query:
select [Transaction]
from Accounting.TransactionBalanceView
where [Transaction]=30
It returns two rows! So select * returns only one row and select [Transaction] returns both. After much head-scratching and re-running the last two queries, I concluded I don't have the faintest idea what's happening. Any ideas?
Thanks a lot if you've stuck with me this far!
Edit:
Here are the execution plans:
select *
select [Transaction]
1000 lines each, hence finding somewhere else to host.
Edit 2:
For completeness, here are the tables I used:
create table Accounting.Accounts
(
ID smallint identity primary key,
[Name] varchar(50) not null
constraint UQ_AccountName unique,
[Type] tinyint not null
constraint FK_AccountType foreign key references Accounting.AccountTypes
);
create table Accounting.Transactions
(
ID int identity primary key,
[Date] date not null default getdate(),
[Description] varchar(50) not null,
Reference varchar(20) not null default '',
Memo varchar(1000) not null
);
create table Accounting.TransactionParts
(
ID int identity primary key,
[Transaction] int not null
constraint FK_TransactionPart foreign key references Accounting.Transactions,
Account smallint not null
constraint FK_TransactionAccount foreign key references Accounting.Accounts,
Amount money not null,
VatRelated bit not null default 0
);
Demonstration of possible explanation.
Create table Script
SELECT *
INTO #T
FROM master.dbo.spt_values
CREATE NONCLUSTERED INDEX [IX_T] ON #T ([name] DESC,[number] DESC);
Query one (Returns 35 results)
WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
FROM #T
)
SELECT c1.number,c1.[type]
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number
Query Two (Same as before but adding c2.[type] to the select list makes it return 0 results)
;
WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
FROM #T
)
SELECT c1.number,c1.[type] ,c2.[type]
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number
Why?
row_number() for duplicate NAMEs isn't specified so it just chooses whichever one fits in with the best execution plan for the required output columns. In the second query this is the same for both cte invocations, in the first one it chooses a different access path with resultant different row_numbering.
Suggested Solution
You are self joining the CTE on ROW_NUMBER() over (order by t.[Date])
Contrary to what may have been expected, the CTE will likely not be materialised. Materialising it would have ensured consistency for the self join. As it stands, you assume a correlation between ROW_NUMBER() on both sides that may well not exist for records where a duplicate [Date] exists in the data.
What if you try ROW_NUMBER() over (order by t.[Date], t.[id]) to ensure that in the event of tied dates the row_numbering is in a guaranteed consistent order. (Or some other column/combination of columns that can differentiate records if id won't do it)
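A small sketch of the tie-breaker idea, using a made-up table variable @Transactions that contains a duplicate date (the real view would keep its other columns and joins):

```sql
-- Sketch only: @Transactions is a hypothetical stand-in with a tied [Date].
declare @Transactions table (ID int primary key, [Date] date not null);

insert into @Transactions (ID, [Date])
values (1, '2011-01-01'), (2, '2011-01-02'), (3, '2011-01-02'), (4, '2011-01-03');

with cte as
(
    -- ID breaks the tie between rows 2 and 3, so the numbering is fully determined
    select row_number() over (order by t.[Date], t.ID) as RowNumber,
           t.ID as [Transaction]
    from @Transactions t
)
select a.RowNumber, a.[Transaction], b.[Transaction] as SelfJoined
from cte a
join cte b on b.RowNumber = a.RowNumber
order by a.RowNumber;
```

Because (Date, ID) is unique, both references to the CTE must produce the same numbering, so each row can only pair with itself. With order by t.[Date] alone, the two rows dated 2011-01-02 could legitimately be numbered differently on each side of the join.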
If the purpose of this part of the view is just to make sure that the same row isn't joined to itself
where a.RowNumber <= b.RowNumber
then how does changing this part to
where a.RowNumber <> b.RowNumber
affect the results?
It seems you are reading dirty entries (someone else is deleting or inserting data while you query).
Try SET TRANSACTION ISOLATION LEVEL READ COMMITTED.
I've tried this code (which seems equivalent to yours)
IF object_id('tempdb..#t') IS NOT NULL DROP TABLE #t
CREATE TABLE #t(i INT, val INT, acc int)
INSERT #t
SELECT 1, 2, 70
UNION ALL SELECT 2, 3, 70
;with cte as
(
select ROW_NUMBER() over (order by t.i) AS RowNumber,
t.val as [Transaction], t.acc Account
from #t t
)
select b.RowNumber, b.[Transaction], a.Account
from cte a, cte b
where a.RowNumber <= b.RowNumber AND a.Account=b.Account
group by b.RowNumber, b.[Transaction], a.Account
and got two rows:
RowNumber Transaction Account
1 2 70
2 3 70