I have a case in SQL. The source table has three columns: ID, Cate, Type.
Within the same Cate, Type pairs (A, A-) and (B, B-) eliminate each other, and the remaining rows (those with the higher Ids) should be returned.
eg:
With cate = AM001: Id = 1, 2, 3; Ids 1 and 2 eliminate each other --> keep Id = 3.
With cate = AM003: Id = 4, 6, both type = B --> keep both.
With cate = AM005: Id = 7, 8, 9; Ids 7 and 8 eliminate each other --> keep Id = 9.
With cate = AM006: Id = 10, 11, both type = A --> keep both.
Expected result: keep Ids 3, 4, 6, 9, 10 and 11.
I'm currently using a cursor to solve it, which is quite hard. Is there any clue for solving it in T-SQL?
Assuming that I understand the problem:
you have a number of rows with sections ("Cates") and symbols ("Type");
if there are any symbols ending in a minus sign then each of these indicates that a row without a minus sign should be removed;
symbols are never "mixed" per section, i.e. a section can never have "A" and "B-";
there will always be a row to remove if there is a type with a minus;
rows should be removed starting with the lowest Id.
Then this should work:
DECLARE @data TABLE (
Id INT,
Cate VARCHAR(5),
[Type] VARCHAR(2));
INSERT INTO @data SELECT 1, 'AM001', 'A';
INSERT INTO @data SELECT 2, 'AM001', 'A-';
INSERT INTO @data SELECT 3, 'AM001', 'A';
INSERT INTO @data SELECT 4, 'AM003', 'B';
INSERT INTO @data SELECT 6, 'AM003', 'B';
INSERT INTO @data SELECT 7, 'AM005', 'B';
INSERT INTO @data SELECT 8, 'AM005', 'B-';
INSERT INTO @data SELECT 9, 'AM005', 'B';
INSERT INTO @data SELECT 10, 'AM006', 'A';
INSERT INTO @data SELECT 11, 'AM006', 'A';
INSERT INTO @data SELECT 12, 'AM011', 'B';
INSERT INTO @data SELECT 13, 'AM011', 'B-';
INSERT INTO @data SELECT 14, 'AM011', 'B';
WITH NumberToRemove AS (
SELECT
Cate,
COUNT(*) AS TakeOff
FROM
@data
WHERE
[Type] LIKE '_-'
GROUP BY
Cate),
Ordered AS (
SELECT
Id,
Cate,
[Type],
ROW_NUMBER() OVER (PARTITION BY Cate ORDER BY Id) AS RowId
FROM
@data
WHERE
[Type] NOT LIKE '_-')
SELECT
d.*
FROM
@data d
LEFT JOIN NumberToRemove m ON m.Cate = d.Cate
INNER JOIN Ordered o ON o.Id = d.Id
WHERE
o.RowId > ISNULL(m.TakeOff, 0);
The query works by first counting the number of rows to remove from each section ("Cate") by tallying up the number of symbols with a minus sign per section. Next it sorts the rows where the symbols don't have a minus sign and assigns each row a number in Id order ("row number"), starting back at 1 for each new section ("Cate").
Finally I just pick the rows without a minus-sign symbol, where the row number is greater than the number that were to be removed. Note that if a section has no rows to remove then the count of rows to remove will be NULL, so I transform this to 0, because ALL rows in that section will have a row number greater than 0.
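For example, for Cate AM001 in the sample data the intermediate results look like this:
NumberToRemove:  Cate = AM001, TakeOff = 1   (one row, Id 2, has Type 'A-')
Ordered:         Id 1 -> RowId 1, Id 3 -> RowId 2   (the minus row is excluded)
Final filter:    keep rows where RowId > 1, i.e. only Id 3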
My results were:
Id Cate Type
3 AM001 A
4 AM003 B
6 AM003 B
9 AM005 B
10 AM006 A
11 AM006 A
14 AM011 B
If my assumptions were incorrect then this script could easily be amended to suit...
I am having a hard time trying to explain this succinctly but basically I need to query Table A for each ID number and find where in the positions column there are missing sequential numbers for each specific ID. If there is a position 7 for a certain ID, then there should be a 6, 5, 4, 3, 2, 1 position for that ID as well. Each ID can have anywhere from 1-15 position records.
Does anyone have any suggestions on the best way to go about this?
Edited to Add:
There is only one ID column, it is called GlobalID. There is only one Positions column. The end result is that I will update an Issues column with a code specific to the problem; it will be populated with PositionsIncorrect for each GlobalID record where there is an incorrect sequence of numbers in the Positions column.
If you just want to identify the gaps, you can use lead() in a subquery to get the value of the next position for the same id, and then do the comparison in the outer query:
select *
from (
select
id,
position,
lead(position) over(partition by id order by position) lead_position
from tableA
) x
where lead_position is not null and lead_position != position + 1
This will return one row for each record of the same id where the next record is not in sequence, along with the position of the next record.
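Since the stated end goal is to populate the Issues column, the same lead() idea can drive the update. This is only a rough sketch, assuming the real table is called TableA with columns GlobalID, Position and Issues as described in the question (adjust the names to your schema):
-- Flag every row of a GlobalID that has an internal gap in its Position sequence.
-- If positions must also start at 1, add a check that MIN(Position) = 1 per GlobalID.
update a
set a.Issues = 'PositionsIncorrect'
from TableA a
where exists (
    select 1
    from (
        select
            GlobalID,
            Position,
            lead(Position) over (partition by GlobalID order by Position) as lead_position
        from TableA
    ) g
    where g.GlobalID = a.GlobalID
      and g.lead_position is not null
      and g.lead_position <> g.Position + 1
);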
Something like this will show which positions are missing:
DECLARE @t table
(
ID int
, Position int
)
INSERT INTO @t (ID, Position)
VALUES
(1, 4)
, (1, 15)
, (2, 3)
, (2, 10)
;
WITH cte
AS
(
SELECT
ID
, MIN(Position) Position
, MAX(Position) MaxPosition
FROM @t
GROUP BY ID
UNION ALL
SELECT
ID
, Position + 1
, MaxPosition
FROM cte
WHERE Position + 1 <= MaxPosition
)
SELECT
C.ID
, C.Position
, CAST(CASE WHEN T.ID IS NULL THEN 1 ELSE 0 END AS bit) Missing
FROM
cte C
LEFT JOIN @t T ON
C.ID = T.ID
AND C.Position = T.Position
ORDER BY
ID
, Position
OPTION (MAXRECURSION 0)
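For the sample data above, the rows returned for ID 2 look like this (ID 1 behaves the same way between positions 4 and 15):
ID  Position  Missing
2   3         0
2   4         1
2   5         1
2   6         1
2   7         1
2   8         1
2   9         1
2   10        0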
I have a table as shown in the screenshot (first two columns) and I need to create a column like the last one. I'm trying to calculate the length of each sequence of consecutive values for each id.
For this, the last column is required. I played around with
row_number() over (partition by id, value)
but did not have much success, since the circled number was (quite predictably) computed as 2 instead of 1.
Please help!
First of all, we need to have a way to define how the rows are ordered. For example, in your sample data there is no way to be sure that the 'first' row (1, 1) will always be displayed before the 'second' row (1, 0).
That's why in my sample data I have added an identity column. In your real case, the rows can be ordered by a row ID, a date column or something else, but you need to ensure the rows can be sorted via unique criteria.
So, the task is pretty simple:
calculate a trigger switch - set when the value changes
calculate groups
calculate rows
That's it. I have used common table expressions and kept all columns in order to make the logic easy to understand. You are free to break this into separate statements and remove some of the columns.
DECLARE @DataSource TABLE
(
[RowID] INT IDENTITY(1, 1)
,[ID] INT
,[value] INT
);
INSERT INTO @DataSource ([ID], [value])
VALUES (1, 1)
,(1, 0)
,(1, 0)
,(1, 1)
,(1, 1)
,(1, 1)
--
,(2, 0)
,(2, 1)
,(2, 0)
,(2, 0);
WITH DataSourceWithSwitch AS
(
SELECT *
,IIF(LAG([value]) OVER (PARTITION BY [ID] ORDER BY [RowID]) = [value], 0, 1) AS [Switch]
FROM @DataSource
), DataSourceWithGroup AS
(
SELECT *
,SUM([Switch]) OVER (PARTITION BY [ID] ORDER BY [RowID]) AS [Group]
FROM DataSourceWithSwitch
)
SELECT *
,ROW_NUMBER() OVER (PARTITION BY [ID], [Group] ORDER BY [RowID]) AS [GroupRowID]
FROM DataSourceWithGroup
ORDER BY [RowID];
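For ID 1 of the sample data the final SELECT produces the rows below, which show how the running SUM of [Switch] builds up [Group] and how [GroupRowID] restarts inside each group:
RowID  ID  value  Switch  Group  GroupRowID
1      1   1      1       1      1
2      1   0      1       2      1
3      1   0      0       2      2
4      1   1      1       3      1
5      1   1      0       3      2
6      1   1      0       3      3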
You want results that are dependent on actual data ordering in the data source. In SQL you operate on relations, sometimes on ordered sets of relation rows. Your desired end result is not well-defined in terms of SQL, unless you introduce an additional column in your source table over which your data is ordered (e.g. an auto-increment or timestamp column).
Note: this answers the original question and doesn't take into account the additional timestamp column mentioned in the comment. I'm not updating my answer since there is already an accepted answer.
One way to solve it could be through a recursive CTE:
create table #tmp (i int identity,id int, value int, rn int);
insert into #tmp (id,value) VALUES
(1,1),(1,0),(1,0),(1,1),(1,1),(1,1),
(2,0),(2,1),(2,0),(2,0);
WITH numbered AS (
SELECT i,id,value, 1 seq FROM #tmp WHERE i=1 UNION ALL
SELECT a.i,a.id,a.value, CASE WHEN a.id=b.id AND a.value=b.value THEN b.seq+1 ELSE 1 END
FROM #tmp a INNER JOIN numbered b ON a.i=b.i+1
)
SELECT * FROM numbered -- OPTION (MAXRECURSION 1000)
This will return the following:
i id value seq
1 1 1 1
2 1 0 1
3 1 0 2
4 1 1 1
5 1 1 2
6 1 1 3
7 2 0 1
8 2 1 1
9 2 0 1
10 2 0 2
See my little demo here: https://rextester.com/ZZEIU93657
A prerequisite for the CTE to work is a sequenced table (e.g. a table with an identity column in it) as a source. In my example I introduced the column i for this. As a starting point I need to find the first entry of the source table. In my case this was the entry with i=1.
For a longer source table you might run into a recursion-limit error as the default for MAXRECURSION is 100. In this case you should uncomment the OPTION setting behind my SELECT clause above. You can either set it to a higher value (like shown) or switch it off completely by setting it to 0.
IMHO, this is easier to do with a cursor and a loop.
Maybe there is a way to do the job with a self-join:
declare @t table (id int, val int)
insert into @t (id, val)
select 1 as id, 1 as val
union all select 1, 0
union all select 1, 0
union all select 1, 1
union all select 1, 1
union all select 1, 1
;with cte1 (id , val , num ) as
(
select id, val, row_number() over (ORDER BY (SELECT 1)) as num from @t
)
, cte2 (id, val, num, N) as
(
select id, val, num, 1 from cte1 where num = 1
union all
select t1.id, t1.val, t1.num,
case when t1.id=t2.id and t1.val=t2.val then t2.N + 1 else 1 end
from cte1 t1 inner join cte2 t2 on t1.num = t2.num + 1 where t1.num > 1
)
select * from cte2
Edit: Added another case scenario in the notes and updated the sample attachment.
I am trying to write SQL to get the output attached to this question, along with sample data.
There are two tables: one with distinct IDs (pk) and their current flag;
another with an Active ID (fk to the pk of the first table) and an Inactive ID (fk to the pk of the first table).
The final output should return two columns: the first column consists of all distinct IDs from the first table, and the second column should contain the Active ID from the second table.
Below is the sql:
IF OBJECT_ID('tempdb..#main') IS NOT NULL DROP TABLE #main;
IF OBJECT_ID('tempdb..#merges') IS NOT NULL DROP TABLE #merges
IF OBJECT_ID('tempdb..#final') IS NOT NULL DROP TABLE #final
SELECT DISTINCT id,
[current]
INTO #main
FROM tb_ID t1
--get list of all active_id and inactive_id
SELECT DISTINCT active_id,
inactive_id,
Update_dt
INTO #merges
FROM tb_merges
-- Combine where the id from the main table matched to the inactive_id (should return all the rows from #main)
SELECT id,
active_id AS merged_to_id
INTO #final
FROM (SELECT t1.*,
t2.active_id,
Update_dt ,
Row_number()
OVER (
partition BY id, active_id
ORDER BY Update_dt DESC) AS rn
FROM #main t1
LEFT JOIN #merges t2
ON t1.id = t2.inactive_id) t3
WHERE rn = 1
SELECT *
FROM #final
This SQL partially works. It doesn't work where an id was once active and then became inactive.
Please note:
the active ID returned should be the most recently active ID
an ID which doesn't have any active ID should return either NULL or the ID itself
for IDs where current = 0, the active ID should be the ID that is current in tb_ID
IDs may get interchanged. For example, with two IDs 6 and 7, when 6 is active 7 is inactive and vice versa; the only way to know the most current active state is by the update date
Attached sample might be easy to understand
Looks like I might have to use a recursive CTE to achieve the results. Can someone please help?
thank you for your time!
I think you're correct that a recursive CTE looks like a good solution for this. I'm not entirely certain that I've understood exactly what you're asking for, particularly with regard to the update_dt column, just because the data is a little abstract as-is, but I've taken a stab at it, and it does seem to work with your sample data. The comments explain what's going on.
declare @tb_id table (id bigint, [current] bit);
declare @tb_merges table (active_id bigint, inactive_id bigint, update_dt datetime2);
insert @tb_id values
-- Sample data from the question.
(1, 1),
(2, 1),
(3, 1),
(4, 1),
(5, 0),
-- A few additional data to illustrate a deeper search.
(6, 1),
(7, 1),
(8, 1),
(9, 1),
(10, 1);
insert @tb_merges values
-- Sample data from the question.
(3, 1, '2017-01-11T13:09:00'),
(1, 2, '2017-01-11T13:07:00'),
(5, 4, '2013-12-31T14:37:00'),
(4, 5, '2013-01-18T15:43:00'),
-- A few additional data to illustrate a deeper search.
(6, 7, getdate()),
(7, 8, getdate()),
(8, 9, getdate()),
(9, 10, getdate());
if object_id('tempdb..#ValidMerge') is not null
drop table #ValidMerge;
-- Get the subset of merge records whose active_id identifies a "current" id and
-- rank by date so we can consider only the latest merge record for each active_id.
with ValidMergeCTE as
(
select
M.active_id,
M.inactive_id,
[Priority] = row_number() over (partition by M.active_id order by M.update_dt desc)
from
@tb_merges M
inner join @tb_id I on M.active_id = I.id
where
I.[current] = 1
)
select
active_id,
inactive_id
into
#ValidMerge
from
ValidMergeCTE
where
[Priority] = 1;
-- Here's the recursive CTE, which draws on the subset of merges identified above.
with SearchCTE as
(
-- Base case: any record whose active_id is not used as an inactive_id is an endpoint.
select
M.active_id,
M.inactive_id,
Depth = 0
from
#ValidMerge M
where
not exists (select 1 from #ValidMerge M2 where M.active_id = M2.inactive_id)
-- Recursive case: look for records whose active_id matches the inactive_id of a previously
-- identified record.
union all
select
S.active_id,
M.inactive_id,
Depth = S.Depth + 1
from
#ValidMerge M
inner join SearchCTE S on M.active_id = S.inactive_id
)
select
I.id,
S.active_id
from
@tb_id I
left join SearchCTE S on I.id = S.inactive_id;
Results:
id active_id
------------------
1 3
2 3
3 NULL
4 NULL
5 4
6 NULL
7 6
8 6
9 6
10 6
I have a table like below. I'm trying to do a count of IDs that are not duplicated. I don't mean a distinct count. A distinct count would return a result of 7 (a, b, c, d, e, f, g). I want it to return a count of 4 (a, c, d, f). These are the IDs that do not have multiple type codes. I've tried the following queries but got counts of 0 (the result should be a count in the millions).
select ID, count (ID) as number
from table
group by ID
having count (ID) = 1
Select count (distinct ID)
From table
Having count (ID) = 1
ID|type code
a|111
b|222
b|333
c|444
d|222
e|111
e|333
e|555
f|444
g|333
g|444
Thanks to @scaisEdge! The first query you provided gave me exactly what I'm looking for in the above question. Now that that's figured out, my leaders have asked for it to be taken a step further: show, for each type code, the count of IDs that have only that single type code. For example, we want to see
type code|count
111|1
222|1
444|2
There are 2 instances of IDs that have a single type code of 444 (c, f), and one instance each for 111 (a) and 222 (d). I've tried modifying the query as shown below, but I keep getting errors when running it:
select count(admin_sys_tp_cd) as number
from (
select cont_id from
imdmadmp.contequiv
group by cont_id
having count(*) =1) t
group by admin_sys_tp_cd
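One likely cause of the error is that admin_sys_tp_cd is not selected inside the derived table, so the outer GROUP BY can't see it. A possible adjustment (just a sketch, and it assumes that every cont_id kept by the HAVING clause has exactly one admin_sys_tp_cd, so MIN() simply passes that value through):
select admin_sys_tp_cd, count(*) as number
from (
    select cont_id, min(admin_sys_tp_cd) as admin_sys_tp_cd
    from imdmadmp.contequiv
    group by cont_id
    having count(*) = 1
) t
group by admin_sys_tp_cd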
If you want the count, it could be:
select count(*) from (
select id from
my_table
group by id
having count(*) =1
) t
If you want the ids:
select id from
my_table
group by id
having count(*) =1
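For the sample data in the question, the first query returns 4 and the second returns:
id
a
c
d
f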
How about this: you do a loop over a temporary table?
select
*
into #control
from tablename
declare @acum as int
declare @code as char(3)
declare @id as char(1)
declare @id2 as int
select @acum=0
while exists (select * from #control)
begin
select @code = (select top 1 code from #control order by id)
select @id = (select top 1 id from #control order by id)
select @id2 = count(id) from #control where id in (select id from tablename where id = @id and code <> @code)
if @id2=0
begin
select @acum = @acum+1
end
delete #control
where id = @id --and code = @code
end
drop table #control
print @acum
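For the sample data in the question this loop ends up printing 4, since a, c, d and f are the only ids with a single type code.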
Let's say I create a table with an int Page, int Section, and an int ID identity field, where the page field ranges from 1 to 8 and the section field ranges from 1 to 30 for each page. Now let's say that two records have duplicate page and section. How could I renumber those two records so that the sequence of page and section numbering is contiguous?
select page, section
from #fun
group by page, section having count(*) > 1
shows the duplicates:
page 1 section 3
page 2 section 3
Page 1 section 4 and page 2 section 4 are missing. Is there a way, without using a cursor, to find and renumber the positions in SQL 2000, which doesn't support ROW_NUMBER()?
This rownum below of course produces exactly the same numbers as the section column:
select page, section,
(select count(*) + 1
from #fun b
where b.page = a.page and b.section < a.section) as rownum
from #fun a
I could create a pivot table having values 1 through 100, but what would I join against?
What I want to do is something like this:
update p set section = (expression that gets 4)
from #fun p
where (expression that identifies duplicate sections by page)
I don't have a 2000 server to test this on, but I think it should work.
Create test tables/data:
CREATE TABLE #fun
(Id INT IDENTITY(100,1)
,page INT NOT NULL
,section INT NOT NULL
)
INSERT #fun (page, section)
SELECT 1,1
UNION ALL SELECT 1,3 UNION ALL SELECT 1,2
UNION ALL SELECT 1,3 UNION ALL SELECT 1,5
UNION ALL SELECT 2,1 UNION ALL SELECT 2,2
UNION ALL SELECT 2,3 UNION ALL SELECT 2,5
UNION ALL SELECT 2,3
Now the processing:
-- create a worktable
CREATE TABLE #fun2
(Id INT IDENTITY(1,1)
,funId INT
,page INT NOT NULL
,section INT NOT NULL
)
-- insert data into the second temp table ordered by the relevant columns
-- the identity column will form the basis of the revised section number
INSERT #fun2 (funId, page, section)
SELECT Id,page,section
FROM #fun
ORDER BY page,section,Id
-- write the calculated section value back where it is different
UPDATE p
SET section = y.calc_section
FROM #fun AS p
JOIN
(
SELECT f2.funId, f2.id - x.adjust calc_section
FROM #fun2 AS f2
JOIN (
-- this subquery is used to calculate an offset like
-- PARTITION BY in a 2005+ ROWNUMBER function
SELECT MIN(Id) - 1 adjust, page
FROM #fun2
GROUP BY page
) AS x
ON f2.page = x.page
) AS y
ON p.Id = y.funId
WHERE p.section <> y.calc_section
SELECT * FROM #fun order by page, section
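With the test data above, both pages end up with contiguous sections 1 through 5; for page 1 the final rows are:
Id   page  section
100  1     1
102  1     2
101  1     3
103  1     4
104  1     5
Only the higher-Id duplicate on each page (Id 103 on page 1, Id 109 on page 2) is actually updated, from section 3 to section 4; every other row already matches its calculated section.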
Disclaimer: I don't have SQL Server to test.
If I understand you correctly: if you knew the ROW_NUMBER of your #fun records partitioned over the (page, section) duplicates, you could use this relative ranking to increment the section:
UPDATE p
SET section = section + (rownumber - 1)
FROM #fun AS p
INNER JOIN ( -- SELECT id, ROW_NUMBER() OVER (PARTITION BY page, section) ...
SELECT a.id, COUNT(1) AS rownumber
FROM #fun a
LEFT JOIN #fun b
ON a.page = b.page AND a.section = b.section AND a.id <= b.id
GROUP BY a.id, a.page, a.section) d
ON p.id = d.id
WHERE rownumber > 1
That won't handle the case where the number of duplicates pushes you past your upper limit of 30. It may also create new duplicates if higher-numbered sections already exist for the page -- that is, one instance of (pg 1, sec 3) becomes (pg 1, sec 4), which already existed -- but you can run the UPDATE repeatedly until no duplicates exist.
And then add a unique index on (page, section).
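For example (the index name here is just a placeholder):
CREATE UNIQUE INDEX UQ_fun_page_section ON #fun (page, section)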