Invert the sequence - SQL

This is a question that was asked to me during an interview. For a table like this (which may have hundreds or thousands of records of this kind),
what is the best way to reverse the Seq so that A will be seq 4 and B will be seq 1, like this:
I gave the interviewer the query below (I used a CTE just to create the sample scenario; in the original case it was a table):
WITH CTE AS
(
    SELECT Seq = 1, Nm = 'A'
    UNION
    SELECT Seq = 2, Nm = 'A'
    UNION
    SELECT Seq = 3, Nm = 'B'
    UNION
    SELECT Seq = 4, Nm = 'B'
)
SELECT
    Seq = ROW_NUMBER() OVER(ORDER BY Seq DESC),
    Seq,
    Nm
FROM CTE;
Are there any alternative dynamic queries that achieve the same result in a more efficient way?

How about?
select seq, (case when nm = 'B' then 'A' else 'B' end) as nm
from t;

If I save the count of the table to a declared variable, I can then use a subquery to get a reversed second column. The SQL Server execution plan indicates the batch cost is lower than with ROW_NUMBER().
DECLARE @tablename TABLE(seq int PRIMARY KEY, Nm char(1))
INSERT INTO @tablename VALUES (1,'A'), (2,'A'), (3,'B'), (4,'B')
DECLARE @count int = (SELECT COUNT(*) FROM @tablename)
SELECT seq
     , (SELECT Nm
        FROM @tablename T2
        WHERE T2.seq = @count + 1 - T1.seq) AS [Nm]
FROM @tablename T1

An alternative query, not using an OVER clause:
SELECT
    (SELECT MAX(mytable.Seq) FROM mytable) - mytable.Seq + 1 AS Seq,
    Nm
FROM mytable
ORDER BY mytable.Seq DESC

I don't think it's appropriate to help with interview questions here, but since this one is so unrealistic, and there are so many complicated answers already, I'll offer this much simpler solution:
DECLARE @MaxSeq INT;
SELECT @MaxSeq = MAX(Seq) FROM [Table];
UPDATE [Table]
SET Seq = @MaxSeq - Seq + 1;

Related

Selecting data from a table where the sum of values in a column equals the value in another column

Sample data:
create table #temp (id int, qty int, checkvalue int)
insert into #temp values (1,1,3)
insert into #temp values (2,2,3)
insert into #temp values (3,1,3)
insert into #temp values (4,1,3)
Given the data above, I would like to show rows from top to bottom until sum(qty) = checkvalue. Note that checkvalue is the same for all records at all times. For the sample data above, the desired output is:
Id Qty checkValue
1 1 3
2 2 3
Because 1+2=3, no more rows need to be shown. If checkvalue were 4, we would show the third record (Id:3 Qty:1 checkValue:4) as well.
This is the code I am using to handle this problem. It works well.
declare @checkValue int = (select top 1 checkvalue from #temp);
declare @counter int = 0, @sumValue int = 0;
while @sumValue < @checkValue
begin
    set @counter = @counter + 1;
    set @sumValue = @sumValue + (
        select t.qty from
        (
            SELECT * FROM (
                SELECT
                    ROW_NUMBER() OVER (ORDER BY id ASC) AS rownumber,
                    id, qty, checkvalue
                FROM #temp
            ) AS foo
            WHERE rownumber = @counter
        ) t
    )
end
declare @sql nvarchar(255) = 'select top ' + cast(@counter as varchar(5)) + ' * from #temp'
EXECUTE sp_executesql @sql, N'@counter int', @counter = @counter;
However, I am not sure if this is the best way to deal with it and wonder if there is a better approach. There are many professionals here and I'd like to hear from them about what they think about my approach and how we can improve it. Any advice would be appreciated!
Try this:
select id, qty, checkvalue from (
select t1.*,
sum(t1.qty) over (partition by t2.id) [sum]
from #temp [t1] join #temp [t2] on t1.id <= t2.id
) a where checkvalue = [sum]
Smart self-join is all you need :)
For SQL Server 2012 and onwards, you can easily achieve this using ROWS BETWEEN in your OVER clause together with a CTE:
WITH Running AS(
SELECT *,
SUM(qty) OVER (ORDER BY id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningQty
FROM #temp t)
SELECT id, qty, checkvalue
FROM Running
WHERE RunningQty <= checkvalue;
One basic improvement is to try to reduce the number of iterations. You're incrementing by 1, but if you repurpose the logic behind binary searching, you'd get something close to this:
DECLARE @RoughAverage int = 1 -- Some arbitrary value. The closer it is to the real average, the faster things should be.
DECLARE @CheckValue int = (SELECT TOP 1 checkvalue FROM #temp)
DECLARE @Sum int = 0
WHILE 1 = 1 -- Refer to BREAK below.
BEGIN
    SELECT TOP (@RoughAverage) @Sum = SUM(qty) OVER (ORDER BY id)
    FROM #temp
    ORDER BY id
    IF @Sum = @CheckValue
        BREAK -- Indicating you reached your objective.
    ELSE
        SET @RoughAverage = @CheckValue - @Sum -- Most likely incomplete like this.
END
For SQL Server 2008 you can use a recursive CTE. TOP 1 WITH TIES limits the result to the first combination; remove it to see all combinations:
with cte as (
select
*, rn = row_number() over (order by id)
from
#temp
)
, rcte as (
select
i = id, id, qty, sumV = qty, checkvalue, rn
from
cte
union all
select
a.id, b.id, b.qty, a.sumV + b.qty, a.checkvalue, b.rn
from
rcte a
join cte b on a.rn + 1 = b.rn
where
a.sumV < b.checkvalue
)
select
top 1 with ties id, qty, checkvalue
from (
select
*, needed = max(case when sumV = checkvalue then 1 else 0 end) over (partition by i)
from
rcte
) t
where
needed = 1
order by dense_rank() over (order by i)

SQL Server window functions: building up a history string

I have a table of longitudinal data that looks like this:
where id is the partition variable, period is the time dimension, and val is the observation value.
I want to build up a history of val for each panel of id, like this:
I'm trying to do this with SQL window functions and not a cursor, but the issue I keep running into is the self-referential nature of the hist column definition. It almost seems like I'd have to create one row/column per period. For example, the closest I could come was this:
IF OBJECT_ID('dbo.my_try', 'U') IS NOT NULL
DROP TABLE dbo.my_try;
GO
SELECT
id, period, val,
CASE
WHEN (
period = MIN(period)
OVER (PARTITION by id order by period ROWS
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
) THEN CAST (val AS VARCHAR(60))
ELSE NULL
END AS hist
INTO my_try
FROM my_test
SELECT
id, period, val,
CASE
WHEN (
period = MIN(period) OVER
(PARTITION by id order by period ROWS
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
) THEN hist
ELSE (
CONCAT(
val, ' | ', LAG(hist, 1) OVER (PARTITION by id order by period)
)
)
END AS hist2
FROM my_try
I would have to spool out the iteration and do a hist3, etc. for it to finally work.
Is it possible to accomplish this with SQL window functions, or is cursor the only route?
Sample Data
Here is some code to generate the original table:
CREATE TABLE my_test (
    id INT,
    period INT,
    val INT
)
BEGIN
    DECLARE @id INT = 1;
    DECLARE @period INT = 1;
    WHILE @id <= 3
    BEGIN
        SET @period = 1
        WHILE @period <= 3
        BEGIN
            INSERT INTO my_test VALUES (@id, @period, @period * POWER(10, @id))
            SET @period = @period + 1
        END
        SET @id = @id + 1
    END
END
Actually, you don't need recursion here. You can leverage STUFF pretty easily. Of course, if you are on SQL Server 2017 or later you can use STRING_AGG as suggested above (a sketch of that variant follows after the STUFF query below). But if, like me, your company is not the fastest to adopt the latest and greatest, you can use this:
select t1.id
     , t1.period
     , t1.val
     , STUFF((select ' | ' + convert(varchar(10), val)
              from my_test t2
              where t2.id = t1.id
                and t2.period <= t1.period
              order by t2.period
              FOR XML PATH('')), 1, 3, '')
from my_test t1
order by t1.id
       , t1.period
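For reference, here is a minimal sketch of the STRING_AGG variant mentioned above, assuming SQL Server 2017 or later and the same my_test table; WITHIN GROUP controls the concatenation order:
-- Hypothetical STRING_AGG equivalent (SQL Server 2017+), same my_test table as above.
select t1.id
     , t1.period
     , t1.val
     , (select STRING_AGG(convert(varchar(10), t2.val), ' | ')
               WITHIN GROUP (ORDER BY t2.period)
        from my_test t2
        where t2.id = t1.id
          and t2.period <= t1.period) as hist   -- builds the running '10 | 20 | 30' style history per id
from my_test t1
order by t1.id
       , t1.period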
As discussed in the comments, try using a recursive query:
with cte as(
select id, [period], val, convert(varchar(max), val) as agg from my_try where [period] = 1
union all
select t.id, t.[period], t.val, CONCAT(c.agg, ' | ', t.val) from my_try t join cte c on c.[period] +1 = t.[period] and c.id = t.id
)
select * from cte order by id, [period]

Loop through sql result set and remove [n] duplicates

I've got a SQL Server db with quite a few dupes in it. Removing the dupes manually is just not going to be fun, so I was wondering if there is any sort of sql programming or scripting I can do to automate it.
Below is my query that returns the ID and the Code of the duplicates.
select a.ID, a.Code
from Table1 a
inner join (
SELECT Code
FROM Table1 GROUP BY Code HAVING COUNT(Code)>1)
x on x.Code= a.Code
I'll get a return like this, for example:
5163 51727
5164 51727
5165 51727
5166 51728
5167 51728
5168 51728
This snippet shows three returns for each ID/Code (so a primary "good" record and two dupes). However, this isn't always the case. There can be up to [n] dupes, although 2-3 seems to be the norm.
I just want to somehow loop through this result set and delete everything but one record. THE RECORDS TO DELETE ARE ARBITRARY, as any of them can be "kept".
You can use row_number to drive your delete.
ie
CREATE TABLE #table1
(id INT,
code int
);
WITH cte AS
(select a.ID, a.Code, ROW_NUMBER() OVER(PARTITION by COdE ORDER BY ID) AS rn
from #Table1 a
)
DELETE x
FROM #table1 x
JOIN cte ON x.id = cte.id
WHERE cte.rn > 1
But...
If you are going to be doing a lot of deletes from a very large table, you might be better off selecting the rows you want to keep into a temp table, truncating the original table, and re-inserting those rows. A rough sketch of that pattern follows.
That keeps the transaction log from getting hammered, avoids fragmenting your clustered index, and should be quicker too!
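A rough sketch of that pattern, assuming the same Table1 with ID and Code columns as in the question (MIN(ID) is just one arbitrary way to pick the surviving row):
-- Stage the rows to keep: one arbitrary survivor per Code.
SELECT MIN(ID) AS ID, Code
INTO #keepers
FROM Table1
GROUP BY Code;

-- TRUNCATE is minimally logged compared with a large DELETE,
-- but requires that no foreign keys reference Table1.
TRUNCATE TABLE Table1;

-- Re-insert the survivors.
-- (If ID is an IDENTITY column, SET IDENTITY_INSERT Table1 ON/OFF would also be needed.)
INSERT INTO Table1 (ID, Code)
SELECT ID, Code
FROM #keepers;

DROP TABLE #keepers;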
It is actually very simple:
DELETE FROM Table1
WHERE ID NOT IN
(SELECT MAX(ID)
FROM Table1
GROUP BY CODE)
Self-join solution with a performance test vs. the CTE approach:
create table codes(
    id int IDENTITY(1,1) NOT NULL,
    code int null,
    CONSTRAINT [PK_codes_id] PRIMARY KEY CLUSTERED
    (
        id ASC
    ))
declare @counter int, @code int
set @counter = 1
set @code = 1
while (@counter <= 1000000)
begin
    print ABS(Checksum(NewID()) % 1000)
    insert into codes(code) select ABS(Checksum(NewID()) % 1000)
    set @counter = @counter + 1
end
GO
set statistics time on;
delete a
from codes a left join(
select MIN(id) as id from codes
group by code) b
on a.id = b.id
where b.id is null
set statistics time off;
--set statistics time on;
-- WITH cte AS
-- (select a.id, a.code, ROW_NUMBER() OVER(PARTITION by code ORDER BY id) AS rn
-- from codes a
-- )
-- delete x
-- FROM codes x
-- JOIN cte ON x.id = cte.id
-- WHERE cte.rn > 1
--set statistics time off;
Performance test results:
With Join:
SQL Server Execution Times:
CPU time = 3198 ms, elapsed time = 3200 ms.
(999000 row(s) affected)
With CTE:
SQL Server Execution Times:
CPU time = 4197 ms, elapsed time = 4229 ms.
(999000 row(s) affected)
It's basically done like this:
WITH CTE_Dup AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY SalesOrderno, ItemNo ORDER BY SalesOrderno, ItemNo)
           AS ROW_NO
    FROM dbo.SalesOrderDetails
)
DELETE FROM CTE_Dup WHERE ROW_NO > 1;
Note: the PARTITION BY must include all of the columns that define a duplicate.
Here is another example:
CREATE TABLE #Table (C1 INT,C2 VARCHAR(10))
INSERT INTO #Table VALUES (1,'SQL Server')
INSERT INTO #Table VALUES (1,'SQL Server')
INSERT INTO #Table VALUES (2,'Oracle')
SELECT * FROM #Table
;WITH Delete_Duplicate_Row_cte
AS (SELECT ROW_NUMBER()OVER(PARTITION BY C1, C2 ORDER BY C1,C2) ROW_NUM,*
FROM #Table )
DELETE FROM Delete_Duplicate_Row_cte WHERE ROW_NUM > 1
SELECT * FROM #Table

returning total records in cte

With help from the article here and recent answers from SO experts, I have arrived at the following, which lets me efficiently page through a set of records.
I think my last couple of questions are:
See how I include the Total Number of Records in the payload at the end of the CTE, as 'Total'. Is that how you would do this?
Any other suggestions? Potential areas to be more concise or other improvements?
How do I return the Total Number of Pages?
DECLARE @page_size INT = 5;
DECLARE @page_nbr INT = 4;
DECLARE @search NVARCHAR(MAX) = '';
DECLARE @sort_order INT = 2;
WITH AllProducts
AS
(
    SELECT *,
           CASE @sort_order
               WHEN 1 THEN ROW_NUMBER() OVER (ORDER BY ProductID)
               WHEN 2 THEN ROW_NUMBER() OVER (ORDER BY ProductName)
           END AS 'Seq'
    FROM Products
),
Filtered
AS
(
    SELECT * FROM AllProducts
    WHERE ProductName LIKE '%' + @search + '%'
       OR @search IS NULL
)
SELECT (SELECT COUNT(*) FROM Filtered) AS 'Total', * FROM Filtered
WHERE seq > (@page_nbr - 1) * @page_size
  AND seq <= @page_nbr * @page_size
I think there's something wrong in your query: it numbers records (for paging) and after that applies the filter.
It is, for example, possible that you request page 2 of records, but all records with the corresponding seq values could be filtered out in the meantime. So, in this case the query would yield no results, although there may be plenty of records in the table.
In order to fix that, you could do the filtering and record numbering in the same CTE, like this:
DECLARE @page_size INT = 5;
DECLARE @page_nbr INT = 4;
DECLARE @search NVARCHAR(MAX) = '';
DECLARE @sort_order INT = 2;
WITH Filtered AS (
    SELECT *,
           CASE @sort_order
               WHEN 1 THEN ROW_NUMBER() OVER (ORDER BY ProductID)
               WHEN 2 THEN ROW_NUMBER() OVER (ORDER BY ProductName)
           END AS 'Seq'
    FROM Products
    WHERE ProductName LIKE '%' + @search + '%' OR @search IS NULL
)
SELECT (SELECT COUNT(*) FROM Filtered) AS 'Total', * FROM Filtered
WHERE seq > (@page_nbr - 1) * @page_size
  AND seq <= @page_nbr * @page_size
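On the 'Return Total Number of Pages' point from the question, one possible approach (just a sketch; note the count is evaluated twice, which may not be ideal) is to replace the final SELECT above with something like:
-- Hypothetical addition: total pages derived from the filtered row count.
SELECT (SELECT COUNT(*) FROM Filtered) AS 'Total',
       (SELECT CEILING(COUNT(*) * 1.0 / @page_size) FROM Filtered) AS 'TotalPages',
       *
FROM Filtered
WHERE seq > (@page_nbr - 1) * @page_size
  AND seq <= @page_nbr * @page_size;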

T-Sql count string sequences over multiple rows

How can I find subsets of data over multiple rows in sql?
I want to count the number of occurrences of a string (or number) before another string is found and then count the number of times this string occurs before another one is found.
All these strings can be in random order.
This is what I want to achieve:
I have one table with one column (columnx) with data like this:
A
A
B
C
A
B
B
The result I want from the query should be like this:
2 A
1 B
1 C
1 A
2 B
Is this even possible in sql or would it be easier just to write a little C# app to do this?
Since, as per your comment, you can add a column that will unambiguously define the order in which the columnx values go, you can try the following query (provided the SQL product you are using supports CTEs and ranking functions):
WITH marked AS (
SELECT
columnx,
sortcolumn,
grp = ROW_NUMBER() OVER ( ORDER BY sortcolumn)
- ROW_NUMBER() OVER (PARTITION BY columnx ORDER BY sortcolumn)
FROM data
)
SELECT
columnx,
COUNT(*)
FROM marked
GROUP BY
columnx,
grp
ORDER BY
MIN(sortcolumn)
;
You can see the method in work on SQL Fiddle.
If sortcolumn is an auto-increment integer column that is guaranteed to have no gaps, you can replace the first ROW_NUMBER() expression with just sortcolumn. But, I guess, that cannot be guaranteed in general. Besides, you might indeed want to sort on a timestamp instead of an integer.
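For illustration, here is a sketch of that gap-free variant, under the assumption that sortcolumn really is a dense integer with no gaps (same data table and columns as above):
WITH marked AS (
    SELECT
        columnx,
        sortcolumn,
        -- sortcolumn stands in for the first ROW_NUMBER() when it has no gaps
        grp = sortcolumn
            - ROW_NUMBER() OVER (PARTITION BY columnx ORDER BY sortcolumn)
    FROM data
)
SELECT
    columnx,
    COUNT(*)
FROM marked
GROUP BY
    columnx,
    grp
ORDER BY
    MIN(sortcolumn);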
I don't think you can do it with a single select.
You can use a cursor:
create table my_Strings
(
    my_string varchar(50)
)
insert into my_Strings values('A'),('A'),('B'),('C'),('A'),('B'),('B') -- this row-constructor syntax requires SQL Server 2008 or later
--select my_string from my_Strings
declare @temp_result table(
    string varchar(50),
    nr int)
declare @myString varchar(50)
declare @myLastString varchar(50)
declare @nr int
set @myLastString = 'A' --set this to the value of the FIRST string in the table
set @nr = 0
DECLARE string_cursor CURSOR
FOR
SELECT my_string as aux_column FROM my_Strings
OPEN string_cursor
FETCH NEXT FROM string_cursor into @myString
WHILE @@FETCH_STATUS = 0 BEGIN
    if (@myString = @myLastString) begin
        set @nr = @nr + 1
        set @myLastString = @myString
    end else begin
        insert into @temp_result values (@myLastString, @nr)
        set @myLastString = @myString
        set @nr = 1
    end
    FETCH NEXT FROM string_cursor into @myString
END
insert into @temp_result values (@myLastString, @nr)
CLOSE string_cursor;
DEALLOCATE string_cursor;
select * from @temp_result
Result:
A 2
B 1
C 1
A 1
B 2
Try this :
;with sample as (
select 'A' as columnx
union all
select 'A'
union all
select 'B'
union all
select 'C'
union all
select 'A'
union all
select 'B'
union all
select 'B'
), data
as (
select columnx,
Row_Number() over(order by (select 0)) id
from sample
) , CTE as (
select * ,
Row_Number() over(order by (select 0)) rno from data
) , result as (
SELECT d.*
, ( SELECT MAX(ID)
FROM CTE c
WHERE NOT EXISTS (SELECT * FROM CTE
WHERE rno = c.rno-1 and columnx = c.columnx)
AND c.ID <= d.ID) AS g
FROM data d
)
SELECT columnx,
COUNT(1) cnt
FROM result
GROUP BY columnx,
g
Result :
columnx cnt
A 2
B 1
C 1
A 1
B 2