Computed column formula for row_number function - sql

I am trying to find a formula for a computed column that returns the same results as the ROW_NUMBER() function, since I cannot use the ROW_NUMBER() function and save its results on the table.
I have a table as below;
ID YEAR
1 2018
2 2018
3 2019
4 2019
5 2020
6 2018
7 2019
I would like a formula that computes and assigns numbers to the rows depending on the year, as shown below:
ID YEAR COMPUTED COLUMN
1 2018 1
2 2018 2
3 2019 1
4 2019 2
5 2020 1
6 2018 3
7 2019 3

I am trying to find a formula for a computed column that will return same results as the ROW_NUMBER() function.
You simply can't use window functions in a computed column: window functions operate on a range of rows (called the window frame), while a computed column can only see the row it belongs to.
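To make that concrete, here is a minimal sketch (the table and column names are assumptions, not from the question): a computed column whose expression only references its own row is accepted, while one using ROW_NUMBER() is rejected, because window functions are only allowed in SELECT and ORDER BY clauses.
-- valid: the expression only uses values from the current row
alter table mytable add year_plus_one as ([year] + 1);
-- invalid: ROW_NUMBER() needs a window over other rows, so SQL Server
-- refuses the column definition
-- alter table mytable add rn as (row_number() over (partition by [year] order by id));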
I cannot use the ROW_NUMBER() function and save results on the table
While this is technically possible, I would not recommend it. This is derived information that can be computed on the fly whenever needed. You could use a view instead:
create view myview
as
select
id,
year,
row_number() over (partition by year order by id) rn
from mytable
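Querying the view then gives exactly the numbering from the question, for example:
select id, year, rn
from myview
where year = 2019
order by rn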

There is always a way around it.
It's not a good way though...
select
id,
year,
row_number() over(partition by year order by id) rn,
(select count(*) from Table1 t where t.YEAR=d.YEAR and t.ID<=d.ID) PoorRN
from Table1 d
order by id
• This is going to perform very badly.
If your table has over a million rows, just go ahead and throw it on production and tell us what happened ;)
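If you do take this route anyway, a supporting index on the partition and ordering columns at least turns each correlated count into a narrow range scan (a sketch; the index name is made up). It does not change the overall quadratic amount of work, though.
create index IX_Table1_Year_ID on Table1 ([YEAR], ID);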

Update: Based on the comments by OP, I have a new suggestion -
From the original table, create a view that numbers the rows within each year:
create view original_table_view
as
select *, ROW_NUMBER() over ( partition by [year] order by ID) as numbering
from original_table
Create a new table and apply a MERGE to it from the view:
create table New_table (id int, [year] int, numbering int)
The MERGE statement:
merge new_table t using original_table_view v
on (t.id = v.id) --merge_condition
when matched then update set
t.[year]=v.[year],
t.numbering=v.numbering
when not matched by target
then insert ( id, [year], numbering)
values (v.id, v.[year], v.numbering)
when not matched by source
then delete;
So now, if new values are inserted into the original table in the future, you just need to run the merge query to update the new table (see the procedure sketch after the queries below).
view data -
select * from original_table order by ID
select * from original_table_view order by ID
select * from New_table order by ID
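If you want the refresh to be a one-liner, the merge can be wrapped in a stored procedure (a sketch; the procedure name is made up) and executed whenever new rows have been inserted into the original table:
create procedure refresh_new_table
as
begin
    merge new_table t using original_table_view v
    on (t.id = v.id)
    when matched then update set
        t.[year] = v.[year],
        t.numbering = v.numbering
    when not matched by target
        then insert (id, [year], numbering) values (v.id, v.[year], v.numbering)
    when not matched by source
        then delete;
end
go
exec refresh_new_table;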
Old Answer:
I confused your question with a calculated column instead of a computed column (sorry about that).
You cannot store a window function as a computed column in SQL Server. However, you can create a view on the table and query that instead.
example -
create table as below
create table year_computed2 (ID int, Year int )
go
insert into year_computed2 values (1,2018 )
insert into year_computed2 values (2,2018 )
insert into year_computed2 values (3,2019 )
insert into year_computed2 values (4,2019 )
insert into year_computed2 values (5,2020 )
insert into year_computed2 values (6,2018 )
insert into year_computed2 values (7,2019 )
go
Now create a view using the ROW_NUMBER() function (you can also use RANK() and NTILE()):
create view year_computed_view as
select *, ROW_NUMBER() over ( partition by [year] order by ID) as Rn from year_computed2
Now query this view like below:
select * from year_computed_view order by ID
In case you were talking about the calculated column, the answer below applies.
Dense_Rank()
select DISTINCT id, [year], DENSE_RANK() over ( order by [year]) as [Dense_rank]
from year_computed2
ORDER BY ID
If DENSE_RANK() doesn't fulfill your requirement, try the two functions below:
Rank()
select DISTINCT id, [year], RANK() over ( partition by [year] order by ID) as [Rank]
from year_computed2
ORDER BY ID
NTile()
declare @ntile_count int = (select count(distinct [year]) from year_computed2)
select id, [year], ntile(@ntile_count) over (partition by [year] order by id) as rn from year_computed2 ORDER BY ID

Related

How to get the last record from the duplicate records in SQL?

I want to get the last record from the duplicate records and want the non-duplicate records also.
As depicted in the below image I want to get row number 4, 5, 7 and 9 in my output.
The image below shows the main table. From it I have to concat the first two columns, and then using that new column I need the last row of the duplicate records as well as the non-duplicate rows.
I have tried the SQL code given below.
DECLARE @dense_rank_demo AS TABLE (
Bid INT,
cid INT,
BCode NVARCHAR(10)
);
INSERT INTO @dense_rank_demo(Bid,cid,BCode)
VALUES(2393,1,'LAX'),(2394,54,'BRK'),(2395,57,'ONT'),(2393,1,'SAN'),(2393,1,'LAX'),(2393,1,'BRK'),(2394,54,'ONT'),(2395,57,'SAN'),(2394,1,'ONT');
SELECT * FROM @dense_rank_demo;
SELECT
CONCAT([Bid],'_',[cid]) as [Key],BCode,DENSE_RANK() over( order by CONCAT([Bid],'_',[cid]))
from @dense_rank_demo
From that SQL code I found that there is no column on which we can apply ORDER BY to get the expected result.
So I have added a column named ID and made some other changes to get the expected output.
Here I am sharing the code with those changes:
DECLARE @dense_rank_demo AS TABLE (
ID INT IDENTITY(1,1),
Bid INT,
cid INT,
BCode NVARCHAR(10));
DECLARE @tableGroupKey TABLE
(
dr bigint,
[Key] VARCHAR(50)
)
INSERT INTO @dense_rank_demo(Bid,cid,BCode)
VALUES(2393,1,'LAX'),
(2394,54,'BRK'),
(2395,57,'ONT'),
(2393,1,'SAN'),
(2393,1,'LAX'),
(2393,1,'BRK'),
(2394,54,'ONT'),
(2395,57,'SAN'),
(2394,1,'ONT');
with [drd] as
(
select
concat([Bid],'_',[cid]) as [Key],
BCode,
dense_rank() over(partition by concat([Bid],'_',[cid]) order by ID) as
[dr]
from @dense_rank_demo
)
INSERT INTO @tableGroupKey(dr,[Key])
select MAX(dr) dr,[Key]
from [drd]
GROUP BY [Key]
SELECT *,CONCAT(Bid,'_',cid) AS [Key] FROM @dense_rank_demo [drd]
select Result.* FROM
(
SELECT *,CONCAT(Bid,'_',cid) AS [Key] ,
dense_rank() over(partition by concat([Bid],'_',[cid]) order by ID) as
[dr]
FROM @dense_rank_demo [drd]
) as [Result]
INNER JOIN @tableGroupKey [gk] ON
[Result].[Key] = [gk].[Key] AND [gk].dr = [Result].dr
ORDER BY [Result].ID
The expected output is as below:
(expected-output image omitted)
The issue here is the ordering of the values within the result set. If you had a specific order to use, this would be fairly straightforward - however, you are relying on dense_rank() consistently and reliably returning the same values for the rows in the table. If we could use, for example, the alpha sort on the BCode column, then it would be simple to use a CTE and get the last/first one:
with [drd] as
(
select
concat([Bid],'_',[cid]) as [Key],
BCode,
dense_rank() over(partition by concat([Bid],'_',[cid]) order by Bcode desc) as [dr]
from @dense_rank_demo
)
select *
from [drd]
where dr = 1
As the order of dense_rank() is not guaranteed in your code, I'm not sure that this is feasible in a scalable way.
See this for more information about reliably sorted results: how does SELECT TOP works when no order by is specified?
You need one row per BID, i.e. the latest one, but you have not specified the logic for "last row". Usually the last row is the most recently added one, and so there is usually a timestamp that can be used to pick the latest row where there are duplicates.
The code below uses BCode as part of the ORDER BY clause, which means it will automatically pick the row that sorts first alphabetically; that may not be the row you expect unless that is how you define the most recent row. In general you would need to adjust the ORDER BY clause to your needs, but a timestamp makes the most sense (see the sketch after the code below).
row_number() generates the values 1 to n within each partition; in case there is a tie and you need both rows, you need to use dense_rank() instead. You can adjust that based on your needs.
with main as (
select
concat(Bid, cid) as [Key],
row_number() over(partition by concat(Bid, cid) order by Bcode) as rank_
from <table_name>
)
select * from main where rank_ = 1
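For example, if the table did have a timestamp column (here a hypothetical created_at, not in your data), "latest row per key" becomes just a descending sort on it, replacing the ORDER BY above:
with main as (
    select
        concat(Bid, cid) as [Key],
        BCode,
        row_number() over (partition by concat(Bid, cid)
                           order by created_at desc) as rank_   -- created_at is an assumed column
    from <table_name>
)
select * from main where rank_ = 1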

How could I limit a query in postgresql to 50% of the data when calling for the table

I have to bring back 50% of the data contained in the data table; that is, if I have 10 rows, only bring 5, but without hard-coding limit 5, since it has to adapt to the number of rows.
Select * from data limit 50%
This generates an error. How could it be done?
If you want a random sample and an approximate 50% is sufficient, you can use tablesample:
select t.*
from t tablesample bernoulli (50);
Otherwise, you can define the sample yourself to get exactly (or off-by-1) 50%:
select t.*
from (select t.*, ntile(2) over (order by random()) as tile
from t
) t
where tile = 1;
If you have a unique column then you can use the percent_rank() window function to get your desired data.
Here we have created a table with ID and code columns, then inserted four rows. The query below returns 2 rows. If there were five rows it would have returned 3 rows.
Schema and insert statements:
create table temp (id int, code varchar(50));
insert into temp values(1,'a');
insert into temp values(2,'b');
insert into temp values(3,'c');
insert into temp values(4,'d');
Query:
SELECT
ID,
Code
FROM (
SELECT
ID,
COde,
percent_rank() over (order by id desc) as pct_rank
FROM temp
) t
WHERE pct_rank <= 0.50;
Output:
id  code
4   d
3   c
db<>fiddle here
You can also use cume_dist() in the same way.
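For completeness, a sketch of the cume_dist() variant against the same temp table: cume_dist() returns the fraction of rows with a value up to and including the current one, so with an odd row count it keeps one row fewer than percent_rank() would.
SELECT ID, Code
FROM (
    SELECT ID, Code,
           cume_dist() over (order by id desc) as cd
    FROM temp
) t
WHERE cd <= 0.50;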

How to select unique id when ids are same and have different string values in variable

I am working with Microsoft SQL Server. I have many rows with the same id, and the values of the other variables associated with them may or may not be the same.
I just want to select each id once (I don't care about the values of the other variables). The values of the other variables can be anything; I am just focused on selecting a unique id along with whatever values happen to come from one of its duplicate rows.
You could use the ROW_NUMBER function to assign a sequential number to every row within each id, and then keep just the first one:
SELECT id, variablea, variableb
FROM (SELECT id, variablea, variableb,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY variablea) AS rn
FROM mytable) t
WHERE rn = 1
Another option is to use the WITH TIES clause in concert with Row_Number()
Note: No Extra Field.
Example
Declare #YourTable Table ([id] int,[variableA] varchar(50),[variableB] varchar(50))
Insert Into #YourTable Values
(1,'xyz','abc')
,(1,'fgh','rty')
,(2,'qwe','ui')
,(3,'jk','vbn')
,(3,'asd','ty')
,(3,'fgh','po')
Select top 1 with ties *
From #YourTable
Order By Row_Number() over (Partition By ID Order by variableA)
Returns
id variableA variableB
1 fgh rty
2 qwe ui
3 asd ty

Get every second row as a result table in t-sql

I'm looking for a T-SQL script that returns a list showing every second value from each grouping in Table1.
For example, I have the following data (Table1) and want the desired result-list:
Table1:
Customer Quantity
A 5
A 8 (*)
B 3
B 5 (*)
B 11
C 7
D 4
D 23 (*)
Desired result-list:
Customer Quantity
A 8
B 5
D 23
I thought about doing something with 'select distinct' and a 'left outer join', but I can't get it to work. Possibly I need row numbering, but I can't figure out how to do it. Can anyone help me?
Beneath is the script I used to make and fill Table1:
CREATE TABLE Table1
(Customer nvarchar(1) NULL,
Quantity int NOT NULL);
INSERT INTO Table1(Customer,Quantity)
VALUES
('A',5),
('A',8),
('B',3),
('B',5),
('B',11),
('C',7),
('D',4),
('D',23);
This can be done quite easily using the row_number window function:
SELECT customer, quantity
FROM (SELECT customer, quantity,
ROW_NUMBER() OVER (PARTITION BY customer
ORDER BY quantity ASC) AS rn
FROM table1) t
WHERE rn = 2
You can use ROW_NUMBER and a CTE:
WITH data AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY Quantity) rn
FROM Table1
)
SELECT Customer, Quantity
FROM data
WHERE rn = 2
How it works:
ROW_NUMBER() assigns a sequential number to each row based on what's specified in OVER (). In OVER I specify PARTITION BY Customer, which means each group of rows for the same customer is numbered separately. ORDER BY Quantity means the rows are ordered by quantity within each customer, so I can get the 2nd row for each customer ordered by quantity. Note that customer C has only one row, so there is no rn = 2 for it and it drops out of the result, as in your desired output.

remove duplicate records with a criteria

I am using a script which requires only unique values, and I have a table with duplicates like the one below. I need to keep only the unique values (first occurrence), irrespective of what is present inside the brackets.
Can I delete the duplicate records and keep the unique records using a single query?
Input table
ID Name
1 (Del)testing
2 (Del)test
3 (Delete)testing
4 (Delete)tester
5 (Del)tst
6 (Delete)tst
So the output table should be something like
Output table
ID Name
1 (Del)testing
2 (Del)test
3 (Delete) tester
4 (Del)tst
SELECT DISTINCT * FROM FOO;
It depends how much data you have to retrieve. If you only have to change Delete -> Del, you can try REPLACE:
http://technet.microsoft.com/en-us/library/ms186862.aspx
Grouping functions should also help you.
I don't think this would be an easy query.
Assumption: The name column always has all strings in the format given in the sample data.
Try this:
;with cte as
(select *, rank() over
(partition by substring(name, charindex(')',name)+1,len(name)+1 - charindex(')',name))
order by id) rn
from tbl
),
filtered_cte as
(select * from cte
where rn = 1
)
select rank() over (partition by getdate() order by id,getdate()) id , name
from filtered_cte
How this works:
The first CTE cte uses rank() to rank the occurrence of the string outside brackets in the name column.
The second CTE filtered_cte only returns the first row for each occurrence of the specified string. At this point we have the expected results, but not in the desired format.
In the final SELECT we partition by and order by the getdate() function. This function is chosen as a dummy to give us continuous values for the id column while using the rank function as we did in step 1.
Demo here.
Note that this solution will return filtered values, but not delete anything in the source table. If you wish, you can delete from the CTE created in step 1 to remove data from the source table.
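If the getdate() trick feels too opaque, a plain ROW_NUMBER() over the filtered rows gives the same continuous numbering (a sketch, not part of the original answer) when used as the final SELECT of the statement above:
select row_number() over (order by id) as id, name
from filtered_cte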
First use this update to make them uniform
Update table set name = replace(Name, '(Del)' , '(Delete)')
then delete the repetitive names
Delete from table where id in
(Select id from (Select Row_Number() over(Partition by Name order by id) as rn,* from table) x
where rn > 1)
First create the input data table:
CREATE TABLE test
(ID int,Name varchar(20));
INSERT INTO test
(ID, Name)
VALUES
(1, '(Del)testing'),
(2, '(Del)test'),
(3, '(Delete)testing'),
(4, '(Delete)tester'),
(5, '(Del)tst'),
(6, '(Delete)tst');
Select Query
select id, name
from (
select id, name ,
ROW_NUMBER() OVER(PARTITION BY substring(name,PATINDEX('%)%',name)+1,20) ORDER BY name) rn
from test ) t
where rn= 1
order by 1
SQL Fiddle Link
http://www.sqlfiddle.com/#!6/a02b0/34