I have a quick question and below is my sample query:
select
ROW_NUMBER() OVER (PARTITION BY Diff ORDER BY NEWID())
, Account, Ticker, Client, Diff, Allocation
from #diffAdj
where
Account = @Account
and Ticker = @Ticker
and RemainingShares - isnull(Allocation, 0) > 0
order by NEWID()
My question is about NEWID(). It is also used in the ORDER BY of the ROW_NUMBER() window, which should be evaluated before the final ORDER BY. Is my understanding below correct?
As each row is processed, a NEWID() value is generated and stored in a new column.
That value is used to sort the rows and generate the row number.
In the final ORDER BY stage, is the same NEWID() column reused to sort the rows, or is the value from the previous step discarded, a new NEWID() generated for each row, and the rows sorted again using the new column?
I'm a complete beginner in SQL and appreciate everyone's help.
This is a follow-up question to Retrieving last record in each group from database - SQL Server 2005/2008
In the answers, this example was provided to retrieve last record for a group of parameters (example below retrieves last updates for each value in computername):
select t.*
from t
where t.lastupdate = (select max(t2.lastupdate)
from t t2
where t2.computername = t.computername
);
In my case, however, "lastupdate" is not unique (some updates come in batches and have same lastupdate value, and if two updates of "computername" come in the same batch, you will get non-unique output for "computername + lastupdate").
Suppose I also have field "rowId" that is just auto-incremental. The mitigation would be to include in the query another criterion for a max('rowId') field.
NB: while the example employs time-specific name "lastupdate", the actual selection criteria may not be related to the time at all.
I would therefore like to ask: what is the most performant query that selects the last record in each group, based both on the group-defining parameter (in the case above, "computername") and on the maximal rowId?
If you don't have uniqueness, then row_number() is simpler:
select t.*
from (select t.*,
row_number() over (partition by computername order by lastupdate desc, rowid desc) as seqnum
from t
) t
where seqnum = 1;
With the right indexes, the correlated subquery is usually faster. However, the performance difference is not that great.
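To try the row_number() approach without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module (it assumes an SQLite build with window-function support, 3.25 or newer). The sample rows are made up, and the tie-breaker column is called row_id here only because SQLite treats the name rowid specially.

```python
import sqlite3


def latest_per_group(conn):
    """Return one row per computername: newest lastupdate, ties broken by highest row_id."""
    return conn.execute(
        """
        select computername, lastupdate, row_id
        from (select t.*,
                     row_number() over (partition by computername
                                        order by lastupdate desc, row_id desc) as seqnum
              from t) as ranked
        where seqnum = 1
        order by computername
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table t (row_id integer primary key, computername text, lastupdate text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        (1, "pc1", "2020-01-01"),
        (2, "pc1", "2020-01-02"),  # same batch as row 3, so lastupdate ties
        (3, "pc1", "2020-01-02"),
        (4, "pc2", "2020-01-01"),
    ],
)
latest = latest_per_group(conn)
print(latest)
```

For pc1 the two rows from the same batch tie on lastupdate, and the descending row_id decides between them.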
I have a requirement in a report to show alternating row colors, and for this I need to generate sequential numbers in a SQL SELECT statement (see example below) to use later when displaying the rows.
I have tried row_number and some other techniques, but it's not working. This should not be done with a script; I need to generate the numbers within the SELECT statement itself. I'd appreciate any help.
RowNumber - 1, Otherdata - Something1
RowNumber - 2, Otherdata - Something2
RowNumber - 3, Otherdata - Something3
RowNumber - 4, Otherdata - Something4
RowNumber - 5, Otherdata - Something5
There is no need to avoid analytic functions such as ROW_NUMBER() if your database supports them:
SELECT
ROW_NUMBER() OVER (ORDER BY [<PRIMARYKEY_COLUMN_NAME>]) AS Number
FROM
[<TABLE_NAME>]
The syntax is Func([ arguments ]) OVER (analytic_clause); the part to focus on is OVER (). The contents of these parentheses split your rows into partitions, and Func() is applied to each partition in turn. The code above has no PARTITION BY, so all rows form a single partition and the generated sequence runs across all of them.
You can split your data into multiple sets and generate a sequence number for each one in a single pass. For example, to number the rows within each set that shares the same categoryId, add a PARTITION BY clause like this: (PARTITION BY categoryId ORDER BY [<PRIMARYKEY_COLUMN_NAME>]).
Remember that you can also add a separate ORDER BY after the FROM clause to sort the output differently; it has no effect on the numbering produced by OVER ().
If the sort column contains unique values, you can also do it without the built-in ROW_NUMBER() function, by using a subquery based on the sort column.
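As a rough illustration of both points (per-partition numbering, and the outer ORDER BY being independent of it), here is a small sketch using Python's sqlite3 module as a stand-in database. The items table and its rows are hypothetical; only categoryId comes from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table items (id integer primary key, categoryId text)")
conn.executemany(
    "insert into items values (?, ?)",
    [(1, "a"), (2, "b"), (3, "a"), (4, "b"), (5, "a")],
)

# Numbering restarts at 1 inside each categoryId; the outer ORDER BY only
# changes the order in which the already-numbered rows are returned.
numbered = conn.execute(
    """
    select id, categoryId,
           row_number() over (partition by categoryId order by id) as seq
    from items
    order by id desc
    """
).fetchall()
print(numbered)
```

Row 5 is returned first but still carries seq 3, because it is the third row of category "a" when ordered by id.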
Select [other stuff],
(Select count(*) From table
where sortCol < a.sortCol) rowNum
From table a
Order by sortCol
Change < to <= to start counting at 1 instead of 0.
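A quick way to convince yourself that the counting subquery matches ROW_NUMBER() when the sort column is unique is to run both side by side. This sketch uses Python's sqlite3 module; the demo table and its values are invented for the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table demo (sortCol integer, other text)")
conn.executemany("insert into demo values (?, ?)", [(30, "c"), (10, "a"), (20, "b")])

# Correlated count: how many rows sort at or before the current one (1-based).
by_count = conn.execute(
    """
    select sortCol,
           (select count(*) from demo b where b.sortCol <= a.sortCol) as rowNum
    from demo a
    order by sortCol
    """
).fetchall()

# The window-function equivalent, for comparison.
by_window = conn.execute(
    "select sortCol, row_number() over (order by sortCol) as rowNum from demo order by sortCol"
).fetchall()

print(by_count, by_window)
```

If sortCol contained duplicates, the counting version would give tied rows the same number, which is why the uniqueness condition matters.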
We have a stored procedure that returns a set of records based on page number and page size. Sorting is done on the column "CreatedDateTime". If the value of CreatedDateTime is the same for all the records, the result sets come back in different orders from one call to the next. The behavior is inconsistent.
Some Portion of Code:
SET @FirstRec = ( @PageNo - 1 ) * @PageSize
SET @LastRec = ( @PageNo * @PageSize + 1 )
SELECT *
FROM
(
select ROW_NUMBER() OVER (ORDER BY CreatedDateTime)
AS rowNumber,EMPID
From Employee
) as KeyList
WHERE rowNumber > @FirstRec AND rowNumber < @LastRec
Please provide some inputs on this.
This is "by design".
SQL Server (or any RDBMS) does not guarantee results to be returned in a particular order if no ORDER BY clause was specified. Some people think that the rows are always returned in clustered index order or physical disk order if no order by clause is specified. However, that is incorrect as there are many factors that can change row order during query processing. A parallel HASH join is a good example for an operator that changes the row order.
If you specify an ORDER BY clause, SQL Server will sort the rows and return them in the requested order. However, if that order is not deterministic because you have duplicate values, within each "value group" the order is "random" for the same reasons mentioned above.
The only way to guarantee a deterministic order is to include a guaranteed unique column or column group (for example the Primary Key) in the ORDER BY clause.
If you need a reproducible order, then you need to ensure that you specify enough columns in your ORDER BY, such that (the combination of all columns listed in the ORDER BY) is unique for every row. E.g. add EmpID (if that's a primary key) to act as a "tie-breaker" between rows with equal CreatedDateTime values.
If the values in the column you ORDER BY are all the same, then there is no guarantee that they will be retrieved in the same order. You can ORDER BY a second column - perhaps the unique id if there is one? (I have called it UniqueId in the code below). This would ensure the order is always the same.
SELECT *
FROM
(
select ROW_NUMBER() OVER (ORDER BY CreatedDateTime, UniqueId)
AS rowNumber,EMPID
From Employee
) as KeyList
WHERE rowNumber > @FirstRec AND rowNumber < @LastRec
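The effect of the tie-breaker can be checked with a small sketch. It uses Python's sqlite3 module rather than SQL Server, assumes EmpID is unique, and reuses the FirstRec/LastRec arithmetic from the question; fetch_page and the sample rows are hypothetical.

```python
import sqlite3


def fetch_page(conn, page_no, page_size):
    """Return the EmpIDs on one page, ordered by (CreatedDateTime, EmpID)."""
    first_rec = (page_no - 1) * page_size
    last_rec = page_no * page_size + 1
    rows = conn.execute(
        """
        select EmpID
        from (select row_number() over (order by CreatedDateTime, EmpID) as rowNumber,
                     EmpID
              from Employee) as KeyList
        where rowNumber > ? and rowNumber < ?
        order by rowNumber
        """,
        (first_rec, last_rec),
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute("create table Employee (EmpID integer primary key, CreatedDateTime text)")
# Every row shares the same CreatedDateTime, so only EmpID can decide the order.
conn.executemany(
    "insert into Employee values (?, '2020-01-01 00:00:00')",
    [(5,), (2,), (7,), (1,), (4,), (6,), (3,)],
)
pages = [fetch_page(conn, n, 3) for n in (1, 2, 3)]
print(pages)
```

Because the combination (CreatedDateTime, EmpID) is unique, every call assigns the same row numbers and each row lands on exactly one page.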
I have a table with a unique non-clustered index that covers 4 of its columns. I want to update a large number of rows in the table. If I do so, those rows will no longer be distinct, so the update fails because of the index.
I want to disable the index and then delete the oldest duplicate rows. Here's my query so far:
SELECT t.itemid, t.fieldid, t.version, updated
FROM dbo.VersionedFields w
inner JOIN
(
SELECT itemid, fieldid, version, COUNT(*) AS QTY
FROM dbo.VersionedFields
GROUP BY itemid, fieldid, version
HAVING COUNT(*) > 1
) t
on w.itemid = t.itemid and w.fieldid = t.fieldid and w.version = t.version
The select inside the inner join returns the right number of records that we want to delete, but groups them so there is actually twice the amount.
After the join it shows all the records but all I want to delete is the oldest ones?
How can this be done?
If you say SQL (Structured Query Language) but really mean SQL Server (the Microsoft relational database system), and you're using SQL Server 2005 or newer, you can use a CTE (Common Table Expression) for this purpose.
With this CTE, you can partition your data by some criteria, e.g. your ItemId (or a combination of columns), and have SQL Server number all your rows starting at 1 for each of those partitions, ordered by some other criteria, e.g. version (or some other column).
So try something like this:
;WITH PartitionedData AS
(
SELECT
itemid, fieldid, version,
ROW_NUMBER() OVER(PARTITION BY ItemId ORDER BY version DESC) AS 'RowNum'
FROM dbo.VersionedFields
)
DELETE FROM PartitionedData
WHERE RowNum > 1
Basically, you're partitioning your data by some criteria and numbering each partition, starting at 1 for each new partition, ordered by some other criteria (e.g. Date or Version).
So for each "partition" of data, the "newest" entry has RowNum = 1, and any others that belong to the same partition (by having the same partition values) will have sequentially numbered values from 2 up to however many rows there are in that partition.
If you want to keep only the newest entry - delete anything with a RowNum larger than 1 and you're done!
In SQL Server 2005 and above:
WITH q AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY itemid, fieldid, version ORDER BY updated DESC) AS rn
FROM versionedFields
)
DELETE
FROM q
WHERE rn > 1
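For experimenting with this outside SQL Server, here is a hedged sketch using Python's sqlite3 module. SQLite cannot delete through a CTE the way SQL Server can, so this version swaps in a different mechanism: it collects the keys of the rows with rn > 1 and deletes by key. The id column and the sample rows are assumptions made for the sketch.

```python
import sqlite3


def delete_older_duplicates(conn):
    """Keep the most recently updated row per (itemid, fieldid, version)."""
    with conn:  # commits on success, rolls back if the statement raises
        conn.execute(
            """
            delete from VersionedFields
            where id in (
                select id
                from (select id,
                             row_number() over (partition by itemid, fieldid, version
                                                order by updated desc) as rn
                      from VersionedFields) as q
                where rn > 1)
            """
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table VersionedFields "
    "(id integer primary key, itemid integer, fieldid integer, version integer, updated text)"
)
conn.executemany(
    "insert into VersionedFields values (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 1, "2020-01-01"),
        (2, 1, 1, 1, "2020-02-01"),  # newest of the first duplicate pair
        (3, 2, 1, 1, "2020-01-15"),  # newest of the second duplicate pair
        (4, 2, 1, 1, "2020-01-10"),
    ],
)
delete_older_duplicates(conn)
kept = conn.execute(
    "select itemid, fieldid, version, updated from VersionedFields order by itemid"
).fetchall()
print(kept)
```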
Try something like:
DELETE w FROM dbo.VersionedFields w WHERE w.version < (SELECT MAX(version) FROM dbo.VersionedFields)
Of course, you'd want to limit the MAX(version) to only the versions of the field you want to delete.
You probably need to look at this Stack Overflow answer (delete earlier of duplicate rows).
Essentially the technique uses grouping (or optionally, windowing) to find the minimum id value of a group in order to delete it. It may be more accurate to delete rows where the value <> max(row identifier).
So:
Drop unique index
Load data
Delete data using the grouping mechanism (ideally in a transaction, so that you can rollback if there is a mistake), then commit
Recreate the index.
Note that recreating an index on a big table can take a long time.
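The four steps above can be sketched end to end with Python's sqlite3 module as a stand-in for SQL Server. The index name ux_versioned, the id column and the sample rows are hypothetical, and the sketch assumes that a higher id means a newer row, so it keeps max(id) in each group.

```python
import sqlite3

INDEX_SQL = "create unique index ux_versioned on VersionedFields (itemid, fieldid, version)"
INSERT_SQL = "insert into VersionedFields (itemid, fieldid, version, updated) values (?, ?, ?, ?)"

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table VersionedFields "
    "(id integer primary key, itemid integer, fieldid integer, version integer, updated text)"
)
conn.execute(INDEX_SQL)
conn.execute(INSERT_SQL, (1, 1, 1, "2020-01-01"))
conn.commit()

# 1. Drop the unique index so the load can introduce duplicates.
conn.execute("drop index ux_versioned")

# 2. Load data; the first new row duplicates the existing key (1, 1, 1).
conn.executemany(INSERT_SQL, [(1, 1, 1, "2020-02-01"), (2, 1, 1, "2020-02-01")])
conn.commit()

# 3. Delete every row that is not the max(id) of its group, inside a
#    transaction so that a failure rolls the delete back.
with conn:
    conn.execute(
        """
        delete from VersionedFields
        where id not in (select max(id)
                         from VersionedFields
                         group by itemid, fieldid, version)
        """
    )

# 4. Recreate the index; this would fail if any duplicates were left.
conn.execute(INDEX_SQL)
conn.commit()

remaining = conn.execute(
    "select itemid, fieldid, version, updated from VersionedFields order by itemid"
).fetchall()
print(remaining)
```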
I am trying to update a table in my database with a row from another table. I have two parameters: one is the ID and the other is the row number (you can select which row you want from the GUI).
This part of the code works fine; it returns one column of a single row.
(SELECT txtPageContent
FROM (select *, Row_Number() OVER (ORDER BY ArchiveDate asc) as rowid
from ARC_Content Where ContentID = @ContentID) as test
Where rowid = @rowID)
It's just when I try to add the UPDATE/SET that it won't work. I am probably missing something.
UPDATE TBL_Content
Set TBL_Content.txtPageContent = (select txtPageContent
FROM (select *, Row_Number() OVER (ORDER BY ArchiveDate asc) as rowid
from ARC_Content Where ContentID = @ContentID) as test
Where rowid = @rowID)
Thanks for the help! (I have tried TOP 1, to no avail.)
I see a few issues with your update. First, I don't see any joining or selection criteria for the table that you're updating. That means that every row in the table will be updated with this new value. Is that really what you want?
Second, the row number between what is on the GUI and what you get back in the database may not match. Even if you reproduce the query used to create your list in the GUI (which is dangerous anyway, since it involves keeping the update and the select code always in sync), it's possible that someone could insert or delete or update a row between the time that you fill your list box and send that row number to the server for the update. It's MUCH better to use PKs (probably IDs in your case) to determine which row to use for updating.
That said, I think that the following will work for you (untested):
;WITH cte AS (
SELECT
txtPageContent,
ROW_NUMBER() OVER (ORDER BY ArchiveDate ASC) AS rowid
FROM
ARC_Content
WHERE
ContentID = @ContentID)
UPDATE
TC
SET
txtPageContent = cte.txtPageContent
FROM
TBL_Content TC
INNER JOIN cte ON
rowid = @rowID
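To address the first issue (no selection criteria on the updated table), here is a sketch of the same idea with a WHERE clause added. It runs on Python's sqlite3 module rather than SQL Server, and it assumes TBL_Content also has a ContentID column, which the question does not confirm; restore_archived_version and the sample rows are hypothetical.

```python
import sqlite3


def restore_archived_version(conn, content_id, row_id):
    """Copy the row_id-th archived text (oldest first) into the matching content row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """
            update TBL_Content
            set txtPageContent = (
                select txtPageContent
                from (select txtPageContent,
                             row_number() over (order by ArchiveDate asc) as rn
                      from ARC_Content
                      where ContentID = :content_id) as archived
                where rn = :row_id)
            where ContentID = :content_id  -- assumed column: limits the update to one row
            """,
            {"content_id": content_id, "row_id": row_id},
        )


conn = sqlite3.connect(":memory:")
conn.execute("create table TBL_Content (ContentID integer primary key, txtPageContent text)")
conn.execute("create table ARC_Content (ContentID integer, txtPageContent text, ArchiveDate text)")
conn.executemany("insert into TBL_Content values (?, ?)", [(1, "current"), (2, "other")])
conn.executemany(
    "insert into ARC_Content values (?, ?, ?)",
    [(1, "v1", "2020-01-01"), (1, "v2", "2020-02-01"), (2, "x", "2020-01-01")],
)
restore_archived_version(conn, content_id=1, row_id=2)
content = conn.execute(
    "select ContentID, txtPageContent from TBL_Content order by ContentID"
).fetchall()
print(content)
```

Note that if no archived row has the requested row number, the subquery yields NULL and the content would be overwritten with NULL, which is one more reason to pass a primary key instead of a row number.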