I am trying to get the row number of a row. Since the table doesn't have an ID column, I used ROW_NUMBER() without any real ordering, as shown below:
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS SNO, *
FROM [table1]
Now the challenge is that I need to find a row matching a condition (just a SELECT statement with a WHERE clause), but together with its original row number:
SELECT TOP 1 *
FROM table1
WHERE [Total Sales] = 2555
This statement returns a single record. I tried using INTERSECT to combine both statements so that the result includes the row number:
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT 100)) AS SNO, *
FROM [table1]
INTERSECT
SELECT TOP 1 *
FROM table1
WHERE [Total Sales] = 2555
Of course, this throws an error because the two queries return different numbers of columns. So what is the correct way to get the actual row number?
When you run this query:
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS SNO, t.*
FROM [table1] t;
The SNO values are not deterministic: the same query run multiple times might return different numbers. Sorting in SQL is not stable, which means that rows with identical sort keys (here, every row has the same key, 1) can come back in an arbitrary order each time the query runs. Why? SQL tables and result sets represent unordered sets, so there is nothing to base a stable sort on.
The simplistic answer to your question is to use a subquery:
SELECT t.*
FROM (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS SNO, t.*
FROM [table1] t
) t
WHERE [Total Sales] = 2555;
However, the real answer is that if you want to use this value in more than one query, you should order by enough columns to make the sort stable, that is, a combination of columns that is unique for each row.
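A minimal sketch of the subquery approach with a stable sort, run through Python's `sqlite3` module as a stand-in for the original database. The `region` and `month` columns and all the data are hypothetical; the sketch assumes that pair is unique per row, so the numbering is the same on every run.

```python
import sqlite3

# Hypothetical table1 with no ID column; (region, month) is assumed unique.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE table1 (region TEXT, month TEXT, "Total Sales" INTEGER)')
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [("North", "2023-01", 1200), ("South", "2023-01", 2555),
     ("North", "2023-02", 980), ("South", "2023-02", 2555)],
)

# Number every row in the inner query with a deterministic ORDER BY, then
# filter in the outer query so SNO is the position within the whole table.
query = """
SELECT t.*
FROM (SELECT ROW_NUMBER() OVER (ORDER BY region, month) AS SNO, table1.*
      FROM table1) AS t
WHERE t."Total Sales" = 2555
ORDER BY t.SNO
LIMIT 1
"""
row = conn.execute(query).fetchone()
print(row)
```

Because the WHERE clause runs in the outer query, the filter does not change the numbering; filtering in the same SELECT as ROW_NUMBER() would number only the matching rows.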
SQL tables have no inherent "row number" for their entries; the order you see is determined entirely by the query. If you want to keep rows in the order they were inserted, consider adding a timestamp column that is populated by a trigger when each row is inserted. You can then sort by that timestamp.
What's the primary key if there is no ID?
Related
I have a large data set of about 100 million rows that I want to 'compress' by taking a 1% sample of the entire data set while preserving the relative distribution of the data.
How can such a query be implemented?
Step 1: create the helper table
You can use aggregation to group records by visit_number, and CROSS JOIN with a query that computes the total number of records in the table, to compute each visit's share of all records (its distribution):
CREATE TABLE my_helper AS
SELECT
t.visit_number,
COUNT(*) visit_count,
SUM(t.purchase_id) sum_purchase,
-- multiply by 1.0 so the division is not integer division (which would yield 0)
COUNT(*) * 1.0 / total.cnt distribution
FROM
mytable t
CROSS JOIN (SELECT COUNT(*) cnt FROM mytable) total
GROUP BY t.visit_number, total.cnt
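A small sketch of the helper table on a hypothetical 10-row version of `mytable`, using SQLite via Python. It shows why the `* 1.0` matters: with plain integer division every `distribution` would be 0, while here the shares come out as fractions that sum to 1.

```python
import sqlite3

# Hypothetical miniature mytable: 10 rows spread over three visits.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (visit_number INTEGER, purchase_id INTEGER)")
rows = [(1, 10)] * 5 + [(2, 20)] * 3 + [(3, 30)] * 2
conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)

# total.cnt is listed in GROUP BY so that databases requiring every
# non-aggregated column to be grouped also accept the query.
conn.execute("""
CREATE TABLE my_helper AS
SELECT t.visit_number,
       COUNT(*) AS visit_count,
       SUM(t.purchase_id) AS sum_purchase,
       COUNT(*) * 1.0 / total.cnt AS distribution
FROM mytable t
CROSS JOIN (SELECT COUNT(*) AS cnt FROM mytable) total
GROUP BY t.visit_number, total.cnt
""")
helper = conn.execute(
    "SELECT visit_number, visit_count, distribution FROM my_helper ORDER BY visit_number"
).fetchall()
print(helper)
```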
Step 2: sample the main table using the helper table
Within a subquery, you can use ROW_NUMBER() OVER(PARTITION BY visit_number ORDER BY RANDOM()) to assign a random rank to each record within groups of records sharing the same visit_number. Then, in the outer query, you can join on the helper table to select the correct number of records for each visit_number:
SELECT x.*
FROM (
SELECT
t.*,
ROW_NUMBER() OVER(PARTITION BY visit_number ORDER BY RANDOM()) rn
FROM mytable t
) x
INNER JOIN my_helper h ON h.visit_number = x.visit_number
WHERE x.rn <= 1000000 * h.distribution
Side notes:
this only works if there are indeed more than 1 million records in the source table
the exact number of records in the output might be slightly below or above 1 million (depending on the distribution in the original table)
it should be possible to combine both queries into a single one, which would avoid the need to use a helper table
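The last side note suggests merging both steps. Here is one possible sketch of that, with the helper table replaced by CTEs and run on a hypothetical 1,000-row table in SQLite. The `:target` parameter (the desired sample size, 100 here instead of 1 million) is an assumption for illustration. To avoid floating-point rounding, the condition `rn <= target * visit_count / total` is rewritten by multiplying both sides by `total`, so it uses only integer arithmetic.

```python
import sqlite3
from collections import Counter

# Hypothetical mytable: 1,000 rows, 50% / 30% / 20% across three visits.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (visit_number INTEGER, purchase_id INTEGER)")
rows = ([(1, i) for i in range(500)] + [(2, i) for i in range(300)]
        + [(3, i) for i in range(200)])
conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)

query = """
WITH counts AS (
    SELECT visit_number,
           COUNT(*) AS visit_count,
           (SELECT COUNT(*) FROM mytable) AS total_cnt
    FROM mytable
    GROUP BY visit_number
),
ranked AS (
    -- random rank within each visit_number group
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY visit_number ORDER BY RANDOM()) AS rn
    FROM mytable t
)
SELECT r.visit_number, r.purchase_id
FROM ranked r
JOIN counts c ON c.visit_number = r.visit_number
-- equivalent to r.rn <= :target * visit_count / total_cnt, in integers
WHERE r.rn * c.total_cnt <= :target * c.visit_count
"""
sample = conn.execute(query, {"target": 100}).fetchall()
per_visit = Counter(visit for visit, _ in sample)
print(len(sample), dict(sorted(per_visit.items())))
```

Which rows are picked changes from run to run, but each visit keeps its share of the sample.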
This is doable. A quick way is to take only every nth record:
1) order by some column (probably ID)
2) assign a row number (e.g. rownum)
3) keep the rows where MOD(rownum, n) = 0 for whatever percentage makes sense (e.g. for 1%, keep rows where rownum mod 100 = 0)
You may need steps 1 and 2 in a subquery and step 3 in the outer query.
Enjoy and good luck!
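A sketch of the every-nth-record approach in SQLite via Python, on a hypothetical 1,000-row table with an `id` column. It uses the standard ROW_NUMBER() in place of `rownum` (Oracle's ROWNUM pseudo-column) and SQLite's `%` operator for the modulo.

```python
import sqlite3

# Hypothetical table with 1,000 rows and an id column to order by.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, payload TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                 [(i, f"row {i}") for i in range(1, 1001)])

# Steps 1 and 2 (order, then number) happen in the subquery;
# step 3 keeps every 100th row in the outer query, i.e. about 1%.
sample = conn.execute("""
SELECT id, payload
FROM (SELECT id, payload, ROW_NUMBER() OVER (ORDER BY id) AS rn
      FROM mytable) AS numbered
WHERE rn % 100 = 0
ORDER BY id
""").fetchall()
print(len(sample))
```

Note that this is systematic rather than stratified sampling: it keeps the per-group proportions only approximately, unlike the helper-table approach in the other answer.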
I have the following code, which is taking a very long time to execute. What I need to do is select the rows whose row number equals 1 after partitioning by three columns (col_1, col_2, col_3) [which are also the key columns] and ordering by some other columns, as shown below. The table has around 90 million records. Am I following the best approach, or is there a better one?
with cte as (SELECT
b.*
,ROW_NUMBER() OVER ( PARTITION BY col_1,col_2,col_3
ORDER BY new_col DESC, new_col_2 DESC, new_col_3 DESC ) AS ROW_NUMBER
FROM (
SELECT
*
,CASE
WHEN update_col = ' ' THEN new_update_col
ELSE update_col
END AS new_col_1
FROM schema_name.table_name
) b
)
select top 10 * from cte WHERE ROW_NUMBER=1
Currently you are applying CASE to every row in the table, and a CASE with a string comparison is a relatively costly operation.
At the end, you keep only the records with ROW_NUMBER = 1. If that filter keeps, say, half of your records, you can reduce the execution time by filtering first (generate the row number and keep only the rows with RN = 1) and then applying the CASE expression to the remaining rows.
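A sketch of the filter-first rewrite in SQLite via Python, on a tiny hypothetical `table_name`. It works because the ORDER BY uses only existing columns (`new_col`, `new_col_2`, `new_col_3`), so the CASE result (`new_col_1`) is not needed until after the `rn = 1` filter. Whether this actually saves time depends on the database's optimizer, so it is worth comparing the execution plans on the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE table_name (
    col_1 INTEGER, col_2 INTEGER, col_3 INTEGER,
    new_col INTEGER, new_col_2 INTEGER, new_col_3 INTEGER,
    update_col TEXT, new_update_col TEXT)""")
# Hypothetical rows: two key groups with two candidate rows each.
conn.executemany("INSERT INTO table_name VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 5, 0, 0, " ", "fallback A"),
    (1, 1, 1, 3, 0, 0, "old", "unused"),
    (2, 2, 2, 7, 0, 0, "kept", "unused"),
    (2, 2, 2, 7, 1, 0, " ", "fallback B"),
])

# Rank and filter first; the CASE expression runs only in the outer
# SELECT, on the rows that survive rn = 1.
query = """
WITH ranked AS (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY col_1, col_2, col_3
                              ORDER BY new_col DESC, new_col_2 DESC, new_col_3 DESC) AS rn
    FROM table_name t
)
SELECT col_1, col_2, col_3,
       CASE WHEN update_col = ' ' THEN new_update_col ELSE update_col END AS new_col_1
FROM ranked
WHERE rn = 1
ORDER BY col_1
"""
result = conn.execute(query).fetchall()
print(result)
```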
I have a table that looks like this:
The problem is that I need to get the last record among the rows that have duplicate values in the column "NRODENUNCIA".
You can use MAX(DENUNCIAID), along with GROUP BY... HAVING to find the duplicates and select the row with the largest DENUNCIAID:
SELECT MAX(DENUNCIAID), NRODENUNCIA, FECHAEMISION, ADUANA, MES, NOMBREESTADO
FROM YourTable
GROUP BY NRODENUNCIA, FECHAEMISION, ADUANA, MES, NOMBREESTADO
HAVING COUNT(1) > 1
This will only show rows that have at least one duplicate. If you want to see non-duplicate rows too, just remove the HAVING COUNT(1) > 1 clause.
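A sketch of the GROUP BY / HAVING query in SQLite via Python, using hypothetical rows where the NRODENUNCIA value D-100 appears three times and D-200 once. It also shows the effect of dropping the HAVING clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE YourTable (
    DENUNCIAID INTEGER, NRODENUNCIA TEXT, FECHAEMISION TEXT,
    ADUANA TEXT, MES INTEGER, NOMBREESTADO TEXT)""")
# Hypothetical data: D-100 is duplicated, D-200 is not.
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "D-100", "2023-01-05", "A1", 1, "OPEN"),
    (2, "D-100", "2023-01-05", "A1", 1, "OPEN"),
    (3, "D-200", "2023-02-10", "B2", 2, "CLOSED"),
    (4, "D-100", "2023-01-05", "A1", 1, "OPEN"),
])

base = """
SELECT MAX(DENUNCIAID) AS DENUNCIAID, NRODENUNCIA, FECHAEMISION, ADUANA, MES, NOMBREESTADO
FROM YourTable
GROUP BY NRODENUNCIA, FECHAEMISION, ADUANA, MES, NOMBREESTADO
"""
dupes_only = conn.execute(base + "HAVING COUNT(1) > 1").fetchall()
everything = conn.execute(base + "ORDER BY NRODENUNCIA").fetchall()
print(dupes_only)
print(everything)
```

Since every non-aggregated column is in the GROUP BY, rows count as duplicates only when all of those columns match, not just NRODENUNCIA.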
There are a number of solutions for your problem. One is to use row_number.
Note that I've ordered by DENUNCIAID in the OVER clause. This defines the "last record" as the one with the largest DENUNCIAID. If you want to define it differently, change the column being ordered.
with dupes as (
SELECT
ROW_NUMBER() OVER (Partition by NRODENUNCIA ORDER BY DENUNCIAID DESC) RN,
*
FROM
YourTable
)
SELECT * FROM dupes where rn = 1
This gets only the last record for each NRODENUNCIA value, including values that have no duplicates.
If you want to include only the records that have duplicates, change the WHERE clause to:
WHERE rn =1
and NRODENUNCIA in (select NRODENUNCIA from dupes where rn > 1)
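A sketch of the ROW_NUMBER() version in SQLite via Python, with a reduced hypothetical YourTable (only three columns). It runs both variants: the plain `rn = 1` filter and the one restricted to values that have duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (DENUNCIAID INTEGER, NRODENUNCIA TEXT, NOMBREESTADO TEXT)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", [
    (1, "D-100", "OPEN"), (2, "D-100", "OPEN"),
    (3, "D-200", "CLOSED"), (4, "D-100", "OPEN"),
])

cte = """
WITH dupes AS (
    -- RN = 1 marks the row with the largest DENUNCIAID per NRODENUNCIA
    SELECT ROW_NUMBER() OVER (PARTITION BY NRODENUNCIA ORDER BY DENUNCIAID DESC) AS RN,
           t.*
    FROM YourTable t
)
"""
latest = conn.execute(
    cte + "SELECT DENUNCIAID, NRODENUNCIA FROM dupes WHERE RN = 1 ORDER BY NRODENUNCIA"
).fetchall()
latest_dupes_only = conn.execute(
    cte + """SELECT DENUNCIAID, NRODENUNCIA FROM dupes
             WHERE RN = 1
               AND NRODENUNCIA IN (SELECT NRODENUNCIA FROM dupes WHERE RN > 1)"""
).fetchall()
print(latest, latest_dupes_only)
```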
My table looks like the following:
I want to be able to select a number of records based on the auditidentity column.
I tried using LIMIT, but it only limits the number of rows returned, and each audit can have an arbitrary number of rows.
I tried using DISTINCT, but that returned only one row for each audit.
In short,
Number of rows != number of audits
How do I select a given number of audits, including all of their rows?
Edit: For clarity, each coloured outline is regarded as an "Audit".
You could do this:
SELECT
*
FROM
(
SELECT
ROW_NUMBER() OVER(PARTITION BY auditidentity ORDER BY auditidentity) AS itemNumber,
DENSE_RANK() OVER(ORDER BY auditidentity) AS auditNbr,
tbl.*
FROM #tbl AS tbl
) AS tbl2
WHERE
tbl2.itemNumber<=100
AND tbl2.auditNbr<=2
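A sketch of the DENSE_RANK() approach in SQLite via Python. The table name `tbl` (in place of the `#tbl` temp table), the `field` column, and the audits A, B and C are hypothetical. DENSE_RANK() gives every row of the same audit the same number, so `auditNbr <= 2` keeps all rows of the first two audits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (auditidentity TEXT, field TEXT)")
# Hypothetical audits: A has 3 rows, B has 2, C has 4.
rows = ([("A", f"a{i}") for i in range(3)] + [("B", f"b{i}") for i in range(2)]
        + [("C", f"c{i}") for i in range(4)])
conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)

query = """
SELECT *
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY auditidentity ORDER BY auditidentity) AS itemNumber,
           DENSE_RANK() OVER (ORDER BY auditidentity) AS auditNbr,
           tbl.*
    FROM tbl
) AS tbl2
WHERE tbl2.itemNumber <= 100
  AND tbl2.auditNbr <= 2
"""
result = conn.execute(query).fetchall()
# Columns are: itemNumber, auditNbr, auditidentity, field
audits = sorted({row[2] for row in result})
print(len(result), audits)
```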
Thanks for everyone's help; a colleague of mine found the answer:
SELECT * FROM smp_audit_detail WHERE auditidentity IN
(SELECT DISTINCT auditidentity FROM smp_audit_detail LIMIT 100)
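A sketch of the same IN / LIMIT query in SQLite via Python, with hypothetical rows and a limit of 2 audits instead of 100. An ORDER BY is added to the subquery here as an assumption; without one, which audits fall inside the LIMIT is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE smp_audit_detail (auditidentity TEXT, field TEXT)")
rows = ([("A", f"a{i}") for i in range(3)] + [("B", f"b{i}") for i in range(2)]
        + [("C", f"c{i}") for i in range(4)])
conn.executemany("INSERT INTO smp_audit_detail VALUES (?, ?)", rows)

# The subquery picks the first :n distinct audits; the outer query then
# returns every row belonging to those audits.
query = """
SELECT * FROM smp_audit_detail WHERE auditidentity IN
    (SELECT DISTINCT auditidentity FROM smp_audit_detail
     ORDER BY auditidentity LIMIT :n)
"""
result = conn.execute(query, {"n": 2}).fetchall()
print(len(result), sorted({row[0] for row in result}))
```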
I am creating a stored procedure (SP) that applies DISTINCT to its result. Now I want to implement server-side paging, so I tried using ROW_NUMBER on the distinct result like this:
WITH CTE AS
(
SELECT ROW_NUMBER() OVER(ORDER BY tblA.TeamName DESC)
as Row,tblA.TeamId,tblA.TeamName,tblA.CompId,tblA.CompName,tblA.Title,tblA.Thumbnail,tblA.Rank,tblA.CountryId,tblA.CountryName
FROM
(
--The table query starts with SELECT
)tblA
)
SELECT CTE.* FROM CTE
WHERE CTE.Row BETWEEN @StartRowIndex AND @StartRowIndex+@NumRows-1
ORDER BY CTE.CountryName
But the rows are assigned row numbers first and DISTINCT is applied afterwards, which is why I am getting duplicate values. How can I get the distinct rows first and then assign row numbers to them?
Any solution on this? Am I missing something?
Don't you need to add "partition by" to your ROW_NUMBER statement?
ROW_NUMBER() OVER(PARTITION BY ___, ___ ORDER BY tblA.TeamName DESC)
In the blanks, place the columns that define a duplicate; the numbering restarts at 1 for each combination of those columns. Duplicates will receive a number greater than 1, so you might not need the DISTINCT.
To gather the unique values, you could wrap this in a subquery so that the stored procedure only keeps the rows whose row number is 1.
select * from
(
your code
) as dedup where row = 1
Hope that helps.
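A sketch that puts both steps together in SQLite via Python: first deduplicate with PARTITION BY and keep `dup_rn = 1`, then number the surviving rows for paging. The `teams` table, its data, and the `:start` / `:num_rows` parameters (standing in for the procedure's StartRowIndex and NumRows) are hypothetical. Window functions are evaluated after WHERE, so the paging ROW_NUMBER() only sees deduplicated rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teams (TeamId INTEGER, TeamName TEXT, CountryName TEXT)")
# Hypothetical data with two exact duplicates (Alpha and Charlie).
conn.executemany("INSERT INTO teams VALUES (?, ?, ?)", [
    (1, "Alpha", "UK"), (1, "Alpha", "UK"), (2, "Bravo", "FR"),
    (3, "Charlie", "DE"), (3, "Charlie", "DE"), (4, "Delta", "ES"),
])

query = """
WITH dedup AS (
    SELECT TeamId, TeamName, CountryName,
           ROW_NUMBER() OVER (PARTITION BY TeamId, TeamName, CountryName
                              ORDER BY TeamName) AS dup_rn
    FROM teams
),
paged AS (
    -- WHERE runs before the window function, so only unique rows are numbered
    SELECT TeamId, TeamName, CountryName,
           ROW_NUMBER() OVER (ORDER BY TeamName DESC) AS page_row
    FROM dedup
    WHERE dup_rn = 1
)
SELECT * FROM paged
WHERE page_row BETWEEN :start AND :start + :num_rows - 1
ORDER BY page_row
"""
page = conn.execute(query, {"start": 2, "num_rows": 2}).fetchall()
print(page)
```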
I'm not sure why you're doing this:
WHERE CTE.Row BETWEEN @StartRowIndex AND @StartRowIndex+@NumRows-1