Distinct rows in a table in sql

Distinct rows in a table in sql - sql

I have a table with multiple rows of the same member id. I need only distinct rows based on 2 unique columns
Ex: there are 100 different customers, the table has 1000 rows because every customer has multiple cities and segments assigned to him.
I need 100 distinct rows for these customers depending on a unique segment and city combination. There is no specific requirement for this combination, just the first from the table is fine.
So, currently the table is somewhat like this,
Hope this helps.

use row_number()
select * from (select *,row_number() over(partition by memberid order by sales) rn
from table_name
) a where a.rn=1

Handy sql-server top(1) with ties syntax for that
select top(1) with ties t.*
from table_name t
order by row_number() over(partition by memberid order by sales)
As you have no paticular requirement for which exactly row to select, any column will do at order by, it can be null as well
select top(1) with ties t.*
from table_name t
order by row_number() over(partition by memberid order by (select null))

The simplest way to do this is to use the ROW_NUMBER() OVER(GROUP BY...) syntax. You have no need to use an order by, since you want an arbitrary row, but only one, for each member.
Since you need only the expected data, and not the Row_Number value, make sure that you detail the fields returned, like below:
SELECT
MemberId,
city,
segment,
sales
FROM (
SELECT *
ROW_NUMBER() OVER (GROUP BY MemberId) as Seq
FROM [Status]
) src
WHERE Seq = 1

Related

How do I select 1 [oldest] row per group of rows, given multiple groups?

Let's say we have the database table below, called USER_JOBS.
I'd like to write an SQL query that reflects this algorithm:
Divide the whole table in groups of rows defined by a common USER_ID (in the example table, the 2 resulting groups are colored yellow & green)
From each group, select the oldest row (according to SCHEDULE_TIME)
From this example table, the desired SQL query would return these 2 rows:

You can use ranking function (supported in most RDBS):
SELECT *
FROM
(
SELECT *
,ROW_NUMBER() OVER (PARTITION BY USER_ID ORDER BY SCHEDULE_TIME DESC) AS RowID
FROM [table]
)
WHERE RowID = 1

WITH Ranked AS (
SELECT
RANK() OVER (PARTITION BY User_ID ORDER BY ScheduleTime DESC) as Ranking,
*
FROM [table_name]
)
SELECT Status, Sob_Type, User_ID, TimeStamp FROM ranking WHERE Ranks = 1;

SQL query to select up to N records per criteria

I was wondering if it's possible to use SQL (preferably snowflake) to select up to N records given certain criteria.
To illustrate:
Lets say I have a table with 1 million records, containing full names and phone numbers.
There's no limits on the amount of phone numbers that can be assigned to X person, but I only want to select up to 10 numbers per person, even if the person has more than 10.
Notice I don't want to select just 10 records, I want the query to return every name in the table, I only want to ignore extra phone numbers when the person already has 10 of them.
Can this be done?

You can use window functions to solve this greatest-n-per-group problem:
select t.*
from (
select
t.*,
row_number() over(partition by name order by phone_number) rn
from mytable t
) t
where rn <= 10
Note that you need an ordering column to define what "top 10" actually means. I assumed phone_number, but you can change that to whatever suits your use case best.
Better yet: as commented by waldente, snowflake has the qualify syntax, which removes the need for a subquery:
select t.*
from mytable t
qualify row_number() over(partition by name order by phone_number) <= 10

This query will help your requirement:
select
full_name,
phonenumber
from
(select
full_name,
phonenumber,
ROW_NUMBER() over (partition by phonenumber order by full_name desc) as ROW_NUMBER from sample_tab) a
where
a.row_number between 1 and 10
order by
full_name asc,
phonenumber desc;
using Snowflake Qualify function:
select
full_name,
phonenumber
from
sample_tab qualify row_number() over (partition by phonenumber order by full_name) between 1 and 10
order by
full_name asc ,
phonenumber desc;

MS SQL add max()-1 to qyery

how to add to the query max(o.Acct)-1 rows. I need to visualize the last two o.Acct rows. My query is currently showing only the max(o.Acct)
SELECT Max(o.Acct) AS [MaxAcct],o.ObjectID,o.Opertype
FROM Operations o
GROUP By o.ObjectID,o.Opertype

If you want to see the last two rows (per group), you're better off using ROW_NUMBER() rather than GROUP BY.
SELECT
*
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY ObjectID,
Opertype
ORDER BY Acct DESC
)
AS sequence_id
FROM
Operations
)
sortedOperations
WHERE
sequence_id <= 2
ORDER BY
ObjectID,
Opertype,
Acct

If you want the last two of something, I'm thinking order by and top. Something like this:
select top (2) o.*
from Operations o
order by o.acct desc;

How can I do "SELECT" query with "WHERE" condition on column that does not exist in the table in SQL Server

To understand the question, I'll show an example: my columns in the Students table are:
stuID, cityID, Name, updateDate
and my SELECT is:
SELECT
ROW_NUMBER() OVER (PARTITION BY cityID ORDER BY updateDate DESC) AS rownumber
stuID,
cityID,
Name
FROM
Students
WHERE
rownumber = 1
No matter - why I wish to make such a query, this is only example, but how can I put in the "WHERE" condition on rownumber????

The WHERE Clause will be evaluated immediately after FROM Clause,so SQL has no idea when you refer to somevalue you refered in select
Use CTE/SubQuery if you want to apply predicate to RowNumber :
;With cte
as
(
SELECT
ROW_NUMBER() OVER (PARTITION BY cityID ORDER BY updateDate DESC) AS rownumber
stuID,
cityID,
Name
FROM Students
)
select * from cte where rownumber=1
Below are the logical querying phases which dictates how the clauses defined in one Phase are made available to the clauses in next phases..
1. FROM
2. ON
3.OUTER
4.WHERE
5.GROUP BY
6.CUBE | ROLLUP
7.HAVING
8. SELECT
9. DISTINCT
10. TOP
11. ORDER BY
As you can see ,RowNumber you have defined in select clause which will be available only to next phases after select
Below is the Logical query processing flow chart for each clause :Itzik Ben-Gan

Any aliases and calculated fields are available if you put your query into subquery or CTE:
select *
from
(
SELECT
ROW_NUMBER() OVER (PARTITION BY cityID ORDER BY updateDate DESC) AS rownumber
stuID,
cityID,
Name
FROM Students
) s
WHERE rownumber = 1

How to select a row based on its row number?

I'm working on a small project in which I'll need to select a record from a temporary table based on the actual row number of the record.
How can I select a record based on its row number?

A couple of the other answers touched on the problem, but this might explain. There really isn't an order implied in SQL (set theory). So to refer to the "fifth row" requires you to introduce the concept
Select *
From
(
Select
Row_Number() Over (Order By SomeField) As RowNum
, *
From TheTable
) t2
Where RowNum = 5
In the subquery, a row number is "created" by defining the order you expect. Now the outer query is able to pull the fifth entry out of that ordered set.

Technically SQL Rows do not have "RowNumbers" in their tables. Some implementations (Oracle, I think) provide one of their own, but that's not standard and SQL Server/T-SQL does not. You can add one to the table (sort of) with an IDENTITY column.
Or you can add one (for real) in a query with the ROW_NUMBER() function, but unless you specify your own unique ORDER for the rows, the ROW_NUMBERS will be assigned non-deterministically.

What you're looking for is the row_number() function, as Kaf mentioned in the comments.
Here is an example:
WITH MyCte AS
(
SELECT employee_id,
RowNum = row_number() OVER ( order by employee_id )
FROM V_EMPLOYEE
ORDER BY Employee_ID
)
SELECT employee_id
FROM MyCte
WHERE RowNum > 0

There are 3 ways of doing this.
Suppose u have an employee table with the columns as emp_id, emp_name, salary. You need the top 10 employees who has highest salary.
Using row_number() analytic function
Select * from
( select emp_id,emp_name,row_number() over (order by salary desc) rank
from employee)
where rank<=10
Using rank() analytic function
Select * from
( select emp_id,emp_name,rank() over (order by salary desc) rank
from employee)
where rank<=10
Using rownum
select * from
(select * from employee order by salary desc)
where rownum<=10;

This will give you the rows of the table without being re-ordered by some set of values:
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT '1')) AS RowID, * FROM #table

If using SQL Server 2012 you can now use offset/fetch:
declare #rowIndexToFetch int
set #rowIndexToFetch = 0
select
*
from
dbo.EntityA ea
order by
ea.Id
offset #rowIndexToFetch rows
fetch next 1 rows only

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Distinct rows in a table in sql - sql

use row_number() select * from (select *,row_number() over(partition by memberid order by sales) rn from table_name ) a where a.rn=1

Related

How do I select 1 [oldest] row per group of rows, given multiple groups?

SQL query to select up to N records per criteria

MS SQL add max()-1 to qyery

How can I do "SELECT" query with "WHERE" condition on column that does not exist in the table in SQL Server

How to select a row based on its row number?

Categories

Resources