How to get the Distinct Record in Sql Server


How can I display all records in a single column if the name is duplicated in SQL Server?
SELECT * INTO #temp
FROM (
Select 'S1' Name, '1' Age, 'A' X, 'B' Y UNION ALL
Select 'S1', '1', '', 'B'
) A
Select *
From #temp
[Output]
The expected result is:

The rules for the expected output aren't clear. It could be the "last" row based on some order, or each column returns the maximum value in a group.
If the maximum value is needed, the following should work:
SELECT Name, Max(Age) as Age, Max(X) as X, Max(Y) as Y
FROM SomeTable
GROUP BY Name
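As a minimal sketch of the per-column maximum approach, the snippet below loads the question's two sample rows and runs the same GROUP BY query. It uses Python's sqlite3 module with an in-memory database purely so the example is self-contained; that choice is an assumption for illustration, and SQL Server was not used here. The query text itself is plain standard SQL.

```python
import sqlite3

# In-memory SQLite database; a stand-in for the #temp table in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SomeTable (Name TEXT, Age TEXT, X TEXT, Y TEXT)")
conn.executemany(
    "INSERT INTO SomeTable VALUES (?, ?, ?, ?)",
    [("S1", "1", "A", "B"), ("S1", "1", "", "B")],  # sample rows from the question
)

# One row per Name; each column independently takes its largest value.
merged = conn.execute(
    """
    SELECT Name, MAX(Age) AS Age, MAX(X) AS X, MAX(Y) AS Y
    FROM SomeTable
    GROUP BY Name
    """
).fetchall()
print(merged)
```

Because the empty string sorts before 'A', MAX(X) picks 'A', so the two rows collapse into a single row with every column filled. Note that each column is aggregated on its own, so the result can mix values that never appeared together in one source row.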
If the "last" row is needed, there must be a way to order the results. Table rows have no implicit order. Assuming there's an incrementing ID, one could use ROW_NUMBER to find and return the latest row per group:
with rows as
(
SELECT ID,Name,Age,X,Y,ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID DESC) as RN
FROM SomeTable
)
SELECT Name,Age,X,Y
FROM rows
WHERE RN=1
This will split (partition) the data by name and calculate a row number based on descending ID order inside each partition. Since the rows are ordered by ID descending, the first row will be the latest one.
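A minimal sketch of the "latest row per name" idea, again using Python's sqlite3 module only for self-containment (window functions need SQLite 3.25 or newer, which is assumed here). The ID column is the hypothetical incrementing key the answer assumes, and the S2 row is made-up data added to show a second partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ID is a hypothetical auto-incrementing key; the question's table has none.
conn.execute(
    "CREATE TABLE SomeTable ("
    "ID INTEGER PRIMARY KEY, Name TEXT, Age TEXT, X TEXT, Y TEXT)"
)
conn.executemany(
    "INSERT INTO SomeTable (Name, Age, X, Y) VALUES (?, ?, ?, ?)",
    [
        ("S1", "1", "A", "B"),   # ID 1
        ("S1", "1", "", "B"),    # ID 2, the latest S1 row
        ("S2", "2", "C", "D"),   # ID 3, invented row for a second group
    ],
)

# Number the rows inside each Name partition, newest (highest ID) first,
# then keep only the row numbered 1 in every partition.
latest = conn.execute(
    """
    WITH numbered AS (
        SELECT ID, Name, Age, X, Y,
               ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID DESC) AS RN
        FROM SomeTable
    )
    SELECT Name, Age, X, Y
    FROM numbered
    WHERE RN = 1
    ORDER BY Name
    """
).fetchall()
print(latest)
```

Here the S1 result keeps the empty X value from its latest row, whereas the MAX query would have returned 'A'. Which behaviour is right depends on the rule the question left unstated.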

Related

Getting MAX of a column and adding one more

I'm trying to make an SQL query that returns the greatest number from a column and its respective id.
For more information: I have two columns, ID and NUMBER. Both of them have 2 entries, and I want to get the highest number with the ID next to it. This is what I tried, but it didn't succeed.
SELECT ID, MAX(NUMBER) AS MAXNUMB
FROM TABLE1
GROUP BY ID, MAXNUMB;
The problem I'm experiencing is that it just shows ALL the entries and if I add a "where" expression it just shows the same (all entries [ids+numbers]).
P.S.: Yes, I got what I wanted, but only with one column (number); if I add another column (ID) to the select, it breaks.

Try:
SELECT
ID,
A_NUMBER
FROM TABLE1
WHERE A_NUMBER = (
SELECT MAX(A_NUMBER)
FROM TABLE1);
Presuming you want the IDs* of the row with the highest number (and not, instead, the highest number for each ID -- if IDs were not unique in your table, for example).
* there may be more than one ID returned if there are two or more IDs with equal maximum numbers
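To illustrate the footnote about ties, here is a small sketch with made-up data in which two IDs share the maximum number. It runs the same WHERE ... = (SELECT MAX ...) query through Python's sqlite3 module, which is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE1 (ID INTEGER, A_NUMBER INTEGER)")
# Invented sample data: IDs 2 and 3 tie for the highest number.
conn.executemany(
    "INSERT INTO TABLE1 VALUES (?, ?)",
    [(1, 10), (2, 25), (3, 25)],
)

# The scalar subquery finds the overall maximum once;
# the outer query returns every row that matches it.
top_rows = conn.execute(
    """
    SELECT ID, A_NUMBER
    FROM TABLE1
    WHERE A_NUMBER = (SELECT MAX(A_NUMBER) FROM TABLE1)
    ORDER BY ID
    """
).fetchall()
print(top_rows)
```

Both tied rows come back, so callers that need exactly one row have to add their own tie-breaker.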

You can try this:
Select ID,maxNumber
From
(
SELECT
ID,
(Select Max(NUMBER) from Tmp where Id = t.Id) maxNumber
FROM
Tmp t
)T1
Group By ID,maxNumber

The query you posted has an illegal column name (number) and groups by the alias for the max value, which is illegal and also doesn't make sense; and you can't include the unaliased max() within the group-by either. So it's likely you're actually doing something like:
select id, max(numb) as maxnumb
from table1
group by id;
which will give one row per ID, with the maximum numb (which is the new name I've made up for your numeric column) for each ID. Or, since you said you get "ALL the entries", you might have group by id, numb, which would show all rows from the table (unless there are duplicate combinations).
To get the maximum numb and the corresponding id you could group by id only, order by descending maxnumb, and then return the first row only:
select id, max(numb) as maxnumb
from table1
group by id
order by maxnumb desc
fetch first 1 row only
If there are two IDs with the same maxnumb then you would only get one of them - and which one is indeterminate unless you modify the order by - but in that case you might prefer to use first 1 row with ties to see them all.
You could achieve the same thing with a subquery and an analytic function to generate a ranking, and have the outer query return the highest-ranking row(s):
select id, numb as maxnumb
from (
select id, numb, dense_rank() over (order by numb desc) as rnk
from table1
)
where rnk = 1
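The difference between returning all tied rows and returning a single row can be sketched as follows. This runs on SQLite through Python's sqlite3 module (an assumption made for self-containment, with SQLite 3.25 or newer for window functions), so LIMIT 1 stands in for Oracle's fetch first 1 row only, and the extra id desc tie-breaker is an illustrative addition. The data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, numb INTEGER)")
# Invented data: ids 2 and 3 share the highest numb.
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, 10), (2, 25), (3, 25)])

# DENSE_RANK gives every row with the top numb the rank 1, so ties are kept.
ranked = conn.execute(
    """
    SELECT id, numb AS maxnumb
    FROM (
        SELECT id, numb, DENSE_RANK() OVER (ORDER BY numb DESC) AS rnk
        FROM table1
    )
    WHERE rnk = 1
    ORDER BY id
    """
).fetchall()

# Single-row variant: the second sort key makes the choice deterministic.
single = conn.execute(
    """
    SELECT id, numb AS maxnumb
    FROM table1
    ORDER BY numb DESC, id DESC
    LIMIT 1
    """
).fetchall()

print(ranked)
print(single)
```

The ranked query returns both tied ids, while the single-row query returns only the one chosen by the tie-breaker.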
You could also use keep to get the same result as first 1 row only:
select max(id) keep (dense_rank last order by numb) as id, max(numb) as maxnumb
from table1

How to select multiple columns from a table while ensuring that one specific column doesn't contain duplicate values in sql server?

Table:
x_id---y---z_id------a-------b-------c
1------0----NULL----Blah----Blah---Blah
2------0----NULL----Blah----Blah---Blah
3------10---6-------Blah----Blah---Blah
3------10---5-------Blah----Blah---Blah
3------10---4-------Blah----Blah---Blah
3------10---3-------Blah----Blah---Blah
3------10---2-------Blah----Blah---Blah
3------10---1-------Blah----Blah---Blah
4------0----NULL----Blah----Blah---Blah
5------0----NULL----Blah----Blah---Blah
My Query:
SELECT
#temp.x_id,
#temp.y,
MAX(#temp.z_id) AS z_id
FROM #temp
GROUP BY
#temp.x_id
Error:
Column 'y' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Requirement:
I want x_id to be unique and I want to select the MAX value of z_id
Expected Output:
x_id---y---z_id------a-------b-------c
1------0----NULL---Blah----Blah-----Blah
2------0----NULL---Blah----Blah-----Blah
3------10---6------Blah----Blah-----Blah
4------0----NULL---Blah----Blah-----Blah
5------0----NULL---Blah----Blah-----Blah

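The error you are seeing is a common one. When you write GROUP BY x_id, you are telling SQL Server to return a single record for every value of x_id. However, when you select y without aggregating it, it is unclear which of the possibly many values in the group should be used, hence the error. One correct approach is to use ROW_NUMBER. In the query below, ROW_NUMBER assigns 1 to the row with the largest z_id in each x_id group, and TOP 1 WITH TIES keeps every row that ties for first place in that ordering, that is, every row numbered 1: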
SELECT TOP 1 WITH TIES x_id, y, z_id, a, b, c
FROM #temp
ORDER BY ROW_NUMBER() OVER (PARTITION BY x_id ORDER BY z_id DESC);
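TOP 1 WITH TIES is specific to SQL Server. As a hedged sketch of the same logic in a more portable form, the snippet below computes the row number in a derived table and filters on it. It runs on SQLite through Python's sqlite3 module (assumed 3.25 or newer), and temp_rows is a hypothetical stand-in name for #temp; the rows are the ones listed in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# temp_rows is a hypothetical stand-in for the question's #temp table.
conn.execute(
    "CREATE TABLE temp_rows ("
    "x_id INTEGER, y INTEGER, z_id INTEGER, a TEXT, b TEXT, c TEXT)"
)
rows = (
    [(1, 0, None), (2, 0, None)]
    + [(3, 10, z) for z in range(6, 0, -1)]   # x_id 3 has z_id 6 down to 1
    + [(4, 0, None), (5, 0, None)]
)
conn.executemany(
    "INSERT INTO temp_rows VALUES (?, ?, ?, 'Blah', 'Blah', 'Blah')", rows
)

# Number rows per x_id with the largest z_id first, then keep row 1 of each group.
unique_rows = conn.execute(
    """
    SELECT x_id, y, z_id, a, b, c
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY x_id ORDER BY z_id DESC) AS rn
        FROM temp_rows
    )
    WHERE rn = 1
    ORDER BY x_id
    """
).fetchall()
for row in unique_rows:
    print(row)
```

Each x_id appears once, and x_id 3 keeps the row with z_id 6, which matches the expected output in the question. Groups whose only z_id is NULL are unaffected because they contain a single row.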

Query to return rows with duplicate values based on other column

I have a table in SQL Server, the format is as follows:
I would want to get rows according to the following conditions:
If the rows have multiple but similar Customer_ID and Order_Number then return only those rows with maximum date
Otherwise, return the rest of the rows
So the result in this case will be row 3, 4 and 5.
Any idea on how to achieve this using SQL query? The table has no primary or unique key.

Use the window function row_number():
select *
from
(
select *,
row_number() over(partition by Customer_ID,Order_Number order by date desc) as rn
from your_table
) t where rn=1
Or use a correlated subquery:
select *
from t
where date in (
select max(date)
from t t1
where t1.Customer_ID=t.Customer_ID and t1.Order_Number=t.Order_Number
)
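The two queries can differ when several rows in a group share the maximum date, which matters here because the table has no unique key. The sketch below uses invented data and a hypothetical column name order_date in place of date, and runs on SQLite through Python's sqlite3 module (assumed 3.25 or newer) only to stay self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# order_date is a hypothetical name standing in for the question's date column.
conn.execute(
    "CREATE TABLE your_table (Customer_ID TEXT, Order_Number TEXT, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        ("C1", "O1", "2020-01-01"),
        ("C1", "O1", "2020-02-01"),
        ("C1", "O1", "2020-02-01"),  # exact duplicate of the latest row
        ("C2", "O2", "2020-03-01"),
    ],
)

# ROW_NUMBER: exactly one row per (Customer_ID, Order_Number).
one_per_group = conn.execute(
    """
    SELECT Customer_ID, Order_Number, order_date
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY Customer_ID, Order_Number
                   ORDER BY order_date DESC
               ) AS rn
        FROM your_table
    )
    WHERE rn = 1
    ORDER BY Customer_ID
    """
).fetchall()

# Correlated subquery: every row whose date equals its group's maximum.
all_latest = conn.execute(
    """
    SELECT Customer_ID, Order_Number, order_date
    FROM your_table AS t
    WHERE order_date = (
        SELECT MAX(t1.order_date)
        FROM your_table AS t1
        WHERE t1.Customer_ID = t.Customer_ID
          AND t1.Order_Number = t.Order_Number
    )
    ORDER BY Customer_ID
    """
).fetchall()

print(len(one_per_group), len(all_latest))
```

With this data the ROW_NUMBER query returns two rows and the correlated subquery returns three, since it keeps both copies of the latest C1 row. Pick the form that matches whether duplicates of the latest row should survive.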

SQL Select a distinct row based on two columns which has min value in third column

EDIT: I'm using PostgreSQL
My query needs to return all the unique rows for the id column and the type column. When there are multiple rows with the same id and type it will return the row with the smallest value in the time column.
SELECT id, type, value FROM TableName
GROUP BY MIN(time)
ORDER BY id ASC, type ASC
This is what I have so far but I feel like I'm using GROUP BY the wrong way

I think you can use ROW_NUMBER to mark the rows within each combination of id and type with the smallest time having rn = 1, then use WHERE clause to filter the table:
SELECT id, type, value FROM
(SELECT id, type, value,
ROW_NUMBER() OVER(PARTITION BY id, type ORDER BY time) AS rn
FROM TableName) a
WHERE rn = 1

Postgres supports DISTINCT ON. This is usually the most efficient way to do what you want:
SELECT DISTINCT ON (id, type) id, type, value
FROM TableName
ORDER BY id, type, time ;

How can I de-duplicate records based on a specific column in BigQuery?

I have a table of records that is growing, and I'd like to be able to append modified records to it. However, I'd like to be able to then have a logical view of all of the "newest" versions of each record (highest modified_date + unique primary_key). I tried a JOIN against the table with a GROUP BY primary_key, but this then requires that the entire table have ORDER BY modified_date, which exceeds resources.

You can avoid the resource explosion by specifying PARTITION BY, which then allows for sorting on a more granular level. This pattern suffices:
SELECT
*
FROM (
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY primary_key ORDER BY modified_date DESC) seq
FROM
my_table)
WHERE
seq = 1;

There is now a better way of doing this. Here is an example:
WITH T AS (
SELECT x, y, MOD(y, 2) AS z
FROM UNNEST([5, 4, 3, 2]) AS x WITH OFFSET y
)
SELECT
z,
ARRAY_AGG(x ORDER BY y LIMIT 1)[OFFSET(0)] AS top_x
FROM T
GROUP BY z;
This returns the top x value as determined by some other column, grouped by a third column. The query in the other answer could be expressed as:
WITH my_table AS (
SELECT 1 AS primary_key, "foo" AS value, DATE('2016-11-09') AS modified_date UNION ALL
SELECT 1, "bar", DATE('2016-11-10') UNION ALL
SELECT 2, "baz", DATE('2016-01-01')
)
SELECT
row.*
FROM (
SELECT
ARRAY_AGG(t ORDER BY modified_date DESC LIMIT 1)[OFFSET(0)] AS row
FROM my_table AS t
GROUP BY primary_key
);
This returns the row associated with the most recent modified_date. In theory, you should just be able to use .* directly after [OFFSET(0)] (and not need a subselect), but there appears to be a bug with column resolution that I'm looking into.
