Select multiple columns with DISTINCT on just 3 of them - SQL

I've got a table from which I need to return about 14 column values, but only one row for rows that are duplicates on some of the columns.
The second problem is that, among the duplicates, I need to keep the one that has the biggest int in one of the columns that is not required to be unique.
Since the table is somewhat big, I am seeking advice on the most efficient way of doing this.
Should I be doing a GROUP BY?
My table is somewhat like this; I will simplify the number of columns:
ID(UniqueIdentifier) | ACCID(UniqueIdentifier) | DateTime(DateTime) | distance(int) | type(int)
28761188-0886-E911-822F-DD1FA635D450 1238FD8A-BD00-411A-A81C-0F6F5C026BCC 2019-06-03 14:04:41.000 2 3
41761188-0886-E911-822F-DD1FA635D450 1238FD8A-BD00-411A-A81C-0F6F5C026BCC 2019-06-03 14:04:41.000 1 3
I should only select one row per unique (ACCID, DateTime) pair; the ID column is the primary key, so it will never be duplicated, and I need to keep the row with the biggest distance.

You can use the ROW_NUMBER() window function, as in:
select *
from (
    select
        id,
        accid,
        datetime,
        distance,
        type,
        row_number() over (partition by accid, datetime order by distance desc) as rn
    from t
) x
where rn = 1
If you want to show ties (multiple rows sharing the biggest distance), replace ROW_NUMBER() with RANK().
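For illustration, a sketch of the RANK() variant (same assumed table and column names as above):
select *
from (
    select
        id,
        accid,
        datetime,
        distance,
        type,
        rank() over (partition by accid, datetime order by distance desc) as rnk
    from t
) x
where rnk = 1 -- keeps every row tied for the biggest distance per (accid, datetime)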

I would suggest a correlated subquery with the right index as the fastest method:
select t.*
from t
where t.id = (select top (1) t2.id
              from t t2
              where t2.ACCID = t.ACCID and t2.DateTime = t.DateTime
              order by t2.distance desc
             );
The best index is on (ACCID, DateTime, distance desc, id).
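A sketch of that index in SQL Server syntax (the index name and the table name t are assumptions):
-- supports the correlated subquery: seek on (ACCID, DateTime), take the top distance, return id
create index ix_t_accid_datetime_distance on t (ACCID, DateTime, distance desc, id);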

Related

PostgreSQL: how to get one row for each distinct relation_id value in one pass over the table?

For example, we have a table:
id | value  | relation_id
1    value1   1
2    value2   2
3    value3   1
4    value4   1
5    value5   3
And I want to get the rows with ids 1, 2, and 5, for example (because they have distinct relation_id values). That is easy, but not when you have several billion rows; it is pretty slow even on an SSD drive.
I know every possible value of relation_id, and I tried a query like this:
(select value, relation_id from table where relation_id=2 limit 1)
union
(select value, relation_id from table where relation_id=3 limit 1)
-- so on
But it turns out that PostgreSQL scans the table from the beginning for each subquery, even though it is a single query. Is there a way to write the query so that PostgreSQL makes a single pass over the table and collects the required data along the way?
Approach 1: Use the PostgreSQL DISTINCT ON clause. It keeps only the first row for each distinct value of the expression(s) inside the parentheses, according to the ORDER BY clause, so duplicates are removed.
SELECT DISTINCT ON(relation_id) id_, value_, relation_id
FROM tab
ORDER BY relation_id
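If you want a deterministic choice of which row survives per relation_id, a sketch with a tiebreaker (same column and table names as in the code above):
SELECT DISTINCT ON (relation_id) id_, value_, relation_id
FROM tab
ORDER BY relation_id, id_ -- keeps the row with the smallest id_ per relation_id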
Approach 2: Use the ROW_NUMBER window function to generate a ranking over your records, partitioned by "relation_id", then select the first record of each partition (rn = 1):
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY relation_id ORDER BY id_) AS rn
    FROM tab
)
SELECT id_, value_, relation_id
FROM cte
WHERE rn = 1
Approach 3: Use ROW_NUMBER inside the filtering construct FETCH FIRST ... ROWS WITH TIES, which behaves like Approach 2 but saves you the subquery (it returns all rows tied on rn = 1):
SELECT *
FROM tab
ORDER BY ROW_NUMBER() OVER(PARTITION BY relation_id ORDER BY id_) = 1 DESC
FETCH FIRST 1 ROWS WITH TIES

Stuck selecting the maximum row

I have a table with columns:
ID | FULLNAME | VALUE
01 Joseph 10
02 Sam 50
... ... ...
I need to select the row with the maximum value, and show info like
FULLNAME | VALUE
I tried using the aggregate function MAX(), but then I can't select FULLNAME, because if I add it to the GROUP BY expression, it will select the max within each group.
The other way is to use a WITH statement, order the table by value desc, use
rank() OVER (ORDER BY value DESC) AS max_id
so that the maximum value ends up with max_id = 1, and then use
WHERE max_id = 1
to remove the other rows.
But I think there is a better way to do this and I can't find one.
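Written out, that approach might look like this (a sketch; the table name t is an assumption):
with ranked as (
    select fullname, value,
           rank() over (order by value desc) as max_id
    from t
)
select fullname, value
from ranked
where max_id = 1;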
UPDATE:
A tricky solution to this problem is
SELECT *
FROM t t1
LEFT JOIN t t2 ON t1.value<t2.value
WHERE t2.value IS NULL
The simplest way is to sort the data and pull one row:
select t.*
from t
order by value desc
fetch first 1 row only;
If you want ties, you can add with ties to the fetch first.
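For example, with ties that becomes (same assumed table t):
select t.*
from t
order by value desc
fetch first 1 row with ties;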
Another method is:
select t.*
from t
where t.value = (select max(t2.value) from t t2);
This can have very good performance with an index on value.

How to find Max value in a column in SQL Server 2012

I want to find the max value in a column
ID CName Tot_Val PName
--------------------------------
1 1 100 P1
2 1 10 P2
3 2 50 P2
4 2 80 P1
Above is my table structure. I just want to find the max total value from the table. In those four rows, IDs 1 and 2 have the same CName but different Tot_Val and PName values (likewise IDs 3 and 4). What I am expecting is to find the max value within each of those CName groups.
Expected result:
ID CName Tot_Val PName
--------------------------------
1 1 100 P1
4 2 80 P1
I need the result exactly as shown above. Here is what I have tried:
select Max(Tot_Val), CName
from table1
where PName in ('P1', 'P2')
group by CName
My problem is that I am not able to bring PName into the result. If I add PName to the select list and the GROUP BY, the row count multiplies: e.g. the result should be 100 rows, but with PName in the select list and GROUP BY it shows 600 rows. That is the problem.
Can someone please help me resolve this?
One possible option is to use a subquery. Give each row a number within each CName group ordered by Tot_Val. Then select the rows with a row number equal to one.
select x.*
from ( select mt.ID,
              mt.CName,
              mt.Tot_Val,
              mt.PName,
              row_number() over (partition by mt.CName order by mt.Tot_Val desc) as No
       from MyTable mt ) x
where x.No = 1;
An alternative would be to use a common table expression (CTE) instead of a subquery to isolate the first result set.
with x as
(
    select mt.ID,
           mt.CName,
           mt.Tot_Val,
           mt.PName,
           row_number() over (partition by mt.CName order by mt.Tot_Val desc) as No
    from MyTable mt
)
select x.*
from x
where x.No = 1;
You can search for "top-n-per-group" for this kind of query.
There are two common ways to do it. The most efficient method depends on your indexes and data distribution and whether you already have another table with the list of all CName values.
Using ROW_NUMBER
WITH
CTE
AS
(
SELECT
ID, CName, Tot_Val, PName,
ROW_NUMBER() OVER (PARTITION BY CName ORDER BY Tot_Val DESC) AS rn
FROM table1
)
SELECT
ID, CName, Tot_Val, PName
FROM CTE
WHERE rn=1
;
Using CROSS APPLY
WITH
CTE
AS
(
SELECT CName
FROM table1
GROUP BY CName
)
SELECT
A.ID
,A.CName
,A.Tot_Val
,A.PName
FROM
CTE
CROSS APPLY
(
SELECT TOP(1)
table1.ID
,table1.CName
,table1.Tot_Val
,table1.PName
FROM table1
WHERE
table1.CName = CTE.CName
ORDER BY
table1.Tot_Val DESC
) AS A
;
See a very detailed answer on dba.se, "Retrieving n rows per group", or "Get top 1 row of each group".
CROSS APPLY might be as fast as a correlated subquery, but the following often has very good performance (and better than ROW_NUMBER()):
select t.*
from t
where t.tot_val = (select max(t2.tot_val)
                   from t t2
                   where t2.cname = t.cname
                  );
Note: The performance depends on having an index on (cname, tot_val).
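For instance, a sketch of such an index (the index name is an assumption; the INCLUDE list is optional and lets the index cover the outer select as well):
create index ix_table1_cname_totval on table1 (CName, Tot_Val) include (ID, PName);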

Permuting values in SQL

Let's say I have a table with two columns:
id | value
----------
1 | 101
2 | 356
3 | 28
I need to randomly permute the value column so that each id is randomly assigned a new value from the existing set {101,356,28}. How could I do this in Oracle SQL?
It may sound odd but this is a real problem, just with more columns.
You can do this by using row_number() with a random number generator and then joining back to the original rows:
with cte as (
    select id, value,
           row_number() over (order by id) as i,
           row_number() over (order by dbms_random.random) as rand_i
    from t
)
select cte.id, cte1.value
from cte join
     cte cte1
     on cte.i = cte1.rand_i;
This guarantees a permutation (i.e. no original row has its value used twice).
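If you actually need to write the shuffled values back to the table (an assumption; the question only asks for the query), a sketch of the same idea as a MERGE, assuming the table is named t:
merge into t
using (
    -- pair each id (numbered by id order) with the value of a randomly chosen row
    select a.id, b.value as new_value
    from (select id, row_number() over (order by id) as i from t) a
    join (select value, row_number() over (order by dbms_random.random) as rand_i from t) b
      on a.i = b.rand_i
) shuffled
on (t.id = shuffled.id)
when matched then update set t.value = shuffled.new_value;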
EDIT:
By the way, if the original ids are sequential from 1 and have no gaps, you could just do:
select row_number() over (order by dbms_random.random) as id, value
from t;
An option: select * from x_table where id = trunc(dbms_random.value() * 3) + 1; (here 3 is the number of rows in your table, and I am assuming that id is incremental and unique; trunc rather than round keeps the result uniformly within 1..3).
I'll think of other options.
I'm not sure whether this is the right task for a SQL database. Maybe you should implement something like a factoradic permutation in PL/SQL and then return a cursor via the PIPE ROW construct. Ordering by dbms_random.random might be slow for large data sets.

SQL Query to obtain the maximum value for each unique value in another column

ID Sum Name
a 10 Joe
a 8 Mary
b 21 Kate
b 110 Casey
b 67 Pierce
What would you recommend as the best way to obtain, for each ID, the Name that corresponds to the largest Sum (grouping by ID)?
What I tried so far:
select ID, SUM(Sum) s, Name
from Table1
group by ID, Name
Order by SUM(Sum) DESC;
This arranges the records so that the groups with the highest sum come first. Then I have to somehow flag the top record of each group and keep only those. Any tips or pointers? Thanks a lot.
In the end I'd like to obtain:
a 10 Joe
b 110 Casey
You want the row_number() function:
select id, [sum], name
from (select t.*,
             row_number() over (partition by id order by [sum] desc) as seqnum
      from table1 t
     ) t
where seqnum = 1;
Your question is more confusing than it needs to be because you have a column called sum. You should avoid using SQL reserved words for identifiers.
The row_number() function assigns a sequential number to a group of rows, starting with 1. The group is defined by the partition by clause. In this case, all rows with the same id are in the same group. The ordering of the numbers is determined by the order by clause, so the one with the largest value of sum gets the value of 1.
If you might have duplicate maximum values and you want all of them, use the related function rank() or dense_rank().
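For instance, a sketch of that variant (same assumed table1):
select id, [sum], name
from (select t.*,
             rank() over (partition by id order by [sum] desc) as rnk
      from table1 t
     ) t
where rnk = 1; -- returns every name tied for the maximum [sum] per id
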
select *
from
(
    select *
          ,rn = row_number() over (partition by Id order by [sum] desc)
    from table1
) x
where x.rn = 1