How to get the biggest column value between duplicated rows id? - sql

I am working on an Oracle 11g database query that needs to retrieve a list of the highest NUM value between duplicated rows in a table.
Here is an example of my context:
ID | NUM
------------
1 | 1111
1 | 2222
2 | 3333
2 | 4444
3 | 5555
3 | 6666
And here is the result I am expecting after the query is executed:
NUM
----
2222
4444
6666
I know how to get the GREATEST value in a list of numbers, but I have no idea how to group two rows and fetch the biggest column value between them if they have the same ID.
Programmatically this is quite easy to achieve, but in SQL it is a little less intuitive for me. Any suggestion or advice is welcome, as I don't even know which function could help me do this in Oracle.
Thank you !

This is the typical use case for a GROUP BY. Assuming your Num field can be compared:
SELECT ID, MAX(NUM) as Max
FROM myTable
GROUP BY ID
If you don't want to select the ID (as in the output you provided), you can run
SELECT Max
FROM (
    SELECT ID, MAX(NUM) as Max
    FROM myTable
    GROUP BY ID
) results
And here is the SQL fiddle
Edit: if NUM is, as you mentioned later, a VARCHAR2, then you have to convert it to a number first (for example with TO_NUMBER), otherwise MAX compares the values as strings. See this question.
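To try the GROUP BY approach without an Oracle instance, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration under assumptions, not the answerer's code: SQLite stands in for Oracle, NUM is stored as text to mimic the VARCHAR2 case, and CAST(... AS INTEGER) plays the role of the numeric conversion. The function name max_num_per_id is made up for this example.

```python
import sqlite3

def max_num_per_id(rows):
    """Return the highest NUM for each ID, ordered by ID."""
    conn = sqlite3.connect(":memory:")
    # NUM is TEXT on purpose, to mimic a VARCHAR2 column (assumption).
    conn.execute("CREATE TABLE myTable (ID INTEGER, NUM TEXT)")
    conn.executemany("INSERT INTO myTable VALUES (?, ?)", rows)
    query = """
        SELECT MAX(CAST(NUM AS INTEGER)) AS max_num
        FROM myTable
        GROUP BY ID
        ORDER BY ID
    """
    result = [r[0] for r in conn.execute(query)]
    conn.close()
    return result

# Sample data from the question.
sample = [(1, "1111"), (1, "2222"), (2, "3333"),
          (2, "4444"), (3, "5555"), (3, "6666")]
print(max_num_per_id(sample))
```

Without the cast, a text comparison would rank "9" above "10", which is why the conversion matters when the column is not numeric.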

The most efficient way I would suggest is
SELECT ids,
       value
FROM   (SELECT ids,
               value,
               max(value) over (PARTITION BY ids) max_value
        FROM   test)
WHERE  value = max_value;
This requires that the query maintain a single value per id of the maximum value encountered so far. If a new maximum is found then the existing value is modified, otherwise the new value is discarded. The total number of elements that have to be held in memory is related to the number of ids, not the number of rows scanned.
See this SQLFIDDLE
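The analytic-function variant can be checked the same way. The sketch below assumes SQLite 3.25 or newer (needed for window functions) as a stand-in for Oracle, and reuses the table and column names from the answer; the function name rows_with_group_max is hypothetical.

```python
import sqlite3

def rows_with_group_max(rows):
    """Return every (ids, value) row whose value equals the maximum for its ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (ids INTEGER, value INTEGER)")
    conn.executemany("INSERT INTO test VALUES (?, ?)", rows)
    query = """
        SELECT ids, value
        FROM (SELECT ids,
                     value,
                     MAX(value) OVER (PARTITION BY ids) AS max_value
              FROM test)
        WHERE value = max_value
        ORDER BY ids
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

data = [(1, 1111), (1, 2222), (2, 3333), (2, 4444), (3, 5555), (3, 6666)]
print(rows_with_group_max(data))
```

One difference from the GROUP BY version is worth noting: if two rows of the same id tie for the maximum, this query returns both of them.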

Related

Selecting filtered values from Oracle using ROWNUM

I have a requirement where I need to find the record number of the records that are returned in a result set. I know that I can use ROWNUM to get the record number from the result set, but my issue is slightly different. The details are below.
Table : ProcessSummary
Columns:
PS_PK ProcessId StepId AsscoiateId ProcessName AssetAmount
145 25 50 Process1 3,500.00
267 26 45 Process2 4,400.00
356 27 70 Process3 2,400.00
456 28 80 90 Process4 780.00
556 29 56 67 Process5 4,500.00
656 45 70 Process6 6,000.00
789 31 75 Process7 8,000.00
Now what I need to do is fetch all the records from the ProcessSummary table where any of ProcessId, StepId or AssociateId is NULL. I wrote the query below:
select * from ProcessSummary where ProcessId IS NULL OR StepId IS NULL OR AsscoiateId IS NULL
As expected, I got the 1st, 2nd, 3rd, 6th and 7th records in the result set.
Now what I need is to get the record numbers 1,2,3,6,7. I tried to use ROWNUM as below, but I got the values 1,2,3,4,5 and not 1,2,3,6,7.
select ROWNUM from ProcessSummary where ProcessId IS NULL OR StepId IS NULL OR AsscoiateId IS NULL
Is it possible to get the ROWNUM values in the sequence that I want, and if so, how can I do this? If ROWNUM cannot be used, what other option could I use to get the result in the form that I want?
Any help would be greatly appreciated, as I could not find much on the net or SO regarding this sort of requirement.
Thanks
Vikeng21
rownum is an internal numbering that gives you a row number based on the current query results only, so that numbering is not tied to a specific record, and it will change when you change the data or the query.
But the numbering you ask for is already in your table. It looks like you just need to SELECT PS_PK .. instead. PS_PK is the field in your table that contains the actual number you want.
You can generate a numbering using an analytical function, and then filter that query. You need some fields to order by, though. In this case I've chosen PS_PK, but it can be another field, like ProcessName or a combination of other fields as well.
select
  *
from
  (select
     dense_rank() over (order by PS_PK) as RANKING,
     p.*
   from
     ProcessSummary p)
where
  ProcessId IS NULL OR StepId IS NULL OR AsscoiateId IS NULL
So, in this query, a numbering is first calculated for each row that is returned from the inner query. The numbering is returned as the field RANKING. The outer query then filters further, but still returns the field RANKING with the original numbering.
Instead of dense_rank there is also rank and row_number. The differences are subtle, but you can just experiment and read some docs here and here to learn about the differences and see which one fits you best.
Note that this might slow down your query, because the inner query first generates a number for each row in the table (there is no filtering on that level now).
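The number-first, filter-second idea can be reproduced with a small sketch. Everything about the data here is an assumption: the question's table does not show which columns are NULL in each row, so the NULL placement below is invented purely so that rows 1, 2, 3, 6 and 7 match the filter. SQLite 3.25+ stands in for Oracle, and the function name ranked_null_rows is hypothetical.

```python
import sqlite3

def ranked_null_rows(rows):
    """Number all rows by PS_PK first, then keep only rows with a NULL id column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ProcessSummary "
        "(PS_PK INTEGER, ProcessId INTEGER, StepId INTEGER, AsscoiateId INTEGER)"
    )
    conn.executemany("INSERT INTO ProcessSummary VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT RANKING, PS_PK
        FROM (SELECT DENSE_RANK() OVER (ORDER BY PS_PK) AS RANKING, p.*
              FROM ProcessSummary p)
        WHERE ProcessId IS NULL OR StepId IS NULL OR AsscoiateId IS NULL
        ORDER BY RANKING
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Invented NULL placement (assumption): only PS_PK 456 and 556 are complete.
process_rows = [
    (145, 25, 50, None), (267, 26, None, 45), (356, None, 27, 70),
    (456, 28, 80, 90), (556, 29, 56, 67),
    (656, 45, None, 70), (789, 31, 75, None),
]
print(ranked_null_rows(process_rows))
```

Because the ranking is computed before the WHERE clause of the outer query runs, the gaps at positions 4 and 5 are preserved.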

SQL - Min difference between two integer fields

How can I get the minimum difference between two integer fields (value_0 - value)?
value_0 >= value always
value_0 | value
-------------------
15 | 10
12 | 10
15 | 11
11 | 11
Try this:
SELECT MIN(value_0-value) as MinDiff
FROM TableName
WHERE value_0>=value
With the sample data you have given,
Output is 0. (11-11)
See demo in SQL Fiddle.
Read more about MIN() here.
Here is one way:
select min(value_0 - value)
from table t;
This is pretty basic SQL. If you want to see other values on the same row as the minimum, use order by and choose one row:
select (value_0 - value)
from table t
order by (value_0 - value)
limit 1;
The limit 1 works in some databases for getting one row. Others use top 1 in the select clause. Or fetch first 1 rows only. Or even something else.
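Both variants above can be run against the sample data with a short sketch. It assumes SQLite (which supports LIMIT) and a made-up table name t; the function name min_difference is also hypothetical.

```python
import sqlite3

def min_difference(rows):
    """Return the minimum of value_0 - value, plus the row that produces it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (value_0 INTEGER, value INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # Variant 1: the aggregate alone.
    min_diff = conn.execute("SELECT MIN(value_0 - value) FROM t").fetchone()[0]
    # Variant 2: order by the difference and keep one row,
    # which also exposes the other columns of that row.
    row = conn.execute(
        "SELECT value_0, value, value_0 - value AS diff "
        "FROM t ORDER BY diff LIMIT 1"
    ).fetchone()
    conn.close()
    return min_diff, row

pairs = [(15, 10), (12, 10), (15, 11), (11, 11)]
print(min_difference(pairs))
```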

how to select one tuple in rows based on variable field value

I'm quite new to SQL and I'd like to write a SELECT statement that retrieves only the first row of a set, based on a column value. I'll try to make it clearer with a table example.
Here is my table data :
chip_id | sample_id
-------------------
1 | 45
1 | 55
1 | 5986
2 | 453
2 | 12
3 | 4567
3 | 9
I'd like to have a SELECT statement that fetches the first row for each of chip_id=1,2,3
Like this :
chip_id | sample_id
-------------------
1 | 45 or 55 or whatever
2 | 12 or 453 ...
3 | 9 or ...
How can I do this?
Thanks
I'd probably:
set a variable = 0
order your table by chip_id
read the table row by row
if table[row] > variable, store table[row] in a result array and increment the variable
loop until done
return your result array
Though depending on your DB, query and version, you'll probably get unpredictable or unreliable results.
You can get one value using row_number():
select chip_id, sample_id
from (select chip_id, sample_id,
             row_number() over (partition by chip_id order by rand()) as seqnum
      from your_table
     ) t
where seqnum = 1
This returns a random value. In SQL, tables are inherently unordered, so there is no concept of "first". You need an auto incrementing id or creation date or some way of defining "first" to get the "first".
If you have such a column, then replace rand() with the column.
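As a sketch of that last point, the example below replaces rand() with an ordering column. The column insert_order and the table name samples are assumptions introduced for illustration (the question's table has no such column), and SQLite 3.25+ is used as a stand-in database.

```python
import sqlite3

def first_sample_per_chip(rows):
    """Return the first-inserted sample_id for each chip_id."""
    conn = sqlite3.connect(":memory:")
    # insert_order is a hypothetical auto-assigned column that defines "first".
    conn.execute(
        "CREATE TABLE samples "
        "(insert_order INTEGER PRIMARY KEY, chip_id INTEGER, sample_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO samples (chip_id, sample_id) VALUES (?, ?)", rows
    )
    query = """
        SELECT chip_id, sample_id
        FROM (SELECT chip_id, sample_id,
                     ROW_NUMBER() OVER (PARTITION BY chip_id
                                        ORDER BY insert_order) AS seqnum
              FROM samples) t
        WHERE seqnum = 1
        ORDER BY chip_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

chips = [(1, 45), (1, 55), (1, 5986), (2, 453), (2, 12), (3, 4567), (3, 9)]
print(first_sample_per_chip(chips))
```

With an explicit ordering column the result is repeatable, which the rand() version deliberately is not.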
Provided I understood your output, if you are using PostgreSQL 9, you can use this (string_agg needs a text argument, hence the cast):
SELECT chip_id ,
string_agg(CAST(sample_id AS text), ' or ')
FROM your_table
GROUP BY chip_id
You need to group your data with a GROUP BY query.
When you group, generally you want the max, the min, or some other values to represent your group. You can do sums, count, all kind of group operations.
For your example, you don't seem to want a specific group operation, so the query could be as simple as this one :
SELECT chip_id, MAX(sample_id)
FROM table
GROUP BY chip_id
This way you are retrieving the maximum sample_id for each of the chip_id.

Finding the maximum value of year difference

I have two tables here
BIODATA
ID NAME
1 A
2 B
YEAR
ID JOIN_YEAR GRADUATE_YEAR
1 1990 1991
2 1990 1993
I already use
select
NAME,
max(year(JOIN_YEAR) - year(GRADUATE_YEAR)) as MAX
from
DATA_DIRI
right join DATA_KARTU
ON BIODATA.ID = YEAR.ID;
but the result became:
+--------+------+
| NAME | MAX |
+--------+------+
| A | 3 |
+--------+------+
I have already tried a lot of different kinds of joins, but I still can't work out how to get the NAME to be "B". Can anyone help me? Thanks a lot in advance.
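If you use an aggregate and a non-aggregate in the selection set at once, without grouping by the non-aggregate field, then the row used for the non-aggregate field is essentially picked at random (in databases that accept such a query at all, such as MySQL with ONLY_FULL_GROUP_BY disabled).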
Basically, how max works is this - it gathers all rows for each group by query (if there is no group by, all of them), calculates the max and puts that in the result.
But since you also put in a non-aggregate field, it needs a value for that - so what SQL does is just pick a random row. You might think 'well, why doesn't it pick the same row max did?' but what if you used avg or count? These have no row associated with it, so the best it can do is pick randomly. This is why this behaviour exists in general.
What you need to do is use a subquery. Something like: select d1.id from data_diri d1 where d1.graduate_year - d1.join_year = (select max(d2.graduate_year - d2.join_year) from data_diri d2)
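Here is a runnable sketch of the subquery idea, joined back to the names so that the NAME column comes from the same row as the maximum. The table names biodata and year_data are assumptions (the question uses two different sets of names), the year columns are stored as plain integers, and SQLite stands in for the asker's database.

```python
import sqlite3

def names_with_max_gap(people, years):
    """Return (name, gap) for the rows whose graduate/join gap is the largest."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE biodata (id INTEGER, name TEXT)")
    conn.execute(
        "CREATE TABLE year_data "
        "(id INTEGER, join_year INTEGER, graduate_year INTEGER)"
    )
    conn.executemany("INSERT INTO biodata VALUES (?, ?)", people)
    conn.executemany("INSERT INTO year_data VALUES (?, ?, ?)", years)
    query = """
        SELECT b.name, y.graduate_year - y.join_year AS gap
        FROM biodata b
        JOIN year_data y ON b.id = y.id
        WHERE y.graduate_year - y.join_year =
              (SELECT MAX(y2.graduate_year - y2.join_year) FROM year_data y2)
        ORDER BY b.name
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

people = [(1, "A"), (2, "B")]
years = [(1, 1990, 1991), (2, 1990, 1993)]
print(names_with_max_gap(people, years))
```

The subquery computes the maximum once, and the outer WHERE keeps only the rows that reach it, so the name is no longer taken from an arbitrary row.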

Select distinct values for a particular column choosing arbitrarily from duplicates

I have health data relating to deaths. An individual should die at most once. In the database they sometimes don't, probably because causes of death were changed but the original entry was not deleted. I don't really understand how this was allowed to happen, but it has. So, as a made-up example, I have:
Row_number | Individual_ID | Cause_of_death | Date_of_death
------------+---------------+-----------------------+---------------
1 | 1 | Stroke | 3 march 2008
2 | 2 | Myocardial infarction | 1 jan 2009
3 | 2 | Pulmonary Embolus | 1 jan 2009
I want each individual to have only one cause of death.
In the example, I want a query that returns row 1 and either row 2 or row 3 (not both). I have to make an arbitrary choice between rows 2 and 3 because there is no timestamp in any of the fields that can be used to determine which is the revision; it's not ideal but is unavoidable.
I can't make the SQL work to do this. I've tried inner joining distinct Individual_ID to the other fields, but this still gives all the rows. I've tried adding a 'having count(Individual_ID) = 1' clause with it. This leaves out people with more than one cause of death completely. Suggestions on the internet seem to be based on using a timestamped field to choose the most recent, but I don't have that.
IBM DB2. Windows XP. Any thoughts gratefully received.
Have you tried using MIN (or MAX) against the cause of death (and the date of death, if they died on two different dates)?
SELECT Individual_ID, MIN(Cause_Of_Death), MIN(Date_Of_Death)
from deaths
GROUP BY Individual_ID
I don't know DB2 so I'll answer in general. There are two main approaches:
select *
from T
join (
select keys, min(ID) as MinID
from T
group by keys
) on T.ID = MinID
And
select *, row_number() over (partition by keys) as r
from T
where r = 1
Both return a row for every key, whether it is duplicated or not, but only one row per key.
Note that both statements are pseudo-SQL.
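The first of the two approaches (joining on the minimum row identifier per key) can be made concrete as follows. This is only a sketch: SQLite is used rather than DB2, the table name deaths is taken from the earlier answer, and the column row_no is a renamed stand-in for the Row_number column of the question.

```python
import sqlite3

def one_row_per_individual(rows):
    """Keep, for each individual, the row with the smallest row_no."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE deaths "
        "(row_no INTEGER, individual_id INTEGER, cause_of_death TEXT)"
    )
    conn.executemany("INSERT INTO deaths VALUES (?, ?, ?)", rows)
    query = """
        SELECT d.row_no, d.individual_id, d.cause_of_death
        FROM deaths d
        JOIN (SELECT individual_id, MIN(row_no) AS min_row
              FROM deaths
              GROUP BY individual_id) k
          ON d.row_no = k.min_row
        ORDER BY d.row_no
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

death_rows = [
    (1, 1, "Stroke"),
    (2, 2, "Myocardial infarction"),
    (3, 2, "Pulmonary Embolus"),
]
print(one_row_per_individual(death_rows))
```

Unlike taking MIN of each column separately, this keeps all columns from a single source row, so the cause and date always belong together.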
The row_number() approach is probably preferable from a performance standpoint. Here is usr's example, in DB2 syntax:
select * from (
select T.*, row_number() over (partition by Individual_ID) as r
from T
)
where r=1;