Filtering using aggregation functions

Filtering using aggregation functions - sql

I would like to filter my table by MIN() function but still keep columns which cant be grouped.
I have table:
+----+----------+----------------------+
| ID | distance | geom |
+----+----------+----------------------+
| 1 | 2 | DSDGSAsd23423DSFF |
| 2 | 11.2 | SXSADVERG678BNDVS4 |
| 2 | 2 | XCZFETEFD567687SDF |
| 3 | 24 | SADASDSVG3423FD |
| 3 | 10 | SDFSDFSDF343DFDGF |
| 4 | 34 | SFDHGHJ546GHJHJHJ |
| 5 | 22 | SDFSGTHHGHGFHUKJYU45 |
| 6 | 78 | SDFDGDHKIKUI45 |
| 6 | 15 | DSGDHHJGHJKHGKHJKJ65 |
+----+----------+----------------------+
This is what I would like to achieve:
+----+----------+----------------------+
| ID | distance | geom |
+----+----------+----------------------+
| 1 | 2 | DSDGSAsd23423DSFF |
| 2 | 2 | XCZFETEFD567687SDF |
| 3 | 10 | SDFSDFSDF343DFDGF |
| 4 | 34 | SFDHGHJ546GHJHJHJ |
| 5 | 22 | SDFSGTHHGHGFHUKJYU45 |
| 6 | 15 | DSGDHHJGHJKHGKHJKJ65 |
+----+----------+----------------------+
it is possible when I use MIN() on distance column and grouping by ID but then I loose my geom which is essential.
The query looks like this:
SELECT "ID", MIN(distance) AS distance FROM somefile GROUP BY "ID"
the result is:
+----+----------+
| ID | distance |
+----+----------+
| 1 | 2 |
| 2 | 2 |
| 3 | 10 |
| 4 | 34 |
| 5 | 22 |
| 6 | 15 |
+----+----------+
but this is not what I want.
Any suggestions?

One common approach to this is to find the minimum values in a derived table that you join with:
SELECT somefile."ID", somefile.distance, somefile.geom
FROM somefile
JOIN (
SELECT "ID", MIN(distance) AS distance FROM somefile GROUP BY "ID"
) t ON t.distance = somefile.distance AND t.ID = somefile.ID;
Sample SQL Fiddle

You need a window function to do this:
SELECT "ID", distance, geom
FROM (
SELECT "ID", distance, geom, rank() OVER (PARTITION BY "ID" ORDER BY distance) AS rnk
FROM somefile) sub
WHERE rnk = 1;
This effectively orders the entire set of rows first by the "ID" value, then by the distance and returns the record for each "ID" where the distance is minimal - no need to do a GROUP BY.

select a.*,b.geom from
(SELECT ID, MIN(distance) AS distance FROM somefile GROUP BY ID) as a
inner join somefile as b on a.id=b.id and a.distance=b.distance

You can use "distinct on" clause of the PostgreSQL.
select distinct on(id) id, distance, geom
from table_name
order by distance;
I think this is what you are exactly looking for.
For more details on how "distinct on" works, refer the documentation and the example.
But, remember, using "distinct on" does not comply to SQL standards.

Related

More efficient way to SELECT rows from PARTITION BY

Suppose I have the following table:
+----+-------------+-------------+
| id | step_number | employee_id |
+----+-------------+-------------+
| 1 | 1 | 3 |
| 1 | 2 | 3 |
| 1 | 3 | 4 |
| 2 | 2 | 3 |
| 2 | 3 | 4 |
| 2 | 4 | 5 |
+----+-------------+-------------+
My desired results are:
+----+-------------+-------------+
| id | step_number | employee_id |
+----+-------------+-------------+
| 1 | 1 | 3 |
| 2 | 2 | 3 |
+----+-------------+-------------+
My current solution is:
SELECT
*
FROM
(SELECT
id,
step_number,
MIN(step_number) OVER (PARTITION BY id) AS min_step_number,
employee_id
FROM
table_name) AS t
WHERE
t.step_number = t.min_step_number
Is there a more efficient way I could be doing this?
I'm currently using postgresql, version 12.

In Postgres, I would recommend using distinct on to adress this greatest-n-per-group problem:
select distinct on (id) t.*
from mytbale t
order by id, step_number
This Postgres extension to the SQL standard has usually better performance than the standard approach using window functions (and, as a bonus, the syntax is neater).
Note that this assumes unicity of (id, step_number) tuples: otherwise, the results might be different than those of your query (which allows ties, while distinct on does not).

Make a query making groups on the same result row

I have two tables. Like this.
select * from extrafieldvalues;
+----------------------------+
| id | value | type | idItem |
+----------------------------+
| 1 | 100 | 1 | 10 |
| 2 | 150 | 2 | 10 |
| 3 | 101 | 1 | 11 |
| 4 | 90 | 2 | 11 |
+----------------------------+
select * from items
+------------+
| id | name |
+------------+
| 10 | foo |
| 11 | bar |
+------------+
I need to make a query and get something like this:
+--------------------------------------+
| idItem | valtype1 | valtype2 | name |
+--------------------------------------+
| 10 | 100 | 150 | foo |
| 11 | 101 | 90 | bar |
+--------------------------------------+
The quantity of types of extra field values is variable, but every item ALWAYS uses every extra field.

If you have only two fields, then left join is an option for this:
select i.*, efv1.value as value_1, efv2.value as value_2
from items i left join
extrafieldvalues efv1
on efv1.iditem = i.id and
efv1.type = 1 left join
extrafieldvalues efv2
on efv1.iditem = i.id and
efv1.type = 2 ;
In terms of performance, two joins are probably faster than an aggregation -- and it makes it easier to bring in more columns from items. One the other hand, conditional aggregation generalizes more easily and the performance changes by little as more columns from extrafieldvalues are added to the select.

Use conditional aggregation
select iditem,
max(case when type=1 then value end) as valtype1,
max(case when type=2 then value end) as valtype2,name
from extrafieldvalues a inner join items b on a.iditem=b.id
group by iditem,name

How to select values, where each one depends on a previously aggregated state?

I have the following table:
|-----|-----|
| i d | val |
|-----|-----|
| 1 | 1 |
|-----|-----|
| 2 | 4 |
|-----|-----|
| 3 | 3 |
|-----|-----|
| 4 | 7 |
|-----|-----|
Can I get the following output:
|-----|
| sum |
|-----|
| 1 |
|-----|
| 5 |
|-----|
| 8 |
|-----|
| 1 5 |
|-----|
using a single SQLite3 SELECT-query? I know it could be easily achieved using variables, but SQLite3 lacks those. Maybe some recursive query? Thanks.

No.
In a relational database table rows do not have any order. If you specify an order for the rows, then it's possible to write a query.
Now, you could add an extra column to sort the rows. For example:
| val | sort
|-----|-----
| 1 | 10
| 4 | 20
| 3 | 30
| 7 | 40
The query could be:
select
sum(val) over(order by sort)
from my_table

For the updated question, you can write:
select
sum(val) over(order by id)
from my_table

By using the order of the id column and if you want only the sum column, you can do this:
select (select sum(val) from tablename where id <= t.id) sum
from tablename t

selecting data with highest field value in a field

I have a table, and I'd like to select rows with the highest value. For example:
----------------
| user | index |
----------------
| 1 | 1 |
| 2 | 1 |
| 2 | 2 |
| 3 | 4 |
| 3 | 7 |
| 4 | 1 |
| 5 | 1 |
----------------
Expected result:
----------------
| user | index |
----------------
| 1 | 1 |
| 2 | 2 |
| 3 | 7 |
| 4 | 1 |
| 5 | 1 |
----------------
How may I do so? I assume it can be done by some oracle function I am not aware of?
Thanks in advance :-)

You can use MAX() function for that with grouping user column like this:
SELECT "user"
,MAX("index") AS "index"
FROM Table1
GROUP BY "user"
ORDER BY "user";
Result:
| USER | INDEX |
----------------
| 1 | 1 |
| 2 | 2 |
| 3 | 7 |
| 4 | 1 |
| 5 | 1 |
See this SQLFiddle

if you have more than one column
select user , index
from (
select u.* , row_number() over (partition by user order by index desc) as rnk
from some_table u)
where rnk = 1
user is a reserved word - you should use a different name for the column.

select user,max(index) index from tbl
group by user;

Alternatively, you can use analytic functions:
select user,index, max(index) over (partition by user order by 1 ) highest from YOURTABLE
Note: Try NOT to use words like user, index, date etc.. as your column names, as they are reserved words for Oracle. If you will use, then use them with quotation marks, eg. "index", "date"...

sql - select row from group based on multiple values

I have a table like:
| ID | Val |
+-------+-----+
| abc-1 | 10 |
| abc-2 | 30 |
| cde-1 | 10 |
| cde-2 | 10 |
| efg-1 | 20 |
| efg-2 | 11 |
and would like to get the result based on the substring(ID, 1, 3) and minimum value and ist must be only the first in case the Val has duplicates
| ID | Val |
+-------+-----+
| abc-1 | 10 |
| cde-1 | 10 |
| efg-2 | 11 |
the problem is that I am stuck, because I cannot use group by substring(id,1,3), ID since it will then have again 2 rows (each for abc-1 and abc-2)

WITH
sorted
AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY substring(id,1,3) ORDER BY val, id) AS sequence_id
FROM
yourTable
)
SELECT
*
FROM
sorted
WHERE
sequence_id = 1

SELECT SUBSTRING(id,1,3),MIN(val) FROM Table1 GROUP BY SUBSTRING(id,1,3);
You were grouping the columns using both SUBSTRING(id,1,3),id instead of just SUBSTRING(id,1,3). It works perfectly fine.Check the same example in this below link.
http://sqlfiddle.com/#!3/fd9fc/1

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Filtering using aggregation functions - sql

select a.*,b.geom from (SELECT ID, MIN(distance) AS distance FROM somefile GROUP BY ID) as a inner join somefile as b on a.id=b.id and a.distance=b.distance

Related

More efficient way to SELECT rows from PARTITION BY

Make a query making groups on the same result row

How to select values, where each one depends on a previously aggregated state?

selecting data with highest field value in a field

sql - select row from group based on multiple values

Categories

Resources