Group by and fetch column that is not in group by clause

I have (sample) data:
equipment_id | node_id | value (type: jsonb)
------------------------------
1 | 1 | 0.3
1 | 2 | 0.4
2 | 3 | 0.7
2 | 4 | 0.6
2 | 5 | 0.7
And I want to get the rows that have the maximum value within the same equipment_id:
equipment_id | node_id | value
------------------------------
1 | 2 | 0.4
2 | 3 | 0.7
2 | 5 | 0.7
There is a query that does what I want, but I'm afraid of performance degradation because of casting jsonb to float:
with cte as (
    select
        equipment_id,
        max(value::text::float) as val
    from metrics
    group by equipment_id
)
select cte.equipment_id, m.node_id, cte.val
from cte
join metrics m on cte.equipment_id = m.equipment_id and cte.val = m.value::text::float
How can I avoid casting?

Use distinct on:
select distinct on (equipment_id) m.*
from metrics m
order by equipment_id, value desc;
If your value is actually stored as a string, then use:
order by equipment_id, value::numeric desc;

You can use row_number():
select *
from (
    select *, row_number() over (partition by equipment_id order by value::text::float desc) as rn
    from metrics
) a
where rn = 1
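Both answers above return a single row per equipment_id, while the expected output in the question keeps both rows tied at 0.7. A minimal sketch of a tie-preserving variant swaps in rank() for row_number(). It is run here through Python's sqlite3 module against an in-memory copy of the sample data, which is an assumption made only to keep the example self-contained: value is stored as a plain REAL rather than jsonb, so the casting question itself is not addressed, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: value is a plain REAL column here, not jsonb as in the question.
conn.execute("CREATE TABLE metrics (equipment_id INTEGER, node_id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [(1, 1, 0.3), (1, 2, 0.4), (2, 3, 0.7), (2, 4, 0.6), (2, 5, 0.7)],
)

# rank() gives every row tied for the maximum the same rank (1),
# so all of them survive the filter; row_number() would keep only one.
query = """
SELECT equipment_id, node_id, value
FROM (
    SELECT m.*,
           rank() OVER (PARTITION BY equipment_id ORDER BY value DESC) AS rnk
    FROM metrics m
) ranked
WHERE rnk = 1
ORDER BY equipment_id, node_id
"""
top_rows = conn.execute(query).fetchall()
print(top_rows)
```

In PostgreSQL the same shape should apply, with whatever cast or ordering expression the jsonb column needs inside the ORDER BY of the window.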

Related

Product of distinct values before a certain date

I have a table with schema:
date | item_id | factor
----------------------
20180710 | 1 | 0.1
20180711 | 1 | 0.1
20180712 | 1 | 2
20180713 | 1 | 2
20180714 | 1 | 2
20180710 | 2 | 0.1
20180711 | 2 | 0.1
20180712 | 2 | 5
20180713 | 2 | 5
20180714 | 2 | 10
The factor for each item_id can change on any date. On each date, I need to calculate the product of all the distinct factors for each item_id up to that date (inclusive), so the final output for the above table should be:
date | id | cumulative_factor
20180710 | 1 | 0.1
20180711 | 1 | 0.1
20180712 | 1 | 0.2
20180713 | 1 | 0.2
20180714 | 1 | 0.2
20180710 | 2 | 0.1
20180711 | 2 | 0.1
20180712 | 2 | 0.5
20180713 | 2 | 0.5
20180714 | 2 | 5
Logic:
On 20180711, for id=1, the only distinct factor is 0.1, so the cumulative factor is 0.1.
On 20180714, for id=1, the distinct factors are 0.1 and 2, so the cumulative factor is 0.1*2 = 0.2.
On 20180714, for id=2, the distinct factors are 0.1, 5 and 10, so the cumulative factor is 0.1*5*10 = 5.
I've tried
select a.id, a.date, b.cum_factor
from factor_table a
left join (
    select id, date, ISNULL(EXP(SUM(distinct log_factor)),1) as cum_factor
    from factor_table
    where date < a.date
) b
on a.id=b.id and a.date = b.date
but I get the error
a.date not found

There isn't a product aggregate function in SQL Server.
However, you can emulate it with EXP ( SUM ( LOG ( value ) ) ).
Please refer to the inline comments in the query below.
;with
cte as
(
    -- this CTE sets the factor to 1 if it is the same as in the previous row,
    -- since you want the `product of distinct` factors
    select *,
           factor2 = CASE WHEN LAG(factor) OVER (PARTITION BY id
                                                 ORDER BY [date]) IS NULL
                            OR LAG(factor) OVER (PARTITION BY id
                                                 ORDER BY [date]) <> factor
                          THEN factor
                          ELSE 1
                     END
    from factor_table
),
cte2 as
(
    -- this CTE computes the running SUM(LOG(factor2)) only, without the EXP()
    select *, factor3 = SUM(LOG(factor2)) OVER (PARTITION BY id
                                                ORDER BY [date])
    from cte
)
-- EXP() is not a window function; here it is applied separately, in an outer level
select *, EXP(factor3) as cumulative_factor
from cte2
Note: LAG() requires SQL Server 2012 or later.

Something seems wrong with multiplying distinct factors. Conceptually, you can express this using window functions:
select f.id, f.date,
       exp(sum(distinct log(f.factor)) over (partition by f.id order by f.date)) as cum_factor
from factor_table f;
However, SQL Server does not allow distinct in a windowed aggregate. To get around that limitation:
select f.id, f.date,
       exp(sum(log(case when seqnum = 1 then f.factor end)) over (partition by f.id order by f.date)) as cum_factor
from (select t.*,
             row_number() over (partition by id, factor order by date) as seqnum
      from factor_table t
     ) f;
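The "count each factor only on its first occurrence" idea can be checked against the sample data from the question. The sketch below is an illustration, not the original author's code: it runs in SQLite through Python's sqlite3 module (an assumption, the question targets SQL Server), and nat_log and nat_exp are hypothetical helper names registered here to stand in for SQL Server's LOG() and EXP().

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical helpers standing in for SQL Server's LOG() and EXP();
# they pass NULL through so that SUM() can simply skip those rows.
conn.create_function("nat_log", 1, lambda x: None if x is None else math.log(x))
conn.create_function("nat_exp", 1, lambda x: None if x is None else math.exp(x))

conn.execute("CREATE TABLE factor_table (date TEXT, id INTEGER, factor REAL)")
conn.executemany(
    "INSERT INTO factor_table VALUES (?, ?, ?)",
    [
        ("20180710", 1, 0.1), ("20180711", 1, 0.1), ("20180712", 1, 2),
        ("20180713", 1, 2), ("20180714", 1, 2),
        ("20180710", 2, 0.1), ("20180711", 2, 0.1), ("20180712", 2, 5),
        ("20180713", 2, 5), ("20180714", 2, 10),
    ],
)

# seqnum = 1 marks the first date on which a given factor appears for an id;
# only those rows contribute to the running sum of logarithms.
query = """
SELECT id, date,
       nat_exp(SUM(nat_log(CASE WHEN seqnum = 1 THEN factor END))
               OVER (PARTITION BY id ORDER BY date)) AS cumulative_factor
FROM (
    SELECT t.*,
           row_number() OVER (PARTITION BY id, factor ORDER BY date) AS seqnum
    FROM factor_table t
) f
ORDER BY id, date
"""
# Round to hide floating-point noise from the exp/log round trip.
result = [(i, d, round(c, 6)) for i, d, c in conn.execute(query)]
for row in result:
    print(row)
```

Because the product is rebuilt from logarithms, this only works for strictly positive factors, and small rounding errors are expected.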

PostgreSQL LATERAL JOIN to LIMIT GROUP BY

Sorry I'm just failing to do the lateral join!
I got a table like this:
ID | NUMBER | VALUE
-------------------
20 | 12 | 0.7
21 | 12 | 0.8
22 | 13 | 0.8
23 | 13 | 0.7
24 | 13 | 0.9
25 | Null | 0.9
Now I would like to get the first 2 rows for each NUMBER sorted by decreasing order of VALUE.
ID | NUMBER | VALUE
-------------------
21 | 12 | 0.8
20 | 12 | 0.7
24 | 13 | 0.9
22 | 13 | 0.8
The code I tried so far looks like this:
(Found: Grouped LIMIT in PostgreSQL: show the first N rows for each group?)
SELECT DISTINCT t_outer.id, t_top.number, t_top.value
FROM table t_outer
JOIN LATERAL (
    SELECT * FROM table t_inner
    WHERE t_inner.number NOTNULL
    AND t_inner.id = t_outer.id
    AND t_inner.number = t_outer.number
    ORDER BY t_inner.value DESC
    LIMIT 2
) t_top ON TRUE
order by t_outer.value DESC;
Everything is fine so far, it just seems like the LIMIT 2 is not working. I get all the rows for all NUMBER elements back.

Use the window (analytic) function row_number():
select "ID", "NUMBER", "VALUE"
from (
    select t.*,
           row_number() over (partition by "NUMBER"
                              order by "VALUE" desc) as rno
    from table1 t
) t1
where t1.rno <= 2;
Output
ID NUMBER VALUE
21 12 0,8000
20 12 0,7000
24 13 0,9000
22 13 0,8000
25 NULL 0,9000
Explanation:
The inner query (t1) assigns rno to each row within its NUMBER group, ordered by VALUE descending. The outer query then keeps the rows with rno <= 2 to produce the output.
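The output above still contains the row whose NUMBER is NULL, which the expected result in the question leaves out. A minimal sketch of the same row_number() approach with that group filtered out follows. It runs in SQLite through Python's sqlite3 module, which is an assumption for the sake of a runnable example; the lowercase table and column names are likewise chosen for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, number INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(20, 12, 0.7), (21, 12, 0.8), (22, 13, 0.8),
     (23, 13, 0.7), (24, 13, 0.9), (25, None, 0.9)],
)

# Number the rows inside each NUMBER group by descending VALUE,
# skipping rows without a NUMBER, then keep the first two per group.
query = """
SELECT id, number, value
FROM (
    SELECT t.*,
           row_number() OVER (PARTITION BY number ORDER BY value DESC) AS rno
    FROM table1 t
    WHERE number IS NOT NULL
) t1
WHERE rno <= 2
ORDER BY number, value DESC
"""
top_two = conn.execute(query).fetchall()
print(top_two)
```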

postgresql - How to get one row the min value

I have a table (t_image) with these columns:
datacd | imagecode | indexdate
----------------------------------
A | 1 | 20170213
A | 2 | 20170213
A | 3 | 20170214
B | 4 | 20170201
B | 5 | 20170202
The desired result is this:
datacd | imagecode | indexdate
----------------------------------
A | 1 | 20170213
B | 4 | 20170201
From the above table, I want to retrieve one row for each datacd: the one with the minimum indexdate.
Here is my query, but the result returns 2 rows for datacd A:
select *
from (
    select datacd, min(indexdate) as indexdate
    from t_image
    group by datacd
) as t1 inner join t_image as t2 on t2.datacd = t1.datacd and t2.indexdate = t1.indexdate;

The Postgres-specific distinct on () clause is typically the fastest solution for greatest-n-per-group queries:
select distinct on (datacd) *
from t_image
order by datacd, indexdate;

One option uses ROW_NUMBER():
SELECT t.datacd,
       t.imagecode,
       t.indexdate
FROM
(
    SELECT datacd, imagecode, indexdate,
           ROW_NUMBER() OVER (PARTITION BY datacd ORDER BY indexdate) rn
    FROM t_image
) t
WHERE t.rn = 1
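One detail worth noting: datacd A has two rows with the same minimum indexdate (imagecode 1 and 2), so ordering by indexdate alone leaves the choice between them up to the database. The sketch below adds imagecode as a tie-breaker, assuming the lowest imagecode is the one wanted, as the desired result suggests. It runs in SQLite through Python's sqlite3 module, which is an assumption made to keep the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_image (datacd TEXT, imagecode INTEGER, indexdate INTEGER)")
conn.executemany(
    "INSERT INTO t_image VALUES (?, ?, ?)",
    [("A", 1, 20170213), ("A", 2, 20170213), ("A", 3, 20170214),
     ("B", 4, 20170201), ("B", 5, 20170202)],
)

# The second sort key (imagecode) makes the winner deterministic
# when several rows share the minimum indexdate.
query = """
SELECT datacd, imagecode, indexdate
FROM (
    SELECT datacd, imagecode, indexdate,
           ROW_NUMBER() OVER (PARTITION BY datacd
                              ORDER BY indexdate, imagecode) AS rn
    FROM t_image
) t
WHERE rn = 1
ORDER BY datacd
"""
first_rows = conn.execute(query).fetchall()
print(first_rows)
```

The same tie-breaker can be appended to the ORDER BY of a distinct on query in Postgres.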

Selecting Top 1 for Every ID

I have the following table:
| ID | ExecOrd | date |
| 1 | 1.0 | 3/4/2014|
| 1 | 2.0 | 7/7/2014|
| 1 | 3.0 | 8/8/2014|
| 2 | 1.0 | 8/4/2013|
| 2 | 2.0 |12/2/2013|
| 2 | 3.0 | 1/3/2014|
| 2 | 4.0 | |
I need to get the date of the top ExecOrd per ID of about 8000 records, and so far I can only do it for one ID:
SELECT TOP 1 date
FROM TABLE
WHERE DATE IS NOT NULL and ID = '1'
ORDER BY ExecOrd DESC
A little help would be appreciated. I have been trying to find a similar question to mine with no success.

There are several ways of doing this. A generic approach is to join the table back to itself using max():
select t.date
from yourtable t
join (select max(execord) execord, id
      from yourtable
      group by id
     ) t2 on t.id = t2.id and t.execord = t2.execord
If you're using SQL Server 2005 or later, I prefer to use row_number():
select date
from (
    select row_number() over (partition by id order by execord desc) rn, date
    from yourtable
) t
where rn = 1;
Note: they will give different results if ties exist.
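To make the note about ties concrete, here is a small sketch with made-up data in which ID 1 has two rows sharing the top ExecOrd. The rows, the ISO date strings, and the extra date DESC tie-breaker are all assumptions introduced for this illustration, and the queries run in SQLite through Python's sqlite3 module rather than SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, execord REAL, date TEXT)")
# Hypothetical rows: id 1 has a tie on its highest execord (3.0).
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, 1.0, "2014-03-04"), (1, 3.0, "2014-08-08"),
     (1, 3.0, "2014-09-09"), (2, 1.0, "2013-08-04")],
)

# Joining back on max(execord) returns every row that ties for the maximum.
join_rows = conn.execute("""
    SELECT t.id, t.date
    FROM yourtable t
    JOIN (SELECT id, max(execord) AS execord
          FROM yourtable
          GROUP BY id) t2
      ON t.id = t2.id AND t.execord = t2.execord
    ORDER BY t.id, t.date
""").fetchall()

# row_number() keeps exactly one row per id; the added date DESC
# tie-breaker decides which of the tied rows that is.
rn_rows = conn.execute("""
    SELECT id, date
    FROM (
        SELECT id, date,
               row_number() OVER (PARTITION BY id
                                  ORDER BY execord DESC, date DESC) AS rn
        FROM yourtable
    ) t
    WHERE rn = 1
    ORDER BY id
""").fetchall()

print(join_rows)
print(rn_rows)
```

Which behaviour is preferable depends on whether tied rows should all be reported or collapsed to one.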

;with cte as (
    SELECT id, date, row_number() over (partition by ID order by ExecOrd DESC) r
    FROM TABLE
    WHERE DATE IS NOT NULL
)
select id, date
from cte
where r = 1

Grouping SQL Results based on order

I have a table with data something like this:
ID | RowNumber | Data
------------------------------
1 | 1 | Data
2 | 2 | Data
3 | 3 | Data
4 | 1 | Data
5 | 2 | Data
6 | 1 | Data
7 | 2 | Data
8 | 3 | Data
9 | 4 | Data
I want to group each set of RowNumbers so that my result is something like this:
ID | RowNumber | Group | Data
--------------------------------------
1 | 1 | a | Data
2 | 2 | a | Data
3 | 3 | a | Data
4 | 1 | b | Data
5 | 2 | b | Data
6 | 1 | c | Data
7 | 2 | c | Data
8 | 3 | c | Data
9 | 4 | c | Data
The only way I know where each group starts and stops is when the RowNumber starts over. How can I accomplish this? It also needs to be fairly efficient since the table I need to do this on has 52 Million Rows.
Additional Info
ID is truly sequential, but RowNumber may not be. I think RowNumber will always begin with 1 but for example the RowNumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6", etc.

For the clarified requirements in the comments
The rownumbers for group1 could be "1,1,2,2,3,4" and for group2 they
could be "1,2,4,6" ... a higher number followed by a lower would be a
new group.
A SQL Server 2012 solution could be as follows.
Use LAG to access the previous row and set a flag to 1 if that row is the start of a new group or 0 otherwise.
Calculate a running sum of these flags to use as the grouping value.
Code
WITH T1 AS
(
    SELECT *,
           LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
    FROM YourTable
), T2 AS
(
    SELECT *,
           IIF(PrevRowNumber IS NULL OR PrevRowNumber > RowNumber, 1, 0) AS NewGroup
    FROM T1
)
SELECT ID,
       RowNumber,
       Data,
       SUM(NewGroup) OVER (ORDER BY ID
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
Assuming ID is the clustered index, the plan for this has one scan against YourTable and avoids any sort operations.
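The flag-and-running-sum technique can be tried on the clarified example sequences, "1,1,2,2,3,4" followed by "1,2,4,6". The sketch below runs in SQLite through Python's sqlite3 module, which is an assumption made for a self-contained example; CASE is used in place of IIF so that it also works on SQLite builds that predate IIF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER PRIMARY KEY, RowNumber INTEGER, Data TEXT)")
row_numbers = [1, 1, 2, 2, 3, 4, 1, 2, 4, 6]
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, 'Data')",
    [(i + 1, n) for i, n in enumerate(row_numbers)],
)

# Step 1: look at the previous row's RowNumber.
# Step 2: flag a row as a group start when there is no previous row
#         or the previous RowNumber is higher than the current one.
# Step 3: a running sum of the flags yields the group number.
query = """
WITH T1 AS (
    SELECT ID, RowNumber, Data,
           LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
    FROM YourTable
), T2 AS (
    SELECT ID, RowNumber, Data,
           CASE WHEN PrevRowNumber IS NULL OR PrevRowNumber > RowNumber
                THEN 1 ELSE 0 END AS NewGroup
    FROM T1
)
SELECT ID, RowNumber,
       SUM(NewGroup) OVER (ORDER BY ID
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
ORDER BY ID
"""
groups = [grp for _id, _row_number, grp in conn.execute(query)]
print(groups)
```

Repeated RowNumbers such as "1,1" stay in the same group because only a drop in RowNumber starts a new one.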

If the ids are truly sequential, you can do:
select t.*,
(id - rowNumber) as grp
from t

You can also use a recursive CTE:
;WITH cte AS
(
    SELECT ID, RowNumber, Data, 1 AS [Group]
    FROM dbo.test1
    WHERE ID = 1
    UNION ALL
    SELECT t.ID, t.RowNumber, t.Data,
           CASE WHEN t.RowNumber != 1 THEN c.[Group] ELSE c.[Group] + 1 END
    FROM dbo.test1 t JOIN cte c ON t.ID = c.ID + 1
)
SELECT *
FROM cte

How about:
select ID, RowNumber, Data, dense_rank() over (order by grp) as Grp
from (
    select *, (select min(ID) from [Your Table] where ID > t.ID and RowNumber = 1) as grp
    from [Your Table] t
) t
order by ID
This should work on SQL Server 2005. You could also use rank() instead if you don't care about consecutive numbers.
