How to query and get only consecutive rows with the same column(s) value(s)

How to query and get only consecutive rows with the same column(s) value(s) - sql

I have this activities table generated by the public_activity gem. Which has this two fields key owner_id part from others.
I'm trying to query the activities in chunks base on the owner_id and the key columns.
So if we have this data:
| id | owner_id | key |
| ---- | ---------- | -------------- |
| 1 | 1 | product.create |
| 2 | 1 | product.create |
| 3 | 1 | product.update |
| 4 | 2 | product.update |
| 5 | 2 | product.destroy |
| 6 | 2 | product.destroy |
| 7 | 2 | product.destroy |
| 8 | 1 | product.create |
| 9 | 1 | product.create |
The chunks would the matching rows.
Represented here by ids:
[1,2]
[3]
[4]
[5,6,7]
[8,9]
Chunks will be loaded one at a time via ajax, so the query will be preceded with a condition where(id > the_last_activity_fetched_id)
Any thoughts on how to build the necessary query are really welcome.

You can achieve this using array_agg:
select key,owner_id , array_agg(id) chunks from
(select key,owner_id,id from my_table order by id) A
group by key,owner_id
order by chunks
Here is an example: SQLFIDDLE

Using array_agg()
select owner_id,key, array_agg(id) chunks
from t
group by 1,2
order by 3
Demo
or
string_agg()
select owner_id,key,string_agg(id::text,',') chunks
from t
group by 1,2
order by 3
Demo
As per OP's comment
select owner_id,key
, array_agg(id) chunks
from t
group by 1,2 order by 3 desc limit 1

Related

More efficient way to SELECT rows from PARTITION BY

Suppose I have the following table:
+----+-------------+-------------+
| id | step_number | employee_id |
+----+-------------+-------------+
| 1 | 1 | 3 |
| 1 | 2 | 3 |
| 1 | 3 | 4 |
| 2 | 2 | 3 |
| 2 | 3 | 4 |
| 2 | 4 | 5 |
+----+-------------+-------------+
My desired results are:
+----+-------------+-------------+
| id | step_number | employee_id |
+----+-------------+-------------+
| 1 | 1 | 3 |
| 2 | 2 | 3 |
+----+-------------+-------------+
My current solution is:
SELECT
*
FROM
(SELECT
id,
step_number,
MIN(step_number) OVER (PARTITION BY id) AS min_step_number,
employee_id
FROM
table_name) AS t
WHERE
t.step_number = t.min_step_number
Is there a more efficient way I could be doing this?
I'm currently using postgresql, version 12.

In Postgres, I would recommend using distinct on to adress this greatest-n-per-group problem:
select distinct on (id) t.*
from mytbale t
order by id, step_number
This Postgres extension to the SQL standard has usually better performance than the standard approach using window functions (and, as a bonus, the syntax is neater).
Note that this assumes unicity of (id, step_number) tuples: otherwise, the results might be different than those of your query (which allows ties, while distinct on does not).

How to select values, where each one depends on a previously aggregated state?

I have the following table:
|-----|-----|
| i d | val |
|-----|-----|
| 1 | 1 |
|-----|-----|
| 2 | 4 |
|-----|-----|
| 3 | 3 |
|-----|-----|
| 4 | 7 |
|-----|-----|
Can I get the following output:
|-----|
| sum |
|-----|
| 1 |
|-----|
| 5 |
|-----|
| 8 |
|-----|
| 1 5 |
|-----|
using a single SQLite3 SELECT-query? I know it could be easily achieved using variables, but SQLite3 lacks those. Maybe some recursive query? Thanks.

No.
In a relational database table rows do not have any order. If you specify an order for the rows, then it's possible to write a query.
Now, you could add an extra column to sort the rows. For example:
| val | sort
|-----|-----
| 1 | 10
| 4 | 20
| 3 | 30
| 7 | 40
The query could be:
select
sum(val) over(order by sort)
from my_table

For the updated question, you can write:
select
sum(val) over(order by id)
from my_table

By using the order of the id column and if you want only the sum column, you can do this:
select (select sum(val) from tablename where id <= t.id) sum
from tablename t

calculating sum of rows with identical id

Let's imagine a table with two columns ex:
| Value | ID |
+-------+----+
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| 1 | 2 |
| 2 | 2 |
| 2 | 2 |
What I am trying to do is to calculate the sum of those with similar id and display them in different table like:
| Sum | ID |
+-----+----+
| 9 | 1 |
| 5 | 2 |
and so on.
I could find a sum of a known id by
SELECT SUM(VALUE) FROM MYTABLE WHERE ID = 1;
However not sure on how to find sum of different id's separately, could you give an idea on how to proceed?

Select SUM(VALUE),ID FROM MYTABLE GROUP BY ID

Use GROUP BY clause:
SELECT SUM(VALUE) Sum, ID FROM MYTABLE GROUP BY ID;

SELECT SUM(VALUE),ID FROM MYTABLE Group By ID

First two rows per combination of two columns

Given a table like this in PostgreSQL:
Messages
message_id | creating_user_id | receiving_user_id | created_utc
-----------+------------------+-------------------+-------------
1 | 1 | 2 | 1424816011
2 | 3 | 2 | 1424816012
3 | 3 | 2 | 1424816013
4 | 1 | 3 | 1424816014
5 | 1 | 3 | 1424816015
6 | 2 | 1 | 1424816016
7 | 2 | 1 | 1424816017
8 | 1 | 2 | 1424816018
I want to get the newest two rows per creating_user_id/receiving_user_id where the other user_id is 1. So the result of the query should look like:
message_id | creating_user_id | receiving_user_id | created_utc
-----------+------------------+-------------------+-------------
1 | 1 | 2 | 1424816011
4 | 1 | 3 | 1424816014
5 | 1 | 3 | 1424816015
6 | 2 | 1 | 1424816016
Using a window function with row_number() I can get the first 2 messages for each creating_user_id or the first 2 messages for each receiving_user_id, but I'm not sure how to get the first two messages for per creating_user_id/receiving_user_id.

Since you filter rows where one of both columns is 1 (and irrelevant), and 1 happens to be the smallest number of all, you can simply use GREATEST(creating_user_id, receiving_user_id) to distill the relevant number to PARTITION BY. (Else you could employ CASE.)
The rest is standard procedure: calculate a row number in a subquery and select the first two in the outer query:
SELECT message_id, creating_user_id, receiving_user_id, created_utc
FROM (
SELECT *
, row_number() OVER (PARTITION BY GREATEST (creating_user_id
, receiving_user_id)
ORDER BY created_utc) AS rn
FROM messages
WHERE 1 IN (creating_user_id, receiving_user_id)
) sub
WHERE rn < 3
ORDER BY created_utc;
Exactly your result.
SQL Fiddle.

selecting data with highest field value in a field

I have a table, and I'd like to select rows with the highest value. For example:
----------------
| user | index |
----------------
| 1 | 1 |
| 2 | 1 |
| 2 | 2 |
| 3 | 4 |
| 3 | 7 |
| 4 | 1 |
| 5 | 1 |
----------------
Expected result:
----------------
| user | index |
----------------
| 1 | 1 |
| 2 | 2 |
| 3 | 7 |
| 4 | 1 |
| 5 | 1 |
----------------
How may I do so? I assume it can be done by some oracle function I am not aware of?
Thanks in advance :-)

You can use MAX() function for that with grouping user column like this:
SELECT "user"
,MAX("index") AS "index"
FROM Table1
GROUP BY "user"
ORDER BY "user";
Result:
| USER | INDEX |
----------------
| 1 | 1 |
| 2 | 2 |
| 3 | 7 |
| 4 | 1 |
| 5 | 1 |
See this SQLFiddle

if you have more than one column
select user , index
from (
select u.* , row_number() over (partition by user order by index desc) as rnk
from some_table u)
where rnk = 1
user is a reserved word - you should use a different name for the column.

select user,max(index) index from tbl
group by user;

Alternatively, you can use analytic functions:
select user,index, max(index) over (partition by user order by 1 ) highest from YOURTABLE
Note: Try NOT to use words like user, index, date etc.. as your column names, as they are reserved words for Oracle. If you will use, then use them with quotation marks, eg. "index", "date"...

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How to query and get only consecutive rows with the same column(s) value(s) - sql

You can achieve this using array_agg: select key,owner_id , array_agg(id) chunks from (select key,owner_id,id from my_table order by id) A group by key,owner_id order by chunks Here is an example: SQLFIDDLE

Related

More efficient way to SELECT rows from PARTITION BY

How to select values, where each one depends on a previously aggregated state?

calculating sum of rows with identical id

First two rows per combination of two columns

selecting data with highest field value in a field

Categories

Resources