SQLite: average of column indexed by two columns


Given the following table
+----+----+------+
|id1 |id2 |value |
+----+----+------+
| 1 | 2 | 10 |
| 1 | 3 | 20 |
| 1 | 4 | 30 |
| 2 | 3 | 10 |
| 2 | 4 | 40 |
| 3 | 4 | 10 |
+----+----+------+
I want the avg(value) for each id, whether the id appears in the id1 or the id2 column.
Thus, the output should be:
1,20
2,20
3,13.33
4,26.67
Help would be greatly appreciated.

You could use UNION ALL:
WITH cte AS (
SELECT id1 AS id, value FROM tab
UNION ALL
SELECT id2, value FROM tab
)
SELECT id, AVG(value) AS value
FROM cte
GROUP BY id;
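As a rough, runnable illustration of the UNION ALL approach, here is a sketch that uses Python's built-in sqlite3 module with an in-memory database. The table name tab follows the answer; the ROUND call and the ORDER BY are additions made only to keep the printed output tidy.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id1 INTEGER, id2 INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [(1, 2, 10), (1, 3, 20), (1, 4, 30), (2, 3, 10), (2, 4, 40), (3, 4, 10)],
)

# Stack both id columns into a single column, then average per id.
query = """
WITH cte AS (
    SELECT id1 AS id, value FROM tab
    UNION ALL
    SELECT id2, value FROM tab
)
SELECT id, ROUND(AVG(value), 2) AS value
FROM cte
GROUP BY id
ORDER BY id
"""
averages = conn.execute(query).fetchall()
for row in averages:
    print(row)
```

Each row contributes its value once to id1 and once to id2. For id 3, for example, the matching values are 20, 10 and 10, so the average comes out at about 13.33.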

Related

Selecting rows that don't have duplicates

Let's say I have the following table:
| sku | id | value | count |
|-----|----|-------|-------|
| A | 1 | 1 | 2 |
| A | 1 | 2 | 2 |
| A | 3 | 3 | 3 |
I want to select rows that don't have the same count for the same id. So my desired outcome is:
| sku | id | value | count |
|-----|----|-------|-------|
| A | 3 | 3 | 3 |
I need something that works with Postgres 10

A simple method is window functions:
select t.*
from (select t.*, count(*) over (partition by sku, id) as cnt
from t
) t
where cnt = 1;
This assumes you really mean the sku/id combination.
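The window-function idea can be tried out with a small sketch. The question targets Postgres 10, but the example below runs the same kind of query through Python's built-in sqlite3 module, purely as an assumption-laden stand-in (it needs a bundled SQLite of 3.25 or newer for window functions). The subquery alias sub is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "count" is quoted because it is also the name of an aggregate function.
conn.execute('CREATE TABLE t (sku TEXT, id INTEGER, value INTEGER, "count" INTEGER)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [("A", 1, 1, 2), ("A", 1, 2, 2), ("A", 3, 3, 3)],
)

# Count the rows sharing each sku/id pair and keep only the pairs that occur once.
query = """
SELECT sku, id, value, "count"
FROM (
    SELECT t.*, COUNT(*) OVER (PARTITION BY sku, id) AS cnt
    FROM t
) AS sub
WHERE cnt = 1
"""
unique_rows = conn.execute(query).fetchall()
print(unique_rows)
```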

Get maximum of sequence

+----+-------+
| id | value |
+----+-------+
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | D |
| 5 | D |
| 6 | D |
| 7 | N |
| 8 | P |
| 9 | P |
+----+-------+
Desired output
+----+-------+---------------------+
| id | value | calc ↓ |
+----+-------+---------------------+
| 1 | A | 1 |
| 2 | B | 2 |
| 3 | C | 3 |
| 4 | D | 6 |
| 5 | D | 6 |
| 6 | D | 6 |
| 7 | N | 7 |
| 8 | P | 9 |
| 9 | P | 9 |
| 10 | D | 11 |
| 11 | D | 11 |
| 12 | Z | 12 |
+----+-------+---------------------+
Can you help me find a solution for this? id is an identity column and must be present in the output, and the output must have the same 9 rows.
New note: I added rows 10, 11 and 12. Notice that ids 10 and 11, which have the letter 'D', are in a different group from ids 4, 5 and 6.
Thanks.

If the grouping also depends on the surrounding ids then this turns into something like the gaps and islands problem: https://www.red-gate.com/simple-talk/sql/t-sql-programming/the-sql-of-gaps-and-islands-in-sequences/
You could use the Tabibitosan method https://rwijk.blogspot.com/2014/01/tabibitosan.html
Here you also need to group by your value column but that doesn't complicate it too much:
select id, value, max(id) over (partition by value, island) calc
from (
select id, value, id - row_number() over(partition by value order by id) island
from my_table
) as sq
order by id;
The expression id - row_number() over(partition by value order by id) yields a number that stays constant while the ids for a given value are consecutive, and changes whenever there is a gap of more than 1. That number is then used, together with value, as the partition in max(id) over (partition by value, island). An island number is only meaningful within its own value: in your data, N (id 7) and the second run of D (ids 10 and 11) both get island number 6, which is why value has to be part of the partition.
Db-fiddle https://www.db-fiddle.com/f/jahP7T6xBt3cpbLRhZZdQG/1
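To see the island numbers at work, here is a minimal sketch that runs the Tabibitosan query through Python's built-in sqlite3 module (an assumption made only for illustration; it needs a bundled SQLite of 3.25 or newer). The table name my_table comes from the answer, and the 12 rows are taken from the desired output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, value TEXT)")
letters = ["A", "B", "C", "D", "D", "D", "N", "P", "P", "D", "D", "Z"]
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    list(enumerate(letters, start=1)),
)

# id - row_number() is constant inside each run of consecutive ids per value,
# so (value, island) identifies one run and max(id) gives its last id.
query = """
SELECT id, value, MAX(id) OVER (PARTITION BY value, island) AS calc
FROM (
    SELECT id, value,
           id - ROW_NUMBER() OVER (PARTITION BY value ORDER BY id) AS island
    FROM my_table
) AS sq
ORDER BY id
"""
result = conn.execute(query).fetchall()
calc = [row[2] for row in result]
print(calc)
```

With these rows, the two runs of D (ids 4 to 6 and ids 10 to 11) end up in different partitions, so they get different calc values.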

For this sample data you need the MAX() window function:
SELECT id, value,
MAX(id) OVER (PARTITION BY value) calc
FROM tablename

SELECT o.id, o.value,
(SELECT MAX(i.id) FROM tablename i WHERE i.value = o.value) AS calc
FROM tablename o

SQL group by changing column

Suppose I have a table sorted by date as so:
+-------------+--------+
| DATE | VALUE |
+-------------+--------+
| 01-09-2020 | 5 |
| 01-15-2020 | 5 |
| 01-17-2020 | 5 |
| 02-03-2020 | 8 |
| 02-13-2020 | 8 |
| 02-20-2020 | 8 |
| 02-23-2020 | 5 |
| 02-25-2020 | 5 |
| 02-28-2020 | 3 |
| 03-13-2020 | 3 |
| 03-18-2020 | 3 |
+-------------+--------+
I want to group by changes in value within that given date range, and add a value that increments each time as an added column to denote that.
I have tried a number of different things, such as using the lag function:
SELECT value, value - lag(value) over (order by date) as count
GROUP BY value
In short, I want to take the table above and have it look like:
+-------------+--------+-------+
| DATE | VALUE | COUNT |
+-------------+--------+-------+
| 01-09-2020 | 5 | 1 |
| 01-15-2020 | 5 | 1 |
| 01-17-2020 | 5 | 1 |
| 02-03-2020 | 8 | 2 |
| 02-13-2020 | 8 | 2 |
| 02-20-2020 | 8 | 2 |
| 02-23-2020 | 5 | 3 |
| 02-25-2020 | 5 | 3 |
| 02-28-2020 | 3 | 4 |
| 03-13-2020 | 3 | 4 |
| 03-18-2020 | 3 | 4 |
+-------------+--------+-------+
I want to eventually have it all in one small table with the earliest date for each.
+-------------+--------+-------+
| DATE | VALUE | COUNT |
+-------------+--------+-------+
| 01-09-2020 | 5 | 1 |
| 02-03-2020 | 8 | 2 |
| 02-23-2020 | 5 | 3 |
| 02-28-2020 | 3 | 4 |
+-------------+--------+-------+
Any help would be very appreciated

You can use a combination of the Row_number and Dense_rank functions to get the required results, like below:
;with cte
as
(
select t.DATE,t.VALUE
-- the difference of the two row numbers is constant within each run of equal values
,Row_number() over(order by t.DATE)
- Row_number() over(partition by t.VALUE order by t.DATE) as grp
from tablename t
)
Select min(DATE) as DATE,VALUE
,Dense_rank() over(order by min(DATE)) as count
from cte
group by VALUE,grp
order by min(DATE)

You can use a lag and cumulative sum and a subquery:
SELECT date, value,
SUM(CASE WHEN prev_value = value THEN 0 ELSE 1 END) OVER (ORDER BY date) AS count
FROM (SELECT t.*, LAG(value) OVER (ORDER BY date) as prev_value
FROM t
) t

You can successively apply the lag() and row_number() analytic functions, and filter on the inequality between each value and the previous one:
WITH t2 AS
(
SELECT LAG(value,1,value-1) OVER (ORDER BY date) as lg,
t.*
FROM t
)
SELECT t2.date,t2.value, ROW_NUMBER() OVER (ORDER BY t2.date) as count
FROM t2
WHERE value - lg != 0
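The lag-plus-cumulative-sum idea can be sketched end to end as follows. This uses Python's built-in sqlite3 module purely for illustration (it assumes a bundled SQLite of 3.25 or newer), stores the dates as ISO strings so that they sort correctly, and uses the made-up column alias grp for the running group number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("2020-01-09", 5), ("2020-01-15", 5), ("2020-01-17", 5),
        ("2020-02-03", 8), ("2020-02-13", 8), ("2020-02-20", 8),
        ("2020-02-23", 5), ("2020-02-25", 5),
        ("2020-02-28", 3), ("2020-03-13", 3), ("2020-03-18", 3),
    ],
)

# Step 1: flag every row whose value differs from the previous row's value,
# then take a running sum of the flags to number the groups.
numbered = """
SELECT date, value,
       SUM(CASE WHEN prev_value = value THEN 0 ELSE 1 END)
           OVER (ORDER BY date) AS grp
FROM (
    SELECT t.*, LAG(value) OVER (ORDER BY date) AS prev_value
    FROM t
) AS sub
"""
groups = [row[2] for row in conn.execute(numbered + " ORDER BY date")]

# Step 2: collapse each group to its earliest date.
summary = conn.execute(
    f"SELECT MIN(date), value, grp FROM ({numbered}) AS g "
    "GROUP BY grp, value ORDER BY grp"
).fetchall()
print(groups)
print(summary)
```

The first row has no predecessor, so LAG returns NULL, the comparison is not true, and the CASE falls through to 1, which starts the numbering at 1.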

How to sum rows before a condition is met in SQL

I have a table which has multiple records for the same id. Looks like this, and the rows are sorted by sequence number.
+----+--------+----------+----------+
| id | result | duration | sequence |
+----+--------+----------+----------+
| 1 | 12 | 7254 | 1 |
+----+--------+----------+----------+
| 1 | 12 | 2333 | 2 |
+----+--------+----------+----------+
| 1 | 11 | 1000 | 3 |
+----+--------+----------+----------+
| 1 | 6 | 5 | 4 |
+----+--------+----------+----------+
| 1 | 3 | 20 | 5 |
+----+--------+----------+----------+
| 2 | 1 | 230 | 1 |
+----+--------+----------+----------+
| 2 | 9 | 10 | 2 |
+----+--------+----------+----------+
| 2 | 6 | 0 | 3 |
+----+--------+----------+----------+
| 2 | 1 | 5 | 4 |
+----+--------+----------+----------+
| 2 | 12 | 3 | 5 |
+----+--------+----------+----------+
E.g. for id=1, I would like to sum the duration of all rows up to and including the row where result=6, which is 7254+2333+1000+5. Likewise for id=2, it would be 230+10+0. Anything after the row where result=6 is left out.
My expected output:
+----+----------+
| id | duration |
+----+----------+
| 1 | 10592 |
+----+----------+
| 2 | 240 |
+----+----------+
The sequence has to be in ascending order.
I'm not sure how I can do this in sql.
Thank you in advance!

I think you want:
select t.id, sum(t.duration)
from t
where t.sequence <= (select t2.sequence
from t t2
where t2.id = t.id and t2.result = 6
)
group by t.id;
In PrestoDB, I would recommend window functions:
select id, sum(duration)
from (select t.*,
min(case when result = 6 then sequence end) over (partition by id) as sequence_6
from t
) t
where sequence <= sequence_6
group by id;

You can use a simple aggregate query with a condition that uses a subquery to recover the sequence of the record whose result is 6:
SELECT t.id, SUM(t.duration) total_duration
FROM mytable t
WHERE t.sequence <= (
SELECT sequence
FROM mytable
WHERE id = t.id AND result = 6
)
GROUP BY t.id
With your test data, this returns:
| id | total_duration |
| --- | -------------- |
| 1 | 10592 |
| 2 | 240 |
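Both approaches, the correlated subquery and the window function, can be checked side by side with a short sketch. It runs against an in-memory SQLite database through Python's built-in sqlite3 module, which is an assumption made for illustration only (the window version needs SQLite 3.25 or newer). The table name mytable follows the answer above, and the alias sub is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, result INTEGER, duration INTEGER, sequence INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, 12, 7254, 1), (1, 12, 2333, 2), (1, 11, 1000, 3), (1, 6, 5, 4), (1, 3, 20, 5),
        (2, 1, 230, 1), (2, 9, 10, 2), (2, 6, 0, 3), (2, 1, 5, 4), (2, 12, 3, 5),
    ],
)

# Variant 1: correlated subquery that looks up the sequence of the result = 6 row.
subquery_totals = conn.execute("""
SELECT t.id, SUM(t.duration) AS total_duration
FROM mytable t
WHERE t.sequence <= (
    SELECT sequence FROM mytable WHERE id = t.id AND result = 6
)
GROUP BY t.id
ORDER BY t.id
""").fetchall()

# Variant 2: window function that attaches that sequence to every row of the id.
window_totals = conn.execute("""
SELECT id, SUM(duration) AS total_duration
FROM (
    SELECT t.*,
           MIN(CASE WHEN result = 6 THEN sequence END)
               OVER (PARTITION BY id) AS sequence_6
    FROM mytable t
) AS sub
WHERE sequence <= sequence_6
GROUP BY id
ORDER BY id
""").fetchall()

print(subquery_totals)
print(window_totals)
```

One difference worth keeping in mind: if an id had more than one row with result = 6, the scalar subquery would be ambiguous, while the MIN in the window version would pick the earliest such row.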

A basic GROUP BY query should solve your issue:
select
id,
sum(duration) duration
from t
group by id
For a specific id:
select
id,
sum(duration) duration
from t
where id = 1
group by id
If you want to include the totals in your result set:
select id, duration, sequence from t
union all
select
id,
sum(duration) duration,
null sequence
from t
group by id

PostgreSQL - select count of repeated continuous sequences

I have the following table/data:
| user_id | action_id | data |
-------------------------------------
| 10 | 1 | fly |
| 10 | 2 | train |
| 10 | 3 | fly |
| 10 | 4 | fly |
| 10 | 5 | fly |
| 10 | 6 | train |
| 10 | 7 | fly |
| 10 | 8 | train |
| 10 | 9 | fly |
| 10 | 10 | fly |
Is there a way in postgresql to count repeated continuous 'fly' occurrences? In this example, the results should be:
counts
------
1
3
1
2

Yes, it's possible, using the lag window function and a cumulative sum:
with FlagCTE as (
select t.action_id, t.data,
case when t.data = 'fly' and t.data = lag(t.data) over (order by t.action_id) then 0 else 1 end as Flag
from some_table t),
GroupCTE as (
select t.action_id,
t.data,
sum(t.Flag) over (order by t.action_id) as GroupId
from FlagCTE t
where t.data = 'fly')
select count(*) as counts
from GroupCTE t
group by t.GroupId
order by t.GroupId
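Here is a minimal sketch of the same flag-and-cumulative-sum query, run against the sample data. The question is about PostgreSQL; the example uses Python's built-in sqlite3 module only as a convenient stand-in (an assumption, and it needs SQLite 3.25 or newer for window functions). The table name some_table comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (user_id INTEGER, action_id INTEGER, data TEXT)")
actions = ["fly", "train", "fly", "fly", "fly", "train", "fly", "train", "fly", "fly"]
conn.executemany(
    "INSERT INTO some_table VALUES (10, ?, ?)",
    list(enumerate(actions, start=1)),
)

# Flag is 1 on the first 'fly' of each run and 0 on the following ones; the
# running sum over the 'fly' rows then gives every run its own GroupId.
query = """
WITH FlagCTE AS (
    SELECT t.action_id, t.data,
           CASE WHEN t.data = 'fly'
                 AND t.data = LAG(t.data) OVER (ORDER BY t.action_id)
                THEN 0 ELSE 1 END AS Flag
    FROM some_table t
),
GroupCTE AS (
    SELECT t.action_id, t.data,
           SUM(t.Flag) OVER (ORDER BY t.action_id) AS GroupId
    FROM FlagCTE t
    WHERE t.data = 'fly'
)
SELECT COUNT(*) AS counts
FROM GroupCTE t
GROUP BY t.GroupId
ORDER BY t.GroupId
"""
counts = [row[0] for row in conn.execute(query)]
print(counts)
```

The lag is computed in FlagCTE over all rows, before the 'train' rows are filtered out, which is what lets a 'train' row break a run of 'fly' rows.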
