Aggregate rows between two rows with certain value - sql

I'm trying to formulate a query to aggregate rows that are between rows with a specific value: in this example I want to collapse and sum time of all rows that have an ID other than 1, but still show rows with ID 1.
This is my table:
ID | Time
----+-----------
1 | 60
2 | 10
3 | 15
1 | 30
4 | 100
1 | 20
This is the result I'm looking for:
ID | Time
--------+-----------
1 | 60
Other | 25
1 | 30
Other | 100
1 | 20
I have attempted to use SUM with a CASE condition, but so far my solutions only sum ALL rows and I lose the intervals, so I get this:
ID | Time
------------+-----------
Other | 125
1 | 110
Any help or suggestions in the right direction would be greatly appreciated, thanks!

You need to define the groupings. SQLite is not great for this sort of manipulation, but you can do it by counting the rows with id = 1 up to and including each row; that running count identifies the interval each row belongs to.
In SQLite, we can use the rowid column for the ordering:
select (case when id = 1 then '1' else 'other' end) as which,
       sum(time)
from (select t.*,
             (select count(*) from t t2 where t2.rowid <= t.rowid and t2.id = 1) as grp
      from t
     ) t
group by (case when id = 1 then '1' else 'other' end), grp
order by grp, which;
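To try the query above against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the total_time alias are assumptions made for illustration, and rowid order is taken to be insertion order.

```python
import sqlite3

# In-memory table; "t", "id" and "time" follow the names used in the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, time INTEGER)")
conn.executemany(
    "INSERT INTO t (id, time) VALUES (?, ?)",
    [(1, 60), (2, 10), (3, 15), (1, 30), (4, 100), (1, 20)],
)

query = """
SELECT (CASE WHEN id = 1 THEN '1' ELSE 'other' END) AS which,
       SUM(time) AS total_time
FROM (SELECT t.*,
             -- number of id = 1 rows seen so far, in rowid (insertion) order
             (SELECT COUNT(*) FROM t t2
              WHERE t2.rowid <= t.rowid AND t2.id = 1) AS grp
      FROM t) AS s
GROUP BY (CASE WHEN id = 1 THEN '1' ELSE 'other' END), grp
ORDER BY grp, which
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each id = 1 row starts a new grp value, so the rows that follow it are summed separately from the rows of the previous interval.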

Related

Sql assign unique key to groups having particular pattern

Hi, I am trying to group data based on a particular pattern.
I have a table with two columns, as below:
Name rollingsum
A 5
A 10
A 0
A 5
A 0
B 6
B 0
I need to generate a key column that increments only after a row with rollingsum equal to 0 is encountered, as shown below:
Name rollingsum key
A 5 1
A 10 1
A 0 1
A 5 2
A 0 2
B 6 3
B 0 3
I am using Postgres. I tried to increment a variable in a CASE expression, as below:
Declare a int;
a:=1;
........etc
Case when rolling sum =0 then a:=a+1 else a end as key
But I am getting an error near :
Thanks in advance for all help
You need an ordering column because the results depend on the ordering of the rows, and SQL tables represent unordered sets.
Then do a cumulative sum of the 0 counts from the end of the data. That is in reverse order, so subtract that from the total:
select t.*,
       (1 + sum( (rollingsum = 0)::int ) over () -
        sum( (rollingsum = 0)::int ) over (order by ordercol desc)
       ) as key
from t;
Assuming that you have a column called id to order the rows, here is one option using a cumulative count and a window frame:
select name, rollingsum,
       1 + count(*) filter(where rollingsum = 0) over(
           order by id
           rows between unbounded preceding and 1 preceding
       ) as key
from mytable
Demo on DB Fiddle:
name | rollingsum | key
:--- | ---------: | --:
A | 5 | 1
A | 10 | 1
A | 0 | 1
A | 5 | 2
A | 0 | 2
B | 6 | 3
B | 0 | 3
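The cumulative-count approach can be checked locally with a short sketch. This one runs on SQLite through Python's sqlite3 module rather than on Postgres, and it assumes SQLite 3.25 or later for window functions. The id column is the assumed ordering column from the answer, and the result column is named grp_key here only to avoid relying on key as an identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "id" is an assumed ordering column; the question's table does not show one.
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, rollingsum INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable (name, rollingsum) VALUES (?, ?)",
    [("A", 5), ("A", 10), ("A", 0), ("A", 5), ("A", 0), ("B", 6), ("B", 0)],
)

query = """
SELECT name, rollingsum,
       -- count the zero rows strictly before the current row, then add 1
       1 + COUNT(*) FILTER (WHERE rollingsum = 0) OVER (
               ORDER BY id
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS grp_key
FROM mytable
ORDER BY id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the frame stops at 1 preceding, a row with rollingsum = 0 still belongs to the group it closes, and the key only increases on the next row.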

SQL select all rows per group after a condition is met

I would like to select all rows for each group after the last time a condition is met for that group. This related question has an answer using correlated subqueries.
In my case I will have millions of categories and hundreds of millions/billions of rows. Is there a way to achieve the same results using a more performant query?
Here is an example. The condition is all rows (per group) after the last 0 in the conditional column.
category | timestamp | condition
--------------------------------------
A | 1 | 0
A | 2 | 1
A | 3 | 0
A | 4 | 1
A | 5 | 1
B | 1 | 0
B | 2 | 1
B | 3 | 1
The result I would like to achieve is
category | timestamp | condition
--------------------------------------
A | 4 | 1
A | 5 | 1
B | 2 | 1
B | 3 | 1
If you want everything after the last 0, you can use window functions:
select t.*
from (select t.*,
             max(case when condition = 0 then timestamp end) over (partition by category) as max_timestamp_0
      from t
     ) t
where timestamp > max_timestamp_0 or
      max_timestamp_0 is null;
With an index on (category, condition, timestamp), the correlated subquery version might also perform quite well:
select t.*
from t
where t.timestamp > all (select t2.timestamp
                         from t t2
                         where t2.category = t.category and
                               t2.condition = 0
                        );
You might want to try window functions:
select category, timestamp, condition
from (
    select
        t.*,
        min(condition) over(partition by category order by timestamp desc) min_cond
    from mytable t
) t
where min_cond = 1
The window min() with the order by clause computes the minimum value of condition over the current and following rows of the same category: we can use it as a filter to eliminate rows for which there is a more recent row with a 0.
Compared to the correlated subquery approach, the upside of using window functions is that it reduces the number of scans needed on the table. Of course this computation also has a cost, so you'll need to assess both solutions against your sample data.
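As a rough way to compare the two window-function queries above, the sketch below runs both on the sample data and checks that they return the same rows. It uses SQLite via Python's sqlite3 module (SQLite 3.25 or later is assumed for window functions), a single table named t for both queries, and an added ORDER BY so the output is deterministic. It says nothing about performance at the scale described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (category TEXT, timestamp INTEGER, condition INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("A", 1, 0), ("A", 2, 1), ("A", 3, 0), ("A", 4, 1), ("A", 5, 1),
     ("B", 1, 0), ("B", 2, 1), ("B", 3, 1)],
)

# Variant 1: compare each row with the latest timestamp that has condition = 0.
max_ts_query = """
SELECT category, timestamp, condition
FROM (SELECT t.*,
             MAX(CASE WHEN condition = 0 THEN timestamp END)
                 OVER (PARTITION BY category) AS max_timestamp_0
      FROM t) AS s
WHERE timestamp > max_timestamp_0 OR max_timestamp_0 IS NULL
ORDER BY category, timestamp
"""

# Variant 2: running minimum of condition from the newest row backwards.
min_cond_query = """
SELECT category, timestamp, condition
FROM (SELECT t.*,
             MIN(condition) OVER (PARTITION BY category
                                  ORDER BY timestamp DESC) AS min_cond
      FROM t) AS s
WHERE min_cond = 1
ORDER BY category, timestamp
"""

rows_max_ts = conn.execute(max_ts_query).fetchall()
rows_min_cond = conn.execute(min_cond_query).fetchall()
for row in rows_max_ts:
    print(row)
```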

Replace Union All to join to improve performance

I have a working query that takes 20 minutes to return data. I want to optimize it.
I have table
Incentives:
Transaction_ID | Incentive_On_A | Incentive_On_B | Incentive_On_C
--------------+-----------------+-----------------+---------------
1 | 0 | 0 | 10
2 | 30 | 0 | 0
3 | 0 | 20 | 0
4 | 40 | 0 | 0
Required Output:
Transaction_ID| Product_Category | Incentive_Amt
---------- + -----------------+--------------
1 | A | 30
2 | B | 20
3 | C | 10
4 | A | 40
I am using this query :
select Transaction_ID, 'A' as Product_Category,
Incentive_On_A from Incentives
Union all
select Transaction_ID, 'B' as Product_Category,
Incentive_On_B from Incentives
Union all
select Transaction_ID, 'C' as Product_Category,
Incentive_On_C from Incentives
Is there any way I can optimize this query by replacing UNION ALL with a join?
Thanks a lot for the help.
Edit:
1. Added one more row to both tables.
Note: Basically we are just transposing the data, converting the columns 'Incentive_On_A', 'Incentive_On_B' and 'Incentive_On_C' into a single 'Category' column plus the value from whichever of those 3 columns applies.
You don't need a JOIN here, you just need to unpivot your data:
SELECT transaction_id, REGEXP_SUBSTR(incentive_col, '[^_]*$') AS product_category
     , incentive_amt
FROM (
    SELECT transaction_id, incentive_on_a, incentive_on_b, incentive_on_c
    FROM incentives
) UNPIVOT (
    incentive_amt
    FOR incentive_col IN (incentive_on_a, incentive_on_b, incentive_on_c)
) WHERE incentive_amt > 0;
Whether or not this will actually improve your performance, I could not say. My guess is that with the UNION ALL version of your query you're actually doing a full table scan 3 times.
To start with: this is a bad data model. If each record can only have one value, then just store one value, exactly as shown in your desired output.
As is, you can just add all values and use CASE WHEN to see which value is greater than zero:
select
    transaction_id,
    case when incentive_on_a > 0 then 'A'
         when incentive_on_b > 0 then 'B'
         when incentive_on_c > 0 then 'C'
    end as product_category,
    incentive_on_a + incentive_on_b + incentive_on_c as incentive_amt
from incentives
order by transaction_id;
(However, I still fail to see how such a simple query as the one you are showing can run for twenty minutes.)
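To see that the single-scan CASE query and a UNION ALL query produce the same rows on the sample table, here is a small sketch using SQLite through Python's sqlite3 module instead of Oracle. The WHERE ... > 0 filters in the UNION ALL variant are an addition for this comparison (the query in the question has none), and the CASE variant relies on the assumption that each row has exactly one non-zero incentive column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE incentives (
           transaction_id INTEGER,
           incentive_on_a INTEGER,
           incentive_on_b INTEGER,
           incentive_on_c INTEGER)"""
)
conn.executemany(
    "INSERT INTO incentives VALUES (?, ?, ?, ?)",
    [(1, 0, 0, 10), (2, 30, 0, 0), (3, 0, 20, 0), (4, 40, 0, 0)],
)

# One pass over the table: pick the category with CASE and add the columns.
case_query = """
SELECT transaction_id,
       CASE WHEN incentive_on_a > 0 THEN 'A'
            WHEN incentive_on_b > 0 THEN 'B'
            WHEN incentive_on_c > 0 THEN 'C'
       END AS product_category,
       incentive_on_a + incentive_on_b + incentive_on_c AS incentive_amt
FROM incentives
ORDER BY transaction_id
"""

# Three passes over the table, keeping only the non-zero amounts.
union_query = """
SELECT * FROM (
    SELECT transaction_id, 'A' AS product_category, incentive_on_a AS incentive_amt
    FROM incentives WHERE incentive_on_a > 0
    UNION ALL
    SELECT transaction_id, 'B', incentive_on_b
    FROM incentives WHERE incentive_on_b > 0
    UNION ALL
    SELECT transaction_id, 'C', incentive_on_c
    FROM incentives WHERE incentive_on_c > 0
)
ORDER BY transaction_id
"""

case_rows = conn.execute(case_query).fetchall()
union_rows = conn.execute(union_query).fetchall()
for row in case_rows:
    print(row)
```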

Oracle - Different count clause on the same line

I am looking for a query that returns, on a single result row, two values computed with different conditions.
For example, let's say that I have this table:
ID |VAL
----------
0 | 1
1 | 0
2 | 0
3 | 1
4 | 0
5 | 0
In a single query, I would like to select the number of rows having val = 1 and the total number of rows (and, if possible, the ratio of one count to the other), which would give a result set like this:
nb_lines | nb_val_1 | ratio
---------------------------
6 | 2 | 0.5
I tried something like:
select count(t1.ID), (select count t2.ID
from table t2 where t2.val = 1
)
FROM table t1
But obviously, this syntax doesn't exist (and it wouldn't give me the ratio). How can I write this query?
Try this query, which uses CASE to count only the rows we need.
SELECT nb_lines, nb_val_1, nb_val_0, nb_val_1/nb_val_0 AS ratio
FROM (SELECT COUNT(t1.ID) nb_lines,
             COUNT(CASE WHEN t1.val = 1 THEN 1 ELSE NULL END) nb_val_1,
             COUNT(CASE WHEN t1.val = 0 THEN 1 ELSE NULL END) nb_val_0
      FROM tabless t1);
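The conditional counts can be reproduced with the following sketch, which uses SQLite through Python's sqlite3 module rather than Oracle. The table name vals is hypothetical, and the 1.0 * factor is an adjustment for SQLite, where dividing two integers would otherwise truncate. COUNT ignores NULL, so a CASE with no ELSE branch counts only the matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "vals" is a hypothetical table name standing in for the question's table.
conn.execute("CREATE TABLE vals (id INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO vals VALUES (?, ?)",
    [(0, 1), (1, 0), (2, 0), (3, 1), (4, 0), (5, 0)],
)

query = """
SELECT nb_lines, nb_val_1, nb_val_0,
       1.0 * nb_val_1 / nb_val_0 AS ratio   -- force non-integer division
FROM (SELECT COUNT(id) AS nb_lines,
             COUNT(CASE WHEN val = 1 THEN 1 END) AS nb_val_1,
             COUNT(CASE WHEN val = 0 THEN 1 END) AS nb_val_0
      FROM vals) AS counts
"""

row = conn.execute(query).fetchone()
print(row)
```

Note that the ratio here is nb_val_1 divided by nb_val_0, as in the answer. If no row has val = 0 the division has a zero denominator, which SQLite returns as NULL, whereas Oracle raises an error.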

SQL join problems - users betting on matches

I have the following table:
scores:
user_id | match_id | points
1 | 110 | 4
1 | 111 | 3
1 | 112 | 3
2 | 111 | 2
Users bet on matches and depending on the result of the match they are awarded with points. Depending on how accurate the bet was you are either awarded with 0, 2, 3 or 4 points for a match.
Now I want to rank the users so that I can see who is in 1st place, 2nd place, etc.
The ranking is ordered first by total_points. If these are equal, it is ordered by the number of times a user has scored 4 points, then by the number of times a user has scored 3 points, and so on.
For that i would need the following table:
user_id | total_points | #_of_fours | #_of_threes | #_of_twos
1 | 10 | 1 | 2 | 0
2 | 2 | 0 | 0 | 1
But I can't figure out the join statements that would help me get it.
This is as far as I get without help:
SELECT user_id, COUNT( points ) AS #_of_fours FROM scores WHERE points = 4 GROUP BY user_id
Which results in
user_id | #_of_fours
1 | 1
2 | 0
Now I would have to do that for #_of_threes and #_of_twos as well as total points and join it all together, but I can't figure out how.
BTW, I'm using MySQL.
Any help would be really appreciated. Thanks in advance.
SELECT user_id
,      sum(points) as total_points
,      sum(case when points = 4 then 1 else 0 end) AS `#_of_fours`
,      sum(case when points = 3 then 1 else 0 end) AS `#_of_threes`
,      sum(case when points = 2 then 1 else 0 end) AS `#_of_twos`
FROM   scores
GROUP BY
       user_id
Using MySQL syntax, you can use SUM to count the matching rows easily, because a comparison evaluates to 1 or 0:
SELECT
    user_id,
    SUM(points) AS total_points,
    SUM(points=4) AS no_of_fours,
    SUM(points=3) AS no_of_threes,
    SUM(points=2) AS no_of_twos
FROM scores
GROUP BY user_id;
Demo here.
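For a quick local check of the conditional sums, including the ranking order described in the question, here is a minimal sketch using SQLite through Python's sqlite3 module in place of MySQL. The ORDER BY clause is an addition based on the question's ranking rules and is not part of either answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (user_id INTEGER, match_id INTEGER, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [(1, 110, 4), (1, 111, 3), (1, 112, 3), (2, 111, 2)],
)

query = """
SELECT user_id,
       SUM(points) AS total_points,
       SUM(CASE WHEN points = 4 THEN 1 ELSE 0 END) AS no_of_fours,
       SUM(CASE WHEN points = 3 THEN 1 ELSE 0 END) AS no_of_threes,
       SUM(CASE WHEN points = 2 THEN 1 ELSE 0 END) AS no_of_twos
FROM scores
GROUP BY user_id
-- ranking: total points first, then number of fours, threes, twos
ORDER BY total_points DESC, no_of_fours DESC, no_of_threes DESC, no_of_twos DESC
"""

ranking = conn.execute(query).fetchall()
for row in ranking:
    print(row)
```

The ELSE 0 branch keeps a count at 0 rather than NULL when a user has no match with that score, which matters if the counts are later compared or added.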