Iterate over the rows of a second table to return resultset with cumulative sum

Yesterday, with the help of an SO user (in the question "Iterate over the rows of a second table to return resultset"), I was able to build the combinations of rows with a self-join.
After some modifications to fit my implementation, I am stuck on a new challenge: how do I compute a cumulative sum of a third column?
My issue is better explained in the image below:
Based on the code
SELECT
    b1.table_a_id,
    b1.label_x,
    b2.label_y
FROM table_a a
INNER JOIN table_b b1
    ON b1.table_a_id = a.table_a_id
INNER JOIN table_b b2
    ON b2.table_a_id = b1.table_a_id AND
       b2.label_y > b1.label_x
ORDER BY
    b1.table_a_id,
    b1.label_x,
    b2.label_y;
I was able to get the combinations.
What should be the next step to get the cumulative sum based on a third column?
I couldn't think of a solution that avoids a second tool, such as Python with pandas and its cumsum function.

To generate the expected resultset, you would need to join the table with itself with an inequality condition on the order column. Then, you can do a window sum:
select
    t1.table_a_id,
    t1.label_x,
    t2.label_y,
    sum(t2.value) over(
        partition by t1.table_a_id, t1.label_x
        order by t1."order", t2."order"
    ) agg_value
from
    table_b t1
    inner join table_b t2
        on t1.table_a_id = t2.table_a_id
        and t2."order" >= t1."order"
order by t1."order", t2."order"
Note: order is a reserved word, so it needs to be quoted; if your actual database column has a different name, you can remove the double quotes.
Demo on DB Fiddle:
TABLE_A_ID | LABEL_X | LABEL_Y | AGG_VALUE
---------: | :------ | :------ | --------:
1 | A | B | 1
1 | A | C | 3
1 | A | D | 6
1 | A | E | 10
1 | A | F | 15
1 | B | C | 2
1 | B | D | 5
1 | B | E | 9
1 | B | F | 14
1 | C | D | 3
1 | C | E | 7
1 | C | F | 12
1 | D | E | 4
1 | D | F | 9
1 | E | F | 5
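
The query above can be tried locally. The following is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module (window functions need SQLite 3.25 or later). The sample rows are an assumption: the original data is not shown, so they were picked to reproduce the demo output.

```python
import sqlite3

# Assumed sample data: column names follow the query above, but the rows
# themselves are a guess chosen to match the demo output.
rows = [
    (1, "A", "B", 1, 1),
    (1, "B", "C", 2, 2),
    (1, "C", "D", 3, 3),
    (1, "D", "E", 4, 4),
    (1, "E", "F", 5, 5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE table_b (table_a_id INTEGER, label_x TEXT, label_y TEXT, '
    '"order" INTEGER, value INTEGER)'
)
conn.executemany("INSERT INTO table_b VALUES (?, ?, ?, ?, ?)", rows)

# Self-join on an inequality, then a running sum inside each
# (table_a_id, label_x) partition.
query = """
SELECT t1.table_a_id, t1.label_x, t2.label_y,
       SUM(t2.value) OVER (
           PARTITION BY t1.table_a_id, t1.label_x
           ORDER BY t1."order", t2."order"
       ) AS agg_value
FROM table_b t1
INNER JOIN table_b t2
    ON t1.table_a_id = t2.table_a_id
   AND t2."order" >= t1."order"
ORDER BY t1."order", t2."order"
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because the window is partitioned by the left-hand row, the running sum restarts for every label_x.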

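You seem to want a cumulative sum:
SELECT b1.table_a_id, b1.label_x, b2.label_y,
       SUM(b2.value) OVER (PARTITION BY b1.table_a_id, b1.label_x
                           ORDER BY b2."order"
                          ) AS agg_value
FROM table_b b1
INNER JOIN table_b b2
    ON b2.table_a_id = b1.table_a_id
   AND b2.label_y > b1.label_x;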

Related

Postgres - Unique values for id column using CTE, Joins alongside GROUP BY

I have a table referrals:
id | user_id_owner | firstname | is_active | user_type | referred_at
----+---------------+-----------+-----------+-----------+-------------
3 | 2 | c | t | agent | 3
5 | 3 | e | f | customer | 5
4 | 1 | d | t | agent | 4
2 | 1 | b | f | agent | 2
1 | 1 | a | t | agent | 1
And another table activations
id | user_id_owner | referral_id | amount_earned | activated_at | app_id
----+---------------+-------------+---------------+--------------+--------
2 | 2 | 3 | 3.0 | 3 | a
4 | 1 | 1 | 6.0 | 5 | b
5 | 4 | 4 | 3.0 | 6 | c
1 | 1 | 2 | 2.0 | 2 | b
3 | 1 | 2 | 5.0 | 4 | b
6 | 1 | 2 | 7.0 | 8 | a
I am trying to generate another table from the two tables that has only unique values for referrals.id and includes, as one of its columns, the count for each app as best_selling_app_count.
Here is the query I ran:
with agents
as
(select
referrals.id,
referral_id,
amount_earned,
referred_at,
activated_at,
activations.app_id
from referrals
left outer join activations
on (referrals.id = activations.referral_id)
where referrals.user_id_owner = 1),
distinct_referrals_by_id
as
(select
id,
count(referral_id) as activations_count,
sum(coalesce(amount_earned, 0)) as amount_earned,
referred_at,
max(activated_at) as last_activated_at
from
agents
group by id, referred_at),
distinct_referrals_by_app_id
as
(select id, app_id as best_selling_app,
count(app_id) as best_selling_app_count
from agents
group by id, app_id )
select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank
from distinct_referrals_by_id
inner join distinct_referrals_by_app_id
on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);
Here is the result I got:
id | activations_count | amount_earned | referred_at | last_activated_at | id | best_selling_app | best_selling_app_count | best_selling_app_rank
----+-------------------+---------------+-------------+-------------------+----+------------------+------------------------+-----------------------
2 | 3 | 14.0 | 2 | 8 | 2 | b | 2 | 1
1 | 1 | 6.0 | 1 | 5 | 1 | b | 1 | 2
2 | 3 | 14.0 | 2 | 8 | 2 | a | 1 | 2
4 | 1 | 3.0 | 4 | 6 | 4 | c | 1 | 2
The problem with this result is that the table has a duplicate id of 2. I only need unique values for the id column.
I tried a workaround using DISTINCT ON that gave the desired result, but I fear the query results may not be reliable and consistent.
Here is the workaround query:
with agents
as
(select
referrals.id,
referral_id,
amount_earned,
referred_at,
activated_at,
activations.app_id
from referrals
left outer join activations
on (referrals.id = activations.referral_id)
where referrals.user_id_owner = 1),
distinct_referrals_by_id
as
(select
id,
count(referral_id) as activations_count,
sum(coalesce(amount_earned, 0)) as amount_earned,
referred_at,
max(activated_at) as last_activated_at
from
agents
group by id, referred_at),
distinct_referrals_by_app_id
as
(select
distinct on (id) id, app_id as best_selling_app,
count(app_id) as best_selling_app_count
from agents
group by id, app_id
order by id, best_selling_app_count desc)
select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank
from distinct_referrals_by_id
inner join distinct_referrals_by_app_id
on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);
I need a recommendation on how best to achieve this.

"I am trying to generate another table from the two tables that has only unique values for referrals.id and returns as one of the columns the count for each apps as best_selling_app_count."
Your question is complicated, and so is the SQL query. However, the sentence quoted above looks like the actual question. If so, you can use:
select r.*,
a.app_id as most_common_app_id,
a.cnt as most_common_app_id_count
from referrals r left join
(select distinct on (a.referral_id) a.referral_id, a.app_id, count(*) as cnt
from activations a
group by a.referral_id, a.app_id
order by a.referral_id, count(*) desc
) a
on a.referral_id = r.id;
You have not explained the other columns that are in your result set.
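
DISTINCT ON is specific to Postgres. As a rough sketch of the same "most common app per referral" idea in a more portable form, the version below uses ROW_NUMBER() and runs in SQLite through Python's sqlite3 module. The tables hold only the columns needed here, with the sample rows from the question. The tie-breaker on app_id is an addition for determinism and is not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE referrals (id INTEGER, user_id_owner INTEGER, firstname TEXT);
CREATE TABLE activations (id INTEGER, referral_id INTEGER, app_id TEXT);
INSERT INTO referrals VALUES (1,1,'a'),(2,1,'b'),(3,2,'c'),(4,1,'d'),(5,3,'e');
INSERT INTO activations VALUES
    (1,2,'b'),(2,3,'a'),(3,2,'b'),(4,1,'b'),(5,4,'c'),(6,2,'a');
""")

# Count activations per (referral, app), rank the apps within each referral,
# and keep only the top-ranked app so every referral id appears once.
query = """
SELECT r.id, r.firstname,
       a.app_id AS most_common_app_id,
       a.cnt    AS most_common_app_id_count
FROM referrals r
LEFT JOIN (
    SELECT referral_id, app_id, cnt,
           ROW_NUMBER() OVER (
               PARTITION BY referral_id
               ORDER BY cnt DESC, app_id   -- app_id breaks ties (assumption)
           ) AS rn
    FROM (
        SELECT referral_id, app_id, COUNT(*) AS cnt
        FROM activations
        GROUP BY referral_id, app_id
    ) counts
) a ON a.referral_id = r.id AND a.rn = 1
ORDER BY r.id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Referral 5 has no activations, so the LEFT JOIN keeps it with NULL (None in Python) in the app columns.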

SQL select all rows that are not equal to an id, and replace the id column with the value - without cross join

Say I have a table like this:
+----+-------+
| id | value |
+----+-------+
| 1 | a |
| 1 | b |
| 2 | c |
| 2 | d |
| 3 | e |
| 3 | f |
+----+-------+
And I want to select all rows whose id is not 1 and change their id to 1; select all rows whose id is not 2 and change their id to 2; and select all rows whose id is not 3 and change their id to 3.
Here is the output I want:
+----+-------+
| id | value |
+----+-------+
| 1 | c |
| 1 | d |
| 1 | e |
| 1 | f |
| 2 | a |
| 2 | b |
| 2 | e |
| 2 | f |
| 3 | a |
| 3 | b |
| 3 | c |
| 3 | d |
+----+-------+
The only solution I can think of is through cross join and distinct:
select distinct a.id, b.value
from table a
cross join table b
where a.id != b.id
Is there any other way that avoids such an expensive operation?

I think the typical way to write this is to generate all pairs of id and value and then remove the ones that exist:
select i.id, v.value
from (select distinct id from t) i cross join
     (select distinct value from t) v left join
     t
     on t.id = i.id and t.value = v.value
where t.id is null;
First, I don't think this is what your query does. But this is what you seem to be describing.
From a performance perspective, you might have other sources for i and v that don't require subqueries. If so, use those for performance.
Finally, I don't think you can do much to improve the performance of this, apart from using explicit tables -- and perhaps having appropriate indexes on all the tables.
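
A minimal sketch of the "generate all pairs, then remove the existing ones" approach, run in SQLite through Python's sqlite3 module with the sample rows from the question. The table name t follows the answer; the comparison with the question's DISTINCT cross join is added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, value TEXT);
INSERT INTO t VALUES (1,'a'),(1,'b'),(2,'c'),(2,'d'),(3,'e'),(3,'f');
""")

# All (id, value) pairs, minus the pairs that already exist in t.
anti_join = conn.execute("""
SELECT i.id, v.value
FROM (SELECT DISTINCT id FROM t) i
CROSS JOIN (SELECT DISTINCT value FROM t) v
LEFT JOIN t ON t.id = i.id AND t.value = v.value
WHERE t.id IS NULL
ORDER BY i.id, v.value
""").fetchall()

# The query from the question, for comparison.
cross_join = conn.execute("""
SELECT DISTINCT a.id, b.value
FROM t a
CROSS JOIN t b
WHERE a.id != b.id
ORDER BY a.id, b.value
""").fetchall()

for row in anti_join:
    print(row)
print("same result on this sample:", anti_join == cross_join)
```

On this sample both queries return the same twelve rows. They can differ when the same value appears under more than one id, because the anti-join drops every pair that already exists in t.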

SQL: CROSS JOIN over table partitions

I have the following table
session_id | page_viewed
1 | A
1 | B
1 | C
2 | B
2 | E
What I would like to do is a cross join of the page_viewed column with itself but where the cross join is done on the partitions from session_id. So, from the table above the query would return:
session_id | page_1 | page_2
1 | A | A
1 | A | B
1 | A | C
1 | B | A
1 | B | B
1 | B | C
1 | C | A
1 | C | B
1 | C | C
2 | B | B
2 | B | E
2 | E | B
2 | E | E
I have looked into window functions today trying to find a way around it but it seems join functions cannot be used. Can anyone help?

You may join giving only the session_id as the join criteria:
SELECT
t1.session_id,
t1.page_viewed AS page_1,
t2.page_viewed AS page_2
FROM yourTable t1
INNER JOIN yourTable t2
ON t1.session_id = t2.session_id;
-- ORDER BY clause optional, if you need it here

Hmmm . . . you seem to want a self-join:
select t1.session_id, t1.page_viewed as page_1, t2.page_viewed as page_2
from t t1 join
t t2
on t1.session_id = t2.session_id
order by t1.session_id, t1.page_viewed, t2.page_viewed;
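
As a quick check, here is a small sketch that runs the self-join in SQLite through Python's sqlite3 module, using the sample rows from the question. The table name t is taken from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (session_id INTEGER, page_viewed TEXT);
INSERT INTO t VALUES (1,'A'),(1,'B'),(1,'C'),(2,'B'),(2,'E');
""")

# Joining the table to itself on session_id pairs every page with every
# page of the same session, which is a cross join per session.
result = conn.execute("""
SELECT t1.session_id, t1.page_viewed AS page_1, t2.page_viewed AS page_2
FROM t t1
JOIN t t2 ON t1.session_id = t2.session_id
ORDER BY t1.session_id, t1.page_viewed, t2.page_viewed
""").fetchall()

for row in result:
    print(row)
```

Session 1 has three pages and session 2 has two, so the result has 3 * 3 + 2 * 2 = 13 rows, and no pair mixes pages from different sessions.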

SQL JOIN two table & show all rows for table A

I have a question about JOIN.
TABLE A
PK | div
---------
A | a
B | b
C | c

TABLE B
PK | div | val
-----------------
1 | a | 10
2 | a | 100
3 | c | 9
4 | c | 99
There are two tables something like above, and I have been trying to join two tables but I want to see all rows from TABLE A.
Something like
SELECT T1.PK, T1.div, T2.val
FROM A T1
LEFT OUTER JOIN B T2
ON T1.div = T2.div
and I want the result to look like this:
PK | div | val |
-------------------------
A | a | 10 |
A | a | 100 |
B | null | null |
C | c | 9 |
C | c | 99 |
I have tried all the JOINs I know, but B doesn't appear because it doesn't exist in TABLE B. Is it possible to show all rows of TABLE A and just show null where there is no match in TABLE B?
Thanks in advance!

If you change your query to
SELECT T1.PK, T2.div, T2.val
FROM A T1
LEFT OUTER JOIN B T2
ON T1.div = T2.div
(note that div comes from T2 here), you'll get exactly the result you posted, though possibly in a different order; add an ORDER BY clause if you want a specific order.
Your query as it stands will get you:
PK | div | val |
-------------------------
A | a | 10 |
A | a | 100 |
B | b | null |
C | c | 9 |
C | c | 99 |
(Note that div is b for the row with the PK of B, not null.)

To get to your resultset, all you need to do is use T2.Div as that is the value that does not exist in the second table:
SELECT T1.PK, T2.div, T2.val
FROM A T1
LEFT OUTER JOIN B T2
ON T1.div = T2.div
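
To see the difference between T1.div and T2.div side by side, here is a small sketch in SQLite through Python's sqlite3 module, using the sample rows from the question. The aliases div_from_a and div_from_b are made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (PK TEXT, div TEXT);
CREATE TABLE B (PK INTEGER, div TEXT, val INTEGER);
INSERT INTO A VALUES ('A','a'),('B','b'),('C','c');
INSERT INTO B VALUES (1,'a',10),(2,'a',100),(3,'c',9),(4,'c',99);
""")

# LEFT JOIN keeps every row of A; columns taken from B are NULL when
# there is no matching row in B.
result = conn.execute("""
SELECT T1.PK, T1.div AS div_from_a, T2.div AS div_from_b, T2.val
FROM A T1
LEFT OUTER JOIN B T2 ON T1.div = T2.div
ORDER BY T1.PK, T2.val
""").fetchall()

for row in result:
    print(row)
```

The row for PK B is present in both cases; only the column read from T2 is NULL (None in Python).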

Joining two tables based on three columns

For MS Access, how do I accomplish the following? I was thinking of writing a VBA loop, but I think it will take a while.
Here are the two tables:
Table A
| id | Day | Month | F_value1
---------------------------------------
| 1 | 10 | 11 | 523
| 1 | 11 | 11 | 955
| 2 | 1 | 11 | 45
| 2 | 2 | 11 | 49
Table B
| id | Day | Month | G_value1
---------------------------------------
| 1 | 10 | 11 | 19923
| 1 | 11 | 11 | 55455
| 2 | 1 | 11 | 45454
What I need:
| id | Day | Month | F_value1 | G_value1
-----------------------------------------------
| 1 | 10 | 11 | 523 | 19923
| 1 | 11 | 11 | 955 | 55455
| 2 | 1 | 11 | 45 | 45454
| 2 | 2 | 11 | 49 | Null
I tried the Access Query designer but had no luck. I'm not sure how to go about it in SQL. I already have the tables set up.
As a programmatic approach, I'm thinking of:
for each row in Table A
for each row in Table B
If TableA.fields = TableB.fields
Then Insert it into new table
End loop
End loop

You need multiple conditions for the joins. Fortunately, MS Access supports this with LEFT JOIN:
SELECT a.id, a.Day, a.Month, a.F_value1, b.G_Value1
FROM TableA as a LEFT JOIN
TableB as b
ON a.ID = b.ID AND a.day = b.day AND a.month = b.month;
You can use INSERT to insert into an existing table; INTO to create a new table. Or just run the query to get the results.
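
The effect of the three join conditions can be checked with a small sketch. It uses SQLite through Python's sqlite3 module as a stand-in for MS Access (an assumption; Access's own SQL dialect differs in places) and the sample rows from the question. A join on id alone is included for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (id INTEGER, Day INTEGER, Month INTEGER, F_value1 INTEGER);
CREATE TABLE TableB (id INTEGER, Day INTEGER, Month INTEGER, G_value1 INTEGER);
INSERT INTO TableA VALUES (1,10,11,523),(1,11,11,955),(2,1,11,45),(2,2,11,49);
INSERT INTO TableB VALUES (1,10,11,19923),(1,11,11,55455),(2,1,11,45454);
""")

# Match on all three columns: one output row per row of TableA.
three_keys = conn.execute("""
SELECT a.id, a.Day, a.Month, a.F_value1, b.G_value1
FROM TableA AS a
LEFT JOIN TableB AS b
    ON a.id = b.id AND a.Day = b.Day AND a.Month = b.Month
ORDER BY a.id, a.Day
""").fetchall()

# Match on id only: each TableA row pairs with every TableB row sharing its id.
id_only = conn.execute("""
SELECT a.id, a.Day, a.Month, a.F_value1, b.G_value1
FROM TableA AS a
LEFT JOIN TableB AS b ON a.id = b.id
""").fetchall()

for row in three_keys:
    print(row)
print("rows when joining on id only:", len(id_only))
```

With all three conditions the result matches the table the question asks for; joining on id alone yields extra, mismatched rows.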

In SQL View, this should work and ideally be quicker than your suggested loop:
SELECT a.*, b.G_Value1
INTO TableC
FROM TableA a
LEFT JOIN TableB b
ON (a.ID = b.ID) AND (a.Day = b.Day) AND (a.Month = b.Month)

If you need a full join (i.e. all records of A and all records of B):
SELECT A.ID, A.Day, A.Month, A.F_value1, B.G_value1
FROM A LEFT JOIN B ON (A.Month= B.Month) AND (A.Day= B.Day) AND (A.ID = B.ID)
UNION
SELECT B.ID, B.Day, B.Month, A.F_value1, B.G_value1
FROM B LEFT JOIN A ON (B.ID = A.ID) AND (B.Day= A.Day) AND (B.Month= A.Month);
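
A sketch of this "two LEFT JOINs plus UNION" emulation of a full join, again in SQLite through Python's sqlite3 module as a stand-in for MS Access. The tables are named TableA and TableB as in the question. The extra TableB row (3, 5, 11, 777) is hypothetical: it is not in the question and is added only so that one row exists in B without a partner in A.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (id INTEGER, Day INTEGER, Month INTEGER, F_value1 INTEGER);
CREATE TABLE TableB (id INTEGER, Day INTEGER, Month INTEGER, G_value1 INTEGER);
INSERT INTO TableA VALUES (1,10,11,523),(1,11,11,955),(2,1,11,45),(2,2,11,49);
INSERT INTO TableB VALUES (1,10,11,19923),(1,11,11,55455),(2,1,11,45454);
-- Hypothetical row with no counterpart in TableA.
INSERT INTO TableB VALUES (3,5,11,777);
""")

# First branch keeps all rows of TableA, second keeps all rows of TableB;
# UNION removes the matched rows that both branches produce.
full = conn.execute("""
SELECT a.id, a.Day, a.Month, a.F_value1, b.G_value1
FROM TableA AS a LEFT JOIN TableB AS b
    ON a.id = b.id AND a.Day = b.Day AND a.Month = b.Month
UNION
SELECT b.id, b.Day, b.Month, a.F_value1, b.G_value1
FROM TableB AS b LEFT JOIN TableA AS a
    ON b.id = a.id AND b.Day = a.Day AND b.Month = a.Month
ORDER BY 1, 2, 3
""").fetchall()

for row in full:
    print(row)
```

The unmatched row of each table appears once, with NULL (None in Python) in the column that comes from the other table.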

We Keep Coding
