SUM values in SQL starting from a specific point in another table - sql

I have a table that lists the index/order, the name, and the value. For example, it looks like this:
TABLE1:
ID | NAME | VALUE
1 | A | 2
2 | B | 5
3 | C | 2
4 | D | 7
5 | E | 0
Now, I have another table that has a random list of NAMEs. It'll just show either A, B, C, D, or E. Depending on what the NAME is, I wanted to calculate the SUM of all the values that it will take to get to E. Does that make sense?
So if for example, my table looks like this:
TABLE2:
NAME
D
B
A
I'd want another column next to NAME that'll show the sum. So D would have 7 because the next event is E. B would have to be the sum of 5, 2, and 7 because B is 5, and C is 2, and D is 7. And A would have the sum of 2, 5, 2, and 7 and so on.
Hopefully this is easy to understand.
I actually don't have much at all aside from joining the two tables and getting the current value of the NAME. But I wasn't sure how to increment and so on and keep adding?
SELECT T2.NAME, T1.VALUE
FROM Table1 T1
LEFT JOIN Table2 T2 ON T1.NAME = T2.NAME
Is doing this even possible? Or am I wasting my time? Should I be referring to actual code to do this? Or should I make a function?
I wasn't sure where to start and I was hoping someone could help me out.
Thank you in advance!

The query is in two parts; this is hard to see at first, so I'll walk through each step.
Step 1: Obtain the rolling sum
Join table1 to itself, matching each letter with every letter greater than or equal to it:
select *
from table1 t1
inner join table1 t2 on t2.name >= t1.name
order by t1.name
This produces the following table
+ -- + ---- + ----- + -- + ---- + ----- +
| id | name | value | id | name | value |
+ -- + ---- + ----- + -- + ---- + ----- +
| 1 | A | 2 | 1 | A | 2 |
| 1 | A | 2 | 2 | B | 5 |
| 1 | A | 2 | 3 | C | 2 |
| 1 | A | 2 | 4 | D | 7 |
| 1 | A | 2 | 5 | E | 0 |
| 2 | B | 5 | 2 | B | 5 |
| 2 | B | 5 | 3 | C | 2 |
| 2 | B | 5 | 4 | D | 7 |
| 2 | B | 5 | 5 | E | 0 |
| 3 | C | 2 | 3 | C | 2 |
| 3 | C | 2 | 4 | D | 7 |
| 3 | C | 2 | 5 | E | 0 |
| 4 | D | 7 | 4 | D | 7 |
| 4 | D | 7 | 5 | E | 0 |
| 5 | E | 0 | 5 | E | 0 |
+ -- + ---- + ----- + -- + ---- + ----- +
Notice that if we group by the name from t1, we can get the rolling sum by summing the values from t2. This query
select t1.name,
SUM(t2.value) as SumToE
from table1 t1
inner join table1 t2
on t2.name >= t1.name
group by t1.name
gives us the rolling sums we want
+ ---- + ------ +
| name | sumToE |
+ ---- + ------ +
| A | 16 |
| B | 14 |
| C | 9 |
| D | 7 |
| E | 0 |
+ ---- + ------ +
Note: This is equivalent to using a windowed function that sums over a set, but it is much easier to visually see what you're doing via this joining technique.
Step 2: Join the rolling sum
Now that you have this rolling sum for each letter, you simply join it to table2 for the letters you want
select t1.*
from table2 t2
inner join (
select t1.name,
SUM(t2.value) as SumToE
from table1 t1
inner join table1 t2
on t2.name >= t1.name
group by t1.name
) t1 on t1.name = t2.name
Result:
+ ---- + ------ +
| name | sumToE |
+ ---- + ------ +
| A | 16 |
| B | 14 |
| D | 7 |
+ ---- + ------ +
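
To check the two steps end to end, here is a minimal sketch that runs them against an in-memory SQLite database from Python. It assumes SQLite rather than the asker's database, and it joins on id instead of name (an assumption that ID is the intended order, as the question describes); the alias names `later` and `s` are only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT, value INTEGER);
    INSERT INTO table1 VALUES (1,'A',2),(2,'B',5),(3,'C',2),(4,'D',7),(5,'E',0);
    CREATE TABLE table2 (name TEXT);
    INSERT INTO table2 VALUES ('D'),('B'),('A');
""")

# Step 1 (derived table s): each row is paired with itself and every later row,
# then the paired values are summed per name.
# Step 2: the per-name sums are joined to the names listed in table2.
rows = conn.execute("""
    SELECT s.name, s.SumToE
    FROM table2 t2
    INNER JOIN (
        SELECT t1.name, SUM(later.value) AS SumToE
        FROM table1 t1
        INNER JOIN table1 later ON later.id >= t1.id
        GROUP BY t1.name
    ) s ON s.name = t2.name
    ORDER BY s.name
""").fetchall()

print(rows)  # [('A', 16), ('B', 14), ('D', 7)]
```

Joining on id avoids relying on the names happening to sort in the same order as the sequence.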

As gregory suggests, you can do this with a simple windowed function, which (in this case) will sum up all the rows after and including the current one based on the ID value. Obviously there are a number of different ways in which you can slice your data, though I'll leave that up to you to explore :)
declare @t table(ID int,Name nvarchar(50),Val int);
insert into @t values(1,'A',2),(2,'B',5),(3,'C',2),(4,'D',7),(5,'E',0);
select ID -- The desc makes the preceding work the right way. This is
,Name -- essentially shorthand for "sum(Val) over (order by ID rows between current row and unbounded following)"
,Val -- which is functionally the same, but a lot more typing...
,sum(Val) over (order by ID desc rows unbounded preceding) as s
from @t
order by ID;
Which will output:
+----+------+-----+----+
| ID | Name | Val | s |
+----+------+-----+----+
| 1 | A | 2 | 16 |
| 2 | B | 5 | 14 |
| 3 | C | 2 | 9 |
| 4 | D | 7 | 7 |
| 5 | E | 0 | 0 |
+----+------+-----+----+
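
To get from this running total to the result the question asks for, the windowed sum still has to be joined to the second table. A minimal sketch of that, assuming SQLite 3.25 or later (for window function support) driven from Python, and using the long-form frame clause mentioned in the comment above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT, value INTEGER);
    INSERT INTO table1 VALUES (1,'A',2),(2,'B',5),(3,'C',2),(4,'D',7),(5,'E',0);
    CREATE TABLE table2 (name TEXT);
    INSERT INTO table2 VALUES ('D'),('B'),('A');
""")

# The derived table computes, for every row, the sum of its own value and all
# values that follow it in id order; the outer join keeps only table2's names.
rows = conn.execute("""
    SELECT t2.name, s.SumToE
    FROM table2 t2
    INNER JOIN (
        SELECT name,
               SUM(value) OVER (ORDER BY id
                                ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS SumToE
        FROM table1
    ) s ON s.name = t2.name
    ORDER BY s.SumToE
""").fetchall()

print(rows)  # [('D', 7), ('B', 14), ('A', 16)]
```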

CREATE TABLE #tempTable2(name VARCHAR(1))
INSERT INTO #tempTable2(name)
VALUES('D')
INSERT INTO #tempTable2(name)
VALUES('B')
INSERT INTO #tempTable2(name)
VALUES('A')
CREATE TABLE #tempTable(id INT, name VARCHAR(1), value INT)
INSERT INTO #temptable(id,name,value)
VALUES(1,'A',2)
INSERT INTO #temptable(id,name,value)
VALUES(2,'B',5)
INSERT INTO #temptable(id,name,value)
VALUES(3,'C',2)
INSERT INTO #temptable(id,name,value)
VALUES(4,'D',7)
INSERT INTO #temptable(id,name,value)
VALUES(5,'E',0)
;WITH x AS
(
SELECT id, value, name, RunningTotal = value
FROM dbo.#temptable
WHERE id = (SELECT MAX(id) FROM #temptable)
UNION ALL
SELECT y.id, y.value, y.name, x.RunningTotal + y.value
FROM x
INNER JOIN dbo.#temptable AS y ON
y.id = x.id - 1
)
SELECT x.id, x.value, x.name, x.RunningTotal
FROM x
JOIN #tempTable2 t2 ON
x.name = t2.name
ORDER BY x.id
DROP TABLE #tempTable
DROP TABLE #tempTable2

Related

Take the row after the specific row

I have a table from which I need to take the row that comes right after the row with course 'TA' and flag = 1. To help find it, I created the column rnum (a row number ordered by date within each student).
| student | date | course | flag | rnum |
| ------- | ----- | ----------- | ---- | ---- |
| 1 | 17:00 | Math | null | 1 |
| 1 | 17:10 | Python | null | 2 |
| 1 | 17:15 | TA | 1 | 3 |
| 1 | 17:20 | English | null | 4 |
| 1 | 17:35 | Geography | null | 5 |
| 2 | 16:10 | English | null | 1 |
| 2 | 16:20 | TA | 1 | 2 |
| 2 | 16:30 | SQL | null | 3 |
| 2 | 16:40 | Python | null | 4 |
| 3 | 19:05 | English | null | 1 |
| 3 | 19:20 | Literachure | null | 2 |
| 3 | 19:30 | TA | null | 3 |
| 3 | 19:40 | Python | null | 4 |
| 3 | 19:50 | Python | null | 5 |
As a result I should have:
| student | date | course | flag | rnum |
| ------- | ----- | ------- | ---- | ---- |
| 1 | 17:20 | English | null | 4 |
| 2 | 16:30 | SQL | null | 3 |
There are many ways to get your desired result, let's see some of them.
1) EXISTS
You can use the EXISTS clause, specifying a subquery to match for the condition.
SELECT T2.*
FROM #MyTable T2
WHERE EXISTS (
SELECT 'x' x
FROM #MyTable T1
WHERE T1.course = 'TA' AND T1.flag = 1
AND T1.student = T2.student AND T2.rnum = T1.rnum + 1
)
2) LAG
You can use the window function LAG to access the previous row for a given order and then filter your result set with your conditions.
SELECT w.student, w.date, w.course, w.flag, w.rnum
FROM (
SELECT T1.*
, LAG(course, 1) OVER (PARTITION BY student ORDER BY rnum) prevCourse
, LAG(flag, 1) OVER (PARTITION BY student ORDER BY rnum) prevFlag
FROM #MyTable T1
) w
WHERE prevCourse = 'TA' AND prevFlag = 1
3) JOIN
You can self-JOIN your table on the next rnum and keep only the rows that match the right condition.
SELECT T2.*
FROM #MyTable T1
JOIN #MyTable T2 ON T1.student = T2.student AND T2.rnum = T1.rnum + 1
WHERE T1.course = 'TA' AND T1.flag = 1
4) CROSS APPLY
You can use CROSS APPLY to specify a subquery with the matching condition. It is pretty similar to the EXISTS clause, but it also lets you return columns from the subquery in your result set.
SELECT T2.*
FROM #MyTable T2
CROSS APPLY (
SELECT 'x' x
FROM #MyTable T1
WHERE T1.course = 'TA' AND T1.flag = 1
AND T1.student = T2.student AND T2.rnum = T1.rnum + 1
) x
5) CTE
You can use common table expression (CTE) to extract matching rows and then use it to filter your table with a JOIN.
;WITH
T1 AS (
SELECT student, rnum
FROM #MyTable T1
WHERE T1.course = 'TA' AND T1.flag = 1
)
SELECT T2.*
FROM #MyTable T2
JOIN T1 ON T1.student = T2.student AND T2.rnum = T1.rnum + 1
Adding the row number was a good start; you can use it to join the table with itself:
WITH matches AS (
SELECT
student,
rnum
FROM table
WHERE flag = 1
AND course = 'TA'
)
SELECT t.*
FROM table t
JOIN matches m
on t.student = m.student
and t.rnum = m.rnum + 1
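
For anyone who wants to try the self-join on the sample data, here is a minimal sketch using an in-memory SQLite database from Python. The table name `courses` and the aliases `cur` and `nxt` are placeholders chosen for this example, not names from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE courses (student INTEGER, date TEXT, course TEXT, flag INTEGER, rnum INTEGER)"
)
conn.executemany("INSERT INTO courses VALUES (?,?,?,?,?)", [
    (1, "17:00", "Math", None, 1), (1, "17:10", "Python", None, 2),
    (1, "17:15", "TA", 1, 3), (1, "17:20", "English", None, 4),
    (1, "17:35", "Geography", None, 5),
    (2, "16:10", "English", None, 1), (2, "16:20", "TA", 1, 2),
    (2, "16:30", "SQL", None, 3), (2, "16:40", "Python", None, 4),
    (3, "19:05", "English", None, 1), (3, "19:20", "Literachure", None, 2),
    (3, "19:30", "TA", None, 3), (3, "19:40", "Python", None, 4),
    (3, "19:50", "Python", None, 5),
])

# cur is the flagged 'TA' row; nxt is the row with the next rnum for the same student.
rows = conn.execute("""
    SELECT nxt.student, nxt.date, nxt.course, nxt.flag, nxt.rnum
    FROM courses cur
    JOIN courses nxt
      ON nxt.student = cur.student AND nxt.rnum = cur.rnum + 1
    WHERE cur.course = 'TA' AND cur.flag = 1
    ORDER BY nxt.student
""").fetchall()

print(rows)  # [(1, '17:20', 'English', None, 4), (2, '16:30', 'SQL', None, 3)]
```

Student 3 drops out because their 'TA' row has a null flag, so the WHERE clause never matches it.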

Postgres - Unique values for id column using CTE, Joins alongside GROUP BY

I have a table referrals:
id | user_id_owner | firstname | is_active | user_type | referred_at
----+---------------+-----------+-----------+-----------+-------------
3 | 2 | c | t | agent | 3
5 | 3 | e | f | customer | 5
4 | 1 | d | t | agent | 4
2 | 1 | b | f | agent | 2
1 | 1 | a | t | agent | 1
And another table activations
id | user_id_owner | referral_id | amount_earned | activated_at | app_id
----+---------------+-------------+---------------+--------------+--------
2 | 2 | 3 | 3.0 | 3 | a
4 | 1 | 1 | 6.0 | 5 | b
5 | 4 | 4 | 3.0 | 6 | c
1 | 1 | 2 | 2.0 | 2 | b
3 | 1 | 2 | 5.0 | 4 | b
6 | 1 | 2 | 7.0 | 8 | a
I am trying to generate another table from the two tables that has only unique values for referrals.id and returns, as one of the columns, the count for each app as best_selling_app_count.
Here is the query I ran:
with agents
as
(select
referrals.id,
referral_id,
amount_earned,
referred_at,
activated_at,
activations.app_id
from referrals
left outer join activations
on (referrals.id = activations.referral_id)
where referrals.user_id_owner = 1),
distinct_referrals_by_id
as
(select
id,
count(referral_id) as activations_count,
sum(coalesce(amount_earned, 0)) as amount_earned,
referred_at,
max(activated_at) as last_activated_at
from
agents
group by id, referred_at),
distinct_referrals_by_app_id
as
(select id, app_id as best_selling_app,
count(app_id) as best_selling_app_count
from agents
group by id, app_id )
select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank
from distinct_referrals_by_id
inner join distinct_referrals_by_app_id
on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);
Here is the result I got:
id | activations_count | amount_earned | referred_at | last_activated_at | id | best_selling_app | best_selling_app_count | best_selling_app_rank
----+-------------------+---------------+-------------+-------------------+----+------------------+------------------------+-----------------------
2 | 3 | 14.0 | 2 | 8 | 2 | b | 2 | 1
1 | 1 | 6.0 | 1 | 5 | 1 | b | 1 | 2
2 | 3 | 14.0 | 2 | 8 | 2 | a | 1 | 2
4 | 1 | 3.0 | 4 | 6 | 4 | c | 1 | 2
The problem with this result is that the table has a duplicate id of 2. I only need unique values for the id column.
I tried a workaround using distinct on that gave the desired result, but I fear the query results may not be reliable and consistent.
Here is the workaround query:
with agents
as
(select
referrals.id,
referral_id,
amount_earned,
referred_at,
activated_at,
activations.app_id
from referrals
left outer join activations
on (referrals.id = activations.referral_id)
where referrals.user_id_owner = 1),
distinct_referrals_by_id
as
(select
id,
count(referral_id) as activations_count,
sum(coalesce(amount_earned, 0)) as amount_earned,
referred_at,
max(activated_at) as last_activated_at
from
agents
group by id, referred_at),
distinct_referrals_by_app_id
as
(select
distinct on (id) id, app_id as best_selling_app,
count(app_id) as best_selling_app_count
from agents
group by id, app_id
order by id, best_selling_app_count desc)
select *, dense_rank() over (order by best_selling_app_count desc) best_selling_app_rank
from distinct_referrals_by_id
inner join distinct_referrals_by_app_id
on (distinct_referrals_by_id.id = distinct_referrals_by_app_id.id);
I need a recommendation on how best to achieve this.
I am trying to generate another table from the two tables that has only unique values for referrals.id and returns, as one of the columns, the count for each app as best_selling_app_count.
Your question is complicated, with a very complicated SQL query. However, the above looks like the actual question. If so, you can use:
select r.*,
a.app_id as most_common_app_id,
a.cnt as most_common_app_id_count
from referrals r left join
(select distinct on (a.referral_id) a.referral_id, a.app_id, count(*) as cnt
from activations a
group by a.referral_id, a.app_id
order by a.referral_id, count(*) desc
) a
on a.referral_id = r.id;
You have not explained the other columns that are in your result set.
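
DISTINCT ON is specific to Postgres. The same "one row per referral, most common app first" idea can be sketched with ROW_NUMBER(), which is a different technique from the query above but portable to other engines. The example below is a rough sketch run against SQLite (3.25 or later assumed) from Python, with only the columns needed for the illustration; the CTE names `app_counts` and `ranked` and the tie-break on app_id are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE referrals (id INTEGER, user_id_owner INTEGER, firstname TEXT);
    INSERT INTO referrals VALUES (3,2,'c'),(5,3,'e'),(4,1,'d'),(2,1,'b'),(1,1,'a');
    CREATE TABLE activations (id INTEGER, referral_id INTEGER, app_id TEXT);
    INSERT INTO activations VALUES
        (2,3,'a'),(4,1,'b'),(5,4,'c'),(1,2,'b'),(3,2,'b'),(6,2,'a');
""")

# app_counts: activations per (referral, app).
# ranked: number the apps of each referral by descending count; app_id breaks
# ties so the pick is deterministic.
rows = conn.execute("""
    WITH app_counts AS (
        SELECT referral_id, app_id, COUNT(*) AS cnt
        FROM activations
        GROUP BY referral_id, app_id
    ),
    ranked AS (
        SELECT referral_id, app_id, cnt,
               ROW_NUMBER() OVER (PARTITION BY referral_id
                                  ORDER BY cnt DESC, app_id) AS rn
        FROM app_counts
    )
    SELECT r.id, k.app_id, k.cnt
    FROM referrals r
    LEFT JOIN ranked k ON k.referral_id = r.id AND k.rn = 1
    ORDER BY r.id
""").fetchall()

print(rows)
# [(1, 'b', 1), (2, 'b', 2), (3, 'a', 1), (4, 'c', 1), (5, None, None)]
```

The explicit tie-break also addresses the reliability worry: without a secondary sort key, which app wins among equal counts is left to the engine.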

Divide Sequential Records

I have a table in MS Access like:
table
+-----+-----+-----+
| 1st | 2nd | 3rd |
+-----+-----+-----+
| A | 1 | 100 |
| A | 2 | 200 |
| A | 3 | 300 |
| B | 1 | 100 |
| B | 2 | 200 |
| B | 3 | 300 |
| C | 1 | 100 |
| C | 2 | 200 |
| C | 3 | 300 |
+-----+-----+-----+
Now I want to read the values from the 3rd column, do some sort of manipulation to it and store them in to another table like:
summary
+-----+---------+---------+
| 1st | 2nd | 3rd |
+-----+---------+---------+
| A | 100/200 | 200/300 |
| B | 100/200 | 200/300 |
| C | 100/200 | 200/300 |
+-----+---------+---------+
In other words, for summary.2nd this means:
select table.3rd FROM table where table.1st = 'A' AND table.2nd = 1
divided by
select table.3rd FROM table where table.1st = 'A' AND table.2nd = 2
Can someone give me a hint how this could be done?
Maybe VBA / ADO Recordset etc?
One method is conditional aggregation:
select [1st],
max(iif([2nd] = 1, [3rd], null)) / max(iif([2nd] = 2, [3rd], null)) as [2nd],
max(iif([2nd] = 2, [3rd], null)) / max(iif([2nd] = 3, [3rd], null)) as [3rd]
from t
group by [1st];
Try this SQL
INSERT INTO Summary
SELECT DISTINCT a.[1st],
a.[3rd] / b.[3rd] AS [2nd],
b.[3rd] / c.[3rd] AS [3rd]
FROM ((tbl AS a
INNER JOIN tbl AS b
ON a.[1st] = b.[1st])
INNER JOIN tbl AS c
ON a.[1st] = c.[1st] )
WHERE a.[2nd] = 1
AND b.[2nd] = 2
AND c.[2nd] = 3
Here's another alternative, using calculated join criteria:
select
t1.[1st],
t1.[3rd]/t2.[3rd] as [2nd],
t2.[3rd]/t3.[3rd] as [3rd]
from
(
[table] t1 inner join [table] t2
on t1.[1st] = t2.[1st] and t1.[2nd] = t2.[2nd]-1
)
inner join [table] t3
on t1.[1st] = t3.[1st] and t1.[2nd] = t3.[2nd]-2
Since the 2nd column values 1, 2 & 3 are not hard-coded, this is applicable to any three integers in the 2nd column whose values differ sequentially by one.
Change [table] to the name of your table.
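
The conditional aggregation idea can be checked outside Access too. This is a minimal sketch against SQLite from Python, so it uses CASE instead of IIf and double-quoted column names instead of brackets; the table name `t`, the output names `ratio_1_2` and `ratio_2_3`, and the `1.0 *` factor (to force floating-point division in SQLite) are assumptions of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("1st" TEXT, "2nd" INTEGER, "3rd" INTEGER)')
conn.executemany(
    "INSERT INTO t VALUES (?,?,?)",
    [(group, n, n * 100) for group in "ABC" for n in (1, 2, 3)],
)

# Each MAX(CASE ...) picks the "3rd" value of one specific "2nd" position
# within the group, so the two picks can be divided in a single row.
rows = conn.execute("""
    SELECT "1st",
           1.0 * MAX(CASE WHEN "2nd" = 1 THEN "3rd" END)
               / MAX(CASE WHEN "2nd" = 2 THEN "3rd" END) AS ratio_1_2,
           1.0 * MAX(CASE WHEN "2nd" = 2 THEN "3rd" END)
               / MAX(CASE WHEN "2nd" = 3 THEN "3rd" END) AS ratio_2_3
    FROM t
    GROUP BY "1st"
    ORDER BY "1st"
""").fetchall()

for row in rows:
    print(row)
```

Each group yields 100/200 in the first ratio column and 200/300 in the second, matching the summary table in the question.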

Ranking by partition in visual foxpro

I have the following table that looks like
+ --- + --- +
| AID | Tag |
+ --- + --- +
| 1 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 820 |
| 2 | 821 |
| 3 | 2 |
| 4 | 5 |
| 4 | 18 |
| 4 | 2744|
| 4 | 2745|
+ --- + --- +
When I write the following SQL Server 2008 code
select AID,
Tag,
RANK() over (partition by AID order by Tag asc) as rank
from My_Table
it produces the following results
+ --- + --- + ---- +
| AID | Tag | Rank |
+ --- + --- + ---- +
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 2 | 3 | 2 |
| 2 | 820 | 3 |
| 2 | 821 | 4 |
| 3 | 2 | 1 |
| 4 | 5 | 1 |
| 4 | 18 | 2 |
| 4 | 2744| 3 |
| 4 | 2745| 4 |
+ --- + --- + ---- +
which is exactly what I want.
Now, I want to write the same thing in Visual FoxPro 9 SQL. I tried it using recno() as demonstrated here; this numbers my records, but doesn't seem to support the ability to partition, and correlated subqueries don't seem to be supported in VFP 9 SQL. I know that I could do this with cursors and scans, but I don't want to do it that way. Any suggestions?
In VFP there is no rank() function. However, you can achieve the same effect in a number of ways. One way is a simple scan...endscan pass that updates the ranking value, as in the following example:
*** Sample Data
Create Cursor mytable ( AID Int, Tag Int)
Insert Into mytable Values (1,1 )
Insert Into mytable Values (2,2 )
Insert Into mytable Values (2,3 )
Insert Into mytable Values (2,820 )
Insert Into mytable Values (2,821 )
Insert Into mytable Values (3,2 )
Insert Into mytable Values (4,5 )
Insert Into mytable Values (4,18 )
Insert Into mytable Values (4,2744)
Insert Into mytable Values (4,2745)
*** Sample Data
Select AID, Tag, Cast(0 As Int) As rank ;
from mytable ;
order By AID, Tag ;
into Cursor crsRanked ;
readwrite
Scan
AID = AID
rcno = Recno()
Replace rank With Recno()-m.rcno+1 While AID = m.AID
Skip -1
Endscan
Locate
Browse
EDIT: Yesterday I overlooked how MS SQL Server's RANK() function works, sorry. Here is one that works like MS SQL Server's Rank(), Dense_Rank(), Row_number():
Create Cursor mytable ( AID Int, Tag Int)
Insert Into mytable Values (1,1 )
Insert Into mytable Values (2,2 )
Insert Into mytable Values (2,3 )
Insert Into mytable Values (2,820 )
Insert Into mytable Values (2,821 )
Insert Into mytable Values (3,2 )
Insert Into mytable Values (4,5 )
Insert Into mytable Values (4,18 )
Insert Into mytable Values (4,18 )
Insert Into mytable Values (4,18 )
Insert Into mytable Values (4,2744)
Insert Into mytable Values (4,2745)
Select AID, Tag, ;
Cast(0 As Int) As rownum, ;
Cast(0 As Int) As rank, ;
Cast(0 As Int) As denserank ;
from mytable ;
order By AID, Tag ;
into Cursor crsRanked ;
readwrite
Local AID,rank,denserank,nextrank,rcno
Scan
AID = AID
rank = 0
nextrank = 0
denserank = 0
rcno = Recno()
Scan While m.AID = AID
Tag = Tag
rank = nextrank + 1
denserank = m.denserank + 1
Replace ;
rank With m.rank, ;
denserank With m.denserank, ;
rownum With Recno()-m.rcno+1 ;
While AID = m.AID And Tag = m.Tag
nextrank = m.nextrank + _Tally
Skip -1
Endscan
Skip -1
Endscan
Locate
Browse
I discovered the answer, for anyone who cares to know. The following SQL code is supported in Visual FoxPro 9.0 and will do what we want.
select t1.aid, ;
t1.tag, ;
count(*) as rank ;
from my_table t1 ;
inner join my_table t2 ;
on t2.aid = t1.aid ;
and t2.tag <= t1.tag ;
group by t1.aid, t1.tag
To see why, let's take a closer look at the inner join by leaving out the aggregate and including the tags from t2.
select t1.aid, ;
t1.tag, ;
t2.tag ;
from my_table t1 ;
inner join my_table t2 ;
on t2.aid = t1.aid ;
and t2.tag <= t1.tag ;
order by t1.aid, t1.tag
This code produces a table like
+ --- + ---- + ---- +
| AID | Tag1 | Tag2 |
+ --- + ---- + ---- +
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 2 | 3 | 2 |
| 2 | 3 | 3 |
| 2 | 820 | 2 |
| 2 | 820 | 3 |
| 2 | 820 | 820 |
| 2 | 821 | 2 |
| 2 | 821 | 3 |
| 2 | 821 | 820 |
| 2 | 821 | 821 |
| 3 | 2 | 2 |
| 4 | 5 | 5 |
| 4 | 18 | 5 |
| 4 | 18 | 18 |
| 4 | 2744 | 5 |
| 4 | 2744 | 18 |
| 4 | 2744 | 2744 |
| 4 | 2745 | 5 |
| 4 | 2745 | 18 |
| 4 | 2745 | 2744 |
| 4 | 2745 | 2745 |
+ --- + ---- + ---- +
We don't actually care about the data in Tag2, but now we can clearly see that the rank is the number of rows in each (AID, Tag1) group.
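
As a sanity check, the counting self-join can be compared with a real RANK() on an engine that has one. Below is a minimal sketch using SQLite (3.25 or later assumed) from Python, not VFP; the output alias `rnk` is just a name picked for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (aid INTEGER, tag INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?,?)", [
    (1, 1), (2, 2), (2, 3), (2, 820), (2, 821),
    (3, 2), (4, 5), (4, 18), (4, 2744), (4, 2745),
])

# Rank as "how many rows in my partition have a tag <= mine".
join_rank = conn.execute("""
    SELECT t1.aid, t1.tag, COUNT(*) AS rnk
    FROM my_table t1
    INNER JOIN my_table t2
        ON t2.aid = t1.aid AND t2.tag <= t1.tag
    GROUP BY t1.aid, t1.tag
    ORDER BY t1.aid, t1.tag
""").fetchall()

# The window-function version from the question, for comparison.
window_rank = conn.execute("""
    SELECT aid, tag, RANK() OVER (PARTITION BY aid ORDER BY tag) AS rnk
    FROM my_table
    ORDER BY aid, tag
""").fetchall()

print(join_rank == window_rank)  # True
```

The two agree on this data because every (AID, Tag) pair is unique. With duplicate pairs the grouped count is inflated by the duplicates and no longer matches RANK(), so treat the join as a substitute only when the pairs are distinct.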

Select from cross-reference based on inclusion (column values being superset)

Given a cross-reference table t relating table a with b:
| id | a_id | b_id |
--------------------
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 2 | 7 |
| 5 | 2 | 3 |
| 6 | 3 | 2 |
| 7 | 3 | 3 |
What would be the conventional way of selecting all a_id whose b_id is a superset of a given set?
For example, for the set (2,3), I would expect the result:
| a_id |
--------
| 1 |
| 3 |
Since a_id 1 and 3 are the only ones whose set of b_id values is a superset of (2,3).
The best solution I've found so far (thanks to this answer):
select id
from a
where 2 = (select count(*)
from t
where t.a_id = a.id and t.b_id in (2,3)
);
But I'd prefer to avoid calculating stuff like cardinality before running the query.
You can simply adapt the query as:
select id
from a cross join
(select count(*) as cnt
from t
where . . .
) x
where x.cnt = (select count(*)
from t
where t.a_id = a.id and t.b_id in (2,3)
);
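
Another way to avoid hard-coding the cardinality, sketched here under the assumption that the target set can be placed in a table of its own, is to join against that table and compare counts with HAVING. The table name `wanted` is hypothetical and not part of the question; the example runs against SQLite from Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, a_id INTEGER, b_id INTEGER);
    INSERT INTO t VALUES (1,1,1),(2,1,2),(3,1,3),(4,2,7),(5,2,3),(6,3,2),(7,3,3);
    -- Hypothetical helper table holding the set to test against.
    CREATE TABLE wanted (b_id INTEGER PRIMARY KEY);
    INSERT INTO wanted VALUES (2),(3);
""")

# Keep only the cross-reference rows whose b_id is in the wanted set, then
# require each a_id to have matched as many distinct b_ids as the set contains.
rows = conn.execute("""
    SELECT t.a_id
    FROM t
    INNER JOIN wanted w ON w.b_id = t.b_id
    GROUP BY t.a_id
    HAVING COUNT(DISTINCT t.b_id) = (SELECT COUNT(*) FROM wanted)
    ORDER BY t.a_id
""").fetchall()

a_ids = [r[0] for r in rows]
print(a_ids)  # [1, 3]
```

The set size is computed inside the query, so changing the contents of `wanted` is the only edit needed to test a different set.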