Running Totals again. No over clause, no cursor, but increasing order - sql

I am still having trouble creating a running total based on the increasing order of the value. Row id has no real meaning, it is just the PK. My server doesn't support OVER.
Row Value
1 3
2 7
3 1
4 2
Result:
Row Value
3 1
4 3
1 6
2 13
I have tried self and cross joins where I specify that the value of the second amount(the one being summed up) is less than the current value of the first. I have also tried doing this with the having clause but that always threw an error when I tried it that way. Can someone explain why it would be wrong to use it in that manner and how I should be doing it?

Here is one way to do a running total:
select row, value,
(select sum(value) from t t2 where t2.value <= t.value) as runningTotal
from t
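To see the correlated subquery at work on the question's data, here is a minimal sketch using Python's built-in sqlite3 module. The column names row_id and val are my own renames (an assumption, chosen to stay clear of keywords), and the extra tie-break on row_id is an addition so that duplicate values would not share one total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (row_id INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 3), (2, 7), (3, 1), (4, 2)])

# For each row, sum every value that sorts at or before it.
# Ties on val are broken by row_id (an assumption, not in the original answer).
running_total_rows = conn.execute(
    """
    SELECT t.row_id, t.val,
           (SELECT SUM(t2.val)
              FROM t AS t2
             WHERE t2.val < t.val
                OR (t2.val = t.val AND t2.row_id <= t.row_id)) AS running_total
      FROM t
     ORDER BY t.val, t.row_id
    """
).fetchall()

print(running_total_rows)
```

The rows come back in increasing order of value, with the running totals 1, 3, 6, 13 from the question.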

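You can use WITH ROLLUP if you have SQL Server 2008. Note that it belongs to a GROUP BY clause and appends a grand-total row rather than producing a per-row running total:
select value, sum(value) as total from t group by value with rollup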

If your platform supports recursive queries, you can use a recursive CTE (IIRC you should omit the RECURSIVE keyword for Microsoft stuff). Because the CTE needs to determine the begin/end of a "chain", the tuples unfortunately need to be ordered in some way (I use the "row" field; an internal tuple-id would be perfect for this purpose):
WITH RECURSIVE sums AS (
-- Terminal part
SELECT d0.row
, d0.value AS value
, d0.value AS runsum
FROM data d0
WHERE NOT EXISTS (
SELECT * FROM data nx
WHERE nx.row < d0.row
)
UNION
-- Recursive part
SELECT t1.row AS row
, t1.value AS value
, t0.runsum + t1.value AS runsum
FROM data t1
, sums t0
WHERE t1.row > t0.row
AND NOT EXISTS (
SELECT * FROM data nx
WHERE nx.row > t0.row
AND nx.row < t1.row
)
)
SELECT * FROM sums
;
RESULT:
row | value | runsum
-----+-------+--------
1 | 3 | 3
2 | 7 | 10
3 | 1 | 11
4 | 2 | 13
(4 rows)
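The query above chains the rows in "row" order. As a hedged variation, the same recursive idea can chain them in increasing order of value, which is what the question asked for. This sketch runs on SQLite through Python's sqlite3 module; the names data, row_id and val are assumptions, and it assumes the values are distinct (duplicates would fork the chain).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (row_id INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?)", [(1, 3), (2, 7), (3, 1), (4, 2)])

value_ordered_sums = conn.execute(
    """
    WITH RECURSIVE sums(row_id, val, runsum) AS (
        -- Start of the chain: the row with the smallest value
        SELECT d0.row_id, d0.val, d0.val
          FROM data d0
         WHERE NOT EXISTS (SELECT 1 FROM data nx WHERE nx.val < d0.val)
        UNION ALL
        -- Next link: the row with the next larger value
        SELECT t1.row_id, t1.val, t0.runsum + t1.val
          FROM data t1
          JOIN sums t0 ON t1.val > t0.val
         WHERE NOT EXISTS (
               SELECT 1 FROM data nx
                WHERE nx.val > t0.val AND nx.val < t1.val)
    )
    SELECT row_id, val, runsum FROM sums ORDER BY val
    """
).fetchall()

print(value_ordered_sums)
```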

Related

SQL Server (terminal result) hierarchy map

In SQL Server 2016, I have a table with the following chaining structure:
dbo.Item
OriginalItem ItemID
NULL 7
1 2
NULL 1
5 6
3 4
NULL 8
NULL 5
9 11
2 3
EDIT NOTE: Bold numbers were added as a response to @lemon's comments below.
Importantly, this example is a trivialized version of the real data, and the neatly ascending entries are not something that is present in the actual data; I'm just doing that to simplify the understanding.
I've constructed a query to get what I'm calling the TerminalItemID, which in this example case is ItemID 4, 6, and 7, and populated that into a temporary table #TerminalItems, the resultset of which would look like:
#TerminalItems
TerminalItemID
4
6
7
8
11
What I need is a final mapping table that would look something like this (using the above example; note that it also contains rows mapping 4, 6, and 7 to themselves, which is needed by the business logic):
#Mapping
ItemID TerminalItemID
1 4
2 4
3 4
4 4
5 6
6 6
7 7
8 8
9 11
11 11
What I need help with is how to build this last #Mapping table. Any assistance in this direction is greatly appreciated!
This should do:
with MyTbl as (
select *
from (values
(NULL, 1 )
,(1, 2 )
,(2, 3 )
,(3, 4 )
,(NULL, 5 )
,(5, 6 )
,(NULL, 7 )
) T(OriginalItem, ItemID)
)
, TerminalItems as (
/* Find all leaf level items: those not appearing under OriginalItem column */
select LeafItem=ItemId, ImmediateOriginalItem=M.OriginalItem
from MyTbl M
where M.ItemId not in
(select distinct OriginalItem
from MyTbl AllParn
where OriginalItem is not null
)
), AllLevels as (
/* Use a recursive CTE to find and report all parents */
select ThisItem=LeafItem, ParentItem=ImmediateOriginalItem
from TerminalItems
union all
select ThisItem=AL.ThisItem, M.OriginalItem
from AllLevels AL
inner join
MyTbl M
on M.ItemId=AL.ParentItem
)
select ItemId=coalesce(ParentItem,ThisItem), TerminalItemId=ThisItem
from AllLevels
order by 1,2
Beware of the MAXRECURSION setting: by default SQL Server stops after 100 levels of recursion, which means the depth of your tree can be at most 100 (the maximum number of nodes between a terminal item and its ultimate original item). This can be increased with OPTION (MAXRECURSION nnn), where nnn can be adjusted as needed. The limit can also be removed entirely by using 0, but this is not recommended because your data could cause infinite loops.
This is a typical gaps-and-islands problem and can also be carried out without recursion in three steps:
1. assign 1 at the beginning of each partition
2. compute a running sum over your flag value (generated at step 1)
3. extract the max "ItemID" on your partition (generated at step 2)
WITH cte1 AS (
SELECT *, CASE WHEN OriginalItem IS NULL THEN 1 ELSE 0 END AS changepartition
FROM Item
), cte2 AS (
SELECT *, SUM(changepartition) OVER(ORDER BY ItemID) AS parts
FROM cte1
)
SELECT ItemID, MAX(ItemID) OVER(PARTITION BY parts) AS TerminalItemID
FROM cte2
Assumption: Your terminal id items correspond to the "ItemID" value preceding a NULL "OriginalItem" value.
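To make the three steps concrete, here is a rough plain-Python walk-through of the same flag / running sum / max-per-partition logic. It uses the simplified, neatly ascending sample rows and relies on the same assumption as the query; all variable names are mine.

```python
from itertools import accumulate

# (OriginalItem, ItemID) pairs from the simplified example
items = [(None, 1), (1, 2), (2, 3), (3, 4), (None, 5), (5, 6), (None, 7)]
items.sort(key=lambda r: r[1])  # same role as ORDER BY ItemID

# Step 1: flag the first row of each partition (OriginalItem IS NULL)
flags = [1 if orig is None else 0 for orig, _ in items]

# Step 2: a running sum over the flag gives each row its partition number
parts = list(accumulate(flags))

# Step 3: the largest ItemID within a partition is its terminal item
terminal_by_part = {}
for (_, item_id), part in zip(items, parts):
    terminal_by_part[part] = max(item_id, terminal_by_part.get(part, item_id))

mapping = [(item_id, terminal_by_part[part])
           for (_, item_id), part in zip(items, parts)]
print(mapping)
```

Because the partitions are derived purely from ItemID order, this only works when each chain's ItemIDs are contiguous and ascending, which the question says is not guaranteed in the real data.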
EDIT: "Fixing orphaned records."
The query works correctly when records are not orphaned. The only way to deal with orphaned records is to add the missing records back, so that the query can work correctly on the full data.
This is carried out by an extra subquery (done at the beginning) that applies a UNION ALL between:
- the available records of the original table
- the missing records
WITH fix_orphaned_records AS(
SELECT * FROM Item
UNION ALL
SELECT NULL AS OriginalItem,
i1.OriginalItem AS ItemID
FROM Item i1
LEFT JOIN Item i2 ON i1.OriginalItem = i2.ItemID
WHERE i1.OriginalItem IS NOT NULL AND i2.ItemID IS NULL
), cte AS (
...
Missing records correspond to "OriginalItem" values that are never found within the "ItemID" field. A self left join will uncover these missing records.
You can use a recursive CTE to compute the last item in the sequence. For example:
with
n (orig_id, curr_id, lvl) as (
select itemid, itemid, 1 from item
union all
select n.orig_id, i.itemid, n.lvl + 1
from n
join item i on i.originalitem = n.curr_id
)
select *
from (
select *, row_number() over(partition by orig_id order by lvl desc) as rn from n
) x
where rn = 1
Result:
orig_id curr_id lvl rn
-------- -------- ---- --
1 4 4 1
2 4 3 1
3 4 2 1
4 4 1 1
5 6 2 1
6 6 1 1
7 7 1 1
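As a hedged variation on the recursive approach, the sketch below runs against the question's full sample data (including the orphaned 9 -> 11 link) on SQLite via Python's sqlite3 module. Instead of row_number(), it keeps the rows whose current item has no successor. Seeding the walk from both columns, so that 9 is included, is my own addition, and it assumes every item has at most one successor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (originalitem INTEGER, itemid INTEGER)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [(None, 7), (1, 2), (None, 1), (5, 6), (3, 4),
     (None, 8), (None, 5), (9, 11), (2, 3)],
)

terminal_mapping = conn.execute(
    """
    WITH RECURSIVE
    ids(id) AS (
        -- every id seen in either column, so orphaned originals are kept
        SELECT itemid FROM item
        UNION
        SELECT originalitem FROM item WHERE originalitem IS NOT NULL
    ),
    walk(orig_id, curr_id) AS (
        SELECT id, id FROM ids
        UNION ALL
        SELECT w.orig_id, i.itemid
          FROM walk w
          JOIN item i ON i.originalitem = w.curr_id
    )
    -- keep only the end of each chain: items that nothing points back to
    SELECT w.orig_id, w.curr_id
      FROM walk w
     WHERE NOT EXISTS (SELECT 1 FROM item c WHERE c.originalitem = w.curr_id)
     ORDER BY w.orig_id
    """
).fetchall()

print(terminal_mapping)
```

On this data the output matches the #Mapping table requested in the question.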

Access Top N Query where N is given in another table

I have two tables in MS SQL Server. Table2 has the following:
TaskId TopN
1 2
2 3
3 1
Table1 has the following:
TaskId TopN Value
1 2 12
1 2 12
1 2 12
2 3 1
2 3 1
2 3 5
2 3 12
2 3 8
2 3 5
I want to be able to select the top N records based on the TopN field in table2 (which is the same TopN value found in table1, so maybe I don't even need to bother using two tables). The desired output should be as follows:
TaskId TopN Value
1 2 12
1 2 12
2 3 12
2 3 8
2 3 5
I have tried the below SQL statement, but it skips TaskId=1. Any idea of what I am doing wrong?
SELECT DISTINCT T1.TaskId,
T1.TopN,
T1.values
FROM Table1 T1 INNER JOIN Table1 T2 ON
T1.TaskId = T2.TaskId AND
T1.TopN = T2.TopN AND
T1.Value <= T2.Value
GROUP BY T1.TaskId,
T1.TopN,
T1.Value
HAVING COUNT(*) <= (
SELECT TopN
FROM table2
WHERE table2.TaskID = T1.TaskId
)
Please note that in the question you have named Table2 as the one which has the fields TaskId, TopN and Value; however, in your query you have used the opposite. Assuming Table2 is the one which has the details, you can use the query below to get the desired result. You would not need to use the other table (Table1, as per the question), which has just the TaskId and TopN, since all the info is already present in Table2.
Select Taskid, TopN, Value
from
(Select T1.*, row_number() over(partition by Taskid order by Value desc) As rnk
from Table2 T1) Tb
where Tb.TopN >= Tb.rnk;
** Fixed the typo in the code (changed to >= instead of <=), it should work fine now.
The problem is that you have three rows with the same values -- and 3 > 2. That is, the subquery returns "3" which is not less than "2". In SQL Server, you would do this much more simply using row_number().
If you are using MS Access, you need a column that distinguishes the rows.
EDIT:
In SQL Server, you would use:
select t1.*
from (select t1.*,
row_number() over (partition by taskid order by value desc) as seqnum
from table1 t1
) t1
where t1.seqnum <= t1.topn;
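Since the title asks for N to come from another table, here is a small sketch that joins the numbered rows to table2 instead of reading topn from table1. It runs on SQLite through Python's sqlite3 module and assumes a SQLite build with window-function support (3.25 or later); the table layout follows the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table2 (taskid INTEGER, topn INTEGER)")
conn.execute("CREATE TABLE table1 (taskid INTEGER, topn INTEGER, value INTEGER)")
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(1, 2), (2, 3), (3, 1)])
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 2, 12), (1, 2, 12), (1, 2, 12),
     (2, 3, 1), (2, 3, 1), (2, 3, 5), (2, 3, 12), (2, 3, 8), (2, 3, 5)],
)

top_rows = conn.execute(
    """
    SELECT s.taskid, t2.topn, s.value
      FROM (SELECT taskid, value,
                   ROW_NUMBER() OVER (PARTITION BY taskid
                                      ORDER BY value DESC) AS seqnum
              FROM table1) AS s
      JOIN table2 AS t2 ON t2.taskid = s.taskid
     WHERE s.seqnum <= t2.topn       -- N is looked up per task
     ORDER BY s.taskid, s.seqnum
    """
).fetchall()

print(top_rows)
```

row_number() numbers tied values separately, so the three identical rows for TaskId 1 get 1, 2, 3 and exactly two of them survive the filter.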

Running "distinct on" across all unique thresholds in a postgres table

I have a Postgres 11 table called sample_a that looks like this:
time | cat | val
------+-----+-----
1 | 1 | 5
1 | 2 | 4
2 | 1 | 6
3 | 1 | 9
4 | 3 | 2
I would like to create a query that for each unique timestep, gets the most recent values across each category at or before that timestep, and aggregates these values by taking the sum of these values and dividing by the count of these values.
I believe I have the query to do this for a given timestep. For example, for time 3 I can run the following query:
select sum(val)::numeric / count(val) as result from (
select distinct on (cat) * from sample_a where time <= 3 order by cat, time desc
) x;
and get 6.5. (This is because at time 3, the latest from category 1 is 9 and the latest from category 2 is 4. The count of the values are 2, and they sum up to 13, and 13 / 2 is 6.5.)
However, I would ideally like to run a query that will give me all the results for each unique time in the table. The output of this new query would look as follows:
time | result
------+----------
1 | 4.5
2 | 5
3 | 6.5
4 | 5
This new query ideally would avoid adding another subselect clause if possible; an efficient query would be preferred. I could get these prior results by running the prior query inside my application for each timestep, but this doesn't seem efficient for a large sample_a.
What would this new query look like?
See if performance is acceptable this way. Syntax might need minor tweaks:
select t.time, avg(mr.val) as result
from (select distinct time from sample_a) t,
lateral (
select distinct on (cat) val
from sample_a a
where a.time <= t.time
order by a.cat, a.time desc
) mr
group by t.time
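To check the intended result on an engine without LATERAL or DISTINCT ON, the same "latest value per category at or before each time" logic can be sketched with NOT EXISTS. This is a SQLite rewrite run through Python's sqlite3 module, not the Postgres query above, and it assumes (cat, time) pairs are unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_a (time INTEGER, cat INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO sample_a VALUES (?, ?, ?)",
    [(1, 1, 5), (1, 2, 4), (2, 1, 6), (3, 1, 9), (4, 3, 2)],
)

latest_avg = conn.execute(
    """
    SELECT t.time, AVG(a.val) AS result
      FROM (SELECT DISTINCT time FROM sample_a) AS t
      JOIN sample_a AS a ON a.time <= t.time
     -- keep a row only if its category has no newer row at or before t.time
     WHERE NOT EXISTS (
           SELECT 1 FROM sample_a AS b
            WHERE b.cat = a.cat
              AND b.time <= t.time
              AND b.time > a.time)
     GROUP BY t.time
     ORDER BY t.time
    """
).fetchall()

print(latest_avg)
```

On the sample data this reproduces the 4.5, 5, 6.5, 5 sequence requested in the question.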
I think you just want cumulative functions:
select time,
sum(sum_val) over (order by time) / sum(num_val) over (order by time) as result
from (select time, sum(val) as sum_val, count(*) as num_val
from sample_a a
group by time
) a;
Note if val is an integer, you might need to convert to a numeric to get fractional values.
This can be expressed without a subquery as well:
select time,
sum(sum(val)) over (order by time) / sum(count(*)) over (order by time) as result
from sample_a
group by time

Oracle SQL - display values up to current record

Can I use LISTAGG or a similar analytical function in Oracle SQL to display all values in group up to current record?
This is my table:
id group_id value
-- -------- -----
1 1 A
2 1 B
3 1 C
4 2 X
5 2 Y
6 2 Z
I would like the following result:
id group_id values
-- -------- ------
1 1 A
2 1 AB
3 1 ABC
4 2 X
5 2 XY
6 2 XYZ
Here is one option, using a correlated subquery to handle the rollup of the value column:
SELECT
t1.id,
t1.group_id,
(SELECT LISTAGG(t2.val, '') WITHIN GROUP (ORDER BY t2.id)
FROM yourTable t2
WHERE t1.group_id = t2.group_id AND t2.id <= t1.id) AS vals
FROM yourTable t1
ORDER BY
t1.id;
The logic here is that, for each row, we roll up a concatenation of all values in the same group coming at or before the current id value.
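For readers who want to see the expected output independently of Oracle, here is a plain-Python sketch of the same "concatenate everything up to the current row, per group" logic. It is not LISTAGG, just an illustration of the running concatenation; the variable names are mine.

```python
from itertools import accumulate, groupby
from operator import itemgetter

# (id, group_id, value) rows from the question
rows = [(1, 1, "A"), (2, 1, "B"), (3, 1, "C"),
        (4, 2, "X"), (5, 2, "Y"), (6, 2, "Z")]
rows.sort(key=itemgetter(1, 0))  # by group_id, then id

running = []
for _, group in groupby(rows, key=itemgetter(1)):
    group = list(group)
    # accumulate() with the default "+" builds the prefixes A, AB, ABC
    prefixes = accumulate(value for _, _, value in group)
    running.extend(
        (row_id, group_id, prefix)
        for (row_id, group_id, _), prefix in zip(group, prefixes)
    )

print(running)
```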
Another approach to this, one which might perform and scale better, would be to use a recursive CTE. But, that would take more code, and might be harder to digest than what I wrote above.

Cumulative count of duplicates

For a table looking like
ID | Value
-------------
1 | 2
2 | 10
3 | 3
4 | 2
5 | 0
6 | 3
7 | 3
I would like to calculate the number of IDs with a higher Value, for each Value that appears in the table, i.e.
Value | Position
----------------
10 | 0
3 | 1
2 | 4
0 | 6
This equates to the offset of the Value in an ORDER BY Value DESC ordering.
I have considered doing this by calculating the number of duplicates with something like
SELECT Value, count(*) AS ct FROM table GROUP BY Value;
And then cumulating the result, but I guess that is not the optimal way to do it (nor have I managed to combine the commands accordingly)
How would one go about calculating this efficiently (for several dozens of thousands of rows)?
This seems like a perfect opportunity for the window function rank() (not the related dense_rank()):
SELECT DISTINCT ON (value)
value, rank() OVER (ORDER BY value DESC) - 1 AS position
FROM tbl
ORDER BY value DESC;
rank() starts with 1, while your count starts with 0, so subtract 1.
Add a DISTINCT step (DISTINCT ON is slightly cheaper here) to remove duplicate rows after the ranks have been computed. DISTINCT is applied after window functions. Details in this related answer:
Best way to get result count before LIMIT was applied
Result exactly as requested.
An index on value will help performance.
You might also try this if you're not comfortable with window functions:
SELECT t1.value, COUNT(DISTINCT t2.id) AS position
FROM tbl t1 LEFT OUTER JOIN tbl t2
ON t1.value < t2.value
GROUP BY t1.value
Note the self-join.
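As a quick check of the self-join on the sample data, here is a minimal sketch using Python's sqlite3 module. The table name tbl follows the answers; the ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, 2), (2, 10), (3, 3), (4, 2), (5, 0), (6, 3), (7, 3)],
)

positions = conn.execute(
    """
    SELECT t1.value, COUNT(DISTINCT t2.id) AS position
      FROM tbl t1
      LEFT OUTER JOIN tbl t2 ON t1.value < t2.value  -- rows with a higher value
     GROUP BY t1.value
     ORDER BY t1.value DESC
    """
).fetchall()

print(positions)
```

COUNT(DISTINCT t2.id) is what keeps duplicates of t1.value from inflating the count, and the LEFT JOIN keeps the top value with a position of 0.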