Select latest available value in SQL

Below is a simplified test table illustrating what I want to achieve. I am trying to write a query with a running sum that puts in column b the last sum result that was not null. Imagine a cumulative sum of each customer's purchases per day: on days with no purchases for a particular customer, I want to display that customer's latest cumulative sum instead of 0/null.
CREATE TABLE test (a int, b int);
insert into test values (1,null);
insert into test values (2,1);
insert into test values (3,3);
insert into test values (4,null);
insert into test values (5,5);
insert into test values (6,null);
1- select sum(coalesce(b,0)),coalesce(0,sum(b)) from test
2- select a, sum(coalesce(b,0)) from test group by a order by a asc
3- select a, sum(b) over (order by a asc rows between unbounded preceding and current row) from test group by a,b order by a asc
I'm not sure my interpretation of how coalesce works is correct. I thought sum(coalesce(b,0)) would substitute 0 where b is null and always carry the latest cumulative sum of column b.
Think I may have solved it with query 3.
The result I expect will look like this:
a | sum
--+----
1 |
2 | 1
3 | 4
4 | 4
5 | 9
6 | 9
Each record of a displays the last cumulative sum of column b.
Any direction would be valuable.
Thanks

In Postgres you can also use the window-function form of SUM for a cumulative sum.
Example:
create table test (a int, b int);
insert into test (a,b) values (1,null),(2,1),(3,3),(4,null),(5,5),(6,null);
select a, sum(b) over (order by a, b) as "sum"
from test;
a | sum
-- | ----
1 | null
2 | 1
3 | 4
4 | 4
5 | 9
6 | 9
And if "a" isn't unique, but you want to group on a?
Then you could use a suminception:
select a, sum(sum(b)) over (order by a) as "sum"
from test
group by a
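If you want to sanity-check the carry-forward behaviour without a Postgres instance, here is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25+ supports the same window syntax). SUM simply ignores NULLs, so each NULL row repeats the previous running total:

```python
import sqlite3

# In-memory database with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (a int, b int);
    INSERT INTO test (a, b)
    VALUES (1, NULL), (2, 1), (3, 3), (4, NULL), (5, 5), (6, NULL);
""")

# Running SUM: NULLs contribute nothing, so rows where b IS NULL
# simply carry forward the last non-null cumulative value.
rows = conn.execute(
    "SELECT a, SUM(b) OVER (ORDER BY a) AS s FROM test ORDER BY a"
).fetchall()
print(rows)
# [(1, None), (2, 1), (3, 4), (4, 4), (5, 9), (6, 9)]
```

The only row that differs from the "carry forward" ideal is the first one, which is NULL because no non-null b has been seen yet.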

Running "distinct on" across all unique thresholds in a postgres table

I have a Postgres 11 table called sample_a that looks like this:
time | cat | val
-----+-----+-----
   1 |   1 |   5
   1 |   2 |   4
   2 |   1 |   6
   3 |   1 |   9
   4 |   3 |   2
I would like a query that, for each unique timestep, takes the most recent value of each category at or before that timestep and aggregates those values by dividing their sum by their count.
I believe I have the query to do this for a given timestep. For example, for time 3 I can run the following query:
select sum(val)::numeric / count(val) as result from (
select distinct on (cat) * from sample_a where time <= 3 order by cat, time desc
) x;
and get 6.5. (This is because at time 3, the latest from category 1 is 9 and the latest from category 2 is 4. The count of the values are 2, and they sum up to 13, and 13 / 2 is 6.5.)
However, I would ideally like to run a query that will give me all the results for each unique time in the table. The output of this new query would look as follows:
time | result
------+----------
1 | 4.5
2 | 5
3 | 6.5
4 | 5
This new query ideally would avoid adding another subselect clause if possible; an efficient query would be preferred. I could get these prior results by running the prior query inside my application for each timestep, but this doesn't seem efficient for a large sample_a.
What would this new query look like?
See if performance is acceptable this way. Syntax might need minor tweaks:
select t.time, avg(mr.val) as result
from (select distinct time from sample_a) t,
lateral (
select distinct on (cat) val
from sample_a a
where a.time <= t.time
order by a.cat, a.time desc
) mr
group by t.time
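As a cross-check on the expected output, the requirement can be sketched in plain Python: for each distinct time, average the most recent val per cat at or before that time (data copied from the sample_a table above):

```python
# (time, cat, val) rows from sample_a
rows = [(1, 1, 5), (1, 2, 4), (2, 1, 6), (3, 1, 9), (4, 3, 2)]

results = {}
for t in sorted({r[0] for r in rows}):
    latest = {}  # cat -> val of the most recent row at or before t
    for time, cat, val in sorted(rows):  # sorted by (time, cat), so later times overwrite
        if time <= t:
            latest[cat] = val
    results[t] = sum(latest.values()) / len(latest)

print(results)
# {1: 4.5, 2: 5.0, 3: 6.5, 4: 5.0}
```

This matches the expected result table, including 6.5 at time 3.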
I think you just want cumulative functions:
select time,
       sum(sum_val) over (order by time) / sum(num_val) over (order by time) as result
from (select time, sum(val) as sum_val, count(*) as num_val
      from sample_a
      group by time
     ) a;
Note if val is an integer, you might need to convert to a numeric to get fractional values.
This can be expressed without a subquery as well:
select time,
sum(sum(val)) over (order by time) / sum(count(*)) over (order by time) as result
from sample_a
group by time

How to add two values of the same column in a table

Consider the following table:
ID COL VALUE
1 A 10
2 B 10
3 C 10
4 D 10
5 E 10
Output:
ID COL VALUE
1 A 10
2 B 20
3 C 30
4 D 40
5 E 50
Based on your (deleted) comment in output it is taking up the sum of the upper values, it sounds like you're wanting a cumulative SUM().
You can do this with a windowed function:
Select Id, Col, Sum(Value) Over (Order By Id) As Value
From YourTable
Output
Id Col Value
1 A 10
2 B 20
3 C 30
4 D 40
5 E 50
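As a quick cross-check, the same windowed SUM can be run through Python's sqlite3 module (SQLite 3.25+); the SELECT below mirrors the answer's query:

```python
import sqlite3

# In-memory table with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (Id int, Col text, Value int);
    INSERT INTO YourTable
    VALUES (1,'A',10),(2,'B',10),(3,'C',10),(4,'D',10),(5,'E',10);
""")

# Cumulative SUM ordered by Id.
rows = conn.execute(
    "SELECT Id, Col, SUM(Value) OVER (ORDER BY Id) FROM YourTable ORDER BY Id"
).fetchall()
print(rows)
# [(1, 'A', 10), (2, 'B', 20), (3, 'C', 30), (4, 'D', 40), (5, 'E', 50)]
```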
Please use the code below to obtain the cumulative sum. It works as expected on SQL Server 2012.
DECLARE @Table TABLE (ID int, COL CHAR(2), VALUE int)

INSERT @Table (ID, COL, [VALUE])
VALUES
(1,'A',10),
(2,'B',10),
(3,'C',10),
(4,'D',10),
(5,'E',10)

SELECT t.ID, t.COL, SUM(VALUE) OVER (ORDER BY t.ID) AS VALUE
FROM @Table t
Not really sure what you are asking for. If my assumption is correct, you want to SUM the contents of a column and group it.
Select sum(value), col
from table
group by col

SQL - Row For Each Criteria - Access

I am trying to create a query that takes a list of data and gives me N rows each time a criterion is met.
Say I have the following data:
ID | Type
1 | Vegetables
2 | Vegetables
3 | Vegetables
4 | Fruits
5 | Fruits
6 | Meats
7 | Dairy
8 | Dairy
9 | Dairy
10 | Dairy
And what I want is:
Type
Dairy
Dairy
Dairy
Fruits
Fruits
Meats
Meats
Vegetables
Vegetables
The criteria I have is that for every 2 of each Type I count it as a "whole" value. If there is anything more than a whole value, round up to the nearest whole number. So, the Vegetables Type rounds up from 1.5 to 2 rows and the Dairy Type stays at 2 rows.
Then I want to add a row to every Type that is not the last type in the set (which is why Vegetables only has two rows), perhaps with another column denoting that it was the added row.
This query will return every type along with the number of times that it has to be repeated:
SELECT Type, tot+IIf(Type=(SELECT MAX(Type) FROM tablename),0,1) AS Rep
FROM (SELECT tablename.Type, -Int(-Count([tablename].[ID])/2) AS tot
FROM tablename
GROUP BY tablename.Type
) AS s;
then my idea is to use a table named [times] that contains every number repeated n times:
n
---
1
2
2
3
3
3
...
and then your query could be like this:
SELECT s.*
FROM (
SELECT Type, tot+IIf(Type=(SELECT MAX(Type) FROM tablename),0,1) AS rep
FROM (SELECT tablename.Type, -Int(-Count([tablename].[ID])/2) AS tot
FROM tablename
GROUP BY tablename.Type
) AS s1) s INNER JOIN times ON s.rep=times.n
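To make the repeat logic concrete, here is a plain-Python sketch of the same formula (count per Type, divide by 2, round up, add 1 unless the Type is MAX(Type)); the data mirrors the sample table above:

```python
import math

# Type column from the sample data (IDs 1-10).
types = ["Vegetables"] * 3 + ["Fruits"] * 2 + ["Meats"] + ["Dairy"] * 4

counts = {}
for t in types:
    counts[t] = counts.get(t, 0) + 1

last_type = max(counts)  # the SELECT MAX(Type) in the query above
reps = {t: math.ceil(n / 2) + (0 if t == last_type else 1)
        for t, n in counts.items()}
print(sorted(reps.items()))
# [('Dairy', 3), ('Fruits', 2), ('Meats', 2), ('Vegetables', 2)]
```

These counts match the desired output: Dairy repeated 3 times, the rest twice.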
So you want to count the records, divide by 2, round up and then add 1.
--Create a table variable with all numbers from 1 to 1024.
declare @Numbers table
(
    MaxQty INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
)
WHILE COALESCE(SCOPE_IDENTITY(), 0) < 1024
BEGIN
    INSERT @Numbers DEFAULT VALUES
END
--First get the count of records
SELECT [Type], COUNT(*) as CNT
INTO #TMP1
FROM MyTable
GROUP BY [Type]
--Now get the number of times each record should be repeated, based on this formula:
-- count the records, divide by 2, round up and then add 1
--(CNT / 2.0 forces decimal division; CNT / 2 would truncate before CEILING runs)
SELECT [Type], CNT, CEILING(CNT / 2.0) + 1 as TimesToRepeat
INTO #TMP2
FROM #TMP1
--Join the #TMP2 table with the #Numbers table so you can repeat your records the
-- required number of times
SELECT A.*
from #TMP2 as A
join #Numbers as B
on B.MaxQty <= A.TimesToRepeat
Not pretty, but it should work. This still doesn't account for the last type in the set, I'm a little stumped by that part.

Multiple columns from a table into one, large column?

I don't know what in the world is the best way to go about this. I have a very large array of columns, each one with 1-25 rows associated with it. I need to be able to combine all into one large column, skipping blanks if at all possible. Is this something that Access can do?
a b c d e f g h
3 0 1 1 1 1 1 5
3 5 6 8 8 3 5
1 1 2 2 1 5
4 4 2 1 1 5
1 5
There are no blanks within each column, but each column has a different number of values. They need to be appended left to right: a, b, c, d, e, f. The 0 from b needs to go in the first blank cell after the second 3 in a, and the first 5 in h needs to come directly after the 1 in g, with no blanks.
So you want a result like:
3
3
0
5
1
4
1
6
1
4
etc?
Here is how I would approach the problem. Insert your array into a work table that has an autonumber column called id (important to retain the order the data is in; databases do not guarantee an order unless you give them something to sort on) as well as the array columns.
Create a final table with an autonumber column (see the note above on why you need one) and the column you want in your final table.
Run a separate insert statement for each column in your work table, in the order you want the data.
so the inserts would look something like:
insert table2 (colA)
select columnA from table1 order by id
insert table2 (colA)
select columnB from table1 order by id
insert table2 (colA)
select columnC from table1 order by id
Now when you do select columnA from table2 order by id you should have the results you need.
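The approach above amounts to stacking the columns left to right while dropping blanks. A plain-Python sketch (with made-up column contents, since the full array isn't reproduced here):

```python
# Hypothetical columns; None marks a blank cell.
columns = {
    "a": [3, 3],
    "b": [0, 5, 1, 4, 1],
    "c": [1, 6, 1, 4],
}

stacked = []
for name in ["a", "b", "c"]:  # left-to-right column order
    # Append each column's values top to bottom, skipping blanks.
    stacked.extend(v for v in columns[name] if v is not None)

print(stacked)
# [3, 3, 0, 5, 1, 4, 1, 1, 6, 1, 4]
```

Each per-column insert in the answer plays the role of one iteration of this loop, with the autonumber column preserving the order.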

Can multiple rows within a window be referenced by an analytic function?

Given a table with:
ID VALUE
-- -----
1  1
2  2
3  3
4  4
5  5
I would like to compute something like this:
ID VALUE SUM
-- ----- ---
1 1 40 -- (2-1)*2 + (3-1)*3 + (4-1)*4 + (5-1)*5
2 2 26 -- (3-2)*3 + (4-2)*4 + (5-2)*5
3 3 14 -- (4-3)*4 + (5-3)*5
4 4 5 -- (5-4)*5
5 5 0 -- 0
Where the SUM on each row is the sum of the values of each subsequent row multiplied by the difference between the value of the subsequent row and the current row.
I could start with something like this:
CREATE TABLE x(id int, value int);
INSERT INTO x VALUES(1, 1);
INSERT INTO x VALUES(2, 2);
INSERT INTO x VALUES(3, 3);
INSERT INTO x VALUES(4, 4);
INSERT INTO x VALUES(5, 5);
SELECT id, value
,SUM(value) OVER(ORDER BY id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS sum
FROM x;
id | value | sum
----+-------+-----
1 | 1 | 14
2 | 2 | 12
3 | 3 | 9
4 | 4 | 5
5 | 5 |
(5 rows)
where each row has the sum of all subsequent rows. But to take it further, I would really want something like this pseudo code:
SELECT id, value
,SUM( (value - FIRST_ROW(value)) * value )
OVER(ORDER BY id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS sum
FROM x;
But this is not valid. And that is the crux of the question: is there a way to reference multiple rows in the window of an analytic function? Or a different way to approach this? The example above is contrived. I was actually playing with an interesting puzzle from another post Rollup Query which led me to this problem. I am trying this in Postgresql 9.1, but not bound to that.
Not quite sure if I've understood your requirement exactly here, but the query that you want is something like
select a.id, a.value, coalesce(sum((b.value - a.value) * b.value), 0) as sum
from x a
left join x b on a.id < b.id
group by a.id, a.value
order by a.id
(The left join plus coalesce keeps the last row, whose sum is 0; a plain inner join would drop it.)
Hope that helps.
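For reference, here is that self-join run through Python's sqlite3 module, with a LEFT JOIN plus COALESCE so the last row (which has no following rows) still comes out as 0, matching the expected output:

```python
import sqlite3

# In-memory table with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (id int, value int);
    INSERT INTO x VALUES (1,1),(2,2),(3,3),(4,4),(5,5);
""")

# For each row a, sum (b.value - a.value) * b.value over all later rows b.
rows = conn.execute("""
    SELECT a.id, a.value,
           COALESCE(SUM((b.value - a.value) * b.value), 0) AS s
    FROM x a LEFT JOIN x b ON a.id < b.id
    GROUP BY a.id, a.value
    ORDER BY a.id
""").fetchall()
print(rows)
# [(1, 1, 40), (2, 2, 26), (3, 3, 14), (4, 4, 5), (5, 5, 0)]
```

The O(n²) self-join is fine for small tables; for large ones a window-based rewrite would be worth exploring.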