Count and aggregate data by time - SQL

I'm looking for a beautiful, easy-to-read, smart SQL query (SQLite engine) to aggregate data in a column. It's easier to explain with an example:
Data table:
id  elapsedtime  httpcode
1   0.0          200
2   0.1          200
3   0.3          301
4   0.6          404
5   1.0          200
6   1.1          404
7   1.2          500
Expected result set: one column per httpcode, with the count of each code per time bucket. In this example, the bucket size is 0.2s (but it could also be 1s or 10s). I'm only interested in a few expected HTTP codes:
time  code_200  code_404  code_500  code_other
0.0   2         0         0         0
0.2   0         0         0         1
0.4   0         1         0         0
0.6   0         0         0         0
0.8   0         0         0         0
1.0   1         1         1         0
It is not mandatory for "time" to be continuous. In the example above, the rows with time 0.6 and 0.8 could be removed.
For the moment, I do this with 4 different queries (one per HTTP code) and aggregate the results in my application:
select
    0.2 * cast(elapsedtime / 0.2 as int) as time,
    count(id) as code_200
from test
where httpcode = 200
group by time
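(For example, elapsedtime = 0.3 gives cast(0.3 / 0.2 as int) = 1, so that row lands in bucket 0.2 * 1 = 0.2.)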
But I'm pretty sure I can achieve this with a single query. Unfortunately, I haven't mastered the UNION keyword...
Is there a way to get such data in a single SELECT?
See SQLFiddle : http://sqlfiddle.com/#!5/2081f/3/1

I found a nicer solution than my original one; I'll leave the old one in below in case you're curious. Here's the nicer solution:
with t1 as (
    select
        0.2 * cast(elapsedtime / 0.2 as int) as time,
        case httpcode when 200 then 1 else 0 end as code_200,
        case httpcode when 404 then 1 else 0 end as code_404,
        case httpcode when 500 then 1 else 0 end as code_500,
        case when httpcode not in (200, 404, 500) then 1 else 0 end as code_other
    from test
)
select time,
       sum(code_200) as count_200,
       sum(code_404) as count_404,
       sum(code_500) as count_500,
       sum(code_other) as count_other
from t1
group by time;
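If you expect to vary the bucket size (0.2s, 1s, 10s...), a small sketch against the same test table that names the width once, so it only has to change in one place:
-- A sketch: the bucket width lives in its own CTE (SQLite).
with params as (
    select 0.2 as bucket
),
t1 as (
    select
        p.bucket * cast(t.elapsedtime / p.bucket as int) as time,
        case t.httpcode when 200 then 1 else 0 end as code_200,
        case t.httpcode when 404 then 1 else 0 end as code_404,
        case t.httpcode when 500 then 1 else 0 end as code_500,
        case when t.httpcode not in (200, 404, 500) then 1 else 0 end as code_other
    from test t, params p
)
select time,
       sum(code_200) as count_200,
       sum(code_404) as count_404,
       sum(code_500) as count_500,
       sum(code_other) as count_other
from t1
group by time;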
Old solution:
This might not be easy on the eye, but it more or less works. The only difference between your desired output and what I get is that time buckets with no values (0.6 and 0.8 in your example) are omitted:
with
t_all as (
    select 0.2 * cast(elapsedtime / 0.2 as int) as time, count(id) as total
    from test
    group by time
),
t_200 as (
    select 0.2 * cast(elapsedtime / 0.2 as int) as time, count(id) as code_200
    from test
    where httpcode = 200
    group by time
),
t_404 as (
    select 0.2 * cast(elapsedtime / 0.2 as int) as time, count(id) as code_404
    from test
    where httpcode = 404
    group by time
),
t_500 as (
    select 0.2 * cast(elapsedtime / 0.2 as int) as time, count(id) as code_500
    from test
    where httpcode = 500
    group by time
),
t_other as (
    select 0.2 * cast(elapsedtime / 0.2 as int) as time, count(id) as code_other
    from test
    where httpcode not in (200, 404, 500)
    group by time
)
select
    t_all.time,
    total,
    ifnull(code_200, 0) as count_200,
    ifnull(code_404, 0) as count_404,
    ifnull(code_500, 0) as count_500,
    ifnull(code_other, 0) as count_other
from t_all
left join t_200 on t_all.time = t_200.time
left join t_404 on t_all.time = t_404.time
left join t_500 on t_all.time = t_500.time
left join t_other on t_all.time = t_other.time;

This may help you:
select
    0.2 * cast(elapsedtime / 0.2 as int) as time,
    count(id) as code_200,
    0 as code_404,
    0 as code_500,
    0 as code_other
from test
where httpcode = 200
group by time
union
select
    0.2 * cast(elapsedtime / 0.2 as int) as time,
    0 as code_200,
    count(id) as code_404,
    0 as code_500,
    0 as code_other
from test
where httpcode = 404
group by time
union
select
    0.2 * cast(elapsedtime / 0.2 as int) as time,
    0 as code_200,
    0 as code_404,
    count(id) as code_500,
    0 as code_other
from test
where httpcode = 500
group by time
union
select
    0.2 * cast(elapsedtime / 0.2 as int) as time,
    0 as code_200,
    0 as code_404,
    0 as code_500,
    count(id) as code_other
from test
where httpcode <> 200 and httpcode <> 404 and httpcode <> 500
group by time
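Note that the four branches can each emit a separate row for the same time bucket, so the codes for one bucket may be spread across up to four rows. A sketch (same test table) that collapses them with an outer aggregation:
with u as (
    select 0.2 * cast(elapsedtime / 0.2 as int) as time,
           count(id) as code_200, 0 as code_404, 0 as code_500, 0 as code_other
    from test
    where httpcode = 200
    group by time
    union all
    select 0.2 * cast(elapsedtime / 0.2 as int) as time,
           0, count(id), 0, 0
    from test
    where httpcode = 404
    group by time
    union all
    select 0.2 * cast(elapsedtime / 0.2 as int) as time,
           0, 0, count(id), 0
    from test
    where httpcode = 500
    group by time
    union all
    select 0.2 * cast(elapsedtime / 0.2 as int) as time,
           0, 0, 0, count(id)
    from test
    where httpcode not in (200, 404, 500)
    group by time
)
select time,
       sum(code_200) as code_200,
       sum(code_404) as code_404,
       sum(code_500) as code_500,
       sum(code_other) as code_other
from u
group by time;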

Related

SQL decreasing sum by a percentage

I have a table like:
timestamp   type  value
08.01.2023  1     5
07.01.2023  0     20
06.01.2023  1     1
05.01.2023  0     50
04.01.2023  0     50
03.01.2023  1     1
02.01.2023  1     1
01.01.2023  1     1
Type 1 means a deposit, type 0 means a withdrawal.
The thing is, when the type is 1 the value is the exact amount the user deposited, so we can just sum it; but type 0 means a withdrawal in percent.
What I'm looking for is to create another column with the currently deposited amount. For the example above it would look like this:
timestamp   type  value  deposited
08.01.2023  1     5      6.4
07.01.2023  0     20     1.4
06.01.2023  1     1      1.75
05.01.2023  0     50     0.75
04.01.2023  0     50     1.5
03.01.2023  1     1      3
02.01.2023  1     1      2
01.01.2023  1     1      1
I can't figure out how to build a sum like this, one that subtracts a percentage of the previous total.
You are trying to carry state over time, so you either need a UDTF to do the carry work for you, or a recursive CTE:
with data(transaction_date, type, value) as (
    select to_date(column1, 'dd.mm.yyyy'), column2, column3
    from values
        ('08.01.2023', 1, 5),
        ('07.01.2023', 0, 20),
        ('06.01.2023', 1, 1),
        ('05.01.2023', 0, 50),
        ('04.01.2023', 0, 50),
        ('03.01.2023', 1, 1),
        ('02.01.2023', 1, 1),
        ('01.01.2023', 1, 1)
), pre_process_data as (
    select *
        ,iff(type = 0, 0, value)::number as add
        ,iff(type = 0, value, 0)::number as per
        ,row_number() over (order by transaction_date asc) as rn
    from data
), rec_cte_block as (
    with recursive rec_sub_cte as (
        select
            p.*,
            p.add::number(20,4) as deposited
        from pre_process_data as p
        where p.rn = 1
        union all
        select
            p.*,
            round(div0((r.deposited + p.add) * (100 - p.per), 100), 2) as deposited
        from rec_sub_cte as r
        join pre_process_data as p
            on p.rn = r.rn + 1
    )
    select *
    from rec_sub_cte
)
select * exclude(add, per, rn)
from rec_cte_block
order by 1;
I wrote the recursive CTE this way because there is currently a known issue when IFF or CASE is used inside the recursive part.
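Each recursion step joins row rn + 1 to the running result, so deposited is carried forward one row at a time; at the first 50% withdrawal, for example, the step computes round((3 + 0) * (100 - 50) / 100, 2) = 1.5.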
TRANSACTION_DATE  TYPE  VALUE  DEPOSITED
2023-01-01        1     1      1
2023-01-02        1     1      2
2023-01-03        1     1      3
2023-01-04        0     50     1.5
2023-01-05        0     50     0.75
2023-01-06        1     1      1.75
2023-01-07        0     20     1.4
2023-01-08        1     5      6.4
Here is a solution without recursion or a UDTF.
create table depo (timestamp date,type int, value float);
insert into depo values
(cast('01.01.2023' as date),1, 1.0)
,(cast('02.01.2023' as date),1, 1.0)
,(cast('03.01.2023' as date),1, 1.0)
,(cast('04.01.2023' as date),0, 50.0)
,(cast('05.01.2023' as date),0, 50.0)
,(cast('06.01.2023' as date),1, 1.0)
,(cast('07.01.2023' as date),0, 20.0)
,(cast('08.01.2023' as date),1, 5.0)
;
with t0 as (
    select *
        ,sum(case when type = 0 and value >= 100 then 1 else 0 end)
             over (order by timestamp) as gr
    from depo
)
,t1 as (
    select timestamp as dt, type, gr
        ,case when type = 1 then value else 0 end as depo
        ,case when type = 0 then ((100.0 - value) / 100.0) else 0.0 end as pct
        ,sum(case when type = 0 and value < 100 then log((100.0 - value) / 100.0, 2.0)
                  when type = 0 and value >= 100 then null
                  else 0.0
             end)
             over (partition by gr order by timestamp
                   rows between current row and unbounded following) as totLog
    from t0
)
,t2 as (
    select *
        ,case when type = 1 then
              isnull(sum(depo * power(cast(2.0 as float), totLog))
                         over (partition by gr order by dt
                               rows between unbounded preceding and 1 preceding)
                     , 0) / power(cast(2.0 as float), totLog)
              + depo
         else
              isnull(sum(depo * power(cast(2.0 as float), totLog))
                         over (partition by gr order by dt
                               rows between unbounded preceding and 1 preceding)
                     , 0) / power(cast(2.0 as float), totLog) * pct
         end as rest
    from t1
)
select dt, type, depo, pct * 100 as pct
    ,rest - lag(rest, 1, 0) over (order by dt) as movement
    ,rest
from t2
order by dt
dt          type  depo  pct  movement  rest
2023-01-01  1     1     0    1         1
2023-02-01  1     1     0    1         2
2023-03-01  1     1     0    1         3
2023-04-01  0     0     50   -1.5      1.5
2023-05-01  0     0     50   -0.75     0.75
2023-06-01  1     1     0    1         1.75
2023-07-01  0     0     80   -0.35     1.4
2023-08-01  1     5     0    5         6.4
I think it is better to perform this kind of calculation on the client side or in the middle tier.
Sequential calculations are difficult to implement in SQL. In some special cases you can use logarithmic expressions, but it is clearer and easier to implement through recursion, as @Simeon showed.
To expand on @ValNik's answer:
The first simple step is to change "deduct 20%, then deduct 50%, then deduct 30%" into a multiplication...
x - 20% - 50% - 30%
=> x * 0.8 * 0.5 * 0.7
=> x * 0.28
The second trick is to understand how to calculate a cumulative PRODUCT() when all you have is a cumulative SUM() OVER (), using the properties of logarithms...
a * b == exp( log(a) + log(b) )

0.8 * 0.5 * 0.7
=> exp( log(0.8) + log(0.5) + log(0.7) )
=> exp( -0.2231 + -0.6931 + -0.3567 )
=> exp( -1.2730 )
=> 0.28
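The same trick in SQL, as a minimal sketch (the factors table is hypothetical, and log() is the natural logarithm here, as in SQL Server):
-- Cumulative product via exp(sum(log(x))); "factors" is made up for illustration.
with factors(id, x) as (
    select * from (values (1, 0.8), (2, 0.5), (3, 0.7)) as v(id, x)
)
select id, x,
       exp(sum(log(x)) over (order by id
                             rows between unbounded preceding and current row)) as cumprod
from factors;
-- last row: cumprod = 0.28, matching 0.8 * 0.5 * 0.7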
The next trick is easier to explain with integers rather than percentages: break the original problem down into one that can be solved using a "cumulative sum" and a "cumulative product"...
Current working:
row_id  type  value  equation                      result
1       +     10     0 + 10                        10
2       +     20     (0 + 10 + 20)                 30
3       *     2      (0 + 10 + 20) * 2             60
4       +     30     (0 + 10 + 20) * 2 + 30        90
5       *     3      ((0 + 10 + 20) * 2 + 30) * 3  270
Rearranged working:
row_id  type  value  CUMPROD  new equation              result
1       +     10     2*3=6    (10*6) / 6                10
2       +     20     2*3=6    (10*6 + 20*6) / 6         30
3       *     2      3=3      (10*6 + 20*6) / 3         60
4       +     30     3=3      (10*6 + 20*6 + 30*3) / 3  90
5       *     3      =1       (10*6 + 20*6 + 30*3) / 1  270
CUMPROD is the "cumulative product" of all future "multiplication values".
The equation is then the "cumulative sum" of value * CUMPROD divided by the current CUMPROD.
So...
row 1 : SUM(10*6 ) / 6 => SUM(10 )
row 2 : SUM(10*6, 20*6 ) / 6 => SUM(10, 20)
row 3 : SUM(10*6, 20*6 ) / 3 => SUM(10, 20) * 2
row 4 : SUM(10*6, 20*6, 30*3) / 3 => SUM(10, 20) * 2 + SUM(30)
row 5 : SUM(10*6, 20*6, 30*3) / 1 => SUM(10, 20) * 2*3 + SUM(30) * 3
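A hedged sketch of this integer working in SQL (T-SQL-flavoured; the ops inline table and column names are made up for illustration):
-- Rebuild the worked example: cumulative product of future multipliers via exp/log.
with ops as (
    select * from (values
        (1, '+', 10.0),
        (2, '+', 20.0),
        (3, '*',  2.0),
        (4, '+', 30.0),
        (5, '*',  3.0)
    ) as v(row_id, op, val)
), cumprod as (
    select *,
        -- product of all *future* multipliers; '+' rows contribute log(1.0) = 0
        exp(isnull(sum(log(case when op = '*' then val else 1.0 end))
                       over (order by row_id
                             rows between 1 following and unbounded following), 0)) as future_mult
    from ops
)
select row_id, op, val,
    -- cumulative sum of (value * CUMPROD), divided by the current CUMPROD
    sum(case when op = '+' then val else 0 end * future_mult)
        over (order by row_id rows unbounded preceding) / future_mult as result
from cumprod;
-- results: 10, 30, 60, 90, 270, as in the table above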
The only things to be cautious of are:
LOG(0) is undefined (it tends to negative infinity), which would happen when deducting 100%
Deducting more than 100% makes no sense
So, I copied @ValNik's code that creates a new partition every time 100% or more is deducted (forcing everything in the next partition to start from zero again).
This gives the following SQL (a re-arranged version of @ValNik's code):
WITH
partition_when_deduct_everything AS
(
SELECT
*,
SUM(
CASE WHEN type = 0 AND value >= 100 THEN 1 ELSE 0 END
)
OVER (
ORDER BY timestamp
)
AS deduct_everything_id,
CASE WHEN type = 1 THEN value
ELSE 0
END
AS deposit,
CASE WHEN type = 1 THEN 1.0 -- Deposits == Deduct 0%
WHEN value >= 100 THEN 1.0 -- Treat "deduct everything" as a special case
ELSE (100.0-value)/100.0 -- Change "deduct 20%" to "multiply by 0.8"
END
AS multiplier
FROM
your_table
)
,
cumulative_product_of_multipliers as
(
SELECT
*,
EXP(
ISNULL(
SUM(
LOG(multiplier)
)
OVER (
PARTITION BY deduct_everything_id
ORDER BY timestamp
ROWS BETWEEN 1 FOLLOWING
AND UNBOUNDED FOLLOWING
)
, 0
)
)
AS future_multiplier
FROM
partition_when_deduct_everything
)
SELECT
*,
ISNULL(
SUM(
deposit * future_multiplier
)
OVER (
PARTITION BY deduct_everything_id
ORDER BY timestamp
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
),
0
)
/
future_multiplier
AS rest
FROM
cumulative_product_of_multipliers
Demo: https://dbfiddle.uk/mrioIMiB
The way this should really be solved is with a UDTF, because it requires sorting the data once and traversing the data only once, and if you have different partitions (user_id, etc.) the work can be done in parallel:
create or replace function carry_value_state(_TYPE float, _VALUE float)
returns table (DEPOSITED float)
language javascript
as
$$
{
    initialize: function (argumentInfo, context) {
        this.carried_value = 0.0;
    },
    processRow: function (row, rowWriter, context) {
        if (row._TYPE === 1) {
            // deposits are added verbatim
            this.carried_value += row._VALUE;
        } else {
            // withdrawals remove a (clamped to 0..100) percentage of the running total
            let limited = Math.max(Math.min(row._VALUE, 100.0), 0.0);
            this.carried_value -= (this.carried_value * limited) / 100;
        }
        rowWriter.writeRow({DEPOSITED: this.carried_value});
    }
}
$$;
which then gets used like:
select d.*, c.*
from data as d,
     table(carry_value_state(d.type::float, d.value::float) over (order by transaction_date)) as c
order by 1;
So for the data we have been using in the example, that gives:
TRANSACTION_DATE  TYPE  VALUE  DEPOSITED
2023-01-01        1     1      1
2023-01-02        1     1      2
2023-01-03        1     1      3
2023-01-04        0     50     1.5
2023-01-05        0     50     0.75
2023-01-06        1     1      1.75
2023-01-07        0     20     1.4
2023-01-08        1     5      6.4
Yes, the results are now in floating point, so you should double-round to avoid FP representation problems, like:
round(round(c.deposited, 6) , 2) as deposited
An alternative approach uses MATCH_RECOGNIZE(), POW(), and SUM().
I would not recommend MATCH_RECOGNIZE() unless you have to; it's fiddly and can waste time, but it does look elegant.
with data(transaction_date, type, value) as (
    select to_date(column1, 'dd.mm.yyyy'), column2, column3
    from values
        ('08.01.2023', 1, 5),
        ('07.01.2023', 0, 20),
        ('06.01.2023', 1, 1),
        ('05.01.2023', 0, 50),
        ('04.01.2023', 0, 50),
        ('03.01.2023', 1, 1),
        ('02.01.2023', 1, 1),
        ('01.01.2023', 1, 1)
)
select *
from data match_recognize(
    order by transaction_date
    measures
        sum(iff(CLASSIFIER() = 'ROW_WITH_DEPOSIT', value, 0)) DEPOSITS,
        pow(iff(CLASSIFIER() = 'ROW_WITH_WITHDRAWL', value / 100, 1),
            count(row_with_withdrawl.*)) DISCOUNT_FROM_WITHDRAWL,
        CLASSIFIER() TRANS_TYPE,
        first(transaction_date) as start_date,
        last(transaction_date) as end_date,
        count(*) as rows_in_sequence,
        count(row_with_deposit.*) as num_deposits,
        count(row_with_withdrawl.*) as num_withdrawls
    after match skip past last row
    pattern ((row_with_deposit+ | row_with_withdrawl+))
    define
        row_with_deposit as type = 1,
        row_with_withdrawl as type = 0
);
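For context: the pattern ((row_with_deposit+ | row_with_withdrawl+)) combined with after match skip past last row splits the ordered rows into maximal runs of consecutive deposits or consecutive withdrawals, and the measures are computed once per run.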

Subtract in Union

I have this data, where I want to generate the last row "on the fly" from the first two:
Group  1yr  2yrs  3yrs  date      code
Port   19   -15   88    1/1/2020  arp
Bench  10   -13   66    1/1/2020  arb
Diff   9    2     22
I am trying to subtract the Port & Bench returns and have the difference on the new row. How can I do this?
Here's my code so far:
Select
    [date],
    [Group],
    Code,
    [1yr] returnp,
    [2yrs] returnp,
    [3yrs] returnp
From timetable
union
Select
    [date],
    [Group],
    Code,
    [1yr] returnb,
    [2yrs] returnb,
    [3yrs] returnb
From timetable
Seems to me that a UNION ALL in concert with conditional aggregation should do the trick.
Note the sum() is wrapped in an abs() to match the desired results.
Select *
From YourTable
Union All
Select [Group] = 'Diff'
,[1yr] = abs(sum([1yr] * case when [Group]='Bench' then -1 else 1 end))
,[2yrs] = abs(sum([2yrs] * case when [Group]='Bench' then -1 else 1 end))
,[3yrs] = abs(sum([3yrs] * case when [Group]='Bench' then -1 else 1 end))
,[date] = null
,[code] = null
from YourTable
Results
Group  1yr  2yrs  3yrs  date        code
Port   19   -15   88    2020-01-01  arp
Bench  10   -13   66    2020-01-01  arb
Diff   9    2     22    NULL        NULL
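If you would rather keep the sign (Port minus Bench) than take the absolute value, a sketch of the same conditional aggregation without the abs() (same YourTable assumption as above):
Select [Group] = 'Diff'
      ,[1yr]  = sum([1yr]  * case when [Group] = 'Bench' then -1 else 1 end)
      ,[2yrs] = sum([2yrs] * case when [Group] = 'Bench' then -1 else 1 end)
      ,[3yrs] = sum([3yrs] * case when [Group] = 'Bench' then -1 else 1 end)
      ,[date] = null
      ,[code] = null
from YourTable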
If you know there are always exactly 2 rows, something like this would work (MAX - MIN gives the absolute difference per column):
SELECT * FROM timetable
UNION ALL
SELECT
    'Diff',
    MAX([1yr]) - MIN([1yr]),
    MAX([2yrs]) - MIN([2yrs]),
    MAX([3yrs]) - MIN([3yrs]),
    null,
    null
FROM timetable

Remove the time series before a sequence of zeros using SQL

I have a time series as follows:
Day        Data
1/1/2020   0
2/1/2020   0.2
3/1/2020   0
......     ...
1/2/2020   0
2/2/2020   0
3/2/2020   0.2
4/2/2020   0.3
5/2/2020   0
6/2/2020   0
7/2/2020   0
8/2/2020   2
9/2/2020   2.4
10/2/2020  3
So I want to filter the data to show only the rows after the final sequence of zeros in the time series; in this case I want to get only the data from 8/2/2020 on.
I have tried this:
SELECT * FROM table WHERE Data > 0
Here is the result:
Day        Data
2/1/2020   0.2
......     ...
3/2/2020   0.2
4/2/2020   0.3
8/2/2020   2
9/2/2020   2.4
10/2/2020  3
However, this does not find the latest 0 and remove everything before it.
I also want to show the result starting 2 days after the final zero in the sequence:
Day        Data
10/2/2020  3
11/2/2020  3.5
.....      ....
One method is:
select t.*
from t
where t.day > (select max(t2.day) from t t2 where t2.value = 0);
You can offset this:
where t.day > (select max(t2.day) + interval '2' day from t t2 where t2.value = 0);
The above assumes that at least one row has a zero. If none does, the subquery returns NULL and no rows come back; here are two easy fixes:
where t.day > all (select max(t2.day) from t t2 where t2.value = 0);
or:
where t.day > (select coalesce(max(t2.day), '2000-01-01') from t t2 where t2.value = 0);
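(The all version works because a comparison against an empty set is true for > all, so every row is kept when there are no zeros; the coalesce version substitutes a sentinel date far in the past for the same effect.)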
You can use window functions:
select day, data
from (
    select t.*, max(case when data = 0 then day end) over () as day0
    from mytable t
) t
where day > day0 or day0 is null
order by day
This is easily adapted if you want to start two days after the last 0:
select day, data
from (
    select t.*, max(case when data = 0 then day end) over () as day0
    from mytable t
) t
where day > day0 + interval '2 day' or day0 is null
order by day

Calculating Percentages in Postgres

I'm completely new to PostgreSQL. I have the following table called my_table:
a  b  c     date
1  0  good  2019-05-02
0  1  good  2019-05-02
1  1  bad   2019-05-02
1  1  good  2019-05-02
1  0  bad   2019-05-01
0  1  good  2019-05-01
1  1  bad   2019-05-01
0  0  bad   2019-05-01
I want to calculate the percentage of 'good' from column c for each date. I know how to get the number of 'good':
SELECT COUNT(c), date FROM my_table WHERE c != 'bad' GROUP BY date;
That returns:
count  date
3      2019-05-02
1      2019-05-01
My goal is to get this:
date        perc_good
2019-05-02  75
2019-05-01  25
So I tried the following:
SELECT date,
       (SELECT COUNT(c)
        FROM my_table
        WHERE c != 'bad'
        GROUP BY date) / COUNT(c) * 100 as perc_good
FROM my_table
GROUP BY date;
And I get an error saying: "more than one row returned by a subquery used as an expression".
I found this answer but am not sure how, or whether, it applies to my case:
Calculating percentage in PostgreSql
How do I go about calculating the percentage for multiple rows?
avg() is convenient for this purpose:
select date,
       avg((c = 'good')::int) * 100 as percent_good
from t
group by date
order by date;
How does this work? c = 'good' is a boolean expression. The ::int converts it to a number, 1 for true and 0 for false. The average of these 1s and 0s is then exactly the ratio of true values.
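(A side note: avg() over integer input returns numeric in Postgres, so there is no integer-division truncation to worry about here.)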
For this case you need to use conditional AVG():
SELECT
date,
100 * avg(case when c = 'good' then 1 else 0 end) perc_good
FROM my_table
GROUP BY date;
See the demo.
You could use a conditional sum to get the good count and count() for the total.
Below is an exhaustive code sample (with a ::numeric cast to avoid integer division):
select date
    , count(c) as total
    , sum(case when c = 'good' then 1 else 0 end) as total_good
    , sum(case when c = 'bad' then 1 else 0 end) as total_bad
    , (sum(case when c = 'good' then 1 else 0 end)::numeric / count(c)) * 100 as perc_good
    , (sum(case when c = 'bad' then 1 else 0 end)::numeric / count(c)) * 100 as perc_bad
from my_table
group by date
And for your result:
select date
    , (sum(case when c = 'good' then 1 else 0 end)::numeric / count(c)) * 100 as perc_good
from my_table
group by date
Or, as suggested by a_horse_with_no_name, using count(*) filter():
select date
    , (count(*) filter (where c = 'good'))::numeric / count(*) * 100 as perc_good
from my_table
group by date

Calculating new percentages over total in a "select distinct" query with multiple percentages rows

The title may be puzzling, but the situation should be really simple for anyone reading the following lines.
I have this table VAL:
customer  month_1  month_2  month_3  value
ABC       0.5      0        0.50     200
ABC       0.25     0.25     0.50     200
XYZ       1        0        0        150
RST       0        0        1        200
RST       0        0.50     0.50     130
(This is the code to get it)
create table VAL (customer nvarchar(255), month_1 decimal(18,2), month_2 decimal(18,2), month_3 decimal(18,2), value decimal(18,2))
insert into VAL values ('ABC', 0.5,  0,    0.50, 200)
insert into VAL values ('ABC', 0.25, 0.25, 0.50, 200)
insert into VAL values ('XYZ', 1,    0,    0,    150)
insert into VAL values ('RST', 0,    0,    1,    200)
insert into VAL values ('RST', 0,    0.50, 0.50, 130)
This can be seen (converting the percentages into actual values) as:
customer  value_month_1  value_month_2  value_month_3  value
ABC       100            0              100            200
ABC       50             50             100            200
XYZ       150            0              0              150
RST       0              0              200            200
RST       0              65             65             130
I need to transform everything into the following table, which is a sort of collapsed version of the first one:
customer  month_1  month_2  month_3  value
ABC       0.375    0.125    0.50     400
XYZ       1.0      0        0.0      150
RST       0.0      0.197    0.803    330
So far, I'm able to do this customer by customer with the following code:
select distinct customer
    ,sum(month_1) as month_1
    ,sum(month_2) as month_2
    ,sum(month_3) as month_3
    ,sum(value) as value
from (
    select distinct customer
        ,month_1 * sum(value) / (select sum(value) from VAL where customer = 'ABC' group by customer) as month_1
        ,month_2 * sum(value) / (select sum(value) from VAL where customer = 'ABC' group by customer) as month_2
        ,month_3 * sum(value) / (select sum(value) from VAL where customer = 'ABC' group by customer) as month_3
        ,sum(value) as value
    from VAL
    where customer = 'ABC'
    group by customer, month_1, month_2, month_3
) as NEW
group by customer
That gives me the following result:
customer  month_1  month_2  month_3  value
ABC       0.375    0.125    0.50     400
I'm pretty sure there are better ways to do this, probably with some command I don't know how to use very well.
Can anyone help?
You just need a weighted average:
select customer,
sum(month_1 * value) / sum(value) as month_1,
sum(month_2 * value) / sum(value) as month_2,
sum(month_3 * value) / sum(value) as month_3,
sum(value)
from val
group by customer;
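As a quick check against the sample data, for customer ABC this gives month_1 = (0.5 * 200 + 0.25 * 200) / (200 + 200) = 150 / 400 = 0.375, matching the expected collapsed row.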