PostgreSQL sum of intervals - sql

In my database I have rows like:
date , value
16:13:00, 500
16:17:00, 700
16:20:00, 0
Now I want to compute a "special sum" over value from 16:00:00 to 17:00:00, one term per minute. Until 16:13 we assume that the value is 0.
So the special sum would look like this (I'll omit seconds):
...
0 + -- (16:12)
500 + -- (16:13)
500 + -- (16:14)
500 + -- (16:15)
500 + -- (16:16)
700 + -- (16:17)
700 + -- (16:18)
700 + -- (16:19)
0 + -- (16:20)
...
So the database stores only the changes of value and the times at which they occur, and I want to sum over the whole hour. The result should be 4100.
What is the optimal way of doing that kind of sum in sql with PostgreSQL?
Best

You could first select only the hour of your timestamp and then group by this hour:
SELECT
sum(s.value),
s.hour
FROM
(SELECT
value,
EXTRACT(HOUR FROM time) as hour
FROM la_table) as s
GROUP BY s.hour
This groups the rows by clock hour, so each result covers hh:00:00 to hh:59:59 (for example 15:00:00 to 15:59:59).
SQLFiddle for playing: http://sqlfiddle.com/#!1/d6ad1/1

If I've understood you correctly, you are looking for simply totals per hour?
SELECT EXTRACT(hour FROM "date") hr,
SUM(value) total
FROM yourtable
GROUP BY hr
ORDER BY hr;

If I understand correctly, you want to select every entry that overlaps a given time period. A row overlaps the period if it starts before the period ends and ends after the period starts. In general:
select * from table
where table.start_time < end_of_period and table.end_time > start_of_period
For example, for the period from 10:00 to 11:00:
select * from table
where (start_time < '2017-05-16 11:00:00') and (end_time > '2017-05-16 10:00:00')
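For the per-minute "special sum" described in the question, the arithmetic can be checked outside the database. The following is only a minimal Python sketch, not a PostgreSQL solution: the function name `special_sum` and the calendar date are assumptions made for illustration, and the value is taken to be 0 before the first recorded change, as the question states.

```python
from datetime import datetime, timedelta

def special_sum(changes, start, end, step=timedelta(minutes=1)):
    """Sum the value in effect at every step in the half-open range [start, end).

    `changes` is a list of (timestamp, value) pairs sorted by timestamp.
    The value is assumed to be 0 before the first change.
    """
    total = 0
    current = 0  # value in effect before any change
    i = 0
    t = start
    while t < end:
        # Apply every change that happened up to and including t.
        while i < len(changes) and changes[i][0] <= t:
            current = changes[i][1]
            i += 1
        total += current
        t += step
    return total

day = datetime(2024, 1, 1)  # arbitrary date; the question only gives times
changes = [
    (day.replace(hour=16, minute=13), 500),
    (day.replace(hour=16, minute=17), 700),
    (day.replace(hour=16, minute=20), 0),
]
result = special_sum(changes, day.replace(hour=16), day.replace(hour=17))
print(result)  # 4100
```

This reproduces the 4100 from the question: four minutes at 500 plus three minutes at 700. In SQL the same idea could probably be expressed by weighting each value by the number of minutes until the next change, but that is not shown here.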

Related

Extract previous row calculated value for use in current row calculations - Postgres

I have a requirement where I need to use the calculated value of the previous row in the calculation for the current row.
The following is a sample of how the data currently looks:
ID | Date       | Days
---+------------+-----
1  | 2022-01-15 | 30
2  | 2022-02-18 | 30
3  | 2022-03-15 | 90
4  | 2022-05-15 | 30
The following is the output I am expecting:
ID | Date       | Days | CalVal
---+------------+------+-----------
1  | 2022-01-15 | 30   | 2022-02-14
2  | 2022-02-18 | 30   | 2022-03-16
3  | 2022-03-15 | 90   | 2022-06-14
4  | 2022-05-15 | 30   | 2022-07-14
The value of CalVal for the first row is Date + Days.
From the second row onwards, it should take the CalVal value of the previous row and add the current row's Days to it.
Essentially, what I am looking for is a means to access the previous row's calculated value for use in the current row.
Is there any way we can achieve the above in Postgres SQL? I have been tinkering with window functions and even recursive CTEs but have had no luck :(
Would appreciate any direction!
Thanks in advance!
select
id,
date,
coalesce(
days - (lag(days, 1) over (order by date, days))
, days) as days,
first_date + cast(days as integer) as newdate
from
(
select
-- get a running sum of days
id,
first_date,
date,
sum(days) over (order by date, days) as days
from
(
select
-- get the first date
id,
(select min(date) from table1) as first_date,
date,
days
from
table1
) A
) B
This query gets the exact output you described. I'm not at all ready to say it is the best solution, but the strategy employed is essentially to create a running total of the "days". This means that we can just add this running total to the first date, and that will always be the next date in the desired sequence. One finesse: to put the original "days" back into the result, we subtract the previous running total from the current running total.
Assuming that the table name is table1:
select
id,
date,
days,
first_value(date) over (order by id) +
(sum(days) over (order by id rows between unbounded preceding and current row))
*interval '1 day' calval
from table1;
We just add the cumulative sum of days to the first date in the table. This is not literally what you asked for: we don't need the value from the previous row, just the cumulative sum of days.
A solution with recursion:
with recursive prev_row as (
select id, date, days, date+ days*interval '1 day' calval
from table1
where id = 1
union all
select t.id, t.date, t.days, p.calval + t.days*interval '1 day' calval
from prev_row p
join table1 t on t.id = p.id+ 1
)
select *
from prev_row
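The running-total idea used in the answers above can be checked against the expected output from the question. This is a minimal Python sketch under the assumption that the rows are processed in ID order; the variable names are chosen here for illustration only.

```python
from datetime import date, timedelta
from itertools import accumulate

# Sample rows from the question: (id, date, days)
rows = [
    (1, date(2022, 1, 15), 30),
    (2, date(2022, 2, 18), 30),
    (3, date(2022, 3, 15), 90),
    (4, date(2022, 5, 15), 30),
]

first_date = rows[0][1]
# Running total of days: 30, 60, 150, 180
running_days = accumulate(days for _, _, days in rows)
# CalVal = first date + running total, which equals "previous CalVal + current Days"
cal_vals = [first_date + timedelta(days=total) for total in running_days]

for (row_id, d, days), cal_val in zip(rows, cal_vals):
    print(row_id, d, days, cal_val)
```

Because each CalVal is the previous CalVal plus the current Days, the chain collapses to "first date plus cumulative days", which is why a window sum is enough and no recursion is strictly required.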

SQL Big Query - How to write a COUNTIF statement applied to an INTERVAL column

I have a trip_duration column in interval format. I want to remove all observations less than 90 seconds and count how many observations match this condition.
My current SQL query is
WITH
org_table AS (
SELECT
ended_at - started_at as trip_duration
FROM `cyclistic-328701.12_month_user_data_cyclistic.20*`
)
SELECT
COUNTIF(x < 1:30) AS false_start
FROM trip_duration AS x;
It returns Syntax error: Expected ")" but got ":" at [8:16]
I have also tried
SELECT
COUNTIF(x < "0-0 0 0:1:30") AS false_start
FROM trip_duration AS x
It returns Table name "trip_duration" missing dataset while no default dataset is set in the request.
I've read through other questions and have not been able to write a solution.
My first thought is to cast trip_duration from INTERVAL to TIME format so that COUNTIF statements can reference a TIME-formatted column instead of an INTERVAL.
~ Marcus
The example below shows one way to handle intervals:
with trip_duration as (
select interval 120 second as x union all
select interval 10 second union all
select interval 2 minute union all
select interval 50 second
)
select
count(*) as all_starts,
countif(x < interval 90 second) as false_starts
from trip_duration
For this sample data the output is all_starts = 4 and false_starts = 2.
To filter out the trips with durations of less than 90 seconds:
SELECT
* # here is whatever field(s) you want to return
FROM
`cyclistic-328701.12_month_user_data_cyclistic.20*`
WHERE
TIMESTAMP_DIFF(ended_at, started_at, SECOND) >= 90
You can read about the TIMESTAMP_DIFF function in the BigQuery documentation on timestamp functions.
To count the number of occurrences:
SELECT
COUNTIF(TIMESTAMP_DIFF(ended_at, started_at,SECOND) < 90) AS false_start,
COUNTIF(TIMESTAMP_DIFF(ended_at, started_at,SECOND) >= 90) AS non_false_start
FROM
`cyclistic-328701.12_month_user_data_cyclistic.20*`
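The counting logic itself is independent of BigQuery. As a minimal sketch, here is the same comparison in Python, using the four sample durations from the interval example above; the timestamps are made up for illustration and are not from the asker's dataset.

```python
from datetime import datetime, timedelta

# (started_at, ended_at) pairs; hypothetical sample data
trips = [
    (datetime(2021, 1, 1, 8, 0, 0), datetime(2021, 1, 1, 8, 2, 0)),    # 120 s
    (datetime(2021, 1, 1, 9, 0, 0), datetime(2021, 1, 1, 9, 0, 10)),   # 10 s
    (datetime(2021, 1, 1, 10, 0, 0), datetime(2021, 1, 1, 10, 2, 0)),  # 2 min
    (datetime(2021, 1, 1, 11, 0, 0), datetime(2021, 1, 1, 11, 0, 50)), # 50 s
]

threshold = timedelta(seconds=90)
durations = [ended - started for started, ended in trips]

# Equivalent of COUNTIF(duration < 90 seconds)
false_starts = sum(d < threshold for d in durations)
# Equivalent of the WHERE filter that keeps trips of at least 90 seconds
kept = [d for d in durations if d >= threshold]

print(false_starts, len(kept))  # 2 2
```

The point is that the comparison is made between two durations (or two numbers of seconds), never between a duration and a bare literal such as 1:30.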

How do i give the condition to group by time period?

I need to get the count of records in PostgreSQL from 7:00:00 am until 6:59:59 am the next day; the count then resets for the next window from 7:00:00 am to 6:59:59 am.
My backend is Java (Spring Boot).
The columns in my table are
id (primary_id)
createdon (timestamp)
name
department
createdby
How do I write the condition for this shift-wise count?
You'd need to pick a slice based on the current time-of-day (I am assuming this to be some kind of counter which will be auto-refreshed in some application).
One way to do that is using time ranges:
SELECT COUNT(*)
FROM mytable
WHERE createdon <@ (
SELECT CASE
WHEN current_time < '07:00'::time THEN
tsrange(CURRENT_DATE - '1d'::interval + '07:00'::time, CURRENT_DATE + '07:00'::time, '[)')
ELSE
tsrange(CURRENT_DATE + '07:00'::time, CURRENT_DATE + '1d'::interval + '07:00'::time, '[)')
END
)
;
Example with data: https://rextester.com/LGIJ9639
As I understand the question, you need to have a separate group for values in each 24-hour period that starts at 07:00:00.
SELECT
(
date_trunc('day', (createdon - '7h'::interval))
+ '7h'::interval
) AS date_bucket,
count(id) AS count
FROM lorem
GROUP BY date_bucket
ORDER BY date_bucket
This uses the date and time functions and the GROUP BY clause:
Shift the timestamp value back 7 hours ((createdon - '7h'::interval)), so the distinction can be made by a change of date (at 00:00:00). Then,
Truncate the value to the date (date_trunc('day', …)), so that all values in a bucket are flattened to a single value (the date at midnight). Then,
Add 7 hours again to the value (… + '7h'::interval), so that it represents the starting time of the bucket. Then,
Group by that value (GROUP BY date_bucket).
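The steps above can be sketched outside the database as well. This is a minimal Python illustration of the shift, truncate, shift-back idea; the function name `date_bucket` and the sample timestamps are assumptions made for this example.

```python
from collections import Counter
from datetime import datetime, timedelta

OFFSET = timedelta(hours=7)  # the bucket starts at 07:00:00

def date_bucket(created_on):
    """Return the start (07:00:00) of the 24-hour bucket containing created_on."""
    shifted = created_on - OFFSET  # step 1: shift back 7 hours
    midnight = shifted.replace(hour=0, minute=0, second=0, microsecond=0)  # step 2: truncate to day
    return midnight + OFFSET  # step 3: add the 7 hours again

timestamps = [
    datetime(2019, 3, 7, 6, 59, 59),  # still belongs to the bucket starting on 2019-03-06
    datetime(2019, 3, 7, 7, 0, 0),    # first instant of the bucket starting on 2019-03-07
    datetime(2019, 3, 7, 23, 30, 0),
    datetime(2019, 3, 8, 6, 0, 0),
]

# Step 4: group by the bucket value and count
counts = Counter(date_bucket(ts) for ts in timestamps)
for bucket, count in sorted(counts.items()):
    print(bucket, count)
```

Note how 06:59:59 and 07:00:00 on the same calendar day land in different buckets, which is exactly the reset behaviour the question asks for.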
A more complete example, with schema and data:
DROP TABLE IF EXISTS lorem;
CREATE TABLE lorem (
id serial PRIMARY KEY,
createdon timestamp not null
);
INSERT INTO lorem (createdon) (
SELECT
generate_series(
CURRENT_TIMESTAMP - '36h'::interval,
CURRENT_TIMESTAMP + '36h'::interval,
'45m'::interval)
);
Now the query:
SELECT
(
date_trunc('day', (createdon - '7h'::interval))
+ '7h'::interval
) AS date_bucket,
count(id) AS count
FROM lorem
GROUP BY date_bucket
ORDER BY date_bucket
;
produces this result:
date_bucket | count
---------------------+-------
2019-03-06 07:00:00 | 17
2019-03-07 07:00:00 | 32
2019-03-08 07:00:00 | 32
2019-03-09 07:00:00 | 16
(4 rows)
You can use aggregation -- by subtracting 7 hours:
select (createdon - interval '7 hour')::date as dy, count(*)
from t
group by dy
order by dy;

SQL Server Return 4 rows for every hour in a day

I have a query that returns all messages from a device within a day (simplified):
SELECT date, value
FROM Messages
WHERE date between '04/01/2018 00:00:00' AND '04/01/2018 23:59:59'
ORDER BY date asc
The problem is that it returns too many rows. For example, 1 row per minute minimum (1440 rows in a day), and I have to print that in a chart.
How could I return the first row in every quarter hour so I get 4 rows per every hour of the day?
Expected result:
date value
2018-01-04 05:00:00.000 || 5,52
2018-01-04 05:15:00.000 || 5,48
2018-01-04 05:30:00.000 || 5,35
2018-01-04 05:45:00.000 || 5,42
You can do it with the modulo operator (%) as follows:
SELECT date, value
FROM Messages
WHERE date between '04/01/2018 00:00:00' AND '04/01/2018 23:59:59' and (datepart(minute,date) % 15) = 0
ORDER BY date asc;
This query returns the rows whose minute is exactly divisible by 15, i.e. falls on a quarter hour. I think this may solve your problem.
Note: I did not check the seconds because, according to your question, the data is added once per minute.
In case you have more than one row in one minute, or the rows do not exactly match the hour:minute pattern, you can use the following:
SELECT * INTO tab FROM (VALUES
('2018-01-01 05:00:01', 1),
('2018-01-01 05:10', 2),
('2018-01-01 05:20', 3),
('2018-01-01 05:28', 4),
('2018-01-01 05:31', 5)
) T(Date,Value)
SELECT Date,Value FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY CAST(Date AS DATE),
DATEPART(HOUR,Date),
DATEPART(MINUTE,Date)/15
ORDER BY Date) RowNum FROM tab
) T WHERE RowNum=1
It returns:
Date Value
---- -----
2018-01-01 05:00:01 1
2018-01-01 05:20 3
2018-01-01 05:31 5
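The partitioning used by ROW_NUMBER() above (date, hour, minute / 15) amounts to rounding each timestamp down to its quarter hour and keeping the earliest row per quarter. A minimal Python sketch of that logic, using the same five sample rows; the helper name `quarter_start` is an assumption for illustration.

```python
from datetime import datetime

def quarter_start(ts):
    """Round a timestamp down to the start of its 15-minute block."""
    return ts.replace(minute=ts.minute // 15 * 15, second=0, microsecond=0)

rows = [
    (datetime(2018, 1, 1, 5, 0, 1), 1),
    (datetime(2018, 1, 1, 5, 10), 2),
    (datetime(2018, 1, 1, 5, 20), 3),
    (datetime(2018, 1, 1, 5, 28), 4),
    (datetime(2018, 1, 1, 5, 31), 5),
]

# Walk the rows in time order and keep only the first one seen per block,
# which corresponds to RowNum = 1 in the SQL above.
first_per_quarter = {}
for ts, value in sorted(rows):
    first_per_quarter.setdefault(quarter_start(ts), (ts, value))

result = list(first_per_quarter.values())
for ts, value in result:
    print(ts, value)
```

This keeps values 1, 3 and 5, matching the result shown above, and it still works when no row falls exactly on the quarter hour.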
You could simply use a "like" condition, provided the date is stored as (or converted to) a string in this format:
and (date like '%00:00.000' or date like '%15:00.000' ...)
Use a modulo function on the minute part of the date
select *
from mytable T1
where datepart(minute, T1.date)%15 = 0
I would start by getting a row_number partitioned by 15 minute intervals.
SELECT Truncdate, value FROM (
SELECT date
, value
, dateadd(minute, datediff(minute, 0, date) / 15 * 15, 0) AS TruncDate
, row_number() OVER (PARTITION BY dateadd(minute, datediff(minute, 0, date) / 15 * 15, 0) ORDER BY date) as RowNum
FROM messages
) x
WHERE x.rownum = 1
You can change Truncdate to date in the outer select if you want to see the actual first datetime in that 15 minute block instead of the block-rounded date.
Edit: I didn't actually read the question. This approach has the advantage that it will still get the first value in each block even if it occurs after the minute the block starts, which I now notice isn't a requirement for your solution.

MySQL AVG function for recent 15 records by date (order date desc) in every symbol

I am trying to create a statement in SQL (for a table which holds stock symbols and price on specified date) with avg of 5 day price and avg of 15 days price for each symbol.
Table columns:
symbol
open
high
close
date
The average price is calculated from last 5 days and last 15 days. I tried this for getting 1 symbol:
SELECT avg(close),
avg(`trd_qty`)
FROM (SELECT *
FROM cashmarket
WHERE symbol = 'hdil'
ORDER BY `M_day` desc
LIMIT 0,15 ) s
but I couldn't get the desired list for showing avg values for all symbols.
You can either do it with row numbers as suggested by astander, or you can do it with dates.
This solution will also take the last 15 days if you don't have rows for every day while the row number solution takes the last 15 rows. You have to decide which one works better for you.
EDIT: Replaced AVG, use CASE to avoid division by 0 in case no records are found within the period.
SELECT
c.symbol,
CASE WHEN SUM(c.is_5) > 0 THEN SUM( c.close * c.is_5 ) / SUM( c.is_5 )
ELSE 0 END AS close_5,
CASE WHEN SUM(c.is_5) > 0 THEN SUM( c.trd_qty * c.is_5 ) / SUM( c.is_5 )
ELSE 0 END AS trd_qty_5,
CASE WHEN SUM(c.is_15) > 0 THEN SUM( c.close * c.is_15 ) / SUM( c.is_15 )
ELSE 0 END AS close_15,
CASE WHEN SUM(c.is_15) > 0 THEN SUM( c.trd_qty * c.is_15 ) / SUM( c.is_15 )
ELSE 0 END AS trd_qty_15
FROM
(
SELECT
cashmarket.*,
IF( TO_DAYS(NOW()) - TO_DAYS(m_day) < 15, 1, 0) AS is_15,
IF( TO_DAYS(NOW()) - TO_DAYS(m_day) < 5, 1, 0) AS is_5
FROM cashmarket
) c
GROUP BY c.symbol
The query returns the averages of close and trd_qty for the last 5 and the last 15 days. Current date is included, so it's actually today plus the last 4 days (replace < by <= to get current day plus 5 days).
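The flag-based averaging (sum of value times flag, divided by the sum of flags, with 0 when nothing matches) can be illustrated with a small Python sketch. The function name `windowed_average`, the fixed "today" and the sample prices are assumptions for illustration, not data from the question.

```python
from datetime import date

def windowed_average(rows, today, window_days):
    """Average close over rows dated fewer than window_days days before today.

    rows is a list of (day, close) pairs. Returns 0 when no row falls in
    the window, mirroring the ELSE 0 branch of the CASE expression.
    """
    # 1 if the row is inside the window, else 0 (the is_5 / is_15 flags)
    flags = [1 if (today - day).days < window_days else 0 for day, _ in rows]
    matched = sum(flags)
    if matched == 0:
        return 0
    return sum(close * flag for (_, close), flag in zip(rows, flags)) / matched

today = date(2010, 1, 15)  # stands in for NOW()
rows = [
    (date(2010, 1, 15), 10.0),
    (date(2010, 1, 12), 20.0),
    (date(2010, 1, 5), 30.0),
    (date(2009, 12, 1), 40.0),
]

avg_5 = windowed_average(rows, today, 5)    # uses the rows from Jan 15 and Jan 12
avg_15 = windowed_average(rows, today, 15)  # additionally uses the row from Jan 5
print(avg_5, avg_15)  # 15.0 20.0
```

Multiplying by a 0/1 flag lets a single pass over the table feed both windows, at the cost of needing the explicit guard against dividing by zero.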
Use:
SELECT DISTINCT
t.symbol,
x.avg_5_close,
y.avg_15_close
FROM CASHMARKET t
LEFT JOIN (SELECT cm_5.symbol,
AVG(cm_5.close) 'avg_5_close',
AVG(cm_5.trd_qty) 'avg_5_qty'
FROM CASHMARKET cm_5
WHERE cm_5.m_day BETWEEN DATE_SUB(NOW(), INTERVAL 5 DAY) AND NOW()
GROUP BY cm_5.symbol) x ON x.symbol = t.symbol
LEFT JOIN (SELECT cm_15.symbol,
AVG(cm_15.close) 'avg_15_close',
AVG(cm_15.trd_qty) 'avg_15_qty'
FROM CASHMARKET cm_15
WHERE cm_15.m_day BETWEEN DATE_SUB(NOW(), INTERVAL 15 DAY) AND NOW()
GROUP BY cm_15.symbol) y ON y.symbol = t.symbol
I'm unclear on what trd_qty is, or how it factors into your equation considering it isn't in your list of columns.
If you want to be able to specify a date rather than the current time, replace NOW() with @your_date, an applicable variable. And you can change the interval values to suit, in case they should really be 7 and 21.
Have a look at "How to number rows in MySQL".
You can create a row number per symbol, ordered by date descending.
You can then retrieve the rows where the row number is between 1 and 15 and apply GROUP BY with AVG to the selected data.
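As a rough illustration of the row-number approach, the following Python sketch ranks the rows of each symbol by date descending and averages the first n of them. The function name `average_last_n` and the sample rows are assumptions for this example; it shows the logic only, not a MySQL query.

```python
from collections import defaultdict
from datetime import date

def average_last_n(rows, n):
    """Return {symbol: average close over the n most recent rows of that symbol}.

    rows is a list of (symbol, day, close) tuples in any order.
    """
    by_symbol = defaultdict(list)
    for symbol, day, close in rows:
        by_symbol[symbol].append((day, close))

    averages = {}
    for symbol, items in by_symbol.items():
        items.sort(reverse=True)  # newest first, like ORDER BY day DESC
        recent = items[:n]        # rows with row number 1..n
        averages[symbol] = sum(close for _, close in recent) / len(recent)
    return averages

rows = [
    ("hdil", date(2010, 1, 4), 10.0),
    ("hdil", date(2010, 1, 5), 20.0),
    ("hdil", date(2010, 1, 6), 30.0),
    ("abc", date(2010, 1, 5), 5.0),
    ("abc", date(2010, 1, 6), 7.0),
]

averages = average_last_n(rows, 2)
print(averages)  # {'hdil': 25.0, 'abc': 6.0}
```

Unlike a date window, this always uses the last n trading rows per symbol, so weekends and holidays do not shrink the sample.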
trd_qty is the quantity traded on a particular day.
The dates are not continuous because the market operates only on weekdays and there are holidays too.