Summing measurements - SQL

I have this code:
#Name("Creating_hourly_measurement_Position_Stopper for line 2")
insert into CreateMeasurement
select
m.measurement.source as source,
current_timestamp().toDate() as time,
"Line2_Count_Position_Stopper_Measurement" as type,
{
"Line2_DoughDeposit2.Hourly_Count_Position_Stopper.value",
count(cast(getNumber(m, "Status.Sidestopper_positioning.value"), double)),
"Line2_DoughDeposit2.Hourly_Count_Position_Stopper.unit",
getString(m, "Status.Sidestopper_positioning.unit")
} as fragments
from MeasurementCreated.win:time(1 hours) m
where getNumber(m, "Status.Sidestopper_positioning.value") is not null
and cast(getNumber(m, "Status.Sidestopper_positioning.value"), int) = 1
and m.measurement.source.value = "903791"
output last every 1 hours;
but it seems to loop. I believe it's because each new measurement modifies this group, meaning it is constantly extending. This means a recalculation is performed every time new data becomes available.
Is there a way to count the measurements or get the total of the measurements per hour or per day?

The stream it consumes is "MeasurementCreated" (see the from clause), and that stream isn't produced by any EPL, so one can safely say that this EPL by itself cannot possibly loop.
If you want to improve the EPL there is some information at this link: http://esper.espertech.com/release-8.2.0/reference-esper/html_single/index.html#processingmodel_basicfilter
By moving the where-clause conditions into a stream filter you can discard events early.
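As a minimal sketch of that change, keeping everything else as in the question, the plain source comparison can move from the where clause into the stream filter (standard Esper filter syntax; only the changed from/where lines are shown here):
from MeasurementCreated(measurement.source.value = "903791").win:time(1 hours) m
where getNumber(m, "Status.Sidestopper_positioning.value") is not null
and cast(getNumber(m, "Status.Sidestopper_positioning.value"), int) = 1
That way events from other sources are discarded before they enter the one-hour time window.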

Doesn't the insert into CreateMeasurement then cause an event in MeasurementCreated?


Is it possible to get time difference between dates and provide a default value, with PostgreSQL?

So, the table setup is something like this:
table: ticket(ticket_id, ...)
table: ticket_status_history(status_history_id, ticket_id, ticket_status, ticket_status_datetime)
The default ticket_status is OPEN, and the first ticket status that I'm interested in is ACKNOWLEDGED.
So, the idea is that a specific ticket has a set of ticket_status_history events, each recorded in the separate table. Each ticket status entry points to its corresponding ticket; ticket_id is a foreign key.
Now, a ticket can actually be created directly in ACKNOWLEDGED, so it would get a corresponding entry directly in ACKNOWLEDGED, without ever being in OPEN. But many of them will go OPEN -> ACKNOWLEDGED -> ...
What I'd like to do is determine, for each ticket, the time interval between ticket creation (OPEN) and ticket acknowledgment (ACKNOWLEDGED), but if the ticket starts directly in ACKNOWLEDGED, set the time difference to a default of 0 (because the ticket was created directly in this state).
Is this doable in SQL, for PostgreSQL? I'm a bit stumped at the moment. I found this: Calculate Time Difference Between Two Rows, but it's for SQL Server, and I'm not sure how the default value could be included.
The end state would actually be aggregating the time differences and computing an average duration, but I can't figure out the first step 😞
Your query could look like this:
SELECT t.*,
       coalesce(ack.ticket_status_datetime - op.ticket_status_datetime,
                '0'::interval) AS op_ack_diff
FROM ticket t
LEFT JOIN ticket_status_history ack ON t.ticket_id = ack.ticket_id
                                   AND ack.ticket_status = 'ACKNOWLEDGED'
LEFT JOIN ticket_status_history op  ON t.ticket_id = op.ticket_id
                                   AND op.ticket_status = 'OPEN'
WHERE t.ticket_id = x;
The difference of the timestamps yields null if one of the entries is missing. The coalesce function will return its second argument in this case.
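The end state mentioned in the question (an average duration) can reuse the same joins; here is a sketch under the same schema assumptions, with the default of 0 applied per ticket before aggregating:
SELECT avg(coalesce(ack.ticket_status_datetime - op.ticket_status_datetime,
                    '0'::interval)) AS avg_open_to_ack   -- average OPEN -> ACKNOWLEDGED time
FROM ticket t
LEFT JOIN ticket_status_history ack ON t.ticket_id = ack.ticket_id
                                   AND ack.ticket_status = 'ACKNOWLEDGED'
LEFT JOIN ticket_status_history op  ON t.ticket_id = op.ticket_id
                                   AND op.ticket_status = 'OPEN';
In PostgreSQL, avg() over interval values returns an interval, so no extra cast is needed.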

How do I get rapid SQL inserts with MAX()+1 to always increment between each call?

I have a REST-ish endpoint that creates a day object and sets its order property to whatever the maximum order is +1. I'm having an issue where calling that endpoint in rapid succession results in some of the days having the same order. How do I solve this?
SQL Query is like so.
insert into "days" ("order", "program_id") values (
(select max(days.order)+1
from "days"
where "days"."program_id" = '5'), '5')
returning *
And it results in something like
{"program_id":5,"id":147,"order":38}
{"program_id":5,"id":150,"order":38}
{"program_id":5,"id":148,"order":38}
{"program_id":5,"id":149,"order":38}
{"program_id":5,"id":151,"order":39}
{"program_id":5,"id":152,"order":40}
{"program_id":5,"id":153,"order":41}
If it helps, I'm on Node (Express) and using Knex and Objection to build my queries for a Postgres database. The JavaScript code is as follows.
json.order = knex.select(knex.raw('max(days.order)+1'))
.from('days')
.where('days.program_id', json.program_id);
return await Days
.query(trx)
.returning('*')
.insert(json);
I'm also using max+1 as I want the order values to increment on a per-program basis. So days of a program will have unique orders, but it is possible for days of different programs to have the same order.
Thanks!
You could probably add select ... for update locking to the subquery where you are calculating max+1:
json.order = knex.select(knex.raw('max(days.order)+1'))
.forUpdate() // <-------- lock rows
.from('days')
.where('days.program_id', json.program_id);
So after that no other concurrent transaction can lock or modify those rows that are used for calculating max until this transaction ends; a competing insert that takes the same lock for its max+1 subquery has to wait, so the calculations are serialized.

SQL Calculate cumulative total based on rows within the same table

I'm trying to calculate a cumulative total for a field for each row in a table.
Consider the number of passengers on a bus: I know how many people get on and off at each stop, but I also need the load on the bus arriving at each stop.
I've got as far as a field which calculates how the load changes at each stop, but how do I get the load carried over from the stop before it? Note: there are a number of trips within the same table, so for Stop 1 of a new trip the load would be zero.
I've tried searching, but being new to this, I'm not even sure what I should be looking for, and I'm not sure the results I do get are relevant!
SELECT [Tripnumber], [Stop], Sum([Boarders] - [Alighters]) AS LoadChange
FROM table
Group By [Tripnumber], [Stop], [Boarders], [Alighters]
Order By [Tripnumber], [Stop]
You can use window functions:
SELECT [Tripnumber], [Stop],
       SUM([Boarders] - [Alighters]) OVER (PARTITION BY [Tripnumber] ORDER BY [Stop]) AS LoadChange
FROM table;
I don't think the GROUP BY is necessary.
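If the figure needed is the load arriving at each stop (that is, before the current stop's boarders and alighters are applied), one way is to end the window frame at the previous row; this is a sketch using the question's column names:
SELECT [Tripnumber], [Stop],
       COALESCE(SUM([Boarders] - [Alighters]) OVER (
                    PARTITION BY [Tripnumber]
                    ORDER BY [Stop]
                    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                ), 0) AS ArrivingLoad   -- 0 at the first stop of each trip
FROM table;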

daily difference calculation performance improvement

I need to calculate the daily price difference as a percentage. The query I have works but is getting slower every day. The main idea is to calculate the delta with the previous row. The previous row is normally the previous day, but there might sometimes be a day missing. When that happens it needs to take the last available day.
I'm looking for a way to limit the set that I retrieve in the inner query. There are about 20,000 records added per day.
update price_watches pw
set    min_percent_changed = calc.delta
from  (
   select id, product_id, calculation_date,
          (1 - (price_min / lag(price_min) over (order by product_id, calculation_date))) * 100 as delta
   from   price_watches
   where  price_min > 0
   ) calc
where  calc.id = pw.id;
This is wrong on many levels.
1.) It looks like you are updating all rows, including old rows that already have their min_percent_changed set and probably shouldn't be updated again.
2.) You are updating even if the new min_percent_changed is the same as the old.
3.) You are updating rows to store a redundant value that could be calculated on the fly rather cheaply (if done right), thereby making the row bigger and more error prone and producing lots of dead row versions, which means a lot of work for vacuum and slowing down everything else.
You shouldn't be doing any of this.
If you need to materialize the daily delta for read-performance optimization, I suggest a small additional 1:1 table that can be updated cheaply without messing with the main table - especially if you recalculate the value for every row every time. But better to calculate only the new data.
If you really want to recalculate for every row (like your current UPDATE seems to do), make that a MATERIALIZED VIEW to automate the process.
If the new query I am going to demonstrate is fast enough, don't store any redundant data and calculate deltas on the fly.
For your current setup, this query should be much faster, when combined with this matching index:
CREATE INDEX price_watches_product_id_calculation_date_idx
ON price_watches(product_id, calculation_date DESC NULLS LAST);
Query:
UPDATE price_watches pw
SET    min_percent_changed = calc.delta
FROM   price_watches p1
     , LATERAL (
   SELECT (1 - p1.price_min / p2.price_min) * 100 AS delta
   FROM   price_watches p2
   WHERE  p2.product_id = p1.product_id
   AND    p2.calculation_date < p1.calculation_date
   ORDER  BY p2.calculation_date DESC NULLS LAST
   LIMIT  1
   ) calc
WHERE  p1.price_min > 0
AND    p1.calculation_date = current_date - 1  -- only update new rows!
AND    pw.id = p1.id
AND    pw.min_percent_changed IS DISTINCT FROM calc.delta;
I am restricting the update to rows from "yesterday": current_date - 1. This is a wild guess at what you actually need.
Explanation for the added last line of the query:
How do I (or can I) SELECT DISTINCT on multiple columns?
Similar to this answer on dba.SE from just a few hours ago:
Slow window function query with big table
Proper information in the question would allow me to adapt the query and give more explanation.
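For the materialized-view option mentioned above, a minimal sketch could look like this (the view name is an assumption; the inner query mirrors the LATERAL subquery of the UPDATE, and the view would be refreshed on whatever schedule new rows arrive, e.g. daily):
CREATE MATERIALIZED VIEW price_watch_deltas AS
SELECT p1.id
     , (1 - p1.price_min / prev.price_min) * 100 AS delta   -- same formula as above
FROM   price_watches p1
     , LATERAL (
   SELECT p2.price_min
   FROM   price_watches p2
   WHERE  p2.product_id = p1.product_id
   AND    p2.calculation_date < p1.calculation_date
   ORDER  BY p2.calculation_date DESC NULLS LAST
   LIMIT  1
   ) prev
WHERE  p1.price_min > 0;

-- recompute after loading new rows, e.g. once per day:
REFRESH MATERIALIZED VIEW price_watch_deltas;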

What is an unbounded query?

Is an unbounded query a query without a WHERE param = value statement?
Apologies for the simplicity of this one.
An unbounded query is one where the search criteria are not particularly specific, and which is thus likely to return a very large result set. A query without a WHERE clause would certainly fall into this category, but let's consider for a moment some other possibilities. Let's say we have tables as follows:
CREATE TABLE SALES_DATA
  (ID_SALES_DATA       NUMBER PRIMARY KEY,
   TRANSACTION_DATE    DATE NOT NULL,
   LOCATION            NUMBER NOT NULL,
   TOTAL_SALE_AMOUNT   NUMBER NOT NULL,
   ...etc...);
CREATE TABLE LOCATION
  (LOCATION   NUMBER PRIMARY KEY,
   DISTRICT   NUMBER NOT NULL,
   ...etc...);
Suppose that we want to pull in a specific transaction, and we know the ID of the sale:
SELECT * FROM SALES_DATA WHERE ID_SALES_DATA = <whatever>
In this case the query is bounded, and we can guarantee it's going to pull in either one or zero rows.
Another example of a bounded query, but one with a large result set, would be the query produced when the director of district 23 says "I want to see the total sales for each store in my district for every day last year", which would be something like
SELECT S.LOCATION, TRUNC(S.TRANSACTION_DATE), SUM(S.TOTAL_SALE_AMOUNT)
  FROM SALES_DATA S,
       LOCATION L
  WHERE S.TRANSACTION_DATE BETWEEN '01-JAN-2009' AND '31-DEC-2009' AND
        L.LOCATION = S.LOCATION AND
        L.DISTRICT = 23
  GROUP BY S.LOCATION,
           TRUNC(S.TRANSACTION_DATE)
  ORDER BY S.LOCATION,
           TRUNC(S.TRANSACTION_DATE)
In this case the query should return 365 (or fewer, if stores are not open every day) rows for each store in district 23. If there are 25 stores in the district it'll return 9125 rows or fewer.
On the other hand, let's say our VP of Sales wants some data. He/she/it isn't quite certain what's wanted, but he/she/it is pretty sure that whatever it is happened in the first six months of the year...not quite sure about which year...and not sure about the location, either - probably in district 23 (he/she/it has had a running feud with the individual who runs district 23 for the past 6 years, ever since that golf tournament where...well, never mind...but if a problem can be hung on the door of district 23's director so be it!)...and of course he/she/it wants all the details, and have it on his/her/its desk toot sweet! And thus we get a query that looks something like
SELECT L.DISTRICT, S.LOCATION, S.TRANSACTION_DATE,
       S.something, S.something_else, S.some_more_stuff
  FROM SALES_DATA S,
       LOCATION L
  WHERE EXTRACT(MONTH FROM S.TRANSACTION_DATE) <= 6 AND
        L.LOCATION = S.LOCATION
  ORDER BY L.DISTRICT,
           S.LOCATION
This is an example of an unbounded query. How many rows will it return? Good question - that depends on how business conditions were, how many locations were open, how many days there were in February, etc.
Put more simply, if you can look at a query and have a pretty good idea of how many rows it's going to return (even though that number might be relatively large) the query is bounded. If you can't, it's unbounded.
Share and enjoy.
http://hibernatingrhinos.com/Products/EFProf/learn#UnboundedResultSet
An unbounded result set is where a query does not explicitly limit the number of results it returns. Usually, this means that the application assumes that a query will always return only a few records. That works well in development and in testing, but it is a time bomb waiting to explode in production.
The query may suddenly start returning thousands upon thousands of rows, and in some cases, it may return millions of rows. This leads to more load on the database server, the application server, and the network. In many cases, it can grind the entire system to a halt, usually ending with the application servers crashing with out of memory errors.
Here is one example of a query that will trigger the unbounded result set warning:
var query = from post in blogDataContext.Posts
            where post.Category == "Performance"
            select post;
If the performance category has many posts, we are going to load all of them, which is probably not what was intended. This can be fixed fairly easily by paginating with the Take() method:
var query = (from post in blogDataContext.Posts
             where post.Category == "Performance"
             select post)
            .Take(15);
Now we are assured that we only need to handle a predictable, small result set, and if we need to work with all of them, we can page through the records as needed. Paging is implemented using the Skip() method, which instructs Entity Framework to skip (at the database level) N number of records before taking the next page.
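At the database level, the Take()/Skip() pair translates into a limited query along the lines of the following sketch (the table and column names are illustrative, and the exact SQL emitted depends on the Entity Framework provider):
-- third page, 15 posts per page, in the "Performance" category
SELECT *
FROM   posts
WHERE  category = 'Performance'
ORDER  BY post_id
OFFSET 30 ROWS              -- Skip(30)
FETCH NEXT 15 ROWS ONLY;    -- Take(15)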
But there is another common occurrence of the unbounded result set problem: directly traversing the object graph, as in the following example:
var post = postRepository.Get(id);
foreach (var comment in post.Comments)
{
// do something interesting with the comment
}
Here, again, we are loading the entire set without regard for how big the result set may be. Entity Framework does not provide a good way of paging through a collection when traversing the object graph. It is recommended that you issue a separate, explicit query for the contents of the collection, which will allow you to page through that collection without loading too much data into memory.