System gets stuck when testing stream calculation in TDengine

I want to perform stream calculations on a super table with 100,000 sub-tables in TDengine, with a time window of 30 minutes and a slide of 5 minutes. However, the system gets stuck after a while.
My stream creation statement is:
create stream calctemp_stream into
calctemp_stream_output_stb as
select
_wstart as start,
_wend as wend,
last(temperature) - first(temperature) as difftemperature,
last(temperature) as lasttemperature,
last(ts) as lastts,
first(temperature) as firsttemperature,
last(alarm_type) as alarm_type,
xl, atype, xh
from calctemp partition by xl, atype, xh interval(30m) sliding(5m);
xl, atype, and xh are tags of the super table.
I don't know how to debug it.
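One starting point for debugging (a sketch, assuming TDengine 3.x, where create stream is available):
-- list registered streams and check whether calctemp_stream is still present
show streams;
-- if it looks stuck, drop it and recreate it against a smaller test super table
drop stream if exists calctemp_stream;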


Summing measurements

I have this code:
#Name("Creating_hourly_measurement_Position_Stopper for line 2")
insert into CreateMeasurement
select
m.measurement.source as source,
current_timestamp().toDate() as time,
"Line2_Count_Position_Stopper_Measurement" as type,
{
"Line2_DoughDeposit2.Hourly_Count_Position_Stopper.value",
count(cast(getNumber(m, "Status.Sidestopper_positioning.value"), double)),
"Line2_DoughDeposit2.Hourly_Count_Position_Stopper.unit",
getString(m, "Status.Sidestopper_positioning.unit")
} as fragments
from MeasurementCreated.win:time(1 hours) m
where getNumber(m, "Status.Sidestopper_positioning.value") is not null
and cast(getNumber(m, "Status.Sidestopper_positioning.value"), int) = 1
and m.measurement.source.value = "903791"
output last every 1 hours;
but it seems to loop. I believe it's because each new measurement modifies this group, so the group keeps extending, and the recalculation is performed every time new data arrives.
Is there a way to count the measurements, or get the total of the measurements, per hour or per day?
The stream it consumes is MeasurementCreated (see the from clause), and that stream isn't produced by any EPL, so one can safely say that this EPL by itself cannot possibly loop.
If you want to improve the EPL, there is some information at this link: http://esper.espertech.com/release-8.2.0/reference-esper/html_single/index.html#processingmodel_basicfilter
By moving the where-clause conditions into a filter you can discard events early.
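For example, a sketch of the same statement with the source check moved into the filter (assuming standard Esper filter syntax; the select list is unchanged, so it is abbreviated here):
insert into CreateMeasurement
select /* same select list as above */ m.measurement.source as source
from MeasurementCreated(measurement.source.value = "903791").win:time(1 hours) m
where getNumber(m, "Status.Sidestopper_positioning.value") is not null
and cast(getNumber(m, "Status.Sidestopper_positioning.value"), int) = 1
output last every 1 hours;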
Doesn't the insert into CreateMeasurement then cause an event in MeasurementCreated?

Get the first row of a nested field in BigQuery

I have been struggling with a question that seems simple, yet eludes me.
I am dealing with the public BigQuery table on bitcoin, and I would like to extract the first transaction of each block that was mined. In other words, I want to replace a nested field with its first row, as it appears in the table preview. There is no field that identifies it, only the order in which it was stored in the table.
I ran the following query:
#standardSQL
SELECT timestamp,
block_id,
FIRST_VALUE(transactions) OVER (ORDER BY (SELECT 1))
FROM `bigquery-public-data.bitcoin_blockchain.blocks`
But it processes 492 GB when run and throws the following error:
Error: Resources exceeded during query execution: The query could not be executed in the allotted memory. Sort operator used for OVER(ORDER BY) used too much memory..
It seems so simple; I must be missing something. Do you have an idea of how to handle such a task?
#standardSQL
SELECT * EXCEPT(transactions),
(SELECT transaction FROM UNNEST(transactions) transaction LIMIT 1) transaction
FROM `bigquery-public-data.bitcoin_blockchain.blocks`
Recommendation: while playing with a large table like this one, I would recommend creating a smaller version of it, so it incurs less cost for your dev/test. The query below can help with this; you can run it in the BigQuery UI with a destination table, which you can then use for your dev. Make sure you set Allow Large Results and unset Flatten Results so you preserve the original schema.
#legacySQL
SELECT *
FROM [bigquery-public-data:bitcoin_blockchain.blocks#1529518619028]
The value 1529518619028 is taken from the query below (at the time of running). The reason I went back four days is that I know the number of rows in this table at that time was just 912, versus the current 528,858.
#legacySQL
SELECT INTEGER(DATE_ADD(USEC_TO_TIMESTAMP(NOW()), -24*4, 'HOUR')/1000)
An alternative approach to Mikhail's: Just ask for the first row of an array with [OFFSET(0)]:
#standardSQL
SELECT timestamp,
block_id,
transactions[OFFSET(0)] first_transaction
FROM `bigquery-public-data.bitcoin_blockchain.blocks`
LIMIT 10
That first row from the array still has some nested data, which you might want to flatten to their first rows too:
#standardSQL
SELECT timestamp
, block_id
, transactions[OFFSET(0)].transaction_id first_transaction_id
, transactions[OFFSET(0)].inputs[OFFSET(0)] first_transaction_first_input
, transactions[OFFSET(0)].outputs[OFFSET(0)] first_transaction_first_output
FROM `bigquery-public-data.bitcoin_blockchain.blocks`
LIMIT 1000
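A note to add here (not part of the original answers, but standard BigQuery behavior): OFFSET(0) raises an error for rows where the transactions array is empty, while SAFE_OFFSET(0) returns NULL instead:
#standardSQL
SELECT timestamp,
block_id,
transactions[SAFE_OFFSET(0)] first_transaction
FROM `bigquery-public-data.bitcoin_blockchain.blocks`
LIMIT 10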

SQL Calculate cumulative total based on rows within the same table

I'm trying to calculate a cumulative total for a field, for each row in a table.
Consider the number of passengers on a bus: I know how many people get on and off at each stop, but I need to add to this the load on the bus arriving at each stop.
I've got as far as a field which calculates how the load changes at each stop, but how do I get the load from the stop before it? Note that there are a number of trips within the same table, so for Stop 1 of a new trip the load would be zero.
I've tried searching, but being new to this, I'm not even sure what I should be looking for, and I'm not sure the results I do get are relevant!
SELECT [Tripnumber], [Stop], Sum([Boarders] - [Alighters]) AS LoadChange
FROM table
Group By [Tripnumber], [Stop], [Boarders], [Alighters]
Order By [Tripnumber], [Stop]
You can use window functions:
SELECT [Tripnumber], [Stop],
Sum([Boarders] - [Alighters]) OVER (PARTITION BY tripnumber ORDER BY Stop) As LoadChange
FROM table;
I don't think the GROUP BY is necessary.
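To illustrate on hypothetical data (the trip and passenger numbers below are made up, and SQL Server syntax is assumed, given the bracketed identifiers), the running sum gives the load on the bus as it leaves each stop:
WITH stops AS (
    SELECT * FROM (VALUES
        (1, 1, 5, 0),
        (1, 2, 3, 1),
        (1, 3, 0, 4)
    ) AS t (Tripnumber, Stop, Boarders, Alighters)
)
SELECT [Tripnumber], [Stop],
    SUM([Boarders] - [Alighters]) OVER (PARTITION BY [Tripnumber] ORDER BY [Stop]) AS LoadChange
FROM stops;
-- LoadChange comes out as 5, 7, 3: the load on board after each stop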

BigQuery from VM Instance accessing GHCN Daily data causing hiccups

$queryResults = $bigQuery->runQuery("
SELECT
date,
(value/10)+273 as temperature
FROM
[bigquery-public-data:ghcn_d.ghcnd_$year]
WHERE
id = '{$station['id']}' AND
element LIKE '%$element%'
ORDER BY
date,
temperature"
);
I am calling this query and iterating over the years for each station. It gets through one or two stations and then I get a
killed
in my output and the process is halted...
Is it possible that the queries are not closing? I am looking for a way to close the query, but it appears that it should close itself?
Any ideas?

SQLite limit to 5 records per sensor, replace oldest with newest

There have been similar(ish) questions, but this is a bit more specific...
I have 10 different sensors. I want to only keep the most recent 5 readings for each sensor.
The problem I am having is that when it reaches 5 readings, it deletes all the readings for that particular sensor and then starts filling up again until it reaches 5 again... and so on.
On reaching 5 readings, I want it to just replace the oldest with the newest. I've been at this a while and can't fathom it out. I would prefer a reasonably easy-to-understand method, as this is to help some kids who are creating a school project.
Thanks.
Here is the basic code so far...
Python/SQL code block to delete the oldest entry for the given sensor and insert a new one (if there are more than 5 readings):
cursor = db.cursor()
cursor.execute("DELETE FROM readings WHERE Sensor IN (SELECT Sensor FROM readings WHERE Sensor = '%s' GROUP BY Sensor HAVING COUNT(Sensor)>5 and MIN(Timestamp))" % sensorName)
db.commit()
PS: I can see why it isn't working: the SELECT statement only selects anything when the count is greater than 5, and then it deletes all of the readings for that sensor. But I can't work out how to get it to select only the oldest reading and delete just that.
I think this method will work in SQLite (note the desc, which keeps the five most recent readings rather than the five oldest):
delete from readings
where sensor = '%s' and
      timestamp not in (select r2.timestamp
                        from readings r2
                        where r2.sensor = readings.sensor
                        order by r2.timestamp desc
                        limit 5
                       );
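If you want the pruning to happen automatically, another option is a trigger (a sketch, assuming the readings table has Sensor and Timestamp columns as in the question), so the Python code only ever has to insert:
create trigger if not exists prune_readings
after insert on readings
begin
    -- after each insert, delete everything for that sensor except its 5 newest rows
    delete from readings
    where Sensor = NEW.Sensor
      and Timestamp not in (select Timestamp
                            from readings
                            where Sensor = NEW.Sensor
                            order by Timestamp desc
                            limit 5);
end;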