In OrientDB I have set up a time series using this use case. However, instead of appending my vertices as an embedded list to the respective hour, I have opted to just create an edge from the hour to the time-dependent vertex.
For argument's sake, let's say that each hour has up to 60 time vertices, each identified by a timestamp. This means I can perform the following query to obtain a specific desired vertex:
SELECT FROM ( SELECT expand( month[5].day[12].hour[0].out() ) FROM Year WHERE year = 2015) WHERE timestamp = 1434146922
I can see from the use case that I can use UNION to get several specified time branches in one go.
SELECT expand( records ) FROM (
SELECT union( month[3].day[20].hour[10].out(), month[3].day[20].hour[11].out() ) AS records
FROM Year WHERE year = 2015
)
This works fine if you only have a small number of branches, but it doesn't work very well if you want to get all the records for a given time span. Say you wanted to get all the records between:
month[3].day[20].hour[11] -> month[3].day[29].hour[23]
I could iterate through the timespan and create a huge union query, but at some point I guess the query would be too long, and my guess is that it wouldn't be very efficient. I could also completely bypass the time branches and query the vertices directly based on the timestamp.
SELECT FROM Sector WHERE timestamp BETWEEN 1406588622 AND 1406588624
The problem being that you lose all the efficiency gained by the time branches.
By experimenting and reading a bit about data types in OrientDB, I found that:
The square brackets allow:
filtering by one index, for example out()[0]
filtering by multiple indexes, for example out()[0,2,4]
filtering by ranges, for example out()[0-9]
OPTION 1 (UPDATE):
Using a union to combine multiple time branches is the only option if you don't want to create all the indexes and if your range is small. Here is a query example using union in the documentation.
OPTION 2:
If you always have all the indexes created for your time and if you filter on wide ranges, you should filter by ranges. This is more performant than option 1, at the cost of having to create all the indexes you want to filter on. Official documentation about the field part.
This is how the query would look:
select
*
from
(
select
expand(hour[0-23].out())
from
(select
expand(month[3].day[20-29])
from
Year
where
year = 2015)
)
where timestamp > 1406588622
I would highly recommend reading this.
Related
Brief Summary:
I am currently trying to get a count of completed parts that fall within a specific time range and match the machine number, operation number, and tool number.
For example:
SELECT Sequence, Serial, Operation, Machine, DateTime, value AS Tool
FROM tbPartProfile
CROSS APPLY STRING_SPLIT(Tool_Used, ',')
ORDER BY DateTime desc
is running a query which pulls all the instances in which a tool has been changed. I am splitting the CSV from the Tool_Used column because there can be multiple changes during one operation.
Objective:
This is where the production count comes into play. For example, record 1 has a tool change of 36 on 12/12/2022. I will need to go back into the table and get the number of parts completed that match the OPERATION/MACHINE/TOOL and fall within the date range.
For example:
SELECT *
FROM tbPartProfile
WHERE Operation = 20 AND Machine = 1 AND Tool_Used LIKE '%36%'
ORDER BY DateTime desc
For example, this query will give me the datetimes at which a tool LIKE '36' was changed. I will need to take each of these datetimes, compare it with the previous query, and get the sum of all parts that were run in that TimeRange/Operation/Machine/Tool_Used.
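One possible way to combine the two steps is to window the tool-change rows with LAG() and then count the matching rows in the intervening range. The sketch below is only an illustration under assumptions: it treats each row of tbPartProfile as one completed part (swap the COUNT(*) for the real production-count source if it lives in another column or table), and PrevChange is NULL for the first recorded change of a tool.
WITH ToolChanges AS (
    SELECT Sequence, Serial, Operation, Machine, DateTime, value AS Tool,
           LAG(DateTime) OVER (PARTITION BY Machine, Operation, value
                               ORDER BY DateTime) AS PrevChange
    FROM tbPartProfile
    CROSS APPLY STRING_SPLIT(Tool_Used, ',')
)
SELECT tc.Machine, tc.Operation, tc.Tool, tc.PrevChange, tc.DateTime,
       -- parts completed between the previous change and this one;
       -- COUNT(*) is an assumption about where the production count comes from
       (SELECT COUNT(*)
          FROM tbPartProfile p
         WHERE p.Operation = tc.Operation
           AND p.Machine   = tc.Machine
           AND p.Tool_Used LIKE '%' + tc.Tool + '%'
           AND p.DateTime <= tc.DateTime
           AND (tc.PrevChange IS NULL OR p.DateTime > tc.PrevChange)
       ) AS PartsInRange
FROM ToolChanges tc
ORDER BY tc.DateTime DESC;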
How to build a query in MS Access to include the day-before amounts as an opening balance. So on running the query I enter 3/10/18 in the WorkDay parameter box and records for 3/10/18 and 2/10/18 are shown. The table is set up as follows:
WorkDay....TransactionID....Amount
2/10/18......Opening........1000
2/10/18......Credit.........500
2/10/18.......Debit.........300
3/10/18.......Credit........700
3/10/18.......Debit.........200
So if I run the query for 3/10/18 it should return
WorkDay....TransactionID....Amount
2/10/18......[Expr].........800
3/10/18.......Credit........700
3/10/18.......Debit.........200
If you are using the GUI, add DateAdd("d",-1,[MyDateParameter]) to the OR line under [MyDateParameter] in the WorkDay field.
For the SQL WHERE clause you would use
WorkDay=[MyDateParameter] OR Workday=DateAdd("d",-1,[MyDateParameter])
Obviously substitute [MyDateParameter] with whatever your date parameter actually is.
First some notes about the request:
The desired results impose different requirements for the current day vs the previous day, so there must be two different queries. If you want them in one result set, you need to use a UNION.
(You could write a single SQL UNION query, but since UNION queries do not work at all with the visual designer, you are left to write and test the query without any advantages of the query Design View. My preference is therefore to create two saved queries instead of embedded subqueries, then create a UNION which combines the results of the saved queries.)
Neither the question, nor answers to comments indicate what to do with any exceptions, like missing dates, weekends, etc. The following queries take the "day before" literally without exception.
The other difficulty is that the Credit entries also have a positive amount, so you must handle them specially. If Credits were saved with negative values, the summation would be simple and direct.
QueryCurrent:
PARAMETERS [Which WorkDay] DateTime;
SELECT S.WorkDay, S.TransactionID, Sum(S.[Amount]) As Amount
FROM [SomeUnspecifiedTable] As S
WHERE S.WorkDay = [Which WorkDay]
GROUP BY S.WorkDay, S.TransactionID
QueryPrevious:
PARAMETERS [Which WorkDay] DateTime;
SELECT S.WorkDay, "[Expr]" As TransactionID,
Sum(IIF(S.TransactionID = "Credit", -1, 1) * S.[Amount]) As Amount
FROM [SomeUnspecifiedTable] As S
WHERE S.WorkDay = ([Which WorkDay] - 1)
GROUP BY S.WorkDay
Union query:
SELECT * FROM QueryCurrent
UNION
SELECT * FROM QueryPrevious
ORDER BY [WorkDay]
Notes about the solution:
You could also use the DateAdd() function, but adding/subtracting integers to/from dates defaults to a change in days.
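For reference, a sketch of the previous-day filter in QueryPrevious written with DateAdd() instead of integer subtraction (same effect, same parameter):
WHERE S.WorkDay = DateAdd("d", -1, [Which WorkDay])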
I am running Postgres 9.2 and I have a large table something like
CREATE TABLE sensor_values
(
ts timestamp with time zone NOT NULL,
value double precision NOT NULL DEFAULT 'NaN'::real,
sensor_id integer NOT NULL
)
I have values coming into the system constantly, i.e. many per minute. I want to maintain a rolling standard deviation / average for the last 200 values so I can determine if new values entering the system are within, say, 3 standard deviations of the mean. To do so I would need the current standard deviation and mean to be constantly updated for the last 200 values.
As the table can be hundreds of millions of rows, I do not want to get the last, say, 200 rows for a sensor ordered by time and then do avg(value), var_samp(value) for every new value coming in. I am assuming it will be faster to update the standard deviation and mean.
I have started writing a PL/pgSQL function to update a rolling variance and mean on each new value entering the system for a particular sensor.
I can do this using pseudo code like
newavg = oldavg + (new_value - old_value)/window_size
new_variance += (new_value-old_value)*(new_value-newavg+old_value-oldavg)/(window_size-1)
This is based on
http://jonisalonen.com/2014/efficient-and-accurate-rolling-standard-deviation/
Basically the window is of size 200 values. The old_value is the first value of the window. When a new value comes in, we shift the window forward by one. After I get the result, I store the following values for the sensor:
The first value of the window.
The mean average of the window values.
The variance of the window values.
This way I don't have to constantly get the last 200 values and do a sum etc. I can reuse these values when a new sensor value comes in.
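A minimal sketch of what that update step might look like in PL/pgSQL, assuming a hypothetical bookkeeping table sensor_stats(sensor_id, oldest_value, win_avg, win_var) that holds the three stored values per sensor (the table and all names are placeholders, not part of the real schema):
CREATE OR REPLACE FUNCTION update_rolling_stats(p_sensor_id integer,
                                                p_new_value double precision)
RETURNS void AS $$
DECLARE
    win_size CONSTANT integer := 200;
    s        sensor_stats%ROWTYPE;
    new_avg  double precision;
    new_var  double precision;
BEGIN
    SELECT * INTO s FROM sensor_stats WHERE sensor_id = p_sensor_id;

    -- rolling update from the linked article: drop the oldest value, add the new one
    new_avg := s.win_avg + (p_new_value - s.oldest_value) / win_size;
    new_var := s.win_var
             + (p_new_value - s.oldest_value)
             * (p_new_value - new_avg + s.oldest_value - s.win_avg)
             / (win_size - 1);

    -- oldest_value would also need refreshing to whatever value is now the
    -- first one in the 200-value window (not shown here)
    UPDATE sensor_stats
       SET win_avg = new_avg,
           win_var = new_var
     WHERE sensor_id = p_sensor_id;
END;
$$ LANGUAGE plpgsql;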
My problem is that when first running I have no previous window data for a sensor (i.e. the three values above), so I have to do it the slow way.
something like
WITH s AS (
    SELECT value
    FROM sensor_values
    WHERE sensor_values.sensor_id = $1
      AND ts >= (NOW() - INTERVAL '2 day')::timestamptz
    ORDER BY ts DESC
    LIMIT 200
)
SELECT avg(value), var_samp(value) INTO last_window_average, last_window_variance FROM s;
But how could I get the last (earliest) value to save from that select statement?
Can I access the first row from s in PL/pgSQL?
I thought PL/pgSQL would be a faster / cleaner approach, but maybe it's better to do this in client code?
Are there better ways to perform this type of rolling statistic update?
I assume that it will not be drastically slow to re-calculate the latest 200 entries each time with proper indexing. If you create an index like:
CREATE INDEX i_sensor_values ON sensor_values(sensor_id, ts DESC);
you'll be able to get results fairly quickly doing:
SELECT sum("value") -- add more expressions as required
FROM sensor_values
WHERE sensor_id=$1
ORDER BY ts DESC
LIMIT 200;
You can execute this query in a loop from a PL/pgSQL function.
If you migrate to 9.3 (or higher) any time soon, you'll also be able to use LATERAL joins for this purpose.
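A sketch of what that could look like once LATERAL is available, if you wanted the window statistics for every sensor in one pass:
SELECT s.sensor_id,
       avg(lv."value")      AS win_avg,
       var_samp(lv."value") AS win_var
FROM (SELECT DISTINCT sensor_id FROM sensor_values) s
CROSS JOIN LATERAL (
    SELECT "value"
    FROM sensor_values v
    WHERE v.sensor_id = s.sensor_id
    ORDER BY v.ts DESC
    LIMIT 200
) lv
GROUP BY s.sensor_id;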
I do not think a covering index would do much good here, as the table is constantly changing and an index-only scan will not kick in.
It is also worth checking loose index scans.
P.S. Column name value should be double quoted, as this is an SQL reserved word.
I saw the news about Table Decorators being available to limit the amount of data that is queried by specifying a time interval or limit. I did not see any examples of how to use Table Decorators in the BigQuery UI. Below is an example query that I'd like to run, looking only at data that came in over the last 4 hours. Any tips on how I can modify this query to utilize Table Decorators?
SELECT
foo,
count(*)
FROM [bigtable.201309010000]
GROUP BY 1
EDIT after trying example below
The first query above scans 180GB of data for the month of September (up through Sept 19th). I'd expect the query below to only scan data that came in during the time period specified (in this case 4 hours), so I'd expect the billing to be about 1.6GB, not 180GB. Is there a way to set up the ETL/query so we do not get billed for scanning the whole table?
SELECT
foo,
count(*)
FROM [bigtable.201309010000#-14400000]
GROUP BY 1
To use table decorators, you can either specify #timestamp or #timestamp-end_time. The timestamp can be negative, in which case it is relative; end_time can be empty, in which case it means the current time. You can use both of these special cases together to get a time range relative to now, e.g. [table#-time_in_ms-]. So for your case, since 4 hours is 14400000 milliseconds, you can use:
SELECT foo, count(*) FROM [dataset.table#-14400000-] GROUP BY 1
This is a little bit confusing, we're intending to publish better documentation and examples soon.
The Problem
I have a list of Keys and another list of Dates for each of these keys. Basically a Multimap of Keys to Dates (in Java, Multimap<Key, Date>). I use these Keys and Dates to query a table like this:
select * from Table where key = :key and date = :date
This is horrible performance-wise, as Σ(|Date(Key)|) queries are generated. To improve this I can look at querying on periods in the form of:
select * from Table where key in (:keys) and date >= :startDate and date <= :endDate
As such only one query is required, but there is still a performance problem in that these dates can differ by very large periods (years). As an example, take a basic case where there are two keys, the first having a date of '2010-01-01' assigned and the second a date of '2012-01-01'. In that case this query will return all values within that period, even though I only need two records.
Solution Approach
Ideally I'd like to generate the optimal number of queries, where the optimum is a function of the number of queries and the amount of data returned. I'd like as few queries as possible, but in such a way that they return the least amount of unnecessary data. Put another way, a simple fitness function could be w·|Queries| × |Data|, where w is some weight.
Given this the previous example will result in two queries, whereas if the dates were close together it would only be a single query.
Options
This seems like a clustering problem, but I don't have much knowledge of clustering, and as such I'm not really sure where to start. I guess that I'd probably have to break the Multimap into individuals of the form (Key, Date), and from there look for an algorithm that identifies the number of clusters itself.
Is there any clustering algorithm or approach that is well suited to my problem, or is there perhaps a solution other than clustering?
Try using IN:
select * from Table where key = :key and date IN (date1, date2, date3, etc.)
With it you can select the desired dates all at once.
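If your database supports row-value constructors (PostgreSQL and MySQL do, for example), the key and the date can be paired in the IN list, so several keys are covered in one query without widening the date range; a sketch:
select * from Table
where (key, date) in ((:key1, :date1), (:key2, :date2), (:key3, :date3))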