Using a subquery in the WHERE clause of gapfill in TimescaleDB - sql

I would like to run the gapfill function of TimescaleDB in a way where the start and end dates are generated automatically, for example between the earliest and the latest timestamps in the table.
Given dataset playground:
CREATE TABLE public.playground (
value1 numeric,
"timestamp" bigint,
name "char"
);
INSERT INTO playground(name, value1, timestamp)
VALUES ('test', 100, 1599100000000000000);
INSERT INTO playground(name, value1, timestamp)
VALUES ('test', 100, 1599100001000000000);
INSERT INTO playground(name, value1, timestamp)
VALUES ('test', 100, 1599300000000000000);
I have tried getting the data as such:
SELECT time_bucket_gapfill(300E9::BIGINT, timestamp) as bucket
FROM playground
WHERE
timestamp >= (SELECT COALESCE(MIN(timestamp), 0) FROM playground)
AND
timestamp < (SELECT COALESCE(MAX(timestamp), 0) FROM playground)
GROUP BY bucket
I get an error:
ERROR: missing time_bucket_gapfill argument: could not infer start from WHERE clause
If I try the query with hard coded timestamps, the query runs just fine.
For example:
SELECT time_bucket_gapfill(300E9::BIGINT, timestamp) as bucket
FROM playground
WHERE timestamp >= 0 AND timestamp < 15900000000000000
GROUP BY bucket
Another approach, providing the start and end dates as arguments to the gapfill function, fails as well:
WITH bounds AS (
SELECT COALESCE(MIN(timestamp), 0) as min, COALESCE(MAX(timestamp), 0) as max
FROM playground
WHERE timestamp >= 0 AND timestamp < 15900000000000000
),
gapfill AS (
SELECT time_bucket_gapfill(300E9::BIGINT, timestamp, bounds.min, bounds.max) as bucket
FROM playground, bounds
GROUP BY bucket
)
select * from gapfill
ERROR: invalid time_bucket_gapfill argument: start must be a simple expression

time_bucket_gapfill only accepts start and finish values that can be evaluated to constants at query planning time. Expressions built from constants and now() therefore work, but expressions that access a table do not.
While this limitation on time_bucket_gapfill is in place, it is not possible to achieve the desired behaviour in a single query. The workaround is to calculate the values for start and finish separately and then provide them to the query with time_bucket_gapfill, which can be done in a stored procedure or in the application.
As a side note, if a PREPARE statement is used in PostgreSQL 12, it is important to explicitly disable generic plans (e.g. SET plan_cache_mode = force_custom_plan) for the same reason.
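The two-step workaround can be sketched as follows: fetch the bounds in a first round trip, then inline the now-constant values into the gapfill query. This is a minimal sketch using sqlite3 as a stand-in for the database driver, with a plain bucket expression replacing time_bucket_gapfill (which only exists in TimescaleDB); against a real instance the same pattern would run through psycopg2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE playground (value1 REAL, timestamp INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO playground VALUES (?, ?, ?)",
    [(100, 1599100000000000000, "test"),
     (100, 1599100001000000000, "test"),
     (100, 1599300000000000000, "test")],
)

# Step 1: fetch the bounds in a separate query.
lo, hi = conn.execute(
    "SELECT COALESCE(MIN(timestamp), 0), COALESCE(MAX(timestamp), 0) FROM playground"
).fetchone()

# Step 2: inline the now-constant bounds into the bucketing query.
# In TimescaleDB this would be time_bucket_gapfill(300E9::BIGINT, timestamp, lo, hi);
# here an integer-division bucket stands in for it.
sql = f"""
SELECT (timestamp / 300000000000) * 300000000000 AS bucket
FROM playground
WHERE timestamp >= {lo} AND timestamp < {hi}
GROUP BY bucket
"""
buckets = [row[0] for row in conn.execute(sql)]
```

Because the bounds arrive as literals, the planner sees simple constant expressions and the "start must be a simple expression" error cannot occur.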

For inferring start and stop from the WHERE clause, only direct column references are supported;
see: https://github.com/timescale/timescaledb/issues/1345
So something like the following might work (I have no TimescaleDB access to test), but try this:
SELECT
time_bucket_gapfill(300E9::BIGINT, timestamp, time_range.min, time_range.max) AS bucket
FROM
(
SELECT
COALESCE(MIN(timestamp), 0) AS min
, COALESCE(MAX(timestamp), 0) AS max
FROM
playground
) AS time_range
, playground
WHERE
timestamp >= time_range.min
AND timestamp < time_range.max
GROUP BY
bucket;

Related

BigQuery - UDF returns LEFT JOIN error when used in view

I'm facing the following issue with BQ and a UDF (when the UDF is used in a view) when calculating the number of days between inputs.
CREATE FUNCTION my_test_function(from_date TIMESTAMP, to_date TIMESTAMP) AS (
(
SELECT COUNT(date) AS count FROM my_test_table
WHERE date >= CAST(from_date as date) and date < CAST(to_date as date)
)
);
my_test_table - one column - date as DATE
When invoking the function in the query editor, it all works fine.
When using it in a view, I get the following error:
LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
Looking at the execution plan - there are no JOINs there. Just INPUT and OUTPUT.
INPUT
READ
$1:date
FROM my_test_table
WHERE and(greater_or_equal($1, 18748), less($1, 18778))
AGGREGATE
$20 := COUNT($1)
WRITE
$20
TO __stage00_output
OUTPUT
READ
$20
FROM __stage00_output
AGGREGATE
$10 := SUM_OF_COUNTS($20)
WRITE
$10
TO __stage01_output
I found a few similar issues here on SO, but all of them were using joins in the UDF query.
Any thoughts?
Thanks
EDIT:
The function call here:
with test_data as (
select CAST('2021-05-01' as timestamp) from_date, CAST('2021-05-31' as timestamp) to_date
union all select CAST('2021-06-01' as timestamp) from_date, CAST('2021-06-20' as timestamp) to_date
)
select
*,
my_test_function(from_date, to_date) as date_diff
from
test_data

Dynamic query using bigquery and data studio

I want to pull data for any date range in Data Studio without having to change date range selectors in my BigQuery query all the time, but I am not sure it is even possible. The reason I do this is to make sure that the queried data covers only 30 days, as I later do some segmentation on that 30-day window.
I then figured out that Data Studio can use dynamic_date; however, that approach never produces a data table (the data table is needed for further queries). Is it possible to do something like dynamic_date in BigQuery instead, i.e. retrieve data from BigQuery using a date range not previously defined in the query?
From my point of view, the code should look something like:
SELECT
ID
FROM `table`
WHERE DATE(Timestamp) BETWEEN $DS_START_DATE AND $DS_START_DATE + INTERVAL 30 DAY
or
WHERE DATE(Timestamp) >= #DS_START_DATE
I believe in pure Bigquery you can use DECLARE clause for that purpose, defining variables of the specified type:
declare DS_START_DATE date default "2020-03-03";
declare DS_END_DATE date default "2020-03-04";
WITH sample AS (
SELECT '10001' AS id, cast('2020-03-01' AS timestamp) as date_id UNION ALL
SELECT '10002', cast('2020-03-02' AS timestamp) UNION ALL
SELECT '10003', cast('2020-03-03' AS timestamp) UNION ALL
SELECT '10004', cast('2020-03-04' AS timestamp) UNION ALL
SELECT '10005', cast('2020-03-05' AS timestamp) UNION ALL
SELECT '10006', cast('2020-03-06' AS timestamp)
)
select id, date_id from sample
where date(date_id) between DS_START_DATE and DS_END_DATE
Alternatively, you can take a look at parameterized queries, however as I mentioned in the comment, they are not supported in classic BigQuery web UI.
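A third option is to compute the 30-day window in the application and interpolate it into the query text before sending it to BigQuery. A minimal sketch (the table and column names are taken from the question; the query is only built here, not executed):

```python
from datetime import date, timedelta

# Compute the 30-day window in the application instead of in SQL.
start = date(2020, 3, 3)
end = start + timedelta(days=30)

# Build the query text with the now-fixed bounds.
query = (
    "SELECT ID FROM `table` "
    f"WHERE DATE(Timestamp) BETWEEN '{start.isoformat()}' AND '{end.isoformat()}'"
)
```

This keeps the query itself free of variables, so it works in any client, including the classic web UI.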

query to subtract date from systimestamp in oracle 11g

I want to subtract the date returned by another query from the system time in Oracle SQL. So far I have been able to use the result of the other query, but when I try to subtract it from systimestamp I get the following error:
ORA-01722: invalid number
'01722. 00000 - "invalid number"
*Cause: The specified number was invalid.
*Action: Specify a valid number.
Below is my query
select round(to_number(systimestamp - e.last_time) * 24) as lag
from (
select ATTR_VALUE as last_time
from CONFIG
where ATTR_NAME='last_time'
and PROCESS_TYPE='new'
) e;
I have also tried this
select to_char(sys_extract_utc(systimestamp)-e.last_time,'YYYY-MM-DD HH24:MI:SS') as lag
from (
select ATTR_VALUE as last_time
from CONFIG
where ATTR_NAME='last_time'
and PROCESS_TYPE='new'
) e;
I want the difference between the time intervals to be in hours.
Thank you for any help in advance.
P.S. The datatype of ATTR_VALUE is VARCHAR2(150). A sample result of e.last_time is 2016-09-05 22:43:81796
"its VARCHAR2(150). That means I need to convert that to date"
ATTR_VALUE is a string so yes you need to convert it to the correct type before attempting to compare it with another datatype. Given your sample data the correct type would be timestamp, in which case your subquery should be:
(
select to_timestamp(ATTR_VALUE, 'yyyy-mm-dd hh24:mi:ss.ff5') as last_time
from CONFIG
where ATTR_NAME='last_time'
and PROCESS_TYPE='new'
)
The assumption is that your sample is representative of all the values in your CONFIG table for the given keys. If you have values in different formats, your query will break in some other way: that's the danger of this approach.
So finally, after lots of trial and error, I got this:
1. It turns out the initial error occurred because the datatype of e.last_time was VARCHAR(150).
To find out the datatype of a given column in the table I used
desc <table_name>
which in my case was desc CONFIG
2. To convert VARCHAR to a timestamp I have two options, to_timestamp and to_date. If I use to_timestamp, like
select round((systimestamp - to_timestamp(e.last_time,'YYYY-MM-DD HH24:MI:SSSSS')) * 24, 2) as lag
from (
select ATTR_VALUE as last_time
from CONFIG
where ATTR_NAME='last_time'
and PROCESS_TYPE='new'
) e;
I get an error that round expects a NUMBER but got an INTERVAL DAY TO SECOND, since the difference between timestamps comes out like +41 13:55:20.663990. Converting that into hours would require complex logic.
An alternative is to use to_date, which I preferred and used as follows:
select round((sysdate - to_date(e.last_time,'YYYY-MM-DD HH24:MI:SSSSS')) * 24, 2) as lag
from (
select ATTR_VALUE as last_time
from CONFIG
where ATTR_NAME='last_time'
and PROCESS_TYPE='new'
) e;
This returns the desired result, i.e. the difference in hours rounded to 2 decimal places.
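The arithmetic behind the final query can be illustrated outside the database: in Oracle, subtracting two DATE values yields a fractional number of days, and multiplying by 24 converts that to hours. A minimal sketch (the sample instants below are made up):

```python
from datetime import datetime

# In Oracle, (sysdate - to_date(...)) is a fractional number of days;
# multiplying by 24 converts it to hours, as in the final query above.
last_time = datetime(2016, 9, 5, 22, 43, 21)
now = datetime(2016, 9, 7, 10, 43, 21)        # stand-in for sysdate
lag_days = (now - last_time).total_seconds() / 86400
lag_hours = round(lag_days * 24, 2)
```

This is exactly why to_date is convenient here: the subtraction result is already a plain number, not an INTERVAL.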

SQL Server: how to add case statement to select

I am using the following select to query a date from a database table.
The input (ms) for this query comes from an XML string, and the stored procedure loops through all the individual values in the XML to return a certain number (an integer) for each of them.
This works fine so far.
Is there a way that I can return a placeholder number (like 99999) if the input (ms) is empty or missing?
Currently the query below returns 0 in such a case, which I cannot use to identify this situation, as 0 can also be a valid result in other cases.
My stored procedure so far:
SELECT ms as date,
type,
(
SELECT COUNT(calendar_dt)
FROM Calendar
WHERE day_of_week NOT IN (1, 7)
AND calendar_dt > GETDATE()
AND calendar_dt <= ms
) as bDays
FROM #dates
FOR XML PATH('ms'), ELEMENTS, TYPE, ROOT('ranks')
Many thanks in advance for any help with this, Tim.
If the column ms is either NULL or populated, just use ISNULL.
http://technet.microsoft.com/en-us/library/ms184325.aspx
SELECT ISNULL(ms, 99999) AS date
However, if that column can contain an empty string, which is not the same as NULL, then also use NULLIF.
http://technet.microsoft.com/en-us/library/ms177562.aspx
SELECT ISNULL(NULLIF(ms,''), 99999) AS date
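The NULLIF-then-fallback pattern can be checked with any SQL engine; this minimal sketch uses sqlite3, whose IFNULL plays the role of SQL Server's ISNULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def with_fallback(ms):
    # NULLIF turns '' into NULL; IFNULL then substitutes the placeholder.
    return conn.execute(
        "SELECT IFNULL(NULLIF(?, ''), 99999)", (ms,)
    ).fetchone()[0]
```

Both NULL and the empty string map to the placeholder, while real values pass through unchanged.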

Mysql optimization

I'm currently trying to optimize a MySQL statement that is taking quite some time. The table it runs on has 600k+ rows, and the query takes over 10 seconds.
SELECT DATE_FORMAT( timestamp, '%Y-%m-%d' ) AS date, COUNT(DISTINCT email) AS count
FROM log
WHERE timestamp > '2009-02-23'
AND timestamp < '2020-01-01'
AND TYPE = 'play'
GROUP BY date
ORDER BY date DESC
I have indexes on timestamp and type, and also a combined one, timestamp_type (type_2).
Here are the EXPLAIN results; the problem seems to be a filesort, but I don't know how to get around this...
id: 1
select_type: SIMPLE
table: log
type: ref
possible_keys: type,timestamp,type_2
key: type_2
key_len: 1
ref: const
rows: 226403
Extra: Using where; Using filesort
Thanks
Things to try:
Have a separate date column (indexed) and use that instead of your timestamp column
Add an index across type and date
Use BETWEEN (don't think it will affect the speed but it's easier to read)
So ideally you would
Create a date column and fill it using UPDATE table SET date = DATE(timestamp)
Index across type and date
Change your select to ... type = ? AND date BETWEEN ? AND ?
Try rewriting to filter on TYPE alone first, then apply your date range and aggregates; basically, create an inline view that filters type down. The optimizer is likely doing this already, but when trying to improve performance I find it helpful to be very certain of which steps happen first.
DATE_FORMAT will not utilize the indexes.
You can still use the query below to utilize the index on the timestamp column:
SELECT timestamp AS date, COUNT(DISTINCT email) AS count
FROM log
WHERE timestamp > '2009-02-23 00:00:00'
AND timestamp < '2020-01-01 23:59:59'
AND TYPE = 'play'
GROUP BY date
ORDER BY date DESC
Format the datetime value to a date when printing or consuming the results.
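The suggestions above (composite index on type and the time column, a BETWEEN range, daily distinct counts) can be sketched with sqlite3 standing in for MySQL; the table and index names here are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (email TEXT, type TEXT, timestamp TEXT)")
# Composite index so the type filter and the time range are served together.
conn.execute("CREATE INDEX idx_type_ts ON log (type, timestamp)")
conn.executemany(
    "INSERT INTO log VALUES (?, 'play', ?)",
    [("a@x.com", "2019-01-01 10:00:00"),
     ("b@x.com", "2019-01-01 11:00:00"),
     ("a@x.com", "2019-01-02 09:00:00")],
)

rows = conn.execute(
    """
    SELECT DATE(timestamp) AS date, COUNT(DISTINCT email) AS count
    FROM log
    WHERE type = 'play'
      AND timestamp BETWEEN '2009-02-23' AND '2020-01-01'
    GROUP BY date
    ORDER BY date DESC
    """
).fetchall()
```

The range predicate stays on the raw timestamp column (index-friendly), and only the GROUP BY key is derived from it.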