Vertica gives different calculations than PostgreSQL - sql

I have one query:
SELECT CAST(((stats.ts_spawn - 1427835600) / 86400) * 86400 + 1427835600 AS INTEGER) AS anon_1
FROM stats
WHERE stats.ts_spawn > 1427835600
  AND stats.ts_spawn < 1428440399
GROUP BY anon_1
ORDER BY anon_1;
I'm expecting to get the start of each day of the week.
Result in Postgresql:
1427835600
1427922000
1428008400
1428094800
1428181200
1428267600
1428354000
Vertica returns the start of each hour of each day of the week:
1427839200
1427842800
1427846400
1427850000
... and so on, 167 records in total (24 * 7 - 1).
I have no idea how to modify this query.

The Vertica query is obviously producing a float, not an integer, in the division. In the Vertica documentation we can read this:
the Vertica 6 release introduced a behavior change when dividing integers using the / operator
If you want the query to behave the same on both systems, either change the configuration option mentioned in that doc or use the FLOOR() function on the result of the division.
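For example, a minimal sketch of the question's query with FLOOR() applied to the division (table and column names taken from the question, not tested):
SELECT CAST(FLOOR((stats.ts_spawn - 1427835600) / 86400) * 86400 + 1427835600 AS INTEGER) AS anon_1
FROM stats
WHERE stats.ts_spawn > 1427835600
  AND stats.ts_spawn < 1428440399
GROUP BY anon_1
ORDER BY anon_1;
FLOOR() truncates the per-day quotient before it is multiplied back up, which is what PostgreSQL's integer division was doing implicitly.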

Related

ORA-00933: SQL command not properly ended

I wrote this block of code
SELECT max(order)
FROM orders_table
GROUP BY UNIX_TIMESTAMP(timestamp) DIV 30 ;
order is a column that I'm trying to take the max of every 30 seconds from a table called orders_table. I found the last line of code here, in an answer to someone else's question. However, I get an error when I try to run this code.
Thanks in advance
Your query uses MySQL syntax. In Oracle, neither DIV nor UNIX_TIMESTAMP exists.
To do integer division, you can simply TRUNC the result of the division.
To compute the number of seconds since January 1st, 1970, you could use the following expression (since Oracle, when subtracting dates, returns the result as a number of days):
(date_column - TO_DATE('1970-01-01', 'yyyy-mm-dd')) * 60 * 60 * 24
You probably want :
SELECT MAX(o.order)
FROM orders_table o
GROUP BY TRUNC((o.timestamp - TO_DATE('1970-01-01', 'yyyy-mm-dd')) * 60 * 60 * 24 / 30)

REGR_SLOPE in Teradata SQL Query Returning 0 Slope

I am a relative newbie with Teradata SQL and have run into this strange (I think strange) situation. I am trying to run a regression (REGR_SLOPE) on sensor data. I am gathering sensor readings for a single day, each day is 80 observations which is confirmed by the COUNT in the outer SELECT. My query is:
SELECT
    d.meter_id,
    REGR_SLOPE(d.reading_measure, d.x_axis) AS slope,
    COUNT(d.x_axis) AS xcount,
    COUNT(d.reading_measure) AS read_count
FROM
(
    SELECT
        meter_id,
        reading_measure,
        ROW_NUMBER() OVER (ORDER BY Reading_Dttm) AS x_axis
    FROM data_mart.v_meter_reading
    WHERE Reading_Start_Dt = '2017-12-12'
      AND Meter_Id IN (11932101, 11419827, 11385229, 11643466)
      AND Channel_Num = 5
) d
GROUP BY 1
When I use the "IN" clause in the subquery to specify Meter_Id, I get slope values, but when I take it out (to run over all meters) all the slopes are 0 (zero). I would simply like to run a line through a day's worth of observations (80).
I'm using Teradata v15.0.
What am I missing / doing wrong?
I would bet a Pepperoni Pizza that it's the x_axis value.
Instead try ROW_NUMBER() OVER (PARTITION BY meter_id ORDER BY reading_dttm)
This will ensure that the x_axis starts again from 1 for each meter, and each reading will always be 1 away from the previous reading on the x_axis.
This makes me think you should probably just use reading_dttm as the x_axis value, rather than fabricating one with ROW_NUMBER(). That way readings with a 5 hour gap between them have a different slope than readings with a 10 day gap between them. You may need to convert the reading_dttm's data type, with a function like TO_UNIXTIME(reading_dttm), or something similar.
I'll message you my address for the Pizza Delivery. (Joking.)
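For illustration, here is the question's query with just that one change applied and the IN filter removed so it runs over all meters (a sketch based on the names in the question, not tested; the extra COUNT columns are omitted for brevity):
SELECT
    d.meter_id,
    REGR_SLOPE(d.reading_measure, d.x_axis) AS slope
FROM
(
    SELECT
        meter_id,
        reading_measure,
        ROW_NUMBER() OVER (PARTITION BY meter_id ORDER BY Reading_Dttm) AS x_axis
    FROM data_mart.v_meter_reading
    WHERE Reading_Start_Dt = '2017-12-12'
      AND Channel_Num = 5
) d
GROUP BY 1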
In addition to @MatBailie's answer:
You probably know that you should use the timestamp instead of the ROW_NUMBER as the x-axis, but you can't, because Teradata doesn't allow timestamps in this place (strange).
There's no built-in TO_UNIXTIME function in Teradata, but you can use this instead:
REPLACE FUNCTION TimeStamp_to_UnixTime (ts TIMESTAMP(6))
RETURNS decimal(18,6)
LANGUAGE SQL
CONTAINS SQL
DETERMINISTIC
SQL SECURITY DEFINER
COLLATION INVOKER
INLINE TYPE 1
RETURN
(Cast(ts AS DATE) - DATE '1970-01-01') * 86400
+ (Extract(HOUR From ts) * 3600)
+ (Extract(MINUTE From ts) * 60)
+ (Extract(SECOND From ts));
If you're not allowed to create UDFs simply cut&paste the calculation.
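For example, the inner query from the question could then compute the x-axis directly from the timestamp (a sketch, assuming the UDF above has been created and using the names from the question):
SELECT
    meter_id,
    reading_measure,
    TimeStamp_to_UnixTime(Reading_Dttm) AS x_axis
FROM data_mart.v_meter_reading
WHERE Reading_Start_Dt = '2017-12-12'
  AND Channel_Num = 5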

Oddities with postgres SQL [negative date interval and alias that doesn't work only in condition clause]

I'm coming to you guys with two small oddities I can't seem to understand with Postgres:
(1)
SELECT "LASTREQUESTED",
(DATE_TRUNC('seconds', CURRENT_TIMESTAMP - "LASTREQUESTED")
- INTERVAL '8 hours') AS "TIME"
FROM "USER" AS u
JOIN "REQUESTLOG" AS r ON u."ID" = r."ID"
ORDER BY "TIME"
I'm calculating when users can make their next request [once every 8 hours], but if you look at entry 16 I get "1 day -06:20:47" instead of roughly "18:00:00", unlike every other line. [The LASTREQUESTED column is a simple timestamp, and nothing is different about it for line 16.] Why is that?
(2)
On the same request, if I try to add a condition on the "TIME" column, the compiler says it doesn't exist, although using it in ORDER BY is fine. I don't get why.
SELECT (DATE_TRUNC('seconds', CURRENT_TIMESTAMP - "LASTREQUESTED")
- INTERVAL '8 hours') AS "TIME"
FROM "USER" AS u
JOIN "REQUESTLOG" AS r ON u."ID" = r."ID"
WHERE "TIME" > 0
ORDER BY "TIME";
Question #1: negative hours but positive days?
According to the PostgreSQL documentation, this is a situation where PostgreSQL differs from the SQL standard:
According to the SQL standard all fields of an interval value must have the same sign…. PostgreSQL allows the fields to have different signs….
Internally interval values are stored as months, days, and seconds. This is done because the number of days in a month varies, and a day can have 23 or 25 hours if a daylight savings time adjustment is involved. The months and days fields are integers while the seconds field can store fractions. …
You can see a more extreme example of this with the following query:
=# select interval '1 day' - interval '300 hours';
?column?
------------------
1 day -300:00:00
(1 row)
So this is not a single interval in seconds expressed in a strange way; instead, it's an interval of 0 months, +1 day, and -1,080,000.0 seconds. If you are certain that there are no daylight saving time issues with the timestamps that you got these intervals from, you can use justify_hours to convert days into 24-hour periods and get an interval that makes more sense:
=# select justify_hours(interval '1 day' - interval '300 hours');
justify_hours
--------------------
-11 days -12:00:00
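Applied to the query from the question, that might look like this (a sketch, untested, using the column and table names from the question):
SELECT "LASTREQUESTED",
       justify_hours(DATE_TRUNC('seconds', CURRENT_TIMESTAMP - "LASTREQUESTED")
                     - INTERVAL '8 hours') AS "TIME"
FROM "USER" AS u
JOIN "REQUESTLOG" AS r ON u."ID" = r."ID"
ORDER BY "TIME"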
Question #2: SELECT columns can't be used in WHERE?
This is standard PostgreSQL behavior. See this duplicate question. Solutions presented there include:
Repeat the expression twice, once in the SELECT list, and again in the WHERE clause. (I've done this more times than I want to remember…)
SELECT (my - big * expression) AS x
FROM stuff
WHERE (my - big * expression) > 5
ORDER BY x
Create a subquery without that WHERE filter, and put the WHERE conditions in the outer query
SELECT *
FROM (SELECT (my - big * expression) AS x
FROM stuff) AS subquery
WHERE x > 5
ORDER BY x
Use a WITH statement to achieve something similar to the subquery trick.
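Applied to the query from the question, the subquery version might look like this (a sketch, untested; note that the alias also has to be compared to an interval rather than to 0):
SELECT *
FROM (SELECT "LASTREQUESTED",
             (DATE_TRUNC('seconds', CURRENT_TIMESTAMP - "LASTREQUESTED")
              - INTERVAL '8 hours') AS "TIME"
      FROM "USER" AS u
      JOIN "REQUESTLOG" AS r ON u."ID" = r."ID") AS subquery
WHERE "TIME" > INTERVAL '0'  -- an interval can't be compared to the integer 0
ORDER BY "TIME";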
I don't know exactly why it's calculating as it is (maybe because you subtract an interval from another interval), but when you change the calculation to timestamp minus timestamp it works as expected:
DATE_TRUNC('seconds', CURRENT_TIMESTAMP - ("LASTREQUESTED" + INTERVAL '8 hours'))
See Fiddle
Regarding #2: in standard SQL the columns in the SELECT list are calculated after FROM/WHERE/GROUP BY/HAVING but before ORDER BY; that's why you can't use an alias in WHERE. There are some good articles on that topic written by Itzik Ben-Gan (based on MS SQL Server, but similar for PostgreSQL).

Get data that is no more than an hour old in BigQuery

Trying to use the statement:
SELECT *
FROM data.example
WHERE TIMESTAMP(timeCollected) < DATE_ADD(USEC_TO_TIMESTAMP(NOW()), 60, 'MINUTE')
to get data from my BigQuery data. It seems to return the same set of results even when the time is not within the range. timeCollected is of the format 2015-10-29 16:05:06.
I'm trying to build a query that is meant to return data that is not older than an hour. So data collected within the last hour should be returned; the rest should be ignored.
Using Standard SQL:
SELECT * FROM data
WHERE timestamp > TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -60 MINUTE)
The query you made means "return to me anything that has a collection time earlier than an hour in the future", which will literally mean your whole table. You want the following (from what I got from your comment, at least):
SELECT *
FROM data.example
WHERE TIMESTAMP(timeCollected) > DATE_ADD(USEC_TO_TIMESTAMP(NOW()), -60, 'MINUTE')
This means that any timeCollected that is NOT greater than an hour ago will not be returned. I believe this is what you want.
Also, unless you need it, SELECT * is not ideal in BigQuery. Since the data is stored by column, you can save money by selecting only what you need down the line. I don't know your use case, so * may be warranted, though.
To get table data collected within the last hour:
SELECT * FROM [data.example@-3600000--1]
https://cloud.google.com/bigquery/table-decorators
Using Standard SQL:
SELECT * FROM data WHERE timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 60 MINUTE)

Sql date > date by more than 8hrs

I am trying to write a query where the chart date (the system time the data was charted) is more than 8 hrs after the perform date/time.
I am using the following query:
select * from pat_results where app_type like 'L' and (chart_dt_utc > perform_dt_utc +8)
The date and time format for both columns are 2012-12-29 11:44:00
Is the +8 correct?
No. In databases that allow you to add a number to a date, the number is measured in days.
The value you want to add is 8/24.0 -- include the decimal place, because some databases calculate 8/24 with integer division and give you 0.
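Applied to the question's query, that would be something like this (a sketch, assuming a database where adding a number of days to a date is valid):
select * from pat_results where app_type like 'L' and (chart_dt_utc > perform_dt_utc + 8/24.0)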
No, + 8 adds 8 days. You want:
select * from pat_results where app_type like 'L' and datediff(hour, perform_dt_utc, chart_dt_utc) > 8
Edit: Oh. For some reason I thought you were using SQL server. Well, suffice it to say, use whatever equivalent exists in your RDBMS.
Edit 2: In Oracle you can do this:
select * from pat_results where app_type like 'L'
and (chart_dt_utc > perform_dt_utc + (8 / 24))