Select rows with a date condition (Presto SQL)

I am trying to select the number of impressions per hour for a particular day.
I tried with this code:
SELECT
date_trunc('hour', CAST(date_time AS timestamp)) date_time,
COUNT(impression_id) AS count_impression_id
FROM
parquet_db.imp_pixel
WHERE
date_time = '2022-07-27'
LIMIT 100
GROUP BY 1
But I got this error when I add the WHERE clause:
line 5:1: mismatched input 'group'. Expecting:
Can you help me fix it? Thanks!

LIMIT usually comes last in a SQL query. Also, you should not be using LIMIT without ORDER BY. Use this version:
SELECT DATE_TRUNC('hour', CAST(date_time AS timestamp)) date_time,
COUNT(impression_id) AS count_impression_id
FROM parquet_db.imp_pixel
WHERE CAST(date_time AS date) = DATE '2022-07-27'
GROUP BY 1
ORDER BY <something>
LIMIT 100;
Note that the ORDER BY clause determines which 100 records you get in the result set. Your current (intended) query lets Presto decide on its own which 100 records get returned.
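For example, to see the busiest hours first, you could order by the count — a sketch, assuming that is the ordering you want (the column names match the query above):

```sql
SELECT DATE_TRUNC('hour', CAST(date_time AS timestamp)) AS date_time,
       COUNT(impression_id) AS count_impression_id
FROM parquet_db.imp_pixel
WHERE CAST(date_time AS date) = DATE '2022-07-27'
GROUP BY 1
ORDER BY count_impression_id DESC   -- or ORDER BY 1 for chronological order
LIMIT 100;
```

With at most 24 hourly buckets in a single day, the LIMIT will rarely trim anything here, but the ORDER BY still makes the result deterministic.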

Related

How do I select a data every second with PostgreSQL?

I've got a SQL query that selects all the data between two dates, and now I would like to add a time-scale factor so that instead of returning all the data it returns one row every second, minute, or hour.
Do you know how I can achieve this?
My query :
"SELECT received_on, $1 FROM $2 WHERE $3 <= received_on AND received_on <= $4", [data_selected, table_name, date_1, date_2]
The table input:
As you can see, there are several rows within the same second; I would like to select only one per second.
If you want to select data every second, you may use the ROW_NUMBER() function partitioned by received_on, as in the following:
WITH DateGroups AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY received_on ORDER BY adc_v) AS rn
FROM table_name
)
SELECT received_on, adc_v, adc_i, acc_axe_x, acc_axe_y, acc_axe_z
FROM DateGroups
WHERE rn=1
ORDER BY received_on
If you want to select data every minute or hour, you may use the extract function to get the number of epoch seconds from received_on, then divide it by 60 to group by minute or by 3600 to group by hour.
epoch: For date and timestamp values, the number of seconds since 1970-01-01 00:00:00-00 (can be negative); for interval values, the total number of seconds in the interval
Group by minutes:
WITH DateGroups AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY floor(extract(epoch from (received_on)) / 60) ORDER BY adc_v) AS rn
FROM table_name
)
SELECT received_on, adc_v, adc_i, acc_axe_x, acc_axe_y, acc_axe_z
FROM DateGroups
WHERE rn=1
ORDER BY received_on
Group by hours:
WITH DateGroups AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY floor(extract(epoch from (received_on)) / (60*60)) ORDER BY adc_v) AS rn
FROM table_name
)
SELECT received_on, adc_v, adc_i, acc_axe_x, acc_axe_y, acc_axe_z
FROM DateGroups
WHERE rn=1
ORDER BY received_on
When there are several rows per second and you only want one result row per second, you can decide to pick one of the rows for each second. This can be a randomly chosen row, or you can pick the row with the greatest or least value in a column, as shown in Ahmed's answer.
It would be more typical, though, to aggregate your data per second. The columns show figures, and you are interested in those figures. Your sample data shows the value 2509 twice and the value 2510 three times for the adc_v column at 2022-07-29, 15:52. Consider what you would like to see. Maybe you don't want this value to go below some boundary, so you show the minimum value MIN(adc_v) to see how low it went in that second. Or you want to see the value that occurred most often in the second (in PostgreSQL: mode() WITHIN GROUP (ORDER BY adc_v)). Or you'd like to see the average value AVG(adc_v). Make this decision for every value, so as to get the information most vital to you.
select
received_on,
min(adc_v),
avg(adc_i),
...
from mytable
group by received_on
order by received_on;
If you want this for another interval, say an hour instead of the second, truncate your received_on column accordingly. E.g.:
select
date_trunc('hour', received_on) as received_hour,
min(adc_v),
avg(adc_i),
...
from mytable
group by date_trunc('hour', received_on)
order by date_trunc('hour', received_on);

Aggregate a variable from a timestamp on BigQuery

I am planning to calculate the most frequent part_of_day for each user. In this case, I first encoded the timestamp as part_of_day, then aggregated to get the most frequent part_of_day. I use ARRAY_AGG to calculate the mode. However, I'm not sure how to handle the timestamp with ARRAY_AGG, because I get an error, so my code structure might be wrong:
SELECT User_ID, time,
ARRAY_AGG(Time ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day,
case
when time BETWEEN '04:00:00' AND '12:00:00'
then "morning"
when time < '04:00:00' OR time > '20:00:00'
then "night"
end AS part_of_day
FROM (
SELECT User_ID,
TIME_TRUNC(TIME(Request_Timestamp), SECOND) AS Time
COUNT(*) AS cnt
Error received:
Syntax error: Expected ")" but got identifier "COUNT" at [19:9]
Even though you did not share any sample data, I was able to identify some issues within your code.
I have used some sample data I created based on the formats and functions you used in your code, to keep consistency. Below is the code, without any errors:
WITH data AS (
SELECT 98 as User_ID,DATETIME "2008-12-25 05:30:00.000000" AS Request_Timestamp, "something!" AS channel UNION ALL
SELECT 99 as User_ID,DATETIME "2008-12-25 22:30:00.000000" AS Request_Timestamp, "something!" AS channel
)
SELECT User_ID, time,
ARRAY_AGG(Time ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day1,
case
when time BETWEEN '04:00:00' AND '12:00:00'
then "morning"
when time < '04:00:00' OR time > '20:00:00'
then "night"
end AS part_of_day
FROM (
SELECT User_ID,
TIME_TRUNC(TIME(Request_Timestamp), SECOND) AS time,
COUNT(*) AS cnt
FROM data
GROUP BY User_ID, Channel, Request_Timestamp
#order by Request_Timestamp
)
GROUP BY User_ID, Time;
First, notice that I have changed the column's name in your ARRAY_AGG() call; this had to be done because it would otherwise cause a "Duplicate column name" error. Second, a comma was missing after your TIME_TRUNC() function, so you could not select COUNT(*). Then, within your GROUP BY, you needed to group by Request_Timestamp as well, because it was neither aggregated nor grouped. Lastly, in your last GROUP BY, you needed to aggregate or group time. After these corrections, your code will execute without any errors.
Note: the Syntax error: Expected ")" but got identifier "COUNT" at [19:9] error you experienced is due to the missing comma. The others would be shown after correcting this one.
If you want the most frequent part of each day, you need to use the day part in the aggregation:
SELECT User_ID,
ARRAY_AGG(part_of_day ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day
FROM (SELECT User_ID,
(case when time BETWEEN '04:00:00' AND '12:00:00' then 'morning'
when time < '04:00:00' OR time > '20:00:00' then 'night'
end) AS part_of_day,
COUNT(*) AS cnt
FROM cognitivebot2.chitchaxETL.conversations
GROUP BY User_ID, part_of_day
) u
GROUP BY User_ID;
Obviously, if you want the channel as well, you need to include it in the queries.
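A sketch of that variant, assuming a channel column exists on the conversations table (as in the sample data above) and mirroring the query just shown:

```sql
SELECT User_ID, channel,
       ARRAY_AGG(part_of_day ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] AS part_of_day
FROM (SELECT User_ID, channel,
             (CASE WHEN time BETWEEN '04:00:00' AND '12:00:00' THEN 'morning'
                   WHEN time < '04:00:00' OR time > '20:00:00' THEN 'night'
              END) AS part_of_day,
             COUNT(*) AS cnt
      FROM cognitivebot2.chitchaxETL.conversations
      GROUP BY User_ID, channel, part_of_day
     ) u
GROUP BY User_ID, channel;
```

Adding channel to both GROUP BY clauses produces one most-frequent part_of_day per (user, channel) pair rather than per user.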

Athena greater than condition in date column

I have the following query that I am trying to run on Athena.
SELECT observation_date, COUNT(*) AS count
FROM db.table_name
WHERE observation_date > '2017-12-31'
GROUP BY observation_date
However it is producing this error:
SYNTAX_ERROR: line 3:24: '>' cannot be applied to date, varchar(10)
This seems odd to me. Is there an error in my query or is Athena not able to handle greater than operators on date columns?
Thanks!
You need to cast the string literal to a date before making this comparison. Try the following:
SELECT observation_date, COUNT(*) AS count
FROM db.table_name
WHERE observation_date > CAST('2017-12-31' AS DATE)
GROUP BY observation_date
Check it out in this SQL Fiddle.
UPDATE 17/07/2019
To reflect the comments:
SELECT observation_date, COUNT(*) AS count
FROM db.table_name
WHERE observation_date > DATE('2017-12-31')
GROUP BY observation_date
You can also use the date function, which is a convenient alias for CAST(x AS date):
SELECT *
FROM date_data
WHERE trading_date >= DATE('2018-07-06');
select * from my_schema.my_table_name where date_column = cast('2017-03-29' as DATE) limit 5
I just want to add a note here: if you have a date column in ISO-8601 format, for example 2022-08-02T01:46:46.963120Z, then you can use the parse_datetime function.
In my case, the query looks like this:
SELECT * FROM internal_alb_logs
WHERE elb_status_code >= 500 AND parse_datetime(time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z') > parse_datetime('2022-08-01-23:00:00','yyyy-MM-dd-HH:mm:ss')
ORDER BY time DESC
See more other examples here: https://docs.aws.amazon.com/athena/latest/ug/application-load-balancer-logs.html#query-alb-logs-examples

How to get the min value in PostgreSQL

I have a few records, and I want to create a query that gives hourly records of each device's battery level.
What I did: from the timestamp field I extract the hour and use the min function to get the lowest value, but since extracting the hour from the timestamp is not an aggregate function, I need to add it to the GROUP BY, which now gives me duplicate records.
Here is my SQL:
select extract(hour from observationtime) as hour,
deviceid,
min(batterylevel) as batterylevel
from smartvakt_device_report
where batterylevel!=''
and deviceid!=''
and observationtime between '2016-02-02' and '2016-03-02'
group by observationtime,deviceid
order by observationtime ASC, deviceid ASC
Here is the above query output:
Here are the actual records:
Can someone suggest how I can remove these duplicates?
Change the GROUP BY column order: first group by deviceid, then by the hour, using the same expression extract(hour from observationtime).
SELECT
deviceid,
extract(hour from observationtime) AS hour,
min(batterylevel) AS batterylevel
FROM smartvakt_device_report
WHERE
batterylevel!=''
AND deviceid!=''
AND observationtime BETWEEN '2016-02-02'
AND '2016-03-02'
GROUP BY
deviceid,
extract(hour from observationtime)
ORDER BY
extract(hour from observationtime) ASC,
deviceid ASC
Since you are only interested in the hour, you have to indicate that when grouping, like this:
group by extract(hour from observationtime)
Otherwise, PostgreSQL will try to group together rows whose observationtime values are identical. But observationtime contains the time at full resolution, not just the hour.
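Putting this together with the original query, a sketch of a corrected version — using date_trunc rather than extract, on the assumption that you want one row per device per hour of each day (date_trunc keeps the date, so the same hour on different days is not merged):

```sql
SELECT date_trunc('hour', observationtime) AS hour,
       deviceid,
       min(batterylevel) AS batterylevel
FROM smartvakt_device_report
WHERE batterylevel != ''
  AND deviceid != ''
  AND observationtime BETWEEN '2016-02-02' AND '2016-03-02'
GROUP BY date_trunc('hour', observationtime), deviceid
ORDER BY hour ASC, deviceid ASC;
```

If you instead want hours pooled across all days (one row per device per hour-of-day, 0-23), use extract(hour from observationtime) in the SELECT and GROUP BY, as in the answer above.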

Postgres sql query by time window

I have a table "meterreading" that has the columns "timestamp", "value", and "meterId". I would like to get sums of the "value" for each hour, starting at a specific time. So far I have come up with this query, but it errors, saying I need to group by timestamp. The timestamps are just integers representing Unix epoch timestamps.
select date_trunc('hour', to_timestamp(timestamp)) as hours, sum(value)
from meterreading
WHERE timestamp >= 1377993600 AND timestamp < 1409595081
group by date_trunc('hours', to_timestamp(timestamp))
order by date_trunc('hours', to_timestamp(timestamp)) asc
select date_trunc('hour', to_timestamp(timestamp)) as hours, sum(value)
from meterreading
WHERE timestamp >= 1377993600 AND timestamp < 1409595081
group by 1
order by 1
or use the exact same expression used in the select list
group by date_trunc('hour', to_timestamp(timestamp));
Notice 'hour' instead of 'hours'. Hence the convenience of the positional reference syntax in the GROUP BY: it is clearer and less prone to errors.