Get row counts grouped by timestamp into hourly intervals - sql

I have a Hive table with a timestamp column stored as a string, for example:
20190516093836, 20190304125015, 20181115101358
I want to get the row count with the timestamps aggregated into hourly buckets, as below:
date_time count
-----------------------------
2019:05:16: 00:00:00 23
2019:05:16: 01:00:00 64
I followed several similar questions but have not been able to produce the desired result.
This is my latest query:
SELECT
DATE_PART('day', b.date_time) AS date_prt,
DATE_PART('hour', b.date_time) AS hour_prt,
COUNT(*)
FROM
(SELECT
from_unixtime(unix_timestamp(`timestamp`, "yyyyMMddHHmmss")) AS date_time
FROM table_name
WHERE from_unixtime(unix_timestamp(`timestamp`, "yyyyMMddHHmmss"))
BETWEEN '2018-12-10 07:02:30' AND '2018-12-12 08:02:30') b
GROUP BY
date_prt, hour_prt
I hope for some guidance from you, thanks in advance

You can produce date_time directly in the required format 'yyyy-MM-dd HH:00:00'. I prefer using regexp_replace:
SELECT
date_time,
COUNT(*) as `count`
FROM
(SELECT
regexp_replace(`timestamp`, '^(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})$','$1-$2-$3 $4:00:00') AS date_time
FROM table_name
WHERE regexp_replace(`timestamp`, '^(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{2})$','$1-$2-$3 $4:$5:$6')
BETWEEN '2018-12-10 07:02:30' AND '2018-12-12 08:02:30') b
GROUP BY
date_time
This will also work:
from_unixtime(unix_timestamp('20190516093836', "yyyyMMddHHmmss"),'yyyy-MM-dd HH:00:00') AS date_time
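To check the bucketing logic outside Hive, here is a minimal Python sketch of the same idea. The regular expression from the answer rewrites each `yyyyMMddHHmmss` string to the start of its hour, and the rows are then counted per bucket. The helper name `hour_bucket` and the extra sample row are illustrative assumptions, not part of the original query.

```python
import re
from collections import Counter

# Same pattern as the Hive regexp_replace call: six groups for
# year, month, day, hour, minute and second.
PATTERN = re.compile(r"^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$")


def hour_bucket(ts: str) -> str:
    """Rewrite 'yyyyMMddHHmmss' to 'yyyy-MM-dd HH:00:00' (hypothetical helper)."""
    return PATTERN.sub(r"\1-\2-\3 \4:00:00", ts)


# Values from the question plus one extra row in the same hour (assumed data).
rows = ["20190516093836", "20190516095901", "20190304125015", "20181115101358"]

# Equivalent of GROUP BY date_time with COUNT(*).
counts = Counter(hour_bucket(ts) for ts in rows)
for bucket, n in sorted(counts.items()):
    print(bucket, n)
```

Because the bucket is a zero-padded string, sorting it lexicographically also sorts it chronologically, which is the same reason the string BETWEEN filter in the query behaves as a time range.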

Related

PRESTO SQL count group by date

I have a Presto sql table called "imp_pixel".
Here a record of a table :
date_time ip impression_id
2022-08-27 07:05:48 192.0.0.1 001
2022-08-27 07:05:58 192.0.0.12 002
I would like to show the count of impression_id grouped by hour.
I tried this code:
select
date_trunc('hour', CAST(date_time AS date)) date_time,
COUNT(impression_id,0) AS 'impression_id'
from parquet_db.imp_pixel
group by date_trunc('hour', date)
But I got this error :
line 3:31: mismatched input ''impression_id''. Expecting: <identifier>
Can you help me please to fix this error?
thanks
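Casting date_time to date discards the time of day, so there is nothing left to truncate to the hour; cast to timestamp instead. The error itself comes from the single-quoted alias: single quotes make a string literal, so write the alias unquoted or in double quotes. COUNT also takes a single argument: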
select
date_trunc('hour', CAST(date_time AS timestamp)) date_time,
COUNT(impression_id) AS impression_id
from parquet_db.imp_pixel
group by 1
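The difference between truncating to the hour and casting to a date can be illustrated with a small Python sketch over the two sample rows from the question. The helper name `truncate_to_hour` is an assumption made for this example; it only mimics what `date_trunc('hour', ...)` does.

```python
from collections import Counter
from datetime import datetime

# The two sample rows from the question: (date_time, ip, impression_id).
rows = [
    ("2022-08-27 07:05:48", "192.0.0.1", "001"),
    ("2022-08-27 07:05:58", "192.0.0.12", "002"),
]

FORMAT = "%Y-%m-%d %H:%M:%S"


def truncate_to_hour(text: str) -> datetime:
    """Parse the string and zero the minutes and seconds (hypothetical helper)."""
    parsed = datetime.strptime(text, FORMAT)
    return parsed.replace(minute=0, second=0, microsecond=0)


# Grouping by the truncated timestamp keeps the hour in the key.
per_hour = Counter(truncate_to_hour(date_time) for date_time, _ip, _id in rows)

# Converting to a plain date first drops the hour, so only the day is left.
per_day = Counter(
    datetime.strptime(date_time, FORMAT).date() for date_time, _ip, _id in rows
)

print(per_hour)
print(per_day)
```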

Why group by date is returning multiple rows for the same date?

I have a query like the following.
select some_date_col, count(*) as cnt
from <the table>
group by some_date_col
I get something like that at the output.
13-12-2021, 6
13-12-2021, 8
13-12-2021, 9
....
How is that possible? Here some_date_col is of type Date.
A DATE is a binary data-type that is composed of 7 bytes (century, year-of-century, month, day, hour, minute and second) and will always have those components.
The user interface you use to access the database can choose to display some or all of those components of the binary representation of the DATE; however, regardless of whether or not they are displayed by the UI, all the components are always stored in the database and used in comparisons in queries.
When you GROUP BY a date data-type, you aggregate values that are identical down to an accuracy of one second (regardless of the accuracy displayed by the user interface).
So, if you have the data:
CREATE TABLE the_table (some_date_col) AS
SELECT DATE '2021-12-13' FROM DUAL CONNECT BY LEVEL <= 6 UNION ALL
SELECT DATE '2021-12-13' + INTERVAL '1' SECOND FROM DUAL CONNECT BY LEVEL <= 8 UNION ALL
SELECT DATE '2021-12-13' + INTERVAL '1' MINUTE FROM DUAL CONNECT BY LEVEL <= 9;
Then the query:
SELECT TO_CHAR(some_date_col, 'YYYY-MM-DD HH24:MI:SS') AS some_date_col,
count(*) as cnt
FROM the_table
GROUP BY some_date_col;
Will output:
SOME_DATE_COL        CNT
2021-12-13 00:01:00    9
2021-12-13 00:00:01    8
2021-12-13 00:00:00    6
The values are grouped according to equal values (down to the maximum precision stored in the date).
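The same effect can be reproduced outside the database. The following Python sketch, which is only an analogy to the Oracle behaviour and not Oracle code, builds 6, 8 and 9 values that share a calendar day but differ by a second or a minute, then groups them once by the full value and once by the day alone.

```python
from collections import Counter
from datetime import datetime, timedelta

midnight = datetime(2021, 12, 13)

# Same shape as the sample table: 6 rows at midnight, 8 one second later,
# 9 one minute later.
values = (
    [midnight] * 6
    + [midnight + timedelta(seconds=1)] * 8
    + [midnight + timedelta(minutes=1)] * 9
)

# Grouping on the full value keeps the three distinct times apart.
by_exact_value = Counter(values)

# Grouping on the calendar day alone merges them into a single group.
by_day = Counter(v.date() for v in values)

print(sorted(by_exact_value.values()))
print(by_day)
```

A display format that hides the time would make all three groups of the first result look like the same date, which is what the question observed.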
If you want to GROUP BY dates with the same date component but any time component then use the TRUNCate function (which returns a value with the same date component but the time component set to midnight):
SELECT TRUNC(some_date_col) AS some_date_col,
count(*) as cnt
FROM the_table
GROUP BY TRUNC(some_date_col)
Which, for the same data, outputs:
SOME_DATE_COL  CNT
13-DEC-21       23
And:
SELECT TO_CHAR(TRUNC(some_date_col), 'YYYY-MM-DD HH24:MI:SS') AS some_date_col,
count(*) as cnt
FROM the_table
GROUP BY TRUNC(some_date_col)
Outputs:
SOME_DATE_COL        CNT
2021-12-13 00:00:00   23
Oracle date type holds a date and time component. If the time components do not match, grouping by that value will place the same date (with different times) in different groups:
CREATE TABLE test ( xdate date );
INSERT INTO test VALUES (current_date);
INSERT INTO test VALUES (current_date + INTERVAL '1' MINUTE);
With the default display format:
SELECT xdate, COUNT(*) FROM test GROUP BY xdate;
Result:
XDATE      COUNT(*)
13-DEC-21         1
13-DEC-21         1
Now alter the format and rerun:
ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MON-DD HH24:MI:SS';
SELECT xdate, COUNT(*) FROM test GROUP BY xdate;
The result
XDATE                 COUNT(*)
2021-DEC-13 23:29:36         1
2021-DEC-13 23:30:36         1
Also try this:
SELECT to_char(xdate, 'YYYY-MON-DD HH24:MI:SS') AS formatted FROM test;
Result:
FORMATTED
2021-DEC-13 23:29:36
2021-DEC-13 23:30:36
and this:
SELECT to_char(xdate, 'YYYY-MON-DD HH24:MI:SS') AS formatted, COUNT(*) FROM test GROUP BY xdate;
Result:
FORMATTED             COUNT(*)
2021-DEC-13 23:29:36         1
2021-DEC-13 23:30:36         1

ORACLE SQL: Hourly Date to be group by day time and sum of the amount

I have the following situation:
ID DATE_TIME AMOUNT
23 14-MAY-2021 10:47:01 5
23 14-MAY-2021 11:49:52 3
23 14-MAY-2021 12:03:18 4
How can I get the sum of the amount per day instead of per hour?
Example:
ID DATE_TIME TOTAL
23 20210514 12
I tried it this way but got an error:
SELECT DISTINCT ID, TO_CHAR(DATE_TIME, 'YYYYMMDD'), SUM(AMOUNT) AS TOTAL FROM MY_TABLE
WHERE ID ='23' AND DATE_TIME > SYSDATE-1
GROUP BY TOTAL, DATE_TIME
You don't need DISTINCT if you use GROUP BY: anything that is grouped is already distinct, unless it is later joined to something else that causes it to repeat.
You were almost there, too:
SELECT ID, TO_CHAR(DATE_TIME, 'YYYYMMDD') AS DATE_TIME, SUM(AMOUNT) AS TOTAL
FROM MY_TABLE
WHERE ID ='23' AND DATE_TIME > SYSDATE-1
GROUP BY ID, TO_CHAR(DATE_TIME, 'YYYYMMDD')
You need to group by the output of the function, not the input. Not every database can GROUP BY aliases used in the SELECT (logically, the SELECT has not been evaluated by the time the grouping is done, so the aliases do not exist yet), and you would not group by the total, because that is an aggregate (the result of summing every value in the group).
If you need to do further work with that date, don't convert it to a string. Cut the time off using TRUNC:
SELECT ID, TRUNC(DATE_TIME) as DATE_TIME, SUM(AMOUNT) AS TOTAL
FROM MY_TABLE
WHERE ID ='23' AND DATE_TIME > SYSDATE-1
GROUP BY ID, TRUNC(DATE_TIME)
TRUNC can also cut a date down to other units; for example, TRUNC(DATE_TIME, 'HH24') removes the minutes and seconds but keeps the hour.
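As a rough illustration of the two truncation levels, here is a Python sketch over the three sample rows from the question. It is not Oracle code: `datetime.replace` stands in for TRUNC, and the dictionary names are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (ID, DATE_TIME, AMOUNT).
rows = [
    (23, datetime(2021, 5, 14, 10, 47, 1), 5),
    (23, datetime(2021, 5, 14, 11, 49, 52), 3),
    (23, datetime(2021, 5, 14, 12, 3, 18), 4),
]

totals_by_day = defaultdict(int)
totals_by_hour = defaultdict(int)

for row_id, date_time, amount in rows:
    # Stand-in for TRUNC(DATE_TIME): time component set to midnight.
    day = date_time.replace(hour=0, minute=0, second=0, microsecond=0)
    # Stand-in for TRUNC(DATE_TIME, 'HH24'): minutes and seconds removed.
    hour = date_time.replace(minute=0, second=0, microsecond=0)
    totals_by_day[(row_id, day)] += amount
    totals_by_hour[(row_id, hour)] += amount

for (row_id, day), total in totals_by_day.items():
    # strftime plays the role of TO_CHAR(..., 'YYYYMMDD') for display only.
    print(row_id, day.strftime("%Y%m%d"), total)
```

Grouping on the truncated value and formatting only for display keeps the key a real date, so it can still be compared or sorted afterwards.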
Convert the DATE column to a string with the required accuracy and then group on that:
SELECT ID,
TO_CHAR("DATE", 'YYYY-MM-DD'),
SUM(AMOUNT) AS TOTAL FROM MY_TABLE
WHERE ID ='23'
AND "DATE" > SYSDATE-1
GROUP BY ID, TO_CHAR("DATE", 'YYYY-MM-DD')
or truncate the value so that the time component is set to midnight for each date:
SELECT ID,
TRUNC("DATE"),
SUM(AMOUNT) AS TOTAL FROM MY_TABLE
WHERE ID ='23'
AND "DATE" > SYSDATE-1
GROUP BY ID, TRUNC("DATE")
(Note: DATE is a keyword and cannot be used as an identifier unless you use a quoted identifier, in which case you need the quotes, and the exact case, every time you refer to the column. It would be better to rename the column to something that is not a keyword.)

date_format() equivalent in BigQuery

I have a date column in my table that contains a string in the format 2017-08-05 09-AM, and I'm trying to format it so that there is a column Date with a date type and column Time with a time type.
SELECT ID, DATE_FORMAT(a.date, "%Y-%d-%m") as date, DATE_FORMAT(a.date, "%T") as time, Symbol
FROM `crypto_market_data.BTC_1H` a
ORDER BY ID
The query runs how I want it to in MySQL, but date_format() is not supported in BigQuery. I'm wondering if there is a similar way to cast my string date to a separate Date and Time object.
Below is for BigQuery Standard SQL
#standardSQL
SELECT id, Symbol,
DATE(PARSE_DATETIME('%Y-%m-%d %H-%p', a.date)) AS `date`,
TIME(PARSE_DATETIME('%Y-%m-%d %H-%p', a.date)) AS time
FROM `project.dataset.table` a
You can test and experiment with the query above using dummy data, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, '2017-08-05 09-AM' `date`, 'x' Symbol UNION ALL
SELECT 2, '2019-02-05 12-AM', 'y' UNION ALL
SELECT 3, '2019-01-31 11-PM', 'z'
)
SELECT id, Symbol,
DATE(PARSE_DATETIME('%Y-%m-%d %H-%p', a.date)) AS `date`,
TIME(PARSE_DATETIME('%Y-%m-%d %H-%p', a.date)) AS time
FROM `project.dataset.table` a
-- ORDER BY id
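with result (because %H is the 24-hour element, the AM/PM marker does not change the hour, so 12-AM stays 12:00:00 and 11-PM stays 11:00:00; use %I in place of %H if the marker should be applied)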
Row id Symbol date time
1 1 x 2017-08-05 09:00:00
2 2 y 2019-02-05 12:00:00
3 3 z 2019-01-31 11:00:00
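The interaction between the hour element and the AM/PM marker can be checked with Python's `strptime`, which uses the same `%I` (12-hour) and `%p` elements. This is a sketch in Python, not BigQuery, and the helper name `split_date_time` is an assumption. Unlike the query above, it uses `%I`, so the marker does shift the hour.

```python
from datetime import datetime

# The three dummy values used in the example query.
samples = ["2017-08-05 09-AM", "2019-02-05 12-AM", "2019-01-31 11-PM"]


def split_date_time(text: str):
    """Parse 'YYYY-MM-DD HH-AM/PM' and return (date, time) (hypothetical helper).

    %I is the 12-hour clock element, so %p decides between morning and
    afternoon. Assumes a locale whose markers are spelled AM and PM.
    """
    parsed = datetime.strptime(text, "%Y-%m-%d %I-%p")
    return parsed.date(), parsed.time()


for text in samples:
    d, t = split_date_time(text)
    print(text, "->", d, t)
```

With this format string, 12-AM is parsed as midnight and 11-PM as 23:00, which is usually what an hourly series expects.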
There is a PARSE_DATETIME function available for parsing custom datetime formats in BigQuery.
In your case, this should help:
select ID, extract(date from dt) as date, extract(time from dt) as time, Symbol from (
select
a.ID as ID,
parse_datetime('%Y-%d-%m %H-%p', a.date) as dt,
a.Symbol as Symbol
from
`crypto_market_data.BTC_1H` a
order by a.ID
)
SELECT ID, FORMAT_DATE("%Y-%d-%m",a.date) as date, FORMAT_DATE("%T",a.date) as time, Symbol FROM crypto_market_data.BTC_1H a ORDER BY ID
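BigQuery provides FORMAT_DATE for this; see the BigQuery date functions documentation for details. Note that FORMAT_DATE expects a DATE argument, so a string column has to be parsed first.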

CAST a date in Presto to next count

I would like to query Athena with JSON files. I paired creation_date with id because I want a heatmap with the month on the Y axis and the day on the X axis, counting the ids in each cell. I created a table with 2 columns:
creation_date date, id int. I then query it with the code below:
SELECT CAST(creation_date as DATE) as ad_creation,
COUNT(id) as Total_ads
FROM default.test
GROUP BY CAST(creation_at_first as DATE)
Unfortunately, I am getting this error:
DatabaseError: Execution failed on sql: SELECT CAST(creation_date as DATE) as ad_creation, COUNT(id) as Total_ads FROM default.testing_fresh_1 GROUP BY CAST(creation_date as DATE)
When I query Select * from...
I get results formatted like this:
creation_date
2018-07-01 02:02:09
2018-06-05 01:39:30
2018-05-16 21:28:48
2017-04-23 17:03:53
Any idea what I am doing wrong?
Judging from your select * result set, I guess there is no id column in your table.
You can try COUNT(*) instead of COUNT(id):
SELECT CAST(creation_date as DATE) as ad_creation,
COUNT(*) as Total_ads
FROM default.test
GROUP BY CAST(creation_date as DATE)
Try the code below:
SELECT CAST(creation_date as DATE) as ad_creation,
COUNT(id) as Total_ads
FROM default.testing_fresh_1
GROUP BY ad_creation
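For the heatmap itself, the per-day counts need to be arranged by month and day. The following Python sketch shows that step on the four `creation_date` values from the question; it assumes the values arrive as strings in the displayed format, and the names `per_day` and `heatmap` are illustrative.

```python
from collections import Counter
from datetime import datetime

# creation_date values copied from the question's SELECT * output.
creation_dates = [
    "2018-07-01 02:02:09",
    "2018-06-05 01:39:30",
    "2018-05-16 21:28:48",
    "2017-04-23 17:03:53",
]

# Count rows per calendar day, the same grouping as CAST(creation_date AS DATE).
per_day = Counter(
    datetime.strptime(text, "%Y-%m-%d %H:%M:%S").date() for text in creation_dates
)

# Rearrange into heatmap cells keyed by (month, day); different years that
# share a month and day fall into the same cell.
heatmap = Counter()
for day, total in per_day.items():
    heatmap[(day.month, day.day)] += total

for (month, day_of_month), total in sorted(heatmap.items()):
    print(month, day_of_month, total)
```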