I would like to query JSON files with Athena. I paired creation_date with id because I want to build a heatmap with the month on the Y axis and the day on the X axis, counting the ids in each cell. I created a table with 2 columns:
creation_date date, id int. I then run the query below:
SELECT CAST(creation_date as DATE) as ad_creation,
COUNT(id) as Total_ads
FROM default.test
GROUP BY CAST(creation_date as DATE)
Unfortunately, I am getting this error:
DatabaseError: Execution failed on sql: SELECT CAST(creation_date as DATE) as ad_creation, COUNT(id) as Total_ads FROM default.testing_fresh_1 GROUP BY CAST(creation_date as DATE)
When I query Select * from...
I get results formatted like this:
creation_date
2018-07-01 02:02:09
2018-06-05 01:39:30
2018-05-16 21:28:48
2017-04-23 17:03:53
Any idea what I am doing wrong?
Judging by your select * result set, I guess there is no id column in your table.
You can try COUNT(*) instead of COUNT(id):
SELECT CAST(creation_date as DATE) as ad_creation,
COUNT(*) as Total_ads
FROM default.test
GROUP BY CAST(creation_date as DATE)
Try the code below.
SELECT CAST(creation_date as DATE) as ad_creation,
COUNT(id) as Total_ads
FROM default.testing_fresh_1
-- Athena (Presto) does not resolve SELECT aliases in GROUP BY, so group by ordinal position
GROUP BY 1
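For the heatmap itself, each row has to land in a (month, day) cell. This is a minimal Python sketch of that bucketing, using the four sample timestamps from the question. The names `rows` and `heatmap_counts` are hypothetical, and it assumes the values arrive as strings in `YYYY-MM-DD HH:MM:SS` form:

```python
from collections import Counter
from datetime import datetime

# Sample creation_date values copied from the question's SELECT * output.
rows = [
    "2018-07-01 02:02:09",
    "2018-06-05 01:39:30",
    "2018-05-16 21:28:48",
    "2017-04-23 17:03:53",
]


def heatmap_counts(timestamps):
    """Count rows per (year-month, day-of-month) cell. Hypothetical helper."""
    counts = Counter()
    for ts in timestamps:
        dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        counts[(dt.strftime("%Y-%m"), dt.day)] += 1
    return counts


counts = heatmap_counts(rows)
for (month, day), n in sorted(counts.items()):
    print(month, day, n)
```

The same split could be done in the query by grouping on the month and the day separately instead of on the full date.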
I want to take just the year and month from '2021-07-31 13:53:26' and count the rows in each year-month group.
I tried the date, datetime, and strftime functions.
date and datetime return null. strftime returns something, but I can't group by the year and month it gives me and get the count I want; the result is null again.
Here is the preview of the data.
The expected result is something like '2021-07' with the count of how many times that year and month occurs.
This is the syntax I tried with strftime:
select strftime('%Y%m', started_at) year_month, count(year_month) from bike_trip
group by year_month
Thank You
SQLite doesn't have a date data type, so you will need to use string operations to achieve this.
with d as (
select '2021-07-31 13:53:26' as d, 'A' val union all
select '2021-08-30 13:53:26' as d, 'B' val
)
select substr(d,1,4) as yyyy, substr(d,6,2) as mm, count(*)
from d
group by substr(d,1,4), substr(d,6,2)
In your query:
select substr(started_at,1,4) as yyyy, substr(started_at,6,2) as mm, count(*)
from bike_trip
group by substr(started_at,1,4), substr(started_at,6,2)
Use a CTE to get your answer.
with
-- uncomment to test
/*bike_trip(started_at) as (
values
('2021-07-31 13:53:26'),
('2021-07-17 19:06:01'),
('2021-08-30 13:53:26')
),*/
bike_months(year_month) as (
select strftime('%Y-%m', started_at) year_month from bike_trip
)
select year_month, count(year_month) count_year_month from bike_months
group by year_month;
Output:
year_month|count_year_month
2021-07|2
2021-08|1
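The grouping can be checked with Python's built-in sqlite3 module. This is a sketch with made-up rows that match the test data above. It also shows one possible cause of the nulls in the question, which is only a guess since the real data is not shown: strftime returns NULL when the value is not in a time-string format SQLite recognizes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bike_trip (started_at TEXT)")
conn.executemany(
    "INSERT INTO bike_trip VALUES (?)",
    [
        ("2021-07-31 13:53:26",),
        ("2021-07-17 19:06:01",),
        ("2021-08-30 13:53:26",),
    ],
)

# Count trips per year-month; SQLite accepts the alias in GROUP BY.
result = conn.execute(
    """
    SELECT strftime('%Y-%m', started_at) AS year_month, COUNT(*) AS trips
    FROM bike_trip
    GROUP BY year_month
    ORDER BY year_month
    """
).fetchall()
print(result)

# A string that is not ISO-like (hypothetical example) gives NULL, i.e. None.
non_iso = conn.execute(
    "SELECT strftime('%Y-%m', '7/31/2021 13:53')"
).fetchone()[0]
print(non_iso)
```

If started_at is stored in a format like the second example, the substr approach or a reformatting step would be needed before strftime can work.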
This is a sample data file
The data contains unique IDs with different latitudes and longitudes at multiple timestamps. I would like to select the rows with the latest 30 days of coordinates for each unique ID. Please help me with how to write the query. This data is in a Hive table.
Regards,
Akshay
According to your example above (where there are no current-year dates for id=2,3), you can number the dates for each id in descending order using the window function ROW_NUMBER(), then keep the latest 30 values:
--get all rows for each id where num<=30 (the latest 30 rows for each id)
select * from
(
--number the dates for each id in descending order
select *, row_number()over(partition by ID order by DATE desc)num from Table
)X
where num<=30
If you only need unique dates (ignoring the time) for each id, you can try this query:
select * from
(
--number the dates for each id
select *, row_number()over(partition by ID order by new_date desc)num
from
(
-- remove duplicates using distinct
select distinct ID,cast(DATE as date)new_date from Table
)X
)Y
where num<=30
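Note that a row number cutoff keeps the 30 most recent rows (or distinct dates) per ID, which covers exactly 30 days only when every day has data. If the goal is instead a 30-day window measured back from each ID's own latest timestamp, which is an assumption about what the question wants, the logic can be sketched in Python as follows. The sample rows and the helper name `latest_30_days_per_id` are hypothetical:

```python
from datetime import datetime, timedelta

# Hypothetical sample rows: (id, timestamp, latitude, longitude).
rows = [
    (1, datetime(2018, 3, 30, 8, 0), 12.90, 77.60),
    (1, datetime(2018, 3, 10, 9, 0), 12.91, 77.61),
    (1, datetime(2018, 1, 15, 9, 0), 12.92, 77.62),
    (2, datetime(2017, 11, 20, 7, 0), 28.60, 77.20),
    (2, datetime(2017, 10, 1, 7, 0), 28.61, 77.21),
]


def latest_30_days_per_id(rows, days=30):
    """Keep rows within `days` of the most recent timestamp of the same id."""
    latest = {}
    for rid, ts, _lat, _lon in rows:
        if rid not in latest or ts > latest[rid]:
            latest[rid] = ts
    window = timedelta(days=days)
    return [r for r in rows if r[1] > latest[r[0]] - window]


selected = latest_30_days_per_id(rows)
for row in selected:
    print(row)
```

In SQL the same idea would compare each row's date with MAX(date) computed per ID, rather than with a row number.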
In Oracle this will be:
SELECT * FROM TEST_DATE1
WHERE DATEUPDT > SYSDATE - 30;
In SQL Server:
select * from MyTable
where
[Date]>=dateadd(d, -30, getdate());
To group by ID and perform aggregation, use something like this:
select ID,
count(*) row_count,
max(Latitude) max_lat,
max(Longitude) max_long
from MyTable
where
[Date]>=dateadd(d, -30, getdate())
group by ID;
I have the following query that I am trying to run on Athena.
SELECT observation_date, COUNT(*) AS count
FROM db.table_name
WHERE observation_date > '2017-12-31'
GROUP BY observation_date
However it is producing this error:
SYNTAX_ERROR: line 3:24: '>' cannot be applied to date, varchar(10)
This seems odd to me. Is there an error in my query or is Athena not able to handle greater than operators on date columns?
Thanks!
The literal '2017-12-31' is a varchar, so you need to cast it to a date before making this comparison. Try the following:
SELECT observation_date, COUNT(*) AS count
FROM db.table_name
WHERE observation_date > CAST('2017-12-31' AS DATE)
GROUP BY observation_date
Check it out on SQL Fiddle.
Update 17/07/2019, to reflect the comments:
SELECT observation_date, COUNT(*) AS count
FROM db.table_name
WHERE observation_date > DATE('2017-12-31')
GROUP BY observation_date
You can also use the date function which is a convenient alias for CAST(x AS date):
SELECT *
FROM date_data
WHERE trading_date >= DATE('2018-07-06');
select * from my_schema.my_table_name where date_column = cast('2017-03-29' as DATE) limit 5
One small addition: if your date column is in ISO 8601 format, for example 2022-08-02T01:46:46.963120Z, you can use the parse_datetime function.
In my case, the query looks like this:
SELECT * FROM internal_alb_logs
WHERE elb_status_code >= 500 AND parse_datetime(time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z') > parse_datetime('2022-08-01-23:00:00','yyyy-MM-dd-HH:mm:ss')
ORDER BY time DESC
See more examples here: https://docs.aws.amazon.com/athena/latest/ug/application-load-balancer-logs.html#query-alb-logs-examples
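The parsing step can be tried outside Athena. This is a rough Python analogue of the comparison in the query above, not Athena code: `%f` plays the role of `SSSSSS`, and the `T` and `Z` are matched as literal characters, which is what the doubled single quotes around them do in the Athena pattern.

```python
from datetime import datetime

# Same two values as in the query: the logged timestamp and the cutoff.
ts = datetime.strptime("2022-08-02T01:46:46.963120Z", "%Y-%m-%dT%H:%M:%S.%fZ")
cutoff = datetime.strptime("2022-08-01-23:00:00", "%Y-%m-%d-%H:%M:%S")

# Both sides are now real datetimes, so the comparison is chronological.
is_after = ts > cutoff
print(ts, cutoff, is_after)
```

The point is the same as in the query: both sides are parsed into timestamps first, so the comparison is not a string comparison.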
I am trying to find the record count for each day using this query:
select cast(Timestamp_field as date), count(*) as cnt
from table1
group by 1
having cast(Timestamp_field as date) between date and date -10;
Timestamp_field is a timestamp, and I am casting it to a date. Although the max value of Timestamp_field is 2016-09-20 12:31:38.000000, the query doesn't return any records. Any idea why?
My guess is that the problem is the between. Perhaps this will work for you:
select cast(Timestamp_field as date), count(*)
from table1
group by 1
having cast(Timestamp_field as date) between date - 10 and date;
The smaller value should go first for the between comparands.
Note: You should do the filtering before the group by, not after:
select cast(Timestamp_field as date), count(*)
from table1
where cast(Timestamp_field as date) between date - 10 and date
group by 1;
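The effect of the bound order is easy to reproduce. This sketch uses SQLite through Python's sqlite3 module rather than the asker's database, with made-up rows and a fixed reference date `ref` standing in for the current date:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Timestamp_field TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?)",
    [
        ("2016-09-20 12:31:38",),
        ("2016-09-20 08:00:00",),
        ("2016-09-15 10:00:00",),
        ("2016-08-01 10:00:00",),
    ],
)

ref = "2016-09-20"  # stands in for the current date
low = conn.execute("SELECT date(?, '-10 days')", (ref,)).fetchone()[0]

# Filter in WHERE (before grouping), then count per day.
query = """
    SELECT date(Timestamp_field) AS day, COUNT(*) AS cnt
    FROM table1
    WHERE date(Timestamp_field) BETWEEN ? AND ?
    GROUP BY day
    ORDER BY day
"""
correct = conn.execute(query, (low, ref)).fetchall()
swapped = conn.execute(query, (ref, low)).fetchall()
print(correct)
print(swapped)
```

With the smaller bound first the query returns the daily counts; with the bounds swapped it returns no rows, because `x BETWEEN a AND b` means `x >= a AND x <= b`.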
I have a problem unique to a business process. My user needs to know how many dates, counted, are before a specific end time that do not match on the hour or the day.
Here is an example.
AAA, 2016-03-15 16:00:28.967, 2016-03-15 16:02:58.487, 2016-03-17 14:01:24.243
In the example above id AAA has 3 entries. I need to count only the ones that don't have a matching hour and day. So the actual count should come out to be 2.
I have to do this all in SQL and can't use a CTE. It needs to be either a sub select or some type of join.
Something like this.
SELECT id, date, (
SELECT COUNT(*)
FROM x
WHERE day!=day
AND hour!=hour AND date < z
) AS DateCount
Results would be AAA, 2
I am thinking some type of recursive comparison but I am not sure how to accomplish this without a CTE.
In SQL Server you can try something like this:
SELECT id, CONVERT(VARCHAR(13), [date], 120) AS [Date], COUNT(*) AS DateCount
FROM YourTable
WHERE [date] < @ENDDATE
GROUP BY id, CONVERT(VARCHAR(13), [date], 120)
SELECT a AS current_a, COUNT(*) AS b,day AS day, hour as hour,
(SELECT COUNT(*)
FROM t
WHERE day != day
AND hour != hour
AND date < z ) as datecount
FROM t GROUP BY a ORDER by b DESC
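One way to get the expected result (AAA, 2) without a CTE is to count the distinct day-and-hour prefixes per id. This sketch runs in SQLite through Python's sqlite3 module, not SQL Server. The table `t`, its columns, and the value of `end_date` are assumptions for illustration. In SQL Server the same idea would be COUNT(DISTINCT CONVERT(VARCHAR(13), [date], 120)).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("AAA", "2016-03-15 16:00:28.967"),
        ("AAA", "2016-03-15 16:02:58.487"),
        ("AAA", "2016-03-17 14:01:24.243"),
    ],
)

end_date = "2016-03-18 00:00:00"  # hypothetical end time

# The first 13 characters are 'YYYY-MM-DD HH', so entries sharing
# a day and hour collapse into a single distinct value.
result = conn.execute(
    """
    SELECT id, COUNT(DISTINCT substr(date, 1, 13)) AS DateCount
    FROM t
    WHERE date < ?
    GROUP BY id
    """,
    (end_date,),
).fetchall()
print(result)
```

The two entries at 16:xx on 2016-03-15 count once and the entry on 2016-03-17 counts once, giving 2 for AAA.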