Group by a generated column - sql

I'm trying to group data by minutes, so I tried this query:
SELECT FROM_UNIXTIME(
unix_timestamp (time, 'yyyy-mm-dd hh:mm:ss'), 'yyyy-mm-dd hh:mm') as ts,
count (*) as cnt
from toucher group by ts limit 10;
Then Hive tells me there is no such column:
FAILED: SemanticException [Error 10004]: Line 1:134 Invalid table
alias or column reference 'ts': (possible column names are: time, ip,
username, code)
Is this not supported by Hive?

SELECT FROM_UNIXTIME(unix_timestamp (time, 'yyyy-mm-dd hh:mm:ss'), 'yyyy-mm-dd hh:mm') as ts,
count (*) as cnt
from toucher
group by FROM_UNIXTIME(unix_timestamp (time, 'yyyy-mm-dd hh:mm:ss'), 'yyyy-mm-dd hh:mm') limit 10;
Or, better, compute the alias in a subquery and group by it in the outer query:
select t.ts, count(*) from
(SELECT FROM_UNIXTIME(unix_timestamp (time, 'yyyy-mm-dd hh:mm:ss'), 'yyyy-mm-dd hh:mm') as ts
from toucher ) t
group by t.ts limit 10;

As is the case with most relational database systems, the SELECT clause is processed after the GROUP BY clause. This means you cannot use columns aliased in the SELECT (such as ts in this example) in your GROUP BY.
There are essentially two ways around this. Both are correct, but some people have a preference for one over the other for various reasons.
First, you could group by the original expression, rather than the alias. This results in duplicate code, as you will have the exact same expression in both your SELECT and GROUP BY clause.
SELECT
FROM_UNIXTIME(unix_timestamp(time,'yyyy-mm-dd hh:mm:ss'),'yyyy-mm-dd hh:mm') as ts,
COUNT(*) as cnt
FROM toucher
GROUP BY FROM_UNIXTIME(unix_timestamp(time,'yyyy-mm-dd hh:mm:ss'),'yyyy-mm-dd hh:mm')
LIMIT 10;
A second approach is to wrap your expression and alias in a subquery. This means you do not have to duplicate your expression, but you will have two nested queries and this may have performance implications.
SELECT
ts,
COUNT(*) as cnt
FROM
(SELECT
FROM_UNIXTIME(unix_timestamp(time,'yyyy-mm-dd hh:mm:ss'),'yyyy-mm-dd hh:mm') as ts
FROM toucher) x
GROUP BY x.ts
LIMIT 10;
Both queries should return the same result. Which one to use depends on your particular use case, or simply on personal preference.
Hope that helps.
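The subquery approach can also be written as a common table expression, assuming a Hive version that supports WITH (0.13 or later). The sketch below assumes time is a string such as '2015-03-01 13:45:10'; the CTE name per_minute is made up for illustration. It also uses different pattern letters from the queries above, because the letters are case-sensitive: MM is the month and HH the 24-hour hour, while mm is minutes and hh the 12-hour hour. With 'yyyy-mm-dd hh:mm', minutes end up where the month belongs, so rows from different months can fall into the same group.

```sql
-- Sketch only: per_minute is a hypothetical CTE name, and `time` is assumed
-- to hold strings like '2015-03-01 13:45:10'.
WITH per_minute AS (
  SELECT
    from_unixtime(
      unix_timestamp(`time`, 'yyyy-MM-dd HH:mm:ss'),  -- parse the full timestamp
      'yyyy-MM-dd HH:mm'                               -- format it back without seconds
    ) AS ts
  FROM toucher
)
SELECT ts,
       COUNT(*) AS cnt
FROM per_minute
GROUP BY ts      -- ts is a real column of per_minute, so GROUP BY can see it
LIMIT 10;
```

The backticks around time are there only because TIME is a reserved word in some Hive versions.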

Related

Oracle SQL group by to_char - not a group by expression

I want to group by dd-mm-yyyy format to show working_hours per employee (person) per day, but I get the error message ORA-00979: not a GROUP BY expression. When I remove TO_CHAR from the GROUP BY it works fine, but that's not what I want, because I want to group by day regardless of the hour. What am I doing wrong here?
SELECT papf.person_number emp_id,
to_char(sh21.start_time,'dd/mm/yyyy') start_time,
to_char(sh21.stop_time,'dd/mm/yyyy') stop_time,
SUM(sh21.measure) working_hours
FROM per_all_people_f papf,
hwm_tm_rec sh21
WHERE ...
GROUP BY
papf.person_number,
to_char(sh21.start_time,'dd/mm/yyyy'),
to_char(sh21.stop_time,'dd/mm/yyyy')
ORDER BY sh21.start_time
ORDER BY sh21.start_time
needs to either be just the column alias defined in the SELECT clause:
ORDER BY start_time
or use the expression in the GROUP BY clause:
ORDER BY to_char(sh21.start_time,'dd/mm/yyyy')
If you use sh21.start_time then the table_alias.column_name syntax refers to the underlying column from the table and you are not selecting/grouping by that.
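One thing to watch with either form: the sort key is the 'dd/mm/yyyy' string, so '01/02/2024' sorts before '15/01/2024'. If chronological order is wanted, one option is to group and sort on the truncated date and format it only for display. A minimal sketch, assuming a simplified, hypothetical table shifts(emp_id, start_time, measure) in place of the joined tables from the question:

```sql
-- Hypothetical table: shifts(emp_id NUMBER, start_time DATE, measure NUMBER)
SELECT emp_id,
       TO_CHAR(TRUNC(start_time), 'dd/mm/yyyy') AS start_day,  -- formatted for display only
       SUM(measure)                             AS working_hours
FROM   shifts
GROUP  BY emp_id,
          TRUNC(start_time)      -- one group per employee per calendar day
ORDER  BY TRUNC(start_time),     -- sorts as a date, not as a string
          emp_id;
```

TRUNC(start_time) with no format argument drops the time of day, so every row from the same day lands in the same group.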

Average Redo Usage Query

I'm trying to query the average redo in Gb but failing with the below error.
The query to get redo usage by day & hour (without the AVG()) works.
SELECT
Start_Date,
Start_Time,
Num_Logs,
AVG(Round(Num_Logs * (Vl.Bytes / (1024 * 1024 * 1024)),2)) AS AVG_Gbytes,
Vdb.NAME AS Dbname
FROM
(SELECT
To_Char(Vlh.First_Time, 'YYYY-MM-DD') AS Start_Date,
To_Char(Vlh.First_Time, 'HH24') || ':00' AS Start_Time,
COUNT(Vlh.Thread#) Num_Logs
FROM
V$log_History Vlh
GROUP BY
To_Char(Vlh.First_Time, 'YYYY-MM-DD'),
To_Char(Vlh.First_Time, 'HH24') || ':00'
) Log_Hist, V$log Vl , V$database Vdb
WHERE
Vl.Group# = 1
ORDER BY
Log_Hist.Start_Date, Log_Hist.Start_Time;
The error:
ERROR at line 2:
ORA-00937: not a single-group group function
In an aggregation query, the SELECT columns need to be consistent with the GROUP BY. An aggregation query either has an explicit GROUP BY or uses aggregation functions (such as AVG()).
In your case, you have an aggregation function, and no GROUP BY. To fix the problem, include all unaggregated expressions in the GROUP BY. So add:
GROUP BY Start_Date, Start_Time, Num_Logs, Vdb.NAME
It is not clear if this actually does what you want. But you haven't explained. If this worked (i.e. no error), but doesn't do what you want, ask a new question with sample data, desired results, and a clear explanation.
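For reference, this is roughly what the original query looks like with that GROUP BY in place. It is a sketch that only removes the ORA-00937 error; column references are qualified with Log_Hist for readability, and nothing else about the logic is changed:

```sql
SELECT
    Log_Hist.Start_Date,
    Log_Hist.Start_Time,
    Log_Hist.Num_Logs,
    AVG(ROUND(Log_Hist.Num_Logs * (Vl.Bytes / (1024 * 1024 * 1024)), 2)) AS Avg_Gbytes,
    Vdb.Name AS Dbname
FROM
    (SELECT
         TO_CHAR(Vlh.First_Time, 'YYYY-MM-DD')     AS Start_Date,
         TO_CHAR(Vlh.First_Time, 'HH24') || ':00'  AS Start_Time,
         COUNT(Vlh.Thread#)                        AS Num_Logs
     FROM V$log_History Vlh
     GROUP BY
         TO_CHAR(Vlh.First_Time, 'YYYY-MM-DD'),
         TO_CHAR(Vlh.First_Time, 'HH24') || ':00'
    ) Log_Hist, V$log Vl, V$database Vdb
WHERE
    Vl.Group# = 1
-- Every non-aggregated expression in the SELECT list is listed here.
GROUP BY
    Log_Hist.Start_Date, Log_Hist.Start_Time, Log_Hist.Num_Logs, Vdb.Name
ORDER BY
    Log_Hist.Start_Date, Log_Hist.Start_Time;
```

Because the inner query already produces one row per day and hour, each outer group is built from that single hourly row, so the AVG here does not average across hours or days.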

Aggregate the variable from timestamp on BigQuery

I want to find the most frequent part_of_day for each user. First I encode each timestamp as a part_of_day, then I aggregate to get the most frequent one. I use ARRAY_AGG to calculate the mode. However, I'm not sure how to handle the timestamp with ARRAY_AGG; I get an error, so my code structure might be wrong.
SELECT User_ID, time,
ARRAY_AGG(Time ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day,
case
when time BETWEEN '04:00:00' AND '12:00:00'
then "morning"
when time < '04:00:00' OR time > '20:00:00'
then "night"
end AS part_of_day
FROM (
SELECT User_ID,
TIME_TRUNC(TIME(Request_Timestamp), SECOND) AS Time
COUNT(*) AS cnt
Error received:
Syntax error: Expected ")" but got identifier "COUNT" at [19:9]
Even though you did not share any sample data, I was able to identify some issues within your code.
I created some sample data based on the formats and functions in your code, to stay consistent with it. Below is the code, without any errors:
WITH data AS (
SELECT 98 as User_ID,DATETIME "2008-12-25 05:30:00.000000" AS Request_Timestamp, "something!" AS channel UNION ALL
SELECT 99 as User_ID,DATETIME "2008-12-25 22:30:00.000000" AS Request_Timestamp, "something!" AS channel
)
SELECT User_ID, time,
ARRAY_AGG(Time ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day1,
case
when time BETWEEN '04:00:00' AND '12:00:00'
then "morning"
when time < '04:00:00' OR time > '20:00:00'
then "night"
end AS part_of_day
FROM (
SELECT User_ID,
TIME_TRUNC(TIME(Request_Timestamp), SECOND) AS time,
COUNT(*) AS cnt
FROM data
GROUP BY User_ID, Channel, Request_Timestamp
#order by Request_Timestamp
)
GROUP BY User_ID, Time;
First, notice that I renamed the alias of your ARRAY_AGG() column to part_of_day1; two columns named part_of_day would cause the error "Duplicate column name". Second, a comma was missing after your TIME_TRUNC() expression, which is needed before COUNT(*). Third, the inner GROUP BY has to include Request_Timestamp, because it was neither aggregated nor grouped. Lastly, the outer GROUP BY has to include time for the same reason. After these corrections, your code will execute without any errors.
Note: the Syntax error: Expected ")" but got identifier "COUNT" at [19:9] error you experienced is due to the missing comma. The others would be shown after correcting this one.
If you want the most frequent part of each day, you need to use the day part in the aggregation:
SELECT User_ID,
ARRAY_AGG(part_of_day ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day
FROM (SELECT User_ID,
(case when time BETWEEN '04:00:00' AND '12:00:00' then 'morning'
when time < '04:00:00' OR time > '20:00:00' then 'night'
end) AS part_of_day,
COUNT(*) AS cnt
FROM cognitivebot2.chitchaxETL.conversations
GROUP BY User_ID, part_of_day
) u
GROUP BY User_ID;
Obviously, if you want the channel as well, then you need to include that in the queries.
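To try the idea without access to the real table, here is a self-contained sketch in BigQuery standard SQL. The sample rows, the CTE names labelled and counted, and the 'other' label for times that match neither range are all assumptions added for illustration:

```sql
WITH data AS (
  -- Made-up sample rows; replace with the real table.
  SELECT 98 AS User_ID, DATETIME "2008-12-25 05:30:00" AS Request_Timestamp UNION ALL
  SELECT 98, DATETIME "2008-12-26 06:10:00" UNION ALL
  SELECT 98, DATETIME "2008-12-26 22:30:00" UNION ALL
  SELECT 99, DATETIME "2008-12-25 22:30:00"
),
labelled AS (
  -- Step 1: map every request to a part of the day.
  SELECT
    User_ID,
    CASE
      WHEN TIME(Request_Timestamp) BETWEEN TIME "04:00:00" AND TIME "12:00:00" THEN 'morning'
      WHEN TIME(Request_Timestamp) < TIME "04:00:00"
        OR TIME(Request_Timestamp) > TIME "20:00:00" THEN 'night'
      ELSE 'other'  -- assumption: label for the uncovered 12:00-20:00 range
    END AS part_of_day
  FROM data
),
counted AS (
  -- Step 2: count requests per user and part of day.
  SELECT User_ID, part_of_day, COUNT(*) AS cnt
  FROM labelled
  GROUP BY User_ID, part_of_day
)
-- Step 3: keep the label with the highest count for each user (the mode).
SELECT
  User_ID,
  ARRAY_AGG(part_of_day ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] AS part_of_day
FROM counted
GROUP BY User_ID;
```

If two labels tie for a user, ORDER BY cnt DESC LIMIT 1 picks one of them arbitrarily; adding part_of_day as a second sort key would make the choice deterministic.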

Check if timestamp is contained in date

I'm trying to check if a datetime falls on the current date, but I'm not being able to do it.
This is my query:
select
date(timestamp) as event_date,
count(*)
from pixel_logs.full_logs f
where 1=1
where event_date = CUR_DATE()
How can I fix it?
Like Mikhail said, you need to use CURRENT_DATE(). Also, count(*) requires you to GROUP BY the date in your example. I do not know how your data is formatted, but one way to modify your query:
#standardSQL
WITH
table AS (
SELECT
1494977678 AS timestamp_secs) -- Current timestamp (in seconds)
SELECT
event_date,
COUNT(*) as count
FROM (
SELECT
DATE(TIMESTAMP_SECONDS(timestamp_secs)) AS event_date,
CURRENT_DATE()
FROM
table)
WHERE
event_date = CURRENT_DATE()
GROUP BY
event_date;
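The same idea can be applied directly to the table from the question. This sketch assumes that the column named timestamp has type TIMESTAMP, and the alias event_count is made up. The filter repeats the DATE() expression because a SELECT alias such as event_date is not visible in the WHERE clause:

```sql
#standardSQL
SELECT
  DATE(`timestamp`) AS event_date,
  COUNT(*) AS event_count
FROM
  `pixel_logs.full_logs`
WHERE
  DATE(`timestamp`) = CURRENT_DATE()  -- the alias event_date cannot be used here
GROUP BY
  event_date;
```

Both DATE() on a TIMESTAMP and CURRENT_DATE() use UTC unless a time zone argument is passed, so "today" here means the current UTC date.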

DISTINCT date value in ORA-01791: not a SELECTed expression

I would like to get the unique date values from the order table using an Oracle query. When I run the query below, I get this error:
ORA-01791: not a SELECTed expression
SELECT DISTINCT (TO_DATE(LAST_INSERT_TIMESTAMP, 'YYYY-MM-DD HH24:MI'))
FROM ORDER
WHERE LAST_INSERT_TIMESTAMP IS NOT NULL
ORDER BY LAST_INSERT_TIMESTAMP DESC;
LAST_INSERT_TIMESTAMP is not in your result list, because DISTINCT has collapsed your rows to the truncated timestamp. With DISTINCT, you can only order by expressions that appear in the select list:
SELECT DISTINCT TRUNC(LAST_INSERT_TIMESTAMP, 'MI')
FROM ORDER
WHERE LAST_INSERT_TIMESTAMP IS NOT NULL
ORDER BY TRUNC(LAST_INSERT_TIMESTAMP, 'MI') DESC;
If you don't want to repeat the expression use positional sort:
ORDER BY 1 DESC;
Or use an alias for the expression:
SELECT DISTINCT TRUNC(LAST_INSERT_TIMESTAMP, 'MI') AS LAST_INSERT
FROM ORDER
WHERE LAST_INSERT_TIMESTAMP IS NOT NULL
ORDER BY LAST_INSERT DESC;
Please note that I replaced your TO_DATE with the appropriate TRUNC because all you want to do is truncate your timestamp, not convert to and from string.