I'm trying to query the average redo in GB, but I'm failing with the error below.
The query to get redo usage by day & hour (without the AVG()) works.
SELECT
Start_Date,
Start_Time,
Num_Logs,
AVG(Round(Num_Logs * (Vl.Bytes / (1024 * 1024 * 1024)),2)) AS AVG_Gbytes,
Vdb.NAME AS Dbname
FROM
(SELECT
To_Char(Vlh.First_Time, 'YYYY-MM-DD') AS Start_Date,
To_Char(Vlh.First_Time, 'HH24') || ':00' AS Start_Time,
COUNT(Vlh.Thread#) Num_Logs
FROM
V$log_History Vlh
GROUP BY
To_Char(Vlh.First_Time, 'YYYY-MM-DD'),
To_Char(Vlh.First_Time, 'HH24') || ':00'
) Log_Hist, V$log Vl , V$database Vdb
WHERE
Vl.Group# = 1
ORDER BY
Log_Hist.Start_Date, Log_Hist.Start_Time;
The error:
ERROR at line 2:
ORA-00937: not a single-group group function
In an aggregation query, the SELECT columns need to be consistent with the GROUP BY: every unaggregated expression in the SELECT must appear in the GROUP BY. A query is an aggregation query if it either has an explicit GROUP BY or uses aggregation functions (such as AVG()).
In your case, you have an aggregation function, and no GROUP BY. To fix the problem, include all unaggregated expressions in the GROUP BY. So add:
GROUP BY Start_Date, Start_Time, Num_Logs, Vdb.NAME
It is not clear whether this actually does what you want, but you haven't explained that. If this works (i.e. no error) but doesn't do what you want, ask a new question with sample data, desired results, and a clear explanation.
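Applying that, a sketch of the full corrected query (keeping the original Log_Hist subquery, abbreviated here as ...):
SELECT
    Log_Hist.Start_Date,
    Log_Hist.Start_Time,
    Log_Hist.Num_Logs,
    AVG(Round(Log_Hist.Num_Logs * (Vl.Bytes / (1024 * 1024 * 1024)), 2)) AS Avg_Gbytes,
    Vdb.NAME AS Dbname
FROM
    ( ... ) Log_Hist, V$log Vl, V$database Vdb
WHERE
    Vl.Group# = 1
GROUP BY
    Log_Hist.Start_Date, Log_Hist.Start_Time, Log_Hist.Num_Logs, Vdb.NAME
ORDER BY
    Log_Hist.Start_Date, Log_Hist.Start_Time;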
Related
I want to group by dd-mm-yyyy format to show working_hours per employee (person) per day, but I get the error message ORA-00979: not a GROUP BY expression. When I remove TO_CHAR from the GROUP BY it works fine, but that's not what I want, since I want to group by day regardless of the hours. What am I doing wrong here?
SELECT papf.person_number emp_id,
to_char(sh21.start_time,'dd/mm/yyyy') start_time,
to_char(sh21.stop_time,'dd/mm/yyyy') stop_time,
SUM(sh21.measure) working_hours
FROM per_all_people_f papf,
hwm_tm_rec sh21
WHERE ...
GROUP BY
papf.person_number,
to_char(sh21.start_time,'dd/mm/yyyy'),
to_char(sh21.stop_time,'dd/mm/yyyy')
ORDER BY sh21.start_time
Your ORDER BY sh21.start_time needs to be either just the column alias defined in the SELECT clause:
ORDER BY start_time
or use the expression in the GROUP BY clause:
ORDER BY to_char(sh21.start_time,'dd/mm/yyyy')
If you use sh21.start_time, the table_alias.column_name syntax refers to the underlying column from the table, and you are not selecting or grouping by that column.
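Put together, a sketch of the corrected query (the WHERE clause elided as in the question):
SELECT papf.person_number emp_id,
       to_char(sh21.start_time,'dd/mm/yyyy') start_time,
       to_char(sh21.stop_time,'dd/mm/yyyy') stop_time,
       SUM(sh21.measure) working_hours
FROM per_all_people_f papf,
     hwm_tm_rec sh21
WHERE ...
GROUP BY papf.person_number,
         to_char(sh21.start_time,'dd/mm/yyyy'),
         to_char(sh21.stop_time,'dd/mm/yyyy')
ORDER BY to_char(sh21.start_time,'dd/mm/yyyy')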
I am planning to calculate the most frequent part_of_day for each user. First, I encoded the timestamp as part_of_day; then I aggregate to get the most frequent part_of_day. I use ARRAY_AGG to calculate the mode. However, I'm not sure how to deal with the timestamp in the ARRAY_AGG, because I get an error, so my code structure might be wrong.
SELECT User_ID, time,
ARRAY_AGG(Time ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day,
case
when time BETWEEN '04:00:00' AND '12:00:00'
then "morning"
when time < '04:00:00' OR time > '20:00:00'
then "night"
end AS part_of_day
FROM (
SELECT User_ID,
TIME_TRUNC(TIME(Request_Timestamp), SECOND) AS Time
COUNT(*) AS cnt
Error received:
Syntax error: Expected ")" but got identifier "COUNT" at [19:9]
Even though you did not share any sample data, I was able to identify some issues within your code.
I used some sample data I created, based on the formats and functions in your code, to keep consistency. Below is the code, without any errors:
WITH data AS (
SELECT 98 as User_ID,DATETIME "2008-12-25 05:30:00.000000" AS Request_Timestamp, "something!" AS channel UNION ALL
SELECT 99 as User_ID,DATETIME "2008-12-25 22:30:00.000000" AS Request_Timestamp, "something!" AS channel
)
SELECT User_ID, time,
ARRAY_AGG(Time ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day1,
case
when time BETWEEN '04:00:00' AND '12:00:00'
then "morning"
when time < '04:00:00' OR time > '20:00:00'
then "night"
end AS part_of_day
FROM (
SELECT User_ID,
TIME_TRUNC(TIME(Request_Timestamp), SECOND) AS time,
COUNT(*) AS cnt
FROM data
GROUP BY User_ID, Channel, Request_Timestamp
#order by Request_Timestamp
)
GROUP BY User_ID, Time;
First, notice that I have changed the column alias in your ARRAY_AGG() call; this had to be done because it would otherwise cause a "Duplicate column name" error. Second, a comma was missing after your TIME_TRUNC() function, so you could not select COUNT(*). Then, within your inner GROUP BY, you needed to group by Request_Timestamp as well, because it was neither aggregated nor grouped. Lastly, in your last GROUP BY, you needed to aggregate or group by time. After these corrections, your code will execute without any errors.
Note: the Syntax error: Expected ")" but got identifier "COUNT" at [19:9] error you experienced is due to the missing comma. The others would be shown after correcting this one.
If you want the most frequent part_of_day for each user, you need to use the day part in the aggregation:
SELECT User_ID,
ARRAY_AGG(part_of_day ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day
FROM (SELECT User_ID,
(case when time BETWEEN '04:00:00' AND '12:00:00' then 'morning'
when time < '04:00:00' OR time > '20:00:00' then 'night'
end) AS part_of_day,
COUNT(*) AS cnt
FROM cognitivebot2.chitchaxETL.conversations
GROUP BY User_ID, part_of_day
) u
GROUP BY User_ID;
Obviously, if you want the channel as well, then you need to include that in the queries.
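For instance, here is a sketch with a hypothetical channel column added, assuming such a column exists in the table:
SELECT User_ID, channel,
       ARRAY_AGG(part_of_day ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] part_of_day
FROM (SELECT User_ID, channel,
             (case when time BETWEEN '04:00:00' AND '12:00:00' then 'morning'
                   when time < '04:00:00' OR time > '20:00:00' then 'night'
              end) AS part_of_day,
             COUNT(*) AS cnt
      FROM cognitivebot2.chitchaxETL.conversations
      GROUP BY User_ID, channel, part_of_day
     ) u
GROUP BY User_ID, channel;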
I am using BigQuery and I can't figure out why I get an error message like this:
Window ORDER BY expression references column start_date which is neither grouped nor aggregated at [4:73]
Here is my code:
SELECT EXTRACT(WEEK FROM start_date) as week, count(start_date) as count,
RANK() OVER (PARTITION BY start_station_name ORDER BY EXTRACT(WEEK FROM start_date))
from `bigquery-public-data.london_bicycles.cycle_hire`
GROUP BY EXTRACT(WEEK FROM start_date), start_station_name
I thought I had grouped the week, as seen in the last line. So what can cause this error message to keep popping up?
This is a parsing error in BigQuery, which you can work around with an aggregation function. Your query has another issue as well: start_station_name appears in the GROUP BY (and the PARTITION BY) but not in the SELECT list.
SELECT EXTRACT(WEEK FROM start_date) as week, start_station_name, count(start_date) as count,
RANK() OVER (PARTITION BY start_station_name ORDER BY MIN(EXTRACT(WEEK FROM start_date)))
from `bigquery-public-data.london_bicycles.cycle_hire`
GROUP BY 1, 2;
The MIN() really serves no purpose other than letting BigQuery parse the query. Because the expression is part of the GROUP BY, there is only one value for the MIN() to consider.
This is a bug in the BigQuery parsing, because it does not recognize that the expression is the same as the expression in the GROUP BY. Happily, it is easy to work around.
Try it like below, using a CTE:
with cte as
(
SELECT *, EXTRACT(WEEK FROM start_date) as week
from `bigquery-public-data.london_bicycles.cycle_hire`
) select week,count(start_date) as count,
RANK() OVER (PARTITION BY start_station_name ORDER BY week)
from cte group by week,start_station_name
In the query, you need to make sure that you put ORDER BY only on values which you are selecting. With your query, the problem is that you are doing ORDER BY EXTRACT(WEEK FROM start_date). Rather than doing this, you should write ORDER BY week, because you are already selecting week.
I'm trying to group data by minutes, so I tried this query:
SELECT FROM_UNIXTIME(
unix_timestamp (time, 'yyyy-mm-dd hh:mm:ss'), 'yyyy-mm-dd hh:mm') as ts,
count (*) as cnt
from toucher group by ts limit 10;
Then Hive tells me there is no such column:
FAILED: SemanticException [Error 10004]: Line 1:134 Invalid table
alias or column reference 'ts': (possible column names are: time, ip,
username, code)
So is it not supported by hive?
SELECT FROM_UNIXTIME(unix_timestamp (time, 'yyyy-mm-dd hh:mm:ss'), 'yyyy-mm-dd hh:mm') as ts,
count (*) as cnt
from toucher
group by FROM_UNIXTIME(unix_timestamp (time, 'yyyy-mm-dd hh:mm:ss'), 'yyyy-mm-dd hh:mm') limit 10;
or, better:
select t.ts, count(*) from
(SELECT FROM_UNIXTIME(unix_timestamp (time, 'yyyy-mm-dd hh:mm:ss'), 'yyyy-mm-dd hh:mm') as ts
from toucher ) t
group by t.ts limit 10;
As is the case with most relational database systems, the SELECT clause is processed after the GROUP BY clause. This means you cannot use columns aliased in the SELECT (such as ts in this example) in your GROUP BY.
There are essentially two ways around this. Both are correct, but some people have preference for one over the other for various reasons.
First, you could group by the original expression, rather than the alias. This results in duplicate code, as you will have the exact same expression in both your SELECT and GROUP BY clause.
SELECT
FROM_UNIXTIME(unix_timestamp(time,'yyyy-mm-dd hh:mm:ss'),'yyyy-mm-dd hh:mm') as ts,
COUNT(*) as cnt
FROM toucher
GROUP BY FROM_UNIXTIME(unix_timestamp(time,'yyyy-mm-dd hh:mm:ss'),'yyyy-mm-dd hh:mm')
LIMIT 10;
A second approach is to wrap your expression and alias in a subquery. This means you do not have to duplicate your expression, but you will have two nested queries and this may have performance implications.
SELECT
ts,
COUNT(*) as cnt
FROM
(SELECT
FROM_UNIXTIME(unix_timestamp(time,'yyyy-mm-dd hh:mm:ss'),'yyyy-mm-dd hh:mm') as ts
FROM toucher) x
GROUP BY x.ts
LIMIT 10;
Both should have the same result. Which you should use in this case will depend on your particular use; or perhaps personal preference.
Hope that helps.
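One caveat worth adding, not part of the original answers: Hive's unix_timestamp() and FROM_UNIXTIME() take Java SimpleDateFormat patterns, in which mm means minutes, MM means months, and hh is the 12-hour clock. The pattern 'yyyy-mm-dd hh:mm:ss' used above will therefore mis-parse typical timestamps; a safer variant of the first approach would be:
SELECT
    FROM_UNIXTIME(unix_timestamp(time,'yyyy-MM-dd HH:mm:ss'),'yyyy-MM-dd HH:mm') as ts,
    COUNT(*) as cnt
FROM toucher
GROUP BY FROM_UNIXTIME(unix_timestamp(time,'yyyy-MM-dd HH:mm:ss'),'yyyy-MM-dd HH:mm')
LIMIT 10;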
I need to query for each minute the total count of rows up to that minute.
The best I could achieve so far doesn't do the trick. It returns count per minute, not the total count up to each minute:
SELECT COUNT(id) AS count
, EXTRACT(hour from "when") AS hour
, EXTRACT(minute from "when") AS minute
FROM mytable
GROUP BY hour, minute
Return only minutes with activity
Shortest
SELECT DISTINCT
date_trunc('minute', "when") AS minute
, count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM mytable
ORDER BY 1;
Use date_trunc(); it returns exactly what you need.
Don't include id in the query, since you want to GROUP BY minute slices.
count() is typically used as plain aggregate function. Appending an OVER clause makes it a window function. Omit PARTITION BY in the window definition - you want a running count over all rows. By default, that counts from the first row to the last peer of the current row as defined by ORDER BY. The manual:
The default framing option is RANGE UNBOUNDED PRECEDING, which is the
same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY,
this sets the frame to be all rows from the partition start up
through the current row's last ORDER BY peer.
And that happens to be exactly what you need.
Use count(*) rather than count(id). It better fits your question ("count of rows"). It is generally slightly faster than count(id). And, while we might assume that id is NOT NULL, it has not been specified in the question, so count(id) is wrong, strictly speaking, because NULL values are not counted with count(id).
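A minimal demonstration of the difference, using an inline VALUES list in place of a real table:
SELECT count(*) AS all_rows, count(id) AS non_null_ids
FROM (VALUES (1), (NULL), (2)) t(id);
-- all_rows = 3, non_null_ids = 2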
You can't GROUP BY minute slices at the same query level. Aggregate functions are applied before window functions; the window function count(*) would only see 1 row per minute this way.
You can, however, SELECT DISTINCT, because DISTINCT is applied after window functions.
ORDER BY 1 is just shorthand for ORDER BY date_trunc('minute', "when") here.
1 is a positional reference to the 1st expression in the SELECT list.
Use to_char() if you need to format the result. Like:
SELECT DISTINCT
to_char(date_trunc('minute', "when"), 'DD.MM.YYYY HH24:MI') AS minute
, count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM mytable
ORDER BY date_trunc('minute', "when");
Fastest
SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
FROM (
SELECT date_trunc('minute', "when") AS minute
, count(*) AS minute_ct
FROM mytable
GROUP BY 1
) sub
ORDER BY 1;
Much like the above, but:
I use a subquery to aggregate and count rows per minute. This way we get 1 row per minute without DISTINCT in the outer SELECT.
Use sum() as window aggregate function now to add up the counts from the subquery.
I found this to be substantially faster with many rows per minute.
Include minutes without activity
Shortest
@GabiMe asked in a comment how to get one row for every minute in the time frame, including those where no event occurred (no row in the base table):
SELECT DISTINCT
minute, count(c.minute) OVER (ORDER BY minute) AS running_ct
FROM (
SELECT generate_series(date_trunc('minute', min("when"))
, max("when")
, interval '1 min')
FROM mytable
) m(minute)
LEFT JOIN (SELECT date_trunc('minute', "when") FROM mytable) c(minute) USING (minute)
ORDER BY 1;
Generate a row for every minute in the time frame between the first and the last event with generate_series() - here directly based on aggregated values from the subquery.
LEFT JOIN to all timestamps truncated to the minute and count. NULL values (where no row exists) do not add to the running count.
Fastest
With a CTE:
WITH cte AS (
SELECT date_trunc('minute', "when") AS minute, count(*) AS minute_ct
FROM mytable
GROUP BY 1
)
SELECT m.minute
, COALESCE(sum(cte.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM (
SELECT generate_series(min(minute), max(minute), interval '1 min')
FROM cte
) m(minute)
LEFT JOIN cte USING (minute)
ORDER BY 1;
Again, aggregate and count rows per minute in the first step; this removes the need for DISTINCT later.
Unlike count(), sum() can return NULL. Default to 0 with COALESCE.
With many rows and an index on "when" this version with a subquery was fastest among a couple of variants I tested with Postgres 9.1 - 9.4:
SELECT m.minute
, COALESCE(sum(c.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM (
SELECT generate_series(date_trunc('minute', min("when"))
, max("when")
, interval '1 min')
FROM mytable
) m(minute)
LEFT JOIN (
SELECT date_trunc('minute', "when") AS minute
, count(*) AS minute_ct
FROM mytable
GROUP BY 1
) c USING (minute)
ORDER BY 1;