This is what I have:
select avg(visit_count) from ( SELECT count(user_id) as visit_count from table ) group by user_id;
But I get the following error:
ERROR 1248 (42000): Every derived table must have its own alias
If I add an alias, I get the average for only one user_id.
What I want is the average of visit_count across all user IDs.
See the picture for reference. Example: 3, 2.5, 1.5
The error means that your subquery needs to have an alias.
Like this:
select avg(visit_count) from (
select count(user_id) as visit_count from table
group by user_id) a
Your subquery is missing an alias. I think this is the version you want:
SELECT AVG(visit_count)
FROM
(
SELECT COUNT(user_id) AS visit_count
FROM yourTable
GROUP BY user_id
) t;
Note that GROUP BY belongs inside the subquery: the subquery produces one count per user, and the outer AVG then averages those counts over all users.
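To check the shape of this query, here is a minimal sketch that runs it through Python's built-in sqlite3 module. SQLite stands in for MySQL here, and the `visits` table with its three users is made up for illustration (the question's real table name is not known).

```python
import sqlite3

# Hypothetical visits table: one row per visit, user_id identifies the visitor.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_id INTEGER)")
conn.executemany(
    "INSERT INTO visits (user_id) VALUES (?)",
    [(1,), (1,), (1,), (2,), (2,), (3,)],  # user 1: 3 visits, user 2: 2, user 3: 1
)

# Inner query: one row per user holding that user's visit count.
# Outer query: a single average over those per-user counts.
avg_visits = conn.execute(
    """
    SELECT AVG(visit_count)
    FROM (
        SELECT COUNT(user_id) AS visit_count
        FROM visits
        GROUP BY user_id
    ) t
    """
).fetchone()[0]

print(avg_visits)  # 2.0, i.e. (3 + 2 + 1) / 3
```

Because the outer query has no GROUP BY, it returns exactly one row covering every user.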
I'm trying to get an overall distinct count of employees across a set of records that is grouped with GROUP BY.
I've tried using the OVER() clause but couldn't get it to work. It's easiest to explain with an example, so please see my script and the wanted result below.
EDIT:
I should mention that I'm hoping for a solution that does not use a subquery on my "sales_detail" table, because in my real example "sales_detail" is itself a very complex subquery.
Here's the result I want. Column "wanted_result" should be 9:
Sample script:
CREATE TABLE [sales_detail] (
[employee] varchar(100),[customer] varchar(100),[startdate] varchar(100),[enddate] varchar(100),[saleday] int,[timeframe] varchar(100),[saleqty] numeric(18,4)
);
INSERT INTO [sales_detail]
([employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty])
VALUES
('Wendy','Chris','8/1/2019','8/12/2019','5','Afternoon','1'),
('Wendy','Chris','8/1/2019','8/12/2019','5','Morning','5'),
('Wendy','Chris','8/1/2019','8/12/2019','6','Morning','6'),
('Dexter','Chris','8/1/2019','8/12/2019','2','Mid','2.5'),
('Jennifer','Chris','8/1/2019','8/12/2019','4','Morning','2.75'),
('Lila','Chris','8/1/2019','8/12/2019','2','Morning','3.75'),
('Rita','Chris','8/1/2019','8/12/2019','2','Mid','1'),
('Tony','Chris','8/1/2019','8/12/2019','4','Mid','2'),
('Tony','Chris','8/1/2019','8/12/2019','1','Morning','6'),
('Mike','Chris','8/1/2019','8/12/2019','4','Mid','1.5'),
('Logan','Chris','8/1/2019','8/12/2019','3','Morning','6.25'),
('Blake','Chris','8/1/2019','8/12/2019','4','Afternoon','0.5')
;
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
9 AS [wanted_result]
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty]
FROM
[sales_detail]
) AS [s]
GROUP BY
[timeframe]
;
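Why the OVER() attempt cannot return 9: SUM(COUNT(DISTINCT ...)) OVER () adds up the per-timeframe distinct counts, so an employee who works in several timeframes (Wendy and Tony in the sample data) is counted once per timeframe. The sketch below checks this with Python's sqlite3 module as a stand-in for SQL Server, loading only the employee and timeframe columns from the sample INSERT; the window sum itself is done in Python.

```python
import sqlite3

# Only the two columns that matter here; values copied from the sample INSERT.
rows = [
    ("Wendy", "Afternoon"), ("Wendy", "Morning"), ("Wendy", "Morning"),
    ("Dexter", "Mid"), ("Jennifer", "Morning"), ("Lila", "Morning"),
    ("Rita", "Mid"), ("Tony", "Mid"), ("Tony", "Morning"),
    ("Mike", "Mid"), ("Logan", "Morning"), ("Blake", "Afternoon"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_detail (employee TEXT, timeframe TEXT)")
conn.executemany("INSERT INTO sales_detail VALUES (?, ?)", rows)

# Distinct employees per timeframe: what COUNT(DISTINCT ...) sees in each group.
per_group = dict(conn.execute(
    "SELECT timeframe, COUNT(DISTINCT employee) FROM sales_detail GROUP BY timeframe"
).fetchall())

# Distinct employees over the whole table: the wanted result.
overall = conn.execute(
    "SELECT COUNT(DISTINCT employee) FROM sales_detail"
).fetchone()[0]

print(per_group["Afternoon"], per_group["Mid"], per_group["Morning"])  # 2 4 5
print(sum(per_group.values()))  # 11, the total a window SUM over the groups produces
print(overall)                  # 9
```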
If I understand correctly, you are simply looking for a COUNT(DISTINCT) for all employees in the table? I believe this query will return the results you are looking for:
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
(SELECT COUNT(DISTINCT [employee]) FROM [sales_detail]) AS [employee_count2],
9 AS [wanted_result]
FROM [sales_detail] AS [s]
GROUP BY
[timeframe]
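A quick check of the scalar-subquery idea, sketched with Python's sqlite3 module as a stand-in for SQL Server and only the employee and timeframe columns from the sample data. The uncorrelated subquery ignores the GROUP BY, so every output row carries the same overall count.

```python
import sqlite3

# employee/timeframe pairs copied from the sample INSERT
rows = [
    ("Wendy", "Afternoon"), ("Wendy", "Morning"), ("Wendy", "Morning"),
    ("Dexter", "Mid"), ("Jennifer", "Morning"), ("Lila", "Morning"),
    ("Rita", "Mid"), ("Tony", "Mid"), ("Tony", "Morning"),
    ("Mike", "Mid"), ("Logan", "Morning"), ("Blake", "Afternoon"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_detail (employee TEXT, timeframe TEXT)")
conn.executemany("INSERT INTO sales_detail VALUES (?, ?)", rows)

result = conn.execute(
    """
    SELECT timeframe,
           COUNT(DISTINCT employee) AS employee_count1,
           -- uncorrelated scalar subquery: same value on every grouped row
           (SELECT COUNT(DISTINCT employee) FROM sales_detail) AS employee_count2
    FROM sales_detail AS s
    GROUP BY timeframe
    ORDER BY timeframe
    """
).fetchall()

print(result)  # [('Afternoon', 2, 9), ('Mid', 4, 9), ('Morning', 5, 9)]
```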
You can try the option below:
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
[wanted_result]
-- take the overall count computed in the subquery
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty],
(select COUNT(DISTINCT [employee]) from [sales_detail]) AS [wanted_result]
-- calculate the overall distinct count in a scalar subquery
FROM [sales_detail]
) AS [s]
GROUP BY
[timeframe],[wanted_result]
Use a trick where you only count each person on the first day they are seen:
select timeframe, sum(saleqty) as total_qty,
count(distinct employee) as employee_count1,
sum(sum(case when seqnum = 1 then 1 else 0 end)) over () as employee_count2,
9 as wanted_result
from (select sd.*,
row_number() over (partition by employee order by startdate) as seqnum
from sales_detail sd
) sd
group by timeframe;
Note: from a performance perspective, your complex subquery appears only once in this query, so it is evaluated only once.
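The first-occurrence trick can be checked with a small sketch in Python's sqlite3 module, used as a stand-in for SQL Server. Assumptions: only the employee and timeframe columns are loaded, rows are numbered by timeframe instead of startdate (any order works, since only "exactly one row per employee gets seqnum = 1" matters), and the per-group first-occurrence counts are summed by a window SUM in a wrapping SELECT rather than by nesting SUM inside SUM(...) OVER ().

```python
import sqlite3

# employee/timeframe pairs copied from the sample INSERT
rows = [
    ("Wendy", "Afternoon"), ("Wendy", "Morning"), ("Wendy", "Morning"),
    ("Dexter", "Mid"), ("Jennifer", "Morning"), ("Lila", "Morning"),
    ("Rita", "Mid"), ("Tony", "Mid"), ("Tony", "Morning"),
    ("Mike", "Mid"), ("Logan", "Morning"), ("Blake", "Afternoon"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_detail (employee TEXT, timeframe TEXT)")
conn.executemany("INSERT INTO sales_detail VALUES (?, ?)", rows)

result = conn.execute(
    """
    SELECT timeframe,
           employee_count1,
           SUM(first_seen) OVER () AS employee_count2  -- total over all groups
    FROM (
        SELECT timeframe,
               COUNT(DISTINCT employee) AS employee_count1,
               -- each employee contributes 1 in exactly one timeframe
               SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) AS first_seen
        FROM (
            SELECT sd.*,
                   ROW_NUMBER() OVER (PARTITION BY employee ORDER BY timeframe) AS seqnum
            FROM sales_detail sd
        ) sd
        GROUP BY timeframe
    ) g
    ORDER BY timeframe
    """
).fetchall()

print(result)  # [('Afternoon', 2, 9), ('Mid', 4, 9), ('Morning', 5, 9)]
```

The per-group first_seen values depend on the ordering chosen, but their total is always the number of distinct employees.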
I tried to do two distinct counts on different columns in one query:
select count(distinct color) as cid,
count(distinct entity) as eid from my_table
The query above fails with the following error:
SQLException: [Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException:
all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT color); deviating function: count(DISTINCT entity)
), Query: select count(distinct color) as cid,
count(distinct entity) as eid from my_table
However, if I do only one count, the query works. Why is that? Is it possible to do two distinct counts in one query?
Thanks!
Impala does not currently support multiple count distinct expressions within the same query, see IMPALA-110. This is a requested feature, but is surprisingly hard to implement so hasn't been added yet.
For now, if you do not need precise accuracy, you can produce an estimate of the distinct values for a column by specifying NDV(column); a query can contain multiple instances of NDV(column). To make Impala automatically rewrite COUNT(DISTINCT) expressions to NDV(), enable the APPX_COUNT_DISTINCT query option (see the documentation).
An update on this: Impala 3.1 (released November 2018) adds support for multiple distinct aggregate functions in the same query block.
I'm not 100% sure this will work in Impala, but you can do count(distinct) using window functions and conditional aggregation. So, this query:
select count(distinct color) as cid,
count(distinct entity) as eid
from my_table ;
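is equivalent to the following (the IS NOT NULL checks are needed because COUNT(DISTINCT) ignores NULLs, while ROW_NUMBER() also numbers the rows of the NULL partition):
select sum(case when seqnum_color = 1 and color is not null then 1 else 0 end) as cid,
sum(case when seqnum_entity = 1 and entity is not null then 1 else 0 end) as eid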
from (select t.*,
row_number() over (partition by color order by color) as seqnum_color,
row_number() over (partition by entity order by entity) as seqnum_entity
from my_table t
) t;
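A sketch that compares the two forms on a few made-up rows, using Python's sqlite3 module (SQLite accepts several COUNT(DISTINCT) expressions in one query, so both forms can run side by side there; this says nothing about Impala's own behavior). The NULL rows show why the rewrite needs IS NOT NULL: without it, the NULL partition would add 1 to each count, giving (4, 3).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (color TEXT, entity TEXT)")
# Made-up rows, including NULLs in both columns.
conn.executemany("INSERT INTO my_table VALUES (?, ?)", [
    ("red", "a"), ("red", "b"), ("blue", "a"), (None, "a"), ("green", None),
])

direct = conn.execute(
    "SELECT COUNT(DISTINCT color), COUNT(DISTINCT entity) FROM my_table"
).fetchone()

rewritten = conn.execute(
    """
    SELECT SUM(CASE WHEN seqnum_color = 1 AND color IS NOT NULL THEN 1 ELSE 0 END) AS cid,
           SUM(CASE WHEN seqnum_entity = 1 AND entity IS NOT NULL THEN 1 ELSE 0 END) AS eid
    FROM (
        SELECT t.*,
               -- the first row of each distinct value gets seqnum = 1
               ROW_NUMBER() OVER (PARTITION BY color ORDER BY color) AS seqnum_color,
               ROW_NUMBER() OVER (PARTITION BY entity ORDER BY entity) AS seqnum_entity
        FROM my_table t
    ) t
    """
).fetchone()

print(direct, rewritten)  # (3, 2) (3, 2)
```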
I have this query:
select
id,
count(1) as "visits",
count(distinct visitor_id) as "visitors"
from my_table
where timestamp > '2016-01-14'
group by id
order by "visits", "visitors"
It works.
If I change it to this:
select
id,
count(1) as "visits",
count(distinct visitor_id) as "visitors"
from my_table
where timestamp > '2016-01-14'
group by id
order by (("visits") + ("visitors"))
I get:
column "visits" does not exist
If I change it to:
select
id,
count(1) as "visits",
count(distinct visitor_id) as "visitors"
from my_table
where timestamp > '2016-01-14'
group by id
order by count(1) + count(distinct visitor_id)
it works again.
Why does it work for examples 1 and 3, but not for example 2? Is there any way to order by the sum of two columns using their aliases?
The alternatives I can think of:
Wrap the query in an outer SELECT and order that, but it adds extra code, which I would like to avoid.
Recalculate the values in the ORDER BY clause, but that makes the query more complex, and I might lose performance by recomputing them.
PS: This is a toy query; the real one is much more complicated. I would like to reuse the values calculated in the SELECT list in the ORDER BY, summed together.
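For reference, the outer-SELECT alternative looks like this; once the aggregates are columns of a subquery, their names can be used freely in ORDER BY expressions. The sketch runs through Python's sqlite3 module with made-up rows. SQLite is more permissive about aliases in ORDER BY than PostgreSQL, so it shows only the outer-SELECT result, not the original error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, visitor_id INTEGER, timestamp TEXT)")
# Made-up sample rows.
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    (1, 10, "2016-01-15"), (1, 10, "2016-01-16"), (1, 11, "2016-01-16"),
    (2, 20, "2016-01-15"),
    (3, 30, "2016-01-15"), (3, 31, "2016-01-15"),
    (3, 30, "2016-01-10"),  # excluded by the WHERE clause
])

result = conn.execute(
    """
    SELECT id, visits, visitors
    FROM (
        SELECT id,
               COUNT(1) AS visits,
               COUNT(DISTINCT visitor_id) AS visitors
        FROM my_table
        WHERE timestamp > '2016-01-14'
        GROUP BY id
    ) AS t
    ORDER BY visits + visitors  -- ordinary columns of t here, not output aliases
    """
).fetchall()

print(result)  # [(2, 1, 1), (3, 2, 2), (1, 3, 2)]
```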
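In PostgreSQL, an output-column alias can be used in ORDER BY only on its own, as in your first query. Inside a larger expression such as "visits" + "visitors", names are resolved against the input columns of the FROM clause, where "visits" does not exist, so you get the error shown above.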
Instead of using the aliases, use the aggregate expressions themselves, and keep them numeric so that + adds them (casting to VARCHAR would make + a string concatenation in SQL Server and an error in PostgreSQL):
select
id,
count(1) as "visits",
count(distinct visitor_id) as "visitors"
from my_table
where timestamp > '2016-01-14'
group by id
order by count(1) + count(distinct visitor_id)