I have this query:
select
id,
count(1) as "visits",
count(distinct visitor_id) as "visitors"
from my_table
where timestamp > '2016-01-14'
group by id
order by "visits", "visitors"
It works.
If I change to this
select
id,
count(1) as "visits",
count(distinct visitor_id) as "visitors"
from my_table
where timestamp > '2016-01-14'
group by id
order by (("visits") + ("visitors"))
I get
column "visits" does not exist
If I change to
select
id,
count(1) as "visits",
count(distinct visitor_id) as "visitors"
from my_table
where timestamp > '2016-01-14'
group by id
order by count(1) + count(distinct visitor_id)
it works again.
Why does it work for examples 1 and 3, but not for example 2? Is there any way to order by the sum of two columns using their aliases?
The alternatives I could think of:
Create an outer select and order it, but that would create extra code and I would like to avoid that
Recalculate the values in the order by statement. But that would make the query more complex and maybe I would lose performance due to recalculating stuff.
PS: This query is a toy-query. The real one is much more complicated. I would like to reuse the value calculated in the select statement in the order by, but all summed up together.
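The order in which expressions are evaluated is not defined. If your visits + visitors expression is resolved before the aliases are, you will get the error shown above. In PostgreSQL, for instance, an output-column alias in ORDER BY is recognized only when it stands alone, not when it is used inside an expression.
Instead of using the aliases, try using the actual expressions. You could also try casting them to varchar or nvarchar, as in the following: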
select
id,
count(1) as "visits",
count(distinct visitor_id) as "visitors"
from my_table
where timestamp > '2016-01-14'
group by id
order by (CAST(count(1) AS VARCHAR) + CAST(count(distinct visitor_id) AS VARCHAR))
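The "outer select" alternative mentioned in the question can be sketched as follows. This is a minimal, hedged example that uses Python's built-in sqlite3 module only so that it is runnable; the sample rows are made up, and how aliases are resolved in ORDER BY differs between database engines. Once the aggregates live in a derived table, visits and visitors are ordinary columns, so they can be combined in an expression without being recalculated.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, visitor_id INTEGER, timestamp TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, 10, "2016-01-15"),
        (1, 10, "2016-01-16"),
        (1, 11, "2016-01-17"),
        (2, 20, "2016-01-15"),
        (3, 30, "2016-01-15"),
        (3, 31, "2016-01-16"),
        (3, 30, "2016-01-10"),  # excluded by the WHERE clause
    ],
)

# The aggregates are computed once in the derived table; the outer query
# sees visits and visitors as plain columns and can order by their sum.
query = """
SELECT id, visits, visitors
FROM (
    SELECT id,
           count(1) AS visits,
           count(DISTINCT visitor_id) AS visitors
    FROM my_table
    WHERE timestamp > '2016-01-14'
    GROUP BY id
) AS agg
ORDER BY visits + visitors
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The extra nesting costs a few lines, but the aggregate expressions are written only once.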
Related
Any ideas why this doesn't work?
select [column_name_2], max(count(distinct([column_name_1])))
from [table_name]
group by [column_name_2]
but it works if done like this
select [column_name_2], count(distinct([column_name_1])) as [x]
into #temp_table
from [table_name]
group by [column_name_2]
select max(x)
from #temp_table
Well, that's just the way SQL (the language) is defined to work. When you use GROUP BY, the corresponding SELECT list will produce a row for each group in the result. You're trying to take that result and aggregate twice, once with GROUP BY [column_name_2] and a second time with GROUP BY (), as defined by standard SQL. We can't do that in the same query expression.
The good news is you can break this up into more than one query expression:
WITH cte1 AS (
SELECT count(distinct([column_name_1])) AS cnt
FROM [table_name]
GROUP BY [column_name_2]
)
SELECT MAX(cnt) FROM cte1
;
or use a derived table.
You can even order the initial query result by cnt DESC and limit the result to the first row.
In your case, you may not want just the MAX, but also the other column.
With SQL Server, which you may be using, that would look like the query below. (Note: you should add a database-specific tag to the question.)
SELECT TOP 1 [column_name_2], count(distinct([column_name_1])) AS cnt
FROM [table_name]
GROUP BY [column_name_2]
ORDER BY cnt DESC
;
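Both ideas from this answer, aggregating in two steps with a CTE and ordering by the count while keeping only the first row, can be tried out with a small sketch. It assumes Python's built-in sqlite3 module, a hypothetical table named t, and made-up rows; SQLite uses LIMIT 1 where SQL Server uses TOP 1.

```python
import sqlite3

# Hypothetical table and data standing in for [table_name].
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column_name_1 TEXT, column_name_2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", "x"), ("b", "x"), ("b", "x"),
     ("a", "y"), ("b", "y"), ("c", "y"),
     ("a", "z")],
)

# Step 1 (inside the CTE): one distinct count per group.
# Step 2 (outer query): the maximum over those counts.
max_cnt = conn.execute("""
    WITH cte1 AS (
        SELECT count(DISTINCT column_name_1) AS cnt
        FROM t
        GROUP BY column_name_2
    )
    SELECT max(cnt) FROM cte1
""").fetchone()[0]

# Alternative: sort the grouped result and keep the first row, which
# also returns the group that the maximum belongs to.
top_row = conn.execute("""
    SELECT column_name_2, count(DISTINCT column_name_1) AS cnt
    FROM t
    GROUP BY column_name_2
    ORDER BY cnt DESC
    LIMIT 1
""").fetchone()

print(max_cnt, top_row)
```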
I don't understand "this doesn't work", what were you expecting and what did you get? Normally you include the GROUP BY value in the result set. So it would be:
select [column_name_2], max(cnt) cnt
from (select [column_name_2], count(distinct [column_name_1]) cnt
from [table_name]
group by [column_name_2]) x
group by [column_name_2]
Ok, after reading your comment I think above is what you are looking for.
SELECT [column_name_two]
, max(x) x
FROM (
SELECT [column_name_two]
, COUNT(DISTINCT [column_name_one]) AS x
FROM table_name
GROUP BY [column_name_two]
) AS Tbl
GROUP BY [column_name_two]
I want to do a count grouping by the first column but omitting the others columns in the group by. Let me explain:
I have a table with those columns
So, what I want to get is a new column with the work orders total by Instrument, something like this:
How can I do that? Because if I do a count like this:
SELECT INSTRUMENT, WORKORDER, DATE, COUNT(*)
FROM TABLE1
GROUP BY INSTRUMENT, WORKORDER, DATE;
I get this:
Just use a window function:
select t.*,
count(*) over (partition by instrument) as instrument_count
from table1 t;
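A small sketch of how the window function behaves, assuming Python's built-in sqlite3 module linked against SQLite 3.25 or newer (the first version with window functions). The rows are invented for illustration, since the original table contents are not shown. Every row is kept, and each one carries the total for its instrument.

```python
import sqlite3

# Hypothetical sample data; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (instrument TEXT, workorder TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        ("A", "W1", "2020-01-01"),
        ("A", "W2", "2020-01-02"),
        ("B", "W3", "2020-01-01"),
    ],
)

# count(*) OVER (PARTITION BY ...) counts per instrument without
# collapsing the rows the way GROUP BY would.
rows = conn.execute("""
    SELECT t.*,
           count(*) OVER (PARTITION BY instrument) AS instrument_count
    FROM table1 t
    ORDER BY instrument, workorder
""").fetchall()
for row in rows:
    print(row)
```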
Although the answer given by Gordon is perfect, there is another option that uses GROUP BY and a subquery. You can add the date column to this query as well.
SELECT * FROM
(
SELECT A.INSTRUMENT, B.TOTAL_COUNT_BY_INSTRUMENT
FROM work_order A,
(SELECT COUNT(1) AS TOTAL_COUNT_BY_INSTRUMENT,
INSTRUMENT
FROM WORK_ORDER
GROUP BY INSTRUMENT
) B
WHERE A.INSTRUMENT = B.instrument);
I have a table like as shown below
What I would like to do is get the minimum of each subject. Though I am able to do this with row_number function, I would like to do this with groupby and min() approach. But it doesn't work.
row_number approach - works fine
SELECT * FROM (select subject_id,value,id,min_time,max_time,time_1,
row_number() OVER (PARTITION BY subject_id ORDER BY value) AS rank
from table A) WHERE RANK = 1
min() approach - doesn't work
select subject_id,id,min_time,max_time,time_1,min(value) from table A
GROUP BY SUBJECT_ID,id
As you can see, just the two columns (subject_id and id) are enough to group the items together; they differentiate the groups. But why am I not able to use the other columns in the select clause? If I add the other columns to the GROUP BY, I may not get the expected output, because time_1 has different values.
I expect my output to be like as shown below
In BigQuery you can use aggregation for this:
SELECT ARRAY_AGG(a ORDER BY value LIMIT 1)[SAFE_OFFSET(0)].*
FROM table A
GROUP BY SUBJECT_ID;
This uses ARRAY_AGG() to aggregate each record (the a in the argument list). ARRAY_AGG() allows you to order the result (by value) and to limit the size of the array. The latter is important for performance.
After aggregating, you want the first element of each array. The .* expands the record referred to by a into its component columns.
I'm not sure why you don't want to use ROW_NUMBER(). If the problem is the lingering rank column, you can easily remove it:
SELECT a.* EXCEPT (rank)
FROM (SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY subject_id ORDER BY value) AS rank
FROM A
) a
WHERE RANK = 1;
Are you looking for something like the following?
SELECT
A.subject_id,
A.id,
A.min_time,
A.max_time,
A.time_1,
A.value
FROM table A
INNER JOIN(
SELECT subject_id, MIN(value) Value
FROM table
GROUP BY subject_id
) B ON A.subject_id = B.subject_id
AND A.Value = B.Value
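The join against the per-subject minimum can be sketched as follows, assuming Python's built-in sqlite3 module, a hypothetical table named readings, and made-up rows with fewer columns than the original. The subquery finds the minimum per subject_id, and the join pulls back the complete row that holds it.

```python
import sqlite3

# Hypothetical, simplified stand-in for the table in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (subject_id INTEGER, id INTEGER, time_1 TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        (1, 100, "t1", 5),
        (1, 100, "t2", 3),
        (1, 100, "t3", 7),
        (2, 200, "t1", 9),
        (2, 200, "t2", 8),
    ],
)

# b holds one row per subject with its minimum value; joining on both
# subject_id and value selects the full row(s) carrying that minimum.
rows = conn.execute("""
    SELECT a.subject_id, a.id, a.time_1, a.value
    FROM readings a
    INNER JOIN (
        SELECT subject_id, MIN(value) AS value
        FROM readings
        GROUP BY subject_id
    ) b ON a.subject_id = b.subject_id
       AND a.value = b.value
    ORDER BY a.subject_id
""").fetchall()
for row in rows:
    print(row)
```

If several rows of a subject share the minimum value, this approach returns all of them, unlike ROW_NUMBER() with a filter on rank 1.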
If you do not need to select the Time_1 column's value, the following query will work (as far as I can see, the values in min_time and max_time are the same within each group):
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
--A.time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time
Finally, the best approach, if it is acceptable for your data, is to apply something like CAST(Time_1 AS DATE) to your time column. This considers only the date part and ignores the time part. The query would be:
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE) Time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE)
-- Check whether the CAST ... AS DATE syntax in BigQuery
-- matches what I have written here or differs slightly.
Below is for BigQuery Standard SQL, and it is the most efficient way to handle cases like the one in your question:
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY value LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY subject_id
Using ROW_NUMBER is not efficient and in many cases leads to a "Resources exceeded" error.
Note: a self join is also a very inefficient way of achieving your objective.
A bit late to the party, but here is a cte-based approach which made sense to me:
with mins as (
select subject_id, id, min(value) as min_value
from table
group by subject_id, id
)
select distinct t.subject_id, t.id, t.time_1, t.min_time, t.max_time, m.min_value
from table t
join mins m on m.subject_id = t.subject_id and m.id = t.id
I don't understand why I can't use this in my code :
SELECT MAX(SMTHNG), COUNT(MAX(SMTHNG))
FROM SomeTable;
Searched for an answer but didn't find it in documentation about these aggregate functions.
Also I get an SQL-compiler error "Invalid column name "SMTHNG"".
You want to know what the maximum SMTHNG in the table is with:
SELECT MAX(SMTHNG) FROM SomeTable;
This is an aggregation without GROUP BY and hence results in one single row containing the maximum SMTHNG.
Now you also want to know how often this SMTHNG occurs and you add COUNT(MAX(SMTHNG)). This, however, does not work, because you can not aggregate an aggregate directly.
This doesn't work either:
SELECT ANY_VALUE(max_smthng), COUNT(*)
FROM (SELECT MAX(smthng) AS max_smthng FROM sometable) t;
because the sub query only contains one row, so it's too late to count.
So, either use a sub query and select from the table again:
SELECT ANY_VALUE(smthng), COUNT(*)
FROM sometable
WHERE smthng = (SELECT MAX(smthng) FROM sometable);
Or count per SMTHNG before looking for the maximum. Here is how to get the counts:
SELECT smthng, COUNT(*)
FROM sometable
GROUP BY smthng;
And the easiest way to get the maximum from this result is:
SELECT TOP(1) smthng, COUNT(*)
FROM sometable
GROUP BY smthng
ORDER BY smthng DESC;
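Counting how often the maximum value occurs can be sketched in two ways: by filtering on a MAX subquery, and by counting per value and keeping the row with the largest value. The example assumes Python's built-in sqlite3 module and invented data; SQLite has no TOP, so LIMIT 1 is used instead, and MAX(smthng) stands in for ANY_VALUE(smthng). The data is chosen so that the most frequent value (3) is not the maximum (5).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (smthng INTEGER)")
# Hypothetical data: 5 is the maximum and occurs twice; 3 occurs four times.
conn.executemany(
    "INSERT INTO sometable VALUES (?)",
    [(1,), (3,), (3,), (3,), (3,), (5,), (5,)],
)

# Variant 1: restrict the rows to the maximum first, then count them.
via_subquery = conn.execute("""
    SELECT MAX(smthng), COUNT(*)
    FROM sometable
    WHERE smthng = (SELECT MAX(smthng) FROM sometable)
""").fetchone()

# Variant 2: count per value, then keep the row with the largest value.
via_group_by = conn.execute("""
    SELECT smthng, COUNT(*)
    FROM sometable
    GROUP BY smthng
    ORDER BY smthng DESC
    LIMIT 1
""").fetchone()

print(via_subquery, via_group_by)
```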
First of all, please read my comment.
Depending on what you're trying to achieve, the statement has to be changed.
If you want to count the highest values in SMTHNG field, you may try this:
SELECT T1.SMTHNG, COUNT(T1.SMTHNG)
FROM SomeTable T1 INNER JOIN
(
SELECT MAX(SMTHNG) AS A
FROM SomeTable
) T2 ON T1.SMTHNG = T2.A
GROUP BY T1.SMTHNG;
Use a CTE like the one below, or a subquery:
with cte as
(
select count(*) as cnt ,col from table_name
group by col
) select max(cnt) from cte
You cannot nest one aggregate function inside another on the same column in a single query level.
If I perform a standard query in SQLite:
SELECT * FROM my_table
I get all records in my table, as expected. If I perform the following query:
SELECT *, 1 FROM my_table
I get all records, as expected, with the rightmost column holding '1' in every record. But if I perform the query:
SELECT *, COUNT(*) FROM my_table
I get only ONE row (whose rightmost column holds the correct count).
Why do I get this result? I'm not very good at SQL, so maybe this behavior is expected? It seems very strange and illogical to me :(.
SELECT *, COUNT(*) FROM my_table is not what you want, and it's not really valid SQL: you have to group by all the columns that are not aggregates.
You'd want something like
SELECT somecolumn,someothercolumn, COUNT(*)
FROM my_table
GROUP BY somecolumn,someothercolumn
If you want to count the number of records in your table, simply run:
SELECT COUNT(*) FROM your_table;
count(*) is an aggregate function. Aggregate functions need to be grouped to give meaningful results. You can read: count columns group by
If what you want is the total number of records in the table appended to each row you can do something like
SELECT *
FROM my_table
CROSS JOIN (SELECT COUNT(*) AS COUNT_OF_RECS_IN_MY_TABLE
FROM MY_TABLE)
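The difference between the two queries can be observed with a short sketch, assuming Python's built-in sqlite3 module and a made-up single-column table. Without GROUP BY, the aggregate collapses the whole table into one row; the cross join against a one-row count subquery keeps every record and appends the total to each. The subquery alias c is added here for portability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT)")  # hypothetical schema
conn.executemany("INSERT INTO my_table VALUES (?)", [("a",), ("b",), ("c",)])

# An aggregate without GROUP BY treats the whole table as one group,
# so SQLite returns a single row.
single = conn.execute("SELECT *, COUNT(*) FROM my_table").fetchall()

# The subquery yields exactly one row; cross joining it with the table
# attaches the total count to every record.
joined = conn.execute("""
    SELECT *
    FROM my_table
    CROSS JOIN (SELECT COUNT(*) AS count_of_recs_in_my_table
                FROM my_table) AS c
    ORDER BY name
""").fetchall()

print(len(single), joined)
```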