SQL query to retrieve the distinct value count per id

I want to resolve the issue below but cannot figure out how to do it.

You can use the distinct keyword to get the number of different values in a count expression:
SELECT sensor_id, COUNT(DISTINCT event_type) types
FROM events
GROUP BY sensor_id
ORDER BY sensor_id ASC
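
To try the COUNT(DISTINCT ...) query above without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the query; the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: names follow the query above, rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (sensor_id INTEGER, event_type INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(2, 2), (2, 2), (2, 3), (3, 4), (3, 4)],
)

# COUNT(DISTINCT event_type) counts each event type once per sensor.
distinct_counts = conn.execute(
    """
    SELECT sensor_id, COUNT(DISTINCT event_type) AS types
    FROM events
    GROUP BY sensor_id
    ORDER BY sensor_id ASC
    """
).fetchall()

print(distinct_counts)  # sensor 2 has two distinct types, sensor 3 has one
```

A plain COUNT(event_type) on the same data would return 3 and 2 instead, because duplicates are counted.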

Related

How to find AVG of Count in SQL

This is what I have
select avg(visit_count) from ( SELECT count(user_id) as visit_count from table )group by user_id;
But I get the below error
ERROR 1248 (42000): Every derived table must have its own alias
if I add alias
then I get avg for only one user_id
What I want is the avg of visit_count for all user ids
SEE the picture for reference
Example 3,2.5,1.5
It means that your subquery needs to have an alias.
Like this:
select avg(visit_count) from (
select count(user_id) as visit_count from table
group by user_id) a
Your subquery is missing an alias. I think this is the version you want:
SELECT AVG(visit_count)
FROM
(
SELECT COUNT(user_id) AS visit_count
FROM yourTable
GROUP BY user_id
) t;
Note that GROUP BY belongs inside the subquery, as you want to find counts for all users.
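
A small sketch of the same pattern, runnable with Python's sqlite3 module. The table name visits and its rows are assumptions for illustration (the original table name is not given).

```python
import sqlite3

# Hypothetical table "visits": one row per visit, keyed by user_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_id INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?)",
    [(1,), (1,), (1,), (2,), (2,), (3,)],
)

# The inner query yields one count per user (3, 2, 1);
# the outer query averages those counts. The derived table has alias t.
avg_visits = conn.execute(
    """
    SELECT AVG(visit_count)
    FROM (
        SELECT COUNT(user_id) AS visit_count
        FROM visits
        GROUP BY user_id
    ) t
    """
).fetchone()[0]

print(avg_visits)
```

With the GROUP BY moved outside the subquery, the inner query would return a single total count, which is why the original attempt did not give a per-user average.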

using date of datetime in group by and order by in single SELECT versus using subquery

I (suddenly?) have trouble getting this very simple query to work on Google BigQuery without using a subquery and not sure why.
The data contains a datetime column and I just want to check up on the number of rows per day.
However, it keeps complaining I'm using the datetime column 'which is neither grouped or aggregated'.
SELECT date(datetime_col) as row_date, count(*) as count
FROM table1
GROUP BY date(datetime_col)
ORDER BY count DESC
Without the ORDER BY it works just fine. When I add the ORDER BY it suddenly complains the
'SELECT list expression references column 'datetime_col' which is
neither grouped nor aggregated'
If I remove the count and group by and order by on the date then it does work.
Now if I use a subquery to do the date casting in there it does work:
SELECT row_date, count(row_date) as count FROM
(SELECT date(datetime_col) as row_date FROM table1)
GROUP BY row_date
ORDER BY count DESC
So I'm wondering what is going on why the first single select query is not working and if that can be fixed without using the subquery?
Try putting row_date into GROUP BY:
SELECT date(datetime_col) as row_date, count(*) as count
FROM table1
GROUP BY row_date
ORDER BY count DESC

Get minimum without using row number/window function in Bigquery

I have a table like as shown below
What I would like to do is get the minimum of each subject. Though I am able to do this with row_number function, I would like to do this with groupby and min() approach. But it doesn't work.
row_number approach - works fine
SELECT * FROM (select subject_id,value,id,min_time,max_time,time_1,
row_number() OVER (PARTITION BY subject_id ORDER BY value) AS rank
from table A) WHERE RANK = 1
min() approach - doesn't work
select subject_id,id,min_time,max_time,time_1,min(value) from table A
GROUP BY SUBJECT_ID,id
As you can see, just two columns (subject_id and id) are enough to group the items together; they differentiate the groups. But why am I not able to use the other columns in the SELECT clause? If I use the other columns, I may not get the expected output because time_1 has different values.
I expect my output to be like as shown below
In BigQuery you can use aggregation for this:
SELECT ARRAY_AGG(a ORDER BY value LIMIT 1)[SAFE_OFFSET(0)].*
FROM table A
GROUP BY SUBJECT_ID;
This uses ARRAY_AGG() to aggregate each record (the a in the argument list). ARRAY_AGG() allows you to order the result (by value) and to limit the size of the array. The latter is important for performance.
After aggregating, you want the first element of the array. The .* expands the record referred to by a into its component columns.
I'm not sure why you don't want to use ROW_NUMBER(). If the problem is the lingering rank column, you can easily remove it:
SELECT a.* EXCEPT (rank)
FROM (SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY subject_id ORDER BY value) AS rank
FROM A
) a
WHERE RANK = 1;
Are you looking for something like below-
SELECT
A.subject_id,
A.id,
A.min_time,
A.max_time,
A.time_1,
A.value
FROM table A
INNER JOIN(
SELECT subject_id, MIN(value) Value
FROM table
GROUP BY subject_id
) B ON A.subject_id = B.subject_id
AND A.Value = B.Value
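
A runnable sketch of this join-on-minimum approach, using Python's sqlite3 module rather than BigQuery. The table name readings and the sample rows are assumptions for illustration; min_time and max_time are omitted to keep it short.

```python
import sqlite3

# Hypothetical table "readings" with a subset of the question's columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (subject_id INTEGER, id INTEGER, time_1 TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [(1, 10, "t1", 5), (1, 10, "t2", 3), (2, 20, "t3", 7), (2, 20, "t4", 9)],
)

# The subquery finds the minimum value per subject; the join keeps only
# the full rows whose value equals that minimum.
min_rows = conn.execute(
    """
    SELECT a.subject_id, a.id, a.time_1, a.value
    FROM readings a
    INNER JOIN (
        SELECT subject_id, MIN(value) AS min_value
        FROM readings
        GROUP BY subject_id
    ) b ON a.subject_id = b.subject_id AND a.value = b.min_value
    ORDER BY a.subject_id
    """
).fetchall()

print(min_rows)
```

Note that if two rows of the same subject tie on the minimum value, this approach returns both of them.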
If you do not need to select the time_1 column, the following query will work (as far as I can see, the values in min_time and max_time are the same within each group):
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
--A.time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time
Finally, the best approach, if it is acceptable for your data, is to apply something like CAST(time_1 AS DATE) to your time column. This considers only the date part and ignores the time part. The query would be:
SELECT
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE) Time_1,
MIN(A.value)
FROM table A
GROUP BY
A.subject_id,A.id,A.min_time,A.max_time,
CAST(A.time_1 AS DATE)
-- Check that the CAST ... AS DATE syntax in BigQuery
-- matches what I have written here; it may differ slightly.
The query below is for BigQuery Standard SQL and is the most efficient way to handle cases like the one in your question:
#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY value LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY subject_id
Using ROW_NUMBER is not efficient and in many cases leads to a "Resources exceeded" error.
Note: a self join is also a very inefficient way of achieving your objective.
A bit late to the party, but here is a cte-based approach which made sense to me:
with mins as (
select subject_id, id, min(value) as min_value
from table
group by subject_id, id
)
select distinct t.subject_id, t.id, t.time_1, t.min_time, t.max_time, m.min_value
from table t
join mins m on m.subject_id = t.subject_id and m.id = t.id and m.min_value = t.value

Postgres another issue with "column must appear in the GROUP BY clause or be used in an aggregate function"

I have 2 tables in my Postgres database.
vehicles
- veh_id PK
- veh_number
positions
- position_id PK
- vehicle_id FK
- time
- latitude
- longitude
.... few more fields
I have multiple entries in the positions table for every vehicle. I would like to get only the newest position for each vehicle (the row where the time field is latest). I tried a query like this:
SELECT *
FROM positions
GROUP BY vehicle_id
ORDER BY time DESC
But there's an error:
column "positions.position_id" must appear in the GROUP BY clause or be used in an aggregate function
I tried to change it to:
SELECT *
FROM positions
GROUP BY vehicle_id, position_id
ORDER BY time DESC
but then it doesn't group entries.
I tried to find similar problems, e.g.:
PostgreSQL - GROUP BY clause or be used in an aggregate function
or
GroupingError: ERROR: column must appear in the GROUP BY clause or be used in an aggregate function
but they didn't really help with my problem.
Could you help me fix my query?
It's simple: if you have columns in the SELECT list, they must also appear in the GROUP BY clause unless they are wrapped in an aggregate function.
Also, don't use *; use the column names.
SELECT col1, col2, MAX(col3), COUNT(col4), AVG(col5) -- aggregated columns
-- don't go in GROUP BY
FROM yourTable
GROUP BY col1, col2 -- all non-aggregated fields
Now regarding your query, looks like you want
SELECT *
FROM (
SELECT * ,
row_number() over (partition by vehicle_id order by time desc) rn
FROM positions
) t
WHERE t.rn = 1;
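
A minimal sketch of the ROW_NUMBER() query, run against SQLite through Python's sqlite3 module instead of Postgres. It assumes an SQLite build with window function support (3.25 or later); the sample rows are made up, and explicit columns are selected so the helper column rn does not appear in the output.

```python
import sqlite3

# Hypothetical sample rows for the positions table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE positions ("
    "position_id INTEGER PRIMARY KEY, vehicle_id INTEGER, "
    "time TEXT, latitude REAL, longitude REAL)"
)
conn.executemany(
    "INSERT INTO positions VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "2020-01-01 10:00", 50.0, 19.0),
        (2, 1, "2020-01-01 11:00", 50.1, 19.1),
        (3, 2, "2020-01-01 09:00", 51.0, 20.0),
    ],
)

# Number each vehicle's rows from newest to oldest, then keep row 1.
latest = conn.execute(
    """
    SELECT position_id, vehicle_id, time
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY vehicle_id ORDER BY time DESC) AS rn
        FROM positions
    ) t
    WHERE t.rn = 1
    ORDER BY vehicle_id
    """
).fetchall()

print(latest)
```

No GROUP BY is needed here: the window function ranks rows within each vehicle, and the outer WHERE picks the newest one.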
try to use this group by clause
GROUP BY position_id,vehicle_id
primary key then FK

how to find the highest number of items in a table

I'm using SQL Server 2012. I have entries created on different dates, and I wish to find the date on which I have the maximum number of entries. Will using max() help me?
Use a GROUP BY clause: group your results by date, order by the count, and select the top row.
select top 1 entry_date from entries group by entry_date order by count(*) desc
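
The same idea can be tried locally with Python's sqlite3 module. This is a sketch under assumptions: SQLite has no TOP, so LIMIT 1 is swapped in for SQL Server's TOP 1, and the sample dates are made up.

```python
import sqlite3

# Hypothetical entries table: one row per entry, with its creation date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (entry_date TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?)",
    [
        ("2020-01-01",),
        ("2020-01-02",), ("2020-01-02",), ("2020-01-02",),
        ("2020-01-03",), ("2020-01-03",),
    ],
)

# Group by date, sort by the number of rows per date, keep the first row.
# LIMIT 1 here plays the role of TOP 1 in SQL Server.
busiest_date = conn.execute(
    """
    SELECT entry_date
    FROM entries
    GROUP BY entry_date
    ORDER BY COUNT(*) DESC
    LIMIT 1
    """
).fetchone()[0]

print(busiest_date)
```

If several dates tie for the highest count, only one of them is returned; which one is not defined unless a second ORDER BY key is added.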