Number of IDs selected - sql

In the following sql fiddle, how would I change the view to get the desired output?
http://sqlfiddle.com/#!6/a737a/1
VIEW
select
sum(dollars) as totalDollars,
sum(dollars)/count(id) as factor,
count(id) as numberOfEvents,
id as eventID,
event_date
from
events
group by
id,
event_date
Query
select
*
from eventStats
where
event_date between '1/1/2015' and '1/16/2015'
desired output
numberOfEvents should equal 2: the actual number of distinct events selected by the where clause in the query, not the number of records for each event. The view needs that value to do the math properly.

You can count distinct fk_id without the group by clause:
select count(distinct fk_id) as number_of_IDs
from [myTable]
where [someCondition]

Use distinct keyword in count function:
select
count(distinct fk_id) as number_of_IDs
,id
from
myTable
where
someCondition
group by
id
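As a rough illustration of the count(distinct ...) idea, here is a minimal sketch run against an in-memory SQLite database. The rows are made up (the fiddle's data is not reproduced here), the dates use ISO format so that between compares correctly on text, and event_stats is a hypothetical helper name.

```python
import sqlite3

# Hypothetical sample data: two events inside the date range, one outside.
conn = sqlite3.connect(":memory:")
conn.execute("create table events (id integer, dollars real, event_date text)")
conn.executemany(
    "insert into events values (?, ?, ?)",
    [
        (1, 10.0, "2015-01-02"),
        (1, 20.0, "2015-01-02"),
        (2, 30.0, "2015-01-10"),
        (2, 40.0, "2015-01-10"),
        (3, 50.0, "2015-02-01"),  # falls outside the queried range
    ],
)


def event_stats(start, end):
    """Return (totalDollars, numberOfEvents, factor) for the date range."""
    return conn.execute(
        """
        select sum(dollars)                      as totalDollars,
               count(distinct id)                as numberOfEvents,
               sum(dollars) / count(distinct id) as factor
        from events
        where event_date between ? and ?
        """,
        (start, end),
    ).fetchone()


print(event_stats("2015-01-01", "2015-01-16"))
```

Because the date filter is applied before the aggregation, the distinct count reflects only the events that fall inside the range.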

Related

BigQuery - Extract last entry of each group

I have one table where multiple records are inserted for each group of products. Now I want to extract (SELECT) only the last entry of each group. For more detail, see the screenshot: the yellow highlighted records should be returned by the select query.
The HAVING MAX and HAVING MIN clauses for the ANY_VALUE function are now in preview.
HAVING MAX and HAVING MIN were just introduced for some aggregate functions: https://cloud.google.com/bigquery/docs/release-notes#February_06_2023
With them the query can be very simple. Consider the approach below:
select any_value(t having max datetime).*
from your_table t
group by t.id, t.product
if applied to sample data in your question - output is
You might consider below as well
SELECT *
FROM sample_table
QUALIFY DateTime = MAX(DateTime) OVER (PARTITION BY ID, Product);
If you're more familiar with aggregate functions than with window functions, the query below might be another option.
SELECT ARRAY_AGG(t ORDER BY DateTime DESC LIMIT 1)[SAFE_OFFSET(0)].*
FROM sample_table t
GROUP BY t.ID, t.Product
You can use a window function to partition by the grouping key and pick the required row according to an ORDER BY field.
For example:
select * from (
select *,
rank() over (partition by product order by DateTime desc) as rnk
from `project.dataset.table`)
where rnk = 1
You can use this query to select the last record of each group:
select * from Tablename qualify row_number() over (partition by ID order by DateTime desc) = 1
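The window-function approach from the answers above can be tried locally. The sketch below uses SQLite (3.25 or later, for window functions) rather than BigQuery, so QUALIFY is replaced by a derived table. The rows and the Qty column are invented for illustration, and last_entries is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table sample_table (ID integer, Product text, DateTime text, Qty integer)"
)
# Hypothetical rows: several entries per (ID, Product) group.
conn.executemany(
    "insert into sample_table values (?, ?, ?, ?)",
    [
        (1, "A", "2023-01-01 10:00:00", 5),
        (1, "A", "2023-01-02 09:00:00", 7),
        (1, "B", "2023-01-01 08:00:00", 1),
        (2, "A", "2023-01-03 12:00:00", 3),
        (2, "A", "2023-01-03 18:00:00", 4),
    ],
)


def last_entries():
    """Return the latest row of each (ID, Product) group."""
    return conn.execute(
        """
        select ID, Product, DateTime, Qty
        from (
            select t.*,
                   row_number() over (
                       partition by ID, Product
                       order by DateTime desc
                   ) as rn
            from sample_table t
        ) as ranked
        where rn = 1
        order by ID, Product
        """
    ).fetchall()


for row in last_entries():
    print(row)
```

row_number() returns exactly one row per group even when two rows share the same DateTime, whereas rank() would return all tied rows.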

best way to get count and distinct count of rows in single query

What is the best way to get count of rows and distinct rows in a single query?
To get distinct count we can use subquery like this:
select count(*) from
(
select distinct * from table
)
I have 15+ columns and have many duplicates rows as well and I want to calculate count of rows as well as distinct count of rows in one query.
Moreover, if I use this:
select count(*) as Rowcount , count(distinct *) as DistinctCount from table
This will not give accurate results as count(distinct *) doesn't work.
Why don't you just put the subquery inside another query?
select count(*) as row_count,
(select count(*) from (select distinct * from table) as d) as distinct_count
from table;
create table tbl
(
col int
);
insert into tbl values(1),(2),(1),(3);
select count(*) as distinct_count, sum(sum) as all_count
from (
select count(col) sum from tbl group by col
)A
I think I understand what you are looking for. You need a window function, so your query should look like this:
Select COUNT(*) OVER() YourRowcount ,
COUNT(*) OVER(Partition BY YourColumnofGroup) YourDistinctCount -- rows in each group; the starting point for the distinct count
FROM Yourtable
Update:
select top 1
COUNT(*) OVER() YourRowcount,
DENSE_RANK() OVER(ORDER BY YourColumn) YourDistinctCount
FROM Yourtable ORDER BY YourDistinctCount DESC
Note: This code is written for SQL Server. Please check the code and let me know.
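For a quick check of the subquery approach, the sketch below counts all rows and distinct rows in one statement on SQLite. The two-column table and its rows are made up, and row_counts is a hypothetical helper name. Other databases may need a FROM clause or a different alias syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (col1 integer, col2 text)")
# Hypothetical rows: (1, 'a') appears twice, so 5 rows but 4 distinct rows.
conn.executemany(
    "insert into tbl values (?, ?)",
    [(1, "a"), (2, "b"), (1, "a"), (3, "c"), (1, "x")],
)


def row_counts():
    """Return (row_count, distinct_row_count) in a single statement."""
    return conn.execute(
        """
        select (select count(*) from tbl) as row_count,
               (select count(*)
                from (select distinct * from tbl) as d) as distinct_count
        """
    ).fetchone()


print(row_counts())
```

Both counts are scalar subqueries, so the result is always a single row regardless of how many columns the table has.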

SQL Total Distinct Count on Group By Query

Trying to get an overall distinct count of the employees for a range of records which has a group by on it.
I've tried using the "over()" clause but couldn't get that to work. It's easiest to explain with an example, so please see my script and the wanted result below.
EDIT:
I should mention I'm hoping for a solution that does not use a sub-query based on my "sales_detail" table below because in my real example, the "sales_detail" table is a very complex sub-query.
Here's the result I want. Column "wanted_result" should be 9:
Sample script:
CREATE TEMPORARY TABLE [sales_detail] (
[employee] varchar(100),[customer] varchar(100),[startdate] varchar(100),[enddate] varchar(100),[saleday] int,[timeframe] varchar(100),[saleqty] numeric(18,4)
);
INSERT INTO [sales_detail]
([employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty])
VALUES
('Wendy','Chris','8/1/2019','8/12/2019','5','Afternoon','1'),
('Wendy','Chris','8/1/2019','8/12/2019','5','Morning','5'),
('Wendy','Chris','8/1/2019','8/12/2019','6','Morning','6'),
('Dexter','Chris','8/1/2019','8/12/2019','2','Mid','2.5'),
('Jennifer','Chris','8/1/2019','8/12/2019','4','Morning','2.75'),
('Lila','Chris','8/1/2019','8/12/2019','2','Morning','3.75'),
('Rita','Chris','8/1/2019','8/12/2019','2','Mid','1'),
('Tony','Chris','8/1/2019','8/12/2019','4','Mid','2'),
('Tony','Chris','8/1/2019','8/12/2019','1','Morning','6'),
('Mike','Chris','8/1/2019','8/12/2019','4','Mid','1.5'),
('Logan','Chris','8/1/2019','8/12/2019','3','Morning','6.25'),
('Blake','Chris','8/1/2019','8/12/2019','4','Afternoon','0.5')
;
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
9 AS [wanted_result]
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty]
FROM
[sales_detail]
) AS [s]
GROUP BY
[timeframe]
;
If I understand correctly, you are simply looking for a COUNT(DISTINCT) for all employees in the table? I believe this query will return the results you are looking for:
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
(SELECT COUNT(DISTINCT [employee]) FROM [sales_detail]) AS [employee_count2],
9 AS [wanted_result]
FROM [sales_detail] AS [s]
GROUP BY
[timeframe]
You can try the option below:
SELECT
[timeframe],
SUM([saleqty]) AS [total_qty],
COUNT(DISTINCT [s].[employee]) AS [employee_count1],
SUM(COUNT(DISTINCT [s].[employee])) OVER() AS [employee_count2],
[wanted_result]
-- select the count from the subquery
FROM (
SELECT
[employee],[customer],[startdate],[enddate],[saleday],[timeframe],[saleqty],
(select COUNT(DISTINCT [employee]) from [sales_detail]) AS [wanted_result]
-- calculate the count in the first subquery
FROM [sales_detail]
) AS [s]
GROUP BY
[timeframe],[wanted_result]
Use a trick where you count each person only on the first row where they are seen, then total those counts across all timeframes:
select timeframe, sum(saleqty) as total_qty,
count(distinct employee) as employee_count1,
sum(sum( (seqnum = 1)::int )) over () as employee_count2,
9 as wanted_result
from (select sd.*,
row_number() over (partition by employee order by startdate) as seqnum
from sales_detail sd
) sd
group by timeframe;
Note: From the perspective of performance, your complex subquery is only evaluated once.
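To see the first-row trick in action, here is a sketch on SQLite (3.25 or later) using the question's twelve rows, reduced to the four columns that matter. It is an adaptation, not the original dialect: a case expression stands in for the ::int cast, the work is split into two CTEs, and the tie-break order by saleday, timeframe is an assumption because every startdate in the sample is identical. timeframe_stats is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table sales_detail "
    "(employee text, saleday integer, timeframe text, saleqty real)"
)
conn.executemany(
    "insert into sales_detail values (?, ?, ?, ?)",
    [
        ("Wendy", 5, "Afternoon", 1), ("Wendy", 5, "Morning", 5),
        ("Wendy", 6, "Morning", 6), ("Dexter", 2, "Mid", 2.5),
        ("Jennifer", 4, "Morning", 2.75), ("Lila", 2, "Morning", 3.75),
        ("Rita", 2, "Mid", 1), ("Tony", 4, "Mid", 2),
        ("Tony", 1, "Morning", 6), ("Mike", 4, "Mid", 1.5),
        ("Logan", 3, "Morning", 6.25), ("Blake", 4, "Afternoon", 0.5),
    ],
)


def timeframe_stats():
    """Per-timeframe totals plus the overall distinct employee count."""
    return conn.execute(
        """
        with first_seen as (
            -- seqnum = 1 marks exactly one row per employee
            select sd.*,
                   row_number() over (
                       partition by employee order by saleday, timeframe
                   ) as seqnum
            from sales_detail sd
        ),
        per_timeframe as (
            select timeframe,
                   sum(saleqty)             as total_qty,
                   count(distinct employee) as employee_count1,
                   sum(case when seqnum = 1 then 1 else 0 end) as first_seen_here
            from first_seen
            group by timeframe
        )
        select timeframe, total_qty, employee_count1,
               sum(first_seen_here) over () as employee_count2
        from per_timeframe
        order by timeframe
        """
    ).fetchall()


for row in timeframe_stats():
    print(row)
```

Each employee contributes 1 to exactly one timeframe, so summing those per-group counts over the whole result gives the overall distinct count on every row.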

Sum two columns using alias in order by

I have this query:
select
id,
count(1) as "visits",
count(distinct visitor_id) as "visitors"
from my_table
where timestamp > '2016-01-14'
group by id
order by "visits", "visitors"
It works.
If I change to this
select
id,
count(1) as "visits",
count(distinct visitor_id) as "visitors"
from my_table
where timestamp > '2016-01-14'
group by id
order by (("visits") + ("visitors"))
I get
column "visits" does not exist
If I change to
select
id,
count(1) as "visits",
count(distinct visitor_id) as "visitors"
from my_table
where timestamp > '2016-01-14'
group by id
order by count(1) + count(distinct visitor_id)
it works again.
Why does it work for example 1 and 3, but not for example 2? Is there any way to order by the sum of two column using their aliases?
The alternatives I could think of:
Create an outer select and order it, but that would create extra code and I would like to avoid that
Recalculate the values in the order by statement. But that would make the query more complex and maybe I would lose performance due to recalculating stuff.
PS: This query is a toy-query. The real one is much more complicated. I would like to reuse the value calculated in the select statement in the order by, but all summed up together.
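The error message looks like PostgreSQL's, where an output column alias can be used in ORDER BY only when it stands alone. Inside an expression such as "visits" + "visitors", the names are resolved against the input columns of my_table, so "visits" is not found and you get the error shown above.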
Instead of using the aliases, repeat the actual aggregate expressions in the order by:
select
id,
count(1) as "visits",
count(distinct visitor_id) as "visitors"
from my_table
where timestamp > '2016-01-14'
group by id
order by count(1) + count(distinct visitor_id)
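The outer-select alternative mentioned in the question can be sketched as follows. Once the aggregates are computed in a derived table, their aliases are ordinary columns and can be combined freely in the order by. The example runs on SQLite with invented rows, and visits_ranked is a hypothetical helper name. SQLite is more lenient about aliases than some other databases, so this shows the portable form rather than reproducing the error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table my_table (id integer, visitor_id integer, timestamp text)"
)
# Hypothetical visits; the last row is filtered out by the where clause.
conn.executemany(
    "insert into my_table values (?, ?, ?)",
    [
        (1, 10, "2016-01-15"), (1, 10, "2016-01-16"), (1, 11, "2016-01-17"),
        (2, 20, "2016-01-15"),
        (3, 30, "2016-01-15"), (3, 31, "2016-01-16"),
        (3, 32, "2016-01-01"),
    ],
)


def visits_ranked():
    """Order ids by visits + visitors, reusing the aliases from a derived table."""
    return conn.execute(
        """
        select id, visits, visitors
        from (
            select id,
                   count(1)                   as visits,
                   count(distinct visitor_id) as visitors
            from my_table
            where timestamp > '2016-01-14'
            group by id
        ) as stats
        order by visits + visitors
        """
    ).fetchall()


print(visits_ranked())
```

The aggregates are written once in the inner query, so nothing is recalculated by hand in the order by.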

SQL Server aggregate function query error

My query
SELECT TOP 1 *, COUNT(*) AS totalRun
FROM history
ORDER BY starttime DESC
The expected outcome is all the data from one row in the history table, the one with the latest starttime, plus a field totalRun with the total number of records, but I get the following error.
Msg 8120, Level 16, State 1, Line 1
Column 'history.id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
What am I doing wrong?
EDIT
example of the result:
These are all the fields of the row with the latest starttime in the history table with the extra COUNT field 'totalRun'
Aggregates can only be expressed in two cases.
Where you have a GROUP BY statement
Where you use the OVER clause
The following will give you the most recent start time and the number of rows in your source table that share that start time...
SELECT
starttime,
COUNT(*) AS row_count
FROM
history
GROUP BY
starttime
ORDER BY
starttime DESC
In this structure the only fields you can select are the ones in the GROUP BY statement (and you can have several), or aggregates (such as SUM(), COUNT(), etc.).
If, however, you want the COUNT(*) to be done over the whole table, and not just the rows grouped together, you can use the OVER clause in the SELECT statement.
SELECT
*,
COUNT(*) OVER (PARTITION BY 1) AS row_count
FROM
history
ORDER BY
starttime DESC
Because this doesn't use a GROUP BY, you can then also select * rather than just the fields you are grouping by.
If you need something different, please could you include some example data and the results you would desire?
Every selected column must be either aggregated or listed in the GROUP BY. You have columns that are neither:
SELECT TOP 1
starttime, COUNT(*) AS totalRun
FROM history
GROUP BY starttime
ORDER BY starttime DESC;
If you need a column foo, then add it as follows
SELECT TOP 1
starttime, foo, COUNT(*) AS totalRun
FROM history
GROUP BY starttime, foo
ORDER BY starttime DESC, foo;
I could not understand the requirement properly. Why are you using top 1 with count(*) and without a group by clause?
If you want the totalRun for the latest starttime, you can use this query:
SELECT TOP 1 starttime, COUNT(1) AS totalRun
FROM history Group by starttime ORDER BY starttime DESC
If this is the requirement:
Estimated outcome is all the data from 1 row in the history table with the latest starttime and a fieldtotalrun with the total amount of records,
select top 1 *
from
(Select *
from history
where starttime= (select max(starttime) from history) )i
full outer join
(select count(1) as fieldtotalrun, max(starttime) as sttime
from history ) j on i.starttime=j.sttime
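The OVER-clause idea from the answers above can be combined with a row limit to get the latest row together with the table-wide count. The sketch below runs on SQLite (3.25 or later), where limit 1 plays the role of SQL Server's TOP 1. The history columns other than starttime are invented, and latest_with_total is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table history (id integer primary key, starttime text, status text)"
)
# Hypothetical run history; row 2 has the latest starttime.
conn.executemany(
    "insert into history values (?, ?, ?)",
    [
        (1, "2020-01-01 08:00:00", "ok"),
        (2, "2020-01-03 09:30:00", "failed"),
        (3, "2020-01-02 07:15:00", "ok"),
    ],
)


def latest_with_total():
    """Return the latest history row with the total row count appended."""
    return conn.execute(
        """
        select h.*, count(*) over () as totalRun
        from history h
        order by starttime desc
        limit 1
        """
    ).fetchone()


print(latest_with_total())
```

The window count is evaluated over the full result set before the limit is applied, so totalRun still reflects every row in the table.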