Excluding fields while using Over() function in SQL

I am using a count(field_name) Over(partition by field_name) function in sql and wanted to only show the values greater than 1. I found that I cannot use an aggregate function in the where or having clause and was hoping there was another way, short of writing a function as I do not have write privileges.
Any help would be greatly appreciated.
Thanks

COUNT excludes nulls, so instead of counting the column itself, you can add some logic there by introducing a case expression:
COUNT(CASE WHEN field_name > 1 THEN field_name ELSE NULL END) OVER (PARTITION BY field_name) f
EDIT:
I seem to have misunderstood the original question. If you want to filter out the results of the count function, you'll need a subquery:
SELECT office, cnt
FROM (SELECT office, COUNT(office) OVER(PARTITION BY office) cnt
FROM my_table) t
WHERE cnt > 1
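To try the subquery approach without a database at hand, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (needed for window functions), and the sample offices are made up for illustration; only the table and column names come from the query above.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the query above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (office TEXT)")
conn.executemany(
    "INSERT INTO my_table (office) VALUES (?)",
    [("London",), ("London",), ("Paris",), ("Rome",), ("Rome",), ("Rome",)],
)

# The window count is computed inside the derived table,
# so the outer WHERE clause is free to filter on it.
rows = conn.execute(
    """
    SELECT office, cnt
    FROM (SELECT office, COUNT(office) OVER (PARTITION BY office) AS cnt
          FROM my_table) t
    WHERE cnt > 1
    ORDER BY office
    """
).fetchall()

for row in rows:
    print(row)
```

Paris appears only once, so it is the only office dropped; every row of the other offices is kept along with its count.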

Related

SQL Server Count field values without merge

How do I create a COUNT column to count the repetitive values?
And I want to keep the table EXACTLY as below but add the last column (count_id).
The values at the left come from a JOIN so they are "equal".
Thanks! (I tried a lot)
You just want count(*) as a window function:
select t.*,
count(*) over (partition by id, name, department) as count_id
from t;
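A small sketch of what this does, run through Python's sqlite3 module (assuming SQLite 3.25+). The rows are invented, since the original table is not shown; the point is that every input row survives and simply gains a count_id column.

```python
import sqlite3

# Hypothetical rows standing in for the joined result in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "Ann", "Sales"), (1, "Ann", "Sales"), (2, "Bob", "IT"), (1, "Ann", "Sales")],
)

# Unlike GROUP BY, the window function does not collapse rows:
# each row is returned with the size of its (id, name, department) group.
rows = conn.execute(
    """
    SELECT t.*,
           COUNT(*) OVER (PARTITION BY id, name, department) AS count_id
    FROM t
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```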

How to get first and last record from same group in SQL Server?

I'm a new SQL user and need help.
Let's say I have a vehicle number 123 and I've traveled from Region 3 to final destination Region 4. In between, I've visited Region 1 and 5 as well but that's not my concern.
Simple example would be as follow.
Original Table
Desired Output
How can this be done in SQL query?
You have a sequence number so you can use some form of aggregation. One method is:
select records,
max(case when sequence = 1 then fromregion end) as fromregion,
max(case when sequence = max_sequence then toregion end) as toregion
from (select t.*, max(sequence) over (partition by records) as max_sequence
from t
) t
group by records;
Unfortunately, SQL Server doesn't offer "first()" or "last()" as aggregation functions. But it does support first_value() as a window function. This allows you to do the logic without a subquery:
select distinct records,
first_value(fromRegion) over (partition by records order by sequence) as fromregion,
first_value(toRegion) over (partition by records order by sequence desc) as toregion
from t;
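Here is a rough way to check both approaches against each other, using Python's sqlite3 module (assuming SQLite 3.25+, which supports first_value()). The trip data is invented to match the description in the question: vehicle 123 goes from Region 3 to Region 4 via Regions 1 and 5. SQL Server itself is not involved here, so treat this only as a sanity check of the logic.

```python
import sqlite3

# Hypothetical trip legs; column names follow the queries above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (records INTEGER, sequence INTEGER, "
    "fromregion INTEGER, toregion INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(123, 1, 3, 1), (123, 2, 1, 5), (123, 3, 5, 4), (456, 1, 2, 6)],
)

# Conditional aggregation: pick fromregion at the first sequence
# and toregion at the highest sequence of each group.
agg_rows = conn.execute(
    """
    SELECT records,
           MAX(CASE WHEN sequence = 1 THEN fromregion END) AS fromregion,
           MAX(CASE WHEN sequence = max_sequence THEN toregion END) AS toregion
    FROM (SELECT t.*, MAX(sequence) OVER (PARTITION BY records) AS max_sequence
          FROM t) t
    GROUP BY records
    ORDER BY records
    """
).fetchall()

# first_value() over ascending and descending order, de-duplicated with DISTINCT.
window_rows = conn.execute(
    """
    SELECT DISTINCT records,
           first_value(fromregion) OVER (PARTITION BY records ORDER BY sequence) AS fromregion,
           first_value(toregion) OVER (PARTITION BY records ORDER BY sequence DESC) AS toregion
    FROM t
    ORDER BY records
    """
).fetchall()

print(agg_rows)
print(window_rows)
```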

SQL Oracle Find who has the max : having count(*) >= all(..) VS having count(*) = (select max(count(*))

I have to find which person is the most present in a table.
I have two working solutions but I don't know if there is any difference between them, and which one to prefer.
Solution 1 : using all
select numPerson
From nameTable
Group by numPerson
Having count(*)>= all(select count(*) from nameTable group by numPerson);
Solution 2 : using max
select numPerson
From nameTable
Group by numPerson
Having count(*)= (select max(count(*)) from nameTable group by numPerson);
There are better ways to write this using window functions. I'm not a fan of the second syntax, nested aggregation functions, because it uses bespoke Oracle functionality -- but Oracle aficionados probably love it. I would write that as:
having count(*)= (select max(cnt) from (select count(*) as cnt from nameTable group by numPerson) p);
But that is irrelevant to your question. In your having clause the two are equivalent.
However, there are differences:
The first involves no rows being returned in the subquery:
The all form will return all rows when the subquery returns no values.
The aggregation form will return no rows when the subquery returns no values.
The second involves null values in column used in the all:
The all returns no rows.
The aggregation ignores the NULL values if there are any non-NULL values.
Because of the nature of your query, "all rows when the subquery returns no values" is the same as "no rows". And count(*) never returns NULL. So the two are equivalent.
select stats_mode(numperson) from nametable
The above will return only one row. Your solutions will return more than one row in case of ties. If you accept Oracle window functions, you can get the ties yet avoid the second "full table scan":
select numPerson from (
select numPerson, count(*) cnt, max(count(*)) over() max_cnt
from nameTable
group by numPerson
)
where cnt = max_cnt;
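The tie behaviour can be illustrated with a short sketch in Python's sqlite3 module (assuming SQLite 3.25+). This is SQLite, not Oracle, so stats_mode and the ALL form are not exercised; the sketch only compares the window-function query with the derived-table HAVING form shown earlier. The data is made up so that persons 1 and 2 tie.

```python
import sqlite3

# Hypothetical data: persons 1 and 2 both appear twice, person 3 once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nameTable (numPerson INTEGER)")
conn.executemany(
    "INSERT INTO nameTable VALUES (?)", [(1,), (1,), (2,), (2,), (3,)]
)

# Window version: group once, then compare each count with the overall maximum.
window_rows = conn.execute(
    """
    SELECT numPerson FROM (
        SELECT numPerson, COUNT(*) AS cnt, MAX(COUNT(*)) OVER () AS max_cnt
        FROM nameTable
        GROUP BY numPerson
    )
    WHERE cnt = max_cnt
    ORDER BY numPerson
    """
).fetchall()

# HAVING version with a derived table instead of nested aggregates.
having_rows = conn.execute(
    """
    SELECT numPerson
    FROM nameTable
    GROUP BY numPerson
    HAVING COUNT(*) = (SELECT MAX(cnt)
                       FROM (SELECT COUNT(*) AS cnt
                             FROM nameTable
                             GROUP BY numPerson) p)
    ORDER BY numPerson
    """
).fetchall()

print(window_rows)
print(having_rows)
```

Both queries return the two tied persons, which is the difference from a single-row result.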

SQL, Impala: why can't I do two counts on one query

I tried to do two counts for different columns in my query:
select count(distinct color) as cid,
count(distinct entity) as eid from my_table
The above query wouldn't work with the following errors:
SQLException: [Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException:
all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT color); deviating function: count(DISTINCT entity)
), Query: select count(distinct color) as cid,
count(distinct entity) as eid from my_table
However, if I just do one count the query would work. Why is that? Is it possible for me to do two counts in one query?
Thanks!
Impala does not currently support multiple count distinct expressions within the same query, see IMPALA-110. This is a requested feature, but is surprisingly hard to implement so hasn't been added yet.
For now, if you do not need precise accuracy, you can produce an estimate of the distinct values for a column by specifying NDV(column); a query can contain multiple instances of NDV(column). To make Impala automatically rewrite COUNT(DISTINCT) expressions to NDV(), enable the APPX_COUNT_DISTINCT query option (see the documentation).
An update on this - Impala 3.1 (released Nov 2018) adds support for multiple distinct aggregate functions in a new query block.
I'm not 100% sure this will work in Impala, but you can do count(distinct) using window functions and conditional aggregation. So, this query:
select count(distinct color) as cid,
count(distinct entity) as eid
from my_table ;
is equivalent to:
select sum(case when seqnum_color = 1 then 1 else 0 end) as cid,
sum(case when seqnum_entity = 1 then 1 else 0 end) as eid
from (select t.*,
row_number() over (partition by color order by color) as seqnum_color,
row_number() over (partition by entity order by entity) as seqnum_entity
from my_table t
) t;
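A quick way to check the rewrite is to run both forms side by side. The sketch below uses Python's sqlite3 module (assuming SQLite 3.25+), not Impala, so it says nothing about whether Impala accepts the query; the rows are invented and contain no NULLs.

```python
import sqlite3

# Hypothetical data: 3 distinct colors, 2 distinct entities, no NULLs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (color TEXT, entity TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("red", "a"), ("red", "b"), ("blue", "a"), ("green", "a")],
)

# Reference result using two COUNT(DISTINCT ...) expressions.
distinct_counts = conn.execute(
    "SELECT COUNT(DISTINCT color) AS cid, COUNT(DISTINCT entity) AS eid FROM my_table"
).fetchone()

# Rewrite: number the rows within each value and count the rows numbered 1.
window_counts = conn.execute(
    """
    SELECT SUM(CASE WHEN seqnum_color = 1 THEN 1 ELSE 0 END) AS cid,
           SUM(CASE WHEN seqnum_entity = 1 THEN 1 ELSE 0 END) AS eid
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY color ORDER BY color) AS seqnum_color,
                 ROW_NUMBER() OVER (PARTITION BY entity ORDER BY entity) AS seqnum_entity
          FROM my_table t) t
    """
).fetchone()

print(distinct_counts, window_counts)
```

One caveat: COUNT(DISTINCT) ignores NULLs, whereas the row_number() form treats NULL as its own partition and counts it once, so the two only agree when the columns contain no NULLs (or the CASE also checks that the column is not null).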

SQL Server: How can I use the COUNT clause without GROUPing?

I'm looking to get two things from a query, given a set of constraints:
The first match
The total number of matches
I can get the first match by:
SELECT TOP 1
ID,
ReportYear,
Name,
SignDate,
...
FROM Table
WHERE
...
ORDER BY ... -- I can put in here which one I want to take
And then I can get the match count if I use
SELECT
MIN(ID),
MIN(ReportYear),
MIN(Name),
MIN(SignDate),
... ,
COUNT(*) as MatchCount
FROM Table
WHERE
...
GROUP BY
??? -- I don't really want any grouping
I really want to avoid both grouping and using an aggregate function on all my results. This question SQL Server Select COUNT without using aggregate function or group by suggests the answer would be
SELECT TOP 1
ID,
ReportYear,
Name,
SignDate,
... ,
@@ROWCOUNT as MatchCount
FROM Table
This works without the TOP 1, but when it's in there, @@ROWCOUNT = number of rows returned, which is 1. How can I get essentially the output of COUNT(*) (whatever's left after the where clause) without any grouping or need to aggregate all the columns?
What I don't want to do is repeat each of these twice, once for the first row and then again for the @@ROWCOUNT. I'm not finding a way I can properly use GROUP BY, because I strictly want the number of items that match my criteria, and I want columns that if I GROUPed them would throw this number off - unless I'm misunderstanding GROUP BY.
Assuming you are using a newish version of SQL Server (2008+ from memory) then you can use analytic functions.
Simplifying things somewhat, they are a way of doing an aggregate over a set of data instead of a group - an extension on basic aggregates.
Instead of this:
SELECT
... ,
COUNT(*) as MatchCount
FROM Table
WHERE
...
You do this:
SELECT
... ,
COUNT(*) OVER (PARTITION BY <group fields>) AS MatchCount -- an ORDER BY <order fields> inside OVER would make this a running count (where supported)
FROM Table
WHERE
...
Without actually running some code, I can't recall exactly which aggregates you can't use in this fashion. COUNT is fine, though.
Well, you can use the OVER clause, which turns COUNT into a window function.
SELECT TOP (1)
OrderID, CustID, EmpID,
COUNT(*) OVER() AS MatchCount
FROM Sales.Orders
WHERE OrderID % 2 = 1
ORDER BY OrderID DESC
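The same idea can be tried out with Python's sqlite3 module (assuming SQLite 3.25+). SQLite has no TOP, so this sketch uses LIMIT 1 as a stand-in, and the Orders table and its rows are made up. The window count is evaluated over all rows that pass the WHERE clause, before the result is cut down to one row.

```python
import sqlite3

# Hypothetical stand-in for Sales.Orders with six invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, CustID INTEGER, EmpID INTEGER)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(i, 9 + i, 99 + i) for i in range(1, 7)],
)

# COUNT(*) OVER () counts every row left after WHERE;
# LIMIT 1 (TOP (1) in SQL Server) then keeps only the first match.
first_match = conn.execute(
    """
    SELECT OrderID, CustID, EmpID,
           COUNT(*) OVER () AS MatchCount
    FROM Orders
    WHERE OrderID % 2 = 1
    ORDER BY OrderID DESC
    LIMIT 1
    """
).fetchone()

print(first_match)
```

Three orders have an odd OrderID, so the single returned row (the highest odd OrderID) carries a MatchCount of 3 with no GROUP BY and no second query.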
Try the following query:
select top 1
*, count(*) over () rowsCount
from
(
select
*, dense_rank() over (order by ValueForOrder) n
from
myTable
) t
where
n = 1