SQL Row_Count function with Partition - sql

I have a query that returns a set of results as a table called DATA, from several UNION ALL joined queries.
I am then doing ROW_NUMBER() on this, to get the row number for a specific grouping (WorksOrderNo)
ROW_NUMBER() Over(partition by Data.WorksOrderNo order by Data.WorksOrderNo) as RowNo,
Is there an equivalent ROW_Count function where I can specify a partition, and return the count of rows for that partition?
ROW_Count() Over(partition by Data.WorksOrderNo order by Data.WorksOrderNo) as RowNo ???
Reason being, this query is being used to drive a report layout.
As part of this, I need to format based on whether the total row count for each WorksOrderNo is >1 or not.
So for instance if there were three rows for a works order, the row_number function currently returns 1, 2 and 3, where the row count would return 3 on each row.

The function is simply COUNT(). In SQL Server, all the aggregation functions can be used as window functions, as long as they do not use DISTINCT.
Note that for the total count, you do not want the ORDER BY:
COUNT(*) Over (partition by Data.WorksOrderNo) as cnt
If you include the ORDER BY, then the COUNT() is cumulative, rather than constant for all rows in the partition.
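As a rough illustration of the difference between ROW_NUMBER() and a windowed COUNT(*), here is a minimal sketch. It runs the SQL through Python's built-in sqlite3 module rather than SQL Server, purely so it is self-contained; it assumes the bundled SQLite is version 3.25 or later (window function support). The Item column and the sample rows are invented for the example.

```python
import sqlite3

# In-memory stand-in for the DATA result set (sample rows are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data (WorksOrderNo TEXT, Item TEXT)")
conn.executemany(
    "INSERT INTO Data VALUES (?, ?)",
    [("WO1", "a"), ("WO1", "b"), ("WO1", "c"), ("WO2", "d")],
)

rows = conn.execute(
    """
    SELECT WorksOrderNo,
           Item,
           ROW_NUMBER() OVER (PARTITION BY WorksOrderNo ORDER BY Item) AS RowNo,
           COUNT(*)     OVER (PARTITION BY WorksOrderNo)               AS Cnt
    FROM Data
    ORDER BY WorksOrderNo, Item
    """
).fetchall()

for row in rows:
    print(row)
```

Each WO1 row carries Cnt = 3 while RowNo runs 1, 2, 3, so the report can test Cnt > 1 on any row of the group.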

It looks like you just need group by and count:
select WorksOrderNo, count(*) as Row_Count
from Data
group by WorksOrderNo

Related

Scalar subquery producing more than 1 record with partition - SQL

I have data with two rows as follows:
group_id item_no
weoifne 1
weoifne 2
I want to retrieve the max item_no for each group_id. I'm using this query:
SELECT MAX(item_no)
OVER (PARTITION BY group_id)
FROM my_table;
I need only one record because I'm embedding this query in a CASE WHEN statement to apply logic based on whether or not item_no is the highest value per group.
Desired Output:
2
Actual Output:
2
2
How do I modify my query to only output one record with the maximum item_no per group_id?
Use an aggregate function along with GROUP BY instead of a window function.
A window function, also known as an analytic function, computes values over a group of rows and returns a single result for each row. This is different from an aggregate function, which returns a single result for a group of rows.
https://cloud.google.com/bigquery/docs/reference/standard-sql/window-function-calls
SELECT group_id, MAX(item_no)
FROM my_table
GROUP BY group_id;
If you still want to use the window function, you can add DISTINCT to your query to get rid of the duplicates, as shown below. DISTINCT applies across all the selected columns:
SELECT DISTINCT group_id
, MAX(item_no) OVER (PARTITION BY group_id)
FROM my_table
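To see the three variants side by side, here is a small sketch using Python's sqlite3 module (assuming SQLite 3.25+ for window functions, not BigQuery). The extra 'abc' group is made up so that there is more than one group to compare.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (group_id TEXT, item_no INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("weoifne", 1), ("weoifne", 2), ("abc", 5)],  # 'abc' row is hypothetical
)

# Window function alone: one output row per input row.
windowed = conn.execute(
    "SELECT MAX(item_no) OVER (PARTITION BY group_id) FROM my_table"
).fetchall()

# Aggregate with GROUP BY: one output row per group.
grouped = conn.execute(
    "SELECT group_id, MAX(item_no) FROM my_table "
    "GROUP BY group_id ORDER BY group_id"
).fetchall()

# Window function plus DISTINCT: duplicates collapsed after the window step.
distinct_windowed = conn.execute(
    "SELECT DISTINCT group_id, MAX(item_no) OVER (PARTITION BY group_id) "
    "FROM my_table ORDER BY group_id"
).fetchall()

print(len(windowed), grouped, distinct_windowed)
```

In this sketch the GROUP BY and DISTINCT forms return the same two rows, while the bare window query returns three.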

Getting first row of each group with SQL in Microsoft Access [duplicate]

I am trying to get the first row of each group in the individual_id column, but I keep getting errors.
In the first section of the query I am just trying to SELECT the individual_id, pics, and species from my Train table and GROUP BY the individual_id:
SELECT individual_id, pics, species
FROM Train
GROUP BY individual_id
This alone throws an error saying that pics doesn't have an aggregate function, but I don't want to use an aggregate function on the data; I want the same table, just grouped.
In the second part of the query I get an error in the WITH OWNERSHIP ACCESS declaration which I don't even have.
WITH added_row_number AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY individual_id ORDER BY pics DESC) AS row_number
FROM
Train
)
SELECT
*
FROM
added_row_number
WHERE
row_number = 1;
GROUP BY means aggregation, i.e. collapsing multiple rows into a single row for each unique value of the GROUP BY expression, which is individual_id in your case. In other words, Access attempts to return one and only one row for each individual_id, but doesn't know what to do with the other columns (pics, species).
You said that you wanted the 'FIRST ROW' of each group. MS Access has a FIRST aggregate function you can use for this purpose:
SELECT individual_id, FIRST(pics) as FIRST_pics, FIRST(species) as FIRST_species
FROM Train
GROUP BY individual_id
The FIRST function does not have a way of specifying which row (of the same individual_id) is to be selected; it simply takes the first row retrieved. This is unlike ROW_NUMBER() OVER (...) in other RDBMS products, where the ORDER BY clause determines which row comes first.

Count()over() have repeated records

I often use sum() over() to calculate a cumulative value, but today I tried count() over() and the result was not what I expected. Can someone explain why the result has repeated records for the same day?
I know the regular way is to count(distinct id) with group by date and then sum() over(order by date); I am just curious about the result of "count(id) over(order by date)".
Select pre.date,count(person_id) over (order by pre.date)
From (select distinct person_id, date from events) pre
The result will be repeated records for the same day.
Because your outer query has not filtered or aggregated the results from the inner query. It returns the same number of rows.
You want aggregation:
select pre.date, count(*) as cnt_on_date,
sum(count(*)) over (order by pre.date) as running_count
from (select distinct person_id, date from events) pre
group by pre.date;
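The aggregated query above can be tried out with a small sketch like the following, which uses Python's sqlite3 module (assuming SQLite 3.25+ for window functions). The sample events are invented, including one duplicate row that the inner DISTINCT removes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (person_id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2024-01-01"),
        (1, "2024-01-01"),  # duplicate, removed by SELECT DISTINCT
        (2, "2024-01-01"),
        (3, "2024-01-02"),
    ],
)

# GROUP BY collapses each date to one row; the window SUM then
# accumulates the per-date counts in date order.
running = conn.execute(
    """
    SELECT pre.date,
           COUNT(*)                                AS cnt_on_date,
           SUM(COUNT(*)) OVER (ORDER BY pre.date)  AS running_count
    FROM (SELECT DISTINCT person_id, date FROM events) pre
    GROUP BY pre.date
    ORDER BY pre.date
    """
).fetchall()

print(running)
```

There is exactly one row per date here because the GROUP BY runs before the window function is evaluated.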
Almost all analytical functions, except row_number() which comes to mind, do not differentiate ties for the same value of columns in order by clause. In some documentation it is stated directly:
Oracle
If you specify a logical window with the RANGE keyword, then the function returns the same result for each of the rows
Postgresql
By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause.
MySQL
With 'ORDER BY': The default frame includes rows from the partition start through the current row, including all peers of the current row (rows equal to the current row according to the ORDER BY clause).
But in general, adding ORDER BY to the analytic clause implicitly sets the window specification to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The window calculation is made for each row over its defined window, and with the default RANGE frame, rows that have the same value in the ORDER BY columns fall into the same window and therefore produce the same result. So to get a real running total, you need either an explicit ROWS BETWEEN frame or a more detailed (tie-breaking) column in the ORDER BY part of the analytic clause. Functions that do not support a windowing clause are exceptions to this rule, but this is sometimes not documented directly, so I will not try to list them here. Functions that can also be used as aggregates are generally not exceptions and produce the same value for tied rows.
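The effect of the default RANGE frame versus an explicit ROWS frame can be checked with a short sketch. It uses Python's sqlite3 module (assuming SQLite 3.25+); the table name pre and its rows are hypothetical stand-ins for the de-duplicated subquery in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pre (person_id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO pre VALUES (?, ?)",
    [(1, "2024-01-01"), (2, "2024-01-01"), (3, "2024-01-02")],
)

# Default frame with ORDER BY: RANGE ... CURRENT ROW, so tied dates are peers
# and both rows for 2024-01-01 see the same count.
range_frame = conn.execute(
    """
    SELECT date, COUNT(person_id) OVER (ORDER BY date) AS cnt
    FROM pre
    ORDER BY date, cnt
    """
).fetchall()

# Explicit ROWS frame: each physical row extends the window by one.
# Which of the tied rows gets 1 and which gets 2 is arbitrary.
rows_frame = conn.execute(
    """
    SELECT date,
           COUNT(person_id) OVER (
               ORDER BY date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS cnt
    FROM pre
    ORDER BY date, cnt
    """
).fetchall()

print(range_frame)
print(rows_frame)
```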

Group by or Distinct - But several fields

How can I use a DISTINCT or GROUP BY statement on one field while selecting all, or at least several, columns?
Example: Using SQL SERVER!
SELECT id_product,
description_fr,
DiffMAtrice,
id_mark,
id_type,
NbDiffMatrice,
nom_fr,
nouveaute
From C_Product_Tempo
And I want Distinct or Group By nom_fr
JUST GOT THE ANSWER:
select id_product, description_fr, DiffMAtrice, id_mark, id_type, NbDiffMatrice, nom_fr, nouveaute
from (
SELECT rn = row_number() over (partition by [nom_fr] order by id_mark)
, id_product, description_fr, DiffMAtrice, id_mark, id_type, NbDiffMatrice, nom_fr, nouveaute
From C_Product_Tempo
) d
where rn = 1
And this works perfectly!
If I'm understanding you correctly, you just want the first row per nom_fr. If so, you can simply use a subquery to get the lowest id_product per nom_fr, and just get the corresponding rows;
SELECT * FROM C_Product_Tempo WHERE id_product IN (
SELECT MIN(id_product) FROM C_Product_Tempo GROUP BY nom_fr
);
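Note that the MIN(id_product) subquery and the ROW_NUMBER() query ordered by id_mark can pick different rows for the same nom_fr. The sketch below shows this using Python's sqlite3 module (assuming SQLite 3.25+, not SQL Server); the table is cut down to three columns, the sample data is invented, and the T-SQL `rn = row_number()` alias is written with AS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE C_Product_Tempo (id_product INTEGER, nom_fr TEXT, id_mark INTEGER)"
)
conn.executemany(
    "INSERT INTO C_Product_Tempo VALUES (?, ?, ?)",
    [(1, "A", 20), (2, "A", 10), (3, "B", 5)],  # hypothetical rows
)

# First row per nom_fr, where "first" means lowest id_mark.
by_row_number = conn.execute(
    """
    SELECT id_product, nom_fr, id_mark
    FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY nom_fr ORDER BY id_mark) AS rn,
               id_product, nom_fr, id_mark
        FROM C_Product_Tempo
    ) d
    WHERE rn = 1
    ORDER BY nom_fr
    """
).fetchall()

# First row per nom_fr, where "first" means lowest id_product.
by_min_id = conn.execute(
    """
    SELECT id_product, nom_fr, id_mark
    FROM C_Product_Tempo
    WHERE id_product IN (
        SELECT MIN(id_product) FROM C_Product_Tempo GROUP BY nom_fr
    )
    ORDER BY nom_fr
    """
).fetchall()

print(by_row_number)
print(by_min_id)
```

For nom_fr 'A' the two queries return different products, so the choice depends on which column defines "first".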
You need to decide what to do with the other fields. For example, for numeric fields, do you want a sum? Average? Max? Min? For non-numeric fields, do you want the values from a particular record if there is more than one with the same nom_fr?
Some SQL Systems allow you to get a "random" record when you do a GROUP BY, but SQL Server will not - you must define the proper aggregation for columns that are not in the GROUP BY.
GROUP BY is used to group in conjunction with an aggregate function (see http://www.w3schools.com/sql/sql_groupby.asp), so there is no use in grouping without counting, summing, etc. DISTINCT eliminates duplicate rows, but I can't see how that would work with the other columns you want to extract, because some rows would be removed from the result.

Teradata - limiting the results using TOP

I am trying to fetch a huge set of records from Teradata using JDBC. And I need to break this set into parts for which I'm using "Top N" clause in select.
But I don't know how to set the "Offset" the way we do in MySQL -
SELECT * FROM tbl LIMIT 5,10
so that next select statement would fetch me the records from (N+1)th position.
RANK and QUALIFY, I believe, are your friends here
for example
SEL RANK(custID), custID
FROM mydatabase.tblcustomer
QUALIFY RANK(custID) < 1000 AND RANK(custID) > 900
ORDER BY custID;
RANK(field) will (conceptually) retrieve all the rows of the resultset,
order them by the ORDER BY field and assign an incrementing rank ID to them.
QUALIFY allows you to slice that by limiting the rows returned to the qualification expression, which now can legally view the RANKs.
To be clear, I am returning the 900th-1000th rows of the query "select all from customers",
NOT returning customers with IDs between 900 and 1000.
You can also use the ROW_NUMBER window aggregate on Teradata.
SELECT ROW_NUMBER() OVER (ORDER BY custID) AS RowNum_
, custID
FROM myDatabase.myCustomers
QUALIFY RowNum_ BETWEEN 900 and 1000;
Unlike the RANK window aggregate, ROW_NUMBER will give you a unique sequence regardless of whether the column you are ordering by (within the optional partition) is unique or not.
Just another option to consider.
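The ROW_NUMBER paging idea can be sketched outside Teradata as well. The example below uses Python's sqlite3 module (assuming SQLite 3.25+); SQLite has no QUALIFY clause, so the row-number filter moves to an outer query. The table contents and the page bounds 3 to 5 are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myCustomers (custID INTEGER)")
# Hypothetical IDs 10, 20, ..., 100 so row positions differ from ID values.
conn.executemany(
    "INSERT INTO myCustomers VALUES (?)",
    [(i * 10,) for i in range(1, 11)],
)

first_row, last_row = 3, 5  # page bounds, chosen arbitrarily

# Number the rows in custID order, then keep only the requested slice.
page = conn.execute(
    """
    SELECT RowNum_, custID
    FROM (
        SELECT ROW_NUMBER() OVER (ORDER BY custID) AS RowNum_, custID
        FROM myCustomers
    ) numbered
    WHERE RowNum_ BETWEEN ? AND ?
    ORDER BY RowNum_
    """,
    (first_row, last_row),
).fetchall()

print(page)
```

The slice is by row position (3rd to 5th row), not by custID value, which matches the distinction made earlier about RANK.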