Getting first row of each group with SQL in Microsoft Access [duplicate] - sql

I am trying to get the first row of each group in the individual_id column, but I keep getting errors.
In the first section of the query I am just trying to SELECT the individual_id, pics, and species from my Train table and GROUP BY the individual_id:
SELECT individual_id, pics, species
FROM Train
GROUP BY individual_id
This alone throws an error saying that pics is not part of an aggregate function. I don't want to apply an aggregate function to the data, though; I want the same table, just grouped.
In the second part of the query I get an error in the WITH OWNERSHIP ACCESS declaration which I don't even have.
WITH added_row_number AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY individual_id ORDER BY pics DESC) AS row_number
FROM
Train
)
SELECT
*
FROM
added_row_number
WHERE
row_number = 1;

GROUP BY means aggregation, i.e. collapsing multiple rows into a single row for each unique value of the GROUP BY expression, which is individual_id in your case. In other words, Access attempts to return one and only one row for each individual_id, but doesn't know what to do with the other columns, pics and species.
You said that you wanted the 'FIRST ROW' of each group. MsAccess has a FIRST aggregation function you can use for this purpose:
SELECT individual_id, FIRST(pics) as FIRST_pics, FIRST(species) as FIRST_species
FROM Train
GROUP BY individual_id
The FIRST function has no way of specifying which row (among those with the same individual_id) is to be selected; it simply returns the first row retrieved. This is unlike ROW_NUMBER() OVER (... ORDER BY ...) in other RDBMS products, where the ORDER BY clause decides which row counts as the first.
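If the choice of row matters, one option that needs neither a CTE nor a window function is a correlated subquery on the ordering column. The sketch below runs the idea against SQLite through Python's sqlite3 module, not against Access, so treat it as an illustration of the query shape only. The sample rows are made up, and it assumes pics is unique within each individual_id (ties would return more than one row per group).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Train (individual_id TEXT, pics TEXT, species TEXT)")
# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO Train VALUES (?, ?, ?)",
    [
        ("a1", "img_001.jpg", "beluga"),
        ("a1", "img_007.jpg", "beluga"),
        ("b2", "img_003.jpg", "orca"),
    ],
)

# For each individual_id keep the row with the largest pics value,
# i.e. the row that ORDER BY pics DESC would place first in its group.
query = """
SELECT t.individual_id, t.pics, t.species
FROM Train AS t
WHERE t.pics = (SELECT MAX(t2.pics)
                FROM Train AS t2
                WHERE t2.individual_id = t.individual_id)
ORDER BY t.individual_id
"""
first_rows = conn.execute(query).fetchall()
for row in first_rows:
    print(row)
```

Unlike FIRST, this makes the selection rule explicit, at the cost of scanning the table once more inside the subquery.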

Scalar subquery producing more than 1 record with partition - SQL

I have data with two rows as follows:
group_id item_no
weoifne 1
weoifne 2
I want to retrieve the max item_no for each group_id. I'm using this query:
SELECT MAX(item_no)
OVER (PARTITION BY group_id)
FROM my_table;
I need only one record because I'm embedding this query in a CASE WHEN statement to apply logic based on whether or not item_no is the highest value per group.
Desired Output:
2
Actual Output:
2
2
How do I modify my query to only output one record with the maximum item_no per group_id?
Use an aggregate function along with GROUP BY instead of a window function.
A window function, also known as an analytic function, computes values over a group of rows and returns a single result for each row. This is different from an aggregate function, which returns a single result for a group of rows.
https://cloud.google.com/bigquery/docs/reference/standard-sql/window-function-calls
SELECT group_id, MAX(item_no)
FROM my_table
GROUP BY group_id;
If you still want to use the window function, you can add DISTINCT to get rid of the duplicates, as shown below. DISTINCT applies across all the selected columns, so only fully identical rows are collapsed.
SELECT DISTINCT group_id
, MAX(item_no) OVER (PARTITION BY group_id)
FROM my_table
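To see the three variants side by side, here is a small sketch that loads the two rows from the question into an in-memory SQLite database via Python's sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later); the target database in the question may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (group_id TEXT, item_no INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("weoifne", 1), ("weoifne", 2)],
)

# Window function: one output row per input row, each carrying the group max.
window_rows = conn.execute(
    "SELECT MAX(item_no) OVER (PARTITION BY group_id) FROM my_table"
).fetchall()

# Aggregate with GROUP BY: one output row per group.
grouped_rows = conn.execute(
    "SELECT group_id, MAX(item_no) FROM my_table GROUP BY group_id"
).fetchall()

# Window function plus DISTINCT: duplicates are collapsed afterwards.
distinct_rows = conn.execute(
    "SELECT DISTINCT group_id, MAX(item_no) OVER (PARTITION BY group_id) "
    "FROM my_table"
).fetchall()

print(window_rows)
print(grouped_rows)
print(distinct_rows)
```

The first query reproduces the duplicated output from the question; the other two return a single row per group_id.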

SQL Query - Rank showing only 1 rank for all records

I am trying to perform ranking based on some calculation of already existing columns. I tried using the SQL RANK() function however it is showing the result as 1 for all entries even if the value of the order by (score column) is different. Please see the details below:
1. qu_point and ti_point are calculated columns.
2. The score column is again a derived column; it is simply a weighted sum of the two columns mentioned in point 1.
I have used the SQL query as follow:
use EFR_DB
GO
select d.serial, d.question_set_id, d.correct_answers, d.total_questions, d.time_taken_seconds, q.total_time_in_secs,
(cast(d.correct_answers as float)/d.total_questions) as qu_point, ((q.total_time_in_secs-d.time_taken_seconds)/q.total_time_in_secs) as ti_point,
(((cast(d.correct_answers as float)/d.total_questions)*2) + ((q.total_time_in_secs-d.time_taken_seconds)/q.total_time_in_secs)) as score,
rank() over (partition by d.question_set_id order by score)
from daily_quiz_record d join Question_set q
on q.question_set_id=d.question_set_id
Please help me do the ranking so that it is partitioned by question_set_id and ranked on the basis of the score.
You can’t use an alias defined in the select clause in the same clause. I suppose that one of your tables has a column called score, otherwise your query would error, so this existing column is being used for ordering instead of the computed value.
Since your expression is lengthy, it is simpler to turn the query into a subquery, and rank in the outer query:
select
t.*,
rank() over(partition by question_set_id order by score) rn
from (
-- your existing query (without rank)
) t
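A filled-in version of that pattern might look like the sketch below, run against SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The table layouts are trimmed to the columns the score needs and the rows are invented for illustration. Both operands of each division are cast to REAL here, which is an assumption on my part to avoid integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Question_set (question_set_id INTEGER, total_time_in_secs INTEGER);
CREATE TABLE daily_quiz_record (
    serial INTEGER, question_set_id INTEGER,
    correct_answers INTEGER, total_questions INTEGER,
    time_taken_seconds INTEGER
);
-- Invented sample data.
INSERT INTO Question_set VALUES (1, 100), (2, 60);
INSERT INTO daily_quiz_record VALUES
    (1, 1, 8, 10, 50),
    (2, 1, 5, 10, 20),
    (3, 1, 10, 10, 100),
    (4, 2, 3, 6, 30);
""")

# The inner query computes score once; the outer query can then refer to
# it by name inside the window's ORDER BY.
query = """
SELECT t.serial, t.question_set_id,
       RANK() OVER (PARTITION BY t.question_set_id ORDER BY t.score) AS rn
FROM (
    SELECT d.serial, d.question_set_id,
           (CAST(d.correct_answers AS REAL) / d.total_questions) * 2
           + CAST(q.total_time_in_secs - d.time_taken_seconds AS REAL)
             / q.total_time_in_secs AS score
    FROM daily_quiz_record AS d
    JOIN Question_set AS q ON q.question_set_id = d.question_set_id
) AS t
ORDER BY t.question_set_id, rn
"""
ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)
```

As in the answer, the ordering is ascending, so the lowest score gets rank 1; use ORDER BY t.score DESC if the highest score should rank first.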

SQL Row_Count function with Partition

I have a query that returns a set of results as a table called DATA, from several UNION ALL joined queries.
I am then doing ROW_NUMBER() on this, to get the row number for a specific grouping (WorksOrderNo)
ROW_NUMBER() Over(partition by Data.WorksOrderNo order by Data.WorksOrderNo) as RowNo,
Is there an equivalent ROW_Count function where I can specify a partition, and return the count of rows for that partition?
ROW_Count() Over(partition by Data.WorksOrderNo order by Data.WorksOrderNo) as RowNo ???
The reason is that this query is being used to drive a report layout.
As part of this, I need to format based on whether the total row count for each WorksOrderNo is >1 or not.
So for instance if there were three rows for a works order, the row_number function currently returns 1, 2 and 3, where the row count would return 3 on each row.
The function is simply COUNT(). In SQL Server, all the aggregation functions can be used as window functions, as long as they do not use DISTINCT.
Note that for the total count, you do not want the ORDER BY:
COUNT(*) Over (partition by Data.WorksOrderNo) as cnt
If you include the ORDER BY, then the COUNT() is cumulative, rather than constant for all rows in the partition.
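The difference between the two forms can be checked with a small sketch. It uses SQLite through Python's sqlite3 module (3.25 or later for window functions) rather than SQL Server, and LineNo is a hypothetical extra column added only so there is something other than the partition key to order by.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# LineNo is a hypothetical column used only for this illustration.
conn.execute("CREATE TABLE Data (WorksOrderNo TEXT, LineNo INTEGER)")
conn.executemany(
    "INSERT INTO Data VALUES (?, ?)",
    [("WO1", 1), ("WO1", 2), ("WO1", 3), ("WO2", 1)],
)

query = """
SELECT WorksOrderNo, LineNo,
       -- No ORDER BY: the same total on every row of the partition.
       COUNT(*) OVER (PARTITION BY WorksOrderNo) AS cnt,
       -- With ORDER BY: a running count within the partition.
       COUNT(*) OVER (PARTITION BY WorksOrderNo ORDER BY LineNo) AS running_cnt
FROM Data
ORDER BY WorksOrderNo, LineNo
"""
counts = conn.execute(query).fetchall()
for row in counts:
    print(row)
```

With the default window frame, rows that tie on the ORDER BY expression are counted together, so ordering by the partition column itself would make the running count look like the total.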
It looks like you just need group by and count:
select WorksOrderNo, count(*) as Row_Count
from Data
group by WorksOrderNo

Group by or Distinct - But several fields

How can I use a Distinct or Group by statement on 1 field with a SELECT of All or at least several ones?
Example: Using SQL SERVER!
SELECT id_product,
description_fr,
DiffMAtrice,
id_mark,
id_type,
NbDiffMatrice,
nom_fr,
nouveaute
From C_Product_Tempo
And I want Distinct or Group By nom_fr
JUST GOT THE ANSWER:
select id_product, description_fr, DiffMAtrice, id_mark, id_type, NbDiffMatrice, nom_fr, nouveaute
from (
SELECT rn = row_number() over (partition by [nom_fr] order by id_mark)
, id_product, description_fr, DiffMAtrice, id_mark, id_type, NbDiffMatrice, nom_fr, nouveaute
From C_Product_Tempo
) d
where rn = 1
And this works perfectly!
If I'm understanding you correctly, you just want the first row per nom_fr. If so, you can simply use a subquery to get the lowest id_product per nom_fr, and just get the corresponding rows;
SELECT * FROM C_Product_Tempo WHERE id_product IN (
SELECT MIN(id_product) FROM C_Product_Tempo GROUP BY nom_fr
);
An SQLfiddle to test with.
You need to decide what to do with the other fields. For example, for numeric fields, do you want a sum? Average? Max? Min? For non-numeric fields, do you want the values from a particular record if there is more than one with the same nom_fr?
Some SQL Systems allow you to get a "random" record when you do a GROUP BY, but SQL Server will not - you must define the proper aggregation for columns that are not in the GROUP BY.
GROUP BY is used to group in conjunction with an aggregate function (see http://www.w3schools.com/sql/sql_groupby.asp), so it's no use grouping without counting, summing up, etc. DISTINCT eliminates duplicates, but I can't see how that would fit with the other columns you want to extract, because some rows will be removed from the result.

GROUP BY combined with ORDER BY

The GROUP BY clause groups the rows, but it does not necessarily sort the results in any particular order. To change the order, use the ORDER BY clause, which follows the GROUP BY clause. The columns used in the ORDER BY clause must appear in the SELECT list, which is unlike the normal use of ORDER BY. [Oracle by Example, fourth Edition, page 274]
Why is that? Why does using GROUP BY influence the required columns in the SELECT clause?
Also, in the case where I do not use GROUP BY: Why would I want to ORDER BY some columns but then select only a subset of the columns?
Actually the statement is not entirely true as Dave Costa's example shows.
The Oracle documentation says that an expression can be used but the expression must be based on the columns in the selection list.
expr - expr orders rows based on their value for expr. The expression is based on columns in the select list or columns in the tables, views, or materialized views in the FROM clause.
Source: Oracle® Database SQL Language Reference, 11g Release 2 (11.2), E26088-01, September 2011, page 19-33.
From the same work, pages 19-13 and 19-33 (pages 1355 and 1365 in the PDF):
http://docs.oracle.com/cd/E11882_01/server.112/e26088/statements_10002.htm#SQLRF01702
http://docs.oracle.com/cd/E11882_01/server.112/e26088/statements_10002.htm#i2171079
The bold text from your quote is incorrect (it's probably an oversimplification that is true in many common use cases, but it is not strictly true as a requirement). For instance, this statement executes just fine, although AVG(val) is not in the select list:
WITH DATA AS (SELECT mod(LEVEL,3) grp, LEVEL val FROM dual CONNECT BY LEVEL < 100)
SELECT grp,MIN(val),MAX(val)
FROM DATA
GROUP BY grp
ORDER BY AVG(val)
The expressions in the ORDER BY clause simply have to be possible to evaluate in the context of the GROUP BY. For instance, ORDER BY val would not work in the above example, because the expression val does not have a distinct value for each row produced by the grouping.
As to your second question, you may care about the ordering but not about the value of the ordering expression. Excluding unneeded expressions from the select lists reduces the amount of data that must actually be sent from the server to the client.
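The ORDER BY AVG(val) example above can be reproduced outside Oracle as a quick check. The sketch below builds the same 99 rows in SQLite via Python's sqlite3 module, with a plain table standing in for the CONNECT BY row generator. Note that SQLite is more permissive than Oracle about bare columns in grouped queries, so it only demonstrates the working case, not the failing ORDER BY val.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (grp INTEGER, val INTEGER)")
# Same rows as the CONNECT BY LEVEL < 100 generator: val = 1..99, grp = val mod 3.
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(v % 3, v) for v in range(1, 100)],
)

# AVG(val) is used for ordering only; it is not in the select list.
rows = conn.execute(
    "SELECT grp, MIN(val), MAX(val) FROM data GROUP BY grp ORDER BY AVG(val)"
).fetchall()
for row in rows:
    print(row)
```

The group averages are 49, 50 and 51 for grp 1, 2 and 0 respectively, which is the order the rows come back in.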
First:
The implementation of group by is one which creates a new resultset that differs in structure to the original from clause (table view or some joined tables). That resultset is defined by what is selected.
Not every SQL RDBMS has this restriction, though it is always a requirement that what is ordered by be either an aggregate function of the non-grouped columns (AVG, SUM, etc.), or one of the columns grouped by, or a function of more than one of those results (like adding two columns), because this is a logical requirement of the result of the grouping operation.
Second:
Because you only care about that column for the ordering. For example, you might have a list of the top selling singles without giving their sales (the NYT Bestsellers keeps some details of their data a secret, but do have a ranked list). Of course, you can get around this by just selecting that column and then not using it.
The data is aggregated before it is sorted for the ORDER BY.
If you try to order by any other column (that is not in the group by list or an aggregation function), what value would be used? There is no single value to use for ordering.
I believe that you can use combinations of the values for sorting. So you can say:
order by a+b
If a and b are in the group by. You just cannot introduce columns not mentioned in the SELECT. I believe you can use aggregation functions not mentioned in the SELECT, however.
Sample table
sample.grades
Name Grade Score
Adam A 95
Bob A 97
Charlie C 75
First Query using GROUP BY
Select grade, count(Grade) from sample.grades GROUP BY Grade
Output
Grade Count
A 2
C 1
Second Query using order by
select Name, Grade, Score from sample.grades order by Score desc
Output
Bob A 97
Adam A 95
Charlie C 75
Third Query using GROUP BY and ordering
Select grade, count(Grade) from sample.grades GROUP BY Grade ORDER BY Grade
Output
Grade Count
A 2
C 1
Once you select an aggregate like Count alongside a plain column, you must have group by. You can use group by and order by together, but they have very different uses, as I hope the examples clearly show.
To try to answer the question of why group by affects the items in the select section: that is what group by is meant to do. Each output row stands for a whole group, so every selected column must either be grouped by or be wrapped in an aggregate such as count.
Second question, why would you want to order by but not select all the columns?
If I want to order by the score, but do not care about the actual grade or even the score I might do
select name from sample.grades order by score desc
Output
Name
Bob
Adam
Charlie
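As a quick check that ordering by a column you do not select is legal, here is a sketch using the three sample rows in SQLite via Python's sqlite3 module. The table is simply called grades here, and the sort is descending so that the highest score comes first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (name TEXT, grade TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?)",
    [("Adam", "A", 95), ("Bob", "A", 97), ("Charlie", "C", 75)],
)

# score drives the ordering but is not part of the select list.
names = [
    row[0]
    for row in conn.execute("SELECT name FROM grades ORDER BY score DESC")
]
print(names)
```

No GROUP BY is involved, so every row still has its own score and the database can sort on it without returning it.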
What results would you expect to see when ordering by columns that are neither listed in the select list nor part of the group by clause? In any case, a sort on such columns has no single value per group to work with, so the Oracle developers were right to add the restriction. Ordering by an aggregate still works:
with c as (
select 1 id, 2 value from dual
union all
select 1 id, 3 value from dual
union all
select 2 id, 3 value from dual
)
select id
from c
group by id
order by count(*) desc
Here is my understanding:
"The GROUP BY clause groups the rows, but it does not necessarily sort the results in any particular order."
-> you can use Group by without order by
"To change the order, use the ORDER BY clause, which follows the GROUP BY clause."
-> without ORDER BY the row order is not guaranteed (it often follows the primary key), and if you add ORDER BY, it must come after GROUP BY
"The columns used in the ORDER BY clause must appear in the SELECT list, which is unlike the normal use of ORDER BY."