Linq Grouping - aggregate, outside of a group by - sql

I've got a SQL query that works as follows:
SELECT TOP 100
Max(Table_ID) as Max_ID,
Col1,
Col2,
Col3,
COUNT(*) AS Occurences
FROM myTable
GROUP BY Col1, Col2, Col3
ORDER BY Occurences DESC
How can I write an identical LINQ query?
The issue is that as soon as I apply my grouping, I can no longer access the non-grouped columns (Table_ID in my case).
var errors = from r in MyTable
             group r by new {r.Col1, r.Col2} into g
             orderby g.Count() descending
             select new {MaxId = ???, Count = g.Count(), g.Key.Col1, g.Key.Col2};

Use g.Max(x => x.TableID):
var errors = from r in MyTable
             group r by new {r.Col1, r.Col2} into g
             orderby g.Count() descending
             select new {MaxId = g.Max(x => x.TableID),
                         Count = g.Count(), g.Key.Col1, g.Key.Col2};
(Assuming you want the maximum within each group, of course.)
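To see why g.Max(...) gives the same answer as the SQL, here is a minimal Python sketch of the same grouping, assuming a hypothetical in-memory table with the four columns from the SQL above. It groups rows by (Col1, Col2, Col3), takes max() of the IDs inside each group (the analogue of g.Max(x => x.TableID)) and len() as the count, sorts by count and keeps the first N. For comparison it also runs the original query in SQLite, where LIMIT replaces SQL Server's TOP. The sample data and function names are illustrative only.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample rows: (Table_ID, Col1, Col2, Col3)
ROWS = [
    (1, "a", "x", "p"),
    (2, "a", "x", "p"),
    (3, "b", "y", "q"),
    (4, "a", "x", "p"),
    (5, "b", "y", "q"),
    (6, "c", "z", "r"),
]


def group_max_count(rows, top=100):
    """In-memory version of GROUP BY Col1, Col2, Col3 with MAX(Table_ID) and COUNT(*)."""
    groups = defaultdict(list)
    for table_id, col1, col2, col3 in rows:
        groups[(col1, col2, col3)].append(table_id)  # the group key is the grouped columns
    result = [
        # max(ids) plays the role of g.Max(x => x.TableID); len(ids) of g.Count()
        (max(ids), *key, len(ids))
        for key, ids in groups.items()
    ]
    result.sort(key=lambda row: row[-1], reverse=True)  # ORDER BY Occurences DESC
    return result[:top]  # TOP 100


def group_max_count_sql(rows, top=100):
    """The original query, run in SQLite (LIMIT instead of SQL Server's TOP)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE myTable (Table_ID INTEGER, Col1 TEXT, Col2 TEXT, Col3 TEXT)")
    con.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?)", rows)
    result = con.execute(
        """SELECT MAX(Table_ID) AS Max_ID, Col1, Col2, Col3, COUNT(*) AS Occurences
           FROM myTable
           GROUP BY Col1, Col2, Col3
           ORDER BY Occurences DESC
           LIMIT ?""",
        (top,),
    ).fetchall()
    con.close()
    return result
```

The point is that each group still carries all of its member rows, so any column of those rows can be aggregated after grouping; only the key columns are available as single values.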

Jon's answer is good; I just want to elaborate a little on why:
The issue is that as soon as I apply my grouping, I cannot access the non-grouped columns
The range variable r went out of scope. Why is that?
A query expression ends with either a select clause or a group by clause. When you add the query continuation clause into g to the group by, you are saying that the top-level elements of the query are now g, and all range variables introduced in the query up to this point are removed from scope.
If you search this MSDN article for the word "splice", you can see samples.
Don't confuse this use of into with join on equals into, which is a group join, not a query continuation. A group join does not remove previous variables from scope.

Related

SQL COUNT FROM JOIN TABLES

I have the following sql command:
SELECT "USERNAME"."TOPICS".VALUE,
"USERNAME"."TOPICS".QID,
"USERNAME"."QUESTION".QRATING
FROM "USERNAME"."TOPICS" JOIN "USERNAME"."QUESTION"
ON "USERNAME"."TOPICS".QID = "USERNAME"."QUESTION".QID
AND "USERNAME"."TOPICS".VALUE = 'kia'
ORDER BY QRATING DESC
It works really well, but I want to count how many rows it returns. So I tried:
SELECT COUNT("USERNAME"."TOPICS".QID)
FROM "USERNAME"."TOPICS" JOIN "USERNAME"."QUESTION"
ON "USERNAME"."TOPICS".QID = "USERNAME"."QUESTION".QID
AND "USERNAME"."TOPICS".VALUE = 'kia'
ORDER BY QRATING DESC
But I get the error:
Column reference 'USERNAME.TOPICS.VALUE' is invalid. When the SELECT
list contains at least one aggregate then all entries must be valid
aggregate expressions.
What is the problem?
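Hmmm. I would expect the ORDER BY to be what triggers the error, not the SELECT: once the SELECT list contains an aggregate (and there is no GROUP BY), the result is a single row, so ordering by the non-aggregated QRATING is invalid. Either way, drop the ORDER BY; your query would also be much easier to understand using table aliases: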
SELECT COUNT(t.QID)
FROM "USERNAME"."TOPICS" t JOIN
"USERNAME"."QUESTION" q
ON t.QID = q.QID AND t.VALUE = 'kia';
If the first query works, I see no reason why this would not (and your original without the ORDER BY should also work).
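A quick way to convince yourself that the count query matches the detail query is to run both against the same data. This is a minimal sketch using SQLite from Python, with hypothetical sample rows and the "USERNAME" schema prefix dropped (SQLite has no such schema). Note that SQLite is more lenient than the original database and would not reproduce the error, so this only checks that the corrected count equals the number of detail rows.

```python
import sqlite3

# Hypothetical sample data
TOPICS = [(1, "kia"), (2, "kia"), (3, "bmw"), (4, "kia")]  # (QID, VALUE)
QUESTIONS = [(1, 5), (2, 9), (3, 7)]                       # (QID, QRATING)


def rows_and_count(topics, questions, value="kia"):
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE TOPICS (QID INTEGER, "VALUE" TEXT)')
    con.execute("CREATE TABLE QUESTION (QID INTEGER, QRATING INTEGER)")
    con.executemany("INSERT INTO TOPICS VALUES (?, ?)", topics)
    con.executemany("INSERT INTO QUESTION VALUES (?, ?)", questions)
    # Detail query: ORDER BY makes sense because every matching row is returned
    rows = con.execute(
        """SELECT t."VALUE", t.QID, q.QRATING
           FROM TOPICS t JOIN QUESTION q
             ON t.QID = q.QID AND t."VALUE" = ?
           ORDER BY q.QRATING DESC""",
        (value,),
    ).fetchall()
    # Count query: a single aggregated row, so no ORDER BY
    (count,) = con.execute(
        """SELECT COUNT(t.QID)
           FROM TOPICS t JOIN QUESTION q
             ON t.QID = q.QID AND t."VALUE" = ?""",
        (value,),
    ).fetchone()
    con.close()
    return rows, count
```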

Assistance with SQL Query (aggregating)

I have a requirement to create a Sales report, and I have this SQL query:
SELECT --top 1
t.branch_no as TBranchNo,
t.workstation_no as TWorkstation,
t.tender_ref_no as TSaleRefNo,
t.tender_line_no as TLineNo,
t.tender_code as TCode,
t.contribution as TContribution,
l.sale_line_no as SaleLineNo
FROM TENDER_LINES t
LEFT JOIN SALES_TX_LINES l
on t.branch_no = l.branch_no and t.workstation_no = l.workstation_no and t.tender_ref_no = l.sale_tx_no
where l.sale_tx_no = 2000293 OR l.sale_tx_no = 1005246 --OR sale_tx_no = 1005261
order by t.tender_ref_no asc,
l.sale_line_no desc
The results of the query look like the following:
The results I am trying to achieve are:
That is, only one line for transaction 2 (either SaleLineNo 1 or 2), while still having both lines for transaction 1 because the TCode is different.
I am using SQL Server 2012.
I'm not exactly sure what data you have, but you might want to try
GROUP BY TlineNo, TCode ...
But be careful not to group by something that would result in duplicated contribution values.
You can use the ROW_NUMBER function, which lets you partition the rows into groups and number the rows inside each group starting from one. If you choose the right columns to define the partition and keep only the rows with row_number = 1, you have solved the first part of your problem, i.e. discarding the lines that shouldn't appear in the report. (See the samples in the linked documentation; they're quite clear.)
Once you have solved this problem, you simply have to repeat what you're doing, but on this filtered result instead of the original data. You can use a view, a CTE, or a subselect to achieve this:
With a view:
CREATE VIEW FilteredData AS
    -- here the ranking-function (ROW_NUMBER) query
-- then, as a separate statement:
SELECT -- here your current query
FROM FilteredData
With a CTE:
WITH FilteredData AS (
    -- here the ranking-function (ROW_NUMBER) query
)
SELECT -- here your current query
FROM FilteredData
With a subselect:
SELECT -- here your current query
FROM (
    -- here the ranking-function (ROW_NUMBER) query
) AS FilteredData
I appreciate your help with my query. After playing around, I found a solution that works just as I want: as hinted by @Yogesh86, I did a GROUP BY on a few fields.
SELECT
MAX(t.branch_no) as TBranchNo,
MAX(t.workstation_no) as TWorkstation,
t.tender_ref_no as TSaleRefNo,
MAX(t.tender_line_no) as TLineNo,
t.tender_code as TCode,
MAX(t.contribution) as TContribution,
MAX(l.sale_line_no) as SaleLineNo
FROM TENDER_LINES t
LEFT JOIN SALES_TX_LINES l
on t.branch_no = l.branch_no and t.workstation_no = l.workstation_no and t.tender_ref_no = l.sale_tx_no
where l.sale_tx_no = 2000293 OR l.sale_tx_no = 1005246 --OR sale_tx_no = 1005261
GROUP BY
t.tender_ref_no,
t.tender_line_no,
t.tender_code
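The GROUP BY approach can be sketched the same way. This is a minimal Python/SQLite sketch assuming a hypothetical table named joined that holds already-joined rows; the ORDER BY is added only so the output is deterministic. Each group (reference, tender line, tender code) collapses to one row, and MAX() picks one value for each remaining column.

```python
import sqlite3

# Hypothetical joined rows:
# (tender_ref_no, tender_line_no, tender_code, contribution, sale_line_no)
JOINED = [
    (1, 1, "CASH", 10.0, 1),
    (1, 2, "CARD", 15.0, 1),
    (2, 1, "CARD", 20.0, 1),
    (2, 1, "CARD", 20.0, 2),
]


def grouped_rows(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE joined (tender_ref_no INTEGER, tender_line_no INTEGER,"
        " tender_code TEXT, contribution REAL, sale_line_no INTEGER)"
    )
    con.executemany("INSERT INTO joined VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(
        """SELECT tender_ref_no, tender_line_no, tender_code,
                  MAX(contribution) AS TContribution,  -- one value per group
                  MAX(sale_line_no) AS SaleLineNo
           FROM joined
           GROUP BY tender_ref_no, tender_line_no, tender_code
           ORDER BY tender_ref_no, tender_line_no"""
    ).fetchall()
    con.close()
    return result
```

Keep in mind that each MAX() is computed independently, so if the rows in a group differ in several columns, the output row can combine values that came from different source rows.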

Single-row subquery returns more than one row: query not working inside main query

I have to display several cell values in one cell, so I am using this query:
select LISTAGG(fc.DESCRIPTION, ';'||chr(10))WITHIN GROUP (ORDER BY fc.SWITCH_NAME) AS DESCRIP from "ORS".SWITCH_OPERATIONS fc
group by fc.SWITCH_NAME
It works fine, but when I merge it into my main (complete) query, I get the error: Error code 1427, SQL state 21000: ORA-01427: single-row subquery returns more than one row
Here is my complete query:
SELECT
TRACK_EVENT.LOCATION,
TRACK_EVENT.ELEMENT_NAME,
(select COUNT(*) from ORS.TRACK_EVENT b where (b.ELEMENT_NAME = sw.SWITCH_NAME)AND (b.ELEMENT_TYPE = 'SWITCH')AND (b.EVENT_TYPE = 'I')AND (b.ELEMENT_STATE = 'NORMAL' OR b.ELEMENT_STATE = 'REVERSE'))as COUNTER,
(select COUNT(*) from ORS.SWITCH_OPERATIONS fc where TRACK_EVENT.ELEMENT_NAME = fc.SWITCH_NAME and fc.NO_CORRESPONDENCE = 1 )as FAIL_COUNT,
(select MAX(cw.COMMAND_TIME) from ORS.SWITCH_OPERATIONS cw where ((TRACK_EVENT.ELEMENT_NAME = cw.SWITCH_NAME) and (cw.NO_CORRESPONDENCE = 1)) group by cw.SWITCH_NAME ) as FAILURE_DATE,
(select LISTAGG(fc.DESCRIPTION, ';'||chr(10))WITHIN GROUP (ORDER BY fc.SWITCH_NAME) AS DESCRIP from "ORS".SWITCH_OPERATIONS fc
group by fc.SWITCH_NAME)
FROM
ORS.SWITCH_OPERATIONS sw,
ORS.TRACK_EVENT TRACK_EVENT
WHERE
sw.SEQUENCE_ID = TRACK_EVENT.SEQUENCE_ID
Not only are subqueries in the SELECT list required to return at most one row (as is any subquery used in a single-value comparison, like <, =, etc.), but their use in that context tends to make the database execute them RBAR (row-by-agonizing-row). That is, they're slower and consume more resources than they should.
Generally, unless the result set outside the subquery contains only a few rows, you want to construct subqueries as part of a table-reference, i.e., something like:
SELECT m.n, m.z, aliasForSomeTable.a, aliasForSomeTable.bSum
FROM mainTable m
JOIN (SELECT a, SUM(b) AS bSum
FROM someTable
GROUP BY a) aliasForSomeTable
ON aliasForSomeTable.a = m.a
This benefits you in other ways too: for example, it's easier to get multiple columns out of the same table-reference.
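Here is a small runnable comparison of the two shapes, as a sketch in Python with SQLite and hypothetical tables mainTable(a, n, z) and someTable(a, b) matching the placeholder names above. It shows that the correlated subquery in the SELECT list and the join to a grouped derived table return the same rows for this data; it does not measure performance.

```python
import sqlite3

# Hypothetical data: every mainTable.a has matching someTable rows
MAIN = [(1, "n1", "z1"), (2, "n2", "z2")]  # (a, n, z)
SOME = [(1, 10), (1, 5), (2, 7)]           # (a, b)


def compare_patterns(main_rows, some_rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mainTable (a INTEGER, n TEXT, z TEXT)")
    con.execute("CREATE TABLE someTable (a INTEGER, b INTEGER)")
    con.executemany("INSERT INTO mainTable VALUES (?, ?, ?)", main_rows)
    con.executemany("INSERT INTO someTable VALUES (?, ?)", some_rows)
    # Scalar subquery in the SELECT list, evaluated per outer row
    correlated = con.execute(
        """SELECT m.n, m.z, m.a,
                  (SELECT SUM(s.b) FROM someTable s WHERE s.a = m.a) AS bSum
           FROM mainTable m
           ORDER BY m.n"""
    ).fetchall()
    # Aggregate once in a derived table, then join it like any other table
    joined = con.execute(
        """SELECT m.n, m.z, aliasForSomeTable.a, aliasForSomeTable.bSum
           FROM mainTable m
           JOIN (SELECT a, SUM(b) AS bSum
                 FROM someTable
                 GROUP BY a) aliasForSomeTable
             ON aliasForSomeTable.a = m.a
           ORDER BY m.n"""
    ).fetchall()
    con.close()
    return correlated, joined
```

One difference to watch: an inner JOIN drops outer rows with no match in someTable, whereas the scalar subquery returns NULL for them; use LEFT JOIN (as in the rewrite below) if those rows must be kept.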
Assuming that LISTAGG(...) can be included with other aggregate functions, you can change your query to look like this:
SELECT Track_Event.location, Track_Event.element_name,
Counted_Events.counter,
Failure.fail_count, Failure.failure_date, Failure.descrip
FROM ORS.Track_Event
JOIN ORS.Switch_Operations
ON Switch_Operations.sequence_id = Track_Event.sequence_id
LEFT JOIN (SELECT element_name, COUNT(*) AS counter
FROM ORS.Track_Event
WHERE element_type = 'SWITCH'
AND event_type = 'I'
AND element_state IN ('NORMAL', 'REVERSE')
GROUP BY element_name) Counted_Events
ON Counted_Events.element_name = Switch_Operations.switch_name
LEFT JOIN (SELECT switch_name,
COUNT(CASE WHEN no_correspondence = 1 THEN '1' END) AS fail_count,
MAX(CASE WHEN no_correspondence = 1 THEN command_time END) AS failure_date,
LISTAGG(description, ';' || CHR(10)) WITHIN GROUP (ORDER BY command_time) AS descrip
FROM ORS.Switch_Operations
GROUP BY switch_name) Failure
ON Failure.switch_name = Track_Event.element_name
This query was written to (attempt to) preserve the semantics of your original query. I'm not completely sure that's what you actually need, but without sample starting data and desired results, I have no way to tell how else to improve it. For instance, I'm a little suspicious of the need for Switch_Operations in the outer query, and of the fact that LISTAGG(...) also includes rows where no_correspondence <> 1. I did change the ordering of LISTAGG(...): the original ordered by the grouping column, which has the same value throughout each group, so it would not have done anything and would not have produced a stable order.
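The Failure derived table relies on conditional aggregation: CASE without ELSE yields NULL for non-matching rows, and COUNT/MAX ignore NULLs. As a sketch, here is the same idea in Python with SQLite, where group_concat stands in for Oracle's LISTAGG (SQLite's group_concat has no WITHIN GROUP ordering, so the order of the concatenated descriptions is not guaranteed). The sample rows are hypothetical.

```python
import sqlite3

# Hypothetical rows: (switch_name, no_correspondence, command_time, description)
OPS = [
    ("SW1", 1, "2024-01-01 10:00", "stuck"),
    ("SW1", 0, "2024-01-02 11:00", "ok"),
    ("SW1", 1, "2024-01-03 12:00", "no feedback"),
    ("SW2", 0, "2024-01-01 09:00", "ok"),
]


def failure_summary(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Switch_Operations (switch_name TEXT, no_correspondence INTEGER,"
        " command_time TEXT, description TEXT)"
    )
    con.executemany("INSERT INTO Switch_Operations VALUES (?, ?, ?, ?)", rows)
    result = con.execute(
        """SELECT switch_name,
                  -- only failed operations produce a non-NULL value to count
                  COUNT(CASE WHEN no_correspondence = 1 THEN '1' END) AS fail_count,
                  MAX(CASE WHEN no_correspondence = 1 THEN command_time END) AS failure_date,
                  -- stand-in for LISTAGG; includes every row, failed or not
                  group_concat(description, ';' || char(10)) AS descrip
           FROM Switch_Operations
           GROUP BY switch_name
           ORDER BY switch_name"""
    ).fetchall()
    con.close()
    return result
```

A switch with no failed operations gets a fail_count of 0 and a NULL (None) failure_date, which is what the LEFT JOIN in the rewritten query passes through.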
Single-row subquery returns more than one row.
This error message is self-explanatory.
A returned field can't have multiple values, and your subquery returns more than one row.
In your complete query you specify fields to be returned. The last field expects a single value from the subquery but gets multiple rows instead: unlike the other subqueries, it has no condition tying fc.SWITCH_NAME to the outer row, so it returns one row per switch.
I have no clue about the data you're working with but either you have to ensure that subquery returns only one row or you have to redesign the wrapping query (possibly using joins when appropriate).
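One way to make the last subquery return a single row is to correlate it with the outer row, in which case the GROUP BY is no longer needed (an aggregate without GROUP BY always yields exactly one row). This sketch shows the idea in Python with SQLite and hypothetical data, using group_concat in place of LISTAGG. Note that SQLite, unlike Oracle, silently takes the first row of a multi-row scalar subquery instead of raising an error, so the sketch checks the row counts directly.

```python
import sqlite3

# Hypothetical data
OPS = [("SW1", "stuck"), ("SW2", "ok")]     # (SWITCH_NAME, DESCRIPTION)
EVENTS = [("SW1",), ("SW2",), ("SW3",)]     # (ELEMENT_NAME,)


def descriptions(ops, events):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE SWITCH_OPERATIONS (SWITCH_NAME TEXT, DESCRIPTION TEXT)")
    con.execute("CREATE TABLE TRACK_EVENT (ELEMENT_NAME TEXT)")
    con.executemany("INSERT INTO SWITCH_OPERATIONS VALUES (?, ?)", ops)
    con.executemany("INSERT INTO TRACK_EVENT VALUES (?)", events)
    # The problematic subquery on its own: one row per switch, not one value
    uncorrelated = con.execute(
        """SELECT group_concat(fc.DESCRIPTION, '; ')
           FROM SWITCH_OPERATIONS fc
           GROUP BY fc.SWITCH_NAME"""
    ).fetchall()
    # Correlated version: filtered to the outer row, no GROUP BY, one value each
    correlated = con.execute(
        """SELECT te.ELEMENT_NAME,
                  (SELECT group_concat(fc.DESCRIPTION, '; ')
                   FROM SWITCH_OPERATIONS fc
                   WHERE fc.SWITCH_NAME = te.ELEMENT_NAME) AS DESCRIP
           FROM TRACK_EVENT te
           ORDER BY te.ELEMENT_NAME"""
    ).fetchall()
    con.close()
    return uncorrelated, correlated
```

An outer row with no matching operations gets NULL (None), because the aggregate over zero rows still returns one row.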

Set limit to array_agg()

I have the following Postgres query:
SELECT array_agg("Esns".id )
FROM public."Esns",
public."PurchaseOrderItems"
WHERE
"Esns"."PurchaseOrderItemId" = "PurchaseOrderItems".id
AND "PurchaseOrderItems"."GradeId"=2
LIMIT 2;
The LIMIT applies to the result rows (here only the single aggregated row), not to the elements of the array. I want it to limit the array_agg() to 2 items. The following query works, but I get my output with each entry in quotes:
SELECT array_agg ("temp")
FROM (
SELECT "Esns".id
FROM public."Esns",
public."PurchaseOrderItems"
WHERE
"Esns"."PurchaseOrderItemId" = "PurchaseOrderItems".id
AND "PurchaseOrderItems"."GradeId"=2
LIMIT 4
) as "temp" ;
This gives me the following output:
{(13),(14),(15),(12)}
Any ideas?
select id[1], id[2]
from (
SELECT array_agg("Esns".id ) as id
FROM public."Esns",
public."PurchaseOrderItems"
WHERE
"Esns"."PurchaseOrderItemId" = "PurchaseOrderItems".id
AND "PurchaseOrderItems"."GradeId"=2
) s
Or, if you want the output as an array, you can slice it:
SELECT (array_agg("Esns".id ))[1:2] as id_array
FROM public."Esns",
public."PurchaseOrderItems"
WHERE
"Esns"."PurchaseOrderItemId" = "PurchaseOrderItems".id
AND "PurchaseOrderItems"."GradeId"=2
The parentheses (not "quotes") in the result are decorators for the row literals. You are building an array of whole rows (which happen to contain only a single column). Instead, aggregate only the column.
Also, direct array construction from a query result is typically simpler and faster:
SELECT ARRAY (
SELECT e.id
FROM public."Esns" e
JOIN public."PurchaseOrderItems" p ON p.id = e."PurchaseOrderItemId"
WHERE p."GradeId" = 2
-- ORDER BY ???
LIMIT 4 -- or 2?
)
You need to ORDER BY something if you want a stable result and/or to pick certain rows. Otherwise the result is arbitrary and can change with every call.
While I was at it, I rewrote the query with explicit JOIN syntax, which is generally preferable, and used table aliases to simplify it.
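The row-versus-column distinction has a direct analogue in Python's sqlite3 module, where every fetched row is a tuple even if it has a single column. This sketch (SQLite has no arrays, so Python lists play that role; the sample ids are hypothetical) applies ORDER BY and LIMIT before collecting, then contrasts collecting whole rows, like array_agg("temp"), with collecting only the column, like array_agg(id).

```python
import sqlite3

# Hypothetical data
ESNS = [(13, 1), (14, 1), (15, 2), (12, 1)]  # (id, PurchaseOrderItemId)
ITEMS = [(1, 2), (2, 3)]                     # (id, GradeId)


def first_ids(esns, items, grade_id=2, limit=2):
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE "Esns" (id INTEGER, "PurchaseOrderItemId" INTEGER)')
    con.execute('CREATE TABLE "PurchaseOrderItems" (id INTEGER, "GradeId" INTEGER)')
    con.executemany('INSERT INTO "Esns" VALUES (?, ?)', esns)
    con.executemany('INSERT INTO "PurchaseOrderItems" VALUES (?, ?)', items)
    rows = con.execute(
        '''SELECT e.id
           FROM "Esns" e
           JOIN "PurchaseOrderItems" p ON p.id = e."PurchaseOrderItemId"
           WHERE p."GradeId" = ?
           ORDER BY e.id   -- makes the LIMIT pick a well-defined subset
           LIMIT ?''',
        (grade_id, limit),
    ).fetchall()
    con.close()
    whole_rows = list(rows)           # like array_agg("temp"): one-column rows
    ids = [row[0] for row in rows]    # like array_agg(id): just the values
    return whole_rows, ids
```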

HQL and grouping

After many problems using 'group by' in linq2nhibernate, I tried switching to HQL, but I am struggling with a simple example.
I have the following table (ForumThreadRatings):
I would like to retrieve a list of the highest-rated forum threads, which means I need to sum the positive column and group by the forum thread. As an example, I tried a simple group by in HQL, with no luck:
select ftr.ForumThread from ForumThreadRating ftr group by ftr.ForumThread
But I receive the following error:
Column 'ForumThreads.Id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
What might I be missing?
From the docs:
NHibernate currently does not expand a grouped entity, so you can't write group by cat if all properties of cat are non-aggregated. You have to list all non-aggregated properties explicitly.
In any case, that exact query can be accomplished by:
select distinct ftr.ForumThread from ForumThreadRating ftr
But of course you probably need to sum or count something, so you'll need to explicitly aggregate the properties.
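At the SQL level, "list all non-aggregated properties explicitly" means the GROUP BY must name every selected column of the thread. This is a sketch in plain SQL run through Python's sqlite3, not HQL or NHibernate; the table and column names (ForumThreads.Id/Title, ForumThreadRatings.ForumThreadId/Positive stored as 0/1) are assumptions. SQLite is lenient about bare columns and would not raise the error from the question, so the sketch only shows the well-formed query.

```python
import sqlite3

# Hypothetical data
THREADS = [(1, "Intro"), (2, "Help"), (3, "News")]  # (Id, Title)
RATINGS = [(1, 1), (1, 1), (2, 0), (2, 1), (3, 0)]  # (ForumThreadId, Positive)


def top_threads_grouped(threads, ratings, limit=10):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ForumThreads (Id INTEGER PRIMARY KEY, Title TEXT)")
    con.execute("CREATE TABLE ForumThreadRatings (ForumThreadId INTEGER, Positive INTEGER)")
    con.executemany("INSERT INTO ForumThreads VALUES (?, ?)", threads)
    con.executemany("INSERT INTO ForumThreadRatings VALUES (?, ?)", ratings)
    result = con.execute(
        """SELECT t.Id, t.Title,
                  SUM(CASE WHEN r.Positive = 1 THEN 1 ELSE -1 END) AS Score
           FROM ForumThreadRatings r
           JOIN ForumThreads t ON t.Id = r.ForumThreadId
           GROUP BY t.Id, t.Title  -- every non-aggregated column is listed
           ORDER BY Score DESC
           LIMIT ?""",
        (limit,),
    ).fetchall()
    con.close()
    return result
```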
Update: here's how to get the top 10 threads:
var topThreads = session.CreateQuery(@"
select (select sum(case
when rating.Positive = true then 1
else -1
end)
from ForumThreadRating rating
where rating.ForumThread = thread),
thread
from ForumThread thread
order by 1 desc
")
.SetMaxResults(10)
.List<object[]>()
As you can see, this query returns a list of object[] with two elements each: [0] is the rating and [1] is the ForumThread.
You can get just the ForumThreads using:
.Select(x => (ForumThread)x[1]);
Or project them into a DTO, etc.
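For comparison outside NHibernate, the same correlated-subquery shape can be run in plain SQL and projected into a DTO. This is a rough Python/SQLite analogue with assumed table and column names; the ForumThread dataclass is a hypothetical stand-in for the mapped entity. Each result keeps the (rating, thread) pair layout, so element [1] is the thread, as with x[1] above.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ForumThread:  # hypothetical DTO standing in for the mapped entity
    id: int
    title: str


# Hypothetical data
THREADS = [(1, "Intro"), (2, "Help"), (3, "News")]  # (Id, Title)
RATINGS = [(1, 1), (1, 1), (2, 0), (2, 1), (3, 0)]  # (ForumThreadId, Positive)


def top_threads(threads, ratings, limit=10):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ForumThreads (Id INTEGER PRIMARY KEY, Title TEXT)")
    con.execute("CREATE TABLE ForumThreadRatings (ForumThreadId INTEGER, Positive INTEGER)")
    con.executemany("INSERT INTO ForumThreads VALUES (?, ?)", threads)
    con.executemany("INSERT INTO ForumThreadRatings VALUES (?, ?)", ratings)
    rows = con.execute(
        """SELECT (SELECT SUM(CASE WHEN r.Positive = 1 THEN 1 ELSE -1 END)
                   FROM ForumThreadRatings r
                   WHERE r.ForumThreadId = t.Id) AS Score,
                  t.Id, t.Title
           FROM ForumThreads t
           ORDER BY 1 DESC
           LIMIT ?""",
        (limit,),
    ).fetchall()
    con.close()
    # Mirror the object[] pairs: [0] is the rating, [1] is the thread
    return [(score, ForumThread(tid, title)) for score, tid, title in rows]
```

Getting just the threads is then [pair[1] for pair in top_threads(THREADS, RATINGS)]. Threads without any ratings would get a NULL (None) score from the subquery, so you may want COALESCE(..., 0) if they should rank as zero.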