GroupBy and SelectMany analogues in SQL?

GroupBy and SelectMany analogues in SQL? - sql

In C# I can group collection items, run subquery for each group and then return rows.
var rows = collection.GroupBy( item => item.Property).SelectMany( g => ...);
In SelectMany part query can run against the collection (due to the closure) or/and the grouped items. The results of subqueries are then combined.
My problem is that I do not know how to achieve the same with SQL query.
Subquery placed inside select can be parametrized but allows to return one row at most, and joins do not exclude items that do not belong to the group.
I have no sql-oriented mindset and apologize if it is really a dumb question.
Example of temporal DB:
id instance_id data from to
1 1 A 20140301 20140310
2 1 AA 20140311 20140321
3 2 B 20140301 20140305
4 2 BC 20140306 20140316
I need to run subqueries for each instance_id i.e. against historical data of the entity and select only part of the history for each entity.

You can do that using CROSS APPLY like this:
SELECT *
FROM mytable
CROSS APPLY (
SELECT *
FROM othertable
WHERE instance_id = mytable.instance_id)
I would recommend using a join if possible, but APPLY comes in handy if you don't have a simple join condition. This article explains it nicely.

Related

Query with Left outer join and group by returning duplicates

To begin with, I have a table in my db that is fed with SalesForce info. When I run this example query it returns 2 rows:
select * from SalesForce_INT_Account__c where ID_SAP_BAYER__c = '3783513'
When I run this next query on the same table I obtain one of the rows, which is what I need:
SELECT MAX(ID_SAP_BAYER__c) FROM SalesForce_INT_Account__c where ID_SAP_BAYER__c = '3783513' GROUP BY ID_SAP_BAYER__c
Now, I have another table (PedidosEspecialesZarateCabeceras) which has a field (NroClienteDireccionEntrega) that I can match with the field I've been using in the SalesForce table (ID_SAP_BAYER__c). This table has a key that consists of just 1 field (NroPedido).
What I need to do is join these 2 tables to obtain a row from PedidosEspecialesZarateCabeceras with additional fields coming from the SalesForce table, and in case those additional fields are not available, they should come as NULL values, so for that im using a LEFT OUTER JOIN.
The problem is, since I have to match NroClienteDireccionEntrega and ID_SAP_BAYER__c and there's 2 rows in the salesforce table with the same ID_SAP_BAYER__c, my query returns 2 duplicate rows from PedidosEspecialesZarateCabeceras (They both have the same NroPedido).
This is an example query that returns duplicates:
SELECT
cab.CUIT AS CUIT,
convert(nvarchar(4000), cab.NroPedido) AS NroPedido,
sales.BillingCity__c as Localidad,
sales.BillingState__c as IdProvincia,
sales.BillingState__c_Desc as Provincia,
sales.BillingStreet__c as Calle,
sales.Billing_Department__c as Distrito,
sales.Name as RazonSocial,
cab.NroCliente as ClienteId
FROM PedidosEspecialesZarateCabeceras AS cab WITH (NOLOCK)
LEFT OUTER JOIN
SalesForce_INT_Account__c AS sales WITH (NOLOCK) ON
cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c
and sales.ID_SAP_BAYER__c in
( SELECT MAX(ID_SAP_BAYER__c)
FROM SalesForce_INT_Account__c
GROUP BY ID_SAP_BAYER__c
)
WHERE cab.NroPedido ='5320'
Even though the join has MAX and Group By, this returns 2 duplicate rows with different SalesForce information (Because of the 2 salesforce rows with the same ID_SAP_BAYER__c), which should not be possible.
What I need is for the left outer join in my query to pick only ONE of the salesforce rows to prevent duplication like its happening right now. For some reason the select max with the group by is not working.
Maybe I should try to join this tables in a different way, can anyone give me some other ideas on how to join the two tables to return just 1 row? It doesnt matter if the SalesForce row that gets picked out of the 2 isn't the correct one, I just need it to pick one of them.

Your IN clause is not actually doing anything, since...
SELECT MAX(ID_SAP_BAYER__c)
FROM SalesForce_INT_Account__c
GROUP BY ID_SAP_BAYER__c
... returns all possible IDSAP_BAYER__c values. (The GROUP BY says you want to return one row per unique ID_SAP_BAYER__c and then, since your MAX is operating on exactly one unique value per group, you simply return that value.)
You will want to change your query to operate on a value that is actually different between the two rows you are trying to differentiate (probably the MAX(ID) for the relevant ID_SAP_BAYER__c). Plus, you will want to link that inner query to your outer query.
You could probably do something like:
...
LEFT OUTER JOIN
SalesForce_INT_Account__c sales
ON cab.NroClienteDireccionEntrega = sales.ID_SAP_BAYER__c
and sales.ID in
(
SELECT MAX(ID)
FROM SalesForce_INT_Account__c sales2
WHERE sales2.ID_SAP_BAYER__c = cab.NroClienteDireccionEntrega
)
WHERE cab.NroPedido ='5320'
By using sales.ID in ... SELECT MAX(ID) ... instead of sales.ID_SAP_BAYER__c in ... SELECT MAX(ID_SAP_BAYER__c) ... this ensures you only match one of the two rows for that ID_SAP_BAYER__c. The WHERE sales2.ID_SAP_BAYER__c = cab.NroClienteDireccionEntrega condition links the inner query to the outer query.
There are multiple ways of doing the above, especially if you don't care which of the relevant rows you match on. You can use the above as a starting point and make it match your preferred style.
An alternative might be to use OUTER APPLY with TOP 1. Something like:
SELECT
...
FROM PedidosEspecialesZarateCabeceras AS cab
OUTER APPLY(
SELECT TOP 1 *
FROM SalesForce_INT_Account__c s1
WHERE cab.NroClienteDireccionEntrega = s1.ID_SAP_BAYER__c
) sales
WHERE cab.NroPedido ='5320'
Without an ORDER BY the match that TOP 1 chooses will be arbitrary, but I think that's what you want anyway. (If not, you could add an ORDER BY).

Abap subquery Where Cond [duplicate]

I have a requirement to pull records, that do not have history in an archive table. 2 Fields of 1 record need to be checked for in the archive.
In technical sense my requirement is a left join where right side is 'null' (a.k.a. an excluding join), which in abap openSQL is commonly implemented like this (for my scenario anyways):
Select * from xxxx //xxxx is a result for a multiple table join
where xxxx~key not in (select key from archive_table where [conditions] )
and xxxx~foreign_key not in (select key from archive_table where [conditions] )
Those 2 fields are also checked against 2 more tables, so that would mean a total of 6 subqueries.
Database engines that I have worked with previously usually had some methods to deal with such problems (such as excluding join or outer apply).
For this particular case I will be trying to use ABAP logic with 'for all entries', but I would still like to know if it is possible to use results of a sub-query to check more than than 1 field or use another form of excluding join logic on multiple fields using SQL (without involving application server).

I have tested quite a few variations of sub-queries in the life-cycle of the program I was making. NOT EXISTS with multiple field check (shortened example below) to exclude based on 2 keys works in certain cases.
Performance acceptable (processing time is about 5 seconds), although, it's noticeably slower than the same query when excluding based on 1 field.
Select * from xxxx //xxxx is a result for a multiple table inner joins and 1 left join ( 1-* relation )
where NOT EXISTS (
select key from archive_table
where key = xxxx~key OR key = XXXX-foreign_key
)
EDIT:
With changing requirements (for more filtering) a lot has changed, so I figured I would update this. The construct I marked as XXXX in my example contained a single left join ( where main to secondary table relation is 1-* ) and it appeared relatively fast.
This is where context becomes helpful for understanding the problem:
Initial requirement: pull all vendors, without financial records in 3
tables.
Additional requirements: also exclude based on alternative
payers (1-* relationship). This is what example above is based on.
More requirements: also exclude based on alternative payee (*-* relationship between payer and payee).
Many-to-many join exponentially increased the record count within the construct I labeled XXXX, which in turn produces a lot of unnecessary work. For instance: a single customer with 3 payers, and 3 payees produced 9 rows, with a total of 27 fields to check (3 per row), when in reality there are only 7 unique values.
At this point, moving left-joined tables from main query into sub-queries and splitting them gave significantly better performance.
than any smarter looking alternatives.
select * from lfa1 inner join lfb1
where
( lfa1~lifnr not in ( select lifnr from bsik where bsik~lifnr = lfa1~lifnr )
and lfa1~lifnr not in ( select wyt3~lifnr from wyt3 inner join t024e on wyt3~ekorg = t024e~ekorg and wyt3~lifnr <> wyt3~lifn2
inner join bsik on bsik~lifnr = wyt3~lifn2 where wyt3~lifnr = lfa1~lifnr and t024e~bukrs = lfb1~bukrs )
and lfa1~lifnr not in ( select lfza~lifnr from lfza inner join bsik on bsik~lifnr = lfza~empfk where lfza~lifnr = lfa1~lifnr )
)
and [3 more sets of sub queries like the 3 above, just checking different tables].
My Conclusion:
When exclusion is based on a single field, both not in/not exits work. One might be better than the other, depending on filters you use.
When exclusion is based on 2 or more fields and you don't have many-to-many join in main query, not exists ( select .. from table where id = a.id or id = b.id or... ) appears to be the best.
The moment your exclusion criteria implements a many-to-many relationship within your main query, I would recommend looking for an optimal way to implement multiple sub-queries instead (even having a sub-query for each key-table combination will perform better than a many-to-many join with 1 good sub-query, that looks good).
Anyways, any additional insight into this is welcome.
EDIT2: Although it's slightly off topic, given how my question was about sub-queries, I figured I would post an update. After over a year I had to revisit the solution I worked on to expand it. I learned that proper excluding join works. I just failed horribly at implementing it the first time.
select header~key
from headers left join items on headers~key = items~key
where items~key is null

if it is possible to use results of a sub-query to check more than
than 1 field or use another form of excluding join logic on multiple
fields
No, it is not possible to check two columns in subquery, as SAP Help clearly says:
The clauses in the subquery subquery_clauses must constitute a scalar
subquery.
Scalar is keyword here, i.e. it should return exactly one column.
Your subquery can have multi-column key, and such syntax is completely legit:
SELECT planetype, seatsmax
FROM saplane AS plane
WHERE seatsmax < #wa-seatsmax AND
seatsmax >= ALL ( SELECT seatsocc
FROM sflight
WHERE carrid = #wa-carrid AND
connid = #wa-connid )
however you say that these two fields should be checked against different tables
Those 2 fields are also checked against two more tables
so it's not the case for you. Your only choice seems to be multi-join.
P.S. FOR ALL ENTRIES does not support negation logic, you cannot just use some sort of NOT IN FOR ALL ENTRIES, it won't be that easy.

SQL SELECT query where the IDs were already found

I have 2 tables:
Table A has 3 columns (for example) with opportunity sales header data:
OPP_ID, CLOSE_DTTM, STAGE
Table B has 3 columns with the individual line items for the Opportunities:
OPP_LINE_ID, OPP_ID, AMOUNT_USD
I have a select statement that correctly parses through Table A and returns a list of Opportunities. What I would like to do is, without joining the data, to have a SELECT statement that will get data from Table B but only for the OPP_IDs that were found in my first query.
The result should be 2 views/resultset (one for each select query) and not just 1 combined view where Table B is joined to Table A.
The reason why I want to keep them separate is because I will have to perform a few manipulations to the result from table B and i don't want the result from table A affected.

Subquery is all what you need
SELECT OPP_ID, CLOSE_DTTM, STAGE
From table a
where a.opp_id IN (Select opp_id from table b)

Presuming you're using this in some client side data access library that represents B's data in some 2 dimensional collection and you want to manipulate it without affecting/ having A's data present in that collection:
Identify the records in A:
SELECT * FROM a WHERE somecolumn = 'somevalue'
Identify the records in B that relate to A, but don't return A's data:
SELECT b.* FROM a JOIN b ON a.opp_id = b.opp_id WHERE a.somecolumn = 'somevalue'
Just because JOIN is used doesn't mean your end-consuming program has to know about A's data. You could also use IN, like the other answer does, but internally the database will rewrite them to be the same thing anyway

I tend to use exists for this type of query:
select b.*
from b
where exists (select 1 from a where a.opp_id = b.opp_id);
If you want two results sets, you need to run two queries. It is unclear what the second query is, perhaps the first query on A.

Complex QueryOver

Hi I have to translate folowing sql to QueryOver
Will it be possible ? my actual query may more complex. But I have stuck in this stage.
SELECT InnerQuery.USERID,
InnerQuery.TRAFFICZONEID,
InnerQuery.StatusCategory,
COUNT(*) AS LineCount
FROM (
SELECT MissionID,
UserId,
TRAFFICZONEID,
CASE
WHEN status BETWEEN 1
AND 5
THEN 1
WHEN status BETWEEN 6
AND 8
THEN 2
WHEN status BETWEEN 9
AND 17
THEN 3
ELSE 0
END AS [StatusCategory]
FROM mission
) AS InnerQuery
LEFT OUTER JOIN trafficzone t ON InnerQuery.TRAFFICZONEID = t.Trafficzoneid
GROUP BY InnerQuery.USERID,
InnerQuery.TRAFFICZONEID,
InnerQuery.StatusCategory
Is it possible to do that kind of uery in QueryOver ? Or else what is the best way to do it with NHibernate ?
Thank you,
DineshNS

I suggest you use HQL for this.
From all the query methods supported by NHibernate, it's the one that works best with free-form queries.
The only thing it does NOT support directly is an outer join to a subquery. You can fake that using an implicit join with the subquery (from A, (subquery) B where A.X = B.X) and a UNION for items that don't have matching elements in the subquery.

For a complex query, you could create a view in database and use that from NHibernate.

Optimize a query that is using multiple left joins on the same tables

I've come across a query that is taking "too long". The query has 50+ left joins between 10 or so tables. To give a brief overview of the database model, the tables joined are tables that store data for a particular data type (ex: date_fields, integer_fields, text_fields, etc.) and each has a column for the value, a "datafield" id, and a ticket id. The query is built programmatically based on an association table between a "ticket" and its "data fields".
The join statements look something like the following:
...FROM tickets t
LEFT JOIN ticket_text_fields t001 ON(t.id=t001.ticket_id AND t001.textfield_id=7)
...
LEFT JOIN ticket_date_fields t056 ON(t.id=t056.ticket_id AND t056.datafield_id=434)
When using explain on the query shows the following:
1 SIMPLE t ref idx_dataset_id idx_dataset_id 5 const 2871 Using where; Using temporary; Using filesort
1 SIMPLE t001 ref idx_ticket_id,idx_datafield_id idx_ticket_id 5 t.id 5
...
1 SIMPLE t056 ref idx_ticket_id,idx_datafield_id idx_ticket_id 5 t.id 8
What direction can I take to tune this query? All the indexes seem to be in place. Perhaps the t table (tickets) row number (2871) should be reduced. How many left joins is too much? Should the datafield tables be joined only once and then queried each for the data that is required?

You're using a variation of the terrible antipattern called Entity-Attribute-Value. You're storing attributes on separate rows, so if you want to reconstruct something that looks like a conventional row of data, you need to make one join per attribute.
It's not surprising this creates a query with 50 joins. This is far too many for most databases to run efficiently (you haven't identified which database you're using). Eventually you'll want a few more attributes and you might exceed some architectural limit of the database on the number of joins it can do.
The solution is: don't reconstruct the row in SQL.
Instead, query the attributes as multiple rows, instead of trying to combine them onto a single row.
SELECT ... FROM tickets t
INNER JOIN ticket_text_fields f ON t.id=f.ticket_id
WHERE f.textfield_id IN (7, 8, 9, ...)
UNION ALL
SELECT ... FROM tickets t
INNER JOIN ticket_date_fields d ON t.id=d.ticket_id
WHERE d.datafield_id IN (434, 435, 436, ...)
Then you have to write a function in your application to loop over the resulting rowset, and collect the attributes one by one into an object in application space, so then you can use it as if it's a single entity.

for the clearer query i would use something like this:
SELECT ... FROM tickets as t
JOIN ticket_text_fields as txt ON t.id = txt.ticket_id
JOIN ticket_date_fields as dt ON t.id = dt.ticket_id
WHERE txt.textfield_id IN (...)
AND dt.datefield_id IN (...)
Joins would be probably LEFT, but it depends on the structure of your data.
There is no union in the query and there are only two joins

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

GroupBy and SelectMany analogues in SQL? - sql

You can do that using CROSS APPLY like this: SELECT * FROM mytable CROSS APPLY ( SELECT * FROM othertable WHERE instance_id = mytable.instance_id) I would recommend using a join if possible, but APPLY comes in handy if you don't have a simple join condition. This article explains it nicely.

Related

Query with Left outer join and group by returning duplicates

Abap subquery Where Cond [duplicate]

SQL SELECT query where the IDs were already found

Complex QueryOver

Optimize a query that is using multiple left joins on the same tables

Categories

Resources