I have two crosstab queries that are almost identical: one is for made-to-order (MTO) items and the other is for not-made-to-order (NMTO) items (an on/off checkbox; the criterion on each crosstab is True or False). The MTO query has 203 rows and the NMTO query has 160 rows. If I left join so that I get everything from the MTO query, I get 213 rows, but I need 225 rows in total; 17 rows only have NMTO data and aren't being included. I've tried rewriting the join using just conditions and am having no luck.
I'm probably missing something simple (I hope)
You need to create a "super-set" of the 2 datasets containing the key columns, and then put that in the middle of the query, outer-joined to each dataset.
Stage 1: create a UNION query to get the full set of unique keys from MTO and NMTO. This should contain the 225 records which form the superset.
Stage 2: create a second query which includes this UNION query plus the MTO and NMTO. Set the joins to include everything from UNION query and matching records from MTO; do exactly the same for the join between the UNION query and NMTO.
This should now return the 225 records.
==================================================================
EDIT: worked example
Assuming 2 tables (or queries), mto and nmto with a consistent ID field(s) on which to join, the following query would work:
SELECT superset.id, mto.mto_1, mto.mto_2, nmto.nmto_1, nmto.nmto_2
FROM ((SELECT mto.id from mto UNION SELECT nmto.id from nmto) AS superset
LEFT JOIN mto ON superset.id = mto.id)
LEFT JOIN nmto ON superset.id = nmto.id;
You can extend this approach to include more than 2 tables, with the superset at the centre. If you have any questions, please comment below and I will provide a screenshot and/or further information.
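To try the superset approach without Access, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the single mto_1/nmto_1 columns are made up for illustration, and SQLite does not need the nested parentheses that Access requires around multiple joins.

```python
import sqlite3

# Hypothetical sample data: ids 1-3 exist in mto, ids 3-5 exist in nmto.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mto  (id INTEGER PRIMARY KEY, mto_1 TEXT);
    CREATE TABLE nmto (id INTEGER PRIMARY KEY, nmto_1 TEXT);
    INSERT INTO mto  VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO nmto VALUES (3, 'x'), (4, 'y'), (5, 'z');
""")

# UNION (not UNION ALL) removes duplicate keys, so id 3 appears only once
# in the superset. Both tables are then left-joined to that superset.
rows = conn.execute("""
    SELECT superset.id, mto.mto_1, nmto.nmto_1
    FROM (SELECT id FROM mto UNION SELECT id FROM nmto) AS superset
    LEFT JOIN mto  ON superset.id = mto.id
    LEFT JOIN nmto ON superset.id = nmto.id
    ORDER BY superset.id
""").fetchall()

for row in rows:
    print(row)
```

Rows that exist on only one side come back with NULL (None in Python) in the other side's columns, which is the behaviour the question was missing with a plain left join.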
I have a requirement to pull records that do not have history in an archive table. Two fields of each record need to be checked against the archive.
In a technical sense, my requirement is a left join where the right side is null (a.k.a. an excluding join), which in ABAP Open SQL is commonly implemented like this (for my scenario, anyway):
Select * from xxxx //xxxx is a result for a multiple table join
where xxxx~key not in (select key from archive_table where [conditions] )
and xxxx~foreign_key not in (select key from archive_table where [conditions] )
Those 2 fields are also checked against 2 more tables, so that would mean a total of 6 subqueries.
Database engines that I have worked with previously usually had some methods to deal with such problems (such as excluding join or outer apply).
For this particular case I will be trying to use ABAP logic with 'for all entries', but I would still like to know if it is possible to use the results of a sub-query to check more than 1 field, or to use another form of excluding join logic on multiple fields in SQL (without involving the application server).
I have tested quite a few variations of sub-queries in the life-cycle of the program I was making. NOT EXISTS with multiple field check (shortened example below) to exclude based on 2 keys works in certain cases.
Performance is acceptable (processing time is about 5 seconds), although it's noticeably slower than the same query when excluding based on 1 field.
Select * from xxxx //xxxx is a result for a multiple table inner joins and 1 left join ( 1-* relation )
where NOT EXISTS (
select key from archive_table
where key = xxxx~key OR key = xxxx~foreign_key
)
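The NOT EXISTS pattern above can be tried outside ABAP. This is only a sketch in Python with sqlite3, not Open SQL; the table main_rows and the columns id and foreign_id are hypothetical stand-ins for the xxxx construct and its two keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the joined construct and the archive.
    CREATE TABLE main_rows     (id INTEGER, foreign_id INTEGER);
    CREATE TABLE archive_table (id INTEGER);
    INSERT INTO main_rows     VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO archive_table VALUES (1), (20);
""")

# A row is kept only if neither of its two keys appears in the archive.
kept = conn.execute("""
    SELECT m.id, m.foreign_id
    FROM main_rows AS m
    WHERE NOT EXISTS (
        SELECT 1
        FROM archive_table AS a
        WHERE a.id = m.id OR a.id = m.foreign_id
    )
""").fetchall()

print(kept)
```

Row (1, 10) is excluded through its own key, row (2, 20) through its foreign key, and only (3, 30) survives.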
EDIT:
With changing requirements (for more filtering) a lot has changed, so I figured I would update this. The construct I marked as XXXX in my example contained a single left join ( where main to secondary table relation is 1-* ) and it appeared relatively fast.
This is where context becomes helpful for understanding the problem:
Initial requirement: pull all vendors, without financial records in 3 tables.
Additional requirements: also exclude based on alternative payers (1-* relationship). This is what example above is based on.
More requirements: also exclude based on alternative payee (*-* relationship between payer and payee).
The many-to-many join multiplied the record count within the construct I labeled XXXX, which in turn produced a lot of unnecessary work. For instance: a single customer with 3 payers and 3 payees produced 9 rows, with a total of 27 fields to check (3 per row), when in reality there are only 7 unique values.
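The 9 rows / 27 fields / 7 unique values arithmetic can be reproduced with a small sketch in Python and sqlite3. The vendor, payer and payee tables and their ids are invented for illustration and are not the real SAP tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: one vendor with 3 payers and 3 payees.
    CREATE TABLE vendor (vendor_id TEXT);
    CREATE TABLE payer  (vendor_id TEXT, payer_id TEXT);
    CREATE TABLE payee  (vendor_id TEXT, payee_id TEXT);
    INSERT INTO vendor VALUES ('V1');
    INSERT INTO payer  VALUES ('V1', 'P1'), ('V1', 'P2'), ('V1', 'P3');
    INSERT INTO payee  VALUES ('V1', 'E1'), ('V1', 'E2'), ('V1', 'E3');
""")

# Joining both 1-* tables to the same vendor pairs every payer with every payee.
rows = conn.execute("""
    SELECT v.vendor_id, p.payer_id, e.payee_id
    FROM vendor AS v
    LEFT JOIN payer AS p ON p.vendor_id = v.vendor_id
    LEFT JOIN payee AS e ON e.vendor_id = v.vendor_id
""").fetchall()

values_to_check = [value for row in rows for value in row]
unique_values = set(values_to_check)

print(len(rows), len(values_to_check), len(unique_values))  # prints: 9 27 7
```

Any exclusion sub-query driven from this joined result is evaluated against all 27 values, even though only 7 distinct keys exist.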
At this point, moving the left-joined tables from the main query into sub-queries and splitting them gave significantly better performance than any smarter-looking alternatives.
select * from lfa1 inner join lfb1 on lfb1~lifnr = lfa1~lifnr
where
( lfa1~lifnr not in ( select lifnr from bsik where bsik~lifnr = lfa1~lifnr )
and lfa1~lifnr not in ( select wyt3~lifnr from wyt3 inner join t024e on wyt3~ekorg = t024e~ekorg and wyt3~lifnr <> wyt3~lifn2
inner join bsik on bsik~lifnr = wyt3~lifn2 where wyt3~lifnr = lfa1~lifnr and t024e~bukrs = lfb1~bukrs )
and lfa1~lifnr not in ( select lfza~lifnr from lfza inner join bsik on bsik~lifnr = lfza~empfk where lfza~lifnr = lfa1~lifnr )
)
and [3 more sets of sub queries like the 3 above, just checking different tables].
My Conclusion:
When exclusion is based on a single field, both NOT IN and NOT EXISTS work. One might be better than the other, depending on the filters you use.
When exclusion is based on 2 or more fields and you don't have many-to-many join in main query, not exists ( select .. from table where id = a.id or id = b.id or... ) appears to be the best.
The moment your exclusion criteria involve a many-to-many relationship within your main query, I would recommend looking for an optimal way to implement multiple sub-queries instead (even having a sub-query for each key-table combination will perform better than a many-to-many join with a single, neat-looking sub-query).
Anyways, any additional insight into this is welcome.
EDIT2: Although it's slightly off topic, given how my question was about sub-queries, I figured I would post an update. After over a year I had to revisit the solution I worked on to expand it. I learned that proper excluding join works. I just failed horribly at implementing it the first time.
select headers~key
from headers left join items on headers~key = items~key
where items~key is null
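For comparison, here is a minimal sketch of the excluding join next to the equivalent NOT EXISTS, in Python with sqlite3 rather than ABAP. The column name doc_id and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: headers 2 and 4 have no items.
    CREATE TABLE headers (doc_id INTEGER PRIMARY KEY);
    CREATE TABLE items   (doc_id INTEGER, descr TEXT);
    INSERT INTO headers VALUES (1), (2), (3), (4);
    INSERT INTO items   VALUES (1, 'p'), (1, 'q'), (3, 'r');
""")

# Excluding join: keep only headers whose left join found no item.
anti_join = conn.execute("""
    SELECT h.doc_id
    FROM headers AS h
    LEFT JOIN items AS i ON h.doc_id = i.doc_id
    WHERE i.doc_id IS NULL
    ORDER BY h.doc_id
""").fetchall()

# Same result expressed as a correlated NOT EXISTS sub-query.
not_exists = conn.execute("""
    SELECT h.doc_id
    FROM headers AS h
    WHERE NOT EXISTS (SELECT 1 FROM items AS i WHERE i.doc_id = h.doc_id)
    ORDER BY h.doc_id
""").fetchall()

print(anti_join, not_exists)
```

Both queries return the headers without items; which one performs better depends on the database and the data, as noted in the conclusion above.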
if it is possible to use results of a sub-query to check more than 1 field or use another form of excluding join logic on multiple fields
No, it is not possible to check two columns in subquery, as SAP Help clearly says:
The clauses in the subquery subquery_clauses must constitute a scalar subquery.
Scalar is the keyword here, i.e. the subquery must return exactly one column.
Your subquery can filter on a multi-column key, and such syntax is completely legitimate:
SELECT planetype, seatsmax
FROM saplane AS plane
WHERE seatsmax < @wa-seatsmax AND
seatsmax >= ALL ( SELECT seatsocc
FROM sflight
WHERE carrid = @wa-carrid AND
connid = @wa-connid )
However, you say that these two fields should be checked against different tables:
Those 2 fields are also checked against two more tables
so that is not your case. Your only choice seems to be a multi-join.
P.S. FOR ALL ENTRIES does not support negation logic. You cannot just use some sort of NOT IN FOR ALL ENTRIES; it won't be that easy.
I have an Oracle DB and use this query below to fetch records for a requirement. Five columns from three tables and a where condition.
select un.name, he.emp_no, he.lname, hr.in_unit, hr.out_unit
from hr_employee he
inner join hr_roster hr on he.eid = hr.eid
inner join units un on he.unit = un.unit_code
where hr.unit_date = to_date( '24-JUL-20','dd-MON-yy')
Later on I realized that the version below, without joins, is slightly faster.
select un.name, he.emp_no, he.lname, hr.in_unit, hr.out_unit
from hr_employee he, hr_roster hr, units un
where hr.unit_date = to_date( '24-JUL-20','dd-MON-yy')
But I notice that the two queries above fetch a different number of rows.
When I took a row count of both queries, the one using joins returned 1012, while the other kept fetching without ever returning a count.
I am a bit confused and do not know which query is the most suitable to use.
The second query behaves as a CROSS JOIN: there are no join conditions between the tables' columns, only a restriction on a date, whereas the first one uses standard INNER JOINs with regular join conditions.
The second query is basically incorrect, as it has no join conditions between the tables; its only restriction is the date filter on hr_roster. So it produces a Cartesian product: the selected hr_roster rows times ALL rows of hr_employee times ALL rows of units.
The first query, which is the correct one, produces the selected hr_roster rows matched to hr_employee by he.eid = hr.eid and to units by he.unit = un.unit_code.
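The difference is easy to see on a tiny data set. The sketch below uses Python with sqlite3 instead of Oracle, stores the date as plain text instead of using to_date, and fills the three tables with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data: 2 units, 3 employees, 3 roster rows.
    CREATE TABLE units       (unit_code TEXT, name TEXT);
    CREATE TABLE hr_employee (eid INTEGER, emp_no TEXT, lname TEXT, unit TEXT);
    CREATE TABLE hr_roster   (eid INTEGER, unit_date TEXT, in_unit TEXT, out_unit TEXT);
    INSERT INTO units VALUES ('U1', 'North'), ('U2', 'South');
    INSERT INTO hr_employee VALUES
        (1, 'E001', 'Silva', 'U1'), (2, 'E002', 'Perera', 'U2'), (3, 'E003', 'Dias', 'U1');
    INSERT INTO hr_roster VALUES
        (1, '2020-07-24', '08:00', '16:00'),
        (2, '2020-07-24', '09:00', '17:00'),
        (3, '2020-07-23', '08:00', '16:00');
""")

# Explicit joins: each roster row meets exactly its employee and unit.
joined = conn.execute("""
    SELECT un.name, he.emp_no, he.lname, hr.in_unit, hr.out_unit
    FROM hr_employee he
    INNER JOIN hr_roster hr ON he.eid = hr.eid
    INNER JOIN units un ON he.unit = un.unit_code
    WHERE hr.unit_date = '2020-07-24'
""").fetchall()

# Comma-separated tables with no join conditions: a Cartesian product.
unjoined = conn.execute("""
    SELECT un.name, he.emp_no, he.lname, hr.in_unit, hr.out_unit
    FROM hr_employee he, hr_roster hr, units un
    WHERE hr.unit_date = '2020-07-24'
""").fetchall()

# The comma syntax is fine once the join conditions are moved into WHERE.
comma_fixed = conn.execute("""
    SELECT un.name, he.emp_no, he.lname, hr.in_unit, hr.out_unit
    FROM hr_employee he, hr_roster hr, units un
    WHERE he.eid = hr.eid
      AND he.unit = un.unit_code
      AND hr.unit_date = '2020-07-24'
""").fetchall()

print(len(joined), len(unjoined), len(comma_fixed))  # prints: 2 12 2
```

Here 2 matching roster rows times 3 employees times 2 units gives 12 rows. On real tables the same multiplication is why the second query keeps fetching, and why it only appears faster when just the first rows are looked at.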
Here is my current query
SELECT * FROM jdsubs
INNER JOIN amipartnumbers ON amipartnumbers.oemitem = jdsubs.oempartnumber
WHERE ((([txtEnterNumber])
In ([jdsubs].[oemsubnumber],[jdsubs].[oempartnumber])))
UNION SELECT * FROM ihsubs
INNER JOIN amipartnumbers ON amipartnumbers.oemitem = ihsubs.oempartnumber
WHERE ((([txtEnterNumber])
In ([ihsubs].[oemsubnumber],[ihsubs].[oempartnumber])))
UNION SELECT * FROM mfsubs
INNER JOIN amipartnumbers ON amipartnumbers.oemitem = mfsubs.oempartnumber
WHERE ((([txtEnterNumber])
In ([mfsubs].[oemsubnumber],[mfsubs].[oempartnumber])));
Can I simplify this to just do a union in one query, and then in another query compare txtEnterNumber to oemsubnumber and oempartnumber?
I feel like this one query is doing too much work.
Or am I doing this right?
I'm searching about a million records, so I want to make sure this is as efficient as possible.
You'll have to run it as is, assuming oemitem, oempartnumber, and oemsubnumber are all indexed, as they should be.
If you union everything first and then try to compare your part numbers, you'll be doing so against an un-indexed query result.
A couple of ideas for improvement are:
If a part number can match only one parts table, then run each query one at a time until you get a result back.
Combine all three of your part tables (setting one field as a flag to determine part origin), then run your search against that table.
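The second idea could look roughly like the sketch below, written in Python with sqlite3 rather than Access. The combined table allsubs, its origin flag column, the amiitem column and all sample values are hypothetical; only oemitem, oempartnumber and oemsubnumber come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical combined table; origin flags the source table (jd, ih, mf).
    CREATE TABLE allsubs (origin TEXT, oempartnumber TEXT, oemsubnumber TEXT);
    CREATE INDEX idx_allsubs_part ON allsubs (oempartnumber);
    CREATE INDEX idx_allsubs_sub  ON allsubs (oemsubnumber);
    CREATE TABLE amipartnumbers (oemitem TEXT, amiitem TEXT);
    CREATE INDEX idx_ami_oemitem ON amipartnumbers (oemitem);
    INSERT INTO allsubs VALUES
        ('jd', 'A100', 'S1'), ('ih', 'B200', 'A100'), ('mf', 'C300', 'S3');
    INSERT INTO amipartnumbers VALUES
        ('A100', 'AMI-1'), ('B200', 'AMI-2'), ('C300', 'AMI-3');
""")

search = "A100"  # stands in for the txtEnterNumber form field

# One indexed search replaces the three UNIONed queries.
rows = conn.execute("""
    SELECT s.origin, s.oempartnumber, s.oemsubnumber, a.amiitem
    FROM allsubs AS s
    INNER JOIN amipartnumbers AS a ON a.oemitem = s.oempartnumber
    WHERE s.oempartnumber = ? OR s.oemsubnumber = ?
    ORDER BY s.origin
""", (search, search)).fetchall()

print(rows)
```

The search value matches the jd row through its part number and the ih row through its sub number, and the origin column tells you which source table each hit came from.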
Good luck
I have a very large view of 5 million records with repeated names, where each row has a unique transaction number. There is also another view of 9000 records containing unique names. Now I want to retrieve the records in the first view whose names are present in the second view:
select * from v1 where name in (select name from v2)
But the query is taking very long to run. Is there any short cut method?
Did you try just using an INNER JOIN? This will return all rows that exist in both tables:
select v1.*
from v1
INNER JOIN v2
on v1.name = v2.name
If you need help learning JOIN syntax, here is a great visual explanation.
You can add the DISTINCT keyword which will remove any duplicate values that the query returns.
Use JOIN.
DISTINCT lets you return only unique records, since you are joining to the other table and a record may have more than one match there.
SELECT DISTINCT a.*
FROM v1 a
INNER JOIN v2 b
ON a.name = b.name
For faster performance, add an index on column NAME on both tables since you are joining through it.
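To see when DISTINCT actually matters, here is a small sketch in Python with sqlite3. It uses tables instead of views, a hypothetical txn column for the transaction number, and deliberately puts a duplicate name into v2, which the question says does not happen in the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data; v2 contains 'ann' twice on purpose.
    CREATE TABLE v1 (txn INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE v2 (name TEXT);
    CREATE INDEX idx_v1_name ON v1 (name);
    CREATE INDEX idx_v2_name ON v2 (name);
    INSERT INTO v1 VALUES (1, 'ann'), (2, 'ann'), (3, 'bob'), (4, 'cy');
    INSERT INTO v2 VALUES ('ann'), ('ann'), ('cy');
""")

in_rows = conn.execute(
    "SELECT * FROM v1 WHERE name IN (SELECT name FROM v2) ORDER BY txn"
).fetchall()

# A plain join repeats each v1 row once per matching v2 row.
join_rows = conn.execute(
    "SELECT v1.* FROM v1 INNER JOIN v2 ON v1.name = v2.name ORDER BY v1.txn"
).fetchall()

# DISTINCT collapses those repeats back to one row per transaction.
distinct_rows = conn.execute(
    "SELECT DISTINCT a.* FROM v1 a INNER JOIN v2 b ON a.name = b.name ORDER BY a.txn"
).fetchall()

print(len(in_rows), len(join_rows), len(distinct_rows))  # prints: 3 5 3
```

If the names in the second view really are unique, the plain join already returns the same rows as the IN query and DISTINCT only adds extra work.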
To learn more about joins, see the link below:
Visual Representation of SQL Joins
I have combined two different tables together, one side is named DynDom and the other is CATH. I am trying to remove duplicates from that table such as below:
However, if I select distinct DynDom pdbcode values from the table, it returns the distinct values of that pdbcode.
Based on the pictures above, I commented out the DynDom/CATH columns in the table and ran the query separately for DynDom and for CATH, and each returned the values I need. I was wondering if it's possible to use 2 DISTINCT statements to return distinct values of the entire table based on the pdbcode.
Here's my code :
select DISTINCT
cath_dyndom_table_2."DYNDOM_DOMAINID",
cath_dyndom_table_2."DYNDOM_DSTART",
cath_dyndom_table_2."DYNDOM_DEND",
cath_dyndom_table_2."DYNDOM_CONFORMERID",
cath_dyndom_table_2.pdbcode,
cath_dyndom_table_2."DYNDOM_ChainID",
cath_dyndom_table_2.cath_pdbcode,
cath_dyndom_table_2."CATH_BEGIN",
cath_dyndom_table_2."CATH_END"
from
cath_dyndom_table_2
where
pdbcode = '2hun'
order by
cath_dyndom_table_2."DYNDOM_DOMAINID",
cath_dyndom_table_2."DYNDOM_DSTART",
cath_dyndom_table_2."DYNDOM_DEND",
cath_dyndom_table_2.pdbcode,
cath_dyndom_table_2.cath_pdbcode,
cath_dyndom_table_2."CATH_BEGIN",
cath_dyndom_table_2."CATH_END";
In the end, I would like to search domains from DynDom and CATH based on the pdbcode and return the rows without duplicate values.
Thank you.
UPDATE :
This is the VIEW that I have created.
CREATE VIEW cath_dyndom_table AS
SELECT
r.domainid AS "DYNDOM_DOMAINID",
r.DomainStart AS "DYNDOM_DSTART",
r.Domain_End AS "DYNDOM_DEND",
r.ddid AS "DYN_DDID",
r.confid AS "DYNDOM_CONFORMERID",
r.pdbcode,
r.chainid AS "DYNDOM_ChainID",
d.cath_pdbcode,
d.cathbegin AS "CATH_BEGIN",
d.cathend AS "CATH_END"
FROM dyndom_domain_table r
FULL OUTER JOIN cath_domains d ON d.cath_pdbcode::character(4) = r.pdbcode
ORDER BY confid ASC;
What you are getting is the Cartesian product of the two tables.
In order to get one line without duplicates, you need to have a 1-to-1 relation between both tables.
You can see HERE what are cartesian joins and HERE how to avoid them!
It sounds as though you want a UNION of domain name and ranges from each table - this can be achieved like so:
SELECT DYNDOM_DOMAINID, DYNDOM_DSTART, DYNDOM_DEND
FROM DynDom
UNION
SELECT RTRIM(cath_pdbcode), CATH_BEGIN, CATH_END
FROM CATH
This should eliminate exact duplicates (i.e. where the domain name, start and end are all identical) but will not eliminate duplicate domain names with different ranges. If these exist, you will need to decide how to handle them (retain them as separate entries, combine them with the lowest start and highest end, or whatever other option is preferred).
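A minimal sketch of that UNION behaviour, in Python with sqlite3 rather than PostgreSQL. The simplified tables, lower-case column names and sample ranges are assumptions; the CATH codes are stored with a trailing space to mimic a padded character column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified versions of the two source tables.
    CREATE TABLE dyndom (domainid TEXT, dstart INTEGER, dend INTEGER);
    CREATE TABLE cath   (cath_pdbcode TEXT, cathbegin INTEGER, cathend INTEGER);
    INSERT INTO dyndom VALUES ('2hunA', 1, 100), ('2hunA', 101, 200);
    INSERT INTO cath   VALUES ('2hunA ', 1, 100), ('2hunA ', 150, 250);
""")

# UNION drops the row that is identical on both sides after RTRIM,
# but keeps rows that share a domain name and differ in range.
rows = conn.execute("""
    SELECT domainid, dstart, dend FROM dyndom
    UNION
    SELECT RTRIM(cath_pdbcode), cathbegin, cathend FROM cath
    ORDER BY 1, 2
""").fetchall()

for row in rows:
    print(row)
```

The (1, 100) range appears once even though both tables contain it, while the two differing ranges for the same domain remain as separate rows.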
EDIT: Actually, I believe you can get the desired results simply by changing the JOIN ON condition in your view to be:
FULL OUTER JOIN cath_domains d
ON d.cath_pdbcode::character(5) = r.pdbcode || r.chainid AND
r.DomainStart <= d.cathbegin AND
r.Domain_End >= d.cathend