I have a very large view of 5 million records in which names repeat but every row has a unique transaction number. A second view of 9,000 records contains unique names. I want to retrieve the records in the first view whose names are present in the second view:
select * from v1 where name in (select name from v2)
But the query is taking very long to run. Is there a faster way?
Did you try just using an INNER JOIN? It returns the rows of v1 whose name has a match in v2:
select v1.*
from v1
INNER JOIN v2
on v1.name = v2.name
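As a quick sanity check that the IN form and the JOIN form return the same rows when the names in v2 are unique, here is a minimal sketch using Python's built-in sqlite3 module. The tables stand in for the two views, and the column txn_id and the sample data are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the two views in the question.
    CREATE TABLE v1 (txn_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE v2 (name TEXT UNIQUE);
    INSERT INTO v1 VALUES (1, 'ann'), (2, 'ann'), (3, 'bob'), (4, 'cid');
    INSERT INTO v2 VALUES ('ann'), ('cid');
""")

# Original form: filter v1 with an IN subquery.
in_rows = conn.execute(
    "SELECT * FROM v1 WHERE name IN (SELECT name FROM v2) ORDER BY txn_id"
).fetchall()

# Suggested form: inner join on the name column.
join_rows = conn.execute(
    "SELECT v1.* FROM v1 INNER JOIN v2 ON v1.name = v2.name ORDER BY v1.txn_id"
).fetchall()

print(in_rows)
print(join_rows)
```

Whether the join is actually faster depends on the database engine and on how the views are defined, so it is worth comparing execution plans on the real data.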
If you need help learning JOIN syntax, here is a great visual explanation.
You can add the DISTINCT keyword which will remove any duplicate values that the query returns.
Use a JOIN.
DISTINCT makes the query return only unique records. Because you are joining to the other table, a record may match more than one row there, and without DISTINCT it would be returned once per match.
SELECT DISTINCT a.*
FROM v1 a
INNER JOIN v2 b
ON a.name = b.name
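To illustrate why DISTINCT can matter, here is a small sketch (Python's sqlite3, made-up data) in which v2 contains the same name twice. That contradicts the question's statement that names in v2 are unique, so treat it as the hypothetical case this answer guards against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE v1 (txn_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE v2 (name TEXT);  -- assumption: no uniqueness guarantee on name
    INSERT INTO v1 VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO v2 VALUES ('ann'), ('ann');
""")

# Without DISTINCT the v1 row is repeated once per matching v2 row.
plain = conn.execute(
    "SELECT a.* FROM v1 a INNER JOIN v2 b ON a.name = b.name"
).fetchall()

# With DISTINCT each v1 row appears only once.
distinct = conn.execute(
    "SELECT DISTINCT a.* FROM v1 a INNER JOIN v2 b ON a.name = b.name"
).fetchall()

print(plain)
print(distinct)
```

If v2 is guaranteed to hold unique names, DISTINCT adds work without changing the result and can be left out.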
For faster performance, add an index on the NAME column of both tables, since you are joining on it.
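A rough sketch of the indexing idea, using Python's sqlite3. Since v1 and v2 are views, the indexes would normally have to go on the base tables behind them. The tables t1 and t2 and the index names below are hypothetical, and the plan text differs from one database engine to the next.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical base tables behind the two views.
    CREATE TABLE t1 (txn_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE t2 (name TEXT);
    CREATE INDEX idx_t1_name ON t1 (name);
    CREATE INDEX idx_t2_name ON t2 (name);
""")

# Ask SQLite how it intends to execute the join.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT t1.* FROM t1 INNER JOIN t2 ON t1.name = t2.name"
).fetchall()

# The last field of each plan row is a human-readable description of the step.
for row in plan:
    print(row[-1])
```

On other engines the equivalent check is the engine's own EXPLAIN or execution-plan tool.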
To learn more about joins, see the link below:
Visual Representation of SQL Joins
I have a requirement to pull records that do not have history in an archive table. Two fields of each record need to be checked against the archive.
In a technical sense, my requirement is a left join where the right side is null (a.k.a. an excluding join), which in ABAP Open SQL is commonly implemented like this (for my scenario, anyway):
Select * from xxxx //xxxx is a result for a multiple table join
where xxxx~key not in (select key from archive_table where [conditions] )
and xxxx~foreign_key not in (select key from archive_table where [conditions] )
Those 2 fields are also checked against 2 more tables, so that would mean a total of 6 subqueries.
Database engines that I have worked with previously usually had some methods to deal with such problems (such as excluding join or outer apply).
For this particular case I will try to use ABAP logic with 'for all entries', but I would still like to know whether it is possible to use the results of a sub-query to check more than 1 field, or to use another form of excluding join logic on multiple fields in SQL (without involving the application server).
I have tested quite a few variations of sub-queries over the life cycle of the program I was making. NOT EXISTS with a check on multiple fields (shortened example below), excluding on 2 keys, works in certain cases.
Performance is acceptable (processing time is about 5 seconds), although it is noticeably slower than the same query when excluding on 1 field.
Select * from xxxx //xxxx is a result for a multiple table inner joins and 1 left join ( 1-* relation )
where NOT EXISTS (
select key from archive_table
where key = xxxx~key OR key = xxxx~foreign_key
)
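For readers who want to try the NOT EXISTS pattern outside ABAP, here is a minimal sketch in plain SQL run through Python's sqlite3. The table main_rows, the columns id and ref_id, and the data are made-up stand-ins for the joined construct and the archive table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins: main_rows plays the role of xxxx.
    CREATE TABLE main_rows (id TEXT, ref_id TEXT);
    CREATE TABLE archive_table (id TEXT);
    INSERT INTO main_rows VALUES ('A', 'X'), ('B', 'Y'), ('C', 'Z');
    INSERT INTO archive_table VALUES ('A'), ('Y');
""")

# Keep only rows where neither of the two fields has an archive entry.
kept = conn.execute("""
    SELECT m.id, m.ref_id
    FROM main_rows m
    WHERE NOT EXISTS (
        SELECT 1 FROM archive_table a
        WHERE a.id = m.id OR a.id = m.ref_id
    )
    ORDER BY m.id
""").fetchall()

print(kept)
```

Row A is excluded through its own id, row B through its ref_id, and only row C survives.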
EDIT:
With changing requirements (for more filtering) a lot has changed, so I figured I would update this. The construct I marked as XXXX in my example contained a single left join ( where main to secondary table relation is 1-* ) and it appeared relatively fast.
This is where context becomes helpful for understanding the problem:
Initial requirement: pull all vendors without financial records in 3 tables.
Additional requirement: also exclude based on alternative payers (1-* relationship). This is what the example above is based on.
More requirements: also exclude based on alternative payees (*-* relationship between payer and payee).
The many-to-many join multiplied the record count within the construct I labeled XXXX, which in turn produced a lot of unnecessary work. For instance: a single customer with 3 payers and 3 payees produced 9 rows, with a total of 27 fields to check (3 per row), when in reality there are only 7 unique values.
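The arithmetic in that example can be reproduced with a few lines of Python. The identifiers C1, P1..P3 and E1..E3 are made-up placeholders for one customer, its payers and its payees.

```python
from itertools import product

customer = "C1"
payers = ["P1", "P2", "P3"]
payees = ["E1", "E2", "E3"]

# A many-to-many join pairs every payer with every payee for the customer.
rows = [(customer, payer, payee) for payer, payee in product(payers, payees)]

fields_to_check = sum(len(row) for row in rows)            # 3 fields per row
unique_values = {value for row in rows for value in row}   # what really needs checking

print(len(rows), fields_to_check, len(unique_values))  # 9 27 7
```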
At this point, moving the left-joined tables from the main query into sub-queries and splitting them gave significantly better performance than any smarter-looking alternatives.
select * from lfa1 inner join lfb1
where
( lfa1~lifnr not in ( select lifnr from bsik where bsik~lifnr = lfa1~lifnr )
and lfa1~lifnr not in ( select wyt3~lifnr from wyt3 inner join t024e on wyt3~ekorg = t024e~ekorg and wyt3~lifnr <> wyt3~lifn2
inner join bsik on bsik~lifnr = wyt3~lifn2 where wyt3~lifnr = lfa1~lifnr and t024e~bukrs = lfb1~bukrs )
and lfa1~lifnr not in ( select lfza~lifnr from lfza inner join bsik on bsik~lifnr = lfza~empfk where lfza~lifnr = lfa1~lifnr )
)
and [3 more sets of sub queries like the 3 above, just checking different tables].
My Conclusion:
When exclusion is based on a single field, both NOT IN and NOT EXISTS work. One might be better than the other, depending on the filters you use.
When exclusion is based on 2 or more fields and you don't have a many-to-many join in the main query, not exists ( select .. from table where id = a.id or id = b.id or... ) appears to be the best.
As soon as your exclusion criteria introduce a many-to-many relationship into the main query, I would recommend looking for an optimal way to implement multiple sub-queries instead (even a sub-query for each key-table combination will perform better than a many-to-many join with one good-looking sub-query).
Anyways, any additional insight into this is welcome.
EDIT2: Although it's slightly off topic, given how my question was about sub-queries, I figured I would post an update. After over a year I had to revisit the solution I worked on to expand it. I learned that proper excluding join works. I just failed horribly at implementing it the first time.
select headers~key
from headers left join items on headers~key = items~key
where items~key is null
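The same excluding join can be tried in plain SQL. Below is a minimal sketch with Python's sqlite3. The column name id and the sample data are assumptions, and the ABAP tilde notation is replaced by the usual dot notation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE headers (id TEXT);
    CREATE TABLE items (id TEXT);
    INSERT INTO headers VALUES ('H1'), ('H2'), ('H3');
    INSERT INTO items VALUES ('H1'), ('H1'), ('H3');
""")

# Left join, then keep only the headers for which no item row was found.
without_items = conn.execute("""
    SELECT h.id
    FROM headers h
    LEFT JOIN items i ON h.id = i.id
    WHERE i.id IS NULL
""").fetchall()

print(without_items)
```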
if it is possible to use results of a sub-query to check more than 1 field or use another form of excluding join logic on multiple fields
No, it is not possible to check two columns in a subquery, as SAP Help clearly says:
The clauses in the subquery subquery_clauses must constitute a scalar subquery.
Scalar is the keyword here, i.e. the subquery must return exactly one column.
Your subquery can have a multi-column key in its WHERE clause, and such syntax is completely legitimate:
SELECT planetype, seatsmax
FROM saplane AS plane
WHERE seatsmax < @wa-seatsmax AND
seatsmax >= ALL ( SELECT seatsocc
FROM sflight
WHERE carrid = @wa-carrid AND
connid = @wa-connid )
However, you say that these two fields should be checked against different tables:
Those 2 fields are also checked against two more tables
so that is not your case. Your only choice seems to be a multi-join.
P.S. FOR ALL ENTRIES does not support negation logic. You cannot just use some sort of NOT IN FOR ALL ENTRIES, so it won't be that easy.
I am looking for the best way to combine two tables so that duplicate records (matched on email) are removed, with any duplicate replaced by the values in "Table 2". I have considered a full outer join and UNION ALL, but the UNION ALL would be too large because each table has several thousand columns. I want to create this combined table as my full reference table and save it as a view, so I can reference it without always adding a union or something similar to my already complex statements. From my understanding, a full outer join will not necessarily remove duplicates. I want to:
a. Create table with ALL columns from both tables (fields that don't apply to records in one table will just have null values)
b. Remove duplicate records from this master table based on email field but only remove the table 1 records and keep the table 2 duplicates as they have the information that I want
c. A left-join will not work as both tables have unique records that I want to retain and I would like all 1000+ columns to be retained from each table
I don't know how feasible this even is but thank you so much for any answers!
If I understand your question correctly, you want to join two large tables with thousands of columns that (hopefully) are the same in both tables, using the email column as the join condition and replacing records duplicated between the two tables with the records from Table 2.
I had to do something similar a few days ago so maybe you can modify my query for your purposes:
WITH only_in_table_1 AS(
SELECT *
FROM table_1 A
WHERE NOT EXISTS
(SELECT * FROM table_2 B WHERE B.email_field = A.email_field))
SELECT * FROM table_2
UNION ALL
SELECT * FROM only_in_table_1
If the columns/fields aren't the same between the tables, you can use a full outer join on only_in_table_1 and table_2 instead.
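Here is a small runnable version of the NOT EXISTS plus UNION ALL query above, using Python's sqlite3. It assumes both tables have identical columns. The source column and the sample emails are made up so that it is visible which table each surviving row came from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (email_field TEXT, source TEXT);
    CREATE TABLE table_2 (email_field TEXT, source TEXT);
    INSERT INTO table_1 VALUES ('a@x.com', 't1'), ('b@x.com', 't1');
    INSERT INTO table_2 VALUES ('b@x.com', 't2'), ('c@x.com', 't2');
""")

combined = conn.execute("""
    WITH only_in_table_1 AS (
        SELECT * FROM table_1 A
        WHERE NOT EXISTS
            (SELECT 1 FROM table_2 B WHERE B.email_field = A.email_field)
    )
    SELECT * FROM table_2
    UNION ALL
    SELECT * FROM only_in_table_1
    ORDER BY email_field
""").fetchall()

print(combined)
```

The shared email b@x.com appears once, and the row that is kept is the one from table_2.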
Try using a FULL OUTER JOIN between the two tables and then a COALESCE function on each result-set column to determine from which table/column that column is populated.
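A sketch of the COALESCE idea with Python's sqlite3. Older SQLite versions lack FULL OUTER JOIN, so this example emulates it by left-joining both tables to the union of all emails, which gives the same rows as long as email is unique within each table. The columns phone, t1_only and t2_only and all data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (email TEXT, phone TEXT, t1_only TEXT);
    CREATE TABLE table_2 (email TEXT, phone TEXT, t2_only TEXT);
    INSERT INTO table_1 VALUES ('a@x.com', '111', 'x1'), ('b@x.com', '222', 'x2');
    INSERT INTO table_2 VALUES ('b@x.com', '999', 'y2'), ('c@x.com', '333', 'y3');
""")

merged = conn.execute("""
    SELECT e.email,
           COALESCE(t2.phone, t1.phone) AS phone,  -- table_2 wins when both have a value
           t1.t1_only,
           t2.t2_only
    FROM (SELECT email FROM table_1 UNION SELECT email FROM table_2) e
    LEFT JOIN table_1 t1 ON t1.email = e.email
    LEFT JOIN table_2 t2 ON t2.email = e.email
    ORDER BY e.email
""").fetchall()

for row in merged:
    print(row)
```

On an engine with native FULL OUTER JOIN, the derived table e and the two left joins collapse into a single join between table_1 and table_2, and the email column itself also goes through COALESCE.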
I ran this query against AdventureWorks. It executes successfully, but I only get the column headers and no data rows. Why?
select
a.BusinessEntityID,b.bonus,b.SalesLastYear
from
[Sales].[SalesPersonQuotaHistory] a
inner join
[Sales].[SalesPerson] b
on
a.SalesQuota = b.SalesQuota
My best guess is that instead of joining the tables on SalesQuota, you should be joining them on something else: an ID field, typically.
I don't have AdventureWorks here, but judging from the names of the tables and the columns that you've provided, I would assume that there's an ID field that actually connects a salesperson's quota history to the salesperson him/herself. Since your query already selects a.BusinessEntityID, that column is the most likely candidate.
I would expect that you're looking for something closer to this:
SELECT
a.BusinessEntityID
,b.bonus
,b.SalesLastYear
FROM [Sales].[SalesPersonQuotaHistory] a
INNER JOIN [Sales].[SalesPerson] b
ON a.BusinessEntityID = b.BusinessEntityID
General Knowledge:
INNER JOIN means "Show me only entries (rows) that have a matching value on both sides of the condition." (i.e. The value in Table A matches the value in Table B).
So ON a.SalesQuota = b.SalesQuota means "Only where the value of SalesQuota in Table A matches the value of SalesQuota in Table B."
I'm not sure what the purpose of this query could be, since it is entirely possible that two salespeople have the same values in both tables, and then you would get duplicate rows (because the values of SalesQuota would match in both cases), or that the values wouldn't match at all, and then you wouldn't get any rows - I suspect that is what's happening to you.
Consider the conditions of what you're trying to join. Are you really trying to join quota amounts, or are you trying to retrieve quota information for specific salespeople? The answer should help guide your JOIN conditions.
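To make the difference concrete, here is a toy sketch with Python's sqlite3. The tables sales_person and quota_history, the column person_id and all figures are invented stand-ins, not the real AdventureWorks schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified stand-ins for the two tables.
    CREATE TABLE sales_person (person_id INTEGER, sales_quota INTEGER, bonus INTEGER);
    CREATE TABLE quota_history (person_id INTEGER, sales_quota INTEGER);
    INSERT INTO sales_person VALUES (1, 250000, 500), (2, 300000, 700);
    INSERT INTO quota_history VALUES (1, 28000), (1, 31000), (2, 28000);
""")

# Joining on the quota amount: no history amount equals a current quota here.
by_quota = conn.execute("""
    SELECT a.person_id, b.bonus
    FROM quota_history a
    INNER JOIN sales_person b ON a.sales_quota = b.sales_quota
""").fetchall()

# Joining on the ID: every history row finds its salesperson.
by_id = conn.execute("""
    SELECT a.person_id, b.bonus
    FROM quota_history a
    INNER JOIN sales_person b ON a.person_id = b.person_id
    ORDER BY a.person_id
""").fetchall()

print(by_quota)
print(by_id)
```

The first query returns only column headers, which matches the symptom described in the question.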
I've recently been working in MySQL, and in one of my queries I wrote:
SELECT SIGLE_EEP, ID_SOUS_MODULE, LIBELLE
FROM mef_edi.eep a, mef_edi.envoi e, mef_edi.sous_module s
WHERE a.ID_EEP = e.ID_EEP
AND a.ID_SOUS_MODULE = s.ID_SOUS_MODULE;
and I got this error:
Column ID_SOUS_MODULE in field list is ambiguous
What should I do?
More than one table has a column named ID_SOUS_MODULE.
So you need to name the table every time you mention the column to specify which table you mean.
Change
SELECT ID_SOUS_MODULE
for instance to
SELECT a.ID_SOUS_MODULE
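The same kind of error can be reproduced in SQLite, which is used here through Python's sqlite3 only because it runs anywhere. The table layouts and data are guesses for illustration, and the exact error text differs from MySQL's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, reduced versions of two of the tables in the question.
    CREATE TABLE eep (ID_EEP INTEGER, ID_SOUS_MODULE INTEGER, SIGLE_EEP TEXT);
    CREATE TABLE sous_module (ID_SOUS_MODULE INTEGER, LIBELLE TEXT);
    INSERT INTO eep VALUES (1, 10, 'EEP1');
    INSERT INTO sous_module VALUES (10, 'Module A');
""")

# Unqualified column that exists in both tables: the database cannot pick one.
try:
    conn.execute("""
        SELECT ID_SOUS_MODULE
        FROM eep a, sous_module s
        WHERE a.ID_SOUS_MODULE = s.ID_SOUS_MODULE
    """)
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# Qualifying the column with a table alias removes the ambiguity.
rows = conn.execute("""
    SELECT a.ID_SOUS_MODULE, s.LIBELLE
    FROM eep a, sous_module s
    WHERE a.ID_SOUS_MODULE = s.ID_SOUS_MODULE
""").fetchall()

print(error_message)
print(rows)
```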
I agree with the answer above: you have duplicate column names across your 3 tables, and prefixing the column with the table alias (a, e, s), as noted above, avoids the issue in the SELECT. In addition to what @juergen said, you may want to replace the comma-separated table list with an inner or left join (inner seems to be what you're going for). Written with commas, the query logically builds every combination of rows and then filters them in the WHERE clause. Most optimizers turn this into the same plan as an explicit join, but the explicit form is easier to read and makes it much harder to forget a join condition, which really would produce a cartesian product, as your tables and queries grow. Here is the same query with explicit joins:
SELECT SIGLE_EEP, a.ID_SOUS_MODULE, LIBELLE
FROM mef_edi.eep a
INNER JOIN mef_edi.envoi e ON (a.ID_EEP = e.ID_EEP)
INNER JOIN mef_edi.sous_module s ON (a.ID_SOUS_MODULE = s.ID_SOUS_MODULE)
I have combined two different tables, one named DynDom and the other CATH. I am trying to remove duplicates from the combined table, such as the ones below:
However, if I select the distinct DynDom pdbcode from the table, it returns the distinct values for that pdbcode.
Based on the pictures above, I commented out the DynDom and CATH columns in turn and ran the query separately for each, and it returned the values I need. I was wondering whether it is possible to use 2 DISTINCT statements to return distinct values for the entire table based on the pdbcode.
Here's my code:
select DISTINCT
cath_dyndom_table_2."DYNDOM_DOMAINID",
cath_dyndom_table_2."DYNDOM_DSTART",
cath_dyndom_table_2."DYNDOM_DEND",
cath_dyndom_table_2."DYNDOM_CONFORMERID",
cath_dyndom_table_2.pdbcode,
cath_dyndom_table_2."DYNDOM_ChainID",
cath_dyndom_table_2.cath_pdbcode,
cath_dyndom_table_2."CATH_BEGIN",
cath_dyndom_table_2."CATH_END"
from
cath_dyndom_table_2
where
pdbcode = '2hun'
order by
cath_dyndom_table_2."DYNDOM_DOMAINID",
cath_dyndom_table_2."DYNDOM_DSTART",
cath_dyndom_table_2."DYNDOM_DEND",
cath_dyndom_table_2.pdbcode,
cath_dyndom_table_2.cath_pdbcode,
cath_dyndom_table_2."CATH_BEGIN",
cath_dyndom_table_2."CATH_END";
In the end, I would like to search domains from DynDom and CATH based on the pdbcode and return the rows without duplicate values.
Thank you.
UPDATE:
This is the view that I created:
CREATE VIEW cath_dyndom_table AS
SELECT
r.domainid AS "DYNDOM_DOMAINID",
r.DomainStart AS "DYNDOM_DSTART",
r.Domain_End AS "DYNDOM_DEND",
r.ddid AS "DYN_DDID",
r.confid AS "DYNDOM_CONFORMERID",
r.pdbcode,
r.chainid AS "DYNDOM_ChainID",
d.cath_pdbcode,
d.cathbegin AS "CATH_BEGIN",
d.cathend AS "CATH_END"
FROM dyndom_domain_table r
FULL OUTER JOIN cath_domains d ON d.cath_pdbcode::character(4) = r.pdbcode
ORDER BY confid ASC;
What you are getting is the cartesian product of the matching rows of the two tables: every DynDom row for a pdbcode is paired with every CATH row for the same pdbcode.
To get one line without duplicates, you need a 1-to-1 relation between both tables.
You can see HERE what cartesian joins are and HERE how to avoid them!
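A tiny sketch of that row multiplication, using Python's sqlite3. The tables dyndom and cath are heavily reduced, hypothetical versions of the real ones, and an inner join is used instead of the view's FULL OUTER JOIN because every row matches in this toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dyndom (pdbcode TEXT, domainid TEXT);
    CREATE TABLE cath (cath_pdbcode TEXT, cathbegin INTEGER);
    INSERT INTO dyndom VALUES ('2hun', 'D1'), ('2hun', 'D2');
    INSERT INTO cath VALUES ('2hun', 1), ('2hun', 50), ('2hun', 120);
""")

# Joining only on the pdbcode pairs each of the 2 DynDom rows
# with each of the 3 CATH rows that share the code.
pairs = conn.execute("""
    SELECT r.domainid, d.cathbegin
    FROM dyndom r
    JOIN cath d ON d.cath_pdbcode = r.pdbcode
    ORDER BY r.domainid, d.cathbegin
""").fetchall()

print(len(pairs))  # 6
for pair in pairs:
    print(pair)
```

DISTINCT cannot collapse these rows, because each combination of DynDom and CATH values is different.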
It sounds as though you want a UNION of domain name and ranges from each table - this can be achieved like so:
SELECT DYNDOM_DOMAINID, DYNDOM_DSTART, DYNDOM_DEND
FROM DynDom
UNION
SELECT RTRIM(cath_pdbcode), CATH_BEGIN, CATH_END
FROM CATH
This should eliminate exact duplicates (i.e. where the domain name, start and end are all identical) but will not eliminate duplicate domain names with different ranges. If these exist, you will need to decide how to handle them (retain them as separate entries, combine them with the lowest start and highest end, or whatever other option is preferred).
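The behaviour described here can be checked on toy data with Python's sqlite3. The column names are shortened and the rows are invented. One CATH code carries a trailing space to show what RTRIM is for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dyndom (domainid TEXT, dstart INTEGER, dend INTEGER);
    CREATE TABLE cath (cath_pdbcode TEXT, cathbegin INTEGER, cathend INTEGER);
    INSERT INTO dyndom VALUES ('2hunA', 1, 100), ('2hunA', 101, 200);
    INSERT INTO cath VALUES ('2hunA ', 1, 100), ('2hunB', 5, 80);
""")

# UNION (unlike UNION ALL) removes rows that are identical in every column.
domains = conn.execute("""
    SELECT domainid, dstart, dend FROM dyndom
    UNION
    SELECT RTRIM(cath_pdbcode), cathbegin, cathend FROM cath
    ORDER BY 1, 2
""").fetchall()

for row in domains:
    print(row)
```

The entry 2hunA with range 1 to 100 exists in both tables but is returned once, while 2hunA with range 101 to 200 stays as a separate row because its range differs.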
EDIT: Actually, I believe you can get the desired results simply by changing the JOIN ON condition in your view to be:
FULL OUTER JOIN cath_domains d
ON d.cath_pdbcode::character(5) = r.pdbcode || r.chainid AND
r.DomainStart <= d.cathbegin AND
r.Domain_End >= d.cathend
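To see what the extra join conditions do, here is a sketch with Python's sqlite3 on invented data. It uses an inner join rather than the view's FULL OUTER JOIN to keep the example small, and the tables and ranges are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dyndom (pdbcode TEXT, chainid TEXT, domain_start INTEGER, domain_end INTEGER);
    CREATE TABLE cath (cath_pdbcode TEXT, cathbegin INTEGER, cathend INTEGER);
    INSERT INTO dyndom VALUES ('2hun', 'A', 1, 150), ('2hun', 'A', 151, 300);
    INSERT INTO cath VALUES ('2hunA', 10, 120), ('2hunA', 160, 290), ('2hunB', 1, 90);
""")

# Match on code plus chain, and require the CATH range to lie inside the DynDom range.
matched = conn.execute("""
    SELECT r.domain_start, r.domain_end, d.cathbegin, d.cathend
    FROM dyndom r
    JOIN cath d
      ON d.cath_pdbcode = r.pdbcode || r.chainid
     AND r.domain_start <= d.cathbegin
     AND r.domain_end >= d.cathend
    ORDER BY r.domain_start
""").fetchall()

for row in matched:
    print(row)
```

Each DynDom domain now pairs only with the CATH domain it contains, instead of with every CATH row that shares the code.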