I ran two queries I came across, and they returned different results.
SELECT *
FROM [Log].[dbo].[LogTable] AS MainLog, [Archive].[dbo].[LogTable]
ORDER BY [Log].[dbo].[LogTable] DESC;
SELECT *
FROM [Log].[dbo].[LogTable]
UNION ALL
SELECT *
FROM [Archive].[dbo].[LogTable]
ORDER BY [Log].[dbo].[LogTable] DESC;
The second query returned the correct number of rows, and it ordered them correctly too. The first query brought back far more rows. What exactly did the first query do? It didn't error, and it did combine the data.
Because you didn't specify which columns to join on, the first query performs a cross join. The second query selects the rows from each table and appends one result set to the other.
What you are getting from query one is called a Cartesian product. Let's read your query step by step.
It accesses two databases.
It joins LogTable from the Log and Archive databases.
There is no WHERE clause and no JOIN ... ON condition, so the join becomes a cross join.
A cross join means that if you have a, b, c in Log and 1, 2, 3 in the Archive DB, the output will be a1, a2, a3, b1, b2, b3, c1, c2, c3 (one pair per row).
Whereas in the second query you are just writing the output of two statements one after the other: a, b, c, 1, 2, 3.
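To make the a, b, c / 1, 2, 3 example concrete, here is a small sketch using Python's built-in sqlite3 module rather than SQL Server. The table names log_main and log_archive are hypothetical stand-ins, since SQLite has no database.schema.table names. What matters is the row count: 3 x 3 = 9 for the comma join versus 3 + 3 = 6 for UNION ALL.

```python
import sqlite3

# SQLite has no [Log].[dbo] schemas, so two plain tables stand in for
# the Log and Archive copies of LogTable (hypothetical names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_main (val TEXT)")
conn.execute("CREATE TABLE log_archive (val TEXT)")
conn.executemany("INSERT INTO log_main VALUES (?)", [("a",), ("b",), ("c",)])
conn.executemany("INSERT INTO log_archive VALUES (?)", [("1",), ("2",), ("3",)])

# Comma-separated FROM with no join condition: every pairing of rows.
cross = [r[0] for r in conn.execute(
    "SELECT m.val || a.val FROM log_main AS m, log_archive AS a "
    "ORDER BY m.val, a.val"
)]

# UNION ALL: the Archive rows are appended after the Log rows.
stacked = [r[0] for r in conn.execute(
    "SELECT val FROM log_main UNION ALL SELECT val FROM log_archive"
)]

print(cross)         # ['a1', 'a2', 'a3', 'b1', 'b2', 'b3', 'c1', 'c2', 'c3']
print(len(stacked))  # 6
```

With real log tables of, say, 10,000 and 50,000 rows, the same comma join would return 500,000,000 rows, which is why the first query "brought back a lot of rows".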
I have an Oracle DB and use the query below to fetch records for a requirement: five columns from three tables, plus a WHERE condition.
select un.name, he.emp_no, he.lname, hr.in_unit, hr.out_unit
from hr_employee he
inner join hr_roster hr on he.eid = hr.eid
inner join units un on he.unit = un.unit_code
where hr.unit_date = to_date( '24-JUL-20','dd-MON-yy')
Later I realized that writing it as below, without JOINs, is slightly faster.
select un.name, he.emp_no, he.lname, hr.in_unit, hr.out_unit
from hr_employee he, hr_roster hr, units un
where hr.unit_date = to_date( '24-JUL-20','dd-MON-yy')
But I noticed that the two queries fetch different numbers of rows.
When I counted the rows of both queries, the one using JOINs returned 1012, while the other kept fetching without ever producing a count.
I am a bit confused and do not know which query is the right one to use.
The second query is treated as a CROSS JOIN because there are no join conditions between those tables' columns; there is only a restriction on a particular date. The first one uses standard inner joins with regular INNER JOIN conditions.
The second query is essentially incorrect because it has no join conditions between the three tables; the only restriction is the date filter on hr_roster. So it produces a Cartesian product: the hr_roster rows for that date, times ALL rows in hr_employee, times ALL rows in units.
The first query, which is the correct one, returns the hr_roster rows for that date joined to hr_employee by he.eid = hr.eid and to units by he.unit = un.unit_code, so only matching combinations are kept.
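A quick way to see the difference is to count rows on tiny data. The sketch below uses SQLite via Python's sqlite3 instead of Oracle; the sample rows are made up, and because SQLite has no to_date(), dates are stored as ISO-8601 text (an assumption of this sketch, not how the Oracle schema stores them).

```python
import sqlite3

# Small hypothetical data shaped like the question's three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hr_employee (eid INTEGER, emp_no TEXT, lname TEXT, unit TEXT);
CREATE TABLE hr_roster (eid INTEGER, unit_date TEXT, in_unit TEXT, out_unit TEXT);
CREATE TABLE units (unit_code TEXT, name TEXT);
INSERT INTO hr_employee VALUES
  (1, 'E1', 'Smith', 'U1'), (2, 'E2', 'Jones', 'U2'), (3, 'E3', 'Brown', 'U1');
INSERT INTO hr_roster VALUES
  (1, '2020-07-24', '08:00', '16:00'),
  (2, '2020-07-24', '09:00', '17:00'),
  (3, '2020-07-23', '08:00', '16:00');
INSERT INTO units VALUES ('U1', 'North'), ('U2', 'South');
""")

day = "2020-07-24"

# Explicit INNER JOINs: only rows that match on eid and on unit survive.
joined = conn.execute("""
    SELECT COUNT(*)
    FROM hr_employee he
    INNER JOIN hr_roster hr ON he.eid = hr.eid
    INNER JOIN units un ON he.unit = un.unit_code
    WHERE hr.unit_date = ?""", (day,)).fetchone()[0]

# Comma list with only the date filter: a Cartesian product.
unjoined = conn.execute("""
    SELECT COUNT(*)
    FROM hr_employee he, hr_roster hr, units un
    WHERE hr.unit_date = ?""", (day,)).fetchone()[0]

print(joined)    # 2
print(unjoined)  # 2 roster rows x 3 employees x 2 units = 12
```

On a real employee table the multiplier is the full row count of hr_employee times that of units, which is why the unjoined version never seems to finish.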
I seem to be missing something. Most articles I read say you should use a join instead of a sub-select. However, a quick experiment of my own shows a big win for the sub-query in execution time.
Trying to get the first names of all people who have made a bid (I presume the tables speak for themselves) gives the following.
This join takes 10 seconds:
select U.firstname
from Bid B
inner join [User] U on U.userName = B.[user]
This query with a sub-query takes 3 seconds:
select firstname
from [User]
where userName in (select [user] from bid)
Why is my experiment not in line with what I keep reading everywhere, or am I missing something?
Experimenting further, I found that the execution times are the same after adding DISTINCT to both.
They're not the same thing. In the query with the join, rows can be multiplied or removed entirely from the results.
An inner join removes rows whose keys have no match. It also multiplies rows for any matched key that repeats in one or both of the joined tables. An inner join therefore goes through the additional steps of multiplying and removing rows.
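The subquery version works differently: the IN filter only checks whether each User row has at least one matching bid, so each user is returned at most once no matter how many bids they have, and no duplicate rows have to be produced. That is also why adding DISTINCT to the join makes the timings equal.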
Some may argue that outer joins return NULLs just as sub-queries can, but outer joins can still multiply rows. Hence, sub-queries and joins are not the same thing.
In the queries you provided, you want to use the second query (the one with the subquery), since it doesn't multiply rows: each user is returned once, however many bids they have.
A good read on subqueries vs. inner joins:
https://www.essentialsql.com/subquery-versus-inner-join/
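The row multiplication is easy to see on a handful of rows. This sketch uses SQLite through Python's sqlite3 (not SQL Server) with made-up users and bids; it says nothing about timings, only about how many rows each form has to produce. Note that Cy, who has no bids, is dropped by both forms; only the join repeats Ann.

```python
import sqlite3

# Hypothetical data: Ann has three bids, Bob one, Cy none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "User" (userName TEXT, firstname TEXT);
CREATE TABLE Bid (bidId INTEGER, "user" TEXT);
INSERT INTO "User" VALUES ('ann', 'Ann'), ('bob', 'Bob'), ('cy', 'Cy');
INSERT INTO Bid VALUES (1, 'ann'), (2, 'ann'), (3, 'ann'), (4, 'bob');
""")

def names(sql):
    # Return the first column of every result row as a list.
    return [row[0] for row in conn.execute(sql)]

# Join: one output row per matching bid, so Ann appears three times.
join_names = names(
    'SELECT U.firstname FROM Bid B INNER JOIN "User" U ON U.userName = B."user" '
    "ORDER BY U.firstname"
)
# IN sub-query (a semi-join): each user at most once.
in_names = names(
    'SELECT firstname FROM "User" WHERE userName IN (SELECT "user" FROM Bid) '
    "ORDER BY firstname"
)
# DISTINCT removes the duplicates the join created.
distinct_join_names = names(
    'SELECT DISTINCT U.firstname FROM Bid B INNER JOIN "User" U ON U.userName = B."user" '
    "ORDER BY U.firstname"
)

print(join_names)           # ['Ann', 'Ann', 'Ann', 'Bob']
print(in_names)             # ['Ann', 'Bob']
print(distinct_join_names)  # ['Ann', 'Bob']
```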
I am working in Microsoft SQL Server 2012.
I run this query:
select * from tblbill
This returns four rows, with four distinct values in my field of interest, paymentduedate.
I run a second query:
select b.paymentduedate, ledgertypeid, l.Billid
from tblbill as b
join tblledger as l on b.billid = l.billid
This returns twenty rows, with values of b.paymentduedate that are not returned when I run the SELECT *. paymentduedate is not a column in tblledger.
How is this possible? My first guess is that somehow rows in tblbill may be hidden, but I do not know how to check that.
There could be a few reasons:
There are 20 records with a matching billid in tblledger. The so-called duplicate rows come from the same 4 records in tblbill; count the distinct values of paymentduedate to check whether any of them are actually new.
The data was changed after you ran the first query.
Either way, there is no such thing as hidden records.
When you join, you get every matching combination of rows. Use an inner, left, or right join depending on which unmatched rows you want to keep.
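Here is a minimal sketch of the first explanation, using SQLite through Python's sqlite3 rather than SQL Server 2012. The bills, dates, and the ledgerid column are hypothetical: four bills with five ledger entries each give 20 joined rows, yet only the same four due dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblbill (billid INTEGER, paymentduedate TEXT)")
conn.execute("CREATE TABLE tblledger (ledgerid INTEGER, billid INTEGER, ledgertypeid INTEGER)")

# Four hypothetical bills with four distinct due dates.
bills = [(1, "2020-01-31"), (2, "2020-02-29"), (3, "2020-03-31"), (4, "2020-04-30")]
conn.executemany("INSERT INTO tblbill VALUES (?, ?)", bills)

# Five hypothetical ledger entries per bill: 20 ledger rows in total.
ledger = [(i, i % 4 + 1, i % 3) for i in range(20)]
conn.executemany("INSERT INTO tblledger VALUES (?, ?, ?)", ledger)

joined = conn.execute(
    "SELECT b.paymentduedate, l.ledgertypeid, l.billid "
    "FROM tblbill AS b JOIN tblledger AS l ON b.billid = l.billid"
).fetchall()
distinct_dates = {row[0] for row in joined}

print(len(joined))          # 20: one row per ledger entry
print(len(distinct_dates))  # 4: the same due dates as in tblbill
```

In SQL Server itself the equivalent check is a COUNT(DISTINCT b.paymentduedate) over the same join.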
Here is my current query
SELECT * FROM jdsubs
INNER JOIN amipartnumbers ON amipartnumbers.oemitem = jdsubs.oempartnumber
WHERE ((([txtEnterNumber])
In ([jdsubs].[oemsubnumber],[jdsubs].[oempartnumber])))
UNION SELECT * FROM ihsubs
INNER JOIN amipartnumbers ON amipartnumbers.oemitem = ihsubs.oempartnumber
WHERE ((([txtEnterNumber])
In ([ihsubs].[oemsubnumber],[ihsubs].[oempartnumber])))
UNION SELECT * FROM mfsubs
INNER JOIN amipartnumbers ON amipartnumbers.oemitem = mfsubs.oempartnumber
WHERE ((([txtEnterNumber])
In ([mfsubs].[oemsubnumber],[mfsubs].[oempartnumber])));
Can I simplify this by doing the UNION in one query and then, in another query, comparing txtEnterNumber to oemsubnumber and oempartnumber?
I feel like this one query is doing too much work.
Or am I doing this right?
I'm searching about a million records, so I want to make sure this is as efficient as possible.
You'll have to run it as is, assuming oemitem, oempartnumber, and oemsubnumber are all indexed, as they should be.
If you union everything first and then compare your part numbers, you'll be doing so against an unindexed query result.
A couple of ideas for improvement are:
If a part number can match only one parts table, run each query one at a time until you get a result back.
Combine all three of your parts tables into one (with one field as a flag that records the part's origin), then run your search against that table.
Good luck
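The second idea (one combined parts table with an origin flag) could look roughly like this sketch. It uses SQLite through Python's sqlite3 instead of Access, and the allsubs table, its origin column, and the amipart column are hypothetical names. It creates the indexes the answer assumes; whether a particular engine actually uses them for an OR condition is worth confirming in its query plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE amipartnumbers (oemitem TEXT, amipart TEXT);
-- Hypothetical combined table replacing jdsubs, ihsubs and mfsubs;
-- 'origin' is the flag recording which source table a row came from.
CREATE TABLE allsubs (origin TEXT, oempartnumber TEXT, oemsubnumber TEXT);
CREATE INDEX ix_allsubs_part ON allsubs (oempartnumber);
CREATE INDEX ix_allsubs_sub ON allsubs (oemsubnumber);
CREATE INDEX ix_ami_oemitem ON amipartnumbers (oemitem);
INSERT INTO amipartnumbers VALUES ('P100', 'AMI-1'), ('P200', 'AMI-2');
INSERT INTO allsubs VALUES
  ('jd', 'P100', 'S100'),
  ('ih', 'P200', 'S200'),
  ('mf', 'P300', 'S300');
""")

def search(number):
    # One query against one table replaces the three-way UNION.
    return conn.execute(
        "SELECT s.origin, s.oempartnumber, a.amipart "
        "FROM allsubs AS s "
        "INNER JOIN amipartnumbers AS a ON a.oemitem = s.oempartnumber "
        "WHERE s.oemsubnumber = ? OR s.oempartnumber = ?",
        (number, number),
    ).fetchall()

print(search("S100"))  # [('jd', 'P100', 'AMI-1')]
print(search("P200"))  # [('ih', 'P200', 'AMI-2')]
print(search("P300"))  # [] because P300 has no amipartnumbers row
```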
Is there a way to join results from two tables without using JOIN or selecting from more than one table? The reason is that the database I'm working with only accepts queries containing SELECT, FROM, and WHERE clauses that reference a single table. I do, however, need information from other tables for the project I'm working on.
More info: the querier returns the query results in .csv format. Is there something we can manipulate there?
Pick a programming language. Any language will do. Run one query, and get the results. Run another query, get the results. Use the programming language to combine the results.
Seriously. You are asking how to join data from a database that doesn't support joins. If the database doesn't support it, you're going to have to do it externally.
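Here is a minimal sketch of that approach in Python, assuming each single-table query has already been exported to CSV. The CSV contents and column names (id, customer_id, and so on) are made up for illustration. The code builds a dictionary keyed on the join column and probes it with each row of the other result, which is a simple in-memory hash join with inner-join semantics.

```python
import csv
import io

# Hypothetical CSV exports from two single-table queries.
customers_csv = "id,name\n1,Ann\n2,Bob\n3,Cy\n"
orders_csv = "order_id,customer_id,total\n10,1,25.00\n11,1,5.50\n12,2,40.00\n"

def read_rows(text):
    # Parse CSV text into a list of dicts keyed by the header row.
    return list(csv.DictReader(io.StringIO(text)))

def inner_join(left, right, left_key, right_key):
    # Index the left rows by their key, then probe with each right row.
    index = {}
    for row in left:
        index.setdefault(row[left_key], []).append(row)
    joined = []
    for r in right:
        for match in index.get(r[right_key], []):
            joined.append({**match, **r})
    return joined

rows = inner_join(read_rows(customers_csv), read_rows(orders_csv), "id", "customer_id")
for row in rows:
    print(row["name"], row["order_id"], row["total"])
```

This prints Ann 10 25.00, Ann 11 5.50, and Bob 12 40.00; Cy has no orders, so like a SQL inner join the sketch drops that row. Note that CSV values arrive as strings, so both key columns must be formatted the same way.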
select a.*, b.* from tablea a, tableb b
You can do it either with JOIN or with a plain SELECT; you have to use one of them. You probably already know the JOIN version, so here is an example that joins two tables without JOIN, using only SELECT:
SELECT * from Table1, Table2 where Table1.common_attribute = Table2.common_attribute;
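The comma form with the condition in WHERE returns the same rows as an explicit INNER JOIN ... ON. The sketch below checks that in SQLite through Python's sqlite3, with hypothetical columns a and b. It still lists two tables in FROM, so it would not get past the single-table restriction described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (common_attribute INTEGER, a TEXT);
CREATE TABLE Table2 (common_attribute INTEGER, b TEXT);
INSERT INTO Table1 VALUES (1, 'x'), (2, 'y'), (3, 'z');
INSERT INTO Table2 VALUES (1, 'p'), (1, 'q'), (3, 'r'), (4, 's');
""")

# Implicit join: two tables in FROM, join condition in WHERE.
implicit = conn.execute(
    "SELECT * FROM Table1, Table2 "
    "WHERE Table1.common_attribute = Table2.common_attribute "
    "ORDER BY Table1.common_attribute, Table2.b"
).fetchall()

# Explicit join: same condition written as INNER JOIN ... ON.
explicit = conn.execute(
    "SELECT * FROM Table1 INNER JOIN Table2 "
    "ON Table1.common_attribute = Table2.common_attribute "
    "ORDER BY Table1.common_attribute, Table2.b"
).fetchall()

print(implicit == explicit)  # True
print(implicit)  # [(1, 'x', 1, 'p'), (1, 'x', 1, 'q'), (3, 'z', 3, 'r')]
```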