SQL Statements Display More Rows Than Table Has

So I've just come across something interesting, and I don't know if it's been answered before, as I have no clue what it's called. Let's say that you have two tables; the first has one row and the second has two rows. If you run the following statement:
SELECT t1.*
FROM table1 t1, table2 t2
it returns two rows, and both have the same value, but the first table only has one row! Why does this occur? I didn't think having another table in the from clause changed anything if you didn't change the select clause accordingly.

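You are selecting a Cartesian product of the two tables.
It returns (number of rows in t1) * (number of rows in t2) records: every possible combination of a row from t1 with a row from t2. Because there is no join condition, each t1 row is repeated once for every t2 row, which is why your single t1 row shows up twice.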
Using ANSI syntax, your query would read as:
SELECT t1.*
FROM table1 t1
CROSS JOIN
table2 t2
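A minimal sketch to make the row count concrete, using Python's built-in sqlite3 module as a stand-in database. The table contents are hypothetical; the two queries are the ones from the question and the answer.

```python
import sqlite3

# Hypothetical tables matching the question: table1 has one row, table2 has two.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1, 'only row');
    INSERT INTO table2 VALUES (10), (20);
""")

# The comma-separated FROM list and the explicit CROSS JOIN give the same result.
implicit = conn.execute("SELECT t1.* FROM table1 t1, table2 t2").fetchall()
explicit = conn.execute("SELECT t1.* FROM table1 t1 CROSS JOIN table2 t2").fetchall()

print(implicit)  # [(1, 'only row'), (1, 'only row')]
```

The result has 1 * 2 = 2 rows; each one is the single table1 row paired with a different table2 row, even though no table2 column is selected.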

Related

Oracle Toad SQL queries leading to inconsistent id counts

I'm using Oracle Toad with SQL commands in the editor window.
I created two new tables (PIDS1 and PIDS2) that contain only one column of ID numbers from two related tables.
I had expected that PIDS2 would contain a superset of the IDs in PIDS1. When I tried to identify the IDs in PIDS2 that are not in PIDS1, I got started on a wild goose chase.
Let's say that it is a given that there is something unexpected going on with the data in my tables. But I cannot make any sense of the two simplified queries described below. The numbers are inconsistent. Can someone explain what is going on?
-- PIDS1 IS A SINGLE-COLUMN TABLE THAT CONTAINS 1638061 DISTINCT ID'S
-- PIDS2 IS A SINGLE-COLUMN TABLE THAT CONTAINS 3510272 DISTINCT ID'S
SELECT COUNT(T2.ID)
FROM PIDS2 T2
WHERE T2.ID NOT IN (
SELECT T1.ID
FROM PIDS1 T1);
-- RESULT IS ZERO!
-- WTF? PIDS2 HAS MORE ID'S THAN PIDS1!
SELECT COUNT(T1.ID)
FROM PIDS1 T1
WHERE T1.ID NOT IN (
SELECT T2.ID
FROM PIDS2 T2);
-- RESULT IS 786690
-- WHERE DID THAT NUMBER COME FROM? LOOKS ARBITRARY

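Never use NOT IN with a subquery. If any of the values returned by the subquery is NULL, then all rows are filtered out. The reason is that x NOT IN (a, NULL) means x <> a AND x <> NULL, and x <> NULL is never true (it evaluates to UNKNOWN), so the whole condition can never be true. That would explain your first query returning zero: PIDS1 presumably contains at least one NULL ID.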
For this reason, I always advise NOT EXISTS:
SELECT T2.ID
FROM PIDS2 T2
WHERE NOT EXISTS (SELECT 1 FROM PIDS1 T1 WHERE t1.ID = T2.ID);
Of course, you can also add WHERE T1.ID IS NOT NULL to the subquery in the NOT IN version. In my experience, though, you'll forget it at some inopportune time in the future. Just use NOT EXISTS.
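A small reproduction of the NULL trap, sketched with Python's sqlite3 module rather than Oracle. The table data are hypothetical; the only assumption is that pids1 contains one NULL, which is what the answer's explanation implies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pids1 (id INTEGER);
    CREATE TABLE pids2 (id INTEGER);
    -- pids1 holds a NULL alongside ordinary IDs (hypothetical data)
    INSERT INTO pids1 VALUES (1), (2), (NULL);
    INSERT INTO pids2 VALUES (1), (2), (3), (4);
""")

# NOT IN: the NULL in the subquery makes every comparison UNKNOWN, so nothing survives.
not_in = conn.execute(
    "SELECT COUNT(t2.id) FROM pids2 t2 "
    "WHERE t2.id NOT IN (SELECT t1.id FROM pids1 t1)"
).fetchone()[0]

# NOT EXISTS: unaffected by the NULL, finds IDs 3 and 4.
not_exists = conn.execute(
    "SELECT COUNT(t2.id) FROM pids2 t2 "
    "WHERE NOT EXISTS (SELECT 1 FROM pids1 t1 WHERE t1.id = t2.id)"
).fetchone()[0]

# NOT IN with the NULLs filtered out of the subquery also works.
not_in_filtered = conn.execute(
    "SELECT COUNT(t2.id) FROM pids2 t2 "
    "WHERE t2.id NOT IN (SELECT t1.id FROM pids1 t1 WHERE t1.id IS NOT NULL)"
).fetchone()[0]

print(not_in, not_exists, not_in_filtered)  # 0 2 2
```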

sql - what's the faster/better way to refer to columns in a where clause with inner joins?

Say I've got a query like this:
select table1.id, table1.name
from table1
inner join table2 on table1.id = table2.id
where table1.name = "parent" and table2.status = 1
Is it true that, since there's an inner join, I can refer to table2's status column through table1? Like this:
select table1.id, table1.name
from table1
inner join table2 on table1.id = table2.id
where table1.name = "parent" and table1.status = 1
And if so, which of the two ways is better?

If I am not mistaken, you are asking whether, in an inner join, two fields with the same name, data type and length become one field in the query. Technically, that is not the case. Regardless of anything else, Table1.Status refers to Table1's column and Table2.Status refers to Table2's column.
The two queries above CAN produce different results from each other.
A good rule of thumb is to put your conditions on the base table (Table1, in this case). Use another table's field only when that field exists only in that table.

No, that's not true. If table1 has m rows and table2 has n rows, an inner join considers all m*n combinations of rows and keeps only those that satisfy the join condition in the ON clause, so the result has at most m*n rows (not m+n). The columns of the two tables are not merged at the database level either: the status column stays in the table where it was defined.
Hope that helps!

You can see that this is not the case if you create the tables like this:
CREATE TABLE table1 (id INT, name VARCHAR(50));
CREATE TABLE table2 (id INT, status INT);
Now if you run your second query, you will get an error, because it refers to table1.status and there is no status column in table1.
If there were a status column in both tables, the query would run, but it would likely not give the results you want. For example, assume status in table1 is always 1 and in table2 is always 0. Then your first query could never return rows, but your second one certainly could.
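A minimal sketch of both cases from this answer, using Python's sqlite3 module as a stand-in database. The rows (status always 1 in table1, always 0 in table2) follow the answer's example; the extra table1_no_status table is hypothetical, added only to show the error. String literals use single quotes, which is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Data from the answer's example: status is always 1 in table1 and always 0 in table2.
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT, status INTEGER);
    CREATE TABLE table2 (id INTEGER, status INTEGER);
    INSERT INTO table1 VALUES (1, 'parent', 1), (2, 'child', 1);
    INSERT INTO table2 VALUES (1, 0), (2, 0);
""")

base_query = ("SELECT table1.id, table1.name FROM table1 "
              "INNER JOIN table2 ON table1.id = table2.id ")

# First query: filters on table2.status, which is never 1, so no rows come back.
first = conn.execute(base_query + "WHERE table1.name = 'parent' AND table2.status = 1").fetchall()
# Second query: filters on table1.status, a different column, so the parent row matches.
second = conn.execute(base_query + "WHERE table1.name = 'parent' AND table1.status = 1").fetchall()

print(first)   # []
print(second)  # [(1, 'parent')]

# Hypothetical table without a status column: referencing it is an error.
conn.execute("CREATE TABLE table1_no_status (id INTEGER, name TEXT)")
try:
    conn.execute(
        "SELECT table1_no_status.id FROM table1_no_status "
        "INNER JOIN table2 ON table1_no_status.id = table2.id "
        "WHERE table1_no_status.status = 1"
    )
    failed = False
except sqlite3.OperationalError:  # "no such column"
    failed = True
```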

In SQL Server, how to filter lots of elements across multiple columns

I have a table, t1, with columns such as name, code1, code2,..., code20
There are, say, 100K rows.
I have another lookup table, t2, which has one column, code; it has 10K rows, each holding one code, so there are 10K codes in total in this single-column table.
I need to filter out all the rows in t1 that contain any of the codes in t2 in any of the columns code1 to code20. In other words, as soon as one column of a t1 row holds one of the codes in t2, that row should be captured.
Is there an easy way to do this? Thanks a lot!

Here is a way to do it using not exists:
select t1.*
from t1
where not exists (select 1
from t2
where t2.code = t1.code1 or
t2.code = t1.code2 or
. . .
t2.code = t1.code20
);
It is tempting to use IN as the condition in the nested SELECT, but IN behaves in non-obvious ways when NULLs are involved. The explicit sequence of equality comparisons is easier to reason about.
That said, having 20 columns holding the same kind of data is usually a sign of poor table design. More typically, the codes would live in some sort of association/junction table, with each of the 20 code values in its own row.
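A scaled-down sketch of the NOT EXISTS query, run with Python's sqlite3 module instead of SQL Server. Three code columns stand in for code1 to code20, and all data is hypothetical. Because the question is ambiguous about whether matching rows should be removed or kept, the sketch also shows the EXISTS form, which returns the complementary set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, scaled-down schema: code1..code3 stand in for code1..code20.
conn.executescript("""
    CREATE TABLE t1 (name TEXT, code1 TEXT, code2 TEXT, code3 TEXT);
    CREATE TABLE t2 (code TEXT);
    INSERT INTO t1 VALUES
        ('a', 'X1', 'X2', NULL),   -- no code from t2
        ('b', 'X1', 'C7', 'X3'),   -- C7 is in t2 (second column)
        ('c', NULL, NULL, 'C9');   -- C9 is in t2 (third column)
    INSERT INTO t2 VALUES ('C7'), ('C9');
""")

match_any_column = """
    (SELECT 1
     FROM t2
     WHERE t2.code = t1.code1 OR
           t2.code = t1.code2 OR
           t2.code = t1.code3)
"""

# NOT EXISTS keeps only rows where no code column matches any t2 code.
kept = conn.execute(
    "SELECT t1.name FROM t1 WHERE NOT EXISTS " + match_any_column + " ORDER BY t1.name"
).fetchall()

# EXISTS returns the rows that do contain a t2 code.
captured = conn.execute(
    "SELECT t1.name FROM t1 WHERE EXISTS " + match_any_column + " ORDER BY t1.name"
).fetchall()

print(kept)      # [('a',)]
print(captured)  # [('b',), ('c',)]
```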

Sounds like you need to unpivot the data in table t1 and then join it to t2.
So instead of t1 having name, code1, code2, ..., code20, you would unpivot t1 into just name and code columns, then join that to t2.
Alternatively, you could perform a separate join of t1 to t2 for each of t1's columns code1 to code20 and UNION the results.
That's if I understand your problem correctly.
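A minimal sketch of the unpivot-then-join idea, again with Python's sqlite3 module and three hypothetical code columns in place of twenty. SQLite has no UNPIVOT operator, so the sketch unpivots with UNION ALL; SQL Server's UNPIVOT operator could play the same role there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, scaled-down data: code1..code3 stand in for code1..code20.
conn.executescript("""
    CREATE TABLE t1 (name TEXT, code1 TEXT, code2 TEXT, code3 TEXT);
    CREATE TABLE t2 (code TEXT);
    INSERT INTO t1 VALUES
        ('a', 'X1', 'X2', NULL),
        ('b', 'X1', 'C7', 'X3'),
        ('c', NULL, NULL, 'C9');
    INSERT INTO t2 VALUES ('C7'), ('C9');
""")

# Unpivot: one (name, code) row per code column, then join that to t2.
# NULL codes never satisfy the join condition, so they drop out on their own.
captured = conn.execute("""
    WITH unpivoted AS (
        SELECT name, code1 AS code FROM t1
        UNION ALL
        SELECT name, code2 FROM t1
        UNION ALL
        SELECT name, code3 FROM t1
    )
    SELECT DISTINCT u.name
    FROM unpivoted u
    INNER JOIN t2 ON t2.code = u.code
    ORDER BY u.name
""").fetchall()

print(captured)  # [('b',), ('c',)]
```

DISTINCT matters here: a row whose several columns each hold a t2 code would otherwise appear once per matching column.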

Problem with sql query

I'm using MySQL and I'm trying to construct a query to do the following:
I have:
Table1 [ID,...]
Table2 [ID, tID, start_date, end_date,...]
What I want from my query is:
Select all entries from Table2 where Table1.ID = Table2.tID
**where at least one** end_date < today.
The way I have it working right now, if Table2 contains (for example) 5 entries for a tID but only 1 of them has end_date < today, then that's the only entry returned, whereas I would like the other (not yet expired) ones returned as well. I have the actual query and all the joins working well; I just can't figure out the ** part of it.
Any help would be great!
Thank you!

SELECT * FROM Table2
WHERE tID IN
(SELECT Table2.tID FROM Table1
INNER JOIN Table2 ON Table1.ID = Table2.tID
WHERE Table2.end_date < NOW()
)
The subquery selects all tIDs that match your WHERE clause. The main query then uses this subquery to filter the entries in Table2, so every entry of a tID with at least one expired entry is returned.
Note: the inner join drops rows from Table1 that have no matching entry in Table2. This is no problem; those rows wouldn't have matched the WHERE clause anyway.
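A quick check of the IN-subquery approach, sketched with Python's sqlite3 module instead of MySQL. SQLite's date('now') stands in for MySQL's NOW(), and the rows are hypothetical, with fixed far-past and far-future dates so the result does not depend on the current date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: tID 1 has one expired entry among three; tID 2 has none.
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER);
    CREATE TABLE Table2 (ID INTEGER, tID INTEGER, start_date TEXT, end_date TEXT);
    INSERT INTO Table1 VALUES (1), (2);
    INSERT INTO Table2 VALUES
        (10, 1, '1999-01-01', '2000-01-01'),   -- tID 1: expired
        (11, 1, '1999-01-01', '2999-12-31'),   -- tID 1: still active
        (12, 1, '1999-01-01', '2999-12-31'),   -- tID 1: still active
        (20, 2, '1999-01-01', '2999-12-31');   -- tID 2: nothing expired
""")

# Every Table2 row whose tID has at least one expired entry.
rows = conn.execute("""
    SELECT ID FROM Table2
    WHERE tID IN (SELECT Table2.tID
                  FROM Table1
                  INNER JOIN Table2 ON Table1.ID = Table2.tID
                  WHERE Table2.end_date < date('now'))
    ORDER BY ID
""").fetchall()

print(rows)  # [(10,), (11,), (12,)]
```

All three tID 1 entries come back, including the two that have not expired, while tID 2 is excluded.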

Maybe, just maybe, you could create a subquery to join with your actual tables, and in this subquery use a COUNT() whose result can then be used in your WHERE clause.
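One possible reading of this suggestion, sketched with Python's sqlite3 module: a derived table counts the expired entries per tID, and the outer WHERE keeps tIDs whose count is above zero. The data, the alias c and the column name expired_count are hypothetical; date('now') stands in for MySQL's NOW().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same hypothetical data as before: only tID 1 has an expired entry.
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER);
    CREATE TABLE Table2 (ID INTEGER, tID INTEGER, start_date TEXT, end_date TEXT);
    INSERT INTO Table1 VALUES (1), (2);
    INSERT INTO Table2 VALUES
        (10, 1, '1999-01-01', '2000-01-01'),
        (11, 1, '1999-01-01', '2999-12-31'),
        (12, 1, '1999-01-01', '2999-12-31'),
        (20, 2, '1999-01-01', '2999-12-31');
""")

# COUNT ignores the NULLs produced by CASE, so it counts only expired entries.
rows = conn.execute("""
    SELECT t2.ID
    FROM Table1 t1
    INNER JOIN Table2 t2 ON t1.ID = t2.tID
    INNER JOIN (SELECT tID,
                       COUNT(CASE WHEN end_date < date('now') THEN 1 END) AS expired_count
                FROM Table2
                GROUP BY tID) c ON c.tID = t2.tID
    WHERE c.expired_count > 0
    ORDER BY t2.ID
""").fetchall()

print(rows)  # [(10,), (11,), (12,)]
```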

Select proper columns from JOIN statement

I have two tables: table1, table2. Table1 has 10 columns, table2 has 2 columns.
SELECT * FROM table1 AS T1 INNER JOIN table2 AS T2 ON T1.ID = T2.ID
I want to select all columns from table1 and only 1 column from table2. Is it possible to do that without enumerating all columns from table1 ?

Yes, you can do the following:
SELECT T1.*, T2.my_col FROM table1 AS T1 INNER JOIN table2 AS T2 ON T1.ID = T2.ID
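A minimal sketch showing what this query returns, using Python's sqlite3 module as a stand-in database. The columns a and b in table1 are hypothetical, and my_col is the placeholder column name from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: table1 has three columns, table2 has ID plus my_col.
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER, a TEXT, b TEXT);
    CREATE TABLE table2 (ID INTEGER, my_col TEXT);
    INSERT INTO table1 VALUES (1, 'x', 'y');
    INSERT INTO table2 VALUES (1, 'z');
""")

cur = conn.execute(
    "SELECT T1.*, T2.my_col FROM table1 AS T1 INNER JOIN table2 AS T2 ON T1.ID = T2.ID"
)
columns = [d[0] for d in cur.description]  # names of the result columns
rows = cur.fetchall()

print(len(columns))  # 4: all three table1 columns plus my_col
print(rows)          # [(1, 'x', 'y', 'z')]
```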

Even though you can do the t1.*, t2.col1 thing, I would not recommend it in production code.
I would never ever use a SELECT * in production - why?
you're telling SQL Server to get all columns - do you really, really need all of them?
by not specifying the column names, SQL Server has to go figure that out itself - it has to consult the data dictionary to find out what columns are present which does cost a little bit of performance
most importantly: you don't know what you're getting back. If the table changes and another column or two are added, any code that relies on, e.g., the order or the number of columns without explicitly checking for that can break
My recommendation for production code: always (no exceptions!) specify exactly the columns you really need, and even if you need all of them, spell them out explicitly. Fewer surprises, fewer bugs to hunt for if anything ever changes in the underlying table.
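A small illustration of the "your code can break" point, sketched with Python's sqlite3 module rather than SQL Server. The customers table and the two helper functions are hypothetical: one unpacks a SELECT * row by position, the other names its columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    INSERT INTO customers VALUES (1, 'Ada');
""")

def load_star(conn):
    # Fragile: unpacks by position and assumes exactly two columns come back.
    customer_id, name = conn.execute("SELECT * FROM customers").fetchone()
    return customer_id, name

def load_explicit(conn):
    # Robust: asks for exactly the columns it uses.
    customer_id, name = conn.execute("SELECT id, name FROM customers").fetchone()
    return customer_id, name

before = load_star(conn)  # works while the table has two columns

# Later, someone adds a column to the table.
conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")

try:
    load_star(conn)
    star_broke = False
except ValueError:  # too many values to unpack: SELECT * now returns three columns
    star_broke = True

after = load_explicit(conn)  # unaffected by the new column
```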

Use table1.* in place of all columns of table1 ;)

We Keep Coding