SQL referring to same table in a query Stuck

I have a query that needs to join (refer to) the same table twice; however, I am unable to get a result from it.
Below is a sample of my query
SELECT a."column1", b."column1" as anotherColumn
FROM table1 AS a, table2 AS b
WHERE a."x" = b."x"
AND NOT a."y" = b."y"
This query takes forever to load. However, if I just run:
SELECT a."column1"
FROM table1 AS A
it only takes 14 seconds.
I'm currently using PostgreSQL with pgAdmin. table1 currently has 1.4 million rows.
Is it because there is a lock on table1 once it has been referred to as a?
EDIT: Each row contains a record of "author" and "book published". A book may have many authors, who are therefore collaborators. What I am trying to achieve is to find out the number of collaborators for each author

What I am trying to achieve is to find out the number of collaborators for each author
Something like this would count the number of authors per book; I guess that where that number is greater than 1, the number of collaborators is that number minus 1:
select b.name, count(a.*)-1 as num_collaborators
from books b
inner join authors a on b.id = a.book_id
group by b.name
having count(a.*) > 1
--original
SELECT a."column1", b."column1" as anotherColumn
FROM table1 AS a, table2 AS b
;
--amended
SELECT a."column1", b."column1" as anotherColumn
FROM table1 AS a, table2 AS b
WHERE a."x" = b."x"
AND NOT a."y" = b."y"
Over 25 years ago, the ANSI standards for SQL introduced a more "explicit" syntax for joins, and using it is now well established as "best practice".
One of the greatest benefits of this "explicit join syntax" is that it becomes much harder to forget a join condition by accident, unlike the original query, which omitted the joining predicate. (When that happens, an unexpected Cartesian product is produced.)
So, I encourage you to stop using commas between table names. Taking that simple step will help you use better join syntax.
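To make the self-join concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL. The table `authorship(author, book)` and its sample rows are hypothetical stand-ins for the one-row-per-author-and-book layout described in the question's EDIT. It counts distinct collaborators per author with an explicit JOIN ... ON.

```python
import sqlite3

# Hypothetical schema: one row per (author, book) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authorship (author TEXT, book TEXT)")
conn.executemany(
    "INSERT INTO authorship VALUES (?, ?)",
    [("ann", "b1"), ("bob", "b1"), ("cat", "b1"),
     ("ann", "b2"), ("bob", "b2"), ("dan", "b3")],
)

# Self-join with an explicit join condition: pair each author with every
# other author of the same book, then count the distinct partners.
# LEFT JOIN keeps authors who never collaborated (their count is 0).
rows = conn.execute(
    """
    SELECT a.author, COUNT(DISTINCT b.author) AS num_collaborators
    FROM authorship AS a
    LEFT JOIN authorship AS b
           ON a.book = b.book
          AND a.author <> b.author
    GROUP BY a.author
    ORDER BY a.author
    """
).fetchall()
collaborators = dict(rows)
print(collaborators)
```

On a table with millions of rows, an index on the join column (here `book`) is usually what keeps a self-join like this from running for a very long time; that is a general observation, not something measured on the asker's data.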

Related

Oracle: Use only few tables in WHERE clause but mentioned more tables in 'FROM' in a join SQL

What will happen in an Oracle SQL join if I don't use all the tables in the WHERE clause that were mentioned in the FROM clause?
Example:
SELECT A.*
FROM A, B, C, D
WHERE A.col1 = B.col1;
Here I didn't use the C and D tables in the WHERE clause, even though I mentioned them in FROM. Is this OK? Are there any adverse performance issues?
It is poor practice to use that syntax at all. The FROM A,B,C,D syntax has been obsolete since 1992... more than 30 YEARS now. There's no excuse anymore. Instead, every join should always use the JOIN keyword, and specify any join conditions in the ON clause. The better way to write the query looks like this:
SELECT A.*
FROM A
INNER JOIN B ON A.col1 = B.col1
CROSS JOIN C
CROSS JOIN D;
Now we can also see what happens in the question. The query will still run if you fail to specify any conditions for certain tables, but it has the effect of using a CROSS JOIN: the results will include every possible combination of rows from every included relation (where the "A,B" part counts as one relation). If each of the three parts of those joins (A&B, C, D) has just 100 rows, the result set will have 1,000,000 rows (100 * 100 * 100). This is rarely going to give the results you expect or intend, and it's especially suspect when the SELECT clause isn't looking at any of the fields from the uncorrelated tables.
Any table lacking join definition will result in a Cartesian product - every row in the intermediate rowset before the join will match every row in the target table. So if you have 10,000 rows and it joins without any join predicate to a table of 10,000 rows, you will get 100,000,000 rows as a result. There are only a few rare circumstances where this is what you want. At very large volumes it can cause havoc for the database, and DBAs are likely to lock your account.
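The row-count arithmetic above can be checked on a small scale. This is a sketch using Python's built-in sqlite3 module, not Oracle. The tables a, b, c, d and the column col1 follow the question, while the size of 10 rows per table is an assumption chosen to keep the result small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
n = 10  # rows per table (assumed; deliberately tiny)
for name in ("a", "b", "c", "d"):
    conn.execute(f"CREATE TABLE {name} (col1 INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?)", [(i,) for i in range(n)])

# Comma syntax with a predicate for a/b only: c and d are cross joined.
comma_count = conn.execute(
    "SELECT COUNT(*) FROM a, b, c, d WHERE a.col1 = b.col1"
).fetchone()[0]

# The same query with the cross joins spelled out.
explicit_count = conn.execute(
    "SELECT COUNT(*) FROM a INNER JOIN b ON a.col1 = b.col1 "
    "CROSS JOIN c CROSS JOIN d"
).fetchone()[0]

# Leaving the unused tables out entirely.
only_ab_count = conn.execute(
    "SELECT COUNT(*) FROM a INNER JOIN b ON a.col1 = b.col1"
).fetchone()[0]

print(comma_count, explicit_count, only_ab_count)
```

With 10 matching rows between a and b, the first two queries count 10 * 10 * 10 rows, while the query without c and d counts only 10.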
If you don't want to use a table, exclude it entirely from your SQL. If you can't, because of some constraint we don't know about, then include the proper join predicates for every table in your WHERE clause and simply don't list any of their columns in your SELECT clause. If there's a cost to the join, you don't need anything from it, and for some very strange reason you still can't leave the table out of your SQL completely (this does occasionally happen in reusable code), then you can disable the joins by making the predicates always false. Remember to use outer joins if you do this.
Native Oracle method:
WITH data AS (SELECT ROWNUM col FROM dual CONNECT BY LEVEL < 10) -- test data
SELECT A.*
FROM data a,
data b,
data c,
data d
WHERE a.col = b.col
AND DECODE('Y','Y',NULL,a.col) = c.col(+)
AND DECODE('Y','Y',NULL,a.col) = d.col(+)
ANSI style:
WITH data AS (SELECT ROWNUM col FROM dual CONNECT BY LEVEL < 10)
SELECT A.*
FROM data a
INNER JOIN data b ON a.col = b.col
LEFT OUTER JOIN data c ON DECODE('Y','Y',NULL,a.col) = c.col
LEFT OUTER JOIN data d ON DECODE('Y','Y',NULL,a.col) = d.col
You can plug in a variable for the first Y that you set to Y or N (e.g. var_disable_join). This will bypass the join and avoid both the associated performance penalty and the Cartesian product effect. But again, I want to reiterate, this is an advanced hack and is probably NOT what you need. Simply leaving out the unwanted tables is the right approach 95% of the time.
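As a rough illustration of the switchable-predicate idea outside Oracle, the sketch below uses Python's built-in sqlite3 module with a CASE expression in place of DECODE. The parameter name `disable_join` and the sample tables are assumptions. It only demonstrates the effect on the row count; whether an optimizer actually skips the work of the join depends on the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (col INTEGER)")
conn.execute("CREATE TABLE c (col INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (3,)])
# c holds two rows per key, so a real join doubles the row count.
conn.executemany("INSERT INTO c VALUES (?)",
                 [(1,), (1,), (2,), (2,), (3,), (3,)])

# When :disable_join is 'Y' the left side of the comparison is NULL,
# so the predicate never matches and the outer join keeps each row of a once.
QUERY = """
SELECT a.col, c.col
FROM a
LEFT OUTER JOIN c
  ON (CASE WHEN :disable_join = 'Y' THEN NULL ELSE a.col END) = c.col
"""

def run(disable_join):
    """Run the query with the join switched off ('Y') or on ('N')."""
    return conn.execute(QUERY, {"disable_join": disable_join}).fetchall()

disabled_rows = run("Y")
enabled_rows = run("N")
print(len(disabled_rows), len(enabled_rows))
```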

SQL Query returns more

I'm having a bit of a problem with a SQL Query that returns too many results. I'm fairly new to SQL so please bear with me.
Please see the following:
Table Structures
The Query that I use looks like:
SELECT TABLE_B.*
FROM
TABLE_A
JOIN
TABLE_B
ON
TABLE_A.COMMON_ID=TABLE_B.COMMON_ID
AND TABLE_A.SEQ_3C=TABLE_B.SEQ_3C
JOIN
TABLE_C
ON
TABLE_A.COMMON_ID=TABLE_C.EMPLID
WHERE
TABLE_B.ITEM_STATUS<>'C'
and TABLE_A.CHECKLIST_STATUS='I'
and TABLE_A.ADMIN_FUNCTION='ADMA'
and TABLE_A.CHECKLIST_CD='APPL'
and TABLE_A.COMMON_ID = '123456789'
and TABLE_C.ADMIT_TERM='2171'
and TABLE_C.INSTITUTION='SOMEWHERE'
I just want the results from Table_B and not what it's giving me.
Please explain this to me as I have spent 3 days on it non-stop.
What am I missing?
You want data from TABLE_B? Then select from it only and have the conditions on the other tables in your where clause.
The inner joins on the other tables serve as existence tests, I assume? Don't do that. You'd only multiply your records, just as you are doing now, only to have to dismiss duplicates later. That can cause bad performance on large tables and errors in more complicated queries. Use EXISTS or IN instead.
select *
from table_b
where item_status <> 'C'
and (common_id, seq_3c) in
(
select common_id, seq_3c
from table_a
where checklist_status = 'I'
and admin_function = 'ADMA'
and checklist_cd = 'APPL'
)
and common_id in
(
select EMPLID
from table_c
where admit_term = '2171'
and institution = 'SOMEWHERE'
);
SELECT DISTINCT TABLE_B.*
FROM
TABLE_A
JOIN
TABLE_B
ON
TABLE_A.COMMON_ID=TABLE_B.COMMON_ID
AND TABLE_A.SEQ_3C=TABLE_B.SEQ_3C
JOIN
TABLE_C
ON
TABLE_A.COMMON_ID=TABLE_C.EMPLID
WHERE
TABLE_B.ITEM_STATUS<>'C'
and TABLE_A.CHECKLIST_STATUS='I'
and TABLE_A.ADMIN_FUNCTION='ADMA'
and TABLE_A.CHECKLIST_CD='APPL'
and TABLE_A.COMMON_ID = '123456789'
and TABLE_C.ADMIT_TERM='2171'
and TABLE_C.INSTITUTION='SOMEWHERE'
This should be easy to understand without looking at all your tables and output.
Suppose you join two tables, A and B, on a column id. You only want the columns from table B, and in table B the 'id' column is a unique identifier.
Even so, if in table A an id (the same id) appears five times, the join will have five rows for that id. Then you just select the columns from table B, so it will look like you got the same row five different times.
Perhaps you don't really need a join? What is your underlying problem you are trying to solve?
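The five-copies effect described above is easy to reproduce. The following is a small sketch with Python's built-in sqlite3 module; the tables a and b, the value 7 and the label column are made up for illustration. It contrasts the join with an EXISTS test, which returns each row of b at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER)")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO a VALUES (?)", [(7,)] * 5)  # same id five times
conn.execute("INSERT INTO b VALUES (7, 'only row')")      # id is unique in b

# Join: one output row per matching pair, so b's single row appears 5 times.
joined = conn.execute(
    "SELECT b.* FROM a JOIN b ON a.id = b.id"
).fetchall()

# EXISTS: a is used only as an existence test, so b's row appears once.
semi = conn.execute(
    "SELECT b.* FROM b WHERE EXISTS (SELECT 1 FROM a WHERE a.id = b.id)"
).fetchall()

print(len(joined), len(semi))
```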
It's hard to answer this question without more information about why you're executing these joins. I can explain why you're getting the results you're getting, and hopefully that will allow you to solve the problem yourself.
You start, in your FROM clause, with table A. You join this table with table B on matching COMMON_ID, which, based on the tables you provide, returns three matches for the one record you have in table A. This increases your result set size to three records. Next, you join these three records with table C, on matching ID. Because all IDs are, in fact, identical, this returns nine matches for every record in your current result set: you now have 9 x 3 = 27 records in your result set.
Finally, the WHERE clause comes into effect. This clause excludes 6 out of 9 records in table C, so you have 3 of those records left. Your final result set is therefore 1 (table A) x 3 (table B) x 3 (table C) = 9 records.

How do I put multiple criteria for a column in a where clause?

I have five results to retrieve from a table and I want to write a stored procedure that will return all desired rows.
I can write the query like this for now:
Select * from Table where Id = 1 OR Id = 2 or Id = 3
I suppose I need to receive a list of Ids to split, but how do I write the WHERE clause?
So, if you're just trying to learn SQL, this is a short and good example to get to know the IN operator. The following query has the same result as your attempt.
SELECT *
FROM TABLE
WHERE ID IN (SELECT ID FROM TABLE2)
This is equivalent to your attempt, and judging by that attempt, it might be the simplest version for you to understand. In the future, though, I would recommend using a JOIN.
A JOIN has the same functionality as the previous code, but will be a better alternative. If you are curious to read more about JOINs, here are a few links from the most important sources
Joins - wikipedia
and also a visual representation of how different types of JOIN work
Another way to do it. The inner join will only include rows from T1 that match up with a row from T2 via the Id field.
select T1.* from T1 inner join T2 on T1.Id = T2.Id
In practice, inner joins are usually preferable to subqueries for performance reasons.
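For the original question of passing a list of Ids from calling code, one common pattern is to build an IN list with one placeholder per value. The sketch below uses Python's built-in sqlite3 module; the table `items` and the function `fetch_by_ids` are hypothetical names, and a stored procedure in another database would need that database's own mechanism for list parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(1, 6)])

def fetch_by_ids(ids):
    """Return the rows whose id is in `ids`, using bound parameters."""
    ids = list(ids)
    if not ids:
        return []  # "IN ()" is not valid SQL, so short-circuit
    # One "?" per value; the values themselves are never pasted into the SQL.
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT id, name FROM items WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, ids).fetchall()

rows = fetch_by_ids([1, 2, 3])
print(rows)
```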

SQL Method of checking that INNER / LEFT join doesn't duplicate rows

Is there a good or standard SQL method of asserting that a join does not duplicate any rows (produces 0 or 1 copies of the source table row)? Assert as in causes the query to fail or otherwise indicate that there are duplicate rows.
A common problem in a lot of queries is when a table is expected to be 1:1 with another table, but there might exist 2 rows that match the join criteria. This can cause errors that are hard to track down, especially for people not necessarily entirely familiar with the tables.
It seems like there should be something simple and elegant - this would be very easy for the SQL engine to detect (have I already joined this source row to a row in the other table? ok, error out) but I can't seem to find anything on this. I'm aware that there are long / intrusive solutions to this problem, but for many ad hoc queries those just aren't very fun to work out.
EDIT / CLARIFICATION: I'm looking for a one-step query-level fix. Not a verification step on the results of that query.
If you are only testing for linked rows rather than requiring output, then you'd use EXISTS.
More correctly, you need a "semi-join", but most RDBMSs support this only in the form of EXISTS
SELECT a.*
FROM TableA a
WHERE EXISTS (SELECT * FROM TableB b WHERE a.id = b.id)
Also see:
Using 'IN' with a sub-query in SQL Statements
EXISTS vs JOIN and use of EXISTS clause
SELECT JoinField
FROM MyJoinTable
GROUP BY JoinField
HAVING COUNT(*) > 1
LIMIT 1
Is that simple enough? Don't have Postgres but I think it's valid syntax.
Something along the lines of
SELECT a.id, COUNT(b.id)
FROM TableA a
JOIN TableB b ON a.id = b.id
GROUP BY a.id
HAVING COUNT(b.id) > 1
Should return rows in TableA that have more than one associated row in TableB.
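If a separate check is acceptable, the duplicate-key query above can be wrapped so that it fails loudly before the real join runs. This is only a sketch using Python's built-in sqlite3 module; the helper `assert_unique_join_key` is a hypothetical name, it is a verification step rather than the one-step query-level fix asked for, and the table and column names are interpolated into the SQL, so they must come from trusted code.

```python
import sqlite3

def assert_unique_join_key(conn, table, key):
    """Raise ValueError if any value of `key` occurs more than once in `table`.

    `table` and `key` are inserted into the SQL text as identifiers,
    so pass only trusted, hard-coded names.
    """
    row = conn.execute(
        f"SELECT {key}, COUNT(*) FROM {table} "
        f"GROUP BY {key} HAVING COUNT(*) > 1 LIMIT 1"
    ).fetchone()
    if row is not None:
        raise ValueError(f"{table}.{key} = {row[0]!r} occurs {row[1]} times")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableB (id INTEGER)")
conn.executemany("INSERT INTO TableB VALUES (?)", [(1,), (2,), (3,)])
assert_unique_join_key(conn, "TableB", "id")  # passes: every id is unique

conn.execute("INSERT INTO TableB VALUES (2)")  # introduce a duplicate
try:
    assert_unique_join_key(conn, "TableB", "id")
    duplicate_detected = False
except ValueError as exc:
    duplicate_detected = True
    print("join would duplicate rows:", exc)
```

Where the schema can be changed, a unique constraint or unique index on the join column enforces the same guarantee permanently.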

Is it possible for the data to be returned in different column order between execution when you run SELECT * FROM multiple table joins multiple times?

For example I have the following tables resulting from:
CREATE TABLE A (Id int, BId int, Number int)
CREATE TABLE B (Id int, Number decimal(10,2))
GO
INSERT INTO A VALUES(1, 3, 10)
INSERT INTO B VALUES(3, 50)
INSERT INTO A VALUES(2, 5, 20)
INSERT INTO B VALUES(5, 671.35)
GO
And I run the following query multiple times:
SELECT * FROM A INNER JOIN B ON A.BId = B.Id
I should get something like:
ID BId Number ID Number
1 3 10 3 50.00
2 5 20 5 671.35
But is it possible for A.Number and B.Number to be in different positions (and likewise the ID columns), so that I get something like:
ID Number ID BId Number
3 50.00 1 3 10
5 671.35 2 5 20
We are currently experiencing some weird problem that might be resulting from something like this. We have an ASP.NET application, executing a custom reflection based code generated data mapper that is connecting to SQL Server 2008 cluster.
We found sometimes that we get an error like so:
Object of type 'System.Decimal' cannot be converted to type 'System.Int32'
Trying to figure out if this is a behaviour in SQL Server or it's something in the reflection based data mapper engine.
As you can see, the field names are the same in the two tables. We suspect that when we call DataReader.GetValue(DataReader.GetOrdinal("Number")), it may return B.Number, which is a decimal, instead of A.Number, which is an int.
To complicate the matter further, this only happens intermittently (but consistently once it has happened on a particular IIS web server in the web farm).
Say Web Server A is doing okay up until 2:00pm and suddenly, we got this error and we'll keep getting that error on Web Server A until we reset IIS on that server and then it will be okay again.
Something to do w/ connection pooling and how SQL query plan cache perhaps?
The second scenario is possible only if you interchange table orders.
Something like SELECT * FROM B INNER JOIN A ON A.BId = B.Id.
Otherwise it's not possible.
SQL is a relational algebra - the standard does not specify what order columns will be returned in if you don't explicitly state the order yourself.
I tend to avoid "select *" as much as possible since it can clog up the network with unnecessary traffic and makes it harder to catch things like column renames and ordering until it's too late.
Just select the columns you actually need.
For your specific case, I would also just return the shared ID once since it has to be equal due to your join (I tend to prefer the "old" style as the DBMS should be smart enough to optimize this to an inner join anyway):
select
a.Id as Id,
a.BId as BId,
a.Number as Number,
b.Number as BNumber
from
a, b
where
a.BId = b.Id
The second scenario is possible only if you interchange table orders.
Order of joins has nothing to do with position of columns, except if you use simple SELECT * which is not recommended.
Even in that case you can use
SELECT B.*, A.* without changing order of joins
The best and recommended solution is to put column names instead of * (as posted in 1st two answers)
-- added later
I forgot one more thing I wanted to point out: use column aliases for columns with the same names, e.g.
Select A.Number as NumberA, B.Number as NumberB
I don't think I've ever seen it swap the order, but personally I don't use "SELECT *" in production code, either:
SELECT A.[Id], A.[Bid], A.[Number], B.[Id], b.[Number]
FROM A INNER JOIN B ON A.[BId] = B.[Id]
Or at worst:
SELECT A.*, B.* FROM A INNER JOIN B ON A.[BId] = B.[Id]
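The name clash behind the GetOrdinal("Number") suspicion can be made visible by listing the result column names. The sketch below runs the question's tables in SQLite through Python's built-in sqlite3 module, so it says nothing about SQL Server's or ADO.NET's specific behaviour. It only shows that SELECT * over this join yields two columns with the same name, while aliases make every name unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (Id int, BId int, Number int);
CREATE TABLE B (Id int, Number decimal(10,2));
INSERT INTO A VALUES (1, 3, 10);
INSERT INTO B VALUES (3, 50);
INSERT INTO A VALUES (2, 5, 20);
INSERT INTO B VALUES (5, 671.35);
""")

# SELECT *: the result set carries "Id" and "Number" twice each, so a
# lookup by name cannot tell A.Number from B.Number.
star = conn.execute("SELECT * FROM A INNER JOIN B ON A.BId = B.Id")
star_names = [col[0] for col in star.description]

# Explicit columns with aliases: every name is unique.
aliased = conn.execute(
    "SELECT A.Id AS AId, A.BId AS BId, "
    "A.Number AS NumberA, B.Number AS NumberB "
    "FROM A INNER JOIN B ON A.BId = B.Id"
)
aliased_names = [col[0] for col in aliased.description]

print(star_names)
print(aliased_names)
```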