How to efficiently retrieve data in one to many relationships - sql

I am running into an issue where I have a need to run a Query which should get some rows from a main table, and have an indicator if the key of the main table exists in a subtable (relation one to many).
The query might be something like this:
select a.index, (select count(1) from second_table b where a.index = b.index)
from first_table a;
This way I would get the result I want (0 = no depending records in second_table, else there are), but I'm running a subquery for each record I get from the database. I need to get such an indicator for at least three similar tables, and the main query is already some inner join between at least two tables...
My question is if there is some really efficient way to handle this. I have thought of keeping record in a new column the "first_table", but the dbadmin don't allow triggers and keeping track of it by code is too risky.
What would be a nice approach to solve this?
The application of this query will be for two things:
Indicate that at least one row in second_table exists for a given row in first_table. It is to indicate it in a list. If no row in the second table exists, I won't turn on this indicator.
To search for all rows in first_table which have at least one row in second_table, or which don't have rows in the second table.
Another option I just found:
select a.index, b.index
from first_table a
left join (select distinct(index) as index from second_table) b on a.index = b.index
This way I will get null for b.index if it doesn' exist (display can finally be adapted, I'm concerned on query performance here).
The final objective of this question is to find a proper design approach for this kind of case. It happens often, a real application culd be a POS system to show all clients and have one icon in the list as an indicator wether the client has open orders.

Try using EXISTS, I suppose, for such case it might be better then joining tables. On my oracle db it's giving slightly better execution time then the sample query, but this may be db-specific.
SELECT first_table.ID, CASE WHEN EXISTS (SELECT * FROM second_table WHERE first_table.ID = second_table.ID) THEN 1 ELSE 0 END FROM first_table

why not try this one
select a.index,count(b.[table id])
from first_table a
left join second_table b
on a.index = b.index
group by a.index

Two ideas: one that doesn't involve changing your tables and one that does. First the one that uses your existing tables:
SELECT
a.index,
b.index IS NOT NULL,
c.index IS NOT NULL
FROM
a_table a
LEFT JOIN
b_table b ON b.index = a.index
LEFT JOIN
c_table c ON c.index = a.index
GROUP BY
a.index, b.index, c.index
Worth noting that this query (and likely any that resemble it) will be greatly helped if b_table.index and c_table.index are either primary keys or are otherwise indexed.
Now the other idea. If you can, instead of inserting a row into b_table or c_table to indicate something about the corresponding row in a_table, indicate it directly on the a_table row. Add exists_in_b_table and exists_in_c_table columns to a_table. Whenever you insert a row into b_table, set a_table.exists_in_b_table = true for the corresponding row in a_table. Deletes are more work since in order to update the a_table row you have to check if there are any rows in b_table other than the one you just deleted with the same index. If deletes are infrequent, though, this could be acceptable.

Or you can avoid join altogether.
WITH comb AS (
SELECT index
, 'N' as exist_ind
FROM first_table
UNION ALL
SELECT DISTINCT
index
, 'Y' as exist_ind
FROM second_table
)
SELECT index
, MAX(exist_ind) exist_ind
FROM comb
GROUP BY index

The application of this query will be for two things:
Indicate that at least one row in second_table exists for a given row in first_table. It is to indicate it in a list.
To search for all rows in first_table which have at least one row in second_table.
Here you go:
SELECT a.index, 1 as c_check -- 1: at least one row in second_table exists for a given row in first_table
FROM first_table a
WHERE EXISTS
(
SELECT 1
FROM second_table b
WHERE a.index = b.index
);

I am assuming that you can't change the table definitions, e.g. partitioning the columns.
Now, to get a good performance you need to take into account other tables which are getting joined to your main table.
It all depends on data demographics.
If the other joins will collapse the rows by high factor, you should consider doing a join between your first table and second table. This will allow the optimizer to pick best join order , i.e, first joining with other tables then the resulting rows joined with your second table gaining the performance.
Otherwise, you can take subquery approach (I'll suggest using exists, may be Mikhail's solution).
Also, you may consider creating a temporary table, if you need such queries more than once in same session.

I am not expert in using case, but will recommend the join...
that works even if you are using three tables or more..
SELECT t1.ID,t2.name, t3.date
FROM Table1 t1
LEFT OUTER JOIN Table2 t2 ON t1.ID = t2.ID
LEFT OUTER JOIN Table3 t3 ON t2.ID = t3.ID
--WHERE t1.ID = #ProductID -- this is optional condition, if want specific ID details..
this will help you fetch the data from Normalized(BCNF) tables.. as they always categorize data with type of nature in separate tables..
I hope this will do...

Related

Compare 2 Tables and write RowID from Table A to Table B

I have Table A with RowID, Date, Vendor and Cost, Confirmed
I have table B with RowID, Date, Vendor and Cost, Confirmed
Table A list our purchases.
Table B list the statement data from the credit card.
I would like to compare Date, Vendor and Cost in Table A with the same columns in Table B. If there is a match with those three columns, then I would like to take the RowID value from Table A and write it to the matching row in Table B under the Confirmation column.
I am very new to SQL and I am not even sure this is a reasonable expectation.
What do you think?
Is this enough detail to provide your opinion?
Thank you for any help you can give.
Currently I am using an Outer Right Join to get all the rows that do NOT have a match. What I really need is the opposite.
It might help to know what database engine you are using...
My answers are going to relate to MS SQL Server, but much SQL Syntax is the same...
To answer your first question, I would write something like:
Update TableB Set Confirmation = TableA.RowID From TableA
Where TableA.Vendor = TableB.Vendor
And TableA.Cost = TableB.Cost
And TableA.Date = TableB.Date
I would use aliases, but I left them out to hopefully make it easier to understand.
To answer your second question, you can specify an INNER JOIN which is the opposite of an OUTER JOIN and as you mentioned, what you are looking for, as it will return ALL rows that Match and exclude the rest.
You can do this with this query
UPDATE
B
SET
Confirmation = A.RowID
FROM
TableA A
INNER JOIN TableB B
ON B.Vendor = A.Vendor
AND B.Cost = A.Cost
AND B.Date = A.Date
Basically we do an inner join to keep the intersection (the records that match) of the two tables. and update the records that coincided from table B with the id of table A

SQL joining without common keys

If I have a table with the following atributes:
A: id, race, key1
B: key1, driving_id
C: driving_id, fines
why would it be possible for us to have the following queries:
select A.id, A.race, B.key1, B.driving_id, C.fines
from A
left join B on A.key1=B.key1
left join C on B.driving_id= C.driving_id
even though there are no common keys for A and C in the last line of the SQL query?
The query that you have written is parsed as:
select A.id, A.race, B.key1, B.driving_id, C.fines
from (A left join
B
on A.key1 = B.key1
) left join
C
on B.driving_id = C.driving_id;
That is, C is -- logically -- being joined to the result of A and B. Any keys from those tables would be valid.
Although your original query is the preferable way to write it, you could also write:
select ab.id, ab.race, ab.key1, ab.driving_id, C.fines
from (select . . . -- whatever columns you need
from A left join
B
on A.key1 = B.key1
) ab left join
C
on ab.driving_id = C.driving_id;
The three versions are all equivalent, but the last one may help you better understand what is going on with joins between multiple tables.
Without seeing sample data from the three tables, we might not know for sure in the query makes any sense or would even run. Assuming it does run, then there should be nothing wrong with the join logic. For example, it is perfectly possible for table B to have a key key1 which relates to the A table, while at the same time having another key driving_id which relates to the C table. Note that either of these keys (but not both) could be a primary key in the B table, and if not then each key would be a foreign key.
The LEFT JOIN keyword returns all records from the left table (tableA), and the matched records from the right table (tableB). Furthermore, Similarly it returns all records from the result of first set, and the matched records from the right table (tableC). The result is NULL from the right side, if there is no match.
So A & C have a link through table B.

SQL Join between tables with conditions

I'm thinking about which should be the best way (considering the execution time) of doing a join between 2 or more tables with some conditions. I got these three ways:
FIRST WAY:
select * from
TABLE A inner join TABLE B on A.KEY = B.KEY
where
B.PARAM=VALUE
SECOND WAY
select * from
TABLE A inner join TABLE B on A.KEY = B.KEY
and B.PARAM=VALUE
THIRD WAY
select * from
TABLE A inner join (Select * from TABLE B where B.PARAM=VALUE) J ON A.KEY=J.KEY
Consider that tables have more than 1 milion of rows.
What your opinion? Which should be the right way, if exists?
Usually putting the condition in where clause or join condition has no noticeable differences in inner joins.
If you are using outer joins ,putting the condition in the where clause improves query time because when you use condition in the where clause of
left outer joins, rows which aren't met the condition will be deleted from the result set and the result set becomes smaller.
But if you use the condition in join clause of left outer joins ,no rows deletes and result set is bigger in comparison to using condition in the where clause.
for more clarification,follow the example.
create table A
(
ano NUMBER,
aname VARCHAR2(10),
rdate DATE
)
----A data
insert into A
select 1,'Amand',to_date('20130101','yyyymmdd') from dual;
commit;
insert into A
select 2,'Alex',to_date('20130101','yyyymmdd') from dual;
commit;
insert into A
select 3,'Angel',to_date('20130201','yyyymmdd') from dual;
commit;
create table B
(
bno NUMBER,
bname VARCHAR2(10),
rdate DATE
)
insert into B
select 3,'BOB',to_date('20130201','yyyymmdd') from dual;
commit;
insert into B
select 2,'Br',to_date('20130101','yyyymmdd') from dual;
commit;
insert into B
select 1,'Bn',to_date('20130101','yyyymmdd') from dual;
commit;
first of all we have normal query which joins 2 tables with each other:
select * from a inner join b on a.ano=b.bno
the result set has 3 records.
now please run below queries:
select * from a inner join b on a.ano=b.bno and a.rdate=to_date('20130101','yyyymmdd')
select * from a inner join b on a.ano=b.bno where a.rdate=to_date('20130101','yyyymmdd')
as you see above results row counts have no differences,and According to my experience there is no noticeable performance differences for data in large volume.
please run below queries:
select * from a left outer join b on a.ano=b.bno and a.rdate=to_date('20130101','yyyymmdd')
in this case,the count of output records will be equal to table A records.
select * from a left outer join b on a.ano=b.bno where a.rdate=to_date('20130101','yyyymmdd')
in this case , records of A which didn't met the condition deleted from the result set and as I said the result set will have less records(in this case 2 records).
According to above examples we can have following conclusions:
1-in case of using inner joins,
there is no special differences between putting condition in where clause or join clause ,but please try to put tables in from clause in order to have minimum intermediate result row counts:
(http://www.dba-oracle.com/art_dbazine_oracle10g_dynamic_sampling_hint.htm)
2-In case of using outer joins,whenever you don't care of exact result row counts (don't care of missing records of table A which have no paired records in table B and fields of table B will be null for these records in the result set),put the condition in the where clause to delete a set of rows which aren't met the condition and obviously improve query time by decreasing the result row counts.
but in special cases you HAVE TO put the condition in the join part.for example if you want that your result row count will be equal to table 'A' row counts(this case is common in ETL processes) you HAVE TO put the condition in the join clause.
3-avoiding subquery is recommended by lots of reliable resources and expert programmers.It usually increase the query time and you can use subquery just when its result data set is small.
I hope this will be useful:)
1M rows really isn't that much - especially if you have sensible indexes. I'd start off with making your queries as readable and maintainable as possible, and only start optimizing if you notice a perforamnce problem with the query (and as Gordon Linoff said in his comment - it's doubtful there would even be a difference between the three).
It may be a matter of taste, but to me, the third way seems clumsy, so I'd cross it out. Personally, I prefer using JOIN syntax for the joining logic (i.e., how A and B's rows are matched) and WHERE for filtering (i.e., once matched, which rows interest me), so I'd go for the first way. But again, it really boils down to personal taste and preferences.
You need to look at the execution plans for the queries to judge which is the most computationally efficient. As pointed out in the comments you may find they are equivalent. Here is some information on Oracle execution plans. Depending on what editor / IDE you use the may be a shortcut for this e.g. F5 in PL/SQL Developer.

How to get a related row if one (another) row exists?

I'm aware that this question's title might be a little bit inaccurate but I couldn't come up with anything better. Sorry.
I have to fetch 2 different fields, one is always there, the other isn't. That means I'm looking at a LEFT JOIN. Good so far.
But the row I want shown is not the row whose existence is uncertain.
I would like to do something like:
Show name and picture, but only show the picture if that name has a picture_id. Otherwise show nothing for the picture, but I still want the names regardless(left join).
I know this might be a little confusing but there's some clever guys out here so I guess somebody will understand it.
I tried some approaches but I couldn't quite say what I want in SQL.
P.S.: solutions specific to Oracle are good too.
------------------------------------------------------------------------------------------------------------------------------------
EDIT I've tried some queries but the main problem I found is that, inside the ON clause, I am only able to reference the last table mentioned, in other words:
There are four tables from which I'm retrieving data, but I can only mention the last (third table) inside the on clause of the LEFT JOIN(which is the 4th table). I'll describe the tables hopefully that'll help. Try not to delve too much on the names, because they are in Portuguese:
There are 4 tables. The fields I want to retrieve are :TB395.dsclaudo and TB397.dscrecomendacao, for a given TB392.nronip. The tables are as follows:
TB392(laudoid,nronip,codlaudo) // laudoid is PK, references TB395
TB395(codlaudo,dsclaudo) //codlaudo is PK
TB398(laudoid,codrecomendacao) //the pair laudoid,codrecomendacao is PK , references TB397
TB397(codrecomendacao,dscrecomendacao) // codrecomendacao is PK
Fields with the same name are foreign keys.
The problem is that there's no guarantee that, for a given laudoid,there will be one codrecomendacao. But, if there is, I want the dscrecomendacao field returned, that's what I don't know how to do. But even if there isn't a corresponding codrecomendacao for the laudoid, I still want the dsclaudo field, that's why I think a LEFT JOIN applies.
Sounds like you want your primary row source to be the join of TB392 and TB395; then you want an outer join to TB398, and when that gets a match, you want to lookup the corresponding value in TB397.
I would suggest coding the primary join as one inline view; the join between the two extra tables as a second inline view; and then doing an outer join between them. Something like:
SELECT ... FROM
(SELECT ... FROM TB392 JOIN TB395 ON ...) join1
LEFT JOIN
(SELECT ... FROM TB398 JOIN TB397 ON ...) join2
ON ...
It would be nice if you could specify what your tables are, which columns are on which tables, and what columns they join on. Its not clear if you have two tables or only one. I guess you have two tables because you are talking about a LEFT JOIN, and seem to imply that the join is on the name column. So you can use the NVL2 function to accomplish waht you want. So guessing what I can from your question, maybe something like:
SELECT T1.name
, NVL2( T2.picture_id, T1.picture, NULL )
FROM table1 T1
LEFT JOIN
table2 T2
ON T1.name = T2.name
If you only have one table, then its even simpler
SELECT T1.name
, NVL2( T1.picture_id, T1.picture, NULL )
FROM table1 T1
I think you need:
SELECT ...
FROM
TB395
JOIN
TB392
ON ...
LEFT JOIN --- this should be a LEFT JOIN
TB398
ON ...
LEFT JOIN --- and this as well, so the previous is not cancelled
TB397
ON ...
The details may be not accurate:
SELECT
a.dsclaudo
, b.laudoid
, c.codrecomendacao
, d.dscrecomendacao
FROM
TB395 a
JOIN
TB392 b
ON b.codlaudo = a.codlaudo
LEFT JOIN
TB398 c
ON c.laudoid = b.laudoid
LEFT JOIN
TB397 d
ON d.codrecomendacao = c.codrecomendacao
Create two views and then do your left join on the views. For example:
Create View view392_395
as
SELECT
t1.laudoid,
t1.nronip,
t1.codlaudo,
t2.dsclaudo
FROM TB392 t1
INNER JOIN TB395 t2
ON t1.codlaudo
= t2.codlaudo
Create View view398_397
as
SELECT
t1.laudoid,
t1.codrecomendacao,
t2.dscrecomendacao
FROM TB398 t1
INNER JOIN TB397 t2
ON t1.codrecomendacao
= t2.codrecomendacao
SELECT
v1.laudoid,
v1.nronip,
v1.codlaudo,
v1.dsclaudo,
v2.codrecomendacao,
v2.dscrecomendacao
FROM view392_395 v1
LEFT OUTER JOIN view398_397 v2
ON v1.laudoid
= v2.laudoid
In my opinion, views are always under used. Views are your friend. They can simplify some of the most complicated queries.

SQL Method of checking that INNER / LEFT join doesn't duplicate rows

Is there a good or standard SQL method of asserting that a join does not duplicate any rows (produces 0 or 1 copies of the source table row)? Assert as in causes the query to fail or otherwise indicate that there are duplicate rows.
A common problem in a lot of queries is when a table is expected to be 1:1 with another table, but there might exist 2 rows that match the join criteria. This can cause errors that are hard to track down, especially for people not necessarily entirely familiar with the tables.
It seems like there should be something simple and elegant - this would be very easy for the SQL engine to detect (have I already joined this source row to a row in the other table? ok, error out) but I can't seem to find anything on this. I'm aware that there are long / intrusive solutions to this problem, but for many ad hoc queries those just aren't very fun to work out.
EDIT / CLARIFICATION: I'm looking for a one-step query-level fix. Not a verification step on the results of that query.
If you are only testing for linked rows rather than requiring output, then you'd use EXISTS.
More correctly, you need a "semi-join" but this isn't supported by most RDBMS unless as EXISTS
SELECT a.*
FROM TableA a
WHERE EXISTS (SELECT * FROM TableB b WHERE a.id = b.id)
Also see:
Using 'IN' with a sub-query in SQL Statements
EXISTS vs JOIN and use of EXISTS clause
SELECT JoinField
FROM MyJoinTable
GROUP BY JoinField
HAVING COUNT(*) > 1
LIMIT 1
Is that simple enough? Don't have Postgres but I think it's valid syntax.
Something along the lines of
SELECT a.id, COUNT(b.id)
FROM TableA a
JOIN TableB b ON a.id = b.id
GROUP BY a.id
HAVING COUNT(b.id) > 1
Should return rows in TableA that have more than one associated row in TableB.