I am new to sql, i use Sql Developer (Oracle db).
When I need to select some data with null values I write one of these selects:
1)
SELECT i.number
,i.amount
,(SELECT value FROM ATTRIBUTES a
WHERE a.parameter = 'aaa' AND a.item_nr = i.number) AS atr_value
FROM ITEMS i
2)
SELECT i.number
,i.amount
,a.value as atr_value
FROM ITEMS i
left outer join ATTRIBUTES a
on a.parameter = 'aaa'
and a.item_nr = i.number
Questions:
What is difference?
How first approach is called (how can I google it)? Where can I read about it?
Which one should I use further (what is best practices), maybe there is better way to select same data?
Axample of tables:
Your two queries are not exactly the same. If you have multiple matches in the second table, then the first generates an error and the second generates multiple rows.
Which is better? As a general rule, the LEFT JOIN method (the second method) is considered the better practice than the correlated subquery (the first method). Oracle has a pretty good optimizer and it offers many ways of optimizing joins. I also think Oracle can use JOIN algorithms for the correlated subqueries (not all databases are so smart). And, with the right indexes, the two forms probably have very similar performance.
There are situations where correlated subqueries have better performance than the equivalent JOIN construct. For this example, though, I would expect the performance to be similar.
In case there is never more than one matching row in table attributes, the queries do the same. It's just two ways to query the same data. Both querys are fine and straight-forward.
In case there can be more than one match, query one (which is using a correlated subquery) would fail. It would be inappropriate for the given task then.
The query with the outer join is easier to extend, when you want a second column from the attributes table.
The first query makes it crystal-clear that you expect zero or one matches in table attributes for each item. In case of data inconsistency or if you have an error in your query such as a forgotten criteria, it will fail, which is good.
The second query would simply retrieve more rows in case of such error, which may not be desired.
So it's a matter of personal preference and of your choice how the query is to deal with inconsistencies which query to choose.
Related
Let's assume I have three tables, subscriptions that has a field called type, which can only have 2 values;
FREE
PREMIUM.
The other two tables are called premium_users and free_users. I'd like to perfom a LEFT JOIN, starting from the subscriptions table but the thing is that depending on the value of the field type I will ONLY find the matching row in one or the other table, i.e. if type equals 'FREE', then the matching row will ONLY be in free_users table and vice versa.
I'm thinking of some ways to do this, such as LEFT JOINING both tables and then using a COALESCE function the get the non null value, or with a UNION, with two different queries using a INNER JOIN on both queries, but I'm not quite sure which would be the best way in terms of performance. Also, as you would guess, the free_users table is almost five times larger than the premium_users table. Another thing you should know, is that I'm joining by user_id field, which is PK in both free_users and premium_users
So, my question is: which would be the most performant way to do a JOIN that depending on the value of type column will match to one table or another. Would this solution be any different if instead of two tables there were three, or even more?
Disclaimer: This DB is a PostgreSQL and is already up and running in production and as much as I'd like to have a single users table it won't happen in the short term.
What is the best in terms of performance? Well, you should try on your data and your systems.
My recommendation is two left joins:
select s.*,
coalesce(fu.name, pu.name) as name
from subscriptions s left join
free_users fu
on fu.free_id = s.subscription_id and
s.type = 'free' left join
premium_users pu
on pu.premium_id = s.suscription_id and
s.type = 'premium';
You want indexes on free_users(free_id) and premium_users(premium_id). These are probably "free" because these ids should be the primary keys in the table.
If you use union all, then the optimizer may not use indexes for the joins. And not using indexes could have a dastardly impact on performance.
For Example:
I have 4 columns (A,B,C,D).
I thought that instead of connecting each and every column in join I should make a concatenated column in both projection(CA_CONCAT-> A+B+C+D) and make a join on this, Just to check on which method performance is better.
It was working faster earlier but in few CV's this method is slower sometimes, especially at time of filtering!
Can any one suggest which is an efficient method?
I don't think the JOIN conditions with concatenated fields will work better in performance.
Although we say in general there is not a need for index on column tables on HANA database, the column tables have a structure that works with an index on every column.
So if you concatenate 4 columns and produce a new calculated field, first you loose the option to use these index on 4 columns and the corresponding joining columns
I did not check the execution plan, but it will probably make a full scan on these columns
In fact I'm surprised you have mentioned that it worked faster, and experienced problems only on a few
Because concatenation or applying a function on a database column is even only by itself a workload over the SELECT process. It might include implicit type cast operation, which might bring additional workload more than expected
First I would suggest considering setting your table to column store and check the new performance.
After that I would suggest to separate the JOIN to multiple JOINs if you are using OR condition in your join.
Third, INNER JOIN will give you better performance compare to LEFT JOIN or LEFT OUTER JOIN.
Another thing about JOINs and performance, you better use them on PRIMARY KEYS and not on each column.
For me, both the time join with multiple fields is performing faster than join with concatenated fields. For filtering scenario, planviz shows when I join with multiple fields, filter gets pushed down to both the tables. On the other hand, when I join with concatenated field only one table gets filtered.
However, if you put filter on both the fields (like PRODUCT from Tab1 and MATERIAL from Tab2), then you can push the filter down to both the tables.
Like:
Select * from CalculationView where PRODUCT = 'A' and MATERIAL = 'A'
I'm not sure I can provide enough details for an answer, but my company is having a performance issue with an older mssql view. I've narrowed it down to the right outer joins, but I'm not familiar with the structure of joins following joins without a "ON" with each one, as in the code snippet below.
How do I write the joins below to either improve performance or to the simpler format of Join Tablename on Field1 = field2 format ?
FROM dbo.tblObject AS tblObject_2
JOIN dbo.tblProspectB2B PB ON PB.Object_ID = tblObject_2.Object_ID
RIGHT OUTER JOIN dbo.tblProspectB2B_CoordinatorStatus
RIGHT OUTER JOIN dbo.tblObject
INNER JOIN dbo.vwDomain_Hierarchy
INNER JOIN dbo.tblContactUser
INNER JOIN dbo.tblProcessingFile WITH ( NOLOCK )
LEFT OUTER JOIN dbo.enumRetentionRealization AS RR ON RR.RetentionRealizationID = dbo.tblProcessingFile.RetentionLeadTypeID
INNER JOIN dbo.tblLoan
INNER JOIN dbo.tblObject AS tblObject_1 WITH ( NOLOCK ) ON dbo.tblLoan.Object_ID = tblObject_1.Object_ID ON dbo.tblProcessingFile.Loan_ID = dbo.tblLoan.Object_ID ON dbo.tblContactUser.Object_ID = dbo.tblLoan.ContactOwnerID ON dbo.vwDomain_Hierarchy.Object_ID = tblObject_1.Domain_ID ON dbo.tblObject.Object_ID = dbo.tblLoan.ContactOwnerID ON dbo.tblProspectB2B_CoordinatorStatus.Object_ID = dbo.tblLoan.ReferralSourceContactID ON tblObject_2.Object_ID = dbo.tblLoan.ReferralSourceContactID
Your last INNER JOIN has a number of ON statements. Per this question and answer, such syntax is equivalent to a nested subquery.
That is one of the worst queries I have ever seen. Since I cannot figure out how it is supposed to work without the underlying data, this is what I suggest to you.
First find a good sample loan and write a query against this view to return where loan_id = ... Now you have a data set you chan check you changes against more easily than the, possibly, millions of records this returns. Make sure these results make sense (that right join to tbl_objects is bothering me as it makes no sense to return all the objects records)
Now start writing your query with what you think should be the first table (I would suggest that loan is the first table, if it not then the first table is Object left joined to loan)) and the where clause for the loan id.
Check your results, did you get the same loan information as teh view query with the where clause added?
Then add each join one at a time and see how it affects the query and whether the results appear to be going off track. Once you have figured out a query that gives the same results with all the tables added in, then you can try for several other loan ids to check. Once those have checked out, then run teh whole query with no where clause and check against the view results (if it is a large number you may need to just see if teh record counts match and visually check through (use order by on both things in order to make sure your results are in the same order). In the process try to use only left joins and not that combination of right and left joins (its ok to leave teh inner ones alone).
I make it a habit in complex queries to do all the inner joins first and then the left joins. I never use right joins in production code.
Now you are ready to performance tune.
I woudl guess the right join to objects is causing a problem in that it returns teh whole table and the nature of that table name and teh other joins to the same table leads me to believe that he probably wanted a left join. Without knowing the meaning of the data, it is hard to be sure. So first if you are returning too many records for one loan id, then consider if the real problem is that as tables have grown, returning too many records has become problematic.
Also consider that you can often take teh view and replace it with code to get the same results. Views calling views are a poor technique that often leads to performance issues. Often the views on top of the other views call teh same tables and thus you end up joining to them multiple times when you don;t need to.
Check your Explain plan or Execution plan depending on what database backend you have. Analysis of this should show where you might have missing indexes.
Also make sure that every table in the query is needed. This especially true when you join to a view. The view may join to 12 other tables but you only need the data from one of them and it can join to one of your tables. MAke sure that you are not using select * but only returning teh fields the view actually needs. You have inner joins so, by definition, select * is returning fields you don't need.
If your select part of teh view has a distinct in it, then consider if you can weed down the multiple records you get that made distinct needed by changing to a derived table or adding a where clause. To see what is causing the multiples, you may need to temporarily use select * to see all the columns and find out which one is not uniques and is causing the issue.
This whole process is not going to be easy or fun. Just take it slowly, work carefully and methodically and you will get there and have a query that is understandable and maintainable in the end.
I have a problem where I have to try to find people who have old accounts with an outstanding balance, but who have created a new account. I need to match them by comparing SSNs. The problem is that we have primary and additional contacts, so 2 potential SSNs per account. I need to match it even if they where primary at first, but now are secondary etc.
Here was my first attempt, I'm just counting now to get the joins and conditions down. I'll select actual data later. Basically the personal table is joined once to active accounts, and another copy to delinquent accounts. The two references to the personal table are then compared based on the 4 possible ways SSNs could be related.
select count(*)
from personal pa
join consumer c
on c.cust_nbr = pa.cust_nbr
and c.per_acct = pa.acct
join personal pu
on pu.ssn = pa.ssn
or pu.ssn = pa.addl_ssn
or pu.addl_ssn = pa.ssn
or pu.addl_ssn = pa.addl_ssn
join uncol_acct u
on u.cust_nbr = pu.cust_nbr
and u.per_acct = pu.acct
where u.curr_bal > 0
This works, but it takes 20 minutes to run. I found this question Is having an 'OR' in an INNER JOIN condition a bad idea? so I tried re-writing it as 4 queries (one per ssn combination) and unioning them. This took 30 minutes to run.
Is there a better way to do this, or is it just a really inefficient process no mater how you do it?
Update: After playing with some options here, and some other experimenting I think I found the problem. Our software vendor encrypts the SSNs in the database and provides a view that decrypts them. Since I have to work from that view it takes a really long time to decrypt and then compare.
If you run separate joins and then union then, then you might have problems. What if the same record pair fulfills at least two conditions? You will have duplicates in your result then.
I believe your first approach is feasible, but do not forget that you are joining four tables. If the number of rows is A, B, C, D in the respective tables, then the RDBMS will have to check a maximum of A * B * C * D records. If you have many records in your database, then this will take a lot of time.
Of course, you can optimize your query by adding indexes to some columns and that would be a good idea if they are not indexed already. But do not forget that if you add an index to a column, then the RDBMS will be quicker to read from there, but slower to write there. If your operations are mostly reads (select), then you should index your columns, but not blindly, study indexing a bit before you start doing it.
Also, if you are joining four tables, personal, consumer, personal (again) and uncol_acct, then you might do something like this:
Write a query, which contains two subqueries, each of them named as t1 and t2, respectively. The first subquery joins personal and consumer and will name the result as t1. The second query will join the second occurrence of personal with uncol_acct and the where clause will be inside your second join. As described before, your query will contain two subqueries, named t1 and t2, respectively. Your query will join t1 and t2. This way you opimise, as your main query will consider only the pairing of valid t1 and t2.
Also, if your where clause is outside as in your example query, then the 4-dimensional join will be executed and only after that will the where be taken into consideration. This is why the where clause should be inside the second sub-query, so the where clause will run before the main join. Also, you can create a subquery inside the second subquery to calculate the where if the condition is fulfilled rarely.
Cheers!
I often know exactly what I want, and know how tables are related in order to get it, but I have a real hard time translating that literal language knowledge to SQL syntax when it comes to joins. Do you have any tips or tricks you can share that have worked for you in the past?
This is a basic, but poor example:
"I have Categories, which have one-to-many Products, which have one-to-many Variants, which have one-to-many Sources. I need all Sources that belong to Category XYZ."
I imagine doing something where you cross out certain language terms and replace them with SQL syntax. Can you share how you formulate your queries based upon some concept similar to that? Thanks!
Use SQL Query Designer to easily buid Join queries from the visual table collection right there, then if you want to learn how it works, simply investigate it, that's how I learned it.
You won't notice how charming it is till you try it.
Visual Representation of SQL Joins - A walkthrough explaining SQL JOINs.
Complete ref of SQL-Server Join, Inner Join, Left Outer Join, Right Outer Join, Full Outer Join, in SQL-Server 2005 (View snapshot bellow).
ToTraceString of Entity Frameork' ObjectQuery (that you add Include shapings to it) is also a good way to learn it.
SQL-Server Join types (with detailed examples for each join type):
INNER JOIN - Match rows between the two tables specified in the INNER JOIN statement based on one or more columns having matching data. Preferably the join is based on referential integrity enforcing the relationship between the tables to ensure data integrity.
Just to add a little commentary to the basic definitions above, in general the INNER JOIN option is considered to be the most common join needed in applications and/or queries. Although that is the case in some environments, it is really dependent on the database design, referential integrity and data needed for the application. As such, please take the time to understand the data being requested then select the proper join option.
Although most join logic is based on matching values between the two columns specified, it is possible to also include logic using greater than, less than, not equals, etc.
LEFT OUTER JOIN - Based on the two tables specified in the join clause, all data is returned from the left table. On the right table, the matching data is returned in addition to NULL values where a record exists in the left table, but not in the right table.
Another item to keep in mind is that the LEFT and RIGHT OUTER JOIN logic is opposite of one another. So you can change either the order of the tables in the specific join statement or change the JOIN from left to right or vice versa and get the same results.
RIGHT OUTER JOIN - Based on the two tables specified in the join clause, all data is returned from the right table. On the left table, the matching data is returned in addition to NULL values where a record exists in the right table but not in the left table.
Self Join - In this circumstance, the same table is specified twice with two different aliases in order to match the data within the same table.
CROSS JOIN - Based on the two tables specified in the join clause, a Cartesian product is created if a WHERE clause does filter the rows. The size of the Cartesian product is based on multiplying the number of rows from the left table by the number of rows in the right table. Please heed caution when using a CROSS JOIN.
FULL JOIN - Based on the two tables specified in the join clause, all data is returned from both tables regardless of matching data.
I think most people approach its:
Look for substantives, as they can point to potential tables
Look for adjectives, cause they probably are fields
Relationships between substantives gives JOIN rules
Nothing better than to draw these structures in a paper sheet.
Write and debug a query which returns the fields from the table having the majority of—or the most important—data. Add constraints which depend only on that table, or which are independent of all tables.
Add a new where term which relates another table.
Repeat 2 until done.
I've yet to use the join operator in a query, even after 20+ years of writing SQL queries. One can almost always write them in the form
select field, field2, field3, <etc.>
from table
where field in (select whatever from table2 where whatever) and
field2 in (select whatever from table2 where whatever) and ...
or
select field, field2, field3, <etc.>
from table1, table2, ...
where table1.field = table2.somefield and
table1.field2 = table3.someotherfield and ...
Like someone else wrote, just be bold and practice. It will be like riding a bicycle after 4 or 5 times creating such a query.
One word: Practice.
Open up the query manager and start running queries until you get what you want. Look up similar examples and adapt them to your situation. You will always have to do some trial and error with the queries to get them right.
SQL is very different from imperative programming.
1) To design tables, consider Entities (the real THINGS in the world of concern), Relationships (between the Entities), and Attributes (the values associated with an Entity).
2) to write a Select statement consider a plastic extrusion press:
a) you put in raw records From tables, Where conditions exist in the records
b) you may have to join tables to get at the data you need
c) you craft extrusion nozzles to make the plastic into the shapes you want. These are the individual expressions of the select List.
d) you may want the n-ary sets of list data to come to you in a certain order, you can apply an Order By clause.
3) crafting the List expressions is the most like imperative programming after you discover the if(exp,true-exp,false-exp) function.
Look at the ERD.
Logical or physical version, it will show what tables are related to one another. This way, you can see what table(s) you need to join to in order to get from point/table a to point/table b, and what criteria.