I’m trying to get data about installations in buildings. The problem is that one building can have multiple installations, and I’m unsure how to adjust my SQL for that, as the initial table I query only holds the relations that own the buildings.
Here’s the situation.
Table 1 (RELRLGRP) holds the ID of the group the relations belong to. Those relations own the buildings, the buildings contain the installations, and the installations have the data I need.
This is what I have so far. I’m worried I shouldn’t use this many joins in one SQL statement, but I cannot find a quicker link from my starting point (the group of relations) to the installation data I seek in the BORGINST table. Please disregard the select portion of the statement (I removed it for clarity).
SELECT *
FROM RELRLGRP A
JOIN RELATION R ON A.RELATION_GC_ID = R.GC_ID
JOIN BUILDING G ON R.CODE = G.GC_CODE
JOIN INSTALL I ON G.GC_CODE = I.GC_CODE
JOIN BORGINST B ON I.GC_ID = B.GC_ID
WHERE A.RELGROUP_GC_ID LIKE '100109' -- the group the relations belong to
I’ve done some rudimentary SQL, but linking through tables like this is new territory for me. I’d be happy to know whether this many join statements is the way to go or whether I should take a different route entirely.
JesseJ - Since I don't know all the columns that exist in your tables, I am going to assume that you are joining on primary keys. If this is the case, your solution may be the only one available to link the RELRLGRP to the BORGINST table.
Linking multiple tables like you are doing can be common in a normalized database.
Example:
In the example I posted, in order to find the State where a particular transaction happened, you have to link all the tables together. There is no shortcut.
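The example that reply refers to did not survive here, so what follows is only a sketch of the same idea, not the original. The tables (`txn`, `store`, `city`, `state`) and their columns are made up for illustration, and SQLite is used through Python just so it can be run as-is. Finding the state for one transaction means walking every link in the chain:

```python
import sqlite3

# Hypothetical normalized schema: txn -> store -> city -> state.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state (state_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE city  (city_id  INTEGER PRIMARY KEY, name TEXT,
                    state_id INTEGER REFERENCES state(state_id));
CREATE TABLE store (store_id INTEGER PRIMARY KEY,
                    city_id  INTEGER REFERENCES city(city_id));
CREATE TABLE txn   (txn_id   INTEGER PRIMARY KEY,
                    store_id INTEGER REFERENCES store(store_id),
                    amount   REAL);

INSERT INTO state VALUES (1, 'Ohio'), (2, 'Texas');
INSERT INTO city  VALUES (10, 'Columbus', 1), (20, 'Austin', 2);
INSERT INTO store VALUES (100, 10), (200, 20);
INSERT INTO txn   VALUES (1000, 100, 19.99), (1001, 200, 5.00);
""")

# Each JOIN follows one foreign key; none of them can be skipped.
state_for_txn = conn.execute("""
    SELECT s.name
    FROM txn t
    JOIN store st ON t.store_id = st.store_id
    JOIN city  c  ON st.city_id = c.city_id
    JOIN state s  ON c.state_id = s.state_id
    WHERE t.txn_id = ?
""", (1000,)).fetchone()[0]

print(state_for_txn)
```

The shape is the same as the RELRLGRP to BORGINST query in the question: one join per relationship between the starting table and the table that holds the data.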
Don't sweat it: I have views with three times as many joins. Every join does add complexity and suck more processor, but it really comes down to performance: if this process doesn't finish as quickly as you need it to, you can look into other methods, but otherwise multiple joins like this are perfectly fine.
I think I got a knot in my line of thought, surely you can untie it.
Basically I have two working queries which are based on the same table and return an identical structure (the same as the source table). They are simply two different kinds of row filters. Now I would like to "stack" these filters, meaning that I want to retrieve all the entries which are in both query a and query b.
Why do I want that?
Our club is structured in several local groups and I need to hand different kinds of lists (e.g. members with email-entry) to these groups. In this example I would have a query "groupA" and a query "newsletter". Another could be "groupB" and "activemember", but also "groupB" and "newsletter". Unfortunately each query is based on a set of conditions, which imho would be stored best in a single query instead of copying the conditions several times to different queries (in case something changes).
Judging from the Venn diagrams, I suppose I need to use INNER JOIN, but I could not get it to work, neither with the LibreOffice Base query assistant nor with hand-written SQL. I tried this:
SELECT groupA.*
FROM groupA
INNER JOIN newsletter
ON groupA.memberID = newsletter.memberID
The error message says: Cannot be in ORDER BY clause in statement
I suppose that the problem comes from the fact that both queries are based on the same table.
Maybe there is an even easier way of nesting queries?
I am hoping for something like
SELECT * FROM groupA
WHERE groupA.memberID = newsletter.memberID
Thank you and sorry if this already has a duplicate, I just could not find the right search terms.
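For what it's worth, here is a minimal sketch of "stacking" two filters, assuming the two saved queries can be treated like views. The `members` table, its columns, and the filter conditions are invented for illustration, and this runs on SQLite through Python rather than LibreOffice Base, so it does not diagnose the ORDER BY error above. It shows two equivalent ways to keep only the rows present in both filters:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical source table; only memberID comes from the question.
CREATE TABLE members (memberID INTEGER PRIMARY KEY, name TEXT,
                      localGroup TEXT, email TEXT);
INSERT INTO members VALUES
  (1, 'Ann', 'A', 'ann@example.org'),
  (2, 'Bob', 'A', NULL),
  (3, 'Cy',  'B', 'cy@example.org');

-- Each filter is defined once, so its conditions live in one place.
CREATE VIEW groupA     AS SELECT * FROM members WHERE localGroup = 'A';
CREATE VIEW newsletter AS SELECT * FROM members WHERE email IS NOT NULL;
""")

# Variant 1: subquery, close to the "WHERE ... = newsletter.memberID" idea.
via_in = conn.execute("""
    SELECT * FROM groupA
    WHERE memberID IN (SELECT memberID FROM newsletter)
    ORDER BY memberID
""").fetchall()

# Variant 2: INNER JOIN on the shared key.
via_join = conn.execute("""
    SELECT groupA.*
    FROM groupA
    INNER JOIN newsletter ON groupA.memberID = newsletter.memberID
    ORDER BY groupA.memberID
""").fetchall()

print(via_in)
print(via_join)
```

Because memberID is unique in both filters, the join cannot multiply rows, so both variants return the same list. The IN form may be the easier one to reuse for other combinations such as "groupB" and "activemember".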
Okay guys, I'm literally SCREWED. My professor is on indefinite leave and I have an assignment due next Friday which I'm completely lost on. I've entered all my data into my tables and I'm creating the views. Our assignment is to create 5 reports for a business in SQL and transfer them to Excel to create a frontend.
Basically, can someone describe to me how I would use views and joins to create a report for this?
A join matches rows from two tables on a column they share and combines the columns of both, essentially creating one big table. You can create a view from code like the example below. That gives you one thing to call, the view, which contains all of the code from the joins you used to create it, so you don't have to re-code and re-validate every time you want to use those joins. This isn't the place where we can just give you what you'd learn in your course, but I hope that helps.
Example:
select *
from tableSales a
join tableStaff b on a.Staff_ID = b.Staff_ID
join tableNext c on b.Column = c.Column -- you could also join to table a here
This gives you the data from the joined tables in one place: sales and staff are matched on the staff ID, and a column from tableStaff can then be joined to another table, and so on.
Run this one statement and you can see how it puts all the columns into one table. If you put this code into a view, you can then query the view directly. Furthermore, Excel has built-in functionality to read the views you have created, and lets you refresh the reports by connecting to the database and then to the view.
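As a rough sketch of the view idea, assuming made-up columns (`Staff_Name`, `Sale_ID`, `Amount`) and a made-up view name (`vSalesByStaff`), and using SQLite through Python only so the example runs on its own. The exact CREATE VIEW syntax and the Excel connection steps depend on the database used for the assignment:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables; only Staff_ID appears in the reply above.
CREATE TABLE tableStaff (Staff_ID INTEGER PRIMARY KEY, Staff_Name TEXT);
CREATE TABLE tableSales (Sale_ID INTEGER PRIMARY KEY, Staff_ID INTEGER,
                         Amount REAL);
INSERT INTO tableStaff VALUES (1, 'Dana'), (2, 'Eli');
INSERT INTO tableSales VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);

-- The join is written once and stored under a name.
CREATE VIEW vSalesByStaff AS
SELECT b.Staff_Name, SUM(a.Amount) AS Total_Sales
FROM tableSales a
JOIN tableStaff b ON a.Staff_ID = b.Staff_ID
GROUP BY b.Staff_Name;
""")

# A report (or a spreadsheet connection) now only needs to select from the view.
report = conn.execute(
    "SELECT * FROM vSalesByStaff ORDER BY Staff_Name"
).fetchall()
print(report)
```

Each of the five reports could be one such view, so the join logic is validated once and the frontend only selects from it.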
Good luck!
Watch out for duplicates!
I have a DB (Access 2010) that I am pulling data from, but I am trying to make it easier to pull specific cases instead of mucking about in Excel.
We have about 78 product type codes that we classify as a certain account type. Unfortunately I can't use an IN() function because there are too many characters (there is the 1024 char limit). I looked online for help and it was suggested that I make a table to inner join on the product codes that I want.
I created a table with the codes I want to pull, then joined on the productcodetype in the linked database table. Unfortunately when I run the sql nothing shows up, just blank. I tried different join combinations to no avail, read up further and found that you can't enforce referential integrity on linked DB tables from non-linked DB tables.
I think this is my problem but I'm not sure, and I don't know if I'm using the right language, but I can't find a similar issue to mine so I'm hoping it's an easy fix and I'm just not thinking about it the right way.
Is there any way to select certain cases (78 product type codes) from a large database using something like IN() or a reference table when I can't create a new table in the linked db?
Thank you,
K
You need to use two tables and build a query that joins them. If your join doesn't return any results, make sure that the joined fields have the same data type and really share the same values.
If your data source is Excel, make sure that there aren't any trailing blanks or other 'invisible' characters.
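To illustrate that last point, here is a small sketch with invented table names (`products`, `wanted_codes`) and sample codes. It runs on SQLite through Python, not Access, so treat it as a demonstration of the symptom rather than the exact fix. A reference-table join returns nothing when one side carries trailing blanks, and matches once both sides are trimmed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-ins: 'products' plays the linked table,
-- 'wanted_codes' the local table listing the codes to pull.
CREATE TABLE products (product_id INTEGER PRIMARY KEY, productcodetype TEXT);
CREATE TABLE wanted_codes (code TEXT);
INSERT INTO products VALUES (1, 'A10 '), (2, 'B20 '), (3, 'C30 ');  -- trailing blanks
INSERT INTO wanted_codes VALUES ('A10'), ('C30');
""")

# Plain equality: 'A10 ' is not equal to 'A10', so the join comes back empty.
plain = conn.execute("""
    SELECT p.product_id
    FROM products p
    JOIN wanted_codes w ON p.productcodetype = w.code
""").fetchall()

# Trimming both sides makes the values comparable again.
trimmed = conn.execute("""
    SELECT p.product_id
    FROM products p
    JOIN wanted_codes w ON TRIM(p.productcodetype) = TRIM(w.code)
    ORDER BY p.product_id
""").fetchall()

print(plain)
print(trimmed)
```

Access has a Trim() function as well, so the same check could be tried there. A text-versus-number mismatch between the two joined fields is the other thing to rule out.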
Here's my query, it is fairly straightforward:
SELECT
INVOICE_ITEMS.II_IVNUM, INVOICE_ITEMS.IIQSHP
FROM
INVOICE_ITEMS
LEFT JOIN
INVOICES
ON
INVOICES.INNUM = INVOICE_ITEMS.II_INNUM
WHERE
INVOICES.IN_DATE
BETWEEN
'2010-08-29' AND '2010-08-30'
;
I have very limited knowledge of SQL, but I'm trying to understand some of the concepts like subqueries and the like. I'm not looking for a redesign of this code, but rather an explanation of why it is so slow (600+ seconds on my test database) and how I can make it faster.
From my understanding, the left join is creating a virtual table and populating it with every result row from the join, meaning that it is processing every row. How would I stop the query from reading the table completely and just finding the WHERE/BETWEEN clause first, then creating a virtual table after that (if it is possible)?
How is my logic? Are there any consistently recommended resources to get me to SQL ninja status?
Edit: Thanks everyone for the quick and polite responses. Currently, I'm connecting over ODBC to a proprietary database that is used in the rapid application development framework called OMNIS. Therefore, I really have no idea what sort of optimization is being run, but I believe it is based loosely on MSSQL.
I would rewrite it like this, and make sure you have an index on i.INNUM, ii.II_INNUM, and i.IN_DATE. The LEFT JOIN is being turned into an INNER JOIN by your WHERE clause, so I rewrote it as such:
SELECT ii.II_IVNUM, ii.IIQSHP
FROM INVOICE_ITEMS ii
INNER JOIN INVOICES i ON i.INNUM = ii.II_INNUM
WHERE i.IN_DATE BETWEEN '2010-08-29' AND '2010-08-30'
Depending on what database you are using, what may be happening is that all of the records from INVOICE_ITEMS are being joined (due to the LEFT JOIN), regardless of whether there is a match in INVOICES or not, and then the WHERE clause filters down to the matched ones that have a date within range. By switching to an INNER JOIN, you may make the query more efficient, since the WHERE clause only needs to be applied to INVOICES records that have a matching INVOICE_ITEMS record.
Since that is a very basic query, the optimizer should do fine with it; your problem is most likely incorrect indexing. Do you have indexes on the IN_DATE field and the INVOICE_ITEMS.II_INNUM field? If you have properly set up PK/FK relationships, INVOICES.INNUM should already be indexed, but FKs are not indexed automatically.
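A quick way to convince yourself that the WHERE clause turns the LEFT JOIN into an INNER JOIN is to try it on a few rows. The sample data below is invented, and SQLite through Python stands in for the real database, so this is only a sketch of the logic, not a statement about how the OMNIS backend executes it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE INVOICES (INNUM INTEGER PRIMARY KEY, IN_DATE TEXT);
CREATE TABLE INVOICE_ITEMS (II_IVNUM INTEGER, II_INNUM INTEGER, IIQSHP INTEGER);
INSERT INTO INVOICES VALUES (1, '2010-08-29'), (2, '2010-09-15');
-- Item 103 points at an invoice that does not exist.
INSERT INTO INVOICE_ITEMS VALUES (101, 1, 5), (102, 2, 7), (103, 99, 1);
""")

# LEFT JOIN alone keeps the unmatched item, with NULL for IN_DATE.
left_all = conn.execute("""
    SELECT ii.II_IVNUM, i.IN_DATE
    FROM INVOICE_ITEMS ii
    LEFT JOIN INVOICES i ON i.INNUM = ii.II_INNUM
""").fetchall()

date_filter = "WHERE i.IN_DATE BETWEEN '2010-08-29' AND '2010-08-30'"

# NULL never satisfies BETWEEN, so the unmatched row is filtered out again.
left_filtered = conn.execute(f"""
    SELECT ii.II_IVNUM, ii.IIQSHP
    FROM INVOICE_ITEMS ii
    LEFT JOIN INVOICES i ON i.INNUM = ii.II_INNUM
    {date_filter}
""").fetchall()

inner_filtered = conn.execute(f"""
    SELECT ii.II_IVNUM, ii.IIQSHP
    FROM INVOICE_ITEMS ii
    INNER JOIN INVOICES i ON i.INNUM = ii.II_INNUM
    {date_filter}
""").fetchall()

print(len(left_all), left_filtered, inner_filtered)
```

The two filtered queries return the same rows, so writing INNER JOIN loses nothing and states the intent directly. Whether it is also faster depends on the optimizer in use.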
Your query is fine, it's the indexes you have to look at.
Are INVOICES.INNUM and INVOICE_ITEMS.II_INNUM indexed?
If not, the database has to do something called a 'scan': it searches every single record.
You can think of indexes as like the tabs on the side of a phone book - you know where to start looking for people based on the first letters of their surname. Without an index (say you want to look for names that end in '...son') you have to search the entire book.
There are different types of index - they can be ordered (like the phone book index - all ordered by surname) or not (like the index at the back of a book - there's an overhead in finding the index and then the actual page).
You should also be able to view the query plan, which shows how the server executes the SQL statement. That can tell you all sorts of more advanced stuff. For instance, there are multiple ways to do the job: a merge join is possible if both tables are sorted by the join field, or a nested loop join will loop through the smaller table for every record in the larger table.
Well, there is no obvious reason why this query should be slow. The only thing that comes to mind is: do you have indexes on INVOICES.INNUM and INVOICE_ITEMS.II_INNUM? If you add them it could speed up the select, but it would slow down updates/inserts.
A join doesn't create a "virtual table" on anything more than just a conceptual level.
The performance issue with your query most likely lies in poor or insufficient indexing. You should have indexes on:
INVOICE_ITEMS.II_INNUM
INVOICES.IN_DATE
You should also have an index on INVOICES.INNUM, but if that's the primary key of the table then it already has one.
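A minimal sketch of those indexes, assuming index names of my own choosing and using SQLite through Python as a stand-in. The CREATE INDEX syntax is broadly similar across databases, but the plan output and the command to view it differ, so check what the actual backend offers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE INVOICES (INNUM INTEGER PRIMARY KEY, IN_DATE TEXT);
CREATE TABLE INVOICE_ITEMS (II_IVNUM INTEGER, II_INNUM INTEGER, IIQSHP INTEGER);
INSERT INTO INVOICES VALUES (1, '2010-08-29'), (2, '2010-09-15');
INSERT INTO INVOICE_ITEMS VALUES (101, 1, 5), (102, 2, 7);

-- Index names are made up for this sketch.
CREATE INDEX idx_invoice_items_ii_innum ON INVOICE_ITEMS (II_INNUM);
CREATE INDEX idx_invoices_in_date       ON INVOICES (IN_DATE);
""")

query = """
    SELECT ii.II_IVNUM, ii.IIQSHP
    FROM INVOICE_ITEMS ii
    INNER JOIN INVOICES i ON i.INNUM = ii.II_INNUM
    WHERE i.IN_DATE BETWEEN '2010-08-29' AND '2010-08-30'
"""
rows = conn.execute(query).fetchall()

# SQLite's way of showing the plan; the 4th column holds the description.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)

index_names = {
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
}
print(rows, sorted(index_names))
```

In the plan output, a step that names one of the indexes means the database can jump to the matching rows, while a step that scans a whole table is the "search every record" case described above.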
Also, don't use a left join here. If there's a foreign key between INVOICE_ITEMS.II_INNUM and INVOICES.INNUM (and INVOICE_ITEMS.II_INNUM is not nullable), then you'll never encounter a record in INVOICE_ITEMS that won't match up to a record in INVOICES. Even if there were, your WHERE condition is using a value from INVOICES, so you'd eliminate any unmatched rows anyway. Just use a regular JOIN.
Every time a database diagram gets looked at, one area people are critical of is inner joins. They look at them hard and ask questions to see if an inner join really needs to be there.
Simple Library Example:
A many-to-many relationship is normally defined in SQL with three tables: Book, Category, BookCategory.
In this situation, Category is a table that contains two columns: ID, CategoryName.
In this situation, I have gotten questions about the Category table: is it needed? Could it be used only as a lookup table, with the BookCategory table storing the CategoryName instead of the CategoryID, to avoid having to do an additional INNER JOIN? (For this question, we are going to ignore changing or deleting any CategoryNames.)
The question is, what is so bad about inner joins? At what point is doing them a negative thing (general guidelines like # of transactions, # of records, # of joins in a statement, etc)?
Your example is a good counterexample. How do you rename categories if they're spread throughout the various rows of the BookCategory table? Your UPDATE to do the rename would touch all the rows in the same category.
With the separate table, you only have to update one row. There is no duplicate information.
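The rename argument can be sketched in a few lines. The Book, Category and BookCategory tables follow the question; the `BookCategoryFlat` table (the variant that stores the name directly), the `Title` column and the sample rows are made up for comparison, and SQLite through Python is used only to make it runnable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Book (ID INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Category (ID INTEGER PRIMARY KEY, CategoryName TEXT);
CREATE TABLE BookCategory (BookID INTEGER REFERENCES Book(ID),
                           CategoryID INTEGER REFERENCES Category(ID));
-- Hypothetical denormalized variant that stores the name directly.
CREATE TABLE BookCategoryFlat (BookID INTEGER, CategoryName TEXT);

INSERT INTO Book VALUES (1, 'Dune'), (2, 'Neuromancer'), (3, 'Emma');
INSERT INTO Category VALUES (1, 'SciFi'), (2, 'Classic');
INSERT INTO BookCategory VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO BookCategoryFlat VALUES (1, 'SciFi'), (2, 'SciFi'), (3, 'Classic');
""")

rename = "SET CategoryName = 'Science Fiction' WHERE CategoryName = 'SciFi'"

# Normalized: the name lives in exactly one row.
normalized_rows = conn.execute(f"UPDATE Category {rename}").rowcount
# Denormalized: every book in the category has to be touched.
flat_rows = conn.execute(f"UPDATE BookCategoryFlat {rename}").rowcount

# The price of the normalized design is the extra INNER JOIN when reading.
titles = [row[0] for row in conn.execute("""
    SELECT b.Title
    FROM Book b
    INNER JOIN BookCategory bc ON bc.BookID = b.ID
    INNER JOIN Category c      ON c.ID = bc.CategoryID
    WHERE c.CategoryName = 'Science Fiction'
    ORDER BY b.Title
""")]

print(normalized_rows, flat_rows, titles)
```

With three books the difference is one row versus two. With a large catalogue the flat variant's UPDATE grows with the number of books in the category, while the normalized one stays at a single row.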
I would be more concerned about OUTER joins, and the potential to pick up info that wasn't intended.
In your example, having the Category table means that a book is limited to being filed under a preset Category (via a foreign key relationship). If you just shoved multiple entries into the BookCategory table, it would be harder to limit what can be chosen as a Category.
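Here is a small sketch of that foreign key restriction, with invented sample rows. It uses SQLite through Python, where foreign key enforcement has to be switched on per connection with a PRAGMA; most other databases enforce declared foreign keys without that step:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite-specific: foreign keys are only enforced when this is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Book (ID INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Category (ID INTEGER PRIMARY KEY, CategoryName TEXT);
CREATE TABLE BookCategory (
    BookID     INTEGER NOT NULL REFERENCES Book(ID),
    CategoryID INTEGER NOT NULL REFERENCES Category(ID)
);
INSERT INTO Book VALUES (1, 'Dune');
INSERT INTO Category VALUES (1, 'SciFi');
""")

# Filing the book under a preset category is accepted.
conn.execute("INSERT INTO BookCategory VALUES (1, 1)")

# Category 42 does not exist, so the database refuses the row.
try:
    conn.execute("INSERT INTO BookCategory VALUES (1, 42)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT BookID, CategoryID FROM BookCategory").fetchall()
print(rejected, stored)
```

A free-text CategoryName column in BookCategory has no such guard, so a typo like 'SciFy' would silently become a new category.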
Doing an INNER join is not so bad, it is what databases are made for. The only time it is bad is when you are doing it on a table or column that is inadequately indexed.
I am not sure there is anything wrong with inner joins per se. It is like each IF you add to your code: every one impacts performance (or should I say every line...), but you still need a minimum number of them to make your system work (yes yes, I know about Turing machines).
So if you have something that is not needed, it will be frowned upon.
When you map your domain model onto the relational model, you have to split the information across multiple relations in order to get a normalized model; there is no way around that. And then you have to use joins to combine the relations again and get your information back. The only bad thing about this is that joins are relatively expensive.
The other option would be not to normalize your relational model. This will fill your database with much redundant data, give you many opportunities to turn your data inconsistent and make updates a nightmare.
The only reason not to normalize a relational model (I can think of at the moment) is that reading performance is extremely - and I mean extremely - critical.
By the way, why do you (they) only mention inner joins? How are left, right, and full outer joins significantly different from inner joins?
Nobody can offer much about general guidelines - they'd be specific to the server, hardware, database design, and expectations... way too many variables.
Specifically about INNER JOINs being inefficient or bad... JOINs are the center of relational DBs, and they've been around for decades. It's only wrong when you use it wrong, because obviously someone's doing it right since it's not extinct yet. Personally, I'd assume anyone throwing out blanket statements like that either doesn't know SQL or knows just enough to get in trouble. Next time it comes up, teach them how to use the query cache.
(Setting aside updates and deletes, you didn't rule out inserts: the maintainability gained by keeping humans and their typos out of the data can easily be worth at least 10x the time a join will take.)