Can I use an IN clause with a LEFT JOIN clause - sql

Note up front: this question concerns the IN clause that belongs to the FROM clause and lets you reference an external database. Please do not confuse it with the IN operator that might appear in a WHERE clause.
Version: MS Access 2016
External table is on the local network
The crux of what I am trying to do is grab an [Employee] table from an external ACCDB database and LEFT JOIN it to a local [Employees] (note the 's') table. I am trying to generate a list of (non-terminated) employees that are not yet added to my local [Employees] table. As in:
SELECT Employee.Last_Name, Employee.First_Name, Employee.Job_Title
FROM Employee IN "\\{full path}\Time Clock 1.0_be.accdb"
LEFT JOIN Employees
ON Employee.Last_Name = Employees.LastName
AND Employee.First_Name = Employees.FirstName
WHERE Employees.FirstName IS NULL
AND Employee.Termination_Date = ""
ORDER BY Employee.Last_Name, Employee.First_Name;
However, the above SQL doesn't work. Access gives me the ever-so-unhelpful "Syntax error in FROM clause" to brighten my neurotic insanity.
Does the IN clause have to go last, and does it affect both tables? At:
https://msdn.microsoft.com/en-us/library/bb177907(v=office.12).aspx
they say it can be combined with a LEFT JOIN but they don't specify if both tables must be external.
Can you even LEFT JOIN a table from an external DB to a local table? I don't really want to link the table formally, as this query will only run occasionally and I don't want any more traffic pinging the Time Clock back-end DB than I have to. It's slow enough as it is.

In answer to my original question:
@cha was right to suggest I use nested queries. This solves the problem of an internal table being joined to an external table.
@Gord Thompson had a much more specific way of referencing an external DB that seems clearer to me than the IN clause in this simple case. Programmers may want to use the IN clause when connecting to different types of external databases, as it lets you specify the database type there as well.
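The nested-query idea can be tried outside Access. The following is a minimal sketch in Python with SQLite, not Access SQL: `ATTACH DATABASE` stands in for the external reference, the `ext` alias and all sample rows are made up for illustration, and the termination check is written as `IS NULL` on the assumption that a blank date is stored as NULL.

```python
import sqlite3

# SQLite stand-in for the Access setup: "main" plays the local database,
# "ext" plays the external Time Clock back end (both in memory here).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS ext")

conn.execute(
    "CREATE TABLE ext.Employee ("
    "Last_Name TEXT, First_Name TEXT, Job_Title TEXT, Termination_Date TEXT)"
)
conn.execute("CREATE TABLE Employees (LastName TEXT, FirstName TEXT)")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO ext.Employee VALUES (?, ?, ?, ?)",
    [
        ("Adams", "Ann", "Clerk", None),
        ("Brown", "Bob", "Driver", None),
        ("Clark", "Cy", "Cook", "2016-01-31"),  # terminated
    ],
)
conn.execute("INSERT INTO Employees VALUES ('Adams', 'Ann')")

# The external table is wrapped in a nested query, then left-joined to the
# local table; rows with no local match are the employees not yet added.
missing = conn.execute(
    """
    SELECT x.Last_Name, x.First_Name, x.Job_Title
    FROM (SELECT * FROM ext.Employee WHERE Termination_Date IS NULL) AS x
    LEFT JOIN Employees AS e
      ON x.Last_Name = e.LastName AND x.First_Name = e.FirstName
    WHERE e.FirstName IS NULL
    ORDER BY x.Last_Name, x.First_Name
    """
).fetchall()
print(missing)
```

Filtering the external rows inside the nested query keeps the outer query a plain LEFT JOIN ... IS NULL anti-join.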
In the end none of this helped me because the train-wreck-of-a-database I lovingly caress uses multi-value fields and Access will not link an internal table with multi-valued fields to an external table.
Those who come after you (and probably you yourself) will thank you for observing first, second, and third normal forms except in the most unusual and carefully considered cases, and for never ever ever ever using multi-valued fields instead of linking tables for many-to-many relationships.
Aloha!

Related

When to use three-part column references in SQL 2014

Firstly, apologies if this is in the wrong section, or the wrong style. Hunted for this answer for a while, to no avail.
Imagine you have a (sample) SQL query in SQL 2014 -
SELECT
dbo.Users.Surname,
dbo.Accounts.Type
FROM
dbo.Users
INNER JOIN
dbo.Accounts
ON (dbo.Users.Id = dbo.Accounts.Id)
Up until now, this is the format I've been using - fully qualifying the table objects with [schema].[tablename].[column].
However, looking at the SQL 2014 Deprecated Database Engine Features, it says that this style is no longer standard -
Two-part names is the standard-compliant behavior.
After digging around for a while, I found the Transact-SQL Syntax Conventions, where it says -
To avoid name resolution errors, we recommend specifying the schema name whenever you specify a schema-scoped object.
So I'm a little confused as to how my little code snippet should be written. Should I only use the schema when referencing the tables, but when referring to columns, skip the schema and just use the table names? Or is it assuming all table objects should have an alias?
Again, apologies for the potential subjectivity of this question. But essentially I'm asking about how to write SQL that does not use a deprecated feature of SQL 2014, but still reads well when joining multiple tables.
It says that the deprecated feature applies to referencing columns, not tables.
To clarify imagine two statements:
SELECT dbo.Orders.ID FROM dbo.Orders
and
SELECT Orders.ID FROM dbo.Orders
The first is deprecated; the second is not.
To avoid name resolution errors, we recommend specifying the schema
name whenever you specify a schema-scoped object.
This recommendation is about the user's default schema. Say a user has the default schema 'Person', and two tables with the same name, 'dbo.Persons' and 'Person.Persons', exist in the database. If that user executes:
SELECT * FROM Persons
they will get results from the table in the Person schema, even if they wanted data from dbo.
So the actual answer is:
Use
SELECT Orders.ID FROM dbo.Orders

left join using comma separated column using sql

I am working on an asp.net application with SQL server database. This db has two tables Vacancies and dutystations. Vacancies table has a column named dutystationId which stores ids of dutystations in comma separated list like this:
2,12,15,18,19,23
Now I want to show this vacancy in grid and I have used left join like this:
QUERY
SELECT * FROM dbo.hr_Vacancies
CROSS APPLY dbo.hr_Split(dbo.hr_Vacancies.DutyStationID, ',') AS s
LEFT OUTER JOIN dbo.hr_DutyStations
ON s.Data = dbo.hr_DutyStations.DutyStationID
In the XSD, I have set vacancyid as the primary key, but I get this error:
ERROR
Failed to enable constraints. One or more rows contain values violating non-null, unique, or foreign-key constraints.
If I remove this constraint, I get 6 rows. I want to show one row only. How can I do this?
I stopped reading here:
Vacancies table has a column named dutystationId which stores ids of dutystations in comma separated list
That is your problem right there. If you have comma-separated values in an RDBMS, especially if they contain foreign keys to other tables, you should stop whatever you're doing and start redesigning your database. Many-to-many relations in an RDBMS are implemented with junction tables, and if you use them, all your problems will suddenly solve themselves.
Your current design is not only hell to write SQL queries for, as this question illustrates perfectly since you cannot solve a trivial task, but it also kills performance: those calls to hr_Split are far more computationally expensive than just doing proper joins.
Don't fall into the XY trap; solve the real problem first, which is that you're violating First Normal Form right now.
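As a rough illustration of the junction-table design, here is a minimal sketch in Python with SQLite rather than SQL Server. The table names, column names and station names are hypothetical, and `group_concat` is SQLite's aggregate, used only to show that one row per vacancy falls out naturally once the list lives in its own table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE vacancies (vacancy_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE duty_stations (duty_station_id INTEGER PRIMARY KEY, name TEXT);
    -- Junction table: one row per (vacancy, duty station) pair.
    CREATE TABLE vacancy_duty_stations (
        vacancy_id INTEGER NOT NULL REFERENCES vacancies(vacancy_id),
        duty_station_id INTEGER NOT NULL
            REFERENCES duty_stations(duty_station_id),
        PRIMARY KEY (vacancy_id, duty_station_id)
    );
    INSERT INTO vacancies VALUES (1, 'Analyst');
    INSERT INTO duty_stations VALUES (2, 'Geneva'), (12, 'Nairobi'), (15, 'Vienna');
    INSERT INTO vacancy_duty_stations VALUES (1, 2), (1, 12), (1, 15);
    """
)

# Plain joins replace the split function; GROUP BY collapses the stations
# back into a single row per vacancy for display in a grid.
rows = conn.execute(
    """
    SELECT v.vacancy_id, v.title, group_concat(d.name, ', ') AS stations
    FROM vacancies AS v
    LEFT JOIN vacancy_duty_stations AS vd ON vd.vacancy_id = v.vacancy_id
    LEFT JOIN duty_stations AS d ON d.duty_station_id = vd.duty_station_id
    GROUP BY v.vacancy_id, v.title
    """
).fetchall()
print(rows)
```

With this layout vacancy_id stays unique in the result set, so a primary-key constraint on it in the dataset would no longer be violated.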

How to best explain on what fields should a user join on?

I need to explain to somebody how they can determine what fields from multiple tables/views they should join on. Any suggestions? I know how to do it but am having difficulty trying to explain it.
One of the issues they have is they will take two fields from two tables that are the same (zip code) and join on those, when in reality they should be joining on ID columns. When they choose the wrong column to join on it increases records they receive in return.
Should I work in PK and FK somewhere?
While it is indeed typical to join a PK to an FK, any conversation about JOIN clauses that revolves only around PKs and FKs is fairly limited.
For example I had this FROM clause in a recent SQL answer I gave
FROM
YourTable firstNames
LEFT JOIN YourTable lastNames
ON firstnames.Name = lastNames.Name
AND lastNames.NameType =2
and firstnames.FrequencyPercent < lastNames.FrequencyPercent
The table referenced on each side of the join is the same table (a self join), and the join includes three conditions, one of which is an inequality. Furthermore, there would never be an FK here because it's joining on a field that is, by design, not a candidate key.
Also, you don't even have to join one table to another. You can join inline queries to each other, which of course can't possibly have a key.
So in order to properly understand JOIN you just need to understand that it combines the records from two relations (tables, views, inline queries) where some conditions evaluate to true. This means you need to understand boolean logic and the database and the data in the database.
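One way to make that definition concrete is to compare a join with a hand-written "every pair of rows, kept only when the condition is true" loop. This is a small sketch in Python with SQLite; tables `a` and `b` and their values are invented for the demonstration.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (id INTEGER, score INTEGER);
    CREATE TABLE b (a_id INTEGER, score INTEGER);
    INSERT INTO a VALUES (1, 10), (2, 20);
    INSERT INTO b VALUES (1, 15), (1, 5), (2, 30);
    """
)

# A join with two conditions, one of them an inequality.
joined = conn.execute(
    """
    SELECT a.id, a.score, b.score
    FROM a JOIN b ON a.id = b.a_id AND a.score < b.score
    ORDER BY a.id, b.score
    """
).fetchall()

# The same result by hand: pair every row of a with every row of b,
# then keep only the pairs for which the boolean condition holds.
a_rows = conn.execute("SELECT id, score FROM a").fetchall()
b_rows = conn.execute("SELECT a_id, score FROM b").fetchall()
by_hand = sorted(
    (a_id, a_score, b_score)
    for (a_id, a_score), (b_a_id, b_score) in product(a_rows, b_rows)
    if a_id == b_a_id and a_score < b_score
)
print(joined == by_hand)
```

No key appears anywhere in this example; the join is defined entirely by the condition.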
If your user is having a problem with a specific JOIN ask them to SELECT some rows from one table and also the other and then ask them under what conditions would you want to combine the rows.
You don't need to talk in terms of a primary key of a table but you should point to it and explain that it uniquely identifies a given row and that you must join to related tables using it or you could get duplicated results.
Give them examples of joining with it and joining without it.
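A possible pair of examples, sketched in Python with SQLite. The `customers` and `orders` tables and their contents are hypothetical; the point is only to compare row counts when joining on the key versus on a shared zip code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, zip TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, ship_zip TEXT);
    INSERT INTO customers VALUES (1, 'Ann', '96813'), (2, 'Bob', '96813'), (3, 'Cy', '10001');
    INSERT INTO orders VALUES (100, 1, '96813'), (101, 2, '96813'), (102, 3, '10001');
    """
)

# Joining on the key: each order matches exactly one customer.
by_key = conn.execute(
    """
    SELECT o.order_id, c.name
    FROM orders AS o JOIN customers AS c ON c.customer_id = o.customer_id
    """
).fetchall()

# Joining on zip: each order matches every customer in that zip code,
# so orders shipped to a shared zip are duplicated.
by_zip = conn.execute(
    """
    SELECT o.order_id, c.name
    FROM orders AS o JOIN customers AS c ON c.zip = o.ship_zip
    """
).fetchall()

print(len(by_key), len(by_zip))
```

Seeing three orders turn into five rows tends to make the "extra records" problem easier to explain than any definition of a key.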
An ER diagram showing all of the tables they use and their key relationships would help ensure that they always use the correct keys.
It sounds to me like neither you nor the person you are trying to help understands how this particular database is constructed, and perhaps neither of you really understands basic database fundamentals like PKs and FKs. Most often a PK from one table is joined to an FK in another table.
Assuming the database has the proper PK's and FK's in place, it would probably help a great deal to generate an ER diagram. That would make the joining concept much easier to grasp.
Another approach you could take is to find someone who does understand these things and create some views for this person to use. This way he doesn't need to understand how to join the tables together.
A user shouldn't typically be doing joins. A user should have an interface that lets them get the data that they need in the way that they need it. If you don't have the developer resources to do that then you're going to be stuck with this problem of having to teach a user technical details. You also need to be very careful about what kind of damage the user can do. Do they have update rights on the data? I hope they don't accidentally do a DELETE FROM Table with no WHERE clause. Even if you restrict their permissions, a poorly written query can crush the database server or block resources causing problems for other users (and more work for you).
If you have no choice, then I think that you need to certainly teach them about primary and foreign keys, even if you don't call them that. Point out that the id on your table (or whatever your PK is) identifies a row. Then explain how the id appears in other tables to show the relationship. For example, "See, in the address table we have a person_id which tells us who that address belongs to."
After that, expect to spend a large portion of your time with that user as they make mistakes or come up with other things that they want to get from the database, but which they can't figure out how to get.
From theory, and ideally, you should define primary keys on all tables, and join tables using a primary key to the matching field or fields (foreign key) in the other table.
Even if the join fields are not defined as primary keys, you need to make sure that they uniquely identify the records in the table and that they are properly indexed.
For example, let's say the 'person' table has a SSN and a driver's license field. The SSN could be considered and flagged as the 'primary key', but if you join that table to a 'drivers' table which might not have the SSN, but does have the driver's license #, you could join them by the driver's license field (even if it's not flagged as primary key), but you need to make sure that the field is properly indexed in both tables.
...explain to somebody how they can determine what fields from multiple tables/views they should join on.
Simply put, look for the columns with values that match between the tables/views. Preferably, match exactly but some massaging might be necessary.
The existence of foreign key constraints would help to know what matches to what, but the constraint might not be directly to the table/view that is to be joined.
The existence of a primary key doesn't mean it is the criteria that is necessary for the query, so I would overlook this detail (depending on the audience).
I would recommend attacking the desired result set by starting with the columns desired, and working back from there. If there's more than one table's columns in the result set, focus first on the table whose columns should be returning distinct results, and then gradually add joins, checking the result set after each JOIN addition to confirm the results are still the same. If they are not, review the JOIN, or consider whether a JOIN is actually necessary versus IN or EXISTS.
I did this when I first started out, it comes from thinking of joins as just linking tables together, so I linked at all possible points.
Once you think of joins as a way to combine AND filter the data it becomes easier to understand them.
Writing out your request as a sentence is helpful too, "I want to see all the times Table A interacted with Table B". Then build a query from that using only the ID, noting that if you wanted to know "All the times Table A was in the same zip code as Table B" then you would join by zip code.

Is eager loading same as join fetch?

Is eager fetch same as join fetch?
I mean whether eagerly fetching a has-many relation fires 2 queries or a single join query?
How does Rails Active Record implement a join fetch of associations when it doesn't know the table's metadata up front (I mean the columns in the table)? Say, for example, I have
people - id, name
things - id, person_id, name
A person has a one-to-many relation with things. So how does it generate the query with all the column aliases, even though it cannot know them, when I do a join fetch on people?
An answer hasn't been accepted so I will try to answer your questions as I understand them:
"how does it know all the fields available in a table?"
It does a SQL query for every class that inherits from ActiveRecord::Base. If the class is 'Dog', it will do a query to find the column names of the table 'dogs'. In production mode it should only do this query once per run of the server -- in development mode it does it a lot. The query will differ depending on the database you use, and it is usually an expensive query.
"Say if i have a same name for column in a table and in an associated table how does it resolve this?"
If you are doing a join, it generates SQL using the table names as prefixes to avoid ambiguities. In fact, if you are doing a join in Rails and want to add a condition (using custom SQL) for name, but both the main table and the joined table have a name column, you need to specify the table name in your SQL. (e.g. Human.joins(:pets).where("humans.name = 'John'"))
"I mean whether eagerly fetching a has-many relation fires 2 queries or a single join query?"
Different Rails versions are different. I think that early versions did a single join query at all times. Later versions would sometimes do multiple queries and sometimes a single join query, based on the realization that a single join query isn't always as performant as multiple queries. I'm not sure of the exact logic that it uses to decide. Recently, in Rails 3, I am seeing multiple queries happening in my current codebase -- but maybe it sometimes does a join as well, I'm not sure.
It knows the columns through a type of reflection. Ruby is very flexible and allows you to build functionality that will be used/defined during runtime and doesn't need to be stated ahead of time. It learns the associated "person_id" column by interpreting the "belongs_to :person" and knowing that "person_id" is the field that would be associated and the table would be called "people".
If you do Person.includes(:things), it will generate 2 queries: one that gets the people and a second that gets the things related to those people.
http://guides.rubyonrails.org/active_record_querying.html
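The two-query pattern described in these answers can be imitated by hand. The sketch below uses Python with SQLite and is not Rails' actual implementation; it only mirrors the shape of the queries (one for the parents, one `IN (...)` query for the children) using the people/things tables from the question, with invented rows.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE things (id INTEGER PRIMARY KEY, person_id INTEGER, name TEXT);
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO things VALUES (1, 1, 'hat'), (2, 1, 'pen'), (3, 2, 'cup');
    """
)

# Query 1: load the parent rows.
people = conn.execute("SELECT id, name FROM people ORDER BY id").fetchall()

# Query 2: load all children of those parents in one go.
ids = [person_id for person_id, _ in people]
placeholders = ", ".join("?" for _ in ids)
things = conn.execute(
    f"SELECT person_id, name FROM things "
    f"WHERE person_id IN ({placeholders}) ORDER BY id",
    ids,
).fetchall()

# Stitch the two result sets together in application code.
things_by_person = defaultdict(list)
for person_id, thing_name in things:
    things_by_person[person_id].append(thing_name)
loaded = {name: things_by_person[person_id] for person_id, name in people}
print(loaded)
```

Compared with a single join, this avoids repeating each person's columns once per thing, at the cost of a second round trip.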

SQL Modeling / Query Question

I currently have this database structure:
One entry can have multiple items of the type "file", "text" and "url".
Everyone of these items has exactly one corresponding item in either the texts, urls or files table - where data is stored.
I need a query to efficiently select an entry with all its corresponding items and their data.
So my first approach was something like
SELECT * FROM entries LEFT JOIN entries_items LEFT JOIN texts LEFT JOIN urls LEFT JOIN files
and then loop through it and do the post processing in my application.
But the thing is that it's very unlikely that multiple items of different types exist. It's even rare for more than one item to exist per entry, and in most cases it will be a file. But I need it anyway...
So, to avoid scanning all 3 tables for every item, I thought I could do something like a case/switch and scan the corresponding table based on the value of "type" in entries_items.
But I couldn't get it working.
I also thought about doing the case/switch logic in the application, but then I would have multiple queries, which would probably be slower as the MySQL server will be external.
I can also change the structure if you have a better approach!
I also thought about having all the fields of "texts", "urls" and "files" inside the table entries_items, as it's only a 1:1 relation, and just leaving everything that is not needed null.
What would be the pros/cons of that? I think it needs more storage space and I can't apply my constraints as I have them now. Everything would also need to be nullable...
Well I am open to all sorts of ideas. The application is not written yet, so I can basically change whatever I like.
You have three different entity types (URL, TEXT, FILE) being linked to the primary ENTRIES table via the intermediary table ENTRIES_ITEMS, and you are violating normal form with this "conditional join" approach. Given your structure, it is impossible to declare a foreign key constraint on ENTRIES_ITEMS.id because the id column could reference the URLS, the TEXTS, or the FILES table. To normalize the ENTRIES_ITEMS table you would have to add three separate fields, urlid, textid, and fileid and allow them to be nullable, and then you could join each of the three entities tables to the ENTRIES table via your linking table. The approach you are taking is very commonly found in legacy databases that were not SQL92-compliant, where the values were grabbed from the entities tables programmatically/procedurally rather than declaratively using SQL selects.
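The three-nullable-columns layout suggested here could look roughly like the sketch below, written in Python with SQLite rather than MySQL. The column names urlid, textid and fileid come from the answer; the other columns, the sample row, and the CHECK that exactly one of the three is set are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript(
    """
    CREATE TABLE entries (id INTEGER PRIMARY KEY);
    CREATE TABLE urls  (id INTEGER PRIMARY KEY, url  TEXT);
    CREATE TABLE texts (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT);
    CREATE TABLE entries_items (
        id       INTEGER PRIMARY KEY,
        entry_id INTEGER NOT NULL REFERENCES entries(id),
        urlid    INTEGER REFERENCES urls(id),
        textid   INTEGER REFERENCES texts(id),
        fileid   INTEGER REFERENCES files(id),
        -- Assumption: each item points at exactly one of the three tables.
        CHECK ((urlid IS NOT NULL) + (textid IS NOT NULL) + (fileid IS NOT NULL) = 1)
    );
    INSERT INTO entries VALUES (1);
    INSERT INTO files VALUES (7, 'report.pdf');
    INSERT INTO entries_items (entry_id, fileid) VALUES (1, 7);
    """
)

# Each column now has a real foreign key, so a dangling reference is refused.
try:
    conn.execute("INSERT INTO entries_items (entry_id, fileid) VALUES (1, 999)")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

# One declarative query fetches an entry with whatever data its items carry.
row = conn.execute(
    """
    SELECT e.id, f.path, u.url, t.body
    FROM entries AS e
    JOIN entries_items AS i ON i.entry_id = e.id
    LEFT JOIN urls  AS u ON u.id = i.urlid
    LEFT JOIN texts AS t ON t.id = i.textid
    LEFT JOIN files AS f ON f.id = i.fileid
    """
).fetchone()
print(dangling_rejected, row)
```

The type column becomes redundant in this layout, since the non-null reference already says which table holds the data.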
I would first consider adding a column to your "entries_items" table that contains an XML representation of texts, urls, and files. I can't speak for MySQL, but SQL Server has fantastic facilities for handling XML. I bet MySQL does too.
If not a state-of-the-art technique like that, then I would consider going retro and just having one items table with many nulls, as you already considered.
This may get you started, but it will not resolve the hierarchical structure (parent_id) of entries and entries_items.
select *
from entries as e
join entries_items as i on i.entry_id = e.id
left join texts as t on t.item_id = i.id and i.type = 'text'
left join urls as u on u.item_id = i.id and i.type = 'url'
left join files as f on f.file_id = i.id and i.type = 'file'
;
If considering the model cleanup, this may be a starting point.