SAS SQL join criteria - sql

I am working on a project where I have inherited an SQL Join that uses join
criteria in a format I have not seen before. The basic format of the join
is this:
Proc Sql;
create table mytest as
select t1.var1,
t1.var2,
t1.var3
from mysource1 t1
left join mysource2 t2 on
(t1.var1 = t2.var1), myparam t3;
quit;
The bit I am confused about is why myparam is included as a join
condition within the ON statement of the LEFT JOIN. The contents of
'myparam' are derived from the SAS Parameter File we have defined on our
system and contains just one row, with two columns. One contains month
start date, the other month end date.
None of the columns in this parameter file are in the other two source
tables and none of the columns in the parameter file appear in the final
output (they aren't referenced in the SELECT statement, so they can't be).
I'm guessing that including the 'myparam' dataset in this context is
somehow using the date values within it to cut the data in mysource1 and
mysource2, but could someone please provide confirmation that this is the
case and the exact mechanism at work please?
Thanks

This is an unusual construction for a join in SAS, but it's basically a Cartesian product. The myparam table isn't part of the LEFT JOIN condition; the comma introduces a new table and starts a new join. Any table added with a comma and no join condition is joined so that every row from one side is matched to every row on the other. This can be dangerous when two large tables are involved (the row counts are multiplied), but in your case the myparam table has one row, so the result is only 1 x n rows.
However, having said all that, the query you have come across doesn't use any values from myparam (or mysource2 for that matter), so I don't see why these tables are being joined at all. Provided var1 is unique in mysource2 (otherwise the left join would duplicate rows from mysource1) and myparam always holds exactly one row, I'm fairly certain the following query would return the same rows:
proc sql;
select var1,var2,var3
from mysource1;
quit;
I'm aware this answer might come across as incomplete, so please feel free to comment...
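To make the comma join concrete, here is a minimal sketch that runs the inherited query against an in-memory SQLite database from Python rather than SAS. The sample rows and the myparam column names (month_start, month_end) are assumptions made up for illustration; only the table names and var1..var3 come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mysource1 (var1 INTEGER, var2 TEXT, var3 TEXT);
    CREATE TABLE mysource2 (var1 INTEGER);
    -- Hypothetical column names; the question only says there are two date columns.
    CREATE TABLE myparam (month_start TEXT, month_end TEXT);
    INSERT INTO mysource1 VALUES (1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z');
    INSERT INTO mysource2 VALUES (1), (2);
    INSERT INTO myparam VALUES ('2024-01-01', '2024-01-31');
""")

# The inherited shape: a LEFT JOIN followed by a comma (cross) join.
inherited = """
    SELECT t1.var1, t1.var2, t1.var3
    FROM mysource1 t1
    LEFT JOIN mysource2 t2 ON (t1.var1 = t2.var1), myparam t3
    ORDER BY t1.var1
"""
simple = "SELECT var1, var2, var3 FROM mysource1 ORDER BY var1"

rows_inherited = conn.execute(inherited).fetchall()
rows_simple = conn.execute(simple).fetchall()
same_with_one_param_row = rows_inherited == rows_simple

# With an empty parameter table the cross join has nothing to pair with.
conn.execute("DELETE FROM myparam")
rows_without_param = conn.execute(inherited).fetchall()
```

With one row in myparam the two queries return the same three rows. Once myparam is emptied, the inherited query returns nothing, which is the one visible side effect of the otherwise unused cross join.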

Related

Compare a column in one table against a whole other table?

I know that SQL joins exist but that is only for one column against another column in another table. Is there any way to do something similar with one column against a whole table? I'm trying to figure out if the people that exist within one organization are a certain kind of employee. The problem is I have all the people in an organization listed within a column in one table while the classification for people is scattered throughout various columns in another table.
While I will answer this, I recommend you do one of those schoolkids tutorials on SQL. This question is at such a basic level you'll probably just get confused by the answers anyway...
From your question I would gather that the tables are probably modeled incorrectly to start with (not normalized well enough). But if you want to join a column to all the columns in another table you can do it in a few ways:
1) Simplest: join once per column and combine the results with UNION ALL:
SELECT T1.COLUMN_1 FROM TABLE_1 T1 INNER JOIN TABLE_2 T2 ON T1.COLUMN_1 = T2.COLUMN_1
UNION ALL
SELECT T1.COLUMN_1 FROM TABLE_1 T1 INNER JOIN TABLE_2 T2 ON T1.COLUMN_1 = T2.COLUMN_2
UNION ALL
... (just change the column name on each row)
(works best if you copy/paste this into Excel with a macro and a list of column names from table 2).
2) More complex: create a view or subquery where you first union all columns in table 2 one by one (hopefully they all have the same type!) and then join to the subquery, which now acts as a table with just one column.
3) Start pivoting table 2. Not going into that one, too complex for your current level.
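Option 2 could look roughly like the sketch below, run here through Python's sqlite3 module. The three-column shape of TABLE_2 and the sample names are assumptions for illustration; the idea is only that the subquery stacks every column of table 2 into one column before the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (column_1 TEXT);
    -- Hypothetical layout: classifications scattered over three columns.
    CREATE TABLE table_2 (column_1 TEXT, column_2 TEXT, column_3 TEXT);
    INSERT INTO table_1 VALUES ('alice'), ('bob'), ('carol');
    INSERT INTO table_2 VALUES ('alice', 'dave', 'erin'), ('frank', 'carol', NULL);
""")

# The subquery unions all columns of table_2 into a single column named val,
# so the outer join only has to compare one column against one column.
query = """
    SELECT DISTINCT t1.column_1
    FROM table_1 t1
    INNER JOIN (
        SELECT column_1 AS val FROM table_2
        UNION ALL
        SELECT column_2 FROM table_2
        UNION ALL
        SELECT column_3 FROM table_2
    ) AS all_values ON t1.column_1 = all_values.val
    ORDER BY t1.column_1
"""
matches = [row[0] for row in conn.execute(query)]
```

Here matches contains 'alice' and 'carol', the two people from table 1 who appear somewhere in table 2. DISTINCT is there because a person listed in more than one column would otherwise come back once per hit.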
Yes, you can. Basically, if you have one table and want to filter certain names out of it, you can use joins.
As for the second part: yes, you can add multiple conditions with a WHERE clause.
And if you want to join on multiple conditions, you can combine them with AND.

Change Value in Access VBA

I have a table full of code IDs and their descriptions in Access. In another table is a field that has code IDs that correlate to the IDs in the Codes table. I am trying to design a macro that, when executed, will replace the code ID in the second table with the correct description, but I am unsure of a way to do this. I was thinking of using a SQL Insert query to do so but am unsure of what the statement would look like.
JOIN statement:
SELECT ShouldImportMetricsIDsTable.FORMULARYID, ReasonCodes.Description
FROM ShouldImportMetricsIDsTable,ReasonCodes
INNER JOIN ReasonCodes
ON ShouldImportMetricsIDsTable.ReasonCode=ReasonCodes.CodeID
Mention ReasonCodes only once in your query's FROM clause.
Change this ...
FROM ShouldImportMetricsIDsTable,ReasonCodes
INNER JOIN ReasonCodes
To this ...
FROM ShouldImportMetricsIDsTable
INNER JOIN ReasonCodes
As general advice, I suggest you begin your queries in the Access query designer. At least choose the data sources (tables or saved queries) and set up joins there.
With your original example, the designer would have applied an alias, ReasonCodes_1, for one of those duplicate ReasonCodes names. And that could be an early warning that the data sources aren't correct.
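For reference, the corrected query can be tried outside Access with the sketch below, which uses an in-memory SQLite database from Python. The column types and the sample rows are assumptions; the table and column names are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sample data is made up for illustration.
    CREATE TABLE ReasonCodes (CodeID INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE ShouldImportMetricsIDsTable (FORMULARYID INTEGER, ReasonCode INTEGER);
    INSERT INTO ReasonCodes VALUES (1, 'Duplicate'), (2, 'Expired');
    INSERT INTO ShouldImportMetricsIDsTable VALUES (100, 2), (101, 1);
""")

# ReasonCodes appears exactly once in the FROM clause.
corrected = """
    SELECT ShouldImportMetricsIDsTable.FORMULARYID, ReasonCodes.Description
    FROM ShouldImportMetricsIDsTable
    INNER JOIN ReasonCodes
        ON ShouldImportMetricsIDsTable.ReasonCode = ReasonCodes.CodeID
    ORDER BY ShouldImportMetricsIDsTable.FORMULARYID
"""
descriptions = conn.execute(corrected).fetchall()
```

Each FORMULARYID comes back paired with the description for its code, which is the lookup the question was after.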

When is it required to give a table name an alias in SQL?

I noticed when doing a query with multiple JOINs that my query didn't work unless I gave one of the table names an alias.
Here's a simple example to explain the point:
This doesn't work:
SELECT subject
from items
join purchases on items.folder_id=purchases.item_id
join purchases on items.date=purchases.purchase_date
group by folder_id
This does:
SELECT subject
from items
join purchases on items.folder_id=purchases.item_id
join purchases as p on items.date=p.purchase_date
group by folder_id
Can someone explain this?
You are using the same table Purchases twice in the query. You need to differentiate them by giving a different name.
You need to give an alias:
When the same table name is referenced multiple times
Imagine two people who both have the exact same name, John Doe. If you call John, both will respond. You can't give the same name to two people and assume they will know which one you are calling. Similarly, when two result sets have exactly the same name, SQL cannot identify which one to take values from. You need to give them different names so the SQL engine doesn't get confused.
Script 1: t1 and t2 are the alias names here
SELECT t1.col2
FROM table1 t1
INNER JOIN table1 t2
ON t1.col1 = t2.col1
When there is a derived table/sub query output
If a person doesn't have a name, you have nothing to call them by, so they won't respond to you. Similarly, when you generate a derived table or subquery output, it is unknown to the SQL engine, which won't know what to call it. So you need to give the derived output a name so that the SQL engine can deal with it appropriately.
Script 2: t1 is the alias name here.
SELECT col1
FROM
(
SELECT col1
FROM table1
) t1
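Both scripts can be run as-is. The sketch below does so against an in-memory SQLite database from Python; the two sample rows in table1 are an assumption, and the ORDER BY clauses are added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 INTEGER, col2 TEXT);
    INSERT INTO table1 VALUES (1, 'one'), (2, 'two');
""")

# Script 1: the same table twice, told apart by the aliases t1 and t2.
self_join = """
    SELECT t1.col2
    FROM table1 t1
    INNER JOIN table1 t2 ON t1.col1 = t2.col1
    ORDER BY t1.col1
"""

# Script 2: a derived table, named t1 so the outer query can refer to it.
derived = """
    SELECT t1.col1
    FROM (SELECT col1 FROM table1) t1
    ORDER BY t1.col1
"""

self_join_rows = [row[0] for row in conn.execute(self_join)]
derived_rows = [row[0] for row in conn.execute(derived)]
```

In the first query every column reference is qualified with t1 or t2, so there is never a question of which copy of table1 is meant.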
The only times it is REQUIRED to provide an alias are when you reference the same table multiple times and when you have derived outputs (sub-queries acting as tables) (thanks for catching that, Siva). This removes any ambiguity about which table reference is meant in the rest of your query.
To elaborate further, in your example:
SELECT subject
from items
join purchases on items.folder_id=purchases.item_id
join purchases on items.date=purchases.purchase_date
group by folder_id
My assumption is that you expect each join and its corresponding ON to use the table named in that join. However, an ON clause can use whichever table reference you want. So when you say on items.date=purchases.purchase_date, the SQL engine cannot tell whether you mean the first purchases table or the second one.
By adding the alias, you get rid of the ambiguity by being more explicit. The SQL engine can now say with 100% certainty which instance of purchases you want to use. If it has to guess between two equal choices, it will always throw an error asking you to be more explicit.
It is required to give them a name when the same table is used twice in a query. In your case, the query wouldn't know what table to choose purchases.purchase_date from.
In this case it's simply that you've specified purchases twice and the SQL engine needs to be able to refer to each dataset in the join in a unique way, hence the alias is needed.
As a side point, do you really need to join into purchases twice? Would this not work:
SELECT
subject
from
items
join purchases
on items.folder_id=purchases.item_id
and items.date=purchases.purchase_date
group by folder_id
Aliases are necessary to disambiguate the table from which to get a column.
So, if the column's name is unique in the list of all possible columns available in the tables in the from list, then you can use the column name directly.
If the column's name is repeated in several of the tables available in the from list, then the DB server has no way to guess which is the right table to get the column from.
In your sample query all the column names are duplicated because you're getting "two instances" of the same table (purchases), so the server needs to know which instance to take the column from. So you must specify it.
In fact, I'd recommend that you always use an alias, unless there's a single table. This way you'll avoid lots of problems and make the query much clearer.
You can't use the same table name twice in the same query UNLESS it is aliased as something else to prevent an ambiguous join condition. That's why it's not allowed. I should note, it's also better to always qualify columns as table.field or alias.field so other developers behind you don't have to guess which columns are coming from which tables.
When writing a query, YOU know what you are working with, but what about the person behind you in development? If someone is not used to which columns come from which table, the query can be ambiguous to follow, especially out here at S/O. By always qualifying with the table reference and field, or alias reference and field, it's much easier to follow.
select
SomeField,
AnotherField
from
OneOfMyTables
Join SecondTable
on SomeID = SecondID
compare that to
select
T1.SomeField,
T2.AnotherField
from
OneOfMyTables T1
JOIN SecondTable T2
on T1.SomeID = T2.SecondID
In these two scenarios, which would you prefer reading... Notice, I've simplified the query using shorter aliases "T1" and "T2", but they could be anything, even an acronym or abbreviated alias of the table names... "oomt" (one of my tables) and "st" (second table). Or, as something super long as has been in other posts...
Select * from ContractPurchaseOffice_AgencyLookupTable
vs
Select * from ContractPurchaseOffice_AgencyLookupTable AgencyLkup
If you had to keep qualifying joins, or field columns, which would you prefer looking at?
Hope this clarifies your question.

SQL Server join and wildcards

I want to get the results of a left join between two tables, with both having a column of the same name, the column on which I join. The following query is seen as valid by the import/export wizard in SQL Server, but it always gives an error. I have some more conditions, so the size wouldn't be too much. We're using SQL Server 2000 iirc and since we're using an externally developed program to interact with the database (except for some information we can't retrieve that way), we can not simply change the column name.
SELECT table1.*, table2.*
FROM table1
LEFT JOIN table2 ON table1.samename = table2.samename
At least, I think the column name is the problem, or am I doing something else wrong?
Do more columns than just your join key have the same name? If only your join key has the same name then simply select one of them since the values will be equivalent except for the non-matching rows (where table2's value will be NULL). You will have to enumerate all your other columns from one of the tables though.
SELECT table1.*,table2.othercolumns
FROM table1
LEFT JOIN table2 ON table1.samename = table2.samename
You may need to explicitly list the columns from one of the tables (the one with fewer fields) and leave out the second instance of what would be the duplicate field.
select Table1.*, {skip the field Table2.sameName} Table2.fld2, Table2.Fld3, Table2.Fld4... from
Since it's a common column, it APPEARS it's trying to create it twice in the result set, thus choking your process.
Since you should never use select *, simply replace it with the names of the columns you want. The join column has the same value (or null) on both sides of the join, so only select one of them: the one from table1, which will always have the value.
If you want to select all the columns from both tables just use Select * instead of including the tables separately. That will however leave you with duplicate column names in the result set, so even reading them out by name will not work and reading them by index will give inconsistent results, as changing the columns in the database will change the resultset, breaking any code depending on the ordinals of the columns.
Unfortunately the best solution is to specify exactly the columns you need and create aliases for the duplicates so they are unique.
I quickly get the column headings by setting the query to text mode and copying the top row ...
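As a rough sketch of the aliasing approach, the example below runs the join through Python's sqlite3 module rather than SQL Server 2000. The extra columns t1_value and t2_value and the sample rows are assumptions; the point is that each copy of samename gets its own name in the result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical extra columns so each table has something besides the key.
    CREATE TABLE table1 (samename INTEGER, t1_value TEXT);
    CREATE TABLE table2 (samename INTEGER, t2_value TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO table2 VALUES (1, 'x');
""")

# Explicit column list; the duplicated join column is aliased on each side.
cursor = conn.execute("""
    SELECT table1.samename AS t1_samename,
           table1.t1_value,
           table2.samename AS t2_samename,
           table2.t2_value
    FROM table1
    LEFT JOIN table2 ON table1.samename = table2.samename
    ORDER BY table1.samename
""")
column_names = [d[0] for d in cursor.description]
rows = cursor.fetchall()
```

Every name in column_names is distinct, so a consumer can read columns by name. The second row shows the unmatched case, where both table2 columns come back as NULL.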

When is a good situation to use a full outer join?

I'm always discouraged from using one, but is there a circumstance when it's the best approach?
It's rare, but I have a few cases where it's used. Typically in exception reports or ETL or other very peculiar situations where both sides have data you are trying to combine.
The alternative is to use an INNER JOIN, a LEFT JOIN (with right side IS NULL) and a RIGHT JOIN (with left side IS NULL) and do a UNION - sometimes this approach is better because you can customize each individual join more obviously (and add a derived column to indicate which side is found or whether it's found in both and which one is going to win).
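The UNION alternative might be sketched as follows, using an in-memory SQLite database from Python. The table names, columns and rows are made up for illustration. One deliberate deviation: the "right only" branch is written as a LEFT JOIN with the tables swapped instead of a RIGHT JOIN, since older SQLite versions do not support RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_side (id INTEGER, val TEXT);
    CREATE TABLE right_side (id INTEGER, val TEXT);
    INSERT INTO left_side VALUES (1, 'L1'), (2, 'L2');
    INSERT INTO right_side VALUES (2, 'R2'), (3, 'R3');
""")

# Three branches, each tagged with a derived found_in column.
query = """
    SELECT l.id AS id, l.val AS left_val, r.val AS right_val, 'both' AS found_in
    FROM left_side l INNER JOIN right_side r ON l.id = r.id
    UNION ALL
    SELECT l.id, l.val, NULL, 'left only'
    FROM left_side l LEFT JOIN right_side r ON l.id = r.id
    WHERE r.id IS NULL
    UNION ALL
    SELECT r.id, NULL, r.val, 'right only'
    FROM right_side r LEFT JOIN left_side l ON l.id = r.id
    WHERE l.id IS NULL
    ORDER BY id
"""
rows = conn.execute(query).fetchall()
```

The result has one row per id across both tables, with found_in saying where it came from. Because each branch is its own SELECT, it is easy to add branch-specific columns or filters.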
I noticed that the wikipedia page provides an example.
"For example, this allows us to see each employee who is in a department and each department that has an employee, but also see each employee who is not part of a department and each department which doesn't have an employee."
Note that I never encountered the need of a full outer join in practice...
I've used full outer joins when attempting to find mismatched, orphaned data, from both of my tables and wanted all of my result set, not just matches.
Just today I had to use Full Outer Join. It is handy in situations where you're comparing two tables. For example, the two tables I was comparing were from different systems so I wanted to get following information:
Table A has any rows that are not in Table B
Table B has any rows that are not in Table A
Duplicates in either Table A or Table B
For matching rows, whether values are different (Example: Table A and Table B both have Acct# 12345, LoanID abc123, but Interest Rate or Loan Amount is different)
In addition, I created an additional field in SELECT statement that uses a CASE statement to 'comment' why I am flagging this row. Example: Interest Rate does not match / The Acct doesn't exist in System A, etc.
Then saved it as a view. Now, I can use this view to either create a report and send it to users for data correction/entry or use it to pull specific population by 'comment' field I created using a CASE statement (example: all records with non-matching interest rates) in my stored procedure and automate correction, etc.
If you want to see an example, let me know.
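A comparison report of this kind could look roughly like the sketch below. It is not the poster's actual view: the table names system_a and system_b, the columns and the rows are all assumptions, and the full outer join is emulated with two LEFT JOIN branches so that it runs on SQLite versions without FULL OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables standing in for the two systems being compared.
    CREATE TABLE system_a (acct TEXT, interest_rate REAL);
    CREATE TABLE system_b (acct TEXT, interest_rate REAL);
    INSERT INTO system_a VALUES ('12345', 4.5), ('22222', 3.0), ('33333', 5.0);
    INSERT INTO system_b VALUES ('12345', 4.75), ('22222', 3.0), ('44444', 6.0);
""")

# The CASE expression produces the 'comment' explaining why a row is flagged.
query = """
    SELECT acct, flag_comment FROM (
        SELECT a.acct AS acct,
               CASE
                   WHEN b.acct IS NULL THEN 'Acct does not exist in System B'
                   WHEN a.interest_rate <> b.interest_rate THEN 'Interest rate does not match'
                   ELSE 'OK'
               END AS flag_comment
        FROM system_a a LEFT JOIN system_b b ON a.acct = b.acct
        UNION ALL
        SELECT b.acct, 'Acct does not exist in System A'
        FROM system_b b LEFT JOIN system_a a ON a.acct = b.acct
        WHERE a.acct IS NULL
    ) AS report
    WHERE flag_comment <> 'OK'
    ORDER BY acct
"""
flagged = conn.execute(query).fetchall()
```

Filtering on flag_comment then gives a specific population, for example only the accounts whose interest rates differ.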
The rare times I have used it have been for testing for NULLs on both sides of the join, in case I think data is missing from the initial INNER JOIN used in the SQL I'm testing.
They're handy for finding orphaned data but I rarely use them in production code. I wouldn't be "always discouraged from using one" but I think in the real world they are less frequently the best solution compared to inners and left/right outers.
In the rare times that I used Full Outer Join it was for data analysis and comparison purposes, such as comparing two customer tables from different databases to find duplicates in each table, comparing the structures of the two tables, finding null values in one table compared to the other, or finding information missing from one table compared to the other.
For example, suppose you have two tables: one containing customer data and another containing order data. A full outer join would allow you to see all customers and all orders, even if some customers have no orders or some orders have no corresponding customer. This can help you identify any gaps in the data and ensure that all relevant information is included in the result set.
It's important to note that a full outer join can produce a huge result set since it includes all rows from both tables. This can be inefficient in terms of performance, so it's best to use a full outer join only when it is necessary to include all rows from both tables.
SELECT *
FROM table1
FULL OUTER JOIN table2
ON table1.column_name = table2.column_name;
This will return all rows from both table1 and table2, filling in NULL values for missing matches on either side.