How to join on multiple conditions in SQL? - sql

I have two datasets, and I need to be able to join on null column values
*there are many null values in the 'Model' column for this table
1)
| Customer | Brand | Model |
| Bill | Nike | Dunk |
| Kayla | Adidas | Shoe 2 |
| Max | Nike | |
2)
| SaleID | Customer | Brand | Model |
| 1234 | Mike | Puma | X3 |
| 5678 | Bill | Nike | Dunk |
| 7433 | Max | Nike | |
I want to join entire rows from table one with table 2, where essentially all three values in a row from table 1 act as a single record to be joined with table 2. (Bill, Nike, Dunk) is one value essentially
So far I have tried:
create table blank as
select columns,
       columns,
       columns
from table1 as x
left join table2 as y
  on x.customer=y.customer and x.brand=y.brand and x.model=y.model
;
quit;
The problem I am running into with this code is the join only includes rows where the 'Model' column is not null. There are many sales IDs with null 'Model' and I would like to be able to join these records.
For example, the final output of joining these tables with my code is:
| SaleID | Customer | Brand | Model |
| 5678 | Bill | Nike | Dunk |
I would also like a record for Max here, but because the 'Model' value in that record is null, it is not joined.

SAS itself would match those records (unless the values of MODEL in one or the other dataset are not actually all blanks but include some other invisible characters) because SAS uses strictly binary logic: A=B is either TRUE or FALSE.
But most external database systems (Oracle, etc.) use three-valued logic for comparisons: A=B is neither TRUE nor FALSE when either A or B is a NULL value.
So you need to explicitly account for the NULL values in the test condition.
create table want as
select y.saleid
, x.*
from table1 x
left join table2 y
on x.customer=y.customer
and x.brand=y.brand
and (x.model=y.model or (x.model is null and y.model is null))
;
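To see the difference between the two join conditions, here is a minimal sketch that runs them against the sample rows from the question. It assumes SQLite through Python's built-in sqlite3 module purely so the example is runnable; SQLite follows the same three-valued NULL logic described above, and the names table1 and table2 are taken from the answer's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (customer TEXT, brand TEXT, model TEXT);
    CREATE TABLE table2 (saleid INTEGER, customer TEXT, brand TEXT, model TEXT);
    INSERT INTO table1 VALUES ('Bill', 'Nike', 'Dunk'),
                              ('Kayla', 'Adidas', 'Shoe 2'),
                              ('Max', 'Nike', NULL);
    INSERT INTO table2 VALUES (1234, 'Mike', 'Puma', 'X3'),
                              (5678, 'Bill', 'Nike', 'Dunk'),
                              (7433, 'Max', 'Nike', NULL);
""")

# Plain equality: NULL = NULL is not TRUE, so Max finds no partner.
plain = conn.execute("""
    SELECT y.saleid, x.customer
    FROM table1 x
    LEFT JOIN table2 y
      ON x.customer = y.customer
     AND x.brand = y.brand
     AND x.model = y.model
    ORDER BY x.customer
""").fetchall()

# Explicit NULL handling: two NULL models count as a match.
null_safe = conn.execute("""
    SELECT y.saleid, x.customer
    FROM table1 x
    LEFT JOIN table2 y
      ON x.customer = y.customer
     AND x.brand = y.brand
     AND (x.model = y.model OR (x.model IS NULL AND y.model IS NULL))
    ORDER BY x.customer
""").fetchall()

print(plain)      # Max has saleid None
print(null_safe)  # Max has saleid 7433
```

Kayla keeps a NULL saleid in both results because table2 has no row for her at all; only the Max row changes.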

Related

Left join returns null, even though there is a matching value

I have 2 tables that I match on the column ProductIdentifier, which is present in both. The left table is a big table with many records; ProductIdentifier is not unique there, and the same value is present multiple times. The right table has only a limited number of records and at most 1 matching value (this column is unique).
I execute the following query.
select distinct `upe`.`ProductIdentifier` AS `ProductIdentifier`,
`pmt`.`ProductIdentifier` AS `PMT_ID`
from `prices`.`unassigned_price_entries` `upe`
left join `prices`.`product_match_table` `pmt`
on (`pmt`.`ProductIdentifier` = `upe`.`ProductIdentifier`)
where `upe`.`ProductIdentifier` like '%Brand%'
Basically, all works well, except for one thing. The result is like this:
ProductIdentifier | PMT_ID
----------------------------
Brand A | Brand A
Brand A | Brand A
Brand B | Brand B
Brand B | NULL
I don't understand. It can match Brand B obviously but doesn't do it the second time. It does however for A.
This would occur if the two values for upe.ProductIdentifier did not have the same value. They might look the same but be different, for various reasons:
Leading or trailing spaces.
Hidden characters in the string.
Look-alike characters in some collations.
If you use where upe.ProductIdentifier = 'Brand B', you will probably get only one of the rows (there is a possibility that neither would match).
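One way to check for such look-alike values is to print the length and the raw bytes of each identifier next to the join result. The sketch below is an illustration only: it assumes SQLite through Python's sqlite3 module, uses simplified table names (upe, pmt), and plants a non-breaking space (U+00A0) in one value as a stand-in for a hidden character. The functions length() and hex() are SQLite's; other databases have similar functions under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE upe (ProductIdentifier TEXT)")
conn.execute("CREATE TABLE pmt (ProductIdentifier TEXT)")
conn.executemany("INSERT INTO upe VALUES (?)",
                 [("Brand A",), ("Brand B",), ("Brand\u00a0B",)])  # last one: non-breaking space
conn.executemany("INSERT INTO pmt VALUES (?)", [("Brand A",), ("Brand B",)])

rows = conn.execute("""
    SELECT upe.ProductIdentifier,
           length(upe.ProductIdentifier) AS len,
           hex(upe.ProductIdentifier)    AS bytes,
           pmt.ProductIdentifier         AS PMT_ID
    FROM upe
    LEFT JOIN pmt ON pmt.ProductIdentifier = upe.ProductIdentifier
    WHERE upe.ProductIdentifier LIKE '%Brand%'
    ORDER BY upe.ProductIdentifier
""").fetchall()

# Rows whose right-hand side is NULL are the ones that only look like a match.
unmatched = [r for r in rows if r[3] is None]
for ident, length, raw, pmt_id in rows:
    print(repr(ident), length, raw, pmt_id)
```

The unmatched row shows the same length as the real 'Brand B' but different bytes (C2A0 instead of 20), which is the kind of difference that is invisible in a normal result grid.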

Transform Row Values to Column Names

I have a table of customer contacts and their role. Simplified example below.
customer | role | userid
----------------------------
1 | Support | 123
1 | Support | 456
1 | Procurement | 567
...
desired output
customer | Support1 | Support2 | Support3 | Support4 | Procurement1 | Procurement2
-----------------------------------------------------------------------------------
1 | 123 | 456 | null | null | 567 | null
2 | 123 | 456 | 12333 | 45776 | 888 | 56723
So the number of columns should be created dynamically, based on how many users are in each role. It's a small number of roles, and I can assume a maximum of 5 users in the same role, which means that in the worst case I need to generate 5 columns for each role. The userids don't need to be in any particular order.
My current approach is getting 1 userid per role/customer. Then a second query pulls another id that wasn't part of first results set. And so on. But that way I have to statically create 5 queries. It works. But I was wondering whether there is a more efficient way? Dynamically creating needed columns.
Example of pulling one user per role:
SELECT customer,role,
(SELECT top 1 userid
FROM temp as tmp1
where tmp1.customer=tmp2.customer and tmp1.role=tmp2.role
) as userid
FROM temp as tmp2
group by customer,role
order by customer,role
SQL create with dummy data
create table temp
(
customer int,
role nvarchar(20),
userid int
)
insert into temp values (1,'Support',123)
insert into temp values (1,'Support',456)
insert into temp values (1,'Procurement',567)
insert into temp values (2,'Support',123)
insert into temp values (2,'Support',456)
insert into temp values (2,'Procurement',888)
insert into temp values (2,'Support',12333)
insert into temp values (2,'Support',45776)
insert into temp values (2,'Procurement',56723)
You may need to adapt your approach slightly if you want to avoid getting into the realm of programming user defined table functions (which is what you would need in order to generate columns dynamically).
You don't mention which SQL database variant you are using (SQL Server, PostgreSQL, ?). I'm going to make the assumption that it supports some form of string aggregation feature (they pretty much all do), but the syntax for doing this will vary, so you will probably have to adjust the code to your circumstances. You mention that the number of roles is small (5-ish?).
The proposed solution is to generate a comma-separated list of user ids, one for each role, using common table expressions (CTEs) and the LISTAGG function (variously named STRING_AGG, GROUP_CONCAT, etc. in other databases).
WITH tsupport
AS (SELECT customer,
Listagg(userid, ',') AS "Support"
FROM temp
WHERE ROLE = 'Support'
GROUP BY customer),
tprocurement
AS (SELECT customer,
Listagg(userid, ',') AS "Procurement"
FROM temp
WHERE ROLE = 'Procurement'
GROUP BY customer)
--> tnextrole...
--> AS (SELECT ... for additional roles
--> Listagg...
SELECT a.customer,
"Support",
"Procurement"
--> "Next Role" etc.
FROM tsupport a
JOIN tprocurement b
ON a.customer = b.customer
--> JOIN tNextRole ...
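The original answer linked a fiddle (not preserved here) showing the result for your dummy data: one row per customer, with a comma-separated list of user ids in each role column.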
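As a runnable stand-in, the same CTE approach can be sketched against the dummy data from the question. This assumes SQLite through Python's sqlite3 module, where the string aggregation function is called group_concat rather than Listagg; the lowercase column aliases are also just a choice made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp (customer INT, role TEXT, userid INT)")
conn.executemany("INSERT INTO temp VALUES (?, ?, ?)", [
    (1, "Support", 123), (1, "Support", 456), (1, "Procurement", 567),
    (2, "Support", 123), (2, "Support", 456), (2, "Procurement", 888),
    (2, "Support", 12333), (2, "Support", 45776), (2, "Procurement", 56723),
])

rows = conn.execute("""
    WITH tsupport AS (
        SELECT customer, group_concat(userid, ',') AS support
        FROM temp
        WHERE role = 'Support'
        GROUP BY customer),
    tprocurement AS (
        SELECT customer, group_concat(userid, ',') AS procurement
        FROM temp
        WHERE role = 'Procurement'
        GROUP BY customer)
    SELECT a.customer, a.support, b.procurement
    FROM tsupport a
    JOIN tprocurement b ON a.customer = b.customer
    ORDER BY a.customer
""").fetchall()

# The order of ids inside each list is not guaranteed, so compare as sets.
result = {c: (set(s.split(",")), set(p.split(","))) for c, s, p in rows}
for row in rows:
    print(row)
```

Because the role CTEs are combined with an inner JOIN, a customer that has no user in one of the roles would drop out of the result; a LEFT JOIN from a list of all customers would keep such rows.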

Left join - two tables with the same data

Let's say that I have two simple tables with the following columns and data:
Table 1          Table 2
year  month      year  month
2017  01         2017  01
2016  12         2016  12
The primary key is a composite key that consists of the year and the month.
So a classical left join, gives me all the data in the left table with the matching rows in the right table.
If I do a left join like this:
select
  t1.year, t2.month
from
  table1 t1
  left join table2 t2 on (t1.year = t2.year and t1.month = t2.month)
Why do I get only two rows?? Shouldn't I get 4 rows??
Tnx,
Tom
A classical left join will give you the number of rows in the "Left Table" (the one in FROM) multiplied by the number of matches in the "Right Table" (the one in LEFT JOIN in this case), plus all the rows in the Left Table that have no match in the Right Table.
Number of rows in LEFT Table = 2
Number of matches in Right Table = 1
Number of rows in LEFT Table without matches = 0
2 x 1 + 0 = 2
Edit: Actually, the multiplication applies to each row individually. It would be something like
Sum (row_i x matches_i) + unmatched
where row_i stands for each row of the left table and matches_i for the number of matches that row i has in the right table. The difference is that each row can have a different number of matches (the previous formula only fits your case).
This will result in
1 (row1) x 1 (matches for row 1) + 1 (row2) x 1 (matches for row 2) +
0 (unmatched rows in table 1) = result
1x1 + 1x1 + 0 = result
1 + 1 = 2 = result
If you expected 4 rows maybe you wanted to get a Cartesian Product. As the comment stated, you can use Cross Join in that case
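The per-row formula can be checked on a small case where the left rows have different numbers of matches. This is a sketch under assumptions: it uses SQLite through Python's sqlite3 module, and the tables left_t and right_t with a single key column k are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_t  (k INTEGER);
    CREATE TABLE right_t (k INTEGER);
    INSERT INTO left_t  VALUES (1), (2), (3);   -- key 3 has no partner
    INSERT INTO right_t VALUES (1), (2), (2);   -- key 2 matches twice
""")

# Number of right-table matches for each left row (left keys are unique here).
left_keys = [k for (k,) in conn.execute("SELECT k FROM left_t")]
matches = {
    k: conn.execute("SELECT count(*) FROM right_t WHERE k = ?", (k,)).fetchone()[0]
    for k in left_keys
}

# Sum(row_i x matches_i) + unmatched
unmatched = sum(1 for m in matches.values() if m == 0)
predicted = sum(1 * m for m in matches.values()) + unmatched

actual = conn.execute(
    "SELECT count(*) FROM left_t l LEFT JOIN right_t r ON l.k = r.k"
).fetchone()[0]

print(matches, predicted, actual)
```

Here the matches are 1, 2 and 0, so the formula gives 1 + 2 + 0 + 1 unmatched = 4, which is also what the LEFT JOIN returns.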
When you join tables together, you're essentially asking the database to combine data from two different tables and display it as a single record. When you perform a left join, you are saying:
Give me all the rows from Table1, as well as any associated data from
Table2 (if it exists).
In this sense, the data from Table2 doesn't represent separate or additional records to Table1 (even though they are stored as separate records in a separate table), it represents associated data. You are linking the data between the tables, not appending rows from each table.
Imagine that Table1 stored people, and Table2 stored phone numbers.
Table1                        Table2
+------+-------+--------+     +------+-------+--------------+
| Year | Month | Person |     | Year | Month | Phone        |
+------+-------+--------+     +------+-------+--------------+
| 2017 | 12    | Bob    |     | 2017 | 12    | 555-123-4567 |
| 2016 | 01    | Frank  |     | 2016 | 01    | 555-234-5678 |
+------+-------+--------+     +------+-------+--------------+
You could join them together to get a list of people and their corresponding phone numbers. But you wouldn't expect to get a combination of rows from each table (two rows of people and two rows of phone numbers).
You will get two rows because each table has 2 rows, and each row matches exactly one row in the other table on the composite key.
It works the same way with more rows: if each table had 4 rows with the same keys, you would get only 4 rows in total.
The Left Join takes Table1 (t1) as the Left table.
It searches for and retrieves all rows from the right table, i.e. Table 2 (t2), that match the criterion T1.Year&Month = T2.Year&Month (alias GOD/s) as well as the additional join condition T1.Month=T2.Month. The result is that only 2 rows from T1 match both the join criterion and the additional condition.
Another takeaway : The AND T1.Month=T2.Month condition on the left join is redundant as the composite GOD key takes care of it explicitly.
cross join returns every row you can make by combining a row from each argument. (inner) join on returns the rows from cross join that satisfy its condition. I.e., (inner) join on returns every row you can make that combines a row from each argument and that satisfies its condition.
left join on returns the rows from (inner) join on plus the rows you can make by extending unjoined left argument rows by null for columns of the right argument.
Notice that this is regardless of primary keys, unique column sets, foreign keys or any other constraints.
Here there are 2 rows in each argument so there are 2 X 2 = 4 rows in the cross join. But only 2 meet the condition--the ones where a row is combined with itself.
(If you left join a table with itself where the condition is the conjunction of one or more equalities of the left and right versions of a column and there are no nulls in those columns then every left argument row gets joined with at least itself from the right argument. So there are no unjoined left argument rows. So only the rows of the (inner) join on are returned.)
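The relationship between the three joins can be sketched on the two tables from the question. The example assumes SQLite through Python's sqlite3 module; the extra row (2015, '06') added at the end is hypothetical and only there to show an unjoined left row being extended by null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (year INTEGER, month TEXT);
    CREATE TABLE table2 (year INTEGER, month TEXT);
    INSERT INTO table1 VALUES (2017, '01'), (2016, '12');
    INSERT INTO table2 VALUES (2017, '01'), (2016, '12');
""")

def count(sql):
    """Return the number of rows produced by the given query."""
    return conn.execute(f"SELECT count(*) FROM ({sql})").fetchone()[0]

on = "ON t1.year = t2.year AND t1.month = t2.month"
cross_rows = count("SELECT t1.year FROM table1 t1 CROSS JOIN table2 t2")
inner_rows = count(f"SELECT t1.year FROM table1 t1 JOIN table2 t2 {on}")
left_rows = count(f"SELECT t1.year FROM table1 t1 LEFT JOIN table2 t2 {on}")

# Hypothetical extra left row with no partner in table2.
conn.execute("INSERT INTO table1 VALUES (2015, '06')")
extended = conn.execute(f"""
    SELECT t1.year, t2.month
    FROM table1 t1 LEFT JOIN table2 t2 {on}
    ORDER BY t1.year
""").fetchall()

print(cross_rows, inner_rows, left_rows)
print(extended)
```

With the original data the cross join has 4 rows and both the inner and the left join have 2; after the extra row is added, the left join gains a third row whose t2.month is NULL.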

update a single column with join lookups

I have a table adjustments with columns adjustable_id | adjustable_type | order_id
order_id is the target column to fill with values, this value should come from another table line_items which has a order_id column.
adjustable_id (int) and _type (varchar) references that table.
table: adjustments
id | adjustable_id | adjustable_type | order_id
------------------------------------------------
100 | 1 | line_item | NULL
101 | 2 | line_item | NULL
table: line_items
id | order_id | other | columns
--------------------------------
1 | 10 | bla | bla
2 | 20 | bla | bla
In the case above I guess I need a join query to update adjustments.order_id first row with value 10, second row with 20 and so on for the other rows using Postgres 9.3+.
In case the lookup fails, I need to delete invalid adjustments rows, for which they have no corresponding line_items.
There are two ways to do this. The first one uses a correlated subquery:
update adjustments a
set order_id = (select l.order_id
                from line_items l
                where l.id = a.adjustable_id)
where a.adjustable_type = 'line_item';
This is standard ANSI SQL; the SQL standard does not define a join for the UPDATE statement.
The second way is using a join, which is a Postgres extension to the SQL standard (other DBMS also support that but with different semantics and syntax).
update adjustments a
set order_id = l.order_id
from line_items l
where l.id = a.adjustable_id
and a.adjustable_type = 'line_item';
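The join is probably the faster one. Note that both versions (especially the first one) only work as intended if the lookup in line_items returns exactly one row for each adjustments row. If it returns more than one row, the first version fails with an error and the second picks one of the matching rows unpredictably; if it returns none, the first version sets order_id to NULL and the second leaves the row untouched.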
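The correlated-subquery version can be tried on the sample rows with the sketch below. It assumes SQLite through Python's sqlite3 module rather than Postgres, so the table alias on the UPDATE target is replaced by the table name. The row with id 102 is a hypothetical orphan, and the DELETE step for rows without a matching line_items row is not part of the answer above; it is one possible way to do the cleanup the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE line_items (id INTEGER PRIMARY KEY, order_id INTEGER);
    CREATE TABLE adjustments (id INTEGER PRIMARY KEY, adjustable_id INTEGER,
                              adjustable_type TEXT, order_id INTEGER);
    INSERT INTO line_items VALUES (1, 10), (2, 20);
    INSERT INTO adjustments VALUES (100, 1,  'line_item', NULL),
                                   (101, 2,  'line_item', NULL),
                                   (102, 99, 'line_item', NULL);  -- hypothetical orphan
""")

# Look up order_id per row; rows without a match get NULL.
conn.execute("""
    UPDATE adjustments
    SET order_id = (SELECT l.order_id
                    FROM line_items l
                    WHERE l.id = adjustments.adjustable_id)
    WHERE adjustable_type = 'line_item'
""")
after_update = conn.execute(
    "SELECT id, order_id FROM adjustments ORDER BY id").fetchall()

# Assumed cleanup step: remove adjustments that point at no line_items row.
conn.execute("""
    DELETE FROM adjustments
    WHERE adjustable_type = 'line_item'
      AND NOT EXISTS (SELECT 1 FROM line_items l
                      WHERE l.id = adjustments.adjustable_id)
""")
after_delete = conn.execute(
    "SELECT id, order_id FROM adjustments ORDER BY id").fetchall()

print(after_update)
print(after_delete)
```

Running the DELETE before the UPDATE would work just as well and avoids writing NULLs to rows that are about to be removed.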
The reason why Arockia's query was "eating your RAM" is that his/her query creates a cross-join between table1 and table1 which is then joined against table2.
The Postgres manual contains a warning about that:
Note that the target table must not appear in the from_list, unless you intend a self-join
update a set A.name=B.name from table1 A join table2 B on
A.id=B.id

MS SQL Studio 2008 Query - If I am expecting 100 results but only receive 95, what is an easy way to determine what records are missing?

I have been getting requests that require me to query up to 1,000 account numbers against several different tables.
I am looking for an easy way to determine which account numbers are not found in the tables I am querying.
For Example:
select
a.account#,
a.date,
a.amount
from
transactiontable as A
where
a.account# in ('1','2','3','4')
If account# 3 is not in this table, the account is not shown at all and my result will look like:
Account# | Date | Amount
--------------------------
1 | 8/31 | $2.50
2 | 8/31 | $7.25
4 | 8/31 | $0.63
With only 4 account numbers, it's easy to determine which one is missing. With 1,000+ account numbers it can be very difficult, if not impossible, to find out which are missing. I can't use a "NOT IN" clause, as that will return tens of thousands of records I am not looking for.
I've experimented doing a variety of joins with a master table that has all account numbers, but have had no success.
Is there a quick way in sql studio to determine what account numbers are missing? Or is there a way to code the query to get a result that looks more like this?
Account# | Date | Amount
--------------------------
1 | 8/31 | $2.50
2 | 8/31 | $7.25
3 | NULL | NULL
4 | 8/31 | $0.63
Or is there a way to code the query to get a result that looks more like this?
OUTER JOIN your transaction table with your Accounts table. SELECT your Account# from the Accounts table, and your remaining fields from your transactions table. This will produce records for every Account, but null data for those accounts with no corresponding transactions.
Something like this should work. Use your text editor of choice to build the first section of the statement.
DECLARE @accountsTable TABLE (AccountId INT)
INSERT INTO @accountsTable VALUES (1)
INSERT INTO @accountsTable VALUES (2)
INSERT INTO @accountsTable VALUES (3)
INSERT INTO @accountsTable VALUES (4)
SELECT a.AccountId, t.*
FROM @accountsTable a
LEFT OUTER JOIN transactionTable t ON a.AccountId = t.AccountId
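The same idea can be sketched end to end with the sample data from the question. This assumes SQLite through Python's sqlite3 module instead of SQL Server, so a temporary table stands in for the table variable; the column names TxnDate and Amount and the list wanted are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactionTable (AccountId INTEGER, TxnDate TEXT, Amount REAL);
    INSERT INTO transactionTable VALUES (1, '8/31', 2.50),
                                        (2, '8/31', 7.25),
                                        (4, '8/31', 0.63);
    CREATE TEMP TABLE accountsTable (AccountId INTEGER);
""")

# The requested account numbers, loaded into the lookup table.
wanted = [1, 2, 3, 4]
conn.executemany("INSERT INTO accountsTable VALUES (?)", [(a,) for a in wanted])

rows = conn.execute("""
    SELECT a.AccountId, t.TxnDate, t.Amount
    FROM accountsTable a
    LEFT OUTER JOIN transactionTable t ON a.AccountId = t.AccountId
    ORDER BY a.AccountId
""").fetchall()

# Accounts whose transaction columns came back NULL were not found.
missing = [acct for acct, txn_date, _ in rows if txn_date is None]

print(rows)
print(missing)
```

To list only the missing accounts in SQL, adding a WHERE clause that tests a transaction column for NULL (here t.AccountId IS NULL) to the same query has the same effect as the Python filter.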