BigQuery - Joining on multiple conditions using subqueries and OR statements

Is there any way to join two tables on multiple potential conditions?
I'm currently migrating some code from Postgres to BigQuery where I joined on multiple potential values, like this:
SELECT
  *
FROM
(
  SELECT
    offer_table.offer_id
    ,customer_table.customer_name
    ,customer_table.visit_count
    ,ROW_NUMBER() OVER (PARTITION BY offer_table.offer_id ORDER BY customer_table.visit_count DESC) AS customer_visit_rank
  FROM
    offer_table
    LEFT JOIN customer_table ON
    (
      offer_table.customer_id = customer_table.customer_id
      OR offer_table.email = customer_table.email
      OR offer_table.phone = customer_table.phone
    )
) dummy
WHERE
  customer_visit_rank = 1
I needed to do this because my offer and customer data used our id, email, and phone fields inconsistently, but all of them were valid potential matches. If multiple fields matched (e.g. both id and email), there would be duplicate rows, and I filtered them out with the row_number column, ranking by the ORDER BY inside ROW_NUMBER and keeping the top row.
However, when I try to join on multiple conditions in BigQuery, I get this error message:
LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
Has anyone figured out a solution to join on multiple values instead of doing the above?
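For reference, the Postgres pattern in the question (one LEFT JOIN whose ON clause ORs three equality tests, then ROW_NUMBER to keep one row per offer) can be reproduced in SQLite from Python. This is a minimal sketch under assumptions: SQLite (3.25 or later, for window functions) stands in for Postgres, and the rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: offer 1 matches Alice by id and Bob by email;
# offer 2 matches Cara by phone only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE offer_table (offer_id INTEGER, customer_id INTEGER, email TEXT, phone TEXT);
CREATE TABLE customer_table (customer_id INTEGER, customer_name TEXT,
                             email TEXT, phone TEXT, visit_count INTEGER);
INSERT INTO offer_table VALUES (1, 10, 'b@x.com', NULL), (2, NULL, NULL, '555-0101');
INSERT INTO customer_table VALUES
    (10, 'Alice', 'a@x.com', '555-0100', 3),
    (11, 'Bob',   'b@x.com', '555-0199', 7),
    (12, 'Cara',  'c@x.com', '555-0101', 2);
""")

# A single LEFT JOIN with OR-ed equality tests, then keep the
# highest-visit_count customer per offer.
query = """
SELECT offer_id, customer_name, visit_count
FROM (
    SELECT o.offer_id, c.customer_name, c.visit_count,
           ROW_NUMBER() OVER (PARTITION BY o.offer_id
                              ORDER BY c.visit_count DESC) AS customer_visit_rank
    FROM offer_table o
    LEFT JOIN customer_table c
      ON o.customer_id = c.customer_id
      OR o.email = c.email
      OR o.phone = c.phone
) AS dummy
WHERE customer_visit_rank = 1
ORDER BY offer_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Offer 1 joins to both Alice and Bob before ranking; ROW_NUMBER then keeps Bob, who has more visits.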

You can join customer_table once per condition, each join using a plain equality, then use COALESCE to take the first match:
SELECT
  *
FROM
(
  SELECT
    offer_table.offer_id
    ,COALESCE(c1.customer_name, c2.customer_name, c3.customer_name) AS customer_name
    ,COALESCE(c1.visit_count, c2.visit_count, c3.visit_count) AS visit_count
    ,ROW_NUMBER() OVER (
       PARTITION BY offer_table.offer_id
       ORDER BY COALESCE(c1.visit_count, c2.visit_count, c3.visit_count) DESC
     ) AS customer_visit_rank
  FROM
    offer_table
    LEFT JOIN customer_table c1
      ON offer_table.customer_id = c1.customer_id
    LEFT JOIN customer_table c2
      ON offer_table.email = c2.email
    LEFT JOIN customer_table c3
      ON offer_table.phone = c3.phone
) AS dummy
WHERE
  customer_visit_rank = 1
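A minimal sketch of this answer's approach (three equality-only LEFT JOINs combined with COALESCE), run against SQLite from Python rather than BigQuery. The sample rows are hypothetical, and SQLite 3.25 or later is assumed for ROW_NUMBER.

```python
import sqlite3

# Hypothetical sample data: offer 1 matches Alice by id and Bob by email;
# offer 2 matches Cara by phone only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE offer_table (offer_id INTEGER, customer_id INTEGER, email TEXT, phone TEXT);
CREATE TABLE customer_table (customer_id INTEGER, customer_name TEXT,
                             email TEXT, phone TEXT, visit_count INTEGER);
INSERT INTO offer_table VALUES (1, 10, 'b@x.com', NULL), (2, NULL, NULL, '555-0101');
INSERT INTO customer_table VALUES
    (10, 'Alice', 'a@x.com', '555-0100', 3),
    (11, 'Bob',   'b@x.com', '555-0199', 7),
    (12, 'Cara',  'c@x.com', '555-0101', 2);
""")

# One alias per match rule; every ON clause is a plain equality.
query = """
SELECT offer_id, customer_name, visit_count
FROM (
    SELECT o.offer_id,
           COALESCE(c1.customer_name, c2.customer_name, c3.customer_name) AS customer_name,
           COALESCE(c1.visit_count, c2.visit_count, c3.visit_count) AS visit_count,
           ROW_NUMBER() OVER (
               PARTITION BY o.offer_id
               ORDER BY COALESCE(c1.visit_count, c2.visit_count, c3.visit_count) DESC
           ) AS customer_visit_rank
    FROM offer_table o
    LEFT JOIN customer_table c1 ON o.customer_id = c1.customer_id
    LEFT JOIN customer_table c2 ON o.email = c2.email
    LEFT JOIN customer_table c3 ON o.phone = c3.phone
) AS dummy
WHERE customer_visit_rank = 1
ORDER BY offer_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Offer 1 comes back as Alice, not Bob: COALESCE prefers the id match over the email match, whereas the question's OR join ranks every matching customer by visit_count. If that difference matters, it is worth deciding which behaviour you want before migrating.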

Related

SQL Left Join - OR clause

I am trying to join two tables. I want to join where all three identifiers (contract id, company code and book id) match in both tables; if they don't, match using contract id and company code; and as a last step, match on contract id alone.
Can this be done: join using all three columns, and if that does not match, fall back to the two columns and then to just the contract id?
Code:
SELECT *
INTO #prem_claim_wtauto_test
FROM #contract_detail A
LEFT JOIN #claim_total C
  ON (    ( C.contract_id_2 = A.contract_id
            AND C.company_cd_2 = A.company_cd
            AND C.book_id_2 = A.book_id )
       OR ( C.contract_id_2 = A.contract_id
            AND C.company_cd_2 = A.company_cd )
       OR ( C.contract_id_2 = A.contract_id ) )
Your ON clause boils down to C.contract_id_2 = A.contract_id. This gets you all matches, whether they are the most precise match (including company and book) or a lesser one. What you want is a ranking. Two methods come to mind:
Join on C.contract_id_2 = A.contract_id, then rank the rows with ROW_NUMBER and keep the best ranked ones.
Use a lateral join to join only the best match, picking it with TOP.
Here is the second option. You forgot to tell us which DBMS you are using. SELECT INTO looks like SQL Server. I hope I got the syntax right:
SELECT *
INTO #prem_claim_wtauto_test
FROM #contract_detail A
OUTER APPLY
(
  SELECT TOP(1) *
  FROM #claim_total C
  WHERE C.contract_id_2 = A.contract_id
  ORDER BY
    CASE
      WHEN C.company_cd_2 = A.company_cd AND C.book_id_2 = A.book_id THEN 1
      WHEN C.company_cd_2 = A.company_cd THEN 2
      ELSE 3
    END
) AS best;  -- SQL Server requires an alias on the APPLY derived table
If you want to join all rows in case of ties (e.g. many rows matching contract, company and book), then make this TOP(1) WITH TIES.
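The first option from the list (join on contract id only, rank each candidate by how well it matches, keep the best) works in any DBMS with window functions. A minimal sketch in SQLite via Python: the table names drop SQL Server's # temp-table prefix, the claim_id column and all rows are hypothetical, and the PARTITION BY assumes contract_id, company_cd and book_id together identify one contract_detail row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contract_detail (contract_id INTEGER, company_cd TEXT, book_id TEXT);
CREATE TABLE claim_total (claim_id INTEGER, contract_id_2 INTEGER,
                          company_cd_2 TEXT, book_id_2 TEXT);
INSERT INTO contract_detail VALUES (1, 'A', 'X'), (2, 'B', 'Y'), (3, 'C', 'Z');
INSERT INTO claim_total VALUES
    (100, 1, 'A', 'X'),  -- contract 1: all three columns match
    (101, 1, 'A', 'Q'),  -- contract 1: contract + company only
    (102, 2, 'B', 'Q'),  -- contract 2: contract + company only
    (103, 2, 'K', 'Q'),  -- contract 2: contract only
    (104, 3, 'K', 'Q');  -- contract 3: contract only
""")

query = """
SELECT contract_id, claim_id, match_level
FROM (
    -- rank the candidates of each contract row, best match_level first
    SELECT m.*,
           ROW_NUMBER() OVER (PARTITION BY contract_id, company_cd, book_id
                              ORDER BY match_level) AS rn
    FROM (
        -- join on contract id only, and score how precise each match is
        SELECT a.contract_id, a.company_cd, a.book_id, c.claim_id,
               CASE
                   WHEN c.company_cd_2 = a.company_cd AND c.book_id_2 = a.book_id THEN 1
                   WHEN c.company_cd_2 = a.company_cd THEN 2
                   ELSE 3
               END AS match_level
        FROM contract_detail a
        LEFT JOIN claim_total c ON c.contract_id_2 = a.contract_id
    ) AS m
) AS ranked
WHERE rn = 1
ORDER BY contract_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Each contract keeps exactly one claim: the three-column match for contract 1, the two-column match for contract 2 and the contract-only match for contract 3. To keep ties, as with TOP(1) WITH TIES, RANK() could replace ROW_NUMBER().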

Can we select first row of data from column in sql?

I have a table with multiple rows for the same ID. I want to get the first row for each ID.
Here is the SQL I have tried:
SELECT
"client"."id",
"client"."company_name",
"client_details"."address"
from Client
LEFT OUTER JOIN "client_details" ON ("client"."id" = "client_details"."client_id")
Since I have multiple addresses for the same ID, how can I get only the first one?
Currently the output I get is 2 rows with different addresses.
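You can add LIMIT 1 to your SQL, and if you want to be sure of the order, also add an ORDER BY. Note that LIMIT 1 limits the whole result to one row, not one row per ID, so this only fits when you query a single ID.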
You can use distinct on:
select distinct on (c.id) c.id, c.company_name, cd.address
from Client c left join
client_details cd
on c.id = cd.client_id
order by c.id, ?;
The ? is for the column that specifies the ordering (the definition of "first"). I am guessing that cd.id is what you want.
Note that this query removes the double quotes and introduces table aliases. This is easier on both the eyes (to read) and the fingers (to type).
Use row_number():
select * from
(
  SELECT
    "client"."id",
    "client"."company_name",
    "client_details"."address",
    row_number() over (partition by "client"."id" order by "client_details"."address") as rn
  from Client
  LEFT OUTER JOIN "client_details" ON "client"."id" = "client_details"."client_id"
) A
where rn = 1
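A runnable sketch of the row_number() approach, using SQLite from Python with made-up rows. It assumes client_details has its own id column and treats the lowest id as the "first" address (the same guess the distinct on answer makes); the Postgres double quotes are dropped because SQLite does not need them here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client (id INTEGER, company_name TEXT);
CREATE TABLE client_details (id INTEGER, client_id INTEGER, address TEXT);
INSERT INTO client VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
-- hypothetical rows: client 1 has two addresses, client 3 has none
INSERT INTO client_details VALUES
    (1, 1, '12 High St'),
    (2, 1, '99 Low Rd'),
    (3, 2, '5 Main Ave');
""")

query = """
SELECT id, company_name, address
FROM (
    -- number each client's addresses, lowest client_details.id first
    SELECT c.id, c.company_name, cd.address,
           ROW_NUMBER() OVER (PARTITION BY c.id ORDER BY cd.id) AS rn
    FROM client c
    LEFT JOIN client_details cd ON c.id = cd.client_id
) AS a
WHERE rn = 1
ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because the join is a LEFT JOIN, a client with no addresses still gets one row (with a NULL address) numbered 1, so it survives the rn = 1 filter.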
If there is a field you can order the results by, you could use a lateral join, e.g.:
SELECT
"client"."id",
"client"."company_name",
"client_details"."address"
from Client
left join lateral (
select *
from client_details cd
where cd.client_id = client.id
order by [some_ordering_field]
limit 1
) "client_details" on true

Oracle Sql Duplicate rows when joining new table

I am using Oracle SQL to join tables. I use the following code:
SELECT
T.TRANSACTION_KEY,
PR.ACCOUNT_KEY,
T.ACCT_CURR_AMOUNT,
T.EXECUTION_LOCAL_DATE_TIME,
TC.DESCRIPTION,
T.OPP_ACCOUNT_NAME,
T.OPP_COUNTRY,
PT.PARTY_TYPE_DESC,
P.PARTY_NAME,
P.CUSTOM_SMALL_STRING_02,
CO.COUNTRY_NAME,
LE.LIST_CD
FROM TRANSACTIONS T
LEFT JOIN TRANSACTION_CODE TC
ON T.TRANSACTION_CODE = TC.ENTITY
LEFT JOIN PARTY_ACCOUNT_RELATION PR
ON T.ACCOUNT = PR.ACCOUNT
LEFT JOIN PARTY P
ON PR.PARTY_KEY = P.PARTY_KEY
LEFT JOIN PARTY_TYPE PT
ON P.PARTY_TYPE = PT.ENTITY
LEFT JOIN COUNTRY CO
ON T.OPP_COUNTRY = CO.ENTITY
LEFT JOIN LISTED_ENTITY LE
ON CO.COUNTRY = LE.ENTITY_KEY
WHERE
PR.PARTY_KEY = '111111111' and T.EXECUTION_LOCAL_DATE_TIME>'2017-01-01';
It has worked fine so far, but now I want to join another table whose ENTITY_KEY column corresponds to ACCOUNT_KEY in the PARTITY_ACCOUNT_RELATION table, and include some of the new table's columns. When I do that, the rows become duplicated. I am adding the following lines before the WHERE clause:
LEFT JOIN EVALUATE_RULE ER
ON PR.ACCOUNT_KEY = ER.ENTITY_KEY
Does anyone know where the problem is?
If joining another table into an existing query causes the existing rows to be duplicated, it is because the table being joined in has duplicate values in the columns used as keys for the join.
In your case, if you do
SELECT ENTITY_KEY FROM EVALUATE_RULE GROUP BY ENTITY_KEY HAVING COUNT(*) > 1
You'll see which entity_keys are duplicated. When these duplicates are joined to the existing data, each existing row has to be repeated so that every EVALUATE_RULE row with the same ENTITY_KEY can appear in the result set.
You must either de-dupe the table, or put other clauses into your ON condition to further restrict the rows coming from EVALUATE_RULE.
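The fan-out is easy to reproduce. A minimal sketch with SQLite from Python, using cut-down hypothetical versions of PARTITY_ACCOUNT_RELATION and EVALUATE_RULE: only the key columns, plus a status column like the one imagined in the next paragraph.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE party_account_relation (account_key INTEGER, party_key TEXT);
CREATE TABLE evaluate_rule (entity_key INTEGER, status TEXT);
INSERT INTO party_account_relation VALUES (1, '111111111'), (2, '111111111');
-- hypothetical rows: entity_key 1 appears twice, so account 1 will fan out
INSERT INTO evaluate_rule VALUES (1, 'old'), (1, 'current'), (2, 'current');
""")

def count(sql):
    """Run a SELECT COUNT(*) query and return the number."""
    return conn.execute(sql).fetchone()[0]

base = count("SELECT COUNT(*) FROM party_account_relation")

# Plain join on the key: account 1 is repeated once per matching rule row.
joined = count("""
    SELECT COUNT(*) FROM party_account_relation pr
    LEFT JOIN evaluate_rule er ON pr.account_key = er.entity_key
""")

# The diagnostic query from the answer: which keys occur more than once?
dupes = conn.execute(
    "SELECT entity_key FROM evaluate_rule GROUP BY entity_key HAVING COUNT(*) > 1"
).fetchall()

# Restricting the ON clause further brings it back to one row per account.
filtered = count("""
    SELECT COUNT(*) FROM party_account_relation pr
    LEFT JOIN evaluate_rule er
      ON pr.account_key = er.entity_key AND er.status = 'current'
""")
print(base, joined, dupes, filtered)
```

The two base rows become three after the join because entity_key 1 matches twice; the extra ON condition removes the duplicate without dropping any account.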
For example, after adding EVALUATE_RULE and putting ER.* in your SELECT list, imagine you can see that the rows from ER have status = 'old' and status = 'current', but you know you only want the current ones. Then put AND er.status = 'current' in your ON clause.
Your comment indicates that multiple records differ by some column you don't care about, so this technique will just select only one row:
LEFT JOIN
(SELECT e.*, ROW_NUMBER() OVER(PARTITION BY e.entity_key ORDER BY e.name) as rown FROM evaluate_rule e) er
ON
er.entity_key = pr.account_key and
er.rown = 1
To see why this works, run that SQL in isolation:
SELECT e.*, ROW_NUMBER() OVER(PARTITION BY e.entity_key ORDER BY e.name) as rown FROM evaluate_rule e
ORDER BY e.entity_key -- I added this to make it clearer what is going on. You don't need it in your main query
It just assigns a number to each row in the table; the number restarts at 1 every time entity_key changes, so we can then select all rows with rown = 1.
If it turns out you DO want something specific like "the latest row from evaluate_rule", you can use something like this:
SELECT e.*, ROW_NUMBER() OVER(PARTITION BY e.entity_key ORDER BY e.created_date DESC) as rown FROM evaluate_rule e
Now the latest created_date row will always have rown = 1
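Putting the pieces together, here is a sketch of the "latest row" variant in SQLite via Python, with hypothetical tables and rows (created_date is stored as ISO text so it sorts correctly). Note that er.rown = 1 sits in the ON clause rather than in WHERE, so accounts with no matching rule still come back, with NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE party_account_relation (account_key INTEGER);
CREATE TABLE evaluate_rule (entity_key INTEGER, name TEXT, created_date TEXT);
INSERT INTO party_account_relation VALUES (1), (2), (3);
-- hypothetical rows: account 1 has two rules, account 3 has none
INSERT INTO evaluate_rule VALUES
    (1, 'rule A', '2017-03-01'),
    (1, 'rule B', '2017-05-01'),
    (2, 'rule C', '2017-02-01');
""")

rows = conn.execute("""
SELECT pr.account_key, er.name, er.created_date
FROM party_account_relation pr
LEFT JOIN (
    -- newest rule per entity_key gets rown = 1
    SELECT e.*,
           ROW_NUMBER() OVER (PARTITION BY e.entity_key
                              ORDER BY e.created_date DESC) AS rown
    FROM evaluate_rule e
) er
  ON er.entity_key = pr.account_key
 AND er.rown = 1
ORDER BY pr.account_key
""").fetchall()
print(rows)
```

Moving er.rown = 1 into a WHERE clause would silently turn the LEFT JOIN into an inner join and drop account 3.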
As far as I can understand from your description, table EVALUATE_RULE has more than one record with ENTITY_KEY = ACCOUNT_KEY.
You can change your query section:
LEFT JOIN EVALUATE_RULE ER ON PR.ACCOUNT_KEY = ER.ENTITY_KEY
to
LEFT JOIN (SELECT DISTINCT ENTITY_KEY FROM EVALUATE_RULE) ER ON PR.ACCOUNT_KEY = ER.ENTITY_KEY
If you post the structure of EVALUATE_RULE (indicating its PK columns), I can change my answer to let you include EVALUATE_RULE columns in the final query.

SQL pagination using INNER JOINs and filtering with LIKE

This query feeds a data table with sorting, filtering, and pagination. All features worked fine until I added the INNER JOIN, and then I got:
The multi-part identifier "Types.Description" could not be bound.
If I remove the second WHERE clause at the end of the query, the LIKE statements work, but I lose pagination. I removed some of the LIKE clauses to try to clean up this monstrous query.
SELECT *
FROM (
  SELECT ROW_NUMBER() OVER (ORDER BY TAG asc) AS RowNumber, *
  FROM (
    SELECT (SELECT COUNT(*) FROM Instruments) AS TotalDisplayRows,
           (SELECT COUNT(*) FROM Instruments) AS TotalRows,
           Instruments.Tag, Instruments.Location, Instruments.Description,
           Types.Description As TypeDesc, Manufacturer.Name,
           Lease.Name as LeaseName, Facility.Name as FacName
    FROM Instruments
    INNER JOIN Types ON Instruments.Type = Types.ID
    INNER JOIN Manufacturer ON Instruments.Manufacturer = Manufacturer.ID
    INNER JOIN Facility ON Instruments.Facility = Facility.ID
    INNER JOIN Lease ON Instruments.Lease = Lease.ID
    WHERE (Types.Description LIKE '%Cat%')
  ) RawResults
) Results
WHERE (Types.Description LIKE '%Cat%') AND RowNumber BETWEEN 1 AND 10
I think this is your problem:
WHERE (types.description LIKE '%Cat%')
You can't do this because you are actually selecting from your derived table named Results and you aliased the column as TypeDesc.
So it should be
WHERE (results.typeDesc LIKE '%Cat%')
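A small reproduction of the scoping problem, in SQLite from Python with a cut-down, hypothetical schema (only Instruments and Types). SQLite reports the mistake as "no such column" rather than SQL Server's "could not be bound", but the cause is the same: outside the derived table, only its column names (here TypeDesc) are visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Types (ID INTEGER, Description TEXT);
CREATE TABLE Instruments (Tag TEXT, Type INTEGER);
INSERT INTO Types VALUES (1, 'Catalytic heater'), (2, 'Pressure gauge');
INSERT INTO Instruments VALUES ('T-01', 1), ('T-02', 2), ('T-03', 1);
""")

# Same nesting as the question: RawResults wrapped by Results.
inner = """
SELECT ROW_NUMBER() OVER (ORDER BY Tag) AS RowNumber, *
FROM (
    SELECT Instruments.Tag, Types.Description AS TypeDesc
    FROM Instruments
    INNER JOIN Types ON Instruments.Type = Types.ID
    WHERE Types.Description LIKE '%Cat%'
) RawResults
"""

# Outer filter on the base table: Types is not in scope at this level.
bad = ("SELECT * FROM (" + inner + ") Results "
       "WHERE Types.Description LIKE '%Cat%' AND RowNumber BETWEEN 1 AND 10")
# Outer filter on the derived table's column alias instead.
good = ("SELECT * FROM (" + inner + ") Results "
        "WHERE Results.TypeDesc LIKE '%Cat%' AND RowNumber BETWEEN 1 AND 10 "
        "ORDER BY RowNumber")

try:
    conn.execute(bad)
    bad_failed = False
except sqlite3.OperationalError as exc:
    bad_failed = True
    print("bad query:", exc)

rows = conn.execute(good).fetchall()
print(rows)
```

Since the inner query already filters on Types.Description, the outer LIKE is redundant here; keeping it only matters if the inner filter is removed.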

Select last record out of grouped records

I have this code and I want someone to help me change it to a grouped query that picks records from the bottom of each group.
SELECT *
FROM dbo.users_pics INNER JOIN profile
ON users_pics.email = profile.email
Left Join photo_comment
On users_pics.u_pic_id = photo_comment.pic_id
WHERE users_pics.wardrobe = MMColParam
ORDER BY u_pic_id asc
What I mean is that I have groups of records, and I want to select only one record per group, taken from the bottom. For example, if I have 10 records with the name "John", I want to select the last "John" out of the 10, and the same for every other group.
I'm going to presume that your users table contains one row per user, that each user has a single profile, and that your photo_comment table can contain multiple comments.
Depending on your RDBMS, you can do this a number of ways. ROW_NUMBER can often be a quick way of doing this if you're using a database that supports window functions, such as SQL Server or Oracle.
A generic solution to this is to join the table back to itself using the MAX aggregate. This is dependent on having a field to determine which record is the max. Generally speaking, that would be an identity/auto number field or a time stamp field.
Here is the basic concept using photo_comment_id as your determining column:
SELECT *
FROM dbo.users_pics
INNER JOIN profile
  ON users_pics.email = profile.email
LEFT JOIN (
  SELECT pic_id, MAX(photo_comment_id) max_photo_comment_id
  FROM photo_comment
  GROUP BY pic_id
) max_photo_comment ON users_pics.u_pic_id = max_photo_comment.pic_id
LEFT JOIN photo_comment ON
  max_photo_comment.pic_id = photo_comment.pic_id AND
  max_photo_comment.max_photo_comment_id = photo_comment.photo_comment_id
WHERE users_pics.wardrobe = MMColParam
ORDER BY u_pic_id asc
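A minimal sketch of the MAX self-join in SQLite via Python, with hypothetical sample rows and only the columns the query touches. The dbo. schema prefix is dropped, MMColParam (the page's parameter) is replaced by a ? placeholder bound to a made-up wardrobe value, and the inner aggregate reads from photo_comment, the table whose latest comment we want.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users_pics (u_pic_id INTEGER, email TEXT, wardrobe TEXT);
CREATE TABLE profile (email TEXT, name TEXT);
CREATE TABLE photo_comment (photo_comment_id INTEGER, pic_id INTEGER, comment TEXT);
INSERT INTO users_pics VALUES (1, 'john@x.com', 'w1'), (2, 'john@x.com', 'w1'), (3, 'mary@x.com', 'w1');
INSERT INTO profile VALUES ('john@x.com', 'John'), ('mary@x.com', 'Mary');
-- hypothetical comments: three on pic 1, one on pic 2, none on pic 3
INSERT INTO photo_comment VALUES
    (10, 1, 'first'), (11, 1, 'second'), (12, 1, 'last on pic 1'),
    (13, 2, 'only on pic 2');
""")

rows = conn.execute("""
SELECT users_pics.u_pic_id, profile.name, photo_comment.comment
FROM users_pics
INNER JOIN profile ON users_pics.email = profile.email
LEFT JOIN (
    -- highest comment id per picture
    SELECT pic_id, MAX(photo_comment_id) AS max_photo_comment_id
    FROM photo_comment
    GROUP BY pic_id
) max_photo_comment ON users_pics.u_pic_id = max_photo_comment.pic_id
-- join back to fetch the full row for that id
LEFT JOIN photo_comment
  ON max_photo_comment.pic_id = photo_comment.pic_id
 AND max_photo_comment.max_photo_comment_id = photo_comment.photo_comment_id
WHERE users_pics.wardrobe = ?
ORDER BY users_pics.u_pic_id ASC
""", ("w1",)).fetchall()
print(rows)
```

Each picture appears once with its latest comment, and picture 3, which has no comments, is kept with a NULL comment because both joins are LEFT JOINs.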
If your database supports ROW_NUMBER, then you can do this as well (still using the photo_comment_id field):
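SELECT *
FROM (
  -- in practice list the columns you need: SELECT * inside a derived table
  -- fails in SQL Server if two joined tables share a column name (e.g. email)
  SELECT *,
    -- partition by the picture rather than photo_comment.pic_id: with a LEFT JOIN,
    -- pictures without comments would otherwise all share one NULL partition
    ROW_NUMBER() OVER (PARTITION BY users_pics.u_pic_id
                       ORDER BY photo_comment.photo_comment_id DESC) rn
  FROM dbo.users_pics INNER JOIN profile
    ON users_pics.email = profile.email
  LEFT JOIN photo_comment
    ON users_pics.u_pic_id = photo_comment.pic_id
  WHERE users_pics.wardrobe = MMColParam
) t
WHERE rn = 1
ORDER BY u_pic_id asc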