Dynamic columns in sql Join condition - sql

Customer
customer_id | customer_name | customer_city | customer_number
---------------------------------------------------------------
1 | john | sanjose | 978234
2 | chris | newyork | 293
3 | mary | madrid | 342943
4 | tom | bangkok | 8627093
---------------------------------------------------------------
Data
data_id | data_name | data_city | data_number | data_cust_id | customer_id
--------------------------------------------------------------------------------------------
1 | abc | xyz | 990 | 1 | NULL
2 | john | sanjose | 978234 | 1 | NULL
3 | mary | madrid | 8627093 | 3 | NULL
4 | tom | LA | 7729 | 4 | NULL
ActionType
action_id | action_description
-----------------------------------
1 | customer_name
2 | customer_number
3 | customer_city
DataToAction
id | data_id | action_id
--------------------------
1 | 1 | 1
2 | 1 | 2
4 | 2 | 1
5 | 2 | 2
6 | 2 | 3
7 | 3 | 1
8 | 3 | 2
9 | 4 | 1
There are 4 tables -
Customer - Has customer datails
Data - Raw data pulled from an external source (has customer data and others)
ActionType - Has the column names which will be used in a join condition
DataToAction - For each of the raw data row in Data table, the columns to be used in the join is specified here.
Objective - To populate customer_id in 'Data' table.
I need something like this
UPDATE D
SET D.customer_id = C.customer_id
FROM Data D
INNER JOIN Customer C on D.data_cust_id = C.customer_id
WHERE *("GET THE COLUMNS TO BE MATCHED FROM DATATOACTION TABLE AND USE HERE")*
For eg., for Data id 1, i will update customer_id based on customer_name & customer_number, for data id 2 i will udpate customer_id based on customer_name, customer_number & customer_city and so on.
How do I apply the dynamic column conditions in the where clause for each of the row wherein the columns to be matched are specified in a different table.

Well the question is quite unclear. Can u elaborate the final resulset.
Purpose of ActionType table??
UPDATE D
SET D.customer_id = C.customer_id
FROM Data D
INNER JOIN Customer C on D.data_cust_id = C.customer_id
INNER JOIN DataToAction DA ON DA.data_id = D.data_id

Related

Select from a concatenation of two columns after a left join

Problem description
Let the tables C and V have those values
>> Table V <<
| UnID | BillID | ProductDesc | Value | ... |
| 1 | 1 | 'Orange Juice' | 3.05 | ... |
| 1 | 1 | 'Apple Juice' | 3.05 | ... |
| 1 | 2 | 'Pizza' | 12.05 | ... |
| 1 | 2 | 'Chocolates' | 9.98 | ... |
| 1 | 2 | 'Honey' | 15.98 | ... |
| 1 | 3 | 'Bread' | 3.98 | ... |
| 2 | 1 | 'Yogurt' | 8.55 | ... |
| 2 | 1 | 'Ice Cream' | 7.05 | ... |
| 2 | 1 | 'Beer' | 9.98 | ... |
| 2 | 2 | 'League of Legends RP' | 40.00 | ... |
>> Table C <<
| UnID | BillID | ClientName | ... |
| 1 | 1 | 'Alexander' | ... |
| 1 | 2 | 'Tom' | ... |
| 1 | 3 | 'Julia' | ... |
| 2 | 1 | 'Tom' | ... |
| 2 | 2 | 'Alexander' | ... |
Table C have the values of each product, which is associated with a bill number. Table V has the relationship between the client name and the bill number. However, the bill number has a counter that is dependent on the UnId, which is the store unity ID. That being said, each store has it`s own Bill number 1, number 2, etc. Also, the number of bills from each store are not equal.
Solution description
I'm trying to make select between the C left join V without sucess. Because each BillID is dependent on the UnID, I have to make the join considering the concatenation between those two columns.
I've used this script, but it gives me an error.
SELECT
SUM(C.Value),
V.ClientName
FROM
C
LEFT JOIN
V
ON
CONCAT(C.UnID, C.BillID) = CONCAT(V.UnID, V.BillID)
GROUP BY
V.ClientName
and SQL server returns me this 'CONCAT' is not a recognized built-in function name.
I'm using Microsoft SQL Server 2008 R2
Is the use of CONCAT wrong? Or is it the way I tried to SELECT? Could you give me a hand?
[OBS: The tables I've present you are just for the purpose of explaining my difficulties. That being said, if you find any errors in the explanation, please let me know to correct them.]
You should be joining on the equality of the UnID and BillID columns in the two tables:
SELECT
c.ClientName,
COALESCE(SUM(v.Value), 0) AS total
FROM C c
LEFT JOIN V v
ON c.UnID = v.UnID AND
c.BillID = v.BillID
GROUP BY
c.ClientName;
In theory you could try joining on CONCAT(UnID, BillID). However, you could run into problems. For example, UnID = 1 with BillID = 23 would, concatenated together, be the same as UnID = 12 and BillID = 3.
Note: We wrap the sum with COALESCE, because should a given client have no entries in the V table, the sum would return NULL, which we then replace with zero.
concat is only available in sql server 2012.
Here's one option.
SELECT
SUM(C.Value),
V.ClientName
FROM
C
LEFT JOIN
V
ON
cast(C.UnID as varchar(100)) + cast(C.BillID as varchar(100)) = cast(V.UnID as varchar(100)) + cast(V.BillID as varchar(100))
GROUP BY
V.ClientName

SQL: Cascading conditions on Join

I have found a few similar questions to this on SO but nothing which applies to my situation.
I have a large dataset with hundreds of millions of rows in Table 1 and am looking for the most efficient way to run the following query. I am using Google BigQuery but I think this is a general SQL question applicable to any DBMS?
I need to apply an owner to every row in Table 1. I want to join in the following priority:
1: if item_id matches an identifier in Table 2
2: if no item_id matches try match on item_name
3: if no item_id or item_name matches try match on item_division
4: if no item_division matches, return null
Table 1 - Datapoints:
| id | item_id | item_name | item_division | units | revenue
|----|---------|-----------|---------------|-------|---------
| 1 | xyz | pen | UK | 10 | 100
| 2 | pqr | cat | US | 15 | 120
| 3 | asd | dog | US | 12 | 105
| 4 | xcv | hat | UK | 11 | 140
| 5 | bnm | cow | UK | 14 | 150
Table 2 - Identifiers:
| id | type | code | owner |
|----|---------|-----------|-------|
| 1 | id | xyz | bob |
| 2 | name | cat | dave |
| 3 | division| UK | alice |
| 4 | name | pen | erica |
| 5 | id | xcv | fred |
Desired output:
| id | item_id | item_name | item_division | units | revenue | owner |
|----|---------|-----------|---------------|-------|---------|-------|
| 1 | xyz | pen | UK | 10 | 100 | bob | <- id
| 2 | pqr | cat | US | 15 | 120 | dave | <- code
| 3 | asd | dog | US | 12 | 105 | null | <- none
| 4 | xcv | hat | UK | 11 | 140 | fred | <- id
| 5 | bnm | cow | UK | 14 | 150 | alice | <- division
My attempts so far have involved multiple joining the table onto itself and I fear it is becoming hugely inefficient.
Any help much appreciated.
Another option for BigQuery Standard SQL
#standardSQL
SELECT ARRAY_AGG(a)[OFFSET(0)].*,
ARRAY_AGG(owner
ORDER BY CASE
WHEN type = 'id' THEN 1
WHEN type = 'name' THEN 2
WHEN type = 'division' THEN 3
END
LIMIT 1
)[OFFSET(0)] owner
FROM Datapoints a
JOIN Identifiers b
ON (a.item_id = b.code AND b.type = 'id')
OR (a.item_name = b.code AND b.type = 'name')
OR (a.item_division = b.code AND b.type = 'division')
GROUP BY a.id
ORDER BY a.id
It leaves out entries which k=have no owners - like in below result (id=3 is out as it has no owner)
Row id item_id item_name item_division units revenue owner
1 1 xyz pen UK 10 100 bob
2 2 pqr cat US 15 120 dave
3 4 xcv hat UK 11 140 fred
4 5 bnm cow UK 14 150 alice
I am using the following query (thanks #Barmar) but want to know if there is a more efficient way in Google BigQuery:
SELECT a.*, COALESCE(b.owner,c.owner,d.owner) owner FROM datapoints a
LEFT JOIN identifiers b on a.item_id = b.code and b.type = 'id'
LEFT JOIN identifiers c on a.item_name = c.code and c.type = 'name'
LEFT JOIN identifiers d on a.item_division = d.code and d.type = 'division'
I'm not sure if BigQuery optimizes today a query like this - but at least you would be writing a query that gives strong hints to not run the subqueries when not needed:
#standardSQL
SELECT COALESCE(
null
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.login=a.user)
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.id = SAFE_CAST(user AS INT64))
)
FROM (SELECT '15229281' user) a
4.2s elapsed, 683 GB processed
{"action":"started"}
For example, the following query took a long time to run, but BigQuery could optimize its execution massively in the future (depending on how frequently users needed an operation like this):
#standardSQL
SELECT COALESCE(
"hello"
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.login=a.user)
, (SELECT MIN(payload)
FROM `githubarchive.year.2016`
WHERE actor.id = SAFE_CAST(user AS INT64))
)
FROM (SELECT actor.login user FROM `githubarchive.year.2016` LIMIT 10) a
114.7s elapsed, 683 GB processed
hello
hello
hello
hello
hello
hello
hello
hello
hello
hello

Only include grouped observations where event order is valid

I have a table of dates for eye exams and eye wear purchases for individuals. I only want to keep instances where individuals bought their eye wear following an eye exam. In the example below, I would want to keep person 1, events 2 and 3 for person 2, person 3, but not person 4. How can I do this in SQL server?
| Person | Event | Order |
| 1 | Exam | 1 |
| 1 | Eyewear| 2 |
| 2 | Eyewear| 1 |
| 2 | Exam | 2 |
| 2 | Eyewear| 3 |
| 3 | Exam | 1 |
| 3 | Eyewear| 2 |
| 4 | Eyewear| 1 |
| 4 | Exam | 2 |
The final result would look like
| Person | Event | Order |
| 1 | Exam | 1 |
| 1 | Eyewear| 2 |
| 2 | Exam | 2 |
| 2 | Eyewear| 3 |
| 3 | Exam | 1 |
| 3 | Eyewear| 2 |
Self join should work...
select
t.Person
,t.Event
,t.[Order]
from
yourTable t
inner join
yourTable t2 on t2.Person = t.Person
and t2.[Order] = (t.[Order] +1)
where
t2.Event = 'Eyewear'
and t.Event = 'Exam'
I haven't tried to optimize it but this seems to work:
create table t(
person varchar(10),
event varchar(10),
[order] varchar(10)
);
insert into t values
('1','Exam','1'),
('1','Eyewear','2'),
('2','Eyewear','1'),
('2','Exam','2'),
('2','Eyewear','3'),
('3','Exam','1'),
('3','Eyewear','2'),
('4','Eyewear','1'),
('4','Exam','2');
with xxx(person,event_a,seq_a,event_b,seq_b) as (
select a.person,a.event,a.[order],b.event,b.[order]
from t a join t b
on a.person = b.person
and a.[order] < b.[order]
and a.event like 'exam'
and b.event like 'eyewear'
)
select person,event_a event,seq_a [order] from xxx
union
select person,event_b event,seq_b [order] from xxx
order by 1,3

Check if relation exists and return true or false

I have 3 tables, Category Step and CategoryStep, where CategoryStep relates the two other tables together. I want to return all categories with a true/false column whether or not the relation exists in CategoryStep based on a StepID.
The schema for the tables is simple,
Category:
CategoryID | CategoryName
Step:
StepID | StepName
CategoryStep:
CategoryStepID | CategoryID | StepID
When trying to get results based on StepID, I only get the relations that exist, and not ones that don't.
SELECT [CategoryID], [Category], CAST(CASE WHEN [CategoryStep].[CategoryStep] IS NULL THEN 0 ELSE 1 END AS BIT) AS related
FROM Category
LEFT JOIN CategoryStep ON Category.CategoryID = CategoryStep.CategoryID
INNER JOIN Step ON CategoryStep.StepID = Step.StepID
WHERE Step.StepID = 2
Step Table:
|StepID | StepName
|-------|---------
| 1 | StepOne
| 2 | StepTwo
| 3 | StepThree
Category Table:
| CategoryID | CategoryName
|------------|-------------
| 1 | Holidays
| 2 | States
| 3 | Cities
| 4 | Animals
| 5 | Food
CategoryStep Table
| CategoryStepID | CategoryID | StepID
|----------------|------------|-------
| 1 | 1 | 1
| 2 | 1 | 2 <--
| 3 | 2 | 1
| 4 | 2 | 3
| 5 | 3 | 2 <--
| 6 | 4 | 1
| 7 | 4 | 2 <--
| 8 | 4 | 3
| 9 | 5 | 1
| 10 | 5 | 3
So, if I was looking for StepID = 2 the result table I am looking for is:
| CategoryID | Category | Related
|------------|----------|--------
| 1 | Holidays | 1
| 2 | States | 0
| 3 | Cities | 1
| 4 | Animals | 1
| 5 | Food | 0
Try replacing the INNER JOIN with a LEFT JOIN.
Update:
The fatal flaw with your original attempt was the WHERE clause. You were performing the correct LEFT JOIN, but the WHERE clause was filtering off category records which did not match. In the query below, I moved the check for step ID into the join condition, where it belongs.
SELECT [CategoryID], [Category],
CAST(CASE WHEN [CategoryStep].[CategoryStep] IS NULL THEN 0 ELSE 1 END AS BIT) AS related
FROM Category
LEFT JOIN CategoryStep
ON Category.CategoryID = CategoryStep.CategoryID AND
CategoryStep.StepCodeID = 2
LEFT JOIN Step
ON CategoryStep.StepID = Step.StepID

Getting Sum of MasterTable's amount which joins to DetailTable

I have two tables:
1. Master
| ID | Name | Amount |
|-----|--------|--------|
| 1 | a | 5000 |
| 2 | b | 10000 |
| 3 | c | 5000 |
| 4 | d | 8000 |
2. Detail
| ID |MasterID| PID | Qty |
|-----|--------|-------|------|
| 1 | 1 | 1 | 10 |
| 2 | 1 | 2 | 20 |
| 3 | 2 | 2 | 60 |
| 4 | 2 | 3 | 10 |
| 5 | 3 | 4 | 100 |
| 6 | 4 | 1 | 20 |
| 7 | 4 | 3 | 40 |
I want to select sum(Amount) from Master which joins to Deatil where Detail.PID in (1,2,3)
So I execute the following query:
SELECT SUM(Amount) FROM Master M INNER JOIN Detail D ON M.ID = D.MasterID WHERE D.PID IN (1,2,3)
Result should be 20000. But I am getting 40000
See this fiddle. Any suggestion?
You are getting exactly double the amount because the detail table has two occurences for each of the PIDs in the WHERE clause.
See demo
Use
SELECT SUM(Amount)
FROM Master M
WHERE M.ID IN (
SELECT DISTINCT MasterID
FROM DETAIL
WHERE PID IN (1,2,3) )
What is the requirement of joining the master table with details when you have all your columns are in Master table.
Also, isnt there any FK relationhsip defined on these tables. Looking at your data it seems to me that there should be FK on detail table for MasterId. If that is the case then you do not need join the table at all.
Also, in case you want to make sure that you have records in details table for the records for which you need sum and there is no FK relationship. Then you could give a try for exists instead of join.