SQL Server: joining/merging tables

Assume there are three tables: A, B, and C.
Each table contains the following attributes:
Table A: "date", "id", "tv_sales_amt"
Table B: "date", "id", "newspaper_sales_amt"
Table C: "date", "id", "radio_sales_amt"
Using join operators, I'm trying to achieve a single table view with:
Result table:
"date", "id", "tv_sales_amt", "newspaper_sales_amt", "radio_sales_amt"
Desired look of the result table:
date id tv_sales_amt newspaper_sales_amt radio_sales_amt
--------------------------------------------------------------------
20190101 012C 2000 1850 NULL
20190102 102D 1000 NULL 1300
.
.
.
Here are some queries that I've tried:
Query #1:
SELECT
A.date, A.id, tv_sales_amt, newspaper_sales_amt, radio_sales_amt
FROM A
INNER JOIN B ON A.id = B.id
INNER JOIN C ON A.id = C.id
Using inner joins, I get duplicated values, which is understandable but not what I'm looking for.
Query #2:
SELECT
A.date, A.id, tv_sales_amt, newspaper_sales_amt, radio_sales_amt
FROM A
FULL OUTER JOIN B ON A.id = B.id
FULL OUTER JOIN C ON A.id = C.id
Since an inner join would only return rows from tables B and C (newspaper_sales_amt and radio_sales_amt) that intersect with table A, I tried a full outer join in the hope that it would give me an overview of the entire result set, even if it includes NULL values.
With both options that I've tried, I wasn't able to get the expected result (Desired look described above).
Would someone be able to tell me what I'm doing wrong here?
I'm using the latest version of SQL Server Management Studio.
I know there should be many NULL values in an overview of tv_sales_amt, newspaper_sales_amt, and radio_sales_amt, but currently I get no NULL values, only duplicates.

Your description implies that date should also be part of the join conditions.
select coalesce(a.date, b.date, c.date) [date],
coalesce(a.id, b.id, c.id) id,
a.tv_sales_amt, b.newspaper_sales_amt, c.radio_sales_amt
from tableA a
full join tableB b on a.id = b.id and a.date = b.date
full join tableC c on (c.id = b.id and c.date = b.date) or
(c.id = a.id and c.date = a.date);
EDIT: Here is a DbFiddle demo. Try removing coalesce() for date or id (removing it from only one of them makes the effect easier to see).
EDIT: Maybe this shows the need for coalesce() better:
select coalesce(a.date, b.date, c.date) [date],
a.id idA, b.id idB, c.id idC,
a.tv_sales_amt, b.newspaper_sales_amt, c.radio_sales_amt
from tableA a
full join tableB b on a.id = b.id and a.date = b.date
full join tableC c on (c.id = b.id and c.date = b.date) or
(c.id = a.id and c.date = a.date);

Full outer joins are tricky. I would suggest:
SELECT COALESCE(A.DATE, B.DATE, C.DATE) as date,
COALESCE(A.ID, B.ID, C.ID) as id,
A.tv_sales_amt, B.newspaper_sales_amt, C.radio_sales_amt
FROM A FULL JOIN
B
ON B.id = A.id AND
B.date = A.date FULL JOIN
C
ON C.id = COALESCE(A.id, B.id) AND
C.date = COALESCE(A.date, B.date);
I find that using COALESCE() in the ON clause simplifies adding more conditions. From a performance perspective, both COALESCE() and OR are pretty bad.
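A different way to sidestep the tricky FULL JOIN conditions is to build the list of (date, id) keys first and then LEFT JOIN each table to it. The sketch below is not taken from the answers above. It runs the idea through Python's sqlite3 module only so that it is self-contained, it assumes each (date, id) pair appears at most once per table, and the '555X' row is made-up sample data added to show a key that exists only in C.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (date TEXT, id TEXT, tv_sales_amt INTEGER);
    CREATE TABLE B (date TEXT, id TEXT, newspaper_sales_amt INTEGER);
    CREATE TABLE C (date TEXT, id TEXT, radio_sales_amt INTEGER);
    INSERT INTO A VALUES ('20190101', '012C', 2000), ('20190102', '102D', 1000);
    INSERT INTO B VALUES ('20190101', '012C', 1850);
    -- '555X' is hypothetical sample data: a key present only in C
    INSERT INTO C VALUES ('20190102', '102D', 1300), ('20190103', '555X', 700);
""")

# k holds every distinct (date, id) pair seen in any table (UNION removes
# duplicates); each table is then attached with a plain LEFT JOIN on both columns.
query = """
    SELECT k.date, k.id, A.tv_sales_amt, B.newspaper_sales_amt, C.radio_sales_amt
    FROM (
        SELECT date, id FROM A
        UNION SELECT date, id FROM B
        UNION SELECT date, id FROM C
    ) AS k
    LEFT JOIN A ON A.date = k.date AND A.id = k.id
    LEFT JOIN B ON B.date = k.date AND B.id = k.id
    LEFT JOIN C ON C.date = k.date AND C.id = k.id
    ORDER BY k.date, k.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The same SELECT should carry over to SQL Server with little change (date may need brackets there, as in the first answer), though that has not been verified here.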

Related

Sql several rows greater than value of other table

My model is: A has many B. These tables are related by A.id and B.item_id. B has a column called created_at. I need to select all rows from A such that B.created_at is greater than '2016-09-1'. I would like to know if it is possible to solve this using a query similar to this one:
SELECT * FROM A
WHERE (
(SELECT B.created_at
FROM B
WHERE B.item_id = A.id)
>= '2016-09-1' )
You can do this with a JOIN and selecting only from A:
Select A.*
From A
Join B On A.id = B.item_id
Where B.created_at > '2016-09-01'
You need to select from A all rows such that B.created_at is greater than '2016-09-1':
select * from A where exists (SELECT 1
FROM B
WHERE B.item_id = A.id
AND B.created_at >= '2016-09-1')
In general this can be done with an INNER JOIN containing your date filter criteria. Something like:
SELECT A.*
FROM A
INNER JOIN B
ON A.id = B.item_id
AND B.created_at > '2016-09-01'
This would be a T-SQL method for doing this.
You can also apply a simple WHERE clause - you'd want to check the execution plan to see if either of these options make a real difference...
SELECT A.*
FROM A
INNER JOIN B
ON A.id = B.item_id
WHERE B.created_at > '2016-09-01'
If you want to use a subset from table B first... you can do...
SELECT A.*
FROM A
INNER JOIN (
SELECT B.*
FROM B
WHERE B.created_at > '2016-09-01'
) Bsub
ON A.id = Bsub.item_id
All of these would work with T-SQL, can't speak to other SQL-like languages.
select A.*
from A
left join B
on B.item_id=A.id
where B.created_at >= '2016-09-1'
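One difference between the JOIN and EXISTS answers shows up when a row in A has several matching rows in B: the join repeats the A row once per match, while EXISTS returns it once. A minimal sketch of that behaviour, run through Python's sqlite3 module with made-up sample rows (the name column and all values are assumptions for illustration only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE B (item_id INTEGER, created_at TEXT);
    INSERT INTO A VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO B VALUES
        (1, '2016-09-05'), (1, '2016-10-01'),  -- two recent rows for A.id = 1
        (2, '2016-08-15'),                     -- only an old row for A.id = 2
        (3, '2016-09-20');
""")

# JOIN: one output row per matching (A, B) pair.
join_ids = [r[0] for r in conn.execute("""
    SELECT A.id FROM A
    JOIN B ON B.item_id = A.id
    WHERE B.created_at > '2016-09-01'
    ORDER BY A.id
""")]

# EXISTS: each A row appears at most once, however many B rows match.
exists_ids = [r[0] for r in conn.execute("""
    SELECT A.id FROM A
    WHERE EXISTS (SELECT 1 FROM B
                  WHERE B.item_id = A.id AND B.created_at > '2016-09-01')
    ORDER BY A.id
""")]

print(join_ids)    # [1, 1, 3]
print(exists_ids)  # [1, 3]
```

If the join form is preferred, SELECT DISTINCT A.* would be one way to collapse the repeats.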

How to find if LEFT JOIN joined an actual row, or placeholder NULL values?

Suppose I issue a query like this:
SELECT a.x, b.y FROM a LEFT JOIN b ON b.id = a.id
I also want to know whether a row from b was actually joined or whether there are just placeholder NULL values supplied by the LEFT JOIN. I guess I can determine it by comparing the values of a.id and b.id in the result, but is there a way to do this in the query itself?
I.e. I'd want something like
SELECT a.x, b.y, b_is_actually_joined FROM a LEFT JOIN b ON b.id = a.id
where values in the column b_is_actually_joined are 1 or 0 (for example).
Just check for NULL b.id:
SELECT a.x, b.y, b.id IS NOT NULL AS b_is_actually_joined
FROM a
LEFT JOIN b ON b.id = a.id
For Oracle SQL you can use NVL2 function:
SELECT a.id, b.*, NVL2(b.id, 1, 0) AS b_is_actually_joined
FROM a
LEFT JOIN b ON b.id = a.id
SQL Fiddle
This should work in MS SQL Server:
select CAST((coalesce(b.id, 0)) as bit) as b_is_actually_joined FROM a
LEFT JOIN b ON b.id = a.id
I am unaware of a standard SQL solution for this.
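A CASE expression on the join key is one candidate for a portable form, since CASE and IS NOT NULL are widely supported. The sketch below checks it only against SQLite via Python's sqlite3 module, and it assumes b.id is never NULL in the b table itself. The sample rows are invented; note the joined b row whose y is NULL, which is why the test is on b.id rather than b.y.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, x TEXT);
    CREATE TABLE b (id INTEGER, y TEXT);
    INSERT INTO a VALUES (1, 'x1'), (2, 'x2');
    INSERT INTO b VALUES (1, NULL);  -- a real joined row whose y happens to be NULL
""")

# The flag is 1 when a b row matched (b.id is populated), 0 for the NULL placeholder.
flags = conn.execute("""
    SELECT a.x, b.y,
           CASE WHEN b.id IS NOT NULL THEN 1 ELSE 0 END AS b_is_actually_joined
    FROM a LEFT JOIN b ON b.id = a.id
    ORDER BY a.id
""").fetchall()
print(flags)
```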

Using Summary Data as a Parameter in SQL

I'm new to SQL and am using Access to run queries that Excel can't really handle. Here's the basic design of the query:
SELECT A.ID, A.Description, A.Location, B.ID, B.Quantity, B.Location
FROM A LEFT JOIN B ON A.ID = B.ID
In table B, location is the same value in every row. I want to retain the left join above, but limit the resulting rows from table A to those whose location matches the location value in table B. In my mind this would be a WHERE clause in which A.Location = max(B.Location) or something like that.
Any ideas?
If you want to limit the resulting values in table A to whatever the location value is in table B, why can't you simply use the join based on location also?
SELECT A.ID, A.Description, A.Location, B.ID, B.Quantity, B.Location
FROM A LEFT JOIN B
ON A.ID = B.ID
AND A.location = B.location
You can use a DMax expression to fetch the duplicated non-Null value of B.Location. And that expression can be used in the WHERE clause to limit A rows to only those with matching [Location]:
SELECT A.ID, A.Description, A.Location, B.ID, B.Quantity, B.Location
FROM A LEFT JOIN B ON A.ID = B.ID
WHERE A.Location = DMax("[Location]", "B");
If you prefer not to use DMax since it is Access-specific, you can do it this way instead:
SELECT A.ID, A.Description, A.Location, B.ID, B.Quantity, B.Location
FROM A LEFT JOIN B ON A.ID = B.ID
WHERE A.Location = (SELECT Max([Location]) FROM B);
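The answers above behave differently: a location condition in the ON clause keeps every row of A and only blanks out the B columns, whereas the WHERE form actually removes A rows. A small sketch of the two, run on SQLite through Python's sqlite3 module (the sample rows are made up, and Access may differ in details):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER, Description TEXT, Location TEXT);
    CREATE TABLE B (ID INTEGER, Quantity INTEGER, Location TEXT);
    INSERT INTO A VALUES (1, 'bolt', 'East'), (2, 'nut', 'West'), (3, 'gear', 'East');
    INSERT INTO B VALUES (1, 10, 'East'), (2, 5, 'East');
""")

# Condition in ON: all three A rows survive; non-matching ones get NULL from B.
on_rows = conn.execute("""
    SELECT A.ID, B.Quantity
    FROM A LEFT JOIN B ON A.ID = B.ID AND A.Location = B.Location
    ORDER BY A.ID
""").fetchall()

# Condition in WHERE: only A rows at B's location remain.
where_rows = conn.execute("""
    SELECT A.ID, B.Quantity
    FROM A LEFT JOIN B ON A.ID = B.ID
    WHERE A.Location = (SELECT MAX(Location) FROM B)
    ORDER BY A.ID
""").fetchall()

print(on_rows)     # [(1, 10), (2, None), (3, None)]
print(where_rows)  # [(1, 10), (3, None)]
```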

Alternative to an outer join to a subquery?

Apparently outer joins to a subquery are not allowed by Oracle. For each row in table A, I'm trying to find the row in table B with the same ID and the latest date.
Something like this:
SELECT a.*, b.date, b.val1, b.val2
FROM a, b
WHERE b.id (+) = a.id
AND b.date (+) = (SELECT MAX(b.date) FROM a, b WHERE a.id = b.id);
Removing the outer join (+) on b.date allows it to be parsed, but no rows are returned when there are no rows on table B. I need the query to just return NULL in this case. Is there a way around this?
Thanks
I think what you want is this:
SELECT a.*, b.date, b.val1, b.val2
FROM a
LEFT JOIN b ON b.id = a.id
WHERE (b.date is null
or b.date = (SELECT MAX(b2.date) FROM b b2 WHERE a.id = b2.id));
This way, the outer join is just performed on id. Then we're filtering out all of the rows where b.date is not the max for the corresponding row in a.
As an aside, you'll note that I removed a from the sub-query. As originally written, the sub-query returned the largest date in b that had a corresponding row in a. The same value would be used for every row of the outer query. The revised version makes the sub-query correlate to the outer query (i.e. it will get the corresponding max(date) for each row returned).
I already voted for Allan's answer, but just to demonstrate an alternative approach, here's how it can be done with an analytic function:
SELECT * FROM (
SELECT a.*, b.date, b.val1, b.val2,
ROW_NUMBER() over (PARTITION BY a.id ORDER BY b.date DESC) r
FROM a LEFT JOIN b ON a.id=b.id
)
WHERE r=1
This will include only one row for each a.id, even if there are multiple b rows with the maximum date. To include all of them, change ROW_NUMBER to RANK.
How about a scalar subquery?
select a.*, (select max(b.date) from b where b.id = a.id) as b_date
from a;
Edit: You can save the max date to a variable
DECLARE @maxDate as datetime
SET @maxDate = (SELECT MAX(date) FROM b)
SELECT a.*, b.date, b.val1, b.val2
FROM a
LEFT OUTER JOIN b ON a.id = b.id
AND b.date = @maxDate
This may be more or less efficient than Allan's answer, depending on if A has many more rows than B (or vice-versa). If B has a ton of rows, then querying it twice (which my answer does) is probably not the best solution.
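For comparison, the per-id maximum can also be placed directly in the join condition, so rows of a without any b row still come back with NULLs. This is a sketch of that variant on SQLite through Python's sqlite3 module, with invented sample rows. Whether Oracle accepts the same correlated subquery inside an ANSI ON clause depends on the version, so treat it as an assumption to verify there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, date TEXT, val1 INTEGER);
    INSERT INTO a VALUES (1, 'first'), (2, 'second');   -- id 2 has no b rows
    INSERT INTO b VALUES (1, '2020-01-01', 10), (1, '2020-03-01', 30);
""")

# The subquery is correlated on a.id, so each a row gets its own latest b.date.
latest = conn.execute("""
    SELECT a.id, b.date, b.val1
    FROM a
    LEFT JOIN b
           ON b.id = a.id
          AND b.date = (SELECT MAX(b2.date) FROM b b2 WHERE b2.id = a.id)
    ORDER BY a.id
""").fetchall()
print(latest)
```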

Simulate a left join without using "left join"

I need to simulate the left join effect without using the "left join" key.
I have two tables, A and B, both with id and name columns. I would like to select all the dbids on both tables, where the name in A equals the name in B.
I use this to make a synchronization, so at the beginning B is empty (so I will have couples with id from A with a value and id from B is null). Later I will have a mix of couples with value - value and value - null.
Normally it would be:
SELECT A.id, B.id
FROM A left join B
ON A.name = B.name
The problem is that I can't use the left join and wanted to know if/how it is possible to do the same thing.
You can use this approach, but you must be sure that the inner select returns only one row.
SELECT A.id,
(select B.id from B where A.name = B.name) as B_ID
FROM A
Just reverse the tables and use a right join instead.
SELECT A.id,
B.id
FROM B
RIGHT JOIN A
ON A.name = B.name
I'm not familiar with java/jpa. Using pure SQL, here's one approach:
SELECT A.id AS A_id, B.id AS B_id
FROM A INNER JOIN B
ON A.name = B.name
UNION
SELECT id AS A_id, NULL AS B_id
FROM A
WHERE name NOT IN ( SELECT name FROM B );
In SQL Server, for example, you can use the *= operator to make a left join:
select A.id, B.id
from A, B
where A.name *= B.name
Other databases might have a slightly different syntax, if such an operator exists at all.
This is the old syntax, used before the join keyword was introduced. You should of course use the join keyword instead if possible. The old syntax might not even work in newer versions of the database.
I can only think of two ways that haven't been given so far. My last three ideas have already been given (boohoo) but I put them here for posterity. I DID think of them without cheating. :-p
Calculate whether B has a match, then provide an extra UNIONed row for the B set to supply the NULL when there is no match.
SELECT A.Id, A.Something, B.Id, B.Whatever, B.SomethingElse
FROM
(
SELECT
A.*,
CASE
WHEN EXISTS (SELECT * FROM B WHERE A.Id = B.Id) THEN 1
ELSE 0
END Which
FROM A
) A
INNER JOIN (
SELECT 1 Which, B.* FROM B
UNION ALL SELECT 0, B.* FROM B WHERE 1 = 0
) B ON A.Which = B.Which
AND (
A.Which = 0
OR (
A.Which = 1
AND A.Id = b.Id
)
)
A slightly different take on that same query:
SELECT A.Id, B.Id
FROM
(
SELECT
A.*,
CASE
WHEN EXISTS (SELECT * FROM B WHERE A.Id = B.Id) THEN A.Id
ELSE -1 -- a value that does not exist in B
END PseudoId
FROM A
) A
INNER JOIN (
SELECT B.Id PseudoId, B.Id FROM B
UNION ALL SELECT -1, NULL
) B ON A.PseudoId = B.PseudoId
Only for SQL Server specifically. I know, it's really a left join, but it doesn't SAY LEFT in there!
SELECT A.Id, B.Id
FROM
A
OUTER APPLY (
SELECT *
FROM B
WHERE A.Id = B.Id
) B
Get the inner join then UNION the outer join:
SELECT A.Id, B.Id
FROM
A
INNER JOIN B ON A.name = B.name
UNION ALL
SELECT A.Id, NULL
FROM A
WHERE NOT EXISTS (
SELECT *
FROM B
WHERE A.name = B.name
)
Use RIGHT JOIN. That's not a LEFT JOIN!
SELECT A.Id, B.Id
FROM
B
RIGHT JOIN A ON B.name = A.name
Just select the B value in a subquery expression (let's hope there's only one B per A). Multiple columns from B can be their own expressions (YUCKO!):
SELECT A.Id, (SELECT TOP 1 B.Id FROM B WHERE A.Id = B.Id) Bid
FROM A
Anyone using Oracle may need some FROM DUAL clauses in any SELECTs that have no FROM.
You could use subqueries, something like:
select a.id
, nvl((select b.id from b where b.name = a.name), '') as bId
from a
You can use Oracle's (+) operator for a left join:
SELECT A.id, B.id
FROM A, B
WHERE A.name = B.name (+)
See: Oracle "(+)" Operator
SELECT A.id, B.id
FROM A full outer join B
ON A.name = B.name
where A.name is not null
I'm not sure if you just can't use a LEFT JOIN or if you're restricted from using any JOINS at all. But as far as I understand your requirements, an INNER JOIN should work:
SELECT A.id, B.id
FROM A
INNER JOIN B ON A.name = B.name
Simulating left join using pure simple sql:
SELECT A.name
FROM A
where (select count(B.name) from B where A.id = B.id)<1;
In a left join, rows of A with no matching row in B get NULLs, so a count of 0 in B identifies exactly those unmatched rows of A.
Add the rows where A.id = B.id (the inner-join part) to get the complete left-join result.
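To check that the "inner join plus unmatched rows" idea reproduces a real left join, the two can be compared side by side. A minimal sketch on SQLite through Python's sqlite3 module, joining on name as in the question; the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, name TEXT);
    CREATE TABLE B (id INTEGER, name TEXT);
    INSERT INTO A VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
    INSERT INTO B VALUES (10, 'alpha'), (30, 'gamma'), (40, 'delta');
""")

# Reference result from a genuine LEFT JOIN.
left_rows = conn.execute("""
    SELECT A.id, B.id FROM A LEFT JOIN B ON A.name = B.name
""").fetchall()

# Simulation: matched pairs from an INNER JOIN, plus unmatched A rows padded with NULL.
simulated_rows = conn.execute("""
    SELECT A.id, B.id FROM A INNER JOIN B ON A.name = B.name
    UNION ALL
    SELECT A.id, NULL FROM A
    WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.name = A.name)
""").fetchall()

# Sort on A.id in Python so the comparison does not depend on row order.
left_rows.sort(key=lambda r: r[0])
simulated_rows.sort(key=lambda r: r[0])
print(left_rows)       # [(1, 10), (2, None), (3, 30)]
print(simulated_rows)  # [(1, 10), (2, None), (3, 30)]
```

NOT EXISTS is used here rather than NOT IN because NOT IN returns no rows at all when the subquery yields a NULL name.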