Alternative to an outer join to a subquery? - sql

Apparently outer joins to a subquery are not allowed by Oracle. For each row in table A, I'm trying to find the row in table B with the same ID and the latest date.
Something like this:
SELECT a.*, b.date, b.val1, b.val2
FROM a, b
WHERE b.id (+) = a.id
AND b.date (+) = (SELECT MAX(b.date) FROM a, b WHERE a.id = b.id);
Removing the outer join (+) on b.date allows it to be parsed, but no rows are returned when there are no rows on table B. I need the query to just return NULL in this case. Is there a way around this?
Thanks

I think what you want is this:
SELECT a.*, b.date, b.val1, b.val2
FROM a
LEFT JOIN b ON b.id = a.id
WHERE (b.date is null
or b.date = (SELECT MAX(b2.date) FROM b b2 WHERE a.id = b2.id));
This way, the outer join is just performed on id. Then we're filtering out all of the rows where b.date is not the max for the corresponding row in a.
As an aside, you'll note that I removed a from the sub-query. As originally written, the sub-query returned the largest date in b that had a corresponding row in a. The same value would be used for every row of the outer query. The revised version makes the sub-query correlate to the outer query (i.e. it will get the corresponding max(date) for each row returned).
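To see the correlated version behave as described, here is a minimal sketch run against an in-memory SQLite database through Python's sqlite3 module rather than Oracle. The sample rows, the name column, and the column name dt (standing in for date, which is a reserved word in Oracle) are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, dt TEXT, val1 INTEGER);
    INSERT INTO a VALUES (1, 'one'), (2, 'two');
    -- id 1 has two rows in b; id 2 has none
    INSERT INTO b VALUES (1, '2020-01-01', 10), (1, '2020-02-01', 20);
""")

# Outer join on id only, then keep either the "no match" row
# or the b row whose date is the per-id maximum.
latest_rows = conn.execute("""
    SELECT a.id, b.dt, b.val1
    FROM a
    LEFT JOIN b ON b.id = a.id
    WHERE b.dt IS NULL
       OR b.dt = (SELECT MAX(b2.dt) FROM b b2 WHERE b2.id = a.id)
    ORDER BY a.id
""").fetchall()

print(latest_rows)
```

With this data, id 1 comes back with its 2020-02-01 row only, and id 2 comes back once with NULLs for the b columns.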

I already voted for Allan's answer, but just to demonstrate an alternative approach, here's how it can be done with an analytic function:
SELECT * FROM (
SELECT a.*, b.date, b.val1, b.val2,
ROW_NUMBER() over (PARTITION BY a.id ORDER BY b.date DESC) r
FROM a LEFT JOIN b ON a.id=b.id
)
WHERE r=1
This will include only one row for each a.id, even if there are multiple b rows with the maximum date. To include all of them, change ROW_NUMBER to RANK.
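The ROW_NUMBER versus RANK difference is easiest to see with a tie on the latest date. This is a sketch only, using Python's sqlite3 module in place of Oracle (it needs an SQLite build with window functions, 3.25 or later); the sample data and the column name dt are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER, dt TEXT, val1 INTEGER);
    INSERT INTO a VALUES (1), (2);
    -- id 1 has two rows tied on the latest date; id 2 has no rows in b
    INSERT INTO b VALUES (1, '2020-01-01', 10),
                         (1, '2020-02-01', 20),
                         (1, '2020-02-01', 30);
""")

# Same query shape as above; only the ranking function changes.
QUERY = """
    SELECT id, dt, val1 FROM (
        SELECT a.id AS id, b.dt AS dt, b.val1 AS val1,
               {func}() OVER (PARTITION BY a.id ORDER BY b.dt DESC) AS r
        FROM a LEFT JOIN b ON a.id = b.id
    )
    WHERE r = 1
    ORDER BY id, val1
"""

row_number_rows = conn.execute(QUERY.format(func="ROW_NUMBER")).fetchall()
rank_rows = conn.execute(QUERY.format(func="RANK")).fetchall()

print(len(row_number_rows), len(rank_rows))
```

ROW_NUMBER yields exactly one row per a.id (which of the two tied rows it picks for id 1 is arbitrary), while RANK keeps both tied rows. In both cases id 2 still appears once with NULLs.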

How about a scalar subquery?
select a.*, (select max(b.date) from b where b.id = a.id) as b_date
from a;

Edit: You can save the max date to a variable
DECLARE @maxDate as datetime
SET @maxDate = (SELECT MAX(date) FROM b)
SELECT a.*, b.date, b.val1, b.val2
FROM a
LEFT OUTER JOIN b ON a.id = b.id
AND b.date = @maxDate
This may be more or less efficient than Allan's answer, depending on whether A has many more rows than B (or vice versa). If B has a ton of rows, then querying it twice (which my answer does) is probably not the best solution.

Related

SQL Server : joining/ merging tables

Assume there are three total tables A,B, and C
Each table contains following attributes:
Table A: "date", "id", "tv_sales_amt"
Table B: "date", "id", "newspaper_sales_amt"
Table C: "date", "id", "radio_sales_amt"
Using join operators, I'm trying to achieve a single table view with:
Result table:
"date", "id", "tv_sales_amt", "newspaper_sales_amt", "radio_sales_amt"
Desired look of the result table:
date id tv_sales_amt newspaper_sales_amt radio_sales_amt
--------------------------------------------------------------------
20190101 012C 2000 1850 NULL
20190102 102D 1000 NULL 1300
.
.
.
Here are some queries that I've tried:
Query #1:
SELECT
A.date, A.id, tv_sales_amt, newspaper_sales_amt, radio_sales_amt
FROM A
INNER JOIN B ON A.id = B.id
INNER JOIN C ON A.id = C.id
Using an inner join, I get duplicated values, which is understandable but is not what I'm looking for.
Query #2:
SELECT
A.date, A.id, tv_sales_amt, newspaper_sales_amt, radio_sales_amt
FROM A
FULL OUTER JOIN B ON A.id = B.id
FULL OUTER JOIN C ON A.id = C.id
Since an inner join only returns rows from tables B and C (newspaper_sales_amt and radio_sales_amt) that intersect with table A, I tried a full outer join in the hope that it would give me an overview of all the results, even if that includes null values.
With both options that I've tried, I wasn't able to get the expected result (Desired look described above).
Would someone be able to tell me what I'm doing wrong here?
I'm using the latest version of SQL Server Management Studio.
I know there should be lots of null values in an overview of tv_sales_amt, newspaper_sales_amt, and radio_sales_amt, but currently there are no null values, only duplicates.
Your description implies that date should also be part of the join conditions.
select coalesce(a.date, b.date, c.date) [date],
coalesce(a.id, b.id, c.id) id,
a.tv_sales_amt, b.newspaper_sales_amt, c.radio_sales_amt
from tableA a
full join tableB b on a.id = b.id and a.date = b.date
full join tableC c on (c.id = b.id and c.date = b.date) or
(c.id = a.id and c.date = a.date);
EDIT: This is a DbFiddle demo. Try removing coalesce() for date or id (removing it from only one of them makes the effect easier to see).
EDIT: Maybe this shows the need for coalesce() better:
select coalesce(a.date, b.date, c.date) [date],
a.id idA, b.id idB, c.id idC,
a.tv_sales_amt, b.newspaper_sales_amt, c.radio_sales_amt
from tableA a
full join tableB b on a.id = b.id and a.date = b.date
full join tableC c on (c.id = b.id and c.date = b.date) or
(c.id = a.id and c.date = a.date);
Full outer joins are tricky. I would suggest:
SELECT COALESCE(A.DATE, B.DATE, C.DATE) as date,
COALESCE(A.ID, B.ID, C.ID) as id,
A.tv_sales_amt, B.newspaper_sales_amt, C.radio_sales_amt
FROM A FULL JOIN
B
ON B.id = A.id AND
B.date = A.date FULL JOIN
C
ON C.id = COALESCE(A.id, B.id) AND
C.date = COALESCE(A.date, B.date);
I find that using COALESCE() in the ON clause simplifies adding more conditions. From a performance perspective, both COALESCE() and OR are pretty bad.
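As a cross-check on what the full joins are meant to produce (one row per date and id, with NULL where a table has no entry), here is a sketch that swaps in a different technique: UNION ALL followed by GROUP BY. It is not the FULL JOIN query itself. It runs on SQLite through Python's sqlite3 module rather than SQL Server, and it assumes each table has at most one row per date and id, so MAX simply picks the single non-NULL value. The column name sale_date (standing in for date) and the two sample rows, taken from the desired output in the question, are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (sale_date TEXT, id TEXT, tv_sales_amt INTEGER);
    CREATE TABLE B (sale_date TEXT, id TEXT, newspaper_sales_amt INTEGER);
    CREATE TABLE C (sale_date TEXT, id TEXT, radio_sales_amt INTEGER);
    INSERT INTO A VALUES ('20190101', '012C', 2000), ('20190102', '102D', 1000);
    INSERT INTO B VALUES ('20190101', '012C', 1850);
    INSERT INTO C VALUES ('20190102', '102D', 1300);
""")

# Stack the three tables into one shape, padding the missing
# amount columns with NULL, then collapse to one row per key.
combined = conn.execute("""
    SELECT sale_date, id,
           MAX(tv_sales_amt)        AS tv_sales_amt,
           MAX(newspaper_sales_amt) AS newspaper_sales_amt,
           MAX(radio_sales_amt)     AS radio_sales_amt
    FROM (
        SELECT sale_date, id, tv_sales_amt,
               NULL AS newspaper_sales_amt, NULL AS radio_sales_amt
        FROM A
        UNION ALL
        SELECT sale_date, id, NULL, newspaper_sales_amt, NULL FROM B
        UNION ALL
        SELECT sale_date, id, NULL, NULL, radio_sales_amt FROM C
    )
    GROUP BY sale_date, id
    ORDER BY sale_date, id
""").fetchall()

for row in combined:
    print(row)
```

A correct full-join query over the same rows should return the same two lines, which makes this a handy way to spot duplicates or missing NULLs.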

Slow SQL query when join two tables, any way to improve the query speed?

I have a small table A and a large table B. A has all the columns we need except the TYPE column, whose value can only be found in B. But B has too many useless rows.
Now I want to select all rows from A with all their columns plus TYPE. My idea is to use a left join, since it can pick out the rows of B that also exist in A, so we can get the TYPE value.
Oracle:
SELECT B.HOUR, B.LOCATION, B.PRICE, B.TYPE, B.DATE
FROM A LEFT JOIN B
ON A.HOUR=B.HOUR AND A.LOCATION=B.LOCATION AND A.PRICE=B.PRICE AND A.DATE=B.DATE
It is very slow. Besides, I only have read privilege so I cannot create new table. Is there any method to improve it? Thanks.
Without seeing the actual data or being able to add indexes, etc., it is hard to provide advice, but there are a couple of approaches you can try:
a) Use an Exists instead of the JOIN
SELECT B.HOUR, B.LOCATION, B.PRICE, B.TYPE, B.DATE
FROM B
WHERE EXISTS ( SELECT 1 FROM A
WHERE A.HOUR=B.HOUR AND A.LOCATION=B.LOCATION AND A.PRICE=B.PRICE AND A.DATE=B.DATE)
b) Group the larger 'B' table in a CTE or into a temp table
;WITH data as (
SELECT B.HOUR, B.LOCATION, B.PRICE, B.TYPE, B.DATE
FROM B
GROUP BY B.HOUR, B.LOCATION, B.PRICE, B.TYPE, B.DATE
)
SELECT Data.HOUR, Data.LOCATION, Data.PRICE, Data.TYPE, Data.DATE
FROM Data
INNER JOIN A
ON A.HOUR=Data.HOUR AND A.LOCATION=Data.LOCATION AND A.PRICE=Data.PRICE AND A.DATE=Data.DATE
It is possible that neither solution will work, but they may be worth a try
For your query, you want an index on b(hour, location, price, date).
The order of the columns does not really matter.
I think your query should be written as:
SELECT a.*, b.type
FROM A LEFT JOIN
B
ON A.HOUR = B.HOUR AND A.LOCATION = B.LOCATION AND
A.PRICE = B.PRICE AND A.DATE = B.DATE;
The problem with the join is that it's attempting to match on multiple keys, and none of them are integers. Instead of trying to fix that (you don't have permission anyway), use a subquery.
A subquery only takes the A rows and adds the single field in B with the matching criteria:
SELECT A.HOUR, A.LOCATION, A.PRICE
,(SELECT TYPE
FROM B
WHERE B.HOUR = A.HOUR AND B.LOCATION = A.LOCATION
AND B.PRICE = A.PRICE AND B.DATE = A.DATE
) AS [Type]
,A.DATE
FROM A
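Here is a small sketch of that scalar-subquery shape, run on SQLite through Python's sqlite3 module rather than Oracle. The sample rows and the column name dt (standing in for DATE) are assumptions. It also assumes at most one B row matches each A row: Oracle raises an error when a scalar subquery returns more than one row, whereas SQLite quietly takes the first one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (hour INTEGER, location TEXT, price REAL, dt TEXT);
    CREATE TABLE B (hour INTEGER, location TEXT, price REAL, dt TEXT, type TEXT);
    INSERT INTO A VALUES (9, 'north', 10.5, '2020-01-01'),
                         (10, 'south', 12.0, '2020-01-01');
    -- B holds one match for the first A row plus an unrelated row
    INSERT INTO B VALUES (9, 'north', 10.5, '2020-01-01', 'peak'),
                         (11, 'east', 9.0, '2020-01-02', 'off-peak');
""")

# One output row per A row; the subquery looks up TYPE for that row only.
rows_with_type = conn.execute("""
    SELECT A.hour, A.location, A.price, A.dt,
           (SELECT B.type
            FROM B
            WHERE B.hour = A.hour AND B.location = A.location
              AND B.price = A.price AND B.dt = A.dt) AS type
    FROM A
    ORDER BY A.hour
""").fetchall()

for row in rows_with_type:
    print(row)
```

Every A row is returned exactly once, and an A row without a match in B gets NULL for type, which is the same shape the LEFT JOIN version is after.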

SparkSQL - The correlated scalar subquery can only contain equality predicates

I would like to execute the following query with Spark SQL 2.0
SELECT
a.id as id,
(SELECT SUM(b.points)
FROM tableB b
WHERE b.id = a.id AND b.date <= a.date) AS points
FROM tableA a
but I get the following error
The correlated scalar subquery can only contain equality predicates.
Any idea how I can rewrite the query, or use operations between the two dataframes tableA and tableB, to make it work?
select a.id as id,
sum(b.points) as points
from a, b
where a.id = b.id
and b.date <= a.date
group by a.id
;
Skip the sub-select and group by id to ensure a one to one relationship between ids and the sum of b's points column.
Here's a 'down and dirty' example which I used:
select * from a ;
id|date
1|2017-01-22 17:59:49
2|2017-01-22 18:00:00
3|2017-01-22 18:00:05
4|2017-01-22 18:00:11
5|2017-01-22 18:00:15
select * from b ;
id|points|date
1|12|2017-01-21 18:03:20
3|25|2017-01-21 18:03:37
5|17|2017-01-21 18:03:55
2|-1|2017-01-22 18:04:27
4|-4|2017-01-22 18:04:35
5|400|2017-01-20 18:17:31
5|-1000|2017-01-23 18:18:36
Notice that b has three entries of id = 5, two before a.date and one after.
select a.id, sum(b.points) as points from a, b where a.id = b.id and b.date <= a.date group by a.id ;
1|12
3|25
5|417
I also confirmed "group by" is supported: http://spark.apache.org/docs/latest/sql-programming-guide.html#supported-hive-features
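One difference from the original scalar subquery is worth noting: the inner join drops ids 2 and 4 entirely, whereas the subquery would have returned them with a NULL sum. The sketch below reproduces the sample data above in SQLite via Python's sqlite3 module (not Spark) and compares the inner join with a LEFT JOIN variant that keeps those ids. The column name dt stands in for date, and a.id is assumed to be unique; whether Spark 2.0 accepts or plans the LEFT JOIN form well is not something this checks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, dt TEXT);
    CREATE TABLE b (id INTEGER, points INTEGER, dt TEXT);
    INSERT INTO a VALUES
        (1, '2017-01-22 17:59:49'), (2, '2017-01-22 18:00:00'),
        (3, '2017-01-22 18:00:05'), (4, '2017-01-22 18:00:11'),
        (5, '2017-01-22 18:00:15');
    INSERT INTO b VALUES
        (1, 12, '2017-01-21 18:03:20'), (3, 25, '2017-01-21 18:03:37'),
        (5, 17, '2017-01-21 18:03:55'), (2, -1, '2017-01-22 18:04:27'),
        (4, -4, '2017-01-22 18:04:35'), (5, 400, '2017-01-20 18:17:31'),
        (5, -1000, '2017-01-23 18:18:36');
""")

# Inner join: ids with no qualifying b row disappear.
inner_rows = conn.execute("""
    SELECT a.id, SUM(b.points) AS points
    FROM a JOIN b ON a.id = b.id AND b.dt <= a.dt
    GROUP BY a.id ORDER BY a.id
""").fetchall()

# Left join with both conditions in ON: every a.id is kept,
# and SUM over no matching rows is NULL.
left_rows = conn.execute("""
    SELECT a.id, SUM(b.points) AS points
    FROM a LEFT JOIN b ON a.id = b.id AND b.dt <= a.dt
    GROUP BY a.id ORDER BY a.id
""").fetchall()

print(inner_rows)
print(left_rows)
```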

Sql several rows greater than value of other table

My model is A has many B. These tables are related by A.id and B.item_id. B has a column called created_at. I need to select all rows from A such that B.created_at is greater than 2016-09-1. I would like to know if it is possible to solve this using a query similar to this one:
SELECT * FROM A
WHERE (
(SELECT B.created_at
FROM B
WHERE B.item_id = A.id)
>= '2016-09-1' )
You can do this with a JOIN and selecting only from A:
Select A.*
From A
Join B On A.id = B.item_id
Where B.created_at > '2016-09-01'
need to select from A all rows such that B.created_at is greater than '2016-09-1'
select * from A where exists(SELECT 1
FROM B
WHERE B.item_id = A.id
AND B.created_at >= '2016-09-1')
In general this can be done with an INNER JOIN containing your date filter criteria. Something like:
SELECT A.*
FROM A
INNER JOIN B
ON A.id = B.item_id
AND B.created_at > '2016-09-01'
This would be a T-SQL method for doing this.
You can also apply a simple WHERE clause. You'd want to check the execution plan to see if either of these options makes a real difference.
SELECT A.*
FROM A
INNER JOIN B
ON A.id = B.item_id
WHERE B.created_at > '2016-09-01'
If you want to use a subset from table B first... you can do...
SELECT A.*
FROM A
INNER JOIN (
SELECT B.*
FROM B
WHERE B.created_at > '2016-09-01'
)Bsub
ON A.id = Bsub.item_id
All of these would work with T-SQL, can't speak to other SQL-like languages.
select A.*
from A
left join B
on B.item_id=A.id
where B.created_at >= '2016-09-1'
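Since A has many B, the JOIN and EXISTS answers above do not return the same number of rows when several B rows pass the date filter. A minimal sketch of the difference, using SQLite through Python's sqlite3 module; the sample rows, the name column, and storing created_at as ISO-formatted text are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, name TEXT);
    CREATE TABLE B (item_id INTEGER, created_at TEXT);
    INSERT INTO A VALUES (1, 'first'), (2, 'second'), (3, 'third');
    -- A.id 1 has two recent B rows, 2 has only an old one, 3 has none
    INSERT INTO B VALUES (1, '2016-09-10'), (1, '2016-10-01'), (2, '2016-08-15');
""")

# JOIN: one output row per matching (A, B) pair.
join_rows = conn.execute("""
    SELECT A.* FROM A
    JOIN B ON A.id = B.item_id
    WHERE B.created_at > '2016-09-01'
    ORDER BY A.id
""").fetchall()

# EXISTS: each A row is tested once, so it appears at most once.
exists_rows = conn.execute("""
    SELECT * FROM A
    WHERE EXISTS (SELECT 1 FROM B
                  WHERE B.item_id = A.id
                    AND B.created_at > '2016-09-01')
    ORDER BY A.id
""").fetchall()

print(join_rows)
print(exists_rows)
```

The join returns the row for id 1 twice, while EXISTS returns it once. If each A row should appear only once, EXISTS (or a join plus de-duplication) is the form to reach for.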

Write correlated subquery in a WHERE Clause as join

I have a query like below:
select
a.id, a.title, a.description
from
my_table_name as a
where
a.id in (select id from another_table b where b.id = 1)
My question is: is there any way I can avoid the subquery in the where clause and use it in the from clause instead, without compromising performance?
Both of the answers given so far are incorrect in the general case (though the database may have unique constraints which ensure they are correct in a specific case)
If another_table might have multiple rows with the same id then the INNER JOIN will bring back duplicates that are not present in the IN version. Trying to remove them with DISTINCT can change the semantics if the columns from my_table_name themselves have duplicates.
A general rewrite would be
SELECT a.id,
a.title,
a.description
FROM my_table_name AS a
JOIN (SELECT DISTINCT id
FROM another_table
WHERE id = 1) AS b
ON b.id = a.id
The performance characteristics of this rewrite are implementation-dependent.
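To make the duplicate argument concrete, here is a sketch that runs the IN query and three join variants side by side on SQLite via Python's sqlite3 module. The sample rows are made up so that both tables contain duplicates; only the row counts matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table_name (id INTEGER, title TEXT, description TEXT);
    CREATE TABLE another_table (id INTEGER);
    -- two identical rows in my_table_name, and id 1 listed twice in another_table
    INSERT INTO my_table_name VALUES (1, 't', 'd'), (1, 't', 'd'), (2, 'x', 'y');
    INSERT INTO another_table VALUES (1), (1);
""")

COLUMNS = "a.id, a.title, a.description"

# Original IN form: each qualifying row of my_table_name appears once.
in_rows = conn.execute(f"""
    SELECT {COLUMNS} FROM my_table_name AS a
    WHERE a.id IN (SELECT id FROM another_table b WHERE b.id = 1)
""").fetchall()

# Plain join: rows are multiplied by the duplicate ids in another_table.
plain_join_rows = conn.execute(f"""
    SELECT {COLUMNS} FROM my_table_name AS a
    JOIN another_table AS b ON b.id = a.id WHERE b.id = 1
""").fetchall()

# DISTINCT on the outer query: also collapses genuine duplicates in a.
distinct_join_rows = conn.execute(f"""
    SELECT DISTINCT {COLUMNS} FROM my_table_name AS a
    JOIN another_table AS b ON b.id = a.id WHERE b.id = 1
""").fetchall()

# Join to a de-duplicated derived table: matches the IN form.
derived_join_rows = conn.execute(f"""
    SELECT {COLUMNS} FROM my_table_name AS a
    JOIN (SELECT DISTINCT id FROM another_table WHERE id = 1) AS b
      ON b.id = a.id
""").fetchall()

print(len(in_rows), len(plain_join_rows),
      len(distinct_join_rows), len(derived_join_rows))
```

Only the join against the DISTINCT derived table gives the same row count as the IN version here.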
You may use INNER JOIN as:
select
a.id, a.title, a.description
from
my_table_name as a INNER JOIN another_table as b ON (a.id = b.id and b.id = 1)
Or
select
a.id, a.title, a.description
from
my_table_name as a INNER JOIN another_table as b ON a.id = b.id
where b.id = 1
The two queries may not return the same results for you. Choose whichever works for your case. Please use this as a starting point and not as copy-paste code.
To express it as a join:
select distinct
a.id, a.title, a.description
from my_table_name as a
join another_table b on b.id = a.id
where b.id = 1
The use of distinct is to produce the same results in case another_table has the same id more than once so the same row doesn't get returned multiple times.
Note: if combinations of id, title and description in my_table_name are not unique, this query won't return such duplicates as the original query would.
To guarantee the same results, you need to ensure that the ids taken from another_table are unique. To do this as a join:
select
a.id, a.title, a.description
from my_table_name as a
join (select distinct id from another_table) b on b.id = a.id
where b.id = 1