LEFT JOIN but take only one row from right side - sql

Context:
I have two tables: ks__dokument and ks_pz. It's one-to-many relation where records from ks__dokument may have multiple records assigned from ks_pz.
Goal:
I want to show every row from ks__dokument and every row from ks__dokument must be shown only once.
What I tried:
Here is query I tried:
SELECT DISTINCT ks_id, * FROM ks__dokument AS dok1
LEFT JOIN ks_pz ON ks_id = kp_ksid
But it still shows duplicates.
EDITS
That ORDER BY and WHERE was unnecessary.
I dont need DISTINCT, it's just what I tried.
STRUCTURE OF TABLES
ks__dokument structure:
| ks_id | X | X | X | X | X | X |
ks_pz:
| kp_id | kp_ksid | X | X | X |
'X' are unimportant columns. kp_ksid is foreign key for ks__dokument.

Use OUTER APPLY:
SELECT dok1.*, k2.*
FROM ks__dokument dok1 OUTER APPLY
(SELECT TOP (1) *
FROM ks_pz
WHERE ks_id = kp_ksid
) k2
WHERE ks_usuniety = 0 AND
ks_data_otrzymania >= '2020-08-31'
ORDER BY ks_rok, ks_nr ASC;
Normally, there would be an ORDER BY in the subquery to specify which row to return.
The structure of your question makes it impossible to know if the ORDER BY should be in the subquery or in the outer query -- and the same for the WHERE conditions.
You really need to specify the tables where columns are coming from.

You can try the below - move your WHERE condition clause to ON clause
SELECT DISTINCT ks_id, * FROM ks__dokument AS dok1
LEFT JOIN ks_pz ON ks_id = kp_ksid
and ks_usuniety = 0 AND ks_data_otrzymania >= '2020-08-31'
ORDER BY ks_rok, ks_nr ASC

Related

Join table using column value as table name

Is it possible to join a table whereby the table name is a value in a column?
Here is a TABLE called food:
id food_name price_table pricing_reference_id
1 | 'apple' | 'daily_price' | 13
2 | 'banana' | 'monthly_price' | 13
3 | 'hotdog' | 'weekly_price' | 17
4 | 'sandwich' | 'monthly_price' | 9
There are three other tables (pricing tables): daily_price, weekly_price, and monthly_price tables.
Side note: Despite their names, the three pricing tables display vastly different kinds of information, which is why the three tables were not merged into one table
Each row in the food table can only be joined with one of the three pricing tables at most.
The following does not work -- it is just to illustrate what I am trying to get at:
SELECT *
FROM food
LEFT JOIN food.price_table ON food.pricing_reference_id = daily_price.id
WHERE id = 1;
Obviously the query does not work. Is there any way that the name of the table in the price_table column could be used as the table name in a join?
I would suggest left joins:
select f.*,
coalesce(dp.price, wp.price, mp.price) as price
from food f left join
daily_price dp
on f.pricing_reference_id = dp.id and
f.pricing_table = 'daily_price' left join
weekly_price wp
on f.pricing_reference_id = wp.id and
f.pricing_table = 'weekly_price' left join
monthly_price mp
on f.pricing_reference_id = mp.id and
f.pricing_table = 'monthly_price' ;
For the columns you reference, you need to use coalesce() to combine the results from the three tables. You say that the tables have different data, so you would need to list the columns separately.
The main reason I recommend this approach is performance. I think the left joins should be faster than any solution that uses union all.
Could you get your expected result using by a derived table with UNION SELECT which has a column of each table name?
SELECT *
FROM food
LEFT JOIN
(
SELECT 'daily_price' AS price_table, * FROM daily_price
UNION ALL SELECT 'monthly_price', * FROM monthly_price
UNION ALL SELECT 'weekly_price', * FROM weekly_price
) t
ON food.price_table = t.price_table AND
food.pricing_reference_id = t.id
ORDER BY food.id;
dbfiddle

Postgres join and count multiple relational tables

I want to join the 2 tables to the first table and group by a vendor name. I have three tables listed below.
Vendors Table
| id | name
|:-----------|------------:|
| test-id | Vendor Name |
VendorOrders Table
| id | VendorId | Details | isActive(Boolean)| price |
|:-----------|------------:|:------------:| -----------------| --------
| random-id | test-id | Sample test | TRUE | 5000
OrdersIssues Table
| id | VendorOrderId| Details. |
|:-----------|--------------:-----------:|
| order-id | random-id | Sample test|
The expected output is to count how many orders belong to a vendor and how many issues belongs to a vendor order.
I have the below code but it's not giving the right output.
SELECT "vendors"."name" as "vendorName",
COUNT("vendorOrders".id) as allOrders,
COUNT("orderIssues".id) as allIssues
FROM "vendors"
LEFT OUTER JOIN "vendorOrders" ON "vendors".id = "vendorOrders"."vendorId"
LEFT OUTER JOIN "orderIssues" ON "orderIssues"."vendorOrderId" = "vendorOrders"."id"
GROUP BY "vendors".id;```
You need the keyword DISTINCT, at least for allOrders:
SELECT v.name vendorName,
COUNT(DISTINCT vo.id) allOrders,
COUNT(DISTINCT oi.id) allIssues
FROM vendors v
LEFT OUTER JOIN vendorOrders vo ON v.id = vo.vendorId
LEFT OUTER JOIN orderIssues oi ON oi.vendorOrderId = vo.id
GROUP BY v.id, v.name;
Consider using aliases instead of full table names to make the code shorter and more readable.
You are joining along two related dimensions. The overall number of rows is the number of issues. But to get the number of orders, you need a distinct count:
SELECT v.*, count(distinct vo.id) as num_orders,
COUNT(oi.vendororderid) as num_issues
FROM vendors v LEFT JOIN
vendorOrders vo
ON v.id = vo.vendorId LEFT JOIN
orderIssues oi
ON oi.vendorOrderId = vo.id
GROUP BY v.id;
Notes:
Table aliases make the query easier to write and to read.
Quoting column and table names makes the query harder to write and read. Don't quote identifiers (you may need to recreate the tables).
Postgres support SELECT v.* . . . GROUP BY v.id assuming that the id is the primary key (actually, it only needs to be unique). This seems like a reasonable assumption.

postgres STRING_AGG() returns duplicates?

I have seen some similar posts, requesting advice for getting distinct results from the query. This can be solved with a subquery, but the column I am aggregating image_name is unique image_name VARCHAR(40) NOT NULL UNIQUE. I don't believe that should be necersarry.
This is the data in the spot_images table
spotdk=# select * from spot_images;
id | user_id | spot_id | image_name
----+---------+---------+--------------------------------------
1 | 1 | 1 | 81198013-e8f8-4baa-aece-6fbda15a0498
2 | 1 | 1 | 21b78e4e-f2e4-4d66-961f-83e5c28d69c5
3 | 1 | 1 | 59834585-8c49-4cdf-95e4-38c437acb3c1
4 | 1 | 1 | 0a42c962-2445-4b3b-97a6-325d344fda4a
(4 rows)
SELECT Round(Avg(ratings.rating), 2) AS rating,
spots.*,
String_agg(spot_images.image_name, ',') AS imageNames
FROM spots
FULL OUTER JOIN ratings
ON ratings.spot_id = spots.id
INNER JOIN spot_images
ON spot_images.spot_id = spots.id
WHERE spots.id = 1
GROUP BY spots.id;
This is the result of the images row:
81198013-e8f8-4baa-aece-6fbda15a0498,
21b78e4e-f2e4-4d66-961f-83e5c28d69c5,
59834585-8c49-4cdf-95e4-38c437acb3c1,
0a42c962-2445-4b3b-97a6-325d344fda4a,
81198013-e8f8-4baa-aece-6fbda15a0498,
21b78e4e-f2e4-4d66-961f-83e5c28d69c5,
59834585-8c49-4cdf-95e4-38c437acb3c1,
0a42c962-2445-4b3b-97a6-325d344fda4a,
81198013-e8f8-4baa-aece-6fbda15a0498,
21b78e4e-f2e4-4d66-961f-83e5c28d69c5,
59834585-8c49-4cdf-95e4-38c437acb3c1,
0a42c962-2445-4b3b-97a6-325d344fda4a
Not with linebreaks, I added them for visibility.
What should I do to retrieve the image_name's one time each?
If you don't want duplicates, use DISTINCT:
String_agg(distinct spot_images.image_name, ',') AS imageNames
Likely, there are several rows in ratings that match the given spot, and several rows in spot_images that match the given sport as well. As a results, rows are getting duplicated.
One option to avoid that is to aggregate in subqueries:
SELECT r.avg_raging
s.*,
si.image_names
FROM spots s
FULL OUTER JOIN (
SELECT spot_id, Round(Avg(ratings.rating), 2) avg_rating
FROM ratings
GROUP BY spot_id
) r ON r.spot_id = s.id
INNER JOIN (
SELECT spot_id, string_agg(spot_images.image_name, ',') image_names
FROM spot_images
GROUP BY spot_id
) si ON si.spot_id = s.id
WHERE s.id = 1
This actually could be more efficient that outer aggregation.
Note: it is hard to tell without seeing your data, but I am unsure that you really need a FULL JOIN here. A LEFT JOIN might actually be what you want.

How to find all the products with specific multi attribute values

I am using postgresql.
I have a table called custom_field_answers. The data looks like this
Id | product_id | value | number_value |
4 | 2 | | 117 |
3 | 1 | | 107 |
2 | 1 | bangle | |
1 | 2 | necklace | |
I want to find all the products which has text_value as 'bangle' and number_value less than 50.
Here was my first attempt.
SELECT "products".* FROM "products" INNER JOIN "custom_field_answers"
ON "custom_field_answers"."product_id" = "products"."id"
WHERE ("custom_field_answers"."value" ILIKE 'bangle')
Here is my second attempt.
SELECT "products".* FROM "products" INNER JOIN "custom_field_answers"
ON "custom_field_answers"."product_id" = "products"."id"
where ("custom_field_answers"."number_value" < 50)
Here is my final attempt.
SELECT "products".* FROM "products" INNER JOIN "custom_field_answers"
ON "custom_field_answers"."product_id" = "products"."id"
WHERE ("custom_field_answers"."value" ILIKE 'bangle')
AND ("custom_field_answers"."number_value" < 50)
but this does not select any product record.
A WHERE clause can only look at columns from one row at a time.
So if you need a condition that applies to two different rows from a table, you need to join to that table twice, so you can get columns from both rows.
SELECT p.*
FROM "products" AS p
INNER JOIN "custom_field_answers" AS a1 ON p."id" = a1."product_id"
INNER JOIN "custom_field_answers" AS a2 ON p."id" = a1."product_id"
WHERE a1."value" = 'bangle' AND a2."number_value" < 50
It produces no records because there is no custom_field_answers record that meets both criteria. What you want is a list of product_ids that have the necessary records in the table. Just in case no one gets to writing the SQL for you, and until I have a chance to work it out myself, I thought I would at least explain to you why your query is not working.
This should work:
SELECT p.* FROM products LEFT JOIN custom_field_answers c
ON (c.product_id = p.id AND c.value LIKE '%bangle%' AND c.number_value
Hope it helps
Your bangle-related number_value fields are null, so you won't be able to do a straight comparison in those cases. Instead, convert your nulls to 0s first.
SELECT "products".* FROM "products" INNER JOIN "custom_field_answers"
ON "custom_field_answers"."product_id" = "products"."id"
WHERE ("custom_field_answers"."value" LIKE '%bangle%')
AND (coalesce("custom_field_answers"."number_value", 0) < 50)
Didn't actually test it, but this general idea should work:
SELECT *
FROM products
WHERE
EXISTS (
SELECT *
FROM custom_field_answers
WHERE
custom_field_answers.product_id = products.id
AND value = 'bangle'
)
AND EXISTS (
SELECT *
FROM custom_field_answers
WHERE
custom_field_answers.product_id = products.id
AND number_value < 5
)
In plain English: Get all products such that...
there is a related row in custom_field_answers where value = 'bangle'
and there is (possibly different) related row in custom_field_answers where number_value < 5.

When should I use CROSS APPLY over INNER JOIN?

What is the main purpose of using CROSS APPLY?
I have read (vaguely, through posts on the Internet) that cross apply can be more efficient when selecting over large data sets if you are partitioning. (Paging comes to mind)
I also know that CROSS APPLY doesn't require a UDF as the right-table.
In most INNER JOIN queries (one-to-many relationships), I could rewrite them to use CROSS APPLY, but they always give me equivalent execution plans.
Can anyone give me a good example of when CROSS APPLY makes a difference in those cases where INNER JOIN will work as well?
Edit:
Here's a trivial example, where the execution plans are exactly the same. (Show me one where they differ and where cross apply is faster/more efficient)
create table Company (
companyId int identity(1,1)
, companyName varchar(100)
, zipcode varchar(10)
, constraint PK_Company primary key (companyId)
)
GO
create table Person (
personId int identity(1,1)
, personName varchar(100)
, companyId int
, constraint FK_Person_CompanyId foreign key (companyId) references dbo.Company(companyId)
, constraint PK_Person primary key (personId)
)
GO
insert Company
select 'ABC Company', '19808' union
select 'XYZ Company', '08534' union
select '123 Company', '10016'
insert Person
select 'Alan', 1 union
select 'Bobby', 1 union
select 'Chris', 1 union
select 'Xavier', 2 union
select 'Yoshi', 2 union
select 'Zambrano', 2 union
select 'Player 1', 3 union
select 'Player 2', 3 union
select 'Player 3', 3
/* using CROSS APPLY */
select *
from Person p
cross apply (
select *
from Company c
where p.companyid = c.companyId
) Czip
/* the equivalent query using INNER JOIN */
select *
from Person p
inner join Company c on p.companyid = c.companyId
Can anyone give me a good example of when CROSS APPLY makes a difference in those cases where INNER JOIN will work as well?
See the article in my blog for detailed performance comparison:
INNER JOIN vs. CROSS APPLY
CROSS APPLY works better on things that have no simple JOIN condition.
This one selects 3 last records from t2 for each record from t1:
SELECT t1.*, t2o.*
FROM t1
CROSS APPLY
(
SELECT TOP 3 *
FROM t2
WHERE t2.t1_id = t1.id
ORDER BY
t2.rank DESC
) t2o
It cannot be easily formulated with an INNER JOIN condition.
You could probably do something like that using CTE's and window function:
WITH t2o AS
(
SELECT t2.*, ROW_NUMBER() OVER (PARTITION BY t1_id ORDER BY rank) AS rn
FROM t2
)
SELECT t1.*, t2o.*
FROM t1
INNER JOIN
t2o
ON t2o.t1_id = t1.id
AND t2o.rn <= 3
, but this is less readable and probably less efficient.
Update:
Just checked.
master is a table of about 20,000,000 records with a PRIMARY KEY on id.
This query:
WITH q AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS rn
FROM master
),
t AS
(
SELECT 1 AS id
UNION ALL
SELECT 2
)
SELECT *
FROM t
JOIN q
ON q.rn <= t.id
runs for almost 30 seconds, while this one:
WITH t AS
(
SELECT 1 AS id
UNION ALL
SELECT 2
)
SELECT *
FROM t
CROSS APPLY
(
SELECT TOP (t.id) m.*
FROM master m
ORDER BY
id
) q
is instant.
Consider you have two tables.
MASTER TABLE
x------x--------------------x
| Id | Name |
x------x--------------------x
| 1 | A |
| 2 | B |
| 3 | C |
x------x--------------------x
DETAILS TABLE
x------x--------------------x-------x
| Id | PERIOD | QTY |
x------x--------------------x-------x
| 1 | 2014-01-13 | 10 |
| 1 | 2014-01-11 | 15 |
| 1 | 2014-01-12 | 20 |
| 2 | 2014-01-06 | 30 |
| 2 | 2014-01-08 | 40 |
x------x--------------------x-------x
There are many situations where we need to replace INNER JOIN with CROSS APPLY.
1. Join two tables based on TOP n results
Consider if we need to select Id and Name from Master and last two dates for each Id from Details table.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
INNER JOIN
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
ORDER BY CAST(PERIOD AS DATE)DESC
)D
ON M.ID=D.ID
SQL FIDDLE
The above query generates the following result.
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
x------x---------x--------------x-------x
See, it generated results for last two dates with last two date's Id and then joined these records only in the outer query on Id, which is wrong. This should be returning both Ids 1 and 2 but it returned only 1 because 1 has the last two dates. To accomplish this, we need to use CROSS APPLY.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
CROSS APPLY
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
WHERE M.ID=D.ID
ORDER BY CAST(PERIOD AS DATE)DESC
)D
SQL FIDDLE
and forms the following result.
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-08 | 40 |
| 2 | B | 2014-01-06 | 30 |
x------x---------x--------------x-------x
Here's how it works. The query inside CROSS APPLY can reference the outer table, where INNER JOIN cannot do this (it throws compile error). When finding the last two dates, joining is done inside CROSS APPLY i.e., WHERE M.ID=D.ID.
2. When we need INNER JOIN functionality using functions.
CROSS APPLY can be used as a replacement with INNER JOIN when we need to get result from Master table and a function.
SELECT M.ID,M.NAME,C.PERIOD,C.QTY
FROM MASTER M
CROSS APPLY dbo.FnGetQty(M.ID) C
And here is the function
CREATE FUNCTION FnGetQty
(
#Id INT
)
RETURNS TABLE
AS
RETURN
(
SELECT ID,PERIOD,QTY
FROM DETAILS
WHERE ID=#Id
)
SQL FIDDLE
which generated the following result
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-11 | 15 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-06 | 30 |
| 2 | B | 2014-01-08 | 40 |
x------x---------x--------------x-------x
ADDITIONAL ADVANTAGE OF CROSS APPLY
APPLY can be used as a replacement for UNPIVOT. Either CROSS APPLY or OUTER APPLY can be used here, which are interchangeable.
Consider you have the below table(named MYTABLE).
x------x-------------x--------------x
| Id | FROMDATE | TODATE |
x------x-------------x--------------x
| 1 | 2014-01-11 | 2014-01-13 |
| 1 | 2014-02-23 | 2014-02-27 |
| 2 | 2014-05-06 | 2014-05-30 |
| 3 | NULL | NULL |
x------x-------------x--------------x
The query is below.
SELECT DISTINCT ID,DATES
FROM MYTABLE
CROSS APPLY(VALUES (FROMDATE),(TODATE))
COLUMNNAMES(DATES)
SQL FIDDLE
which brings you the result
x------x-------------x
| Id | DATES |
x------x-------------x
| 1 | 2014-01-11 |
| 1 | 2014-01-13 |
| 1 | 2014-02-23 |
| 1 | 2014-02-27 |
| 2 | 2014-05-06 |
| 2 | 2014-05-30 |
| 3 | NULL |
x------x-------------x
cross apply sometimes enables you to do things that you cannot do with inner join.
Example (a syntax error):
select F.* from sys.objects O
inner join dbo.myTableFun(O.name) F
on F.schema_id= O.schema_id
This is a syntax error, because, when used with inner join, table functions can only take variables or constants as parameters. (I.e., the table function parameter cannot depend on another table's column.)
However:
select F.* from sys.objects O
cross apply ( select * from dbo.myTableFun(O.name) ) F
where F.schema_id= O.schema_id
This is legal.
Edit:
Or alternatively, shorter syntax: (by ErikE)
select F.* from sys.objects O
cross apply dbo.myTableFun(O.name) F
where F.schema_id= O.schema_id
Edit:
Note:
Informix 12.10 xC2+ has Lateral Derived Tables and Postgresql (9.3+) has Lateral Subqueries which can be used to a similar effect.
It seems to me that CROSS APPLY can fill a certain gap when working with calculated fields in complex/nested queries, and make them simpler and more readable.
Simple example: you have a DoB and you want to present multiple age-related fields that will also rely on other data sources (such as employment), like Age, AgeGroup, AgeAtHiring, MinimumRetirementDate, etc. for use in your end-user application (Excel PivotTables, for example).
Options are limited and rarely elegant:
JOIN subqueries cannot introduce new values in the dataset based on data in the parent query (it must stand on its own).
UDFs are neat, but slow as they tend to prevent parallel operations. And being a separate entity can be a good (less code) or a bad (where is the code) thing.
Junction tables. Sometimes they can work, but soon enough you're joining subqueries with tons of UNIONs. Big mess.
Create yet another single-purpose view, assuming your calculations don't require data obtained mid-way through your main query.
Intermediary tables. Yes... that usually works, and often a good option as they can be indexed and fast, but performance can also drop due to to UPDATE statements not being parallel and not allowing to cascade formulas (reuse results) to update several fields within the same statement. And sometimes you'd just prefer to do things in one pass.
Nesting queries. Yes at any point you can put parenthesis on your entire query and use it as a subquery upon which you can manipulate source data and calculated fields alike. But you can only do this so much before it gets ugly. Very ugly.
Repeating code. What is the greatest value of 3 long (CASE...ELSE...END) statements? That's gonna be readable!
Tell your clients to calculate the damn things themselves.
Did I miss something? Probably, so feel free to comment. But hey, CROSS APPLY is like a godsend in such situations: you just add a simple CROSS APPLY (select tbl.value + 1 as someFormula) as crossTbl and voilĂ ! Your new field is now ready for use practically like it had always been there in your source data.
Values introduced through CROSS APPLY can...
be used to create one or multiple calculated fields without adding performance, complexity or readability issues to the mix
like with JOINs, several subsequent CROSS APPLY statements can refer to themselves: CROSS APPLY (select crossTbl.someFormula + 1 as someMoreFormula) as crossTbl2
you can use values introduced by a CROSS APPLY in subsequent JOIN conditions
As a bonus, there's the Table-valued function aspect
Dang, there's nothing they can't do!
This has already been answered very well technically, but let me give a concrete example of how it's extremely useful:
Lets say you have two tables, Customer and Order. Customers have many Orders.
I want to create a view that gives me details about customers, and the most recent order they've made. With just JOINS, this would require some self-joins and aggregation which isn't pretty. But with Cross Apply, its super easy:
SELECT *
FROM Customer
CROSS APPLY (
SELECT TOP 1 *
FROM Order
WHERE Order.CustomerId = Customer.CustomerId
ORDER BY OrderDate DESC
) T
Cross apply works well with an XML field as well. If you wish to select node values in combination with other fields.
For example, if you have a table containing some xml
<root>
<subnode1>
<some_node value="1" />
<some_node value="2" />
<some_node value="3" />
<some_node value="4" />
</subnode1>
</root>
Using the query
SELECT
id as [xt_id]
,xmlfield.value('(/root/#attribute)[1]', 'varchar(50)') root_attribute_value
,node_attribute_value = [some_node].value('#value', 'int')
,lt.lt_name
FROM dbo.table_with_xml xt
CROSS APPLY xmlfield.nodes('/root/subnode1/some_node') as g ([some_node])
LEFT OUTER JOIN dbo.lookup_table lt
ON [some_node].value('#value', 'int') = lt.lt_id
Will return a result
xt_id root_attribute_value node_attribute_value lt_name
----------------------------------------------------------------------
1 test1 1 Benefits
1 test1 4 FINRPTCOMPANY
Cross apply can be used to replace subquery's where you need a column of the subquery
subquery
select * from person p where
p.companyId in(select c.companyId from company c where c.companyname like '%yyy%')
here i won't be able to select the columns of company table
so, using cross apply
select P.*,T.CompanyName
from Person p
cross apply (
select *
from Company C
where p.companyid = c.companyId and c.CompanyName like '%yyy%'
) T
Here's a brief tutorial that can be saved in a .sql file and executed in SSMS that I wrote for myself to quickly refresh my memory on how CROSS APPLY works and when to use it:
-- Here's the key to understanding CROSS APPLY: despite the totally different name, think of it as being like an advanced 'basic join'.
-- A 'basic join' gives the Cartesian product of the rows in the tables on both sides of the join: all rows on the left joined with all rows on the right.
-- The formal name of this join in SQL is a CROSS JOIN. You now start to understand why they named the operator CROSS APPLY.
-- Given the following (very) simple tables and data:
CREATE TABLE #TempStrings ([SomeString] [nvarchar](10) NOT NULL);
CREATE TABLE #TempNumbers ([SomeNumber] [int] NOT NULL);
CREATE TABLE #TempNumbers2 ([SomeNumber] [int] NOT NULL);
INSERT INTO #TempStrings VALUES ('111'); INSERT INTO #TempStrings VALUES ('222');
INSERT INTO #TempNumbers VALUES (111); INSERT INTO #TempNumbers VALUES (222);
INSERT INTO #TempNumbers2 VALUES (111); INSERT INTO #TempNumbers2 VALUES (222); INSERT INTO #TempNumbers2 VALUES (222);
-- Basic join is like CROSS APPLY; 2 rows on each side gives us an output of 4 rows, but 2 rows on the left and 0 on the right gives us an output of 0 rows:
SELECT
st.SomeString, nbr.SomeNumber
FROM -- Basic join ('CROSS JOIN')
#TempStrings st, #TempNumbers nbr
-- Note: this also works:
--#TempStrings st CROSS JOIN #TempNumbers nbr
-- Basic join can be used to achieve the functionality of INNER JOIN by first generating all row combinations and then whittling them down with a WHERE clause:
SELECT
st.SomeString, nbr.SomeNumber
FROM -- Basic join ('CROSS JOIN')
#TempStrings st, #TempNumbers nbr
WHERE
st.SomeString = nbr.SomeNumber
-- However, for increased readability, the SQL standard introduced the INNER JOIN ... ON syntax for increased clarity; it brings the columns that two tables are
-- being joined on next to the JOIN clause, rather than having them later on in the WHERE clause. When multiple tables are being joined together, this makes it
-- much easier to read which columns are being joined on which tables; but make no mistake, the following syntax is *semantically identical* to the above syntax:
SELECT
st.SomeString, nbr.SomeNumber
FROM -- Inner join
#TempStrings st INNER JOIN #TempNumbers nbr ON st.SomeString = nbr.SomeNumber
-- Because CROSS APPLY is generally used with a subquery, the subquery's WHERE clause will appear next to the join clause (CROSS APPLY), much like the aforementioned
-- 'ON' keyword appears next to the INNER JOIN clause. In this sense, then, CROSS APPLY combined with a subquery that has a WHERE clause is like an INNER JOIN with
-- an ON keyword, but more powerful because it can be used with subqueries (or table-valued functions, where said WHERE clause can be hidden inside the function).
SELECT
st.SomeString, nbr.SomeNumber
FROM
#TempStrings st CROSS APPLY (SELECT * FROM #TempNumbers tempNbr WHERE st.SomeString = tempNbr.SomeNumber) nbr
-- CROSS APPLY joins in the same way as a CROSS JOIN, but what is joined can be a subquery or table-valued function. You'll still get 0 rows of output if
-- there are 0 rows on either side, and in this sense it's like an INNER JOIN:
SELECT
st.SomeString, nbr.SomeNumber
FROM
#TempStrings st CROSS APPLY (SELECT * FROM #TempNumbers tempNbr WHERE 1 = 2) nbr
-- OUTER APPLY is like CROSS APPLY, except that if one side of the join has 0 rows, you'll get the values of the side that has rows, with NULL values for
-- the other side's columns. In this sense it's like a FULL OUTER JOIN:
SELECT
st.SomeString, nbr.SomeNumber
FROM
#TempStrings st OUTER APPLY (SELECT * FROM #TempNumbers tempNbr WHERE 1 = 2) nbr
-- One thing CROSS APPLY makes it easy to do is to use a subquery where you would usually have to use GROUP BY with aggregate functions in the SELECT list.
-- In the following example, we can get an aggregate of string values from a second table based on matching one of its columns with a value from the first
-- table - something that would have had to be done in the ON clause of the LEFT JOIN - but because we're now using a subquery thanks to CROSS APPLY, we
-- don't need to worry about GROUP BY in the main query and so we don't have to put all the SELECT values inside an aggregate function like MIN().
SELECT
st.SomeString, nbr.SomeNumbers
FROM
#TempStrings st CROSS APPLY (SELECT SomeNumbers = STRING_AGG(tempNbr.SomeNumber, ', ') FROM #TempNumbers2 tempNbr WHERE st.SomeString = tempNbr.SomeNumber) nbr
-- ^ First the subquery is whittled down with the WHERE clause, then the aggregate function is applied with no GROUP BY clause; this means all rows are
-- grouped into one, and the aggregate function aggregates them all, in this case building a comma-delimited string containing their values.
DROP TABLE #TempStrings;
DROP TABLE #TempNumbers;
DROP TABLE #TempNumbers2;
I guess it should be readability ;)
CROSS APPLY will be somewhat unique for people reading to tell them that a UDF is being used which will be applied to each row from the table on the left.
Ofcourse, there are other limitations where a CROSS APPLY is better used than JOIN which other friends have posted above.
The essence of the APPLY operator is to allow correlation between left and right side of the operator in the FROM clause.
In contrast to JOIN, the correlation between inputs is not allowed.
Speaking about correlation in APPLY operator, I mean on the right hand side we can put:
a derived table - as a correlated subquery with an alias
a table valued function - a conceptual view with parameters, where the parameter can refer to the left side
Both can return multiple columns and rows.
Here is an article that explains it all, with their performance difference and usage over JOINS.
SQL Server CROSS APPLY and OUTER APPLY over JOINS
As suggested in this article, there is no performance difference between them for normal join operations (INNER AND CROSS).
The usage difference arrives when you have to do a query like this:
CREATE FUNCTION dbo.fn_GetAllEmployeeOfADepartment(#DeptID AS INT)
RETURNS TABLE
AS
RETURN
(
SELECT * FROM Employee E
WHERE E.DepartmentID = #DeptID
)
GO
SELECT * FROM Department D
CROSS APPLY dbo.fn_GetAllEmployeeOfADepartment(D.DepartmentID)
That is, when you have to relate with function. This cannot be done using INNER JOIN, which would give you the error "The multi-part identifier "D.DepartmentID" could not be bound." Here the value is passed to the function as each row is read. Sounds cool to me. :)
Well I am not sure if this qualifies as a reason to use Cross Apply versus Inner Join, but this query was answered for me in a Forum Post using Cross Apply, so I am not sure if there is an equalivent method using Inner Join:
Create PROCEDURE [dbo].[Message_FindHighestMatches]
-- Declare the Topical Neighborhood
#TopicalNeighborhood nchar(255)
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON
Create table #temp
(
MessageID int,
Subjects nchar(255),
SubjectsCount int
)
Insert into #temp Select MessageID, Subjects, SubjectsCount From Message
Select Top 20 MessageID, Subjects, SubjectsCount,
(t.cnt * 100)/t3.inputvalues as MatchPercentage
From #temp
cross apply (select count(*) as cnt from dbo.Split(Subjects,',') as t1
join dbo.Split(#TopicalNeighborhood,',') as t2
on t1.value = t2.value) as t
cross apply (select count(*) as inputValues from dbo.Split(#TopicalNeighborhood,',')) as t3
Order By MatchPercentage desc
drop table #temp
END
This is perhaps an old question, but I still love the power of CROSS APPLY to simplify the re-use of logic and to provide a "chaining" mechanism for results.
I've provided a SQL Fiddle below which shows a simple example of how you can use CROSS APPLY to perform complex logical operations on your data set without things getting at all messy. It's not hard to extrapolate from here more complex calculations.
http://sqlfiddle.com/#!3/23862/2
While most queries which employ CROSS APPLY can be rewritten using an INNER JOIN, CROSS APPLY can yield better execution plan and better performance, since it can limit the set being joined yet before the join occurs.
Stolen from Here
We use CROSS APPLY to update a table with JSON from another (update request) table -- joins won't work for this as we use OPENJSON, to read the content of the JSON, and OPENJSON is a "table-valued function".
I was going to put a simplified version of one of our UPDATE commands here as a example but, even simplified, it is rather large and overly complex for an example. So this much simplied "sketch" of just part of the command will have to suffice:
SELECT
r.UserRequestId,
j.xxxx AS xxxx,
FROM RequestTable as r WITH (NOLOCK)
CROSS APPLY
OPENJSON(r.JSON, '$.requesttype.recordtype')
WITH(
r.userrequestid nvarchar(50) '$.userrequestid',
j.xxx nvarchar(20) '$.xxx
)j
WHERE r.Id > #MaxRequestId
and ... etc. ....