SQL Server cross apply not working? - sql

http://sqlfiddle.com/#!3/78273/1
create table emptb1
(
id int,
name varchar(20),
dept int
)
insert into emptb1 values (1,'vish',10);
insert into emptb1 values (2,'vish',10);
insert into emptb1 values (3,'vish',30);
insert into emptb1 values (4,'vish',20);
create table depttb1
(
id int,
name varchar(20)
)
insert into depttb1 values(10,'IT')
insert into depttb1 values(20,'AC')
insert into depttb1 values(30,'LIC')
select * from emptb1
select e.id, e.name, a.id
from emptb1 e
cross apply
(
select top 1 * from depttb1 d
where d.id = e.dept
order by d.id desc
) a
I am trying to learn CROSS APPLY, which I understand to be similar to an inner join but also usable with functions.
In above query I'm assuming it should take only dept=30 because order d.id desc will give only top 1st id which is 30 and then it should return employees with dept id = 30 but it's giving me all the rows and all the deptid.
What's wrong with the query, or am I misinterpreting the concept of CROSS APPLY?

You say "In above query I'm assuming it should take only dept=30 because order d.id desc will give only top 1st id which is 30 and then it should return employees with dept id = 30".
That's not how it works. Here's your query (reformatted a little for clarity):
select e.id, e.name, a.id
from emptb1 e
cross apply
(
select top 1 *
from depttb1 d
where d.id = e.dept
order by d.id desc
) a
The APPLY keyword means that the inner query is (logically) called once for each row of the outer query. For what happens inside the inner query, it's helpful to understand the logical order that the clauses of a SELECT are executed in. This order is:
FROM clause
WHERE clause
SELECT columns
ORDER BY clause
TOP operator
Note that in your inner query, the TOP operator is applied last, well after the WHERE clause. This means that where d.id = e.dept first reduces the inner rows to those whose d.id matches the e.dept of the outer row (which is not necessarily 30), then sorts them, and then returns the first one. It does this for every row in the outer query, so many of the results are not going to be 30.
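The per-row evaluation described above can be imitated in plain Python. This is only a sketch of the logical behaviour, not of how SQL Server actually executes the plan; the helper names inner_query and cross_apply are made up for illustration, and the data is copied from the question's two tables.

```python
# Plain-Python sketch of the logical evaluation of the question's CROSS APPLY.
# Data copied from the question's emptb1 and depttb1 tables.
employees = [(1, "vish", 10), (2, "vish", 10), (3, "vish", 30), (4, "vish", 20)]
departments = [(10, "IT"), (20, "AC"), (30, "LIC")]


def inner_query(emp_dept):
    """Hypothetical helper: the inner SELECT, evaluated for one outer row."""
    rows = [d for d in departments if d[0] == emp_dept]  # WHERE d.id = e.dept
    rows.sort(key=lambda d: d[0], reverse=True)          # ORDER BY d.id DESC
    return rows[:1]                                      # TOP 1, applied last


def cross_apply():
    """Hypothetical helper: run inner_query once per outer row."""
    result = []
    for emp_id, emp_name, emp_dept in employees:
        # An outer row with no inner match would simply be dropped.
        for dept_id, _dept_name in inner_query(emp_dept):
            result.append((emp_id, emp_name, dept_id))
    return result


rows = cross_apply()
for row in rows:
    print(row)
```

Every employee finds its own department, so all four employees come back, each paired with its own dept id rather than with 30.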
What you are trying to do would be more akin to this (still retaining the CROSS APPLY):
select e.id, e.name, a.id
from emptb1 e
cross apply
(
select top 1 *
from
(
select top 1 *
from depttb1 d
order by d.id desc
) b
where b.id = e.dept
) a
Here, the logic has been reordered by use of another, nested sub-query that ensures that the ORDER BY, then the TOP 1, get applied before the WHERE clause. (Note that this would not normally be the recommended way to do this, as nested sub-queries can hamper readability. I just used it here to retain the CROSS APPLY and the rest of the original structure.)

To expand on Damien's comment, the inner query:
select top 1 * from depttb1 d
where d.id = e.dept
order by d.id desc
is going to run for every row in the outer query:
select e.id, e.name, a.id
from emptb1 e
So you will always get a match from the inner query for each row. I think you were expecting the inner query to run only one time, but that's not what APPLY does.
So, taking the first row from your outer query, with an ID of 1 and a dept id of 10, your inner query will translate to:
select top 1 * from depttb1 d
where d.id = 10 -- this is the dept id for the current row from your outer query
order by d.id desc

To solve this problem without a cross apply, use a subquery. However, in your example it will return only one row, for the last department entered, assuming id values are increasing.
-- Using a sub query to find max dept
select e.id, e.name
from emptb1 e
where e.dept in
(
select top 1 id
from depttb1
order by id desc
)
The idea behind a CROSS APPLY is kinda like a CROSS JOIN. This will return all rows. It is used by DBAs with many of the Dynamic Management Views (DMVs) that are table-valued functions (TVFs).
What you want is an OUTER APPLY, kinda like a LEFT JOIN.
select e.id, e.name
from emptb1 e
outer apply
(
select top 1 d.id from depttb1 d order by d.id desc
) AS m (id)
where e.dept = m.id
Check out my articles on these concepts.
CROSS APPLY - http://craftydba.com/?p=3767
OUTER APPLY - http://craftydba.com/?p=3796
TABLE VALUE FUNCTION (INLINE) - http://craftydba.com/?p=3733
TABLE VALUE FUNCTION (MULTI LINE) - http://craftydba.com/?p=3754

Related

SQL Joins and Corelated subqueries with column data

I am having trouble understanding joins. Let's say, for example, that we have two tables, employee and sales, and a query that gets the sales of each employee using the id of the employee:
select e.employeename
,s.city
,SUM(s.sales)
from employee e
left join (select sales,eid from sales) s on s.eid = e.id
group by 1,2
I'd like to understand why s.city isn't showing up. I'd also like to know what this concept is called. Is it a correlated subquery on a join? Please help me out here.
select
e.employeename
,s.city
,SUM(s.sales)
from employee e
left join (select sales,eid,city from sales) s on s.eid = e.id
group by 1,2
In the left join above you have to add city as well. Think of select sales,eid,city from sales as a table in itself, from which you then select city (your second column, s.city). Without city in that inner select, the query raises an error, because the derived table has no city column.
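To see the derived-table point in action, here is a small sketch using Python's built-in sqlite3 module. The sample rows, the employee names, and the helper name build_query are invented for illustration, and SQLite's error text will differ from what your database prints; the GROUP BY uses column names rather than positions purely to keep the sketch portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER, employeename TEXT);
    CREATE TABLE sales (eid INTEGER, city TEXT, sales INTEGER);
    INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO sales VALUES (1, 'Oslo', 100), (1, 'Oslo', 50), (2, 'Rome', 70);
""")


def build_query(derived_columns):
    """Hypothetical helper: the same query with a chosen derived-table column list."""
    return f"""
        SELECT e.employeename, s.city, SUM(s.sales)
        FROM employee e
        LEFT JOIN (SELECT {derived_columns} FROM sales) s ON s.eid = e.id
        GROUP BY e.employeename, s.city
        ORDER BY e.employeename
    """


# The derived table exposes only sales and eid, so s.city cannot be resolved.
try:
    conn.execute(build_query("sales, eid"))
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
print("error:", error)

# Once city is part of the derived table, the outer query can select it.
totals = conn.execute(build_query("sales, eid, city")).fetchall()
print(totals)
```

The outer query can only see the columns that the inner select lists, which is why adding city inside the parentheses is enough.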
It is much easier to join the table directly, or to use a CTE (common table expression), than to use a derived table. You can also write the query above as
select
e.employeename
,s.city
,SUM(s.sales)
from employee e
left join sales as s
on e.id = s.eid
group by 1,2
Here I have written e.id = s.eid instead of s.eid = e.id; it is better to reference the key of the main table first.
You could also use a CTE (normally used when you have to do a lot of referencing, but here you can see how it works):
With staging as (
select
e.employeename
,s.city
,s.sales
from employee e
left join sales as s
on e.id = s.eid
),
sales_stats as (
select
staging.employeename,
staging.city,
sum(staging.sales) as total_sales
from staging
group by 1,2
-- Here you select from staging again. Treat staging as a separate table: it must contain every column you want to use further on, and you reference its columns as staging.x.
)
select * from sales_stats
-- here you could have combined the steps but I wanted to show you how cte works, Hope this works for you

I have already specified different table names to avoid ambiguity still error exists

With tbl_name as
(Select top(1000) *
from tblEmployee as E right join tblTransaction as T
on E.EmployeeNumber = T.EmployeeNumber
where E.EmployeeNumber is null
order by T.EmployeeNumber)
Select Distinct(T.EmployeeNumber) as EmployeeNum from tbl_name
The error I am getting:
Msg 8156, Level 16, State 1, Line 24 The column 'EmployeeNumber' was
specified multiple times for 'tbl_name'.
The issue is that the same column exists in both tables in the join. Instead of SELECT TOP(1000) *, specify the column names from the proper table. If two tables have the same column name and you need both of them in the result set, give one of the columns a different alias, something like this:
WITH tbl_name
AS (SELECT TOP (1000)
E_EmployeeNumber = E.EmployeeNumber ,
T_EmployeeNumber = T.EmployeeNumber
FROM tblEmployee AS E
RIGHT JOIN tblTransaction AS T ON E.EmployeeNumber = T.EmployeeNumber
WHERE E.EmployeeNumber IS NULL
ORDER BY T.EmployeeNumber)
SELECT DISTINCT
(T_EmployeeNumber) AS EmployeeNum
FROM tbl_name;
Also, looking at the query, the following NOT EXISTS form should work better than your current query:
SELECT DISTINCT
T.EmployeeNumber
FROM tblTransaction T
WHERE NOT EXISTS
(
SELECT 1 FROM tblEmployee E WHERE E.EmployeeNumber = T.EmployeeNumber
)
This should return the same employee numbers (apart from the TOP (1000) limit), with better performance.
It is entirely unclear how an employee number in tblTransaction would not exist. If you had properly declared foreign key relationships, this could not happen.
You can use the structure of your query, but I would recommend LEFT JOIN:
with tbl_name as (
select top (1000) T.EmployeeNumber
from tblTransaction T left join
tblEmployee E
on E.EmployeeNumber = T.EmployeeNumber
where E.EmployeeNumber is null
order by T.EmployeeNumber
)
Select Distinct EmployeeNumber
from tbl_name;
Your query has several issues:
The select * is doing exactly what the error message suggests. There are multiple columns with the same name.
The T in the outer query's reference to T.EmployeeNumber is not defined.
You are using parentheses around DISTINCT, as if it were a function. It is not. There is a syntactic element in SQL that is SELECT DISTINCT.
I'm not sure if you really need the TOP. NOT EXISTS is a more natural way to write the query:
select distinct t.EmployeeNumber
from tblTransaction T
where not exists (select 1
from tblEmployee E
where E.EmployeeNumber = T.EmployeeNumber
);
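A quick way to check the NOT EXISTS form is to run it against a toy data set. The sketch below uses Python's built-in sqlite3 module rather than SQL Server; the Amount column and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblEmployee (EmployeeNumber INTEGER);
    CREATE TABLE tblTransaction (EmployeeNumber INTEGER, Amount INTEGER);
    INSERT INTO tblEmployee VALUES (1), (2);
    -- Employee numbers 3 and 4 have transactions but no employee row.
    INSERT INTO tblTransaction VALUES (1, 10), (3, 20), (3, 30), (4, 40);
""")

# Anti-join: keep transaction employee numbers with no matching employee.
orphans = conn.execute("""
    SELECT DISTINCT T.EmployeeNumber
    FROM tblTransaction T
    WHERE NOT EXISTS (SELECT 1
                      FROM tblEmployee E
                      WHERE E.EmployeeNumber = T.EmployeeNumber)
    ORDER BY T.EmployeeNumber
""").fetchall()
print(orphans)
```

Only one column is selected, so the duplicate-column error from the original CTE cannot occur, and DISTINCT collapses the two transactions for employee number 3 into one row.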

PostgreSQL - SELECT DISTINCT, ORDER BY expressions must appear in select list

I'm new to SQL.
I guess I've misunderstood the concept of how to use DISTINCT keyword.
Here's my code:
SELECT DISTINCT(e.id), e.text, e.priority, CAST(e.order_number AS integer), s.name AS source, e.modified_time, e.creation_time, (SELECT string_agg(DISTINCT text, '|') FROM definitions WHERE entry_id = d.entry_id) AS definitions
FROM entries AS e
LEFT JOIN definitions d ON d.entry_id = e.id
INNER JOIN sources s ON e.source_id = s.id
WHERE vocabulary_id = 22
ORDER BY e.order_number
The error is as follows:
ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 6: ORDER BY e.order_number
Just trying to understand what my SELECT statement should look like.
It appears to me that you are trying to apply DISTINCT to a single column and not to the others, which is bound to fail.
For example, select distinct a,b,c from x returns the unique combinations of a,b and c, not unique a but normal b and c
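The point about combinations can be checked with a tiny table. This sketch uses Python's built-in sqlite3 module instead of PostgreSQL, and the two-column table x and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (a INTEGER, b TEXT);
    INSERT INTO x VALUES (1, 'p'), (1, 'p'), (1, 'q'), (2, 'p');
""")

# DISTINCT applies to the whole select list, so a = 1 appears twice.
combos = conn.execute(
    "SELECT DISTINCT a, b FROM x ORDER BY a, b"
).fetchall()

# Parentheses around the first column change nothing: (a) is just an expression.
parenthesised = conn.execute(
    "SELECT DISTINCT (a), b FROM x ORDER BY a, b"
).fetchall()

print(combos)
print(parenthesised)
```

Only the exact duplicate row (1, 'p') is removed; writing DISTINCT(a) does not make a unique on its own.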
If you want one row per distinct e.id, then you are looking for distinct on. It is very important that the order by be consistent with the distinct on keys:
SELECT DISTINCT ON (e.id) e.id, e.text, e.priority, CAST(e.order_number AS integer),
s.name AS source, e.modified_time, e.creation_time,
(SELECT string_agg(DISTINCT d2.text, '|') FROM definitions d2 WHERE d2.entry_id = d.entry_id) AS definitions
FROM entries e LEFT JOIN
definitions d
ON d.entry_id = e.id INNER JOIN
sources s
ON e.source_id = s.id
WHERE vocabulary_id = 22
ORDER BY e.id, e.order_number;
Given the subquery, I suspect that there are better ways to write the query. If that is of interest, ask another question, provide sample data, desired results, and a description of the logic.

Slow MS Access Sub Query

I have three tables in Access:
employees
----------------------------------
id (pk),name
times
----------------------
id (pk),employee_id,event_time
time_notes
----------------------
id (pk),time_id,note
I want to get, for each employee, the record from the times table with the event_time immediately prior to some time. Doing that is simple enough with this:
select employees.id, employees.name,
(select top 1 times.id from times where times.employee_id=employees.id and times.event_time<=#2018-01-30 14:21:48# ORDER BY times.event_time DESC) as time_id
from employees
However, I also want to get some indication of whether there's a matching record in the time_notes table:
select employees.id, employees.name,
(select top 1 time_notes.id from time_notes where time_notes.time_id=(select top 1 times.id from times where times.employee_id=employees.id and times.event_time<=#2018-01-30 14:21:48# ORDER BY times.event_time DESC)) as time_note_present,
(select top 1 times.id from times where times.employee_id=employees.id and times.event_time<=#2018-01-30 14:21:48# ORDER BY times.event_time DESC) as last_time_id
from employees
This does work but it's SOOOOO SLOW. We're talking 10 seconds or more if there are 100 records in the employees table. The problem is peculiar to Access, as I can't reuse the last_time_id result of the other sub-query like I can in MySQL or SQL Server.
I am looking for tips on how to speed this up. Either a different query, indexes. Something.
Not sure if something like this would work for you?
SELECT
employees.id,
employees.name,
time_notes.id AS time_note_present,
times.id AS last_time_id
FROM
(
employees LEFT JOIN
(
times INNER JOIN
(
SELECT times.employee_id AS lt_employee_id, max(times.event_time) AS lt_event_time
FROM times
WHERE times.event_time <= #2018-01-30 14:21:48#
GROUP BY times.employee_id
)
AS last_times
ON times.event_time = last_times.lt_event_time AND times.employee_id = last_times.lt_employee_id
)
ON employees.id = times.employee_id
)
LEFT JOIN time_notes ON times.id = time_notes.time_id;
(Completely untested and may contain typos)
Basically, your query runs multiple correlated subqueries, one of them nested in a WHERE clause. A correlated subquery is evaluated separately for each row of the outer query.
Similar to @LeeMac's answer, simply join all your tables to an aggregate query for the max event_time grouped by employee_id, which runs once across all rows. Below, times is the base FROM table, joined to the aggregate query and to the employees and time_notes tables:
select e.id, e.name, t.event_time, n.note
from ((times t
inner join
(select sub.employee_id, max(sub.event_time) as max_event_time
from times sub
where sub.event_time <= #2018-01-30 14:21:48#
group by sub.employee_id
) as agg_qry
on t.employee_id = agg_qry.employee_id and t.event_time = agg_qry.max_event_time)
inner join employees e
on e.id = t.employee_id)
left join time_notes n
on n.time_id = t.id
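The aggregate-join pattern can be tried outside Access. The sketch below uses Python's built-in sqlite3 module, so the date literal becomes an ISO-formatted string parameter instead of Access's #...# syntax; the sample rows and the alias agg are invented for illustration, and this says nothing about how fast Access itself will run the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE times (id INTEGER PRIMARY KEY, employee_id INTEGER, event_time TEXT);
    CREATE TABLE time_notes (id INTEGER PRIMARY KEY, time_id INTEGER, note TEXT);
    INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO times VALUES
        (1, 1, '2018-01-30 09:00:00'),
        (2, 1, '2018-01-30 14:00:00'),
        (3, 1, '2018-01-30 15:00:00'),
        (4, 2, '2018-01-30 10:00:00');
    INSERT INTO time_notes VALUES (1, 2, 'late lunch');
""")

# The aggregate subquery runs once, then is joined back to times to
# recover the id of each employee's latest row before the cutoff.
latest = conn.execute("""
    SELECT e.id, e.name, t.id AS last_time_id, n.id AS time_note_present
    FROM times t
    INNER JOIN (SELECT sub.employee_id, MAX(sub.event_time) AS max_event_time
                FROM times sub
                WHERE sub.event_time <= ?
                GROUP BY sub.employee_id) agg
            ON t.employee_id = agg.employee_id
           AND t.event_time = agg.max_event_time
    INNER JOIN employees e ON e.id = t.employee_id
    LEFT JOIN time_notes n ON n.time_id = t.id
    ORDER BY e.id
""", ("2018-01-30 14:21:48",)).fetchall()
print(latest)
```

The 15:00 row is excluded by the cutoff, so Ann's latest row is id 2, which has a note; Bob's latest row has none, so the left join leaves that column empty. Note that an employee with two rows sharing the same latest event_time would come back twice with this pattern.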

Select specific columns from two tables

Suppose I have two tables tblEmployee and tblEmpSalary. I need to write a SQL statement to get a list of all employees, their name and salary, who receive the highest salary in each department.
Sample table data is here:
You could use ranking functions in this case:
WITH ranked AS (
SELECT
e.*,
s.monSalary,
rnk = RANK() OVER (PARTITION BY e.strDepartment ORDER BY s.monSalary DESC)
FROM tblEmployee e
INNER JOIN tblEmpSalary s ON e.intEmployeeID = s.intEmployeeID
)
SELECT
intEmployeeID,
strEmpName,
strDepartment,
monSalary
FROM ranked
WHERE rnk = 1
The RANK() function will do if you only need those who have the topmost salary. With RANK(), the query may return more than one employee per department if they have the same salary.
Alternatively, you can use DENSE_RANK() instead of RANK(), with the same effect, but DENSE_RANK() would also allow you to get employees with top n salaries. (You would be able to specify that in the WHERE condition like this:
WHERE rnk <= n
)
If, however, you need exactly one employee per department, even if there are several of them matching the requirement, use ROW_NUMBER() instead of RANK(). But then you'll probably need to add another criterion to the ORDER BY clause of the ranking function, e.g. like this:
... ORDER BY s.monSalary DESC, e.strEmpName ASC)
In fact, ROW_NUMBER() would simply make your query employee-oriented rather than salary-oriented. With ROW_NUMBER(), you would be able to have your query return top n most-paid employees, using the same condition as with DENSE_RANK():
WHERE rnk <= n
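To compare the three ranking functions side by side, here is a small sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite is version 3.25 or later (the first release with window functions); the pay table and its rows are invented for illustration and stand in for the joined tblEmployee and tblEmpSalary data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pay (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO pay VALUES
        ('Ann', 'IT', 900), ('Bob', 'IT', 900), ('Cid', 'IT', 700),
        ('Dee', 'AC', 800);
""")

# Ann and Bob tie for the top IT salary, which is where the functions differ.
ranked = conn.execute("""
    SELECT name, dept, salary,
           RANK()       OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk,
           DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dense_rnk,
           ROW_NUMBER() OVER (PARTITION BY dept
                              ORDER BY salary DESC, name ASC) AS row_num
    FROM pay
    ORDER BY dept, row_num
""").fetchall()
for row in ranked:
    print(row)
```

With the tie at 900, RANK() gives 1, 1, 3 within IT, DENSE_RANK() gives 1, 1, 2, and ROW_NUMBER() gives 1, 2, 3. Filtering on the first two with = 1 returns both Ann and Bob, while filtering on ROW_NUMBER() returns exactly one employee per department.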
You can read more about ranking functions in SQL Server on MSDN:
Ranking Functions (Transact-SQL)
SELECT e.strEmpName, s.monSalary
FROM tblEmployee e
JOIN tblEmpSalary s ON e.intEmployeeID = s.intEmployeeID
WHERE e.strDepartment + '-' + CAST(s.monSalary AS varchar(20)) IN (
SELECT e2.strDepartment + '-' + CAST(MAX(s2.monSalary) AS varchar(20))
FROM tblEmployee e2
JOIN tblEmpSalary s2 ON e2.intEmployeeID = s2.intEmployeeID
GROUP BY e2.strDepartment)
Disclaimer: I can't test this query right now, so it could have some small detail wrong.
SELECT a.d, a.m, b.strEmpName
FROM (
SELECT x.strDepartment d, MAX(x.monSalary) m
FROM (
SELECT e.strDepartment, s.monSalary
FROM tblEmployee e
LEFT JOIN tblEmpSalary s ON e.intEmployeeID = s.intEmployeeID
) x
GROUP BY x.strDepartment
) a
LEFT JOIN (
SELECT e.strEmpName, e.strDepartment, s.monSalary
FROM tblEmployee e
LEFT JOIN tblEmpSalary s ON e.intEmployeeID = s.intEmployeeID
) b ON a.d = b.strDepartment AND a.m = b.monSalary
SELECT tblEmployee.strEmpName, max_salaries.strDepartment, max_salaries.salary
FROM (SELECT tblEmployee.strDepartment, MAX(monSalary) AS salary
FROM tblEmployee INNER JOIN tblEmpSalary
ON tblEmployee.intEmployeeID = tblEmpSalary.intEmployeeID
GROUP BY tblEmployee.strDepartment) max_salaries
INNER JOIN tblEmployee ON tblEmployee.strDepartment = max_salaries.strDepartment
INNER JOIN tblEmpSalary ON tblEmpSalary.monSalary = max_salaries.salary
AND tblEmpSalary.intEmployeeID = tblEmployee.intEmployeeID
In case of two or more employees with equal max salaries - this will return all of them for the specified department.