I need to join assignments and expatriates tables by a combination of ID, effective_start_date and effective_end_date.
I need to get data about employees who went to another country between their assignment's effective_start_date and effective_end_date. But I also need to handle cases where, during one assignment, data has been entered about the employee going to two or more countries - in that case I need to show only one: the last one, or the active one if there is one.
In the results I'm getting multiple rows for person ID 123, because incorrect values were entered in the assignments table - I need to show only one row for person 123: the information about him going to China (the active one).
So basically, if during one assignment (between effective_start_date and effective_end_date) there is information about him going to two different countries, I need to show only one of them. I need to correct my select statement so that it handles this case somehow.
Edit: This also needs to work when both records about the employee going to another country are historical, so I don't think this can be done with sysdate.
Edit nr. 2 - updated SQL Fiddle. I need to show BOTH expatriations for person 321 and only one for person 123 - this is basically my main goal.
Edit nr. 3 - still haven't found a solution.
LINK TO SQLFIDDLE
select
ass.person_id,
ass.effective_start_date,
ass.effective_end_date,
exp.date_from,
exp.date_to,
exp.home_country,
exp.host_country
from expatriates exp, assignments ass
where
exp.person_id=ass.person_id
and exp.date_to >= ass.effective_start_date
and exp.date_to <= ass.effective_end_date
As @PuneetPandey already wrote, your logic will not catch all overlapping periods.
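For reference, a full overlap test compares each range's start with the other range's end, along these lines (just a sketch of the join condition, using the same columns as your query):
where exp.person_id = ass.person_id
and exp.date_from <= ass.effective_end_date
and exp.date_to >= ass.effective_start_date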
To get only one row you can use ROW_NUMBER, e.g.
select *
from
(
select
ass.person_id,
ass.effective_start_date,
ass.effective_end_date,
exp.date_from,
exp.date_to,
exp.home_country,
exp.host_country,
row_number()
over (partition by ass.person_id, ass.effective_start_date
order by exp.date_from) as rn
from expatriates exp, assignments ass
where
exp.person_id=ass.person_id
and exp.date_to >= ass.effective_start_date
and exp.date_to <= ass.effective_end_date
) dt
where rn = 1
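If you want the last expatriation per assignment instead of the first one, only the ORDER BY inside ROW_NUMBER needs to change, e.g. (a sketch):
row_number()
over (partition by ass.person_id, ass.effective_start_date
order by exp.date_from desc) as rn
Ordering descending makes the latest expatriation within the assignment the row with rn = 1.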
First of all, I think the query condition needs to be changed from
and exp.date_to <= ass.effective_end_date to and exp.date_from <= ass.effective_end_date.
Now, if you want any one of the visited countries, you can select distinct records by person_id as below -
select
distinct ass.person_id,
ass.effective_start_date,
ass.effective_end_date,
exp.date_from,
exp.date_to,
exp.home_country,
exp.host_country
from expatriates exp, assignments ass
where
exp.person_id=ass.person_id
and exp.date_to >= ass.effective_start_date
and exp.date_from <= ass.effective_end_date
Or, if you want a particular row, you could maintain another column for status, set it to '1' if the visit is active and '0' otherwise, and use the query below -
select
ass.person_id,
ass.effective_start_date,
ass.effective_end_date,
exp.date_from,
exp.date_to,
exp.home_country,
exp.host_country
from expatriates exp, assignments ass
where
exp.person_id=ass.person_id
and exp.date_to >= ass.effective_start_date
and exp.date_from <= ass.effective_end_date
and exp.status = 1
I think you need to join a third table, which will be a derived table like "X" below:
select
ass.person_id,
ass.effective_start_date,
ass.effective_end_date,
exp.date_from,
exp.date_to,
exp.home_country,
exp.host_country
from expatriates exp, assignments ass, (
SELECT e.person_id, MAX(e.date_from) md
FROM expatriates e
INNER JOIN assignments a ON e.person_id=a.person_id
and e.date_to >= a.effective_start_date
and e.date_to <= a.effective_end_date GROUP BY e.person_id) X
where exp.person_id = X.person_id
and exp.date_from = X.md
and exp.person_id = ass.person_id
and exp.date_to >= ass.effective_start_date
and exp.date_to <= ass.effective_end_date
I'm assuming that if a person gets fired, effective_end_date will be updated and no more expatriates records will appear. So I just select the last date_to in expatriates. That is why I don't see why you need to compare date ranges, and I removed that part from my where clause.
SQL FIDDLE DEMO
with active_or_last_ass AS (
SELECT exp.person_id, date_from, max(exp.date_to) max_date
FROM expatriates exp
WHERE exp.date_from < sysdate
GROUP BY exp.person_id, date_from
)
select
ass.person_id,
ass.effective_start_date,
ass.effective_end_date,
exp.date_from,
exp.date_to,
exp.home_country,
exp.host_country
from
active_or_last_ass ala
inner join expatriates exp
on exp.person_id = ala.person_id
and exp.date_to = ala.max_date
inner join assignments ass
on exp.person_id = ass.person_id
I have a SQL query (postgresql) that looks something like this:
SELECT
my_timestamp::timestamp::date as the_date,
count(*) as count
FROM my_table
WHERE ...
GROUP BY the_date
ORDER BY the_date
The result is a table of YYYY-MM-DD, count pairs.
Now I've been asked to fill in the empty dates with zero. So if I was previously providing
2022-03-15 3
2022-03-17 1
I'd now want to return
2022-03-15 3
2022-03-16 0
2022-03-17 1
Now I can easily do this client-side (relative to the database) and let my program compute and return the zero-augmented list to its clients, based on the original list from Postgres. But perhaps it would be better if I could just tell PostgreSQL to include the zeros.
I suspect this isn't easy at all, because Postgres has no obvious way of knowing what I'm up to. But in the interests of learning more about Postgres and SQL, I thought I'd have a try. The try isn't too promising thus far...
Any pointers before I conclude that I was right to leave this to my (postgres client) program?
Update
This is an interesting case where my simplification of the problem led to a correct answer that didn't work for me. For those who come after, I thought it worth documenting what followed, because it takes some fun twists through constructing SQL queries.
@a_horse_with_no_name responded with a query that I've verified works if I simplify my own query to match. Unfortunately, my query had some extra baggage that I didn't think was pertinent, and so had trimmed out when posting the original question.
Here's my real (original) query, with all names preserved (if shortened):
-- current query
SELECT
LEAST(time1, time2, time3, time4)::timestamp::date as the_date,
count(*) as count
FROM reading_group_reader rgr
INNER JOIN ( SELECT group_id, group_type ::group_type_name
FROM (VALUES (31198, 'excerpt')) as T(group_id, group_type)) TT
ON TT.group_id = rgr.group_id
AND TT.group_type = rgr.group_type
WHERE LEAST(time1, time2, time3, time4) > current_date - 30
GROUP BY the_date
ORDER BY the_date;
If I translate that directly into the proposed solution, however, the inner join between reading_group_reader and the temporary table TT causes the left join to become inner (I think) and the date sequence drops its zeros again. FWIW, TT is written as a derived table because sometimes it actually is a subselect.
So I transformed my query into this:
SELECT
g.dt::date as the_date,
count(*) as count
FROM generate_series(date '2022-03-06', date '2022-04-06', interval '1 day') as g(dt)
LEFT JOIN (
SELECT
LEAST(rgr.time1, rgr.time2, rgr.time3, rgr.time4)::timestamp::date as the_date
FROM reading_group_reader rgr
INNER JOIN (
SELECT group_id, group_type ::group_type_name
FROM (VALUES (31198, 'excerpt')) as T(group_id, group_type)) TT
ON TT.group_id = rgr.group_id
AND TT.group_type = rgr.group_type
) rgrt
ON rgrt.the_date = g.dt::date
GROUP BY g.dt
ORDER BY the_date;
but this outputs 1's instead of 0's at the places that should be 0.
The reason, however, is that I've now selected every date, so, of course, there's one of each. I need to include an additional field (which will be NULL for unmatched dates) and count that.
So this query finally does what I want:
SELECT
g.dt::date as the_date,
count(rgrt.device_id) as count
FROM generate_series(date '2022-03-06', date '2022-04-06', interval '1 day') as g(dt)
LEFT JOIN (
SELECT
LEAST(rgr.time1, rgr.time2, rgr.time3, rgr.time4)::timestamp::date as the_date,
rgr.device_id
FROM reading_group_reader rgr
INNER JOIN (
SELECT group_id, group_type ::group_type_name
FROM (VALUES (31198, 'excerpt')) as T(group_id, group_type)
) TT
ON TT.group_id = rgr.group_id
AND TT.group_type = rgr.group_type
) rgrt(the_date)
ON rgrt.the_date = g.dt::date
GROUP BY g.dt
ORDER BY g.dt;
And, of course, on re-reading the accepted answer, I eventually saw that he did count an unrelated field, which I'd simply missed on my first several readings.
You will need to join to a list of dates. This can e.g. be done using generate_series()
SELECT g.dt::date as the_date,
count(t.my_timestamp) as count
FROM generate_series(date '2022-03-01',
date '2022-03-31',
interval '1 day') as g(dt)
LEFT JOIN my_table as t
ON t.my_timestamp::date = g.dt::date
AND ... -- the original WHERE clause goes here!
GROUP BY the_date
ORDER BY the_date;
Note that the original WHERE conditions need to go into the join condition of the LEFT JOIN. You can't put them into a WHERE clause because that would turn the outer join back into an inner join (which means the missing dates wouldn't be returned).
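For example, a query like the following (with a hypothetical status filter in the WHERE clause) would drop the zero rows again, because the NULL rows produced for unmatched dates fail the test:
SELECT g.dt::date as the_date,
       count(t.my_timestamp) as count
FROM generate_series(date '2022-03-01',
                     date '2022-03-31',
                     interval '1 day') as g(dt)
LEFT JOIN my_table as t
       ON t.my_timestamp::date = g.dt::date
WHERE t.status = 'active'   -- hypothetical filter; unmatched dates are removed here
GROUP BY the_date
ORDER BY the_date;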
Let's take a simple query in Oracle:
SELECT
CASE.ID,
CASE.TYPE,
CASE.DATE_RAISED
FROM
CASE
WHERE
CASE.DATE_RAISED > '2019-01-01'
Now let's say another table, EVENT, contains multiple events which may be associated with each case (linked via EVENT.CASE_ID), or none at all. I want to report on the earliest-dated future event per case - or, if none exists, return NULL. I can do this with a subquery in the SELECT clause, as follows:
SELECT
CASE.ID,
CASE.TYPE,
CASE.DATE_RAISED,
(
SELECT
MIN(EVENT.DATE)
FROM
EVENT
WHERE
EVENT.CASE_ID = CASE.ID
AND EVENT.DATE >= CURRENT_DATE
) AS MIN_EVENT_DATE
FROM
CASE
WHERE
CASE.DATE_RAISED > '2019-01-01'
This will return a table like this:
Case ID Case Type Date Raised Min Event Date
76 A 03/01/2019 10/05/2019
43 B 02/02/2019 [NULL]
89 A 29/01/2019 08/07/2019
90 A 04/03/2019 [NULL]
102 C 15/04/2019 20/05/2019
Note that if there do not exist any Events which match the criteria, the line is still returned but without a value. This is because the subquery is in the SELECT clause. This works just fine.
My problem, however, is if I want to return more than one column from the EVENT table - while still at the same time preserving the possibility that there are no matching rows from the EVENT table. The above code only returns EVENT.DATE as the single subquery result, to ONE column of the main query. But what if I also want to return EVENT.ID or EVENT.TYPE, while still allowing for them to be NULL (if no matching records from EVENT are found)?
I suppose I could use multiple subqueries in the SELECT clause: each returning just ONE column. But this seems horribly inefficient, given that each subquery would be based on the same criteria (the minimum-dated EVENT whose CASE ID matches that of the main query; or NULL if no such events found).
I suspect some nifty joins would be the answer - although I'm struggling to understand which ones exactly.
Please note that the above examples are vastly simplified versions of my actual code, which already contains multiple joins in the "old style" Oracle format, eg:
WHERE
CASE.ID(+) = EVENT.CASE_ID
There are reasons why this is so - therefore a request to anyone answering: please could you demonstrate any solutions in this style of coding, as my SQL isn't advanced enough to refactor the "newer" style joins into the existing code.
You can use a join and window functions. For instance:
select c.*, e.*
from case c left join
(select e.*,
row_number() over (partition by e.case_id order by e.date desc) as seqnum
from event e
) e
on e.case_id = c.id and e.seqnum = 1
where c.date_raised > date '2019-01-01'; -- assuming the value is a date
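If you specifically want the earliest future event (as in your original correlated subquery), the same pattern works with the date filter moved into the derived table and an ascending sort, e.g. (a sketch):
select c.*, e.*
from case c left join
(select e.*,
row_number() over (partition by e.case_id order by e.date asc) as seqnum
from event e
where e.date >= current_date
) e
on e.case_id = c.id and e.seqnum = 1
where c.date_raised > date '2019-01-01';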
Is this what you mean? I just rewrote Gordon's answer with old Oracle join syntax and your code style.
SELECT
CASE.ID,
CASE.TYPE,
CASE.DATE_RAISED,
MIN_E.DATE AS MIN_EVENT_DATE
FROM
CASE,
(SELECT EVENT.*,
ROW_NUMBER() OVER (PARTITION BY EVENT.CASE_ID ORDER BY EVENT.DATE DESC) AS SEQNUM
FROM
EVENT
WHERE
EVENT.DATE >= CURRENT_DATE
) MIN_E
WHERE
CASE.DATE_RAISED > DATE '2019-01-01'
AND MIN_E.CASE_ID (+) = CASE.ID
AND MIN_E.SEQNUM (+) = 1;
Create an object type with the columns you want and return it from the subquery. Your query will look like this:
SELECT
CASE.ID,
CASE.TYPE,
CASE.DATE_RAISED,
(
SELECT
t_your_new_type ( MIN(EVENT.DATE) , min ( EVENT.your_another_column ) )
FROM
EVENT
WHERE
EVENT.CASE_ID = CASE.ID
AND EVENT.DATE >= CURRENT_DATE
) AS MIN_EVENT_DATE
FROM
CASE
WHERE
CASE.DATE_RAISED > '2019-01-01'
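For completeness, the object type itself could be declared roughly like this (the type name and column definitions are illustrative only):
CREATE TYPE t_your_new_type AS OBJECT (
  min_event_date DATE,
  another_value  VARCHAR2(100)
);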
I'm working in a fault-reporting Oracle database, trying to get fault information out of it.
The main table I'm querying is Incident, which includes incident information. Each record in Incident may have any number of records in the WorkOrder table (or none) and each record in WorkOrder may have any number of records in the WorkLog table (or none).
What I am trying to do at this point is, for each record in Incident, find the WorkLog with the minimum value in the field MXRONSITE, and, for that worklog, return the MXRONSITE time and the REPORTDATE from the work order. I accomplished this using a MIN subquery, but it turned out that several worklogs could have the same MXRONSITE time, so I was pulling back more records than I wanted. I tried to create a subsubquery for it, but it now says I have an invalid identifier (ORA-00904) for WOL1.WONUM in the WHERE line, even though that identifier is in use elsewhere.
Any help is appreciated. Note that there is other stuff in the query, but the rest of the query works in isolation, and this bit doesn't work either in the full query or on its own.
SELECT
WL1.MXRONSITE as "Date_First_Onsite",
WOL1.REPORTDATE as "Date_First_Onsite_Notified"
FROM Maximo.Incident
LEFT JOIN (Maximo.WorkOrder WOL1
LEFT JOIN Maximo.Worklog WL1
ON WL1.RECORDKEY = WOL1.WONUM)
ON WOL1.ORIGRECORDID = Incident.TICKETID
AND WOL1.ORIGRECORDCLASS = 'INCIDENT'
WHERE (WL1.WORKLOGID IN
(SELECT MIN(WL3.WORKLOGID)
FROM (SELECT MIN(WL3.MXRONSITE), WL3.WORKLOGID
FROM Maximo.Worklog WL3 WHERE WOL1.WONUM = WL3.RECORDKEY))
or WL1.WORKLOGID is null)
To clarify, what I want is:
For each fault in Incident,
the earliest MXRONSITE from the Worklog table (if such a value exists),
For that worklog, information from the associated record from the WorkOrder table.
This is complicated by Incident records having multiple work orders, and work orders having multiple work logs, which may have the same MXRONSITE time.
After some trials, I have found an (almost) working solution:
WITH WLONSITE as (
SELECT
MIN(WLW.MXRONSITE) as "ONSITE",
WLWOW.ORIGRECORDID as "TICKETID",
WLWOW.WONUM as "WONUM"
FROM
MAXIMO.WORKLOG WLW
INNER JOIN
MAXIMO.WORKORDER WLWOW
ON
WLW.RECORDKEY = WLWOW.WONUM
WHERE
WLWOW.ORIGRECORDCLASS = 'INCIDENT'
GROUP BY
WLWOW.ORIGRECORDID, WLWOW.WONUM
)
select
incident.ticketid,
wlonsite.onsite,
wlonsite.wonum
from
maximo.incident
LEFT JOIN WLONSITE
ON WLONSITE.TICKETID = Incident.TICKETID
WHERE
(WLONSITE.ONSITE is null or WLONSITE.ONSITE = (SELECT MIN(WLONSITE.ONSITE) FROM WLONSITE WHERE WLONSITE.TICKETID = Incident.TICKETID AND ROWNUM=1))
AND Incident.AFFECTEDDATE >= TO_DATE ('01/12/2015', 'DD/MM/YYYY')
This however is significantly slower, and also still not quite right, as it turns out a single Incident can have multiple Work Orders with the same ONSITE time (aaargh!).
As requested, here is a sample input, and what I want to get from it (apologies for the formatting). Note that while TICKETID and WONUM are primary keys, they are strings rather than integers. WORKLOGID is an integer.
Incident table:
TICKETID / Description / FieldX
1 / WORD1 / S
2 / WORD2 / P
3 / WORDX /
4 / / Q
Work order table:
WONUM / ORIGRECORDID / REPORTDATE
11 / 1 / 2015-01-01
12 / 2 / 2015-01-01
13 / 2 / 2015-02-04
14 / 3 / 2015-04-05
Worklog table:
WORKLOGID / RECORDKEY / MXRONSITE
101 / 11 / 2015-01-05
102 / 12 / 2015-01-04
103 / 12 /
104 / 12 / 2015-02-05
105 / 13 /
Output:
TICKETID / WONUM / WORKLOGID
1 / 11 / 101
2 / 12 / 102
3 / /
4 / /
(Worklog 101 linked to TICKETID 1, has non-null MXRONSITE, and is from work order 11)
(Worklogs 102-105 linked to TICKETID 2, of which 102 has lowest MXRONSITE, and is work order 12)
(No work logs associated with faults 3 or 4, so the work order and worklog fields are null)
Post Christmas attack!
I have found a solution which works:
The method I found was to use multiple WITH queries, as follows:
WITH WLMINL AS (
SELECT
RECORDKEY, MXRONSITE, MIN(WORKLOGID) AS "WORKLOG"
FROM MAXIMO.WORKLOG
WHERE WORKLOG.CLASS = 'WORKORDER'
GROUP BY RECORDKEY, MXRONSITE
),
WLMIND AS (
SELECT
RECORDKEY, MIN(MXRONSITE) AS "MXRONSITE"
FROM MAXIMO.WORKLOG
WHERE WORKLOG.CLASS = 'WORKORDER'
GROUP BY RECORDKEY
),
WLMIN AS (
SELECT
WLMIND.RECORDKEY AS "WONUM", WLMIND.MXRONSITE AS "ONSITE", WLMINL.WORKLOG AS "WORKLOGID"
FROM
WLMIND
INNER JOIN
WLMINL
ON
WLMIND.RECORDKEY = WLMINL.RECORDKEY AND WLMIND.MXRONSITE = WLMINL.MXRONSITE
)
Thus for each work order finding the first date, then for each work order and date finding the lowest worklogid, then joining the two tables. This is then repeated at a higher level to find the data by incident.
However this method does not work in a reasonable time, so while it may be suitable for smaller databases it's no good for the behemoths I'm working with.
I would do this with row_number function:
SQLFiddle
select ticketid, case when worklogid is not null then reportdate end d1, mxronsite d2
from (
select i.ticketid, wo.reportdate, wl.mxronsite, wo.wonum, wl.worklogid,
row_number() over (partition by i.ticketid
order by wl.mxronsite, wo.reportdate) rn
from incident i
left join workorder wo on wo.origrecordid = i.ticketid
and wo.origrecordclass = 'INCIDENT'
left join worklog wl on wl.recordkey = wo.wonum )
where rn = 1 order by ticketid
When you nest subqueries, you cannot access columns that belong to a query two or more levels higher; in your statement, WL1 is not accessible in the innermost subquery. (There is also a GROUP BY clause missing, btw.)
This might work (not exactly sure what output you expect, but try it):
SELECT
WL1.MXRONSITE as "Date_First_Onsite",
WOL1.REPORTDATE as "Date_First_Onsite_Notified"
FROM Maximo.Incident
LEFT JOIN (
Maximo.WorkOrder WOL1
LEFT JOIN Maximo.Worklog WL1
ON WL1.RECORDKEY = WOL1.WONUM
) ON WOL1.ORIGRECORDID = Incident.TICKETID
AND WOL1.ORIGRECORDCLASS = 'INCIDENT'
WHERE WL1.WORKLOGID =
( SELECT MIN(WL3.WORKLOGID)
FROM Maximo.WorkOrder WOL3
LEFT JOIN Maximo.Worklog WL3
ON WL3.RECORDKEY = WOL3.WONUM
WHERE WOL3.ORIGRECORDID = WOL1.ORIGRECORDID
AND WL3.MXRONSITE IS NOT NULL
)
OR WL1.WORKLOGID IS NULL AND NOT EXISTS
( SELECT MIN(WL4.WORKLOGID)
FROM Maximo.WorkOrder WOL4
LEFT JOIN Maximo.Worklog WL4
ON WL4.RECORDKEY = WOL4.WONUM
WHERE WOL4.ORIGRECORDID = WOL1.ORIGRECORDID
AND WL4.MXRONSITE IS NOT NULL )
I may not have the details right on what you're trying to do... if you have some sample input and desired output, that would be a big help.
That said, I think an analytic function would help a lot, not only in getting the output but in organizing the code. Here is an example of how the max analytic function in a subquery could be used.
Again, the details on the join may be off -- if you can furnish some sample input and output, I'll bet someone can get to where you're trying to go:
with wo as (
select
wonum, origrecordclass, origrecordid, reportdate,
max (reportdate) over (partition by origrecordid) as max_date
from Maximo.workorder
where origrecordclass = 'INCIDENT'
),
logs as (
select
worklogid, mxronsite, recordkey,
max (mxronsite) over (partition by recordkey) as max_mx
from Maximo.worklog
)
select
i.ticketid,
l.mxronsite as "Date_First_Onsite",
wo.reportdate as "Date_First_Onsite_Notified"
from
Maximo.incident i
left join wo on
wo.origrecordid = i.ticketid and
wo.reportdate = wo.max_date
left join logs l on
wo.wonum = l.recordkey and
l.mxronsite = l.max_mx
-- edit --
Based on your sample input and desired output, this appears to give the desired result. It does do somewhat of an explosion in the subquery, but hopefully the efficiency of the analytic functions will dampen that. They are typically much faster, compared to using group by:
with wo_logs as (
select
wo.wonum, wo.origrecordclass, wo.origrecordid, wo.reportdate,
l.worklogid, l.mxronsite, l.recordkey,
max (reportdate) over (partition by origrecordid) as max_date,
min (mxronsite) over (partition by recordkey) as min_mx
from
Maximo.workorder wo
left join Maximo.worklog l on wo.wonum = l.recordkey
where wo.origrecordclass = 'INCIDENT'
)
select
i.ticketid, wl.wonum, wl.worklogid,
wl.mxronsite as "Date_First_Onsite",
wl.reportdate as "Date_First_Onsite_Notified"
from
Maximo.incident i
left join wo_logs wl on
i.ticketid = wl.origrecordid and
wl.mxronsite = wl.min_mx
order by 1
There are two tables - assignments and countries.
Assignments stores historical data about employees' assignments; the three main fields are person_id, effective_start_date and effective_end_date.
Countries stores info about employees who have been taking trips abroad - the important fields in it are person_id, date_from, date_to, home_country, host_country.
And I need to do the following: show all assignments, and the country which the employee had been to at any point during the assignment. This means I need to join them via an outer join, but the only way I can join them is via person_id, and there are multiple entries in each table (same IDs).
So what I did was something like this:
select *
from assignments ass, employees emp
where
ass.person_id=emp.person_id
AND (emp.date_from(+) >= ass.assignment_start_date AND emp.date_to(+) <= ass.assignment_end_date)
OR (emp.date_from(+) >= ass.assignment_start_date AND emp.date_to(+) >= ass.assignment_end_date)
But it doesn't work, because Oracle doesn't allow an OR condition in an outer join. I tried the UNION ALL method but the values returned are not quite what I expected - there are some missing values, so the logic isn't correct either. If you have any advice, please post it in the same syntax I provided (Oracle syntax), where joins are made in the WHERE clause, so it's easier for me to understand.
To get your desired result, you can use the following approach:
use ANSI style JOINs instead of the outdated Oracle syntax (they're much more flexible and IMHO also more readable)
concatenate the countries (e.g. using LISTAGG)
Query:
select ass.person_id,
assignment_start_date,
assignment_end_date,
listagg(home_country ||'-' || host_country, ';')
within group (order by date_from) as countries
from assignments ass
left join employees emp
on ass.person_id = emp.person_id
AND ((emp.date_from >= ass.assignment_start_date AND
emp.date_from <= ass.assignment_end_date)
OR (emp.date_to >= ass.assignment_start_date AND
emp.date_to <= ass.assignment_end_date))
group by ass.person_id, assignment_start_date, assignment_end_date
SQL Fiddle
I think this is what you want, with Oracle syntax:
select ass.person_id, assignment_start_date, assignment_end_date,
emp.home_country,emp.host_country
from assignments ass, employees emp
where
ass.person_id=emp.person_id(+)
AND (emp.date_from(+) <= ass.assignment_end_date AND emp.date_to(+) >= ass.assignment_end_date)
SQLFiddle
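If you also want to catch trips that both start and end inside the assignment (not only those spanning assignment_end_date), the overlap test can be widened while staying in the (+) syntax, e.g. (a sketch using the standard range-overlap test):
select ass.person_id, assignment_start_date, assignment_end_date,
emp.home_country, emp.host_country
from assignments ass, employees emp
where
ass.person_id=emp.person_id(+)
AND emp.date_from(+) <= ass.assignment_end_date
AND emp.date_to(+) >= ass.assignment_start_date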
I'm trying to run a query that gives me one SUM function, selects two columns from a joined table, and then groups that data by the unique ID I gave them. This is my original query, and it works.
SELECT Sum (Commission_Paid)
FROM [INTERN_DB2].[dbo].[PaymentList]
INNER JOIN [INTERN_DB2]..[RealEstateAgentList]
ON RealEstateAgentList.AgentID = PaymentList.AgentID
WHERE Close_Date >= '1/1/2013' AND Close_Date <= '12/31/2013'
GROUP BY RealEstateAgentList.AgentID
I've tried the query below, but I keep getting an error and I don't know why. It says it's a syntax error.
SELECT Sum (Commission_Paid)
FROM [INTERN_DB2].[dbo].[PaymentList]
INNERJOIN [INTERN_DB2]..[RealEstateAgentList](
Select First_Name, Last_Name
From [Intern_DB2]..[RealEstateAgentList]
Group By Last_name
)
ON RealEstateAgentList.AgentID = PaymentList.AgentID
WHERE Close_Date >= '1/1/2013' AND Close_Date <= '12/31/2013'
GROUP BY RealEstateAgentList.AgentID
Your query has multiple problems:
SELECT rl.AgentId, rl.first_name, rl.last_name, Sum(Commission_Paid)
FROM [INTERN_DB2].[dbo].[PaymentList] pl inner join
(Select agent_id, min(first_name) as first_name, min(last_name) as last_name
From [Intern_DB2]..[RealEstateAgentList]
GROUP BY agent_id
) rl
ON rl.AgentID = pl.AgentID
WHERE Close_Date >= '2013-01-01' AND Close_Date <= '2013-12-31'
GROUP BY rl.AgentID, rl.first_name, rl.last_name;
Here are some changes:
INNERJOIN --> inner join.
Fixed the syntax of the subquery next to the table name.
Removed columns for first and last name. They are not used.
Changed the subquery to include agent_id.
Added agent_id, first_name, and last_name to the outer aggregation, so you can tell where the values are coming from.
Changed the date formats to a less ambiguous standard form.
Added table alias for subquery.
I suspect the subquery on the agent list is not important. You can probably do:
SELECT rl.AgentId, rl.first_name, rl.last_name, Sum(pl.Commission_Paid)
FROM [INTERN_DB2].[dbo].[PaymentList] pl inner join
[Intern_DB2]..[RealEstateAgentList] rl
ON rl.AgentID = pl.AgentID
WHERE pl.Close_Date >= '2013-01-01' AND pl.Close_Date <= '2013-12-31'
GROUP BY rl.AgentID, rl.first_name, rl.last_name;
EDIT:
I'm glad this solution helped. As you continue to write queries, try to always do the following:
Use table aliases that are abbreviations of the table names.
Always use table aliases when referring to columns.
When using date constants, either use "YYYY-MM-DD" format or use convert() to convert a string using a specified format (see the sketch after this list). (The latter is actually the safer method, but the former is more convenient and works in almost all databases.)
Pay attention to the error messages; they can be informative in SQL Server (unfortunately, other databases are not so clear).
Format your query so other people can understand it. This will help you understand and debug your queries as well. I have a very particular formatting style (which no one is going to change at this point); the important thing is not the particular style but being able to "see" what the query is doing. My style is documented in my book "Data Analysis Using SQL and Excel".
There are other rules, but these are a good way to get started.
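For example, the convert() variant mentioned above could look like this (a sketch; style 112 is the ISO yyyymmdd format):
WHERE pl.Close_Date >= convert(date, '20130101', 112)
  AND pl.Close_Date <= convert(date, '20131231', 112)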
SELECT Sum(Commission_Paid)
FROM [INTERN_DB2].[dbo].[PaymentList] pl
INNER JOIN (
    Select AgentID, First_Name, Last_Name
    From [Intern_DB2]..[RealEstateAgentList]
    Group By AgentID, First_Name, Last_Name
) x ON x.AgentID = pl.AgentID
WHERE Close_Date >= '1/1/2013'
AND Close_Date <= '12/31/2013'
GROUP BY x.AgentID
This is how the query should look... however, if you subquery first and last name, you'll also have to include them in the group by. Assuming Close_Date is in the PaymentList table, this is how I would write the query:
SELECT
al.AgentID,
al.FirstName,
al.LastName,
Sum(pl.Commission_Paid) AS Commission_Paid
FROM [INTERN_DB2].[dbo].[PaymentList] pl
INNER JOIN [Intern_DB2].dbo.[RealEstateAgentList] al ON al.AgentID = pl.AgentID
WHERE YEAR(pl.Close_Date) = '2013'
GROUP BY al.AgentID, al.FirstName, al.LastName
Subqueries are evil, for the most part. There's no need for one here, because you can just get the columns from the join.