SQL retrieve info from 2 different ID in a single query - sql

Let's say I have a table that represents soccer matches (X_train in this case) with an away_team_id and a home_team_id, those id points to another table 'team_attributes'.
What I managed to do with a query is to select the attributes of only one of the team but I'm interested in getting both team's attributes.
This is the query I'm using now :
SELECT
X_Train.* , Team_Attributes.*, MAX(Team_Attributes.date)
FROM
X_Train
LEFT JOIN
Team_Attributes ON X_Train.home_team_api_id = Team_Attributes.team_api_id
AND Team_Attributes.date <= X_Train.date
GROUP BY
X_Train.id
ORDER BY
X_Train.date
This works fine but I need to get the same join on the X_train.away_team_api_id, is there an easy way to do this ? I tried using UNION but maybe I didn't look far enough in that direction.
Thank you

You need a second join. For that -- and to simplify the query -- use table aliases:
SELECT t.*, hta.*, ata.*
FROM X_Train t LEFT JOIN
Team_Attributes hta
ON t.home_team_api_id = hta.team_api_id AND
hta.date <= t.date LEFT JOIN
Team_Attributes ata
ON t.away_team_api_id = ata.team_api_id AND
ata.date <= t.date
ORDER BY t.date DESC;
I don't understand what the GROUP BY is doing, so I removed it. Your question appears to be about JOIN logic anyway.

Related

How to filter where a condition is true at least once

I need to filter down to only service orders that have a "service" work group value in at least one of their tasks. However, I don't want to get rid of the rows that aren't work group = "Service" if at least one of the task rows has that value. The end result would leave out all data from service orders that didn't have at least one BI_WRKFLW_TASK_KEY that was equal to "SERVICE". I know how to do normal filters but getting it to this specificity is beyond my current experience.
I've experimented with normal filters but they leave out rows that are a part of the same Service Order but just don't have that work group.
SELECT W.BI_WRKFLW_KEY,
T.BI_WORK_EVENT_CD,
T.BI_TASK_CD,
T.BI_WORKGRP,
**M.BI_SO_NBR**,
M.BI_SO_TYPE_CD,
M.BI_CLOSE_DT,
M.BI_OPEN_DT,
M.BI_SO_STAT_CD,
R.BI_WRKFLW_TMPLT_NM,
T.BI_WRKFLW_TASK_SEQ_NBR,
T.BI_WORKGRP,
A.BI_WORK_EVENT_CD,
A.BI_EVENT_DT_TM,
A.SY_JOB_QUEUE_ID,
**A.BI_WORKGRP**,
A.SY_USER_ID,
**A.BI_WRKFLW_TASK_KEY**
FROM BI_WRKFLW W
LEFT JOIN BI_WRKFLW_TASKS T ON W.BI_WRKFLW_KEY = T.BI_WRKFLW_KEY
LEFT JOIN BI_SO_DET D ON W.BI_WRKFLW_KEY = D.BI_WRKFLW_KEY
LEFT JOIN BI_SO_MASTER M ON D.BI_SO_NBR = M.BI_SO_NBR
LEFT JOIN BI_WRKFLW_TMPLT_REF R ON W.BI_WRKFLW_TMPLT_ID = R.BI_WRKFLW_TMPLT_ID
LEFT JOIN BI_TASK_ACT A ON T.BI_WRKFLW_TASKS_KEY = A.BI_WRKFLW_TASKS_KEY
WHERE M.BI_OPEN_DT >= ADD_MONTHS(CURRENT_DATE, -'12')
--AND M.BI_SO_TYPE_CD IN ('IVC-NEW1')
--AND M.BI_SO_STAT_CD LIKE 'O'
ORDER BY M.BI_SO_NBR, T.BI_EVENT_DT_TM
Any Service order row where the Service order has at least one BI_WRKFLOW_TASK_CD = "Service" would be kept and all other service orders filtered out.
I tried to map this out, i may not have got it quite right,
I think you are asking for BI_SO_MASTER records that have >=1 BI_WRKFLW_TASKS that belong to a certain group.
Try using a CTE to get the detail rows with a correct task, then you can find the SO population... then you can ???not sure what the ultimate result set goal is?
;with matchingTasks as ( D.BI_SO_NBR, D.<id> , W.BI_WRKFLW_KEY , T.<key> , A.Key
from BI_WRKFLW W
LEFT JOIN BI_WRKFLW_TASKS T ON W.BI_WRKFLW_KEY = T.BI_WRKFLW_KEY
LEFT JOIN BI_SO_DET D ON W.BI_WRKFLW_KEY = D.BI_WRKFLW_KEY
LEFT JOIN BI_TASK_ACT A ON T.BI_WRKFLW_TASKS_KEY = A.BI_WRKFLW_TASKS_KEYW
Where
<good dates>
and <A.field is what I am looking for>
)
/*Here you have the SO population
as well as the ids that helped this SO qualify.
*/
, My_SO_Population as (select Distinct BI_SO_NBR from matchingTasks )
/*now you can go get what you need.
the challenge of finding SOs w/ >=1 matching task has been solved...
*/
select <necessary fields> from
My_SO_Population
join <whatever you need....this is where i am cloudy>
if i am missing the goal, let me know where...
You can just add this to your WHERE clause:
AND T.BI_WRKFLW_KEY IN (
SELECT BI_WRKFLW_KEY
FROM BI_WRKFLW_TASKS
WHERE BI_WRKFLOW_TASK_CD = 'Service')

SQL triple left join query across three databases

I'm trying to run a query across three tables in three different databases. This query works but I'm pulling close to a billion records... Is there any solution to pull the distinct fields from smlog.requestor_type and arcust.maj_class for the following query?
SELECT
smreq.request_id AS ROIrequestID,
arcust.customer AS LAWcustID,
smlog.logid AS ESLlogID,
arcust.maj_class AS invoicetype,
smlog.requestor_type AS SMLrequestortype,
smlog.request_type as SMLrequesttype
FROM roi.sm_request_sp_data reqsp
LEFT JOIN smart.smlog#smartlog smlog ON smlog.logid = reqsp.logid
LEFT JOIN roi.sm_requests smreq ON smreq.request_id = reqsp.request_id
LEFT JOIN lawson.arcustomer#smart7 arcust ON arcust.customer =
smreq.customer_id
WHERE smreq.ORIG_DT >= TO_DATE('2016/03/01', 'yyyy/mm/dd')
AND smreq.ORIG_DT <= TO_DATE('2016/03/02','yyyy/mm/dd')
GROUP BY smlog.requestor_type;
These are observations, not an answer
SELECT
smreq.request_id AS ROIrequestID
FROM roi.sm_request_sp_data reqsp
LEFT JOIN roi.sm_requests smreq ON reqsp.request_id = smreq.request_id
WHERE smreq.ORIG_DT >= TO_DATE('2016/03/01', 'yyyy/mm/dd')
AND smreq.ORIG_DT <= TO_DATE('2016/03/02', 'yyyy/mm/dd')
That LEFT JOIN is overridden completely by the where clause (any NULL produced from the left join is disallowed) so use an INNER JOIN instead.
For the where clause It isn't clear if you want one day's data ('2016/03/01') or 2 day's (both '2016/03/01'+ '2016/03/02'), If you are expecting just one day then don't use <= in the second predicate.
For the rest we really have no factual basis to make recommendations.

Include missing years in Group By query

I am fairly new in Access and SQL programming. I am trying to do the following:
Sum(SO_SalesOrderPaymentHistoryLineT.Amount) AS [Sum Of PaymentPerYear]
and group by year even when there is no amount in some of the years. I would like to have these years listed as well for a report with charts. I'm not certain if this is possible, but every bit of help is appreciated.
My code so far is as follows:
SELECT
Base_CustomerT.SalesRep,
SO_SalesOrderT.CustomerId,
Base_CustomerT.Customer,
SO_SalesOrderPaymentHistoryLineT.DatePaid,
Sum(SO_SalesOrderPaymentHistoryLineT.Amount) AS [Sum Of PaymentPerYear]
FROM
Base_CustomerT
INNER JOIN (
SO_SalesOrderPaymentHistoryLineT
INNER JOIN SO_SalesOrderT
ON SO_SalesOrderPaymentHistoryLineT.SalesOrderId = SO_SalesOrderT.SalesOrderId
) ON Base_CustomerT.CustomerId = SO_SalesOrderT.CustomerId
GROUP BY
Base_CustomerT.SalesRep,
SO_SalesOrderT.CustomerId,
Base_CustomerT.Customer,
SO_SalesOrderPaymentHistoryLineT.DatePaid,
SO_SalesOrderPaymentHistoryLineT.PaymentType,
Base_CustomerT.IsActive
HAVING
(((SO_SalesOrderPaymentHistoryLineT.PaymentType)=1)
AND ((Base_CustomerT.IsActive)=Yes))
ORDER BY
Base_CustomerT.SalesRep,
Base_CustomerT.Customer;
You need another table with all years listed -- you can create this on the fly or have one in the db... join from that. So if you had a table called alltheyears with a column called y that just listed the years then you could use code like this:
WITH minmax as
(
select min(year(SO_SalesOrderPaymentHistoryLineT.DatePaid) as minyear,
max(year(SO_SalesOrderPaymentHistoryLineT.DatePaid) as maxyear)
from SalesOrderPaymentHistoryLineT
), yearsused as
(
select y
from alltheyears, minmax
where alltheyears.y >= minyear and alltheyears.y <= maxyear
)
select *
from yearsused
join ( -- your query above goes here! -- ) T
ON year(T.SO_SalesOrderPaymentHistoryLineT.DatePaid) = yearsused.y
You need a data source that will provide the year numbers. You cannot manufacture them out of thin air. Supposing you had a table Interesting_year with a single column year, populated, say, with every distinct integer between 2000 and 2050, you could do something like this:
SELECT
base.SalesRep,
base.CustomerId,
base.Customer,
base.year,
Sum(NZ(data.Amount)) AS [Sum Of PaymentPerYear]
FROM
(SELECT * FROM Base_CustomerT INNER JOIN Year) AS base
LEFT JOIN
(SELECT * FROM
SO_SalesOrderT
INNER JOIN SO_SalesOrderPaymentHistoryLineT
ON (SO_SalesOrderPaymentHistoryLineT.SalesOrderId = SO_SalesOrderT.SalesOrderId)
) AS data
ON ((base.CustomerId = data.CustomerId)
AND (base.year = Year(data.DatePaid))),
WHERE
(data.PaymentType = 1)
AND (base.IsActive = Yes)
AND (base.year BETWEEN
(SELECT Min(year(DatePaid) FROM SO_SalesOrderPaymentHistoryLineT)
AND (SELECT Max(year(DatePaid) FROM SO_SalesOrderPaymentHistoryLineT))
GROUP BY
base.SalesRep,
base.CustomerId,
base.Customer,
base.year,
ORDER BY
base.SalesRep,
base.Customer;
Note the following:
The revised query first forms the Cartesian product of BaseCustomerT with Interesting_year in order to have base customer data associated with each year (this is sometimes called a CROSS JOIN, but it's the same thing as an INNER JOIN with no join predicate, which is what Access requires)
In order to have result rows for years with no payments, you must perform an outer join (in this case a LEFT JOIN). Where a (base customer, year) combination has no associated orders, the rest of the columns of the join result will be NULL.
I'm selecting the CustomerId from Base_CustomerT because you would sometimes get a NULL if you selected from SO_SalesOrderT as in the starting query
I'm using the Access Nz() function to convert NULL payment amounts to 0 (from rows corresponding to years with no payments)
I converted your HAVING clause to a WHERE clause. That's semantically equivalent in this particular case, and it will be more efficient because the WHERE filter is applied before groups are formed, and because it allows some columns to be omitted from the GROUP BY clause.
Following Hogan's example, I filter out data for years outside the overall range covered by your data. Alternatively, you could achieve the same effect without that filter condition and its subqueries by ensuring that table Intersting_year contains only the year numbers for which you want results.
Update: modified the query to a different, but logically equivalent "something like this" that I hope Access will like better. Aside from adding a bunch of parentheses, the main difference is making both the left and the right operand of the LEFT JOIN into a subquery. That's consistent with the consensus recommendation for resolving Access "ambiguous outer join" errors.
Thank you John for your help. I found a solution which works for me. It looks quiet different but I learned a lot out of it. If you are interested here is how it looks now.
SELECT DISTINCTROW
Base_Customer_RevenueYearQ.SalesRep,
Base_Customer_RevenueYearQ.CustomerId,
Base_Customer_RevenueYearQ.Customer,
Base_Customer_RevenueYearQ.RevenueYear,
CustomerPaymentPerYearQ.[Sum Of PaymentPerYear]
FROM
Base_Customer_RevenueYearQ
LEFT JOIN CustomerPaymentPerYearQ
ON (Base_Customer_RevenueYearQ.RevenueYear = CustomerPaymentPerYearQ.[RevenueYear])
AND (Base_Customer_RevenueYearQ.CustomerId = CustomerPaymentPerYearQ.CustomerId)
GROUP BY
Base_Customer_RevenueYearQ.SalesRep,
Base_Customer_RevenueYearQ.CustomerId,
Base_Customer_RevenueYearQ.Customer,
Base_Customer_RevenueYearQ.RevenueYear,
CustomerPaymentPerYearQ.[Sum Of PaymentPerYear]
;

SQl Query get data very slow from different tables

I am writing a sql query to get data from different tables but it is getting data from different tables very slowly.
Approximately above 2 minutes to complete.
What i am doing is here :
1. I am getting data differences and on behalf of date difference i am getting account numbers
2. I am comparing tables to get exact data i need.
here is my query
select T.accountno,
MAX(T.datetxn) as MxDt,
datediff(MM,MAX(T.datetxn), '2011-6-30') as Diffs,
max(P.Name) as POName
from Account_skd A,
AccountTxn_skd T,
POName P
where A.AccountNo = T.AccountNo and
GPOCode = A.OfficeCode and
Code = A.POCode and
A.servicecode = T.ServiceCode
group by T.AccountNo
order by len(T.AccountNo) DESC
please help that how i can use joins or any other way to get data within very less time say 5-10 seconds.
Since it appears you are getting EVERY ACCOUNT, and performance is slow, I would try by creating a prequery by just account, then do a single join to the other join tables something like..
select
T.Accountno,
T.MxDt,
datediff(MM, T.MxDt, '2011-6-30') as Diffs,
P.Name as POName
from
( select T1.AccountNo,
Max( T1.DateTxn ) MxDt
from AccontTxn_skd T1
group by T1.AccountNo ) T
JOIN Account_skd A
on T.AccountNo = A.AccountNo
JOIN POName P
on A.POCode = P.Code <-- GUESSING as you didn't qualify alias.field
AND A.OfficeCode = P.GPOCode <-- in your query for these two fields
order by
len(T.AccountNo) DESC
You had other elements based on the T.ServiceCode matching, but since you are only grouping on the account number anyhow, did it matter which service code was used? Otherwise, you would need to group by both the account AND service code (which I would have added the service code into the prequery and added as join condition to the account table too).

Timeout running SQL query

I'm trying to using the aggregation features of the django ORM to run a query on a MSSQL 2008R2 database, but I keep getting a timeout error. The query (generated by django) which fails is below. I've tried running it directs the SQL management studio and it works, but takes 3.5 min
It does look it's aggregating over a bunch of fields which it doesn't need to, but I wouldn't have though that should really cause it to take that long. The database isn't that big either, auth_user has 9 records, ticket_ticket has 1210, and ticket_watchers has 1876. Is there something I'm missing?
SELECT
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined],
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined]
HAVING
(COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0 )
EDIT:
Here are the relevant indexes (excluding those not used in the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id
EDIT 2:
After a bit of experimentation, I've found that the following query is the smallest that results in the slow execution:
SELECT
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id]
The weird thing is that if I comment out any two lines in the above, it runs in less that 1s, but it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).
EDIT 3:
The python code which generated this is:
User.objects.annotate(
Count('tickets_captured'),
Count('assigned_tickets'),
Count('tickets_watched')
)
A look at the execution plan shows that SQL Server is first doing a cross-join on all the table, resulting in about 280 million rows, and 6Gb of data. I assume that this is where the problem lies, but why is it happening?
SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need to count distinct, instead of just count: Django annotate() multiple times causes wrong answers
As for why the query works that way: The query says to join the four tables together. So say an author has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join will return 2*3*4 tickets, one for each combination of tickets. The distinct part will remove all the duplicates.
what about this?
SELECT auth_user.*,
C1.tickets_captured__count
C2.assigned_tickets__count
C3.tickets_watched__count
FROM
auth_user
LEFT JOIN
( SELECT capturer_id, COUNT(*) AS tickets_captured__count
FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT responsible_id, COUNT(*) AS assigned_tickets__count
FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT user_id, COUNT(*) AS tickets_watched__count
FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count is not null OR C2.assigned_tickets__count is not null -- also works (I think with beter performance)