Warning: Here be beginner SQL! Be gentle...
I have two queries that independently give me what I want from the relevant tables in a reasonably timely fashion, but when I try to combine the two in a (fugly) union, things quickly fall to bits and the query either gives me duplicate records, takes an inordinately long time to run, or refuses to run at all quoting various syntax errors at me.
Note: I had to create a 'dummy' table (tblAllDates) with a single field containing dates from 1 Jan 2008 as I need the query to return a single record from each day, and there are days in both tables that have no data. This is the only way I could figure to do this, no doubt there is a smarter way...
Here are the queries:
SELECT tblAllDates.date, SUM(tblvolumedata.STT)
FROM tblvolumedata RIGHT JOIN tblAllDates ON tblvolumedata.date=tblAllDates.date
GROUP BY tblAllDates.date;
SELECT tblAllDates.date, SUM(NZ(tblTimesheetData.batching)+NZ(tblTimesheetData.categorisation)+NZ(tblTimesheetData.CDT)+NZ(tblTimesheetData.CSI)+NZ(tblTimesheetData.destruction)+NZ(tblTimesheetData.extraction)+NZ(tblTimesheetData.indexing)+NZ(tblTimesheetData.mail)+NZ(tblTimesheetData.newlodgement)+NZ(tblTimesheetData.recordedDeliveries)+NZ(tblTimesheetData.retrieval)+NZ(tblTimesheetData.scanning)) AS VA
FROM tblTimesheetData RIGHT JOIN tblAllDates ON tblTimesheetData.date=tblAllDates.date
GROUP BY tblAllDates.date;
The best result I have managed is the following:
SELECT tblAllDates.date, 0 AS STT, SUM(NZ(tblTimesheetData.batching)+NZ(tblTimesheetData.categorisation)+NZ(tblTimesheetData.CDT)+NZ(tblTimesheetData.CSI)+NZ(tblTimesheetData.destruction)+NZ(tblTimesheetData.extraction)+NZ(tblTimesheetData.indexing)+NZ(tblTimesheetData.mail)+NZ(tblTimesheetData.newlodgement)+NZ(tblTimesheetData.recordedDeliveries)+NZ(tblTimesheetData.retrieval)+NZ(tblTimesheetData.scanning)) AS VA
FROM tblTimesheetData RIGHT JOIN tblAllDates ON tblTimesheetData.date=tblAllDates.date
GROUP BY tblAllDates.date
UNION SELECT tblAllDates.date, SUM(tblvolumedata.STT) AS STT, 0 AS VA
FROM tblvolumedata RIGHT JOIN tblAllDates ON tblvolumedata.date=tblAllDates.date
GROUP BY tblAllDates.date;
This gives me the VA and STT data I want, but in two records where I have data from both in a single day, like this:
date STT VA
28/07/2008 0 54020
28/07/2008 33812 0
29/07/2008 0 53890
29/07/2008 33289 0
30/07/2008 0 51780
30/07/2008 30456 0
31/07/2008 0 52790
31/07/2008 31305 0
What I'm after is the STT and VA data in a single row per day. How might this be achieved, and how far am I from a query that could be considered optimal? (Don't laugh, I only seek to learn!)

You could put all of that into one query like so
SELECT dates.date,
SUM(volume.STT) AS STT,
SUM(NZ(timesheet.batching)+NZ(timesheet.categorisation)+NZ(timesheet.CDT)+NZ(timesheet.CSI)+NZ(timesheet.destruction)+NZ(timesheet.extraction)+NZ(timesheet.indexing)+NZ(timesheet.mail)+NZ(timesheet.newlodgement)+NZ(timesheet.recordedDeliveries)+NZ(timesheet.retrieval)+NZ(timesheet.scanning)) AS VA
FROM tblAllDates dates
LEFT JOIN tblvolumedata volume
ON dates.date = volume.date
LEFT JOIN tblTimesheetData timesheet
ON dates.date = timesheet.date
GROUP BY dates.date;
I've put the dates table first in the FROM clause and then LEFT JOINed the two other tables.
The jet database can be funny with more than one join in a query, so you may need to wrap one of the joins in parentheses (I believe this is referred to as Bill's SQL!) - I would recommend LEFT JOINing the tables in the query builder and then taking the SQL code view and modifying that to add in the SUMs, GROUP BY, etc.
Ensure that the date field in each table is indexed as you're joining each table on this field.
How about this -
SELECT date, SUM(STT) AS STT, SUM(VA) AS VA
FROM
(SELECT dates.date, 0 AS STT, SUM(NZ(tblTimesheetData.batching)+NZ(tblTimesheetData.categorisation)+NZ(tblTimesheetData.CDT)+NZ(tblTimesheetData.CSI)+NZ(tblTimesheetData.destruction)+NZ(tblTimesheetData.extraction)+NZ(tblTimesheetData.indexing)+NZ(tblTimesheetData.mail)+NZ(tblTimesheetData.newlodgement)+NZ(tblTimesheetData.recordedDeliveries)+NZ(tblTimesheetData.retrieval)+NZ(tblTimesheetData.scanning)) AS VA
FROM tblTimesheetData RIGHT JOIN tblAllDates AS dates ON tblTimesheetData.date=dates.date
GROUP BY dates.date
UNION SELECT dates.date, SUM(tblvolumedata.STT) AS STT, 0 AS VA
FROM tblvolumedata RIGHT JOIN tblAllDates AS dates ON tblvolumedata.date=dates.date
GROUP BY dates.date) AS combined
GROUP BY date;
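Interestingly, when I ran my first statement against some test data, the figures for STT and VA had all been multiplied by 4 compared with the second statement. That was not what I expected, but the most likely cause is that the first statement joins both detail tables before aggregating: every tblvolumedata row for a date is paired with every tblTimesheetData row for the same date, so each sum is multiplied by the number of matching rows in the other table.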
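The multiplication effect described above can be reproduced on a tiny data set. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Access, COALESCE in place of NZ, and hypothetical table and column names (all_dates, volume, timesheet, d). It compares joining both detail tables before aggregating with aggregating each table first and then joining the per-day totals.

```python
import sqlite3

# In-memory database with simplified, hypothetical tables (not the real schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE all_dates (d TEXT PRIMARY KEY);
    CREATE TABLE volume (d TEXT, stt INTEGER);
    CREATE TABLE timesheet (d TEXT, batching INTEGER, scanning INTEGER);
    INSERT INTO all_dates VALUES ('2008-07-28'), ('2008-07-29');
    -- two volume rows and two timesheet rows on the 28th, nothing on the 29th
    INSERT INTO volume VALUES ('2008-07-28', 100), ('2008-07-28', 50);
    INSERT INTO timesheet VALUES ('2008-07-28', 10, NULL), ('2008-07-28', 20, 5);
""")

# Join first, aggregate second: each volume row meets each timesheet row.
naive = conn.execute("""
    SELECT a.d,
           SUM(v.stt),
           SUM(COALESCE(t.batching, 0) + COALESCE(t.scanning, 0))
    FROM all_dates AS a
    LEFT JOIN volume AS v ON v.d = a.d
    LEFT JOIN timesheet AS t ON t.d = a.d
    GROUP BY a.d
    ORDER BY a.d
""").fetchall()

# Aggregate each table per day first, then join the one-row-per-day results.
fixed = conn.execute("""
    SELECT a.d, v.stt, t.va
    FROM all_dates AS a
    LEFT JOIN (SELECT d, SUM(stt) AS stt FROM volume GROUP BY d) AS v
           ON v.d = a.d
    LEFT JOIN (SELECT d,
                      SUM(COALESCE(batching, 0) + COALESCE(scanning, 0)) AS va
               FROM timesheet GROUP BY d) AS t
           ON t.d = a.d
    ORDER BY a.d
""").fetchall()

print(naive)  # sums for the 28th are doubled (2 rows in the other table)
print(fixed)  # sums for the 28th match the source rows
```

In Access the two inner SELECTs could be saved as queries and joined to the dates table, which is roughly what the UNION-based statement achieves by a different route.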

The table of dates is the best way.
Combine the joins in the FROM clause. Something like this:
SELECT d.date -- plus whatever columns you need from a and b
FROM tableOfDates d
LEFT JOIN firstTable a
ON d.date = a.date
LEFT JOIN secondTable b
ON d.date = b.date

Turn the SQL into views and join them on the dates.


SQL triple left join query across three databases

I'm trying to run a query across three tables in three different databases. This query works but I'm pulling close to a billion records... Is there any solution to pull the distinct fields from smlog.requestor_type and arcust.maj_class for the following query?
SELECT smreq.request_id AS ROIrequestID,
arcust.customer AS LAWcustID,
smlog.logid AS ESLlogID,
arcust.maj_class AS invoicetype,
smlog.requestor_type AS SMLrequestortype,
smlog.request_type as SMLrequesttype
FROM roi.sm_request_sp_data reqsp
LEFT JOIN smart.smlog#smartlog smlog ON smlog.logid = reqsp.logid
LEFT JOIN roi.sm_requests smreq ON smreq.request_id = reqsp.request_id
LEFT JOIN lawson.arcustomer#smart7 arcust ON arcust.customer =
WHERE smreq.ORIG_DT >= TO_DATE('2016/03/01', 'yyyy/mm/dd')
AND smreq.ORIG_DT <= TO_DATE('2016/03/02','yyyy/mm/dd')
GROUP BY smlog.requestor_type;
These are observations, not an answer
SELECT smreq.request_id AS ROIrequestID
FROM roi.sm_request_sp_data reqsp
LEFT JOIN roi.sm_requests smreq ON reqsp.request_id = smreq.request_id
WHERE smreq.ORIG_DT >= TO_DATE('2016/03/01', 'yyyy/mm/dd')
AND smreq.ORIG_DT <= TO_DATE('2016/03/02', 'yyyy/mm/dd')
That LEFT JOIN is overridden completely by the where clause (any NULL produced from the left join is disallowed) so use an INNER JOIN instead.
For the WHERE clause, it isn't clear whether you want one day's data ('2016/03/01') or two days' (both '2016/03/01' and '2016/03/02'). If you are expecting just one day, don't use <= in the second predicate; use < instead.
For the rest we really have no factual basis to make recommendations.
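The point about the WHERE clause overriding the LEFT JOIN can be checked with a small experiment. This is a sketch only, run against SQLite through Python's sqlite3 module rather than Oracle, with hypothetical, cut-down tables (sp_data, requests) and text dates. It compares a LEFT JOIN filtered in WHERE, an INNER JOIN, and a LEFT JOIN with the date filter moved into the ON clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sp_data (request_id INTEGER);
    CREATE TABLE requests (request_id INTEGER, orig_dt TEXT);
    INSERT INTO sp_data VALUES (1), (2), (3);
    -- request 3 has no matching row in requests at all
    INSERT INTO requests VALUES (1, '2016-03-01'), (2, '2016-02-15');
""")

# LEFT JOIN, but the WHERE clause tests a column of the outer-joined table:
# the NULL-extended rows fail the comparison and are dropped.
left_where = conn.execute("""
    SELECT s.request_id
    FROM sp_data AS s
    LEFT JOIN requests AS r ON r.request_id = s.request_id
    WHERE r.orig_dt >= '2016-03-01' AND r.orig_dt < '2016-03-02'
    ORDER BY s.request_id
""").fetchall()

# The same filter with an INNER JOIN returns the same rows.
inner = conn.execute("""
    SELECT s.request_id
    FROM sp_data AS s
    INNER JOIN requests AS r ON r.request_id = s.request_id
    WHERE r.orig_dt >= '2016-03-01' AND r.orig_dt < '2016-03-02'
    ORDER BY s.request_id
""").fetchall()

# Moving the filter into ON keeps every sp_data row (unmatched ones get NULL).
left_on = conn.execute("""
    SELECT s.request_id, r.orig_dt
    FROM sp_data AS s
    LEFT JOIN requests AS r
           ON r.request_id = s.request_id
          AND r.orig_dt >= '2016-03-01' AND r.orig_dt < '2016-03-02'
    ORDER BY s.request_id
""").fetchall()

print(left_where == inner)  # the LEFT JOIN behaved like an INNER JOIN
print(left_on)
```

Which of the two behaviours is wanted depends on the report; the sketch only shows that the two placements of the filter are not equivalent.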

Include missing years in Group By query

I am fairly new in Access and SQL programming. I am trying to do the following:
Sum(SO_SalesOrderPaymentHistoryLineT.Amount) AS [Sum Of PaymentPerYear]
and group by year even when there is no amount in some of the years. I would like to have these years listed as well for a report with charts. I'm not certain if this is possible, but every bit of help is appreciated.
My code so far is as follows:
Sum(SO_SalesOrderPaymentHistoryLineT.Amount) AS [Sum Of PaymentPerYear]
ON SO_SalesOrderPaymentHistoryLineT.SalesOrderId = SO_SalesOrderT.SalesOrderId
) ON Base_CustomerT.CustomerId = SO_SalesOrderT.CustomerId
AND ((Base_CustomerT.IsActive)=Yes))
You need another table with all years listed -- you can create this on the fly or have one in the db... join from that. So if you had a table called alltheyears with a column called y that just listed the years then you could use code like this:
WITH minmax as
(
select min(year(SO_SalesOrderPaymentHistoryLineT.DatePaid)) as minyear,
max(year(SO_SalesOrderPaymentHistoryLineT.DatePaid)) as maxyear
from SO_SalesOrderPaymentHistoryLineT
), yearsused as
(
select y
from alltheyears, minmax
where alltheyears.y >= minyear and alltheyears.y <= maxyear
)
select *
from yearsused
left join (
-- your query above goes here!
) T
ON year(T.DatePaid) = yearsused.y
You need a data source that will provide the year numbers. You cannot manufacture them out of thin air. Supposing you had a table Interesting_year with a single column year, populated, say, with every distinct integer between 2000 and 2050, you could do something like this:
Sum(NZ(data.Amount)) AS [Sum Of PaymentPerYear]
(SELECT * FROM Base_CustomerT INNER JOIN Year) AS base
INNER JOIN SO_SalesOrderPaymentHistoryLineT
ON (SO_SalesOrderPaymentHistoryLineT.SalesOrderId = SO_SalesOrderT.SalesOrderId)
) AS data
ON ((base.CustomerId = data.CustomerId)
AND (base.year = Year(data.DatePaid))),
(data.PaymentType = 1)
AND (base.IsActive = Yes)
AND (base.year BETWEEN
(SELECT Min(Year(DatePaid)) FROM SO_SalesOrderPaymentHistoryLineT)
AND (SELECT Max(Year(DatePaid)) FROM SO_SalesOrderPaymentHistoryLineT))
Note the following:
The revised query first forms the Cartesian product of Base_CustomerT with Interesting_year in order to have base customer data associated with each year (this is sometimes called a CROSS JOIN, but it's the same thing as an INNER JOIN with no join predicate, which is what Access requires).
In order to have result rows for years with no payments, you must perform an outer join (in this case a LEFT JOIN). Where a (base customer, year) combination has no associated orders, the rest of the columns of the join result will be NULL.
I'm selecting the CustomerId from Base_CustomerT because you would sometimes get a NULL if you selected from SO_SalesOrderT as in the starting query
I'm using the Access Nz() function to convert NULL payment amounts to 0 (from rows corresponding to years with no payments)
I converted your HAVING clause to a WHERE clause. That's semantically equivalent in this particular case, and it will be more efficient because the WHERE filter is applied before groups are formed, and because it allows some columns to be omitted from the GROUP BY clause.
Following Hogan's example, I filter out data for years outside the overall range covered by your data. Alternatively, you could achieve the same effect without that filter condition and its subqueries by ensuring that table Interesting_year contains only the year numbers for which you want results.
Update: modified the query to a different, but logically equivalent "something like this" that I hope Access will like better. Aside from adding a bunch of parentheses, the main difference is making both the left and the right operand of the LEFT JOIN into a subquery. That's consistent with the consensus recommendation for resolving Access "ambiguous outer join" errors.
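The core idea in both answers (start from a table of years, outer-join the payments to it, and turn NULL totals into 0) can be sketched on a toy data set. The example below is an illustration under assumptions, not the Access query itself: it runs on SQLite via Python's sqlite3 module, uses hypothetical tables years and payments, extracts the year with strftime instead of Year(), and uses COALESCE instead of Nz.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE years (y INTEGER PRIMARY KEY);
    CREATE TABLE payments (date_paid TEXT, amount REAL);
    INSERT INTO years VALUES (2012), (2013), (2014), (2015);
    -- no payments in 2013; 2015 lies outside the range covered by the data
    INSERT INTO payments VALUES
        ('2012-03-01', 100.0),
        ('2012-09-15', 50.0),
        ('2014-01-20', 75.0);
""")

rows = conn.execute("""
    SELECT y.y,
           COALESCE(SUM(p.amount), 0) AS total
    FROM years AS y
    LEFT JOIN payments AS p
           ON CAST(strftime('%Y', p.date_paid) AS INTEGER) = y.y
    WHERE y.y BETWEEN
          (SELECT MIN(CAST(strftime('%Y', date_paid) AS INTEGER)) FROM payments)
      AND (SELECT MAX(CAST(strftime('%Y', date_paid) AS INTEGER)) FROM payments)
    GROUP BY y.y
    ORDER BY y.y
""").fetchall()

for year, total in rows:
    print(year, total)  # 2013 appears with a total of 0
```

Note that the WHERE clause here only filters the years table. A filter on a payments column would have to go into the ON clause, otherwise the empty years would be removed again.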
Thank you John for your help. I found a solution which works for me. It looks quite different, but I learned a lot from it. If you are interested, here is how it looks now.
CustomerPaymentPerYearQ.[Sum Of PaymentPerYear]
LEFT JOIN CustomerPaymentPerYearQ
ON (Base_Customer_RevenueYearQ.RevenueYear = CustomerPaymentPerYearQ.[RevenueYear])
AND (Base_Customer_RevenueYearQ.CustomerId = CustomerPaymentPerYearQ.CustomerId)
CustomerPaymentPerYearQ.[Sum Of PaymentPerYear]

Subtract. SQL daily value (today value - yesterday value)

I have a database where I keep all users' traffic. It is updated each day.
I want to compute the traffic per day, that is: today's traffic minus yesterday's traffic.
There are three tables (down, up, remoteid). Here are the query and its results:
select rem.remoteid, down.mazgas, down.portas, down.down, up.up, down.date
from dbo.remoteid as rem
inner join dbo.down as down on down.mazgas=rem.mazgas and down.portas = rem.portas
inner join dbo.up as up on up.mazgas=down.mazgas and up.portas = down.portas
where down.mazgas=up.mazgas and down.portas=up.portas and down.date= up.date
remoteid mazgas portas down up date
10156529 gpon-onu_1/12/5:1 2678.0 69963.9 2014-06-01
10156529 gpon-onu_1/12/5:1 2643.8 68912.3 2014-05-31
29546232 gpon-onu_1/16/1:4 927.8 39273.6 2014-06-01
29546232 gpon-onu_1/16/1:4 923.1 39126.7 2014-05-31
I would like to get an answer:
remoteid, mazgas, portas, down, up where down = (down(today) - down(yesterday)) and up= (up(today) - up(yesterday))
Thank You a lot.
Since you did not specify the RDBMS you are using I will try to use basic SQL. Something like this would give you the desired results:
SELECT r.remoteid, r.mazgas, r.portas,
COALESCE(d2.down,0)-COALESCE(d1.down,0) AS down,
COALESCE(u2.up ,0)-COALESCE(u1.up ,0) AS up,
d2.date
FROM dbo.remoteid AS r
JOIN dbo.down AS d2 ON d2.mazgas=r.mazgas AND d2.portas=r.portas
LEFT JOIN dbo.up AS u2 ON u2.mazgas=r.mazgas AND u2.portas=r.portas AND u2.date=d2.date
LEFT JOIN dbo.down AS d1 ON d1.mazgas=r.mazgas AND d1.portas=r.portas AND d1.date=(
SELECT MAX(date)
FROM dbo.down
WHERE mazgas=r.mazgas AND portas=r.portas AND date<d2.date)
LEFT JOIN dbo.up AS u1 ON u1.mazgas=r.mazgas AND u1.portas=r.portas AND u1.date=(
SELECT MAX(date)
FROM dbo.up
WHERE mazgas=r.mazgas AND portas=r.portas AND date<u2.date)
Things to consider:
COALESCE is not supported on all RDBMS. But there should be something to substitute it with (IFNULL or IF or IIF). As a last resort, you can fallback on a CASE WHEN which is supported by most. But the COALESCE or equivalent is important. You do have to consider NULLs in this join. You are joining download, uploads on 2 different dates... ignoring NULLs is asking for trouble.
I opted for referring to the remoteid table for mazgas and portas. This seems more solid than using the down table, as I suppose the remoteid master record should always exist, and does not depend on dates.
This will not give you the difference with yesterday's value, but the difference with the last date that has values (same mazgas and portas). If all days have values, then the result is the same.
What you want can be done with some date algebra, like this simple -1 syntax:
SELECT r.remoteid, r.mazgas, r.portas,
COALESCE(d2.down,0)-COALESCE(d1.down,0) AS down,
COALESCE(u2.up ,0)-COALESCE(u1.up ,0) AS up,
d2.date
FROM dbo.remoteid AS r
JOIN dbo.down AS d2 ON d2.mazgas=r.mazgas AND d2.portas=r.portas
LEFT JOIN dbo.up AS u2 ON u2.mazgas=r.mazgas AND u2.portas=r.portas AND u2.date=d2.date
LEFT JOIN dbo.down AS d1 ON d1.mazgas=r.mazgas AND d1.portas=r.portas AND d1.date=d2.date-1
LEFT JOIN dbo.up AS u1 ON u1.mazgas=r.mazgas AND u1.portas=r.portas AND u1.date=u2.date-1
... but different RDBMS may handle date algebra in other ways (they all have some function to add, subtract days though).
As far as efficiency goes... well, you had better have indexes on mazgas, portas, date on both the down table and the up table ;)
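As a concrete illustration of the "date minus one day" self-join, here is a small sketch using Python's sqlite3 module. It is an assumption-laden simplification: only a hypothetical down table keyed by portas is modelled, the figures are the ones from the question, and SQLite's date(x, '-1 day') stands in for whatever date arithmetic the actual RDBMS offers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE down (portas TEXT, date TEXT, down REAL);
    INSERT INTO down VALUES
        ('gpon-onu_1/12/5:1', '2014-05-31', 2643.8),
        ('gpon-onu_1/12/5:1', '2014-06-01', 2678.0),
        ('gpon-onu_1/16/1:4', '2014-05-31', 923.1),
        ('gpon-onu_1/16/1:4', '2014-06-01', 927.8);
""")

# t is "today", y is the row for the same port one calendar day earlier.
rows = conn.execute("""
    SELECT t.portas,
           t.date,
           ROUND(t.down - y.down, 1) AS down_delta
    FROM down AS t
    JOIN down AS y
      ON y.portas = t.portas
     AND y.date = date(t.date, '-1 day')
    ORDER BY t.portas
""").fetchall()

for portas, day, delta in rows:
    print(portas, day, delta)
```

Because this sketch uses an inner join, the first recorded day of each port (which has no predecessor) is omitted; the LEFT JOIN plus COALESCE form in the answer would instead report the full counter value for that day.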

Timeout running SQL query

I'm trying to use the aggregation features of the Django ORM to run a query on an MSSQL 2008 R2 database, but I keep getting a timeout error. The query (generated by Django) which fails is below. I've tried running it directly in SQL Server Management Studio and it works, but takes 3.5 minutes.
It does look like it's aggregating over a bunch of fields which it doesn't need to, but I wouldn't have thought that should really cause it to take that long. The database isn't that big either: auth_user has 9 records, ticket_ticket has 1210, and ticket_watchers has 1876. Is there something I'm missing?
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
(COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0 )
Here are the relevant indexes (excluding those not used in the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket_watchers.id (PK)
After a bit of experimentation, I've found that the following query is the smallest that results in the slow execution:
SELECT
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM [auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
The weird thing is that if I comment out any two lines in the above, it runs in less than 1 s, but it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).
The python code which generated this is:
A look at the execution plan shows that SQL Server is first doing a cross join on all the tables, resulting in about 280 million rows and 6 GB of data. I assume that this is where the problem lies, but why is it happening?
SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need COUNT(DISTINCT ...) instead of a plain COUNT; see the question "Django annotate() multiple times causes wrong answers".
As for why the query works that way: the query joins the four tables together. So if a user has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join returns 2*3*4 = 24 rows for that user, one for each combination of tickets. Counting distinct values removes the duplicates.
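The 2*3*4 arithmetic can be verified directly. The sketch below is only an approximation of the situation: it uses SQLite through Python's sqlite3 module instead of SQL Server, and hypothetical, stripped-down tables (auth_user, ticket, watcher) holding one user with 2 captured, 3 assigned, and 4 watched tickets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
    CREATE TABLE ticket (id INTEGER PRIMARY KEY,
                         capturer_id INTEGER,
                         responsible_id INTEGER);
    CREATE TABLE watcher (id INTEGER PRIMARY KEY,
                          user_id INTEGER,
                          ticket_id INTEGER);
    INSERT INTO auth_user VALUES (1);
    -- user 1 captured tickets 1-2 and is responsible for tickets 3-5
    INSERT INTO ticket VALUES
        (1, 1, NULL), (2, 1, NULL),
        (3, NULL, 1), (4, NULL, 1), (5, NULL, 1);
    -- user 1 watches four tickets
    INSERT INTO watcher VALUES (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4);
""")

# Same three joins as the generated query; {d} is either empty or DISTINCT.
query = """
    SELECT COUNT({d} c.id), COUNT({d} r.id), COUNT({d} w.id)
    FROM auth_user AS u
    LEFT JOIN ticket AS c ON c.capturer_id = u.id
    LEFT JOIN ticket AS r ON r.responsible_id = u.id
    LEFT JOIN watcher AS w ON w.user_id = u.id
"""

plain = conn.execute(query.format(d="")).fetchone()
distinct = conn.execute(query.format(d="DISTINCT")).fetchone()

print(plain)     # every count equals the number of joined rows
print(distinct)  # counts per relationship
```

On the Django side, the aggregate `Count` accepts a `distinct=True` argument, which is the usual way to request COUNT(DISTINCT ...). It fixes the numbers but still makes the database build the multiplied row set, so pre-aggregated subqueries may be the faster option.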
what about this?
SELECT auth_user.*,
C1.tickets_captured__count,
C2.assigned_tickets__count,
C3.tickets_watched__count
FROM auth_user
LEFT JOIN
( SELECT capturer_id, COUNT(*) AS tickets_captured__count
FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT responsible_id, COUNT(*) AS assigned_tickets__count
FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT user_id, COUNT(*) AS tickets_watched__count
FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count is not null OR C2.assigned_tickets__count is not null -- also works (I think with better performance)

SQL SUM function doubling the amount it should using multiple tables

My query below is doubling the amount on the last record it returns. I have 3 tables - activities, bookings and tempbookings. The query needs to list the activities and attached information and pull the total number (using the SUM) of places booked (as BookingTotal) from the booking table by each activity and then it needs to calculate the same for tempbookings (as tempPlacesReserved) providing the reservedate field inside that table is in the future.
However, the first issue is that if there are no records for an activity in the tempbookings table, it does not return any records for that activity at all. To get around this I created dummy records in the past so that it still returns the record, but if I can make it so I don't have to do this I would prefer it!
The main issue I have is that on the final record of the returned results it doubles the booking total and the places reserved which of course makes the whole query useless.
I know that I am doing something wrong I just haven't been able to sort it, I have searched similar issues online but am unable to apply them to my situation correctly.
Any help would be appreciated.
P.S. I'm aware that normally you wouldn't need to fully label all the paths to the databases, tables and fields as I have but for the program I am planning to use it in I have to do it this way.
SELECT [LeisureActivities].[dbo].[activities].[activityID],
SUM([LeisureActivities].[dbo].[bookings].[bookingPlaces]) AS 'bookingTotal',
SUM (CASE WHEN[LeisureActivities].[dbo].[tempbookings].[tempReserveDate] > GetDate() THEN [LeisureActivities].[dbo].[tempbookings].[tempPlaces] ELSE 0 end) AS 'tempPlacesReserved'
FROM [LeisureActivities].[dbo].[activities],
[LeisureActivities].[dbo].[bookings],
[LeisureActivities].[dbo].[tempbookings]
WHERE ([LeisureActivities].[dbo].[activities].[activityID]=[LeisureActivities].[dbo].[bookings].[activityID]
AND [LeisureActivities].[dbo].[activities].[activityID]=[LeisureActivities].[dbo].[tempbookings].[tempActivityID])
AND [LeisureActivities].[dbo].[activities].[activityDate] > GetDate ()
GROUP BY [LeisureActivities].[dbo].[activities].[activityID],
Your current query is using an INNER JOIN between each of the tables so if the tempBookings table has no records, you will not return anything.
I would advise that you start to use JOIN syntax. You might also need to use subqueries to get the totals.
SELECT a.[activityID],
coalesce(b.bookingTotal, 0) bookingTotal,
coalesce(t.tempPlacesReserved, 0) tempPlacesReserved
FROM [LeisureActivities].[dbo].[activities] a
LEFT JOIN
(
select activityID,
SUM([bookingPlaces]) AS bookingTotal
from [LeisureActivities].[dbo].[bookings]
group by activityID
) b
ON a.[activityID]=b.[activityID]
LEFT JOIN
(
select tempActivityID,
SUM(CASE WHEN [tempReserveDate] > GetDate() THEN [tempPlaces] ELSE 0 end) AS tempPlacesReserved
from [LeisureActivities].[dbo].[tempbookings]
group by tempActivityID
) t
ON a.[activityID]=t.[tempActivityID]
WHERE a.[activityDate] > GetDate();
Note: I am using aliases because it is easier to read
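To see why the comma-style join loses activities that have no tempbookings, and why the subquery form does not, here is a reduced sketch. It is written against SQLite via Python's sqlite3 module rather than SQL Server, uses cut-down versions of the three tables, and leaves out the date filters (GetDate() comparisons) to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activities (activityID INTEGER PRIMARY KEY);
    CREATE TABLE bookings (activityID INTEGER, bookingPlaces INTEGER);
    CREATE TABLE tempbookings (tempActivityID INTEGER, tempPlaces INTEGER);
    INSERT INTO activities VALUES (1), (2);
    INSERT INTO bookings VALUES (1, 2), (1, 3), (2, 4);
    -- only activity 1 has temporary bookings
    INSERT INTO tempbookings VALUES (1, 1), (1, 1);
""")

# Comma join + WHERE is an inner join across all three tables.
implicit = conn.execute("""
    SELECT a.activityID, SUM(b.bookingPlaces), SUM(t.tempPlaces)
    FROM activities AS a, bookings AS b, tempbookings AS t
    WHERE a.activityID = b.activityID
      AND a.activityID = t.tempActivityID
    GROUP BY a.activityID
    ORDER BY a.activityID
""").fetchall()

# LEFT JOIN to per-activity totals keeps every activity and every total intact.
subqueries = conn.execute("""
    SELECT a.activityID,
           COALESCE(b.bookingTotal, 0),
           COALESCE(t.tempPlacesReserved, 0)
    FROM activities AS a
    LEFT JOIN (SELECT activityID, SUM(bookingPlaces) AS bookingTotal
               FROM bookings GROUP BY activityID) AS b
           ON b.activityID = a.activityID
    LEFT JOIN (SELECT tempActivityID, SUM(tempPlaces) AS tempPlacesReserved
               FROM tempbookings GROUP BY tempActivityID) AS t
           ON t.tempActivityID = a.activityID
    ORDER BY a.activityID
""").fetchall()

print(implicit)    # activity 2 is missing and activity 1 is inflated
print(subqueries)  # one row per activity
```

In this toy data the inflation comes from activity 1 having two rows in each child table, which would also explain a "doubled" total on whichever activity happens to have two matching rows on the other side.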
Use the newer SQL-92 JOIN syntax, and make the join to tempbookings an outer join. Also clean up your SQL with table aliases; it makes it easier to read. As to why the last row has doubled values, I don't know, but on the off chance that it is caused by the extra dummy records you entered, get rid of them. The outer join to tempbookings makes them unnecessary. The other possibility is that the join condition you had to the tempbookings table (t.tempActivityID = a.activityID) is insufficient to guarantee that a row will match only one record in the activities table. If, for example, it matches two records in activities, then the rows from tempbookings would be repeated twice in the output, causing the sum to be doubled.
SELECT a.activityID, a.activityName, a.activityDate,
a.activityPlaces, a.activityPrice,
SUM(b.bookingPlaces) bookingTotal,
SUM (CASE WHEN t.tempReserveDate > GetDate()
THEN t.tempPlaces ELSE 0 end) tempPlacesReserved
FROM LeisureActivities.dbo.activities a
Join LeisureActivities.dbo.bookings b
On b.activityID = a.activityID
Left Join LeisureActivities.dbo.tempbookings t
On t.tempActivityID = a.activityID
WHERE a.activityDate > GetDate ()
GROUP BY a.activityID, a.activityName,
a.activityDate, a.activityPlaces, a.activityPrice;