SQL subquery / select subset of data from separate table

I'm writing an SQL query to extract the printing usage for individual cartridges. I've got the main body of the query written, as shown below, but I'm having trouble selecting specific data from the meter readings, which are stored in a separate table.
The query below lists the cartridges put into printers, with the date each was activated and the date it was deactivated. I would then like to use the MeterReadings table to work out the usage over that period, using ActivatedDate and DeactivatedDate for the matching DeviceID. What I have so far:
SELECT Devices.DeviceID,
Devices.DeviceDescription,
DeviceConsumables.ConsumableVariantID,
ConsumableVariants.Type,
ConsumableDescriptions.Description,
MAX(ConsumableReadings.ReadingDate) as DeactivatedDate,
MIN(ConsumableReadings.ReadingDate) AS ActivatedDate,
ConsumableReadings.ChangedDate,
CASE ConsumableVariants.ColourID
WHEN 1 THEN MAX(MeterReadings.TotalMono) - MIN(MeterReadings.TotalMono)
ELSE MAX(MeterReadings.TotalColour) - MIN(MeterReadings.TotalColour)
END AS PrintingDiff,
ConsumableVariants.ExpectedPageCoverage,
ConsumableVariants.ExpectedPageYield
FROM Devices
LEFT JOIN DeviceConsumables ON Devices.DeviceID = DeviceConsumables.DeviceID
LEFT JOIN ConsumableVariants ON DeviceConsumables.ConsumableVariantID = ConsumableVariants.ConsumableVariantID
LEFT JOIN ConsumableReadings ON DeviceConsumables.ConsumableID = ConsumableReadings.ConsumableID
LEFT JOIN ConsumableDescriptions ON ConsumableVariants.DescriptionID = ConsumableDescriptions.ConsumableDescriptionID
LEFT JOIN MeterReadings ON DeviceConsumables.DeviceID = MeterReadings.DeviceID
WHERE ConsumableVariants.Type = '3' -- To only get toner cartridges
AND Devices.DeviceID = '24'
AND MeterReadings.ScanDateTime > MIN(ConsumableReadings.ReadingDate)
AND MeterReadings.ScanDateTime < MAX(ConsumableReadings.ReadingDate)
GROUP BY devices.DeviceID, Devices.DeviceDescription,
DeviceConsumables.ConsumableVariantID, ConsumableVariants.Type, ConsumableDescriptions.Description,
ConsumableReadings.ChangedDate, ConsumableVariants.ColourID, ConsumableVariants.ExpectedPageCoverage,
ConsumableVariants.ExpectedPageYield
ORDER BY Devices.DeviceID
This is currently generating the error "An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference."
The calculated fields ActivatedDate and DeactivatedDate are the date range I need. I want the CASE expression to return MAX(MeterReadings.TotalMono) - MIN(MeterReadings.TotalMono) for black-and-white cartridges, or MAX(MeterReadings.TotalColour) - MIN(MeterReadings.TotalColour) for colour ones. Because meter readings only ever go up, the MIN gives the reading at the start of the period and the MAX the reading at the end, so the difference is the usage for that DeviceID.
As shown above, I'm joining the MeterReadings table on DeviceID.
I can't figure out how to get the MeterReadings for device x between y and z (where x is the DeviceID, y is the ActivatedDate and z is the DeactivatedDate) so that I can use them in the CASE expression. Any help appreciated.
Edit:
For brevity I won't include the whole schema, but this should be enough:
Devices - list of all known devices
  DeviceID
  DeviceDescription
  lots of extra fields that describe the device
DeviceConsumables - which devices use which consumables
  ConsumableID
  DeviceID - foreign key to Devices
  ConsumableVariantID - foreign key to ConsumableVariants
ConsumableVariants - list of all the consumable variants
  ConsumableVariantID
  Type - 3 indicates toner, which is what I'm interested in
ConsumableReadings
  ReadingID - PK
  ConsumableID - foreign key to DeviceConsumables
  ReadingDate - last time a reading was taken
  ChangedDate - last time a new cartridge was inserted
MeterReadings
  ReadingID - PK (unrelated to the PK of ConsumableReadings)
  DeviceID
  ScanDateTime - time the usage scan was taken
  TotalMono - total mono count at scan time
  TotalColour - total colour count at scan time

You have to break your query into nested queries. The query below is untested, so it may have some syntax problems, but it shows a way to get what you are looking for:
SELECT Devices.DeviceID,
Devices.DeviceDescription,
DeviceConsumables.ConsumableVariantID,
ConsumableVariants.Type,
ConsumableDescriptions.Description,
A.DeactivatedDate,
A.ActivatedDate,
A.ChangedDate,
CASE ConsumableVariants.ColourID
WHEN 1 THEN MAX(MeterReadings.TotalMono) - MIN(MeterReadings.TotalMono)
ELSE MAX(MeterReadings.TotalColour) - MIN(MeterReadings.TotalColour)
END AS PrintingDiff,
ConsumableVariants.ExpectedPageCoverage,
ConsumableVariants.ExpectedPageYield
FROM Devices
LEFT JOIN DeviceConsumables ON Devices.DeviceID = DeviceConsumables.DeviceID
LEFT JOIN ConsumableVariants ON DeviceConsumables.ConsumableVariantID = ConsumableVariants.ConsumableVariantID
LEFT JOIN ConsumableReadings ON DeviceConsumables.ConsumableID = ConsumableReadings.ConsumableID
LEFT JOIN ConsumableDescriptions ON ConsumableVariants.DescriptionID = ConsumableDescriptions.ConsumableDescriptionID
LEFT JOIN
(
SELECT D.DeviceID,
MAX(CR.ReadingDate) as DeactivatedDate,
MIN(CR.ReadingDate) AS ActivatedDate,
CR.ChangedDate
FROM Devices D
LEFT JOIN DeviceConsumables DC ON D.DeviceID = DC.DeviceID
LEFT JOIN ConsumableReadings CR ON DC.ConsumableID = CR.ConsumableID
WHERE D.DeviceID = '24'
GROUP BY D.DeviceID,
CR.ChangedDate
) AS A ON DeviceConsumables.DeviceID = A.DeviceID
LEFT JOIN MeterReadings ON A.DeviceID = MeterReadings.DeviceID
WHERE ConsumableVariants.Type = '3' -- To only get toner cartridges
AND Devices.DeviceID = '24'
AND MeterReadings.ScanDateTime > A.ActivatedDate
AND MeterReadings.ScanDateTime < A.DeactivatedDate
GROUP BY Devices.DeviceID, Devices.DeviceDescription,
DeviceConsumables.ConsumableVariantID, ConsumableVariants.Type, ConsumableDescriptions.Description,
A.DeactivatedDate, A.ActivatedDate, A.ChangedDate, -- derived-table columns used in SELECT must be grouped too
ConsumableVariants.ColourID, ConsumableVariants.ExpectedPageCoverage,
ConsumableVariants.ExpectedPageYield
ORDER BY Devices.DeviceID
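For comparison, here is a minimal sketch of the same two-step idea. It runs against SQLite through Python's built-in sqlite3 module instead of SQL Server. The schema is trimmed down and partly assumed: ColourID sits directly on DeviceConsumables (in the question it lives on ConsumableVariants), each ConsumableID stands for one cartridge, and the window bounds are inclusive. The per-cartridge dates are aggregated in a derived table first, so the outer join condition only compares plain columns. That avoids the "aggregate in WHERE" error.

```python
import sqlite3

# Simplified, partly assumed schema based on the tables in the question.
# Assumption: ColourID is stored on DeviceConsumables here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DeviceConsumables (ConsumableID INTEGER, DeviceID INTEGER, ColourID INTEGER);
CREATE TABLE ConsumableReadings (ConsumableID INTEGER, ReadingDate TEXT);
CREATE TABLE MeterReadings (DeviceID INTEGER, ScanDateTime TEXT,
                            TotalMono INTEGER, TotalColour INTEGER);

-- Two cartridges in device 24: a mono one (ColourID 1) and a colour one.
INSERT INTO DeviceConsumables VALUES (100, 24, 1), (101, 24, 2);
INSERT INTO ConsumableReadings VALUES
  (100, '2016-01-01'), (100, '2016-01-20'),
  (101, '2016-01-10'), (101, '2016-01-31');
INSERT INTO MeterReadings VALUES
  (24, '2016-01-01', 1000, 500),
  (24, '2016-01-10', 1400, 600),
  (24, '2016-01-20', 1900, 750),
  (24, '2016-01-31', 2500, 900);
""")

# Derived table w: the active window of each cartridge.
# Outer query: meter readings inside that window, MAX - MIN of the right counter.
rows = conn.execute("""
SELECT w.ConsumableID,
       w.ActivatedDate,
       w.DeactivatedDate,
       CASE dc.ColourID
         WHEN 1 THEN MAX(m.TotalMono) - MIN(m.TotalMono)
         ELSE MAX(m.TotalColour) - MIN(m.TotalColour)
       END AS PrintingDiff
FROM (SELECT ConsumableID,
             MIN(ReadingDate) AS ActivatedDate,
             MAX(ReadingDate) AS DeactivatedDate
      FROM ConsumableReadings
      GROUP BY ConsumableID) AS w
JOIN DeviceConsumables dc ON dc.ConsumableID = w.ConsumableID
JOIN MeterReadings m
  ON m.DeviceID = dc.DeviceID
 AND m.ScanDateTime >= w.ActivatedDate
 AND m.ScanDateTime <= w.DeactivatedDate
WHERE dc.DeviceID = 24
GROUP BY w.ConsumableID, w.ActivatedDate, w.DeactivatedDate, dc.ColourID
ORDER BY w.ConsumableID
""").fetchall()

for row in rows:
    print(row)
```

With this sample data, mono cartridge 100 shows a difference of 900 and colour cartridge 101 shows 300. The date range goes in the ON clause, so it is explicit which meter readings belong to which cartridge.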

First, I'd add ColourID to your output so you know whether you are reading the Mono or the Colour values. Second, I believe that if you remove ConsumableID from your GROUP BY clause, it should work. Each ConsumableID row has a single date, so if you include it in your GROUP BY, MAX and MIN will always be the same date and you'll never get a difference.
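Here is a small illustration of the grouping point, using SQLite via Python's sqlite3 and hypothetical data. When the GROUP BY key has a single value per reading row (ReadingID here), every group holds exactly one row. MIN and MAX then coincide and the difference is always zero. Grouping at the cartridge level restores a real range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ConsumableReadings (ReadingID INTEGER, ConsumableID INTEGER, ReadingDate TEXT);
INSERT INTO ConsumableReadings VALUES
  (1, 100, '2016-01-01'), (2, 100, '2016-01-10'), (3, 100, '2016-01-20');
""")

# GROUP BY a per-row key: one row per group, so MIN and MAX are always equal.
too_fine = conn.execute("""
SELECT ReadingID, MIN(ReadingDate), MAX(ReadingDate)
FROM ConsumableReadings
GROUP BY ReadingID
ORDER BY ReadingID
""").fetchall()

# GROUP BY the cartridge only: one group spanning all of its readings.
per_cartridge = conn.execute("""
SELECT ConsumableID, MIN(ReadingDate), MAX(ReadingDate)
FROM ConsumableReadings
GROUP BY ConsumableID
""").fetchall()
```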

Your problem is in your join statement.
Change the following line:
LEFT JOIN ConsumableTypes ON ConsumableVariants.Type = ConsumableVariants.Type
To something like:
LEFT JOIN ConsumableTypes ON ConsumableVariants.Type = ConsumableTypes.Type
(or whatever table you are joining to).

Related

How to filter where a condition is true at least once

I need to filter down to only the service orders that have a "Service" work group value in at least one of their tasks. However, I don't want to drop the rows whose work group isn't "Service" if at least one of the order's task rows has that value. The end result would leave out all data from service orders that don't have at least one BI_WRKFLW_TASK_KEY equal to "SERVICE". I know how to write normal filters, but this level of specificity is beyond my current experience.
I've experimented with normal filters, but they drop rows that belong to the same service order and simply don't have that work group.
SELECT W.BI_WRKFLW_KEY,
T.BI_WORK_EVENT_CD,
T.BI_TASK_CD,
T.BI_WORKGRP,
M.BI_SO_NBR,
M.BI_SO_TYPE_CD,
M.BI_CLOSE_DT,
M.BI_OPEN_DT,
M.BI_SO_STAT_CD,
R.BI_WRKFLW_TMPLT_NM,
T.BI_WRKFLW_TASK_SEQ_NBR,
T.BI_WORKGRP,
A.BI_WORK_EVENT_CD,
A.BI_EVENT_DT_TM,
A.SY_JOB_QUEUE_ID,
A.BI_WORKGRP,
A.SY_USER_ID,
A.BI_WRKFLW_TASK_KEY
FROM BI_WRKFLW W
LEFT JOIN BI_WRKFLW_TASKS T ON W.BI_WRKFLW_KEY = T.BI_WRKFLW_KEY
LEFT JOIN BI_SO_DET D ON W.BI_WRKFLW_KEY = D.BI_WRKFLW_KEY
LEFT JOIN BI_SO_MASTER M ON D.BI_SO_NBR = M.BI_SO_NBR
LEFT JOIN BI_WRKFLW_TMPLT_REF R ON W.BI_WRKFLW_TMPLT_ID = R.BI_WRKFLW_TMPLT_ID
LEFT JOIN BI_TASK_ACT A ON T.BI_WRKFLW_TASKS_KEY = A.BI_WRKFLW_TASKS_KEY
WHERE M.BI_OPEN_DT >= ADD_MONTHS(CURRENT_DATE, -'12')
--AND M.BI_SO_TYPE_CD IN ('IVC-NEW1')
--AND M.BI_SO_STAT_CD LIKE 'O'
ORDER BY M.BI_SO_NBR, T.BI_EVENT_DT_TM
Any row belonging to a service order that has at least one BI_WRKFLOW_TASK_CD = "Service" would be kept, and all other service orders would be filtered out.
I tried to map this out, but I may not have got it quite right.
I think you are asking for the BI_SO_MASTER records that have at least one BI_WRKFLW_TASKS row belonging to a certain work group.
Try using a CTE to get the detail rows with a matching task. From those you can find the SO population, and then you can join back for whatever else you need (I'm not sure what the ultimate result set should be):
;WITH matchingTasks AS (
SELECT D.BI_SO_NBR, D.<id>, W.BI_WRKFLW_KEY, T.<key>, A.Key
FROM BI_WRKFLW W
LEFT JOIN BI_WRKFLW_TASKS T ON W.BI_WRKFLW_KEY = T.BI_WRKFLW_KEY
LEFT JOIN BI_SO_DET D ON W.BI_WRKFLW_KEY = D.BI_WRKFLW_KEY
LEFT JOIN BI_TASK_ACT A ON T.BI_WRKFLW_TASKS_KEY = A.BI_WRKFLW_TASKS_KEY
WHERE
<good dates>
AND <A.field is what I am looking for>
)
/*Here you have the SO population
as well as the ids that helped this SO qualify.
*/
, My_SO_Population as (select Distinct BI_SO_NBR from matchingTasks )
/*now you can go get what you need.
the challenge of finding SOs w/ >=1 matching task has been solved...
*/
select <necessary fields> from
My_SO_Population
join <whatever you need....this is where i am cloudy>
If I've missed the goal, let me know where.
You can just add this to your WHERE clause:
AND T.BI_WRKFLW_KEY IN (
SELECT BI_WRKFLW_KEY
FROM BI_WRKFLW_TASKS
WHERE BI_WRKFLOW_TASK_CD = 'Service')
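Here is a minimal sketch of this "keep the whole group if any row matches" filter, run against SQLite through Python's sqlite3. The table is a trimmed, hypothetical version of BI_WRKFLW_TASKS. Filtering on BI_WORKGRP = 'SERVICE' is an assumption about which column holds the work group. The correlated EXISTS form is included because it is a common equivalent of the IN subquery.

```python
import sqlite3

# Hypothetical, trimmed-down BI_WRKFLW_TASKS; BI_WORKGRP as the filter column is assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BI_WRKFLW_TASKS (BI_WRKFLW_KEY INTEGER, BI_TASK_CD TEXT, BI_WORKGRP TEXT);
INSERT INTO BI_WRKFLW_TASKS VALUES
  (1, 'T1', 'SERVICE'), (1, 'T2', 'BILLING'),  -- workflow 1 has a SERVICE task
  (2, 'T1', 'BILLING'), (2, 'T2', 'INSTALL');  -- workflow 2 has none
""")

# Keep every task row of a workflow with at least one SERVICE task,
# including that workflow's non-SERVICE rows.
in_rows = conn.execute("""
SELECT T.BI_WRKFLW_KEY, T.BI_TASK_CD, T.BI_WORKGRP
FROM BI_WRKFLW_TASKS T
WHERE T.BI_WRKFLW_KEY IN (SELECT BI_WRKFLW_KEY
                          FROM BI_WRKFLW_TASKS
                          WHERE BI_WORKGRP = 'SERVICE')
ORDER BY T.BI_WRKFLW_KEY, T.BI_TASK_CD
""").fetchall()

# Equivalent correlated EXISTS form.
exists_rows = conn.execute("""
SELECT T.BI_WRKFLW_KEY, T.BI_TASK_CD, T.BI_WORKGRP
FROM BI_WRKFLW_TASKS T
WHERE EXISTS (SELECT 1 FROM BI_WRKFLW_TASKS S
              WHERE S.BI_WRKFLW_KEY = T.BI_WRKFLW_KEY
                AND S.BI_WORKGRP = 'SERVICE')
ORDER BY T.BI_WRKFLW_KEY, T.BI_TASK_CD
""").fetchall()
```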

SQL triple left join query across three databases

I'm trying to run a query across three tables in three different databases. The query works, but it pulls close to a billion records. Is there a way to pull only the distinct values of smlog.requestor_type and arcust.maj_class for the following query?
SELECT
smreq.request_id AS ROIrequestID,
arcust.customer AS LAWcustID,
smlog.logid AS ESLlogID,
arcust.maj_class AS invoicetype,
smlog.requestor_type AS SMLrequestortype,
smlog.request_type as SMLrequesttype
FROM roi.sm_request_sp_data reqsp
LEFT JOIN smart.smlog#smartlog smlog ON smlog.logid = reqsp.logid
LEFT JOIN roi.sm_requests smreq ON smreq.request_id = reqsp.request_id
LEFT JOIN lawson.arcustomer#smart7 arcust ON arcust.customer =
smreq.customer_id
WHERE smreq.ORIG_DT >= TO_DATE('2016/03/01', 'yyyy/mm/dd')
AND smreq.ORIG_DT <= TO_DATE('2016/03/02','yyyy/mm/dd')
GROUP BY smlog.requestor_type;
These are observations, not an answer:
SELECT
smreq.request_id AS ROIrequestID
FROM roi.sm_request_sp_data reqsp
LEFT JOIN roi.sm_requests smreq ON reqsp.request_id = smreq.request_id
WHERE smreq.ORIG_DT >= TO_DATE('2016/03/01', 'yyyy/mm/dd')
AND smreq.ORIG_DT <= TO_DATE('2016/03/02', 'yyyy/mm/dd')
That LEFT JOIN is completely overridden by the WHERE clause: any row where the LEFT JOIN produced NULLs for smreq fails the ORIG_DT predicates and is discarded. Use an INNER JOIN instead.
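Here is a small demonstration of why the LEFT JOIN collapses into an INNER JOIN. It uses SQLite via Python's sqlite3, with hypothetical two-table data standing in for roi.sm_request_sp_data and roi.sm_requests. Request 3 has no match, so the LEFT JOIN gives it NULLs. A NULL compared with a date is never true, so the WHERE clause removes that row again.

```python
import sqlite3

# Hypothetical minimal stand-ins for roi.sm_request_sp_data and roi.sm_requests.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reqsp (request_id INTEGER);
CREATE TABLE smreq (request_id INTEGER, orig_dt TEXT);
INSERT INTO reqsp VALUES (1), (2), (3);  -- request 3 has no smreq row
INSERT INTO smreq VALUES (1, '2016-03-01'), (2, '2016-04-15');
""")

# Without a WHERE clause, the LEFT JOIN keeps request 3 with a NULL date.
all_left = conn.execute("""
SELECT reqsp.request_id, smreq.orig_dt
FROM reqsp LEFT JOIN smreq ON reqsp.request_id = smreq.request_id
ORDER BY reqsp.request_id
""").fetchall()

date_filter = "WHERE smreq.orig_dt >= '2016-03-01' AND smreq.orig_dt <= '2016-03-02'"

left_rows = conn.execute(f"""
SELECT reqsp.request_id, smreq.orig_dt
FROM reqsp LEFT JOIN smreq ON reqsp.request_id = smreq.request_id
{date_filter}
ORDER BY reqsp.request_id
""").fetchall()

inner_rows = conn.execute(f"""
SELECT reqsp.request_id, smreq.orig_dt
FROM reqsp INNER JOIN smreq ON reqsp.request_id = smreq.request_id
{date_filter}
ORDER BY reqsp.request_id
""").fetchall()
# The NULL-extended row fails the date predicates, so both queries agree.
```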
As for the WHERE clause, it isn't clear whether you want one day's data ('2016/03/01') or two days' data ('2016/03/01' and '2016/03/02'). If you expect just one day, use < rather than <= in the second predicate. With <=, any rows stamped exactly at midnight on 2016/03/02 are included as well.
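Here is a sketch of the one-day versus two-day question, assuming ORIG_DT carries a time of day (as Oracle DATE values can). It uses SQLite via Python's sqlite3 with ISO-8601 text timestamps as a stand-in. TO_DATE('2016/03/02', 'yyyy/mm/dd') corresponds to midnight, written here as '2016-03-02 00:00:00'.

```python
import sqlite3

# Assumption: orig_dt includes a time of day. ISO-8601 text sorts chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sm_requests (request_id INTEGER, orig_dt TEXT)")
conn.executemany(
    "INSERT INTO sm_requests VALUES (?, ?)",
    [(1, "2016-03-01 09:30:00"),
     (2, "2016-03-01 23:59:00"),
     (3, "2016-03-02 00:00:00"),  # exactly midnight on the second day
     (4, "2016-03-02 14:00:00")],
)

def ids(sql):
    """Return the request_id column of a query as a list."""
    return [r[0] for r in conn.execute(sql)]

# Closed range: the upper bound (midnight on 03/02) is itself included.
closed = ids("""SELECT request_id FROM sm_requests
                WHERE orig_dt >= '2016-03-01 00:00:00'
                  AND orig_dt <= '2016-03-02 00:00:00'
                ORDER BY request_id""")

# Half-open range: exactly the calendar day 2016-03-01.
half_open = ids("""SELECT request_id FROM sm_requests
                   WHERE orig_dt >= '2016-03-01 00:00:00'
                     AND orig_dt <  '2016-03-02 00:00:00'
                   ORDER BY request_id""")
```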
For the rest we really have no factual basis to make recommendations.

SQL Left Outer Join on Subquery

I am attempting to build a query that contains a LEFT JOIN to a subquery (based on the principles I learned in a previous question) that should pull similar data sets from two different tables. The goal is to compare volume data by account and platform, to make sure that the stored procedure that creates one table from another is doing so correctly.
The idea is this:
Account  Product  T1Vol  T2Vol
abc      AT       10     10
def      RT       20     25
ghi      OB       30     NULL
So in this example, the idea is to pull all accounts and products from T1 (the table the procedure acts on) and any matching accounts and products from T2 (the newly created table), i.e. T1 LEFT JOIN T2. (Ideally everything will match perfectly, with no variance between T1 and T2 volume and no NULLs in T2 volume.)
I wrote the following query to accomplish this, but it's not quite working. The error I currently get is "not a GROUP BY expression", which I don't think is the real issue. I have been searching and iterating, to no avail.
The query is below (to keep with the example, T1 = OpStats and T2 = RegSplits). Any help is much appreciated.
SELECT DTA.trading_code Account, OpStats.product_dwkey Platform, SUM(OpStats.risk_amount_adj)/1000000 OpStatsVol, RegSplits.Volume RegSplitsVol
FROM fact_trade_presplit_rollup OpStats
INNER JOIN dim_trading_accounts DTA ON OpStats.trading_dwkey=DTA.trading_dwkey
LEFT OUTER JOIN
( SELECT b.trading_Code Account, a.product_dwkey Platform, SUM(a.risk_amount_adj)/1000000 Volume
FROM fact_trade_rollup a
INNER JOIN dim_trading_accounts b on a.trading_dwkey=b.trading_dwkey
WHERE a.account_type IN('Customer','Taker')
AND a.date_key>='01-JAN-16'
AND a.date_key<='31-MAR-16'
AND a.daily_db_metric NOT IN ('Manual Treasury Volume ($B)', 'Manual Volume ($B)', 'HSBC-WL POMS (Internal) Volume ($B)','JPMC-WL Order Book (Internal) Volume ($B)')
AND (a.product_dwkey IN('RT','HWL') AND a.source_name<>'STP')
GROUP BY b.trading_code, a.product_dwkey ) RegSplits
ON (DTA.trading_code = RegSplits.Account) /* is it because I am trying to join DTA to the subquery */
WHERE OpStats.account_type IN('Customer','Taker')
AND OpStats.date_key>='01-JAN-16'
AND OpStats.date_key<='31-MAR-16'
AND OpStats.daily_db_metric NOT IN ('Manual Treasury Volume ($B)', 'Manual Volume ($B)', 'HSBC-WL POMS (Internal) Volume ($B)','JPMC-WL Order Book (Internal) Volume ($B)')
AND (OpStats.product_dwkey IN('RT','HWL') AND OpStats.source_name<>'STP')
GROUP BY DTA.trading_code, OpStats.product_dwkey;
The "not a GROUP BY expression" error is easy to check.
Just compare the SELECT expressions with the GROUP BY expressions:
SELECT DTA.trading_code Account,
OpStats.product_dwkey Platform,
SUM(OpStats.risk_amount_adj)/1000000 OpStatsVol,
RegSplits.Volume RegSplitsVol
FROM ......
......
GROUP BY DTA.trading_code,
OpStats.product_dwkey;
There are two elements in SELECT that are not in GROUP BY:
SUM(OpStats.risk_amount_adj)/1000000 OpStatsVol
RegSplits.Volume RegSplitsVol
The first is fine: it is an aggregate, so it does not belong in GROUP BY.
The second causes the error: it is not an aggregate, and it is not listed in the GROUP BY clause. Either add RegSplits.Volume to the GROUP BY or wrap it in an aggregate such as MAX(RegSplits.Volume).
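Here is a sketch of the corrected query, run against SQLite via Python's sqlite3 with simplified, hypothetical tables. trading_code is stored directly on each fact row, and the /1000000 scaling and extra filters are left out. Besides adding RegSplits.Volume to the GROUP BY, it joins on both Account and Platform. The subquery is grouped by both, so joining on Account alone could pair an account's OpStats rows with RegSplits volumes from other platforms. Note that SQLite, unlike Oracle, silently accepts a bare non-grouped column, so this only illustrates the fixed shape, not the error.

```python
import sqlite3

# Hypothetical simplified fact tables; trading_code replaces the join to dim_trading_accounts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_trade_presplit_rollup (trading_code TEXT, product_dwkey TEXT, risk_amount_adj REAL);
CREATE TABLE fact_trade_rollup (trading_code TEXT, product_dwkey TEXT, risk_amount_adj REAL);
INSERT INTO fact_trade_presplit_rollup VALUES
  ('abc', 'AT', 10), ('def', 'RT', 12), ('def', 'RT', 8), ('ghi', 'OB', 30);
INSERT INTO fact_trade_rollup VALUES
  ('abc', 'AT', 10), ('def', 'RT', 25);
""")

rows = conn.execute("""
SELECT OpStats.trading_code         AS Account,
       OpStats.product_dwkey        AS Platform,
       SUM(OpStats.risk_amount_adj) AS OpStatsVol,
       RegSplits.Volume             AS RegSplitsVol
FROM fact_trade_presplit_rollup OpStats
LEFT OUTER JOIN (SELECT trading_code AS Account,
                        product_dwkey AS Platform,
                        SUM(risk_amount_adj) AS Volume
                 FROM fact_trade_rollup
                 GROUP BY trading_code, product_dwkey) RegSplits
  ON OpStats.trading_code = RegSplits.Account
 AND OpStats.product_dwkey = RegSplits.Platform  -- match on both keys
-- RegSplits.Volume is not aggregated, so it is listed in GROUP BY too
GROUP BY OpStats.trading_code, OpStats.product_dwkey, RegSplits.Volume
ORDER BY OpStats.trading_code
""").fetchall()
```

The result reproduces the Account / Product / T1Vol / T2Vol layout from the question, with NULL (None in Python) for the unmatched ghi row. Wrapping the column as MAX(RegSplits.Volume) would work equally well, since the two-key join yields at most one RegSplits row per group.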

SQL Need advice how to add timestamp to this query

I have this code:
select Users.phoneMac, Users.apMac, Locations.Lon, Locations.Lat
from Locations, Users
inner join (
select u.phoneMac, max(u.strenght) as most
from Users u, Locations l
where u.apMac = l.apMac
group by u.phoneMac
) as ij on ij.phoneMac=Users.phoneMac and Users.strenght = ij.most
where Locations.apMac = Users.apMac;
It worked fine, but when I added more data to the Users table, the query calculated results from all the data, whereas I want results from only the latest data. So I added a timestamp to the Users table.
Can you help me fix this code so that it first takes only the rows with the latest timestamp for each user (Users.phoneMac; there can be more than one row for the same phoneMac) and then does the rest of the calculations?
You're already picking the max value of the "strenght" field and joining on that, so why not use the same approach again for your timestamp field? Something like:
SELECT Users.phoneMac, Users.apMac, Locations.Lon, Locations.Lat
FROM Locations
INNER JOIN Users
ON Users.apMac = Locations.apMac
INNER JOIN (
SELECT u.phoneMac, max(u.strenght) AS most
FROM Locations l
INNER JOIN Users u ON u.apMac = l.apMac
GROUP BY u.phoneMac) AS ij
ON ij.phoneMac = Users.phoneMac
AND Users.strenght = ij.most
INNER JOIN (
SELECT u2.phoneMac, max(u2.timestampfield) AS latest
FROM Locations l2
INNER JOIN Users u2 ON u2.apMac = l2.apMac
GROUP BY u2.phoneMac) AS ijk
ON ijk.phoneMac = Users.phoneMac
AND Users.timestampfield = ijk.latest;
(By the way, using the old join syntax with comma and the WHERE clause makes it harder to understand the logic, and occasionally makes the logic wrong. The new join syntax with ON is really a lot better.)
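One caveat with that approach: it intersects two independent maxima. A phone whose strongest reading is not part of its latest scan drops out entirely. If the goal is to take the latest scan first and then pick the strongest access point within it, a sketch along these lines may fit better. It uses SQLite via Python's sqlite3 and made-up data. Column names follow the question (including strenght), and timestampfield is the hypothetical name used in the answer. It also assumes every apMac has a Locations row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Locations (apMac TEXT, Lon REAL, Lat REAL);
CREATE TABLE Users (phoneMac TEXT, apMac TEXT, strenght INTEGER, timestampfield INTEGER);
INSERT INTO Locations VALUES ('a1', 10.0, 50.0), ('a2', 11.0, 51.0);
INSERT INTO Users VALUES
  ('p1', 'a1', 90, 1),                       -- old scan: strongest ever, but stale
  ('p1', 'a1', 40, 2), ('p1', 'a2', 60, 2),  -- latest scan for p1
  ('p2', 'a1', 70, 5);
""")

# Restrict to each phone's latest scan first, then take the strongest AP in it.
latest_then_strongest = conn.execute("""
WITH latest AS (
  SELECT u.phoneMac, u.apMac, u.strenght
  FROM Users u
  JOIN (SELECT phoneMac, MAX(timestampfield) AS max_ts
        FROM Users GROUP BY phoneMac) t
    ON t.phoneMac = u.phoneMac AND u.timestampfield = t.max_ts
),
strongest AS (
  SELECT phoneMac, MAX(strenght) AS most FROM latest GROUP BY phoneMac
)
SELECT l.phoneMac, l.apMac, loc.Lon, loc.Lat
FROM latest l
JOIN strongest s ON s.phoneMac = l.phoneMac AND l.strenght = s.most
JOIN Locations loc ON loc.apMac = l.apMac
ORDER BY l.phoneMac
""").fetchall()

# For contrast: intersecting overall max strength with the latest timestamp
# loses p1, whose strongest reading comes from an older scan.
both_maxima = conn.execute("""
SELECT u.phoneMac, u.apMac
FROM Users u
JOIN (SELECT phoneMac, MAX(strenght) AS most FROM Users GROUP BY phoneMac) ij
  ON ij.phoneMac = u.phoneMac AND u.strenght = ij.most
JOIN (SELECT phoneMac, MAX(timestampfield) AS latest FROM Users GROUP BY phoneMac) ijk
  ON ijk.phoneMac = u.phoneMac AND u.timestampfield = ijk.latest
ORDER BY u.phoneMac
""").fetchall()
```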

Having difficulty combining JET SQL queries

Warning: Here be beginner SQL! Be gentle...
I have two queries that independently give me what I want from the relevant tables in a reasonably timely fashion, but when I try to combine the two in a (fugly) union, things quickly fall to bits and the query either gives me duplicate records, takes an inordinately long time to run, or refuses to run at all quoting various syntax errors at me.
Note: I had to create a 'dummy' table (tblAllDates) with a single field containing dates from 1 Jan 2008 as I need the query to return a single record from each day, and there are days in both tables that have no data. This is the only way I could figure to do this, no doubt there is a smarter way...
Here are the queries:
SELECT tblAllDates.date, SUM(tblvolumedata.STT)
FROM tblvolumedata RIGHT JOIN tblAllDates ON tblvolumedata.date=tblAllDates.date
GROUP BY tblAllDates.date;
SELECT tblAllDates.date, SUM(NZ(tblTimesheetData.batching)+NZ(tblTimesheetData.categorisation)+NZ(tblTimesheetData.CDT)+NZ(tblTimesheetData.CSI)+NZ(tblTimesheetData.destruction)+NZ(tblTimesheetData.extraction)+NZ(tblTimesheetData.indexing)+NZ(tblTimesheetData.mail)+NZ(tblTimesheetData.newlodgement)+NZ(tblTimesheetData.recordedDeliveries)+NZ(tblTimesheetData.retrieval)+NZ(tblTimesheetData.scanning)) AS VA
FROM tblTimesheetData RIGHT JOIN tblAllDates ON tblTimesheetData.date=tblAllDates.date
GROUP BY tblAllDates.date;
The best result I have managed is the following:
SELECT tblAllDates.date, 0 AS STT, SUM(NZ(tblTimesheetData.batching)+NZ(tblTimesheetData.categorisation)+NZ(tblTimesheetData.CDT)+NZ(tblTimesheetData.CSI)+NZ(tblTimesheetData.destruction)+NZ(tblTimesheetData.extraction)+NZ(tblTimesheetData.indexing)+NZ(tblTimesheetData.mail)+NZ(tblTimesheetData.newlodgement)+NZ(tblTimesheetData.recordedDeliveries)+NZ(tblTimesheetData.retrieval)+NZ(tblTimesheetData.scanning)) AS VA
FROM tblTimesheetData RIGHT JOIN tblAllDates ON tblTimesheetData.date=tblAllDates.date
GROUP BY tblAllDates.date
UNION SELECT tblAllDates.date, SUM(tblvolumedata.STT) AS STT, 0 AS VA
FROM tblvolumedata RIGHT JOIN tblAllDates ON tblvolumedata.date=tblAllDates.date
GROUP BY tblAllDates.date;
This gives me the VA and STT data I want, but in two records where I have data from both in a single day, like this:
date        STT    VA
28/07/2008  0      54020
28/07/2008  33812  0
29/07/2008  0      53890
29/07/2008  33289  0
30/07/2008  0      51780
30/07/2008  30456  0
31/07/2008  0      52790
31/07/2008  31305  0
What I'm after is the STT and VA data in single row per day. How might this be achieved, and how far am I away from a query that could be considered optimal? (don't laugh, I only seek to learn!)
You could put all of that into one query like so:
SELECT
dates.date,
SUM(volume.STT) AS STT,
SUM(NZ(timesheet.batching)+NZ(timesheet.categorisation)+NZ(timesheet.CDT)+NZ(timesheet.CSI)+NZ(timesheet.destruction)+NZ(timesheet.extraction)+NZ(timesheet.indexing)+NZ(timesheet.mail)+NZ(timesheet.newlodgement)+NZ(timesheet.recordedDeliveries)+NZ(timesheet.retrieval)+NZ(timesheet.scanning)) AS VA
FROM
tblAllDates dates
LEFT JOIN tblvolumedata volume
ON dates.date = volume.date
LEFT JOIN tblTimesheetData timesheet
ON
dates.date = timesheet.date
GROUP BY dates.date;
I've put the dates table first in the FROM clause and then LEFT JOINed the two other tables.
Jet can be fussy about more than one join in a query, so you may need to wrap one of the joins in parentheses (I believe this is referred to as Bill's SQL!). I would recommend LEFT JOINing the tables in the query builder, then switching to SQL view and adding the SUMs, GROUP BY, etc.
EDIT:
Ensure that the date field in each table is indexed as you're joining each table on this field.
EDIT 2:
How about this:
SELECT combined.date,
Sum(combined.STT) AS SumOfSTT,
Sum(combined.VA) AS SumOfVA
FROM
(SELECT dates.date, 0 AS STT, SUM(NZ(tblTimesheetData.batching)+NZ(tblTimesheetData.categorisation)+NZ(tblTimesheetData.CDT)+NZ(tblTimesheetData.CSI)+NZ(tblTimesheetData.destruction)+NZ(tblTimesheetData.extraction)+NZ(tblTimesheetData.indexing)+NZ(tblTimesheetData.mail)+NZ(tblTimesheetData.newlodgement)+NZ(tblTimesheetData.recordedDeliveries)+NZ(tblTimesheetData.retrieval)+NZ(tblTimesheetData.scanning)) AS VA
FROM tblTimesheetData RIGHT JOIN tblAllDates AS dates ON tblTimesheetData.date=dates.date
GROUP BY dates.date
UNION SELECT dates.date, SUM(tblvolumedata.STT) AS STT, 0 AS VA
FROM tblvolumedata RIGHT JOIN tblAllDates AS dates ON tblvolumedata.date=dates.date
GROUP BY dates.date
) AS combined
GROUP BY combined.date;
Interestingly, when I ran my first statement against some test data, the STT and VA figures had all been multiplied by 4 compared with the second statement. Very strange behaviour, and certainly not what I expected.
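The multiplication comes from the joins themselves. For each date, every tblvolumedata row is paired with every tblTimesheetData row, so each SUM counts its rows once per matching row in the other table. Below is a minimal sketch of the effect and of the usual fix: aggregate each table per date first, then join the totals to the dates table. It uses SQLite via Python's sqlite3 with made-up data. COALESCE stands in for Jet's NZ, and only the batching column is summed for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblAllDates (date TEXT);
CREATE TABLE tblvolumedata (date TEXT, STT INTEGER);
CREATE TABLE tblTimesheetData (date TEXT, batching INTEGER);
INSERT INTO tblAllDates VALUES ('2008-07-28'), ('2008-07-29');
INSERT INTO tblvolumedata VALUES ('2008-07-28', 100), ('2008-07-28', 200);
INSERT INTO tblTimesheetData VALUES ('2008-07-28', 10), ('2008-07-28', 20);
""")

# Both detail tables joined straight onto the dates: 2 x 2 = 4 rows for 28/07,
# so each SUM is inflated by the row count of the other table.
fanned_out = conn.execute("""
SELECT d.date, SUM(v.STT), SUM(COALESCE(t.batching, 0))
FROM tblAllDates d
LEFT JOIN tblvolumedata v ON v.date = d.date
LEFT JOIN tblTimesheetData t ON t.date = d.date
GROUP BY d.date
ORDER BY d.date
""").fetchall()

# Aggregate each table per date first, then join one total row per date.
pre_aggregated = conn.execute("""
SELECT d.date, v.STT, t.VA
FROM tblAllDates d
LEFT JOIN (SELECT date, SUM(STT) AS STT
           FROM tblvolumedata GROUP BY date) v ON v.date = d.date
LEFT JOIN (SELECT date, SUM(COALESCE(batching, 0)) AS VA
           FROM tblTimesheetData GROUP BY date) t ON t.date = d.date
ORDER BY d.date
""").fetchall()
```

Days with no data still appear thanks to the dates table (with NULL totals in the pre-aggregated version). In Jet, the same shape should work with NZ in place of COALESCE, though the joins may need parentheses there.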
The table of dates is the best way.
Combine the joins in the FROM clause, something like this:
SELECT d.date,
a.value,
b.value
FROM tableOfDates d
LEFT JOIN firstTable a
ON d.date = a.date
LEFT JOIN secondTable b
ON d.date = b.date
Turn the SQL into views and join them on the dates.