SQL query, subqueries - sql

I have a table storing sports results for a series of events: ONS_Skippers
The relevant columns from this table for the question are:
FK_EventID, FK_SkipperID and intResult.
I'm presenting different statistics from this database, but I haven't managed to write the query for the most advanced one: I would like to list the average performance of each participant (FK_SkipperID). I've defined performance as 100% for an event win, 0% for last place in an event, and linear between those two extremes. The formula for this is:
Performance = 100*(1-(intResult-1)/(NumberOfParticipantsInTheEvent-1))
NumberOfParticipantsInTheEvent varies from event to event, so it has to be counted for each FK_EventID group. All my attempts so far have failed.
Example:
SELECT FK_SkipperID, AVG((1-(intResult-1.0)/((SELECT Count(FK_EventID)
FROM ONS_Skippers AS ONS_Skippers2
WHERE ONS_Skippers.FK_EventID = ONS_Skippers2.FK_EventID AND FK_SkipperID > 0
GROUP BY FK_EventID)-1))*100)
FROM ONS_Skippers
GROUP BY FK_SkipperID
This gives the error message "Cannot perform an aggregate function on an expression containing an aggregate or a subquery".
Any idea on how to produce the wanted output?

Try joining to the subquery instead. The derived table has to return FK_EventID as well as the count, otherwise the ON clause has nothing to join on:
SELECT
o.FK_SkipperID,
AVG((1-(o.intResult-1.0)/(e.participants-1))*100) AS Performance
FROM ONS_Skippers o
INNER JOIN
(
SELECT FK_EventID, Count(*) AS participants
FROM ONS_Skippers
WHERE FK_SkipperID > 0
GROUP BY FK_EventID
) e
ON o.FK_EventID = e.FK_EventID
GROUP BY o.FK_SkipperID

I think you could achieve this by joining to an inline table as follows...
select SkipperID,
AVG(100*(1-(Result-1.0)/(p.NumParticipants-1))) as Performance
from Spike.Skippers s
inner join (
select EventId, Count(EventId) as NumParticipants -- Or Max(Result)
from Spike.Skippers
group by EventID
) p on s.EventID = p.EventID
group by SkipperID
[Edit] Apologies for not sticking to your column naming conventions - my OCD insisted I adhere to my own personal standard. Fussy, I know. [/Edit]
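As a quick check of the formula (for example, finishing 2nd of 5 gives 100*(1-1/4) = 75), the sketch below runs the join-to-a-derived-table approach against an in-memory SQLite database from Python. SQLite is only a stand-in for the real server, the sample rows are invented, and the HAVING guard against one-boat events is an added assumption. The decimal literals (1.0, 100.0) matter: with integer columns, SQL Server and SQLite would otherwise do integer division.

```python
import sqlite3

# SQLite stands in for the real database; the table and column names come from
# the question, but the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ONS_Skippers (FK_EventID INTEGER, FK_SkipperID INTEGER, intResult INTEGER)"
)
conn.executemany(
    "INSERT INTO ONS_Skippers VALUES (?, ?, ?)",
    [
        (1, 10, 1), (1, 20, 2), (1, 30, 3),                          # event 1: 3 boats
        (2, 10, 2), (2, 20, 1), (2, 30, 5), (2, 40, 3), (2, 50, 4),  # event 2: 5 boats
    ],
)

QUERY = """
SELECT o.FK_SkipperID,
       AVG(100.0 * (1 - (o.intResult - 1.0) / (e.participants - 1))) AS Performance
FROM ONS_Skippers AS o
INNER JOIN (
    -- one row per event, carrying that event's participant count
    SELECT FK_EventID, COUNT(*) AS participants
    FROM ONS_Skippers
    WHERE FK_SkipperID > 0
    GROUP BY FK_EventID
    HAVING COUNT(*) > 1  -- assumption: skip one-boat events to avoid dividing by zero
) AS e ON o.FK_EventID = e.FK_EventID
GROUP BY o.FK_SkipperID
ORDER BY o.FK_SkipperID
"""

# Maps skipper id -> average performance in percent.
performance = dict(conn.execute(QUERY).fetchall())
for skipper, perf in performance.items():
    print(skipper, perf)
```

Skipper 10 finishes 1st of 3 (100) and 2nd of 5 (75), so the average is 87.5.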

Related

SQL Math Operation In Correlated Subquery

I am working with three tables: a bill of materials, a parts inventory, and work orders (jobs). I am trying to find out whether a correlated subquery can perform a math operation using a value from the outer query. Here's an example of what I'm trying to do:
SELECT A.work_order,A.assembly,A.job_quantity,
(SELECT COUNT(X.part_number)
FROM bom X
WHERE X.assembly = A.assembly
AND (X.quantity_required * A.job_quantity) >= (SELECT Y.quantity_available FROM inventory Y WHERE
Y.part_number = X.part_number)) AS negatives
FROM work_orders A
ORDER BY A.assembly ASC
I am attempting to find out, for a given work order, if there are parts that we do not have enough of to build the assembly. I'm currently getting an "Error correlating fields" error. Is it possible to do this kind of operation in a single query?
Try moving the subquery to a join, something like this:
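SELECT a.work_order, a.assembly, a.job_quantity, n.negatives
FROM work_orders a LEFT JOIN (SELECT b.work_order, COUNT(x.part_number) AS negatives
FROM bom x JOIN work_orders b
ON x.assembly = b.assembly
JOIN inventory y
ON y.part_number = x.part_number
WHERE (x.quantity_required * b.job_quantity) >= y.quantity_available
GROUP BY b.work_order) n
ON a.work_order = n.work_order
ORDER BY a.assembly ASC
The derived table counts the short parts per work order, so it is joined back on work_order (work_orders has no part_number column). With the LEFT JOIN, work orders without any short parts are kept and show NULL in negatives.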
Or create a temporary cursor with the subquery and then use it to join the main table.
Hope this helps.
Luis
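Whether the original correlated subquery is accepted depends on the database engine, so it can help to check that the join-based version returns the same counts. This sketch uses SQLite from Python, which happens to accept both forms; the three tables are hypothetical minimal versions using the question's column names, and the rows are invented. It keeps the question's >= comparison, and parts with no inventory row are not counted by either query.

```python
import sqlite3

# Hypothetical minimal versions of the three tables, using the column names
# from the question; all rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE work_orders (work_order TEXT, assembly TEXT, job_quantity INTEGER);
CREATE TABLE bom (assembly TEXT, part_number TEXT, quantity_required INTEGER);
CREATE TABLE inventory (part_number TEXT, quantity_available INTEGER);

INSERT INTO work_orders VALUES ('WO1', 'ASM-A', 10), ('WO2', 'ASM-A', 2), ('WO3', 'ASM-B', 1);
INSERT INTO bom VALUES ('ASM-A', 'P1', 2), ('ASM-A', 'P2', 1), ('ASM-B', 'P3', 5);
INSERT INTO inventory VALUES ('P1', 15), ('P2', 8), ('P3', 100);
""")

# The question's correlated subquery (SQLite accepts it as written).
CORRELATED = """
SELECT A.work_order, A.assembly, A.job_quantity,
       (SELECT COUNT(X.part_number)
        FROM bom X
        WHERE X.assembly = A.assembly
          AND (X.quantity_required * A.job_quantity) >=
              (SELECT Y.quantity_available FROM inventory Y
               WHERE Y.part_number = X.part_number)) AS negatives
FROM work_orders A
ORDER BY A.assembly, A.work_order
"""

# Join-based rewrite: count short parts per work order, then LEFT JOIN back
# so work orders without shortages are kept (COALESCE turns NULL into 0).
JOINED = """
SELECT a.work_order, a.assembly, a.job_quantity, COALESCE(n.negatives, 0) AS negatives
FROM work_orders a
LEFT JOIN (
    SELECT b.work_order, COUNT(x.part_number) AS negatives
    FROM bom x
    JOIN work_orders b ON x.assembly = b.assembly
    JOIN inventory y ON y.part_number = x.part_number
    WHERE x.quantity_required * b.job_quantity >= y.quantity_available
    GROUP BY b.work_order
) n ON n.work_order = a.work_order
ORDER BY a.assembly, a.work_order
"""

correlated_rows = conn.execute(CORRELATED).fetchall()
joined_rows = conn.execute(JOINED).fetchall()
print(joined_rows)
```

WO1 needs 20 of P1 (15 on hand) and 10 of P2 (8 on hand), so it reports two short parts; the other orders report none.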

Include missing years in Group By query

I am fairly new to Access and SQL programming. I am trying to compute
Sum(SO_SalesOrderPaymentHistoryLineT.Amount) AS [Sum Of PaymentPerYear]
grouped by year, including years in which there is no amount at all. I would like those years listed as well for a report with charts. I'm not certain whether this is possible, but every bit of help is appreciated.
My code so far is as follows:
SELECT
Base_CustomerT.SalesRep,
SO_SalesOrderT.CustomerId,
Base_CustomerT.Customer,
SO_SalesOrderPaymentHistoryLineT.DatePaid,
Sum(SO_SalesOrderPaymentHistoryLineT.Amount) AS [Sum Of PaymentPerYear]
FROM
Base_CustomerT
INNER JOIN (
SO_SalesOrderPaymentHistoryLineT
INNER JOIN SO_SalesOrderT
ON SO_SalesOrderPaymentHistoryLineT.SalesOrderId = SO_SalesOrderT.SalesOrderId
) ON Base_CustomerT.CustomerId = SO_SalesOrderT.CustomerId
GROUP BY
Base_CustomerT.SalesRep,
SO_SalesOrderT.CustomerId,
Base_CustomerT.Customer,
SO_SalesOrderPaymentHistoryLineT.DatePaid,
SO_SalesOrderPaymentHistoryLineT.PaymentType,
Base_CustomerT.IsActive
HAVING
(((SO_SalesOrderPaymentHistoryLineT.PaymentType)=1)
AND ((Base_CustomerT.IsActive)=Yes))
ORDER BY
Base_CustomerT.SalesRep,
Base_CustomerT.Customer;
You need another table with all the years listed. You can create it on the fly or keep one in the database, and join from that. So if you had a table called alltheyears with a column called y that just listed the years, you could use code like this:
WITH minmax AS
(
select min(year(DatePaid)) as minyear,
max(year(DatePaid)) as maxyear
from SO_SalesOrderPaymentHistoryLineT
), yearsused AS
(
select y
from alltheyears, minmax
where alltheyears.y >= minyear and alltheyears.y <= maxyear
)
select *
from yearsused
left join (
-- your query above goes here!
) T
ON year(T.DatePaid) = yearsused.y
Note that Access SQL does not support WITH (common table expressions), so in Access the minmax and yearsused parts have to become saved queries or inline subqueries.
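Creating the year list "on the fly" is possible in engines that support recursive common table expressions (SQLite does, Access SQL does not). This sketch, run from Python against SQLite, generates every year between the data's own MIN and MAX and LEFT JOINs the yearly totals onto it. The single payments table is a simplified, hypothetical stand-in for the joined sales tables, and strftime('%Y', ...) plays the role of year().

```python
import sqlite3

# Invented sample payments; note there is nothing paid in 2011.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (DatePaid TEXT, Amount REAL);
INSERT INTO payments VALUES
  ('2010-03-01', 100.0), ('2010-07-15', 50.0),
  ('2012-01-10', 80.0), ('2013-05-05', 20.0);
""")

QUERY = """
WITH RECURSIVE
  bounds(minyear, maxyear) AS (          -- first and last year in the data
    SELECT MIN(CAST(strftime('%Y', DatePaid) AS INTEGER)),
           MAX(CAST(strftime('%Y', DatePaid) AS INTEGER))
    FROM payments
  ),
  years(y) AS (                          -- every year in between, generated on the fly
    SELECT minyear FROM bounds
    UNION ALL
    SELECT y + 1 FROM years, bounds WHERE y < maxyear
  ),
  totals(y, total) AS (                  -- payments summed per year
    SELECT CAST(strftime('%Y', DatePaid) AS INTEGER), SUM(Amount)
    FROM payments
    GROUP BY CAST(strftime('%Y', DatePaid) AS INTEGER)
  )
SELECT years.y, COALESCE(totals.total, 0) AS total
FROM years
LEFT JOIN totals ON totals.y = years.y   -- LEFT JOIN keeps years with no payments
ORDER BY years.y
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

2011 comes back with a total of 0 even though no payment row exists for it.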
You need a data source that will provide the year numbers. You cannot manufacture them out of thin air. Supposing you had a table Interesting_year with a single column year, populated, say, with every distinct integer between 2000 and 2050, you could do something like this:
SELECT
base.SalesRep,
base.CustomerId,
base.Customer,
base.[year],
Sum(Nz(data.Amount, 0)) AS [Sum Of PaymentPerYear]
FROM
(SELECT * FROM Base_CustomerT, Interesting_year) AS base
LEFT JOIN
(SELECT
SO_SalesOrderT.CustomerId,
Year(SO_SalesOrderPaymentHistoryLineT.DatePaid) AS PaidYear,
SO_SalesOrderPaymentHistoryLineT.Amount
FROM
SO_SalesOrderT
INNER JOIN SO_SalesOrderPaymentHistoryLineT
ON (SO_SalesOrderPaymentHistoryLineT.SalesOrderId = SO_SalesOrderT.SalesOrderId)
WHERE (SO_SalesOrderPaymentHistoryLineT.PaymentType = 1)
) AS data
ON ((base.CustomerId = data.CustomerId)
AND (base.[year] = data.PaidYear))
WHERE
(base.IsActive = Yes)
AND (base.[year] BETWEEN
(SELECT Min(Year(DatePaid)) FROM SO_SalesOrderPaymentHistoryLineT)
AND (SELECT Max(Year(DatePaid)) FROM SO_SalesOrderPaymentHistoryLineT))
GROUP BY
base.SalesRep,
base.CustomerId,
base.Customer,
base.[year]
ORDER BY
base.SalesRep,
base.Customer;
Note the following:
The revised query first forms the Cartesian product of Base_CustomerT with Interesting_year, so that the base customer data is associated with each year. This is sometimes called a CROSS JOIN; Access has no CROSS JOIN keyword, so it is written by listing both tables in the FROM clause, separated by a comma.
In order to have result rows for years with no payments, you must perform an outer join (in this case a LEFT JOIN). Where a (base customer, year) combination has no associated payments, the rest of the columns of the join result will be NULL.
I'm selecting the CustomerId from Base_CustomerT because you would sometimes get a NULL if you selected it from SO_SalesOrderT as in the starting query.
I'm using the Access Nz() function to convert NULL payment amounts (from rows corresponding to years with no payments) to 0.
I replaced your HAVING clause with filters applied before grouping. The IsActive test moves to the WHERE clause, which is semantically equivalent here and more efficient, because the filter is applied before groups are formed and the column no longer has to appear in the GROUP BY clause. The PaymentType test goes inside the data subquery instead: in the outer WHERE clause it would reject the NULL rows produced by the LEFT JOIN and drop exactly the empty years you want to keep.
The payment year is computed inside the data subquery (PaidYear) so that the join condition compares two plain columns, and [year] is bracketed because Year is also the name of an Access function.
Following Hogan's example, I filter out years outside the overall range covered by your data. Alternatively, you could get the same effect without that filter condition and its subqueries by making sure that table Interesting_year contains only the year numbers you want results for.
Update: modified the query to a different, but logically equivalent, "something like this" that I hope Access will like better. Aside from adding a bunch of parentheses, the main difference is making both the left and the right operand of the LEFT JOIN into a subquery. That's consistent with the consensus recommendation for resolving Access "ambiguous outer join" errors.
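The cross-join-plus-LEFT-JOIN pattern can be checked on a small scale. This sketch uses SQLite from Python rather than Access, so COALESCE stands in for Nz(), strftime('%Y', ...) for Year(), and the tables (customers, interesting_year, payments) are simplified, hypothetical versions of the ones in the question. It also runs a second query showing what happens when the PaymentType filter sits in the outer WHERE clause: the NULL-extended rows are discarded and the empty years disappear.

```python
import sqlite3

# Hypothetical, simplified tables with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (CustomerId INTEGER, Customer TEXT);
CREATE TABLE interesting_year (yr INTEGER);
CREATE TABLE payments (CustomerId INTEGER, DatePaid TEXT, Amount REAL, PaymentType INTEGER);

INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO interesting_year VALUES (2011), (2012), (2013);
INSERT INTO payments VALUES
  (1, '2011-02-01', 100.0, 1),
  (1, '2013-06-30', 40.0, 1),
  (1, '2012-04-01', 999.0, 2),
  (2, '2012-09-09', 75.0, 1);
""")

# Filter inside the joined subquery: every (customer, year) pair survives.
FILTER_INSIDE = """
SELECT base.CustomerId, base.yr, COALESCE(SUM(data.Amount), 0) AS total
FROM (SELECT * FROM customers, interesting_year) AS base   -- Cartesian product
LEFT JOIN (
    SELECT CustomerId, CAST(strftime('%Y', DatePaid) AS INTEGER) AS PaidYear, Amount
    FROM payments
    WHERE PaymentType = 1
) AS data
  ON data.CustomerId = base.CustomerId AND data.PaidYear = base.yr
GROUP BY base.CustomerId, base.yr
ORDER BY base.CustomerId, base.yr
"""

# Same join, but filtered in the outer WHERE: NULL rows fail the test.
FILTER_OUTSIDE = """
SELECT base.CustomerId, base.yr, COALESCE(SUM(data.Amount), 0) AS total
FROM (SELECT * FROM customers, interesting_year) AS base
LEFT JOIN (
    SELECT CustomerId, CAST(strftime('%Y', DatePaid) AS INTEGER) AS PaidYear,
           Amount, PaymentType
    FROM payments
) AS data
  ON data.CustomerId = base.CustomerId AND data.PaidYear = base.yr
WHERE data.PaymentType = 1
GROUP BY base.CustomerId, base.yr
ORDER BY base.CustomerId, base.yr
"""

rows = conn.execute(FILTER_INSIDE).fetchall()
filtered_outside = conn.execute(FILTER_OUTSIDE).fetchall()
print(rows)
print(filtered_outside)
```

The first query returns six rows (two customers times three years), with 0 for the years without qualifying payments; the second returns only the three rows that actually had a payment of type 1.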
Thank you, John, for your help. I found a solution that works for me. It looks quite different, but I learned a lot from it. If you are interested, here is how it looks now:
SELECT DISTINCTROW
Base_Customer_RevenueYearQ.SalesRep,
Base_Customer_RevenueYearQ.CustomerId,
Base_Customer_RevenueYearQ.Customer,
Base_Customer_RevenueYearQ.RevenueYear,
CustomerPaymentPerYearQ.[Sum Of PaymentPerYear]
FROM
Base_Customer_RevenueYearQ
LEFT JOIN CustomerPaymentPerYearQ
ON (Base_Customer_RevenueYearQ.RevenueYear = CustomerPaymentPerYearQ.[RevenueYear])
AND (Base_Customer_RevenueYearQ.CustomerId = CustomerPaymentPerYearQ.CustomerId)
GROUP BY
Base_Customer_RevenueYearQ.SalesRep,
Base_Customer_RevenueYearQ.CustomerId,
Base_Customer_RevenueYearQ.Customer,
Base_Customer_RevenueYearQ.RevenueYear,
CustomerPaymentPerYearQ.[Sum Of PaymentPerYear]
;

BigQuery - Shuffle By error

I have a table of about 5M rows. Note this is just a proof of concept; ultimately we will need to handle data in the TB range. I am doing a self join to find permutations of products for a market basket analysis.
I need to find the number of times each combination occurs in a basket, the ratio of those occurrences to the total number of baskets, and the number of times each item occurs across all baskets. This is pretty standard. BigQuery does not support a SELECT in the predicate of another SELECT, so I suppose I needed to create another join. Here's what I came up with:
select twoItem.upc1,twoItem.upc2,twoItem.twoItemOccurrences, totalUpc.totalUpcCount
from
(
select purchase1.upc as upc1,purchase2.upc as upc2,count(upc1) as twoItemOccurrences
from
conagra.purchase as purchase1
join each conagra.purchase as purchase2
on purchase1.upc = purchase2.upc
group by upc1,upc2
) as twoItem
JOIN EACH
(
select purchase3.upc as upc3, count(*) as totalUpcCount
from conagra.purchase as purchase3
group by upc3
) as totalUpc
on totalUpc.upc3 = twoItem.upc1
LIMIT 50;
I get the following error:
SHUFFLE BY may only be applied to parallelizable queries, but query is not parallelizable: (SELECT * FROM (SELECT [purchase3.upc] AS [upc3], COUNT(*) AS [totalUpcCount]...
Maybe an unpublished limitation?
Any help would be appreciated.
Try running these with GROUP EACH BY on your inner queries. We'll improve the response message for queries like this.
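Separately from the GROUP EACH BY advice, the self-join in the question matches purchase1.upc = purchase2.upc, so it only ever pairs a product with itself. Pairing different products bought together needs a basket key. This sketch computes the three requested figures in SQLite from Python, assuming a hypothetical basket_id column (the question does not show one); it uses standard SQL rather than BigQuery's legacy JOIN EACH / GROUP EACH BY syntax, and the rows are invented.

```python
import sqlite3

# Hypothetical purchase table: the basket_id column is an assumption (the
# question does not show one); rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (basket_id INTEGER, upc TEXT)")
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?)",
    [(1, "A"), (1, "B"), (1, "C"), (2, "A"), (2, "B"), (3, "B"), (3, "C")],
)

QUERY = """
WITH item_counts AS (                 -- baskets containing each item
    SELECT upc, COUNT(DISTINCT basket_id) AS upc_baskets
    FROM purchase
    GROUP BY upc
),
total AS (                            -- total number of baskets
    SELECT COUNT(DISTINCT basket_id) AS n FROM purchase
)
SELECT p1.upc AS upc1,
       p2.upc AS upc2,
       COUNT(DISTINCT p1.basket_id) AS pair_baskets,
       COUNT(DISTINCT p1.basket_id) * 1.0 / total.n AS support,
       ic.upc_baskets AS upc1_baskets
FROM purchase AS p1
JOIN purchase AS p2
  ON p1.basket_id = p2.basket_id      -- same basket
 AND p1.upc < p2.upc                  -- different items, each unordered pair once
JOIN item_counts AS ic ON ic.upc = p1.upc
CROSS JOIN total
GROUP BY p1.upc, p2.upc, ic.upc_baskets, total.n
ORDER BY upc1, upc2
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Replacing < with <> in the join condition returns ordered permutations (A,B and B,A) instead of unordered pairs.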

SQL SUM function doubling the amount it should using multiple tables

My query below is doubling the amounts on the last record it returns. I have 3 tables: activities, bookings and tempbookings. The query needs to list the activities and their details, pull the total number of places booked (using SUM, as bookingTotal) from the bookings table for each activity, and then calculate the same for tempbookings (as tempPlacesReserved), provided the tempReserveDate field in that table is in the future.
However, the first issue is that if there are no records for an activity in the tempbookings table, the query does not return that activity at all. To get around this I created dummy records in the past so that it still returns the record, but I would prefer not to have to do this!
The main issue I have is that on the final record of the returned results it doubles the booking total and the places reserved, which of course makes the whole query useless.
I know I am doing something wrong; I just haven't been able to sort it out. I have searched for similar issues online but have been unable to apply them to my situation correctly.
Any help would be appreciated.
P.S. I'm aware that normally you wouldn't need to fully qualify the database, schema, table and field names as I have, but the program I am planning to use this in requires it.
Code:
SELECT [LeisureActivities].[dbo].[activities].[activityID],
[LeisureActivities].[dbo].[activities].[activityName],
[LeisureActivities].[dbo].[activities].[activityDate],
[LeisureActivities].[dbo].[activities].[activityPlaces],
[LeisureActivities].[dbo].[activities].[activityPrice],
SUM([LeisureActivities].[dbo].[bookings].[bookingPlaces]) AS 'bookingTotal',
SUM (CASE WHEN[LeisureActivities].[dbo].[tempbookings].[tempReserveDate] > GetDate() THEN [LeisureActivities].[dbo].[tempbookings].[tempPlaces] ELSE 0 end) AS 'tempPlacesReserved'
FROM [LeisureActivities].[dbo].[activities],
[LeisureActivities].[dbo].[bookings],
[LeisureActivities].[dbo].[tempbookings]
WHERE ([LeisureActivities].[dbo].[activities].[activityID]=[LeisureActivities].[dbo].[bookings].[activityID]
AND [LeisureActivities].[dbo].[activities].[activityID]=[LeisureActivities].[dbo].[tempbookings].[tempActivityID])
AND [LeisureActivities].[dbo].[activities].[activityDate] > GetDate ()
GROUP BY [LeisureActivities].[dbo].[activities].[activityID],
[LeisureActivities].[dbo].[activities].[activityName],
[LeisureActivities].[dbo].[activities].[activityDate],
[LeisureActivities].[dbo].[activities].[activityPlaces],
[LeisureActivities].[dbo].[activities].[activityPrice];
Your current query uses an implicit INNER JOIN between all the tables, so if the tempbookings table has no records for an activity, that activity is not returned at all.
I would advise that you start to use JOIN syntax. You might also need to use subqueries to get the totals.
SELECT a.[activityID],
a.[activityName],
a.[activityDate],
a.[activityPlaces],
a.[activityPrice],
coalesce(b.bookingTotal, 0) bookingTotal,
coalesce(t.tempPlacesReserved, 0) tempPlacesReserved
FROM [LeisureActivities].[dbo].[activities] a
LEFT JOIN
(
select activityID,
SUM([bookingPlaces]) AS bookingTotal
from [LeisureActivities].[dbo].[bookings]
group by activityID
) b
ON a.[activityID]=b.[activityID]
LEFT JOIN
(
select tempActivityID,
SUM(CASE WHEN [tempReserveDate] > GetDate() THEN [tempPlaces] ELSE 0 end) AS tempPlacesReserved
from [LeisureActivities].[dbo].[tempbookings]
group by tempActivityID
) t
ON a.[activityID]=t.[tempActivityID]
WHERE a.[activityDate] > GetDate();
Note: I am using aliases because it is easier to read
Use the SQL-92 JOIN syntax, and make the join to tempBookings an outer join. Also clean up your SQL with table aliases; it makes it easier to read. As to why the last row has doubled values, I don't know, but on the off chance that it is caused by the extra dummy records you entered, get rid of them. The problem they were working around is fixed by using an outer join to tempBookings. The other possibility is that the join condition to the tempBookings table (t.tempActivityID = a.activityID) does not guarantee that each activity matches only one tempBookings row. If, for example, an activity matches two tempBookings rows, its bookings rows are repeated twice in the join output, causing the sum to be doubled.
SELECT a.activityID, a.activityName, a.activityDate,
a.activityPlaces, a.activityPrice,
SUM(b.bookingPlaces) bookingTotal,
SUM (CASE WHEN t.tempReserveDate > GetDate()
THEN t.tempPlaces ELSE 0 end) tempPlacesReserved
FROM LeisureActivities.dbo.activities a
Join LeisureActivities.dbo.bookings b
On b.activityID = a.activityID
Left Join LeisureActivities.dbo.tempbookings t
On t.tempActivityID = a.activityID
WHERE a.activityDate > GetDate ()
GROUP BY a.activityID, a.activityName,
a.activityDate, a.activityPlaces,
a.activityPrice;
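The doubling is consistent with row fan-out: when an activity has m bookings and n temporary bookings, joining both detail tables produces m x n rows for it, so each SUM is multiplied by the row count of the other table. The query directly above still joins both tables before summing, so it can still multiply. This sketch reproduces the effect in SQLite from Python with invented rows and simplified tables (date filters are omitted), and compares it with aggregating each table in its own subquery before joining.

```python
import sqlite3

# Invented sample data; date filtering is left out to isolate the row
# multiplication. Activity 2 has two bookings and two temporary bookings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activities (activityID INTEGER, activityName TEXT);
CREATE TABLE bookings (activityID INTEGER, bookingPlaces INTEGER);
CREATE TABLE tempbookings (tempActivityID INTEGER, tempPlaces INTEGER);

INSERT INTO activities VALUES (1, 'Kayaking'), (2, 'Archery'), (3, 'Climbing');
INSERT INTO bookings VALUES (1, 3), (2, 4), (2, 1);
INSERT INTO tempbookings VALUES (1, 2), (2, 5), (2, 2);
""")

# Join both detail tables first, then SUM: activity 2 produces 2 x 2 = 4 rows.
JOIN_THEN_SUM = """
SELECT a.activityID, SUM(b.bookingPlaces), SUM(t.tempPlaces)
FROM activities a
LEFT JOIN bookings b ON b.activityID = a.activityID
LEFT JOIN tempbookings t ON t.tempActivityID = a.activityID
GROUP BY a.activityID
ORDER BY a.activityID
"""

# Aggregate each detail table on its own, then join one row per activity.
SUM_THEN_JOIN = """
SELECT a.activityID, COALESCE(b.bookingTotal, 0), COALESCE(t.tempPlacesReserved, 0)
FROM activities a
LEFT JOIN (SELECT activityID, SUM(bookingPlaces) AS bookingTotal
           FROM bookings GROUP BY activityID) b
       ON a.activityID = b.activityID
LEFT JOIN (SELECT tempActivityID, SUM(tempPlaces) AS tempPlacesReserved
           FROM tempbookings GROUP BY tempActivityID) t
       ON a.activityID = t.tempActivityID
ORDER BY a.activityID
"""

fanned_out = conn.execute(JOIN_THEN_SUM).fetchall()
pre_aggregated = conn.execute(SUM_THEN_JOIN).fetchall()
print(fanned_out)
print(pre_aggregated)
```

Activity 2 should total 5 booked and 7 reserved places; the join-then-sum version reports 10 and 14, while activity 1 (one row in each table) looks correct either way.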

SQL nested Select....I think?

I have 2 tables UnitProd and Unit.
Unit = unitproductivityid,unitid, unitnumber, fleet
UnitProd = unitproductivityid, day, shipweight, stops
I have multiple units in each table and I am trying to use GROUP BY functions to get counts of different things. (The tables have more fields than shown; this is just for example purposes.)
So basically I have the following:
SELECT
u.[Fleet]
,u.[Unit]
,up.[Day]
,((SUM(up.[Shipment_Weight]))/2000) AS [ShipmentWeight]
,((SUM(up.[Shipment_Weight]))/COUNT(up.[Stops])) AS [ShpmntAvg]
FROM
[dbo].[UnitProductivity] u
INNER JOIN [dbo].[UnitProductivityDetails] up
ON u.UnitProductivityId = up.UnitProductivityId
GROUP BY u.fleet, u.unit
The issue I am having is that some up.[Stops] values are 0, and I want to exclude those. A unit always has 1-30 days, and some of those days have 0 in [Stops], so I want to count only the days with at least one stop. Would I use a nested SELECT here, and how?
Thanks
Unless I am misunderstanding the question, you don't need a nested SELECT.
Just add the following before your GROUP BY:
WHERE up.[Stops] > 0
It doesn't have to be nested, but here's a simple way:
SELECT * FROM Unit WHERE unitproductivityid IN (SELECT unitproductivityid FROM UnitProd WHERE stops > 0)
Good luck!
After your GROUP BY line, add HAVING count(up.[Stops]) > 0
With your existing code you can add a HAVING clause:
SELECT
u.[Fleet]
,u.[Unit]
,up.[Day]
,((SUM(up.[Shipment_Weight]))/2000) AS [ShipmentWeight]
,((SUM(up.[Shipment_Weight]))/COUNT(up.[Stops])) AS [ShpmntAvg]
FROM
[dbo].[UnitProductivity] u
INNER JOIN [dbo].[UnitProductivityDetails] up
ON u.UnitProductivityId = up.UnitProductivityId
GROUP BY u.fleet, u.unit
HAVING COUNT(up.[Stops]) > 0
More about HAVING clause on: http://www.w3schools.com/sql/sql_having.asp
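The answers above differ in what they actually count. COUNT(column) counts every non-NULL value, zeros included, so HAVING COUNT(up.[Stops]) > 0 only removes groups with no rows at all. WHERE up.[Stops] > 0 removes the zero-stop days before grouping, which also removes them from every other aggregate, while a conditional COUNT keeps all rows in SUM and counts only days with stops. This sketch compares the three in SQLite from Python, using a hypothetical single table (unit_days) in place of the UnitProductivity/UnitProductivityDetails join, with invented rows.

```python
import sqlite3

# Hypothetical single-table stand-in for the UnitProductivity /
# UnitProductivityDetails join; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE unit_days (Unit TEXT, Day INTEGER, Shipment_Weight REAL, Stops INTEGER);
INSERT INTO unit_days VALUES
  ('U1', 1, 4000, 2),
  ('U1', 2, 0, 0),
  ('U1', 3, 2000, 1),
  ('U2', 1, 6000, 3);
""")

# COUNT(col) counts every non-NULL value, so the zero-stop day is still counted.
HAVING_VERSION = """
SELECT Unit, COUNT(Stops) FROM unit_days
GROUP BY Unit HAVING COUNT(Stops) > 0 ORDER BY Unit
"""

# WHERE removes zero-stop rows before grouping (from every aggregate).
WHERE_VERSION = """
SELECT Unit, COUNT(Stops) FROM unit_days
WHERE Stops > 0
GROUP BY Unit ORDER BY Unit
"""

# Conditional aggregation: all rows feed SUM, only days with stops are counted;
# NULLIF avoids dividing by zero for a unit that never had a stop.
CONDITIONAL_VERSION = """
SELECT Unit,
       SUM(Shipment_Weight) / 2000 AS ShipmentWeight,
       COUNT(CASE WHEN Stops > 0 THEN 1 END) AS StopDays,
       SUM(Shipment_Weight) / NULLIF(COUNT(CASE WHEN Stops > 0 THEN 1 END), 0) AS ShpmntAvg
FROM unit_days
GROUP BY Unit ORDER BY Unit
"""

having_rows = conn.execute(HAVING_VERSION).fetchall()
where_rows = conn.execute(WHERE_VERSION).fetchall()
conditional_rows = conn.execute(CONDITIONAL_VERSION).fetchall()
print(having_rows, where_rows, conditional_rows)
```

Unit U1 has three days but only two with stops: the HAVING version still reports 3, while the WHERE and conditional versions report 2.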