how to perform multiple aggregations on a single SQL query - sql

I have a table with three columns:
GEOID, ParcelID, and PurchaseDate.
The PKs are GEOID and ParcelID, and the data is formatted like this:
GEOID PARCELID PURCHASEDATE
12345 AB123 1/2/1932
12345 sfw123 2/5/2012
12345 fdf323 4/2/2015
12346 dfefej 2/31/2022 <-New GEOID
What I need is an aggregation based on GEOID.
I need to count the number of ParcelIDs from last month PER GEOID
and I need to provide a percentage of that GEOID of all total sold last month.
I need to produce three columns:
GEOID Nbr_Parcels_Sold Percent_of_total
For each GEOID, I need to know how many parcels sold last month and, from that number, what percentage it represents of all parcels sold.
For example: if there were 20 parcels sold last month, and 4 of them were sold from GEOID 12345, then the output would be:
GEOID Nbr_Parcels_Sold Perc_Total
12345 4 .2 (or 20%)
I am having issues with the dual aggregation. The concern is that the table in question has over 8 million records.
If there is a SQL warrior out here who has seen this issue before, any wisdom would be greatly appreciated.
Thanks.

Hopefully you are using SQL Server 2005 or a later version, in which case you can take advantage of windowed aggregation. In this case, windowed aggregation will allow you to get the total sale count alongside the counts per GEOID and use the total in calculations. Basically, the following query returns just the counts:
SELECT
GEOID,
Nbr_Parcels_Sold = COUNT(*),
Total_Parcels_Sold = SUM(COUNT(*)) OVER ()
FROM
dbo.atable
GROUP BY
GEOID
;
The COUNT(*) call gives you counts per GEOID, according to the GROUP BY clause. Now, the SUM(...) OVER expression gives you the grand total count in the same row as the detail count. It is the empty OVER clause that tells the SUM function to add up the results of COUNT(*) across the entire result set. You can use that result in calculations just like the result of any other function (or any expression in general).
The above query simply returns the total value. Since what you actually want is each GEOID's percentage of that total rather than the total itself, you can just put the SUM(...) OVER call into an expression:
SELECT
GEOID,
Nbr_Parcels_Sold = COUNT(*),
Percent_of_total = COUNT(*) * 100 / SUM(COUNT(*)) OVER ()
FROM
dbo.atable
GROUP BY
GEOID
;
The above will give you integer percentages (truncated). If you want more precision or a different representation, remember to cast either the divisor or the dividend (optionally both) to a non-integer numeric type, since SQL Server always performs integral division when both operands are integers.
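To make the truncation point concrete, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. SQLite stands in for SQL Server here (an assumption; it needs SQLite 3.25 or later for window functions), the sample rows are made up, and the per-GEOID counts are computed in a CTE first and then windowed, which is equivalent in effect to SUM(COUNT(*)) OVER ().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atable (GEOID INTEGER, ParcelID TEXT)")

# Hypothetical sample data: 7 parcels in GEOID 12345, 1 parcel in GEOID 12346.
rows = [(12345, f"p{i}") for i in range(7)] + [(12346, "q0")]
conn.executemany("INSERT INTO atable VALUES (?, ?)", rows)

query = """
WITH per_geoid AS (
    SELECT GEOID, COUNT(*) AS n
    FROM atable
    GROUP BY GEOID
)
SELECT GEOID,
       n                           AS Nbr_Parcels_Sold,
       n * 100 / SUM(n) OVER ()    AS pct_truncated,  -- integer / integer
       n * 100.0 / SUM(n) OVER ()  AS pct_decimal     -- one operand is non-integer
FROM per_geoid
ORDER BY GEOID
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With these rows the integer version loses the fractional part (87 and 12 instead of 87.5 and 12.5), which is the behaviour the answer warns about.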

How about using a subquery to compute the total count?
WITH data AS
(
SELECT *
FROM [Table]
WHERE
YEAR(PURCHASEDATE) * 100 + MONTH(PURCHASEDATE) = 201505
)
SELECT
GEOID,
COUNT(*) AS Nbr_Parcels_Sold,
CONVERT(decimal(18,8), COUNT(*)) /
(SELECT COUNT(*) FROM data) AS Perc_Total
FROM
data t
GROUP BY
GEOID
EDIT
To update another table with the result, put an UPDATE after the WITH clause:
WITH data AS
(
SELECT *
FROM [Table]
WHERE
YEAR(PURCHASEDATE) * 100 + MONTH(PURCHASEDATE) = 201505
)
UPDATE target SET
Nbr_Parcels_Sold = source.Nbr_Parcels_Sold,
Perc_Total = source.Perc_Total
FROM
[AnotherTable] target
INNER JOIN
(
SELECT
GEOID,
COUNT(*) AS Nbr_Parcels_Sold,
CONVERT(decimal(18,8), COUNT(*)) /
(SELECT COUNT(*) FROM data) AS Perc_Total
FROM
data t
GROUP BY
GEOID
) source ON target.GEOID = source.GEOID

Try the following. It grabs the total sales into a variable then uses it in the subsequent query:
DECLARE @pMonthStartDate DATETIME
DECLARE @MonthEndDate DATETIME
DECLARE @TotalPurchaseCount INT
SET @pMonthStartDate = <EnterFirstDayOfAMonth>
SET @MonthEndDate = DATEADD(MONTH, 1, @pMonthStartDate)
-- Grand total for the month (half-open range, so the first day of the next month is excluded)
SELECT
@TotalPurchaseCount = COUNT(*)
FROM
GEOIDs
WHERE
PurchaseDate >= @pMonthStartDate
AND PurchaseDate < @MonthEndDate
-- Per-GEOID counts and their share of the total, filtered on the same column and range
SELECT
GEOID,
COUNT(PARCELID) AS Nbr_Parcels_Sold,
CAST(COUNT(PARCELID) AS FLOAT) / CAST(@TotalPurchaseCount AS FLOAT) * 100.0 AS Perc_Total
FROM
GEOIDs
WHERE
PurchaseDate >= @pMonthStartDate
AND PurchaseDate < @MonthEndDate
GROUP BY
GEOID
I'm guessing your table name is GEOIDs. Change the value of @pMonthStartDate to suit yourself. If your PKs are as you say then this will be a quick query.
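One detail worth checking with month ranges like this: BETWEEN is inclusive at both ends, so a range that ends on the first day of the next month also matches rows stamped exactly on that boundary. The sketch below illustrates the difference using an in-memory SQLite database from Python; the table contents and the ISO-formatted text dates are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE GEOIDs (GEOID INTEGER, ParcelID TEXT, PurchaseDate TEXT)"
)
# Hypothetical rows: two sales in May 2015 and one on the first day of June.
conn.executemany(
    "INSERT INTO GEOIDs VALUES (?, ?, ?)",
    [
        (12345, "a", "2015-05-01"),
        (12345, "b", "2015-05-31"),
        (12346, "c", "2015-06-01"),
    ],
)

start, end = "2015-05-01", "2015-06-01"

# Inclusive on both ends: the June 1st row is counted as well.
between_count = conn.execute(
    "SELECT COUNT(*) FROM GEOIDs WHERE PurchaseDate BETWEEN ? AND ?",
    (start, end),
).fetchone()[0]

# Half-open range: includes the start, excludes the end.
half_open_count = conn.execute(
    "SELECT COUNT(*) FROM GEOIDs WHERE PurchaseDate >= ? AND PurchaseDate < ?",
    (start, end),
).fetchone()[0]

print(between_count, half_open_count)
```

The half-open form counts only the two May rows, while BETWEEN also picks up the row dated on the boundary.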

Related

query the percentage of occurrences in an SQL table

I have a table of names, where each row has the columns name, and occurrences.
I'd like to calculate the percentage of a certain name from the table.
How can I do that in one query?
You can get it by using SUM(occurrences):
select
name,
100.0 * sum(occurrences) / (select sum(occurrences) from users) as percentage
from
users
where name = 'Bob'
Try this:
SELECT name, cast(sum(occurance) as float) /
(select sum(occurance) from test) * 100 percentage FROM test
where name like '%dog%'
It is not very elegant due to the subquery in the field list but this will do the job if you want it in one query:
SELECT
`name`,
(CAST(SUM(`occurance`) AS DOUBLE)/CAST((SELECT SUM(`occurance`) FROM `user`) AS DOUBLE)) as `percent`
FROM
`user`
WHERE
`name`='miroslav';
Hope this helps,
I think conditional aggregation is the best approach:
select sum(case when name = @name then occurrences else 0 end) / sum(occurrences) as ratio
from t;
If you want an actual percentage between 0 and 100 multiply by 100.
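A small sketch of the conditional-aggregation version, run against SQLite from Python. The table name t and the two sample rows are assumptions. If occurrences is an integer column, the plain division may truncate to 0 on engines that do integer division (SQLite and SQL Server both do), so the second expression multiplies by 1.0 first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, occurrences INTEGER)")
# Hypothetical data: Bob accounts for 3 of 4 total occurrences.
conn.executemany("INSERT INTO t VALUES (?, ?)", [("Bob", 3), ("Alice", 1)])

int_ratio, real_ratio = conn.execute(
    """
    SELECT
        -- integer / integer: truncated
        SUM(CASE WHEN name = :name THEN occurrences ELSE 0 END)
            / SUM(occurrences),
        -- forcing a non-integer operand keeps the fraction
        SUM(CASE WHEN name = :name THEN occurrences ELSE 0 END) * 1.0
            / SUM(occurrences)
    FROM t
    """,
    {"name": "Bob"},
).fetchone()

print(int_ratio, real_ratio)
```

The table is scanned once and no subquery is needed, which is the appeal of this approach.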

Combining Count and MIN functions

I have a part of my query as:
SUM(POReceiptQuantity) as Received,
MIN(ItemLocalStandardCost) as Low,
MAX(ItemLocalStandardCost) as High,
Received returns the total number of Items we sold this year. The LOW is the lowest price we paid, and High is the highest price we paid.
I'm trying to incorporate a new column showing how many of the item we sold at the Low price. I tried to use COUNT along with the MIN function, but it returns "cannot perform an aggregate function on an expression containing an aggregate or a subquery".
Does anyone have any ideas how I could go about this?
Thank you
You need to create a subquery from your current GROUP BY query and join it with your original table. Then you can use a conditional COUNT:
SELECT T2.Received,
T2.Low,
COUNT( CASE WHEN T1.ItemLocalStandardCost = T2.Low THEN 1 END) as Total_Low,
T2.High,
COUNT( CASE WHEN T1.ItemLocalStandardCost = T2.High THEN 1 END) as Total_High
FROM YourTable T1
CROSS JOIN ( SELECT SUM(Y.POReceiptQuantity) as Received,
MIN(Y.ItemLocalStandardCost) as Low,
MAX(Y.ItemLocalStandardCost) as High
FROM YourTable Y
GROUP BY .... ) as T2
GROUP BY T2.Received, T2.Low, T2.High
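As a rough illustration of the join-to-the-aggregate idea, the sketch below runs it on SQLite from Python for the simplest case, where the subquery returns a single row (no inner GROUP BY). The sample rows are made up. Note that the outer query needs its own GROUP BY on the T2 columns, because it mixes them with COUNT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (POReceiptQuantity INTEGER, ItemLocalStandardCost INTEGER)"
)
# Hypothetical receipts: two rows at the lowest cost (5), three at the highest (9).
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [(10, 5), (20, 5), (5, 7), (1, 9), (2, 9), (3, 9)],
)

query = """
SELECT T2.Received,
       T2.Low,
       COUNT(CASE WHEN T1.ItemLocalStandardCost = T2.Low  THEN 1 END) AS Total_Low,
       T2.High,
       COUNT(CASE WHEN T1.ItemLocalStandardCost = T2.High THEN 1 END) AS Total_High
FROM YourTable T1
CROSS JOIN (SELECT SUM(POReceiptQuantity)     AS Received,
                   MIN(ItemLocalStandardCost) AS Low,
                   MAX(ItemLocalStandardCost) AS High
            FROM YourTable) AS T2
GROUP BY T2.Received, T2.Low, T2.High
"""
summary = conn.execute(query).fetchone()
print(summary)
```

This counts rows at the low and high price. If the number of units is what matters, replacing THEN 1 with the quantity column and COUNT with SUM would be the analogous change.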

SQL UPDATE: integer divided by count of a specified value

SQL Server 2008:
Suppose there is a table of customers with a column called "Shipping_State". I want to split the $10,000 spent on shipping costs equally amongst all customers who have Shipping_State = Ohio, so if there are 2 in Ohio one month, it'll be 5,000 apiece; if there are 100 the next month, it'll be 100 apiece.
I have a blank column in the table named Cost for that calculated value. Cost is a decimal(18,4) data type. I'd like to be able to use the query for any data types (usually nchar).
How would I accomplish this? My incorrect code in SQL Server Mgmt Studio returns the message:
Msg 157, Level 15, State 1, Line 1 An aggregate may not appear in the
set list of an UPDATE statement.
UPDATE CustomerTable
SET Cost = (10000 / COUNT(CustomerTable.Shipping_State))
WHERE CustomerTable.Shipping_State = 'Ohio';
Use nested SELECT.
UPDATE CustomerTable
SET Cost = (SELECT 10000.0 / count(*)
FROM CustomerTable
WHERE CustomerTable.Shipping_state = 'Ohio')
WHERE CustomerTable.Shipping_State = 'Ohio';
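The nested SELECT form is portable enough to try outside SQL Server. Here is a minimal sketch on SQLite from Python, with five made-up customers (the Name column is an assumption added so the rows can be told apart).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CustomerTable (Name TEXT, Shipping_State TEXT, Cost REAL)"
)
# Hypothetical customers: four in Ohio, one in Texas, Cost initially empty.
conn.executemany(
    "INSERT INTO CustomerTable VALUES (?, ?, NULL)",
    [("a", "Ohio"), ("b", "Ohio"), ("c", "Ohio"), ("d", "Ohio"), ("e", "Texas")],
)

# The aggregate lives in a scalar subquery, so it is allowed in the SET list.
# 10000.0 (not 10000) keeps the division from truncating.
conn.execute(
    """
    UPDATE CustomerTable
    SET Cost = (SELECT 10000.0 / COUNT(*)
                FROM CustomerTable
                WHERE Shipping_State = 'Ohio')
    WHERE Shipping_State = 'Ohio'
    """
)

costs = conn.execute(
    "SELECT Name, Cost FROM CustomerTable ORDER BY Name"
).fetchall()
print(costs)
```

Each Ohio customer ends up with 2500.0 and the Texas row is left untouched.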
You need a subquery to get the count, and then update based on that value. Something like this should work:
UPDATE CustomerTable
SET Cost = (10000.0 / CTCount.Shipping_State_Count)
FROM CustomerTable CT
INNER JOIN (
SELECT Shipping_State, COUNT(Shipping_State) AS Shipping_State_Count
FROM CustomerTable
GROUP BY Shipping_State) CTCount ON
CT.Shipping_State = CTCount.Shipping_State
WHERE CT.Shipping_State = 'Ohio';
SQL Server offers two things that really help with this type of query. The first is updatable CTEs and the second is window functions.
with toupdate as (
select ct.*, count(*) over (partition by ct.Shipping_State) as cnt
from CustomerTable ct
where ct.Shipping_State = 'Ohio'
)
update toupdate
set Cost = cast(10000 as float) / cnt;
Note that 10000 is cast to a floating point number. SQL Server does integer division when both operands are integers, and I presume you don't want an integer result here (actually, money would probably be a better data type).
It is unclear how "month" fits in, but this might be closer to what you are looking for:
with toupdate as (
select ct.*, count(*) over (partition by ct.Shipping_State, month(ct.Shipping_Date)) as cnt
from CustomerTable ct
where ct.Shipping_State = 'Ohio'
)
update toupdate
set Cost = cast(10000 as float) / cnt;
Note the change to the partition by clause.

SQLite ROLLUP query

I am trying to get a summary of the balance per month within my database. The table has the following fields
tran_date
type (Income or Expense)
amount
I can get as far as retrieving the sum for each type for every month but want the sum for the whole month. This is my current query:
SELECT DISTINCT strftime('%m%Y', tran_date), type, SUM(amount) FROM tran WHERE exclude = 0 GROUP BY tran_date, type
This returns
032013 Income 100
032013 Expense 200
I would like the summary on one row, in this example 032013 -100.
Just use the right group by. This uses conditional aggregation, assuming that you want "income - expense":
SELECT strftime('%m%Y', tran_date),
SUM(case when type = 'Income' then amount when type = 'Expense' then - amount end)
FROM tran WHERE exclude = 0
GROUP BY strftime('%m%Y', tran_date);
If you want just the full sum, then this is easier:
SELECT strftime('%m%Y', tran_date),
SUM(amount)
FROM tran WHERE exclude = 0
GROUP BY strftime('%m%Y', tran_date);
Your original query returned type rows because "type" was in the group by clause.
Also, distinct is (almost) never needed with group by.
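A minimal runnable sketch of the monthly "income - expense" summary, using Python's sqlite3 module. The rows are made up, dates are assumed to be stored as ISO text, and the grouping is on the month expression so that each month collapses to one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tran (tran_date TEXT, type TEXT, amount INTEGER, exclude INTEGER)"
)
# Hypothetical transactions; the row with exclude = 1 must be ignored.
conn.executemany(
    "INSERT INTO tran VALUES (?, ?, ?, ?)",
    [
        ("2013-03-05", "Income", 100, 0),
        ("2013-03-20", "Expense", 200, 0),
        ("2013-03-25", "Expense", 999, 1),
        ("2013-04-01", "Income", 50, 0),
    ],
)

query = """
SELECT strftime('%m%Y', tran_date) AS month,
       SUM(CASE WHEN type = 'Income'  THEN amount
                WHEN type = 'Expense' THEN -amount END) AS balance
FROM tran
WHERE exclude = 0
GROUP BY strftime('%m%Y', tran_date)
ORDER BY MIN(tran_date)
"""
balances = conn.execute(query).fetchall()
print(balances)
```

March nets out to -100, matching the single summary row asked for in the question.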

Weighted average in T-SQL (like Excel's SUMPRODUCT)

I am looking for a way to derive a weighted average from two rows of data with the same number of columns, where the average is as follows (borrowing Excel notation):
((A1*B1)+(A2*B2)+...+(An*Bn))/SUM(A1:An)
The first part reflects the same functionality as Excel's SUMPRODUCT() function.
My catch is that I need to dynamically specify which row gets averaged with weights, and which row the weights come from, and a date range.
EDIT: This is easier than I thought, because Excel was making me think I required some kind of pivot. My solution so far is thus:
select sum(baseSeries.Actual * weightSeries.Actual) / sum(weightSeries.Actual)
from (
select RecordDate , Actual
from CalcProductionRecords
where KPI = 'Weighty'
) baseSeries inner join (
select RecordDate , Actual
from CalcProductionRecords
where KPI = 'Tons Milled'
) weightSeries on baseSeries.RecordDate = weightSeries.RecordDate
Quassnoi's answer shows how to do the SumProduct, and using a WHERE clause would allow you to restrict by a Date field...
SELECT
SUM([tbl].data * [tbl].weight) / SUM([tbl].weight)
FROM
[tbl]
WHERE
[tbl].date >= '2009 Jan 01'
AND [tbl].date < '2010 Jan 01'
The more complex part is where you want to "dynamically specify" which field is [data] and which field is [weight]. The short answer is that realistically you'd have to make use of Dynamic SQL. Something along the lines of:
- Create a string template
- Replace all instances of [tbl].data with the appropriate data field
- Replace all instances of [tbl].weight with the appropriate weight field
- Execute the string
Dynamic SQL, however, carries its own overhead. If the queries are relatively infrequent, or the execution time of the query itself is relatively long, this may not matter. If they are common and short, however, you may notice that using dynamic SQL introduces a noticeable overhead. (Not to mention the need to guard against SQL injection attacks, etc.)
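The template steps above can be sketched from application code as well. The following is one possible shape, not the only one: it uses Python with SQLite as a stand-in database, and the helper name weighted_average, the ALLOWED_COLUMNS whitelist, and the extra column other are all hypothetical. Column names cannot be bound as parameters, so they are checked against the whitelist before being substituted into the template, while the date bounds are passed as ordinary parameters.

```python
import sqlite3

# Hypothetical whitelist: the only identifiers allowed into the SQL text.
ALLOWED_COLUMNS = {"data", "weight", "other"}


def weighted_average(conn, data_col, weight_col, start, end):
    """Build and run the weighted-average query for the chosen columns."""
    for col in (data_col, weight_col):
        if col not in ALLOWED_COLUMNS:
            raise ValueError(f"unexpected column name: {col!r}")
    # Step 1: the string template. Steps 2 and 3: substitute the field names.
    template = (
        "SELECT SUM({data} * {weight}) / SUM({weight}) "
        "FROM tbl WHERE date >= ? AND date < ?"
    )
    sql = template.format(data=data_col, weight=weight_col)
    # Step 4: execute, with the date range bound as parameters.
    return conn.execute(sql, (start, end)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (date TEXT, data REAL, weight REAL, other REAL)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [
        ("2009-03-01", 10.0, 1.0, 2.0),
        ("2009-06-01", 20.0, 3.0, 2.0),
        ("2010-02-01", 1000.0, 5.0, 2.0),  # outside the date range
    ],
)

by_weight = weighted_average(conn, "data", "weight", "2009-01-01", "2010-01-01")
by_other = weighted_average(conn, "data", "other", "2009-01-01", "2010-01-01")
print(by_weight, by_other)
```

In T-SQL the same precautions would typically mean validating or quoting identifiers and passing the dates as real parameters rather than concatenating them into the string.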
EDIT:
In your latest example you highlight three fields:
RecordDate
KPI
Actual
When the [KPI] is "Weight Y", then [Actual] is the Weighting Factor to use.
When the [KPI] is "Tons Milled", then [Actual] is the Data you want to aggregate.
Some questions I have are:
Are there any other fields?
Is there only ever ONE actual per date per KPI?
The reason I ask is that you want to ensure the JOIN you do is only ever 1:1. (You don't want 5 Actuals joining with 5 Weights, giving 25 resulting records.)
Regardless, a slight simplification of your query is certainly possible...
SELECT
SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
CalcProductionRecords AS [baseSeries]
INNER JOIN
CalcProductionRecords AS [weightSeries]
ON [weightSeries].RecordDate = [baseSeries].RecordDate
-- AND [weightSeries].someOtherID = [baseSeries].someOtherID
WHERE
[baseSeries].KPI = 'Tons Milled'
AND [weightSeries].KPI = 'Weighty'
The commented-out line is only needed if you need additional predicates to ensure a 1:1 relationship between your data and the weights.
If you can't guarantee just one value per date, and don't have any other fields to join on, you can modify your subquery-based version slightly...
SELECT
SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
(
SELECT
RecordDate,
SUM(Actual) AS Actual
FROM
CalcProductionRecords
WHERE
KPI = 'Tons Milled'
GROUP BY
RecordDate
)
AS [baseSeries]
INNER JOIN
(
SELECT
RecordDate,
AVG(Actual) AS Actual
FROM
CalcProductionRecords
WHERE
KPI = 'Weighty'
GROUP BY
RecordDate
)
AS [weightSeries]
ON [weightSeries].RecordDate = [baseSeries].RecordDate
This assumes the AVG of the weight is valid if there are multiple weights for the same day.
EDIT : Someone just voted for this so I thought I'd improve the final answer :)
SELECT
SUM(Actual * Weight) / SUM(Weight)
FROM
(
SELECT
RecordDate,
SUM(CASE WHEN KPI = 'Tons Milled' THEN Actual ELSE NULL END) AS Actual,
AVG(CASE WHEN KPI = 'Weighty' THEN Actual ELSE NULL END) AS Weight
FROM
CalcProductionRecords
WHERE
KPI IN ('Tons Milled', 'Weighty')
GROUP BY
RecordDate
)
AS pivotAggregate
This avoids the JOIN and also only scans the table once.
It relies on the fact that NULL values are ignored when calculating the AVG().
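To see the single-scan version at work, here is a small sketch on SQLite from Python. The sample rows are invented: the first date has two 'Tons Milled' rows (which are summed) and the second date has two 'Weighty' rows (which are averaged), plus an unrelated KPI that the WHERE clause filters out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CalcProductionRecords (RecordDate TEXT, KPI TEXT, Actual REAL)"
)
conn.executemany(
    "INSERT INTO CalcProductionRecords VALUES (?, ?, ?)",
    [
        ("2009-01-01", "Tons Milled", 4.0),
        ("2009-01-01", "Tons Milled", 6.0),  # summed with the row above: 10
        ("2009-01-01", "Weighty", 1.0),
        ("2009-01-02", "Tons Milled", 20.0),
        ("2009-01-02", "Weighty", 2.0),
        ("2009-01-02", "Weighty", 4.0),      # averaged with the row above: 3
        ("2009-01-02", "Other KPI", 99.0),   # ignored by the WHERE clause
    ],
)

query = """
SELECT SUM(Actual * Weight) / SUM(Weight)
FROM (
    SELECT RecordDate,
           SUM(CASE WHEN KPI = 'Tons Milled' THEN Actual ELSE NULL END) AS Actual,
           AVG(CASE WHEN KPI = 'Weighty'     THEN Actual ELSE NULL END) AS Weight
    FROM CalcProductionRecords
    WHERE KPI IN ('Tons Milled', 'Weighty')
    GROUP BY RecordDate
) AS pivotAggregate
"""
weighted = conn.execute(query).fetchone()[0]
print(weighted)
```

The per-date pivot yields (10, 1) and (20, 3), so the weighted average is (10*1 + 20*3) / (1 + 3) = 17.5.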
SELECT SUM(A * B) / SUM(A)
FROM mytable
If I have understood the problem, then try this:
SET DATEFORMAT dmy
declare @tbl table(A int, B int, recorddate datetime, KPI varchar(50))
insert into @tbl
select 1, 10, '21/01/2009', 'Weighty' union all
select 2, 20, '10/01/2009', 'Tons Milled' union all
select 3, 30, '03/02/2009', 'xyz' union all
select 4, 40, '10/01/2009', 'Weighty' union all
select 5, 50, '05/01/2009', 'Tons Milled' union all
select 6, 60, '04/01/2009', 'abc' union all
select 7, 70, '05/01/2009', 'Weighty' union all
select 8, 80, '09/01/2009', 'xyz' union all
select 9, 90, '05/01/2009', 'kws' union all
select 10, 100, '05/01/2009', 'Tons Milled'
select SUM(t1.A * t2.A) / SUM(t2.A) as Result from
(select RecordDate, A, B, KPI from @tbl) t1
inner join (select RecordDate, A, B, KPI from @tbl) t2
on t1.RecordDate = t2.RecordDate
and t1.KPI = t2.KPI