SQL UPDATE: integer divided by count of a specified value - sql

SQL Server 2008:
Supposing a table of customers, and a column called "Shipping_State". I want to split the $10,000 spent on shipping costs equally amongst all customers who have Shipping_State = Ohio value, so if there's 2 in Ohio 1 month, it'll be 5,000 a piece, if there's 100 the next month, it'll be 100 a piece.
I have a blank column in the table named Cost for that calculated value. Cost is a decimal(18,4) data type. I'd like to be able to use the query for any data types (usually nchar).
How would I accomplish this? My incorrect code in SQL Server Mgmt Studio returns the message:
Msg 157, Level 15, State 1, Line 1 An aggregate may not appear in the
set list of an UPDATE statement.
UPDATE CustomerTable
SET Cost = (10000 / COUNT(CustomerTable.Shipping_State))
WHERE CustomerTable.Shipping_State = 'Ohio';

Use nested SELECT.
UPDATE CustomerTable
SET Cost = (SELECT 10000.0 / count(*)
FROM CustomerTable
WHERE CustomerTable.Shipping_state = 'Ohio')
WHERE CustomerTable.Shipping_State = 'Ohio';

You would need to do a sub-query to get the count, and then update based on this value, something like this should work:
UPDATE CustomerTable
SET Cost = (10000 / CTCount.Shipping_State_Count)
FROM CustomerTable CT
INNER JOIN (
SELECT Shipping_State, COUNT(Shipping_State) AS Shipping_State_Count
FROM CustomerTable
GROUP BY Shipping_State) CTCount ON
CT.Shipping_State = CTCount.Shipping_State
WHERE CT.Shipping_State = 'Ohio';

SQL Server offers two things that really help with this type of query. The first is updatable CTEs and the second are window functions.
with toupdate as (
select ct.*, count(*) over (partition by ct.Shipping_State) as cnt
from CustomerTable
where ct.Shipping_State = 'Ohio'
)
update toupdate
set Cost = cast(10000 as float) / cnt;
Note that 10000 is cast to a floating point number. SQL Server does integer division, and I presume you want integers here (actually, money would probably be a better data type).
It is unclear how "month" fits in, but this might be closer to what you are looking for:
with toupdate as (
select ct.*, count(*) over (partition by ct.Shipping_State, month(ct.Shipping_Date) as cnt
from CustomerTable
where ct.Shipping_State = 'Ohio'
)
update toupdate
set Cost = cast(10000 as float) / cnt;
Note the change to the partition by clause.

Related

How to use rounded value for join in SQL

i'm just learning SQL today and i never thought how fun it's until i'm fiddling with it.
I got a problem and i need a help.
i have 2 tables, Customer and Rate, with details stated below
Customer
idcustomer = int
namecustomer = varchar
rate = decimal(3,0)
with value as described:
idcustomer---namecustomer---rate
1---JOHN DOE---100
2---MARY JANE---90
3---CLIVE BAKER---12
4---DANIEL REYES---47
Rate
rate = decimal(3,0)
description = varchar(40)
with value as described:
rate---description
10---G Rank
20---F Rank
30---E Rank
40---D Rank
50---C Rank
60---B Rank
70---A Rank
80---S Rank
90---SS Rank
100---SSS Rank
Then i ran query below in order to round all values in customer.rate field then inner join it with rate table.
SELECT *, round(rate,-1) as roundedrate
FROM customer INNER JOIN rate ON customer.roundedrate = rate.rate
It didn't produce this result:
idcustomer---namecustomer---rate---roundedrate---description
1---JOHN DOE---100---100---SSS Rank
2---MARY JANE---90---90---SS Rank
3---CLIVE BAKER---12---10---G Rank
4---DANIEL REYES---47---50---C Rank
Is there anything wrong with my code ?
Your query should produce an 'ambigious column' error because you're not specifying a table name when referring to rate (in round(rate,-1)), which exists in both tables.
Also, the where part of a sql query is executed before the select part, so you can't refer to the alias customer.roundedrate in your where statement.
Try this instead
SELECT *, round(customer.rate,-1) as roundedrate
FROM customer INNER JOIN rate ON round(customer.rate,-1) = rate.rate
http://sqlfiddle.com/#!9/e94a60/2
I would suggest a correlated subquery for this:
select c.*,
(select r.description
from rate r
where r.rate <= c.rate
order by r.rate desc
fetch first 1 row only
) as description
from customer c;
Note: fetch first 1 row only is ANSI standard SQL, which some databases do not support. MySQL uses limit. Older versions of SQL Server use select top 1 instead.

how to perform multiple aggregations on a single SQL query

I have a table with Three columns:
GEOID, ParcelID, and PurchaseDate.
The PKs are GEOID and ParcelID which is formatted as such:
GEOID PARCELID PURCHASEDATE
12345 AB123 1/2/1932
12345 sfw123 2/5/2012
12345 fdf323 4/2/2015
12346 dfefej 2/31/2022 <-New GEOID
What I need is an aggregation based on GEOID.
I need to count the number of ParcelIDs from last month PER GEOID
and I need to provide a percentage of that GEOID of all total sold last month.
I need to produce three columns:
GEOID Nbr_Parcels_Sold Percent_of_total
For each GEOID, I need to know how many Parcels Sold Last month, and with that Number, find out how much percentage that entails for all Solds.
For example: if there was 20 Parcels Sold last month, and 4 of them were sold from GEOID 12345, then the output would be:
GEOID Nbr_Parcels_Sold Perc_Total
12345 4 .2 (or 20%)
I am having issues with the dual aggregation. The concern is that the table in question has over 8 million records.
if there is a SQL Warrior out here who have seen this issue before, Any wisdom would be greatly appreciated.
Thanks.
Hopefully you are using SQL Server 2005 or later version, in which case you can get advantage of windowed aggregation. In this case, windowed aggregation will allow you to get the total sale count alongside counts per GEOID and use the total in calculations. Basically, the following query returns just the counts:
SELECT
GEOID,
Nbr_Parcels_Sold = COUNT(*),
Total_Parcels_Sold = SUM(COUNT(*)) OVER ()
FROM
dbo.atable
GROUP BY
GEOID
;
The COUNT(*) call gives you counts per GEOID, according to the GROUP BY clause. Now, the SUM(...) OVER expression gives you the grand total count in the same row as the detail count. It is the empty OVER clause that tells the SUM function to add up the results of COUNT(*) across the entire result set. You can use that result in calculations just like the result of any other function (or any expression in general).
The above query simply returns the total value. As you actually want not the value itself but a percentage from it for each GEOID, you can just put the SUM(...) OVER call into an expression:
SELECT
GEOID,
Nbr_Parcels_Sold = COUNT(*),
Percent_of_total = COUNT(*) * 100 / SUM(COUNT(*)) OVER ()
FROM
dbo.atable
GROUP BY
GEOID
;
The above will give you integer percentages (truncated). If you want more precision or a different representation, remember to cast either the divisor or the dividend (optionally both) to a non-integer numeric type, since SQL Server always performs integral division when both operands are integers.
How about using sub-query to count the sum
WITH data AS
(
SELECT *
FROM [Table]
WHERE
YEAR(PURCHASEDATE) * 100 + MONTH(PURCHASEDATE) = 201505
)
SELECT
GEOID,
COUNT(*) AS Nbr_Parcels_Sold,
CONVERT(decimal(18,8), COUNT(*)) /
(SELECT COUNT(*) FROM data) AS Perc_Total
FROM
data t
GROUP BY
GEOID
EDIT
To update another table by the result, use UPDATE under WITH()
WITH data AS
(
SELECT *
FROM [Table]
WHERE
YEAR(PURCHASEDATE) * 100 + MONTH(PURCHASEDATE) = 201505
)
UPDATE target SET
Nbr_Parcels_Sold = source.Nbr_Parcels_Sold,
Perc_Total = source.Perc_Total
FROM
[AnotherTable] target
INNER JOIN
(
SELECT
GEOID,
COUNT(*) AS Nbr_Parcels_Sold,
CONVERT(decimal(18,8), COUNT(*)) /
(SELECT COUNT(*) FROM data) AS Perc_Total
FROM
data t
GROUP BY
GEOID
) source ON target.GEOID = source.GEOID
Try the following. It grabs the total sales into a variable then uses it in the subsequent query:
DECLARE #pMonthStartDate DATETIME
DECLARE #MonthEndDate DATETIME
DECLARE #TotalPurchaseCount INT
SET #pMonthStartDate = <EnterFirstDayOfAMonth>
SET #MonthEndDate = DATEADD(MONTH, 1, #pMonthStartDate)
SELECT
#TotalPurchaseCount = COUNT(*)
FROM
GEOIDs
WHERE
PurchaseDate BETWEEN #pMonthStartDate
AND #MonthEndDate
SELECT
GEOID,
COUNT(PARCELID) AS Nbr_Parcels_Sold,
CAST(COUNT(PARCELID) AS FLOAT) / CAST(#TotalPurchaseCount AS FLOAT) * 100.0 AS Perc_Total
FROM
GEOIDs
WHERE
ModifiedDate BETWEEN #pMonthStartDate
AND #MonthEndDate
GROUP BY
GEOID
I'm guessing your table name is GEOIDs. Change the value of #pMonthStartDate to suit yourself. If your PKs are as you say then this will be a quick query.

update existing column with results of select query using sql

I am trying to update a column called Number_Of_Marks in our Results table using the results we get from our SELECT statement. Our select statement is used to count the numbers of marks per module in our results table. The SELECT statement works and the output is correct, which is
ResultID ModuleID cnt
-------------------------
111 ART3452 2
114 ART3452 2
115 CSC3039 3
112 CSC3039 3
113 CSC3039 3
The table in use is:
Results: ResultID, ModuleID, Number_Of_Marks
We need the results of cnt to be updated into our Number_Of_Marks column. This is our code below...
DECLARE #cnt INT
SELECT #cnt
SELECT C.cnt
FROM Results S
INNER JOIN (SELECT ModuleID, count(ModuleID) as cnt
FROM Results
GROUP BY ModuleID) C ON S.ModuleID = C.ModuleID
UPDATE Results
SET [Number_Of_Marks] = (#cnt)
You can do this in SQL Server using the update/join syntax:
UPDATE s
SET [Number_Of_Marks] = c.cnt
FROM Results S INNER JOIN
(SELECT ModuleID, count(ModuleID) as cnt
FROM Results
GROUP BY ModuleID
) C
ON S.ModuleID = C.ModuleID;
I assume that you want the count from the subquery, not from the uninitialized variable.
EDIT:
In general, when you change the question it is better to ask another question. Sometimes, though, the changes are really small. The revised query looks something like:
UPDATE s
SET [Number_Of_Marks] = c.cnt,
Marks = avgmarks
FROM Results S INNER JOIN
(SELECT ModuleID, count(ModuleID) as cnt, avg(marks * 1.0) as avgmarks
FROM Results
GROUP BY ModuleID
) C
ON S.ModuleID = C.ModuleID;
Note that I multiplied the marks by 1.0. This is a quick-and-dirty way to convert an integer to a numeric value. SQL Server takes averages on integers and produces an integer. Usually you want some sort of decimal or floating value.

Calculating current consecutive days from a table

I have what seems to be a common business request but I can't find no clear solution. I have a daily report (amongst many) that gets generated based on failed criteria and gets saved to a table. Each report has a type id tied to it to signify which report it is, and there is an import event id that signifies the day the imports came in (a date column is added for extra clarification). I've added a sqlfiddle to see the basic schema of the table (renamed for privacy issues).
http://www.sqlfiddle.com/#!3/81945/8
All reports currently generated are working fine, so nothing needs to be modified on the table. However, for one report (type 11), not only I need pull the invoices that showed up today, I also need to add one column that totals the amount of consecutive days from date of run for that invoice (including current day). The result should look like the following, based on the schema provided:
INVOICE MESSAGE EVENT_DATE CONSECUTIVE_DAYS_ON_REPORT
12345 Yes July, 30 2013 6
54355 Yes July, 30 2013 2
644644 Yes July, 30 2013 4
I only need the latest consecutive days, not any other set that may show up. I've tried to run self joins to no avail, and my last attempt is also listed as part of the sqlfiddle file, to no avail. Any suggestions or ideas? I'm quite stuck at the moment.
FYI: I am working in SQL Server 2000! I have seen a lot of neat tricks that have come out in 2005 and 2008, but I can't access them.
Your help is greatly appreciated!
Something like this? http://www.sqlfiddle.com/#!3/81945/14
SELECT
[final].*,
[last].total_rows
FROM
tblEventInfo AS [final]
INNER JOIN
(
SELECT
[first_of_last].type_id,
[first_of_last].invoice,
MAX([all_of_last].event_date) AS event_date,
COUNT(*) AS total_rows
FROM
(
SELECT
[current].type_id,
[current].invoice,
MAX([current].event_date) AS event_date
FROM
tblEventInfo AS [current]
LEFT JOIN
tblEventInfo AS [previous]
ON [previous].type_id = [current].type_id
AND [previous].invoice = [current].invoice
AND [previous].event_date = [current].event_date-1
WHERE
[current].type_id = 11
AND [previous].type_id IS NULL
GROUP BY
[current].type_id,
[current].invoice
)
AS [first_of_last]
INNER JOIN
tblEventInfo AS [all_of_last]
ON [all_of_last].type_id = [first_of_last].type_id
AND [all_of_last].invoice = [first_of_last].invoice
AND [all_of_last].event_date >= [first_of_last].event_date
GROUP BY
[first_of_last].type_id,
[first_of_last].invoice
)
AS [last]
ON [last].type_id = [final].type_id
AND [last].invoice = [final].invoice
AND [last].event_date = [final].event_date
The inner most query looks up the starting record of the last block of consecutive records.
Then that joins on to all the records in that block of consecutive records, giving the final date and the count of rows (consecutive days).
Then that joins on to the row for the last day to get the message, etc.
Make sure that in reality you have an index on (type_id, invoice, event_date).
You have multiple problems. Tackle them separately and build up.
Problems:
1) Identifying consecutive ranges: subtract the row_number from the range column and group by the result
2) No ROW_NUMBER() functions in SQL 2000: Fake it with a correlated subquery.
3) You actually want DENSE_RANK() instead of ROW_NUMBER: Make a list of unique dates first.
Solutions:
3)
SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date
2)
SELECT t2.invoice,t2.event_date,t2.id,
DATEDIFF(day,(SELECT COUNT(DISTINCT event_date) FROM (SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date) t1 WHERE t2.invoice = t1.invoice AND t2.event_date > t1.event_date),t2.event_date) grp
FROM (SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date) t2
ORDER BY invoice,grp,event_date
1)
SELECT
t3.invoice AS INVOICE,
MAX(t3.event_date) AS EVENT_DATE,
COUNT(t3.event_date) AS CONSECUTIVE_DAYS_ON_REPORT
FROM (
SELECT t2.invoice,t2.event_date,t2.id,
DATEDIFF(day,(SELECT COUNT(DISTINCT event_date) FROM (SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date) t1 WHERE t2.invoice = t1.invoice AND t2.id > t1.id),t2.event_date) grp
FROM (SELECT MAX(id) AS id,invoice,event_date FROM tblEventInfo GROUP BY invoice,event_date) t2
) t3
GROUP BY t3.invoice,t3.grp
The rest of your question is a little ambiguous. If two ranges are of equal length, do you want both or just the most recent? Should the output MESSAGE be 'Yes' if any message = 'Yes' or only if the most recent message = 'Yes'?
This should give you enough of a breadcrumb though
I had a similar requirement not long ago getting a "Top 5" ranking with a consecutive number of periods in Top 5. The only solution I found was to do it in a cursor. The cursor has a date = #daybefore and inside the cursor if your data does not match quit the loop, otherwise set #daybefore = datediff(dd, -1, #daybefore).
Let me know if you want an example. There just seem to be a large number of enthusiasts, who hit downvote when they see the word "cursor" even if they don't have a better solution...
Here, try a scalar function like this:
CREATE FUNCTION ConsequtiveDays
(
#invoice bigint, #date datetime
)
RETURNS int
AS
BEGIN
DECLARE #ct int = 0, #Count_Date datetime, #Last_Date datetime
SELECT #Last_Date = #date
DECLARE counter CURSOR LOCAL FAST_FORWARD
FOR
SELECT event_date FROM tblEventInfo
WHERE invoice = #invoice
ORDER BY event_date DESC
FETCH NEXT FROM counter
INTO #Count_Date
WHILE ##FETCH_STATUS = 0 AND DATEDIFF(dd,#Last_Date,#Count_Date) < 2
BEGIN
#ct = #ct + 1
END
CLOSE counter
DEALLOCATE counter
RETURN #ct
END
GO

Weighted average in T-SQL (like Excel's SUMPRODUCT)

I am looking for a way to derive a weighted average from two rows of data with the same number of columns, where the average is as follows (borrowing Excel notation):
(A1*B1)+(A2*B2)+...+(An*Bn)/SUM(A1:An)
The first part reflects the same functionality as Excel's SUMPRODUCT() function.
My catch is that I need to dynamically specify which row gets averaged with weights, and which row the weights come from, and a date range.
EDIT: This is easier than I thought, because Excel was making me think I required some kind of pivot. My solution so far is thus:
select sum(baseSeries.Actual * weightSeries.Actual) / sum(weightSeries.Actual)
from (
select RecordDate , Actual
from CalcProductionRecords
where KPI = 'Weighty'
) baseSeries inner join (
select RecordDate , Actual
from CalcProductionRecords
where KPI = 'Tons Milled'
) weightSeries on baseSeries.RecordDate = weightSeries.RecordDate
Quassnoi's answer shows how to do the SumProduct, and using a WHERE clause would allow you to restrict by a Date field...
SELECT
SUM([tbl].data * [tbl].weight) / SUM([tbl].weight)
FROM
[tbl]
WHERE
[tbl].date >= '2009 Jan 01'
AND [tbl].date < '2010 Jan 01'
The more complex part is where you want to "dynamically specify" the what field is [data] and what field is [weight]. The short answer is that realistically you'd have to make use of Dynamic SQL. Something along the lines of:
- Create a string template
- Replace all instances of [tbl].data with the appropriate data field
- Replace all instances of [tbl].weight with the appropriate weight field
- Execute the string
Dynamic SQL, however, carries it's own overhead. Is the queries are relatively infrequent , or the execution time of the query itself is relatively long, this may not matter. If they are common and short, however, you may notice that using dynamic sql introduces a noticable overhead. (Not to mention being careful of SQL injection attacks, etc.)
EDIT:
In your lastest example you highlight three fields:
RecordDate
KPI
Actual
When the [KPI] is "Weight Y", then [Actual] the Weighting Factor to use.
When the [KPI] is "Tons Milled", then [Actual] is the Data you want to aggregate.
Some questions I have are:
Are there any other fields?
Is there only ever ONE actual per date per KPI?
The reason I ask being that you want to ensure the JOIN you do is only ever 1:1. (You don't want 5 Actuals joining with 5 Weights, giving 25 resultsing records)
Regardless, a slight simplification of your query is certainly possible...
SELECT
SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
CalcProductionRecords AS [baseSeries]
INNER JOIN
CalcProductionRecords AS [weightSeries]
ON [weightSeries].RecordDate = [baseSeries].RecordDate
-- AND [weightSeries].someOtherID = [baseSeries].someOtherID
WHERE
[baseSeries].KPI = 'Tons Milled'
AND [weightSeries].KPI = 'Weighty'
The commented out line only needed if you need additional predicates to ensure a 1:1 relationship between your data and the weights.
If you can't guarnatee just One value per date, and don't have any other fields to join on, you can modify your sub_query based version slightly...
SELECT
SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
(
SELECT
RecordDate,
SUM(Actual)
FROM
CalcProductionRecords
WHERE
KPI = 'Tons Milled'
GROUP BY
RecordDate
)
AS [baseSeries]
INNER JOIN
(
SELECT
RecordDate,
AVG(Actual)
FROM
CalcProductionRecords
WHERE
KPI = 'Weighty'
GROUP BY
RecordDate
)
AS [weightSeries]
ON [weightSeries].RecordDate = [baseSeries].RecordDate
This assumes the AVG of the weight is valid if there are multiple weights for the same day.
EDIT : Someone just voted for this so I thought I'd improve the final answer :)
SELECT
SUM(Actual * Weight) / SUM(Weight)
FROM
(
SELECT
RecordDate,
SUM(CASE WHEN KPI = 'Tons Milled' THEN Actual ELSE NULL END) AS Actual,
AVG(CASE WHEN KPI = 'Weighty' THEN Actual ELSE NULL END) AS Weight
FROM
CalcProductionRecords
WHERE
KPI IN ('Tons Milled', 'Weighty')
GROUP BY
RecordDate
)
AS pivotAggregate
This avoids the JOIN and also only scans the table once.
It relies on the fact that NULL values are ignored when calculating the AVG().
SELECT SUM(A * B) / SUM(A)
FROM mytable
If I have understand the problem then try this
SET DATEFORMAT dmy
declare #tbl table(A int, B int,recorddate datetime,KPI varchar(50))
insert into #tbl
select 1,10 ,'21/01/2009', 'Weighty'union all
select 2,20,'10/01/2009', 'Tons Milled' union all
select 3,30 ,'03/02/2009', 'xyz'union all
select 4,40 ,'10/01/2009', 'Weighty'union all
select 5,50 ,'05/01/2009', 'Tons Milled'union all
select 6,60,'04/01/2009', 'abc' union all
select 7,70 ,'05/01/2009', 'Weighty'union all
select 8,80,'09/01/2009', 'xyz' union all
select 9,90 ,'05/01/2009', 'kws' union all
select 10,100,'05/01/2009', 'Tons Milled'
select SUM(t1.A*t2.A)/SUM(t2.A)Result from
(select RecordDate,A,B,KPI from #tbl)t1
inner join(select RecordDate,A,B,KPI from #tbl t)t2
on t1.RecordDate = t2.RecordDate
and t1.KPI = t2.KPI