I have a view that consists of data from different tables. The major fields are BillNo, ITEM_FEE and GroupNo. I need to calculate the total discount for a given GroupNo. The discount is based on the fractional part of the amount, grouped by BillNo (a single BillNo can have multiple entries). If there are multiple transactions for a single BillNo, a discount applies when the decimal part of the sum of ITEM_FEE is greater than 0; if there is only a single transaction and the decimal part of its ITEM_FEE is greater than 0, that decimal part is treated as the discount.
I have prepared a script, and it returns the total discount for a particular GroupNo:
DECLARE @GroupNo AS nvarchar(100)
SET @GroupNo = '3051'
SELECT SUM(disc) AS Discount
FROM (SELECT (SELECT CASE
                       WHEN SUM(item_fee) % 1 > 0 THEN SUM(item_fee % 1)
                     END
              FROM view_bi_sales VBS
              WHERE VBS.billno = VB.billno
              GROUP BY billno) AS Disc
      FROM view_bi_sales VB
      WHERE groupno = @GroupNo) temp
The problem is that it takes almost 2 minutes to get the result.
Please help me get the result faster, if possible.
Thank you for all your help and support. Since I was already calculating the sum of the decimal part of ITEM_FEE grouped by BillNo, there was no need to check whether it was greater than 0. The query below gives me the desired output in less than 10 seconds:
SELECT SUM(discount)
FROM (SELECT SUM(ITEM_FEE % 1) AS discount
      FROM view_BI_Sales
      WHERE groupno = 3051
      GROUP BY BillNo) temp
If I understand correctly, you don't need the self-join (the correlated subquery back into view_bi_sales); a single GROUP BY on billno is enough. This might help performance:
SELECT SUM(disc) as Discount
FROM (SELECT (CASE WHEN SUM(item_fee % 1) > 0
THEN SUM(item_fee % 1)
END) as disc
FROM view_bi_sales VBS
WHERE groupno = @GroupNo
GROUP BY billno
) vbs;
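A minimal, runnable sketch of the single-pass GROUP BY approach above, using Python's built-in sqlite3 module and a made-up view_bi_sales table (the sample rows are hypothetical). One assumption to flag: SQLite's % operator works on integers only, so the sketch takes the fractional part as ITEM_FEE - CAST(ITEM_FEE AS INTEGER), which matches ITEM_FEE % 1 in SQL Server for non-negative decimal fees.

```python
import sqlite3

# Hypothetical sample data: (BillNo, ITEM_FEE, GroupNo).
# Fees use binary-exact fractions (.25, .5, .75) so the float sums stay exact.
ROWS = [
    ("B1", 10.25, "3051"),  # multi-line bill: fractions .25 + .50 = .75
    ("B1", 5.50, "3051"),
    ("B2", 7.75, "3051"),   # single-line bill: fraction .75
    ("B3", 4.00, "3051"),   # no fractional part, contributes nothing
    ("B4", 3.50, "9999"),   # different group, filtered out for 3051
]

# Fractional part of ITEM_FEE; SQLite's % is integer-only, so subtract the
# truncated integer instead (correct for non-negative fees).
FRACTION = "(ITEM_FEE - CAST(ITEM_FEE AS INTEGER))"

# Each bill is aggregated once; no correlated subquery per row.
DISCOUNT_SQL = f"""
SELECT SUM(disc) AS Discount
FROM (SELECT CASE WHEN SUM({FRACTION}) > 0
                  THEN SUM({FRACTION})
             END AS disc
      FROM view_bi_sales
      WHERE GroupNo = ?
      GROUP BY BillNo) AS per_bill
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the view_bi_sales view."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE view_bi_sales (BillNo TEXT, ITEM_FEE REAL, GroupNo TEXT)")
    conn.executemany("INSERT INTO view_bi_sales VALUES (?, ?, ?)", ROWS)
    return conn


def total_discount(conn: sqlite3.Connection, group_no: str) -> float:
    """Sum the per-bill fractional amounts for one group (0.0 if none)."""
    (value,) = conn.execute(DISCOUNT_SQL, (group_no,)).fetchone()
    return value or 0.0


conn = build_demo_db()
print(total_discount(conn, "3051"))  # prints 1.5 (0.75 for B1 + 0.75 for B2)
```

Because every bill is aggregated exactly once, the engine scans the filtered rows a single time, which is a plausible reason the GROUP BY versions run so much faster than the original correlated subquery.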
Related
I'm brainstorming on ways to find trends over a dataset containing transaction amounts that spans a year.
I'd like to compute the average of the top 25% of observations and the average of the bottom 75% of observations, and vice versa.
If the entire dataset contains 1000 observations, I'd like to run:
An average of the top 25% and then, separately, an average of the bottom 75%, and then the average of these two results.
Conversely, the top 75% average, then the bottom 25% average, then the average of the two.
For the overall average I have: avg(transaction_amount)
I am aware that, for the sectioned averages to be useful, I will have to order the data by date, which I have already accounted for in my SQL code:
select avg(transaction_amount)
from example.table
order by transaction_date
I am now struggling to find a way to split the data between 25% and 75% based on the number of observations.
Thanks.
If you're using MSSQL, it's pretty trivial depending on exactly the output you're looking for.
SELECT AVG(sub.transaction_amount) AS avg_amt
FROM (
    -- keep only the top 25% of rows by amount, then average them
    SELECT TOP 25 PERCENT
        transaction_amount
    FROM example.table
    ORDER BY transaction_amount DESC
) AS sub
Use PERCENT_RANK in order to see which percentage block a row belongs to. Then use this to group your data:
with data as
(
select t.*, percent_rank() over (order by transaction_amount) as pr
from example.table t
)
select
case when pr <= 0.75 then '0-75%' else '75-100%' end as percent,
avg(transaction_amount) as avg,
avg(avg(transaction_amount)) over () as avg_of_avg
from data
group by case when pr <= 0.75 then '0-75%' else '75-100%' end
union all
select
case when pr <= 0.25 then '0-25%' else '25-100%' end as percent,
avg(transaction_amount) as avg,
avg(avg(transaction_amount)) over () as avg_of_avg
from data
group by case when pr <= 0.25 then '0-25%' else '25-100%' end;
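A runnable sketch of the PERCENT_RANK bucketing idea with Python's sqlite3 (SQLite 3.25 or later is needed for window functions). The table name transactions and the amounts 1 to 8 are hypothetical, and the average of the two bucket averages is computed in Python instead of with avg(avg(...)) over ().

```python
import sqlite3

# Hypothetical transaction amounts.
AMOUNTS = [1, 2, 3, 4, 5, 6, 7, 8]

# Same idea as the answer: rank rows by amount, split at a percent_rank cut-off,
# then average each side.
SPLIT_SQL = """
WITH data AS (
    SELECT transaction_amount,
           PERCENT_RANK() OVER (ORDER BY transaction_amount) AS pr
    FROM transactions
)
SELECT CASE WHEN pr <= :cut THEN 'low' ELSE 'high' END AS bucket,
       AVG(transaction_amount) AS bucket_avg
FROM data
GROUP BY bucket
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (transaction_amount REAL)")
    conn.executemany("INSERT INTO transactions VALUES (?)", [(a,) for a in AMOUNTS])
    return conn


def split_averages(conn, cut):
    """Return (avg with pr <= cut, avg of the rest, average of the two averages).

    Assumes both buckets are non-empty.
    """
    buckets = dict(conn.execute(SPLIT_SQL, {"cut": cut}).fetchall())
    low, high = buckets["low"], buckets["high"]
    return low, high, (low + high) / 2


conn = build_demo_db()
print(split_averages(conn, 0.75))  # bottom 75% vs top 25%: (3.5, 7.5, 5.5)
print(split_averages(conn, 0.25))  # bottom 25% vs top 75%: (1.5, 5.5, 3.5)
```

With 8 rows, percent_rank takes the values k/7, so the 0.75 cut puts 6 rows in the low bucket and 2 in the high one, matching a 75/25 split exactly.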
I have a table called Graduates that records the name and income of different graduates. I need to calculate the median income. Here is the code from a book.
My questions are:
What is the result of the HAVING clause?
What is the result of the self-join?
SELECT AVG(DISTINCT income)
FROM (
SELECT T1.income
FROM Graduates T1, Graduates T2
GROUP BY T1.income
HAVING SUM(CASE WHEN T2.income >= T1.income THEN 1 ELSE 0 END)
>= COUNT(*) / 2
AND SUM(CASE WHEN T2.income <= T1.income THEN 1 ELSE 0 END)
>= COUNT(*) / 2
) TMP;
What the code does is find the one or two middle values. The self-join pairs every income in T1 with every income in T2, so after GROUP BY T1.income each group compares one candidate income against the whole table. The code then counts how many values are greater than or equal to, and less than or equal to, that candidate.
Each of the SUM()s in the HAVING clause counts the number of values greater than or equal to (or less than or equal to) a given income, and compares it with COUNT(*) / 2, half the rows in the group. The expression says something to the effect of:
The middle value(s) are the ones with at least half of the values at or above them and at least half at or below them.
The median is then the average of the middle values. If there is one, the average is the value itself. If there are two, their average is the median.
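To make the HAVING clause concrete, here is a sketch that runs the book's query against a tiny in-memory Graduates table with Python's sqlite3 (the sample incomes and the helper names are hypothetical). It divides by 2.0 rather than 2: in engines that truncate integer division, such as SQL Server and SQLite, COUNT(*) / 2 rounds down for an odd row count and lets non-middle values through. The integer_division flag reproduces that case.

```python
import sqlite3

# The book's HAVING query; {div} is filled with "2.0" (or "2" to show the pitfall).
HAVING_SQL = """
SELECT T1.income AS income
FROM Graduates T1, Graduates T2
GROUP BY T1.income
HAVING SUM(CASE WHEN T2.income >= T1.income THEN 1 ELSE 0 END) >= COUNT(*) / {div}
   AND SUM(CASE WHEN T2.income <= T1.income THEN 1 ELSE 0 END) >= COUNT(*) / {div}
"""


def _load(incomes):
    """Build an in-memory Graduates table with made-up names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Graduates (name TEXT, income INTEGER)")
    conn.executemany(
        "INSERT INTO Graduates VALUES (?, ?)",
        [(f"grad{i}", income) for i, income in enumerate(incomes)],
    )
    return conn


def _having(integer_division):
    return HAVING_SQL.format(div="2" if integer_division else "2.0")


def middle_values(incomes, integer_division=False):
    """Distinct incomes that survive the HAVING clause, in ascending order."""
    sql = _having(integer_division) + "ORDER BY T1.income"
    return [income for (income,) in _load(incomes).execute(sql)]


def median(incomes, integer_division=False):
    """The book's outer query: AVG(DISTINCT income) over the middle values."""
    sql = f"SELECT AVG(DISTINCT income) FROM ({_having(integer_division)}) TMP"
    (value,) = _load(incomes).execute(sql).fetchone()
    return value


print(middle_values([10, 20, 30, 40]), median([10, 20, 30, 40]))  # [20, 30] 25.0
```

With [10, 20, 90] and integer division, COUNT(*) / 2 becomes 1, every income passes both tests, and the "median" comes out as 40.0 instead of 20.0.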
This is awful for multiple reasons:
It only works on numeric values. But medians are defined for strings and dates as well.
It requires a self-join, which is expensive.
It is rather indecipherable.
It is not obvious how to get medians within groups rather than over the entire dataset.
I'm using an SQL statement to total stock mutations, and there are different types of mutation.
In the following SQL query you will see that I check for types 1, 6 and 9, which change the stock, and sum the total.
But there is still one more type to add, type 0, and for that type only mutations where the stock quantity is negative should be summed.
select ART.Artcode, SUM(VRDMUT.VrdMutAantal)
from Kingsystem.tabVoorraadMutatie VRDMUT
inner join Kingsystem.tabArtikel ART
on ART.ArtGid = VRDMUT.VrdMutArtGid
where VRDMUT.VrdMutDatum between dateadd(day, -7, date(now())) and date(now()) AND VRDMUT.VrdMutSoort IN (1,6,9)
GROUP BY ART.Artcode
Does anyone know how to change the query to get the result I want?
Just use ABS in the SUM expression, SUM(ABS(VRDMUT.VrdMutAantal)), together with AND VRDMUT.VrdMutSoort IN (0,1,6,9).
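Reading the requirement as "types 1, 6 and 9 always count, type 0 counts only when the quantity is negative" (my interpretation; the answer above takes absolute values instead), a conditional filter is one way to express it. The sketch runs on Python's sqlite3 with simplified, hypothetical copies of tabArtikel and tabVoorraadMutatie and leaves out the date filter; in the real query the date condition would be ANDed with the parenthesized OR.

```python
import sqlite3

# Simplified, hypothetical versions of the two tables from the question.
SETUP = """
CREATE TABLE tabArtikel (ArtGid INTEGER, Artcode TEXT);
CREATE TABLE tabVoorraadMutatie (VrdMutArtGid INTEGER, VrdMutSoort INTEGER,
                                 VrdMutAantal INTEGER);
INSERT INTO tabArtikel VALUES (1, 'A'), (2, 'B');
INSERT INTO tabVoorraadMutatie VALUES
    (1, 1, 10), (1, 6, -3),   -- types 1 and 6: always counted
    (1, 0, 5),                -- type 0, positive: ignored
    (1, 0, -2),               -- type 0, negative: counted
    (1, 2, 100),              -- other type: ignored
    (2, 9, 4), (2, 0, 7);     -- type 9 counted, positive type 0 ignored
"""

# Types 1, 6 and 9 always count; type 0 counts only when the quantity is negative.
TOTAL_SQL = """
SELECT ART.Artcode, SUM(VRDMUT.VrdMutAantal) AS Total
FROM tabVoorraadMutatie VRDMUT
INNER JOIN tabArtikel ART ON ART.ArtGid = VRDMUT.VrdMutArtGid
WHERE VRDMUT.VrdMutSoort IN (1, 6, 9)
   OR (VRDMUT.VrdMutSoort = 0 AND VRDMUT.VrdMutAantal < 0)
GROUP BY ART.Artcode
ORDER BY ART.Artcode
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(conn.execute(TOTAL_SQL).fetchall())  # [('A', 5), ('B', 4)]
```

Article A totals 10 - 3 - 2 = 5: the positive type-0 row and the type-2 row are filtered out, while the negative type-0 row is kept with its sign.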
Thanks in advance for any thoughts, advice, and suggestions!
System: SQL Server 2008 R2
I need to count, for a given customer, the number of repurchases within several different time intervals (date ranges) and display these counts in a single table. I have this working with several consecutive common table expressions (CTEs) that I finally join together. This approach, however, is cumbersome and rather slow.
The SQL code I expected to be shortest and fastest, however, does not work, and it returns error messages such as
"the subqueries (Select (count …) will return several values" and hence "cannot be used as an expression"
or
"An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference."
Please find below a sample table (WDB), the desired result table (WDB_result) and the SQL code that needs improvement. Thanks a lot to everyone who can help!
Sample WDB Table:
CustomerID: customer ID
InNo: invoice number
OrderDate: order date
Result table WDB_result:
Columns
A) total number of repurchases
B) number of repurchases within the first 3 months
C) number of repurchases within the first 6 months
D) number of repurchases within the first 12 months
E) number of repurchases within the last 3 months
F) number of repurchases within the last 6 months
G) number of repurchases within the last 12 months
Sample SQL code to calculate columns A, B and E:
SELECT
CustomerID
, (COUNT(InNo) OVER (PARTITION by CustomerID) -1) as Norepurchases_Total
, (SELECT (COUNT(InNo) OVER (PARTITION by CustomerID) -1) as Count3
FROM WDB
WHERE OrderDate between MIN(OrderDate) and DATEADD(month, 3, MIN(OrderDate))
) as Norepurchases_1st_3months
, (SELECT (COUNT(InNo) OVER (PARTITION by CustomerID) -1) as Count3
FROM WDB
WHERE OrderDate between MAX(OrderDate) and DATEPART(y, DATEADD(m, -3, getdate()))
) as NoRepurchases_Last_3months
FROM WDB;
Typically what I would do in a situation like this is something like:
SELECT CustomerID,
SUM(
CASE
WHEN OrderDate > @ThreeMonthsAgo AND OrderDate <= @CurrentDate
THEN 1
ELSE 0
END
) InLast3Months,
SUM(
CASE
WHEN OrderDate > @SixMonthsAgo AND OrderDate <= @ThreeMonthsAgo
THEN 1
ELSE 0
END
) InLast3To6Months,
...
FROM YourTable
GROUP BY CustomerID
This allows you to define the bucket boundaries beforehand as variables, as shown, and then count how many orders fall into each bucket.
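Along the lines of the conditional-aggregation answer above, here is a runnable sketch with Python's sqlite3 that covers columns A, B and E in one pass. Assumptions (mine, not the question's): a repurchase is any order after a customer's first one, "first 3 months" is counted from that customer's first order, and "last 3 months" is counted back from a fixed reference date that stands in for getdate(). The sample rows are made up; SQLite's date() modifiers replace DATEADD, while ROW_NUMBER() and MIN() OVER (PARTITION BY ...) are also available in SQL Server 2008 R2. SQLite 3.25 or later is needed for the window functions.

```python
import sqlite3

# Hypothetical sample of the WDB table: (CustomerID, InNo, OrderDate).
ORDERS = [
    (1, "I1", "2020-01-10"), (1, "I2", "2020-02-15"),
    (1, "I3", "2020-06-01"), (1, "I4", "2020-11-20"),
    (2, "I5", "2020-03-01"),
    (3, "I6", "2020-09-01"), (3, "I7", "2020-12-15"),
]

# rn > 1 marks a repurchase; FirstDate anchors the "first N months" window.
# :ref stands in for getdate() so the "last N months" window is reproducible.
REPURCHASE_SQL = """
WITH ranked AS (
    SELECT CustomerID, OrderDate,
           ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate, InNo) AS rn,
           MIN(OrderDate) OVER (PARTITION BY CustomerID) AS FirstDate
    FROM WDB
)
SELECT CustomerID,
       SUM(CASE WHEN rn > 1 THEN 1 ELSE 0 END) AS Norepurchases_Total,
       SUM(CASE WHEN rn > 1 AND OrderDate <= date(FirstDate, '+3 months')
                THEN 1 ELSE 0 END) AS Norepurchases_1st_3months,
       SUM(CASE WHEN rn > 1 AND OrderDate > date(:ref, '-3 months')
                 AND OrderDate <= :ref
                THEN 1 ELSE 0 END) AS NoRepurchases_Last_3months
FROM ranked
GROUP BY CustomerID
ORDER BY CustomerID
"""


def repurchase_counts(ref_date):
    """Return (CustomerID, A, B, E) rows for the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE WDB (CustomerID INTEGER, InNo TEXT, OrderDate TEXT)")
    conn.executemany("INSERT INTO WDB VALUES (?, ?, ?)", ORDERS)
    return conn.execute(REPURCHASE_SQL, {"ref": ref_date}).fetchall()


for row in repurchase_counts("2020-12-15"):
    print(row)
```

Columns C, D, F and G follow the same pattern with '+6 months', '+12 months', '-6 months' and '-12 months', so one scan of the table produces every column without joining several CTEs.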
This is a very interesting query, and I think what you're after can be achieved if you read over this Stack Overflow article on multiple aggregate functions.
Applying the same concept as is used in this question should solve your problem.
I have a table dbo.X with DateTime column Y which may have hundreds of records.
My stored procedure has a parameter @CurrentDate. I want to find the date in column Y of dbo.X that is less than and closest to @CurrentDate.
How do I find it?
The WHERE clause matches all rows with a date earlier than @CurrentDate and, since they are sorted in descending order, TOP 1 returns the date closest to @CurrentDate.
SELECT TOP 1 *
FROM dbo.X
WHERE Y < @CurrentDate
ORDER BY Y DESC
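The same filter-then-sort idea, sketched with Python's sqlite3 against a hypothetical stand-in for dbo.X. SQLite has no TOP, so LIMIT 1 plays that role; dates are stored as ISO-8601 text, which sorts chronologically.

```python
import sqlite3

# Hypothetical stand-in for dbo.X with a single date column Y.
DATES = ["2021-01-01 08:00:00", "2021-03-15 12:00:00",
         "2021-03-20 09:30:00", "2021-04-01 00:00:00"]

# Keep only earlier dates, sort newest first, take one row.
CLOSEST_SQL = "SELECT Y FROM X WHERE Y < ? ORDER BY Y DESC LIMIT 1"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (Y TEXT)")
conn.executemany("INSERT INTO X VALUES (?)", [(d,) for d in DATES])


def closest_before(current_date):
    """Latest Y strictly earlier than current_date, or None if there is none."""
    row = conn.execute(CLOSEST_SQL, (current_date,)).fetchone()
    return row[0] if row else None


print(closest_before("2021-03-20 10:00:00"))  # 2021-03-20 09:30:00
```

When only the date itself is needed, SELECT MAX(Y) FROM X WHERE Y < ? returns the same value.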
Use DATEDIFF and order the result by the number of days or seconds between that date and the input date.
Something like this:
select top 1 rowId, dateCol, datediff(second, dateCol, @CurrentDate) as SecondsBetweenDates
from myTable
where dateCol < @CurrentDate
order by datediff(second, dateCol, @CurrentDate) -- smallest gap first = closest earlier date
I think I have a better solution for this problem.
I will show a few images to support and explain the final solution.
Background
In my solution I have a table of FX rates. These represent market rates for different currencies. However, our service provider has had a problem with the rate feed, and as a result some rates have zero values. I want to fill the missing data with the rate for the same currency that is closest in time to the missing rate. Basically, I want to get the RateId of the nearest non-zero rate, which I will then substitute. (The substitution itself is not shown in my example.)
1) To start off, let's identify the missing rates:
Query showing my missing rates i.e. have a rate value of zero
2) Next, let's identify the rates that are not missing.
Query showing rates that are not missing
3) This query is where the magic happens. I have made an assumption here that can be removed but was added to improve the efficiency/performance of the query: the condition on line 26 of the full query (comparing cast(... as date) values) assumes that I will find a substitute transaction on the same day as the missing/zero transaction.
The magic happens on line 23: the ROW_NUMBER function assigns a sequential number, starting at 1 for the shortest time difference between the missing and the non-missing transaction. The next-closest transaction gets a RowNum of 2, and so on.
Please note that on line 25 I must join on currency so that I do not mismatch currency types. That is, I don't want to substitute an AUD rate with CHF values; I want the closest rate in the same currency.
Combining the two data sets with a row_number to identify nearest transaction
4) Finally, let's get the rows where RowNum is 1.
The final query
The full query is as follows:
; with cte_zero_rates as
(
Select *
from fxrates
where (spot_imp = 0 or spot_exp = 0)
),
cte_non_zero_rates as
(
Select *
from fxrates
where (spot_imp > 0 and spot_exp > 0)
)
,cte_Nearest_Transaction as
(
select z.FXRatesID as Zero_FXRatesID
,z.importDate as Zero_importDate
,z.currency as Zero_Currency
,nz.currency as NonZero_Currency
,nz.FXRatesID as NonZero_FXRatesID
,nz.spot_imp
,nz.importDate as NonZero_importDate
,DATEDIFF(ss, z.importDate, nz.importDate) as TimeDifference
,ROW_NUMBER() Over(partition by z.FXRatesID order by abs(DATEDIFF(ss, z.importDate, nz.importDate)) asc) as RowNum
from cte_zero_rates z
left join cte_non_zero_rates nz on nz.currency = z.currency
and cast(nz.importDate as date) = cast(z.importDate as date)
--order by z.currency desc, z.importDate desc
)
select n.Zero_FXRatesID
,n.Zero_Currency
,n.Zero_importDate
,n.NonZero_importDate
,DATEDIFF(s, n.NonZero_importDate,n.Zero_importDate) as Delay_In_Seconds
,n.NonZero_Currency
,n.NonZero_FXRatesID
from cte_Nearest_Transaction n
where n.RowNum = 1
and n.NonZero_FXRatesID is not null
order by n.Zero_Currency, n.NonZero_importDate
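A runnable sketch of the same nearest-neighbour pattern with Python's sqlite3 (SQLite 3.25 or later for ROW_NUMBER). The sample rows are invented, julianday() replaces DATEDIFF(ss, ...) for the time gap, date() replaces cast(... as date), and the zero/non-zero filters check both spot_imp and spot_exp, which is an assumption about which columns those filters are meant to test.

```python
import sqlite3

# Hypothetical fxrates rows: (FXRatesID, currency, importDate, spot_imp, spot_exp).
RATES = [
    (1, "AUD", "2020-01-01 10:00:00", 0.0, 0.0),    # zero rate to fill
    (2, "AUD", "2020-01-01 09:00:00", 0.70, 0.70),  # 60 minutes away
    (3, "AUD", "2020-01-01 10:20:00", 0.71, 0.71),  # 20 minutes away: nearest
    (4, "CHF", "2020-01-01 10:05:00", 0.90, 0.90),  # same day as row 5
    (5, "CHF", "2020-01-01 12:00:00", 0.0, 0.0),    # zero rate to fill
    (6, "CHF", "2020-01-02 12:00:00", 0.95, 0.95),  # next day: excluded
]

NEAREST_SQL = """
WITH zero_rates AS (
    SELECT * FROM fxrates WHERE spot_imp = 0 OR spot_exp = 0
),
non_zero_rates AS (
    SELECT * FROM fxrates WHERE spot_imp > 0 AND spot_exp > 0
),
nearest AS (
    -- number candidates per zero rate, closest import time first
    SELECT z.FXRatesID AS Zero_FXRatesID,
           nz.FXRatesID AS NonZero_FXRatesID,
           ROW_NUMBER() OVER (
               PARTITION BY z.FXRatesID
               ORDER BY ABS(julianday(nz.importDate) - julianday(z.importDate))
           ) AS RowNum
    FROM zero_rates z
    LEFT JOIN non_zero_rates nz
           ON nz.currency = z.currency                      -- same currency only
          AND date(nz.importDate) = date(z.importDate)      -- same day only
)
SELECT Zero_FXRatesID, NonZero_FXRatesID
FROM nearest
WHERE RowNum = 1 AND NonZero_FXRatesID IS NOT NULL
ORDER BY Zero_FXRatesID
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fxrates (FXRatesID INTEGER, currency TEXT, "
             "importDate TEXT, spot_imp REAL, spot_exp REAL)")
conn.executemany("INSERT INTO fxrates VALUES (?, ?, ?, ?, ?)", RATES)
print(conn.execute(NEAREST_SQL).fetchall())  # [(1, 3), (5, 4)]
```

Zero rate 1 picks the AUD rate 20 minutes later over the one an hour earlier, and zero rate 5 picks the same-day CHF rate because the next-day row is excluded by the date condition.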