SQL sub queries - is there a better way

SQL sub queries - is there a better way - sql

This is an SQL efficiency question.
A while back I had to write a collection of queries to pull data from an ERP system. Most of these were simple enough but one of them resulted in a rather ineficient query and its been bugging me ever since as there's got to be a better way.
The problem is not complex. You have rows of sales data. In each row you have quantity, sales price and the salesman code, among other information.
Commission is paid based on a stepped sliding scale. The more they sell, the better the commission. Steps might be 1000, 10000, 10000$ and so forth. The real world problem is more complex but thats it essentially it.
The only way I found of doing this was to do something like this (obviously not the real query)
select qty, price, salesman,
(select top 1 percentage from comissions
where comisiones.salesman = saleslines.salesman
and saleslines.qty > comisiones.qty
order by comissiones.qty desc
) percentage
from saleslines
this results in the correct commission but is horrendously heavy.
Is there a better way of doing this? I'm not looking for someone to rewrite my sql, more 'take a look as foobar queries' and I can take it from there.
The real life commission structure can be specified for different salesmen, articles and clients and even sales dates. It also changes from time to time, so everything has to be driven by the data in the tables... i.e I can't put fixed ranges in the sql. The current query returns some 3-400000 rows and takes around 20-30 secs. Luckily its only used monthly but the slowness is kinda bugging me.
This is on mssql.
Ian
edit:
I should have given a more complex example from the beginning. I realize now that my initial example is missing a few essential elements of the complexity, apologies to all.
This may better capture it
select client-code, product, product-family, qty, price, discount, salesman,
(select top 1 percentage from comissions
where comisiones.salesman = saleslines.salesman
and saleslines.qty > comisiones.qty
and [
a collection of conditions which may or may not apply:
Exclude rows if the salesman has offered discounts above max discounts
which appear in each row in the commissions table
There may be a special scale for the product family
There may be a special scale for the product
There may be a special scale for the client
A few more cases
]
order by [
The user can control the order though a table
which can prioritize by client, family or product
It normally goes from most to least specific.
]
) percentage
from saleslines
needless to say the real query is not easy to follow. Just to make life more interesting, its naming is multi language.
Thus for every row of salesline the commission can be different.
It may sound overly complex but if you think of how you would pay commission it makes sense. You don't want to pay someone for selling stuff at high discounts, you also want to be able to offer a particular client a discount on a particular product if they buy X units. The salesman should earn more if they sell more.
In all the above I'm excluding date limited special offers.
I think partitions may be the solution but I need to explore this more indepth as I know nothing about partitions. Its given me a few ideas.

If you are using a version of SQL Server that supports common-table expressions such as SQL Server 2005 and later, a more efficient solution might be:
With RankedCommissions As
(
Select SL.qty, SL.price, SL.salesman, C.percentage
, Row_Number() Over ( Partition By SL.salesman Order By C.Qty Desc ) As CommissionRank
From SalesLines As SL
Join Commissions As C
On SL.salesman = C.salesman
And SL.qty > C.qty
)
Select qtr, price, salesman, percentage
From RankedCommissions
Where CommissionRank = 1
If you needed to account for the possibility that there are no Commissions values for a given salesman where the SalesLine.Qty > Commission.Qty, then you could do something like:
With RankedCommissions As
(
Select SL.qty, SL.price, SL.salesman, C.percentage
, Row_Number() Over ( Partition By SL.salesman Order By C.Qty Desc ) As CommissionRank
From SalesLines As SL
Join Commissions As C
On SL.salesman = C.salesman
And SL.qty > C.qty
)
Select SL.qtr, SL.price, SL.salesman, RC.percentage
From SalesLines As SL
Left Join RankedCommissions As RC
On RC.salesman = SL.salesman
And RC.CommissionRank = 1

select
qty, price, salesman,
max(percentage)
from saleslines
inner join comissions on commisions.salesman = saleslines.salesman and
saleslines.qty > comissions.qty
group by
qty, price, salesman

Related

SQL joining two tables with different levels of details

So I have two tables of sales, budget and actual.
"budget" has two columns: location and sales. For example,
location sales
24 $20000
36 $100300
40 $24700
Total $145000
"actual" has three columns: invoice_number, location, and sales. For example,
invoice location sales
10000 36 $5000
10001 40 $6000
10002 99 $7000
and so forth
Total $110000
In summary, "actual" records transactions at the invoice level, whereas "budget" is done at the location level only (no individual invoices).
I'm trying to create a summary table that lists actual and budget sales side by side, grouped by location. The total of the actual column should be $110000, and $145000 for budget. This is my attempt at it (on pgAdmin/ postgresql):
SELECT actual.location, SUM(actual.sales) AS actual_sales, SUM(budget.sales) AS budget_sales
FROM actual LEFT JOIN budget
ON actual.location = budget.location
GROUP BY actual.location;
I used LEFT JOIN because "actual" has locations that "budget" doesn't have (e.g. location 99).
I ended up with some gigantic numbers ($millions) on both the actual_sales and budget_sales columns, far exceeding the total actual ($110000) or budget sales ($145,000).
Is this because the way I wrote my query is basically asking SQL to join each invoice in "actual" to each line in "budget," therefore duplicating things many times over? If so how should I have written this?
Thanks in advance!

Based on your description, you seem to have duplicates in both tables. There are various ways to solve this problem. Here is one using union all and group by:
select Location,
sum(actual_sales) as actual_sales,
sum(budget_sales) as budget_sales
from ((select a.location, a.sales as actual_sales, null as budget_sales
from actual a
) union all
(select b.location, null, b.sales
from budget b
)
) ab
group by location;
This structure guarantees that each value is counted only once, regardless of the table.

The query looks fine to me. However, it is difficult to find out why the figures are wrong. My suggestion is that you do the sum by location separately for budget and actual into 2 temporary tables, and later put them together using LEFT JOIN.

Yes, you're joining the budget in once for each actual sales row. However, your Actual Sales sum shouldn't have been larger unless there were multiple budget rows for the same location. You should check for that, because it doesn't sound like there should be.
What you need to do in a case like this is sum the actual sales first in a CTE or subquery, then later join the result to the budget. That way you only have one row for each location. This does it for the actual sales. If you really do have more than one row for a location for budget as well, you might need to subquery the budget as well the same way.
Select Act.Location, Act.actual_sales, budget.sales as budget_sales
From
(
SELECT actual.location, SUM(actual.sales) AS actual_sales
FROM actual
GROUP BY actual.location
) Act
left join budget on Act.location = budget.location

Gordon's suggestion is good, an alternative using WITH statements is:
WITH aloc AS (
SELECT location, SUM(sales) FROM actual GROUP BY 1
), bloc AS (
SELECT location, SUM(sales) FROM budget GROUP BY 1
)
SELECT location, a.sum AS actual_sales, b.sum AS budget_sales
FROM aloc a LEFT JOIN bloc b USING (location)
This is equivalent to:
SELECT location, a.sum AS actual_sales, b.sum AS budget_sales
FROM (SELECT location, SUM(sales) FROM actual GROUP BY 1) a LEFT JOIN
(SELECT location, SUM(sales) FROM budget GROUP BY 1) b USING (location)
but I find WITH statements more readable.
The purpose of the subqueries is to get tables into a state where a row means something relevant, i.e. aloc contains a row per location, and hence cause the join to evaluate to what you want.

Access 2013 SQL to perform linear interpolation where necessary

I have a database in which there are 13 different products, sold in 6 different countries.
Prices increase once a year.
Prices need to be calculated using a linear interpolation method.  I have 21 different price and quantity increments for each product for each country for each year.
The user needs to be able to see how much an order would cost for any given value (as you would expect).
What the database needs to do (in English!) is to:
If there is a matching quantity from TblOrderDetail in the TblPrices,
use the price for the current product, country and year
if there isn't a matching quantity but the quantity required is greater than 1000 for one product (GT) and greater than 100 for every other product:
Find the highest quantity for the product, country and year (so, 1000 or 100, depending on the product), and calculate a pro-rated price.  eg.  If someone wanted 1500 of product GT for the UK for 2015, we'd look at the price for 1000 GT in the UK for 2015 and multiply it by 1.5.  If 1800 were required, we'd multiply it by 1.8.  I haven't been able to get this working yet as I'm looking at it alongside the formula for the next possibility...
If there isn't a matching quantity and the quantity required is less than 1000 for the product GT but 100 for the other products (this is the norm)...
Find the quantity and price for the increment directly below the quantity required by the user for the required product, country and year (let's call these quantitybelow and pricebelow)
Find the quantity and price for the increment directly above the quantity required by the user for the required product, country and year (let's call these quantityabove and priceabove)
Calculate the price for the required number of products for an account holder in a particular country for a given year using this formula.
ActualPrice: PriceBelow + ((PriceAbove - PriceBelow) * (The quantity required in the order detail - QuantityBelow) / (QuantityAbove - QuantityBelow))
I have spent days on this and have sought advice about this before but I am still getting very stuck.
The tables I've been working with to try and make this work are as follows:
TblAccount (primary key is AccountID, it also has a Country field which joins to the TblCountry.Code (primary key)
TblOrders (primary key is Order ID) which joins to TblAccount via the AccountID field; TblOrderDetail via the OrderID.  This table also holds the OrderDate and Recipient ID which links to a person in TblContact - I don't need that here but will need it later to generate an invoice 
TblOrderDetail (primary key is DetailID) which joins to TblOrders via OrderID field; TblProducts via ProductID field, and holds the Quantity required as well as the product
TblProducts (primary key is ProductCode) which as well as joining to TblOrderDetail, also joins to TblPrice via the Product field
TblPrices links to the TblProducts (as you have just read).  I've also created an Alias for the TblCountry (CountryAliasForProductCode) so I can link it to the TblPrices to show the country link. I'm not sure if I needed to do this - it doesn't work if I do or I don't do it, so I seek guidance again here.
This is the code I've been trying to use (and failing) to get my price and quantity steps above and I hope to replicate it, making a couple of tweaks to get the steps below:
SELECT MIN(TblPrices.stepquantity) AS QuantityAbove, MIN(TblPrices.StepPrice) AS PriceAbove, TblOrders.OrderID, TblOrders.OldOrderID, TblOrders.AccountID, TblOrders.OrderDate, TblOrders.RecipientID, TblOrders.OrderStatus, TblOrderDetail.DetailID, TblOrderDetail.Product, TblOrderDetail.Quantity
FROM (TblCountry INNER JOIN ((TblAccount INNER JOIN TblOrders ON TblAccount.AccountID = TblOrders.AccountID) INNER JOIN (TblOrderDetail INNER JOIN TblProducts ON TblOrderDetail.Product = TblProducts.ProductCode) ON TblOrders.OrderID = TblOrderDetail.OrderID) ON TblCountry.Code = TblAccount.Country) INNER JOIN (TblCountry AS CountryAliasForProduct INNER JOIN TblPrices ON CountryAliasForProduct.Code = TblPrices.CountryCode) ON TblProducts.ProductCode = TblPrices.Product
WHERE (StepQuantity >= TblOrderDetails.Quantity)
AND (TblPrices.CountryCode = TblAccount.Country)
AND (TblOrderDetail.Product = TblPrices.Product)
AND (DATEPART('yyyy', TblPrices.DateEffective) = DATEPART('yyyy', TblOrders.OrderDate));
I've also tried...
I've even tried going back to basics and trying again to generate the steps below in 1 query, then try the steps above in another and finally, create the final calculation in another query.
This is what I have been trying to get my prices and quantities below:
SELECT Max(StepQuantity) AS quantity_below, Max(StepPrice) AS price_below, TblOrderDetails.Quantity, TblAccounts.Country
FROM 
(TblProducts INNER JOIN TblPrices ON TblProducts.ProductCode = TblPrices.Product)
(TblOrderDetail INNER JOIN TblProducts ON TblOrderDetail.Product = TblProducts.ProductCode)
(TblOrders INNER JOIN TblOrderDetail ON TblOrders.OrderID = TblOrderDetail.OrderID)
(TblAccount INNER JOIN TblOrders ON TblAccount.AccountID = TblOrders.AccountID),
WHERE (((TblPrices.StepQuantity)<=(TblOrderDetail.Quantity)) AND ((TblPrices.CountryCode)=([TblAccounts].[country])) AND ((TblPrices.Product)=([TblOrderDetail].[product])) AND ((DatePart('yyyy',[TblPrices].[DateApplicable]))=(DatePart('yyyy',[TblOrders].[OrderDate]))));
You may be able to see glaring errors in this but I'm afraid I can't.  I've tried re-jigging it and I'm getting nowhere.
I need to be able to tie the information in to the OrderDetail records as the price generated will need to be added to a financial transactions table as a debit amount and will show as an amount owing on statements.
I'm really not very good at SQL.  I've read and worked though several self-study books and I have asked part of this question before; but I really am struggling with it.  If anyone has any ideas on how to proceed, or even where I've gone wrong with my code, I'd be delighted, even if you tell me I shouldn't be using SQL. For the record, I originally posted this question on a different forum under Visual Basic. Responses from that forum brought me to SQL - however, anything that works would be good!
I've even tried, using Excel, concatenating the Year&Product&Country&Quantity to get a unique product code, interpolating the prices for every quantity between 1 and 1000 for each product, country and year and bringing them into a TblProductsAndPrices table. In Access, I created a query to concatenate the Year(of order date from tblOrders)&Product(of tblorderdetails)&Country(of tblAccount) in order to get the required product code for the order. Another query would find a price for me. However, any product code that doesn't appear on the list (such as where a quantity isn't listed in the tblProductsAndPrices as it is larger than the highest price increment) doesn't have a price.
If there was a workable solution to what I've just described that would generate a price for everything, then I'd be so pleased.
I'd really like to be able to generate an order for any quantity of any product for any account based in any country on any date and retrieve a price which will be used to "debit" a financial account in the database, who in a transaction history for an account and appear on statements. I'd also like to be able to do an ad-hoc price check on the spot.
Thank you very much for taking the time to read this.  I really appreciate it. If you could offer any help or words of encouragement, I'd be very grateful.
Many thanks
Karen

Maybe no one thinks on an easy solution to the problem, since not all minds work in database thinking.
Easy solution: Create one view that gives all calculated values, not only the final one you need, each one as a column. Then you can use such view in a relation view and use on some rows one of the values and on other rows other values, etc.
How to think is simple, think in reverse order, instead of thinking "if that then I need to calculate such else I need this other", think as "I need "such" and I need "this other", both are columns of an intermediate view, then think on top level "if" that would be another view, such view will select the correct value ignoring the rest.
Never ever try to solve all in one step, that can be a really big headache.
Pros: You can isolate calculated values (needed or not), sql is much more easy to write and maintain.
Cons: Resources use is bigger than minimal, but most of times that extra calculated values does not represent a really big impact.
In terms of tutorial out there: Instead of a Top-Down method, use a Down-Top method.
Sometimes it is better (with your example) to calculate all three values (you write sentences on bold) ignoring the if part, and have all three possible values for your order and after that discard the ones not wanted, than trying to only calculate one.
Trying to calculate only one is thinking as a procedural programming, when working with databases most times one must get rid of such thinking and think as reverse, first do the most internal part of such procedural programming to have all data collected, then do the external selection of the procedural programing.
Note: If one of the values can not be calculated, just generate a Null.
I know it is hard to think on First in, last out (Down-Top) model, but it is great for things as the one you want.
Step1 (on specific view, or a join from one view per calculation):
Calculate column 1 as price for the current product, country and
year
Calculate column 2 as calculate a pro-rated price as if 1000
Calculate column 3 as calculate a pro-rated price as if 100
Calculate column 4 as etc
Calculate column N as etc
Step 2 (Another view, the one you want):
Calculate the if part, so you can choose adequate column from previous view (you can use immediately if or a calculated auxiliary field).
Hope you can follow theese way of thinking, I have solved a lot of things like that one (and more complex) thinking in that way, but it is not easy to think as that, needs an extra effort.

Your Query does not include the specified expression, how to fix it?

I don't understand why my sql is not running,
it pop out a window say
"Your query does not include the specified expression ' SUM(SaleRecord.Number)*(product.Price' as part of an aggregate function"
SELECT SUM(SaleRecord.Number)*(Product.Price) AS TotalIncome
FROM Product, SaleRecord
WHERE Product.ProductID=SaleRecord.SaleProduct;

Product.Price is not part of the aggregate. Presumably, you intend:
SELECT SUM(SaleRecord.Number * Product.Price) AS TotalIncome
FROM Product INNER JOIN
SaleRecord
ON Product.ProductID=SaleRecord.SaleProduct;
Note that I also fixed the archaic join syntax.

You asked in my previous answer:
"thank you, I just make some mistake, now it is working. And sorry to
bother you more, I want to select the product who sell the most out,
how can I do it, I try to add MAX(xxx) on it, and it don't work"
Now, I am by no means an expert, but there are two processes going on. Your language is confusing so I'm going to assume you want to know which product sells the most in $$ terms (rather than count. For example, you might sell 1,000 $0.50 products, equallying $500 total sales, or 10 $500 products, totallying $5000. If you want the count or the dollar value, then the method changes slightly).
So the first process is to get the total sales of each product, which I outlined above. Then you want to nest that inside a second query, where you then select the max. I'll give you the code and then explain it:
SELECT ProductID, MAX(TotalSale)
FROM (
SELECT P.ProductID, SUM(S.Number)*P.Price AS TotalSale
FROM Products as P, SaleRecords as S
WHERE product.Productid = SaleRecord.SaleProduct
GROUP BY Product.ProductID
)
It's easiest to imagine this as querying a query. Your first query is in the FROM() statement. That will run and give you the output of total sale per product. Then the second query is ran (the top most SELECT line) that selects the productID and the sale amount that is the largest among all the products.
Your teacher may not like this since nesting queries is a little advanced (though completely intuitive IMO). Hopefully this helps!

You brackets are wrong - for each row you want to multiply the price by the number, and only then sum them:
SELECT SUM(SaleRecord.Number * Product.Price) AS TotalIncome
FROM Product, SaleRecord
WHERE Product.ProductID = SaleRecord.SaleProduct;

You have a bracket error:
SELECT SUM(SaleRecord.Number * Product.Price) AS TotalIncome
FROM Product INNER JOIN
SaleRecord ON Product.ProductID = SaleRecord.SaleProduct;

This is because you're not indicating which column to group by. The line you wrote is:
SUM(SaleRecord.Number) * Product.Price
Which sums all of the sale quantities (regardless of differences in product ID) and multiplies it by the price right? Well what if you have multiple products with different prices? Basically, you are doing a one to many match, where you have a total that is the sum of all the sales, multiplied by multiple prices. What you need is a group by command. I would modify your code to say:
SELECT Product.ProductID, SUM(SaleRecord.Number)*Product.Price AS TotalSales
FROM Product, SaleRecord
WHERE product.Productid = SaleRecord.SaleProduct
GROUP BY Product.ProductID
That should take care of it, telling the dbms to group each product together, sum the number of sales and then multiply by the price of that product.
You can nest that inside another query to get total Income:
SELECT SUM(TotalIncome)
FROM ( **the above code here)
EDIT: Or you can do it like the ways listed above where your query creates a TotalIncome for each ORDER, and then sums them all together. my way creates a total sale for each PRODUCT and then sums all the products

Calculate First Time Buyer and Repeating Buyers using MAQL Queries in GOOD-DATA platform

I have been recently working with GOOD-DATA platform. I don't have that much experience in MAQL, but I am working on it. I did some metric and reports in GOOD-DATA platform. Recently I tried to create a metric for calculating Total Buyers,First Time Buyers and Repeating Buyers. I created these three reports and working perfect.But when i try to add a order date parent filter the first time buyers and repeating buyers value getting wrong. please have Look at Following queries.
I can find out the correct values using sql queries.
MAQL Queries:
TOTAL ORDERS-
SELECT COUNT(NexternalOrderNo) BY CustomerNo WITHOUT PF
TOTAL FIRSTTIMEBUYERS-
SELECT COUNT(CustomerNo) WHERE (TOTAL ORDER WO PF=1) WITHOUT PF
TOTAL REPEATINGBUYERS-
SELECT COUNT(CustomerNo) WHERE (TOTAL ORDER WO PF>1) WITHOUT PF
Can any one suggest a logic for finding these values using MAQL

It's not clear what you want to do. If you could provide more details about the report you need to get, it would be great.
It's not necessary to put "without pf" into the metrics. This clause bans filter application, so when you remove it, the parent filter will be used there. And you will probably get what you want. Specifically, modify this:
SELECT COUNT(CustomerNo) WHERE (TOTAL ORDER WO PF>1) WITHOUT PF
to:
SELECT COUNT(CustomerNo) WHERE (TOTAL ORDER WO PF>1)

The only thing you miss here is "ALL IN ALL OTHER DIMENSIONS" aka "ALL OTHER".
This keyword locks and overrides all attributes in all other dimensions—keeping them from having any affect on the metric. You can read about it more in MAQL Reference Guide.
FIRSTTIMEBUYERS:
SELECT COUNT(CustomerNo)
WHERE (SELECT IFNULL(COUNT(NexternalOrderNo), 0) BY Customer ID, ALL OTHER) = 1
REPEATINGBUYERS:
SELECT COUNT(CustomerNo)
WHERE (SELECT IFNULL(COUNT(NexternalOrderNo), 0) BY Customer ID, ALL OTHER) > 1

Finding drop off rate from membership table in SQL Server 2005

We have a view that stores the history of membership plans held by our members and we have been running a half price direct debit offer for some time. We've been asked to report on whether people are allowing the direct debit to renew (at full price) but I'm no SQL expert!
The view in effect is
memberRef, historyRef, validFrom, validTo,MembershipType,PaymentType,totalAmount
Here
memberRef identifies the person (int)
historyRef identifies this row (int)
validFrom and validTo are the start and end of the plan (datetime)
MembershipType is the type of plan (int)
PaymentType is direct debit or credit card (a string - DD or EFT)
totalAmount is the price of the plan (decimal)
I'm wondering if there is a query as opposed to a cursor I can use to count the number of policies which are at half price and have another direct debit policy that follows on from it.
If we can also capture if that person first joined at half price or if there was a gap where membership had lapsed before they took the half price incentive that would be great.
Thanks in advance for any help!
For example
select count(MemberRef), max(vhOuter.validFrom) "most recent plan start",
(select top(1) vh2.validFrom
from v_Membershiphistory vh2
where (vh2.totalamount = 14.97 or vh2.totalamount = 25.50)
and vh2.memberref = vhOuter.memberref
order by createdat desc
) "half price plan start"
from v_membershiphistory vhOuter
where vhOuter.memberref in (select vh1.memberref from v_membershiphistory vh1 where vh1.totalamount = 14.97 or vh1.totalamount = 25.50)--have taken up offer
group by memberref
having max(vhOuter.validFrom) > (select top(1) vh2.validFrom
from v_Membershiphistory vh2
where (vh2.totalamount = 14.97 or vh2.totalamount = 25.50)
and vh2.memberref = vhOuter.memberref
order by createdat desc
)
This will display the members who have a half price plan and have a valid from date that is greater than the valid from date of that plan.
Not quite right as we should be testing that it is the same plan but...
if I change the select here to just count(memberRef) I get the count of memberRef for the member I'm grouping for each member I'm grouping i.e. for 5220 results I'd get 5220 rows returned each with in effect the number of plans I've selected
But I need to count the number of people taking the offer and proportion that renew. Also that renewal rate in the population that aren't taking the offer (which I'm guessing is a trivial change once I've got one set sorted)
I suppose I'm looking at how one operates on the set but compares multiple rows for each distinct person without using a cursor. But I might be wrong :)

try something like:
SELECT
a.*, b.*
FROM YourTable a
INNER JOIN YourTable b On a.memberRef=b.memberRef and a.validToDate<b.validFromDate
WHERE b.PaymentType='?direct debit?' and a.Cost='?half price?'
to get just counts use something like:
SELECT
COUNT(a.memberRef) AS TotalCount
FROM YourTable a
INNER JOIN YourTable b On a.memberRef=b.memberRef and a.validToDate<b.validFromDate
WHERE b.PaymentType='?direct debit?' and a.Cost='?half price?'

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas