Using distinct and then an aggregate function in Postgresql? - sql

This is a pretty basic problem and for whatever reason I can't find a reasonable solution. I'll do my best to explain.
Say you have an event ticket (section, row, seat #). Each ticket belongs to an attendee. Multiple tickets can belong to the same attendee. Each attendee has a worth (ex: Attendee #1 is worth $10,000). That said, here's what I want to do:
1. Group the tickets by their section
2. Get number of tickets (count)
3. Get total worth of the attendees in that section
Here's where I'm having problems: If Attendee #1 is worth $10,000 and is using 4 tickets, sum(attendees.worth) is returning $40,000. Which is not accurate. The worth should be $10,000. Yet when I make the result distinct on the attendee, the count is not accurate. In an ideal world it'd be nice to do something like
select
tickets.section,
count(tickets.*) as count,
sum(DISTINCT ON (attendees.id) attendees.worth) as total_worth
from
tickets
INNER JOIN
attendees ON attendees.id = tickets.attendee_id
GROUP BY tickets.section
Obviously this query doesn't work. How can I accomplish this same thing in a single query? OR is it even possible? I'd prefer to stay away from sub queries too because this is part of a much larger solution where I would need to do this across multiple tables.
Also, the worth should follow the ticket divided evenly. Ex: $10,000 / 4. Each ticket has an attendee worth of $5,000. So if the tickets are in different sections, they take their prorated worth with them.
Thanks for your help.

You need to aggregate the tickets before the attendees:
select ta.section, sum(ta.numtickets) as count, sum(a.worth) as total_worth
from (select attendee_id, section, count(*) as numtickets
from tickets
group by attendee_id, section
) ta INNER JOIN
attendees a
ON a.id = ta.attendee_id
GROUP BY ta.section
You still have a problem of a single attendee having seats in multiple sections. However, you do not specify how to solve that (apportion the worth? randomly choose one section? attribute it to all sections? canonically choose a section?)

Using jsonb_object_agg:
select
tickets.section,
count(tickets.*) as count,
(
SELECT SUM(value::int4)
FROM jsonb_each_text(jsonb_object_agg(attendees.id, attendees.worth))
) as total_worth
from
tickets
INNER JOIN
attendees ON attendees.id = tickets.attendee_id
GROUP BY tickets.section

Related

How to find the sum of a column generated from a query

So, I'm im doing a ticket style database and i'm trying to find the sum of the tickets generated from this. I have attached my query below and the result of it
QUERY RESULT
SELECT Price, TicketID, Ticket.TicketType
FROM Price, Ticket
WHERE Price.TicketType = Ticket.TicketType
;
In the picture, I am trying to get a total for all of the prices that you see in there, each ticketID is a new ticket sold. I can click the totals button in access and it puts a total below it, but I need the total done in the query, so I am wondering how I could do this.
First, learn to use proper join syntax. Commmas in the FROM clause are simply an archaic way to represent a join.
From what you say, you want sum():
SELECT SUM(Price)
FROM Price JOIN
Ticket
ON Price.TicketType = Ticket.TicketType;
This may or may not do exactly what you want. It answers the question quite specifically.
Use the SUM function:
http://www.w3schools.com/sql/sql_func_sum.asp
I think this can help

Access 2013 SQL to perform linear interpolation where necessary

I have a database in which there are 13 different products, sold in 6 different countries.
Prices increase once a year.
Prices need to be calculated using a linear interpolation method.  I have 21 different price and quantity increments for each product for each country for each year.
The user needs to be able to see how much an order would cost for any given value (as you would expect).
What the database needs to do (in English!) is to:
If there is a matching quantity from TblOrderDetail in the TblPrices,
use the price for the current product, country and year
if there isn't a matching quantity but the quantity required is greater than 1000 for one product (GT) and greater than 100 for every other product:
Find the highest quantity for the product, country and year (so, 1000 or 100, depending on the product), and calculate a pro-rated price.  eg.  If someone wanted 1500 of product GT for the UK for 2015, we'd look at the price for 1000 GT in the UK for 2015 and multiply it by 1.5.  If 1800 were required, we'd multiply it by 1.8.  I haven't been able to get this working yet as I'm looking at it alongside the formula for the next possibility...
If there isn't a matching quantity and the quantity required is less than 1000 for the product GT but 100 for the other products (this is the norm)...
Find the quantity and price for the increment directly below the quantity required by the user for the required product, country and year (let's call these quantitybelow and pricebelow)
Find the quantity and price for the increment directly above the quantity required by the user for the required product, country and year (let's call these quantityabove and priceabove)
Calculate the price for the required number of products for an account holder in a particular country for a given year using this formula.
ActualPrice: PriceBelow + ((PriceAbove - PriceBelow) * (The quantity required in the order detail - QuantityBelow) / (QuantityAbove - QuantityBelow))
I have spent days on this and have sought advice about this before but I am still getting very stuck.
The tables I've been working with to try and make this work are as follows:
TblAccount (primary key is AccountID, it also has a Country field which joins to the TblCountry.Code (primary key)
TblOrders (primary key is Order ID) which joins to TblAccount via the AccountID field; TblOrderDetail via the OrderID.  This table also holds the OrderDate and Recipient ID which links to a person in TblContact - I don't need that here but will need it later to generate an invoice 
TblOrderDetail (primary key is DetailID) which joins to TblOrders via OrderID field; TblProducts via ProductID field, and holds the Quantity required as well as the product
TblProducts (primary key is ProductCode) which as well as joining to TblOrderDetail, also joins to TblPrice via the Product field
TblPrices links to the TblProducts (as you have just read).  I've also created an Alias for the TblCountry (CountryAliasForProductCode) so I can link it to the TblPrices to show the country link. I'm not sure if I needed to do this - it doesn't work if I do or I don't do it, so I seek guidance again here.
This is the code I've been trying to use (and failing) to get my price and quantity steps above and I hope to replicate it, making a couple of tweaks to get the steps below:
SELECT MIN(TblPrices.stepquantity) AS QuantityAbove, MIN(TblPrices.StepPrice) AS PriceAbove, TblOrders.OrderID, TblOrders.OldOrderID, TblOrders.AccountID, TblOrders.OrderDate, TblOrders.RecipientID, TblOrders.OrderStatus, TblOrderDetail.DetailID, TblOrderDetail.Product, TblOrderDetail.Quantity
FROM (TblCountry INNER JOIN ((TblAccount INNER JOIN TblOrders ON TblAccount.AccountID = TblOrders.AccountID) INNER JOIN (TblOrderDetail INNER JOIN TblProducts ON TblOrderDetail.Product = TblProducts.ProductCode) ON TblOrders.OrderID = TblOrderDetail.OrderID) ON TblCountry.Code = TblAccount.Country) INNER JOIN (TblCountry AS CountryAliasForProduct INNER JOIN TblPrices ON CountryAliasForProduct.Code = TblPrices.CountryCode) ON TblProducts.ProductCode = TblPrices.Product
WHERE (StepQuantity >= TblOrderDetails.Quantity)
AND (TblPrices.CountryCode = TblAccount.Country)
AND (TblOrderDetail.Product = TblPrices.Product)
AND (DATEPART('yyyy', TblPrices.DateEffective) = DATEPART('yyyy', TblOrders.OrderDate));
I've also tried...
I've even tried going back to basics and trying again to generate the steps below in 1 query, then try the steps above in another and finally, create the final calculation in another query.
This is what I have been trying to get my prices and quantities below:
SELECT Max(StepQuantity) AS quantity_below, Max(StepPrice) AS price_below, TblOrderDetails.Quantity, TblAccounts.Country
FROM 
(TblProducts INNER JOIN TblPrices ON TblProducts.ProductCode = TblPrices.Product)
(TblOrderDetail INNER JOIN TblProducts ON TblOrderDetail.Product = TblProducts.ProductCode)
(TblOrders INNER JOIN TblOrderDetail ON TblOrders.OrderID = TblOrderDetail.OrderID)
(TblAccount INNER JOIN TblOrders ON TblAccount.AccountID = TblOrders.AccountID),
WHERE (((TblPrices.StepQuantity)<=(TblOrderDetail.Quantity)) AND ((TblPrices.CountryCode)=([TblAccounts].[country])) AND ((TblPrices.Product)=([TblOrderDetail].[product])) AND ((DatePart('yyyy',[TblPrices].[DateApplicable]))=(DatePart('yyyy',[TblOrders].[OrderDate]))));
You may be able to see glaring errors in this but I'm afraid I can't.  I've tried re-jigging it and I'm getting nowhere.
I need to be able to tie the information in to the OrderDetail records as the price generated will need to be added to a financial transactions table as a debit amount and will show as an amount owing on statements.
I'm really not very good at SQL.  I've read and worked though several self-study books and I have asked part of this question before; but I really am struggling with it.  If anyone has any ideas on how to proceed, or even where I've gone wrong with my code, I'd be delighted, even if you tell me I shouldn't be using SQL. For the record, I originally posted this question on a different forum under Visual Basic. Responses from that forum brought me to SQL - however, anything that works would be good!
I've even tried, using Excel, concatenating the Year&Product&Country&Quantity to get a unique product code, interpolating the prices for every quantity between 1 and 1000 for each product, country and year and bringing them into a TblProductsAndPrices table. In Access, I created a query to concatenate the Year(of order date from tblOrders)&Product(of tblorderdetails)&Country(of tblAccount) in order to get the required product code for the order. Another query would find a price for me. However, any product code that doesn't appear on the list (such as where a quantity isn't listed in the tblProductsAndPrices as it is larger than the highest price increment) doesn't have a price.
If there was a workable solution to what I've just described that would generate a price for everything, then I'd be so pleased.
I'd really like to be able to generate an order for any quantity of any product for any account based in any country on any date and retrieve a price which will be used to "debit" a financial account in the database, who in a transaction history for an account and appear on statements. I'd also like to be able to do an ad-hoc price check on the spot.
Thank you very much for taking the time to read this.  I really appreciate it. If you could offer any help or words of encouragement, I'd be very grateful.
Many thanks
Karen
Maybe no one thinks on an easy solution to the problem, since not all minds work in database thinking.
Easy solution: Create one view that gives all calculated values, not only the final one you need, each one as a column. Then you can use such view in a relation view and use on some rows one of the values and on other rows other values, etc.
How to think is simple, think in reverse order, instead of thinking "if that then I need to calculate such else I need this other", think as "I need "such" and I need "this other", both are columns of an intermediate view, then think on top level "if" that would be another view, such view will select the correct value ignoring the rest.
Never ever try to solve all in one step, that can be a really big headache.
Pros: You can isolate calculated values (needed or not), sql is much more easy to write and maintain.
Cons: Resources use is bigger than minimal, but most of times that extra calculated values does not represent a really big impact.
In terms of tutorial out there: Instead of a Top-Down method, use a Down-Top method.
Sometimes it is better (with your example) to calculate all three values (you write sentences on bold) ignoring the if part, and have all three possible values for your order and after that discard the ones not wanted, than trying to only calculate one.
Trying to calculate only one is thinking as a procedural programming, when working with databases most times one must get rid of such thinking and think as reverse, first do the most internal part of such procedural programming to have all data collected, then do the external selection of the procedural programing.
Note: If one of the values can not be calculated, just generate a Null.
I know it is hard to think on First in, last out (Down-Top) model, but it is great for things as the one you want.
Step1 (on specific view, or a join from one view per calculation):
Calculate column 1 as price for the current product, country and
year
Calculate column 2 as calculate a pro-rated price as if 1000
Calculate column 3 as calculate a pro-rated price as if 100
Calculate column 4 as etc
Calculate column N as etc
Step 2 (Another view, the one you want):
Calculate the if part, so you can choose adequate column from previous view (you can use immediately if or a calculated auxiliary field).
Hope you can follow theese way of thinking, I have solved a lot of things like that one (and more complex) thinking in that way, but it is not easy to think as that, needs an extra effort.

Calculate First Time Buyer and Repeating Buyers using MAQL Queries in GOOD-DATA platform

I have been recently working with GOOD-DATA platform. I don't have that much experience in MAQL, but I am working on it. I did some metric and reports in GOOD-DATA platform. Recently I tried to create a metric for calculating Total Buyers,First Time Buyers and Repeating Buyers. I created these three reports and working perfect.But when i try to add a order date parent filter the first time buyers and repeating buyers value getting wrong. please have Look at Following queries.
I can find out the correct values using sql queries.
MAQL Queries:
TOTAL ORDERS-
SELECT COUNT(NexternalOrderNo) BY CustomerNo WITHOUT PF
TOTAL FIRSTTIMEBUYERS-
SELECT COUNT(CustomerNo) WHERE (TOTAL ORDER WO PF=1) WITHOUT PF
TOTAL REPEATINGBUYERS-
SELECT COUNT(CustomerNo) WHERE (TOTAL ORDER WO PF>1) WITHOUT PF
Can any one suggest a logic for finding these values using MAQL
It's not clear what you want to do. If you could provide more details about the report you need to get, it would be great.
It's not necessary to put "without pf" into the metrics. This clause bans filter application, so when you remove it, the parent filter will be used there. And you will probably get what you want. Specifically, modify this:
SELECT COUNT(CustomerNo) WHERE (TOTAL ORDER WO PF>1) WITHOUT PF
to:
SELECT COUNT(CustomerNo) WHERE (TOTAL ORDER WO PF>1)
The only thing you miss here is "ALL IN ALL OTHER DIMENSIONS" aka "ALL OTHER".
This keyword locks and overrides all attributes in all other dimensions—keeping them from having any affect on the metric. You can read about it more in MAQL Reference Guide.
FIRSTTIMEBUYERS:
SELECT COUNT(CustomerNo)
WHERE (SELECT IFNULL(COUNT(NexternalOrderNo), 0) BY Customer ID, ALL OTHER) = 1
REPEATINGBUYERS:
SELECT COUNT(CustomerNo)
WHERE (SELECT IFNULL(COUNT(NexternalOrderNo), 0) BY Customer ID, ALL OTHER) > 1

sql SUM value incorrect when using joins and group by

Im writing a query that sums order values broken down by product groups - problem is that when I add joins the aggregated SUM gets greatly inflated - I assume its because its adding in duplicate rows. Im kinda new to SQL, but I think its because I need to construct the query with sub selects or nested joins?
All data returns as expected, and my joins pull out the needed data, but the SUM(inv.item_total) AS Value returned is much higher that it should be - SQL below
SELECT so.Company_id, SUM(inv.item_total) AS Value, co.company_name,
agents.short_desc, stock_type.short_desc AS Type
FROM SORDER as so
JOIN company AS co ON co.company_id = so.company_id
JOIN invoice AS inv ON inv.Sorder_id = so.Sorder_id
JOIN sorder_item AS soitem ON soitem.sorder_id = so.Sorder_id
JOIN STOCK AS stock ON stock.stock_id = soitem.stock_id
JOIN stock_type AS stock_type ON stock_type.stype_id = stock.stype_id
JOIN AGENTS AS AGENTS ON agents.agent_id = co.agent_id
WHERE
co.last_ordered >'01-JAN-2012' and so.Sotype_id='1'
GROUP BY so.Company_id,co.company_name,agents.short_desc, stock_type.short_desc
Any guidence on how I should structure this query to pull out an "un-duplicated" SUM(inv.item_total) AS Value much appreciated.
To get an accurate sum, you want only the joins that are needed. So, this version should work:
SELECT so.Company_id, SUM(inv.item_total) AS Value, co.company_name
FROM SORDER so JOIN
company co
ON co.company_id = so.company_id JOIN
invoice inv
ON inv.Sorder_id = so.Sorder_id
group by so.Company_id, co.company_name
You can then add in one join at a time to see where the multiplication is taking place. I'm guessing it has to do with the agents.
It sounds like the joins are not accurate.
First suspect join
For example, would an agent be per company, or per invoice?
If it is per order, then should the join be something along the lines of
JOIN AGENTS AS AGENTS ON agents.agent_id = inv.agent_id
Second suspect join
Can one order have many items, and many invoices at the same time? That can cause problems as well. Say an order has 3 items and 3 invoices were sent out. According to your joins, the same item will show up 3 times means a total of 9 line items where there should be only 3. You may need to eliminate the invoices table
Possible way to solve this on your own:
I would remove all the grouping and sums, and see if you can filter by one invoice produce an unique set of rows for all the data.
Start with an invoice that has just one item and inspect your result set for accuracy. If that works, then add another invoice that has multiple and check the rows to see if you get your perfect dataset back. If not, then the columns that have repeating values (Company Name, Item Name, Agent Name, etc) are usually a good starting point for checking up on why the duplicates are showing up.

minimizing runtime of calculating several ranks (based on grades) on large sql tables, how short can it get (ms access)

I've been stuck with the rather famous problem of ranking students by grade for a couple weeks now, and while I've learned much, I still haven't solved my problem (the ranks are generated, but the process is too slow):
I have a large table (320,000 rows) that contains the student codes (serves as an identifier, instead of their names), the students classroom, the test , the tests date, the subject, the question number and the students grade on that question. This table is the base for everything else that is calculated and its size makes all these calculations very very slow, to the point where I find me almost breaking everything here at work.
First, some intel on the school (very little info, required to understand the problem)
Here at the school we have weekly tests over several subjects. The school is also separated in classrooms with different purposes (one is focused on math, physics and chemistry, another one is focused on biology, and the last one focuses on history, Portuguese and geography). But they all do the same tests every week.
What we want to do is calculate the standard deviation for each question for everyone in the school (not per-classroom) and the average grade per question (also for everyone in the school), and then generate the following ranks (all of them per date):
-Rank per subject per classroom (with "raw" grades), Rank per subject considering the whole school (with "raw" grades) and Rank per subject considering the whole school (using normalized grades, with the standard deviation per question and the average grade per question information)
-The same ranks that were mentioned above, but not per Subject, considering instead all subjects
As you can see, after calculating the average grades and the standard deviations, we still need to calculate the sums of the grades on each question, and rank according to these sums (the actual subject/test grades). I've attacked this problem in a few ways:
1)
Created two tables, one with the grades per student per subject (fields: Students code, Students classroom, Date of test, Subject, Grade, Normalized Grade, Rank in Classroom, Rank in School, Rank in School using normalized grades) and another with the grades per student per test (all subjects taken into account, fields: Students code, Students classroom, Date of test, Grade, Normalized Grade, Rank in Classroom, Rank in School, Rank in School using normalized grades).
The insertion of data in these tables takes about 50 seconds
Then, I tried using SQL to rank, however, I ran into some problems:
-Access has no ROW_NUMBER or RANK functions, and thus I have to use queries with COUNT, like (below is just a simplified version):
SELECT 1+(SELECT Count(*) FROM grades_table_per_subject t2 WHERE
t2.Grade > t1.Grade AND t1.Date=t2.Date AND t1.Subject=t2.Subject) AS [Global Rank],
1+(SELECT Count(*) FROM grades_table_per_subject t3 WHERE t3.Grade > t1.Grade AND
t3.Date=t1.Date AND t3.Subject=t1.Subject AND t3.Classroom=t1.Classroom) AS
[Rank in classroom] FROM grades_table_per_subject;
There still is the rank with the normalized grades in the query above, but I omitted it.
The table grades_table_per_subject has about 45,000 lines and this query takes more than 15 minutes here, even with indexing (tried many different index combinations, even some odd ones when I saw that the ones that should work didn't).
I also tried to ORDER BY Count() DESC the inner selects, but I hit ctrl+break after 7 minutes and no results.
2)
Added the following fields to the tables above:
Rank in Classroom, Rank in School, Rank in School using normalized grades
Then I tried using VBA with DAO and manually update the Rank fields, running the following code (simplified version):
Set rs = CurrentDb.OpenRecordset("SELECT Classroom, Date, Subject, Grade, [Rank in classroom] FROM
grades_table_per_subject ORDER BY Date, Classroom, Subject, Grade DESC;", dbOpenDynaset)
...
...
rs.movefirst
i=1
While Not rs.eof
'Verifies if there was a change on either one of Subject, Classroom, Date and if so:
...
i = 1
...
rs.Edit
rs![Rank in classroom]=i
rs.Update
i = i + 1
rs.movenext
Wend
rs.close
This obviously builds only one of the ranks (in this case per subject per classroom), and it takes alone 3min 10sec.
I verified that it takes so long due to the writes on the table (rs.Edit and rs.Update are the culprits, commenting them makes the whole thing run in only 4 seconds), but I need the ranks written to the table to generate an access report later.
FINALLY:
I could generate all the ranks once and make ways for the users to access all the data very quickly, but the idea is that everything should be calculated on-the-fly. The times we have achieved, however, make this impossible.
Overall, the question to be asked is the following:
-Is there a way to calculate the ranks shown above through an Access Query under 10 seconds, or to use VBA and calculate-insert these ranks to the table in a similar time considering the size of the tables used here?
Also, I would love to see a list of efficient ranking algorithms, so that even if I can't do everything quickly, I can improve it as much as possible.
I could generate all the ranks once and make ways for the users to access all the data very quickly, but the idea is that everything should be calculated on-the-fly.
Why?
Why bother regenerating the same data over and over? It's most likely preferable to generate these statistics when the data changes and just look them up every other time. Redoing work you've already done whenever somebody wants to check something is just silly.
I just saw you say ms access only
so ignore this answer -- or consider moving to a real DB if you want to be able to do this type of power processing.
original answer below
I don't have access to your test data, but how fast does this run?
SELECT RANK () OVER (PARTITION BY [Date],[Subject] ORDER BY Grade) AS [Global Rank],
RANK () OVER (PARTITION BY [Date],[Subject], Classroom ORDER BY Grade) AS [Rank in classroom]
FROM grades_table_per_subject
My guess is you are not going to be able to beat SQL Servers ranking speed in VBA, if this is not fast enough then you need to look in the profiler and see what indexes it suggests you make.