I have a question about averaging. Suppose I have 5 transactions, each with multiple items, and each item has its own Quantity value. I want to find the average quantity per transaction. Note that in my ERD design there are two separate tables, HeaderTransaction and TransactionDetail.
If I use the AVG() function, the result is not what I want. For example:
first transaction:
5 eggs
2 sausages
Second transaction :
3 eggs
10 sausages.
AVG will compute (5+2+3+10)/4; what I want is ((5+2)+(3+10))/2.
My current solution is:
SELECT SUM(ItemQuantity) / COUNT(DISTINCT SalesTransactionId) AS [aveg]
but I find it a bit rough.
"If i use AVG() function, then it will be very weird"
Not if you AVG what you say you want to average, which is the number of items per transaction:
SELECT AVG(num_of_items_in_transaction)
FROM
(SELECT SUM(amount_of_item) AS num_of_items_in_transaction FROM detail GROUP BY tran_id) AS per_tran
The inner query groups per transaction and sums the quantities of the items in each one. The outer query then averages these totals. Because you first need one operation grouped by transaction, then another operation grouped by something else (the whole dataset), you can't combine the grouping operations into a single step. It has to be multi-stage, because you're feeding the output from one stage into the input of another. SELECT AVG(SUM(amount)) .. GROUP BY ??? - what would you put into the ??? to let MySQL know you wanted the SUM grouped by one thing but the AVG grouped by another? (You can't.)
You generally need to do it as a two-step reduction if you're not windowing, but that's conceivably an implicit multi-step operation anyway.
I don't think there's any need to change what you have (sum of amounts divided by count of transactions), I just wanted to point out why it probably wasn't working as you expected
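To see the difference between the row-level average and the two-stage average on the eggs and sausages example, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a convenient way to run the SQL; the detail table layout and the per_tran alias are assumptions for illustration, and the column names follow the answer's query.

```python
import sqlite3

# Hypothetical detail table; tran_id and amount_of_item follow the answer's query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detail (tran_id INTEGER, item TEXT, amount_of_item INTEGER)")
conn.executemany(
    "INSERT INTO detail VALUES (?, ?, ?)",
    [(1, "eggs", 5), (1, "sausages", 2), (2, "eggs", 3), (2, "sausages", 10)],
)

# Plain AVG averages over detail rows: (5 + 2 + 3 + 10) / 4
row_avg = conn.execute("SELECT AVG(amount_of_item) FROM detail").fetchone()[0]

# Two-stage: sum per transaction, then average the sums: ((5 + 2) + (3 + 10)) / 2
two_stage_avg = conn.execute(
    """
    SELECT AVG(num_of_items_in_transaction)
    FROM (
        SELECT SUM(amount_of_item) AS num_of_items_in_transaction
        FROM detail
        GROUP BY tran_id
    ) AS per_tran
    """
).fetchone()[0]

# The asker's single-step form; 1.0 * avoids integer division in SQLite
one_step_avg = conn.execute(
    "SELECT 1.0 * SUM(amount_of_item) / COUNT(DISTINCT tran_id) FROM detail"
).fetchone()[0]

print(row_avg, two_stage_avg, one_step_avg)
```

On this data the single-step SUM/COUNT(DISTINCT) form and the two-stage form agree, which is consistent with the advice that the original solution does not need to change.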
I have a query that finds the drop percentage for a set of clients based on the orders they have received (i.e. it finds the percentage difference in orders by comparing the current month with the previous month). What I want to achieve is a field that shows which clients had a continuous drop over 4 months, 3 months, 2 months, or 1 month.
I know this can only be achieved by comparing the last 4 months using the LAG function or subqueries. Could you please help me out with this one? I would appreciate it very much.
select
fd.customers2, fd.Month1, fd.year1, fd.variance, case when
(fd.variance < -0.00001 and fd.year1 = '2022.0' and fd.Month1 = '1')
then '1month drop' else fd.customers2 end as 1_most_host_drop
from
(SELECT
c.*,
sa.customers as customers2,
sum(sa.order) as orders,
date_part(mon, sa.date) as Month1,
date_part(year, sa.date) as year1,
(cast(orders - LAG(orders) OVER(Partition by customers2 ORDER BY
year1, Month1) as NUMERIC(10,2))/NULLIF(LAG(orders)
OVER(partition by customers2 ORDER BY year1, Month1) * 1, 0)) AS variance
FROM stats sa join (select distinct
d.id, d.customers
from configer d
) c on sa.customers=c.customers
WHERE sa.date >= '2021-04-1'
GROUP BY Month1, sa.customers, c.id, year1,
c.customers)fd
In a spirit of friendliness: I think you are a little premature in posting this here as there are several issues with the syntax before even reaching the point where you can solve the problem:
You have at least two places with a comma immediately preceding the word FROM:
...AS variance, FROM stats_archive sa ...
...d.customers, FROM config d...
Recommend you don't use VARIANCE as an alias (it is a system function in PostgreSQL and so is likely also a system function name in Redshift)
Not super important, but there's no need for c.* - just select the columns you will use
DATE_PART requires a string as the first parameter: DATE_PART('mon',current_date)
I might be wrong about this, but I suspect you cannot use column aliases in the partition by or order by of a window function. Put the originating expressions there instead:
... OVER (PARTITION BY customers2 ORDER BY DATE_PART('year',sa.date),DATE_PART('mon',sa.date))
LAG takes up to three parameters: (1) the column you want to retrieve the value from, (2) the row offset, where a positive integer indicates how many rows prior to the current row you should retrieve a value from, according to the partition and order context, and (3) the value the function should return as a default (for example, on the first row in the partition). As such, you don't need NULLIF. Note that this is the PostgreSQL signature; check whether Redshift accepts the third argument. So, to get the row immediately prior to the current row, or return 0 in case the current row is the first row in the partition:
LAG(orders,1,0) OVER (PARTITION BY customers2 ORDER BY DATE_PART('year',sa.date),DATE_PART('mon',sa.date))
If you use 0 as a default in the calculation of what is currently aliased variance, you will almost certainly run into a div/0 error either now, or worse, when you least expect it in the future. You should protect against that with some CASE logic or better, provide a more appropriate default value or even better, calculate the LAG with the default 0, then filter out the 0 rows before doing the calculation.
Don't rely on column aliases in the GROUP BY: some engines accept them (PostgreSQL does), but standard SQL does not, so the query becomes harder to port. Reference each field that is not participating in an aggregate in the GROUP BY, whether through direct mention (sa.date) or indirectly in an expression (DATE_PART('mon',sa.date)).
Your date should be '2021-04-01'
All in all, without sample data, expected results using the posted sample data and without first removing syntax errors, it is a tall order to attempt to offer advice on the problem which is any more specific than:
Build the source of the calculation as a completely separate query first. Calculate the LAG in that source query. Only when you've run that source query and verified that the LAG is producing the correct result should you then wrap it as a sub-query or CTE (not sure if Redshift supports these, but presumably) at which point you can filter out the rows with a zero as the denominator (the first month of orders for each customer).
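As a rough illustration of that build-it-in-stages advice, the sketch below runs the three steps against a tiny in-memory SQLite database from Python. Everything about the data is an assumption: the stats table layout, the order_date and orders column names, and the sample rows are invented for the example, and SQLite's strftime stands in for DATE_PART. Redshift syntax will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stats table; column names and rows are assumptions for illustration.
conn.execute("CREATE TABLE stats (customers TEXT, order_date TEXT, orders INTEGER)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?)",
    [
        ("acme", "2021-04-10", 100),
        ("acme", "2021-05-12", 80),
        ("acme", "2021-06-03", 40),
        ("acme", "2021-06-20", 20),
        ("bolt", "2021-04-15", 50),
        ("bolt", "2021-05-15", 75),
    ],
)

query = """
WITH monthly AS (
    -- Step 1: one row per customer per month
    SELECT customers,
           strftime('%Y-%m', order_date) AS month,
           SUM(orders) AS orders
    FROM stats
    WHERE order_date >= '2021-04-01'
    GROUP BY customers, strftime('%Y-%m', order_date)
),
lagged AS (
    -- Step 2: previous month's orders, defaulting to 0 on each customer's first month
    SELECT customers, month, orders,
           LAG(orders, 1, 0) OVER (PARTITION BY customers ORDER BY month) AS prev_orders
    FROM monthly
)
-- Step 3: filter out zero denominators, then compute the relative change
SELECT customers, month, 1.0 * (orders - prev_orders) / prev_orders AS pct_change
FROM lagged
WHERE prev_orders <> 0
ORDER BY customers, month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Running the monthly and lagged steps on their own first makes it easy to verify the LAG values before trusting the final percentage. Note that LAG looks at the previous row, not the previous calendar month, so this sketch assumes every customer has a row for every month.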
Good luck!
I'm trying to get the subtotal for WO Total Cost, but where a WO Number appears multiple times in the Division group I don't want to add the values together. I only want the WO Total Cost to count once in the report for each division. As it is now, work order number 40321 has a WO Total Cost of $362.24, and because it appears once for each laborer it is added repeatedly into the total shown in the green total line. Can anyone tell me how to prevent summing WO Total Cost where the WO Number appears more than once in a Division group?
Thanks for the reply Alan. Here is a screenshot of the design view in Report Builder. What I'd like to see is the WO Total Cost summed only once per WO Number in the report. The WO Total Cost represents all costs (Equipment, Labor, and Material) for a given work order. As it stands now, the report is summing work order 40321 three times and giving us an incorrect total.
I've grouped the report by Division so we can see costs for a particular division for a given time period. I've also grouped the report by Person Labor so we can see how much a particular laborer is costing us for a given period of time.
What I don't know how to do is prevent the report from summing the WO Total Cost where the WO Number appears multiple times in the results for a given division.
I'm not sure you will be able to do this directly in the report design without some convoluted custom code. Other people maybe will have more inspiration but I can't see how to do it. You cannot even look for unique values to sum as it's possible that 2 WO numbers could have the same cost.
Without knowing what your dataset/database tables look like I can't offer much more but here are some approaches I might consider if I was doing this, both mean doing the work in SQL.
**Option 1**
I would consider calculating this in your query in a new column, which should be fairly simple. Each row would end up with the total amount for the AssetGroup. So, if you had 20 records for AssetGroup A and 10 records for AssetGroup B, then all the 'A' records would have the same 'AssetGroupTotalWOAmount' value and all the 'B' records would have the same 'AssetGroupTotalWOAmount' value.
Then in your report you can simply use =FIRST(Fields!AssetGroupTotalWOAmount.Value) to get the correct number. This will get the first value within the scope of the expression, so in your case within the ASSETGROUP row group.
**Option 2**
Create a separate dataset with just the amounts you need, with a single record per AssetGroup. So probably just two columns, AssetGroup and AssetGroupTotalWOAmount.
In the report you could then use a LOOKUP to get the correct value.
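Option 1 could look roughly like the sketch below, which runs the SQL through Python's sqlite3 module so it can be tried without a report server. The report_rows table, its column names, and the sample rows are all hypothetical, and it groups by division (as in the question) instead of AssetGroup. The key idea is to collapse the data to one row per work order before summing, so two different work orders that happen to have the same cost are still both counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened report dataset: one row per laborer per work order.
conn.execute(
    "CREATE TABLE report_rows "
    "(division TEXT, laborer TEXT, wo_number INTEGER, wo_total_cost REAL)"
)
conn.executemany(
    "INSERT INTO report_rows VALUES (?, ?, ?, ?)",
    [
        ("Streets", "Ann", 40321, 362.24),
        ("Streets", "Bob", 40321, 362.24),
        ("Streets", "Cy", 40321, 362.24),
        ("Streets", "Ann", 40400, 100.00),
        ("Parks", "Dee", 40500, 50.00),
    ],
)

query = """
SELECT r.division, r.laborer, r.wo_number, r.wo_total_cost,
       t.division_total_wo_amount
FROM report_rows r
JOIN (
    -- Collapse to one row per work order first, so each cost is counted once
    SELECT division, SUM(wo_total_cost) AS division_total_wo_amount
    FROM (SELECT DISTINCT division, wo_number, wo_total_cost FROM report_rows) AS one_per_wo
    GROUP BY division
) t ON t.division = r.division
ORDER BY r.division, r.wo_number, r.laborer
"""
rows = conn.execute(query).fetchall()

# Every row in a division carries the same total, which is what FIRST() would pick up
totals = {division: total for division, _, _, _, total in rows}
print(totals)
```

The inner grouped query on its own is also the shape of the separate dataset described in Option 2.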
I'm relatively new to SQL but have learned some cool stuff. I'm getting results that don't make sense. I've got a query with several subqueries and what-not but I have a windowed function that isn't working like I'm expecting.
The part that isn't working is this (simplified from the 300 line query):
SELECT AVG(table.sales_amount)
OVER (PARTITION BY table.month, table.sales_rep, table.department)
FROM table
The problem is that when I pull the data non aggregated I get a value different (107) than the above returns (95).
I've used windowed functions for COUNT and SUM and they work fine, but AVG is acting strangely. Am I missing something about how this works with AVG?
The subquery that table is a standin for looks like:
sales_rep, month, department, sales_amount
1, 2017-1, abc, 125.20
1, 2017-2, abc, 120.00
2, 2017-1, def, 100.00
...etc
Working out of SQL Server Management Studio.
SOLVED: I did finally figure it out. The results I was joining this subquery to had the sales rep multiple times in a month (selling objects A & B), which caused whoever sold both to be counted twice. Whoops, my bad.
The results that you get should be the same values as in:
SELECT AVG(table.sales_amount)
FROM table
GROUP BY table.month, table.sales_rep, table.department;
Of course, the rows will be different. You need to match up the three key columns.
Based on your sample data, it looks like the partitioning keys uniquely define each row. Perhaps you really intend:
SELECT AVG(table.sales_amount) OVER () as overall_average
FROM table;
EDIT:
For the departmental average:
SELECT AVG(table.sales_amount) OVER (partition by table.department) as department_average
FROM table;
After some brute-forcing of potential errors I finally figured out the issue. I was joining that subquery to another one which had multiple instances of a sales_rep in a given month (selling objects a & b), which caused the sales of anyone who sold both objects to be counted twice instead of once in the average.
So sales rep 1 sold objects a & b, which made his sales count as 66% of the department average instead of 50%, while sales rep 2 counted for only 33%.
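The join fan-out described here is easy to reproduce. The following is a minimal sketch using Python's sqlite3 module; the two tables, their columns, and the amounts are invented for illustration and are not the asker's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: one sales row per rep, plus a table listing the objects each rep sold.
conn.execute("CREATE TABLE sales (sales_rep INTEGER, department TEXT, sales_amount REAL)")
conn.execute("CREATE TABLE objects_sold (sales_rep INTEGER, object TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)", [(1, "abc", 120.0), (2, "abc", 90.0)]
)
conn.executemany(
    "INSERT INTO objects_sold VALUES (?, ?)", [(1, "a"), (1, "b"), (2, "a")]
)

# Windowed average over the un-joined rows: (120 + 90) / 2
clean = conn.execute(
    "SELECT DISTINCT AVG(sales_amount) OVER (PARTITION BY department) FROM sales"
).fetchone()[0]

# After the join, rep 1 appears twice, so the window sees (120 + 120 + 90) / 3
fanned_out = conn.execute(
    """
    SELECT DISTINCT AVG(s.sales_amount) OVER (PARTITION BY s.department)
    FROM sales s
    JOIN objects_sold o ON o.sales_rep = s.sales_rep
    """
).fetchone()[0]

print(clean, fanned_out)
```

COUNT and SUM are inflated by the same duplication; the skew is just easier to miss there than in an average that stops matching a hand calculation.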
I am trying to rank groups by summing a field (not a calculated column) for each group, so that I get a static answer for each row in my table.
For example, I may have a table with state, agent, and sales. Sales is a field, not a measure. There can be many agents within a state, so there are many rows for each individual state. I am trying to rank the states by total sales within each state.
I have tried many things, but the ones that make the most sense to me are:
rankx(CALCULATETABLE(Table,allexcept(Table,Table[AGENT]),sum([Sales]),,DESC)
and
=rankx(SUMMARIZE(State,Table[State],"Sales",sum(Table[Sales])),[Sales])
The first one is creating a table where it sums sales without grouping by Agent. and then tries to rank based on that. I get #ERROR on this one.
The second one creates a table using SUMMARIZE with only sum of Sales grouped by state, then tries to take that table and rank the states based on Sales. For this one I get a rank of 1 for every row.
I think, but am not sure, that my problem is coming from the sales being a static field and not a calculated measure. I can't figure out where to go from here. Any help?
Assuming your data looks something like this...
...have you tried this:
Ranking Measure = RANKX(ALL('Table'[STATE]),CALCULATE(SUM('Table'[Sales])))
The ALL('Table'[STATE]) says to rank all states. The CALCULATE(SUM('Table'[Sales])) says to rank by the sum of their sales. The CALCULATE wrapper is important: RANKX evaluates its expression once per state in a row context, and a plain SUM('Table'[Sales]) ignores that row context, so it returns the same total for every state and every state is ranked #1. CALCULATE turns the row context into a filter on the current state. (Alternatively, you can spin off SUM('Table'[Sales]) into a separate Sales measure - which I'd recommend.)
Note: the ranks will change based on slicers/filters (e.g. a filter by agent will re-rank the states by that agent). If you're looking for a static rank of states by their total sales (i.e. not affected by filters on agent and always looking at the entire table), then try this:
Static Ranking Measure = CALCULATE([Ranking Measure], ALLEXCEPT('Table', 'Table'[State]))
This takes the same ranking measure, but removes all filters except the state filter (which you need to leave, as that's the column you're ranking by).
I did figure out a solution that's pretty simple, but it's messier than I'd like. If it's the only thing that works though, that's okay.
I created a new table with each distinct state along with a sum of sales then just do a basic RANKX on that table.
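For readers more at home in SQL than DAX, the same "summarize per state, then rank" idea can be sketched as below. This is only an SQL analogue run through Python's sqlite3 module, not DAX, and the table contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the question: many agents per state.
conn.execute("CREATE TABLE sales (state TEXT, agent TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("TX", "a1", 100.0), ("TX", "a2", 50.0),
        ("CA", "a3", 300.0),
        ("NY", "a4", 40.0), ("NY", "a5", 30.0),
    ],
)

query = """
WITH state_totals AS (
    -- One row per state with its total sales
    SELECT state, SUM(sales) AS total_sales FROM sales GROUP BY state
),
ranked AS (
    SELECT state, total_sales, RANK() OVER (ORDER BY total_sales DESC) AS state_rank
    FROM state_totals
)
-- Attach the state's rank to every agent row, so the value is static per row
SELECT s.state, s.agent, s.sales, r.state_rank
FROM sales s
JOIN ranked r ON r.state = s.state
ORDER BY r.state_rank, s.agent
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the rank is computed on the per-state totals before joining back, every agent row in a state carries the same rank, which is the static per-row answer the question asks for.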
I am trying to determine medicare costs per capita in each State using Google BigQuery.
I already have population numbers for each state (represented as Total) as well as total medicare cost (Cost) in each state. I am trying to divide total cost by the population of each state.
At the moment the query runs, however every entry is null. I am admittedly a beginner with both BigQuery and SQL.
Here is my code:
SELECT State, Cost / Total AS PerCapita
FROM medicare.population, medicare.CostByState
GROUP BY State, PerCapita;
One thing that may be causing issues is that the 'State' column exists in both 'population' and 'CostByState' tables. Not sure how to address this.
Here are my tables:
population
CostByState
You seem to have data with one row per state, so you only need a JOIN.
SELECT p.State, cbs.Cost / p.Total AS PerCapita
FROM medicare.population p JOIN
medicare.CostByState cbs
ON p.state = cbs.state;
You would only need aggregation if the tables had multiple rows per state.
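A small sketch of the difference between the comma-separated FROM and the explicit JOIN, run against in-memory SQLite tables from Python. The table names mirror the question, but the rows and figures are made up, and SQLite stands in for BigQuery, so this shows the join logic only and does not reproduce the NULL results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for medicare.population and medicare.CostByState; figures are made up.
conn.execute("CREATE TABLE population (State TEXT, Total REAL)")
conn.execute("CREATE TABLE CostByState (State TEXT, Cost REAL)")
conn.executemany(
    "INSERT INTO population VALUES (?, ?)", [("AK", 1000.0), ("AL", 4000.0)]
)
conn.executemany(
    "INSERT INTO CostByState VALUES (?, ?)", [("AK", 5000.0), ("AL", 10000.0)]
)

# A comma-separated FROM with no join condition pairs every state with every state
cross_rows = conn.execute("SELECT COUNT(*) FROM population, CostByState").fetchone()[0]

# Joining on State gives one row per state, and the alias prefixes remove the ambiguity
per_capita = dict(
    conn.execute(
        """
        SELECT p.State, cbs.Cost / p.Total AS PerCapita
        FROM population p
        JOIN CostByState cbs ON p.State = cbs.State
        """
    ).fetchall()
)

print(cross_rows, per_capita)
```

Qualifying the column as p.State also addresses the concern that State exists in both tables.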
Indeed you need to join that.
If the relationship is one-to-one, you're good. If not, you may need some type of aggregation, for example:
sum(cost)/sum(total) as per_capita