I'm trying to write a SQL query that finds the store locations contributing to 80% of inventory adjustments, along with each location's inventory accuracy. I'm not quite sure how to go about it. So far I have the total absolute value of the adjustments, which the calculation will be based on. Here's what I have so far. Any help would be appreciated.
SELECT
    SUM(ABS(Details.ValueDifference)) AS writeoff,
    (SUM(Details.NumberofPartsCounted) - SUM(Details.NumberofPartsCountedwithErrors))
        / SUM(Details.NumberofPartsCounted) AS Accuracy
FROM Details;
Alright, I'm not 100% confident about what you're looking for, but here's my best guess. It looks like you want to select the location number along with what you're calling "writeoff" and "Accuracy" in your query. You'll have to group by the location number to get the sums for each location.
It sounds like the writeoff column is supposed to tell you how much a particular location has contributed to total inventory adjustments. Assuming that's true, I would also order by writeoff in descending order so the rows with the highest values come first.
SELECT N_LOCATION
    ,SUM(ABS(ValueDifference)) AS writeoff
    -- multiply by 1.0 so the division isn't integer division, which would truncate Accuracy to 0
    ,(SUM(NumberofPartsCounted) - SUM(NumberofPartsCountedwithErrors)) * 1.0
        / SUM(NumberofPartsCounted) AS Accuracy
FROM Details
GROUP BY N_LOCATION
ORDER BY writeoff DESC;
Honestly, I think the best way to accomplish what you're trying to do would be with a stored procedure. I would use the query above plus a query that selects the sum of the absolute value of the value difference across all locations. Then I would save that grand total in a variable to use while looping through the results of the first query. That way, you can divide each location's writeoff by the total for all locations to get each location's percentage.
I'm guessing you would then add locations to a result set until the locations that you've added have percentages totaling 80%? Not sure exactly how you're setting the rule for that 80%, but hopefully you can adjust as needed.
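If your database supports window functions, that whole idea can also be sketched in a single query instead of a loop. This is only a rough sketch, assuming a dialect such as SQL Server or Postgres, and assuming the rule is "keep locations until their cumulative share of the writeoff total reaches 80%":
WITH per_location AS (
    SELECT N_LOCATION
        ,SUM(ABS(ValueDifference)) AS writeoff
        ,(SUM(NumberofPartsCounted) - SUM(NumberofPartsCountedwithErrors)) * 1.0
            / SUM(NumberofPartsCounted) AS Accuracy
    FROM Details
    GROUP BY N_LOCATION
)
SELECT N_LOCATION, writeoff, Accuracy, cumulative_share
FROM (
    SELECT N_LOCATION, writeoff, Accuracy
        -- running share of the grand total, largest writeoffs first
        ,SUM(writeoff) OVER (ORDER BY writeoff DESC) * 1.0
            / SUM(writeoff) OVER () AS cumulative_share
    FROM per_location
) ranked
WHERE cumulative_share <= 0.80  -- assumed interpretation of the 80% rule
ORDER BY writeoff DESC;
Adjust the cutoff (for example, to also include the first location that crosses the 80% line) depending on how you actually define the rule.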
I have a CrateDB table storing various information for zipcodes. It contains around 30k zipcodes, and I need my query to return certain profiling information for all zipcodes at once. I understand that typically it wouldn't be feasible, but since I only need ballpark information and many zipcodes are consecutive, I think an optimization is possible.
For example, if I wanted to profile population, a grouped result such as this would work for me:
group 1 (0-1000): 00000-02000,02004-02010,02012
group 2 (1001-3000): ...
...
The populations and groups above are fake, but the idea should hold: group the profiled category into buckets, assign each zipcode to the correct bucket, and further reduce the size by using a range representation. I could settle for a predefined number of groups, or have the group buckets defined by the request/query itself. This would hopefully shrink the response from something too large for a single query into something manageable.
Is it possible to write a CrateDB function to do something similar, to avoid the bandwidth issues of having this grouping done on a different service/container/VM?
You could probably create the groups on the fly, or as columns if you wish, with a regex, and group by that; I have done this on a 23M-row table.
In my example, the regex grouping plus AVG took around 30s, but that is very dependent on hardware.
Something like this would probably work as a general pointer:
SELECT AVG(yourColumn), regexp_matches(yourColumn, 'your regex', 'i')[1]
FROM "doc"."yourTable"
GROUP BY regexp_matches(yourColumn, 'your regex', 'i')[1]
ORDER BY regexp_matches(yourColumn, 'your regex', 'i')[1];
You could also use an OVER windowed function, but CrateDB doesn't yet have full SQL support for partitioning, etc.
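As a concrete (hypothetical) illustration: if the profiled column were population in a table "doc"."zipcodes", a regex that buckets zipcodes by their first two digits, exploiting the fact that many are consecutive, might look like this. Note it buckets by zipcode prefix rather than by population value, so treat it only as a pattern to adapt:
SELECT AVG(population), regexp_matches(zipcode, '^(\d{2})', 'i')[1] AS zip_prefix
FROM "doc"."zipcodes"
GROUP BY regexp_matches(zipcode, '^(\d{2})', 'i')[1]
ORDER BY regexp_matches(zipcode, '^(\d{2})', 'i')[1];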
I need to create a distinct count of people who fall into two different dimensions.
One is called [Student Research Degree].[Is Research Degree Current].&[Yes]
The other is called [Student Research Degree].[Is Research Degree Complete].&[Yes]
If one or the other is Yes, or both, then I need to count the record.
If both are no, I can exclude it. I have a row counter measure called [Measures].[Student ID Distinct Count Hidden] already in place.
If I use just one element with the measure, I get the right answer, but if I try to cross join the other elements, I get a result of NULL.
e.g.
AGGREGATE(
    CROSSJOIN(
        [Student Research Degree].[Is Research Degree Current].&[Yes],
        [Student Research Degree].[Is Research Degree Complete].&[Yes]
    ),
    [Measures].[Student ID Distinct Count Hidden]
)
I am aware that I can just land an extra value in the ETL, and have SQL do the work, and in the end this might be the solution. Is there a way of doing an OR statement on this sort of thing?
No, the tuple of &[Yes], &[Yes] doesn't create an OR situation: it drops the combinations where one attribute is [No] while the other is [Yes], which I want to keep.
I started looking at a subtractive approach, where I started with the ALL set and subtracted the distinct count of the invalid combination (as a tuple) from the grand total. This approach did work, but ONLY because the data allowed for it; if a person could have been in multiple combinations, it wouldn't have worked.
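For reference, a sketch of that subtractive calculated member (member and measure names are taken from the question; the cube name is a placeholder, and this assumes [Yes]/[No] are the only values, so treat it as illustrative rather than tested):
WITH MEMBER [Measures].[Current Or Complete Count] AS
    [Measures].[Student ID Distinct Count Hidden]
    - ( [Student Research Degree].[Is Research Degree Current].&[No]
      , [Student Research Degree].[Is Research Degree Complete].&[No]
      , [Measures].[Student ID Distinct Count Hidden] )
SELECT [Measures].[Current Or Complete Count] ON COLUMNS
FROM [YourCube]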
I'm currently testing that approach with the rest of the cube. By all appearances this works perfectly, but I will go with ETL if any bugs or mismatches can be proven.
I'm having trouble getting the results I would like from the query I've built. The overall goal I'm trying to accomplish is to get the first odometer reading of the month and the last odometer reading of the month for a specific vehicle. I would then like to subtract the two to get total miles driven for that month. I figured a derived table with window functions would best help to accomplish this goal (see example SQL below).
SELECT
VEHICLE_ID2_FW
FROM
(SELECT
VEHICLE_ID2_FW,
LOCATION_CODE_FW,
MIN(ODOMETER_FW) OVER(PARTITION BY YEAR(DATE_FW), MONTH(DATE_FW)) AS MIN_ODO,
MAX(ODOMETER_FW) OVER(PARTITION BY YEAR(DATE_FW), MONTH(DATE_FW)) AS MAX_ODO
FROM
GPS_TRIPS_FW) AS G
I keep running into an issue where the derived table's query, by itself, runs and works. However, when I bracket it in the FROM clause, it shoots back the error
The multi-part identifier could not be bound
Hoping that I could get some help figuring this out and maybe finding an overall better way to accomplish my goal. Thank you!
Odometers only increase (well, that should be true). So just use aggregation:
select VEHICLE_ID2_FW, year(date_fw), month(date_fw),
min(ODOMETER_FW), max(ODOMETER_FW),
max(ODOMETER_FW) - min(ODOMETER_FW) as miles_driven_in_month
from GPS_TRIPS_FW
group by VEHICLE_ID2_FW, year(date_fw), month(date_fw);
This answers the question that you asked. I don't think it solves your problem, though, because the total miles driven per month will not add up to the overall total miles driven. The issue is the miles driven between the last record at the end of one month and the first record at the beginning of the next.
If this is an issue, ask another question. Provide sample data, desired results, and an appropriate database tag.
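That said, if you just want the rough shape such a solution might take, here is an untested sketch that assigns each gap to the earlier month by pulling in the next reading with LEAD (it assumes odometer readings increase with date):
select VEHICLE_ID2_FW, year(date_fw) as yr, month(date_fw) as mth,
       max(next_odo) - min(ODOMETER_FW) as miles_driven_incl_gap
from (select t.*,
             -- next reading for the same vehicle; the last row keeps its own reading
             lead(ODOMETER_FW, 1, ODOMETER_FW) over (partition by VEHICLE_ID2_FW order by date_fw) as next_odo
      from GPS_TRIPS_FW t
     ) t
group by VEHICLE_ID2_FW, year(date_fw), month(date_fw);
Because each month's range now extends to the next month's first reading, the monthly figures telescope and sum to the overall total.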
Currently we're building a database to track different factories' pollutant emissions. Now a query is needed that gives us information about relative quantities. Somehow I feel this should be straightforward, but I have had no success implementing it in SQL.
I'm starting from a working query that returns the following fields:
PRODUCTION_YEAR, COMPANY, PRODUCT_CATEGORY, POLLUTANT, TOTAL_EMISSIONS, SHARE
TOTAL_EMISSIONS contains the total emissions for each company in a particular year and product category. SHARE is a computed field and contains the contribution (as a fraction) of each company to that year's overall emissions of that particular pollutant in that particular product category.
Now the task is to count the factories contributing to each pollutant. I arrived at this:
SELECT PRODUCTION_YEAR, POLLUTANT, PRODUCT_CATEGORY, Count(COMPANY)
FROM theQuery
GROUP BY PRODUCTION_YEAR, POLLUTANT, PRODUCT_CATEGORY;
However, now our client wants something more sophisticated: count only the biggest polluters, who contribute 95% of emissions. In a script, I'd probably sort the pollution percentages in each category ascending, then walk the dataset, summing up the shares, and only start counting after reaching 5%. Doing it in SQL, I have no idea.
My first step (adding a SUM(SHARE) field to the new query) already resulted in errors ("expression not included in aggregate function", roughly translated, not sure what to make of it because all the expressions were indeed included). Is there even a way to do this in an SQL query, or am I wasting my time and would be better off just writing some VBA?
Thanks for any input!
Best,
Ben
Gord's method (see link in comment) works well for this task.
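For readers who can't follow the link: without window functions, the general shape of a running-share filter in Access-style SQL is a correlated subquery that sums the shares of all smaller contributors in the same group. This is a generic sketch (not necessarily identical to Gord's approach); theQuery and the column names come from the question, the > 0.05 boundary handling is an assumption, and ties on SHARE would need an extra tie-breaker column:
SELECT PRODUCTION_YEAR, POLLUTANT, PRODUCT_CATEGORY, COUNT(COMPANY) AS BigPolluters
FROM (
    SELECT t1.PRODUCTION_YEAR, t1.POLLUTANT, t1.PRODUCT_CATEGORY, t1.COMPANY
    FROM theQuery AS t1
    WHERE (
        SELECT SUM(t2.SHARE)
        FROM theQuery AS t2
        WHERE t2.PRODUCTION_YEAR = t1.PRODUCTION_YEAR
          AND t2.POLLUTANT = t1.POLLUTANT
          AND t2.PRODUCT_CATEGORY = t1.PRODUCT_CATEGORY
          AND t2.SHARE <= t1.SHARE
    ) > 0.05
) AS bigOnes
GROUP BY PRODUCTION_YEAR, POLLUTANT, PRODUCT_CATEGORY;
A company is kept once the cumulative share counted from the smallest contributor upward exceeds 5%, which mirrors the "walk the sorted dataset" idea from the question.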
I know MDX is used for much more sophisticated math, so please forgive the simplistic scenario, but this is one of my first Calculated members.
When I multiply Price x Quantity, the AS cube's data browser has the correct information in the leaf elements, but not in any of the parents. The reason seems to be that I want something like (1 * 2) + (2 * 3) + (4 * 5) and not (7 * 10), which I think I am getting as a result of how the Sum is done on the columns.
Is the IsLeaf expression intended to be used in these circumstances? Or is there another way? If so, are there any examples as simple as this I can see?
The calculated member that I tried to create is just this:
[Measures].[Price]*[Measures].[Quantity]
The result for a particular line item (the leaf) is correct. But the result for, say, all of April is an incredibly high number.
Edit:
I am now considering that this might be an issue with bad data. It would be helpful, though, if someone could just confirm that the above calculated member should work under normal circumstances.
Here is a blog post dealing with this particular problem: Aggregating the Result of an MDX Calculation Using Scoped Assignments.
For leaf-level computations whose results then need to be summed up, MDX is rather complex and slow.
The simplest way to achieve what you want would be to make this a normal measure, based on a Price x Quantity calculation defined in the data source view.
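As a sketch of that, assuming the fact table has Price and Quantity columns: add a named calculation to the fact table in the data source view, for example
Price * Quantity
expose it as, say, [Line Amount], and build an ordinary Sum measure on it. The multiplication then happens per source row before aggregation, which gives exactly the (1 * 2) + (2 * 3) + (4 * 5) behavior you described.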