Sum with two conditions in Excel Table - excel-2007

I'm new to this forum, so if I make mistakes, tell me so I can learn ;).
I want to sum a column of a table in Excel, but only for rows that meet two conditions. Table1 has 3 columns: Date, Price, and Category (the category under which the price is logged).
I want the sum of all prices for which the date falls within a certain month and the category matches a chosen category.
The code for both individual requirements works, and looks like this:
{=SUM(IF(MONTH(Table1[Date])=MONTH(A3);Table1[Price];0))}
{=SUM(IF(Table1[Category]="Category1";Table1[Price];0))}
However, the combined sum, =SUM(IF(AND(MONTH(Table1[Date])=MONTH(A3);Table1[Category]="Category1");Table1[Price];0)) does not work.
Do you know what I am doing wrong?
Thanks in advance

I don't think you are doing anything wrong as such, though it is perhaps a poor choice of approach (in my opinion; I happen not to be a fan of structured references). The issue is that AND here does not return an array. It returns a single value: either TRUE (in which case the result is the sum of all prices) or, much more likely, FALSE (because at least one row fails one of the conditions, in which case the result is 0).
Instead I would recommend a PivotTable.
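The difference between AND collapsing everything to one value and a per-row test can be sketched outside Excel. The Python below is only an illustration with made-up rows (the dates, prices and categories are assumptions, not data from the question). If you would rather stay with an array formula than use a PivotTable, the usual equivalent is to multiply the two conditions instead of wrapping them in AND, for example {=SUM(IF((MONTH(Table1[Date])=MONTH(A3))*(Table1[Category]="Category1");Table1[Price];0))}, though I have not tested that against your workbook.

```python
from datetime import date

# Hypothetical rows standing in for Table1: (Date, Price, Category)
rows = [
    (date(2013, 1, 5), 10.0, "Category1"),
    (date(2013, 1, 20), 5.0, "Category2"),
    (date(2013, 2, 3), 7.0, "Category1"),
    (date(2013, 1, 28), 2.5, "Category1"),
]
target_month = 1  # plays the role of MONTH(A3)

# One TRUE/FALSE per row for each condition, like the two arrays in the formula
month_ok = [d.month == target_month for d, _, _ in rows]
category_ok = [c == "Category1" for _, _, c in rows]

# What AND does with arrays: every test is collapsed into ONE boolean,
# so the IF picks either all prices or nothing
collapsed = all(month_ok) and all(category_ok)
and_sum = sum(p for _, p, _ in rows) if collapsed else 0

# Row-by-row combination (the effect of multiplying the two conditions)
per_row_sum = sum(
    price
    for (_, price, _), m, c in zip(rows, month_ok, category_ok)
    if m and c
)

print(and_sum, per_row_sum)
```

With these sample rows the AND version gives 0 because not every row passes both tests, while the row-by-row version adds only the January rows in Category1.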

Selection Issue

So I have this QVD where records cease to update incrementally; the last record is dated 25/04/2022. However, I am able to generate new dates to calculate the interest accrued with the expression RANGESUM(BELOW(SUM(DailyInterest),0,NOOFROWS())) + SUM(Interest).
My challenge is that whenever I select any subsequent date, the amount defaults to 5,263.25, which is the initial amount as of 25/04/2022.
Table without any date selection
Table with 27/04/2022 selection
Apparently, in the above scenario, the amount should read 5,409.45 and not 5,263.25.
Help me out here, please!
Table with Daily Interest column
Disclaimer: there is guesswork in this answer, so it might not work.
Your formula, RANGESUM(BELOW(x, 0, NOOFROWS())), reads the chart's table and sums all NOOFROWS() rows with a date less than or equal to the dimension value.
There is no set analysis in your formula. That means that when you make a selection, the other dates' data is removed from the calculation. Presumably 5263.25 (or maybe 5263.25-73.10) is an initial value, maybe SUM(Interest).
One solution would be to make a set analysis that would make DailyInterest ignore your selections:
RANGESUM(BELOW(SUM({1}DailyInterest),0,NOOFROWS())) + SUM(Interest).
That might cause your chart to show dates you don't need again. In that case you have the generic problem of "I want to see fewer rows, but not actually select less data". You can solve this by using "Hide zeroes" and wrapping every measure in an IF so that it is zero/null for the dimension values you don't want.
You could of course calculate the accrued in your data model, which would be simpler.
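To illustrate that last option, here is a rough Python sketch of precomputing the accrued amount per date before it reaches the chart, so that selecting a date only reads a stored value. It is not Qlik script. The daily interest of 73.10 is an assumption inferred from the figures in the question (5,409.45 minus 5,263.25 is 146.20 over two days), and the variable names are made up.

```python
from datetime import date
from decimal import Decimal

# Assumed inputs: opening amount as of 25/04/2022 and an inferred daily interest
opening = Decimal("5263.25")
daily_interest = [
    (date(2022, 4, 25), Decimal("0")),      # last real record, nothing accrued yet
    (date(2022, 4, 26), Decimal("73.10")),  # generated date
    (date(2022, 4, 27), Decimal("73.10")),  # generated date
]

# Build the running total once, over ALL dates, independent of any selection
accrued = {}
total = opening
for day, interest in daily_interest:
    total += interest
    accrued[day] = total

# "Selecting" a date is now a plain lookup of the stored running total
print(accrued[date(2022, 4, 27)])
```

Because the running total is stored per date, a selection can no longer remove the earlier rows that the total depends on.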

Is it possible to use SQL to show the average of some values in one column and then in subsequent columns display the individual values?

I have a bunch of data and I want the output to display an average of all the data points but also the individual data points in subsequent columns. Ideally it would look something like this:
Compound | Subject | Avg datapoint | Datapoint Experiment 1 | Datapoint Exp 2 | ...
XYZ      | ABC     | 40            | 20                     | 60              | ...
TUV      | ABC     | 30            | 20                     | 40              | ...
TUV      | DEF     | 20            | 10                     | 30              | ...
One problem I'm running into is that I get repetitive lines of information. Another is that some rows pull in info that doesn't apply, such that some of the individual datapoints in, say, row 2 would have info from subject DEF when I only want it to have info from subject ABC.
I hope this makes sense! I'm currently using an inner join with a ton of WHERE qualifiers. I'm close but not quite there. Any help is appreciated, and let me know if I can provide additional info to help you help me.
The SQL language has a very strict rule requiring you to know the exact number of columns for your result set in advance, before looking at any data in your tables.
Therefore, if this average is based on a known, fixed number of columns, or if the number of potential columns is reasonably small so that you can manually set up placeholders, then this will be possible. The key search term for learning how to do this is "conditional aggregation", where you may also need to join the table to itself for each field.
Otherwise, you will need to pivot and aggregate your data in your client code or reporting tool.
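As a minimal sketch of conditional aggregation, assuming a single long table with one row per datapoint: the table name measurements and its columns (compound, subject, experiment, datapoint) are invented for the example, the values are the ones from the desired output in the question, and the number of experiments is fixed at two. It runs the SQL through Python's built-in sqlite3 module so it can be tried without a database server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements (compound TEXT, subject TEXT, experiment INTEGER, datapoint INTEGER)"
)
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?, ?)",
    [
        ("XYZ", "ABC", 1, 20), ("XYZ", "ABC", 2, 60),
        ("TUV", "ABC", 1, 20), ("TUV", "ABC", 2, 40),
        ("TUV", "DEF", 1, 10), ("TUV", "DEF", 2, 30),
    ],
)

# One output row per (compound, subject); each CASE picks out a single
# experiment, and the column list is fixed in advance as SQL requires
query = """
SELECT compound,
       subject,
       AVG(datapoint)                                     AS avg_datapoint,
       MAX(CASE WHEN experiment = 1 THEN datapoint END)   AS datapoint_exp_1,
       MAX(CASE WHEN experiment = 2 THEN datapoint END)   AS datapoint_exp_2
FROM measurements
GROUP BY compound, subject
ORDER BY compound, subject
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Grouping by both compound and subject is what keeps subject DEF values out of the subject ABC rows, and it also removes the repeated lines, since each pair appears exactly once.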

How to populate all possible combination of values in columns, using Spark/normal SQL

I have a scenario, where my original dataset looks like below
Data:
Country,Commodity,Year,Type,Amount
US,Vegetable,2010,Harvested,2.44
US,Vegetable,2010,Yield,15.8
US,Vegetable,2010,Production,6.48
US,Vegetable,2011,Harvested,6
US,Vegetable,2011,Yield,18
US,Vegetable,2011,Production,3
Argentina,Vegetable,2010,Harvested,15.2
Argentina,Vegetable,2010,Yield,40.5
Argentina,Vegetable,2010,Production,2.66
Argentina,Vegetable,2011,Harvested,10
Argentina,Vegetable,2011,Yield,90
Argentina,Vegetable,2011,Production,9
Bhutan,Vegetable,2010,Harvested,7
Bhutan,Vegetable,2010,Yield,35
Bhutan,Vegetable,2010,Production,5
Bhutan,Vegetable,2011,Harvested,2
Bhutan,Vegetable,2011,Yield,6
Bhutan,Vegetable,2011,Production,3
Now there is a very small country lookup table (country_lookup) that lists all possible countries the source data can contain.
I want the output data to always have a fixed number of columns (this ensures the reporting/visualization tool doesn't see a varying number of columns with each day's source data ingestion, depending on how many distinct countries are present).
So I have to somehow join the source data with the country_lookup csv and populate all of those country columns, with F as the default value. Every country column is binary, with T or F as the only possible values.
The original dataset from the above has to be converted into below:
Data (for rows whose Type is Derived Yield, I have left the Amount as an unevaluated expression rather than the calculated value, so that you can follow it and match it against the formula):
Country,Commodity,Year,Type,Amount,US,Argentina,Bhutan,India,Nepal,Bangladesh
US,Vegetable,2010,Harvested,2.44,T,F,F,F,F,F
US,Vegetable,2010,Yield,15.8,T,F,F,F,F,F
US,Vegetable,2010,Production,6.48,T,F,F,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+15.2)/(6.48+2.66),T,T,F,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
US,Vegetable,2011,Harvested,6,T,F,F,F,F,F
US,Vegetable,2011,Yield,18,T,F,F,F,F,F
US,Vegetable,2011,Production,3,T,F,F,F,F,F
US,Vegetable,2011,Derived Yield,(6+10)/(3+9),T,T,F,F,F,F
US,Vegetable,2011,Derived Yield,(6+2)/(3+3),T,F,T,F,F,F
US,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
Argentina,Vegetable,2010,Harvested,15.2,F,T,F,F,F,F
Argentina,Vegetable,2010,Yield,40.5,F,T,F,F,F,F
Argentina,Vegetable,2010,Production,2.66,F,T,F,F,F,F
Argentina,Vegetable,2010,Derived Yield,(2.44+15.2)/(6.48+2.66),T,T,F,F,F,F
Argentina,Vegetable,2010,Derived Yield,(15.2+7)/(2.66+5),F,T,T,F,F,F
Argentina,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
Argentina,Vegetable,2011,Harvested,10,F,T,F,F,F,F
Argentina,Vegetable,2011,Yield,90,F,T,F,F,F,F
Argentina,Vegetable,2011,Production,9,F,T,F,F,F,F
Argentina,Vegetable,2011,Derived Yield,(6+10)/(3+9),T,T,F,F,F,F
Argentina,Vegetable,2011,Derived Yield,(10+2)/(9+3),F,T,T,F,F,F
Argentina,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
Bhutan,Vegetable,2010,Harvested,7,F,F,T,F,F,F
Bhutan,Vegetable,2010,Yield,35,F,F,T,F,F,F
Bhutan,Vegetable,2010,Production,5,F,F,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(15.2+7)/(2.66+5),F,T,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
Bhutan,Vegetable,2011,Harvested,2,F,F,T,F,F,F
Bhutan,Vegetable,2011,Yield,6,F,F,T,F,F,F
Bhutan,Vegetable,2011,Production,3,F,F,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(6+2)/(3+3),T,F,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(10+2)/(9+3),F,T,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
Formulae for populating Amount Field for Derived Type:
Derived Amount = (sum of Harvested over all countries flagged T, grouped by Year and Commodity) divided by (sum of Production over all countries flagged T, grouped by Year and Commodity).
So the target is to build every combination of the countries in the source and, for each combination, calculate the sums of the respective Harvested and Production values and divide one by the other. In the actual scenario a country can have more than one commodity, but that should not matter because the amounts are summed per commodity and year.
Note: users in the frontend can select any combination of countries. The only reason for doing this in the backend rather than dynamically in the frontend is that AWS QuickSight (our visualisation tool) can sum over the selected column filters but doesn't yet support calculations on those derived sums. Hence the calculation for every combination of countries has to be pre-populated (a very naive approach) to make it available in the report for whatever countries the user selects.
Also if you've any better approach (than the above naive approach mentioned in note) to solve this problem, you are most welcome to guide me. I've also posted a question on the same problem without writing my expected approach for experts to show me the path on how we can solve this kind of a problem better than this naive approach. If you want to help solve it with some other technique, you're most welcome, here is the link to that question.
Any help shall be greatly acknowledged.
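The combination logic described above can be sketched in plain Python before porting it to Spark. This is only an illustration under a few assumptions: it uses the 2010 rows from the question, the lookup list is taken from the column order of the expected output, and the function name derived_yield_rows is made up. It builds every combination of two or more countries present in a (Commodity, Year) group and emits one Derived Yield row per member country, as in the expected output.

```python
from itertools import combinations

# Country columns, in the order used by the expected output
LOOKUP = ["US", "Argentina", "Bhutan", "India", "Nepal", "Bangladesh"]

# 2010 rows from the question: (Country, Commodity, Year, Type, Amount)
source = [
    ("US", "Vegetable", 2010, "Harvested", 2.44),
    ("US", "Vegetable", 2010, "Production", 6.48),
    ("Argentina", "Vegetable", 2010, "Harvested", 15.2),
    ("Argentina", "Vegetable", 2010, "Production", 2.66),
    ("Bhutan", "Vegetable", 2010, "Harvested", 7),
    ("Bhutan", "Vegetable", 2010, "Production", 5),
]

def derived_yield_rows(rows, lookup):
    # Collect Harvested/Production per country inside each (Commodity, Year) group
    groups = {}
    for country, commodity, year, kind, amount in rows:
        if kind in ("Harvested", "Production"):
            groups.setdefault((commodity, year), {}).setdefault(country, {})[kind] = amount

    out = []
    for (commodity, year), per_country in groups.items():
        present = [c for c in lookup if c in per_country]
        for size in range(2, len(present) + 1):
            for combo in combinations(present, size):
                harvested = sum(per_country[c]["Harvested"] for c in combo)
                production = sum(per_country[c]["Production"] for c in combo)
                flags = ["T" if c in combo else "F" for c in lookup]
                # The expected output repeats the row once per member country
                for country in combo:
                    out.append(
                        (country, commodity, year, "Derived Yield",
                         harvested / production, *flags)
                    )
    return out

derived = derived_yield_rows(source, LOOKUP)
for row in derived:
    print(row)
```

The number of combinations grows exponentially with the number of countries present in a group, so this pre-population is only practical while that number stays small.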

SQL query for the number of cases when a value of column1 (non-unique) can't be found within any record where column2 meets a basic criteria

I am doing a beginners' SQL tutorial and I started to wonder whether a simple SQL query on this table: http://www.sqlcourse2.com/items_ordered.html could tell me the number of items (possibly just 1) that have only ever been purchased more than one at a time, i.e. items for which there is no record containing both that item and a quantity of 1. I am a real beginner at this, so please try to keep it simple.
Thank you in advance!
Welcome to the fascinating world of SQL.
Well - I'm not giving you the answer, but a hint (after all, it's a training and your own thinking and finding the solution would be the best way for you to learn something new).
The way you formulate your question is somewhat puzzling.
When I combine what you ask with what is possible with SQL, the question that would make sense to me would be that you need to list (or count, I did not understand that very well) the items (or the complete rows in the table with matching item, that was not clear either), that were never sold with a quantity of 1.
If that's what you need, you will need a subselect to get all distinct items that were sold with a quantity of 1, and select the rows from your base table whose item value is not in the list you get from the subselect.
Do you need more hints?
Marco
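For readers who want to see the shape of that hint, here is a small sketch using Python's built-in sqlite3 module. The rows are invented and only the item and quantity columns are modelled, so treat it as an illustration of the subselect idea rather than the answer for the tutorial's real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items_ordered (item TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO items_ordered VALUES (?, ?)",
    [
        ("Tent", 1), ("Tent", 2),              # sold singly at least once
        ("Flashlight", 4), ("Flashlight", 2),  # never sold singly
        ("Raft", 3),                           # never sold singly
        ("Compass", 1),                        # sold singly
    ],
)

# The subselect lists every item that WAS sold with a quantity of 1;
# the outer query keeps only the items that are not in that list
never_single = conn.execute(
    """
    SELECT DISTINCT item
    FROM items_ordered
    WHERE item NOT IN (SELECT item FROM items_ordered WHERE quantity = 1)
    ORDER BY item
    """
).fetchall()

count_never_single = conn.execute(
    """
    SELECT COUNT(DISTINCT item)
    FROM items_ordered
    WHERE item NOT IN (SELECT item FROM items_ordered WHERE quantity = 1)
    """
).fetchone()[0]

print(never_single, count_never_single)
```

The first query lists the items and the second counts them, which covers both readings of the question.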

VLOOKUP two different items, use whichever one has a number

I have a list of financial metrics in column A, rest of the columns are the time periods the financial data is for.
Let's say I'm trying to calculate a ratio, but the financial metrics in A are not entirely unique, in the sense that a metric type may have more than one associated metric depending on how the company reports the metric.
For example, let's say I need Depreciation Expense on the income statement... that item may be reported as Depreciation, or DepreciationAndAmortization, or something else.
Any ideas how the formula in the ratio I'm trying to calculate can lookup the metric in A1, use the number immediately to the right as part of the formula... and if the metric Depreciation for example is 0, it would look for the next one I specify, like DepreciationAndAmortization, and use that one instead as the first one isn't reported.
If I understand correctly, this should do it:
=MAX(INDEX(B:B,MATCH("*depreciation*",A:A,)),INDEX(B:B,MATCH("*depreciation*",A:A,)+MATCH("*depreciation*",INDEX(A:A,1+MATCH("*depreciation*",A:A,)):INDEX(A:A,100+MATCH("*depreciation*",A:A,)),)))
If the alternatives are say in E2 and E3 then:
=MAX(VLOOKUP(E2,A:B,2,0),VLOOKUP(E3,A:B,2,0))
ie try both and take whichever is larger.
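The fallback order described in the question (use the first metric, and move on to the next only if it is missing or 0) can be sketched in Python as below. The function name first_reported and the sample figures are made up for illustration. Note that this differs slightly from taking the MAX: it respects the order of the alternatives and would also accept a negative number.

```python
def first_reported(metrics, candidates, default=0):
    """Return the first non-zero value among the candidate metric names."""
    for name in candidates:
        value = metrics.get(name, 0)  # a missing metric counts as 0
        if value:                     # skip values that are missing or zero
            return value
    return default

# Hypothetical column A (metric name) and column B (value for one period)
metrics = {
    "Revenue": 1000,
    "Depreciation": 0,
    "DepreciationAndAmortization": 120,
}

chosen = first_reported(metrics, ["Depreciation", "DepreciationAndAmortization"])
print(chosen)
```

Here Depreciation is reported as 0, so the sketch falls through to DepreciationAndAmortization.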
Regarding your concern that Excel Hero's answer returns an error: you can wrap it in the IFERROR function so that it returns 0 when the value/date you're looking for isn't available.
=IFERROR(MAX(INDEX(B:B,MATCH("*depreciation*",$A:$A,)),INDEX(B:B,MATCH("*depreciation*",$A:$A,)+MATCH("*depreciation*",INDEX($A:$A,1+MATCH("*depreciation*",$A:$A,)):INDEX($A:$A,100+MATCH("*depreciation*",$A:$A,)),))),0)