Combining two fields

Combining two fields - qlikview

In one of my charts I have Branch as the dimension, and the expressions are a set of KPIs. I also have a Year (2009-2016) list box that is set up to always have one selected value.
From 2014 onwards I need to amalgamate Branch A into Branch B while leaving the other Branches as they were. My searches so far lead me to believe that a calculated dimension might solve my problem but I'm not sure how to go about it.
Ideally I would like my other charts to continue to display the branches separately. Any advice is appreciated.
-Brandon

You can go with calculated dimensions and type something like this:
if(Year >= 2014 and Branch = 'A', 'B', Branch)
But my advice is to use calculated dimensions only if there is no other way. Calculated dimensions are nice to have, but they lead to performance issues: they create additional tables/fields in memory, and with a decent amount of data this will slow down the calculations.
Instead, you can create an additional field in the script (using the same expression) and use this field as the dimension in your object.
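The logic of such a derived field can be sketched outside QlikView. The snippet below uses Python purely for illustration; in a load script the same condition would go into the LOAD statement. The function name `reporting_branch` and the sample rows are hypothetical.

```python
def reporting_branch(year: int, branch: str) -> str:
    """Return the branch label to use as the dimension in the merged chart."""
    # From 2014 onwards, Branch A is reported under Branch B.
    if year >= 2014 and branch == "A":
        return "B"
    return branch


# Hypothetical (Year, Branch) rows; the original Branch value is kept
# next to the derived one.
rows = [(2013, "A"), (2014, "A"), (2015, "C")]
derived = [(year, branch, reporting_branch(year, branch)) for year, branch in rows]
print(derived)
```

Because the original Branch value stays in each row, the other charts can keep using it and continue to show the branches separately.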

Related

Insert zeros instead of interpolating with ARIMA_PLUS in BigQuery

I want to do ARIMA_PLUS forecasting on a series of sale records. The problem is that the sale records only contain sales. Before forecasting we need to insert the "non-sales" for every product, which are essentially rows with the import column set to zero for every day the product was not sold. We have two options:
Fill the database with those zero-rows (uses a lot of space)
When doing the forecasting with ARIMA_PLUS in BigQuery, tell the model to fill with zeros instead of interpolating (the default and seemingly only option).
I want to follow the second option, but I don't see how. Here is a screenshot of the documentation: Google info about interpolation
The first option would be carried out with a merge; nevertheless, I would prefer to discard it since it increases the size of the sales table.
I have scanned the documentation and haven't found a solution.

You need to provide an input dataset in which the missing values are already filled in using the right method for your use case.
In other words, the SQL query must handle the gap filling so that the input for the model already contains the expected data.
You can, for example, write a query that applies linear interpolation for your use case.
So the first approach you mentioned can be implemented in that input SQL (rather than by adding the data to the source table), and the second approach is not possible in BigQuery, as far as I know.
Here you have an example: https://justrocketscience.com/post/interpolation_sql/
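For the zero-fill case specifically, the gap filling amounts to building one row per product and day and defaulting the amount to zero. Below is a minimal Python sketch of that logic; the sample data and the name `fill_missing_days` are hypothetical. In BigQuery the equivalent would live in the query that feeds the model (for instance a calendar of dates cross-joined with the products and left-joined to the sales), not in the stored table.

```python
from datetime import date, timedelta


def fill_missing_days(sales, start, end):
    """Return one (product, day, amount) row per product and day, with 0 where no sale exists."""
    totals = {}
    for product, day, amount in sales:
        # Several sales of one product on the same day are summed.
        totals[(product, day)] = totals.get((product, day), 0) + amount
    products = sorted({product for product, _, _ in sales})
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return [(p, d, totals.get((p, d), 0)) for p in products for d in days]


# Hypothetical sales records: (product, day, amount)
sales = [
    ("apple", date(2023, 1, 1), 5),
    ("apple", date(2023, 1, 3), 2),
    ("pear", date(2023, 1, 2), 7),
]
filled = fill_missing_days(sales, date(2023, 1, 1), date(2023, 1, 3))
for row in filled:
    print(row)
```

The zero rows exist only in the derived input, so the sales table itself does not grow.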

How to populate all possible combination of values in columns, using Spark/normal SQL

I have a scenario, where my original dataset looks like below
Data:
Country,Commodity,Year,Type,Amount
US,Vegetable,2010,Harvested,2.44
US,Vegetable,2010,Yield,15.8
US,Vegetable,2010,Production,6.48
US,Vegetable,2011,Harvested,6
US,Vegetable,2011,Yield,18
US,Vegetable,2011,Production,3
Argentina,Vegetable,2010,Harvested,15.2
Argentina,Vegetable,2010,Yield,40.5
Argentina,Vegetable,2010,Production,2.66
Argentina,Vegetable,2011,Harvested,10
Argentina,Vegetable,2011,Yield,90
Argentina,Vegetable,2011,Production,9
Bhutan,Vegetable,2010,Harvested,7
Bhutan,Vegetable,2010,Yield,35
Bhutan,Vegetable,2010,Production,5
Bhutan,Vegetable,2011,Harvested,2
Bhutan,Vegetable,2011,Yield,6
Bhutan,Vegetable,2011,Production,3
Now there is a very small country lookup table that lists every country the source data can contain.
I want the output data to always have a fixed number of columns (so that the reporting/visualization tool doesn't get a varying number of columns with each day's ingestion, depending on how many distinct countries are present).
So I have to somehow join the source data with the country_lookup csv and populate all those columns with F as the default value. Every country column would be binary, with T or F as the possible values.
The original dataset from the above has to be converted into below:
Data (for rows whose Type is Derived Yield, I've left the Amount as an unevaluated expression rather than calculating it, so that you can match it against the formula):
Country,Commodity,Year,Type,Amount,US,Argentina,Bhutan,India,Nepal,Bangladesh
US,Vegetable,2010,Harvested,2.44,T,F,F,F,F,F
US,Vegetable,2010,Yield,15.8,T,F,F,F,F,F
US,Vegetable,2010,Production,6.48,T,F,F,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+15.2)/(6.48+2.66),T,T,F,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
US,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
US,Vegetable,2011,Harvested,6,T,F,F,F,F,F
US,Vegetable,2011,Yield,18,T,F,F,F,F,F
US,Vegetable,2011,Production,3,T,F,F,F,F,F
US,Vegetable,2011,Derived Yield,(6+10)/(3+9),T,T,F,F,F,F
US,Vegetable,2011,Derived Yield,(6+2)/(3+3),T,F,T,F,F,F
US,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
Argentina,Vegetable,2010,Harvested,15.2,F,T,F,F,F,F
Argentina,Vegetable,2010,Yield,40.5,F,T,F,F,F,F
Argentina,Vegetable,2010,Production,2.66,F,T,F,F,F,F
Argentina,Vegetable,2010,Derived Yield,(2.44+15.2)/(6.48+2.66),T,T,F,F,F,F
Argentina,Vegetable,2010,Derived Yield,(15.2+7)/(2.66+5),F,T,T,F,F,F
Argentina,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
Argentina,Vegetable,2011,Harvested,10,F,T,F,F,F,F
Argentina,Vegetable,2011,Yield,90,F,T,F,F,F,F
Argentina,Vegetable,2011,Production,9,F,T,F,F,F,F
Argentina,Vegetable,2011,Derived Yield,(6+10)/(3+9),T,T,F,F,F,F
Argentina,Vegetable,2011,Derived Yield,(10+2)/(9+3),F,T,T,F,F,F
Argentina,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
Bhutan,Vegetable,2010,Harvested,7,F,F,T,F,F,F
Bhutan,Vegetable,2010,Yield,35,F,F,T,F,F,F
Bhutan,Vegetable,2010,Production,5,F,F,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(2.44+7)/(6.48+5),T,F,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(15.2+7)/(2.66+5),F,T,T,F,F,F
Bhutan,Vegetable,2010,Derived Yield,(2.44+15.2+7)/(6.48+2.66+5),T,T,T,F,F,F
Bhutan,Vegetable,2011,Harvested,2,F,F,T,F,F,F
Bhutan,Vegetable,2011,Yield,6,F,F,T,F,F,F
Bhutan,Vegetable,2011,Production,3,F,F,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(6+2)/(3+3),T,F,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(10+2)/(9+3),F,T,T,F,F,F
Bhutan,Vegetable,2011,Derived Yield,(6+10+2)/(3+9+3),T,T,T,F,F,F
Formulae for populating Amount Field for Derived Type:
Derived Amount = (sum of Harvested over all countries marked T (True), grouped by Year and Commodity) divided by (sum of Production over all countries marked T (True), grouped by Year and Commodity).
So the target is to build every combination of the countries in the source and, for each one, calculate the sums of the respective Harvested and Production values, which are then divided. In the actual scenario a country can have more than one commodity, but that should not matter because the amounts are summed per commodity and year.
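The formula can be sketched in plain Python for a single Year and Commodity group. This is only an illustration of the calculation, not a Spark implementation; it uses the 2010 Vegetable figures from the sample data, and the names `figures_2010` and `derived_yields` are hypothetical.

```python
from itertools import combinations

# 2010 Vegetable figures from the sample data: country -> (Harvested, Production)
figures_2010 = {
    "US": (2.44, 6.48),
    "Argentina": (15.2, 2.66),
    "Bhutan": (7.0, 5.0),
}


def derived_yields(figures):
    """Map each combination of two or more countries to sum(Harvested) / sum(Production)."""
    result = {}
    countries = sorted(figures)
    for size in range(2, len(countries) + 1):
        for combo in combinations(countries, size):
            harvested = sum(figures[c][0] for c in combo)
            production = sum(figures[c][1] for c in combo)
            result[combo] = harvested / production
    return result


yields_2010 = derived_yields(figures_2010)
for combo, value in yields_2010.items():
    print(combo, round(value, 4))
```

With n countries there are 2^n - n - 1 such combinations (4 for the three countries here, 57 for six), so the number of pre-computed rows grows quickly as countries are added.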
Note: Users in the frontend can select any combination of countries. The sole reason for doing this in the backend rather than dynamically in the frontend is that AWS QuickSight (our visualisation tool) can sum over the selected column filters but doesn't yet support calculations on those derived sums. Hence, the calculation for every combination of countries has to be pre-populated (a very naive approach) so that it is available in the report for whatever countries the user selects.
Also, if you have a better approach than the naive one described in the note, you are most welcome to guide me. I've also posted a question on the same problem without my expected approach, so that experts can show me how to solve this kind of problem better. If you want to help solve it with some other technique, here is the link to that question.
Any help would be greatly appreciated.

SSAS OLAP Cube - Sum measure only works when keys are present

(This is a mock of my actual setup to help me figure out the problem.)
I have one fact table and one dimension table, linked by an id field.
My goal is to make a measure that sums up all "thing_count" (integer) values in my cube.
If the user splits by nothing, it should show the total "thing_count" for all records in the fact table. If it's split by "category_name" from the dimension, it should show the total "thing_count" for each category.
I tried to achieve this by creating a SUM measure in my cube:
It works, but not in the way I intend it to: it always shows (null) unless I drag in the "id" field from the dimension.
Screenshots: measure only; measure and category; measure, category, and id.
How can I make the measure show the value without keys needing to be present?
Edit:
For GregGalloway's request (I've edited the names so the screenshots are easier to follow):

One common explanation for this behavior (no aggregation) is that you have inadvertently commented out the CALCULATE; statement in the cube's MDX script. Please check that the statement is still present.

Can someone explain the following Essbase code: FIX, @RELATIVE

Can someone please explain the Essbase code below? This is my first time looking at any Essbase code and I'm getting a bit confused as to what it is actually doing.
/* Outer FIX: month, year, version, two sectors, level-0 sources and channels */
FIX(&Mth, &Yr, &Version,
    "Sector1", "Sector2", @RELATIVE("Source Code", 0), @RELATIVE("Channel", 0))
    /* Inner FIX: narrows the scope further */
    FIX("AccountNo", "DepNo")
        DATACOPY "1A11"->"A-500" TO "1BCD"->"C-800";
    ENDFIX
ENDFIX
From what I have googled the following is my understanding:
Creates a new command block which restricts database calculations to this subset.
Passes the following members into the command to be used:
Mth
Yr
Version
Returns the following fields:
Sector1
Sector2
returns the 0-level members of the Source Code member - meaning it returns the members of the Total Source Code without children (no other dimensions)
returns the 0-level members of the Channel member - meaning it returns the members of the Channel without children (no other dimensions)
Begins a new command block and passes the following members into the command to be used:
AccountNo
DepNo
Copies the range of cells 1A11, A-500 over to the range 1BCD, C-800
The above is what I understand from the Oracle documentation on each of the functions, but I can't actually figure out what is happening.

Welcome to the world of Essbase; it can be a little daunting at first especially if you're new to the concept of multidimensionality. You are on the right track regarding analyzing your calc script.
Try not to think of the FIX statement as a command block, per se. A FIX is used to select a portion of cells in the cube. Every single piece of data in your cube has a particular address that consists of one member from every dimension, plus the actual data value itself. For instance, a cube with the dimensions Time, Year, Scenario, and Location might have a particular piece of data at Jan->2018->Actual->Washington. The number of possible permutations of data in a cube can quickly get very large. For instance, if your organization has 4 years of data, 12 months in a year, 100 locations, 10000 accounts, 3 versions, and 10 departments, you are talking about 4 * 12 * 100 * 10000 * 3 * 10 = about 1.4 billion different potential addresses (cells) of data, and that's actually fairly small for a cube, as they tend to grow much larger.
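The cell count in that hypothetical example is just the product of the dimension sizes, which can be checked quickly in Python (the dictionary keys are illustrative labels, not Essbase names):

```python
from math import prod

# Number of members per dimension in the hypothetical cube.
dimension_sizes = {
    "years": 4,
    "months": 12,
    "locations": 100,
    "accounts": 10_000,
    "versions": 3,
    "departments": 10,
}

potential_cells = prod(dimension_sizes.values())
print(f"{potential_cells:,}")  # 1,440,000,000
```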
That said, FIX statements are used to narrow down the scope of your calculation operation. Rather than operating on the ENTIRE cube (all 1.4 billion cells in my hypothetical example), the FIX restricts the calculation to cells that match the criteria you specify. In this case, the first FIX statement restricts the calculation to a particular month, year, version, sectors, sources, and channels. Note that the ampersand on Mth, Yr, and Version means that a substitution variable is used: your server or cube has a substitution variable value set, such as Mth = "Jan" and Yr = "FY2018", and Version might be "Working" or "Final" or something similar. I would guess that Sector1 and Sector2 are two different members from the same dimension. @RELATIVE("Source Code", 0) is a function that finds the level-0 members (leaf/bottom-level members in a dimension, that is, members that do not have children below them) below the specified member.
In other words, the first FIX statement is narrowing the scope of the calculation to a particular month in a particular year in a particular version (as opposed to all months, all years, all versions), and for that particular month/year/version (for either Sector1 or Sector2) it is fixing on all of the level-0/bottom/leaf members in Source Code and Channel dimensions.
The next FIX statement just further narrows the current scope of cells to calculate on in addition to the outer FIX. It's not uncommon to see FIX statements nested like this.
Lastly we get to the part where something actually happens: the DATACOPY. In the given FIX context, this DATACOPY command says that for EACH cell in the current FIX, values are copied from the source to the destination. DATACOPY is a little more straightforward when it's just DATACOPY "Source" TO "Target" as opposed to using the cross-dimensional operator (->), but this is perhaps more easily understood in terms of the time/year dimensions. For example, imagine the data copy was written like this:
DATACOPY "FY2018"->"Dec" TO "FY2019"->"Jan";
In this DATACOPY I'd be telling Essbase that, for the given FIX context, I would like to copy values from the end of the year (data values where the year is FY2018 AND the month is Dec) to the beginning of the next year (data values where the year is FY2019 AND the month is Jan). Your DATACOPY works in a similar fashion, but using cost centers or something else. It all just depends on how the cube is set up.
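As a rough mental model (not how Essbase actually stores or calculates data), the cube can be pictured as a dictionary keyed by cell address, with FIX acting as a filter and DATACOPY as a copy between addresses. The dimensions, members, and the `datacopy` helper below are hypothetical.

```python
# Toy cube: each address is (year, month, account); the value is the stored data.
cube = {
    ("FY2018", "Dec", "Sales"): 100.0,
    ("FY2018", "Dec", "Costs"): 40.0,
    ("FY2018", "Nov", "Sales"): 90.0,
}


def datacopy(cube, fix_accounts, source, target):
    """Copy values from source (year, month) to target for every account in the FIX scope."""
    # Iterate over a snapshot because the loop adds new cells to the cube.
    for (year, month, account), value in list(cube.items()):
        if account in fix_accounts and (year, month) == source:
            cube[target + (account,)] = value


# Roughly: FIX("Sales") DATACOPY "FY2018"->"Dec" TO "FY2019"->"Jan"; ENDFIX
datacopy(cube, {"Sales"}, ("FY2018", "Dec"), ("FY2019", "Jan"))
print(cube[("FY2019", "Jan", "Sales")])
```

Only the Sales cell is copied; Costs falls outside the FIX scope, so no FY2019 Jan cell is created for it.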

What is difference between Pentaho DI "variables" and "fields"?

I could not find much information about this. I can see that fields can have multiple copies per row in a transformation. But what are variables? Are they unique across all rows a transformation produces? By the name, though, variables are meant to vary.
What exactly is the difference between fields and variables?
Can someone enlighten me, please?
Thank you

PDI transformations work with a stream of rows that pass through all the steps. The rows consist of a number of fields that the steps can act on, converting them, filtering them, sorting, etc.
Variables are more like a configuration aid and have a single value in the transformation. It's very important to remember that they can NOT be set/changed and used within the same transformation, because all the steps execute in parallel!
Example
In your transformation you have a variable called "last_staging_run" and its value is "2017/01/19 05:00:00". This one has been passed to the transformation from the parent job.
You then use it in a Table Input:
SELECT id, product_id, price, number
FROM sales
WHERE purchase_date > '${last_staging_run}'
This will give you the new rows since the last staging run with the fields id, product_id, price and number. You might then lookup the product names or filter products with a zero price with other steps, then store it in a table again.
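The distinction can be mimicked in Python: a variable is substituted into the query text once, before anything runs, while fields are the per-row values that later steps act on. This is only an analogy, not PDI code; it relies on the fact that `string.Template` happens to use the same `${name}` syntax. The quotes around the variable are an assumption that the value is inserted as plain text, and the sample rows are made up.

```python
from string import Template

# One value for the whole run, like a variable passed in from the parent job.
variables = {"last_staging_run": "2017/01/19 05:00:00"}

query = Template(
    "SELECT id, product_id, price, number FROM sales "
    "WHERE purchase_date > '${last_staging_run}'"
)
sql = query.substitute(variables)  # textual replacement, done once
print(sql)

# Fields, by contrast, differ from row to row as they stream through the steps.
rows = [
    {"id": 1, "product_id": 10, "price": 9.5, "number": 2},
    {"id": 2, "product_id": 11, "price": 0.0, "number": 1},
]
non_zero_price = [row for row in rows if row["price"] > 0]
print(non_zero_price)
```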
