VBA: loop through all the pivot fields of a pivot table and return specified values

I have a dataset whose entries have 5 different attributes and one value. For example, I have the heights of 5000 people. For each person I have their hair color, eye color, nationality, the city where they were born and their mother's name (the 5 dimensions).
Eye Color | Hair Color | Nationality | Hometown | Mother's Name | Height
Blue | Blond | Swiss | Zürich | Nicole | 184
Blue | Brown | English | York | Ruby | 164
Brown | Brown | French | Paris | Sophie | 154
etc.
So there are 5 dimensions. The data is set dynamically, so the number of categories in each dimension can vary. I want to compute the average height of people depending on which dimensions I include (from 1 to 5). For example, I might want to retrieve:
The average height of French, blue-eyed people. The next day, only the people born in London. And the week after, the Swiss, blue-eyed, red-haired people born in Geneva whose mother is called Nicole.
So I created a pivot table with Eye Color as Row Labels, Hair Color as Column Labels, the average height as the Data and the last 3 dimensions as Report Filters. This allowed me to see all the possible and desired combinations of average height that my data implies.
Now my goal is:
I want to create a macro that goes through all the possible combinations of my dimensions (i.e. 2^5-1 = 31) and stores in a vector every combination whose average height is above a certain value, e.g. 190, and then prints them on a worksheet.
I was thinking of using Boolean arrays and a For Each...Next structure, but I must admit I can't picture how to implement it.
Any ideas?
Thanks for the time and help!
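One way to picture the enumeration: treat each of the 31 non-empty subsets of dimensions as a 5-bit mask, and for each subset group the people by every combination of category values that actually occurs, keeping groups whose average exceeds the threshold. The sketch below is in Python rather than VBA and works directly on the rows instead of reading the pivot table; the dictionary keys (eye, hair, nationality, hometown, mother) are illustrative names, not anything from the original workbook.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# The five dimensions from the question (key names are illustrative).
DIMENSIONS = ["eye", "hair", "nationality", "hometown", "mother"]


def averages_above(rows: List[dict], threshold: float) -> List[Tuple[Dict[str, str], float]]:
    """Return every (filter, average height) pair whose average exceeds threshold.

    Bit i of `mask` says whether DIMENSIONS[i] takes part in the grouping,
    so masks 1..31 cover all 2**5 - 1 non-empty subsets of dimensions.
    """
    results = []
    for mask in range(1, 2 ** len(DIMENSIONS)):
        dims = [d for i, d in enumerate(DIMENSIONS) if (mask >> i) & 1]
        sums: Dict[tuple, float] = defaultdict(float)
        counts: Dict[tuple, int] = defaultdict(int)
        for row in rows:
            key = tuple(row[d] for d in dims)  # one group per combination of values
            sums[key] += row["height"]
            counts[key] += 1
        for key, total in sums.items():
            average = total / counts[key]
            if average > threshold:
                results.append((dict(zip(dims, key)), average))
    return results


people = [
    {"eye": "Blue", "hair": "Blond", "nationality": "Swiss", "hometown": "Zürich", "mother": "Nicole", "height": 184},
    {"eye": "Blue", "hair": "Brown", "nationality": "English", "hometown": "York", "mother": "Ruby", "height": 164},
    {"eye": "Brown", "hair": "Brown", "nationality": "French", "hometown": "Paris", "mother": "Sophie", "height": 154},
]

found = averages_above(people, 180)
print(len(found))  # 30
```

With the three sample rows and a threshold of 180, every subset except "eye color alone" isolates the Swiss person, so 30 groups with an average of 184 are returned. In VBA the same enumeration could be written as For mask = 1 To 31, testing (mask And 2 ^ i) <> 0 to decide whether dimension i is included.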

Related

Unable to create new features in Machine learning

I have a dataset loaded into a pandas DataFrame named df.
The dataset has 50,000 rows; here are the first 5:
Name_Restaurant | cuisines_available | Average cost
Food Heart | Japnese, chinese | 60$
Spice n Hungary | Indian, American, mexican | 42$
kfc, Lukestreet | Thai, Japnese | 29$
Brown bread shop | American | 11$
kfc, Hypert mall | Thai, Japnese | 40$
I want to create a column that contains the number of cuisines available.
I tried this code:
df['no._of_cuisines_available']=df['cuisines_available'].str.len()
However, instead of the number of cuisines, it returns the number of characters.
For example, for the first row the output should be 2, but it shows 17.
I also need a new column that contains the number of stores for each restaurant. For example, kfc has 2 stores: "kfc, Lukestreet" and "kfc, Hypert mall". I have no idea how to code this.
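i) Number of cuisines (split on ',' and count the pieces):
df['no._of_cuisines_available'] = df['cuisines_available'].str.split(',').apply(len)
ii) How often each part of the names occurs:
df['Name_Restaurant'].str.split(',', expand=True).melt()['value'].str.strip().value_counts()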
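What ii) does: it splits the names at ',' and puts each resulting piece in its own column, uses melt to stack those columns into one long column, strips the surrounding spaces, and counts the individual entries. Note that every piece is counted, so location names such as "Lukestreet" appear too, while "kfc" gets a count of 2.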
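To show both counts side by side as per-row values (which is what the question asks for), here is a plain-Python sketch without pandas. It assumes, as the sample suggests, that a restaurant's brand is the text before the first comma in Name_Restaurant; that convention is an assumption, not something the dataset guarantees.

```python
from collections import Counter

# The five sample rows from the question as (Name_Restaurant, cuisines_available).
rows = [
    ("Food Heart", "Japnese, chinese"),
    ("Spice n Hungary", "Indian, American, mexican"),
    ("kfc, Lukestreet", "Thai, Japnese"),
    ("Brown bread shop", "American"),
    ("kfc, Hypert mall", "Thai, Japnese"),
]


def cuisine_count(cuisines: str) -> int:
    # Count comma-separated items rather than characters; skip empty pieces.
    return sum(1 for part in cuisines.split(",") if part.strip())


def brand(name: str) -> str:
    # Assumption: the brand is whatever precedes the first comma in the name.
    return name.split(",")[0].strip()


# How many rows (stores) share each brand.
store_counts = Counter(brand(name) for name, _ in rows)

# One output row per input row: (name, no. of cuisines, no. of stores for its brand).
table = [(name, cuisine_count(c), store_counts[brand(name)]) for name, c in rows]
for entry in table:
    print(entry)
```

A possible pandas equivalent of the store column would be brand = df['Name_Restaurant'].str.split(',').str[0].str.strip() followed by df['no_of_stores'] = brand.map(brand.value_counts()).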

SSRS Switch statement in Expression is not working (color code a polygon in a chart)

I have a report that breaks down financials by state. Here is what it looks like:
That is the tablix version of the data. I also have a chart as a map where I want to display the data visually.
The actual data is broken up like this:
NM City 100
NJ City1 100
NJ City2 100
NJ City3 100
NY City 100
NY City2 100
In SSRS, each state is a polygon.
I want to set the fill color of that polygon to be a color based on the Total Value of that state.
Ideally, I would just set the fill color to a formula based on the total value. Then I could use the same expression for every polygon and each one would be color-coded accordingly.
However, I do not think the polygons know which state they belong to. For example, is there any way to get the New York polygon to look only at the NY state value?
In case there isn't, I'm trying to write a Switch statement so that each polygon only gets the value where the state name equals whatever I input manually.
=SWITCH
(Max(Fields!State.Value, "CustomersByState") = "NE" , "10000"
Max(Fields!State.Value, "CustomersByState") = "NY" , "20000"
1=1,"Coral")
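When I use that expression as the label of the polygon (for testing; if I can make this work, I can make anything work), I get an error saying: Comma, ')', or a valid expression continuation expected.
You need a comma after "10000" and after "20000". Switch takes alternating condition, value arguments, so each pair must be separated from the next by a comma:
=SWITCH(
Max(Fields!State.Value, "CustomersByState") = "NE", "10000",
Max(Fields!State.Value, "CustomersByState") = "NY", "20000",
1=1, "Coral")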

Testing a cube - is slicing by each dimension in turn sufficient?

One for the mathematicians.
Say I have two cubes (dimensionally modelled datasets), A and B.
To prove that they're identical, is it sufficient to slice each of them by every dimension in turn, and verify that the totals for each member are identical?
A simple example: dimensions Country (England and Scotland), Gender (Male and Female) and Married (Yes or No). Measure CountPeople.
If I slice CountPeople by Country, comparing the results from A and B, then by Gender, then by Married, and find identical results, have I proved that every cell in A and B is identical?
I think that I have, but I'm not sure.
No, slicing on each dimension in turn is not sufficient to prove that the cubes are identical at cell level. It probably will be close enough most of the time, but it's not mathematically guaranteed.
We can prove this with a fairly simple example with just Gender and Country dimensions. Imagine we have the following data at cell level:
(Male, England): 100, (Female, Scotland): 100
If we slice separately by Gender or Country we get:
Male: 100, Female: 100
England: 100, Scotland: 100
Now if all of those males move to Scotland and all the females move to England, we'll have different data at cell level:
(Male, Scotland): 100, (Female, England): 100
But the data reported by either single dimension will be the same:
Male: 100, Female: 100
England: 100, Scotland: 100
This is a fairly trivial example, but the same possibility exists for non-trivial data, so to be 100% sure two cubes are identical, you would need to validate at cell level.
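The counterexample can be checked mechanically. The sketch below models each cube as a Python dictionary from (Gender, Country) to CountPeople, totals it along one dimension at a time, and compares both the single-dimension slices and the cells themselves.

```python
from collections import Counter
from typing import Dict, Tuple

Cube = Dict[Tuple[str, str], int]  # (Gender, Country) -> CountPeople


def slice_totals(cube: Cube, axis: int) -> Dict[str, int]:
    """Total the measure for each member of one dimension (a single-dimension slice)."""
    totals: Counter = Counter()
    for key, count in cube.items():
        totals[key[axis]] += count
    return dict(totals)


cube_a: Cube = {("Male", "England"): 100, ("Female", "Scotland"): 100}
cube_b: Cube = {("Male", "Scotland"): 100, ("Female", "England"): 100}

same_slices = all(slice_totals(cube_a, axis) == slice_totals(cube_b, axis) for axis in range(2))
same_cells = cube_a == cube_b
print(same_slices, same_cells)  # True False
```

Every per-dimension slice matches while the cells differ, which is exactly why a cell-level comparison is needed for certainty.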

Parse data from Morningstar Direct to worksheet

I have to put together a report every quarter using data pulled from Morningstar Direct, and I have to automate the whole process, or at least parts of it. We have put this report together for the last two quarters using the same format each time, so we already have the general templates for the report. Now I'm just looking for a way to pull the data from Morningstar and put it into the templates correctly.
Does anyone have a general idea of where I should start?
A | B | C | D | E | F
Group | Name | Weight | Gross | Net | Contribution
Equity | | 25% | 10% | 8% | .25
 | IBM | 5% | 15% | 12% |
 | AAPL | 7% | 23% | 18% |
Fixed Income | | 25% | 5% | 4% | .17
 | 10 Yr Bond | 10% | 7% | 5% |
Emerging Mrkts | | | | |
It continues like this with more groups, and there are many more holdings within each group.
What I want is for it to search until it finds "Equity", for example, then move down one row and grab the name of the position, its weight, and its net return, and do that for each holding in Equity. Then it should do the same for Fixed Income, and so on, selecting the names, weights, and nets for each holding, and then copy and paste them into another workbook.
Is there any way to do that?
It sounds like you need to parse your information. By using LEFT(), RIGHT(), and MID() (with FIND() and LEN() to locate the delimiters), you can select the data you need and ignore the rest. You could separate the data in one cell into multiple cells in the desired format.
A (Name): John Q. Public
B (Address): 123 My Street, City, State, Zip

E (First Name): =LEFT(A2,FIND(" ",A2))
F (Middle Initial; extra work is needed to handle a missing initial): =MID(A2,LEN(E2)+1,FIND(" ",MID(A2,LEN(E2)-1,99)))
G (Last Name): =MID(A2,(LEN(E2)+LEN(F2)+2),99)
H (Street): =LEFT(B2,FIND(",",B2)-1)
I (City): =MID(B2,LEN(H2)+2,FIND(",",MID(B2,LEN(H2)+2,99))-1)
J (State): =MID(B2,(LEN(I2)+LEN(H2)+4),FIND(",",MID(B2,(LEN(I2)+LEN(H2)+4),99))-1)
K (Zip Code): =MID(B2,(LEN(H2)+LEN(I2)+LEN(J2)+6),99)
(The City result keeps a leading space, which the State and Zip Code offsets rely on.)
These formulas parse the name in cell A2 and the address in cell B2 into separate fields.
Similar cuts should allow you to get rid of the unwanted data.
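As a cross-check of what the formulas compute, here is the same split sketched in Python (not Excel), splitting on spaces and commas instead of counting character positions. It assumes the sample's layout: a name with at least a first and a last name, and an address with exactly four comma-separated parts.

```python
from typing import Tuple


def parse_name(full_name: str) -> Tuple[str, str, str]:
    """Split 'First M. Last' into (first, middle, last).

    Assumes at least a first and a last name; with only two words the
    middle part comes back as '' (the "missing data" case).
    """
    parts = full_name.split()
    return parts[0], " ".join(parts[1:-1]), parts[-1]


def parse_address(address: str) -> Tuple[str, str, str, str]:
    """Split 'Street, City, State, Zip' on commas and trim the spaces.

    Raises ValueError if the address does not have exactly four parts.
    """
    street, city, state, zip_code = (part.strip() for part in address.split(","))
    return street, city, state, zip_code


print(parse_name("John Q. Public"))
print(parse_address("123 My Street, City, State, Zip"))
```

Splitting on delimiters avoids the chained LEN() offsets, so trimming one field cannot shift the others.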
==================================================================
7/8/2015
Your data seems to be your desired output. If so, please provide sanitized input data for comparison. You probably need to loop through your input to find the groups. When the group changes, prepare the summary figures.
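The loop suggested above can be sketched as a single pass that notices when a new group starts and collects (name, weight, net) for each holding under it. This is Python rather than VBA, and it assumes the layout shown in the question: a group name in column A on its own row, followed by holding rows with column A blank and the name in column B. That layout is an interpretation of the flattened example, not a confirmed Morningstar export format.

```python
from typing import Dict, List, Tuple

# Hypothetical rows mirroring the question's example, columns A-F =
# (Group, Name, Weight, Gross, Net, Contribution); blank cells are "".
rows = [
    ("Equity", "", "25%", "10%", "8%", ".25"),
    ("", "IBM", "5%", "15%", "12%", ""),
    ("", "AAPL", "7%", "23%", "18%", ""),
    ("Fixed Income", "", "25%", "5%", "4%", ".17"),
    ("", "10 Yr Bond", "10%", "7%", "5%", ""),
    ("Emerging Mrkts", "", "", "", "", ""),
]


def collect_holdings(sheet_rows) -> Dict[str, List[Tuple[str, str, str]]]:
    """Map each group to its holdings as (name, weight, net)."""
    holdings: Dict[str, List[Tuple[str, str, str]]] = {}
    current = None
    for group, name, weight, _gross, net, _contribution in sheet_rows:
        if group:  # non-empty column A: a new group starts here
            current = group
            holdings[current] = []
        elif name and current is not None:  # a holding row under the current group
            holdings[current].append((name, weight, net))
    return holdings


result = collect_holdings(rows)
for group, items in result.items():
    print(group, items)
```

The point where a new group starts is also where group-level summary figures would be prepared before moving on; in VBA the same loop would read Cells(r, 1) and Cells(r, 2) row by row.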

PostGIS: bounding box of a multipolygon

SELECT id, ST_Box2D(areas) AS bbox FROM mytable;
In this example, the table "mytable" contains two columns: "id" is the unique id number of the row and "areas" is a geometry field containing one MULTIPOLYGON per row.
This works fine for multipolygons containing only one polygon, but some rows have polygons that are far apart, so the bounding box is not meaningful when, for example, the multipolygon contains one polygon in Europe and one in Canada.
So I need a way to get one box2d per polygon in each multipolygon, but I haven't found how yet.
More exactly, my goal is to return one multipolygon per row, containing one box2d per polygon.
First example
id: 123
area: a multipolygon containing only one oval polygon in Australia
therefore bbox should return a multipolygon containing only one rectangle (the bounding box) in Australia
Second example
id: 321
area: a multipolygon containing one circle in Paris, one circle in Toronto
therefore bbox should return a multipolygon containing one rectangle in Paris, one rectangle in Toronto
You should use ST_Dump: https://postgis.net/docs/ST_Dump.html
Then you will get one row per polygon. The other fields will be duplicated when the geometry is split; it works like an aggregate function in reverse.
The syntax is a little special because ST_Dump outputs a compound data type (geometry_dump), so you have to extract the geometry part like this:
SELECT (ST_Dump(the_geom)).geom from mytable;
Since this gives you more rows than the table has, you could create a new table from the query.
Then you can create an index on the new geometry column in that table, and it will be built on the bounding box of each single polygon.
HTH
/Nicklas
Do you want your polygons at one row each as well? That is what I assumed. But if you only want a table of bboxes, one per row, with an id referencing the original multipolygon (you will of course get the same id repeated for every part of the multipolygon), you can do the same by extracting just the bboxes, something like:
CREATE TABLE newTable AS
SELECT id, BOX2D((ST_Dump(the_geom)).geom) AS myBox FROM originalTable;
I am afraid I don't really get what you want, but you have a lot of possibilities with ST_Dump in cases like this.
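To make the difference between one box per multipolygon and one box per part concrete, here is a pure-Python sketch with no PostGIS involved. Polygons are represented as plain lists of (lon, lat) outer-ring vertices (holes ignored), a hypothetical representation chosen only for illustration; box_per_part mirrors what ST_Dump followed by Box2D produces, and box_of_whole mirrors Box2D on the whole multipolygon.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def box2d(points: List[Point]) -> Box:
    """Axis-aligned bounding box of a set of vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def box_per_part(multipolygon: List[List[Point]]) -> List[Box]:
    # Like ST_Dump followed by Box2D: one box per component polygon.
    return [box2d(polygon) for polygon in multipolygon]


def box_of_whole(multipolygon: List[List[Point]]) -> Box:
    # Like Box2D on the whole multipolygon: one box around every vertex.
    return box2d([p for polygon in multipolygon for p in polygon])


# Hypothetical outer rings roughly around Paris and Toronto.
paris = [(2.2, 48.8), (2.5, 48.8), (2.5, 48.9), (2.2, 48.9)]
toronto = [(-79.5, 43.6), (-79.2, 43.6), (-79.2, 43.8), (-79.5, 43.8)]
areas = [paris, toronto]

print(box_per_part(areas))
print(box_of_whole(areas))
```

The single box spans the Atlantic, which is the problem the question describes. To get back one geometry per id, one option (not taken from the answers here) would be to turn each dumped part into a rectangle with ST_Envelope and regroup the parts with the ST_Collect aggregate, GROUP BY id.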
You would have to box the relevant parts (say, the Canadian and French components) separately. A good tool for this in PostGIS is the geometry accessor ST_GeometryN(geometry, int) (reference: http://postgis.refractions.net/docs/ST_GeometryN.html). That page has a good example of combining the accessor with ST_NumGeometries.
UPDATE PER COMMENT:
Here is a simple example from San Francisco. This table contains a geometry field called the_geom, and record gid = 1 is a multipolygon with two component polygons, as reported by ST_NumGeometries (note that the index starts at 1, not 0):
=> select st_box2d(st_geometryn(the_geom, 1)) from tl_2009_06075_cousub00 where gid = 1;
st_box2d
-------------------------------------------------------------------------
BOX(-123.173828125 37.6398277282715,-122.935707092285 37.8230590820312)
(1 row)
=> select st_box2d(st_geometryn(the_geom, 2)) from tl_2009_06075_cousub00 where gid = 1;
st_box2d
----------------------------------------------------------------------------
BOX(-122.612289428711 37.7067184448242,-122.281776428223 37.9298248291016)
(1 row)