I need to get the min, max, and average of an ArrayList using Java

The values come from a JUnit test case or from a Scanner, so the method has to work in both scenarios. The ArrayList looks something like this:
Utah 5
Nevada 6
California 12
Oregon 8
Utah 9
California 10
Nevada 4
Nevada 4
Oregon 17
California 6
I need to be able to calculate, for example, the average for Utah. I know how to compute an average; my main problem is selecting only the values whose name is Utah rather than all of them.
names = states, values = numbers, categories = what you are calculating
Here is the start of the method:
public static ArrayList<Double> summarizeData(ArrayList<String> names, ArrayList<Double> values, ArrayList<String> categories, int operation)
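The key step, selecting only the Utah entries, does not depend on the language. Assuming names and values are parallel lists (entry i of values belongs to entry i of names), one way is to walk both by index and keep a value only when its name matches the target. The sketch below shows that logic in Python; the function name stats_for and the (min, max, average) return tuple are hypothetical choices for illustration. In Java the same loop would test names.get(i).equals(target) before using values.get(i).

```python
def stats_for(names, values, target):
    """Return (min, max, average) of the values whose parallel name equals target.

    Hypothetical helper: names and values are assumed to be parallel lists,
    so values[i] belongs to names[i].
    """
    # Keep only the values paired with the requested name.
    selected = [v for n, v in zip(names, values) if n == target]
    if not selected:
        return None  # no entries for this name
    return min(selected), max(selected), sum(selected) / len(selected)


# The example data from the question, as two parallel lists.
names = ["Utah", "Nevada", "California", "Oregon", "Utah",
         "California", "Nevada", "Nevada", "Oregon", "California"]
values = [5.0, 6.0, 12.0, 8.0, 9.0, 10.0, 4.0, 4.0, 17.0, 6.0]

print(stats_for(names, values, "Utah"))  # (5.0, 9.0, 7.0)
```

Returning None for an unknown name avoids dividing by zero; a Java version could instead return an empty list or throw an exception, depending on what the JUnit tests expect.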

Related

Suitable Clustering Approach

I've got a total of 9 sensors in the ground, which measure the water content of the soil. Sensors 1-3 are at a depth of 1 m, sensors 4-6 at 2 m, and sensors 7-9 at 3 m.
My dataset also contains the precipitation at the location. It is hourly data:
Time              Sensor-ID  Precipitation  Soil Water Content
2022-01-01 11:00  1          74             120
2022-01-01 11:00  2          74             100
2022-01-01 11:00  3          74             110
...               ...        ...            ...
2022-01-01 11:00  9          74             30
The goal is to find out whether the soil at different depths behaves differently, in terms of water content over time, after it rains.
I thought about using a clustering method to see whether the sensors can be grouped by depth based on the data, which would confirm this. Since I'm not very experienced in data science: would that be the right approach, and is it even possible to analyse this with clustering?
For clustering, you can add a new column with three classes to your data (sensors 1-3: Class 1, sensors 4-6: Class 2, sensors 7-9: Class 3) and perform your analysis using the new classes. This can be done in Python, Power BI, or Excel.
You should start by analyzing the different variables with respect to the sensors at the different ground depths: use univariate, bivariate, and multivariate plots to work towards your goal.
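A minimal pandas sketch of the class-column idea, assuming the column names from the question's table. It uses only the rows actually shown there (sensors 4-8 were elided), derives the depth class from the sensor ID with integer division, and averages soil water content per class and time step. The column name "Depth Class" and the variable per_class are hypothetical. With the full hourly data, each column of per_class becomes a time series that can be plotted against precipitation.

```python
import pandas as pd

# Only the rows shown in the question's table (sensors 4-8 were elided there).
df = pd.DataFrame({
    "Time": pd.to_datetime(["2022-01-01 11:00"] * 4),
    "Sensor-ID": [1, 2, 3, 9],
    "Precipitation": [74, 74, 74, 74],
    "Soil Water Content": [120, 100, 110, 30],
})

# Map sensor IDs to depth classes: 1-3 -> 1 (1 m), 4-6 -> 2 (2 m), 7-9 -> 3 (3 m).
df["Depth Class"] = (df["Sensor-ID"] - 1) // 3 + 1

# Mean soil water content per depth class at each time step,
# one column per class, one row per timestamp.
per_class = (
    df.groupby(["Time", "Depth Class"])["Soil Water Content"]
      .mean()
      .unstack("Depth Class")
)
print(per_class)
```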

Minimum Number of Common Items in 2 Dynamic Stacks

I have a verbal algorithm question, so I have no code yet. The question is this: how can I create an algorithm for 2 dynamic stacks, each of which may or may not contain duplicate string items? For example, the first stack, s1, holds 3 breads, 4 lemons and 2 pens, and the second stack, s2, holds 5 breads, 3 lemons and 5 pens. I want to count the duplicates of each item in each stack and print the minimum of the two counts, for example:
bread --> 3
lemon --> 3
pen --> 2
How can I traverse the 2 stacks to the end and print the number of shared occurrences of each item? If anything is unclear, I can edit my question. Thanks.
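One way to do this, assuming each stack is modeled as a Python list whose end is the top: pop every item, tally the occurrences per stack in a hash map, then take the minimum of the two counts for each item that appears in both. The function name min_common_counts is hypothetical; collections.Counter is Python's built-in counting dictionary, and its & operator keeps each key present in both counters with the smaller count.

```python
from collections import Counter


def min_common_counts(s1, s2):
    """Return {item: min(count in s1, count in s2)} for items present in both stacks."""
    c1, c2 = Counter(), Counter()
    # Pop from copies so the caller's stacks are left untouched.
    a, b = list(s1), list(s2)
    while a:
        c1[a.pop()] += 1
    while b:
        c2[b.pop()] += 1
    # Counter intersection keeps the minimum count for keys found in both.
    return dict(c1 & c2)


s1 = ["bread"] * 3 + ["lemon"] * 4 + ["pen"] * 2
s2 = ["bread"] * 5 + ["lemon"] * 3 + ["pen"] * 5
for item, n in sorted(min_common_counts(s1, s2).items()):
    print(item, "-->", n)
```

With the example stacks this prints bread --> 3, lemon --> 3 and pen --> 2. Counting takes one pass over each stack, so the whole thing is linear in the total number of items.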

How to collapse multiple unique observations into one and find a mean?

Data: https://www.dropbox.com/s/c2yef22u96dd3s5/female_mentions_centrality_1.xlsx?dl=0
Data set screenshot:
I have a data set which looks like the picture above. It has multiple (unique) observations for the same Movie Name. For example, there are 3 unique observations for the movie Aan Milo Sajna and 2 for Aap Ke Saath.
Wherever there are multiple observations for a given Movie Name, I want them collapsed into a single observation in which each variable value is the mean of those observations.
For example, see below.
Transformed data set screenshot:
Movie Names with a single observation remain untouched, while the three observations for Aan Milo Sajna and the two observations for Aap Ke Saath are each collapsed into a single observation. Each variable value is replaced by the mean of the multiple observations, as shown in the picture.
How can I accomplish this?
import pandas as pd
df_mean = df.groupby('MOVIE NAME').mean(numeric_only=True).reset_index()  # mean of every numeric column per movie
      MOVIE NAME                FEMALE MENTIONS  TOTAL FEMALE CENTRALITY  FEMALE COUNT  AVERAGE FEMALE CENTRALITY
0     1920                               19.000                  258.417       140.500                      1.669
1     100 Days                           18.600                  435.320       153.000                      3.427
2     13B                                 2.333                   74.289        23.333                      1.259
3     1920 London                        14.500                  926.183       152.500                      3.118
4     1942: A Love Story                 11.000                  398.500        78.000                      5.109
...   ...                                   ...                      ...           ...                        ...
2029  Zindagi                             5.000                  119.667        45.667                      2.506
2030  Zindagi Na Milegi Dobara           13.000                  265.750       135.000                      1.865
2031  Zindagi Tere Naam                   2.500                   57.500        21.250                      3.689
2032  Zubeidaa                            0.000                 1260.122       101.000                     14.421
2033  Zulmi                               1.000                    5.333         4.000                      1.333
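A minimal, self-contained check of the groupby approach, run on hypothetical toy rows (only the column names and the Zulmi values come from the output table; the other numbers are made up for illustration). Passing as_index=False keeps MOVIE NAME as a regular column, which has the same effect as calling reset_index(); numeric_only=True makes pandas skip any non-numeric columns when averaging.

```python
import pandas as pd

# Hypothetical toy rows, not the real data set: three rows for one movie,
# two for another, and one movie with a single observation.
df = pd.DataFrame({
    "MOVIE NAME": ["Aan Milo Sajna"] * 3 + ["Aap Ke Saath"] * 2 + ["Zulmi"],
    "FEMALE MENTIONS": [2, 4, 6, 1, 3, 1],
    "FEMALE COUNT": [10, 20, 30, 5, 7, 4],
})

# One row per movie; each numeric column becomes the mean of that movie's rows.
df_mean = df.groupby("MOVIE NAME", as_index=False).mean(numeric_only=True)
print(df_mean)
```

Movies with a single row keep their original values, because the mean of one number is that number.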

Aggregate data based on values appearing in two columns interchangeably?

   home_team_name     away_team_name     home_ppg_per_odds_pre_game  away_ppg_per_odds_pre_game
0  Manchester United  Tottenham Hotspur                    3.310000                    4.840000
1  AFC Bournemouth    Aston Villa                          0.666667                    3.230000
2  Norwich City       Crystal Palace                       0.666667                   13.820000
3  Leicester City     Sunderland                           4.733333                    3.330000
4  Everton            Watford                              0.583333                    2.386667
5  Chelsea            Manchester United                    1.890000                    3.330000
home_ppg_per_odds_pre_game and away_ppg_per_odds_pre_game are essentially the same metric: the former represents its value for the home team, the latter for the away team. I want the mean of this metric for each team, regardless of whether the team is playing at home or away. In the example df, Manchester United appears as home_team_name in row 0 and as away_team_name in row 5, and I want a mean for Manchester United that includes both rows.
df.groupby("home_team_name")["home_ppg_per_odds_pre_game"].mean()
This only gives me the mean for games the team plays at home, but I want both home and away.
Since the two metrics are the same, you can append the home and away team metrics, like this:
import pandas as pd
data_df = pd.concat([df[['home_team_name', 'home_ppg_per_odds_pre_game']], df[['away_team_name', 'away_ppg_per_odds_pre_game']].rename(columns={'away_team_name': 'home_team_name', 'away_ppg_per_odds_pre_game': 'home_ppg_per_odds_pre_game'})], ignore_index=True)
Then you can use groupby to get the means:
data_df.groupby('home_team_name')['home_ppg_per_odds_pre_game'].mean().reset_index()
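A self-contained version of the same stacking idea, using the six example rows from the question. Instead of renaming the away columns to the home column names, it gives both halves neutral names with set_axis before concatenating (team and ppg_per_odds are hypothetical labels chosen for this sketch). The result has one row per team appearance, so groupby averages home and away games together.

```python
import pandas as pd

# The example rows from the question.
df = pd.DataFrame({
    "home_team_name": ["Manchester United", "AFC Bournemouth", "Norwich City",
                       "Leicester City", "Everton", "Chelsea"],
    "away_team_name": ["Tottenham Hotspur", "Aston Villa", "Crystal Palace",
                       "Sunderland", "Watford", "Manchester United"],
    "home_ppg_per_odds_pre_game": [3.31, 0.666667, 0.666667, 4.733333, 0.583333, 1.89],
    "away_ppg_per_odds_pre_game": [4.84, 3.23, 13.82, 3.33, 2.386667, 3.33],
})

# Give the home and away halves the same (hypothetical) column names, then stack them.
cols = ["team", "ppg_per_odds"]
home = df[["home_team_name", "home_ppg_per_odds_pre_game"]].set_axis(cols, axis=1)
away = df[["away_team_name", "away_ppg_per_odds_pre_game"]].set_axis(cols, axis=1)
long_df = pd.concat([home, away], ignore_index=True)

# Mean per team over all of its appearances, home or away.
team_means = long_df.groupby("team")["ppg_per_odds"].mean()
print(team_means)
```

For Manchester United this averages 3.31 (home, row 0) and 3.33 (away, row 5), giving about 3.32.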

VBA: loop through all the pivot fields of a pivot table and return specified values

I have a dataset whose entries have 5 different attributes and one value. For example, I have the heights of 5000 people. For each person I have their hair color, eye color, nationality, the city they were born in, and their mother's name (the 5 dimensions).
No/Eye Color/Hair Color/Nationality/Hometown/Mother's Name/Height
Blue Blond Swiss Zürich Nicole 184
Blue Brown English York Ruby 164
Brown Brown French Paris Sophie 154
etc.
So there are 5 dimensions. The data is set dynamically, so the number of categories in each dimension can vary. I want to compute the average height of people for any choice of dimensions to include (from 1 to 5). For example, I want to retrieve:
The average height of blue-eyed French people. The next day, only the people born in London. And the week after, the Swiss, blue-eyed, red-haired people born in Geneva whose mother is called Nicole.
So I created a pivot table with Eye Color as Row Labels, Hair Color as Column Labels, the average height as the Values and the last 3 dimensions as Report Filters. This lets me see all the possible combinations of average height that my data implies.
Now my goal is:
I want to create a macro that goes through all the possible combinations of my dimensions (i.e. 2^5 - 1 = 31) and stores in a vector every average height that is above a certain value, e.g. 190. It could then print them on a worksheet.
I was thinking of using an array of Booleans and a For Each...Next loop, but I must say I can't picture how to implement it.
Any ideas?
Thanks for the time and help!
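The question asks for VBA, but the enumeration itself is language-independent, so here is a minimal sketch of the logic in Python with pandas. It works on the raw table rather than through Excel's PivotTable object model. Each integer from 1 to 2^5 - 1 = 31 serves as a bitmask whose set bits pick the dimensions to include (the array-of-Booleans idea from the question); for each subset it averages Height per combination of categories and keeps the groups above the threshold. The function name averages_above and the Height column name are assumptions. In VBA the same outer loop could be a For mask = 1 To 31 ... Next loop that tests each bit with And.

```python
import pandas as pd

# Column names follow the table in the question; "Height" holds the value.
DIMENSIONS = ["Eye Color", "Hair Color", "Nationality", "Hometown", "Mother's Name"]


def averages_above(df, threshold):
    """Return (dimensions, category values, mean height) for every group above threshold."""
    results = []
    n = len(DIMENSIONS)
    # Each mask from 1 to 2**n - 1 encodes one non-empty subset of dimensions:
    # bit i set means DIMENSIONS[i] is included.
    for mask in range(1, 2 ** n):
        dims = [d for i, d in enumerate(DIMENSIONS) if mask & (1 << i)]
        means = df.groupby(dims)["Height"].mean()
        for key, avg in means.items():
            if avg > threshold:
                # A single grouping column gives scalar keys; wrap them as tuples.
                values = key if isinstance(key, tuple) else (key,)
                results.append((tuple(dims), values, avg))
    return results


# The three sample rows from the question.
sample = pd.DataFrame(
    [
        ["Blue", "Blond", "Swiss", "Zürich", "Nicole", 184],
        ["Blue", "Brown", "English", "York", "Ruby", 164],
        ["Brown", "Brown", "French", "Paris", "Sophie", 154],
    ],
    columns=DIMENSIONS + ["Height"],
)

for dims, values, avg in averages_above(sample, 180):
    print(dims, values, avg)
```

The usage example uses a threshold of 180 because nobody in the three sample rows is taller than 190; the collected tuples could then be written to a worksheet in one pass.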