I have a plist with 12 objects with unique keys (let's say 1, 2, 3, etc). Each object contains an array (key: clubLocations) with 100 - 200 objects.
Each object in the clubLocations array contains a longitude and latitude key with a club's location.
I would like some assistance in creating a method which loops through every object in the clubLocations array of each of the 12 objects and finds out which clubLocation is the closest match to the user's coordinates.
So basically object 1/2/3/etc -> clubLocations objectAtIndex:0/1/2/3/etc -> best match?
I have the user's coordinates, so "just" need assistance to find the closest locations from my plist. Thank you in advance and please do not hesitate to ask in a comment if I am not making myself clear enough.
In pseudocode:
min_distance = MAXINT
closest = None
for obj in objects:
    for club in obj.clubLocations:
        d = distance(club.longitude, club.latitude, user.longitude, user.latitude)
        if d < min_distance:
            min_distance = d
            closest = club
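The pseudocode above leaves distance() undefined. A minimal Python sketch of the same loop follows, assuming the plist has already been loaded into a dictionary of objects (for example with the standard library's plistlib) and using the haversine great-circle formula as one possible distance function. The sample coordinates and the "name" key are made up for illustration; only clubLocations, latitude and longitude come from the question.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def closest_club(objects, user_lat, user_lon):
    """Scan every club in every object and keep the one with the smallest distance."""
    best_club, best_dist = None, float("inf")
    for obj in objects.values():
        for club in obj["clubLocations"]:
            d = haversine_km(club["latitude"], club["longitude"], user_lat, user_lon)
            if d < best_dist:
                best_club, best_dist = club, d
    return best_club, best_dist

# Hypothetical data standing in for the loaded plist.
objects = {
    "1": {"clubLocations": [{"name": "A", "latitude": 55.68, "longitude": 12.57},
                            {"name": "B", "latitude": 56.16, "longitude": 10.20}]},
    "2": {"clubLocations": [{"name": "C", "latitude": 55.40, "longitude": 10.39}]},
}
club, dist = closest_club(objects, 55.70, 12.55)
```

With 12 objects of 100 to 200 clubs each, this is at most a few thousand distance computations, so a plain linear scan should be fast enough without any spatial index.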
I have 2 data frames. One is called cuartos (rooms in English) and the other is called paredes (walls in English). They hold room temperatures and wall temperatures. I want to create a new data frame with the temperature difference between each wall and its room. For example
Room name = 2_APTO_1
Walls of the room = 2_APTO_1.FACE2, 2_APTO_1.FACE3 and 2_APTO_1.FACE4
The new data frame should be something like
2_APTO_1.FACE2 = 2_APTO_1.FACE2 - 2_APTO_1
2_APTO_1.FACE3 = 2_APTO_1.FACE3 - 2_APTO_1
2_APTO_1.FACE4 = 2_APTO_1.FACE4 - 2_APTO_1 ....
I tried this:
# get a list of paredes and cuartos columns
col_names_paredes = paredes.columns.tolist()
col_names_cuartos = cuartos.columns.tolist()
# check if col_names_paredes has col_names_cuartos names in it
for i in col_names_cuartos:
    for k in col_names_paredes:
        if col_names_paredes[k] in col_names_cuartos[i]:
            print(k)
I got this error
TypeError: list indices must be integers or slices, not str
any help would be appreciated.
When you write for i in col_names_cuartos, i takes the column names themselves, not the index values you would get with for i in range(len(col_names_cuartos)). That is why col_names_cuartos[i] raises the TypeError.
So you can use the following code instead (the room name is the substring of the wall name, so the test is col_cuartos in col_paredes):
for col_cuartos in col_names_cuartos:
    for col_paredes in col_names_paredes:
        if col_cuartos in col_paredes:
            print(col_paredes)
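To go one step further and pair each room with its walls, here is a small sketch that works on the column name lists only. The column names are taken from the example in the question plus one invented room (2_APTO_2), and matching on the room name followed by a dot is an assumption about the naming scheme; it avoids a room such as 2_APTO_1 accidentally matching walls of a hypothetical 2_APTO_10.

```python
# Column names as they might come out of cuartos.columns.tolist() and
# paredes.columns.tolist(); "2_APTO_2" is invented for the example.
col_names_cuartos = ["2_APTO_1", "2_APTO_2"]
col_names_paredes = ["2_APTO_1.FACE2", "2_APTO_1.FACE3", "2_APTO_1.FACE4", "2_APTO_2.FACE1"]

# Iterating over a list yields its values; use enumerate() when the index is needed too.
for index, room in enumerate(col_names_cuartos):
    print(index, room)

# Map every room to the wall columns whose name starts with "<room>.".
walls_by_room = {
    room: [wall for wall in col_names_paredes if wall.startswith(room + ".")]
    for room in col_names_cuartos
}
```

With the real data frames, each difference column could then be computed as paredes[wall] - cuartos[room] for every (room, wall) pair in walls_by_room, assuming both frames share the same row index.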
I'm sorry, I know this is basic but I've tried to figure it out myself for 2 days by sifting through documentation to no avail.
My code:
import numpy as np
import pandas as pd
name = ["bob","bobby","bombastic"]
age = [10,20,30]
price = [111,222,333]
share = [3,6,9]
list = [name,age,price,share]
list2 = np.transpose(list)
dftest = pd.DataFrame(list2, columns = ["name","age","price","share"])
print(dftest)
        name age price share
0        bob  10   111     3
1      bobby  20   222     6
2  bombastic  30   333     9
Want to divide all elements in 'price' column with all elements in 'share' column. I've tried:
print(dftest[['price']/['share']]) - Failed
dftest['price']/dftest['share'] - Failed, unsupported operand type
dftest.loc[:,'price']/dftest.loc[:,'share'] - Failed
Wondering if I could just change everything to int or float, I tried:
dftest.astype(float) - cant convert from str to float
I've tried the iter and items methods but could not understand the printouts...
My only suspicion is to use something called iterate, which I am unable to wrap my head around despite reading other old posts...
Please help me T_T
Apologies in advance for the somewhat protracted answer, but the question is somewhat unclear with regards to what exactly you're attempting to accomplish.
If you simply want price[0]/share[0], price[1]/share[1], etc. you can just do:
dftest['price_div_share'] = dftest['price'] / dftest['share']
The issue with the operand types can be solved by:
dftest['price_div_share'] = dftest['price'].astype(float) / dftest['share'].astype(float)
You're getting the cant convert from str to float error because you're trying to call astype(float) on the ENTIRE dataframe which contains string columns.
If you want to divide each item by each item, i.e. price[0] / share[0], price[0] / share[1], price[0] / share[2], price[1] / share[0], etc., you would need to iterate through each item and append the result to a new list. You can do that pretty easily with a for loop, although it may take some time if you're working with a large dataset. It would look something like this if you simply want the result:
new_list = []
for p in dftest['price'].astype(float):
    for s in dftest['share'].astype(float):
        new_list.append(p/s)
If you want to get this in a new dataframe you can simply pass the list to the pd.DataFrame() constructor:
new_df = pd.DataFrame(new_list, columns=['price_divided_by_share'])
This new dataframe would only have one column (the result, as mentioned above). If you want the information from the original dataframe as well, then you would do something like the following:
new_list = []
for n, a, p in zip(dftest['name'], dftest['age'], dftest['price'].astype(float)):
    for s in dftest['share'].astype(float):
        new_list.append([n, a, p, s, p/s])
new_df = pd.DataFrame(new_list, columns=['name', 'age', 'price', 'share', 'price_div_by_share'])
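As an alternative to the nested loops, the every-price-by-every-share case can be sketched with itertools.product from the standard library. The plain lists below stand in for the price and share columns after conversion to float; this is only an illustration of the pairing, not a replacement for the element-wise division shown first.

```python
from itertools import product

# Stand-ins for dftest['price'].astype(float) and dftest['share'].astype(float).
price = [111.0, 222.0, 333.0]
share = [3.0, 6.0, 9.0]

# product() yields pairs in the same order as the nested loops:
# for each price, every share in turn.
pairs = [(p, s, p / s) for p, s in product(price, share)]

for p, s, ratio in pairs:
    print(p, s, ratio)
```

The resulting list of tuples can be passed to pd.DataFrame with three column names if a dataframe is wanted.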
If you check the data types of your dataframe, you will realise that they are all strings/object type :
dftest.dtypes
name object
age object
price object
share object
dtype: object
The first step is to change the relevant columns to numbers. This is one way:
dftest = dftest.set_index("name").astype(float)
dftest.dtypes
age float64
price float64
share float64
dtype: object
This way you make the names a useful index, and separate them from the numeric data. This is just a suggestion; you may have other reasons to leave the names as a column. In that case, you have to change the data type of each numeric column individually.
Once that is done, you can safely execute your code :
dftest.div(dftest.share, axis=0)
                age  price  share
name
bob        3.333333   37.0    1.0
bobby      3.333333   37.0    1.0
bombastic  3.333333   37.0    1.0
I assume this is what you expect as your outcome. If not, you can tweak it. The main point is to get your data types converted to numbers before any computation or division can occur.
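The string columns come from np.transpose, which packs the names and the numbers into one array of a single type. A short sketch of two ways around that, assuming the same sample lists as in the question: build the dataframe from a dict so each column keeps its own type, or convert only the numeric columns of the existing frame with astype and a per-column mapping.

```python
import numpy as np
import pandas as pd

name = ["bob", "bobby", "bombastic"]
age = [10, 20, 30]
price = [111, 222, 333]
share = [3, 6, 9]

# Option 1: a dict of columns keeps one dtype per column, so division just works.
dftest = pd.DataFrame({"name": name, "age": age, "price": price, "share": share})
dftest["price_div_share"] = dftest["price"] / dftest["share"]

# Option 2: start from the all-strings frame built as in the question and
# convert only the numeric columns, leaving "name" untouched.
as_strings = pd.DataFrame(np.transpose([name, age, price, share]),
                          columns=["name", "age", "price", "share"])
converted = as_strings.astype({"age": float, "price": float, "share": float})
converted["price_div_share"] = converted["price"] / converted["share"]
```

Option 1 avoids the problem at the source; option 2 is the one to reach for when the frame already exists with string columns.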
I've used NLTK to pos_tag sentences in a pandas dataframe from an old Yelp competition. This returns a list of tuples (word, POS). I'd like to count the number of parts of speech for each instance. How would I, say, create a function to count the number of being verbs in each review? I know how to apply functions to features - no problem there. I just can't wrap my head around how to count things inside tuples inside lists inside a pd feature.
The head is here, as a tsv: https://pastebin.com/FnnBq9rf
Thank you @zhangyulin for your help. After two days, I learned some incredibly important things (as a novice programmer!). Here's the solution!
def NounCounter(x):
    nouns = []
    for (word, pos) in x:
        if pos.startswith("NN"):
            nouns.append(word)
    return nouns
df["nouns"] = df["pos_tag"].apply(NounCounter)
df["noun_count"] = df["nouns"].str.len()
As an example, for dataframe df, noun count of the column "reviews" can be saved to a new column "noun_count" using this code.
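from nltk import pos_tag, word_tokenize

def NounCount(x):
    # tokenize and tag the raw review text, then count tags starting with NN
    nounCount = sum(1 for word, pos in pos_tag(word_tokenize(x)) if pos.startswith('NN'))
    return nounCount
df["noun_count"] = df["reviews"].apply(NounCount)
df.to_csv('./dataset.csv')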
There are a number of ways you can do that. One very straightforward way is to map the list (or pandas Series) of tuples to an indicator of whether the word is a verb, and count the number of 1's you have.
Assume you have something like this (please correct me if it's not, as you didn't provide an example):
a = pd.Series([("run", "verb"), ("apple", "noun"), ("play", "verb")])
You can do something like this to map the Series and sum the count:
a.map(lambda x: 1 if x[1]== "verb" else 0).sum()
This will return you 2.
I grabbed a sentence from the link you shared:
text = nltk.word_tokenize("My wife took me here on my birthday for breakfast and it was excellent.")
tag = nltk.pos_tag(text)
a = pd.Series(tag)
a.map(lambda x: 1 if x[1]== "VBD" else 0).sum()
# this returns 2
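The same counting can be sketched without pandas on a single tagged review, which may make the "tuples inside lists" structure easier to see. The tagged list below is hand-written in the style of nltk.pos_tag output rather than produced by NLTK, and BE_FORMS is a hypothetical helper set for picking out being verbs specifically.

```python
from collections import Counter

# Hand-written sample in the (word, POS) format that nltk.pos_tag returns.
tagged = [("My", "PRP$"), ("wife", "NN"), ("took", "VBD"), ("me", "PRP"),
          ("here", "RB"), ("and", "CC"), ("it", "PRP"), ("was", "VBD"),
          ("excellent", "JJ")]

# Hypothetical set of forms of "to be" used to single out being verbs.
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}

def count_tags(tagged_words, prefix):
    """Count words whose POS tag starts with the given prefix, e.g. 'VB' or 'NN'."""
    return sum(1 for _, pos in tagged_words if pos.startswith(prefix))

def count_being_verbs(tagged_words):
    """Count verb-tagged words that are a form of 'to be'."""
    return sum(1 for word, pos in tagged_words
               if pos.startswith("VB") and word.lower() in BE_FORMS)

# Counter gives the frequency of every tag in one pass.
tag_counts = Counter(pos for _, pos in tagged)
```

Either function can be passed to df["pos_tag"].apply(...) to get one count per review, in the same way as NounCounter above.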
I'm writing an app where a group of people must mark each other. So I have a "Users" array like this:
0: paul
1: sally
2: james
3: bananaman
The first item Paul is marked (out of ten) by the other three, and then the second item Sally is marked by the other three (index 2, 3, 0) and so on, to create a "Results" array like this one:
0: paul, sally, 5
1: paul, james, 7
2: paul, bananaman, 9
3: sally, james, 4
I'm keeping track of the current 'scorer' and 'being_scored' integers as a new score gets added, which looks like this:
scorer = 1, being_scored = 0
scorer = 2, being_scored = 0
scorer = 3, being_scored = 0
scorer = 0, being_scored = 1
scorer = 2, being_scored = 1
However the group can stop scoring at any point, and a different group session could be loaded, which was also partially scored.
My question is how can I generate the 'scorer' and 'being_scored' values based only on the results [array count].
Presumably it's the [results count] divided by ([users count] - 1), with the resulting whole number giving 'being_scored' and the remainder giving the 'scorer'.
But my brain is utterly fried after a long week and this doesn't seem to be working.
Any help much appreciated
Mike.
Ignoring your added comment that the "Results" array is multi-dimensional, and assuming it simply contains structs/objects with three fields/properties (scored, scorer, score), then surely you just go to the last element of "Results" (at index [Results count]-1), select the scored and scorer, and move on to the next in your sequence. You presumably have logic for that already for the case where the loop was not interrupted (something like "if last scorer precedes being_scored [treating the array as a circular buffer by using modulo arithmetic] then advance being_scored and init scorer, else advance scorer").
But then that sounds rather obvious, but you did say your brain was fried...
Not ignoring your added comment implies you have a two-dimensional array of scores which you are filling up in some pattern? If this is a pre-allocated array of some number type, then if you init it with an invalid score (negative maybe?) you can scan the array following your pattern, looking for the first invalid score, and restart from there. If it is a dynamic single-dimensional array of single-dimensional arrays, then the count of the outer one tells you the being_scored, and the count of the last inner one tells you the scorer...
But that sounds rather obvious as well...
Maybe some sleep? Then reframe the question if you're still stuck? Or maybe this bear of little brain missed the point entirely and somebody else will figure out your question for you.
[This is more a comment than an answer, but it's too long for a comment, sorry.]
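The quotient-and-remainder idea from the question can be sketched in Python as below. The function name next_pair is made up for the example. The question shows two different scorer orderings (the listing goes 0, 2, 3 for Sally, while the text says 2, 3, 0), so both are covered: ascending order skipping the person being scored, and circular order starting from the next index. Which one applies is an assumption you would need to settle.

```python
def next_pair(results_count, users_count, circular=False):
    """Return (scorer, being_scored) for the next score, or None when all are in."""
    per_person = users_count - 1  # everyone is scored by all the others
    if results_count >= users_count * per_person:
        return None  # every pair has been scored
    # Quotient: who is being scored. Remainder: how many scores they already have.
    being_scored, k = divmod(results_count, per_person)
    if circular:
        # Scorers start at the next index and wrap around: e.g. 2, 3, 0 for index 1.
        scorer = (being_scored + 1 + k) % users_count
    else:
        # Scorers go in ascending order, skipping the person being scored.
        scorer = k if k < being_scored else k + 1
    return scorer, being_scored

# Reproduces the sequence listed in the question for four users.
sequence = [next_pair(n, 4) for n in range(5)]
print(sequence)
```

The remainder alone is not the scorer: it is the position within the current person's list of scorers, which then has to be shifted past the person being scored.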
so I am calling the twitter api:
openurl = urllib.urlopen("https://api.twitter.com/1/statuses/user_timeline.json?include_entities=true&contributor_details&include_rts=true&screen_name="+user+"&count=3600")
and it returns some long file like:
[{"entities":{"hashtags":[],"user_mentions":[],"urls":[{"url":"http:\/\/t.co\/Hd1ubDVX","indices":[115,135],"display_url":"amzn.to\/tPSKgf","expanded_url":"http:\/\/amzn.to\/tPSKgf"}]},"coordinates":null,"truncated":false,"place":null,"geo":null,"in_reply_to_user_id":null,"retweet_count":2,"favorited":false,"in_reply_to_status_id_str":null,"user":{"contributors_enabled":false,"lang":"en","profile_background_image_url_https":"https:\/\/si0.twimg.com\/profile_background_images\/151701304\/theme14.gif","favourites_count":0,"profile_text_color":"333333","protected":false,"location":"North America","is_translator":false,"profile_background_image_url":"http:\/\/a0.twimg.com\/profile_background_images\/151701304\/theme14.gif","profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/1642783876\/idB005XNC8Z4_normal.png","name":"User Interface Books","profile_link_color":"009999","url":"http:\/\/twitter.com\/ReleasedBooks\/genres","utc_offset":-28800,"description":"All new user interface and graphic design book releases posted on their publication day","listed_count":11,"profile_background_color":"131516","statuses_count":1189,"following":false,"profile_background_tile":true,"followers_count":732,"profile_image_url":"http:\/\/a2.twimg.com\/profile_images\/1642783876\/idB005XNC8Z4_normal.png","default_profile":false,"geo_enabled":false,"created_at":"Mon Sep 20 21:28:15 +0000 2010","profile_sidebar_fill_color":"efefef","show_all_inline_media":false,"follow_request_sent":false,"notifications":false,"friends_count":1,"profile_sidebar_border_color":"eeeeee","screen_name":"User","id_str":"193056806","verified":false,"id":193056806,"default_profile_image":false,"profile_use_background_image":true,"time_zone":"Pacific Time (US & Canada)"},"possibly_sensitive":false,"in_reply_to_screen_name":null,"created_at":"Thu Nov 17 00:01:45 +0000 2011","in_reply_to_user_id_str":null,"retweeted":false,"source":"\u003Ca href=\"http:\/\/twitter.com\/ReleasedBooks\/genres\" 
rel=\"nofollow\"\u003EBook Releases\u003C\/a\u003E","id_str":"136957158075011072","in_reply_to_status_id":null,"id":136957158075011072,"contributors":null,"text":"Digital Media: Technological and Social Challenges of the Interactive World - by William Aspray - Scarecrow Press. http:\/\/t.co\/Hd1ubDVX"},{"entities":{"hashtags":[],"user_mentions":[],"urls":[{"url":"http:\/\/t.co\/GMCzTija","indices":[119,139],"display_u
Well,
the different objects are split into tables and dictionaries and I want to extract the different parts, but to do this I have to know how many objects the file has:
example:
[{1:info , 2:info}][{1:info , 2:info}][{1:info , 2:info}][{1:info , 2:info}]
so to extract the info from 1 in the first table I would:
[0]['1']
>>>>info
But to extract it from the last object in the table I need to know how many objects the table has.
This is what my code looks like:
table_timeline = json.loads(twitter_timeline)
table_timeline_inner = table_timeline[x]
lines = 0
while lines < linesmax:
    in_reply_to_user_id = table_timeline_inner['in_reply_to_status_id_str']
    lines += 1
So how do I find the value of the last object in this table?
thanks
I'm not entirely sure this is what you're looking for, but to get the last item in a python list, use an index of -1. For example,
>>> alist = [{'position': 'first'}, {'position': 'second'}, {'position': 'third'}]
>>> print alist[-1]
{'position': 'third'}
>>> print alist[-1]['position']
third
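Applied to the timeline itself, the same idea looks like the sketch below, written for Python 3. The twitter_timeline string is a tiny hand-made stand-in for the API response, keeping only a few of the keys that appear in the question's sample; json.loads turns the top-level JSON array into an ordinary list, so len() gives the number of objects and a for loop replaces the manual counter.

```python
import json

# Tiny stand-in for the JSON text returned by the API call.
twitter_timeline = (
    '[{"id_str": "1", "in_reply_to_status_id_str": null, "text": "first"},'
    ' {"id_str": "2", "in_reply_to_status_id_str": "1", "text": "second"}]'
)

table_timeline = json.loads(twitter_timeline)  # a list of dicts

count = len(table_timeline)   # how many objects the list holds
last = table_timeline[-1]     # same element as table_timeline[count - 1]

# Looping over the list directly visits every object without tracking an index.
reply_ids = [status["in_reply_to_status_id_str"] for status in table_timeline]

print(count, last["text"], reply_ids)
```

JSON null becomes None in Python, so a missing reply id can be tested with "is None".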