For each element in the array, find the number of ways to represent it as a sum of two other numbers from the array - binary search

You are given the size of an array and the array itself. I know this problem can be solved efficiently with binary search, but I don't know how to apply it here. I would appreciate any help and code in C++ or Python.
Here is the input-output example:
input:
5
3 3 2 2 1
output:
2 2 0 0 0
P.S. Sorry for my English, I'm from Russia and I'm 16

Try the code below; it doesn't produce exactly the output you want, but it is very close:
import itertools

count = []
stuff = [3, 3, 2, 2, 1]
for L in range(0, len(stuff) + 1):
    for subset in itertools.combinations(stuff, L):
        if len(subset) == 2:
            add = sum(subset)
            count.append(len([i for i in stuff if i == add]))
print(count)
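If you want the exact per-element output, here is a minimal sketch of the binary-search idea (assuming a "way" is a pair of distinct positions i < j whose values sum to the element, which matches the sample): sort a copy of the array, then for each target value fix the first member of the pair and use bisect to count how many later positions hold the complement.

from bisect import bisect_left, bisect_right

def count_pair_sums(a):
    # For each element x of a, count index pairs (i, j), i < j, in the sorted
    # copy s with s[i] + s[j] == x.  Sorting plus bisect gives O(n^2 log n).
    s = sorted(a)
    result = []
    for x in a:
        ways = 0
        for i, v in enumerate(s):
            need = x - v
            # Search only to the right of position i so each pair is counted once.
            ways += bisect_right(s, need, i + 1) - bisect_left(s, need, i + 1)
        result.append(ways)
    return result

print(*count_pair_sums([3, 3, 2, 2, 1]))   # prints: 2 2 0 0 0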

How can I detect similarity of names in the same column

Guys I have a dataset like this:
import pandas as pd

df = pd.DataFrame(data=['John', 'gal britt', 'mona', 'diana', 'molly', 'merry',
                        'mony', 'molla', 'johnathon', 'dina'],
                  columns=['Name'])
df
it gives this output:
Name
0 John
1 gal britt
2 mona
3 diana
4 molly
5 merry
6 mony
7 molla
8 johnathon
9 dina
So I imagine that, to compare all names against each other and detect similarity, I would use df.merge(df, how="cross").
The thing is, the real data has 40,000 rows, and performing this cross merge would produce a very big dataset that I don't have the memory for.
Any algorithm or idea would really help, and I'll adjust the logic to my purposes.
I tried working with vaex instead of pandas to handle this huge amount of data, but I still run into insufficient memory.
In short: I KNOW that this algorithm, or this way of thinking about the problem, is wrong and inefficient.
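For reference, here is a minimal sketch of the cross-join idea described above on a toy subset (the sample names and the difflib similarity measure are just illustrative assumptions). It works for a handful of rows, but it materializes all n^2 pairs, which is exactly the memory problem on 40,000 names:

import pandas as pd
from difflib import SequenceMatcher

df = pd.DataFrame({'Name': ['John', 'mona', 'mony', 'molly', 'molla', 'dina']})

# Cross-join every name against every other name (n^2 rows).
pairs = df.merge(df, how='cross', suffixes=('_a', '_b'))
pairs = pairs[pairs['Name_a'] < pairs['Name_b']]   # keep each unordered pair once

# Score each pair with a simple string-similarity ratio.
pairs['similarity'] = [SequenceMatcher(None, a, b).ratio()
                       for a, b in zip(pairs['Name_a'], pairs['Name_b'])]
print(pairs.sort_values('similarity', ascending=False).head())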

[pandas] Dividing all elements of a column in a df by elements of another column (same df)

I'm sorry, I know this is basic but I've tried to figure it out myself for 2 days by sifting through documentation to no avail.
My code:
import numpy as np
import pandas as pd
name = ["bob","bobby","bombastic"]
age = [10,20,30]
price = [111,222,333]
share = [3,6,9]
list = [name,age,price,share]
list2 = np.transpose(list)
dftest = pd.DataFrame(list2, columns = ["name","age","price","share"])
print(dftest)
name age price share
0 bob 10 111 3
1 bobby 20 222 6
2 bombastic 30 333 9
Want to divide all elements in 'price' column with all elements in 'share' column. I've tried:
print(dftest[['price']/['share']]) - Failed
dftest['price']/dftest['share'] - Failed, unsupported operand type
dftest.loc[:,'price']/dftest.loc[:,'share'] - Failed
Wondering if I could just change everything to int or float, I tried:
dftest.astype(float) - Failed, can't convert from str to float
I've tried the iter and items methods but could not understand the printouts...
My only suspicion is to use something called iterate, which I am unable to wrap my head around despite reading other old posts...
Please help me T_T
Apologies in advance for the somewhat protracted answer, but the question is a little unclear about what exactly you're attempting to accomplish.
If you simply want price[0]/share[0], price[1]/share[1], etc. you can just do:
dftest['price_div_share'] = dftest['price'] / dftest['share']
The issue with the operand types can be solved by:
dftest['price_div_share'] = dftest['price'].astype(float) / dftest['share'].astype(float)
You're getting the cant convert from str to float error because you're trying to call astype(float) on the ENTIRE dataframe which contains string columns.
If you want to divide each item by each item, i.e. price[0] / share[0], price[1] / share[0], price[2] / share[0], price[0] / share[1], etc., you would need to iterate through the items and append each result to a new list. You can do that pretty easily with a for loop, although it may take some time if you're working with a large dataset. If you simply want the result, it would look something like this:
new_list = []
for p in dftest['price'].astype(float):
    for s in dftest['share'].astype(float):
        new_list.append(p / s)
If you want to get this in a new dataframe, you can save it using the pd.DataFrame() constructor:
new_df = pd.DataFrame(new_list, columns=['price_divided_by_share'])
This new dataframe would only have one column (the result, as mentioned above). If you want the information from the original dataframe as well, then you would do something like the following:
new_list = []
for n, a, p in zip(dftest['name'], dftest['age'], dftest['price'].astype(float)):
    for s in dftest['share'].astype(float):
        new_list.append([n, a, p, s, p / s])
new_df = pd.DataFrame(new_list, columns=['name', 'age', 'price', 'share', 'price_div_by_share'])
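As a side note, the first nested loop above (the one producing only the p/s values) can also be written with itertools.product, which yields every (price, share) combination; a small equivalent sketch:

from itertools import product

# Equivalent to the simple nested for-loops above: one result per (price, share) pair.
new_list = [p / s for p, s in product(dftest['price'].astype(float),
                                      dftest['share'].astype(float))]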
If you check the data types of your dataframe, you will realise that they are all strings/object type :
dftest.dtypes
name object
age object
price object
share object
dtype: object
The first step is to change the relevant columns to numbers - this is one way:
dftest = dftest.set_index("name").astype(float)
dftest.dtypes
age float64
price float64
share float64
dtype: object
This way you make the names a useful index and separate them from the numeric data. This is just a suggestion; you may have other reasons to leave names as a column - in that case, you have to change the data type of each numeric column individually (a sketch of that is shown after the example output below).
Once that is done, you can safely execute your code :
dftest.div(dftest.share,axis=0)
age price share
name
bob 3.333333 37.0 1.0
bobby 3.333333 37.0 1.0
bombastic 3.333333 37.0 1.0
I assume this is what you expect as your outcome; if not, you can tweak it. The main point is to get your data types converted to numbers before the computation/division can occur.
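If you would rather keep name as a regular column (as mentioned above), here is a minimal sketch of converting just the numeric columns before dividing:

# Convert only the numeric columns; 'name' stays as an ordinary string column.
dftest[['age', 'price', 'share']] = dftest[['age', 'price', 'share']].astype(float)
dftest['price'] / dftest['share']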

Count terms in strings and store the counts mapped to another value

I have a pandas dataframe which includes columns (amongst others) like this, with RATING being an integer from 0 to 5 and COMMENT being a string:
RATING COMMENT
1 some text
2 more text
3 other text
... ...
I would now like to mine (for lack of a better word) the key words from a list of strings:
list = ['like', 'trust', 'etc etc etc']
and would like to iterate through COMMENT and count the number of keywords by rating, to get a df out like so:
KEYWORD RATING COUNT
like 1 202
like 2 325
like 3 0
like 4 967
like 5 534
...
trust 1 126
....
How can I achieve this?
I am a beginner, so I would really appreciate your help (the simpler and more understandable, the better).
Thank you.
Hi, at the moment I have been iterating through manually,
i.e.:
# DATA_df is the original data
word_list = ['word', 'words', 'words', 'more']
values = [0] * len(word_list)
tot_val = [values] * 5
rating_table = pd.DataFrame(tot_val, columns=word_list)
for w in word_list:
    for g in range(len(DATA_df['COMMENT'])):
        if w in DATA_df['COMMENT'][g]:
            rating_table[w][DATA_df['RATING'][g] - 1] += 1
This gives a DF like so:
word words words more
0 0 0 0 0
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 0 0 0 0
that I am then trying to add to... it feels really clunky.
I managed to solve it. The key points learnt were: use groupby to pre-select the data based on the rating, which slices the data and makes it possible to work through the groups one at a time; also, str.lower() in combination with str.count() worked well.
I would be thankful if more experienced programmers could show me a better solution, but at least this works.
rating = [1, 2, 3, 4, 5]
rategroup = tp_clean.groupby('Rating')
#print(rategroup.groups)
results_list = []
for w in word_list:
    current = [w]
    for r in rating:
        stargroup = rategroup.get_group(str(r))
        found = stargroup['Content'].str.lower().str.count(w)
        c = found.sum()
        current.append(c)
    results_list.append(current)
results_df = pd.DataFrame(results_list, columns=['Keyword', '1 Star', '2 Star', '3 Star', '4 Star', '5 Star'])
The one thing I am still struggling with is how to use regex to make it look for full words only. I believe \b is the right tool, but how do I put it into the str.count function?
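One way to do that (a sketch, assuming the same stargroup and w as in the code above) is to build the pattern with re.escape and word boundaries; pandas' str.count treats its argument as a regular expression:

import re

# \b on each side makes the pattern match whole words only.
pattern = r'\b' + re.escape(w) + r'\b'
found = stargroup['Content'].str.lower().str.count(pattern)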

Extractive Text Summarization: Weighting sentence location in document

I am looking at an extractive text summarization problem. Eventually, I want to generate a list of words (not sentences) that seem to be the most important. One of the ideas I had was to weight the words that appear early in the document more heavily.
I have two dataframes. The first is a set of words with their occurrence counts:
words.head()
words occurrences
0 '' 2
1 11-1 1
2 2nd 1
3 april 1
4 b.
And the second is a set of sentences: 0 is the first sentence in the document, 1 is the second, etc.
sentences.head()
sentences
0 Site Menu expandHave a correction?...
1 This will be a chance for ...
2 The event will include...
3 Further, this...
4 Contact:Share:
I managed to accomplish my goal like this:
weights = []
for value in words.index.values:
    weights.append(((len(sentences) - sentences.index.values) *
                    sentences['sentences'].str.contains(words['words'][value])).sum())
weights
[0,
5,
5,
0,
12,...]
words['occurrences'] *= weights
words.head()
words occurrences
0 '' 0
1 11-1 5
2 2nd 5
3 april 0
4 b. 12
However, this seems sort of sloppy. I know that I can use a list comprehension (I thought it would be easier to read on here without one), but, other than that, does anyone have thoughts on a more elegant solution to this problem?
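For reference, the list-comprehension form mentioned above might look like this (a sketch using the same dataframe names; regex=False is an added assumption so that words such as '' or 2nd are not interpreted as regex patterns):

# Weight of a word = sum, over the sentences containing it, of the sentence's distance from the document end.
positions = len(sentences) - sentences.index.values
weights = [(positions * sentences['sentences'].str.contains(w, regex=False)).sum()
           for w in words['words']]
words['occurrences'] *= weights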

pandas df remove items from df1 that are also in df2

I have two very large csv files. They both have only one column, containing integers. For every integer in dfA, I need to check whether it is in dfB; if so, I need to remove that item from dfA.
I would probably loop through dfA and check every value against dfB, but looping is way too slow.
dfA :
0
0 9312969810
1 3045897298
2 8162414592
3 2030000000
4 7876904982
dfB:
0
0 2030000000
1 2030156119
2 2030389149
3 2030641047
4 2030693850
output:
0
0 2030156119
1 2030389149
2 2030641047
3 2030693850
Since 2030000000 is in dfB, we need to remove it from dfA.
Does anyone have a better way?
Thanks
Edit: the csv for dfB is 2 GB and dfA is 5 MB.
There's no 'magic bullet' here; you'll have to loop through each list at least once.
You can iterate through just one of the lists as follows (though I think that, under the hood, we iterate through both lists):
dfA = pd.read_csv(file1)
dfB = pd.read_csv(file2)
for n in dfB[0]:
    dfA = dfA[dfA[0] != n]
Alternatively, as Zero said, you can use isin, though I think that is still doing (more efficient) looping under the hood:
dfA[~dfA[0].isin(dfB[0])]
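For completeness, a minimal end-to-end sketch of the isin approach on the sample data (assuming the single column is labeled 0, as in the question):

import pandas as pd

dfA = pd.DataFrame({0: [9312969810, 3045897298, 8162414592, 2030000000, 7876904982]})
dfB = pd.DataFrame({0: [2030000000, 2030156119, 2030389149, 2030641047, 2030693850]})

# Keep only the rows of dfA whose value never appears in dfB.
print(dfA[~dfA[0].isin(dfB[0])])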