adding href link to a pandas data frame


I have a sample dataframe:
Date          Announcement   href
Apr 9, 2020   Hello World    https://helloworld.com/
import pandas as pd
data = {'Date': ['c', 'Apr 8,2010'], 'Announcement': ['Hello World A', 'Hello World B'], 'href': ['https://helloworld.com', 'https://helloworldb.com']}
df = pd.DataFrame(data, columns=['Date', 'Announcement', 'href'])
df.to_excel('announce.xls', engine='xlsxwriter')
I am trying to figure out how to get the following output in the xls file: each cell in the Announcement column should be a link to the corresponding href.
Date          Announcement
Apr 9, 2020   Hello World (links to https://helloworld.com/)

Updated to embed the URL in the cell. The trick is to use the *.xlsx format, as opposed to the 1997 *.xls format:
import pandas as pd
data = {
    'Date': ['c', 'Apr 8,2010'],
    'Announcement': [
        '=HYPERLINK("http://helloworld.com", "Hello World A")',
        '=HYPERLINK("http://helloworldb.com", "Hello World B")',
    ],
}
df = pd.DataFrame(data, columns=['Date', 'Announcement'])
df.to_excel('announce.xlsx')
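If the URLs already sit in an href column, as in the question, the formulas do not have to be typed by hand. The following is a minimal sketch, not a definitive implementation: the helper name to_hyperlink is made up for illustration, and the date values are sample data taken from the question's table.

```python
import pandas as pd

# Sample data in the shape described in the question (dates are illustrative).
data = {
    "Date": ["Apr 9, 2020", "Apr 8, 2010"],
    "Announcement": ["Hello World A", "Hello World B"],
    "href": ["https://helloworld.com", "https://helloworldb.com"],
}
df = pd.DataFrame(data)


def to_hyperlink(url, label):
    """Build an Excel HYPERLINK formula (hypothetical helper)."""
    # Excel escapes a double quote inside a string literal by doubling it.
    url = url.replace('"', '""')
    label = label.replace('"', '""')
    return f'=HYPERLINK("{url}", "{label}")'


# Replace the plain text with a formula that links to the matching href.
df["Announcement"] = [
    to_hyperlink(url, label) for url, label in zip(df["href"], df["Announcement"])
]
out = df.drop(columns="href")
```

Writing the result with out.to_excel('announce.xlsx', index=False) should then give clickable cells, assuming an xlsx engine such as openpyxl or xlsxwriter is installed.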

Related

Compare each string with all other strings in a dataframe

I have this dataframe:
mylist = [
"₹67.00 to Rupam Sweets using Bank Account XXXXXXXX5343<br>11 Feb 2023, 20:42:25",
"₹66.00 to Rupam Sweets using Bank Account XXXXXXXX5343<br>10 Feb 2023, 21:09:23",
"₹32.00 to Nagori Sajjad Mohammed Sayyed using Bank Account XXXXXXXX5343<br>9 Feb 2023, 07:06:52",
"₹110.00 to Vikram Manohar Jsohi using Bank Account XXXXXXXX5343<br>9 Feb 2023, 06:40:08",
"₹120.00 to Winner Dinesh Gupta using Bank Account XXXXXXXX5343<br>30 Jan 2023, 06:23:55",
]
import pandas as pd
df = pd.DataFrame(mylist)
df.columns = ["full_text"]
ndf = df.full_text.str.split("to", expand=True)
ndf.columns = ["amt", "full_text"]
ndf2 = ndf.full_text.str.split("using Bank Account XXXXXXXX5343<br>", expand=True)
ndf2.columns = ["client", "date"]
df = ndf.join(ndf2)[["date", "client", "amt"]]
I have created embeddings for each client name:
from openai.embeddings_utils import get_embedding, cosine_similarity
import openai
openai.api_key = 'xxx'
embedding_model = "text-embedding-ada-002"
embeddings = df.client.apply(lambda x: get_embedding(x, engine=embedding_model))
df["embeddings"] = embeddings
I can now calculate the similarity index for a given string, for example "Rupam Sweet", using:
query_embedding = get_embedding("Rupam Sweet", engine="text-embedding-ada-002")
df["similarity"] = df.embeddings.apply(lambda x: cosine_similarity(x, query_embedding))
But I need the similarity score of each client across all other clients. In other words, the client names will be in rows as well as in columns and the score will be the data. How do I achieve this?

I managed to get the expected results using:
for k, i in enumerate(df.client):
    query_embedding = get_embedding(i, engine="text-embedding-ada-002")
    if i in df.columns:
        df[i + str(k)] = df.embeddings.apply(
            lambda x: cosine_similarity(x, query_embedding)
        )
    else:
        df[i] = df.embeddings.apply(lambda x: cosine_similarity(x, query_embedding))
I am not sure whether this is efficient for large data.
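Since the embeddings are already stored in the dataframe, the loop above requests each one a second time. A possible vectorised alternative, sketched here with NumPy, computes the whole client-by-client matrix in one step. The three short vectors are hypothetical stand-ins for the real embeddings so that the example runs without an API key.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in vectors; in the question these come from get_embedding().
clients = ["Rupam Sweets", "Rupam Sweets", "Nagori Sajjad Mohammed Sayyed"]
vectors = np.array(
    [
        [0.9, 0.1, 0.0],
        [0.9, 0.1, 0.0],
        [0.1, 0.8, 0.3],
    ]
)

# Scale every row to unit length so that a dot product equals cosine similarity.
unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

# One matrix product gives the similarity of every client with every other client.
sim = unit @ unit.T

# Client names as both row and column labels, scores as the data.
sim_df = pd.DataFrame(sim, index=clients, columns=clients)
```

With the real data, vectors could presumably be built with np.vstack(df["embeddings"]) and clients taken from df["client"]; no further API calls should be needed.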

Aggregated dict from pandas dataframe

My Pandas df is below. I wish to convert that to aggregated key-value pair. Below is what I have achieved and also where I am falling short.
import pandas as pd
import io
data = """
Name factory1 factory2 factory3
Philips China US
Facebook US
Taobao China Taiwan Australia
"""
df = pd.read_table(io.StringIO(data), sep=r"\s+")  # delim_whitespace is deprecated in recent pandas
df.set_index('Name').to_dict('index')
{'Philips': {'factory1': 'China', 'factory2': 'US', 'factory3': nan},
 'Facebook': {'factory1': 'US', 'factory2': nan, 'factory3': nan},
 'Taobao': {'factory1': 'China', 'factory2': 'Taiwan', 'factory3': 'Australia'}}
My expected output is:
{'Philips': {'China', 'US'},
 'Facebook': {'US'},
 'Taobao': {'China', 'Taiwan', 'Australia'}}
Is there some way to aggregate the values like this?

Try stack with groupby, then to_dict:
out = df.set_index('Name').stack().groupby(level=0).agg(set).to_dict()
Out[109]:
{'Facebook': {'US'},
 'Philips': {'China', 'US'},
 'Taobao': {'Australia', 'China', 'Taiwan'}}
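For comparison, here is a sketch of a more explicit variant that drops the missing values row by row, so it does not depend on how stack treats NaN. The dataframe is rebuilt inline, with None standing in for the empty cells, to keep the example self-contained.

```python
import pandas as pd

# Same data as in the question; None marks the cells that were empty.
df = pd.DataFrame(
    {
        "Name": ["Philips", "Facebook", "Taobao"],
        "factory1": ["China", "US", "China"],
        "factory2": ["US", None, "Taiwan"],
        "factory3": [None, None, "Australia"],
    }
)

# For each company, drop the missing cells and collect the rest into a set.
result = {
    name: set(row.dropna())
    for name, row in df.set_index("Name").iterrows()
}
```

iterrows is slower than the stack approach on large frames, so this is mainly useful when readability matters more than speed.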

Collapsing a pandas dataframe into a single column of all items and their occurrences

I have a data frame consisting of a mixture of NaNs and strings, e.g.
data = {'String1':['NaN', 'tree', 'car', 'tree'],
'String2':['cat','dog','car','tree'],
'String3':['fish','tree','NaN','tree']}
ddf = pd.DataFrame(data)
I want to:
1. Count the total number of occurrences of each item and put the counts in a new data frame, e.g.
NaN=2
tree=5
car=2
fish=1
cat=1
dog=1
2. Count the occurrences of each item against a separate, longer list (a column of another data frame), e.g.
df['compare'] =
NaN
tree
car
fish
cat
dog
rabbit
Pear
Orange
snow
rain
Thanks
Jason

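For the first question, flatten the frame and count with Counter. The last two lines map those counts onto the longer list from the second question:
import pandas as pd
from collections import Counter
data = {
    "String1": ["NaN", "tree", "car", "tree"],
    "String2": ["cat", "dog", "car", "tree"],
    "String3": ["fish", "tree", "NaN", "tree"],
}
ddf = pd.DataFrame(data)
# Count every value across all columns
a = Counter(ddf.stack().tolist())
df_result = pd.DataFrame(dict(a), index=['Count']).T
# Look up each item of the longer list; items that never occur map to NaN
df = pd.DataFrame({'vals': ['NaN', 'tree', 'car', 'fish', 'cat', 'dog', 'rabbit', 'Pear', 'Orange', 'snow', 'rain']})
df_counts = df.vals.map(df_result.to_dict()['Count'])
This should do it :)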

You can use the following code for count of items over all data frame.
import pandas as pd
data = {'String1': ['NaN', 'tree', 'car', 'tree'],
        'String2': ['cat', 'dog', 'car', 'tree'],
        'String3': ['fish', 'tree', 'NaN', 'tree']}
df = pd.DataFrame(data)
def get_counts(df: pd.DataFrame) -> dict:
    res = {}
    for col in df.columns:
        vc = df[col].value_counts().to_dict()
        for k, v in vc.items():
            if k in res:
                res[k] += v
            else:
                res[k] = v
    return res
counts = get_counts(df)
Output
>>> print(counts)
{'tree': 5, 'car': 2, 'NaN': 2, 'cat': 1, 'dog': 1, 'fish': 1}
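Both parts of the question can also be handled without an explicit loop. The sketch below flattens the frame into a single Series, counts the values, and then aligns the counts to the longer list. The variable names counts, compare and aligned are chosen here for illustration.

```python
import pandas as pd

ddf = pd.DataFrame(
    {
        "String1": ["NaN", "tree", "car", "tree"],
        "String2": ["cat", "dog", "car", "tree"],
        "String3": ["fish", "tree", "NaN", "tree"],
    }
)
compare = ["NaN", "tree", "car", "fish", "cat", "dog",
           "rabbit", "Pear", "Orange", "snow", "rain"]

# Part 1: flatten every cell into one Series and count each distinct value.
counts = pd.Series(ddf.to_numpy().ravel()).value_counts()

# Part 2: align the counts to the longer list; items that never occur get 0.
aligned = counts.reindex(compare, fill_value=0)
```

Note that 'NaN' is an ordinary string in this data, which is why it is counted like any other item. Real missing values would be skipped by value_counts unless dropna=False is passed.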

missing data in pandas profiling report

I am using Python 2.7 and Pandas Profiling to generate a report out of a dataframe. Following is my code:
import pandas as pd
import pandas_profiling
from pandas import Timestamp  # needed for the Timestamp(...) values below
# the actual dataset is very large, just providing the two elements of the list
data = [{'polarity': 0.0, 'name': u'danesh bhopi', 'sentiment': 'Neutral', 'tweet_id': 1049952424818020353, 'original_tweet_id': 1049952424818020353, 'created_at': Timestamp('2018-10-10 14:18:59'), 'tweet_text': u"Wouldn't mind aus 120 all-out but before that would like to see a Finch \U0001f4af #PakVAus #AUSvPAK", 'source': u'Twitter for Android', 'location': u'pune', 'retweet_count': 0, 'geo': '', 'favorite_count': 0, 'screen_name': u'DaneshBhope'}, {'polarity': 1.0, 'name': u'kamal Kishor parihar', 'sentiment': 'Positive', 'tweet_id': 1049952403980775425, 'original_tweet_id': 1049952403980775425, 'created_at': Timestamp('2018-10-10 14:18:54'), 'tweet_text': u'#the_summer_game What you and Australia think\nPlay for\n win \nDraw\n or....! #PakvAus', 'source': u'Twitter for Android', 'location': u'chembur Mumbai ', 'retweet_count': 0, 'geo': '', 'favorite_count': 0, 'screen_name': u'kaluparihar1'}]
df = pd.DataFrame(data) #data is a python list containing python dictionaries
pfr = pandas_profiling.ProfileReport(df)
pfr.to_file("df_report.html")
The screenshot of the part of the df_report.html file is below:
As you can see in the image, the Unique(%) field in all the variables is 0.0 although the columns have unique values.
Apart from this, the chart in the 'location' variable is broken. There is no bar for the values 22, 15, 4 and the only bar is for the maximum value only. This is happening in all the variables.
Any help would be appreciated.

pandas xlsxwriter stacked barchart

I am trying to insert a grouped bar chart into an Excel worksheet, but I can't seem to find a way to do so.
Here is my code:
bar_chart2 = workbook.add_chart({'type': 'column'})
bar_chart2.add_series({
    'name': 'Month over month product',
    'categories': '=Month over month!$H$2:$H$6',
    'values': '=Month over month!$I$2:$J$6',
})
bar_chart2.set_legend({'none': True})
worksheet5.insert_chart('F8', bar_chart2)
However, I get that.

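Using the data you provided, I reworked the example given in the XlsxWriter docs by jmcnamara to suit what you're looking for. Each column of values gets its own series, and the sheet name is wrapped in single quotes because it contains spaces.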
Full Code:
import pandas as pd
import xlsxwriter
headings = [' ', 'Apr 2017', 'May 2017']
data = [
    ['NGN', 'UGX', 'KES', 'TZS', 'CNY'],
    [5816, 1121, 115, 146, 1],
    [7089, 1095, 226, 120, 0],
]
#opening workbook
workbook = xlsxwriter.Workbook("test.xlsx")
worksheet5 = workbook.add_worksheet('Month over month')
worksheet5.write_row('H1', headings)
worksheet5.write_column('H2', data[0])
worksheet5.write_column('I2', data[1])
worksheet5.write_column('J2', data[2])
# beginning of OP snippet
bar_chart2 = workbook.add_chart({'type':'column'})
bar_chart2.add_series({
    'name': "='Month over month'!$I$1",
    'categories': "='Month over month'!$H$2:$H$6",
    'values': "='Month over month'!$I$2:$I$6",
})
bar_chart2.add_series({
    'name': "='Month over month'!$J$1",
    'categories': "='Month over month'!$H$2:$H$6",
    'values': "='Month over month'!$J$2:$J$6",
})
bar_chart2.set_title({'name': 'Month over month product'})
bar_chart2.set_style(11)
#I took the liberty of leaving the legend in there - it was commented in originally
#bar_chart2.set_legend({'none': True})
# end of OP snippet
worksheet5.insert_chart('F8', bar_chart2)
workbook.close()
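The code above produces a grouped (clustered) chart. Since the question title mentions a stacked chart, here is a sketch of that variant as well: XlsxWriter accepts a 'subtype' of 'stacked' for column charts, and the series can be added in a loop. The output path in the temp directory is an assumption made so the example can run anywhere.

```python
import os
import tempfile

import xlsxwriter

# Write to the temp directory so the example does not clutter the working dir.
path = os.path.join(tempfile.gettempdir(), "stacked_example.xlsx")

workbook = xlsxwriter.Workbook(path)
worksheet = workbook.add_worksheet("Month over month")

# Same layout as above: categories in column H, one value column per month.
worksheet.write_row("H1", [" ", "Apr 2017", "May 2017"])
worksheet.write_column("H2", ["NGN", "UGX", "KES", "TZS", "CNY"])
worksheet.write_column("I2", [5816, 1121, 115, 146, 1])
worksheet.write_column("J2", [7089, 1095, 226, 120, 0])

# 'subtype': 'stacked' stacks the series instead of placing them side by side.
chart = workbook.add_chart({"type": "column", "subtype": "stacked"})

# One series per value column; the sheet name is quoted because it has spaces.
for col in ("I", "J"):
    chart.add_series({
        "name": f"='Month over month'!${col}$1",
        "categories": "='Month over month'!$H$2:$H$6",
        "values": f"='Month over month'!${col}$2:${col}$6",
    })

chart.set_title({"name": "Month over month product"})
worksheet.insert_chart("F8", chart)
workbook.close()
```

Dropping the 'subtype' key should give the grouped layout again.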
