I'd like to create a multi index where: the first column is 'ID', the second column is Revenue and EBITDA, and the 5 different dates as columns.
Please see image of the dataframe attached. Thanks guys
https://i.stack.imgur.com/Rit5j.png
Got it eventually:
dataframe = dataframe.pivot_table(index=['ID'],values= ['Revenue','EBITDA'],columns='Date').stack(0,1)
Related
data_int['$5,000-6,999'] = data_int.iloc[1:5,6:8].sum(axis=1)
I tried to sum the columns 5 and 7, '5000-5999' and '6000-6999' respectively. However, the numbers are not displaying correctly, it shows e+.... Please help! Thanks!
So I am new to using Python Pandas dataframes.
I have a dataframe with one column representing customer ids and the other holding flavors and satisfaction scores that looks something like this.
Although each customer should have 6 rows dedicated to them, Customer 1 only has 5. How do I create a new dataframe that will only print out customers who have 6 rows?
I tried doing: df['Customer No'].value_counts() == 6 but it is not working.
Here is one way to do it
if you post data as a code (preferably) or text, i would be able to share the result
# create a temporary column 'c' by grouping on Customer No
# and assigning count to it using transform
# finally, using loc to select rows that has a count eq 6
(df.loc[df.assign(
c=df.groupby(['Customer No'])['Customer No']
.transform('count'))['c'].eq(6]
)
I am working with a dataframe and one of the columns has for values a list of strings in each row. The list contains a number of links (each list can have a different number of links). I want to create a new column that will be based on this column of lists but keep only the links that have the keyword "uploads".
To my example, the first entry of the column is like that:
['https://seekingalpha.com/instablog/5006891-hfir/4960045-natural-gas-daily',
'https://seekingalpha.com/article/4116929-weekly-natural-gas-storage-report',
'https://static.seekingalpha.com/uploads/2017/10/26/5006891-15090647719993095_origin.png',
'https://static.seekingalpha.com/uploads/2017/10/26/5006891-15090647854075453_origin.png',
'https://static.seekingalpha.com/uploads/2017/10/26/5006891-1509065004154725_origin.png',
'https://seekingalpha.com/account/research/subscribe?slug=hfir-energy&sasource=upsell']
And I want to keep only
['https://static.seekingalpha.com/uploads/2017/10/26/5006891-15090647719993095_origin.png',
'https://static.seekingalpha.com/uploads/2017/10/26/5006891-15090647854075453_origin.png',
'https://static.seekingalpha.com/uploads/2017/10/26/5006891-1509065004154725_origin.png']
And put the clean version in a new column of the same dataframe.
Can you please suggest a way to do it?
I just found a way where I create a function that looks within a list for a specific pattern (in my case the keyword "uploads")
def clean_alt_list(list_):
list_ = [s for s in list_ if "uploads" in s]
return list_
And then I apply this function into the column I am interested in
df['clean_links'] = df['links'].apply(clean_alt_list)
IIUC, this should work for you:
df = pd.DataFrame({'url': [['https://seekingalpha.com/instablog/5006891-hfir/4960045-natural-gas-daily', 'https://seekingalpha.com/article/4116929-weekly-natural-gas-storage-report', 'https://static.seekingalpha.com/uploads/2017/10/26/5006891-15090647719993095_origin.png', 'https://static.seekingalpha.com/uploads/2017/10/26/5006891-15090647854075453_origin.png', 'https://static.seekingalpha.com/uploads/2017/10/26/5006891-1509065004154725_origin.png', 'https://seekingalpha.com/account/research/subscribe?slug=hfir-energy&sasource=upsell']]})
df = df.explode('url').reset_index(drop=True)
df[df['url'].str.contains('uploads')]
Result:
url
2 https://static.seekingalpha.com/uploads/2017/1...
3 https://static.seekingalpha.com/uploads/2017/1...
4 https://static.seekingalpha.com/uploads/2017/1...
I'm just thinking in a hypothetical dataframe (df) with around 50 columns and 30000 rows, and one hypothetical column like e.g: Toy = ['Ball','Doll','Horse',...,'Sheriff',etc].
Now I only have the name of the column (Toy) and I want to know what are the variables inside the column without duplicated values.
I'm thinking an output like the .describe() function
df['Toy'].describe()
but with more info, because now I'm getting only this output
count 30904
unique 7
top "Doll"
freq 16562
Name: Toy, dtype: object
In other words, how do I get the 7 values in this column. I was thinking in something like copy the column and delete duplicated values, but I'm pretty sure that there is a shorter way. Do you know the right code or if I should use another library?
Thank you so much!
You can use unique() function to list out all the unique values in your columns. In your case, to list out the unique values in the column name toys in the dataframe df the syntax would look like
df["toys"].unique()
You can also use .drop_duplicates(), which returns a pandas Series:
df['toys'].drop_duplicates()
I have a pandas Data Frame where some of the id's are repeated a few times. I've written this code:
df = df["id"].value_counts()
and got this output
What should I do to get something like in the following image?
Thanks
As Quang Hoang answered, value_counts set the column you count as the index. Therefore in order to get the id and the count as columns, you need to do 2 things:
Make the counts as column - to_frame(name='B')
Reset the index to make the ids another column which we'll rename to the desired name: .reset_index().rename(columns={'index': 'A'})
So in one line it'll be:
df = df["id"].value_counts().to_frame(name='B').reset_index().rename(columns={'index': 'A'})
Another possible way is:
col = list(["A", "B")]
df.columns = col