Changing the order of a DataFrame by 1 in Python - pandas

I have the following data frame:
df = [NaT, 1, 2, 3, 4, 5]
and I am trying to produce the same data frame with the order shifted by one, like this:
df = [1, 2, 3, 4, 5, NaT]
Any help would be much appreciated. Thank you in advance.

You can use .sort_values to sort the dataframe:
df = pd.DataFrame([pd.NaT, 1, 2, 3, 4, 5])
df = df.sort_values(by=[0]) # replace 0 with your column name
Which results in:
0
1 1
2 2
3 3
4 4
5 5
0 NaT
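By default, sort_values already places missing values (NaN/NaT) last; na_position makes that explicit, and reset_index(drop=True) gives a clean 0..n index afterwards. A minimal sketch:

```python
import pandas as pd

# same single-column frame as in the question; the column is named 0
df = pd.DataFrame([pd.NaT, 1, 2, 3, 4, 5])

# NaT sorts last by default (na_position='last'); reset the index for 0..5
out = df.sort_values(by=0, na_position='last').reset_index(drop=True)
print(out[0].tolist())
```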

Or use numpy's np.roll (note the shift of -1, which moves every element one position to the left):
import numpy as np
array = np.array([1, 2, 3, 4, 5])
array_new = np.roll(array, -1)  # [2 3 4 5 1]
print(array_new)
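Applied to the question's actual column (which starts with NaT), the same roll idea with a shift of -1 moves the leading NaT to the end; a sketch, assuming the single column is named 0:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([pd.NaT, 1, 2, 3, 4, 5])

# roll by -1 shifts each element one position left, wrapping NaT to the end
df[0] = np.roll(df[0].to_numpy(), -1)
print(df[0].tolist())
```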

Related

Combine unequal length lists to dataframe pandas with values repeating

How to add a list to a dataframe column such that the values repeat for every row of the dataframe?
mylist = ['one error','delay error']
df['error'] = mylist
This raises a length-mismatch error, as df has 2000 rows. I can still add it if I make mylist into a Series; however, that only fills the first row, and the output looks like this:
d = {'col1': [1, 2, 3, 4, 5],
'col2': [3, 4, 9, 11, 17],
'error':['one error', np.nan, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data=d)
However I would want the solution to look like this:
d = {'col1': [1, 2, 3, 4, 5],
'col2': [3, 4, 9, 11, 17],
'error': [['one error','delay error'], ['one error','delay error'], ['one error','delay error'], ['one error','delay error'], ['one error','delay error']]}
df = pd.DataFrame(data=d)
I have tried ffill() but it didn't work.
You can assign to the result of df.to_numpy(). Note that you'll have to use [mylist] instead of mylist, even though it's already a list ;)
>>> mylist = ['one error']
>>> df['error'].to_numpy()[:] = [mylist]
>>> df
col1 col2 error
0 1 3 [one error]
1 2 4 [one error]
2 3 9 [one error]
3 4 11 [one error]
4 5 17 [one error]
>>> mylist = ['abc', 'def', 'ghi']
>>> df['error'].to_numpy()[:] = [mylist]
>>> df
col1 col2 error
0 1 3 [abc, def, ghi]
1 2 4 [abc, def, ghi]
2 3 9 [abc, def, ghi]
3 4 11 [abc, def, ghi]
4 5 17 [abc, def, ghi]
It's not a very clean way to do it, but you can first expand mylist to match the number of rows in the dataframe, and only then assign it:
mylist = ['one error','delay error']
new_mylist = [mylist for i in range(len(df['col1']))]
df['error'] = new_mylist
Repeat the elements in mylist N times, where N is the ceiling of the dataframe length divided by the list length; then, when assigning, truncate the repeated list so it doesn't exceed the length of the dataframe:
df['error'] = (mylist * (len(df) // len(mylist) + 1))[:len(df)]
col1 col2 error
0 1 3 one error
1 2 4 delay error
2 3 9 one error
3 4 11 delay error
4 5 17 one error
df.assign(error=str(mylist))  # stores the list's string representation in every row
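Note that the assign approach stores a string, not a list. If the goal is for every row to hold the actual list object, one option (a sketch, not one of the answers above) is to repeat the list and wrap it in a Series so the lengths line up:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [3, 4, 9, 11, 17]})
mylist = ['one error', 'delay error']

# one list object per row; pd.Series with a matching index avoids the
# length-mismatch error from assigning a short bare list
df['error'] = pd.Series([mylist] * len(df), index=df.index)
print(df['error'].iloc[0])
```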

How to generate random numbers that sum to a specific value?

I have 2 dataframe as follows:
import pandas as pd
import numpy as np
# Create data set.
dataSet1 = {'id': ['A', 'B', 'C'],
'value' : [9,20,20]}
dataSet2 = {'id' : ['A', 'A','A','B','B','B','C'],
'id_2': [1, 2, 3, 2,3,4,1]}
# Create dataframe with data set and named columns.
df_map1 = pd.DataFrame(dataSet1, columns= ['id', 'value'])
df_map2 = pd.DataFrame(dataSet2, columns= ['id','id_2'])
df_map1
id value
0 A 9
1 B 20
2 C 20
df_map2
id id_2
0 A 1
1 A 2
2 A 3
3 B 2
4 B 3
5 B 4
6 C 1
where id_2 rows can repeat an id (that is, the ids in df_map2 are a subset of those in df_map1).
#doing a quick merge, based on id.
df = df_map1.merge(df_map2 ,on=['id'])
id value id_2
0 A 9 1
1 A 9 2
2 A 9 3
3 B 20 2
4 B 20 3
5 B 20 4
6 C 20 1
I can represent what's the relationship between id and id_2 as follows
id_ref = df.groupby('id')['id_2'].apply(list).to_dict()
{'A': [1, 2, 3], 'B': [2, 3, 4], 'C': [1]}
Now, I would like to generate a list of random integers for each row (5 elements, say, with values 0 to 2), put the lists into the pandas df, and explode:
import numpy as np
import random
df['random_value'] = df.apply(lambda _: np.random.randint(0,3, 5), axis=1)
id value id_2 random_value
0 A 9 1 [0, 0, 0, 0, 1]
1 A 9 2 [0, 2, 1, 2, 1]
2 A 9 3 [0, 1, 2, 2, 1]
3 B 20 2 [2, 1, 1, 2, 2]
4 B 20 3 [0, 0, 0, 0, 0]
5 B 20 4 [1, 0, 0, 1, 0]
6 C 20 1 [1, 2, 2, 2, 1]
The condition for generating this random_value list, is that sum of the list has to be equal to 9.
That means for id A: if we sum all the elements across its lists, we currently get a total of 13, but what we want is 9;
and the same applies to ids B and C, and so on.
Is there any way to achieve this?
# I was looking into np.random.multinomial... it seems this should be the solution, but I'm not sure how to apply it with pandas.
np.random.multinomial(9, np.ones(5)/5, size = 1)[0]
=> array([2,3,3,0,1])
2+3+3+0+1 = 9
ATTEMPT/IDEA ...
Given the list of id_2 values, e.g. id A maps to 3 distinct elements [1, 2, 3], we can draw one long list of
3 * 5 = 15 numbers, where
3 is the number of rows for that id, and
5 is the number of elements per list.
Hence
list_A = np.random.multinomial(9,np.ones(3*5)/(3*5) ,size = 1)[0]
and then we evenly distribute/split the list.
using this list comprehension:
[list_A [i:i + n] for i in range(0, len(list_A ), n)]
but I am still unsure how to do this dynamically.
The core idea is as you said (about getting 3*5=15 numbers), plus reshaping it into a 2D array with the same number of rows as that id has in the dataframe. The following function does that,
def generate_random_numbers(df):
    value = df['value'].iloc[0]
    list_len = 5
    num_rows = len(df)
    num_rand = list_len * num_rows
    return pd.Series(
        map(list, np.random.multinomial(value, np.ones(num_rand) / num_rand).reshape(num_rows, -1)),
        df.index
    )
And apply it:
df['random_value'] = df.groupby(['id', 'value'], as_index=False).apply(generate_random_numbers).droplevel(0)
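As a sanity check on why this satisfies the constraint: a single multinomial draw over num_rows*list_len equally weighted cells always sums to the drawn total, so after reshaping, a group's lists together sum to its value. A minimal sketch for id A (value 9, 3 rows, using the newer Generator API):

```python
import numpy as np

rng = np.random.default_rng(0)
value, num_rows, list_len = 9, 3, 5
num_rand = num_rows * list_len

# one draw of 15 non-negative counts that sum to 9, reshaped into 3 lists of 5
flat = rng.multinomial(value, np.ones(num_rand) / num_rand)
lists = flat.reshape(num_rows, list_len)
print(lists.sum())  # always equals value
```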

I need to get the values that have a specific result on .value_counts()

I want to get the values that have a specific count in .value_counts(), e.g.:
import numpy as np
import pandas as pd
data = np.array([3, 3, 5, 6, 7, 7, 7, 9, 9])
ser = pd.Series(data)
counts_nums = ser.value_counts()
print(counts_nums)
Here are the results:
7 3
9 2
3 2
6 1
5 1
dtype: int64
Now, I want to find a way to get the values that have a count equal to 2, which are 9 and 3. In other words, I want to index into the result of .value_counts().
What are the different ways of doing this?
Try this to get a list:
counts_nums[counts_nums == 2].index.tolist()
Output:
[9, 3]
counts_nums.loc[counts_nums==2]
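If you also want the rows of the original series that carry those values (not just the values themselves), Series.isin against that index works; a sketch:

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.array([3, 3, 5, 6, 7, 7, 7, 9, 9]))
counts_nums = ser.value_counts()

# values occurring exactly twice, then the original rows holding them
values = counts_nums[counts_nums == 2].index
subset = ser[ser.isin(values)]
print(sorted(values.tolist()))
print(subset.tolist())
```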

How to get the index of each increment in pandas series?

How do I get the indices of a pandas Series at which the value increments by one?
Ex. The input is
A
0 0
1 1
2 1
3 1
4 2
5 2
6 3
7 4
8 4
the output should be: [0, 1, 4, 6, 7]
You can use Series.duplicated and access the index, should be slightly faster.
df.index[~df.A.duplicated()]
# Int64Index([0, 1, 4, 6, 7], dtype='int64')
If you really want a list, you can do this,
df.index[~df.A.duplicated()].tolist()
# [0, 1, 4, 6, 7]
Note that duplicated (and drop_duplicates) will only work if your Series does not have any decrements.
Alternatively, you can use diff here, and index into df.index, similar to the previous solution:
np.insert(df.index[df.A.diff().gt(0)], 0, 0)
# Int64Index([0, 1, 4, 6, 7], dtype='int64')
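A variant of the diff idea that avoids the np.insert step: diff() leaves NaN at position 0, and ne(0) evaluates NaN != 0 as True, so the first row is kept automatically. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 1, 1, 2, 2, 3, 4, 4]})

# NaN != 0 is True, so row 0 is always included alongside real changes
idx = df.index[df.A.diff().ne(0)].tolist()
print(idx)
```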
You can also use drop_duplicates:
df.drop_duplicates('A').index.tolist()
[0, 1, 4, 6, 7]
This makes sure the next row's value is greater by exactly one (not by two or anything else!):
df[ ((df.A.shift(-1) - df.A) == 1.0)].index.values
output is numpy array:
array([2, 5])
Example:
# * * here value increase by 1
# 0 1 2 3 4 5 6 7
df = pd.DataFrame({ 'A' : [1, 1, 1, 2, 8, 3, 4, 4]})
df[ ((df.A.shift(-1) - df.A) == 1.0)].index.values
array([2, 5])

How to convert a pandas column containing lists into a dataframe

I have a pandas dataframe.
One of its columns contains a list of 60 elements, constant across its rows.
How do I convert each of these lists into a row of a new dataframe?
Just to be clearer: say A is the original dataframe with n rows. One of its columns contains a list of 60 elements.
I need to create a new dataframe nx60.
My tentative:
def expand(x):
    return pd.DataFrame(np.array(x).reshape(-1, len(x)))
df["col"].apply(lambda x: expand(x))
It gives funny results...
The weird thing is that if I call the function expand on a single row, it does exactly what I expect from it:
expand(df["col"][0])
To ChootsMagoots: This is the result when I try to apply your suggestion. It does not work.
Sample data
df = pd.DataFrame()
df['col'] = np.arange(4*5).reshape(4,5).tolist()
df
Output:
col
0 [0, 1, 2, 3, 4]
1 [5, 6, 7, 8, 9]
2 [10, 11, 12, 13, 14]
3 [15, 16, 17, 18, 19]
Now extract a DataFrame from col:
df.col.apply(pd.Series)
Output:
0 1 2 3 4
0 0 1 2 3 4
1 5 6 7 8 9
2 10 11 12 13 14
3 15 16 17 18 19
Try this:
new_df = pd.DataFrame(df["col"].tolist())
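If A has a non-default index that the new frame should keep, you can pass it explicitly; a small sketch (the example index values here are hypothetical):

```python
import numpy as np
import pandas as pd

# frame whose index is not the default 0..n
df = pd.DataFrame(index=[10, 20, 30, 40])
df['col'] = np.arange(4 * 5).reshape(4, 5).tolist()

# expand the list column into its own frame, keeping the original index
new_df = pd.DataFrame(df['col'].tolist(), index=df.index)
print(new_df.shape)
```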
This is a little frankensteinish, but you could also try:
import numpy as np
np.savetxt('outfile.csv', np.array(df['col'].tolist()), delimiter=',')
new_df = pd.read_csv('outfile.csv')
You can try this as well:
newCol = pd.Series(yourList)
df['colD'] = newCol.values
The above code:
1. Creates a pandas series.
2. Maps the series value to columns in original dataframe.