Pandas str split. Can I skip line which gives troubles? - dataframe

I have a dataframe (all5) with one column of dates ('CREATIE_DATUM'). Sometimes the notation is 01/JAN/2015, sometimes it is written as 01-JAN-15.
I only need the year, so I wrote the following line of code:
all5[['Day','Month','Year']]=all5['CREATIE_DATUM'].str.split('-/',expand=True)
but I get the following error:
columns must be same length as key
so I assume somewhere in my dataframe (>100.000 lines) a value has more than two '/' signs.
How can I make my code skip this line?

You can use pd.to_datetime and then the .dt accessor to get the day, month and year:
x = pd.to_datetime(all5["CREATIE_DATUM"])
all5["Day"] = x.dt.day
all5["Month"] = x.dt.month
all5["Year"] = x.dt.year
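A note on the original error: str.split('-/') looks for the two-character sequence "-/" rather than for either character, so nothing is split and expand=True returns a single column, which cannot be assigned to three. The sketch below is one way to handle both notations and to find the rows that do not parse. The sample values, the two format strings and the names slash, dash, parsed and bad_rows are assumptions for illustration, and the month abbreviations are assumed to be English.

```python
import pandas as pd

# Hypothetical sample data: both notations plus one value that cannot be parsed.
all5 = pd.DataFrame({"CREATIE_DATUM": ["01/JAN/2015", "01-JAN-15", "not a date"]})

raw = all5["CREATIE_DATUM"]

# Parse each notation with its own explicit format.
# errors="coerce" turns every non-matching value into NaT instead of raising.
slash = pd.to_datetime(raw, format="%d/%b/%Y", errors="coerce")
dash = pd.to_datetime(raw, format="%d-%b-%y", errors="coerce")

# Take the slash result where it parsed, otherwise fall back to the dash result.
parsed = slash.fillna(dash)

all5["Year"] = parsed.dt.year

# Rows that matched neither notation, so they can be inspected rather than silently lost.
bad_rows = all5[parsed.isna()]
```

Rows that fail to parse get a missing year, so nothing has to be skipped by hand, and bad_rows shows which values caused the trouble.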

Related

The column label 'national_id' is not unique

I have two dataframes, but when I use this:
total_views.columns = [['national_id','total_views']]
ret.columns = [['national_id','count']]
merged_df = ret.merge(total_views, on= 'national_id')
I get the error "The column label 'national_id' is not unique",
but when I use this one:
total_views.columns = ['national_id','total_views']
ret.columns = ['national_id','count']
merged_df = ret.merge(total_views, on= 'national_id')
my code works properly.
I can't see any difference between one pair of brackets and two.
What is the difference between these two syntaxes?
Using double square brackets creates a MultiIndex for the columns, which is why you get the error. If you inspect total_views.columns after renaming the columns with double brackets, you get:
MultiIndex([('national_id',),
            ('total_views',)],
           )
Whereas running it after renaming the columns using single square brackets gives a normal index:
Index(['national_id', 'total_views'], dtype='object')
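The difference can be checked directly. The following is a minimal sketch with made-up one-row frames; it also shows one way to flatten an accidental one-level MultiIndex with get_level_values(0) before merging. The variable is_multi is only there for illustration.

```python
import pandas as pd

# Hypothetical one-row frames standing in for the asker's data.
total_views = pd.DataFrame([[1, 100]])
ret = pd.DataFrame([[1, 7]])

# A list inside a list is read as one array per level, giving a one-level MultiIndex.
total_views.columns = [["national_id", "total_views"]]
is_multi = isinstance(total_views.columns, pd.MultiIndex)

# Flatten the accidental MultiIndex back to a plain Index.
total_views.columns = total_views.columns.get_level_values(0)

# Single brackets give a plain Index straight away.
ret.columns = ["national_id", "count"]

merged_df = ret.merge(total_views, on="national_id")
```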

Using to_datetime several columns names

I am working with several CSVs in which the first N columns hold general information and the next M columns (M is large) hold data for specific dates.
(The original post showed a screenshot of the dataframe here.)
I need to convert only the names of columns N+1 to N+M-1 to date format.
I tried the following. In this case N+1 = 5 regardless of M, and I assumed I could use -1 to leave the last column name untouched.
ContDiarios.columns[5:-1] = pd.to_datetime(ContDiarios.columns[5:-1])
but I get the following error:
TypeError: Index does not support mutable operations
Assigning to a slice of the columns does not work, because an Index is immutable. Try it this way instead:
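def convert(label):
    # Return the label as a datetime if it parses, otherwise leave it unchanged.
    try:
        return pd.to_datetime(label)
    except (ValueError, TypeError):
        return label

# Replace the whole column index at once instead of mutating it in place.
ContDiarios.columns = [convert(c) for c in ContDiarios.columns]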
Alternatively, you can pass the same function to the df.rename method to convert the labels.
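If only the labels in positions 5 to -1 should be converted, as in the question, one option is to copy the labels into a list, convert that slice and assign the list back. This is a sketch on a made-up frame; the column names and the helper name new_cols are assumptions.

```python
import pandas as pd

# Hypothetical frame: five information columns, two date-labelled columns, one trailing column.
cols = ["a", "b", "c", "d", "e", "2020-01-01", "2020-01-02", "total"]
ContDiarios = pd.DataFrame([range(len(cols))], columns=cols)

# An Index cannot be modified in place, but a plain list copy of it can.
new_cols = list(ContDiarios.columns)

# Convert only the labels from position 5 up to, but not including, the last one.
new_cols[5:-1] = pd.to_datetime(ContDiarios.columns[5:-1])

# Assign the complete list back in one step.
ContDiarios.columns = new_cols
```

Unlike the try/except version, this fails loudly if a label inside the slice is not a valid date, which may or may not be what you want.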

How do I reverse each value in a column bit wise for a hex number?

I have a dataframe which has a column called hexa which has hex values like this. They are of dtype object.
hexa
0 00802259AA8D6204
1 00802259AA7F4504
2 00802259AA8D5A04
I would like to remove the first and last bytes (two hex digits each) and reverse the order of the remaining bytes, as follows:
hexa-rev
0 628DAA592280
1 457FAA592280
2 5A8DAA592280
Please help
I'll show you the complete solution up here and then explain its parts below:
def reverse_bits(bits):
    trimmed_bits = bits[2:-2]
    list_of_bits = [i+j for i, j in zip(trimmed_bits[::2], trimmed_bits[1::2])]
    reversed_bits = [list_of_bits[-i] for i in range(1,len(list_of_bits)+1)]
    return ''.join(reversed_bits)

df['hexa-rev'] = df['hexa'].apply(lambda x: reverse_bits(x))
There are probably several ways of doing it, but this one should solve your problem. The general strategy is to define a function and then use the apply() method to run it on every value in the column. It looks like this:
df['hexa-rev'] = df['hexa'].apply(lambda x: reverse_bits(x))
Now we need to define the function to apply. Note that each "bit" in the code is really a byte, written as a pair of hex characters. Breaking the function down into its parts: we first strip the first and last byte by slicing off two characters at each end. Because of how negative indexes work, this removes them regardless of the length of the string. The result is still a string, which we split into pairs, reorder and join back together.
def reverse_bits(bits):
    trimmed_bits = bits[2:-2]
The second line walks through the trimmed string, matches the first and second character of each pair together, and concatenates them into a single two-character string representing one byte.
def reverse_bits(bits):
    trimmed_bits = bits[2:-2]
    list_of_bits = [i+j for i, j in zip(trimmed_bits[::2], trimmed_bits[1::2])]
The second-to-last line builds the list you just made in reverse order. Lastly, the function joins that list and returns it as a single string.
def reverse_bits(bits):
    trimmed_bits = bits[2:-2]
    list_of_bits = [i+j for i, j in zip(trimmed_bits[::2], trimmed_bits[1::2])]
    reversed_bits = [list_of_bits[-i] for i in range(1,len(list_of_bits)+1)]
    return ''.join(reversed_bits)
I explained it in reverse order, but in practice you define the function first and then use apply() to run it on your column.
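For comparison, here is a shorter sketch that does the same job with the standard library's bytes.fromhex. It is an alternative to the function above, not the method from the answer, and the name reverse_hex_bytes is made up for this example. It assumes every value is a valid hex string with an even number of digits.

```python
def reverse_hex_bytes(hex_string):
    # Decode the hex text into raw bytes, e.g. "00802259AA8D6204" -> 8 bytes.
    raw = bytes.fromhex(hex_string)
    # Drop the first and last byte, then reverse the order of what is left.
    trimmed_reversed = raw[1:-1][::-1]
    # Encode back to hex text; .hex() gives lowercase, so match the input's uppercase style.
    return trimmed_reversed.hex().upper()


# Sample values taken from the question.
samples = ["00802259AA8D6204", "00802259AA7F4504", "00802259AA8D5A04"]
results = [reverse_hex_bytes(s) for s in samples]
```

On a dataframe it can be applied the same way as before, with df['hexa'].apply(reverse_hex_bytes).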

From one line iteration to loop to include exception management

I have two columns and I need to check whether the value in one column, all_news['Query'], is contained in another column, all['description'].
I found the following solution:
all_news['C'] = on.apply(lambda x: x.Query in x.description, axis=1)
but I get the following error:
TypeError: ("argument of type 'float' is not iterable", 'occurred at index 737')
This is likely because there are some weird characters that the iteration cannot decipher, and it seems I cannot add any exception handling to a one-line apply.
How can I unfold this one-line iteration into a for loop?
Result for index 737:
Query = 'medike international'
description = 'po ketvirtadienį praūžusios liūties nukentėjo ne tik kauno miestas, bet ir rajonas. pliaupiant lietui prie vilkijos vydūno alėjos esančioje apžvalgos aikštelėje ...'
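The error message says that the right-hand side of "in" was a float in some row. In pandas that is often a missing value (NaN) rather than a strange character, although that is an assumption about this data. The sketch below unfolds the one-liner into a loop with exception handling; the sample rows and the choice to record False for failing rows are made up for illustration.

```python
import pandas as pd

# Hypothetical data; float("nan") stands in for a missing description.
all_news = pd.DataFrame({
    "Query": ["medike international", "kauno"],
    "description": [float("nan"), "nukentėjo ne tik kauno miestas, bet ir rajonas"],
})

results = []
for idx, row in all_news.iterrows():
    try:
        # Same test as the lambda: is the query text contained in the description?
        results.append(row["Query"] in row["description"])
    except TypeError:
        # Raised when description is not a string (for example NaN); treat as no match.
        results.append(False)

all_news["C"] = results
```

Calling all_news['description'].isna().sum() beforehand shows how many rows are affected.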

How to change pandas Period Index to lower case? [duplicate]

I have a dataframe where I used
df.groupby(pd.PeriodIndex(df.columns, freq='Q'), axis=1).mean() to combine the monthly columns into quarters by taking the mean.
However, the resulting dataframe has columns like the ones below, and I could not change the upper case 'Q' into a lower case 'q'.
PeriodIndex(['2000Q1', '2000Q2', '2000Q3', '2000Q4', '2001Q1', '2001Q2',
'2001Q3', '2001Q4', '2002Q1', '2002Q2', '2002Q3', '2002Q4',
'2003Q1', '2003Q2', '2003Q3', '2003Q4', '2004Q1', '2004Q2',
'2004Q3', '2004Q4', '2005Q1', '2005Q2', '2005Q3', '2005Q4',
'2006Q1', '2006Q2', '2006Q3', '2006Q4', '2007Q1', '2007Q2',
'2007Q3', '2007Q4', '2008Q1', '2008Q2', '2008Q3', '2008Q4',
'2009Q1', '2009Q2', '2009Q3', '2009Q4', '2010Q1', '2010Q2',
'2010Q3', '2010Q4', '2011Q1', '2011Q2', '2011Q3', '2011Q4',
'2012Q1', '2012Q2', '2012Q3', '2012Q4', '2013Q1', '2013Q2',
'2013Q3', '2013Q4', '2014Q1', '2014Q2', '2014Q3', '2014Q4',
'2015Q1', '2015Q2', '2015Q3', '2015Q4', '2016Q1', '2016Q2',
'2016Q3'],
dtype='period[Q-DEC]', freq='Q-DEC')
I have tried using df.columns=[x.lower() for x in df.columns] and it gives an error: 'Period' object has no attribute 'lower'
This looks like a duplicate of the issue posted here: Convert pandas._period.Period type Column names to Lowercase
Basically, you'll want to reformat the Period output to have a lowercase q like so:
df.columns = df.columns.strftime('%Yq%q')
Alternatively, if you want to format the Period labels one by one, you can do something like:
# regroup the monthly columns into quarters, as in your question
quarterly = df.groupby(pd.PeriodIndex(df.columns, freq='Q'), axis=1).mean()
# format each Period label as a string such as '2000q1' and assign the result back
quarterly.columns = [p.strftime('%Yq%q') for p in quarterly.columns]
The %Y denotes the four-digit year, the first q is the literal lowercase "q" you want, and %q is the quarter number.
The documentation for a Period's strftime() method, which returns the formatted time string, lists the supported directives and has some nice examples at the bottom.
Looking at the methods listed in the Pandas documentation, lower() isn't an available method for the Period object, which is why you're getting this error (a PeriodIndex is just an array of Periods, which denote a chunk of time).
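To see the effect in isolation, here is a small sketch on a made-up frame whose columns are already quarterly Periods. The data and the names periods, has_lower and labels are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame with quarterly Period columns, like the result in the question.
periods = pd.period_range("2000Q1", "2000Q4", freq="Q")
df = pd.DataFrame([[1.0, 2.0, 3.0, 4.0]], columns=periods)

# A Period is not a string, so it has no lower() method.
has_lower = hasattr(df.columns[0], "lower")

# strftime turns each Period into text; "q" is a literal character, %q the quarter number.
df.columns = df.columns.strftime("%Yq%q")
labels = list(df.columns)
```

After this the column labels are plain strings, so period-specific operations on the columns are no longer available.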