Pyshark dataframe column to string - pyshark

I have tried to convert a pyshark DataFrame column to a string.
The DataFrame has only one column, keys, whose values are like 1, 2, 3, 4.
Is it possible to convert the whole column to a single string?
The wanted outcome is something like this:
Keys = "(1,2,3,4)"
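A minimal sketch, assuming the data is actually in a pandas DataFrame (the question does not make clear whether "pyshark" means pandas, PySpark, or something else) with a hypothetical column named keys. It casts each value to a string, joins the values with commas, and wraps the result in parentheses:

```python
import pandas as pd

# Hypothetical stand-in for the asker's one-column DataFrame
df = pd.DataFrame({"keys": [1, 2, 3, 4]})

# Cast each value to str (str.join accepts only strings), join with commas,
# then wrap the joined text in parentheses
keys = "(" + ",".join(df["keys"].astype(str)) + ")"
print(keys)  # (1,2,3,4)
```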

Related

Aggregating multiple data types in pandas groupby

I have a data frame with rows that are mostly translations of other rows, e.g. an English row and an Arabic row. They share an identifier (location_shelfLocator), and I'm trying to merge the rows based on that identifier. In some columns the Arabic row contains no translation but repeats the English value (e.g. for the language column both records might have ['ger'], which becomes ['ger', 'ger']), so I would like to get rid of these duplicate values. This is my code:
df_merged = df_filled.groupby("location_shelfLocator").agg(
lambda x: np.unique(x.tolist())
)
It works when the values being aggregated are the same type (e.g. when they are both strings or when they are both arrays). When one is a string and the other is an array, it doesn't work. I get this warning:
FutureWarning: ['subject_name_namePart'] did not aggregate successfully. If any error is raised this will raise in a future version of pandas. Drop these columns/ops to avoid this warning.
df_merged = df_filled.groupby("location_shelfLocator").agg(lambda x: np.unique(x.tolist()))
and the offending column is removed from the final data frame. Any idea how I can combine these values and remove duplicates when they are both lists, both strings, or one of each?
Here is some sample data:
location_shelfLocator,language_languageTerm,subject_topic,accessCondition,subject_name_namePart
81055/vdc_100000000094.0x000093,ara,"['فلك، العرب', 'فلك، اليونان', 'فلك، العصور الوسطى', 'الكواكب']",المُلكية العامة,كلاوديوس بطلميوس (بطليمو)
81055/vdc_100000000094.0x000093,ara,"['Astronomy, Arab', 'Astronomy, Greek', 'Astronomy, Medieval', 'Constellations']",Public Domain,"['Claudius Ptolemaeus (Ptolemy)', ""'Abd al-Raḥmān ibn 'Umar Ṣūfī""]"
And expected output:
location_shelfLocator,language_languageTerm,subject_topic,accessCondition,subject_name_namePart
"['81055/vdc_100000000094.0x000093']",['ara'],"['فلك، العرب', 'فلك، اليونان', 'فلك، العصور الوسطى', 'الكواكب', 'Astronomy, Arab', 'Astronomy, Greek', 'Astronomy, Medieval', 'Constellations']","['المُلكية العامة', 'Public Domain']","['كلاوديوس بطلميوس (بطليمو)', 'Claudius Ptolemaeus (Ptolemy)', ""'Abd al-Raḥmān ibn 'Umar Ṣūfī""]"
If you have no control over the input values, you need to normalize them first.
For example, the following converts the plain string values in subject_name_namePart to lists of strings:
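from ast import literal_eval
# Select rows whose value is a plain string rather than a list literal
mask = df.subject_name_namePart.str[0] != '['
# Wrap those strings in list-literal syntax: "name" -> "['name']"
df.loc[mask, 'subject_name_namePart'] = "['" + df.loc[mask, 'subject_name_namePart'] + "']"
# Parse every literal into an actual Python list
df['subject_name_namePart'] = df.subject_name_namePart.transform(literal_eval)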
Then you can explode the lists into separate rows and aggregate:
df = df.explode('subject_name_namePart')
df = df.groupby('location_shelfLocator').agg(lambda x: x.unique().tolist())
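A minimal sketch of a more general variant, assuming every cell is either a plain scalar or a string holding a Python list literal (as in the sample data): normalize each cell to a list inside the aggregation function itself, then flatten and de-duplicate per group. This avoids asking np.unique to sort a mix of strings and lists, and it handles every column (including subject_topic) in one pass. The helper names to_list and merge_unique are illustrative, and the sample rows are simplified and hypothetical.

```python
from ast import literal_eval

import pandas as pd


def to_list(value):
    """Normalize a cell to a list: parse list literals, wrap anything else."""
    if isinstance(value, list):
        return value
    if isinstance(value, str) and value.startswith("["):
        return literal_eval(value)  # e.g. "['a', 'b']" -> ['a', 'b']
    return [value]  # plain scalar such as "ara" -> ['ara']


def merge_unique(series):
    """Flatten all cells in a group into one list, dropping repeats in first-seen order."""
    merged = []
    for cell in series:
        for item in to_list(cell):
            if item not in merged:
                merged.append(item)
    return merged


# Simplified, hypothetical sample: one plain-string cell and one list-literal cell
df = pd.DataFrame({
    "location_shelfLocator": ["81055/vdc_100000000094.0x000093"] * 2,
    "language_languageTerm": ["ara", "ara"],
    "subject_name_namePart": [
        "Claudius Ptolemaeus (Ptolemy)",
        """['Claudius Ptolemaeus (Ptolemy)', "'Abd al-Raḥmān ibn 'Umar Ṣūfī"]""",
    ],
})

merged = df.groupby("location_shelfLocator").agg(merge_unique)
print(merged)
```

Because to_list never wraps a plain string in quotes and re-parses it, a value containing an apostrophe (for example O'Brien) does not break literal_eval, whereas the "['" + ... + "']" wrapping above would produce an invalid literal for it. The startswith("[") check is itself a heuristic: a plain string that happens to begin with [ would be parsed as a literal.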

How to concatenate numerous column names in pandas?

I would like to concatenate all the columns, comma-delimited, in pandas.
But as you can see, it is a very laborious task, since I typed all the column indices manually:
de = data[3]+","+data[4]+","+data[5]+....+","+data[1511]
Is there a way to avoid this procedure in pandas with Python 3?
First convert all columns to strings with DataFrame.astype, then join the values in each row:
df = data.astype(str).apply(','.join, axis=1)
Or, after converting to strings, append a comma to every value, sum across each row, and remove the trailing comma with Series.str.rstrip:
df = data.astype(str).add(',').sum(axis=1).str.rstrip(',')
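A small sketch of the join approach on a hypothetical frame, assuming integer column labels like the data[3] ... data[1511] in the question. It also shows how to restrict the join to a label range: since .loc slicing includes both end labels, data.loc[:, 3:1511] would select exactly the columns the question lists.

```python
import pandas as pd

# Hypothetical frame with integer column labels, mirroring data[3] ... data[1511]
data = pd.DataFrame({0: ["a", "d"], 1: [1, 2], 2: ["x", "y"], 3: [True, False]})

# Join every column: astype(str) first, since str.join accepts only strings
all_cols = data.astype(str).apply(",".join, axis=1)

# Join only a label range; .loc slicing is inclusive of both end labels
some_cols = data.loc[:, 1:3].astype(str).apply(",".join, axis=1)

print(all_cols.tolist())   # ['a,1,x,True', 'd,2,y,False']
print(some_cols.tolist())  # ['1,x,True', '2,y,False']
```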

Convert Series to Dataframe where series index is Dataframe column names

I am selecting row by row as follows:
for i in range(num_rows):
    row = df.iloc[i]
As a result, I get a Series object whose row.index.values contains the names of the df columns.
Instead, I want a DataFrame with only one row and the original DataFrame's columns.
When I call row.to_frame(), instead of a 1x85 DataFrame (1 row, 85 columns) I get an 85x1 DataFrame whose index contains the column names, and row.to_frame().columns
outputs
Int64Index([0], dtype='int64').
All I want is the original DataFrame columns with only one row. How do I do that?
Or, how do I turn the row.index values into column labels and change the 85x1 shape to 1x85?
You just need to add .T (transpose):
row.to_frame().T
Alternatively, change your for loop to index with a list ([[i]]), which returns a one-row DataFrame directly:
for i in range(num_rows):
    row = df.iloc[[i]]
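A small sketch on a hypothetical three-column DataFrame showing that both approaches give a one-row result, plus one difference between them: a row taken as a Series is upcast to a single dtype (object here, because the columns are mixed), so the transposed frame loses the original column dtypes, while df.iloc[[i]] keeps them.

```python
import pandas as pd

# Hypothetical DataFrame with mixed column dtypes
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"], "c": [0.5, 1.5]})

# Series -> one-column frame -> transpose to one row; every column becomes object dtype
as_frame = df.iloc[0].to_frame().T

# List indexer returns a one-row DataFrame directly, keeping each column's dtype
as_frame_direct = df.iloc[[0]]

print(as_frame.shape, as_frame_direct.shape)  # (1, 3) (1, 3)
print(as_frame.dtypes.tolist())
print(as_frame_direct.dtypes.tolist())
```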

Is it possible that a dataframe that has empty strings for values in a column will be converted to null?

Some values in one of the columns in my dataframe are ''. When I save the df as a CSV using .to_csv() to use it in another function, I would like the CSV to contain null for those values.
Use replace:
df.replace('', 'null').to_csv(file)
# if you need to target only some columns
df.replace({'A': {'': 'null'}}).to_csv(file)
If the empty values are NaNs, pass the na_rep parameter to write all NaNs as another value:
df.to_csv(file, na_rep='null')
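A sketch combining both options on a hypothetical frame, writing to an in-memory io.StringIO buffer instead of a file so the result can be inspected. The empty string in column A becomes null through the per-column replace, the NaN in column C becomes null through na_rep, and the empty string in column B, which the replace does not target, stays an empty field.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical frame: empty strings in A and B, a NaN in C
df = pd.DataFrame({"A": ["x", ""], "B": ["", "y"], "C": [1.0, np.nan]})

buf = io.StringIO()
# Replace '' only in column A, and write every NaN as the text null
df.replace({"A": {"": "null"}}).to_csv(buf, index=False, na_rep="null")
csv_text = buf.getvalue()
print(csv_text)
```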

Convert PySpark DataFrame back to rows

I have some existing code that relies on data being in a row like this:
[u'0,1,1,5,0,1382,4,15']
In order to make some transformations, I had to convert my RDD to a DataFrame, so it now looks like this:
Row(a=u'1', code=u'ts=12206384', date=u'2014-10-05', cstat='200', substat=0, time=0, time=u'00:06:18', Target=0)
Is it possible to convert the Spark DataFrame back to its original row format so that the rest of my code will work?
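I'm going to assume you mean you want to get from a Row object back to a single string of comma-separated values.
Take your DataFrame of Row objects, go through its underlying RDD (since Spark 2.0, PySpark DataFrames no longer have a map method), and do the following:
df_of_row_objects.rdd.map(lambda row: ",".join(str(x) for x in row))
This code iterates through each Row, converts every field to a string (",".join raises a TypeError on non-string values such as integers), and joins the fields with commas. The result is an RDD of strings.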
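Running the snippet above needs a Spark session, so the sketch below isolates only the per-row logic. It assumes a collections.namedtuple as a stand-in for pyspark.sql.Row (Row is also a tuple subclass, so iterating over it yields its field values); the field names and values are hypothetical, loosely based on the question.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row, which is also a tuple subclass
Row = namedtuple("Row", ["a", "code", "date", "cstat", "substat", "Target"])


def row_to_csv(row):
    """Join every field of a Row-like tuple into one comma-separated string."""
    # str() is needed because ",".join raises TypeError on non-strings such as ints
    return ",".join(str(x) for x in row)


rows = [Row("1", "ts=12206384", "2014-10-05", "200", 0, 0)]
lines = [row_to_csv(r) for r in rows]
print(lines)  # ['1,ts=12206384,2014-10-05,200,0,0']
```

In Spark, the same function could be passed to the RDD, e.g. df_of_row_objects.rdd.map(row_to_csv). With this approach a null field would be written as the text None.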