Dictionary type data sorting - pandas

I have this type of data
{"id":"colvera","gg_unique_id":"colvera","gg_unique_prospect_account_id":"cobra-hq-enterprises","completeness_score":100,"full_name":"chris olvera","first_name":"chris","last_name":"olvera","linkedin_url":"linkedin.com/in/colvera","linkedin_username":"colvera","facebook_url":null,"twitter_url":null,"email":"colvera#cobrahq.com","mobile_phone":null,"industry":"information technology and services","title":"independent business owner","company_name":"cobra hq enterprises","domain":"cobrahq.com","website":"cobrahq.com","employee_count":"1-10","company_linkedin_url":"linkedin.com/company/cobra-hq-enterprises","company_linkedin_username":"cobra-hq-enterprises","company_location":"raymore, missouri, united states","company_city":"raymore","company_state":"missouri","company_country":"united states"
I want to use "id", "gg_unique_id", etc. as column names and the values as rows. How can I do that?
I'm trying the following code, but nothing happens:
import pandas as pd
import numpy as np
data = pd.read_csv("1k_sample_data.txt")
data.info()
df = pd.DataFrame.from_dict(data)
df
I am new to this type of data, so any help would be appreciated.

It looks like your data is in JSON format. If the file holds one JSON object per line, try:
df = pd.read_json("1k_sample_data.txt", lines=True)
print(df)
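To see what lines=True does without the original file, here is a minimal sketch that feeds two made-up records through an in-memory buffer. The records and their values are hypothetical; only the field names are borrowed from the question. Each JSON object becomes one row and each key becomes a column.

```python
import io
import pandas as pd

# Two made-up records in JSON Lines form (one JSON object per line).
# Only the field names come from the question; the values are hypothetical.
raw = (
    '{"id": "a1", "first_name": "ann", "mobile_phone": null}\n'
    '{"id": "b2", "first_name": "bob", "mobile_phone": null}\n'
)

# lines=True tells pandas to parse each line as a separate JSON object.
df = pd.read_json(io.StringIO(raw), lines=True)

print(df.columns.tolist())
print(len(df))
```

By contrast, pd.read_csv treats the same text as comma-separated values, which is why the original attempt did not produce the expected columns.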

Related

Cleaning DataFrame columns words starting with defined character

I have a Pandas DataFrame that I would like to clean a little bit.
import pandas as pd
data = ['This is awesome', '\$BTC $USD Short the market', 'Dont miss the dip on $ETH']
df = pd.DataFrame(data)
print(df)
I am trying to delete all words starting with "$", such as "$BTC", "$USD", etc., but I can't figure out how. Should I convert the column to a list? I would like to use startswith() but don't know exactly how. Thanks for your help!
Code-
import pandas as pd
data = ['This is awesome', '\$BTC $USD Short the market', 'Dont miss the dip on $ETH']
df = pd.DataFrame(data,columns=['data'])
df['data'] = df['data'].replace(r'\$\w+', "", regex=True)  # raw string avoids invalid-escape warnings
df
Output-
data
0 This is awesome
1 \ Short the market
2 Dont miss the dip on
Ref link- remove words starting with "#" in a column from a dataframe
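The replacement above leaves stray spaces behind where the words were removed. A possible follow-up, sketched here with the .str accessor, collapses the leftover whitespace as well. The sample strings omit the stray backslash from the question (assuming it was an escaping artifact), and the 'clean' column name is made up for illustration.

```python
import pandas as pd

# Sample data from the question, without the stray backslash (assumption).
data = ['This is awesome', '$BTC $USD Short the market', 'Dont miss the dip on $ETH']
df = pd.DataFrame(data, columns=['data'])

# 1) drop every word that starts with "$"
# 2) collapse runs of whitespace into a single space
# 3) trim leading and trailing spaces
df['clean'] = (
    df['data']
    .str.replace(r'\$\w+', '', regex=True)
    .str.replace(r'\s+', ' ', regex=True)
    .str.strip()
)

print(df['clean'].tolist())
```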

Having trouble with an Excel spreadsheet in Google Colab: a column is missing

Yes, this is homework, and no, I don't want an answer to the question. But for some reason the column I would like to move using pandas is missing, yet I can still see it in my end result. Why is this happening? This is what I have done:
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mpl
import seaborn as sns
#read xlsx file
df = pd.read_excel("https://docs.google.com/spreadsheets/d/e/2PACX-1vTd9TqybCunAe9HPPdb5mOW5uFn5m5fXO-mecfsn0TEk10_l8Bz1Kc7k13AFWoyvC1t3A7A27zozfTd/pub?output=xlsx")
df
#keeps the first 21 columns (the result is not assigned)
df.iloc[0:, 0:21]
#columns grouped by type float
df.iloc[0:, [0,2,4,9,10,11,12,13,14,15,16,17,18,19,20]]
#columns grouped by type object
df.iloc[0:, [1,3,5,6,7]]
#gets dummies and stores them in variables
type_float = df.iloc[0:, [0,2,4,9,10,11,12,13,14,15,16,17,18,19,20]]
type_object = df.iloc[0:, [1,3,5,6,7]]
#concatenates the two column groups into a new dataframe
df = pd.concat([type_float, type_object], axis='columns')
df
#rename
df.rename(columns = {'Attrition_Flag':'Target'}, inplace = True)
df
#Replacing target with 0/1
df['Target'].replace(['Existing Customer', 'Attrited Customer'],[0, 1], inplace=True)
df
This is where I'm having trouble.
When I try to move the column "Target", I can't. I've tried to pop it and then move it to the back, and when I try using "df.iloc[0:, [15]]", which should be its column, it just goes to the next column. Why does this column no longer exist?
I'm not sure I understand correctly what you need to do, but if you want to change the order of the columns (make 'Target' the last column), you can use:
all_columns_in_new_order = list(df.columns.drop('Target')) + ['Target']
and then:
df = df.reindex(all_columns_in_new_order, axis=1)
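Since the positions shift after pd.concat, it may be safer to look a column up by name than to rely on a hard-coded index. The sketch below uses a toy frame (the 'Age' and 'Gender' columns and all values are made up) to show how to find the current position of 'Target' and two ways to move it to the end.

```python
import pandas as pd

# Toy stand-in for the spreadsheet; everything except the name 'Target' is made up.
df = pd.DataFrame({'Age': [45, 51], 'Target': [0, 1], 'Gender': ['M', 'F']})

# Find the column's current position by name instead of assuming an index.
position = df.columns.get_loc('Target')
print(position)

# Option 1: reindex the columns, as in the answer above.
new_order = list(df.columns.drop('Target')) + ['Target']
reordered = df.reindex(new_order, axis=1)

# Option 2: pop the column and assign it back; assignment appends it at the end.
target = df.pop('Target')
df['Target'] = target

print(list(reordered.columns))
print(list(df.columns))
```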

Scraping Table from Website with Pandas Returning Empty Data Frame

I am trying to extract the 'Holdings' table from
https://www.ishares.com/us/products/268752/ishares-global-reit-etf
I use pandas, but it returns an empty dataframe containing only the column names.
Could anyone help me with this, please?
import pandas as pd
url = 'https://www.ishares.com/us/products/268752/ishares-global-reit-etf'
tab = pd.read_html(url)
df = pd.DataFrame(tab[6])

How can I get an interpolated value from a Pandas data frame?

I have a simple Pandas data frame with two columns, 'Angle' and 'rff'. I want to get an interpolated 'rff' value based on entering an Angle that falls between two Angle values (i.e. between two index values) in the data frame. For example, I'd like to enter 3.4 for the Angle and then get an interpolated 'rff'. What would be the best way to accomplish that?
import pandas as pd
data = [[1.0,45.0], [2,56], [3,58], [4,62],[5,70]] #Sample data
s= pd.DataFrame(data, columns = ['Angle', 'rff'])
print(s)
s = s.set_index('Angle') #Set 'Angle' as index
print(s)
result = s.at[3.0, "rff"]
print(result)
You may use numpy:
import numpy as np
np.interp(3.4, s.index, s.rff)
#59.6
You could use numpy for this:
import numpy as np
import pandas as pd
data = [[1.0,45.0], [2,56], [3,58], [4,62],[5,70]] #Sample data
s= pd.DataFrame(data, columns = ['Angle', 'rff'])
print(s)
print(np.interp(3.4, s.Angle, s.rff))
>>> 59.6
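Note that np.interp expects the x-coordinates (here the angles) to be in increasing order. If you would rather stay within pandas, one possible sketch is to add the requested angle to the index and let interpolate(method='index') fill the gap. The helper name interp_rff is made up for illustration.

```python
import numpy as np
import pandas as pd

data = [[1.0, 45.0], [2, 56], [3, 58], [4, 62], [5, 70]]  # sample data from the question
s = pd.DataFrame(data, columns=['Angle', 'rff']).set_index('Angle')

def interp_rff(angle):
    """Hypothetical helper: linearly interpolate 'rff' at the given angle."""
    # Add the requested angle to the index; its 'rff' value starts out as NaN.
    extended = s['rff'].reindex(s.index.union([angle]))
    # method='index' interpolates using the index values as x-coordinates.
    return extended.interpolate(method='index').loc[angle]

print(interp_rff(3.4))
print(np.interp(3.4, s.index, s['rff']))  # same value via numpy
```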

How to extract the unique values of a column and their counts and store them in a data frame with an index key

I am new to pandas. I have a simple question:
how do I extract the unique values of a column and their counts and store them in a data frame with an index key?
I have tried:
df = df1['Genre'].value_counts()
and I am getting a Series, but I don't know how to convert it to a DataFrame object.
A pandas Series has a .to_frame() method. Try it:
df = df1['Genre'].value_counts().to_frame()
And if you want to transpose the result so the unique values become columns:
df = df1['Genre'].value_counts().to_frame().T
Update: Full example if you want them as columns:
import pandas as pd
import numpy as np
np.random.seed(400) # To reproduce random variables
df1 = pd.DataFrame({
'Genre': np.random.choice(['Comedy','Drama','Thriller'], size=10)
})
df = df1['Genre'].value_counts().to_frame().T
print(df)
Returns:
       Thriller  Comedy  Drama
Genre         5       3      2
Try:
df = pd.DataFrame(df1['Genre'].value_counts())
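If you want the unique values in an ordinary column next to their counts, with a default integer index as the key, a possible sketch is to name the index and reset it. The sample genres and the column name 'count' are made up for illustration.

```python
import pandas as pd

# Made-up sample data for illustration.
df1 = pd.DataFrame({'Genre': ['Comedy', 'Drama', 'Comedy', 'Thriller', 'Comedy', 'Drama']})

counts = (
    df1['Genre']
    .value_counts()             # Series: unique values in the index, counts as values
    .rename_axis('Genre')       # name the index so it becomes a 'Genre' column
    .reset_index(name='count')  # move the index into a column, call the counts 'count'
)

print(counts)
```

The result has one row per unique genre, sorted by count in descending order, with a 0..n-1 integer index.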