convert csv file to nc(netcdf) file using xarray - numpy

I want to convert a CSV file that stores date, temperature, latitude, and longitude values to a NetCDF file with three dimensions.
My dataframe is like:
When I use the script below, the resulting file contains only one dimension.
import pandas as pd
import xarray as xr
df = pd.read_csv(csv_file)
xr = df.to_xarray()
nc=xr.to_netcdf('my_netcdf.nc')
Can you help me with that?
Thank you.

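You will need to set time, lon, and lat as the index in pandas first, because to_xarray() turns each index level into a dimension. Modify your code to something like this (adjust the column names to match your CSV):
import pandas as pd
import xarray as xr
df = pd.read_csv(csv_file)
df = df.set_index(["time", "lon", "lat"])
ds = df.to_xarray()  # named ds so the xarray alias xr is not overwritten
ds.to_netcdf('my_netcdf.nc')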
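To see why the index matters, here is a minimal sketch with a small made-up table (the column names and values are assumptions, since the original dataframe is not shown). It needs xarray installed, and it only inspects the resulting dataset rather than writing a file.

```python
import pandas as pd

# Hypothetical long-format data: one row per (time, lat, lon) combination.
df = pd.DataFrame({
    "time": pd.to_datetime(["2020-01-01"] * 4 + ["2020-01-02"] * 4),
    "lat": [10.0, 10.0, 20.0, 20.0] * 2,
    "lon": [30.0, 40.0, 30.0, 40.0] * 2,
    "temperature": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

# Each level of the MultiIndex becomes one dimension of the Dataset.
ds = df.set_index(["time", "lat", "lon"]).to_xarray()

dims = dict(ds.sizes)             # dimension name -> length
shape = ds["temperature"].shape   # one axis per index level
print(dims, shape)
```

Without set_index, the only index is the default row number, which is why the original script produced a single dimension.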

Related

Plotting dataframe with NAs with linearly joined points

I have a dataframe where each column has many missing values. How can I make a plot where the datapoints in each column are joined with lines, i.e. NAs are ignored, instead of having a choppy plot?
import numpy as np
import pandas as pd
pd.options.plotting.backend = "plotly"
d = pd.DataFrame(data = np.random.choice([np.nan] + list(range(7)), size=(10,3)))
d.plot(markers=True)
One way is to use this for each column:
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=y, name="linear",
line_shape='linear'))
Are there any better ways to accomplish this?
You can use pandas interpolate. I have demonstrated it with plotly express, chaining the calls so that the underlying data is not changed.
Following the comments on this post, I have amended the answer so that markers are not shown for interpolated points.
import numpy as np
import pandas as pd
import plotly.express as px
d = pd.DataFrame(data=np.random.choice([np.nan] + list(range(7)), size=(10, 3)))
px.line(d).update_traces(mode="lines+markers").add_traces(
px.line(d.interpolate(limit_direction="both")).update_traces(showlegend=False).data
)
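As a small check of the "underlying data is not changed" point, here is a pandas-only sketch with a made-up frame (the column names a and b are assumptions). interpolate returns a new frame, so the original keeps its NaNs and only the copy is gap-free.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps, standing in for the random data above.
d = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 2.0, 4.0]})

# interpolate() returns a new frame; limit_direction="both" also fills
# gaps at the start and end of a column.
filled = d.interpolate(limit_direction="both")

missing_before = int(d.isna().sum().sum())      # NaNs still in the original
missing_after = int(filled.isna().sum().sum())  # NaNs left in the copy
print(missing_before, missing_after)
```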

Dictionary type data sorting

I have this type of data
{"id":"colvera","gg_unique_id":"colvera","gg_unique_prospect_account_id":"cobra-hq-enterprises","completeness_score":100,"full_name":"chris olvera","first_name":"chris","last_name":"olvera","linkedin_url":"linkedin.com/in/colvera","linkedin_username":"colvera","facebook_url":null,"twitter_url":null,"email":"colvera#cobrahq.com","mobile_phone":null,"industry":"information technology and services","title":"independent business owner","company_name":"cobra hq enterprises","domain":"cobrahq.com","website":"cobrahq.com","employee_count":"1-10","company_linkedin_url":"linkedin.com/company/cobra-hq-enterprises","company_linkedin_username":"cobra-hq-enterprises","company_location":"raymore, missouri, united states","company_city":"raymore","company_state":"missouri","company_country":"united states"
I want to set "id", "gg_unique_id", etc. as column names and the values as rows. How can I do that?
I'm trying the following code, but nothing happens:
import pandas as pd
import numpy as np
data = pd.read_csv("1k_sample_data.txt")
data.info()
df = pd.DataFrame.from_dict(data)
df
I am new to this type of data; any help would be appreciated.
It looks like your data is in JSON format. With lines=True, pandas reads the file as one JSON object per line. Try:
df = pd.read_json("1k_sample_data.txt", lines=True)
print(df)
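Here is a minimal sketch of what lines=True does, using an in-memory string instead of the real file. The two records are made-up placeholders with a few of the keys from the question.

```python
from io import StringIO

import pandas as pd

# Hypothetical JSON Lines content: one JSON object per line.
raw = (
    '{"id": "a1", "full_name": "alice example", "employee_count": "1-10"}\n'
    '{"id": "b2", "full_name": "bob example", "employee_count": "11-50"}\n'
)

# Keys become column names and each line becomes one row.
df = pd.read_json(StringIO(raw), lines=True)
print(df)
```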

How can I get an interpolated value from a Pandas data frame?

I have a simple Pandas data frame with two columns, 'Angle' and 'rff'. I want to get an interpolated 'rff' value based on entering an Angle that falls between two Angle values (i.e. between two index values) in the data frame. For example, I'd like to enter 3.4 for the Angle and then get an interpolated 'rff'. What would be the best way to accomplish that?
import pandas as pd
data = [[1.0,45.0], [2,56], [3,58], [4,62],[5,70]] #Sample data
s= pd.DataFrame(data, columns = ['Angle', 'rff'])
print(s)
s = s.set_index('Angle') #Set 'Angle' as index
print(s)
result = s.at[3.0, "rff"]
print(result)
You may use numpy:
import numpy as np
np.interp(3.4, s.index, s.rff)
#59.6
You could use numpy for this:
import numpy as np
import pandas as pd
data = [[1.0,45.0], [2,56], [3,58], [4,62],[5,70]] #Sample data
s= pd.DataFrame(data, columns = ['Angle', 'rff'])
print(s)
print(np.interp(3.4, s.Angle, s.rff))
>>> 59.6
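Two details of np.interp are worth keeping in mind: it expects the x values in increasing order, and for inputs outside the data range it returns the first or last y value instead of extrapolating. A minimal sketch, where interp_rff is a hypothetical helper name and the sample rows are deliberately shuffled:

```python
import numpy as np
import pandas as pd

# Same sample data as above, in shuffled order.
data = [[4, 62], [1.0, 45.0], [3, 58], [5, 70], [2, 56]]
s = pd.DataFrame(data, columns=["Angle", "rff"])

def interp_rff(frame, angle):
    """Linearly interpolate 'rff' at the given angle (hypothetical helper)."""
    ordered = frame.sort_values("Angle")  # np.interp expects increasing x
    return float(np.interp(angle, ordered["Angle"], ordered["rff"]))

inside = interp_rff(s, 3.4)    # between Angle 3 and Angle 4
outside = interp_rff(s, 10.0)  # beyond the last Angle: clamped, not extrapolated
print(inside, outside)
```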

Parse list and create DataFrame

I have been given a list called data which has the following content
data=[b'Name,Age,Occupation,Salary\r\nRam,37,Plumber,1769\r\nMohan,49,Elecrician,3974\r\nRahim,39,Teacher,4559\r\n']
I want a pandas DataFrame that looks like the one in this link:
Expected Dataframe
How can I achieve this?
You can try this:
import pandas as pd
data=[b'Name,Age,Occupation,Salary\r\nRam,37,Plumber,1769\r\nMohan,49,Elecrician,3974\r\nRahim,39,Teacher,4559\r\n']
# Decode the bytes, drop carriage returns, then split into rows and fields
processed_data = [x.split(',') for x in data[0].decode().replace('\r', '').strip().split('\n')]
df = pd.DataFrame(columns=processed_data[0], data=processed_data[1:])
Hope it helps.
I would recommend converting this list to a string, since it contains only one element. The element is a bytes object, so join and decode it:
str1 = b''.join(data).decode()
Then use the solution provided here:
import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
import pandas as pd
TESTDATA = StringIO(str1)
df = pd.read_csv(TESTDATA, sep=",")
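A possible shortcut, offered as an alternative to both answers and not taken from them: read_csv accepts a binary file-like object, so the bytes can be wrapped in io.BytesIO without decoding first. This sketch assumes Python 3 and that the list holds a single bytes element, as in the question.

```python
from io import BytesIO

import pandas as pd

data = [b'Name,Age,Occupation,Salary\r\nRam,37,Plumber,1769\r\nMohan,49,Elecrician,3974\r\nRahim,39,Teacher,4559\r\n']

# Wrap the raw bytes in a file-like object; read_csv handles the
# \r\n line endings and infers numeric dtypes for Age and Salary.
df = pd.read_csv(BytesIO(data[0]))
print(df)
```

Unlike the manual split, this gives Age and Salary as integer columns instead of strings.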

Why does my Price column show in e-multiples?

I used the method below to strip "£" and "," from the currency column of my data. I also converted the non-numeric values to NaN.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
### Reading the excel file with dtype
df = pd.read_excel("Housing Market B16+5Miles.xlsx", dtype={"Price" : str})
df.loc[df['Price'] == 'POA','Price'] = np.nan
House_Price = df["Price"].str.replace(",","").str.replace("£","").astype("float")
del df['Price']
df["Price"] = House_Price
df
df.describe()
When I describe the dataframe, the "Price" column is shown in decimals with an e-value at the end. Why did this happen, and will it affect my analysis moving forward?
Pandas is displaying the large numbers in scientific notation. This is only a display setting; the stored values are unchanged, so it will not affect your analysis. You can change the display using pd.set_option('display.float_format', lambda x: '%.3f' % x)
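A minimal sketch of the display option, using made-up prices and pd.option_context so the setting applies only inside the with block (the thousands-separator format string is an assumption, not part of the original answer):

```python
import pandas as pd

# Hypothetical prices standing in for the cleaned "Price" column.
prices = pd.Series([250000.0, 1250000.0, 12500000.0], name="Price")

# Apply a fixed-point format with thousands separators, only within this block.
with pd.option_context("display.float_format", "{:,.2f}".format):
    text = prices.to_string()

print(text)
# The stored values are untouched; only their rendering changed.
largest = prices.max()
```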