Reading a .shp file with pyshp in Jupyter: UnicodeDecodeError on records

import numpy as np
import pandas as pd
import shapefile as shp
import matplotlib.pyplot as plt
import seaborn as sns
shp_path = r'seoul\seoul.shp'  # raw string so the backslash is not read as an escape sequence
sf = shp.Reader(shp_path)
sf.records()[1]
Running the last line raises this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 0: invalid start byte
How can I solve this problem?

You need to try a different encoding.
The pyshp documentation shows how to pass one to the shapefile.Reader constructor:
import shapefile as shp
shp_path = "Barrios_Cali/Barrios.shp"
sf = shp.Reader(shp_path, encoding="latin1")  # note the encoding option
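So find out the likely encoding of the shapefile's attribute data and try it. The text records come from the .dbf file; if a .cpg file sits next to it, it usually names the code page. For Korean data such as this Seoul shapefile, cp949 (Microsoft's extension of EUC-KR) is worth trying. Note that latin1 never raises a decoding error, because it maps every byte to a character, but it will turn Korean text into gibberish.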
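A minimal stdlib-only sketch of that search, under stated assumptions: `read_cpg` and `first_working_encoding` are hypothetical helper names, and the candidate list is a guess suited to Korean data. latin1 goes last because it accepts any byte sequence, so reaching it only proves the bytes are readable, not that the text is correct.

```python
import os

def read_cpg(shp_path):
    """Return the code page named in the .cpg sidecar next to shp_path, or None if absent."""
    cpg_path = os.path.splitext(shp_path)[0] + ".cpg"
    if not os.path.exists(cpg_path):
        return None
    with open(cpg_path, encoding="ascii") as f:
        return f.read().strip() or None

def first_working_encoding(sample, candidates=("utf-8", "cp949", "euc-kr", "latin1")):
    """Return the first candidate codec that decodes the bytes `sample` without error."""
    for enc in candidates:
        try:
            sample.decode(enc)
        except UnicodeDecodeError:
            continue  # this codec rejects the bytes; try the next one
        return enc
    return None  # no candidate could decode the sample

# "서울" encoded as cp949 starts with a byte that is invalid as a UTF-8 start byte,
# just like 0xba in the error above
print(first_working_encoding("서울".encode("cp949")))
```

Whatever encoding this suggests would then go to the reader, e.g. `shp.Reader(shp_path, encoding="cp949")`.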


Reading CSV content from a StringIO buffer raises EmptyDataError?

I need to read a CSV-formatted string with pandas, but pandas raises an error and I don't know why. Can anyone help?
import pandas as pd
import io
s = ',测试项,信息,结果\r\n0,软件测试机型805,软件测试机型805,PASS\r\n1,软件当前版本1,软件当前版本1,FAIL\r\n2,软件测试机型805,软件测试机型805,PASS\r\n3,软件当前版本1,软件当前版本1,FAIL\r\n4,软件测试机型805,软件测试机型805,PASS\r\n5,软件当前版本1,软件当前版本1,FAIL\r\n'
buf = io.StringIO()
buf.write(s)
df = pd.read_csv(buf)
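This fails with: EmptyDataError: No columns to parse from file
Here you go, buddy. buf.write() leaves the stream position at the end of the text, so read_csv() finds nothing left to read. Rewind with buf.seek(0) before reading: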
import pandas as pd
import io
s = ',测试项,信息,结果\r\n0,软件测试机型805,软件测试机型805,PASS\r\n1,软件当前版本1,软件当前版本1,FAIL\r\n2,软件测试机型805,软件测试机型805,PASS\r\n3,软件当前版本1,软件当前版本1,FAIL\r\n4,软件测试机型805,软件测试机型805,PASS\r\n5,软件当前版本1,软件当前版本1,FAIL\r\n'
buf = io.StringIO()
buf.write(s)
buf.seek(0)
df = pd.read_csv(buf)
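The cause can be seen with the standard library alone. A minimal sketch; the short CSV string is a stand-in for the one in the question:

```python
import io

s = ",name,result\n0,a,PASS\n1,b,FAIL\n"  # stand-in for the CSV string above

buf = io.StringIO()
buf.write(s)
pos_after_write = buf.tell()  # the position now sits at the end of the text
leftover = buf.read()         # '' : this empty remainder is all read_csv would see

buf.seek(0)                   # rewind to the start
full_text = buf.read()        # the whole CSV text again

# Passing the text to the constructor leaves the position at 0, so no seek is needed
buf2 = io.StringIO(s)
start_pos = buf2.tell()
```

With pandas, the constructor form becomes `pd.read_csv(io.StringIO(s))`, which avoids the separate write and seek calls.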

Blank Bokeh plot when reading data from dataframe and using time on x-axis

I am unable to display a plot using Bokeh. I am reading the data from a DataFrame. Here's a snippet of my Python code.
I am new to Bokeh and tried following some of the examples from the User Guide, but I can't figure out what's going wrong. Please advise.
import datetime
import pandas as pd
from bokeh.plotting import figure, show, output_file, output_notebook
from bokeh.models import ColumnDataSource
PATH_TO_CSV = "Sample_Data.csv"
output_notebook()
data = pd.read_csv(PATH_TO_CSV, index_col=False)
data['timestamp'] = pd.to_datetime(data['timestamp']).dt.strftime("%H:%M:%S")
source = ColumnDataSource(data)
p = figure(plot_width=400, plot_height=400, x_axis_type="datetime")
p.line('timestamp', 'event_msg', source=source)
show(p)
Here's a sample .csv:
event_msg,timestamp
Created,2019-03-02 13:19:44.164562-0700
Created,2019-03-02 13:20:32.212323-0700
Created,2019-03-02 13:20:56.582761-0700
Modified,2019-03-02 13:21:48.021752-0700
Deleted,2019-03-02 13:22:16.938382-0700
Modified,2019-03-02 13:22:22.139714-0700
Permission changed,2019-03-02 13:24:20.195975-0700
Deleted,2019-03-02 13:33:53.049900-0700
Modified,2019-03-02 13:33:56.266113-0700
Deleted,2019-03-02 13:33:59.757584-0700
I see a completely blank plot. Ideally, I would like to plot a separate line for each event message.
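The problem is the strftime() call: it turns the parsed timestamps back into plain strings, which the datetime x-axis cannot place. Keep them as datetime values instead: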
data['timestamp'] = pd.to_datetime(data['timestamp'])
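Your code should then look like this (tested with Bokeh v1.1.0). Because event_msg holds strings, the y-axis also needs a categorical range listing the event types: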
import pandas as pd
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import ColumnDataSource
PATH_TO_CSV = "Sample_Data.csv"
output_notebook()
# __file__ is undefined inside a notebook, so read the CSV by its path directly
data = pd.read_csv(PATH_TO_CSV, index_col=False)
# keep real datetime values; strftime() would turn them into strings
data['timestamp'] = pd.to_datetime(data['timestamp'])
source = ColumnDataSource(data)
# categorical y-range built from the distinct event messages
# (Bokeh 3.x replaced plot_width/plot_height with width/height)
p = figure(plot_width=400, plot_height=400, x_axis_type="datetime", y_range=data['event_msg'].unique().tolist())
p.line('timestamp', 'event_msg', source=source)
show(p)
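A quick way to see why the original conversion produced a blank plot is to compare the column types after each conversion. A minimal sketch that inlines three rows of the sample CSV as a string, only so it runs on its own:

```python
import io
import pandas as pd

# three rows copied from the sample CSV above
csv_text = """event_msg,timestamp
Created,2019-03-02 13:19:44.164562-0700
Modified,2019-03-02 13:21:48.021752-0700
Deleted,2019-03-02 13:22:16.938382-0700
"""
data = pd.read_csv(io.StringIO(csv_text))

# the question's conversion: parse, then format back into "HH:MM:SS" text
as_strings = pd.to_datetime(data['timestamp']).dt.strftime("%H:%M:%S")
# the answer's conversion: keep the parsed datetime values
as_datetimes = pd.to_datetime(data['timestamp'])

print(pd.api.types.is_datetime64_any_dtype(as_strings))    # False: text, not time
print(pd.api.types.is_datetime64_any_dtype(as_datetimes))  # True
```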

Wordcloud using matplotlib is not showing

My code is below:
# TF-IDF words word cloud
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import pandas as pd
tf = pd.DataFrame(columns=['word'])
tf['word'] = ['federalist', 'wrexham', 'remy', 'islamic', 'hegseth', 'hereford', 'gbt', 'sharenet', 'cpr', 'vegas', 'krvn', 'bandidos', 'nginx', 'manafort' , 'roth', 'kennedy' ,'pence', 'quantum']
wordcloud10 = WordCloud().generate(' '.join(tf['word']))
plt.imshow(wordcloud10)
plt.axis("off")
plt.title("")
plt.show()
display()
In Databricks, you have to call display() to see a chart. Despite the code above, I don't see a word cloud. The output I see is:
Out[1106]: <matplotlib.image.AxesImage at 0x7fae917806d0>
Out[1106]: (-0.5, 399.5, 199.5, -0.5)
Out[1106]: <matplotlib.text.Text at 0x7faeb31a6110>
Thanks for your help.
Screenshot from Databricks - No Chart
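One thing worth trying, assuming Databricks' display() renders a matplotlib Figure object passed to it: build the figure explicitly and hand it over, since the display() call above receives no argument and so has nothing to render. A minimal sketch; `make_image_figure` is a hypothetical helper, and a blank NumPy array stands in for the word cloud so the sketch runs without the wordcloud package.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

def make_image_figure(image, title=""):
    """Draw an image array on an explicit Figure and return that Figure."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.imshow(image, interpolation="bilinear")
    ax.axis("off")
    ax.set_title(title)
    return fig

# stand-in image; in the question's code this would be wordcloud10.to_array()
placeholder = np.zeros((200, 400, 3), dtype=np.uint8)
fig = make_image_figure(placeholder)
# In a Databricks cell (assumption, see above): display(fig)
```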

RDKit - Export pandas data frame with mol image

Is it possible to export a pandas DataFrame containing molecule images directly to an Excel file?
Thanks in advance,
RDKit's PandasTools module has a function for this, SaveXlsxFromFrame.
http://www.rdkit.org/Python_Docs/rdkit.Chem.PandasTools-module.html#SaveXlsxFromFrame
XlsxWriter must be installed.
import pandas as pd
from rdkit import Chem
from rdkit.Chem import PandasTools
smiles = ['c1ccccc1', 'c1ccccc1O', 'c1cc(O)ccc1O']
df = pd.DataFrame({'ID':['Benzene', 'Phenol', 'Hydroquinone'], 'SMILES':smiles})
df['Mol Image'] = [Chem.MolFromSmiles(s) for s in df['SMILES']]
PandasTools.SaveXlsxFromFrame(df, 'test.xlsx', molCol='Mol Image')

How do I enable the REFS_OK flag in nditer in numpy in Python 3.3?

Does anyone know how one goes about enabling the REFS_OK flag in numpy? I cannot seem to find a clear explanation online.
My code is:
import sys
import string
import numpy as np
import pandas as pd
SNP_df = pd.read_csv('SNPs.txt', sep='\t', index_col=None, header=None, nrows=101)
output = open('100 SNPs.fa', 'a')
for i in SNP_df:
    data = SNP_df[i]
    data = np.array(data)
    for j in np.nditer(data):
        if j == 0:
            output.write(("\n>%s\n") % (str(data(j))))
        else:
            output.write(data(j))
I keep getting the error message: Iterator operand or requested dtype holds references, but the REFS_OK was not enabled.
I cannot work out how to enable the REFS_OK flag so the program can continue...
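I have isolated the problem: there is no need to use np.nditer at all. np.nditer refuses object-dtype arrays, such as columns of strings read by pandas, unless REFS_OK is enabled. The main problem was that I misunderstood how Python assigns the loop variable in a for loop, and data(j) tries to call the array instead of indexing it with data[j]. The corrected code is below.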
import numpy as np
import pandas as pd
SNP_df = pd.read_csv('datafile.txt', sep='\t', index_col=None, header=None, nrows=5000)
with open('outputFile.fa', 'a') as output:
    for i in range(1, 51):
        data = np.array(SNP_df[i])
        # the first value of each column becomes the FASTA header line
        output.write("\n>%s\n" % str(data[0]))
        # the remaining values are written out as the sequence
        for k in range(1, len(data)):
            output.write(str(data[k]))
If you really do want to enable the flag, here is a working example.
Python 2.7, numpy 1.14.2, pandas 0.22.0
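import pandas as pd
import numpy as np
# get all data as a pandas DataFrame
data = pd.read_csv("./monthdata.csv")
print(data)
# get values as a numpy array (object dtype when columns hold strings)
data_ar = data.values  # numpy.ndarray, every element is a row
for row in data_ar:
    print(row)
    # "refs_ok" (lowercase) lets nditer iterate over arrays of Python object references;
    # "readonly" is enough because the loop only prints the values
    for month in np.nditer(row, flags=["refs_ok"], op_flags=["readonly"]):
        print(month)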
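The difference the flag makes can be checked on a small mixed-type row. A minimal sketch with made-up values; `iterate_without_flag` and `iterate_with_refs_ok` are hypothetical helper names, and the error is caught as either TypeError or ValueError so the sketch does not depend on which one a given NumPy version raises.

```python
import numpy as np

row = np.array(["Jan", 3, 4.5], dtype=object)  # mixed values give an object-dtype array

def iterate_without_flag(arr):
    """Try a plain nditer over arr; return the error it raises, or None if it works."""
    try:
        for _ in np.nditer(arr):
            pass
    except (TypeError, ValueError) as err:
        return err
    return None

def iterate_with_refs_ok(arr):
    """Collect the elements of arr using nditer with the refs_ok flag enabled."""
    # each item is a 0-d array; .item() unwraps it to the underlying Python object
    return [x.item() for x in np.nditer(arr, flags=["refs_ok"])]

# plain Python iteration reaches the same values without nditer at all
plain = list(row)
```

For simply walking over values, as in the FASTA example, plain iteration or indexing is usually the simpler choice.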