The picture can't display in Google Colab

python, pyecharts, google colab
It seems to generate the chart, but why can't I see anything?

from pyecharts.globals import CurrentConfig, NotebookType
CurrentConfig.NOTEBOOK_TYPE = NotebookType.JUPYTER_LAB
from pyecharts.charts import Bar
from pyecharts import options as opts
# Chained calls are supported starting with v1
bar = (
    Bar()
    .add_xaxis(["Shirt", "Sweater", "Tie", "Trousers", "Trench coat", "High heels", "Socks"])
    .add_yaxis("Merchant A", [114, 55, 27, 101, 125, 27, 105])
    .add_yaxis("Merchant B", [57, 134, 137, 129, 145, 60, 49])
    .set_global_opts(title_opts=opts.TitleOpts(title="Sales at a shopping mall"))
)
bar.render()
# Developers who prefer not to chain calls can still call each method separately
bar = Bar()
bar.add_xaxis(["Shirt", "Sweater", "Tie", "Trousers", "Trench coat", "High heels", "Socks"])
bar.add_yaxis("Merchant A", [114, 55, 27, 101, 125, 27, 105])
bar.add_yaxis("Merchant B", [57, 134, 137, 129, 145, 60, 49])
bar.set_global_opts(title_opts=opts.TitleOpts(title="Sales at a shopping mall"))
# bar.load_javascript()
# bar.render()
bar.render_notebook()

Related

Feeding Word Embedding Matrix into a Pytorch LSTM Model

I have an LSTM model that I am using to predict the unemployment rate from Federal Reserve filings. It uses GloVe vectors and a vocab2index embedding, and the training went as planned. However, when I try to feed a word embedding into the model for prediction testing, it keeps throwing various errors.
Here is the embedding code:
import numpy as np

def load_glove_vectors(glove_file=glove_embedding_vectors_text_file):
    """Load the GloVe word vectors"""
    word_vectors = {}
    with open(glove_file, encoding="utf-8") as f:
        for line in f:
            split = line.split()
            word_vectors[split[0]] = np.array([float(x) for x in split[1:]])
    return word_vectors

def get_emb_matrix(pretrained, word_counts, emb_size=300):
    """Creates embedding matrix from word vectors"""
    vocab_size = len(word_counts) + 2
    vocab_to_idx = {}
    vocab = ["", "UNK"]
    W = np.zeros((vocab_size, emb_size), dtype="float32")
    W[0] = np.zeros(emb_size, dtype="float32")  # vector for padding
    W[1] = np.random.uniform(-0.25, 0.25, emb_size)  # vector for unknown words
    vocab_to_idx["UNK"] = 1
    i = 2
    for word in word_counts:
        if word in pretrained:  # use the argument, not the global word_vecs
            W[i] = pretrained[word]
        else:
            W[i] = np.random.uniform(-0.25, 0.25, emb_size)
        vocab_to_idx[word] = i
        vocab.append(word)
        i += 1
    return W, np.array(vocab), vocab_to_idx

word_vecs = load_glove_vectors()
pretrained_weights, vocab, vocab2index = get_emb_matrix(word_vecs, counts)
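Before anything reaches the model, each sentence has to become a plain integer array of vocabulary indices. A minimal sketch, assuming lowercase whitespace tokenisation (it must match however `counts` was built), a fixed length `max_len` (70 is an arbitrary choice) and the index layout from `get_emb_matrix` (0 = padding, 1 = `"UNK"`); `encode_sentence` is a hypothetical helper, not part of PyTorch:

```python
import numpy as np

def encode_sentence(text, vocab2index, max_len=70):
    """Map a sentence to a fixed-length int64 array of vocabulary indices.

    Hypothetical helper: assumes lowercase whitespace tokenisation and the index
    layout from get_emb_matrix (0 = padding row, 1 = "UNK").
    """
    unk = vocab2index["UNK"]
    # Unknown words fall back to the UNK index; long sentences are truncated
    idx = [vocab2index.get(tok, unk) for tok in text.lower().split()][:max_len]
    encoded = np.zeros(max_len, dtype=np.int64)  # unused positions stay 0 (padding)
    encoded[:len(idx)] = idx
    return encoded

toy_vocab = {"UNK": 1, "the": 2, "rate": 3}
print(encode_sentence("The rate rose", toy_vocab, max_len=5))
```

Returning only the array, rather than an `(array, length)` tuple, means that stacking several encodings with `np.stack` gives an `int64` matrix instead of an object array; if you need the lengths, keep them in a separate list.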
Unfortunately when I feed this array
[array([ 3, 10, 6287, 6, 113, 271, 3, 6639, 104, 5105, 7525,
104, 7526, 9, 23, 9, 10, 11, 24, 7527, 7528, 104,
11, 24, 7529, 7530, 104, 11, 24, 7531, 7530, 104, 11,
24, 7532, 7530, 104, 11, 24, 7533, 7534, 24, 7535, 7536,
104, 7537, 104, 7538, 7539, 7540, 6643, 7541, 7354, 7542, 7543,
7544, 9, 23, 9, 10, 11, 24, 25, 8, 10, 11,
24, 3, 10, 663, 168, 9, 10, 290, 291, 3, 4909,
198, 10, 1478, 169, 15, 4621, 3, 3244, 3, 59, 1967,
113, 59, 520, 198, 25, 5105, 7545, 7546, 7547, 7546, 7548,
7549, 7550, 1874, 10, 7551, 9, 10, 11, 24, 7552, 6287,
7553, 7554, 7555, 24, 7556, 24, 7557, 7558, 7559, 6, 7560,
323, 169, 10, 7561, 1432, 6, 3134, 3, 7562, 6, 7563,
1862, 7144, 741, 3, 3961, 7564, 7565, 520, 7566, 4833, 7567,
7568, 4901, 7569, 7570, 4901, 7571, 1874, 7572, 12, 13, 7573,
10, 7574, 7575, 59, 7576, 59, 638, 1620, 7577, 271, 6488,
59, 7578, 7579, 7580, 7581, 271, 7582, 7583, 24, 669, 5932,
7584, 9, 113, 271, 3764, 3, 5930, 3, 59, 4901, 7585,
793, 7586, 7587, 6, 1482, 520, 7588, 520, 7589, 3246, 7590,
13, 7591])
into torch.LongTensor() I keep getting the following error:
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
Any ideas on how to remedy this? I am fairly new to AI in general, and I am an economist by trade, so I am almost certain I have made a boneheaded error.
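The `numpy.object_` in the error means the NumPy array handed to `torch.LongTensor` holds Python objects rather than integers. One common way to get there is a ragged collection, for example several index arrays of different lengths (or an `(array, length)` pair) wrapped in a single `np.array`. A minimal sketch, assuming each sequence is a 1-D array of vocabulary indices and that index 0 is the padding row (as in `get_emb_matrix`); `pad_index_sequences` is a hypothetical helper:

```python
import numpy as np

def pad_index_sequences(seqs, max_len=None, pad_idx=0):
    """Stack 1-D integer index arrays of varying length into one int64 matrix.

    Shorter sequences are right-padded with pad_idx; longer ones are truncated.
    """
    seqs = [np.asarray(s, dtype=np.int64).ravel() for s in seqs]
    if max_len is None:
        max_len = max(len(s) for s in seqs)
    out = np.full((len(seqs), max_len), pad_idx, dtype=np.int64)
    for i, s in enumerate(seqs):
        n = min(len(s), max_len)
        out[i, :n] = s[:n]
    return out

ragged = [np.array([3, 10, 6287]), np.array([6, 113])]
# Ragged input can only become an object array (recent NumPy even requires dtype=object)
as_object = np.array(ragged, dtype=object)
print(as_object.dtype)  # object: the dtype torch.LongTensor rejects
padded = pad_index_sequences(ragged)
print(padded.dtype, padded.shape)  # int64 (2, 3)
```

The resulting `int64` matrix can then be converted with `torch.from_numpy(padded)`, which gives a `torch.int64` (Long) tensor of shape `(batch, max_len)`. For a single sequence like the one above, `np.asarray(seq, dtype=np.int64)` before the conversion may already be enough.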

Pollution rose plot gridded

I am trying to create a pollution rose plot as described in the linked question "Plotting Windrose: making a pollution rose with concentration set to color".
The example in the answer works, but when I use my own data it produces a weird plot. Any advice on where I am going wrong? Thank you.
import matplotlib.pyplot as plt
import numpy as np
wd = [90.,297.,309.,336.,20.,2.,334.,327.,117.,125.,122.,97.,95.,97.,103.,106.,125.,148.,147.,140.,141.,145.,144.,151.,161.]
ws = [15,1.6,1.8,1.7,2.1,1.6,2.1,1.4,3,6.5,7.1,8.2,10.2,10.2,10.8,10.2,11.4,9.7,8.6,7.1,6.4,5.5,5,5,6]
oz = [10.,20.,30.,40.,50.,60.,70.,80.,90.,100.,110.,120.,90.,140.,100.,106.,125.,148.,147.,140.,141.,145.,144.,151.,161.]
pi_fac = np.pi / 180.  # degrees to radians; 22/(7*180.) only approximates pi
wd_rad = [w * pi_fac for w in wd]
ws_r = np.linspace(min(ws),max(ws),16)
WD,WS = np.meshgrid(wd_rad,ws_r)
C = oz + np.zeros((len(ws_r),len(wd)),dtype=float)
C = np.ma.masked_less_equal(C,10)
fig, ax = plt.subplots(subplot_kw={"projection":"polar"})
ax.pcolormesh(WD,WS,C,vmin=10, vmax=170) # I tried different vmin and vmax too
plt.show()
The linked post assumes you have a regular grid for directions and for speeds, but your input consists of rather unordered combinations.
To create a plot with colored regions depending on the oz values, you could try tricontourf. tricontourf takes X, Y and Z values that don't need to lie on a grid and creates a contour plot. Although it is meant for rectangular (Cartesian) layouts, it might also work for your case. There will be a discontinuity, though, when crossing from 360° to 0°.
This example also draws a colorbar showing which range of oz values corresponds to which color; vmin and vmax can change this color mapping.
import matplotlib.pyplot as plt
import numpy as np
wd = [90, 297, 309, 336, 20, 2, 334, 327, 117, 125, 122, 97, 95, 97, 103, 106, 125, 148, 147, 140, 141, 145, 144, 151, 161]
ws = [15, 1.6, 1.8, 1.7, 2.1, 1.6, 2.1, 1.4, 3, 6.5, 7.1, 8.2, 10.2, 10.2, 10.8, 10.2, 11.4, 9.7, 8.6, 7.1, 6.4, 5.5, 5, 5, 6]
oz = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 90, 140, 100, 106, 125, 148, 147, 140, 141, 145, 144, 151, 161]
fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
cont = ax.tricontourf(np.radians(np.array(wd)), ws, oz, cmap='hot')
plt.colorbar(cont)
plt.show()
With ax.scatter(np.radians(np.array(wd)), ws, c=oz, cmap='hot', vmax=250) you could create a scatter plot to get an idea of what the colored input looks like.
You might also want to use Python's windrose library to make the polar plots look like a wind rose.
Another approach, which might be closer to the one intended by the linked question, would be to use scipy's interpolate.griddata to map the data to a grid. To get rid of the areas without data, an 'under' color of 'none' can be used, provided that vmin is higher than zero.
import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate
wd = [90, 297, 309, 336, 20, 2, 334, 327, 117, 125, 122, 97, 95, 97, 103, 106, 125, 148, 147, 140, 141, 145, 144, 151, 161]
ws = [15, 1.6, 1.8, 1.7, 2.1, 1.6, 2.1, 1.4, 3, 6.5, 7.1, 8.2, 10.2, 10.2, 10.8, 10.2, 11.4, 9.7, 8.6, 7.1, 6.4, 5.5, 5, 5, 6]
oz = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 90, 140, 100, 106, 125, 148, 147, 140, 141, 145, 144, 151, 161]
wd_rad = np.radians(np.array(wd))
oz = np.array(oz, dtype=float)  # np.float was removed in NumPy 1.24; use the builtin float
WD, WS = np.meshgrid(np.linspace(0, 2*np.pi, 36), np.linspace(min(ws), max(ws), 16 ))
Z = interpolate.griddata((wd_rad, ws), oz, (WD, WS), method='linear')
fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
cmap = plt.get_cmap('hot')
cmap.set_under('none')
img = ax.pcolormesh(WD, WS, Z, cmap=cmap, vmin=20)
plt.colorbar(img)
plt.show()
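A third option, not taken from the answer above: instead of interpolating, average oz over a regular grid of direction and speed bins. That produces exactly the kind of regular grid pcolormesh expects. A minimal sketch using `np.histogram2d`; the bin counts (16 directions, 8 speeds) are arbitrary assumptions and `binned_mean_grid` is a hypothetical helper:

```python
import numpy as np
import matplotlib.pyplot as plt

def binned_mean_grid(wd_deg, ws, values, n_dir=16, n_speed=8):
    """Average `values` over a regular grid of direction x speed bins.

    Returns direction edges (radians), speed edges and a masked array of
    shape (n_speed, n_dir); bins without observations are masked.
    """
    wd_deg = np.mod(np.asarray(wd_deg, dtype=float), 360.0)  # wrap into [0, 360)
    ws = np.asarray(ws, dtype=float)
    values = np.asarray(values, dtype=float)
    dir_edges = np.linspace(0.0, 360.0, n_dir + 1)
    speed_edges = np.linspace(ws.min(), ws.max(), n_speed + 1)
    # Sum of values and number of samples in each (direction, speed) bin
    sums, _, _ = np.histogram2d(wd_deg, ws, bins=[dir_edges, speed_edges], weights=values)
    counts, _, _ = np.histogram2d(wd_deg, ws, bins=[dir_edges, speed_edges])
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = sums / counts  # NaN where a bin is empty
    mean = np.ma.masked_where(counts == 0, mean)
    # histogram2d indexes (direction, speed); pcolormesh(theta, r, C) wants C as (speed, direction)
    return np.radians(dir_edges), speed_edges, mean.T

wd = [90, 297, 309, 336, 20, 2, 334, 327, 117, 125, 122, 97, 95, 97, 103, 106, 125, 148, 147, 140, 141, 145, 144, 151, 161]
ws = [15, 1.6, 1.8, 1.7, 2.1, 1.6, 2.1, 1.4, 3, 6.5, 7.1, 8.2, 10.2, 10.2, 10.8, 10.2, 11.4, 9.7, 8.6, 7.1, 6.4, 5.5, 5, 5, 6]
oz = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 90, 140, 100, 106, 125, 148, 147, 140, 141, 145, 144, 151, 161]

theta_edges, r_edges, C = binned_mean_grid(wd, ws, oz)
fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
mesh = ax.pcolormesh(theta_edges, r_edges, C, cmap="hot")
fig.colorbar(mesh)
plt.show()
```

Empty bins stay blank instead of being filled by interpolation, so nothing is invented across sectors without data; the trade-off is a coarse picture when there are only 25 observations.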

I need to filter a column by the beginning of a string

In my code, I can filter a column by exact text, and it works without problems. However, I also need to filter another column by the beginning of the string.
The values in this column look like:
A_2020.092222
A_2020.090787
B_2020.983898
B_2020.209308
So, I need to receive everything that starts with A_20 and B_20.
Thanks in advance
My code:
from bs4 import BeautifulSoup
import pandas as pd
import zipfile, urllib.request, shutil, time, csv, datetime, os, sys, os.path
#location
dt = datetime.datetime.now()
file_csv = "/home/Downloads/source.CSV"
file_csv_new = "/var/www/html/Data/Test.csv"
#open CSV
with open(file_csv, 'r', encoding='CP1251') as file:
reader = csv.reader(file, delimiter=';')
data = list(reader)
#list to dataframe
df = pd.DataFrame(data)
#filter UF
df = df.loc[df[9].isin(['PR','SC','RS'])]
#filter key
# A_ & B_
df = df.loc[df[35].isin(['A_20','B_20'])]
#print (df)
#Empty DataFrame
#Columns: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...]
#Index: []
#[0 rows x 119 columns]
Give the following a try:
import pandas as pd
lst1 = ['A_2020.092222', 'A_2020.090787 ', 'B_2020.983898', 'B_2020.209308', 'C_2020.209308', 'D_2020.209308']
df = pd.DataFrame(lst1, columns =['Name'])
df.loc[df.Name.str.startswith(('A_20','B_20'))]
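Applied to the question's own DataFrame, the same idea becomes `df[35].str.startswith(...)`. Because the rows come from `csv.reader`, any row with fewer than 36 fields leaves column 35 as `None`; `str.startswith` returns NaN for those, which `.loc` typically refuses as a boolean mask. A sketch with hypothetical stand-in rows, passing `na=False` so missing values simply count as non-matches (passing a tuple of prefixes requires a reasonably recent pandas):

```python
import pandas as pd

# Hypothetical stand-in for the CSV rows: column 35 holds the keys, and a short
# row leaves trailing columns as None (pd.DataFrame pads ragged lists that way)
rows = [
    ["x"] * 35 + ["A_2020.092222"],
    ["x"] * 35 + ["C_2020.209308"],
    ["x"] * 35 + ["B_2020.983898"],
    ["x"] * 10,  # short row: column 35 becomes None
]
df = pd.DataFrame(rows)

# na=False turns missing values into False, so the mask is purely boolean
mask = df[35].str.startswith(("A_20", "B_20"), na=False)
filtered = df.loc[mask]
print(filtered[35].tolist())
```

In the original script this would replace the `isin(['A_20','B_20'])` line, since `isin` only matches whole values exactly.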

Kotlin: add carriage returns to a multiline string

In Kotlin, when I build a multiline string like this:
value expected = """
|digraph Test {
|${'\t'}Empty1;
|${'\t'}Empty2;
|}
|""".trimMargin()
I see that the string lacks carriage return characters (ASCII code 13) when I output it via:
println("Expected bytes")
println(expected.toByteArray().contentToString())
Output:
Expected bytes
[100, 105, 103, 114, 97, 112, 104, 32, 84, 101, 115, 116, 32, 123, 10, 9, 69, 109, 112, 116, 121, 49, 59, 10, 9, 69, 109, 112, 116, 121, 50, 59, 10, 125, 10]
When the code I'm trying to unit test builds the same String via a PrintWriter, it delimits lines using the lineSeparator field:
/*
* Line separator string. This is the value of the line.separator
* property at the moment that the stream was created.
*/
So I end up with a string which looks the same in output, but is composed of different bytes and thus is not equal:
Actual bytes
[100, 105, 103, 114, 97, 112, 104, 32, 84, 101, 115, 116, 32, 123, 13, 10, 9, 69, 109, 112, 116, 121, 49, 59, 13, 10, 9, 69, 109, 112, 116, 121, 50, 59, 13, 10, 125, 13, 10]
Is there a better way to handle this when declaring the string than splitting my multiline string into concatenated fragments, each suffixed with '\r' (char 13)?
Alternately, I'd like to do something like:
value expected = """
|digraph Test {
|${'\t'}Empty1;
|${'\t'}Empty2;
|}
|""".trimMargin().useLineSeparator(System.getProperty("line.separator"))
or .replaceAll() or such.
Does any standard method exist, or should I add my own extension function to String?
This did the trick: System.lineSeparator()
Kotlin multiline strings are always compiled into string literals which use \n as the line separator. If you need to have the platform-dependent line separator, you can do replace("\n", System.getProperty("line.separator")).
As of Kotlin 1.2, there is no standard library method for this, so you should define your own extension function if you're using this frequently.

Pymc size / indexing issue

I am trying to model Kruschke's "filtration-condensation experiment" with PyMC 2.3.5 (NumPy 1.10.1).
Basically, there are:
4 groups
each group has 40 individuals
each individual has 64 Bernoulli trials (correct/incorrect)
What I am modeling:
each individual's results follow a Binomial distribution (e.g. 45 correct out of 64).
my belief about each individual's performance is a Beta distribution.
this Beta distribution is influenced by the group to which the individual belongs (through parameters A=mu*kappa and B=(1-mu)*kappa)
my belief about the strength of each group's influence is a Gamma distribution (kappa variable)
my belief about each group's average is a Beta distribution (mu variable)
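The group-to-individual link can be checked outside PyMC with plain NumPy. A minimal sketch with made-up group values (the mu and kappa numbers are illustrative assumptions, not estimates): indexing with `condition` expands the four group parameters to one value per individual, and because the mean of Beta(A, B) is A/(A+B), the parameterisation A = mu*kappa, B = (1-mu)*kappa keeps every individual's prior mean at its group's mu, while kappa = A + B controls how tightly individuals cluster around it.

```python
import numpy as np

rng = np.random.default_rng(0)
ncond, nSubj, trials = 4, 40, 64
condition = np.repeat([0, 1, 2, 3], nSubj)  # group index of each individual

# Made-up group-level values, only to illustrate shapes and the parameterisation
mu = np.array([0.9, 0.8, 0.7, 0.7])        # group means
kappa = np.array([20.0, 10.0, 15.0, 5.0])  # group concentrations

# Fancy indexing expands 4 group values to 160 per-individual values
A = mu[condition] * kappa[condition]
B = (1 - mu[condition]) * kappa[condition]
beta_mean = A / (A + B)  # equals mu[condition]

# Simulate the model forward: one theta per individual, then 64 trials each
theta = rng.beta(A, B)
z_sim = rng.binomial(trials, theta)
print(A.shape, z_sim.shape)
```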
The problem:
when I do modeling with "size=" parameters, pymc get's lost
when I seperate each distribution manually (no size=) the pymc does good job
I include the code below:
Data
import numpy as np
import seaborn as sns
import pymc as pm
from pymc.Matplot import plot as mcplot
%matplotlib inline
# Data
ncond = 4
nSubj = 40
trials = 64
N = np.repeat([trials], (ncond * nSubj))
z = np.array([45, 63, 58, 64, 58, 63, 51, 60, 59, 47, 63, 61, 60, 51, 59, 45,
61, 59, 60, 58, 63, 56, 63, 64, 64, 60, 64, 62, 49, 64, 64, 58, 64, 52, 64, 64,
64, 62, 64, 61, 59, 59, 55, 62, 51, 58, 55, 54, 59, 57, 58, 60, 54, 42, 59, 57,
59, 53, 53, 42, 59, 57, 29, 36, 51, 64, 60, 54, 54, 38, 61, 60, 61, 60, 62, 55,
38, 43, 58, 60, 44, 44, 32, 56, 43, 36, 38, 48, 32, 40, 40, 34, 45, 42, 41, 32,
48, 36, 29, 37, 53, 55, 50, 47, 46, 44, 50, 56, 58, 42, 58, 54, 57, 54, 51, 49,
52, 51, 49, 51, 46, 46, 42, 49, 46, 56, 42, 53, 55, 51, 55, 49, 53, 55, 40, 46,
56, 47, 54, 54, 42, 34, 35, 41, 48, 46, 39, 55, 30, 49, 27, 51, 41, 36, 45, 41,
53, 32, 43, 33])
condition = np.repeat([0,1,2,3], nSubj)
Does not work
# modeling
mu = pm.Beta('mu', 1, 1, size=ncond)
kappa = pm.Gamma('gamma', 1, 0.1, size=ncond)
# Prior
theta = pm.Beta('theta', mu[condition] * kappa[condition], (1 - mu[condition]) * kappa[condition], size=len(z))
# likelihood
y = pm.Binomial('y', p=theta, n=N, value=z, observed=True)
# model
model = pm.Model([mu, kappa, theta, y])
mcmc = pm.MCMC(model)
#mcmc.use_step_method(pm.Metropolis, mu)
#mcmc.use_step_method(pm.Metropolis, theta)
#mcmc.assign_step_methods()
mcmc.sample(100000, burn=20000, thin=3)
# outputs never converge and vary between runs
mcplot(mcmc.trace('mu'), common_scale=False)
Works
z1 = z[:40]
z2 = z[40:80]
z3 = z[80:120]
z4 = z[120:]
Nv = N[:40]
mu1 = pm.Beta('mu1', 1, 1)
mu2 = pm.Beta('mu2', 1, 1)
mu3 = pm.Beta('mu3', 1, 1)
mu4 = pm.Beta('mu4', 1, 1)
kappa1 = pm.Gamma('gamma1', 1, 0.1)
kappa2 = pm.Gamma('gamma2', 1, 0.1)
kappa3 = pm.Gamma('gamma3', 1, 0.1)
kappa4 = pm.Gamma('gamma4', 1, 0.1)
# Prior
theta1 = pm.Beta('theta1', mu1 * kappa1, (1 - mu1) * kappa1, size=len(Nv))
theta2 = pm.Beta('theta2', mu2 * kappa2, (1 - mu2) * kappa2, size=len(Nv))
theta3 = pm.Beta('theta3', mu3 * kappa3, (1 - mu3) * kappa3, size=len(Nv))
theta4 = pm.Beta('theta4', mu4 * kappa4, (1 - mu4) * kappa4, size=len(Nv))
# likelihood
y1 = pm.Binomial('y1', p=theta1, n=Nv, value=z1, observed=True)
y2 = pm.Binomial('y2', p=theta2, n=Nv, value=z2, observed=True)
y3 = pm.Binomial('y3', p=theta3, n=Nv, value=z3, observed=True)
y4 = pm.Binomial('y4', p=theta4, n=Nv, value=z4, observed=True)
# model
model = pm.Model([mu1, kappa1, theta1, y1, mu2, kappa2, theta2, y2,
mu3, kappa3, theta3, y3, mu4, kappa4, theta4, y4])
mcmc = pm.MCMC(model)
#mcmc.use_step_method(pm.Metropolis, mu)
#mcmc.use_step_method(pm.Metropolis, theta)
#mcmc.assign_step_methods()
mcmc.sample(100000, burn=20000, thin=3)
# outputs converge and are not too much different in every simulation
mcplot(mcmc.trace('mu1'), common_scale=False)
mcplot(mcmc.trace('mu2'), common_scale=False)
mcplot(mcmc.trace('mu3'), common_scale=False)
mcplot(mcmc.trace('mu4'), common_scale=False)
mcmc.summary()
Can someone please explain why mu[condition] and kappa[condition] do not work? :)
I suspect that not splitting theta into separate variables is the problem, but I cannot understand why. Is there perhaps a way to pass a shape to size= on theta?
First of all, I can confirm that the first version doesn't lead to stable results. What I can't confirm is that the second one is much better; I have seen very different results also with the second code, with values for the first mu parameter varying between 0.17 and 0.9 for different runs.
The convergence problems can be cured by using good starting values for the Markov chain. This can be done by first doing a maximum a posteriori (MAP) estimate, and then starting the Markov chain from there. The MAP step is computationally inexpensive and leads to a converging Markov chain with reproducible results for both variants of your code. For reference and comparison: The values I see for the four mu parameters are around 0.94 / 0.86 / 0.72 and 0.71.
You can do the MAP estimation by inserting the following two lines of code right after the line in which you define your model with "model=pm.Model(...":
map_ = pm.MAP(model)
map_.fit()
This technique is covered in more detail in Cameron Davidson-Pilon's Bayesian Methods for Hackers, together with other helpful topics around PyMC.
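If you want a cruder, purely data-based starting point instead of (or in addition to) the MAP fit, one option is each group's observed proportion of correct trials. A minimal sketch; `group_start_values` is a hypothetical helper, the result is not a MAP estimate, and wiring it in as an initial value, e.g. `pm.Beta('mu', 1, 1, size=ncond, value=group_start_values(z, N, condition, ncond))`, assumes PyMC 2 accepts `value=` on an unobserved stochastic the same way the code uses it for the observed `y`:

```python
import numpy as np

def group_start_values(z, N, condition, ncond):
    """Observed proportion of correct trials per group, clipped into (0, 1).

    A crude data-based starting guess for the group means mu; not a MAP estimate.
    """
    p = np.asarray(z, dtype=float) / np.asarray(N, dtype=float)
    condition = np.asarray(condition)
    start = np.array([p[condition == c].mean() for c in range(ncond)])
    # Beta stochastics need values strictly inside (0, 1)
    return np.clip(start, 1e-3, 1 - 1e-3)

# Tiny made-up example: two groups with two individuals each, 64 trials per individual
print(group_start_values([32, 64, 16, 48], [64] * 4, [0, 0, 1, 1], 2))
```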