Anim Plot Updating - matplotlib

I am trying to track the per-sample maximum of a noisy sinusoid and show it in a second subplot, updating both plots like an animation. But the subplot of the maximum values appears to show all zeros, even though printing the array shows non-zero values. I think the y values are not being updated, but I couldn't figure out why. Any help would be appreciated.
Here is my code; it is executable:
from pylab import *
import time
ion()
fs = 1e6
Ts = 1/fs
SNR=10
sinfreq=2*pi*1e5
pack= 512
t = Ts*arange(0,pack)
f = fs*(arange(0,pack)-pack/2)/pack
max_y = zeros (len(t))
y=sin(sinfreq*t)
y=y+randn(size(y))/sqrt(10^(SNR/10)*2)
subplot(211)
line1, = plot(y)
subplot(212)
line2, = plot(max_y)
for i1 in arange(1,1000):
    y=sin(sinfreq*t)
    y=y+randn(size(y))/sqrt(10^(SNR/10)*2)
    line1.set_ydata(y)
    mk=0
    for mk in range(0,len(y)):
        if y[mk] > max_y[mk]:
            max_y[mk] = y[mk]
    print max_y
    line2.set_ydata(max_y)
    draw()
    waitforbuttonpress(timeout=0.5)

You've forgotten an indent in the second for loop.
That should actually give you an IndentationError, so I don't understand how you can say that the program is executable (in fact, I edited your entry to remove the left-over "enter code here" statement; if you had checked and copy-pasted your entry, you'd probably have found both mistakes).
But, are you sure you don't want simply
max_y[:] = max(y)
instead of that for loop?
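The two approaches are not equivalent, which may matter here. As a minimal sketch (the frame values below are made up for illustration), the loop in the question keeps a running maximum per sample, which NumPy can do in one call with np.maximum, while max_y[:] = max(y) fills the whole array with a single number:

```python
import numpy as np

# Two made-up "frames" of a signal, four samples each (illustrative values only).
frames = [np.array([0.2, -0.5, 0.9, 0.1]),
          np.array([0.7, 0.3, 0.4, -0.2])]

# Per-sample running maximum: equivalent to the question's inner for loop.
max_y = np.zeros(4)
for y in frames:
    np.maximum(max_y, y, out=max_y)  # element-wise max, written back in place

# Global maximum of the latest frame, broadcast to every element.
global_max = np.zeros(4)
global_max[:] = frames[-1].max()

print(max_y)       # one maximum per sample position
print(global_max)  # the same value repeated
```

Which one is wanted depends on whether the lower plot should show an envelope over time or a single peak level.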

OK, I found a solution by accident. I don't know the reason, but if I first set the max_y line to some other values and then do my real update, the plot shows my changes; otherwise it does not. Other for loops I tried plot correctly with a single update, but this loop needed two.
I have also added set_ylim so the limits follow the data. I marked the places I changed with stars. Here is the new code; I hope it helps people who have the same problem.
from pylab import *
import time
ion()
fs = 1e6
Ts = 1/fs
SNR=10
sinfreq=2*pi*1e5
pack= 512
t = Ts*arange(0,pack)
f = fs*(arange(0,pack)-pack/2)/pack
max_y = zeros (len(t))
y=sin(sinfreq*t)
y=y+randn(size(y))/sqrt(10**(SNR/10.0)*2) # ** is exponentiation; ^ is bitwise XOR in Python
subplot(211)
line1, = plot(y)
sub2=subplot(212) #****
line2, = plot(max_y)
for i1 in arange(1,1000):
    y=sin(sinfreq*t)
    y=y+randn(size(y))/sqrt(10**(SNR/10.0)*2)
    line1.set_ydata(y)
    mk=0
    for mk in range(0,len(y)):
        if y[mk] > max_y[mk]:
            max_y[mk] = y[mk]
    #print max_y
    line2.set_ydata(zeros(len(max_y)))#****
    line2.set_ydata(max_y)
    sub2.set_ylim(min(max_y),max(max_y)) #****
    draw()
    waitforbuttonpress(timeout=0.5)
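A likely explanation for the original symptom is that set_ydata changes the line's data but does not rescale the axes, so the lower subplot keeps the tiny y-range it computed for the initial all-zero line and the updated curve ends up off-screen. Below is a minimal sketch of that idea using relim() and autoscale_view() instead of a manual set_ylim. The Agg backend, the noise level of 0.2, and the five iterations are assumptions made only so the sketch runs without a display; in an interactive session you would use your usual backend and something like plt.pause().

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(512) * 1e-6                  # 512 samples at 1 MHz, as in the question
fig, (ax1, ax2) = plt.subplots(2, 1)

y = np.sin(2 * np.pi * 1e5 * t)
line1, = ax1.plot(y)
max_y = np.zeros_like(y)
line2, = ax2.plot(max_y)
ylim_before = ax2.get_ylim()               # limits chosen for the all-zero line

for _ in range(5):
    y = np.sin(2 * np.pi * 1e5 * t) + 0.2 * rng.standard_normal(y.size)
    line1.set_ydata(y)
    np.maximum(max_y, y, out=max_y)        # per-sample running maximum
    line2.set_ydata(max_y)
    ax2.relim()                            # recompute data limits from the artists
    ax2.autoscale_view()                   # apply them to the visible y-range
    fig.canvas.draw()

ylim_after = ax2.get_ylim()
print(ylim_before, ylim_after)
```

If this is the cause, the extra set_ydata(zeros(...)) call in the code above should not be necessary; the set_ylim line is what makes the data visible.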

Related

Python Scattergeo increasing scatter size (Type error : Nonetype and int)

I've been stuck on this problem since last night. I am trying to scale the markers on a Scattergeo plot by the magnitude of each earthquake, but I keep getting a NoneType error when I add size to the trace like this:
[Scattergeo(lon=lons,lat=lats,size=[5*mag for mag in mags])]
Here is the full code:
import json
from plotly import offline
from plotly.graph_objs import *
with open('all_month.geojson.json',encoding='utf8') as geo :
    file = json.load(geo)
with open('earthquakes','w') as eq :
    json.dump(file,eq,indent=4)
file = file['features']
mags,lons,lats = [],[],[]
for columns in file :
    mag = columns['properties']['mag']
    lon = columns['geometry']['coordinates'][0]
    lat = columns['geometry']['coordinates'][1]
    mags.append(mag)
    lons.append(lon)
    lats.append(lat)
data = [Scattergeo(lon=lons,lat=lats)]
my_layout = Layout(title='Earthquakes around the world for the past 30 days')
offline.plot({'data':data,'layout':my_layout},filename='Eq.html')
OK, I figured it out. When I downloaded the data there were multiple options for the format, and I recommend downloading the "+1 magnitude and above" option.
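The TypeError in the title is consistent with some features in the full feed having a mag of None, so that 5*mag fails; this is an assumption, since the data file is not shown. A minimal sketch of skipping such entries while building the lists (the features list below is a made-up stand-in for file['features']):

```python
# Hypothetical stand-in for file['features']; the second entry has no magnitude.
features = [
    {'properties': {'mag': 2.5}, 'geometry': {'coordinates': [-116.5, 33.4, 10.0]}},
    {'properties': {'mag': None}, 'geometry': {'coordinates': [-150.1, 61.2, 35.0]}},
    {'properties': {'mag': 4.0}, 'geometry': {'coordinates': [142.3, 38.1, 20.0]}},
]

mags, lons, lats = [], [], []
for feature in features:
    mag = feature['properties']['mag']
    if mag is None:
        continue  # skip events without a magnitude so 5*mag cannot fail
    lon, lat = feature['geometry']['coordinates'][:2]
    mags.append(mag)
    lons.append(lon)
    lats.append(lat)

sizes = [5 * mag for mag in mags]
print(sizes)

# In plotly, marker sizes belong inside the marker dict, for example:
# Scattergeo(lon=lons, lat=lats, marker={'size': sizes})
```

Filtering keeps the three lists the same length, which the trace needs.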

How to get the lowest low of a series in PineScript

I'm trying to get the lowest low of a series of candles after a condition, but it always returns the last candle of the condition. I tried min(), lowest() and a for loop, but none of them worked. I also tried using blackCandle[] and min(ThreeinARow)/lowest(ThreeinARow); sometimes that returns the last candle and other times it gives me a compilation error.
blackCandle = close < open
ThreeinARow = blackCandle[3] and blackCandle[2] and blackCandle[1]
SL = ThreeinARow ? min(low[1], low[2], low[3]) : na
//@version=4
study("Help (low after 3DownBar)", overlay=true, max_bars_back=100)
blackCandle = close < open
ThreeinARow = blackCandle[3] and blackCandle[2] and blackCandle[1]
bar_ind = barssince(ThreeinARow)
//SL = lowest(max(1, nz(bar_ind))) // the lowest low of a series of candles after the condition
SL = lowest(max(1, nz(bar_ind)+1)) // the lowest low of a series of candles since the condition
plot(SL, style=plot.style_cross, linewidth=3)
bgcolor(ThreeinARow ? color.silver : na)
See also the second solution, which is in the commented-out line.
It seems that I was misinterpreting it. Using min() does return the minimum of a series of candles. The detail is that I must specify exactly which candles go into the minimum, which, for now, does not cause any problem for me. In the end, this is how I ended up writing it:
blackCandle = close < open
ThreeinARow = blackCandle[3] and blackCandle[2] and blackCandle[1]
Lowest_Low = if ThreeinARow
    min(low[1], low[2], low[3])
plot(Lowest_Low, color=color.red)
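To make the difference between the two answers concrete, here is a rough Python sketch of what lowest(max(1, nz(bar_ind)+1)) computes, assuming barssince counts bars since the condition was last true (0 on that bar) and lowest(n) is the minimum low over the last n bars including the current one. The function name and the sample data are made up for illustration.

```python
def lowest_low_since(lows, cond):
    """For each bar, the lowest low from the last bar where cond was true up to now."""
    result = []
    last_true = None  # index of the most recent bar where the condition held
    for i, (low, c) in enumerate(zip(lows, cond)):
        if c:
            last_true = i
        if last_true is None:
            # Condition never true yet: window length falls back to 1 (current bar only).
            result.append(low)
        else:
            result.append(min(lows[last_true:i + 1]))
    return result

lows = [5, 4, 6, 3, 7]
cond = [False, True, False, False, False]
print(lowest_low_since(lows, cond))
```

The fixed-window version, min(low[1], low[2], low[3]), looks only at the three candles of the pattern, whereas this running version keeps extending the window until the condition fires again.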

Using Python, how can you make a time wheel include all 24 wedges if your data doesn't have data in each category?

I am using code from David Dale (Time Wheel in python3 pandas). My data set is fairly large, but a few hours have no data, so the corresponding wedges are not shown in the time wheel. The wheel is missing wedges, which makes it look wrong even though it technically is not.
I have looked through the questions suggested while writing this one and tried to understand the code well enough to alter it, but could not.
David Dale's code, copied from the link:
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib import cm

def pie_heatmap(table, cmap=cm.hot, vmin=None, vmax=None,inner_r=0.25, pie_args={}):
    n, m = table.shape
    vmin= table.min().min() if vmin is None else vmin
    vmax= table.max().max() if vmax is None else vmax
    centre_circle = plt.Circle((0,0),inner_r,edgecolor='black',facecolor='white',fill=True,linewidth=0.25)
    plt.gcf().gca().add_artist(centre_circle)
    norm = mpl.colors.Normalize(vmin=vmin, vmax=vmax)
    cmapper = cm.ScalarMappable(norm=norm, cmap=cmap)
    for i, (row_name, row) in enumerate(table.iterrows()):
        labels = None if i > 0 else table.columns
        wedges = plt.pie([1] * m,radius=inner_r+float(n-i)/n, colors=[cmapper.to_rgba(x) for x in row.values],
            labels=labels, startangle=90, counterclock=False, wedgeprops={'linewidth':-1}, **pie_args)
        plt.setp(wedges[0], edgecolor='white',linewidth=1.5)
        wedges = plt.pie([1], radius=inner_r+float(n-i-1)/n, colors=['w'], labels=[row_name], startangle=-90, wedgeprops={'linewidth':0})
        plt.setp(wedges[0], edgecolor='white',linewidth=1.5)
The code works well - thanks Dave - but I need it to make the time wheel with 24 wedges regardless of whether data exists or not for that wedge. Thanks for any help!
Update: I wrote a script to make it work for me. data is my 7x24 data table (7 days by 24 hours each). hours is a list of the 24 hours of the day, i.e. ['00:00','01:00','02:00','03:00' ... '23:00']. It is a list of strings because the data is strings when we get it.
if data.shape[1] < 24: #fewer than 24 hour columns are present
    for h in hours:
        if h not in data.columns.values: #see if it is in what should be the complete list
            data[h]=0 #add the missing hour as a column of zeros
data=data.sort_index(axis=1) #sort the final dataframe by column headers
Hopefully that helps someone...
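An alternative that avoids the explicit loop is DataFrame.reindex, which can add any missing columns and order them in one step. This is a minimal sketch with a made-up table in which two hours are absent; data and hours follow the names used above, while the sample values are assumptions.

```python
import pandas as pd

hours = ['%02d:00' % h for h in range(24)]            # '00:00' ... '23:00'

# Made-up 7-day table where the 03:00 and 04:00 columns are missing.
present = [h for h in hours if h not in ('03:00', '04:00')]
data = pd.DataFrame(1, index=range(7), columns=present)

# Add the missing hours as zero-filled columns, in the order given by hours.
full = data.reindex(columns=hours, fill_value=0)

print(full.shape)
```

The resulting table always has 24 columns, so pie_heatmap draws all 24 wedges.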

Tukey-Test Grouping and plotting in SciPy

I'm trying to plot results from a Tukey test, but I am struggling with putting the data into groups based on the p-value. This is the equivalent in R which I am trying to replicate. I have been using the SciPy one-way ANOVA test and the statsmodels Tukey test, but can't get the groups done in the same way.
Any help is greatly appreciated.
I've also just found another example in R of what I want to do in Python.
I have been struggling to do the same thing. I found a paper that tells you how to code the letters.
Hans-Peter Piepho (2004) An Algorithm for a Letter-Based Representation of All-Pairwise Comparisons, Journal of Computational and Graphical Statistics, 13:2, 456-466, DOI: 10.1198/1061860043515
Doing the coding was a little tricky, as you need to check and replicate columns and then combine columns. I tried to add some comments to the code. I figured out a method where you can run tukeyhsd and then compute the letters from the results. It should be possible to turn this into a function, or hopefully make it part of tukeyhsd. My data is not posted, but it is one column of data and one column describing the groups. The groups for me are the five boroughs of NYC. You can also just change the comments and use random data the first time.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
from statsmodels.stats.multicomp import MultiComparison
# Read data. Comment out the next line and uncomment the following ones to use random data.
df=pd.read_excel('anova_test.xlsx')
#n=1000
#df = pd.DataFrame(columns=['Groups','Data'],index=np.arange(n))
#df['Groups']=np.random.randint(1, 4,size=n)
#df['Data']=df['Groups']*np.random.random_sample(size=n)
# define columns for data and then grouping
col_to_group='Groups'
col_for_data='Data'
#Now take the data and regroup for anova
samples = [cols[1] for cols in df.groupby(col_to_group)[col_for_data]] #iterating over the groupby yields (group name, Series) pairs, so cols[1] is the data of one group
f_val, p_val = stats.f_oneway(*samples) #the star unpacks the list so that each group is passed as a separate argument
#print('F value: {:.3f}, p value: {:.3f}\n'.format(f_val, p_val))
# this if statement can be uncommented if you don't want to go further without p<0.05
#if p_val<0.05: #If the p value is less than 0.05 it then does the tukey
mod = MultiComparison(df[col_for_data], df[col_to_group])
thsd=mod.tukeyhsd()
#print(mod.tukeyhsd())
#This part implements the Piepho method: an algorithm for a letter-based representation of all-pairwise comparisons.
tot=len(thsd.groupsunique)
#make an empty dataframe that is a square matrix of size of the groups. #set first column to 1
df_ltr=pd.DataFrame(np.nan, index=np.arange(tot),columns=np.arange(tot))
df_ltr.iloc[:,0]=1
count=0
df_nms = pd.DataFrame('', index=np.arange(tot), columns=['names']) # I make a dummy dataframe to put axis labels into. sd stands for significant difference
for i in np.arange(tot): #I loop through and make all pairwise comparisons.
    for j in np.arange(i+1,tot):
        #print('i=',i,'j=',j,thsd.reject[count])
        if thsd.reject[count]==True:
            for cn in np.arange(tot):
                if df_ltr.iloc[i,cn]==1 and df_ltr.iloc[j,cn]==1: #If the column contains both i and j, shift and duplicate
                    df_ltr=pd.concat([df_ltr.iloc[:,:cn+1],df_ltr.iloc[:,cn+1:].T.shift().T],axis=1)
                    df_ltr.iloc[:,cn+1]=df_ltr.iloc[:,cn]
                    df_ltr.iloc[i,cn]=0
                    df_ltr.iloc[j,cn+1]=0
                #Now we need to check all columns for absorption.
                for cleft in np.arange(len(df_ltr.columns)-1):
                    for cright in np.arange(cleft+1,len(df_ltr.columns)):
                        if (df_ltr.iloc[:,cleft].isna()).all()==False and (df_ltr.iloc[:,cright].isna()).all()==False:
                            if (df_ltr.iloc[:,cleft]>=df_ltr.iloc[:,cright]).all()==True:
                                df_ltr.iloc[:,cright]=0
                                df_ltr=pd.concat([df_ltr.iloc[:,:cright],df_ltr.iloc[:,cright:].T.shift(-1).T],axis=1)
                            if (df_ltr.iloc[:,cleft]<=df_ltr.iloc[:,cright]).all()==True:
                                df_ltr.iloc[:,cleft]=0
                                df_ltr=pd.concat([df_ltr.iloc[:,:cleft],df_ltr.iloc[:,cleft:].T.shift(-1).T],axis=1)
        count+=1
#I sort so that the first column becomes A
df_ltr=df_ltr.sort_values(by=list(df_ltr.columns),axis=1,ascending=False)
# I assign letters to each column
for cn in np.arange(len(df_ltr.columns)):
    df_ltr.iloc[:,cn]=df_ltr.iloc[:,cn].replace(1,chr(97+cn))
    df_ltr.iloc[:,cn]=df_ltr.iloc[:,cn].replace(0,'')
    df_ltr.iloc[:,cn]=df_ltr.iloc[:,cn].replace(np.nan,'')
#I put all the letters into one string
df_ltr=df_ltr.astype(str)
df_ltr.sum(axis=1)
#print(df_ltr)
#print('\n')
#print(df_ltr.sum(axis=1))
#Now to plot like R with a box plot
fig,ax=plt.subplots()
df.boxplot(column=col_for_data, by=col_to_group,ax=ax,fontsize=16,showmeans=True
    ,boxprops=dict(linewidth=2.0),whiskerprops=dict(linewidth=2.0)) #This makes the boxplot
ax.set_ylim([-10,20])
grps=pd.unique(df[col_to_group].values) #Finds the group names
grps.sort() # This is critical! Puts the groups in alphabetical order to make it match the plotting
props=dict(facecolor='white',alpha=1)
for i,grp in enumerate(grps): #I loop through the groups to make the scatters and figure out the axis labels.
    x = np.random.normal(i+1, 0.15, size=len(df[df[col_to_group]==grp][col_for_data]))
    ax.scatter(x,df[df[col_to_group]==grp][col_for_data],alpha=0.5,s=2)
    name="{}\navg={:0.2f}\n(n={})".format(grp
        ,df[df[col_to_group]==grp][col_for_data].mean()
        ,df[df[col_to_group]==grp][col_for_data].count())
    df_nms.loc[i,'names']=name
    ax.text(i+1,ax.get_ylim()[1]*1.1,df_ltr.sum(axis=1)[i],fontsize=10,verticalalignment='top',horizontalalignment='center',bbox=props)
ax.set_xticklabels(df_nms['names'],rotation=0,fontsize=10)
ax.set_title('')
fig.suptitle('')
fig.savefig('anovatest.jpg',dpi=600,bbox_inches='tight')
Results showing the letters above plots using the tukeyhsd
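The insert-and-absorb idea that the DataFrame code above works through can also be sketched with plain Python sets, which may be easier to follow. This is a simplified illustration and not the paper's full procedure (it omits any further cleanup steps the paper may describe); the function name and its inputs, a group count and a list of significantly different index pairs, are assumptions made for this sketch.

```python
def letter_display(n_groups, significant_pairs):
    """Return one letter string per group; groups that share a letter
    are not significantly different from each other."""
    columns = [set(range(n_groups))]  # start with one letter shared by all groups
    for i, j in significant_pairs:
        # Insert: split every column that still contains both i and j.
        split = []
        for col in columns:
            if i in col and j in col:
                split.append(col - {i})
                split.append(col - {j})
            else:
                split.append(col)
        # Absorb: drop columns that are contained in another column.
        columns = []
        for col in split:
            if any(col <= kept for kept in columns):
                continue
            columns = [kept for kept in columns if not kept <= col]
            columns.append(col)
    columns.sort(key=sorted)  # deterministic letter order
    letters = [''] * n_groups
    for k, col in enumerate(columns):
        for g in col:
            letters[g] += chr(97 + k)
    return letters

# Groups 0 and 2 differ; group 1 differs from neither.
print(letter_display(3, [(0, 2)]))
```

With tukeyhsd results, the pairs would be those index pairs (i, j) for which thsd.reject is True, enumerated in the same order as the nested loops above.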
Here is a function that returns letter labels if you have a symmetric matrix of p-values from a Tukey test:
import numpy as np

def tukeyLetters(pp, means=None, alpha=0.05):
    '''TUKEYLETTERS - Produce list of group labels for TukeyHSD
    letters = TUKEYLETTERS(pp), where PP is a symmetric matrix of
    probabilities from a Tukey test, returns alphabetic labels
    for each group to indicate clustering. PP may also be a vector
    from PAIRWISE_TUKEYHSD.
    Optional argument MEANS specifies group means, which is used for
    ordering the letters. ("a" gets assigned to the group with lowest
    mean.) Without this argument, ordering is arbitrary.
    Optional argument ALPHA specifies cutoff for treating groups as
    part of the same cluster.'''
    if len(pp.shape)==1:
        # vector of G*(G-1)/2 pairwise p-values: solve for the number of groups G
        G = (1 + int(round(np.sqrt(1 + 8*len(pp)))))//2
        ppp = .5*np.eye(G)
        ppp[np.triu_indices(G,1)] = pp
        pp = ppp + ppp.T
    conn = pp>alpha
    G = len(conn)
    if np.all(conn):
        return ['a' for g in range(G)]
    conns = []
    for g1 in range(G):
        for g2 in range(g1+1,G):
            if conn[g1,g2]:
                conns.append((g1,g2))
    letters = [ [] for g in range(G) ]
    nextletter = 0
    for g in range(G):
        if np.sum(conn[g,:])==1:
            # group is connected only to itself, so it gets its own letter
            letters[g].append(nextletter)
            nextletter += 1
    while len(conns):
        grp = set(conns.pop(0))
        for g in range(G):
            if all(conn[g, np.sort(list(grp))]):
                grp.add(g)
        for g in grp:
            letters[g].append(nextletter)
        for g in grp:
            for h in grp:
                if (g,h) in conns:
                    conns.remove((g,h))
        nextletter += 1
    if means is None:
        means = np.arange(G)
    means = np.array(means)
    groupmeans = []
    for k in range(nextletter):
        ingroup = [g for g in range(G) if k in letters[g]]
        groupmeans.append(means[np.array(ingroup)].mean())
    ordr = np.empty(nextletter, int)
    ordr[np.argsort(groupmeans)] = np.arange(nextletter)
    result = []
    for ltr in letters:
        lst = [chr(97 + ordr[x]) for x in ltr]
        lst.sort()
        result.append(''.join(lst))
    return result
To make that concrete, here is a full example:
from statsmodels.stats.multicomp import pairwise_tukeyhsd
data = [ 1,2,2,1,4,5,4,5,7,8,7,8,1,3,4,5 ]
group = [ 0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3 ]
tuk = pairwise_tukeyhsd(data, group)
letters = tukeyLetters(tuk.pvalues)
This will result in letters containing ['a', 'c', 'b', 'ac']

Checking errors in my program

I'm trying to make some changes to my dictionary-based word counter in Python, but I'm not making any progress so far. I want my code to show the number of different words.
This is what I have so far:
# import sys module in order to access command line arguments later
import sys
# create an empty dictionary
dicWordCount = {}
# read all words from the file and put them into
#'dicWordCount' one by one,
# then count the occurance of each word
You can use the Counter class from the collections library:
from collections import Counter
q = Counter(fileSource.read().split())
total = sum(q.values())
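As a small self-contained illustration of the same idea (the sample sentence is made up; in the real program the text would come from fileSource.read()), a Counter gives both the total number of words and the number of different words:

```python
from collections import Counter

text = "the cat and the dog and the bird"   # made-up sample text
counts = Counter(text.split())

total_words = sum(counts.values())   # every occurrence counted
different_words = len(counts)        # one entry per distinct word

print(total_words, different_words)
print(counts.most_common(1))         # the most frequent word with its count
```

Because a Counter is a dict subclass, len() on it counts the distinct keys, which is the "number of different words" asked for.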
For your first problem, add one variable for the total word count and one for the number of different words: wordCount = 0 and differentWords = 0. In the loop that reads your file, put wordCount += 1 at the top, and in your first if statement put differentWords += 1. You can print these variables at the end of the program as well.
For the second problem, add an if statement to your printing: if len(strKey)>4:.
If you want full example code, here it is.
import sys
fileSource = open(sys.argv[1], "rt")
dicWordCount = {}
wordCount = 0
differentWords = 0
for strWord in fileSource.read().split():
    wordCount += 1
    if strWord not in dicWordCount:
        dicWordCount[strWord] = 1
        differentWords += 1
    else:
        dicWordCount[strWord] += 1
for strKey in sorted(dicWordCount, key=dicWordCount.get, reverse=True):
    if len(strKey) > 4: # if the word's length is greater than four.
        print(strKey, dicWordCount[strKey])
print("Total words: %s\nDifferent Words: %s" % (wordCount, differentWords))
For your first question, you can use a set to count the number of different words (assuming there is a space between every two words):
text = 'apple boy cat dog elephant fox'
different_word_count = len(set(text.split(' ')))
For your second question, using a dictionary to record the word counts is fine.
How about this?
#gives unique words count
unique_words = len(dicWordCount)
total_words = 0
for k, v in dicWordCount.items():
    total_words += v
#gives total word count
print(total_words)
You don't need a separate variable for counting words since you're using a dictionary. To get the total word count, you just need to add up the values of the keys (which are the per-word counts).