Data-Visualization Python - pandas

Plot four line plots, one for each of the four companies in the DataFrame open_prices. Put the year on the X-axis and the stock price on the Y-axis. You will need a (2, 2) grid of subplots. Set the figure size to (10, 8) and share the X-axis for better visualization.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from nsepy import get_history
import datetime as dt
%matplotlib inline
start = dt.datetime(2015, 1, 1)
end = dt.datetime.today()
infy = get_history(symbol='INFY', start = start, end = end)
infy.index = pd.to_datetime(infy.index)
hdfc = get_history(symbol='HDFC', start = start, end = end)
hdfc.index = pd.to_datetime(hdfc.index)
reliance = get_history(symbol='RELIANCE', start = start, end = end)
reliance.index = pd.to_datetime(reliance.index)
wipro = get_history(symbol='WIPRO', start = start, end = end)
wipro.index = pd.to_datetime(wipro.index)
open_prices = pd.concat([infy['Open'], hdfc['Open'],reliance['Open'],
wipro['Open']], axis = 1)
open_prices.columns = ['Infy', 'Hdfc', 'Reliance', 'Wipro']
f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
axes[0, 0].plot(open_prices.index.year,open_prices.INFY)
axes[0, 1].plot(open_prices.index.year,open_prices.HDB)
axes[1, 0].plot(open_prices.index.year,open_prices.TTM)
axes[1, 1].plot(open_prices.index.year,open_prices.WIT)
I get a blank graph. Please help.

The code below works. I changed the following:
a) axes should be ax, the name assigned from plt.subplots; b) the DataFrame column names were incorrect; c) anyone who wants to try this example also needs to install the lxml library.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from nsepy import get_history
import datetime as dt
start = dt.datetime(2015, 1, 1)
end = dt.datetime.today()
infy = get_history(symbol='INFY', start = start, end = end)
infy.index = pd.to_datetime(infy.index)
hdfc = get_history(symbol='HDFC', start = start, end = end)
hdfc.index = pd.to_datetime(hdfc.index)
reliance = get_history(symbol='RELIANCE', start = start, end = end)
reliance.index = pd.to_datetime(reliance.index)
wipro = get_history(symbol='WIPRO', start = start, end = end)
wipro.index = pd.to_datetime(wipro.index)
open_prices = pd.concat([infy['Open'], hdfc['Open'],reliance['Open'],
wipro['Open']], axis = 1)
open_prices.columns = ['Infy', 'Hdfc', 'Reliance', 'Wipro']
print(open_prices.columns)
# 2 x 2 grid, figure size (10, 8), shared X-axis as the task asks
f, ax = plt.subplots(2, 2, figsize=(10, 8), sharex=True)
ax[0,0].plot(open_prices.index.year,open_prices.Infy)
ax[1,0].plot(open_prices.index.year,open_prices.Hdfc)
ax[0,1].plot(open_prices.index.year,open_prices.Reliance)
ax[1,1].plot(open_prices.index.year,open_prices.Wipro)
plt.show()
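
For anyone who cannot install nsepy, here is a minimal sketch of the same 2 x 2 layout on synthetic data. The random-walk prices, the business-day index, and the panel labels are assumptions made up for illustration; only the column names come from the code above. It plots against the full datetime index rather than index.year, which is a swapped-in choice so that each panel shows a continuous line.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt

# Synthetic stand-in for open_prices (assumption: business-day DatetimeIndex)
rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-01", periods=500, freq="B")
open_prices = pd.DataFrame(
    100 + rng.normal(0, 1, size=(len(dates), 4)).cumsum(axis=0),
    index=dates,
    columns=["Infy", "Hdfc", "Reliance", "Wipro"],
)

fig, axes = plt.subplots(2, 2, figsize=(10, 8), sharex=True)
# axes is a 2 x 2 array; flatten it to pair each panel with one column
for ax, name in zip(axes.ravel(), open_prices.columns):
    ax.plot(open_prices.index, open_prices[name])
    ax.set_title(name)
    ax.set_ylabel("Open price")
fig.autofmt_xdate()   # rotate the shared date labels
fig.tight_layout()
```

Looping over axes.ravel() avoids repeating the same plot call four times with hand-written indices, which is where the original name mix-up crept in.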

Related

How to color a plot lines based on amplitude

I want to change the color of the blue vlines (matplotlib) in my plot.
I want the negative (< 0) values drawn in a different color, using their absolute value so that only the amplitude is visible, while staying distinguishable from the positive ones. Positive values can remain unchanged.
Minimal reproducible code is below:
import numpy as np
import random
import matplotlib.pyplot as plt
peakmzs = np.array([random.uniform(506, 2000) for i in range(2080)])
peakmzs = peakmzs[peakmzs.argsort()[::1]]
spec = np.zeros_like(peakmzs)
b = np.where((peakmzs > 1500) & (peakmzs < 1540))[0]
spec[b] = [random.uniform(0, 0.002) for i in range(len(b))]
b = np.where((peakmzs > 700) & (peakmzs < 820))[0]
spec[b] = [random.uniform(0, 0.05) for i in range(len(b))]
spec[300:302] = 0.07
b = np.where((peakmzs > 600) & (peakmzs < 650))[0]
spec[b] = [random.uniform(0, 0.03) for i in range(len(b))]
plt.vlines(peakmzs, spec, ymax=spec.max())
plt.show()
shp_values = np.zeros_like(peakmzs)
b = np.where((peakmzs > 1500) & (peakmzs < 1540))[0]
b_ = np.random.randint(1500, 1540, 10)
# print(b_)
shp_values[b] = [random.uniform(-0.003, 0.002) for i in range(len(b))]
shp_values[b_] = 0
b = np.where((peakmzs > 700) & (peakmzs < 820))[0]
shp_values[b] = [random.uniform(-0.004, 0.002) for i in range(len(b))]
b_ = np.random.randint(700, 820, 70)
shp_values[b_] = 0
# [random.uniform(-0.005, 0.003) for i in range(len(peakmzs))]
plt.plot(shp_values)
Based on the suggestion from @JohanC:
demo_shp = np.array(shapvalues[0][19])
colors = np.where(demo_shp < 0, 'cyan', 'pink')
plt.vlines(peakmzs, ymin=[0], ymax=demo_shp, colors=colors)
plt.show()
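
The snippet above depends on shapvalues, which is not defined in the question. A self-contained sketch of the same idea, with made-up arrays standing in for peakmzs and the signed values, could look like this. It also takes the absolute value so that only the amplitude is drawn, as the question asks.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(506, 2000, size=200))      # stand-in for peakmzs
values = rng.uniform(-0.004, 0.002, size=x.size)   # stand-in for the signed values

amplitude = np.abs(values)                          # draw the magnitude only
colors = np.where(values < 0, "cyan", "pink")       # one colour per line, chosen by sign

fig, ax = plt.subplots()
lines = ax.vlines(x, ymin=0, ymax=amplitude, colors=colors)
ax.set_xlabel("peak position")
ax.set_ylabel("|value|")
```

The colors are computed from the signed values before taking the absolute value, so the sign information is kept in the colour even though every line points upward.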

Plot multiple graphs without using a for loop

So, my question may not be exactly what is in the title.
I have a function
y = a*x + b
And I want to plot y with different values of b.
I know that I can do the following:
import numpy as np
import matplotlib.pyplot as plt
a = 2
x = np.array([0,1,2,3,4])
b = 0
for i in range(10):
    y = a*x + b
    b = b+1
    plt.plot(x,y)
And that returns exactly what I want.
But is there some way to do this using
b = np.array([0,1,2,3,4,5,6,7,8,9])? Then my code could look something like:
import numpy as np
import matplotlib.pyplot as plt
a = 2
x = np.array([0,1,2,3,4])
b = np.array([0,1,2,3,4,5,6,7,8,9])
y = a*x + b
plt.plot(x,y)
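Yes, you can use NumPy broadcasting to build a 2D array holding y = a*x + b for every value of b. Indexing x with [:, None] turns it into a column, so adding b yields one column per value of b, and plt.plot draws each column as a separate line.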
a = 2
x = np.array([0,1,2,3,4])
b = np.array([0,1,2,3,4,5,6,7,8,9])
y = a*x[:,None]+b
plt.plot(x, y)
EDIT: I'm showing the solution provided by @Quang Hoang, which is much simpler than mine.
My original code was:
y = np.tile(a*x, (b.size,1)) + b[:,np.newaxis]
plt.plot(x, y.T)
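
To check that the two versions agree, here is a small sketch that compares the broadcast result with an explicit loop and with the tile-based version. The name y_loop is introduced only for this comparison.

```python
import numpy as np

a = 2
x = np.array([0, 1, 2, 3, 4])
b = np.arange(10)

# (5, 1) broadcast against (10,) gives (5, 10): column j is the line for b[j]
y = a * x[:, None] + b

# Same result built one value of b at a time, for comparison only
y_loop = np.column_stack([a * x + bj for bj in b])

# The tile-based version produces the transpose, hence the .T when plotting
y_tile = np.tile(a * x, (b.size, 1)) + b[:, np.newaxis]
```

Passing y to plt.plot(x, y) works because matplotlib treats each column of a 2D y as its own line when the first dimension matches x.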

pandas merge two dataframe to form a multiindex

I'm playing around with Pandas to see if I can do some stock calculations better/faster than with other tools. If I have a single stock, it's easy to create a daily calculation:
df['mystuff'] = df['Close']+1
If I download more than one ticker, it gets complicated:
df = df.stack()
df['mystuff'] = df['Close']+1
df = df.unstack()
If I want to use the previous day's "Close", it gets too complex for me. I thought I might go back to fetching a single ticker, do any operation with iloc[i-1] or something similar (I haven't figured it out yet), and then merge the dataframes.
How do I merge two dataframes of single tickers to get a multiindex?
So that:
f1 = web.DataReader('AAPL', 'yahoo', start, end)
f2 = web.DataReader('GOOG', 'yahoo', start, end)
is like
f = web.DataReader(['AAPL','GOOG'], 'yahoo', start, end)
Edit:
This is the nearest thing to f I can create. It's not exactly the same so I'm not sure I can use it instead of f.
f_f = pd.concat({'AAPL':f1,'GOOG':f2},axis=1)
Maybe I should experiment with operations working on a multiindex instead of splitting work on simpler dataframes.
Full Code:
import pandas_datareader.data as web
import pandas as pd
from datetime import datetime
start = datetime(2001, 9, 1)
end = datetime(2019, 8, 31)
a = web.DataReader('AAPL', 'yahoo', start, end)
g = web.DataReader('GOOG', 'yahoo', start, end)
# here are shift/diff calculations that I don't know how to do with a multiindex
a_g = web.DataReader(['AAPL','GOOG'], 'yahoo', start, end)
merged = pd.concat({'AAPL':a,'GOOG':g},axis=1)
a_g.to_csv('ag.csv')
merged.to_csv('merged.csv')
import code; code.interact(local=locals())
Side note: I don't know how to compare the two CSV files.
This is not exactly the same, but it returns a MultiIndex that you can use as in the a_g case:
import pandas_datareader.data as web
import pandas as pd
from datetime import datetime
start = datetime(2019, 7, 1)
end = datetime(2019, 8, 31)
out = []
for tick in ["AAPL", "GOOG"]:
    d = web.DataReader(tick, 'yahoo', start, end)
    cols = [(col, tick) for col in d.columns]
    d.columns = pd.MultiIndex\
        .from_tuples(cols,
                     names=['Attributes', 'Symbols'])
    out.append(d)
df = pd.concat(out, axis=1)
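
A possible shortcut, sketched here on synthetic frames because the download needs network access: pass a dict to pd.concat and then swap the column levels. The helper fake_prices and its random numbers are hypothetical stand-ins for the single-ticker downloads; the level names follow the loop above.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2019-07-01", periods=5, freq="B")
rng = np.random.default_rng(2)

def fake_prices():
    # Hypothetical stand-in for one single-ticker download
    return pd.DataFrame(rng.uniform(100, 200, size=(len(dates), 2)),
                        index=dates, columns=["Open", "Close"])

frames = {"AAPL": fake_prices(), "GOOG": fake_prices()}

# The dict keys become the outer column level: (symbol, attribute)
merged = pd.concat(frames, axis=1)
merged.columns.names = ["Symbols", "Attributes"]

# Swap so that attributes are on top, then sort the columns for tidy grouping
merged = merged.swaplevel(axis=1).sort_index(axis=1)
```

After the swap, merged["Close"] returns one column per ticker, which is the shape that column-wise operations such as shift expect.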
Update
If you have MultiIndex columns and want to calculate and add a new column, you can do the following:
import pandas_datareader.data as web
import pandas as pd
from datetime import datetime
start = datetime(2019, 7, 1)
end = datetime(2019, 8, 31)
ticks = ['AAPL','GOOG']
df = web.DataReader(ticks, 'yahoo', start, end)
names = list(df.columns.names)
df1 = df["Close"].shift()
cols = [("New", col) for col in df1.columns]
df1.columns = pd.MultiIndex.from_tuples(cols,
names=names)
df = df.join(df1)
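
The same steps can be tried offline. This sketch uses a small hand-built frame in place of the download; the numbers and the PrevClose label are invented for illustration. It shows that shift on a MultiIndex-column frame works per ticker, which answers the previous-day question.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2019-07-01", periods=4, freq="B")
cols = pd.MultiIndex.from_product([["Close", "Open"], ["AAPL", "GOOG"]],
                                  names=["Attributes", "Symbols"])
df = pd.DataFrame(np.arange(16, dtype=float).reshape(4, 4),
                  index=dates, columns=cols)

# Previous day's close; shift moves each column down by one row,
# so values from different tickers never mix
prev_close = df["Close"].shift()

# Re-label under a new top-level name so it can sit beside the originals
prev_close.columns = pd.MultiIndex.from_product(
    [["PrevClose"], prev_close.columns], names=df.columns.names)
df = df.join(prev_close)

# Daily change for all tickers at once
change = df["Close"] - df["PrevClose"]
```

The first row of change is NaN for every ticker because there is no previous day to compare against.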

How to develop a function

I am completely new to Python and I am learning it. I have written the following code, but I couldn't turn it into a function. Can somebody help me, please?
import pandas as pd
import numpy as np
f = open('1.csv', 'r')
df = pd.read_csv(f, usecols=[0], sep="\t", index_col=False)
Primary_List = df.values.tolist()
x = 0
y = len(Primary_List)
for i in range(x, y):
    x = i
    MyMatrix = Primary_List[x:x + 10]
    print(MyMatrix)
You could create a function where you pass in the filename, then you could use this code to read and print many csv files.
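def createMatrix(filename):
    # Read the first column of a tab-separated file into a list of rows
    with open(filename, 'r') as f:
        df = pd.read_csv(f, usecols=[0], sep="\t", index_col=False)
    Primary_List = df.values.tolist()
    # Collect every 10-row window; returning inside or right after the
    # loop would hand back only a single window
    windows = []
    for i in range(len(Primary_List)):
        windows.append(Primary_List[i:i + 10])
    return windows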
print(createMatrix('1.csv'))
print(createMatrix('2.csv'))
print(createMatrix('3.csv'))
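
One way to make such a function easier to test is to separate the windowing from the file reading. This is only a sketch: the names sliding_windows and windows_from_csv, and the sample contents, are hypothetical and not part of the original code.

```python
import io
import pandas as pd

def sliding_windows(rows, size=10):
    """Return every run of up to `size` consecutive rows, one per start index."""
    return [rows[i:i + size] for i in range(len(rows))]

def windows_from_csv(source, size=10):
    # `source` may be a path or any file-like object accepted by pd.read_csv
    df = pd.read_csv(source, usecols=[0], sep="\t")
    return sliding_windows(df.values.tolist(), size)

# In-memory stand-in for '1.csv' (made-up contents: a header plus four rows)
sample = io.StringIO("value\tother\n1\ta\n2\tb\n3\tc\n4\td\n")
result = windows_from_csv(sample, size=2)
```

Because sliding_windows takes a plain list, it can be checked without any file on disk, and windows_from_csv stays a thin wrapper around pd.read_csv.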

matplotlib - plotting histogram with unique bins

I am trying to plot a histogram, but the x ticks do not come out right.
The plot is intended to be a histogram of frequency counts for the values 1 to 13, with 10000 rows in total.
d1 = []
for i in np.arange(1, 10000):
    tmp = np.random.randint(1, 13)
    d1.append(tmp)
d2 = pd.DataFrame(d1)
d2.hist(width = 0.5)
plt.xticks(np.arange(1, 14, 1))
I am trying to plot the frequency count of individual values, not ranges.
You need to set the bin edges explicitly. Placing the edges at half-integers (0.5, 1.5, ..., 12.5) gives each integer value its own bin, centered on that value.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
d1 = np.random.randint(1, 13, size=1000)
d2 = pd.DataFrame(d1)
bins = np.arange(0,13)+0.5
d2.hist(bins=bins, ec ="k")
plt.xticks(np.arange(1, 13))
plt.show()
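
To confirm that half-integer edges really give one bin per value, here is a small check without any plotting. It compares np.histogram with the same edges against a direct count from np.bincount; the seed and variable names are chosen only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
d1 = rng.integers(1, 13, size=1000)     # integers 1..12 (upper bound excluded)

edges = np.arange(0, 13) + 0.5          # 0.5, 1.5, ..., 12.5 -> 12 bins
counts, _ = np.histogram(d1, bins=edges)

# bincount index k counts occurrences of k; drop index 0, which never occurs
expected = np.bincount(d1, minlength=13)[1:]
```

Note that np.random.randint(1, 13) never returns 13, so there are 12 distinct values and 12 bins, which matches the tick range np.arange(1, 13) used above.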