Related
import numpy as np
data = np.array([[10, 20, 30, 40, 50, 60, 70, 80, 90],
[2, 7, 8, 9, 10, 11],
[3, 12, 13, 14, 15, 16],
[4, 3, 4, 5, 6, 7, 10, 12]],dtype=object)
target = data[:,0]
It has this error.
IndexError Traceback (most recent call last)
Input In \[82\], in \<cell line: 9\>()
data = np.array(\[\[10, 20, 30, 40, 50, 60, 70, 80, 90\],
\[2, 7, 8, 9, 10, 11\],
\[3, 12, 13, 14, 15, 16\],
\[4, 3, 4, 5, 6, 7, 10,12\]\],dtype=object)
# Define the target data ----\> 9 target = data\[:,0\]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
May I know how to fix it, please? I mean do not change the elements in the data. Many thanks. I made the matrix in the same size and the error message was gone. But I have the data with variable size.
You have a array of objects, so you can't use indexing on axis=1 as there is none (data.shape -> (4,)).
Use a list comprehension:
out = np.array([a[0] for a in data])
Output: array([10, 2, 3, 4])
I have the following dictionary
{'Electronic Arts': 66,
'GT Interactive': 1,
'Palcom': 1,
'Fox Interactive': 1,
'LucasArts': 5,
'Bethesda Softworks': 9,
'SquareSoft': 3,
'Nintendo': 142,
'Virgin Interactive': 4,
'Atari': 7,
'Ubisoft': 28,
'Konami Digital Entertainment': 11,
'Hasbro Interactive': 1,
'MTV Games': 1,
'Sega': 11,
'Enix Corporation': 4,
'Capcom': 13,
'Warner Bros. Interactive Entertainment': 7,
'Acclaim Entertainment': 1,
'Universal Interactive': 1,
'Namco Bandai Games': 7,
'Eidos Interactive': 9,
'THQ': 7,
'RedOctane': 1,
'Sony Computer Entertainment Europe': 3,
'Take-Two Interactive': 24,
'Square Enix': 5,
'Microsoft Game Studios': 22,
'Disney Interactive Studios': 2,
'Vivendi Games': 2,
'Sony Computer Entertainment': 52,
'Activision': 45,
'505 Games': 4}
Now the problem I am facing is viewing the labels. The labels are extremely small and invisible.
Please anyone can suggest on how to increase the label size.
I have tried the below code:
plt.figure(figsize=(80,80))
plt.pie(vg_dict.values(),labels=vg_dict.keys())
plt.show()
Adding textprops argument in plt.pie method:
plt.figure(figsize=(80,80))
plt.pie(vg_dict.values(), labels=vg_dict.keys(), textprops={'fontsize': 30})
plt.show()
You can check all the properties of Text object here.
Updated
I don't know if your labels order matter? To avoid overlapping labels, you can try to modify your start angle (plt start drawing pie counterclockwise from the x-axis), and re-order the "crowded" labels:
vg_dict = {
'Palcom': 1,
'Electronic Arts': 66,
'GT Interactive': 1,
'LucasArts': 5,
'Bethesda Softworks': 9,
'SquareSoft': 3,
'Nintendo': 142,
'Virgin Interactive': 4,
'Atari': 7,
'Ubisoft': 28,
'Hasbro Interactive': 1,
'Konami Digital Entertainment': 11,
'MTV Games': 1,
'Sega': 11,
'Enix Corporation': 4,
'Capcom': 13,
'Acclaim Entertainment': 1,
'Warner Bros. Interactive Entertainment': 7,
'Universal Interactive': 1,
'Namco Bandai Games': 7,
'Eidos Interactive': 9,
'THQ': 7,
'RedOctane': 1,
'Sony Computer Entertainment Europe': 3,
'Take-Two Interactive': 24,
'Vivendi Games': 2,
'Square Enix': 5,
'Microsoft Game Studios': 22,
'Disney Interactive Studios': 2,
'Sony Computer Entertainment': 52,
'Fox Interactive': 1,
'Activision': 45,
'505 Games': 4}
plt.figure(figsize=(80,80))
plt.pie(vg_dict.values(), labels=vg_dict.keys(), textprops={'fontsize': 35}, startangle=-35)
plt.show()
Result:
Problem:
I have a list of ~108 dictionaries named list_of_dictionary and I would like to use Matplotlib to generate line graphs.
The dictionaries have the following format (this is one of 108):
{'price': [59990,
59890,
60990,
62990,
59990,
59690],
'car': '2014 Land Rover Range Rover Sport',
'datetime': [datetime.datetime(2020, 1, 22, 11, 19, 26),
datetime.datetime(2020, 1, 23, 13, 12, 33),
datetime.datetime(2020, 1, 28, 12, 39, 24),
datetime.datetime(2020, 1, 29, 18, 39, 36),
datetime.datetime(2020, 1, 30, 18, 41, 31),
datetime.datetime(2020, 2, 1, 12, 39, 7)]
}
Understanding the dictionary:
The car 2014 Land Rover Range Rover Sport was priced at:
59990 on datetime.datetime(2020, 1, 22, 11, 19, 26)
59890 on datetime.datetime(2020, 1, 23, 13, 12, 33)
60990 on datetime.datetime(2020, 1, 28, 12, 39, 24)
62990 on datetime.datetime(2020, 1, 29, 18, 39, 36)
59990 on datetime.datetime(2020, 1, 30, 18, 41, 31)
59690 on datetime.datetime(2020, 2, 1, 12, 39, 7)
Question:
With this structure how could one create mini-graphs with matplotlib (say 11 rows x 10 columns)?
Where each mini-graph will have:
the title of the graph frome car
x-axis from the datetime
y-axis from the price
What I have tried:
df = pd.DataFrame(list_of_dictionary)
df = df.set_index('datetime')
print(df)
I don't know what to do thereafter...
Relevant Research:
Plotting a column containing lists using Pandas
Pandas column of lists, create a row for each list element
I've read these multiple times, but the more I read it, the more confused I get :(.
I don't know if it's sensible to try and plot that many plots on a figure. You'll have to make some choices to be able to fit all the axes decorations on the page (titles, axes labels, tick labels, etc...).
but the basic idea would be this:
car_data = [{'price': [59990,
59890,
60990,
62990,
59990,
59690],
'car': '2014 Land Rover Range Rover Sport',
'datetime': [datetime.datetime(2020, 1, 22, 11, 19, 26),
datetime.datetime(2020, 1, 23, 13, 12, 33),
datetime.datetime(2020, 1, 28, 12, 39, 24),
datetime.datetime(2020, 1, 29, 18, 39, 36),
datetime.datetime(2020, 1, 30, 18, 41, 31),
datetime.datetime(2020, 2, 1, 12, 39, 7)]
}]*108
fig, axs = plt.subplots(11,10, figsize=(20,22)) # adjust figsize as you please
for car,ax in zip(car_data, axs.flat):
ax.plot(car["datetime"], car['price'], '-')
ax.set_title(car['car'])
Ideally, all your axes could share the same x and y axes so you could have the labels only on the left-most and bottom-most axes. This is taken care of automatically if you add sharex=True and sharey=True to subplots():
fig, axs = plt.subplots(11,10, figsize=(20,22), sharex=True, sharey=True) # adjust figsize as you please
While plotting using scatterplot in matplotlib, I find some of the values from x-axis are missing in the labels. I want to have all the x-axis legends to be displayed in the graph.
This might be related to tick spacing but I am not sure how to set it to display all the x-axis values.
In the sample code, I want to have all the dates displayed on x-axis
x = [datetime.date(2019, 6, 16), datetime.date(2019, 6, 17), datetime.date(2019, 6, 18), datetime.date(2019, 6, 19),
datetime.date(2019, 6, 20), datetime.date(2019, 6, 21), datetime.date(2019, 6, 22), datetime.date(2019, 6, 23),
datetime.date(2019, 6, 24), datetime.date(2019, 6, 25), datetime.date(2019, 6, 26), datetime.date(2019, 6, 27),
datetime.date(2019, 6, 28), datetime.date(2019, 6, 29), datetime.date(2019, 6, 30), datetime.date(2019, 7, 1),
datetime.date(2019, 7, 2), datetime.date(2019, 7, 3), datetime.date(2019, 7, 4), datetime.date(2019, 7, 5),
datetime.date(2019, 7, 6), datetime.date(2019, 7, 7), datetime.date(2019, 7, 8), datetime.date(2019, 7, 9),
datetime.date(2019, 7, 10), datetime.date(2019, 7, 11), datetime.date(2019, 7, 12), datetime.date(2019, 7, 13),
datetime.date(2019, 7, 15)]
y = [0.15338331291011087, 0.15340904024033467, 0.1534195786228156, 0.15343290378685995, 0.15331644003478487,
0.1533570064827251, 0.1531156771286262, 0.15307150988142237, 0.15306137109205153, 0.15302301551230038,
0.15295889536607005, 0.15298157619113423, 0.15286883583977182, 0.15283539558962958, 0.15284508041253356,
0.15281542656182034, 0.1527844647725921, 0.15277054534676898, 0.1527339281127108, 0.15270419704783855,
0.15261812595095475, 0.15255120245035042, 0.15251650362641, 0.15257536163149088, 0.15253967278547242,
0.15249871561808356, 0.15248591103997422, 0.15242121840852002, 0.15248773465596907]
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.scatter(x, y, s=10, c='b', marker="s", label='y')
plt.legend(loc='upper left')
plt.xticks(rotation=90)
plt.show()
Plot that I get with the sample code
Just pass the value of x in the plt.xticks() and set x-axis using 'plt.gcf' it will work.
I have create a random list for the x and plot the graph check it.
from matplotlib import pyplot as plt
from datetime import datetime
def std(a):
return datetime.strptime(a, '%Y, %m, %d').date()
x = [std('2019, 6, 16'), std('2019, 6, 17'), std('2019, 6, 18'), std('2019, 6, 19'),
std('2019, 6, 20'), std('2019, 6, 21'), std('2019, 6, 22'), std('2019, 6, 23'),
std('2019, 6, 24'), std('2019, 6, 25'), std('2019, 6, 26'), std('2019, 6, 27'),
std('2019, 6, 28'), std('2019, 6, 29'), std('2019, 6, 30'), std('2019, 7, 1'),
std('2019, 7, 2'), std('2019, 7, 3'), std('2019, 7, 4'), std('2019, 7, 5'),
std('2019, 7, 6'), std('2019, 7, 7'), std('2019, 7, 8'), std('2019, 7, 9'),
std('2019, 7, 10'), std('2019, 7, 11'), std('2019, 7, 12'), std('2019, 7, 13'),
std('2019, 7, 15')]
y = [0.15338331291011087, 0.15340904024033467, 0.1534195786228156, 0.15343290378685995, 0.15331644003478487,
0.1533570064827251, 0.1531156771286262, 0.15307150988142237, 0.15306137109205153, 0.15302301551230038,
0.15295889536607005, 0.15298157619113423, 0.15286883583977182, 0.15283539558962958, 0.15284508041253356,
0.15281542656182034, 0.1527844647725921, 0.15277054534676898, 0.1527339281127108, 0.15270419704783855,
0.15261812595095475, 0.15255120245035042, 0.15251650362641, 0.15257536163149088, 0.15253967278547242,
0.15249871561808356, 0.15248591103997422, 0.15242121840852002, 0.15248773465596907]
fig = plt.figure(figsize=(8,5))
ax1 = fig.add_subplot(111)
ax1.scatter(x, y, s=10, c='b', marker="s", label='y')
plt.legend(loc='upper left')
#plt.xticks(x,rotation=90)
#plt.xticks(range(len(x)))
plt.gca().margins(x=0)
plt.gcf().canvas.draw()
t_l = plt.gca().get_xticklabels()
maxsize = max([t.get_window_extent().width for t in t_l])
m = .2 # inch margin
s = maxsize/plt.gcf().dpi*len(x)+3*m
margin = m/plt.gcf().get_size_inches()[1]
plt.gcf().subplots_adjust(left=margin, right=0.8-margin)
plt.gcf().set_size_inches(s, plt.gcf().get_size_inches()[1])
plt.xticks(x,rotation=90)
plt.show()
I am stuck. I want to read a simple csv file into a Numpy array and seem to have dug myself into a hole. I am new to Numpy and I am SURE I have messed this up somehow as usually I can read CSV files easily in Python 3.4. I don't want to use Pandas so I thought I would use Numpy to increase my skillset but I really am not getting this at all. If someone could tell me if I am on the right track using genfromtxt OR is there an easier way and give me a nudge in the right direction I would be grateful.
I want to read in the CSV file manipulate the datetime column to 8/4/2014 then put it in a numpy array together with the remaining columns. Here is what I have so far and the error which I am having trouble coding around. I can get the date part way there but don't see how to add the date.strftime("%Y-%m-%d") to the datefunc. Also I don't see how to format the string for SYM to get round the error. Any help would be appreciated.
the data
2015-08-04 02:14:05.249392, AA, 0.0193103612, 0.0193515212, 0.0249713335, 30.6542480634, 30.7195875454, 39.640763021, 0.2131498442, 29.0406746589, 13524.5347810182, 89, 57, 99
2015-08-04 02:14:05.325113, AAPL, 0.0170506271, 0.0137941891, 0.0105915637, 27.0670313481, 21.8975963326, 16.8135861893, -19.0986405157, -23.2172064279, 21.5647072302, 33, 26, 75
2015-08-04 02:14:05.415193, AIG, 0.0080808151, 0.0073296055, 0.0076213535, 12.8278962785, 11.635388035, 12.0985236788, -9.2962105215, 3.980405659, -142.8175077335, 71, 42, 33
2015-08-04 02:14:05.486185, AMZN, 0.0235649449, 0.0305828226, 0.0092703502, 37.4081902773, 48.5487257749, 14.7162247572, 29.7810062852, -69.6877219282, -334.0005615016, 2, 92, 10
the "code" sorry still learning
import numpy as np
from datetime import datetime
from datetime import date,time
datefunc = lambda x: datetime.strptime(x.decode("utf-8"), '%Y-%m-%d %H:%M:%S.%f')
a = np.genfromtxt('/home/dave/Desktop/development/hvanal2016.csv',delimiter = ',',
converters = {0:datefunc},dtype='object,str,float,float,float,float,float,float,float,float,float,float,float,float',
names = ["date","sym","20sd","10sd","5sd","hv20","hv10","hv5","2010hv","105hv","abshv","2010rank","105rank","absrank"])
print(a["date"])
print(a["sym"])
print(a["20sd"])
print(a["hv20"])
print(a["absrank"])
the error
Python 3.4.3+ (default, Oct 14 2015, 16:03:50)
[GCC 5.2.1 20151010] on linux
Type "copyright", "credits" or "license()" for more information.
>>>
============================================================================== RESTART: /home/dave/3 9 15 my slope.py ===============================================================================
[datetime.datetime(2015, 8, 4, 2, 14, 5, 249392)
datetime.datetime(2015, 8, 4, 2, 14, 5, 325113)
datetime.datetime(2015, 8, 4, 2, 14, 5, 415193) ...,
datetime.datetime(2016, 3, 18, 1, 0, 25, 925754)
datetime.datetime(2016, 3, 18, 1, 0, 26, 26400)
datetime.datetime(2016, 3, 18, 1, 0, 26, 114828)]
Traceback (most recent call last):
File "/home/dave/3 9 15 my slope.py", line 19, in <module>
print(a["sym"])
File "/usr/lib/python3/dist-packages/numpy/core/numeric.py", line 1615, in array_str
return array2string(a, max_line_width, precision, suppress_small, ' ', "", str)
File "/usr/lib/python3/dist-packages/numpy/core/arrayprint.py", line 454, in array2string
separator, prefix, formatter=formatter)
File "/usr/lib/python3/dist-packages/numpy/core/arrayprint.py", line 328, in _array2string
_summaryEdgeItems, summary_insert)[:-1]
File "/usr/lib/python3/dist-packages/numpy/core/arrayprint.py", line 490, in _formatArray
word = format_function(a[i]) + separator
UnicodeDecodeError: 'utf-32-le' codec can't decode bytes in position 0-3: code point not in range(0x110000)
So part of your text is
b'2015-08-04 02:14:05.249392 AA 0.0193103612 ...'
(I'm using b because Py3 genfromtxt opens the file a bytestrings).
But you specify a , delimiter. I don't see any commas.
Let's just try a basic load, not fancy business.
In [97]: txt=b"""2015-08-04 02:14:05.249392 AA 0.0193103612 0.0193515212 0.0249713335 30.6542480634 30.7195875454 39.640763021 0.2131498442 29.0406746589 13524.5347810182 89 57 99
2015-08-04 02:14:05.325113 AAPL 0.0170506271 0.0137941891 0.0105915637 27.0670313481 21.8975963326 16.8135861893 -19.0986405157 -23.2172064279 21.5647072302 33 26 75
"""
In [98]: txt=txt.splitlines()
In [99]: data=np.genfromtxt(txt,dtype=None)
In [100]: data
Out[100]:
array([ (b'2015-08-04', b'02:14:05.249392', b'AA', 0.0193103612, 0.0193515212, 0.0249713335, 30.6542480634, 30.7195875454, 39.640763021, 0.2131498442, 29.0406746589, 13524.5347810182, 89, 57, 99),
(b'2015-08-04', b'02:14:05.325113', b'AAPL', 0.0170506271, 0.0137941891, 0.0105915637, 27.0670313481, 21.8975963326, 16.8135861893, -19.0986405157, -23.2172064279, 21.5647072302, 33, 26, 75)],
dtype=[('f0', 'S10'), ('f1', 'S15'), ('f2', 'S4'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8'), ('f6', '<f8'), ('f7', '<f8'), ('f8', '<f8'), ('f9', '<f8'), ('f10', '<f8'), ('f11', '<f8'), ('f12', '<i4'), ('f13', '<i4'), ('f14', '<i4')])
The datetime information is in 2 fields:
In [101]: data[['f0','f1']]
Out[101]:
array([(b'2015-08-04', b'02:14:05.249392'),
(b'2015-08-04', b'02:14:05.325113')],
dtype=[('f0', 'S10'), ('f1', 'S15')])
Your datefunction does work with a byte substring
In [102]: datefunc(b'2015-08-04 02:14:05.249392')
Out[102]: datetime.datetime(2015, 8, 4, 2, 14, 5, 249392)
But it requires 2 fields (as defined by the ' ' delimiter). So we need to figure out a way of parsing these 2 substrings as one, rather than split into two fields.
Maybe I'll try changing the sample txt to really use , delimiter (but not between date and time) and set what works.
With the , delimited text I get:
In [117]: data=np.genfromtxt(txt,delimiter=',',dtype=None,usecols=[0,1,2,3])
In [118]: data.dtype
Out[118]: dtype([('f0', 'S26'), ('f1', 'S5'), ('f2', '<f8'), ('f3', '<f8')])
In [119]: data['f0']
Out[119]:
array([b'2015-08-04 02:14:05.249392', b'2015-08-04 02:14:05.325113',
b'2015-08-04 02:14:05.415193', b'2015-08-04 02:14:05.486185'],
dtype='|S26')
In [120]: [datefunc(d) for d in data['f0']]
Out[120]:
[datetime.datetime(2015, 8, 4, 2, 14, 5, 249392),
datetime.datetime(2015, 8, 4, 2, 14, 5, 325113),
datetime.datetime(2015, 8, 4, 2, 14, 5, 415193),
datetime.datetime(2015, 8, 4, 2, 14, 5, 486185)]
I used usecols because the full text has 14 fields in the 1st line, and 13 in the others.
If I specify the dtype (instead of the easy None), I can replace the strings in the 1st field with these datetime objects:
In [122]: data=np.genfromtxt(txt,delimiter=',',dtype='O,S5,f,f',usecols=[0,1,2,3])
In [123]: data
Out[123]:
array([ (b'2015-08-04 02:14:05.249392', b' AA', 0.01931036077439785, 0.019351521506905556),
(b'2015-08-04 02:14:05.325113', b' AAPL', 0.01705062761902809, 0.01379418931901455),....],
dtype=[('f0', 'O'), ('f1', 'S5'), ('f2', '<f4'), ('f3', '<f4')])
In [124]: data['f0']
Out[124]:
array([b'2015-08-04 02:14:05.249392', b'2015-08-04 02:14:05.325113',
b'2015-08-04 02:14:05.415193', b'2015-08-04 02:14:05.486185'], dtype=object)
....
In [126]: data['f0']=[datefunc(d) for d in data['f0']]
In [127]: data
Out[127]:
array([ (datetime.datetime(2015, 8, 4, 2, 14, 5, 249392), b' AA', 0.01931036077439785, 0.019351521506905556),
(datetime.datetime(2015, 8, 4, 2, 14, 5, 325113), b' AAPL', 0.01705062761902809, 0.01379418931901455),...],
dtype=[('f0', 'O'), ('f1', 'S5'), ('f2', '<f4'), ('f3', '<f4')])
and with the converter, your call works (more or less)
In [133]: data=np.genfromtxt(txt,dtype='object,S5,float,float',
converters = {0:datefunc},delimiter=',',usecols=[0,1,2,3])
In [134]: data
Out[134]:
array([ (datetime.datetime(2015, 8, 4, 2, 14, 5, 249392), b' AA', 0.0193103612, 0.0193515212),
(datetime.datetime(2015, 8, 4, 2, 14, 5, 325113), b' AAPL', 0.0170506271, 0.0137941891),...],
dtype=[('f0', 'O'), ('f1', 'S5'), ('f2', '<f8'), ('f3', '<f8')])
the numpy datetime64 works with this string. These types can be used a numpy numbers.
In [154]: datefunc(b'2015-08-04 02:14:05.249392')
Out[154]: datetime.datetime(2015, 8, 4, 2, 14, 5, 249392)
In [155]: np.datetime64(b'2015-08-04 02:14:05.249392')
Out[155]: numpy.datetime64('2015-08-04T02:14:05.249392-0700')
From this Importing csv into Numpy datetime64 I got this to work:
In [175]: data=np.genfromtxt(txt,dtype='M8[us],S5,float,float',
delimiter=',',usecols=[0,1,2,3])
In [176]: data
Out[176]:
array([ (datetime.datetime(2015, 8, 4, 9, 14, 5, 249392), b' AA', 0.0193103612, 0.0193515212),
(datetime.datetime(2015, 8, 4, 9, 14, 5, 325113), b' AAPL', 0.0170506271, 0.0137941891),...],
dtype=[('f0', '<M8[us]'), ('f1', 'S5'), ('f2', '<f8'), ('f3', '<f8')])
See for datetime units: http://docs.scipy.org/doc/numpy/reference/arrays.datetime.html#datetime-units