Data handling for matplotlib histogram with error bars - matplotlib

I've got a data set which is a list of tuples in python like this:
dataSet = [(6.1248199999999997, 27), (6.4400500000000003, 4), (5.9150600000000004, 1), (5.5388400000000004, 38), (5.82559, 1), (7.6892199999999997, 2), (6.9047799999999997, 1), (6.3516300000000001, 76), (6.5168699999999999, 1), (7.4382099999999998, 1), (5.4493299999999998, 1), (5.6254099999999996, 1), (6.3227700000000002, 1), (5.3321899999999998, 11), (6.7402300000000004, 4), (7.6701499999999996, 1), (5.4589400000000001, 3), (6.3089700000000004, 1), (6.5926099999999996, 2), (6.0003000000000002, 5), (5.9845800000000002, 1), (6.4967499999999996, 2), (6.51227, 6), (7.0302600000000002, 1), (5.7271200000000002, 49), (7.5311300000000001, 7), (5.9495800000000001, 2), (5.1487299999999996, 18), (5.7637099999999997, 6), (5.5144500000000001, 44), (6.7988499999999998, 1), (5.2578399999999998, 1)]
Here the first element of each tuple is an energy and the second is a counter: how many sensors were affected.
I want to create a histogram to study the relation between the number of affected sensors and the energy. I'm pretty new to matplotlib (and python), but this is what I've done so far:
import math
import matplotlib.pyplot as plt
dataSet = [(6.1248199999999997, 27), (6.4400500000000003, 4), (5.9150600000000004, 1), (5.5388400000000004, 38), (5.82559, 1), (7.6892199999999997, 2), (6.9047799999999997, 1), (6.3516300000000001, 76), (6.5168699999999999, 1), (7.4382099999999998, 1), (5.4493299999999998, 1), (5.6254099999999996, 1), (6.3227700000000002, 1), (5.3321899999999998, 11), (6.7402300000000004, 4), (7.6701499999999996, 1), (5.4589400000000001, 3), (6.3089700000000004, 1), (6.5926099999999996, 2), (6.0003000000000002, 5), (5.9845800000000002, 1), (6.4967499999999996, 2), (6.51227, 6), (7.0302600000000002, 1), (5.7271200000000002, 49), (7.5311300000000001, 7), (5.9495800000000001, 2), (5.1487299999999996, 18), (5.7637099999999997, 6), (5.5144500000000001, 44), (6.7988499999999998, 1), (5.2578399999999998, 1)]
binWidth = .2
binnedDataSet = []
#create another list and append the "binning-value"
for item in dataSet:
    binnedDataSet.append((item[0], item[1], math.floor(item[0]/binWidth)*binWidth))
energies, sensorHits, binnedEnergy = [[q[i] for q in binnedDataSet] for i in (0,1,2)]
plt.plot(binnedEnergy, sensorHits, 'ro')
plt.show()
This works so far (although it doesn't even look like a histogram ;-) but OK), but now I want to calculate the mean value for each bin and append some error bars.
What's the way to do it? I looked at histogram examples for matplotlib, but they all use one-dimensional data which will be counted, so you get a frequency spectrum… That's not really what I want.

I am somewhat confused by exactly what you are trying to do, but I think this (to first order) will do what I think you want:
import math
import matplotlib.pyplot as plt

bin_width = .2
bottom = 5.0
top = 8.0
n_bins = int(math.ceil((top - bottom) / bin_width))
binned_data = [0.0] * n_bins   # sum of sensor hits per energy bin
binned_count = [0] * n_bins    # number of entries per energy bin
for E, cnt in dataSet:
    # E == top would index one past the last bin, so it is rejected as well
    if E < bottom or E >= top:
        print('out of range')
        continue
    bin_id = int(math.floor(n_bins * (E - bottom) / (top - bottom)))
    binned_data[bin_id] += cnt
    binned_count[bin_id] += 1
# mean hits per bin; empty bins are reported as 0
binned_averaged_data = [C_sum / hits if hits > 0 else 0 for C_sum, hits in zip(binned_data, binned_count)]
bin_edges = [bottom + j * bin_width for j in range(n_bins)]
# align='edge' makes each bar start at its left bin edge
plt.bar(bin_edges, binned_averaged_data, width=bin_width, align='edge')
plt.show()
I would also suggest looking into numpy, it would make this much simpler to write.
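As a rough illustration of the numpy suggestion, here is a minimal sketch that computes the mean number of hits per energy bin together with the standard error of the mean, which could serve as the error bar. The function name binned_mean_and_error and the choice of the standard error as the error estimate are my assumptions, not something the question specifies; the range 5.0 to 8.0 and the bin width 0.2 are taken from the code above.

```python
import numpy as np


def binned_mean_and_error(energies, hits, bottom=5.0, top=8.0, bin_width=0.2):
    """Return bin centers, per-bin mean of `hits`, and standard error of that mean.

    Hypothetical helper: bins without entries get NaN for the mean, and bins
    with fewer than two entries get NaN for the error.
    """
    energies = np.asarray(energies, dtype=float)
    hits = np.asarray(hits, dtype=float)
    n_bins = int(round((top - bottom) / bin_width))
    edges = np.linspace(bottom, top, n_bins + 1)

    # np.digitize gives 1-based bin numbers for values inside [bottom, top)
    idx = np.digitize(energies, edges) - 1
    inside = (idx >= 0) & (idx < n_bins)
    idx, hits = idx[inside], hits[inside]

    count = np.bincount(idx, minlength=n_bins)
    total = np.bincount(idx, weights=hits, minlength=n_bins)
    total_sq = np.bincount(idx, weights=hits ** 2, minlength=n_bins)

    mean = np.full(n_bins, np.nan)
    sem = np.full(n_bins, np.nan)
    filled = count > 0
    mean[filled] = total[filled] / count[filled]

    # the sample variance needs at least two entries in a bin
    multi = count > 1
    var = (total_sq[multi] - count[multi] * mean[multi] ** 2) / (count[multi] - 1)
    sem[multi] = np.sqrt(np.maximum(var, 0.0) / count[multi])

    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, mean, sem


# three entries copied from the question's dataSet, just to show the call
sample = [(6.12482, 27), (6.44005, 4), (5.91506, 1)]
centers, mean, sem = binned_mean_and_error(*zip(*sample))
```

The result could then be drawn with something like plt.errorbar(centers, mean, yerr=sem, fmt='o'). Whether the standard error is the right uncertainty for this kind of sensor data is a separate question.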

Related

PysimpleGUI: subplots - plotting graphs next to each other when some are deselected

I have code that plots several graphs using subplots. My main window has checkboxes with which I select or deselect the graphs/data I want to plot. When I select e.g. graphs 1, 3 & 4 and deselect graph 2, is it possible to plot graphs 3 & 4 directly next to graph 1? With my current code a gap is left where graph 2 would have been.
if event == 'Plot Lithology':
    fig1, ax = plt.subplots(figsize=(7, 5), dpi=100)
    mng = plt.get_current_fig_manager()
    mng.resize(*mng.window.maxsize())
    ax1 = plt.subplot2grid((1, 7), (0, 0), rowspan=1, colspan=1)
    ax2 = plt.subplot2grid((1, 7), (0, 1), rowspan=1, colspan=1)
    ax3 = plt.subplot2grid((1, 7), (0, 2), rowspan=1, colspan=1)
    ax4 = plt.subplot2grid((1, 7), (0, 3), rowspan=1, colspan=1, sharey=ax1)
    ax5 = ax2.twiny()
    ax6 = plt.subplot2grid((1, 7), (0, 4), rowspan=1, colspan=1, sharey=ax1)
    ax7 = plt.subplot2grid((1, 7), (0, 5), rowspan=1, colspan=1, sharey=ax1)
    canvas = FigureCanvasTkAgg(fig1, master=canvas_elem.TKCanvas)
    canvas.draw()
    # .... some more code .....
    plt.show()
    main_window.close()
# The ax are defined before the fig. as e.g.:
if values['litho']:
    ax1.pcolormesh([0, 1], df_depth['Top_Depth'],
                   df_depth['Lithological Unit'][:-1].map(lit).to_numpy().reshape(-1, 1), cmap=cmap, vmin=1,
                   vmax=len(colors))
I tried using matplotlib.pyplot.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=None), but it either raised errors or had no effect.
Is it possible to plot the selected graphs next to each other, so that deselecting a graph leaves no gap between the remaining ones? Which function can I use?
Thank you
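One possible direction, sketched under assumptions: instead of always reserving seven fixed columns, count the selected graphs first and create exactly that many columns, so there is no empty slot to close. The helper names column_for_each_plot and build_axes are hypothetical, as are all checkbox keys other than 'litho'. The shared y axes and the twiny axis from the code above are not handled here.

```python
def column_for_each_plot(selected):
    """Map each selected plot name to a gap-free column index, keeping the original order.

    `selected` is a dict such as {'litho': True, 'gamma': False}; only 'litho'
    appears in the question, the other keys used in this sketch are made up.
    """
    names = [name for name, is_on in selected.items() if is_on]
    return {name: col for col, name in enumerate(names)}


def build_axes(selected, figsize=(7, 5), dpi=100):
    """Create one subplot column per selected plot and return (fig, {name: axes})."""
    # imported here so that the layout helper above has no dependencies
    import matplotlib.pyplot as plt

    columns = column_for_each_plot(selected)
    # squeeze=False keeps `axes` two-dimensional even when only one column exists
    fig, axes = plt.subplots(1, max(len(columns), 1), figsize=figsize, dpi=dpi,
                             squeeze=False)
    return fig, {name: axes[0, col] for name, col in columns.items()}


layout = column_for_each_plot({'litho': True, 'gamma': False, 'density': True})
```

In the event handler, the drawing code would then look up its axes by name, for example axes_by_name['litho'].pcolormesh(...), and skip any plot whose name is missing from the dictionary.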

Performing a mod function on time data column pandas python

Hello, I want to apply a modulo operation (hour % 24) to the hour part of the time column.
I believe the time column is stored as strings,
and I was wondering how I should go about performing the operation.
sales_id,date,time,shopping_cart,price,parcel_size,Customer_lat,Customer_long,isLoyaltyProgram,nearest_storehouse_id,nearest_storehouse,dist_to_nearest_storehouse,delivery_cost
ORD0056604,24/03/2021,45:13:45,"[('bed', 3), ('Chair', 1), ('wardrobe', 4), ('side_table', 2), ('Dining_table', 2), ('mattress', 1)]",3152.77,medium,-38.246,145.61984,1,4,Sunshine,78.43,5.8725000000000005
ORD0096594,13/12/2018,54:22:20,"[('Study_table', 4), ('wardrobe', 4), ('side_table', 1), ('Dining_table', 2), ('sofa', 4), ('Chair', 3), ('mattress', 1)]",3781.38,large,-38.15718,145.05072,1,4,Sunshine,40.09,5.8725000000000005
ORD0046310,16/02/2018,17:23:36,"[('mattress', 2), ('wardrobe', 1), ('side_table', 2), ('sofa', 1), ('Chair', 3), ('Study_table', 4)]",2219.09,medium,144.69623,-38.00731,0,2,Footscray,34.2,16.9875
ORD0031675,25/06/2018,17:38:48,"[('bed', 4), ('side_table', 1), ('Chair', 1), ('mattress', 3), ('Dining_table', 2), ('sofa', 2), ('wardrobe', 2)]",4542.1,large,144.65506,-38.40669,1,2,Footscray,72.72,18.274500000000003
ORD0019799,05/01/2021,18:37:16,"[('wardrobe', 1), ('Study_table', 3), ('sofa', 4), ('side_table', 2), ('Chair', 4), ('Dining_table', 4), ('bed', 1)]",3132.71,L,-37.66022,144.94286,1,0,Clayton,17.77,14.931
ORD0041462,25/12/2018,07:29:33,"[('Chair', 3), ('bed', 1), ('mattress', 3), ('side_table', 3), ('wardrobe', 3), ('sofa', 4)]",4416.42,medium,-38.39154,145.87448,0,6,Sunshine,105.91,6.151500000000001
ORD0047848,30/07/2021,34:18:01,"[('Chair', 3), ('bed', 3), ('wardrobe', 4)]",2541.04,small,-37.4654,144.45832,1,2,Footscray,60.85,18.4635
Convert the values to timedeltas with to_timedelta, then drop the days part by keeping only the last 8 characters of the string representation:
print (df)
sales_id date time
0 ORD0056604 24/03/2021 45:13:45
1 ORD0096594 13/12/2018 54:22:20
print (pd.to_timedelta(df['time']))
0 1 days 21:13:45
1 2 days 06:22:20
Name: time, dtype: timedelta64[ns]
df['time'] = pd.to_timedelta(df['time']).astype(str).str[-8:]
print (df)
sales_id date time
0 ORD0056604 24/03/2021 21:13:45
1 ORD0096594 13/12/2018 06:22:20
If you also need to add the overflow days to the date column, add the timedeltas (built from the original time values) to the dates and then extract both parts with Series.dt.strftime:
dates = pd.to_datetime(df['date'], dayfirst=True) + pd.to_timedelta(df['time'])
df['time'] = dates.dt.strftime('%H:%M:%S')
df['date'] = dates.dt.strftime('%d/%m/%Y')
print (df)
sales_id date time
0 ORD0056604 25/03/2021 21:13:45
1 ORD0096594 15/12/2018 06:22:20
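For comparison, the literal hour % 24 operation from the question can also be done on the strings directly. This is only a sketch: wrap_hour is a hypothetical helper, and it assumes every value has the form HH:MM:SS. Unlike the timedelta approach it does not carry the overflow into the date.

```python
def wrap_hour(time_str):
    """Return time_str ("HH:MM:SS") with the hour part reduced modulo 24."""
    hours, minutes, seconds = time_str.split(":")
    return f"{int(hours) % 24:02d}:{minutes}:{seconds}"


# a few time values copied from the sample rows in the question
times = ["45:13:45", "54:22:20", "17:23:36", "07:29:33"]
wrapped = [wrap_hour(t) for t in times]
```

On a DataFrame this could be applied column-wise with df['time'].map(wrap_hour).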

Is there any way to map points between 2 elliptic curves?

The elliptic curve E1: y^2 = x^3+7 over F17 has the base point G = (15, 13),
and the second elliptic curve E2: y^2 = x^3+7 over F31 uses the same base point G = (15, 13).
My question is: is there any way to calculate the equivalent point on the F31 curve from a point on the F17 curve?
For example, given 7G = (10, 15) on the F17 curve, how can I calculate 7G on the F31 curve? The result should be 7G = (12, 14).
Below are all the points of the two curves:
#----Curve F17-------#
1G = (15, 13)
2G = (2, 10)
3G = (8, 3)
4G = (12, 1)
5G = (6, 6)
6G = (5, 8)
7G = (10, 15)
8G = (1, 12)
9G = (3, 0)
10G = (1, 5)
11G = (10, 2)
12G = (5, 9)
13G = (6, 11)
14G = (12, 16)
15G = (8, 14)
16G = (2, 7)
17G = (15, 4)
#----Curve F31-------#
1G = (15, 13)
2G = (29, 17)
3G = (1, 22)
4G = (20, 19)
5G = (21, 17)
6G = (23, 23)
7G = (12, 14)
8G = (11, 27)
9G = (25, 22)
10G = (7, 19)
11G = (27, 27)
12G = (5, 9)
13G = (0, 24)
14G = (4, 12)
15G = (22, 23)
16G = (3, 13)
17G = (13, 18)
18G = (17, 23)
19G = (24, 4)
20G = (24, 27)
21G = (17, 8)
22G = (13, 13)
23G = (3, 18)
24G = (22, 8)
25G = (4, 19)
26G = (0, 7)
27G = (5, 22)
28G = (27, 4)
29G = (7, 12)
30G = (25, 9)
31G = (11, 4)
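As far as I can tell, the only link between the two lists is the scalar k, so one way to reproduce the example is to recover k on the small F17 curve by brute force and then recompute kG over F31. The sketch below does that. The function names are mine, and the brute-force search is only feasible because the group is tiny. Two caveats follow from the numbers above: according to the F17 list, 17G = (15, 4) is the negative of G, so k is only determined modulo 18 there; and (15, 13) satisfies y^2 = x^3 + 18 rather than x^3 + 7 modulo 31, which goes unnoticed because the addition formulas for curves of this shape (a = 0) never use the constant term.

```python
def on_curve(P, p, b):
    """Check whether P lies on y^2 = x^3 + b over F_p."""
    x, y = P
    return (y * y - x ** 3 - b) % p == 0


def ec_add(P, Q, p):
    """Add two points on y^2 = x^3 + b over F_p (a = 0); None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P + (-P) is the point at infinity
    if P == Q:
        # tangent slope for a = 0; inverse via Fermat's little theorem
        lam = 3 * x1 * x1 * pow(2 * y1 % p, p - 2, p) % p
    else:
        # chord slope
        lam = (y2 - y1) * pow((x2 - x1) % p, p - 2, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)


def ec_mul(k, P, p):
    """Compute kP by double-and-add."""
    result = None
    while k > 0:
        if k & 1:
            result = ec_add(result, P, p)
        P = ec_add(P, P, p)
        k >>= 1
    return result


def small_dlog(target, G, p):
    """Brute-force the smallest k >= 1 with kG == target (tiny groups only)."""
    Q, k = G, 1
    while Q != target:
        if Q is None:
            raise ValueError("target is not a multiple of G")
        Q = ec_add(Q, G, p)
        k += 1
    return k


G = (15, 13)
k = small_dlog((10, 15), G, 17)   # recover the scalar on the F17 curve
point_f31 = ec_mul(k, G, 31)      # recompute the same multiple over F31
```

For curves of cryptographic size the first step is the discrete logarithm problem, so this should not be read as a general mapping between the two curves.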

Pandas: how to find the positions of cells that contain a substring

Example:
Price | Rate p/lot | Total Comm|
947.2 1.25 BAM 1.25
129.3 2.1 $ 1.25
161.69 $ 0.8 CAD 2.00
If I search for ['$','CAD']:
Expected output:
[(1, 2), (2, 1),(2,2)]
Sorry, I found a solution like this; it may help someone:
import pandas as pd
df = pd.DataFrame([[947.2, 1.25, 'BAM 1.25'],
[129.3, 2.1, '$ 1.25'],
[161.69, '0.8 $', 'CAD 2.00']],
columns=['Price', 'Rate p/lot', 'Total Comm'])
row, column = (df.applymap(lambda x: x if any(s in str(x) for s in ['$','CAD']) else None )).values.nonzero()
t = list(zip(row,column))
You can use in with applymap:
i, j = (df.applymap(lambda x: '$' in str(x))).values.nonzero()
t = list(zip(i, j))
print (t)
[(1, 2), (2, 1)]
L = ['$', 'CAD']
i, j = (df.applymap(lambda x: any(y for y in L if y in str(x)))).values.nonzero()
#another solution
#i, j = (df.applymap(lambda x: any(s in str(x) for s in L))).values.nonzero()
t = list(zip(i, j))
print (t)
[(1, 2), (2, 1), (2, 2)]
Use str.contains:
from functools import reduce
from itertools import product
df = df.astype(str)
# each tuple built here is (column position, row index)
result = reduce(lambda x,y:x+y, [list(product([i],list(df.iloc[:,i][df.iloc[:,i].str.contains(r'\$|CAD')].index))) for i in range(len(df.columns))])
Output
[(1, 2), (2, 1), (2, 2)]

Create polygons based on GPS-logging points SQL 2008

I'm using SQL Server 2008 and I want to create a database view in which polygons are created based on points in a results-table.
Some remarks:
The results-table contains i.a. the columns 'ScanID', 'Location', 'Latitude', 'Longitude', 'Automatic'.
Each consecutive series of ScanIDs that shares the same value in the column 'Location' should result in one unique polygon.
The column 'Automatic' defines whether a specific point should be taken into account (1) or not (0).
The first vertex of the polygon should be repeated in the end of the polygon-string to close the polygon. The results-table only contains GPS-logging points and doesn't repeat the first point.
Some representative sample data:
(7894560, 'Lake', '52.9891', '5.1206', 0),
(7894561, 'Lake', '52.9901', '5.1201', 1),
(7894562, 'Lake', '52.9901', '5.1211', 1),
(7894563, 'Lake', '52.9911', '5.1211', 1),
(7894564, 'Lake', '52.9911', '5.1201', 1),
(7894565, 'House', '52.9901', '5.1211', 1),
(7894566, 'House', '52.9901', '5.1221', 1),
(7894567, 'House', '52.9911', '5.1221', 1),
(7894568, 'House', '52.9911', '5.1211', 1),
(7894569, 'Lake', '52.9901', '5.1221', 1),
(7894570, 'Lake', '52.9901', '5.1231', 1),
(7894571, 'Lake', '52.9911', '5.1231', 1),
(7894572, 'Lake', '52.9911', '5.1221', 1);
Ideally the output would be formatted as follows:
(Location, Shape)
('Lake', 'POLYGON (52.9901 5.1201, 52.9901 5.1211, 52.9911 5.1211, 52.9911 5.1201, 52.9901 5.1201)')
('House', 'POLYGON (52.9901 5.1211, 52.9901 5.1221, 52.9911 5.1221, 52.9911 5.1211, 52.9901 5.1211)')
('Lake', 'POLYGON (52.9901 5.1221, 52.9901 5.1231, 52.9911 5.1231, 52.9911 5.1221, 52.9901 5.1221)')
I've got another database view in which I create linestrings by taking two consecutive points. The code I use there is:
GEOMETRY::STGeomFromText('LINESTRING('
+ CONVERT(VARCHAR, CAST(t1.latitude AS decimal(20, 16))) + ' ' + CONVERT(VARCHAR, CAST(t1.longitude AS decimal(20, 16))) + ', '
+ CONVERT(VARCHAR, CAST(t2.latitude AS decimal(20, 16))) + ' ' + CONVERT(VARCHAR, CAST(t2.longitude AS decimal(20, 16))) + ')', 4326) AS Shape
I don't think I should solve my problem by transforming this piece of code, but it gives some direction. Probably 'GEOMETRY::STGeomFromText' or something like it should be used.
Thanks in advance!
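To make the grouping rule concrete, here is a small Python sketch of the logic only, not a SQL Server view. It drops the points with Automatic = 0, splits the rest into consecutive runs with the same Location, and closes each ring by repeating its first vertex. The function name polygons_from_points is hypothetical. I am assuming that a skipped point does not start a new polygon and that gaps in ScanID do not matter. The coordinate order follows the question (latitude first), and the double parentheses are what well-known text uses for a polygon ring.

```python
from itertools import groupby

# sample rows from the question: (ScanID, Location, Latitude, Longitude, Automatic)
rows = [
    (7894560, 'Lake', '52.9891', '5.1206', 0),
    (7894561, 'Lake', '52.9901', '5.1201', 1),
    (7894562, 'Lake', '52.9901', '5.1211', 1),
    (7894563, 'Lake', '52.9911', '5.1211', 1),
    (7894564, 'Lake', '52.9911', '5.1201', 1),
    (7894565, 'House', '52.9901', '5.1211', 1),
    (7894566, 'House', '52.9901', '5.1221', 1),
    (7894567, 'House', '52.9911', '5.1221', 1),
    (7894568, 'House', '52.9911', '5.1211', 1),
    (7894569, 'Lake', '52.9901', '5.1221', 1),
    (7894570, 'Lake', '52.9901', '5.1231', 1),
    (7894571, 'Lake', '52.9911', '5.1231', 1),
    (7894572, 'Lake', '52.9911', '5.1221', 1),
]


def polygons_from_points(rows):
    """Build one closed polygon string per consecutive run of the same Location."""
    # keep only points flagged Automatic = 1, ordered by ScanID
    kept = sorted((r for r in rows if r[4] == 1), key=lambda r: r[0])
    result = []
    # groupby only merges adjacent rows, so the two 'Lake' runs stay separate
    for location, run in groupby(kept, key=lambda r: r[1]):
        vertices = [f"{lat} {lon}" for _, _, lat, lon, _ in run]
        vertices.append(vertices[0])  # repeat the first vertex to close the ring
        result.append((location, "POLYGON ((" + ", ".join(vertices) + "))"))
    return result


shapes = polygons_from_points(rows)
```

In SQL the same idea is usually described as a gaps-and-islands problem: assign a run number to each block of consecutive rows with the same Location, concatenate the points per run, and pass the resulting string to GEOMETRY::STGeomFromText.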