I want to produce a scatter plot of the per-minute sum of the flight field. My data is here:
http://python2018.byethost10.com/flights.csv
My code is as follows:
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib
matplotlib.rcParams['font.sans-serif'] = ['Noto Serif CJK TC']
matplotlib.rcParams['font.family']='sans-serif'
df = pd.read_csv('flights.csv')
df['time_hour'] = pd.to_datetime(df['time_hour'])
grp = df.groupby(by=[df.time_hour.map(lambda x : (x.hour, x.minute))])
a=grp.sum()
plt.scatter(a.index, a['flight'], c='b', marker='o')
plt.xlabel('index value', fontsize=16)
plt.ylabel('flight', fontsize=16)
plt.title('scatter plot - index value vs. flight (data range A row & E row )', fontsize=20)
plt.show()
It produced the following error:
Traceback (most recent call last):
  File "I:/PycharmProjects/1223/raise1/char3.py", line 10, in <module>
    plt.scatter(a.index, a['flight'], c='b', marker='o')
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\pyplot.py", line 3470, in scatter
    edgecolors=edgecolors, data=data, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\__init__.py", line 1855, in inner
    return func(ax, *args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py", line 4320, in scatter
    alpha=alpha
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\collections.py", line 927, in __init__
    Collection.__init__(self, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\collections.py", line 159, in __init__
    offsets = np.asanyarray(offsets, float)
  File "C:\ProgramData\Anaconda3\lib\site-packages\numpy\core\numeric.py", line 544, in asanyarray
    return array(a, dtype, copy=False, order=order, subok=True)
ValueError: setting an array element with a sequence.
How can I produce the following results? Thank you.
http://python2018.byethost10.com/image.png
The problem is in the aggregation: in your code the groupby returns tuples in the index.
The solution is to convert the time_hour column to strings HH:MM with Series.dt.strftime:
a = df.groupby(by=[df.time_hour.dt.strftime('%H:%M')]).sum()
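To see why scatter chokes, you can compare the two indexes (a quick check, using the same DataFrame):

# with the original key the index entries are tuples like (5, 0)
print(df.groupby(df.time_hour.map(lambda x: (x.hour, x.minute))).sum().index[:3])
# with strftime they are plain strings like '05:00'
print(df.groupby(df.time_hour.dt.strftime('%H:%M')).sum().index[:3])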
All together:
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib
matplotlib.rcParams['font.sans-serif'] = ['Noto Serif CJK TC']
matplotlib.rcParams['font.family']='sans-serif'
# first column is the index and the second column is parsed to datetimes
df=pd.read_csv('flights.csv', index_col=[0], parse_dates=[1])
a = df.groupby(by=[df.time_hour.dt.strftime('%H:%M')]).sum()
print (a)
year sched_dep_time flight air_time distance hour minute
time_hour
05:00 122793 37856 87445 11282.0 72838 366 1256
05:01 120780 44810 82113 11115.0 71168 435 1310
05:02 122793 52989 99975 11165.0 72068 515 1489
05:03 120780 57653 98323 10366.0 65137 561 1553
05:04 122793 67706 110230 10026.0 63118 661 1606
05:05 122793 75807 126426 9161.0 55371 742 1607
05:06 120780 82010 120753 10804.0 67827 799 2110
05:07 122793 90684 130339 8408.0 52945 890 1684
05:08 120780 93687 114415 10299.0 63271 922 1487
05:09 122793 101571 99526 11525.0 72915 1002 1371
05:10 122793 107252 107961 10383.0 70137 1056 1652
05:11 120780 111351 120261 10949.0 73350 1098 1551
05:12 122793 120575 135930 8661.0 57406 1190 1575
05:13 120780 118272 104763 7784.0 55886 1166 1672
05:14 122793 37289 109300 9838.0 63582 364 889
05:15 122793 42374 67193 11480.0 78183 409 1474
05:16 58377 22321 53424 4271.0 27527 216 721
plt.scatter(a.index, a['flight'], c='b', marker='o')
#rotate labels of x axis
plt.xticks(rotation=90)
plt.xlabel('index value', fontsize=16)
plt.ylabel('flight', fontsize=16)
plt.title('scatter plot - index value vs. flight (data range A row & E row )', fontsize=20)
plt.show()
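With one label per minute the categorical x axis gets crowded; an optional tweak (my addition, not part of the original answer) is to hide most tick labels just before plt.show():

# show only every 60th x tick label so the axis stays readable
ax = plt.gca()
for i, label in enumerate(ax.xaxis.get_ticklabels()):
    label.set_visible(i % 60 == 0)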
Another solution is to convert the datetimes to times:
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib
matplotlib.rcParams['font.sans-serif'] = 'Noto Serif CJK TC'
matplotlib.rcParams['font.family']='sans-serif'
df=pd.read_csv('flights.csv', index_col=[0], parse_dates=[1])
a = df.groupby(by=[df.time_hour.dt.time]).sum()
print (a)
year sched_dep_time flight air_time distance hour minute
time_hour
05:00:00 122793 37856 87445 11282.0 72838 366 1256
05:01:00 120780 44810 82113 11115.0 71168 435 1310
05:02:00 122793 52989 99975 11165.0 72068 515 1489
05:03:00 120780 57653 98323 10366.0 65137 561 1553
05:04:00 122793 67706 110230 10026.0 63118 661 1606
05:05:00 122793 75807 126426 9161.0 55371 742 1607
05:06:00 120780 82010 120753 10804.0 67827 799 2110
05:07:00 122793 90684 130339 8408.0 52945 890 1684
05:08:00 120780 93687 114415 10299.0 63271 922 1487
05:09:00 122793 101571 99526 11525.0 72915 1002 1371
05:10:00 122793 107252 107961 10383.0 70137 1056 1652
05:11:00 120780 111351 120261 10949.0 73350 1098 1551
05:12:00 122793 120575 135930 8661.0 57406 1190 1575
05:13:00 120780 118272 104763 7784.0 55886 1166 1672
05:14:00 122793 37289 109300 9838.0 63582 364 889
05:15:00 122793 42374 67193 11480.0 78183 409 1474
05:16:00 58377 22321 53424 4271.0 27527 216 721
plt.scatter(a.index, a['flight'], c='b', marker='o')
plt.xticks(rotation=90)
plt.xlabel('index value', fontsize=16)
plt.ylabel('flight', fontsize=16)
plt.title('scatter plot - index value vs. flight (data range A row & E row )', fontsize=20)
plt.show()
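A further variant (my sketch, not part of the original answer) keeps a real time axis instead of a categorical one, by collapsing every timestamp onto a single hypothetical reference day and letting matplotlib format the ticks:

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

df = pd.read_csv('flights.csv', index_col=[0], parse_dates=[1])
# collapse all timestamps onto one reference day, keeping only HH:MM
ref = pd.to_datetime(df['time_hour'].dt.strftime('2000-01-01 %H:%M'))
a = df.groupby(ref).sum()
fig, ax = plt.subplots()
ax.scatter(a.index, a['flight'], c='b', marker='o')
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
plt.show()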
Related
I've used lifelines a lot, but when I re-run old code that previously worked fine I get the following error: KeyError: "None of [Index(['At risk', 'Censored', 'Events'], dtype='object')] are in the [index]"
I'm guessing there have been some changes to the code for displaying at-risk counts, but I can't find any evidence of it in the lifelines documentation. I am using version 0.27.0.
Snippet of the table with data:

index  t2p   O
1      354   False
2      113   False
3      1222  False
4      13    True
5      59    False
6      572   False
Code:
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter
from lifelines.plotting import add_at_risk_counts

# h is the DataFrame shown above
ax = plt.subplot(111)
m = KaplanMeierFitter()
ax = m.fit(h.t2p, h.O, label='PPI').plot_cumulative_density(ax=ax, ci_show=False)
add_at_risk_counts(m)
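As an aside, the signature in the traceback below shows that add_at_risk_counts also accepts an ax argument; one hedged thing to try (an assumption on my part, not a confirmed fix) is attaching the counts to the existing axes explicitly:

# hypothetical variant, not a confirmed fix: pass the axes explicitly
add_at_risk_counts(m, ax=ax)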
Full error:
KeyError Traceback (most recent call last)
<ipython-input-96-a8ce3ea9e60c> in <module>
4 ax = m.fit(h.t2p, h.O, label='PPI').plot_cumulative_density(ax=ax,ci_show=False)
5
----> 6 add_at_risk_counts(m)
7
8
~\AppData\Local\Continuum\anaconda3\lib\site-packages\lifelines\plotting.py in add_at_risk_counts(labels, rows_to_show, ypos, xticks, ax, at_risk_count_from_start_of_period, *fitters, **kwargs)
510 .rename({"at_risk": "At risk", "censored": "Censored", "observed": "Events"})
511 )
--> 512 counts.extend([int(c) for c in event_table_slice.loc[rows_to_show]])
513
514 if n_rows > 1:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
1766
1767 maybe_callable = com.apply_if_callable(key, self.obj)
-> 1768 return self._getitem_axis(maybe_callable, axis=axis)
1769
1770 def _is_scalar_access(self, key: Tuple):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
1952 raise ValueError("Cannot index with multidimensional key")
1953
-> 1954 return self._getitem_iterable(key, axis=axis)
1955
1956 # nested tuple slicing
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_iterable(self, key, axis)
1593 else:
1594 # A collection of keys
-> 1595 keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
1596 return self.obj._reindex_with_indexers(
1597 {axis: [keyarr, indexer]}, copy=True, allow_dups=True
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
1551
1552 self._validate_read_indexer(
-> 1553 keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
1554 )
1555 return keyarr, indexer
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
1638 if missing == len(indexer):
1639 axis_name = self.obj._get_axis_name(axis)
-> 1640 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
1641
1642 # We (temporarily) allow for some missing keys with .loc, except in
KeyError: "None of [Index(['At risk', 'Censored', 'Events'], dtype='object')] are in the [index]"
I have the following data with the name 'Salaries.csv'. The dataset has columns Index(['yearID', 'teamID', 'lgID', 'salary', 'num_feat'], dtype='object'). Please note that I added the column num_feat to the DataFrame myself.
I want to do a Seaborn pairplot for team 'ATL' to plot scatter plots among all numeric features in the data frame.
I have the following code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
var_set = [
"yearID",
"teamID",
"lgID",
"playerID",
"salary"
]
head_set = []
head_set.extend(var_set)
head_set.append("num_feat")
df = pd.read_csv('Salaries.csv',index_col='playerID', header=None, names=head_set)
df['num_feat'] = 100 * np.random.random_sample(df.shape[0])  # add the num_feat column
df_copy = df
cols_with_team_ATL = df_copy.loc[df_copy.teamID=="ATL", ]
# Create the default pairplot
pairplot_fig = sns.pairplot(cols_with_team_ATL, vars=['yearID', 'salary', 'num_feat'])
plt.subplots_adjust(top=0.9)
pairplot_fig.fig.suptitle("Scatter plots among all numeric features in the data frame for teamID = ATL", fontsize=18, alpha=0.9, weight='bold')
plt.show()
The same code runs perfectly on my friend's system but not on mine. On my system it shows the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/var/folders/ch/6r9p7n0j3xg1l79lz1zdkvsh0000gq/T/ipykernel_97373/3735184261.py in <module>
25 # Create the default pairplot
26 print(df.columns)
---> 27 pairplot_fig = sns.pairplot(cols_with_team_ATL, vars=['yearID', 'salary', 'num_feat'])
28 plt.subplots_adjust(top=0.9)
29 pairplot_fig.fig.suptitle("Scatter plots among all numeric features in the data frame for teamID = ATL", fontsize=18, alpha=0.9, weight='bold')
~/USC/anaconda3/lib/python3.9/site-packages/seaborn/_decorators.py in inner_f(*args, **kwargs)
44 )
45 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 46 return f(**kwargs)
47 return inner_f
48
~/USC/anaconda3/lib/python3.9/site-packages/seaborn/axisgrid.py in pairplot(data, hue, hue_order, palette, vars, x_vars, y_vars, kind, diag_kind, markers, height, aspect, corner, dropna, plot_kws, diag_kws, grid_kws, size)
2124 diag_kws.setdefault("legend", False)
2125 if diag_kind == "hist":
-> 2126 grid.map_diag(histplot, **diag_kws)
2127 elif diag_kind == "kde":
2128 diag_kws.setdefault("fill", True)
~/USC/anaconda3/lib/python3.9/site-packages/seaborn/axisgrid.py in map_diag(self, func, **kwargs)
1476 plot_kwargs.setdefault("hue_order", self._hue_order)
1477 plot_kwargs.setdefault("palette", self._orig_palette)
-> 1478 func(x=vector, **plot_kwargs)
1479 ax.legend_ = None
1480
~/USC/anaconda3/lib/python3.9/site-packages/seaborn/distributions.py in histplot(data, x, y, hue, weights, stat, bins, binwidth, binrange, discrete, cumulative, common_bins, common_norm, multiple, element, fill, shrink, kde, kde_kws, line_kws, thresh, pthresh, pmax, cbar, cbar_ax, cbar_kws, palette, hue_order, hue_norm, color, log_scale, legend, ax, **kwargs)
1460 if p.univariate:
1461
-> 1462 p.plot_univariate_histogram(
1463 multiple=multiple,
1464 element=element,
~/USC/anaconda3/lib/python3.9/site-packages/seaborn/distributions.py in plot_univariate_histogram(self, multiple, element, fill, common_norm, common_bins, shrink, kde, kde_kws, color, legend, line_kws, estimate_kws, **plot_kws)
426
427 # First pass through the data to compute the histograms
--> 428 for sub_vars, sub_data in self.iter_data("hue", from_comp_data=True):
429
430 # Prepare the relevant data
~/USC/anaconda3/lib/python3.9/site-packages/seaborn/_core.py in iter_data(self, grouping_vars, reverse, from_comp_data)
981
982 if from_comp_data:
--> 983 data = self.comp_data
984 else:
985 data = self.plot_data
~/USC/anaconda3/lib/python3.9/site-packages/seaborn/_core.py in comp_data(self)
1055 orig = self.plot_data[var].dropna()
1056 comp_col = pd.Series(index=orig.index, dtype=float, name=var)
-> 1057 comp_col.loc[orig.index] = pd.to_numeric(axis.convert_units(orig))
1058
1059 if axis.get_scale() == "log":
~/USC/anaconda3/lib/python3.9/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
721
722 iloc = self if self.name == "iloc" else self.obj.iloc
--> 723 iloc._setitem_with_indexer(indexer, value, self.name)
724
725 def _validate_key(self, key, axis: int):
~/USC/anaconda3/lib/python3.9/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value, name)
1730 self._setitem_with_indexer_split_path(indexer, value, name)
1731 else:
-> 1732 self._setitem_single_block(indexer, value, name)
1733
1734 def _setitem_with_indexer_split_path(self, indexer, value, name: str):
~/USC/anaconda3/lib/python3.9/site-packages/pandas/core/indexing.py in _setitem_single_block(self, indexer, value, name)
1966
1967 # actually do the set
-> 1968 self.obj._mgr = self.obj._mgr.setitem(indexer=indexer, value=value)
1969 self.obj._maybe_update_cacher(clear=True)
1970
~/USC/anaconda3/lib/python3.9/site-packages/pandas/core/internals/managers.py in setitem(self, indexer, value)
353
354 def setitem(self: T, indexer, value) -> T:
--> 355 return self.apply("setitem", indexer=indexer, value=value)
356
357 def putmask(self, mask, new, align: bool = True):
~/USC/anaconda3/lib/python3.9/site-packages/pandas/core/internals/managers.py in apply(self, f, align_keys, ignore_failures, **kwargs)
325 applied = b.apply(f, **kwargs)
326 else:
--> 327 applied = getattr(b, f)(**kwargs)
328 except (TypeError, NotImplementedError):
329 if not ignore_failures:
~/USC/anaconda3/lib/python3.9/site-packages/pandas/core/internals/blocks.py in setitem(self, indexer, value)
941
942 # length checking
--> 943 check_setitem_lengths(indexer, value, values)
944 exact_match = is_exact_shape_match(values, arr_value)
945
~/USC/anaconda3/lib/python3.9/site-packages/pandas/core/indexers.py in check_setitem_lengths(indexer, value, values)
174 and len(indexer[indexer]) == len(value)
175 ):
--> 176 raise ValueError(
177 "cannot set using a list-like indexer "
178 "with a different length than the value"
ValueError: cannot set using a list-like indexer with a different length than the value
Why does it fail only on my system? Is there a problem with the Python version or Jupyter Notebook? Please help.
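One hedged way to narrow this down (a diagnostic sketch, not a confirmed fix): the traceback ends inside seaborn's numeric conversion, where values are assigned back through .loc on the original index, and that assignment can fail with exactly this length-mismatch error when the index contains duplicate labels; playerID, used as index_col here, is typically duplicated across years. So it is worth checking the dtypes and the index, and trying the plot from a plain integer index:

# check that the plotted columns are numeric and the index is unique
print(cols_with_team_ATL[['yearID', 'salary', 'num_feat']].dtypes)
print(cols_with_team_ATL.index.is_unique)

# hypothetical workaround: drop the playerID index before plotting
pairplot_fig = sns.pairplot(cols_with_team_ATL.reset_index(),
                            vars=['yearID', 'salary', 'num_feat'])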
R&D Spend Administration Marketing Spend State Profit
0 165349.20 136897.80 471784.10 New York 192261.83
1 162597.70 151377.59 443898.53 California 191792.06
2 153441.51 101145.55 407934.54 Florida 191050.39
3 144372.41 118671.85 383199.62 New York 182901.99
4 142107.34 91391.77 366168.42 Florida 166187.94
5 131876.90 99814.71 362861.36 New York 156991.12
6 134615.46 147198.87 127716.82 California 156122.51
7 130298.13 145530.06 323876.68 Florida 155752.60
8 120542.52 148718.95 311613.29 New York 152211.77
9 123334.88 108679.17 304981.62 California 149759.96
10 101913.08 110594.11 229160.95 Florida 146121.95
11 100671.96 91790.61 249744.55 California 144259.40
12 93863.75 127320.38 249839.44 Florida 141585.52
13 91992.39 135495.07 252664.93 California 134307.35
14 119943.24 156547.42 256512.92 Florida 132602.65
15 114523.61 122616.84 261776.23 New York 129917.04
16 78013.11 121597.55 264346.06 California 126992.93
17 94657.16 145077.58 282574.31 New York 125370.37
18 91749.16 114175.79 294919.57 Florida 124266.90
19 86419.70 153514.11 0.00 New York 122776.86
20 76253.86 113867.30 298664.47 California 118474.03
21 78389.47 153773.43 299737.29 New York 111313.02
22 73994.56 122782.75 303319.26 Florida 110352.25
23 67532.53 105751.03 304768.73 Florida 108733.99
24 77044.01 99281.34 140574.81 New York 108552.04
25 64664.71 139553.16 137962.62 California 107404.34
26 75328.87 144135.98 134050.07 Florida 105733.54
27 72107.60 127864.55 353183.81 New York 105008.31
28 66051.52 182645.56 118148.20 Florida 103282.38
29 65605.48 153032.06 107138.38 New York 101004.64
30 61994.48 115641.28 91131.24 Florida 99937.59
31 61136.38 152701.92 88218.23 New York 97483.56
32 63408.86 129219.61 46085.25 California 97427.84
33 55493.95 103057.49 214634.81 Florida 96778.92
34 46426.07 157693.92 210797.67 California 96712.80
35 46014.02 85047.44 205517.64 New York 96479.51
36 28663.76 127056.21 201126.82 Florida 90708.19
37 44069.95 51283.14 197029.42 California 89949.14
38 20229.59 65947.93 185265.10 New York 81229.06
39 38558.51 82982.09 174999.30 California 81005.76
40 28754.33 118546.05 172795.67 California 78239.91
41 27892.92 84710.77 164470.71 Florida 77798.83
42 23640.93 96189.63 148001.11 California 71498.49
43 15505.73 127382.30 35534.17 New York 69758.98
44 22177.74 154806.14 28334.72 California 65200.33
45 1000.23 124153.04 1903.93 New York 64926.08
46 1315.46 115816.21 297114.46 Florida 49490.75
47 0.00 135426.92 0.00 California 42559.73
48 542.05 51743.15 0.00 New York 35673.41
49 0.00 116983.80 45173.06 California 14681.40
Code:
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer

ct = ColumnTransformer([('State', OneHotEncoder(), [3])], remainder='passthrough')
X = np.array(ct.fit_transform(X), dtype=float)
I keep getting an error like this:
TypeError Traceback (most recent call last)
<ipython-input-36-17f64bed7e4c> in <module>
3
4 ct = ColumnTransformer([('State', OneHotEncoder(), [3])], remainder='passthrough')
----> 5 X = np.array(ct.fit_transform(X), dtype=object)
c:\users\dell\appdata\local\programs\python\python38\lib\site-packages\sklearn\compose\_column_transformer.py in fit_transform(self, X, y)
516 self._validate_remainder(X)
517
--> 518 result = self._fit_transform(X, y, _fit_transform_one)
519
520 if not result:
c:\users\dell\appdata\local\programs\python\python38\lib\site-packages\sklearn\compose\_column_transformer.py in _fit_transform(self, X, y, func, fitted)
446 self._iter(fitted=fitted, replace_strings=True))
447 try:
--> 448 return Parallel(n_jobs=self.n_jobs)(
449 delayed(func)(
450 transformer=clone(trans) if not fitted else trans,
c:\users\dell\appdata\local\programs\python\python38\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
1002 # remaining jobs.
1003 self._iterating = False
-> 1004 if self.dispatch_one_batch(iterator):
1005 self._iterating = self._original_iterator is not None
1006
c:\users\dell\appdata\local\programs\python\python38\lib\site-packages\joblib\parallel.py in dispatch_one_batch(self, iterator)
833 return False
834 else:
--> 835 self._dispatch(tasks)
836 return True
837
c:\users\dell\appdata\local\programs\python\python38\lib\site-packages\joblib\parallel.py in _dispatch(self, batch)
752 with self._lock:
753 job_idx = len(self._jobs)
--> 754 job = self._backend.apply_async(batch, callback=cb)
755 # A job can complete so quickly than its callback is
756 # called before we get here, causing self._jobs to
c:\users\dell\appdata\local\programs\python\python38\lib\site-packages\joblib\_parallel_backends.py in apply_async(self, func, callback)
207 def apply_async(self, func, callback=None):
208 """Schedule a func to be run"""
--> 209 result = ImmediateResult(func)
210 if callback:
211 callback(result)
c:\users\dell\appdata\local\programs\python\python38\lib\site-packages\joblib\_parallel_backends.py in __init__(self, batch)
588 # Don't delay the application, to avoid keeping the input
589 # arguments in memory
--> 590 self.results = batch()
591
592 def get(self):
c:\users\dell\appdata\local\programs\python\python38\lib\site-packages\joblib\parallel.py in __call__(self)
253 # change the default number of processes to -1
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 255 return [func(*args, **kwargs)
256 for func, args, kwargs in self.items]
257
c:\users\dell\appdata\local\programs\python\python38\lib\site-packages\joblib\parallel.py in <listcomp>(.0)
253 # change the default number of processes to -1
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 255 return [func(*args, **kwargs)
256 for func, args, kwargs in self.items]
257
c:\users\dell\appdata\local\programs\python\python38\lib\site-packages\sklearn\pipeline.py in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)
726 with _print_elapsed_time(message_clsname, message):
727 if hasattr(transformer, 'fit_transform'):
--> 728 res = transformer.fit_transform(X, y, **fit_params)
729 else:
730 res = transformer.fit(X, y, **fit_params).transform(X)
c:\users\dell\appdata\local\programs\python\python38\lib\site-packages\sklearn\preprocessing\_encoders.py in fit_transform(self, X, y)
370 """
371 self._validate_keywords()
--> 372 return super().fit_transform(X, y)
373
374 def transform(self, X):
c:\users\dell\appdata\local\programs\python\python38\lib\site-packages\sklearn\base.py in fit_transform(self, X, y, **fit_params)
569 if y is None:
570 # fit method of arity 1 (unsupervised transformation)
--> 571 return self.fit(X, **fit_params).transform(X)
572 else:
573 # fit method of arity 2 (supervised transformation)
c:\users\dell\appdata\local\programs\python\python38\lib\site-packages\sklearn\preprocessing\_encoders.py in fit(self, X, y)
345 """
346 self._validate_keywords()
--> 347 self._fit(X, handle_unknown=self.handle_unknown)
348 self.drop_idx_ = self._compute_drop_idx()
349 return self
c:\users\dell\appdata\local\programs\python\python38\lib\site-packages\sklearn\preprocessing\_encoders.py in _fit(self, X, handle_unknown)
72
73 def _fit(self, X, handle_unknown='error'):
---> 74 X_list, n_samples, n_features = self._check_X(X)
75
76 if self.categories != 'auto':
c:\users\dell\appdata\local\programs\python\python38\lib\site-packages\sklearn\preprocessing\_encoders.py in _check_X(self, X)
41 if not (hasattr(X, 'iloc') and getattr(X, 'ndim', 0) == 2):
42 # if not a dataframe, do normal check_array validation
---> 43 X_temp = check_array(X, dtype=None)
44 if (not hasattr(X, 'dtype')
45 and np.issubdtype(X_temp.dtype, np.str_)):
c:\users\dell\appdata\local\programs\python\python38\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
506 if sp.issparse(array):
507 _ensure_no_complex_data(array)
--> 508 array = _ensure_sparse_format(array, accept_sparse=accept_sparse,
509 dtype=dtype, copy=copy,
510 force_all_finite=force_all_finite,
c:\users\dell\appdata\local\programs\python\python38\lib\site-packages\sklearn\utils\validation.py in _ensure_sparse_format(spmatrix, accept_sparse, dtype, copy, force_all_finite, accept_large_sparse)
304
305 if accept_sparse is False:
--> 306 raise TypeError('A sparse matrix was passed, but dense '
307 'data is required. Use X.toarray() to '
308 'convert to a dense numpy array.')
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
Please help me resolve this issue.
You can set sparse_threshold=0. I'm not very sure about the rest of your code (what X is, etc.):
from sklearn.preprocessing import LabelEncoder,OneHotEncoder
from sklearn.compose import ColumnTransformer
import pandas as pd
import numpy as np
X = pd.DataFrame({"R&D":[1,2,3,4],
"State":["New Tork","Florida","New York","California"]})
ct = ColumnTransformer([('State', OneHotEncoder(), [0])],
sparse_threshold=0,remainder='passthrough')
np.array(ct.fit_transform(X[['State']]), dtype=float)
array([[0., 0., 1., 0.],
[0., 1., 0., 0.],
[0., 0., 0., 1.],
[1., 0., 0., 0.]])
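The error message itself points at the other route (a sketch under the same assumptions as the snippet above): keep the sparse output and densify it explicitly before building the array:

Xt = ct.fit_transform(X[['State']])
# densify only if a sparse matrix came back
if hasattr(Xt, 'toarray'):
    Xt = Xt.toarray()
X_arr = np.array(Xt, dtype=float)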
I got a DataFrame:
date phone sensor pallet
126 2019-04-15 940203 C0382C391A4D 47
127 2019-04-15 940203 C0382D392A4D 47
133 2019-04-16 940203 C0382C391A4D 47
134 2019-04-16 940203 C0382D392A4D 47
138 2019-04-17 940203 C0382C391A4D 47
139 2019-04-17 940203 C0382D392A4D 47
144 2019-04-18 940203 C0382C391A4D 47
145 2019-04-18 940203 C0382D392A4D 47
156 2019-04-19 940203 C0382D392A4D 47
157 2019-04-19 940203 C0382C391A4D 47
277 2019-04-15 941557 C0392D362735 32
279 2019-04-15 941557 C03633364D50 32
286 2019-04-16 941557 C03633364D50 32
287 2019-04-16 941557 C0392D362735 32
296 2019-04-17 941557 C03633364D50 32
297 2019-04-17 941557 C0392D362735 32
305 2019-04-18 941557 C0392D362735 32
306 2019-04-18 941557 C03633364D50 32
317 2019-04-19 941557 C03633364D50 32
318 2019-04-19 941557 C0392D362735 32
561 2019-04-15 942316 C0384639224D 45
562 2019-04-15 942316 C03632364950 45
563 2019-04-15 942316 C03920363835 45
564 2019-04-15 942316 C0382939384D 45
573 2019-04-16 942316 C0382939384D 45
574 2019-04-16 942316 C0384639224D 45
575 2019-04-16 942316 C03632364950 45
I want to make a subplot for each pallet that shows the sensors that arrived on each date.
Example:
I have tried a few methods:
ax.plot_date
Looping through the opened axes and plotting into each one:
import matplotlib.pyplot as plt

grouped = pallets_arrived.groupby('pallet')
nrows = 2
ncols = 2
fig, axs = plt.subplots(nrows, ncols)
targets = zip(grouped.groups.keys(), axs.flatten())
for i, (key, ax) in enumerate(targets):
    ax.plot_date(grouped.get_group(key)['date'], grouped.get_group(key)['sensor'], 'o')
plt.show()
which gives weirdly formatted, repeating dates (indexing the DataFrame by date doesn't solve the problem).
DataFrame plotting
grouped = pallets_arrived.groupby('pallet')
nrows = 2
ncols = 2
fig, axs = plt.subplots(nrows, ncols)
targets = zip(grouped.groups.keys(), axs.flatten())
for i, (key, ax) in enumerate(targets):
    grouped.get_group(key).plot(x='date', y='sensor', ax=ax)
    ax.legend()
plt.show()
or
grouped = pallets_arrived.set_index('date').groupby('pallet')
nrows = 2
ncols = 2
fig, axs = plt.subplots(nrows, ncols)
targets = zip(grouped.groups.keys(), axs.flatten())
for i, (key, ax) in enumerate(targets):
    grouped.get_group(key).plot(grouped.get_group(key).index, y='sensor', ax=ax)
    ax.legend()
plt.show()
pyplot
grouped = pallets_arrived.groupby('pallet')
nrows = 2
ncols = 2
fig, axs = plt.subplots(nrows, ncols)
targets = zip(grouped.groups.keys(), axs.flatten())
for i, (key, ax) in enumerate(targets):
    plt.sca(ax)
    plt.plot(grouped.get_group(key)['date'], grouped.get_group(key)['sensor'])
    ax.legend()
plt.show()
which again gives the same problem.
Pivoting pallets to plot() on columns (pallets)
doesn't work because there is more than one sensor per pallet on the same date, so there is a duplicate-value error.
I really don't know which method to use to get this right:
grouping similar dates on the x axis,
plotting each pallet in a different subplot.
I think I don't understand the pandas wrapping of matplotlib correctly. I'd be glad for an explanation, because I'm reading guides and can't figure out the preferred method for this. Thanks a lot for any help.
You can use matplotlib to plot categorical data:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')
%matplotlib inline
fig, ax = plt.subplots()
ax.scatter(df['date'], df['sensor'])
plt.show()
or if you want to color the groups:
fig, ax = plt.subplots()
for _, g in df.groupby('pallet'):
    ax.scatter(g['date'], g['sensor'])
plt.show()
you can also add a legend:
fig, ax = plt.subplots()
for _, g in df.groupby('pallet'):
    ax.scatter(g['date'], g['sensor'], label='Pallet_' + str(_))
ax.legend()
plt.show()
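And if you still want one subplot per pallet, the same idea extends to a grid of axes; a minimal sketch (my addition, assuming df is the frame shown above):

import matplotlib.pyplot as plt

fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(10, 8), squeeze=False)
for (key, g), ax in zip(df.groupby('pallet'), axs.flat):
    ax.scatter(g['date'], g['sensor'])
    ax.set_title('Pallet {}'.format(key))
    ax.tick_params(axis='x', labelrotation=90)
fig.tight_layout()
plt.show()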
I have sequencing data of micro-RNAs (miRs) under different conditions ('Comparisons'), and I want to create a point plot that shows, on separate graphs, the fold change for each miR. The data looks like this (it is a pandas DataFrame):
mir_Names Comparison Fold_Change
9 9 mmu-miR-100-4373160\n15 m... YAD-YC 508539.390000
15 9 mmu-miR-100-4373160\n15 m... YAD-YC 26.816000
17 9 mmu-miR-100-4373160\n15 m... YAD-YC 728.608000
18 9 mmu-miR-100-4373160\n15 m... YAD-YC 11483029.706000
'upregulated' is a subset of the DataFrame, and I tried to visualize it using:

import seaborn as sns

g = sns.FacetGrid(upregulated, col='Comparison', sharex=True, sharey=True, size=0.75, aspect=12./8, despine=True, margin_titles=True)
g.map(sns.pointplot, 'mir_Names', 'Fold_Change', data=upregulated)
but it gives me the following error, which I couldn't find any solution to:
ValueError                                Traceback (most recent call last)
<ipython-input-180-a1cf1b282869> in <module>()
      1 g = sns.FacetGrid(upregulated, col='Comparison', sharex=True, sharey=True, size=0.75, aspect=12./8, despine=True, margin_titles=True)
----> 2 g.map(sns.pointplot, 'mir_Names', 'Fold_Change', data=upregulated)

c:\pyzo2014a\lib\site-packages\seaborn\axisgrid.py in map(self, func, *args, **kwargs)
    446
    447         # Finalize the annotations and layout
--> 448         self._finalize_grid(args[:2])
    449
    450         return self

c:\pyzo2014a\lib\site-packages\seaborn\axisgrid.py in _finalize_grid(self, axlabels)
    537         self.set_axis_labels(*axlabels)
    538         self.set_titles()
--> 539         self.fig.tight_layout()
    540
    541     def facet_axis(self, row_i, col_j):

c:\pyzo2014a\lib\site-packages\matplotlib\figure.py in tight_layout(self, renderer, pad, h_pad, w_pad, rect)
   1663                                          rect=rect)
   1664
-> 1665         self.subplots_adjust(**kwargs)
   1666
   1667

c:\pyzo2014a\lib\site-packages\matplotlib\figure.py in subplots_adjust(self, *args, **kwargs)
   1520
   1521         """
-> 1522         self.subplotpars.update(*args, **kwargs)
   1523         for ax in self.axes:
   1524             if not isinstance(ax, SubplotBase):

c:\pyzo2014a\lib\site-packages\matplotlib\figure.py in update(self, left, bottom, right, top, wspace, hspace)
    223             if self.bottom >= self.top:
    224                 reset()
--> 225                 raise ValueError('bottom cannot be >= top')
    226
    227     def _update_this(self, s, val):

ValueError: bottom cannot be >= top
What causes this error?
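A hedged guess at the cause (my reading of the traceback, not a confirmed diagnosis): the failure happens in fig.tight_layout(), and with size=0.75 each facet is under an inch tall, so the margins tight_layout computes can leave bottom >= top. A minimal thing to try is a larger facet size; note also that the grid already carries the data, so data= does not need to be repeated in map:

g = sns.FacetGrid(upregulated, col='Comparison', sharex=True, sharey=True,
                  size=3, aspect=1.5, despine=True, margin_titles=True)
g.map(sns.pointplot, 'mir_Names', 'Fold_Change')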