Flask-SQLAlchemy: filtering by the time section of a datetime column - flask-sqlalchemy

I am looking to filter by the time section of a datetime column in Flask-Admin using Flask-SQLAlchemy.
My attempt so far is:
class BaseTimeBetweenFilter(filters.TimeBetweenFilter):
    def apply(self, query, value, alias=None):
        return query.filter(cast(Doctor.datetime, Time) >= value[0],
                            cast(Doctor.datetime, Time) <= value[1]).all()
I've got the time selector showing, and if I do
print(value[0])
or
print(value[1])
it prints the entered times as expected. However, the query is not working:
admin-panel_1 | response = self.full_dispatch_request()
admin-panel_1 | File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1641, in full_dispatch_request
admin-panel_1 | rv = self.handle_user_exception(e)
admin-panel_1 | File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1544, in handle_user_exception
admin-panel_1 | reraise(exc_type, exc_value, tb)
admin-panel_1 | File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 33, in reraise
admin-panel_1 | raise value
admin-panel_1 | File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1639, in full_dispatch_request
admin-panel_1 | rv = self.dispatch_request()
admin-panel_1 | File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1625, in dispatch_request
admin-panel_1 | return self.view_functions[rule.endpoint](**req.view_args)
admin-panel_1 | File "/usr/local/lib/python3.5/dist-packages/flask_admin/base.py", line 69, in inner
admin-panel_1 | return self._run_view(f, *args, **kwargs)
admin-panel_1 | File "/usr/local/lib/python3.5/dist-packages/flask_admin/base.py", line 368, in _run_view
admin-panel_1 | return fn(self, *args, **kwargs)
admin-panel_1 | File "/usr/local/lib/python3.5/dist-packages/flask_admin/model/base.py", line 1818, in index_view
admin-panel_1 | view_args.search, view_args.filters)
admin-panel_1 | File "/usr/local/lib/python3.5/dist-packages/flask_admin/contrib/sqla/view.py", line 975, in get_list
admin-panel_1 | count = count_query.scalar() if count_query else None
admin-panel_1 | AttributeError: 'list' object has no attribute 'scalar'
Also, I import Time and cast from sqlalchemy; is this OK, or should I be getting them from flask_sqlalchemy?
from flask_sqlalchemy import SQLAlchemy
from sqlalchemy import Time, cast

Flask-Admin constructs a query by joining tables, applying filters and sorting. After all these steps it executes the query itself and fetches the results.
Your apply method should return a sqlalchemy.orm.query.Query instance, like the one it receives as the query argument. When you append .all(), the query is executed immediately and you get its result back as a list. Remove the .all() call from the return value and your filter should work:
class BaseTimeBetweenFilter(filters.TimeBetweenFilter):
    def apply(self, query, value, alias=None):
        return query.filter(
            cast(Doctor.datetime, Time) >= value[0],
            cast(Doctor.datetime, Time) <= value[1]
        )
Where you import from doesn't really matter. A flask_sqlalchemy.SQLAlchemy instance exposes the same objects as sqlalchemy:
>>> from flask_sqlalchemy import SQLAlchemy
>>> db = SQLAlchemy()
>>> db.Time
<class 'sqlalchemy.sql.sqltypes.Time'>
>>> db.cast
<function cast at 0x7f60a3e89c80>
>>> from sqlalchemy import Time, cast
>>> Time
<class 'sqlalchemy.sql.sqltypes.Time'>
>>> cast
<function cast at 0x7f60a3e89c80>
>>> db.cast == cast and db.Time == Time
True
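For reference, a minimal sketch of wiring the custom filter into a Flask-Admin view; the Doctor model and the 'Time' label here are assumptions, not part of the original post:
from flask_admin.contrib.sqla import ModelView

class DoctorView(ModelView):
    # Register the custom filter on the datetime column; the second
    # argument is the label shown in the admin filter dropdown.
    column_filters = [BaseTimeBetweenFilter(Doctor.datetime, 'Time')]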

Related

Euclidean distance between two dataframes

I have two dataframes. For simplicity, assume they each have only one entry:
+--------------------+
| entry |
+--------------------+
|[0.34, 0.56, 0.87] |
+--------------------+
+--------------------+
| entry |
+--------------------+
|[0.12, 0.82, 0.98] |
+--------------------+
How can I compute the Euclidean distance between the entries of these two dataframes? Right now I have the following code:
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType
from scipy.spatial import distance
inference = udf(lambda x, y: float(distance.euclidean(x, y)), DoubleType())
inference_result = inference(a, b)
but I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/spark/python/pyspark/sql/udf.py", line 197, in wrapper
return self(*args)
File "/usr/lib/spark/python/pyspark/sql/udf.py", line 177, in __call__
return Column(judf.apply(_to_seq(sc, cols, _to_java_column)))
File "/usr/lib/spark/python/pyspark/sql/column.py", line 68, in _to_seq
cols = [converter(c) for c in cols]
File "/usr/lib/spark/python/pyspark/sql/column.py", line 68, in <listcomp>
cols = [converter(c) for c in cols]
File "/usr/lib/spark/python/pyspark/sql/column.py", line 56, in _to_java_column
"function.".format(col, type(col)))
TypeError: Invalid argument, not a string or column: DataFrame[embedding:
array<float>] of type <class 'pyspark.sql.dataframe.DataFrame'>. For column
literals, use 'lit', 'array', 'struct' or 'create_map' function.
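No answer is recorded here, but the traceback itself points at the fix: a UDF is applied to columns, not to whole DataFrames. A minimal sketch under that assumption, with df_a and df_b as placeholder names for the two single-row frames:
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType
from scipy.spatial import distance

euclidean = udf(lambda x, y: float(distance.euclidean(x, y)), DoubleType())

# Put both vectors on the same row first (a cross join is fine for
# single-row frames), then pass the UDF column names, not DataFrames.
joined = df_a.crossJoin(df_b.withColumnRenamed('entry', 'entry_b'))
joined.withColumn('dist', euclidean('entry', 'entry_b')).show()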

Trouble using pandas df.rolling() with my own functions

I have a pandas dataframe raw_data with two columns: 'T' and 'BP':
T BP
0 -0.500 115.790
1 -0.499 115.441
2 -0.498 115.441
3 -0.497 115.441
4 -0.496 115.790
... ... ...
647163 646.663 105.675
647164 646.664 105.327
647165 646.665 105.327
647166 646.666 105.327
647167 646.667 104.978
[647168 rows x 2 columns]
I want to apply the Hodges-Lehmann mean (it's a robust average) over a rolling window and create a new column. Here's the function:
def hodgesLehmannMean(x):
    m = np.add.outer(x, x)
    ind = np.tril_indices(len(x), 0)
    return 0.5 * np.median(m[ind])
I therefore write:
raw_data[new_col] = raw_data['BP'].rolling(21, min_periods=1, center=True,
                                           win_type=None, axis=0, closed=None).agg(hodgesLehmannMean)
but I get a string of error messages:
Traceback (most recent call last):
File "C:\Users\tkpme\miniconda3\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\tkpme\miniconda3\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "c:\Users\tkpme\.vscode\extensions\ms-python.python-2020.8.101144\pythonFiles\lib\python\debugpy\__main__.py", line 45, in <module>
cli.main()
File "c:\Users\tkpme\.vscode\extensions\ms-python.python-2020.8.101144\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 430, in main
run()
File "c:\Users\tkpme\.vscode\extensions\ms-python.python-2020.8.101144\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 267, in run_file
runpy.run_path(options.target, run_name=compat.force_str("__main__"))
File "C:\Users\tkpme\miniconda3\lib\runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "C:\Users\tkpme\miniconda3\lib\runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "C:\Users\tkpme\miniconda3\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "c:\Users\tkpme\OneDrive\Documents\Work\CMC\BP Satya and Suresh\Code\Naveen_peak_detect test.py", line 227, in <module>
main()
File "c:\Users\tkpme\OneDrive\Documents\Work\CMC\BP Satya and Suresh\Code\Naveen_peak_detect test.py", line 75, in main
raw_data[new_col] = raw_data['BP'].rolling(FILTER_WINDOW, min_periods=1, center=True, win_type=None,
File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py", line 1961, in aggregate
return super().aggregate(func, *args, **kwargs)
File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py", line 523, in aggregate
return self.apply(func, raw=False, args=args, kwargs=kwargs)
File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py", line 1987, in apply
return super().apply(
File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py", line 1300, in apply
return self._apply(
File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py", line 507, in _apply
result = calc(values)
File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py", line 495, in calc
return func(x, start, end, min_periods)
File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\window\rolling.py", line 1326, in apply_func
return window_func(values, begin, end, min_periods)
File "pandas\_libs\window\aggregations.pyx", line 1375, in pandas._libs.window.aggregations.roll_generic_fixed
File "c:\Users\tkpme\OneDrive\Documents\Work\CMC\BP Satya and Suresh\Code\Naveen_peak_detect test.py", line 222, in hodgesLehmannMean
m = np.add.outer(x, x)
File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\series.py", line 705, in __array_ufunc__
return construct_return(result)
File "C:\Users\tkpme\miniconda3\lib\site-packages\pandas\core\series.py", line 694, in construct_return
raise NotImplementedError
NotImplementedError
which appear to be driven by the line
m = np.add.outer(x, x)
and point to something not being implemented, or to numpy being missing. But I import numpy right at the beginning as follows:
import numpy as np
import pandas as pd
The function works perfectly well on its own if I feed it a list or a numpy array, so I'm not sure what the problem is. Interestingly, if I use the median instead of the Hodges-Lehmann mean, it runs like a charm:
raw_data[new_col] = raw_data['BP'].rolling(21, min_periods=1, center=True,
                                           win_type=None, axis=0, closed=None).median()
What is the cause of my problem, and how do I fix it?
Sincerely
Thomas Philips
I've tried your code with a small dataframe and it worked well, so maybe there is something in your dataframe that must be cleaned or transformed.
Solved it. It turns out that
m = np.add.outer(x, x)
requires x to be array-like. When I tested it using lists, numpy arrays, etc., it worked perfectly, just as it did for you. But the .rolling line passes in a slice of a dataframe, which is not array-like, and the function fails with a confusing error message. I modified the function to create a numpy array from the input, and it now works as it should.
def hodgesLehmannMean(x):
    x_array = np.array(x)
    m = np.add.outer(x_array, x_array)
    ind = np.tril_indices(len(x_array), 0)
    return 0.5 * np.median(m[ind])
Thanks for looking at it!
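As a side note (an assumption about newer pandas, not something from the original thread): rolling(...).apply accepts raw=True, which hands each window to the function as a plain ndarray and avoids the conversion entirely:
# Hypothetical variant, assuming pandas >= 0.23 where raw= is available:
raw_data[new_col] = (raw_data['BP']
                     .rolling(21, min_periods=1, center=True)
                     .apply(hodgesLehmannMean, raw=True))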

Can't create Dask dataframe although Pandas dataframe gets created for the same query (sqlalchemy.exc.NoSuchTableError)

Hello, I am trying to create a Dask dataframe by pulling data from an Oracle database as follows:
import cx_Oracle
import pandas as pd
import dask
import dask.dataframe as dd
# Build connection string/URL
user='user'
pw='pw'
host = 'xxx-yyy-x000'
port = '9999'
sid= 'XXXXX000'
ora_uri = 'oracle+cx_oracle://{user}:{password}@{sid}'.format(user=user, password=pw, sid=cx_Oracle.makedsn(host, port, sid))
tstquery = "select ID from EXAMPLE where rownum <= 5"
# Create Pandas Dataframe from ORACLE Query pull
tstdf1 = pd.read_sql(tstquery, con=ora_uri)
print("Dataframe tstdf1 created by pd.read_sql")
print(tstdf1.info())
# Create Dask Dataframe from ORACLE Query pull
tstdf2 = dd.read_sql_table(table=tstquery,
                           uri=ora_uri,
                           index_col='ID')
print(tstdf2.info())
As you can see, the Pandas DF gets created but not the Dask DF. The following is the stdout:
Dataframe tstdf1 created by pd.read_sql
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 1 columns):
ID 5 non-null int64
dtypes: int64(1)
memory usage: 120.0 bytes
None
Traceback (most recent call last):
File "dk_test.py", line 40, in <module>
,index_col = 'ID'
File "---------------------------python3.6/site-packages/dask/dataframe/io/sql.py", line 103, in read_sql_table
table = sa.Table(table, m, autoload=True, autoload_with=engine, schema=schema)
File "<string>", line 2, in __new__
File "---------------------------python3.6/site-packages/sqlalchemy/util/deprecations.py", line 130, in warned
return fn(*args, **kwargs)
File "---------------------------python3.6/site-packages/sqlalchemy/sql/schema.py", line 496, in __new__
metadata._remove_table(name, schema)
File "---------------------------python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
compat.reraise(exc_type, exc_value, exc_tb)
File "---------------------------python3.6/site-packages/sqlalchemy/util/compat.py", line 154, in reraise
raise value
File "---------------------------python3.6/site-packages/sqlalchemy/sql/schema.py", line 491, in __new__
table._init(name, metadata, *args, **kw)
File "---------------------------python3.6/site-packages/sqlalchemy/sql/schema.py", line 585, in _init
resolve_fks=resolve_fks,
File "---------------------------python3.6/site-packages/sqlalchemy/sql/schema.py", line 609, in _autoload
_extend_on=_extend_on,
File "---------------------------python3.6/site-packages/sqlalchemy/engine/base.py", line 2147, in run_callable
return conn.run_callable(callable_, *args, **kwargs)
File "---------------------------python3.6/site-packages/sqlalchemy/engine/base.py", line 1604, in run_callable
return callable_(self, *args, **kwargs)
File "---------------------------python3.6/site-packages/sqlalchemy/engine/default.py", line 429, in reflecttable
table, include_columns, exclude_columns, resolve_fks, **opts
File "---------------------------python3.6/site-packages/sqlalchemy/engine/reflection.py", line 653, in reflecttable
raise exc.NoSuchTableError(table.name)
sqlalchemy.exc.NoSuchTableError: select ID from EXAMPLE where rownum <= 5
Needless to say, the table exists (as demonstrated by the creation of the Pandas DF), and the index is on the col ID as well. What is the problem?
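No answer is recorded here, but the traceback offers a strong hint: dd.read_sql_table reflects a table by name and cannot take SQL text, which is why the entire query string surfaces in NoSuchTableError as if it were a table name. A minimal sketch of the likely fix, assuming the table and column names match the Oracle schema:
import dask.dataframe as dd

# Pass the table name itself; dd.read_sql_table is not a query runner,
# so row selection goes through its own arguments (columns=, limits=, ...).
tstdf2 = dd.read_sql_table(table='EXAMPLE',
                           uri=ora_uri,
                           index_col='ID')
print(tstdf2.info())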

How to sum specific columns of a pandas dataframe and then plot the totals on a histogram

My dataframe consists of 59 columns with 2000+ rows of data (collected once per second). I need to sum all the data in individual columns 29 to 60, to give 31 bin totals. I then need to plot the 31 bin totals in a histogram with bin number on the x-axis and number of counts on the y-axis.
NB: I am working in a Function inside a Class called from elsewhere.
I have summed each specified column using the sum() function. The histogram function is then called; it gets as far as producing an empty figure and then produces a long list of errors, culminating in a datetime error (I am not sure how datetime is involved in this...). Please can someone point me in the right direction and help me get my histogram working?
The example shows the preceding code and just the first 5 bins.
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt

class Pops(object):
    def __init__(self, popsfile):  #, pops_hdr_lines):
        '''
        ...
        '''
        #________________________________________________
        # READ IN FILES AND CONCATENATE FILES
        if not isinstance(popsfile, list):
            self.POPSfile = [popsfile]
        else:
            self.POPSfile = popsfile
        # read in multiple POPS datafiles into a list
        dfs = []
        for file in self.POPSfile:
            df = pd.read_csv(file, sep=',', header=0)  # PANDAS!
            dfs.append(df)
        # concatenate all dataframes together
        data = pd.concat(dfs)
        # ------------------------------------------------------------
        # create new time stamp
        # determine date
        date_string = self.POPSfile[0][-16:-8]  # extracts YYYYMMDD
        popsdate = dt.datetime.strptime(date_string, '%Y%m%d')
        # convert from date and time in seconds since 1/1/1970
        data.DateTime = pd.to_datetime(data.DateTime, errors='coerce', unit='s')
        # Rename columns to remove whitespace
        data = data.rename(columns={' Status': 'Status', ' PartCt': 'PartCt', ' PartCon': 'PartCon',
                                    ' BL': 'BL', ' P': 'P', ' POPS_Flow': 'POPS_Flow',
                                    ' LDTemp': 'LDTemp', ' Temp': 'Temp',
                                    ' Laser_Current': 'Laser_Current'})
        self.data = data

    def plot_housekeeping(self, path, show_save):
        '''plot some of the basic data
        '''
        # histogram attempt
        bins = self.data[['b0', 'b1', 'b2', 'b3', 'b4']].sum()  # sums the bins of interest
        print(bins)  # check output
        plt.hist(bins)
        plt.show()
        return None

if __name__ == '__main__':
    main()
This is what I get:
b0 12965454
b1 9168956
b2 4178861
b3 2878718
b4 2699768
dtype: int64
Traceback (most recent call last):
File "/opt/scitools/environments/default/current/lib/python3.6/site-packages/matplotlib/backends/backend_qt5.py", line 519, in _draw_idle
self.draw()
File "/opt/scitools/environments/default/current/lib/python3.6/site-packages/matplotlib/backends/backend_agg.py", line 437, in draw
self.figure.draw(self.renderer)
File "/opt/scitools/environments/default/current/lib/python3.6/site-packages/matplotlib/artist.py", line 55, in draw_wrapper
return draw(artist, renderer, *args, **kwargs)
File "/opt/scitools/environments/default/current/lib/python3.6/site-packages/matplotlib/figure.py", line 1493, in draw
renderer, self, artists, self.suppressComposite)
File "/opt/scitools/environments/default/current/lib/python3.6/site-packages/matplotlib/image.py", line 141, in _draw_list_compositing_images
a.draw(renderer)
File "/opt/scitools/environments/default/current/lib/python3.6/site-packages/matplotlib/artist.py", line 55, in draw_wrapper
return draw(artist, renderer, *args, **kwargs)
File "/opt/scitools/environments/default/current/lib/python3.6/site-packages/matplotlib/axes/_base.py", line 2635, in draw
mimage._draw_list_compositing_images(renderer, self, artists)
File "/opt/scitools/environments/default/current/lib/python3.6/site-packages/matplotlib/image.py", line 141, in _draw_list_compositing_images
a.draw(renderer)
File "/opt/scitools/environments/default/current/lib/python3.6/site-packages/matplotlib/artist.py", line 55, in draw_wrapper
return draw(artist, renderer, *args, **kwargs)
File "/opt/scitools/environments/default/current/lib/python3.6/site-packages/matplotlib/axis.py", line 1190, in draw
ticks_to_draw = self._update_ticks(renderer)
File "/opt/scitools/environments/default/current/lib/python3.6/site-packages/matplotlib/axis.py", line 1028, in _update_ticks
tick_tups = list(self.iter_ticks()) # iter_ticks calls the locator
File "/opt/scitools/environments/default/current/lib/python3.6/site-packages/matplotlib/axis.py", line 971, in iter_ticks
majorLocs = self.major.locator()
File "/opt/scitools/environments/default/current/lib/python3.6/site-packages/matplotlib/dates.py", line 1249, in __call__
self.refresh()
File "/opt/scitools/environments/default/current/lib/python3.6/site-packages/matplotlib/dates.py", line 1269, in refresh
dmin, dmax = self.viewlim_to_dt()
File "/opt/scitools/environments/default/current/lib/python3.6/site-packages/matplotlib/dates.py", line 1027, in viewlim_to_dt
return num2date(vmin, self.tz), num2date(vmax, self.tz)
File "/opt/scitools/environments/default/current/lib/python3.6/site-packages/matplotlib/dates.py", line 522, in num2date
return _from_ordinalf(x, tz)
File "/opt/scitools/environments/default/current/lib/python3.6/site-packages/matplotlib/dates.py", line 322, in _from_ordinalf
dt = datetime.datetime.fromordinal(ix).replace(tzinfo=UTC)
ValueError: year 37173 is out of range
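No answer is recorded here, but two things stand out in the traceback: the failing locator lives in matplotlib.dates, so the figure is being drawn on axes matplotlib already treats as date-based, and plt.hist is arguably the wrong tool once the columns are summed, because the five totals are already per-bin counts rather than raw observations to re-bin. A minimal sketch of a bar plot instead, assuming bins is the Series of totals printed above:
import matplotlib.pyplot as plt

# bins: the Series of per-bin totals (b0..b4 in the printout above)
fig, ax = plt.subplots()  # fresh axes, so no leftover date locator applies
ax.bar(range(len(bins)), bins.values)
ax.set_xticks(range(len(bins)))
ax.set_xticklabels(bins.index)
ax.set_xlabel('bin number')
ax.set_ylabel('counts')
plt.show()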

Incompatible indexer with Series

Why do I get an error:
import pandas as pd
a = pd.Series(index=[4,5,6], data=0)
print a.loc[4:5]
a.loc[4:5] += 1
Output:
4    0
5    0
dtype: int64
Traceback (most recent call last):
File "temp1.py", line 9, in <module>
a.loc[4:5] += 1
File "lib\site-packages\pandas\core\indexing.py", line 88, in __setitem__
self._setitem_with_indexer(indexer, value)
File "lib\site-packages\pandas\core\indexing.py", line 177, in _setitem_with_indexer
value = self._align_series(indexer, value)
File "lib\site-packages\pandas\core\indexing.py", line 206, in _align_series
raise ValueError('Incompatible indexer with Series')
ValueError: Incompatible indexer with Series
Pandas 0.12.
I think this is a bug; you can work around it by using a tuple index:
import pandas as pd
a = pd.Series(index=[4,5,6], data=0)
print a.loc[4:5]
a.loc[4:5,] += 1
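For completeness, a self-contained run of the workaround (shown with Python 3 print(); the original thread is Python 2 on pandas 0.12):
import pandas as pd

a = pd.Series(index=[4, 5, 6], data=0)
a.loc[4:5,] += 1  # the trailing comma makes the indexer a tuple
print(a)          # 4 and 5 are now 1; 6 stays 0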