fast_executemany=True throwing DBAPIError: Function sequence error in SQLAlchemy 1.3.5 - pandas

Since SQLAlchemy 1.3.0, released 2019-03-04, SQLAlchemy supports
engine = create_engine(sqlalchemy_url, fast_executemany=True)
for the mssql+pyodbc dialect, i.e., it is no longer necessary to define a function and register it with @event.listens_for(engine, 'before_cursor_execute').
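For reference, here is a sketch of that older, documented event-listener workaround (no longer needed on 1.3.0+):
from sqlalchemy import event

@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    # enable pyodbc's fast_executemany only for executemany() batches
    if executemany:
        cursor.fast_executemany = True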
However, when I try to write a simple test DataFrame to MSSQL, it raises this error:
DBAPIError: (pyodbc.Error) ('HY010', '[HY010] [Microsoft][ODBC Driver 17 for SQL Server]Function sequence error (0) (SQLParamData)')
[SQL: INSERT INTO fast_executemany_test ([Date], [A], [B], [C], [D]) VALUES (?, ?, ?, ?, ?)][parameters: ((datetime.datetime(2018, 1, 3, 0, 0), 2.0, 1.0, 1.0, 'Joe'), (datetime.datetime(2018, 1, 4, 0, 0), 2.0, 1.0, 2.0, 'Joe'), (datetime.datetime(2018, 1, 5, 0, 0), 2.0, 3.0, 1.0, 'Pete'), (datetime.datetime(2018, 1, 6, 0, 0), 2.0, 1.0, 5.0, 'Mary'))]
(Background on this error at: http://sqlalche.me/e/dbapi)
I have gone through the documentation but could not find what I am doing wrong.
import sqlalchemy
import pandas as pd
from datetime import datetime

# The DataFrame contains datetime, float, float, float, string.
test_columns = ['Date', 'A', 'B', 'C', 'D']
test_data = [
    [datetime(2018, 1, 3), 2.0, 1.0, 1.0, 'Joe'],
    [datetime(2018, 1, 4), 2.0, 1.0, 2.0, 'Joe'],
    [datetime(2018, 1, 5), 2.0, 3.0, 1.0, 'Pete'],
    [datetime(2018, 1, 6), 2.0, 1.0, 5.0, 'Mary'],
]
df = pd.DataFrame(test_data, columns=test_columns)
I am establishing connection as:
sqlUrl = 'mssql+pyodbc://ID:PASSWORD@' + 'SERVER_ADDRESS' + '/' + 'DBName' + '?driver=ODBC+Driver+17+for+SQL+Server'
sqlcon = sqlalchemy.create_engine(sqlUrl, fast_executemany=True)
if sqlcon:
    df.to_sql('FastTable_test', sqlcon, if_exists='replace', index=False)
    print('Successfully written!')
It creates the table, but because of the error it does not write any data into it.

Related

First 'Group by' then plot/save as png from pandas

First I need to filter the data, then plot each group separately and save the files to a directory:
import seaborn as sns
import matplotlib.pyplot as plt
from os import path

outpath = "path/of/your/folder/"
for set_id in df["set"].unique():
    df2 = df.loc[df["set"] == set_id]
    sns.set_style("whitegrid", {'grid.linestyle': '-'})
    plt.figure(figsize=(12, 8))
    ax1 = sns.scatterplot(data=df2, x="x", y="y", hue="result", markers=['x'], s=1000)
    ax1.get_legend().remove()
    ax1.set_yticks((0, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5), minor=False)
    ax1.set_xticks([0, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.6], minor=False)
    fig = ax1.get_figure()
    fig.savefig(path.join(outpath, "{0}.png".format(set_id)), dpi=300)
This worked for me, but it is very slow:
groups = df.groupby("set")
for name, group in groups:
    sns.set_style("whitegrid", {'grid.linestyle': '-'})
    plt.figure(figsize=(12, 8))
    ax1 = sns.scatterplot(data=group, x="x", y="y", hue="result", markers=['x'], s=1000)
    ax1.get_legend().remove()
    ax1.set_yticks((0, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5), minor=False)
    ax1.set_xticks([0, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.6], minor=False)
    fig = ax1.get_figure()
    fig.savefig("directory/{0}.png".format(name), dpi=300)

pd.MultiIndex from product

From the pandas documentation:
numbers = [0, 1, 2]
colors = ['green', 'purple']
pd.MultiIndex.from_product([numbers, colors], names=['number', 'color'])
MultiIndex([(0, 'green'),
            (0, 'purple'),
            (1, 'green'),
            (1, 'purple'),
            (2, 'green'),
            (2, 'purple')],
           names=['number', 'color'])
What I got instead:
MultiIndex(levels=[[0, 1, 2], ['green', 'purple']],
           codes=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
           names=['numbers', 'colors'])
Can someone please help me understand why the same code gives me this output?
That is how older pandas versions represented a MultiIndex. On my system, pandas 1.0.3 gives the former and 0.24.2 gives the latter. Make sure the version you have installed matches the version of the documentation you are reading.
See the "Better repr for MultiIndex" enhancement, which was released in v0.25.0.
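A quick way to check which repr to expect is to print the installed version (the repr changed in 0.25.0):
import pandas as pd
print(pd.__version__)  # 0.25.0 or later prints the tuple-based MultiIndex repr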

Can fromfile omit fields?

I am reading data from a given binary format, but I am only interested in a subset of the fields.
For example:
MY_DTYPE = np.dtype({'names': ('A', 'B', 'C'), 'formats': ('<f8', '<u2', 'u1')})
data = np.fromfile(infile, count=-1, dtype=MY_DTYPE)
Assuming I don't really need data['C'], is it possible to specify which fields I want to keep in the first place?
Simulate the load:
In [117]: MY_DTYPE = np.dtype({'names': ('A', 'B', 'C'), 'formats': ('<f8', '<u2', 'u1')})
In [118]: data = np.zeros(3, MY_DTYPE)
In [119]: data
Out[119]:
array([(0., 0, 0), (0., 0, 0), (0., 0, 0)],
      dtype=[('A', '<f8'), ('B', '<u2'), ('C', 'u1')])
In [120]: data['C']
Out[120]: array([0, 0, 0], dtype=uint8)
In recent numpy versions (1.16+), multi-field indexing returns a view:
In [121]: data[['A','B']]
Out[121]:
array([(0., 0), (0., 0), (0., 0)],
      dtype={'names':['A','B'], 'formats':['<f8','<u2'], 'offsets':[0,8], 'itemsize':11})
numpy provides a repack_fields function to make a proper copy:
In [122]: import numpy.lib.recfunctions as rf
In [123]: rf.repack_fields(data[['A','B']])
Out[123]: array([(0., 0), (0., 0), (0., 0)], dtype=[('A', '<f8'), ('B', '<u2')])
See the repack_fields docs for more information, or look at the recent release notes.
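Putting this together for the original question, a minimal sketch (np.fromfile has no option to skip fields, so the full records must be read first and the unwanted bytes dropped afterwards):
import numpy as np
import numpy.lib.recfunctions as rf

MY_DTYPE = np.dtype({'names': ('A', 'B', 'C'), 'formats': ('<f8', '<u2', 'u1')})
data = np.fromfile(infile, count=-1, dtype=MY_DTYPE)  # reads full (A, B, C) records
wanted = rf.repack_fields(data[['A', 'B']])           # compact copy without the 'C' bytes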

Pandas: apply tupleize_cols to dataframe without to_csv()?

I like the tupleize_cols option of to_csv(). Is this functionality available on an in-memory DataFrame? I would like to automatically clean up the tuples of multi-indexed columns into 'reportable' column names.
Just use .values on the index
In [1]: i = pd.MultiIndex.from_product([[1,2,3],['a','b','c']])
In [2]: i
Out[2]:
MultiIndex(levels=[[1, 2, 3], [u'a', u'b', u'c']],
           labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]])
In [3]: i.values
Out[3]:
array([(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), (2, 'b'), (2, 'c'),
       (3, 'a'), (3, 'b'), (3, 'c')], dtype=object)
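To apply this to an in-memory DataFrame, a sketch (assuming df has MultiIndex columns like the index above):
df.columns = df.columns.values  # each column label becomes a plain tuple, e.g. (1, 'a')
df.columns = ['_'.join(map(str, t)) for t in df.columns]  # optionally flatten tuples into strings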

Add another column (index) into the array

I have an array:
a = array([0.74552751, 0.70868784, 0.7351144 , 0.71597612, 0.77608263,
           0.71213591, 0.77297658, 0.75637376, 0.76636106, 0.76098067,
           0.79142821, 0.71932262, 0.68984604, 0.77008623, 0.76334351,
           0.76129872, 0.76717526, 0.78413129, 0.76483804, 0.75160062,
           0.7532506 ], dtype=float32)
I want to store my array in (item, value) format but can't seem to get it right.
I'm trying to get this format:
a = [(0, 0.001497),
     (1, 0.0061543),
     ..............
     (46, 0.001436781),
     (47, 0.00654533),
     (48, 0.0027139),
     (49, 0.00462962)]
Numpy arrays have a fixed data type that you must specify. It looks like a data type of int for your item and float for your value would work best. Something like:
import numpy as np
dtype = [("item", int), ("value", float)]
a = np.array([(0, 0.), (1, .1), (2, .2)], dtype=dtype)
The string part of the dtype is the name of each field. The names allow you to access the fields more easily like this:
print(a['value'])
# [ 0.   0.1  0.2]
a['value'] = [7, 8, 9]
print(a)
# [(0, 7.0) (1, 8.0) (2, 9.0)]
If you need to copy another array into the array described above, you can do it just by using the field name:
new = np.empty(len(a), dtype)
new['item'] = [3, 4, 5]
new['value'] = a['value']
print(new)
# [(3, 7.0) (4, 8.0) (5, 9.0)]
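Applied to the original array from the question, a minimal sketch (pairing each value with its position via np.arange, which is my reading of what "item" should be):
import numpy as np

values = np.array([0.74552751, 0.70868784, 0.7351144], dtype=np.float32)  # stand-in for the full array
pairs = np.empty(len(values), dtype=[("item", int), ("value", float)])
pairs["item"] = np.arange(len(values))  # 0, 1, 2, ... as the item column
pairs["value"] = values
print(pairs)
# -> [(0, 0.745527...) (1, 0.708687...) (2, 0.735114...)]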