I am trying to do a simple task: bring table data from an MS Access database into Pandas as a DataFrame. I had this working recently and now I cannot figure out why it no longer works. I remember that when initially troubleshooting the connection I had to install a new Microsoft database driver with the correct bitness, so I have revisited that and reinstalled the driver. Below is my setup.
Record of install on Laptop:
OS: Windows 7 Professional 64-bit (verified 9/6/2017)
Access version: Access 2016 32bit (verified 9/6/2017)
Python version: Python 3.6.1 (64-bit) found using >Python -V (verified 9/11/2017)
The AccessDatabaseEngine needed is based on the Python bitness above
Windows database engine driver installed with AccessDatabaseEngine_X64.exe from 2010 release using >AccessDatabaseEngine_X64.exe /passive (verified 9/11/2017)
I am running the following simple test code to try out the connection to a test database.
import pyodbc
import pandas as pd
[x for x in pyodbc.drivers() if x.startswith('Microsoft Access Driver')]
returns:
['Microsoft Access Driver (*.mdb, *.accdb)']
Setting the connection string.
dbpath = r'Z:\1Users\myfiles\software\JupyterNotebookFiles\testDB.accdb'
conn_str = (r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};''DBQ=%s;' %(dbpath))
cnxn = pyodbc.connect(conn_str)
crsr = cnxn.cursor()
Verifying that I am connected to the db...
for table_info in crsr.tables(tableType='TABLE'):
    print(table_info.table_name)
returns:
TestTable1
Trying to connect to TestTable1 gives the error below.
dfTable = pd.read_sql_table(TestTable1, cnxn)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-14-a24de1550834> in <module>()
----> 1 dfTable = pd.read_sql_table(TestTable1, cnxn)
2 #dfQuery = pd.read_sql_query("SELECT FROM [TestQuery1]", cnxn)
NameError: name 'TestTable1' is not defined
Trying again with single quotes gives the error below.
dfTable = pd.read_sql_table('TestTable1', cnxn)
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-15-1f89f9725f0a> in <module>()
----> 1 dfTable = pd.read_sql_table('TestTable1', cnxn)
2 #dfQuery = pd.read_sql_query("SELECT FROM [TestQuery1]", cnxn)
C:\Users\myfiles\Anaconda3\lib\site-packages\pandas\io\sql.py in read_sql_table(table_name, con, schema, index_col, coerce_float, parse_dates, columns, chunksize)
250 con = _engine_builder(con)
251 if not _is_sqlalchemy_connectable(con):
--> 252 raise NotImplementedError("read_sql_table only supported for "
253 "SQLAlchemy connectable.")
254 import sqlalchemy
NotImplementedError: read_sql_table only supported for SQLAlchemy connectable.
I have tried going back to the driver issue and reinstalling a 32-bit version, without any luck.
Anybody have any ideas?
Per the docs of pandas.read_sql_table:
Given a table name and an SQLAlchemy connectable, returns a DataFrame.
This function does not support DBAPI connections.
Since pyodbc is a DBAPI connection, use the query method, pandas.read_sql, whose con argument does support DBAPI connections:
dfTable = pd.read_sql("SELECT * FROM TestTable1", cnxn)
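For completeness, a minimal end-to-end sketch reusing the connection string from the question (the DBQ path and table name are the questioner's own):

import pyodbc
import pandas as pd

# Path to the Access database file, taken from the question; adjust for your environment.
dbpath = r'Z:\1Users\myfiles\software\JupyterNotebookFiles\testDB.accdb'
conn_str = r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=%s;' % dbpath

cnxn = pyodbc.connect(conn_str)

# read_sql accepts a DBAPI connection directly, so no SQLAlchemy engine is needed here.
dfTable = pd.read_sql("SELECT * FROM TestTable1", cnxn)
print(dfTable.head())

cnxn.close()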
Reading a db table with just the table name:
import pandas
from sqlalchemy import create_engine
engine = create_engine('postgresql+psycopg2://user:password@localhost/db_name')
df = pandas.read_sql_table("table_name", engine)
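A hedged variation showing a couple of the optional parameters pandas exposes (the column and index names here are hypothetical):

import pandas
from sqlalchemy import create_engine

engine = create_engine('postgresql+psycopg2://user:password@localhost/db_name')

# Read only selected columns and use one of them as the DataFrame index.
df = pandas.read_sql_table(
    "table_name",
    engine,
    columns=["id", "value"],
    index_col="id",
)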
Related
I have postgres installed on PC1 and I am connecting to the database from PC2. I have modified the settings so that postgres on PC1 is accessible to the local network.
On PC2 I am doing the following:
import pandas as pd, pyodbc
from sqlalchemy import create_engine
z1 = create_engine('postgresql://postgres:***@192.168.40.154:5432/myDB')
z2 = pd.read_sql(fr"""select * from public."myTable" """, z1)
I get the error:
File "C:\Program Files\Python311\Lib\site-packages\pandas\io\sql.py", line 1405, in execute
return self.connectable.execution_options().execute(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'OptionEngine' object has no attribute 'execute'
While running the same code on PC1 I get no error.
I just noticed that it happens only when reading from the db; if I do to_sql it works. It seems something is missing on PC2, since I get the same error even when I use localhost:5432 instead of 192.168.40.154:5432.
Edit:
The following modification worked, but I am not sure why. Can someone please explain what the reason for this could be?
from sqlalchemy.sql import text
connection = z1.connect()
stmt = text("SELECT * FROM public.myTable")
z2 = pd.read_sql(stmt, connection)
Edit2:
PC1:
pd.__version__
'1.5.2'
import sqlalchemy
sqlalchemy.__version__
'1.4.46'
PC2:
pd.__version__
'1.5.3'
import sqlalchemy
sqlalchemy.__version__
'2.0.0'
Does it mean that if I update the packages on PC1 everything is going to break?
I ran into the same problem just today, and basically it's the SQLAlchemy version. If you look at the documentation here, SQLAlchemy 2.0.0 was released a few days ago, so pandas has not been updated for it yet. For now, I think the solution is sticking with the 1.4.x version.
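If you go the downgrade route, a hedged example of pinning the package (the exact bounds are just one way to express "stay on 1.4.x"):

pip install "SQLAlchemy>=1.4,<2.0"

This keeps SQLAlchemy on the pre-2.0 API that pandas 1.5.x expects.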
The sqlalchemy.sql.text() part is not the issue. The addition of .connect() on the create_engine() result seems to have done the trick.
You should also use a context manager, in addition to wrapping the SQL in a SQLAlchemy text() clause, e.g.:
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine('postgresql://postgres:***@192.168.40.154:5432/myDB')
with engine.begin() as connection:
    res = pd.read_sql(
        sql=text('SELECT * FROM public."myTable"'),
        con=connection,
    )
As explained here https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html :
con : SQLAlchemy connectable, str, or sqlite3 connection
Using SQLAlchemy makes it possible to use any DB supported by that library. If a DBAPI2 object, only sqlite3 is supported. The user is responsible for engine disposal and connection closure for the SQLAlchemy connectable; str connections are closed automatically. See here.
--> especially this point: https://docs.sqlalchemy.org/en/20/core/connections.html#connect-and-begin-once-from-the-engine
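For the "engine disposal and connection closure" point quoted above, a minimal sketch of explicit cleanup (reusing the connection details from the question):

from sqlalchemy import create_engine, text
import pandas as pd

engine = create_engine('postgresql://postgres:***@192.168.40.154:5432/myDB')
try:
    # engine.connect() is enough for read-only work; engine.begin() adds a transaction.
    with engine.connect() as connection:
        df = pd.read_sql(text('SELECT * FROM public."myTable"'), connection)
finally:
    engine.dispose()  # release pooled connections once the engine is no longer needed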
I have a file called im.db in my current working directory. I am also able to query this database directly from sqlite3 at the command line:
% sqlite3
SQLite version 3.28.0 2019-04-15 14:49:49
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .open im.db
sqlite> select count(*) from writers;
255873
However, when running the same query inside of my notebook:
import sqlite3 as sql
import pandas as pd

con = sql.connect("im.db")
writers_dataframe = pd.read_sql_query("SELECT COUNT(*) FROM writers", con)
I get an error message which states no such table: writers
Any help would be much appreciated. Thanks!
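A hedged diagnostic sketch, assuming the most common cause (the notebook's working directory is not where im.db lives, so sqlite3.connect silently creates a new, empty file):

import os
import sqlite3 as sql
import pandas as pd

print(os.getcwd())              # the notebook's actual working directory
print(os.path.exists("im.db"))  # False means connect() would create a fresh, empty im.db

con = sql.connect("im.db")
# List the tables in whichever file was actually opened.
print(pd.read_sql_query("SELECT name FROM sqlite_master WHERE type='table'", con))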
I've followed the step-by-step guide here and inserted this snippet:
https://colab.research.google.com/notebook#snippetFileIds=%2Fv2%2Fexternal%2Fnotebooks%2Fsnippets%2Fbigquery.ipynb&snippetQuery=Using%20BigQuery%20with%20Pandas%20API
However, I can run the query, but then an error appears:
TypeError Traceback (most recent call last)
<ipython-input-22-b9e37aa67e26> in <module>()
9 COUNT(*) as total
10 FROM `bigquery-public-data.samples.gsod`
---> 11 ''', project_id=project_id).total[0]
12
13 df = pd.io.gbq.read_gbq(f'''
8 frames
/usr/local/lib/python3.6/dist-packages/pyarrow/table.pxi in pyarrow.lib.RecordBatch.from_arrays()
TypeError: from_arrays() takes at least 2 positional arguments (1 given)
I have tried with several databases, with no success.
Any idea?
I have followed the steps from the Using BigQuery with Pandas API Colab and it works fine for me.
First, you need to create a Cloud Platform project if you do not already have one, and then enable billing and the BigQuery API.
When running the first snippet of code, you need to click on the link that shows up in the console, copy the verification code, and paste it into the Enter verification code field in the console:
from google.colab import auth
auth.authenticate_user()
Before running the second snippet of code, you need to change the project_id field to the ID of the project you actually created in GCP:
import pandas as pd
# https://cloud.google.com/resource-manager/docs/creating-managing-projects
project_id = 'your Cloud Platform project ID'
sample_count = 2000
row_count = pd.io.gbq.read_gbq('''
SELECT
COUNT(*) as total
FROM `bigquery-public-data.samples.gsod`
''', project_id=project_id).total[0]
df = pd.io.gbq.read_gbq(f'''
SELECT
*
FROM
`bigquery-public-data.samples.gsod`
WHERE RAND() < {sample_count}/{row_count}
''', project_id=project_id)
print(f'Full dataset has {row_count} rows')
After that, you will get the following output:
I hope it helps you.
I fixed this issue by updating to the latest version of pyarrow:
!pip install pyarrow==0.17.1
I'm trying to use PyUSB over libusb-1.0 to read the data from an old PS/2 mouse, using a PS/2-to-USB adapter that presents it as a HID device.
I am able to access the device, but when I try to send it the GET_REPORT request over control transfer, it shows me this error:
[Errno None] b'libusb0-dll:err [claim_interface] could not claim interface 0, win error: The parameter is incorrect.\r\n'
Here is my code:
import usb.core as core
import time
from usb.core import USBError as USBError
dev = core.find(idVendor=0x13ba, idProduct=0x0018, address=0x03)
interface = 0
endpoint = dev[0][(interface, 0)][0]
dev.set_configuration()
collected = 0
attempts = 50
while collected < attempts:
    try:
        # GET_REPORT request (bmRequestType 0b10100001, bRequest 0x01) via a control transfer
        print(dev.ctrl_transfer(0b10100001, 0x01, wValue=100, data_or_wLength=64))
        collected += 1
    except USBError as e:
        print(e)
    time.sleep(0.1)
I'm using Python 3.x on Windows 10 (Lenovo G510, if it matters to anyone).
The driver I installed is libusb-win32, using Zadig.
Any help will be appreciated!
Thanks
EDIT:
Tried using WIN-USB so that it'll work with libusb-1.0.
It didn't find the device; usb.core.find() returned None.
Continuing with WIN-USB and libusb-1.0, I successfully found the device, but now it appears to have no configuration.
dev.set_configuration()
returns:
File "C:\Users\Idan Stark\AppData\Local\Programs\Python\Python36-32\lib\site-packages\usb\backend\libusb1.py", line 595, in _check
raise USBError(_strerror(ret), ret, _libusb_errno[ret])
usb.core.USBError: [Errno 2] Entity not found
Any help will be appreciated, whether with libusb-1.0 or libusb-0.1; anything to make this work! Thank you!
I am getting the error below after executing the code that follows it. Am I missing something in the installation? I am using Spark installed on my local Mac, so I am checking whether I need to install additional libraries for the code to work and load data from BigQuery.
Py4JJavaError Traceback (most recent call last)
<ipython-input-8-9d6701949cac> in <module>()
13 "com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat",
14 "org.apache.hadoop.io.LongWritable", "com.google.gson.JsonObject",
---> 15 conf=conf).map(lambda k: json.loads(k[1])).map(lambda x: (x["word"],
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.ClassNotFoundException: com.google.gson.JsonObject
import json
import pyspark
sc = pyspark.SparkContext()
hadoopConf = sc._jsc.hadoopConfiguration()
hadoopConf.get("fs.gs.system.bucket")

conf = {
    "mapred.bq.project.id": "<project_id>",
    "mapred.bq.gcs.bucket": "<bucket>",
    "mapred.bq.input.project.id": "publicdata",
    "mapred.bq.input.dataset.id": "samples",
    "mapred.bq.input.table.id": "shakespeare",
}

tableData = sc.newAPIHadoopRDD(
    "com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat",
    "org.apache.hadoop.io.LongWritable",
    "com.google.gson.JsonObject",
    conf=conf,
).map(lambda k: json.loads(k[1])).map(
    lambda x: (x["word"], int(x["word_count"]))
).reduceByKey(lambda x, y: x + y)

print(tableData.take(10))
The error "java.lang.ClassNotFoundException: com.google.gson.JsonObject" seems to hint that a library is missing.
Please try adding the gson jar to your path: http://search.maven.org/#artifactdetails|com.google.code.gson|gson|2.6.1|jar
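A hedged sketch of one way to do that from PySpark itself (the jar paths below are placeholders for files you download yourself, not something that ships with Spark):

import pyspark
from pyspark import SparkConf

# spark.jars takes a comma-separated list of jars to put on the driver and executor classpaths.
conf = SparkConf().set(
    "spark.jars",
    "/path/to/gson-2.6.1.jar,/path/to/bigquery-connector.jar",
)
sc = pyspark.SparkContext(conf=conf)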
Highlighting something buried in the connector link in Felipe's response: the bq connector used to be included by default in Cloud Dataproc, but was dropped starting at v 1.3. The link shows you three ways to get it back.