I am running into a strange issue with PyHive when running a Hive query in async mode. Internally, PyHive uses a Thrift client to execute the query and to fetch logs (along with the execution status). I am unable to fetch the logs of the Hive query (map/reduce tasks, etc.); cursor.fetch_logs() returns an empty data structure.
Here is the code snippet:

from pyhive import hive  # or import trino
from TCLIService.ttypes import TOperationState

def run():
    cursor = hive.connect(host="10.x.y.z", port=10003, username='xyz', password='xyz', auth='LDAP').cursor()
    cursor.execute("select count(*) from schema1.table1 where date = '2021-03-13'", async_=True)
    status = cursor.poll(True).operationState
    print(status)
    while status in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE):
        logs = cursor.fetch_logs()
        for message in logs:
            print("running")
            print(message)
        # If needed, an asynchronous query can be cancelled at any time with:
        # cursor.cancel()
        print("running")
        status = cursor.poll().operationState
        print(status)
    cursor.fetchall()
The cursor is able to get operationState correctly, but it's unable to fetch the logs. Is there anything on the HiveServer2 side that needs to be configured?
Thanks in advance.
Closing the loop here in case someone else has the same or a similar issue with Hive.
In my case the problem was the HiveServer2 configuration: Hive Server won't stream the logs if operation logging is not enabled. The following is the list of properties I configured:
hive.server2.logging.operation.enabled = true
hive.server2.logging.operation.level = EXECUTION (basic logging; there are other values that increase the logging level)
hive.async.log.enabled = false
hive.server2.logging.operation.log.location
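In hive-site.xml those properties would look like the sketch below. The log location path is a placeholder of my own, not a value from the cluster above:

```xml
<!-- Enable streaming of operation logs from HiveServer2 -->
<property>
  <name>hive.server2.logging.operation.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.logging.operation.level</name>
  <value>EXECUTION</value>
</property>
<property>
  <name>hive.async.log.enabled</name>
  <value>false</value>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <!-- placeholder path; point this at a directory HiveServer2 can write to -->
  <value>/tmp/hive/operation_logs</value>
</property>
```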
I am trying to schedule a job in EMR using the Airflow Livy operator. Here is the example code I followed. The issue is that nowhere is the Livy connection string (host name & port) specified. How do I provide the Livy server host name & port to the operator?
Also, the operator has the parameter livy_conn_id, which in the example is set to a value of livy_conn_default. Is that the right value, or do I have to set some other value?
You should have livy_conn_default under Connections in the Admin tab of your Airflow dashboard. If that's set up alright, then yes, you can use it. Otherwise, you can change this connection or create another connection id and use that in livy_conn_id.
There are 2 APIs we can use to connect Livy and Airflow:
Using LivyBatchOperator
Using LivyOperator
In the following example, I will cover the LivyOperator API.
LivyOperator
Step1: Update the Livy configuration:
Log in to the Airflow UI --> click on the Admin tab --> Connections --> search for livy. Click on the edit button and update the Host and Port parameters.
Step2: Install the apache-airflow-providers-apache-livy
pip install apache-airflow-providers-apache-livy
Step3: Create the data file under $AIRFLOW_HOME/dags directory.
vi $AIRFLOW_HOME/dags/livy_operator_sparkpi_dag.py
from datetime import timedelta, datetime
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.providers.apache.livy.operators.livy import LivyOperator

default_args = {
    'owner': 'RangaReddy',
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
}

# Initiate DAG
livy_operator_sparkpi_dag = DAG(
    dag_id="livy_operator_sparkpi_dag",
    default_args=default_args,
    schedule_interval='@once',
    start_date=datetime(2022, 3, 2),
    tags=['example', 'spark', 'livy']
)

# Define the Livy task with LivyOperator
livy_sparkpi_submit_task = LivyOperator(
    file="/root/spark-3.2.1-bin-hadoop3.2/examples/jars/spark-examples_2.12-3.2.1.jar",
    class_name="org.apache.spark.examples.SparkPi",
    driver_memory="1g",
    driver_cores=1,
    executor_memory="1g",
    executor_cores=2,
    num_executors=1,
    name="LivyOperator SparkPi",
    task_id="livy_sparkpi_submit_task",
    dag=livy_operator_sparkpi_dag,
)

begin_task = DummyOperator(task_id="begin_task")
end_task = DummyOperator(task_id="end_task")

begin_task >> livy_sparkpi_submit_task >> end_task
Once the DAG has run, you can check the driver log of the submitted batch through the Livy REST API:

LIVY_HOST=192.168.0.1
curl http://${LIVY_HOST}:8998/batches/0/log | python3 -m json.tool

Output:
"Pi is roughly 3.14144103141441"
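The same check can be done from Python by parsing the JSON that /batches/{id}/log returns. The sample payload below is a trimmed stand-in for a real response (its shape, with "id", "from", "total" and "log" fields, is an assumption about your Livy version):

```python
import json

# Trimmed stand-in for the JSON a Livy /batches/0/log call returns.
sample = '{"id": 0, "from": 0, "total": 2, "log": ["...", "Pi is roughly 3.14144103141441"]}'

payload = json.loads(sample)
# Scan the log lines for the SparkPi result.
result_lines = [line for line in payload["log"] if "Pi is roughly" in line]
print(result_lines[0])  # Pi is roughly 3.14144103141441
```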
import pyodbc
connection = pyodbc.connect('Driver = {SQL Server};Server=SIWSQL43A\SIMSSPROD43A;'
                            'Database=CSM_reporting;Trusted_Connection=yes;')

Error:

connection = pyodbc.connect('Driver = {SQL Server};Server=SIWSQL43A\SIMSSPROD43A;'
pyodbc.Error: ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)')
Do not put a space after the Driver keyword in the connection string.
This fails on Windows ...

conn_str = (
    r'DRIVER = {SQL Server};'
    r'SERVER=(local)\SQLEXPRESS;'
    r'DATABASE=myDb;'
    r'Trusted_Connection=yes;'
)
cnxn = pyodbc.connect(conn_str)

... but this works:

conn_str = (
    r'DRIVER={SQL Server};'
    r'SERVER=(local)\SQLEXPRESS;'
    r'DATABASE=myDb;'
    r'Trusted_Connection=yes;'
)
cnxn = pyodbc.connect(conn_str)
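The only difference between the two strings above is the whitespace around DRIVER. A small helper (hypothetical, not part of pyodbc) that assembles the keyword=value pairs makes it hard to reintroduce that space:

```python
def build_conn_str(**params):
    # Join KEY=value pairs with ';'. Note there are no spaces around '=',
    # since 'DRIVER = {...}' (with spaces) is what the ODBC driver
    # manager fails to parse.
    return ';'.join(f'{key}={value}' for key, value in params.items()) + ';'

conn_str = build_conn_str(
    DRIVER='{SQL Server}',
    SERVER=r'(local)\SQLEXPRESS',
    DATABASE='myDb',
    Trusted_Connection='yes',
)
print(conn_str)
# DRIVER={SQL Server};SERVER=(local)\SQLEXPRESS;DATABASE=myDb;Trusted_Connection=yes;
```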
I was also getting the same error, and I finally found the solution. You can search for "ODBC" among your locally installed programs and check which driver versions are present. In my case I had versions 17 and 11, so I used 17 in the connection string:
'DRIVER={ODBC Driver 17 for SQL Server}'
I'm using Django 2.2 and got the same error while connecting to SQL Server 2012. I spent a lot of time on this issue, and finally this worked: I changed the driver to
'driver': 'SQL Server Native Client 11.0'
and it worked.
I met the same problem and fixed it by changing the connection string as below. Write
'DRIVER={ODBC Driver 13 for SQL Server}'
instead of
'DRIVER={SQL Server}'
A local MS SQL database server needs {ODBC Driver 17 for SQL Server}.
An Azure SQL Database needs {ODBC Driver 13 for SQL Server}.
Check installed drivers here => Installed ODBC Drivers
Format for connection to an Azure SQL Database:

import pyodbc
conn = pyodbc.connect('DRIVER={ODBC Driver 13 for SQL Server};'
                      'SERVER=tcp:nameServer.database.windows.net,1433;'
                      'DATABASE=Name database;UID=name;PWD=password;')

Format for connection to a local MS SQL database:

import pyodbc
conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                      'SERVER=server.name;'  # example: Doctor-Notebook\\MSSQLEXPRESS
                      'DATABASE=database.name;Trusted_Connection=yes')
I faced this issue and was looking for the solution. Finally I tried all the options from https://github.com/mkleehammer/pyodbc/wiki/Connecting-to-SQL-Server-from-Windows, and for my MSSQL 12 only "{ODBC Driver 11 for SQL Server}" worked. Just try them one by one. The second important thing is to get the server name exactly right: I previously thought I needed to append \SQLEXPRESS in all cases, but found out that you have to set EXACTLY what you see in the server properties. Example on the screenshot:
You could try:
import pyodbc
# Using a DSN
cnxn = pyodbc.connect('DSN=odbc_datasource_name;UID=db_user_id;PWD=db_password')
Note: You will need to know the "odbc_datasource_name". In Windows you can search for ODBC Data Sources. The name will look something like this:
Data Source Name Example
The code below works like magic (note that query-string parameters after the first are joined with &, not repeated ?):

SQLALCHEMY_DATABASE_URI = (
    "mssql+pyodbc://<servername>/<dbname>"
    "?driver=SQL Server Native Client 11.0"
    "&trusted_connection=yes&UID=<db_user>&PWD=<pass>"
)
The connection string below works:
import pandas as pd
import pyodbc as odbc
sql_conn = odbc.connect('DRIVER={ODBC Driver 13 for SQL Server};SERVER=SERVER_NAME;DATABASE=DATABASE_NAME;UID=USERNAME;PWD=PASSWORD;')
query = "SELECT * FROM admin.TABLE_NAME"
df = pd.read_sql(query, sql_conn)
df.head()
I had the same error on Python 3 and this helped me:

conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                      'SERVER=YourServerName;'
                      'DATABASE=YourDatabaseName;UID=USER_NAME;PWD=PASS_WORD;')

Note that the connection-string keywords DRIVER, SERVER, ... are conventionally written in upper case. You can visit this link for more information:
https://learn.microsoft.com/en-us/sql/connect/python/pyodbc/step-3-proof-of-concept-connecting-to-sql-using-pyodbc?view=sql-server-ver15
In my case, the exact same error was caused by the lack of the drivers on Windows Server 2019 Datacenter running in an Azure virtual machine.
As soon as I installed the drivers from https://www.microsoft.com/en-us/download/details.aspx?id=56567, the issue was gone.
For the error pyodbc.InterfaceError: ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)'):
put no space between the Driver keyword and the equals sign:

connection = pyodbc.connect('Driver={SQL Server Native Client 11.0};'
                            'Server=servername;'
                            'Database=dbname;'
                            'Trusted_Connection=yes;')
Apart from the other answers, which considered the connection string itself, it might simply be necessary to download the correct ODBC driver. My client just faced this issue when executing a Python app that required it.
You can check this by pressing the Windows key and typing "odbc"; the correct driver should appear in the Drivers tab.
Create a DSN, something like this (ASEDEV), for your connection and try to use DSN instead of DRIVER, like below:

import pyodbc
cnxn = pyodbc.connect('DSN=ASEDEV;User ID=sa;Password=sybase123')
mycur = cnxn.cursor()
mycur.execute("select * from master..sysdatabases")
row = mycur.fetchone()
while row:
    print(row)
    row = mycur.fetchone()
I was facing the same issue; a whole day was wasted, and I tried all possible ODBC Driver values.

import pyodbc
connection = pyodbc.connect('Driver={SQL Server};Server=ServerName;'
                            'Database=Database_Name;Trusted_Connection=yes;')

In place of Driver={SQL Server} you can try these options one by one, or just use the one corresponding to your setting; in my case the last one worked :)
Driver={ODBC Driver 11 for SQL Server} for SQL Server 2005 - 2014
Driver={ODBC Driver 13 for SQL Server} for SQL Server 2005 - 2016
Driver={ODBC Driver 13.1 for SQL Server} for SQL Server 2008 - 2016
Driver={ODBC Driver 17 for SQL Server} for SQL Server 2008 - 2017
Driver={SQL Server} for SQL Server 2000
Driver={SQL Native Client} for SQL Server 2005
Driver={SQL Server Native Client 10.0} for SQL Server 2008
Driver={SQL Server Native Client 11.0} for SQL Server 2012
You need to download Microsoft ODBC Driver 13 for SQL Server
from Microsoft ODBC Driver 13
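Since several of these drivers can be installed side by side, one way to pick the newest matching entry programmatically is sketched below. The helper name is mine, and in practice the input list would come from pyodbc.drivers():

```python
import re

def newest_sql_server_driver(installed):
    """Pick the newest 'ODBC Driver NN for SQL Server' entry from a list of
    driver names (as returned by pyodbc.drivers()); None if there is none."""
    best, best_ver = None, -1
    for name in installed:
        match = re.fullmatch(r'ODBC Driver (\d+)(?:\.\d+)? for SQL Server', name)
        if match and int(match.group(1)) > best_ver:
            best, best_ver = name, int(match.group(1))
    return best

# Example with a hypothetical pyodbc.drivers() result:
print(newest_sql_server_driver(
    ['SQL Server', 'ODBC Driver 11 for SQL Server', 'ODBC Driver 17 for SQL Server']
))  # ODBC Driver 17 for SQL Server
```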
Thank you, Avinash. Brilliant. I tried to connect to an MS Azure database using PyCharm, and it worked.

server = ''
database = ''
username = ''
password = ''
driver = 'SQL Server Native Client 11.0'
connection1 = pyodbc.connect('DRIVER='+driver+';SERVER='+server+';PORT=1433;DATABASE='+database+';UID='+username+';PWD='+password+';TDS_Version=8.0')
print("Connected.")
Try below:

import pyodbc

server = 'servername'
database = 'DB'
username = 'UserName'
password = 'Password'
cnxn = pyodbc.connect('DRIVER={ODBC Driver 13 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+password)
cursor = cnxn.cursor()
cursor.execute('SELECT * FROM Tbl')
for row in cursor:
    print('row = %r' % (row,))
Have you installed any SQL product on your machine? You can download and install "ODBC Driver 13 (or any version) for SQL Server" and try again if you haven't already done so. Make sure you have all the drivers and the DB engine installed:
https://www.microsoft.com/en-us/download/details.aspx?id=54920
server = '123.45.678.90'
database = 'dbname'
username = 'username'
password = 'pwork'
driver = '{ODBC Driver 17 for SQL Server}'
conn_str = ('DRIVER=' + driver + ';SERVER=' + server +
            ';DATABASE=' + database + ';UID=' + username +
            ';PWD=' + password + ';Trusted_Connection=no;')
pyodbc.connect(conn_str, autocommit=True)

It worked for me; you need to install the driver from here:
https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server?view=sql-server-ver15
or (on Ubuntu) sudo apt-get install unixodbc-dev if you get an error with pip install pyodbc.
If anyone is trying to access a database hosted in Azure, then try giving the driver as ODBC Driver 17 for SQL Server.
I'm using a CDH cluster which is Kerberos-enabled, and I'd like to use PyHive to connect to Hive and read Hive tables. Here is the code I have:

from pyhive import hive
from TCLIService.ttypes import TOperationState

cursor = hive.connect(host='xyz', port=10000, username='my_username', auth='KERBEROS', database='poc', kerberos_service_name='hive').cursor()

I'm getting the value of xyz from hive-site.xml under hive.metastore.uris; however, there it says xyz:9083, and if I replace 10000 with 9083, it complains.
My problem is that when I connect (using port = 10000), I get a permission error when executing a query, while I can read that table if I use the Hive CLI or beeline. My questions are: 1) is xyz the value I should use? 2) which port should I use? 3) if all is correct, why am I still getting a permission issue?
import pyodbc

cnxn = pyodbc.connect('DRIVER={SQL Server Native Client 11.0};SERVER=LENOVO-PCN;DATABASE=testing;')
cursor = cnxn.cursor()
cursor.execute("select Sales from Store_Inf")
row = cursor.fetchone()
if row:
    print(row)

I'm trying to use Python 3 with the pyodbc module to connect to SQL Server Express. My code gave this error:

('08001', '[08001] [Microsoft][SQL Server Native Client 11.0]Named Pipes Provider: Could not open a connection to SQL Server [2]. (2) (SQLDriverConnect)')

Any idea?
Here is an example that worked for me using Trusted_Connection=yes (note that pyodbc.connect already returns the connection, so there is no second connect call):

import pyodbc

connection = pyodbc.connect(
    Trusted_Connection='Yes',
    Driver='{ODBC Driver 11 for SQL Server}',
    Server='SERVER_NAME,PORT_NUMBER',
    Database='DATABASE_NAME'
)

Please note that the server name and port number are comma-separated!
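The 'host,port' convention is easy to get wrong, since most URLs use a colon instead. A quick pure-string sketch of how that Server value splits (no database needed; the names are placeholders):

```python
# ODBC separates host and port with a comma, not a colon.
server = 'SERVER_NAME,1433'
host, _, port = server.partition(',')
print(host)  # SERVER_NAME
print(port)  # 1433
```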