I am running into a strange issue with PyHive when running a Hive query in async mode. Internally, PyHive uses a Thrift client to execute the query and to fetch logs (along with the execution status). I am unable to fetch the logs of the Hive query (map/reduce tasks, etc.); cursor.fetch_logs() returns an empty data structure.
Here is the code snippet:

from pyhive import hive  # or import trino
from TCLIService.ttypes import TOperationState

def run():
    cursor = hive.connect(host="10.x.y.z", port=10003, username='xyz', password='xyz', auth='LDAP').cursor()
    cursor.execute("select count(*) from schema1.table1 where date = '2021-03-13'", async_=True)
    status = cursor.poll(True).operationState
    print(status)
    while status in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE):
        logs = cursor.fetch_logs()
        for message in logs:
            print("running")
            print(message)
        # If needed, an asynchronous query can be cancelled at any time with:
        # cursor.cancel()
        print("running")
        status = cursor.poll().operationState
        print(status)
    cursor.fetchall()
The cursor is able to get operationState correctly, but it's unable to fetch the logs. Is there anything on the HiveServer2 side that needs to be configured?
Thanks in advance.
Closing the loop here in case someone else has the same or a similar issue with Hive.
In my case the problem was the HiveServer2 configuration: Hive Server won't stream the logs if operation logging is not enabled. The following is the list of properties I configured:
hive.server2.logging.operation.enabled = true
hive.server2.logging.operation.level = EXECUTION (basic logging; there are other values that increase the logging level)
hive.async.log.enabled = false
hive.server2.logging.operation.log.location
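In hive-site.xml those properties would look like the sketch below. The log location path is a placeholder of my own, not a value from the cluster above:

```xml
<!-- Enable streaming of operation logs from HiveServer2 -->
<property>
  <name>hive.server2.logging.operation.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.logging.operation.level</name>
  <value>EXECUTION</value>
</property>
<property>
  <name>hive.async.log.enabled</name>
  <value>false</value>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <!-- placeholder path; point this at a directory HiveServer2 can write to -->
  <value>/tmp/hive/operation_logs</value>
</property>
```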
I am trying to schedule a job in EMR using the Airflow Livy operator. Here is the example code I followed. The issue is that nowhere is the Livy connection string (host name & port) specified. How do I provide the Livy server host name & port to the operator?
Also, the operator has the parameter livy_conn_id, which in the example is set to a value of livy_conn_default. Is that the right value, or do I have to set some other value?
You should have livy_conn_default under Connections in the Admin tab of your Airflow dashboard. If that's set up alright, then yes, you can use it. Otherwise, you can change this connection or create another connection id and use that in livy_conn_id.
There are 2 APIs we can use to connect Livy and Airflow:
Using LivyBatchOperator
Using LivyOperator
In the following example, I will cover the LivyOperator API.
LivyOperator
Step1: Update the Livy configuration:
Log in to the Airflow UI --> click on the Admin tab --> Connections --> search for livy. Click on the edit button and update the Host and Port parameters.
Step2: Install the apache-airflow-providers-apache-livy
pip install apache-airflow-providers-apache-livy
Step3: Create the data file under $AIRFLOW_HOME/dags directory.
vi $AIRFLOW_HOME/dags/livy_operator_sparkpi_dag.py
from datetime import timedelta, datetime
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.providers.apache.livy.operators.livy import LivyOperator

default_args = {
    'owner': 'RangaReddy',
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
}

# Initiate DAG
livy_operator_sparkpi_dag = DAG(
    dag_id="livy_operator_sparkpi_dag",
    default_args=default_args,
    schedule_interval='@once',
    start_date=datetime(2022, 3, 2),
    tags=['example', 'spark', 'livy']
)

# Define the Livy task with LivyOperator
livy_sparkpi_submit_task = LivyOperator(
    file="/root/spark-3.2.1-bin-hadoop3.2/examples/jars/spark-examples_2.12-3.2.1.jar",
    class_name="org.apache.spark.examples.SparkPi",
    driver_memory="1g",
    driver_cores=1,
    executor_memory="1g",
    executor_cores=2,
    num_executors=1,
    name="LivyOperator SparkPi",
    task_id="livy_sparkpi_submit_task",
    dag=livy_operator_sparkpi_dag,
)

begin_task = DummyOperator(task_id="begin_task")
end_task = DummyOperator(task_id="end_task")

begin_task >> livy_sparkpi_submit_task >> end_task
Once the DAG has run, you can check the driver log of the submitted batch through the Livy REST API:

LIVY_HOST=192.168.0.1
curl http://${LIVY_HOST}:8998/batches/0/log | python3 -m json.tool

Output:
"Pi is roughly 3.14144103141441"
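The same check can be done from Python by parsing the JSON that /batches/{id}/log returns. The sample payload below is a trimmed stand-in for a real response (its shape, with "id", "from", "total" and "log" fields, is an assumption about your Livy version):

```python
import json

# Trimmed stand-in for the JSON a Livy /batches/0/log call returns.
sample = '{"id": 0, "from": 0, "total": 2, "log": ["...", "Pi is roughly 3.14144103141441"]}'

payload = json.loads(sample)
# Scan the log lines for the SparkPi result.
result_lines = [line for line in payload["log"] if "Pi is roughly" in line]
print(result_lines[0])  # Pi is roughly 3.14144103141441
```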
import pyodbc
connection = pyodbc.connect('Driver = {SQL Server};Server=SIWSQL43A\SIMSSPROD43A;'
                            'Database=CSM_reporting;Trusted_Connection=yes;')

Error:

connection = pyodbc.connect('Driver = {SQL Server};Server=SIWSQL43A\SIMSSPROD43A;'
pyodbc.Error: ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)')
Do not put a space after the Driver keyword in the connection string.
This fails on Windows ...

conn_str = (
    r'DRIVER = {SQL Server};'
    r'SERVER=(local)\SQLEXPRESS;'
    r'DATABASE=myDb;'
    r'Trusted_Connection=yes;'
)
cnxn = pyodbc.connect(conn_str)

... but this works:

conn_str = (
    r'DRIVER={SQL Server};'
    r'SERVER=(local)\SQLEXPRESS;'
    r'DATABASE=myDb;'
    r'Trusted_Connection=yes;'
)
cnxn = pyodbc.connect(conn_str)
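The only difference between the two strings above is the whitespace around DRIVER. A small helper (hypothetical, not part of pyodbc) that assembles the keyword=value pairs makes it hard to reintroduce that space:

```python
def build_conn_str(**params):
    # Join KEY=value pairs with ';'. Note there are no spaces around '=',
    # since 'DRIVER = {...}' (with spaces) is what the ODBC driver
    # manager fails to parse.
    return ';'.join(f'{key}={value}' for key, value in params.items()) + ';'

conn_str = build_conn_str(
    DRIVER='{SQL Server}',
    SERVER=r'(local)\SQLEXPRESS',
    DATABASE='myDb',
    Trusted_Connection='yes',
)
print(conn_str)
# DRIVER={SQL Server};SERVER=(local)\SQLEXPRESS;DATABASE=myDb;Trusted_Connection=yes;
```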
I was also getting the same error, and I finally found the solution. You can search for "ODBC" among your locally installed programs and check which driver versions are present. In my case I had versions 17 and 11, so I used 17 in the connection string:
'DRIVER={ODBC Driver 17 for SQL Server}'
I'm using Django 2.2 and got the same error while connecting to SQL Server 2012. I spent a lot of time on this issue, and finally this worked: I changed the driver to
'driver': 'SQL Server Native Client 11.0'
and it worked.
I met the same problem and fixed it by changing the connection string as below. Write
'DRIVER={ODBC Driver 13 for SQL Server}'
instead of
'DRIVER={SQL Server}'
A local MS SQL database server needs {ODBC Driver 17 for SQL Server}.
An Azure SQL Database needs {ODBC Driver 13 for SQL Server}.
Check installed drivers here => Installed ODBC Drivers
Format for connection to an Azure SQL Database:

import pyodbc
conn = pyodbc.connect('DRIVER={ODBC Driver 13 for SQL Server};'
                      'SERVER=tcp:nameServer.database.windows.net,1433;'
                      'DATABASE=Name database;UID=name;PWD=password;')

Format for connection to a local MS SQL database:

import pyodbc
conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                      'SERVER=server.name;'  # example: Doctor-Notebook\\MSSQLEXPRESS
                      'DATABASE=database.name;Trusted_Connection=yes')
I faced this issue and was looking for the solution. Finally I tried all the options from https://github.com/mkleehammer/pyodbc/wiki/Connecting-to-SQL-Server-from-Windows, and for my MSSQL 12 only "{ODBC Driver 11 for SQL Server}" worked. Just try them one by one. The second important thing is to get the server name exactly right: I previously thought I needed to append \SQLEXPRESS in all cases, but found out that you have to set EXACTLY what you see in the server properties. Example on the screenshot:
You could try:
import pyodbc
# Using a DSN
cnxn = pyodbc.connect('DSN=odbc_datasource_name;UID=db_user_id;PWD=db_password')
Note: You will need to know the "odbc_datasource_name". In Windows you can search for ODBC Data Sources. The name will look something like this:
Data Source Name Example
The code below works like magic (note that query-string parameters after the first are joined with &, not repeated ?):

SQLALCHEMY_DATABASE_URI = (
    "mssql+pyodbc://<servername>/<dbname>"
    "?driver=SQL Server Native Client 11.0"
    "&trusted_connection=yes&UID=<db_user>&PWD=<pass>"
)
The connection string below works:
import pandas as pd
import pyodbc as odbc
sql_conn = odbc.connect('DRIVER={ODBC Driver 13 for SQL Server};SERVER=SERVER_NAME;DATABASE=DATABASE_NAME;UID=USERNAME;PWD=PASSWORD;')
query = "SELECT * FROM admin.TABLE_NAME"
df = pd.read_sql(query, sql_conn)
df.head()
I had the same error on Python 3 and this helped me:

conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                      'SERVER=YourServerName;'
                      'DATABASE=YourDatabaseName;UID=USER_NAME;PWD=PASS_WORD;')

Note that the connection-string keywords DRIVER, SERVER, ... are conventionally written in upper case. You can visit this link for more information:
https://learn.microsoft.com/en-us/sql/connect/python/pyodbc/step-3-proof-of-concept-connecting-to-sql-using-pyodbc?view=sql-server-ver15
In my case, the exact same error was caused by the lack of the drivers on Windows Server 2019 Datacenter running in an Azure virtual machine.
As soon as I installed the drivers from https://www.microsoft.com/en-us/download/details.aspx?id=56567, the issue was gone.
For the error pyodbc.InterfaceError: ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)'):
put no space between the Driver keyword and the equals sign:

connection = pyodbc.connect('Driver={SQL Server Native Client 11.0};'
                            'Server=servername;'
                            'Database=dbname;'
                            'Trusted_Connection=yes;')
Apart from the other answers, which considered the connection string itself, it might simply be necessary to download the correct ODBC driver. My client just faced this issue when executing a Python app that required it.
You can check this by pressing the Windows key and typing "odbc"; the correct driver should appear in the Drivers tab.
Create a DSN, something like this (ASEDEV), for your connection and try to use DSN instead of DRIVER, like below:

import pyodbc
cnxn = pyodbc.connect('DSN=ASEDEV;User ID=sa;Password=sybase123')
mycur = cnxn.cursor()
mycur.execute("select * from master..sysdatabases")
row = mycur.fetchone()
while row:
    print(row)
    row = mycur.fetchone()
I was facing the same issue; a whole day was wasted, and I tried all possible ODBC Driver values.

import pyodbc
connection = pyodbc.connect('Driver={SQL Server};Server=ServerName;'
                            'Database=Database_Name;Trusted_Connection=yes;')

In place of Driver={SQL Server} you can try these options one by one, or just use the one corresponding to your setting; in my case the last one worked :)
Driver={ODBC Driver 11 for SQL Server} for SQL Server 2005 - 2014
Driver={ODBC Driver 13 for SQL Server} for SQL Server 2005 - 2016
Driver={ODBC Driver 13.1 for SQL Server} for SQL Server 2008 - 2016
Driver={ODBC Driver 17 for SQL Server} for SQL Server 2008 - 2017
Driver={SQL Server} for SQL Server 2000
Driver={SQL Native Client} for SQL Server 2005
Driver={SQL Server Native Client 10.0} for SQL Server 2008
Driver={SQL Server Native Client 11.0} for SQL Server 2012
You need to download Microsoft ODBC Driver 13 for SQL Server
from Microsoft ODBC Driver 13
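Since several of these drivers can be installed side by side, one way to pick the newest matching entry programmatically is sketched below. The helper name is mine, and in practice the input list would come from pyodbc.drivers():

```python
import re

def newest_sql_server_driver(installed):
    """Pick the newest 'ODBC Driver NN for SQL Server' entry from a list of
    driver names (as returned by pyodbc.drivers()); None if there is none."""
    best, best_ver = None, -1
    for name in installed:
        match = re.fullmatch(r'ODBC Driver (\d+)(?:\.\d+)? for SQL Server', name)
        if match and int(match.group(1)) > best_ver:
            best, best_ver = name, int(match.group(1))
    return best

# Example with a hypothetical pyodbc.drivers() result:
print(newest_sql_server_driver(
    ['SQL Server', 'ODBC Driver 11 for SQL Server', 'ODBC Driver 17 for SQL Server']
))  # ODBC Driver 17 for SQL Server
```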
Thank you, Avinash. Brilliant. I tried to connect to an MS Azure database using PyCharm, and it worked.

server = ''
database = ''
username = ''
password = ''
driver = 'SQL Server Native Client 11.0'
connection1 = pyodbc.connect('DRIVER='+driver+';SERVER='+server+';PORT=1433;DATABASE='+database+';UID='+username+';PWD='+password+';TDS_Version=8.0')
print("Connected.")
Try below:

import pyodbc

server = 'servername'
database = 'DB'
username = 'UserName'
password = 'Password'
cnxn = pyodbc.connect('DRIVER={ODBC Driver 13 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+password)
cursor = cnxn.cursor()
cursor.execute('SELECT * FROM Tbl')
for row in cursor:
    print('row = %r' % (row,))
Have you installed any SQL product on your machine? You can download and install "ODBC Driver 13 (or any version) for SQL Server" and try again if you haven't already done so. Make sure you have all the drivers and the DB engine installed:
https://www.microsoft.com/en-us/download/details.aspx?id=54920
server = '123.45.678.90'
database = 'dbname'
username = 'username'
password = 'pwork'
driver = '{ODBC Driver 17 for SQL Server}'
conn_str = ('DRIVER=' + driver + ';SERVER=' + server +
            ';DATABASE=' + database + ';UID=' + username +
            ';PWD=' + password + ';Trusted_Connection=no;')
pyodbc.connect(conn_str, autocommit=True)

It worked for me; you need to install the driver from here:
https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server?view=sql-server-ver15
or (on Ubuntu) sudo apt-get install unixodbc-dev if you get an error with pip install pyodbc.
If anyone is trying to access a database hosted in Azure, then try giving the driver as ODBC Driver 17 for SQL Server.
I'm using a CDH cluster which is Kerberos-enabled, and I'd like to use PyHive to connect to Hive and read Hive tables. Here is the code I have:

from pyhive import hive
from TCLIService.ttypes import TOperationState

cursor = hive.connect(host='xyz', port=10000, username='my_username', auth='KERBEROS', database='poc', kerberos_service_name='hive').cursor()

I'm getting the value of xyz from hive-site.xml under hive.metastore.uris; however, there it says xyz:9083, and if I replace 10000 with 9083, it complains.
My problem is that when I connect (using port = 10000), I get a permission error when executing a query, while I can read that table if I use the Hive CLI or beeline. My questions are: 1) is xyz the value I should use? 2) which port should I use? 3) if all is correct, why am I still getting a permission issue?
import pyodbc

cnxn = pyodbc.connect('DRIVER={SQL Server Native Client 11.0};SERVER=LENOVO-PCN;DATABASE=testing;')
cursor = cnxn.cursor()
cursor.execute("select Sales from Store_Inf")
row = cursor.fetchone()
if row:
    print(row)

I'm trying to use Python 3 with the pyodbc module to connect to SQL Server Express. My code gave this error:

('08001', '[08001] [Microsoft][SQL Server Native Client 11.0]Named Pipes Provider: Could not open a connection to SQL Server [2]. (2) (SQLDriverConnect)')

Any idea?
Here is an example that worked for me using Trusted_Connection=yes (note that pyodbc.connect already returns the connection, so there is no second connect call):

import pyodbc

connection = pyodbc.connect(
    Trusted_Connection='Yes',
    Driver='{ODBC Driver 11 for SQL Server}',
    Server='SERVER_NAME,PORT_NUMBER',
    Database='DATABASE_NAME'
)

Please note that the server name and port number are comma-separated!
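The 'host,port' convention is easy to get wrong, since most URLs use a colon instead. A quick pure-string sketch of how that Server value splits (no database needed; the names are placeholders):

```python
# ODBC separates host and port with a comma, not a colon.
server = 'SERVER_NAME,1433'
host, _, port = server.partition(',')
print(host)  # SERVER_NAME
print(port)  # 1433
```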