How to create a Sybase external table from SQL Server

I am trying to create an external table in SQL Server 2019 that points to Sybase.
I am already able to create a linked server to Sybase using the same driver and login information.
I am able to execute this code with no error:
CREATE EXTERNAL DATA SOURCE external_data_source_name
WITH (
LOCATION = 'odbc://jjjjj.nnnn.iiii.com:xxxxx',
CONNECTION_OPTIONS = 'DRIVER={SQL Anywhere 17};
ServerNode = jjjjj.nnnn.iiii.com:xxxxx;
Database = report;
Port = xxxxx',
CREDENTIAL = [PolyFriend] );
but when I try to create a table using the data source
CREATE EXTERNAL TABLE v_data(
event_id int
) WITH (
LOCATION='report.dbo.v_data',
DATA_SOURCE=external_data_source_name
);
I get this error:
105082;Generic ODBC error: [SAP][ODBC Driver][SQL Anywhere]Database
server not found.

You need to specify the Host, ServerName, and DatabaseName properties (for SQL Anywhere) in CONNECTION_OPTIONS:
CREATE EXTERNAL DATA SOURCE external_data_source_name
WITH (
LOCATION = 'odbc://jjjjj.nnnn.iiii.com:xxxxx',
CONNECTION_OPTIONS = 'DRIVER={SQL Anywhere 17};
Host=jjjjj.nnnn.iiii.com:xxxxx;
ServerName=xyzsqlanywhereservername;
DatabaseName=report;',
CREDENTIAL = [PolyFriend] );
Host is machinename:port, where machinename is the machine on which SQL Anywhere resides and port is most likely the default 2638, on which the SQL Anywhere service listens for connections.
ServerName is the name of the SQL Anywhere server/service that hosts the database (connect to the SQL Anywhere database and execute select @@servername).
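If you want to verify those values before wiring them into CONNECTION_OPTIONS, here is a minimal sketch using pyodbc (assuming the SQL Anywhere 17 ODBC driver is installed on the machine running the check; the host, server name, and credentials below are placeholders to replace with your own):
import pyodbc

# Reuse the same Host/ServerName/DatabaseName you plan to put in CONNECTION_OPTIONS.
conn = pyodbc.connect(
    "DRIVER={SQL Anywhere 17};"
    "Host=jjjjj.nnnn.iiii.com:xxxxx;"       # machine:port where the SQL Anywhere service listens
    "ServerName=xyzsqlanywhereservername;"  # value returned by select @@servername
    "DatabaseName=report;"
    "UID=your_user;PWD=your_password"       # hypothetical credentials for this test only
)
cursor = conn.cursor()
cursor.execute("select @@servername")       # should return the ServerName you expect
print(cursor.fetchone())
conn.close()
If this connects, the same Host/ServerName/DatabaseName values should work in the external data source definition.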

Related

How to query Hive from python connecting using Zookeeper?

I can connect to a Hive (or LLAP) database using pyhive, and I can query the database with a hard-coded server host. Here is a code example:
from pyhive import hive
host_name = "vrt1553.xxx.net"
port = 10000
connection = hive.Connection(
host=host_name,
port=port,
username=user,
kerberos_service_name='hive',
auth='KERBEROS',
)
cursor = connection.cursor()
cursor.execute('show databases')
print(cursor.fetchall())
How could I connect using Zookeeper to get a server name?
You must install the Kazoo package to query Zookeeper and find the host and port of your Hive servers:
import random
from kazoo.client import KazooClient
zk = KazooClient(hosts='vrt1554.xxx.net:2181,vrt1552.xxx.net:2181,vrt1558.xxx.net:2181', read_only=True)
zk.start()
servers = [hiveserver2.split(';')[0].split('=')[1].split(':')
for hiveserver2
in zk.get_children(path='hiveserver2')]
hive_host, hive_port = random.choice(servers)
zk.stop()
print(hive_host, hive_port)
Then just pass hive_host and hive_port to your Connection constructor:
connection = hive.Connection(
host=hive_host,
port=hive_port,
username=user,
kerberos_service_name="hive",
auth="KERBEROS",
)
Then query as you would with a standard Python SQL cursor. Here is an example using pandas:
import pandas as pd
df = pd.read_sql(sql_query, connection)

pyspark jdbc error when connecting to sql server

I am trying to import JSON documents stored on Azure Data Lake Gen2 into a SQL Server database using the code below, but I run into the following error. When I read data from SQL Server, the JDBC connection works.
Error Message: The driver could not open a JDBC connection.
Code:
df = spark.read.format('json').load("wasbs://<file_system>@<storage-account-name>.blob.core.windows.net/empDir/data")
val blobStorage = "<blob-storage-account-name>.blob.core.windows.net"
val blobContainer = "<blob-container-name>"
val blobAccessKey = "<access-key>"
val empDir = "wasbs://" + blobContainer + "@" + blobStorage + "/empDir"
val acntInfo = "fs.azure.account.key."+ blobStorage
sc.hadoopConfiguration.set(acntInfo, blobAccessKey)
val dwDatabase = "<database-name>"
val dwServer = "<database-server-name>"
val dwUser = "<user-name>"
val dwPass = "<password>"
val dwJdbcPort = "1433"
val sqlDwUrl = "jdbc:sqlserver://" + dwServer + ":" + dwJdbcPort + ";database=" + dwDatabase + ";user=" + dwUser+";password=" + dwPass
spark.conf.set("spark.sql.parquet.writeLegacyFormat","true")
df.write.format("com.microsoft.sqlserver.jdbc.SQLServerDriver").option("url", sqlDwUrl).option("dbtable", "Employee").option( "forward_spark_azure_storage_credentials","True").option("tempdir", empDir).mode("overwrite").save()
Also, how do I insert all of the JSON documents from the empDir directory into the Employee table?
You will receive the error message org.apache.spark.sql.AnalysisException: Table or view not found: dbo.Employee when the table or view you are referring to has not been created. Make sure the code is pointing to the correct database: an Azure Databricks database (internal) or an Azure SQL Database (external).
You may check out the query addressed on the Microsoft Q&A Azure Databricks forum.
Writing data to an Azure Databricks database:
To successfully insert data into the default database, make sure you create a table or view first.
You can then confirm that the dataframe was written to the default database.
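A minimal sketch of that path (assuming the dataframe df from the question and an illustrative table name of Employee):
# Illustrative sketch: save df as a managed table in the Databricks (internal) database.
df.write.mode("overwrite").saveAsTable("Employee")

# Confirm the table exists and holds the data.
spark.sql("SHOW TABLES IN default").show()
spark.table("Employee").show(5)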
Writing data to an Azure SQL database:
Here is an example of how to write data from a dataframe to an Azure SQL database; after it runs, you can check that the dataframe landed in the Azure SQL table.
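A minimal PySpark sketch of that write, using the generic Spark jdbc format rather than the Azure Synapse connector (so tempdir and forward_spark_azure_storage_credentials are not needed); the connection values reuse the placeholders from the question:
# Illustrative sketch: write df to Azure SQL Database over JDBC (placeholders as in the question).
jdbc_url = (
    "jdbc:sqlserver://<database-server-name>:1433;"
    "database=<database-name>;user=<user-name>;password=<password>"
)

(df.write
    .format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.Employee")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .mode("append")    # use "overwrite" to drop and recreate the table
    .save())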

Connecting R to postgreSQL database

I am trying to connect R to a PostgreSQL database. Here is what I have been trying in R:
require("RPostgreSQL")
pw<- {
"password"
}
# loads the PostgreSQL driver
drv <- dbDriver("PostgreSQL")
# creates a connection to the postgres database
# note that "con" will be used later in each connection to the database
con <- dbConnect(drv, dbname = "DBname",
host = "localhost", port = 5432,
user = "user", password = pw)
rm(pw) # removes the password
# check for the test_table
dbExistsTable(con, "test_table")
# FALSE >>> Should be true
I cannot figure out why it is not properly connecting to my database. I know that the database is on my computer as I can connect to it in the terminal and with pgAdmin4. Any help is greatly appreciated.
Thanks
I have had better success with the RPostgres package in combination with DBI. I know that RPostgreSQL just released a new version in May after no changes for a while, but RPostgres is pretty active.
## install.packages("devtools")
#devtools::install_github("RcppCore/Rcpp")
#devtools::install_github("rstats-db/DBI")
#devtools::install_github("rstats-db/RPostgres")
library(RPostgres)
library(DBI)
pw<- {
"password"
}
con <- dbConnect(RPostgres::Postgres()
, host='localhost'
, port='5432'
, dbname='DBname'
, user='user'
, password=pw)
rm(pw) # removes the password
dbExistsTable(con, "test_table")
install.packages("RPostgreSQL")
require("RPostgreSQL")
# this completes installing packages
# now start creating connection
con <- dbConnect(dbDriver("PostgreSQL"),
dbname = "dbname",
host = "localhost",
port = 5432,
user = "db_user",
password = "db_password")
# this completes creating connection
# get all the tables from connection
dbListTables(con)
One issue could be table permissions:
GRANT ALL PRIVILEGES ON your_table TO user;
Replace your_table and user with your own table name and user name.
You can list tables with \dt and users with \du.

Polybase EXTERNAL TABLE access failed - permission denied

I'm trying to connect to Hadoop via PolyBase in SQL Server 2016.
My code is:
CREATE EXTERNAL DATA SOURCE MyHadoopCluster WITH (
TYPE = HADOOP,
LOCATION ='hdfs://192.168.114.20:8020',
credential= HadoopUser1
);
CREATE EXTERNAL FILE FORMAT TextFileFormat WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (FIELD_TERMINATOR ='\001',
USE_TYPE_DEFAULT = TRUE)
);
CREATE EXTERNAL TABLE [dbo].[test_hadoop] (
[Market_Name] int NOT NULL,
[Claim_GID] int NOT NULL,
[Completion_Flag] int NULL,
[Diag_CDE] float NOT NULL,
[Patient_GID] int NOT NULL,
[Record_ID] int NOT NULL,
[SRVC_FROM_DTE] int NOT NULL
)
WITH (LOCATION='/applications/gidr/processing/lnd/sha/clm/cf/claim_diagnosis',
DATA_SOURCE = MyHadoopCluster,
FILE_FORMAT = TextFileFormat
);
And I got this error:
EXTERNAL TABLE access failed due to internal error: 'Java exception
raised on call to HdfsBridge_GetDirectoryFiles: Error [Permission
denied: user=pdw_user, access=READ_EXECUTE,
inode="/applications/gidr/processing/lnd/sha/clm/cf/claim_diagnosis":root:supergroup:drwxrwxr--
at
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:281)
at
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:262)
at
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:175)
at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6590)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6572)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6497)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListingInt(FSNamesystem.java:5034)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:4995)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:882)
at
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getListing(AuthorizationProviderProxyClientProtocol.java:335)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:615)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:415) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080) ]
occurred while accessing external file.'
The problem is that in the newest version of PolyBase there is no config file in which you can specify a default Hadoop login and password. So even when I create a database scoped credential, PolyBase still uses the default pdw_user. I even tried to create a pdw_user account on Hadoop, but I still got this error. Any ideas?
If you have a Kerberos-secured Hadoop cluster, make sure you alter the XML files as described at https://learn.microsoft.com/en-us/sql/relational-databases/polybase/polybase-configuration
If it is not a Kerberos-secured Hadoop cluster, make sure that the default user pdw_user has read access to HDFS and execute permissions on Hive.

--staging-table while loading data to sql server (sqoop)

I am getting the following error message when I am trying to load data into SQL Server 2012 with Sqoop:
(org.apache.sqoop.manager.SQLServerManager) does not support staging of data for export. Please retry without specifying the --staging-table option
But the same code works fine with MySQL.
Is this something to do with SQL Server compatibility with Sqoop?
How do I fix this issue?
I am using the following arguments in sqoop export:
String arguments[] = new String[] {
"export",
"--connect",
configuration.get(PROPERTY_DBSTRING),
"--username",
configuration.get(PROPERTY_DBUSERNAME),
"--password",
configuration.get(PROPERTY_DBPASSWORD),
"--verbose",
"--table",
tableName,
"--staging-table",
"TEMP_" + tableName,
"--clear-staging-table",
"--input-fields-terminated-by",
configuration.get(PROPERTY_SQOOP_FIELD_SEPARATER),
"--export-dir",
configuration.get(PROPERTY_INPUT_DATADIRECTORY) + "/" + tableName };
return Sqoop.runTool(arguments, configuration);