pyspark jdbc error when connecting to sql server

I am trying to import JSON documents stored in Azure Data Lake Storage Gen2 into a SQL Server database using the code below, but I run into the following error. When I read data from SQL Server, the JDBC connection works.
Error Message: The driver could not open a JDBC connection.
Code:
df = spark.read.format('json').load("wasbs://<file_system>@<storage-account-name>.blob.core.windows.net/empDir/data")
val blobStorage = "<blob-storage-account-name>.blob.core.windows.net"
val blobContainer = "<blob-container-name>"
val blobAccessKey = "<access-key>"
val empDir = "wasbs://" + blobContainer + "@" + blobStorage + "/empDir"
val acntInfo = "fs.azure.account.key."+ blobStorage
sc.hadoopConfiguration.set(acntInfo, blobAccessKey)
val dwDatabase = "<database-name>"
val dwServer = "<database-server-name>"
val dwUser = "<user-name>"
val dwPass = "<password>"
val dwJdbcPort = "1433"
val sqlDwUrl = "jdbc:sqlserver://" + dwServer + ":" + dwJdbcPort + ";database=" + dwDatabase + ";user=" + dwUser+";password=" + dwPass
spark.conf.set("spark.sql.parquet.writeLegacyFormat","true")
df.write.format("com.microsoft.sqlserver.jdbc.SQLServerDriver").option("url", sqlDwUrl).option("dbtable", "Employee").option( "forward_spark_azure_storage_credentials","True").option("tempdir", empDir).mode("overwrite").save()
Also, how do I insert all the JSON documents from the empDir directory into the Employee table?

You will receive the error message org.apache.spark.sql.AnalysisException: Table or view not found: dbo.Employee when the table or view you are referring to has not been created. Make sure the code is pointing to the correct database [Azure Databricks database (internal) or Azure SQL Database (external)].
You may check out the related question addressed on the Microsoft Q&A Azure Databricks forum.
Writing data to Azure Databricks Database:
To successfully insert data into the default database, make sure to create a table or view first.
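A minimal PySpark sketch, assuming df is the dataframe loaded from the JSON files in the question, is to register it as a managed table in the default Databricks database:
# Sketch only (assumes `df` is the dataframe read from empDir).
# saveAsTable registers the data as a managed table in the default
# Databricks database, so it can then be queried as default.Employee.
df.write.mode("overwrite").saveAsTable("Employee")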
Check out the dataframe written to the default database.
Writing data to Azure SQL database:
Here is an example of how to write data from a dataframe to Azure SQL Database.
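A minimal PySpark sketch, assuming the sqlDwUrl connection string built in the question, uses the standard jdbc format rather than passing the driver class name as the format:
# Sketch only: write the dataframe to Azure SQL Database over JDBC.
# sqlDwUrl is assumed to be the jdbc:sqlserver://... connection string from the question.
df.write \
    .format("jdbc") \
    .option("url", sqlDwUrl) \
    .option("dbtable", "dbo.Employee") \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .mode("overwrite") \
    .save()
Note that options such as forward_spark_azure_storage_credentials and tempdir are options of the Azure Synapse (sqldw) connector rather than of the plain JDBC writer.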
Check out the dataframe written to the Azure SQL database.

Related

Copy records from one table to another using spark-sql-jdbc

I am trying to do a POC in PySpark for a very simple requirement. As a first step, I am just trying to copy the records from one table to another. There are more than 20 tables, but at first I am trying to do it for only one table and later extend it to multiple tables.
The code below works fine when I copy only 10 records. But when I try to copy all records from the main table, the code gets stuck and eventually I have to terminate it manually. The main table has 1 million records, so I was expecting it to finish in a few seconds, but it just does not complete.
Spark UI:
Could you please suggest how I should handle this?
Host: Local machine
Spark version: 3.0.0
Database: Oracle
Code:
from pyspark.sql import SparkSession
from configparser import ConfigParser

# read configuration file
config = ConfigParser()
config.read('config.ini')

# set up db credentials
url = config['credentials']['dbUrl']
dbUsr = config['credentials']['dbUsr']
dbPwd = config['credentials']['dbPwd']
dbDrvr = config['credentials']['dbDrvr']
dbtable = config['tables']['dbtable']
# print(dbtable)

# database connection
def dbConnection(spark):
    pushdown_query = "(SELECT * FROM main_table) main_tbl"
    prprDF = spark.read.format("jdbc")\
        .option("url", url)\
        .option("user", dbUsr)\
        .option("dbtable", pushdown_query)\
        .option("password", dbPwd)\
        .option("driver", dbDrvr)\
        .option("numPartitions", 2)\
        .load()
    prprDF.write.format("jdbc")\
        .option("url", url)\
        .option("user", dbUsr)\
        .option("dbtable", "backup_tbl")\
        .option("password", dbPwd)\
        .option("driver", dbDrvr)\
        .mode("overwrite").save()

if __name__ == "__main__":
    spark = SparkSession\
        .builder\
        .appName("DB refresh")\
        .getOrCreate()
    dbConnection(spark)
    spark.stop()
It looks like you are using only one thread (executor) to process the data over the JDBC connection. Check the executor and driver details in the Spark UI and try increasing the resources. Also share the error with which it is failing; you can get it from the same UI, or from the CLI with yarn logs -applicationId <application_id>.
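A likely cause is that the read in the original code sets numPartitions without partitionColumn, lowerBound, and upperBound, so Spark opens a single JDBC connection and pulls the whole table in one task. Below is a minimal sketch of a partitioned read and a batched write; the numeric key column ID and the bound values are assumptions and should be replaced with a real indexed column and its actual range.
# Sketch only: parallel JDBC read, assuming a numeric key column ID (hypothetical).
# With partitionColumn/lowerBound/upperBound set, Spark splits the read into
# numPartitions concurrent queries instead of a single full-table scan.
prprDF = spark.read.format("jdbc")\
    .option("url", url)\
    .option("user", dbUsr)\
    .option("password", dbPwd)\
    .option("driver", dbDrvr)\
    .option("dbtable", "(SELECT * FROM main_table) main_tbl")\
    .option("partitionColumn", "ID")\
    .option("lowerBound", "1")\
    .option("upperBound", "1000000")\
    .option("numPartitions", 8)\
    .load()

# Batch the inserts on the write side as well (the default batchsize is 1000).
prprDF.write.format("jdbc")\
    .option("url", url)\
    .option("user", dbUsr)\
    .option("password", dbPwd)\
    .option("driver", dbDrvr)\
    .option("dbtable", "backup_tbl")\
    .option("batchsize", 10000)\
    .mode("overwrite")\
    .save()
The bounds only decide how the rows are split across partitions; rows outside the range are still read, just in the first and last partitions.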

How to write tables into Panoply using RPostgreSQL?

I am trying to write a table into my data warehouse using the RPostgreSQL package
library(DBI)
library(RPostgreSQL)

pano = dbConnect(dbDriver("PostgreSQL"),
                 host = 'db.panoply.io',
                 port = '5439',
                 user = panoply_user,
                 password = panoply_pw,
                 dbname = mydb)

RPostgreSQL::dbWriteTable(pano, "mtcars", mtcars[1:5, ])
I am getting this error:
Error in postgresqlpqExec(new.con, sql4) :
RS-DBI driver: (could not Retrieve the result : ERROR: syntax error at or near "STDIN"
LINE 1: ..."hp","drat","wt","qsec","vs","am","gear","carb" ) FROM STDIN
^
)
The above code creates a 0-row, 0-byte table in Panoply. The columns appear to be entered into Panoply correctly, but nothing else appears.
First and most important: Redshift <> PostgreSQL.
Redshift does not use the Postgres bulk loader, so STDIN is not allowed.
There are many options available, which you should choose depending on your needs, especially considering the volume of data.
For a high volume of data you should write to S3 first and then use the Redshift COPY command (a sketch of this approach follows the links below).
There are many options; take a look at
https://github.com/sicarul/redshiftTools
For a low volume, see
inserting multiple records at once into Redshift with R
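For illustration, here is a minimal Python sketch of the high-volume path (stage a CSV in S3, then issue a Redshift COPY). The bucket name, IAM role, host, and table name are placeholders rather than values from the question; the same pattern can be driven from R, for example via the redshiftTools package linked above.
import boto3
import psycopg2

# Sketch only: stage the data in S3, then bulk load it with Redshift's COPY.
# Bucket, IAM role, host, and credentials below are placeholders (assumptions).
s3 = boto3.client("s3")
s3.upload_file("mtcars.csv", "my-staging-bucket", "staging/mtcars.csv")

conn = psycopg2.connect(
    host="example-cluster.redshift.amazonaws.com",
    port=5439,
    user="panoply_user",
    password="panoply_pw",
    dbname="mydb",
)
with conn, conn.cursor() as cur:
    # COPY reads directly from S3; this is Redshift's bulk loader.
    cur.execute("""
        COPY mtcars
        FROM 's3://my-staging-bucket/staging/mtcars.csv'
        IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
        FORMAT AS CSV
        IGNOREHEADER 1;
    """)
conn.close()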

How to create sybase external table for SQL

I am trying to create an external table in SQL Server 2019 that points to Sybase.
I am already able to create a linked server to Sybase using the same driver and login information.
I am able to execute this code with no error:
CREATE EXTERNAL DATA SOURCE external_data_source_name
WITH (
LOCATION = 'odbc://jjjjj.nnnn.iiii.com:xxxxx',
CONNECTION_OPTIONS = 'DRIVER={SQL Anywhere 17};
ServerNode = jjjjj.nnnn.iiii.com:xxxxx;
Database = report;
Port = xxxxx',
CREDENTIAL = [PolyFriend] );
but when I try to create a table using the data source
CREATE EXTERNAL TABLE v_data(
event_id int
) WITH (
LOCATION='report.dbo.v_data',
DATA_SOURCE=external_data_source_name
);
I get this error:
105082;Generic ODBC error: [SAP][ODBC Driver][SQL Anywhere]Database
server not found.
You need to specify the Host, ServerName, and DatabaseName properties (for SQL Anywhere) in the CONNECTION_OPTIONS:
CREATE EXTERNAL DATA SOURCE external_data_source_name
WITH (
LOCATION = 'odbc://jjjjj.nnnn.iiii.com:xxxxx',
CONNECTION_OPTIONS = 'DRIVER={SQL Anywhere 17};
Host=jjjjj.nnnn.iiii.com:xxxxx;
ServerName=xyzsqlanywhereservername;
DatabaseName=report;',
CREDENTIAL = [PolyFriend] );
Host == machinename:port, where machinename is the machine on which SQL Anywhere resides, and port is most likely the default 2638, on which the SQL Anywhere service listens for connections.
ServerName == the name of the SQL Anywhere server/service that hosts the database (connect to the SQL Anywhere database and execute select @@servername).

SSAS tabular model Value.NativeQuery Failed

I'm trying to use Value.NativeQuery with an ODBC connection to Google BigQuery and get an error.
Original Code:
let
    Source = #"Odbc/dsn=Google BigQuery",
    #"prod_Database" = Source{[Name="prod",Kind="Database"]}[Data],
    default_Schema = #"prod_Database"{[Name="default",Kind="Schema"]}[Data],
    DW_DIM_Table = default_Schema{[Name="DW_DIM",Kind="Table"]}[Data]
in
    DW_DIM_Table
New Code
let
    Source = #"Odbc/dsn=Google BigQuery",
    MyQuery = Value.NativeQuery(#"Odbc/dsn=Google BigQuery", "SELECT * FROM `default`.DW_DIM")
in
    MyQuery
Error:
The query statement is not valid.
When I try to put the statement into the Power Query Editor:
= Value.NativeQuery(#"Odbc/dsn=Google BigQuery", "SELECT * FROM `default`.DW_DIM")
I get the error:
Expression.Error: Native queries aren't supported by this value.
Details:
Table
I had the same error with the Athena ODBC driver (which is also a Simba driver), but was able to work around it by using Odbc.Query instead.

--staging-table while loading data to sql server (sqoop)

I am getting the following error message when I try to load data into SQL Server 2012 with Sqoop.
(org.apache.sqoop.manager.SQLServerManager) does not support staging of data for export. Please retry without specifying the --staging-table option
But the same code works fine with MySQL.
Does it have something to do with SQL Server compatibility with Sqoop?
How can I fix this issue?
I am using the following arguments in sqoop export:
String arguments[] = new String[] {
        "export",
        "--connect",
        configuration.get(PROPERTY_DBSTRING),
        "--username",
        configuration.get(PROPERTY_DBUSERNAME),
        "--password",
        configuration.get(PROPERTY_DBPASSWORD),
        "--verbose",
        "--table",
        tableName,
        "--staging-table",
        "TEMP_" + tableName,
        "--clear-staging-table",
        "--input-fields-terminated-by",
        configuration.get(PROPERTY_SQOOP_FIELD_SEPARATER),
        "--export-dir",
        configuration.get(PROPERTY_INPUT_DATADIRECTORY) + "/" + tableName };

return Sqoop.runTool(arguments, configuration);