I am using the following command, where df1 is a dataframe containing a single column Name of type String.
df1.write.format('jdbc').option('url', 'jdbc:hive2://hostname:port').option('dbtable','sampletest').save()
It raises the following error:
Py4JJavaError: An error occurred while calling o146.saveAsTable.
: java.sql.SQLException: org.apache.spark.sql.catalyst.parser.ParseException:
no viable alternative at input 'CREATE TABLE sampletest ("Name"'(line 1, pos 25)
I suspect the quotes around the column name are causing the issue. I need help with a workaround or a fix. Also, what configuration should I use to connect a SparkSession directly to a remote Thrift server so I can fire queries?
Thanks.
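For the second part of the question, here is a minimal sketch of one way to point a SparkSession at a remote Hive metastore so queries can be fired directly, assuming the remote Thrift endpoint is the metastore service (not HiveServer2); the application name, host, and port are placeholders:
from pyspark.sql import SparkSession

# "hive.metastore.uris" points the session at the remote Hive metastore Thrift service (9083 is the usual default port)
spark = SparkSession.builder \
    .appName("remote-hive") \
    .config("hive.metastore.uris", "thrift://hostname:9083") \
    .enableHiveSupport() \
    .getOrCreate()

spark.sql("SHOW DATABASES").show()
df1.write.saveAsTable("sampletest")  # bypasses the quoted-identifier CREATE TABLE generated by the JDBC writer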
Related
I am trying to write a dataframe from R to a SQL database. When I try to append the dataframe to the SQL table, I receive the following error:
Error in result_insert_dataframe(rs#ptr, values, batch_rows) :
[Microsoft][ODBC Driver 17 for SQL Server]Invalid character value for cast specification
I understand that this is an issue with one of my fields not being formatted correctly or containing values that are incompatible with SQL. However, it is a very large dataframe with many fields, and I am having trouble isolating which one is the problem. Does anyone have a suggested workflow or diagnostic tools to help isolate the issue?
Main Question: Are there diagnostic tools in R to help specify/identify which field is triggering an "invalid character value for cast specification" warning when trying to append a multi-variable dataframe in R to a SQL data table?
Edit - here are the data types of the R dataframe compared with the target table:
I am exporting data from a pandas DataFrame to SQL Server using pyodbc. I have a problem I'm facing for the first time: when I insert into a SQL Server column whose name starts with a number (5star), the for loop doesn't accept the column name; the number is highlighted as if it were not part of the name, and Python raises an 'invalid syntax' error.
The code is shown below; it worked perfectly when no column name started with a number.
for index, row in merged3.iterrows():
    cursor.execute("INSERT INTO database.rating (overall, 5star, 4star) values(?,?,?)", row.overall, row.5star, row.4star)
database.commit()
cursor.close()
Any ideas how to improve it, without changing the name of the column in the database itself?
I recommend using sqlalchemy:
import urllib.parse
import pandas as pd
from sqlalchemy import create_engine

# URL-encode the ODBC connection string (raw string so the backslash survives)
quoted = urllib.parse.quote_plus(r"DRIVER={SQL Server Native Client 11.0};SERVER=(localDb)\ProjectsV14;DATABASE=database")
engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))
# append to the existing table and skip writing the DataFrame index as a column
merged3.to_sql('rating', schema='dbo', con=engine, if_exists='append', index=False)
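If you would rather keep the plain pyodbc loop from the question, here is a possible sketch (my own suggestion, not part of the answer above): bracket-quote the column names that start with a digit in the T-SQL, and index the pandas row with strings, since row.5star is not valid Python syntax:
# cursor, database and merged3 are the objects from the question
for index, row in merged3.iterrows():
    # [5star] and [4star] are bracket-quoted T-SQL identifiers; row["5star"] avoids the invalid row.5star attribute access
    cursor.execute(
        "INSERT INTO database.rating (overall, [5star], [4star]) values (?, ?, ?)",
        row["overall"], row["5star"], row["4star"],
    )
database.commit()
cursor.close()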
I'm trying to load a remote Oracle Database table on to Apache Spark shell.
This is how I started the spark-shell.
./spark-shell --driver-class-path ../jars/ojdbc6.jar --jars ../jars/ojdbc6.jar --master local
And I get a Scala prompt, where I try to load an Oracle database table like below. (I use a custom JDBC URL)
val jdbcDF = spark.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=WHATEVER)(HOST=myDummyHost.com)(PORT=xxxx)))(CONNECT_DATA=(SERVICE_NAME=dummy)(INSTANCE_NAME=dummyKaMummy)(UR=A)(SERVER=DEDICATED)))")
  .option("dbtable", "THE_DUMMY_TABLE")
  .option("user", "DUMMY_USER")
  .option("password", "DUMMYPASSWORD")
  .option("driver", "oracle.jdbc.driver.OracleDriver")
  .load()
(Replaced employer data with dummy variables)
And then I get this error.
java.sql.SQLException: Unrecognized SQL type -102
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getCatalystType(JdbcUtils.scala:246)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:315)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:63)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
... 49 elided
I tried to see if there is an issue with the quotes, but it's not that.
Can somebody save my life, please?
The problem is an incompatible field in the database. If you cannot modify the database but would still like to read from it, the workaround is to skip the offending columns (in my case it was a field of type geography). With the help of How to select specific columns through Spark JDBC?, here is a solution in PySpark (the Scala solution would be similar):
df = spark.read.jdbc(url=connectionString, table="(select colName from Table) as CompatibleTable", properties=properties)
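On Spark 2.4 and later, the query option is an equivalent way to push the column list down without wrapping it in a subquery alias; a sketch using the same placeholder names as above:
df = (spark.read.format("jdbc")
      .option("url", connectionString)
      .option("query", "select colName from Table")
      .option("user", properties["user"])          # assuming the credentials sit in the same properties dict
      .option("password", properties["password"])
      .load())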
I'm trying to insert a row into the table using the code below, but it throws an error. Can anyone help me solve it?
Thanks in advance!!
db2 "Insert into TARIFF_PRODUCT_ATTRIBUTES values (409499, 'ADDITION_SMS_TEMPLATE', 'IDSSMS1')";
Error is :
DB21034E The command was processed as an SQL statement because it was not a
valid Command Line Processor command. During SQL processing it returned:
SQL0204N "DB2EAI2.TARIFF_PRODUCT_ATTRIBUTES" is an undefined name.
SQLSTATE=42704
Common causes of SQL0204N in Db2:
spelling mistake in the object name
object does not exist in the currently connected Db2 database
object exists in current database but in a different schema than your current default schema (so you must qualify the name with the correct schema-name).
mixed case table name (Db2 will always uppercase unquoted object names, so if the object is Tariff_Product_Attributes then use double-quotes around the name in the SQL to allow Db2 to find the object).
There are other, less common causes; see the documentation for the complete list.
I am trying to import https://www.yelp.com/dataset/documentation/sql into a PostgreSQL instance. It is having problems with the backticks (accent marks). Other than doing a character replacement, is there any other way to deal with this?
ERROR: syntax error at or near "PaxHeader"
LINE 1: PaxHeader/yelp_db.sql17 uid=998889796
^
ERROR: syntax error at or near "`"
LINE 1: CREATE DATABASE /*!32312 IF NOT EXISTS*/ `yelp_db` /*!40100 ...
^
ERROR: syntax error at or near "USE"
LINE 1: USE `yelp_db`;
^
ERROR: syntax error at or near "`"
LINE 1: DROP TABLE IF EXISTS `attribute`;
These are typical MySQL syntax constructs that PostgreSQL, which follows the SQL standard here, does not support. There are a few different converters on GitHub that might help. The last time I had to do this, there were tools to convert text dumps; they didn't work perfectly, but they got things close enough. Reviewing the tools available today, they tend to assume you have an actual MySQL database, not just a dump file.
So it looks like the appropriate way to address this today is to load the data into MySQL and then move it to PostgreSQL. In this regard you seem to have four options that I can think of for converting the schema and data:
There are tools to convert XML dumps from MySQL and load these into PostgreSQL.
You could set up a foreign data wrapper from PostgreSQL to the MySQL db and then copy the schemas and data in.
You could manually convert the schemas, and then dump/reload the data using an ETL process via CSV (a rough sketch of this style of transfer follows after this list).
There are tools to read a live MySQL database and insert the data into PostgreSQL.
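As a rough illustration of the dump/reload route (option three, or a lightweight stand-in for option four), here is a sketch assuming the dump has already been loaded into a local MySQL instance; the connection strings and table names are placeholders:
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection strings; needs the pymysql and psycopg2 driver packages installed
mysql_engine = create_engine("mysql+pymysql://user:password@localhost:3306/yelp_db")
pg_engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/yelp_db")

# Copy one table at a time in chunks so memory stays bounded
for table in ["attribute", "business", "review"]:  # placeholder table names from the Yelp dump
    for chunk in pd.read_sql("SELECT * FROM `{}`".format(table), mysql_engine, chunksize=50000):
        chunk.to_sql(table, pg_engine, if_exists="append", index=False)
The schema pandas infers on the PostgreSQL side is approximate, so column types usually still need manual adjustment afterwards.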