I am using the following command, where df1 is a dataframe containing a single column Name of type String.
df1.write.format('jdbc').option('url', 'jdbc:hive2://hostname:port').option('dbtable','sampletest').save()
It raises the following error:
Py4JJavaError: An error occurred while calling o146.saveAsTable.
: java.sql.SQLException: org.apache.spark.sql.catalyst.parser.ParseException:
no viable alternative at input 'CREATE TABLE sampletest ("Name"'(line 1, pos 25)
I suspect the quotes around the column name are causing the issue. I need help with a workaround or a fix. Also, what configuration should I use to connect a SparkSession directly to a remote Thrift server so I can fire queries?
Thanks.
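For the second part of the question, here is a minimal sketch of one way to point a SparkSession at a remote Hive metastore so queries can be fired directly, assuming the remote Thrift endpoint is the metastore service (not HiveServer2); the application name, host, and port are placeholders:
from pyspark.sql import SparkSession

# "hive.metastore.uris" points the session at the remote Hive metastore Thrift service (9083 is the usual default port)
spark = SparkSession.builder \
    .appName("remote-hive") \
    .config("hive.metastore.uris", "thrift://hostname:9083") \
    .enableHiveSupport() \
    .getOrCreate()

spark.sql("SHOW DATABASES").show()
df1.write.saveAsTable("sampletest")  # bypasses the quoted-identifier CREATE TABLE generated by the JDBC writer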
Related
I am trying to write a dataframe from R to a SQL database. When I try to append the dataframe to the SQL table, I receive the following error:
Error in result_insert_dataframe(rs#ptr, values, batch_rows) :
[Microsoft][ODBC Driver 17 for SQL Server]Invalid character value for cast specification
I understand that this is an issue with one of my fields not being formatted correctly or containing values that are incompatible with SQL. However, it is a very large dataframe with many fields, and I am having trouble isolating which one is the problem. Does anyone have a suggested workflow or diagnostic tools to help isolate the issue?
Main Question: Are there diagnostic tools in R to help specify/identify which field is triggering an "invalid character value for cast specification" warning when trying to append a multi-variable dataframe in R to a SQL data table?
Edit - here are the data types of the R dataframe compared with the target table:
I am exporting data from a pandas DataFrame to SQL Server using pyodbc. I have a problem I'm facing for the first time: when I insert into a SQL Server column whose name starts with a number (5star), the for loop doesn't accept the column name; the number is highlighted as if it were not part of the name, and Python raises an 'invalid syntax' error.
The code is shown below; it worked perfectly when no column name started with a number.
for index, row in merged3.iterrows():
    cursor.execute("INSERT INTO database.rating (overall, 5star, 4star) values(?,?,?)", row.overall, row.5star, row.4star)
database.commit()
cursor.close()
Any ideas how to improve it, without changing the name of the column in the database itself?
I recommend using sqlalchemy:
import urllib.parse
import pandas as pd
from sqlalchemy import create_engine

# URL-encode the ODBC connection string (raw string so the backslash survives)
quoted = urllib.parse.quote_plus(r"DRIVER={SQL Server Native Client 11.0};SERVER=(localDb)\ProjectsV14;DATABASE=database")
engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))
# append to the existing table and skip writing the DataFrame index as a column
merged3.to_sql('rating', schema='dbo', con=engine, if_exists='append', index=False)
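If you would rather keep the plain pyodbc loop from the question, here is a possible sketch (my own suggestion, not part of the answer above): bracket-quote the column names that start with a digit in the T-SQL, and index the pandas row with strings, since row.5star is not valid Python syntax:
# cursor, database and merged3 are the objects from the question
for index, row in merged3.iterrows():
    # [5star] and [4star] are bracket-quoted T-SQL identifiers; row["5star"] avoids the invalid row.5star attribute access
    cursor.execute(
        "INSERT INTO database.rating (overall, [5star], [4star]) values (?, ?, ?)",
        row["overall"], row["5star"], row["4star"],
    )
database.commit()
cursor.close()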
I'm trying to load a remote Oracle Database table on to Apache Spark shell.
This is how I started the spark-shell.
./spark-shell --driver-class-path ../jars/ojdbc6.jar --jars ../jars/ojdbc6.jar --master local
And I get a Scala prompt, where I try to load an Oracle database table like below. (I use a custom JDBC URL)
val jdbcDF = spark.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=WHATEVER)(HOST=myDummyHost.com)(PORT=xxxx)))(CONNECT_DATA=(SERVICE_NAME=dummy)(INSTANCE_NAME=dummyKaMummy)(UR=A)(SERVER=DEDICATED)))")
  .option("dbtable", "THE_DUMMY_TABLE")
  .option("user", "DUMMY_USER")
  .option("password", "DUMMYPASSWORD")
  .option("driver", "oracle.jdbc.driver.OracleDriver")
  .load()
(Replaced employer data with dummy variables)
And then I get this error.
java.sql.SQLException: Unrecognized SQL type -102
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getCatalystType(JdbcUtils.scala:246)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:315)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:63)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
... 49 elided
I tried to see if there is an issue with the quotes, but it's not that.
Can somebody save my life, please?
The problem is an incompatible field in the database. If you cannot modify the database but would still like to read from it, the workaround is to skip the offending columns (in my case it was a field of type geography). With the help of How to select specific columns through Spark JDBC?, here is a solution in PySpark (the Scala solution would be similar):
df = spark.read.jdbc(url=connectionString, table="(select colName from Table) as CompatibleTable", properties=properties)
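On Spark 2.4 and later, the query option is an equivalent way to push the column list down without wrapping it in a subquery alias; a sketch using the same placeholder names as above:
df = (spark.read.format("jdbc")
      .option("url", connectionString)
      .option("query", "select colName from Table")
      .option("user", properties["user"])          # assuming the credentials sit in the same properties dict
      .option("password", properties["password"])
      .load())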
I'm trying to insert a row into the table using the code below, but it throws an error. Can anyone help me solve it?
Thanks in advance!!
db2 "Insert into TARIFF_PRODUCT_ATTRIBUTES values (409499, 'ADDITION_SMS_TEMPLATE', 'IDSSMS1')";
Error is :
DB21034E The command was processed as an SQL statement because it was not a
valid Command Line Processor command. During SQL processing it returned:
SQL0204N "DB2EAI2.TARIFF_PRODUCT_ATTRIBUTES" is an undefined name.
SQLSTATE=42704
Common causes of SQL0204N in Db2:
spelling mistake in the object name
object does not exist in the currently connected Db2 database
object exists in current database but in a different schema than your current default schema (so you must qualify the name with the correct schema-name).
mixed case table name (Db2 will always uppercase unquoted object names, so if the object is Tariff_Product_Attributes then use double-quotes around the name in the SQL to allow Db2 to find the object).
There are other, less common causes; see the documentation for the complete list.
I am trying to import https://www.yelp.com/dataset/documentation/sql into a PostgreSQL instance. It is having problems with the backticks (accent marks). Other than doing a character replacement, is there any other way to deal with this?
ERROR: syntax error at or near "PaxHeader"
LINE 1: PaxHeader/yelp_db.sql17 uid=998889796
^
ERROR: syntax error at or near "`"
LINE 1: CREATE DATABASE /*!32312 IF NOT EXISTS*/ `yelp_db` /*!40100 ...
^
ERROR: syntax error at or near "USE"
LINE 1: USE `yelp_db`;
^
ERROR: syntax error at or near "`"
LINE 1: DROP TABLE IF EXISTS `attribute`;
These are typical MySQL syntax constructs that PostgreSQL, which follows the SQL standard here, does not support. There are a few different converters on GitHub that might help. The last time I had to do this, there were tools to convert text dumps; they didn't work perfectly, but they got things close enough. Reviewing the tools available today, they tend to assume you have an actual MySQL database, not just a dump file.
So it looks like the appropriate way to address this today is to load the data into MySQL and then move it to PostgreSQL. In this regard you seem to have four options that I can think of for converting the schema and data:
There are tools to convert XML dumps from MySQL and load these into PostgreSQL.
You could set up a foreign data wrapper from PostgreSQL to the MySQL db and then copy the schemas and data in.
You could manually convert the schemas, and then dump/reload the data using an ETL process via CSV (a rough sketch of this style of transfer follows after this list).
There are tools to read a live MySQL database and insert the data into PostgreSQL.
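As a rough illustration of the dump/reload route (option three, or a lightweight stand-in for option four), here is a sketch assuming the dump has already been loaded into a local MySQL instance; the connection strings and table names are placeholders:
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection strings; needs the pymysql and psycopg2 driver packages installed
mysql_engine = create_engine("mysql+pymysql://user:password@localhost:3306/yelp_db")
pg_engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/yelp_db")

# Copy one table at a time in chunks so memory stays bounded
for table in ["attribute", "business", "review"]:  # placeholder table names from the Yelp dump
    for chunk in pd.read_sql("SELECT * FROM `{}`".format(table), mysql_engine, chunksize=50000):
        chunk.to_sql(table, pg_engine, if_exists="append", index=False)
The schema pandas infers on the PostgreSQL side is approximate, so column types usually still need manual adjustment afterwards.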