Wrapping a statement with spark.sql throws a ParseException, but it runs fine with the %sql magic command - apache-spark-sql

The following code throws an error:
X = "CREATE OR REPLACE TABLE Invoices (InvoiceID INT, CustomerID INT, BillToCustomerID INT, OrderID INT, DeliveryMethodID INT, ContactPersonID INT, AccountsPersonID INT, SalespersonPersonID INT, PackedByPersonID INT, InvoiceDate TIMESTAMP, CustomerPurchaseOrderNumber INT, IsCreditNote STRING, CreditNoteReason STRING, Comments STRING, DeliveryInstructions STRING, InternalComments STRING, TotalDryItems INT, TotalChillerItems STRING, DeliveryRun STRING, RunPosition STRING, ReturnedDeliveryData STRING, ConfirmedDeliveryTime TIMESTAMP, ConfirmedReceivedBy STRING, LastEditedBy INT, LastEditedWhen TIMESTAMP) LOCATION '/mnt/adls/DQD/udl/Invoices/'; ALTER TABLE Invoices ADD COLUMN DQ_Check_Op SMALLINT"
spark.sql(X)
But with the %sql magic command, inside a cell, it runs fine:
%sql
CREATE OR REPLACE TABLE Invoices (InvoiceID INT, CustomerID INT, BillToCustomerID INT, OrderID INT, DeliveryMethodID INT, ContactPersonID INT, AccountsPersonID INT, SalespersonPersonID INT, PackedByPersonID INT, InvoiceDate TIMESTAMP, CustomerPurchaseOrderNumber INT, IsCreditNote STRING, CreditNoteReason STRING, Comments STRING, DeliveryInstructions STRING, InternalComments STRING, TotalDryItems INT, TotalChillerItems STRING, DeliveryRun STRING, RunPosition STRING, ReturnedDeliveryData STRING, ConfirmedDeliveryTime TIMESTAMP, ConfirmedReceivedBy STRING, LastEditedBy INT, LastEditedWhen TIMESTAMP) LOCATION '/mnt/adls/DQD/udl/Invoices/'; ALTER TABLE Invoices ADD COLUMN DQ_Check_Op SMALLINT
What am I doing wrong here?

You are getting a ParseException from spark.sql because the string contains multiple SQL statements (a CREATE and an ALTER) separated by a semicolon. PySpark does not read the query the same way Databricks SQL does: Databricks SQL can parse and execute multiple statements separated by semicolons, but spark.sql expects a single statement and throws a ParseException on anything after the semicolon.
So pass a single query (single-line or multiline) to each spark.sql call instead of using one spark.sql call to execute multiple SQL statements.
The workaround in PySpark is to use one spark.sql call per SQL statement. You can modify your code as follows:
X = "CREATE OR REPLACE TABLE Invoices (InvoiceID INT, CustomerID INT, BillToCustomerID INT, OrderID INT, DeliveryMethodID INT, ContactPersonID INT, AccountsPersonID INT, SalespersonPersonID INT, PackedByPersonID INT, InvoiceDate TIMESTAMP, CustomerPurchaseOrderNumber INT, IsCreditNote STRING, CreditNoteReason STRING, Comments STRING, DeliveryInstructions STRING, InternalComments STRING, TotalDryItems INT, TotalChillerItems STRING, DeliveryRun STRING, RunPosition STRING, ReturnedDeliveryData STRING, ConfirmedDeliveryTime TIMESTAMP, ConfirmedReceivedBy STRING, LastEditedBy INT, LastEditedWhen TIMESTAMP) LOCATION '/mnt/repro/';"
Y = "ALTER TABLE Invoices ADD COLUMN DQ_Check_Op SMALLINT"
z = "SELECT * FROM Invoices"
spark.sql(X)
spark.sql(Y)
spark.sql(Z)
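If you would rather keep several statements in one string, a minimal sketch (my own helper, not a Spark API; it assumes no semicolon ever appears inside a SQL string literal) is to split the string and run each piece through its own spark.sql call:

# Sketch: run each ';'-separated statement separately, since spark.sql
# accepts only one statement at a time. Assumes a notebook where `spark`
# (a SparkSession) already exists.
multi_sql = """
CREATE TABLE IF NOT EXISTS demo (id INT);
ALTER TABLE demo ADD COLUMNS (note STRING)
"""
for statement in multi_sql.split(";"):
    statement = statement.strip()
    if statement:            # skip empty fragments around semicolons
        spark.sql(statement)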

In my actual example, my table wasn't in the default database - I didn't think that was a relevant detail. After I wrapped the database and table names in ` (backticks), the error went away.
So, the following helped:
CREATE TABLE `sch`.`Invoices` (InvoiceID INT, CustomerID INT, BillToCustomerID INT, OrderID INT, DeliveryMethodID INT, ContactPersonID INT, AccountsPersonID INT, SalespersonPersonID INT, PackedByPersonID INT, InvoiceDate TIMESTAMP, CustomerPurchaseOrderNumber INT, IsCreditNote STRING, CreditNoteReason STRING, Comments STRING, DeliveryInstructions STRING, InternalComments STRING, TotalDryItems INT, TotalChillerItems STRING, DeliveryRun STRING, RunPosition STRING, ReturnedDeliveryData STRING, ConfirmedDeliveryTime TIMESTAMP, ConfirmedReceivedBy STRING, LastEditedBy INT, LastEditedWhen TIMESTAMP) LOCATION '/mnt/repro/'; ALTER TABLE `sch`.`Invoices` ADD COLUMN DQ_Check_Op SMALLINT
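Combining the two answers - one spark.sql call per statement, with backticked names - a minimal PySpark sketch looks like this (the column list is shortened to a single column for brevity, and the repro path from the answer above is reused):

spark.sql("CREATE TABLE `sch`.`Invoices` (InvoiceID INT) LOCATION '/mnt/repro/'")
spark.sql("ALTER TABLE `sch`.`Invoices` ADD COLUMN DQ_Check_Op SMALLINT")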

Related

[Amazon](500310) Invalid operation: syntax error at end of input Position: 684;

CREATE EXTERNAL TABLE schema_vtvs_ai_ext.fire(
fire_number VARCHAR(50),
fire_year DATE,
assessment_datetime INTEGER,
size_class CHAR,
fire_location_latitude REAL,
fire_location_longitude REAL,
fire_origin VARCHAR(50),
general_cause_desc VARCHAR(50),
activity_class VARCHAR(50),
true_cause VARCHAR(50),
fire_start_date DATE,
det_agent_type VARCHAR(50),
det_agent VARCHAR(50),
discovered_date DATE,
reported_date DATE,
start_for_fire_date DATE,
fire_fighting_start_date DATE,
initial_action_by VARCHAR(50),
fire_type VARCHAR(50),
fire_position_on_slope VARCHAR(50),
weather_conditions_over_fire VARCHAR(50),
fuel_type VARCHAR(50),
bh_fs_date DATE,
uc_fs_date DATE,
ex_fs_date DATE
);
This is the SQL code I have written to add an external table to a Redshift schema, but I get the error below. I can't seem to see where the error is:
[Amazon](500310) Invalid operation: syntax error at end of input Position: 684;
If your data is in Amazon S3, then you need to specify the file format (via STORED AS) and the path to the data files in S3 (via LOCATION).
This is an example query for CSV files (with a 1-line header):
create external table <external_schema>.<table_name> (...)
row format delimited
fields terminated by ','
stored as textfile
location 's3://mybucket/myfolder/'
table properties ('numRows'='100', 'skip.header.line.count'='1');
See the official doc for details.
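Applied to the table from the question, a sketch might look like this ('s3://mybucket/fire/' is a placeholder path; adjust the bucket and table properties to your data):

create external table schema_vtvs_ai_ext.fire(
fire_number varchar(50),
fire_year date,
assessment_datetime integer,
size_class char,
fire_location_latitude real,
fire_location_longitude real,
fire_origin varchar(50),
general_cause_desc varchar(50),
activity_class varchar(50),
true_cause varchar(50),
fire_start_date date,
det_agent_type varchar(50),
det_agent varchar(50),
discovered_date date,
reported_date date,
start_for_fire_date date,
fire_fighting_start_date date,
initial_action_by varchar(50),
fire_type varchar(50),
fire_position_on_slope varchar(50),
weather_conditions_over_fire varchar(50),
fuel_type varchar(50),
bh_fs_date date,
uc_fs_date date,
ex_fs_date date
)
row format delimited
fields terminated by ','
stored as textfile
location 's3://mybucket/fire/'
table properties ('skip.header.line.count'='1');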

Why deleting from a table does not work in PostgreSQL

I tried deleting a row from a table, but the result was
DELETE 0
When I created a copy of the original table and performed the delete on the new copy-table, the result was DELETE 1, as expected.
I've created a trigger function that updates a column on a different table upon inserting or deleting - is there a chance that this is the problem?
Does anyone know what the problem might be?
P.S. I'm posting code similar to the original table's creation code (the table has been altered for later parts of the university project), along with the trigger function's code.
P.S.2 A friend who has the exact same database, table, trigger function, etc. doesn't have this problem.
create table Listings(
id int,
listing_url varchar(40),
scrape_id bigint,
last_scraped date,
name varchar(140),
summary varchar(1780),
space varchar(1840),
description varchar(2320),
experiences_offered varchar(10),
neighborhood_overview varchar(1830),
notes varchar(1790),
transit varchar(1810),
access varchar(1830),
interaction varchar(1340),
house_rules varchar(1820),
thumbnail_url varchar(10),
medium_url varchar(10),
picture_url varchar(150),
xl_picture_url varchar(10),
street varchar(90),
neighbourhood varchar(20),
neighbourhood_cleansed varchar(70),
neighbourhood_group_cleansed varchar(10),
city varchar(40),
state varchar(60),
zipcode varchar(20),
market varchar(30),
smart_location varchar(40),
country_code varchar(10),
country varchar(10),
latitude varchar(10),
longitude varchar(10),
is_location_exact boolean,
property_type varchar(30),
room_type varchar(20),
accommodates int,
bathrooms varchar(10),
bedrooms int,
beds int,
bed_type varchar(20),
amenities varchar(1660)
);
CREATE FUNCTION Vs()
RETURNS "trigger" AS
$Body$
BEGIN
IF(TG_OP='INSERT')THEN
UPDATE "Host"
SET listings_count = listings_count + 1
WHERE id = NEW.host_id;
RETURN NEW;
ELSIF(TG_OP='DELETE')THEN
UPDATE "Host"
SET listings_count = listings_count -1
WHERE id = OLD.host_id;
RETURN OLD;
END IF;
END;
$Body$
LANGUAGE 'plpgsql' VOLATILE;
CREATE TRIGGER InsertTrigger
before INSERT
ON "Listing"
FOR EACH ROW
EXECUTE PROCEDURE Vs();
CREATE TRIGGER DeleteTrigger
before DELETE
ON "Listing"
FOR EACH ROW
EXECUTE PROCEDURE Vs();
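As an aside on the mechanism being asked about: in PostgreSQL, a row-level BEFORE trigger whose function returns NULL silently skips the operation for that row, which is one documented way to get DELETE 0 on a row that exists. A minimal sketch (hypothetical table and trigger, not the code above):

create table t(id int);
insert into t values (1);
create function skip_delete() returns trigger as $$
begin
    return null;  -- returning NULL from a BEFORE row trigger cancels the operation
end;
$$ language plpgsql;
create trigger cancel_delete
before delete on t
for each row execute procedure skip_delete();
delete from t where id = 1;  -- reports DELETE 0 and the row remains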

FAILED: ParseException line 1:36 cannot recognize input near '1987'

I'm trying to create an external table in Hive with this:
CREATE EXTERNAL TABLE IF NOT EXISTS 1987(
YEAR INT,
MONTH INT,
DAYOFMONTH INT,
DAYOFWEEK INT,
DEPTIME INT,
CRS INT,
ARRTIME TIME,
CARRIER STRING,
FLIGHTNUM INT,
TAILNUM STRING,
ACTUALELAPSED INT,
CRSELAPSED INT,
AIRTIME INT,
ARRDELAY INT,
DEPDELAY INT,
ORIGIN STRING,
DEST STRING,
DISTANCE INT,
TAXIIN INT,
TAXIOUT INT,
CANCELLED INT,
CANCELLATIONCODE STRING,
DIVERTED INT,
CARRIERDELAY INT,
WEATHERDELAY INT,
NASDELAY INT,
SECURITYDELAY INT,
LATEAIRCRAFT INT,
Origin CHAR(1))
COMMENT 'AÑO 1987'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
location '/user/raj_ops/PROYECTO/'1987.csv';
But get the following error:
org.apache.hive.service.cli.HiveSQLException: Error while compiling
statement: FAILED: ParseException line 1:36 cannot recognize input
near '1987' '(' 'YEAR' in table name
Does anyone know why?
Thanks :)
location should be '/user/raj_ops/PROYECTO/' (without the file itself). If you have other files in the same location, then move them to separate locations, like /user/raj_ops/PROYECTO/1987/ for 1987, because a table can be built on top of a location (directory), not a file.
Also, a table name cannot start with digits: use backticks (`1987`) or rename it to something like year_1987.
I think you probably need to escape the table name with back-ticks if it's numeric:
`1987`
There is an extra quote in the location value. Remove that.
location '/user/raj_ops/PROYECTO/'1987.csv';
Should be
location '/user/raj_ops/PROYECTO/1987.csv';
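Putting the fixes together (backticked numeric table name, and the first answer's advice to point LOCATION at a directory), a corrected sketch might look like the following. Two extra assumptions of mine: ARRTIME is declared INT because Hive has no TIME type, and the second Origin column is renamed ORIGIN_CODE because Hive column names are case-insensitive, so ORIGIN and Origin would clash. Backtick any column name your Hive version treats as reserved.

CREATE EXTERNAL TABLE IF NOT EXISTS `1987`(
YEAR INT,
MONTH INT,
DAYOFMONTH INT,
DAYOFWEEK INT,
DEPTIME INT,
CRS INT,
ARRTIME INT,
CARRIER STRING,
FLIGHTNUM INT,
TAILNUM STRING,
ACTUALELAPSED INT,
CRSELAPSED INT,
AIRTIME INT,
ARRDELAY INT,
DEPDELAY INT,
ORIGIN STRING,
DEST STRING,
DISTANCE INT,
TAXIIN INT,
TAXIOUT INT,
CANCELLED INT,
CANCELLATIONCODE STRING,
DIVERTED INT,
CARRIERDELAY INT,
WEATHERDELAY INT,
NASDELAY INT,
SECURITYDELAY INT,
LATEAIRCRAFT INT,
ORIGIN_CODE CHAR(1))
COMMENT 'AÑO 1987'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/raj_ops/PROYECTO/1987/';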

External table syntax error KUP-01005 Oracle

I receive the error below every time I select from an external table that I have created.
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-00554: error encountered while parsing access parameters
KUP-01005: syntax error: found "minussign": expecting one of: "badfile, byteordermark, characterset, column, data, delimited, discardfile, dnfs_enable, dnfs_disable, disable_directory_link_check, field, fields, fixed, io_options, load, logfile, language, nodiscardfile, nobadfile, nologfile, date_cache, dnfs_readbuffers, preprocessor, readsize, string, skip, territory, variable, xmltag"
KUP-01007: at line 4 column 23
29913. 00000 - "error in executing %s callout"
The external table is created successfully. Here is the script that creates the external table:
CREATE TABLE TB_CNEI_01C
(
NEW_OMC_ID VARCHAR(2),
NEW_OMC_NM VARCHAR(8),
NEW_BSS_ID VARCHAR(6),
NEW_BSS_NM VARCHAR(20),
OMC_ID VARCHAR(2),
OMC_NM VARCHAR(8),
OLD_BSS_ID VARCHAR(6),
OLD_BSS_NM VARCHAR(20),
DEPTH_NO INTEGER,
NE_TP_NO INTEGER,
OP_YN INTEGER,
FAC_ALIAS_NM VARCHAR(20),
FAC_GRP_ALIAS_NM VARCHAR(20),
SPC_VAL VARCHAR(4),
INMS_FAC_LCLS_CD VARCHAR(2),
INMS_FAC_MCLS_CD VARCHAR(3),
INMS_FAC_SCLS_CD VARCHAR(3),
INMS_FAC_SCLS_DTL_CD VARCHAR(2),
LDEPT_ID VARCHAR(3),
FAC_ID VARCHAR(15),
MME_IP_ADDR VARCHAR(20),
MDEPT_ID VARCHAR(4),
HW_TP_NM VARCHAR(20),
MME_POOL_NM VARCHAR(20),
BORD_CNT INTEGER,
FAC_DTL_CLSFN_NM VARCHAR(50),
INSTL_FLOOR_NM VARCHAR(20),
INSTL_LOC_NM VARCHAR(30)
)
ORGANIZATION EXTERNAL
(
TYPE oracle_loader
DEFAULT DIRECTORY EXTERNAL_DATA
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE
badfile EXTERNAL_DATA:'testTable.bad'
logfile EXTERNAL_DATA:'testTable.log'
CHARACTERSET x-IBM949
FIELDS TERMINATED BY ','
MISSING FIELD VALUES ARE NULL
(
NEW_OMC_ID VARCHAR(2),
NEW_OMC_NM VARCHAR(8),
NEW_BSS_ID VARCHAR(6),
NEW_BSS_NM VARCHAR(20),
OMC_ID VARCHAR(2),
OMC_NM VARCHAR(8),
OLD_BSS_ID VARCHAR(6),
OLD_BSS_NM VARCHAR(20),
DEPTH_NO INTEGER,
NE_TP_NO INTEGER,
OP_YN INTEGER,
FAC_ALIAS_NM VARCHAR(20),
FAC_GRP_ALIAS_NM VARCHAR(20),
SPC_VAL VARCHAR(4),
INMS_FAC_LCLS_CD VARCHAR(2),
INMS_FAC_MCLS_CD VARCHAR(3),
INMS_FAC_SCLS_CD VARCHAR(3),
INMS_FAC_SCLS_DTL_CD VARCHAR(2),
LDEPT_ID VARCHAR(3),
FAC_ID VARCHAR(15),
MME_IP_ADDR VARCHAR(20),
MDEPT_ID VARCHAR(4),
HW_TP_NM VARCHAR(20),
MME_POOL_NM VARCHAR(20),
BORD_CNT INTEGER,
FAC_DTL_CLSFN_NM VARCHAR(50),
INSTL_FLOOR_NM VARCHAR(20),
INSTL_LOC_NM VARCHAR(30)
)
)
LOCATION ('TB_CNEI_01C.csv')
);
I have checked all permissions for the data directory and data files.
I had a few commented lines in my CREATE TABLE script. I removed those commented lines and the error disappeared.
I received the above suggestion from: http://www.orafaq.com/forum/t/182288/
It seems your CHARACTERSET (x-IBM949), containing a - character, is not valid.
You may try one of the alternatives without that sign, such as
AL32UTF8, US7ASCII, WE8MSWIN1252, etc.
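For example, the access parameters block might become the following. KO16MSWIN949 is my assumption for the usual Oracle name of the Korean CP949 encoding behind x-IBM949 - verify it against your data. No comments are placed inside the block, since (per the answer above) commented lines in this script can themselves trigger KUP errors:

ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE
badfile EXTERNAL_DATA:'testTable.bad'
logfile EXTERNAL_DATA:'testTable.log'
CHARACTERSET KO16MSWIN949
FIELDS TERMINATED BY ','
MISSING FIELD VALUES ARE NULL
)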

How can I reseed a table for Merge replication when it has reached its max identity value? The software is hard-coded to int, so I cannot change the datatype

I have a client using SQL Server 2008 R2, and they have Merge replication in place. They have hit the max identity range on a few of their tables because the subscriptions were removed and re-added multiple times. The table on the publisher contains only 175,000 rows and replicates to 7 other sites, so they should be nowhere close to hitting the max.
How can I reseed the table and keep all the data intact? I tried copying the table, dropping the original, and renaming the copy, but the identity range values stay the same. I cannot change the data type because the use of int as the datatype is hard-coded into our software.
Any help would be appreciated.
The table looks like this; RowNum is the identity column:
rowguid uniqueidentifier,
contactid float,
RowNum int,
type int,
createdby varchar(30),
assignedto varchar(30),
createddate int,
modifieddate int,
startdate int,
duedate int,
completedate int,
duration int,
tickler int,
priority smallint,
therule int,
status int,
private tinyint,
flags int,
subject int,
notes text
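For reference, the standard T-SQL command for resetting an identity seed is DBCC CHECKIDENT. A minimal sketch follows; the table name and seed value are placeholders, and under merge replication the publisher and subscriber identity ranges would still need to be reallocated, for which sp_adjustpublisheridentityrange exists:

-- Sketch only, not a replication-aware fix: make the next insert use RowNum = 175001.
DBCC CHECKIDENT (N'dbo.MyTable', RESEED, 175000);
-- Under merge replication, ask the publisher to allocate a fresh identity range
-- for the article (run at the publisher; parameters are placeholders):
EXEC sys.sp_adjustpublisheridentityrange @table_name = N'MyTable';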