HIVE: cannot recognize input near 'distinct' '(' - hive

I am trying to execute the below query in Hive:
SELECT
regexp_replace('2016-08-05_11:29:46', '\\_', ' ') as tmstmp,
distinct(P.name)
FROM table P;
It throws an exception saying: cannot recognize input near 'distinct' '(' 'P' in selection target.
Whereas when I run the query with the columns interchanged, like:
SELECT
distinct(P.name),
regexp_replace('2016-08-05_11:29:46', '\\_', ' ') as tmstmp
FROM table P;
It works fine. Any idea what the issue is?

To my knowledge, this is a restriction imposed by Hive on the SELECT syntax.
As per the SELECT syntax in the Hive language manual, DISTINCT must come first, followed by the other expressions.
Reference:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select
I guess the reason is that DISTINCT is a row-level operation (even if it is written like a function call on a column), and in Hive specifically it runs as a MapReduce operation.
Similar behavior can be observed in other database engines that follow the ANSI SQL standard, such as MySQL.
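To illustrate the row-level point, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption made only so the example runs on its own; the table `people` and its rows are made up, and `replace` stands in for `regexp_replace`). With DISTINCT directly after SELECT, the whole output row is de-duplicated, constant timestamp column included.

```python
import sqlite3


def distinct_rows():
    # In-memory database with a hypothetical table; not the asker's data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT)")
    conn.executemany(
        "INSERT INTO people VALUES (?)", [("ann",), ("bob",), ("ann",)]
    )
    # DISTINCT comes first and applies to the full (tmstmp, name) row.
    rows = conn.execute(
        "SELECT DISTINCT replace('2016-08-05_11:29:46', '_', ' ') AS tmstmp, P.name "
        "FROM people P ORDER BY P.name"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in distinct_rows():
        print(row)
```

The duplicate 'ann' row collapses into one, which is the same result the asker gets from the working column order.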

Related

Using Regex to determine what kind of SQL statement a row is from a list?

I have a large list of SQL commands such as
SELECT * FROM TEST_TABLE
INSERT .....
UPDATE .....
SELECT * FROM ....
etc. My goal is to parse this list into a set of results so that I can easily determine a good count of how many of these statements are SELECT statements, how many are UPDATES, etc.
so I would be looking at a result set such as
SELECT 2
INSERT 1
UPDATE 1
...
I figured I could do this with regex, but I'm a bit lost other than simply looking at every string and comparing against 'SELECT' as a prefix, which can run into multiple issues. Is there any other way to do this using regex?
You can add the SQL statements to a table and run them through a SQL query. If the SQL text is in a column called SQL_TEXT, you can get the SQL command type using this:
upper(regexp_substr(trim(regexp_replace(SQL_TEXT, '\\s', ' ')),
'^([\\w\\-]+)')) as COMMAND_TYPE
You'll need to do some cleanup to create a column that indicates the type of statement you have. The rest is just basic aggregation:
with cte as
(select *, trim(lower(split_part(regexp_replace(col, '\\s', ' '),' ',1))) as statement
from t)
select statement, count(*) as freq
from cte
group by statement;
SQL is a language and needs a parser to turn it from text into a structure. Regular expressions can only do part of the work (such as lexing).
Regular Expression Vs. String Parsing
You will have to limit your ambition if you want to restrict yourself to using regular expressions.
Still, you can get some distance if you want. A quick search found this random example of tokenizing MySQL statements using regex: https://swanhart.livejournal.com/130191.html
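As a rough sketch of how far the regex-only approach gets, the following Python snippet takes the first word of each statement and counts it. The function name `count_statement_types` and the sample statements are made up for illustration, and this is a prefix check, not a parser.

```python
import re
from collections import Counter

# First run of letters after optional leading whitespace.
_LEADING_KEYWORD = re.compile(r"^\s*([A-Za-z]+)")


def count_statement_types(statements):
    """Count statements by their first keyword, case-insensitively."""
    counts = Counter()
    for sql in statements:
        match = _LEADING_KEYWORD.match(sql)
        # Anything that does not start with a word is counted as UNKNOWN.
        keyword = match.group(1).upper() if match else "UNKNOWN"
        counts[keyword] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "SELECT * FROM TEST_TABLE",
        "INSERT INTO t VALUES (1)",
        "update t set a = 1",
        "  select * from other_table",
    ]
    for keyword, freq in count_statement_types(sample).items():
        print(keyword, freq)
```

This shares the limits mentioned above: a statement that starts with a comment, a parenthesis, or a WITH clause is not classified by what it actually does, which is where a real parser becomes necessary.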

MariaDB to calculate table reference in select query

I have a quite dumb client application that wants to get information from MariaDB based on a parameter.
This parameter is a string that contains spaces, like 'valid parameter'.
In MariaDB there is a table for each of the possible string values, and the table name is the string value after spaces have been replaced by underscores and a prefix is added. So I can perform the necessary conversion like this:
SELECT CONCAT('prefix_', REPLACE('valid parameter',' ','_'));
Now the result 'prefix_valid_parameter' is the table to query, so actually I need to fire off
SELECT * from CONCAT('prefix_', REPLACE('valid parameter',' ','_'));
but MariaDB responds with
You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '('prefix_', REPLACE('valid parameter',' ','_'))' at line 1
I would have expected either the content of table 'prefix_valid_parameter' or an error stating "table 'prefix_valid_parameter' not found". How can I make the table_reference part of my SQL SELECT statement dynamic?
You need to use dynamic SQL:
set @sql = 'select * from [table]';
execute immediate replace(@sql, '[table]',
concat('prefix_', replace('valid parameter', ' ', '_'))
);
I should add that the need to do this perhaps suggests a flaw in your data model. If the tables have the same structure, it is better to put all the rows in a single table.
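The same pattern (build the statement text first, then execute it) can be sketched outside the database as well. The example below is only an illustration under assumptions: it uses Python's built-in sqlite3 instead of MariaDB, and the helpers `table_for` and `select_all` are hypothetical names. Because a table name cannot be passed as a bound parameter, the sketch validates the computed name before splicing it into the SQL text.

```python
import re
import sqlite3

# Conservative identifier check: letters, digits and underscores only.
_SAFE_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def table_for(parameter, prefix="prefix_"):
    """Derive the table name the same way the CONCAT/REPLACE expression does."""
    name = prefix + parameter.replace(" ", "_")
    if not _SAFE_IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe table name: {name!r}")
    return name


def select_all(conn, parameter):
    """Build the SELECT text dynamically and run it."""
    table = table_for(parameter)
    return conn.execute(f"SELECT * FROM {table}").fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prefix_valid_parameter (id INTEGER)")
    conn.execute("INSERT INTO prefix_valid_parameter VALUES (1)")
    print(select_all(conn, "valid parameter"))
```

The validation step matters with any dynamic SQL, including the EXECUTE IMMEDIATE form, because the parameter ends up as part of the statement itself.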

"Syntax error at or near ' , '" while trying to SELECT INTO

The query for selecting multiple values and assigning to multiple variables in a single SELECT query leads to an error. My Postgres version is 9.5.
The query is:
SELECT INTO region_id ,doc_type,tax_amt fk_bint_supplier_tax_region_id,chr_supporting_document_type,
dbl_base_currency_client_net-dbl_base_currency_market_fare-dbl_base_currency_cc_charge_collected+
dbl_base_currency_vat_in+dbl_base_currency_cc_charge_collected+(19*(dbl_base_currency_tax))*5/10
FROM tbl_sales_details WHERE chr_document_status='N' AND vchr_document_no='INV/47922/01/18'
AND vchr_supporting_document_no='5111143004'
The error is:
ERROR: syntax error at or near ","
LINE 1: SELECT INTO region_id ,doc_type,tax_amt fk_bint_supplier_ta...
^
********** Error **********
ERROR: syntax error at or near ","
SQL state: 42601
SELECT INTO in PL/pgSQL has a different meaning from
SELECT INTO in SQL. The latter is generally discouraged. The manual:
CREATE TABLE AS is functionally similar to SELECT INTO. CREATE TABLE AS
is the recommended syntax, since this form of SELECT INTO is not
available in ECPG or PL/pgSQL, because they interpret the INTO clause
differently. Furthermore, CREATE TABLE AS offers a superset of the
functionality provided by SELECT INTO.
The error message indicates you tried to run the statement as plain SQL.
There's nothing wrong with your placement of the INTO clause when used in PL/pgSQL like you tagged. You also stated that it's for:
assigning to multiple variables
That, too, only makes sense inside procedural language code as there are no variable assignments in plain SQL.
Related:
SELECT INTO with more than one attribution
You put the into after the column list:
SELECT region_id, doc_type,tax_amt fk_bint_supplier_tax_region_id, chr_supporting_document_type,
(dbl_base_currency_client_net - dbl_base_currency_market_fare -
dbl_base_currency_cc_charge_collected +
dbl_base_currency_vat_in + dbl_base_currency_cc_charge_collected + 19 * dbl_base_currency_tax
) * 5/10
INTO . . .
FROM tbl_sales_details
WHERE chr_document_status = 'N' AND
vchr_document_no = 'INV/47922/01/18' AND
vchr_supporting_document_no = '5111143004';
I don't know what the variable names are, but they go after the INTO, and there must be one for each expression in the SELECT.

Error:cannot recognize input near 'minus' 'SELECT' '*' in table source while running in spark-sql

Can anyone tell me how to resolve the below issue in spark-sql?
SELECT * FROM encrypted_im.abc
MINUS
SELECT * FROM encrypted_im.abc;
Error:cannot recognize input near 'minus' 'SELECT' '*' in table source
SELECT store_num,store_nm FROM encrypted_im.base_abc
minus
SELECT store_num,store_nm FROM encrypted_im.base_abc;
Error:cannot recognize input near 'minus' 'SELECT' 'store_num' in table source;
Could you try EXCEPT instead of MINUS as following:
SELECT * FROM encrypted_im.abc
EXCEPT
SELECT * FROM encrypted_im.abc;
I think that MINUS doesn't exist in Spark SQL. Spark SQL generally follows Hive style, so you can refer to the Hive syntax; the Spark SQL documentation lists which Hive features are supported and which are not.
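To show what EXCEPT returns when the two inputs actually differ, here is a small sketch using Python's built-in sqlite3 as a stand-in for Spark SQL (an assumption made so the example runs on its own). The table `other_abc` and all rows are made up; only the column names come from the question.

```python
import sqlite3


def rows_only_in_first():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE base_abc (store_num INTEGER, store_nm TEXT);
        CREATE TABLE other_abc (store_num INTEGER, store_nm TEXT);
        INSERT INTO base_abc VALUES (1, 'north'), (2, 'south'), (3, 'east');
        INSERT INTO other_abc VALUES (2, 'south');
        """
    )
    # EXCEPT keeps the rows of the first query that are absent from the second.
    rows = conn.execute(
        "SELECT store_num, store_nm FROM base_abc "
        "EXCEPT "
        "SELECT store_num, store_nm FROM other_abc "
        "ORDER BY store_num"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in rows_only_in_first():
        print(row)
```

Running EXCEPT on a table against itself, as in the question, yields an empty result, so a second table is used here to make the effect visible.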

Hive syntax -- Comparison in Case when

I am using Hive to do a comparison in a CASE WHEN THEN statement. Can you please check whether my syntax is correct?
${hiveconf:Test Metric} METRIC_ID,
CASE
WHEN ((A.X,A.Y,A.Z)IN (SELECT X,Y,Z FROM HIVE_TPCE_TEMP.TESTTABLE))
THEN CASE
WHEN MODE IN ('A','N')
THEN ${
hiveconf:SOME_CONSTANT ELSE ${hiveconf: SOME_CONSTANT
}
END
I'm guessing your snippet of code is from the SELECT clause of your query? According to the Hive Language Manual: "Hive supports subqueries only in the FROM clause".
Your CASE WHEN statement includes a subquery. Seems like that is not supported, so your syntax is not correct (in Hive).