Hive simple Regular expression

Hive simple Regular expression - sql

I am trying to check if all data within in a column is having a valid date.
create table dates (tm string, dt string) row format delimited fields terminated by '\t'
date.txt(sample data)
20181205 15
20171023 23
20170516 16
load data local inpath 'dates.txt' overwrite into table dates;
create temporary macro isitDate(s string)
case when regexp_extract(s,'((0[1-9]|[12][0-9]|3[01])',0) = ''
then false
else true
end;
select * from dates where isitDate(dt);
But select statement is giving below error-
Failed with exception
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
Unable to execute method public java.lang.String
org.apache.hadoop.hive.ql.udf.UDFRegExpExtract.evaluate(java.lang.String,java.lang.String,java.lang.Integer)
on object org.apache.hadoop.hive.ql.udf.UDFRegExpExtract#66b45e1e of
class org.apache.hadoop.hive.ql.udf.UDFRegExpExtract with arguments
{15:java.lang.String, ((0[1-9]|[12][0-9]|3[01]):java.lang.String,
0:java.lang.Integer} of size 3
Is there something wrong with my regular expression.

made a stupid mistake, there is one extra opening bracket in macro

Related

How to resolve this sql error of schema_of_json

I need to find out the schema of a given JSON file, I see sql has schema_of_json function
and something like this works flawlessly
> SELECT schema_of_json('[{"col":0}]');
ARRAY<STRUCT<`col`: BIGINT>>
But if I query for my table name, it gives me the following error
>SELECT schema_of_json(Transaction) as json_data from table_name;
Error in SQL statement: AnalysisException: cannot resolve 'schemaofjson(`Transaction`)' due to data type mismatch: The input json should be a string literal and not null; however, got `Transaction`.; line 1 pos 7;
The Transaction is one of the columns in my table and after checking it manually I can attest that it is of String type(json).
The SQL statement has it to give me the schema of the JSON, how to do it?

after looking further into the documentation that it is clear that the word foldable means that of the static one, and a column from a table JSON won't work
for minimal reroducible example here you go:
SELECT schema_of_json(CAST('{ "a": "b" }' AS STRING))
As soon as the cast is introduced in the above statement, the schema_of_json will fail......... It needs a static JSON as it's input

Error message when try to run a function in function

I create two functions that worked well as a stand alone functions.
But when I try to run one of them inside the second one I got an error:
'SQL compilation error: Unsupported subquery type cannot be evaluated'
Can you advise?
Adding code bellow:
CASE
WHEN (regexp_substr(ARGUMENTS_JSON,'drives_removed":\\[]') = 'drives_removed":[]') THEN **priceperitem(1998, returnitem_add_item(ARGUMENTS_JSON))**
WHEN (regexp_substr(ARGUMENTS_JSON,'drives_added":\\[\\]') = 'drives_added":[]') THEN 1
WHEN (regexp_substr(ARGUMENTS_JSON,'from_flavor') = 'from_flavor') THEN 1
WHEN (regexp_substr(ARGUMENTS_JSON,'{}') = '{}') THEN 1
ELSE 'Other'
END as Price_List,
The problem happened when with function 'priceperitem'
If I replace returnitem_add_item with a string then it will work fine.
Function 1 : priceperitem get customer number and a item return pricing per the item from customer pricing list
Function 2 : returnitem_add_item parsing string and return a string

As an alternative, you may try processing udf within a udf in the following way :
create function x()
returns integer
as
$$
select 1+2
$$
;
set q=x();
create function pi_udf(q integer)
returns integer
as
$$
select q*3
$$
;
select pi_udf($q);
If this does not work out as well, the issue might be data specific. Try executing the function with one record, and then add more records to see the difference in the behavior. There are certain limitations on using SQL UDF's in Snowflake and hence the 'Unsupported Subquery' Error.

apache hive loads null values instead of intergers

I am new to apache hive and was running queries on sample data which is saved in a csv file as below:
0195153448;"Classical Mythology";"Mark P. O. Morford";"2002";"Oxford University Press";"//images.amazon.com/images/P/0195153448.01.THUMBZZZ.jpg";"http://images.amazon.com/images/P/0195153448.01.MZZZZZZZ.jpg";"images.amazon.com/images/P/0195153448.01.LZZZZZZZ.jpg"
and the table which i created is of form
hive> describe book;
OK
isbn bigint
title string
author string
year string
publ string
img1 string
img2 string
img3 string
Time taken: 0.085 seconds, Fetched: 8 row(s)
and the script which I used to create the table is:
create table book(isbn int,title string,author string, year string,publ string,img1 string,img2 string,img3 string) row format delimited fields terminated by '\;' lines terminated by '\n' location 'path';
When I try to retrieve the data from the table by using the following query:
select *from book limit 1;
I get the following result:
NULL "Classical Mythology" "Mark P. O. Morford" "2002" "Oxford University Press" "http://images.amazon.com/images/P/0195153448.01.THUMBZZZ.jpg" "images.amazon.com/images/P/0195153448.01.MZZZZZZZ.jpg" "images.amazon.com/images/P/0195153448.01.LZZZZZZZ.jpg"
Even though I specify the first column type as int or bigint the data into the table is getting loaded as NULL.
I tried searching on the internet and could figure out that I have to specify the row delimiter. I used that too but no change in the data from the table.
Is there anything that I am making a mistake... Please help.

SparkSQL errors when using SQL DATE function

In Spark I am trying to execute SQL queries on a temporary table derived from a data frame that I manually built by reading a csv file and converting the columns into the right data type.
Specifically, the table I'm talking about is the LINEITEM table from [TPC-H specification][1]. Unlike stated in the specification I am using TIMESTAMP rather than DATE because I've read that Spark does not support the DATE type.
In my single scala source file, after creating the data frame and registering a temporary table called "lineitem", I am trying to execute the following query:
val res = sqlContext.sql("SELECT * FROM lineitem l WHERE date(l.shipdate) <= date('1998-12-01 00:00:00');")
When I submit the packaged jar using spark-submit, I get the following error:
Exception in thread "main" java.lang.RuntimeException: [1.75] failure: ``union'' expected but but `;' found
When I omit the semicolon and do the same thing, I get the following error:
Exception in thread "main" java.util.NoSuchElementException: key not found: date
Spark version is 1.4.0.
Does anyone have an idea what's the problem with these queries?
[1] http://www.tpc.org/TPC_Documents_Current_Versions/pdf/tpch2.17.1.pdf

SQL queries passed to SQLContext.sql shouldn't be delimited using semicolon - this the source of your first problem
DATE UDF expects date in the YYYY-MM-DD form and DATE('1998-12-01 00:00:00') evaluates to null. As long as timestamp can be casted to DATE correct query string looks like this:
"SELECT * FROM lineitem l WHERE date(l.shipdate) <= date('1998-12-01')"
DATE is a Hive UDF. It means you have to use HiveContext not a standard SQLContext - this is the source of your second problem.
import org.apache.spark.sql.hive.HiveContext
val sqlContext = new HiveContext(sc) // where sc is a SparkContext
In Spark >= 1.5 it is also possible to use to_date function:
import org.apache.spark.sql.functions.{lit, to_date}
df.where(to_date($"shipdate") <= to_date(lit("1998-12-01")))

Please try hive function CAST (expression AS toDatatype)
It changes an expression from one datatype to other
e.g. CAST ('2016-06-17 00.00.000' AS DATE) will convert String to Date
In your case
val res = sqlContext.sql("SELECT * FROM lineitem l WHERE CAST(l.shipdate as DATE) <= CAST('1998-12-01 00:00:00' AS DATE);")
Supported datatype conversions are as listed in Hive Casting Dates

Firebird " Column does not belong to referenced table "

I am trying to create my first procedure on firebird 2.5 by using ibexpert gui.
The procedure will return 'PROCESS_DATE' which belongs to a specific 'PROCESS_ID'. I prepared following code:
begin
OUTPUT_DATE = (select PROCESS_DATE from PROCESSES
where PROCESS_ID = INPUT_ID);
suspend;
end
input parameter : 'INPUT_ID' --> type 'INTEGER'
output parameter : 'OUTPUT_DATE' --> type 'DATE'
But when I tried to compile it returns this error:
Column does not belong to referenced table.
Dynamic SQL Error.
SQL error code = -206.
Column unknown.
INPUT_ID.
At line 9, column 48.
I do not know how to deal with this error.
I tried to find solutions on other questions also the internet but i couldn't find a basic, understandable answer for beginners. thanks for helps.

Try this:
CREATE PROCEDURE MyP (INPUT_ID INTEGER)
RETURNS (OUTPUT_DATE DATE)
AS
BEGIN
FOR
SELECT PROCESS_DATE FROM PROCESSES
WHERE PROCESS_ID = :INPUT_ID
INTO :OUTPUT_DATE
DO
SUSPEND;
END
Always prepend parameter names with ":". The only place where ":" is not allowed is at the left side of "=" operator.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas