String interpolation in HiveQL

I'm trying to create an external table that routes to an S3 bucket. I want to group all the tables by date, so the path will be something like s3://<bucket-name>/<date>/<table-name>. My current function for creating it looks something like:
concat('s3://<bucket-name>/', date_format(current_date,'yyyy-MM-dd'), '/<table-name>/');
I can run this in a SELECT query fine; however, when I try to put it in my table creation statement, I get the following error:
set s3-path = concat('s3://<bucket-name>/', date_format(current_date,'yyyy-MM-dd'), '/<table-name>/');
CREATE EXTERNAL TABLE users
(id STRING,
name STRING)
STORED AS PARQUET
LOCATION ${hiveconf:s3-path};
> FAILED: ParseException line 7:9 mismatched input 'concat' expecting StringLiteral near 'LOCATION' in table location specification
Is there any way to do string interpolation or a function invocation in Hive in this context?

You can try something like the approach in "retrieve udf results in Hive". Essentially, as a workaround, you can put the DDL in a script and call it from the terminal, passing the computed path in as a Hive config variable. (The set command stores the value verbatim rather than evaluating concat, and LOCATION expects a plain string literal, hence the ParseException.)
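For illustration, here's a minimal sketch of that workaround (the script name create_users.hql and the variable name s3path are my own placeholders, as are the <bucket-name>/<table-name> stand-ins from the question). The date is computed by the shell, and Hive substitutes the hiveconf variable textually before parsing, so it can live inside the quoted string literal that LOCATION requires:
# from the terminal
hive --hiveconf s3path="s3://<bucket-name>/$(date +%Y-%m-%d)/<table-name>/" -f create_users.hql
-- create_users.hql
CREATE EXTERNAL TABLE users (
id STRING,
name STRING)
STORED AS PARQUET
LOCATION '${hiveconf:s3path}';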

Write to a dynamic BigQuery table through Apache Beam

I am getting the BigQuery table name at runtime and I pass that name to the BigQueryIO.write operation at the end of my pipeline to write to that table.
The code that I've written for it is:
rows.apply("write to BigQuery", BigQueryIO
.writeTableRows()
.withSchema(schema)
.to("projectID:DatasetID."+tablename)
.withWriteDisposition(WriteDisposition.WRITE_TRUNCATE)
.withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED));
With this syntax I always get an error:
Exception in thread "main" java.lang.IllegalArgumentException: Table reference is not in [project_id]:[dataset_id].[table_id] format
How do I pass the table name in the correct format when I don't know beforehand which table it should put the data in? Any suggestions?
Thank You
Very late to the party on this, however: I suspect the issue is that you were passing in a string, not a table reference.
If you create a TableReference, I suspect you'll have no issues with the above code.
com.google.api.services.bigquery.model.TableReference table = new TableReference()
.setProjectId(projectID)
.setDatasetId(DatasetID)
.setTableId(tablename);
rows.apply("write to BigQuery", BigQueryIO
.writeTableRows()
.withSchema(schema)
.to(table)
.withWriteDisposition(WriteDisposition.WRITE_TRUNCATE)
.withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED));
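For what it's worth, the plain-string form can also work; that IllegalArgumentException is thrown when the concatenated value doesn't parse as [project_id]:[dataset_id].[table_id]. So if you stay with "projectID:DatasetID."+tablename, it's worth logging the final string at runtime, since an empty or malformed tablename would produce exactly this error.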

SnappyData - Error creating Kafka streaming table

I'm seeing an issue when creating a Spark streaming table using Kafka from the snappy shell.
The exception: Invalid input 'C', expected dmlOperation, insert, withIdentifier, select or put (line 1, column 1)
Reference: http://snappydatainc.github.io/snappydata/streamingWithSQL/#spark-streaming-overview
Here is my sql:
CREATE STREAM TABLE if not exists sensor_data_stream
(sensor_id string, metric string)
using kafka_stream
options (
storagelevel 'MEMORY_AND_DISK_SER_2',
rowConverter 'io.snappydata.app.streaming.KafkaStreamToRowsConverter',
zkQuorum 'localhost:2181',
groupId 'streamConsumer',
topics 'test:01');
The shell seems not to like the script, starting at the first character, 'C'. I'm attempting to execute the script using the following command:
snappy> run '/scripts/my_test_sensor_script.sql';
Any help appreciated!
There is some inconsistency between the documentation and the actual syntax. The correct syntax is:
CREATE STREAM TABLE sensor_data_stream if not exists (sensor_id string,
metric string) using kafka_stream
options (storagelevel 'MEMORY_AND_DISK_SER_2',
rowConverter 'io.snappydata.app.streaming.KafkaStreamToRowsConverter',
zkQuorum 'localhost:2181',
groupId 'streamConsumer', topics 'test:01');
One more thing you need to do is write a row converter for your data.
Mike, you need to create your own rowConverter class by implementing the following trait:
trait StreamToRowsConverter extends Serializable {
def toRows(message: Any): Seq[Row]
}
and then specify that rowConverter's fully qualified class name in the DDL.
The rowConverter is specific to a schema.
'io.snappydata.app.streaming.KafkaStreamToRowsConverter' is just a placeholder class name, which should be replaced by your own rowConverter class.
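For illustration, a minimal sketch of such a converter for the (sensor_id, metric) schema above might look like the following; the comma-separated message format is an assumption, so adjust the parsing to whatever your producer actually sends:
// Hedged sketch: implements the StreamToRowsConverter trait that ships with
// SnappyData (pull in its actual package from the SnappyData jars).
import org.apache.spark.sql.Row

class KafkaStreamToRowsConverter extends StreamToRowsConverter {
  override def toRows(message: Any): Seq[Row] = {
    // Assumes each Kafka message is a string like "sensor-1,temperature"
    val fields = message.toString.split(",", 2)
    Seq(Row(fields(0).trim, fields(1).trim))
  }
}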

Tokenize Function in Hive

I am trying to follow this example, where term frequency and inverse document frequency are calculated in Hive: https://github.com/myui/hivemall/wiki/TFIDF-calculation
I have a table called pigoutputhive with, among other fields, owneruserid, body, and score.
The 'body' column contains a string of words [a-z A-Z & 0-9 only] separated by spaces.
I would like to tokenize the body so that I can generate a relation with an owneruserid and body tuple in order to perform the TF-IDF algorithm.
I am receiving an error relating to the tokenize function; can anyone tell me where I am going wrong?
My error is as follows: Error while compiling statement: FAILED: SemanticException [Error 10011]: Line 8:37 Invalid function 'tokenize' [ERROR_STATUS]
create or replace view pigoutputhive_exploded
as
select
owneruserid,
body,
score
from
pigoutputhive LATERAL VIEW explode(tokenize(body,true)) t as word
where
not is_stopword(word);
The tokenize function is a Hivemall extension to Hive, so you need to install Hivemall first.
See the following page for loading Hivemall functions into Hive.
https://github.com/myui/hivemall/wiki/Installation
tokenize does not work in plain Hive; I had to use the built-in sentences() function instead.
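For illustration, here's a rough sketch of the same view using the built-in sentences() function. Note that sentences() returns an array of sentences, each itself an array of words, hence the double explode; is_stopword() is also Hivemall-only, so this version omits the stop-word filter:
create or replace view pigoutputhive_exploded as
select
owneruserid,
word
from
pigoutputhive
LATERAL VIEW explode(sentences(lower(body))) s as sentence
LATERAL VIEW explode(sentence) w as word;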

How do I use stored geometries in the Spatial functions?

I've been trying to evaluate the use of OrientDB for our spatial data.
I'm using the following versions:
OrientDB: orientdb-community-2.2.0-20160217.214325-39
OrientDB-Spatial: JAR built from the develop branch of the GitHub repo
OS: Win7 64-bit
Right now, what I want to do is this: if I have polygons stored in the DB, and the input is a location (latitude & longitude), then I need to get the polygon which contains that location.
I created a Class to store the State polygons like this:
CREATE class state
CREATE PROPERTY state.name STRING
CREATE PROPERTY state.shape EMBEDDED OPolygon
I inserted a state with the following command:
INSERT INTO state SET name = 'Center', shape = ST_GeomFromText('POLYGON((77.16796875 26.068502530912397,75.7177734375 21.076171072527064,81.650390625 19.012137871930328,82.9248046875 25.196864372861896,77.16796875 26.068502530912397))')
I've tried several ways of getting the state which contains the given lat/long, but all of them give an error.
Even something as simple as:
SELECT from state WHERE ST_Contains(shape, ST_GeomFromText('POINT(77.420654296875 23.23929558106523)'))
Gives the following error:
com.orientechnologies.orient.core.sql.OCommandSQLParsingException:
Error on parsing command at position #0: Error parsing query: SELECT
from state WHERE ST_Contains(shape,
ST_GeomFromText('POINT(77.420654296875 23.23929558106523)'))
Encountered "" at line 1, column 25. Was expecting one of:
Storage URL="plocal:E:/DevTools/OrientDB2.2_new/databases/spatial"
I can run all the spatial functions when I enter the geometries directly in the spatial function, such as:
Select ST_Contains(ST_geomFromText('POLYGON((77.16796875 26.068502530912397,75.7177734375 21.076171072527064,81.650390625 19.012137871930328,82.9248046875 25.196864372861896,77.16796875 26.068502530912397))'), ST_GeomFromText('POINT(77.420654296875 23.23929558106523)'))
I just can't figure out how to get these functions to run on shapes which are stored as properties on records.
How are stored geometries meant to be used in these spatial functions? Is there some other syntax for doing so?
Try this:
SELECT from state WHERE ST_Contains(shape, ST_GeomFromText('POINT(77.420654296875 23.23929558106523)')) = true
The syntax WHERE function() (without an explicit comparison) is not supported yet.

Why can't my AS400 select from a newly created member alias?

I have set up the code as described in this question.
Creating an alias works, as well as dropping it.
For members that I have created myself, this is working correctly, but for existing members I get the following error when selecting from the alias:
SQL State: 42704
Vendor Code: -204
Message: [SQL0204] MyMemberName in MyLib type *FILE not found.
Cause . . . . . : MyMemberName in
TPLWHS type *FILE was not found. If the member name is *ALL, the table
is not partitioned. If this is an ALTER TABLE statement and the type
is *N, a constraint or partition was not found. If this is not an
ALTER TABLE statement and the type is *N, a function, procedure,
trigger or sequence object was not found. If a function was not found,
MyMemberName is the service program that contains the function. The
function will not be found unless the external name and usage name
match exactly. Examine the job log for a message that gives more
details on which function name is being searched for and the name that
did not match.
Recovery . . . : Change the name and try the request
again. If the object is a node group, ensure that the DB2 Multisystem
product is installed on your system and create a nodegroup with the
CRTNODGRP CL command. If an external function was not found, be sure
that the case of the EXTERNAL NAME on the CREATE FUNCTION statement
exactly matches the case of the name exported by the service program.
Any help you can offer is much appreciated. Thanks!
EDIT: Here is my code:
create alias MyLib.MyAlias for MyLib.MyLogicalFile(MyMember);
select * from MyLib.MyAlias;
drop alias MyLib.MyAlias;
The format of Lib.Alias has worked for me when I directly created the physical and logical members. Perhaps the logical file is missing? I'll double check...
This error message can indicate that the file/logical file/member does not exist.
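For what it's worth, one quick way to check is to list the members the system actually sees. Here's a hedged sketch against the IBM i catalog (library and file names are placeholders, and QSYS2.SYSPARTITIONSTAT, which reports one row per member of a file, is my assumption about your release):
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_PARTITION
FROM QSYS2.SYSPARTITIONSTAT
WHERE TABLE_SCHEMA = 'MYLIB'
AND TABLE_NAME = 'MYLOGICALFILE';
Alternatively, DSPFD FILE(MYLIB/MYLOGICALFILE) TYPE(*MBRLIST) from a command line shows the member list; the member name there has to match what the alias references exactly.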