Handling Null DataType - apache-pig

I'm using the Over function from Piggybank to get the Lag of a row
res= foreach (group table by fieldA) {
Aord = order table by fieldB;
generate flatten(Stitch(Aord, Over(Aord.fieldB, 'lag'))) as (fieldA,fieldB,lag_fieldB) ;}
This works correctly, and when I do a dump I get the expected result. The problem is that when I try to use lag_fieldB in any comparison or transformation, I get datatype errors.
If I do a describe, it returns fieldA: long,fieldB: chararray,lag_fieldB: NULL
I'm new to Pig, but I have already tried casting to chararray and using ToString(), and I keep getting errors like these:
ERROR 1052: Cannot cast bytearray to chararray
ERROR 1051: Cannot cast to bytearray
Thanks for your help

OK, after looking into the code of the Over function, I found that you can pass the return type to its constructor when you define it. What worked for me was:
DEFINE ChOver org.apache.pig.piggybank.evaluation.Over('chararray');
res= foreach (group table by fieldA) {
Aord = order table by fieldB;
generate flatten(Stitch(Aord, ChOver(Aord.fieldB, 'lag'))) as (fieldA,fieldB,lag_fieldB) ;}
Now the describe is telling me
fieldA: long,fieldB: chararray,lag_fieldB: chararray
And I'm able to use the columns as expected. I hope this saves someone else some time.
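For readers unfamiliar with what 'lag' yields per group, here is a minimal Java sketch of the semantics, assuming an offset of one row and a null value for the first row. The class and method names are hypothetical and this is not Piggybank's implementation; it only illustrates why the first row of each group has no lag value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of 'lag' semantics; not Piggybank's implementation.
class LagSketch {
    // For an already ordered column, returns the previous row's value for each row.
    // The first row has no predecessor, so its lag is null.
    static List<String> lag(List<String> ordered) {
        List<String> result = new ArrayList<>(ordered.size());
        for (int i = 0; i < ordered.size(); i++) {
            result.add(i == 0 ? null : ordered.get(i - 1));
        }
        return result;
    }
}
```

Calling LagSketch.lag on the ordered values of one group gives the column that Stitch would place next to the original rows.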

Related

How to resolve this sql error of schema_of_json

I need to find out the schema of a given JSON file. I see that SQL has a schema_of_json function,
and something like this works flawlessly:
> SELECT schema_of_json('[{"col":0}]');
ARRAY<STRUCT<`col`: BIGINT>>
But if I query for my table name, it gives me the following error
>SELECT schema_of_json(Transaction) as json_data from table_name;
Error in SQL statement: AnalysisException: cannot resolve 'schemaofjson(`Transaction`)' due to data type mismatch: The input json should be a string literal and not null; however, got `Transaction`.; line 1 pos 7;
Transaction is one of the columns in my table, and after checking it manually I can attest that it is of type String (containing JSON).
I want the SQL statement to give me the schema of the JSON in that column. How can I do that?
After looking further into the documentation, it is clear that the word "foldable" means a static (constant) expression, so a JSON column from a table won't work.
For a minimal reproducible example, here you go:
SELECT schema_of_json(CAST('{ "a": "b" }' AS STRING))
As soon as the cast is introduced in the above statement, schema_of_json fails. It needs a static JSON string as its input.
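One possible workaround, given that the function wants a constant, is to fetch a single sample value on the client side and inline it into the query text as a literal. The sketch below shows only the string-building step in Java. The helper name is hypothetical, and it assumes the SQL dialect treats backslash as the escape character inside single-quoted literals, so check that against your engine's settings before relying on it.

```java
// Hypothetical helper: builds a schema_of_json query from one sample JSON string.
class SchemaOfJsonQuery {
    // Assumption: backslash is the escape character inside single-quoted SQL
    // literals, so backslashes and single quotes in the sample must be escaped.
    static String forSample(String sampleJson) {
        String escaped = sampleJson
                .replace("\\", "\\\\")   // escape backslashes first
                .replace("'", "\\'");    // then escape single quotes
        return "SELECT schema_of_json('" + escaped + "')";
    }
}
```

The sample itself could come from something like SELECT Transaction FROM table_name LIMIT 1. The resulting schema only describes that one row, so this assumes the other rows share its structure.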

How to read each row in a groovy-sql statement?

I am trying to read a table that has five rows and five columns. I am using the sql.eachRow function to read each row and assign a value to a String, but I get the error "Groovy:[Static type checking] - No such property: MachineName for class: java.lang.Object"
My code:
sql.eachRow('select * from [MACHINES] WHERE UpdateTime > :lastTimeRead', [lastTimeRead: Long.parseLong(lastTimeRead)]) { row ->
def read = row.MachineName
}
But MachineName is my column name. How can I overcome this error?
Using dynamic Properties with static type checking is not possible*.
However, eachRow will pass a GroovyResultSet as first parameter to the Closure. This means that row has the type GroovyResultSet and so you can access the value using getAt
row.getAt('MachineName')
should work. In groovy you can also use the []-operator:
row['MachineName']
which is equivalent to the first solution.
*) without a type checking extension.
If you know the column name, you can just use the following:
"$row.MachineName"
But if you don't know the column name, or you run into some issue, the values can still be accessed by index, in the order of the selected columns:
sql.eachRow('select * from [MACHINES] WHERE UpdateTime > :lastTimeRead', [lastTimeRead: Long.parseLong(lastTimeRead)]) { row ->
log.info "First value = ${row[0]}, next value = ${row[1]}"
}

Setting a Clob value in a native query

Oracle DB.
Spring JPA using Hibernate.
I am having difficulty inserting a Clob value into a native sql query.
The code calling the query is as follows:
@SuppressWarnings("unchecked")
public List<Object[]> findQueryColumnsByNativeQuery(String queryString, Map<String, Object> namedParameters)
{
List<Object[]> result = null;
final Query query = em.createNativeQuery(queryString);
if (namedParameters != null)
{
Set<String> keys = namedParameters.keySet();
for (String key : keys)
{
final Object value = namedParameters.get(key);
query.setParameter(key, value);
}
}
query.setHint(QueryHints.HINT_READONLY, Boolean.TRUE);
result = query.getResultList();
return result;
}
The query string is of the format
SELECT COUNT ( DISTINCT ( <column> ) ) FROM <Table> c where (exact ( <column> , (:clobValue), null ) = 1 )
where "(exact ( <column> , (:clobValue), null ) = 1 )" is a function call and "clobValue" is a Clob.
I can adjust the query to work as follows:
SELECT COUNT ( DISTINCT ( <column> ) ) FROM <Table> c where (exact ( <column> , to_clob((:stringValue)), null ) = 1 )
where "stringValue" is a String but obviously this only works up to the max sql string size (4000) and I need to pass in much more than that.
I have tried to pass the Clob value as a java.sql.Clob using the method
final Clob clobValue = org.hibernate.engine.jdbc.ClobProxy.generateProxy(stringValue);
This results in a java.io.NotSerializableException: org.hibernate.engine.jdbc.ClobProxy
I have tried to Serialize the Clob using
final Clob clob = org.hibernate.engine.jdbc.ClobProxy.generateProxy(stringValue);
final Clob clobValue = SerializableClobProxy.generateProxy(clob);
But this appears to provide the wrong type of argument to the "exact" function resulting in (org.hibernate.engine.jdbc.spi.SqlExceptionHelper:144) - SQL Error: 29900, SQLState: 99999
(org.hibernate.engine.jdbc.spi.SqlExceptionHelper:146) - ORA-29900: operator binding does not exist
ORA-06553: PLS-306: wrong number or types of arguments in call to 'EXACT'
After reading some posts about using Clobs with entities, I have tried passing in a byte[], but this also provides the wrong argument type (org.hibernate.engine.jdbc.spi.SqlExceptionHelper:144) - SQL Error: 29900, SQLState: 99999
(org.hibernate.engine.jdbc.spi.SqlExceptionHelper:146) - ORA-29900: operator binding does not exist
ORA-06553: PLS-306: wrong number or types of arguments in call to 'EXACT'
I can also just pass in the value as a String, as long as it doesn't exceed the maximum string size.
I have seen a post (Using function in where clause with clob parameter) which seems to suggest that the only way is to use "plain old JDBC". This is not an option.
I am up against a hard deadline so any help is very welcome.
I'm afraid your assumptions about CLOBs in Oracle are wrong. In Oracle, a CLOB locator is something like a file handle, and such a handle can be created only by the database. So you cannot simply pass a CLOB as a bind variable. A CLOB must be tied to database storage, because it can occupy up to 176TB, and something that large cannot be held in the Java heap.
So the usual approach is to call either the DB function empty_clob() or dbms_lob.createtemporary (in some form). You then get a CLOB from the database, even if you think of it as an "IN" parameter. You can write as much data as you want into that locator (handle, CLOB) and then use this CLOB as a parameter for a query.
If you do not follow this pattern, your code will not work. It does not matter whether you use JPA, SpringBatch or plain JDBC. This constraint is imposed by the database.
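In plain JDBC terms, the pattern described above corresponds roughly to asking the connection for a Clob, filling it, and binding it. The following is a minimal sketch under stated assumptions: a JDBC 4.0 or later driver, a query with exactly one '?' placeholder that expects a CLOB, and a single numeric result column. The class and method names are hypothetical, and how the driver backs Connection.createClob() (for example with a temporary LOB) is driver-specific.

```java
import java.sql.Clob;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper; a sketch of the "let the database create the CLOB" pattern.
class ClobQuerySketch {
    // Runs a single-value COUNT query whose only '?' placeholder expects a CLOB.
    static long countWithClob(Connection conn, String sql, String text) throws SQLException {
        Clob clob = conn.createClob();          // the driver/database supplies the locator
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            clob.setString(1, text);            // CLOB positions are 1-based
            ps.setClob(1, clob);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getLong(1) : 0L;
            }
        } finally {
            clob.free();                        // release the LOB resources
        }
    }
}
```

The SQL passed in would be the query from the question with the named parameter replaced by a positional '?' placeholder.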
It seems that Hibernate requires the parameter type to be set explicitly in such cases. The following code worked for me:
Clob clob = entityManager
.unwrap(Session.class)
.getLobHelper()
.createClob(reader, length);
int inserted = entityManager
.unwrap(org.hibernate.Session.class)
.createSQLQuery("INSERT INTO EXAMPLE ( UUID, TYPE, DATA) VALUES (:uuid, :type, :data)")
.setParameter("uuid", java.util.UUID.randomUUID(), org.hibernate.type.UUIDBinaryType.INSTANCE)
.setParameter("type", java.util.UUID.randomUUID().toString(), org.hibernate.type.StringType.INSTANCE)
.setParameter("data", clob, org.hibernate.type.ClobType.INSTANCE)
.executeUpdate();
A similar workaround is available for Blob.
THE ANSWER: Thank you both for your answers. I should have updated this when I solved the issue some time ago. In the end I used JDBC and the problem disappeared in a puff of smoke!

Conversion from Hive to PigLatin

I am trying to convert the below Hive statement to Pig:
max(substr(case when url like 'http:%' then '' else url end,1,50))
My pig statement for the above is:
url_group = GROUP data by (uid);
max_substr_url= FOREACH url_group generate SUBSTRING(MAX(((Coalesce(data.url) matches '.*http:%.*') ? '' : Coalesce(data.url))), 0, 49);
For some of the data, the url can be null, so I have written a Pig UDF called Coalesce(String), which returns an empty string if the data is null or empty and otherwise returns the string unchanged.
The above Pig statement is giving me a lot of trouble. I have tried many different options, but nothing has worked. Does anyone have any ideas on how to implement this? Please help me.
Thanks in advance
You are going to want to use a nested FOREACH so that you can do the substring transformation on each tuple in the data bag then take the MAX of the transformed bag.
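A = GROUP data by (uid);
B = FOREACH A {
-- MAX needs a one column bag
transformed = FOREACH data
-- Hive's like 'http:%' is a prefix match, so the regex is anchored at the start.
-- SUBSTRING's stop index is exclusive, so 0, 50 keeps the first 50 characters.
GENERATE SUBSTRING(((Coalesce(url) matches 'http:.*') ? '' : Coalesce(url)), 0, 50);
GENERATE group AS uid, MAX(transformed) ;
}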
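To check the Pig output against the intent of the Hive expression, it can help to have a small reference implementation for a single uid group. The Java sketch below is only that: it assumes like 'http:%' means "starts with http:", that substr(x, 1, 50) keeps the first 50 characters, and that null urls are turned into empty strings as the Coalesce UDF from the question does. The class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical reference implementation for one uid group.
class MaxUrlPrefix {
    static String maxTruncatedUrl(List<String> urls) {
        String max = null;
        for (String url : urls) {
            String u = (url == null) ? "" : url;      // Coalesce: null becomes ""
            if (u.startsWith("http:")) {
                u = "";                               // like 'http:%' is a prefix match
            }
            String cut = u.substring(0, Math.min(50, u.length())); // first 50 characters
            if (max == null || cut.compareTo(max) > 0) {
                max = cut;                            // keep the largest string seen so far
            }
        }
        return max; // null only when the group is empty
    }
}
```

Note that this compares strings with Java's String.compareTo, which may not order every string exactly as Hive or Pig would.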

NHibernate Like with integer

I have a NHibernate search function where I receive integers and want to return results where at least the beginning coincides with the integers, e.g.
received integer: 729
returns: 729445, 7291 etc.
The database column is of type int, as is the property "Id" of Foo.
But
int id = 729;
var criteria = session.CreateCriteria(typeof(Foo));
criteria.Add(NHibernate.Criterion.Expression.InsensitiveLike("Id", id.ToString() + "%"));
return criteria.List<Foo>();
results in an error (Could not convert parameter string to int32). Is there something wrong in the code, or is there a workaround or another solution?
How about this:
int id = 729;
var criteria = session.CreateCriteria(typeof(Foo));
// MatchMode.Start matches values that begin with the given digits, as the question asks
criteria.Add(Expression.Like(Projections.Cast(NHibernateUtil.String, Projections.Property("Id")), id.ToString(), MatchMode.Start));
return criteria.List<Foo>();
Have you tried something like this:
int id = 729;
var criteria = session.CreateCriteria(typeof(Foo));
criteria.Add(NHibernate.Criterion.Expression.Like(Projections.SqlFunction("to_char", NHibernate.NHibernateUtil.String, Projections.Property("Id")), id.ToString() + "%"));
return criteria.List<Foo>();
The idea is to convert the column to a string with the to_char function before applying the like comparison. Some databases do this conversion automatically.
AFAIK, you'll need to store your integer as a string in the database if you want to use the built in NHibernate functionality for this (I would recommend this approach even without NHibernate - the minute you start doing 'like' searches you are dealing with a string, not a number - think US Zip Codes, etc...).
You could also do it mathematically in a database-specific function (or convert to a string as described in Thiago Azevedo's answer), but I imagine these options would be significantly slower, and also have potential to tie you to a specific database.
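To make the "mathematical" option concrete, here is a minimal Java sketch of prefix matching on integers without any string conversion: it drops trailing decimal digits from the value until it is no larger than the prefix. It assumes both numbers are non-negative, the class and method names are hypothetical, and a database-side version would have to be written in that database's own function language.

```java
// Hypothetical illustration of prefix matching on integers without string conversion.
class IntPrefix {
    // True if the decimal digits of value start with the digits of prefix.
    // Assumes both arguments are non-negative.
    static boolean startsWith(int value, int prefix) {
        if (prefix == 0) {
            return value == 0;        // only the number 0 starts with the digit 0
        }
        while (value > prefix) {
            value /= 10;              // drop the last decimal digit
        }
        return value == prefix;
    }
}
```

With the numbers from the question, IntPrefix.startsWith(729445, 729) and IntPrefix.startsWith(7291, 729) are both true, while IntPrefix.startsWith(730, 729) is false.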