Apache NiFi: InferAvroSchema infers signed values as string - hive

I'm setting up a pipeline in NiFi where I receive JSON records, which I then use to make a request to an API. The response I get has both numeric and textual data, which I then have to write to Hive. I use InferAvroSchema to infer the schema. Some numeric values are signed, like -2.46 and -0.1, and while inferring the type the processor considers them string instead of double, float, or decimal.
I know we can hard-code our Avro schema in the processors, but I thought making it more dynamic by utilizing InferAvroSchema would be even better. Is there any other way we can overcome/resolve this?

InferAvroSchema is good for guessing an initial schema, but once you need something more specific it is better to remove InferAvroSchema and provide the exact schema you need.
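For example, instead of InferAvroSchema you can attach a fixed Avro schema that declares the signed numeric columns as double. A minimal sketch (the record and field names here are hypothetical, not taken from the question):

{
  "type": "record",
  "name": "ApiResponse",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "description", "type": ["null", "string"], "default": null},
    {"name": "delta", "type": ["null", "double"], "default": null}
  ]
}

With a nullable double declared up front, values like -2.46 and -0.1 are written to Hive as numbers rather than strings.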

Related

HANA: Unknown Characters in Database column of datatype BLOB

I need help on how to resolve characters of an unknown type from a database field into a readable format, because I need to overwrite this value at the database level with another valid value (in the exact format the application stores it in) to automate system copy activities.
I have a proprietary application that also allows users to configure it via the frontend. This configuration data gets stored in a table, and the values of a configuration property are stored in a column of type "BLOB". For the value in question, I provide a valid URL in the application frontend (like http://myserver:8080). However, what gets stored in the database is not readable (some square characters). I tried all sorts of HANA conversion functions (HEX, binary), both simple and cascaded (e.g. first to binary, then to varchar), to make it readable. I also tried it the other way around, converting the value that I want to insert into the stored format (conversion to BLOB via hex or binary), but this does not work either. I copied the value to the clipboard and compared it to all sorts of character set tables (although I am not sure if this can work at all).
My conversion attempts look somewhat like this:
SELECT TO_ALPHANUM('') FROM DUMMY;
where the quotation marks would contain the characters in question; I can't even print them here.
How can one approach this and maybe find out the character set that is used by this application? I would be grateful for some more ideas.
What you have in your BLOB column is a series of bytes. As you mentioned, these bytes have been written by an application that uses an unknown character set.
In order to interpret those bytes correctly, you need to know the character set as this is literally the mapping of bytes to characters or character identifiers (e.g. code points in UTF).
Now, HANA doesn't come with a whole lot of options to work on LOB data in the first place, and for C(haracter)LOB data most manipulations implicitly perform a conversion to a string data type.
So, what I would recommend is to write a custom application that is able to read out the BLOB bytes and perform the conversion in that custom app. Once successfully converted into a string, you can store the data in a new NCLOB field that keeps it in UTF-8 encoding.
You will have to know the character set in the first place, though. No way around that.
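As an illustration, a small JDBC utility could dump the raw bytes and print them decoded with a few candidate character sets until the stored URL becomes readable. This is only a sketch under assumptions: the table, column, key, and connection details are placeholders, and the candidate encodings are guesses.

// Sketch: read the BLOB bytes and try to decode them with several candidate charsets.
// Table/column names, the key, and the JDBC URL are hypothetical placeholders.
import java.nio.charset.Charset;
import java.sql.*;

public class BlobCharsetProbe {
    public static void main(String[] args) throws Exception {
        String[] candidates = {"UTF-8", "UTF-16LE", "UTF-16BE", "ISO-8859-1", "windows-1252"};
        try (Connection con = DriverManager.getConnection(
                 "jdbc:sap://myserver:30015/", "USER", "PASSWORD");
             PreparedStatement ps = con.prepareStatement(
                 "SELECT config_value FROM config_table WHERE config_key = ?")) {
            ps.setString(1, "frontend.url");
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    byte[] raw = rs.getBytes(1);  // raw BLOB content
                    for (String cs : candidates) {
                        // Print the same bytes interpreted in each candidate encoding;
                        // whichever line shows http://myserver:8080 reveals the charset in use.
                        System.out.println(cs + " -> " + new String(raw, Charset.forName(cs)));
                    }
                }
            }
        }
    }
}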
I assume you are on Oracle. You can convert BLOB to CLOB as described here.
http://www.dba-oracle.com/t_convert_blob_to_clob_script.htm
In case of your example try this query:
select UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.SUBSTR(<your_blob_value>)) from dual;
Obviously this only works for values below 32767 characters.

Purpose of Json schema file while loading data into Big query from a csv file

Can someone please help me by stating the purpose of providing the JSON schema file while loading a file into a BigQuery table using the bq command? What are the advantages?
Does this file help to maintain data integrity by avoiding any column swap?
Regards,
Sreekanth
Specifying a JSON schema, instead of relying on auto-detect, ensures that you get the expected type for each column being loaded. If you have data that looks like this, for example:
1,'foo',true
2,'bar',false
3,'baz',true
Schema auto-detection would infer that the type of the first column is an INTEGER (a.k.a. INT64). Maybe you plan to load more data in the future, though, that looks like this:
3.14,'foo',true
1.59,'bar',false
-2.001,'baz',true
In that case, you probably want the first column to have type FLOAT (a.k.a. FLOAT64) instead. If you provide a schema when you load the first file, you can specify a type of FLOAT for that column explicitly.
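For the example above, the schema file might look like this (the column names are invented for illustration):

[
  {"name": "measurement", "type": "FLOAT", "mode": "NULLABLE"},
  {"name": "label", "type": "STRING", "mode": "NULLABLE"},
  {"name": "flag", "type": "BOOLEAN", "mode": "NULLABLE"}
]

and you would pass it as the last argument to bq load, roughly like this (dataset, table, and file names are placeholders):

bq load --source_format=CSV mydataset.mytable ./data.csv ./schema.json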

Phalcon: Convert MySQL data types to PHP data types and vice-versa

What is the best way to handle datatype conversion between MySQL and PHP while using Phalcon models? When a datetime field is retrieved from MySQL, it is converted to a string, which I want to automatically convert to a datetime object. Similarly, for MySQL decimal fields, I want to convert the value to a custom Decimal type.
So, where exactly does this datatype conversion happen? Or, if it does not, what's the best way to achieve this kind of conversion? I went through the documentation but couldn't find anything relevant to this.
Any help is highly appreciated.
There are two ways to handle this that I know of.
One is using model annotations to describe metadata:
http://docs.phalconphp.com/en/latest/reference/models.html#annotations-strategy
It sounds like this will solve your issue with decimals, but not with datetime.
The other is by using an afterFetch hook to mutate the model:
http://docs.phalconphp.com/en/latest/reference/models.html#initializing-preparing-fetched-records

Using hsqldb as a key-value store

I would like to use hsqldb as a simple key-value store, where both the key and the value are strings.
The value would be a JSON of some data, say no more than 10K in size.
The type of the value column is LONGVARCHAR.
I would like to know whether this type is suitable for this purpose.
P.S.
A bit of background. We wanted to use MongoDB or CouchDB, but the latest MongoDB does not support Windows XP and the latest CouchDB does not support 32-bit Windows, both of which are requirements. Using a DB like Cassandra seems like enormous overkill in our case.
If the values are already in UTF-8 or another 8-bit encoding, you can use BLOB or VARBINARY. Otherwise, use CLOB or VARCHAR for Unicode characters. Both forms are suitable for values up to 10K. Note that LONGVARCHAR is simply a long VARCHAR.
If speed of access is essential, you can test with both types and decide which one is the best for your data. The same access API can be used for BLOB/VARBINARY or CLOB/VARCHAR when the values are relatively small (10k).
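A minimal JDBC sketch of such a key-value table in HSQLDB might look like this (the table and column names, the in-memory URL, and the sample key are made up for illustration):

// Sketch: HSQLDB as a simple string key-value store over JDBC.
// Table/column names and the jdbc:hsqldb:mem URL are illustrative only.
import java.sql.*;

public class KeyValueStoreDemo {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:hsqldb:mem:kvdemo", "SA", "")) {
            try (Statement st = con.createStatement()) {
                st.execute("CREATE TABLE kv (k VARCHAR(256) PRIMARY KEY, v LONGVARCHAR)");
            }
            // Put: insert the JSON value or update it if the key already exists.
            try (PreparedStatement put = con.prepareStatement(
                    "MERGE INTO kv USING (VALUES(?, ?)) AS src(k, v) ON kv.k = src.k " +
                    "WHEN MATCHED THEN UPDATE SET kv.v = src.v " +
                    "WHEN NOT MATCHED THEN INSERT VALUES src.k, src.v")) {
                put.setString(1, "user:42");
                put.setString(2, "{\"name\":\"Alice\",\"roles\":[\"admin\"]}");
                put.executeUpdate();
            }
            // Get: read the value back by key.
            try (PreparedStatement get = con.prepareStatement("SELECT v FROM kv WHERE k = ?")) {
                get.setString(1, "user:42");
                try (ResultSet rs = get.executeQuery()) {
                    if (rs.next()) System.out.println(rs.getString(1));
                }
            }
        }
    }
}

MERGE gives put-or-update semantics in a single statement; for a persistent store you would swap the jdbc:hsqldb:mem URL for a jdbc:hsqldb:file one.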

Dynamic DB storage based on unknown result from source

I have a source of data that I get from a web service. I can never know when it will change, and I need to store it in a DB as soon as I get it. What is the best way to make the storage solution adapt to whatever I put there? I am using MySQL. Would serialization be the key?
I would store the content in a column using the TEXT data type, and consider MEDIUMTEXT or LONGTEXT if the content is over 4,000 characters. MySQL 5.1 has XML functionality to get values out of the XML payload...
Ideally, I'd consume the webservice and populate tables appropriately.
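As a rough sketch, assuming the payload is XML (the table, columns, and XPath below are made up for illustration):

CREATE TABLE api_payloads (
  id INT AUTO_INCREMENT PRIMARY KEY,
  fetched_at DATETIME NOT NULL,
  payload MEDIUMTEXT NOT NULL
);

-- ExtractValue (MySQL 5.1+) pulls individual values out of the stored XML:
SELECT ExtractValue(payload, '/response/customer/name') AS customer_name
FROM api_payloads
WHERE id = 1;

This keeps ingestion schema-free (you just dump whatever the web service returns) while still letting you query individual fields later.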