I have created hive external table on a file written inparquet using thrift schema. Which Serde do we need to pass to read this file?
Related
Is there any way we can pass the Avro schema file through BQ CLI? Json is an option, but that's an overhead to first convert Avro schema file to JSON.
Is it possible to read hive table (or HDFS data in parquet format) in Streamsets Data collector? I don't want to use Transformer for this.
Reading the raw files in parquet is counter to the way that data collector works so that would be a better use case for transformer.
But I have successfully used the jdbc origin either from Impala or hive to achieve this, there are some additional hurdles to jump with the jdbc source.
I have orc files and their schema i have tried loading this orc files in local hive and its working fine, now I will generate multiple orc files and need to load this orc files to hive table using nifi put hive streamming processor ?
PutHiveStreaming expects incoming flow files to be in Avro format. If you are using PutHive3Streaming you have more flexibility but it doesn't accept flow files in ORC format; instead both of those processors convert the input into ORC and write it into a managed table in Hive.
If your files are already in ORC format, you can use PutHDFS to place them directly into HDFS. If you don't have permissions to write directly into a managed table location, you could write to a temporary location, create an external table on top of it, and then load from there into the managed table using INSERT INTO myTable FROM SELECT * FROM externalTable or whatever.
We are facing the following issue: we use hive 1.2.x to write orc files, it is a known problem, that hive before version 2.x does not write the orc column names into the orc file (it writes only col_0,col_1,etc.).
We would like to use an other application which reads the schema from the orc file and can not connect to hcat metastore for the correct column names. Unfortunately we do not have a chance to upgrade to 2.x version of hive.
Is there any solution to "append" or replace the correct column names into these exitsing orc files? Thanks in advnace for you help.
I'm currently importing from Mysql into HDFS using Sqoop in avro format, this works great. However what's the best way to load these files into HIVE?
Since avro files contain the schema I can pull the files down to the local file system, use avro tools and create the table with the extracted schema but this seems excessive?
Also if a column is dropped from a table in mysql can I still load the old files into a new HIVE table created with the new avro schema (dropped column missing)?
After version 9.1, Hive has come packaged with an Avro Hive SerDe. This allows Hive to read from Avro files directly while Avro still "owns" the schema.
For you second question, you can define the Avro schema with column defaults. When you add a new column just make sure to specify a default, and all your old Avro files will work just find in a new Hive table.
To get started, you can find the documentation here and the book Programming Hive (available on Safari Books Online) has a section on the Avro HiveSerde which you might find more readable.