Where can I find avro.schema.url within the Hive metastore?

I am trying to locate the property avro.schema.url, which becomes part of the table metadata when a table is created by pointing to an Avro schema file for Avro data in S3 or HDFS. I can see it in the output of the DESCRIBE EXTENDED command, but where in the metastore database is this property stored? I searched the TABLE_PARAMS table for that table's ID and did not find it.

Found it: it's in the SERDE_PARAMS table.
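For anyone who wants to see the lookup end to end, here is a minimal sketch. It assumes the usual metastore layout, in which TBLS points to SDS through SD_ID and SDS points to the SerDe through SERDE_ID. The tables below are a heavily simplified stand-in built in an in-memory SQLite database, and the table name and schema URL are made up for illustration. Against a real metastore (MySQL, PostgreSQL, etc.) only the SELECT would be needed.

```python
import sqlite3

# Simplified stand-in for the metastore tables involved. The real tables have
# many more columns; the rows inserted here are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, SD_ID INTEGER, TBL_NAME TEXT);
CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, SERDE_ID INTEGER);
CREATE TABLE SERDE_PARAMS (SERDE_ID INTEGER, PARAM_KEY TEXT, PARAM_VALUE TEXT);
INSERT INTO TBLS VALUES (1, 10, 'events');
INSERT INTO SDS VALUES (10, 100);
INSERT INTO SERDE_PARAMS VALUES (100, 'avro.schema.url', 'hdfs:///schemas/events.avsc');
""")

# Walk from the table to its storage descriptor, then to the SerDe parameters.
LOOKUP_SQL = """
SELECT t.TBL_NAME, p.PARAM_VALUE
FROM TBLS t
JOIN SDS s ON s.SD_ID = t.SD_ID
JOIN SERDE_PARAMS p ON p.SERDE_ID = s.SERDE_ID
WHERE p.PARAM_KEY = 'avro.schema.url'
"""

rows = conn.execute(LOOKUP_SQL).fetchall()
print(rows)
```

As far as I know, the property lands in SERDE_PARAMS when it was given in WITH SERDEPROPERTIES, and in TABLE_PARAMS when it was given in TBLPROPERTIES, so it may be worth checking both.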

Related

Is there any way to get metadata from Hive in Hue?

I tried to get metadata from Hive in Hue, but every attempt failed.
I'm looking for a way that does not involve querying the metastore in MySQL (or similar) or using the shell.
INFORMATION_SCHEMA is also not implemented in Hive...
If I can get the metadata, I want to build a table holding all of it: table names, columns, and types.

How to dynamically create tables in Snowflake using the schema from Parquet files stored in AWS

Could you help me load a set of Parquet files into Snowflake?
I have about 250 Parquet files stored in an AWS stage.
250 files = 250 different tables.
I'd like to load them into Snowflake tables dynamically.
So, for each file, I need to:
1. Get the schema from the Parquet file. I've read that I could do this with parquet-tools (Apache).
2. Create a table using that schema.
3. Load the data from the Parquet file into this table.
Could anyone help me with this? Is there a most efficient way to do it (using the Snowflake GUI, for example)? I can't find one.
Thanks.
If the files all share the same schema, you can put them in a single stage and use the INFER_SCHEMA function. This will give you the schema of the Parquet files.
https://docs.snowflake.com/en/sql-reference/functions/infer_schema.html
If the files all have different schemas, then I'm afraid you have to infer the schema for each file separately.
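For the one-table-per-file case, the per-file inference can be scripted. The sketch below only generates the SQL text; it does not connect to Snowflake. It assumes a named Parquet file format already exists, that each table is named after its file, and that file names are valid identifiers. The stage, file, and format names are hypothetical placeholders.

```python
from pathlib import PurePosixPath


def snowflake_load_statements(stage, file_name, file_format):
    """Build CREATE TABLE and COPY INTO statements for one staged Parquet file.

    Assumption: the table is named after the file (orders.parquet -> ORDERS).
    """
    table = PurePosixPath(file_name).stem.upper()
    # Create the table from the schema that INFER_SCHEMA detects in this file.
    create = (
        f"CREATE TABLE {table} USING TEMPLATE (\n"
        f"  SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))\n"
        f"  FROM TABLE(INFER_SCHEMA(\n"
        f"    LOCATION => '@{stage}',\n"
        f"    FILE_FORMAT => '{file_format}',\n"
        f"    FILES => '{file_name}'))\n"
        f");"
    )
    # Load only this file, matching Parquet columns to table columns by name.
    copy = (
        f"COPY INTO {table} FROM @{stage}\n"
        f"  FILES = ('{file_name}')\n"
        f"  FILE_FORMAT = (FORMAT_NAME = '{file_format}')\n"
        f"  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;"
    )
    return [create, copy]


statements = snowflake_load_statements("my_stage", "orders.parquet", "my_parquet_format")
print("\n\n".join(statements))
```

Looping this over the list of staged files (for example, the output of LIST @my_stage) would give one CREATE TABLE and one COPY INTO per file.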

Retrieving JSON raw file data from Hive table

I have a JSON file and want to move only selected fields into a Hive table. Below is the statement I used to create a new table over the JSON file. Creating the table gives no error, but when I run select * from JsonFile1 or select count(*) from JsonFile1, I get this error: Failed with exception java.io.IOException:java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
I have searched the internet and been stuck on this for a few days without finding a solution. I checked HDFS: the table was created and the complete file was imported as-is (all of it, not just the fields I selected). The sample below is only an excerpt; the actual data contains 50+ field names, and declaring all of them as columns is cumbersome. Is that what we need to do? Thank you in advance.
CREATE EXTERNAL TABLE JsonFile1(user STRUCT<id:BIGINT,description:STRING, followers_count:INT>)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION 'link/data';
I have data as below
{"filter_level":"low","geo":null,"user":{"id":859264394,"description":"I don’t want it. Building #techteam, #LetsTalk!!! def#abc.com",
"contributors_enabled":false,"profile_sidebar_border_color":"C0DEED","name":"krogmi",
"screen_name":"jkrogmi","id_str":"859264394",}}06:20:16 +0000 2012","default_profile_image":false,"followers_count":88,
"profile_sidebar_fill_color":"DDFFCC","screen_name":"abc_abc"}}
Answering my own question.
I deleted the data in HDFS that the table's LOCATION '...' pointed to, copied the data from local to HDFS again, and recreated the table; after that it worked.
I assume the data was the problem.
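If the data is the suspect, it can be checked before the table is recreated. The sketch below is only a guess at what to look for: it flags lines that are not valid JSON and, as one possible cause of the Long-to-Integer cast error, values in fields declared INT that do not fit in 32 bits. The function name and the choice of followers_count as the INT field follow the table definition above; everything else is illustrative.

```python
import json

# Range of Hive's 32-bit INT type.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def check_json_lines(lines, int_fields=("followers_count",)):
    """Return (line_number, problem) pairs for records that look suspicious."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "record is not a JSON object"))
            continue
        user = record.get("user") or {}
        for field in int_fields:
            value = user.get(field)
            # Flag integers too large for a column declared as INT.
            if isinstance(value, int) and not INT_MIN <= value <= INT_MAX:
                problems.append((number, f"{field}={value} does not fit in INT"))
    return problems


sample = [
    '{"user":{"id":859264394,"followers_count":88}}',
    '{"user":{"id":1,"followers_count":3000000000}}',
    '{filter_level":"low"}',
]
result = check_json_lines(sample)
for line_number, problem in result:
    print(line_number, problem)
```

The file would first be copied back from HDFS (or checked locally before upload) and passed in line by line, since the SerDe expects one JSON record per line.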

How can I create a Hive table on top of a Parquet file?

I'm having trouble creating a Hive table on top of a Parquet file. Can someone help me with this? I have read many articles and followed the guidelines, but I'm still not able to load a Parquet file into a Hive table.
According to "Using Parquet Tables in Hive", if a table will be populated with data files generated outside of Hive, it is often useful to create it as an external table pointing to the location where the files will be created. Note that LOCATION must be a directory containing the Parquet files, not a single file.
hive> create external table parquet_table_name (<yourParquetDataStructure>)
STORED AS PARQUET
LOCATION '/<yourPath>/<yourParquetDirectory>';
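To make the template concrete, here is a small sketch that fills it in from a list of columns. The helper name, table name, columns, and path are all hypothetical. The column names and types have to match the schema inside the Parquet files, which this code does not check.

```python
def parquet_external_table_ddl(table, columns, location):
    """Build a CREATE EXTERNAL TABLE statement for Parquet files in a directory.

    columns is a list of (name, hive_type) pairs that must mirror the schema
    stored in the Parquet files; location is an HDFS directory, not a file.
    """
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}';"
    )


ddl = parquet_external_table_ddl(
    "events",
    [("id", "BIGINT"), ("name", "STRING")],
    "/data/events",
)
print(ddl)
```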

Where does Hive store its tables?

I am new to Hadoop and have just started working with Hive. In my understanding, it provides a query language to process data in HDFS. With HiveQL we can create tables and load data into them from HDFS.
So my question is: where are those tables stored? Specifically, if we have a 100 GB file in HDFS and we want to make a Hive table out of that data, what will the size of that table be and where is it stored?
If my understanding of this concept is wrong, please correct me.
If the table is 100 GB, you should consider a Hive external table (as opposed to a "managed table"; the difference is described below).
With an external table, the data itself stays on HDFS in the file path that you specify (note that you may specify a directory of files as long as they all have the same structure), and Hive only records a mapping to it in the metastore. A managed table, by contrast, stores the data "in Hive", that is, under Hive's own warehouse directory.
When you drop a managed table, the underlying data is dropped as well, whereas dropping an external table only removes the metadata in the metastore that references that data.
Either way, you use only 100 GB as viewed by the user, and you benefit from HDFS's robustness through replication of the data.
Hive creates a directory on HDFS for each table. If you don't specify a location, the directory is created under /user/hive/warehouse on HDFS. After a LOAD command, the files are moved into the table's directory under the warehouse (/user/hive/warehouse/tablename). You can also point the table at an existing HDFS directory, including one whose files are partitioned, or use the external table concept.
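As a rough illustration of where a managed table ends up when no location is given, the sketch below computes the expected directory. It assumes the default warehouse directory mentioned above and the convention, as I understand it, that tables in a non-default database sit under a "<database>.db" subdirectory with lowercased names. The function name is made up; DESCRIBE FORMATTED on the table shows the actual location.

```python
def default_table_path(table, database="default", warehouse_dir="/user/hive/warehouse"):
    """Guess the HDFS directory of a managed table created without LOCATION."""
    table = table.lower()
    if database.lower() == "default":
        # Tables in the default database sit directly under the warehouse dir.
        return f"{warehouse_dir}/{table}"
    # Other databases get their own "<name>.db" directory.
    return f"{warehouse_dir}/{database.lower()}.db/{table}"


print(default_table_path("Sales"))
print(default_table_path("sales", database="Analytics"))
```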