Is there any way to get metadata from Hive in Hue? - hive

I tried to get metadata from Hive in Hue, but everything failed.
I'm looking for a way that does not involve querying the metastore (MySQL, etc.) directly or using the shell.
information_schema is also not implemented in Hive.
If I can get the metadata, I want to build a table of all the metadata: table names, columns, and types.
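For context, the kind of statements that can be run from Hue's Hive editor to pull this information out of Hive itself (a sketch; default and my_table are placeholder names) look like this:

-- list databases and tables visible to Hive
SHOW DATABASES;
SHOW TABLES IN default;

-- column names and types for one table
DESCRIBE default.my_table;

-- extended metadata: location, serde, table parameters, etc.
DESCRIBE FORMATTED default.my_table;

The output of DESCRIBE could then be collected per table to build the metadata overview described above.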

Related

How to auto detect schema from file in GCS and load to BigQuery?

I'm trying to load a file from GCS into BigQuery with the schema auto-generated from the file in GCS. I'm using Apache Airflow to do this. The problem is that when I use schema auto-detection, BigQuery builds the schema from only the first ~100 values.
For example, in my case there is a column, say X, whose values are mostly of Integer type, but some values are Strings, so bq load fails with a schema mismatch; in such a scenario the data type needs to be changed to STRING.
So what I could do is manually create a new table by generating the schema on my own, or I could set max_bad_records to something like 50, but neither seems like a good solution. An ideal solution would be like this:
Try to load the file from GCS to BigQuery; if the table is created successfully in BQ without any data mismatch, then I don't need to do anything.
Otherwise, I need to be able to update the schema dynamically and complete the table creation.
As you cannot change a column's type in BQ (see this link):
BigQuery natively supports the following schema modifications:
* Adding columns to a schema definition
* Relaxing a column's mode from REQUIRED to NULLABLE
All other schema modifications are unsupported and require manual workarounds.
So, as a workaround, I suggest:
* Use --max_rows_per_request = 1 in your script.
* Use the one line that best suits your case, with the optimized field types.
This will create the table with the correct schema and one line, and from there you can load the rest of the data.
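If creating the table manually turns out to be the cleaner path, and your project supports BigQuery standard SQL DDL, a minimal sketch (mydataset, mytable, and the column names are placeholders; only x is the problematic column from the question) is to declare the corrected type up front and append the full file afterwards:

-- hypothetical example: declare x as STRING so that mixed
-- integer/string values load without a schema mismatch
CREATE TABLE IF NOT EXISTS mydataset.mytable (
  x STRING,
  other_field INT64
);

The subsequent load can then run with an append write disposition against the already-correct schema.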

Where can I find the avro.schema.url within the Hive metastore?

I am trying to locate the property avro.schema.url, which becomes part of the table metadata when a table is created by pointing it at an Avro schema file for some Avro data in S3 or HDFS. I can see it in the output when I run the describe extended table command, but where is this property stored within the metastore database? I searched TABLE_PARAMS for that particular table's ID and did not find it.
Found it: it's in the SERDE_PARAMS table.
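For reference, a lookup along these lines works against a MySQL-backed metastore (a sketch; my_avro_table is a placeholder, and table/column names can vary slightly between Hive versions) by joining from TBLS through SDS to SERDE_PARAMS:

-- find avro.schema.url for a given table in the Hive metastore
SELECT t.TBL_NAME, sp.PARAM_KEY, sp.PARAM_VALUE
FROM TBLS t
JOIN SDS s ON t.SD_ID = s.SD_ID
JOIN SERDE_PARAMS sp ON s.SERDE_ID = sp.SERDE_ID
WHERE sp.PARAM_KEY = 'avro.schema.url'
  AND t.TBL_NAME = 'my_avro_table';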

Load data from Drill table into Hive Table

I have created a table using Drill and it is located at
/user/abc/drill/Drilltable.
Now I would like to load the data from DrillTable into HiveTable which is located at path
/user/hive/warehouse/userxyz.db
I am using the statement below to load the data:
INSERT INTO TABLE HiveTable select * from DrillTable;
I get the error
Table not found
and I am a bit confused about how to let Hive know the path of the Drill table.
What would be the right way to handle this?
Hive might be confused about the schema of the Drill data as well as its location. If you're willing to experiment, try something like this:
Store the data from Drill in a format you can model in Hive, CSV for example, as described in this post.
In Hive, create an external table that defines the schema and location of that textual data. You can then convert the external table to a managed table (optional). For example ....
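A minimal sketch of those two steps, assuming the Drill output was written as comma-separated text under /user/abc/drill/Drilltable and that HiveTable already exists with a matching layout (the column names here are placeholders):

-- external table over the CSV files produced by Drill
CREATE EXTERNAL TABLE drill_staging (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/abc/drill/Drilltable';

-- copy the rows into the Hive table
INSERT INTO TABLE HiveTable SELECT * FROM drill_staging;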

Writing Avro to BigQuery using Beam

Q1: Say I load Avro-encoded data using the BigQuery load tool. Now I need to write this data to a different table, still in Avro format. I am trying out different partitionings in order to test table performance. How do I write SchemaAndRecord back to BigQuery using Beam? Also, would schema detection work in this case?
Q2: It looks like schema information is lost when an Avro schema is converted to a BigQuery schema. For example, both the double and float Avro types are converted to the FLOAT type in BigQuery. Is this expected?
Q1: If the table already exists and the schema matches the one you're copying from, you should be able to use the CREATE_NEVER CreateDisposition (https://cloud.google.com/dataflow/model/bigquery-io#writing-to-bigquery) and just write the TableRows directly from the output of readTableRows() on the original table, although I suggest using BigQuery's TableCopy command instead.
Q2: That's expected; BigQuery does not have a Double type. You can find more information on the type mapping here: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#avro_conversions. Also, Logical Types will soon be supported as well: https://issuetracker.google.com/issues/35905894.
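If the goal is only to test how the same data behaves under a different partitioning, one alternative that avoids a Beam pipeline altogether (a sketch in BigQuery standard SQL; the dataset, table, and event_ts column names are placeholders) is to re-materialize the table with a partitioning clause:

-- hypothetical: copy an existing table into a new day-partitioned table
CREATE TABLE mydataset.events_partitioned
PARTITION BY DATE(event_ts) AS
SELECT * FROM mydataset.events;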

What is the default schema in Hive?

For Pig, the default schema is ByteArray. Is there a default schema for Hive if we don't specify one? I tried to look at some Hive documentation but couldn't find anything.
Hive is schema-on-read --- I am not sure this is the answer... If someone could give an insight on this, that would be great.
Hive does the best that it can to read the data. You will get lots of null values if there aren't enough fields in each record to match the schema. If some fields are numbers and Hive encounters nonnumeric strings, it will return nulls for those fields. Above all else, Hive tries to recover from all errors as best it can.
There is no default schema in Hive; in order to query data in Hive you first have to create a table describing the content of your data (by using create external table ... location).
So you basically have to tell Hive the "schema" before querying the data.
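To make the schema-on-read behavior concrete, a small hypothetical example (the path and column names are made up): the schema lives only in the table definition, and a row whose age field is not numeric simply comes back as NULL instead of failing the query:

-- the schema is declared at table-creation time, not stored with the data
CREATE EXTERNAL TABLE people (
  name STRING,
  age  INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/demo/people';

-- a row such as "bob,abc" does not fail the query;
-- age is returned as NULL for that row
SELECT name, age FROM people;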