HIVE Query - Loading data into HIVE Table

I have a dataset (a txt file) with 10 columns, where the last column holds string data that is itself separated by tabs, for example: abcdef lkjhj pqrst...wxyz
I created a new table defining column 10 as STRING, but when I load the data into this table and verify it, only abcdef is populated in the last column and the rest is ignored.
Can someone please help me load the entire string into the Hive table? Do I need to write a UDF?
Thanks in advance.
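One possible workaround, sketched in Python as a preprocessing step rather than a UDF: split each line on at most nine tabs so that everything after the ninth tab stays in column 10, then rewrite the line with a field delimiter that does not occur in the data. The sketch assumes the first nine columns never contain tabs themselves and uses the Ctrl-A character (\x01), which is Hive's default field delimiter for text tables. The function names are hypothetical.

```python
HIVE_DEFAULT_DELIMITER = "\x01"  # Ctrl-A, Hive's default field delimiter for text tables


def split_keep_tail(line, n_columns=10, sep="\t"):
    """Split into at most n_columns fields; the last field keeps any extra separators."""
    return line.rstrip("\r\n").split(sep, n_columns - 1)


def convert_line(line, n_columns=10, sep="\t", out_sep=HIVE_DEFAULT_DELIMITER):
    """Re-join the fields with out_sep so the tabs inside the last column are plain data."""
    return out_sep.join(split_keep_tail(line, n_columns, sep))


if __name__ == "__main__":
    # Hypothetical sample row: nine short columns, then a tab-containing tenth column.
    sample = "c1\tc2\tc3\tc4\tc5\tc6\tc7\tc8\tc9\tabcdef\tlkjhj\tpqrst\n"
    fields = split_keep_tail(sample)
    print(len(fields))       # number of fields after splitting
    print(repr(fields[-1]))  # last column, embedded tabs preserved
```

A file converted this way could then be loaded into a table created with Hive's default delimiters (no FIELDS TERMINATED BY clause), so that column 10 receives the whole string.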

Related

BigQuery - Rebuild/Refresh a table with new column data

I have a table with close to 40 columns (including 3 structs) that contains more than 1 TB of data. Now I need to add a new column to that table and refresh the complete table so that the new column is populated.
Could you please help me with the best/most optimized way to do this?
Thanks in Advance.

You can add the new field by editing the table's schema.
After that, you can update the table with the new data.

How can I copy data from CSV to a destination table based on column names?

Context
I am receiving CSV files in S3, which do not always follow the same schema and/or order. For example, sometimes files look like:
foo, bar, bla
hi , 007, 42
bye, 008, 44
But other times, they can look like (bar can be missing):
foo, bla
hi , 42
bye, 44
Now let's say I'm only interested in getting the foo column regardless of what else is there. But I can't really count on the order of the columns in the CSV, so on some days foo could be the first column, but on other days foo could be the third column. By the way, I am using Snowflake as a database.
What I have tried to do
I created a destination table like:
CREATE TABLE woof.meow (foo TEXT);
Then I tried to use Snowflake's COPY INTO command to copy data from the CSV into the table I created. The catch is that I tried to do it the same way I normally do for Parquet files (matching by column names), like:
COPY INTO woof.meow
FROM '@STAGES.MY_S3_BUCKET_STAGE/'
file_format = (
TYPE=CSV,
COMPRESSION=GZIP,
)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
But sadly I always got: error: Insert value list does not match column list expecting 1 but got 0
Some research led me to this section of the docs (about MATCH_BY_COLUMN_NAME), where I discovered that CSV is not supported:
This copy option is supported for the following data formats:
- JSON
- Avro
- ORC
- Parquet
Desired objective
How can I copy data from the stage (containing CSV files on S3) to a pre-created table based on column names?
I am happy to provide any further information if needed.

You are trying to load a CSV (comma-separated values) file into a single text column. To my knowledge, the column order in your source files must match the column order of the target table in Snowflake. That means if the source CSV has the columns foo, bar and bla, the target table should also have them as separate columns, in the same order as in the source files.
If you are unsure which columns will arrive in a source file, I would recommend transforming the file to JSON (that is my choice; other formats such as Avro work too) and loading its content into a VARIANT column in Snowflake.
That way you do not have to worry about the column order in the source files: you store the data as JSON/Avro in the target table and use Snowflake's JSON handling to convert the JSON values into columns (flatten the JSON to turn it into a relational table).
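The CSV-to-JSON step suggested above could look roughly like the following Python sketch. It assumes every file has a header row. The function name is hypothetical, and uploading the resulting JSON lines to the stage is left out.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text):
    """Turn CSV text with a header row into one JSON object (as a string) per data row."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    lines = []
    for row in reader:
        # Key each value by its header name and trim stray whitespace.
        # Entries with a missing header or missing value are skipped.
        record = {
            key.strip(): value.strip()
            for key, value in row.items()
            if key is not None and isinstance(value, str)
        }
        lines.append(json.dumps(record))
    return lines


if __name__ == "__main__":
    sample = "foo, bar, bla\nhi , 007, 42\nbye, 008, 44\n"
    for line in csv_to_json_lines(sample):
        print(line)
```

Because each record is keyed by column name, foo can be picked out regardless of its position or of which other columns are present in a given file.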

SQL: Getting a JSON data value from a row into another column

I have a couple of columns in my table, and one of them is a CLOB containing a JSON object.
I am working on a data extraction mechanism for the table, and I was wondering whether it is possible to create a view with a new column containing a certain value from that JSON (for example, one column has rows with data like ...,"request":{"status":"open",.....} and I want a new column STATUS).
Do you have any ideas how I could achieve this?

You can use JSON_VALUE.
SELECT
JSON_VALUE(jsonInfo,'$.request.status') status
FROM
( VALUES('{"request":{"status":"open"}}') ) J(jsonInfo)
Result:
status
------------
open

populating Hive table from file yields far too many rows

I am creating a Hive table from a file with 8k rows, but the table created has 78k rows. The command line is the following:
bin/hive_executable < my_script.hql
my_script.hql:
create table my_table(k1 t1, k2 t2....);
load data local inpath 'path/to/table_file.txt' INTO TABLE my_table;
table_file.txt:
v1 v2 v3...
I've tried both space and tab delimited fields, and explicitly declaring the structure in the create table statement. When I use example code to create a table from $HIVE_HOME/example/file/kv1.txt, the table and file both have 500 lines / rows.
Any ideas?
Thanks

Strip text fields of all newline characters.
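The reason this helps: Hive's default text format treats every newline as the end of a row, so a field that contains embedded newlines is split across several rows. A minimal Python sketch of the cleanup, assuming the source records are available as lists of field strings before the file is written (the function names are hypothetical):

```python
def strip_newlines(field, replacement=" "):
    """Replace every CR/LF sequence inside a field with a harmless character."""
    return (
        field.replace("\r\n", replacement)
        .replace("\n", replacement)
        .replace("\r", replacement)
    )


def to_hive_line(fields, sep="\t"):
    """Build one physical line per record, with newlines removed from each field."""
    return sep.join(strip_newlines(f) for f in fields)


if __name__ == "__main__":
    # Hypothetical records: the first one has a newline inside its text field.
    records = [["k1", "line one\nline two"], ["k2", "plain text"]]
    output = "\n".join(to_hive_line(r) for r in records)
    print(output)
```

After this step the number of lines in the file equals the number of records, which is what LOAD DATA will count as rows.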

sql dump of data based on selection criteria

When extracting data from a table (schema and data) I can do this by right clicking on the database and by going to tasks->Generate Scripts and it gives me all the data from the table including the create script, which is good.
This gives me all the data from the table, though. Can it be changed to give me only some of the data, e.g. only rows after a certain dtmTimeStamp?
Thanks,

I would recommend extracting your data into a separate table using a query and then using Generate Scripts on that table. Alternatively, you can extract the data separately into a flat file using the export data wizard (include your column headers and use comma separators with double-quote field delimiters).
To make a copy of your table:
SELECT Col1 ,Col2
INTO CloneTable
FROM MyTable
WHERE Col3 = @Condition
(Thanks to @MarkD for adding that)
