I have a data file where, in one column, the values carry a prefix 'D', like 'D1234'. I need to load that data into a table without the D, so the table should contain '1234'. Is there a way to strip the prefix D while loading the data? Thanks in advance.
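The question does not say which database is being used, so this is only a minimal sketch assuming a PostgreSQL-style SQL dialect; the staging table, target table, and column names (staging_data, target_table, raw_value, clean_value) are hypothetical. The idea is to land the raw value first and strip the leading D on insert:
-- hypothetical staging table holding the raw file contents
CREATE TABLE staging_data (raw_value text);
-- load the file into staging_data with your usual loader, then:
INSERT INTO target_table (clean_value)
SELECT regexp_replace(raw_value, '^D', '')   -- drop a single leading 'D'
FROM staging_data;
If the loader itself supports transformations during the load, the same regexp_replace (or a substring starting at the second character) could be applied there instead of using a staging table.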
Related
I am using the Greenplum external table feature to create a temporary readable table that loads all files from a specific folder in S3. Here is how the external table is created:
CREATE READABLE EXTERNAL TEMPORARY TABLE app_json_ext (json_data text)
LOCATION ('s3://some_s3_location/2021/06/17/ config=some_config.conf')
FORMAT 'TEXT' (DELIMITER 'OFF' null E'' escape E'\t')
LOG ERRORS SEGMENT REJECT LIMIT 100 PERCENT;
with file names in S3 like:
appData_A2342342342.json
(where A2342342342 is the ID of the JSON file)
Full path based on above example would be:
s3://some_s3_location/2021/06/17/appData_A2342342342.json
Note that the ext table only contains a single column (the JSON data).
I would like to have another column for the ID of the JSON file (in this case A2342342342)
How would I set up a create ext table statement so that I can grab the file name from S3 and parse it for the JSON_ID column value?
Something like this:
CREATE READABLE EXTERNAL TEMPORARY TABLE app_json_ext (json_id text, json_data text)
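This question is left unanswered here, but the parsing half on its own is straightforward: assuming the filename were available as a text value (which is exactly what the question is asking how to achieve), a PostgreSQL-style regex in Greenplum could pull the ID out of it. The literal below just restates the example filename:
SELECT regexp_replace('appData_A2342342342.json', '^appData_(.*)\.json$', '\1') AS json_id;
-- json_id: A2342342342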
Context
I am receiving CSV files in S3, which do not always follow the same schema and/or order. For example, sometimes files look like:
foo, bar, bla
hi , 007, 42
bye, 008, 44
But other times, they can look like (bar can be missing):
foo, bla
hi , 42
bye, 44
Now let's say I'm only interested in getting the foo column, regardless of what else is there. But I can't really count on the order of the columns in the CSV, so on some days foo could be the first column and on other days it could be the third. By the way, I am using Snowflake as the database.
What I have tried to do
I created a destination table like:
CREATE TABLE woof.meow (foo TEXT);
Then I tried to use Snowflake's COPY INTO command to copy data from the CSV into the table I created. The catch here is that I tried to do it the same way I normally do for Parquet files (matching by column names!), like:
COPY INTO woof.meow
FROM '@STAGES.MY_S3_BUCKET_STAGE/'
file_format = (
  TYPE = CSV,
  COMPRESSION = GZIP
)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
But sadly I always got: error: Insert value list does not match column list expecting 1 but got 0
Some research led me to this section of the docs (about MATCH_BY_COLUMN_NAME), where I discovered that CSV is not supported:
This copy option is supported for the following data formats:
- JSON
- Avro
- ORC
- Parquet
Desired objective
How can I copy data from the stage (containing CSV files on S3) to a pre-created table, based on column names?
I am happy to provide any further information if needed.
You are trying to insert CSV (comma-separated values) data into a single text column. To my knowledge, the column order in your source data files has to match the column order of the target table you created in Snowflake; that means if you have foo, bar and bla as columns in the source CSV file, your target table must also have those columns, created separately and in the same order as in the source files.
If you are unsure which columns may arrive in your source file, I would recommend transforming the file to JSON (that is my choice; you could pick another option such as Avro) and loading that content into a VARIANT column in Snowflake.
This way you would not have to worry much about the column order in the source files: you would store the data as JSON/Avro in the target table and use JSON handling to convert the JSON values into columns (flatten the JSON to turn it into a relational table).
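A minimal sketch of the VARIANT approach described above, assuming the incoming rows have already been converted to JSON objects keyed by column name; the raw table name and stage path below are hypothetical:
-- land every record as a JSON object in a single VARIANT column
CREATE TABLE woof.meow_raw (json_data VARIANT);
COPY INTO woof.meow_raw
FROM @STAGES.MY_S3_BUCKET_STAGE/converted/
FILE_FORMAT = (TYPE = JSON);
-- pick out foo by name, no matter where it sat in the original file
INSERT INTO woof.meow (foo)
SELECT json_data:foo::TEXT
FROM woof.meow_raw;
With the data stored as VARIANT, the colon path syntax (or FLATTEN) can pull out whichever columns happen to exist in a given file.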
I have a dataset (a txt file) with 10 columns, in which the last column holds string data that is itself separated by tabs, for example: abcdef lkjhj pqrst...wxyz
I created a new table defining column 10 as STRING, but after loading the data into this table and verifying it, only abcdef is populated in the last column and the rest is ignored.
Please, can someone help with how to load the entire string into the Hive table? Do I need to write a UDF?
Thanks in advance
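No answer is included for this one, but as an illustration of the likely cause (the DDL below is an assumption, not the poster's actual table): if the table is declared with tab as the field delimiter, Hive's default text SerDe splits the embedded tabs in the last column into further fields and ignores anything beyond the declared columns, which would leave only abcdef in column 10.
-- hypothetical table, shortened to three columns for brevity
CREATE TABLE example_data (
  col1  STRING,
  col2  STRING,
  col10 STRING          -- intended to hold the full tab-separated string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
-- a line such as "a<TAB>b<TAB>abcdef<TAB>lkjhj<TAB>wxyz" is split on every tab,
-- so col10 receives only "abcdef" and the remaining tokens are dropped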
I am creating a Hive table from a file with 8k rows, but the table created has 78k rows. The command line is the following:
bin/hive_executable < my_script.hql
my_script.hql:
create table my_table(k1 t1, k2 t2....);
load data local inpath 'path/to/table_file.txt' INTO TABLE my_table;
table_file.txt:
v1 v2 v3...
I've tried both space and tab delimited fields, and explicitly declaring the structure in the create table statement. When I use example code to create a table from $HIVE_HOME/example/file/kv1.txt, the table and file both have 500 lines / rows.
Any ideas?
Thanks
Strip the text fields of all newline characters. Embedded newlines in the data make Hive treat each line of the text file as a new row, which is why the loaded table ends up with far more rows than the source file.
When extracting data from a table (schema and data), I can do this by right-clicking the database and going to Tasks -> Generate Scripts, which gives me all the data from the table including the CREATE script, which is good.
This, though, gives me all of the data in the table. Can it be changed so that it gives me only some of the data, e.g. only rows after a certain dtmTimeStamp?
Thanks,
I would recommend extracting your data into a separate table using a query and then running Generate Scripts on that table. Alternatively, you can extract the data separately into a flat file using the Export Data wizard (include your column headers and use comma separators with double-quote field delimiters).
To make a copy of your table:
SELECT Col1, Col2
INTO CloneTable
FROM MyTable
WHERE Col3 = @Condition
(Thanks to @MarkD for adding that)