Hi, I'm new to Hive and would definitely appreciate some tips.
I'm trying to export Hive query results as a CSV from the CLI.
I can export them as text using:
hive -e 'set hive.cli.print.header=true; SELECT * FROM TABLE_NAME LIMIT 0;' > /file_path/file_name.txt
Can anyone suggest what I need to add in order to get the columns delimited by ','?
This is how you can do it directly from Hive, instead of going through the sed route.
SET hive.exec.compress.output=FALSE;
SET hive.cli.print.header=TRUE;
INSERT OVERWRITE LOCAL DIRECTORY '/file_path/file_name.txt'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM TABLE_NAME LIMIT 1;
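Note that the path given to INSERT OVERWRITE LOCAL DIRECTORY is treated as a directory: Hive writes one or more part files (typically named like 000000_0) into it. A minimal sketch for collecting them into a single CSV afterwards (the .csv output name is my own choice):
cat /file_path/file_name.txt/* > /file_path/file_name.csv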
You can use the concat_ws() function in your query like this.
For SELECT *:
select concat_ws(',',*) from <table-name>;
Or if you want particular columns:
select concat_ws(',', col_1, col_2, col_3...) from <table-name>;
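Combined with the CLI redirect from the question, a single command then writes the comma-delimited output straight to a file (the column names here are placeholders):
hive -e "select concat_ws(',', col_1, col_2, col_3) from TABLE_NAME;" > /file_path/file_name.csv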
hive -e 'set hive.cli.print.header=true; SELECT * FROM TABLE_NAME LIMIT 0;' > /file_path/file_name.txt && cat /file_path/file_name.txt | sed -e 's/\s/,/g' > /file_path/file_name.formatted.txt
Once your query creates the output file, use sed to replace the whitespace with "," as shown above.
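A caveat worth noting here: the Hive CLI separates columns with tabs, and \s also matches spaces inside column values, so with GNU sed it is usually safer to replace only the tabs:
sed -e 's/\t/,/g' /file_path/file_name.txt > /file_path/file_name.formatted.txt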
I would like to know how I can replace all occurrences (from any column in a table) of \\N with an empty string. I think I should use the REGEXP_REPLACE function, but I've only been able to find examples of it used on one column inside Snowflake.
REGEXP_REPLACE( <subject> , <pattern> [ , <replacement> , <position> , <occurrence> , <parameters> ] )
What you're looking for is not possible natively in SQL. You could do:
update your_table
set col1 = replace(col1, '\\N', ''),
    col2 = replace(col2, '\\N', ''),
    col3 = replace(col3, '\\N', ''),
    ...
I personally prefer the following because I can run the select portion to take a look at my output before making any changes
create or replace table your_table as
select top 0 * --to avoid having to write column names in subsequent select
from your_table
union all
select replace(col1, '\\N', ''),
       replace(col2, '\\N', ''),
       replace(col3, '\\N', ''),
       ...
from your_table
You can generate the SQL to operate on each column using SHOW COLUMNS, then build a set of UPDATE statements from its output with RESULT_SCAN(LAST_QUERY_ID()):
show columns in table mytable;
select 'update mytable set ' || "column_name" || ' = replace(' || "column_name" || ',''\\\\N'','''');' from TABLE(RESULT_SCAN(LAST_QUERY_ID()));
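Each row of that result is a ready-to-run statement; for a hypothetical column col_1 the generated SQL would look like:
update mytable set col_1 = replace(col_1,'\\N','');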
My issue was that Snowflake by default replaces NULL values with \N, which is why the file exported to my S3 bucket contained double-escaped \N sequences. The problem wasn't with the table itself but with the export; setting the file_format option to override the default NULL representation with an empty string did the trick:
copy into <#s3..> from <view> header = true max_file_size = 5368709120 single = true overwrite = true file_format = (TYPE='CSV' COMPRESSION='NONE' NULL_IF=());
I'm running PSQL exports to CSV files for a few tables. They look like the below:
COPY table_name TO 'file_name' CSV
The issue is that some of these tables have text fields whose values contain both the delimiter (commas) and newlines. What would be the best way to do the export while removing the newlines across all columns?
Example table:
field1,field2,field3,field4
field1,field2,"field3, with, the delimiter",field4
field1,field2,"field3, with, the
delimiter and newline",field4
field1,"field2 with a
newline",field3,field4
How I'd want my export to look:
field1,field2,field3,field4
field1,field2,"field3, with, the delimiter",field4
field1,field2,"field3, with, the delimiter and newline",field4
field1,"field2 with a newline",field3,field4
Some solutions I've been considering:
Write a custom regex replace function and update the tables before I do the export (see the sketch just after this question).
See if there is a way to do the replace during the export transaction (is this possible?).
Perform the export as is and use another library/language to post-process the exported CSV.
Thanks for the help!
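To illustrate that first option, a per-column UPDATE could collapse the newlines before the COPY runs; a sketch only, using the column field3 from the example and PostgreSQL's regexp_replace (each text column would need its own assignment):
UPDATE table_name
SET field3 = regexp_replace(field3, E'[\n\r]+', ' ', 'g');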
You can automatically compose an appropriate COPY statement with this SQL statement:
SELECT format(
'COPY (SELECT %s FROM %I.%I) TO ''filename'' (FORMAT ''csv'');',
string_agg(
format(
CASE WHEN data_type IN ('text', 'character varying', 'character')
THEN 'translate(%I, E''\n,'', '''')'
ELSE '%I'
END,
column_name
),
', '
ORDER BY ordinal_position
),
table_schema,
table_name
)
FROM information_schema.columns
WHERE table_schema = 'schema_name'
AND table_name = 'table_name'
GROUP BY table_schema, table_name;
If you are using psql, you can replace the final semicolon with \gexec to have psql run the resulting SQL statement for you in one go.
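For the example table in the question (assuming the four columns field1 through field4 are all text and the table lives in schema_name.table_name), the generated statement would come out roughly as:
COPY (SELECT translate(field1, E'\n,', ''), translate(field2, E'\n,', ''), translate(field3, E'\n,', ''), translate(field4, E'\n,', '') FROM schema_name.table_name) TO 'filename' (FORMAT 'csv');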
I have an EXTERNAL Hive table stored in LZO format. There are rows in this table, but I can't get the data with "select *". There must be some problem with my table format, but I don't know how to fix it.
CREATE EXTERNAL TABLE tableName(
column1 string
)
PARTITIONED BY (
column2 string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://.../tableName'
select count(*) from tableName;   -- returns 1
select * from tableName;   -- returns nothing
select column1, column2 from tableName group by column1, column2;   -- returns data1 data2
select * from tableName where column2='data2';   -- returns nothing
Only "select * " return nothing. Maybe "select * " is not executed through map-reduce?
I find that the simple SQL (without count, sum, group by and so on) will not be executed by map-reduce, that will go through fetch job (directly read hdfs file). However, My hdfs file is stored in lzo format, there will be some problem to read it.
The one solution is forcing simple SQL go to map-reduce.
set hive.fetch.task.conversion=none;
I'm sorry, here is the real reason: I forgot to set the parameters for LZO, so the HDFS files are actually in TEXTFILE format :( The missing parameters were:
set hive.exec.compress.output=true;
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
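With those parameters in place before writing, the files produced actually are LZO and match the declared input format; a rough sketch of rewriting one partition (staging_table is an assumed copy of the source data):
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
INSERT OVERWRITE TABLE tableName PARTITION (column2='data2')
SELECT column1 FROM staging_table;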
I am using SQL*Loader to load multiple CSV files into one table.
The process I found is very easy, like:
LOAD
DATA
INFILE '/path/file1.csv'
INFILE '/path/file2.csv'
INFILE '/path/file3.csv'
INFILE '/path/file4.csv'
APPEND INTO TABLE TBL_DATA_FILE
EVALUATE CHECK_CONSTRAINTS
REENABLE DISABLED_CONSTRAINTS
EXCEPTIONS EXCEPTION_TABLE
FIELDS TERMINATED BY ","
OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
COL0,
COL1,
COL2,
COL3,
COL4
)
But I don't want to use INFILE multiple times, because if I have more than 1000 files I would have to mention INFILE 1000 times in the control file script.
So my question is: is there any other way (like a loop, or a *.csv wildcard) to load multiple files without using multiple INFILE clauses?
Thanks,
Bithun
Solution 1: Concatenate the 1000 files into one big file, which is then loaded by SQL*Loader. On Unix, I'd use something like:
cd path
cat file*.csv > all_files.csv
Solution 2: Use external tables and load the data using a PL/SQL procedure:
CREATE PROCEDURE myload AS
BEGIN
FOR i IN 1 .. 1000 LOOP
EXECUTE IMMEDIATE 'ALTER TABLE xtable LOCATION ('''||to_char(i,'FM9999')||'.csv'')';
INSERT INTO mytable SELECT * FROM xtable;
END LOOP;
END;
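For this to work, xtable has to be an external table whose column list matches the CSV layout; a minimal sketch, in which the directory object csv_dir and the column definitions are assumptions:
CREATE TABLE xtable (
  col0 VARCHAR2(4000),
  col1 VARCHAR2(4000)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY csv_dir   -- directory object pointing at the folder that holds the CSV files (assumed)
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  )
  LOCATION ('1.csv')          -- swapped out on each iteration by the ALTER TABLE in the loop
);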
You can use wildcards (? for a single character, * for any number of characters) like this:
infile 'file?.csv'
;)
Loop over the files from the shell:
#!/bin/bash
for csvFile in `ls file*.csv`
do
ln -s $csvFile tmpFile.csv
sqlldr control=file_pointing_at_tmpFile.ctl
rm tmpFile.csv
done
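The control file referenced in the loop (file_pointing_at_tmpFile.ctl, name taken from the script) would then point at the symlink rather than at a concrete data file; a sketch reusing the table and columns from the question:
LOAD DATA
INFILE 'tmpFile.csv'
APPEND INTO TABLE TBL_DATA_FILE
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(COL0, COL1, COL2, COL3, COL4)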
OPTIONS (skip=1)
LOAD DATA
INFILE /export/home/applmgr1/chalam/Upload/*.csv
REPLACE INTO TABLE XX_TEST_FTP_UP
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(FULL_NAME,EMPLOYEE_NUMBER)
Will this pick up all of the CSV files and load their data, or not?
I have a bunch of emails, each separated by a comma (not a CSV file). I want to upload them to a database table (with a single field, email) so that each email goes into a separate record. What would be the easiest way to do that? My idea is to use grep to replace the commas with SQL syntax, but I'm looking for other approaches. Any ideas?
Perhaps something like:
LOAD DATA INFILE '/where/the/file/is'
INTO TABLE `table`
FIELDS TERMINATED BY ','
LINES STARTING BY ''
(email);
Syntax docs here: http://dev.mysql.com/doc/refman/5.1/en/load-data.html
I'd use shell tools like sed or awk to convert the input format to something that mysqlimport can handle.
Convert the current comma-separated email list to a one-email-per-line list:
tr ',' '\n' < inputfilename > outputfilename
Use LOAD DATA INFILE after logging into MySQL; make sure your table only has one column in this case:
load data infile 'outputfilename' into table tablename;
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
MySQL supports multiple inserts in a single statement:
INSERT INTO [Table] ([col1], [col2], ... [colN] )
VALUES ([value1], [value2], ... [valueN] )
, ([value1], [value2], ... [valueN] )
, ([value1], [value2], ... [valueN] )
;
You could pretty quickly format a comma-separated file into this format.
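For instance, assuming the list lives in a hypothetical emails.txt and the target table is emails(email) (both names made up for this sketch), a short shell pipeline could build that statement; it does not handle quotes inside addresses:
{
  echo "INSERT INTO emails (email) VALUES"
  tr ',' '\n' < emails.txt | sed "s/.*/('&'),/" | sed '$ s/,$/;/'
} > insert.sql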