What does 'insert overwrite local directory' mean in Hive?

I'm having some issues understanding what the following type of query does:
insert overwrite local directory $directory_name$
select $some_query$
What does this mean, and what are the side effects?

Export the query results into a file on the local file system
insert overwrite local directory '/tmp/hello'
row format delimited
fields terminated by '|'
select 1,2,3,'Hello','world'
;
! ls /tmp/hello;
000000_0
! cat /tmp/hello/000000_0;
1|2|3|Hello|world
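To answer the "side effects" part: local means the result files are written to the local file system of the machine where Hive runs (not to HDFS), the directory is created if it does not exist, and overwrite means any files already in that directory are deleted and replaced by the query's result files. So point it at a scratch directory you are happy to lose. A minimal sketch, with made-up table and column names:
-- everything already under /tmp/hello is removed before the result files are written
insert overwrite local directory '/tmp/hello'
row format delimited
fields terminated by '|'
select id, name
from some_table   -- hypothetical table
;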

Related

Bash script to import multiple CSV files into a MySQL database using the load data local infile command

I have multiple CSV files stored in one folder, and I need to loop over that folder, pick up the CSV files and load them into a database table.
The script needs to be written in Bash with parameterized fields like InputFolderPath (to loop over the CSV files), DatabaseConnection, SchemaName and TableName, and then pass these fields to the
LOAD DATA LOCAL INFILE command.
This worked for me,
# Loop over every CSV file in the folder and load each one with LOAD DATA LOCAL INFILE
for f in /var/www/path_to_your_folder/*.csv
do
mysql -e "use database_name" -e "
load data local infile '"$f"' into table your_table_name FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' ESCAPED BY '\"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES (column_name1, @date_time_variable1, column_name3)
SET some_column_name_which_contains_date = STR_TO_DATE(@date_time_variable1, '%d-%m-%Y');" -u your_mysql_username_here -p --local-infile=1
echo "Done: '"$f"' at $(date)"
done
This script will prompt for the MySQL password.
I am using this script on EC2 + Ubuntu.
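Since the question asks for parameterized fields, a rough sketch of the same loop with the folder, host, schema and table passed as arguments might look like this (argument handling, names and column mappings are assumptions, not tested):
#!/bin/bash
# Usage (hypothetical): ./load_csvs.sh <InputFolderPath> <DbHost> <SchemaName> <TableName>
INPUT_FOLDER_PATH="$1"
DB_HOST="$2"
SCHEMA_NAME="$3"
TABLE_NAME="$4"
for f in "$INPUT_FOLDER_PATH"/*.csv
do
  # -p prompts for the password for each file, as in the original script
  mysql --local-infile=1 -h "$DB_HOST" -u your_mysql_username_here -p "$SCHEMA_NAME" -e "
    LOAD DATA LOCAL INFILE '$f' INTO TABLE $TABLE_NAME
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES;"
  echo "Done: $f at $(date)"
done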

PostgreSQL - ERROR: missing data for column "XX" SQL state: 22P04

I am trying to import a TXT file into a PostgreSQL database table, but I am getting an error:
ERROR:
missing data for column "bts_name"
SQL state: 22P04
My code is:
COPY indicadores2g (
Daily,
BTS_NAME,
SITE_CODE
)
FROM 'C:\Users\Public\Documents\GEO_2G_CELL.txt'
WITH CSV HEADER DELIMITER ' ' NULL AS '' ;
I know that the problem is in the txt file: the last two lines are blank, and when I remove them, the SQL runs without problems.
My problem is that I need to import this file every day. Is there any rule I can put in my SQL code so it runs without problems?
Another way to make it work is to open the TXT in Excel and save it as CSV. Can I do this automatically?
Create a simple batch file (for example inpfixer.bat):
@echo off
rem for /f skips empty lines, so the blank lines at the end of the file are dropped
for /f "delims=" %%a in (%1) do (
echo %%a
)
Then
COPY indicadores2g (
Daily,
BTS_NAME,
SITE_CODE
)
FROM PROGRAM 'inpfixer.bat C:\Users\Public\Documents\GEO_2G_CELL.txt'
WITH CSV HEADER DELIMITER ' ' NULL AS '' ;
Of course, inpfixer.bat must be available via PATH on the database server, since FROM PROGRAM runs the command there.
Disclaimer: tested under Wine.
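If running the batch file on the database server is not possible (COPY ... FROM PROGRAM executes the command on the server and needs the corresponding privileges), a similar approach should work with psql's client-side \copy, which accepts the same program source but runs it on the client machine. An untested sketch, reusing the table and file from the question:
-- \copy must be written on a single line in psql
\copy indicadores2g (Daily, BTS_NAME, SITE_CODE) from program 'inpfixer.bat C:\Users\Public\Documents\GEO_2G_CELL.txt' with csv header delimiter ' ' null as ''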

PSQL lo_import in client side script

We have a simple SQL script we maintain that sets up the schema and populates a set of text/example values - so it's just create table, create table, insert into table... and we run it with a simple shell script which calls psql.
One of our tables requires files - what I wanted to do was just have the files in the same directory as the script and do something like insert into repository (id, picture) values ('first', lo_import('first.jpg')),
but I get errors saying I must be superuser to use server-side lo_import. Is there any way I can achieve this? I have just a .sql file and a bunch of image files, and I want to import them by running psql against the file.
Running as superuser is not an option.
Using psql, you could write a shell script like
# \lo_import prints a line like "lo_import 16384"; tail and cut pull out the new object's OID
oid=`psql -At -c "\lo_import 'first.jpg'" | tail -1 | cut -d " " -f 2`
psql -Aqt -c "INSERT INTO repository (id, picture) values ('first', $oid)"
because comments can't have code - thanks to Laurenz, I got it "working" like this:
drop table if exists some_landing_table;
create table some_landing_table( load_time timestamp, filename varchar, data bytea);
\set the_file 'example.jpg';
\lo_import 'example.jpg';
insert into some_landing_table
select now(), 'example.jpg', string_agg(data,decode('','escape') order by pageno)
from
pg_largeobject
where
loid = (select max(loid) from pg_largeobject);
select lo_unlink( max(loid) ) from pg_largeobject;
However, that is ugly for a few reasons -
I don't seem to be able to get the result of \lo_import into a variable in any way; even though select \lo_import filename works, select \lo_import filename into x doesn't.
I can't use a variable - if I do \lo_import :the_file - it just says example.jpg doesn't exist - even though if I put the name in directly it works perfectly.
I can't find a simpler way of providing a 0 length bytea field than decode('','escape').
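Regarding the first point: on many psql versions the OID returned by \lo_import is also stored in the psql variable LASTOID, which can then be referenced in a following INSERT. An untested sketch, using the repository table from the question:
\lo_import 'example.jpg'
-- :LASTOID holds the OID of the large object that was just imported
insert into repository (id, picture) values ('example', :LASTOID);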

How to load XML data file into Hive table?

While loading an XML data file into a Hive table I got the following error message:
FAILED: SemanticException 7:9 Input format must implement InputFormat. Error encountered near token 'StoresXml'.
The way I am loading the XML file is as follows:
Create a table StoresXml:
CREATE EXTERNAL TABLE StoresXml (storexml string)
STORED AS INPUTFORMAT 'org.apache.mahout.classifier.bayes.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/hive/warehouse/stores';
The location /user/hive/warehouse/stores is in HDFS.
load data inpath <local path where the xml file is stored> into table StoresXml;
Now, the problem is that when I select any column from table StoresXml, the above-mentioned error comes up.
Please help me with it. Where am I going wrong?
1) First you need to create a single-column table like
CREATE TABLE xmlsample(xml string);
2) After that you need to load the data from local/HDFS into the Hive table like
LOAD DATA INPATH '---------' INTO TABLE XMLSAMPLE;
3) Next, query it using xpath, xpath_array, xpath_string, etc., as in the sample XML queries below.
I have just loaded this transactions.xml file into a Hive table using xpath.
For the XML file:
Bring the records of the XML file onto one line each:
terminal> cat /home/cloudera/Desktop/Test/Transactions_xml.xml | tr -d '&' | tr '\n' ' ' | tr '\r' ' ' | sed 's|</record>|</record>\n|g' | grep -v '^\s*$' > /home/cloudera/Desktop/trx_xml.xml;
terminal> hadoop fs -put /home/cloudera/Desktop/trx_xml.xml /user/cloudera/DataTest/Transactions_xml
hive>create table Transactions_xml1(xmldata string);
hive>load data inpath '/user/cloudera/DataTest/Transactions_xml' overwrite into table Transactions_xml1;
hive>create table Transactions_xml(trx_id int,account int,amount int);
hive>insert overwrite table Transactions_xml select xpath_int(xmldata,'record/Tid'),
xpath_int(xmldata,'record/AccounID'),
xpath_int(xmldata,'record/Amount') from Transactions_xml1;
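For text-valued elements the same pattern works with xpath_string; a hypothetical example (the Description element is assumed, not taken from the file above):
hive>select xpath_string(xmldata,'record/Description') from Transactions_xml1;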
I hope this will help you. Let me know the result.
I have developed a tool to generate Hive scripts from a CSV file. Following are a few examples of how the files are generated.
Tool -- https://sourceforge.net/projects/csvtohive/?source=directory
Select a CSV file using Browse and set the Hadoop root directory, e.g. /user/bigdataproject/
The tool generates a Hadoop script with all CSV files; the following is a sample of the generated Hadoop script to insert the CSVs into Hadoop:
#!/bin/bash -v
hadoop fs -put ./AllstarFull.csv /user/bigdataproject/AllstarFull.csv
hive -f ./AllstarFull.hive
hadoop fs -put ./Appearances.csv /user/bigdataproject/Appearances.csv
hive -f ./Appearances.hive
hadoop fs -put ./AwardsManagers.csv /user/bigdataproject/AwardsManagers.csv
hive -f ./AwardsManagers.hive
Sample of generated Hive scripts
CREATE DATABASE IF NOT EXISTS lahman;
USE lahman;
CREATE TABLE AllstarFull (playerID string,yearID string,gameNum string,gameID string,teamID string,lgID string,GP string,startingPos string) row format delimited fields terminated by ',' stored as textfile;
LOAD DATA INPATH '/user/bigdataproject/AllstarFull.csv' OVERWRITE INTO TABLE AllstarFull;
SELECT * FROM AllstarFull;
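One thing to watch with a generated script like this: LOAD DATA also loads the CSV header row as data. If that matters, the table can be told to skip it, for example (a sketch, assuming Hive 0.13 or later):
CREATE TABLE AllstarFull (playerID string,yearID string,gameNum string,gameID string,teamID string,lgID string,GP string,startingPos string)
row format delimited fields terminated by ','
stored as textfile
TBLPROPERTIES ("skip.header.line.count"="1");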
Thanks
Vijay

Sql Bulk Insert -- File does not exist

I have the following query to insert into a table
BULK
INSERT tblMain
FROM 'c:\Type.txt'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
GO
I get the message
Msg 4860, Level 16, State 1, Line 1
Cannot bulk load. The file "c:\Type.txt" does not exist.
The file is clearly there. Anything I may be overlooking?
Look at that:
Cannot bulk load. The file "c:\Type.txt" does not exist
Is that file on the SQL Server's C:\ drive??
SQL BULK INSERT etc. always works only with paths as seen from the SQL Server machine. Your SQL Server cannot reach onto your own local drive.
You need to put the file onto the SQL Server's C:\ drive and try again.
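If copying the file onto the server is not practical, a UNC path that the SQL Server service account can read also works; a sketch with a made-up share name:
BULK INSERT tblMain
FROM '\\YourWorkstation\SharedFolder\Type.txt'  -- UNC path as seen from the SQL Server machine
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
GO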
Bulk import utility syntax is described here
http://msdn.microsoft.com/en-us/library/ms188365.aspx
> BULK INSERT [ database_name . [ schema_name ] . | schema_name . ]
> [ table_name | view_name ]
> FROM 'data_file'
> [ WITH
> (
The note on the data_file argument says:
' data_file '
Is the full path of the data file that contains data to import into
the specified table or view. BULK INSERT can import data from a disk
(including network, floppy disk, hard disk, and so on).
data_file must specify a valid path from the server on which SQL
Server is running. If data_file is a remote file, specify the
Universal Naming Convention (UNC) name. A UNC name has the form
\\Systemname\ShareName\Path\FileName. For example,
\\SystemX\DiskZ\Sales\update.txt.
I've had this problem before. In addition to checking the file path, you'll want to make sure you're referencing the correct file name and file type. Make sure this is indeed a text file saved in the source location and not a Word file, etc. I got tripped up with .doc and .docx. This is a newb mistake of mine to make, but hey, it can happen. Changing the file type fixed the problem.