I've been trying to store CSV data in a database table using a Pig script.
But instead of inserting the data into a table in the database, I created a new file in the metastore.
Can someone please let me know if it is possible to insert data into a database table with a Pig script, and if so, what that script might look like?
You can take a look at DBStorage, but be sure to register the JDBC jar in your Pig script and declare the UDF.
The documentation for the storage UDF is here:
http://pig.apache.org/docs/r0.12.0/api/org/apache/pig/piggybank/storage/DBStorage.html
You can use:
STORE data INTO 'tablename' USING org.apache.hcatalog.pig.HCatStorer();  -- 'data' stands for the relation you want to store
I need to transform a fairly big database table to CSV with AWS Glue. However, I only need the newest table rows from the past 24 hours. There is a column which specifies the creation date of each row. Is it possible to transform just these rows, without copying the whole table into the CSV file? I am using a Python script with Spark.
Thank you very much in advance!
AWS Glue provides some built-in transforms for processing your data. These transforms can be called from ETL scripts.
Please refer to the link below:
https://docs.aws.amazon.com/glue/latest/dg/built-in-transforms.html
You haven't mentioned the type of database that you are trying to connect to. In any case, for JDBC connections Spark has a query option, in which you can issue a regular SQL query to get only the rows you need.
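As a rough sketch of the query option mentioned above: the helper below builds the JDBC reader options for rows created in the last 24 hours. The JDBC URL, table name, and column name are hypothetical, the timestamp literal syntax is an assumption that varies by database, and the query option needs a Spark version that supports it (2.4 or later).

```python
from datetime import datetime, timedelta


def build_jdbc_options(url, table, date_column, now=None, hours=24):
    """Return Spark JDBC reader options that select only recent rows."""
    now = now or datetime.now()
    cutoff = (now - timedelta(hours=hours)).strftime("%Y-%m-%d %H:%M:%S")
    # Assumption: the database accepts a quoted 'YYYY-MM-DD HH:MM:SS' literal.
    query = f"SELECT * FROM {table} WHERE {date_column} >= '{cutoff}'"
    return {"url": url, "query": query}


def read_recent_rows(spark, options):
    """Load the filtered rows; spark is an existing SparkSession."""
    # The filtering happens in the database, so the full table is not copied.
    return spark.read.format("jdbc").options(**options).load()
```

In a job, something like read_recent_rows(spark, build_jdbc_options(url, "orders", "created_at")).write.csv(path, header=True) would then write only those rows; credentials and driver settings would have to be added to the options as well.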
Currently we drop the table daily and run a script which loads the data into the tables. The script takes 3-4 hours, during which the data is not available. Our aim now is to keep the old Hive data available to analysts until the new data load is complete.
I achieve this in an HQL script by loading the daily data into Hive tables partitioned on load_year, load_month and load_day, and then removing yesterday's data by dropping its partition.
But what is the option for achieving the same in a Pig script? Can we alter the table through a Pig script? I don't want to execute a separate HQL script to drop the partition after Pig.
Thanks
Since HDP 2.3 you can use HCatalog commands inside Pig scripts. Therefore, you can use the HCatalog command to drop a Hive table partition. The following is an example of dropping a Hive partition:
-- Set the correct hcat path
set hcat.bin /usr/bin/hcat;
-- Drop a table partition or execute any other HCatalog command
sql ALTER TABLE midb1.mitable1 DROP IF EXISTS PARTITION(activity_id = "VENTA_ALIMENTACION",transaction_month = 1);
Another way is to use sh command execution inside the Pig script. However, I had some problems escaping special characters in ALTER commands, so the first option is the best one in my opinion.
Regards,
Roberto Tardío
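For the daily case in the question, where the partition to drop changes every day, the statement can be generated rather than hard-coded. This is only a sketch: the table name is hypothetical, and it assumes load_year, load_month and load_day are numeric partition columns as the question suggests.

```python
from datetime import date, timedelta


def previous_day(today=None):
    """Return the date one day before today (or before the given date)."""
    return (today or date.today()) - timedelta(days=1)


def drop_partition_statement(table, day):
    """Build the statement that drops the partition for the given load date."""
    # Assumption: partition values are numeric, so they are not quoted.
    return (
        f"ALTER TABLE {table} DROP IF EXISTS PARTITION("
        f"load_year = {day.year}, load_month = {day.month}, load_day = {day.day});"
    )


# Hypothetical table name, used only for illustration.
print(drop_partition_statement("mydb.mytable", previous_day()))
```

The generated text could then be used as the statement passed to the sql command shown above, or to hcat -e on the command line.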
Is there a way to change a table's database in Hive or HCatalog?
For instance, I have the table foo in the database default, and I want to move this table to the database bar. I tried this, but it doesn't work:
ALTER TABLE foo RENAME TO bar.foo
Thanks in advance
AFAIK there is no way to do this in HiveQL. A ticket was raised long ago, but the issue is still open.
An alternative could be to use the EXPORT/IMPORT feature provided by Hive. With this feature we can export the data of a table, along with its metadata, to an HDFS location using the EXPORT command. The metadata is stored in JSON format. Data exported this way can be imported into another database (even another Hive instance) using the IMPORT command.
More on this can be found in the Hive IMPORT/EXPORT manual.
HTH
Thanks for your response. I found another way to change the database:
USE db1; CREATE TABLE db2.foo LIKE foo;
Using a SQL tool like SQL Developer or Toad for Oracle, is it possible to write a SQL query that does the following?
SELECT * FROM TABLE
WHERE COLUMN1 IN CSV_FILE
The CSV file is just one column of data with no delimiters.
How can I achieve this?
Constraints
I cannot create a temp table to load the CSV file into (no create permissions).
The column I am filtering on is the only indexed column in that table, so I cannot query on other columns or it will be really slow.
Thanks
Creating an external table is the best way. If you don't have permission, the other way is to move the file to the path of an existing Oracle directory (the Oracle DIRECTORY object), then use utl_file in a PL/SQL block to read the file, loop through it and do your operation, which is rather tedious.
See the examples for using utl_file - http://psoug.org/reference/utl_file.html
But it's better if you try to get create access.
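If create access cannot be obtained, one client-side workaround (not something the answers here propose, so treat it as an assumption) is to read the CSV outside the database and generate IN-list queries with bind placeholders. Oracle rejects IN lists with more than 1000 expressions (ORA-01795), so the sketch below splits the values into batches; the table and column names are hypothetical, and executing the statements through a database driver is left out.

```python
import csv
import io


def read_values(csv_file):
    """Read the single CSV column, skipping blank lines."""
    return [row[0].strip() for row in csv.reader(csv_file) if row and row[0].strip()]


def in_queries(values, table, column, batch_size=1000):
    """Yield (sql, params) pairs, one per batch of at most batch_size values."""
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        # Positional bind placeholders (:1, :2, ...) instead of inlined literals.
        placeholders = ", ".join(f":{i + 1}" for i in range(len(batch)))
        yield f"SELECT * FROM {table} WHERE {column} IN ({placeholders})", batch


# Small in-memory sample standing in for the real CSV file.
sample = io.StringIO("A100\nA200\n\nA300\n")
for sql, params in in_queries(read_values(sample), "my_table", "column1", batch_size=2):
    print(sql, params)
```

Each query filters on the indexed column only, so the index can still be used; the results of the batches have to be combined on the client side.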
Toad for Oracle data import (uses sqlldr internally): create a temp table, load the data using this utility, and select the values.
External tables: create an external table, load the data through it, and select the values.
Using SQL Developer you can create a table in your schema and load it with data from a CSV file.
Notes:
You will need to create an empty column for each column you import from Excel.
Excel exports the CSV with ";" as the delimiter.
If SQL Developer (4.1.5) doesn't preview the fields in separate columns, try moving forward and backward with the Next/Back buttons.
There is also a very graphical guide on the following page:
http://www.thatjeffsmith.com/archive/2012/04/how-to-import-from-excel-to-oracle-with-sql-developer/
I want to create an update script for a BLOB column in an Oracle table that stores XSL data. Can anybody help me do this in a simple way, without creating any directory? The content is also longer than 4000 characters.
I have modified the data in TOAD using 'Save to File' and then 'Load to File'. Now I want to transfer it to another database using a SQL script.
Using the Oracle IMP and EXP utilities you can export a table into a file and import it into another database. Here is some information on how to use them:
http://www.orafaq.com/wiki/Import_Export_FAQ
It is not SQL but it also doesn't involve creating directories.