ORA delete / truncate - sql

I'm using SQL loader to load my data into database.
Before I insert the data I need to remove existing data in the table:
options(skip=1,load=250000,errors=0,ROWS=30000,BINDSIZE=10485760)
load data
infile 'G:1.csv' "str '^_^'"
replace
into table IMPORT_ABC
fields terminated by "," OPTIONALLY ENCLOSED BY '"'
trailing nullcols(
.
.
.
.)
But I got an error like:
SQL*LOADER-926: OCI error while executing delete/truncate for table IMPORT_ABC
ORA-30036: unable to extend segment by 8 in undo tablespace 'undo1'
How can I delete the data in batches, for example 10000 rows at a time?
I know that I have some limits on my DB.

Deleting records in batches can be done in a PL/SQL loop, but it is generally considered bad practice because the entire delete should normally be treated as a single transaction; and it can't be done from within the SQL*Loader control file anyway. Your DBA should size the UNDO space to accommodate the work you need to do.
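If you accept that trade-off, a minimal sketch of such a loop, run in SQL*Plus or another client before the load (it assumes the 10,000-row batch size from your question and the IMPORT_ABC table from your control file):
begin
  loop
    delete from import_abc
    where rownum <= 10000;      -- remove at most 10,000 rows per pass
    exit when sql%rowcount = 0; -- stop once the table is empty
    commit;                     -- release the undo used by this batch
  end loop;
  commit;
end;
/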
If you're deleting the entire table you'll almost certainly be better off truncating anyway, either in the control file:
options(skip=1,load=250000,errors=0,ROWS=30000,BINDSIZE=10485760)
load data
infile 'G:1.csv' "str '^_^'"
truncate
into table IMPORT_ABC
...
Or as a separate truncate statement in SQL*Plus/SQL Developer/some other client before you start the load:
truncate table import_abc;
The disadvantage is that your table will appear empty to other users while the new rows are being loaded, but if it's a dedicated import area (guessing from the name) that may not matter anyway.
If your UNDO is really that small then you may have to run multiple loads, in which case - probably obviously - you need to make sure you only have the truncate in the control file for the first one (or use the separate truncate statement), and have append instead in subsequent control files as you noted in comments.
You might also want to consider external tables if you're using this data as a base to populate something else, as there is no UNDO overhead on replacing the external data source. You'll probably need to talk to your DBA about setting that up and giving you the necessary directory permissions.
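As a very rough sketch, assuming your DBA creates a directory object (called IMPORT_DIR here) pointing at the folder holding the CSV, an external table over the same file might look like:
create table import_abc_ext (
  col1 varchar2(100),
  col2 varchar2(100)
  -- ... remaining columns to match the CSV
)
organization external (
  type oracle_loader
  default directory import_dir
  access parameters (
    records delimited by newline
    skip 1
    fields terminated by ',' optionally enclosed by '"'
    missing field values are null
  )
  location ('1.csv')
)
reject limit unlimited;
The column names and sizes here are placeholders; they would need to match your real file.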

Your undo tablespace is too small to hold all the undo information, and it seems it cannot be extended.
You can split the import into smaller batches and issue a commit after each batch, or get your DBA to increase the undo1 tablespace.
And use truncate instead of replace before you start the imports.

Related

Will truncate in ScyllaDB also delete data on other nodes

I have three nodes running, which contain some data we need to purge. Since we cannot identify the records, we would like to truncate the table and fill it from scratch.
Now I was wondering: when I issue a truncate statement, will this only truncate the current instance or will this also clear the other nodes?
So basically, do I have to issue the truncate statement on each node and also fill it there, or is it sufficient to do it on one node and have it propagate to the others? We would load the data from a CSV file via the COPY command.
The table will be truncated in the entire cluster.
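So you only need to run it once, from any node; a rough cqlsh sketch, where the keyspace, table and column names are placeholders only:
-- run once, from any node; the truncate applies to the whole cluster
TRUNCATE my_keyspace.my_table;
-- reload from CSV with cqlsh's COPY command
COPY my_keyspace.my_table (id, col1, col2) FROM 'data.csv' WITH HEADER = TRUE;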

Oracle Bulk Insert Using SQL Developer

I have recently taken data dumps from an Oracle database.
Many of them are large in size (~5GB). I am trying to insert the dumped data into another Oracle database by executing the following SQL in SQL Developer:
@C:\path\to\table_dump1.sql;
@C:\path\to\table_dump2.sql;
@C:\path\to\table_dump3.sql;
:
but it is taking a very long time, more than a day to complete even a single SQL file.
Is there any better way to get this done faster?
SQL*Loader is my favorite way to bulk load large data volumes into Oracle. Use the direct path insert option for maximum speed, but understand the impacts of direct-path loads (for example, all data is inserted past the high water mark, which is fine if you truncate your table). It even has a tolerance for bad rows, so if your data has "some" mistakes it can still work.
SQL*Loader can suspend indexes and build them all at the end, which makes bulk inserting very fast.
Example of a SQL*Loader call:
$SQLDIR/sqlldr /@MyDatabase direct=false silent=feedback \
control=mydata.ctl log=/apps/logs/mydata.log bad=/apps/logs/mydata.bad \
rows=200000
And the mydata.ctl would look something like this:
LOAD DATA
INFILE '/apps/load_files/mytable.dat'
INTO TABLE my_schema.my_table
FIELDS TERMINATED BY "|"
(ORDER_ID,
ORDER_DATE,
PART_NUMBER,
QUANTITY)
Alternatively... if you are just copying the entire contents of one table to another, across databases, you can do this if your DBA sets up a DBlink (a 30 second process), presupposing your DB is set up with the redo space to accomplish this.
truncate table my_schema.my_table;
insert into my_schema.my_table
select * from my_schema.my_table#my_remote_db;
Using the /*+ append */ hint on the insert can still make use of direct-path insert.
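For example (a sketch only, reusing the hypothetical table and database link names from above):
truncate table my_schema.my_table;
insert /*+ append */ into my_schema.my_table
select * from my_schema.my_table@my_remote_db;
commit;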

Hive: create table and write it locally at the same time

Is it possible in hive to create a table and have it saved locally at the same time?
When I get data for my analyses, I usually create temporary tables to track any
mistakes in the queries/scripts. Some of these are just temporary tables, while others contain the data that I actually need for my analyses.
What I usually do is run hive -e "select * from db.table" > filename.tsv to get the data locally; however, when the tables are big this can take quite some time.
I was wondering if there is some way in my script to create the table and save it locally at the same time. Probably this is not possible, but I thought it is worth asking.
Honestly, doing it the way you are is the better of the two options, but it is worth noting that you can perform a similar task in an .hql file for automation.
Using syntax like this:
INSERT OVERWRITE LOCAL DIRECTORY '/home/user/temp' select * from table;
You can run a query and store the result somewhere in the local directory (as long as there is enough space and the correct privileges).
A disadvantage to this is that with the pipe you get the data stored nicely as '|'-delimited, newline-separated text, but this method will store the values with Hive's default delimiter (I think '^A').
A workaround is to do something like this:
INSERT OVERWRITE LOCAL DIRECTORY '/home/user/temp'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
select books from table;
But this is only available in Hive 0.11 or higher.

Bulk Insert with Limited Disk Space

I have a bit of a strange situation, and I'm wondering if anyone has any ideas on how to proceed.
I'm trying to bulk load a 48 gig pipe-delimited file into a table in SQL Server 2008, using a pretty simple bulk insert statement.
BULK INSERT ItemMovement
FROM 'E:\SQLexp\itemmove.csv'
WITH (DATAFILETYPE = 'char', FIELDTERMINATOR = '|', ROWTERMINATOR = '\n' )
Originally, I was trying to load directly into the ItemMovement table. But unfortunately, there's a primary key violation somewhere in this giant file. I created a temporary table to load this file to instead, and I'm planning on selecting distinct rows from the temporary table and merging them into the permanent table.
However, I keep running into space issues. The drive I'm working with is a total of 200 gigs, and 89 gigs are already devoted to both my CSV file and other database information. Every time I try to do my insertion, even with my recovery model set to "Simple", I get the following error (after 9.5 hours of course):
Msg 9002, Level 17, State 4, Line 1
The transaction log for database 'MyData' is full due to 'ACTIVE_TRANSACTION'.
Basically, my question boils down to two things.
Is there any way to load this file into a table that won't fill up the drive with logging? Simple Recovery doesn't seem to be enough by itself.
If we do manage to load up the table, is there a way to do a distinct merge that removes the items from the source table while it's doing the query (for space reasons)?
Appreciate your help.
Even with simple recovery the insert is still a single operation.
You are getting the original error on the PK column, and I assume the PK is only a fraction of the total row size, so I would break the load up and insert only the PK column at first. Pretty sure you can limit the columns with a FORMATFILE (a sketch of that is below). If you then have to fix a bunch of duplicate PKs you may need to use a program to parse the file and load it row by row.
Honestly, this sounds like a lot of work that is solved with a $100 drive. I really would install another drive and use it for the transaction log.
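As a rough sketch of the FORMATFILE idea (the three-column pipe-delimited layout, the staging table and the file names are all assumptions for illustration only):
-- hypothetical non-XML format file E:\SQLexp\itemmove.fmt, mapping only the
-- first field to a table column; a server column order of 0 skips a field
--   10.0
--   3
--   1  SQLCHAR  0  50   "|"      1  ItemKey  ""
--   2  SQLCHAR  0  100  "|"      0  Skip2    ""
--   3  SQLCHAR  0  100  "\r\n"   0  Skip3    ""
BULK INSERT ItemKeysStaging        -- hypothetical PK-only staging table
FROM 'E:\SQLexp\itemmove.csv'
WITH (FORMATFILE = 'E:\SQLexp\itemmove.fmt', TABLOCK)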
@tommy_o was right about using TABLOCK in order to get my information loaded. Not only did it run in about an hour and a half instead of nine hours, but it barely increased my log size.
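For reference, that just meant adding TABLOCK to the statement above (shown here against a hypothetical staging table), which lets the bulk load be minimally logged under the simple recovery model when the target is a heap or an empty table:
BULK INSERT ItemMovementStaging    -- hypothetical staging table
FROM 'E:\SQLexp\itemmove.csv'
WITH (DATAFILETYPE = 'char', FIELDTERMINATOR = '|', ROWTERMINATOR = '\n', TABLOCK)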
For the second part, I realized I could free up quite a bit of space by deleting my CSV after the load, which gave me enough space to get the tables merged.
Thanks everyone!

remove source file from Hive table

When I load a (csv)-file to a hive table I can load without overwriting, thus adding the new file to the table.
Internally the file is just copied to the correct folder in HDFS
(e.g. user/warehouse/dbname/tablName/datafile1.csv). And probably some metadata is updated.
After a few loads I want to remove the contents of a specific file from the table.
I am sure I cannot simply delete the file, because of the metadata that needs to be adjusted as well. There must be some kind of built-in function for this.
How do I do that?
Why do you need that? I mean, Hive was developed to serve as a warehouse where you put lots and lots of data, not to delete data every now and then. Such a need seems to be a poorly thought out schema or a poor use of Hive, at least to me.
And if you really have this kind of need, why don't you create partitioned tables? If you need to delete some specific data, just delete that particular partition using either TRUNCATE or ALTER.
TRUNCATE TABLE table_name [PARTITION partition_spec];
ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec, PARTITION partition_spec,...
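For example (assuming, purely for illustration, that the table were partitioned by a load_date column and each file had been loaded into its own partition):
ALTER TABLE tablName DROP IF EXISTS PARTITION (load_date='2015-06-01');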
If this feature is needed more than just once in a while, you can use MapR's distribution, which allows this kind of operation with no problem (even via NFS). Otherwise, if you don't have partitions, I think you'll have to create a new table using CTAS, filtering out the data from the bad file, or just copy the good files back to the OS with "hadoop fs -copyToLocal" and move them back to HDFS into a new table.
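A rough CTAS sketch, relying on Hive's INPUT__FILE__NAME virtual column to filter out the rows that came from the unwanted file (the database, table and file names are just the ones from the example path above):
CREATE TABLE tablName_clean AS
SELECT *
FROM dbname.tablName
WHERE INPUT__FILE__NAME NOT LIKE '%datafile1.csv';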