I have one table with one column iscurrent, values are 0 and 1.
I am inserting rows to this table with a merge statement that is inserting or updating based on the value of iscurrent. For performance tuning I have created two partitions with 0 and 1 on that table. But the performance is not good enough.
How can I tune the merge statement? Every time it is merging thousands rows to million of rows. Will partitioning the table help?
If you're using a derived table as the source, start by tuning that select statement first, or put the results of the derived table into a physical table with the proper indexes.
MERGE myTargetTabe as TargetTable
USING (derived table statement goes here) SourceTable on .......
A merge statement will scan every row that meets the criteria you specify in the "ON" clause, if there are rows that you don't need to merge, then exclude the rows here. For example:
MERGE myTargetTabe as TargetTable
USING (derived table statement goes here) SourceTable on TargetTable.ID = SourceTable.ID and TargetTable.OtherColumn NOT IN (....) and ...... or .... and so on.
Related
Is there an efficient way to update rows of a table that has no indexes and no partitions (and ~50millions rows)?
I have a date field LOAD_DTTM and values of this field for rows that require update (around 2000 distinct dates).
WIll update be faster if i specify a date in a WHERE clause along with the UNIQUE_ID of a row?
If you want to update all, or a large number, of the rows then the quickest way is:
create table my_table_copy as
select ... -- all the columns, updating values as required
from my_table;
drop table my_table;
rename my_table_copy to my_table;
If your table had any indexes, constraints or triggers you would now need to re-add them - but it seems you don't have that issue!
You could create a PL/SQL procedure that loops and update and commit the table every n row count -- Say every 20.000 rows. I do not advise to update the full table as it will create a lock for a looong time and expose you to data loss in case of external factors.
The answer is NO.
Even if you specify both conditions in your WHERE clause as you stated, it won't help you to avoid a full scan of your table.
Even if one of your criteria will uniquely identify the row, it still won't help.
There is a real example tested on Oracle 12C ver.2 similar to your case. No indexes, no partitions, nothing. Just plain table with 4 columns
I have a table with 18mn records.
I also have CUSTOMER_ID which is a UNIQUE identifier for a row.
I also have ORDER_DATE column there.
Even if I do the query that you mentioned
update hit set status = 1 where customer_id = 408518625844 and order_date = '09-DEC-19';
it won't help me to avoid a full table scan. See below Execution Plan. Therefore under conditions, you've specified, you will be always getting the slowest execution time possible. Full Table Scan on 50mn rows is actually the worst-case scenario.
And pay attention to that Cost, it is 26539 on 18mn rows.
So if you have 50mn rows we can easily expect much more Cost for your query
In Query that haves 3 tables variables joined with 2 tables in some cases where those variable tables are empty (no rows) in the Execution Plan still appears with 16% cost for the scan of each table variable.
From searching found this rule
If a table variable is joined with other tables in SQL Server, it may
result in slow performance due to inefficient query plan selection
because SQL Server does not support statistics or track number of rows
in a table variable while compiling a query plan
from https://sqlperformance.com/2014/06/t-sql-queries/table-variable-perf-fix
So my question is it worth it before creating the table variable and inserting rows into it, validate if there is any rows to insert in the first place and ignore the creation of the table variable and in the query where have 3 variable table ignore the joins for table variable that weren't created?
I'm currently using Oracle 11g and let's say I have a table with the following columns (more or less)
Table1
ID varchar(64)
Status int(1)
Transaction_date date
tons of other columns
And this table has about 1 Billion rows. I would want to update the status column with a specific where clause, let's say
where transaction_date = somedatehere
What other alternatives can I use rather than just the normal UPDATE statement?
Currently what I'm trying to do is using CTAS or Insert into select to get the rows that I want to update and put on another table while using AS COLUMN_NAME so the values are already updated on the new/temporary table, which looks something like this:
INSERT INTO TABLE1_TEMPORARY (
ID,
STATUS,
TRANSACTION_DATE,
TONS_OF_OTHER_COLUMNS)
SELECT
ID
3 AS STATUS,
TRANSACTION_DATE,
TONS_OF_OTHER_COLUMNS
FROM TABLE1
WHERE
TRANSACTION_DATE = SOMEDATE
So far everything seems to work faster than the normal update statement. The problem now is I would want to get the remaining data from the original table which I do not need to update but I do need to be included on my updated table/list.
What I tried to do at first was use DELETE on the same original table using the same where clause so that in theory, everything that should be left on that table should be all the data that i do not need to update, leaving me now with the two tables:
TABLE1 --which now contains the rows that i did not need to update
TABLE1_TEMPORARY --which contains the data I updated
But the delete statement in itself is also too slow or as slow as the orginal UPDATE statement so without the delete statement brings me to this point.
TABLE1 --which contains BOTH the data that I want to update and do not want to update
TABLE1_TEMPORARY --which contains the data I updated
What other alternatives can I use in order to get the data that's the opposite of my WHERE clause (take note that the where clause in this example has been simplified so I'm not looking for an answer of NOT EXISTS/NOT IN/NOT EQUALS plus those clauses are slower too compared to positive clauses)
I have ruled out deletion by partition since the data I need to update and not update can exist in different partitions, as well as TRUNCATE since I'm not updating all of the data, just part of it.
Is there some kind of JOIN statement I use with my TABLE1 and TABLE1_TEMPORARY in order to filter out the data that does not need to be updated?
I would also like to achieve this using as less REDO/UNDO/LOGGING as possible.
Thanks in advance.
I'm assuming this is not a one-time operation, but you are trying to design for a repeatable procedure.
Partition/subpartition the table in a way so the rows touched are not totally spread over all partitions but confined to a few partitions.
Ensure your transactions wouldn't use these partitions for now.
Per each partition/subpartition you would normally UPDATE, perform CTAS of all the rows (I mean even the rows which stay the same go to TABLE1_TEMPORARY). Then EXCHANGE PARTITION and rebuild index partitions.
At the end rebuild global indexes.
If you don't have Oracle Enterprise Edition, you would need to either CTAS entire billion of rows (followed by ALTER TABLE RENAME instead of ALTER TABLE EXCHANGE PARTITION) or to prepare some kind of "poor man's partitioning" using a view (SELECT UNION ALL SELECT UNION ALL SELECT etc) and a bunch of tables.
There is some chance that this mess would actually be faster than UPDATE.
I'm not saying that this is elegant or optimal, I'm saying that this is the canonical way of speeding up large UPDATE operations in Oracle.
How about keeping in the UPDATE in the same table, but breaking it into multiple small chunks?
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 0000000 and 0999999
COMMIT
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 1000000 and 1999999
COMMIT
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 2000000 and 2999999
COMMIT
This could help if the total workload is potentially manageable, but doing it all in one chunk is the problem. This approach breaks it into modest-sized pieces.
Doing it this way could, for example, enable other apps to keep running & give other workloads a look in; and would avoid needing a single humungous transaction in the logfile.
I have a Hive statement as below:
INSERT INTO TABLE myTable partioned (myDate) SELECT * from myOthertable
myOthertable contains 1 million records and, while executing the above Insert, not all rows are inserted into myTable. As it is a SELECT * query without any WHERE clause ideally the Insert should be done for all the rows from myOthertable into myTable. It ignores some of the rows while inserting.
Can anyone suggest why this is happening?
The issue may be due to ,If the table is large enough the above query wont work seems like due to the larger number of files created on initial map task.
So in that cases group the records in your hive query on the map process and process them on the reduce side. You can implement the same in your hive query itself with the usage of DISTRIBUTE BY. Below is the query .
FROM myOthertable
INSERT OVERWRITE TABLE myTable(myDate)
SELECT other1, other2 DISTRIBUTE BY myDate;
This link may help
I have a CTE which is a select statement on a table. Now if I delete 1 row from the CTE, will it delete that row from my base table?
Also is it the same case if I have a temp table instead of CTE?
Checking the DELETE statement documentation, yes, you can use a CTE to delete from and it will affect the underlying table. Similarly for UPDATE statements...
Also is it the same case if I have a temp table instead of CTE?
No, deletion from a temp table will affect the temp table only -- there's no connection to the table(s) the data came from, a temp table is a stand alone object.
You can think of CTE as a subquery, it doesn't have a temp table underneath.
So, if you run delete statement against your CTE you will delete rows from the table. Of course if SQL can infer which table to upadte/delete base on your CTE. Otherwise you'll see an error.
If you use temp table, and you delete rows from it, then the source table will not be affected, as temp table and original table don't have any correlation.
In the cases where you have a sub query say joining multiple tables and you need to use this in multiple places then both cte and temp table can be used. If you however want to delete records based on the sub query condition then cte is the way to go. Sometimes you can simply use the delete statement with out a need of cte since it's a delete statement and only rows that satisfy the query conditions get deleted even though multiple conditions are used for filtering.