Extract data from a view - Oracle SQL performance

Hello, I created a view to run a subquery (a select from two tables).
This is the SQL statement:
CREATE OR REPLACE VIEW EMPLOYEER_VIEW
AS
SELECT A.ID,A.FIRST_NAME||' '||A.LAST_NAME AS NAME,B.COMPANY_NAME
FROM EMPLOY A, COMPANY B
WHERE A.COMPANY_ID=B.COMPANY_ID
AND A.DEPARTEMENT !='DEP_004'
ORDER BY A.ID;
If I select data from EMPLOYEER_VIEW, the average execution time is 135.953 s.
Table EMPLOY contains 124,600,329 rows.
Table COMPANY contains 609 rows.
My question is:
How can I make the execution faster?
I created two indexes:
emply_index (ID, COMPANY_ID, DEPARTEMENT)
and company_index (COMPANY_ID).
Can you help me make the selections run faster (by creating another index or changing the join)?
PS: I can't create a materialized view in this database.
Thanks in advance for your help.

There are several things you can do.
(If you must work through a view and cannot create a scheduled job that inserts the data into a table, the advice below may not apply.)
Views are not meant to serve hundreds of millions of rows; they work well for a few million.
Indexes should be dropped while data is being loaded: inserting into an indexed table can be up to 100 times slower. (You can drop and recreate them, or rebuild them afterwards.)
Create partitions on the table:
If you have a lot of distinct IDs, use RANGE partitioning.
If you have around 100 IDs, use LIST partitioning.
You do not need the indexes here, because the join clause does not use them; indexes help mostly with strict WHERE clauses.
We had a project with 433,000,000 rows, and the only way to make it work was playing with partitions.
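As a rough sketch of the partitioning idea (the column types and the choice of partition key are assumptions, since the real DDL is not shown), LIST-partitioning EMPLOY on DEPARTEMENT could look like:

```sql
-- Sketch only: column types and the partition key are assumptions.
-- LIST-partitioning EMPLOY by DEPARTEMENT lets Oracle prune the
-- DEP_004 partition for the view's filter DEPARTEMENT != 'DEP_004'.
CREATE TABLE EMPLOY_PART (
  ID          NUMBER,
  FIRST_NAME  VARCHAR2(100),
  LAST_NAME   VARCHAR2(100),
  COMPANY_ID  NUMBER,
  DEPARTEMENT VARCHAR2(10)
)
PARTITION BY LIST (DEPARTEMENT) (
  PARTITION p_dep004 VALUES ('DEP_004'),
  PARTITION p_rest   VALUES (DEFAULT)
);
```

With this layout, queries carrying the view's `DEPARTEMENT != 'DEP_004'` filter only have to scan `p_rest`.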

Creating a view from JOIN two massive tables

Context
I have a big table, say table_A, with roughly 20 billion rows and 600 columns. I don't own this table but I can read from it.
For a fraction of these columns I produce a few extra columns (50) which I store in a separate table, say table_B, which is therefore roughly 20 bn X 50 large.
Now I have the need to expose the join of table table_A and table_B to users, which I tried as
CREATE VIEW table_AB
AS
SELECT *
FROM table_A AS ta
LEFT JOIN table_B AS tb ON (ta.tec_key = tb.tec_key)
The problem is that any simple query like SELECT * FROM table_AB LIMIT 2 will fail because of memory issues: apparently Impala attempts to do the full join in memory first, which would result in a table of 0.5 petabytes. Hence the failure.
Question
What is the best way to create such a view?
How can one instruct SQL that filtering operations on table_AB are to be executed before the join?
Creating a new table is also suboptimal because it would mean duplicating the data in table_AB, using up hundreds of Terabytes.
I have also tried with [...] SELECT STRAIGHT_JOIN * [...] but it did not help.
What is the best way to create such a view?
Since both tables are huge, there will be memory problems. Here are some points I would recommend:
Assuming tables A and B have the same tec_key values, do an inner join.
Keep the (smaller) table B as the driver: create the view as select ... from b join a on .... Impala stores the driver table in memory, so a smaller driver requires less memory.
Select only the columns you need; do not select all of them.
Put the filter inside the view.
Partition table B if you can, on some date/year/region/anything that can evenly distribute the data.
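Putting those points together, a sketch of such a view might look like the following (tec_key comes from the question; every other column name and the filter are assumptions):

```sql
-- Sketch only: column names besides tec_key are assumptions.
-- Smaller table (table_B) first, inner join, an explicit column
-- list instead of SELECT *, and a filter baked into the view.
CREATE VIEW table_AB AS
SELECT tb.tec_key,
       tb.derived_col1,   -- assumed derived column from table_B
       ta.raw_col1,       -- assumed source columns from table_A
       ta.raw_col2
FROM table_B tb
JOIN table_A ta
  ON ta.tec_key = tb.tec_key
WHERE ta.raw_col1 IS NOT NULL;  -- example filter
```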
How can one instruct SQL that filtering operations on table_AB are to be executed before the join?
You cannot ensure whether a filter is applied before or after the join. The only way to ensure a filter will improve performance is to have a partition on the filter column. Otherwise, you can try filtering first and then joining, to see if it improves performance, like this:
select ... from b
join ( select ... from a where region='Asia') a on ... -- won't improve much
Creating a new table is also suboptimal because it would mean duplicating the data in table_AB, using up hundreds of Terabytes.
Completely agree on this. Multiple smaller tables are far better than one giant table with 600 columns. So create a few staging tables with only the required fields and then enrich that data. It's a difficult data set, but no one will change 20 bn rows every day, so some sort of incremental load is also possible to implement.

Query is very slow when we put a where clause on the total selected data by query

I am running a query which selects data based on joins between 6-7 tables. When I execute the query it takes 3-4 seconds to complete, but when I put a WHERE clause on the fetched data it takes more than a minute to execute. My query fetches a large amount of data, so I can't write it here, but the situation I faced is explained below:
Select Category,x,y,z
from
(
---Sample Query
) as a
it's only taking 3-4 seconds to execute. But
Select Category,x,y,z
from
(
---Sample Query
) as a
where category Like 'Spart%'
is taking more than 2-3 minutes to execute.
Why is it taking more time to execute when I use the where clause?
It's impossible to say exactly what the issue is without seeing the full query. It is likely that the optimiser is pushing the WHERE into the "Sample query" in a way that is not performant. It could possibly be resolved by updating statistics on the table, but an easier option would be to insert the whole query into a temporary table and filter from there.
Select Category,x,y,z
INTO #temp
from
(
---Sample Query
) as a
SELECT * FROM #temp WHERE category Like 'Spart%'
This will force the optimiser to tackle it in the logical order of pulling your data together before applying the WHERE to the end result. You might like to consider indexing the temp table's category field also.
If you're using MS SQL, checking the actual execution plan in Management Studio may already suggest an index to create.
In any case, you should add the column "Category" to the index used by the query.
If you don't have an index on that table, create one composed of the column "Category" plus all the other columns used in the joins or the WHERE clause.
Bear in mind that with a LIKE 'text%' clause you could end up with an index scan and not an index seek.
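A sketch of that advice in SQL Server syntax (the table name and the INCLUDE list are assumptions based on the posted query):

```sql
-- Sketch only: dbo.SampleTable is an assumed name.
-- Category leads the key so LIKE 'Spart%' can seek on the prefix;
-- x, y, z are included so the index covers the SELECT list.
CREATE NONCLUSTERED INDEX IX_SampleTable_Category
ON dbo.SampleTable (Category)
INCLUDE (x, y, z);
```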

updating a very large oracle table

I have a very large table, people, with 60M rows indexed on id. I wish to populate a field newid for every record based on a lookup table, id_conversion (1M rows), which contains id and newid and is indexed on id.
When I run
update people p set p.newid=(select l.newid from id_conversion l where l.id=p.id)
it runs for an hour or so and then I get an archiver error, ORA-00257.
Any suggestions for either running the update in sections or a better SQL command?
If your update statement hits every single row of the table, you are likely better off running a CREATE TABLE AS SELECT query, which bypasses writing to Oracle's undo log; that logging across 60 million rows is likely the issue you're running into. You can then drop the old table and rename the new table to the old table's name.
Something like:
create table new_people as
select l.newid,
p.col2,
p.col3,
p.col4,
p.col5
from people p
join id_conversion l
on p.id = l.id;
drop table people;
-- rebuild any constraints and indexes
-- from old people table to new people table
alter table new_people rename to people;
For reference, read some of the tips here: http://www.dba-oracle.com/t_efficient_update_sql_dml_tips.htm
If you are basically creating a new table rather than just updating some of the rows, this will likely prove the faster method.
I doubt you will be able to get this to run in seconds; your query, as written, needs to update all 60 million rows.
My first advice is to add an index on id_conversion(id, newid) to make the subquery more efficient. If that doesn't help, then doing the update in batches might be the best way to go.
I should add: because you are updating all the rows, it might be faster to take the following approach:
Copy the data into a new table with the new values.
Truncate the original table.
Insert the new data into the old table.
Inserts are faster than updates.
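The copy/truncate/insert sequence above can be sketched as follows (the non-key columns of people are assumptions):

```sql
-- Sketch only: people's non-key columns are assumptions.
-- 1. Copy the data out with the converted newid values.
CREATE TABLE people_stage AS
SELECT p.id, p.col2, l.newid
FROM people p
JOIN id_conversion l ON l.id = p.id;

-- 2. Truncate the original table (fast, minimal logging).
TRUNCATE TABLE people;

-- 3. Re-insert; the APPEND hint requests a direct-path insert.
INSERT /*+ APPEND */ INTO people (id, col2, newid)
SELECT id, col2, newid FROM people_stage;
```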
In addition to the answers above, which will probably work better in this case, you should know about the MERGE statement:
http://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_9016.htm
It is used for updating one table according to another table and is far faster than an update based on a select statement.
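For this particular conversion, a MERGE could look like the following sketch (built only from the tables and columns named in the question):

```sql
-- Updates people.newid from id_conversion in a single statement,
-- touching only the rows that have a match in the lookup table.
MERGE INTO people p
USING id_conversion l
ON (p.id = l.id)
WHEN MATCHED THEN
  UPDATE SET p.newid = l.newid;
```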

minus vs delete where exist in oracle

I have a CREATE TABLE query which can be done using two methods (create-as-select for thousands/millions of records):
First method:
create table as select some data minus (select data from other table)
OR
Second method: first create the table as
create table as select .....
and then
delete from .. where exists.
I guess the second method is better. For which query is the cost lower? Why is the MINUS query not as fast as the second method?
EDIT:
I forgot to mention that the create statement has join from two tables as well.
The MINUS is slow probably because it needs to sort the tables on disk in order to compare them.
Try rewriting the first query with NOT EXISTS instead of MINUS; it should be faster and will generate less REDO and UNDO (as a_horse_with_no_name mentioned). Of course, make sure that all the fields involved in the WHERE clauses are indexed!
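A sketch of that rewrite (the table and column names are assumptions, since the question elides them):

```sql
-- Sketch only: source, other_table and key are assumed names.
-- NOT EXISTS filters rows during the CREATE TABLE AS SELECT,
-- instead of materialising them and deleting them afterwards.
CREATE TABLE target AS
SELECT s.*
FROM source s
WHERE NOT EXISTS (
  SELECT 1
  FROM other_table o
  WHERE o.key = s.key
);
```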
The second approach writes lots of records to disk and then removes them. In 9 out of 10 cases this takes far longer than filtering what you write to begin with.
So if the first one actually isn't faster, we need more information about the tables and statements involved.

Performance: Subquery or Joining

I have a little question about the performance of a subquery versus joining another table.
INSERT
INTO Original.Person
(
PID, Name, Surname, SID
)
(
SELECT ma.PID_new , TBL.Name , ma.Surname, TBL.SID
FROM Copy.Person TBL , original.MATabelle MA
WHERE TBL.PID = p_PID_old
AND TBL.PID = MA.PID_old
);
This is my SQL; now this thing runs around 1 million times or more.
My question is: what would be faster?
If I change TBL.SID to (Select new from helptable where old = tbl.sid)
OR
if I add the 'HelpTable' to the FROM clause and do the joining in the WHERE?
edit1
Well, this script runs only as often as there are persons.
My program has 2 modules: one that populates MaTabelle and one that transfers data. The program merges 2 databases together, and because of this, sometimes the same key is used.
Now I'm working on a solution so that no duplicate keys exist.
My solution is to make a 'HelpTable'. The owner of the key (SID) generates a new key and writes it into the 'HelpTable'. All other tables that use this key can read it from the 'HelpTable'.
edit2
Just got something in my mind:
if a table has a key that can be NULL (a foreign key that is not linked),
then this won't work with the join in the FROM, will it?
Modern RDBMSs, including Oracle, optimize most joins and subqueries down to the same execution plan.
Therefore, I would go ahead and write your query in the way that is simplest for you and focus on ensuring that you've fully optimized your indexes.
If you provide your final query and your database schema, we might be able to offer detailed suggestions, including information regarding potential locking issues.
Edit
Here are some general tips that apply to your query:
For joins, ensure that you have an index on the columns that you are joining on. Be sure to apply an index to the joined columns in both tables. You might think you only need the index in one direction, but you should index both, since sometimes the database determines that it's better to join in the opposite direction.
For WHERE clauses, ensure that you have indexes on the columns mentioned in the WHERE.
For inserting many rows, it's best if you can insert them all in a single query.
For inserting on a table with a clustered index, it's best if you insert with incremental values for the clustered index so that the new rows are appended to the end of the data. This avoids rebuilding the index and often avoids locks on the existing records, which would slow down SELECT queries against existing rows. Basically, inserts become less painful to other users of the system.
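As a sketch of the first two tips applied to the posted insert (the index names are made up; the columns come from the query's join and WHERE clauses):

```sql
-- Index both sides of the join TBL.PID = MA.PID_old;
-- the index on Copy.Person(PID) also covers the WHERE filter.
CREATE INDEX idx_matabelle_pid_old ON original.MATabelle (PID_old);
CREATE INDEX idx_person_pid        ON Copy.Person (PID);
```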
Joining would be much faster than a subquery.
The main difference between a subquery and a join is:
a subquery is preferable when we have to retrieve data from a large number of tables, because it becomes tedious to join many tables;
a join is faster for retrieving data from the database when we have a smaller number of tables.
Also, this joins vs subquery link can give you some more info.
Instead of focusing on whether to use a join or a subquery, I would focus on the necessity of doing 1,000,000 executions of that particular insert statement, especially as Oracle's optimizer (as Marcus Adams already pointed out) will optimize and rewrite your statements under the covers into their most optimal form.
Are you populating MaTabelle 1,000,000 times with only a few rows and then issuing that statement? If yes, then the answer is to do it in one shot. Can you provide some more information on the process that is executing this statement so many times?
EDIT: You indicate that this insert statement is executed for every person. In that case the advice is to populate MATabelle first and then execute once:
INSERT
INTO Original.Person
(
PID, Name, Surname, SID
)
(
SELECT ma.PID_new , TBL.Name , ma.Surname, TBL.SID
FROM Copy.Person TBL , original.MATabelle MA
WHERE TBL.PID = MA.PID_old
);
Regards,
Rob.