I have a temp table created in DB2 using Java, with some 2000 rows inserted into it.
Now I am using this table in a select query with a few joins. The query involves 3 other tables, say A, B and C. Somehow this select query returns results very slowly; it takes almost 20 seconds to provide results.
Below are the details of this query,
Table A has 200000 records. B and C have only 100-200 records.
All these 3 tables have enough indexes defined on the columns involved in the joins and the where clause. The explain plan tool etc. did not show any new indexes needed.
When I run the query with the session table (and its use in the where clause) removed, the query returns results in milliseconds. And as I mentioned, this session table has only around 2000 records.
I have also declared indexes on each column of this session table.
I am not really sure about the terminology here, but when I say session table I mean a temporary table created on the DB connection; the table gets dropped when the DB connection is closed. Also, when the program runs with 15 threads, no thread can see the table created by another thread.
Where could the issue be? Please let me know some suggestions here.
Some suggestions (I assume LUW since you don't mention the platform):
a) You say that you have indexes on each column of your session table; I assume this means a set of single-column indexes. This is in most cases not optimal, and you can probably replace those with a composite index. Check what db2advis suggests by creating a real table like:
create table temp.t ( ... )
insert into temp.t (...) values (...)
runstats on table temp.t with distribution
then run: db2advis -d <dbname> -m I -s "your query, but with temp.t instead of the session table"
b) After the data is loaded and - eventually - new indexes are created, do a runstats on the session table (see the sketch below).
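For illustration, the composite index and runstats on the actual session table might look something like this. This is a sketch only: it assumes DB2 LUW 9.7 or later, and JOIN_COL and FILTER_COL are placeholders for whatever columns appear in your join and where clause.
-- declare the session table (hypothetical column list)
DECLARE GLOBAL TEMPORARY TABLE SESSION.T (
    JOIN_COL   INTEGER,
    FILTER_COL VARCHAR(50)
) ON COMMIT PRESERVE ROWS NOT LOGGED WITH REPLACE;
-- one composite index instead of several single-column indexes
CREATE INDEX SESSION.IX_T ON SESSION.T (JOIN_COL, FILTER_COL);
-- ... insert the ~2000 rows here ...
-- collect statistics once the data is in
RUNSTATS ON TABLE SESSION.T WITH DISTRIBUTION;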
Related
Does a query such as this create a temp table? Or is it one-time use within the query?
SELECT A
FROM
(
SELECT A, B FROM TableA
UNION
SELECT A, B FROM TableB
) AS tbl
WHERE B > 'some value'
I am using Postgres (psql) and Snowflake.
No, it does not create a temp table.
It does, however, materialize the rows. I'm pretty sure it does this in all databases. The use of UNION requires removing duplicates, and the duplicate removal would typically be done using a sorting or hashing algorithm.
In both these cases, the data is going to be written into intermediate storage.
However, the extra metadata that is used for temporary tables would not typically be written. This would just be "within-a-query" temporary space.
In Postgres, a temporary table does not get created. By "temporary table," I mean a file on disk, with a relfilenode entry in pg_class, that exists for the duration of the psql session. A "table" is created in memory for the purposes of the query execution, but it's not a "table" in the sense that you can query from it (it's more of a data structure).
What you're asking about is basically how Postgres handles subqueries: subqueries are evaluated and materialized, then kept in memory/cache for future reference. If you take a look at EXPLAIN (ANALYZE, BUFFERS) as you repeat your query 2-3 times, you'll see that a subquery node gets generated, and subsequent calls to the query will show shared buffers hit: ..., indicating that the previous calls were cached for faster future access.
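For example, in Postgres you could inspect this with the query from the question (a sketch; TableA and TableB are the question's own hypothetical tables):
EXPLAIN (ANALYZE, BUFFERS)
SELECT A
FROM
(
SELECT A, B FROM TableA
UNION
SELECT A, B FROM TableB
) AS tbl
WHERE B > 'some value';
On repeated runs, the Buffers: shared hit=... counts should grow relative to shared read=..., showing the pages are being served from cache rather than disk.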
I have a table named [cwbOrder] that currently has 1,277,469 rows. I am using SQL Server 2014 and I am doing these tests on a UAT environment; on production this query takes a little bit longer.
If I try selecting all of the rows using:
SELECT * FROM cwbOrder
It takes 24 seconds to retrieve all of the data from the table. I have read about how important it is to index columns used in predicates (WHERE), but I still cannot understand how a simple select can take 24 seconds.
Using this table in other, more complex queries generates a lot of extra workload for the query, although I have created the JOINs on indexed columns. Additionally, I have selected only 2 columns from this table and then JOINED it to another table, and this operation still takes a significantly long amount of time. As an example, please consider the query below:
Below I have attached the index structure of both tables, to illustrate the matter:
PK_cwbOrder is the index on the id_cwbOrder column in the cwbOrder table.
Edit 1: I have added the execution plan for the query in which I join the cwbOrder table with the cwbAction table.
Is there any way, considering the information above, that I can make this query faster?
There are many reasons why such a select could be slow:
The row size or number of rows could be very large, requiring a lot of time to transfer and display.
Other operations on the table could have locks on the table.
The database server or network could be very busy.
The "table" could really be a view that is running a complicated query.
You can test different aspects. For instance:
SELECT TOP 10 <one column here>
FROM cwbOrder o
This returns a very small result set and reads just a small part of the table. The following reads the entire table but returns a small result set:
SELECT COUNT(*)
FROM cwbOrder o
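A couple of further checks along the same lines (a sketch; the DMV queries assume SQL Server 2014 and the object name from the question):
-- is cwbOrder really a base table, or a view?
SELECT name, type_desc FROM sys.objects WHERE name = 'cwbOrder';
-- is anything blocking the session while the SELECT runs?
SELECT session_id, blocking_session_id, wait_type
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;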
I have a very large table people with 60M rows indexed on id, and I wish to populate a field newid for every record based on a lookup table id_conversion (1M rows) which contains id and newid, indexed on id.
When I run
update people p set p.newid=(select l.newid from id_conversion l where l.id=p.id)
it runs for an hour or so and then I get an archiver error, ORA-00257.
Any suggestions for either running the update in sections or a better SQL command?
If your update statement hits every single row of the table, you are likely better off running a create table as select query, which bypasses the undo logging; that logging of the impact across 60 million rows is likely the issue you're running into (ORA-00257 is an archiver error, meaning the log destination has filled up). You can then drop the old table and rename the new table to the old table's name.
Something like:
create table new_people as
select p.id,
l.newid,
p.col2,
p.col3,
p.col4,
p.col5
from people p
join id_conversion l
on p.id = l.id;
drop table people;
-- rebuild any constraints and indexes
-- from old people table to new people table
alter table new_people rename to people;
For reference, read some of the tips here: http://www.dba-oracle.com/t_efficient_update_sql_dml_tips.htm
If you are basically creating a new table, rather than just updating some of the rows of a table, this will likely prove the faster method.
I doubt you will be able to get this to run in seconds. Your query, as written, needs to update all 60 million rows.
My first advice is to add an index on id_conversion(id, newid), to make the subquery more efficient. If that doesn't help, then doing the update in batches might be the best way to go.
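For illustration, that might look like the following (a sketch only; the index name and the id ranges are made up, and it assumes id is a numeric key you can slice into ranges):
-- covering index so the correlated subquery is answered from the index alone
CREATE INDEX ix_idconv_id_newid ON id_conversion (id, newid);
-- then update one slice at a time, committing between batches
UPDATE people p
SET    newid = (SELECT l.newid FROM id_conversion l WHERE l.id = p.id)
WHERE  p.id BETWEEN 1 AND 10000000;
COMMIT;
-- repeat with the next BETWEEN range until all 60M rows are covered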
I should add: because you are updating all the rows, it might be faster to take the following approach:
Copy the data into a new table with the new values.
Truncate the original table.
Insert the new data into the old table.
Inserts are faster than updates.
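A rough outline of that approach (a sketch; column names beyond id and newid are placeholders, and it assumes every people.id has a match in id_conversion - use a left join otherwise):
-- 1. copy the data into a staging table with the new values
CREATE TABLE people_stage AS
SELECT p.id, l.newid, p.col2, p.col3
FROM   people p
JOIN   id_conversion l ON l.id = p.id;
-- 2. empty the original table
TRUNCATE TABLE people;
-- 3. reload it with a direct-path insert
INSERT /*+ APPEND */ INTO people (id, newid, col2, col3)
SELECT id, newid, col2, col3 FROM people_stage;
COMMIT;
DROP TABLE people_stage;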
In addition to the answers above, which will probably work better in this case, you should know about the MERGE statement:
http://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_9016.htm
It is used for updating one table according to another table and is far faster than an update based on a select statement.
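For the tables in the question, a MERGE might look roughly like this (a sketch; verify the join keys, and note that id must be unique in id_conversion for the statement to run):
MERGE INTO people p
USING id_conversion l
ON (p.id = l.id)
WHEN MATCHED THEN
  UPDATE SET p.newid = l.newid;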
I have a DB2 database with three tables, A, B and C.
The database was created thus:
create database DB alias DB AUTOMATIC STORAGE YES ON /home/db2inst1
using codeset UTF-8 territory en PAGESIZE 32768
Table A is 28 columns wide with 1.8 mill. rows, and PID is the primary
key. The columns mostly have int types, but some are varchar(200-400).
Index: PID
Table B is 7 columns wide with 14 mill. rows and primary key PID_L.
It also has columns C_SOURCE and ROW_COUNT. Index: PID,C_SOURCE
Table C is 20 columns wide with 14 mill. rows and primary key PID_S.
It also has a column ROLE. Index: PID,PID_S
All tables have the column PID
I need a table which aggregates some info in Table B and C. The query to select the appropriate items is:
SELECT
T.*,
(
SELECT
COALESCE(SUM(ROW_COUNT),0)
FROM
C as ITS,
B as ITL
WHERE
ITS.ROLE = 1
AND ITS.PID = ITL.PID
AND ITS.PID_S = ITL.C_SOURCE
AND ITS.PID = T.PID
) AS RR
FROM
A as T;
When this query is run, the DB2 server quickly uses about 3 GB of memory. According to top, however, CPU usage rarely goes beyond 5%, with occasional jumps to about 13%. The DB2 server is a RedHat 6.2 VM, with 4 cores at 2 GHz per core.
I have let this query run for 24 hours without anything seeming to happen. Other queries, like simple selects and many more, work smoothly.
Questions:
Do you have any suggestions for a different, more efficient, query that might accomplish the same thing?
Is it possible that this performance issue has something to do with the configuration of the database?
I would try the "explain" feature, to see what DB2 makes of your query:
db2exfmt -d database -e schema -t -v % -w -1 -s % -# 0 -n % -g OTIC
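Before db2exfmt has anything to format, the explain tables need to be populated; a minimal way to do that (a sketch, assuming the explain tables already exist, e.g. created from sqllib/misc/EXPLAIN.DDL):
SET CURRENT EXPLAIN MODE EXPLAIN;
-- now submit the slow query; it is only compiled, not executed
SET CURRENT EXPLAIN MODE NO;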
You use a nested (correlated) subquery. I propose referencing your table A inside the second query and keeping the join condition between table A and table C; one possible rewrite is sketched below.
I think it could improve the response time of your query, but it all depends on how the tables are created (the indexes declared on them, for example).
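For illustration, one way to fold the correlated subquery into a join (a sketch based on the query in the question; verify it returns the same result before relying on it):
SELECT T.*, COALESCE(AGG.RR, 0) AS RR
FROM A AS T
LEFT JOIN (
    SELECT ITS.PID, SUM(ITL.ROW_COUNT) AS RR
    FROM C AS ITS
    JOIN B AS ITL
      ON ITL.PID = ITS.PID
     AND ITL.C_SOURCE = ITS.PID_S
    WHERE ITS.ROLE = 1
    GROUP BY ITS.PID
) AS AGG
  ON AGG.PID = T.PID;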
Best regards,
Bounty open:
Ok people, the boss needs an answer and I need a pay rise. It doesn't seem to be a cold caching issue.
UPDATE:
I've followed the advice below to no avail. However, the client statistics threw up an interesting set of numbers.
@temp vs #temp
Number of INSERT, DELETE and UPDATE statements: 0 vs 1
Rows affected by INSERT, DELETE, or UPDATE statements: 0 vs 7647
Number of SELECT statements: 0 vs 0
Rows returned by SELECT statements: 0 vs 0
Number of transactions: 0 vs 1
The most interesting are the number of rows affected and the number of transactions. To remind you, the queries below return identical result sets, just into different styles of tables.
The following queries are basically doing the same thing. They both select a set of results (about 7000 rows) and populate it into either a temp table or a table variable. In my mind the table variable @temp should be created and populated quicker than the temp table #temp; however, the table variable in the first example takes 1 min 15 sec to execute, whereas the temp table in the second example takes 16 seconds.
Can anyone offer an explanation?
declare @temp table (
id uniqueidentifier,
brand nvarchar(255),
field nvarchar(255),
date datetime,
lang nvarchar(5),
dtype varchar(50)
)
insert into @temp (id, brand, field, date, lang, dtype)
select id, brand, field, date, lang, dtype
from view
where brand = 'myBrand'
-- takes 1:15
vs
select id, brand, field, date, lang, dtype
into #temp
from view
where brand = 'myBrand'
DROP TABLE #temp
-- takes 16 seconds
I believe this almost completely comes down to table variable vs. temp table performance.
Table variables are optimized for having exactly one row. When the query optimizer chooses an execution plan, it does so on the (often false) assumption that the table variable only has a single row.
I can't find a good source for this, but it is at least mentioned here:
http://technet.microsoft.com/en-us/magazine/2007.11.sqlquery.aspx
Other related sources:
http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=125052
http://databases.aspfaq.com/database/should-i-use-a-temp-table-or-a-table-variable.html
Run both with SET STATISTICS IO ON and SET STATISTICS TIME ON. Run 6-7 times each, discard the best and worst results for both cases, then compare the two average times.
I suspect the difference is primarily from a cold cache (first execution) vs. a warm cache (second execution). The output from STATISTICS IO would give away such a case, as a big difference in the physical reads between the runs.
And make sure you have 'lab' conditions for the test: no other tasks running (no lock contention), databases (including tempdb) and logs are pre-grown to required size so you don't hit any log growth or database growth event.
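A minimal harness for that comparison might look like this (a sketch; the table definition and the query are the ones from the question):
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
declare @temp table (
    id uniqueidentifier,
    brand nvarchar(255),
    field nvarchar(255),
    date datetime,
    lang nvarchar(5),
    dtype varchar(50)
)
insert into @temp (id, brand, field, date, lang, dtype)
select id, brand, field, date, lang, dtype
from view
where brand = 'myBrand'
-- repeat with the SELECT ... INTO #temp version and compare the logical/physical
-- reads and CPU/elapsed times reported in the Messages tab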
This is not uncommon. Table variables can be (and in a lot of cases ARE) slower than temp tables. Here are some of the reasons for this:
SQL Server maintains statistics for queries that use temporary tables but not for queries that use table variables. Without statistics, SQL Server might choose a poor processing plan for a query that contains a table variable.
Non-clustered indexes cannot be created on table variables, other than the system indexes that are created for a PRIMARY or UNIQUE constraint. That can influence the query performance when compared to a temporary table with non-clustered indexes.
Table variables use internal metadata in a way that prevents the engine from using a table variable within a parallel query (this means that it won't take advantage of multi-processor machines).
A table variable is optimized for one row, by SQL Server (it assumes 1 row will be returned).
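For instance, the only index you could give the table variable from the question would have to come from a constraint, along these lines (a sketch; it assumes id really is unique, otherwise the insert would fail):
declare @temp table (
    id uniqueidentifier primary key,  -- the PRIMARY KEY constraint creates the only index a table variable can carry here
    brand nvarchar(255),
    field nvarchar(255),
    date datetime,
    lang nvarchar(5),
    dtype varchar(50)
)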
I'm not 100% sure that this is the cause, but the table variable will not have any statistics whereas the temp table will.
SELECT INTO is a minimally logged operation, which would likely explain most of the performance difference. INSERT writes a log entry for every row it inserts.
Additionally, SELECT INTO is creating the table as part of the operation, so SQL Server knows automatically that there are no constraints on it, which may factor in.
If it takes over a full minute to insert 7000 records into a temp table (persistent or variable), then the perf issue is almost certainly in the SELECT statement that's populating it.
Have you run DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS before profiling? I'm thinking that maybe it's using some cached results for the second query.