Caching joined tables in SQL Server

My website has a search procedure that runs very slowly. One thing that slows it down is the 8-table join it has to do (it also has a WHERE clause on ~6 search parameters). I've tried to make the query faster using various methods, such as adding indexes, but these have not helped.
One idea I have is to cache the result of the 8-table join. I could create a temporary table of the join and make the search procedure query this table. I could update the table every 10 minutes or so.
Using pseudo code, I would change my procedure to look like this:
IF CachedTable is NULL or CachedTable is older than 10 minutes
DROP TABLE CachedTable
CREATE TABLE CachedTable as (select * from .....)
ENDIF
Select * from CachedTable Where Name = #SearchName
AND EmailAddress = #SearchEmailAddress
Is this a working strategy? I don't really know what syntax I would need to pull this off, or what I would need to lock to stop things from breaking if two queries happen at the same time.
Also, it might take quite a long time to make a new CachedTable each time, so I thought of trying something like double buffering in computer graphics:
IF CachedTable is NULL
CREATE TABLE CachedTable as (select * from ...)
ELSE IF CachedTable is older than 10 minutes
-- Somehow do this asynchronously, so that the next time a search comes
-- through the new table is used?
ASYNCHRONOUS (
CREATE TABLE BufferedCachedTable as (select * from ...)
DROP TABLE CachedTable
RENAME TABLE BufferedCachedTable as CachedTable
)
Select * from CachedTable Where Name = #SearchName
AND EmailAddress = #SearchEmailAddress
Does this make any sense? If so, how would I achieve it? If not, what should I do instead? I tried using indexed views, but this resulted in weird errors, so I want something like this that I can have more control over (Also, something I can potentially spin off onto a different server in the future.)
Also, what about indexes and so on for tables created like this?
This may be obvious from the question, but I don't know that much about SQL or the options I have available.

You can use multiple schemas (you should always specify schema!) and play switch-a-roo as I demonstrated in this question. Basically you need two additional schemas (one to hold a copy of the table temporarily, and one to hold the cached copy).
CREATE SCHEMA cache AUTHORIZATION dbo;
CREATE SCHEMA hold AUTHORIZATION dbo;
Now, create a mimic of the table in the cache schema:
SELECT * INTO cache.CachedTable FROM dbo.CachedTable WHERE 1 = 0;
-- then create any indexes etc.
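For instance, an index on the cached copy to support the search predicates might look like this sketch (the Name and EmailAddress columns are assumed from the question; the index name is made up):
-- hypothetical index on the assumed search columns
CREATE INDEX IX_CachedTable_Name_Email ON cache.CachedTable (Name, EmailAddress);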
Now when it comes time to refresh the data:
-- step 1:
TRUNCATE TABLE cache.CachedTable;
-- (if you need to maintain FKs you may need to delete)
INSERT INTO cache.CachedTable SELECT ...
-- step 2:
-- this transaction will be almost instantaneous,
-- since it is a metadata operation only:
BEGIN TRANSACTION;
ALTER SCHEMA hold TRANSFER dbo.CachedTable;
ALTER SCHEMA dbo TRANSFER cache.CachedTable;
ALTER SCHEMA cache TRANSFER hold.CachedTable;
COMMIT TRANSACTION;
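After the transfer, dbo.CachedTable always points at the freshest copy, so the search procedure can simply read from it. A minimal sketch of that read side, assuming the Name and EmailAddress columns and @SearchName / @SearchEmailAddress parameters from the question:
SELECT *
FROM dbo.CachedTable
WHERE Name = @SearchName
AND EmailAddress = @SearchEmailAddress;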

Clean/truncate table before selecting into it

Just a quick question to check whether this is the right way to do something.
I have a query whose results I want to store in a table. I thought about updating the table with only the changed rows, but since my query only takes about a minute, I think it is easier to just drop the whole table and rerun the query every time I do my hourly update.
Is this the way to do it?
TRUNCATE TABLE MyTable;
SELECT *
INTO MyTable
FROM TableTwo
WHERE X;
And then just take that query, turn it into a stored procedure, and turn the procedure into a job that runs once an hour.
Also, I want this table to have indexes. Will they be preserved even if I truncate every time?
You can do this. I would probably advise dropping the table instead. This will better handle changes in table structure.
If you do use truncate, you want INSERT INTO ... SELECT rather than SELECT ... INTO:
insert into MyTable -- a column list is recommended
Select *
From TableTwo
Where X;
Dropping the table might take an iota more time, and it doesn't preserve triggers, constraints, foreign key references, and storage definitions. (I'm guessing those are not important.) However, it does allow the query to change over time, which might be useful to future-proof the code.
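A sketch of the drop-and-recreate variant (table names are taken from the question, with an assumed dbo schema; the OBJECT_ID check guards the first run, when the table does not exist yet):
IF OBJECT_ID('dbo.MyTable', 'U') IS NOT NULL
DROP TABLE dbo.MyTable;
SELECT *
INTO dbo.MyTable
FROM dbo.TableTwo
WHERE X;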
TRUNCATE doesn't drop the table from the database; it just empties it.
So you won't be able to run SELECT INTO, because the table's object ID already exists. However, TRUNCATE preserves all the indexes, keys, and table integrity.
If you drop the table instead, you get rid of its object ID and can then run SELECT INTO. That is only a good idea if the table you're inserting from is going to change all the time (which is a bad thing on its own). This method doesn't preserve any indexes or keys, and you'd have to recreate them in the stored procedure every time you run it.
Which is, again, a bad thing on its own.
My suggestion is to turn it into an INSERT INTO stored procedure with TRUNCATE in it. If your company decides to make changes to the table, you have to go and change both MyTable and the stored procedure, which is more of a headache, but usually companies don't change their database structure very often unless the database is still in development or testing and not live. In that case, SELECT INTO would only be a temporary solution.
CREATE PROC MyProc
as
TRUNCATE TABLE MyTable;
INSERT INTO MyTable (Col1, Col2, Col3)
SELECT Col1, Col2, Col3
FROM TableTwo
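If you go the SQL Server Agent route mentioned in the question, a hedged sketch of scheduling MyProc to run hourly might look like this (the job, schedule, and database names are made up):
EXEC msdb.dbo.sp_add_job @job_name = N'Hourly refresh of MyTable';
EXEC msdb.dbo.sp_add_jobstep @job_name = N'Hourly refresh of MyTable',
@step_name = N'Run MyProc',
@subsystem = N'TSQL',
@command = N'EXEC dbo.MyProc;',
@database_name = N'MyDatabase';
EXEC msdb.dbo.sp_add_schedule @schedule_name = N'Every hour',
@freq_type = 4, -- daily
@freq_interval = 1,
@freq_subday_type = 8, -- units of hours
@freq_subday_interval = 1; -- every 1 hour
EXEC msdb.dbo.sp_attach_schedule @job_name = N'Hourly refresh of MyTable',
@schedule_name = N'Every hour';
EXEC msdb.dbo.sp_add_jobserver @job_name = N'Hourly refresh of MyTable';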

Add column to huge table in PostgreSQL without downtime

I have a very small table (about 1 million rows) and I'm going to drop constraints and add a new column. The query below hung for about 5 minutes; I had to roll back.
BEGIN WORK;
LOCK TABLE usertable IN SHARE MODE;
ALTER TABLE usertable ALTER COLUMN email DROP NOT NULL;
COMMIT WORK;
Another approach suggested in similar questions on the internet:
CREATE TABLE new_tbl
(
field1 int,
field2 int,
...
);
INSERT INTO new_tbl(field1, field2, ...)
(
SELECT ... FROM ... -- use your new logic here to insert the updated data
)
CREATE INDEX -- add your constraints and indexes to new_tbl
DROP TABLE tbl;
ALTER TABLE new_tbl RENAME TO tbl;
1. Create the new table
2. Insert records from the old table into the new one (takes less than a second)
3. Drop the old table: this query hangs for about 5 minutes; I had to roll back. It does not work for me.
4. Rename the newly created table to the old name
Dropping the old table simply hangs. However, when I try to drop the newly created table with 1 million rows, it works instantly. Why does dropping the old table take so much time? I checked for blocking locks with the standard lock-monitoring query:
SELECT blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
I can see a lot of concurrent writes/reads that are waiting for my operation. Since I took a lock on the table, I don't think that is the reason I can't drop the old table.
I just ran VACUUM on the old table; it did not help.
Why can't I drop the old table, and why does it take so much time compared to dropping a recently created table?
I don't have a lot of experience with PostgreSQL, but my guess is that it keeps a bit to signify when a NULLable column is NULL (as opposed to empty) and when that column is marked as NOT NULL it no longer needs that bit. So, when you change that attribute on a column the system needs to go through the whole table and rearrange the data, moving lots of bits around so that the rows are all correctly structured.
This is much different from a DROP TABLE, which merely needs to mark the disk space as no longer in use and perhaps update a few metadata values.
In short, they're very different actions, so of course they take different amounts of time.
I was not able to drop/rename the original table because of foreign keys from other tables. Once I dropped those, the approach with renaming the table worked great.
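For anyone hitting the same wall, a sketch of how to list the foreign keys that reference the old table (the table name usertable is assumed from the question):
SELECT conname, conrelid::regclass AS referencing_table
FROM pg_catalog.pg_constraint
WHERE contype = 'f'
AND confrelid = 'usertable'::regclass;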

Updating a very large Oracle table

I have a very large table, people, with 60M rows indexed on id. I wish to populate a field newid for every record based on a lookup table, id_conversion (1M rows), which contains id and newid and is indexed on id.
When I run
update people p set p.newid=(select l.newid from id_conversion l where l.id=p.id)
it runs for an hour or so and then I get an archiver error (ORA-00257).
Any suggestions for either running the update in sections or a better SQL command?
If your update statement hits every single row of the table, you are likely better off running a CREATE TABLE ... AS SELECT query, which generates minimal undo (and, with NOLOGGING, minimal redo) compared with logging the change across all 60 million rows; that logging is likely the issue you're running into. You can then drop the old table and rename the new table to the old table's name.
Something like:
create table new_people as
select l.newid,
p.col2,
p.col3,
p.col4,
p.col5
from people p
join id_conversion l
on p.id = l.id;
drop table people;
-- rebuild any constraints and indexes
-- from old people table to new people table
alter table new_people rename to people;
For reference, read some of the tips here: http://www.dba-oracle.com/t_efficient_update_sql_dml_tips.htm
If you are basically creating a new table rather than just updating some of the rows, this will likely prove the faster method.
I doubt you will be able to get this to run in seconds. Your query, as written, needs to update all 60 million rows.
My first advice is to add an index on id_conversion(id, newid), to make the subquery more efficient. If that doesn't help, then doing the update in batches might be the best way to go.
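The covering index suggested above would look something like this (the index name is made up):
create index idx_id_conversion_id_newid on id_conversion(id, newid);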
I should add: because you are updating all the rows, it might be faster to take the following approach (a sketch follows the list).
1. Copy the data into a new table with the new values.
2. Truncate the original table.
3. Insert the new data into the old table.
Inserts are faster than updates.
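A hedged sketch of those three steps (any columns other than id and newid are assumptions, mirroring the placeholders in the earlier answer):
create table people_copy as
select p.id, l.newid, p.col2, p.col3
from people p
join id_conversion l on p.id = l.id;
truncate table people;
insert /*+ append */ into people (id, newid, col2, col3)
select id, newid, col2, col3 from people_copy;
drop table people_copy;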
In addition to the answers above, which will probably work better in this case, you should know about the MERGE statement:
http://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_9016.htm
It is used for updating one table according to another table and is far faster than an UPDATE driven by a correlated subquery.
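A minimal sketch for this case:
merge into people p
using id_conversion l
on (p.id = l.id)
when matched then
update set p.newid = l.newid;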

What does 'select to a temp table' mean?

This answer had me slightly confused. What is a 'select to a temp table' and can someone show me a simple example of it?
A temp table is a table that exists just for the duration of the stored procedure and is commonly used to hold temporary results on the way to a final calculation.
In SQL Server, all temp tables are prefixed with a #, so if you issue a statement like
Create table #tmp(id int, columnA varchar(30))
Then SQL Server will automatically know that the table is temporary, and it will be destroyed when the stored procedure goes out of scope unless the table is explicitly dropped like
drop table #tmp
I commonly use them in stored procedures that run against huge tables with a high transaction volume, because I can insert the subset of data that I need into the temp table as a temporary copy and work on the data without fear of bringing down a production system if what I'm doing with the data is a fairly intense operation.
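For instance, a sketch of that pattern (the table and column names are made up):
-- copy just the slice of data needed into a temp table
SELECT OrderID, CustomerID, Total
INTO #RecentOrders
FROM dbo.Orders
WHERE OrderDate >= DATEADD(DAY, -7, GETDATE());
-- run the heavy processing against #RecentOrders instead of dbo.Orders
SELECT CustomerID, SUM(Total) AS WeeklyTotal
FROM #RecentOrders
GROUP BY CustomerID;
DROP TABLE #RecentOrders;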
In SQL Server, all temp tables live in the tempdb database.
See this article for more information.
If you have a complex set of results that you want to use again and again, do you keep querying the main tables (where data will be changing, and performance may suffer), or do you store the results in a temporary table for further processing? It's often better to use a temporary table.
Or, if you really need to iterate through rows in a non-set-based fashion, you can use a temp table (or a CURSOR).
If you only do simple CRUD against a DB, then you probably have no need for temp tables.
You have:
table variables: DECLARE @foo TABLE (bar int...)
explicit temp tables: CREATE TABLE #foo (bar int...)
inline created: SELECT ... INTO #foo FROM...
A temp table is a table that is created dynamically using syntax such as:
SELECT [columns] INTO #MyTable FROM SomeExistingTable
What you then have is a table that is populated with the values that you selected into it. Now you can select against it, update it, whatever.
SELECT FirstName FROM #MyTable WHERE...
The table lives for some predetermined scope, for example the duration of the stored procedure in which it is created. Then it's gone and never accessible again. Temporary.
HTH
You can use SELECT ... INTO to both create a temp table and populate it like so:
SELECT Col1, Col2...
INTO #Table
FROM ...
WHERE ...
(BTW, this syntax is for SQL Server and Sybase.)
EDIT: Once you have created the table as I did above, you can then use it in other queries on the same connection:
Select OtherTable.*
From OtherTable
Join #Table
On #Table.Col = OtherTable.Col
The key here is that it all happens on the same connection. Thus, creating and using a temp table from a client script would be awkward, in that you would have to ensure that all subsequent uses of the table were on the same connection. Instead, most people use temp tables in stored procedures, where they create the table on one line and then use it a few lines later in the same procedure.
Think of temp tables as SQL variables of type 'table'. Use them in scripts and stored procedures. They come in handy when you need to manipulate data that is not a simple value but a subset of a database table (both vertically and horizontally).
Once you realize these benefits, you can take advantage of the power that comes with the various sharing models (scopes) for temp tables: private, global, transaction, etc. All major RDBMS engines support temp tables, but there are no standard features or syntax for them.
For an example of usage, see the linked answer.

How do I find the last time that a PostgreSQL database has been updated?

I am working with a PostgreSQL database that gets updated in batches. I need to know the last time that the database (or a table in the database) was updated or modified; either will do.
I saw that someone on the PostgreSQL forum suggested using logging and querying your logs for the time. This will not work for me, as I do not have control over the client's codebase.
You can write a trigger to run every time an insert/update is made on a particular table. The common usage is to set a "created" or "last_updated" column of the row to the current time, but you could also update the time in a central location if you don't want to change the existing tables.
So, for example, a typical way is the following:
CREATE FUNCTION stamp_updated() RETURNS TRIGGER LANGUAGE 'plpgsql' AS $$
BEGIN
NEW.last_updated := now();
RETURN NEW;
END
$$;
-- repeat for each table you need to track:
ALTER TABLE sometable ADD COLUMN last_updated TIMESTAMP;
CREATE TRIGGER sometable_stamp_updated
BEFORE INSERT OR UPDATE ON sometable
FOR EACH ROW EXECUTE PROCEDURE stamp_updated();
Then to find the last update time, you need to select "MAX(last_updated)" from each table you are tracking and take the greatest of those, e.g.:
SELECT MAX(max_last_updated) FROM (
SELECT MAX(last_updated) AS max_last_updated FROM sometable
UNION ALL
SELECT MAX(last_updated) FROM someothertable
) updates
For tables with a serial (or similarly generated) primary key, you can try to avoid the sequential scan for the latest update time by using the primary key index, or you can create an index on last_updated.
-- get timestamp of row with highest id
SELECT last_updated FROM sometable ORDER BY sometable_id DESC LIMIT 1
Note that this can give slightly wrong results in the case of IDs not being quite sequential, but how much accuracy do you need? (Bear in mind that transactions mean that rows can become visible to you in a different order to them being created.)
An alternative approach to avoid adding 'updated' columns to each table is to have a central table to store update timestamps in. For example:
CREATE TABLE update_log(table_name text NOT NULL, updated timestamp NOT NULL DEFAULT now());
CREATE FUNCTION stamp_update_log() RETURNS TRIGGER LANGUAGE 'plpgsql' AS $$
BEGIN
INSERT INTO update_log(table_name) VALUES(TG_TABLE_NAME);
RETURN NEW;
END
$$;
-- Repeat for each table you need to track:
CREATE TRIGGER sometable_stamp_update_log
AFTER INSERT OR UPDATE ON sometable
FOR EACH STATEMENT EXECUTE PROCEDURE stamp_update_log();
This will give you a table with a row for each table update: you can then just do:
SELECT MAX(updated) FROM update_log
to get the last update time. (You could split this out by table if you wanted.) This table will of course just keep growing: either create an index on 'updated' (which should make getting the latest value pretty fast) or truncate it periodically if that fits your use case (e.g. take an exclusive lock on the table, get the latest update time, then truncate it, if you need to periodically check whether changes have been made).
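A sketch of that periodic check-and-truncate cycle:
BEGIN;
LOCK TABLE update_log IN ACCESS EXCLUSIVE MODE;
SELECT MAX(updated) FROM update_log; -- compare against the value saved on the last run
TRUNCATE update_log;
COMMIT;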
An alternative approach, which might be what the folks on the forum meant, is to set 'log_statement = mod' in the database configuration (either globally for the cluster, or for the database or user you need to track); then all statements that modify the database will be written to the server log. You'll then need to write something outside the database to scan the server log, filtering out tables you aren't interested in, and so on.
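A sketch of how that setting could be applied (it requires superuser rights; the database and role names are made up):
ALTER DATABASE mydb SET log_statement = 'mod'; -- per database
ALTER ROLE batch_loader SET log_statement = 'mod'; -- or per user
-- or globally in postgresql.conf: log_statement = 'mod'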
It looks like you can use pg_stat_database to get a transaction count and check whether it changes from one backup run to the next; see this dba.se answer and its comments for more details.
I like Jack's approach. You can query the table stats and get the number of inserts, updates, deletes, and so on:
select n_tup_upd from pg_stat_user_tables where relname = 'YOUR_TABLE';
Every update will increase the count by 1.
Bear in mind that this method is viable when you have a single DB; multiple instances will probably require a different approach.
See the following article:
MySQL versus PostgreSQL: Adding a 'Last Modified Time' Column to a Table
http://www.pointbeing.net/weblog/2008/03/mysql-versus-postgresql-adding-a-last-modified-column-to-a-table.html
You can write a stored procedure in an "untrusted language" (e.g. plpythonu): this allows access to the files in the PostgreSQL "base" directory. Return the largest mtime of those files from the stored procedure.
But this is only approximate, since VACUUM will change these files and their mtime.
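A rough sketch of that idea, assuming the plpythonu extension is installed and you are connecting as a superuser (the function name is made up):
CREATE OR REPLACE FUNCTION last_data_file_mtime() RETURNS timestamp
LANGUAGE plpythonu AS $$
import os, datetime
# locate the cluster's data directory
datadir = plpy.execute(
    "SELECT setting FROM pg_settings WHERE name = 'data_directory'"
)[0]['setting']
latest = 0
# walk the "base" directory that holds the relation data files
for root, dirs, files in os.walk(os.path.join(datadir, 'base')):
    for name in files:
        try:
            latest = max(latest, os.path.getmtime(os.path.join(root, name)))
        except OSError:
            pass  # a file may vanish while we are walking
# PL/Python passes the str() of the returned value to the timestamp input function
return datetime.datetime.fromtimestamp(latest).isoformat()
$$;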