I've been told that RDBMS ( SQL Server in this case ) make use of the temporary database to perform its internal job, for instance when a SELECT count( column ) FROM foo query is performed.
What kind of queries / statements trigger the use of the temporary database?
background:
We are currently about to change the collation on our application database, but we have been told there might be problems if that database make use of the temporary database, because they will have different collation. The rationale is the temporary database is already being used by other applications.
So we want to identify what kind of queries may trigger temp db usage and see if they'll have any problem.
I've found this about when is the db used:
http://msdn.microsoft.com/en-us/library/ms190768.aspx
Related
I read about temporary tables, global temporary tables and table variables. I understood it but could not imagine a condition when I have to use this. Please elaborate on when I should use the temporary table.
Most common scenario for using temporary tables is from within a stored procedure.
If there is logic inside a stored procedure which involves manipulation of data that cannot be done within a single query, then in such cases, the output of one query / intermediate results can be stored in a temporary table which then participates in further manipulation via joins etc to achieve the final result.
One common scenario in using temporary tables is to store the results of a SELECT INTO statement
The table variable is relatively new (introduced in SQL Server 2005 - as far as i can remember ) can be used instead of the temp table in most cases. Some differences between the two are discussed here
In a lot of cases, especially in OLTP applications, usage of temporary tables within your procedures means that you MAY possibly have business processing logic in your database and might be a consideration for you to re-look your design - especially in case of n tier systems having a separate business layer in their application.
The main difference between the three is a matter of lifetime and scope.
By a global table, I am assuming you mean a standard, run of the mill, table. Tables are used for storing persistent data. They are accessible to all logged in users. Any changes you make are visible to other users and vice versa.
A temporary table exist solely for storing data within a session. The best time to use temporary tables are when you need to store information within SQL server for use over a number of SQL transactions. Like a normal table, you'll create it, interact with it (insert/update/delete) and when you are done, you'll drop it. There are two differences between a table and a temporary table.
The temporary table is only visible to you. Even if someone else creates a temporary table with the same name, no one else will be able to see or affect your temporary table.
The temporary table exists for as long as you are logged in, unless you explicitly drop it. If you log out or are disconnected SQL Server will automatically clean it up for you. This also means the data is not persistent. If you create a temporary table in one session and log out, it will not be there when you log back in.
A table variable works like any variable within SQL Server. This is used for storing data for use in a single transaction. This is a relatively new feature of TSQL and is generally used for passing data between procedures - like passing an array. There are three differences between a table and a table variable.
Like a temporary table, it is only visible to you.
Because it is a variable, it can be passed around between stored procedures.
The temporary table only exists within the current transaction. Once SQL Server finishes a transaction (with the GO or END TRANSACTION statements) or it goes out of scope, it will be deallocated.
I personally avoid using temporary tables and table variables, for a few reasons. First, the syntax for them is Microsoft specific. If your program is going to interact with more than one RDBMS, don't use them. Also, temporary tables and table variables have a tendency to increase the complexity of some SQL queries. If your code can be accomplished using a simpler method, I'd recommend going with simple.
I did a succesful migration from MySql to Sql Server using the migration tool.
Unfortunately for some reason it labels the tables database.DBO.tablename instead of just database.tablename
I have never used Sql Server so perhaps this is just the way they name their tables.
When I do:
SELECT TOP 1000 [rid]
,[filename]
,[qcname]
,[compound]
,[response]
,[isid]
,[isidresp]
,[finalconc]
,[rowid]
FROM [test].[calibration]
it does not work
But, when I do:
SELECT TOP 1000 [rid]
,[filename]
,[qcname]
,[compound]
,[response]
,[isid]
,[isidresp]
,[finalconc]
,[rowid]
FROM [test].[dbo].[calibration]
it works.
Does anyone know why it prefixes with DBO?
dbo is the standard database owner for anything you create (tables, stored procedures, etc,..), hence the migration tool automatically prefixing everything with it.
When you access something in Sql Server, such as a table called calibration, the following are functionally equivalent:
calibration
dbo.calibration
database_name.dbo.calibration
server_name.database_name.dbo.calibration
MySql doesn't, as far as I remember (we migrated a solution from MySql to SqlServer about 12 months ago using custom scripts executed by nant) support database owner's when referencing objects, hence you're probably not familiar with four part (server_name.database_name.owner_name.object_name) references.
Basically, if you want to specify the database you're accessing, you also need to specify the "owner" of the object. i.e, the following are functionally identical:
USE [master]
GO
SELECT * FROM [mydatabase].[dbo].[calibration]
USE [mydatabase]
GO
SELECT * FROM [calibration]
SqlServer uses an owner name when it references tables. In this case, dbo is the owner.
MySQL doesn't use owner for table names, which is why you didn't see those names before.
SQL Server has something called schemas, in this case the default schema is dbo but it could be anything you wanted. Schemas are used to logically group objects. So you can create a Employee schema and have all the Employee tables, views, procs and functions in there, this then also enables you to give certain users only access to certain schemas
Tell me your migration tool you have used, and let me know the version of from and to databases.
Regards
Eugene
You do have an issue here with the default schema, if it's set to 'dbo' for the user you logged in as you don't need to specify it. See http://msdn.microsoft.com/en-us/library/ms176060.aspx
I am trying to understand the tempDB and following are the doubts popping in my mind.
What is the lifetime of data in tempDB? Say a query is doing some Order By and uses tempDB for performing that. After this query finishes, someone else also executes a query which utilizes the tempDB. Will the second query find records written by first query in the tempDB or will they be deleted?
Are there any visible tables created inside the tempDB by the Sql Engine? How can I know which temporary table is created because of this query? Is there any naming convention followed by the Sql engine for naming these temporary tables?
I am new to tempDB so please pardon me for asking such silly (if at all) questions :-)
It will be very nice if someone can point me to a good resource which can help me learn about tempDB.
Temp table is stored in tempdb until the connection is dropped (or in the case of a global temp tables when the last connection using it is dropped). You can also (and it is a good practice to do so) manually drop the table when you are finished using it with a drop table statement.
No, others cannot see your temp tables if they are local temp tables (They can see and use global temp tables) Multiple people can run commands which use the same temp table name but they will not be overlapping in a local temp table and so you can have a table named #test and so can 10,000 other users, but each one has its own structure and data.
You don't want to generally look up temp tables in tempdb. It is possible to check for existence, but that is the only time I have ever referenced tempdb directly. Simply use your temp table name. Example below of checking for existence
IF OBJECT_ID('TempDB.dbo.#DuplicateAssignments') IS NOT NULL
BEGIN
DROP TABLE #DuplicateAssignments
END
You name temp tables by prefacing the name with # (for local tables the ones you would use 999.9% of the time) and ## for global temp tables, then the rest of the name you want.
There's a few MSDN articles that are probably the best source of information on the tempDB database in SQL Server.
tempdb Database
The tempdb system database is a global
resource that is available to all
users connected to the instance of SQL
Server and is used to hold the
following:
Temporary user objects that are explicitly created, such as: global or
local temporary tables, temporary
stored procedures, table variables, or
cursors.
Internal objects that are created by the SQL Server Database Engine, for
example, work tables to store
intermediate results for spools or
sorting.
Row versions that are generated by data modification transactions in a
database that uses read-committed
using row versioning isolation or
snapshot isolation transactions.
Row versions that are generated by data modification transactions for
features, such as: online index
operations, Multiple Active Result
Sets (MARS), and AFTER triggers.
Operations within tempdb are minimally
logged. This enables transactions to
be rolled back. tempdb is re-created
every time SQL Server is started so
that the system always starts with a
clean copy of the database. Temporary
tables and stored procedures are
dropped automatically on disconnect,
and no connections are active when the
system is shut down. Therefore, there
is never anything in tempdb to be
saved from one session of SQL Server
to another. Backup and restore
operations are not allowed on tempdb.
There's also tempdb and Index Creation, this blog post along with Working with tempdb in SQL Server 2005 which states:
The SQL Server system database, tempdb, has undergone a number of changes in SQL Server 2005. There are new tempdb usages and internal optimizations in SQL Server 2005; tempdb architecture is mostly unchanged since SQL Server 2000.
The tempdb system database is very similar to a user database. The main difference is that data in tempdb does not persist after SQL Server shuts down.
The temporary tables created in TempDB are dropped when the query is completed.
I'm not sure on this (I would have to try it), but I think theoretically ALL tables created in TempDB are visible, although only the user that created the table has permission to access it.
If you create an Oracle dblink you cannot directly access LOB columns in the target tables.
For instance, you create a dblink with:
create database link TEST_LINK
connect to TARGETUSER IDENTIFIED BY password using 'DATABASESID';
After this you can do stuff like:
select column_a, column_b
from data_user.sample_table#TEST_LINK
Except if the column is a LOB, then you get the error:
ORA-22992: cannot use LOB locators selected from remote tables
This is a documented restriction.
The same page suggests you fetch the values into a local table, but that is... kind of messy:
CREATE TABLE tmp_hello
AS SELECT column_a
from data_user.sample_table#TEST_LINK
Any other ideas?
The best solution by using a query as below, where column_b is a BLOB:
SELECT (select column_b from sample_table#TEST_LINK) AS column_b FROM DUAL
Yeah, it is messy, I can't think of a way to avoid it though.
You could hide some of the messiness from the client by putting the temporary table creation in a stored procedure (and using "execute immediate" to create they table)
One thing you will need to watch out for is left over temporary tables (should something fail half way through a session, before you have had time to clean it up) - you could schedule an oracle job to periodically run and remove any left over tables.
For query data, the solution of user2015502 is the smartest. If you want to insert or update LOB's AT the remote database (insert into xxx#yyy ...) you can easily use dynamic SQL for that. See my solution here:
You could use materalized views to handle all the "cache" management. It´s not perfect but works in most cases :)
Do you have a specific scenario in mind?
For example, if the LOB holds files, and you are on a company intranet, perhaps you can write a stored procedure to extract the files to a known directory on the network and access them from there.
In this specific case can the only way the two systems can communicate is using the dblink.
Also, the table solution is not that terrible, it's just messy to have to "cache" the data on my side of the dblink.
I'm going to guess that the answer is "no" based on the below error message (and this Google result), but is there anyway to perform a cross-database query using PostgreSQL?
databaseA=# select * from databaseB.public.someTableName;
ERROR: cross-database references are not implemented:
"databaseB.public.someTableName"
I'm working with some data that is partitioned across two databases although data is really shared between the two (userid columns in one database come from the users table in the other database). I have no idea why these are two separate databases instead of schema, but c'est la vie...
Note: As the original asker implied, if you are setting up two databases on the same machine you probably want to make two schemas instead - in that case you don't need anything special to query across them.
postgres_fdw
Use postgres_fdw (foreign data wrapper) to connect to tables in any Postgres database - local or remote.
Note that there are foreign data wrappers for other popular data sources. At this time, only postgres_fdw and file_fdw are part of the official Postgres distribution.
For Postgres versions before 9.3
Versions this old are no longer supported, but if you need to do this in a pre-2013 Postgres installation, there is a function called dblink.
I've never used it, but it is maintained and distributed with the rest of PostgreSQL. If you're using the version of PostgreSQL that came with your Linux distro, you might need to install a package called postgresql-contrib.
dblink() -- executes a query in a remote database
dblink executes a query (usually a SELECT, but it can be any SQL
statement that returns rows) in a remote database.
When two text arguments are given, the first one is first looked up as
a persistent connection's name; if found, the command is executed on
that connection. If not found, the first argument is treated as a
connection info string as for dblink_connect, and the indicated
connection is made just for the duration of this command.
one of the good example:
SELECT *
FROM table1 tb1
LEFT JOIN (
SELECT *
FROM dblink('dbname=db2','SELECT id, code FROM table2')
AS tb2(id int, code text);
) AS tb2 ON tb2.column = tb1.column;
Note: I am giving this information for future reference. Reference
I have run into this before an came to the same conclusion about cross database queries as you. What I ended up doing was using schemas to divide the table space that way I could keep the tables grouped but still query them all.
Just to add a bit more information.
There is no way to query a database other than the current one. Because PostgreSQL loads database-specific system catalogs, it is uncertain how a cross-database query should even behave.
contrib/dblink allows cross-database queries using function calls. Of course, a client can also make simultaneous connections to different databases and merge the results on the client side.
PostgreSQL FAQ
Yes, you can by using DBlink (postgresql only) and DBI-Link (allows foreign cross database queriers) and TDS_LInk which allows queries to be run against MS SQL server.
I have used DB-Link and TDS-link before with great success.
I have checked and tried to create a foreign key relationships between 2 tables in 2 different databases using both dblink and postgres_fdw but with no result.
Having read the other peoples feedback on this, for example here and here and in some other sources it looks like there is no way to do that currently:
The dblink and postgres_fdw indeed enable one to connect to and query tables in other databases, which is not possible with the standard Postgres, but they do not allow to establish foreign key relationships between tables in different databases.
If performance is important and most queries are read-only, I would suggest to replicate data over to another database. While this seems like unneeded duplication of data, it might help if indexes are required.
This can be done with simple on insert triggers which in turn call dblink to update another copy. There are also full-blown replication options (like Slony) but that's off-topic.
see https://www.cybertec-postgresql.com/en/joining-data-from-multiple-postgres-databases/ [published 2017]
These days you also have the option to use https://prestodb.io/
You can run SQL on that PrestoDB node and it will distribute the SQL query as required. It can connect to the same node twice for different databases, or it might be connecting to different nodes on different hosts.
It does not support:
DELETE
ALTER TABLE
CREATE TABLE (CREATE TABLE AS is supported)
GRANT
REVOKE
SHOW GRANTS
SHOW ROLES
SHOW ROLE GRANTS
So you should only use it for SELECT and JOIN needs. Connect directly to each database for the above needs. (It looks like you can also INSERT or UPDATE which is nice)
Client applications connect to PrestoDB primarily using JDBC, but other types of connection are possible including a Tableu compatible web API
This is an open source tool governed by the Linux Foundation and Presto Foundation.
The founding members of the Presto Foundation are: Facebook, Uber,
Twitter, and Alibaba.
The current members are: Facebook, Uber, Twitter, Alibaba, Alluxio,
Ahana, Upsolver, and Intel.
In case someone needs a more involved example on how to do cross-database queries, here's an example that cleans up the databasechangeloglock table on every database that has it:
CREATE EXTENSION IF NOT EXISTS dblink;
DO
$$
DECLARE database_name TEXT;
DECLARE conn_template TEXT;
DECLARE conn_string TEXT;
DECLARE table_exists Boolean;
BEGIN
conn_template = 'user=myuser password=mypass dbname=';
FOR database_name IN
SELECT datname FROM pg_database
WHERE datistemplate = false
LOOP
conn_string = conn_template || database_name;
table_exists = (select table_exists_ from dblink(conn_string, '(select Count(*) > 0 from information_schema.tables where table_name = ''databasechangeloglock'')') as (table_exists_ Boolean));
IF table_exists THEN
perform dblink_exec(conn_string, 'delete from databasechangeloglock');
END IF;
END LOOP;
END
$$