Creating temporary tables in SQL - sql

I am trying to create a temporary table that selects only the data for a certain register_type. I wrote this query but it does not work:
$ CREATE TABLE temp1
(Select
egauge.dataid,
egauge.register_type,
egauge.timestamp_localtime,
egauge.read_value_avg
from rawdata.egauge
where register_type like '%gen%'
order by dataid, timestamp_localtime ) $
I am using PostgreSQL.
Could you please tell me what is wrong with the query?

You probably want CREATE TABLE AS - also works for TEMPORARY (TEMP) tables:
CREATE TEMP TABLE temp1 AS
SELECT dataid
, register_type
, timestamp_localtime
, read_value_avg
FROM rawdata.egauge
WHERE register_type LIKE '%gen%'
ORDER BY dataid, timestamp_localtime;
This creates a temporary table and copies data into it. A static snapshot of the data, mind you. It's just like a regular table, but resides in RAM if temp_buffers is set high enough. It is only visible within the current session and dies at the end of it. When created with ON COMMIT DROP it dies at the end of the transaction.
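For example, a minimal sketch of the transaction-scoped variant, reusing the query from above (note the ON COMMIT clause goes before AS):
BEGIN;

CREATE TEMP TABLE temp1 ON COMMIT DROP AS
SELECT dataid, register_type, timestamp_localtime, read_value_avg
FROM   rawdata.egauge
WHERE  register_type LIKE '%gen%';

-- ... work with temp1 inside this transaction ...

COMMIT;  -- temp1 is dropped here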
Temp tables come first in the default schema search path, hiding other visible tables of the same name unless schema-qualified:
How does the search_path influence identifier resolution and the "current schema"
If you want dynamic, you would be looking for CREATE VIEW - a completely different story.
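For comparison, a minimal sketch of that dynamic variant (the view name gen_readings is made up here); a view stores no data and always reflects the current contents of rawdata.egauge:
CREATE VIEW gen_readings AS
SELECT dataid, register_type, timestamp_localtime, read_value_avg
FROM   rawdata.egauge
WHERE  register_type LIKE '%gen%';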
The SQL standard also defines SELECT INTO, and Postgres supports it, but its use is discouraged:
It is best to use CREATE TABLE AS for this purpose in new code.
There is really no need for a second syntax variant, and SELECT INTO is used for assignment in plpgsql, where the SQL syntax is consequently not possible.
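For reference only, the discouraged form would look like this (equivalent to the CREATE TEMP TABLE ... AS statement above):
SELECT dataid, register_type, timestamp_localtime, read_value_avg
INTO   TEMP temp1
FROM   rawdata.egauge
WHERE  register_type LIKE '%gen%';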
Related:
Combine two tables into a new one so that select rows from the other one are ignored
ERROR: input parameters after one with a default value must also have defaults in Postgres
CREATE TABLE LIKE (...) only copies the structure from another table and no data:
The LIKE clause specifies a table from which the new table
automatically copies all column names, their data types, and their
not-null constraints.
If you need a "temporary" table just for the purpose of a single query (and then discard it), a "derived table" in a CTE or a subquery comes with considerably less overhead, as sketched after the links below:
Change the execution plan of query in postgresql manually?
Combine two SELECT queries in PostgreSQL
Reuse computed select value
Multiple CTE in single query
Update with results of another sql
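For instance, a minimal sketch of the original query as a single-use derived table in a CTE (the name gen is made up):
WITH gen AS (
   SELECT dataid, register_type, timestamp_localtime, read_value_avg
   FROM   rawdata.egauge
   WHERE  register_type LIKE '%gen%'
)
SELECT *
FROM   gen
ORDER  BY dataid, timestamp_localtime;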

http://www.postgresql.org/docs/9.2/static/sql-createtable.html
CREATE TEMP TABLE temp1 (LIKE ...);  -- copies only the structure of the source table, no data

Related

How to disallow loading duplicate rows to BigQuery?

I was wondering if there is a way to disallow duplicates in BigQuery?
Based on this article I can deduplicate a whole table or a partition of a table.
To deduplicate a whole table:
CREATE OR REPLACE TABLE `transactions.testdata`
PARTITION BY date
AS SELECT DISTINCT * FROM `transactions.testdata`;
To deduplicate a table based on partitions defined in a WHERE clause:
MERGE `transactions.testdata` t
USING (
SELECT DISTINCT *
FROM `transactions.testdata`
WHERE date=CURRENT_DATE()
)
ON FALSE
WHEN NOT MATCHED BY SOURCE AND date=CURRENT_DATE() THEN DELETE
WHEN NOT MATCHED BY TARGET THEN INSERT ROW
If there is no way to disallow duplicates then is this a reasonable approach to deduplicate a table?
BigQuery doesn't have a mechanism like the constraints found in a traditional DBMS. In other words, you can't set a primary key or anything like that, because BigQuery is not focused on transactions but on fast analysis and scalability. You should think of it as a data lake and not as a database with a uniqueness guarantee.
If you have an existing table and need to de-duplicate it, the mentioned approaches will work. If you need your table to have unique rows by default and want to programmatically insert unique rows in your table without resorting to external resources, I can suggest you a workaround:
First, insert your data into a temporary table.
Then, run a query against your temporary table and save the results into your actual table (a rough sketch follows after this list). This step can be done programmatically in a few different ways:
Using the approach you mentioned as a scheduled query
Using a bq command such as bq query --use_legacy_sql=false --destination_table=<dataset.actual_table> 'select distinct * from <dataset.temporary_table>', which queries the distinct values in your temporary table and loads the results into the target table given in the --destination_table attribute. It's important to mention that this approach also works for partitioned tables.
Finally, drop the temporary table. Like the previous step, this can be done either with a scheduled query or with a bq command.
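A rough sketch of steps 2 and 3 in standard SQL (the dataset and table names are illustrative, and it is assumed the new rows have already been loaded into the staging table):
-- write only the distinct rows from the staging (temporary) table into the actual table
INSERT INTO `mydataset.transactions`
SELECT DISTINCT * FROM `mydataset.staging_transactions`;

-- then clean up the staging table
DROP TABLE `mydataset.staging_transactions`;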
I hope it helps

Difference between CTE, Temp Table and Table Variable in MSSQL

All are used to store data temporarily.
Are there any performance differences (time complexity and space complexity) between these 3 types of temporary storage?
Performance presumably depends on whether the result is held in memory or saved to disk.
I have searched a lot but did not get a satisfactory answer.
CTE - Common Table Expressions
CTE stands for Common Table Expression. It was introduced with SQL Server 2005. It is a temporary result set, typically the result of a complex sub-query. Unlike a temporary table, its life is limited to the current query. It is defined using the WITH keyword. CTEs improve the readability and ease of maintenance of complex queries and sub-queries. If the CTE is not the first statement in a batch, the preceding statement must be terminated with a semicolon, which is why you often see CTEs written starting with ;WITH.
With CTE1(Address, Name, Age)--Column names for CTE, which are optional
AS
(
SELECT Addr.Address, Emp.Name, Emp.Age from Address Addr
INNER JOIN EMP Emp ON Emp.EID = Addr.EID
)
SELECT * FROM CTE1 --Using CTE
WHERE CTE1.Age > 50
ORDER BY CTE1.NAME
When to use CTE?
This is used to store the result of a complex sub-query for further use.
This is also used to create a recursive query, as sketched below.
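For the recursive case, here is a sketch against the EMP table from the example above (the ManagerID column is assumed for illustration):
;WITH EmpHierarchy (EID, Name, ManagerID, HierarchyLevel)
AS
(
-- anchor member: employees with no manager (top of the hierarchy)
SELECT EID, Name, ManagerID, 0 AS HierarchyLevel
FROM EMP
WHERE ManagerID IS NULL
UNION ALL
-- recursive member: employees reporting to someone already in the result
SELECT e.EID, e.Name, e.ManagerID, h.HierarchyLevel + 1
FROM EMP e
INNER JOIN EmpHierarchy h ON e.ManagerID = h.EID
)
SELECT * FROM EmpHierarchy
ORDER BY HierarchyLevel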
Temporary Tables
In SQL Server, temporary tables are created at run-time and you can do all the operations on them that you can do on a normal table. These tables are created inside the tempdb database. Based on scope and behavior, temporary tables are of two types, as given below:
Local Temp Table
Local temp tables are only available to the SQL Server session or connection (i.e. a single user) that created them. They are automatically deleted when the session that created them is closed. A local temporary table name starts with a single hash ("#") sign.
CREATE TABLE #LocalTemp
(
UserID int,
Name varchar(50),
Address varchar(150)
)
GO
insert into #LocalTemp values ( 1, 'Shailendra','Noida');
GO
Select * from #LocalTemp
The scope of a local temp table is limited to the current session of the current user, which in practice means the current query window. If you close the current query window, or open a new query window and try to use the temp table created above, you will get an error.
Global Temp Table
Global temp tables are available to all SQL Server sessions or connections (i.e. all users). They can be created by any SQL Server connection and are automatically deleted when the session that created them ends and all other sessions have stopped referencing them. A global temporary table name starts with a double hash ("##") sign.
CREATE TABLE ##GlobalTemp
(
UserID int,
Name varchar(50),
Address varchar(150)
)
GO
insert into ##GlobalTemp values ( 1, 'Shailendra','Noida');
GO
Select * from ##GlobalTemp
Global temporary tables are visible to all SQL Server connections while Local temporary tables are visible to only current SQL Server connection.
Table Variable
This acts like a variable and exists for a particular batch of query execution. It gets dropped once the batch ends. Like a temp table, it is created in the tempdb database, not purely in memory. It also lets you declare a primary key and an identity column at declaration time, but not (prior to the inline index syntax of SQL Server 2014) a non-clustered index.
GO
DECLARE @TProduct TABLE
(
SNo INT IDENTITY(1,1),
ProductID INT,
Qty INT
)
--Insert data into table variable @TProduct
INSERT INTO @TProduct (ProductID, Qty)
SELECT DISTINCT ProductID, Qty FROM ProductsSales ORDER BY ProductID ASC
--Select data
Select * from @TProduct
--Next batch
GO
Select * from @TProduct --gives an error in the next batch
Note
Temp tables are physically created in the tempdb database. They act like normal tables and can also have constraints and indexes, just like normal tables.
A CTE is a named temporary result set which is used to simplify complex sub-queries. It exists only for the scope of the single statement it is defined in. It is not materialized in tempdb or in memory; it is expanded inline into the query plan. You cannot create an index on a CTE.
A table variable acts like a variable and exists for a particular batch of query execution. It gets dropped once the batch ends. It is also created in the tempdb database, not purely in memory.
Fairly broad topic to cover all the ins and outs. Here are a few high level differences which would give you more ideas for researching this.
CTEs are part of the same query and should be thought of as being very similar to a sub-query. A CTE allows for better readability and code-reuse (same CTE can be reused in different parts of the overall query).
Table variables and temporary tables should be thought of as being similar to real tables, but with optimizations that enable SQL Server to make operations against them fast, especially when used with relatively small data sets. Note that although these operate against tempdb, that doesn't automatically mean the data stored there is actually persisted to disk. With each new version of SQL Server, there have been additional optimizations (memory-optimized table variables, for example) to make these constructs faster, especially for their mainline use case of simplifying complex queries.
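As an illustration of one such optimization, here is a sketch of a memory-optimized table variable (the type and variable names are made up, and the database needs a MEMORY_OPTIMIZED_DATA filegroup, SQL Server 2014+):
-- a memory-optimized table type; such types must have at least one index
CREATE TYPE dbo.ProductIdList AS TABLE
(
    ProductID INT NOT NULL PRIMARY KEY NONCLUSTERED
) WITH (MEMORY_OPTIMIZED = ON);
GO
-- a table variable of that type lives in memory rather than in tempdb
DECLARE @Ids dbo.ProductIdList;
INSERT INTO @Ids (ProductID) VALUES (1), (2), (3);
SELECT * FROM @Ids;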
See this for more information on this topic:
https://www.brentozar.com/archive/2014/06/temp-tables-table-variables-memory-optimized-table-variables/

PostgreSQL return select results AND add them to temporary table?

I want to select a set of rows and return them to the client, but I would also like to insert just the primary keys (integer id) from the result set into a temporary table for use in later joins in the same transaction.
This is for sync, where subsequent queries tend to involve a join on the results from earlier queries.
What's the most efficient way to do this?
I'm reluctant to execute the query twice, although it may well be fast if the data is cached. An alternative is to store the entire result set in the temporary table and then select from the temporary table afterwards. That also seems wasteful (I only need the integer id in the temp table). I'd be happy if there were a SELECT INTO TEMP that also returned the results.
Currently the technique used is to construct an array of the integer ids on the client side and use that in subsequent queries with IN. I'm hoping for something more efficient.
I'm guessing it could be done with stored procedures? But is there a way without that?
I think you can do this with a Postgres feature that allows data modification steps in CTEs. The more typical reason to use this feature is, say, to delete records for a table and then insert them into a log table. However, it can be adapted to this purpose. Here is one possible method (I don't have Postgres on hand to test this):
with q as (
<your query here>
),
t as (
insert into temptable(pk)
select pk
from q
)
select *
from q;
Usually, you use the returning clause with data-modifying queries in order to capture the data being modified.
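For example, one way to use RETURNING here is the sketch below; temptable with a pk column is assumed to already exist, and big_table, id and synced are made-up names. The keys land in the temp table, and the full rows are joined back for the client:
WITH ins AS (
    INSERT INTO temptable (pk)
    SELECT id
    FROM big_table
    WHERE synced = false
    RETURNING pk
)
SELECT b.*
FROM big_table b
JOIN ins ON ins.pk = b.id;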

IN vs OR of Oracle, which faster?

I'm developing an application which processes a lot of data in an Oracle database.
In some cases I have to fetch many objects based on a given list of conditions, and I use SELECT ... FROM ... WHERE ... IN ..., but an IN list accepts at most 1,000 items.
So I use an OR expression instead, but as far as I can observe, the query using OR is slower than the one using IN (with the same list of conditions). Is that right? And if so, how can I improve the speed of the query?
IN is preferable to OR -- OR is a notoriously bad performer, and can cause other issues that would require using parentheses in complex queries.
A better option than either IN or OR is to join to a table containing the values you want (or don't want). This comparison table can be a derived table, a temporary table, or a table that already exists in your schema.
In this scenario I would do this:
Create a one column global temporary table
Populate this table with your list from the external source (and quickly - another whole discussion)
Do your query by joining the temporary table to the other table (consider dynamic sampling as the temporary table will not have good statistics)
This means you can leave the sort to the database and write a simple query.
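A sketch of that approach (all object names are illustrative):
-- one-column global temporary table to hold the ID list
CREATE GLOBAL TEMPORARY TABLE gtt_ids
(
    id NUMBER PRIMARY KEY
)
ON COMMIT DELETE ROWS;

-- populate it from the external source, ideally with array/batch binds
INSERT INTO gtt_ids (id) VALUES (:id);

-- join instead of a huge IN / OR list; dynamic sampling because the GTT has no statistics
SELECT /*+ dynamic_sampling(g 2) */ t.*
FROM   big_table t
JOIN   gtt_ids g ON g.id = t.id;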
Oracle internally converts IN lists to lists of ORs anyway, so there should really be no performance difference. The only difference is that with IN Oracle does the transformation itself, whereas with hand-written ORs it has a longer string to parse.
Here is how you test that.
CREATE TABLE my_test (id NUMBER);
SELECT 1
FROM my_test
WHERE id IN (1,2,3,4,5,6,7,8,9,10,
21,22,23,24,25,26,27,28,29,30,
31,32,33,34,35,36,37,38,39,40,
41,42,43,44,45,46,47,48,49,50,
51,52,53,54,55,56,57,58,59,60,
61,62,63,64,65,66,67,68,69,70,
71,72,73,74,75,76,77,78,79,80,
81,82,83,84,85,86,87,88,89,90,
91,92,93,94,95,96,97,98,99,100
);
SELECT sql_text, hash_value
FROM v$sql
WHERE sql_text LIKE '%my_test%';
SELECT operation, options, filter_predicates
FROM v$sql_plan
WHERE hash_value = '1181594990'; -- hash_value from previous query
SELECT STATEMENT
TABLE ACCESS FULL ("ID"=1 OR "ID"=2 OR "ID"=3 OR "ID"=4 OR "ID"=5
OR "ID"=6 OR "ID"=7 OR "ID"=8 OR "ID"=9 OR "ID"=10 OR "ID"=21 OR
"ID"=22 OR "ID"=23 OR "ID"=24 OR "ID"=25 OR "ID"=26 OR "ID"=27 OR
"ID"=28 OR "ID"=29 OR "ID"=30 OR "ID"=31 OR "ID"=32 OR "ID"=33 OR
"ID"=34 OR "ID"=35 OR "ID"=36 OR "ID"=37 OR "ID"=38 OR "ID"=39 OR
"ID"=40 OR "ID"=41 OR "ID"=42 OR "ID"=43 OR "ID"=44 OR "ID"=45 OR
"ID"=46 OR "ID"=47 OR "ID"=48 OR "ID"=49 OR "ID"=50 OR "ID"=51 OR
"ID"=52 OR "ID"=53 OR "ID"=54 OR "ID"=55 OR "ID"=56 OR "ID"=57 OR
"ID"=58 OR "ID"=59 OR "ID"=60 OR "ID"=61 OR "ID"=62 OR "ID"=63 OR
"ID"=64 OR "ID"=65 OR "ID"=66 OR "ID"=67 OR "ID"=68 OR "ID"=69 OR
"ID"=70 OR "ID"=71 OR "ID"=72 OR "ID"=73 OR "ID"=74 OR "ID"=75 OR
"ID"=76 OR "ID"=77 OR "ID"=78 OR "ID"=79 OR "ID"=80 OR "ID"=81 OR
"ID"=82 OR "ID"=83 OR "ID"=84 OR "ID"=85 OR "ID"=86 OR "ID"=87 OR
"ID"=88 OR "ID"=89 OR "ID"=90 OR "ID"=91 OR "ID"=92 OR "ID"=93 OR
"ID"=94 OR "ID"=95 OR "ID"=96 OR "ID"=97 OR "ID"=98 OR "ID"=99 OR
"ID"=100)
I would question the whole approach. The client of the SP has to send 100,000 IDs. Where does the client get those IDs from? Sending such a large number of IDs as a parameter of the proc is going to have a significant cost anyway.
If you create the table with a primary key:
CREATE TABLE my_test (id NUMBER,
CONSTRAINT PK PRIMARY KEY (id));
and go through the same SELECTs to run the query with the multiple IN values, followed by retrieving the execution plan via hash value, what you get is:
SELECT STATEMENT
INLIST ITERATOR
INDEX RANGE SCAN
This seems to imply that when you have an IN list and are using this with a PK column, Oracle keeps the list internally as an "INLIST" because it is more efficient to process this, rather than converting it to ORs as in the case of an un-indexed table.
I was using Oracle 10gR2 above.

What does 'select to a temp table' mean?

This answer had me slightly confused. What is a 'select to a temp table' and can someone show me a simple example of it?
A temp table is a table that exists just for the duration of the stored procedure and is commonly used to hold temporary results on the way to a final calculation.
In SQL Server, all temp tables are prefixed with a # so if you issue a statement like
Create table #tmp(id int, columnA varchar(30))   -- columnA given an arbitrary type so the statement compiles
Then SQL Server will automatically know that the table is temporary, and it will be destroyed when the stored procedure goes out of scope unless the table is explicitly dropped like
drop table #tmp
I commonly use them in stored procedures that run against huge tables with a high transaction volume, because I can insert the subset of data that I need into the temp table as a temporary copy and work on the data without fear of bringing down a production system if what I'm doing with the data is a fairly intense operation.
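A sketch of that pattern, reusing a temp table like the #tmp above (dbo.BigBusyTable is a made-up name):
-- copy just the subset of data I need from the big, busy table
INSERT INTO #tmp (id, columnA)
SELECT id, columnA
FROM dbo.BigBusyTable
WHERE id BETWEEN 1 AND 100000;

-- the intense work then happens against the copy, not the production table
UPDATE #tmp
SET columnA = UPPER(columnA);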
In SQL Server, all temp tables live in the tempdb database.
See this article for more information.
If you have a complex set of results that you want to use again and again, do you keep querying the main tables (where the data will be changing, and may impact performance), or do you store the results in a temporary table for further processing? It's often better to use a temporary table.
Or, if you really need to iterate through rows in a non-set-based fashion, you can use a temp table (or a CURSOR).
If you do simple CRUD against a DB then you probably have no need for temp tables
You have:
table variables: DECLARE @foo TABLE (bar int...)
explicit temp tables: CREATE TABLE #foo (bar int...)
inline created: SELECT ... INTO #foo FROM...
A temp table is a table that is dynamically created using syntax such as:
SELECT [columns] INTO #MyTable FROM SomeExistingTable
What you then have is a table that is populated with the values that you selected into it. Now you can select against it, update it, whatever.
SELECT FirstName FROM #MyTable WHERE...
The table lives for some predetermined scope of time, for example, for the duration of the stored procedure in which it lives. Then it's gone from memory and never accessible again. Temporary.
HTH
You can use SELECT ... INTO to both create a temp table and populate it like so:
SELECT Col1, Col2...
INTO #Table
FROM ...
WHERE ...
(BTW, this syntax is for SQL Server and Sybase.)
EDIT: Once you have created the table as I did above, you can then use it in other queries on the same connection:
Select *
From OtherTable
Join #Table
On #Table.Col = OtherTable.Col
The key here is that it all happens on the same connection. Thus, to create and use a temp table from a client script would be awkward in that you would have to ensure that all subsequent uses of the table were on the same connection. Instead, most people use temp tables in stored procedures where they create the table on one line and then use a few lines later in the same procedure.
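A sketch of that typical usage (all table, column and procedure names are made up):
CREATE PROCEDURE dbo.GetRecentOrderTotals
AS
BEGIN
    -- create and fill the temp table on this connection
    SELECT OrderID, CustomerID, TotalDue
    INTO   #RecentOrders
    FROM   dbo.Orders
    WHERE  OrderDate >= DATEADD(day, -7, GETDATE());

    -- reuse it a few lines later, still on the same connection
    SELECT c.CustomerName, SUM(r.TotalDue) AS WeekTotal
    FROM   #RecentOrders r
    JOIN   dbo.Customers c ON c.CustomerID = r.CustomerID
    GROUP  BY c.CustomerName;
END
-- #RecentOrders is dropped automatically when the procedure goes out of scope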
Think of temp tables as SQL variables of type 'table'. Use them in scripts and stored procedures. They come in handy when you need to manipulate data that is not a simple value but a subset of a database table (both vertical and horizontal).
Once you realize these benefits, you can take advantage of the additional power that comes with the various sharing models (scopes) for temp tables: private, global, transaction, etc. All major RDBMS engines support temp tables, but there are no standard features or syntax for them.
For an example of usage, see this answer.