I have a query, A, in my MS Access database that takes ~2 seconds to execute. A gives me six fields: Field1, Field2, ..., Field6.
I must append the results of A to a table, T.
I created a query, B, that selects columns from A and inserts them into table T. However, B takes more than 10 minutes to run... Why, and how do I speed up B?
Here is the code for B:
INSERT INTO TrialRuns (Field1, Field2, ..., Field6)
SELECT A.Field1, A.Field2, ..., A.Field6
FROM A;
Try something like this:
INSERT INTO TrialRuns
SELECT * FROM A;
Try:
SELECT A.Field1, A.Field2, ..., A.Field6
INTO TrialRuns
FROM A;
Note that this may only work if the TrialRuns table doesn't exist to begin with, so do a DROP TABLE TrialRuns beforehand if it does exist. This should take about as long to run as the initial SELECT statement.
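If you want both steps written out, here is a minimal sketch (run them as two separate queries, since an Access query holds a single statement; the field list is abbreviated as in the question):
DROP TABLE TrialRuns;

SELECT A.Field1, A.Field2, ..., A.Field6
INTO TrialRuns
FROM A;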
I have 2 BQ tables, very wide ones in terms of number of columns. Note that all the table columns are made nullable for flexibility.
Table A - 1000 cols - superset of B's cols
Table B - 500 cols - subset of A's cols - exactly named/typed as A's cols
So rows from B's table data should be insertable into A, where any column not present in the insert just gets a null, i.e. 500 cols get a value and the remaining 500 get a default null.
As these tables are very wide, enumerating all the columns in an insert statement would take forever and be a maintenance nightmare.
Is there a way in standard SQL to insert without listing the column names in the insert statement, where they are automagically matched by name?
So I really want to be able to do this, and have the columns from B matched to A for each row inserted. If not, is there any other way I'm not seeing that could help with this?
thanks!
INSERT INTO `p.d.A` (
  SELECT * FROM `p.d.B`
)
I actually tried enumerating the columns to see if nesting worked, and it seems it doesn't:
INSERT INTO `p.d.A` (x, y.z) (
  SELECT x, y.z FROM `p.d.B`
)
I can't just say (x, y), as the y structs from the different tables aren't exactly the same; BQ complains the structs don't match exactly... hence why I was trying y.z?
Sure, easy!
Prepare a dummy table p.d.b_ using the select below:
SELECT * FROM `p.d.a` WHERE FALSE
(Note: even though the result will be an empty table, the above will scan the whole of table a. This is required just once, so it should be OK; if not, you can script this once and just create the table from that script.)
Ok, so now instead of using
SELECT * FROM `p.d.b`
you will use
SELECT * FROM `p.d.b*`
and this will do the trick for you (it did for me :o)
P.S. Of course I assume you will make sure there are no other tables with names starting with b (or whatever the real name is) in that dataset.
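Putting the whole trick together, a sketch of the steps as I read them (table names are the placeholders from the question; CREATE TABLE ... AS is assumed to be available, as in BigQuery standard SQL):
-- one-off: create an empty dummy table with a's full schema
CREATE TABLE `p.d.b_` AS
SELECT * FROM `p.d.a` WHERE FALSE;

-- the wildcard `p.d.b*` unions b and the dummy b_, so columns that
-- exist only in a come back as NULL for rows from b
INSERT INTO `p.d.a`
SELECT * FROM `p.d.b*`;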
I know how to find the CREATE statement for a table in SQL Server, but is there any place that stores the actual SQL code if I use SELECT INTO ... to create a table, and if so, how do I access it?
I see two ways of creating tables with SELECT INTO.
First: if you know the schema, you can declare a table variable and perform an INSERT ... SELECT into it.
Second: You can create a temp table:
SELECT * INTO #TempTable FROM Customer
There are some limitations on the second choice:
- You need to drop the temp table afterwards.
- If there is a VARCHAR column and the maximum number of characters in that particular SELECT is, say, 123 characters, then trying to insert a longer value into the temp table afterwards will throw an error.
My recommendation is to always declare the table before using it; it makes the intention clear and increases readability.
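A minimal sketch of the two options (the Customer table and its Id/Name columns are just placeholders):
-- Option 1: declare the shape up front, then INSERT ... SELECT
DECLARE @Customers TABLE (Id INT, Name NVARCHAR(100));

INSERT INTO @Customers (Id, Name)
SELECT Id, Name
FROM Customer;

-- Option 2: let SELECT INTO infer the shape from the query
SELECT Id, Name
INTO #TempCustomers
FROM Customer;

DROP TABLE #TempCustomers; -- remember to drop the temp table afterwards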
I have a table, stop_logs, in Hive. When I run an insert query for around 6000 rows, it takes 300 secs, whereas if I run just the SELECT query, it finishes in 6 seconds. Why is the insert taking this much time?
CREATE TABLE stop_logs (event STRING, loadId STRING)
STORED AS SEQUENCEFILE;
The following takes 300 secs:
INSERT INTO TABLE stop_logs
SELECT
i.event, i.loadId
FROM
event_logs i
WHERE
i.stopId IS NOT NULL;
The following query takes 6 secs:
SELECT
i.event, i.loadId
FROM
event_logs i
WHERE
i.stopId IS NOT NULL;
First you need to understand how Hive is processing your query:
When you perform a "select * from <tablename>", Hive fetches the whole data from the file as a FetchTask rather than a MapReduce task; it just dumps the data as it is without doing anything to it. This is similar to "hadoop dfs -text ". As it doesn't run any map-reduce task, it runs faster.
While using "select a,b from <tablename>", Hive requires a map-reduce job, since it needs to extract the 'column' from each row by parsing it from the file it loads.
When using the "insert into table stop_logs select a,b from event_logs" statement, the SELECT runs first and triggers a map-reduce job, since it needs to extract the columns from each row by parsing the file it loads; then, to insert into the other table (stop_logs), it launches another map-reduce task that takes the selected values for columns a and b and maps them to columns a and b of stop_logs for insertion into new rows.
Another reason for slowness: check whether "hive.typecheck.on.insert" is set to true. If it is, values are validated, converted and normalized to conform to their column types (Hive 0.12.0 onward) while inserting into the table, which also makes the insert slower compared to the plain select statement.
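If you want to check that setting, a small sketch for a Hive CLI / Beeline session (whether turning it off is acceptable depends on how much you rely on the validation):
-- show the current value of the property
SET hive.typecheck.on.insert;

-- disable it for this session only, then re-run the insert
SET hive.typecheck.on.insert=false;

INSERT INTO TABLE stop_logs
SELECT i.event, i.loadId
FROM event_logs i
WHERE i.stopId IS NOT NULL;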
I know that the OUTPUT clause can be used in an INSERT, UPDATE, DELETE, or MERGE statement. The results of an OUTPUT clause in an INSERT, UPDATE, DELETE, or MERGE statement can be stored in a target table.
But when I run this query
select * from <Tablename> output
I didn't get any error. The query executed just like select * from tablename, without any error and with the same number of rows.
So what is the exact use of the OUTPUT clause in a SELECT statement? If any, how can it be used?
I searched but couldn't find an answer!
The query in your question is in the same category of errors as the following (which I have also seen on this site):
SELECT *
FROM T1 NOLOCK
SELECT *
FROM T1
LOOP JOIN T2
ON X = Y
The first one just ends up aliasing T1 AS NOLOCK. The correct syntax for the hint would be (NOLOCK) or ideally WITH(NOLOCK).
The second one aliases T1 AS LOOP. To request a nested loops join, the syntax would need to be INNER LOOP JOIN.
Similarly in your question it just ends up applying the table alias of OUTPUT to your table.
None of OUTPUT, LOOP, NOLOCK are actually reserved keywords in TSQL, so it is valid to use them as a table alias without needing to quote them, e.g. in square brackets.
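A small sketch of the difference (SomeCol is a made-up column):
-- NOLOCK here is only a table alias, so no locking hint is applied
SELECT NOLOCK.SomeCol
FROM T1 NOLOCK;

-- to actually request the hint, write it as a table hint
SELECT T1.SomeCol
FROM T1 WITH (NOLOCK);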
The OUTPUT clause returns information about the rows affected by a statement. It is used along with INSERT, UPDATE, DELETE, or MERGE statements, as you mentioned, because those statements themselves return only the number of rows affected, not the rows themselves; adding OUTPUT gives you back the actual affected rows.
A SELECT statement already returns rows and doesn't affect any rows, so the OUTPUT clause with SELECT is neither needed nor supported. If you want to store the results of a SELECT statement in a target table, use SELECT INTO or a standard INSERT along with the SELECT statement.
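For example, a sketch of OUTPUT on an UPDATE returning the affected rows themselves rather than just a row count (the Orders table and its columns are made up):
UPDATE Orders
SET Status = 'Shipped'
OUTPUT inserted.OrderId,
       deleted.Status AS OldStatus,
       inserted.Status AS NewStatus
WHERE Status = 'Packed';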
EDIT
I guess I misunderstood your question. As @Martin Smith mentioned, it is acting as an alias in the SELECT statement you posted.
IF OBJECT_ID('tempdelete') IS NOT NULL DROP TABLE tempdelete
GO
IF OBJECT_ID('tempdb..#asd') IS NOT NULL DROP TABLE #asd
GO
CREATE TABLE tempdelete (
name NVARCHAR(100)
)
INSERT INTO tempdelete VALUES ('a'),('b'),('c')
--Creating empty temp table with the same columns as tempdelete
SELECT * INTO #asd FROM tempdelete WHERE 1 = 0
DELETE FROM tempdelete
OUTPUT deleted.* INTO #asd
SELECT * FROM #asd
This is how you can put all the deleted records into a table. The catch is that the target table has to have columns matching the table you are deleting from, which is why the SELECT ... INTO ... WHERE 1 = 0 above creates that empty table for you. This is how I do it.
I have a complex query in PostgreSQL and I want to use the result of it in other operations like UPDATEs and DELETEs, something like:
<COMPLEX QUERY>;
UPDATE WHERE <COMPLEX QUERY RESULT> = ?;
DELETE WHERE <COMPLEX QUERY RESULT> = ?;
UPDATE WHERE <COMPLEX QUERY RESULT> = ?;
I don't want to have to run the complex query once for each operation. One way to avoid this is to store the result in a table, use it in the WHEREs and JOINs, and drop the temporary table after finishing.
I want to know if there is another way that doesn't store the results in the database but reuses them from memory instead.
I already use loops for this, but I think doing a single set-based operation for each step will be faster than doing the operations row by row.
You can loop through the query results like @phatfingers demonstrates (probably with a generic record variable or scalar variables instead of a rowtype, if the result type of the query doesn't match any existing rowtype). This is a good idea for few resulting rows or when sequential processing is necessary.
For big result sets your original approach will perform faster by an order of magnitude. It is much cheaper to do a mass INSERT / UPDATE / DELETE with one SQL command
than to write / delete incrementally, one row at a time.
A temporary table is the right thing for reusing such results. It gets dropped automatically at the end of the session. You only have to delete explicitly if you want to get rid of it right away or at the end of a transaction. I quote the manual here:
Temporary tables are automatically dropped at the end of a session, or
optionally at the end of the current transaction.
For big temporary tables it might be a good idea to run ANALYZE after they are populated.
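A sketch of that approach (the table and column names only stand in for your complex query and targets):
-- materialize the complex result once
CREATE TEMP TABLE complex_result AS
SELECT id, some_value      -- stand-in for the complex query
FROM big_table
WHERE some_condition;

ANALYZE complex_result;    -- helps the planner for big temp tables

-- reuse it in the subsequent operations
UPDATE target_table t
SET col = cr.some_value
FROM complex_result cr
WHERE t.id = cr.id;

DELETE FROM other_table o
USING complex_result cr
WHERE o.id = cr.id;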
Writeable CTE
Here is a demo for what Pavel added in his comment:
CREATE TEMP TABLE t1(id serial, txt text);
INSERT INTO t1(txt)
VALUES ('foo'), ('bar'), ('baz'), ('bax');
CREATE TEMP TABLE t2(id serial, txt text);
INSERT INTO t2(txt)
VALUES ('foo2'),('bar2'),('baz2');
CREATE TEMP TABLE t3 (id serial, txt text);
WITH x AS (
UPDATE t1
SET txt = txt || '2'
WHERE txt ~~ 'ba%'
RETURNING txt
)
, y AS (
DELETE FROM t2
USING x
WHERE t2.txt = x.txt
RETURNING *
)
INSERT INTO t3
SELECT *
FROM y
RETURNING *;
Read more in the chapter Data-Modifying Statements in WITH in the manual.
-- inside a PL/pgSQL function: loop over the complex query's rows
DECLARE
    r foo%rowtype;
BEGIN
    FOR r IN [COMPLEX QUERY]
    LOOP
        -- process r
    END LOOP;
    RETURN;
END