I have created a stored procedure in BigQuery where I do some transformations and data inserts and deletes.
I have one query in my procedure which, for some reason, fails with:
Not found: Table rdg-datawarehouse:Amazon_Raw._metadata_order_items was not found in location US at [rdg-datawarehouse.Amazon.update_orders:141:1]
insert into `rdg-datawarehouse.Amazon_Raw._metadata_order_items` (OrderId, PurchaseDate, processed_flag)
select distinct
src.OrderId,
src.PurchaseDate,
0 as processed_bit
from `rdg-datawarehouse.Amazon_Raw.Orders` src
inner join `last_updated_orders` tmp
on src.OrderId = tmp.OrderId
and src.LastUpdateDate = tmp.LastUpdateDate
and src.import_timestamp = tmp.last_import_timestamp
left join `rdg-datawarehouse.Amazon_Raw._metadata_order_items` mit
on tmp.OrderId = mit.OrderId
where coalesce(mit.OrderId, 1) = 1;
The last_updated_orders table is a temporary table defined previously in the procedure.
create temp table `last_updated_orders` (OrderId STRING, LastUpdateDate TIMESTAMP, last_import_timestamp TIMESTAMP)
as
select ...
I have checked both datasets and tables, and they are all stored in the US.
I have even tried forcing the query session to run in the US, by going to More -> Query Settings and setting the Data Location as "US".
But, unfortunately nothing has worked. Any suggestions are welcome.
I have an MSSQL database table (called TABLE here) with four columns (ID, lookup, date, value), and I want to check, using Python, whether a large amount of data is already in the database. The data I want to add is called to_be_added here, with columns index, lookup, date, value.
To check whether the data already exists I use the following SQL. It returns the index of each to_be_added row that is not yet in the database. I first filter the table down to the relevant lookup values and then perform the join only on that subset (called existing here).
SELECT to_be_added."index",existing."ID" FROM
(
(
select * from dbo.TABLE
where "lookup" in (1,2,3,4,5,6,7,...)
) existing
right join
(
select * from
( Values
(1, 1, 1/1/2000, 0.123),(2, 2, 1/2/2000, 0.456),(...,...,...)
)t1(index,lookup,date,value)
)to_be_added
on existing.lookup = to_be_added.lookup
and existing.date = to_be_added.date
)
WHERE existing."ID" IS NULL
I do it batchwise, as otherwise the SQL command gets too large to submit and the execution time is too long. As I have millions of rows to compare, this becomes quite time consuming, so I am looking for a more efficient command.
Any help appreciated
I would do the following:
Load the data from Excel into a table in your DB e.g. table = to_be_added
Run a query like this:
SELECT a.index
FROM to_be_added a
LEFT OUTER JOIN existing e ON
a.lookup = e.lookup
and a.date = e.date
WHERE e.lookup IS NULL;
Ensure that table "existing" has an index on lookup+date
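Since the question mentions driving this from Python, here is a minimal sketch of the same staging-table anti-join. It uses the standard-library sqlite3 module purely as a runnable stand-in for MSSQL, so the connection, the column types, the helper name find_missing and the index name ix_existing_lookup_date are all assumptions for illustration, not part of the original setup.

```python
import sqlite3


def find_missing(conn, rows):
    """Return the "index" values of rows whose (lookup, date) pair is not in `existing`."""
    cur = conn.cursor()
    # Stage the candidate rows in a temp table instead of building a huge VALUES list.
    cur.execute(
        'CREATE TEMP TABLE to_be_added ("index" INTEGER, lookup INTEGER, date TEXT, value REAL)'
    )
    cur.executemany("INSERT INTO to_be_added VALUES (?, ?, ?, ?)", rows)
    # Anti-join: keep only staged rows that have no match in `existing`.
    cur.execute(
        """
        SELECT a."index"
        FROM to_be_added a
        LEFT JOIN existing e
               ON a.lookup = e.lookup AND a.date = e.date
        WHERE e.lookup IS NULL
        ORDER BY a."index"
        """
    )
    missing = [r[0] for r in cur.fetchall()]
    cur.execute("DROP TABLE to_be_added")
    return missing


# Hypothetical demo data; in practice `existing` is the real table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE existing (ID INTEGER PRIMARY KEY, lookup INTEGER, date TEXT, value REAL)"
)
# Composite index on the join columns, as recommended above.
conn.execute("CREATE INDEX ix_existing_lookup_date ON existing (lookup, date)")
conn.executemany(
    "INSERT INTO existing (lookup, date, value) VALUES (?, ?, ?)",
    [(1, "2000-01-01", 0.123), (2, "2000-01-02", 0.456)],
)

candidates = [
    (1, 1, "2000-01-01", 0.123),  # already present
    (2, 2, "2000-01-03", 0.456),  # same lookup, different date
    (3, 3, "2000-01-01", 0.789),  # unknown lookup
]
missing = find_missing(conn, candidates)
```

Against a real SQL Server the same shape should carry over with a driver such as pyodbc and a #temp staging table, but the bulk-load step would need to be adapted to whatever the driver offers.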
I have a complex stored procedure that will also call other stored procedures as part of the workflow. I have checked all stored procedures for the ambiguous column 'ColumnId' error.
The first oddity is that the error is triggered by a single input value, yet it does not reproduce for all users even with that same input. The second oddity is that I have checked every SELECT, JOIN, WHERE, ORDER BY, and GROUP BY for the usual causes of ambiguity and have not found any violations.
The only potential violation might be
SELECT RateID
FROM Rate.tblRate
INNER JOIN #tmpRate
ON tblRate.CustomerID = #tmpRate.CustomerID
Could the ON line be the issue, since it is not written as follows?
ON Rate.tblRate.CustomerID = #tmpRate.CustomerID
In your case, the proc could return different, or multiple, result sets, which would make this behavior sporadic. However, I've seen this with temp tables a lot, though I can't explain why. If you alias that table, it resolves it every time.
SELECT RateID
FROM Rate.tblRate r
INNER JOIN #tmpRate t
ON r.CustomerID = t.CustomerID
This is good practice as it is required for other instances, like table variables.
if object_id('tempdb..#temp') is not null drop table #temp
select 1 as ID into #temp
declare @table table (ID int)
insert into @table
values
(1)
select *
from
@table
inner join #temp on #temp.ID = @table.ID
This will throw the error:
Must declare the scalar variable "@table".
So, alias it and it will work:
select *
from
@table t
inner join #temp on #temp.ID = t.ID
There are a lot of blogs out there on why it's a good habit to pick up. Here is one.
I was trying to look for it online but couldn't find anything that will settle my doubts.
I want to figure out which one is better to use, when and why?
I know MERGE is usually used for an upsert, but there are some cases where a normal UPDATE with a subquery has to select from the table twice (once for the WHERE clause).
E.G.:
MERGE INTO TableA s
USING (SELECT sd.dwh_key,sd.serial_number from TableA@to_devstg sd
where sd.dwh_key = s.dwh_key and sd.serial_number <> s.serial_number) t
ON(s.dwh_key = t.dwh_key)
WHEN MATCHED THEN UPDATE SET s.serial_number = t.serial_number
In my case, I have to update a table with about 200 million records in one environment, based on the same table in another environment where the serial_number field has changed. As you can see, it selects only once from this huge table.
On the other hand, I can use an UPDATE STATEMENT like this:
UPDATE TableA s
SET s.serial_number = (SELECT t.serial_number
FROM TableA@to_Other t
WHERE t.dwh_serial_key = s.dwh_serial_key)
WHERE EXISTS (SELECT 1
FROM TableA@to_Other t
WHERE t.dwh_serial_key = s.dwh_serial_key
AND t.serial_number <> s.serial_number)
As you can see, this selects from the huge table twice. So my question is: which is better, and why? In which cases will one be better than the other?
Thanks in advance.
I would first try to load all necessary data from remote DB to the temporary table and then work with that temporary table.
create global temporary table tmp_stage (
dwh_key <your_dwh_key_type@to_devstg>,
serial_number <your_serial_number_type@to_devstg>
) on commit preserve rows;
/* copy the remote rows once over the database link */
insert into tmp_stage
select sd.dwh_key, sd.serial_number
from TableA@to_devstg sd;
/* tmp_stage needs a primary key or unique index on dwh_key so that TableA is key-preserved in the join below */
update (select
src.dwh_key src_key,
tgt.dwh_key tgt_key,
src.serial_number src_serial_number,
tgt.serial_number tgt_serial_number
from tmp_stage src
join TableA tgt
on src.dwh_key = tgt.dwh_key
where src.serial_number <> tgt.serial_number
)
set tgt_serial_number = src_serial_number;
I currently have a performance issue with a query (one that is more complicated than the example below). Originally the query took around 30 seconds; when I switched from a table variable to a temp table, the run time dropped to a few seconds.
Here is a trimmed down version using a table variable:
-- Store XML into tables for use in query
DECLARE @tCodes TABLE([Code] VARCHAR(100))
INSERT INTO
@tCodes
SELECT
ParamValues.ID.value('.','VARCHAR(100)') AS 'Code'
FROM
@xmlCodes.nodes('/ArrayOfString/string') AS ParamValues(ID)
SELECT
'SummedValue' = SUM(ot.[Value])
FROM
[SomeTable] st (NOLOCK)
JOIN
[OtherTable] ot (NOLOCK)
ON ot.[SomeTableID] = st.[ID]
WHERE
ot.[CodeID] IN (SELECT [Code] FROM @tCodes) AND
st.[Status] = 'ACTIVE' AND
YEAR(ot.[SomeDate]) = 2013 AND
LEFT(st.[Identifier], 11) = @sIdentifier
Here is the version with the temp table which performs MUCH faster:
SELECT
ParamValues.ID.value('.','VARCHAR(100)') AS 'Code'
INTO
#tCodes
FROM
@xmlCodes.nodes('/ArrayOfString/string') AS ParamValues(ID)
SELECT
'SummedValue' = SUM(ot.[Value])
FROM
[SomeTable] st (NOLOCK)
JOIN
[OtherTable] ot (NOLOCK)
ON ot.[SomeTableID] = st.[ID]
WHERE
ot.[CodeID] IN (SELECT [Code] FROM #tCodes) AND
st.[Status] = 'ACTIVE' AND
YEAR(ot.[SomeDate]) = 2013 AND
LEFT(st.[Identifier], 11) = @sIdentifier
The problem I have with performance is solved with the change but I just don't understand why it fixes the issue and would prefer to know why. It could be related to something else in the query but all I have changed in the stored proc (which is much more complicated) is to switch from using a table variable to using a temp table. Any thoughts?
The differences and similarities between table variables and #temp tables are looked at in depth in my answer here.
Regarding the two queries you have shown (unindexed table variable vs unindexed temp table) three possibilities spring to mind.
1) INSERT ... SELECT to table variables is always serial. The SELECT can be parallelised for temp tables.
2) Temp tables can have column statistics histograms auto created for them.
3) Usually the cardinality of table variables is assumed to be 0 (when they are compiled when the table is empty)
From the code you have shown (3) seems the most likely explanation.
This can be resolved by using OPTION (RECOMPILE) to recompile the statement after the table variable has been populated.
I have never used indexes in a stored procedure, so I am seeking help, as one of my reports is running very slowly. I have a Crystal report which is an aging report. The procedure itself creates a temp table. The table that feeds the temp table holds a lot of data, and a lot of information is being pulled. The end result is that the report takes forever to run. Besides creating an index on the temp table, any other suggestions are welcome, and thanks for looking at this. The code is as follows.
Create procedure [dbo].[ST_Stored] @AsofDate datetime
as
--drop table #rectodate
declare @rectodate table (transid INT, transrowid INT, reconsum REAL)
INSERT into @rectodate
select transid, transrowid, sum(reconsum) as reconsum
from itr1
where reconnum in (select reconnum from oitr where recondate <=@AsofDate)
group by transid, transrowid
select t0.transid, t2.cardcode,
case when t0.debit <> 0 then t0.debit - isnull(t1.reconsum,0) else 0 end as OpenDebit,
case when t0.credit <> 0 then t0.credit - isnull(t1.reconsum, 0) else 0 end as OpenCredit,
t0.debit, t0.credit,*
from jdt1 t0 left outer join @RecToDate t1
on t0.transid = t1.transid and t0.line_id= t1.transrowid
left join OINV t2 on t2.CardCode=t0.ShortName
join oslp on oslp.slpcode = t2.slpcode
where t0.refdate =@AsofDate and t2.slpcode=5
order by t0.transid, t2.cardcode, t0.refdate
1) The simplest thing you can do is to execute this stored procedure in SSMS with the Include Actual Execution Plan option activated, to see whether SQL Server suggests a missing index. If you can, you should then post this execution plan.
2) I would create the following index (if there isn't):
CREATE /*UNIQUE*/ INDEX IX_jdt1_refdate_#_transid_lineid_debit_credit
ON dbo.jdt1(refdate)
INCLUDE (transid,line_id,debit,credit)
As Bogdan said, you have to check the query plan to understand why your query is so slow.
And I would say that you have to do that before creating any index at random, "au petit bonheur" (I am not saying that the index suggested by Bogdan is wrong, but without knowing the data ... it's a guess :) .
There is a comment in your code that shows that you tried a temp table instead of a table variable (the --drop table #rectodate).
For performance reasons, I would stay with a temp table, because you can add an index to a temp table but not to a table variable (and, by the way, table variables are created in tempdb too).