Merging 2 partitioned tables BigQuery - google-bigquery

I am trying to merge 2 partitioned tables in BigQuery:
'source_t' is the source table. It is partitioned by ingestion time, with the partition filter set to
Required. The pseudo-column _PARTITIONTIME is a TIMESTAMP.
'target_t' is the target table. It is partitioned by the field 'date', with the partition filter set to
Required. The field date is a DATE.
I want to take the data from the latest partition of the source table and merge it into the target table. To limit the scan of the target table, I need to use the 'date' field from the source data. I wrote a query, but the editor shows the following error:
Cannot query over table 'MyDataSet.target_t' without a filter over column(s) 'date'
Here is my query:
declare latest default (select date(max(_PARTITIONTIME)) as latest from MyDataSet.source_t where _PARTITIONTIME >= timestamp(date_sub(current_date(),interval 1 day)));
declare first_date default (select min(date) as first_date from MyDataSet.source_t where date(_PARTITIONTIME) = latest);
merge `MyDataSet.target_t` as a
using (select * from `MyDataSet.source_t` where _PARTITIONTIME = latest) as b
on
a.date >= first_date and
a.date = b.date and
a.account_id = b.account_id and
a.campaign_id = b.campaign_id and
a.adset_id = b.adset_id and
a.ad_id = b.ad_id
when matched then update set
a.account_name = b.account_name,
a.campaign_name = b.campaign_name,
a.adset_name = b.adset_name,
a.ad_name = b.ad_name,
a.impressions = b.impressions,
a.clicks = b.clicks,
a.spend = b.spend,
a.date = b.date
when not matched then insert row;
If I hard-code a date instead of the 'latest' variable ("where _PARTITIONTIME = '2020-10-01') as b"), there is no error. But I want to filter the source table properly.
I don't understand how this affects the following 'on' clause or why everything breaks.
Could you please help? What is the proper syntax for such a query? And is there any other way to run such a merge without variables?

declare latest timestamp;
Your variable latest is a TIMESTAMP. If you make it a DATE, your query should work.
------ Update --------
The error is complaining that the query has no usable filter on the date column of MyDataSet.target_t. Could you try adding a.date = latest to the on clause? (If this is not the right filter, use some other constant filter.)

Related

Inserting data into a stats table with merge - An action of type 'INSERT' is not allowed in the 'WHEN MATCHED' clause of a MERGE statement

I have two tables
The first table
stats {
Id,
stat1 decimal,
stat2 decimal,
stat3 decimal,
[LastChangeDate] datetime2
}
The second table
statsHistory {
Id,
stat1 decimal,
stat2 decimal,
stat3 decimal,
[LastChangeDate] datetime2,
DateGenerated Date
}
The statsHistory table is designed to cache the daily stats data, so that the daily stats do not have to be recalculated from session data.
I'm trying to use a merge statement to pull the data from the stats table into the statsHistory table. I intend to have this stored procedure called nightly at a set time.
;WITH C1 AS (
SELECT
ROW_NUMBER() OVER(PARTITION BY uc.Id ORDER BY [DayDateGenerated] DESC) as rowNumber,
uc.[Id]
,uc.[stat1]
,uc.[stat2]
,uc.[stat3]
,UCH.DayDateGenerated
,UCH.[stat1]
,UCH.[stat2]
,UCH.[stat3]
FROM
[UserConcepts] as uc
LEFT OUTER JOIN [UserConceptsDayHistory] as UCH on
UCH.[MemberId] = uc.[MemberId] AND UCH.[ForConceptId] = uc.[MemberId]
)
Merge [UserConceptsDayHistory] AS TARGET
USING(
select * from C1
WHERE rowNumber = 1
--AND CONVERT(DATE,[LastChangeDate]) < CONVERT(DATE,DateAdd(GetUtcDAte(),-1))
) AS source
ON(target.[MemberId] = source.[MemberId] AND target.[ForConceptId] = source.[ForConceptId])
WHEN not MATCHED THEN
insert(stat1
,stat2
,stat3
,[LastChangeDate]
,DateGenerated
)
values
(source.stat1
,source.stat2
,source.stat3
,source.[LastChangeDate]
,source.[LastChangeDate]
)
WHEN MATCHED AND source.DayDateGenerated = target.DayDateGenerated AND source.[TotalSessions] != target.[TotalSessions] THEN
update SET
stat1 = source.stat1
,stat2= source.stat2 - target.stat3
,stat3 = source.stat3
,[LastChangeDate] = source.[LastChangeDate],
DateGenerated = source.[LastChangeDate]
WHEN MATCHED AND source.DayDateGenerated IS NOT NULL AND source.DayDateGenerated != target.DayDateGenerated THEN
insert(stat1
,stat2
,stat3
,[LastChangeDate]
,DateGenerated
)
values
(source.stat1
,source.stat2 - target.stat3
,source.stat3
,source.[LastChangeDate]
,source.[LastChangeDate]
)
OUTPUT $action, inserted.*, deleted.*;
As the history table could have many rows per Id, I had hoped to use conditional match cases so that I could insert when a row exists in the table but not for that date. However, I get the following SQL error:
An action of type 'INSERT' is not allowed in the 'WHEN MATCHED' clause of a MERGE statement
I understand that I cannot achieve what I was hoping for with the merge statement, but I was wondering whether there are any good alternative approaches to this issue.
Any help would be greatly appreciated.
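One possible alternative, sketched here under assumptions rather than taken from the thread, is to split the work into an UPDATE for rows that already have an entry for the day and an INSERT for rows that do not, both in one transaction. The sketch below uses Python's sqlite3 module as a runnable stand-in for SQL Server, trims the tables to a single stat column, and assumes a history row is identified by (Id, DateGenerated). The function name snapshot_stats is hypothetical.

```python
import sqlite3

def snapshot_stats(conn, day):
    """Copy every stats row into statsHistory for `day` (hypothetical helper)."""
    with conn:  # both statements commit or roll back together
        # Step 1: refresh rows that already exist for this Id and day.
        conn.execute(
            """
            UPDATE statsHistory
            SET stat1 = (SELECT s.stat1 FROM stats AS s
                         WHERE s.Id = statsHistory.Id),
                LastChangeDate = (SELECT s.LastChangeDate FROM stats AS s
                                  WHERE s.Id = statsHistory.Id)
            WHERE DateGenerated = ?
              AND EXISTS (SELECT 1 FROM stats AS s WHERE s.Id = statsHistory.Id)
            """,
            (day,),
        )
        # Step 2: add a row for every Id that has no entry for this day yet,
        # even if older entries for the same Id exist.
        conn.execute(
            """
            INSERT INTO statsHistory (Id, stat1, LastChangeDate, DateGenerated)
            SELECT s.Id, s.stat1, s.LastChangeDate, ?
            FROM stats AS s
            WHERE NOT EXISTS (SELECT 1 FROM statsHistory AS h
                              WHERE h.Id = s.Id AND h.DateGenerated = ?)
            """,
            (day, day),
        )

# Assumed, simplified schema: one stat column instead of three.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (Id INTEGER PRIMARY KEY, stat1 REAL, LastChangeDate TEXT)")
conn.execute("CREATE TABLE statsHistory (Id INTEGER, stat1 REAL, LastChangeDate TEXT, DateGenerated TEXT)")
conn.execute("INSERT INTO stats VALUES (1, 10.0, '2020-01-02T08:00:00')")
conn.execute("INSERT INTO statsHistory VALUES (1, 5.0, '2020-01-01T08:00:00', '2020-01-01')")

snapshot_stats(conn, "2020-01-02")  # no row for that day yet, so one is inserted
conn.execute("UPDATE stats SET stat1 = 12.0 WHERE Id = 1")
snapshot_stats(conn, "2020-01-02")  # row for that day exists, so it is updated

history = conn.execute(
    "SELECT Id, stat1, DateGenerated FROM statsHistory ORDER BY DateGenerated"
).fetchall()
```

The same two-statement shape should carry over to T-SQL, but the exact syntax and locking behaviour there would need to be checked against SQL Server itself.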

Redshift Target Table Equijoin Predicate Error with Subquery

I am trying to run an update statement that uses a subquery to apply logic to the target table before updating. I keep getting the error below, even though the target table equijoin is specified in the where clause. Is there something I'm missing? I've also tried converting the subquery into a CTE, which results in the same error. The "stg" subquery returns the same number of rows as "tgt", so a row-count mismatch is not the cause.
Error: SQL Error [XX000]: ERROR: Target table must be part of an equijoin predicate
SQL:
update db.target_table as tgt
set
ind = stg.ind_stg
,ts = current_timestamp
from
(
select
load_date
,val_a
,val_b
,stdev.val_a_stdev
,stdev.val_b_stdev
,rec_updated_ts
,case
when (load_date >= 'XXXXX' and load_date <= 'XXXXXX')
or (abs(val_a) >= stdev.val_a_stdev
or abs(val_b) >= stdev.val_b_stdev)
then True else False end as ind
from db.target_table
cross join
(
select
cast(stddev_samp(val_a) as dec(14,2))*2.0 val_a_stdev
,cast(stddev_samp(val_b) as dec(14,2))*2.0 val_b_stdev
from db.target_table
where load_date <= (current_date - 2)
) as stdev
where load_date <= (current_date - 2)
) stg
where stg.load_date = tgt.load_date
However, if I materialize the "stg" subquery as a temp table and then run the same update statement (below), it works. This is perplexing, because the original error points at the target table, yet changing only how the subquery is derived fixes the query. I'm not sure I understand what is going on...
update db.target_table as tgt
set
ind = stg.ind_stg
,ts = current_timestamp
from <SAME STAGE LOGIC> as stg
where stg.load_date = tgt.load_date;

Update largest date, matching two fields

Hi, I'm looking to update the last column in a blank table. Picture shows input and desired output. I'm trying to pick the largest date where workorder and state match.
I've tried a couple of different queries:
UPDATE mytable
SET mytable.orderstartdate = MAX(table2.earliestdate)
FROM mytable as table2
WHERE (mytable.workorder = table2.workorder AND
mytable.state = table2.state)
;
"Syntax Error (missing operator) in query expression 'MAX(table2.earliestdate) FROM mytable as table2'."
UPDATE mytable
SET mytable.orderstartdate = (
SELECT max(earliestdate)
FROM mytable as table2
WHERE (mytable.workorder = table2.workorder AND
mytable.state = table2.state)
)
;
"Operation must use an updateable query"
Write PL/SQL code.
First, select the DISTINCT WorkOrder and State pairs and capture them in variables.
Next, iterate over that list and, for each pair, run a query that gets the max date, i.e. max(date), with WorkOrder and State in the where clause. Capture the date.
Then, in the same loop, run an update query that sets that max date, with WorkOrder and State in the where clause.
UPDATE table A
SET A.orderstartDate = (SELECT max(earliestdate)
FROM table B
WHERE A.WorkOrder = B.WorkOrder
GROUP BY WorkOrder)
I'm not sure whether Access supports correlated subqueries, but if it does, the query above should work.
If not:
UPDATE table A
INNER JOIN (SELECT WorkOrder, max(earliestdate) MOSD
FROM Table B
GROUP BY WorkOrder) C
ON A.WorkOrder = C.workOrder
SET A.OrderStartDate = C.MOSD
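To make the correlated-subquery idea concrete, here is a minimal sketch that matches on both workorder and state, as the question asks. It runs against SQLite through Python's sqlite3 module, which is only a stand-in: it says nothing about whether Access accepts the same statement (the question reports "Operation must use an updateable query" for a similar form). The sample rows are invented, and dates are assumed to be ISO-8601 strings so that MAX orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (workorder TEXT, state TEXT, "
    "earliestdate TEXT, orderstartdate TEXT)"
)
# Invented sample data; orderstartdate starts out blank (NULL).
conn.executemany(
    "INSERT INTO mytable (workorder, state, earliestdate) VALUES (?, ?, ?)",
    [
        ("W1", "A", "2020-01-01"),
        ("W1", "A", "2020-03-01"),
        ("W1", "B", "2020-02-01"),
        ("W2", "A", "2020-05-01"),
    ],
)

# For each row, look up the largest earliestdate among rows that share
# the same workorder and state, and store it in orderstartdate.
conn.execute(
    """
    UPDATE mytable
    SET orderstartdate = (
        SELECT MAX(t2.earliestdate)
        FROM mytable AS t2
        WHERE t2.workorder = mytable.workorder
          AND t2.state = mytable.state
    )
    """
)

rows = conn.execute(
    "SELECT workorder, state, earliestdate, orderstartdate "
    "FROM mytable ORDER BY workorder, state, earliestdate"
).fetchall()
```

No GROUP BY is needed inside the subquery here, because the WHERE clause already restricts it to a single workorder and state.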
Check the database open mode: it may be locked for editing, or you may not have permission to write to the file.

SQLite Update Query Optimization

So I have tables with the following structure:
TimeStamp,
var_1,
var_2,
var_3,
var_4,
var_5,...
The table contains about 600 columns named var_##. The user parses some data stored by a machine, and I have to replace every null value in that table with the last valid value before it. At the moment I use the following query:
update tableName
set var_## =
(select b.var_## from tableName as b
where b.timeStamp <= tableName.timeStamp and b.var_## is not null
order by timeStamp desc limit 1)
where tableName.var_## is null;
The problem right now is the time it takes to run this query for all columns. Is there any way to optimize this query?
UPDATE: this is the output query plan when executing the query for one column:
update wme_test2
set var_6 =
(select b.var_6 from wme_test2 as b
where b.timeStamp <= wme_test2.timeStamp and b.var_6 is not null
order by timeStamp desc limit 1)
where wme_test2.var_6 is null;
Having 600 indexes on the data columns would be silly. (But not necessarily more silly than having 600 columns.)
All queries can be sped up with an index on the timeStamp column.
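A minimal sketch of that suggestion, using Python's sqlite3 module: create a single index on timeStamp, then run the fill-forward update once per column. The table name wme_test2 and column var_6 come from the question; var_7, the index name, the helper fill_forward, and the sample rows are assumptions for illustration. Column names are interpolated into the SQL text, so they must come from a trusted list.

```python
import sqlite3

def fill_forward(conn, table, columns):
    """Replace NULLs in each column with the latest earlier non-NULL value."""
    for col in columns:
        # Identifiers cannot be bound as parameters, so they are formatted in.
        conn.execute(
            f"""
            UPDATE {table}
            SET {col} = (
                SELECT b.{col} FROM {table} AS b
                WHERE b.timeStamp <= {table}.timeStamp
                  AND b.{col} IS NOT NULL
                ORDER BY b.timeStamp DESC LIMIT 1)
            WHERE {col} IS NULL
            """
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wme_test2 (timeStamp INTEGER, var_6 REAL, var_7 REAL)")
conn.executemany(
    "INSERT INTO wme_test2 VALUES (?, ?, ?)",
    [(1, 1.5, None), (2, None, 7.0), (3, None, None), (4, 2.5, None), (5, None, 9.0)],
)

# One index on timeStamp serves the subquery for every var_## column.
conn.execute("CREATE INDEX idx_wme_test2_timestamp ON wme_test2 (timeStamp)")

fill_forward(conn, "wme_test2", ["var_6", "var_7"])
filled = conn.execute(
    "SELECT timeStamp, var_6, var_7 FROM wme_test2 ORDER BY timeStamp"
).fetchall()
```

A leading NULL with no earlier valid value stays NULL, as in the first row of var_7. Running all columns inside one transaction, rather than committing after each, is also likely to help, though that is not measured here.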

How to write a SQL DELETE statement using a SELECT on a JOIN

Morning all, I'm attempting to run the following script, but I'm receiving an "ORA-00933 SQL Command not properly ended" error. Can anyone see where I have gone wrong?
delete tableA
FROM tableA
JOIN tableB
ON tableB.usi = tableA.usi
WHERE tableB.usc = 'ABC'
AND tableA.cfs = '01.01.2013'
Thanks for looking!
Oracle does not support JOINs in a DELETE statement. You need to use a sub-query:
delete from tableA
where exists (select *
from tableb
where tableB.usi = tableA.usi
and tableB.usc = 'ABC'
AND tableA.cfs = '01.01.2013');
The full syntax of the DELETE statement is documented in the manual
https://docs.oracle.com/cd/E11882_01/server.112/e41084/statements_8005.htm#SQLRF01505
Note that if tableA.cfs is a DATE (or TIMESTAMP) column, you should not rely on implicit data type conversion. '01.01.2013' is a string literal, not a date. Oracle will try to convert it to a date, but this might fail depending on the NLS settings of the SQL client. It's better to use an explicit ANSI date literal: where cfs = DATE '2013-01-01', or the to_date() function: where cfs = to_date('01.01.2013', 'dd.mm.yyyy').
Additionally, Oracle's DATE type includes a time. So unless all the dates in the cfs column have the time 00:00:00, that condition is very likely to match nothing. It's safer to use trunc(tablea.cfs) = ... to "remove" the time part of the date column (it doesn't really remove it, it simply sets it to 00:00:00).
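The EXISTS form of the delete can be tried out in a small sketch. This one uses SQLite through Python's sqlite3 module as a stand-in for Oracle, so it only illustrates the shape of the statement, not Oracle's DATE handling. Dates are stored as ISO-8601 text here, which is an assumption, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (usi INTEGER, cfs TEXT)")
conn.execute("CREATE TABLE tableB (usi INTEGER, usc TEXT)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?)",
    [(1, "2013-01-01"), (2, "2013-01-01"), (1, "2014-06-30")],
)
conn.executemany("INSERT INTO tableB VALUES (?, ?)", [(1, "ABC"), (2, "XYZ")])

# Delete rows of tableA that have the given date and a matching
# tableB row with usc = 'ABC'. No JOIN appears in the DELETE itself.
cursor = conn.execute(
    """
    DELETE FROM tableA
    WHERE cfs = ?
      AND EXISTS (SELECT 1
                  FROM tableB
                  WHERE tableB.usi = tableA.usi
                    AND tableB.usc = ?)
    """,
    ("2013-01-01", "ABC"),
)
deleted = cursor.rowcount
remaining = conn.execute("SELECT usi, cfs FROM tableA ORDER BY usi, cfs").fetchall()
```

Passing the date and the code as bound parameters keeps the comparison values out of the SQL text, in the same spirit as the advice above about not relying on implicit string-to-date conversion.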
You can try something like:
delete tableA
WHERE id IN (
SELECT a.id
FROM tableA a
JOIN tableB b
ON b.usi = a.usi
WHERE b.usc = 'ABC'
AND a.cfs = '01.01.2013')