Is it viable to build type 2 history in BiqQuery without knowing primary keys? - sql

I am importing a lot of mainframe extracts into BigQuery daily. Each extract is a full export of all the data available. I've been loading the data into BigQuery and then generating a type 2 history using a SQL MERGE statement, where I join on the primary keys of each table and compare all columns to find differences, closing outdated rows and inserting new/updated rows into a history table. This works fine.
A colleague made the case that we don't need to know the primary keys to do this, we can just treat all data columns together as the unique constraint. With that in mind I created a new MERGE statement, which seems to work just as well as the old one, but without having to know the primary keys, which makes a lot of things easier for us. Here is an example of the new query:
MERGE ods.kfir_history AS main USING (
SELECT FF_NR, FIRMA_PRODENH_TYPE, FIRMA_NAVN_1, FIRMA_NAVN_2, SE_NR, KONCERN_NR, FIRMA_STATUS_DATO, FIRMA_STATUS_DATO_NUL, FIRMA_STATUS, FIRMA_TYPE, ANTAL_ANSAT_DATO, ANTAL_ANSAT_DATO_NUL, ANTAL_ANSAT, ANTAL_ANSAT_NUL, ANTAL_ANSAT_KILDE, FIRMA_STIFTET_DATO, FIRMA_STIFTET_DATO_NUL, AS_REGISTRERET, MOMS_REGISTRERET, FAGLIG_FORENING, HEKTAR, RET_SBH, RET_TIMESTAMP, SUPL_FIRMA_NAVN, SUPL_FIRMA_NAVN_NUL, CVR_NR, P_NR from staging.kfir_new
) AS delta ON
IFNULL(main.FF_NR,'null') = IFNULL(delta.FF_NR,'null') AND
IFNULL(main.FIRMA_PRODENH_TYPE,'null') = FNULL(delta.FIRMA_PRODENH_TYPE,'null') AND
IFNULL(main.FIRMA_NAVN_1,'null') = IFNULL(delta.FIRMA_NAVN_1,'null') AND
IFNULL(main.FIRMA_NAVN_2,'null') = IFNULL(delta.FIRMA_NAVN_2,'null') AND
IFNULL(main.SE_NR,0) = IFNULL(delta.SE_NR,0) AND IFNULL(main.KONCERN_NR,'null') = IFNULL(delta.KONCERN_NR,'null') AND
IFNULL(main.FIRMA_STATUS_DATO,'null') = IFNULL(delta.FIRMA_STATUS_DATO,'null') AND
IFNULL(main.FIRMA_STATUS_DATO_NUL,'null')= IFNULL(delta.FIRMA_STATUS_DATO_NUL,'null') AND
IFNULL(main.FIRMA_STATUS,'null') = IFNULL(delta.FIRMA_STATUS,'null') AND IFNULL(main.FIRMA_TYPE,'null') = IFNULL(delta.FIRMA_TYPE,'null') AND IFNULL(main.ANTAL_ANSAT_DATO,'null') = IFNULL(delta.ANTAL_ANSAT_DATO,'null') AND IFNULL(main.ANTAL_ANSAT_DATO_NUL,'null') = IFNULL(delta.ANTAL_ANSAT_DATO_NUL,'null') AND IFNULL(main.ANTAL_ANSAT,0) = IFNULL(delta.ANTAL_ANSAT,0) AND IFNULL(main.ANTAL_ANSAT_NUL,'null') = IFNULL(delta.ANTAL_ANSAT_NUL,'null') AND IFNULL(main.ANTAL_ANSAT_KILDE,'null') = IFNULL(delta.ANTAL_ANSAT_KILDE,'null') AND IFNULL(main.FIRMA_STIFTET_DATO,'null') = IFNULL(delta.FIRMA_STIFTET_DATO,'null') AND IFNULL(main.FIRMA_STIFTET_DATO_NUL,'null') = IFNULL(delta.FIRMA_STIFTET_DATO_NUL,'null') AND IFNULL(main.AS_REGISTRERET,0) = IFNULL(delta.AS_REGISTRERET,0) AND IFNULL(main.MOMS_REGISTRERET,'null') = IFNULL(delta.MOMS_REGISTRERET,'null') AND IFNULL(main.FAGLIG_FORENING,'null') = IFNULL(delta.FAGLIG_FORENING,'null') AND IFNULL(main.HEKTAR,0) = IFNULL(delta.HEKTAR,0) AND IFNULL(main.RET_SBH,'null') = IFNULL(delta.RET_SBH,'null') AND IFNULL(main.RET_TIMESTAMP,'null') = IFNULL(delta.RET_TIMESTAMP,'null') AND IFNULL(main.SUPL_FIRMA_NAVN,'null') = IFNULL(delta.SUPL_FIRMA_NAVN,'null') AND IFNULL(main.SUPL_FIRMA_NAVN_NUL,'null') = IFNULL(delta.SUPL_FIRMA_NAVN_NUL,'null') AND IFNULL(main.CVR_NR,0) = IFNULL(delta.CVR_NR,0) AND IFNULL(main.P_NR,0) = IFNULL(delta.P_NR,0)
WHEN NOT MATCHED BY SOURCE AND main.SystemTimeEnd = "5999-12-31 23:59:59" --Close all updated records
THEN UPDATE SET SystemTimeEnd=delta.SystemTime, current_timestamp, LastAction = 'U'
WHEN NOT MATCHED BY TARGET --insert all new and updated records
THEN INSERT (FF_NR, FIRMA_PRODENH_TYPE, FIRMA_NAVN_1, FIRMA_NAVN_2, SE_NR, KONCERN_NR, FIRMA_STATUS_DATO, FIRMA_STATUS_DATO_NUL, FIRMA_STATUS, FIRMA_TYPE, ANTAL_ANSAT_DATO, ANTAL_ANSAT_DATO_NUL, ANTAL_ANSAT, ANTAL_ANSAT_NUL, ANTAL_ANSAT_KILDE, FIRMA_STIFTET_DATO, FIRMA_STIFTET_DATO_NUL, AS_REGISTRERET, MOMS_REGISTRERET, FAGLIG_FORENING, HEKTAR, RET_SBH, RET_TIMESTAMP, SUPL_FIRMA_NAVN, SUPL_FIRMA_NAVN_NUL, CVR_NR, P_NR, SystemTime, SystemTimeEnd, Lastupdated, LastAction)
VALUES (delta.FF_NR, delta.FIRMA_PRODENH_TYPE, delta.FIRMA_NAVN_1, delta.FIRMA_NAVN_2, delta.SE_NR, delta.KONCERN_NR, delta.FIRMA_STATUS_DATO, delta.FIRMA_STATUS_DATO_NUL, delta.FIRMA_STATUS, delta.FIRMA_TYPE, delta.ANTAL_ANSAT_DATO, delta.ANTAL_ANSAT_DATO_NUL, delta.ANTAL_ANSAT, delta.ANTAL_ANSAT_NUL, delta.ANTAL_ANSAT_KILDE, delta.FIRMA_STIFTET_DATO, delta.FIRMA_STIFTET_DATO_NUL, delta.AS_REGISTRERET, delta.MOMS_REGISTRERET, delta.FAGLIG_FORENING, delta.HEKTAR, delta.RET_SBH, delta.RET_TIMESTAMP, delta.SUPL_FIRMA_NAVN, delta.SUPL_FIRMA_NAVN_NUL, delta.CVR_NR, delta.P_NR, delta.SystemTime, '5999-12-31 23:59:59', current_timestamp, 'I')
Can anybody tell me if there are any downsides to this approach? Considering BigQuery doesn't deal with primary keys at all, it would be nice to lose the concept completely for us in the solution.

As per my understanding this should not be a problem. Unless your conditions acts on columns which values are not unique, while you have the need to update only that particular datapoint.

Related

ORA-38104 when trying to update my table using merge

I have a stored procedure in which I want to update some columns, so I wrote below code:
PROCEDURE UPDATE_MST_INFO_BKC (
P_SAPID IN NVARCHAR2
) AS
BEGIN
MERGE INTO tbl_ipcolo_billing_mst I
USING (
SELECT
R4G_STATE, -- poilitical state name
R4G_STATECODE, -- poilitical state code
CIRCLE, -- city name
NE_ID,
LATITUDE,
LONGITUDE,
SAP_ID
FROM
R4G_OSP.ENODEB
WHERE
SAP_ID = P_SAPID
AND ROWNUM = 1
)
O ON ( I.SAP_ID = O.SAP_ID )
WHEN MATCHED THEN
UPDATE SET I.POLITICAL_STATE_NAME = O.R4G_STATE,
I.POLITICAL_STATE_CODE = O.R4G_STATECODE,
I.CITY_NAME = O.CIRCLE,
I.NEID = O.NE_ID,
I.FACILITY_LATITUDE = O.LATITUDE,
I.FACILITY_LONGITUDE = O.LONGITUDE,
I.SAP_ID = O.SAP_ID;
END UPDATE_MST_INFO_BKC;
But it is giving me error as
ORA-38104: Columns referenced in the ON Clause cannot be updated: "I"."SAP_ID"
What am I doing wrong?
You are joining the source to destination tables on I.SAP_ID = O.SAP_ID and then, when matched, are trying to update them and set I.SAP_ID = O.SAP_ID. You cannot update the columns used in the join ... and why would you want to as you have already determined that the values are equal.
Just remove the last line of the UPDATE:
...
O ON ( I.SAP_ID = O.SAP_ID )
WHEN MATCHED THEN
UPDATE SET I.POLITICAL_STATE_NAME = O.R4G_STATE,
I.POLITICAL_STATE_CODE = O.R4G_STATECODE,
I.CITY_NAME = O.CIRCLE,
I.NEID = O.NE_ID,
I.FACILITY_LATITUDE = O.LATITUDE,
I.FACILITY_LONGITUDE = O.LONGITUDE;
The error message tells you what the problem is - a MERGE statement cannot update the columns used in the ON clause - and even tells you what column is the problem: "I"."SAP_ID".
So Oracle hurls ORA-38104 because of this line in your WHEN MATCHED branch
I.SAP_ID = O.SAP_ID;
Remove it and your problem disappears. Fortunately the line is unnecessary: I.SAP_ID already equals O.SAP_ID, otherwise the record wouldn't go down the MATCHED branch.
The reason why is quite straightforward: transactional consistency. The MERGE statement operates over a set of records defined by the USING clause and the ON clause. Updating the columns used in the ON clause threatens the integrity of that set, and so Oracle forbids it.

Merge query inserting but not updating on Oracle

I've got this query in a #SQLInsert annotation in Spring against an Oracle 11g database and, although it's inserting properly it is not updating the values but raising no error.
Any ideas? If not, any alternative ways of obtaining same behaviour?
merge INTO ngram sn
USING(SELECT ? AS frequency,
? AS occurrences,
? AS ngram
FROM dual) src
ON (sn.ngram = src.ngram)
WHEN matched THEN
UPDATE SET sn.occurrences = sn.occurrences + src.occurrences,
sn.frequency = sn.frequency + 1
WHEN NOT matched THEN
INSERT (ngram,
frequency,
occurrences)
VALUES (src.ngram,
src.frequency,
src.occurrences)
Update1: I'm adding the Entity def in Java for clarification.
#Entity(name = "NGRAM")
public class Ngram
{
#Id
#JsonProperty("ngram")
private String ngram;
#JsonProperty("frequency")
#Column(name = "frequency")
private int frequency;
#JsonProperty("occurrences")
#Column(name = "occurrences")
private int occurrences;
Update2: Adding a run of the SQL query on an existing Ngram
sql> merge INTO NGRAM sn
USING(SELECT 1 AS frequency,
200 AS occurrences,
'year' AS ngram
FROM dual) src
ON (sn.ngram = src.ngram)
WHEN matched THEN
UPDATE SET sn.occurrences = sn.occurrences + src.occurrences,
sn.frequency = sn.frequency + 1
WHEN NOT matched THEN
INSERT (ngram,
frequency,
occurrences)
VALUES (src.ngram,
src.frequency,
src.occurrences)
[2019-01-29 12:09:10] 1 row affected in 19 ms
This actually modifies the row, but not when I go over them in the code, so it seems the SQL is right...
Update3: So, this is the piece of code that should be inserting or updating but it's only doing the insert:
List<Ngram> lNgrams = new ArrayList<>(lCollocationsMap.size());
lCollocationsMap.forEach((pKey, pValue) -> lNgrams.add( new Ngram(pKey, 1, pValue)));
mNgramRepo.saveAll(lNgrams);
So I'll be investigating the saveAll behaviour.
Update4: I tried using save one by one which is much slower instead of using saveAll but got same behaviour. Changing ON (sn.ngram = src.ngram) to ON (sn.ngram LIKE src.ngram) makes some of the frequencies to become 2 (so they seem to be updated) but not all of them: 'year' for example appears more than 10 times but it's frequency remains 1 and it's occurrences is just updated with last value found.
So, I'm now completely lost on why this is failing, specially in this way.

How to update columns in a table joined with other tables?

I want to update two columns of a table with reference of other tables. While executing the script its showing error.
Error: Error starting at line 1 in command:
UPDATE wb_costing_work_items,
sa_sales_documents,
sa_sales_document_items
SET cwi_price_per_hour = sdi_price,
cwi_amount = sdi_price * cwi_hours
WHERE cwi_lo_id = sad_lo_id
AND sdi_sad_id = sad_id
AND sdi_wit_id = cwi_wit_id
AND cwi_id = 1650833
Error at Command Line:1 Column:28 Error report: SQL Error: ORA-00971:
missing SET keyword
00971. 00000 - "missing SET keyword"
SQL STATEMENT
UPDATE wb_costing_work_items cwi,
sa_sales_documents sad,
sa_sales_document_items sdi
SET cwi.cwi_price_per_hour = sdi.sdi_price,
cwi.cwi_amount = sdi.sdi_price * cwi.cwi_hours
WHERE cwi.cwi_lo_id = sad.sad_lo_id
AND sdi.sdi_sad_id = sad.sad_id
AND sdi.sdi_wit_id = cwi.cwi_wit_id
AND cwi.cwi_id = 1650855
This should definitely work.
UPDATE (SELECT cwi_price_per_hour,
sdi_price,
cwi_amount,
sdi_price,
cwi_hours
FROM wb_costing_work_items,
sa_sales_documents,
sa_sales_document_items
WHERE cwi_lo_id = sad_lo_id
AND sdi_sad_id = sad_id
AND sdi_wit_id = cwi_wit_id
AND cwi_id = 1650833)
SET cwi_price_per_hour = sdi_price, cwi_amount = sdi_price * cwi_hours
Please alias the tables used and prefix columns so one can easily read your query.
Something like this maybe.
Note that I had to use some wild guesses about which column belongs to which table as you have not included any information about how your tables are define.
So most probably I didn't get it right and you will need to adjust the query to your actual table structure.
merge into wb_costing_work_items
using
(
select cwi.pk_column, -- don't know what the PK of the wb_costing_work_items is due to lack of information
sdi.sdi_price, -- don't know if this column really belong to this table
sdi.sdi_price * cwi.cwi_hours as total-- again unsure about column names due to lack of information
FROM wb_costing_work_items cwi
JOIN sa_sales_documents sad ON sad.sad_lo_id = cwi.cwi_lo_id -- not sure about these, due to lack of information
JOIN sa_sales_document_items sdi
ON sdi.sad_id = sad.sdi_sad_id -- not sure about these, due to lack of information
AND sdi.sdi_wit_id = cwi.cwi_wit_id -- not sure about these, due to lack of information
) t ON (t.pk_column = w.pk_column) -- not sure about this, due to lack of information
when matched then
update
set cwi_price_per_hour = t.sdi_price,
cwi_amount = t.total;

Getting table(records) to update properply using the MERGE Statement

Good morning everyone!
Below is a piece of code I stitched together: I used a CTE to grab the records(data) from a link table and than convert strings to dates, than use the merge statement to get the data into a local table:
I am having a problem with the column(field) LAST_RACE_DATE this field is set to NULL and is not required but it does not update with my current set up. What I am trying to accomplished is for this field to populate when data is entered but also update, meaning it should also update with NULL.
So if the field has a specific date, and a new date is entered in the remote database, this field should update as well, even if the data is deleted in the back end, it should also remove the local table data for this field.
WITH CTE AS(
SELECT MEMBER_ID
,[MEMBER_DATE] = MAX(CONVERT(DATE, MEMBER_DATE))
,RACE_DATE = MAX(CONVERT(DATE, RACE_DATE))
,LAST_RACE_DATE = MAX(CONVERT(DATE, LAST_RACE_DATE))
FROM [EXAMPLE].[dbo].[LINKED_MEMBER_DATA]
WHERE (MEMBER_DATE IS NOT NULL) AND (ISDATE(MEMBER_DATE)<> 0) AND (RACE_DATE IS NOT NULL) AND (ISDATE(RACE_DATE)<> 0)
AND (LAST_RACE_DATE IS NULL) OR (ISDATE(LAST_RACE_DATE)<> 0)
GROUP BY MEMBER_ID)
MERGE dbo.LINKED_MEMBER_DATA AS Target
USING (SELECT
MEMBER_ID, MEMBER_DATE, RACE_DATE, LAST_RACE_DATE
FROM CTE
GROUP BY MEMBER_ID, RACE_DATE, LAST_RACE_DATE)AS SOURCE ON (Target.MEMBER_ID = SOURCE.MEMBER_ID)
WHEN MATCHED AND
(Target.MEMBER_DATE) <> (SOURCE.MEMBER_DATE)
OR (Target.RACE_DATE) <> (SOURCE.RACE_DATE)
OR ISNULL(TARGET.LAST_RACE_DATE , Target.LAST_RACE_DATE) <> ISNULL(SOURCE.LAST_RACE_DATE, SOURCE.LAST_RACE_DATE)
THEN UPDATE SET
Target.MEMBER_DATE = SOURCE.MEMBER_DATE
,Target.RACE_DATE = SOURCE.RACE_DATE
,Target.LAST_RACE_DATE = SOURCE.LAST_RACE_DATE
WHEN NOT MATCHED BY TARGET THEN
INSERT(
MEMBER_ID, MEMBER_DATE, RACE_DATE, LAST_RACE_DATE)
VALUES (Source.MEMBER_ID, Source.MEMBER_DATE, Source.RACE_DATE, Source.LAST_RACE_DATE);
I also tried this:
ISNULL(Target.LAST_RACE_DATE,'N/A') <> ISNULL(SOURCE.LAST_RACE_DATE,'N/A')
But it generates the below error for dates conversion:
Conversion failed when converting date and/or time from character string.
Thanks a Million!!
Your current statement is failing because the ISNULLs that you have don't do anything (if one of the values is NULL the expression will evaluate to NULL), and NULL values don't compare. Your second attempt doesn't work because ISNULL requires the data types of the two values to be the same, so you could try eg ISNULL(Target.LAST_RACE_DATE, '1970-01-01') <> ISNULL(Source.LAST_RACE_DATE, '1970-01-01').
Another option would be to simply enumerate the different cases (eg, (((Source.LAST_RACE_DATE IS NULL AND Target.LAST_RACE_DATE IS NOT NULL) OR (Source.LAST_RACE_DATE IS NOT NULL AND Target.LAST_RACE_DATE IS NULL) OR (Source.LAST_RACE_DATE <> Target.LAST_RACE_DATE))). Enumerating the different situations makes the code a bit more verbose, but it can result in better performance (whether it is measurably better really depends on how much data you are processing).

Oracle update based on query

I've got the folloing tables:rashodz, naklrashodz, transport, trans_task_load, shipment_plan.
What I need is to update tbl:rashodz.id_ship whith tbl:shipment_plan.id_ship as follows
rashodz.id2 = shipment_plan.id2
and rashodz.nsthet = naklrashodz.nsthet
and naklrashodz.nsthet = trans_task_load.nsthet
and shipment_plan.idts = trans_task_load.idts
Just do not have an idea how to do it.
When there are two tables under consideration there is no problem, but how to do it when multiple tables are involved?
I would appreciate any help.
Is the code snippet you give the condition for the update?
Then how about:
update rashodz r
set id_ship = nvl(
(select id_ship
from shipment_plan,
naklrashodz,
transport,
trans_task_load
where r.id2 = shipment_plan.id2
and r.nsthet = naklrashod.nsthet
and naklrashod.nsthet = trans_task_load.nsthet
and shipment_plan.idts = trans_task_load.idts
), id_ship)
The purpose of the NVL in this case is that if there is no match, id_ship remains unchanged from its original value.