I have this Access SQL append query where I have a primary key set to not allow duplicates in the destination table. I'm not sure whether the WHERE condition accomplishes what I want.
I am trying to screen only the newest records from the source table 'tbl_IMEI_MASTER' and append only the records that do not have a match on the key in the destination table (the same identifier as in the source table). I think it's working, but I got a message that 72 rows were being added to the destination table (72 being the total number of new, unique rows added to the source table on the most recent date), when only 14 of them should have been added. Only 14 should have been identified as not already having a matching key.
INSERT INTO leads_historical (Customer, LeadNumber, ImportDate)
SELECT DISTINCT tbl_IMEI_MASTER.Customer, tbl_IMEI_MASTER.LeadNumber, tbl_IMEI_MASTER.ImportDate
FROM tbl_IMEI_MASTER
WHERE tbl_IMEI_MASTER.ImportDate = (SELECT MAX(tbl_IMEI_MASTER.ImportDate) FROM tbl_IMEI_MASTER);
I got it.
Breaking it down further with a plain SELECT, I drilled down to the desired result:
SELECT DISTINCT tbl_IMEI_MASTER.Customer, tbl_IMEI_MASTER.LeadNumber, tbl_IMEI_MASTER.ImportDate
FROM tbl_IMEI_MASTER
WHERE tbl_IMEI_MASTER.ImportDate = (SELECT MAX(tbl_IMEI_MASTER.ImportDate) FROM tbl_IMEI_MASTER)
AND NOT EXISTS (SELECT leads_historical.Customer FROM leads_historical WHERE leads_historical.Customer = tbl_IMEI_MASTER.Customer);
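Folding that NOT EXISTS filter back into the original append query (a sketch reusing the same table and column names) would give:
INSERT INTO leads_historical (Customer, LeadNumber, ImportDate)
SELECT DISTINCT tbl_IMEI_MASTER.Customer, tbl_IMEI_MASTER.LeadNumber, tbl_IMEI_MASTER.ImportDate
FROM tbl_IMEI_MASTER
WHERE tbl_IMEI_MASTER.ImportDate = (SELECT MAX(tbl_IMEI_MASTER.ImportDate) FROM tbl_IMEI_MASTER)
AND NOT EXISTS (SELECT leads_historical.Customer FROM leads_historical WHERE leads_historical.Customer = tbl_IMEI_MASTER.Customer);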
I have a user table in Hive of the form:
User:
Id String,
Name String,
Col1 String,
UpdateTimestamp Timestamp
I'm inserting data in this table from a file which has the following format:
I/U,Timestamp when record was written to file, Id, Name, Col1, UpdateTimestamp
e.g. for inserting a user with Id 1:
I,2019-08-21 14:18:41.002947,1,Bob,stuff,123456
and updating col1 for the same user with Id 1:
U,2019-08-21 14:18:45.000000,1,,updatedstuff,123457
The columns which are not updated are returned as null.
Now, simple insertion is easy in Hive using LOAD DATA INPATH into a staging table and then ignoring the first two fields from the staging table.
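A minimal sketch of that staging step, assuming a comma-delimited file (the file path and the file_ts column name are my assumptions; the staging table name matches the one used later in this thread):
-- staging table shaped like the file: I/U flag, file timestamp, then the user fields
create table test.orc_acid_example_staging
(
type string
,file_ts string
,id int
,name string
,col1 string
,updatetimestamp timestamp
)
row format delimited fields terminated by ','
stored as textfile;
load data inpath '/tmp/users_feed.csv' into table test.orc_acid_example_staging;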
However, how would I go about the update statements? So that my final row in hive looks like below:
1,Bob,updatedstuff,123457
I was thinking of inserting all rows into a staging table and then performing some sort of merge query. Any ideas?
Typically with a merge statement your "file" would still be unique on ID and the merge statement would determine whether it needs to insert this as a new record, or update values from that record.
However, if the file is non-negotiable and will always have the I/U format, you could break the process up into two steps, the insert, then the updates, as you suggested.
In order to perform updates in Hive, you will need the users table to be stored as ORC and have ACID enabled on your cluster. For my example, I would create the users table with a cluster key, and the transactional table property:
create table test.orc_acid_example_users
(
id int
,name string
,col1 string
,updatetimestamp timestamp
)
clustered by (id) into 5 buckets
stored as ORC
tblproperties('transactional'='true');
After your insert statements, your Bob record would say "stuff" in col1.
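Those inserts aren't shown in the thread; a sketch, assuming the staging table from the question has been loaded from the file:
-- plain inserts: take only the 'I' rows and drop the first two file fields
insert into test.orc_acid_example_users
select id, name, col1, updatetimestamp
from test.orc_acid_example_staging
where type = 'I';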
As far as the updates go, you could tackle these with an update or merge statement. I think the key here is the null values: it's important to keep the original name, or col1, or whatever, if the staging table row from the file has a null value. Here's a merge example which coalesces the staging table's fields. Basically, if there is a value in the staging table, take that; otherwise fall back to the original value.
merge into test.orc_acid_example_users as t
using test.orc_acid_example_staging as s
on t.id = s.id
and s.type = 'U'
when matched
then update set name = coalesce(s.name,t.name), col1 = coalesce(s.col1, t.col1)
Now Bob will show "updatedstuff" in col1.
Quick disclaimer - if you have more than one update for Bob in the staging table, things will get messy. You will need a pre-processing step to get the latest non-null values of all the updates prior to doing the update/merge (see the sketch below). Hive isn't really a complete transactional DB - it would be preferable for the source to send full user records any time there's an update, instead of only the changed fields.
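A rough sketch of that pre-processing step, collapsing the 'U' rows to one row per id. It reuses the assumed file_ts column from the staging sketch above; the explicit window frame matters, because last_value's default frame stops at the current row:
-- one row per id, carrying the latest non-null value of each column
select distinct id,
last_value(name, true) over (partition by id order by file_ts rows between unbounded preceding and unbounded following) as name,
last_value(col1, true) over (partition by id order by file_ts rows between unbounded preceding and unbounded following) as col1,
max(updatetimestamp) over (partition by id) as updatetimestamp
from test.orc_acid_example_staging
where type = 'U';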
You can reconstruct each record in the table using last_value() with the ignore-nulls option:
select h.id,
coalesce(h.name, last_value(h.name, true) over (partition by h.id order by h.timestamp)) as name,
coalesce(h.col1, last_value(h.col1, true) over (partition by h.id order by h.timestamp)) as col1,
h.update_timestamp
from history h;
You can use row_number() and a subquery if you want the most recent record.
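For instance, a sketch on the same history table, keeping only the newest reconstructed row per id:
select id, name, col1, update_timestamp
from (select h.id,
coalesce(h.name, last_value(h.name, true) over (partition by h.id order by h.timestamp)) as name,
coalesce(h.col1, last_value(h.col1, true) over (partition by h.id order by h.timestamp)) as col1,
h.update_timestamp,
row_number() over (partition by h.id order by h.timestamp desc) as seqnum
from history h
) t
where seqnum = 1;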
I have the following query
SELECT q.pol_id
FROM quot q
,fgn_clm_hist fch
WHERE q.quot_id = fch.quot_id
UNION
SELECT q.pol_id
FROM tdb2wccu.quot q
WHERE q.nr_prr_ls_yr_cov IS NOT NULL
For every row in that result set, I want to create a new row in another table (call it table1) and update pol_id in the quot table (from the above result set) with the generated primary key from the inserted row in table1.
table1 has two columns. id and timestamp.
I'm using db2 10.1.
I've tried numerous things and have been unsuccessful for quite a while. Thanks!
Simple solution: create a new table for the result set of your query that has an identity column in it. Then, after running your query, update the pol_id field with the newly generated ID from your result table.
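A sketch of that approach, with heavy caveats: the mapping-table name is made up, and I'm assuming quot_id can serve as the key to join the generated IDs back to quot (the thread doesn't say what uniquely identifies a quot row):
-- mapping table with a generated identity
CREATE TABLE pol_id_map (
new_id INTEGER GENERATED ALWAYS AS IDENTITY,
quot_id INTEGER NOT NULL
);
INSERT INTO pol_id_map (quot_id)
SELECT q.quot_id
FROM quot q, fgn_clm_hist fch
WHERE q.quot_id = fch.quot_id
UNION
SELECT q.quot_id
FROM tdb2wccu.quot q
WHERE q.nr_prr_ls_yr_cov IS NOT NULL;
-- create the table1 rows and copy the generated keys back onto quot.pol_id
INSERT INTO table1 (id, timestamp)
SELECT new_id, CURRENT TIMESTAMP FROM pol_id_map;
UPDATE quot q
SET pol_id = (SELECT m.new_id FROM pol_id_map m WHERE m.quot_id = q.quot_id)
WHERE EXISTS (SELECT 1 FROM pol_id_map m WHERE m.quot_id = q.quot_id);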
Alternatively, you can do it more manually by using the ROW_NUMBER() OLAP function, which I have often found convenient for creating IDs. For this it is convenient to use a stored procedure which does the following:
Get the maximum old id from Table1 and write it into a variable old_max_id.
After generating the result set, write the row numbers into Table1, maybe with something like:
INSERT INTO TABLE1
SELECT ROW_NUMBER() OVER (ORDER BY <whatever-you-want>)
+ OLD_MAX_ID
, CURRENT TIMESTAMP
FROM (<here comes your SQL query>)
Either write the result set into a table or return a cursor to it. Here you should either use the same ROW_NUMBER statement as above or directly use the ID from Table1.
I want to merge the IndirectFlights table to the PriceTable.
I do not have IDs entered in the SourceTable (IndirectFlights) and I haven't set a PK for it yet.
The ID column for the PriceTable is an Identity (1,1) column and is also the Primary Key.
Qs1 How do I enter IDs in the source table so that they don't clash with the target table (PriceTable) IDs? I was thinking of using a sequence, but it could potentially clash in the future.
Qs2 Can I choose which columns to merge, or must I merge all the columns from the source table?
Target Table (PriceTable) Columns
ID, Airport_ICAO_Code, Airline_ICAO_Code, Departure, Price, RouteStatus, DateRowModified
Source Table (IndirectFlights) Columns
ID, Airport_ICAO_Code, Destination, Airline, Airline_ICAO_Code, RouteStatus, Connecting Airport
Edit: I have just run the following Union All statement as an alternative to using Merge.
Select ID,Airport_ICAO_Code,Airline_ICAO_Code,RouteStatus
From RoughworkPriceTable
Union All
Select ID,Airport_ICAO_Code,Airline_ICAO_Code,RouteStatus
From RoughworkIndirectFlights;
The code worked, but I noticed that the ID column accepted the Null values from IndirectFlights.ID even though I have the ID columns set to Not Null.
Can anyone explain this?
Also, can someone explain how I could create a new permanent table from this Union All statement?
You can create a new table with something like
Select * into newTmpTable from (
Select ID,Airport_ICAO_Code,Airline_ICAO_Code,RouteStatus From RoughworkPriceTable
Union All
Select ID,Airport_ICAO_Code,Airline_ICAO_Code,RouteStatus From RoughworkIndirectFlights)
as mergedData;
Sorry for the long question/post, but I need some help; I've been searching for several days but haven't found anything that helps. It seems like it should be easy, but... here goes.
I have table1 in my (Access 2010) database, which has existing records. I have another table, table2; when I run a query, it first deletes the data in table2, then imports new records into it. I run the import into table2 on a semi-regular basis, but I have yet to copy all those records into table1 successfully.
I need to copy only the records from table2 to table1 that don't already exist in table1. So each time the query or VBA code runs, it would keep growing table1 without duplicating existing data.
To clarify further, it's data from the Outlook GAL, so each time table2 imports that data (lname, fname, phone, email), it needs to be added to table1, but only if it doesn't already exist in table1.
I have a small start in SQL but cannot get it to work properly, because I'm not sure how to add the other fields into this statement correctly (unfortunately I don't know a whole lot about SQL or creating append queries):
INSERT INTO [Current] ( FirstName )
SELECT DISTINCT tblglobaladdresslistimport.First
FROM tblglobaladdresslistimport LEFT JOIN [Current] ON tblglobaladdresslistimport.First = Current.FirstName
WHERE Current.FirstName Is Null;
How about this:
INSERT INTO [Current](FirstName, LastName, Phone, Email)
SELECT DISTINCT
tblglobaladdresslistimport.First
, tblglobaladdresslistimport.Last
, tblglobaladdresslistimport.Phone
, tblglobaladdresslistimport.Email
FROM
tblglobaladdresslistimport LEFT JOIN [Current]
ON tblglobaladdresslistimport.First = Current.FirstName
AND tblglobaladdresslistimport.Last = Current.LastName
AND tblglobaladdresslistimport.Phone = Current.Phone
AND tblglobaladdresslistimport.Email = Current.Email
WHERE Current.FirstName Is Null
AND Current.LastName Is Null
AND Current.Phone Is Null
AND Current.Email Is Null;
Adjust the column names if I guessed them wrong. This assumes you don't have a primary key, so a row in tblglobaladdresslistimport is considered to already exist if Current has a row with the same values in all columns.
I have this table which doesn't have a primary key.
I'm going to insert some records in a new table to analyze them and I'm thinking in creating a new primary key with the values from all the available columns.
If this were a programming language like Java I would:
int hash = column1 * 31 + column2 * 31 + column3 * 31;
Or something like that. But this is SQL.
How can I create a primary key from the values of the available columns? Simply marking all the columns as the PK won't work for me, because I need to compare them with data from another DB table.
My table has 3 numbers and a date.
EDIT What my problem is
I think a bit more of background is needed. I'm sorry for not providing it before.
I have a database (dm) that is updated every day from another db (the original source). It has records from the past two years.
Last month (July) the update process broke, and for a month no data was being loaded into dm.
I manually created a table with the same structure in my Oracle XE, and I copied the records from the original source into my db (myxe). I copied only the records from July, to create a report needed by the end of the month.
Finally, on Aug 8 the update process was fixed, and the records which had been waiting to be migrated by this automatic process got copied into the database (from originalsource to dm).
This process cleans the data out of the original source once it is copied (into dm).
Everything looked fine, but we have just realized that some of the records got lost (about 25% of July's).
So what I want to do is use my backup (myxe) and insert into the database (dm) all those missing records.
The problems here are:
They don't have a well defined PK.
They are in separate databases.
So I thought that if I could create a unique PK from both tables which gave the same number, I could tell which records were missing and insert them.
EDIT 2
So I did the following in my local environment:
select a.* from the_table@PRODUCTION a, the_table b where
a.idle = b.idle and
a.activity = b.activity and
a.finishdate = b.finishdate
This returns all the rows that are present in both databases (the intersection). I've got 2,000 records.
What I'm going to do next is delete them all from the target db, and then just insert all the rows from my db into the target table.
I hope I don't get into something worse :-S
The danger of creating a hash value by combining the 3 numbers and the date is that it might not be unique and hence cannot be used safely as a primary key.
Instead I'd recommend using an autoincrementing ID for your primary key.
Just create a surrogate key:
ALTER TABLE mytable ADD pk_col INT;
UPDATE mytable
SET pk_col = rownum;
ALTER TABLE mytable MODIFY pk_col INT NOT NULL;
ALTER TABLE mytable ADD CONSTRAINT pk_mytable_pk_col PRIMARY KEY (pk_col);
or this:
ALTER TABLE mytable ADD pk_col RAW(16);
UPDATE mytable
SET pk_col = SYS_GUID();
ALTER TABLE mytable MODIFY pk_col RAW(16) NOT NULL;
ALTER TABLE mytable ADD CONSTRAINT pk_mytable_pk_col PRIMARY KEY (pk_col);
The latter uses GUIDs, which are unique across databases, but they consume more space and are much slower to generate (your INSERTs will be slow).
Update:
If you need to create same PRIMARY KEYs on two tables with identical data, use this:
MERGE
INTO mytable v
USING (
SELECT rowid AS rid,
ROW_NUMBER() OVER (ORDER BY col1, col2, col3) AS rn
FROM mytable
) s
ON (v.rowid = s.rid)
WHEN MATCHED THEN
UPDATE
SET pk_col = s.rn
Note that the tables should be identical down to each row (i.e. have the same number of rows with the same data in them).
Update 2:
For your very problem, you don't need a PK at all.
If you just want to select the records missing in dm, use this one (on dm side)
SELECT *
FROM mytable@myxe
MINUS
SELECT *
FROM mytable
This will return all records that exist in mytable@myxe but not in mytable@dm.
Note that MINUS also removes duplicate rows, if there are any.
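And if you then want to copy those missing rows straight across, the same MINUS can feed the insert (a sketch, run on the dm side over a database link back to myxe):
INSERT INTO mytable
SELECT *
FROM mytable@myxe
MINUS
SELECT *
FROM mytable;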
Assuming that you have ensured uniqueness...you can do almost the same thing in SQL. The only problem will be the conversion of the date to a numeric value so that you can hash it.
Select Table2.SomeFields
FROM Table1 LEFT OUTER JOIN Table2 ON
(Table1.col1 * 31) + (Table1.col2 * 31) + (Table1.col3 * 31) +
((DatePart(year,Table1.date) + DatePart(month,Table1.date) + DatePart(day,Table1.date) )* 31) = Table2.hashedPk
The above query would work for SQL Server, the only difference for Oracle would be in terms of how you handle the date conversion. Moreover, there are other functions for converting dates in SQL Server as well, so this is by no means the only solution.
And, you can combine this with Quassnoi's SET statement to populate the new field as well. Just use the left side of the Join condition logic for the value.
If you're loading your new table with values from the old table, and you then need to join the two tables, you can only "properly" do this if you can uniquely identify each row in the original table. Quassnoi's solution will allow you to do this, IF you can first alter the old table by adding a new column.
If you cannot alter the original table, generating some form of hash code based on the columns of the old table would work -- but, again, only if the hash codes uniquely identify each row. (Oracle has checksum functions, right? If so, use them.)
If hash code uniqueness cannot be guaranteed, you may have to settle for a primary key composed of as many columns are required to ensure uniqueness (e.g. the natural key). If there is no natural key, well, I heard once that Oracle provides a rownum for each row of data, could you use that?