Fastest way to identify differences between two tables? - sql

I have a need to check a live table against a transactional archive table and I'm unsure of the fastest way to do this...
For instance, let's say my live table is made up of these columns:
Term
CRN
Fee
Level Code
My archive table would have the same columns, but also have an archive date so I can see what values the live table had at a given date.
Now... How would I write a query to ensure that the values for the live table are the same as the most recent entries in the archive table?
PS I'd prefer to handle this in SQL, but PL/SQL is also an option if it's faster.

SELECT term, crn, fee, level_code
FROM live_data
MINUS
SELECT term, crn, fee, level_code
FROM historical_data
Whats on live but not in historical. Can then union to a reverse of this to get whats in historical but not live.

Simply:
SELECT collist
FROM TABLE A
minus
SELECT collist
FROM TABLE B
UNION ALL
SELECT collist
FROM TABLE B
minus
SELECT collist
FROM TABLE A;

You didn't mention how rows are uniquely identified, so I've assumed you also have an "id" column:
SELECT *
FROM livetable
WHERE (term, crn, fee, levelcode) NOT IN (
SELECT FIRST_VALUE(term) OVER (ORDER BY archivedate DESC)
,FIRST_VALUE(crn) OVER (ORDER BY archivedate DESC)
,FIRST_VALUE(fee) OVER (ORDER BY archivedate DESC)
,FIRST_VALUE(levelcode) OVER (ORDER BY archivedate DESC)
FROM archivetable
WHERE livetable.id = archivetable.id
);
Note: This query doesn't take NULLS into account - if any of the columns are nullable you can add suitable logic (e.g. NVL each column to some "impossible" value).

unload to table.unl
select * from table1
order by 1,2,3,4
unload to table2.unl
select * from table2
order by 1,2,3,4
diff table1.unl table2.unl > diff.unl

Could you use a query of the form:
SELECT your columns FROM your live table
EXCEPT
SELECT your columns FROM your archive table WHERE archive date is most recent;
Any results will be rows in your live table that are not in your most recent archive.
If you also need rows in your most recent archive that are not in your live table, simply reverse the order of the selects, and repeat, or get them all in the same query by performing a (live UNION archive) EXCEPT (live INTERSECTION archive)

Related

select rows in sql with latest date from 3 tables in each group

I'm creating PREDICATE system for my application.
Please see image that I already
I have a question how can I select rows in SQL with latest date "Taken On" column tables for each "QuizESId" columns, before that I am understand how to select it but it only using one table, I learn from this
select rows in sql with latest date for each ID repeated multiple times
Here is what I have already tried
SELECT tt.*
FROM myTable tt
INNER JOIN
(SELECT ID, MAX(Date) AS MaxDateTime
FROM myTable
GROUP BY ID) groupedtt ON tt.ID = groupedtt.ID
AND tt.Date = groupedtt.MaxDateTime
What I am confused about here is how can I select from 3 tables, I hope you can guide me, of course I need a solution with good query and efficient performance.
Thanks
This is for SQL Server (you didn't specify exactly what RDBMS you're using):
if you want to get the "latest row for each QuizId" - this sounds like you need a CTE (Common Table Expression) with a ROW_NUMBER() value - something like this (updated: you obviously want to "partition" not just by QuizId, but also by UserName):
WITH BaseData AS
(
SELECT
mAttempt.Id AS Id,
mAttempt.QuizModelId AS QuizId,
mAttempt.StartedAt AS StartsOn,
mUser.UserName,
mDetail.Score AS Score,
RowNum = ROW_NUMBER() OVER (PARTITION BY mAttempt.QuizModelId, mUser.UserName
ORDER BY mAttempt.TakenOn DESC)
FROM
UserQuizAttemptModels mAttempt
INNER JOIN
AspNetUsers mUser ON mAttempt.UserId = muser.Id
INNER JOIN
QuizAttemptDetailModels mDetail ON mDetail.UserQuizAttemptModelId = mAttempt.Id
)
SELECT *
FROM BaseData
WHERE QuizId = 10053
AND RowNum = 1
The BaseData CTE basically selects the data (as you did) - but it also adds a ROW_NUMBER() column. This will "partition" your data into groups of data - based on the QuizModelId - and it will number all the rows inside each data group, starting at 1, and ordered by the second condition - the ORDER BY clause. You said you want to order by "Taken On" date - but there's no such date visible in your query - so I just guessed it might be on the UserQuizAttemptModels table - change and adapt as needed.
Now you can select from that CTE with your original WHERE condition - and you specify, that you want only the first row for each data group (for each "QuizId") - the one with the most recent "Taken On" date value.

SQL Delete Rows Based on Multiple Criteria

I am trying to delete rows from a data set based on multiple criteria, but I am receiving a syntax error. Here is the current code:
With cte As (
Select *,
Row_Number() Over(Partition By ID, Numb1 Order by ID) as RowNumb
from DataSet
)
Delete from cte Where RowNumb > 1;
Where DataSet looks like this:
I want to delete all records in which the ID and the Numb1 are the same. So I would expect the code to delete all rows except:
I am not very experienced with Vertica but it seems like it is not very flexible about delete statements.
One way to do it would be to use a temporary table to store the rows that you want to keep, then truncate the original the table, and insert back into it from the temp table:
create temporary table MyTempTable as
select id, numb1, state_coding
from (select t.*, count(*) over(partition by id, numb1) cnt from DataSet) as t
where cnt = 1;
truncate table DataSet;
insert into DataSet
select id, numb1, state_coding from MyTempTable;
Note that I used a window count instead of row_number. This will remove records for which at least another record exists with the same id and numb1, which is what I understand that you want from your sample data and expected results.
Important: make sure to backup your entire table before you do this!
WITH Clauses in Vertica only support SELECT or INSERT, not DELETE/UPDATE.
Vertica Documentation
The cte is a temporary table. You cannot delete from it. It is effectively read-only.
If you are trying to delete duplicates out of the original DataSet table, you have to delete from the DataSet, not from the cte table.
Try this:
with cte as
(
select
ID,
Row_Number() Over(Partition By ID, Numb1 Order by ID) as RowNumb
from
DataSet
)
delete from DataSet where ID in (select ID from cte where RowNumb > 1)
Can't delete from CTEs. Just manually use delete syntax but rollback transactions or if you have permissions you can always replicate it and test.
You'd have saved me ~5 min had you pasted the data as text and not as picture - as I could not copy-paste and had to retype ...
Having said that:
Rebuild the table here:
DROP TABLE IF EXISTS input;
CREATE TABLE input(id,numb1,state_coding) AS (
SELECT 202003,4718868,'D'
UNION ALL SELECT 202003, 35756,'AA'
UNION ALL SELECT 204281, 146199,'D'
UNION ALL SELECT 204281, 146199,'D'
UNION ALL SELECT 204346, 108094,'D'
UNION ALL SELECT 204346, 108094,'D'
UNION ALL SELECT 204389, 14642,'DD'
UNION ALL SELECT 204389, 96504,'F'
UNION ALL SELECT 204392, 22010,'D'
UNION ALL SELECT 204392, 8051,'G'
UNION ALL SELECT 204400, 74118,'D'
UNION ALL SELECT 204400, 103900,'D'
UNION ALL SELECT 204406,1387304,'D'
UNION ALL SELECT 204406, 0,'HJ'
UNION ALL SELECT 204516, 894,'D'
UNION ALL SELECT 204516, 3927,'D'
UNION ALL SELECT 204586, 234235,'D'
UNION ALL SELECT 204586, 234235,'D'
)
;
And then:
Based on what was said in other responses, and keeping in mind that a mass delete of an important part of the table, not only in Vertica, is best implemented as an INSERT ... SELECT with inverted WHERE condition - here goes:
CREATE TABLE input_help AS
SELECT * FROM input
GROUP BY id,numb1,state_coding
HAVING COUNT(*) = 1;
DROP TABLE input;
ALTER TABLE input_help RENAME TO input;
At least, it works with that simplicity if the whole row is the same - I notice you don't put state_coding into the condition yourself. Otherwise, it gets slightly more complicated.
Or did you want to re-insert one row of the duplicates each afterwards?
Then, just build input_help as SELECT DISTINCT * FROM input; , then drop, then rename.

how to get latest date column records when result should be filtered with unique column name in sql?

I have table as below:
I want write a sql query to get output as below:
the query should select all the records from the table but, when multiple records have same Id column value then it should take only one record having latest Date.
E.g., Here Rudolf id 1211 is present three times in input---in output only one Rudolf record having date 06-12-2010 is selected. same thing with James.
I tried to write a query but it was not succssful. So, please help me to form a query string in sql.
Thanks in advance
You can partition your data over Date Desc and get the first row of each partition
SELECT A.Id, A.Name, A.Place, A.Date FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Id ORDER BY Date DESC) AS rn
FROM [Table]
) A WHERE A.rn = 1
you can use WITH TIES
select top 1 PERCENT WITH TIES * from t
order by (row_number() over(partition by id order by date desc))
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=280b7412b5c0c04c208f2914b44c7ce3
As i can see from your example, duplicate rows differ only in Date. If it's a case, then simple GROUP BY with MAX aggregate function will do the job for you.
SELECT Id, Name, Place, MAX(Date)
FROM [TABLE_NAME]
GROUP BY Id, Name, Place
Here is working example: http://sqlfiddle.com/#!18/7025e/2

select multiple records based on order by

i have a table with a bunch of customer IDs. in a customer table is also these IDs but each id can be on multiple records for the same customer. i want to select the most recently used record which i can get by doing order by <my_field> desc
say i have 100 customer IDs in this table and in the customers table there is 120 records with these IDs (some are duplicates). how can i apply my order by condition to only get the most recent matching records?
dbms is sql server 2000.
table is basically like this:
loc_nbr and cust_nbr are primary keys
a customer shops at location 1. they get assigned loc_nbr = 1 and cust_nbr = 1
then a customer_id of 1.
they shop again but this time at location 2. so they get assigned loc_nbr = 2 and cust_Nbr = 1. then the same customer_id of 1 based on their other attributes like name and address.
because they shopped at location 2 AFTER location 1, it will have a more recent rec_alt_ts value, which is the record i would want to retrieve.
You want to use the ROW_NUMBER() function with a Common Table Expression (CTE).
Here's a basic example. You should be able to use a similar query with your data.
;WITH TheLatest AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY group-by-fields ORDER BY sorting-fields) AS ItemCount
FROM TheTable
)
SELECT *
FROM TheLatest
WHERE ItemCount = 1
UPDATE: I just noticed that this was tagged with sql-server-2000. This will only work on SQL Server 2005 and later.
Since you didn't give real table and field names, this is just psuedo code for a solution.
select *
from customer_table t2
inner join location_table t1
on t1.some_key = t2.some_key
where t1.LocationKey = (select top 1 (LocationKey) as LatestLocationKey from location_table where cust_id = t1.cust_id order by some_field)
Use an aggregate function in the query to group by customer IDs:
SELECT cust_Nbr, MAX(rec_alt_ts) AS most_recent_transaction, other_fields
FROM tableName
GROUP BY cust_Nbr, other_fields
ORDER BY cust_Nbr DESC;
This assumes that rec_alt_ts increases every time, thus the max entry for that cust_Nbr would be the most recent entry.
By using time and date we can take out the recent detail for the customer.
use the column from where you take out the date and the time for the customer.
eg:
SQL> select ename , to_date(hiredate,'dd-mm-yyyy hh24:mi:ss') from emp order by to_date(hiredate,'dd-mm-yyyy hh24:mi:ss');

Return row data based on Distinct Column value

I'm using SQL Azure with asp script, and for the life of me, have had no luck trying to get this to work. The table I'm running a query on has many columns, but I want to query for distinct values on 2 columns (name and email), from there I want it to return the entire row's values.
What my query looks like now:
SELECT DISTINCT quote_contact, quote_con_email
FROM quote_headers
WHERE quote_contact LIKE '"&query&"%'
But I need it to return the whole row so I can retrieve other data points. Had I been smart a year ago, I would have created a separate table just for the contacts, but that's a year ago.
And before I implemented LiveSearch features.
One approach would be to use a CTE (Common Table Expression). With this CTE, you can partition your data by some criteria - i.e. your quote_contact and quote_con_email - and have SQL Server number all your rows starting at 1 for each of those partitions, ordered by some other criteria - i.e. probably SomeDateTimeStamp.
So try something like this:
;WITH DistinctContacts AS
(
SELECT
quote_contact, quote_con_email, (other columns here),
RN = ROW_NUMBER() OVER(PARTITION BY quote_contact, quote_con_email ORDER BY SomeDateTimeStamp DESC)
FROM
dbo.quote_headers
WHERE
quote_contact LIKE '"&query&"%'
)
SELECT
quote_contact, quote_con_email, (other columns here)
FROM
DistinctContacts
WHERE
RowNum = 1
Here, I am selecting only the last entry for each "partition" (i.e. for each pair of name/email) - ordered in a descending fashion by the time stamp.
Does that approach what you're looking for??
You need to provide more details.
This is what I could come up with Without them:
WITH dist as (
SELECT DISTINCT quote_contact, quote_con_email, RANK() OVER(ORDER BY quote_contact, quote_con_email) rankID
FROM quote_headers
WHERE quote_contact LIKE '"&query&"%'
),
data as (
SELECT *, RANK() OVER(ORDER BY quote_contact, quote_con_email) rankID FROM quote_headers
)
SELECT * FROM dist d INNER JOIN data src ON d.rankID = src.rankID