Data reconciliation between 2 datasets in SQL

image_table
I currently need to find all the differences between a new_master dataset and a previous one using Oracle SQL. The datasets have the same structure, contain both integers and strings, and have no unique key unless I combine several columns. The image at the top (image_table) shows a sample. I found this code online and wanted to ask whether you have any advice.
SELECT n.*
FROM new_master n
LEFT JOIN old_master o
ON (n.postcode = o.postcode)
WHERE o.postcode IS NULL
ORDER BY n.postcode
In doing so I should get back all the entries from the new_master that are not in the old one.
Thanks

If you are on an Oracle database, there are a couple of queries that can help you find any differences.
Find any records in OLD that are not in NEW.
SELECT * FROM old_master
MINUS
SELECT * FROM new_master;
Find any records in NEW that are not in OLD.
SELECT * FROM new_master
MINUS
SELECT * FROM old_master;
Count number of items in OLD
SELECT COUNT (*) FROM old_master;
Count number of items in NEW
SELECT COUNT (*) FROM new_master;
The COUNT queries are needed in addition to the MINUS queries because MINUS compares distinct rows, so duplicate rows with the same column data would otherwise go unnoticed.
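To illustrate why both checks are useful, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Oracle. SQLite spells the set difference EXCEPT rather than MINUS, and the columns and sample rows are made up for the example.

```python
import sqlite3

# In-memory database; the columns and rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_master (postcode TEXT, city TEXT, qty INTEGER);
    CREATE TABLE new_master (postcode TEXT, city TEXT, qty INTEGER);
    -- old_master contains the B2 row twice
    INSERT INTO old_master VALUES ('A1', 'Leeds', 1), ('B2', 'York', 2), ('B2', 'York', 2);
    INSERT INTO new_master VALUES ('A1', 'Leeds', 1), ('B2', 'York', 2), ('C3', 'Hull', 3);
""")

def rows(sql):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# EXCEPT is SQLite's spelling of Oracle's MINUS.
only_in_old = rows("SELECT * FROM old_master EXCEPT SELECT * FROM new_master")
only_in_new = rows("SELECT * FROM new_master EXCEPT SELECT * FROM old_master")
old_count = rows("SELECT COUNT(*) FROM old_master")[0][0]
new_count = rows("SELECT COUNT(*) FROM new_master")[0][0]

print(only_in_old, only_in_new, old_count, new_count)
```

Here the set difference reports one added row and no removed rows, which would suggest new_master is one row larger, yet both counts are 3. The gap is the duplicated B2 row that the set difference cannot see.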

Related

Selecting a single row from a column that has multiple rows

I'm a SQL newbie so bear with me.
I am writing a select statement to select data from multiple tables, which works. However, when I select a specific column I get duplicates, because that column can legitimately have multiple rows. What I want is to pick only the most appropriate row.
My code so far:
Select
a.[StudentId], a.[Name], a.[StartDT], a.[EndDT],
b.[ClassID], b.[Module], b.[ModStart], b.[ModEnd]
from
[Data].[StudentTbl] a
left join
[Data].[ClassTbl] b on a.[StudentId] = b.[Student_ID]
When I select b.[Module] I get multiple rows, as there can be a number of modules per class. However, I want to select the b.[Module] the student completed before leaving.
Essentially, if a.[EndDT] is equal to b.[ModEnd], I need that specific row. The MAX function doesn't always work because of data quality (DQ) issues in ClassTbl: when a student has left, a row saying N/A is inserted after the last module.
What I'm currently getting is this:
What I want to get eventually:

Oracle SQL Developer(4.0.0.12)

First time posting here, hope it goes well.
I am trying to write a query in Oracle SQL Developer that returns a customer_ID from one table and the time of the payment from another. I'm pretty sure the problem lies in my logic (it has been a long time since I used SQL, back in school, so I'm a bit rusty). I wanted to list the IDs as DISTINCT and ORDER BY the dates ASCENDING, so that only the first date would show up.
However, the returned table contains the same IDs twice or even more often in some cases. I even found the same ID with the same DATE a few times while scrolling through it.
If you would like to know more please ask!
SELECT DISTINCT
FIRM.customer.CUSTOMER_ID,
FIRM.account_recharge.X__INSDATE FELTOLTES
FROM
FIRM.customer
INNER JOIN FIRM.account
ON FIRM.customer.CUSTOMER_ID = FIRM.account.CUSTOMER
INNER JOIN FIRM.account_recharge
ON FIRM.account.ACCOUNT_ID = FIRM.account_recharge.ACCOUNT
WHERE
FIRM.account_recharge.X__INSDATE BETWEEN TO_DATE('14-01-01', 'YY-MM-DD') AND TO_DATE('14-12-31', 'YY-MM-DD')
ORDER BY FELTOLTES
Your select works like this because a CUSTOMER_ID can indeed have more than one X__INSDATE, so each (CUSTOMER_ID, X__INSDATE) pair is a distinct record in the result. If you need only the first date, don't use DISTINCT and ORDER BY; instead select MIN(X__INSDATE) and use GROUP BY CUSTOMER_ID.
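The difference between the two approaches can be sketched with Python's sqlite3 module. The single recharge table and its rows are hypothetical and stand in for the joined FIRM tables.

```python
import sqlite3

# Hypothetical, simplified stand-in for the joined customer/account_recharge data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recharge (customer_id INTEGER, insdate TEXT);
    INSERT INTO recharge VALUES
        (1, '2014-03-10'), (1, '2014-01-05'), (1, '2014-01-05'),
        (2, '2014-07-21'), (2, '2014-02-14');
""")

# DISTINCT removes only fully identical rows, so each customer still
# appears once per distinct date.
distinct_rows = conn.execute(
    "SELECT DISTINCT customer_id, insdate FROM recharge ORDER BY insdate"
).fetchall()

# MIN + GROUP BY collapses each customer to the earliest date.
first_dates = conn.execute("""
    SELECT customer_id, MIN(insdate) AS first_date
    FROM recharge
    GROUP BY customer_id
    ORDER BY first_date
""").fetchall()

print(len(distinct_rows))
print(first_dates)
```

With this sample data the DISTINCT query still returns four rows, while the grouped query returns one row per customer.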
SELECT DISTINCT FIRM.customer.CUSTOMER_ID,
FIRM.account_recharge.X__INSDATE FELTOLTES
Distinct is applied to both the columns together, which means you will get a distinct ROW for the set of values from the two columns. So, basically the distinct refers to all the columns in the select list.
It is equivalent to a select without DISTINCT that groups by every column in the select list.
That is,
select distinct a, b....
is equivalent to,
select a, b...group by a, b
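Note that concatenating the columns would not help: DISTINCT on the concatenated value still returns one row per distinct combination. To get one row per CUSTOMER_ID, aggregate the date column instead, for example with MIN(X__INSDATE) and GROUP BY CUSTOMER_ID.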

SELECT DISTINCT returns more rows than expected

I have read many answers here, but until now nothing could help me. I'm developing a ticket system, where each ticket has many updates.
I have 2 tables: tb_ticket and tb_updates.
I created a SELECT with subqueries, which took a long time (about 25 seconds) to get about 1000 rows. I have now changed it to use INNER JOIN instead of many SELECTs in subqueries. It is really fast (70 ms), but now I get duplicate tickets. I would like to know how I can get only the last row per ticket (ordering by time).
My current result is:
...
67355;69759;"COMPANY X";"2014-08-22 09:40:21";"OPEN";"John";1
67355;69771;"COMPANY X";"2014-08-26 10:40:21";"UPDATE";"John";1
The first column is the ticket ID, the second is the update ID... I would like to get only one row per ticket ID, but DISTINCT does not work in this case. Which row should it be? Always the latest one, so in this case 2014-08-26 10:40:21.
UPDATE:
It is a PostgreSQL database. I did not share my current query because it has only Portuguese names, so I think it would not help at all.
SOLUTION:
Used_By_Already had the best solution to my problem.
Without the details of your tables one has to guess the field names, but it seems that tb_updates has many records for a single record in tb_ticket (a many to one relationship).
A generic solution to your problem - to get just the "latest" record - is to use a subquery on tb_updates (see alias mx below) and then join that back to tb_updates so that only the record that has the latest date is chosen.
SELECT
t.*
, u.*
FROM tb_ticket t
INNER JOIN tb_updates u
ON t.ticket_id = u.ticket_id
INNER JOIN (
SELECT
ticket_id
, MAX(updated_at) max_updated
FROM tb_updates
GROUP BY
ticket_id
) mx
ON u.ticket_id = mx.ticket_id
AND u.updated_at = mx.max_updated
;
If you have a dbms that supports ROW_NUMBER() then using that function can be a very effective alternative method, but you haven't informed us which dbms you are using.
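As a rough sketch of the ROW_NUMBER() alternative, the following uses Python's sqlite3 module (window functions need SQLite 3.25 or later). The column names follow the guessed names in the query above; update_id, status, and the second ticket's row are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Guessed/hypothetical schema; the real column names are not known.
    CREATE TABLE tb_updates (update_id INTEGER, ticket_id INTEGER, updated_at TEXT, status TEXT);
    INSERT INTO tb_updates VALUES
        (69759, 67355, '2014-08-22 09:40:21', 'OPEN'),
        (69771, 67355, '2014-08-26 10:40:21', 'UPDATE'),
        (69800, 67400, '2014-08-27 08:00:00', 'OPEN');  -- made-up second ticket
""")

# Number the updates of each ticket from newest to oldest, then keep rank 1.
latest = conn.execute("""
    SELECT ticket_id, update_id, updated_at, status
    FROM (
        SELECT u.*,
               ROW_NUMBER() OVER (PARTITION BY ticket_id ORDER BY updated_at DESC) AS rn
        FROM tb_updates u
    ) ranked
    WHERE rn = 1
    ORDER BY ticket_id
""").fetchall()

print(latest)
```

Unlike the MAX join, this returns exactly one row per ticket even when two updates share the same timestamp, although which of the tied rows wins is then arbitrary unless a tie-breaker is added to the ORDER BY.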
by the way:
These rows ARE distinct:
67355;69759;"COMPANY X";"2014-08-22 09:40:21";"OPEN";"John";1
67355;69771;"COMPANY X";"2014-08-26 10:40:21";"UPDATE";"John";1
69759 is different from 69771, and that is enough for the 2 rows to be DISTINCT.
The 2 dates differ as well.
DISTINCT is a row operator, which means it considers the entire row, not just the first column, when deciding which rows are unique.
Used_By_Already's solution would work just fine. I'm not sure about the performance, but another solution would be to use CROSS APPLY, though that is limited to only a few DBMSs.
SELECT *
FROM tb_ticket ticket
CROSS APPLY (
SELECT TOP (1) *
FROM tb_updates details
WHERE details.ticketID = ticket.ticketID
ORDER BY updateTime DESC
) updates
You can try something like the following if your updateid is an identity column:
Select ticketed, max(updateid) from table
group by ticketed
To obtain the last row, end your query with ORDER BY time DESC and use TOP (1) in the SELECT statement so that only the first row of the query result is returned.
For example:
select TOP (1) .....
from .....
where .....
order by time desc

question about SQL query

Given a relation R with n columns, use SQL to return the tuples having the maximum number of occurrences of the values. I have no idea how to query horizontally.
SELECT MAX(t.*) FROM mytable t
or
SELECT DISTINCT a, b, c FROM mytable
or
SELECT DISTINCT * FROM mytable
It depends on which SQL implementation you are referring to, and generally on more information about the query, but the above examples should get you started so you can google some terms.
I'm not sure what you mean by querying horizontally. Is it one relation with multiple key columns linking the two tables? Sounds like you might just need to group by those columns and order by count(*) descending...
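One possible reading of the grouping suggestion, sketched with Python's sqlite3 module. The table r, its three columns, and the rows are hypothetical, and the HAVING variant that keeps only the top tuples (including ties) is an addition to the suggestion above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical relation with three columns.
    CREATE TABLE r (a INTEGER, b TEXT, c TEXT);
    INSERT INTO r VALUES
        (1, 'x', 'p'), (1, 'x', 'p'), (2, 'y', 'q'),
        (1, 'x', 'p'), (2, 'y', 'q'), (3, 'z', 'r');
""")

# Count how often each whole tuple occurs, most frequent first.
ranked = conn.execute("""
    SELECT a, b, c, COUNT(*) AS occurrences
    FROM r
    GROUP BY a, b, c
    ORDER BY COUNT(*) DESC, a
""").fetchall()

# Keep only the tuples whose count equals the maximum count (ties included).
most_frequent = conn.execute("""
    SELECT a, b, c, COUNT(*) AS occurrences
    FROM r
    GROUP BY a, b, c
    HAVING COUNT(*) = (
        SELECT MAX(cnt)
        FROM (SELECT COUNT(*) AS cnt FROM r GROUP BY a, b, c) counts
    )
    ORDER BY a, b, c
""").fetchall()

print(ranked)
print(most_frequent)
```

Grouping by every column treats each full row as one value, so the count is the number of times that exact tuple occurs.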

SQL: Get list of numbers not in use by other rows

I'm using PostgreSQL 8.1.17, and I have a table with account numbers. The acceptable range for an account number is a number between 1 and 1,000,000 (a six digit number). The column "acctnum" contains the account number. Selecting all the numbers in use is easy (SELECT acctnum FROM tbl_acct_numbers ORDER BY acctnum). What I would like to do is select all the numbers in the acceptable range that are not in use, that is, they aren't found in any rows within the column acctnum.
SELECT
new_number
FROM
generate_series(1, 1000000) AS new_number
LEFT JOIN tbl_acct_numbers ON new_number = acctnum
WHERE
acctnum IS NULL;
Are you sure you want to do this? I assume the reason you want to find unused numbers is so that you can use them.
But consider that the numbers might not be there because someone did use them in the past, and deleted the row from the database. So if you find that account number and re-use it for a new account, you could be assigning a number that was previously used and was deleted for a reason.
If there are any historical documents in other systems that reference that account number, they might mistakenly be associated with the new account using the same number.
However, as long as you have considered this, you can find unused id's in the following way:
SELECT t1.acctnum-1 AS unused_acctnum
FROM MyTable t1
LEFT OUTER JOIN MyTable t2 ON t2.acctnum = t1.acctnum-1
WHERE t2.acctnum IS NULL;
Granted, this doesn't find all the unused acctnums, only a set of them that are 1 less than a number that's in use. But that might give you enough to work with.
The answer from @Alex Howansky does give you all unused acctnums, but it may return a large set of rows.
select *
from generate_series(1, 1000000) as acctnum
where acctnum not in (select acctnum from tbl_acct_numbers);
You can generate the series of numbers from 1 to 1,000,000 and then subtract the results of your query with EXCEPT (PostgreSQL's equivalent of Oracle's MINUS).
select * from generate_series(1,1000000)
EXCEPT
SELECT acctnum FROM tbl_acct_numbers;
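The approaches in this thread can be compared on a small range with Python's sqlite3 module. This is only a sketch: SQLite has no built-in generate_series, so a recursive CTE stands in for it, the range is shrunk to 1..10, and the account numbers in use are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_acct_numbers (acctnum INTEGER);
    INSERT INTO tbl_acct_numbers VALUES (1), (2), (4), (7), (10);  -- made-up data
""")

# Full series minus the numbers in use; the recursive CTE replaces
# PostgreSQL's generate_series(1, 10).
unused = [row[0] for row in conn.execute("""
    WITH RECURSIVE series(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM series WHERE n < 10
    )
    SELECT n FROM series
    EXCEPT
    SELECT acctnum FROM tbl_acct_numbers
    ORDER BY n
""")]

# The self-join variant: only numbers that are 1 less than a number in use.
one_less = [row[0] for row in conn.execute("""
    SELECT t1.acctnum - 1 AS unused_acctnum
    FROM tbl_acct_numbers t1
    LEFT OUTER JOIN tbl_acct_numbers t2 ON t2.acctnum = t1.acctnum - 1
    WHERE t2.acctnum IS NULL
    ORDER BY unused_acctnum
""")]

print(unused)
print(one_less)
```

On this data the self-join finds only some of the gaps and also returns 0, which lies outside the accepted range because account number 1 is in use, so a range filter would be needed if that variant were used.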