How to compare data from two databases - sql

I have two databases for two different stores. They have the same tables and structure.
When I created the second one, I made a copy of the first, and I'm pretty sure I forgot to delete some of store 1's data from that copy before I activated the second database.
Now I need to find this data and delete it from db two, because I have, for example, articles from store 1 in store 2's database.
Is there some easy way to identify the data that is in both databases?

So I found one way, but it would cost me a lot of time.
In HeidiSQL, which I use, I made a copy of the article table in db 1 and put it in db 2 with the name article_db1.
Then I ran SELECT * FROM article a INNER JOIN article_db1 b ON a.artnr = b.artnr, and like this I got the matches that exist in both databases.
My problem is that I'm afraid I have to do this on all 40 tables in the databases, and because of that I was thinking there is maybe a faster way to solve this.
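One way to avoid repeating that join by hand for all 40 tables is to let information_schema generate the comparison queries for you. A minimal sketch, assuming both schemas live on the same MySQL/MariaDB server (the usual HeidiSQL setup), that every table has a primary key, and with db1 and db2 as placeholder schema names:

-- Generates one comparison statement per table in db2, joining back to db1 on the primary key.
-- db1/db2 are placeholders for the real schema names.
SELECT CONCAT(
         'SELECT * FROM db2.', t.TABLE_NAME, ' a',
         ' INNER JOIN db1.', t.TABLE_NAME, ' b USING (',
         GROUP_CONCAT(k.COLUMN_NAME ORDER BY k.ORDINAL_POSITION), ');'
       ) AS comparison_query
FROM information_schema.TABLES t
JOIN information_schema.KEY_COLUMN_USAGE k
  ON k.TABLE_SCHEMA = t.TABLE_SCHEMA
 AND k.TABLE_NAME = t.TABLE_NAME
 AND k.CONSTRAINT_NAME = 'PRIMARY'
WHERE t.TABLE_SCHEMA = 'db2'
GROUP BY t.TABLE_NAME;

Running the generated statements then shows, table by table, which rows exist in both databases.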

Related

SQLite data comparison on two tables with 2 million rows of data

I've been trying to find a way of comparing a huge amount of data in two different tables, but I am not sure if this is the correct approach.
So this is why I am asking it here: to understand the problem better and get some clarity in order to solve it.
As the title says, I have two tables with less than 2 million rows of data each, and I need to do a data comparison on them. Basically I only need to check whether the data in one table matches the data in the other table. Each table comes from a separate database, and I've managed to create views in order to have the same column names.
Here is my approach which gives me differences from two tables.
SELECT db1.ADDRESS_ID, db1.address
FROM UAT_CUSTOMERS_DB1 db1
EXCEPT
SELECT db2.ADDRESS_ID, db2.address
FROM UAT_CUSTOMERS_DB2 db2;
This works, but I have two questions:
1. It seems pretty straightforward, but can someone explain in a bit more depth how this query works with such speed? Yes, I know, read the docs, but I would really appreciate an alternative answer.
2. How can I include all the columns from the tables without specifying each column name manually?
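A sketch that sidesteps listing every column, assuming both views expose exactly the same columns in the same order (SQLite's EXCEPT compares whole rows positionally, so the column lists must line up):

-- Assumes the two views have identical column order; EXCEPT matches rows positionally.
SELECT * FROM UAT_CUSTOMERS_DB1
EXCEPT
SELECT * FROM UAT_CUSTOMERS_DB2;

On the speed question: EXCEPT is a set operation, so the engine can sort or index each input once and eliminate matching rows in a single pass, rather than comparing every row of one table against every row of the other.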

SQL SELECT JOIN Query over multiple tables

I'm trying to get data about Installations in buildings. The problem is that one building can have multiple installations, and I'm unsure how to adjust my SQL for that, as the initial table I query only holds the relations that own the buildings.
Here’s the situation.
Table 1 (RELRLGRP) holds the id of the group of relations that own the buildings, which in turn have the installations that hold the data I need.
This is what I have so far, I’m worried I shouldn’t use this many joins in an SQL statement but cannot find a quicker link between the information I need from my starting point at the group of relations till the installation data I seek in the BORGINST table. Please disregard the select portion of the statement (removed it for clarity).
SELECT *
FROM RELRLGRP A
JOIN RELATION R ON A.RELATION_GC_ID = R.GC_ID
JOIN BUILDING G ON R.CODE = G.GC_CODE
JOIN INSTALL I ON G.GC_CODE = I.GC_CODE
JOIN BORGINST B ON I.GC_ID = B.GC_ID
WHERE A.RELGROUP_GC_ID LIKE '100109' -- the group the relations belong to
I've done some rudimentary SQL, but this linking through tables is new territory for me. In that sense, I'd be happy to know if this many join statements are the way to go or if I should head a different route entirely.
JesseJ - Since I don't know all the columns that exist in your tables, I am going to assume that you are joining on primary keys. If this is the case, your solution may be the only one available to link the RELRLGRP to the BORGINST table.
Linking multiple tables like you are doing can be common in a normalized database.
Example: in a normalized schema, in order to find the State where a particular transaction happened, you have to link all the tables together. There is no shortcut.
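As an illustration, a hypothetical chain of that kind (table and column names invented here, not taken from the question) walks one foreign key per join until it reaches the data you want:

-- Hypothetical schema: PURCHASE -> STORE -> ADDRESS -> CITY -> STATE
SELECT S.STATE_NAME
FROM PURCHASE P
JOIN STORE   ST ON P.STORE_ID    = ST.STORE_ID
JOIN ADDRESS A  ON ST.ADDRESS_ID = A.ADDRESS_ID
JOIN CITY    C  ON A.CITY_ID     = C.CITY_ID
JOIN STATE   S  ON C.STATE_ID    = S.STATE_ID
WHERE P.PURCHASE_ID = 12345;

Your RELRLGRP-to-BORGINST query follows exactly the same pattern.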
Don't sweat it: I have views with three times as many joins. Every join does add complexity and use more processing power, but it really comes down to performance: if this process doesn't finish as quickly as you need it to, you can look into other methods, but otherwise multiple joins like this are perfectly fine.

Merging multiple tables from multiple databases with all rows and columns

I have 30 databases from a survey application, each with a table of results that has approximately 100 columns. Most of the columns are identical, but each survey seems to have a unique column or two added with no real pattern (these are the added questions and results of the survey). As I work on the statement to join all of the tables into one large master table, the code is getting quite complex. Is there a more efficient way to merge these tables from multiple databases, selecting all rows and columns, so that shared columns are merged and a new column is created whenever one is encountered?
No, there isn't an automatic way to merge a bunch of similar, but not quite the same, tables into one. At least, not in any database system that I know of.
You could possibly automate something like that with a fairly simple script that relies on your database's information schema (or equivalent).
However, with only 30 tables and only a column or two different in each, I'm not sure it's worth it. A manual approach, with copying and pasting and making minor changes, would probably be faster.
Also, consider whether the "extra" columns that are unique to individual tables need to go into the combined table. The point of making a big single table is to process/analyze all the data together. If something only applies to a single source, this isn't possible.
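A minimal sketch of that information-schema approach (SQL Server syntax assumed; Survey01, Survey02 and Results are placeholder names): list every column of every results table, so you can see which columns are shared and generate the big UNION ALL from the output.

-- Placeholder database/table names; extend the list for all 30 databases.
SELECT TABLE_CATALOG, TABLE_NAME, COLUMN_NAME
FROM Survey01.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Results'
UNION ALL
SELECT TABLE_CATALOG, TABLE_NAME, COLUMN_NAME
FROM Survey02.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Results';

Columns missing from a particular table can then be supplied as NULL AS ColumnName in that table's branch of the UNION ALL.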

How do I compare two tables in different databases

I have two databases, which are basically identical running on the same machine.
I would like to compare the records in a table on database A vs the same table in database B
I would like to know which records exist in the table on database A that do not exist in the same table on database B.
Database A = "RICSTOREV341"
Database B = "RICHOSTV341"
The table is "Price_Tab"
The columns I would like to pull are F01, F26, F27, F19, F38
Can this be accomplished?
Yes, this can be done.
You could use a three-part identifier to refer to the tables in the different databases, like this:
RICSTOREV341.dbo.Price_Tab
Then you can perform a join on the primary key and fetch the result.
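A minimal sketch of that comparison, assuming F01 is the column that identifies a record (swap in the real key if it differs); it returns the requested columns for rows that exist in RICSTOREV341 but not in RICHOSTV341:

-- Assumes F01 is the matching key between the two Price_Tab tables.
SELECT a.F01, a.F26, a.F27, a.F19, a.F38
FROM RICSTOREV341.dbo.Price_Tab AS a
LEFT JOIN RICHOSTV341.dbo.Price_Tab AS b
       ON a.F01 = b.F01
WHERE b.F01 IS NULL;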
Besides the solutions above, I can also suggest using some 3rd-party tools for performing data comparison; most of them have a fully functional free trial (like the SQL data comparison tools from ApexSQL or Redgate).
These tools can help you save lots of time, as they can do data comparison and synchronization in just a couple of clicks.
Hope I helped.

What is the best way to query data from multiple tables and databases?

I have 5 databases which represent different regions of the country. In each database, there are a few hundred tables, each with 10,000-2,000,000 transaction records. Each table is a representation of a customer in the respective region. Each of these tables has the same schema.
I want to query all tables as if they were one table. The only way I can think of doing it is creating a view that unions all tables, and then just running my queries against that. However, the customer tables will change all the time (as we gain and lose customers), so I'd have to change the query for my view to include new tables (or remove ones that are no longer used).
Is there a better way?
EDIT
In response to the comments (I also posted this as a response to an answer):
In most cases I won't be removing any tables; they will remain for historic purposes. As I posted in a comment to one response, the idea was to reduce the time it takes a smaller customer (one with only 10,000 records) to query their own history. There are about 1,000 customers with an average of 1,000,000 rows (and growing) apiece. If I were to add all records to one table, I'd have nearly a billion records in that table. I also thought I was planning for the future, in that when we get, say, 5,000 customers, we don't have one giant table holding all transaction records (this may be an error in my thinking). So then, is it better not to divide the records as I have done? Should I mash it all into one table? Will indexing on customer IDs prevent delays in querying data for smaller customers?
I think your design may be broken. Why not use one single table with a region and a customer column?
If I were you, I would consider refactoring to one single table, and if necessary (for reverse compatibility for example), I would use views to provide the same info as in the previous tables.
Edit to answer OP comments to this post:
One table with 10,000,000,000 rows in it will do just fine, provided you use proper indexing. Database servers are built to cope with this kind of volume.
Performance is definitely not a valid reason to split one such table into thousands of smaller ones!
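A minimal sketch of that single-table layout with a compatibility view (all names here are illustrative, not from the original post):

-- One table for every region and customer; the old per-customer tables
-- can be emulated with views for backwards compatibility.
CREATE TABLE transactions (
    transaction_id BIGINT PRIMARY KEY,
    region_id      INT NOT NULL,
    customer_id    INT NOT NULL,
    amount         DECIMAL(12,2),
    created_at     DATETIME
    -- ...plus the columns shared by the existing per-customer tables...
);

CREATE INDEX ix_transactions_customer ON transactions (customer_id);

-- Example compatibility view for one former per-customer table:
CREATE VIEW customer_1234_transactions AS
SELECT transaction_id, amount, created_at
FROM transactions
WHERE customer_id = 1234;

With the index on customer_id, a customer with only 10,000 rows touches just their own slice of the table, no matter how large the table grows overall.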
The architecture of this system smells like it needs a vastly different approach if there are a few hundred tables and each has the same schema.
Why are you adding or removing tables at all? This should not be happening under any normal circumstances.
Agree with Brann,
That's an insane DB schema design. Why didn't you go with (or is it an option to change to) a single normalised structure, with columns to filter by region and by whatever condition separates each table within a region's database?
With the structure you have now, you're stuck with some horribly large (~500 tables) unioned view that you would have to regenerate dynamically as regularly as new tables appear in the system.
Two solutions:
1. Write a stored procedure that builds the view for you by reading all the table names in the 5 databases and building the UNION view, as you would do by hand.
2. Create a new database with a single table and, for example, import all the records from all the tables into it each night.
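A rough sketch of option 1 in SQL Server dynamic SQL (RegionNorth, RegionSouth and AllTransactions are placeholder names, not from the post):

-- Builds one UNION ALL view over every customer table in the regional databases.
DECLARE @sql NVARCHAR(MAX) = N'';

SELECT @sql = @sql +
       CASE WHEN @sql = N'' THEN N'' ELSE N' UNION ALL ' END +
       N'SELECT * FROM ' + QUOTENAME(db) + N'.dbo.' + QUOTENAME(name)
FROM (
    SELECT 'RegionNorth' AS db, name FROM RegionNorth.sys.tables
    UNION ALL
    SELECT 'RegionSouth', name FROM RegionSouth.sys.tables
    -- ...remaining regional databases...
) AS t;

SET @sql = N'CREATE OR ALTER VIEW dbo.AllTransactions AS ' + @sql;
EXEC sp_executesql @sql;

Scheduled as a nightly job, this keeps the view in step with customer tables being added or removed.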
Sounds like you're stuck somewhere between a multi-tenant and a single-tenant database schema. Specifically, you're storing it as "light" multi-tenant (separate tables vs separate databases) but querying it as single-tenant: one query to rule them all.
In the short term, have your data access layer dynamically pick the table to query, rather than unioning everything together into one uber query.
In the long term, pick one approach and stick to it: one database and one table, or many databases.
Here are some posts on the subject.
What are the advantages of using a single database for EACH client?
http://msdn.microsoft.com/en-us/library/aa479086.aspx