SQLite data comparison on two tables with 2 million rows of data - sql

I've been trying to find a way to compare a huge amount of data in two different tables, but I'm not sure if this is the correct approach.
That's why I'm asking here: to understand the problem better and get some clarity on how to solve it.
As the title says, I have two tables with just under 2 million rows of data, and I need to do a data comparison on them. Basically I only need to check whether the data in one table matches the data in the other. Each table comes from a separate database, and I've managed to create views so that both have the same column names.
Here is my approach, which gives me the differences between the two tables.
SELECT db1.ADDRESS_ID, db1.address
FROM UAT_CUSTOMERS_DB1 db1
EXCEPT
SELECT db2.ADDRESS_ID, db2.address
FROM UAT_CUSTOMERS_DB2 db2;
This works, but I have two questions:
It seems pretty straightforward, but can someone explain to me in a bit more depth how this query works with such speed? Yes I know - read the docs - but I would really appreciate an alternative answer.
How can I include all the columns from the tables without specifying each column name manually?
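On the second question: `EXCEPT` only requires that both sides produce the same number of columns in compatible order, so if the two views are column-compatible you can use `SELECT *` on both sides. A minimal sketch using Python's `sqlite3` module (the table names mirror the question; the column set and sample rows are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE UAT_CUSTOMERS_DB1 (ADDRESS_ID INTEGER, address TEXT);
CREATE TABLE UAT_CUSTOMERS_DB2 (ADDRESS_ID INTEGER, address TEXT);
INSERT INTO UAT_CUSTOMERS_DB1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO UAT_CUSTOMERS_DB2 VALUES (1, 'a'), (2, 'x');
""")

# SELECT * works with EXCEPT as long as both sides expose the same
# number of columns in the same order (views guarantee that here).
diff = con.execute("""
    SELECT * FROM UAT_CUSTOMERS_DB1
    EXCEPT
    SELECT * FROM UAT_CUSTOMERS_DB2
""").fetchall()
print(diff)  # rows present in DB1 but missing or different in DB2
```

Note that `EXCEPT` also deduplicates its result, so two identical rows on the left collapse into one row in the output.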

Related

Should I combine two tables into one table to improve query performance

I have two result tables for calculating Monthly Allowance and OT-Allowance.
Should I combine the two tables into one table, use a view, or just join the tables in the SQL statement?
The question becomes: why are there two tables in the first place? If the two tables are always 1-to-1 (every time a row goes into the first table it must also always go into the second), then yes, combine them and save the space of the duplicate data and indexes. If it is possible for the two tables to have different entries (one always gets a row per "incident" but the other may not, or data sometimes lands in the first table and sometimes in the second), then keep them as two tables to minimize the dead space per row. The data should drive the table design. You don't say how many rows you are talking about. If the rows, tables, and database are small, does it really matter, other than trying to design well (always a good idea)?
Another related consideration is how much data is already in the current design and how much effort would be needed to convert to a different one. Is it worth the effort? I prefer to normalize data "mostly" (I'm not fanatical about it) but also tend not to mess with what isn't broken.
Sorry if you were hoping for a more clear-cut answer...
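To illustrate the 1-to-1 case described above: if every row in the first table always has exactly one matching row in the second, the pair can be collapsed into a single table with one join. A rough sketch in Python's `sqlite3` (the table and column names are invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical 1-to-1 pair: every employee has one monthly-allowance
# row and one overtime-allowance row.
con.executescript("""
CREATE TABLE monthly_allowance (emp_id INTEGER PRIMARY KEY, monthly INTEGER);
CREATE TABLE ot_allowance (emp_id INTEGER PRIMARY KEY, overtime INTEGER);
INSERT INTO monthly_allowance VALUES (1, 100), (2, 200);
INSERT INTO ot_allowance VALUES (1, 10), (2, 20);
""")

# Collapse the 1:1 pair into one table; a LEFT JOIN also keeps
# employees that happen to have no overtime row yet.
con.execute("""
CREATE TABLE allowance AS
SELECT m.emp_id, m.monthly, o.overtime
FROM monthly_allowance m
LEFT JOIN ot_allowance o ON o.emp_id = m.emp_id
""")

rows = con.execute("SELECT * FROM allowance ORDER BY emp_id").fetchall()
for row in rows:
    print(row)
```

If the relationship is not reliably 1-to-1, the `LEFT JOIN` leaves NULLs in the overtime column, which is exactly the "dead space per row" trade-off mentioned above.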

Merging multiple tables from multiple databases with all rows and columns

I have 30 databases from a survey application, each with a results table of approximately 100 columns. Most of the columns are identical, but each survey seems to have a unique column or two added with no real pattern (these are the added questions and results of that survey). As I work on the statement to join all of the tables into one large master table, the code is getting quite complex. Is there a more efficient way to merge these tables from multiple databases, selecting all rows and columns, so that the merge uses a column where it already exists and creates it when it encounters a new one?
No, there isn't an automatic way to merge a bunch of similar, but not quite the same, tables into one. At least, not in any database system that I know of.
You could possibly automate something like that with a fairly simple script that relies on your database's information schema (or equivalent).
However, with only 30 tables and only a column or two different in each, I'm not sure it's worth it. A manual approach, with copying and pasting and making minor changes, would probably be faster.
Also, consider whether the "extra" columns that are unique to individual tables need to go into the combined table. The point of making a big single table is to process/analyze all the data together. If something only applies to a single source, this isn't possible.
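The catalog-driven script mentioned above might look roughly like this in Python against SQLite, using `PRAGMA table_info` as SQLite's equivalent of the information schema (table and column names here are hypothetical; against another engine you would query `information_schema.columns` instead):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Two hypothetical survey tables: mostly shared columns, plus one
# unique column each, mimicking the situation in the question.
con.executescript("""
CREATE TABLE survey_a (id INTEGER, q1 TEXT, extra_a TEXT);
CREATE TABLE survey_b (id INTEGER, q1 TEXT, extra_b TEXT);
INSERT INTO survey_a VALUES (1, 'yes', 'x');
INSERT INTO survey_b VALUES (2, 'no', 'y');
""")

sources = ["survey_a", "survey_b"]

# Read each table's column list from the catalog.
cols_by_table = {
    t: [row[1] for row in con.execute(f"PRAGMA table_info({t})")]
    for t in sources
}

# Union of all columns, preserving first-seen order.
all_cols = []
for cols in cols_by_table.values():
    for c in cols:
        if c not in all_cols:
            all_cols.append(c)

con.execute(f"CREATE TABLE master ({', '.join(all_cols)})")

# Copy from each source, substituting NULL for columns it lacks.
for t, cols in cols_by_table.items():
    select_list = ", ".join(
        c if c in cols else f"NULL AS {c}" for c in all_cols
    )
    con.execute(f"INSERT INTO master SELECT {select_list} FROM {t}")

rows = con.execute("SELECT * FROM master ORDER BY id").fetchall()
for row in rows:
    print(row)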

Does number of columns in a table affect performance if you don't use them in a join?

I have a table with 49 columns and I need to add two more. I could add another table, related to this one, holding the new columns, to avoid making the table bigger and bigger.
However, I would like to know how much having 2 more columns in that table would affect performance if they are not used in joins.
Is there really a performance difference when joining table A with table B if A has 4 columns or 100, when you only use 3 of them?
Also, the table is not highly populated (it doesn't even have 500 rows), but I'd like to know anyway, since the DBA doesn't like it, just to understand his point of view.
Thanks.
EDIT:
To clarify: my only work on this table is to add 2 more columns to the existing 49 that the table currently holds, and they will be bit columns. That's why I wanted to know whether increasing the column count would impact performance at all, assuming nobody ever does a SELECT * when joining with that table.
I think the best question to ask is: will these new columns be empty most of the time for your rows?
Yes: maybe you can add the columns to the main table; it depends on whether you need them most of the time you select rows from this table.
No: create a new table and join. An empty column on every row is wasted disk space.
NB: 50 columns seems horrible anyway...
Adding these two columns to your table should not significantly impact performance, particularly as your table stores fewer than 500 rows. That said, your DBA does not like this because it doesn't follow best practices for table design, in particular if many column values are NULL/empty, and such a design will not scale well. However, unless you anticipate that this table is going to grow rapidly, adding two columns should not pose a performance problem.
If you add another table, then I assume you will have to use joins to access that data properly. That could easily end up costing more than the two new attributes added to the single table would.
If your table can be refactored, that would be the best option, but if not, you would only lose efficiency by attempting it. Don't create a second table simply to stay under 50 attributes. Two columns added to 49 is not going to be an unworkable load by any measure, but there could be other reasons to redesign your table. If you have a bunch of empty or NULL cells, you are wasting resources and giving your system more work to do; finding a way to eliminate those would undoubtedly have a greater effect on performance than adding a column or two.
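For reference, adding the two bit-style columns in place is a one-statement change per column in most engines. A small sketch in Python's `sqlite3` (the question is presumably about a different engine with a true bit type, and the table here is shrunk to two columns for brevity, so treat this only as an illustration of the idea):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Stand-in for the existing 49-column table.
con.execute("CREATE TABLE wide (id INTEGER PRIMARY KEY, col1 TEXT)")
con.execute("INSERT INTO wide VALUES (1, 'a')")

# Add the two flag columns in place; existing rows pick up the default,
# so nothing that joins to this table needs to change.
con.execute("ALTER TABLE wide ADD COLUMN flag1 INTEGER NOT NULL DEFAULT 0")
con.execute("ALTER TABLE wide ADD COLUMN flag2 INTEGER NOT NULL DEFAULT 0")

row = con.execute("SELECT * FROM wide").fetchone()
print(row)  # (1, 'a', 0, 0)
```

Queries that name their columns explicitly (no SELECT *) are unaffected by the two extra columns, which is the scenario described in the edit above.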

How to compare data from two databases

I have two databases for two different stores. They have the same tables and structure.
When I created the second one, I made a copy of the first, and I'm pretty sure that I forgot to delete some data from the first before I activated the second.
Now I need to find this data and delete it from database two, because I have, for example, articles from store 1 in store 2's database.
Is there an easy way to identify the data that exists in both databases?
I did find one way, but it would cost me a lot of time.
In HeidiSQL, which I use, I made a copy of the article table in db 1 and put it in db 2 under the name article_db1.
Then I ran SELECT * FROM article a INNER JOIN article_db1 b ON a.artnr = b.artnr, and this gave me the matches that exist in both databases.
My problem is that I'm afraid I'll have to do this for all 40 tables in the databases, so I was wondering whether there is a faster way to solve this.
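One way to avoid repeating the copy-and-join by hand 40 times is to loop over the table list from the catalog and run one INTERSECT per table. A sketch in Python's `sqlite3` with an attached second database (the question uses HeidiSQL/MySQL, where you would loop over `information_schema.tables` and qualify tables as `db1.t`/`db2.t` instead; the sample data is made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical setup: two store databases with identical schemas.
con.executescript("""
ATTACH ':memory:' AS db2;
CREATE TABLE main.article (artnr INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE db2.article (artnr INTEGER PRIMARY KEY, name TEXT);
INSERT INTO main.article VALUES (1, 'chair'), (2, 'table');
INSERT INTO db2.article VALUES (2, 'table'), (3, 'lamp');
""")

# Enumerate every table once instead of hand-writing 40 queries.
tables = [r[0] for r in con.execute(
    "SELECT name FROM main.sqlite_master WHERE type = 'table'")]

matches = {}
for t in tables:
    # Rows that appear identically in both databases.
    matches[t] = con.execute(
        f"SELECT * FROM main.{t} INTERSECT SELECT * FROM db2.{t}"
    ).fetchall()

for t, dupes in matches.items():
    print(t, dupes)
```

Once you have verified the matches, the same loop can issue the corresponding DELETE per table, but it is safer to review the INTERSECT output first.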

Efficient way to query similarly-named tables with identical column names

I'm building a report in SSRS which takes data from several tables with similar names. There are three different 'sets' of tables - i.e., 123xxx, 456xxx, and 789xxx. Within these groups, the only difference in the table names is a three-digit code for a worksite, so, for example, we might have a table called 123001, 123010, and 123011. Within each set of tables, the columns have the same names.
The problem is that there are about 15 different sites, and I'm taking several columns from each site and each group of tables. Is there a more efficient way to write that query than to write out the name of every single column?
I don't believe there is, but using aliases on your tables would make your query much easier to understand and follow.
Also, if you aren't comparing values across the tables at all, then maybe a UNION between the per-table SELECTs would help too.
I would give each table an alias.
SELECT s1t1.name
FROM Site1Table1 as s1t1;
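Building on the union suggestion above: since the tables within each set share identical columns, you can generate one UNION ALL query over the site list instead of writing each SELECT by hand. A sketch in Python's `sqlite3` with invented table and column names (in SSRS you would paste or generate the resulting SQL into the dataset query):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical site tables: same columns, names differ only by site code.
con.executescript("""
CREATE TABLE "123001" (name TEXT, amount INTEGER);
CREATE TABLE "123010" (name TEXT, amount INTEGER);
INSERT INTO "123001" VALUES ('widget', 5);
INSERT INTO "123010" VALUES ('gadget', 7);
""")

sites = ["123001", "123010"]

# Build one UNION ALL query, tagging each row with its source site
# so the report can still distinguish worksites.
query = " UNION ALL ".join(
    f"SELECT '{s}' AS site, name, amount FROM \"{s}\"" for s in sites
)

rows = list(con.execute(query))
for row in rows:
    print(row)
```

The generated SQL lists the shared columns once, so extending the report to all 15 sites only means extending the `sites` list.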