How to combine multiple tables into one table? - SQL

Assume we have several different tables with the same format and the same number of columns. How can we combine these tables into one table?
I am taking the Google Data Analytics course and learned to use CONCAT to combine strings together. I am wondering how to combine different tables together if needed. I googled and found multiple options like JOIN and FULL JOIN, but at this point I don't yet have a sense of the best practice for these different merging/combining functions.
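When the tables really do share the same columns in the same order, the usual tool is UNION ALL (or UNION, if you also want duplicate rows removed) rather than a join, which combines columns from different tables instead of stacking rows. A minimal sketch, assuming hypothetical tables sales_2021 and sales_2022 with identical structures:

-- Stack the rows of identically-structured tables into one result set
SELECT * FROM sales_2021
UNION ALL
SELECT * FROM sales_2022;

Most databases also let you materialize the combined result, e.g. with CREATE TABLE ... AS SELECT or INSERT INTO ... SELECT, though the exact syntax varies by system.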

Related

How to retrieve identical entries from two different queries which are based on the same table

I think I've got a knot in my line of thought; surely you can untie it.
Basically I have two working queries which are based on the same table and result in an identical structure (same as the source table). They are simply two different kinds of row filters. Now I would like to "stack" these filters, meaning that I want to retrieve all the entries which appear in both query A and query B.
Why do I want that?
Our club is organized into several local groups, and I need to hand different kinds of lists (e.g. members with an email entry) to these groups. In this example I would have a query "groupA" and a query "newsletter". Another combination could be "groupB" and "activemember", but also "groupB" and "newsletter". Unfortunately each query is based on a set of conditions which, imho, would best be stored in a single query instead of being copied into several different queries (in case something changes).
Judging from Venn diagrams, I suppose I need to use INNER JOIN, but I could not get it to work, neither with the LibreOffice Base query assistant nor with SQL code. I tried this:
SELECT groupA.*
FROM groupA
INNER JOIN newsletter
ON groupA.memberID = newsletter.memberID
The error message says: Cannot be in ORDER BY clause in statement
I suppose the problem comes from the fact that both queries are based on the same table.
Maybe there is an even easier way of nesting queries?
I am hoping for something like
SELECT * FROM groupA
WHERE groupA.memberID = newsletter.memberID
Thank you and sorry if this already has a duplicate, I just could not find the right search terms.
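For reference, one generic way to express that kind of nesting in plain SQL is a subquery with IN (an EXISTS subquery works similarly), assuming both saved queries expose a memberID column. Whether LibreOffice Base accepts saved queries as sources in this form depends on the setup, so treat it as a sketch rather than a tested fix:

-- Keep only the groupA rows whose memberID also appears in the newsletter query
SELECT *
FROM groupA
WHERE memberID IN (SELECT memberID FROM newsletter);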

Merging multiple tables from multiple databases with all rows and columns

I have 30 databases from a survey application, each with a table of results containing approximately 100 columns. Most of the columns are identical, but each survey seems to have a unique column or two added with no real pattern (these are the added questions and results of the survey). As I work on the statement to join all of these tables into one large master table, the code is getting quite complex. Is there a more efficient way to merge these tables from multiple databases and just select all rows and columns, so that a column is merged if it already exists and created when a new one is encountered?
No, there isn't an automatic way to merge a bunch of similar, but not quite the same, tables into one. At least, not in any database system that I know of.
You could possibly automate something like that with a fairly simple script that relies on your database's information schema (or equivalent).
However, with only 30 tables and only a column or two different in each, I'm not sure it's worth it. A manual approach, with copying and pasting and making minor changes, would probably be faster.
Also, consider whether the "extra" columns that are unique to individual tables need to go into the combined table at all. The point of making one big table is to process/analyze all the data together; if a column only applies to a single source, that isn't possible.
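For what it's worth, a rough sketch of the information-schema idea, in SQL Server flavor and with a hypothetical results table name, would be to inventory the columns each database actually has and generate the master column list (and the per-source NULL fills) from that, rather than typing it by hand:

-- Run in each survey database (or adapt with three-part names) to list its columns
SELECT TABLE_CATALOG AS database_name,
       COLUMN_NAME,
       DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'results'
ORDER BY COLUMN_NAME;

The output can then drive a script (or even a spreadsheet) that emits the big SELECT ... UNION ALL statement, with NULL placeholders wherever a source table lacks a column.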

Best Practice - Should I make one table or two for two similar sets of data?

I need a table to store types of tests. I've been provided with two Excel spreadsheets, one for microbial tests and one for pathogens. Microbial has 5 columns and Pathogens has 10; the 5 columns appear in both tables, so one simply has 5 extra columns.
Just to give you an idea, the table columns would be something like this:
**Microbial**
Test Method IncubationStage1
**Pathogens**
Test Method IncubationStage1 IncubationStage2 Enrichment
So is it better to have one table for Microbial and one for Pathogens, or to have one table for Tests and put both within it? Is it bad to have microbial tests in a table where I know for certain only half the columns will be utilized? Or is it better to keep related items in the same table and separate them with a "Type" column?
Obviously both will work fine, but I'm wondering which is better.
The answer to these sorts of questions is always "it depends."
In my opinion, if you think you'll ever want to aggregate the data by test or by method across both pathogen and microbial types, then you should certainly put the data in the same table with an additional column that differentiates them.
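As a rough sketch of that single-table approach, using column names loosely taken from the question (types and lengths are placeholders):

-- One Tests table; TestType distinguishes microbial from pathogen tests,
-- and pathogen-only columns are simply NULL on microbial rows
CREATE TABLE Tests (
    TestID           INT PRIMARY KEY,
    TestType         VARCHAR(20) NOT NULL,   -- 'Microbial' or 'Pathogen'
    Test             VARCHAR(100),
    Method           VARCHAR(100),
    IncubationStage1 VARCHAR(100),
    IncubationStage2 VARCHAR(100),           -- NULL for microbial tests
    Enrichment       VARCHAR(100)            -- NULL for microbial tests
);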
You could also potentially "normalize" your tables further, something like this:
Table1: ExperimentID_PK, ExperimentTypeID_FK, Test, Method
Table2: MeasurementRecordID_PK, ExperimentID_FK, Timestamp, other metadata about the record
Table3: MeasurementID_PK, MeasurementTypeID_FK, MeasurementValue, MeasurementRecordID_FK
Table4: MeasurementTypeID_PK, metadata about measurement types
Table5: ExperimentTypeID_PK, metadata about experiment types
... where all the leaf data elements point back to their parent data elements through foreign keys, and then you'd join data together in SQL statements, with indexes applied for optimal performance based on the types of queries you wanted to make. Obviously one of your rows in the question would end up appearing as multiple rows across multiple tables in this schema, and only at query time could they conceivably be reunited into individual rows (e.g. bound by MeasurementRecordID).
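To make that concrete, a query along these lines (using the hypothetical table and key names above) could reunite one logical row at query time:

-- Rebuild one "row" per measurement by joining the leaf tables back to their parents
SELECT e.ExperimentID_PK,
       e.Test,
       e.Method,
       r.Timestamp,
       m.MeasurementValue,
       mt.MeasurementTypeID_PK
FROM Table1 AS e
JOIN Table2 AS r  ON r.ExperimentID_FK        = e.ExperimentID_PK
JOIN Table3 AS m  ON m.MeasurementRecordID_FK = r.MeasurementRecordID_PK
JOIN Table4 AS mt ON mt.MeasurementTypeID_PK  = m.MeasurementTypeID_FK;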
But there are other patterns too; in NoSQL land, normalization can be the enemy. Slicing and dicing data sets turns out to be easier in some domains if the data is stored in a more bloated format that makes query structures more obvious. So it really comes down to thinking through your use cases.

Efficient way to query similarly-named tables with identical column names

I'm building a report in SSRS which takes data from several tables with similar names. There are three different 'sets' of tables, i.e. 123xxx, 456xxx, and 789xxx. Within these groups, the only difference in the table names is a three-digit code for a worksite, so, for example, we might have tables called 123001, 123010, and 123011. Within each set of tables, the columns have the same names.
The problem is that there are about 15 different sites, and I'm taking several columns from each site and each group of tables. Is there a more efficient way to write that query than to write out the name of every single column?
I don't believe there is, but I feel like using aliases on your tables would make your query much easier to understand and follow as you build it.
Also, if you aren't comparing values across the tables at all, then a UNION between the individual table selects might make sense too.
I would give each table an alias.
SELECT s1t1.name
FROM Site1Table1 as s1t1;
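Building on that, if the per-site tables really do share column names, a union with a literal column identifying the site keeps the query repetitive but straightforward. The column names below are placeholders, and the square brackets are T-SQL quoting for table names that start with a digit:

-- Combine the same columns from each site table, tagging each row with its site code
SELECT '123001' AS site_code, col_a, col_b FROM [123001]
UNION ALL
SELECT '123010' AS site_code, col_a, col_b FROM [123010]
UNION ALL
SELECT '123011' AS site_code, col_a, col_b FROM [123011];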

Django/SQL: what's most efficient, more columns or more tables?

I have a MySQL database that I'm using with Django. One of my tables has around 60 columns. I'm wondering whether to split it into 5-6 smaller tables. This would make logical sense, since the columns split nicely into 5-6 logical groups.
The downside would be that some of the Django pages would then require 5-6 row queries instead of 1.
Is it more efficient to have one table with many columns, or many tables with fewer columns? If the former, how much of a disadvantage is it to have many tables? (as far as one can quantify such things...)
Thanks for your advice :)
Use good old Occam's razor: don't make unnecessary moves. Are you okay with your one table? If so, leave it as is. Don't create a problem for yourself out of nowhere.
You are also wrong about the 6 queries; it will still be a single query. But see the first point.
If it's logical to split them into multiple models, then split them; don't keep them in a single model just for the sake of efficiency.
The performance/effectiveness of retrieving data really depends on how you structure your queries. There's no point loading all of those columns into memory when you will only be using 5-10 fields; you can select just what you want using .values().
Also, when you split into multiple models and use foreign keys to relate them, select_related lets you retrieve the same information with fewer queries, sometimes even a single one.
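Under the hood, select_related simply turns the lookup into a join, so splitting the table does not have to mean extra round trips. Roughly, in SQL terms (the table and column names here are illustrative, not from the question):

-- One query instead of several: the split-off tables are joined back onto the main one
SELECT p.id, p.title, d.long_description, s.view_count
FROM page AS p
JOIN page_details AS d ON d.page_id = p.id
JOIN page_stats   AS s ON s.page_id = p.id
WHERE p.id = 42;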
If we could see the model, maybe we could give better opinions.