DB Design question: Tree (one table) vs. Two tables for tweets and retweets?

I've heard that on Stack Overflow, questions and answers are stored in the same DB table.
If you were to build a Twitter-like service that allowed only one level of commenting, i.e. one tweet and then comments/replies to that tweet, but no replies to replies, would you use two tables for tweets and retweets, or just one table where the field parent_tweet_id is optional?
I know this is an open question, but what are some advantages of either solution?

Retweets are still normal tweets as well, so use one table. You wouldn't want to have to load from two tables just to include the retweets.

Advantages of one table:
- You can search through all tweets and comments in a simple way.
- You can use one identity column easily for all posts.
- Every post has the same set of columns.
Advantages of two tables:
- If it's more common to search or display only top-level tweets rather than tweets plus comments, the table of tweets stays that much smaller without the comments.
- Two tables can have different sets of columns, so if some columns are meaningful for one type of post but not the other, you can put them in the respective table without having to leave them null when not applicable.
- Indexes can also differ between the two tables, so if you need to search comments in different ways, you can create indexes specialized to that task.
In short, it depends on how you use the data, not only how it's structured. You haven't said much about the operations you need to do with the data.
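If you do go with one table, a minimal sketch might look like this (PostgreSQL-flavored syntax; all table and column names here are illustrative, not a prescribed schema):

CREATE TABLE tweets (
    tweet_id        BIGSERIAL PRIMARY KEY,                -- one identity column for all posts
    parent_tweet_id BIGINT REFERENCES tweets (tweet_id),  -- NULL for top-level tweets
    author_id       BIGINT NOT NULL,
    body            VARCHAR(280) NOT NULL,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- If displaying only top-level tweets is the common case, a partial index
-- keeps that query fast without needing a second table:
CREATE INDEX tweets_top_level_idx ON tweets (created_at)
    WHERE parent_tweet_id IS NULL;

Note that the self-referencing foreign key only guarantees that a reply points at a real tweet; enforcing the single level of nesting (no replies to replies) would still need a trigger or application logic.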

Like all design questions, it depends.
I don't normally like to mix concepts in a single table. I find it can quickly damage the conceptual integrity of the database schema. For example, I would not put posts and replies in the same table because they are different entities.

Related

Is it better to create different tables for different stores, or have one table with a column to designate which site?

I am developing an MS SQL database for stores that are in different cities. Is it better to have a table for each city, or to house them all in one table? I also don't want users from different cities accessing data from cities that are not theirs.
SQL is designed to handle large tables, really big tables. It is not designed to handle a zillion little tables. The clear answer to your question is that all examples of a particular entity should go in a single table. There are lots of good reasons for this:
- You want to be able to write a query that returns data about any city or all cities. This is easy if the data is in one table, hard if it is spread across multiple tables.
- You want to optimize your queries by choosing correct indexes and data types, collecting statistics, defragging indexes, and so on. Why multiply the work by multiplying the number of tables?
- Foreign key relationships should be properly declared, and you cannot do that if the foreign key could point to multiple tables.
- Lots of small tables result in lots of partially filled data pages, which just makes the database bigger and slows it down.
I could go on. But you probably get the idea by now that one table per entity is the right way to go (at least under most circumstances).
Your issue of limiting users to seeing data for only one city can be handled in a variety of ways. Probably the most common is simply to use views.
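For instance, a sketch of the view approach in MS SQL (the table, view, and role names here are made up for illustration):

CREATE TABLE stores (
    store_id INT PRIMARY KEY,
    city     VARCHAR(50) NOT NULL,
    name     VARCHAR(100) NOT NULL
);

-- One view per city exposes only that city's rows:
CREATE VIEW chicago_stores AS
    SELECT store_id, name FROM stores WHERE city = 'Chicago';

-- Users for that city get access to the view, not the base table:
GRANT SELECT ON chicago_stores TO chicago_users;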

In PostgreSQL, efficiently using a table for every row in another table

I am sorry for the lack of notation in my question, but I am not too familiar with SQL. Despite searching the internet for a decent number of hours, I couldn't find how to do what I wanted efficiently, but that may be because I am not familiar with the notation. Here comes the question:
I want to create a table, say Forms, in which each Form row has an ID, some metadata, and a pointer(?) to the table for that Form row, let's say the Form12 table. I need it because every Form has a different number, name, and type of columns, depending on the user's configuration for a particular Form.
So I thought I could put the table ID of Form12 as a column in the Forms table. Is this approach considered OK, or is there a better way to do it?
Thank you for your time.
Storing the names of tables in a column is generally not a good solution in a relational database. In order to use the information, you need to use dynamic SQL.
I would instead ask why you cannot store the information in a single table or a well-defined set of tables. Postgres has lots of options to help with this:
- NULL data values, so columns do not need to be filled in.
- Table inheritance, so tables can share columns.
- JSON columns, to support a flexible set of columns (see the sketch after this list).
- Entity-attribute-value (EAV) data models, which allow for lots of flexibility.
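A sketch of the JSON-column option, which often fits this use case well (all names here are illustrative):

CREATE TABLE forms (
    form_id   SERIAL PRIMARY KEY,
    form_name TEXT NOT NULL,
    fields    JSONB NOT NULL DEFAULT '{}'   -- per-form, user-configured columns live here
);

INSERT INTO forms (form_name, fields)
VALUES ('Form12', '{"age": 42, "favorite_color": "green"}');

-- A GIN index lets you query the flexible fields efficiently:
CREATE INDEX forms_fields_idx ON forms USING GIN (fields);

SELECT form_id
FROM forms
WHERE fields @> '{"favorite_color": "green"}';

This keeps all forms in one table, so the metadata and the per-form fields stay together, at the cost of losing per-column types and constraints on the flexible part.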

Building a MySQL database that can take an infinite number of fields

I am building a MySQL-driven website that will analyze customer surveys distributed by a variety of clients. Generally, these surveys are structured fairly consistently, and most of our clients' data can be reduced to the same normalized database structure.
However, every client inevitably ends up including highly specific demographic questions for their customers that are irrelevant to every other one of our clients. For instance, although all of our clients will ask about customer satisfaction, only our auto clients will ask whether the customers know how to drive manual transmissions.
Up to now, I have been adding columns to a respondents table for all general demographic information, with a lot of default NULLs mixed in. However, as we add more clients, it's clear that this will end up with a massive number of columns which are almost always null.
Is there a way to do this consistently? I would rather keep as much of the standardized data as possible in the respondents table, since our import script is already written for that table. One thought of mine is to build a respondent_supplemental_demographic_info table that has the columns response_id, demographic_field, demographic_value (so the manual transmission example might become: 'ID999', 'can_drive_manual_indicator', true). This could hold an infinite number of demographic_fields, but would be incredibly painful to work with from both a processing and a programming perspective. Any ideas?
Your solution to this problem is called entity-attribute-value (EAV). This "unpivots" columns so they become rows in a table, and then you tie them back together into a single view.
EAV structures are a bit tricky to learn to deal with. They require many more joins or aggregations to get a single view out. Also, the types of the values become challenging: generally there is one value column, so everything is stored as a string. You can, of course, add a type column to distinguish different types.
They also take up more space, because the entity id (the response_id in your case, I think) is repeated on each row.
Although not ideal in all situations, they are appropriate in a situation such as you describe, where you are adding attributes indefinitely. With one column per attribute, you would quickly run over the maximum number of columns allowed in a single table (typically between 1,000 and 4,000, depending on the database). EAV also lets you keep track of each value separately; if values are added at different times, for instance, you can keep a timestamp on when each one goes in.
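Using the names from the question, a minimal EAV sketch in MySQL might look like this (column sizes are arbitrary):

CREATE TABLE respondent_supplemental_demographic_info (
    response_id       VARCHAR(20)  NOT NULL,
    demographic_field VARCHAR(64)  NOT NULL,
    demographic_value VARCHAR(255) NOT NULL,   -- one string column; types are by convention
    PRIMARY KEY (response_id, demographic_field)
);

INSERT INTO respondent_supplemental_demographic_info
VALUES ('ID999', 'can_drive_manual_indicator', 'true');

-- Pivoting attributes back into one row per respondent takes an
-- aggregation per attribute:
SELECT response_id,
       MAX(CASE WHEN demographic_field = 'can_drive_manual_indicator'
                THEN demographic_value END) AS can_drive_manual
FROM respondent_supplemental_demographic_info
GROUP BY response_id;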
Another alternative is to maintain a separate table for each client, and then use some other process to combine the data into a common data structure.
Do not fall for a table with key-value pairs (field id, field value); that is inefficient.
In your case I would create a table per customer, plus metadata tables (in a separate DB) describing those tables. With that metadata you can generate SQL and so on. That is definitely superior to having many null columns, or to copied and adapted scripts. It requires a bit of programming: an application uses the metadata to generate SQL, collect the data (without customer-specific semantic knowledge), and generate reports.

Three-dimensional database table

We have all been there. Consider the following example: first, the client says "every user shall only have one profile picture", so we add a field for that to the users table; half a year later, requirements change, and a user actually needs to have n profile pictures.
Now, this seems only possible if you add a new table, such as user_pictures, to handle the new cardinality 1:n instead of 1:1. Oftentimes this can get very complicated. Whenever I come across this problem, I wonder why we don't use all three dimensions we can think in. A two-dimensional table is limited in a way that makes it somewhat incomplete: what if, referring to the profile picture problem again, the picture field in the users table had a depth, and that depth made the field an array that perfectly represented both cardinalities 1:1 and 1:n at the same time?
Table fields would simply become arrays and automatically support both cardinalities; wouldn't that be something? At least I would use it. Is there something like it out there already?
Oracle has support for arrays as well as nested tables; either seems to fit your requirements. These days, though, people prefer to model everything as tables and relationships to keep things simple and consistent, so modern RDBMSes don't generally support this kind of feature, and I don't believe it ever made it into standard SQL either.
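For illustration, a sketch of the Oracle VARRAY approach (the type and table names are made up; nested tables work similarly):

CREATE TYPE picture_list AS VARRAY(10) OF VARCHAR2(255);

CREATE TABLE users (
    user_id  NUMBER PRIMARY KEY,
    pictures picture_list   -- zero to ten picture URLs stored in one column
);

INSERT INTO users VALUES (1, picture_list('a.jpg', 'b.jpg'));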
The standard many-to-many approach, many users to many profile pictures, is easily covered by the three table approach:
Table: Users
Table: Pictures
Table: User_Pictures
However, if you move to a NoSQL approach, you can store a User document (usually in JSON format), that stores an array of profile pictures for that user in a single table.
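For comparison, a sketch of the three-table approach above in plain SQL (column names are illustrative):

CREATE TABLE users (
    user_id INT PRIMARY KEY,
    name    VARCHAR(100) NOT NULL
);

CREATE TABLE pictures (
    picture_id INT PRIMARY KEY,
    url        VARCHAR(255) NOT NULL
);

CREATE TABLE user_pictures (
    user_id    INT NOT NULL REFERENCES users (user_id),
    picture_id INT NOT NULL REFERENCES pictures (picture_id),
    PRIMARY KEY (user_id, picture_id)   -- one row per user/picture pair
);

-- "Find the users who have this picture" becomes an ordinary indexed join:
SELECT u.*
FROM users u
JOIN user_pictures up ON up.user_id = u.user_id
WHERE up.picture_id = 42;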
@gordy +1 for the Oracle link. I wasn't sure if any RDBMS supported arrays.
You are describing a denormalization technique (multiple columns for instances of one field) and it usually leads to tears unless you thoroughly understand the consequences of violating basic relational principles.
A classic difficulty comes when you want to query on the field ("find the user who has this picture") and you discover that an SQL statement with "AND picture IN (pic1, pic2, pic3)" can't be indexed and your optimizer starts planning its revenge.

Which is faster, a MySQL database with one table or multiple tables?

On my website you can search 'ads' or 'classifieds'. There are different categories.
Would the searches be faster with multiple tables, one for each category, or would it not matter?
We are talking about around 500 thousand ads.
If it won't slow down the search, please explain why, because it seems like common sense that the more ads you have, the slower the search!
Thanks
Your question is a little unclear. I'm assuming this scenario:
table: ads
id  category  ad_text
--  --------  -----------
1   pets      sample text
2   family    sample ad
If you are making one search of ads, then searching multiple tables on each search is slower than searching one table.
HOWEVER, if you're proposing to break "ads" into multiple tables according to "category", leaving you with table names like
pets-ads
family-ads
programmer-ads
and, programmatically, you know you're looking for programmer ads so you can just search the programmer-ads table, then breaking them out is faster. Barely.
Breaking them out, though, has many drawbacks. You'll need:
- some cute code to know which table to search
- a new table each time you create a new category
- to rename a table if you decide a category name is wrong
Given the limited info we have, I would strongly advise one table with a category column, and then go ahead and normalize the category out into its own table. Slap an index on that column. Databases are built to handle tons of rows of data organized correctly, so don't worry about that so much.
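A sketch of that advice in MySQL (all names are illustrative):

CREATE TABLE categories (
    category_id INT AUTO_INCREMENT PRIMARY KEY,
    name        VARCHAR(50) NOT NULL UNIQUE
);

CREATE TABLE ads (
    id          INT AUTO_INCREMENT PRIMARY KEY,
    category_id INT NOT NULL,
    ad_text     TEXT NOT NULL,
    FOREIGN KEY (category_id) REFERENCES categories (category_id),
    INDEX idx_ads_category (category_id)   -- the index that keeps per-category searches fast
);

-- 500k rows is no problem; this touches only the matching category's rows:
SELECT a.id, a.ad_text
FROM ads a
JOIN categories c ON c.category_id = a.category_id
WHERE c.name = 'pets';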
Obviously, it will be nominally faster to search a smaller table (one category) than a larger table. The larger table is probably still the correct design, however. Creating multiple identical tables will simply make the developer's and manager's lives miserable. Furthermore, certain kinds of searches are more difficult if you segment the data (for instance, searches across two categories).
Properly indexed, the single-table approach will yield results almost as good as the segmented approach while providing the benefits of proper design.
(Of course, when you say "single table", I assume you mean a single table to hold the core attributes of the Advertisement entities. Presumably there will be other tables as well.)
It depends.
If you've built a single denormalised table containing text, it'll get progressively slower for a number of reasons. Indexes help to a certain point.
If you have a normalised structure with multiple tables, primary and foreign keys, indexes, etc., it can be more robust and scalable.
A database is very well equipped to deal with 500k ads. Add an index on the category, and you should be fine.
If you add the table definition and the distribution of categories to your question, you'd probably get a better answer :)