Many to many relationship and MySQL - sql

If I wanted to make a database with subscribers (think YouTube), my thought is to have one table containing user information such as user id, email, etc., and then another table (the subscription table) containing 2 columns: one for the user id and one for a new subscriber's user id.
So if my user id is 101 and user 312 subscribes to me, my subscription table would be updated with a new row containing 101 in column 1 and 312 in column 2.
My issue with this is that every time 101 gets a new subscriber, another row with 101 in the first column is added. That means I can't really set a primary key on the subscription table, because a user id can appear many times (once per subscriber) and a primary key requires a unique value.
Also, if there are a lot of subscriptions, won't it be very slow to find all of 101's followers? Every row would have to be checked for 101 in the first column in order to read the subscriber's user id from the second column.
Is there a more optimal solution to my problem?
Thanks!

In your case, the pairs (user_id, subscriber_id) are unique (a user can't have two subscriptions for another user, can they?). So make a compound primary key consisting of both fields if you need one.
Regarding the speed of querying your subscription table: think about the queries you'll run on the table, and add appropriate indexes. A common operation might be "give me a list of all my subscribers", which would translate to something like
SELECT subscriber_id FROM subscriptions WHERE user_id = 123;
(possibly as part of a join). If you have indexed the user_id column, this query can be run quite efficiently.
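A minimal sketch of the compound key and the lookup query follows. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the table and column names, the sample ids, and the helper functions subscribers_of and subscribe are assumptions made for illustration. In both SQLite and MySQL (InnoDB), the compound primary key is backed by an index whose leading column is user_id, so the "all my subscribers" query can use it; the extra index on subscriber_id is only needed for the reverse lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        email   TEXT NOT NULL
    );
    CREATE TABLE subscriptions (
        user_id       INTEGER NOT NULL REFERENCES users(user_id),
        subscriber_id INTEGER NOT NULL REFERENCES users(user_id),
        -- The pair is unique, so the pair is the primary key.
        PRIMARY KEY (user_id, subscriber_id)
    );
    -- Optional: supports the reverse lookup "whom does X subscribe to?"
    CREATE INDEX idx_subscriptions_subscriber
        ON subscriptions (subscriber_id);
""")

# Illustrative sample rows (hypothetical ids and emails).
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(101, "a@example.com"), (312, "b@example.com"), (400, "c@example.com")],
)
conn.executemany("INSERT INTO subscriptions VALUES (?, ?)", [(101, 312), (101, 400)])
conn.commit()


def subscribers_of(user_id):
    """Return the ids of everyone subscribed to user_id."""
    rows = conn.execute(
        "SELECT subscriber_id FROM subscriptions "
        "WHERE user_id = ? ORDER BY subscriber_id",
        (user_id,),
    )
    return [row[0] for row in rows]


def subscribe(user_id, subscriber_id):
    """Add a subscription; return False if the pair already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO subscriptions VALUES (?, ?)",
                (user_id, subscriber_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Here subscribers_of(101) returns the two subscriber ids, and calling subscribe(101, 312) a second time is rejected by the primary key rather than creating a duplicate row.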

A primary key can be made of two columns: the subscribed-to user and the subscriber in your case. And since the search will only be on integer values (no text search), it will be fast.
More information here: https://stackoverflow.com/a/2642799/1338574

Related

Generating a primary key unique across multiple databases

Operational databases of identical structure work in several countries.
country A has table Users with column user_id
country B has table Users with column user_id
country C has table Users with column user_id
When data from all three databases is brought to the staging area for data warehousing purposes, the three operational tables are integrated into a single table Users with dwh_user_id.
The logic looks like the following:
if record comes from A then dwh_user_id = 1000000 + user_id
if record comes from B then dwh_user_id = 4000000 + user_id
if record comes from C then dwh_user_id = 8000000 + user_id
I have a strong feeling that it is a very bad approach. What would be a better approach?
(user_id + country_iso_code maybe?)
In general, it's a terrible idea to inject logic into your primary key in this way. It really sets you up for failure - what if country A gets more than 4000000 user records?
There are a variety of solutions.
Ideally, you include the column "country" in all tables, and use that together with the ID as the primary key. This keeps the logic identical between master and country records.
If you're working with a legacy system, and cannot modify the country tables, but can modify the master table, add the key there, populate it during load, and use the combination of country and ID as the primary key.
The way we handle this scenario in Ajilius is to add metadata columns to the load. Values like SERVER_NAME or DATABASE_NAME might provide enough unique information to make a compound key unique.
An alternative scenario is to generate a GUID for each row at extract or load time, which would then uniquely identify each row.
The data vault guys like to use a hash across the row, but in this case it would only work if no row was ever a complete duplicate.
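As a rough sketch of the first suggestion above (country plus ID as the primary key), the following uses Python's built-in sqlite3 module in place of the real warehouse. The table name dwh_users, the country_code column, the sample rows, and the load helper are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dwh_users (
        country_code TEXT    NOT NULL,  -- populated during load
        user_id      INTEGER NOT NULL,  -- unchanged source key
        name         TEXT,
        PRIMARY KEY (country_code, user_id)
    )
""")


def load(country_code, rows):
    """Load (user_id, name) rows from one country's Users table."""
    with conn:  # one transaction per country load
        conn.executemany(
            "INSERT INTO dwh_users (country_code, user_id, name) "
            "VALUES (?, ?, ?)",
            [(country_code, user_id, name) for user_id, name in rows],
        )


# The same user_id may arrive from every country without colliding.
load("A", [(1, "Ann"), (2, "Bob")])
load("B", [(1, "Carla")])
load("C", [(1, "Dmitri")])

row_count = conn.execute("SELECT COUNT(*) FROM dwh_users").fetchone()[0]
```

Because the source user_id is stored unchanged, no offset arithmetic is needed and no country can run out of id space.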
This is why they made the Uniqueidentifier data type. See here.
If you can't change to that, I would put each one in a different table and then union them in a view. Something like:
create view vWorld
as
select 1 as CountryId, user_id
from SpainUsers
UNION ALL
select 2 as CountryId, user_id
from USUsers
The most efficient way to do this would be:
If record from Country A, then user * 0 = Hence dwh_user_id = 0.
If record from Country B, then (user * 0)- 1 = Hence dwh_user_id = -1.
If record from Country C, then (user * 0)+ 1 = Hence dwh_user_id = 1.
Suggesting this logic assuming the dwh_user_id is supposed to be a number field.

Table of users from different sources

Consider this scenario: I have a table Users, with users from the default sign-up and from the Facebook sign-up.
Table users
id | name
1 John (default)
2 Carl (default)
111 Steven (facebook)
...
111 Wayne (default)
If the id is auto-increment and unique, then when the DBMS tries to insert id 111 for a default sign-up, I will get an error from the unique constraint, because id 111 was already inserted manually. So the DBMS would need to know that id 111 exists and that the next default sign-up should get 112, not 111.
Is there any way to avoid this error? Or what is the best practice for handling similar cases?
Making two tables of users seems a bit over-complicated just to avoid this issue.
Here are two alternative approaches.
First, if a user can only sign up in two ways, you can have separate columns for each one. So, the users table would have columns such as:
DefaultDateTime
FacebookDateTime
and so on, for whatever columns you want. When a user registers in the "second" way, then you update the existing record rather than inserting a new one.
The second method is probably better. Have two tables:
Users
Signups
The Signups table would have a foreign key relationship back to the Users table (possibly with a NOT NULL constraint). Both tables would have auto-incremented integer primary keys.
Apparently, in your data model, the signups are separate from the users, so you should model them separately. Then you can have as many signups as you like for a given user.
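The two-table approach might look roughly like this. The sketch uses Python's built-in sqlite3 module rather than the asker's DBMS, and the columns source and external_id, as well as the register helper, are assumptions made for illustration. The point is that the Facebook id is stored as ordinary data, so it can never collide with the auto-incremented key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL
    );
    CREATE TABLE signups (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id     INTEGER NOT NULL REFERENCES users(id),
        source      TEXT NOT NULL,   -- assumed: 'default' or 'facebook'
        external_id TEXT,            -- assumed: id issued by the provider
        UNIQUE (source, external_id)
    );
""")


def register(name, source, external_id=None):
    """Create a user plus the signup that produced it; return the user id."""
    with conn:  # both inserts commit together or not at all
        user_id = conn.execute(
            "INSERT INTO users (name) VALUES (?)", (name,)
        ).lastrowid
        conn.execute(
            "INSERT INTO signups (user_id, source, external_id) "
            "VALUES (?, ?, ?)",
            (user_id, source, external_id),
        )
    return user_id


john = register("John", "default")
steven = register("Steven", "facebook", external_id="111")
wayne = register("Wayne", "default")
```

Steven's Facebook id 111 lives in signups.external_id, while his users.id is simply the next value in the sequence, so Wayne's later default sign-up cannot conflict with it.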
Maybe you can get the MAX id value from the database and add 1 to it for the next insert. That way you will always get a value above the current maximum.

Checklist database design. One to many relationship and storing user information

I am basically designing a web checklist. The process is as follows: User logs in, selects the "Job Name" for a list, clicks on it, goes to next page, selects "Procedure list" from a list, clicks on it, goes to next page, there he sees a checklist where he can basically add comments, and click check box on individual listings.
I know how to code most of it, but at the moment I'm trying to figure out how to set up the relationships and what extra tables to add to hold the information.
General layout I have at the moment:
Table: User_list
User_ID
Username
Table: Job_list
Job_ID
Job Name
Table: Procedure_List
Procedure_ID
Procedure Name
Job_ID
Table: Check_List
Job_ID
Checklist_ID
Description
Job_ID -> Procedure_ID -> Checklist_ID is one-to-many, but how do I add the user list in order to store all the changes made by each user?
So you can basically have one page where you see:
Job Name
Procedure
Checklist done
and all the details done by the users.
I'm assuming the relationships are Job 1:m Procedure 1:m Checklist. And a user may choose any number of jobs. The combination of Job/Procedure/Checklist is chosen by the user. For example, I assume a Job may have 10 associated procedures and the user selects 1 or more of these. Same for Procedure: A given procedure has certain checklists associated with it and the user may select any number of these checklists.
Use "join tables".
User_Job table. A user may be associated with any job or any number of jobs. The user_ID and job_ID go into the User_Job table. Make the primary key the User_ID + job_ID.
Job_Procedure table. Put the User_Job key (both columns) and procedure_id in this table. Make the primary key User_ID, Job_ID, Procedure_ID
Procedure_Checklist table. Put the Job_procedure key (all columns) & the checklist_id in this table. Make the primary key a compound using all the columns.
Primary Keys thoughts
A sequence number for a primary key for each table will limit the number of columns in the related tables. However this key has no real meaning and if you're looking at the Procedure_Checklist table, for example, you cannot tell which job & user without querying the other tables - PITA. It's also meaningless to sort such a key. And it does not prevent duplicate rows.
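The three join tables described above could be sketched as follows. This uses Python's built-in sqlite3 module as a stand-in for the asker's database and leaves out the base tables (User_list, Job_list and so on) for brevity. The checked and comment columns and the checklist_rows helper are assumptions, added only because the question mentions check boxes and comments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE user_job (
        user_id INTEGER NOT NULL,
        job_id  INTEGER NOT NULL,
        PRIMARY KEY (user_id, job_id)
    );
    CREATE TABLE job_procedure (
        user_id      INTEGER NOT NULL,
        job_id       INTEGER NOT NULL,
        procedure_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, job_id, procedure_id),
        FOREIGN KEY (user_id, job_id)
            REFERENCES user_job (user_id, job_id)
    );
    CREATE TABLE procedure_checklist (
        user_id      INTEGER NOT NULL,
        job_id       INTEGER NOT NULL,
        procedure_id INTEGER NOT NULL,
        checklist_id INTEGER NOT NULL,
        checked      INTEGER NOT NULL DEFAULT 0,  -- assumed column
        comment      TEXT,                        -- assumed column
        PRIMARY KEY (user_id, job_id, procedure_id, checklist_id),
        FOREIGN KEY (user_id, job_id, procedure_id)
            REFERENCES job_procedure (user_id, job_id, procedure_id)
    );
""")

# Hypothetical ids: user 1 picks job 10, procedure 100, checklist item 1000.
with conn:
    conn.execute("INSERT INTO user_job VALUES (1, 10)")
    conn.execute("INSERT INTO job_procedure VALUES (1, 10, 100)")
    conn.execute(
        "INSERT INTO procedure_checklist VALUES (1, 10, 100, 1000, 1, 'done')"
    )


def checklist_rows(user_id, job_id):
    """Everything one user has recorded for one job, without extra joins."""
    return conn.execute(
        "SELECT procedure_id, checklist_id, checked, comment "
        "FROM procedure_checklist "
        "WHERE user_id = ? AND job_id = ? "
        "ORDER BY procedure_id, checklist_id",
        (user_id, job_id),
    ).fetchall()
```

This illustrates the trade-off in the primary key discussion: the compound keys make each row wider, but a row in procedure_checklist already tells you which user and job it belongs to.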

Inserting data into a table with new and old data from another two tables

I have a table name Queue_info with structure as
Queue_Id number(10)
Movie_Id number(10)
User_Id Varchar2(20)
Status Varchar2(20)
Reserved_date date
I have two other tables: Movie_info, which has many columns including Movie_Id, and User_info, which has many columns including User_Id.
In the first table, Movie_Id and User_Id are foreign keys referencing Movie_info(Movie_Id) and User_info(User_Id).
My problem is that whenever I insert a row into either Movie_info or User_info, Queue_info should get a new entry for every user or for every movie.
For example:
If a new movie is inserted into Movie_info, then Queue_info should get a row for every user, with the status of that new movie set to awaiting.
Use triggers. With triggers you can update all the tables related to your table. For example, when a row is inserted into table 1, a row can be inserted into table 2 as well.
Some notes first:
I really like that you have a standardized way to name tables and fields. I would use Queue instead of Queue_info, Movie instead of Movie_info, etc., as all tables have information - don't they? - and we all know that. I'd also choose MovieId instead of Movie_Id and ReservedDate instead of Reserved_date, but that's a matter of personal taste (allergy to underscores).
What I wanted to stress is that choosing one way for naming and keeping it is very good.
What I don't like is that while your structure seems normalized, you use Varchar type for the User_id Key. Primary (and Foreign) Keys are best if they are small in size and with constant size. This mainly helps in keeping index sizes small (so more efficient) and secondly because the keys are the only values stored repeatedly in the db (so it helps keeping db size small).
Now, to your question, do you really need this? I mean, you may end up having in your database thousands of movies and users. Do you want to add a thousand rows in the Queue table whenever a new movie is inserted? Or another thousand rows when a new user is registered? Or 50 thousand rows when a new list with 50 new movies arrives (and is inserted in the db)?
With 10K movies and 2K users, you'll have a table of 20M rows. There is no problem with a table of that size, and one or more triggers will serve your need. What happens if you have 100K movies and 50K users though? A table of 5G rows. You can deal with that too, but perhaps you can keep in that table only the movies that a user is interested in (or has borrowed or has seen, whatever the purpose of the db is). And if you want a list of movies that a certain user has not yet been interested in, check for those Movie_Id values that do not exist in the table for that user, with something like this:
SELECT m.Movie_Id, m.Movie_Title
FROM Movie_info m                  -- Oracle does not accept AS before a table alias
WHERE NOT EXISTS (
    SELECT 1
    FROM Queue_info q
    WHERE q.Movie_Id = m.Movie_Id
      AND q.User_Id = :UserId      -- bind variable: the user being checked
)
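To try the anti-join idea end to end, here is a small sketch using Python's built-in sqlite3 module instead of Oracle. The sample movies, the user ids 'alice' and 'bob', the Movie_Title column, and the movies_not_queued helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie_info (
        movie_id    INTEGER PRIMARY KEY,
        movie_title TEXT NOT NULL
    );
    CREATE TABLE queue_info (
        queue_id INTEGER PRIMARY KEY,
        movie_id INTEGER NOT NULL REFERENCES movie_info(movie_id),
        user_id  TEXT NOT NULL,
        status   TEXT NOT NULL
    );
    -- Hypothetical sample data: only one (user, movie) pair is stored.
    INSERT INTO movie_info VALUES
        (1, 'Movie one'), (2, 'Movie two'), (3, 'Movie three');
    INSERT INTO queue_info (movie_id, user_id, status)
        VALUES (2, 'alice', 'reserved');
""")


def movies_not_queued(user_id):
    """Movies that have no queue_info row for this user."""
    return conn.execute(
        """
        SELECT m.movie_id, m.movie_title
        FROM movie_info m
        WHERE NOT EXISTS (
            SELECT 1
            FROM queue_info q
            WHERE q.movie_id = m.movie_id
              AND q.user_id = ?
        )
        ORDER BY m.movie_id
        """,
        (user_id,),
    ).fetchall()
```

With this design queue_info holds one row per actual reservation, and the "awaiting" list for any user is computed on demand instead of being stored for every user and movie combination.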

SQL Table Design - Identity Columns

SQL Server 2008 Database Question.
I have 2 tables, for arguments sake called Customers and Users where a single Customer can have 1 to n Users. The Customers table generates a CustomerId which is a seeded identity with a +1 increment on it. What I'm after in the Users table is a compound key comprising the CustomerId and a sequence number such that in all cases, the first user has a sequence of 1 and subsequent users are added at x+1.
So the table looks like this...
CustomerId (PK, FK)
UserId (PK)
Name
...and if, for example, Customer 485 had three users, the data would look like...
CustomerId | UserId | Name
-----------|--------|-----
485 | 1 | John
485 | 2 | Mark
485 | 3 | Luke
I appreciate that I can manually add the 1,2,3,...,n entry for UserId however I would like to get this to happen automatically on row insert in SQL, so that in the example shown I could effectively insert rows with the CustomerId and the Name with SQL Server protecting the Identity etc. Is there a way to do this through the database design itself - when I set UserId as an identity it runs 1 to infinity across all customers which isn't what I am looking for - have I got a setting wrong somewhere, or is this not an option?
Hope that makes sense - thanks for your help
I can think of no automatic way to do this without implementing a custom stored procedure that inserts the rows and increments the Id appropriately, although others with more knowledge may have a better idea.
However, this smells to me of naturalising a surrogate key - which is not always a good idea.
More info here:
http://www.agiledata.org/essays/keys.html
That's not really an option with a regular identity column, but you could set up an insert trigger to auto populate the user id though.
The naive way to do this would be to have the trigger select the max user id from the users table for the customer id on the inserted record, then add one to that. However, you'll run into concurrency problems there if more than one person is creating a user record at the same time.
A better solution would be to have a NextUserID column on the customers table. In your trigger you would:
Start a transaction.
Increment the NextUserID for the customer (locking the row).
Select the updated next user id.
Use that for the new User record.
Commit the transaction.
This should ensure that simultaneous additions of users don't result in the same user id being used more than once.
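The steps above can be sketched as follows. This is not a SQL Server trigger: it uses Python's built-in sqlite3 module and an application-level function to show the same sequence of operations, and the names next_user_id and add_user are assumptions. The counter starts at 0 and holds the last id handed out, so the first user of each customer gets 1.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so that
# BEGIN / COMMIT / ROLLBACK below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE customers (
        customer_id  INTEGER PRIMARY KEY,
        next_user_id INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE users (
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        user_id     INTEGER NOT NULL,
        name        TEXT NOT NULL,
        PRIMARY KEY (customer_id, user_id)
    );
    INSERT INTO customers (customer_id) VALUES (485), (486);
""")


def add_user(customer_id, name):
    """Insert a user with the next per-customer sequence number."""
    # BEGIN IMMEDIATE takes SQLite's write lock up front; in SQL Server the
    # UPDATE below would lock only the customer's row.
    conn.execute("BEGIN IMMEDIATE")
    try:
        cur = conn.execute(
            "UPDATE customers SET next_user_id = next_user_id + 1 "
            "WHERE customer_id = ?",
            (customer_id,),
        )
        if cur.rowcount != 1:
            raise ValueError(f"unknown customer {customer_id}")
        (user_id,) = conn.execute(
            "SELECT next_user_id FROM customers WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO users (customer_id, user_id, name) VALUES (?, ?, ?)",
            (customer_id, user_id, name),
        )
        conn.execute("COMMIT")
        return user_id
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Calling add_user(485, "John"), add_user(485, "Mark") and add_user(485, "Luke") numbers the users per customer, while a user added to customer 486 starts again from the beginning of that customer's own sequence.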
All that said, I would recommend that you just don't do it. It's more trouble than it's worth and just smells like a bad idea to begin with.
So you want a generated user_id field that increments within the confines of a customer_id.
I can't think of one database where that concept exists.
You could implement it with a trigger. But my question is: WHY?
Surrogate keys are supposed to not have any kind of meaning. Why would you try to make a key that, simultaneously, is the surrogate and implies order?
My suggestions:
Create a date_created field, defaulting to getDate(). That will allow you to know the order (time based) in which each user_id was created.
Create an ordinal field - which can be updated by a trigger, to support that order.
Hope that helps.