Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I need to design a database for something like a downloads site. I want to keep track of users and the programs each user downloaded, and also allow users to rate and comment on those programs. The things I need from this database: get the average rating for a program, get all comments for a program, and know exactly which program was downloaded by whom (I don't care how many times each program was downloaded, but I want to know, for each user, which programs he has downloaded). Maybe also count the number of comments for each program, and that's about it (it's a very small project for personal use that I want to keep simple).
I came up with these entities:
User(uid,uname etc)
Program(pid,pname)
And the following relationships:
UserDownloadedProgram(uid,pid,timestamp)
UserCommentedOnProgram(uid,pid,commentText,timestamp)
UserRatedProgram(uid,pid,rating)
Why I chose it this way: the relationships (user downloads, user comments, user rates) are many-to-many. A user downloads many programs and a program is downloaded by many users. The same goes for comments and ratings (a user comments on or rates many programs, and a program is commented on or rated by many users). The best practice, as far as I know, is to create a third table (a relationship table) that has a one-to-many relationship with each of the two entity tables.
I suppose that in this design the average rating and the comments are retrieved with join queries or something similar.
I'm a total noob in database design, but I try to adhere to best practices. Is this design more or less OK, or am I overlooking something?
I can definitely think of other possibilities: maybe a comment and/or a rating could be an entity (table) by itself, with the relationships between three entities. I'm not really sure what the benefits and drawbacks of that are. I don't really care about the comments or the ratings as such; I only want to display them where appropriate and maintain them (delete them when needed). So how do I know whether they should become entities themselves?
Any thoughts?
You would create new entities as dictated by the rules of normalization. There is no particular reason to make an additional (separate) table for comments because you already have one. Who made the comment and which program the comment applied to are full-fledged attributes of a comment. The foreign keys representing these relationships (which are many-to-one, from the perspective of the comment table) belong right where you've put them.
The tables you've proposed are in third normal form which is acceptable according to best practices. I would add that you seem to be tracking data on a transactional basis (i.e. recording events as and when they occur). That is a good practice too because you can always figure out whatever you want to based on detailed information.
Calculating the number of downloads or the number of comments is a simple matter of using SQL aggregate functions (such as COUNT and AVG) with filters on the foreign key(s) that apply to your query, e.g. WHERE pid = 1234.
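To make the aggregate queries concrete, here is a minimal sketch using Python's built-in sqlite3 module and the table names from the question. The column types, the 1-5 rating range, the one-rating-per-user primary key, the helper function names, and the sample rows are all assumptions for illustration; the SQL itself carries over to other engines with minor type changes.

```python
import sqlite3

# Tables from the question; column types and the CHECK/PRIMARY KEY constraints are assumptions.
SCHEMA = """
CREATE TABLE User (uid INTEGER PRIMARY KEY, uname TEXT NOT NULL);
CREATE TABLE Program (pid INTEGER PRIMARY KEY, pname TEXT NOT NULL);
CREATE TABLE UserDownloadedProgram (
    uid INTEGER NOT NULL REFERENCES User(uid),
    pid INTEGER NOT NULL REFERENCES Program(pid),
    timestamp TEXT NOT NULL
);
CREATE TABLE UserCommentedOnProgram (
    uid INTEGER NOT NULL REFERENCES User(uid),
    pid INTEGER NOT NULL REFERENCES Program(pid),
    commentText TEXT NOT NULL,
    timestamp TEXT NOT NULL
);
CREATE TABLE UserRatedProgram (
    uid INTEGER NOT NULL REFERENCES User(uid),
    pid INTEGER NOT NULL REFERENCES Program(pid),
    rating INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 5),
    PRIMARY KEY (uid, pid)  -- at most one rating per user per program
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO User VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    conn.executemany("INSERT INTO Program VALUES (?, ?)", [(10, "editor"), (20, "player")])
    conn.executemany(
        "INSERT INTO UserDownloadedProgram VALUES (?, ?, ?)",
        [(1, 10, "2015-01-01"), (1, 20, "2015-01-02"), (1, 10, "2015-02-01"), (2, 10, "2015-01-03")],
    )
    conn.executemany(
        "INSERT INTO UserCommentedOnProgram VALUES (?, ?, ?, ?)",
        [(1, 10, "Great", "2015-01-04"), (2, 10, "Buggy", "2015-01-05")],
    )
    conn.executemany("INSERT INTO UserRatedProgram VALUES (?, ?, ?)", [(1, 10, 5), (2, 10, 2)])
    return conn


def average_rating(conn, pid):
    # AVG over zero rows yields NULL, which sqlite3 returns as None.
    return conn.execute("SELECT AVG(rating) FROM UserRatedProgram WHERE pid = ?", (pid,)).fetchone()[0]


def comment_count(conn, pid):
    return conn.execute("SELECT COUNT(*) FROM UserCommentedOnProgram WHERE pid = ?", (pid,)).fetchone()[0]


def programs_downloaded_by(conn, uid):
    # DISTINCT: the question only cares which programs, not how many times.
    rows = conn.execute(
        """SELECT DISTINCT p.pname
           FROM UserDownloadedProgram AS d JOIN Program AS p ON p.pid = d.pid
           WHERE d.uid = ?
           ORDER BY p.pname""",
        (uid,),
    ).fetchall()
    return [name for (name,) in rows]
```

Because every row records a single event, the same tables answer related questions (for example, who downloaded program 10) just by changing the filter column.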
I would make Downloads an entity with its own id. You could have a download status, a user may download the same program multiple times, and you may need to associate a download with an order or something else.
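A minimal sketch of the Download-as-entity idea, again in Python with sqlite3. The `download_id` surrogate key is what lets the same user download the same program more than once as separate rows; the `status` values and the helper names `record_download` and `completed_downloads` are hypothetical, and the foreign keys to User and Program are omitted for brevity.

```python
import sqlite3

# Hypothetical Download entity with its own surrogate key.
SCHEMA = """
CREATE TABLE Download (
    download_id INTEGER PRIMARY KEY,
    uid INTEGER NOT NULL,
    pid INTEGER NOT NULL,
    status TEXT NOT NULL DEFAULT 'started'
        CHECK (status IN ('started', 'completed', 'failed')),
    downloaded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def record_download(conn, uid, pid, status="started"):
    """Insert one download event and return its generated download_id."""
    cur = conn.execute("INSERT INTO Download (uid, pid, status) VALUES (?, ?, ?)", (uid, pid, status))
    return cur.lastrowid


def completed_downloads(conn, uid, pid):
    """How many times this user successfully downloaded this program."""
    return conn.execute(
        "SELECT COUNT(*) FROM Download WHERE uid = ? AND pid = ? AND status = 'completed'",
        (uid, pid),
    ).fetchone()[0]
```

An unknown status is rejected by the CHECK constraint, which sqlite3 surfaces as `sqlite3.IntegrityError`.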
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I am interested in learning SQL. I spun up a Server 2019 machine and installed SQL Express on it. I am currently going through a basic SQL course on Codecademy but figured it would be a lot better to get some hands-on practice. I came up with the idea of creating a database to track my monthly bills and finances. Once I have enough data, I can use some graphing tools to present the data visually.
Has anyone done something similar before? Any suggestions as how I should design the database/tables? Keep in mind that my SQL knowledge is still very limited.
Thanks in advance!
This is a very broad question and nobody can give a definitive answer.
I agree with your approach of trying it yourself in your own situation; that's a definite thumbs-up from me (indeed, when we get new employees, this is one of the best traits we look for when reviewing them). So many times people learn so much by 'getting their hands dirty', so to speak (i.e., going out and trying it themselves).
However, I suggest starting with the course's examples to get the general concepts down; they usually choose at least decent examples.
Alternatively though, you could give it a shot. Just be prepared to be wrong and start again. But don't worry - in terms of value, having a shot and getting it wrong is worth much more than reading something and only half-understanding it.
If you are familiar with spreadsheets, I suggest the following:
Imagine how you would keep this in spreadsheets, e.g., one sheet with bills that are due and one sheet with your payments.
Each one of those sheets would represent a table in your database.
If you pay each bill with a single payment (e.g., no installments), then it would be easier to do it with one spreadsheet (e.g., listing all the bills on the left side and their payment information on the right). In that case, it may not be the best project for teaching yourself databases. On the other hand, if you do pay in installments, then this could be useful.
The big difference in approach is that in databases, the rows are not inherently sorted. Instead, you typically give each row an identifier (e.g., Bill_ID or Payment_ID). Then the tables are linked: for a given row in the Payment table, you'd also include the Bill_ID to represent which bill the payment was for.
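Here is a small sketch of the linked Bill and Payment tables using Python's built-in sqlite3 module. Column names other than Bill_ID and Payment_ID, storing money as integer cents, and the `outstanding` helper are assumptions. The LEFT JOIN keeps bills that have no payments yet, and SUM adds up the installments per bill.

```python
import sqlite3

# Bill_ID / Payment_ID come from the answer; the other columns are assumptions.
SCHEMA = """
CREATE TABLE Bill (
    Bill_ID INTEGER PRIMARY KEY,
    payee TEXT NOT NULL,
    amount_cents INTEGER NOT NULL,   -- integer cents avoid floating-point rounding
    due_date TEXT NOT NULL
);
CREATE TABLE Payment (
    Payment_ID INTEGER PRIMARY KEY,
    Bill_ID INTEGER NOT NULL REFERENCES Bill(Bill_ID),  -- which bill this payment is for
    amount_cents INTEGER NOT NULL,
    paid_on TEXT NOT NULL
);
"""


def outstanding(conn):
    """Return {payee: cents still owed}, including bills with no payments yet."""
    rows = conn.execute(
        """SELECT b.payee, b.amount_cents - COALESCE(SUM(p.amount_cents), 0)
           FROM Bill AS b LEFT JOIN Payment AS p ON p.Bill_ID = b.Bill_ID
           GROUP BY b.Bill_ID, b.payee, b.amount_cents
           ORDER BY b.Bill_ID"""
    ).fetchall()
    return dict(rows)
```

Grouping by every selected Bill column keeps the query valid on stricter engines such as SQL Server, not just SQLite.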
Update: More examples
To choose a relevant thing to try on databases, I suggest choosing things that are related to each other but separate from each other (i.e., not linked 1-to-1).
In the bills/payments example above, if you paid each bill with one payment, they didn't need to be in separate tables. However, you could try other things, e.g.:
You live in a sharehouse where people pay for various things with a 'kitty' system (e.g., on each person's payday, they put in the amount they owe). In this case you may have a Bill table (which includes how it is split up and when it was paid) and a Person_Payments table, which records when people put money into the kitty.
You have a family with kids and chores. You have a Kids table (with their name, etc), a Chores table (listing chores and how much they are worth in pocket money) and then a Kids_Chores table listing the Kid_ID, Chore_ID and date. Whenever they do a chore it goes into Kids_Chores and that is used to determine their pocket money.
You play various computer games and you want to track your win rate on them over time. You have one table for Game (with info about your user ID, etc), one for Game_Mode (which indicates, for a given game, what mode you were playing e.g., casual vs league, easy vs hard), then one for Game_Stats recording the date you played, the game and game_mode, and the number of games and number of wins.
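The chores example maps directly onto a junction table. A minimal sqlite3 sketch, assuming amounts are stored in cents and dates as ISO-8601 strings (which compare correctly as text); the column names beyond Kid_ID and Chore_ID and the `pocket_money` helper are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Kids (Kid_ID INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Chores (Chore_ID INTEGER PRIMARY KEY, chore TEXT NOT NULL, value_cents INTEGER NOT NULL);
-- One row each time a kid does a chore.
CREATE TABLE Kids_Chores (
    Kid_ID INTEGER NOT NULL REFERENCES Kids(Kid_ID),
    Chore_ID INTEGER NOT NULL REFERENCES Chores(Chore_ID),
    done_on TEXT NOT NULL  -- ISO date, e.g. '2021-03-01'
);
"""


def pocket_money(conn, start, end):
    """Return {kid name: cents earned} for chores done between start and end (inclusive)."""
    rows = conn.execute(
        """SELECT k.name, COALESCE(SUM(c.value_cents), 0)
           FROM Kids AS k
           LEFT JOIN Kids_Chores AS kc
                  ON kc.Kid_ID = k.Kid_ID AND kc.done_on BETWEEN ? AND ?
           LEFT JOIN Chores AS c ON c.Chore_ID = kc.Chore_ID
           GROUP BY k.Kid_ID, k.name
           ORDER BY k.name""",
        (start, end),
    ).fetchall()
    return dict(rows)
```

Putting the date filter in the ON clause (rather than WHERE) keeps kids who did no chores in the result with a total of 0.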
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I am wondering what the recommended type for a primary key in SQL Server is. I remember reading this article a long time ago, but now I am wondering whether it is still a wise decision to use a GUID.
One reason that got me thinking about it is that these days many sites use the ID in the URL; for instance, Course/1 would get the information about that record.
You can't really do that with a GUID, which means you would need a new unique column and use that instead. That is more work, because you have to make sure each record has a unique number.
There is never a "one solution fits all". You have to carefully design your architecture and select the best options for your scenario. Both INT and GUID types are valid options like they've always been.
You can absolutely use a GUID in a URL. In fact, in most scenarios it is better, for security reasons, to use a GUID (or another random ID) in the URL than a sequential numeric ID. If you use sequential IDs, your site visitors will be able to easily guess other users' IDs and potentially access their content. For example, if my profile URL is /Profiles/111, I can try /Profiles/112 and see whether I can access it. If my reservation URL is /Reservation/444, I can try /Reservation/441 and see what happens. I can easily guess other IDs in the system. Of course, you must have strong permissions, so I should not be able to see pages that don't belong to my account, but if there are any issues or holes in your permissions and security, a breach can happen. With GUIDs and other random IDs, there is no practical way to guess other IDs in the system, so such a breach is much more difficult.
Another issue with sequential IDs is that your users can guess how many accounts or records you have and their order in your database. If my ID is 50269, I know that you must have roughly that many records. If my ID is 4, then I know that you had very few accounts when I registered. For that reason, many developers start the first ID at some arbitrary high number like 1529 instead of 1. It doesn't solve the issue entirely, but it avoids the issues with small IDs. How important all that guessing is depends on the system, so you have to evaluate your scenario carefully.
That's on top of the benefits mentioned in the article you linked in your question. Still, an integer is better in some areas, so choose the best option for your scenario.
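A minimal Python sketch of using a random GUID as the public identifier in a URL, assuming a route shaped like /Profiles/<guid>. In CPython, `uuid.uuid4()` builds its value from `os.urandom`, so neighbouring IDs cannot be found by incrementing; `new_public_id`, `profile_url`, and `parse_profile_id` are hypothetical helper names.

```python
import uuid


def new_public_id() -> str:
    """Random (version 4) UUID to expose in URLs instead of the row number."""
    return str(uuid.uuid4())


def profile_url(public_id: str) -> str:
    return f"/Profiles/{public_id}"


def parse_profile_id(path: str):
    """Return the UUID in /Profiles/<uuid>, or None if the path is malformed."""
    prefix = "/Profiles/"
    if not path.startswith(prefix):
        return None
    try:
        return uuid.UUID(path[len(prefix):])
    except ValueError:  # e.g. a guessed sequential ID such as "112"
        return None
```

A random ID only makes guessing harder; the permission check on every request is still required, as the answer says.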
EDIT: To answer the point you raised in your comment about user-friendly URLs: in those scenarios, sequential numbers are the wrong answer. A better solution is a unique string in the URL that is linked to your numeric ID. For example, the Cars movie has this URL on IMDb:
https://www.imdb.com/title/tt0317219/
Now, compare that to the URL of the same movie on Wikipedia, Rotten Tomatoes, Plugged In, or Facebook:
https://en.wikipedia.org/wiki/Cars_(film)
https://www.rottentomatoes.com/m/cars/
https://www.pluggedin.ca/movie-reviews/cars/
https://www.facebook.com/PixarCars
Surely we can agree that those URLs are much friendlier than the one from IMDb.
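One way to get URLs like the friendlier ones above while keeping an integer key is a unique slug column stored next to the ID. A minimal sqlite3 sketch; the Movie table, the `slugify` rules, and the "-2, -3" collision suffix are assumptions, not what any of those sites actually does.

```python
import re
import sqlite3

SCHEMA = """
CREATE TABLE Movie (
    id INTEGER PRIMARY KEY,      -- internal key, used for joins
    title TEXT NOT NULL,
    slug TEXT NOT NULL UNIQUE    -- public, human-friendly URL segment
);
"""


def slugify(title: str) -> str:
    """Lowercase, turn each run of non-alphanumerics into '-', trim stray dashes."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def add_movie(conn, title):
    """Insert a movie and return its slug, appending -2, -3, ... on collisions."""
    base = slugify(title)
    slug, n = base, 2
    while conn.execute("SELECT 1 FROM Movie WHERE slug = ?", (slug,)).fetchone():
        slug, n = f"{base}-{n}", n + 1
    conn.execute("INSERT INTO Movie (title, slug) VALUES (?, ?)", (title, slug))
    return slug


def movie_id_for(conn, slug):
    """Resolve the slug from a URL such as /m/cars/ back to the integer key."""
    row = conn.execute("SELECT id FROM Movie WHERE slug = ?", (slug,)).fetchone()
    return row[0] if row else None
```

The integer stays the primary key for joins and foreign keys; only the slug is exposed in URLs.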
I've worked on small, medium, and large-scale implementations (100k+ users) with SQL and Oracle. The majority of the time, an INT primary key is used. GUIDs were more popular 10-15 years ago, but even at their height they were not as popular as INT. Unless you see a need for a GUID, I would recommend INT.
My experience has been that the only time a GUID is needed is if your data is on the move or merged with other databases. For example, say you have three sites running the same application and you merge those three systems for reporting purposes.
If your data is stationary or you are running a single instance, INT should be sufficient.
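The merge scenario can be shown in a few lines of Python: three sites that each start their integer keys at 1 collide as soon as their rows are combined, while random version 4 UUIDs do not (barring an astronomically unlikely coincidence). The helper names are hypothetical, and the integer keys stand in for what an IDENTITY or auto-increment column would hand out.

```python
import uuid


def int_keys(n):
    """Keys a fresh IDENTITY/auto-increment column would hand out: 1..n."""
    return list(range(1, n + 1))


def guid_keys(n):
    return [uuid.uuid4() for _ in range(n)]


def collisions_after_merge(*sites):
    """Count keys that already exist when the sites' rows are combined."""
    seen, clashes = set(), 0
    for keys in sites:
        for key in keys:
            if key in seen:
                clashes += 1
            seen.add(key)
    return clashes
```

With INT keys, the reporting database would need a composite key (site plus ID) or re-keying during the merge; GUIDs avoid that step.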
According to the article you mention:
GUIDs are unique across every table, every database, every server
Well... this is a great promise, but it fails to deliver. GUIDs are supposed to be unique snowflakes. However, reality is much more complicated than that, and there are numerous reasons why they end up not being unique.
One of the main reasons is related not to the UUID/GUID specification but to poor implementations of it. For example, some JavaScript implementations rank as the worst ones, using pseudo-random numbers that are quite predictable. Other implementations are much better.
So, bottom line: study the specific implementation of UUID/GUID that you are and will be using. Don't just read and trust the specification. Otherwise you may be in for a surprise when you get called at 3 am on a Saturday night by angry customers.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
I'm designing an internal website for a small company I work for. I am comfortable in my ability to do the CSS and HTML and I'm willing to learn how to do whatever else will be needed.
The company is a transportation company that services many towns throughout the day. I have a table (in Excel format currently) of cities that we service along with a correlating zip code and pickup terminal. I would like the dispatchers to be able to enter the cities they are no longer able to service into a search bar and their input would search the previously mentioned table, copy the city, zip, and pickup terminal, and write it to a new table.
The customer service team would then be able to search the newly written table by either city, zip, or pickup terminal to see which cities we no longer service and provide feedback to our customers.
My question is what is the best way to go about this without the need for paid services? My table will contain less than 1000 rows (could easily be reduced to less than 500 if that changes things) and 3 columns and the table being written off of it will have less than 200 rows and 3 columns by the end of the business day.
I've never made a website that needed a database before and I don't know what my best option is for such a small table. I've looked into XML, SQL, and even Google Spreadsheets for options but I just don't know enough about databases to make an informed decision.
1000 rows of 3 columns is not a large amount of data; you could store it in a JSON file or even a plain text file and load it into RAM. If you create a class for your data, you could use dictionaries or maps to query it.
I would not worry about a database until performance or integrity becomes a bottleneck.
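A minimal Python sketch of that in-memory approach for the transportation example in the question, assuming the Excel sheet is exported to CSV with the columns city, zip, and terminal (the sample rows are made up). A dispatcher "marks" a city, which copies its row into a second list, and customer service searches that list by any of the three fields. The function names are hypothetical.

```python
import csv
import io

# Made-up sample of the service-area sheet exported to CSV.
SAMPLE_CSV = """city,zip,terminal
Springfield,62701,North
Shelbyville,62565,North
Ogdenville,62999,South
"""


def load_service_area(text):
    """Index every row by lowercase city name."""
    return {row["city"].lower(): row for row in csv.DictReader(io.StringIO(text))}


def mark_unserviced(service_area, unserviced, city):
    """Dispatcher action: copy a city's row into the 'no longer serviced' list."""
    row = service_area.get(city.strip().lower())
    if row is not None and row not in unserviced:
        unserviced.append(dict(row))
    return row is not None


def search(unserviced, term):
    """Customer-service search on city, zip, or terminal (case-insensitive)."""
    term = term.strip().lower()
    return [r for r in unserviced if term in (r["city"].lower(), r["zip"], r["terminal"].lower())]
```

Persisting the unserviced list (for example, writing it back out with `csv.DictWriter` at the end of the day) is left out for brevity.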
Here's a site that might give you some guidance: http://www.htmlgoodies.com/primers/database/article.php/3478121
I've had a lot of success with Google Sheets for some small-scale back-end data, somewhat similar to your project. But we had an experienced developer who used Python to set it up. Also, if you want to scale your data set eventually, Google Sheets may not be the way to go. I'd look into SQL as a long-term solution.
MySQL is a good choice.
You can make your work easier by downloading XAMPP.
It is free, open-source software that contains the Apache server, the MySQL database, PHP and Perl interpreters, and other utilities.
It also has the phpMyAdmin utility, which gives you easy ways to create databases and tables without knowing much SQL. But to add the functionality you mentioned above, you will need to write the back end of your website in PHP, ASP, JSP, Python, or any other language you know, and that takes time.
You can download XAMPP from https://www.apachefriends.org/index.html; that will do it.
http://www.w3schools.com is helpful for tutorials on web development.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I'm developing a RESTful API using Node.js. To give you a little more insight into my application:
My application has surveys. A survey contains questions, which in turn have choices.
To add a question, you need to provide the id of the survey in the body of the post. To add an option, you need to provide the id of the question.
Now for the API routes. What would be better:
Option 1
/api/departments
/api/surveys
/api/questions
/api/choices
Option 2
/api/departments
/api/departments/department_id/surveys
/api/departments/department_id/surveys/survey_id/questions
/api/departments/department_id/surveys/survey_id/questions/question_id/options
The last one seems more logical because I don't need to provide the id of the parent in the body of the post.
What is best practice to use as endpoints?
I don't think there's a "best practice" between the two; rather, it's about having the interface that makes the most sense for your application. #2 makes the most sense if you're typically going to access the surveys on a per-department basis, and also makes sense in terms of accessing questions on a per-survey basis. If you wanted to eliminate the per-department part, you'd do something that's kind of a mix of the above:
/api/departments
/api/surveys
/api/surveys/survey_id/questions
/api/surveys/survey_id/questions/question_id/options
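With nested routes, the parent ID comes from the path, so the request body no longer has to carry it. The sketch below uses Python and a hand-rolled regex router purely for illustration (in Express the same idea is expressed with route parameters such as `:survey_id`, read from `req.params`); the route table and handler names are hypothetical.

```python
import re

# Hypothetical route table for the mixed layout above.
ROUTES = [
    ("GET", re.compile(r"^/api/surveys$"), "list_surveys"),
    ("POST", re.compile(r"^/api/surveys/(?P<survey_id>\d+)/questions$"), "create_question"),
    ("POST", re.compile(r"^/api/surveys/(?P<survey_id>\d+)/questions/(?P<question_id>\d+)/options$"),
     "create_option"),
]


def resolve(method, path):
    """Return (handler_name, params); parent IDs are taken from the URL, not the body."""
    for route_method, pattern, handler in ROUTES:
        match = pattern.match(path)
        if route_method == method and match:
            return handler, {k: int(v) for k, v in match.groupdict().items()}
    return None, {}
```

A POST to /api/surveys/7/questions then reaches `create_question` with `survey_id=7`, and the body only needs the question's own fields.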
If you DO want to go by per-department, I'd change #2 so that instead of /api/departments/surveys one would access /api/departments/department_id/surveys ...
But without knowing more about the application, it's difficult to know what the best answer is.
Do surveys contain anything besides questions? Do questions contain anything besides choices? I ask because if the answer to both is no, then I'd actually prefer something like this:
/api/departments/ # returns a list of departments
/api/departments/<survey-id>/ # returns a list of questions
/api/departments/<survey-id>/<question-id>/ # returns a list of choices
/api/departments/<survey-id>/<question-id>/<choice-id> # returns a list of options
or something to that effect. Basically, I like to keep the concepts of "containers" and "data" rigid. I like to think of it like a file system.
So if the concept ends in an "s", it's a container (and I'd like the route to end with a "/" to indicate that it acts like a folder, but that's a nit).
Any access to "/" results in the element at that index, which of course can be another container, similar to a directory structure in a file system. For example, if I were to lay these out in a file system, I might come up with something like this:
+ /api/departments/
|-----------------/human-resources/
                  |---------------/survey-10/
                                  |---------/choice-10
The choice depends on whether resources are owned or shared by higher-level resources, i.e. whether you want cascading delete or not. If they are owned (with cascading delete), choose option 2; if they are shared, choose option 1.
If a survey is deleted, I guess you want to delete all questions and options with it (cascading delete). This matches well with option 2, because if you delete resource /api/departments/departmentid/surveys/surveyid, you naturally also delete all subresources /api/departments/departmentid/surveys/surveyid/questions/....
On the other hand, if you want the option to share questions among multiple surveys and share surveys among multiple departments, then option 1 is better.
Of course, you can also have a mix of option 1 and option 2, if some resource types are owned and others are shared.
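At the database level, the "owned" case corresponds to foreign keys declared with ON DELETE CASCADE. A minimal sqlite3 sketch, assuming simple survey, question, and choice tables (the names and columns are hypothetical); note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` is issued on each connection.

```python
import sqlite3

SCHEMA = """
CREATE TABLE survey (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE question (
    id INTEGER PRIMARY KEY,
    survey_id INTEGER NOT NULL REFERENCES survey(id) ON DELETE CASCADE,
    body TEXT NOT NULL
);
CREATE TABLE choice (
    id INTEGER PRIMARY KEY,
    question_id INTEGER NOT NULL REFERENCES question(id) ON DELETE CASCADE,
    label TEXT NOT NULL
);
"""


def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # foreign keys are off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


def count(conn, table):
    # Table names come from this script only, never from user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

Deleting a survey row (the equivalent of DELETE on the survey resource) then removes its questions and their choices in one statement. For shared resources you would use a junction table instead and no cascade on the shared side.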
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I'm working on a fairly simple survey system right now. The database schema is going to be simple: a Survey table, in a one-to-many relation with Question table, which is in a one-to-many relation with the Answer table and with the PossibleAnswers table.
Recently the customer realised she wants the ability to show certain questions only to people who gave one particular answer to a previous question (e.g. "Do you buy cigarettes?" would be followed by "What's your favourite cigarette brand?"; there's no point in asking the second question to a non-smoker).
Now I've started to wonder what the best way would be to implement these conditional questions in my database schema. Say question A has 2 possible answers, A and B, and question B should only appear to a user if the answer was A. How should I model that?
Edit: What I'm looking for is a way to store that information about requirements in the database. The handling of the data will probably be done on the application side, as my SQL skills suck ;)
Survey Database Design
Last Update: 5/3/2015
Diagram and SQL files now available at https://github.com/durrantm/survey
If you use this answer or any element of it, please add feedback on improvements!
This is a real classic, done by thousands. They always seem 'fairly simple' to start with, but to be good it's actually pretty complex. To do this in Rails I would use the model shown in the attached diagram. I'm sure it seems way overcomplicated to some, but once you've built a few of these over the years, you realize that most of the design decisions are very classic patterns, best addressed by a dynamic, flexible data structure at the outset.
More details below:
Table details for key tables
answers
The answers table is critical as it captures the actual responses by users.
You'll notice that answers links to question_options, not questions. This is intentional.
input_types
input_types are the types of questions. Each question can only be of one type, e.g. all radio buttons, all text fields, etc. Use additional questions when there are (say) 5 radio buttons and 1 checkbox for an "include?" option, or some such combination. Label the two questions as one in the user's view, but internally have two questions: one for the radio buttons and one for the checkbox. The checkbox will have a group of 1 in this case.
option_groups
option_groups and option_choices let you build 'common' groups.
For example, in a real estate application there might be the question 'How old is the property?'.
The answers might be desired in the ranges:
1-5
6-10
10-25
25-100
100+
Then, for example, if there is a question about the adjoining property age, then the survey will want to 'reuse' the above ranges, so that same option_group and options get used.
units_of_measure
units_of_measure is as it sounds. Whether it's inches, cups, pixels, bricks or whatever, you can define it once here.
FYI: Although generic in nature, one can create an application on top of this, and this schema is well-suited to the Ruby On Rails framework with conventions such as "id" for the primary key for each table. Also the relationships are all simple one_to_many's with no many_to_many or has_many throughs needed. I would probably add has_many :throughs and/or :delegates though to get things like survey_name from an individual answer easily without.multiple.chaining.
You could also think about complex rules, and have a string based condition field in your Questions table, accepting/parsing any of these:
A(1)=3
( (A(1)=3) and (A(2)=4) )
A(3)>2
(A(3)=1) and (A(17)!=2) and C(1)
Where A(x)=y means "Answer of question x is y" and C(x) means the condition of question x (default is true)...
The questions have an order field, and you would go through them one-by one, skipping questions where the condition is FALSE.
This should allow surveys of any complexity you want. Your GUI could automatically create these in a "Simple mode" and allow for an "Advanced mode" where a user can enter the equations directly.
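A minimal sketch of an evaluator for the condition syntax above, written as a small recursive-descent parser in Python rather than with `eval`, so user-entered conditions cannot run arbitrary code. Assumptions: answers are integers keyed by question number; the operators are =, !=, <, >, <=, >=; `and` binds tighter than `or`; an unanswered question makes any comparison on it false; and C(x) is looked up among the already-evaluated conditions of earlier questions, defaulting to true. `evaluate` and `visible_questions` are hypothetical names.

```python
import operator
import re

# Tokens: A(n), C(n), comparison operators, parentheses, and/or, integers.
TOKEN = re.compile(r"\s*(A\(\d+\)|C\(\d+\)|!=|>=|<=|=|>|<|\(|\)|and\b|or\b|\d+)")
OPS = {"=": operator.eq, "!=": operator.ne, ">": operator.gt,
       "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def tokenize(text):
    text, tokens, pos = text.strip(), [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate(text, answers, conditions):
    """answers: {question_no: int}; conditions: {question_no: bool}, default True."""
    tokens, i = tokenize(text), 0

    def peek():
        return tokens[i] if i < len(tokens) else None

    def take():
        nonlocal i
        if i >= len(tokens):
            raise ValueError("unexpected end of condition")
        i += 1
        return tokens[i - 1]

    def expr():  # expr := term ('or' term)*
        value = term()
        while peek() == "or":
            take()
            rhs = term()  # always parse the right side so no tokens are skipped
            value = value or rhs
        return value

    def term():  # term := factor ('and' factor)*
        value = factor()
        while peek() == "and":
            take()
            rhs = factor()
            value = value and rhs
        return value

    def factor():  # factor := '(' expr ')' | C(n) | A(n) op integer
        tok = take()
        if tok == "(":
            value = expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if tok.startswith("C("):
            return conditions.get(int(tok[2:-1]), True)
        if tok.startswith("A("):
            op, rhs = take(), take()
            if op not in OPS or not rhs.isdigit():
                raise ValueError(f"bad comparison after {tok}")
            answer = answers.get(int(tok[2:-1]))
            return answer is not None and OPS[op](answer, int(rhs))
        raise ValueError(f"unexpected token {tok!r}")

    result = expr()
    if i != len(tokens):
        raise ValueError("unexpected trailing input")
    return result


def visible_questions(questions, answers):
    """questions: ordered (number, condition or None); returns the numbers to show."""
    conditions, shown = {}, []
    for number, cond in questions:
        conditions[number] = evaluate(cond, answers, conditions) if cond else True
        if conditions[number]:
            shown.append(number)
    return shown
```

In a live survey you would call `evaluate` just before posing each question, using the answers collected so far; `visible_questions` only shows the ordering logic in one pass.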
One way is to add a 'question requirements' table with these fields:
question_id (link to the "which brand?" question)
required_question_id (link to the "do you smoke?" question)
required_answer_id (link to the "yes" answer)
In the application you check this table before you pose a certain question.
With a separate table, it's easy to add required answers (adding another row for the "sometimes" answer, etc.).
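A minimal sketch of the requirements table and the check the application would run before posing a question, using Python's sqlite3 module. It assumes the user's answers so far are held as a `{question_id: answer_id}` dict. Rows for the same required question are treated as alternatives ("yes" or "sometimes"), and different required questions must all be satisfied; that combination rule is one reasonable reading, not the only one. `should_ask` is a hypothetical name.

```python
import sqlite3

SCHEMA = """
CREATE TABLE question_requirements (
    question_id INTEGER NOT NULL,           -- e.g. "Which brand?"
    required_question_id INTEGER NOT NULL,  -- e.g. "Do you smoke?"
    required_answer_id INTEGER NOT NULL,    -- e.g. the "yes" answer
    PRIMARY KEY (question_id, required_question_id, required_answer_id)
);
"""


def should_ask(conn, question_id, given_answers):
    """Return True if the question's requirements are met by given_answers.

    No requirement rows means the question is always asked. Rows that share a
    required_question_id are alternatives (OR); different required questions
    must all be satisfied (AND).
    """
    rows = conn.execute(
        "SELECT required_question_id, required_answer_id"
        " FROM question_requirements WHERE question_id = ?",
        (question_id,),
    ).fetchall()
    allowed = {}
    for req_q, req_a in rows:
        allowed.setdefault(req_q, set()).add(req_a)
    return all(given_answers.get(req_q) in ok for req_q, ok in allowed.items())
```

For example, with question 1 "Do you smoke?" (answers 11 = yes, 12 = no, 13 = sometimes) and question 2 "Which brand?", the rows (2, 1, 11) and (2, 1, 13) make question 2 appear for "yes" or "sometimes" only.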
Personally, in this case, I would use the structure you described and use the database as a dumb storage mechanism. I'm a fan of putting these complex and dependent constraints into the application layer.
I think the only way to enforce these constraints in the database, without building new tables for every question with foreign keys to the others, is to use T-SQL or other vendor-specific mechanisms to build triggers that enforce them.
At the application level you have many more possibilities, and it is easier to port, so I would prefer that option.
I hope this will help you in finding a strategy for your app.