SharePoint: Using multiple content types in a list. Pros and cons - sharepoint-2010

I am a newbie in SharePoint development.
I have a hierarchical structure like an internet forum:
Forum
Post
Comment
For each of these entities I created a content type.
I see that SharePoint allows a list to store different content types, so I could store all forums with their posts and comments in a single list (Forum and Post would be 'Folder', Comment would be 'Item').
On the other hand, I could create a separate list for each content type:
Forums List, Posts List, Comments List, and link them in some way.
Can anybody outline the pros and cons of both solutions? I have about 2 weeks of experience with SharePoint and can't pick the best way.
P.S. Sorry for my English.

The short answer is: it depends.
First, they need to logically fit together. A user should expect items of these various types to be grouped together (or at least wouldn't be surprised that they have been grouped together). And in terms of design, they should have some common intersection of list type and fields. Combining Documents, Discussions, and Events into a single list wouldn't be a good idea. Likewise, I'm not sure Posts and Comments (as you mention above) would be a good fit for a single list. They just don't logically fit and their schemas probably do not have enough in common.
Once that has been determined, I would put multiple Content Types in the same list if they are meant to be used together. Will you want to show all of these items, regardless of Content Type, together in a view? Do all of these items share the same workflows, policies, permissions, etc? If the answer is no for any of these, then split the Content Types into different lists.
As I said, it depends. I'm not sure there really is a hard-and-fast rule for this. I see it a little like database normalization. We know the forms and the options. Depending on the project, sometimes we normalize a little more and sometimes we denormalize a little more, but we almost never (I hope) have one monster table that contains every type of row in the database.

Related

Embeddable vs one to many

I have seen an article on DZone regarding Post and Post Details (two different entities) and the relations between them. There the post and its details are in different tables. But as I see it, Post Detail is an embeddable part because it cannot be used without the "parent" Post. So what is the logic of separating it into another table?
Could you please give me a clearer explanation of when to use which one?
Embeddable classes represent the state of their parent classes. So to take your example, a StackOverflow POST has an ID which is invariant and used in an unbreakable URL for sharing e.g. http://stackoverflow.com/q/44017535/146325. There are a series of other attributes (state, votes, etc) which are scalar properties. When the post gets edited we have various versions of the text (which are kept and visible to people with sufficient rep). Those are your POST DETAILS.
"what is the logic to separate it in another table?"
Because keeping different things in separate tables is what relational databases do. The standard way of representing this data model is a parent table POST and child table POST_DETAIL with a defined relationship enforced through a foreign key.
Embeddable is a concept from object-oriented programming. Oracle does support object-relational constructs in the database. So it would be possible to define a POST_DETAIL Type and create a POST Table which has a column declared as a nested table of that Type. However, that would be a bad design for two reasons:
The SQL for working with nested tables is clunky. For instance, to get the POST and the latest version of its text would require unnesting the collection of details every time we need to display it. Computationally not much different from joining to a child table and filtering on latest version flag, but harder to optimise.
Children can have children themselves. In the case of Posts, Tags are details because they can vary due to editing. But if you embed TAG in POST_DETAIL embedded in POST how easy would it be to find all the Posts with an [oracle] tag?
This is the difference between Object-Oriented design and relational design.
OO is strongly hierarchical: everything belongs to something and the way to get the detail is through the parent. This approach works well when dealing with single instances of things, and so is appropriate for UI design.
Relational prioritises commonality: everything of the same type is grouped together with links to other things. This approach is suited for dealing with sets of things, and so is appropriate for data management tasks (do you want to find all the employees who work in BERLIN or whose job is ENGINEER or who are managed by ELLIOTT?)
"give me a more clear explanation when to use which one"
Always store the data relationally in separate tables. Build APIs using OO patterns when it makes sense to do so.
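To make the parent/child layout concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Oracle. The table and column names (post, post_detail, post_tag, version, body) are my own assumptions for illustration, and "latest version" is taken to mean the highest version number instead of a flag column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
    CREATE TABLE post (
        id    INTEGER PRIMARY KEY,
        votes INTEGER NOT NULL DEFAULT 0
    );
    -- One row per edit of a post; the foreign key enforces the parent link.
    CREATE TABLE post_detail (
        post_id INTEGER NOT NULL REFERENCES post(id),
        version INTEGER NOT NULL,
        body    TEXT    NOT NULL,
        PRIMARY KEY (post_id, version)
    );
    -- Tags belong to a specific version: children of a child.
    CREATE TABLE post_tag (
        post_id INTEGER NOT NULL,
        version INTEGER NOT NULL,
        tag     TEXT    NOT NULL,
        PRIMARY KEY (post_id, version, tag),
        FOREIGN KEY (post_id, version)
            REFERENCES post_detail(post_id, version)
    );
""")
conn.executemany("INSERT INTO post (id, votes) VALUES (?, ?)", [(1, 5), (2, 0)])
conn.executemany(
    "INSERT INTO post_detail VALUES (?, ?, ?)",
    [(1, 1, "first draft"), (1, 2, "edited text"), (2, 1, "another post")],
)
conn.executemany(
    "INSERT INTO post_tag VALUES (?, ?, ?)",
    [(1, 1, "sql"), (1, 2, "oracle"), (2, 1, "java")],
)

def latest_text(post_id):
    """Join parent to child and keep only the newest version's text."""
    row = conn.execute(
        """SELECT d.body
           FROM post p JOIN post_detail d ON d.post_id = p.id
           WHERE p.id = ?
           ORDER BY d.version DESC LIMIT 1""",
        (post_id,),
    ).fetchone()
    return row[0] if row else None

def posts_with_tag(tag):
    """Set-based query that never has to walk through a parent object."""
    rows = conn.execute(
        "SELECT DISTINCT post_id FROM post_tag WHERE tag = ? ORDER BY post_id",
        (tag,),
    )
    return [r[0] for r in rows]
```

With this layout, finding every post that ever carried a given tag is a single query on post_tag, which is the kind of question that gets awkward once tags are nested inside details nested inside posts.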

Best solution for relational db design with different types of content

Edit: title is pretty broad. It would be more accurate as "Best solution for relational db design where users can post content, and the available fields would vary depending on the type of content".
I'm working on a website where users can post different types of content. All posts have text and a feed_id value relating it to another (many posts to one feed). Based on the idea that some posts will only contain text, what would be the best solution?
Ideas I've thought of so far:
Add a table for each special type that refers to a post, leaving unreferenced posts to be text posts.
Problem: How would text-only posts be queried?
Add a table for each type including text posts.
Problem: Doesn't seem very efficient. For every post found in text_posts, the post would again have to be found in posts to get the text field. Then again, this was already the case with special types like pictures. Is there a proper way to accomplish this with JOIN?
Add a foreign key in each post entry to each special type, which can be null.
Problem: Lots of null fields, and rules have to be maintained by the application.
Add a field representing the type of post.
Cons: Rules have to be maintained by the application (case sensitivity, posts should only have a valid type), harder to query.
Pro: This field could also represent different views for similar content (i.e. I could make a value for post_type called "blog_entry" which is exactly the same as a text post, but have the application show the content differently). This pro has a bit of a code smell to it though...
Also, if it makes any difference, my application is written in PHP.
Edit: It seems that my first solution, "Add a table for each special type that refers to a post, leaving unreferenced posts to be text posts", would work fine if I queried all posts like this:
SELECT posts.title, posts.text, picture_posts.id, picture_posts.src FROM posts
LEFT JOIN picture_posts
ON picture_posts.post_id=posts.id
ORDER BY posts.date DESC
and then some pseudo-code for the application would look like this:
print(posts.title, posts.text);
if (picture_posts.id is not null) {
showPicture(picture_posts.src);
}
Should I use this design?
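For reference, here is a runnable sketch of the query and pseudo-code above. It uses Python's built-in sqlite3 module purely for illustration (the real application is PHP), and the column types, the sample rows, and the function names render_feed and text_only_titles are assumptions. The second function also covers the "how would text-only posts be queried?" problem from the first idea: a LEFT JOIN filtered on a NULL child row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        text  TEXT NOT NULL,
        date  TEXT NOT NULL
    );
    -- A picture post is a regular post plus one extra row here.
    CREATE TABLE picture_posts (
        id      INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL UNIQUE REFERENCES posts(id),
        src     TEXT NOT NULL
    );
    INSERT INTO posts VALUES (1, 'Hello', 'plain text post', '2020-01-01');
    INSERT INTO posts VALUES (2, 'Photo', 'post with a picture', '2020-01-02');
    INSERT INTO picture_posts VALUES (10, 2, 'cat.png');
""")

def render_feed():
    """Fetch all posts, newest first, and render the picture when there is one."""
    rows = conn.execute("""
        SELECT posts.title, posts.text, picture_posts.id, picture_posts.src
        FROM posts
        LEFT JOIN picture_posts ON picture_posts.post_id = posts.id
        ORDER BY posts.date DESC
    """).fetchall()
    lines = []
    for title, text, picture_id, src in rows:
        lines.append(f"{title}: {text}")
        if picture_id is not None:  # NULL from the LEFT JOIN means text-only
            lines.append(f"[picture {src}]")
    return lines

def text_only_titles():
    """Posts that have no row in picture_posts."""
    rows = conn.execute("""
        SELECT posts.title
        FROM posts
        LEFT JOIN picture_posts ON picture_posts.post_id = posts.id
        WHERE picture_posts.id IS NULL
    """)
    return [r[0] for r in rows]
```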
That's a very open-ended question. If you are doing this as an academic exercise, I'd recommend reading about database normalization and database normal forms. If you are doing this to produce a functional website, it sounds like you are reinventing several wheels from scratch and your time would be better spent searching for an off-the-shelf solution that meets your needs.

What would be the best way to index and search my data using Lucene?

I’ve found multiple questions on SO and elsewhere that ask questions along the lines of “How can I index and then search relational data in Lucene”. Quite rightly these questions are met with the standard response that Lucene is not designed to model data like this. This quote I found sums it up…
A Lucene Index is a Document Store. In a Document Store, a single
document represents a single concept with all necessary data stored to
represent that concept (compared to that same concept being spread
across multiple tables in an RDBMS requiring several joins to
re-create).
So I will not ask that question and instead provide my high level requirements and see if any Lucene gurus out there can help me.
We have data on People (Name, Gender, DOB, Nationality, etc)
And data on Companies (Name, Country, City, etc).
We also have data about how these two types of entity relate to each other where a person worked at the company (Person, Company, Role, Date Started, Date Ended, etc).
We have two entities – Person and Company – that have their own properties and then properties exist for the many-to-many link between them.
Some example searches could be as follows…
Find all Companies in Australia
Find all People born between two dates
Find all People who have worked as a .Net Developer
Find all males who have worked as a .Net Developer in London.
Find all People who have worked as a .Net Developer between 2008 and 2010
The criteria span all the three sets of data. Our requirement is to provide a Faceted Search over the data that accepts any combination of the various properties, of which I have given some examples.
I would like to use Lucene.Net for this. We are a .Net software house and so feel slightly intimidated by Java. However, all suggestions are welcome.
I am aware of the idea that the Index should be constructed with the search in mind. But I can’t seem to come up with a sensible index that would meet all the combinations of search criteria.
What classes native to Lucene or what extension points can we make use of?
Are there established techniques for doing this kind of thing?
Are there any third-party open source contributions that I have missed that will help us here?
For now I won’t describe the scenarios we have considered because I don’t want to bloat out this question and make it too intimidating. Please ask me to elaborate where necessary.
To store both companies and people in a single index, you could create documents with a type field that identifies the type of entities they describe.
Birthdays can be stored as date fields.
You could give each person a simple text field containing the names of companies that they worked for. Note that you won't get an error if you enter a company that is not represented by a document in your index. Lucene is not a relational DB tool, but you knew that.
(Sorry that I've not posted any links to the API; I'm familiar with Lucene Core but not Lucene.NET.)
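As a rough illustration of the document layout described above, here is a sketch in plain Python. The dicts only stand in for index documents and the search function is a hypothetical helper, not the Lucene or Lucene.NET API; the field names and sample data are assumptions as well.

```python
from datetime import date

# Each dict stands in for one index document; "type" tells the entities apart.
documents = [
    {"type": "company", "name": "Acme", "country": "Australia", "city": "Sydney"},
    {"type": "company", "name": "Globex", "country": "UK", "city": "London"},
    {"type": "person", "name": "Alice", "gender": "female",
     "dob": date(1980, 5, 17), "companies": "Acme Globex"},
    {"type": "person", "name": "Bob", "gender": "male",
     "dob": date(1992, 11, 2), "companies": "Globex"},
]

def search(doc_type, **criteria):
    """Return names of documents of doc_type that satisfy every criterion.

    Each criterion maps a field name to a predicate on that field's value.
    """
    hits = []
    for doc in documents:
        if doc["type"] != doc_type:
            continue
        if all(name in doc and test(doc[name]) for name, test in criteria.items()):
            hits.append(doc["name"])
    return hits

# Find all companies in Australia.
in_australia = search("company", country=lambda v: v == "Australia")

# Find all people born between two dates.
born_between = search(
    "person", dob=lambda d: date(1975, 1, 1) <= d <= date(1985, 12, 31)
)

# Find all people who worked at a given company (simple text field).
worked_at_globex = search("person", companies=lambda text: "Globex" in text.split())
```

Note that a flat companies field like this does not record which role or which dates go with which company, so it cannot by itself answer the searches that combine those properties.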

How do I structure my database so that two tables that constitute the same "element" link to another?

I read up on database structuring and normalization and decided to remodel the database behind my learning thingie to reduce redundancy.
I have different types of entries that can be learned. Gap texts/cloze tests (one text, many gaps) and simple known-unknown (one question, one answer) types.
Now I'm in a bit of a pickle:
Gaps need exactly the same columns in the user table as question-answer types,
but they need fewer columns than question-answer types (all that info is in the clozetests table).
I'm wishing for a "magic" foreign key that can point both to the gap and the terms table. Of course their ids would overlap though. I don't like having both a term_id and gap_id in user_terms; that seems inelegant (but it is the most elegant thing I could come up with after googling for a while, not knowing what name this pickle goes by).
I don't want a user_gaps analogue to user_terms, because then I'd be in the same pickle when it comes to the table user_terms_answers.
I put up this cardboard cutout collage of my schema. I didn't remove the stuff that isn't relevant for this question, but I can do that if anyone's confusion can be remedied like that. I think it looks super tidy already. Tidier than my mental concept of this at least.
Did I say any help would be greatly appreciated? Answerers might find themselves adulated for their wisdom.
Background story if you care, it's not really relevant to the question.
Before remodeling I had them all in one table (because I added the gap texts in a hurry), so that the gap texts were "normal" items without answers, while the gaps were items without questions. The application linked them together.
Edit
I added an answer after SO coughed up some helpful posts. I'm not yet 100% satisfied. I'm now trying to write views for common queries against this setup, and again I feel like I'll have to pull in application logic for something that is database turf.
As mentioned in the comment, it is hard to answer without knowing the whole story. So, here is a story and a model to match. See if you can adapt this to your example.
A school of (foreign) languages offers exams for several levels of language proficiency. The school maintains many pre-made tests for each level of each language (LangLevelTestNo).
Each test contains several (many) questions. Each question can be simple or of the cloze-text type. Correct answers are stored for each simple question. Correct terms are stored for each gap of each cloze-text question.
A student can take an exam for a language level and is presented with one of the pre-made tests. For each student exam, an exam form is maintained which stores the student's answers for each question of the exam. Like a question, an answer may be of the simple or of the cloze-text type.
After I edited my question, Stack Overflow started suggesting the right related questions to me.
I knew this was a common problem, but I really couldn't find it, just couldn't come up with the right search terms, I guess.
The following threads address similar problems and I'll try to apply that logic to my own design. They all propose adding a higher-level description for (in my case terms and gaps) like items. That makes sense and reflects the logic behind my application.
Relation Database Design
Foreign Key on multiple columns in one of several tables
Foreign Key refering to primary key across multiple tables
And this good person illustrates how to retrieve the data once it's broken up across tables. He also clued me in to the keyword class table inheritance, so now I know what to google.
I'll post back with my edited schema once I've applied this. It does seem more elegant like this.
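In the meantime, here is a minimal sketch of what class table inheritance could look like for this case, using Python's built-in sqlite3 module. The items supertable, its kind column, and the user_items table are hypothetical names I made up for illustration (user_items plays the role of user_terms); the columns are guesses, not my real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Supertype: every learnable thing gets exactly one row here.
    CREATE TABLE items (
        id   INTEGER PRIMARY KEY,
        kind TEXT NOT NULL CHECK (kind IN ('term', 'gap'))
    );
    -- Subtypes share the supertype's id as their own primary key.
    CREATE TABLE terms (
        item_id  INTEGER PRIMARY KEY REFERENCES items(id),
        question TEXT NOT NULL,
        answer   TEXT NOT NULL
    );
    CREATE TABLE clozetests (
        id   INTEGER PRIMARY KEY,
        text TEXT NOT NULL
    );
    CREATE TABLE gaps (
        item_id      INTEGER PRIMARY KEY REFERENCES items(id),
        clozetest_id INTEGER NOT NULL REFERENCES clozetests(id),
        answer       TEXT NOT NULL
    );
    -- Per-user state points at the supertype only: one foreign key, no overlap.
    CREATE TABLE user_items (
        user_id INTEGER NOT NULL,
        item_id INTEGER NOT NULL REFERENCES items(id),
        PRIMARY KEY (user_id, item_id)
    );
""")

def add_term(question, answer):
    item_id = conn.execute("INSERT INTO items (kind) VALUES ('term')").lastrowid
    conn.execute("INSERT INTO terms VALUES (?, ?, ?)", (item_id, question, answer))
    return item_id

def add_gap(clozetest_id, answer):
    item_id = conn.execute("INSERT INTO items (kind) VALUES ('gap')").lastrowid
    conn.execute("INSERT INTO gaps VALUES (?, ?, ?)", (item_id, clozetest_id, answer))
    return item_id

def items_for_user(user_id):
    """One query over both subtypes, joined through the supertype."""
    rows = conn.execute(
        """SELECT i.id, i.kind, COALESCE(t.answer, g.answer)
           FROM user_items u
           JOIN items i ON i.id = u.item_id
           LEFT JOIN terms t ON t.item_id = i.id
           LEFT JOIN gaps  g ON g.item_id = i.id
           WHERE u.user_id = ?
           ORDER BY i.id""",
        (user_id,),
    )
    return rows.fetchall()

# Sample data: one simple term and one gap, both learned by user 7.
conn.execute("INSERT INTO clozetests VALUES (1, 'The ___ sat on the mat.')")
term_id = add_term("Hund", "dog")
gap_id = add_gap(1, "cat")
conn.executemany("INSERT INTO user_items VALUES (?, ?)", [(7, term_id), (7, gap_id)])
```

Because terms and gaps draw their ids from items, the ids can no longer overlap, and any further table (such as the answers table) only needs a single item_id column.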
Edited schema

Core Data model design — search vs relationships?

I'm familiar with Core Data basics and have done some dabbling, but have not really done any major apps. Now I need to plan for one. And the question is not specifically about Core Data, but more about data design in general, though I am going to use Core Data to implement it on iPhone which is important for considering performance.
Imagine I am making an email app, where emails are the core object. I need to provide multiple views into the email store: search by user as well as many other criteria: say, "all emails with more than two recipients", "all emails where subject is longer than X", "all emails containing word X" etc.
Some objects, like people (senders/recipients), lend themselves naturally to being modeled as first-class objects, so I could do that and just create many-to-many relations between people and emails. Other searches, such as some examples above, are more artificial and there is no natural way to model them. However, I am able to enumerate the new searches in advance, i.e. I know beforehand what the criteria will be.
So, to do things like "emails with >2 recipients" and "emails where subject is longer than X", I think I have two strategies:
1) model these as a special "search" object, and create many-to-many relations between emails and search objects when inserting new objects into store so it is a simple join query when searching;
2) not model anything beyond the core email object and just do searches with predicates from the store at runtime.
My question is:
based on your Core Data instincts, how big is the difference between these two strategies from a performance perspective? My gut tells me #1 will always be faster, but if it is 10%, I am willing to take the performance hit in order to be more flexible with #2. But if #2 will be 200% slower, I need to put more work into modeling the search object and essentially pre-generating all the search results.
I know the exact answer will depend on specifics of data, but there must be a gut feeling you have :) Let's say there are on the order of tens of thousands, but not millions, of content objects, and each record is a few paragraphs of content text with several fields of metadata.
Typically, I would recommend going with strategy two and only spending time researching and developing other techniques if you actually run into performance issues during testing. Core Data is often faster than people think, especially on the iPhone.
However, if you are able to determine all the possible searches ahead of time, that does give you an advantage. It sounds like as an email is created, you would check it and add it to all the appropriate "search" objects. My gut feeling is that strategy one would be significantly faster, especially at tens of thousands of email objects.
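The difference between the two strategies can be sketched in plain Python. This is not Core Data code; the Email and Store classes, the SEARCHES table, and the length threshold are all hypothetical and only illustrate where the work happens in each strategy.

```python
from dataclasses import dataclass, field

@dataclass
class Email:
    subject: str
    recipients: list
    body: str = ""

# The searches known in advance, as name -> predicate.
SEARCHES = {
    "many_recipients": lambda e: len(e.recipients) > 2,
    "long_subject": lambda e: len(e.subject) > 20,
}

@dataclass
class Store:
    emails: list = field(default_factory=list)
    # Strategy 1: one membership list per known search, filled at insert time.
    precomputed: dict = field(
        default_factory=lambda: {name: [] for name in SEARCHES}
    )

    def insert(self, email):
        self.emails.append(email)
        # Extra work on every insert so that searching becomes a lookup.
        for name, matches in SEARCHES.items():
            if matches(email):
                self.precomputed[name].append(email)

    def search_precomputed(self, name):
        # Strategy 1: read the relationship prepared at insert time.
        return self.precomputed[name]

    def search_by_predicate(self, predicate):
        # Strategy 2: evaluate the predicate over every email at query time.
        return [e for e in self.emails if predicate(e)]

store = Store()
short = Email("Hi", ["x", "y", "z"])
long_subject = Email("A subject line that is quite long", ["x"])
store.insert(short)
store.insert(long_subject)
```

Strategy one pays on every insert and can only answer the searches listed in SEARCHES; strategy two accepts any predicate but scans the whole store each time, which is why measuring with realistic data is the safer way to decide.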