Category Implementation in a database - sql

I'm building a system that involves users and teachers. In this particular system, however, I would like to categorize the teachers, and the tricky part is that the categories are dynamic, so they can change at any time.
Since I'm developing the backend, I need a few functions:
The first is showAllCategories(), which shows all the main categories.
The second is showSubcategories(), which shows the subcategories of a category.
The third is showContent(), which in this case shows the teacher's information.
Before asking the mighty Stack Overflowers how this could be implemented efficiently, I thought I could use a doubly linked list approach, where the categories table has the columns CategoryName, Before, After and Content, and if a category has no After, its Content points to the teachers table. This is my classic SQL approach; however, I'm using MongoDB, and since I'm a beginner I wonder whether I could take advantage of NoSQL in this particular situation?

MongoDB natively supports the Array type, which actually behaves more like a list. With $push and $pull you can add and remove elements from such an array field, and $addToSet even makes sure there are no duplicates.
Now the question is how the categories are stored. You can make a categories collection holding the main categories, each with a field containing the array of its subcategories:
{"_id": "science", "sub": ["chemist", "physicist", "biology"]}
{"_id": "languages", "sub": ["english", "german", "spanish"]}
Your teacher collection on the other hand would then have an array of embedded documents, the categories of the teacher. They are duplicates of those found in the categories collection, minus the fields that you won't need in the teacher view. This way you avoid joins, since they don't exist in MongoDB.
{
"_id": ObjectId(...),
"name": {"first": "Foo", "last": "Bar"},
"categories": ["chemist", "biology"]
}
The rest I am sure you can think up.
Addition: In short, use the flexible types that MongoDB offers, and don't worry about data redundancy. Embed documents often and don't forget the indexes.
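To make the idea concrete, here is a minimal sketch of the three functions from the question, written in Python against plain in-memory lists that stand in for the two collections. The function names, the add_subcategory helper and the integer _id are assumptions made for illustration; with a real driver the lookups would become queries on the categories and teachers collections, and the helper would become an $addToSet update.

```python
# In-memory stand-ins for the "categories" and "teachers" collections above.
categories = [
    {"_id": "science", "sub": ["chemist", "physicist", "biology"]},
    {"_id": "languages", "sub": ["english", "german", "spanish"]},
]
teachers = [
    {"_id": 1, "name": {"first": "Foo", "last": "Bar"},
     "categories": ["chemist", "biology"]},
]


def show_all_categories():
    """Return the names of all main categories."""
    return [doc["_id"] for doc in categories]


def show_subcategories(category_id):
    """Return the subcategories of one main category (empty if unknown)."""
    for doc in categories:
        if doc["_id"] == category_id:
            return list(doc["sub"])
    return []


def show_content(subcategory):
    """Return the teachers whose 'categories' array contains the subcategory."""
    return [t for t in teachers if subcategory in t["categories"]]


def add_subcategory(category_id, subcategory):
    """Mimic $addToSet: append only if the value is not already present."""
    for doc in categories:
        if doc["_id"] == category_id and subcategory not in doc["sub"]:
            doc["sub"].append(subcategory)
```

Because the teacher documents carry their own copy of the category names, show_content needs no join. The price is that renaming a category means updating both collections.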

Related

How do I get started in producing mathematical models from data?

Let's say I have a system where user-defined questionnaires are displayed and a number of respondents submit their answers. Below is a sample JSON object of an answer:
{
"fullname": "Some guy",
"gender": "male",
"q1": true,
"q2": false,
"q3": true,
"q4": false,
"q5": true,
"qualifications": ["Diploma","Degree"]
}
Now most developers could query results like this and be able to answer questions like
What percentage of respondents answered TRUE to question 1?
How many diploma holders are female?
I want to produce these answers without a developer being involved. Sure, I could just supply the raw data and let the users start making their own pivot tables in Excel, but even an Excel pivot table describes a relationship in the data. For example: for respondents where "qualifications" includes "Diploma", give a breakdown by gender.
I know I'm dipping into data science and mathematical models, but I'm not sure where the best place to start is.
How do I describe these relationships in software? What standards and tools exist? If I have the schema of data available, can I have the machine figure out the relationships (like making suggestions)?
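For what it's worth, both example questions reduce to "filter, then aggregate", and that pair can be described as data rather than as code. Below is a rough Python sketch of that idea, not an established standard: the sample rows beyond the first and the function names percentage_true and breakdown are invented for illustration.

```python
from collections import Counter

# Made-up sample answers in the shape shown above (only the first row
# comes from the question; the others are invented for the example).
answers = [
    {"fullname": "Some guy", "gender": "male", "q1": True,
     "qualifications": ["Diploma", "Degree"]},
    {"fullname": "Another person", "gender": "female", "q1": False,
     "qualifications": ["Diploma"]},
    {"fullname": "Third person", "gender": "female", "q1": True,
     "qualifications": ["Degree"]},
]


def percentage_true(rows, question):
    """Percentage of respondents who answered True to the given question."""
    if not rows:
        return 0.0
    hits = sum(1 for row in rows if row.get(question) is True)
    return 100.0 * hits / len(rows)


def breakdown(rows, list_field, required_value, group_field):
    """Count rows per group_field among rows whose list_field contains required_value."""
    selected = [row for row in rows if required_value in row.get(list_field, [])]
    return Counter(row[group_field] for row in selected)


q1_share = percentage_true(answers, "q1")
diploma_by_gender = breakdown(answers, "qualifications", "Diploma", "gender")
```

The arguments to breakdown are just field names and values, so a user interface could let a non-developer pick them from the schema, which is essentially what a pivot table does.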

A way to store array type data into single database table?

I have an input JSON which I need to feed into a database. We are exploring whether or not to normalize our database tables.
Following is the structure for the input data (json):
"attachments": [
{
"filename": "abc.pdf",
"url": "https://www.web.com/abc.pdf",
"type": "done"
},
{
"filename": "pqr.pdf",
"url": "https://www.web.com/pqr.pdf",
"type": "done"
}
],
In the above example, attachments could have multiple values (more than 2, up to 8).
We were thinking of creating a separate table called DB_ATTACHMENT and keeping all the attachments for a worker there. But the issue is that we have some 30+ different attachment-type arrays (phone, address, previous_emp, visas, etc.).
Is there a way to store everything in ONE table (employee)? One option I can think of is using a single column (ATTACHMENT), adding all the data in a delimited format, and having the logic in the target system parse and extract everything.
Any other better solution?
Thanks..
You can store the data in a single VARCHAR column as JSON, then recover the information in the client by decoding this JSON data.
Also, some SQL implementations already offer native JSON datatypes. For example:
MariaDB: https://mariadb.com/kb/en/mariadb/column_json/
MySQL: https://dev.mysql.com/doc/refman/5.7/en/json.html
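As a minimal sketch of the single-column approach, here is the round trip in Python using the standard sqlite3 and json modules. SQLite is only a stand-in for whatever database you actually use, and the table layout and the "Example Worker" row are assumptions for illustration.

```python
import json
import sqlite3

# SQLite stands in for the real database; 'attachments' is a plain text column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, attachments TEXT)"
)

attachments = [
    {"filename": "abc.pdf", "url": "https://www.web.com/abc.pdf", "type": "done"},
    {"filename": "pqr.pdf", "url": "https://www.web.com/pqr.pdf", "type": "done"},
]

# Encode the whole array as one JSON string on the way in.
conn.execute(
    "INSERT INTO employee (name, attachments) VALUES (?, ?)",
    ("Example Worker", json.dumps(attachments)),
)

# Decode it again on the client side on the way out.
row = conn.execute(
    "SELECT attachments FROM employee WHERE name = ?", ("Example Worker",)
).fetchone()
decoded = json.loads(row[0])
filenames = [item["filename"] for item in decoded]
```

Unlike a hand-rolled delimited format, JSON copes with nested values and with delimiters inside the data, but the database still sees only an opaque string unless it has a native JSON type.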
Database systems store your data and offer you SQL to simplify your search requests in case your data is structured.
It depends on you to decide whether you want to store the data structured to benefit from the SQL or leave the search requester with the burden of parsing it.
It very much depends on how you intend to use the data. I'm not totally sure I understand your question, so I am going to rephrase the business domain I think you're working with - please comment if this is not correct.
The system manages 0..n employees.
One employee may have 0..8 attachments.
An attachment belongs to exactly 1 employee.
An attachment may be one of 30 different types.
Each attachment type may have its own schema.
If attachments aren't important in the business domain - they're basically notes, and you don't need to query or reason about them - you could store them as a column on the "employee" table, and parse them when you show them to the end user.
This solution may seem easier - but don't underestimate the conversion logic - you have to support Create, Read, Update and Delete for each attachment.
If attachments are meaningful in the business domain, this very quickly breaks down. If you need to answer questions like "find all employees who have attached abc.pdf", "find employees who do not have a telephone_number attachment", unpacking each employee_attachment makes your query very difficult.
In this case, you almost certainly need to store attachments in one or more separate tables. If the schema for each attachment is, indeed, different, you need to work out how to deal with inheritance in relational database models.
Finally - some database engines support formats like JSON and XML natively. Yours may offer this as a compromise solution.
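To illustrate why the separate-table design pays off once you need to query attachments, here is a small Python sketch using sqlite3. The table and column names (attachment, kind, filename) and the sample rows are assumptions for the example, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE attachment (
    id          INTEGER PRIMARY KEY,
    employee_id INTEGER NOT NULL REFERENCES employee(id),
    kind        TEXT NOT NULL,   -- e.g. 'document', 'telephone_number'
    filename    TEXT
);
""")
conn.executemany(
    "INSERT INTO employee (id, name) VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob")],
)
conn.executemany(
    "INSERT INTO attachment (employee_id, kind, filename) VALUES (?, ?, ?)",
    [(1, "document", "abc.pdf"),
     (1, "telephone_number", None),
     (2, "document", "pqr.pdf")],
)

# "Find all employees who have attached abc.pdf"
with_abc = [r[0] for r in conn.execute(
    "SELECT e.name FROM employee e "
    "JOIN attachment a ON a.employee_id = e.id "
    "WHERE a.filename = ? ORDER BY e.name", ("abc.pdf",))]

# "Find employees who do not have a telephone_number attachment"
without_phone = [r[0] for r in conn.execute(
    "SELECT e.name FROM employee e "
    "WHERE NOT EXISTS (SELECT 1 FROM attachment a "
    "                  WHERE a.employee_id = e.id "
    "                    AND a.kind = 'telephone_number') "
    "ORDER BY e.name")]
```

Both questions are one short query each here. With the attachments packed into a single column, each would require loading and parsing every employee row.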

Retrieving freebase quad dump type names from id

I'm currently working on a project using the Freebase dumps, whose assertions I insert into a LevelDB ordered by mid. My goal is to be able, for a given name like Bob Dylan, to retrieve every type linked to that name.
For example, "Bob Dylan" would correspond to "Musician", "Film Producer" and so on, each of which corresponds in turn to a type such as "/music/artist", "/film/producer", etc.
It's rather easy to find the Bob Dylan mid and its types in the quad dump:
/m/bobdylanmid /common/topic/notable_types /music/artist
/m/bobdylanmid /common/topic/notable_types /film/producer
Unfortunately, I'd now like to find those type names in various languages, and I can't find a logical way to retrieve them from the dump.
Any clue please?
I'm not 100% certain, but I don't think the schema is actually in the quad dump. I know it never used to be.
You'll need to look up the names using a query like this. Unfortunately, the human readable names exist only in English, so you'd need to jump through some more hoops to get other languages. For that you could try something along the lines of this slightly more complicated query
[{
"id": "/music/artist",
"/freebase/type_profile/equivalent_topic": {
"name": {
"lang": null,
"value": null
}
},
"name": null
}]
It depends on the "equivalent topic" property being filled in, which may not be the case for all types. If you only want a few languages, you could modify the query to return those explicitly ("Musician" has 45 different language variants).
If you are mainly interested in cases like your example (a person is/was a ...) using properties (rather than types) may do the job, in your case (the latter via a cvt):
/people/person/profession
/people/person/employment_history /business/employment_tenure/title
This might be more what you want to have anyways, unless you also want to display that e.g. Alan Turing is a "Literature Subject".
For the corresponding instances (with types /business/job_title, /people/profession) you can get the names in different languages (if existing).

How to extract property of a collection in the root document

I'm using RavenDB and I'm having trouble extracting a particular value using the Lucene Query.
Here is the JSON in my document:
{
"customer" : "my customer",
"locations": [
{
"name": "vel arcu. Curabitur",
"settings": {
"enabled": true
}
}
]
}
Here is my query:
var list = session.Advanced.LuceneQuery<ExpandoObject>()
.SelectFields<ExpandoObject>("customer", "locations;settings.enabled", "locations;name")
.ToList();
The list is populated and contains a bunch of ExpandoObjects with customer properties but I can't for the life of me get the location -> name or location -> settings -> enabled to come back.
Is the ";" or "." incorrect usage??
It seems that you have misunderstood the concept of indexes and queries in RavenDB. When you load a document in RavenDB, you always load the whole document, including everything it contains. So in your case, if you load a customer, you already have the collection and all its children loaded. That means you can use standard LINQ to Objects to extract all these values; there is no need for anything special like indexes or Lucene here.
If you want to do this extraction on the database side, so that you can query on those properties, then you need an index. Indexes are written using LINQ, but it's important to understand that they run on the server and just extract some data to populate the Lucene index. But here again, in most cases you don't even have to write the indexes yourself, because RavenDB can create them automatically for you.
In no case do you need to write Lucene queries like the one in your question, because in RavenDB Lucene queries are always executed against a pre-built index, and these are generally flat. But again, chances are you don't need to do anything with Lucene to get what you want.
I hope that makes sense for you. If not, please update your question and tell us more about what you actually want to do.
Technically, you can use the comma operator "," to nest into collections.
That should work, but it isn't recommended. You can just get your whole object and use it, it is easier and faster.
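To show what "get the whole object and use it" amounts to, here is a language-neutral sketch in Python rather than C#. A plain dict stands in for the loaded document, and a list comprehension plays the role that LINQ to Objects would play in .NET. The flatten_locations helper is a made-up name, and nothing here calls the RavenDB client.

```python
# A plain dict standing in for the fully loaded document from the question.
document = {
    "customer": "my customer",
    "locations": [
        {"name": "vel arcu. Curabitur", "settings": {"enabled": True}},
    ],
}


def flatten_locations(doc):
    """Project customer, location name and settings.enabled into flat rows."""
    return [
        {
            "customer": doc["customer"],
            "name": location["name"],
            "enabled": location["settings"]["enabled"],
        }
        for location in doc.get("locations", [])
    ]


rows = flatten_locations(document)
```

Since the nested collection is already in memory after the load, the projection is ordinary client-side code and needs no special field-path syntax.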

Modeling products with vastly different sets of needed-to-know information and linking them to lineitems?

I'm currently working on a site that sells products of varying types that are custom manufactured. I've got your general, standard cart schema: Order has many LineItems, LineItems have one Product, but I've run into a bit of a sticking point:
Let's say one of our products is a ball, and another is a box of crayons. While people are creating their order, we end up creating items that could be represented by some pseudocode:
Ball:
  attributes:
    diameter: decimal
    color: foreign_ref_to Colors.id
    material: foreign_ref_to Materials.id
CrayonBox:
  attributes:
    width: decimal
    height: decimal
    front_text: string
    crayons: many_to_many with Crayon
...
Now, these are created and stored in our db before an order is made. I can pretty easily make it so that when an item is added to a cart, we get a product name and price by doing the linking from Ball or CrayonBox in my controller and generating the LineItem, but it would be nice if we could provide a full set of info for every line item.
I've thought of a few possible solutions, but none that seem ideal:
One: use an intermediary "product info" linking table, and represent different products in terms of that, so we'd have something like:
LineItem
  information: many_to_many with product_information
  ...
ProductInformation:
  lineitem: many_to_many with line_item
  name: string
  value: string
ProductInformation(name='color', value=$SOMECOLOR)
ProductInformation(name='color', value=$SOMEOTHERCOLOR)
...
The problem with this is that the types of data needed to be represented for each attribute of a product does not all fall under the same column type. I could represent everything with strings, but $DEITY knows I don't even come close to thinking that's a good solution.
The other solution I've thought of is having the LineItem table have a foreign key to each table that represents a Product type. Unfortunately, this means I would have to check for the existence of each foreign key in my controller. I don't like this very much at all, but I like it marginally better than stuffing every piece of data into one datatype and then dealing with all the conversion stuff outside of the DB.
One other possible solution would be to store the tablename of the product data in a column, but that can't possibly be a good thing to do, can it? I lose the capability of the db to link stuff together, and it strikes me as akin to using eval() where it's not needed -- and we all know that eval() isn't really needed very often.
I want to be able to say "give me the line item, and then the extended info for that line item", and have the correct set of information for various product types.
So, people who actually know what they're doing with database schema, what should I be doing? How should I be representing this? This seems like it would be a fairly common use case, but I haven't been able to find much info with googling -- is there a common pattern for things like this? Need more info? This can't possibly be outside of the realm of "you can use a RDBMS for this", can it?
Edit: I'm now fairly certain that what I want here is Class Table Inheritance, with an alias in my individual models to "normalize" the link followed to the "info" table for each product type. Unfortunately, the ORM I'm kinda stuck using for this (Doctrine 1.2) doesn't support Class Table Inheritance. I may be able to accomplish something similar with Doctrine's "column aggregation" inheritance, but egh. Anyone think I'm barking way up the wrong tree? I looked over EAV, and I don't think it quite fits the problem: each set of information about different products is known, although they might be very different from product type A to product type B. The flexibility of EAV may be nice, but it seems like an abuse of the db for a problem like this.
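For readers unfamiliar with the term, here is a rough sketch of what Class Table Inheritance could look like for the ball and crayon box, written in Python with sqlite3 instead of Doctrine. All table names, column names and sample rows are assumptions for illustration, and the crayons many-to-many link is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
-- Shared columns live in the parent table.
CREATE TABLE product (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    price REAL NOT NULL
);
-- One child table per product type, keyed by the parent's id.
CREATE TABLE ball (
    product_id INTEGER PRIMARY KEY REFERENCES product(id),
    diameter   REAL,
    color      TEXT
);
CREATE TABLE crayon_box (
    product_id INTEGER PRIMARY KEY REFERENCES product(id),
    width      REAL,
    height     REAL,
    front_text TEXT
);
CREATE TABLE line_item (
    id         INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(id),
    quantity   INTEGER NOT NULL
);
INSERT INTO product VALUES (1, 'Ball', 4.5), (2, 'Crayon box', 7.0);
INSERT INTO ball VALUES (1, 10.0, 'red');
INSERT INTO crayon_box VALUES (2, 7.0, 5.0, 'Hello');
INSERT INTO line_item VALUES (100, 1, 2), (101, 2, 1);
""")


def extended_info(line_item_id):
    """Return the line item with whichever type-specific columns apply."""
    row = conn.execute("""
        SELECT p.name AS name, p.price AS price, li.quantity AS quantity,
               b.diameter AS diameter, b.color AS color,
               c.width AS width, c.height AS height, c.front_text AS front_text
        FROM line_item li
        JOIN product p ON p.id = li.product_id
        LEFT JOIN ball b ON b.product_id = p.id
        LEFT JOIN crayon_box c ON c.product_id = p.id
        WHERE li.id = ?""", (line_item_id,)).fetchone()
    if row is None:
        return None
    # Drop the columns belonging to the other product types (all NULL).
    return {key: row[key] for key in row.keys() if row[key] is not None}
```

LineItem keeps a single foreign key to product, every attribute keeps its proper column type, and "give me the line item, then its extended info" is one query. The cost is one more LEFT JOIN per product type.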
It strikes me that this is a perfect fit for the likes of CouchDB / MongoDB which allow every 'row' to contain different attributes, yet permits indexed lookups. It should be fairly straightforward to build a hybrid structure using MySQL for the rigid relational parts and 'nosql' for the parts of varying shape.
Take a look at this discussion.
Assumptions:
You have some specific products you're selling. I.e., you know you're selling crayons, but not spatulas. The customer doesn't come to your site and try to order a product you've never heard of.
The products you're selling have a pre-existing set of attributes. I.e., crayons have color; crayon_boxes have width, height, crayons... The customer doesn't come to your site and try to specify the value for an attribute you've never heard of.
One way to do this (if you're an RDBMS purist, please close your eyes now until I tell you to open them again) is to use an attribute string. So the table would be like this:
Products
+ ProductName
+ ProductAttribute
And then a sample record would be like this:
Product Name = "Crayon Box"
Product Attribute = "Height:5 inches;Width:7 inches"
With something like this, parse the name/value pairs in or out as necessary.
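A minimal sketch of that parsing in Python, assuming ";" separates pairs and the first ":" in a pair separates name from value, as in the sample record. The helper names parse_attributes and format_attributes are made up for illustration.

```python
def parse_attributes(text):
    """Turn 'Height:5 inches;Width:7 inches' into a dict of name -> value."""
    attributes = {}
    for pair in text.split(";"):
        if not pair.strip():
            continue  # tolerate a trailing or doubled ';'
        name, _, value = pair.partition(":")
        attributes[name.strip()] = value.strip()
    return attributes


def format_attributes(attributes):
    """Inverse of parse_attributes: join name/value pairs back into one string."""
    return ";".join(f"{name}:{value}" for name, value in attributes.items())


parsed = parse_attributes("Height:5 inches;Width:7 inches")
round_tripped = format_attributes(parsed)
```

Note that every value comes back as a string, and a value containing ";" would break this simple format. Those are the type and conversion problems the question was worried about.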