Tree-Structures & SQL - Looking for design recommendations

From what I've researched so far, this topic is both well documented and very broad. So I'm hoping you can save me some time diving into the depths of how to store trees in a database by pointing me in the right direction.
I'm working with questionnaires, similar to how HL7/FHIR approaches them: there are two classes, Questionnaire and Item, with a Questionnaire consisting of a Set of Items. However, Items can refer to any number of additional Items (i.e. children). So basically, I have an n-ary tree-like structure with, depending on how you want to look at it,
a) a Questionnaire object as root and several Items as children, or
b) several Items as a root each (i.e. a forest), again each with several Items as children.
class Questionnaire {
    items: Set<Item>
    inner class Item {
        children: Set<Item>
    }
}
This part of the data structure unfortunately is non-negotiable (other than the use of inner classes, which I could change).
I'm now looking for a sensible way to store such a structure in my database (currently MySQL).
Luckily, I'm only ever storing and reading the whole questionnaire. I do not need to access individual nodes or branches, and the data will not be changed or updated (because any change to an existing Questionnaire will result in a new Questionnaire as per my project's definition). So I only need to work with SELECT and INSERT statements, each for one complete Questionnaire (with all its items).
My first approach was to reverse the Item-to-Item relationship, i.e. referring to one parent rather than several children. However, I fear that this might be hell to translate back into the already fixed object-structure. I'm hoping for a fairly easy solution.
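For what it's worth, translating parent references back into a children structure does not have to be painful. Below is a minimal Python sketch, assuming each stored row is an (item_id, parent_id) pair with parent_id set to None for top-level items. The Item class, the build_forest helper and the sample ids are made up for illustration, and a list stands in for the Set of children so the order is predictable.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Item:
    # Hypothetical stand-in for the Item class from the question.
    item_id: str
    children: List["Item"] = field(default_factory=list)


def build_forest(rows: List[Tuple[str, Optional[str]]]) -> List[Item]:
    """Turn (item_id, parent_id) rows into a list of root Items with children attached."""
    # First pass: create one node per row so parents can be looked up by id.
    nodes = {item_id: Item(item_id) for item_id, _ in rows}
    roots = []
    # Second pass: hang every node below its parent, or collect it as a root.
    for item_id, parent_id in rows:
        if parent_id is None:
            roots.append(nodes[item_id])
        else:
            nodes[parent_id].children.append(nodes[item_id])
    return roots


# Rows as they might come back from "SELECT id, parent_id FROM items WHERE ..."
rows = [("1", None), ("1.1", "1"), ("1.2", "1"), ("1.2.1", "1.2"), ("2", None)]
forest = build_forest(rows)
```

Because all nodes are created before any of them is attached, the rows can arrive in any order.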
Please note that I am aware that there are probably really nice solutions using ORM, but I've been having trouble wrapping my head around the whole setup process lately, and am now too pressed for time to get into that. Right now, I need a solution in plain SQL to show results. ORM will have to wait a little, but I will get back to that! Also note that performance does not matter right now.
Thanks in advance for your efforts, your help will be much appreciated!

So here's what I ended up doing in case anyone else is looking for an answer:
Let's take as an example my QuestionnaireResponse class:
data class QuestionnaireResponse (
    val qID: String,
    val timeStamp: String,
    val items: List<Item> = listOf<Item>()) {
    inner class Item (val itemID: String, val itemType: String, var unit: String,
        var answers: MutableList<String> = mutableListOf())
}
Where qID references the Questionnaire that has been answered here.
When a Questionnaire is answered, I receive the above object as JSON. I decided to parse the incoming JSON into my data structure, extract qID and timeStamp, and store those two values in their own columns next to the complete JSON object. That way I can select only those QuestionnaireResponses that answer a specific Questionnaire and filter by timeStamp, without having to represent the (basically recursive) structure in my Db.
The SQL code to create the corresponding table looks like this:
CREATE TABLE `questionnaireresponses` (
    `questionnaireID` int NOT NULL,
    `timestamp` varchar(25) NOT NULL,
    `questionnaireResonseObject` json DEFAULT NULL,
    PRIMARY KEY (`questionnaireID`,`timestamp`),
    CONSTRAINT `answeredQuestionnaire` FOREIGN KEY (`questionnaireID`) REFERENCES `questionnaires` (`id`));
From what I read, not all databases support the json data type. In MySQL, it makes sure that the inserted data is a valid JSON document. I never mind additional checks, but since my application has already parsed the JSON successfully before inserting it into the Db, that check could be omitted. Thus, if your Db doesn't support the json type, any type that can store strings of variable length (e.g. text or blob) should work as well.
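To illustrate the idea of storing the whole object as one string, here is a small Python sketch. It uses the standard-library sqlite3 module with a plain TEXT column as a stand-in for MySQL's json type. The column name responseObject, the sample values and the nested "items" key are assumptions made for this example, not part of the original schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE questionnaireresponses (
           questionnaireID INTEGER NOT NULL,
           timestamp       TEXT    NOT NULL,
           responseObject  TEXT,              -- the complete object, serialized as JSON
           PRIMARY KEY (questionnaireID, timestamp))"""
)

# Hypothetical response with one nested child item.
response = {
    "qID": 7,
    "timeStamp": "2020-01-31T12:00:00",
    "items": [
        {"itemID": "1", "answers": ["yes"],
         "items": [{"itemID": "1.1", "answers": ["42"], "items": []}]},
    ],
}

# Only qID and timeStamp get their own columns; the tree is stored as one string.
conn.execute(
    "INSERT INTO questionnaireresponses VALUES (?, ?, ?)",
    (response["qID"], response["timeStamp"], json.dumps(response)),
)

row = conn.execute(
    "SELECT responseObject FROM questionnaireresponses WHERE questionnaireID = ?",
    (7,),
).fetchone()
restored = json.loads(row[0])  # the nested structure comes back unchanged
```

The trade-off is that the database cannot query or validate individual items this way, which is acceptable here because a questionnaire is only ever written and read as a whole.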

Related

How to save an object with a list as attribute in a SQL Database

I want to store a list of Day objects in my sqflite database.
The Day class looks like this:
class Day {
String date;
List<DoneTask> doneTasks;
double score = 0;
Day({this.doneTasks, this.date});
}
the DoneTask class:
class DoneTask {
Category category;
double score;
String description;
DoneTask({this.category, this.score, this.description});
}
Category has an attribute id, which is all I want to store from that.
I'm not sure how I can realize that with sqflite.
I was thinking about adding an attribute String day to the DoneTask class, loading the DoneTasks first and sorting them into the Days later. But this does not sound like a good solution to me. Does anyone have an idea how I could do it in a better way?
I'm very new to using SQL, so I'd appreciate simple answers.
(This is what I have used so far for sqflite: https://flutter.dev/docs/cookbook/persistence/sqlite)

I would recommend going with a NoSQL database if that is the data architecture you are going for. Of course, I do not know the scope of your entire project, so it is possible that an SQL database is a better fit for some reason that is unknown to me. But given just what you have presented, a NoSQL alternative to sqflite seems like a better option. This would allow records to be stored effectively as objects, allowing you to store objects within objects, rather than having to create a bunch of tables cross-referencing one another. It just seems more intuitive to me to do it that way.

Here is how I would save it:
First create a table to save your Day objects. In it you save only the date and score properties.
The method Database().insert returns a Future<int>, which is the id of the newly created row, so you can use it to save your DoneTasks and link them to the Day.
Now you can save each entry of your List<DoneTask> in another table, with a column id_day identifying the Day it belongs to.
Here is a model of what it could look like.
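A rough sketch of that two-table layout, written in Python with the standard-library sqlite3 module rather than sqflite (the SQL is the same idea, and cursor.lastrowid plays the role of the id returned by insert). The table names, the id_category column and the helper functions are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE days (
        id    INTEGER PRIMARY KEY,
        date  TEXT NOT NULL,
        score REAL NOT NULL
    );
    CREATE TABLE done_tasks (
        id          INTEGER PRIMARY KEY,
        id_day      INTEGER NOT NULL REFERENCES days(id),  -- links the task to its day
        id_category INTEGER NOT NULL,                      -- only the category id is stored
        score       REAL NOT NULL,
        description TEXT
    );
    """
)


def save_day(date, score, done_tasks):
    """Insert the day first, then its tasks using the id of the new day row."""
    cur = conn.execute("INSERT INTO days (date, score) VALUES (?, ?)", (date, score))
    day_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO done_tasks (id_day, id_category, score, description) "
        "VALUES (?, ?, ?, ?)",
        [(day_id, category, task_score, text) for category, task_score, text in done_tasks],
    )
    return day_id


def load_tasks(day_id):
    """Return (id_category, score, description) for every task of one day."""
    return conn.execute(
        "SELECT id_category, score, description FROM done_tasks "
        "WHERE id_day = ? ORDER BY id",
        (day_id,),
    ).fetchall()


day_id = save_day("2020-05-01", 3.5, [(1, 2.0, "run"), (2, 1.5, "read")])
tasks = load_tasks(day_id)
```

Loading a Day then means reading its row from days and filling its doneTasks list from load_tasks.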

How to associate multiple tables on Exposed

Now I'm creating an API using Kotlin with Exposed, Ktor and Postgres.
When I select from a table that is associated with another table, I have to "parse" the ResultRow into an entity object. If the table has more associations, I have to repeat the mapping for each of them every time, which feels weird.
Is there an easy way to do it?
Writing this much code just to create the objects seems like too much!
This is what I'm doing now:
fun ResultRow.toInterval() = Interval(
    this[Intervals.idInterval],
    Setting(
        this[OfficeSettings.idOffice],
        Office(
            this[Offices.idOffice],
            this[Offices.code],
            this[Offices.name]
        ),
        this[OfficeSettings.scheduleDaysRule],
        this[OfficeSettings.serviceTimeRule],
        this[OfficeSettings.countInterval],
        this[OfficeSettings.restTimeRule]
    ),
    this[Intervals.date],
    this[Intervals.startTime],
    this[Intervals.endTime]
)
All this just to parse the result into an Interval, which is associated with a Setting, which in turn is associated with an Office.

Modeling products with vastly different sets of needed-to-know information and linking them to lineitems?

I'm currently working on a site that sells products of varying types that are custom manufactured. I've got your general, standard cart schema: Order has many LineItems, LineItems have one Product, but I've run into a bit of a sticking point:
Let's say one of our products is a ball, and one of our products is a box of crayons. While people are creating their order, we end up creating items that could be represented by some pseudocode:
Ball:
    attributes:
        diameter: decimal
        color: foreign_ref_to Colors.id
        material: foreign_ref_to Materials.id
CrayonBox:
    attributes:
        width: decimal
        height: decimal
        front_text: string
        crayons: many_to_many with Crayon
...
Now, these are created and stored in our db before an order is made. I can pretty easily make it so that when an item is added to a cart, we get a product name and price by doing the linking from Ball or CrayonBox in my controller and generating the LineItem, but it would be nice if we could provide a full set of info for every line item.
I've thought of a few possible solutions, but none that seem ideal:
One: use an intermediary "product info" linking table, and represent different products in terms of that, so we'd have something like:
LineItem
    information: many_to_many with product_information
    ...
ProductInformation:
    lineitem: many_to_many with line_item
    name: string
    value: string
ProductInformation(name='color', value=$SOMECOLOR)
ProductInformation(name='color', value=$SOMEOTHERCOLOR)
...
The problem with this is that the types of data needed to represent each attribute of a product do not all fall under the same column type. I could represent everything with strings, but $DEITY knows I don't even come close to thinking that's a good solution.
The other solution I've thought of is having the LineItem table have a foreign key to each table that represents a Product type. Unfortunately, this means I would have to check for the existence of each foreign key in my controller. I don't like this very much at all, but I like it marginally better than stuffing every piece of data into one datatype and then dealing with all the conversion stuff outside of the DB.
One other possible solution would be to store the tablename of the product data in a column, but that can't possibly be a good thing to do, can it? I lose the capability of the db to link stuff together, and it strikes me as akin to using eval() where it's not needed -- and we all know that eval() isn't really needed very often.
I want to be able to say "give me the line item, and then the extended info for that line item", and have the correct set of information for various product types.
So, people who actually know what they're doing with database schema, what should I be doing? How should I be representing this? This seems like it would be a fairly common use case, but I haven't been able to find much info with googling -- is there a common pattern for things like this? Need more info? This can't possibly be outside of the realm of "you can use a RDBMS for this", can it?
Edit: I'm now fairly certain that what I want here is Class Table Inheritance, with an alias in my individual models to "normalize" the link followed to the "info" table for each product type. Unfortunately, the ORM I'm kinda stuck using for this (Doctrine 1.2) doesn't support Class Table Inheritance. I may be able to accomplish something similar with Doctrine's "column aggregation" inheritance, but egh. Anyone think I'm barking way up the wrong tree? I looked over EAV, and I don't think it quite fits the problem: each set of information about different products is known, although it might be very different from product type A to product type B. The flexibility of EAV may be nice, but it seems like an abuse of the db for a problem like this.
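For readers unfamiliar with the term, here is a minimal sketch of what Class Table Inheritance could look like in plain SQL, driven from Python's standard-library sqlite3 module instead of MySQL and Doctrine. All table and column names, the type discriminator column and the extended_info helper are assumptions for illustration. Color is simplified to a text column and the crayons relation is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets rows be read by column name
conn.executescript(
    """
    -- Base table: what every product has in common.
    CREATE TABLE products (
        id    INTEGER PRIMARY KEY,
        type  TEXT NOT NULL,
        name  TEXT NOT NULL,
        price REAL NOT NULL
    );
    -- One table per product type, sharing the primary key of the base table.
    CREATE TABLE balls (
        product_id INTEGER PRIMARY KEY REFERENCES products(id),
        diameter   REAL NOT NULL,
        color      TEXT NOT NULL
    );
    CREATE TABLE crayon_boxes (
        product_id INTEGER PRIMARY KEY REFERENCES products(id),
        width      REAL NOT NULL,
        height     REAL NOT NULL,
        front_text TEXT
    );
    -- Line items only ever point at the base table.
    CREATE TABLE line_items (
        id         INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES products(id),
        quantity   INTEGER NOT NULL
    );
    INSERT INTO products VALUES (1, 'ball', 'Red ball', 4.5);
    INSERT INTO balls VALUES (1, 12.0, 'red');
    INSERT INTO products VALUES (2, 'crayon_box', 'Crayon box', 9.0);
    INSERT INTO crayon_boxes VALUES (2, 7.0, 5.0, 'Hello');
    INSERT INTO line_items VALUES (1, 1, 3);
    INSERT INTO line_items VALUES (2, 2, 1);
    """
)

# Fixed mapping from product type to its detail table (never built from user input).
DETAIL_TABLES = {"ball": "balls", "crayon_box": "crayon_boxes"}


def extended_info(line_item_id):
    """Return the common product columns merged with the type-specific ones."""
    base = conn.execute(
        "SELECT p.id, p.type, p.name, p.price FROM line_items li "
        "JOIN products p ON p.id = li.product_id WHERE li.id = ?",
        (line_item_id,),
    ).fetchone()
    table = DETAIL_TABLES[base["type"]]
    detail = conn.execute(
        f"SELECT * FROM {table} WHERE product_id = ?", (base["id"],)
    ).fetchone()
    return {**dict(base), **dict(detail)}


ball_info = extended_info(1)
box_info = extended_info(2)
```

Every attribute keeps a proper column type, and the line item needs only one foreign key regardless of how many product types exist.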

It strikes me that this is a perfect fit for the likes of CouchDB / MongoDB which allow every 'row' to contain different attributes, yet permits indexed lookups. It should be fairly straightforward to build a hybrid structure using MySQL for the rigid relational parts and 'nosql' for the parts of varying shape.

Take a look at this discussion.

Assumptions:
You have some specific products you're selling. I.e., you know you're selling crayons, but not spatulas. The customer doesn't come to your site and try to order a product you've never heard of.
The products you're selling have a pre-existing set of attributes. I.e., crayons have color; crayon_boxes have width, height, crayons... The customer doesn't come to your site and try to specify the value for an attribute you've never heard of.
One way to do this (if you're an RDBMS purist, please close your eyes now until I tell you to open them again) is to use an attribute string. So the table would be like this:
Products
+ ProductName
+ ProductAttribute
And then a sample record would be like this:
Product Name = "Crayon Box"
Product Attribute = "Height:5 inches;Width:7 inches"
With something like this, parse the name/value pairs in or out as necessary.
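Parsing the pairs in and out could be as simple as the following Python sketch. The function names are made up, and it assumes that attribute names never contain ":" and that neither names nor values contain ";".

```python
def parse_attributes(text):
    """Split 'Name:value;Name:value' into a dict of strings."""
    # Skip empty parts so a trailing ';' does not cause an error.
    pairs = (part.split(":", 1) for part in text.split(";") if part.strip())
    return {name.strip(): value.strip() for name, value in pairs}


def format_attributes(attrs):
    """Inverse of parse_attributes: join a dict back into one attribute string."""
    return ";".join(f"{name}:{value}" for name, value in attrs.items())


attrs = parse_attributes("Height:5 inches;Width:7 inches")
text = format_attributes(attrs)
```

Note that every value comes back as a string, so units and numeric types still have to be handled by the application.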

What are the best practices in creating a data access layer for an object that has a reference to another object?

(sorry for my English)
For example, in my DAL, I have an AuthorDB object, that has a Name, and a BookDB object, that has a Title and an IdAuthor.
Now, if I want to show all the books with their corresponding author's name, I have to get a collection of all the Books, and for each of them, with the IdAuthor attribute, find the Author's name. This makes a lot of queries to the database, and obviously, a simple JOIN could be used.
What are my options? Creating a 'custom' object that contains the author's name and the title of the book? If so, the maintenance could become awful.
So, what are the options?
Thank you!

Don't write something buggy, inefficient, and specialized ... when reliable, efficient, and generic tools are available. For free.
Pick an ORM. NHibernate, ActiveRecord, SubSonic, NPersist, LinqToEF, LinqToSQL, LLBLGenPro, DB4O, CSLA, etc.

You can create a View in the database that has the join built into it and bind an object to that, e.g. AuthorBooksDB. It doesn't create too bad a maintenance headache since the view can hide any underlying changes and remains static.
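As a rough illustration of the view idea, here is a Python sketch using the standard-library sqlite3 module. The table and column names follow the question loosely (Name, Title, IdAuthor); the view name AuthorBooks, its column aliases and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Author (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Book (
        Id       INTEGER PRIMARY KEY,
        Title    TEXT NOT NULL,
        IdAuthor INTEGER NOT NULL REFERENCES Author(Id)
    );
    -- The join lives in the database; callers just select from the view.
    CREATE VIEW AuthorBooks AS
        SELECT Book.Id AS BookId, Book.Title AS Title, Author.Name AS AuthorName
        FROM Book JOIN Author ON Author.Id = Book.IdAuthor;
    INSERT INTO Author VALUES (1, 'Author A');
    INSERT INTO Book VALUES (1, 'Book One', 1);
    INSERT INTO Book VALUES (2, 'Book Two', 1);
    """
)

# One query instead of one query per book.
rows = conn.execute(
    "SELECT Title, AuthorName FROM AuthorBooks ORDER BY BookId"
).fetchall()
```

If a column in Book is renamed later, only the view definition has to change as long as the alias (here Title) stays the same.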

If you can separate the database query from the object building, then you could create a query to get the data you need. Then pass that data to your builder and let it return your books.
With Linq to SQL, that can be done as easily as:
public IEnumerable<Book> AllBooks()
{
    return from book in db.Books
           join author in db.Authors on book.AuthorId equals author.Id
           select new Book()
           {
               Title = book.Title,
               Author = author.Name,
           };
}
The same can be achieved with DataTables / DataSets:
public IEnumerable<Book> AllBooks()
{
    DataTable booksAndAuthors = QueryAllBooksAndAuthors(); // encapsulates the sql query
    foreach (DataRow row in booksAndAuthors.Rows)
    {
        Book book = new Book();
        book.Title = (string)row["Title"];       // the DataRow indexer returns object, so cast
        book.Author = (string)row["AuthorName"];
        yield return book;
    }
}

Thank you very much for your inputs.
Actually, we are trying to keep the database objects as close as possible to the actual columns of the corresponding table in the database. That's why we cannot really add a (string) 'Author' attribute to the BookDB object.
Here is the problem I see with using 'View' objects. If the database schema has to be modified for any reason (e.g. in the Book table, the 'Title' column has to be renamed to 'The_Title'), how do we easily find all the 'View' objects that have to be modified? In other words, how do I know which objects have to be modified when they make queries that use multiple joins?
Here, since we have an AuthorsBooks object, we can see by the name that it probably makes a query to the book and author tables. However, with objects that make 4 or 5 joins between tables we cannot rely on the object name.
Any ideas? (Thank you again, this is a great site!)

I suggest you take a look at Domain Driven Design. In DDD, you get all business objects from a repository. The repository hides your data store and implementation, and will solve your problems of how to query data and how to keep track of database changes. Because every business object is retrieved from the repository, the repository will be your single point of change. The repository can then query your database in any way you find efficient, and then build your domain objects from that data:
var books = new BookRepository().GetAllBooks();
You should be able to code the repositories with any of the technologies mentioned by Justice.

Database : best way to model a spreadsheet

I am trying to figure out the best way to model a spreadsheet (from the database point of view), taking into account :
The spreadsheet can contain a variable number of rows.
The spreadsheet can contain a variable number of columns.
Each column can contain one single value, but its type is unknown (integer, date, string).
It has to be easy (and performant) to generate a CSV file containing the data.
I am thinking about something like :
from django.db import models

class Spreadsheet(models.Model):
    name = models.CharField(max_length=100)
    creation_date = models.DateField()

class Column(models.Model):
    spreadsheet = models.ForeignKey(Spreadsheet, on_delete=models.CASCADE)
    name = models.CharField(max_length=100)
    type = models.CharField(max_length=100)  # e.g. "integer", "date", "string"

class Cell(models.Model):
    column = models.ForeignKey(Column, on_delete=models.CASCADE)
    row_number = models.IntegerField()
    value = models.CharField(max_length=100)  # always stored as text
Can you think of a better way to model a spreadsheet? My approach stores every value as a string. I am worried about it being too slow to generate the CSV file.

from a relational viewpoint:
Spreadsheet <-->> Cell : RowId, ColumnId, ValueType, Contents
there is no requirement for row and column to be entities, but you can if you like
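A small Python sketch of that single Cell relation, including the CSV export the question asks about. It uses the standard-library sqlite3 and csv modules. The column names are adapted from the line above; the to_csv helper, the sample data and the 0-based column ids are assumptions.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE cell (
           spreadsheet_id INTEGER NOT NULL,
           row_id         INTEGER NOT NULL,
           column_id      INTEGER NOT NULL,   -- assumed to be 0-based
           value_type     TEXT NOT NULL,      -- e.g. 'integer', 'date', 'string'
           contents       TEXT,
           PRIMARY KEY (spreadsheet_id, row_id, column_id))"""
)
conn.executemany(
    "INSERT INTO cell VALUES (?, ?, ?, ?, ?)",
    [
        (1, 0, 0, "string", "apple"), (1, 0, 1, "integer", "3"),
        (1, 1, 0, "string", "pear"), (1, 1, 1, "integer", "5"),
    ],
)


def to_csv(spreadsheet_id):
    """Read all cells of one spreadsheet in a single query and emit CSV text."""
    cells = conn.execute(
        "SELECT row_id, column_id, contents FROM cell "
        "WHERE spreadsheet_id = ? ORDER BY row_id, column_id",
        (spreadsheet_id,),
    ).fetchall()
    width = max((column_id for _, column_id, _ in cells), default=-1) + 1
    rows = {}
    for row_id, column_id, contents in cells:
        # Cells that are missing in a row stay empty strings.
        rows.setdefault(row_id, [""] * width)[column_id] = contents or ""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row_id in sorted(rows):
        writer.writerow(rows[row_id])
    return out.getvalue()


csv_text = to_csv(1)
```

Since the contents are already stored as text, generating the CSV needs one query and no type conversion. Rows without any cells are simply not emitted in this sketch.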

Databases aren't designed for this. But you can try a couple of different ways.
The naive way to do it is to do a version of One Table To Rule Them All. That is, create a giant generic table, all types being (n)varchars, that has enough columns to cover any foreseeable spreadsheet. Then, you'll need a second table to store metadata about the first, such as what Column1's spreadsheet column name is, what type it stores (so you can cast in and out), etc. Then you'll need triggers to run against inserts that check the data coming in and the metadata to make sure the data isn't corrupt, etc etc etc. As you can see, this way is a complete and utter cluster. I'd run screaming from it.
The second option is to store your data as XML. Most modern databases have XML data types and some support for XPath within queries. You can also use XSDs to provide some kind of data validation, and XSLTs to transform that data into CSVs. I'm currently doing something similar with configuration files, and it's working out okay so far. No word on performance issues yet, but I'm trusting Knuth on that one.
The first option is probably much easier to search and faster to retrieve data from, but the second is probably more stable and definitely easier to program against.
It's times like this I wish Celko had a SO account.

You may want to study EAV (Entity-attribute-value) data models, as they are trying to solve a similar problem.
Entity-Attribute-Value - Wikipedia

The best solution greatly depends on the way the database will be used. Try to find a couple of top use cases you expect and then decide on the design. For example, if there is no use case for getting the value of a certain cell from the database (the data is always loaded at row level, or even in groups of rows), then there is no need to have a 'cell' stored as such.

That is a good question that calls for many answers, depending on how you approach it. I'd love to share an opinion with you.
This topic is one of several we have researched at Zenkit. We even wrote an article about it and would love your opinion on it: https://zenkit.com/en/blog/spreadsheets-vs-databases/
