We are currently developing a online advert site for people to buy and sell (similar to gumtree) difference being this will be used for employees who work for the company, it wont be reachable from people outside the company.
Now we have 15 categories which have sub categories and those sub categories have child categories.
We have a main table called Adverts which consists on ItemId, Title, SubTitle, Description, CreatedBy, BroughtBy, StartDate, EndDate and ParentCategoryId, SubCategoryId and ChildCategoryId etc
Now instead of having one massive tables which consists of all the details for the item they are selling we were going to create separate table(s) per category for the details of the item.
So we would have Advert.Vehicle_Spec which would have all the details about a car they were selling i.e
ItemId (which will be a FK to the main Advert table), Make, Model, Colour, Mot, Tax etc
That way when we query the main table Advert we can join onto the relevant Spec table which in a way would keep the tables clean and tidy now my question would be to you is this a good approach? Would there be any performance issues with this approach? I will create all the relevant FK where needed to help with queries etc.
I did ask this question on an SQL Server forum and one individual suggested using XML - each category gets an XML schema and the XML tags and values are held in a single field but the data varies depending on what type of item is being sold. This requires more setup but probably has the best overall balance of performance and flexibility, I Personally have never worked with XML within SQL so I can't comment on this being a good approach or not?
Each category can have many different status's we have a variety of tables already which hold the description of each status, the queries we will be performing will vary from select, delete, insert, update some queries will have multiple joins on to the Status/User table, we will also be implementing a "Suggested" form which will show all records suggested for a user depending on what they search for.
Is XML right for this in regards to flexibility and performance?
XML seems to be a good approach for this, you can directly write stored procedures that queries the specific categories you want and organize them into tables and display them. You will then possibly want to use something like XSLT to extract the XML data and display them in a table.
Related
I'm working in a project that uses DynamoDB for most persistent data. I'm now trying to model a data structure that more resembles what one would model in a traditional SQL database, but I'd like to explore the possibilities for a good NoSQL design also for this kind of data.
As an example, consider a simple N-to-N relation such as items grouped into categories. In SQL, this might be modeled with a connection table such as
items
-----
item_id (PK)
name
categories
----------
category_id (PK)
name
item_categories
---------------
item_id (PK)
category_id (PK)
To list all items in a category, one could perform a join such as
SELECT items.name from items
JOIN item_categories ON items.item_id = item_categories.item_id
WHERE item_categories.category_id = ?
And to list all categories to which an item belongs, the corresponding query could be made:
SELECT categories.name from categories
JOIN item_categories ON categories.category_id = item_categories.category_id
WHERE item_categories.item_id = ?
Is there any hope in modeling a relation like this with a NoSQL database in general, and DynamoDB in particular, in a fairly efficient way (not requiring a lot of (N, even?) separate operations) for a simple use-case like the ones above - when there is no equivalent of JOINs?
Or should I just go for RDS instead?
Things I have considered:
Inline categories as an array within item. This makes it easy to find the categories of an item, but does not solve getting all items within a category. And I would need to duplicate the needed attributes such as category name etc within each item. Category updates would be awkward.
Duplicate each item for each category and use category_id as range key, and add a GSI with the reverse (category_id as hash, item_id as range). De-normalizing being common for NoSQL, but I still have doubts. Possibly split items into items and item_details and only duplicate the most common attributes that are needed in listings etc.
Go for a connection table mapping items to categories and vice versa. Use [item_id, category_id] as key and [category_id, item_id] as GSI, to support both queries. Duplicate the most common attributes (name etc) here. To get all full items for a category I would still need to perform one query followed by N get operations though, which consumes a lot of CU:s. Updates of item or category names would require multipe update operations, but not too difficult.
The dilemma I have is that the format of the data itself suits a document database perfectly, while the relations I need fit an SQL database. If possible I'd like to stay with DynamoDB, but obviously not at any cost...
You are already in looking in the right direction!
In order to make an informed decision you will need to also consider the cardinality of your data:
Will you be expecting to have just a few (less then ten?) categories? Or quite a lot (ie hundreds, thousands, tens of thousands etc.)
How about items per category: Do you expect to have many cagories with a few items in each or lots of items in a few categories?
Then, you need to consider the cardinality of the total data set and the frequency of various types of queries. Will you most often need to retrieve only items in a single category? Or will you be mostly querying to retrieve items individually and you just need stayistics for number of items per category etc.
Finally, consider the expected growth of your dataset over time. DynamoDB will generally outperform an RDBMS at scale as long as your queries partition well.
Also consider the acceptable latency for each type of query you expect to perform, especially at scale. For instance, if you expect to have hundreds of categories with hundreds of thousands of items each, what does it mean to retrieve all items in a category? Surely you wouldn't be displaying them all to the user at once.
I encourage you to also consider another type of data store to accompany DynamoDB if you need statistics for your data, such as ElasticSearch or a Redis cluster.
In the end, if aggregate queries or joins are essential to your use case, or if the dataset at scale can generally be processed comfortably on a single RDBMS instance, don't try to fit a square peg in a round hole. A managed RDBMS solution like Aurora might be a better fit.
Background
I work for a real estate technology company. An upcoming project involves building out functionality to allow users to affix tags/labels (plural) to a MLS listing (real estate property). The second requirement is to allow a user to search by one or more tags. We won't be dealing with keeping track of counts or building word clouds or anything like that.
Solutions Researched
I found this SO Q&A and think the solution is pretty straightforward and have attempted to adapt some ideas from it below. Also, I understand that JSONB support is much better in 9.5 and it may be a possibility. If you have any insight here I'd love to hear your thoughts as well in an answer.
Attempted Solution
Table: Tags
Columns: ID, OwnerID, TagName, CreatedDate
Table: TaggedItems
Columns: ID, TagID (references above), PropertyID, CreatedDate, (Possibly some denormalized data to assist with presenting search results; property name, original listor, etc.)
Inserting new tags should be straightforward. Searching tags should also be straightforward since the user will select one or multiple tags from a searchable dropdown, thus affording me access to the actual TagID which I can use to query the TaggedItems table. When showing the full profile view for a listing, I can use it's PropertyID and the UserID to query my tables for the existence of one or more Tags to display in the view.
Edit: It's probably worth noting that we don't keep an entire database of properties, we access them via an API partner; hence the two table solution and not 3.
If you want to Nth normalize you would actually use 3 tables.
1 Property/Listing
2 Tags
3 CrossReferenceBetween the Two
The 3rd table creates a many to many relationship between the other 2 tables.
In this case only the 3 rd table would carry both the tagid and the property.
Going with 2 tables if fine too depending on how large of use you have as a small string won't bloat your databse too much.
I would say that it is strongly preferable to separate the tags to a separate table when you need to do lookups and more on it. Otherwise you have to have a delimited list which then what happens if a user injects a delimiter into their tag value? Also how do you plan on searching the delimited list? You will constantly expand that to a table or use regex and the regex might give you false positives as "some" will match "some" and "something" depending on how you write your code.......
I have made a website in which I have a blog and a products page. Posts and products are stored in different tables. I use microsoft sql server.
I want to create a table to store the views for each post and for each product. My 2 possible designs are:
One table for all views
(id, ref_id, date, ip, type)
where type is either post or product
Two separate tables, table post_views and table product_views
post_views(id, post_id, date, ip)
product_views(id, product_id, date, ip)
Which design is better and why?
My reasoning:
Solution 1 requires more database space (we need to store the type value for each view), also requires a more "complex" query. We need to search by id and type.
Solution 1 is more compact. We have less tables, but the performance won't be so great. If we have 1 million records and 500k views are for the posts and 500k views are for the products, we would have to search all the views to filter by date (just an example)
Pros of 1st solution are the compactness and that I use one less table.
The second solution requires one more table, but the query performance and disk space would be better.
This is a very common design question that I face and I would love to receive an answer from a very good expert.
Since posts and products each have their own tables, go with the second option - meaning different tables for post views and product views.
The reason it's a better option is that this option allows you to use foreign keys between the views tables and the posts and products tables and keep the posts and products separated.
the first option will also allow you to use foreign key between these tables, but it will mean that you can only post views where the ref_id exists in both posts and products tables. Also, it will force more cumbersome select statements to include different joins based on the type column.
With no more information than you have provided, I would go with the first option, simply because it is more concise. Unless you have a compelling reason to break them into two tables, you should keep them in one. It's easy enough to filter any queries by type as needed.
Another consideration is that it's more common practice in stored procs to have a variable field parameter (such as your type field) than it is to have a variable table name, so the function of any stored procs that you create will be a bit more obvious to those who come after you if you keep them in one table.
I have a database in which I store a large amount of user-created products.
The user can also create "views" which works as a container holding these products, lets call them categories.
It would be simple if each product had a category-field containing the id of the category, however each product can be added to multiple categories so that doesn't work.
Presently I solve this by having a string-field "products" in the category-table which is a comma-separated list of product-ids.
What I'm wondering is basically if it's "okay" to do it this way? Is it generally accepted? Will it cause some kind of problem I'm not realizing?
Would it be better to create another table named something like productsInCategories which has 2 fields, one with a category-id and one with product-id and link them together this way?
Will one of these methods perform better or be better in some other way?
I'm using sqlce at the moment if that matters, but that will most likely change soon.
I would go for the second option: a separate table.
Makes it easier to handle if you need to query from the product perspective. Also the join to the categories will be simple and fast. This is exactly what relational databases are made for.
Imagine a simple query like what categories a product is in. With your solution you need to check all categories one by one, parse the csv-list of each category to find the products. With a separate table it is one clean query.
I have a problem, I am designing a database which will store different products and each product may have different details.
As an example it will need to store books with multiple authors and store software with different types of descriptions.
This is my current design:
Product_table
|ID|TYPE|COMPANY|
|1|1|1|
attr_table
|ID|NAME|
|1|ISBN10|
|2|ISBN13|
|3|Title|
|4|Author|
details_table
|ID|attr_id|value
|1|3|Book of adventures|
Connector_table
|id|pro_id|detail_id|
|1|1|1|
So the product table would only store the main product id, the company it belongs to and the type of product it is.
Then I would have the attribute table which lists each attribute a product could have, this will make it easier to add new types of products.
The details table will the hold all the values such as different authors, titles isbn10s etc.
And then the connector table would connect the product table and the details table.
My main worry is that the details table will get very large and will be storing lots of different data types.
What i would like would be to split up all of the different types into tables such as ISBN table and author tables.
If this is the case how could i link these tables up to the attr_table
Any help would be greatly appreciated.
Don't bother. You do not say what database you are using, but any reasonable database will be able to handle the details table. Databases are designed to handle big tables efficiently.
If it is really big, you might want to consider partitioning the table by some sort of theme.
Otherwise, just be sure that you have an index on the id in the table and probably on the attr_id as well. The structure should work fine.