HBase and one-to-many relations

I have one question which can be best described by the following scenario.
Suppose I have three tables: BaseCategory, Category and Product. In RDBMS terms, the relationships among these tables are:
1- One BaseCategory has many Categories.
2- One Category has many Products.
Now I am thinking of converting this to HBase. Can anybody help me map these relations into HBase?

You'd probably have each row represent a supercategory/category pair (encoded with a separator, e.g. MySuperCategory:MyCategory), and a column family named "products" with a column for each product in that category.
This would allow you to very quickly retrieve all of the items in a given supercategory/category pair. Because HBase stores rows sorted by row key, you can also get all of the items in a supercategory with a prefix scan on "MySuperCategory:" plus some de-duplication (a product may appear under several categories).
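A minimal sketch of that row-key layout, assuming no HBase client at all: a plain Python dict stands in for the table, and sorting its keys mimics HBase's lexicographic row order so the supercategory lookup can be written as a prefix scan. The helper names (put, get_category, scan_supercategory) and the sample data are hypothetical.

```python
from bisect import bisect_left

# In-memory stand-in for an HBase table (NOT an HBase client).
# Row key = "<supercategory>:<category>"; column family "products"
# holds one column qualifier per product in that category.
table: dict[str, dict[str, dict[str, str]]] = {}

def put(supercat: str, cat: str, product: str, value: str = "") -> None:
    row_key = f"{supercat}:{cat}"
    table.setdefault(row_key, {}).setdefault("products", {})[product] = value

def get_category(supercat: str, cat: str) -> list[str]:
    """Single-row get: every product in one supercategory/category pair."""
    row = table.get(f"{supercat}:{cat}", {})
    return sorted(row.get("products", {}))

def scan_supercategory(supercat: str) -> list[str]:
    """Prefix scan over sorted row keys, de-duplicating products."""
    keys = sorted(table)              # HBase keeps rows in this order natively
    prefix = f"{supercat}:"
    seen: set[str] = set()
    for key in keys[bisect_left(keys, prefix):]:
        if not key.startswith(prefix):
            break                     # past the end of the prefix range
        seen.update(table[key].get("products", {}))
    return sorted(seen)

# Usage with hypothetical data: PhoneX is filed under two categories.
put("Electronics", "Phones", "PhoneX")
put("Electronics", "Cameras", "PhoneX")
put("Electronics", "Cameras", "CamY")
put("Garden", "Tools", "Rake")
```

The trailing separator in the prefix matters: scanning "Electronics:" will not pick up rows for a supercategory named "Electronics2". It also means the separator character must never appear inside a category name.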


How to structure SQL tables with one (non-composite) candidate key and all non-primary attributes?

I'm not very familiar with relational databases but here is my question.
I have some raw data that's collected as a result of a customer survey. For each customer who participated, there is only one record, and it is uniquely identifiable by the CustomerId attribute. I believe all other attributes are non-prime attributes: none of them depends on another, only on the non-composite candidate key. Also, all columns are atomic, i.e., none can be split into multiple columns.
For example, the columns are CustomerId (non-sequential), Race, Weight, Height, Salary, EducationLevel, JobFunction, NumberOfCars, NumberOfChildren, MaritalStatus, GeneralHealth, MentalHealth, and I have 100+ columns like this in total.
So, as far as I understand we can't talk about any form of normalization for this kind of dataset, am I correct?
However, given the excessive number of columns, if I wanted to split this monolithic table into tables with fewer columns, i.e., based on some categorisation of columns like demographics, health, employment, etc., is there a specific name for such a structure/approach in the literature? All the tables would still use CustomerId as their primary key.
Yes, this is part of an assignment, and the task requires fitting this dataset into a relational DB, not a document DB (which I don't think would gain anything in this case anyway).
So there is no direct question beyond what I worded above, but creating a table with 100+ columns doesn't feel right to me. What I am trying to understand is how the theory approaches such blobs. Some concept names or ideas for further investigation would be appreciated, as I don't even know how to look this up.
In relational databases, putting all the information in a single table is not good practice.
As you mentioned, grouping some columns into other tables and joining them all to a master table works well. This approach also lets you manage one-to-many, many-to-one and many-to-many relationships, for example when a customer has more than one address or phone number.
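Splitting a wide table's columns into several narrower tables that share the same primary key is usually called vertical partitioning. A minimal sketch using Python's built-in sqlite3, assuming hypothetical table and column names (customer_demographics, customer_health, etc.); each topic table keys on customer_id, so it holds at most one row per customer and the split is one-to-one.

```python
import sqlite3

# Hypothetical topic tables; each uses customer_id as its primary key,
# so each holds at most one row per customer (a 1:1 split of the wide table).
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY
);
CREATE TABLE customer_demographics (
    customer_id        INTEGER PRIMARY KEY REFERENCES customer(customer_id),
    marital_status     TEXT,
    number_of_children INTEGER
);
CREATE TABLE customer_health (
    customer_id    INTEGER PRIMARY KEY REFERENCES customer(customer_id),
    general_health TEXT,
    mental_health  TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO customer VALUES (1)")
conn.execute("INSERT INTO customer_demographics VALUES (1, 'Single', 3)")
conn.execute("INSERT INTO customer_health VALUES (1, 'Good', 'Fair')")

# Reassemble the original wide record by joining on the shared key.
# LEFT JOIN keeps customers who have no row in a given topic table.
row = conn.execute("""
    SELECT c.customer_id, d.marital_status, d.number_of_children, h.general_health
    FROM customer AS c
    LEFT JOIN customer_demographics AS d ON d.customer_id = c.customer_id
    LEFT JOIN customer_health       AS h ON h.customer_id = c.customer_id
""").fetchone()
```

A one-to-many attribute such as phone numbers would instead get its own table keyed on (customer_id, phone), which is where the split starts to pay off beyond readability.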
Another approach is to make a table like customer_properties with columns such as property_type and property_value, and store the data as rows:
customer_id  property_type   property_value
1            num_of_child    3
1            age             22
1            marital_status  Single
...
But the first approach is more effective and by far the most common.
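The row-per-property layout is generally known as the entity-attribute-value (EAV) model. A minimal sqlite3 sketch, reusing the table and column names from the answer, that stores the sample rows and pivots them back into columns with conditional aggregation. It also shows one cost of EAV: every value comes back as TEXT, because a single property_value column cannot carry a per-attribute type.

```python
import sqlite3

# EAV sketch: one row per (customer, property) pair.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_properties (
        customer_id    INTEGER NOT NULL,
        property_type  TEXT    NOT NULL,
        property_value TEXT,
        PRIMARY KEY (customer_id, property_type)
    )
""")
conn.executemany(
    "INSERT INTO customer_properties VALUES (?, ?, ?)",
    [(1, "num_of_child", "3"), (1, "age", "22"), (1, "marital_status", "Single")],
)

# Pivot rows back into columns: one CASE expression per wanted property.
pivot = conn.execute("""
    SELECT customer_id,
           MAX(CASE WHEN property_type = 'age' THEN property_value END) AS age,
           MAX(CASE WHEN property_type = 'marital_status' THEN property_value END) AS marital_status
    FROM customer_properties
    GROUP BY customer_id
""").fetchall()
```

Every query that wants a "column" back has to spell out its own CASE expression, which is part of why the grouped-tables approach above is usually preferred.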

Is it better to store products and categories in one table, or to use separate tables for categories and products?

I am totally confused about whether to store category and product data in the same table, using a hierarchical (parent-child) relation, or to create two separate tables for categories and products.
In the table above, I used the same table for storing categories and products, with parentId and childId. What are the benefits of doing it this way? Or should we use separate tables for categories and products, and why? Please, can anyone help me?
This really depends on whether the relationship between store and product categories is one-to-many or many-to-many, and whether a single category can belong to only one category tree.
If the relationship is one to many, and a category can only belong to one tree, then you'll be able to use a single table, with a foreign key referencing the same table.
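A minimal sketch of the single-table case (an adjacency list), assuming a hypothetical category table with a parent_id column that references the same table. SQLite's WITH RECURSIVE clause then walks a whole subtree in one query.

```python
import sqlite3

# Adjacency list: each row points at its parent in the same table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE category (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES category(id)  -- NULL for a root
    )
""")
conn.executemany("INSERT INTO category VALUES (?, ?, ?)", [
    (1, "Electronics", None),
    (2, "Phones", 1),
    (3, "Smartphones", 2),
    (4, "Garden", None),
])

def subtree(root_id: int) -> list[str]:
    """Names of root_id and all of its descendants, via a recursive CTE."""
    rows = conn.execute("""
        WITH RECURSIVE tree(id, name) AS (
            SELECT id, name FROM category WHERE id = ?
            UNION ALL
            SELECT c.id, c.name FROM category AS c JOIN tree AS t ON c.parent_id = t.id
        )
        SELECT name FROM tree ORDER BY id
    """, (root_id,)).fetchall()
    return [r[0] for r in rows]
```

Because parent_id is a single column, each category can have only one parent, which is exactly the "one tree per category" condition above.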
Otherwise, you'll likely be looking at 2 or 3 tables. At a minimum you'll need one table for your categories, and another for your relationship (what's known as a composite key table, also called a junction or associative table).
Also, if Product categories and Store Categories are inherently different (hold different data) then you should be using separate tables for them anyway.
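For the many-to-many case, a minimal sketch of the relationship table, assuming hypothetical names category and category_link. The composite primary key (parent_id, child_id) both identifies each link and prevents duplicate links, while letting one category sit under several parents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Relationship table with a composite primary key:
    -- a child may appear under many parents, but each pair only once.
    CREATE TABLE category_link (
        parent_id INTEGER NOT NULL REFERENCES category(id),
        child_id  INTEGER NOT NULL REFERENCES category(id),
        PRIMARY KEY (parent_id, child_id)
    );
""")
conn.executemany("INSERT INTO category VALUES (?, ?)",
                 [(1, "Sale"), (2, "Electronics"), (3, "Headphones")])
# "Headphones" belongs to two trees at once.
conn.executemany("INSERT INTO category_link VALUES (?, ?)", [(1, 3), (2, 3)])

def parents_of(child_id: int) -> list[str]:
    """Names of every category that directly contains child_id."""
    return [r[0] for r in conn.execute("""
        SELECT p.name
        FROM category_link AS l JOIN category AS p ON p.id = l.parent_id
        WHERE l.child_id = ?
        ORDER BY p.name
    """, (child_id,))]
```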

How do I model many-to-many relationships with tables that have similar attributes?

Here's a fairly straightforward many-to-many mapping of Nerf gun toys to the price range that they fall under. The Zombie Strike and Elite Retaliator are pricey, while both the Jolt Blaster and Elite Triad are cheaper (in the $5.00-$9.99 range).
So far so good. But what happens when I want to start tracking the prices of other items? These other items have different columns, but still need PRICE_RANGES mappings. So I can potentially still use the PRICE_RANGES table, but I need other tables for the other items.
Let's add board games. How should I model this new table, and others like it?
Should I add multiple many-to-many tables, one for each new type of item I'm tracking?
Or should I denormalize PRICE_RANGES, get rid of the mapping tables altogether, and just duplicate PRICE_RANGES tables for every item type?
The second solution has the advantage of being much simpler, but at the cost of duplicating all the ranges in PRICE_RANGES (and there may be many thousands of PRICE_RANGES, depending on how small the increments are). Is that denormalization still a valid solution?
Or maybe there's a third way that's considered better than these two?
Thanks for the help!
Why do you have a "price ranges" table at all? That would make it highly restrictive. Unless there is a really compelling reason I am missing... Here is what I would consider.
Drop the mapping tables
Drop the price ranges tables
Add a min price and max price to each table you want to track price ranges. If there is no range, you can either allow max price to be null, or make both be the same price. Then you can just query the tables to find items within whatever range you want.
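A minimal sketch of the min/max approach in sqlite3. The table name nerf_gun and all the prices are made up for illustration; a NULL max_price is treated as "same as min_price" via COALESCE, as the answer suggests.

```python
import sqlite3

# Price bounds live on the item itself; no PRICE_RANGES or mapping tables.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE nerf_gun (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        min_price REAL NOT NULL,
        max_price REAL            -- NULL means a single fixed price
    )
""")
conn.executemany(
    "INSERT INTO nerf_gun (name, min_price, max_price) VALUES (?, ?, ?)",
    [("Jolt Blaster", 5.99, None),      # hypothetical prices
     ("Elite Triad", 7.50, 9.99),
     ("Zombie Strike", 29.99, 34.99)],
)

def in_range(low: float, high: float) -> list[str]:
    """Items whose whole price span lies within [low, high]."""
    return [r[0] for r in conn.execute("""
        SELECT name FROM nerf_gun
        WHERE min_price >= ? AND COALESCE(max_price, min_price) <= ?
        ORDER BY name
    """, (low, high))]
```

This uses containment ("within the range"); if you instead want any item whose span merely overlaps the requested range, swap the conditions to min_price <= high AND COALESCE(max_price, min_price) >= low.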
Another thought I would consider: how many different types of products are you trying to track? If you make a separate table for every single kind of product, that will quickly become unmanageable if you expect to have hundreds or thousands of items. Consider a single "Product" table with columns for the attributes shared across all products, such as price. It would have a ProductType column that either references a lookup table or stores the type directly. Then either add a separate key/value table to cover other one-off attributes like bolt capacity, or put them in an XML/JSON/blob column that holds all the extra bits of information.
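A minimal sketch of the single Product table with a JSON column for type-specific attributes. Table, column and attribute names (product_type, extra, bolt_capacity, players) and the rows are hypothetical; the JSON is encoded and decoded in Python with the json module, so this does not rely on SQLite's JSON functions being available.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        product_type TEXT NOT NULL,   -- or a FK to a product_type lookup table
        min_price    REAL NOT NULL,   -- shared attributes get real columns
        max_price    REAL,
        extra        TEXT             -- JSON with type-specific attributes
    )
""")
rows = [
    ("Elite Triad", "nerf_gun", 7.50, 9.99, {"bolt_capacity": 3}),
    ("Chess Set", "board_game", 12.00, None, {"players": 2}),
]
conn.executemany(
    "INSERT INTO product (name, product_type, min_price, max_price, extra) "
    "VALUES (?, ?, ?, ?, ?)",
    [(name, ptype, lo, hi, json.dumps(extra)) for name, ptype, lo, hi, extra in rows],
)

def products_of_type(product_type: str) -> list[tuple[str, dict]]:
    """(name, decoded extra attributes) for every product of one type."""
    return [
        (name, json.loads(extra))
        for name, extra in conn.execute(
            "SELECT name, extra FROM product WHERE product_type = ? ORDER BY name",
            (product_type,),
        )
    ]
```

The trade-off: the database cannot index or constrain anything inside extra, so attributes you filter on often are better promoted to real columns or a key/value table.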

Merging tables with common attributes

I am designing a database that contains JOBSEEKERS who can be matched to VACANCIES. I am looking for an efficient and good way to store the common attributes between the two (there are a lot). For example, a JOBSEEKER has skills and a VACANCY has required skills; or a JOBSEEKER has a salary requirement and a VACANCY has a salary offer.
Right now I am considering two options:
Storing all the attributes of each entity in its own table.
Creating another table that contains the common attributes. Each row would represent the attributes for either a VACANCY or JOBSEEKER. I would then link each record to either a VACANCY or a JOBSEEKER.
Which is the correct way of going about this? Other suggestions are also welcome.
JobSeeker and Vacancy are two separate entities. In most cases, you would store the values in separate tables with separate columns. Although they have overlapping attributes, they have many attributes that are not common.
Then, use application logic (often implemented in SQL) to match between the two.
For something like skills, you actually want junction tables: JobseekerSkills and VacancySkills to list each of the skills. These would, in turn, reference another table Skills to ensure that the skills are common between the two entities.
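A minimal sketch of those junction tables plus one possible matching query, using sqlite3. Table names are snake_case versions of the ones in the answer; the sample people, vacancy and skills are hypothetical, and "rank by number of shared skills" is only one illustrative matching rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE skill     (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE jobseeker (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE vacancy   (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Both junction tables reference the same skill table,
    -- so a jobseeker's skill and a vacancy's required skill are comparable.
    CREATE TABLE jobseeker_skill (
        jobseeker_id INTEGER NOT NULL REFERENCES jobseeker(id),
        skill_id     INTEGER NOT NULL REFERENCES skill(id),
        PRIMARY KEY (jobseeker_id, skill_id)
    );
    CREATE TABLE vacancy_skill (
        vacancy_id INTEGER NOT NULL REFERENCES vacancy(id),
        skill_id   INTEGER NOT NULL REFERENCES skill(id),
        PRIMARY KEY (vacancy_id, skill_id)
    );
    INSERT INTO skill VALUES (1, 'SQL'), (2, 'Python'), (3, 'Go');
    INSERT INTO jobseeker VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO vacancy VALUES (1, 'Data Engineer');
    INSERT INTO jobseeker_skill VALUES (1, 1), (1, 2), (2, 3);
    INSERT INTO vacancy_skill VALUES (1, 1), (1, 2);
""")

def matches(vacancy_id: int) -> list[tuple[str, int]]:
    """Jobseekers ranked by how many of the vacancy's required skills they have."""
    return conn.execute("""
        SELECT js.name, COUNT(*) AS shared
        FROM vacancy_skill AS vs
        JOIN jobseeker_skill AS jss ON jss.skill_id = vs.skill_id
        JOIN jobseeker AS js        ON js.id = jss.jobseeker_id
        WHERE vs.vacancy_id = ?
        GROUP BY js.id, js.name
        ORDER BY shared DESC, js.name
    """, (vacancy_id,)).fetchall()
```

Scalar pairs such as salary requirement vs. salary offer would stay as ordinary columns on each table and be compared directly in the matching query.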

Products database design for product lines, categories, manufacturers, related software, product attributes, etc

I am redeveloping the front end and database for a medium size products database so that it can support categories/subcategories, product lines, manufacturers, supported software and product attributes. Right now there is only a products table. There will be pages for products by line, by category/subcategory, by manufacturer, by supported software (optional). Each page will have additional filtering based on the other classifications.
Categories/Subcategories (multi level)
Products and product lines can be assigned to multiple category trees. Up to 5 levels deep should be supported.
Product lines (single level)
Groups of products. A product can only be in a single product line.
Manufacturers (single level)
Products and product lines can each be assigned to a single manufacturer.
Supported software (single level)
Certain products only work with one or more software packages, so a product/line can be assigned to zero, one or more software packages.
Attributes (type / options; could be treated so that each type is a category and its options are children)
Products and product lines can be assigned attributes (e.g. color > red / blue / green). Attributes should be able to be assigned to one or more categories.
Since all these items are basically types of subcategories, do I put them all together in a master table OR split them into separate tables for each one?
Master table idea:
ClassificationTypes (product line, category/sub, manufacturer, software, attribute would all be types)
-TypeID
-Name
Classifications
-ClassID
-TypeID
-ParentClassID
-Name
ClassificationsProductsAssociations
-ProductID
-ClassID
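A minimal sqlite3 sketch of the master-table idea, using the table and column names listed above. The sample rows are hypothetical, and products are bare integer IDs because no Products table is part of this design. The query shows the per-page filtering described earlier: products that carry all of several classifications at once (e.g. category "Laser" and manufacturer "Acme"), done by counting matching association rows per product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ClassificationTypes (
        TypeID INTEGER PRIMARY KEY,
        Name   TEXT NOT NULL
    );
    CREATE TABLE Classifications (
        ClassID       INTEGER PRIMARY KEY,
        TypeID        INTEGER NOT NULL REFERENCES ClassificationTypes(TypeID),
        ParentClassID INTEGER REFERENCES Classifications(ClassID),
        Name          TEXT NOT NULL
    );
    CREATE TABLE ClassificationsProductsAssociations (
        ProductID INTEGER NOT NULL,
        ClassID   INTEGER NOT NULL REFERENCES Classifications(ClassID),
        PRIMARY KEY (ProductID, ClassID)
    );
    INSERT INTO ClassificationTypes VALUES (1, 'category'), (2, 'manufacturer');
    INSERT INTO Classifications VALUES
        (10, 1, NULL, 'Printers'), (11, 1, 10, 'Laser'), (20, 2, NULL, 'Acme');
    INSERT INTO ClassificationsProductsAssociations VALUES
        (100, 11), (100, 20), (101, 11);
""")

def products_with_all(class_ids: list[int]) -> list[int]:
    """ProductIDs associated with every ClassID in class_ids."""
    placeholders = ",".join("?" * len(class_ids))
    rows = conn.execute(f"""
        SELECT ProductID FROM ClassificationsProductsAssociations
        WHERE ClassID IN ({placeholders})
        GROUP BY ProductID
        HAVING COUNT(DISTINCT ClassID) = ?
        ORDER BY ProductID
    """, (*class_ids, len(class_ids))).fetchall()
    return [r[0] for r in rows]
```

Since every classification type lives in the same two tables, this one query serves any combination of filters; the cost is that nothing in the schema stops, say, a product from being linked to two manufacturers.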
I would still need at least one more table to link types together (e.g. to link attributes to a category) and a way to link product lines to various types.
If I go with a table for each type it can get messy quick and I will still need a way to link everything together.
Multiple table setup:
Categories
-CategoryID
-Name
-ParentCategoryID
CategoriesAssociations
-CategoryID
-ProductID
-ProductLineID ?
Attributes
-AttributeID
-Name
-ParentAttributeID (the parent would be "color" and the child would be "red")
AttributesAssociations
-AttributeID
-ProductID
-CategoryID (do I also need to link the category to the parent attribute?)
CompatibleSoftware
-SoftwareID
-Name
CompatibleSoftwareAssociations
-SoftwareID
-ProductID
-ProductLineID ?
Manufacturers
-ManufacturerID
-Name
ProductLines
-ProductLineID
-ManufacturerID
-Name
Products
-ProductID
-ProductLineID
-ManufacturerID
-Name
Other option for associations is to have a single associations table to link the tables above:
Master Associations
-ProductID
-ProductLineID
-ManufacturerID
-CategoryID
-SoftwareID
-AttributeID
What is the best solution?
Go for multiple tables: in my opinion it makes the design more obvious and more extensible. While a single table may fit your needs now, further changes may be more difficult.
I agree with Paddy. It makes your life easier in the future and gives you much more flexibility. You might want to add stock control and other things later. To link everything together, use the tables' integer IDs as parent/child keys.
I think multiple tables is the way to go, but to really know, do this: Flesh out the design for both ways and then take a sample of 5-10 products.
Populate the tables in both designs for the 5-10 products.
Now start writing the queries for both ways. You will start to see which is easier to write (the single table I bet), and you might find cases that only work in one design (the multi-table I bet.)
When you are done you have not lost the work -- you can use the table schema to move forward and some of your queries will already be written.
If you get to a query that does not make sense, seems too complicated, or the like, you can post it here and get feedback; having real code always gets better comments.
Just wanted to post my decision: since I was not satisfied with any of the answers provided, I have elected to answer my own question.
I ended up setting up a single set of tables:
Classification Types (eg - product lines, categories, manufacturers, etc)
Classifications (supporting a parent/child adjacency list, nested sets and a materialized path all at once, in order to take advantage of the strengths of each. I have a SQL CTE that can populate all the fields in one go when the data changes)
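A minimal sketch of how a recursive CTE can derive materialized-path and depth columns from the adjacency list, assuming hypothetical Path and Depth columns on Classifications. It covers only the materialized path, not nested sets, and it computes the values with a SELECT and writes them back from Python rather than claiming to reproduce the author's actual CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Classifications (
        ClassID       INTEGER PRIMARY KEY,
        ParentClassID INTEGER REFERENCES Classifications(ClassID),
        Name          TEXT NOT NULL,
        Path          TEXT,      -- materialized path, e.g. '/1/2/'
        Depth         INTEGER    -- 0 for roots
    );
    INSERT INTO Classifications (ClassID, ParentClassID, Name) VALUES
        (1, NULL, 'Printers'), (2, 1, 'Laser'), (3, 2, 'Color'), (4, NULL, 'Scanners');
""")

def rebuild_paths() -> None:
    """Recompute Path and Depth for every row from ParentClassID."""
    rows = conn.execute("""
        WITH RECURSIVE tree(ClassID, Path, Depth) AS (
            SELECT ClassID, '/' || ClassID || '/', 0
            FROM Classifications WHERE ParentClassID IS NULL
            UNION ALL
            SELECT c.ClassID, t.Path || c.ClassID || '/', t.Depth + 1
            FROM Classifications AS c JOIN tree AS t ON c.ParentClassID = t.ClassID
        )
        SELECT ClassID, Path, Depth FROM tree
    """).fetchall()
    conn.executemany(
        "UPDATE Classifications SET Path = ?, Depth = ? WHERE ClassID = ?",
        [(path, depth, class_id) for class_id, path, depth in rows],
    )

def subtree_names(class_id: int) -> list[str]:
    """A whole subtree via one prefix match on the materialized path."""
    (path,) = conn.execute(
        "SELECT Path FROM Classifications WHERE ClassID = ?", (class_id,)
    ).fetchone()
    return [r[0] for r in conn.execute(
        "SELECT Name FROM Classifications WHERE Path LIKE ? ORDER BY Path",
        (path + "%",),
    )]

rebuild_paths()  # rerun whenever the hierarchy changes
```

The trailing "/" in each path keeps the prefix match exact ('/1/%' cannot match '/10/'), and reads no longer need recursion once the paths are stored.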
Classifications Relations (with ability to relate products to classifications, relate classifications to other classifications and also relate classifications to other types)
I will admit that the solution is not 100% normalized, but this setup gives me ultimate flexibility to expand by creating new types and is very powerful and easy to query.