Product catalogue storage in mongoDB from an RDBMS perspective - sql

I have a product page with an URL of the form http://host/products/{id}/{seo-friendly-url}, where the "seo-friendly-url" part may be something like category/subcategory/product.
The products controller gets the product with the specified ID and then ensures that the URL that follows is correct for the product - if it isn't the user is redirected to the appropriate URL (all URLs in the shop are generated correctly though, the redirect is just to maintain a canonical URL in the case of mistyping by the user or the URL changing since Google crawled it etc). The ID ensures fast product look-up, and the part on the end ensures keywords make it into the URL.
To check the URL, I have a SQL view which utilises a recursive common table expression to concatenate the product URL chunk with the URLs of its parent category URLs all the way up the hierarchy (generally just 3 deep).
I've recently come across document-oriented storage and I can see it being very useful in a variety of situations (for example, my product entities have tags, multibuy prices, attributes etc., all in different tables currently).
So on to my question - how can I achieve the above functionality in mongoDB, or is there a better way to think about it? The naive way would be to retrieve each category in the hierarchy individually, but I'm assuming that would be slow.
Related: I've read in the docs that skip/limit is slow for large result sets - would this be noticeable for the maximum of, say, 10 pages of 25 products each likely to be present in a retail website category?

I think your best option is to just store the full slug with the product. Then when you get the product just check to see if the slug matches and if not, redirect. Now, the trade-off is that if you want to rename a category you will need to do a batch job to find all products in the category and change their slugs. The good news is that category renames will be much less common than views (hopefully) so your total load will be reduced.
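A minimal sketch of that check and of the rename batch job, in plain Python with no database driver so it runs as-is. The field names `id` and `slug`, both function names, and the sample values are assumptions for illustration; the `/products/{id}/{slug}` layout comes from the question. In a real setup the dict would be the document fetched from MongoDB and the rename would be a batch update over the collection.

```python
from typing import List, Optional


def canonical_redirect(product: dict, requested_slug: str) -> Optional[str]:
    """Return the canonical URL to redirect to, or None if the slug already matches."""
    stored = product["slug"].strip("/")
    if requested_slug.strip("/") == stored:
        return None  # URL is already canonical, render the page
    return f"/products/{product['id']}/{stored}"


def rename_category_prefix(products: List[dict], old_prefix: str, new_prefix: str) -> int:
    """Batch job: rewrite the stored slug of every product under old_prefix.

    Returns the number of products changed. Matching is done on whole path
    segments, so "tools" does not match "toolsets/...".
    """
    changed = 0
    for product in products:
        slug = product["slug"]
        if slug == old_prefix or slug.startswith(old_prefix + "/"):
            product["slug"] = new_prefix + slug[len(old_prefix):]
            changed += 1
    return changed


# Hypothetical sample data
hammer = {"id": 42, "slug": "tools/hammers/claw-hammer"}
print(canonical_redirect(hammer, "tools/hammers/claw-hammer"))
print(canonical_redirect(hammer, "old-category/claw-hammer"))
```

The view path stays a single lookup by ID plus a string comparison; only the (rare) category rename touches many documents.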
Not sure how skip and limit are related to this question, except that they both involve mongodb. Anyway for 25 results it's really no problem. Limit isn't slow and in fact can speed things up if less than 100 (default first batch size). Skip can hurt performance, but only by making it as slow as if you fetched all skipped documents w/o the extra network traffic. Therefore I wouldn't skip 1 million docs, but skipping 100 would be fine.
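To put numbers on that, here is a small sketch of how a page number maps to skip and limit values. The function names and the default of 25 per page (taken from the question) are assumptions; the list slice stands in for a query, and with a MongoDB cursor the same two numbers would be passed to skip and limit.

```python
from typing import List, Tuple


def page_window(page: int, per_page: int = 25) -> Tuple[int, int]:
    """Translate a 1-based page number into (skip, limit)."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    return (page - 1) * per_page, per_page


def paginate(items: List, page: int, per_page: int = 25) -> List:
    """Apply the skip/limit window to an in-memory list (stand-in for a cursor)."""
    skip, limit = page_window(page, per_page)
    return items[skip:skip + limit]


# Hypothetical category with 250 products
category = list(range(250))
print(page_window(10))             # skip/limit for the last page
print(len(paginate(category, 10)))
```

Even the last of 10 pages only skips 225 documents, which is nowhere near the range where skip becomes a concern.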

You can model a collection called products, with the document like:
product:{id:someId,category:someCategory,subcategory:someSubCategory,productSlug:somenameslug}
The query to get the product given the id, category and subcategory would be something like:
db.products.find({id:123,category:cat,subcategory:subcat})
This sounds pretty simplistic, but given my understanding of your question, IMO this should be a good start.
For your other question, there are skip and limit modifiers to help with pagination.

Related

VirtoCommerce API getting item prices

I am using VirtoCommerce 2.9 and have some questions regarding the API and what would be the best way to get all the information I need, while keeping the number of API requests down.
Right now I am using the endpoint /api/catalog/search to find items that match a number of attributes. But the response does not include prices and product texts, both of which I would like to present to the end user. What would be the correct or best way to retrieve this information?
Thanks!
Cheers!
Currently the search service does not return the description and price for products.
To get these details you need to use separate queries:
api/catalog/product/ids?respGroup='ItemSmall'
to get product detail with description and
api/pricing/evaluate
to retrieve the actual product prices. You can call them in parallel for better performance.
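A rough sketch of issuing the two calls in parallel and merging the results by product ID. The request and response shapes of the VirtoCommerce endpoints are not shown here, so the two fetch functions are passed in as parameters and the `fake_*` stubs are purely hypothetical stand-ins that return dicts keyed by product ID.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_in_parallel(
    product_ids: List[str],
    fetch_details: Callable[[List[str]], Dict[str, str]],
    fetch_prices: Callable[[List[str]], Dict[str, float]],
) -> Dict[str, dict]:
    """Run both lookups concurrently and merge them per product ID."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        details_future = pool.submit(fetch_details, product_ids)
        prices_future = pool.submit(fetch_prices, product_ids)
        details = details_future.result()
        prices = prices_future.result()
    return {
        pid: {"details": details.get(pid), "price": prices.get(pid)}
        for pid in product_ids
    }


# Hypothetical stubs; real versions would call the product-details and
# pricing endpoints over HTTP.
def fake_details(ids: List[str]) -> Dict[str, str]:
    return {i: f"description of {i}" for i in ids}


def fake_prices(ids: List[str]) -> Dict[str, float]:
    return {i: 9.99 for i in ids}


merged = fetch_in_parallel(["a", "b"], fake_details, fake_prices)
print(merged["a"])
```

The total wait is then roughly the slower of the two requests rather than their sum.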
Be careful when using the WithProperties response group because it may cause performance problems. The product is returned with all its property values anyway; this response group is only responsible for retrieving property meta-information (such as possible dictionary values, multilingual, required or optional flags, etc.). This information is often used in the admin area and is almost never used in the storefront.
The indexed search module will be changed significantly in future versions, and you will be able to have more control over the product details in the search index.

REST API - Reduce number of POSTS operations

I have designed an API which has some nested resources and I am wondering how to reduce the number of POSTs when I am creating records.
For example, I have the following resources:
/orders/
and
/orders/{order_id}/products/
At the moment I need to run two separate POSTs if I need to create a new order or a new order's product, but I would like to reduce the time this takes and run only one POST.
Is this possible? Is there any documentation I can read about this?
Thank you
Although you might have found your answer in another thread, there are still some issues with your endpoint design.
The first impression your endpoints give is that the product resource could exist in several places:
./orders/{order_id}/products/{prod_id}
./products/{prod_id}
The question you should ask yourself is: do you really want to refer to a product?
Can a product live outside of any order?
Having a resource sitting in two different places might not be that great, as you are managing two different endpoints with similar behavior. Keeping both endpoints consistent is not that easy.
My two cents: avoid the term product, as it can be confused with a single instance of a product. For example, if you sell a toothbrush branded AAA, SKU 1234, an order is not composed of this product but of one of the items that you have in stock. The item is an "instance" of the toothbrush branded AAA, SKU 1234.
As I understand your question, you are not really referring to a product but rather to a stock-item, which should have a unique ID.
The stock-item resource, if you decide to have one, should exist prior to the order. I guess the customer is not adding an item to your stock and purchasing it at the same time.
In conclusion, I think that you are not creating the stock-item resource at all when creating orders, but just making a reference to it.
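One way to picture that: a single POST to /orders/ whose body references stock items that already exist, instead of a second POST per product. This is only a sketch; the field names `customer_id`, `items` and `stock_item_id`, and the function name, are assumptions and not part of any particular API.

```python
import json
from typing import List


def build_order_payload(customer_id: str, stock_item_ids: List[str]) -> dict:
    """Build one request body that creates the order and references its items."""
    if not stock_item_ids:
        raise ValueError("an order needs at least one stock item")
    return {
        "customer_id": customer_id,
        # References only: the stock items are not created here.
        "items": [{"stock_item_id": sid} for sid in stock_item_ids],
    }


# Hypothetical IDs
payload = build_order_payload("cust-1", ["stock-1234-0001", "stock-1234-0002"])
print(json.dumps(payload, indent=2))
```

The server would then validate that each referenced stock item exists and create the order in one request.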

Will I get a penalty if the last section of my URL repeats?

I've made a system very similar to this site, with URLs of the form site.com/item/{id}/{name}. Now I wonder what happens if my {name} repeats itself, as in this example:
site.com/item/1/something
site.com/item/2/something
site.com/item/3/something
Will I be banned by Google's bots for this, or is it completely normal?
Shouldn't the ID specify a unique row in your database? In that case, there may be instances where name is similar to others, but probably not ever duplicated.
If it were duplicated, however, then yes this would be duplicated content and wouldn't be good for your website. You won't get 'banned from Google bots,' but the pages would have a low score in SERPs. Think of it as a filter, not a penalty.
I would get a penalty if I used /{id}/{name} and the name repeated itself.
What I did instead is /{id}-{name}. This way it will never repeat itself and I will still be able to find the item.
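A small sketch of building and parsing such a /{id}-{name} segment. The function names and the slug rules (lowercase, non-alphanumeric runs collapsed to a hyphen) are assumptions for illustration, not something stated above.

```python
import re
from typing import Optional, Tuple

_SEGMENT = re.compile(r"^(\d+)-(.*)$")


def build_segment(item_id: int, name: str) -> str:
    """Combine the numeric ID and a slugified name into one URL segment."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return f"{item_id}-{slug}"


def parse_segment(segment: str) -> Optional[Tuple[int, str]]:
    """Split '{id}-{name}' back into (id, name); None if there is no leading ID."""
    match = _SEGMENT.match(segment)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


# Items sharing a name still get distinct segments because the ID differs.
for item_id in (1, 2, 3):
    print(build_segment(item_id, "Something"))
```

Because the ID leads the segment, the lookup only needs the digits before the first hyphen, and the name part can change without breaking the link.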

Is there a way to get details of the products from a single industry?

I want to maintain a database of all the products or the brands with respect to industry.
For example I need to get information about all the food supplements. How can I get them?
I am not sure all the companies have an API for their products.
Please advise
Uhm... what kind of information? If you need prices, you can probably get information from government sources. At least you can here in Argentina. Other than that, I don't think it's possible, unless you somehow manage to scrape the websites of all the brands you want to track.
Speaking as someone who has worked for two data-aggregation companies, aggregating data involves a lot of manual work. You find the sources, you automate the acquisition of data as best you can (APIs, file downloads and imports, even screen scraping from HTML pages), and you stay on top of it constantly. You're always looking for additional sources, updating code for sources that have changed, minding legal implications of sources who don't want you to harvest their data, etc.
Sometimes you have to buy the data, or weigh that cost against not having data from that source or scraping it manually. Sometimes a source will block you in some way and you need to either try to get around that or negotiate some terms with them. It's a viable business model, but it's not cheap.
For some products, Retailigence ( http://www.retailigence.com ) may have data in API form. They basically keep track of local stores' inventory and pricing for certain categories of products.
You should definitely check out Good Guide - an API that gives you access to details on over 60,000 household products.
http://developer.goodguide.com
DailyMed is a good service to check out if you're interested in products in the medical space.
http://dailymed.nlm.nih.gov/dailymed

Good URL strategy for sitemap and SEO

I run a site where users have their own profile pages. They are also able to post products for sale (that they have made) and write/import blog posts. I am going to be implementing a sitemap and I need to make a final decision with the URL strategy.
Here's what I currently have for products (where 1234 is the product ID that I use to lookup that product):
N.B "product" is a fixed string (although it's another word in the actual site) - all others are dynamic depending on the item.
example.com/product/1234.product-category.product-name
Should I change to any of these, i.e.:
example.com/maker/users_name/product-category/product-name/1234
example.com/product/product-category/product-name/1234
example.com/product/1234/product-category/product-name
The main items for consideration are:
Where should the product ID go in the URL? Both in terms of readability by the user but also in SEO terms
Should I include the user's name (as he/she made that product) ?
Should I attempt to remove the ID altogether?
I think the first example (example.com/product/1234.product-category.product-name) is the best format, but I would consider changing the "." to "-". I am just thinking that if somehow a product name ends in something that triggers a different handler on your server, like ".php" or ".jsp", you might have some undesired effects.
Where should the product ID go in the URL? Both in terms of readability by the user but also in SEO terms
I don't really think it matters too much where the product ID goes but as far as the user reading it, I think they pay attention to the end of the line so I would put the ID first leaving the most descriptive part (the product name) at the end.
Should I include the user's name (as he/she made that product) ?
Not sure if you allow your users to change user names, but if you do I would leave the user name out. An example would be someone getting married and changing their last name. This would hurt your SEO since the URL would change but search engines would have already indexed it the old way. You'd have to put some permanent redirects in place to handle this, which could be avoided by just leaving the username out.
Should I attempt to remove the ID altogether?
You should leave the ID in the URL in the event that two products have the same name and your algorithm to generate the URL creates a duplicate link.
I prefer this:
example.com/users_name/product-category/product-name/1234
However, one should be aware that the URL sometimes gets too long, which makes it difficult to share or promote in a blog or a forum. Why not simply use
example.com/1234 and use the title to put the other details like category and product name?
Nowadays, I think search engines are getting smarter and short URLs are used more and more.