Faceted search in PostgreSQL - sql

We are working with a Postgres database for an ecommerce website and would like to implement faceted search.
I did some reading, and it appears to me that relational databases are not the best fit.
We don't currently have the resources to implement something like Lucene. What patterns would be our best bet for starting with something simple that we could migrate later?
It is highly desirable that it works with 1 million or more records.
The facets will be collected from 3 different tables (brands, attributes, properties).
Here's a basic overview of the tables that we currently have, related to the facets.
attribute_groups - id, name (ex: Color)
attributes - id, name (ex. Red), attribute_group_id
property_groups - id, name (ex. Material)
properties - id, name (ex. Wood), property_group_id
products - id, brand_id ...
product_variants - id, product_id
product_variants_to_attributes - product_variant_id, attribute_id
products_to_properties - product_id, property_id
brands - id, name
Initial thoughts
One of my first thoughts was to send a database query that collects all of the facets from the different tables with UNION ALL, which would return something along the lines of
id   Facet   Value    Type
5    Brand   Adidas   brand
12   Brand   Puma     brand
6    Color   Blue     attribute
What worries me about this approach is that I have multiple tables to join in order to assemble the facets.
Attributes route
products -> product_variants -> product_variants_to_attributes -> attributes -> attribute_groups
Properties route
products -> products_to_properties -> properties -> property_groups
I think it's safe to assume that this will not be very performant, especially for one of the hottest queries of the system, and because of the many different options it would be pretty hard to cache efficiently.
Are there any patterns for indexing the data within the context of Postgres, like adding tables for associations between a filter, attributes and products?
It's desirable for the filter to work so that if I set product.price => 100 on the products query, I get back facets only for the products that match that condition.
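For illustration, here is a minimal sketch of what such a filtered facet query could look like. It runs against an in-memory SQLite database from Python so that it is self-contained; the same SQL shape should carry over to Postgres, but that is untested here. The products.price column, the minimum-price reading of the filter and the sample rows are assumptions; the table names follow the overview above. A CTE first selects the products that match the filter, then each facet source is counted against that set and the results are combined with UNION ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE brands (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products (id INTEGER PRIMARY KEY, brand_id INTEGER, price REAL);  -- price is assumed
CREATE TABLE product_variants (id INTEGER PRIMARY KEY, product_id INTEGER);
CREATE TABLE attribute_groups (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attributes (id INTEGER PRIMARY KEY, name TEXT, attribute_group_id INTEGER);
CREATE TABLE product_variants_to_attributes (product_variant_id INTEGER, attribute_id INTEGER);

-- Sample rows, invented for the example.
INSERT INTO brands VALUES (5, 'Adidas'), (12, 'Puma');
INSERT INTO products VALUES (1, 5, 120), (2, 12, 80), (3, 5, 150);
INSERT INTO product_variants VALUES (10, 1), (11, 2), (12, 3);
INSERT INTO attribute_groups VALUES (1, 'Color');
INSERT INTO attributes VALUES (6, 'Blue', 1), (7, 'Red', 1);
INSERT INTO product_variants_to_attributes VALUES (10, 6), (11, 6), (12, 7);
""")

FACETS_SQL = """
WITH matched AS (              -- products that satisfy the current filter
    SELECT id, brand_id FROM products WHERE price >= :min_price
)
SELECT b.id, 'Brand' AS facet, b.name AS value, 'brand' AS type,
       COUNT(*) AS product_count
FROM matched m
JOIN brands b ON b.id = m.brand_id
GROUP BY b.id, b.name
UNION ALL
SELECT a.id, g.name, a.name, 'attribute',
       COUNT(DISTINCT m.id)    -- a product with several variants counts once
FROM matched m
JOIN product_variants v ON v.product_id = m.id
JOIN product_variants_to_attributes va ON va.product_variant_id = v.id
JOIN attributes a ON a.id = va.attribute_id
JOIN attribute_groups g ON g.id = a.attribute_group_id
GROUP BY a.id, a.name, g.name
ORDER BY 4, 3                  -- by type, then by value
"""

def facets(min_price):
    """Return (id, facet, value, type, product_count) rows for matching products."""
    return conn.execute(FACETS_SQL, {"min_price": min_price}).fetchall()

for row in facets(100):
    print(row)
```

The properties route would be one more UNION ALL branch of the same shape. Whether this is fast enough at the scale mentioned above would have to be measured.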
What are my best bets?
Here's an image from google for the desired result.

Related

How do I continue this database? (linking characteristics with predefined values to categories)

I'm struggling to understand how I need to do this. My problem: I'm supposed to allow someone to sell a product on a website. Before selling, they have to choose a category. Each category has different characteristics that can be filled in, and those characteristics depend entirely on the chosen category. The values of those characteristics are predefined and are already in the database.
My question now is how do I go about this? How do I link those characteristics to the chosen category, and how do I link the varying number of predefined values to those specific characteristics?
example:
category: keyboard
characteristics: condition (dropdown), keyboard layout(dropdown), extra options(multiple choice)
condition has 3 options: new, as good as new, used
keyboard layout has 2 options: qwerty, azerty
extra options is multiple choice, has 3 options: gaming keyboard, wireless, 60%
second example:
category: laptop
characteristics: condition (dropdown), refresh rate(dropdown)
condition has 3 options: new, as good as new, used
refresh rate has 5 options: 50hz, 60hz, 120hz, 144hz, 240hz
Now I have to make this work in my database, but I can't even figure it out on a relational database diagram.
Any form of help would certainly be appreciated!
I would distribute fields like this:
CATEGORIES (keyboard, laptop)
  id
  name
ATTRIBUTES (refresh_rate, layout)
  id
  name
FEATURES (50hz, 60hz, qwerty, etc)
  id
  attribute_id
  name
CATEGORIES_ATTRIBUTES
  id
  category_id
  attribute_id
PRODUCTS
  id
  name
  category_id
  condition (could be an enum, I put it here as every product has a condition)
PRODUCT_FEATURES
  product_id
  attribute_id (redundant but it can save you a join when making queries)
  feature_id
Cheers!
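As a rough sketch of how the tables above could be queried, the following Python snippet builds them in an in-memory SQLite database and looks up the predefined options for one category. The sample rows and the options_for helper are made up for illustration; the table and column names follow the layout above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attributes (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE features (id INTEGER PRIMARY KEY,
                       attribute_id INTEGER REFERENCES attributes(id),
                       name TEXT);
CREATE TABLE categories_attributes (id INTEGER PRIMARY KEY,
                       category_id INTEGER REFERENCES categories(id),
                       attribute_id INTEGER REFERENCES attributes(id));

-- Sample rows, invented for the example.
INSERT INTO categories VALUES (1, 'keyboard'), (2, 'laptop');
INSERT INTO attributes VALUES (1, 'layout'), (2, 'refresh_rate');
INSERT INTO features VALUES (1, 1, 'qwerty'), (2, 1, 'azerty'),
                            (3, 2, '60hz'), (4, 2, '144hz');
INSERT INTO categories_attributes VALUES (1, 1, 1), (2, 2, 2);
""")

def options_for(category_name):
    """Return {attribute name: [predefined feature names]} for one category."""
    rows = conn.execute("""
        SELECT a.name, f.name
        FROM categories c
        JOIN categories_attributes ca ON ca.category_id = c.id
        JOIN attributes a ON a.id = ca.attribute_id
        JOIN features f ON f.attribute_id = a.id
        WHERE c.name = ?
        ORDER BY a.name, f.id
    """, (category_name,)).fetchall()
    options = {}
    for attribute, feature in rows:
        options.setdefault(attribute, []).append(feature)
    return options

print(options_for("keyboard"))
```

The result of options_for is what a sell form would need in order to render one dropdown per attribute.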

Most appropriate way to store/retrieve User Input in an eCommerce iOS application?

I'm a bit confused by SQLite, Core Data, NSUserDefaults and PropertyList. I know what each one is, but I don't have a clear idea of where each is appropriate.
I know that there are lots of tutorials, but I learn best from concrete situations. So kindly help me understand this in the situation I'm facing right now, so that I can use the available options wisely.
I'm working on an ECommerce iOS (native) application, where I'm highly dependent on APIs for data display. Now I need to record a user's review of a product and send it through an API.
That is, I have three components: a rating title, a rating value (for that title) and a rating title ID. As an example, I need to store multiple rows with these details:
Component   Data to be stored
Title       Quality | Value | Price
Rating      2       | 3     | 1
TitleID     10      | 11    | 12
There will be many entries like this. The number of components differs between users; for some users there might be more than three components, all of which must be saved and sent through an API. So how should I save this data? Which is the RIGHT way to save it temporarily?
If I understand you correctly, as vaibhav implied your question seems pretty general and probably relates more to structuring your data to fit your requirements than to technical aspects of the iOS / CoreData environment. In that vein, I’ll offer a few thoughts I’d have in structuring a data structure for quality ratings per your description.
If your ratings will always be for the three categories you show, i.e. Quality, Value and Price, I wouldn’t over-complicate things; I’d just use three properties in a rating record to hold the values that a user assigns in his/her rating of a product (just showing selected attributes and relationships in all following lists):
Product
  name
Rating
  ratedProduct (many to one)
  qualityRating Int
  valueRating Int
  priceRating Int
Done this way you’d need to associate the values with their types in code for the APIs, such as (where item is a retrieved rating record):
display(product: item.ratedProduct.name, quality: item.qualityRating, value: item.valueRating, price: item.priceRating).
On the other hand, you may be describing a more generic approach that would allow for ratings categories that vary more frequently, or perhaps vary among products. This could apply where, for example, ratings include how well things fit for clothing but not for other products like books. In that case, you’d need a more complicated structure where a product could have a variable number of ratings of different types, so you’d need another layer of entities that let you create an arbitrary number of rating records that applied to a product.
Here you'd create a separate rating record for each rating that a user assigned to a product.
The simplest form of that structure would be like the following:
Product
  name String
UserEvaluation
  ratedProduct (many to one)
  productRating (one to many)
ProductRating
  ratingType (many to one)
  value Int
RatingType
  ratingTitle String
  ratingID String or Int
Then you'd need a bit more structure: you'd display the product and then loop through the set of all of the ratings linked to it, somewhat like this (where item is a retrieved UserEvaluation):
displayTitle(product: item.ratedProduct.name)
for rating in item.productRating {
    displayRating(ratingTitle: rating.ratingType.ratingTitle, ratingValue: rating.value)
}
You'd probably want to combine these into a method that takes the name and an array of ratings.
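A minimal sketch of that generic structure, written in Python for brevity rather than as Core Data entities. The class names mirror the entities listed above (with snake_case fields); the to_payload helper and the shape of the dictionaries it returns are assumptions, since the real API format is not given.

```python
from dataclasses import dataclass, field

@dataclass
class RatingType:
    rating_title: str
    rating_id: int

@dataclass
class ProductRating:
    rating_type: RatingType
    value: int

@dataclass
class UserEvaluation:
    rated_product: str                      # product name, simplified to a string
    product_ratings: list = field(default_factory=list)

def to_payload(evaluation):
    """Flatten one evaluation into a list of dicts (hypothetical API shape)."""
    return [
        {"title": r.rating_type.rating_title,
         "titleID": r.rating_type.rating_id,
         "rating": r.value}
        for r in evaluation.product_ratings
    ]

# Works for any number of rating components, not just three.
evaluation = UserEvaluation("Sample product", [
    ProductRating(RatingType("Quality", 10), 2),
    ProductRating(RatingType("Value", 11), 3),
    ProductRating(RatingType("Price", 12), 1),
])
print(to_payload(evaluation))
```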
To keep track of things, you’d also probably want to create another entity that defined product classes and specified what specialized ratings applied to each class (like fit for clothing and mileage for cars). By default, you also may want to allow for a few generic rating types that apply to all products (like the quality and price ratings you show). For this approach, the full structure would look like this:
Product Category
  title
  ratingType (many to many)
Product
  productType (many to one)
UserEvaluation
  ratedProduct (many to one)
  productRating (one to many)
ProductRating
  ratingType (many to one)
  value Int
RatingType
  ratingTitle String
  ratingID String or Int
With this structure, once a product is assigned a productType, the application would know what ratings to ask for in the UI.
You could try building more complicated rating records with all of the types that apply to a product category, but that would get very messy if the applicable categories vary over time. You could also create a "custom" rating type that let a user specify a title and input a rating, in which case you'd need a text field in the rating record that only applies if the ratingType is "custom".
I hope this helps…

How to work with map property in RQL (Oracle's ATG Web Commerce)

We use Oracle's ATG Web Commerce for our project. Currently we need to construct an RQL query that obtains the products whose SKUs' tacticalTradeStatuses contain a certain status, ordered by the status value.
To briefly describe the relationship between the entities: the product item descriptor contains a list of SKUs. Each SKU contains a map, tacticalTradeStatuses (key: tactical trade status, value: sequence).
For example, how do we obtain all products whose SKUs' tacticalTradeStatuses property contains the key 'BEST_SELLER', ordered by the value associated with that key?
We want to pass the key by which we select products as an RQL parameter.
I can think of two ways of doing that.
The first way:
1) Create a query which fetches all the products based on the map key BEST_SELLER.
2) Pass the result to the ForEach droplet and add sort properties, which sort the result according to your requirements.
For sorting, please refer to the link below:
http://docs.oracle.com/cd/E23095_01/Platform.93/PageDevGuide/html/s1316foreach01.html
The second way, I think, is to use query options in RQLStatement, which work the same way as the sort properties in ForEach.
If you can provide some of the XML repository structure, that would be good. Hope this helps.

How to do car search like autoscout24.de with / without SQL?

I am interested in the implementation of the search engine in autoscout24.de. It is a platform where you can sell/buy cars. Every car advert has properties: make, price, kilometers, color, etc. (over 50 different properties in total) that can be searched for.
I am specifically interested in the detail search that works like this: every possible property is displayed on the page. In brackets behind each property there is the number of cars that will match the new search if the property is selected.
Example: I'll start with empty search criteria.
Property make:
BMW (100.000)
Volkswagen (200.000)
Ford (150.000)
...
Property color:
black (210.000)
silver (50.000)
white (100.000)
...
and so on for the other properties.
I'd like to know:
How would you implement this kind of search with SQL?
How would you implement it with an in-memory data structure?
Range queries should be supported, too (all cars with price from X to Y)
Update:
The numbers in brackets show the number of results after the addition of the search criteria. So it changes each time a property is added / removed...
So a naive algorithm would work like this:
find all cars with current search criteria (e.g. make Ford)
for each property do: find all cars that match the previous search criteria ("Ford") AND the search criterion for the chosen property. Write the count in brackets behind the property.
This algorithm is naive because it would execute 1 + N queries (N=#properties). Nobody wants to do that ;-)
I believe that this is referred to as "faceted search". The Apache Solr project might be worth looking at.
The basic code is simple:
Create a result object with one counter for each property that the cars have.
Check all cars one by one; if a car matches the filter, add one to the counter of each of its property values.
...But it's blazing fast!
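A minimal in-memory sketch of that single pass, in Python. The sample cars, the facet_counts function and the filter lambdas are made up for illustration; range filters such as price from X to Y are just another predicate.

```python
from collections import Counter

# Sample data, invented for the example.
cars = [
    {"make": "Ford", "color": "black", "price": 9000},
    {"make": "Ford", "color": "white", "price": 15000},
    {"make": "BMW", "color": "black", "price": 30000},
]

def facet_counts(cars, matches, properties):
    """One pass over all cars: count each property value among matching cars."""
    counts = {prop: Counter() for prop in properties}
    for car in cars:
        if matches(car):
            for prop in properties:
                counts[prop][car[prop]] += 1
    return counts

# Exact-match filter and range filter use the same code path.
ford_only = facet_counts(cars, lambda c: c["make"] == "Ford", ["make", "color"])
cheap = facet_counts(cars, lambda c: 5000 <= c["price"] <= 20000, ["make", "color"])
print(ford_only["color"])
```

For a property that is not part of the current filter, the count among the matching cars equals the number of results after adding that value to the search. For a property that is already filtered on, the numbers depend on whether values of the same property are combined with AND or OR, which this sketch does not address.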
I think they do it on several computers, sharding the data across them. Each computer computes the counts for 5% of the data and sends the result to the front computer, which sums all the counts.
There are tools for that: look for "map reduce", "Elasticsearch", "Storm"...
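The split-and-sum idea can be sketched in a single process as follows. The shards here are plain Python lists standing in for separate machines; shard_counts and merge are hypothetical names, and a real deployment would use one of the tools above rather than this code.

```python
from collections import Counter

def shard_counts(shard, matches, prop):
    """Work done on one machine: count values of `prop` among matching cars."""
    return Counter(car[prop] for car in shard if matches(car))

def merge(partials):
    """Work done on the front machine: sum the per-shard counters."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Two shards of sample data, invented for the example.
shards = [
    [{"make": "Ford", "color": "black"}, {"make": "BMW", "color": "black"}],
    [{"make": "Ford", "color": "white"}, {"make": "Ford", "color": "black"}],
]

def is_ford(car):
    return car["make"] == "Ford"

color_counts = merge(shard_counts(s, is_ford, "color") for s in shards)
print(color_counts)
```

This works because counts are additive: each shard can be counted independently and the partial results simply summed.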
Have a properties table:
+Properties
  id
  title
  value
  count
The count field saves you an extra query: instead of checking how many cars have a certain property, you can just update this field when adding new cars.
Example of rows in this table:
1 'color' 'white' 1000
2 'color' 'black' 122
3 'km' '5000' 1233
4 'km' '30000' 54
And in the cars table, add a field for each property.
+Cars
  id
  color
  km
The color and km fields will hold the IDs of the corresponding rows in the Properties table.
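One way to keep the count field up to date is with triggers. The sketch below uses an in-memory SQLite database from Python so that it is self-contained; the trigger names and sample rows are made up, and trigger syntax differs between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE properties (id INTEGER PRIMARY KEY, title TEXT, value TEXT,
                         count INTEGER NOT NULL DEFAULT 0);
CREATE TABLE cars (id INTEGER PRIMARY KEY,
                   color INTEGER REFERENCES properties(id),
                   km INTEGER REFERENCES properties(id));

-- Keep properties.count in step with the cars table.
CREATE TRIGGER cars_insert AFTER INSERT ON cars BEGIN
    UPDATE properties SET count = count + 1 WHERE id IN (NEW.color, NEW.km);
END;
CREATE TRIGGER cars_delete AFTER DELETE ON cars BEGIN
    UPDATE properties SET count = count - 1 WHERE id IN (OLD.color, OLD.km);
END;

-- Sample rows, invented for the example.
INSERT INTO properties (id, title, value) VALUES
    (1, 'color', 'white'), (2, 'color', 'black'), (3, 'km', '5000');
INSERT INTO cars (color, km) VALUES (1, 3), (2, 3), (1, 3);
DELETE FROM cars WHERE id = 2;
""")

# Read the maintained counters back without touching the cars table.
counts = dict(conn.execute("SELECT value, count FROM properties"))
print(counts)
```

Note that a stored counter like this holds the total number of cars per property value. It does not change with the current search criteria, so it only covers the counts shown for an empty search.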
EDIT: if you're not planning to use a MySQL database, you might consider using XML files to hold the properties data. But once again, you should update the count value any time you add, remove or update a car.
<Properties>
  <Property>
    <Type>Color</Type>
    <Value>White</Value>
    <Count>1000</Count>
  </Property>
</Properties>

What sort of database design would I need to use in case I wanted users to save tags, and be able to call already used tags?

I'm trying to implement a feature similar to StackOverflow's tag feature: a user can create a new tag or, by typing, pull up a list of similar tags that have already been created.
This is such a wonderful feature on this site and I find it sad that most sites do not have something like this. It's both robust, and yet very very flexible and best of all: driven by the community.
So I have these two tables:
Company
  id
  email
  name
  companySize
  countryOfOrigin
  industryid
Industry
  id
  description
Every time a user writes a new tag, I want to create one with a unique ID, and also be able to search for existing tags.
Will this database design allow for an easy and efficient implementation of this feature?
If not, please give a little guidance. :)
Whilst there's not a tremendous amount of information to go on, what you've listed should be fine. (The 'tag' being the 'description' field in the industry table, etc.)
As you might imagine, all of the real work is done outside of SQL, where you'll need to...
(Potentially) add new tag(s) that don't yet exist.
Associate the industry with the supplied tag(s).
(Potentially) prune previously used tags that may no longer be in use.
...every time you edit an industry.
That said, the key limitation of your proposed setup is that each company can only belong to a single industry. (i.e.: It can only have a single industry tag associated with it.)
As such, you might want to consider a schema along the lines of...
Company
  id
  ...
  countryOfOrigin
Industries
  id
  description
CompanyIndustriesLookup
  companyID
  industryID
...which would let you associate multiple industries/tags with a given company.
Update...
For example, under this setup, to get all of the tags associated with company ID 1, you'd use...
SELECT Industries.description
FROM CompanyIndustriesLookup
JOIN Industries ON Industries.id = CompanyIndustriesLookup.industryID
WHERE CompanyIndustriesLookup.companyID = 1
ORDER BY Industries.description ASC;
On a similar basis, to get all companies tagged with an industry of "testing", you'd use...
SELECT Company.name
FROM Company
JOIN CompanyIndustriesLookup ON CompanyIndustriesLookup.companyID = Company.id
JOIN Industries ON Industries.id = CompanyIndustriesLookup.industryID
WHERE Industries.description = 'testing'
ORDER BY Company.name ASC;
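The "create the tag if it is new, otherwise reuse it" step and the type-ahead lookup could be sketched like this, using an in-memory SQLite database from Python. The UNIQUE constraint on description, the composite primary key on the lookup table, and the tag_company and suggest helpers are assumptions added for the example. INSERT OR IGNORE is SQLite syntax; other databases spell it differently (for example INSERT IGNORE in MySQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Industries (id INTEGER PRIMARY KEY, description TEXT UNIQUE);
CREATE TABLE CompanyIndustriesLookup (
    companyID INTEGER REFERENCES Company(id),
    industryID INTEGER REFERENCES Industries(id),
    PRIMARY KEY (companyID, industryID));
INSERT INTO Company VALUES (1, 'Acme');
""")

def tag_company(company_id, tag):
    """Create the tag if it does not exist yet, then link it to the company."""
    conn.execute("INSERT OR IGNORE INTO Industries (description) VALUES (?)", (tag,))
    (industry_id,) = conn.execute(
        "SELECT id FROM Industries WHERE description = ?", (tag,)).fetchone()
    conn.execute("INSERT OR IGNORE INTO CompanyIndustriesLookup VALUES (?, ?)",
                 (company_id, industry_id))

def suggest(prefix):
    """Existing tags that start with what the user has typed so far."""
    rows = conn.execute(
        "SELECT description FROM Industries WHERE description LIKE ? "
        "ORDER BY description", (prefix + "%",))
    return [r[0] for r in rows]

tag_company(1, "testing")
tag_company(1, "telecom")
tag_company(1, "testing")   # repeating a tag changes nothing
print(suggest("te"))
```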
A very easy (if somewhat suboptimal, but it often does not matter) solution to use tags is to not have tag ids at all. So, you have:
Items
  ItemId
  Name
  Description
  ...
ItemTag
  ItemId
  Tag
Adding a tag to an item is just adding the tuple to the ItemTag table, whether the tag already exists or not. And you don't have to do any bookkeeping on removing tags either. Just keep an index on ItemTag.Tag, to be able to quickly display all unique tags.
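A small sketch of this id-less layout, again with an in-memory SQLite database from Python. The composite primary key on (ItemId, Tag), the index name and the sample rows are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items (ItemId INTEGER PRIMARY KEY, Name TEXT, Description TEXT);
CREATE TABLE ItemTag (ItemId INTEGER REFERENCES Items(ItemId), Tag TEXT,
                      PRIMARY KEY (ItemId, Tag));
CREATE INDEX ItemTag_Tag ON ItemTag (Tag);   -- supports lookups by tag

-- Sample rows, invented for the example.
INSERT INTO Items (ItemId, Name) VALUES (1, 'first'), (2, 'second');
INSERT INTO ItemTag VALUES (1, 'sql'), (1, 'tags'), (2, 'sql');
""")

# All unique tags in use, e.g. for a tag cloud or suggestions.
all_tags = [r[0] for r in conn.execute(
    "SELECT DISTINCT Tag FROM ItemTag ORDER BY Tag")]

# All items carrying one tag.
sql_items = [r[0] for r in conn.execute(
    "SELECT ItemId FROM ItemTag WHERE Tag = ? ORDER BY ItemId", ("sql",))]

print(all_tags, sql_items)
```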