Azure Search how to choose query boosting values - lucene

I am implementing search on a website and have 3 types of document: products, articles and content. Products may be active (discontinued=0) or discontinued (discontinued=1). Articles and content are always indexed as discontinued=0, though this value is ignored for those document types.
I want to apply boosting to my query to get the following ranking (order) but without applying an ordering clause:
1. active products
2. discontinued products
3. articles
4. general content
Ignoring articles and content, I get the correct ordering when I boost discontinued products by 1 and active products by 1000 (I cannot use negative boost values as my underlying search system is Azure search).
I tried to boost my "contentType" field by 100000 for products, 50000 for articles and 10000 for content, but this breaks the ordering of active/discontinued products (although the split between products, articles and content is otherwise correct).
How do I go about calculating/choosing the correct boost values for this scenario?

Related

Shopware API search products endpoint total from a stream inconsistent

I have two dynamic product groups
First: Test Product with variants
Conditions: Product Is equal to Variant product
Result: a total of 7, as I expect.
Second: Active Products
Conditions: Active yes
We can already see that the stream ids are set for just 5 products.
Now we get a total of 5 instead of the expected 15 products.
Why is it inconsistent, and how can I modify my request to also consider the variants?
You shouldn't rely on the stream_ids column as an indicator of which products are shown in a dynamic product group at any given moment. Several other factors determine whether a product is shown to a user in a dynamic product group.
The filters you define for the group resolve to an SQL query, which in simplified terms would yield something like WHERE active = 1 AND id IN ('...', '...'). So the stream_ids column isn't used to select the contents of a group, but the entire query including all filters is executed in the storefront request. The result of that query is what you see in the preview of the dynamic product group.
Why doesn't it correlate completely with the content of stream_ids?
Shopware features inheritance of fields. If fields of a variant haven't been assigned a value, they may inherit that value from their parents. This may not be reflected in the contents of stream_ids. In fact the children/variants may even inherit the contents of stream_ids.
Then there's the fact that contents of the product group may vary, depending on the current sales channel. That may be because the sales channel features a different language, hence the content of a translatable field used in a filter may vary. Also if you use price filters, there is the possibility of products with multiple prices, which might only be shown if certain conditions are met, defined by the rule builder.
In short, don't count on the stream_ids. They can't reflect all these variables, although they are used in some capacity internally, for invalidating caches and such. Instead, use the preview to judge what the average user might find when they see a product group. You can also choose which sales channel the preview applies to, precisely because contents may differ depending on the sales channel.

Faceted search in PostgreSQL

We're working with a Postgres database for an ecommerce website and we'd like to implement a faceted search.
I did some reading and it appears to me that relational databases are not the best fit.
We don't currently have the resources to implement something like Lucene. What patterns would be our best bet for starting with something simple that we could later migrate away from?
It is highly desirable that it works with 1 million or more records.
The facets will be collected from 3 different tables ( brands, attributes, properties ).
Here's a basic overview of the tables that we currently have, related to the facets.
attribute_groups - id, name (ex: Color)
attributes - id, name (ex. Red), attribute_group_id
property_groups - id, name (ex. Material)
properties - id, name (ex. Wood), property_group_id
products - id, brand_id ...
product_variants - id, product_id
product_variants_to_attributes - product_variant_id, attribute_id
products_to_properties - product_id, property_id
brands - id, name
Initial thoughts
One of my first thoughts would be to send a database query that collects all of the facets from the different tables with UNION ALL, which would return something along the lines of
id   Facet   Value    Type
5    Brand   Adidas   brand
12   Brand   Puma     brand
6    Color   Blue     attribute
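A minimal sketch of the UNION ALL idea, to make the shape of the result concrete. It uses Python's built-in sqlite3 module with an in-memory database purely so it can run on its own; the SQL itself should carry over to Postgres with little change. The function names and the trimmed-down schema (only brands, attribute_groups and attributes) are assumptions for illustration, not part of the actual system.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a cut-down version of the schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE brands (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE attribute_groups (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE attributes (
            id INTEGER PRIMARY KEY,
            name TEXT,
            attribute_group_id INTEGER
        );
        INSERT INTO brands VALUES (5, 'Adidas'), (12, 'Puma');
        INSERT INTO attribute_groups VALUES (1, 'Color');
        INSERT INTO attributes VALUES (6, 'Blue', 1);
    """)
    return conn


def collect_facets(conn):
    """Return (id, facet, value, type) rows gathered from several tables."""
    query = """
        SELECT b.id AS id, 'Brand' AS facet, b.name AS value, 'brand' AS type
        FROM brands b
        UNION ALL
        SELECT a.id, g.name, a.name, 'attribute'
        FROM attributes a
        JOIN attribute_groups g ON g.id = a.attribute_group_id
        ORDER BY type DESC, id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in collect_facets(build_demo_db()):
        print(row)
```

Each branch of the UNION ALL has to produce the same four columns, which is why the brand branch fills in the literal 'Brand' where the attribute branch uses the group name.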
What worries me about this approach is that I have multiple tables to join in order to assemble the facets.
Attributes route
products -> product_variants -> product_variants_to_attributes -> attributes -> attribute_groups
Properties route
products -> products_to_properties -> properties -> property_groups
I think it's safe to assume that this will not be very performant, especially as one of the hottest queries in the system. Because of the many different options, it would also be pretty hard to cache efficiently.
Are there any patterns for indexing the data within the context of Postgres, like adding tables for associations between a filter, attributes and products?
Ideally, if I set a condition such as product.price => 100 on the products query, I would get back facets only for the products that match that condition.
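To illustrate what "facets only for matching products" means in query terms, here is a small sketch, again using sqlite3 in memory so it is runnable as-is. It assumes a price column on products and reads the condition above as "price of at least 100"; both are assumptions, as are the function names. Only the brand facet is shown.

```python
import sqlite3


def build_price_demo_db():
    """In-memory demo data; the price column is an assumed addition."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE brands (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE products (
            id INTEGER PRIMARY KEY,
            brand_id INTEGER,
            price REAL
        );
        INSERT INTO brands VALUES (5, 'Adidas'), (7, 'Nike'), (12, 'Puma');
        INSERT INTO products VALUES
            (1, 5, 120), (2, 5, 80), (3, 12, 150), (4, 12, 200), (5, 7, 50);
    """)
    return conn


def brand_facets_for_price(conn, min_price):
    """Count products per brand, restricted to products matching the filter."""
    query = """
        SELECT b.name, COUNT(*) AS product_count
        FROM products p
        JOIN brands b ON b.id = p.brand_id
        WHERE p.price >= ?
        GROUP BY b.id, b.name
        ORDER BY b.name
    """
    return conn.execute(query, (min_price,)).fetchall()


if __name__ == "__main__":
    print(brand_facets_for_price(build_price_demo_db(), 100))
```

The same WHERE clause that filters the product listing is applied before grouping, so brands with no matching products (Nike in this demo data) drop out of the facet list.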
What are my best bets?
Here's an image from google for the desired result.

Elastic Search, Nest. functional sorting

I'm building a filter page, with facets etc, which works as it should.
Now our customer has a request to, basically, "be able to decide which order the items come out in".
Each product is decorated with a Product Display Order, and is in a Product Line.
We got these example Product Display Orders:
1. Featured Item
2. Core Item
3. Spare Part
4. Utility
And these Product Lines:
1. Hammers
2. Saw
3. Wood
and the sorting is like this:
Sorting should be based firstly on Product Display Order, secondly on Product Line, and thirdly on name, alphabetically.
So all products that are a Featured Item are listed first, and these Featured Items are then sorted by their Product Line. If several products share the same Product Display Order and Product Line, they are sorted alphabetically.
The challenge is that I can't get the sort position of the Product Display Order and Product Line as a number on the product; I only have a name/id.
We've thought of boosting based on which categories the product is in, but it seems a bit messy.
OR
See if it is possible to have some logic in the sorting.
Sort by productDisplayOrder:
1. featured, 2. core Item ...
Then by ProductLines:
1. Hammers, 2. Saw ...
Then by Name DESC.
Which is the best way to get this sorting? Is it possible to give this logic to Elasticsearch so that matching products are sorted this way, or do we need to tweak the boosts of the products?
Hopefully this makes sense to you.
Thanks in advance! :)
Option 1) The quickest and best-performing solution would be to create new, separate integer fields for productDisplayOrder and ProductLine and then use those in your sort criteria as described (after reindexing and validating that the data is indexed as expected).
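A rough sketch of Option 1 in plain Python, shown without any Elasticsearch client so it can run on its own. The rank tables, the field names displayOrderRank and productLineRank, and the name.keyword sub-field are all hypothetical; the ascending name order follows the "alphabetically" requirement in the question and is likewise an assumption.

```python
# Hypothetical lookup tables, derived from the example orderings in the question.
DISPLAY_ORDER_RANK = {"Featured Item": 1, "Core Item": 2, "Spare Part": 3, "Utility": 4}
PRODUCT_LINE_RANK = {"Hammers": 1, "Saw": 2, "Wood": 3}


def with_sort_fields(product):
    """Return a copy of the document with integer sort fields added at index time."""
    doc = dict(product)
    doc["displayOrderRank"] = DISPLAY_ORDER_RANK[product["productDisplayOrder"]]
    doc["productLineRank"] = PRODUCT_LINE_RANK[product["productLine"]]
    return doc


def build_sort_clause():
    """Sort clause for the search request body, using the new integer fields."""
    return [
        {"displayOrderRank": "asc"},
        {"productLineRank": "asc"},
        {"name.keyword": "asc"},  # assumes a keyword sub-field exists for name
    ]


def sort_products(products):
    """Local equivalent of the sort clause, handy for checking expectations."""
    docs = [with_sort_fields(p) for p in products]
    return sorted(
        docs,
        key=lambda d: (d["displayOrderRank"], d["productLineRank"], d["name"].lower()),
    )


if __name__ == "__main__":
    demo = [
        {"name": "Claw Hammer", "productDisplayOrder": "Core Item", "productLine": "Hammers"},
        {"name": "Band Saw", "productDisplayOrder": "Featured Item", "productLine": "Saw"},
        {"name": "Sledge Hammer", "productDisplayOrder": "Featured Item", "productLine": "Hammers"},
    ]
    for doc in sort_products(demo):
        print(doc["name"])
```

The mapping from name to rank lives in one place, so changing the customer's preferred order means updating the tables and reindexing rather than touching the query logic.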
Option 2) If you want more nuance than described (eg higher scoring matches can 'break through' the ordering ceiling described) then you can explore using a Function Score Query to implement a custom scoring strategy that takes productDisplayOrder and ProductLine into consideration in generating an overall match score.
Option 3) If you can't change the mapping and reindex your data, you can use Script-Based Sorting to generate sorting values from the currently indexed productDisplayOrder/ProductLine text using a script (eg Groovy). Keep in mind that query performance will be worse than with the first two options.

Searching products using details.name and details.value using the Best Buy API

The Best Buy Search API allows searching for products by specifying criteria on the details.name and details.value fields.
http://api.remix.bestbuy.com/v1/products(details.name="Processor Speed" & details.value="2.4Ghz")?apiKey=YOURKEY
However, details is a collection. The query above actually returns all products that have a detail entry named "Processor Speed" and a detail entry whose value is "2.4Ghz", but not necessarily in the same details entry. Is there a way to create a query that returns only products for which that name and value belong to the same details entry?
Unfortunately there is no way to do this unless the particular detail you are interested in has been exposed as a top level attribute (processor speed has not). To accomplish this you will need to run your query as you have described, and then comb through the results and remove the irrelevant products in your own code.
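The client-side filtering step might look something like the sketch below. It assumes each product in the parsed response is a dict with a "details" list of {"name": ..., "value": ...} entries; that shape and the function names are assumptions for illustration.

```python
def has_matching_detail(product, name, value):
    """True if a single details entry carries both the given name and value."""
    return any(
        detail.get("name") == name and detail.get("value") == value
        for detail in product.get("details", [])
    )


def filter_products(products, name, value):
    """Keep only products where name and value occur in the same details entry."""
    return [p for p in products if has_matching_detail(p, name, value)]


if __name__ == "__main__":
    results = [
        {"sku": 1, "details": [{"name": "Processor Speed", "value": "2.4Ghz"}]},
        {"sku": 2, "details": [
            {"name": "Processor Speed", "value": "3.0Ghz"},
            {"name": "Bus Speed", "value": "2.4Ghz"},
        ]},
    ]
    kept = filter_products(results, "Processor Speed", "2.4Ghz")
    print([p["sku"] for p in kept])
```

The second product in the demo data is the kind of false positive the API query lets through: the name and the value both appear, but in different entries.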

Apache SOLR search by category

I am using apache-solr-1.4.1 and jdk1.6.0_14.
I have the following scenario.
I have 3 categories of data indexed in SOLR i.e. CITIES, STATES, COUNTRIES.
When I query data from SOLR I need the search result from SOLR based on the following criteria:
In a single query to SOLR I need data fetched from SOLR grouped by each category with a predefined results count for each category.
How can I specify this condition in SOLR?
I have tried to use SOLR Field Collapsing feature, but I am not able to get the desired output from SOLR.
Please suggest.
My solution is not exactly what you asked for, but it is my take on what SOLR does best, which is full text search. Instead of grouping the results by "category", I'd suggest you order the results by relevance score but also provide a facet count for the category values. In my experience users expect a "search" to behave like Google, with the best matches at the top. Deviating from this norm confuses the user in most cases.
If you want exactly what you asked for (actual results grouped by category), then you could use a relational database and do a group_by, or write a custom function query with SOLR (I cannot advise on this as I've never done it).
More info: index the data with the appropriate fields, e.g. name, population, etc. But also add a field called "category", which would have a value of either CITIES, STATES or COUNTRIES. Then perform a standard SOLR search, which will return results in order of relevance, i.e. best matches at the top. As part of the request, you can specify facet.field=category, which will return counts for the search results for each of the given categories (in the "facet" results section). In the UI you can then create links for each category facet which perform the original search plus &fq=category:CITIES, etc., thus restricting results to just that category. See the faceting overview on the SOLR wiki for more info.
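As an illustration of the request described above, the sketch below builds the search URL with and without the category filter. The base URL and the helper name are placeholders (assumptions); the parameters q, rows, facet, facet.field and fq are the standard SOLR request parameters mentioned in the answer.

```python
from urllib.parse import urlencode

# Placeholder endpoint; adjust to wherever your SOLR select handler lives.
BASE_URL = "http://localhost:8983/solr/select"


def build_search_url(text, category=None, rows=10, base_url=BASE_URL):
    """Build a relevance-ordered search URL with facet counts per category."""
    params = [
        ("q", text),
        ("rows", rows),
        ("facet", "true"),
        ("facet.field", "category"),
    ]
    if category:
        # Filter query: restrict results to one category without affecting scoring.
        params.append(("fq", f"category:{category}"))
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("new york"))
    print(build_search_url("new york", category="CITIES"))
```

The first URL would back the initial results page; the second is what a category facet link would point to.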