Are multiple values in a single column supported for faceted search on Amazon CloudSearch? - amazon-cloudsearch

I am looking at Amazon CloudSearch and am concerned about having multiple values per column, and whether CloudSearch would treat each value as an individual facet. What I mean is: if I have a book (1 row) with multiple authors but only one author field, how can the faceted search return each author as a separate facet? It wouldn't be practical to set up a hard-coded set of author fields (i.e. author1, author2, author3), so I'm wondering if this is built in.
I don't see it being supported, but then again I don't know everything. The way I see it, it could accept a CSV value of some sort, or XML?
For example, if I had this data set stored in CloudSearch:
title | author
I am a book | Bob Jones, Mike Miller
But these would be the facets returned:
author
-Bob Jones
-Mike Miller
Any way to achieve something like this?

A facet does precisely what you ask, but you can't return a facet field as a result field.
So if you created your author field as a facet, searched title for "book", and requested the facet "author":
/search?q=book&facet=author
you would get back facet counts of Bob Jones (1) and Mike Miller (1), with 1 hit. This indicates that there is one book for Bob and one for Mike (in your case, the same book).
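To make the counting concrete, here is a small in-memory Python sketch of how a facet over a multi-valued field counts each value separately. It is not CloudSearch itself: the docs list and the search and facet_counts helpers are hypothetical stand-ins for the index and the /search call.

```python
from collections import Counter

# Hypothetical in-memory stand-in for the index; "author" is multi-valued.
docs = [
    {"title": "I am a book", "author": ["Bob Jones", "Mike Miller"]},
]

def search(docs, term):
    """Naive title search: case-insensitive substring match."""
    term = term.lower()
    return [d for d in docs if term in d["title"].lower()]

def facet_counts(hits, field):
    """Count each value of a multi-valued field separately across the hits."""
    counts = Counter()
    for doc in hits:
        # set() so a value repeated inside one document is counted once
        counts.update(set(doc[field]))
    return counts

hits = search(docs, "book")
facets = facet_counts(hits, "author")
print(len(hits), dict(facets))
```

Note that the facet result on its own only carries value/count pairs; nothing in it says that both authors belong to the same book.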
But what is missing is the "authors" field: your facet would in no way show that both authors belong to the same book. Nor could you display the authors with the book, unless you add an extra field that lists the authors in one string, or separate author1, author2 fields.
As for inserting multiple values into one field (an array): I post a JSON object to /documents/batch. The author field simply holds an array of one or more values. An empty array is not allowed but an empty string is, so a book with no authors must have an empty string, not an empty array, in your JSON object.
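A minimal Python sketch of building such a batch payload, applying the empty-array rule described above. It assumes the batch layout of the 2013-01-01 CloudSearch API (a JSON array of operations with type, id and fields); older API versions used a slightly different document format, so check against the version you use. The document ids and the second book are hypothetical, and actually POSTing to your domain's /documents/batch endpoint is left out.

```python
import json

def add_operation(doc_id, title, authors):
    """Build one 'add' operation for a CloudSearch-style document batch.

    Assumption: type/id/fields layout; field names are illustrative.
    """
    # Per the answer: an empty array is rejected, so a book with no
    # authors gets an empty string instead of [].
    author_value = list(authors) if authors else ""
    return {
        "type": "add",
        "id": doc_id,
        "fields": {"title": title, "author": author_value},
    }

batch = [
    add_operation("book1", "I am a book", ["Bob Jones", "Mike Miller"]),
    add_operation("book2", "Anonymous pamphlet", []),  # hypothetical authorless book
]
payload = json.dumps(batch)  # body to send to /documents/batch
print(payload)
```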

Related

Searching for a word across all fields of an index with Solr 8.9

I'm fetching some data from a SQL database using Solr's DIH. I created a field all like this:
and I would like to use it to search across all fields, so that if I query "John" it would match both a title and an author name.
The problem is that when I query the all field, it only works on an exact match.
For example, if I search name:lub it returns
"name":"CR2/LUB/ Lub oil pump",
"all":["1706443412665794562",
"2165C92A-D107-48A6-A410-08D92AA77517",
"CR2/BER/CRACK/LUB/OT/10-PU-200C",
"CR2/LUB/ Lub oil pump"],
which is good.
But if I search all:lub the response shows:
"numFound":0,"start":0,"numFoundExact":true,"docs.
The ultimate goal is to search for a word across all fields and to weight the different fields.
For example, if someone searches books for "John", it should find it in the title and in the author fields (by looking in all), then weight the matches, making the title more important, and show each document's score in the response.
Thanks in advance for your advice!

What is the most efficient way to find the 'inverse' of getting all records that match particular criteria?

I am trying to find the most efficient way to find the 'inverse' of getting all records that match particular criteria,
i.e. find all the predefined criteria in a set that a particular record matches.
I have a table of 'target' criteria with many records, each built using a query-builder JavaScript component, so each target record stores its criteria as a JSON string in a field.
I also have a standard 'person' table.
It is straightforward to query how many people fit a particular target.
What I am trying to do is get all targets that match a particular person.
Is there a more efficient way than just running each target's criteria against a person?
I'm open to suggestions beyond just SQL, e.g. caching, hashing, or building some kind of lookup table/file.
Edit:
Hopefully the tables below clarify the issue. If I parsed and ran the 'Good Eyesight' target criteria, I would expect it to return both Bob and Sue.
But I want to know that Bob matches the 'Young People' and 'Good Eyesight' targets. I will have thousands of users and probably up to 50 active targets.
Table 1: Person
ID Name Age Fav_Vegetable
---------------------------------
1 Bob 20 Carrot
2 Sue 40 Carrot
Table 2: Target
ID Name Criteria_JSON
---------------------------------
1 Young People {"rule": "young_age", "selectedOperator": "<","selectedOperand": "Age","value": "30"}
2 Old People {"rule": "old_age", "selectedOperator": ">","selectedOperand": "Age","value": "30"}
3 Good Eyesight {"rule": "vegetable","selectedOperator": "equals","selectedOperand": "Fav_Vegetable","value": "Carrot"}
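For reference, the brute-force baseline ("run each target's criteria against a person") can be sketched in Python. This is an illustration, not the asker's code: it assumes each Criteria_JSON is a single rule using only the three operators in the table (<, >, equals); real query-builder output with nested AND/OR groups would need a recursive evaluator. With ~50 targets, evaluating them all in application code for one person is cheap once the JSON is parsed up front.

```python
import json

# In-memory copies of the two tables above.
people = [
    {"ID": 1, "Name": "Bob", "Age": 20, "Fav_Vegetable": "Carrot"},
    {"ID": 2, "Name": "Sue", "Age": 40, "Fav_Vegetable": "Carrot"},
]
target_rows = [
    (1, "Young People", '{"rule": "young_age", "selectedOperator": "<", "selectedOperand": "Age", "value": "30"}'),
    (2, "Old People", '{"rule": "old_age", "selectedOperator": ">", "selectedOperand": "Age", "value": "30"}'),
    (3, "Good Eyesight", '{"rule": "vegetable", "selectedOperator": "equals", "selectedOperand": "Fav_Vegetable", "value": "Carrot"}'),
]

# Parse each Criteria_JSON once, not once per person.
targets = [(tid, name, json.loads(crit)) for tid, name, crit in target_rows]

def matches(person, criteria):
    """Evaluate a single-rule criteria object against one person row."""
    actual = person[criteria["selectedOperand"]]
    expected = criteria["value"]
    op = criteria["selectedOperator"]
    if op == "<":
        return float(actual) < float(expected)
    if op == ">":
        return float(actual) > float(expected)
    if op == "equals":
        return str(actual) == expected
    raise ValueError(f"unsupported operator: {op!r}")

def targets_for(person, targets):
    """Names of all targets whose criteria this person satisfies."""
    return [name for _tid, name, criteria in targets if matches(person, criteria)]

print(targets_for(people[0], targets))
```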
The answer I have come up with is to run all targets against all people and maintain an index-type table of the results,
i.e. have a table TargetIndex with columns targetId, personId.
Then when I need to know the targets for a particular person I can just check against the TargetIndex table rather than rerunning queries.
Obviously these results would need to be refreshed as target or person records change: probably whenever a target is added or edited, and periodically (hourly/nightly?) to pick up changes in people.
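The TargetIndex idea can be sketched with Python's built-in sqlite3 as a stand-in for the real database. Each target's JSON is translated into one INSERT ... SELECT, so a refresh is a single set-based query per target rather than a loop over people. Assumptions: single-rule criteria with the operators from the table, and a hypothetical whitelist of Person columns (column names cannot be bound as SQL parameters, so they must be checked before being placed into the query text).

```python
import json
import sqlite3

OPERATORS = {"<": "<", ">": ">", "equals": "="}
PERSON_COLUMNS = {"Age", "Fav_Vegetable"}  # hypothetical whitelist

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Fav_Vegetable TEXT);
    CREATE TABLE Target (ID INTEGER PRIMARY KEY, Name TEXT, Criteria_JSON TEXT);
    CREATE TABLE TargetIndex (targetId INTEGER, personId INTEGER, PRIMARY KEY (targetId, personId));
""")
conn.executemany("INSERT INTO Person VALUES (?, ?, ?, ?)",
                 [(1, "Bob", 20, "Carrot"), (2, "Sue", 40, "Carrot")])
conn.executemany("INSERT INTO Target VALUES (?, ?, ?)", [
    (1, "Young People", json.dumps({"selectedOperator": "<", "selectedOperand": "Age", "value": "30"})),
    (2, "Old People", json.dumps({"selectedOperator": ">", "selectedOperand": "Age", "value": "30"})),
    (3, "Good Eyesight", json.dumps({"selectedOperator": "equals", "selectedOperand": "Fav_Vegetable", "value": "Carrot"})),
])

def refresh_target(conn, target_id, criteria_json):
    """Recompute the TargetIndex rows of one target (e.g. after it is added/edited)."""
    c = json.loads(criteria_json)
    column = c["selectedOperand"]
    if column not in PERSON_COLUMNS:
        raise ValueError(f"unknown column: {column!r}")
    op = OPERATORS[c["selectedOperator"]]
    value = float(c["value"]) if op in ("<", ">") else c["value"]
    with conn:  # delete + re-insert in one transaction
        conn.execute("DELETE FROM TargetIndex WHERE targetId = ?", (target_id,))
        conn.execute(
            f"INSERT INTO TargetIndex (targetId, personId) "
            f"SELECT ?, ID FROM Person WHERE {column} {op} ?",
            (target_id, value),
        )

def refresh_all(conn):
    """Periodic (hourly/nightly) refresh to pick up changes in people."""
    for target_id, criteria_json in conn.execute("SELECT ID, Criteria_JSON FROM Target").fetchall():
        refresh_target(conn, target_id, criteria_json)

def targets_for_person(conn, person_id):
    """Cheap lookup: no criteria are evaluated at read time."""
    rows = conn.execute(
        "SELECT t.Name FROM TargetIndex ti JOIN Target t ON t.ID = ti.targetId "
        "WHERE ti.personId = ? ORDER BY t.ID", (person_id,)).fetchall()
    return [name for (name,) in rows]

refresh_all(conn)
print(targets_for_person(conn, 1))
```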
Thanks for people's thoughts

Elasticsearch: search by popular names

Many cities are known by common/popular names: for example, Madras is still used in searches instead of Chennai, and the same goes for Gurgaon. I have a cityindex where I store documents containing city info, and I want to search for documents by their common names. What is the best approach?

How can I pull and concatenate a Wikipedia table with individual article data?

I'm looking to pull together a full list of the current FTSE 100 constituents, with an added column showing when each company was founded.
Each wiki infobox for the individual companies in the table contains the founding date. I'm struggling to work out how to write a SPARQL query against DBpedia that takes the existing FTSE 100 table.

How to implement a car search like autoscout24.de's, with or without SQL?

I am interested in how the search engine on autoscout24.de is implemented. It is a platform where you can buy and sell cars. Every car advert has properties (make, price, kilometers, color, etc.; over 50 different properties in total) that can be searched for.
I am specifically interested in the detail search, which works like this: every possible property is displayed on the page, and in brackets after each property value is the number of cars that would match the new search if that value were selected.
Example: I'll start with empty search criteria.
Property make:
BMW (100.000)
Volkswagen (200.000)
Ford (150.000)
...
Property color:
black (210.000)
silver (50.000)
white (100.000)
...
and so on for the other properties.
I'd like to know:
How would you implement this kind of search with SQL?
How would you implement it with an in-memory data structure?
Range queries should be supported, too (all cars with price from X to Y)
Update:
The numbers in brackets show the number of results after adding that search criterion, so they change each time a property is added or removed...
So a naive algorithm would work like this:
find all cars matching the current search criteria (e.g. make Ford)
for each property: find all cars that match the previous search criteria ("Ford") AND the search criterion for that property, and write the count in brackets after the property.
This algorithm is naive because it would execute 1 + N queries (N=#properties). Nobody wants to do that ;-)
I believe that this is referred to as "faceted search". The Apache Solr project might be worth looking at.
It's a basic algorithm:
Create a result object with one counter for each property value that the cars have.
Check all cars one by one; if a car matches the filter, add one to the counter of each of its property values.
...but it's blazingly fast!
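A minimal Python sketch of that single-pass counting, including a range filter (price from X to Y). The sample adverts and FACET_PROPERTIES are hypothetical. The count next to each value equals what the naive algorithm's extra query for that value would return (current criteria AND that value), but everything comes out of one scan instead of 1 + N queries.

```python
from collections import Counter, defaultdict

# Hypothetical sample adverts; real data would have ~50 properties.
cars = [
    {"make": "Ford", "color": "black", "price": 9000},
    {"make": "Ford", "color": "white", "price": 15000},
    {"make": "BMW", "color": "black", "price": 30000},
    {"make": "Volkswagen", "color": "silver", "price": 12000},
]
FACET_PROPERTIES = ("make", "color")

def matches(car, equals, ranges):
    """equals: {property: value}; ranges: {property: (low, high)}, inclusive."""
    if any(car.get(p) != v for p, v in equals.items()):
        return False
    return all(lo <= car[p] <= hi for p, (lo, hi) in ranges.items())

def facet_counts(cars, equals=None, ranges=None):
    equals = equals or {}
    ranges = ranges or {}
    counts = defaultdict(Counter)  # property -> Counter of values
    total = 0
    for car in cars:  # one pass over all cars
        if not matches(car, equals, ranges):
            continue
        total += 1
        for prop in FACET_PROPERTIES:
            counts[prop][car[prop]] += 1
    return total, counts

total, counts = facet_counts(cars, equals={"make": "Ford"}, ranges={"price": (5000, 20000)})
print(total, dict(counts["color"]))
```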
I think they do it on several computers, sharding the data across them. Each computer computes the counts for, say, 5% of the data and sends the result to a front computer, which sums all the counts.
There are tools for that: look for "MapReduce", "Elasticsearch", "Storm"...
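The scatter-gather idea can be sketched in Python, with threads standing in for the separate machines (an assumption for illustration; a real deployment would use one of the tools above). Each shard counts its own matching cars (map) and the front end sums the partial Counters (reduce); since addition is associative, the total equals a single-machine count.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample adverts.
cars = [
    {"make": "Ford", "color": "black"},
    {"make": "Ford", "color": "white"},
    {"make": "BMW", "color": "black"},
    {"make": "Ford", "color": "black"},
    {"make": "Volkswagen", "color": "silver"},
]

def count_shard(shard, make):
    """Map step: color counts among this shard's cars that match the filter."""
    return Counter(car["color"] for car in shard if car["make"] == make)

def sharded_counts(cars, make, num_shards=4):
    shards = [cars[i::num_shards] for i in range(num_shards)]  # round-robin split
    total = Counter()
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        for partial in pool.map(count_shard, shards, [make] * num_shards):
            total.update(partial)  # reduce step: the front computer sums counts
    return total

print(dict(sharded_counts(cars, "Ford")))
```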
Have a properties table:
+Properties
id
title
value
count
The count field saves you a query: instead of counting how many cars have a certain property, you just update this field when adding new cars.
Example of rows in this table:
1 'color' 'white' 1000
2 'color' 'black' 122
3 'km' '5000' 1233
4 'km' '30000' 54
In the cars table, add a field for each property.
+Cars
id
color
km
The color and km fields hold the IDs of the corresponding rows in the Properties table.
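A sketch of this schema using Python's sqlite3 in memory as a stand-in for the MySQL database the answer has in mind. The add_car, property_id and facet_counts helpers are hypothetical. Inserting the car and bumping the cached counts happen in one transaction so they cannot drift apart; removing or updating a car would need the matching decrement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Properties (
    id      INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,              -- property name, e.g. 'color'
    value   TEXT NOT NULL,              -- property value, e.g. 'white'
    "count" INTEGER NOT NULL DEFAULT 0, -- cached number of cars
    UNIQUE (title, value)
);
CREATE TABLE Cars (
    id    INTEGER PRIMARY KEY,
    color INTEGER REFERENCES Properties(id),
    km    INTEGER REFERENCES Properties(id)
);
""")

def property_id(conn, title, value):
    """Return the Properties row id for (title, value), creating it if needed."""
    conn.execute("INSERT OR IGNORE INTO Properties (title, value) VALUES (?, ?)", (title, value))
    (row_id,) = conn.execute(
        "SELECT id FROM Properties WHERE title = ? AND value = ?", (title, value)
    ).fetchone()
    return row_id

def add_car(conn, color, km):
    """Insert a car and bump the cached count of each of its property rows."""
    with conn:  # one transaction: car row and counts stay consistent
        color_id = property_id(conn, "color", color)
        km_id = property_id(conn, "km", km)
        conn.execute("INSERT INTO Cars (color, km) VALUES (?, ?)", (color_id, km_id))
        conn.execute('UPDATE Properties SET "count" = "count" + 1 WHERE id IN (?, ?)',
                     (color_id, km_id))

def facet_counts(conn, title):
    """Read cached counts for one property; no scan of Cars is needed."""
    return conn.execute(
        'SELECT value, "count" FROM Properties WHERE title = ? ORDER BY value', (title,)
    ).fetchall()

add_car(conn, "white", "5000")
add_car(conn, "white", "30000")
add_car(conn, "black", "5000")
print(facet_counts(conn, "color"))
```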
EDIT: If you're not planning to use a MySQL database, you might consider XML files to hold the properties data. But again, you must update the count value whenever you add, remove, or update a car.
<Properties>
<Property>
<Type>Color</Type>
<Value>White</Value>
<Count>1000</Count>
</Property>
</Properties>
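Keeping the XML count in sync could look like this Python sketch using the standard library's xml.etree.ElementTree. The adjust_count helper is hypothetical: call it with +1 when a car is added, -1 when one is removed, and both (-1 on the old value, +1 on the new) when a car is updated. Writing the tree back to a file is left out.

```python
import xml.etree.ElementTree as ET

xml_text = """<Properties>
<Property>
<Type>Color</Type>
<Value>White</Value>
<Count>1000</Count>
</Property>
</Properties>"""

def adjust_count(root, prop_type, value, delta):
    """Add delta to the matching Property's Count, creating the Property if needed."""
    for prop in root.findall("Property"):
        if prop.findtext("Type") == prop_type and prop.findtext("Value") == value:
            count = prop.find("Count")
            count.text = str(int(count.text) + delta)
            return
    # Not found: start a new Property entry (never with a negative count).
    prop = ET.SubElement(root, "Property")
    ET.SubElement(prop, "Type").text = prop_type
    ET.SubElement(prop, "Value").text = value
    ET.SubElement(prop, "Count").text = str(max(delta, 0))

root = ET.fromstring(xml_text)
adjust_count(root, "Color", "White", +1)  # a white car was added
adjust_count(root, "Color", "Black", +1)  # the first black car
print(ET.tostring(root, encoding="unicode"))
```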