Count and sub count in lucene - lucene

My fields in lucene are product_name, type and sub_types.
I am querying on type with abc, this results me in products whose type is abc.
This abc type products have sub_types as pqr and xyz.
I can get total count of the xyz type using TopScoreDocCollector.getTotalHits().
But I want to get the count of sub_types. ie. pqr and xyz.
How can I get it?
Any reply would be of great help for me.
Thanks in advance.

One way to do this is to create a filter based on your abc query, and then use that filter to constrain results for the sub-type queries.
IndexSearcher searcher = // searcher to use
int nDocs = 100; // number of docs to retrieve
QueryParser parser = // query parser to use
Query typeQuery = parser.parse("type:abc");
Filter f = CachingWrapperFilter(new QueryWrapperFilter(typeQuery));
Query subtypeQuery = parser.parse("sub_type:xyz");
TopDocs results = searcher.search(subtypeQuery, f, nDocs);
Another thought: if you know up-front which sub-type you're interested in, you can simply add both a type and a sub-type to the query: +type:abc +sub_type:xyz.
Finally, you might consider using Solr to index your data if you have these kinds of queries.

Related

Filter a Postgres Table based on multiple columns

I'm working on an shopping website. User selects multiple filters on and sends the request to backend which is in node.js and using postgres as DB.
So I want to search the required data in a single query.
I have a json object containing all the filters that user selected. I want to use them in postgres query and return to user the obtained results.
I have a postgres Table that contains a few products.
name Category Price
------------------------------
LOTR Books 50
Harry Potter Books 30
Iphone13 Mobile 1000
SJ8 Cameras 200
I want to filter the table using n number of filters in a single query.
I have to make it work for multiple filters such as the ones mentioned below. So I don't have to write multiple queries for different filters.
{ category: 'Books', price: '50' }
{ category: 'Books' }
{category : ['Books', 'Mobiles']}
I can query the table using
SELECT * FROM products WHERE category='Books' AND 'price'='100'
SELECT * FROM products WHERE category='Books'
SELECT * FROM products WHERE category='Books' OR category='Mobiles'
respectively.
But I want to write my query in such a way that it populates the Keys and Values dynamically. So I may not have to write separate query for every filter.
I have obtained the key and value pairs from the request.query and saved them
const params = req.query;
const keys: string = Object.keys(params).join(",")
const values: string[] = Object.values(params)
const indices = Object.keys(params).map((obj, i) => {
return "$" + (i + 1)
})
But I'm unable to pass them in the query in a correct manner.
Does anybody have a suggestion for me? I'd highly appreciate any help.
Thank you in advance.
This is not the way you filter data from a SQL database table.
You need to use the NodeJS pg driver to connect to the database, then write a SQL query. I recommend prepared statements.
A query would look like:
SELECT * FROM my_table WHERE price < ...
At least based on your question, to me, it is unclear why would want to do these manipulations in JavaScript, nor what you want to be accomplished really.

ScrollableResults size gives repeated value

I am working on application using hibernate and spring. I am trying to get count of result got by query by using ScrollableResults but as query contains lots of join(Inner joins), the result contains id repeated many times. this creates problem for ScrollableResults when i am using it to know total no of unique rows(or unique ids) returned from database. please help. Some part of code is below :
StringBuffer queryBuf = new StringBuffer("Some SQL query with lots of Joins");
Query query = getSession().createSQLQuery(queryBuf.toString());
query.setReadOnly(true);
ScrollableResults results = query.scroll();
if (results.isLast() == false)
results.last();
int total = results.getRowNumber() + 1;
logger.debug(">>>>>>TOTAL COUNT<<<<<< = {}", total);
It gives total count 1440 but actual unique rows in database is 504.
Thanks in Advance.
You can try
Integer count= ((Long)query.uniqueResult()).intValue();
Unfortunately, getRowNumber does not give you the size, or the number of results, but the current position in the results. ScrollableResults does not provide a way to get the number of results out-of-the-box.
I am referring to ScrollableResults Hibernate Version 5.4.
As a workaround, you can try
Long l_resultsCount = 0L;
while(results.next()) {
l_resultsCount++;
}
getRowNumber() gives the number of the current row.
Call last() and afterwards getRowNumber()+1 will give the total number of results.

search within an array with a condition

I have two array I'm trying to compare at many levels. Both have the same structure with 3 "columns.
The first column contains the polygon's ID, the second a area type, and the third, the percentage of each area type for a polygone.
So, for many rows, it will compare, for example, ID : 1 Type : aaa % : 100
But for some elements, I have many rows for the same ID. For example, I'll have ID 2, Type aaa, 25% --- ID 2, type bbb, 25% --- ID 2, type ccc, 50%. And in the second array, I'll have ID 2, Type aaa, 25% --- ID 2, type bbb, 10% --- ID 2, type eee, 38% --- ID 2, type fff, 27%.
here's a visual example..
So, my function has to compare these two array and send me an email if there are differences.
(I wont show you the real code because there are 811 lines). The first "if" condition is
if array1.id = array2.id Then
if array1.type = array2.type Then
if array1.percent = array2.percent Then
zone_verification = True
Else
zone_verification = False
The probleme is because there are more than 50 000 rows in each array. So when I run the function, for each "array1.id", the function search through 50 000 rows in array2. 50 000 searchs for 50 000 rows.. it's pretty long to run!
I'm looking for something to get it running faster. How could I get my search more specific. Example : I have many id "2" in the array1. If there are many id "2" in the array2, find it, and push all the array2.id = 3 in a "sub array" or something like that, and search in these specific rows. So I'll have just X rows in array1 to compare with X rows in array 2, not with 50 000. and when each "id 2" in array1 is done, do the same thing for "id 4".. and for "id 5"...
Hope it's clear. it's almost the first time I use VB.net, and I have this big function to get running.
Thanks
EDIT
Here's what I wanna do.
I have two different layers in a geospatial database. Both layers have the same structure. They are a "spatial join" of the land parcels (55 000), and the land use layer. The first layer is the current one, and the second layer is the next one we'll use after 2015.
So I have, for each "land parcel" the percentage of each land use. So, for a "land parcel" (ID 7580-80-2532, I can have 50% of farming use (TYPE FAR-23), and 50% of residantial use (RES-112). In the first array, I'll have 2 rows with the same ID (7580-80-2532), but each one will have a different type (FAR-23, RES-112) and a different %.
In the second layer, the same the municipal zoning (land use) has changed. So the same "land parcel" will now be 40% of residential use (RES-112), 20% of commercial (COM-54) and 40% of a new farming use (FAR-33).
So, I wanna know if there are some differences. Some land parcels will be exactly the same. Some parcels will keep the same land use, but not the same percentage of each. But for some land parcel, there will be more or less land use types with different percentage of each.
I want this script to compare these two layers and send me an email when there are differences between these two layers for the same land parcel ID.
The script is already working, but it takes too much time.
The probleme is, I think, the script go through all array2 for each row in array 1.
What I want is when there are more than 1 rows with the same ID in array1, take only this ID in both arrays.
Maybe if I order them by IDs, I could write a condition. kind of "when you find what you're looking for, stop searching when you'll find a different value?
It's hard to explain it clearly because I've been using VB since last week.. And english isn't my first language! ;)
If you just want to find out if there are any differences between the first and second array, you could do:
Dim diff = New HashSet(of Polygon)(array1)
diff.SymmetricExceptWith(array2)
diff will contain any Polygon which is unique to array1 or array2. If you want to do other types of comparisons, maybe you should explain what you're trying to do exactly.
UPDATE:
You could use grouping and lookups like this:
'Create lookup with first array, for fast access by ID
Dim lookupByID = array1.ToLookup(Function(p) p.id)
'Loop through each group of items with same ID in array2
For Each secondArrayValues in array2.GroupBy(Function(p) p.id)
Dim currentID As Integer = secondArrayValues.Key 'Current ID is the grouping key
'Retrieve values with same ID in array1
'Use a hashset to easily compare for equality
Dim firstArrayValues As New HashSet(of Polygon)(lookupByID(currentID))
'Check for differences between the two sets of data, for this ID
If Not firstArrayValues.SetEquals(secondArrayValues) Then
'Data has changed, do something
Console.WriteLine("Differences for ID " & currentID)
End If
Next
I am answering this question based on the first part that you wrote (that is without the EDIT section). The correct answer should explain a good algorithm but I am suggesting you to use DB capabilities because they have optimized many queries for these purpose.
Put all the records in DB two tables - O(n) time ... If the records are static you dont need to perform this step every time.
Table 1
id type percent
Table 2
id type percent
Then use the DB query, some thing like this
select count(*) from table1 t1, table2 t2 where t1.id!=t2.id and t1.type!=t2.type
(you can use some better queries, what I am trying to say is give the control to DB to perform this operation)
retrieve the result in your code and perform the necessary operation.
EDIT
1) You can sort them in O(n logn) time based on ID + type + Percent and then perform binary search.
2) Store the first record in hash map with appropriate key - could be ID only or ID+type
this will take O(n) time and searching ,if key is correct, will take constant time.
You need to define a structure to store this data. We'll store all the data in a LandParcel class, which will have a HashSet<ParcelData>
public class ParcelData
{
public ParcelType Type { get; set; } // This can be an enum, string, etc.
public int Percent { get; set; }
// Redefine Equals and GetHashCode conveniently
}
public class LandParcel
{
public ID Id { get; set; } // Whatever the type of the ID is...
public HashSet<ParcelData> Data { get; set; }
}
Now you have to build your data structure, with something like this:
Dictionary<ID, LandParcel> data1 = new ....
foreach (var item in array1)
{
LandParcel p;
if (!data1.TryGetValue(item.id, out p)
data1[item.id] = p = new LandParcel(id);
// Can this data be repeated?
p.Data.Add(new ParcelData(item.type, item.percent));
}
You do the same with a data2 dictionary for the second array. Now you iterate for all items in data1 and compare them with the item with the same id for data2.
foreach (var parcel2 in data2.Values)
{
var parcel1 = data1[parcel2.ID]; // Beware with exceptions here !!!
if (!parcel1.Data.SetEquals(parcel2.Data))
// You have different parcels
}
(Now that I look at it, we are practically doing a small database query here, kind of smelly code ...)
Sorry for the C# code since I don't really feel so comfortable with VB, but it should be fairly straightforward.

Field's value of native query in JPA

How to get value of some fields in a native query (JPA)?
For example I want to get name and age of customer table:
Query q = em.createNativeQuery("SELECT name,age FROM customer WHERE id=...");
Note: I don't want to map results to entities. I just want to get the value of the field.
Thanks
A native query with multiple select expressions returns an Object[] (or List<Object[]>). From the specification:
3.6.1 Query Interface
...
The elements of the result of a Java
Persistence query whose SELECT clause
consists of more than one select
expression are of type Object[]. If
the SELECT clause consists of only one
select expression, the elements of the
query result are of type Object. When
native SQL queries are used, the SQL
result set mapping (see section
3.6.6), determines how many items (entities, scalar values, etc.) are
returned. If multiple items are
returned, the elements of the query
result are of type Object[]. If only a
single item is returned as a result of
the SQL result set mapping or if a
result class is specified, the
elements of the query result are of
type Object.
So, to get the name and age in your example, you'd have to do something like this:
Query q = em.createNativeQuery("SELECT name,age FROM customer WHERE id = ?1");
q.setParameter(1, customerId);
Object[] result = (Object[])q.getSingleResult();
String name = result[0];
int age = result[1];
References
JPA 1.0 specification
Section 3.6.1 "Query Interface"
Section 3.6.6 "SQL Queries"
Depends on the JPA implementation. Hibernate does it different than castor, for example.
Here's a example how it would work in Hibernate:
Query query = session.createSQLQuery(
"select s.stock_code from stock s where s.stock_code = :stockCode")
.setParameter("stockCode", "7277");
List result = query.list();
I don't think that it is possible to get a plane value such as an integer... but this one should come very close to it.

NHibernate How to Select distinct objects based on specific property using HQL?

How can HQL be used to select specific objects that meet a certain criteria?
We've tried the following to generate a list of top ten subscribed RSS feeds (where SubscriptionCount is a derived property):
var topTen = UoW.Session.CreateQuery( #"SELECT distinct rss
FROM RssFeedSubscription rss
group by rss.FeedUrl
order by rss.SubscriptionCount DESC
")
.SetMaxResults(10)
.List<RssFeedSubscription>();
Where the intention is only to select the two unique feed URLs in the database, rather than the ten rows int the database instantiated as objects. The result of the above is:
Column 'RssSubscriptions.Id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
ORDER BY items must appear in the select list if SELECT DISTINCT is specified.
It's possible just to thin out the results so that we take out the two unique feed URLs after we get the data back from the database, but there must be a way to do this at the DB level using HQL?
EDIT: We realise it's possible to do a Scalar query and then manually pull out values, but is there not a way of simply specifying a match criteria for objects pulled back?
If you change your HQL a bit to look like that:
var topTen = UoW.Session.CreateQuery( #"SELECT distinct rss.FeedUrl
FROM RssFeedSubscription rss
group by rss.FeedUrl
order by rss.SubscriptionCount DESC
")
.SetMaxResults(10)
.List();
the topTen variable will be an object[] with 2 elements in there being the 2 feed URLs.
You can have this returned as strongly typed collection if you use the SetResultTransformer() method of the IQuery interfase.
You need to perform a scalar query. Here is an example from the NHibernate docs:
IEnumerable results = sess.Enumerable(
"select cat.Color, min(cat.Birthdate), count(cat) from Cat cat " +
"group by cat.Color"
);
foreach ( object[] row in results )
{
Color type = (Color) row[0];
DateTime oldest = (DateTime) row[1];
int count = (int) row[2];
.....
}
It's the group by rss.FeedUrl that's causing you the problem. It doesn't look like you need it since you're selecting the entities themselves. Remove that and I think you'll be good.
EDIT - My apologies I didn't notice the part about the "derived property". By that I assume you mean it's not a Hibernate-mapped property and, thus doesn't actually have a column in the table? That would explain the second error message you received in your query. You may need to remove the "order by" clause as well and do your sorting in Java if that's the case.