Filter by related document in RavenDB - ravendb

How can I translate the following SQL query to RQL?
Select bike.* from Bike
inner join Appointment on Bike.BikeID = Appointment.BikeID and
Appointment.Status != 'Booked'
I know that in Mongo it is possible to filter by related documents.
However, RavenDB documentation says that:
It's important to remember that the load clause is not a join; it's applied after the query has already run and before we send the interim results to the projection for the final result. Thus, it can't be used to filter the results of the query by loading related documents and filtering on their properties.
So what is the correct way to run such a query in RavenDB?

Related

Google Bigquery, WHERE clause based on JSON item

I've got a bigquery import from a firestore database where I want to query on a particular field from a document. This was populated via the firestore-bigquery extension and the document data is stored as a JSON string.
I'm trying to use a WHERE clause in my query that uses one of the fields from the JSON data. However this doesn't seem to work.
My query is as follows:
SELECT json_extract(data,'$.title') as title,p
FROM `table`
left join unnest(json_extract_array(data, '$.tags')) as p
where json_extract(data,'$.title') = 'technology'
data is the JSON object and title is an attribute of all of the items. The above query will run but yield 'no results' (There are definitely results there for the title in question as they appear in the table preview).
I've tried using WHERE title = 'technology' as well but this returns an error that title is an unrecognized field (hence the json_extract).
From my research this should work as a standard SQL JSON query but doesn't seem to work on Bigquery. Does anyone know of a way around this?
All I can think of is if I put the results in another table, but I don't know if that's a workable solution as the data is updated via the extension on an update, so I would need to constantly refresh my second table as well.
Edit
I'm wondering if configuring a view would help with this? Though ultimately I would like to query this based on different parameters and the docs here https://cloud.google.com/bigquery/docs/views suggest you can't reference query parameters in a view
I've since managed to work this out, and will share the solution for anyone else with the same problem.
The solution was to use JSON_VALUE in the WHERE clause instead e.g:
where JSON_VALUE(data,'$.title') = 'technology';
I'm still not sure if this is the best way to do this in terms of performance and cost so I will wait to see if anyone else leaves a better answer.

Paging in hibernate subqueries

I have quite a complex query that applies different layers of filtering and requires ordering/paging.
In pseudo-SQL I want to do the following:
SELECT ... FROM a WHERE a.id in (SELECT a.id FROM a WHERE [...] limit 10,10)
I use Criteria and DetachedCriteria, something like this:
Criteria criteria = session.createCriteria(xx.class, "xx");
DetachedCriteria idFilterSubQuery = DetachedCriteria.forClass(xx.class, "xx");
//ADD ALL FILTERS TO idFilterSubQuery
//ADD ALL OUTER_JOINS TO criteria and idFilterSubQuery
criteria.add(Subqueries.propertyIn("id", idFilterSubQuery));
Now for the paging, I can't add it to criteria, since the outer-joins mess it up... So I need to add it to idFilterSubQuery... BUT, DetachedCriteria does not have setFirstResult() and setMaxResult()... But you can't add normal Criteria as SubQuery (I haven't found a way at least)...
I would like a solution that doesn't fetch all id's first, and the full objects afterwards (reduce a round-trip to the DB)
Any ideas how to achieve this? So either use Criteria as SubQuery or add paging to a DetachedCriteria...
I'm using Hibernate 4.2.21

Path references in QueryDSL join to subquery

I'd like to know the best way to implement this query in SQL-style QueryDSL which joins to a subquery. I struggled a bit, but got it to generate the necessary SQL. I'm wondering if there are any simplifications/improvements, however, particularly related to the three "paths" I had to create? For example, would be great to define latestCaseId in terms of latestSubQuery.
In the simplified form of my actual query below, I am finding the set of records (fields spread across ucm and pcm) which have the latest timestamp per case group. The subquery identifies the latest timestamp per group so that we can filter the outer query by it.
final SimplePath<ListSubQuery> latestSubQueryPath = Expressions.path(ListSubQuery.class, "latest");
final SimplePath<Timestamp> caseLatestMentioned = Expressions.path(Timestamp.class, "caseLatestMentioned");
final SimplePath<Integer> latestCaseId = Expressions.path(Integer.class, "latest.caseId");
final ListSubQuery<Tuple> latest = new SQLSubQuery()
.from(ucm2)
.innerJoin(pcm2).on(ucm2.id.eq(pcm2.id))
.groupBy(pcm2.caseId)
.list(pcm2.caseId.as(latestCaseId), ucm2.lastExtracted.max().as(caseLatestMentioned));
q.from(ucm)
.join(pcm).on(ucm.id.eq(pcm.id))
.innerJoin(latest, latestSubQueryPath).on(pcm.caseId.eq(latestCaseId))
.where(ucm.lastExtracted.eq(caseLatestMentioned));
I believe you could use the .get(<various Path impls>) method of PathBuilder. The way I like to think of it is that creating final PathBuilder<Tuple> latestSubQueryPath = new PathBuilder<>(Tuple.class, "latest") and joining to it .innerJoin(latest, latestSubQueryPath) is creating an alias for the subquery. Then you can use .get(<various Path impls>) to access the fields as follows:
q.from(ucm)
.join(pcm).on(ucm.id.eq(pcm.id))
.innerJoin(latest, latestSubQueryPath).on(pcm.caseId.eq(latestSubQueryPath.get(pcm2.caseId)))
.where(ucm.lastExtracted.eq(latestSubQueryPath.get(maxLastExtractedDate)));
I've not run the code but hopefully this is in the right direction. If not, I'll have a look tomorrow when I have the relevant codebase to hand.
Update: As mentioned in the comments, ucm2.lastExtracted.max() requires an alias. I've called it maxLastExtractedDate and assume it's used to alias ucm2.lastExtracted.max() when creating the subquery.

Querying for software using SQL query in SCCM

I am looking for specific pieces of software across our network by querying the SCCM Database. My problem is that, for various reasons, sometimes I can search by the program's name and other times I need to search for a specific EXE.
When I run the query below, does it take 13 seconds to run if the where clause contains an AND, but it will run for days with no results if the AND is replaced with OR. I'm assuming it is doing this because I am not properly joining the tables. How can I fix this?
select vrs.Name0
FROM v_r_system as vrs
join v_GS_INSTALLED_SOFTWARE as VIS on VIS.resourceid = vrs.resourceid
join v_GS_SoftwareFile as sf on SF.resourceid = vrs.resourceid
where
VIS.productname0 LIKE '%office%' AND SF.Filename LIKE 'Office2007%'
GROUP BY vrs.Name0
Thanks!
Your LIKE clause contains a wildcard match at the start of a string:
LIKE '%office%'
This prevents SQL Server from using an index on this column, hence the slow running query. Ideally you should change your query so your LIKE clause doesn't use a wildcard at the start.
In the case where the WHERE clause contains an AND its querying based on the Filename clause first (it is able to use an index here and so this is relatively quick) and then filtering this reduced rowset based on your productname0 clause. When you use an OR however it isn't restricted to just returning rows that match your Filename clause, and so it must search through the entire table checking to see if each productname0 field matches.
Here's a good Microsoft article http://msdn.microsoft.com/en-us/library/ms172984.aspx on improving indexes. See the section on Indexes with filter clauses (reiterates the previous answer.)
Have you tried something along these lines instead of a like query?
... where product name in ('Microsoft Office 2000','Microsoft Office xyz','Whateverelse')

Django - finding the extreme member of each group

I've been playing around with the new aggregation functionality in the Django ORM, and there's a class of problem I think should be possible, but I can't seem to get it to work. The type of query I'm trying to generate is described here.
So, let's say I have the following models -
class ContactGroup(models.Model):
.... whatever ....
class Contact(models.Model):
group = models.ForeignKey(ContactGroup)
name = models.CharField(max_length=20)
email = models.EmailField()
...
class Record(models.Model):
contact = models.ForeignKey(Contact)
group = models.ForeignKey(ContactGroup)
record_date = models.DateTimeField(default=datetime.datetime.now)
... name, email, and other fields that are in Contact ...
So, each time a Contact is created or modified, a new Record is created that saves the information as it appears in the contact at that time, along with a timestamp. Now, I want a query that, for example, returns the most recent Record instance for every Contact associated to a ContactGroup. In pseudo-code:
group = ContactGroup.objects.get(...)
records_i_want = group.record_set.most_recent_record_for_every_contact()
Once I get this figured out, I just want to be able to throw a filter(record_date__lt=some_date) on the queryset, and get the information as it existed at some_date.
Anybody have any ideas?
edit: It seems I'm not really making myself clear. Using models like these, I want a way to do the following with pure django ORM (no extra()):
ContactGroup.record_set.extra(where=["history_date = (select max(history_date) from app_record r where r.id=app_record.id and r.history_date <= '2009-07-18')"])
Putting the subquery in the where clause is only one strategy for solving this problem, the others are pretty well covered by the first link I gave above. I know where-clause subselects are not possible without using extra(), but I thought perhaps one of the other ways was made possible by the new aggregation features.
It sounds like you want to keep records of changes to objects in Django.
Pro Django has a section in chapter 11 (Enhancing Applications) in which the author shows how to create a model that uses another model as a client that it tracks for inserts/deletes/updates.The model is generated dynamically from the client definition and relies on signals. The code shows most_recent() function but you could adapt this to obtain the object state on a particular date.
I assume it is the tracking in Django that is problematic, not the SQL to obtain this, right?
First of all, I'll point out that:
ContactGroup.record_set.extra(where=["history_date = (select max(history_date) from app_record r where r.id=app_record.id and r.history_date <= '2009-07-18')"])
will not get you the same effect as:
records_i_want = group.record_set.most_recent_record_for_every_contact()
The first query returns every record associated with a particular group (or associated with any of the contacts of a particular group) that has a record_date less than the date/ time specified in the extra. Run this on the shell and then do this to review the query django created:
from django.db import connection
connection.queries[-1]
which reveals:
'SELECT "contacts_record"."id", "contacts_record"."contact_id", "contacts_record"."group_id", "contacts_record"."record_date", "contacts_record"."name", "contacts_record"."email" FROM "contacts_record" WHERE "contacts_record"."group_id" = 1 AND record_date = (select max(record_date) from contacts_record r where r.id=contacts_record.id and r.record_date <= \'2009-07-18\')
Not exactly what you want, right?
Now the aggregation feature is used to retrieve aggregated data and not objects associated with aggregated data. So if you're trying to minimize number of queries executed using aggregation when trying to obtain group.record_set.most_recent_record_for_every_contact() you won't succeed.
Without using aggregation, you can get the most recent record for all contacts associated with a group using:
[x.record_set.all().order_by('-record_date')[0] for x in group.contact_set.all()]
Using aggregation, the closest I could get to that was:
group.record_set.values('contact').annotate(latest_date=Max('record_date'))
The latter returns a list of dictionaries like:
[{'contact': 1, 'latest_date': somedate }, {'contact': 2, 'latest_date': somedate }]
So one entry for for each contact in a given group and the latest record date associated with it.
Anyway, the minimum query number is probably 1 + # of contacts in a group. If you are interested obtaining the result using a single query, that is also possible, but you'll have to construct your models in a different way. But that's a totally different aspect of your problem.
I hope this will help you understand how to approach the problem using aggregation/ the regular ORM functions.