Can you avoid for loops with SolrJ?

I was curious if there was a way to avoid using loops when using SolrJ.
For example, with straight SQL and an appropriate Java library, I could return a query result, cast it as a List, and pass it up to my view (in a webapp).
SolrJ (SolrQuery and QueryResponse) doesn't seem to offer a way to return such a list directly. This would imply I have to create an iterator to go through each returned doc and pull out the value I want, which isn't ideal.
Is there something I am missing here? Is there a way to avoid these seemingly useless loops?

The SolrJ wiki gives an example that does what you want:
https://wiki.apache.org/solr/Solrj#Reading_Data_from_Solr
Basically:
QueryResponse rsp = server.query( query );
SolrDocumentList docs = rsp.getResults();
List<Item> beans = rsp.getBeans(Item.class);
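For getBeans to work, Item has to be a plain bean whose fields carry SolrJ's @Field annotation so the response can be bound to it. A minimal sketch of such a bean (the class and field names here are assumptions matching the "name" field used further below):
import org.apache.solr.client.solrj.beans.Field;
public class Item {
    // Bound to the "name" field of each returned document.
    @Field("name")
    private String name;
    public String getName() {
        return name;
    }
}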
EDIT:
Based on your comments below, it appears what you want is a non-looping transform of the Solr response (e.g. a map function in a functional language). Google's Guava library provides something like this. My example below assumes that your Solr response has a "name" field that you want to return a list of:
QueryResponse rsp = server.query(query);
SolrDocumentList docs = rsp.getResults();
List<String> names = Lists.transform(docs, new Function<SolrDocument, String>() {
    @Override
    public String apply(SolrDocument d) {
        return (String) d.get("name");
    }
});
Unfortunately, Java does not support this style of programming very well, so the functional approach ends up being more verbose (and probably less clear) than a simple loop:
QueryResponse rsp = server.query(query);
SolrDocumentList docs = rsp.getResults();
List<String> names = new ArrayList<String>();
for (SolrDocument d : docs) names.add((String) d.get("name"));
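If you are on Java 8 or newer (an assumption; the original answer predates it), the same non-looping transform can be written as a stream map, with no Guava dependency:
import java.util.List;
import java.util.stream.Collectors;
// SolrDocumentList is a List<SolrDocument>, so it can be streamed directly.
List<String> names = rsp.getResults().stream()
        .map(d -> (String) d.get("name"))
        .collect(Collectors.toList());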

Is there a way to rewrite INSERT, MODIFY or DELETE sparql using ARQ Jena Algebra?

I found documentation on SPARQL query manipulation, but only for SELECT, ASK, and CONSTRUCT (https://jena.apache.org/documentation/query/manipulating_sparql_using_arq.html and https://jena.apache.org/documentation/query/algebra.html), and could not find anything regarding UPDATE operations.
Any examples I can look at?
Thanks.
It seems you can use org.apache.jena.sparql.syntax.syntaxtransform.UpdateTransformOps.
I have the same requirement, and as the documentation is very limited here, I'm still stepping through in debug mode to see how I can achieve my goal. You can take inspiration from this; if you have a better solution I would be very interested.
Something like this:
public class OpPermissionTransformer extends ElementTransformCopyBase {
    @Override
    public Element transform(ElementNamedGraph el, Node gn, Element elt1) {
        return elt1;
    }
}
UpdateRequest modified = UpdateTransformOps.transform(update, new OpPermissionTransformer(), new NodeTransformExpr(n -> {
    // modify the node as you wish
    return n;
}));
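For context, update in the snippet above is a Jena UpdateRequest. If you are starting from a SPARQL update string rather than an existing request, one way to obtain it (the graph and triple here are just placeholder examples) is to parse it with UpdateFactory:
import org.apache.jena.update.UpdateFactory;
import org.apache.jena.update.UpdateRequest;
// Parse a SPARQL UPDATE string into an UpdateRequest that can then be transformed.
UpdateRequest update = UpdateFactory.create(
        "INSERT DATA { GRAPH <http://example.org/g> { <http://example.org/s> <http://example.org/p> 1 } }");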

Google Cloud Dataflow, BigQueryIO and NullPointerException on TableRow.get

I'm new to GC Dataflow and didn't find a relevant answer here. Apologies if I should have found this already answered.
I'm trying to create a simple pipeline using the v2.0 SDK and am having trouble reading data into my PCollection using BigQueryIO. I am using the .withQuery method, and I have tested the query in the BigQuery interface and it seems to be working fine. The initial PCollection seems to get created without any issues, but when I then set up a simple ParDo function to convert the values from the TableRow into a PCollection, I get a NullPointerException on the line of code that calls .get on the TableRow object.
Here is my code. (I'm probably missing something simple. I'm a total newbie at Pipeline programming. Any input would be most appreciated.)
public class ClientAutocompletePipeline {
    private static final Logger LOG = LoggerFactory.getLogger(ClientAutocompletePipeline.class);

    public static void main(String[] args) {
        // create the pipeline
        Pipeline p = Pipeline.create(
                PipelineOptionsFactory.fromArgs(args).withValidation().create());
        // A step to read in the product names from a BigQuery table
        p.apply(BigQueryIO.read().fromQuery("SELECT name FROM [beaming-team-169321:Products.raw_product_data]"))
            .apply("ExtractProductNames", ParDo.of(new DoFn<TableRow, String>() {
                @ProcessElement
                public void processElement(ProcessContext c) {
                    // Grab a row from the BigQuery Results
                    TableRow row = c.element();
                    // Get the value of the "name" column from the table row.
                    // NOTE: This is the line that is giving me the NullPointerException
                    String productName = row.get("name").toString();
                    // Make sure it isn't empty
                    if (!productName.isEmpty()) {
                        c.output(productName);
                    }
                }
            }))
The query definitely works in the BigQuery UI and the column called "name" is returned when I test the query. Why am I getting a NullPointerException on this line:
String productName = row.get("name").toString();
Any ideas?
This is a common problem when working with BigQuery and Dataflow (most likely the field is indeed null). If you are ok with using Scala, you could take a look at Scio (which is a Scala DSL for Dataflow) and its BigQuery IO.
Just make your code null safe. Replace this:
String productName = row.get("name").toString();
With something like this:
String productName = String.valueOf(row.get("name"));
I think I'm late to this, but you can do something like if (row.containsKey("column-name")).
This will basically tell you whether the field is null or not.
What happens in BigQuery is that, while reading data, if a column value is null, it is not available as part of that particular TableRow. Hence, you are getting that error. You can also do something like if (null == row.get("column-name")) to check whether the field is null.
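Putting those suggestions together, a null-safe version of the DoFn from the question might look like this (a sketch only; the class name is made up, and it assumes the same "name" column and the Beam 2.0 classes used in the question):
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.transforms.DoFn;
public class ExtractProductNameFn extends DoFn<TableRow, String> {
    @ProcessElement
    public void processElement(ProcessContext c) {
        TableRow row = c.element();
        // BigQuery omits null columns from the TableRow, so guard the lookup before calling toString().
        Object name = row.get("name");
        if (name != null && !name.toString().isEmpty()) {
            c.output(name.toString());
        }
    }
}
You would then plug it into the pipeline with ParDo.of(new ExtractProductNameFn()) in place of the anonymous DoFn.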

Orchard Search multiple fields with same term

I am trying to create a custom search module based on Orchard.Search. I have created a custom field called keywords, which I have successfully added to the index. I want to match content where the title, body or keywords match. Adding these using .WithField, or passing a string array of fields, requires every field to match the term; I need it to return content if there is a match in any of the fields. I have included examples of how I am using both methods below.
Examples of how I am using the search builder:
var searchBuilder = Search()
    .WithField("type", "Cell").Mandatory().ExactMatch()
    .WithField("body", query)
    .WithField("title", query)
    .WithField("cell-keywords", query);
String Array FieldNames:
string[] searchFields = new string[] { "body", "title", "cell-keywords" };
var searchBuilder = Search().WithField("type", "Cell").Mandatory().ExactMatch().Parse(searchFields, query, false);
If anyone could point me in the right direction, that would be fantastic :)
A colleague wrote an article on this on his blog, which should prove helpful: http://breakoutdeveloper.com/orchard-cms/creating-an-advanced-search
I have resolved my issue!
The problem was when I was adding my keywords field to the index in the part handler: there were content items with NULL values, which was causing an error that I had missed!

Entity Framework and LINQ together Batch Update

AFAIK, Entity Framework 6 doesn't support batch insert/update/delete.
Is there any way to make a batch update over an IQueryable object? As an example I have
var query = _db.People.Where(x=>x.Name.Contains(parameter));
an IQueryable (query) object, and I want to get the generated SQL. Then I hope I can create an update command with this select query, like this:
Update filteredPerson
Set filteredPerson.Status = 'Updated'
from (here it comes IQueryable Generated SQL :) ) as filteredPerson
using DbContext's raw SQL execution commands. BTW, I don't need EF features like change tracking and automatic change detection; it is just a batch operation.
I know it is pretty risky but I am going to use it for a small piece of code.
Other approaches are appreciated. If you know something better, I would like to hear it.
REASON: I want to do it this way because I don't want to spoil the separation of layers, and some validation and filtering comes into the queryable object from other layers, so it is hard to convert it to a stored procedure. On the other hand, it should be faster than the standard approach.
Again, I know there is no support in Entity Framework 6 for batch operations, but the other questions on this are a bit outdated. That's another reason why I want to ask this again.
While I was writing the question, I was already guessing how I was going to solve it, but I was looking for a more proper way. In the end, I know what I am doing and tried to keep it simple for my colleagues who will look at the same code after me. I know it has some risky usages, but I let the CLR handle the exceptions. After this excuse :), I wrote the code like this:
Let's say I have an IQueryable object which is generated this way:
string parameter = "John";
AdventureWorks2012Entities _db = new AdventureWorks2012Entities();
var query = _db.People.AsQueryable();
//Some parameters added from different layers
query = query.Where(x => x.FirstName.Contains(parameter));
Then I want a batch update over this IQueryable object.
var sqlFrom = query.ToString(); // This is the query which becomes the "from" table
var dbq = query.AsDbQuery().GetObjectQuery(); // This does some magic with reflection
var linqParams = dbq.Parameters
    .Select(x => new System.Data.SqlClient.SqlParameter(x.Name, x.Value)).ToList();
linqParams.Add(new System.Data.SqlClient.SqlParameter("@ModDate", DateTime.Now));
var sqlBatchUpdate = @"Update filteredPerson Set ModifiedDate = @ModDate From (@FilteredPerson) as filteredPerson"
    .Replace("@FilteredPerson", sqlFrom);
var affectedRows = _db.Database.ExecuteSqlCommand(sqlBatchUpdate, linqParams.ToArray());
That's it! Now I don't have to repeat the same business logic in a stored procedure again, and it is much faster than a foreach and SaveChanges combo.
So I ended up with this for very basic usage. As a quick solution it no doubt brings its own problems! But I know I can easily wrap it for new purposes, so it is up to the programmer who wants to use it to adapt it to their needs.
Also, the code which does the reflection and casting is below, and I added a gist with the full code:
public static ObjectQuery<T> GetObjectQuery<T>(this DbQuery<T> query)
{
    // Grab the private _internalQuery field from the DbQuery instance.
    var internalQueryField = query.GetType()
        .GetFields(BindingFlags.NonPublic | BindingFlags.Instance)
        .Where(f => f.Name.Equals("_internalQuery"))
        .FirstOrDefault();
    var internalQuery = internalQueryField.GetValue(query);
    // Then pull the wrapped ObjectQuery<T> out of the internal query.
    var objectQueryField = internalQuery.GetType()
        .GetFields(BindingFlags.NonPublic | BindingFlags.Instance)
        .Where(f => f.Name.Equals("_objectQuery"))
        .FirstOrDefault();
    var objectQuery = objectQueryField.GetValue(internalQuery) as ObjectQuery<T>;
    return objectQuery;
}
Here is the Gist file. Hope it helps somebody out there.

Nhibernate Tag Cloud Query

This has been a 2 week battle for me so far with no luck. :(
Let me first state my objective: to be able to search entities which are tagged "foo" and "bar". You wouldn't think that was too hard, right?
I know this can be done easily with HQL, but since this is a dynamically built search query, that is not an option. First, some code:
public class Foo
{
    public virtual int Id { get; set; }
    public virtual IList<Tag> Tags { get; set; }
}
public class Tag
{
    public virtual int Id { get; set; }
    public virtual string Text { get; set; }
}
Mapped as a many-to-many because the Tag class is used on many different types. Hence no bidirectional reference.
So I build my detached criteria up using an abstract filter class. Let's assume for simplicity that I am just searching for Foos with tags "Apples" (TagId 1) && "Oranges" (TagId 3); this would look something like:
SQL:
SELECT ft.FooId
FROM Foo_Tags ft
WHERE ft.TagId IN (1, 3)
GROUP BY ft.FooId
HAVING COUNT(DISTINCT ft.TagId) = 2; /*Number of items we are looking for*/
Criteria
var idsIn = new List<int>() { 1, 3 };
var dc = DetachedCriteria.For(typeof(Foo), "f")
    .CreateCriteria("Tags", "t")
    .Add(Restrictions.InG("t.Id", idsIn))
    .SetProjection(Projections.ProjectionList()
        .Add(Projections.Property("f.Id"))
        .Add(Projections.RowCount(), "RowCount")
        .Add(Projections.GroupProperty("f.Id")))
    .ProjectionCriteria.Add(Restrictions.Eq("RowCount", idsIn.Count));
var c = Session.CreateCriteria(typeof(Foo)).Add(Subqueries.PropertyIn("Id", dc));
Basically this is creating a DC that projects a list of Foo Ids which have all the tags specified.
This compiled in NH 2.0.1 but didn't work as it complained it couldn't find Property "RowCount" on class Foo.
After reading this post I was hopeful that this might be fixed in 2.1.0 so I upgraded. To my extreme disappointment I discovered that ProjectionCriteria has been removed from DetachedCriteria and I cannot figure out how to make the dynamic query building work without DetachedCriteria.
So I tried to think how to write the same query without needing the infamous Having clause. It can be done with multiple joins on the tag table. Hooray, I thought, that's pretty simple. So I rewrote it to look like this:
var idsIn = new List<int>() { 1, 3 };
var dc = DetachedCriteria.For(typeof(Foo), "f")
    .CreateCriteria("Tags", "t1").Add(Restrictions.Eq("t1.Id", idsIn[0]))
    .CreateCriteria("Tags", "t2").Add(Restrictions.Eq("t2.Id", idsIn[1]));
In a vain attempt to produce the SQL below, which would do the job (I realise it's not quite correct):
SELECT f.Id
FROM Foo f
JOIN Foo_Tags ft1
ON ft1.FooId = f.Id
AND ft1.TagId = 1
JOIN Foo_Tags ft2
ON ft2.FooId = f.Id
AND ft2.TagId = 3
Unfortunately, I fell at the first hurdle with this attempt, receiving the exception "Duplicate Association Path". Reading around, this seems to be an ancient and still very real bug/limitation.
What am I missing?
I am starting to curse NHibernate's name for making what you would think is such a simple and common query so difficult. Please help, anyone who has done this before. How did you get around NHibernate's limitations?
Forget reputation and a bounty. If someone does me a solid on this I will send you a 6 pack for your trouble.
I managed to get it working like this:
var dc = DetachedCriteria.For<Foo>("f")
    .CreateCriteria("Tags", "t")
    .Add(Restrictions.InG("t.Id", idsIn))
    .SetProjection(Projections.SqlGroupProjection("{alias}.FooId",
        "{alias}.FooId having count(distinct t1_.TagId) = " + idsIn.Count,
        new[] { "Id" },
        new IType[] { NHibernateUtil.Int32 }));
The only problem here is the hard-coded t1_ alias in count(distinct t1_.TagId), but I think the alias should be generated the same way every time for this DetachedCriteria, so you should be on the safe side hard-coding it.
Ian,
Since I'm not sure what db backend you are using, can you do some sort of a trace against the produced SQL query and take a look at the SQL to figure out what went wrong?
I know I've done this in the past to understand how Linq-2-SQL and Linq-2-Entities have worked, and been able to tweak certain cases to improve the data access, as well as to understand why something wasn't working as initially expected.