Solr 5.3 implementation processes docs but doesn't return results

I have recently set up a local instance of Solr 5.3 in an effort to get it going for my company. As an initial test case I've set up a Data Import Handler (DIH) that indexes PDFs stored within a file directory. When I execute the full import in the admin tool, the DIH processes all the files within the directory, and I'm able to run a general query (*:*) which returns all indexed fields for every record in the index.
When I switch to a specific query using a word definitely contained within the files, however, Solr returns no results. What connection am I not making here?
I can provide excerpts from the schema, solrconfig, and custom data config if needed, but I don't want to oversaturate this post.

The answer I came up with involved a simple newbie mistake combined with something I wasn't anticipating.
1) First, I didn't have my field set to indexed="true". I set that. Yeesh, it stinks being new to this!
2) I needed to make a change to solrconfig.xml for the core in question. Thanks to this article, I was able to determine that I needed to add a default field in the /select requestHandler. Uncommenting the relevant line in solrconfig.xml and changing the field name did the trick: I no longer need to supply the field name in the df parameter to return results.
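For anyone hitting the same pair of issues, here is a minimal sketch of both changes, assuming a hypothetical field named content (substitute your own field and type names). The first line belongs in schema.xml, the handler in solrconfig.xml:

<!-- schema.xml: the field must be indexed to be searchable -->
<field name="content" type="text_general" indexed="true" stored="true"/>

<!-- solrconfig.xml: default field for queries that don't pass df -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="rows">10</str>
    <str name="df">content</str>
  </lst>
</requestHandler>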
My carryover question for anyone coming across this question in the future is whether this latter point is the proper way to go about using default fields. I see that the defaultSearchField element in schema.xml is deprecated (or heading that direction) in 5.3.0. So is it alright to define df in solrconfig.xml instead?


Underscore and dash in column names after JSON import

I've been using OpenRefine very successfully for a couple of years, working solely with CSV (and TSV) source files. Recently I had some tables from an SQL database that I wanted to bring into OpenRefine, so I exported them as JSON and then used OpenRefine's JSON import feature. It works beautifully except that the column names all begin with "_ - ". For example, my JSON records start with
{"ID":"97247",
and OpenRefine made the first column name "_ - ID" instead of just ID (which I'd prefer; I know I can edit the names later, but I have hundreds of fields). I can't see any settings in the parsing page that might help with this. Does anyone know if there is a way to import without the extra characters (or if there's an explanation for the underscore and dash)? I'm considering submitting a feature request, but I thought I'd check to see what other users may know.
This is a known issue.
There has also been a proposal to switch to a standard representation for JSON paths.
Feel free to comment on either ticket to indicate which solution you would prefer.

Is there a way to properly experiment with Solr field-types?

I'm working with Solr for a basic search engine, and I've created a couple of different fieldTypes that include various filters and tokenizers in their analyzer chains.
However, I'm finding it very difficult to assess how these components of the chain interact, and when I query in the Solr Admin I consistently get different results than I expect, with no clue as to why.
Is there a way to see what a phrase like education:"x university" is being transformed into when I type it in the q section of the Admin?
Also, when the phrase goes through the chain, can it be transformed into multiple things that are all searched, or is it just a single modified phrase?
Thanks for any help!
Use the Analysis screen in the Solr Admin UI to check how each field and its field type process tokens, both while querying and while indexing.
Analyse Fieldname / FieldType:
From the drop-down, select the field or type that you want to analyse and click "Analyse Values". The output shows, for example, which tokenizer is used, which filter classes are applied to the token, and how the token is transformed after passing through each filter class.
If "Verbose Output" is checked, it shows more details about each filter class used for the selected field or type.
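As a concrete illustration, here is a minimal sketch of a fieldType you could add to schema.xml and then inspect on the Analysis screen (the name text_custom and the exact filter chain are just assumptions for the example):

<fieldType name="text_custom" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

Entering "X University" on the Analysis screen for this type would show StandardTokenizer splitting it into two tokens, LowerCaseFilter producing x and university, and PorterStemFilter reducing university to univers; the side-by-side index-time and query-time views make it easy to spot mismatches between the two chains.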

Lucene Difference between OpenMode.CREATE_OR_APPEND and deleteDocuments

I am pretty new to the Lucene search engine and want to understand the functionality of OpenMode.CREATE_OR_APPEND and deleteDocuments. Also, the IndexSearcher.search method can accept either a Term or a Query as a parameter to fetch documents. Can you help me out with the scenarios in which I need to use a Term versus a Query?
The OpenMode does not affect the behavior of deleteDocuments. It only affects what happens when you open the IndexWriter:
CREATE - Creates a new index. If one already exists, it will be overwritten.
CREATE_OR_APPEND - Uses an existing index, or creates it if none currently exists.
APPEND - Uses an existing index. If none currently exists, throws an IOException.
I'm not aware of any IndexSearcher.search method that takes a Term as an argument. If you can link to what you are referring to, that might be helpful.
However, if you want to search for a term, you can just wrap it in a TermQuery, as in the sketch below.
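To make both points concrete, here is a minimal sketch against the Lucene 5.x API (the field name body and the demo class are assumptions for the example):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class OpenModeDemo {
    public static void main(String[] args) throws Exception {
        Directory dir = new RAMDirectory();

        // OpenMode only matters at the moment the writer is opened:
        // CREATE_OR_APPEND reuses the index in dir if one exists,
        // otherwise it creates a fresh one.
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        config.setOpenMode(OpenMode.CREATE_OR_APPEND);

        try (IndexWriter writer = new IndexWriter(dir, config)) {
            Document doc = new Document();
            doc.add(new TextField("body", "hello lucene", Field.Store.YES));
            writer.addDocument(doc);

            // deleteDocuments is independent of OpenMode: it removes
            // matching documents from whatever index the writer holds.
            writer.deleteDocuments(new Term("body", "obsolete"));
        }

        // search takes a Query, not a bare Term, so wrap the Term
        // in a TermQuery.
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            Query query = new TermQuery(new Term("body", "hello"));
            TopDocs hits = searcher.search(query, 10);
            System.out.println("hits: " + hits.totalHits);
        }
    }
}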

Extracting data using U-SQL file set pattern when silent switch is true

I want to extract data from multiple files, so I am using a file set pattern that requires one virtual column. Because of some issues in my data, I also require the silent switch; otherwise I am not able to process my data. It looks like when I use the virtual column together with the silent switch, it does not extract any rows.
@drivers =
    EXTRACT name string,
            age string,
            origin string
    FROM "/input/{origin:*}file.csv"
    USING Extractors.Csv(silent:true);
Note that I can extract data from a single file by removing the virtual column. Is there any solution for this problem?
First, you do not need to name the wildcard (and expose a virtual column) if you do not plan on referring to its value. That said, we recommend making sure that you are not processing too many files with this pattern, so for now it may be best to use the virtual column as a filter to restrict the number of files to a few thousand, until we improve the implementation to work on more files. Both variants are sketched below.
I assume that at least one file contains some rows with two columns? If that is the case I think you found a bug. Could you please send me a simple repro (one file that works, and an additional file where it stops working and the script) to my email address so I can file it and we can investigate it?
Thanks!
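For reference, here are sketches of both suggestions (the paths and the filter value are assumptions; variant 1 relies on the unnamed-wildcard option mentioned above):

// Variant 1: anonymous wildcard, no virtual column exposed.
@drivers =
    EXTRACT name string,
            age string
    FROM "/input/{*}file.csv"
    USING Extractors.Csv(silent:true);

// Variant 2: keep the virtual column, but use it as a filter
// so the file set is restricted before extraction.
@drivers2 =
    EXTRACT name string,
            age string,
            origin string
    FROM "/input/{origin:*}file.csv"
    USING Extractors.Csv(silent:true);

@filtered =
    SELECT name, age, origin
    FROM @drivers2
    WHERE origin.StartsWith("us");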

Why are my Lucene Document results empty?

I'm running a simple test: trying to index something and then search for it. I index a simple document, but then when I search for a string in it, I get back what looks to be an empty document (it has no fields). Lucene seems to be doing something, because if I search for a word that's not in the document, it returns 0 results.
Any reason why Lucene would reliably return a document when it finds one that matches the given query, and yet that document has nothing in it?
More details:
I'm actually running Lucandra (Lucene + Cassandra). That certainly may be a relevant detail, but not sure.
The fields are set to Field.Store.YES and Field.Index.ANALYZED.
Interestingly, I'm able to get this to work just fine on my local machine, but when we put it on our main server (which is a multi-node cassandra setup), I get the behavior described above. So this seems like probably the relevant detail, but unfortunately, I see no error message to clue me in to what specifically is causing it.
Unsure if this will work with Lucandra, but have you tried opening the index using Luke? Viewing the index contents with Luke might help.
It's hard to tell what the problem is since you only provide a very abstract description. However, it sounds a bit like you are not storing the field value in the index. There are different modes for indexing a field. One option determines whether the original value is stored in the index to retrieve it later:
http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/document/Field.Store.html
See also the description of the enclosing class, Field.
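To show the distinction, here is a minimal sketch against the Lucene 3.x API that the linked Javadoc describes (the field names are just examples):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class StoredFieldDemo {
    public static Document build() {
        Document doc = new Document();
        // Store.YES + Index.ANALYZED: searchable, and the original
        // value comes back via doc.get("body") on a retrieved hit.
        doc.add(new Field("body", "some searchable text",
                Field.Store.YES, Field.Index.ANALYZED));
        // Store.NO + Index.ANALYZED: still searchable, but the value
        // is absent from retrieved documents, which looks exactly like
        // the empty-document symptom described above.
        doc.add(new Field("summary", "indexed but not stored",
                Field.Store.NO, Field.Index.ANALYZED));
        return doc;
    }
}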
Read: http://anismiles.wordpress.com/2010/05/27/lucandra-an-inside-story/