Lucene Jackrabbit - lucene

We recently added Lucene (2.4.1) support to our application, which works with Jackrabbit (1.6.2). We did everything as described in the Jackrabbit tutorial, and almost everything works fine. But I noticed some strange behavior and can't find any docs about it, so I decided to ask here.
For example: I have the following text in a Node (jcr:content), in the jcr:data property:
The quick brown fox jumps over the lazy dog
!##$%^&
travmik!
tra!vmik
My XPath query is the following:
String query = "root/element(*,my:documentBody)"
    + "[jcr:contains(*/*/element(*),'*" + param + "*')]";
Then I try to search:
"q", "qu", "qui", "quic", "quick", "k", "ck", "ick", "uick", "quick brown fox", "quick fox", "tra", "travmik", "mik" - all found ok
"tra!vmik", "travmik!", "!##$" - nothing
And yes, I escaped all special characters in the search term.
What did I do wrong?
P.S. I have one more question: the Lucene docs say that "You cannot use a * or ? symbol as the first character of a search", but I do use one and it works. Why?

I found the problem. It was a misunderstanding about the Extractors that Jackrabbit uses for indexing content. I don't want to go into details, but this piece of code from one of the Extractors is the cause of all my problems:
if (!Character.isLetterOrDigit(c)) {
    if (!space) {
        space = true;
        buffer.append(' ');
        continue;
    }
    continue;
}
If someone is interested in this - I can explain in greater detail.
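To see why the terms containing "!" or "#" never match, here is a rough Python sketch of what the quoted loop appears to do. It is an illustration, not Jackrabbit's code: the function name is hypothetical, `str.isalnum()` is only an approximation of Java's `Character.isLetterOrDigit`, and the branch that keeps ordinary characters is assumed because the quoted snippet does not show it.

```python
def collapse_non_alnum(text: str) -> str:
    """Replace each run of non-alphanumeric characters with a single space."""
    buffer = []
    space = False  # True while the last emitted character is a collapsed space
    for c in text:
        if not c.isalnum():  # rough analogue of !Character.isLetterOrDigit(c)
            if not space:
                space = True
                buffer.append(" ")
            continue
        # Assumed branch (not shown in the quoted snippet): keep the character.
        space = False
        buffer.append(c)
    return "".join(buffer)


for sample in ["tra!vmik", "travmik!", "!##$%^&"]:
    print(repr(sample), "->", repr(collapse_non_alnum(sample)))
```

Under these assumptions the punctuation is gone before the text reaches the index, so "tra!vmik" is stored as two separate words and a query that still contains "!" has nothing to match.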

Magento 2 API removes spaces within variables

I am working with orders and invoices.
I noticed that M2 (2.4.4) removes lots of spaces in almost all variables, e.g. for:
order['billing_address']:
'city': 'CHAMPIGNYSURMARNE'
In the backend, it is written correctly: 'CHAMPIGNY SUR MARNE'
Same for:
order['billing_address']
'additional_information': [
'Virementbancaire',
'Votrecommandeseraexpédiéelorsquelevirementdesonmontantseraconfirméparnotreorganismebancaire.\r\nVoustrouvereznoscoordonnéesbancairesdanslaconfirmationdecommandeenvoyéesurvotreboîteemail.'
],
I also noticed that the issue doesn't happen in all variables, although so far I can see only ONE value where it doesn't happen:
order['status_histories']
'comment': "Remboursement de 6,00\xa0€ hors ligne. <span style='color:deeppink'>(By Axel B)</span>",
Has anyone else noticed this?
My bad! I'm posting this answer because maybe someday somebody will be as scatterbrained as me.
The reason is that I use a new tool for formatting JSON and XML, among other things.
I recommend it to those who don't know it yet: DevToys (Mac and Win).
But it is responsible for my misfortunes, because it is the tool that removes the spaces when beautifying. As the file was long, I didn't even look at the raw file. I searched the software for an option that would avoid this behaviour ... without success.
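One way to rule out the formatter is to beautify the raw response with a second tool and compare. The sketch below uses Python's standard `json` module; the sample payload is a hypothetical excerpt modeled on the order data above, not an actual API response.

```python
import json

# Hypothetical excerpt of a raw API response (assumption for illustration).
raw = '{"billing_address": {"city": "CHAMPIGNY SUR MARNE"}}'


def beautify(raw_json: str) -> str:
    """Pretty-print JSON; string values are re-serialized unchanged."""
    return json.dumps(json.loads(raw_json), indent=2, ensure_ascii=False)


pretty = beautify(raw)
print(pretty)
```

If the spaces survive here but not in the other tool's output, the API is not the culprit.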

Postgres INSERT returning 'invalid input syntax' for json

Problem: Attempting to insert a JSON string into a Postgres table column of json datatype intermittently returns this error for some record insertion attempts but not others.
I confirmed using multiple third party 'JSON validator' apps that the JSON I am inserting is indeed valid, and I have confirmed that any single ' quote characters have been escaped with the double '' technique, and the issue persists.
What are some additional troubleshooting steps to consider?
Here is a scrubbed sample JSON I have attempted:
{"id": "jf4ba72kFNQ","publishedAt": "2012-09-02T06:07:28Z","channelId": "UCrbUQCaozffv1soNdfDROXQ","title": "Scout vs. Witch: a tale of boy meets ghoul (Official Version)","tags": ["L4D","TF2","SFM","animation","zombies","Valve","video game"],"description": "Howdy folks (he''s alive!). I made a new SFM video (October 2015), called \"Nick in a Hotel Room\". Please check it out: https://www.youtube.com/watch?v=FOCTgwBIun0\n\nAlso check out some early behind the scenes of Scout vs. Witch:\nhttps://www.youtube.com/watch?v=73tQEBgD09I\n\nYou can find links to my stuff on my website: http://nailbiter.net\n\n-----\n\nhey gang,\nI''m the animator who made this cartoon. Hope you like it.\n\nThis is my little mash-up of a bunch of stuff I like. What happens when the Scout from Valve''s Team Fortress 2 video-game walks into the wrong neighborhood (Left 4 Dead). Hilarity (and a bodycount) ensues. It was created using Source Film Maker (for all the dialog stuff and the montage at the beginning), and with TF2/Source SDK for the entire 300 alley-run sequence. I had already completed that part before SFM was released. The big zombie horde scenes and a couple others were shot in Left 4 Dead. I hope you get a kick out of it.\n\nStuff I did:\nI animated all of the characters (using Maya) except for the big crowd scenes and parts of the headcrab zombie (the crawling and the legs). The faces in the dialog scenes were animated in SFM.\n\nAlso did additional mapping, particles, motion graphics, zombie maya rigging, and created blendshapes for the Witch''s face to enable her to talk/emote. I didn''t do a full set, just the phonemes I needed for this performance. Inspiration for her performance was based on Meg Mucklebones (if you''ve ever seen Legend) mixed with the demon ladies in Army of Darkness. I have a feeling Valve had seen those movies too when they designed her..\n\nthanks for watching."}
I am answering this question by listing the troubleshooting steps I have found so far: either 'working knowledge' that 'field workers' will have, or more obscure insights (some buried in the Postgres docs, which are thorough but esoteric) that I found through my own trial and error.
Steps
Make sure you have escaped any single quote ' characters by doubling them, like ''
Make sure your JSON string is actually a single-line string. JSON is very easy to copy as a multiline string, and Postgres JSON columns will not accept this (the fix is as easy as hitting backspace on each newline)
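Most obscure I've found: even inside a JSON string field, the ? question mark can break the insert. Something like {"url": "myurl.com?queryParam=someId"} will return as invalid. I solved this by escaping the question mark like: {"url": "myurl.com\?queryParam=someId"}. Since ? is an ordinary character in JSON, this probably comes from the client tool or driver treating ? as a parameter placeholder, so whether the escape is needed may depend on the client you use.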
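The first two steps can be automated before the string ever reaches the database. This is a minimal Python sketch using only the standard `json` module; the helper name and the sample document are made up for illustration, and a driver with bind parameters would make the quote-doubling unnecessary.

```python
import json


def to_json_sql_literal(text: str) -> str:
    """Validate JSON, collapse it to one line, and double single quotes."""
    value = json.loads(text)  # raises json.JSONDecodeError if the JSON is invalid
    one_line = json.dumps(value, ensure_ascii=False)  # no raw newlines remain
    return "'" + one_line.replace("'", "''") + "'"


# Hypothetical multi-line document with a single quote and an escaped newline.
sample = """{
  "title": "he's alive",
  "note": "line one\\nline two"
}"""

literal = to_json_sql_literal(sample)
print(literal)
```

The result can be pasted after VALUES in a hand-written INSERT. Parsing first also means an invalid document fails in Python with a position in the error message rather than on the server.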

Formatting SQL Query Inside an IPython/Jupyter Notebook

I want to show some SQL queries inside a notebook. I neither need nor want them to run. I'd just like them to be well formatted. At the very least I want them to be indented properly with new lines, though keyword highlighting would be nice too. Does a solution for this exist already?
If you set the cell type to Markdown, you can write the SQL query as a code block, specifying the language (e.g. mysql):
``` mysql
SELECT *
FROM table_a AS a
LIMIT 10;
```
This highlights the keywords. Unfortunately, it doesn't seem to deal with indentation, which seems to be the main issue you are trying to solve, but maybe this helps.
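If the queries live in Python strings rather than hand-written Markdown cells, the same fenced block can be generated. This is a sketch under assumptions: the helper name is made up, and the `mysql` tag is simply the one used in the Markdown example above.

```python
import textwrap

FENCE = "`" * 3  # built at runtime so the example itself contains no literal fence


def sql_as_markdown(query: str, language: str = "mysql") -> str:
    """Wrap a query in a fenced Markdown block, removing common indentation."""
    body = textwrap.dedent(query).strip("\n")
    return f"{FENCE}{language}\n{body}\n{FENCE}"


md = sql_as_markdown("""
    SELECT *
    FROM table_a AS a
    LIMIT 10;
""")
print(md)
```

In a notebook the string can be rendered without running the query by passing it to `IPython.display.Markdown` and calling `display` on the result.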
If you, like me, find yourself here because you want to highlight (and run) the %%sql magic, you're best off with the technique from this answer. I'm posting it here because it took me quite some time to find the right keywords for my search :)
require(['notebook/js/codecell'], function(codecell) {
    codecell.CodeCell.options_default.highlight_modes['magic_text/x-mssql'] = {'reg': [/^%%sql/]};
    Jupyter.notebook.events.one('kernel_ready.Kernel', function() {
        Jupyter.notebook.get_cells().map(function(cell) {
            if (cell.cell_type == 'code') { cell.auto_highlight(); }
        });
    });
});
I found that changing the language tag fixed the issue I was having: opening the fence with "sql" produced styled code in edit mode but not when the cell was run, while opening it with "mysql" produced correct styling.

how to enable neocomplcache quick match?

I don't know how to enable quick match in the neocomplcache vim plugin. I put
let g:neocomplcache_enable_quick_match = 1
in my .vimrc, but it has no effect. When I press - nothing happens.
According to the help file, there is no quick match feature anymore. You need to use unite for that:
A: Quick match feature had been removed in latest neocomplcache
because quick match turned into hard to implement.
But you can use |unite.vim| instead to use quick match.
imap <expr> - pumvisible() ?
\ "\<Plug>(neocomplcache_start_unite_quick_match)" : '-'

help to get rid of HTML special chars in database

I've migrated my site from Interspire CMS to Joomla! CMS.
I've managed to migrate the whole database of articles, but some of them have a weird issue: when I access the page from Joomla, the title contains HTML entities like &rsquo;.
As you can guess from the CMSs I use, I rely on PHP on the server side and MySQL for my database.
I tried to go over the titles of the articles in the database with htmlspecialchars_decode AND html_entity_decode in order to get rid of those, but it had no effect.
If I just grab an example from the DB and echo it, it looks OK:
What’s Your Pleasure, Lasagna Or Pizza Manchester Style?
If I go to the article page in Joomla, it looks like this:
What&rsquo;s Your Pleasure, Lasagna Or Pizza Manchester Style?
When I go to phpMyAdmin to see directly what is in the database, this is the content of the title:
What&rsquo;s Your Pleasure, Lasagna Or Pizza Manchester Style?
I even tried to remove the symbol with:
str_replace("&rsquo;","",$title);
or replace it like this
str_replace('&rsquo;',"'",$title);
but nothing.
When I tried to encode it again instead of decoding it (just to see if I'm on the right DB), it worked and encoded it again...
Please, I would be glad to have any new ideas...
Thanks,
Yanipan
Try setting encoding to cp1252. This worked out for me:
$decoded = html_entity_decode($your_string, ENT_QUOTES, 'cp1252');
Probably your best bet is to do the search and replace within the database itself rather than trying to do it with PHP. Search and replace in MySQL is done like this:
update TABLE_NAME set FIELD_NAME = replace(FIELD_NAME, 'find this string', 'replace found string with this string');
So yours should look something like:
update ARTICLES set TITLE = replace(TITLE, '&rsquo;', '\'');
Give that a shot.
Need more info
What is the character encoding on your database? That & or ; may be something other than typical ASCII.
It's possible that PHP/Joomla is double-encoding your string. Look at the browser's page source and find the text in the produced HTML. Instead of What&rsquo;s, it might just be one of the following:
What&rsquo&#59;s
What&#38;rsquo&#59;s
What&amp;rsquo;s
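The double-encoding theory is easy to test outside the CMS. The sketch below uses Python's standard `html.unescape` rather than PHP, and counts how many decoding passes a title needs; the function name and the pass limit are arbitrary choices for illustration.

```python
import html
from typing import Tuple


def decode_entities(text: str, max_passes: int = 5) -> Tuple[str, int]:
    """Unescape repeatedly until the text stops changing; return text and pass count."""
    passes = 0
    while passes < max_passes:
        decoded = html.unescape(text)
        if decoded == text:
            break
        text = decoded
        passes += 1
    return text, passes


for sample in ["What&rsquo;s", "What&amp;rsquo;s"]:
    print(sample, "->", decode_entities(sample))
```

A title that needs two passes was encoded twice somewhere along the way, which would also explain why a single decoding call seemed to have no effect.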