Part of speech search with lucene - lucene

After many googling searchs I decided to post my problem here hoping that someone help me. What I want to achieve is to perform queries as follows:
q1: (adjective) "jumps" (preposition) // any adj followed by "jumps" followed by any prep.
q2: (adjective:brown) "jumps" (preposition) // brown as adj. followed by "jumps" followed by any prep.
q3: (adjective:brown) (verb:jumps) (preposition) // brown as adj followed by jumps as verb followed by any preposition.
In a more general form, what I want is
(POS[:specific_word]) (POS[:specific_word]) (POS[:specific_word])
For that, I have the text tagged as follows:
the|[pos:DT][lemma:the] quick|[pos:JJ][lemma:quick] brown|[pos:JJ][lemma:brown] fox|[pos:NN][lemma:fox] jumps|[pos:NNS][lemma:jump] over|[pos:IN][lemma:over] the|[pos:DT][lemma:the] lazy|[pos:JJ][lemma:lazy] dog|[pos:NN][lemma:dog]
The first thing I thought was to index extra info of each term as payload and using PayloadNearQuery after in order to access to the payload of each span. The problem is that PayloadNearQuery match the terms first and then access its payload, so none of the 3 above queries will work. (correct me if I'm wrong)
The second thing I thought was to index extra info as synonyms of the term but, this way, the second query won't work since I can't ask if the first term is an adj and the specific word "brown" simultaneously.
Any way to address this problem, suggestions, etc. will be appreciated.

Related

Using an API to Extract All Comments from a Reddit Post

I am using the Reddit API (Pushshift) : https://github.com/pushshift/api
Using the documentation, I understand how I can use this to extract every comment containing the word "covid" that was left in a certain time period:
https://api.pushshift.io/reddit/search/comment?q=covid&after=3h&before=2h&size=1
The output looks something like this:
{"data":[{"subreddit_id":"t5_2qh6p","author_is_blocked":false,"comment_type":null,"edited":false,"author_flair_type":"richtext","total_awards_received":0,"subreddit":"Conservative","author_flair_template_id":null,"id":"j98zf27","gilded":0,"archived":false,"collapsed_reason_code":null,"no_follow":false,"author":"VamboRoolOkay","send_replies":true,"parent_id":41917615743,"score":1,"author_fullname":"t2_7uxkru5f","all_awardings":[],"body":"I will never believe that election fraud wasn't a significant factor. Go ahead - call it a conspiracy theory. But I also maintained that Covid was lab-created. Truth is the Daughter of Time.","top_awarded_type":null,"author_flair_css_class":null,"author_patreon_flair":false,"collapsed":false,"author_flair_richtext":[{"e":"text","t":"Conservative"}],"is_submitter":false,"gildings":{},"collapsed_reason":null,"associated_award":null,"stickied":false,"author_premium":false,"can_gild":true,"link_id":"t3_116l7ct","unrepliable_reason":null,"author_flair_text_color":"dark","score_hidden":true,"permalink":"/r/Conservative/comments/116l7ct/kamala_harris_plans_on_running_with_biden_in_2024/j98zf27/","subreddit_type":"public","locked":false,"author_flair_text":"Conservative","treatment_tags":[],"created_utc":1676866031,"subreddit_name_prefixed":"r/Conservative","controversiality":0,"author_flair_background_color":"","collapsed_because_crowd_control":null,"distinguished":null,"retrieved_utc":1676866047,"updated_utc":1676866048,"body_sha1":"328df3784d15f77b98a84418c4ce720822227cfe","utc_datetime_str":"2023-02-20 04:07:11"}],"error":null,"metadata":{"es":{"took":98,"timed_out":false,"_shards":{"total":828,"successful":828,"skipped":824,"failed":0},"hits":{"total":{"value":573,"relation":"eq"},"max_score":null}},"es_query":{"size":1,"query":{"bool":{"must":[{"bool":{"must":[{"simple_query_string":{"fields":["body"],"query":"covid","default_operator":"and"}},{"range":{"created_utc":{"gte":1676862433000}}},{"range":{"created_utc":{"lt":1676866033000}}}]}}]}},"aggs":{},"sort":{"created_utc":"desc"}},"es_query2":"{\"size\":1,\"query\":{\"bool\":{\"must\":[{\"bool\":{\"must\":[{\"simple_query_string\":{\"fields\":[\"body\"],\"query\":\"covid\",\"default_operator\":\"and\"}},{\"range\":{\"created_utc\":{\"gte\":1676862433000}}},{\"range\":{\"created_utc\":{\"lt\":1676866033000}}}]}}]}},\"aggs\":{},\"sort\":{\"created_utc\":\"desc\"}}","api_launch_time":1673017478.254743,"api_request_start":1676873233.6143198,"api_request_end":1676873233.7406816,"api_total_time":0.12636184692382812}}
My Question: Suppose I identify a post that contains the word "covid" - now, I want to retrieve every comment on this post : Is this possible to do?
For instance, based on the output of these results, I see that :
link_id: t3_116l7ct
parent_id:41917615743
Can I somehow use this information to write an API query to retrieve all comments from this post?
I tried the following query but got an empty result: https://api.pushshift.io/reddit/comment/search/?link_id=t3_116cjib
Thanks!

Postgres INSERT returning 'invalid input syntax' for json

Problem: Attempting to insert a JSON string into a Postgres table column of json datatype intermittently returns this error for some record insertion attempts but not others.
I confirmed using multiple third party 'JSON validator' apps that the JSON I am inserting is indeed valid, and I have confirmed that any single ' quote characters have been escaped with the double '' technique, and the issue persists.
What are some additional troubleshooting steps to consider?
Here is a scrubbed sample JSON I have attempted:
{"id": "jf4ba72kFNQ","publishedAt": "2012-09-02T06:07:28Z","channelId": "UCrbUQCaozffv1soNdfDROXQ","title": "Scout vs. Witch: a tale of boy meets ghoul (Official Version)","tags": ["L4D","TF2","SFM","animation","zombies","Valve","video game"],"description": "Howdy folks (he''s alive!). I made a new SFM video (October 2015), called \"Nick in a Hotel Room\". Please check it out: https://www.youtube.com/watch?v=FOCTgwBIun0\n\nAlso check out some early behind the scenes of Scout vs. Witch:\nhttps://www.youtube.com/watch?v=73tQEBgD09I\n\nYou can find links to my stuff on my website: http://nailbiter.net\n\n-----\n\nhey gang,\nI''m the animator who made this cartoon. Hope you like it.\n\nThis is my little mash-up of a bunch of stuff I like. What happens when the Scout from Valve''s Team Fortress 2 video-game walks into the wrong neighborhood (Left 4 Dead). Hilarity (and a bodycount) ensues. It was created using Source Film Maker (for all the dialog stuff and the montage at the beginning), and with TF2/Source SDK for the entire 300 alley-run sequence. I had already completed that part before SFM was released. The big zombie horde scenes and a couple others were shot in Left 4 Dead. I hope you get a kick out of it.\n\nStuff I did:\nI animated all of the characters (using Maya) except for the big crowd scenes and parts of the headcrab zombie (the crawling and the legs). The faces in the dialog scenes were animated in SFM.\n\nAlso did additional mapping, particles, motion graphics, zombie maya rigging, and created blendshapes for the Witch''s face to enable her to talk/emote. I didn''t do a full set, just the phonemes I needed for this performance. Inspiration for her performance was based on Meg Mucklebones (if you''ve ever seen Legend) mixed with the demon ladies in Army of Darkness. I have a feeling Valve had seen those movies too when they designed her..\n\nthanks for watching."}
I am answering this question by enumerating all the other troubleshooting steps I have found so far, either 'working knowledge' that 'field workers' will have, or a little more obscure (or buried in postgres docs which, while thorough, are esoteric) insights I have found thru my own trial & error
Steps
Make sure you have escaped any single quote ' characters by double-escaping with like ''
Make sure your JSON string is actually a single line string - JSON is very easy to copy as a multiline string, and postgres JSON columns will not accept this (easy as hitting backspace on any newline)
Most obscure I've found: even when encapsulated in a JSON string field, the ? question mark weirdly enough breaks the JSON syntax for postgres. Something like {"url": "myurl.com?queryParam=someId"} will return as invalid. Solve this by escaping the question mark like: {"url": "myurl.com\?queryParam=someId"}

A file named Butterfly7198.txt was found and It's boggling me

I happened to come across a file while changing images for a Toontown Rewritten's Context Pack I've been working on and I stumbled across this file marked
Butterfly7198.txt
and upon clicking on it, I was greeted with the following prompt that reads
"Hi. This isn't part of any grand story arc or anything. We're not using this as
some gimmick to announce some bold new feature. There's no big pot of gold at the end of this rainbow. We're looking for people with technical talent who want to help work on
Toontown Rewritten. You can discuss this file and its contents online if you wish. However, it IS meant to be solved alone, so please don't share answers or spoilers. Good luck. -Butterfly 7198"
Then I was greeted with a weird chain of letters and numbers on which I could only assume meant that this was encoded in some language I had no Idea about. The following goes
A/MNCrYNG1ljAAAAAAAAAAADAAAAQAAAAHNEAAAAZAAAZAEAbAAAWgEAZAAAZAEAbAIAWgMAZAIA
hAAAWgQAZAMAhAAAWgUAZAQAZQYAZgEAZAUAhAAAgwAAWVoHAGQBAFMoBgAAAGn/////TmMBAAAA
AwAAAAQAAABDAAAAczoAAABkAQB9AQB4LQB0AABkAwCDAQBEXR8AfQIAdAEAagIAfAEAfAAAF4MB
AGoDAIMAAH0BAHETAFd8AQBTKAQAAABOdAAAAABpAAQAAGkAABAAKAQAAAB0BgAAAHhyYW5nZXQI
AAAAX2hhc2hsaWJ0BgAAAHNoYTI1NnQGAAAAZGlnZXN0KAMAAAB0AQAAAG10AQAAAGh0AQAAAGko
AAAAACgAAAAAcxEAAABzZWNyZXRfbWVzc2FnZS5weXQFAAAAX2lzaGEEAAAAcwgAAAAAAQYBEwAd
AWMCAAAACQAAAAgAAABDAAAAcyMBAAB0AABkAQCDAQB9AgBkAgB9AwB4XwB0AQBkAQCDAQBEXVEA
fQQAfAMAfAIAfAQAGXQCAHwAAHwEAHQDAHwAAIMBABYZgwEAFzd9AwB8AgB8AwBkAwBAGXwCAHwE
ABkCfAIAfAQAPHwCAHwDAGQDAEA8cR8AV2QCAH0EAGQCAH0DAGQEAH0FAHiWAHwBAERdjgB9BgB4
UQB0AQBkBQCDAQBEXUMAfQcAfAQAZAYAF2QDAEB9BAB8AwB8AgB8BAAZF2QDAEB9AwB8AgB8AwAZ
fAIAfAQAGQJ8AgB8BAA8fAIAfAMAPHGgAFd8AgB8AgB8BAAZfAIAfAMAGRdkAwBAGX0IAHwFAHQE
AHQCAHwGAIMBAHwIAEGDAQA3fQUAcY0AV3wFAFMoBwAAAE5pAAEAAGkAAAAAaf8AAABSAAAAAGn9
AwAAaQEAAAAoBQAAAHQFAAAAcmFuZ2VSAQAAAHQDAAAAb3JkdAMAAABsZW50AwAAAGNocigJAAAA
dAMAAABrZXl0BAAAAGRhdGF0AQAAAFN0AQAAAGpSBwAAAHQDAAAAb3V0dAEAAABidAEAAAB4dAEA
AABLKAAAAAAoAAAAAHMRAAAAc2VjcmV0X21lc3NhZ2UucHl0BQAAAF9tcmM0CQAAAHMgAAAAAAEM
AQYBEwEmASkBBgEGAQYBDQETAQ4BEgEhARoBHgF0BQAAAFJvYm90YwAAAAAAAAAAAQAAAEIAAABz
RwAAAGUAAFoBAGQAAIQAAFoCAGQBAIQAAFoDAGQCAIQAAFoEAGQDAIQAAFoFAGQEAIQAAFoGAGQF
AIQAAFoHAGQGAIQAAFoIAFJTKAcAAABjAQAAAAEAAAACAAAAQwAAAHMWAAAAZAMAfAAAXwAAZAIA
fAAAXwEAZAAAUygEAAAATmkAAAAAUgAAAAAoAgAAAGkAAAAAaQAAAAAoAgAAAHQLAAAAX1JvYm90
X19wb3N0CwAAAF9Sb2JvdF9fc3RrKAEAAAB0BAAAAHNlbGYoAAAAACgAAAAAcxEAAABzZWNyZXRf
bWVzc2FnZS5weXQIAAAAX19pbml0X18cAAAAcwQAAAAAAQkBYwEAAAABAAAAAgAAAEMAAABzDQAA
AHwAAGoAAGQBAIMBAFMoAgAAAE50AQAAAGQoAQAAAHQEAAAAbW92ZSgBAAAAUhkAAAAoAAAAACgA
AAAAcxEAAABzZWNyZXRfbWVzc2FnZS5weXQIAAAAbW92ZURvd24gAAAAcwIAAAAAAWMBAAAAAQAA
AAIAAABDAAAAcw0AAAB8AABqAABkAQCDAQBTKAIAAABOdAEAAABsKAEAAABSHAAAACgBAAAAUhkA
AAAoAAAAACgAAAAAcxEAAABzZWNyZXRfbWVzc2FnZS5weXQIAAAAbW92ZUxlZnQjAAAAcwIAAAAA
AWMBAAAAAQAAAAIAAABDAAAAcw0AAAB8AABqAABkAQCDAQBTKAIAAABOdAEAAAByKAEAAABSHAAA
ACgBAAAAUhkAAAAoAAAAACgAAAAAcxEAAABzZWNyZXRfbWVzc2FnZS5weXQJAAAAbW92ZVJpZ2h0
JgAAAHMCAAAAAAFjAQAAAAEAAAACAAAAQwAAAHMNAAAAfAAAagAAZAEAgwEAUygCAAAATnQBAAAA
dSgBAAAAUhwAAAAoAQAAAFIZAAAAKAAAAAAoAAAAAHMRAAAAc2VjcmV0X21lc3NhZ2UucHl0BgAA
AG1vdmVVcCkAAABzAgAAAAABYwIAAAAJAAAABAAAAEMAAABzbgEAAHQAAHwBAIMBAHQBAGsDAHMw
AHQCAHwBAIMBAGQBAGsDAHMwAHwBAGQCAGsHAHI8AHQDAIMAAIIBAG4AAHwAAGoEAFwCAH0CAH0D
AGkEAGQRAGQEADZkEgBkBgA2ZBMAZAcANmQUAGQIADZ8AQAZXAIAfQQAfQUAZAMAfAIAfAQAFwQD
awEAb5IAZAkAawAAbgIAAgFvtABkAwB8AwB8BQAXBANrAQBvsgBkCQBrAABuAgACAXO7AHQFAFN8
AQBkBgBrAwByzQB8AgBuBwB8AgB8BAAXfQYAfAEAZAgAawMAcukAfAMAbgcAfAMAfAUAF30HAGQB
AHwGAGQKABR8AQBkCwBrBgBkCQAUF3wHABc+ZAwAQHIbAXQFAFN8AAAEagYAfAEANwJfBgB8AgB8
BAAXfAMAfAUAF2YCAHwAAF8EAHgmAGQVAERdHgB9CAB8AABqBgBqBwB8CABkEACDAgB8AABfBgBx
SAFXdAgAUygWAAAATmkBAAAAdAQAAABkbHJ1aQAAAABSGwAAAGn/////Uh4AAABSIAAAAFIiAAAA
aSAAAABpQAAAAHQCAAAAZHVsiQAAAMwhqSWNKClirAW/QlF2SzalSZQbJVnGYblNWUQGaxJ3yRqJ
Fyl50iotVjxwYyWCGTF/+WkPFnNwQwtmNJ1UiEiNWtpvp1XmJIBS8UrmIKoZ43ZWaPcLo01iEmEQ
KSUsdKJRv1U+NW4IQi3WerApRUV7KRhweibvL1pwSiDPJMNZhXw5DpN0n2fPPXYt514QI6p4jGJI
HfA1nhjFVwY6YCRSRV1ltX44J3I7dVWhXlIoSgK0aattVFlhRzxZWRSME6kh11FGKWRrZX0XAjsN
4m1qQl0LLgeLJjwUfQqHc+JGoy4teT5iRhbAENNldj3MGkU9xTPZVXJFwESuOXMvwUzsKGYL2AMI
Sfp//3//SCtT1AB0AgAAAHVkdAIAAABscnQCAAAAcmxSAAAAACgCAAAAaQAAAABpAQAAACgCAAAA
af////9pAAAAACgCAAAAaQEAAABpAAAAACgCAAAAaQAAAABp/////ygEAAAAUiYAAABSJQAAAFIn
AAAAUigAAAAoCQAAAHQEAAAAdHlwZXQDAAAAc3RyUgsAAAB0CgAAAFZhbHVlRXJyb3JSFwAAAHQF
AAAARmFsc2VSGAAAAHQHAAAAcmVwbGFjZXQEAAAAVHJ1ZSgJAAAAUhkAAABSGwAAAFITAAAAdAEA
AAB5dAIAAABkeHQCAAAAZHl0AgAAAGN4dAIAAABjeXQCAAAAYnQoAAAAACgAAAAAcxEAAABzZWNy
ZXRfbWVzc2FnZS5weVIcAAAALAAAAHMeAAAAAAEwAAwBDwEsAUAABAEcARwBJAAEAQ8BFwENARwB
YwEAAAAEAAAABAAAAEMAAABzcAAAAHwAAGoAAFwCAH0BAH0CAHwAAGoAAGQHAGsDAHI0AGQCAGQB
AHwBABhkAQB8AgAYZgIAFlN0AQB8AABqAgCDAQB9AwB8AwBqAwBkAwCDAQBzVgBkBABTdAQAfAMA
ZAUAIHQFAGoGAGQGAIMBAIMCAFMoCAAAAE5pHwAAAHM3AAAAWW91IGFyZSBzdGlsbCAlZCBsZWZ0
IGFuZCAlZCBhYm92ZSB3aGVyZSB5b3Ugc2hvdWxkIGJlIXMCAAAABIVzHwAAAENoZWF0ZXIhIEdv
IHNvbHZlIGl0IGNvcnJlY3RseSFp/f///3OAAAAAUWZ2ZlhZbjROVHpCR084eWo5cjBOTjk0ekZ6
VEphczEyUDIvak1Uc2QzUFFlNjJIeVh0WXZIaGxrMXFkbHR4SnhpZ0Fxd1ozczlqK2E4dGhBZVlp
M242TWY3RDU4eCtEZzhDWEkvS1FSRzB6UGhyYVl6TGRnNGJ2TVpJYTl4Yz0oAgAAAGkfAAAAaR8A
AAAoBwAAAFIXAAAAUggAAABSGAAAAHQIAAAAZW5kc3dpdGhSFQAAAHQJAAAAX2JpbmFzY2lpdAoA
AABhMmJfYmFzZTY0KAQAAABSGQAAAFITAAAAUi8AAAB0AQAAAGsoAAAAACgAAAAAcxEAAABzZWNy
ZXRfbWVzc2FnZS5weXQFAAAAc29sdmU6AAAAcxAAAAAAAQ8BDwEWAg8BDwAEAhABKAkAAAB0CAAA
AF9fbmFtZV9fdAoAAABfX21vZHVsZV9fUhoAAABSHQAAAFIfAAAAUiEAAABSIwAAAFIcAAAAUjkA
AAAoAAAAACgAAAAAKAAAAABzEQAAAHNlY3JldF9tZXNzYWdlLnB5UhYAAAAbAAAAcw4AAAAGAQkE
CQMJAwkDCQMJDigIAAAAdAcAAABoYXNobGliUgIAAAB0CAAAAGJpbmFzY2lpUjYAAABSCAAAAFIV
AAAAdAYAAABvYmplY3RSFgAAACgAAAAAKAAAAAAoAAAAAHMRAAAAc2VjcmV0X21lc3NhZ2UucHl0
CAAAADxtb2R1bGU+AQAAAHMIAAAADAEMAgkFCRI=
I was wondering if anyone knew exactly what either language this was encoded with or whether or not there is someone out there that CAN help me decode this.
(Note: I tried using a Decoding website but from all the languages I checked, none of them were matching the language I was trying to find out. This has been boggling me ever since I started working on this 2 years ago.)
It appears to be BASE64 encoded at least in parts. Putting it through a BASE64 decoder gives the text below which includes a few hints as to what this might be.
Yc#sDddlZddlZdZdZdefdYZdS(iNcCs:d}x-tdD]}tj||j}qW|S(Ntii(txranget_hashlibtsha256tdigest(tmthti((ssecret_message.pyt_ishasc Cs#td}d}x_tdD]Q}|||t||t|7}||d#||||<||d#d|dS(NiR(ii(t_Robot__post_Robot__stk(tself((ssecret_message.pytinits cCs
|jdS(Ntd(tmove(R((ssecret_message.pytmoveDown scCs
|jdS(Ntl(R(R((ssecret_message.pytmoveLeft#scCs
|jdS(Ntr(R(R((ssecret_message.pyt moveRight&scCs
|jdS(Ntu(R(R((ssecret_message.pytmoveUp)sc Csnt|tks0t|dks0|dkrd#rtS|j|7||||f|x&dD]}|jj|d|_qHWtS(NitdlruiRiRR R"i i#tdul!%()bBQvK6I%YaMYDkw)y*-V5nB-z)EE{)pz&/ZpJ $Y|9tg=v-^#xbH5W:`$RE]e~8'r;uU^R(JimTYaGbFev=E=3UrED9s/L(fIH+StudtlrtrlR(ii(ii(ii(ii(R&R%R'R(( ttypetstrRt
ValueErrorRtFalseRtreplacetTrue( RRRtytdxtdytcxtcytbt((ssecret_message.pyR,s0,#$
cCsp|j\}}|jdkr4dd|d|fSt|j}|jdsVdSt|d tjdS(Nis7You are still %d left and %d above where you should be!ssCheater! Go solve it correctly!isQfvfXYn4NTzBGO8yj9r0NN94zFzTJas12P2/jMTsd3PQe62HyXtYvHhlk1qdltxJxigAqwZ3s9j+a8thAeYi3n6Mf7D58x+Dg8CXI/KQRG0zPhraYzLdg4bvMZIa9xc=(ii(RRRtendswithRt _binasciit
a2b_base64(RRR/tk((ssecret_message.pytsolve:s( t__name__t
__module__RRRR!R#RR9(((ssecret_message.pyRs (thashlibRtbinasciiR6RRtobjectR(((ssecret_message.pyts

REDIS - how to store comments

Hi I need your opinion on the design pattern for comments for REDIS:
is it better to store text comments for 1 post as :
- a single LIST where I will RPUSH all comments (may be up to 100chr each string)
- a single LIST of commentID (with RPUSH) and a new comment object each time...?
Thank you!
You should be fine with 1000 comments with 100 characters each. I like John's suggestion of a sorted set.
Keep things simple, but don't paint yourself into a corner on the client side if you later find you have reason to go the other route.

Appcelerator Cloud Services - Find string in photos object custom field

I'm having trouble using the query/search capabilities of Cloud.photos, more specifically, finding a way of search for a string in title field.
After reading this: http://docs.appcelerator.com/cloud/latest/#!/guide/search_query, I was able to perform querys with $regex:
where: {title: {'$regex' : '^' + searchterm}},
But works only if the search term is in the beginning of the title, and it's case sensitive (I'm obviously a complete regex newbie).
Also tried search API with no luck. I didn't find which were the searchable fields.
So, I'm in some sort of dead end. I'm looking for a SQL like similar funcionality, where I can perform a search of a term and if matches regardless of the position and whether it is upper or lower case.
Thanks in advance.
see the release notes here for 11 Apr 2013
http://docs.appcelerator.com/cloud/latest/#!/guide/acs_releasenotes
it cannot be done any other way because the query would be too inefficient