Convert all items in a list to string format - spacy

I am trying to separate the sentences (with the spaCy sentencizer) within a larger text so that I can process them in a transformers pipeline.
Unfortunately, the pipeline cannot process the sentences, because after sentence segmentation they are not yet strings. Please see the details below.
string = 'The Chromebook is exactly what it was advertised to be. It is super simple to use. The picture quality is great, stays connected to WIfi with no interruption. Quick, lightweight yet sturdy. I bought the Kindle Fire HD 3G and had so much trouble with battery life, disconnection problems etc. that I hate it and so I bought the Chromebook and absolutely love it. The battery life is good. Finally a product that lives up to its hype!'
# Split the review summary text into separate sentences with spaCy's sentence segmentation
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(string)
sentences = list(doc.sents)
sentences
This leads to the following list:
[The Chromebook is exactly what it was advertised to be.,
It is super simple to use.,
The picture quality is great, stays connected to WIfi with no interruption.,
Quick, lightweight yet sturdy.,
I bought the Kindle Fire HD 3G and had so much trouble with battery life, disconnection problems etc.,
that I hate it,
and so I bought the Chromebook and absolutely love it.,
The battery life is good.,
Finally a product that lives up to its hype!]
When I pass this list to the following pipeline, I get this error: ValueError: args[0]: The Chromebook is exactly what it was advertised to be. have the wrong format. The should be either of type str or type list
# Extract (subject, relation, object) triplets from the list of sentences
from transformers import pipeline
triplet_extractor = pipeline('text2text-generation', model='Babelscape/rebel-large', tokenizer='Babelscape/rebel-large')
model_output = triplet_extractor(sentences, return_tensors=True, return_text=False)
extracted_text = triplet_extractor.tokenizer.batch_decode([x["generated_token_ids"] for x in model_output])
print("\n".join(extracted_text))
So how can I convert all the sentences in the sentences list to strings?
Looking forward to your responses. :)

Your sentences are Span objects. You can convert each one to a string with sentence.text, so [ss.text for ss in sentences] converts all of them.
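As the answer says, each item in sentences is a spaCy Span, and Span.text gives its plain string. A minimal sketch that wraps this in a helper; `spans_to_strings` and the `FakeSpan` stand-in (used only so the example runs without spaCy installed) are hypothetical names, not spaCy API. Anything without a `.text` attribute falls back to `str()`.

```python
from typing import Iterable, List


def spans_to_strings(items: Iterable[object]) -> List[str]:
    """Return plain strings for spaCy Spans (or anything exposing .text).

    Items without a .text attribute are converted with str() as a fallback.
    """
    return [item.text if hasattr(item, "text") else str(item) for item in items]


class FakeSpan:
    """Hypothetical stand-in for spacy.tokens.Span that only has .text."""

    def __init__(self, text: str) -> None:
        self.text = text


demo = spans_to_strings([FakeSpan("It is super simple to use."), "The battery life is good."])
print(demo)  # ['It is super simple to use.', 'The battery life is good.']

# With the question's variables (needs spaCy and transformers, not run here):
# sentences = spans_to_strings(doc.sents)
# model_output = triplet_extractor(sentences, return_tensors=True, return_text=False)
```

The resulting list contains only str items, which is one of the two input types the error message asks for.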
What is triplet_extractor? You don't explain it anywhere.

Related

Why is the output of Nilearn's ABIDE dataset fetcher empty?

I want to diagnose autism, and one of the best datasets for autism diagnosis is ABIDE (Autism Brain Imaging Data Exchange). My final goal in this part is a connectivity matrix that I can use for the rest of my research. Right now I am trying to download the ABIDE dataset with the Nilearn library, using the code below as shown in Nilearn's documentation, but unfortunately dataset.func_preproc is an empty list. I don't know why, because the other Nilearn datasets I tested work fine.
import nilearn.datasets
dataset = nilearn.datasets.fetch_abide_pcp(derivatives=['func_preproc', 'func_mean'])
print(dataset['func_preproc'])
and the output is:
[]
As you can see, dataset['func_preproc'] is an empty list.
Does anyone have any idea about this?

Postgres INSERT returning 'invalid input syntax' for json

Problem: Attempting to insert a JSON string into a Postgres table column of json datatype intermittently returns this error for some record insertion attempts but not others.
I have confirmed with multiple third-party JSON validators that the JSON I am inserting is valid, and that any single quote (') characters have been escaped with the doubled '' technique, yet the issue persists.
What are some additional troubleshooting steps to consider?
Here is a scrubbed sample JSON I have attempted:
{"id": "jf4ba72kFNQ","publishedAt": "2012-09-02T06:07:28Z","channelId": "UCrbUQCaozffv1soNdfDROXQ","title": "Scout vs. Witch: a tale of boy meets ghoul (Official Version)","tags": ["L4D","TF2","SFM","animation","zombies","Valve","video game"],"description": "Howdy folks (he''s alive!). I made a new SFM video (October 2015), called \"Nick in a Hotel Room\". Please check it out: https://www.youtube.com/watch?v=FOCTgwBIun0\n\nAlso check out some early behind the scenes of Scout vs. Witch:\nhttps://www.youtube.com/watch?v=73tQEBgD09I\n\nYou can find links to my stuff on my website: http://nailbiter.net\n\n-----\n\nhey gang,\nI''m the animator who made this cartoon. Hope you like it.\n\nThis is my little mash-up of a bunch of stuff I like. What happens when the Scout from Valve''s Team Fortress 2 video-game walks into the wrong neighborhood (Left 4 Dead). Hilarity (and a bodycount) ensues. It was created using Source Film Maker (for all the dialog stuff and the montage at the beginning), and with TF2/Source SDK for the entire 300 alley-run sequence. I had already completed that part before SFM was released. The big zombie horde scenes and a couple others were shot in Left 4 Dead. I hope you get a kick out of it.\n\nStuff I did:\nI animated all of the characters (using Maya) except for the big crowd scenes and parts of the headcrab zombie (the crawling and the legs). The faces in the dialog scenes were animated in SFM.\n\nAlso did additional mapping, particles, motion graphics, zombie maya rigging, and created blendshapes for the Witch''s face to enable her to talk/emote. I didn''t do a full set, just the phonemes I needed for this performance. Inspiration for her performance was based on Meg Mucklebones (if you''ve ever seen Legend) mixed with the demon ladies in Army of Darkness. I have a feeling Valve had seen those movies too when they designed her..\n\nthanks for watching."}
I am answering my own question by listing the other troubleshooting steps I have found so far. Some are working knowledge that people in the field will have; others are more obscure insights (or ones buried in the PostgreSQL docs, which are thorough but esoteric) that I found through my own trial and error.
Steps
Make sure you have escaped any single quote ' characters by doubling them, like '' (this is SQL string-literal escaping, not JSON escaping)
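Make sure line breaks inside string values are escaped as \n. JSON is very easy to copy as a multiline string; newlines between tokens are just whitespace and are accepted, but a raw newline inside a quoted value is an unescaped control character, which PostgreSQL's json input rejects (easy to fix by deleting the newline or replacing it with \n)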
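Most obscure one I've found: a ? question mark inside a JSON string value, as in {"url": "myurl.com?queryParam=someId"}, seemed to make the insert fail as invalid. A ? is valid inside a JSON string and PostgreSQL's json parser accepts it, so the likely culprit is a client library or driver that treats ? as a bind-parameter placeholder (JDBC and PDO, for example). Escaping it as \? is not a fix: \? is not a valid JSON escape sequence, so {"url": "myurl.com\?queryParam=someId"} is itself invalid JSON. Passing the JSON text as a bound parameter instead of splicing it into the SQL string avoids the problem.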
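The points above can be checked locally before touching the database. This sketch uses only Python's standard json module, so it tests JSON validity rather than PostgreSQL itself (PostgreSQL's json input follows the same JSON rules for these cases); `is_valid_json`, the sample strings and the commented-out table name are illustrative.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON with Python's (strict) json module."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Newlines between tokens are ordinary whitespace and are allowed.
multiline_between_tokens = '{\n  "id": "jf4ba72kFNQ"\n}'
# A raw newline inside a string value is an unescaped control character: invalid.
raw_newline_in_value = '{"description": "line one\nline two"}'
# A plain ? needs no escaping; \? is not a JSON escape sequence.
plain_question_mark = '{"url": "myurl.com?queryParam=someId"}'
backslash_question_mark = '{"url": "myurl.com\\?queryParam=someId"}'

print(is_valid_json(multiline_between_tokens))  # True
print(is_valid_json(raw_newline_in_value))      # False
print(is_valid_json(plain_question_mark))       # True
print(is_valid_json(backslash_question_mark))   # False

# json.dumps escapes the newline as \n and leaves the apostrophe alone, giving
# single-line JSON text; '' doubling is only needed when splicing into an SQL literal.
serialized = json.dumps({"description": "he's alive!\nline two"})
print(serialized)  # {"description": "he's alive!\nline two"}

# Passing it as a bound parameter (psycopg2-style %s placeholder; hypothetical table):
# cur.execute("INSERT INTO videos (doc) VALUES (%s)", (serialized,))
```

Letting a serializer produce the text and binding it as a parameter sidesteps both the manual quote doubling and the placeholder clash.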

#BFxForward() in the Bloomberg python api

I've used https://github.com/691175002/BLPInterface as a wrapper around the terribly documented (and unsupported by Bloomberg Help) Bloomberg Python API. I use it to pull price histories, etc.
Lately I've needed to pull FX forward values for specific dates. In Excel I do that with =#BFxForward("usdjpy",J10, "BidOutright"), where J10 is a date.
I would like to pull this information via the Bloomberg Python API (or, even better, with the BLPInterface wrapper), but it's not clear how to do it. I've seen someone ask a similar question about a .NET implementation, but the only answer cited page 207 of a developer's guide. Every Bloomberg developer guide I can find is well under 200 pages, and none of them mention pulling FX values.
Can anyone point me to examples or resources I could build on to get this?
It does take some finding, to be sure, but I tracked it down via the Bloomi Terminal. The way I found the information is as follows (for future reference):
Type DAPI in the Bloomberg Terminal
Choose 'Additional Resources' in the left hand panel
Choose 'Help Page for DAPI' in the right hand panel, and a window pops up
Choose 'Constructing Formulas' in the left hand panel
Choose 'FX Broken Dates Forwards Syntax' in the right hand panel
Or paste this link into Bloomi:
{LPHP DAPI:0:1 2277846 }
There are a lot of different examples and options (FX fwds are not my area of expertise), but simply using this format for the ticker seems to work:
ccy1/ccy2 mm/dd/yy Curncy
and then the field PX_BID. You can try this in a BDP call in Excel, for example:
=BDP("EUR/GBP 08/08/22 Curncy","PX_BID")
When it comes to Python, perhaps try using the xbbg python package (other wrappers are available): it does a good job of hiding all the intricacies of the low-level API.
Here's a code sample using xbbg, that pulls back the forward fx rate in the example:
from xbbg import blp
from datetime import datetime
ccy1 = 'EUR'
ccy2 = 'GBP'
fwdDate = datetime(2022,8,8)
ticker = '{0:}/{1:} {2:} Curncy'.format(ccy1,ccy2,fwdDate.strftime('%m/%d/%y'))
df = blp.bdp(ticker,'PX_BID')
print(df)
Output:
px_bid
EUR/GBP 08/08/22 Curncy 0.85344
EDIT: Looking at the OP's choice of Bloomi wrapper, the xbbg call could possibly be replaced by:
blp.referenceRequest(ticker, 'PX_BID')
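If you need several broken dates at once, the ccy1/ccy2 mm/dd/yy Curncy ticker format can be generated in a loop. A minimal sketch; `fx_forward_tickers` is a hypothetical helper, and the Bloomberg call is left commented out because it needs a live terminal connection. I believe xbbg's bdp also accepts a list of tickers, but treat that as an assumption and check it against the xbbg docs.

```python
from datetime import date
from typing import Iterable, List


def fx_forward_tickers(ccy1: str, ccy2: str, dates: Iterable[date]) -> List[str]:
    """Build 'CCY1/CCY2 mm/dd/yy Curncy' tickers, one per forward date."""
    return [f"{ccy1}/{ccy2} {d.strftime('%m/%d/%y')} Curncy" for d in dates]


tickers = fx_forward_tickers("EUR", "GBP", [date(2022, 8, 8), date(2022, 9, 30)])
print(tickers)  # ['EUR/GBP 08/08/22 Curncy', 'EUR/GBP 09/30/22 Curncy']

# With a Bloomberg connection (not run here; list support is an assumption):
# from xbbg import blp
# df = blp.bdp(tickers, 'PX_BID')
```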

Does TensorFlow Audio/Speech Recognition work with multi-word trigger keywords?

Related link: https://www.tensorflow.org/tutorials/sequences/audio_recognition
How should I modify my TensorFlow "Simple Audio Recognition" training environment (number of input samples, choice of trigger keywords, training parameters, etc.) to get a robust recognition of a unique trigger keyword (multi-words or single-word) in a normal conversation?
The original TensorFlow "Simple Audio Recognition" tutorial comes with 10 single-word trigger keywords, each 1 second long. To keep single-word keywords from being detected in normal conversation and causing false positives, I recorded 400 samples of each of two multi-word trigger keywords, PLAY MUSIC and STOP MUSIC (100 by each of 4 different people), each 1.5 seconds long. After following exactly the same training steps and adjusting the code for the new 1.5-second duration, I get 100% recognition of these two keywords when they are pronounced correctly. However, further testing shows false positives during normal speech whenever any word of these trigger keywords is spoken, e.g. STOP BLA BLA BLA, STOP VIDEO, PLAY BLA BLA BLA, PLAY VIDEO, etc.
Thank you for your kind response,
PM
You should add garbage (non-keyword) speech to the training dataset; I'm not sure whether you did that.
For very long phrases, it is more reliable to detect smaller chunks and ensure they all are present - i.e. to have a separate detector for "play" and for "music".
For example, Google detects "ok" and "google" separately in their "ok google" keyword, as described in "Small-Footprint Keyword Spotting Using Deep Neural Networks".
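A minimal post-processing sketch of the "separate detectors" idea, assuming you already have per-word detections (time, label, score) from a model trained on single words such as play, stop, music and an _unknown_ class. `Detection`, `detect_phrase`, the 0.8 threshold and the 0.6-second gap are hypothetical choices for illustration; this is a simple ordered-match heuristic, not the scoring method from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass
class Detection:
    time: float   # seconds from the start of the audio stream
    word: str     # label emitted by a single-word detector
    score: float  # detector confidence in [0, 1]


def detect_phrase(detections: Iterable[Detection], phrase: Sequence[str],
                  threshold: float = 0.8, max_gap: float = 0.6) -> List[float]:
    """Return the end times at which every word of `phrase` fired in order.

    Each word must score at least `threshold`, consecutive words must be at most
    `max_gap` seconds apart, and any other confident label in between resets the match.
    """
    hits: List[float] = []
    idx = 0           # index of the next expected word in `phrase`
    last_time = 0.0   # time of the previously matched word (only read when idx > 0)
    for d in sorted(detections, key=lambda det: det.time):
        if d.score < threshold:
            continue  # ignore low-confidence detections
        if idx > 0 and d.time - last_time > max_gap:
            idx = 0  # the phrase was spoken too slowly: start over
        if d.word == phrase[idx]:
            idx += 1
            last_time = d.time
            if idx == len(phrase):
                hits.append(d.time)
                idx = 0
        elif d.word == phrase[0]:
            idx = 1  # a new attempt begins with this word
            last_time = d.time
        else:
            idx = 0  # an unrelated word interrupts the phrase


    return hits


stream = [
    Detection(1.0, "play", 0.95),
    Detection(1.4, "music", 0.90),      # PLAY MUSIC: should fire
    Detection(5.0, "play", 0.90),
    Detection(5.4, "_unknown_", 0.85),  # PLAY VIDEO: "video" is not a keyword
    Detection(6.0, "stop", 0.90),
    Detection(8.0, "music", 0.90),      # too long after "stop" to count
]
print(detect_phrase(stream, ["play", "music"]))  # [1.4]
print(detect_phrase(stream, ["stop", "music"]))  # []
```

Requiring both words, in order and close together, is what suppresses the STOP VIDEO / PLAY VIDEO style false positives; the threshold and gap would need tuning on real recordings.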

A file named Butterfly7198.txt was found and it's boggling me

While changing images for a Toontown Rewritten Content Pack I've been working on, I stumbled across a file named Butterfly7198.txt. When I opened it, I found the following message:
"Hi. This isn't part of any grand story arc or anything. We're not using this as some gimmick to announce some bold new feature. There's no big pot of gold at the end of this rainbow. We're looking for people with technical talent who want to help work on Toontown Rewritten. You can discuss this file and its contents online if you wish. However, it IS meant to be solved alone, so please don't share answers or spoilers. Good luck. -Butterfly 7198"
Below it was a long string of letters and numbers, which I could only assume was encoded in some format I didn't recognize:
A/MNCrYNG1ljAAAAAAAAAAADAAAAQAAAAHNEAAAAZAAAZAEAbAAAWgEAZAAAZAEAbAIAWgMAZAIA
hAAAWgQAZAMAhAAAWgUAZAQAZQYAZgEAZAUAhAAAgwAAWVoHAGQBAFMoBgAAAGn/////TmMBAAAA
AwAAAAQAAABDAAAAczoAAABkAQB9AQB4LQB0AABkAwCDAQBEXR8AfQIAdAEAagIAfAEAfAAAF4MB
AGoDAIMAAH0BAHETAFd8AQBTKAQAAABOdAAAAABpAAQAAGkAABAAKAQAAAB0BgAAAHhyYW5nZXQI
AAAAX2hhc2hsaWJ0BgAAAHNoYTI1NnQGAAAAZGlnZXN0KAMAAAB0AQAAAG10AQAAAGh0AQAAAGko
AAAAACgAAAAAcxEAAABzZWNyZXRfbWVzc2FnZS5weXQFAAAAX2lzaGEEAAAAcwgAAAAAAQYBEwAd
AWMCAAAACQAAAAgAAABDAAAAcyMBAAB0AABkAQCDAQB9AgBkAgB9AwB4XwB0AQBkAQCDAQBEXVEA
fQQAfAMAfAIAfAQAGXQCAHwAAHwEAHQDAHwAAIMBABYZgwEAFzd9AwB8AgB8AwBkAwBAGXwCAHwE
ABkCfAIAfAQAPHwCAHwDAGQDAEA8cR8AV2QCAH0EAGQCAH0DAGQEAH0FAHiWAHwBAERdjgB9BgB4
UQB0AQBkBQCDAQBEXUMAfQcAfAQAZAYAF2QDAEB9BAB8AwB8AgB8BAAZF2QDAEB9AwB8AgB8AwAZ
fAIAfAQAGQJ8AgB8BAA8fAIAfAMAPHGgAFd8AgB8AgB8BAAZfAIAfAMAGRdkAwBAGX0IAHwFAHQE
AHQCAHwGAIMBAHwIAEGDAQA3fQUAcY0AV3wFAFMoBwAAAE5pAAEAAGkAAAAAaf8AAABSAAAAAGn9
AwAAaQEAAAAoBQAAAHQFAAAAcmFuZ2VSAQAAAHQDAAAAb3JkdAMAAABsZW50AwAAAGNocigJAAAA
dAMAAABrZXl0BAAAAGRhdGF0AQAAAFN0AQAAAGpSBwAAAHQDAAAAb3V0dAEAAABidAEAAAB4dAEA
AABLKAAAAAAoAAAAAHMRAAAAc2VjcmV0X21lc3NhZ2UucHl0BQAAAF9tcmM0CQAAAHMgAAAAAAEM
AQYBEwEmASkBBgEGAQYBDQETAQ4BEgEhARoBHgF0BQAAAFJvYm90YwAAAAAAAAAAAQAAAEIAAABz
RwAAAGUAAFoBAGQAAIQAAFoCAGQBAIQAAFoDAGQCAIQAAFoEAGQDAIQAAFoFAGQEAIQAAFoGAGQF
AIQAAFoHAGQGAIQAAFoIAFJTKAcAAABjAQAAAAEAAAACAAAAQwAAAHMWAAAAZAMAfAAAXwAAZAIA
fAAAXwEAZAAAUygEAAAATmkAAAAAUgAAAAAoAgAAAGkAAAAAaQAAAAAoAgAAAHQLAAAAX1JvYm90
X19wb3N0CwAAAF9Sb2JvdF9fc3RrKAEAAAB0BAAAAHNlbGYoAAAAACgAAAAAcxEAAABzZWNyZXRf
bWVzc2FnZS5weXQIAAAAX19pbml0X18cAAAAcwQAAAAAAQkBYwEAAAABAAAAAgAAAEMAAABzDQAA
AHwAAGoAAGQBAIMBAFMoAgAAAE50AQAAAGQoAQAAAHQEAAAAbW92ZSgBAAAAUhkAAAAoAAAAACgA
AAAAcxEAAABzZWNyZXRfbWVzc2FnZS5weXQIAAAAbW92ZURvd24gAAAAcwIAAAAAAWMBAAAAAQAA
AAIAAABDAAAAcw0AAAB8AABqAABkAQCDAQBTKAIAAABOdAEAAABsKAEAAABSHAAAACgBAAAAUhkA
AAAoAAAAACgAAAAAcxEAAABzZWNyZXRfbWVzc2FnZS5weXQIAAAAbW92ZUxlZnQjAAAAcwIAAAAA
AWMBAAAAAQAAAAIAAABDAAAAcw0AAAB8AABqAABkAQCDAQBTKAIAAABOdAEAAAByKAEAAABSHAAA
ACgBAAAAUhkAAAAoAAAAACgAAAAAcxEAAABzZWNyZXRfbWVzc2FnZS5weXQJAAAAbW92ZVJpZ2h0
JgAAAHMCAAAAAAFjAQAAAAEAAAACAAAAQwAAAHMNAAAAfAAAagAAZAEAgwEAUygCAAAATnQBAAAA
dSgBAAAAUhwAAAAoAQAAAFIZAAAAKAAAAAAoAAAAAHMRAAAAc2VjcmV0X21lc3NhZ2UucHl0BgAA
AG1vdmVVcCkAAABzAgAAAAABYwIAAAAJAAAABAAAAEMAAABzbgEAAHQAAHwBAIMBAHQBAGsDAHMw
AHQCAHwBAIMBAGQBAGsDAHMwAHwBAGQCAGsHAHI8AHQDAIMAAIIBAG4AAHwAAGoEAFwCAH0CAH0D
AGkEAGQRAGQEADZkEgBkBgA2ZBMAZAcANmQUAGQIADZ8AQAZXAIAfQQAfQUAZAMAfAIAfAQAFwQD
awEAb5IAZAkAawAAbgIAAgFvtABkAwB8AwB8BQAXBANrAQBvsgBkCQBrAABuAgACAXO7AHQFAFN8
AQBkBgBrAwByzQB8AgBuBwB8AgB8BAAXfQYAfAEAZAgAawMAcukAfAMAbgcAfAMAfAUAF30HAGQB
AHwGAGQKABR8AQBkCwBrBgBkCQAUF3wHABc+ZAwAQHIbAXQFAFN8AAAEagYAfAEANwJfBgB8AgB8
BAAXfAMAfAUAF2YCAHwAAF8EAHgmAGQVAERdHgB9CAB8AABqBgBqBwB8CABkEACDAgB8AABfBgBx
SAFXdAgAUygWAAAATmkBAAAAdAQAAABkbHJ1aQAAAABSGwAAAGn/////Uh4AAABSIAAAAFIiAAAA
aSAAAABpQAAAAHQCAAAAZHVsiQAAAMwhqSWNKClirAW/QlF2SzalSZQbJVnGYblNWUQGaxJ3yRqJ
Fyl50iotVjxwYyWCGTF/+WkPFnNwQwtmNJ1UiEiNWtpvp1XmJIBS8UrmIKoZ43ZWaPcLo01iEmEQ
KSUsdKJRv1U+NW4IQi3WerApRUV7KRhweibvL1pwSiDPJMNZhXw5DpN0n2fPPXYt514QI6p4jGJI
HfA1nhjFVwY6YCRSRV1ltX44J3I7dVWhXlIoSgK0aattVFlhRzxZWRSME6kh11FGKWRrZX0XAjsN
4m1qQl0LLgeLJjwUfQqHc+JGoy4teT5iRhbAENNldj3MGkU9xTPZVXJFwESuOXMvwUzsKGYL2AMI
Sfp//3//SCtT1AB0AgAAAHVkdAIAAABscnQCAAAAcmxSAAAAACgCAAAAaQAAAABpAQAAACgCAAAA
af////9pAAAAACgCAAAAaQEAAABpAAAAACgCAAAAaQAAAABp/////ygEAAAAUiYAAABSJQAAAFIn
AAAAUigAAAAoCQAAAHQEAAAAdHlwZXQDAAAAc3RyUgsAAAB0CgAAAFZhbHVlRXJyb3JSFwAAAHQF
AAAARmFsc2VSGAAAAHQHAAAAcmVwbGFjZXQEAAAAVHJ1ZSgJAAAAUhkAAABSGwAAAFITAAAAdAEA
AAB5dAIAAABkeHQCAAAAZHl0AgAAAGN4dAIAAABjeXQCAAAAYnQoAAAAACgAAAAAcxEAAABzZWNy
ZXRfbWVzc2FnZS5weVIcAAAALAAAAHMeAAAAAAEwAAwBDwEsAUAABAEcARwBJAAEAQ8BFwENARwB
YwEAAAAEAAAABAAAAEMAAABzcAAAAHwAAGoAAFwCAH0BAH0CAHwAAGoAAGQHAGsDAHI0AGQCAGQB
AHwBABhkAQB8AgAYZgIAFlN0AQB8AABqAgCDAQB9AwB8AwBqAwBkAwCDAQBzVgBkBABTdAQAfAMA
ZAUAIHQFAGoGAGQGAIMBAIMCAFMoCAAAAE5pHwAAAHM3AAAAWW91IGFyZSBzdGlsbCAlZCBsZWZ0
IGFuZCAlZCBhYm92ZSB3aGVyZSB5b3Ugc2hvdWxkIGJlIXMCAAAABIVzHwAAAENoZWF0ZXIhIEdv
IHNvbHZlIGl0IGNvcnJlY3RseSFp/f///3OAAAAAUWZ2ZlhZbjROVHpCR084eWo5cjBOTjk0ekZ6
VEphczEyUDIvak1Uc2QzUFFlNjJIeVh0WXZIaGxrMXFkbHR4SnhpZ0Fxd1ozczlqK2E4dGhBZVlp
M242TWY3RDU4eCtEZzhDWEkvS1FSRzB6UGhyYVl6TGRnNGJ2TVpJYTl4Yz0oAgAAAGkfAAAAaR8A
AAAoBwAAAFIXAAAAUggAAABSGAAAAHQIAAAAZW5kc3dpdGhSFQAAAHQJAAAAX2JpbmFzY2lpdAoA
AABhMmJfYmFzZTY0KAQAAABSGQAAAFITAAAAUi8AAAB0AQAAAGsoAAAAACgAAAAAcxEAAABzZWNy
ZXRfbWVzc2FnZS5weXQFAAAAc29sdmU6AAAAcxAAAAAAAQ8BDwEWAg8BDwAEAhABKAkAAAB0CAAA
AF9fbmFtZV9fdAoAAABfX21vZHVsZV9fUhoAAABSHQAAAFIfAAAAUiEAAABSIwAAAFIcAAAAUjkA
AAAoAAAAACgAAAAAKAAAAABzEQAAAHNlY3JldF9tZXNzYWdlLnB5UhYAAAAbAAAAcw4AAAAGAQkE
CQMJAwkDCQMJDigIAAAAdAcAAABoYXNobGliUgIAAAB0CAAAAGJpbmFzY2lpUjYAAABSCAAAAFIV
AAAAdAYAAABvYmplY3RSFgAAACgAAAAAKAAAAAAoAAAAAHMRAAAAc2VjcmV0X21lc3NhZ2UucHl0
CAAAADxtb2R1bGU+AQAAAHMIAAAADAEMAgkFCRI=
Does anyone know exactly what this is encoded with, or can anyone help me decode it?
(Note: I tried a decoding website, but none of the encodings I checked matched. This has been puzzling me ever since I started working on this 2 years ago.)
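It appears to be Base64-encoded, at least in part. Putting it through a Base64 decoder gives the output below, which includes a few hints as to what this might be (for example secret_message.py, hashlib, binascii, and a Robot class with moveUp, moveDown, moveLeft and moveRight methods).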
Yc#sDddlZddlZdZdZdefdYZdS(iNcCs:d}x-tdD]}tj||j}qW|S(Ntii(txranget_hashlibtsha256tdigest(tmthti((ssecret_message.pyt_ishasc Cs#td}d}x_tdD]Q}|||t||t|7}||d#||||<||d#d|dS(NiR(ii(t_Robot__post_Robot__stk(tself((ssecret_message.pytinits cCs
|jdS(Ntd(tmove(R((ssecret_message.pytmoveDown scCs
|jdS(Ntl(R(R((ssecret_message.pytmoveLeft#scCs
|jdS(Ntr(R(R((ssecret_message.pyt moveRight&scCs
|jdS(Ntu(R(R((ssecret_message.pytmoveUp)sc Csnt|tks0t|dks0|dkrd#rtS|j|7||||f|x&dD]}|jj|d|_qHWtS(NitdlruiRiRR R"i i#tdul!%()bBQvK6I%YaMYDkw)y*-V5nB-z)EE{)pz&/ZpJ $Y|9tg=v-^#xbH5W:`$RE]e~8'r;uU^R(JimTYaGbFev=E=3UrED9s/L(fIH+StudtlrtrlR(ii(ii(ii(ii(R&R%R'R(( ttypetstrRt
ValueErrorRtFalseRtreplacetTrue( RRRtytdxtdytcxtcytbt((ssecret_message.pyR,s0,#$
cCsp|j\}}|jdkr4dd|d|fSt|j}|jdsVdSt|d tjdS(Nis7You are still %d left and %d above where you should be!ssCheater! Go solve it correctly!isQfvfXYn4NTzBGO8yj9r0NN94zFzTJas12P2/jMTsd3PQe62HyXtYvHhlk1qdltxJxigAqwZ3s9j+a8thAeYi3n6Mf7D58x+Dg8CXI/KQRG0zPhraYzLdg4bvMZIa9xc=(ii(RRRtendswithRt _binasciit
a2b_base64(RRR/tk((ssecret_message.pytsolve:s( t__name__t
__module__RRRR!R#RR9(((ssecret_message.pyRs (thashlibRtbinasciiR6RRtobjectR(((ssecret_message.pyts
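The first four bytes of the decoded data are 03 F3 0D 0A, which matches the header (magic number) of a Python 2.7 compiled .pyc file; that fits the secret_message.py, hashlib and binascii names visible above. A small sketch for checking this and for pulling the readable strings out of the decoded bytes, similar to the Unix strings tool; the helper names are hypothetical. It does not disassemble the code: Python 3's marshal module is not expected to load Python 2.7 code objects, so that step would need a Python 2.7 interpreter or a decompiler.

```python
import base64
import re
from typing import List

# Magic number that starts Python 2.7 .pyc files (62211 little-endian, then "\r\n").
PY27_PYC_MAGIC = b"\x03\xf3\r\n"


def decode_blob(blob: str) -> bytes:
    """Base64-decode the blob; with the default validate=False, line breaks are skipped."""
    return base64.b64decode(blob)


def looks_like_py27_pyc(data: bytes) -> bool:
    """True if the decoded bytes start with the Python 2.7 .pyc header."""
    return data[:4] == PY27_PYC_MAGIC


def printable_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of at least min_len printable ASCII characters (like Unix `strings`)."""
    pattern = rb"[ -~]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data)]


# The first 8 characters of the blob are enough to check the header;
# paste the whole blob (line breaks included) to list its strings as well.
header = decode_blob("A/MNCrYN")
print(looks_like_py27_pyc(header))  # True
print(printable_strings(b"\x00\x01secret_message.py\x00ab\x00moveDown"))  # ['secret_message.py', 'moveDown']
```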