How do I make the text before an input wholly visible in PySimpleGUI?

How do I get my program using PySimpleGUI to display the whole text before the input box?
Suppose I have a long sg.Text('abcdefghijklmnopqrst'): not all of this text is being displayed in the window.
Thanks in advance.

Try textwrap.wrap. From the Python documentation:
Wraps the single paragraph in text (a string) so every line is at most width characters long. Returns a list of output lines, without final newlines.
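For a quick sense of what wrap returns, here is a minimal sketch using the 20-character string from the question and an illustrative width of 10:
from textwrap import wrap

lines = wrap('abcdefghijklmnopqrst', width=10)
print(lines)             # ['abcdefghij', 'klmnopqrst']
print('\n'.join(lines))  # the same text on two lines of at most 10 characters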
Example code
from textwrap import wrap
import PySimpleGUI as sg
def Frame(text, key, width=30):
    lines = wrap(text, width=width)
    height = len(lines)
    new_text = '\n'.join(lines)
    layout = [[sg.Text(new_text, size=(width, height)), sg.Input(key=key)]]
    return sg.Frame('', layout, pad=(0, 0))
Questions = [
    'How often do you use our products?',
    'Which features are most valuable to you?',
    'How would you compare our products to our competitors?',
    'What important features are we missing?',
    'What are you trying to solve by using our product?',
    'What other types of people could find our product useful?',
    'How easy is it to use our product?',
    'How would you rate the value for money?',
    'How likely are you to recommend this product to others?',
    'How could we improve our product to better meet your needs?',
]
font = ('Courier New', 10, 'bold')
sg.set_options(font=font)
sg.theme('DarkBlue3')
layout = [
    [Frame(question, f'Q {i}')] for i, question in enumerate(Questions)
] + [
    [sg.Push(), sg.Button('Submit'), sg.Push()],
]
sg.Window('Product Survey Questions', layout).read(close=True)

Related

Can I use BeautifulSoup to find elements hidden by other wrapped elements?

I would like to extract the text data of the author affiliations on this page using BeautifulSoup.
I know of a workaround using Selenium to simply click on the 'show more' link and scan the page again. I'm not sure what kind of elements these are (hidden?), as they only appear in the inspector after clicking the button.
Is there a way to extract this info using just BeautifulSoup, or do I need Selenium or something equivalent to reveal the elements in the HTML code?
from bs4 import BeautifulSoup
import requests

url = 'https://www.sciencedirect.com/science/article/abs/pii/S0920379621007596'
r = requests.get(url)
sp = BeautifulSoup(r.content, 'html.parser')
author_data = sp.find('div', id='author-group')
affiliations = author_data.find('dl', class_='affiliation').text
print(affiliations)
That info is within a script tag, though you need to map the letters for affiliations to the actual affiliations. The code below extracts the JavaScript object housing the info you want and handles it with the json library.
There is then a series of steps to dynamically determine which indices hold the info of interest, and a constructed mapping of the letters to affiliations is then used to assign the correct affiliation to each author.
The author first and last names are also dynamically ascertained and joined together with a space.
The intention was to avoid hardcoding indices which might change over time.
import re
import json
import requests

r = requests.get('https://www.sciencedirect.com/science/article/abs/pii/S0920379621007596',
                 headers={'User-Agent': 'Mozilla/5.0'})
# extract the embedded JavaScript object housing the author data and parse it as JSON
data = json.loads(re.search(r'(\{"abstracts".*})', r.text).group(1))
base = [i for i in data['authors']['content']
        if i.get('#name') == 'author-group'][0]['$$']
affiliation_data = [i for i in base if i['#name'] == 'affiliation']
author_data = [i for i in base if i['#name'] == 'author']
name_info = [i['_'] for author in author_data for i in author['$$']
             if i['#name'] in ['given-name', 'surname']]
# map the affiliation labels (letters) to the actual affiliation text
affiliations = dict(zip(
    [j['_'] for i in affiliation_data for j in i['$$'] if j['#name'] == 'label'],
    [j['_'] for i in affiliation_data for j in i['$$']
     if isinstance(j, dict) and '_' in j and j['_'][0].isupper()]))
# print(affiliations)
# join given name and surname with a space, then pair each author with an affiliation
author_affiliations = dict(zip(
    [' '.join([i[0], i[1]]) for i in zip(name_info[0::2], name_info[1::2])],
    [affiliations[j['_']] for author in author_data for i in author['$$']
     if i['#name'] == 'cross-ref' for j in i['$$'] if j['_'] != '⁎']))
print(author_affiliations)

Convert CONLL file to a list of Doc objects

Is there a way to convert a CONLL file into a list of Doc objects without having to parse the sentences using the nlp object? I have a list of annotations that I have to pass to an automatic component that uses Doc objects as input. I have found a way to create the doc:
doc = Doc(nlp.vocab, words=[...])
And I can use the from_array function to recreate the other linguistic features. The array can be recreated using index values from the StringStore object. I have successfully created a Doc object with LEMMA and TAG information but cannot recreate the HEAD data. My question is how to pass HEAD data to a Doc object using the from_array method.
The confusing thing about HEAD is that, for a sentence with this structure:
Ona 2
je 2
otišla 2
u 4
školu 2
. 2
The output of this code snippet:
from spacy.attrs import TAG, HEAD, DEP
doc.to_array([TAG, HEAD, DEP])
is:
array([[10468770234730083819, 2, 429],
[ 5333907774816518795, 1, 405],
[11670076340363994323, 0, 8206900633647566924],
[ 6471273018469892813, 1, 8110129090154140942],
[ 7055653905424136462, 18446744073709551614, 435],
[ 7173976090571422945, 18446744073709551613, 445]],
dtype=uint64)
I cannot correlate the center column of the to_array output to the dependency tree structure given above.
Thanks in advance for the help,
Daniel
OK, so I finally cracked it: the stored value is head - index when the index is lower than (or equal to) the head, and 18446744073709551616 + (head - index) otherwise. In other words, HEAD is stored as head - index wrapped around modulo 2**64, which is why 'školu' at index 4 with head 2 shows up as 18446744073709551614 above. This is the code I used, if anyone else needs it:
import numpy as np
from spacy.attrs import TAG, HEAD, DEP
from spacy.tokens import Doc

# sents: the parsed CONLL data, one list of dicts per sentence with
# 'word', 'pos', 'dep' and 'head' keys; doc.vocab comes from an existing pipeline
docs = []
for sent in sents:
    generated_doc = Doc(doc.vocab, words=[word["word"] for word in sent])
    heads = []
    for idx, word in enumerate(sent):
        if word["pos"] not in doc.vocab.strings:
            doc.vocab.strings.add(word["pos"])
        if word["dep"] not in doc.vocab.strings:
            doc.vocab.strings.add(word["dep"])
        if word["head"] >= idx:
            heads.append(word["head"] - idx)
        else:
            # negative offsets wrap around modulo 2**64 in the uint64 array
            heads.append(18446744073709551616 + word["head"] - idx)
    np_array = np.array([np.array([doc.vocab.strings[word["pos"]], heads[idx],
                                   doc.vocab.strings[word["dep"]]], dtype=np.uint64)
                         for idx, word in enumerate(sent)], dtype=np.uint64)
    generated_doc.from_array([TAG, HEAD, DEP], np_array)
    docs.append(generated_doc)
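As a sanity check of that wrap-around encoding against the to_array output shown earlier (a minimal sketch in plain Python):
def head_offset(head, idx):
    # spaCy stores HEAD as (head - index) modulo 2**64 in the uint64 array
    return (head - idx) % 2**64

assert head_offset(4, 3) == 1                      # 'u' (index 3) -> 'školu' (index 4)
assert head_offset(2, 4) == 18446744073709551614   # 'školu' (index 4) -> 'otišla' (index 2)
assert head_offset(2, 5) == 18446744073709551613   # '.' (index 5) -> 'otišla' (index 2)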

BeautifulSoup: Pull 'p' tags between defined 'h2' tags

This has puzzled me for a bit now. I am trying to pull all of the text from 'p' tags under the 'h2' tags named "New Fundings" and "New Funds".
The number of 'p' tags isn't consistent for each page, so I was thinking of some sort of while loop, but what I tried didn't work. The format for each tag is often the company name in 'strong', then listing text and other 'strong' tags for who funded/invested.
Once I can parse it properly, the goal is to export the company name from the 'strong' tag with the following text, plus the investing companies/people (from the subsequent 'strong' tags in the 'p' block), to do some data analysis.
Any help would be appreciated - yes, I have looked through various other help pages, but the attempts I've made haven't been successful, so I came here.
import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.strictlyvc.com/2017/06/13/strictlyvc-june-12-2017/")
soup = BeautifulSoup(page.content, 'html.parser')
entrysoup = soup.find(class_='post-entry')

# Trying to pull the right paragraphs, but these only select the NEXT one.
# I want all of the tags under 'New Fundings' & 'New Funds' (basically,
# until the next tag that isn't either of those).
print(entrysoup.find('h2', text='New Fundings').find_next_sibling('p'))
print(entrysoup.find('h2', text='New Funds').find_next_sibling('p'))

# This was closer, but I wasn't sure how to get it to stop when it hit
# the non-New Fundings/New Funds tags:
for strong_tag in entrysoup.find_all('strong'):
    print(strong_tag.text, strong_tag.next_sibling)
I think this is the best result I could get for now. If it's not what you want, let me know so I can fiddle more; if it is, mark it as the answer :)
import requests
import bs4

page = requests.get("https://www.strictlyvc.com/2017/06/13/strictlyvc-june-12-2017/")
soup = bs4.BeautifulSoup(page.content, 'html.parser')
entrysoup = soup.find(class_='post-entry')
Stop_Point = 'Also Sponsored By . . .'

for strong_tag in entrysoup.find_all('h2'):
    if strong_tag.get_text() == 'New Fundings':
        for sibling in strong_tag.next_siblings:
            if isinstance(sibling, bs4.element.Tag):
                print(sibling.get_text())
                if sibling.get_text() == Stop_Point:
                    break
                if sibling.name == 'div':
                    for children in sibling.children:
                        if isinstance(children, bs4.element.Tag):
                            if children.get_text() == Stop_Point:
                                break
                            print(children.get_text())
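A more compact variant of the same idea (a sketch, reusing entrysoup from above and assuming the same page structure) collects the 'p' siblings for each target heading and stops at the next 'h2':
results = {}
for h2 in entrysoup.find_all('h2'):
    if h2.get_text() in ('New Fundings', 'New Funds'):
        paragraphs = []
        for sib in h2.find_next_siblings():
            if sib.name == 'h2':  # reached the next section heading: stop collecting
                break
            if sib.name == 'p':
                paragraphs.append(sib.get_text())
        results[h2.get_text()] = paragraphs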

How to extract Highlighted Parts from PDF files

Is there any way to extract highlighted text from a PDF file programmatically? Any language is welcome. I have found several libraries for Python, Java, and PHP, but none of them do the job.
To extract highlighted parts, you can use PyMuPDF. Here is an example which works with the sample file PDF-export-example-with-notes.pdf used below:
# Based on https://stackoverflow.com/a/62859169/562769
from typing import List, Tuple

import fitz  # install with 'pip install pymupdf'


def _parse_highlight(annot: fitz.Annot,
                     wordlist: List[Tuple[float, float, float, float, str, int, int, int]]) -> str:
    points = annot.vertices
    quad_count = int(len(points) / 4)
    sentences = []
    for i in range(quad_count):
        # where the highlighted part is
        r = fitz.Quad(points[i * 4 : i * 4 + 4]).rect
        words = [w for w in wordlist if fitz.Rect(w[:4]).intersects(r)]
        sentences.append(" ".join(w[4] for w in words))
    sentence = " ".join(sentences)
    return sentence


def handle_page(page):
    wordlist = page.get_text("words")  # list of words on page
    wordlist.sort(key=lambda w: (w[3], w[0]))  # ascending y, then x
    highlights = []
    annot = page.first_annot
    while annot:
        if annot.type[0] == 8:  # type 8 is a highlight annotation
            highlights.append(_parse_highlight(annot, wordlist))
        annot = annot.next
    return highlights


def main(filepath: str) -> List:
    doc = fitz.open(filepath)
    highlights = []
    for page in doc:
        highlights += handle_page(page)
    return highlights


if __name__ == "__main__":
    print(main("PDF-export-example-with-notes.pdf"))
OK, after looking around I found a solution for exporting highlighted text from a PDF to a text file. It's not very hard:
First, highlight your text with the tool you like to use (in my case, I highlight while I'm reading on an iPad using the GoodReader app).
Transfer your PDF to a computer and open it using Skim (a free PDF reader, easy to find on the web).
In FILE, choose CONVERT NOTES and convert all the notes of your document to SKIM NOTES.
That's all: simply go to EXPORT and choose EXPORT SKIM NOTES. It will export a list of your highlighted text. Once opened, this list can be exported again to a txt file.
Not much work to do, and the result is fantastic.

How to get an outline view in the Sublime Text editor?

How do I get an outline view in the Sublime Text editor for Windows?
The minimap is helpful, but I miss a traditional outline (a clickable list of all the functions in my code, in the order they appear, for quick navigation and orientation).
Maybe there is a plugin, add-on or similar? It would also be nice if you could briefly name the steps necessary to make it work.
There is a duplicate of this question on the Sublime Text forums.
Hit CTRL+R, or CMD+R for Mac, for the function list. This works in Sublime Text 1.3 or above.
A plugin named Outline is available in Package Control; try it!
https://packagecontrol.io/packages/Outline
Note: it does not work in multi-row/column mode.
For multi-row/column layouts, use this fork:
https://github.com/vlad-wonderkidstudio/SublimeOutline
I use the fold-all action. It will minimize everything to the declaration; I can see all the methods/functions, and then expand the one I'm interested in.
I briefly looked at the Sublime Text 3 API, and view.find_by_selector(selector) seems to be able to return a list of regions, so I guess that a plugin displaying the outline/structure of your file is possible.
Note: the function name display plugin could be used as inspiration to extract the class/method names, or ClassHierarchy to extract the outline structure.
If you want to be able to print out or save the outline, the Ctrl/Cmd+R approach is not very useful.
You can do a simple Find All on the regex ^[^\n]*function[^{]+{ or some variant of it to suit the language and situation you are working in.
Once you have done the Find All, copy and paste the result into a new document; depending on the number of functions, it should not take long to tidy up.
This answer is far from perfect, particularly for cases where the comments contain the word function (or its equivalent), but I do think it's helpful. (A scripted version of the same search appears after the listing below.)
With a very quick edit, this is the result I got on what I'm working on now:
PathMaker.prototype.start = PathMaker.prototype.initiate = function(point){};
PathMaker.prototype.path = function(thePath){};
PathMaker.prototype.add = function(point){};
PathMaker.prototype.addPath = function(path){};
PathMaker.prototype.go = function(distance, angle){};
PathMaker.prototype.goE = function(distance, angle){};
PathMaker.prototype.turn = function(angle, distance){};
PathMaker.prototype.continue = function(distance, a){};
PathMaker.prototype.curve = function(angle, radiusX, radiusY){};
PathMaker.prototype.up = PathMaker.prototype.north = function(distance){};
PathMaker.prototype.down = PathMaker.prototype.south = function(distance){};
PathMaker.prototype.east = function(distance){};
PathMaker.prototype.west = function(distance){};
PathMaker.prototype.getAngle = function(point){};
PathMaker.prototype.toBezierPoints = function(PathMakerPoints, toSource){};
PathMaker.prototype.extremities = function(points){};
PathMaker.prototype.bounds = function(path){};
PathMaker.prototype.tangent = function(t, points){};
PathMaker.prototype.roundErrors = function(n, acurracy){};
PathMaker.prototype.bezierTangent = function(path, t){};
PathMaker.prototype.splitBezier = function(points, t){};
PathMaker.prototype.arc = function(start, end){};
PathMaker.prototype.getKappa = function(angle, start){};
PathMaker.prototype.circle = function(radius, start, end, x, y, reverse){};
PathMaker.prototype.ellipse = function(radiusX, radiusY, start, end, x, y , reverse/*, anchorPoint, reverse*/ ){};
PathMaker.prototype.rotateArc = function(path /*array*/ , angle){};
PathMaker.prototype.rotatePoint = function(point, origin, r){};
PathMaker.prototype.roundErrors = function(n, acurracy){};
PathMaker.prototype.rotate = function(path /*object or array*/ , R){};
PathMaker.prototype.moveTo = function(path /*object or array*/ , x, y){};
PathMaker.prototype.scale = function(path, x, y /* number X scale i.e. 1.2 for 120% */ ){};
PathMaker.prototype.reverse = function(path){};
PathMaker.prototype.pathItemPath = function(pathItem, toSource){};
PathMaker.prototype.merge = function(path){};
PathMaker.prototype.draw = function(item, properties){};
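For the record, the same search can be run outside the editor with Python's re module (a quick sketch; the filename here is hypothetical):
import re

# the same pattern used for Find All above: everything on a line up to
# the first '{' of a function definition
pattern = re.compile(r'^[^\n]*function[^{]+{', re.MULTILINE)

with open('pathmaker.js') as f:  # hypothetical source file
    for match in pattern.finditer(f.read()):
        print(match.group(0))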