Space between paragraphs in Selenium - selenium

I have a text file containing a message (in 5 paragraphs) and a bot that sends messages to Instagram users. But the bot doesn't keep the spaces between paragraphs as they are in the txt file; the text arrives as one continuous block. Any ideas how to fix it?

If Selenium is not typing the spaces in a string, you can use the split method and, after typing each word of the resulting list, send a SPACE using Selenium's Keys, like this:
parg = "Hello World!"
list_words = parg.split()
for word in list_words:
    input_selected.send_keys(word)
    input_selected.send_keys(Keys.SPACE)
I believe something like this works too, and it is simpler since it needs no list, just one line of code:
input_selected.send_keys(parg.replace(" ",Keys.SPACE))
NOTE:
in order to import Keys in selenium:
from selenium.webdriver.common.keys import Keys
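
The same split-and-send idea can be applied to line breaks instead of spaces. The following is only a sketch: the helper name send_with_line_breaks is made up for illustration, and it assumes the message box inserts a line break on some key combination (Shift+Enter is common in chat-style inputs, but that is not confirmed for this page). The key combination is passed in as a parameter so the helper itself does not depend on Selenium.

```python
def send_with_line_breaks(element, text, line_break_keys):
    """Type `text` into `element`, pressing `line_break_keys` between lines.

    `element` is anything with a send_keys(*keys) method, such as a
    Selenium WebElement. `line_break_keys` is a tuple of keys that the
    page is ASSUMED to interpret as "new line without sending".
    """
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line:  # empty lines only need the line-break keys
            element.send_keys(line)
        if index < len(lines) - 1:  # no line break after the last line
            element.send_keys(*line_break_keys)


# Hypothetical usage with Selenium (not executed here):
#   from selenium.webdriver.common.keys import Keys
#   with open("message.txt", encoding="utf-8") as handle:
#       send_with_line_breaks(input_selected, handle.read(),
#                             (Keys.SHIFT, Keys.ENTER))
```

A blank line in the file becomes two consecutive line breaks, which is what separates the paragraphs visually.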

Related

How to pass emoji scraping a text in Python with bs4

I'm creating a scraper that scrapes all the comments in a URL page and I'm saving the text in a txt file (1 comment = 1 txt).
Now I'm having a problem when there are emoji in the text of a comment. The program stops and says "UnicodeEncodeError: 'charmap' codec can't encode the character". How can I get past this problem? (I'm using bs4)
The structure of the code is like this:
import requests
from bs4 import BeautifulSoup

q = requests.get(url)
soup = BeautifulSoup(q.content, "html.parser")
x = soup.find("a", {"class": "comments"})
y = x.find_all("div", {"class": "blabla"})
i = 0
for item in y:
    name = str(i)
    comment = item.find_all("p")
    out_file = open('%s.txt' % CreatorName, "w")
    out_file.write(str(comment))
    out_file.close()
    i = i + 1
Thanks to everyone.
My guess is that you are on Windows. Your code works perfectly on Linux. So change the encoding of the file you open to utf-8, like this:
out_file=open('%s.txt'%CreatorName, "w", encoding='utf-8')
This should write to the file without error, although the emoji may not display properly in Notepad. You can always open the file in Firefox or another application if you want to see the emoji. Other comment text should be readable in Notepad, though.
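
As a minimal sketch of the same fix, the helpers below (save_comment and load_comment are illustrative names, not part of the question's code) write and read a comment with an explicit UTF-8 encoding, so the result no longer depends on the platform's default code page.

```python
import os
import tempfile


def save_comment(path, text):
    """Write `text` to `path` as UTF-8, regardless of the OS default."""
    with open(path, "w", encoding="utf-8") as out_file:
        out_file.write(text)


def load_comment(path):
    """Read the file back using the same explicit encoding."""
    with open(path, encoding="utf-8") as in_file:
        return in_file.read()


if __name__ == "__main__":
    # A made-up comment containing an emoji (U+1F600).
    comment = "Great post \U0001F600"
    path = os.path.join(tempfile.mkdtemp(), "0.txt")
    save_comment(path, comment)
    print(load_comment(path) == comment)
```

Using a with block also closes the file automatically, so a forgotten close() call cannot leave data unwritten.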

Import text from a .txt file using keywords in random positions

I'm new in this great platform and I have a question in Visual Basic.net.
I would like to import data from a txt file (or if you prefer a richtextbox!) using keywords that can be placed in a random position within the txt file. For example a txt like this:
keyword 25
or like this:
      keyword 25
In both cases the application should be able to recognise the line because of the presence of the keyword and get the number (25) that will be saved in a variable. Of course this number can vary in different files.
I was thinking to use a code similar to this one:
If line.StartsWith(keyword) Then
.....
End If
but the problem is that the keyword is not always the first thing on the line (there can be spaces before it), and I don't know which line of the txt file contains the keyword.
I would also like to ask how to get the number 25, which can likewise be placed at a random position after the keyword (but always on the same line).
I hope everything is clear and thanks if you can help me.
You may consider using .TrimStart() on the lines as you read them, like so:
If line.TrimStart.StartsWith(keyword) Then
.......
End If
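
The question also asks how to pick up the number after the keyword. The logic is sketched below in Python rather than VB.NET, purely for illustration: trim the leading whitespace, check for the keyword, then search the rest of the line for the first integer. The function name find_keyword_value is hypothetical, and the sketch assumes the value is a whole number.

```python
import re


def find_keyword_value(lines, keyword):
    """Return the first integer that follows `keyword` on any line, or None.

    Leading whitespace is ignored, mirroring TrimStart() in the answer.
    """
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith(keyword):
            # Look for the number only in the text after the keyword.
            match = re.search(r"-?\d+", stripped[len(keyword):])
            if match:
                return int(match.group())
    return None


if __name__ == "__main__":
    sample = ["some header", "      keyword      25", "other data 7"]
    print(find_keyword_value(sample, "keyword"))
```

In VB.NET the same steps would map onto TrimStart, StartsWith, and a regular expression match on the remainder of the line.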

lxml clean breaks href attribute

import lxml.html.clean as clean
cleaner = clean.Cleaner(style=True, remove_tags=['div','span',], safe_attrs_only=['href',])
text = cleaner.clean_html('link')
print text
prints
link
how to get:
link
i.e href in normal encoding?
clean does the right thing -- the string in parentheses should be properly encoded, and the seemingly garbled thing is the proper encoding.
You might not know this, but Cyrillic domain names don't exist as such in DNS: there is a system (IDNA, built on Punycode) that maps them to the "allowed" ASCII characters.
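
To see that mapping in action, here is a small sketch using Python's built-in "idna" codec. The domain пример.рф is a made-up example (the URL from the question is not recoverable here), and the helper names are illustrative.

```python
def to_ascii_domain(domain):
    """Encode an internationalized domain name to its ASCII (xn--) form."""
    return domain.encode("idna").decode("ascii")


def to_unicode_domain(ascii_domain):
    """Decode an ASCII (xn--) domain name back to its readable form."""
    return ascii_domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    encoded = to_ascii_domain("пример.рф")
    print(encoded)
    print(to_unicode_domain(encoded))
```

So if the readable form is needed for display, the host part of the cleaned href can be decoded again after cleaning; the encoded form is still the one that works as a link.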

making a list of traditional Chinese characters from a string

I am currently trying to estimate the number of times each character is used in a large sample of traditional Chinese characters. I am interested in characters not words. The file also includes punctuation and western characters.
I am reading in an example file that contains a large sample of traditional Chinese characters. Here is a small subset:
首映鼓掌10分鐘 評語指不及《花樣年華》
該片在柏林首映,完場後獲全場鼓掌10分鐘。王家衛特別為該片剪輯「柏林版本
增減20處 趙本山香港戲分被刪
在柏林影展放映的《一代宗師》版本
教李小龍武功 葉問決戰散打王
另一增加的戲分是開場時葉問(梁朝偉飾)
My strategy is to read each line, split each line into a list, and check each character to see whether it already exists in a list or dictionary of characters. If the character does not yet exist there, I will add it; if it does, I will increase the counter for that character. I will probably use two lists: a list of characters and a parallel list containing the counts. This will be more processing, but should also be much easier to code.
I have not gotten anywhere near this point yet.
I am able to read in the example file successfully. Then I am able to make a list for each line of my file. I am able to print out those individual lines into my output file and sort of reconstitute the original file, and the traditional Chinese comes out intact.
However, I run into trouble when I try to make a list of each character on a particular line.
I've read through the following article. I understood many of the comments, but unfortunately, was unable to understand enough of it to solve my problem.
How to do a Python split() on languages (like Chinese) that don't use whitespace as word separator?
My code looks like the following
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import codecs
wordfile = open('Chinese_example.txt', 'r')
output = open('Chinese_output_python.txt', 'w')
LINES = wordfile.readlines()
Through various tests I am sure the following line is not splitting the string LINES[0] into its component Chinese characters.
A_LINE = list(LINES[0])
output.write(A_LINE[0])
I mean you want to use this, from answerer 'flow' at How to do a Python split() on languages (like Chinese) that don't use whitespace as word separator? :
from re import compile as _Re
_unicode_chr_splitter = _Re( '(?s)((?:[\ud800-\udbff][\udc00-\udfff])|.)' ).split
def split_unicode_chrs( text ):
    return [ chr for chr in _unicode_chr_splitter( text ) if chr ]
To successfully split a line of traditional Chinese characters, I just had to know the proper syntax for handling encoded characters. Pretty basic:
my_new_list = list(LINES[0].decode('utf8'))
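
For the counting step described in the question, here is a sketch in Python 3, where strings are already Unicode and iterating over a line yields one character at a time. It assumes the file is UTF-8 encoded, and it uses collections.Counter in place of the two parallel lists the question proposes; the function names are illustrative.

```python
from collections import Counter


def count_characters(lines):
    """Count every non-whitespace character across an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(ch for ch in line if not ch.isspace())
    return counts


def count_characters_in_file(path):
    """Same as count_characters, reading `path` as UTF-8 (an assumption)."""
    with open(path, encoding="utf-8") as handle:
        return count_characters(handle)


if __name__ == "__main__":
    sample = ["首映鼓掌10分鐘", "全場鼓掌10分鐘"]
    for char, count in count_characters(sample).most_common(3):
        print(char, count)
```

Punctuation and western characters are counted too; filtering them out would need an extra condition in the generator expression.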

How to get data from a .rtf file or excel file into database(sqlite) in iphone sdk?

I have a lot of data in a .rtf file (usernames and passwords). How can I get that data into a table? I'm using sqlite3.
I have created "userDatabase.sql", and in it a table "usersList" with the fields "username" and "password". I want to load the data from the "list.rtf" file into my table "usersList". Please help me.
Thanks in advance.
Praveena.
I would write a little parser. Re-save the .rtf as a txt file and assume it looks like this:
user1:pass1
user2:pass2
user5:pass5
Now do this (in your code):
open the .txt file (NSString -stringWithContentsOfFile:usedEncoding:error:)
read line by line
for each line, fetch user and password (NSArray -componentsSeparatedByString)
store user/password into your DB
Best,
Christian
Edit: for parsing Excel sheets, I recommend exporting them as a CSV file and then doing the same.
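
The steps above can be sketched as follows. This is in Python with its standard sqlite3 module, only to illustrate the flow; on the iPhone the same steps would use the NSString and NSArray calls named above together with the SQLite C API. The function names are made up, and the table layout follows the question.

```python
import sqlite3


def parse_credentials(text):
    """Turn 'user:pass' lines into a list of (user, password) tuples."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        user, password = line.split(":", 1)
        pairs.append((user, password))
    return pairs


def store_credentials(connection, pairs):
    """Insert the pairs into the usersList table, creating it if needed."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS usersList (username TEXT, password TEXT)"
    )
    connection.executemany(
        "INSERT INTO usersList (username, password) VALUES (?, ?)", pairs
    )
    connection.commit()


if __name__ == "__main__":
    sample = "user1:pass1\nuser2:pass2\nuser5:pass5\n"
    connection = sqlite3.connect(":memory:")
    store_credentials(connection, parse_credentials(sample))
    print(connection.execute("SELECT * FROM usersList ORDER BY rowid").fetchall())
```

Splitting on the first colon only means a password may itself contain a colon, while a username may not.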
Parsing RTF files is mostly trivial. They're actually text, not binary (as .doc, .pdf, etc. are).
Last I used it, I remember the file format wasn't too difficult either.
Example:
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Calibri;}}
{\*\generator Msftedit 5.41.21.2510;}\viewkind4\uc1\pard\sa200\sl276\slmult1\lang9\f0\fs22 Username Password\par
Username2 Password2\par
UsernameN PasswordN\par
}
Do a regular expression match to get the last { ... } part. Be sure to match { and not \{.
Next, parse the text as you want, but keep in mind that:
everything starting with a \ is escaped, I would write a little function to unescape the text
the special identifier \par is for a new line
there are other special identifiers, such as \b which toggles bolding text
the color change identifier, \cfN changes the text color according to the color table defined in the file header. You would want to ignore this identifier since we're talking about plain text.
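
A rough sketch of these rules, applied to the example file above, might look like this in Python. It takes a slightly different route from the regular expression match suggested earlier: it drops the nested { ... } groups (font table, generator) and then strips control words, turning \par into a newline. It is not a full RTF parser: escaped braces, \'hh hex escapes, and Unicode escapes are not handled, and the function name is illustrative.

```python
import re

SAMPLE_RTF = r"""{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Calibri;}}
{\*\generator Msftedit 5.41.21.2510;}\viewkind4\uc1\pard\sa200\sl276\slmult1\lang9\f0\fs22 Username Password\par
Username2 Password2\par
UsernameN PasswordN\par
}"""

# An innermost group: braces with no other braces inside.
_GROUP = re.compile(r"\{[^{}]*\}")
# A control word (letters, optional number, optional space) or a raw newline.
_TOKEN = re.compile(r"\\([a-z]+)(-?\d+)? ?|[\r\n]+")


def rtf_to_text(rtf):
    """Reduce a simple RTF document to plain text (sketch, see caveats)."""
    body = rtf.strip()
    if body.startswith("{") and body.endswith("}"):
        body = body[1:-1]  # drop the outermost group delimiters
    removed = 1
    while removed:  # peel nested groups from the inside out
        body, removed = _GROUP.subn("", body)

    def replace(match):
        # \par is a paragraph break; other control words and raw
        # newlines carry no text and are dropped.
        return "\n" if match.group(1) == "par" else ""

    return _TOKEN.sub(replace, body)


if __name__ == "__main__":
    for line in rtf_to_text(SAMPLE_RTF).splitlines():
        username, password = line.split()
        print(username, password)
```

Because identifiers such as \b and \cfN are ordinary control words, the same pattern removes them without special handling.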