TypeError: 'NoneType' object is not iterable - YouTube Live Chat with BeautifulSoup

I am trying to get author names and messages from a YouTube live chat.
Here is the code:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver

url = 'https://www.youtube.com/live_chat?is_popout=1&v=EEIk7gwjgIM'
browser = webdriver.Firefox()
browser.get(url)
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
contents = soup.find("yt-live-chat-text-message-renderer")
for content in contents:
    author = content.find("span", {"id": "author-name"}).text
    message_content = content.find("span", {"id": "message"}).text
    print(author + message_content)
It gives me this error:
TypeError: 'NoneType' object is not iterable
How can I solve that?

Use Selenium to extract the HTML of the rendered page; the chat is built by JavaScript, so requests.get() never sees the messages.
Use soup.find_all() instead of soup.find() to retrieve all messages; find() returns a single tag (or None when nothing matches, which is why iterating over it raises the TypeError).
Here is the full code:
from bs4 import BeautifulSoup
from selenium import webdriver
import time

url = 'https://www.youtube.com/live_chat?is_popout=1&v=EEIk7gwjgIM'
browser = webdriver.Chrome()
browser.get(url)
time.sleep(3)
soup = BeautifulSoup(browser.page_source, "html.parser")
contents = soup.find_all("yt-live-chat-text-message-renderer")
for content in contents:
    author = content.find("span", {"id": "author-name"}).text.strip()
    message_content = content.find("span", {"id": "message"}).text.strip()
    print(author, message_content)
Output:
Plague Doctor #Yellow Rose I don't what you have to say.All people like you do is jump into a conversation and act like you are important
Yellow Rose Haha #Plague Doctor Whatever you say
racheal naidoo hi
NoMade lion Hi yellow rose what is going on in your life
半杯苦茶 Hallo
racheal naidoo hello everyone
Arifa Waheed hi
Plague Doctor Hello 1/2 cup bitter tea
putri zahrani ada orang Indonesia gak?
racheal naidoo hi night bot! 🌃
Yellow Rose Not much #NoMade Lion how about you?
Sath Sitivili hi
racheal naidoo bye everyone 😇🌃
Yellow Rose !iss12
Nightbot The Water Recovery System on ISS reduces crew dependence on “delivered” water by 75% – from about 4 litre a day to 1 litre. Learn more:https://youtu.be/cR_jQ4Is8t0
Arifa Waheed yellow rose what 12?
.
.
.
.
Aniket Srivastava clear sky
C Moore music is so right for the image😊
δεσποινα ζαχου you calm when you look at this view
kaminari fala aeeeeee galera
Kazouchan hello c'est moi je suis quantique
Note: Your output will probably differ from mine, since the chat messages depend on when the data was scraped.

Related

Python TTS stops after 9 iterations: file is created, but no audio

Raspberry Pi, Python 3.7: the program reads a city list for weather conditions. I am trying to output just the city name, e.g., New York, Detroit, etc. The program outputs the audio okay for the first nine cities; afterwards the file is created but has no audio. The program loop is 240 seconds, i.e. 15 TTS calls per hour, which is within the 100 TTS calls per hour limit. The Python program continues with no errors, just no audio after the first nine cities. Restarting the program after the tenth city, the audio is okay again, but only for the first nine cities, then no audio.
I would appreciate any thoughts on this.
As said, there are no Python errors; a restart works for nine cities, then no audio again.
Below are the two routines which run in a loop for each city name. The routines are called from main, which sends data to a Nextion display.
import gtts
import vlc

# TTS City Name
def tts_name(name):
    global saveit
    # TTS and save city name
    tts = gtts.gTTS(name)
    saveit = "Name.mp3"
    tts.save(saveit)

# Play City Name
def Play_Name():
    # Define media player file
    media = vlc.MediaPlayer("/home/pi/Nextion/OpenWeather_10/" + saveit)
    # Set volume
    media.audio_set_volume(90)
    # Play name
    media.play()
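One pattern worth noting here (a hypothetical sketch, not part of the question's code): media.play() returns immediately, so the loop can move on while the clip is still playing and player instances pile up. A blocking variant of Play_Name, assuming the python-vlc bindings, might look like this:

import time
import vlc

def play_name_blocking(path):
    # Hypothetical variant: wait until VLC reports the clip has ended, then release the player
    media = vlc.MediaPlayer(path)
    media.audio_set_volume(90)
    media.play()
    time.sleep(0.5)  # give VLC a moment to start playback
    while media.get_state() not in (vlc.State.Ended, vlc.State.Error):
        time.sleep(0.1)
    media.release()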

Looking for a way to scrape URLs from a paginated website using Selenium

I'm trying to scrape URLs from a website and then output them to a CSV.
The code is working, but it is not going to the next page, as the website is paginated. While the counter is increasing and changing the URL, the page that loads is always page 1.
How do I resolve this?
import csv
from selenium import webdriver

MAX_PAGE_NUM = 3
MAX_PAGE_DIG = 1
driver = webdriver.Firefox()
for i in range(1, MAX_PAGE_NUM + 1):
    page_num = (MAX_PAGE_DIG - len(str(i))) * '0' + str(i)
    driver.get("https://www.example.com/user/learn/freehelp/dynTest/1/Landing/1/page" + page_num)
    find_href = driver.find_elements_by_xpath('//div[@class="col-md-12"]/a')
    num_page_items = len(find_href)
    with open('links1.csv', 'a') as f:
        for i in range(num_page_items):
            for my_href in find_href:
                f.write(my_href.get_attribute("href") + '\n')
driver.close()
You don't need selenium for this task: that info is accessible with requests.
Here is one way of getting that data:
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs
from tqdm import tqdm

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
big_list = []
for x in tqdm(range(1, 5)):  ## there are 100 pages, so you may want to increase the range to 101
    url = f'https://www.studypool.com/user/learn/freehelp/dynTest/1/Landing/1/page/{x}'
    r = requests.get(url, headers=headers)
    soup = bs(r.text, 'html.parser')
    table = soup.select_one('table.feedTable')
    titles = table.select('p.qn-title')
    for t in titles:
        title = t.get_text(strip=True)
        link = 'https://www.studypool.com' + t.parent.get('href')
        big_list.append((title, link))
df = pd.DataFrame(big_list, columns=['Thread', 'Url'])
print(df)
df.to_csv('another_issue_solved.csv')
This will save the dataframe as a CSV file and print it out in the terminal:
Thread Url
0 TAX 4011 FIU Preparing a Tax Memoir for Your New Client Memorandum https://www.studypool.com/discuss/18364986/preparing-a-tax-memor-for-your-new-client
1 Accounting Twilio Companys Billing System Case Study Project https://www.studypool.com/discuss/18330081/briefly-outline-the-problem-statement-objectives-and-goals-and-your-approach-to-the-needs-assessment-and-research-methodology-in-2-4-pages
2 University of Houston Accounting Oil and Gas Accounting Issues Paper https://www.studypool.com/discuss/18330479/issues-of-exporting-lng-from-the-u-s-course-oil-and-gas-accounting
3 RC Accounting Business Management Income and Balance Sheet Analysis https://www.studypool.com/discuss/18330477/evaluating-performance-and-benchmarking-2
4 SNHU Accounting Paper https://www.studypool.com/discuss/18293340/draft-for-introduction-for-final-project
... ... ...
155 BPA 331 University of Phoenix ?time Value of Money Excel Analysis https://www.studypool.com/discuss/18639381/bpa-331-time-value-of-money-assignment
156 Accounting Business Communication Agenda for The First Team Meeting Portfolio Tasks https://www.studypool.com/discuss/18659279/portfolio-2-1
157 BPA 331 University of Phoenix ?life Cycle Costing Analysis https://www.studypool.com/discuss/18639378/bpa-331-life-cycle-costing-analysis-assignment
158 Accounting International Business and Corporate Strategies Group Essay https://www.studypool.com/discuss/18659434/write-a-part-of-body-paragraph-of-an-essay
159 Aklan Catholic College Direct Labor Cost Assigned to Production Accounting Questions https://www.studypool.com/discuss/18642386/accounting-413
BeautifulSoup docs: https://beautiful-soup-4.readthedocs.io/en/latest/index.html
Requests docs: https://requests.readthedocs.io/en/latest/
Pandas: https://pandas.pydata.org/pandas-docs/stable/index.html
And tqdm: https://tqdm.github.io/
Try this if this is your web address
driver.get("https://www.studypool.com/user/learn/freehelp/dynTest/1/Landing/1/page1//page/"+page_num)

Selenium web scraping variable is empty even though inspect element shows the correct place

I have this line for CoinGecko: coin_name = driver.find_elements(By.XPATH, '//tbody[1]/tr/td[3]/div/div[2]/a[1]')
But the variable is empty - why?
wait = WebDriverWait(driver, 60)
driver.get("https://www.coingecko.com/en/coins/recently_added")
coin_names = wait.until(EC.presence_of_all_elements_located((By.XPATH, '//tbody[1]/tr/td[3]/div/div[2]/a[1]')))
for coin in coin_names:
    print(coin.text)
Did you wait for the elements to come up? Also, since you used driver.find_elements, I am assuming you want a list.
Imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
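Putting the wait, the XPath from the question, and those imports together, a minimal end-to-end sketch might look like this (assuming a local Chrome driver; CoinGecko's markup can change, so the XPath may need adjusting):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 60)
driver.get("https://www.coingecko.com/en/coins/recently_added")
# Wait until all matching anchors are present in the DOM before reading them
coin_names = wait.until(EC.presence_of_all_elements_located(
    (By.XPATH, '//tbody[1]/tr/td[3]/div/div[2]/a[1]')))
for coin in coin_names:
    print(coin.text)
driver.quit()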
Outputs:
Empire Capital Token
BlackPoker
MAGIC BNB
Traders Global Business
SuperMegaHyperDoge
Zeronauts
DopeWarz
YESorNO
Dawn Wars
Plastiks
KaraStar UMY
Orne
Spintop
Charm
Crystal Wallet
Bitcoin BR
Solar Full Cycle
Shinomics
Defender of Doge
Doge Rise Up
Polkago
Aurora token
Trick
AvaPay
Andromeda
Yooshiba Inu
xSuter
Coxswap
Small Fish Cookie
BuffSwap
Kanga Exchange
MetaGaming
NFTinder
Peoplez
The Monopolist
Index Coop - ETH 2x Flexible Leverage Index (Polygon)
KingPad
Zodium
Family
Metaverse Exchange
Victoria VR
MooMonster
Decentral Games Governance
Infam
Xolo Inu
Decentral Games
Multiverse Capital
Drachen Lord
Studio Shibli
CheeseDAO
Xeebster
DiamondShiba
Ohm Inu DAO
GamingShiba
Honey Deluxe
OOOOR Finance
Wizards And Dragons
Return of The King
Devia8
Umami Finance
TerraKub
Big Fund Capital DAO
Stamen Tellus Token
Harmony Parrot Egg
Luxy
Doge Raca
FireFlame Inu
TerraUSD (Wormhole)
WAGMI Game
Cryptogram
AngryFloki
4JNET
GizaDao
DoragonLand
Moonscape
Lord Arena
WidiLand
Witnet
Kiradoge
Solidray Finance
Fenix Danjon
Dogelana
Santos FC Fan Token
Integritee
Kardia Info
Governance OHM
MINE Network
StraitsX Indonesia Rupiah
Income Island
SuperPlayer World
ForthBox
Txbit
StaFi rATOM
Juicebox
Foxboy
ImpactXP
Miaw
ShopNEXT
Creaticles
Kaizilla

Getting the images produced by AzureML experiments back

I have created a toy example in Azure.
I have the following dataset:
amounts city code user_id
1 2.95 Colleferro 100 999
2 2.95 Subiaco 100 111
3 14.95 Avellino 101 333
4 14.95 Colleferro 101 999
5 14.95 Benevento 101 444
6 -14.95 Subiaco 110 111
7 -14.95 Sgurgola 110 555
8 -14.95 Roma 110 666
9 -14.95 Colleferro 110 999
I create an AzureML experiment that simply plots the amounts column.
The code in the R script module is the following:
data.set <- maml.mapInputPort(1) # class: data.frame
#-------------------
plot(data.set$amounts);
title("This title is a very long title. That is not a problem for R, but it becomes a problem when Azure manages it in the visualization.")
#-------------------
maml.mapOutputPort("data.set");
Now, if you click on the right output port of the R script and then on "Visualize",
you will see the Azure page where the outputs are shown.
Now, the following happens:
The plot is stuck in a fixed space (for example, the title is cut off!)
The image produced is a low-resolution one.
The JSON produced by Azure is "dirty" (making the decoding in C# difficult).
It seems that this is not the best way to get the images produced by the AzureML experiment.
Possible solution: I would like to send the picture produced in my experiment to a space like blob storage.
This would also be a great solution when I have a web app and I have to pick up the image produced by Azure and put it on my web app page.
Do you know if there is a way to send the image somewhere?
To save the images into Azure Blob Storage with R, you need two steps: get the images from the R Device output of Execute R Script, and upload them to Blob Storage.
There are two ways to implement the steps above.
You can publish the experiment as a web service, then get the images (base64-encoded) from the response of the web service request, and use the Azure Blob Storage REST API with R to upload them. Please refer to the article How to retrieve R data visualization from Azure Machine Learning.
You can directly add a module in C# to get & upload the images from the output of Execute R Script. Please refer to the article Accessing a Visual Generated from R Code in AzureML.
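The linked articles do the upload from R and C#. Purely as an illustration of the upload step, here is a minimal Python sketch using the azure-storage-blob package; the connection string, container name, and file name are placeholders:

from azure.storage.blob import BlobServiceClient

# Placeholder connection string and container; substitute your own storage account details
connection_string = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
service = BlobServiceClient.from_connection_string(connection_string)
container = service.get_container_client("experiment-plots")

# Upload the PNG written by the experiment so a web app can link to it later
with open("myplot.png", "rb") as image:
    container.upload_blob(name="myplot.png", data=image, overwrite=True)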
You can resize the image in the following way:
graphics.off()
png("myplot.png",width=300,height=300) ## Create new plot with desired size
plot(data.set);
file.remove(Sys.glob("*rViz*png")) ## Get rid of default rViz file

Download historical stock prices automatically from Yahoo Finance in Python

Is there a way to automatically download historical stock prices from Yahoo Finance or Google Finance (CSV format)? Preferably in Python.
When you're going to work with such time series in Python, pandas is indispensable. And here's the good news: it comes with a historical data downloader for Yahoo: pandas.io.data.DataReader.
from pandas.io.data import DataReader
from datetime import datetime
ibm = DataReader('IBM', 'yahoo', datetime(2000, 1, 1), datetime(2012, 1, 1))
print(ibm['Adj Close'])
Here's an example from the pandas documentation.
Update for pandas >= 0.19:
The pandas.io.data module has been removed from pandas>=0.19 onwards. Instead, you should use the separate pandas-datareader package. Install with:
pip install pandas-datareader
And then you can do this in Python:
import pandas_datareader as pdr
from datetime import datetime
ibm = pdr.get_data_yahoo(symbols='IBM', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1))
print(ibm['Adj Close'])
Downloading from Google Finance is also supported.
There's more in the documentation of pandas-datareader.
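For completeness, the Google variant followed the same pattern. This is only a sketch; note that Google Finance support was later dropped from pandas-datareader, so it may not work on current versions:

import pandas_datareader.data as web
from datetime import datetime

# Same call as above, with 'google' as the data source (no 'Adj Close' column here)
ibm = web.DataReader('IBM', 'google', datetime(2000, 1, 1), datetime(2012, 1, 1))
print(ibm['Close'])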
Short answer: Yes. Use Python's urllib to pull the historical data pages for the stocks you want. Go with Yahoo! Finance; Google is less reliable, has less data coverage, and is more restrictive in how you can use the data once you have it. Also, I believe Google specifically prohibits scraping the data in their ToS.
Longer answer: This is the script I use to pull all the historical data on a particular company. It pulls the historical data page for a particular ticker symbol, then saves it to a csv file named by that symbol. You'll have to provide your own list of ticker symbols that you want to pull.
import urllib

base_url = "http://ichart.finance.yahoo.com/table.csv?s="

def make_url(ticker_symbol):
    return base_url + ticker_symbol

output_path = "C:/path/to/output/directory"

def make_filename(ticker_symbol, directory="S&P"):
    return output_path + "/" + directory + "/" + ticker_symbol + ".csv"

def pull_historical_data(ticker_symbol, directory="S&P"):
    try:
        urllib.urlretrieve(make_url(ticker_symbol), make_filename(ticker_symbol, directory))
    except urllib.ContentTooShortError as e:
        outfile = open(make_filename(ticker_symbol, directory), "w")
        outfile.write(e.content)
        outfile.close()
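A hypothetical usage example (the ticker list is made up; the "S&P" directory must already exist under output_path):

# Made-up ticker list; replace with the symbols you actually want to pull
for symbol in ["IBM", "AAPL", "MSFT"]:
    pull_historical_data(symbol, directory="S&P")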
Extending @Def_Os's answer with an actual demo...
As @Def_Os has already said, using pandas-datareader makes this task real fun
In [12]: from pandas_datareader import data
pulling all available historical data for AAPL starting from 1980-01-01
#In [13]: aapl = data.DataReader('AAPL', 'yahoo', '1980-01-01')
# yahoo api is inconsistent for getting historical data, please use google instead.
In [13]: aapl = data.DataReader('AAPL', 'google', '1980-01-01')
first 5 rows
In [14]: aapl.head()
Out[14]:
Open High Low Close Volume Adj Close
Date
1980-12-12 28.750000 28.875000 28.750 28.750 117258400 0.431358
1980-12-15 27.375001 27.375001 27.250 27.250 43971200 0.408852
1980-12-16 25.375000 25.375000 25.250 25.250 26432000 0.378845
1980-12-17 25.875000 25.999999 25.875 25.875 21610400 0.388222
1980-12-18 26.625000 26.750000 26.625 26.625 18362400 0.399475
last 5 rows
In [15]: aapl.tail()
Out[15]:
Open High Low Close Volume Adj Close
Date
2016-06-07 99.250000 99.870003 98.959999 99.029999 22366400 99.029999
2016-06-08 99.019997 99.559998 98.680000 98.940002 20812700 98.940002
2016-06-09 98.500000 99.989998 98.459999 99.650002 26419600 99.650002
2016-06-10 98.529999 99.349998 98.480003 98.830002 31462100 98.830002
2016-06-13 98.690002 99.120003 97.099998 97.339996 37612900 97.339996
save all data as CSV file
In [16]: aapl.to_csv('d:/temp/aapl_data.csv')
d:/temp/aapl_data.csv - 5 first rows
Date,Open,High,Low,Close,Volume,Adj Close
1980-12-12,28.75,28.875,28.75,28.75,117258400,0.431358
1980-12-15,27.375001,27.375001,27.25,27.25,43971200,0.408852
1980-12-16,25.375,25.375,25.25,25.25,26432000,0.378845
1980-12-17,25.875,25.999999,25.875,25.875,21610400,0.38822199999999996
1980-12-18,26.625,26.75,26.625,26.625,18362400,0.399475
...
There is already a Python library called yahoo_finance, so you'll need to install it first using the following command line:
sudo pip install yahoo_finance
Then once you've installed the yahoo_finance library, here's a sample code that will download the data you need from Yahoo Finance:
#!/usr/bin/python
import yahoo_finance
import pandas as pd
symbol = yahoo_finance.Share("GOOG")
google_data = symbol.get_historical("1999-01-01", "2016-06-30")
google_df = pd.DataFrame(google_data)
# Output data into CSV
google_df.to_csv("/home/username/google_stock_data.csv")
This should do it. Let me know if it works.
UPDATE:
The yahoo_finance library is no longer supported.
You can check out the yahoo_fin package. It was initially created after Yahoo Finance changed their API (documentation is here: http://theautomatic.net/yahoo_fin-documentation).
from yahoo_fin import stock_info as si
aapl_data = si.get_data("aapl")
nflx_data = si.get_data("nflx")
aapl_data.head()
nflx_data.head()
aapl_data.to_csv("aapl_data.csv")
nflx_data.to_csv("nflx_data.csv")
It's trivial when you know how:
import yfinance as yf
df = yf.download('CVS', '2015-01-01')
df.to_csv('cvs-health-corp.csv')
If you wish to plot it:
import finplot as fplt
fplt.candlestick_ochl(df[['Open','Close','High','Low']])
fplt.show()