I am trying to scrape data from a webpage with the following format:
<html class="gr__racing_appledaily_com_hk" style="overflow: initial;">
<head> ... </head>
<body data-gr-c-s-loaded="true">
<!-- Google Tag Manager (noscript) -->
<noscript> ...</noscript>
<!-- End Google Tag Manager (noscript) -->
<div data-v-6223d6a8 id="app" class="web"> ... </div>
</body>
</html>
By using
from bs4 import BeautifulSoup
page = BeautifulSoup(raw_html.content, 'html.parser')
or
from bs4 import BeautifulSoup
page = BeautifulSoup(raw_html.content, 'html5lib')
the parsed result is missing the <div> part. Is it possible to get it back?
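One way to narrow this down is to check whether the <div id="app"> tag is present at all in the markup the server returned. Below is a minimal sketch using only the standard library's html.parser. SAMPLE_HTML, DivFinder and find_divs are hypothetical names, and the sample string is a trimmed stand-in for raw_html.content. If the tag turns out to be present but empty, its contents are probably being filled in by JavaScript in the browser, which neither html.parser nor html5lib executes.

```python
from html.parser import HTMLParser

# Hypothetical, trimmed stand-in for the downloaded page (raw_html.content).
SAMPLE_HTML = """
<html>
<head></head>
<body data-gr-c-s-loaded="true">
<noscript></noscript>
<div data-v-6223d6a8 id="app" class="web"></div>
</body>
</html>
"""


class DivFinder(HTMLParser):
    """Collects the attributes of every <div> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.divs = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # attrs is a list of (name, value) pairs; valueless attributes map to None
            self.divs.append(dict(attrs))


def find_divs(html_text):
    """Return one attribute dict per <div> found in html_text."""
    finder = DivFinder()
    finder.feed(html_text)
    finder.close()
    return finder.divs


divs = find_divs(SAMPLE_HTML)
print(divs)
```

Running the same check on the real response body should show whether the tag is missing from the download or only empty.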
I am having a real problem with Arabic text inside a <pre> tag.
For example, if I put this code in a page view:
<pre>
<html>
<head>
<title>First HTML Page</title>
</head>
<body>
<h1><span>اول صفحة ويب</span></h1>
<p>هذه أول فقرة ننشؤها في أول صفحة</p>
</body>
</html>
</pre>
This is what I get in the browser:
Here you can see that the Arabic text is reversed.
This happens just in ASP.NET Core MVC. In other frameworks, the text is displayed correctly.
I've tried changing the dir and lang attributes, but it does not help.
I copied your code snippet and reproduced the issue on my side. I noticed that the content rendered on the page is wrong, but in the F12 developer tools it is correct, so I suspect the browser automatically applies language-specific handling to the content of the <pre> element. So I tried setting the direction manually with a style. What do you think of this approach?
<div>
<div>
<div style="direction:ltr;unicode-bidi: bidi-override;color:red">اول صفحة ويب</div>
<span>اول صفحة ويب</span>
<p>هذه أول فقرة ننشؤها في أول صفحة</p>
</div>
<pre >
<html>
<head>
<title>First HTML Page</title>
</head>
<body>
<h1><span style="direction:rtl;unicode-bidi: bidi-override;">اول صفحة ويب</span></h1>
<h1><span>اول صفحة ويب</span></h1>
<p>هذه أول فقرة ننشؤها في أول صفحة</p>
</body>
</html>
</pre>
</div>
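As background on why a browser may reorder this text: Unicode assigns every character a bidirectional class, and the rendering engine uses those classes to decide visual order. The sketch below uses the standard unicodedata module to print the classes for part of the Arabic heading and for a tag such as <h1>. bidi_classes is a hypothetical helper name. This only illustrates the character classes involved; it does not show what ASP.NET Core MVC does differently.

```python
import unicodedata


def bidi_classes(text):
    """Return (character, bidirectional class) pairs, skipping whitespace."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text if not ch.isspace()]


# Arabic letters carry a strong right-to-left class ("AL").
arabic = bidi_classes("اول صفحة")
# Markup characters are mixed: "<" and ">" are neutral ("ON"), "h" is left-to-right ("L").
latin = bidi_classes("<h1>")

print(arabic)
print(latin)
```

Because the angle brackets are neutral, their visual position depends on the surrounding text and the base direction. This may be why tags shown literally next to Arabic text inside <pre> look rearranged, and why forcing the direction with unicode-bidi changes the result.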
I am working with Django and the geemap module. I am trying to build an app that displays satellite data on a map. The map should also be interactive, meaning data should flow in both directions between the front end (the Django template) and the back end (the Python script).
So far I only know how to display an instance of geemap.Map() in a Jupyter Notebook cell or on Colab (you just write the name of the variable). I have no idea how to display an instance of geemap.Map() in a Django template.
When I use the following method, it just prints the instance as a dictionary instead of interpreting it as a map and displaying it.
The code for my views.py:
from django.http import HttpResponse
from django.shortcuts import render
import geemap as gm
#import pandas as pd

def params(request):
    g_map = gm.Map()
    return render(request, "PlotMap/params.html", { "m" : g_map })
The code for the template (params.html):
<!DOCTYPE html>
{% load static %}
<html>
<head>
<meta charset="utf-8">
<title>map</title>
</head>
<body>
{{ m }}
</body>
</html>
The output that I get is as follows (image: output).
If someone can help me out, it would mean a lot. Thank you.
You can use geemap.foliumap.Map() or folium.Map() instead.
Code for the HTML template:
<!DOCTYPE html>
{% load static %}
<html>
<head>
<meta charset="utf-8">
<title>map</title>
{{ map.header.render|safe }}
</head>
<body>
<div class="map">
{{ map.html.render|safe }}
</div>
</body>
<script> {{ map.script.render | safe }}</script>
</html>
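Code for the backend (views.py):
import folium
import geemap.foliumap as geemap
from django.views.generic import TemplateView

class map(TemplateView):
    template_name = 'map.html'

    def get_context_data(self, **kwargs):
        # Start from the default context, then add the rendered figure
        context = super().get_context_data(**kwargs)
        figure = folium.Figure()
        Map = geemap.Map(plugin_Draw=True,
                         Draw_export=True)
        Map.add_to(figure)
        # Render once so header, html and script are populated for the template
        figure.render()
        context["map"] = figure
        return context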
Code for urls.py:
from django.urls import path
from . import views

urlpatterns = [
    path('', views.map.as_view(), name='map'),
]
Given the following html markup structure:
<body>
<div id=Body>
<div>
<iframe>//Data within iframe is generated dynamically
<html>
<head></head>
<body>
<div id="abcdef">
<div></div>
<div></div>
<div></div>
</body>
</html>
</iframe>
</div>
</body>
I want to find the XPath and CSS selector for <div id="abcdef">, but I am not able to, because the inner <html> tag is treated as a different frame.
Using this markup:
<body>
<div id="Body">
<iframe>//Data within iframe is generated dynamically
<html>
<head></head>
<body>
<div id="abcdef">
<div></div>
<div></div>
<div></div>
</div>
</body>
</html>
</iframe>
</div>
</body>
I was able to successfully select that element with:
//iframe//div[@id = 'abcdef']
Try it out here: http://www.freeformatter.com/xpath-tester.html
However, this is not going to work in the live DOM, because of iframe security restrictions. The XPath itself is correct, but you can test it in the Chrome dev console with
$x("//iframe//div[@id = 'abcdef']")
and see that you do not get any results. When dealing with HTML documents, your browser restricts access to iframes, so you will need to actually grab the iframe, read its HTML, and then search that HTML. As far as I am aware, you will not be able to use an XPath or CSS selector without getting the content of the iframe and then searching through it as its own document/element.
Grabbing the iframe as shown below worked:
driver.switch_to.frame(driver.find_element_by_id('frameid'))
For more methods, see this link.
Switching to the frame enables you to access all of its elements directly.
For the above example, the XPath will be:
//*[@id='abcdef']
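To illustrate the point that the inner document has to be searched on its own, here is a minimal sketch using the standard library's xml.etree.ElementTree. IFRAME_DOC is a hypothetical stand-in for the HTML read from inside the iframe. ElementTree supports only a subset of XPath and needs well-formed markup, so this shows the @id predicate (written with @, not #) on a standalone document. It is not a replacement for switching frames in Selenium.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the markup read from inside the iframe.
IFRAME_DOC = """
<html>
<head></head>
<body>
<div id="abcdef">
<div></div>
<div></div>
<div></div>
</div>
</body>
</html>
"""

# Parse the inner document by itself, as if we had already switched into the frame.
root = ET.fromstring(IFRAME_DOC)

# ".//*[@id='abcdef']" is the ElementTree spelling of //*[@id='abcdef'].
matches = root.findall(".//*[@id='abcdef']")

print(len(matches), matches[0].tag, len(list(matches[0])))
```

Once the inner document is the search root, the short expression is enough and no //iframe prefix is needed.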
I am trying to integrate an iTunes preview URL (an m4a file) using jPlayer, but I am running into issues.
None of the files from Apple iTunes work. For example:
http://a1.phobos.apple.com/us/r1000/057/Music/fd/8b/40/mzm.staswrxu.aac.p.m4a
However, these 2 non-Apple m4a links work:
http://jwdriggs.com/jagspodcast/m4a/%23116.m4a
The Apple files do play in the iTunes player itself.
What can I do to make them work? Does it have anything to do with iTunes encoding? It would be great if there were a workaround for this issue.
Here is my code:
<!DOCTYPE html>
<html>
<head>
<meta charset=utf-8 />
<!-- Website Design By: www.happyworm.com -->
<title>Demo : jPlayer circle player</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<link rel="stylesheet" href="skin/circle.skin/circle.player.css">
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.6/jquery.min.js"></script>
<script type="text/javascript" src="js/jquery.jplayer.min.js"></script>
<script type="text/javascript" src="js/jquery.transform.js"></script>
<script type="text/javascript" src="js/jquery.grab.js"></script>
<script type="text/javascript" src="js/mod.csstransforms.min.js"></script>
<script type="text/javascript" src="js/circle.player.js"></script>
<script type="text/javascript">
//<![CDATA[
$(document).ready(function(){
/*
* Instance CirclePlayer inside jQuery doc ready
*
* CirclePlayer(jPlayerSelector, media, options)
* jPlayerSelector: String - The css selector of the jPlayer div.
* media: Object - The media object used in jPlayer("setMedia",media).
* options: Object - The jPlayer options.
*
* Multiple instances must set the cssSelectorAncestor in the jPlayer options. Defaults to "#cp_container_1" in CirclePlayer.
*
* The CirclePlayer uses the default supplied:"m4a, oga" if not given, which is different from the jPlayer default of supplied:"mp3"
* Note that the {wmode:"window"} option is set to ensure playback in Firefox 3.6 with the Flash solution.
* However, the OGA format would be used in this case with the HTML solution.
*/
var myCirclePlayer = new CirclePlayer("#jquery_jplayer_1",
{
//m4a: "http://www.jplayer.org/audio/m4a/Miaow-07-Bubble.m4a"
m4a: "http://a1549.phobos.apple.com/us/r1000/026/Music/d8/01/eb/mzm.mxkkesne.aac.p.m4a"
}, {
cssSelectorAncestor: "#cp_container_1",
swfPath: "js",
wmode: "window"
});
});
//]]>
</script>
</head>
<body>
<!-- The jPlayer div must not be hidden. Keep it at the root of the body element to avoid any such problems. -->
<div id="jquery_jplayer_1" class="cp-jplayer"></div>
<!-- The container for the interface can go where you want to display it. Show and hide it as you need. -->
<div id="cp_container_1" class="cp-container">
<div class="cp-buffer-holder"> <!-- .cp-gt50 only needed when buffer is > than 50% -->
<div class="cp-buffer-1"></div>
<div class="cp-buffer-2"></div>
</div>
<div class="cp-progress-holder"> <!-- .cp-gt50 only needed when progress is > than 50% -->
<div class="cp-progress-1"></div>
<div class="cp-progress-2"></div>
</div>
<div class="cp-circle-control"></div>
<ul class="cp-controls">
<li>play</li>
<li>pause</li> <!-- Needs the inline style here, or jQuery.show() uses display:inline instead of display:block -->
</ul>
</div>
</body>
</html>
Try using SoundManager2 - this library works fine for playing iTunes audio previews.
I am trying to use the Dojo/Dijit declarative menu with Spring Roo 1.1.4, but even if I replace the complete Roo-generated menu.jspx with the example (lightly adapted) from the Dojo/Dijit homepage, it does not replace the decorated menu divs with the menu.
This is how it looks:
This is how it should look:
My modified menu.jspx:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<jsp:root xmlns:jsp="http://java.sun.com/JSP/Page"
xmlns:menu="urn:jsptagdir:/WEB-INF/tags/menu"
xmlns:sec="http://www.springframework.org/security/tags"
version="2.0">
<jsp:directive.page contentType="text/html;charset=UTF-8" />
<jsp:output omit-xml-declaration="yes" />
<script type="text/javascript">
dojo.require("dijit.MenuBar");
dojo.require("dijit.PopupMenuBarItem");
dojo.require("dijit.Menu");
dojo.require("dijit.MenuItem");
dojo.require("dijit.PopupMenuItem");
</script>
</head>
<div dojoType="dijit.MenuBar" id="navMenu">
<div dojoType="dijit.PopupMenuBarItem">
<span>
File
</span>
<div dojoType="dijit.Menu" id="fileMenu">
<div dojoType="dijit.MenuItem" onClick="alert('file 1')">
File #1
</div>
<div dojoType="dijit.MenuItem" onClick="alert('file 2')">
File #2
</div>
</div>
</div>
<div dojoType="dijit.PopupMenuBarItem">
<span>
Edit
</span>
<div dojoType="dijit.Menu" id="editMenu">
<div dojoType="dijit.MenuItem" onClick="alert('edit 1')">
Edit #1
</div>
<div dojoType="dijit.MenuItem" onClick="alert('edit 2')">
Edit #2
</div>
</div>
</div>
</div>
</jsp:root>
Can anybody give me a hint as to what I am doing wrong?
(I know the fallback is to build the menu programmatically, but I want to do it declaratively.)
The HTML header looks like this:
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=8" />
...
<script type="text/javascript">var djConfig = {parseOnLoad: false, isDebug: false, locale: '${fn:toLowerCase(userLocale)}'};</script>
<script src="${dojo_url}" type="text/javascript"><!-- required for FF3 and Opera --></script>
<script src="${spring_url}" type="text/javascript"><!-- /required for FF3 and Opera --></script>
<script src="${spring_dojo_url}" type="text/javascript"><!-- required for FF3 and Opera --></script>
<script language="JavaScript" type="text/javascript">dojo.require("dojo.parser");</script>
<spring:message code="application_name" var="app_name"/>
<title><spring:message code="welcome_h3" arguments="${app_name}" /></title>
</head>
I don't know anything about Spring Roo, so maybe I'm saying something very stupid here...
Is that menu.jspx compiled into some static HTML? If so, you can tell Dojo to parse your initial page simply by setting parseOnLoad to true in your djConfig:
var djConfig = {parseOnLoad: true, ...}
(no need to require dojo.parser in this case).
On the other hand, if that template is inserted dynamically, you will need to call dojo.parser.parse() on the root 'navMenu' node yourself. You seem to be requiring it, but I don't see where it is being called.
I had to use:
{
    dojo.addOnLoad(function(){
        dojo.parser.parse();
    });
}
instead of parseOnLoad:true.