I have external links on my webpage:
<a target="_blank" rel="nofollow" href="<?php echo $r['url']; ?>">
VISIT STORE
</a>
When $r['url'] starts with http:// the link points to the correct external URL, but when it contains only www. or just the bare domain name, the browser appends it to the current page's URL.
Case 1: url = http://google.com works fine.
Case 2: url = www.google.com creates a link like: http://localhost/appname/controller/action/www.google.com
Yii has nothing to do with your problem; this is about the difference between absolute and relative URLs.
In your code you don't use Yii anywhere. Yii has a very powerful URL manager with methods such as createUrl and createAbsoluteUrl, but you aren't using them here.
Once you understand the difference between absolute and relative URLs, your question goes away. There is more information on the internet and on Stack Overflow: Absolute vs relative URLs
try this:
<?php echo CHtml::link('Google', '//www.google.com', array('target'=>'_blank')) ?>
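To see why the browser ends up at http://localhost/appname/controller/action/www.google.com, it helps to look at how a relative reference is resolved against the current page's URL. Here is a small illustration, using Python's urllib.parse.urljoin purely as a stand-in for the resolution rules a browser applies (the localhost URL is the one from the question):

from urllib.parse import urljoin

page = "http://localhost/appname/controller/action/"

# an absolute URL replaces the base entirely
print(urljoin(page, "http://google.com"))    # -> http://google.com
# a bare host looks like a relative path, so it is appended to the page URL
print(urljoin(page, "www.google.com"))       # -> http://localhost/appname/controller/action/www.google.com
# a protocol-relative URL keeps the scheme but replaces the host
print(urljoin(page, "//www.google.com"))     # -> http://www.google.com

Only the first and last forms leave localhost behind, which is why prepending a scheme (or using a protocol-relative // URL as in the CHtml::link example above) fixes the link.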
Assuming all your URLs are external, check whether the URL already has a scheme. If it starts with http://, use it directly; otherwise prepend it.
function is_valid_url($url)
{
    // already absolute (http:// or https://): use as-is
    if (strpos($url, "http://") === 0 || strpos($url, "https://") === 0)
        return $url;
    // no scheme found: prepend http:// so the browser treats it as absolute
    return "http://$url";
}
In your <a> tag:
<a target="_blank" rel="nofollow" href="<?php echo is_valid_url($r['url']); ?>">
VISIT STORE </a>
To avoid having many files named Index.cshtml in my codebase, I want to use filenames named after the subfolder and with the suffix "Page" as the default pages in a Razor pages site instead of the conventional Index.cshtml.
So instead of having this file structure
/Pages
    /Settings
        Index.cshtml
    /Users
        Index.cshtml
I want this file structure:
/Pages
    /Settings
        SettingsPage.cshtml
    /Users
        UsersPage.cshtml
I have tried setting the route in SettingsPage.cshtml like this:
@page "/"
But this results in this error:
An unhandled exception occurred while processing the request.
AmbiguousMatchException: The request matched multiple endpoints. Matches:
/Index
/Settings/SettingsPage
How can I configure the razorpages options/conventions to use what I described above?
Ok, the solution I found was to simply use the folder name in the @page directive like this:
Instead of having this file /Pages/Settings/Index.cshtml
where the first line is
@page
I have this file /Pages/Settings/SettingsPage.cshtml
where the first line is:
@page "/Settings"
And when linking to the page I need to specify the new page file name instead of Index, i.e. instead of:
<a class="nav-link text-dark" asp-area="" asp-page="/Settings/Index">Settings</a>
I use
<a class="nav-link text-dark" asp-area="" asp-page="/Settings/SettingsPage">Settings</a>
This seems to work. Any comments or suggestions are still much appreciated.
Try to set your SettingsPage.cshtml as below:
@page "/SettingsPage"
The error indicates there are multiple Razor Pages (Pages/Index.cshtml and Pages/Settings/SettingsPage.cshtml) sharing the same route.
If you want Pages/Settings/SettingsPage.cshtml to match the path "/", set Pages/Index.cshtml to another route,
for example @page "/Index"
If you don't modify anything, /Pages/Index.cshtml matches both "/" and "/Index" (you can check the routing documentation). When you specify the path "/" for Pages/Settings/SettingsPage.cshtml, the error occurs.
I am having an issue. I am using Scrapy to extract data from HTML tables that are displayed after a form search. The problem is that it will not continue to crawl to the next page. I have tried multiple combinations of rules. I understand that it is not recommended to override the default parse logic in CrawlSpider. I have found many answers that fix others' issues, but I have not been able to find a solution in which a form POST must occur first. Looking at my code, it requests the allowed URLs, then POSTs to search.do; the results come back as an HTML results page and the parsing begins. Here is my code (I have replaced the real URL with nourl.com):
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector
from scrapy.http import FormRequest, Request
from EMD.items import EmdItem

class EmdSpider(CrawlSpider):
    name = "emd"
    start_urls = ["https://nourl.com/methor"]

    rules = (
        Rule(SgmlLinkExtractor(restrict_xpaths=('//div//div//div//span[@class="pagelinks"]/a[@href]'))),
        Rule(SgmlLinkExtractor(allow=('')), callback='parse_item')
    )

    def parse_item(self, response):
        url = "https://nourl.com/methor-app/search.do"
        payload = {"county": "ANDERSON"}
        return FormRequest(url, formdata=payload, callback=self.parse_data)

    def parse_data(self, response):
        print response
        sel = Selector(response)
        items = sel.xpath('//td').extract()
        print items
I have left allow=('') blank because I have tried so many combinations of it. Also, my XPath leads to this:
<div align="center">
<div id="bg">
<!--
Main Container
-->
<div id="header2"></div>
<!--
Content
-->
<div id="content">
<!--
Hidden/Accessible Headers
-->
<h1 class="hide"></h1>
<!--
InstanceBeginEditable name="Content"
-->
<h2></h2>
<p align="left"></p>
<p id="printnow" align="center"></p>
<p align="left"></p>
<span class="pagebanner"></span>
<span class="pagelinks">
[First/Prev]
<strong></strong>
,
<a title="Go to page 2" href="/methor-app/results.jsp?d-49653-p=2"></a>
,
<a title="Go to page 3" href="/methor-app/results.jsp?d-49653-p=3"></a>
[
/
]
</span>
I have checked with multiple tools and my XPath is correctly pointing to the URLs for the next pages, but my output in the command prompt only grabs data from the first page. I have seen a couple of tutorials where the code contains a yield statement, but I am not sure what it does other than "tell the function that it will be used again later without losing its data". Any ideas would be helpful. Thank you!
It may be because you need to select the actual URL in your rule, not just the <a> node. [...] in XPath is used to make a condition, not to select something. Try:
//span[@class="pagelinks"]/a/@href
Also a few comments:
How did you find this HTML? Beware of tools that find XPath for you: HTML retrieved with a browser and with scrapy may differ, because scrapy doesn't handle Javascript (which can be used to generate the page you're looking at), and some browsers also try to sanitize the HTML.
It may not be the case here, but the "javascript form" in a scrapy question spooked me. You should always check that the content of response.body is what you expect.
//div//div//div is practically the same as //div here. The double slash means we don't care about the intermediate structure any more, just select all nodes named div among the descendants of the current node. That's also why //span[...] on its own might do the trick.
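If fixing the rules still doesn't help (the pagination links only exist on the page returned by the POST, which the CrawlSpider rules never see in this setup), one way to sketch it is to follow those links yourself from parse_data. This is a rough sketch only, not tested against your site (nourl.com is a placeholder); it keeps the old scrapy.contrib / Python 2 style of your spider, and FormRequest, Request and Selector are already imported at the top of your file:

from urlparse import urljoin   # Python 2 stdlib; add next to the existing imports

# the two methods below would replace parse_item / parse_data inside EmdSpider
def parse_item(self, response):
    # POST the search form; the results page (with the pagelinks) comes back here
    return FormRequest("https://nourl.com/methor-app/search.do",
                       formdata={"county": "ANDERSON"},
                       callback=self.parse_data)

def parse_data(self, response):
    sel = Selector(response)
    # grab the table cells from this results page, as before
    print sel.xpath('//td').extract()
    # then queue a request for every "Go to page N" link; yield hands each request
    # back to Scrapy for scheduling instead of ending the method after one page
    for href in sel.xpath('//span[@class="pagelinks"]/a/@href').extract():
        yield Request(urljoin(response.url, href), callback=self.parse_data)

Whether the results pages can be fetched with plain GET requests after the initial POST (i.e. whether the search state lives in the session cookie) is an assumption you would need to verify.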
This code is not working:
name = "souq_com"
allowed_domains = ['uae.souq.com']
start_urls = ["http://uae.souq.com/ae-en/shop-all-categories/c/"]

rules = (
    # categories
    Rule(SgmlLinkExtractor(restrict_xpaths=('//div[@id="body-column-main"]//div[contains(@class, "fl")]'), unique=True)),
    Rule(SgmlLinkExtractor(restrict_xpaths=('//div[@id="ItemResultList"]/div/div/div/a'), unique=True), callback='parse_item'),
    Rule(SgmlLinkExtractor(allow=(r'.*?page=\d+'), unique=True)),
)
The first rule is getting responses, but the second rule is not working.
I'm sure the second rule's XPath is correct (I've tried it using scrapy shell). I also tried adding a callback to the first rule, selecting the path from the second rule ('//div[@id="ItemResultList"]/div/div/div/a') and issuing a Request, and that works correctly.
I also tried a workaround: using a BaseSpider instead of a CrawlSpider, but it only issues the first request and doesn't issue the callback.
How should I fix that?
The order of rules is important. According to scrapy docs for CrawlSpider rules:
If multiple rules match the same link, the first one will be used, according to the order they’re defined in this attribute.
If I follow the first link in http://uae.souq.com/ae-en/shop-all-categories/c/, i.e. http://uae.souq.com/ae-en/antique/l/, the items you want to follow are within this structure
<div id="body-column-main">
    <div id="box-ads-souq-1340" class="box-container">...
    <div id="box-results" class="box-container box-container-none">
        <div class="box box-style-none box-padding-none">
            <div class="bord_b_dash overhidden hidden-phone">
            <div class="item-all-controls-wrapper">
            <div id="ItemResultList">
                <div class="single-item-browse fl width-175 height-310 position-relative">
                <div class="single-item-browse fl width-175 height-310 position-relative">
                ...
So the links you target with the 2nd Rule are in <div> elements that have "fl" in their class, which means they also match the first rule, which looks for all links in '//div[@id="body-column-main"]//div[contains(@class, "fl")]', and therefore they will NOT be parsed with parse_item.
Simple solution: try putting your 2nd Rule before the "categories" Rule (unique=True is the default for SgmlLinkExtractor):
name = "souq_com"
allowed_domains = ['uae.souq.com']
start_urls = ["http://uae.souq.com/ae-en/shop-all-categories/c/"]

rules = (
    Rule(SgmlLinkExtractor(restrict_xpaths=('//div[@id="ItemResultList"]/div/div/div')), callback='parse_item'),
    # categories
    Rule(SgmlLinkExtractor(restrict_xpaths=('//div[@id="body-column-main"]//div[contains(@class, "fl")]'))),
    Rule(SgmlLinkExtractor(allow=(r'.*?page=\d+'))),
)
Another option is to change your first rule for category pages to a more restrictive XPath, one that does not exist on the individual category pages, such as '//div[@id="body-column-main"]//div[contains(@class, "fl")]//ul[@class="refinementBrowser-mainList"]'.
You could also define a regex for the category pages and use the allow parameter in your Rules, as in the sketch below.
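A minimal sketch of that idea follows; the r'/ae-en/.+/l/' pattern is only a guess based on the example category URL above (http://uae.souq.com/ae-en/antique/l/) and would need checking against the real category URLs:

rules = (
    # item pages first, so they win over the broader category rule
    Rule(SgmlLinkExtractor(restrict_xpaths=('//div[@id="ItemResultList"]/div/div/div')), callback='parse_item'),
    # category pages matched by URL pattern instead of by page region
    Rule(SgmlLinkExtractor(allow=(r'/ae-en/.+/l/',))),
    # pagination
    Rule(SgmlLinkExtractor(allow=(r'.*?page=\d+',))),
)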
I am doing something wrong, that much I know. :) I am trying to display a simple breadcrumb on a page. I have this in a view:
@if (ViewContext.RouteData.Values["Action"].ToString() == "Index")
{
    <li>
        // This displays "Matter"
        @ViewContext.RouteData.Values["Controller"]
    </li>
}
else
{
    <li>
        // This displays a hyperlink "Matter",
        // but the Href goes to "MyApp/Matter/Matter"
        <a href="@ViewContext.RouteData.Values["Controller"].ToString()">
            @ViewContext.RouteData.Values["Controller"]
        </a>
    </li>
}
In the above scenario, I have my Route.cs file set up to be "MyApp/Matter" which corresponds to an "Index" action on my "MatterController".
Clicking the link brings you to "MyApp/Matter/Matter" which does not work.
Any thoughts on how I can get this to work?
You're setting a relative path in the anchor tag. It's evaluating to a link whose href is just "Matter".
That relative value gets appended to your current URL which, in this case, I can only assume is "MyApp/Matter". The result is "MyApp/Matter/Matter".
You need to specify an absolute URL or a more complete relative URL -- ../Matter would work in this case.
Beyond that, I can't help you without understanding a little more about what you're trying to do.
Where do you want the breadcrumb to take them? What's in the breadcrumb in relation to what they're looking at?
Is MyApp in your example the directory that contains the app or is it an area within your application?
I can only gather that Matter is the controller, but what's the action? If you're getting a link displayed then it's not currently looking at the Index action.
Some part of the HTML of the webpage which I'm testing looks like this:
<div id="twoWideCallouts">
    <div class="callout">
        <a target="_blank" href="http://facebook.com">Facebook</a>
    </div>
    <div class="callout last">
        <a target="_blank" href="http://youtube.com">Youtube</a>
    </div>
</div>
I have to check, using Selenium, that when I click on the text, the URL opened is the same as the one given in href, and not an error page.
Using XPath, I've written the following command:
// i is the iterator
selenium.getAttribute("//div[contains(@class, 'callout')]["+i+"]/a/@href")
However, this is very slow and doesn't work for some of the links. From reading many answers and comments on this site I've learned that CSS locators are faster and cleaner to maintain, so I rewrote it as:
css = div:contains(callout)
Firstly, I'm not able to reach the anchor tag.
Secondly, this page can have any number of divs with class = callout. Using getXpathCount I can get the count of these, then iterate over that count and perform the href check. How can something similar be done using a CSS locator?
Any help would be appreciated.
EDIT
I can click on the link using the locator css=div.callout a, but when I try to read the href value using String str = "css=div.callout a[href]"; selenium.getAttribute(str); I get the error "Element not found". The console output is given below.
19:12:33.968 INFO - Command request: getAttribute[css=div.callout a[href], ] on session
19:12:33.993 INFO - Got result: ERROR: Element css=div.callout a[href not found on session
I tried to get the href attribute using XPath like this:
"xpath=(//div[contains(@class, 'callout')])["+1+"]/a/@href" and it worked fine.
Please tell me what the corresponding CSS locator for this should be.
It should be -
css = div:contains(callout)
Did you notice the ":" instead of the "." you used?
For CSSCount this might help -
http://www.eviltester.com/index.php/2010/03/13/a-simple-getcsscount-helper-method-for-use-with-selenium-rc/
On a different note, did you see the proposal for a new Selenium site on Area 51: http://area51.stackexchange.com/proposals/4693/selenium
To read the attribute I used css=div.callout a@href and it worked. The problem was with the use of square brackets around the attribute name.
For the first part of your question, anchor your identifier on the hyperlink:
css=a[href=http://youtube.com]
For achieving a count of elements in the DOM, based on CSS selectors, here's an excellent article.
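If you do end up iterating over every callout and checking each href, here is a minimal sketch using the old Selenium RC Python client (shown in Python purely as an illustration; the host, port, browser string and URLs are placeholders, and it assumes an RC server is running). It counts the links with getXpathCount and reads each href with the locator@attribute form of getAttribute, the same form that worked above:

from selenium import selenium   # old Selenium RC client

sel = selenium("localhost", 4444, "*firefox", "http://example.com/")   # placeholders
sel.start()
sel.open("/page-under-test")    # placeholder path

# count the callout links, then read each href one by one
count = int(sel.get_xpath_count("//div[contains(@class, 'callout')]/a"))
for i in range(1, count + 1):
    href = sel.get_attribute("xpath=(//div[contains(@class, 'callout')])[%d]/a@href" % i)
    print(href)

sel.stop()

A CSS-based count would need a helper like the one in the article linked above, since at the time that article was written RC only shipped getXpathCount (hence the helper).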