Import.io > Extractor: page never loads, so I cannot extract data - import.io

Import.io generally works fine, but there is one website I cannot extract data from. When I start the extractor and enter the URL http://restaurant.michelin.fr/restaurants/france/75000-paris/restaurants-michelin/page-4/, the page loads. But when I then press the ON button, the page won't load: nothing is displayed, just a blank page that looks like it's still loading. What can I do in this case? I've also tried the crawler, with the same result. I restarted the program and the computer, but the issue persists. Thanks a lot.

The import.io desktop app browser uses Firefox 24. A few websites aren't compatible with that browser, and this appears to be what is happening in this case.
It does however work in Magic! https://magic.import.io/
Once you have published the Magic API, you can then use the tools in MyData such as Bulk and Chain to add more URLs.
I have just tried to save a Magic API and it worked a treat. The only disadvantage here is that you won't be able to edit the columns until after you have extracted the data.

Related

How can I get the REAL source code in the browser?

I'm trying to write a test for a simple API, which always fails because of a strange browser behaviour.
The response coming from the API is just some plain text:
foo-bar-123
I can see exactly that in the browser window and also as response in the network tab.
Okay so far, but when I look at the Inspector, I see something like this:
<html><head></head><body>foo-bar-123</body></html>
If I control the browser with Selenium, the result of webdriver.page_source is the same.
For reasons I don't understand, the browser adds some HTML tags to the content.
Is this some strange kind of "feature"? Can this be switched off?
I don't think it's a bug because both Firefox and Chrome are showing this behaviour.
I just want to get the real content without any fancy stuff the browser thinks I need.
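One possible workaround, sketched here under assumptions: if the API response is plain text with no markup of its own, the text inside the browser-generated <body> should match the original payload, so a test could unwrap page_source before comparing. The names BodyTextExtractor and unwrap_plain_text are hypothetical helpers, not part of Selenium.

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collects the text found inside <body>, ignoring the wrapper tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &lt; back into "<".
        super().__init__(convert_charrefs=True)
        self._in_body = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False

    def handle_data(self, data):
        if self._in_body:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def unwrap_plain_text(page_source):
    """Return the text payload from a browser-wrapped plain-text response."""
    parser = BodyTextExtractor()
    parser.feed(page_source)
    parser.close()
    return parser.text()


wrapped = "<html><head></head><body>foo-bar-123</body></html>"
print(unwrap_plain_text(wrapped))
```

This only recovers the text; it does not tell you which headers or exact bytes the server sent. For that, requesting the URL with a plain HTTP client instead of a browser is probably the more direct route.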

Using a WCF Data Service

I built a WCF Data Service by following a blog.
It works ok, but I don't get the expected result format in the browser.
When I run the project, I get this:
But, when I try to browse one of these tables, say Customers, this is what I get:
As you can see, the Customers are there, but all I see is the current date for each one of them.
There must be something I'm not doing.
It looks to me like Firefox is displaying the data as an RSS feed because your service returns an XML payload. There should be a setting in Firefox to turn it off... I think it is under Firefox > Options > Applications > Web Feed.
Since the response is Atom-based, Firefox will assume this is an RSS feed and try to apply the RSS view (and fail, as you can see).
I tend to use IE when working with OData (and disable "Feed Reading view" under Options -> Content). In Firefox you can change some settings under Options -> Applications -> Web Feed, but I haven't figured out yet how to completely disable it.
As other answers have said, this is the default RSS view, correctly rendered in Firefox. You can still use the View Page Source option in Firefox to view the actual XML returned by your server.
If you want your data to render in a more user-friendly manner in the default RSS view you will have to use OData's feed customisation features, for example, to set a value for the Atom title field.
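To illustrate why the feed view shows only a date per customer, here is a minimal sketch that reads an OData Atom payload directly. The SAMPLE_FEED below is a hand-written, hypothetical payload (not the asker's actual response), and read_entries is an illustrative helper. In this format the entity values sit inside m:properties under content, while the Atom title is typically empty and updated carries a timestamp, which is what a feed reader ends up showing.

```python
import xml.etree.ElementTree as ET

# Namespaces used by Atom and by OData's Atom format.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
}

# Hypothetical sample payload, written by hand for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <title type="text">Customers</title>
  <entry>
    <title type="text" />
    <updated>2013-01-01T00:00:00Z</updated>
    <content type="application/xml">
      <m:properties>
        <d:CustomerID>ALFKI</d:CustomerID>
        <d:CompanyName>Alfreds Futterkiste</d:CompanyName>
      </m:properties>
    </content>
  </entry>
</feed>
"""


def read_entries(feed_xml):
    """Return one dict per Atom entry, including the OData properties."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", NS):
        properties = {}
        props = entry.find("atom:content/m:properties", NS)
        if props is not None:
            for child in props:
                # Tags look like "{namespace}CustomerID"; keep the local name.
                properties[child.tag.split("}", 1)[1]] = child.text
        entries.append({
            "title": entry.findtext("atom:title", default="", namespaces=NS),
            "updated": entry.findtext("atom:updated", default="", namespaces=NS),
            "properties": properties,
        })
    return entries


for item in read_entries(SAMPLE_FEED):
    print(item["title"], item["updated"], item["properties"])
```

The title comes back empty while the properties hold the customer data, which matches the symptom described in the question.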

On any website, images are always downloaded in the background, right?

Just to confirm: is an image always downloaded in another thread, different from the thread that loads the page text?
I put an image reference in my page, pointing to an image on the internet, and all the text always shows up first.
What do you think?
I think the HTML file contains all the prose and only refers to the pictures, so whichever threads are used, the text is downloaded first. Whether it's rendered before the pictures are downloaded is up to the user agent (UA), and user agents may or may not behave the same in this respect.
It depends on the browser and the website. In most cases the browser loads the "main HTML", which contains references to the pictures and other resources.
If the website loads most of its text content via AJAX, it could be more or less the other way round.
But in most cases you are right.
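As a rough illustration of the point made in these answers, the sketch below scans an HTML document for image references. The HTML text itself carries only the URLs; each image is a separate resource that a browser has to request after it has seen the reference. The names ImageReferenceCollector and find_image_urls, and the example URL, are hypothetical.

```python
from html.parser import HTMLParser


class ImageReferenceCollector(HTMLParser):
    """Records the src of every <img> tag encountered in the document."""

    def __init__(self):
        super().__init__()
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)


def find_image_urls(html_text):
    """Return the image URLs referenced by the given HTML, in document order."""
    collector = ImageReferenceCollector()
    collector.feed(html_text)
    collector.close()
    return collector.image_urls


page = '<p>Hello</p><img src="http://example.com/a.png"><p>More text</p>'
print(find_image_urls(page))
```

How and when a real browser schedules those follow-up requests is an implementation detail of that browser; the sketch only shows that the text and the image bytes are not part of the same download.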

Safari 6's New Developer Toolbar doesn't show Form Data in XHR/AJAX Requests

Safari recently went to version 6 (Lion/Mountain Lion), and its developer tools changed from the standard WebKit ones to a much more Xcode-looking set. My problem, other than the OCD of not liking things to change, is that in the Resources tab (or anywhere I can track down the DataService.aspx/AJAX calls) I can no longer see the form data that I am passing.
Can anyone point me to where I can find that data so I don't have to console out my params when I'm testing new data service/backend calls?
I've logged a bug with Apple, and they've marked it as a duplicate, so hopefully they've received enough requests to fix this. Until then I'm continuing to use Chrome, as its WebKit developer tools are the same as Safari's old version.
You can find this info in the Instrument tab (stopwatch icon). In the left sidebar there is a Timelines row; click the grey circle (record button) on its right. Then click Network Requests, where you see all requests. Click the small icon to the right of a request to display the response headers; all form data is available in the right panel. That panel can be hidden just like the left one (in case you don't see it).
Unfortunately there are no query parameters listed, according to this discussion. I believe it's a bug in Safari.
Edit, 15 May 2013: This bug was fixed in Safari 6.0.3.
As far as I can tell, there's no way to show the request parameters.
This goes even further. I can't see the JSON response data either (no clickable arrows to expand the JavaScript objects contained in the JSON, just plain text).
I think we have to switch to Firefox with Firebug or regular WebKit in order to get XHR monitoring...
If you want to see POST data in Safari 6, which is not possible right now, install the Firebug Lite extension and you will have the POST data.
I used it and it works great with Safari 6.
Actually the request headers, response headers and query parameters are in the details sidebar on the right when using the resources view or if you click to see the content of a request in the Timelines/Network Requests view. Took me a few minutes to find that too.
If you need to see what the device is actually sending and your server is on a Windows machine, I use Wireshark (http://www.Wireshark.org) and check on the server side of things. There is no interpretation by any WebKit tooling, and it is very valuable (for example, for an issue with iOS and the 'Blob' data). Similar network-snooping tools should exist on Mac as well.

Create screenshot of the page with Watin-like tool

I need to create a screenshot of a page by providing the page URL to a command-line tool. I found the following application: Convert HTML To Image. This tool is OK, but I want a more flexible application. I need to be able to perform the following:
Go to the following page.
Click button.
Take a screenshot and save it.
I want to create an application that tests a site by going to a URL, taking screenshots, and then sending the images by email.
Does anybody have experience solving such problems?
WatiN can capture screenshots:
ie.CaptureWebPageToFile(@"c:\tmp\watin main page.jpg");
More info:
http://watin.sourceforge.net/releasenotes-1-2-0-4000.html
http://fwdnug.com/blogs/ddodgen/archive/2008/06/19/watin-api-capturewebpagetofile.aspx
I am a contributor to the WatiN project and the author of the WatiN Test Recorder. To do what you want, I'd suggest using something like csExWB2 (http://code.google.com/p/csexwb2/). The demo will give you the basic browser, and you can add screen shots where you like. Emailing is not covered, but that should be fairly easy.
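For the emailing step, which the tools above do not cover, here is a minimal sketch in Python (not WatiN or csExWB2) that builds a message with a screenshot attached, using only the standard library. The function name build_screenshot_email, the addresses, and the subject line are hypothetical, and the sketch assumes the screenshot has already been saved as a JPEG and read into bytes.

```python
from email.message import EmailMessage


def build_screenshot_email(sender, recipient, image_bytes, filename="screenshot.jpg"):
    """Build an email carrying one JPEG screenshot as an attachment."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "Page screenshot"  # hypothetical subject line
    message.set_content("Screenshot attached.")
    # Attach the raw image bytes; the MIME type is assumed to be image/jpeg.
    message.add_attachment(
        image_bytes, maintype="image", subtype="jpeg", filename=filename
    )
    return message


# Placeholder bytes stand in for a real screenshot file here.
example = build_screenshot_email("tests@example.com", "team@example.com", b"\xff\xd8\xff")
print(example["Subject"], len(list(example.iter_attachments())))
```

Actually sending the message would need an SMTP server; with the standard library that is typically done through smtplib.SMTP(...).send_message(message), where the host and credentials depend on your environment.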
I know this is a very old post, but I want to leave a message for visitors to this post.
PhantomJS is one option (http://www.phantomjs.org).
According to the WatiN features page:
Supports creating screenshots of webpages
I would direct you to more specific documentation, but the documentation website doesn't work well with Firefox, so I can't search it.