Why does the "I've got what I need" button sometimes not work in import.io? - import.io

I am using import.io and trying to create a crawler based on this page:
http://www.flashscore.com/match/IeHoEHvJ/#match-statistics;0
After selecting single rows (one row per page), adding some columns, and training them, I want to click the "I've got what I need!" button to proceed and train another similar page. But the button cannot be clicked; it is as if the program is waiting for me to train more, even though that is not necessary (I have successfully completed this procedure on other websites, but for some reason it does not work on this page).
Any idea why this does not work?
Please see the following screenshot of import.io when I am trying to click the button without success:
http://puu.sh/j5Vlm/fcc322549a.png
UPDATE: I got a reply from the import.io Facebook group. Building a Crawler might not work because of robots.txt, but building an Extractor seems to work. I only have to find an easy way to collect all the links to use in the Extractor.

The website you're trying to scrape is probably protected by a robots.txt file, so, as the Facebook group told you, I suggest you try an Extractor instead.
The solution is a bit tricky but it should work.
Create an Extractor to grab the data you need from the page you want data from. I did it and it worked.
Create an Extractor to get the links: (Mine is here: 5ef25069-f0cc-4ac7-9184-b2a035277403) for this page
Then download the dataset as CSV, open it in a spreadsheet program, and add this string of text at the end of each link: #match-statistics;0
Finally copy the list of links and go back to import.io. Choose the feature Bulk Extract on the first API and paste the list of URLs.
It should work ;)
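If you would rather not edit the links by hand in a spreadsheet, the same step can be scripted. This is only a minimal sketch: it assumes the CSV has already been reduced to a plain list of match URLs, and the helper name `appendFragment` and the second sample URL are made up for illustration.

```javascript
// Hypothetical helper: append the statistics fragment to each match link.
function appendFragment(urls, fragment) {
  return urls
    .map((url) => url.trim())
    .filter((url) => url.length > 0)             // skip blank CSV rows
    .map((url) => url.split("#")[0] + fragment); // replace any existing fragment
}

const links = [
  "http://www.flashscore.com/match/IeHoEHvJ/",
  "http://www.flashscore.com/match/IeHoEHvJ/#match-summary", // illustrative only
];

// One URL per line, ready to paste into Bulk Extract.
console.log(appendFragment(links, "#match-statistics;0").join("\n"));
```

Both sample links come out as the same statistics URL, since any fragment already on a link is replaced rather than doubled.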

Related

Is there a way to create a link that will execute a YouTrack command such as clone issue?

The "Generate Issue Template URL" functionality is clunky, and I'm trying to work around it. I have a webpage outside of YouTrack with a list of links containing different templates for ticket writers. Any time a template changes, we have to click "generate issue template" and then update that link on our other web page. It would be nice to simply link to a template, by ID, that creates a new issue when saved, or to link directly to the clone command. The intent is that we won't have to update our template links going forward, and ticket writers will always get the latest version of the template they need.
Ideally it would be best if the entire call to YT could be in the href attribute of a link, but using AJAX is an option as well.
YT Version: 2021.3.22256
I've tried this, and a couple of variations, with no luck:
Template 1
In YouTrack there's no link you can pass a command into to get it executed.
What you can do is compose a workflow script that autofills the issue fields as required. The only remaining bit is some kind of trigger to start the script. For that you can still use the "Generate Issue Template URL" functionality with a single field, or any other marker, to let the script recognize the change it should react to.

Auto login to website using script or bookmark

I've been trying to figure this out using various different methods. I'm trying to create a script/bookmark or some type of quick action to open a browser tab or window with a specific URL, and automatically log me in using my credentials. I'm not all that concerned about security for this at the moment.
At first I figured I'd try to use a JavaScript bookmark to do this, but nothing I found in my research worked. Next I tried to create a bash script, but I couldn't figure out how to send the credentials in via the terminal. Most recently, I literally copied the source code of a site, created a local file, and tried to hack together something where I could prefill the form data with credentials and use JS to submit the form. I've gotten close with this, but for some reason when I use the JS submit function, it errors out and says that the username and password are invalid. When I turn off the submit function and manually click "log in" on my local HTML page, it works as expected. I want this to be a one-click process, so the idea of using onload/submit or something to that effect is really important to me.
The site I'm testing with has a Rails backend and my next attempt might be trying to use POST to do what I'm thinking, but that's currently outside of my level of knowledge on the subject.
Anyone answering: I do not want to use a password manager to accomplish this.
My requirement is that I will be able to either a) run a script or b) use a one-click option to do this per website. Ideally I'd be able to set this up in a programmatic way for multiple sites, but I'd be happy with one at the moment.
I know similar questions have been answered before, but I haven't been able to use information from those posts (the ones I've seen anyway) to figure out a good way to do this.
Create a bookmark for the current page you have opened.
Edit the bookmark
Change the value for the URL to something like this.
javascript:(function(){CODE_GOES_HERE_FROM_BELOW})();
Find the fields for username and password on the page.
An example for Hotmail:
var inputs = document.getElementsByTagName('input'); for (var i = 0; i < inputs.length; i++) { if (inputs[i].name === 'passwd') { inputs[i].value = 'YOUR_PASSWORD'; } else if (inputs[i].name === 'loginfmt') { inputs[i].value = 'YOUR_USERNAME'; } } document.getElementsByTagName('form')[0].submit();
OR
try out casperjs.
The proposed solution didn't work for me and rather than spending tons of time installing a testing framework that I'll never use other than for this purpose, I decided to try to do this another way.
First, I found out that the reason my JS wasn't working before is that the site did not allow a JS submit, or at least that's what it seemed to be when I got this error: "Synchronous XMLHttpRequest on the main thread is deprecated because of its detrimental effects to the end user's experience"
The javascript I was using was in fact working, just not submitting. I used the following code to fill the fields (using "Class Name" elements on the page since there was no name or ID):
document.getElementsByClassName('username')[0].setAttribute('value', 'user');
document.getElementsByClassName('password')[0].setAttribute('value', 'password');
As I mentioned, the problem was when I tried to submit the form from JavaScript: document.getElementsByClassName('loginForm')[0].submit();
That is when the above error cropped up. I can't say for sure whether it is the root cause: the page does submit, but I get an invalid username/password error when it does.
I haven't figured out a great way around this yet, but my short-term, "hacky" solution was to use AppleScript to send a Return keystroke to the browser to submit the form. Ideally I'd like to get the submission to work from JavaScript or jQuery, but I'm not sure how.
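One thing that might be worth trying, though I can't confirm it fixes this particular site: `setAttribute('value', ...)` only changes the HTML attribute (the field's default), while the `value` property is what the form actually sends, and `form.submit()` skips any submit handlers the page has registered. The sketch below sets the live values, fires an `input` event so page scripts can notice the change, and submits with the standard `requestSubmit()` when the browser has it. The class names are the ones from above; the function name and the `makeEvent` parameter are my own, and whether the page relies on input events or a submit handler is an assumption.

```javascript
// Sketch only: class names come from the post above; that the page listens
// for input events or a submit handler is an assumption.
function fillAndSubmit(doc, user, pass, makeEvent) {
  const setField = (className, text) => {
    const field = doc.getElementsByClassName(className)[0];
    field.value = text;                      // set the live value, not the attribute
    field.dispatchEvent(makeEvent("input")); // let page scripts see the change
  };
  setField("username", user);
  setField("password", pass);

  const form = doc.getElementsByClassName("loginForm")[0];
  if (typeof form.requestSubmit === "function") {
    form.requestSubmit(); // runs validation and fires the submit event
  } else {
    form.submit();        // fallback: submits without firing the submit event
  }
}

// In a bookmarklet or the browser console:
// fillAndSubmit(document, "user", "password",
//   (type) => new Event(type, { bubbles: true }));
```

The event factory is passed in rather than created inside the function only so the same code can be tried outside a browser with stand-in objects.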

manually edit JSON for extractor?

In import.io, is there any way to manually edit the JSON for the extractor? I can view it, but not edit it. I can't get manual row training to work on a page. I created a stripped-down version of the page, trained the rows, and can see the content of resultXPaths in the JSON, but there's no way to copy this to the extractor for the original page.
Silly me. All I had to do was make my own copy of the problem page, simplify the messy code so manual row training works, save the extractor, then point it at the original page.

Modifying photosphere on website thing

What I am trying to do is use a photosphere on my website so that it shows up full screen as the site's cover page. The problem is that the embed code Google gives here
https://developers.google.com/photo-sphere/web/
only lets the photosphere size be hardcoded, as in
displaysize="600,400"
Whatever the values, they are still hardcoded. What I want is for the viewer to adjust to the user's screen and fill the whole browser window. Does anyone have an idea how to pull this off? I didn't find anything about 'photosphere on web' other than the Google link I gave above.
Indeed the API is currently designed to take static values. I think it's a good point that users might want to set the dimensions to 100% and let it resize dynamically.
I put it on the TODO list and will try to get to it shortly.
In the meantime, one workaround is the following: after the viewer loads, you will find an iframe on the page which contains it. You can change its dimensions dynamically to your liking and the viewer should adapt.
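A rough sketch of that workaround, with some assumptions: it treats the first iframe on the page as the one the viewer created (check this in your own page), and the function names are made up for illustration.

```javascript
// Sketch: stretch the viewer's iframe to the browser window and keep it in sync.
// Assumption: the first <iframe> on the page is the one the viewer created.
function fitFrameToWindow(frame, win) {
  frame.style.width = win.innerWidth + "px";
  frame.style.height = win.innerHeight + "px";
}

function keepFrameFullscreen(doc, win) {
  const frame = doc.querySelector("iframe");
  if (!frame) return false; // viewer has not loaded yet
  fitFrameToWindow(frame, win);
  win.addEventListener("resize", () => fitFrameToWindow(frame, win));
  return true;
}

// In the page, after the viewer has loaded:
// keepFrameFullscreen(document, window);
```

The function returns false when no iframe is found, so it can be retried until the viewer has finished loading.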
The API provided by Google wraps the whole photosphere in layers of iframes.
You can use the API to request a certain photosphere but only use the response to parse it for the values you need. Then you create your own request and the result can be shown fullscreen.
An example link is this
I created this link dynamically from the JSON response from the elements
media$group media$content 0 url
Hope it helps.
Can't you take the raw image and just use webgl to project it on the inside of a sphere?

How do I get a Captcha off a website and display it in a picturebox using VB.NET?

In Visual Basic .NET is there a way to access a website/signup page and then get the Captcha and load it into a picturebox? How would I do it?
From your question, I can't tell if you are looking for a captcha plug-in or want to use the captcha from another site. If you're looking for a plug-in, try reCAPTCHA.
UPDATE
Pulling the captcha image off of a site could be done in two ways, but if the captcha rotation were done correctly, being able to pull it off would not do you any good.
One way would be to just right-click on the image and reference that URL in your code. However, as stated previously, this would not be that reliable. The service that generates the image would rotate, and the image URL would be different on every refresh. In other words, the copied URL would only be good for the one time you copied/captured it via right-click or whatever. If the URL did not rotate, then that would be a security issue for the site which is why the image source is different on each refresh.
Another way would be to make a direct request to the page, scrape the content for the captcha image's source, and pull the source from the parsed content. The code for this would be fairly specific per page, and, with my limited knowledge, I can't think of a way to make a generic application to do so.
I don't know why you would want to do what you are wanting to do, unless this is a homework assignment, or you are up to no good.
Depends on the captcha service the website uses.
If the site uses reCAPTCHA, you would probably need to look for the image tag that has id "recaptcha_challenge_image" and display that image tag in a web browser control.
Here is the demo page I found: http://www.google.com/recaptcha/demo/. If the captcha itself is in a frame (or iframe), you will need to check the code in the frame itself.
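To illustrate the "scrape the content for the image's source" idea, here is a small sketch. It is written in JavaScript rather than VB.NET, it assumes you already have the page's HTML as a string, and the sample markup and example.com URL are invented. A regular expression is a shortcut that only suits simple markup; a real HTML parser would be more robust.

```javascript
// Sketch: find the src of the <img> tag with a given id in fetched HTML.
function findImageSrcById(html, id) {
  const tags = html.match(/<img\b[^>]*>/gi) || [];
  for (const tag of tags) {
    const idMatch = tag.match(/\sid\s*=\s*["']([^"']*)["']/i);
    if (idMatch && idMatch[1] === id) {
      const srcMatch = tag.match(/\ssrc\s*=\s*["']([^"']*)["']/i);
      return srcMatch ? srcMatch[1] : null;
    }
  }
  return null; // no image with that id
}

// Invented sample markup; a real page will look different.
const sampleHtml =
  '<div><img id="recaptcha_challenge_image" src="https://example.com/challenge.png"></div>';
console.log(findImageSrcById(sampleHtml, "recaptcha_challenge_image"));
```

As noted above, the URL found this way is only good for the one page load it came from.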