Scrape dynamic websites using dart in flutter app - selenium

I have a website that generates a list of items using JavaScript, and I am trying to scrape it inside my Flutter app using the Beautiful Soup package for Dart. The problem is that I am unable to scrape the dynamic data generated by the JavaScript. I want a solution that lets me grab the source code of the page after it has fully loaded inside the app. A hidden webview inside the app would be perfect, but what is blocking me is how to get the data after the webview loads. This is my main concern. Code examples would be appreciated, and better practices are welcome.

What I have come to realize is that scraping dynamic websites that rely on JavaScript, or websites where a script has to click through the page before it can be scraped properly, is not possible on Flutter mobile. What you should do is move the scraping to the cloud by creating your own API, then use this API to return the response to your app. This makes scraping easier, since you will not have to update your app for every error you find in your scripts. Also, imagine that the website you target changes every week: you would have to update your app every week and wait for approval from every store you publish to. A simple example would be using Cloud Functions from Firebase in combination with JavaScript and the puppeteer package. A simple video tutorial is here: Tutorial on YouTube
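A minimal sketch of what the cloud-side scraper might look like, assuming puppeteer is available on the server. The function name, the injected launchBrowser parameter, and the selector argument are illustrative and not from the tutorial; only launch, newPage, goto, $$eval and close are puppeteer calls.

```javascript
// Sketch of a server-side scraper. `launchBrowser` is injected so the function
// can be exercised without a real browser; in production it could be
// `() => require('puppeteer').launch()` (assumes puppeteer is installed).
async function scrapeItemTexts(launchBrowser, url, selector) {
  const browser = await launchBrowser();
  try {
    const page = await browser.newPage();
    // Wait until network activity settles so script-generated content exists.
    await page.goto(url, { waitUntil: 'networkidle0' });
    // Runs in the page context and returns the text of every matching element.
    return await page.$$eval(selector, (nodes) =>
      nodes.map((node) => node.textContent.trim())
    );
  } finally {
    // Always release the browser, even if the page failed to load.
    await browser.close();
  }
}
```

Inside an HTTP-triggered function, the handler would await scrapeItemTexts(...) and send the array back as JSON, so the app only ever talks to your own API.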

After a lot of research I did indeed find a way.
Basically, you load a hidden webview, scrape the data off it, then show the data on screen. Here's how.
Spinning up a webview in the UI
The Visibility widget and the width/height properties make sure the user cannot see the webview. I suggest showing a loading screen until the data is scraped.
Visibility(
  visible: false,
  maintainState: true,
  child: Container(
    height: 1,
    width: 1,
    child: WebViewPlus(
      onWebViewCreated: (controller) async {
        log.e("onWebViewCreated");
        await model.onWebViewCreated(controller);
      },
      onPageFinished: (url) async {
        log.e("onPageFinished");
        await model.onPageFinished(url);
      },
      javascriptMode: JavascriptMode.unrestricted,
    ),
  ),
),
The actual scraping
// Imports this snippet relies on (json.decode and the HTML parse function)
import 'dart:convert';
import 'package:html/parser.dart';

onWebViewCreated(controller) async {
  this.webViewController = controller;
  // Start loading the URL
  await controller.loadUrl("<Your Website URL Here>", headers: _apiService.getAuthHeader());
  // Note: the page's scripts may not have finished running at this point
  // Get the HTML of the webpage; it comes back as a JSON-encoded string
  String docu = await webViewController?.webViewController.evaluateJavascript('document.documentElement.innerHTML') as String;
  // Decode the JSON-encoded string back into plain HTML
  var html = json.decode(docu);
  // Parse the HTML into a DOM to actually access the elements
  var dom = parse(html);
  // Some logic I needed in my application by scraping
  for (var child in dom.getElementById("autodl-log-tbody")!.children) {
    feed.add(child.text);
  }
}
Pro Tip: If you think the webpage might need a bit more time to load, you can stall the execution of the function by using await Future.delayed(Duration(seconds: 5)); after the line that loads the URL and before the line that reads the HTML in the onWebViewCreated() function.
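As a possible variation on the JavaScript side, the string passed to evaluateJavascript does not have to return the whole document. The sketch below is an assumption, not part of the answer above: it pulls the row texts out inside the page and returns null while the element is still missing, so the caller can retry instead of guessing a delay. The function name is hypothetical and the element id is taken from the answer.

```javascript
// Runs inside the page. `doc` is the page's `document`; it is a parameter here
// only so the function can be tried outside a browser.
function collectRowText(doc, containerId) {
  const container = doc.getElementById(containerId);
  // Return null while the script-generated element does not exist yet,
  // so the caller can tell "not ready" apart from "ready but empty".
  if (!container) return null;
  return Array.from(container.children).map((child) => child.textContent);
}

// In the webview, the evaluated expression could be:
//   JSON.stringify(collectRowText(document, 'autodl-log-tbody'))
```

On the Dart side the result would then be decoded with json.decode into a list (or null), which replaces the HTML parsing step.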

Related

Load different page template for mobile in nuxt

I'm working on a larger project and need to create a separate UX for mobile users on some pages. Using a responsive layout with CSS won't cut it, and dynamic component rendering with v-if results in a horrifying template.
This answer is the closest that I have come to, but I want to avoid manually defining routes as there are a ton of pages.
I am currently using a middleware to redirect based on a user agent check:
export default function(context) {
  if (context.isMobile) {
    if (context.route.fullPath.indexOf('/m') !== 0) {
      return context.redirect('/m' + context.route.fullPath)
    }
  }
  if (context.isDesktop) {
    if (context.route.fullPath.indexOf('/m') === 0) {
      return context.redirect(context.route.fullPath.substring(2))
    }
  }
}
but I don't have a way of telling whether there is a mobile version or not, so if there isn't, the error page will be displayed.
I also tried working with this answer but using nuxt-device-detect instead of breakpoints, but since the router is configured before getting in the browser, the check function will return the fallback option, so it didn't work well for me. Also since I'll be using SSR I'm avoiding things like document.documentElement.clientWidth.
I guess in short, my question is: what is the best practice for serving separate pages to mobile users?
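One possible direction, sketched under assumptions rather than offered as a best practice: keep the middleware, but only redirect when the mobile route actually exists. The helper below is hypothetical. It takes a hasRoute predicate, which with Vue Router 3 (Nuxt 2) could be backed by path => context.app.router.resolve(path).route.matched.length > 0. It also matches '/m' as a whole path segment, so a page such as '/menu' is not mistaken for a mobile page.

```javascript
// Decide where to redirect, or return null to stay on the current page.
// `device` carries the isMobile / isDesktop flags from the middleware context.
// `hasRoute` is a hypothetical predicate reporting whether a path has a page.
function mobileRedirectTarget(fullPath, device, hasRoute) {
  const onMobilePath = fullPath === '/m' || fullPath.startsWith('/m/');
  if (device.isMobile && !onMobilePath) {
    const target = '/m' + fullPath;
    // Fall back to the desktop page when no mobile version exists.
    return hasRoute(target) ? target : null;
  }
  if (device.isDesktop && onMobilePath) {
    // Strip the '/m' prefix; '/m' itself maps to the site root.
    return fullPath.substring(2) || '/';
  }
  return null;
}
```

The middleware would then call context.redirect(target) only when the helper returns a non-null target.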

node express multer fast-csv pug file upload

I am trying to upload a file using pug, multer and express.
The pug form looks like this
form(method='POST' enctype="multipart/form-data")
  div.form-group
    input#uploaddata.form-control(type='file', name='uploaddata')
  br
  button.btn.btn-primary(type='submit' name='uploaddata') Upload
The server code looks like this (taken out of context)
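.post('/uploaddata', function(req, res, next) {
  // upload.single() returns a middleware; call it with req and res to run it
  upload.single('uploaddata')(req, res, function(err) {
    if (err) {
      // Hand the error to express instead of throwing inside a callback
      return next(err);
    } else {
      res.json({success : "File upload sucessfully.", status : 200});
    }
  });
})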
My issue is that while the file uploads successfully, the success message is not shown on the same page, i.e., a new page is loaded showing
{success : "File upload sucessfully.", status : 200}
As an example for other elements (link clicks) the message is displayed via such javascript:
$("#importdata").on('click', function(){
  $.get( "/import", function( data ) {
    $("#message").show().html(data['success']);
  });
});
I tried a pure JavaScript approach to work around the default form behaviour, but had no luck.
Your issue has to do with mixing form submissions and AJAX concepts. To be specific, you are submitting a form then returning a value appropriate to an AJAX API. You need to choose one or the other for this to work properly.
If you choose to submit this as a form, you can't use res.json; you need to switch to res.render or res.redirect instead to render the page again. You are seeing exactly what you are telling node/express to do with res.json: JSON output. Rendering or redirecting is what you want to do here.
Here is the MDN primer on forms and also a tutorial specific to express.js.
Alternatively, if you choose to handle this with an AJAX API, you need to use jquery, fetch, axios, or similar in the browser to send the request and handle the response. This won't cause the page to reload, but you do need to handle the response somehow and modify the page, otherwise the user will just sit there wondering what has happened.
MDN has a great primer on AJAX that will help you get started there. If you are going down this path also make sure you read up on restful API design.
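As a rough sketch of the AJAX route (one option among several, not the only way to do it), the browser can post the form data with fetch and show the message itself. The function name and the injectable fetchFn parameter are illustrative. The '/uploaddata' path, the success key and the #message element come from the question.

```javascript
// Browser-side sketch. `fetchFn` defaults to the global fetch and is a
// parameter only so the function can be tried without a network.
async function uploadForm(formData, showMessage, fetchFn = fetch) {
  // Posting with fetch instead of a native submit keeps the current page.
  const response = await fetchFn('/uploaddata', { method: 'POST', body: formData });
  if (!response.ok) {
    showMessage('Upload failed with status ' + response.status);
    return;
  }
  const data = await response.json();
  // `success` is the key the server code in the question sends back.
  showMessage(data.success);
}

// Possible wiring in the page:
//   document.querySelector('form').addEventListener('submit', function (event) {
//     event.preventDefault(); // stop the default full-page form submission
//     uploadForm(new FormData(event.target), function (text) {
//       $('#message').show().html(text);
//     });
//   });
```

With this in place the server can keep returning res.json, because the response is consumed by script rather than rendered by the browser.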
Neither one is inherently a better strategy; both methods are used in large-scale production applications. However, you do need to choose one or the other and not mix them as you have above.

Node.js Response From Image Upload Without Refreshing Client Page

Problem Set: Client .posts image from form action='/pages/contact/image/something' to node.js and I .putItem to AWS S3. On the success response I would like to send the image url back to the client without refreshing the screen and update the location they wanted to add the image with the new src url.
If the page refreshes I lose the location where they wanted to upload the image. I am open to any suggestions. I have looked at res.send, res.end, res.json, res.jsonp, and res.send(callback), all of which overwrite (refresh) the client webpage with the array, text, or whatever content I pass back to the client. Code below:
myrouter.route('/Pages/:Page/Image/:Purpose')
  .post(function (req, res) {
    controller.addImageToS3(req, res)
      .then(function(imgurl){
        //res.json({imgurl : imgurl});
        //res.send(imgurl);
        //res.end(imgurl);
        //res.send(req.query.callback(imgUploadResponse(imgurl)))
        <response mechanism here>
        console.log('Image Upload Complete');
      }, function (err){
        res.render('Admin/EditPages', {
          apiData : apiData,
          PageId : PageId
        });
      });
  });
Ideally there could be a passed parameter to a javascript function that I could then use: Example:
function imgUploadResponse(imgurl){
  // Do something with the url
}
You, as a developer, have full control over the s3 url format. It follows a straightforward convention:
s3-region.amazonaws.com/your-bucket-name/your-object-name.
For example:
https://s3-us-west-2.amazonaws.com/some-random-bucket-name/image.jpg
While I would recommend keeping those details in the back-end, if you really want to avoid using res.send, you can basically make the front-end aware of the url formatting convention and present the url to the user, even before the image was actually uploaded (just need to append the name of the image to the s3-region.amazonaws.com/your-bucket-name)
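To make the convention concrete, here is a small sketch of a front-end helper that builds the URL. The function name is made up, and it assumes the bucket really uses the path-style format quoted above; buckets configured differently may expose other URL formats.

```javascript
// Build an S3 object URL from the convention described above.
// Assumes the path-style format: s3-<region>.amazonaws.com/<bucket>/<object>.
function buildS3Url(region, bucket, objectName) {
  // Encode each path segment of the object name but keep "/" separators.
  const encodedName = objectName.split('/').map(encodeURIComponent).join('/');
  return 'https://s3-' + region + '.amazonaws.com/' + bucket + '/' + encodedName;
}
```

For instance, buildS3Url('us-west-2', 'some-random-bucket-name', 'image.jpg') yields the example URL shown above.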
Also, I'm not sure why your page would refresh. There are ways to refresh content of your page without refreshing the whole page, the most basic being AJAX. Frameworks like angular provide you with promises that allow you to do back-end calls from the front-end.
Hope this helps!

Communication Between WebView and WebPage - Titanium Studio

I am working on a mobile project (using Titanium Studio) in which I have the situation below:
1) My mobile app contacts a Rails backend to check some data, say the validity of a user id.
2) I found a way to load web pages in the mobile app, i.e., WebView.
3) I was able to load the desired URL, e.g. http://www.mydomain.com/checkuser?uid=20121, which returns data like status:success.
But I need to read this data to show whether the response from the server is a success or a failure. How do I achieve this?
NOTE: The scenario above is just a use case. What actually happens is that I load a third-party URL in the WebView, and when the user enters the data and submits, the result is posted back to my website URL.
EDIT : So the process is like below
1) WebView loaded with third party url like http://www.anyapiprovider.com/processdata
2) User will enter set of data in this web page and submits the page
3) The submitted data will be processed by the apiprovider and it returns data to my web page say http://www.mydomain.com/recievedata
This is the reason why i am not directly using GET using HTTPClient
FYI: I tried to fire Ti.App events right from the actual web page, as suggested by a few articles, but most of them say this works only if the loaded file is local, not remote. Reference Link
Please suggest how my approach could be improved.
Thanks
If you don't want to follow Josiah's advice, then take a look at the Titanium docs on how to add a load event listener with webview.addEventListener('load',... and use webview.evalJS() to inject your own code into the third-party HTML.
Maybe you can inject code to trap the submit event and fire a Ti event to trigger the downloading of data from your website.
Communication Between WebViews and Titanium - Remote Web Content Section
I found a solution for my problem
1) Load the http://www.mydomain.com/checkuser?uid=20121 in a webview
2) Let user enter and submit data to third party url
3) Receive the response from the third-party URL and print only <div id="result">status:success</div> in the http://www.mydomain.com/recievedata page.
4) Add event listener for the web view as follows
webView.addEventListener('load', function(data)
{
  //Add condition to check if the loaded web page has any div with id = result (to check if this is /recievedata page)
  alert(webView.evalJS("document.getElementById('result').innerHTML"));
});
The alert above prints the result status:success. Read it in the webview load event and act on it accordingly.
It works fine for me.
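The condition mentioned in the comment above could look roughly like the sketch below. It is an illustration, not the exact code I used: readResult and parseStatus are made-up names, and the only Titanium call involved is webView.evalJS(), as in the listener above.

```javascript
// `webView` only needs an evalJS(string) method, which is how the listener
// above reads the page; the guard and the parser are illustrative additions.
function readResult(webView) {
  // Guard against pages with no #result element (any page other than /recievedata).
  const text = webView.evalJS(
    "document.getElementById('result') ? document.getElementById('result').innerHTML : ''"
  );
  return parseStatus(text);
}

// Split a "key:value" string such as "status:success" into its two parts.
function parseStatus(text) {
  if (!text) return null;
  const separator = text.indexOf(':');
  if (separator === -1) return null;
  return {
    key: text.slice(0, separator).trim(),
    value: text.slice(separator + 1).trim(),
  };
}
```

Inside the load listener, a null result means the page is not the result page yet, and a value of 'success' means the check passed.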
Instead of loading it in a WebView, why not just GET it using an HTTP client? This is much cleaner and more standards-based:
var xhr_get = Ti.Network.createHTTPClient({
  onload : function(e) {
    // Here is your "status:success" string
    var returnValue = this.responseText;
  },
  onerror : function(e) {
    Ti.API.info(this.responseText);
    Ti.API.info('CheckUserProgressOnActivity webservice failed with message : ' + e.error);
  }
});
xhr_get.open('GET', 'http://www.mydomain.com/checkuser?uid=20121');
xhr_get.send();

Using google places api with jQuery autocomplete

I'm using google places and jquery to achieve the goal of once the user starts typing in an input field, it does a call to google places and feeds the results in a dropdown (via jquery ui autocomplete)
My problem is, in my autocomplete function I have
source: function( request, response ) {
  initialize()
}
In there, I'm trying to call this function
function initialize() {
  service.search(request, callback);
}
Which works fine... but the problem is... initialize does a call out to the function callback()... so I'm not sure how to listen to see when the callback is done.
So for example, what would I do here:
source: function( request, response ) {
  // need code here to know when initialize and callback are done and are sending me the list of results from google ?
}
I'm just not sure how to wait for google places to get done, before I use $.map from the results to produce the dropdown.
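For illustration, one common way to handle this kind of callback (a sketch under assumptions, not a confirmed fix for this exact setup) is to not wait at all: call the autocomplete's response function from inside the Places callback. Here search is a hypothetical wrapper around the service.search call from the question, and its callback is assumed to receive (results, status) with status 'OK' on success and each result carrying a name.

```javascript
// Build a jQuery UI autocomplete `source` function around an async search.
// `search(term, callback)` is a hypothetical wrapper for the Places call;
// the callback is assumed to receive (results, status).
function makeSource(search) {
  return function source(request, response) {
    search(request.term, function (results, status) {
      if (status !== 'OK' || !results) {
        // Always answer, even on failure, so the widget leaves its pending state.
        response([]);
        return;
      }
      // Map each place to the { label, value } shape the dropdown expects.
      response(results.map(function (place) {
        return { label: place.name, value: place.name };
      }));
    });
  };
}
```

The widget would then be set up with source: makeSource(mySearch), where mySearch builds the Places request from the typed term.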
Timing issues with google apis? I feel your pain. But to sidestep your issue and maybe save you some pain, you could use Google's prebuilt solution:
http://code.google.com/apis/maps/documentation/javascript/places.html#places_autocomplete
Apologies if you have some reason to not use their API.