Website data retrieval - API

A recent article has prompted me to pick up a project I have been working on for a while. I want to create a web service front end for a number of sites to allow automated completion of forms and data retrieval from the results, as well as from other areas of the site. I have achieved a degree of success using Selenium and custom code; however, I am looking to extend this to the point where adding additional sites is a trivial task (ideally one that doesn't even require a developer).
The Kapow web data server looks to achieve a lot of this; however, I am told it is quite expensive (currently awaiting a quote). Has anyone had experience with this, or can anyone suggest alternatives (ideally open source)?
Disclaimer: I realise the potential legal issues around automating data retrieval from third-party websites - this tool is designed to be used in a price comparison system, and all of the websites integrated with it will be included with the express permission of their owners. Where a site provides an API, that will clearly be the favoured approach.
Thanks

I realise it's been a while since I posted this; however, should anyone come across it, I have had a lot of success using the WSO2 framework (particularly the Mashup Server) for this. For data-mining tasks I have also used a Java library that it wraps - webharvest - which has achieved everything I needed.
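For anyone finding this later, the Java side of that setup is quite small. Below is a minimal, hedged sketch of driving webharvest from Java based on its classic 2.x API (class and method names may differ between versions, so treat this as an outline); the configuration file name and the searchTerm/result variable names are placeholders for illustration only.

    import org.webharvest.definition.ScraperConfiguration;
    import org.webharvest.runtime.Scraper;

    public class HarvestExample {
        public static void main(String[] args) throws Exception {
            // The XML configuration describes which pages to fetch and which
            // XPath/regex steps extract data from them (assumed file name).
            ScraperConfiguration config = new ScraperConfiguration("my-site.xml");

            // Second argument is the working directory for downloaded content.
            Scraper scraper = new Scraper(config, "work");

            // Values such as form inputs can be passed into the configuration
            // before it runs ("searchTerm" is a made-up variable name).
            scraper.addVariableToContext("searchTerm", "example");

            scraper.execute();

            // Values produced by the configuration can be read back out of the
            // context afterwards ("result" is likewise a placeholder name).
            Object result = scraper.getContext().getVar("result");
            System.out.println(result);
        }
    }

The XML configuration is where the real work happens: it describes which pages to fetch, which forms to submit and which XPath/regex expressions extract the data, which is what makes adding a new site mostly a configuration task rather than a development one.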

Performance testing of a Sitecore website

My client gave me a Sitecore website to do some performance testing on. I don't really have any experience with Sitecore websites or Sitecore itself (which I am working on now). I have some idea of how to performance test a website and have also got additional info from Stack Overflow, but I am curious to know whether there is any difference when testing a Sitecore website. What is the best practice for testing a Sitecore website? A little bit about the performance testing scope:
The website handles different kinds of enrollment paths for students. There are a couple of enrollment paths, all of which end with a payment made by the customer. More than one student can enroll at a time (e.g. six together). Performance testing will include enrollments for all of these paths.
Many customers may try to enroll at the same time, both in the same enrollment path and in different enrollment paths.
Also keep in mind that, since this is a customer-facing website, the images/text/files hosted in Sitecore should be shown on the website quickly.
Any help is appreciated. Thanks!
Typically there are three ways to come at performance testing for Sitecore.
The first is that it's basically just a web application, so most tools you'd use to test those are valid. Load testing tools like JMeter (or Windows equivalents) that simulate requests to pages and measure response times can give you an idea of how your Sitecore application behaves under load. The developer tools in browsers can also show you how long individual requests take and what resources are being downloaded. Both can help you form a picture of the site's overall performance levels (a rough code sketch of this first idea is shown after the third point below).
The second is that Sitecore includes some tools for measuring how hard Sitecore itself is working to render pages. The "Experience Editor" (the WYSIWYG view for editing web page content) has a "debug" mode which can tell you how many content items are being read to render a page, what UI components are being run and how long these things are taking. This can help you tweak how code queries Sitecore's databases, and how components are cached in order to increase performance.
Thirdly, any ASP.NET application can have low-level performance tracing done with standard .NET tools. Visual Studio's performance tracing tools, or third-party tools like dotTrace, can give you a detailed view of how long IIS spends working on individual pages and which parts of the code are taking the most time or memory.
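To illustrate the first point, here is a rough, hedged sketch in plain Java that fires a handful of concurrent requests at a page and prints response times. It is only meant to show the idea of simulating requests and measuring timings, not to replace JMeter or a proper load testing tool; the URL and the user/request counts are placeholders, and readAllBytes needs Java 9 or later.

    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class QuickLoadCheck {
        public static void main(String[] args) throws Exception {
            String target = "https://sitecore-site.example.com/";  // placeholder URL
            int users = 20;            // concurrent "users"
            int requestsPerUser = 10;  // requests each one makes

            ExecutorService pool = Executors.newFixedThreadPool(users);
            for (int u = 0; u < users; u++) {
                pool.submit(() -> {
                    for (int i = 0; i < requestsPerUser; i++) {
                        try {
                            long start = System.nanoTime();
                            HttpURLConnection conn =
                                    (HttpURLConnection) new URL(target).openConnection();
                            conn.getInputStream().readAllBytes();  // download the full response (Java 9+)
                            long ms = (System.nanoTime() - start) / 1_000_000;
                            System.out.println(Thread.currentThread().getName()
                                    + " -> " + conn.getResponseCode() + " in " + ms + " ms");
                            conn.disconnect();
                        } catch (Exception e) {
                            System.out.println("request failed: " + e);
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.MINUTES);
        }
    }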
I wrote up a user-group presentation I did on this topic a while back:
https://jermdavis.wordpress.com/2017/10/02/measure-if-you-want-to-go-faster/
and more recently I wrote about some general patterns you might see when doing low-level performance traces:
https://jermdavis.wordpress.com/2018/02/05/spotting-common-challenges-when-youre-doing-performance-tracing/
Sitecore is basically a .NET-based content management system, so performance testing it should not differ from performance testing any other web application; the same approach applies.
The best entry-level document I've seen on the web so far is Performance Testing Guidance for Web Applications; it quickly familiarises you with the concept of load testing, how to implement it, what metrics need to be considered, and so on.
With regard to the load testing tool, the most natural choice would be the Microsoft Visual Studio Load Testing Framework; however, it assumes you have a relevant license and some C# coding skills. If you don't have either, you can consider one of the free and open-source load testing tools.
While creating your script, keep in mind that each virtual user needs to represent a real user as closely as possible, so mind cookies, headers, cache, think times, the distribution of virtual user groups, and so on.
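To illustrate that point, here is a hedged sketch (plain Java rather than any specific load testing tool; the URLs and timings are placeholders) of a single virtual user that keeps cookies across requests, sends realistic headers and pauses for think time between pages, the way a real customer working through an enrollment path would:

    import java.net.CookieHandler;
    import java.net.CookieManager;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.concurrent.ThreadLocalRandom;

    public class VirtualUser {
        public static void main(String[] args) throws Exception {
            // Keep cookies across all requests in this session, like a real browser.
            CookieHandler.setDefault(new CookieManager());

            // Placeholder journey: the pages one enrollment path might touch.
            String[] journey = {
                    "https://example.com/",
                    "https://example.com/enrollment/start",
                    "https://example.com/enrollment/payment"
            };

            for (String page : journey) {
                HttpURLConnection conn = (HttpURLConnection) new URL(page).openConnection();
                conn.setRequestProperty("User-Agent", "Mozilla/5.0 (load test)");
                conn.setRequestProperty("Accept", "text/html");

                long start = System.nanoTime();
                conn.getInputStream().readAllBytes();  // Java 9+
                long ms = (System.nanoTime() - start) / 1_000_000;
                System.out.println(page + " -> " + conn.getResponseCode() + " in " + ms + " ms");

                // Think time: real users pause between pages (here 2-8 seconds).
                Thread.sleep(ThreadLocalRandom.current().nextLong(2000, 8000));
            }
        }
    }

Running many of these in parallel (one thread per virtual user, split across the different enrollment paths) gives a closer approximation of the concurrent enrollments described in the question than hammering a single URL would.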

Automatically triaging and/or resolving Jira tickets using expert systems?

Has anyone looked into auto-triaging and/or auto-resolving Jira tickets using expert systems such as LogicNets?
The goal is not to completely automate all tickets, just to reduce the workload by auto-resolving simple tickets and to enable support engineers to focus on more complicated issues.
The most flexible way is to use the REST API; see https://docs.atlassian.com/jira/REST/latest/
If your tool can do some sort of GET or POST on a trigger, or can be programmatically extended, then you can do almost everything in Jira. You can even use shell scripts with curl to make the REST calls in a Linux/Mac terminal.
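As a concrete illustration, auto-resolving a ticket through the REST API boils down to posting a workflow transition. Below is a hedged Java sketch; the host, credentials, issue key and transition id are placeholders (transition ids depend on your workflow, and a GET on the same URL lists the transitions currently available for an issue), and setting the resolution field only works if the transition screen includes it.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class JiraAutoResolve {
        public static void main(String[] args) throws Exception {
            String base = "https://jira.example.com";   // placeholder host
            String issue = "SUP-123";                    // placeholder issue key
            String auth = Base64.getEncoder()
                    .encodeToString("bot-user:secret".getBytes(StandardCharsets.UTF_8));

            // Transition ids are workflow-specific; a GET on this same URL lists
            // the transitions currently available for the issue.
            String body = "{\"transition\": {\"id\": \"5\"},"
                    + " \"fields\": {\"resolution\": {\"name\": \"Done\"}}}";

            URL url = new URL(base + "/rest/api/2/issue/" + issue + "/transitions");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Authorization", "Basic " + auth);
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
            // 204 means the transition (and therefore the resolution) was applied.
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }

An expert system that can classify a ticket could call something like this as its final "resolve" action, while anything it cannot classify stays in the queue for a support engineer.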

How to access results of Sonar metrics for use with applications like PowerPivot

I'm trying to run a number of applications with known failure rates through Sonar, with the hope of deciding which metrics are most valuable in determining whether a particular application will fail. Ultimately I'll be making some sort of algorithm that will look at the outputs of whatever metrics I'm using and generate a score from 1 to 100. I've got about 21 applications put through Sonar, and the results have been stored in a MySQL database. I originally planned to use PowerPivot to find relationships in the data, but it seems like the formatting of the tables doesn't lend itself well to that. Other questions on Stack Overflow have told me that Sonar's tables are unformatted and that I should instead use the web service API to get the information. I'm unfamiliar with the API and was unsuccessful in trying to do what I wanted by looking at Sonar's API documentation.
From an answer to another question:
http://nemo.sonarsource.org/api/timemachine?resource=org.apache.cxf:cxf&format=csv&metrics=ncloc,violations_density,comment_lines_density,public_documented_api_density,duplicated_lines_density,blocker_violations,critical_violations,major_violations,minor_violations
This looks very similar to what I'd like to have, except that I'm only looking at each application once (I'm analyzing a sample of all the live applications on a grid), which means the timemachine service isn't really what I'm looking for. Would it be possible to generate a similar table, except that instead of showing the stats for a particular application per date, it showed the statistics for an application and all of its classes, etc.?
If you're not familiar with the WS API, you can also create your own Sonar plugin to achieve whatever you want: it is written in Java and it will execute on every analysis you run. This way, in the code of this custom plugin, you can do whatever you want: write the metrics you need to an output file, push them into a third-party system, etc.
Just take a look at how to write a plugin (most probably you will create a Decorator). There are also concrete examples to help you get started faster.
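For orientation, here is a hedged sketch of what such a Decorator can look like, based on the old 3.x/4.x batch plugin API in which Decorators exist (exact class and method names may differ in your Sonar version, and the class still needs to be registered in a plugin descriptor to be picked up). It simply prints one CSV-style row per analysed resource, which is roughly the flat "application and all of its classes" table asked about above.

    import org.sonar.api.batch.Decorator;
    import org.sonar.api.batch.DecoratorContext;
    import org.sonar.api.measures.CoreMetrics;
    import org.sonar.api.measures.Measure;
    import org.sonar.api.resources.Project;
    import org.sonar.api.resources.Resource;

    public class MetricExportDecorator implements Decorator {

        @Override
        public boolean shouldExecuteOnProject(Project project) {
            return true;  // run on every analysed project
        }

        @Override
        public void decorate(Resource resource, DecoratorContext context) {
            // Called for every resource in the analysis (project, package, file, ...),
            // so this produces one row per class/file as well as for the project itself.
            Measure ncloc = context.getMeasure(CoreMetrics.NCLOC);
            Measure violations = context.getMeasure(CoreMetrics.VIOLATIONS);
            if (ncloc != null && violations != null) {
                System.out.println(resource.getKey() + ";" + ncloc.getValue()
                        + ";" + violations.getValue());
            }
        }
    }

Swapping System.out for a file writer would give you a CSV you can load straight into PowerPivot.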

Dropbox API - Using Dropbox as a server

I want to use a file-sharing service to keep certain files up to date and consistent across multiple instances of my application running on multiple computers - like, for example, a multiplayer game that stores all the players' positions in a text file and uses something like Dropbox to keep that file consistent across all the applications: each application instance updates the file with its own player's position, and the rest of the applications then update accordingly. This is only an example and is not what I intend to do with this technology. What I want to do does not rely on sharing data very quickly - only on periodically downloading and updating the text file.
I was wondering how I might be able to do this using the Dropbox API for Objective-C without prompting the user for any Dropbox username/password - just store a single Dropbox account's login information, log into it automatically and update/download the file stored on it?
From what I have found out from experimenting, Dropbox prompts users for their passwords via a web browser, and is designed to accommodate multiple accounts, whereas I only need to accommodate the 'server' account.
So, is there any way to do this sort of thing using the Dropbox API, or should I use something else? Or do I need to figure out how to write my own server? Using some sort of file-sharing API seems a lot easier to me than writing an actual server.
Thanks for any help,
Ben
You might think about using Google App Engine (GAE). I had a similar requirement recently and I'm thinking this is a good option when you want centralized data. Plus you can do the no-browser account login by using your own custom authentication, or I think it's even possible via OAuth? Depends on how sensitive the data is I guess. I just rolled my own.
From my research, I found that using Dropbox as a server has some scalability issues, since you'll be limited to maybe 5,000 calls per day (source). It's built on Amazon S3, so you could also look at using that directly.
GAE lifts that limit to 675,000 calls per day, which can be increased up to 91 million for free.
https://developers.google.com/appengine/docs/quotas
I did find an open-source project for doing this with Java; alternatively, you could look at a Python example.
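If you do go directly to S3 as mentioned above, the shared state file maps onto it quite naturally: every application instance reads and writes one object using a single set of credentials. Here is a hedged sketch with the AWS SDK for Java (the bucket name and key are placeholders, and credentials are assumed to come from the SDK's default provider chain):

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;

    public class SharedStateFile {
        private static final String BUCKET = "my-game-state";  // placeholder bucket
        private static final String KEY = "positions.txt";     // placeholder object key

        public static void main(String[] args) {
            // Credentials come from the SDK's default provider chain
            // (environment variables, ~/.aws/credentials, etc.).
            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

            // Download the current shared state.
            String current = s3.getObjectAsString(BUCKET, KEY);
            System.out.println("current state: " + current);

            // Upload this instance's update.
            s3.putObject(BUCKET, KEY, current + "\nplayer1,10,20");
        }
    }

Note that this is last-writer-wins with no locking, which is fine for the slow, periodic updates described in the question but not for anything latency-sensitive.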
I've written a daemon that continuously checks for updated files and syncs them. I wrote it for my own file manager iOS app. You can find the implementation here:
https://github.com/H2CO3/MyFile/tree/master/DropboxDaemon
I'm personally not an iOS developer but I came across this question while looking for something else and thought I would offer up another potential solution to the OP's question.
Microsoft just released something called Azure Mobile Services which supports iOS development (among other platforms). It's basically a convenient way to set up a back end system complete with push notifications, authentication, etc. without rolling your own. You don't need to know anything about Azure or servers as the setup process walks you through most of it. It is new so keep that in mind, but it looks promising for situations like this.
Here's a 10 minute video explaining how to use it with an iOS developed app along with links to more documentation:
http://channel9.msdn.com/posts/iOS-Support-in-Windows-Azure-Mobile-Services/
Hope this helps.

How to set up a development environment for YQL and open table development? How to test it locally? (best practice)

I develop -- from time to time -- Yahoo Open Tables to access different resources on the web. Currently I am using a JavaScript editor, and -- when I want to test whether my open table works -- I upload the XML table description to a server and test it with a YQL client application. However, this approach is quite slow, and sometimes I get blocked by Yahoo because of a mistake in my open table description. Therefore I would like to learn about best practices for developing and testing Yahoo Open Tables locally. What does your setup for open table development look like?
To clarify my question: I am looking for any convenient way (best practice) to develop and test YQL tables, e.g. running part of the JavaScript inside Rhino.
First of all: I agree - I don't see a really convenient way to test YQL datatable definitions locally either. Nevertheless, here is how I approach this issue.
Hosting on GitHub
YQL datatable definitions are often used in very open scenarios, e.g. when there is an existing API that you want to wrap via YQL. Therefore I normally work on a fork of the YQL community tables and just add my own definitions there. The hosting of the .xml files then takes place on GitHub: https://github.com/yql/yql-tables
The other advantage of this approach is that it is easy for me to share my datatables with the community if I feel they might be valuable for others as well.
Hosting privately
The free GitHub account only comes with public repositories though, so everybody would be able to see and use your datatables. If that is not acceptable for you, you could either pay for a GitHub account with private repositories, or host the datatable definitions yourself.
To do that you can upload them to your own server - as you are already doing - or you should also be able to set up a web server like Apache locally on your machine and then get a dynamic hostname from dyndns.com or similar, so that you can point to these definitions from YQL. I have not tried this because GitHub was working sufficiently well for me, but I am sure it is possible.
Why don't you just put the file you are editing in a public Dropbox folder? This is what I do and it works pretty well.
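On the local-testing part of the original question (running parts of the table's JavaScript inside Rhino): one approach is to pull the JavaScript out of the table's <execute> block and evaluate it in an embedded Rhino interpreter from Java. This is only a hedged sketch; the file name is a placeholder, and the YQL-specific globals (y, request, response, ...) do not exist in plain Rhino, so you have to stub out whichever ones your script touches.

    import org.mozilla.javascript.Context;
    import org.mozilla.javascript.Scriptable;

    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class RhinoTableTest {
        public static void main(String[] args) throws Exception {
            // JavaScript copied out of the table's <execute> block
            // (the file name is a placeholder).
            String js = new String(Files.readAllBytes(Paths.get("mytable-execute.js")));

            Context cx = Context.enter();
            try {
                Scriptable scope = cx.initStandardObjects();

                // Stub out the YQL globals the script uses; plain Rhino knows
                // nothing about y, request or response.
                cx.evaluateString(scope,
                        "var response = {}; var y = { log: function(m) {} };",
                        "stubs", 1, null);

                // Run the table's JavaScript against the stubs.
                cx.evaluateString(scope, js, "mytable-execute.js", 1, null);

                // Inspect whatever the script assigned to response.
                Object result = scope.get("response", scope);
                System.out.println("response after execution: " + Context.toString(result));
            } finally {
                Context.exit();
            }
        }
    }

This only exercises the pure logic of the script, not YQL's networking or caching, but it gives a fast edit-run loop without uploading anything or risking getting blocked by Yahoo for a typo.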