Retrieving dynamic text from a website in vb.net (VS2008) - vb.net

I want to be able to retrieve dynamic data (share prices) from a web page. I started out by retrieving the HTML code before realising that, because the data is live, the raw HTML will be of little use. Although I am ultimately looking to capture specific data, all I want for now is to process a web page that I specify and get back the text of that page rather than its HTML source. Basically, a copy and paste of the entire page would be great.
Any ideas would be really appreciated!

'Screen scraping' by parsing HTML is so early 2000s... what I would do is read up on Amazon's Mechanical Turk. You can develop a queued architecture where you submit URLs to this Mechanical Turk service. The service would automatically distribute these bits of work to users, who would then do the dirty task of copying and pasting out the valuable stock quote information you require. Users around the world would anxiously await delivery of the next URL to their Mechanical Turk inbox... pining for the opportunity to copy/paste out another share price for your application. Sure, it might take a few minutes to update your prices, but hey, they would be HAND parsed by REAL people around the globe! Just think of the possibilities!

Well, the HTML contains the text of the website, so you "just" need to parse the HTML.
EDIT: If the data is not in the HTML but is loaded dynamically, the situation is different. As far as I can see, you have two options:
Find out how the data is loaded (i.e. read the JavaScript on the page). If it is updated via some web service, you could query the same web service from your program.
Use a web browser control to load the page and then read the dynamic HTML tree of the rendered page. Maybe the WPF WebBrowser control can help you with this, but I'm not sure since I've never done it myself.
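If you go the browser route, here is a minimal sketch using the Windows Forms WebBrowser control (the URL is a placeholder; pages that keep updating themselves via AJAX may still need polling or a delay after DocumentCompleted):

Imports System.Windows.Forms

Public Class ScraperForm
    Inherits Form

    Private WithEvents browser As New WebBrowser()

    Public Sub New()
        browser.ScriptErrorsSuppressed = True
        Controls.Add(browser)
        browser.Navigate("http://example.com/quotes") ' placeholder URL
    End Sub

    ' Fires once the page (including script-generated content) has loaded;
    ' InnerText returns the visible text without the HTML tags.
    Private Sub OnLoaded(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs) Handles browser.DocumentCompleted
        Dim pageText As String = browser.Document.Body.InnerText
        Console.WriteLine(pageText)
    End Sub
End Class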

Is it possible to find this same data provided in a ready-to-consume format rather than scraping HTML for it? It seems likely that there are public web services for stock quotes.
For example: a quick search for "stock price webservice" turned up http://www.webservicex.net/stockquote.asmx, an ASMX web service that is easy to consume in .NET.
In your Visual Studio project you can add a reference to this service via the "Add Web Reference" command; the dialog you're given varies depending on whether your project targets .NET 2.0 or .NET 3.0/3.5.
I added a reference to the service named StockPriceProxy:
Public Function GetQuote(ByVal symbol As String) As String
    Using quoteService As New StockPriceProxy.StockQuote
        Return quoteService.GetQuote(symbol)
    End Using
End Function
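For example, you could call the wrapper like this (the ticker symbol is just an illustration, and the exact format of the returned string depends on the service):

Dim quote As String = GetQuote("MSFT")
Console.WriteLine(quote)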

Related

How to get the Webtop DRL of a document via a .net application?

Is it possible to retrieve the DRL (e.g. https://host:port/ewebtop/drl/objectId/0900a58e80970f7b) of a document via a .net application? So that when users click this link they are able to edit the document, and when they close it the document is automatically saved back to Documentum.
First of all: a link is a link. What you decide to do with it is up to you. The default handler in the browser will just redirect you to the webtop application. If you have SSO you can have the document opened for edit. There are some extra arguments that can be provided (view/edit).
The object id is the only varying part of the URL, so you can easily construct this in code.
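For example, a minimal VB.NET sketch (host, port, and the object id are placeholders for your webtop installation):

Dim objectId As String = "0900a58e80970f7b"
Dim drl As String = String.Format("https://host:port/ewebtop/drl/objectId/{0}", objectId)
Process.Start(drl) ' opens the link in the user's default browser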
Secondly: what is your goal? There is no way to make the document upload itself into the Documentum repository. You could write a plugin for every application to handle that, but it seems like a big task, especially dealing with security.
The problem is that upon check-in, the user must provide some information - at least the new version number...
If you're building a thick client in .NET, I would go with DFS (Documentum Foundation Services) - that's the only real option here.

Displaying documents SP2013 SharePoint Hosted App Model

So what are the cool kids using for displaying documents inside a SharePoint Hosted App Model? Right now, I'm doing REST calls and attempting to display the data using the jQuery DataTables plugin. It works but isn't exactly usable, not to mention that I have to make additional async calls to get author names, etc.
Should I dump REST and use CSOM, then format the data accordingly? Build the hyperlink to the item, format the date, build the author name.
Seems like I'm re-inventing the wheel here and want to make sure that I'm not overlooking something obvious.
Thanks
Out of interest, what about DataTables is not usable? If the data is in the JSON response, DataTables can read it directly. Sorry, I know this doesn't help you, but I'm trying to make DataTables better, so I'm interested to know which use cases it is failing to deliver on at the moment!
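As a side note on the extra async calls: in SharePoint 2013 REST you can usually expand the Author lookup in the original query instead of fetching names separately. A sketch of such a query (the list title and selected fields are placeholders):

/_api/web/lists/getbytitle('Documents')/items?$select=Title,FileRef,Author/Title&$expand=Author

This returns each item's author display name inline with the item, so the result can be bound to DataTables without additional round trips.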

Should a dynamic help page content be stored in Database or HTML file?

So, I'm trying to come up with a better way to do a dynamic help module that displays a distinct help page for each page of a website. Currently there is a help.aspx with a title and a div section that is filled by methods that grab a database record. Each DB record stores HTML with the specific help content. This works, but it is an utter pain to maintain: when, say, an image changes or the text has to be edited, you have to find and update one or more DB records.
I was thinking that instead I could build a single HTML page that shows/hides panels, with each panel containing the appropriate help content. As long as you follow a proper naming convention (name each panel's ID after the page/content it represents), Ctrl+F will get you where you need to go and make it easier to find the content you need.
What I'm curious about is whether this would have an impact on performance. The HTML page would be a fairly large file hosted on the server, but it would also remove the need for database calls. Would the work even be worth the benefit here, or am I reinventing the wheel already in place?
Dynamic anything should be stored in the database. A truly maintainable web application should NEVER need code modified to change content. Hiding content is usually not a good idea: imagine if you expanded your application to 100 different pages that each need their own help page. Then when someone clicks help, their browser has to load 99 hidden pages to get the 1 it will show. You need to break your help page down into sections and store the plain text in the database. I would need to know more about the language and architecture you're using to elaborate further, but take a look below.
The need you're describing is pretty much what MVC (a web application architecture) was built for.
If you're already using ASP.NET and you aren't too far into your project, I would consider switching to MVC. It's an architecture built specifically with dynamic page content in mind. You build different 'Views' (the V in MVC) that dynamically build the HTML based on the content they receive from the Controller (the C in MVC), which pulls its data from the database/Model (the M) and shapes it for the View. Also, once you get into MVC you can couple it with Razor, and half of your code gets written for you. It's a wonderful thing.
http://www.asp.net/mvc
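For illustration, a minimal ASP.NET MVC controller sketch in VB (HelpTopic and HelpRepository are hypothetical names for a model class and a data-access helper):

Imports System.Web.Mvc

Public Class HelpController
    Inherits Controller

    ' Loads the help record for one page key from the database
    ' and hands it to a single shared view.
    Function Show(ByVal pageKey As String) As ActionResult
        Dim topic As HelpTopic = HelpRepository.GetByKey(pageKey)
        Return View(topic)
    End Function
End Class

One Show view can then render every help page, so editing content means updating a database row rather than touching code.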

Pass a string to various websites

I have a product code which I need to enter into 6 different websites in order to pull different information from them about the product. Is there a way to save this product code into some sort of variable, pass it into each website's input box, and have it return all the information from each one automatically? I really have no idea where to start with this, so if anyone can brainstorm a few ideas to get me moving, that would be great.
In order to get what you are planning:
You need a script which visits the specified website;
then, at the website, you can get the input element from the DOM.
For instance, in JavaScript:
var textBox = document.getElementsByTagName("input")[0];
This gives you a reference to the text field. Entering the text can then be done as follows:
textBox.value = "any string";
Once you have done this, you will have to retrieve the results from the page, based on the website's layout.
If you can describe your task in more detail, you will get better responses.
Assuming you're talking about using an ordinary GUI browser, the best you can do is copy the code to your system clipboard and paste it into each page in the browser.
If you're talking about programmatic web access like wget or curl, it depends on what language you are writing your script in.
You have to create a web request for each website and find a way to parse the response, which will be HTML.
Have a look at HttpWebRequest; you can find lots of examples on the internet that show how to create an HTTP POST to a website.
http://www.terminally-incoherent.com/blog/2008/05/05/send-a-https-post-request-with-c/
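For instance, a minimal VB.NET sketch of such a POST (the URL and the form-field name "productCode" are placeholders that will differ for each target site):

Imports System.IO
Imports System.Net
Imports System.Text

Module PostExample
    Function PostProductCode(ByVal url As String, ByVal productCode As String) As String
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        request.Method = "POST"
        request.ContentType = "application/x-www-form-urlencoded"

        ' Encode the form body; the field name depends on the site's input box
        Dim body As Byte() = Encoding.UTF8.GetBytes("productCode=" & Uri.EscapeDataString(productCode))
        request.ContentLength = body.Length
        Using stream As Stream = request.GetRequestStream()
            stream.Write(body, 0, body.Length)
        End Using

        ' The response is HTML that still has to be parsed per site
        Using response As WebResponse = request.GetResponse()
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function
End Module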

Using Search Server 2010 Express, web service results return .aspx pages instead of documents

I have a Search web reference (from a Search Server 2010 Express install) in a vb.net application that uses the QueryService class to search a production SharePoint Foundation 2010 site.
At a previous point in time, we had created a proof of concept on an entirely separate test system that has since been turfed. From my recollection, on that test system, when documents were uploaded as a specific site content type (one that inherits from Document) and metadata was provided, we could search for specific metadata by making managed properties for each field, and search results would be returned as documents (with the IsDocument flag set to true). Viewing the document then became simple, as we could use the filename and path to display the stored file.
Now we are developing a production system and we have encountered a new behavior, where these results are returned as .aspx results such as
http://digitizaton/Company/Client Documents/Forms/DispForm.aspx?ID=1703
This of course makes it terribly difficult to locate and view the document. We can extract the Title, which gives us the name of the file with no extension, but that hardly helps: the FileExtension data is aspx, not the document's file extension, so we don't have a full filename. We could display the page returned as a result, but we would much prefer the document itself.
I've made a test document library with just a bare-bones setup (not using the site content type or site columns) and uploaded some documents to the same site, and they are returned in the same fashion, so I don't believe the document library or content type is the issue.
With a fairly limited understanding of both SharePoint and Search Server, I don't know if this is a setup issue with the search service itself, with the site configuration, or with the query packet I am sending. We also have a third-party application (KnowledgeLake) installed on the server that ties into SharePoint and could have changed configuration somewhere as well.
I don't think the query packet has changed since it was working in the proof of concept, other than the custom data column names. I will provide it here in case something is glaringly obvious to an external reader.
<QueryPacket xmlns='urn:Microsoft.Search.Query.Document'>
  <Query>
    <SupportedFormats>
      <Format>urn:Microsoft.Search.Response</Format>
    </SupportedFormats>
    <Range>
      <Count>0</Count>
    </Range>
    <Context>
      <QueryText type='MSSQLFT'>
        SELECT Filename, Title, FileExtension, IsDocument, Path FROM Scope() WHERE "Scope" = 'Department1' AND CustomData = 'X'
      </QueryText>
    </Context>
  </Query>
</QueryPacket>
Any guidance would be incredibly appreciated. If I have not provided some relevant information, please let me know and I can track it down.
Thanks everyone
So now I feel like an idiot. I've searched for hours with no luck, and literally seconds after composing this post I found the nugget of gold I've been searching for.
It appears that our primary file type, PDF, has a known issue with SharePoint 2010, as shown at the following site:
http://www.sharepointsharon.com/2010/03/sharepoint-2010-and-adobe-pdf/
and further to that, this registry entry setting is required to link it all together:
http://www.mossgurus.com/adnan/Lists/Categories/Category.aspx?Name=SharePoint%202010%20--%20Configuration