How to download a page from source code - VB.NET

I need to download a page that is referenced in HTML source code. For example:
<span id="businessNumOnMap" class="resultNumberOnMap" style="display:none;"></span><span>Cellini's Italian Restaurant
I want to download the "/len/aaproximat...php" page, but I couldn't find a suitable regex for it. Can anyone help?
I'm using VB.NET.

Parsing HTML with a regex is normally not recommended, except for a simple page whose format you know. The Html Agility Pack is often recommended for this purpose instead.
Be aware, though, that if you're parsing a page that's on the internet, the site in question might have T&Cs for the use of its data that you need to follow to stay legal.

Do you want to download the PHP file itself, with all of its code, and not only the HTML it produces? If so, that's not possible.

Use the WebClient.DownloadString method for downloading. If you haven't found a suitable expression to extract that span from the source, then build your own.
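The two steps discussed in the answers (fetch the page source, then pull the link out with a parser rather than a regex) can be sketched as follows. This is a Python analogue, not VB.NET: `urlopen` stands in for WebClient.DownloadString and the standard-library `html.parser` stands in for the Html Agility Pack. The names `LinkCollector`, `extract_links` and `download` are made up for this illustration, and the sketch assumes the target is the href of an ordinary <a> tag.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html, base_url):
    """Return absolute URLs for all links found in the HTML text."""
    parser = LinkCollector()
    parser.feed(html)
    # Relative paths such as "/some/page.php" are resolved against base_url.
    return [urljoin(base_url, link) for link in parser.links]


def download(url):
    """Fetch a URL and return its body as text (the DownloadString step)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

A typical use would be `for url in extract_links(download(page_url), page_url): body = download(url)`, where `page_url` is the address of the page containing the span.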

Related

QnA Maker: How to enable context-only for subquestions?

I have a PDF or DOCX document (the only accepted formats for multi-turn) that contains a lot of subheadings, which translate to follow-up prompts. This all works fine! But I would like to enable context-only for all of my prompts, because the answers are not relevant out of context.
Can I denote this in the document itself? There are way too many to check the button manually.
I could write a script that changes contextOnly to true in the exported TSV, but this seems like a silly workaround.
There does not seem to be any way to indicate whether a question is context-only through the document extraction process, so you will need to automate this with a script. If you don't want to modify the TSV directly, you can use the QnA REST API. You can also access this API through the Bot Framework CLI but I don't know if that makes anything more convenient for you.
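The kind of script meant here could look roughly like the Python sketch below. It works on an in-memory list of QnA records and assumes each record has an `id` and an optional `context` object holding `isContextOnly` and a `prompts` list whose entries carry a `qnaId`. Those field names are assumptions to check against your actual export or API response, and the function name is hypothetical. Downloading and re-uploading the knowledge base is left out.

```python
def mark_followups_context_only(qna_records):
    """Set isContextOnly on every record that is the target of a prompt.

    Field names ("id", "context", "prompts", "qnaId", "isContextOnly")
    are assumed, not confirmed; adjust them to the real data.
    Returns the ids of the records that were changed.
    """
    by_id = {record["id"]: record for record in qna_records}

    # Gather the ids that any record points to as a follow-up prompt.
    prompt_targets = set()
    for record in qna_records:
        context = record.get("context") or {}
        for prompt in context.get("prompts") or []:
            prompt_targets.add(prompt["qnaId"])

    changed = []
    for qna_id in sorted(prompt_targets):
        record = by_id.get(qna_id)
        if record is None:
            continue  # prompt points at a record that is not in this list
        if record.get("context") is None:
            record["context"] = {}
        if not record["context"].get("isContextOnly"):
            record["context"]["isContextOnly"] = True
            changed.append(qna_id)
    return changed
```

Top-level questions that are not the target of any prompt are left untouched, so they remain answerable without context.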

Replace content in a PDF file

The system I'm working with receives PDF documents. Inside those documents there are two clickable images. The click events just trigger an HTTP URL. The thing is that I need to update those two URLs when I receive the document.
So my question is, is it possible to find the events and change the url and then save the file again? Those two images can be anywhere in the document so I can't look in a specific location.
Edit: I forgot to say that I'm coding in C# so it needs to be a .NET library.
Yes, it's possible.
It's hard to describe how it can be achieved without knowing how the PDFs are constructed (there are a few ways to create the described behavior) and which tools you are going to use.
I just want to tell you how I solved this problem, or rather where I found the solution. I used the code in this thread, and it worked like a charm.

Basics of i18next

I'm new to i18n, and when I typed it in the search bar, i18next was in the top results.
I already did my research regarding i18n and how to use it, but it's still not clear to me. All I know is that to make your web app available in other languages, you need to create a JSON file that contains the keys and values of your app, and you need to add a script for the i18n.
The rest is still confusing for me. This might sound like a stupid question, but I just can't understand how it works.
1) I'm not sure, but based on my observation, you only create a JSON translation for those elements that have a value or text that will be shown in the page. Correct? Assuming I have text in the HTML file that is not inside a label or innerHTML, for example:
<html>
<body>
**How are we going to translate this text? What key am I going to use?**
</body>
</html>
What do I need to do to translate this text?
2) What should we use as the key? id? class? tag? I've seen different examples, and they use different ones of these. When is the right time to use each?
3) Regarding the key-value pair, what if the pair is coming from the server? What's the syntax for this?
4) When do we need a multi-line JSON?
i18n is a big topic, with a lot of solutions depending on what kind of web app you are trying to internationalize / localize. Unfortunately, i18next's documentation is not very good, and it has next to nothing in the way of tutorials.
That said, you might be best off taking a look at the sample app on i18next.js's github repository here: https://github.com/jamuhl/i18next/tree/master/sample/static. It does give some examples of how i18next can be used to replace html text with localized versions of the same. To answer some of your questions:
1) There are a few ways of doing this. The sample script replaces much of the data by using the jQuery .text call -- something like this: $('#MyHTMLID').text($.t('ns.common:MyLocalizedTextForMyHTMLID'));. Any HTML inside the element with id "MyHTMLID" is replaced by the localized data for the key "MyLocalizedTextForMyHTMLID" by the i18next .t call.
2) A lot of these decisions are just convention. Keep it simple, be consistent.
3) Normally in a web app the JSON file is on the server, in a locales subdirectory of the directory where your HTML resides. Take a look at that i18next example for how it's laid out.
4) When you're first building your web app, use a multi-line JSON file to be able to troubleshoot. You can compress it later using something like http://jsonformatter.curiousconcept.com/.
Hope this helps get you started!
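To make the key/value idea more concrete, here is a small language-neutral sketch in Python. It is not i18next itself: the `translate` function, the resource layout (language, then namespace, then nested keys) and the fallback behaviour are assumptions that only mimic the "namespace:key" convention seen in the example above. It also shows the multi-line versus compact JSON point.

```python
import json

# Multi-line JSON is easier to read and troubleshoot while developing.
RESOURCES_JSON = """
{
  "en": {
    "ns.common": {
      "MyLocalizedTextForMyHTMLID": "Welcome",
      "app": {"title": "Hello"}
    }
  },
  "de": {
    "ns.common": {
      "app": {"title": "Hallo"}
    }
  }
}
"""


def translate(resources, language, key, fallback="en"):
    """Look up "namespace:dotted.path" in language, then in fallback.

    Returns the key itself when no translation is found, so missing
    entries are easy to spot on the page.
    """
    namespace, _, path = key.partition(":")
    for lng in (language, fallback):
        node = resources.get(lng, {}).get(namespace, {})
        for part in path.split("."):
            if not isinstance(node, dict) or part not in node:
                node = None
                break
            node = node[part]
        if isinstance(node, str):
            return node
    return key


resources = json.loads(RESOURCES_JSON)

# The same data, compressed to a single line for deployment.
compact = json.dumps(resources, separators=(",", ":"))
```

Whatever the key is called (an id, a class, or a free-form name), it is just a lookup string; the element it ends up in is decided by the code that calls the translation function.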

How to detect image in a document

How can I detect images in a document, say a doc, xls, ppt or pdf?
I came across Apache Tika, and I am trying its command line option.
http://tika.apache.org/1.2/gettingstarted.html
But not quite sure how it will detect images.
Any help is appreciated.
Thanks
You've said you want to use a command line solution, and not write any Java code, so it's not going to be the prettiest way to do it... If you are happy to write a little bit of Java, and create a new program to call from Python, then you can do it much nicer!
The first thing to do is to have the Tika App extract any embedded resources from your file. Use the --extract option for this, and have the extraction occur in a special temp directory your app controls, e.g.
$ java -jar tika.jar --extract ../testWORD_embedded_pdf.doc
Extracting 'image1.emf' (application/x-emf)
Extracting '_1402837031.pdf' (application/pdf)
Grab the output of the extraction if you can, and parse it looking for images (but be aware that some images have an application/ prefix on their canonical mimetype!). You might need to run a second --detect step on a few; I'm not sure, so test how the parsers get on with the extraction.
Now, if there were images, they'll be in your temp dir. Process them as you want. Finally, zap the temp dir when you're done with the file!
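The steps above can be sketched in Python, calling the Tika App as a subprocess. The regular expression is modelled on the two "Extracting ..." lines shown earlier and may need adjusting for other Tika versions. The sketch also assumes that --extract writes into the current working directory, and it reads both stdout and stderr because it is not certain which stream carries those lines. The function names and the `tika.jar` path are placeholders.

```python
import re
import subprocess
import tempfile
from pathlib import Path

# Matches lines like: Extracting 'image1.emf' (application/x-emf)
EXTRACT_LINE = re.compile(r"^Extracting '(?P<name>.+)' \((?P<mime>[^()]+)\)$")

# Image formats whose mimetype does not start with "image/".
# Only application/x-emf comes from the sample output; extend as needed.
IMAGE_LIKE_APPLICATION_TYPES = {"application/x-emf"}


def parse_extract_output(output):
    """Return (file name, mimetype) pairs found in the --extract output."""
    found = []
    for line in output.splitlines():
        match = EXTRACT_LINE.match(line.strip())
        if match:
            found.append((match.group("name"), match.group("mime")))
    return found


def is_image(mime):
    return mime.startswith("image/") or mime in IMAGE_LIKE_APPLICATION_TYPES


def extract_images(document, tika_jar="tika.jar"):
    """Run the Tika App in a temp dir and return {file name: bytes} for images."""
    document = Path(document).resolve()
    tika_jar = Path(tika_jar).resolve()
    with tempfile.TemporaryDirectory() as temp_dir:
        result = subprocess.run(
            ["java", "-jar", str(tika_jar), "--extract", str(document)],
            cwd=temp_dir, capture_output=True, text=True, check=True,
        )
        images = {}
        for name, mime in parse_extract_output(result.stdout + "\n" + result.stderr):
            path = Path(temp_dir) / name
            if is_image(mime) and path.is_file():
                images[name] = path.read_bytes()
        # The temp dir is removed automatically when the with block exits.
        return images
```

A document contains images if `extract_images("some.doc")` returns a non-empty dict.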
I was wrong to answer no. Having used Tika in the past, I couldn't see how it could help with images embedded within Office documents or PDFs. You may still have to resort to native APIs like Apache POI and Apache PDFBox. Tika does use both libraries to parse text and metadata, but I believed it had no embedded image support.
Using Tika makes these APIs automatically available (side effect of using Tika).
UPDATE:
Since Tika 0.8: look for EmbeddedResourceHandler and examples - thanks to Gagravarr.

dojo js library + jsdoc -> how to document the code?

I'd love to ask how the guys developing dojo create the documentation.
From nightly builds you can get the uncompressed js files with all the comments, and I'm sure there is some kind of documenting script that will generate some html or xml out of it.
I guess they use jsdoc as this can be found in their utils folder, but I have no idea on how to use it. jsDoc toolkit uses different /**commenting**/ notations than the original dojo files.
Thanks for all your help
It's all done with a custom PHP parser and Drupal. If you look in util/docscripts/README and util/jsdoc/INSTALL you can get all the gory details about how to generate the docs.
It's different than jsdoc-toolkit or JSDoc (as you've discovered).
FWIW, I'm using jsdoc-toolkit as it's much easier to generate static HTML and there's lots of documentation about the tags on the google code page.
Also, just to be clear, I don't develop dojo itself. I just use it a lot at work.
There are two parts to the "dojo jsdoc" process. There is a parser, written in PHP, which generates xml and/or json of the entirety of listed namespaces (defined in util/docscripts/modules, so you can add your own namespaces. There are basic usage instructions atop the file "generate.php") and a Drupal part called "jsdoc" which installs as a drupal module/plugin/whatever.
The Drupal aspect of it is just Dojo's basic view of this data. A well-crafted XSLT or something to iterate over the json and produce html would work just the same, though neither of these are provided by default (would love a contribution!). I shy away from the Drupal bit myself, though it has been running on api.dojotoolkit.org for some time now.
The doc parser is exposed so that you may use its inspection capabilities to write your own custom output as well. I use it to generate the Komodo .cix code completion in a [rather sloppy] PHP file util/docscripts/makeCix.php, which dumps information as found into an XML doc crafted to match the spec there. This could be modified to generate any kind of output you chose with a little finagling.
The doc syntax is all defined on the style guideline page:
http://dojotoolkit.org/reference-guide/developer/styleguide.html