Extract changes from Wikipedia/Wikimedia revision pages - wikipedia-api

I have a simple query regarding the Wikimedia/Wikipedia API.
I need to fetch the changes made in a list of "revids". I am able to fetch the XML content for a batch of "revids", but I have not been able to extract only the changed text.
Does the API provide any way to extract only the changed sentences? If not, is there any external script/module that can do this job?
Query to fetch the revision details: https://en.wikipedia.org/w/api.php?action=query&prop=info|revisions&rvprop=user|userid|ids|tags|comment|content&format=jsonfm&revids=1228415
I would appreciate any suggestions/solutions that could solve this issue!
(Currently, I am using the Wikitools python module to make the queries)

You can get the diff between the old and new text with action=compare, but it segments text by wikitext lines, not sentences, isn't meant to be machine-readable, and is generally not that helpful. Since you are using Python, the client-side library deltas will probably work better for you.
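If you already have the wikitext of both revisions (e.g. fetched with the query above), you can also compute the changed lines client-side with nothing but the standard library. A minimal sketch using Python's `difflib`, with made-up revision texts standing in for real wikitext:

```python
import difflib

# Two hypothetical revision texts (in practice, fetched via the
# revisions API query shown above).
old_rev = "The quick brown fox.\nIt jumps over the dog.\n"
new_rev = "The quick brown fox.\nIt leaps over the lazy dog.\n"

# unified_diff with n=0 yields only the changed lines, no context.
diff = list(difflib.unified_diff(
    old_rev.splitlines(), new_rev.splitlines(),
    fromfile="old", tofile="new", lineterm="", n=0))

for line in diff:
    print(line)
```

Like `action=compare`, this works line by line, not sentence by sentence; for token-level diffs of wikitext, the `deltas` library mentioned above is the better fit.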

Related

Somehow send command line commands on windows externally and get back the response

Problem: Need to convert local html (with local images etc) to pdf from an AIX box running Universe 11.2.5 with System Builder
Current solution: FTP over html file to a Windows server which converts in batches and sends the e-mail to the destination
Proposed Solution: Do everything on the AIX box, from converting html to pdf and sending the e-mail.
Current problem: Unable to find a way to convert local HTML to PDF on the AIX box. I have tried many different approaches, including installing Python 3, but to no avail.
The only really difficult part of the process is getting the HTML to render into a format that will properly display your HTML on pages suitable for printing. There is a fair amount of magic that goes on between an HTTP GET and clicking print in a browser window that needs to be accounted for.
I was trying to accomplish something similar many moons ago on AIX but ran into a skill-level/time wall, because I would essentially have had to create a headless browser to render the HTML. It looks like there are now some utilities that you might be able to leverage. I found a recently updated question on Super User that actually got me somewhat excited, especially since I don't use AIX anymore, so precompiled binaries and well-understood, easily attainable dependencies are something I can actually have in my life.
https://superuser.com/questions/280552/how-can-i-render-a-website-as-an-image-from-the-shell
Good Luck.
There seem to be several questions rolled into this one item.
Converting HTML to PDF is, in principle, just data manipulation that you could do in BASIC, but writing such code would be a large task. The option you currently use, sending it to another system, is valid, but it puts more points of failure into the system. I would think you could find code to do it on the AIX box.
Rocket plans to get MV Python working on AIX; this will make converting HTML to PDF much easier, since there are many open-source modules.
As for my suggestion of using sockets, that would be if you intend to send the HTML to a service that takes it and returns the PDF document.
i.e. Is there a web service for converting HTML to PDF?
Once you have the PDF document, you can either store it in a UniVerse type-19 file, or base64-encode it and store it in a UniVerse hashed file.
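The base64 step itself is straightforward. A minimal Python sketch (the PDF bytes here are a placeholder; in practice you would read them from the converted file):

```python
import base64

# Hypothetical PDF bytes (in practice, read from the converted file).
pdf_bytes = b"%PDF-1.4 ...rest of the document..."

# Encode to ASCII-safe text suitable for storage in a hashed-file field.
encoded = base64.b64encode(pdf_bytes).decode("ascii")

# Decode back to the original bytes when the PDF is needed again.
restored = base64.b64decode(encoded)
print(restored == pdf_bytes)
```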
Hope this helps,
Mike

Jedox - How to export SOAP Log into Excel file

I'm pretty new to Jedox, and for internal use I'm trying to export some specific "warning logs" (e.g. "(Mapping missing)") into an Excel/WSS file.
I don't know how to do that.
Can you help me, please?
Regards,
Usik
The easiest way to get this information is to use the Integrator in Jedox.
There you can use a File Extract and then filter for the information you are searching for.
After that, it's possible to load the filtered information into a file.
The minimum steps you'll need are Connection -> Extract -> Transform -> Load.
Please take a look at the sample projects that are delivered with the Jedox software. In the example "sampleBiker", there are also file connections, extracts etc.
You can find more samples in:
<Install_path>\tomcat\webapps\etlserver\data\samples
I recommend checking the Jedox Knowledgebase.
The other way (and maybe more flexible way) would be to use, for example, a PHP macro inside of a Jedox Web report and read the log file you're trying to display.
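Whichever route you take, the core of the macro approach is just filtering the log text for the warning string. A minimal sketch of that logic in Python, with a made-up log excerpt (real Jedox SOAP log paths and formats will differ):

```python
# Hypothetical log excerpt; real Jedox SOAP log formats differ.
log_text = """\
2023-01-10 12:00:01 INFO  job started
2023-01-10 12:00:02 WARN  (Mapping missing) element 'Foo' in dimension 'Bar'
2023-01-10 12:00:03 WARN  (Mapping missing) element 'Baz' in dimension 'Qux'
2023-01-10 12:00:04 INFO  job finished
"""

# Keep only the lines containing the warning we care about.
warnings = [line for line in log_text.splitlines()
            if "(Mapping missing)" in line]

# Print them as rows that could be written to a file Excel can open.
for row in warnings:
    print(row)
```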
If you've got a more specific idea what you'd like to do, please let me know and I'll try to give you an example how to do so.

Is it possible to run an OpenRefine script in the background?

Can I trigger an OpenRefine script to run in the background without user interaction? Possibly by using a Windows service to load an OpenRefine config file, or by starting the OpenRefine web server with parameters and saving the output?
We parse various data sources from files and place the output into specific tables and fields in sql server. We have a very old application that creates these "match patterns" and would like to replace it with something more modern. Speed is important but not critical. We are parsing files with 5 to 1,000,000 lines typically.
I could be going in the wrong direction with OpenRefine; if so, please let me know. Our support team that creates these "match patterns" would be better served by a UI like OpenRefine than by writing Perl or Python scripts.
Thanks for your help.
OpenRefine has a set of libraries that let you automate an existing job. The following are available:
* two in Python here and here
* one in Ruby
* one in Node.js
These libraries need two inputs:
* a source file to be processed in OpenRefine
* the OpenRefine operations in JSON format.
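The operations JSON is what OpenRefine exports via Undo / Redo > Extract... in the UI. A tiny, made-up example of its shape (the column names are hypothetical), loaded in Python:

```python
import json

# A minimal example of OpenRefine's operation-history JSON, the kind
# exported from "Undo / Redo > Extract..."; column names are made up.
operations_json = """
[
  {
    "op": "core/column-rename",
    "oldColumnName": "cust_nm",
    "newColumnName": "customer_name",
    "description": "Rename column cust_nm to customer_name"
  }
]
"""

operations = json.loads(operations_json)
for op in operations:
    print(op["op"], "-", op["description"])
```

Feeding a file like this, together with the source data, is exactly what the libraries above automate.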
At RefinePro (disclaimer: I am the founder and CEO of RefinePro), we have written an extra wrapper to select an OpenRefine project, extract the JSON operations, run the library, and save the result. The newly created job can then be scheduled.
Please keep in mind that OpenRefine has very poor error handling, which limits its usage as an ETL platform.

Dynamic Reports - is it possible to set starting page?

I'm using DynamicReports to build huge PDF files (around 80,000 pages), and for now the solution I found was to create intermediary files and merge them after processing. The last challenge in getting it done is adding page numbers, but the default counting obviously gets messed up after merging. So I need some way to set the starting page number when creating the temp PDF files. The three methods available don't allow page setting. Is it possible? How do I do it?
Thanks in advance.
Yes, it is possible. Although it's hard to find in the documentation, the report() builder has exactly what is needed: the method
report.setStartPageNumber(int)
As stated by Ricardo in the DynamicReports forum.
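Whatever API call sets it, the bookkeeping for the merge itself is simple arithmetic: each intermediary file should start at one plus the total page count of the files before it. A quick sketch with made-up chunk sizes:

```python
# Page counts of the intermediary PDF files, in merge order (made up).
chunk_pages = [25000, 25000, 30000]

# Each chunk starts right after all the pages that precede it.
start_pages = []
total = 0
for pages in chunk_pages:
    start_pages.append(total + 1)
    total += pages

print(start_pages)  # first chunk starts at page 1
```

Each value in `start_pages` is what you would pass as the starting page number when generating the corresponding temp file.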

where to get sample ofx files for testing?

I am building a PHP application using the Ofx Parser class from http://www.phpclasses.org/package/5778-PHP-Parse-and-extract-financial-records-from-OFX-files.html . But where can I get a sample OFX file to use this class and test my application?
Try searching "filetype:ofx" on Google. I have found a couple there. If you need a whole bunch for a more complete test, I don't know of a good source.
Easiest by far is to have an online bank account yourself that supports ofx downloads. But you're right; it's surprisingly difficult to find anything past a simplest case online.
I dug up an article on IBM developerWorks that includes a quick sample. It's about parsing OFX with PHP and helpfully shows the difference between a well-formed XML version of an OFX file and the start-tag-only version you'll often find when you download from various banks, but even this sample contains only one withdrawal and one deposit.
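For illustration, here is a tiny hand-written fragment in that older SGML style (leaf tags left unclosed), with made-up values, plus a crude regex extraction in Python. A real parser should be used for anything serious:

```python
import re

# A minimal, hand-written OFX fragment in the older SGML style
# (leaf elements have no closing tags); all values are made up.
sample_ofx = """\
OFXHEADER:100
DATA:OFXSGML
VERSION:102

<OFX>
<BANKMSGSRSV1>
<STMTTRNRS>
<STMTRS>
<BANKTRANLIST>
<STMTTRN>
<TRNTYPE>DEBIT
<DTPOSTED>20240105
<TRNAMT>-42.50
<FITID>0001
<NAME>COFFEE SHOP
</STMTTRN>
<STMTTRN>
<TRNTYPE>CREDIT
<DTPOSTED>20240106
<TRNAMT>1500.00
<FITID>0002
<NAME>PAYROLL
</STMTTRN>
</BANKTRANLIST>
</STMTRS>
</STMTTRNRS>
</BANKMSGSRSV1>
</OFX>
"""

# Pull out the transaction amounts with a simple regex.
amounts = [float(m) for m in re.findall(r"<TRNAMT>(-?[\d.]+)", sample_ofx)]
print(amounts)
```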
Try using https://github.com/wesabe/fixofx. It has a script called fakeofx.py
The fakeofx.py script generates real-ish-seeming OFX for testing and
demo purposes. You can generate a few fake OFX files using the script,
and upload them to Wesabe to try it out or demonstrate it without
showing your real account data to anyone.
The script uses some real demographic data to make the fake
transactions it lists look real, but otherwise it isn't at all
sophisticated. It will randomly choose to generate a checking or
credit card statement and has no options.
These are the two references I used. The first one is about the structure of an OFX file, and the second one gives you the connection information for financial institutions.
http://www.ofx.net/
http://www.ofxhome.com/