Filter changes by comment - wikipedia-api

Is there any way to filter changes by comment (e.g. changes whose comment contains some word)?
I want something similar to the action=query&list=recentchanges API, but with the ability to filter changes by comment.

It's not possible; comments are not indexed.
You could write your own code for it, of course - on a smallish wiki unindexed queries would work fine, especially if you limit them to the recentchanges table (last 30 days only). Or you can have client-side logic iterate through the API and filter the results.
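A minimal sketch of that client-side approach in Python; the wiki URL and the search word are placeholders, and the parameters are the standard list=recentchanges ones. In practice you would also bound the loop with rcstart/rcend or a page cap:

```python
import requests

API_URL = "https://en.wikipedia.org/w/api.php"  # placeholder wiki

def changes_with_comment_word(word):
    """Yield recent changes whose edit comment contains `word`."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|timestamp|comment",
        "rclimit": 500,
        "format": "json",
    }
    while True:
        data = requests.get(API_URL, params=params).json()
        for change in data["query"]["recentchanges"]:
            if word.lower() in change.get("comment", "").lower():
                yield change
        if "continue" not in data:
            break
        params.update(data["continue"])  # follow the continuation token

for change in changes_with_comment_word("revert"):
    print(change["timestamp"], change["title"], change["comment"])
```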

Related

Displaying data with pagination - get whole data at once or get part for each page?

Which approach performs better if I want to display data with pagination? Should I download all the data from the DB and then switch between pages locally, or fetch the data from the DB page by page?
At first I was leaning toward the second option, but then I found this article and now I'm lost.
In my SQL queries I'm using the OFFSET and LIMIT clauses, and since I also fetch the last page of the pagination, the first option would be the better one, as far as I understand? It's important to note that my database is quite small.
Or would the best option be to keep using OFFSET but never read the last page - or am I wrong (in the case of larger databases, for performance)?
In the end, I implemented it the way the article suggests. I removed the "move to last page" button so the database is never forced to count all the rows, and since I already have sorting features (ASC/DESC by particular columns), a user who wants the last page can simply flip the sort order and get the last elements - via ASC/DESC queries, which I hope are faster than a large OFFSET.
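For reference, a sketch of the two query shapes using sqlite3; the posts table and its columns are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect("app.db")  # hypothetical database
PAGE_SIZE = 20

def page_by_offset(page):
    # OFFSET/LIMIT: the database still walks past every skipped row,
    # so the cost grows with the page number.
    return conn.execute(
        "SELECT id, title FROM posts ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, page * PAGE_SIZE),
    ).fetchall()

def page_by_keyset(last_seen_id):
    # Keyset ("seek") pagination: jump straight to the next page via the
    # index on id; cost stays constant no matter how deep the user pages.
    return conn.execute(
        "SELECT id, title FROM posts WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, PAGE_SIZE),
    ).fetchall()
```

Flipping ORDER BY id to DESC in the keyset query is what makes the "last page via sort order" trick cheap.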

How should I deal with copies of data in a database?

What should I do if a user has a few hundred records in the database and would like to make a draft: take all the current data, make some changes, and save that as a draft, potentially for good, keeping both copies?
Should I duplicate all the data in the same table and mark it as a draft?
Or only duplicate the changes, and fall back to the "non-draft" data where no changes exist?
The user should also be able to go back to the live version and make changes there without affecting the draft.
Simply introduce a version field in the tables that would be affected.
Content management systems (CMS) do this already. You can create a blog post, for example, and it has version 1. Then a change is made, which becomes version 2, and so on.
You will obviously end up storing quite a bit more data. A nice benefit though is that you can easily write queries to load a version (or a snapshot) of data.
As a convention you could always make the highest version number the "active" version.
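A minimal sketch of that convention with sqlite3; the table and column names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect("app.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS records (
        record_id INTEGER,   -- stable identity shared by all versions
        version   INTEGER,   -- 1, 2, 3, ... per record
        payload   TEXT,
        PRIMARY KEY (record_id, version)
    )
""")

def save_new_version(record_id, payload):
    # Next version = current highest + 1 for this record.
    (current,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM records WHERE record_id = ?",
        (record_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO records (record_id, version, payload) VALUES (?, ?, ?)",
        (record_id, current + 1, payload),
    )

def load_active(record_id):
    # Convention: the highest version number is the "active" one.
    return conn.execute(
        "SELECT payload FROM records WHERE record_id = ?"
        " ORDER BY version DESC LIMIT 1",
        (record_id,),
    ).fetchone()
```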
You can either use BEGIN TRANSACTION, COMMIT and ROLLBACK statements, or you can create a stored procedure / piece of code so that any amendments the user makes are put into temporary tables until they are ready to go into production.
If you are making a raft of changes, it is best to use temporary tables, as holding transactions open can result in locks on the live data for other users.
This article might help if the above means nothing to you: http://www.sqlteam.com/article/temporary-tables
EDIT - You could create new tables (i.e. NOT temporary, but full-fledged SQL tables) "on the fly" and name them something meaningful: for instance, the user's initials, followed by the original table name, followed by a timestamp.
You can then programmatically create, amend and delete these tables over long periods of time, as well as compare them against the live tables. You would need to keep track of how many tables are being created in case your database grows to a vast size.
The only major headache then is putting the changes back into the live data. For instance, someone takes a cut of the data into a new table and three weeks later decides to push it to live after making changes. By then there is a good chance the live data has changed anyway, possibly superseding the changes the user will submit.
You can get around this with some creative coding though. There are many ways to tackle this, so if you get stuck at the next step you might want to start a new question. Hopefully this at least gives you some inspiration though.
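A rough sketch of the on-the-fly staging-table idea in Python/sqlite3, using the naming scheme above; the table names and key column are illustrative only:

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect("app.db")

def create_draft_table(user_initials, source_table):
    # e.g. "jd_orders_20240101120000": initials, original table, timestamp.
    # Table names are string-formatted here, so they must come from trusted
    # values, never raw user input.
    stamp = datetime.now().strftime("%Y%m%d%H%M%S")
    draft = f"{user_initials}_{source_table}_{stamp}"
    conn.execute(f"CREATE TABLE {draft} AS SELECT * FROM {source_table}")
    return draft

def rows_orphaned_in_draft(draft, source_table, key):
    # Draft rows whose key no longer exists in live: a cheap first check
    # before attempting to merge changes back.
    return conn.execute(
        f"SELECT d.* FROM {draft} d"
        f" LEFT JOIN {source_table} s ON d.{key} = s.{key}"
        f" WHERE s.{key} IS NULL"
    ).fetchall()
```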

User Specific Lucene Search

I don't think this is a very obscure Lucene problem, but somehow I just don't seem to be able to find a good solution to it. I will use an example.
Let's say I am building a news articles website. Registered users can bookmark articles they are interested in. I want to allow a user to search only the articles that they have bookmarked. For the sake of the example, let's also assume that a user can potentially bookmark thousands of articles, and that we have hundreds of thousands of users in our database. How do I build a scalable solution for this problem?
Thanks a lot!
This is actually a very typical Lucene problem, because Lucene does not support joins. More precisely, there is no first-class support, and you have to work around it. I can suggest a few approaches:
You could have a database, which has users, articles and bookmarks tables (the latter would have foreign keys pointing to the first two). You would also have articles indexed in Lucene. When running a search against articles, you could write a Lucene Filter which would exclude all articles not bookmarked by the current user.
You could index both articles and bookmarks in Lucene - probably best in separate indices. Then you could run one query for bookmarks (to retrieve which articles the current user has bookmarked) and a separate query for articles. As in the previous option, you would use the results of the first query to exclude all articles not bookmarked by the current user.
I personally prefer option #1, as this is a classical relational structure and databases are designed for exactly this purpose. With option #2 you would have to modify both the user storage and the Lucene index when a user gets deleted.
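A sketch of option #1's flow in Python; the bookmarks schema is the one described above, while lucene_search is a stand-in callable, not a real Lucene API. In real Lucene you would push the allowed-id set down as a Filter/TermsQuery so scoring and paging happen before, not after, the restriction:

```python
import sqlite3

conn = sqlite3.connect("app.db")  # hypothetical database

def bookmarked_ids(user_id):
    # bookmarks(user_id, article_id) with foreign keys to users/articles.
    rows = conn.execute(
        "SELECT article_id FROM bookmarks WHERE user_id = ?", (user_id,)
    )
    return {article_id for (article_id,) in rows}

def search_bookmarked(user_id, query, lucene_search):
    # lucene_search(query) stands in for the full-text side and is assumed
    # to return matching article ids.
    allowed = bookmarked_ids(user_id)
    return [hit for hit in lucene_search(query) if hit in allowed]
```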

CMS Settings, .INI file or MySQL?

I'm looking for the best solution to store the settings for a website, like the limit of posts for users, the limit of users online, ranks, and the minimum number of posts required to be able to do something.
Like here: if you're new, you can't thumbs-up/down a post, and so on. How would you store all of these?
I thought of creating a table with constants in MySQL, but I don't think it's the best solution to add a new MySQL query on every page refresh.
Why not? After all, MySQL can handle a large number of requests, and I've seen much more complex queries than checking access rights. A MySQL query ultimately reads from a file just like checking an INI file does, only in an optimized way. If you don't expect a huge amount of traffic, you'll be fine with a database.
It's a matter of preference here. I prefer MySQL because I don't like parsing files and find querying a database easier. Also, editing rows is easier than changing values in a text file.
I'd say your first thought was spot-on. Put constants into a database.
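A minimal sketch of such a settings table with sqlite3; the table layout and setting names are made up:

```python
import sqlite3

conn = sqlite3.connect("app.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS settings (name TEXT PRIMARY KEY, value TEXT)"
)

def load_settings():
    # One small indexed query per page load is cheap; cache the dict in
    # memory if it ever shows up in profiling.
    return dict(conn.execute("SELECT name, value FROM settings"))

settings = load_settings()
max_posts = int(settings.get("post_limit_per_user", "100"))
```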

Ajax autocomplete extender populated from SQL

OK, first let me state that I have never used this control and this is also my first attempt at using a web service.
My dilemma is as follows: I need to query a database to get back a certain column and use that for my autocomplete. Obviously I don't want the query to run every time a user types another word in the textbox, so my best guess is to run the query once and then use that dataset, array, list or whatever to filter for the autocomplete extender...
I'm kind of lost - any suggestions?
Why not keep track of the query executed by the user in a session variable, then use that to filter any further results?
The trick to preventing the database from being overloaded, I think, is really just to limit how frequently the autocompleter is allowed to update - something like once every 2 seconds seems reasonable to me.
What I would do is this: store the current list returned by the query for word A server-side and tie it to a session variable. That should be basically the entire list, I would think. Then, for each new word typed, as long as the original word A is unchanged, you can filter the session data and return the filtered results without querying again. So basically, only query again when word A changes.
I'm using "session" in a PHP sense, you may be using a different language with different terminology, but the concept should be the same.
This depends on how transactional your data store is. If you are looking up US states (a data collection that realistically would not change through the life of the application), then I would cache either a System.Collections.Generic.List<> or, if you wanted, a DataTable.
You could easily set up the cache of the data you wish to query to be dependent upon an XML file or database, so that your extender always queries the data object cast from the cache, and the cache object is only updated when the data source changes.
RAM is cheap and SQL is harder to scale than IIS, so cache everything in memory:

- your entire data source, if it is not too large to load in reasonable time,
- precalculated data,
- autocomplete webservice responses.

Depending on your desired autocomplete behavior and performance, you may want to precalculate data and create redundant structures optimized for reading. Make use of structures like SortedList (when you need something like select top x ... where z like @query + '%'), Hashtable, ...
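For instance, a sorted list gives you the equivalent of that LIKE-prefix query with two binary searches; a minimal sketch with a made-up suggestion list:

```python
import bisect

# Precalculated, read-optimized structure: a sorted list of suggestions.
suggestions = sorted(["alabama", "alaska", "arizona", "arkansas", "california"])

def top_matches(prefix, limit=10):
    # Equivalent of: SELECT TOP x ... WHERE col LIKE @query + '%'
    start = bisect.bisect_left(suggestions, prefix)
    end = bisect.bisect_right(suggestions, prefix + "\uffff")
    return suggestions[start:end][:limit]

print(top_matches("al"))  # ['alabama', 'alaska']
```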
While caching everything is certainly a good idea, your question about which data structure to use is an issue that wasn't fully answered here.
The best data structure for an autocomplete extender is a Trie.
You can find a good .NET article and code here.
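A minimal Python sketch of a trie with prefix completion, independent of any particular library:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def complete(self, prefix, limit=10):
        # Walk down to the prefix node, then collect words beneath it.
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        results, stack = [], [(node, prefix)]
        while stack and len(results) < limit:
            current, word = stack.pop()
            if current.is_word:
                results.append(word)
            for ch, child in current.children.items():
                stack.append((child, word + ch))
        return results

trie = Trie()
for w in ("car", "card", "care", "cat"):
    trie.insert(w)
print(trie.complete("car"))  # ['car', 'care', 'card']
```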