Getting the exact edited data from SQL Server - sql

I have two tables:
Articles(artID, artContents, artPublishDate, artCategoryID, publisherID)
ArticleUpdated(upArtID, upArtContents, upArtEditedData, upArtPublishDate, upArtCategory, upArtOriginalArticleID, upPublisherID)
A user logs in to the application and updates an article's contents in the artContents column. I want to:
Know which changes the user made to the article's contents.
Store both versions of the article, the original version and the edited version.
What should I do to accomplish these two tasks?
Are any changes to the tables necessary?
What is the query for getting the exact edited data of artContents?
(By "exact edited data" I mean the following: the column may hold 5000 characters, and the user may edit 200 characters in the middle or somewhere else. I want exactly those edited characters, as they were before the edit and as they are after it.)
Note: I am using ASP.NET with C# for development.

You are not going to be able to compute the exact edits using SQL alone. You need an algorithm such as the Unix diff for files (which works at the line level). At the character level, the algorithm would be some variation of the Levenshtein distance computation. If diff meets your needs, you could download it, write a stored procedure to call it, and then use it from the database. This would be rather expensive.
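As a rough illustration of the character-level comparison described above, here is a minimal sketch in Python using the standard library's difflib.SequenceMatcher. Note that difflib uses a longest-matching-block heuristic rather than Levenshtein distance, so the regions it reports are sensible but not guaranteed to be minimal. The function name edited_spans and the sample strings are hypothetical, and in an ASP.NET application the same comparison would be done in C# before the row is saved.

```python
from difflib import SequenceMatcher


def edited_spans(original: str, edited: str):
    """Return (operation, text_before, text_after) for every changed region."""
    # autojunk=False switches off the "popular character" heuristic, which
    # would otherwise kick in for long texts such as a 5000-character column.
    matcher = SequenceMatcher(None, original, edited, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tag is one of "replace", "delete", "insert"
            changes.append((tag, original[i1:i2], edited[j1:j2]))
    return changes


# Hypothetical before/after contents of the artContents column
before = "The quick brown fox jumps over the lazy dog."
after = "The quick red fox jumps over the sleepy dog."

for op, old, new in edited_spans(before, after):
    print(op, repr(old), "->", repr(new))
```

Each tuple could be stored alongside the edited row (for example in a column such as upArtEditedData), so the application never has to recompute the difference.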
The part of your question about maintaining the different versions is much easier. I would add two columns, EffDate and EndDate, to each record. You can get the most recent version by looking for the row where EndDate IS NULL, and you can find the version active at any given time by looking for the row whose EffDate/EndDate range contains that time. MERGE is generally useful for maintaining such a table.
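The EffDate/EndDate idea can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, the table name ArticleVersions and the helper functions are hypothetical, and it uses a plain UPDATE followed by an INSERT instead of MERGE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ArticleVersions (
        artID       INTEGER NOT NULL,
        artContents TEXT    NOT NULL,
        EffDate     TEXT    NOT NULL,  -- when this version became active
        EndDate     TEXT               -- NULL means "current version"
    )
    """
)


def save_version(conn, art_id, contents, now):
    """Close the current version (if any) and insert the new one."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE ArticleVersions SET EndDate = ? "
            "WHERE artID = ? AND EndDate IS NULL",
            (now, art_id),
        )
        conn.execute(
            "INSERT INTO ArticleVersions (artID, artContents, EffDate, EndDate) "
            "VALUES (?, ?, ?, NULL)",
            (art_id, contents, now),
        )


def current_version(conn, art_id):
    """Return the contents of the row whose EndDate is NULL."""
    row = conn.execute(
        "SELECT artContents FROM ArticleVersions "
        "WHERE artID = ? AND EndDate IS NULL",
        (art_id,),
    ).fetchone()
    return row[0] if row else None


def version_at(conn, art_id, when):
    """Return the contents that were active at the given ISO timestamp."""
    row = conn.execute(
        "SELECT artContents FROM ArticleVersions "
        "WHERE artID = ? AND EffDate <= ? AND (EndDate IS NULL OR EndDate > ?)",
        (art_id, when, when),
    ).fetchone()
    return row[0] if row else None


# Hypothetical sample data; ISO-8601 strings sort chronologically
save_version(conn, 1, "Original text", "2024-01-01T10:00:00")
save_version(conn, 1, "Edited text", "2024-02-01T09:30:00")
print(current_version(conn, 1))
print(version_at(conn, 1, "2024-01-15T00:00:00"))
```

With this layout a single table holds every version, so a separate ArticleUpdated table may not be needed.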

Basically, this type of requirement needs custom logging.
In the example you gave (a column holding 5000 characters, of which the user edits 200 in the middle or somewhere else), there can also be the case where the user changes individual words in several different places in the text.
You can use NLog (http://nlog-project.org/) for logging; it's a fast and robust tool that we normally use for .NET logging.
Also take a look at:
http://www.codeproject.com/Articles/38756/Two-Simple-Approaches-to-WinForms-Dirty-Tracking
Asp.net Event for change tracking of entities
What would be the best way to implement change tracking on an object
The links above should clarify how to do it.
You will need to track and store every change.

Related

What ABAP object has been changed today?

Some functionality in a big project is broken on the development system.
Pretty sure it worked a few hours ago.
How do I find out which ABAP objects have been changed recently?
(I think I can guess the transport and the package that contains the change if that helps)
The nearest answer that I found is table VRSD.
It contains the date of the version of an object.
This doesn't help, since you need to export the transport or create a manual version to get an entry in this table.
So which objects have been changed without creating a new version?
(Yes we will find the change with functional checks, but knowing the changed objects would be a nice shortcut)
For code, table TRDIR has a changed-on date that is updated when the code is activated.
For data dictionary objects check the DD* tables. I know DD01L is domains and DD02L is tables. Both of these will have a change date. I'm sure there are others for the other data types.
There is also the table REPOLOAD which contains the ABAP byte code. There are 3 fields UDAT, UTIME and UNAME for date, time and user who did the last generation (PS: don't be confused by SDAT and STIME fields).

Large results set from Oracle SELECT

I have a simple statement, pasted below, that is called against an Oracle database. The result set contains names of businesses, but it has 24,000 rows, and these are displayed in a drop-down list.
I am looking for ideas on ways to reduce the result set to speed up the data returned to the user interface, maybe something like Google's search or a completely different idea. I am open to whatever thoughts and any direction is welcome.
SELECT BusinessName FROM MyTable ORDER BY BusinessName;
Idea:
SELECT BusinessName FROM MyTable WHERE BusinessName LIKE 'A%';
I know that LIKE clauses are often considered unwise, but as I said, this is a LARGE result set. Maybe something along the lines of a binary search?
The last query can perform horribly. String comparisons inside the database can be very slow, and depending on the number of hits they can be a huge drag on performance. If that doesn't concern you, that's fine. This is especially true if the company data isn't normalized into its own table.
As long as the user knows the company he's looking for, an effective mechanism would be an existing component from a popular JavaScript library that provides a search text field with a dynamic dropdown showing the matching results. You might want to use '%A%' if users may search for part of a name. For example, if I'm looking for IBM Rational, LLC, do I want it to show up in the results when I search for "Rational"?
Either way, watch your performance and, if it makes sense, cache that data in the company lookup service that sits on the server in front of the DB. Also, make sure you don't respond to every keystroke; use a timeout of 500 ms or so to let the user type several characters before going to the server and searching. I would NOT recommend bringing all of the company names to the client. We're always looking to reduce the size and frequency of round trips from the browser page to the server. Waiting for 24k company names to come down to the client when the form loads (or even behind the scenes) seems less efficient to me than shorter, quicker, very specific queries that perform sufficiently well. Again, test it and identify the performance characteristics that fit your use case best.
These are techniques I've used on projects with large data sets, such as searching for a user in a base of 100,000+ users. Our code was a custom Dojo widget (dijit). I'm not seeing how to do it directly with the dijit code, but jQuery UI provides the autocomplete widget.
Also limit the rows returned by this query for the text field, so that the drop-down shows only a subset of all the matches, forcing the user to refine the query further. Oracle does not support a LIMIT clause; on Oracle 12c and later you can use FETCH FIRST instead (on older versions, filter on ROWNUM in an outer query):
SELECT BusinessName FROM MyTable ORDER BY BusinessName FETCH FIRST 10 ROWS ONLY;
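The caching and limiting advice above, together with the binary search idea from the question, could be sketched roughly like this. It is a minimal Python illustration, not a definitive implementation: the class name NameIndex and the sample names are hypothetical, and it assumes the lookup service has already loaded the business names from the database once. It only handles prefix matches (the 'A%' case), not substring matches such as '%A%'.

```python
from bisect import bisect_left


class NameIndex:
    """In-memory, case-insensitive prefix lookup over a sorted list of names."""

    def __init__(self, names):
        # Sort once by the lowercased name so every lookup can binary-search.
        self._entries = sorted((name.lower(), name) for name in names)
        self._keys = [key for key, _ in self._entries]

    def search(self, prefix, limit=10):
        """Return at most `limit` names starting with `prefix`, in sorted order."""
        key = prefix.lower()
        # First position whose key is >= the prefix; all matches follow it.
        start = bisect_left(self._keys, key)
        results = []
        for lowered, original in self._entries[start:start + limit]:
            if not lowered.startswith(key):
                break  # past the block of matching names
            results.append(original)
        return results


# Hypothetical sample data standing in for the 24,000 business names
index = NameIndex(
    ["IBM Rational, LLC", "Acme Corp", "Ibex Ltd", "Apple Inc", "Initech"]
)
print(index.search("i", limit=2))
```

Each lookup costs one binary search plus at most `limit` comparisons, so the server can answer a debounced keystroke request without touching the database.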

xwiki/velocity recent changes

By default, the recent-pages code that I can find does not do what I want.
How can I get
a MediaWiki-like version of recent changes
-and/or-
the last 10 changed pages
preferably using Velocity?
Many greetings
The code is probably based on a database query that fetches the latest pages, which means you can easily limit the number of results, either through a $xwiki.search method parameter (see http://tinyurl.com/7r8od94 for an example) or, better, by using setLimit if you are using the new query service (see http://tinyurl.com/7y99smg).
If you can point me to the exact code you are talking about I can probably give you more details on what to modify.

django objects...values() select only some fields

I'm optimizing the memory load (~2GB, offline accounting and analysis routine) of this line:
l2 = Photograph.objects.filter(**(movie.get_selectors())).values()
Is there a way to convince django to skip certain columns when fetching values()?
Specifically, the routine obtains all rows of the table matching certain criteria (db is optimized and performs it very quickly), but it is a bit too much for python to handle - there is a long string referenced in each row, storing the urls for thumbnails.
I only really need three fields from each row, but, if all the fields are included, it suddenly consumes about 5kB/row which sadly pushes the RAM to the limit.
The values(*fields) function allows you to specify which fields you want.
Check out the QuerySet method only(). When you declare that you want only certain fields to be loaded immediately, the QuerySet will not pull in the other fields of your object until you try to access them.
If you have to deal with ForeignKeys that must also be fetched up front, then also check out select_related().
The two links above to the Django documentation have good examples that should clarify their use.
Take a look at Django Debug Toolbar. It comes with a debugsqlshell management command that lets you see the SQL queries being generated, along with the time taken, as you play around with your models in a Django/Python shell.

Database vs. Front-End for Output Formatting

I've read that (all things equal) PHP is typically faster than MySQL at arithmetic and string manipulation operations. This being the case, where does one draw the line between what one asks the database to do versus what is done by the web server(s)? We use stored procedures exclusively as our data-access layer. My unwritten rule has always been to leave output formatting (including string manipulation and arithmetic) to the web server. So our queries return:
unformatted dates
null values
no calculated values (i.e. return values for columns "foo" and "bar" and let the web server calculate foo*bar if it needs to display value foobar)
no substring-reduced fields (except when shortened field is so significantly shorter that we want to do it at database level to reduce result set size)
two separate columns to let front-end case the output as required
What I'm interested in is feedback about whether this is generally an appropriate approach or whether others know of compelling performance/maintainability considerations that justify pushing these activities to the database.
Note: I'm intentionally tagging this question to be dbms-agnostic, as I believe this is an architectural consideration that comes into play regardless of one's specific dbms.
I would draw the line according to how easily certain layers could be swapped out for other implementations. It's very likely that you will never use a different RDBMS or have a mobile version of your site, but you never know.
The more orthogonal a data point is, the closer it should be to being released from the database in that form. If on every theoretical version of your site your values A and B are rendered A * B, that should be returned by your database as A * B and never calculated client side.
Let's say you have something that's format heavy like a date. Sometimes you have short dates, long dates, English dates... One pure form should be returned from the database and then that should be formatted in PHP.
So the orthogonality point works in reverse as well. The more dynamic a data point is in its representation/display, the more it should be handled client side. If a string A is always taken as a substring of the first six characters, then have that be returned from the database as pre-substring'ed. If the length of the substring depends on some factor, like six for mobile and ten for your web app, then return the larger string from the database and format it at run time using PHP.
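The mobile-versus-web case above can be made concrete with a small sketch. It is written in Python rather than PHP purely for illustration, and the row contents, the client names, and the CLIENT_RULES table are all hypothetical: the database returns one pure form of each value, and the front end applies the client-specific substring length and date format.

```python
from datetime import date

# Hypothetical raw row, as it might come back unformatted from the database
row = {"title": "International shipping rates", "published": date(2024, 3, 9)}

# Hypothetical per-client display rules (six characters for mobile, ten for web)
CLIENT_RULES = {
    "mobile": {"title_len": 6, "date_fmt": "%d/%m/%y"},
    "web": {"title_len": 10, "date_fmt": "%Y-%m-%d"},
}


def render(row, client):
    """Format one raw row for the given client without changing the query."""
    rules = CLIENT_RULES[client]
    return {
        "title": row["title"][: rules["title_len"]],
        "published": row["published"].strftime(rules["date_fmt"]),
    }


print(render(row, "mobile"))
print(render(row, "web"))
```

Adding a third client here means adding one entry to the rules table, while the stored procedure that returns the row stays untouched.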
Usually, data formatting is better done on the client side, especially culture-specific formatting.
Dynamic pivoting (i.e. variable columns) is another example of what is better done on the client side.
When it comes to string manipulation and dynamic arrays, PHP is far more powerful than any RDBMS I'm aware of.
However, data formatting can use additional data that is also kept in the database. For example, the coloring info for each row can be stored in an additional table.
You should then match the color to each row on the database side, but wrap it in the tags on the PHP side.
The rule of thumb is: retrieve everything you need for formatting in as few database round-trips as possible, then do the formatting itself on the client side.
I believe in returning the data pretty much as-is from the database and letting it be formatted on the front end instead. I don't stick to it religiously, but in general I think it's better because it provides greater flexibility: for example, one sproc can serve n different requirements for data, each of which can format the data as it needs. Otherwise, you end up with multiple queries returning the same data with slightly different formatting from the DB (which, from a SQL Server point of view, reduces the benefits of execution plan caching and therefore hurts performance).
Leave output formatting to the web server