What do the entries in Lamport clock representations represent?

I'm trying to understand an illustrative example of how Lamport's algorithm is applied. In the course I'm taking, we were shown two representations of the clocks of three [distant] processes, one with the Lamport algorithm applied and one without.
Without the Lamport algorithm:
With the Lamport algorithm applied:
My question concerns the validity of the change applied to the third entry of the table for process P1. Shouldn't it be max(2, 2) + 1 = 3, as the Lamport algorithm instructs, rather than 4?
When I asked some of my classmates about this, one of them told me that the third entry of P1's table represents a "local" event within P1, so when message A arrives the entry is updated to max(2, 3) + 1, which is 4. However, if that were the case, shouldn't the receipt of the message be represented by an entry of its own, instead of sharing the entry that represents the local event within P1?
Upon further investigation, I found in the same course material a figure taken from Tanenbaum's Distributed Systems: Principles and Paradigms, in which the value of an entry corresponding to the receipt of a message is computed by adding 1 to the maximum of the previous entry in the same table and the timestamp of the received message, as shown below. This is quite different from what was done in the first illustration.
I'm unsure whether the problem stems from a misunderstanding of the algorithm on my part, or whether the two illustrations use different conventions for what the entries represent.

validity of the change that was applied to the third entry of the table pertaining to the process P1
In the classical Lamport algorithm, there is no need to increment the local counter before taking the max. Doing so still works, but it is a redundant operation. In the second example, all events are still properly ordered. In general, as long as the counters only go up and every receive ends up with a larger timestamp than the matching send, the algorithm works.
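To make the two conventions concrete, here is a minimal Python sketch of a Lamport clock with both receive rules. The class and method names are hypothetical. `receive_classic` is the textbook rule (max of the local clock and the message timestamp, plus one), while `receive_increment_first` assumes the first illustration bumps the local clock before taking the max. With P1 at 2 and message A stamped 2, they give 3 and 4 respectively, and both results are larger than the send timestamp, which is all the ordering needs.

```python
class LamportClock:
    """Minimal Lamport logical clock for a single process."""

    def __init__(self) -> None:
        self.time = 0

    def local_event(self) -> int:
        # Every local event, including a send, advances the clock by one.
        self.time += 1
        return self.time

    def send(self) -> int:
        # The message carries the clock value after the send event.
        return self.local_event()

    def receive_classic(self, msg_time: int) -> int:
        # Textbook rule: max(local, message) + 1.
        self.time = max(self.time, msg_time) + 1
        return self.time

    def receive_increment_first(self, msg_time: int) -> int:
        # Assumed variant: bump the local clock first, then max + 1.
        # Redundant, but the result is still greater than msg_time.
        self.time += 1
        self.time = max(self.time, msg_time) + 1
        return self.time


# Numbers from the question: P1's clock is 2 and message A carries 2.
sender = LamportClock()
sender.local_event()   # P0 clock -> 1
msg_a = sender.send()  # P0 clock -> 2, message A stamped 2

p1_classic = LamportClock()
p1_classic.time = 2
p1_variant = LamportClock()
p1_variant.time = 2

print(p1_classic.receive_classic(msg_a))          # 3 = max(2, 2) + 1
print(p1_variant.receive_increment_first(msg_a))  # 4 = max(3, 2) + 1
```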
Another way of checking correctness is to rebuild the total order manually. The hard requirement is that if an event A happens before an event B, then A is placed before B in the total order. In both pictures 2 and 3, this holds.
Let's look at picture 2. Event X (the second cell of P0) happens before event Y (the third cell of P1). To make sure X comes before Y in the total order, Y's timestamp must be larger than X's, and it is. It doesn't matter whether the difference is 1, 2 or 100.
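The manual check can also be sketched in Python, assuming ties between equal timestamps are broken by process id (a common convention, not something the pictures state). `total_order` and `respects` are hypothetical helpers: the first sorts events by (timestamp, process id), the second confirms that every known happened-before pair ends up in the right order. Changing Y's timestamp from 3 to 4 to 100 leaves the result valid.

```python
from typing import NamedTuple


class Event(NamedTuple):
    name: str
    process: int  # process id, used only to break timestamp ties
    time: int     # Lamport timestamp


def total_order(events):
    # Sort by Lamport time, breaking ties by process id.
    return [e.name for e in sorted(events, key=lambda e: (e.time, e.process))]


def respects(order, happened_before):
    # True if, for every known pair (a, b) with a -> b, a comes first.
    pos = {name: i for i, name in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in happened_before)


# X (second cell of P0) happens before Y (third cell of P1).
# W is a hypothetical concurrent event on P2 with the same time as X.
edges = [("X", "Y")]
for y_time in (3, 4, 100):  # the size of the gap does not matter
    order = total_order([Event("X", 0, 2), Event("Y", 1, y_time), Event("W", 2, 2)])
    print(y_time, order, respects(order, edges))
```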
in which the new values of an entry that corresponds to the receipt of a message is updated by adding 1 to the max of the entry before it in the same table and the timestamp of the received message, as shown below, which is quite different from what was performed in the first illustration
It's actually pretty much the same logic, with the exception of incrementing the local counter before taking the max. Generally speaking, every process has its own clock, and every event increases that clock by one. The only exception is when the clock of another process is already ahead; then taking the max is required to make sure all events end up in the correct total order. So, in the third picture, P2 adjusts its clock (taking the max) because P3 is far ahead. The same goes for P1's adjustment.

Related

Reducing database load from consecutive queries

I have an application which calls the database multiple times to achieve one simple goal.
A little information about this application: in short, it scrapes data from a webpage and stores specific information from that page in a database. The important fields are player name, position, kill points and class:
Player name: may change or stay the same on any given day.
Position: several players can share the same position.
Kill points: may increase or stay the same each day.
Class: there are only two possibilities for a given name. For example, A can change to B or remain A (and vice versa), but can never become C, D, E or F.
A player's name can change on any day, and the position can also change depending on how much the kill points have increased since the last update, which brings me back to the goal: search the database day by day, from the current date back to 2021-02-22, starting at the most recent entry for a player name and backtracking to the previous day to check whether that player name is still the same or has changed.
The main reference for detecting a change is the kill points. As the days go on, this number either stays exactly the same or increases; it can never decrease.
So now onto the implementation of this application.
The first query finds the most recent entry for the player name:
SELECT TOP(1) * FROM [changes] WHERE [CharacterName]=#charname AND [Territory]=#territory AND [Archived]=0 ORDER BY [Recorded] DESC
It then checks the previous days' entries with the following query:
SELECT TOP(1) * FROM [changes] WHERE [Territory]=#territory AND [CharacterName]=#charname AND [Recorded]=#searchdate AND ([Class] LIKE '%{Class}%' OR [Class] LIKE '%{GetOpposite(Class)}%') AND [Archived]=0
If no results are found, it then looks for an alternative name with the following query:
SELECT TOP(5) * FROM [changes] WHERE [Kills] <= #kills AND [Recorded]='{Data.Recorded.AddDays(-1):yyyy-MM-dd}' AND [Territory]=#territory AND [Mode]=#mode AND ([Class] LIKE #original OR [Class] LIKE #opposite) AND [Archived]=0 ORDER BY [Kills] DESC
The aim of the query above is to get the top 5 entries that are the closest possible matches, and then cross-reference each of them with the day ahead:
SELECT COUNT(*) FROM [changes] WHERE [CharacterName]=#CharacterName AND [Territory]=#Territory AND [Recorded]=#SearchedDate AND [Archived]=0
When checking the day ahead, if a candidate's character name is not found there, it is considered to be the old player name for this character. If all 5 candidates turn out to be present in the day-ahead searches, the name is considered to be new to the table.
Going back from today's date to the date this application started running takes over 400 individual queries on the database to achieve one goal.
It is also worth noting that this table grows by 14,400 to 14,500 rows every day.
The overall question: is it possible to combine these queries into fewer calls to the database, reducing the number of queries and improving performance?
What you can do to improve performance depends on which parts of the application stack you can change. Things to try:
Store less data - Retrieval speed largely depends on how well the database is organized/normalized and on how much data each query has to search. Keeping a cache of previously scraped pages and only storing data when something has changed between the current scrape and the last one would guarantee fewer redundant requests to the database.
Separate specific classes of data - Splitting data into dedicated tables would let you query a specific table for a specific character, etc., effectively removing one WHERE clause.
Reduce time between queries - Fewer incoming concurrent requests means less resource contention and faster responses to earlier requests.
Use another data structure - The only reason you're using TOP() is that you need the data ordered in a specific way (most recent, etc.). If you used an in-code data structure that keeps the data ordered and still easy to query, you could offload some SQL requests from the database to that structure.
The suggestions above are not exhaustive, but what you do to improve performance is largely a function of what in the application stack you have the ability to modify.
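As one illustration of the "use another data structure" suggestion, here is a rough Python sketch that fetches a territory's history in a single query and then does the day-by-day backtracking in memory, instead of issuing one query per day. It uses `sqlite3` only so that it runs standalone. The column names follow the question, but `load_history`, `trace_names`, the exact-match class check (instead of `LIKE`) and stopping at the first day with no match are all assumptions. With 14,000+ new rows a day, a real version would probably limit the date range or the columns fetched.

```python
import sqlite3
from collections import defaultdict
from datetime import date, timedelta


def load_history(conn, territory):
    # One round trip: every non-archived row for the territory.
    rows = conn.execute(
        "SELECT CharacterName, Kills, Recorded, Class FROM changes "
        "WHERE Territory = ? AND Archived = 0",
        (territory,),
    ).fetchall()
    by_day = defaultdict(dict)  # date -> {name: (kills, class)}
    for name, kills, recorded, cls in rows:
        by_day[date.fromisoformat(recorded)][name] = (kills, cls)
    return by_day


def trace_names(by_day, name, start, classes, stop):
    # Walk back one day at a time, following renames, entirely in memory.
    # `classes` is the set {class, opposite class}.
    history = [(start, name)]
    day, kills = start, by_day[start][name][0]
    while day > stop:
        prev = day - timedelta(days=1)
        entries = by_day.get(prev, {})
        if name in entries and entries[name][1] in classes:
            kills = entries[name][0]
        else:
            # Top 5 closest candidates: kills can only stay the same or grow.
            candidates = sorted(
                ((k, n) for n, (k, c) in entries.items() if k <= kills and c in classes),
                reverse=True,
            )[:5]
            # A candidate missing from the day ahead is taken as the old name.
            old = next((n for _, n in candidates if n not in by_day[day]), None)
            if old is None:
                break  # the name is treated as new to the table
            name, kills = old, entries[old][0]
        history.append((prev, name))
        day = prev
    return history


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE changes (CharacterName TEXT, Territory TEXT, Kills INTEGER, "
             "Recorded TEXT, Class TEXT, Archived INTEGER)")
conn.executemany("INSERT INTO changes VALUES (?, ?, ?, ?, ?, 0)", [
    ("OldName", "T1", 90, "2021-02-22", "A"),
    ("Other", "T1", 95, "2021-02-22", "B"),
    ("NewName", "T1", 100, "2021-02-23", "A"),
    ("Other", "T1", 96, "2021-02-23", "B"),
])
days = load_history(conn, "T1")
print(trace_names(days, "NewName", date(2021, 2, 23), {"A", "B"}, date(2021, 2, 22)))
```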

Users updating same row at the same time SQL Server

I want to create a SQL Server table that has Department and Maximum Capacity columns (assume 10 for this scenario). When users add themselves to a department, the system will check the current assignment count for the department (assume 9 for this scenario) and compare it to the maximum value. If it is below the maximum, they will be added.
The issue is this: what if two users submit at the same time, and when the code retrieves the current assignment count, it is 9 for both? One user updates the row first, so it is now 10, but the other user has already retrieved the previous value (9) before that update, so both pass the comparison and we end up with 11 users in the department.
Is this even possible and how can one solve it?
The answer to your problem lies in understanding "Database Concurrency" and then choosing the correct solution to your specific scenario.
It's too large a topic to cover in a single SO answer, so I would recommend doing some reading and coming back with specific questions.
However, in simple terms, you either reserve the assignments for the first person who tries to obtain them (pessimistic locking), or you throw an error when someone tries to assign beyond the limit (optimistic locking).
In the pessimistic case, you then need a way to release the reservations if the user fails to complete the transaction, e.g. a timeout. It is a bit like a ticket booking website that says "These tickets are being held for you for the next 10 minutes, you must complete your booking within that time else you may lose them".
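The hold-with-timeout idea can be sketched as a small in-memory Python model. Everything here is hypothetical: `Department`, `hold` and `confirm` are illustrative names, and the clock is injectable only so that expiry can be demonstrated. In a real multi-user system the holds would have to live in the database (for example as rows with an expiry timestamp) so that every server sees them.

```python
import time


class Department:
    """Hypothetical in-memory model of held and confirmed places."""

    def __init__(self, capacity, hold_seconds=600, clock=time.monotonic):
        self.capacity = capacity
        self.hold_seconds = hold_seconds
        self.clock = clock      # injectable so that expiry can be tested
        self.confirmed = set()
        self.holds = {}         # user -> expiry time

    def _expire(self):
        # Drop holds whose time has run out, releasing those places.
        now = self.clock()
        self.holds = {u: t for u, t in self.holds.items() if t > now}

    def hold(self, user):
        # Pessimistic step: reserve a place up front, if one is free.
        self._expire()
        if user in self.holds or user in self.confirmed:
            return True
        if len(self.confirmed) + len(self.holds) >= self.capacity:
            return False
        self.holds[user] = self.clock() + self.hold_seconds
        return True

    def confirm(self, user):
        # Only a user whose hold has not expired can complete the assignment.
        self._expire()
        if user not in self.holds:
            return False
        del self.holds[user]
        self.confirmed.add(user)
        return True


now = [0.0]
dept = Department(capacity=1, hold_seconds=600, clock=lambda: now[0])
print(dept.hold("alice"))     # True: the last place is held for alice
print(dept.hold("bob"))       # False: turned away while the hold is active
now[0] = 601                  # alice never completes; her hold times out
print(dept.hold("bob"))       # True: the place is released to bob
print(dept.confirm("alice"))  # False: alice's hold has expired
```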
And when you're down to the last few places, you are going to be turning away everyone after the first... there is no way around it if you require this level of locking. (Well, you could then create a waiting list, but that's another issue in itself.)

How to make the Automatic Record Permission field to update itself as quickly as possible?

If you are working with access control, you must have faced the issue where the Automatic Record Permission field (with Rules) does not update itself on recalculating the record. You either have to launch full recalculation or wait for a considerable amount of time for the changes to take place.
I am facing this issue where based on 10 different field values in the record, I have to give read/edit access to 10 different groups respectively.
For instance:
if rule 1 is true, give edit access to 1st group of users
if rule 1 and 2 are true, give edit access to 1st AND 2nd group of users.
I have selected 'No Minimum' and 'No Maximum' in the Auto RP field.
How to make the Automatic Record Permission field to update itself as quickly as possible? Am I missing something important here?
If you are working with access control, you must have faced the issue where the Automatic Record Permission field (with Rules) does not update itself on recalculating the record. You either have to launch full recalculation or wait for a considerable amount of time for the changes to take place.
Tanveer, in general, this is not a correct statement. You should not face this issue with [a] well-designed architecture (relationships between your applications) and [b] correct calculation order within the application.
Regarding the case you described, I suggest you check and review the following possibilities:
1. Calculation order. Automatic Record Permissions [ARP from here on] are treated by the Archer platform in the same way as calculated fields. This means that you can modify the calculation order in which calculated fields and automatic record permissions are updated when you save the record. So it is possible that your ARP field is calculated before certain calculated fields that you use in the ARP rules. For example, let's say you have two rules in the ARP field:
if A>0 then group AAA
if B>0 then group BBB
Now, you will have a problem if calculation order is the following:
"ARP", "A", "B"
ARP will not be updated after you click "Save" or "Apply" once, but it will be updated after you click "Save" or "Apply" twice within the same record. With calculation order "A", "B", "ARP", your ARP will be recalculated right away.
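The effect of calculation order can be shown with a toy Python model of a single save, in which each calculated field is evaluated once, in order, using whatever values the record holds at that moment. This is not Archer's API or internals. `A`, `B`, `ARP`, `AAA` and `BBB` follow the example above, while `save`, `new_record` and the `_input` fields are hypothetical.

```python
def save(record, calc_order, rules):
    # One save: each calculated field is evaluated once, in calc_order.
    for field in calc_order:
        if field == "ARP":
            # ARP grants every group whose rule holds right now.
            record["ARP"] = {group for test, group in rules if test(record)}
        else:
            # Hypothetical: A and B are calculated from user-entered inputs.
            record[field] = record[field + "_input"]
    return sorted(record["ARP"])


rules = [(lambda r: r["A"] > 0, "AAA"), (lambda r: r["B"] > 0, "BBB")]


def new_record():
    # A and B still hold 0 from before; the new inputs make both positive.
    return {"A": 0, "B": 0, "A_input": 5, "B_input": 3, "ARP": set()}


bad = new_record()
print(save(bad, ["ARP", "A", "B"], rules))   # []: ARP saw the old A and B
print(save(bad, ["ARP", "A", "B"], rules))   # ['AAA', 'BBB'] on the second save

good = new_record()
print(save(good, ["A", "B", "ARP"], rules))  # ['AAA', 'BBB'] on the first save
```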
2. Full recalculation queue.
Since ARPs are treated as calculated fields, every time an ARP needs to be updated, recalculation job(s) are created on the application server on the back end. If for some reason the calculation queue is full, the record permission will not be updated right away. The job engine's recalculation queue can be full if a data feed is running or if a massive number of recalculations has been triggered via manual data imports. The recalculation job for the ARP update will be created and added to the queue, and it will be processed according to the priorities defined for the job queue. In Archer v5.5 you can monitor the job queue and alter the default job processing priorities via the Archer Control Panel interface. I suggest you check the job queue state the next time you see delays in ARP recalculations.
3. "Avalanche" of recalculations
It is important to design relationships and security inheritance between your applications so recalculation impact is minimal.
For example, let's say we have a Contacts application and a Department application. A record in the Contacts application inherits access from the Department record using Inherited Record Permission. The Department record has an automatic record permission, and the Contacts record inherits it. Now the best part: Department D1 has 60 000 Contacts records linked to it, and Department D2 has 30 000. The problem you described is reproducible in this configuration. Say I go to Department record D1 and update it in a way that forces its ARP to recalculate. This adds 60 000 jobs to the job engine queue to recalculate the 60k Contacts linked to D1. Now, without waiting, I go to D2 and make a change that forces its ARP to recalculate. After I save D2, new jobs to recalculate D2 and its 30 000 Contacts records are created in the job engine queue. But D2 will not be recalculated instantly, because the first set of 60k records has not been recalculated yet, and the recalculation of D2 is still sitting in the queue.
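The queueing delay in this example can be illustrated with a strict first-in, first-out Python model. This is a simplification: the real job engine applies the priorities mentioned in point 2, and the one-job-per-record granularity and the job names are assumptions. It only shows why D2 waits behind roughly 60,000 D1 jobs.

```python
from collections import deque


def enqueue_department(queue, department, linked_count):
    # Assumed: one job for the department record plus one per linked contact.
    queue.append(department)
    queue.extend(f"{department}-contact-{i}" for i in range(linked_count))


def jobs_ahead_of(queue, job):
    # Strict FIFO: the number of jobs that must finish before `job` runs.
    for position, queued in enumerate(queue):
        if queued == job:
            return position
    raise ValueError(f"{job} is not queued")


queue = deque()
enqueue_department(queue, "D1", 60_000)  # D1 saved first
enqueue_department(queue, "D2", 30_000)  # D2 saved right after, without waiting
print(jobs_ahead_of(queue, "D2"))        # 60001
print(len(queue))                        # 90002
```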
Unfortunately, there is not a good solution available at this point. However, this is what you can do:
- review and minimize inheritance
- review and minimize relationships where one record references 1000+ records.
- modify the architecture: break inheritance and relationships and replace them with Archer-to-Archer data feeds if possible.
- add more "recalculation" power to your application server(s). You can configure your web servers to process recalculation jobs as well if they are not fully utilized. Add more job slots.
Tanveer, I hope this helps. Good luck!

Dynamically filtering large query result for presentation in SSRS

We have a system that records data to an SQL Server DB captured from field equipment every minute. This data is used for a number of purposes, one of which is for charting in reports via SSRS.
The issue is that with such a high volume of data, when a report is run for a period of, for example, 3 months, the volume of data returned causes excessive report rendering times.
I've been thinking of finding a way of dynamically reducing the amount of data returned, based on the start and end times chosen. Something along the lines of a sliding scale where, based on the duration between the start and end of the period, I can apply different levels of filtering, so that more filtering occurs for longer periods and less or no filtering for shorter ones.
There is still a need to be able to produce higher resolution (as in more data points returned) reports for troubleshooting purposes.
For example:
Scenario 1:
User is executing a report for a period of 3 months. Result set returned by the query is reduced for performance reasons without adversely affecting what information the user wants to see (the chart is still representative of the changes over time).
Scenario 2:
User executes the report for a period of 1 hour, in order to look for potential indicator(s) of problems with field devices while troubleshooting the system. For this short time period, no filtering is applied.
My first thought was to use a modulo operation on the primary key of the data (which is an identity field), whereby the divisor is chosen depending on the difference between the start and end dates.
For example, if the difference between the start and end dates of the report period is 5 weeks, choose a divisor of 5, apply a mod to the PK, and select the rows where the result equals zero.
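Here is a minimal Python sketch of the modulo approach described above. The divisor mapping (one step per whole week, minimum 1) is a hypothetical reading of the 5-weeks-to-5 example, and the rows stand in for a table with a contiguous identity key and one reading per minute. In the report query, the filter would be `WHERE id % divisor = 0`.

```python
from datetime import datetime, timedelta


def choose_divisor(start, end):
    # Hypothetical mapping: one step per whole week in the range, so
    # 5 weeks -> 5; anything under a week -> 1 (no filtering).
    return max(1, (end - start) // timedelta(weeks=1))


def downsample(rows, start, end):
    # rows are (id, timestamp, value) tuples; keep rows in the range whose
    # identity key is a multiple of the divisor.
    divisor = choose_divisor(start, end)
    return [r for r in rows if start <= r[1] < end and r[0] % divisor == 0]


t0 = datetime(2024, 1, 1)
# One reading per minute for 5 weeks, ids 1..N as an identity column gives.
rows = [(i + 1, t0 + timedelta(minutes=i), 0.0) for i in range(5 * 7 * 24 * 60)]
print(choose_divisor(t0, t0 + timedelta(weeks=5)))         # 5
print(len(downsample(rows, t0, t0 + timedelta(weeks=5))))  # 10080
print(choose_divisor(t0, t0 + timedelta(hours=1)))         # 1
```

One thing worth checking: the sample is only evenly spaced in time if the identity values are contiguous and each row is one minute of one device. Gaps from deleted rows, or several devices interleaved in one table, would skew which points survive.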
I would love to get feedback as to whether this sounds like a valid approach or whether there is a better way to do this.
Thanks.

track sales for week/month and find the best sellers

Let's say I have a website that sells widgets. I would like to do something similar to a tag cloud that tracks best sellers. However, because we are constantly acquiring and selling new widgets, I would like the sales to decay on a weekly time scale.
I'm having trouble working out how to store and manipulate this data and have it decay properly over time, so that something that was an ultra-hot item 2 months ago but has since tapered off doesn't show up at the top of the list above the current best sellers. What would be the logic and database design for this?
Part 1: You have to have tables storing the data that you want to report on. Date/time sold is obviously key. If you need to work in decay factors, that raises the question: for how long is the data good and/or relevant? At what point has the "value" of the data decayed so much that you no longer care about it? When this point is reached for any given entry in the database, what do you do: keep it there but ensure it gets factored out of all subsequent computations? Or do you archive it, copying it to a "history" table and deleting it from your main "sales" table? This is relevant, as it has to be factored into your decay formula (as well as your capacity planning, annual reporting requirements, and who knows what else).
Part 2: How much thought has been given to the decay formula that you want to use? There's no end of detail you can work into this. Options and factors to wade through include but are not limited to:
Simple age-based. Everything sold after the cutoff date counts as 1; everything sold before it counts as 0. Sum and you're done.
What's the cutoff date? Precisely 14 days ago, to the minute? Midnight two Saturdays ago?
Does the cutoff date depend on the item that was sold? If some items are hot but some are not, does that affect things? What if you want to emphasize some things (the expensive/hard to sell ones) over others (the fluff you'd sell anyway)?
Simple age-based decays are trivial, but can be insufficient. Time to go nuclear.
Perhaps you want some kind of half-life, Dr. Freeman?
Everything sold is "worth" X, where the value of X is either always the same or varies on the item sold. And the value of X can decay over time.
Perhaps the value of X decreases by one-half every week. Or every day. Or every month. Or (again) it may vary depending on the item.
If you use half-lives, the value of X never actually reaches zero, and you're stuck tracking it forever (which is why I wrote "part 1" first). At some point, you probably need some kind of cut-off, a point after which you just don't care. X has decreased to one-tenth of its initial value? Three months have passed? Either/or, but with the "range" depending on the inherent value of the item?
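Both decay styles can be sketched in a few lines of Python. The function names, the 14-day window, the 7-day half-life, the 90-day cut-off and the per-sale worth of 1.0 are illustrative assumptions, not values from the answer. The example compares an item that sold 50 units two months ago with one selling steadily over the last ten days.

```python
from datetime import datetime, timedelta


def window_score(sales, now, days=14):
    # Simple age-based: a sale counts 1 inside the window, 0 outside it.
    cutoff = now - timedelta(days=days)
    return sum(1 for sold_at, _ in sales if sold_at >= cutoff)


def half_life_score(sales, now, half_life_days=7.0, max_age_days=90):
    # Each sale is worth X (its weight), and X halves every half_life_days.
    # Sales older than max_age_days are ignored, so old rows can be archived.
    total = 0.0
    for sold_at, worth in sales:
        age = (now - sold_at) / timedelta(days=1)
        if 0 <= age <= max_age_days:
            total += worth * 0.5 ** (age / half_life_days)
    return total


now = datetime(2024, 3, 1)
old_hit = [(now - timedelta(days=60), 1.0)] * 50                # 50 sales two months ago
current = [(now - timedelta(days=d), 1.0) for d in range(10)]   # one a day lately
print(window_score(old_hit, now), window_score(current, now))        # 0 10
print(half_life_score(old_hit, now) < half_life_score(current, now))  # True
```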
My real point here is that how you calculate your decay rate is far more important than how you store it in the database. As long as the data the formula needs for its calculations is there, you should be good. And if you only need the last month's data to do this, you should perhaps move everything older to some kind of archive table.
You could just count the sales for the last month/week/whatever, and sort your items by that count.
If you want, you can always add the total number of items sold into your formula.
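The simple counting approach maps directly onto a GROUP BY. This sketch uses Python's `sqlite3` with a hypothetical `sales(item, sold_at)` table that stores ISO dates, so that string comparison matches date order. The table, the data and the `best_sellers` helper are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, sold_at TEXT)")  # hypothetical schema
conn.executemany("INSERT INTO sales VALUES (?, ?)", [
    ("widget-a", "2024-02-05"), ("widget-a", "2024-02-06"), ("widget-a", "2024-02-07"),
    ("widget-b", "2024-02-27"), ("widget-b", "2024-02-28"),
    ("widget-c", "2024-02-29"),
])


def best_sellers(conn, since, limit=10):
    # Count only sales on or after `since`, then rank items by that count.
    return conn.execute(
        "SELECT item, COUNT(*) AS sold FROM sales WHERE sold_at >= ? "
        "GROUP BY item ORDER BY sold DESC, item LIMIT ?",
        (since, limit),
    ).fetchall()


print(best_sellers(conn, "2024-02-23"))  # the last week, relative to 2024-03-01
```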
You might have a table containing the definitions of the scoring criteria (most sales, most this, most that, etc.), then, for a given period, store in another table the points awarded for each criterion defined in the criteria table. A history table would then store the score of each seller for a given period or promotion, or whatever you want to call it.
Does it help a little?