How to constantly store coordinates in a DB and update/replace them on changes? - sql

I want to store GPS coordinates from any device in a DB on SQL Server, and check them in real time on a web page that constantly asks for positions.
I have seen other questions and answers (on Stack Overflow and Google), and everybody wants to add new rows (with the coordinates) to a table where the previous coordinates are already stored.
In my case, I don't want to store previous coordinates; I just want to know where the devices are NOW, so I see no point in adding new rows.
Therefore, the number of rows will remain constant.
Since I have two tables, DEVICES(idDevice, device) and COORDINATES(device, long, lat), every time a device sends a new position (let's say every second), its value will UPDATE the existing row, replacing the previous value.
My question: is this "continuous auto-replacement" technique the best way to do that, or is there a better way to update positions?
And, as a second question: is that the best way to structure the tables for what I want to do?

If you are definitely storing only one set of coordinates per device, then I would suggest you remove COORDINATES and use DEVICES(idDevice, device, long, lat). You must already be ensuring that a DEVICES row exists, so now you can simply UPDATE DEVICES SET long = xxx, lat = yyy WHERE idDevice = deviceId.
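If a device might report before its row exists, a minimal upsert sketch (assuming the merged DEVICES(idDevice, device, long, lat) layout above; parameter names are illustrative):

-- Update the device's current position, or create the row on its first report.
UPDATE DEVICES
SET    long = @long, lat = @lat
WHERE  idDevice = @idDevice;

IF @@ROWCOUNT = 0
    INSERT INTO DEVICES (idDevice, device, long, lat)
    VALUES (@idDevice, @device, @long, @lat);

At one update per device per second this keeps the table at one row per device; wrapping the two statements in a transaction (or using MERGE) avoids a race if two reports for the same device arrive at once.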

Related

What do the entries in Lamport clocks representations represent?

I'm trying to understand an illustrative example of how Lamport's algorithm is applied. In the course that I'm taking, we were presented with two representations of the clocks within three [distant] processes, one with the Lamport algorithm applied and the other without.
Without the Lamport algorithm:
With the Lamport algorithm applied:
My question concerns the validity of the change that was applied to the third entry of the table pertaining to the process P1. Shouldn't it be, as the Lamport algorithm instructs, max(2, 2) + 1, which is 3, not 4?
When I asked some of my classmates about this, one of them told me that the third entry of the table of P1 represents a "local" event that happened within P1, and so when message A arrives, the entry is updated to max(2, 3) + 1, which is 4. However, if that were the case, shouldn't the receipt of the message be represented in a new entry of its own, instead of being put in the same entry that represents the local event that happened within P1?
Upon further investigation, I found, in the same course material, a figure taken from Tanenbaum's Distributed Systems: Principles and Paradigms, in which the new value of an entry that corresponds to the receipt of a message is obtained by adding 1 to the max of the preceding entry in the same table and the timestamp of the received message, as shown below, which is quite different from what was done in the first illustration.
I'm unsure if the problem relates to a faulty understanding that I have regarding the algorithm, or to the possibility that the two illustrations are using different conventions with respect to what the entries represent.
validity of the change that was applied to the third entry of the table pertaining to the process P1
In the classical Lamport algorithm, there is no need to increase the local counter before taking the max. If you do that, it still works, but it seems like a useless operation. In the second example, all events are still properly ordered. In general, as long as counters go up, the algorithm works.
Another way of looking at correctness is to try to rebuild the total order manually. The hard requirement is that if an event A happens before an event B, then in the total order A will be placed before B. In both picture 2 and picture 3, everything is fine.
Let's look at picture 2. Event X from the second cell of P0 happens before event Y from the third cell of P1. To make sure X comes before Y in the total order, the time of Y must be larger than X's. And it is. It doesn't matter whether the time difference is 1, 2, or 100.
in which the new value of an entry that corresponds to the receipt of a message is obtained by adding 1 to the max of the preceding entry in the same table and the timestamp of the received message, as shown below, which is quite different from what was done in the first illustration
It's actually pretty much the same logic, with the exception of incrementing the local counter before taking the max. Generally speaking, every process has its own clock, and every event increases that clock by one. The only exception is when the clock of a different process is already ahead; then taking the max is required to make sure all events have a correct total order. So, in the third picture, P2 adjusts its clock (taking the max) because P3 is way ahead. The same applies to P1's adjustment.
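For reference, the classical receive rule the answer refers to can be written as a single update (this is the standard formulation from Lamport's paper, stated here only for clarity; it is not taken from either figure):
C_i := max(C_i, ts(m)) + 1
where C_i is the local clock of the receiving process and ts(m) is the timestamp carried by message m. The variant in the first illustration effectively counts the local event as well before applying the rule, which is why the entry lands on 4 instead of 3; both conventions keep the counters increasing, so both produce a valid total order.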

Reducing database load from consecutive queries

I have an application which calls the database multiple times to achieve one simple goal.
A little information about this application: in short, it scrapes data from a webpage and stores specific information from that page in a database. The important fields here are: player name, position, kill points, and class.
The player name may change or stay the same from day to day.
Regarding the position, multiple players can sit at one position.
Kill points can increase or stay the same from day to day.
Class: there are only 2 possibilities; e.g. A can change to B or remain A (and the same in reverse), but it cannot be C, D, E, or F.
The player name can change on any given day, and the position can also change depending on the kill point increase since the last update, which brings us back to the goal: search the database day by day, from the current date as far back as 2021-02-22, starting at the most recent entry for a player name and backtracking to the previous day to check whether that player name is still the same or has changed.
The main reference used to detect the change is the kill points. As the days go on, this number will either stay exactly the same or increase; it can never decrease.
So now onto the implementation of this application.
The first query that runs finds the most recent entry for the player name:
SELECT TOP(1) * FROM [changes] WHERE [CharacterName]=#charname AND [Territory]=#territory AND [Archived]=0 ORDER BY [Recorded] DESC
Then it continues to check the previous day's entries with the following query:
SELECT TOP(1) * FROM [changes] WHERE [Territory]=#territory AND [CharacterName]=#charname AND [Recorded]=#searchdate AND ([Class] LIKE '%{Class}%' OR [Class] LIKE '%{GetOpposite(Class)}%') AND [Archived]=0
If no results are found, it will then proceed to find an alternative name with the following query:
SELECT TOP(5) * FROM [changes] WHERE [Kills] <= #kills AND [Recorded]='{Data.Recorded.AddDays(-1):yyyy-MM-dd}' AND [Territory]=#territory AND [Mode]=#mode AND ([Class] LIKE #original OR [Class] LIKE #opposite) AND [Archived]=0 ORDER BY [Kills] DESC
The aim of the query above is to get the top 5 entries that are the closest possible matches, and then cross-reference them with the day ahead:
SELECT COUNT(*) FROM [changes] WHERE [CharacterName]=#CharacterName AND [Territory]=#Territory AND [Recorded]=#SearchedDate AND [Archived]=0
When checking the day ahead: if the character name is not found there, it is considered to be the old player name for this specific character; otherwise, if all 5 of the results are found to be present in the day-ahead searches, the name is considered new to the table.
From the date this application started running up to today's date, that is over 400 individual queries against the database to achieve one goal.
It is also worth noting that this table grows by 14,400-14,500 rows every single day.
The overall question: is it possible to consolidate all of this into fewer calls to the database, reducing the number of queries and improving performance?
What you can do to improve performance will be based on what parts of the application stack you can manipulate. Things to try:
Store Less Data - Database content retrieval speed is largely determined by how well the database is ordered/normalized and how much data needs to be searched for each query. Managing a cache of prior scraped pages and only storing data when there has been a change between the current scrape and the last one would guarantee fewer redundant requests to the DB.
Separate specific classes of data - Separating data into dedicated tables would allow you to query a specific table for a specific character, etc., effectively removing one WHERE clause.
Reduce time between queries - Fewer incoming concurrent requests means less resource contention and faster response times for prior requests.
Use another data structure - The only reason you're using TOP() is that you need data ordered in some specific way (most recent, etc.). If you used a data structure in your code that keeps the data ordered and is still easily queryable, you could perhaps offload some SQL requests to this structure instead of the DB.
The suggestions above are not exhaustive, but what you do to improve performance is largely a function of what in the application stack you have the ability to modify.
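To make one of these concrete: the per-day TOP(1) lookups in the question could be collapsed into a single round trip that pulls the whole relevant history at once and is then walked in application code. A minimal sketch, using the column names from the question (parameters written here with the usual @ prefix):

-- One query instead of one query per day: fetch every candidate row for the
-- character/territory back to the cut-off date, newest first, and let the
-- application step through the days in memory.
SELECT [CharacterName], [Territory], [Recorded], [Kills], [Class]
FROM   [changes]
WHERE  [Territory] = @territory
  AND  [CharacterName] = @charname
  AND  [Archived] = 0
  AND  [Recorded] >= '2021-02-22'
ORDER BY [Recorded] DESC;

The alternative-name and day-ahead checks could be batched the same way (fetch a date range once, compare locally), which turns hundreds of round trips into a handful.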

Randomly select DynamoDB entry

I have a DynamoDB table called URLArray that contains a list of URLs (myURL) and a unique video name (myKey).
I need to do two things:
When a user clicks the next video button, a random entry needs to be selected from this URLArray. There could be potentially tens of thousands of rows.
The user is logged into the app. Every time they finish watching a video, the video's unique name is recorded. So, when the user has seen a video, it is added to a list in a table called Users, under the user's info row.
So the random entry that gets selected when the user clicks the next video button in point 1 has to be compared against the list of videos they've already seen, to make sure it doesn't randomly appear again for that particular user.
So far I do something woefully inefficient; it works, but it's not great:
By the way, I'm using AppSync + GraphQL to interact with the DynamoDB table. I first get a local copy of the URLArray:
//Gets a list of the Key/URL pairs in the UrlArrays table in GraphQL ****IN CONSTRUCTOR, so we have this URLArray data when componentDidMount()****
listUrlArrays = async () => {
  try {
    const apiData = await API.graphql(graphqlOperation(ListUrlArrays)); // GraphQL query
    // URLData[] is kept in state so it is available to the entire class
    this.setState({
      URLData: apiData.data.listURLArrays.items,
      urlArrayLength: apiData.data.listURLArrays.items.length // how many videos are in the database
    });
  } catch (err) {
    console.log('Error fetching URLArray', err);
  }
}
As an overview, when the user clicks for the next video:
//When clicking next video
async nextVideo() {
  await this.logVideosSeen(); // add myKey to the list of videos in the *Users* table that the logged-in user has now seen
  await this.getURL();        // get the NEXT upcoming video's details for the video player to play, making sure it hasn't been seen before
}

//This will update the 'listOfVideosSeen[]' in the Users table with the unique myKey of each video the logged-in user has seen
logVideosSeen = async () => {
  .......
}

async getURL() {
  var dbIndex = this.getUniqueRandomNumber(this.state.urlArrayLength); // choose a number between 0 and N, the number of videos in URLArray
  // hasVideoBeenSeen() gets the list of videos the user has already seen from the `Users` table with the GraphQL getusers command, keeps a local copy of that list (which can get big), and uses JavaScript's indexOf() to check whether myKey already exists in the list
  while (await this.hasVideoBeenSeen(this.state.URLData[dbIndex].myKey)) // while true, i.e. the user has seen that video before
  {
    dbIndex = this.getUniqueRandomNumber(this.state.urlArrayLength); // get another random number to fetch a new myKey
  }
  // Once we exit the loop we know we have a not-seen-before myKey, so proceed to set it to play...
  if (dbIndex != null) {
    this.setState({ playURL: this.state.URLData[dbIndex].vidURL }); // the URL from the local URLArray that we're going to play (i.e. the next video to come)
  }
}
I can share a little more code if needed, but essentially I wanted to know how to:
Let a Lambda function select a random number based on the current URLArray size (I may need to keep a local copy of URLArray anyway). But I think point 2 here is where it's really inefficient.
Let a Lambda function check (the while loop) against the Users table whether myKey has already been seen, mainly to shift this computational burden to the cloud instead of the local device the app runs on.
AFTER A THINK................
Thanks for the suggestion, Seth. I have been thinking about it for some time, and while the randomness requirement still holds true, I think there is some truth in what you've suggested. The reason I need randomness is so that two users sitting side by side, for example, can't predict which video is coming next. It shouldn't be a predictable sequence of videos. I'm not sure I can use the Scan function with AWS Amplify/GraphQL. So remember there are two things going on here: (1) a video upload, recording it in the URLArray sensibly for future reference; (2) users viewing a previously unseen random video and then moving on to another unseen random video.
*(1)
I like your idea of using a number to index the URLArray, and it's helped to make life a bit easier. So the first URL is at index 0, the next at 1, etc.
My thinking here (to avoid doing a ListUrlArrays() and bringing the WHOLE array locally to the phone) is to create a GSI called VideoNumber for the URLArray table. This would be a unique VideoNumber column holding a number 0-N. So imagine the diagram above having another column called VideoNumber: row 1 with VideoNumber set to 0, row 2 with VideoNumber set to 1, etc. THEN all I would need to do, locally on the device, is generate a random number between 0 and N, call a getURLArrayIdbyVideoNumber() query specific to that GSI with the number we just generated, and it will unlock the information I need from that row. Voila! I think that shifts most of the heavy burden away now.
Question: Before each video is uploaded, how do I easily get the current total number of rows N in the table (or row count)? I would then increment it by one.
The other thing I can do is save this current count number in another DynamoDB table that I use for persisted data, read the number from there before upload, and write an N+1 after upload to increment it (2 DynamoDB operations per upload). It’s not ideal.
*(2)
When a user has finished watching a video, I can log, in a list under the user's information in DynamoDB, which videos they've already seen. So, for example, the seen list could now be [3,12,73,108,57] for the 5 videos they've seen so far. When the user clicks nextVideo(), we'll generate a random newNumber and immediately compare it with the numbers in the seen list. I use seenlist.indexOf(newNumber), and it will either go again or stop if newNumber doesn't exist in the list. THEN I can go through the GSI query and retrieve the relevant information from URLArray to display the video.
I think this indexOf() is the biggest computational burden on the device, and it obviously gets a little slower as the seenList grows. But it should be quicker with pure integer numbers than with an alphanumeric myKey as I was using before. Any other suggestion would be welcome :)
I’ve yet to try it, but it was just an idea, as I need to keep the random element. But first, do you know how I can easily find the number of rows or table count of URLArray?
I think you'll have an easier time coming up with a solution to this problem if you drop the randomness requirement. It sounds like the more important requirement is presenting the user with a video they haven't seen before.
If that's correct, it sounds like your access pattern could be stated as
Fetch previously unseen video for user
which is an easier problem to solve.
Unlike SQL databases, there are often many ways to implement a given access pattern in DynamoDB. My answer here is just one way.
Imagine your URLArray table as a giant array. The first URL is at index 0, the next URL is at index 1, the one after that at index 2, and so on. Each user of your application would start by watching the video at URL index 0, then URL index 1, etc. This would ensure the user never sees the same video twice. You would not need to store a list of all the videos they've seen. Instead, you could store the index of the last video they saw.
Your application could grab the first n videos from the table to present to your users. Once that list was exhausted, it could go grab the next n videos. And so on...
What I've described here is essentially how pagination is implemented in DynamoDB. To bring this abstraction back to the world of DynamoDB, your algorithm could look something like this:
Scan the URLArray table for the first "page" of URLs (a scan operation with no filter criteria)
Along with the results, DynamoDB will respond with a LastEvaluatedKey, which will allow you to retrieve the next page of results starting from this position
Present your user with each video you pulled back from the scan operation, making sure to record the id (the Primary Key) of the last video they saw.
When you exhaust the URLs from step 1, execute another scan operation with the ExclusiveStartKey set to the LastEvaluatedKey returned from step 2.
When users return to your application, query for the next page from the URLArray table with ExclusiveStartKey set to the id of the last video they viewed.
This effectively uses the scan operation to search through your URLArray table one page at a time. Your application would effectively be searching the table from top to bottom, keeping track of where each user is at any given time. When a user revisits your application, just start where they left off.
In response to your edit:
If your use case requires the next video to be unpredictable (e.g. no 2 users can predict what video is next), you have a few problems to solve at the same time:
Selecting an item in an unpredictable/random manner
Tracking what a user has already seen
Putting those two requirements together makes for a tricky access pattern. Let's say you have N videos in your table, and the user has viewed N-1 of these videos leaving only one video unseen. If you are fetching your next video randomly and need to ensure it has not yet been seen, how will you find the last unseen video? How many times would you need to guess before you came across the only unseen video? What query/scan operation could you perform that does this in a single request to DDB? I'm not saying it's impossible, it's just...complicated.
I think it's better to come up with a strategy that is unpredictable to the user, but predictable to you, when it comes to selecting the next unseen video.
For example, you could pre-calculate a random order of indexes from 1..N ahead of time, which would represent the order in which you present the videos to a given user. You could go through that list sequentially, keeping track of the last seen index. That way, you'd always know which video was next and that the video hadn't previously been seen by this user. Fetching that video would be a simple query operation to DDB.
You also asked how to find the number of items in DynamoDB. Unfortunately, there is no DynamoDB equivalent of the SQL count operation. The answer to this question is not straightforward. For the benefit of the community (and to get a diverse set of answers), I'd suggest you make a separate question on Stackoverflow regarding the number of items in a DDB table.

Want an efficient approach to retrieving records from a database when the retrieval is weighted and balanced

I'm working on something incredibly unique..... a property listings website. ;)
It displays a list of properties. For each property, a teaser image and some caption data are displayed. If the teaser image and caption catch a site visitor's interest, they can click on it and get a full property profile. All very standard.
The customer wants to allow property owners to add multiple teaser images and to be able to track which teaser images got the most click-throughs. No worries there.
But they also want to allow the property owner to weight each teaser image to control when it is shown. So for 3 images with weightings of 2, 6, 2, the 2nd image would be shown 6/10 times. This needs to be balanced: if the 2nd image is shown the first 6 times, it can't be shown again until the 1st and 3rd images have been shown twice each.
So I need to both increment how often an image has been retrieved and also retrieve images in a balanced way. Forget about actual image handling; I'm really just talking about URLs.
Note that incrementing how often an image has been retrieved is a different animal from incrementing how often it has captured a click-through.
So I can think of a few different ways to approach the problem using database triggers or maybe some LINQ2SQL, etc., but it strikes me that someone out there will know of a solution that could be orders of magnitude faster than what I might come up with.
My first rough idea is to have a schema like so:
TeaseImage(PropId, ImageId, ImageUrl, Weighting, RetrievedCount, PropTotalRetrievedCount)
and then
select ImageRanks.*
from (select t.PropID,
             t.ImageID,
             t.ImageUrl,
             rank() over (partition by t.PropID order by t.RetrievedCount desc) as IMG_Rank
      from TeaseImage t
      where t.RetrievedCount < t.Weighting) ImageRanks
where ImageRanks.IMG_Rank <= 1
And then
1. for each ImageId in the result set increment RetrievedCount by 1 and then
2. for each PropId in ResultSet increment PropTotalRetrievedCount by 1 and then
3. for each PropId in ResultSet check if PropTotalRetrievedCount ==10 and if so reset it to PropTotalRetrievedCount = 0 and RetrievedCount=0 for each associated ImageId
Which frankly sounds awful :(
So any ideas?
Note: if I have to step out of the datalayer I'd be using C# / .Net. Thanks.
If you want to do this entirely in your database, you could split your table in two:
Image(ImageId, ImageUrl)
TeaseImage(TeaseImageId, PropId, ImageId, DateLastAccessed)
The TeaseImage table manages weightings by storing additional (redundant) copies of each property-image pair. So an image with a weight of six would get six records.
Then the following query gives you the least-recently used record.
select top 1 ti.TeaseImageId, i.ImageUrl
from TeaseImage ti
join Image i
on i.ImageId = ti.ImageId
where ti.PropId = #PropId
order by ti.DateLastAccessed
Following the select, just update the record's DateLastAccessed. (Or even update it as part of the select procedure, depending on how fault-tolerant you need to be.)
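One possible way to do the "update as part of the select" in a single statement (a minimal sketch assuming SQL Server 2005 or later and the table layout above, with @PropId as the property parameter):

-- Atomically pick the least-recently-used teaser image for a property and
-- stamp it as accessed, returning its id in the same statement.
WITH lru AS (
    SELECT TOP (1) ti.TeaseImageId, ti.ImageId, ti.DateLastAccessed
    FROM   TeaseImage ti
    WHERE  ti.PropId = @PropId
    ORDER BY ti.DateLastAccessed
)
UPDATE lru
SET    DateLastAccessed = GETUTCDATE()
OUTPUT inserted.TeaseImageId, inserted.ImageId;

The caller can then join the returned ImageId back to Image for the URL; locking hints such as UPDLOCK/READPAST could be added if concurrent requests for the same property become a concern.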
Using this technique would give you fine-grained control over the order of image delivery, (by seeding their DateLastAccessed values appropriately) and you could easily modify the ratios if need be.
Of course, as the table grows, the additional records would degrade query performance earlier than other approaches, but depending on the cost of the query relative to everything else that's going on that may not be significant.

What do I gain by adding a timestamp column called recordversion to a table in ms-sql?

What do I gain by adding a timestamp column called recordversion to a table in ms-sql?
You can use that column to make sure your users don't overwrite data from another user.
Let's say user A pulls up record 1 and at the same time user B pulls up record 1. User A edits the record and saves it. Five minutes later, user B edits the record, not knowing about user A's changes. When he saves his changes, you use the recordversion column in your UPDATE's WHERE clause, which will prevent user B from overwriting what user A did. You can detect this out-of-date condition and throw some kind of "data out of date" error.
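A minimal sketch of that check, assuming an illustrative dbo.Customer table whose rowversion column is called RecordVersion (the names are placeholders, not from the question):

-- Optimistic concurrency: only apply the update if the row still carries the
-- version the user originally read (@OriginalVersion was returned by the SELECT).
UPDATE dbo.Customer
SET    Name = @NewName
WHERE  CustomerId = @CustomerId
  AND  RecordVersion = @OriginalVersion;

IF @@ROWCOUNT = 0
    RAISERROR('The record was changed by another user. Reload it and try again.', 16, 1);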
Nothing that I'm aware of, or that Google seems to find quickly.
You don't get anything inherent by using that name for a column. Sure, you can create a column and do record versioning as described in the other answer, but there's nothing special about the column name. You could call the column anything you want and do versioning, and you could call any column RecordVersion and nothing special would happen.
Timestamp is mainly used for replication. I have also used it successfully to determine whether the data has been updated since the last feed to the client (when I needed to send a delta feed) and thus pick out only the records which have changed since then. This does require having another table that stores the value of the timestamp (in a varbinary field) at the time you run the report, so you can use it for comparison on the next run.
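A rough sketch of that delta-feed pattern (the dbo.Customer and dbo.FeedState tables and their columns are illustrative assumptions, not from the question):

-- Keep the last rowversion we exported in a small state table and, on the next
-- run, pull only the rows whose rowversion is newer than that high-water mark.
DECLARE @LastSent varbinary(8),
        @NowMark  varbinary(8) = @@DBTS;   -- highest rowversion used in this database so far

SELECT @LastSent = LastRowVersionSent
FROM   dbo.FeedState
WHERE  FeedName = 'ClientDelta';

SELECT CustomerId, Name                     -- only the rows changed since the previous feed
FROM   dbo.Customer
WHERE  RecordVersion > @LastSent;

-- Once the feed has been delivered, advance the high-water mark.
UPDATE dbo.FeedState
SET    LastRowVersionSent = @NowMark
WHERE  FeedName = 'ClientDelta';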
If you think that timestamp records the date or time of the last update, it does not; you would need datetime fields with default constraints (to get the original datetime) and triggers (to update it) to store that information.
Also, keep in mind if you want to keep track of your data, it's a good idea to add these four columns to every table:
CreatedBy(varchar) | CreatedOn(date) | ModifiedBy(varchar) | ModifiedOn(date)
While it doesn't give you full history, it lets you know who created an entry and when, and who last modified it and when. Those 4 columns create pretty powerful tracking abilities without any serious overhead on your DB.
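For illustration, one way those columns might be declared so the "created" values fill themselves in (the table name and other columns are placeholders):

-- Audit columns with defaults; ModifiedBy/ModifiedOn are set by application
-- code (or a trigger) on every UPDATE.
CREATE TABLE dbo.Example
(
    ExampleId  int IDENTITY(1,1) PRIMARY KEY,
    -- ...business columns...
    CreatedBy  varchar(50) NOT NULL DEFAULT SUSER_SNAME(),
    CreatedOn  date        NOT NULL DEFAULT GETDATE(),
    ModifiedBy varchar(50) NULL,
    ModifiedOn date        NULL
);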
Obviously, you could create a full-blown logging system that tracks every change and gives you complete history, but that's not the solution for the issue I think you are describing.