SQL Server Geospatial, find location of point at a distance along a linestring - sql

We are investigating migrating a prototype into SQL Server (Azure).
We have LineStrings that also have M values. What we would like to do is, given another M value, find out what its geographical location is.
To aid your visualisation, here is a real-world example:
I have a linestring that represents a flight path. Because the flight goes up and down, the distance the plane has actually moved is not the same as the total length of the linestring. We have calibrated M values as part of the linestring, but need to be able to plot on it where a given event occurred. All we know about this event is its M value.
DECLARE @g geometry;
SET @g = geometry::STGeomFromText('LINESTRING(1 0 NULL 0, 2 2 NULL 5, 1 4 NULL 9, 3 6 NULL 15)', 0);
Given something like the above, what is the lat and long of a point with an M value of 8?
This should be the equivalent of PostGIS's ST_LocateAlong.
The M value is not a time but a distance. It should be understood that this distance is arbitrary, does not directly relate to the length of the line, and is calibrated against known points. This is because the data set is based on historic data that is in no way accurate by today's standards.
*Note: I am not sure whether I have nulled the Z or the M value. The extra parameter we are considering here is the M only.
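As far as I can tell, SQL Server has no built-in equivalent of ST_LocateAlong, so one option is to walk the vertices yourself and interpolate linearly inside the segment whose M range brackets the target. The sketch below is only an illustration of that idea (variable names are arbitrary, and it assumes M values increase along the line):

DECLARE @g geometry = geometry::STGeomFromText(
    'LINESTRING(1 0 NULL 0, 2 2 NULL 5, 1 4 NULL 9, 3 6 NULL 15)', 0);
DECLARE @m float = 8;
DECLARE @npts int = @g.STNumPoints();

;WITH n AS (          -- segment indexes 1 .. number of vertices - 1
    SELECT 1 AS i
    UNION ALL
    SELECT i + 1 FROM n WHERE i + 1 <= @npts - 1
),
segs AS (             -- consecutive vertex pairs
    SELECT @g.STPointN(i) AS p1, @g.STPointN(i + 1) AS p2 FROM n
)
SELECT TOP 1          -- linear interpolation within the bracketing segment
       p1.STX + (p2.STX - p1.STX) * (@m - p1.M) / (p2.M - p1.M) AS X,
       p1.STY + (p2.STY - p1.STY) * (@m - p1.M) / (p2.M - p1.M) AS Y
FROM segs
WHERE @m BETWEEN p1.M AND p2.M;

For M = 8 this interpolates between the vertices at M = 5 and M = 9 and should return roughly (1.25, 3.5).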

Related

VBA to find maximum value in a chart

I have a range of data in columns A, B, and C, displayed as a line graph with B on the primary axis and C on the secondary axis. Column A is the category axis. I want to find the maximum value of column C and put a data callout both on the point where column C is at its maximum and on the corresponding point of column B.
I know this sounds confusing. In this example, the maximum of Column C occurs at Point 27 (or 1.50% on the category axis). I would like a dot at point 27 for both Column B and C.
Column A is a percentage from -5.00% to 10.00%, incremented at 0.25%. Columns B and C are plotted against that change.
In the past I have done something similar: use a formula in column D to identify the largest number in columns C and B, and make it a value high on your chart if the result is true.
Add Column D as a series to the chart.
Change the chart type on that series only to a scatter chart or something that puts points up there.
You can put a label on or simply put the amount showing above the plotted point.
You don't need VBA for this.
You might be interested to know I found a solution that works for me. First, I added columns D and E using the formulas =IF(C2=MAX(C$2:C$62),C2,NA()) and =IF(C2=MAX(C$2:C$62),B2,NA()); this gave me the point on the graph for both lines B and C where C was at its maximum. I then formatted the graph so that these points had data callouts (a request from the client). Finally, I set columns D and E to have white font, to match the background, so they appear invisible. I don't love this step, but I don't want the client to see the extra rows of #N/A, etc.
The basic VBA for the data callout is:
ActiveChart.FullSeriesCollection(5).Select
ActiveChart.SetElement (msoElementDataLabelCallout)
where the series is 5 (column E) and I'm putting a data callout on the graphed point, which happens to be the maximum of column C.

storing weight in a database table

I am playing around with learning MVC and want to create a recipe recorder application to store my recipes.
I am using .NET with SQL Server 2008 R2; however, I don't think that really matters for what I am trying to do.
I want to be able to record all of the measures I use. In my country we use metric; however, I want people to be able to use imperial with my application.
How do I structure my table to cope with the differences? I was thinking of storing all of the measurements as ints and having a foreign key to store the kind of weight.
Ideally I would like to be able to share the recipes between people and display the measurements in their preferred way.
Is this the right kind of way?
IngredientID PK
Weight int
TypeOfWeight int, e.g. tsp=1, tbl=2, kilogram=3, pound=4, litre=5, ounce=6, etc.
UserID int
Or is this way off track? Any suggestions would be great!
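In DDL terms, that sketch would be roughly the following (names simply mirror the list above; nothing here is prescriptive):

CREATE TABLE Ingredient (
    IngredientID int PRIMARY KEY,
    Weight       int NOT NULL,   -- the quantity as entered
    TypeOfWeight int NOT NULL,   -- e.g. tsp=1, tbl=2, kilogram=3, pound=4, litre=5, ounce=6
    UserID       int NOT NULL
);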
I think you should store the weights (kilo/pound etc.) as a single weight type (metric) and simply "display" them in the correct conversion using the user's preference. If the user has their weight setting set to Imperial, values entered into the system would need to be converted as well. This should simplify your data anyway.
Similar to dates: you could store every date with the timezone it came from, or store all dates the same way (or with no timezone) and then display them in the application using offsets according to the user's preference.
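A minimal sketch of that idea (table and column names here are hypothetical, not from the question): every quantity is stored once in grams, and the displayed amount is derived from the viewing user's preferred unit.

DECLARE @ViewingUserID int = 1;          -- whoever is looking at the recipe

SELECT ri.IngredientID,
       ri.WeightInGrams / u.GramsPerUnit AS DisplayAmount,
       u.UnitName
FROM   RecipeIngredient AS ri
JOIN   Users AS usr ON usr.UserID = @ViewingUserID
JOIN   Units AS u   ON u.UnitID   = usr.PreferredUnitID;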
If you are storing weights (a non-discrete value) I would strongly suggest using numeric or decimal for this data. You have the right idea with the typeofweight column. Store a reference table somewhere showing what the conversion ratio is for each (to a certain standard).
This gets quite tricky when you want to show ounces as TSP, because the conversion depends on the ingredient itself, so you need a 3rd table - ingredient: id, name, volume-to-weight ratio.
Example typeofweight table, where the standard unit is grams
type | conversion
gram | 1
ounce | 28.35
kg | 1000
tsp | 5 // assuming that 1 tsp = 5 grams of water
pound | 453.59
Example ingredient volume to weight conversion
type | vol-to-weight
water | 1
sugar | 1.4 // i.e. 1 tsp holds 5g of water, but 7g of sugar
So to display 500 ounces of sugar in tsp, you would use the formula
units x ounce.conversion / (tsp.conversion x sugar.vol-to-weight)
= 500 x 28.35 / (5 x 1.4)
= 2,025 tsp
Another example with 2 weights
Ingredient is specified as 3 ounces of starch. Show in grams
= 3 x 28.35 (straightforward isn't it)
or
Ingredient is specified as 3 ounces of starch. Show in pounds
= 3 * 28.35 / 453.59
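To make the above concrete, here is a rough T-SQL version of the two lookup tables and the worked conversions (all names are illustrative, and the starch ratio is a placeholder):

CREATE TABLE TypeOfWeight    (Name varchar(20) PRIMARY KEY, GramsPerUnit decimal(10,4));
CREATE TABLE IngredientRatio (Name varchar(20) PRIMARY KEY, VolToWeight  decimal(10,4));

INSERT INTO TypeOfWeight    VALUES ('gram', 1), ('ounce', 28.35), ('kg', 1000), ('tsp', 5), ('pound', 453.59);
INSERT INTO IngredientRatio VALUES ('water', 1), ('sugar', 1.4), ('starch', 1);   -- starch ratio is made up

-- 3 ounces of starch shown in pounds: 3 x 28.35 / 453.59
SELECT 3 * oz.GramsPerUnit / lb.GramsPerUnit AS Pounds
FROM   TypeOfWeight AS oz
JOIN   TypeOfWeight AS lb ON lb.Name = 'pound'
WHERE  oz.Name = 'ounce';

-- 500 ounces of sugar shown in teaspoons: 500 x 28.35 / (5 x 1.4)
SELECT 500 * oz.GramsPerUnit / (tsp.GramsPerUnit * i.VolToWeight) AS Teaspoons
FROM   TypeOfWeight AS oz
JOIN   TypeOfWeight AS tsp ON tsp.Name = 'tsp'
JOIN   IngredientRatio AS i ON i.Name  = 'sugar'
WHERE  oz.Name = 'ounce';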

Power-law distribution in T-SQL

I basically need the answer to this SO question that provides a power-law distribution, translated to T-SQL for me.
I want to pull a last name, one at a time, from a census provided table of names. I want to get roughly the same distribution as occurs in the population. The table has 88,799 names ranked by frequency. "Smith" is rank 1 with 1.006% frequency, "Alderink" is rank 88,799 with frequency of 1.7 x 10^-6. "Sanders" is rank 75 with a frequency of 0.100%.
The curve doesn't have to fit precisely at all. Just give me about 1% "Smith" and about 1 in a million "Alderink"
Here's what I have so far.
SELECT [LastName]
FROM [LastNames] as LN
WHERE LN.[Rank] = ROUND(88799 * RAND(), 0)
But this of course yields a uniform distribution.
I promise I'll still be trying to figure this out myself by the time a smarter person responds.
Why settle for the power-law distribution when you can draw from the actual distribution?
I suggest you alter the LastNames table to include a numeric column which would contain a value representing the actual number of individuals with a more common name. You'll probably want a number on a smaller but proportional scale, say, maybe 10,000 for each percent of representation.
The list would then look something like:
(other than the 3 names mentioned in the question, I'm guessing about White, Johnson et al)
Smith 0
White 10,060
Johnson 19,123
Williams 28,456
...
Sanders 200,987
..
Alderink 999,997
And the name selection would be
SELECT TOP 1 [LastName]
FROM [LastNames] as LN
WHERE LN.[number_described_above] < ROUND(1000000 * RAND(), 0)
ORDER BY [number_described_above] DESC
That picks the name with the largest cumulative number that does not exceed the [uniformly distributed] random number. Note how the query uses less-than and orders in descending order; this guarantees that the very first entry (Smith) can still get picked. The alternative would be to start the series with Smith at 10,060 rather than zero and to discard random draws smaller than this value.
Aside from the matter of boundary management (starting at zero rather than 10,060) mentioned above, this solution, along with the two other responses so far, is the same as the one suggested in dmckee's answer to the question referenced in this question. Essentially the idea is to use the CDF (cumulative distribution function).
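On SQL Server 2012 or later, that cumulative column can also be computed on the fly with a windowed SUM. This is only a sketch, assuming a Freq10k column that holds each name's frequency scaled to 10,000 per percent:

WITH cdf AS (
    SELECT [LastName],
           SUM(Freq10k) OVER (ORDER BY [Rank]) - Freq10k AS CumStart   -- total of all more-common names
    FROM   LastNames
)
SELECT TOP 1 [LastName]
FROM   cdf
WHERE  CumStart <= ROUND(1000000 * RAND(), 0)
ORDER BY CumStart DESC;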
Edit:
If you insist on using a mathematical function rather than the actual distribution, the following should provide a power-law function which should roughly convey the "long tail" shape of the real distribution. You may want to tweak the @PwrCoef value (which, BTW, needn't be an integer); essentially, the bigger the coefficient, the more skewed towards the beginning of the list the function is.
DECLARE @PwrCoef INT
SET @PwrCoef = 2
SELECT 88799 - ROUND(POWER(POWER(88799.0, @PwrCoef) * RAND(), 1.0/@PwrCoef), 0)
Notes:
- the extra ".0"s in the function above are important to force SQL to perform float operations rather than integer operations.
- the reason we subtract the power calculation from 88799 is that the calculation's distribution is such that the closer a number is to the end of our scale, the more likely it is to be drawn. Since the list of family names is sorted in the reverse order (most likely names first), we need this subtraction.
Assuming a power of, say, 3 the query would then look something like
SELECT [LastName]
FROM [LastNames] as LN
WHERE LN.[Rank]
= 88799 - ROUND(POWER(POWER(88799.0, 3) * RAND(), 1.0/3), 0)
Which is the query from the question except for the last line.
Re-Edit:
In looking at the actual distribution, as apparent in the Census data, the curve is extremely steep and would require a very big power coefficient, which in turn would cause overflows and/or extreme rounding errors in the naive formula shown above.
A more sensible approach may be to operate in several tiers, i.e. to perform an equal number of draws in each of, say, the three thirds (or four quarters, or...) of the cumulative distribution; within each of these parts of the list, we would draw using a power-law function, possibly with the same coefficient, but with different ranges.
For example
Assuming thirds, the list divides as follow:
First third = 425 names, from Smith to Alvarado
Second third = 6,277 names, up to Gainer
Last third = 82,097 names, from Frisby to the end
If we were to need, say, 1,000 names, we'd draw 334 from the top third of the list, 333 from the second third and 333 from the last third.
For each of the thirds we'd use a similar formula, maybe with a bigger power coefficient for the first third (where we are really interested in favoring the earlier names in the list, and also where the relative frequencies are more statistically relevant). The three selection queries could look like the following:
-- Random Drawing of a single Name in top third
-- Power Coef = 12
SELECT [LastName]
FROM [LastNames] as LN
WHERE LN.[Rank]
= 425 - ROUND(POWER(POWER(425.0, 12) * RAND(), 1.0/12), 0)
-- Second third; Power Coef = 7
...
WHERE LN.[Rank]
= (425 + 6277) - ROUND(POWER(POWER(6277.0, 7) * RAND(), 1.0/7), 0)
-- Bottom third; Power Coef = 4
...
WHERE LN.[Rank]
= (425 + 6277 + 82097) - ROUND(POWER(POWER(82097.0, 4) * RAND(), 1.0/4), 0)
Instead of storing the pdf as rank, store the CDF (the sum of all frequencies up to that name, starting from Alderink).
Then modify your select to retrieve the first LN with rank greater than your formula result.
I read the question as "I need to get a stream of names which will mirror the frequency of last names from the 1990 US Census"
I might have read the question a bit differently than the other suggestions, and although an answer has been accepted (and a very thorough answer it is), I will contribute my experience with the Census last names.
I had downloaded the same data from the 1990 census. My goal was to produce a large number of names to be submitted for search testing during performance testing of a medical record app. I inserted the last names and the percentage of frequency into a table. I added a column and filled it with an integer which was the product of "total names required * frequency". The frequency data from the census did not add up to exactly 100%, so my total number of names was also a bit short of the requirement. I was able to correct the number by selecting random names from the list and increasing their count until I had exactly the required number; the randomly added count never amounted to more than .05% of the total of 10 million.
I generated 10 million random numbers in the range of 1 to 88,799. With each random number I would pick that name from the list and decrement the counter for that name. My approach was to simulate dealing a deck of cards, except my deck had many more distinct cards and a varying number of each card.
Do you store the actual frequencies with the ranks?
Converting the algebra from that accepted answer to T-SQL is no bother, if you know what value to use for n. y would be what you currently have, ROUND(88799 * RAND(), 0), and x0, x1 = 1, 88799 I think, though I might misunderstand it. The only non-standard maths operator involved from a T-SQL perspective is ^, which is just POWER(x, y) == x^y.
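For reference, dmckee's closed form (draw from a pdf proportional to x^n on [x0, x1] by inverting the CDF) translates to T-SQL along these lines; the exponent below is only an illustration, and POWER() stands in for ^:

DECLARE @n  float = -0.5;                -- power-law exponent (illustrative value)
DECLARE @x0 float = 1, @x1 float = 88799;
DECLARE @u  float = RAND();              -- uniform variate in [0, 1)

SELECT ROUND(
         POWER((POWER(@x1, @n + 1) - POWER(@x0, @n + 1)) * @u + POWER(@x0, @n + 1),
               1.0 / (@n + 1)),
         0) AS [Rank];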

Identifying graphs in a heap of connected nodes -- what is this called?

I have a SQL table with three columns X, Y, Z. I need to split it in groups in such a way that all records with same value of X or Y or Z are assigned to the same group. I need to make sure that the records with same value X or Y or Z are never split across multiple groups.
If you think of records as nodes and values of X, Y, Z as edges, this problem is the same as finding all graphs where the nodes in each graph will be connected directly or indirectly via X, Y, or Z-edge, but each graph will have no edges in common with other graphs (otherwise it would be part of the same graph).
A few years ago I knew what this was called and even remembered the algorithm, but now it escapes me. Please tell me what this problem is called so I can Google for a solution. If you know a good algorithm -- please point me to it. If you have a SQL implementation -- I will marry you :)
Example:
X    Y    Z    BUCKET
---  ---  ---  ------
1    34   56   1
54   43   45   2
1    12   22   1
2    34   11   1
The last row is in bucket 1 because of the value of Y=34 which is the same as of the first row, which is in bucket 1.
It looks less like a graph and more like a simplicial complex.
But if we treat this complex as its skeletal graph (the numbers are treated as vertices, and a row in the table means that those three vertices are all connected by edges), then we may just use any algorithm for finding the connected components of this graph. I'm not sure whether there is a feasible way to do this in SQL, though; perhaps it would be more prudent to use a graph database somehow.
However, for this specific problem there may be some easy solution attainable by means of SQL which I didn't look for.
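For what it's worth, here is a plain T-SQL sketch of the connected-components idea (it assumes a hypothetical unique Id column and a Bucket column on the table): seed each row with its own bucket, then repeatedly pull every row down to the smallest bucket among rows that share an X, Y or Z value, until nothing changes.

UPDATE MyTable SET Bucket = Id;          -- seed: every row starts in its own bucket

WHILE @@ROWCOUNT > 0
BEGIN
    UPDATE t
    SET    t.Bucket = m.MinBucket
    FROM   MyTable AS t
    JOIN  (SELECT a.Id, MIN(b.Bucket) AS MinBucket
           FROM   MyTable AS a
           JOIN   MyTable AS b
             ON   a.X = b.X OR a.Y = b.Y OR a.Z = b.Z
           GROUP BY a.Id) AS m
      ON   m.Id = t.Id
    WHERE  t.Bucket > m.MinBucket;       -- only touch rows whose bucket can shrink
END

Each pass propagates the smallest label one hop further, so after enough passes every connected component ends up sharing a single bucket number.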
to find how many nodes in each group x:
select x, count(x)
from mytable
group by x
or to find the list of sets x:
select distinct x from mytable;
Why don't you initially GROUP BY one of the columns (say X), make buckets, then do so for Y and Z, each time merging all the buckets from the previous step if you find new groups?
Repeat the process for X, Y, and Z until the buckets stop changing.
Are you working for LinkedIn or Facebook? :)

How to Resize using Lanczos

I can easily calculate the values for sinc(x) curve used in Lanczos, and I have read the previous explanations about Lanczos resize, but being new to this area I do not understand how to actually apply them.
To resample with Lanczos, imagine you overlay the output and input over each other, with points signifying where the pixel locations are. For each output pixel location you take a box +- 3 output pixels from that point. For every input pixel that lies in that box, calculate the value of the Lanczos function at that location, with the distance from the output location in output pixel coordinates as the parameter. You then need to normalize the calculated values by scaling them so that they add up to 1. After that, multiply each input pixel value with the corresponding scaling value and add the results together to get the value of the output pixel.
For example, what does "overlay the input and output" actually mean in programming terms?
In the equation given
lanczos(x) = {
0 if abs(x) > 3,
1 if x == 0,
else sin(x*pi)/x
}
what is x?
As a simple example, suppose I have an input image with 14 values (i.e. in addresses In0-In13):
20 25 30 35 40 45 50 45 40 35 30 25 20 15
and I want to scale this up by 2, i.e. to an image with 28 values (i.e. in addresses Out0-Out27).
Clearly, the value in address Out13 is going to be similar to the value in address In7, but which values do I actually multiply to calculate the correct value for Out13?
What is x in the algorithm?
If the values in your input data are at t coordinates [0 1 2 3 ...], then your output (which is scaled up by 2) has t coordinates at [0 .5 1 1.5 2 2.5 3 ...]. So to get the first output value, you center your filter at 0 and multiply by all of the input values. Then to get the second output, you center your filter at 0.5 and multiply by all of the input values. Etc...
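Since this page is SQL-flavoured, here is the Out13 computation from the example written as T-SQL, purely to make the arithmetic concrete (in a real resizer this would live in application code). It follows the convention above of measuring distance in input-pixel coordinates, and it uses the full two-term kernel sinc(x) * sinc(x/a); the simplified equation quoted in the question drops the window term.

DECLARE @samples TABLE (Pos int, Val float);
INSERT INTO @samples (Pos, Val) VALUES
    (0,20),(1,25),(2,30),(3,35),(4,40),(5,45),(6,50),
    (7,45),(8,40),(9,35),(10,30),(11,25),(12,20),(13,15);

DECLARE @a float = 3.0;        -- Lanczos-3 kernel radius
DECLARE @outPos float = 6.5;   -- Out13 sits at input coordinate 13 / 2 = 6.5

SELECT SUM(w * Val) / SUM(w) AS Out13    -- normalise the weights so they sum to 1
FROM (
    SELECT Val,
           CASE WHEN Pos = @outPos THEN 1.0
                ELSE SIN(PI() * (Pos - @outPos)) / (PI() * (Pos - @outPos))
                   * SIN(PI() * (Pos - @outPos) / @a) / (PI() * (Pos - @outPos) / @a)
           END AS w
    FROM @samples
    WHERE ABS(Pos - @outPos) < @a        -- only input pixels within the kernel radius
) AS weighted;

So the answer to "which values do I multiply" is: the six input samples In4..In9, each weighted by the kernel evaluated at its distance from 6.5, with the weights normalised to sum to 1.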