average of derived attributes in new table

How can I calculate the average of a person's (in this case a player's) x and y positions while creating a new table, and add that average to a new column?
CREATE TABLE PlayerStatistics AS SELECT
PLAY_Name
FROM
player;
ALTER TABLE
PlayerStatistics ADD AveragePosition DECIMAL(6, 5)
SELECT
AVG(
Player1(T1) - X,
Player1(T1) - Y
))
FROM
tracksdataview
The desired end result is a new table with one column holding the player's name/ID and another column holding the average of the x and y positions for each row.

Depending on your DBMS, you may be able to combine the calculation and the CREATE TABLE statement:
CREATE TABLE PlayerStatistics AS
SELECT
    p.PLAY_Name, -- qualified: both tables have a play_name column
    CAST((Player_X + Player_Y) / 2 AS DECIMAL(6,5)) AS AveragePosition
FROM player p
LEFT JOIN tracksdataview tdv ON p.play_name = tdv.play_name -- Get track data (if any)
;
If Player_X and Player_Y are integer columns, many DBMSs perform integer division here, so you may need to CAST them as FLOAT (or divide by 2.0) before doing the division. Give it a try and let me know.
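A minimal, runnable sketch of the CREATE TABLE ... AS SELECT approach, using Python's built-in sqlite3 module as a stand-in DBMS (an assumption, since the asker's DBMS isn't stated; the sample rows are hypothetical). It also shows the integer-division pitfall: with integer columns, `/ 2` truncates while `/ 2.0` keeps the fraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player (PLAY_Name TEXT);
CREATE TABLE tracksdataview (PLAY_Name TEXT, Player_X INTEGER, Player_Y INTEGER);
INSERT INTO player VALUES ('Ann'), ('Bob');          -- Bob has no track data
INSERT INTO tracksdataview VALUES ('Ann', 3, 4), ('Ann', 1, 2);

-- The average is computed while the table is being created.
CREATE TABLE PlayerStatistics AS
SELECT p.PLAY_Name,
       (tdv.Player_X + tdv.Player_Y) / 2   AS IntegerDivision,  -- truncates
       (tdv.Player_X + tdv.Player_Y) / 2.0 AS AveragePosition   -- keeps .5
FROM player p
LEFT JOIN tracksdataview tdv ON p.PLAY_Name = tdv.PLAY_Name;
""")

rows = conn.execute(
    "SELECT PLAY_Name, IntegerDivision, AveragePosition "
    "FROM PlayerStatistics ORDER BY PLAY_Name, AveragePosition"
).fetchall()
print(rows)  # [('Ann', 1, 1.5), ('Ann', 3, 3.5), ('Bob', None, None)]
```

Because of the LEFT JOIN, a player without track rows still appears, with NULL averages.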

I suspect that a new table might not be the best solution to your problem. Consider the case where a player's X or Y position changes over time: that change will not be reflected in the "derived" attributes stored in the separate table.
My suggestion would be to create a view, which always "looks" at the original table:
CREATE VIEW PlayerStatistics AS
SELECT t.*, ta.ax, ta.ay, t.X - ta.ax AS devX, t.Y - ta.ay AS devY -- t.* avoids a duplicate playerId column
FROM tracksdataview t
INNER JOIN (SELECT playerId, AVG(X) AS ax, AVG(Y) AS ay FROM tracksdataview GROUP BY playerId) ta
ON ta.playerId = t.playerId;
As I was uncertain about which kind of "average" you want, I calculated the average over all positions of each player and then added two columns showing the player's x and y deviations from their average position.
(I also assumed that an ID column, playerId, exists.)
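To illustrate why a view stays current, here is a small sqlite3 sketch (SQLite as a stand-in DBMS, a plain table standing in for tracksdataview, and hypothetical sample rows). The averages the view reports change as soon as a new track row is inserted, with no table to rebuild.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracksdataview (playerId INTEGER, X REAL, Y REAL);
INSERT INTO tracksdataview VALUES (1, 0, 0), (1, 2, 4), (2, 5, 5);

CREATE VIEW PlayerStatistics AS
SELECT t.*, ta.ax, ta.ay, t.X - ta.ax AS devX, t.Y - ta.ay AS devY
FROM tracksdataview t
INNER JOIN (SELECT playerId, AVG(X) AS ax, AVG(Y) AS ay
            FROM tracksdataview GROUP BY playerId) ta
  ON ta.playerId = t.playerId;
""")

def averages(player_id):
    """Return the (ax, ay) pair the view currently reports for one player."""
    return conn.execute(
        "SELECT DISTINCT ax, ay FROM PlayerStatistics WHERE playerId = ?",
        (player_id,),
    ).fetchone()

before = averages(1)                                       # (1.0, 2.0)
conn.execute("INSERT INTO tracksdataview VALUES (1, 4, 2)")
after = averages(1)                                        # (2.0, 2.0)
print(before, after)
```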

Related

Why is PostGIS stacking all points on top of each other? (Using ST_DWithin to find all results within 1000m radius)

New to PostGIS/PostgreSQL...any help would be greatly appreciated!
I have two tables in a postgres db aliased as gas and ev. I'm trying to choose a specific gas station (gas.site_id=11949) and locate all EV/alternative fuel charging stations within a 1000m radius. When I run the following though, PostGIS returns a number of ev stations that are all stacked on top of each other in the map (see screenshot).
Does anyone have an idea why this is happening? How can I get PostGIS to visualize only the points within a 1000 m radius of the specified gas station?
with myplace as (
SELECT gas.geom
from nj_gas gas
where gas.site_id = 11949 limit 1)
select myplace.*, ev.*
from alt_fuel ev, myplace
where ST_DWithin(ev.geom1, myplace.geom, 1000)

With geometry-typed arguments, ST_DWithin does not compute distances in meters (unless the geometries' spatial reference system itself uses meters).
From the documentation:
For geometry: The distance is specified in units defined by the
spatial reference system of the geometries. For this function to make
sense, the source geometries must both be of the same coordinate
projection, having the same SRID.
So, if you want to compute distances in meters, you have to use the geography data type:
For geography units are in meters and measurement is defaulted to
use_spheroid=true, for faster check, use_spheroid=false to measure
along sphere.
That being said, you have to cast your geometries to geography. Apart from that, your query looks fine, assuming your data is correct :-)
WITH myplace as (
SELECT gas.geom
FROM nj_gas gas
WHERE gas.site_id = 11949 LIMIT 1)
SELECT myplace.*, ev.*
FROM alt_fuel ev, myplace
WHERE ST_DWithin(ev.geom1::GEOGRAPHY, myplace.geom::GEOGRAPHY, 1000)
Sample data:
CREATE TABLE t1 (id INT, geom GEOGRAPHY);
INSERT INTO t1 VALUES (1,'POINT(-4.47 54.22)');
CREATE TABLE t2 (geom GEOGRAPHY);
INSERT INTO t2 VALUES ('POINT(-4.48 54.22)'),('POINT(-4.41 54.18)');
Query:
WITH j AS (
SELECT geom FROM t1 WHERE id = 1 LIMIT 1)
SELECT ST_AsText(t2.geom)
FROM j,t2 WHERE ST_DWithin(t2.geom, j.geom, 1000);
st_astext
--------------------
POINT(-4.48 54.22)
(1 row)
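To see why the unit matters, here is a rough Python check of the two sample points. It uses a spherical haversine formula with an assumed mean Earth radius, which only approximates PostGIS's default spheroidal calculation, and compares it with a plain Euclidean distance in degrees, which is roughly what a geometry in a lon/lat SRID such as 4326 yields. Measured in degrees, both points fall inside a threshold of 1000; in meters, only the first one does, consistent with the query output.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumed mean radius; PostGIS defaults to a spheroid

def haversine_m(lon1, lat1, lon2, lat2):
    """Approximate great-circle distance in meters on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def planar_deg(lon1, lat1, lon2, lat2):
    """Euclidean distance in raw degrees, as for a lon/lat geometry."""
    return math.hypot(lon2 - lon1, lat2 - lat1)

origin = (-4.47, 54.22)
candidates = [(-4.48, 54.22), (-4.41, 54.18)]
for c in candidates:
    meters = haversine_m(*origin, *c)
    degrees = planar_deg(*origin, *c)
    print(c, f"{meters:.0f} m", f"{degrees:.4f} deg",
          "within 1000 m:", meters <= 1000, "within 1000 deg:", degrees <= 1000)
```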

You are cross joining those tables and have PostgreSQL return the cartesian product of both when selecting myplace.* & ev.*.
So while there is only one row in myplace, its geom will be merged with every row of alt_fuel (i.e. the result set will have all columns of both tables in every possible combination of both); since the result set thus has two geometry columns, your client application likely chooses either the first, or the one called geom (as opposed to alt_fuel.geom1) to display!
You don't seem to need myplace.geom in the result set anyway, so I suggest running
WITH
myplace as (
SELECT gas.geom
FROM nj_gas gas
WHERE gas.site_id = 11949
LIMIT 1
)
SELECT ev.*
FROM alt_fuel AS ev
JOIN myplace AS mp
ON ST_DWithin(ev.geom1, mp.geom, 1000) -- ST_DWithin(ev.geom1::GEOGRAPHY, mp.geom::GEOGRAPHY, 1000)
;
If, for some reason, you also want to display myplace.geom along with the stations, you'd have to UNION[ ALL] the above with a SELECT * on myplace; note that you will also have to provide the same column list and structure (same data types!) as alt_fuel.* (or better, the other side of the UNION[ ALL]) in that SELECT!
Note the suggestions made by @JimJones about units: if your data is not projected in a meter-based CRS (but in a geographic reference system, i.e. 'LonLat'), cast to GEOGRAPHY so that ST_DWithin interprets the distance in meters (and calculates with spheroidal rather than planar (Euclidean) geometry)!
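A small sqlite3 sketch of this diagnosis. Geometries are stored as hypothetical WKT strings and the ST_DWithin filter is replaced by `ON 1 = 1`, because plain SQLite has no PostGIS functions. It shows that `myplace.*, ev.*` puts a `geom` column holding the same gas-station point on every row, while selecting only `ev.*` leaves just the stations' own `geom1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nj_gas (site_id INTEGER, geom TEXT);
CREATE TABLE alt_fuel (id INTEGER, geom1 TEXT);
INSERT INTO nj_gas VALUES (11949, 'POINT(0 0)'), (11099, 'POINT(9 9)');
INSERT INTO alt_fuel VALUES (1, 'POINT(1 1)'), (2, 'POINT(2 2)');
""")

cte = "WITH myplace AS (SELECT geom FROM nj_gas WHERE site_id = 11949 LIMIT 1) "

# Original shape: cross join returning the columns of both sides.
cur = conn.execute(cte + "SELECT myplace.*, ev.* FROM alt_fuel ev, myplace")
cross_cols = [d[0] for d in cur.description]
cross_rows = cur.fetchall()

# Suggested shape: join, but return only the station columns.
# (ON 1 = 1 stands in for the ST_DWithin condition.)
cur = conn.execute(cte + "SELECT ev.* FROM alt_fuel AS ev JOIN myplace AS mp ON 1 = 1")
station_cols = [d[0] for d in cur.description]

print(cross_cols)                      # ['geom', 'id', 'geom1']
print({row[0] for row in cross_rows})  # {'POINT(0 0)'}: same point on every row
print(station_cols)                    # ['id', 'geom1']
```

A map client that draws the `geom` column of the first result would therefore place every row on the gas station.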

Resolved by using:
WITH
myplace AS (
SELECT geom AS g
FROM nj_gas
WHERE site_id IN (11949, 11099, 11679, 480522)
), myresults AS (
SELECT ev.*
FROM alt_fuel AS ev
JOIN myplace AS mp
ON ST_DWithin(ev.geom, mp.g, 0.1)) -- 0.1 is in the units of the data's SRID (degrees for lon/lat data), not meters
SELECT * FROM myresults;
Thanks so much for your help, @ThingumaBob and @JimJones! Greatly appreciate it.

One-dimensional earth mover's distance in BigQuery/SQL

Let P and Q be two finite probability distributions on integers, with support between 0 and some large integer N. The one-dimensional earth mover's distance between P and Q is the minimum cost you have to pay to transform P into Q, considering that it costs r*|n-m| to "move" a probability r associated to integer n to another integer m.
There is a simple algorithm to compute this. In pseudocode:
previous = 0
sum = 0
for i from 0 to N:
    previous = P(i) - Q(i) + previous
    sum = sum + abs(previous)  // abs = absolute value
return sum
Now, suppose you have two tables, each containing a probability distribution. Column n contains integers, and column p contains the corresponding probabilities. The tables are valid (all probabilities are between 0 and 1, and they sum to 1). I want to compute the earth mover's distance between these two tables in BigQuery (Standard SQL).
Is it possible? I feel like one would need to use analytic functions, but I don't have much experience with them, so I don't know how to get there.
What if N (the maximum integer) is very large, but my tables are not? Can we adapt the solution to avoid doing a computation for each integer i?
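The pseudocode translates directly into Python, which can serve as a reference for checking a SQL result. This is not BigQuery code; P and Q are assumed to be dicts mapping integers to probabilities, with a missing key meaning probability 0. Note that the loop visits every integer from 0 to N, which is exactly the cost the last question wants to avoid.

```python
def emd_dense(P, Q, N):
    """Earth mover's distance via the pseudocode loop over every integer 0..N.

    P and Q map integers to probabilities; a missing key means probability 0.
    """
    previous = 0.0  # running sum of P(k) - Q(k) for k <= i
    total = 0.0     # `sum` in the pseudocode (renamed: sum is a Python builtin)
    for i in range(N + 1):
        previous = P.get(i, 0.0) - Q.get(i, 0.0) + previous
        total += abs(previous)
    return total

# All mass at 0 versus all mass at 5: one unit moved 5 steps.
print(emd_dense({0: 1.0}, {5: 1.0}, N=5))          # 5.0
# Half the mass moves 0 -> 1 and half moves 2 -> 1.
print(emd_dense({0: 0.5, 2: 0.5}, {1: 1.0}, N=2))  # 1.0
```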

Hopefully I fully understand your problem. This seems to be what you're looking for:
WITH Aggr AS (
SELECT rp.n AS n, SUM(rp.p - rq.p)
OVER(ORDER BY rp.n ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS emd
FROM P rp
LEFT JOIN Q rq
ON rp.n = rq.n
) SELECT SUM(ABS(a.emd)) AS total_emd
FROM Aggr a;
Regarding question #2: this only scans the rows actually in the tables, regardless of N, assuming every n in P has a matching n in Q.

I adapted Michael's answer to fix its issues; here's the solution I ended up with. Suppose the integers are stored in column i and the probabilities in column p. First I join the two tables, then I compute EMD(i) for every i using window functions, and finally I sum all the absolute values.
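WITH
joined_table AS (
  SELECT
    IFNULL(table1.i, table2.i) AS i,
    IFNULL(table1.p, 0) AS p,
    IFNULL(table2.p, 0) AS q
  FROM table1
  FULL OUTER JOIN table2 -- BigQuery requires FULL/LEFT/RIGHT before OUTER JOIN
    ON table1.i = table2.i
),
aggr AS (
  SELECT
    -- The running P - Q difference stays constant from i up to the next listed i,
    -- so weight it by the gap to the NEXT integer (LEAD), not the previous one.
    -- LEAD gets its own window: a navigation function cannot take a ROWS frame.
    SUM(p - q) OVER (ORDER BY i ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
      * (LEAD(i) OVER (ORDER BY i) - i) AS emd
  FROM joined_table
)
-- The last row has no LEAD (NULL emd), which SUM ignores; its running difference is 0 anyway.
SELECT SUM(ABS(emd)) AS total_emd
FROM aggr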
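To check the windowed approach, here is a Python sketch of the same sparse idea (dict inputs are an assumption, as is everything else here; it is not BigQuery code). Only integers present in either distribution are visited, and because the running difference P - Q is constant between consecutive listed integers, each running value should be weighted by the gap to the next integer (LEAD(i) - i in SQL terms). Weighting by the gap to the previous integer, as LAG would, returns 0 instead of 5 for a unit mass moved from 0 to 5. The result is compared against the dense loop from the question's pseudocode.

```python
def emd_dense(P, Q, N):
    """Loop from the question's pseudocode, over every integer 0..N."""
    previous, total = 0.0, 0.0
    for i in range(N + 1):
        previous = P.get(i, 0.0) - Q.get(i, 0.0) + previous
        total += abs(previous)
    return total

def emd_sparse(P, Q):
    """Visit only integers present in P or Q (like the full outer join)."""
    support = sorted(set(P) | set(Q))
    running, total = 0.0, 0.0
    # zip stops before the last point, which has no next integer;
    # its running difference is 0 when both distributions sum to 1.
    for i, nxt in zip(support, support[1:]):
        running += P.get(i, 0.0) - Q.get(i, 0.0)
        total += abs(running) * (nxt - i)
    return total

P = {0: 0.25, 3: 0.5, 10: 0.25}
Q = {1: 0.5, 10: 0.5}
print(emd_sparse(P, Q), emd_dense(P, Q, N=10))   # 2.5 2.5
print(emd_sparse({0: 1.0}, {10**12: 1.0}))       # huge N, still only two points
```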

MS SQL - Count by range of data

I have a database of aircraft flight track data that cross certain points. I'm looking at the altitude that the aircraft crossed these points at and trying to bin them by every 100 ft. The altitudes range from about 2000 ft to 15000 ft so I want a way to do this that automates the 100 ft increments. So I want to have the crossing point, a range (say 2000-2100 ft), and the count. And the next line is the crossing point, the next range (2100-2200 ft), and the count, and so on.
I'm still a SQL newbie so any help to get me pointed in the right direction would be appreciated. Thanks.
Edited for clarity: I don't have any code yet. I want one column with my crossing location, another with the altitude range, and a third with the count. I'm just not sure how to bin the data so that it gives me the ranges in 100 ft increments.

You can use a computed column for AltitudeBucket, which SQL Server calculates automatically. (This technique is often used when loading dimension tables into data warehouses.)
In this case, having AltitudeBucket as a computed column means you can do calculations on it and use it in WHERE clauses.
Create and populate a table.
CREATE TABLE dbo.TrackPoint
(
TrackPointID int NOT NULL IDENTITY(1,1) PRIMARY KEY,
CrossingPoint nvarchar(50) NOT NULL,
AltitudeFeet int NOT NULL
CHECK (AltitudeFeet BETWEEN 1 AND 60000),
AltitudeBucket AS (AltitudeFeet / 100) * 100 PERSISTED NOT NULL
);
GO
INSERT INTO dbo.TrackPoint (CrossingPoint, AltitudeFeet)
VALUES
(N'Paris', 12772),
(N'Paris', 12765),
(N'Paris', 32123),
(N'Toulouse', 5123),
(N'Toulouse', 6123),
(N'Toulouse', 6120),
(N'Lyon', 15000),
(N'Lyon', 15010);
Display what's in the table.
SELECT *
FROM dbo.TrackPoint;
Run a SELECT query to calculate summarised counts.
SELECT CrossingPoint, AltitudeBucket, COUNT(*) AS 'Count'
FROM dbo.TrackPoint
GROUP BY CrossingPoint, AltitudeBucket
ORDER BY CrossingPoint, AltitudeBucket;
If you want to display the altitude range.
SELECT CrossingPoint, AltitudeBucket, CAST(AltitudeBucket AS nvarchar) + N'-' + CAST(AltitudeBucket + 99 AS nvarchar) AS 'AltitudeBucketRange', COUNT(*) AS 'Count'
FROM dbo.TrackPoint
GROUP BY CrossingPoint, AltitudeBucket
ORDER BY CrossingPoint, AltitudeBucket;
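For a quick local check of the bucket arithmetic, here is the same grouping run through Python's sqlite3 module with the sample rows above. SQLite stands in for SQL Server here (an assumption), so the bucket is computed inline instead of as a persisted computed column, and `||` replaces `+` for string concatenation. Integer division `(AltitudeFeet / 100)` truncates, which is what makes `* 100` land on the start of each 100 ft bucket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TrackPoint (CrossingPoint TEXT, AltitudeFeet INTEGER)")
conn.executemany("INSERT INTO TrackPoint VALUES (?, ?)", [
    ("Paris", 12772), ("Paris", 12765), ("Paris", 32123),
    ("Toulouse", 5123), ("Toulouse", 6123), ("Toulouse", 6120),
    ("Lyon", 15000), ("Lyon", 15010),
])

rows = conn.execute("""
    SELECT CrossingPoint,
           (AltitudeFeet / 100) * 100 AS AltitudeBucket,            -- integer division
           CAST((AltitudeFeet / 100) * 100 AS TEXT) || '-' ||
           CAST((AltitudeFeet / 100) * 100 + 99 AS TEXT) AS AltitudeBucketRange,
           COUNT(*) AS "Count"
    FROM TrackPoint
    GROUP BY CrossingPoint, AltitudeBucket
    ORDER BY CrossingPoint, AltitudeBucket
""").fetchall()

for row in rows:
    print(row)
```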

Whenever you're attempting to automate any kind of process, you must first design an algorithm that works when the process is executed manually. To begin, pick out the smallest piece of this process: returning a count of altitudes in the range x to x+100. So when x = 2000, you want to return all records from 2000 up to (but not including) 2100.
SELECT COUNT(*) FROM AltitudesTable
WHERE altitude >= 2000 AND altitude < 2100;
The above code works for one case: 2000 <= x < 2100.
To "automate," or loop through all cases, try using T-SQL:
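DECLARE @x INT = 2000;
WHILE @x <= (SELECT MAX(altitude) FROM AltitudesTable) -- stop after the highest altitude; WHILE EXISTS(...) would never end
BEGIN
SELECT @x AS RangeStart, COUNT(*) AS AltitudeCount FROM AltitudesTable
WHERE altitude >= @x AND altitude < @x + 100;
SET @x = @x + 100;
END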
Respectfully, your requirements are not solidly defined, so I had to make some assumptions regarding table structure and datatypes.
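The control flow of such a loop can be sketched in Python with sqlite3. The names AltitudesTable and altitude come from the answer; the sample values are hypothetical. The loop terminates once x passes the highest recorded altitude and runs one range query per 100 ft step, whereas a set-based GROUP BY produces all counts in a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AltitudesTable (altitude INTEGER)")
conn.executemany("INSERT INTO AltitudesTable VALUES (?)",
                 [(2050,), (2099,), (2100,), (2250,)])

(max_alt,) = conn.execute("SELECT MAX(altitude) FROM AltitudesTable").fetchone()
counts = []
x = 2000
# Stop once x passes the highest altitude (MAX is NULL/None for an empty table).
while max_alt is not None and x <= max_alt:
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM AltitudesTable WHERE altitude >= ? AND altitude < ?",
        (x, x + 100),
    ).fetchone()
    counts.append((x, n))
    x += 100

print(counts)  # [(2000, 2), (2100, 1), (2200, 1)]
```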

Finding the pair of points whose distance from each other is maximal

I have a very small database containing 6 points, with the columns id, the_geom and descr. My aim is to write a PL/pgSQL function that finds the pair of points whose distance from each other is maximal. As output, I would like to show the id or descr of the two points and the distance between them.
I have tried writing a function that RETURNS TABLE, but would SETOF text be a better solution?

You could use a cross join to find all combinations and then order by the distance. If your table were named foo, something like:
SELECT set1.id, set2.id, ST_Distance(set1.the_geom, set2.the_geom) AS distance -- assumes PostGIS; may want the earthdistance extension here
FROM foo set1, foo set2
WHERE set1.id < set2.id -- each pair only once
ORDER BY 3 DESC
LIMIT 1;
And if you need earth (great-circle) distances for the calculation itself, see http://www.postgresql.org/docs/9.3/static/earthdistance.html
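The cross-join idea is a brute-force search over all pairs. Here is a Python sketch with six hypothetical planar points, using itertools.combinations so each unordered pair is checked once rather than in both orders. For lon/lat data you would swap math.hypot for a spherical or spheroidal distance.

```python
import math
from itertools import combinations

def farthest_pair(points):
    """points: list of (id, x, y). Return (id1, id2, distance) of the farthest pair."""
    best = None
    for (i1, x1, y1), (i2, x2, y2) in combinations(points, 2):
        d = math.hypot(x2 - x1, y2 - y1)  # planar distance
        if best is None or d > best[2]:
            best = (i1, i2, d)
    return best

points = [(1, 0, 0), (2, 3, 4), (3, 1, 1), (4, -3, -4), (5, 2, 0), (6, 0, 2)]
print(farthest_pair(points))  # (2, 4, 10.0)
```

With only six points, the 15 pairs make brute force perfectly adequate; the SQL self-join does the same amount of work.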

Exclude row if one of 2 flattened columns didn't return

I am joining against a view in a stored procedure to try to return a record.
In one table I have something like this:
Measurement | MeasurementType | Date Performed
I need to plot TireHeight and TireWidth.
So I am flattening that table into one row.
I only want to return a row though if both TireHeight and TireWidth were measured on that date. I am not using the date performed for anything other than joining TireWidth and TireHeight together. I run a calculation on these 2 numbers for my chart point, and use TireAge for the other axis.
How can I exclude a result row if either TireHeight or TireWidth are not available?
Thanks!

You'd use an INNER JOIN to only return rows when they are present in both tables. For example:
SELECT th.DatePerformed
, th.Measurement as TireHeight
, tw.Measurement as TireWidth
FROM (
SELECT DatePerformed, Measurement
FROM Measurements
WHERE MeasurementType = 'TireHeight'
) th
INNER JOIN (
SELECT DatePerformed, Measurement
FROM Measurements
WHERE MeasurementType = 'TireWidth'
) tw
ON tw.DatePerformed = th.DatePerformed
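A sqlite3 sketch of the same query with hypothetical rows (DatePerformed stored as text): one date has both measurements, another has only TireHeight, and the INNER JOIN returns only the complete date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Measurements (Measurement REAL, MeasurementType TEXT, DatePerformed TEXT)"
)
conn.executemany("INSERT INTO Measurements VALUES (?, ?, ?)", [
    (30.0, "TireHeight", "2020-01-01"),
    (10.0, "TireWidth", "2020-01-01"),
    (31.0, "TireHeight", "2020-01-02"),  # no TireWidth on this date
])

rows = conn.execute("""
    SELECT th.DatePerformed, th.Measurement AS TireHeight, tw.Measurement AS TireWidth
    FROM (SELECT DatePerformed, Measurement FROM Measurements
          WHERE MeasurementType = 'TireHeight') th
    INNER JOIN (SELECT DatePerformed, Measurement FROM Measurements
                WHERE MeasurementType = 'TireWidth') tw
      ON tw.DatePerformed = th.DatePerformed
""").fetchall()

print(rows)  # [('2020-01-01', 30.0, 10.0)]
```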
