In my user schema I have a location column which will hold the last updated coordinates of the user. Is there a PostgreSQL data type I could use that would let me run SQL queries to search by distance, for example something like: SELECT username FROM Users WHERE currentLocation = "20 Miles Away From $1"
If your "location" column holds GPS coordinates then basically your only option is to use the PostGIS extension. With PostGIS you store your locations in a column with geometry or geography data type and then you can apply a rich set of function on it. Your query would become something like:
SELECT username
FROM users
WHERE ST_Distance(currentlocation, $1) <= 20; -- careful with units
This is assuming that $1 is of the same data type as currentlocation. Note the <= rather than =: a computed distance will almost never be exactly 20. You will probably also have to convert the miles into the unit of your data, which depends on the coordinate reference system and the data type (geography measures in meters; geometry produces whatever unit the data is in).
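For this kind of search, ST_DWithin is usually preferable to comparing raw ST_Distance values, since it can use a spatial index. A minimal sketch, assuming a geography(Point, 4326) column and made-up coordinates standing in for $1:

-- Assumed schema: users(username text, currentlocation geography(Point, 4326))
SELECT username
FROM users
WHERE ST_DWithin(
        currentlocation,
        ST_MakePoint(-73.99, 40.73)::geography,  -- $1 as a geography point
        20 * 1609.344);                          -- 20 miles expressed in meters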
I have an integer-type column in my BigQuery table and now I need to convert it to a float column, keeping all records. What I want to do is change the column type, not cast it.
I've read that it's possible to do it just by exporting results of a query on a table to itself.
How to do it?
Using SELECT and writing the result back to the table
SELECT
  CAST(int_field AS FLOAT) AS float_field,
  <all_other_fields>
FROM YourTable
This approach will cost you a scan of the whole table.
To execute this, use the Show Options button in the BigQuery Web UI, set the same table as the destination table, and choose "Overwrite table" as the write preference (enable "Allow Large Results" if needed). After you run this, your table will have that column as FLOAT instead of the original INTEGER, as you wanted. Note: you should use the same field name for both int_field and float_field.
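For instance, a minimal sketch in legacy SQL, assuming a hypothetical table mydataset.events whose INTEGER column score is the one being converted:

SELECT
  id,
  CAST(score AS FLOAT) AS score,  -- same field name, now FLOAT
  name
FROM [mydataset.events]

Run this with mydataset.events itself as the destination table and "Overwrite table" as the write preference.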
If you just needed to add a new column, I would point you to the Tables: patch GBQ API. I am not sure whether that API allows changing the type of an existing column (I doubt it), but it is easy to check.
Using Jobs: insert with EXTRACT and then LOAD
Here you extract the table to GCS and then load it back into GBQ with the adjusted schema. This approach will a) eliminate the cost of querying (scanning) the table and b) help with limitations you can run into if you have a complex schema (records/repeated/nested types and modes, etc.).
I have a database which has the latitudes and longitudes of various properties stored. I want to find out which city each of these properties belongs to (all properties are in the US).
Talking about PostgreSQL: first of all you need to get US city boundary data as a shapefile. Possible sources are:
https://www.census.gov/geo/maps-data/data/tiger.html
https://www.census.gov/geo/maps-data/data/tiger-cart-boundary.html
https://catalog.data.gov/dataset?tags=cities
After that, import the data into Postgres (I am assuming your properties data is already stored in Postgres). Make sure the SRID of the city-boundary geometries is 4326; if not, you can convert them easily with the ST_Transform function.
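For example, a reprojection sketch, assuming the imported table is named cities_boundaries with a geometry column geom (both names are assumptions):

ALTER TABLE cities_boundaries
  ALTER COLUMN geom TYPE geometry(MultiPolygon, 4326)
  USING ST_Transform(geom, 4326);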
Finally, to check which city a specific lat/long falls in, you need to convert the lat/long into a point geometry and check it against the cities data. E.g. it would be something like this:
SELECT c.city_name
FROM cities_boundaries AS c
JOIN properties AS p
  ON ST_Contains(c.geom, ST_SetSRID(ST_MakePoint(p.longitude, p.latitude), 4326));
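If the tables are large, a spatial (GiST) index on the boundary geometries will speed up these containment checks considerably (same assumed names as above):

CREATE INDEX cities_boundaries_geom_idx
  ON cities_boundaries USING GIST (geom);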
I have tab-separated output from a Mahout recommender that I'd like to query in Hive. The recommendations look like this:
54508 [19:4.9,22:3.5]
54584 [17:5.2]
54648 [13:6.1,3:5.9]
54692 [17:8.1]
55424 [1:3.8]
55448 [16:2.7,3:1.2]
55452 [17:6.8]
57084 [42:6.8,3:5.4]
57212 [17:3.5]
There are two columns: the first column contains a userID, and the second contains a list of recommended products and their expected ratings.
I created a Hive table:
CREATE TABLE `recommendations_raw`(
  user int,
  recommendations string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/etl/recommender/output';
And I'm able to transform the data into a long tabular form in a Hive query:
select
  user,
  product,
  rating
from recommendations_raw
lateral view explode(
    str_to_map(substr(recommendations, 2, length(recommendations) - 2), ",", ":")
  ) product_rating as product, rating;
user    product  rating
54508   19       4.9
54508   22       3.5
54584   17       5.2
[etc...]
However, I would have preferred to create the map inside the create table statement instead of using str_to_map inside the query since it seems wrong to create a table with string datatype when it's really a map.
Is this possible/practical? If so, how?
Looks like, in essence, you are using an EXTERNAL TABLE on a text data file produced by a non-Hive program (Mahout in this case).
If the file format were compatible with the way Hive serializes its MAP data type in text (which is not the case here, because of the enclosing brackets), I guess you could just "map" a MAP column (excuse the pun) onto that list of key:value pairs.
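For illustration, a sketch of what that direct mapping could look like if the brackets were absent from the file; the table name is made up, and the layout otherwise mirrors the DDL from the question:

CREATE EXTERNAL TABLE recommendations_map (
  user int,
  recommendations map<int,float>)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ','
  MAP KEYS TERMINATED BY ':'
LOCATION '/etl/recommender/output';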
But anyway, text is text: Hive has to deserialize the map on every read, whether implicitly (in the case of a MAP column definition) or explicitly (in the case of a STRING column plus a str_to_map() call).
Bottom line: if your goal is simply to explode the list and feed another table with a "normalized" structure, as shown in your sample code, then your str_to_map() solution is the better one, because it is more versatile (it can handle the brackets...!)
I would like to change the manner in which the mileage is represented in the database. For example, right now the mileage is represented as 080+0.348; this would mean that this particular feature is at mileage point 80.348 along the roadway corridor. I would like to have the data represented in the database in the latter form, 80.348 and so on. This would save me from having to export the dataset to excel for the conversion. Is this even possible? The name of the column is NRLG_MILEPOINT.
Much appreciated.
One thing you could try is to pick the string value apart into its component pieces and then recombine them as a number. If your data is in a table called TEST you might do something like the following:
select miles, fraction,
nvl(to_number(miles), 0) + nvl(to_number(fraction), 0) as milepoint
from (select regexp_substr(nrlg_milepoint, '[0-9]*') as miles,
regexp_substr(nrlg_milepoint, '[+-][0-9.]*') as fraction
from test);
Share and enjoy.
Using the answer provided above, I was able to expand it to get exactly the answer I needed. Thanks a ton to everyone who helped! Here is the query I ended up with:
select distinct nrlg_dept_route, corridor_code_rb, nrlg_county, next_county,
       nvl(to_number(miles), 0) + nvl(to_number(fraction), 0) as milepoint
from (select regexp_substr(nrlg_milepoint, '[0-9]*') as miles,
             nrlg_milepoint as nrlg_mile_point,
             nrlg_dept_route as nrlg_dept_route,
             nrlg_county as nrlg_county,
             next_county as next_county,
             corridor_code_rb as corridor_code_rb,
             corridor_code as corridor_code,
             regexp_substr(nrlg_milepoint, '[+-][0-9.]*') as fraction
      from corridor_county_intersect, south_van_data_view)
where nrlg_dept_route = corridor_code
order by 1, 5;
There are a variety of ways to do this. Which one to use depends on your situation: how the data needs to be stored and how it is being interacted with. Some of these options include:
1. Changing the datatype. This option would potentially require you to change how the data is currently being stored; the conversion of the data would have to be done by whatever writes the data to the schema.
2. Creating another column that stores the data in the correct format. If you have an existing means of storing the data that would be broken by changing the datatype of NRLG_MILEPOINT, and/or you have a requirement to store the data in that format, you could add another column, say NRLG_MILEAGE_DISPLAY, of a NUMBER datatype perhaps, and store the converted data there. You could create a trigger that updates/inserts NRLG_MILEAGE_DISPLAY appropriately, based on the data in NRLG_MILEPOINT.
3. Converting the value in your SELECT. If you just want the data to be displayed differently in your query results, you can convert the datatype in the SQL statement itself. Specifically how you would do this depends on the current datatype of NRLG_MILEPOINT.
Assuming that varchar2 is the type, based on the comments, a crude example of option 3 (originally shared as an SQLFiddle) is sketched below. Your usage of this may vary depending on the actual datatype of NRLG_MILEPOINT; regardless of the datatype, there is surely a means of converting how it is displayed in your query. You could take this further and create a view if you needed to; as an inline view or a stored view, you can then use the converted value for doing your join later.
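A minimal sketch of option 3 as a stored view, reusing the regexp approach from the answer above (the view name is made up; corridor_county_intersect is the table from the follow-up query):

create or replace view corridor_mileage_v as
select t.*,
       nvl(to_number(regexp_substr(nrlg_milepoint, '^[0-9]+')), 0)
         + nvl(to_number(regexp_substr(nrlg_milepoint, '[+-][0-9.]+')), 0) as nrlg_mileage_display
from corridor_county_intersect t;

For a value like '080+0.348' this yields 80.348.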