I have two tables. In the first table, each row represents a polygon. In the second table, each row represents a point. I want to find out whether each point is within one of the polygons. I tried to use ST_Contains, ST_MultiPolygon, and ST_Point in Hive. I think there is a way to feed all rows into ST_MultiPolygon, but I'm not sure how to do that. The following is my test data.
pid | shape
1 | [2,0,3,0,3,1,2,1]
2 | [0,0,1,0,1,1,0,1]
This is the polygon table.
pid | x | y
1 | 0.5 | 0.5
2 | 2.1 | 0.5
3 | 1.5 | 0.5
This is the point table.
I want to get a result like this:
pid | is_in
1 | true
2 | true
3 | false
Here is how I think about this problem. What I want to do is determine whether a point is in one of the polygons stored in Hive. Suppose I have two polygons, [2,0,3,0,3,1,2,1] and [0,0,1,0,1,1,0,1]; they really are [(2,0), (3,0), (3,1), (2,1)] and [(0,0), (1,0), (1,1), (0,1)]. The reason I store them in this odd way is that ST_MultiPolygon takes this kind of format as a parameter, like ST_MultiPolygon(array(2,0,3,0,3,1,2,1)). Combining ST_MultiPolygon with ST_Contains and ST_Point, I can get a boolean result that indicates whether a point is in a multipolygon. ST_MultiPolygon can even take multiple arrays, such as ST_MultiPolygon(array(2,0,3,0,3,1,2,1), array(0,0,1,0,1,1,0,1)). So if there were a way for me to feed all polygons into one ST_MultiPolygon, I would know whether a point is in any of the polygons.
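To make the intended semantics concrete, here is the containment logic I want Hive to apply, sketched in plain Python with a standard ray-casting test over the same flat coordinate arrays; it only illustrates the desired result, not the Hive solution itself.

def point_in_polygon(x, y, flat):
    # flat is [x1, y1, x2, y2, ...], the same layout ST_MultiPolygon takes.
    coords = list(zip(flat[0::2], flat[1::2]))
    inside = False
    j = len(coords) - 1
    for i, (xi, yi) in enumerate(coords):
        xj, yj = coords[j]
        # Count crossings of a horizontal ray extending right from (x, y).
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

polygons = [[2, 0, 3, 0, 3, 1, 2, 1], [0, 0, 1, 0, 1, 1, 0, 1]]
points = [(1, 0.5, 0.5), (2, 2.1, 0.5), (3, 1.5, 0.5)]

for pid, x, y in points:
    # A point is "in" if any polygon contains it, mirroring the is_in column.
    print(pid, any(point_in_polygon(x, y, poly) for poly in polygons))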
Any comment will be appreciated.
I've run into a problem: I cannot get a proper working LINQ statement here.
Suppose I have a DataTable with x rows and I have to match against the sum of the Quantity column. I have a condition, RequestedQuantity = 20: I need to find rows whose Quantity values sum exactly to RequestedQuantity, but only combinations of exactly 3 rows qualify.
+-----+----------+
| Bin | Quantity |
+-----+----------+
| 1 | 10 |
| 2 | 5 |
| 3 | 5 |
| 4 | 10 |
| 5 | 15 |
+-----+----------+
I can’t seem to figure out the proper LINQ syntax to get this to work. My starting point is this:
From row In StorageBins.AsEnumerable.GroupBy( _
    Convert.ToDouble(Function (x) x("Quantity"), cultureInfo)).Sum( _
    Function (y) Convert.ToDouble(y("Quantity"), cultureInfo) = _
    Double.Parse(RequestedQuantity, cultureInfo))
Initially, I am just trying to get any rows that are equal to my condition. My end goal, however, is getting any three rows that sum exactly to my requested quantity.
I’m not an expert in LINQ, unfortunately. I hope some of you might be!
Maybe I'm missing something, but this actually seems like a pretty complicated problem. Pick any 3 records, but only 3, that add up to exactly 20. How many rows are there in the database? The number of potential combinations can grow quite large pretty quickly. And what do you do after you get the 3? Do you have to go back through recursively and group up the other records as well? Or do you just need the first set of 3 that adds up to 20?
Assuming you just need the first 3, I would do something like this:
1. Get the first record that is less than 20. Remove it from your input list and put it into your target set.
2. Then get the first record that is less than 20 minus the first value, i.e. if the first value was 5, get records that are less than 15 (20 minus 5). This ensures you 'leave room' for the third value. Remove it from the input list and put it into your target set.
3. Then get the first record that is exactly 20 minus number one minus number two. Remove it from the input list and put it into the target set.
4. You would have to do this with iterators. If no value meets the third criterion, release the third value from your target set and put it back in your input list. Then go back to step 2 and pick the next record that matches step 2 (ideally one that is not equal to the previous value). And if you exhaust all of the iterations through step 2, go back to step 1, pick the next value there, and start the whole thing over again.
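If exactly three rows must hit the target, the whole search can also be expressed directly with combinations; here's a minimal sketch in Python, just to show the logic (translating it into LINQ syntax is a separate exercise), using the Bin/Quantity values from your table:

from itertools import combinations

bins = [(1, 10), (2, 5), (3, 5), (4, 10), (5, 15)]  # (Bin, Quantity) rows
requested = 20

# Take the first combination of exactly 3 rows whose quantities sum to the target.
match = next(
    (combo for combo in combinations(bins, 3)
     if sum(qty for _, qty in combo) == requested),
    None,
)
print(match)  # ((1, 10), (2, 5), (3, 5))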
Unless I'm misunderstanding your requirement...
Hi, I'm quite new to Pentaho Spoon and I have a problem:
I have a table like this:
model | type | color | q
------+------+-------+---
  1   |  1   | blue  | 1
  1   |  2   | blue  | 2
  1   |  1   | red   | 1
  1   |  2   | red   | 3
  2   |  1   | blue  | 4
  2   |  2   | blue  | 5
And I would like to create a separate table for each model (to export to CSV or Excel), grouped by type, with the color values as column headers and q as the cell values:
table-1.csv
type | blue | red
  1  |  1   |  1
  2  |  2   |  3
table-2.csv
type | blue
  1  |  4
  2  |  5
I tried the Row Denormaliser step, but got nowhere.
Any suggestions?
Typically it's helpful to see what you have done in order to offer help, but I know how counterintuitive the "help" on this step is.
Make sure you sort the rows on Model and Type before sending them to the denormalizer step, then give it a try.
As for splitting the output into files, there are a few ways to handle that. Take a look at the Switch/Case step using the Model field.
Also, if you haven't found them already, take a look at the sample files that come with the PDI download. They should be in ...pdi-ce-6.1.0.1-196\data-integration\samples. They can be more helpful than the online documentation sometimes.
Row Denormaliser can't be used here if the number of colors is unknown; also, you can't define text output fields dynamically.
There are a few ways I can see to do this without using Java or JavaScript steps. One of them is based on the following idea: we can prepare rows with two columns:
Row           | Model
type|blue|red | 1
1|1|1         | 1
2|2|3         | 1
type|blue     | 2
1|4           | 2
2|5           | 2
Then we can build a filename for each row from the Model field and output all rows with a Text file output step that takes the file name from that filename field. That way, all records are exported into two files without additional effort.
Here you can find a sample transformation: copy-paste me into new transformation
Please note that this is a sample solution that works only with CSV. It also works only if you have the same set of colors for each type inside a model. It's just a hint at how to use Spoon, not a complete solution.
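For comparison, the same reshape-and-split logic looks like this as a short pandas sketch (column names taken from the sample table; this only illustrates what the transformation computes, it is not a Spoon step):

import pandas as pd

# The sample table from the question.
df = pd.DataFrame({
    'model': [1, 1, 1, 1, 2, 2],
    'type':  [1, 2, 1, 2, 1, 2],
    'color': ['blue', 'blue', 'red', 'red', 'blue', 'blue'],
    'q':     [1, 2, 1, 3, 4, 5],
})

for model, group in df.groupby('model'):
    # Pivot each model's rows: type down the side, one column per color.
    table = group.pivot_table(index='type', columns='color', values='q')
    table.to_csv('table-%d.csv' % model)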
I have the variable labels and value labels in a table in my database, like this:
id_variable_label | variable_label | id_value_label | value_label | id_father_label
------------------+----------------+----------------+-------------+----------------
        1         | father_label   |      null      |    null     |      null
       null       | father_label   |       1        |   child01   |       1
       null       | father_label   |       2        |   child02   |       1
Is there a way to generate all the variable and value labels automatically when I import the data from my database through an ODBC connection?
There isn't a direct way to do this, but if you read that table as an SPSS dataset, it would be pretty simple to generate the labels with a little Python code.
Note also that if your labeling is static, you can use APPLY DICTIONARY to copy labels from one dataset to another, so saving one fully labeled file would allow you to propagate that to others that are similarly structured.
You can use SPSS syntax to create variable and value labels.
See the SPSS commands VARIABLE LABELS and VALUE LABELS.
There's a tutorial here that explains how you can use them.
You could generate the syntax from your database.
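As a minimal sketch, assuming the label rows have already been fetched from the database (with any DB-API cursor), Python can turn them into label syntax; the variable name v1 below is a stand-in, since the question's table doesn't show which SPSS variable the labels belong to:

# Rows from the sample table: (id_variable_label, variable_label,
#                              id_value_label, value_label, id_father_label)
rows = [
    (1,    'father_label', None, None,      None),
    (None, 'father_label', 1,    'child01', 1),
    (None, 'father_label', 2,    'child02', 1),
]

syntax = []
for id_var, var_label, id_val, val_label, id_father in rows:
    if id_var is not None:
        # A variable-label row: label the (assumed) variable v1.
        syntax.append('VARIABLE LABELS v1 "%s".' % var_label)
    else:
        # A value-label row: attach one value label to v1 without clearing others.
        syntax.append('ADD VALUE LABELS v1 %d "%s".' % (id_val, val_label))

print('\n'.join(syntax))
# Inside SPSS's Python integration this could then be run with:
#   import spss
#   spss.Submit('\n'.join(syntax))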
I have a crosstab whose rows indicate different classes, with people's names across the top.
| | Required | Person 1 | Person 2 | Person 3 |
| Class 1 | 8 6 | 1 6 | 3 6 | 4 6 |
| Class 2 | 6 2 | 3 2 | 2 2 | 1 2 |
Each field contains two values. The first value is the number of hours spent in the class; the second is the number of hours required for certification.
The Required field is my grand total summary.
In the Cross-Tab Expert the fields are defined as follows.
Rows:
Command.descr -> a field containing the class names
Columns:
Command.fullname -> a field containing students full names
Summarized Fields:
Sum of Command.evlength -> summation of all time spent in a given course
Max of #required -> this formula returns the number of required hours based on the course name
I am trying to highlight the field Sum of Command.evlength if it is greater than or equal to the value of Max of #required.
My solution was to perform background formatting. Right-Click on the Sum of Command.evlength field, select Format Field. Click the borders tab, check Background, and enter a formula.
The formula I was using is:
if CurrentFieldValue >= {#required} then color(152, 251, 152) else crNoColor
This is not the correct formula: because my crosstab is placed in the footer, {#required} contains the last value in the grid, which in the above example is 2.
From my research I thought I would have to use GridRowColumnValue(row or column name) to access the value of {#required} in the crosstab, but I could not come up with the correct string to represent it.
Does anyone have a way for me to correctly perform this comparison?
Frustratingly, I don't think you can use the Highlighting Expert to compare against a dynamic value. You could swap the columns round (so the required value prints before the cells that need to compare against it), then add the following formulas:
To the max_of_required background colour:
whileprintingrecords;
// capture the required hours as this cell prints
global numbervar required_hrs := currentfieldvalue;
crNoColor;
To the sum_of_command.evlength background colour:
whileprintingrecords;
global numbervar required_hrs;
// compare this cell's hours against the requirement captured above
if currentfieldvalue >= required_hrs then
    crRed
else
    crNoColor;
I think there are a few other ways, but I'm not as confident with those, so start here.
I have a table which contains the edges from node x to node y in a graph.
n1 | n2
-------
a | a
a | b
a | c
b | b
b | d
b | c
d | e
I would like to create a (materialized) view that gives the smallest number of hops needed to get from node x to node y:
n1 | n2 | c
-----------
a | a | 0
a | b | 1
a | c | 1
a | d | 2
a | e | 3
b | b | 0
b | d | 1
b | c | 1
b | e | 2
d | e | 1
How should I model my tables and views to facilitate this? I guess I need some kind of recursion, but I believe that is pretty difficult to accomplish in SQL. I would like to avoid clients having to fire 10 queries just because a path happens to contain 10 hops.
This works for me, but it's kinda ugly:
WITH RECURSIVE paths (n1, n2, distance) AS (
    -- Base case: every edge between two distinct nodes is a path of length 1.
    SELECT
        nodes.n1,
        nodes.n2,
        1
    FROM nodes
    WHERE nodes.n1 <> nodes.n2
    UNION ALL
    -- Recursive step: extend each known path by one outgoing edge.
    SELECT
        paths.n1,
        nodes.n2,
        paths.distance + 1
    FROM paths
    JOIN nodes ON paths.n2 = nodes.n1
    WHERE nodes.n1 <> nodes.n2
)
-- Keep only the shortest distance for each reachable pair...
SELECT
    paths.n1,
    paths.n2,
    min(distance)
FROM paths
GROUP BY 1, 2
UNION
-- ...and add distance 0 for the explicit self-loops.
SELECT
    nodes.n1,
    nodes.n2,
    0
FROM nodes
WHERE nodes.n1 = nodes.n2
Also, I am not sure how well it will perform against larger datasets, and note that the recursive part as written can loop forever if the graph contains a cycle (the sample data has none, apart from the self-loops, which are filtered out). As suggested by Mark Mann, you may want to use a graph library instead, e.g. pygraph.
EDIT: here's a sample with pygraph
from pygraph.algorithms.minmax import shortest_path
from pygraph.classes.digraph import digraph

g = digraph()
g.add_node('a')
g.add_node('b')
g.add_node('c')
g.add_node('d')
g.add_node('e')
g.add_edge(('a', 'a'))
g.add_edge(('a', 'b'))
g.add_edge(('a', 'c'))
g.add_edge(('b', 'b'))
g.add_edge(('b', 'd'))
g.add_edge(('b', 'c'))
g.add_edge(('d', 'e'))

for source in g.nodes():
    # shortest_path returns a spanning tree and a distance map from source.
    tree, distances = shortest_path(g, source)
    for target, distance in distances.iteritems():
        # Skip zero-distance pairs unless an explicit self-loop exists.
        if distance == 0 and not g.has_edge((source, target)):
            continue
        print source, target, distance
Excluding the graph building time, this takes 0.3ms while the SQL version takes 0.5ms.
Expanding on Mark's answer, there are some very reasonable approaches for exploring a graph in SQL as well. In fact, they'll be faster than the dedicated libraries in Perl or Python, in that DB indexes will spare you the need to explore the graph.
The most efficient index (if the graph is not constantly changing) is a nested-tree variation called the GRIPP index. (The linked paper mentions other approaches.)
If your graph is constantly changing, you might want to adapt the nested intervals approach to graphs, in a similar manner to how GRIPP extends nested sets, or simply use floats instead of integers (don't forget to normalize them by casting to numeric and back to float if you do).
Rather than computing these values on the fly, why not create a real table with all interesting pairs along with the shortest-path value? Then whenever data is inserted, deleted, or updated in your data table, you can recalculate all of the shortest-path information. (Perl's Graph module is particularly well-suited to this task, and Perl's DBI interface makes the code straightforward.)
By using an external process, you can also limit the number of recalculations. Using PostgreSQL triggers would cause recalculations to occur on every insert, update and delete, but if you knew you were going to be adding twenty pairs of points, you could wait until your inserts were completed before doing the calculations.
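As a rough sketch of that recalculation step (the answer suggests Perl's Graph module; this is the same idea in plain Python, producing the rows you would bulk-insert into the shortest-path table):

from collections import deque

# Edge list from the question's sample table.
edges = [('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'b'),
         ('b', 'd'), ('b', 'c'), ('d', 'e')]

adj = {}
for n1, n2 in edges:
    adj.setdefault(n1, []).append(n2)
    adj.setdefault(n2, [])
edge_set = set(edges)

rows = []
for source in adj:
    # Breadth-first search gives minimum hop counts from this source.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    # Keep zero-distance pairs only where an explicit self-loop exists,
    # matching the expected view above.
    rows.extend((source, target, d) for target, d in dist.items()
                if d > 0 or (source, target) in edge_set)

print(sorted(rows))  # the (n1, n2, c) rows to bulk-insert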