I have two tables in PostgreSQL: PATHWAY, and the vertices table I created with pgr_createTopology, called PATHWAY_VERTICES_PGR. Everything was fine until I decided to back up the database to restore it later. Now that I have restored it, with the same PostgreSQL 9.3.4 x64, PostGIS 2.1.3 and pgRouting 2.0 versions, nothing has changed except the restore itself, yet pgr_dijkstra has stopped working. I receive this error every time I query pgr_dijkstra:
ERRO: Error computing path: Unknown exception caught!
********** Error **********
ERRO: Error computing path: Unknown exception caught!
SQL state: 38001
When I look up the error code, I find:
38001 containing_sql_not_permitted
An example of a query that worked fine until the restore:
SELECT seq, id1 AS node, id2 AS edge, cost, geom
FROM pgr_dijkstra(
    'SELECT r.gid AS id, r.source, r.target, st_length(r.geom) AS cost, r.geom FROM PATHWAY r',
    956358, 734134, false, false
) AS di
JOIN PATHWAY pt ON di.id2 = pt.gid
I have already tried reinstalling Postgres and dropping and re-adding the postgis and pgrouting extensions, but the error persists. If you have any ideas, let me know; these PostgreSQL error codes are hard to decipher.
This is a memory allocation problem.
Your source and target nodes have high ids, and pgRouting tries to allocate memory based on the highest node id it can find, even if there are only a few edges and nodes in the graph.
Dijkstra, drivingDistance and other functions have the same problem.
IMHO this is a real problem, since you can't select a subgraph from a huge graph without renumbering the edges and nodes, which makes the query parameters of these functions unusable (a renumbering sketch is shown at the end of this answer).
A simple test case to reproduce the problem: create a small graph with one edge whose start and end node ids are 2 000 000 000 and 2 000 000 001. You'll get an error running Dijkstra on these two nodes.
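A minimal SQL sketch of that test case (the table and column names are illustrative, not from the original post):

-- Hypothetical one-edge graph whose node ids are close to the int4 maximum.
CREATE TABLE tiny_graph (
    id     integer PRIMARY KEY,
    source integer,
    target integer,
    cost   double precision
);

INSERT INTO tiny_graph VALUES (1, 2000000000, 2000000001, 1.0);

-- pgRouting 2.0 sizes its internal arrays by the largest node id it sees,
-- so this call fails (or tries to allocate gigabytes) despite the single edge.
SELECT * FROM pgr_dijkstra(
    'SELECT id, source, target, cost FROM tiny_graph',
    2000000000, 2000000001, false, false);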
A technical analysis follows.
Looking at the C source code (pgRouting v2.0.0), in src/bd_dijkstra/src:
bdsp.c
...
line 271 : computing max node id
for(z=0; z<total_tuples; z++) {
if(edges[z].source<v_min_id) v_min_id=edges[z].source;
if(edges[z].source>v_max_id) v_max_id=edges[z].source;
if(edges[z].target<v_min_id) v_min_id=edges[z].target;
if(edges[z].target>v_max_id) v_max_id=edges[z].target;
then at line 315, v_max_id is used as a parameter:
ret = bidirsp_wrapper(edges, total_tuples, v_max_id + 2, start_vertex, end_vertex,
directed, has_reverse_cost,
path, path_count, &err_msg);
in BiDirDijkstra.cpp
...
line 281, v_max_id + 2 = maxNode
int BiDirDijkstra::bidir_dijkstra(edge_t *edges, unsigned int edge_count, int maxNode, int start_vertex, int end_vertex,
path_element_t **path, int *path_count, char **err_msg)
{
max_node_id = maxNode;
max_edge_id = -1;
// Allocate memory for local storage like cost and parent holder
DBG("calling initall(maxNode=%d)\n", maxNode);
initall(maxNode);
and then at line 67, initall tries to allocate a lot of memory:
void BiDirDijkstra::initall(int maxNode)
{
int i;
m_vecPath.clear();
DBG("BiDirDijkstra::initall: allocating m_pFParent, m_pRParent maxNode: %d\n", maxNode+1);
m_pFParent = new PARENT_PATH[maxNode + 1];
m_pRParent = new PARENT_PATH[maxNode + 1];
DBG("BiDirDijkstra::initall: allocated m_pFParent, m_pRParent\n");
DBG("BiDirDijkstra::initall: allocating m_pFCost, m_pRCost maxNode: %d\n", maxNode+1);
m_pFCost = new double[maxNode + 1];
m_pRCost = new double[maxNode + 1];
...
Indirectly related to http://pgrouting.974090.n3.nabble.com/pgrouting-dev-PGR-2-Add-some-robustness-to-the-boost-wrappers-td4025087.html
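As a practical workaround (my own sketch, not from the pgRouting sources), the subgraph's node ids can be densified before calling pgr_dijkstra, for example with row_number(); the pathway column names follow the query from the question:

-- Map the original (sparse, possibly huge) node ids to small, dense ids.
WITH nodes AS (
    SELECT id, row_number() OVER (ORDER BY id) AS new_id
    FROM (SELECT source AS id FROM pathway
          UNION
          SELECT target FROM pathway) AS n
)
SELECT e.gid              AS id,
       ns.new_id          AS source,
       nt.new_id          AS target,
       st_length(e.geom)  AS cost
FROM pathway e
JOIN nodes ns ON ns.id = e.source
JOIN nodes nt ON nt.id = e.target;

This result (or a temporary table built from it) can then be passed as the edge SQL to pgr_dijkstra, with the start and end vertices translated to their new_id values.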
I have two SQL queries using parquet-arrow:
`table` has 50 columns
sql1 = `select * from table`, total_data_size = 45GB
sql2 = `select value from table`, total_data_size = 30GB
I profiled the I/O throughput (yes, I drop the page cache and just watch disk I/O).
I found:
Parquet on HDFS: sql2 is faster than sql1, about 1.5 times, which is reasonable.
Parquet on local disk (1MB randread = 130MB/s; 1MB sequential read = 250MB/s): sql1 is faster than sql2, about 4 times, which is confusing.
I guess at two reasons, based on iostat:
The I/O load is high (about 100~130MB/s, util = 90%~100%) when executing sql2, which seems to mean that selecting one column causes more random reads and lowers the I/O throughput.
select * fills more of the page cache and its hit ratio is high during the run, even though I drop the page cache before executing; so for select *, the I/O throughput actually benefits from the cache hit ratio.
I would appreciate your help, thanks!
I used cachestat to get the page-cache hit ratio and found that select * has a higher ratio (50%) than selecting one column (27%), so the I/O throughput of select * is better because of the page cache.
I tried opening the Parquet file with O_DIRECT to confirm the conclusion, but it reports errno: 22, strerror: Invalid argument. I haven't found the root cause of that error yet, but I still think the page-cache hit ratio is the root cause of the I/O throughput difference.
However, why does select * have a higher hit ratio?
I ended up with a table storing a network topology as follows:
create table topology.network_graph(
    node_from_id varchar(50) not null,
    node_to_id varchar(50) not null,
    PRIMARY KEY (node_from_id, node_to_id)
);
The expected output lists all the sub-graphs (paths) reachable from a given starting node such as "A".
Now I try to find the paths between the nodes, starting at a specific node, using this query:
WITH RECURSIVE network_nodes AS (
    select
        node_from_id,
        node_to_id,
        0 as hop_count,
        ARRAY[node_from_id::varchar(50), node_to_id::varchar(50)] AS "path"
    from topology.network_graph
    where node_from_id = '_9EB23E6C4C824441BB5F75616DEB8DA7' -- Set this node as the starting element
    union
    select
        nn.node_from_id,
        ng.node_to_id,
        nn.hop_count + 1 as hop_count,
        (nn."path" || ARRAY[ng.node_to_id])::varchar(50)[] AS "path"
    from topology.network_graph as ng
    inner join network_nodes as nn
        on ng.node_from_id = nn.node_to_id
        and ng.node_to_id != ALL(nn."path")
)
select node_from_id, node_to_id, hop_count, "path"
from network_nodes
order by node_from_id, node_to_id, hop_count;
The query runs several minutes before throwing the error:
could not write to tuplestore temporary file: No space left on device
topology.network_graph has 2148 records, and during the query execution the base/pgsql_tmp directory grows to several GBs. It seems I have an infinite loop.
Can someone see what could be wrong?
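Not part of the original post, but one way to check whether the recursion really cycles or merely explodes combinatorially is to cap the depth and count the rows produced per level; the hop limit below is an arbitrary illustration:

-- Diagnostic sketch: limit the recursion depth and count rows per hop level.
WITH RECURSIVE network_nodes AS (
    SELECT node_from_id, node_to_id, 0 AS hop_count,
           ARRAY[node_from_id::text, node_to_id::text] AS path
    FROM topology.network_graph
    WHERE node_from_id = '_9EB23E6C4C824441BB5F75616DEB8DA7'
    UNION ALL
    SELECT nn.node_from_id, ng.node_to_id, nn.hop_count + 1,
           nn.path || ng.node_to_id::text
    FROM topology.network_graph AS ng
    JOIN network_nodes AS nn ON ng.node_from_id = nn.node_to_id
    WHERE ng.node_to_id != ALL(nn.path)
      AND nn.hop_count < 8          -- arbitrary cap, just for the diagnostic
)
SELECT hop_count, count(*) AS rows_at_this_depth
FROM network_nodes
GROUP BY hop_count
ORDER BY hop_count;

If the counts keep multiplying at each hop, the working set is the number of distinct simple paths, which can grow enormous even on a graph of only 2148 edges.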
I want to count the data types of all Redis keys. I wrote the following code, but it fails at runtime. How can I fix it?
local detail = {}
detail.hash = 0
detail.set = 0
detail.string = 0
local match = redis.call('KEYS','*')
for i,v in ipairs(match) do
local val = redis.call('TYPE',v)
detail.val = detail.val + 1
end
return detail
(error) ERR Error running script (call to f_29ae9e57b4b82e2ae1d5020e418f04fcc98ebef4): #user_script:10: user_script:10: attempt to perform arithmetic on field 'val' (a nil value)
The error tells you that detail.val is nil. That means that there is no table value for key "val". Hence you are not allowed to do any arithmetic operations on it.
Problem a)
detail.val is syntactic sugar for detail["val"]. So if you want the key to be the string stored in val, the correct way to index the table is detail[val].
Possible problem b)
Doing some quick research, I found that this Redis call returns a status reply, which the Lua scripting bridge converts to a table rather than a plain string. So if detail[val] doesn't work, check val's type: for TYPE the actual type name is in val.ok (i.e. val["ok"]).
We are considering using an in-memory database (such as Apache Ignite) to handle performance-intensive BI-like operations. As a (very primitive) example, I filled Apache Ignite with 250,000 records from a CSV file (14 columns) and ran some GROUP BY operations. Previously, I used the same data for performance tests with MS SQL Server.
Interestingly and unexpectedly, MS SQL Server needs about 0.25 seconds to perform these operations, while Apache Ignite takes 1-2 seconds.
1. I was always under the impression that Apache Ignite is not only a good option for distributed computing, but also brings a performance gain over a conventional relational database due to its memory-oriented architecture. Is that true? Why is it so slow in my example?
2. Did I use Apache Ignite in the wrong way, or are there additional tuning options that I should use?
Here is the source code I used in my example:
private static Connection conn = null;
private static Statement stmt = null;
private static ResultSet rs = null;
private static void initialize() throws ClassNotFoundException, SQLException
{
// Register JDBC driver.
Class.forName("org.apache.ignite.IgniteJdbcThinDriver");
// Create database tables.
stmt = conn.createStatement();
// Create table
stmt.executeUpdate("CREATE TABLE PIVOT_TEST (" +
" REGION VARCHAR, COUNTRY VARCHAR, ITEM_TYPE VARCHAR, SALES_CHANNEL VARCHAR, ORDER_PRIORITY VARCHAR, ORDER_DATE VARCHAR, ORDER_ID VARCHAR PRIMARY KEY, "
+ "SHIP_DATE VARCHAR, UNITS_SOLD NUMERIC, UNIT_PRICE NUMERIC, UNIT_COST NUMERIC, TOTAL_REVENUE NUMERIC, TOTAL_COST NUMERIC, TOTAL_PROFIT NUMERIC )");
}
private static void fill() throws ClassNotFoundException, SQLException
{
// Register JDBC driver
Class.forName("org.apache.ignite.IgniteJdbcThinDriver");
// Populate table
PreparedStatement stmt =
    conn.prepareStatement("COPY FROM 'LINK_TO_CSV_FILE' " +
        "INTO PIVOT_TEST (REGION, COUNTRY, ITEM_TYPE, SALES_CHANNEL, ORDER_PRIORITY, ORDER_DATE, ORDER_ID, SHIP_DATE, UNITS_SOLD, UNIT_PRICE, UNIT_COST, TOTAL_REVENUE, TOTAL_COST, TOTAL_PROFIT) FORMAT CSV");
stmt.executeUpdate();
stmt = conn.prepareStatement("CREATE INDEX index_name ON PIVOT_TEST(COUNTRY)");
stmt.executeUpdate();
}
private static void getResult() throws ClassNotFoundException, SQLException
{
// Register JDBC driver
Class.forName("org.apache.ignite.IgniteJdbcThinDriver");
// Get data
stmt = conn.createStatement();
rs =
stmt.executeQuery("SELECT AVG(UNIT_PRICE) AS AVG_UNIT_PRICE, MAX(UNITS_SOLD) AS MAX_UNITS_SOLD, SUM(UNIT_COST) AS SUM_UNIT_COST, AVG(TOTAL_REVENUE) AS AVG_TOTAL_REVENUE , AVG(TOTAL_COST) AS AVG_TOTAL_COST, AVG(TOTAL_PROFIT) as AVG_TOTAL_PROFIT FROM PIVOT_TEST GROUP BY COUNTRY;");
retrieveResultSet();
}
private static void retrieveResultSet() throws SQLException
{
while (rs.next())
{
for(int i=0; i<rs.getMetaData().getColumnCount(); i++)
{
rs.getObject(i+1);
}
}
}
public static void main(String[] args) throws SQLException, ClassNotFoundException
{
Ignite ignite = null;
try
{
//--------------------------------CONNECTION-------------------//
IgniteConfiguration configuration = new IgniteConfiguration();
ignite = Ignition.start(configuration);
conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/");
initialize();
fill();
long endPrepTable = System.currentTimeMillis();
getResult();
long endGetResult = System.currentTimeMillis();
System.out.println("Get Result (s)" + " " + (endGetResult - endPrepTable)*1.0/1000);
}
catch(Exception e)
{
e.printStackTrace();
}
finally
{
ignite.close();
conn.close();
rs.close();
}
}
Thank you for your help!
There are several things to consider when Ignite is compared to a relational database:
The Ignite SQL engine is optimized for multi-node deployments with RAM as the primary storage. Don't compare a single-node Ignite cluster to a relational database that was optimized for exactly such single-node configurations; deploy a multi-node cluster with a full copy of the data in RAM.
Take into account the basic recommendations for data modeling and optimization, such as affinity collocation, secondary indexes and the others listed in the tuning documentation (a small DDL sketch is shown at the end of this answer).
Plus, keep in mind that relational databases benefit from local caching techniques and, depending on the total data size and the type of query, can complete some queries even faster than Ignite in a multi-node configuration. For instance, I've seen SQL Server complete the query below in 5 ms, while a single-node Ignite cluster took 8 ms and a 4-node cluster took 20 ms:
SELECT * FROM Input i JOIN Party pr ON (pr.prt_id) = (i.mbr_id) order by i.input_id offset 0 limit 100
That was expected because the data set size was around 64GB and SQL Server could cache a lot of it in local RAM. Plus, the cost of inter-node communication affected the numbers for the 4-node cluster in comparison to the single-node one.
To unleash the power of distributed in-memory computing, preload more data into your cluster and/or force SQL Server to go to disk by testing more complicated queries like the one below:
SELECT * FROM Input i INNER JOIN Product p ON (i.product_id) = (p.product_id) INNER JOIN Party pr ON (pr.prt_id) = (i.mbr_id) and (pr.session_id=i.session_id) WHERE I.PRODUCT_ID=5 and I.SOURCE_ID=6
In my case, SQL Server took 510 seconds to finish the query in the same configuration with 64GB of data (it had to go to disk). Ignite's 4-node cluster finished in 32 seconds, and the 8-node cluster completed it in 8 seconds.
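As a hedged illustration of the affinity collocation point above, here is a DDL sketch built around the Input/Party join from the queries in this answer; it is my own example, not the poster's actual schema, and the column lists are abbreviated:

-- Collocate Input rows with the Party they reference so that the join
-- pr.prt_id = i.mbr_id can run node-locally instead of shuffling data.
CREATE TABLE Party (
    prt_id BIGINT PRIMARY KEY,
    name   VARCHAR
) WITH "template=partitioned,backups=1";

CREATE TABLE Input (
    input_id BIGINT,
    mbr_id   BIGINT,
    PRIMARY KEY (input_id, mbr_id)
) WITH "template=partitioned,backups=1,affinity_key=mbr_id";

-- A secondary index on the join/filter column also helps.
CREATE INDEX idx_input_mbr ON Input (mbr_id);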
You could apply the following tuning points:
Use the collocated flag [1]:
jdbc:ignite:thin://127.0.0.1;collocated=true
Introduce a variable for rs.getMetaData().getColumnCount():
int count = rs.getMetaData().getColumnCount();
while (rs.next())
{
for(int i=0; i< count; i++)
rs.getObject(i+1);
}
[1] https://apacheignite-sql.readme.io/docs/jdbc-driver#section-parameters
[2] https://apacheignite.readme.io/docs/affinity-collocation#collocate-data-with-data
As with any database, there are many ways to tune and optimise it. And Ignite is designed with different trade-offs than SQL Server -- it's not possible to guarantee that it'll be faster in every case.
Having said that, there is some documentation on improving performance.
Things to consider: a quarter of a million records isn't that many. Ignite is optimised to work in a cluster where operations can be parallelised. With a single "hard" query, you might need to increase queryParallelism, otherwise you're going to be limited to a single thread on each node.
Of course you can also do things like EXPLAIN PLAN to make sure it's using the right indexes, etc. As with any optimisation exercise, it's as much an art as a science.
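For example, Ignite's SQL engine accepts EXPLAIN on plain SELECT statements; a quick sketch against the PIVOT_TEST table from the question shows the generated map and reduce queries and the index chosen for each table scan:

-- Check whether the GROUP BY query can use the index on COUNTRY.
EXPLAIN SELECT AVG(UNIT_PRICE), MAX(UNITS_SOLD), SUM(UNIT_COST)
FROM PIVOT_TEST
GROUP BY COUNTRY;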
I am trying to run a model in R that calls functions from GRASS GIS (version 7.0.2) and PostgreSQL (version 9.5) to complete the task. I created a database in PostgreSQL, created the PostGIS extension, and then imported the required vector layers into the database using the PostGIS shapefile importer. Every time I try to run it from R (run as administrator), it returns an error like:
Error in fetch(dbSendQuery(con, q, n = -1)) :
error in evaluating the argument 'res' in selecting a method for function 'fetch': Error in postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not Retrieve the result : ERROR: column mm.geom does not exist
LINE 5: (st_dump(st_intersection(r.geom, mm.geom))).geom as geom,
^
HINT: Perhaps you meant to reference the column "r.geom".
QUERY:
insert into m_rays
with os as (
select r.ray, st_endpoint(r.geom) as s,
(st_dump(st_intersection(r.geom, mm.geom))).geom as geom,
mm.legend, mm.hgt as hgt, r.totlen
from rays as r,bh_gd_ne_clip as mm
where st_intersects(r.geom, mm.geom)
)
select os.ray, os.geom, os.hgt, l.absorb, l.barrier, os.totlen,
st_length(os.geom) as shape_length, st_distance(os.s, st_endpoint(os.geom)) as near_dist
from os left join lut as l
on os.legend = l.legend
CONTEXT: PL/pgSQL function do_crtn(text,text,text) line 30 at EXECUTE
I have checked over and over again; the geometry column does exist under Schemas > public > Views in PostgreSQL. Any advice on how to resolve this error?
Add quotes and use r."geom" instead of r.geom.
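A possible follow-up, assuming the failure comes from the actual column name in bh_gd_ne_clip (the shapefile importer or a view definition can produce a differently named or mixed-case geometry column, which must then always be quoted): check what the column is really called before adjusting the query.

-- List the geometry column PostGIS has registered for that relation.
SELECT f_table_schema, f_table_name, f_geometry_column
FROM geometry_columns
WHERE f_table_name = 'bh_gd_ne_clip';

-- Or inspect all of its column names directly.
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'bh_gd_ne_clip';

Whatever name comes back, reference it quoted in the st_intersection/st_dump expressions (for example mm."SomeName" if the name turns out to be mixed case; that name is only a placeholder here).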