I have been experimenting with Cypher to load a collection of CSV files to build a view of VMware's operational state. I'm able to execute the following query:
LOAD CSV WITH HEADERS FROM 'file:///Users/rmorgan/Downloads/All_vm_hierrachy.csv' as csvline
with csvline
where csvline.type = 'VirtualMachine'
MERGE (host:HostSystem { name: csvline.hostid, description: csvline.hosttype })
MERGE (pool:ResourcePool { name: csvline.resourcePool})
MERGE (parent:Folder { name: csvline.parentid, description: csvline.parenttype })
MERGE (vm:VirtualMachine { name: csvline.moid, description: csvline.name })
CREATE (host)-[:HAS_VM]->(vm)
CREATE (vm)-[:HAS_PARENT]->(parent)
CREATE (vm)-[:HAS_RESOURCEPOOL]->(pool);
This next query loads details about the host: I find the matching host object and then use SET to add more attributes.
LOAD CSV WITH HEADERS FROM 'file:///Users/rmorgan/Downloads/All_vm_hierrachy.csv' as csvline
with csvline
where csvline.type = 'HostSystem' and csvline.parenttype = 'ClusterComputeResource'
match (h:HostSystem {name:csvline.moid})
set h.fullName = csvline.name
merge (p:ClusterComputeResource {name: csvline.parentid, type: csvline.parenttype})
CREATE (p)-[:HAS_CHILD]->(h);
Both queries load the same file but instantiate different objects. Is there a way to combine them? I would like to put a CASE statement in there:
LOAD CSV WITH HEADERS FROM 'file:///Users/rmorgan/Downloads/All_vm_hierrachy.csv' as csvline
with csvline
CASE csvline.type = 'HostSystem'
// create some objects and add attributes
CASE csvline.type = 'VirtualMachine'
// create some objects and add other attributes
END
// create some common objects / attributes irrespective of the path above
I would also like to nest these to produce more complex execution paths:
LOAD CSV WITH HEADERS FROM 'file:///Users/rmorgan/Downloads/All_vm_hierrachy.csv' as csvline
with csvline
CASE csvline.type = 'HostSystem'
// create host object
CASE csvline.parenttype = 'ClusterComputeResource'
// create ClusterComputeResource object, link to host
CASE csvline.parenttype = 'ComputeResource'
// create ComputeResource object, link to host
CASE csvline.type = 'VirtualMachine'
// create some objects and add other attributes
END
// create some common objects / attributes irrespective of the path above
Is this possible in Cypher? The commands seem to exist, but I cannot make them work together.
Cool use-case.
Create indexes on the unique properties you MERGE on,
e.g. CREATE INDEX ON :HostSystem(name);
Don't use MERGE on multiple attributes, especially on import,
e.g. MERGE (host:HostSystem { name: csvline.hostid }) ON CREATE SET host.description = csvline.hosttype
Regarding your second question:
We're thinking about adding conditional update functionality.
There is a workaround.
Keeping your statements simpler will also keep them faster.
I'm currently writing a blog post about this. It is not done yet, but feel free to have a look and apply the tips:
https://dl.dropboxusercontent.com/u/14493611/load_csv_with_success.adoc
Especially the part about Eager loading, which might affect you if you have a lot of data.
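For example, applying the "simpler statements" tip to your first query (a sketch, assuming the same file and columns), you could do one pass per label with USING PERIODIC COMMIT and combine it with ON CREATE SET:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///Users/rmorgan/Downloads/All_vm_hierrachy.csv' as csvline
with csvline
where csvline.type = 'VirtualMachine'
MERGE (vm:VirtualMachine { name: csvline.moid })
ON CREATE SET vm.description = csvline.name;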
The workaround is to use a combination of filter and foreach to create a zero- or single-element collection of "work items":
LOAD CSV WITH HEADERS FROM 'file:///Users/rmorgan/Downloads/All_vm_hierrachy.csv' as csvline
with csvline
foreach (line in
  filter(x in [csvline] where x.type = 'HostSystem'
                          and x.parenttype = 'ClusterComputeResource') |
    MERGE (h:HostSystem {name: line.moid})
    SET h.fullName = line.name
    MERGE (p:ClusterComputeResource {name: line.parentid})
      ON CREATE SET p.type = line.parenttype
    CREATE (p)-[:HAS_CHILD]->(h)
)
CASE ... WHEN ... THEN ... ELSE ... END is an expression, not a control-flow statement: it produces a value and cannot wrap update clauses like MERGE or CREATE, which is why your nested version won't work.
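For illustration, a minimal sketch using your columns (the report itself is hypothetical): CASE can compute a value per row, but it cannot branch into different update clauses:
LOAD CSV WITH HEADERS FROM 'file:///Users/rmorgan/Downloads/All_vm_hierrachy.csv' as csvline
RETURN CASE csvline.type
         WHEN 'HostSystem' THEN 'host row'
         WHEN 'VirtualMachine' THEN 'vm row'
         ELSE 'other' END as kind,
       count(*);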
I have this response in soapUI:
<pointsCriteria>
<calculatorLabel>Have you registered for inContact, signed up for marketing news from FNB/RMB Private Bank, updated your contact details and chosen to receive your statements</calculatorLabel>
<description>Be registered for inContact, allow us to communicate with you (i.e. update your marketing consent to 'Yes'), receive your statements via email and keep your contact information up to date</description>
<grades>
<points>0</points>
<value>No</value>
</grades>
<grades>
<points>1000</points>
<value>Yes</value>
</grades>
<label>Marketing consent given and Online Contact details updated in last 12 months</label>
<name>c21_mrktng_cnsnt_cntct_cmb_point</name>
</pointsCriteria>
There are many, many pointsCriteria elements, and I use the XQuery below to give me the DB field name and the range of values that field is meant to hold:
<return>
{
for $x in //pointsCriteria
return <DBRange>
<db>{data($x/name/text())}</db>
<points>{data($x//points/text())}</points>
</DBRange>
}
</return>
And I get the response below:
<return><DBRange><db>c21_mrktng_cnsnt_cntct_cmb_point</db><points>0 1000</points></DBRange>
That last bit sits in a property transfer. I need SQL to bring back all rows where that DB field is not in that points range (the field can only be 0 or 1000 in this case). My problem is I don't know how to loop through each DBRange in this manner. Please help.
I'm not sure that I really understand your question; however, I think you want to query a specific table in your DB using the column name defined in the <db> field of your XML, with the values defined in the <points> field of the same XML.
So you can try using a groovy TestStep: first parse your XML to get back your column name and your points. If the point values are separated by a blank space, you can use split(" ") to get a list and then use each() to iterate over the points in that list. Then, using groovy.sql.Sql, you can perform the queries against your DB.
One more thing: you need to put the JDBC drivers for your DB vendor in $SOAPUI_HOME/bin/ext and then restart SoapUI so that it can load the necessary driver classes.
So the following code can achieve your goal:
import groovy.sql.Sql
import groovy.util.XmlSlurper
// soapui groovy testStep requires that first register your
// db vendor drivers, as example I use oracle drivers...
com.eviware.soapui.support.GroovyUtils.registerJdbcDriver( "oracle.jdbc.driver.OracleDriver")
// connection properties db (example for oracle data base)
def db = [
url : 'jdbc:oracle:thin:@db_host:db_port/db_name',
username : 'yourUser',
password : '********',
driver : 'oracle.jdbc.driver.OracleDriver'
]
// create the db instance
def sql = Sql.newInstance("${db.url}", "${db.username}", "${db.password}","${db.driver}")
def result = '''<return>
<DBRange>
<db>c21_mrktng_cnsnt_cntct_cmb_point</db>
<points>0 1000</points>
</DBRange>
</return>'''
def resXml = new XmlSlurper().parseText(result)
// get the field
def field = resXml.DBRange.db.text()
// get the points
def points = resXml.DBRange.points.text()
// points are separated by blank space,
// so split to get an array with the points
def pointList = points.split(" ")
// for each point make your query
pointList.each {
def sqlResult = sql.rows "select * from your_table where ${field} = ?",[it]
log.info sqlResult
}
sql.close();
Hope this helps,
Thanks again for your help @albciff. I had to put this into a multidimensional array (I renamed field to column, and result is a large return from the XQuery above):
def resXml = new XmlSlurper().parseText(result)
//get the columns and points ranges
def Column = resXml.DBRange.db*.text()
def Points = resXml.DBRange.points*.text()
//sorting it all out into a multidimensional array (index per index)
count = 0
bigList = Column.collect
{
[it, Points[count++]]
}
//iterating through the array
bigList.each
{//creating two smaller lists and making it readable for sql part later
def column = it[0]
def points = it[1]
//further splitting the points to test each
pointList = points.split(" ")
pointList.each
{//test each points range per column
def sqlResult = sql.rows "select * from my_table where ${column} <> ?",[it]
log.info sqlResult
}
}
sql.close();
return;
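As an aside, a sketch of an alternative (my reading of the intent, not from the thread): since you want rows whose value falls outside the whole allowed range, you could test all points in a single query per column using NOT IN, instead of one query per point:
// assumes sql, column and pointList from the code above
def placeholders = pointList.collect { '?' }.join(', ')
def query = "select * from my_table where ${column} not in (${placeholders})".toString()
def sqlResult = sql.rows(query, pointList as List)
log.info sqlResult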
I'm using VTD-XML to update XML files. With it, I am trying to get a flexible way of maintaining attributes on an element. So if my original element is:
<MyElement name="myName" existattr="orig" />
I'd like to be able to update it to this:
<MyElement name="myName" existattr="new" newattr="newValue" />
I'm using a Map to manage the attribute/value pairs in my code and when I update the XML I'm doing something like the following:
private XMLModifier xm = new XMLModifier();
xm.bind(vn);
for (String key : attr.keySet()) {
int i = vn.getAttrVal(key);
if (i!=-1) {
xm.updateToken(i, attr.get(key));
} else {
xm.insertAttribute(key+"='"+attr.get(key)+"'");
}
}
vn = xm.outputAndReparse();
This works for updating existing attributes; however, when the attribute doesn't already exist, it hits the insert (insertAttribute) and I get a ModifyException:
com.ximpleware.ModifyException: There can be only one insert per offset
at com.ximpleware.XMLModifier.insertBytesAt(XMLModifier.java:341)
at com.ximpleware.XMLModifier.insertAttribute(XMLModifier.java:1833)
My guess is that, as I'm not manipulating the offset directly, this might be expected. However, I can see no function to insert an attribute at a given position in the element (at the end).
My suspicion is that I will need to do it at the "offset" level using something like xm.insertBytesAt(int offset, byte[] content). As this is an area I have not needed to get into yet, is there a way to calculate the offset at which I can insert (just before the end of the tag)?
Of course I may be mis-using VTD in some way here - if there is a better way of achieving this then happy to be directed.
Thanks
That's an interesting limitation of the API I hadn't encountered yet. It would be great if vtd-xml-author could elaborate on technical details and why this limitation exists.
As a solution to your problem, a simple approach would be to accumulate your key-value pairs to be inserted as a String, and then to insert them in a single call after your for loop has terminated.
I've tested that this works as per your code:
private XMLModifier xm = new XMLModifier();
xm.bind(vn);
String insertedAttributes = "";
for (String key : attr.keySet()) {
int i = vn.getAttrVal(key);
if (i!=-1) {
xm.updateToken(i, attr.get(key));
} else {
// Store the key-values to be inserted as attributes
insertedAttributes += " " + key + "='" + attr.get(key) + "'";
}
}
if (!insertedAttributes.equals("")) {
// Insert attributes only once
xm.insertAttribute(insertedAttributes);
}
This will also work if you need to update the attributes of multiple elements: simply nest the above code in while (autoPilot.evalXPath() != -1), as sketched below, and be sure to reset insertedAttributes = ""; on each iteration.
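A rough sketch of that nesting (my assumptions: an AutoPilot bound to the same VTDNav, and a hypothetical //MyElement XPath):
AutoPilot autoPilot = new AutoPilot(vn);
autoPilot.selectXPath("//MyElement"); // hypothetical XPath
XMLModifier xm = new XMLModifier();
xm.bind(vn);
while (autoPilot.evalXPath() != -1) {
    String insertedAttributes = "";
    for (String key : attr.keySet()) {
        int i = vn.getAttrVal(key);
        if (i != -1) {
            xm.updateToken(i, attr.get(key));
        } else {
            insertedAttributes += " " + key + "='" + attr.get(key) + "'";
        }
    }
    if (!insertedAttributes.equals("")) {
        xm.insertAttribute(insertedAttributes);
    }
}
vn = xm.outputAndReparse();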
Hope this helps.
I am wondering how to insert an image into one of the fields in my PostgreSQL table. I cannot find an appropriate tutorial on this matter. The datatype of the field is oid. Has anyone tried this? Thanks!
// All LargeObject API calls must be within a transaction
conn.setAutoCommit(false);
// Get the Large Object Manager to perform operations with
LargeObjectManager lobj = ((org.postgresql.PGConnection)conn).getLargeObjectAPI();
//create a new large object
int oid = lobj.create(LargeObjectManager.READ | LargeObjectManager.WRITE);
//open the large object for write
LargeObject obj = lobj.open(oid, LargeObjectManager.WRITE);
// Now open the file
File file = new File("myimage.gif");
FileInputStream fis = new FileInputStream(file);
// copy the data from the file to the large object
byte buf[] = new byte[2048];
int s, tl = 0;
while ((s = fis.read(buf, 0, 2048)) > 0)
{
obj.write(buf, 0, s);
tl += s;
}
// Close the large object
obj.close();
//Now insert the row into imagesLO
PreparedStatement ps = conn.prepareStatement("INSERT INTO imagesLO VALUES (?, ?)");
ps.setString(1, file.getName());
ps.setInt(2, oid);
ps.executeUpdate();
ps.close();
fis.close();
I found that sample code here. It's a really good collection of SQL operations.
To quote this site,
PostgreSQL database has a special data type to store binary data
called bytea. This is a non-standard data type. The standard data type
in databases is BLOB.
You need to write a client to read the image file, for example
File img = new File("woman.jpg");
FileInputStream fin = new FileInputStream(img);
Connection con = DriverManager.getConnection(url, user, password);
PreparedStatement pst = con.prepareStatement("INSERT INTO images(data) VALUES(?)");
pst.setBinaryStream(1, fin, (int) img.length());
pst.executeUpdate();
You can either use the bytea type or the large objects facility. However note that depending on your use case it might not be a good idea to put your images in the DB because of additional load it may put on the DB server.
Rereading your question, I notice you mentioned you have a field of type oid. If this is an application you are modifying, that suggests to me it is using large objects. These objects get an oid, which you then need to store in another table to keep track of them.
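For completeness, a minimal sketch of reading such an image back by its stored oid (assuming the imagesLO table from the example above, with hypothetical column names name and imgoid):
// All LargeObject API calls must be within a transaction
conn.setAutoCommit(false);
LargeObjectManager lobj = ((org.postgresql.PGConnection) conn).getLargeObjectAPI();
PreparedStatement ps = conn.prepareStatement("SELECT imgoid FROM imagesLO WHERE name = ?");
ps.setString(1, "myimage.gif");
ResultSet rs = ps.executeQuery();
if (rs.next()) {
    int oid = rs.getInt(1);
    // open the large object for reading and pull all of its bytes
    LargeObject obj = lobj.open(oid, LargeObjectManager.READ);
    byte[] data = obj.read(obj.size());
    obj.close();
    // data now holds the image bytes
}
rs.close();
ps.close();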
Understanding Magento models by reference to SQL:
select * from user_devices where user_id = 1
select * from user_devices where device_id = 3
How could I perform the same using my Magento models? getModel("module/userdevice")
Also, how can I find the number of rows for each query?
The following questions have been answered in this thread:
How to perform a where clause ?
How to retrieve the size of the result set ?
How to retrieve the first item in the result set ?
How to paginate the result set ? (limit)
How to name the model ?
You are referring to Collections.
Some references for you:
http://www.magentocommerce.com/knowledge-base/entry/magento-for-dev-part-5-magento-models-and-orm-basics
http://alanstorm.com/magento_collections
http://www.magentocommerce.com/wiki/1_-_installation_and_configuration/using_collections_in_magento
lib/Varien/Data/Collection/Db.php and lib/Varien/Data/Collection.php
So, assuming your module is set up correctly, you would use a collection to retrieve multiple objects of your model type.
Syntax for this is:
$yourCollection = Mage::getModel('module/userdevice')->getCollection();
Magento has provided some great features for developers to use with collections. So your example above is very simple to achieve:
$yourCollection = Mage::getModel('module/userdevice')->getCollection()
->addFieldToFilter('user_id', 1)
->addFieldToFilter('device_id', 3);
You can get the number of objects returned:
$yourCollection->count() or simply count($yourCollection)
EDIT
To answer the question posed in the comment: "what If I do not require a collection but rather just a particular object"
This depends if you still require both conditions in the original question to be satisfied or if you know the id of the object you wish to load.
If you know the id of the object then simply:
Mage::getModel('module/userdevice')->load($objectId);
but if you wish to still load based on the two attributes:
user_id = 1
device_id = 3
then you would still use a collection but simply return the first object (assuming that only one object could ever satisfy both conditions).
For reuse, wrap this logic in a method and place in your model:
public function loadByUserDevice($userId, $deviceId)
{
$collection = $this->getResourceCollection()
->addFieldToFilter('user_id', $userId)
->addFieldToFilter('device_id', $deviceId)
->setCurPage(1)
->setPageSize(1)
;
foreach ($collection as $obj) {
return $obj;
}
return false;
}
You would call this as follows:
$userId = 1;
$deviceId = 3;
Mage::getModel('module/userdevice')->loadByUserDevice($userId, $deviceId);
NOTE:
You could shorten the loadByUserDevice to the following, though you would not get the benefit of the false return value should no object be found:
public function loadByUserDevice($userId, $deviceId)
{
$collection = $this->getResourceCollection()
->addFieldToFilter('user_id', $userId)
->addFieldToFilter('device_id', $deviceId)
;
return $collection->getFirstItem();
}
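If you use that shorter version, one common pattern (a sketch, not from the original answer) is to check the loaded object's ID, since getFirstItem() returns an empty object rather than false when nothing matches:
$device = Mage::getModel('module/userdevice')->loadByUserDevice($userId, $deviceId);
if (!$device->getId()) {
    // no matching row: getFirstItem() returned an empty object
}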
I know variants of this question have been asked before (even by me), but I still don't understand a thing or two about this...
It was my understanding that one could retrieve more documents than the 128 default setting by doing this:
session.Advanced.MaxNumberOfRequestsPerSession = int.MaxValue;
And I've learned that a WHERE clause should be an ExpressionTree instead of a Func, so that it's treated as Queryable instead of Enumerable. So I thought this should work:
public static List<T> GetObjectList<T>(Expression<Func<T, bool>> whereClause)
{
using (IDocumentSession session = GetRavenSession())
{
return session.Query<T>().Where(whereClause).ToList();
}
}
However, that only returns 128 documents. Why?
Note, here is the code that calls the above method:
RavenDataAccessComponent.GetObjectList<Ccm>(x => x.TimeStamp > lastReadTime);
If I add Take(n), then I can get as many documents as I like. For example, this returns 200 documents:
return session.Query<T>().Where(whereClause).Take(200).ToList();
Based on all of this, it would seem that the appropriate way to retrieve thousands of documents is to set MaxNumberOfRequestsPerSession and use Take() in the query. Is that right? If not, how should it be done?
For my app, I need to retrieve thousands of documents (that have very little data in them). We keep these documents in memory and use them as the data source for charts.
** EDIT **
I tried using int.MaxValue in my Take():
return session.Query<T>().Where(whereClause).Take(int.MaxValue).ToList();
And that returns 1024. Argh. How do I get more than 1024?
** EDIT 2 - Sample document showing data **
{
"Header_ID": 3525880,
"Sub_ID": "120403261139",
"TimeStamp": "2012-04-05T15:14:13.9870000",
"Equipment_ID": "PBG11A-CCM",
"AverageAbsorber1": "284.451",
"AverageAbsorber2": "108.442",
"AverageAbsorber3": "886.523",
"AverageAbsorber4": "176.773"
}
It is worth noting that since version 2.5, RavenDB has an "unbounded results API" to allow streaming. The example from the docs shows how to use this:
var query = session.Query<User>("Users/ByActive").Where(x => x.Active);
using (var enumerator = session.Advanced.Stream(query))
{
while (enumerator.MoveNext())
{
User activeUser = enumerator.Current.Document;
}
}
There is support for standard RavenDB queries and Lucene queries, and there is also async support.
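For example, a rough sketch of the async variant (my assumption of the API shape for an async session; check the streaming docs for your client version):
using (var session = store.OpenAsyncSession())
{
    var query = session.Query<User>("Users/ByActive").Where(x => x.Active);
    using (var enumerator = await session.Advanced.StreamAsync(query))
    {
        while (await enumerator.MoveNextAsync())
        {
            User activeUser = enumerator.Current.Document;
        }
    }
}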
The documentation can be found here. Ayende's introductory blog article can be found here.
The Take(n) function will only give you up to 1024 by default. However, you can change this default in Raven.Server.exe.config:
<add key="Raven/MaxPageSize" value="5000"/>
For more info, see: http://ravendb.net/docs/intro/safe-by-default
The Take(n) function will only give you up to 1024 by default. However, you can use it in tandem with Skip(n) to get all of them:
RavenQueryStatistics stats;
var points = new List<T>();
var nextGroupOfPoints = new List<T>();
const int ElementTakeCount = 1024;
int i = 0;
int skipResults = 0;
do
{
nextGroupOfPoints = session.Query<T>().Statistics(out stats).Where(whereClause).Skip(i * ElementTakeCount + skipResults).Take(ElementTakeCount).ToList();
i++;
skipResults += stats.SkippedResults;
points = points.Concat(nextGroupOfPoints).ToList();
}
while (nextGroupOfPoints.Count == ElementTakeCount);
return points;
RavenDB Paging
The number of requests per session is a separate concept from the number of documents retrieved per call. Sessions are short-lived and are expected to have only a few calls issued over them.
If you are getting more than 10 of anything from the store (even fewer than the default 128) for human consumption, then something is wrong, or your problem requires different thinking than hauling truckloads of documents from the data store.
RavenDB indexing is quite sophisticated. There is a good article about indexing here and one about facets here.
If you need to perform data aggregation, create a map/reduce index which results in aggregated data, e.g.:
Map:
from post in docs.Posts
select new { post.Author, Count = 1 }
Reduce:
from result in results
group result by result.Author into g
select new
{
    Author = g.Key,
    Count = g.Sum(x => x.Count)
}
Query:
session.Query<AuthorPostStats>("Posts/ByUser/Count").Where(x => x.Author == author).ToList();
You can also use a predefined index with the Stream method. You may use a Where clause on indexed fields.
var query = session.Query<User, MyUserIndex>();
// or, with a Where clause on an indexed field:
var query = session.Query<User, MyUserIndex>().Where(x => !x.IsDeleted);
using (var enumerator = session.Advanced.Stream<User>(query))
{
while (enumerator.MoveNext())
{
var user = enumerator.Current.Document;
// do something
}
}
Example index:
public class MyUserIndex: AbstractIndexCreationTask<User>
{
public MyUserIndex()
{
this.Map = users =>
from u in users
select new
{
u.IsDeleted,
u.Username,
};
}
}
Documentation: What are indexes?
Session : Querying : How to stream query results?
Important note: the Stream method will NOT track objects. If you change objects obtained from this method, SaveChanges() will not be aware of any change.
Other note: you may get the following exception if you do not specify the index to use.
InvalidOperationException: StreamQuery does not support querying dynamic indexes. It is designed to be used with large data-sets and is unlikely to return all data-set after 15 sec of indexing, like Query() does.