How to iterate through node while there is a relationship - while-loop

I have nodes structured like folders, subfolders and files. Any folder can have a relationship with a subfolder, which can have a relationship with another subfolder, which can in turn have a relationship with files. I'd like to iterate through a folder to find every subfolder and file inside it.
In one query, I'd like to get every file that is inside a given folder or in any of its subfolders. I can't find a way to do this with Cypher. I saw FOREACH and UNWIND, but I don't think they help here.

Assuming you have labelled the nodes accordingly as Folder and File, the following query will fetch all the files belonging to the starting folder, directly or through a chain of one or more sub-folders:
MATCH(ParentFolder:Folder)-[*]->(childFile:File)
WHERE ParentFolder.name='Folder1'
RETURN childFile
If you haven't used labels (I highly recommend using them), you can look for all the paths starting from the specified folder and take the last node of each path:
MATCH(ParentFolder)-[*]->(childFile)
WHERE ParentFolder.name='Folder1' AND NOT (childFile)-->()
RETURN childFile
The second query fetches all terminal nodes, even if they are folders (for example, an empty folder). You would have to use labels or add filters in the WHERE clause to ensure that only files are returned as childFile.
Both versions of the query rely on variable-length paths. The wildcard (*) matches paths of any length (one or more relationships) starting from ParentFolder.
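To make the variable-length match more concrete, here is a minimal Python sketch of the same idea on an in-memory graph. It is only an illustration, not how Neo4j evaluates the query: the GRAPH dictionary, its node names and the files_under function are all hypothetical. One difference to be aware of: Cypher returns one row per matching path, whereas this sketch reports each file once.

```python
from collections import deque

# Hypothetical data: node name -> (label, names of nodes it points to).
GRAPH = {
    "Folder1": ("Folder", ["Sub1", "a.txt"]),
    "Sub1": ("Folder", ["Sub2", "b.txt"]),
    "Sub2": ("Folder", ["c.txt"]),
    "Folder9": ("Folder", ["z.txt"]),
    "a.txt": ("File", []),
    "b.txt": ("File", []),
    "c.txt": ("File", []),
    "z.txt": ("File", []),
}


def files_under(graph, start):
    """Return File nodes reachable from `start` through one or more hops."""
    found = []
    seen = {start}
    queue = deque(graph[start][1])  # begin with the direct children
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited, also guards against cycles
        seen.add(node)
        label, children = graph[node]
        if label == "File":
            found.append(node)
        queue.extend(children)  # keep following outgoing relationships
    return found


print(sorted(files_under(GRAPH, "Folder1")))
```

Files under Folder9 are not reported for Folder1, because no chain of relationships leads from Folder1 to them.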

Related

Is there a way to list the directories using PySpark in a notebook?

I'm trying to see every file in a certain directory, but since each file in the directory is very large, I can't use sc.wholeTextFiles or sc.textFile. I just want to get the filenames, and then pull a file if needed in a different cell. I can access the files just fine using Cyberduck, and it shows the names there.
Ex: I have the link for one set of data at "name:///mainfolder/date/sectionsofdate/indiviual_files.gz", and it works, but I want to see the names of the files in "/mainfolder/date" and in "/mainfolder/date/sectionsofdate" without having to load them all via sc.textFile or sc.wholeTextFiles. Both of those functions work, so I know my keys are correct, but loading the files takes too long.
Considering that the list of files can be retrieved by a single node, you can just list the files in the directory. Look at this response.
wholeTextFiles returns (path, content) pairs, but I don't know whether the content is loaded lazily, so that taking only the first element of each pair would avoid reading the files.
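As a rough illustration of "just list the files" on a single node, the sketch below walks a directory tree and collects relative paths without opening any file. It assumes the data is reachable as an ordinary filesystem path from the machine running the notebook (for example through a mount); for a remote store such as the "name:///" location in the question, the store's own listing API would be needed instead. The function name list_file_names is hypothetical.

```python
import os


def list_file_names(root):
    """Return the relative paths of all files under `root`.

    Only directory entries are read; no file content is loaded.
    """
    names = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            names.append(os.path.relpath(full_path, root))
    return sorted(names)
```

A chosen file can then be loaded on demand in a later cell by passing its full path to sc.textFile.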

Alfresco lucene search cannot find folder

I have a folder in the document library of a site. I want to find all the content of that folder. Running the following lucene/alfresco-fts query in the Node Browser returns "No items found":
PATH:"/app:company_home/st:sites/cm:mysite/cm:documentLibrary/cm:MyFolder/*"
This is wrong, as I have documents in that folder, and running the same query for a different folder returns the proper result. Another strange thing is that I cannot get the folder itself: the following query also returns "No items found":
PATH:"/app:company_home/st:sites/cm:mysite/cm:documentLibrary/cm:MyFolder"
Also, if I get the content of the document library, MyFolder is skipped in the results while its subfolder is returned:
PATH:"/app:company_home/st:sites/cm:mysite/cm:documentLibrary/*"
Name           | Parent
---------------|---------------------------------------------------------------------
cm:MyFolder2   | /app:company_home/st:sites/cm:mysite/cm:documentLibrary
cm:MySubfolder | /app:company_home/st:sites/cm:mysite/cm:documentLibrary/cm:MyFolder
I have checked the aspects and properties of MyFolder and they are the same as MyFolder2. I do not have any custom behaviours/rules/etc.
How can I make the first lucene query work and return the content of MyFolder?
Try updating metadata on the folder so that Solr re-indexes it. You could also get its db id and then tell Solr to re-index it by db id. If the folder has over 1000 children, an FTS query may fail; this is a known issue. In that case, try using a transactional metadata (txmd) query.
I would suggest getting the node ref of the folder from the folder details page and searching for it in the Node Browser. There you can get the primary path. Please cross-check the path you use in the lucene search against it, or use that primary path to search for the folder.
Another possibility is that the locale property (sys:locale) of the folder (MyFolder) differs from the locale of your browser. Please check whether the locale of MyFolder is the same as that of the other folders for which results are shown. If not, that can also be a reason.

Documentum - getting a list of sub folders

Is there a way in Documentum to get all subfolders of a folder? Can someone suggest a DQL query, or something similar, where I can specify a parent folder and get back the folder paths of all its subfolders?
select distinct r_folder_path from dm_folder where folder('/Folder1/Folder2', descend)
This will return all the folders and subfolders under /Folder1/Folder2
One thing to keep in mind:
Documentum supports linking objects to multiple parent folders. This means that one folder can have multiple parent folders.
If you have a folder structure like this:
Cabinet1
  /Test1
    /Test3
  /Test2
    /Test3
then Test3 is a subfolder of Test1 but also of Test2 (as it can be linked to both)!
Documentum accomplishes this using repeating attributes. r_folder_path is a repeating attribute of dm_folder (actually of dm_sysobject, which is its super type).
So, running the DQL:
select distinct r_folder_path from dm_folder where folder('/Folder1/Folder2', descend)
will return all folder paths your folder is part of (linked to):
/Cabinet1/Test1/Test3
/Cabinet1/Test2/Test3
Which might not be what you are looking for!
As DQL does not allow you to specify which value of a repeating attribute is returned (you cannot specify the index of a repeating attribute), there is no elegant (and fail-safe) way to do it in DQL.
What you can do is fetch the object_name of each subfolder and prefix it with the folder path of the parent folder you used in the search (but that requires some coding).
Check the Documentum Content Server System Object Reference guide (it is available on the EMC developer community or, for now, also here).
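The "some coding" step could look roughly like the Python sketch below. It is a variation on the suggestion above rather than a literal implementation: instead of prefixing object names, it filters the r_folder_path values returned by the query and keeps only those that lie under the parent folder used in the search. It assumes the query results are already available as a flat list of path strings; the function name paths_under is hypothetical, and no Documentum API is used.

```python
def paths_under(parent, folder_paths):
    """Keep only the folder paths that lie below `parent`.

    `folder_paths` is assumed to be a flat list of r_folder_path values,
    one string per repeating-attribute value returned by the query.
    """
    prefix = parent.rstrip("/") + "/"
    return sorted({path for path in folder_paths if path.startswith(prefix)})


rows = ["/Cabinet1/Test1/Test3", "/Cabinet1/Test2/Test3"]
print(paths_under("/Cabinet1/Test1", rows))
```

With the example structure above, searching under /Cabinet1/Test1 keeps /Cabinet1/Test1/Test3 and drops the path that reaches Test3 through Test2.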

Synchronizing PostgreSQL media tree of folder/file nodes to filesystem paths

In PostgreSQL I have implemented a materialized path tree (ltree) where each node has an additional parent_id column besides the path.
These nodes can be associated with custom content types such as folder and image.
The home directory for all these files is ./media
So right now I save the file system reference (url) inside the database like this:
tbl_node:
  id bigint
  name character varying
  path ltree (eg. 'nodeid.nodeid.nodeid.etc')
  parent_id bigint
  node_type int (document, media, template, etc)
tbl_content:
  id bigint
  node_id bigint
  meta json (eg. {alt:"alt text here", caption: "etc", url:"/media/folder/(subfolders)/../image.jpg"})
tbl_node.name = filename.filetype or foldername (in filesystem)
If PostgreSQL had an equivalent to MSSQL's FILESTREAM, or if DATALINK were implemented, my problems would probably be solved. As it is, I need roughly the same functionality as these would have given me.
So my problem is that my node structure and file structure should always be kept in sync. However, I can assume that all renaming and moving of files and folders is done through my GUI and not directly in the filesystem.
That means whenever I rename, for example, a grandparent folder node in the database, the url in the meta of the image node two levels below must also reflect the new folder name (and of course the filesystem must change as well). So I guess I need to come up with a different solution, instead of saving a reference to the file (the filesystem path) in a column inside the node.
Of course, if I rename the image node itself, I can easily change the url to point to the new name and rename the .jpg file during the db transaction in my Golang model (node name = filename is my convention, as is node hierarchy = folder hierarchy). The problem arises when I change an ancestor node, for example the parent (or grandparent, and so on) folder, since I store the whole url on the child nodes of that folder and the path has now changed.
Would a good solution be to make a trigger fire when the path changes? I hope there's a better way.
What other approaches exist, given that PostgreSQL has neither the FILESTREAM nor the DATALINK datatype to help keep my database node tree synchronized with my media directory structure?
P.S.
Since I guess Postgres does not have a clever solution to this, it may be relevant to mention that I'm coding in Go (Golang), in case you have some way of coding around this in the application itself.
Thoughts and pseudocode so far:
1) Consider this tree that maps to this file system
Tree, DB:
sample image folder (node.name)
image.jpg (node.name)
Filesystem:
/media/sample image folder
/media/sample image folder/image.jpg
2) First we rename the "sample image folder" node.name to "renamed image folder" (and of course its corresponding content record's meta->url to "/media/renamed image folder") from the CMS GUI.
3) When renaming the parent folder's node.name column and its content record's meta->url, rename the corresponding folder in the filesystem during the same transaction.
4) Later in that same transaction, select all children by node path (ltree, materialized path).
5) Then, joining on node.id and content.node_id, update the corresponding content records' meta->"url" (the file system reference) for ALL child nodes of the renamed folder node, replacing the part before the filename, i.e. substring(0, lastindexof("/")).
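Step 5 boils down to replacing a path prefix on every child url. A minimal Python sketch of just that string handling is shown below; it is not tied to PostgreSQL or Go, and the function name rewrite_child_urls and the dictionary of node ids to urls are hypothetical stand-ins for the rows selected in step 4.

```python
def rewrite_child_urls(urls, old_folder_url, new_folder_url):
    """Return a copy of `urls` with the renamed folder prefix replaced.

    `urls` maps node ids to stored urls. Only urls strictly below the old
    folder are rewritten; the trailing slash avoids matching sibling
    folders that merely share the same leading characters.
    """
    old_prefix = old_folder_url.rstrip("/") + "/"
    new_prefix = new_folder_url.rstrip("/") + "/"
    rewritten = {}
    for node_id, url in urls.items():
        if url.startswith(old_prefix):
            rewritten[node_id] = new_prefix + url[len(old_prefix):]
        else:
            rewritten[node_id] = url
    return rewritten


children = {2: "/media/sample image folder/image.jpg"}
print(rewrite_child_urls(children,
                         "/media/sample image folder",
                         "/media/renamed image folder"))
```

Replacing the whole prefix, rather than only the part before the last "/", also covers files that sit several levels below the renamed folder.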
Hopefully all this gives you somewhat a clearer idea of what I'm trying to achieve. :)
It is very hard to understand what you wrote, but from what I gather, I don't think you need a database to track the changes to the file system. The file system IS the database. What you can do with the database is store "virtual" paths that point to the filesystem paths. At that point you don't need to move files around in the filesystem; you only need to keep the database representation consistent.
node_id int
virtual_path string
physical_path string
virtual_attributes
physical_attributes
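A small sketch of that idea, using Python's built-in sqlite3 module in place of PostgreSQL so that it runs on its own. The table name node, the sample rows and the function names are hypothetical; only the virtual_path and physical_path columns come from the outline above. Renaming a folder rewrites virtual_path for the folder and everything below it, while physical_path stays untouched.

```python
import sqlite3


def make_db():
    """Create an in-memory table with a few hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node ("
        "node_id INTEGER PRIMARY KEY, "
        "virtual_path TEXT NOT NULL, "
        "physical_path TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO node VALUES (?, ?, ?)",
        [
            (1, "/media/sample image folder", "/media/store/0001"),
            (2, "/media/sample image folder/image.jpg", "/media/store/0002.jpg"),
            (3, "/media/other/logo.png", "/media/store/0003.png"),
        ],
    )
    return conn


def rename_virtual(conn, old, new):
    """Rename a virtual folder and all entries below it in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE node "
            "SET virtual_path = :new || substr(virtual_path, length(:old) + 1) "
            "WHERE virtual_path = :old "
            "OR substr(virtual_path, 1, length(:old) + 1) = :old || '/'",
            {"old": old, "new": new},
        )
```

Because nothing moves on disk, the rename is a single database update and there is no filesystem operation that could fail halfway through.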
Hope it helps.
P.S. This is not a Go question.

Single file versioning best practices?

The user selects rather hefty single XML files via an NSOpenPanel. The application makes moderate changes to the file, so I'd like to include an option to create a backup in a subfolder of the directory the original file was selected from. Creating the new subfolder is no problem, but does anybody have a good way to create a backup of said foo.xml? Is there an established practice for this, or is it as simple as creating a duplicate and renaming it foo.back01.xml?
I'm not sure how well this approach fits your requirement, but this is what I was doing:
-- Have a directory in the system's temporary folder, assuming that all these files will be deleted once the application is closed.
-- To keep file names unique, generate each name with the following pattern, using a function such as [+(NSString *) generateFileNameForExtension:(NSString *)extension Create:(bool)bCreate].
Assuming the input is .xml and false, it might return a file name like this:
AppName128908765445.xml, i.e. [AppName][UTCTimeStamp].[FileExtension]
-- Once you think a file is no longer needed, call a function such as [self addToDeleteList:(NSString *)fileName], which adds the file to a delete list.
-- A further function starts a timer that fires every minute, reads all the files added to the delete list, and deletes them.
I will share the code with you if needed...
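The timestamp-based naming described above can also be applied directly to the backup subfolder from the question. The sketch below shows the idea in Python rather than Objective-C, purely as an illustration; backup_name, create_backup and the default subfolder name "backup" are hypothetical, and the name pattern [name].[UTC timestamp].[extension] is one possible choice rather than an established convention.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def backup_name(original, now=None):
    """Build a name like foo.20200102T030405Z.xml from foo.xml."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    path = Path(original)
    return f"{path.stem}.{stamp}{path.suffix}"


def create_backup(original, subfolder="backup"):
    """Copy `original` into a subfolder next to it and return the new path."""
    source = Path(original)
    target_dir = source.parent / subfolder
    target_dir.mkdir(exist_ok=True)  # reuse the folder if it already exists
    target = target_dir / backup_name(source)
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    return target
```

A timestamp avoids having to scan the folder for the next free counter, as foo.back01.xml would require, although two backups made within the same second would collide.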