Synchronizing PostgreSQL media tree of folder/file nodes to filesystem paths - sql

In PostgreSQL I have implemented a materialized path tree (ltree) where each node has an additional parent_id column besides the path.
These nodes can be associated with custom content types such as folder and image.
The home directory for all these files is ./media.
Right now I store the file system URL reference in the database like this:
tbl_node:
id bigint
name character varying
path ltree (eg. 'nodeid.nodeid.nodeid.etc')
parent_id bigint
node_type int (document, media, template, etc)
tbl_content:
id bigint
node_id bigint
meta json (e.g. {"alt": "alt text here", "caption": "etc", "url": "/media/folder/(subfolders)/../image.jpg"})
tbl_node.name = filename.filetype or foldername (in the filesystem)
If PostgreSQL just had an equivalent to MSSQL's FILESTREAM, or if DATALINK were implemented, my problems would probably be solved, but I need roughly the same functionality that these would have given me.
So my problem is this: my node structure and my file structure must always be kept in sync. However, I can assume that all renaming and moving of files and folders is done through my GUI and not directly in the filesystem.
That means whenever I rename, e.g., a grandparent folder node in the database, the meta->url of the image node two levels below it must reflect the new folder name as well (and of course the filesystem must change too). So I guess I need a different solution than storing a reference to the file (its filesystem path) in a column of the node.
Of course, if I rename the image node itself, I can easily change the url to point to the new name and rename the .jpg file during the DB transaction in my Go model (my convention is node name = filename and node hierarchy = folder hierarchy). The problem is when I rename a parent (or grandparent, and so on) folder node: since I store the whole url on each child node, every stored path below that folder is now out of date.
Would it be a good solution to make a trigger fire when the path changes? I hope there's a better way.
What other approaches exist, given that PostgreSQL has neither the FILESTREAM nor the DATALINK datatype, to keep my filesystem DB nodes/tree synchronized with my media directory structure?
P.S.
Since I guess Postgres does not have a clever solution to this, it may be relevant to mention that I'm coding in Go (Golang), in case you know a way of handling this from the application itself.
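One application-side way around the problem, sketched here in Go under stated assumptions, is to stop storing the full url and derive it from the tree whenever it is needed: walk parent_id from the node up to the root and join the names under /media. Renaming a folder then changes exactly one row, and every descendant's URL follows automatically (the directory on disk still has to be renamed once). The Node type, the use of 0 for "no parent" (the real column would be NULL) and the buildURL helper are hypothetical illustrations, not part of the original design.

```go
package main

import (
	"errors"
	"fmt"
	"path"
)

// Node mirrors the tbl_node columns used here. ParentID 0 stands in for a
// NULL parent_id (an assumption of this sketch).
type Node struct {
	ID       int64
	Name     string
	ParentID int64
}

// buildURL derives a node's media URL from the names of its ancestors instead
// of reading a stored url, so renaming a folder only changes one row.
func buildURL(nodes map[int64]Node, id int64) (string, error) {
	var names []string
	seen := make(map[int64]bool)
	for id != 0 {
		if seen[id] {
			return "", errors.New("cycle in parent_id chain")
		}
		seen[id] = true
		n, ok := nodes[id]
		if !ok {
			return "", fmt.Errorf("node %d not found", id)
		}
		names = append(names, n.Name)
		id = n.ParentID
	}
	// Names were collected leaf first; reverse them into root-first order.
	for i, j := 0, len(names)-1; i < j; i, j = i+1, j-1 {
		names[i], names[j] = names[j], names[i]
	}
	return path.Join(append([]string{"/media"}, names...)...), nil
}

func main() {
	nodes := map[int64]Node{
		1: {ID: 1, Name: "sample image folder"},
		2: {ID: 2, Name: "image.jpg", ParentID: 1},
	}
	before, _ := buildURL(nodes, 2)
	nodes[1] = Node{ID: 1, Name: "renamed image folder"} // rename the folder row only
	after, _ := buildURL(nodes, 2)
	fmt.Println(before) // /media/sample image folder/image.jpg
	fmt.Println(after)  // /media/renamed image folder/image.jpg
}
```

Since the ltree path already lists the ancestor ids in root-first order, the rows needed for this walk can also be fetched in a single query.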
Thoughts and pseudocode so far:
1) Consider this tree, which maps to this file system:
Tree, DB:
sample image folder (node.name)
image.jpg (node.name)
Filesystem:
/media/sample image folder
/media/sample image folder/image.jpg
2) First we rename the "sample image folder" node.name to "renamed image folder" (and, of course, its corresponding content record's meta->url to "/media/renamed image folder") from the CMS GUI.
3) When renaming the parent folder's node.name column and content record's meta->url, rename the corresponding folder in the filesystem during the same transaction.
4) Later in that same transaction, select all children by node path (ltree, materialized path).
5) By joining node.id to content.node_id, update each child's content record meta->"url" (the filesystem reference), i.e. the part substring(0, lastindexof("/")) before the filename, for ALL child nodes of the folder node we renamed.
Hopefully all this gives you somewhat a clearer idea of what I'm trying to achieve. :)

It's very hard to understand what you wrote, but from what I gather, I don't think you need a database to track changes to the file system. The file system IS the database. What you can do with the database is keep "virtual" paths that point to the physical filesystem paths. At that point you don't need to move files around in the filesystem; you only need to keep the database representation consistent.
node_id int
virtual_path string
physical_path string
virtual_attributes
physical_attributes
Hope it helps.
P.S. This is not a Go question.
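The virtual/physical split suggested above can be illustrated with a small in-memory Go sketch. It assumes physical files are named after something stable, such as the node id (/media/2.jpg below is a made-up example), so that renaming or moving a folder in the GUI only rewrites virtual_path values and never touches the disk. Entry, renameVirtual and resolve are hypothetical names, and the attribute columns are omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// Entry follows the columns proposed above (attributes omitted).
type Entry struct {
	NodeID       int64
	VirtualPath  string // what the GUI shows and users rename/move
	PhysicalPath string // where the bytes live on disk; never changes
}

// renameVirtual renames a virtual folder by rewriting VirtualPath for the
// folder and everything below it. PhysicalPath is left untouched.
func renameVirtual(entries []Entry, oldPrefix, newPrefix string) {
	for i := range entries {
		vp := entries[i].VirtualPath
		if vp == oldPrefix || strings.HasPrefix(vp, oldPrefix+"/") {
			entries[i].VirtualPath = newPrefix + vp[len(oldPrefix):]
		}
	}
}

// resolve maps a virtual path to its physical path.
func resolve(entries []Entry, virtual string) (string, bool) {
	for _, e := range entries {
		if e.VirtualPath == virtual {
			return e.PhysicalPath, true
		}
	}
	return "", false
}

func main() {
	entries := []Entry{
		{NodeID: 2, VirtualPath: "/sample image folder/image.jpg", PhysicalPath: "/media/2.jpg"},
	}
	renameVirtual(entries, "/sample image folder", "/renamed image folder")
	p, ok := resolve(entries, "/renamed image folder/image.jpg")
	fmt.Println(p, ok) // /media/2.jpg true
}
```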

Related

How to iterate through node while there is a relationship

I have nodes that are structured like folder, subfolder and files. Any folder can have a relationship with a subfolder, which can have a relationship with another subfolder, which can have a relationship with files. I'd like to iterate through every folder to find every subfolder and files inside a given folder.
In one query, I'd like to get every file that is inside a folder or in its subfolders. I can't find any way to do it with Cypher. I saw FOREACH and UNWIND, but I don't think they help me.
Assuming you have labelled the nodes accordingly as Folder and File, the following query will fetch all the files belonging to the starting folder, directly or through a chain of one or more sub-folders:
MATCH(ParentFolder:Folder)-[*]->(childFile:File)
WHERE ParentFolder.name='Folder1'
RETURN childFile
If you haven't used labels (I highly recommend using them), you can look for all the paths starting from the specified folder and return the last node of each path.
MATCH(ParentFolder)-[*]->(childFile)
WHERE ParentFolder.name='Folder1' AND NOT (childFile)-->()
RETURN childFile
The second query will fetch all the terminal nodes, even if they are folders. You would have to use labels or add filters in the where clause to ensure only files are fetched for childFile.
Both versions of the query rely on variable-length paths. The wildcard (*) matches paths of any length (one or more relationships) starting from ParentFolder.

How do you differentiate between QVD source files and target files when reading a QVW's XML MetaData?

I am currently trying to find an alternative to the Governance Dashboard that Rob Wunderlich (Qlik founder) created, since I am currently encountering errors when using it.
How do you differentiate between a data source (QVD, aka source) that is used by a QVW or a data file (QVD, aka target) that is generated by that QVW?
QVW:
LOAD
Lower(Discriminator) AS DataFile.Filepath
FROM C:\Sample_Transform_file.qvw (xmlSimple, Table is [DocumentSummary/LineageInfo]);
Below is an example of what I found when parsing the XML metadata (the Discriminator subtag within the LineageInfo tag) for one specific transform QVW.
Sample Table Output
Are targets just identified by this?
STORE - [qvdName.qvd](qvd)
From what I have found, that appears to be the case, to a degree.
All of our QVW files that output a QVD use DIRECTORY statements rather than hard-coded or variablized file paths. That is why all of the targets are displayed as "STORE - qvdname.qvd" instead of showing the file path. In a sense, that is a flaw in QlikView's Governance Dashboard (or at the very least, Qlik does not seem to recommend variablizing those paths as a standard practice to avoid breaking the lineage).

Which is the best practice either to save image name or full URL in database

Which is the better approach for storing an image name in a database? I have two choices: the first is to store just the image name, e.g. apple.png, and the second is to store the full image URL, e.g. abc.com/src/apple.png.
Any help will be appreciated. Thanks.
Best practice is not to save the full path to the image, like abc.com/src/apple.png, but to save a domain-specific path to it. For example:
Users image : /user/{id}/avatar/img.png
Product image: /product/{id}/1.png
This way you avoid tying images to a particular server, server path, URL, etc. For example, if you decide to move all your images to another server, you don't need to change any records in the DB.
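A minimal Go sketch of why this works: only a configurable base needs to change when the images move, while the domain path stored in the DB stays the same. imageURL and the example base URLs are assumptions for illustration, and the stored paths are assumed to be URL-safe already (no spaces or other characters that need escaping).

```go
package main

import (
	"fmt"
	"strings"
)

// imageURL joins a configurable base (scheme, host, optional prefix) with the
// domain path stored in the DB, such as "/product/42/1.png".
func imageURL(base, storedPath string) string {
	return strings.TrimRight(base, "/") + "/" + strings.TrimLeft(storedPath, "/")
}

func main() {
	stored := "/product/42/1.png" // what the DB row holds
	fmt.Println(imageURL("https://abc.com", stored))
	// After moving the images to another server, only the base changes:
	fmt.Println(imageURL("https://images.example.net/", stored))
}
```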
The two other answers already cover it pretty well. It is indeed best practice to save the directory path instead of the entire URL. Some of the reasons were already covered, such as being able to move your folders to another server without having to change anything in your file logic.
What you could also do is keep everything in one directory, refer to that, and then save just the image name. However, I would not recommend that: the other structure simply makes it much easier to navigate and browse. A good file structure is something you'll thank yourself for later if you ever have to go through things manually.
With that said, I'd like to add this trick to the mix:
$_SERVER['DOCUMENT_ROOT']. This always starts you from the web root directory, instead of having to do tedious things such as ../../, which looks like a mess.
Note that it is a filesystem path on the server (useful for file_exists(), include, etc.), not a URL, so a browser cannot load an image from it. For the image tag itself, use a root-relative URL:
<img src="/<?php echo htmlspecialchars(ltrim($row['filePath'], '/')); ?>">
$row['filePath'] being the file path stored in the database.
The ltrim() call drops a leading / if the stored path already has one.
First of all, you need to upload all images to the public folder of your project, so there is no need to save the domain name.
If you are storing all images in one directory, there is no problem storing only the image name in the database;
you can easily access the images like <img src="/foldername/imagename.jpg" />
But if your project has multiple directories, like
profile: to save user avatar images,
background: to save background images,
then it is better to save the image with its path in the database, like "/profile/avatar.jpg",
so you can access the image like <img src="imagepathhere" />
Another common way is to create an image table with the columns
id
type (enum or int)
name (file name)
Define the types in your app (preferably in the model), like:
const USER_AVATAR = 1;
const PRODUCT_IMG = 2;
Define a path map for each image type, like:
$paths = [
USER_AVATAR => '/var/www/project/web/images/users',
...
];
and use the ids from this image table in other tables. This is called a polymorphic association. It is the most flexible way to store images.
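The same type-to-directory map, sketched in Go for comparison. ImageType, the constants and FilePath are illustrative names; the users directory is the answer's example value, and the products directory is a hypothetical placeholder since the answer elides the rest of the map.

```go
package main

import (
	"fmt"
	"path"
)

// ImageType is the value stored in the image table's type column.
type ImageType int

const (
	UserAvatar ImageType = 1
	ProductImg ImageType = 2
)

// dirs maps each type to its base directory. The users path is the answer's
// example; the products path is a hypothetical placeholder.
var dirs = map[ImageType]string{
	UserAvatar: "/var/www/project/web/images/users",
	ProductImg: "/var/www/project/web/images/products",
}

// Image mirrors one row of the image table (id, type, name).
type Image struct {
	ID   int64
	Type ImageType
	Name string
}

// FilePath resolves a row to a file path; only dirs knows about directories,
// so moving one type's folder is a one-line change.
func FilePath(img Image) (string, error) {
	dir, ok := dirs[img.Type]
	if !ok {
		return "", fmt.Errorf("unknown image type %d", img.Type)
	}
	return path.Join(dir, img.Name), nil
}

func main() {
	p, err := FilePath(Image{ID: 7, Type: UserAvatar, Name: "img.png"})
	fmt.Println(p, err)
}
```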

Single file versioning best practices?

The user selects rather hefty single XML files via an NSOpenPanel. The application makes moderate changes to the file, so I'd like to include an option to create a backup in a subfolder of the directory the original file was selected from. Creating the new subfolder is no problem, but does anybody have a good way to create a backup of said foo.xml? Is there a standard practice for this, or is it as simple as creating a duplicate and renaming it foo.back01.xml?
I'm not sure how well this approach fits your requirement, but this is what I was doing:
-- Have a directory in the system's temporary folder, assuming all these files will be deleted once the application is closed.
-- To make each file name unique, generate it with the following pattern, using a function such as +(NSString *)generateFileNameForExtension:(NSString *)extension Create:(bool)bCreate
Assuming the input is .xml and false, it might give a file name like this:
AppName128908765445.xml, i.e. [AppName][UTCTimeStamp].[FileExtension]
-- Once you think you're done with a file, call a function such as [self addToDeleteList:(NSString *)fileName], which adds the file to a delete list.
-- Another function starts a repeating one-minute timer; every minute it reads all the files added to the delete list and deletes them.
I can share the code with you if needed.

proper way to organize files in jcr repository

What is a proper way of organizing files in a WCM that uses JCR? Let's say the total file count is 100,000+ files and the total file size is about 50-70 GB.
Is it better to organize files by file type (and create subdirectories to further group the files by some category)?
What are the advantages? Does it make any difference when using the query API, for maintenance, or in some other way?
Proposal 1:
--shared
------images
------pdf
------movies
--location1
------images
------pdf
------movies
--location2
------images
------pdf
------movies
Proposal 2:
--pdf
-------shared
-------location1
-------location2
--images
--------shared
--------location1
--------location2
.. etc
Take a look at this: David's Model: A guide for content modeling
Some highlights:
Data First, Structure Later. Maybe.
Drive the content hierarchy, don't let it happen.
Workspaces are for clone(), merge() and update().
Beware of Same Name Siblings.
References considered harmful.
Files are Files are Files.
ID's are evil.
Whatever you do, make sure you don't end up with more than 1,000 child nodes under any given node.
Just as in any (real) file system, listing a folder with a lot of files/subfolders in it can take some time.
By default, Jackrabbit 2.x now hashes the user space,
i.e.:
/users/s/sa/sandra
/users/s/si/simong
...
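The hashed layout shown (/users/s/sa/sandra) can be reproduced with a small Go helper: the first character, then the first two characters, then the full name. This is only a sketch of the layout; lowercasing the prefix and the handling of one-character names are assumptions, and it is not Jackrabbit's own implementation.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// prefixRunes returns the first n characters (runes) of s, or s if shorter.
func prefixRunes(s string, n int) string {
	count := 0
	for i := range s {
		if count == n {
			return s[:i]
		}
		count++
	}
	return s
}

// hashedUserPath builds root/<first char>/<first two chars>/<name>.
func hashedUserPath(root, name string) string {
	lower := strings.ToLower(name)
	return path.Join(root, prefixRunes(lower, 1), prefixRunes(lower, 2), name)
}

func main() {
	for _, u := range []string{"sandra", "simong"} {
		fmt.Println(hashedUserPath("/users", u)) // /users/s/sa/sandra, /users/s/si/simong
	}
}
```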
I would personally go for your first proposal as it makes more sense.
We have a webapp where all our users can upload/delete/modify their files in JCR, and we did it this way:
/_users/s/si/simon/public
/_users/s/si/simon/public/My Pictures
/_users/s/si/simon/public/My Pictures/2010/06/Trip to the US
/_users/s/si/simon/public/My Pictures/2010/06/Trip to the US/DC1001.jpg
/_users/s/si/simon/private/account_details.txt
...
We're loosely following the way home folders are done in UNIX-like systems.
We try to hash everything we (reasonably) can, for example the user space (/s/si/simong), but also things like messages:
/_users/s/si/simong/messages/2009/12/25/ab34ed87dee
/_users/s/si/simong/messages/2010/03/12/e4f1de3cd48
...
However, it's up to the individual user not to have more than 1,000 child files in a given folder (we do warn them, though).
Doing it this way also gives you the nice benefit of being able to enforce access control,
i.e. everything under ~/private is readable and writable only by the current user, while ~/public is readable by everybody.
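A sketch of that rule in Go, assuming paths are absolute and each user's home follows the hashed layout above. canRead, canWrite and within are hypothetical helpers, not JCR APIs; in a real repository this would more likely be enforced through the repository's own access control (JCR 2.0 defines an AccessControlManager for that). Cleaning the path first stops a request like public/../private from slipping through.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// within reports whether p is dir itself or lies below it.
func within(p, dir string) bool {
	return p == dir || strings.HasPrefix(p, dir+"/")
}

// canRead: the owner may read anything in their home; everyone else may
// read only what is under <home>/public.
func canRead(requester, owner, home, p string) bool {
	p = path.Clean(p)
	if !within(p, home) {
		return false
	}
	if requester == owner {
		return true
	}
	return within(p, path.Join(home, "public"))
}

// canWrite: only the owner may write, and only inside their own home.
func canWrite(requester, owner, home, p string) bool {
	return requester == owner && within(path.Clean(p), home)
}

func main() {
	home := "/_users/s/si/simon"
	fmt.Println(canRead("bob", "simon", home, home+"/public/My Pictures/2010/06/Trip to the US/DC1001.jpg")) // true
	fmt.Println(canRead("bob", "simon", home, home+"/private/account_details.txt"))                         // false
}
```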