Single file versioning best practices? - objective-c

The user selects rather hefty single XML files via an NSOpenPanel. The application makes moderate changes to the file, so I'd like to include an option to create a backup in a subfolder of the directory the original file was selected from. Creating the new subfolder is no problem, but does anybody have a good way to create a backup of said foo.xml? Is there an established practice for this, or is it as simple as creating a duplicate and renaming it foo.back01.xml?

I'm not sure how well this approach fits your requirement, but this is what I was doing:
-- Have a directory in the system's temporary folder, on the assumption that all these files will be deleted once the application is closed.
-- To keep file names unique, generate them with the following pattern. Have a function, say [+(NSString *) generateFileNameForExtension:(NSString *)extension Create:(bool)bCreate].
Given .xml and false as input, it might return a file name like this:
AppName128908765445.xml, i.e. [AppName][UTCTimeStamp].[FileExtension]
-- Once you are done with a file, call a function such as [self addToDeleteList:(NSString *)fileName], which adds the file to a delete list.
-- A timer fires every minute; on each tick, a function reads all the files that have been added to the delete list and deletes them.
I can share the code if needed...
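For the simpler "duplicate and rename" route mentioned in the question, a minimal sketch of the logic might look like the following. It is written in Python rather than Objective-C purely to show the idea; the function name backup_file and the subfolder name "backups" are assumptions made for illustration, not an established convention.

```python
import shutil
from pathlib import Path


def backup_file(original: Path, subfolder: str = "backups") -> Path:
    """Copy `original` into a subfolder next to it, using the next free
    foo.backNN.xml name, and return the path of the copy."""
    backup_dir = original.parent / subfolder
    backup_dir.mkdir(exist_ok=True)

    number = 1
    while True:
        # foo.xml -> foo.back01.xml, foo.back02.xml, ...
        candidate = backup_dir / f"{original.stem}.back{number:02d}{original.suffix}"
        if not candidate.exists():
            break
        number += 1

    # copy2 also tries to preserve file metadata such as modification time.
    shutil.copy2(original, candidate)
    return candidate
```

Note that the existence check and the copy are not atomic, so this is only safe when a single process writes the backups. Using a timestamp in the name, as in the answer above, avoids the counting loop altogether.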

Related

Azure Data Factory check file name dynamically

I'm checking daily whether certain files exist in a folder on-prem. The files have a specific format, but the first few letters indicate a specific job. For example, xyz-yyyyMMdd.csv, or abc-yyMMdd.csv etc.
I would like to use a Switch activity to see whether the file for each job has arrived or whether an alert should be raised. How can I dynamically let the Switch activity read the 'xyz' portion, knowing that the other part of the file name is dynamic?
Thank you
If the prefix is always three letters, as in your examples, you can try this expression:
@substring(item().name,0,3)
If not, you can try this:
@split(item().name,'-')[0]
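To see how the two expressions differ, here is a small Python sketch that mirrors them. The file names (including the dates and the longer "longjob" prefix) are made-up samples, and the helper names are hypothetical; this is not Data Factory code, only an illustration of what each expression extracts.

```python
def job_prefix_fixed(file_name: str) -> str:
    # Mirrors substring(name, 0, 3): start at index 0, take 3 characters.
    return file_name[0:3]


def job_prefix_split(file_name: str) -> str:
    # Mirrors split(name, '-')[0]: everything before the first hyphen.
    return file_name.split("-")[0]


# Hypothetical sample names following the xyz-yyyyMMdd.csv pattern.
for name in ["xyz-20240115.csv", "abc-240115.csv", "longjob-20240115.csv"]:
    print(name, job_prefix_fixed(name), job_prefix_split(name))
```

The fixed-length version truncates any prefix longer than three letters, so the split version is the safer choice whenever the prefix length can vary.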

Is there a way to list directories using PySpark in a notebook?

I'm trying to see every file in a certain directory, but since each file in the directory is very large, I can't use sc.wholeTextFiles or sc.textFile. I wanted to just get the filenames, and then pull a file if needed in a different cell. I can access the files just fine using Cyberduck, and it shows the names there.
Ex: I have the link for one set of data at "name:///mainfolder/date/sectionsofdate/indiviual_files.gz", and it works, but I want to see the names of the files in "/mainfolder/date" and in "/mainfolder/date/sectionsofdate" without having to load them all via sc.textFile or sc.wholeTextFiles. Both those functions work, so I know my keys are correct, but it takes too long for the files to be loaded.
Since the list of files can be retrieved by a single node, you can just list the files in the directory. Look at this response.
wholeTextFiles returns (path, content) tuples, but I don't know whether the file content is loaded lazily, so that you could read only the first element of each tuple.
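One way to list the names on the driver without reading any file contents is to go through the Hadoop FileSystem API that Spark already ships with. The following is a sketch under assumptions: list_directory is a hypothetical helper, sc is an existing SparkContext, and _jvm and _jsc are private PySpark attributes that may change between versions.

```python
def list_directory(sc, directory):
    """Return the paths directly under `directory` without reading file contents.

    `sc` is an existing SparkContext. `_jvm` and `_jsc` are private PySpark
    attributes, so this may need adjusting between Spark versions.
    """
    hadoop_path = sc._jvm.org.apache.hadoop.fs.Path(directory)
    # Resolve the file system from the path's scheme, using Spark's Hadoop
    # configuration (where the access keys are normally set).
    fs = hadoop_path.getFileSystem(sc._jsc.hadoopConfiguration())
    # listStatus only fetches metadata and runs on the driver.
    return [status.getPath().toString() for status in fs.listStatus(hadoop_path)]
```

In a notebook this would be called as list_directory(sc, "name:///mainfolder/date"), and again on any returned subdirectory such as the sectionsofdate level.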

Is there a recommended way to skip first line of a CSV file in Moqui?

I have a CSV file generated by another programme that is uploaded to Moqui as a FileItem without any editing done on the CSV file.
So it has a header that I don't want to use; therefore I manually specify csvEntityName and csvFieldNames for the entity data loader. But the header is then taken as the first record. Is there a recommended way to skip the first line?
Digging deeper, in EntityDataLoaderImpl.groovy we have:
CSVParser parser = CSVFormat.newFormat(edli.csvDelimiter)
        .withCommentMarker(edli.csvCommentStart)
        .withQuote(edli.csvQuoteChar)
        .withSkipHeaderRecord(true) // TODO: remove this? does it even do anything?
        .withIgnoreEmptyLines(true)
        .withIgnoreSurroundingSpaces(true)
        .parse(reader)
The reason .withSkipHeaderRecord(true) currently does nothing is that you first have to tell the parser the file has a header to skip, using .withHeader() (https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVFormat.html#withFirstRecordAsHeader--).
Once that is added, .withSkipHeaderRecord(boolean) will skip the header record when passed 'true'.
(I think this needs to be made an issue, so I will do that.)

How do you differentiate between QVD source files and target files when reading a QVW's XML MetaData?

I am currently trying to find an alternative to the Governance Dashboard that Rob Wunderlich (Qlik founder) created, since I am currently encountering errors when using it.
How do you differentiate between a data source (QVD, aka source) that is used by a QVW and a data file (QVD, aka target) that is generated by that QVW?
QVW:
LOAD
    Lower(Discriminator) AS DataFile.Filepath
FROM C:\Sample_Transform_file.qvw (xmlSimple, Table is [DocumentSummary/LineageInfo])
Below is an example of what I found when parsing through the XML Metadata
(discriminator subtag within the lineageinfo tag) for one specific Transform QVW.
Sample Table Output
Are targets just identified by this?
STORE - [qvdName.qvd](qvd)
From what I have found, that appears to be the case, to a degree.
All of our QVW files that output a QVD use DIRECTORY statements rather than hard-coded file paths or variablized paths. That is why all of the targets are displayed as "STORE - qvdname.qvd" instead of showing the file path. In a sense, that is a flaw on QlikView's part regarding its Governance Dashboard (or at the very least, they don't seem to recommend variablizing those paths as a standard in order to avoid breaking the lineage).
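If the "STORE - " prefix holds for your files, separating sources from targets comes down to a prefix check on each discriminator value. A rough sketch, assuming the discriminators have already been extracted into a list of strings; split_lineage is a hypothetical helper and the source path in the usage note is made up:

```python
STORE_PREFIX = "STORE - "


def split_lineage(discriminators):
    """Split LineageInfo discriminator strings into (sources, targets).

    Assumption: entries written by a STORE statement start with "STORE - ";
    everything else is treated as a source.
    """
    sources, targets = [], []
    for entry in discriminators:
        text = entry.strip()
        if text.upper().startswith(STORE_PREFIX):
            targets.append(text[len(STORE_PREFIX):])
        else:
            sources.append(text)
    return sources, targets
```

For example, split_lineage(["c:\\data\\customers.qvd", "STORE - [qvdName.qvd](qvd)"]) puts the first entry in sources and "[qvdName.qvd](qvd)" in targets. As noted above, the target entries carry no directory when DIRECTORY statements are used, so the full path cannot be recovered this way.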

proper way to organize files in jcr repository

What is a proper way of organizing files in a WCM that uses JCR? Let's say the total file count is 100,000+ files and the total file size is about 50-70GB.
Is it better to organize files by file type (and create subdirectories to further group the files by some category)?
What are the advantages? Does it make any difference for the query API, maintenance, or anything else?
Proposal 1:
--shared
------images
------pdf
------movies
--location1
------images
------pdf
------movies
--location2
------images
------pdf
------movies
Proposal 2:
--pdf
-------shared
-------location1
-------location2
--images
--------shared
--------location1
--------location2
.. etc
Take a look at this: David's Model: A guide for content modeling
Some highlights:
Data First, Structure Later. Maybe.
Drive the content hierarchy, don't let it happen.
Workspaces are for clone(), merge() and update().
Beware of Same Name Siblings.
References considered harmful.
Files are Files are Files.
ID's are evil.
Whatever you do, make sure you don't end up with more than 1000 child nodes under any given node.
Just as in any (real) file system, when you want to list a folder with a lot of files/subfolders in it, it can take some time.
By default, Jackrabbit 2.x will now hash up the user space, i.e.:
/users/s/sa/sandra
/users/s/si/simong
...
I would personally go for your first proposal as it makes more sense.
We have a webapp where all our users can upload/delete/modify their files in JCR and did it this way:
/_users/s/si/simon/public
/_users/s/si/simon/public/My Pictures
/_users/s/si/simon/public/My Pictures/2010/06/Trip to the US
/_users/s/si/simon/public/My Pictures/2010/06/Trip to the US/DC1001.jpg
/_users/s/si/simon/private/account_details.txt
...
We're loosely following the way home folders are done in UNIX-like systems.
We try to hash up all the things we (reasonably) can: for example the user space (/s/si/simong), but also things like messages:
/_users/s/si/simong/messages/2009/12/25/ab34ed87dee
/_users/s/si/simong/messages/2010/03/12/e4f1de3cd48
...
However, it's up to the individual user not to have more than 1000 child files in a given folder (we do warn them, though).
Doing it this way also gives you a nice benefit of exercising Access Control.
i.e.: everything under ~/private is readable and writable only by the current user, while ~/public is readable by everybody.
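The path scheme shown in the examples above can be sketched as a pair of small helpers. This is only an illustration of the layout (written in Python rather than against the JCR API); the function names are hypothetical, and it assumes user names of at least two characters.

```python
from datetime import date


def user_home(username: str, root: str = "/_users") -> str:
    # Shard by first letter, then first two letters: simong -> /_users/s/si/simong
    return f"{root}/{username[:1]}/{username[:2]}/{username}"


def message_path(username: str, sent: date, message_id: str) -> str:
    # Messages are further spread over year/month/day folders.
    return f"{user_home(username)}/messages/{sent:%Y/%m/%d}/{message_id}"
```

Each level keeps the number of siblings small: the first two levels are bounded by the alphabet in use, and the date folders bound how many messages land under any one node.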