Azure ADF GetMetadata childItems if folder might not exist

I have a path in Azure Data Lake Storage that may or may not exist.
I want to iterate over the contents of that folder, if it exists.
In C#, I would arrange for a children collection that is empty if the folder doesn't exist, and then iterate over that (possibly empty) collection.
Can I do the same in ADF (v2)?
If I use a Get Metadata activity that returns both exists and childItems, it nearly works:
It works if the folder does exist
It doesn't error if the folder does NOT exist.
But the childItems property is not defined if the folder doesn't exist, so I don't get an empty array to iterate over.
The first solution that comes to mind is to build an ADF expression that returns either the existing array or an empty array based on a bool, which I've asked as a separate question. But if there's a nicer, more idiomatic approach, I'm open to that too :)

Please try something like this:
1. Create a variable of type Array whose default value is an empty array (named emptyArr in the expressions below).
2. Create a ForEach activity that runs on success of your Get Metadata activity.
Set its Items to this expression:
@if(contains(activity('Get Metadata1').output, 'childItems'), activity('Get Metadata1').output.childItems, variables('emptyArr'))
or
@if(activity('Get Metadata1').output.exists, activity('Get Metadata1').output.childItems, variables('emptyArr'))
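To make the logic of these expressions concrete, here is a small Python sketch that mimics what the first one does with the Get Metadata output. It is not ADF code: it assumes the output looks like a dict with an exists flag and, only when the folder exists, a childItems list of name/type entries. The function name items_to_iterate is hypothetical.

```python
from typing import Any


def items_to_iterate(metadata_output: dict[str, Any]) -> list[dict[str, Any]]:
    """Mimic the ForEach Items expression: childItems if present, else an empty list.

    Roughly equivalent to
    @if(contains(activity('Get Metadata1').output, 'childItems'),
        activity('Get Metadata1').output.childItems,
        variables('emptyArr'))
    """
    empty_arr: list[dict[str, Any]] = []  # plays the role of the emptyArr variable
    if "childItems" in metadata_output:
        return metadata_output["childItems"]
    return empty_arr


# Folder exists: the output carries both fields (assumed shape).
existing = {"exists": True, "childItems": [{"name": "a.csv", "type": "File"}]}
# Folder missing: exists is False and childItems is absent.
missing = {"exists": False}

for item in items_to_iterate(existing):
    print(item["name"])
print(items_to_iterate(missing))
```

The loop body runs once per child when the folder exists and zero times when it does not, which is the "possibly empty collection" behaviour the question asks for.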
I tested this in two scenarios:
Scenario one: the path exists.
Scenario two: the path does not exist.
Hope this helps :)

Related

Is there a way to list directories using PySpark in a notebook?

I'm trying to see every file in a certain directory, but since each file in the directory is very large, I can't use sc.wholeTextFiles or sc.textFile. I just want to get the file names, and then load a file if needed in a different cell. I can access the files fine using Cyberduck, and it shows their names.
For example, I have the link for one set of data at "name:///mainfolder/date/sectionsofdate/indiviual_files.gz", and it works. But I want to see the names of the files in "/mainfolder/date" and in "/mainfolder/date/sectionsofdate" without having to load them all with sc.textFile or sc.wholeTextFiles. Both of those functions work, so I know my keys are correct, but loading takes too long.
Since the list of files can be retrieved by a single node, you can simply list the files in the directory. Look at this response.
wholeTextFiles returns (path, content) tuples, but I don't know whether the file content is loaded lazily, which would let you take only the first element of each tuple.
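If the storage is also reachable as an ordinary file-system path from the driver (for example through a mount; this is an assumption, since the question's name:/// scheme may need a storage-specific client instead), listing names on a single node needs no Spark job at all. This Python sketch walks a directory tree and collects relative paths and sizes without opening any file. The list_files helper and the /mnt path are hypothetical.

```python
import os


def list_files(root: str) -> list[tuple[str, int]]:
    """Return sorted (relative_path, size_in_bytes) pairs for every file under root.

    Only directory entries and file sizes are read; file contents are never opened,
    so this stays cheap even when the files themselves are very large.
    """
    results: list[tuple[str, int]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            results.append((os.path.relpath(full, root), os.path.getsize(full)))
    return sorted(results)


if __name__ == "__main__":
    # Hypothetical mount point; replace with wherever /mainfolder/date is visible.
    for rel, size in list_files("/mnt/mainfolder/date"):
        print(f"{size:>12}  {rel}")
```

Once you have the names, you can pass just the ones you need to sc.textFile in a later cell.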

No output is generated when using reference data in Azure Stream Analytics

I have written a simple query that joins the input with JSON reference data. I can see correct results when I test the query in the "Test results" tab. However, no output is generated when I start the job.
I have confirmed that the output blob is created when no join with reference data is used in the query.
Any help is appreciated. The sample reference json follows:
[
{
"DeviceId":"DEV-021",
"Brand":"brand01",
"Model":"model01"
}
]
Use a flat JSON structure instead of an array. That should give you output.
Check the path pattern you specified for the reference data input; it may be incorrect, or it may be missing the file name. Does it look something like {date}/{time}/filename.json?
If you leave out the file name, it will not work either.
And when you test the query, you usually supply the file manually, which is why the query works there.
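One way to sanity-check the reference path is to expand the pattern yourself and confirm it ends in a concrete file name. This Python sketch assumes the {date} token is rendered as YYYY/MM/DD and {time} as HH; both formats are configurable on the input, so adjust them to match yours. resolve_reference_path is a hypothetical helper, not part of Stream Analytics, and it only checks for a JSON file name.

```python
from datetime import datetime


def resolve_reference_path(pattern: str, when: datetime) -> str:
    """Expand {date} and {time} as a YYYY/MM/DD + HH configuration would (assumed formats)."""
    path = pattern.replace("{date}", when.strftime("%Y/%m/%d"))
    path = path.replace("{time}", when.strftime("%H"))
    # A pattern that stops at a folder (no file name) is the failure mode described above.
    if not path.rsplit("/", 1)[-1].endswith(".json"):
        raise ValueError(f"path pattern has no JSON file name: {path!r}")
    return path


print(resolve_reference_path("refdata/{date}/{time}/devices.json", datetime(2020, 1, 2, 3)))
# prints refdata/2020/01/02/03/devices.json
```

Running the same check against "refdata/{date}/{time}/" raises ValueError, because the expanded path is only a folder.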

How to apply metadata to all files in a content directory

I have a content directory called foo and I want all files under that directory to have an extra metadata item foovar: default, unless explicitly overridden in the file header. I think I'm supposed to do this with EXTRA_PATH_METADATA, but I can't figure out what incantation it wants.
(for my current use case I'm trying to apply template: sometemplate within this dir, but I'm interested in solving the general case as it would make several related headaches go away)
I think what you're looking for is actually DEFAULT_METADATA. Check out this portion of the documentation:
DEFAULT_METADATA = {}
The default metadata you want to use for all articles and pages.
So, in your case it might look something like this in your config file:
DEFAULT_METADATA = {'foovar': 'default'}
Then to assign your custom template(s), see this portion of the documentation.
This wasn't possible at the time I asked. I've since sent the devs a PR adding support, and it's been merged to master. Presumably it will go out in the next release. It makes EXTRA_PATH_METADATA recursive, so you can apply settings to a subdir like this:
EXTRA_PATH_METADATA = {'dirname/subdir': {'status': 'hidden'}}
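To illustrate how the two settings combine for the question's case, here is a pelicanconf.py fragment (values taken from the question) plus a small Python function that mimics an assumed precedence: DEFAULT_METADATA first, then any EXTRA_PATH_METADATA entry whose key is the file or one of its parent directories, then the file header, which wins. resolve_metadata is a hypothetical illustration, not Pelican's own code; it assumes forward-slash paths, and directory keys only apply in Pelican versions that include the change described above.

```python
# pelicanconf.py settings for the question's example.
DEFAULT_METADATA = {"foovar": "default"}
EXTRA_PATH_METADATA = {"foo": {"template": "sometemplate"}}


def resolve_metadata(source_path: str, header: dict[str, str]) -> dict[str, str]:
    """Hypothetical illustration of precedence: defaults < path metadata < file header."""
    metadata = dict(DEFAULT_METADATA)
    # sorted() puts a parent path before its subpaths (a prefix sorts first),
    # so more specific entries are applied later and override broader ones.
    for path, extra in sorted(EXTRA_PATH_METADATA.items()):
        if source_path == path or source_path.startswith(path.rstrip("/") + "/"):
            metadata.update(extra)
    metadata.update(header)  # explicit values in the file header always win
    return metadata


print(resolve_metadata("foo/post.md", {}))
print(resolve_metadata("foo/post.md", {"foovar": "custom"}))
print(resolve_metadata("bar/post.md", {}))
```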

Documentum - getting a list of sub folders

Is there a way in Documentum to get all subfolders of a folder? Can someone suggest a DQL query (or something similar) where I specify a parent folder and it returns the folder paths of all its subfolders?
select distinct r_folder_path from dm_folder where folder('/Folder1/Folder2', descend)
This will return all the folders and subfolders under /Folder1/Folder2
One thing to keep in mind:
Documentum supports linking objects to multiple parent folders. This means that one folder can have multiple parent folders.
If you have a folder structure like this:
Cabinet1
  /Test1
    /Test3
  /Test2
    /Test3
then Test3 is a subfolder of Test1 but also of Test2, because it can be linked to both.
Documentum accomplishes this using repeating attributes. r_folder_path is a repeating attribute of dm_folder; the parent links themselves are stored in i_folder_id, a repeating attribute that dm_folder inherits from its supertype dm_sysobject.
So running this DQL query:
select distinct r_folder_path from dm_folder where folder('/Cabinet1/Test1', descend)
returns every folder path Test3 is part of (linked to):
/Cabinet1/Test1/Test3
/Cabinet1/Test2/Test3
which might not be what you are looking for!
Because DQL does not let you choose which value of a repeating attribute is returned (you cannot specify an index), there is no elegant (and fail-safe) way to do this in DQL alone.
What you can do is fetch the object_name of each subfolder and prefix it with the path of the parent folder you searched under (this takes some coding).
For details, see the Documentum Content Server System Object Reference guide (available on the EMC developer community).
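A related client-side fix, different from rebuilding paths from object_name, is to filter the returned r_folder_path values and keep only those under the folder you searched. This Python sketch assumes you have already fetched the values into a list with whatever client you use (DFC, DFS or REST; none is shown). paths_under is a hypothetical helper.

```python
def paths_under(root: str, folder_paths: list[str]) -> list[str]:
    """Keep only r_folder_path values that lie under root, deduplicated and sorted."""
    prefix = root.rstrip("/") + "/"
    return sorted({p for p in folder_paths if p.startswith(prefix)})


# Values as they might come back for Test3 when searching under /Cabinet1/Test1.
returned = ["/Cabinet1/Test1/Test3", "/Cabinet1/Test2/Test3"]
print(paths_under("/Cabinet1/Test1", returned))
# prints ['/Cabinet1/Test1/Test3']
```

Appending "/" to the root before matching keeps a sibling such as /Cabinet1/Test10 from matching a search under /Cabinet1/Test1.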

Single file versioning best practices?

The user selects fairly large XML files, one at a time, via an NSOpenPanel. The application makes moderate changes to the file, so I'd like to offer an option to create a backup in a subfolder of the directory the original file was selected from. Creating the subfolder is no problem, but does anybody have a good way to create a backup of foo.xml? Is there an established practice for this, or is it as simple as creating a duplicate and renaming it foo.back01.xml?
I'm not sure how well this approach fits your requirements, but this is what I was doing:
-- Keep a directory in the system's temporary folder, on the assumption that all of these files can be deleted once the application closes.
-- To make each file name unique, generate it from a fixed pattern using a function such as +(NSString *)generateFileNameForExtension:(NSString *)extension Create:(bool)bCreate.
Given .xml and false as input, it might return a file name such as
AppName128908765445.xml, i.e. [AppName][UTCTimeStamp].[FileExtension]
-- Once you are done with a file, call something like [self addToDeleteList:(NSString *)fileName] to add it to a delete list.
-- A separate function starts a timer that fires every minute; each time, it reads all files added to the delete list and deletes them.
I can share the code if needed.
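For the question's simpler idea of a duplicate in a backup subfolder, here is a Python sketch that combines it with the timestamp naming above. Python stands in for Objective-C here; the backups folder name and the backup_file helper are hypothetical, and the delete-list timer is not shown.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def backup_file(original: Path, subfolder: str = "backups") -> Path:
    """Copy original into <its directory>/<subfolder>/ with a UTC timestamp in the name.

    foo.xml becomes e.g. backups/foo.20240101T120000123456Z.xml; the microsecond
    timestamp keeps repeated backups from overwriting each other.
    """
    backup_dir = original.parent / subfolder
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    target = backup_dir / f"{original.stem}.{stamp}{original.suffix}"
    shutil.copy2(original, target)  # copy2 also copies timestamps and permission bits
    return target
```

Keeping the original extension last (foo.<stamp>.xml rather than foo.xml.bak) means the backups still open with the same application.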