Input data from file from directory where filename updates each month - pentaho

I'm having trouble finding the answer to this, but if it already exists, please share it with me. I'm trying to input a file from a directory where the ending of the file name varies. For example, the directory would be 'ThisPC\Desktop\Folder' and I'm looking for the file 'SomeData_April2021'. I need it to search for the term 'SomeData' within the folder, as the ending is updated each month. There are also other files in the folder, so I can't just specify the directory and have it upload everything.
Hopefully this is clear! If not, I'm happy to clarify.

Figured it out. Had to type (?i).+SomeData.. into the wildcard expression field
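For readers who want to test this kind of expression outside Pentaho, here is a minimal Python sketch of the same idea: a case-insensitive regex applied to each file name in full. The pattern `(?i).*SomeData.*` and the sample file names are assumptions for illustration, not the exact expression or files from the post, and Python's `re` module is only a stand-in for the regex engine behind the wildcard field.

```python
import re

# Hypothetical pattern: case-insensitive, "SomeData" anywhere in the file name.
PATTERN = re.compile(r"(?i).*SomeData.*")


def select_files(names, pattern=PATTERN):
    """Return the names that the pattern matches in full."""
    return [name for name in names if pattern.fullmatch(name)]


if __name__ == "__main__":
    # Made-up folder contents for the demonstration.
    folder = ["SomeData_April2021.csv", "somedata_May2021.csv", "OtherReport.xlsx"]
    print(select_files(folder))
```

The pattern is matched against the whole name, so the leading and trailing `.*` are what allow arbitrary text around the fixed term.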


Not able to filter files using pathGlobFilter

We are trying to read files from a directory in Azure Blob Storage based on a pattern. We are using the
pathGlobFilter option to select files. The directory contains the following files:
Sales_51820_14529409_T_7a3cc7d1d17261fd17e7e1fabd3.csv
Sales_51820_14529409_7a3cc7d1d17261fd17e7e1fabd3.csv
Sales_61820_17529409_7a3cc7d1d17261fd17e7e1fabd3.csv
Sales_61820_17529409_T_7a3cc7d1d17261fd17e7e1fabd3.csv
We need to process only those files that do not have "T" in the file name, that is, only these two files:
Sales_51820_14529409_7a3cc7d1d17261fd17e7e1fabd3.csv
Sales_61820_17529409_7a3cc7d1d17261fd17e7e1fabd3.csv
But we are not able to read only these two files.
Here is the code:
df = spark.read.format("csv").schema(structSchema).options(header=False,inferSchema=True,sep='|',pathGlobFilter= "Sales_\d{5} _ \d{8}_[a-z0-9]+.csv$").load("wasbs://abc#xxxxx.blob.core.windows.net/abc/2022/02/11/")
Regards,
Rajib
Glob is not a standard regular expression syntax; there are differences between the two.
For example, glob has no way to say how many times something should repeat, so a count such as {5} does not work.
Back to the question: here is a crude workaround that spells out every digit. I'm hoping someone can offer a cleaner solution.
pathGlobFilter="Sales_[0-9][0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[a-z0-9]*.csv"
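The long pattern can be built instead of typed, and checked locally before it goes into the Spark job. The sketch below uses Python's `fnmatch` module, which is only a stand-in: it is an assumption that it agrees with the glob implementation behind pathGlobFilter for the constructs used here (character classes and `*`). The helper name `build_sales_glob` is made up for the example.

```python
from fnmatch import fnmatchcase


def build_sales_glob():
    """Build the glob by repeating [0-9], since glob has no {n} counts."""
    digit = "[0-9]"
    return "Sales_" + digit * 5 + "_" + digit * 8 + "_[a-z0-9]*.csv"


FILES = [
    "Sales_51820_14529409_T_7a3cc7d1d17261fd17e7e1fabd3.csv",
    "Sales_51820_14529409_7a3cc7d1d17261fd17e7e1fabd3.csv",
    "Sales_61820_17529409_7a3cc7d1d17261fd17e7e1fabd3.csv",
    "Sales_61820_17529409_T_7a3cc7d1d17261fd17e7e1fabd3.csv",
]

if __name__ == "__main__":
    pattern = build_sales_glob()
    # fnmatchcase is case-sensitive, so the uppercase "T" fails [a-z0-9].
    for name in FILES:
        print(name, fnmatchcase(name, pattern))
```

The "T" files are excluded because the character right after the second underscore must be a lowercase letter or a digit.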

Regexpression for getting a file

I have to get a file through PDI based on its filename. I want to select the file whose name ends with the pattern eligible_for_push. The file can be .txt or .csv.
Please Help
Thanks
There are two parts to your query:
1. Finding all files ending with "eligible_for_push":
You cannot use regex to find this sort of pattern (at least, not that I am aware of). As an alternative, do the following:
Get all the files in the path using the "Get File Names" step. Use a Modified Java Script Value step to find the files ending with the above pattern. Check the JS file below.
2. Files can be ".txt" or ".csv":
You can use the regex/wildcard below to choose between .txt and .csv:
.*\.txt|.*\.csv
Note: Use this once you have filtered out the files ending with "eligible_for_push". The JS above ignores all other file patterns. After that, use the second step to pick out the .txt or .csv files.
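For what it's worth, both conditions can also be written as a single regular expression, at least in Python's `re` module. The sketch below is only that: the pattern and the sample file names are assumptions for illustration, and whether the same expression behaves identically in the PDI wildcard field has not been verified here.

```python
import re

# Name must end with "eligible_for_push" followed by a .txt or .csv extension.
ELIGIBLE = re.compile(r".*eligible_for_push\.(txt|csv)")


def is_eligible(filename):
    """True when the whole file name fits the pattern."""
    return ELIGIBLE.fullmatch(filename) is not None


if __name__ == "__main__":
    # Made-up file names for the demonstration.
    for name in ["orders_eligible_for_push.txt", "eligible_for_push_old.csv"]:
        print(name, is_eligible(name))
```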
Hope it helps :)

Mod-Rewrite to variable ending file

I'm trying to get apache to serve any request for /uploaded/2 with the first file that starts with 2 in a certain directory (say /foo/bar/).
Basically, If I have directory /foo/bar with contents:
1-filenameclutter.wav
2-clutterinthefilename.mp3
3-someweirdtext.jpg
And a web browser makes a request for /uploaded/1, apache would return 1-filenameclutter.wav; a request for /uploaded/2 would return 2-clutterinthefilename.mp3; etc. (all files with the right mime-type).
As far as I can see, ModRewrite can only go from a source with extraneous data to a simplified file on the file system, not the other way around.
Do you guys know any way to do this, with ModRewrite or using apache in another way (no PHP)?
EDIT:
Two things to point out: 1) I'm not concerned with duplicate files starting with the same id. These files correspond to an object in a database, which has a primary key, id. 2) The reason I'm doing this is that I won't know exactly what the extension of the file is, but I do know the id, so when I form these names I just append the original filename to {{id}}- (Don't worry, I replace all ".."'s with "~").
mod_rewrite can check whether a single specific file exists, but you can't search a directory for a file pattern. Note that what you are suggesting would scale terribly, because the system would have to search through all the files to find the one you are looking for. Since you already have the file in the database, why don't you just name the file with the id and keep the real filename in the database? In that case, /uploaded/2 would return the file at that location. You don't even need mod_rewrite.
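To make the scaling point concrete, here is a small Python sketch (not an Apache configuration) contrasting the two lookups. The function names are hypothetical, and the demo directory is a temporary stand-in for /foo/bar.

```python
import os
import tempfile


def find_by_prefix(directory, file_id):
    """Prefix search: list the directory and scan entries for '<id>-'."""
    prefix = f"{file_id}-"
    for name in sorted(os.listdir(directory)):
        if name.startswith(prefix):
            return os.path.join(directory, name)
    return None


def path_by_id(directory, file_id):
    """Direct lookup: the file is stored under its id, so no search is needed."""
    return os.path.join(directory, str(file_id))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        for name in ["1-filenameclutter.wav", "2-clutterinthefilename.mp3"]:
            open(os.path.join(folder, name), "w").close()
        print(find_by_prefix(folder, 2))  # has to list the whole directory
        print(path_by_id(folder, 2))      # path is computed, nothing is listed
```

The first function's cost grows with the number of files in the directory; the second only builds a path, which is why storing the file under its id avoids the search.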

Parse M3U file locations to fully qualified paths

I would like to parse the file location information in an M3U playlist into fully qualified paths. The possible formats in M3U files seem to be:
c:\mydir\songs\tune.mp3
\songs\tune.mp3
..\songs\tune.mp3
For the first example, just leave it alone. For the second add the directory that the playlist resides in so it would become c:\playlists\songs\tune.mp3 and the same for the third case so it would also become: c:\playlists\songs\tune.mp3.
I'm using vb under VS2008 and I can't find a way to recognise each of the potential location formats in the M3U file. System.IO.Path offers no solution that I can find. I've searched extensively for terms like "convert relative path to absolute" but no luck.
Any advice appreciated.
Thanks.
Write a batch script that reads the m3u file line by line, checks each line for ":" and for "..", and edits the string as needed. You can then write the converted strings to another file.
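The same line-by-line idea can be sketched in Python (rather than batch or VB) using the standard `ntpath` module, which applies Windows path rules on any platform. The function names and the playlist directory in the demo are assumptions. Note that a leading backslash is treated as relative to the playlist directory, as the question asks, and that "..\" resolves against the parent of the playlist directory.

```python
import ntpath


def resolve_entry(entry, playlist_dir):
    """Turn one M3U location into a fully qualified Windows path."""
    entry = entry.strip()
    drive, _ = ntpath.splitdrive(entry)
    if drive:
        return entry  # already has a drive (for example c:), leave it alone
    # Drop leading separators so "\songs\..." is joined under playlist_dir.
    relative = entry.lstrip("\\/")
    return ntpath.normpath(ntpath.join(playlist_dir, relative))


def resolve_playlist(lines, playlist_dir):
    """Resolve every location line; keep blank lines and # directives as-is."""
    resolved = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#"):
            resolved.append(text)
        else:
            resolved.append(resolve_entry(text, playlist_dir))
    return resolved


if __name__ == "__main__":
    # Hypothetical playlist location for the demonstration.
    for item in resolve_playlist(["#EXTM3U", r"\songs\tune.mp3"], r"c:\playlists"):
        print(item)
```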

How to find any "txt" file at particular location in system?

I have a robot that finds a file with a given name at a particular location in the system, but now I want to find all the text files at that location. I tried using "*.txt", but it didn't work. Is there a way to do that?
file.exists ♥environment⟦USERPROFILE⟧\Documents\t.txt errormessage ‴Sorry, I could not find a file‴
dialog ‴File exists‴
You can use the directory command. The pattern argument lets you filter for files of a particular extension.
directory path ♥environment⟦USERPROFILE⟧\Desktop pattern *.txt result ♥files
dialog ♥files⟦count⟧
The above code should let you know how many files of the given extension exist in the given directory.
You could take values from the returned list and use them with the file.exists command.