Filter files from a directory in VB.NET

Straight to the question: I have Word documents with the extension .doc, and each one has a corresponding sample file whose name starts with .sample.
I would like to load only the Word documents.
I found the approach shown below for loading the files, but it loads all of them.
Can anyone tell me how to filter these files while loading them?
This is what I'm trying to do:
Dim files = Array.FindAll(Directory.GetFiles(mydir), Function(x) (Not x.StartsWith(".sample")))
My directory contains the files described above.

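The way you use it, all the files are retrieved first (paying the full cost of listing the directory) and only then filtered.
As stated in this article, you can pass a search pattern directly to the file retrieval call.
I suppose you could do something like this:
Dim files = Directory.GetFiles(mydir, "*.doc")
Note that the wildcard goes before the extension; a pattern such as ".doc*" would only match names that begin with ".doc". This narrows the result to .doc files, but a sample file whose name also ends in .doc would still be returned, so a check on the file name is still needed for those.
If you gave some example filenames, perhaps I could suggest the right filter to apply.
Hope I helped!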

The GetFiles method returns filenames with the path that you specified included.
So if your files are in a folder C:\working\, your mydir variable will contain "C:\working\" and all of the results of GetFiles will be something like
"C:\working\.sample_filename.doc"
"C:\working\123797.doc"
So your x.StartsWith is always going to return false, because x always starts with C:\
Try this:
Dim files = Array.FindAll(Directory.GetFiles(mydir), Function(x) (Not x.StartsWith(mydir & ".sample")))
Note that this assumes your mydir variable ends with a \ character. If it does not, add it in the concatenation within the function.

Try this:
Dim files = Array.FindAll(Directory.GetFiles(mydir), Function(x) (Not Path.GetFileName(x).StartsWith(".sample")))
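For comparison, here is a minimal Python sketch of the same idea: strip the directory part first, then test the bare file name. The function name `exclude_samples` and the sample paths are illustrative only; `ntpath` is used so that the Windows-style paths are split the same way on any platform.

```python
import ntpath

def exclude_samples(paths, prefix=".sample"):
    """Keep only the paths whose file name does not start with `prefix`."""
    # ntpath.basename drops the directory, so the test sees ".sample_filename.doc"
    # rather than "C:\working\.sample_filename.doc".
    return [p for p in paths if not ntpath.basename(p).startswith(prefix)]

# Hypothetical paths modelled on the examples in the answer above.
paths = [r"C:\working\.sample_filename.doc", r"C:\working\123797.doc"]
kept = exclude_samples(paths)
print(kept)
```

Testing the full path instead of the file name is exactly what makes the original StartsWith check return false for every file.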

Related

Not able to filter files using pathGlobFilter

We are trying to read files from a directory in Azure Blob Storage based on a pattern. We are using
the pathGlobFilter option to select the files. The directory contains the following files:
Sales_51820_14529409_T_7a3cc7d1d17261fd17e7e1fabd3.csv
Sales_51820_14529409_7a3cc7d1d17261fd17e7e1fabd3.csv
Sales_61820_17529409_7a3cc7d1d17261fd17e7e1fabd3.csv
Sales_61820_17529409_T_7a3cc7d1d17261fd17e7e1fabd3.csv
We need to process only the files that do not have "T" in the file name, that is, only these two files:
Sales_51820_14529409_7a3cc7d1d17261fd17e7e1fabd3.csv
Sales_61820_17529409_7a3cc7d1d17261fd17e7e1fabd3.csv
But we are not able to read only these two files.
Here is the code:
df = spark.read.format("csv").schema(structSchema).options(header=False,inferSchema=True,sep='|',pathGlobFilter= "Sales_\d{5} _ \d{8}_[a-z0-9]+.csv$").load("wasbs://abc#xxxxx.blob.core.windows.net/abc/2022/02/11/")
Regards,
Rajib
A glob is not a regular expression; there are differences between the two.
For example, glob has no repetition quantifiers such as {5}, so each digit has to be written out as its own [0-9] class.
For details, see: here
Back to the question: here is a rather crude approach, and I look forward to someone posting a neater solution.
pathGlobFilter="Sales_[0-9][0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[a-z0-9]*.csv"
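The pattern can be checked locally before running the Spark job. The sketch below uses Python's `fnmatch` module as a stand-in; this is an assumption made for illustration, since Spark evaluates pathGlobFilter with Hadoop's glob rules, which are similar but not identical to `fnmatch`. The file names are the four from the question.

```python
from fnmatch import fnmatchcase

# Same pattern as above, built programmatically to avoid miscounting the classes.
PATTERN = "Sales_" + "[0-9]" * 5 + "_" + "[0-9]" * 8 + "_[a-z0-9]*.csv"

names = [
    "Sales_51820_14529409_T_7a3cc7d1d17261fd17e7e1fabd3.csv",
    "Sales_51820_14529409_7a3cc7d1d17261fd17e7e1fabd3.csv",
    "Sales_61820_17529409_7a3cc7d1d17261fd17e7e1fabd3.csv",
    "Sales_61820_17529409_T_7a3cc7d1d17261fd17e7e1fabd3.csv",
]

# fnmatchcase is case-sensitive on every platform, so the uppercase "T"
# cannot satisfy the lowercase class [a-z0-9].
selected = [name for name in names if fnmatchcase(name, PATTERN)]
for name in selected:
    print(name)
```

The "T" files are rejected because the character right after the second underscore must be a lowercase letter or a digit; the trailing * then matches the rest of the hash.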

How to automate reading files in the folder?

I have a folder on my desktop; let's call it FOLDER_X.
I want to read the names of all the files in FOLDER_X.
Is it possible to do this with G1ANT, and if so, how?
You can also do it with snippets, e.g. C#:
♥files = ⟦list⟧⊂System.IO.Directory.GetFiles("your path here")⊃
foreach ♥file in ♥files
dialog ♥file
end
If you want to remove the path from ♥file, place this line at the beginning of the loop:
text.replace regex ‴^.*\\‴ text ♥file replace ‴‴ result ♥file
The script below displays the names of all the files in FOLDER_X.
The directory command retrieves all the directories and files in the specified path and stores them as a list in the ♥result variable.
The foreach loop iterates over the found elements, which are G1ANT path structures. They expose several indexes; isfile and name are the useful ones here.
directory path ‴♥environment⟦USERPROFILE⟧\Desktop\FOLDER_X‴
foreach ♥element in ♥result
dialog ♥element⟦name⟧ if ⊂♥element⟦isfile⟧⊃
end foreach
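Outside G1ANT, the same logic (list a folder, keep only the entries that are files, show their names without the path) can be sketched in Python. This is only an illustration of the steps described above; `list_file_names` is a made-up helper, and a temporary directory stands in for FOLDER_X.

```python
import tempfile
from pathlib import Path

def list_file_names(folder):
    """Return the names (without path) of the files directly inside `folder`."""
    # is_file() plays the role of the isfile index, .name the role of the name index.
    return sorted(entry.name for entry in Path(folder).iterdir() if entry.is_file())

# Demo on a throwaway directory standing in for FOLDER_X.
with tempfile.TemporaryDirectory() as folder_x:
    Path(folder_x, "a.txt").write_text("demo")
    Path(folder_x, "b.csv").write_text("demo")
    Path(folder_x, "subfolder").mkdir()  # directories are skipped
    names = list_file_names(folder_x)

print(names)
```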

VB.net rename each file in a directory

I'd like to rename all the files in one specific directory so that they all get the same extension. I tried using a For Each loop:
For Each s As String In IO.Directory.GetFiles(Environ("PROGRAMFILES(x86)"), "*", IO.SearchOption.AllDirectories)
    Try
        My.Computer.FileSystem.RenameFile(s, s & ".new")
    Catch ex As Exception
        ' Errors are silently ignored here.
    End Try
Next
The intent is that each name s (a String) becomes s followed by the extension ".new".
However, that didn't work.
If you read the documentation for the RenameFile method you are calling, which is the first thing to do when a call doesn't work, you will see that the first argument requires the full path of the file, while the second argument requires just the new name of the file. That means that you need this:
My.Computer.FileSystem.RenameFile(s, My.Computer.FileSystem.GetName(s) & ".new")
The File.Move method requires full paths for both arguments because it supports both renaming within the same folder and moving to a different folder. RenameFile is different: it only supports renaming within the same folder, so specifying the path twice would be pointless, and allowing a different path to be specified would cause problems.
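The same distinction shows up in other languages. As a rough Python sketch (not a translation of the VB.NET API), `Path.rename` behaves like File.Move in that it needs the complete target path, so the new name is combined with the original folder via `with_name`. The helper `append_extension` and the temporary directory are illustrative assumptions.

```python
import tempfile
from pathlib import Path

def append_extension(directory, extension=".new"):
    """Rename every file directly inside `directory` by appending `extension`."""
    renamed = []
    # sorted() materialises the listing before any file is renamed.
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            # with_name keeps the folder and swaps only the final component.
            target = path.with_name(path.name + extension)
            path.rename(target)
            renamed.append(target.name)
    return renamed

# Demo on a throwaway directory rather than a real program folder.
with tempfile.TemporaryDirectory() as demo_dir:
    Path(demo_dir, "a.txt").write_text("demo")
    Path(demo_dir, "b.doc").write_text("demo")
    result = append_extension(demo_dir)
    remaining = sorted(p.name for p in Path(demo_dir).iterdir())

print(result)
```

Unlike the loop in the question, this sketch does not swallow exceptions, so a failed rename is reported instead of passing silently.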

Regular expression for getting a file

I have to get a file through PDI based on its filename. I want to select the file whose name ends with the pattern eligible_for_push. The file can be .txt or .csv.
Please help.
Thanks
There are two parts to your query:
1. Finding all files ending with "eligible_for_push":
You cannot use regex to find this sort of pattern (at least not that I am aware of). As an alternative, do the following:
List all the files in the path using the "Get File Names" step, then use a "Modified Java Script Value" step to pick out the files ending with the above pattern. Check the JS file below.
2. Files can be ".txt" or ".csv":
You can use the regex/wildcard below to choose between .txt and .csv:
.*\.txt|.*\.csv
Note: use this pattern once you have selected the files ending with "eligible_for_push". The JS above ignores the file extension; the second step then narrows the result down to the .txt or .csv files.
Hope it helps :)
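For reference, in a general-purpose regex engine both conditions can be expressed in a single pattern. The sketch below uses Python's `re` module; whether the PDI step accepts the same pattern, and whether it matches against the whole file name, are assumptions that would need to be verified. The file names are made up for the demonstration.

```python
import re

# Name must end with "eligible_for_push", followed by a .txt or .csv extension.
PATTERN = re.compile(r".*eligible_for_push\.(txt|csv)")

def is_wanted(filename):
    """True if the whole file name matches the pattern."""
    return PATTERN.fullmatch(filename) is not None

# Hypothetical file names.
for name in [
    "orders_eligible_for_push.csv",
    "orders_eligible_for_push.txt",
    "eligible_for_push_orders.txt",
    "orders_eligible_for_push.xml",
]:
    print(name, is_wanted(name))
```

`fullmatch` is used so that the extension must come directly after "eligible_for_push" and nothing may follow it.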

BigQuery loading batch folders error

I'm trying to load the files from a group of folders in one go. When
I set
sourceURI = 'gs://ybbi/bi_landing_zone/files_to_load/app/reports/app_network_analytics_report/201409011*'
All the folders that I want to load start with 20140911,
but I get the error:
ERROR: Invalid path: gs://ybbi/bi_landing_zone/files_to_load/apn/reports/appnexus_network_analytics_report/20140901191111_3bab8ec0_092a_43de_a157_db35d1555ea0/
20140901191111_3bab8ec0_092a_43de_a157_db35d1555ea0 is one of these folders (I don't know why it prints the full name of this specific folder).
With some other folder trees it works, but with this specific folder tree it returns the same error.
I know that Cloud Storage doesn't have real folders and that they are just part of the object name, but you understand what I mean.
Is it a bug?
Without more information, it looks like you have an object called gs://ybbi/bi_landing_zone/files_to_load/apn/reports/appnexus_network_analytics_report/20140901191111_3bab8ec0_092a_43de_a157_db35d1555ea0/ that is not a CSV/JSON file. Some tools create these dummy objects in order to simulate directories. BigQuery requires all objects that match the input glob path to be importable files.
One solution would be to change the glob path so that it matches a narrower set of files. You can pass multiple paths if that makes things easier. For example, you could pass
gs://ybbi/bi_landing_zone/files_to_load/apn/reports/appnexus_network_analytics_report/20140901191111_3bab8ec0_092a_43de_a157_db35d1555ea0/*
and
gs://ybbi/bi_landing_zone/files_to_load/apn/reports/appnexus_network_analytics_report/20140901191111_some_other_path/*
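To see which objects would trip up a wildcard, it can help to separate the directory placeholders from the real data files in a listing. The sketch below works on a plain list of object names; how that list is obtained is left out, and the names, the helper `importable_objects` and the extension list are all hypothetical.

```python
def importable_objects(object_names, extensions=(".csv", ".json")):
    """Drop directory placeholders (names ending in '/') and non-data files."""
    return [
        name
        for name in object_names
        if not name.endswith("/") and name.endswith(extensions)
    ]

# Hypothetical listing: one placeholder object plus two data files.
listing = [
    "reports/20140901191111_example/",
    "reports/20140901191111_example/part-0.csv",
    "reports/20140901191111_example/part-1.csv",
]
data_files = importable_objects(listing)
print(data_files)
```

A wildcard placed after the trailing slash, as in the two paths above, has the same effect: it matches the files inside each folder without matching the placeholder object itself.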