What is the wildcard for the File connector file path field in Anypoint Studio and Mule?

I am using Anypoint Studio 7 and Mule 4.1.
A product file in CSV format, with a filename that includes the current timestamp, will be added to a directory daily and needs to be processed. To do this we are creating a Mule flow using the File connector, and we want to configure the file path field to read only CSV files regardless of the name.
At the moment, the only way I can get it to work is by specifying the full filename in the file path field, which looks like this:
C:/Workspace/product-files-v1/src/main/resources/input/products-2018112011001111.csv
whereas I would like to specify some kind of wildcard in the file path, similar to this:
C:/Workspace/product-files-v1/src/main/resources/input/products-*.csv
but the above does not work.
What is the correct wildcard syntax? And is there a way to specify a relative file path instead of an absolute one? When I try to use a relative file path I get an error too.
Error message in logs:
********************************************************************************
Message : Illegal char <*> at index 108: C:/Workspace/product-files-v1/src/main/resources/input/products-*.csv.
Element : product-files-v1/processors/1 # product-files-v1:product-files-v1.xml:16 (Read File)
Element XML : <file:read doc:name="Read File" doc:id="fdbbf477-e831-4e7c-827c-71efd1d2e538" config-ref="File_Config" path="C:/Workspace/product-files-v1/src/main/resources/input/products-*.csv" outputMimeType="application/csv" outputEncoding="UTF-8"></file:read>
Error type : MULE:UNKNOWN
--------------------------------------------------------------------------------
Root Exception stack trace:
java.nio.file.InvalidPathException: Illegal char <*> at index 108: C:/Workspace/product-files-v1/src/main/resources/input/products-*.csv
Thanks for any help

I am assuming you need to use a <file:matcher> when you want to filter or read certain types of files from a directory. Note that in Mule 4 the matcher attributes are camelCase (filenamePattern, pathPattern), and that file:read takes a single concrete path, so the matcher belongs on an operation that works against a directory, such as file:list or the On New File listener. An example would be:
<file:matcher
    name="docMatcher"
    filenamePattern="a?*.{htm,html,pdf}" />
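Adapting that to the question, a minimal sketch of a polling flow might look like the following (the matcher name, flow name, and scheduler frequency are illustrative assumptions, not taken from the original post):
<file:matcher name="csvMatcher" filenamePattern="products-*.csv" />
<flow name="process-product-files-flow">
    <!-- Polls the directory and only emits files the matcher accepts -->
    <file:listener config-ref="File_Config"
                   directory="src/main/resources/input"
                   matcher="csvMatcher"
                   autoDelete="true"
                   outputMimeType="application/csv">
        <scheduling-strategy>
            <fixed-frequency frequency="60000" />
        </scheduling-strategy>
    </file:listener>
    <!-- process the CSV payload here -->
</flow>
Setting workingDir on the file:connection and keeping the listener's directory relative to it should also address the relative-path part of the question.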

Related

Azure Synapse Lookup UserErrorFileNotFound with wildcard path

I am facing an odd issue where my lookup is returning a FileNotFound error when I use a wildcard path. If I specify an exact file path, the lookup runs without error. However, if I replace the filename with a *, I get a FileNotFound error.
The file is Data_643.json, located in my Azure Data Lake Storage Gen2, under the labournavigatorfile system. The exact file path is:
labournavigatorfile/raw_data/Scraped/HeadHunter/Saudi_Arabia/Data_643.json.
If I put this exact path into the Integration dataset configuration, the pipeline runs without issue. However, as soon as I replace the 'Data_643.json' with a '*', the pipeline crashes with a filenotfound error.
What am I doing wrong? Many thanks for any support. This must be something very simple that I am missing.
Exact path works:
Wildcard path throws error:
I have 3 files in my container as file1.json, file2.json, file3.json as shown below:
The following is how I configured my dataset to read using a wildcard, with the same configuration as in the image provided in the question.
When I used this in the lookup, I got the same error:
To overcome this, go to your lookup activity. When you want to use wildcards to read a file or files, check the Wildcard file path option. Then specify the folder structure and use wildcards where required. The following is an image for reference.
The following is the debug output when I run the pipeline (each of my files had 10 rows):
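For anyone configuring the pipeline JSON directly rather than through the UI, the wildcard settings are expected to sit in the activity's source store settings, roughly like this (a sketch assuming an ADLS Gen2 JSON dataset; the folder path is the one from the question):
"source": {
    "type": "JsonSource",
    "storeSettings": {
        "type": "AzureBlobFSReadSettings",
        "recursive": true,
        "wildcardFolderPath": "raw_data/Scraped/HeadHunter/Saudi_Arabia",
        "wildcardFileName": "*.json"
    }
}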

How to read a mounted dbc file in Databricks?

I am trying to read a dbc file in Databricks (mounted from an S3 bucket).
The file path is:
file_location="dbfs:/mnt/airbnb-dataset-ml/dataset/airbnb.dbc"
How can I read this file using Spark?
I tried the code below:
df=spark.read.parquet(file_location)
But it generates an error:
AnalysisException: Unable to infer schema for Parquet. It must be specified manually.
Thanks for any help!
I tried the code below: df=spark.read.parquet(file_location) But it generates an error:
You are using spark.read.parquet but want to read a dbc file. It won't work this way.
Don't use parquet; use load instead. Give the file path with the file name (without the .dbc extension) in the path parameter and dbc in the format parameter.
Try below code:
df=spark.read.load(path='<file_path_with_filename>', format='dbc')
Eg: df=spark.read.load(path='/mnt/airbnb-dataset-ml/dataset/airbnb', format='dbc')

"Get XML Data" step of pentaho is not able to read same xml file sometimes

I am using the Pentaho Kettle tool for an ETL job. In the job, one of the steps (Get XML Data) is sometimes not able to read/parse an XML file. Sometimes the same XML file didn't throw any exception and sometimes it did. The errors are listed below:
1) Error on line 1 of document file:///D:/softwares/pdi-ce-6.0.1.0-386/data-integration/UTF-8 : The element type "Confidence" must be terminated by the matching end-tag "</Confidence>".
2) org.dom4j.DocumentException: Error on line -1 of document : Premature end of file. Nested exception: Premature end of file.
However, I don't find any issue in the XML file. Could anyone help on this topic?
I didn't find the root cause but got a solution. The XML file being parsed by the step was inside a zip file. Before parsing the XML file, a Java step was unzipping the zip file. Instead of unzipping it, I parsed the XML file directly inside the zip. That resolved the issue and no error has been reported again.
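For reference, Kettle resolves file names through Apache VFS, so a zipped XML file can usually be addressed directly in the step's "File or directory" field with a VFS URL of this shape (the paths here are hypothetical):
zip:file:///D:/data/products.zip!/products.xml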

SAP DS: Reading an input XML file results in an error

I am using SAP DATA Services v. 4.2.
I am trying to read an XML file as input.
I created a new XML Schema starting from an .xsd file.
When I launch the job I get this error:
2076818752FIL-0522267/25/2017 2:56:35 PM|Data flow DF_FE_XXXX
2076818752FIL-0522267/25/2017 2:56:35 PM<XML file reader->READ MESSAGE XX_INPUT_FILE OUTPUT(XX_INPUT_FILE)> cannot find file location object <%1> in repository.
24736 20092 RUN-050304 7/26/2017 9:18:39 AM Function call <raise_exception ( Error 52226 gestito in Error_handling ) > failed, due to error <50316>
What am I doing wrong?
Thanks
The problem is in the way you identify the file location in the Data File(s) section of your format: BODS thinks that you are providing a File Location object and cannot find one with that name in the repository.
See the documentation on "File Locations" for more information.

USQL Job failing due to exceeding the path length limit

I am running my jobs locally using the Local SDK. However, I get the following error message:
Error : 'System.IO.PathTooLongException: The specified path, file name, or both are too long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters.
One of my colleagues was able to track down the error to the .ss file in the catalog folder inside DataRoot by running the project in a new directory in C:\. The path for the .ss file is
C:\HelloWorld\Main\Source\Data\Insights\NewProject\NewProject\USQLJobsForTesting.Tests\bin\Debug\DataRoot\_catalog_\database\d92bfaa5-dc7f-4131-abdc-22c50eb0d8c0\schema\f6cf4417-e2d8-4769-b633-4fb5dddcb066\table\aa136daf-9e86-4650-9cc3-119d607fb3b0\31a18033-099e-4c2a-aae3-75cf099b0fb1.ss
which exceeds the allowed limit of 260 characters. I cannot reduce the length of my project path because my organization follows a certain working directory format.
Is there any possible solution for this problem?
Try using subst in CMD to work around this problem by mapping a drive letter to the data root you want to use.
subst X: C:\PathToYourDataRoot
Then, in ADL Tools for Visual Studio, set the DataRoot to X:
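When the mapping is no longer needed, subst's standard /D switch removes it:
subst X: /D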