Talend get current filename - filenames

I want to load Excel files into a MySQL database and check that they do not already exist. My problem is that I cannot extract the name of the current file.
For example, I have the following files: A.xlsx, B.xlsx and C.xls.
It always returns B.xlsx.

I think the issue you have is that your "RunIf" link is before the iterator, and therefore it's not being triggered at the right time.
The image below shows a simplified version, where I list the rows in the spreadsheets and then the file names. If I connect the second subjob to the tFileList_1 component, as you did in the question, I only get the logs from tLogRow_1 and then the last filename from tLogRow_2.
With the link placed after the iterator, as shown, I get each spreadsheet's contents followed by its filename:
Col1|Col2|Col3
A|B|C
D|E|F
A.xlsx
Col1|Col2|Col3
A|B|C
D|E|F
B.xlsx
Col1|Col2|Col3
A|B|C
D|E|F
C.xlsx
I am assuming that you have filtered out the duplicated files in the tMap component, so if you link your second subjob from the disponsibilite_3d component, I think you will get the result you are looking for.
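As a loose analogy only (this is plain Python, not Talend code, and the file list and function names are made up for illustration), the difference between triggering inside the iteration and triggering after it can be sketched like this:

```python
# Hypothetical stand-in for the files that tFileList iterates over.
FILES = ["A.xlsx", "B.xlsx", "C.xlsx"]


def trigger_inside_iteration(files):
    """Read the filename while the iterator is still on that file."""
    seen = []
    for current_file in files:
        # ... the rows of current_file would be loaded here ...
        seen.append(current_file)
    return seen


def trigger_after_iteration(files):
    """Read the filename once, after the iterator has finished."""
    current_file = None
    for current_file in files:
        pass  # ... the rows of current_file would be loaded here ...
    # Only the value left over from the final iteration is visible here.
    return [current_file]


print(trigger_inside_iteration(FILES))
print(trigger_after_iteration(FILES))
```

The first function sees every filename; the second sees only the one left over from the last iteration, which mirrors the single filename logged when the second subjob hangs off tFileList_1.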

Get File Structure from Get Metadata in ADF

I want to get the column names for a parquet file. I have a Get Metadata activity in my pipeline, and it uses a parquet dataset with only the root folder provided. Because only the folder is provided, ADF does not let me get the file structure that contains the column names. The file name is not provided because it can change. Can anyone provide some advice on how to approach this?
You will need 2 Get Metadata activities and a ForEach activity to get the file structure if your file name is not the same every time.
Source dataset:
Parameterize the file name as the name changes frequently.
Preview of source data:
Get Metadata1:
In the first Get Metadata activity, get the file name dynamically.
If your file names follow a specific pattern, you can add an expression for it in the file name. Otherwise, use an asterisk (*) if there is no specific pattern or if more than one file in the folder needs to be processed.
Set the field list to child items to get the files in the folder.
Output of Get Metadata1: Get the file name from the folder.
ForEach activity:
Using the ForEach activity, you can get the item's name listed inside the Get Metadata activity output array.
Get Metadata2:
Add a Get Metadata activity inside the ForEach activity to get the file structure or column list of the current file in the folder. It loops once per item in the folder (1 or more).
Output of Get Metadata2:
You can parameterize the file name in the dataset, or use one Get Metadata activity to get the list of files in the folder and then a second Get Metadata activity to get the list of columns for each of those files.

Azure Data Factory check file name dynamically

I'm checking daily whether certain files exist in a folder on-prem. The files have a specific format, but the first few letters indicate a specific job. For example, xyz-yyyyMMdd.csv, or abc-yyMMdd.csv etc.
I would like to use a Switch activity to see whether the file for each job has arrived or an alert should be raised. How can I dynamically let the Switch activity read the 'xyz' portion, knowing that the other part of the file name is dynamic?
Thank you
If the prefix is always three letters, as in your examples, you can try this expression:
@substring(item().name,0,3)
If not, you can try this:
@split(item().name,'-')[0]
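To see how the two approaches differ, here is a minimal Python sketch of the same string logic. It is not ADF code; the function names and sample file names are hypothetical and only mirror the patterns in the question.

```python
def prefix_fixed(name: str, length: int = 3) -> str:
    """Take a fixed number of leading characters, like substring(name, 0, 3)."""
    return name[:length]


def prefix_before_dash(name: str) -> str:
    """Take everything before the first '-', like split(name, '-')[0]."""
    return name.split("-")[0]


# Hypothetical file names following the patterns described in the question.
samples = ["xyz-20240115.csv", "abc-240115.csv", "report-20240115.csv"]

for sample in samples:
    print(sample, prefix_fixed(sample), prefix_before_dash(sample))
```

For a name such as report-20240115.csv the fixed-length version yields "rep" while the split version yields "report", so the split form is the safer choice when prefixes vary in length.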
Here is my test:

Is there a way to list the directories using PySpark in a notebook?

I'm trying to see every file in a certain directory, but since each file in the directory is very large, I can't use sc.wholeTextFiles or sc.textFile. I wanted to just get the filenames, and then pull a file if needed in a different cell. I can access the files just fine using Cyberduck, and it shows the names there.
Ex: I have the link for one set of data at "name:///mainfolder/date/sectionsofdate/indiviual_files.gz", and it works. But I want to see the names of the files in "/mainfolder/date" and in "/mainfolder/date/sectionsofdate" without having to load them all in via sc.textFile or sc.wholeTextFiles. Both of those functions work, so I know my keys are correct, but it takes too long for the files to be loaded.
Since the list of files can be retrieved by a single node, you can just list the files in the directory. Look at this response.
wholeTextFiles returns tuples of (path, content), but I don't know whether the content is loaded lazily, so that you could take only the first element of each tuple without reading the files.

SSRS - How to show external image based on URL inside column

I am trying to show images for products inside a basic report. The image needs to be dynamic, meaning the image should change based on the SKU value.
Right now I am inserting an image into a table, setting it to external, and I've tried:
=Fields!URL.Value
=http://externalwebservername/sku= & Fields!SKU.Value
="http://externalwebservername/sku=" & Fields!SKU.Value
I do not get any images in my table.
My stored proc has all the data, including a URL with the image I want to show. Here is a sample of what the URL looks like:
http://externalwebservername/sku=123456
If I enter the URL in the field without "=" it will show that ONE image only.
How should I set up the expression to properly show the external image based on a dynamic URL? Running SQL 2016
Alan's answer should work, but in our environment we have strict proxy/firewall rules, so the two servers could not contact each other.
Instead we are navigating to the file stored on our storage system.
We altered the URL column to point to file path in the stored procedure. Insert image, set Source to External and Value set to [URL].
URL= file://server\imagepath.jpg
As long as the account executing the report has permissions to access the URLs then your 3rd expression should have worked.
I put together a simple example as follows.
I created a new blank report then added a Data Source. It doesn't matter where this points, we won't use it directly.
Then I created a dataset (Dataset1) with the following SQL to give me list of image names.
SELECT '350x120' AS suffix
UNION SELECT '200x100'
UNION SELECT '500x500'
Actually, these are just parameters for the website http://placehold.it/ which will generate images based on the size you request, but that's not relevant for this exercise.
We'll be showing three images from the following URLs
http://placehold.it/350x120
http://placehold.it/200x100
http://placehold.it/500x500
Next, create a table, I used 3 columns to give me more testing options. Set the DataSetName to DataSet1 if it isn't already.
In the first column the expression is just =Fields!suffix.Value
In the second column I added an image, set its source property to External and the Value to ="http://placehold.it/" & Fields!suffix.Value
I then added a 3rd column with the same expression as the image Value so I could see what was being used as the image URL. I also added an action that goes to the same URL, just to check the URL did not have any unprintable characters in it that might cause a problem.
The basic report design looks like this.
The rendered result looks like this.

Find correct table number (Selenium IDE)

I have this command here in Selenium IDE to store a text in a variable:
Command: storeText
Target: //div[@id='content-main']/form[2]/table[5]/tbody/tr[td[1][contains(text(), 'Purchase')]]/td[2]
Value: variableName
As you can see, this command looks in the first column of the 5th table, searches for the row that says "Purchase", and stores the string content from the second column.
The problem is this: table[5]
Sometimes this table is not the 5th table. So I'd like to know if there is some way to search for the string I'm looking for without the table number, so that the command would first find the right table and then find the string.
To make it easier, here is the HTML source of the page I'm doing my tests:
http://txtup.co/e9KYB
I accept suggestions to maybe do it in another way, what I need is to store the Purchase Type value that is in this page.
Just change table[5] to table. The full XPath will then be:
//div[@id='content-main']/form[2]/table/tbody/tr[td[1][contains(text(), 'Purchase')]]/td[2]
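The effect of dropping the index can be tried outside Selenium IDE with a small Python sketch. Everything here is an assumption for illustration: the markup is a made-up, well-formed stand-in for the real page (which is not reproduced here), and because the standard library's ElementTree does not support contains(), the "Purchase" check is done in Python rather than in the XPath itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page under test.
PAGE = """
<html><body>
<div id="content-main">
  <form><table><tbody><tr><td>Ignored</td><td>x</td></tr></tbody></table></form>
  <form>
    <table><tbody><tr><td>Customer</td><td>ACME</td></tr></tbody></table>
    <table><tbody><tr><td>Purchase Type</td><td>Online</td></tr></tbody></table>
  </form>
</div>
</body></html>
"""


def find_purchase_value(markup):
    """Return the second cell of the first row whose first cell mentions 'Purchase'."""
    root = ET.fromstring(markup.strip())
    # No index on <table>: rows from every table in the second form are candidates.
    rows = root.findall(".//div[@id='content-main']/form[2]/table/tbody/tr")
    for row in rows:
        cells = row.findall("td")
        if len(cells) >= 2 and "Purchase" in (cells[0].text or ""):
            return cells[1].text
    return None


print(find_purchase_value(PAGE))
```

Because the table step has no position, the row is found whichever table it happens to sit in. Real pages are often not well-formed XML, so this sketch only demonstrates the matching logic, not a way to parse the actual site.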