Extract the earliest LastModified date from Azure Data Lake using Azure Data Factory

Is there any method available in Azure Data Factory to get the earliest LastModified date from Azure Data Lake? The filename can be anything; I need the LastModified date of the very first file uploaded to the data lake.
For example:
+----------+------------------+
| Filename | LastModifiedDate |
+----------+------------------+
| File1    | 2021-10-01       |
| File2    | 2021-10-02       |
| File1    | 2021-10-03       |
+----------+------------------+
Expected output: 2021-10-01
Any help would be appreciated.
Regards,
Sandeep

You could go through each folder in the data lake with the Get Metadata activity, as is done in this archived question on the MSFT forum.
Depending on the number of folders and files, this is a rather brute-force way of retrieving the earliest date of any file in your data lake.
I found it easier to use PowerShell:
$storageAccount = 'storageAccountName'
$resourceGroupName = 'resourceGroupName'
$containername = 'containerName'   # container to scan

# Grab the first account key and build a storage context
$storageAccountKey = (Get-AzStorageAccountKey -ResourceGroupName $resourceGroupName -Name $storageAccount | Select-Object -Property Value -First 1).Value
$context = New-AzStorageContext -StorageAccountName $storageAccount -StorageAccountKey $storageAccountKey

# List every blob in the container, then keep the one with the earliest LastModified
$allblobs = Get-AzStorageBlob -Container $containername -Context $context
$allblobs | Sort-Object -Property LastModified | Select-Object -Property Name,LastModified -First 1
This PowerShell script returns the Name and LastModified datetime of the file with the earliest LastModified value. However, running a PowerShell script directly from ADF is not so straightforward. Here is an article by Bob Blackburn on how to achieve this.
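The selection itself is just a minimum over (Name, LastModified) pairs. Here is a minimal Python sketch of that step, using made-up stand-in data rather than a real blob listing:

```python
from datetime import date

# Hypothetical stand-in for the blob listing returned by the storage account;
# each entry is (filename, last_modified).
blobs = [
    ("File1", date(2021, 10, 1)),
    ("File2", date(2021, 10, 2)),
    ("File1", date(2021, 10, 3)),
]

# The very first upload is the entry with the minimum LastModified value.
earliest = min(blobs, key=lambda b: b[1])
print(earliest[1])  # 2021-10-01
```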


Snowflake: Error loading a LOCAL file that has some ARRAY fields

I am getting an Error when loading a LOCAL file that has some ARRAY fields.
I do not get this for source files where all fields are STRINGS.
CREATED TABLE:
CREATE TABLE "TESTDB"."LAYER2"."CREW" ("TCONST" STRING, "DIRECTORS" ARRAY, "WRITERS" ARRAY);
DATA FILE:
tconst directors writers
tt0000001 nm0005690 \N
tt0000002 nm0721526 \N
tt0000003 nm0721526 \N
tt0000004 nm0721526 \N
tt0000005 nm0005690 \N
tt0000006 nm0617588 nm0617588
tt0000007 nm0374658,nm0005690 \N
tt0000008 nm0719756 nm0331003,nm0759866,nm0173952,nm0719756,nm0816458
SQL
PUT file://<file_path>/title.crew_1.tsv @TEST_2/ui1770650179898
COPY INTO "TESTDB"."LAYER2"."TEST_2" FROM @TEST_2/ui1770650179898
FILE_FORMAT = '"TESTDB"."LAYER1"."TSV"' ON_ERROR =
'ABORT_STATEMENT' PURGE = TRUE;
ERROR MESSAGE:
Unable to copy files into table. Error parsing JSON: nm0005690 File
'@TEST_2/ui1770650179898/sample.samplefile.tsv', line 2, character 11
Row 1, column "TEST_2"["DIRECTORS":2] If you would like to continue
loading when an error is encountered, use other values such as
'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more
information on loading options, please run 'info loading_data' in a
SQL client.
I feel it has an easy answer; I have tried troubleshooting various methods with no success and could not find anything on the web to point me in the right direction.
Any assistance would be greatly appreciated.
You need to convert strings to arrays. I assume that the delimiter is the tab character:
create file format csvtab type=csv field_delimiter = '\t' skip_header=1;

COPY INTO "CREW" FROM
(select $1, SPLIT($2,','), SPLIT($3,',') from @mystage)
FILE_FORMAT = csvtab ON_ERROR = 'ABORT_STATEMENT';
select * from "CREW";
+-----------+---------------------------+---------------------------------------------------------------+
| TCONST | DIRECTORS | WRITERS |
+-----------+---------------------------+---------------------------------------------------------------+
| tt0000001 | ["nm0005690"] | |
| tt0000002 | ["nm0721526"] | |
| tt0000003 | ["nm0721526"] | |
| tt0000004 | ["nm0721526"] | |
| tt0000005 | ["nm0005690"] | |
| tt0000006 | ["nm0617588"] | ["nm0617588"] |
| tt0000007 | ["nm0374658","nm0005690"] | |
| tt0000008 | ["nm0719756"] | ["nm0331003","nm0759866","nm0173952","nm0719756","nm0816458"] |
+-----------+---------------------------+---------------------------------------------------------------+
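For reference, the string-to-array conversion that SPLIT performs during the COPY can be sketched outside Snowflake. A hedged Python version, treating the \N marker as a missing value (the sample row is taken from the question's file):

```python
# One raw TSV data row, as in the question's file (tab-delimited).
row = "tt0000008\tnm0719756\tnm0331003,nm0759866,nm0173952,nm0719756,nm0816458"

tconst, directors, writers = row.split("\t")

# Mirror of SPLIT($n, ','): comma-separated string -> array; "\N" means no value.
def to_array(field):
    return None if field == "\\N" else field.split(",")

print(to_array(directors))  # ['nm0719756']
print(to_array(writers))    # ['nm0331003', 'nm0759866', 'nm0173952', 'nm0719756', 'nm0816458']
```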

Statistics chart in Splunk using values from a log

I am new to Splunk dashboards. I need some help with this kind of data:
2020-09-22 11:14:33.328+0100 org{abc} INFO 3492 --- [hTaskExecutor-1] c.j.a.i.p.v.b.l.ReadFileStepListener : [] read-feed-file-step ended with status exitCode=COMPLETED;exitDescription= with compositeReadCount 1 and other count status as: BatchStatus(readCount=198, multiEntityAccountCount=0, readMultiAccountEntityAdjustment=0, accountFilterSkipCount=7, broadRidgeFilterSkipCount=189, writeCount=2, taskCreationCount=4)
I want statistics in a dashboard showing all the integer values in the above log.
Edit 1:
I tried this, but it is not working:
index=abc xyz| rex field=string .*readCount=(?P<readCount>\d+) | table readCount
See if this run-anywhere example helps.
| makeresults
| eval _raw="2020-09-22 11:14:33.328+0100 org{abc} INFO 3492 --- [hTaskExecutor-1] c.j.a.i.p.v.b.l.ReadFileStepListener : [] read-feed-file-step ended with status exitCode=COMPLETED;exitDescription= with compositeReadCount 1 and other count status as: BatchStatus(readCount=198, multiEntityAccountCount=0, readMultiAccountEntityAdjustment=0, accountFilterSkipCount=7, broadRidgeFilterSkipCount=189, writeCount=2, taskCreationCount=4)"
`comment("Everything above just sets up test data")`
| extract pairdelim="," kvdelim="="
| timechart span=1h max(*Count) as *Count
I solved this using:
index=xyz | rex ".*fileName=(?<fileName>\S+)" | rex ".*compositeReadCount=(?<compositeReadCount>\d+)" | rex ".*readCount=(?<readCount>\d+)" | rex ".*multiEntityAccountCount=(?<multiEntityAccountCount>\d+)" | rex ".*readMultiAccountEntityAdjustment=(?<readMultiAccountEntityAdjustment>\d+)" | rex ".*accountFilterSkipCount=(?<accountFilterSkipCount>\d+)" | rex ".*broadRidgeFilterSkipCount=(?<broadRidgeFilterSkipCount>\d+)" | rex ".*writeCount=(?<writeCount>\d+)" | rex ".*taskCreationCount=(?<taskCreationCount>\d+)" | rex ".*status=(?<status>\S+)" | table _time fileName compositeReadCount readCount multiEntityAccountCount readMultiAccountEntityAdjustment accountFilterSkipCount broadRidgeFilterSkipCount writeCount taskCreationCount status
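The same key=value extraction is easy to prototype outside Splunk. A small Python sketch of what extract/rex are doing with this log line (truncated here to the BatchStatus part):

```python
import re

# The BatchStatus fragment of the log line from the question.
line = ("BatchStatus(readCount=198, multiEntityAccountCount=0, "
        "readMultiAccountEntityAdjustment=0, accountFilterSkipCount=7, "
        "broadRidgeFilterSkipCount=189, writeCount=2, taskCreationCount=4)")

# Pull every key=value pair in one pass, like extract with pairdelim="," kvdelim="=".
counts = {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}
print(counts["readCount"], counts["writeCount"])  # 198 2
```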

Creating an SSIS job to split a column and insert into database

I have a column called Description:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Description/Title |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Liszt, Hungarian Rhapsody #6 {'Pesther Carneval'}; 2 Episodes from Lenau's 'Faust'; 'Hunnenschlacht' Symphonic Poem. (NW German Phil./ Kulka) |
| Beethoven, Piano Sonatas 8, 23 & 26. (Justus Frantz) |
| Puccini, Verdi, Gounod, Bizet: Arias & Duets from Butterfly, Tosca, Boheme, Turandot, I Vespri, Faust, Carmen. (Fiamma Izzo d'Amico & Peter Dvorsky w.Berlin Radio Symph./Paternostro) |
| Puccini, Ponchielli, Bizet, Tchaikovsky, Donizetti, Verdi: Arias from Boheme, Manon Lescaut, Tosca, Gioconda, Carmen, Eugen Onegin, Favorita, Rigoletto, Luisa Miller, Ballo, Aida. (Peter Dvorsky, ten. w.Hungarian State Opera Orch./ Mihaly) |
| Thomas, Leslie: 'The Virgin Soldiers' (Hywel Bennett reads abridged version. Listening time app. 2 hrs. 45 mins. DOLBY) |
| Katalsky, A. {1856-1926}: Liturgy for A Cappella Chorus. Rachmaninov, 6 Choral Songs w.Piano. (Bolshoi Theater Children's Choir/ Zabornok. DOLBY) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Please note that above I am only showing one field.
Also, the output that I would like is:
+-------+-------+
| Word | Count |
+-------+-------+
| Arias | 3 |
| Duets | 2 |
| Liszt | 10 |
| Tosca | 1 |
+-------+-------+
I want this output to encompass EVERY record. I do not want a separate one of these for each record, just one global one.
I am choosing to use SSIS to do this job, and I'd like your input on which components to use for this task.
I'm not looking for a full solution, simply some direction on how to get started. I understand this can be done many different ways, but I cannot seem to think of the most efficient one. Thank you for any guidance.
FYI:
This script does an excellent job of concatenating everything:
select description + ', ' as 'data()'
from [BroincInventory]
for xml path('')
But I need guidance on how to work with this result to create the required output. How can this be done with C# or with one of the SSIS components?
Edit: As siyual points out below, I need a Script Task. The script above obviously will not work, since there is a limit to the size of a data point.
I think Term Extraction might be the component you are looking for. Check this out: http://www.mssqltips.com/sqlservertip/3194/simple-text-mining-with-the-ssis-term-extraction-component/
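If you do end up in a Script Task instead, the core of the job is a single global word count across every Description row. A minimal Python sketch of that logic (the sample rows are invented; the real input would be the column values):

```python
import re
from collections import Counter

# Hypothetical stand-in for the Description column values.
descriptions = [
    "Puccini, Verdi: Arias & Duets from Tosca.",
    "Puccini, Verdi: Arias from Tosca, Aida.",
]

# One Counter shared by every record gives the single global result.
counts = Counter()
for text in descriptions:
    counts.update(re.findall(r"[A-Za-z']+", text))

print(counts["Arias"], counts["Tosca"], counts["Aida"])  # 2 2 1
```

The Script Task version would be the same idea in C#: tokenize each row, feed one shared dictionary, and write the totals out once at the end.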

How to get the unread mail count in Gmail using Selenium RC?

I have a requirement to automate Gmail. I need to get the unread mail count for labels like Inbox, Spam, Bulk, etc. How can I get the count of unread mails using Selenium RC?
Suppose the labels are Inbox (5), Spam (10), Bulk (34); this means the Inbox contains 5 unread mails and Spam contains 10 unread mails.
So, for this kind of requirement, how can I achieve this using Selenium RC?
Thanks & Regards,
Shiva.
I think that using a standard IMAP client interface you will be able to get your task done much faster.
See the working example in Perl and the official documentation for Mail::IMAPClient.
String inbox = selenium.getText("//a[contains(@title,'Inbox')]");
Now the inbox String variable contains "Inbox (1)".
String unreadInboxMails = inbox.substring(inbox.indexOf("(")+1, inbox.indexOf(")"));
In this way you can get the count for all labels like Spam, Bulk, etc.; the only thing you need to change is the label locator.
I hope this will solve your problem.
This is the exact Selenese (Selenium IDE) code that gets the unread count of all folders and shows it in an alert.
You can use it with Selenium RC by tweaking a few commands.
store | //div[@class='LrBjie']/div/div[ | target1
store | ]/div/div/div/span/a | target2
store | 1 | i
store | true | present
store | | countsAll
while | ${present}==true |
storeEval | storedVars['target1']+storedVars['i']+storedVars['target2'] | target
echo | ${target} |
storeText | javascript{storedVars['target']} | counts
storeEval | storedVars['countsAll']+' $ '+storedVars['counts'] | countsAll
echo | ${countsAll} |
storeEval | parseInt(storedVars['i'])+1 | i
storeEval | storedVars['target1']+storedVars['i']+storedVars['target2'] | target
storeElementPresent | javascript{storedVars['target']} | present
echo | ${present} |
endWhile | |
storeEval | javascript{alert(storedVars['countsAll'])} | countsAll
WebDriver gmail = new ChromeDriver();
// Inbox count using XPath. From this output you can separate the count from the string 'Inbox(20)'
WebElement inbox = gmail.findElement(By.xpath("//*[@id=':bb']/div/div[1]"));
System.out.println(inbox.getText());
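Whichever way the label text is read, the parsing step is the same substring/regex trick. A small Python sketch of it (the label strings are made-up examples):

```python
import re

# Hypothetical label texts as read from the page, e.g. via getText on each label link.
labels = ["Inbox (5)", "Spam (10)", "Bulk (34)", "Drafts"]

def unread_count(label_text):
    """Return the unread count from text like 'Inbox (5)'; 0 if no count is shown."""
    match = re.search(r"\((\d+)\)", label_text)
    return int(match.group(1)) if match else 0

print([unread_count(text) for text in labels])  # [5, 10, 34, 0]
```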

How to increment a field value each time a Selenium test is run?

Is there any simple way to increment, for example, a field value by 1 every time a Selenium test is run through the Selenium IDE?
Command: Type
Target: some kind of id
Value: number+1
Edit 1: Thanks for the reply, krosenvold. I got your idea, and this is a simplified version of what I have so far:
...
store | 10 | x
storeEval | storedVars['x'] = ${x}+1 |
...
The variable's x value really does get incremented, but how would you save that value between distinct test runs? Is it even possible?
Should I get the $x value every time the test is run and, at the end of it, assign the $x value to some dummy element on the testing page, so I could retrieve that previously incremented value the next time the test is run?
Correct Answer
store | 10 | i
store | javascript{storedVars.i++;} | i
echo | ${i}
This is a solution for your problem:
store | 10 | i
store | javascript{storedVars.i++;} | i
echo | ${i}
store | 0 | iterator
echo | ${iterator} |
execute script | return parseInt(${iterator}) + 1 | iterator
echo | ${iterator} |
The result will be:
0
1
You can use eval:
eval($('elementId').value = $('elementId').value + 1);
The exact syntax I'm showing implies Prototype on the client;
document.getElementById('elementId').value should also do the trick in a standard DOM environment.
This worked for me:
storeEval | storedVars['nextRow'] = ${nextRow}+1 |
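To keep the incremented value between distinct test runs, it has to be persisted outside the test itself, e.g. in a small file read at the start of each run. A minimal Python sketch of that pattern (the file name is arbitrary):

```python
from pathlib import Path

# Hypothetical counter file; any location writable by the test runner works.
COUNTER_FILE = Path("run_counter.txt")

def next_run_number():
    """Read the persisted counter, increment it, and write it back."""
    current = int(COUNTER_FILE.read_text()) if COUNTER_FILE.exists() else 0
    COUNTER_FILE.write_text(str(current + 1))
    return current + 1

print(next_run_number())  # 1 on the first run, 2 on the next, and so on
```

A Selenium IDE run could do the same thing indirectly, as the question suggests, by writing the value into some element on a test page and reading it back on the next run.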