Below is an example of a file that gets downloaded to a directory. I was given this file-name convention: the "SSUP-RX-" prefix is static and never changes, while the rest (user name, date, and time) can vary.
SSUP-RX-admin-2014_12_2-9_16_5_69.csv
What I need to do is search a given directory for this kind of file and then extract the date and time from the name. What is the best way to search for such files, and how do I read the date/time from them afterwards?
P.S. After the user name the numbers are probably year, month, day, hour, minutes, seconds, milliseconds.
Well, I'm not sure I understood your question correctly. I think you are trying to get the name from the file and extract or clean it up, so that for example the file name is:
test-2015-01-28-14-15-30-9.csv
The simple way is to retrieve the file name as is and, as has already been said, use a regex to extract only the parts you want; then you can use them however you wish.
Here is how to use regex:
visualbasic.about.com/od/usingvbnet/a/RegExNET.htm
Check out the "USA Telephone Number" example here, it is similar to what you want:
visualbasic.about.com/od/usingvbnet/a/RegExNET_2.htm
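Not part of the linked articles, but here is a rough sketch of the whole flow in VB.NET; the folder path is made up, and the meaning of the trailing numbers follows the P.S. in the question:
Imports System
Imports System.IO
Imports System.Text.RegularExpressions

Module FindStampedFiles
    Sub Main()
        ' Hypothetical download folder - replace with the real directory.
        Dim folder As String = "C:\Downloads"

        ' Fixed "SSUP-RX-" prefix, then user name, then year_month_day-hour_minute_second_millisecond.
        Dim pattern As New Regex("^SSUP-RX-(.+)-(\d+)_(\d+)_(\d+)-(\d+)_(\d+)_(\d+)_(\d+)\.csv$")

        For Each filePath As String In Directory.GetFiles(folder, "SSUP-RX-*.csv")
            Dim m As Match = pattern.Match(Path.GetFileName(filePath))
            If m.Success Then
                ' Build the timestamp from the captured groups so a variable number of
                ' millisecond digits does not matter.
                Dim stamp As New DateTime(
                    CInt(m.Groups(2).Value), CInt(m.Groups(3).Value), CInt(m.Groups(4).Value),
                    CInt(m.Groups(5).Value), CInt(m.Groups(6).Value), CInt(m.Groups(7).Value),
                    CInt(m.Groups(8).Value))
                Console.WriteLine("{0} -> user {1}, stamp {2}", filePath, m.Groups(1).Value, stamp)
            End If
        Next
    End Sub
End Module
For SSUP-RX-admin-2014_12_2-9_16_5_69.csv the captured groups give user admin and a stamp of 2 December 2014, 09:16:05 (plus 69 ms).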
Hope this answers your question.
I have a U-SQL question. I have a daily job that outputs files to a directory in the following format:
/MyOutput/{YYYY}/{MM}/{DD}/file.csv
I now have a second job that I want to run which will use the most recent 30 files produced by the first job, but I can't figure out the best way to do this.
I know I can use wildcards in the extractor, but I would prefer not to extract all the files and then use a SELECT/WHERE to remove the ones I don't want, as extracting everything could get really costly if I'm keeping years' worth of these files.
So is there a nice way in U-SQL to extract only the most recent X files? Or what options do I have here?
Thanks,
John
If you use a date pattern it will do what you want.
@rows =
    EXTRACT
        ...,
        date DateTime
    FROM "/MyOutput/{date:yyyy}/{date:MM}/{date:dd}/file.csv"
    USING Extractors.Csv();

@result =
    SELECT *
    FROM @rows
    WHERE date > DateTime.Parse("2018-05-03");
This will read only the files that match the date filter; it won't read all of them in first.
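If the cutoff should track the last 30 days rather than a fixed date, it can be computed with a declared variable; a small sketch building on the rowset above (the variable names are mine). Since the first job writes one file per day, the last 30 days correspond to the most recent 30 files:
// Compile-time constant, so the file-set elimination can still be applied.
DECLARE @cutoff DateTime = DateTime.Today.AddDays(-30);

@recent =
    SELECT *
    FROM @rows
    WHERE date >= @cutoff;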
I have an extremely large CSV, where each row contains customer and store IDs along with transaction information. The current test file is around 40 GB (about 2 days' worth), so partitioning is an absolute must for any reasonable return time on select queries.
My question is this: when we receive a file, it contains multiple stores' data. I would like to use the "virtual column" functionality to separate this file into the respective directory structure. That structure is "/Data/{CustomerId}/{StoreID}/file.csv".
I haven't yet gotten it to work with the OUTPUT statement. The statement I used was:
// Output to file
OUTPUT @dt
TO @"/Data/{CustomerNumber}/{StoreNumber}/PosData.csv"
USING Outputters.Csv();
It gives the following error:
Bad request. Invalid pathname. Cosmos Path: adl://<obfuscated>.azuredatalakestore.net/Data/{0}/{1}/68cde242-60e3-4034-b3a2-1e14a5f7343d
Has anyone attempted the same kind of thing? I tried to concatenate the output path from the fields, but that was a no-go. I thought about doing it as a user-defined function (UDF) that takes the two IDs and filters the whole dataset, but that seems terribly inefficient.
Thanks in advance for reading/responding!
Currently U-SQL requires that all file outputs of a script be known at compile time. In other words, the output files cannot be created based on the input data.
Dynamic outputs based on data are something we are actively working on, for release sometime later in 2017.
In the meantime, until the dynamic output feature is available, the pattern to accomplish what you want requires two scripts:
The first script uses GROUP BY to identify all the unique combinations of CustomerNumber and StoreNumber and writes them to a file.
Then, through scripting or a tool written with our SDKs, download that output file and programmatically generate a second U-SQL script that contains an explicit OUTPUT statement for each CustomerNumber/StoreNumber pair.
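A minimal sketch of what the first script could look like, reusing @dt and the column names from the question (the output path for the list of pairs is an assumption):
// Script 1: collect every CustomerNumber/StoreNumber pair that occurs in the data.
@combinations =
    SELECT CustomerNumber,
           StoreNumber
    FROM @dt
    GROUP BY CustomerNumber, StoreNumber;

OUTPUT @combinations
TO "/Data/combinations.csv"
USING Outputters.Csv();
Each row of combinations.csv then becomes one explicit OUTPUT statement (with the pair substituted into the path and into a WHERE filter) in the generated second script.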
I want to extract data from multiple files, so I am using a file set pattern that requires one virtual column. Because of some issues in my data, I also require the silent switch; otherwise I am not able to process my data. It looks like, when I use the virtual column together with the silent switch, it does not extract any rows.
@drivers =
    EXTRACT name string,
            age string,
            origin string
    FROM "/input/{origin:*}file.csv"
    USING Extractors.Csv(silent: true);
Note that I can extract data from a single file by removing the virtual column. Is there any solution for this problem?
First, you do not need to name the wildcard (and expose a virtual column) if you do not plan on referring to the value. That said, we recommend that you make sure you are not processing too many files with this pattern, so for now it may be best to use the virtual column as a filter to restrict the number of files to a few thousand, until we improve the implementation to work on more files.
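For example, keeping the names from the question (the filter value "us" is made up), the virtual column can be constrained so only a bounded set of files is read:
// {origin} names the varying part of the path and exposes it as a virtual column.
@drivers =
    EXTRACT name string,
            age string,
            origin string
    FROM "/input/{origin}file.csv"
    USING Extractors.Csv(silent: true);

// Filtering on the virtual column limits which files are actually touched.
@filtered =
    SELECT name,
           age,
           origin
    FROM @drivers
    WHERE origin == "us";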
I assume that at least one file contains some rows with two columns? If that is the case, I think you have found a bug. Could you please send me a simple repro (one file that works, an additional file where it stops working, and the script) to my email address so I can file it and we can investigate?
Thanks!
Apologies, this is kind of a convoluted question. I have a SQL query in an ASP web page which returns a dataset to a WebGrid on the page. It looks like this:
Picture of dataset/WebGrid output in ASP web page here
I'd like to take the "Community" column and keep the output the same, but make it a link to a software client based on the specific community that's listed. We have a short list of them (maybe 4-5 total), so it will only mean 4-5 different downloads.
Additionally, I may need to include a field for the OS, as we have different downloads per OS (Mac/Windows). I assume that if I can get the logic set up for one column, I can repeat it for the other.
Any ideas on how I could approach this? I'm not sure how to phrase this question appropriately, but I hope this makes it clear.
Thanks!
What you would need to do is something like:
SELECT account, telephone,
       '<a href="' + communityURL + '">' + community + '</a>' AS CommunityCol,
       status
FROM myTable
ORDER BY account
... so, assuming the URL is held in communityURL, the output you get in the CommunityCol column (from memory, you might need to rename it) is a concatenated string containing what you need.
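If the URL is not stored as a column, a CASE expression can derive it from the community name instead; with only 4-5 communities that stays manageable. A sketch with made-up community names and URLs:
SELECT account,
       telephone,
       '<a href="' +
       CASE community
            WHEN 'CommunityA' THEN 'https://example.com/downloads/clientA-win.exe'
            WHEN 'CommunityB' THEN 'https://example.com/downloads/clientB-win.exe'
            ELSE '#'
       END
       + '">' + community + '</a>' AS CommunityCol,
       status
FROM myTable
ORDER BY account
The same pattern (or a second CASE on an OS column) would cover the per-OS downloads.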
Edit: the sample outputs below have been revised so that no file names or trailing slashes are included.
I have a database with potentially many thousands of records (we're talking a 2 MB result string if it were just SELECT * FROM xxx in a standard use case).
Now, for security reasons, this result cannot be held anywhere for much further processing.
There is a path field, and I want to extract all records at each level of the folder structure.
So, running the query one way, I get every record in the root:
C:\
Querying again another way, I get every record at the first folder level:
C:\a\
C:\b\
etc.
Then of course I will GROUP somehow in order to return
C:\a\
C:\b\
and not
C:\a\
C:\a\
C:\b\
C:\b\
Hopefully you get the idea?
Any answers that at least move me in the right direction will be gratefully received. I really am stumped as to where to start with this, as downloading every record and processing it is far from the ideal solution in my context (which is what we do now).
SAMPLE DATA
C:\a\b\c\d
C:\a\b\c
C:\
C:\a\b
C:\g
D:\x
D:\x\y
Sample output 1
C:\
D:\
Sample output 2
C:\a
C:\g
D:\x
Sample output 3
C:\a\b
D:\x\y
Sample output 4
C:\a\b\c
Sample output 5
C:\a\b\c\d
If you have only folders, you could do: SELECT DISTINCT path FROM table WHERE LENGTH(path) - LENGTH(REPLACE(path, '\', '')) = N
If the paths include file names, then it depends on whether your RDBMS provides an INSTR function (or some regexp substitution function). In all cases it depends on the string functions that are available.
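For example, in MySQL (an assumption, as the question does not name the RDBMS; the table and column names files and path are also made up), SUBSTRING_INDEX gives the folder prefix at level N directly:
-- N = 1 gives the first folder level (sample output 2): C:\a, C:\g, D:\x.
SET @N := 1;

SELECT DISTINCT SUBSTRING_INDEX(p, '\\', @N + 1) AS prefix
FROM (SELECT TRIM(TRAILING '\\' FROM path) AS p FROM files) AS t
-- keep only rows deep enough to reach level N (count the backslashes)
WHERE LENGTH(p) - LENGTH(REPLACE(p, '\\', '')) >= @N;
Larger values of N reproduce the deeper sample outputs, and the DISTINCT takes care of the grouping mentioned above.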