How to search blob files in a date range - azure-storage

I'm using Azure Storage Explorer to find specific files. The files are very old, and sorting by date doesn't seem to show the correct results.
Is there any way to specify a date range so that it shows me the files that fall within that range?

Is there any way in which I can give a date range and it will show me the files within that date range?
Unfortunately, no. Azure Blob Storage has very limited server-side filtering capabilities, and filtering by date is not one of them. What you have to do is list all blobs in the container and then apply the filtering on the client side.
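Since the filtering has to happen client-side, the code ends up looking roughly like this (a minimal sketch using the Python SDK; the connection string, container name and date range are placeholders):
from datetime import datetime, timezone
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    "<connection-string>", container_name="mycontainer")

start = datetime(2016, 6, 1, tzinfo=timezone.utc)
end = datetime(2016, 6, 30, tzinfo=timezone.utc)

# list_blobs() pages through every blob in the container;
# the date filter is applied locally on the last-modified timestamp.
matching = [b.name for b in container.list_blobs()
            if start <= b.last_modified <= end]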
An alternative would be to import this information into an Azure Cognitive Search index, where you will be able to perform all kinds of filtering. I wrote a blog post about it a long time ago that you may find useful: https://gauravmantri.com/2014/08/25/making-azure-blob-storage-searchable-using-azure-search-service/.

Related

Age Analysis Dynamically Sliced with Before Date

I am trying to create an Age Analysis for Creditors using a dynamic date slicer.
I followed each individual step specified on David Churchward's Blog, but I'm not able to replicate what he suggested there.
Here is the result of what I tried:
I'm expecting to see these values each in their own Ageing bucket based on what is outstanding.
Please download my PBIX file to see for yourself, then please advise what I did wrong.
The Excel source for PBIX is also in the folder.
Thank you.
The blog post you're referring to is quite old, and DAX has changed a lot since then.
Additionally, Power BI now has a built-in feature called binning which can do something similar to what you're looking for.
I was able to generate the output below using that feature, which automatically groups the data based on the bin size.
There is also a related feature called "Grouping", where you can manually choose the groups and their ranges. If you're up for it, you can use this as well. Below is the output for that:
I uploaded the file with these changes in the same folder.
Another resource that might be helpful for you is Radacad's article on dynamic banding.

Import data to Google SQL from a gsheet

What is the best way to import data into a Google Cloud SQL database from a spreadsheet file?
I have to import two files with 4k rows each into a DB.
I've tried to load 4k rows (one file) using Apps Script, and the result was:
Execution succeeded [294.336 seconds total runtime]
Ideas?
Code here: https://pastebin.com/3RiM1CNb
It depends a bit on how often you need this done. From your comment "No, this files will be uploaded two times for month in gdrive.", I take it you mean twice per month.
If you need this done programmatically, I suggest using a cron job and having either App Engine or a local machine run it.
You can access the spreadsheet with a service account (add it to the users of that spreadsheet like any other user) using the client libraries (check the quickstarts for your language of preference) and process the data with that. How you actually process the data depends on your language of choice, but in the end it's simply inserting rows into MySQL.
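As a rough illustration of that approach, something along these lines could work (a Python sketch; the credential file, spreadsheet ID and table/column names are placeholders, not taken from your post):
import mysql.connector
from google.oauth2.service_account import Credentials
from googleapiclient.discovery import build

# Read the sheet with a service-account credential that has been added to the spreadsheet's users.
creds = Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/spreadsheets.readonly"])
sheets = build("sheets", "v4", credentials=creds)
result = sheets.spreadsheets().values().get(
    spreadsheetId="<spreadsheet-id>", range="Sheet1!A2:D").execute()
rows = result.get("values", [])

# Insert the rows into Cloud SQL (MySQL) in one batch instead of one by one.
db = mysql.connector.connect(host="<cloud-sql-ip>", user="importer",
                             password="<password>", database="mydb")
cursor = db.cursor()
cursor.executemany(
    "INSERT INTO imported_rows (col_a, col_b, col_c, col_d) VALUES (%s, %s, %s, %s)",
    rows)
db.commit()
Run from a cron job (or App Engine cron), this also sidesteps the Apps Script runtime limit mentioned below.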
The simplest option would be to export to CSV and import that into Cloud SQL. Note that you may need to reformat it into something Cloud SQL understands, but that depends on the source data in Google Sheets.
As for the error you're getting: you're exceeding the maximum allowed runtime for Apps Script, which is 6 minutes.

Creating a test-data container in Azure blob storage

I'm adding some testing to my current project, which uses Azure Blob storage to store telemetry data coming from a Stream Analytics job. I want to test the routines that get the telemetry data, so I created a separate container for test data. I downloaded a sample set of data, modified it to serve my needs, and re-uploaded everything (using Azure Storage Explorer) into the new container.
The tests immediately failed, and I quickly found out that this is because the LastModified date of the files changed to the date/time of upload. That is fine in itself, but the sequence of the upload was also different. My code uses the modified date of a file to find out which one is the most recent, and it now returns a different file based on the new dates.
I found that you cannot modify this property, although you can change another property to make it update. So I know one solution: I could write a quick script which gets the sequence of files from my production instance and then touches every file in the test instance in the same sequence.
But... I was wondering whether this is the best option. I also read that it's 'best practice' to store a custom datetime in a separate property, but I don't think I can do that straight from Stream Analytics (which is writing the blobs). I also considered using an Azure Function to do this (new blob => update property), but then I'm adding complexity and something that might fail for whatever reason.
So I'm looking for the best way to solve this problem. Anyone?
Update: this one probably deserves a bit more explanation. Apart from using the LastModified date to sort on, I also use it to filter blobs. The blobs themselves are CSV files containing ASA output data, i.e. telemetry records. Each record has a timestamp, but that information is inside the file. When retrieving data, I don't want to have to dive into each file to find out the timestamps of its records. So I use a pre-filter to select only the blobs within a certain timespan, and then download/open just those files to get to the records inside.
This works perfectly as long as you do not touch any of the blobs, but obviously it stops working as soon as any of them gets modified for whatever reason. So I'm now convinced that I need a different/better way to solve this issue; but how?
It seems to me that you have two separate things: the data that you want to store in blob storage and metadata about the blob, such as the timestamp. I would create a separate (Azure) database for the metadata or, even simpler, just add metadata to the (block) blob:
blockBlob.Metadata.Add("from", dateTime.ToString());
blockBlob.Metadata.Add("to", dateTime.ToString());
blockBlob.Metadata.Add("order", "1");
For sorting I would just add a simple order property.
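To give an idea of how that could be consumed, here is a sketch with the Python SDK (it assumes the metadata values are written as ISO 8601 timestamps; the container name is a placeholder):
from datetime import datetime
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    "<connection-string>", container_name="telemetry-test")

cutoff = datetime.fromisoformat("2016-06-27T00:00:00")

# include=['metadata'] returns the custom properties with the listing,
# so filtering and sorting no longer depend on LastModified.
blobs = [b for b in container.list_blobs(include=["metadata"])
         if b.metadata and datetime.fromisoformat(b.metadata["from"]) >= cutoff]
blobs.sort(key=lambda b: int(b.metadata["order"]))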
The comment by @Vignesh deserves the credit here, but in order to get this one marked as answered I'll provide it myself.
With ASA, you can set the output to be structured by date/time. That means that in this case, data is written to blob storage with a directory structure such as:
2016 / 06 / 27 / 15 / 23 (= 27-06-2016 15:23)
2016 / 06 / 28 / 11 / 02 (= 28-06-2016 11:02)
The ASA output lets you specify how granular you want the structure to be; in my case I chose to store it by day (so not including a time path). The ASA runtime will now ensure that data from a certain point in time is stored within a blob that resides in the correct path.
I then changed my logic to no longer use the datetime stamp of the individual blob files, but simply to read the files from the folders that fall within the time range I'm interested in. That ensures we only get data that was produced within that time range. And if there's more than one file in a folder, I need to load them both anyway, since both were in the same time range. As long as minutes are enough granularity for you, this works very well, even though it might feel a bit strange to use a folder structure for such a thing.
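In code, reading one day's worth of data then becomes a simple prefix listing (a sketch with the Python SDK; the container name is a placeholder and the path pattern matches the day-level structure shown above):
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    "<connection-string>", container_name="telemetry")

# Everything under 2016/06/27/ was produced on that day,
# so no per-blob timestamp checks are needed.
for blob in container.list_blobs(name_starts_with="2016/06/27/"):
    csv_bytes = container.download_blob(blob.name).readall()
    # ... parse the telemetry records in csv_bytes ...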
Having a separate 'index' for blobs which tracks their datetime would work too, of course, but it adds complexity which in this case I don't really need.

Deleting rows in datastore by time range

I have a CKAN datastore with a column named "recvTime" of type timestamp (i.e. using "timestamp" as the type at datastore_create time, as shown in this link). An example value for this column is "2014-06-12T16:08:39.542000".
I have a large number of records in the datastore (thousands) and I would like to delete the rows before a given date in "recvTime". My first thought was to do it using the REST API with the datastore_delete operation and a range filter, but that is not possible, as described in the following Q&A.
Is there any other way of solving the issue, please?
Given that I have access to the host where the CKAN server is running, I wonder if this could be achieved by executing a regular SQL statement on the PostgreSQL engine where the datastore is persisted. However, I haven't found information about manipulating the underlying CKAN data model in the CKAN documentation, so I don't know if this is a good idea or whether it is risky...
Any workaround or information pointer is highly welcome. Thanks!
You could definitely do this directly on the underlying database if you're willing to dig in there (the structure is pretty simple, with tables named after the corresponding resource id). You could even turn this into an API of your own using an extension (though you'd want to be careful about permissions).
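If you go that route, the delete itself is a single SQL statement against the resource's table, roughly like this (a hypothetical sketch; the connection details and resource id are placeholders, and you should back up the database before trying it):
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="datastore_default",
                        user="ckan", password="<password>")
# The DataStore keeps one table per resource, named after the resource id;
# "recvTime" is quoted because the column name is case-sensitive in PostgreSQL.
with conn, conn.cursor() as cur:
    cur.execute('DELETE FROM "<resource-id>" WHERE "recvTime" < %s',
                ("2014-06-01T00:00:00",))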
You might also be interested in the new support (master only at the moment) for extending the DataStore API via a plugin in an extension - see https://github.com/ckan/ckan/pull/1725

Automating WebTrends analysis

Every week I access server logs processed by WebTrends (for about 7 profiles) and copy ad click-through and visitor information into Excel spreadsheets. A lot of it is just accessing certain sections, finding the right title, and then copying the unique visitor information.
I tried using WebTrends' built-in query tool, but it is really poorly done (it only offers a drag-and-drop system instead of a text-based one) and it imposes a maximum number of parameters and a maximum query length. As far as I can tell, the tools in WebTrends are not suited to my purpose of automating the entire web-metrics-gathering process.
I've been given access to the raw server logs, but it seems redundant to parse those given that they are already being processed by WebTrends.
To me it seems very scriptable, but how would I go about doing that? Is screen-scraping an option?
I use ODBC for querying metrics and numbers out of WebTrends. We even fill a scorecard with all key performance metrics.
It's in German, but maybe the idea helps you: http://www.web-scorecard.net/
Michael
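For what it's worth, the ODBC route can be scripted as well. Here is a minimal sketch with pyodbc (the DSN, table and column names are hypothetical and depend on how your WebTrends ODBC driver exposes the profiles):
import pyodbc

conn = pyodbc.connect("DSN=WebTrends;UID=report_user;PWD=<password>")
cursor = conn.cursor()
cursor.execute(
    "SELECT page_title, unique_visitors FROM ad_clickthroughs WHERE week = ?",
    "2011-W10")
for row in cursor.fetchall():
    print(row.page_title, row.unique_visitors)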
Which version of WebTrends are you using? Unless this is a very old install, there should be options to schedule these reports to be emailed to you, and also to bookmark queries. Let me know which version it is and I can make some recommendations.