Ok - total noob on the topic here so this might not even be a good idea.
I have a Google Colab notebook I use for generating images from a textual prompt, where every image takes about 30 secs.
Since I’d like to generate several hundred, I’d like to set up an environment on GCloud where I run the same process but with a batch of, say, 800 instead of the usual 5-10. Ideally I’d start the process, close the connection to the machine, and come back the next day to find the results.
This might well be a duplicate - because I have no idea of what to search for.
So: is it a good or bad idea and how do I do this?
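Whatever the notebook's actual model call looks like, the key pattern for an overnight batch is making the loop resumable. A minimal sketch (all names here are placeholders, not the notebook's real API): each prompt's image goes to its own numbered file, so if the job dies or you restart it, it skips work that is already done.

```python
# Resumable batch-generation loop. `generate_image` stands in for whatever
# the notebook does to turn a prompt into image bytes (an assumption here).
import os

def run_batch(prompts, out_dir, generate_image):
    os.makedirs(out_dir, exist_ok=True)
    for i, prompt in enumerate(prompts):
        path = os.path.join(out_dir, f"{i:04d}.png")
        if os.path.exists(path):      # finished in a previous run; skip it
            continue
        with open(path, "wb") as f:
            f.write(generate_image(prompt))
```

On a GCloud VM you would start a script like this under nohup, tmux, or screen so it keeps running after you close the SSH session, then collect the output directory the next day.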
Thanks to the help in this forum, I got my SQL connection and inserts working now.
The following TR formula is used to retrieve the data from Eikon in Excel:
#TR($C3,"TR.CLOSEPRICE (adjusted=0);
TR.CompanySharesOutstanding;
TR.Volume;
TR.TURNOVER"," NULL=Null CODE=MULTI Frq=D SDate="&$A3&" EDate="&$A3)
For 100k RICs the formulas usually need between 30s and 120s to refresh. That would still be acceptable.
The problem is getting the same refresh speed in a VBA loop. Application.Run "EikonRefreshWorksheet" is currently used for a synchronous refresh, as recommended in this post:
https://community.developers.refinitiv.com/questions/20247/can-you-please-send-me-the-excel-vba-code-which-ex.html
The syntax of the code is correct and works for 100 RICs. But already at 1k the fetching gets very slow, and it freezes completely at around 50k, even with a timeout interval of 5 minutes.
I isolated the refresh part; there is nothing else slowing it down. So is this maybe just not the right method for fetching larger data sets? Does anyone know a better alternative?
I finally got some good advice from the Refinitiv Developer Forum which I wanted to share here:
I think you should be using the APIs directly as opposed to opening a spreadsheet and taking the data from that - but all our APIs have limits in place. There are limits for the worksheet functions as well (which use the same backend services as our APIs) - as I think you have been finding out.
You are able to use our older Eikon COM APIs directly in VBA. In this instance you would want to use the DEX2 API to open a session and download the data. You can find more details and a DEX2 tutorial sample here:
https://developers.refinitiv.com/en/api-catalog/eikon/com-apis-for-use-in-microsoft-office/tutorials#tutorial-6-data-engine-dex-2
However, I would recommend using our Eikon Data API in the Python environment, as it is much more modern and will give you a much better experience than the COM APIs. If you have a list of, say, 50K instruments, you could make 10 API calls of 5K instruments each using some chunking, and it would all be much easier to manage, without even resorting to Excel. You can then use any Python SQL tool to ingest the data into any database you wish, all from one Python script.
import refinitiv.dataplatform.eikon as ek
ek.set_app_key('YOUR APPKEY HERE')  # your per-user Eikon application key
riclist = ['VOD.L', 'IBM.N', 'TSLA.O']
fields = ['TR.CLOSEPRICE(adjusted=0).date', 'TR.CLOSEPRICE(adjusted=0)',
          'TR.CompanySharesOutstanding', 'TR.Volume', 'TR.TURNOVER']
df, err = ek.get_data(riclist, fields)  # df is a pandas DataFrame, err holds any errors
df
# df.to_sql - see note below
# df.to_csv("test1.csv")
This will return a pandas DataFrame that you can write directly into any SQLAlchemy-supported database (see example here), or out to CSV / JSON.
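The chunking mentioned in the advice can be sketched as follows. The batch size of 5,000 is an assumption rather than a documented limit, and the actual API and database calls are left commented out so the shape of the loop is the point here.

```python
# Split a large RIC list into fixed-size batches so that each API call
# stays within the service limits (batch size is an assumed value).
def chunk(seq, size):
    return [seq[i:i + size] for i in range(0, len(seq), size)]

riclist = [f"RIC{i}" for i in range(50_000)]   # placeholder instrument codes
batches = chunk(riclist, 5_000)
print(len(batches))                            # 10 batches of 5,000

# for batch in batches:
#     df, err = ek.get_data(batch, fields)     # same field list as above
#     df.to_sql("prices", engine, if_exists="append")  # any SQLAlchemy engine
```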
Unfortunately, our company policy does not allow Python at the moment. But the VBA solution also worked, even though it took some time to understand the tutorial, and it has more limitations.
We are having a strange issue with the Directory.GetFiles method when searching for a Word document on a UNC file share (NTFS disk) on a Windows Server 2008 R2 VM. The share contains over 10K files in the parent folder and 75K files in a subdirectory.
It all worked fine on Windows Server 2003. After migrating to Windows Server 2008 R2, the WinForms application freezes in this method, taking almost 13 minutes to open a single file from a client machine connected to the file share over a VPN with 1 Mbps download bandwidth (not throughput).
After some research, we realized the Windows Search service was not turned on. Once the service was started and the share was indexed, we saw a performance improvement: the time taken to open a file using GetFiles came down from 13 minutes to 3 minutes.
But this is not consistent. During the day, when bandwidth is much lower than 1 Mbps (say 0.5 Mbps), opening a document again takes between 8 and 12 minutes.
At this point we are not sure which one is causing the problem.
Solutions that are not possible for us:
1) Creating multiple directories and organizing the files.
2) Increasing bandwidth.
3) Using a direct file path instead of Directory.GetFiles/EnumerateFiles.
Any help is highly appreciated. Thanks!
Oh yeah, good stuff. You will notice that even if the service is off, running it twice (within a short time of each other) will be much faster the second time. Actually, here is a good one for you: run it twice, letting the first one run for a minute. The second one will catch up to the first almost immediately, and then they will both be at the same spot for the rest of the time (if what I said makes sense).
Here is what is happening: GetFiles() and GetDirectories() do use the indexing service. Also, if your indexing service is off, that just means it will not automatically gather data about the files; but when you access a file (via Windows Explorer or GetFiles), it gets indexed, so that if you ask for the same thing within a set amount of time, it won't have to query the hard drive's table of contents again.
As for it running faster and slower when the indexing service is on: Windows knows it cannot keep track of every file on the computer. Therefore, after a set amount of time, a file's entry is considered stale, and when you next ask about the file, the indexing service makes an I/O call to fetch the info and update the index database.
This wiki talks about it, just a little. Not very thorough.
I just set up an IPython (0.13.1) Notebook server to use during my introductory course on Python. The server is password-protected and is currently up and running.
Now, I need my students (about 15 people) to access the same ipynb document at the same time, play around with it, and eventually modify the code examples, while making sure no one overwrites the uploaded version of the ipynb file.
How can I set this up?
First, take a look at teaching with the IPython Notebook. Try to list what types of applications you want to run on it. It is also possible to use cloud computing resources, for example on Heroku.
Is there a plugin/package to display status information for a PBS queue? I am currently running an Apache webserver on the login node of my PBS cluster. I would like to display status info and be able to perform minimal queries without writing it from scratch (or modifying an age-old Python script, à la jobmonarch). Note: the accepted/bountied solution must work with Ubuntu.
Update: In addition to Ganglia, as noted below, I also looked at the Rocks Cluster Toolkit, but I firmly want to stay with Ubuntu. So I've updated the question to reflect that.
Update 2: I've also looked at PBSWeb as well as MyPBS; neither appears to suit my needs. The first is too out-of-date with the current system, and the second is more focused on cost estimation and project budgeting. They're both nice, but I'm more interested in resource availability, job completion, and general status updates. So I'm probably just going to write my own from scratch, starting Aug 15th.
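For the write-your-own route, the core of a minimal status page is just parsing qstat output into something a web page can render. A sketch, assuming the plain-text column layout of typical Torque/PBS qstat output (on the cluster you would feed in the real output, e.g. from subprocess, instead of the sample string):

```python
# Parse plain-text `qstat` output into a list of job dicts.
# The column layout (id, name, user, time, state, queue) is an assumption
# based on common Torque/PBS defaults; adjust to your scheduler's output.
def parse_qstat(text):
    jobs = []
    for line in text.strip().splitlines()[2:]:   # skip the two header lines
        parts = line.split()
        if len(parts) >= 5:
            jobs.append({"id": parts[0], "name": parts[1], "user": parts[2],
                         "time": parts[3], "state": parts[4]})
    return jobs

sample = """Job id    Name    User    Time Use  S  Queue
---------  ------  ------  --------  -  -----
123.head   run1    alice   01:02:03  R  batch
124.head   run2    bob     00:00:00  Q  batch"""
print(parse_qstat(sample))
```

From there, resource availability and completion stats are aggregations over the parsed dicts, served by whatever the Apache setup already supports (CGI, WSGI, etc.).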
Have you tried Ganglia?
I have no personal experience, but a few sysadmins I know are using it.
The following pages may help:
http://taos.groups.wuyasea.com/articles/how-to-setup-ganglia-to-monitor-server-stats/3
http://coe04.ucalgary.ca/rocks-documentation/2.3.2/monitoring-pbs.html
my two cents
Have you tried using nagios: http://www.nagios.org/ ?
I have thousands of small CSV files I want to aggregate (with a little munging in-script first). They are on a NAS device, a "SNAP" server to be more exact.
I've had some success with VBA from Excel, doing about 700 files in about a minute, if I recall (it was a month ago). Actually, it was only half a success: the SNAP server is home to 80% PDFs and some proprietary-format files, and only 20% CSVs. The loop to test for file type took the execution time north of 2 hours, and the script apparently completely ignored the date filtering I put in. The quick 'success' was on 700 copies of the CSVs I had made and put on my C drive.
I've been doing VBA scripting for almost 20 years, and I think I'm decent at it; I've done a lot of CSV reading and writing from VBA over the last 9 years. So my question is more about your experience with SNAP servers, or NAS generally.
Can I not treat the SNAP server more or less like any drive/folder with VBA?
Would VBScript be more appropriate? (I'm already using FileSystemObject, after all.)
If I can use VBS, can I store the script on the NAS and run it using Task Scheduler?
I'd appreciate any tips or gotchas from you folks who have experience with snap servers!
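For comparison with the VBA approach, here is a minimal sketch of the same job in Python (an assumption on my part that Python is an option; the path and date cutoff are placeholders): walk the share, keep only .csv files modified since a cutoff, and append their rows into one output file, so the type test and date filter are both single cheap checks per file.

```python
# Aggregate all recent .csv files under `root` into one CSV at `out_path`.
# `root` would be the mounted share, e.g. r"\\snapserver\share" (placeholder).
import csv
import os
from datetime import datetime

def aggregate_csvs(root, out_path, since):
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                if not name.lower().endswith(".csv"):
                    continue                      # skip PDFs and other formats
                full = os.path.join(dirpath, name)
                if datetime.fromtimestamp(os.path.getmtime(full)) < since:
                    continue                      # date filter on modified time
                with open(full, newline="") as f:
                    for row in csv.reader(f):
                        writer.writerow(row)      # plus any munging per row
```

Because the extension test happens on the filename alone, it avoids opening the 80% of files that are not CSVs, which is likely where the 2-hour VBA run went.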
Some thoughts on the choice of language:
VBScript is more lightweight than VBA in that it does not require MS Office to be installed. The syntax is similar, so there is no real productivity difference.
Moving forward, PowerShell would be strongly recommended for Windows system admin tasks, general text-file processing, etc.
Some thoughts on using the NAS server:
a) If running your script on a workstation, you should be able to use a UNC path (\\myserver\myshare) to connect to a share on the NAS. If not, you may need to map a drive letter to the share before your script runs.
b) If you want to run your script on the NAS itself, there are two things to consider: whether the NAS OS is locked down so that you cannot add your own scheduled task, and whether it runs Linux or some flavor of Windows. Many NAS products use embedded Linux, so running a VBA or VBScript solution directly on the NAS may not work unless it is based on something like Embedded XP and you have access to Scheduled Tasks, etc.
Hope this helps...