How to download a lot of data from Bloomberg?

I am trying to download as much information from Bloomberg for as many securities as I can. This is for a machine learning project, and I would like to have the data reside locally, rather than querying for it each time I need it. I know how to download information for a few fields for a specified security.
Unfortunately, I am pretty new to Bloomberg. I've taken a look at the Excel add-in, and it doesn't allow me to specify that I want ALL securities and ALL their data fields.
Is there a way to blanket-download data from Bloomberg via Excel, or do I have to do this programmatically? I'd appreciate any help on how to do this.

Such a request is unreasonable. Bloomberg has tens of thousands of fields for each security, ranging from fundamental fields like sales, through technical indicators like Bollinger bands, to whether the CEO is a woman and whether the company complies with Islamic law. I doubt all of these interest you.
Also, some fields come in "flavors". Bloomberg allows you to set arguments when requesting a field; these are called "overrides". For example, when asking for an analyst recommendation, you could specify whether you're interested in the yearly or quarterly recommendation, and how you want the recommendation consensus calculated. Are you interested in GAAP or IFRS reporting? What types of insider buys do you want to consider? I hope I'm making it clear: the possibilities are endless.
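To make overrides concrete, here is a minimal sketch using the Python blpapi library (it assumes a running terminal on the same machine); the field and override names here are illustrative, so check FLDS on the terminal for the ones that matter to you:

```python
import blpapi

# Connect to the terminal's local API endpoint (default localhost:8194).
session = blpapi.Session()
if not session.start():
    raise RuntimeError("Failed to start Bloomberg session")
session.openService("//blp/refdata")
service = session.getService("//blp/refdata")

request = service.createRequest("ReferenceDataRequest")
request.append("securities", "IBM US Equity")
request.append("fields", "BEST_EPS")  # consensus EPS estimate

# An override is a (fieldId, value) pair attached to the request; here we
# ask for the first-fiscal-year estimate period instead of the default.
override = request.getElement("overrides").appendElement()
override.setElement("fieldId", "BEST_FPERIOD_OVERRIDE")
override.setElement("value", "1FY")

session.sendRequest(request)  # response handling omitted for brevity
```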
My recommendation when approaching a project like the one you're describing: think in advance about which aspects of a security you want to focus on. Are you looking for value? Growth? Technical analysis? News? Then "sit down" with a Bloomberg rep and ask which fields apply to that aspect. Then download those fields.
Also, try to reduce your universe of securities. Bloomberg has data for hundreds of thousands of equities, and the total number of securities (including non-equities) is probably in the millions. You should reduce that universe to the securities that interest you (only EU? Only US? Only above a certain market capitalization?). This could also make your research more applicable to real life: if you find that a certain behavior indicates a stock is going to go up, but you can't actually buy that stock, then that's not very interesting.
I hope this helps, even if it doesn't really answer the question.

They have specific "Data Licence" products available if you or your company can fork out the (likely high) sums of money for bulk data dumps. Otherwise, as has been mentioned, there are daily and monthly restrictions on how much data (and what type of data) can be downloaded via their API. These limits are not very high at all, so by the sound of your request this will take a long and frustrating time. I think the daily limit is something like 500,000 hits, where one hit is one item of data, e.g. a price for one stock. So if you wanted to download only share price data for the 2,500 or so US stocks, you'd only manage 200 days of history for each stock before hitting the limit. They also monitor your usage, so if you were consistently hitting 500,000 each day, you'd get a phone call.
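As a back-of-envelope check on that arithmetic, a minimal sketch (the 500,000 cap and the one-hit-per-security/field-pair accounting are the figures quoted in this thread):

```python
# Rough hit budget: one hit = one data point (one security/field/date).
DAILY_LIMIT = 500_000        # approximate daily cap quoted above
n_securities = 2_500         # the US stock universe mentioned above
n_fields = 1                 # just a closing price per stock

days_of_history = DAILY_LIMIT // (n_securities * n_fields)
print(days_of_history)       # -> 200 trading days before hitting the cap
```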
One tedious way around this is to manually retrieve data via the clipboard. You can load a chart of something (GP), right click and copy data to clipboard. This stores all data points that are on display, which you can dump in excel. This is obviously an extremely inefficient method but, crucially, has no impact on your data limits.
Unfortunately you will find no answer to your (somewhat unreasonable) request, without getting your wallet out. Data ain't cheap. Especially not "all securities and all data".

You say you want to download "ALL securities and ALL their data fields." You can't.
You should go to WAPI on your terminal and look at the terms of service.
From the "extended rules:"
There is a daily limit to the number of hits you can make to our data servers via the Bloomberg API. A "hit" is defined as one request for a singled security/field pairing. Therefore, if you request static data for 5 fields and 10 securities, that will translate into a total of 50 hits.
There is a limit to the number of unique securities you can monitor at any one time, where the number of fields is unlimited via the Bloomberg API.
There is a monthly limit that is based on the volume of unique securities being requested per category (i.e. historical, derived, intraday, pricing, descriptive) from our data servers via the Bloomberg API.
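If you do end up pulling data programmatically within those limits, here is a minimal sketch of a historical request with the Python blpapi library (it assumes a running terminal on the same machine; the ticker, field, and dates are placeholders). Every security/field pair requested counts against the limits described above.

```python
import blpapi

# Connect to the terminal's local API endpoint (default localhost:8194).
session = blpapi.Session()
if not session.start():
    raise RuntimeError("Failed to start Bloomberg session")
session.openService("//blp/refdata")
service = session.getService("//blp/refdata")

request = service.createRequest("HistoricalDataRequest")
request.append("securities", "AAPL US Equity")  # placeholder ticker
request.append("fields", "PX_LAST")             # placeholder field
request.set("startDate", "20230101")
request.set("endDate", "20231231")
request.set("periodicitySelection", "DAILY")
session.sendRequest(request)

# Drain events until the final RESPONSE message arrives.
while True:
    event = session.nextEvent(500)
    for msg in event:
        print(msg)
    if event.eventType() == blpapi.Event.RESPONSE:
        break
session.stop()
```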

Related

How can blockchains be used in audit trails?

I'm currently trying to figure out how to use blockchain in audit trails and potentially in accounting (and if they actually make sense). Both Deloitte and EY mention them.
I somehow cannot understand how this could be of benefit for audits and/or accounting.
To my understanding, to make use of the power of blockchains you need multiple users. With only one user you cannot validate the integrity, since all of that user's blocks could be compromised (if one block of a user's blockchain got changed, perhaps all of the following blocks were changed as well, making the modification impossible to detect). Does this mean blockchains only make sense if you can share them with different users?
Data, and thus blockchains, aren't always shared between multiple users, however. In accounting you often have only one "user"/"owner" of the data. Sure, you could create multiple users in one company, but there wouldn't be any benefit, since they are in one location (the company) and potentially all compromised. And if the admin wants to change something, he could easily modify all users, making it useless for audits.
To make it work you would need different partners (supplier/customer) to share the information with. In that case, however, you might only have two users sharing the same blockchain (depending on the legal regulations in your country), and then again, whom do you trust if one of the two doesn't validate?
Deloitte mentions that blockchains can be used for files. Again, I don't see the benefit, since you would need multiple users AND files might get compressed with a different algorithm over time, rendering the blocks invalid (the useful information didn't change, but the block would still be invalid). Or is this not an issue in your experience? To me it seems it could be a problem.
The same goes for all the internal data which, from my point of view, may be important for audits. Which company would want to share that information with independent users? Or is it only intended for "public"/"shared" data?
To identify a modification of one block in a blockchain, the user would have to validate every single block (every hash in the header of a block needs to be compared to the data of the previous block). In accounting terms, a blockchain could be all transactions of one account during one fiscal year. That could easily be thousands of transactions. Wouldn't this be very slow to validate?
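For a sense of what that validation involves, here is a minimal toy sketch of a hash-chained ledger in Python; note that SHA-256 hashing a few thousand records takes only milliseconds on ordinary hardware, so validation speed is rarely the bottleneck:

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    # Hash a block's contents deterministically (sorted keys).
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def validate_chain(chain: list) -> bool:
    # One O(n) pass: each block must carry the hash of its predecessor.
    for prev, curr in zip(chain, chain[1:]):
        if curr["prev_hash"] != block_hash(prev):
            return False  # tampering detected at this link
    return True

# Toy ledger: each "transaction" block points at the previous block's hash.
genesis = {"prev_hash": "0" * 64, "tx": "open account", "amount": 0}
b1 = {"prev_hash": block_hash(genesis), "tx": "invoice 17", "amount": 250}
b2 = {"prev_hash": block_hash(b1), "tx": "payment 17", "amount": -250}

print(validate_chain([genesis, b1, b2]))  # True
b1["amount"] = 999                        # tamper with history
print(validate_chain([genesis, b1, b2]))  # False: b2's link no longer matches
```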
Maybe I'm misunderstanding the point of blockchains for audit trails, but as long as the users are not independent, the data can always be modified, making it useless for audits. And you need a critical mass of participants to share the blockchain with.
First of all, I think it's necessary to understand the power of blockchain. It gives us the chance to create decentralized databases, i.e. databases that are not controlled by a single authority. Also, the data in a blockchain is immutable and permanent, i.e. it cannot be modified or deleted. Thanks to this, you achieve a unique decentralized registry in a distributed network, for example for audit trails.
It's true that it makes no sense if you use it only inside your company. But what if you use it among different companies? Each one could encrypt its data so that the rest of the companies couldn't see it. However, all the data would be stored at all the companies, so no one could change it. Moreover, you can have more than one user (node) per company.
Nowadays there are many implementations of blockchain, each with a different objective. To better understand the power of blockchain, I suggest you watch the video explaining the new version (v1.0) of Hyperledger Fabric.

Better way to compare distance between multiple users

I've been using the Google Distance Matrix API, and so far it works very well! However, I've noticed that Google puts a limit of 1,000 requests per day. I am making a program for a website with about 1,000 users, and the way I plan to search through all the users would lead to a lot of requests every time someone searched for nearby people.
I was planning to save everyone's location in a database, and then when someone entered their own location, it would compare that location to each and every one in the database. This would lead to potentially hundreds/thousands of requests every time someone searched.
So would there be a better way to do this?
The request pattern you're currently using sounds very heavy, and even if you buy the commercial solution I fear you will lose money very quickly.
Instead, I suggest that you host a routing server of your own (e.g. GraphHopper, of which I am the author); then you can query it until the CPU melts. For a single city, the hardware requirements are also very low.
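A complementary trick, whichever routing engine you use: pre-filter candidates locally with straight-line (haversine) distance, and only send the few nearest candidates to the (metered) routing service. A minimal sketch with made-up coordinates; straight-line distance is not driving distance, but it is free and good enough for ranking:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points in kilometres.
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Rank all stored users by straight-line distance; only the handful of
# nearest candidates would ever need a paid routing/driving-time lookup.
users = {"alice": (52.5200, 13.4050), "bob": (48.8566, 2.3522), "carol": (51.5074, -0.1278)}
me = (50.1109, 8.6821)
nearest = sorted(users, key=lambda u: haversine_km(*me, *users[u]))
print(nearest)  # ['alice', 'bob', 'carol'], nearest first
```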

Data warehouse, data update strategy with Bigquery

We have an MIS that stores all the information about Customers, Accounts, Transactions, etc. We are building a data warehouse with BigQuery.
I am pretty new to this topic. Should we
1. extract ALL of the customers' latest information every day and append it to a BigQuery table with a timestamp, or
2. only extract the customer information that was updated that day?
The first solution uses a lot of storage, takes time to upload the data, and produces lots of duplicates, but it makes queries very straightforward. For the second solution, given a specific date, how can I get the latest record as of that day?
Similarly for Account data. Here is an example of a simplified Account table, with only 4 fields:
AccountId, CustomerId, AccountBalance, Date
If I need to build a report or graphic of a group of customers' AccountBalance for every day, I need to know the balance of each account on every specific date. So should I extract each account record every day, even if it's the same as the previous day, or should I only extract an account when its balance has changed?
What is the best solution, or what do you suggest? I prefer the second one because there are no duplicates, but how do I construct the query in BigQuery, and will performance be an issue?
What else should I consider? Any recommendation for me to read?
When designing a DWH you need to start from business questions and translate them to KPIs, measures, dimensions, etc.
When you have those in place...
you choose the technology based on some of the following questions (and many more):
Who are your users? At what frequency and resolution do they consume the data? What are your data sources? Are they structured? What are the data volumes? What is your data quality? How often does your data structure change? Etc.
When choosing the technology you need to think about the following: ETL, DB, scheduling, backup, UI, permissions management, etc.
After you have all those defined, the data schema design is pretty straightforward and is derived from "the purpose of the DWH" and your technology's limits.
You have pointed out some of the points to consider, but the answer depends on your needs and is not tied to a specific DB technology.
I am afraid your question is too general to be answered without a deep understanding of your needs.
Referring to your comment below:
How reliable is your source data? Are you interested in analyzing trends or just snapshots? Does your source system allow "select all" operations? What are the data volumes? What resources does your source allow for extraction (locks, bandwidth, etc.)?
If you just need a daily snapshot of the current balance, and there are no limits imposed by your source system, it would be much simpler to run a daily snapshot.
This way you don't need to manage "increments" or handle data-integrity issues and system discrepancies. However, this approach might have an undesired impact on your source system and on your network costs...
If you do have resource limits and you choose the incremental ETL approach, you can either
create a "changes log" table and query it, using row_number() to find the latest record per account (see the sketch below),
or construct a copy of the source accounts table, merging each day's changes into the existing table.
Each approach has its own trade-offs in simplicity, cost, and resource consumption...
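A minimal sketch of the "latest record per account as of a date" query with the google-cloud-bigquery Python client; the project, dataset, and table names are hypothetical, and the changes-log table is assumed to receive a row only when a balance actually changes:

```python
import datetime
from google.cloud import bigquery

client = bigquery.Client()  # assumes default project credentials

# Latest known balance per account as of a given date, reading only the
# changes-log table.
sql = """
SELECT AccountId, CustomerId, AccountBalance, Date
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY AccountId ORDER BY Date DESC) AS rn
  FROM `my_project.my_dataset.account_changes`  -- hypothetical table
  WHERE Date <= @as_of
)
WHERE rn = 1
"""
job = client.query(
    sql,
    job_config=bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("as_of", "DATE", datetime.date(2023, 6, 30))
        ]
    ),
)
for row in job:
    print(row.AccountId, row.AccountBalance)
```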
Hope this helps

How should data be provided to a web server using a data warehouse?

We have data stored in a data warehouse as follows:
Price
Date
Product Name (varchar(25))
We currently only have four products. That changes very infrequently (on average once every 10 years). Once every business day, four new data points are added representing the day's price for each product.
On the website, a user can request this information by entering a date range and selecting one or more product names. Analytics shows that the feature is not heavily used (about 10 user requests per week).
It was suggested that the data warehouse should push (SFTP) a CSV file containing all the data (currently 6,718 rows, growing by four each day) to the web server daily. The web server would then read data from the file and display it whenever a user made a request.
Usually the push would happen once a day, but more than one push would be possible to communicate (infrequent) price corrections. Even in the price-correction scenario, all data would be delivered in the file. What are the problems with this approach?
Would it be better to have the web server make a request to the data warehouse per user request? Or does this have issues such as a greater chance for network errors or performance issues?
Would it be better to have the web server make a request to the data warehouse per user request?
Yes it would. You have very little data, so there is no need to try and 'cache' this in some way. (Apart from the fact that CSV might not be the best way to do this).
There is nothing stopping you from making these requests from the web server to the database server. With as little information as this, you will not find performance an issue, and even if it were to become one as everything grows, there is a lot to be gained on the database side (indexes, etc.) that will help you survive the next 100 years in this fashion.
The number of requests from your users (also extremely small) does not need any special treatment, so again, a direct query would be best.
Or does this have issues such as a greater chance for network errors or performance issues?
Well, it might, but that would not justify the CSV method. Examples, and why you need not worry:
The connection with the database server is down.
This is an issue for both methods, but with only one connection per day, the chance of a 1-in-10,000 failure might seem to favor the once-a-day method. These issues should not come up very often, though, and when they do, you should be able to handle them (retry the request, show a message to the user; see the retry sketch below). This is what enormous numbers of websites do, so trust me when I say this will not be an issue. Also, think about what it would mean if your daily update failed: that would present a bigger problem!
Performance issues
As said, given the amounts of data and requests involved, this is not a problem. And even if it becomes one, it is a problem you should be able to catch at a different level: use a caching system (not CSV) on the database server, use a caching system on the web server, and fix your indexes to stop performance from being a problem.
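The retry handling mentioned above can be as simple as this sketch (the exception type and timings are placeholders to adapt to your database driver):

```python
import time

def query_with_retry(run_query, attempts=3, delay_s=1.0):
    # Retry a flaky database call a few times before surfacing the error,
    # since most transient connection blips resolve on a retry.
    for attempt in range(1, attempts + 1):
        try:
            return run_query()
        except ConnectionError:  # placeholder: use your driver's error type
            if attempt == attempts:
                raise  # let the caller show a friendly message instead
            time.sleep(delay_s * attempt)  # simple linear backoff

# Usage: query_with_retry(lambda: conn.execute("SELECT ...").fetchall())
```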
BUT:
It is far from strange to want your data warehouse separated from your web system. If this is a requirement, and it surely could be, the best thing you can do is re-create your warehouse database (the one I just defended as being good enough to query directly) on another machine. You might get good results with a master-slave setup:
your data warehouse is the master database: it sends all changes to the slave but is inaccessible otherwise;
your second database (even on your web server) gets all updates from the master and is read-only: you can only query it for data;
your web server cannot connect to the data warehouse, but can connect to the slave to read information. Even if there were an injection hack, it wouldn't matter, as the slave is read-only.
Now there is no single moment where you update the queried database (the master-slave replication will keep it up to date at all times), and there is no chance that queries from the web server put your warehouse in danger. Profit!
I don't really see how SQL injection could be a real concern. I assume you have some calendar-type field that the user fills in to get data out. If this is the only form, just ensure that the only field in it is a date; then something like DROP TABLE isn't possible. Getting access to the database is another issue, but a separate file containing just the connection function should do fine in most cases, so that a user can't, say, open your page in an HTML viewer and see your database connection string.
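Belt and braces for that form: bind user input as query parameters instead of splicing it into the SQL text. A minimal sketch (the table and column names are made up; sqlite3 stands in for whatever database the web server actually uses):

```python
import sqlite3  # stand-in for your actual database driver

def prices_between(conn, product, start_date, end_date):
    # Parameterized query: user input is bound as data, never concatenated
    # into the SQL, so input like "'; DROP TABLE prices; --" stays harmless.
    sql = """
        SELECT date, price FROM prices
        WHERE product_name = ? AND date BETWEEN ? AND ?
        ORDER BY date
    """
    return conn.execute(sql, (product, start_date, end_date)).fetchall()
```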
As for the CSV, I would say querying the database per user request, especially at only ~10 requests weekly, would be much more efficient. The CSV strikes me as overkill because, again, you only have ~10 users attempting to get some information; exporting an updated CSV every day would be too much work for so little payoff.
EDIT:
Also, if an attack is a big concern, which really depends on the nature of the business, the data being stored, and the visitors you receive, you could always create a backup as another option. I don't really see a reason for this as your question is currently stated, but it is possible that even with the best security an attack could happen. That mainly depends on whether attackers want the information you have.

Bloomberg API request limit

Is there any way to determine how many requests, or how much data, you have left in your remaining request-limit allowance for the Bloomberg API?
From the Bloomberg HelpDesk, April 2014 (this is valid for a basic desktop client):
We have 3 kinds of limits:
You can have no more than 3,500 real-time fields open at the same time. If you exceed this limit you will see "NA Limit" as the error message, and you just need to delete some securities/fields for the error message to disappear and the values to show.
We also have a daily limit. The daily API limit is 500,000 hits per day. A "hit" is defined as one request for a single security/field pairing. Therefore, if you request static data for 5 fields and 10 securities, that will translate into a total of 50 hits. So try to refresh just the portion of the spreadsheet that really needs to be refreshed, and avoid refreshing it all or reopening it many times a day.
The last limit is a monthly limit. Our monthly limit comes from a proprietary model; only about 0.4% of our user base ever goes over it. This limit is based on unique securities and depends on the type of data being downloaded. For example, some of the data on the system, such as intra-day, is valued a little higher than historical end-of-day data for any given list of securities. We do not recommend more than 5,000 to 7,000 unique identifiers per month, and a limit upgrade will only allow you to get the data needed to complete your project. Once a security has been used in a given month, using it again will not count again towards the monthly limit.
We normally grant 2 resets per month in case you exceed your daily limit. If you exceed your monthly limit, we grant 1 extension per month (10% more); if you breach the limit again, you will need to wait for midnight for the daily limit to reset automatically, or for the end of the month for the monthly limit to reset.
Bloomberg do not state what the explicit limits are, and there is no programmatic way of finding out what the limits are or what proportion of your limits you have used.
The best information from Bloomberg that I have found is on the WAPI page (in the terminal). On the menus on the LHS, go to WAPI Home > API Resources > API Data Limits. There are two pages, 'Extended Rules and Usage Limits' and 'Managing Your API Data Limits' that shed some further light on the matter.
Broadly speaking... there is a daily limit of individual data requests (i.e. security/field pairs - but duplicates are counted for each request). However, your limit for subscriptions is based on the number of securities you are subscribing to concurrently - i.e. if you expect to be requesting the price of a security every 5 mins, you are much better off subscribing to that security's price. Then there is a monthly limit that is based on the number of unique securities that you are making requests for.
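As a concrete illustration of that request-versus-subscription distinction, a minimal Python blpapi sketch (the ticker and field are placeholders; a subscription counts against the concurrent-securities cap rather than the daily hit budget):

```python
import blpapi

session = blpapi.Session()  # assumes a running terminal on localhost:8194
if not session.start():
    raise RuntimeError("Failed to start Bloomberg session")
session.openService("//blp/mktdata")

subscriptions = blpapi.SubscriptionList()
subscriptions.add("AAPL US Equity", "LAST_PRICE", "", blpapi.CorrelationId("aapl"))
session.subscribe(subscriptions)

# Ticks now stream in as SUBSCRIPTION_DATA events until you unsubscribe
# (Ctrl-C to stop this toy loop).
while True:
    event = session.nextEvent(500)
    for msg in event:
        if event.eventType() == blpapi.Event.SUBSCRIPTION_DATA:
            print(msg)
```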
There is an upper limit on the Bloomberg API of 500,000 hits per day.
-- information from Bloomberg HELP HELP
The daily limit is clearly stated; it is the monthly limit that is, to my knowledge, not disclosed in writing. I have been told the following in the context of discussions about Data Licence, which is a Bloomberg product for bulk data subscription. The monthly limit is expressed as a budget in $, and it is the equivalent price of your requests priced under the Data Licence schema, which clearly is not secret if you enquire about that product. So why the secrecy about the budget? The reason it is commercially sensitive is that this budget is many times the monthly cost of the Terminal Licence, so clearly if you (a) know what it is and (b) either have API access to the budget spent (nope) OR write software to 'count the cost' (not hard), then you could pony up a couple of terminals and vastly reduce your Data Licence spend. Bloomberg naturally frown on this sort of activity because it represents an arbitrage opportunity in their pricing model, and it is not really 'playing nice'. They likewise do not like it if you hit 'the wrong kind of data' too often, or the monthly limit at all often, and these activities may prompt them to investigate your business model to be sure you are in compliance with all the T&Cs of the Data Addendum. Out of courtesy to Bloomberg I am not posting that budget number here, but you should be able to get it from your salesperson and confirm the validity of what I have said, though it may change at any time as it is not part of any contract.
I don't believe this is possible programmatically, however if you speak to the Bloomberg helpdesk they will be able to tell you whether you are near the limit, and reset it for you if necessary. Obviously they will only do that a certain number of times. I have not managed to get a definitive answer as to what the limit is, but it's designed to be large enough that you would not hit it just running spreadsheets, which have a limit of 3500 Bloomberg real-time formulas.
If you believe the download limit has not been breached but you still get the error message, you can run through the following steps to resolve the issue:
Close Excel completely.
From the Windows "Start" menu, select All Programs > Bloomberg > Stop API Process. A command prompt window appears.
Press <Enter> to close the window.
From the Windows Start menu, select All Programs > Bloomberg > API Environment Diagnostics.
Click the Start button.
When the test is complete, if there are any red errors, click the "Repair" button.
Re-open Excel and test a formula.
500,000 data points is the approximate daily limit, but remember that different types of data use up varying amounts; it is not 1 for 1. Requests for esoteric securities and fields will typically use up more of your allowance per request than, say, PX_LAST for AAPL US. There are also different request types, such as reference or historical, which consume your limit differently.
If you are requesting intra-day real-time data, these fields are typically not charged to your usage limit. Rather, you have limits on how many times the real-time 'pipe' can be opened.
Bloomberg are typically very helpful at resetting your monthly data usage limit should you exceed it on an ad hoc basis. This is not written company policy, but it seems to be part of their customer care. If you are persistently breaching limits each month, they are likely to stop resetting your limits and try to move you to B-PIPE. Otherwise, in my experience, they are flexible.