Downloading and Formatting Web Data with Excel VBA

I need help creating a VBA macro that downloads closing price data directly from Yahoo Finance's Historic Quotes website and imports the data into an Excel spreadsheet column.
Background information:
This is the link to Yahoo Finance's Historical Quote database -
To download as a TXT file:
http://ichart.finance.yahoo.com/table.txt?s="StockTicker"&d="EndingMonth"&e="EndingDay"&f="EndingYear"&g=d&a="EndingMonth"&b="EndingDay"&c="EndingYear"&ignore=.txt
Formatting Issue:
By default, i.e. using Excel's Web Data Import Wizard, Excel imports the entire table, which includes more columns than I need. I am trying to isolate the "Close" column. I created a macro that formats the table to isolate the "Close" column, but that macro requires me to manually download the data from Yahoo Finance as a txt file first:
Sub Test_DownloadTextFile()
    Dim url As String, text As String
    'The URL parameters are retrieved from the spreadsheet, e.g. the named
    'range "StockTicker" references a cell holding the ticker symbol "AAPL"
    url = "http://ichart.finance.yahoo.com/table.txt?s=" & Range("StockTicker").Value & _
          "&d=" & Range("EndingMonth").Value & "&e=" & Range("EndingDay").Value & _
          "&f=" & Range("EndingYear").Value & "&g=d&a=" & Range("EndingMonth").Value & _
          "&b=" & Range("EndingDay").Value & "&c=" & Range("EndingYear").Value & "&ignore=.txt"
    text = DownloadTextFile(url)
    'At this point I should have the historical quotes table stored in the
    'variable text. How do I select the 4th column and import it into a
    'specific spreadsheet column?
    Debug.Print text
End Sub
How can I create a macro that:
1. Refers to the spreadsheet for key variables, e.g. "StockTicker", "EndingMonth", etc.
2. Downloads the corresponding historic data from Yahoo Finance
3. Imports the closing price data as a single column into the spreadsheet
I would very much appreciate a practical solution to this problem. Let me know if I need to clarify my question or the task at hand. Thank you!

Suggestion: this seems to be the perfect case for a Web Query.
Do you have any reason not to use that? You can copy just the columns you need afterwards.
You did not specify your Excel version, but in 2003 it's under Data/Import External Data.
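If you'd rather do the whole thing in VBA, here is a rough, untested sketch of the download-and-split approach. The named ranges, the late-bound MSXML2 request, and the destination sheet "Prices" are assumptions to adapt; in the classic table layout (Date,Open,High,Low,Close,Volume,Adj Close) the Close price is the fifth comma-separated field, i.e. index 4:
Sub ImportCloseColumn()
    Dim url As String, text As String
    Dim lines As Variant, fields As Variant
    Dim i As Long
    'Build the URL from the (assumed) named ranges on the sheet
    url = "http://ichart.finance.yahoo.com/table.txt?s=" & Range("StockTicker").Value & _
          "&d=" & Range("EndingMonth").Value & "&e=" & Range("EndingDay").Value & _
          "&f=" & Range("EndingYear").Value & "&g=d&ignore=.txt"
    'Download the text with late-bound MSXML2 (no library reference needed)
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        text = .responseText
    End With
    'Walk the lines, skipping the header row, and keep only the Close field
    lines = Split(text, vbLf)
    For i = 1 To UBound(lines)
        If InStr(lines(i), ",") > 0 Then
            fields = Split(lines(i), ",")
            Worksheets("Prices").Cells(i, 1).Value = fields(4) 'Close
        End If
    Next i
End Sub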

Check out this SO discussion. Several suggestions that seem worthwhile.

Related

Web Scrape Daily Treasury Rates

I have tried to write some VBA that scrapes a table from the URL below. I would like to create a macro that pulls in the data on a daily basis. Any help is much appreciated. Thank you
https://home.treasury.gov/resource-center/data-chart-center/interest-rates/TextView?type=daily_treasury_yield_curve&field_tdr_date_value_month=202212
I am having problems understanding HTML and the elements function.
I know you want to scrape that table on the treasury page, but it just has daily values of rates for each maturity duration (2yr, 5yr, etc.). These same rates are published every day in their XML feed.
The XML feed is meant to be consumed and parsed by computer code, while the HTML on that page is NOT meant to be parsed by "screen-scrapers".
If your macro pulled in the data from their XML feed every day instead, and you kept a table/log/whatever of each day's rates, after two weeks, you'd have the exact same data as the treasury HTML table, but NOT have all the headaches of trying to parse HTML and always worrying about your code breaking if their HTML layout changes.
Would this latter approach work for you?
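For illustration, reading an XML feed from VBA is only a few lines with late-bound MSXML2. Note that the feed URL and the element name below are placeholders, not the treasury's actual ones; substitute the real values from their site:
Sub ReadYieldCurveFeed()
    'FEED_URL and the "entry" element name are placeholders - take the
    'real feed URL and element names from the treasury site's XML
    Const FEED_URL As String = "https://example.gov/daily_treasury_yield_curve.xml"
    Dim doc As Object, node As Object
    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    doc.async = False
    If doc.Load(FEED_URL) Then
        'Log each day's entry; parse the child nodes into your table as needed
        For Each node In doc.getElementsByTagName("entry")
            Debug.Print node.Text
        Next node
    End If
End Sub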
Update: Eric Salazar's answer reminded me of something. I don't know if you're using Excel VBA or some other Office app, but if you ARE using Excel, you can use the Data menu -> Get Data -> From Other Sources -> From Web feature to import that treasury table into an Excel table:
1. Copy the URL of that page and paste it into the URL field of the first dialog that pops up (keep the type as "Basic"), then click OK.
2. In the next ("Navigator") dialog, click "Table 0" under Display Options, and then click "Load". By default this creates a table in the current worksheet at the currently selected cell. It will contain all the columns of the treasury page plus a few extra columns filled with "N/A" (it's not clear where Excel gets the N/A field names from).
3. You can get rid of those fields in the Power Query Editor before loading the data: instead of clicking "Load" in the "Navigator" dialog, click "Transform Data". In the table editor in the main window, select the column headers you don't want in the final table and hit the "Del" key (or right-click and choose "Remove Columns"). The selected columns will now be gone.
4. Click "Close and Load" in the upper-left corner of the Power Query Editor window.
And like magic, Excel will create a new table on your current worksheet with the data contained on that treasury page. You can then access the table with regular VBA code to refresh and read the values.
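As a minimal sketch of that last step, the sheet and table names below are assumptions (check yours under Table Design -> Table Name):
Sub RefreshTreasuryTable()
    'Refresh the query-backed table created by Get Data -> From Web,
    'then read a value out of it
    Dim lo As ListObject
    Set lo = Worksheets("Sheet1").ListObjects("Table_0")
    lo.QueryTable.Refresh BackgroundQuery:=False
    Debug.Print lo.DataBodyRange.Cells(1, 2).Value 'first data row, second column
End Sub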

How to create a macro in word to import multiple tables from excel?

Frequently in my job I need to generate reports with lots of tables of inputs and results. Especially for the result tables, one change in analysis may require editing a dozen spreadsheets. I'd like to create a macro in Word that pulls in data from a spreadsheet, with each table on its own tab, so that if I update any of those tables in Excel the Word document tables will also update. Given the number of tables/data points, I don't want to have to tell the macro to pull each single data point. The aim would be to reduce time and errors from manual entry.
I'm thinking this would involve the following steps, but not sure how to go about them:
1) Define the name/size for each table in word with matching name/size in excel
2) Tell the macro to pull the data into a table format
I'm not sure if this is possible as so far I've only seen how to insert a caption or a text box, not insert or update entire tables. Any help would be greatly appreciated!
Depending on what you're doing, you may not even need any VBA code.
If you copy a range from Excel and paste it into Word using Paste Special with the 'paste link' option, any subsequent changes in the Excel range will automatically be reflected in the document when the workbook is saved. And, if you name the range in Excel before copying/pasting, the Word content will expand/contract to reflect changes in the named range's scope in Excel. A variety of paste formats is supported.
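If you do end up scripting that, a minimal Word VBA sketch of the paste-link approach might look like this (the workbook path, sheet name, and range name are assumptions):
Sub LinkExcelRangeIntoWord()
    'Copy a named range from Excel and paste-link it at the cursor, so the
    'resulting Word table updates when the workbook changes
    Dim xl As Object, wb As Object
    Set xl = CreateObject("Excel.Application")
    Set wb = xl.Workbooks.Open("C:\Reports\Inputs.xlsx")
    wb.Worksheets("Results").Range("ResultsTable").Copy
    Selection.PasteSpecial Link:=True, DataType:=wdPasteRTF, Placement:=wdInLine
    wb.Close SaveChanges:=False
    xl.Quit
End Sub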
Alternatively, you might use a DATABASE field in Word.

Formatting Data in excel sheet with blue prism

I'm trying to run a duplicate check in which varying data is pulled from a website and compared to a master list, the master list being stored in Excel. The information from the website is read from a table which has line breaks. These breaks carry over to the data collection the values are initially stored in. Some of the data from the website is eventually written to the master list in Excel. So when I read the master list back into Blue Prism to run the duplicate check, the rows that have line breaks are written into a collection as multiple rows (e.g. I should have only 7 rows in my collection but am getting 42). Since the rows are not EXACTLY the same between the 2 collections, the automation does not recognize the duplicates when it runs.
The easiest way to solve this would be if I could make the collection rows have no line breaks as soon as the data is read. I've attempted to use the calculation stage to do so with no luck. I'm not sure if it is actually possible to do this, but would appreciate any direction.
Record an Excel macro to do the data sorting/cleaning in Excel (possibly Text to Columns, etc.) and then include running that macro in your Blue Prism process by using an action stage and the MS Excel VBO - Run Macro action. Have the process create an Excel instance (and create a handle data item from that stage), then use Open Workbook (on whichever workbook you store your macro in) and then MS Excel VBO - Run Macro (use the same handle created earlier and type in the name of the macro).
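For example, the macro itself could be as simple as this sketch, which replaces any line breaks in the used cells with spaces so each record stays on one row:
Sub RemoveLineBreaks()
    Dim cell As Range
    For Each cell In ActiveSheet.UsedRange
        If Not IsError(cell.Value) Then
            If InStr(cell.Value, vbCr) > 0 Or InStr(cell.Value, vbLf) > 0 Then
                cell.Value = Replace(Replace(cell.Value, vbCr, " "), vbLf, " ")
            End If
        End If
    Next cell
End Sub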
It sounds like what is happening is that the MS Excel VBO is grabbing the data from the Excel Worksheet wholesale.
This is to say that it's accessing your Worksheet table, copying the cell values BUT not the cell formatting data, and then dumping the values into a BP collection.
Since it did not bring along any of the original cell formatting data to reference when it went to populate the collection, it's just breaking up the values based on carriage returns/line breaks. Thus, your collection is organized based on those, and not on the original Worksheet cells.
So, with that said, on to a solution!
Solution 1
Brute force the organization of the incoming Excel cell data to the collection by looping over the Excel Worksheet cell-by-cell.
Run a loop, and in that loop have BP go into the Excel Worksheet and grab the first populated cell it comes across. Run a formatting/cleanup Calculation stage over the data. Dump the cell value into a single collection field.
Repeat.
This is... inelegant, expensive at best, and not at all recommended for any medium-to-large dataset. But it's definitely the best way to do string manipulation and value comparisons before the data hits your collection. Since it sounds like you're using a Master template, you also know what the expected format of your data should be.
This method will enable you to implement Trim(), Concat(), or Split() in a Calculation stage to better organize your incoming data before you dump it into a collection.
This is also basically what I think you're already trying to do, but cell-by-cell instead of Worksheet row-by-row or table-by-table.
Solution 2
Clean up the table data you grab from the website before you dump it into the Excel Worksheet.
This is basically Solution 1, but in reverse. Simply format/clean up your data before it hits your Excel Worksheet.
I'm not sure this is any better than Solution 1, but, you know, it's something...
Solution 3
Format the cell data IN the MS Excel Worksheet itself.
Basically rearrange the cells and cell data in the Excel Worksheet into a more predictable format by using the Split, Trim, Merge, or other actions included in the MS Excel VBO. You can also do this using the Data - OLEDB utility object, but that requires some pretty solid understanding of SQL syntax.
This would look like this using the MS Excel VBO:
1. Grab the Excel Worksheet data wholesale and dump it into a collection.
2. Count the rows/fields of the collection.
3. Check whether that number is consistent with the desired/expected format of your data.
4. If not, have the bot go back into the Excel Worksheet and reformat the cells by removing any carriage returns/line breaks/whatever else.
5. Repeat.
However, I'm always reluctant to reformat any original source, as it's then hard to figure out what went wrong, and where, once you've changed the original structure of your data. So it's best to always make a copy of the Worksheet before you do any manipulation.
Unfortunately I don't have access to my BP environment at the moment or I'd provide you with the exact object actions you'd need to do any of this, my bad. Once I do, I'll update this answer.

Transform and load a large CSV to multiple worksheets in one Excel file

Back Story:
NEW PROJECT FROM MANAGEMENT: I have been given a soft project from my boss to evaluate one of our current ETL plans to look for room for improvement in the process, and I am looking for guidance.
MOTIVE: Excel is currently being used and crashes quite often during the process due to file size.
TASK: Every month an analyst receives a large csv file from a survey vendor containing up to 750 columns (not all unique names) and over 15,000 rows. The task is simply to transform that csv file into an Excel file with seven worksheets, broken up based on the column headings in the csv. Details of how it is broken up are below.
My question: is transforming one large csv into an edited Excel file with multiple worksheets any easier or quicker using VB.NET and VS2010 (or VBA, for that matter), or would staying in Excel be the simplest way to continue this process? I am an expert Excel user, but I am still very much a beginner-to-intermediate coder in VBA, VB.NET, or any other language.
Detailed Question:
I am open to using free or open source software, but I am most familiar with VB.NET, Excel, and Excel-VBA. I have played around a bit coding a simple Windows Forms application to load the csv into a DataTable using TextFieldParser code similar to what is found here. I have thought of loading it into an array, or even a 2D array, to more easily edit the column headings and find the duplicate column headings. The DataTable option still leaves me with more questions than answers, because I need unique column headings, and I'm not sure I should bother with a DataTable if I'm going to write an Excel file right away. I also tried CSVReader from CodeProject, but it won't work on files with duplicate header names. I feel as though I have writer's block, as I am not sure which direction to take to handle such a process. Any input you can provide will be much appreciated, and I apologize if this question does not have a single clear best answer. Thanks.
Current Analyst tasks using excel
The current plan has the analyst open the csv in Excel, insert a row above row 1, and use a VLOOKUP to replace the 'New' column names with the 'Old' column names based on a simple two-column lookup table on a separate worksheet. For example:
New becomes Old
"org-name" becomes "org_name" or
"item_1_Vendor" becomes "item_1" or
"date-created_Survey" becomes "date_created"
etc., checking all the sent "New" columns against the list of all 750 possible columns.
Then they paste the values of the first row and delete the 2nd row, which contained the New headings we wanted to change.
Then the analyst has to fix the primary key on the file which is called "sid".
The Survey ID field (sid) should have a number for each row of the data file. Sometimes the sid shows up under the sid_HCAHPS or the sid_CGCAHPS fields instead.
The analyst would insert a column next to the "sid" field and put a formula in it like this, for example:
=IF(BE2<>"",BE2,IF(RD2<>"",RD2,IF(UH2<>"",UH2,"")))
Actual cell references would change but in the example excel formula,
"sid"=Range("BE2")
"sid_HCAHPS"=Range("RD2")
"sid_CGCAHPS"=Range("UH2")
Once the newly created primary key column is made and filled without blanks, we can delete the original "sid" column.
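For reference, that coalescing step could also be scripted rather than done by hand; a rough Excel VBA sketch, using the example column references above (adjust them to the actual file):
Sub FixSidColumn()
    'Coalesce sid (BE), sid_HCAHPS (RD) and sid_CGCAHPS (UH) into a new
    'column at the far right, so nothing shifts while we work
    Dim lastRow As Long, dest As Long, r As Long
    lastRow = Cells(Rows.Count, "BE").End(xlUp).Row
    dest = Cells(1, Columns.Count).End(xlToLeft).Column + 1
    Cells(1, dest).Value = "sid_fixed"
    For r = 2 To lastRow
        If Cells(r, "BE").Value <> "" Then
            Cells(r, dest).Value = Cells(r, "BE").Value
        ElseIf Cells(r, "RD").Value <> "" Then
            Cells(r, dest).Value = Cells(r, "RD").Value
        Else
            Cells(r, dest).Value = Cells(r, "UH").Value
        End If
    Next r
    'Once the new column is verified, the original "sid" column can be deleted
End Sub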
The next step is to check the columns, because there may be a redundant HCAHPS section of columns (due to a second survey being sent and then returned, coded as Wave 2); if so, delete the second set of columns, "sid_HCAHPS" through "language".
The next part is the largest alteration: we have set up a system where we send this information to our database admins as a seven-worksheet Excel file, which an MS Access query loads by creating a table from each sheet; those tables then get loaded into our proprietary business intelligence software. All done!
Is your question, "can VB.NET automate our current analyst tasks?" If so, then yes.
You could use the StreamReader class to get data from your csv
(http://msdn.microsoft.com/en-us/library/system.io.streamreader.aspx)
Then store it either in an array, as you mentioned, or use the List class
(http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx)
Once you've got all your data stored, you'll need to automate Excel. This is quite straightforward, but here's a link to get you started with that as well: http://support.microsoft.com/kb/301982/en-gb
With the list class you can create a list of custom objects using either classes or structures. eg.
We define a structure:
Structure rowOfData
    Public intPrimaryKey As Integer
    Public strIceCreamName As String
    Public decPrice As Decimal
End Structure
We can then create a rowOfData and add properties to it:
Dim iceCream1 As rowOfData
iceCream1.intPrimaryKey = 1
iceCream1.strIceCreamName = "Mr Whippy"
iceCream1.decPrice = 0.99D 'the D suffix makes this a Decimal literal
We create a list with:
Dim listOfIceCreams As New List(Of rowOfData)
And add to it like this:
listOfIceCreams.Add(iceCream1)
listOfIceCreams.Add(iceCream2)
etc.
And access the members of the list like this:
listOfIceCreams(0).decPrice 'gives us the price of the ice Cream that was added to the list first.
There are also a lot of other useful methods that lists have which arrays don't. You could have a look through that msdn list class link to see if anything jumps out at you that you might need

What causes Excel export from SQL Server Reporting Services to produce an abnormally large file?

We have a relatively simple Reporting Services report that our users commonly export to Excel. I've noticed that the files produced by the Excel export seem unusually large. If I open one of these files and just click Save, without making any changes, the file size reduces to about half of its previous size. Has anyone else run into this, and is there a known workaround?
You've mentioned that the report is relatively simple, but this is important to check. The export to Excel will go to extraordinary lengths to try and maintain how your report looks.
If you have lots of different borders or colours (particularly if different formatting is determined by the data in your report) this will bloat the file.
Also check if many columns with very small and unusual sizes are created in the exported worksheet. The export does this to try and match alignment in Excel with the original report.
Try recreating your report as a basic table with no formatting or headers/footers and see if you can reproduce the problem. If Excel's behaviour is acceptable then add each piece of formatting back until it goes awry. Please let us know what you find.
I don't have an immediate solution, but a common problem in Excel is files bloating because one, some, or all of the worksheets have saved all 64K rows instead of only the ones being used. The fix in Excel is to select all the lower rows not being used, delete them, save the spreadsheet, and close and reopen it. Therefore, I'd pursue the angle of extra rows being saved in the export and see if there is a way to keep this from happening; a cleanup macro like the sketch below can also do the row deletion for you.
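A minimal sketch that trims the unused rows on every sheet and saves (it assumes the exported workbook is already open and active):
Sub TrimUnusedRows()
    Dim ws As Worksheet, lastCell As Range
    For Each ws In ActiveWorkbook.Worksheets
        'Find the last cell that actually contains data
        Set lastCell = ws.Cells.Find(What:="*", SearchOrder:=xlByRows, _
                                     SearchDirection:=xlPrevious)
        If Not lastCell Is Nothing Then
            If lastCell.Row < ws.Rows.Count Then
                ws.Rows(lastCell.Row + 1 & ":" & ws.Rows.Count).Delete
            End If
        End If
    Next ws
    ActiveWorkbook.Save
End Sub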
What tool are you using when exporting to Excel?
I have also managed to reduce # of rows in my Excel worksheet by copying it to another worksheet, then deleting the original sheet.
You could also try copying only the data in your worksheet, and paste it into a new Excel Workbook (file).
I had an issue where the exported Excel files took extremely long to open and would stop responding every time you clicked on a cell.
Also, extra and merged columns would appear in the exported Excel files.
The fix was to make sure my header text boxes lined up with the beginning and end of columns in the data table just below it. Once both were aligned, there were no more extra columns in the exported excel spreadsheets and the performance was back to normal.
Here's the reference that helped me understand the issue:
http://www.codegur.press/12747988/issue-report-export-to-excel-in-rdlc-report
Maybe I am answering your question very late, but here's a solution for exporting to CSV.
In the designer, you need to give the field the name you want to see as a column header (not the column name).
By default, all the text headers are exported as separate columns along with the table columns, so make sure you set each item's name in the Properties pane to the name you want to see.
The other important thing to note is the DataElementOutput property, which is set to Auto, meaning the item will be exported. You can change that for anything you don't want exported.
Last but not least: after you export, the data will look messed up. Select the whole first column, go to the Data tab -> Convert Text to Columns, choose comma as the delimiter, and click Finish. That should solve your issue.
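If you'd rather not repeat that by hand on every export, the same step can be scripted; a short sketch, assuming the exported data sits in column A of the active sheet:
Sub SplitExportedColumn()
    'Split the comma-delimited first column into proper columns,
    'equivalent to Data -> Convert Text to Columns with a comma delimiter
    Columns("A").TextToColumns Destination:=Range("A1"), _
        DataType:=xlDelimited, Comma:=True
End Sub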