Read text file into array (Storing only numerical data) - vb.net

I'm trying to read specific values from the text file (below):
Current Online Users: 0
Total User Logins: 0
Server Uptime: 0 day, 0 hour, 0 minute
Downloaded Amount: 0.000 KB
Uploaded Amount: 0.000 MB
Downloaded Files: 0
Uploaded Files: 0
Download Bandwidth Utilization: 0.00 KB/s
Upload Bandwidth Utilization: 000.00 KB/s
I can read the file to an array:
Dim path As String = "C:\Stats.txt"
Dim StringArrayOfTextLines() As String = System.IO.File.ReadAllLines(path)
How do I store only the data I require in the array? I've tried Split and Substring but cannot work out a usable method - I only need the text after the colon on each line.
Since I really only require the numerical data, can that be extracted from each line rather than just splitting each line into an array?
Thanks.

To capture everything after the colon you just need to split on it and take the second element of each result:
For Each s In StringArrayOfTextLines
    Console.WriteLine(s.Split(":"c)(1).Trim())
Next
If you want to do that as you read the data you'll need to use a StreamReader like Joel suggested.

ReadAllLines does just what it says it does. You have to iterate over the results. To read only the data you want directly, you have to write code that uses a System.IO.StreamReader (and its ReadLine() function) or even just a base System.IO.FileStream.
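Putting the two answers together - a minimal sketch, not taken from either answer, assuming every line of Stats.txt contains exactly one colon and that only the first number after it is wanted (the module name, the values list and the regular expression are just for illustration):
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions

Module ReadStats
    Sub Main()
        Dim path As String = "C:\Stats.txt"
        Dim values As New List(Of Double)

        ' Read the file line by line with a StreamReader
        Using reader As New StreamReader(path)
            Do While Not reader.EndOfStream
                Dim line As String = reader.ReadLine()
                ' Take the text after the first colon, then keep only the first number on it
                Dim m As Match = Regex.Match(line.Split(":"c)(1), "\d+(\.\d+)?")
                If m.Success Then
                    values.Add(Double.Parse(m.Value, System.Globalization.CultureInfo.InvariantCulture))
                End If
            Loop
        End Using

        For Each v As Double In values
            Console.WriteLine(v)
        Next
    End Sub
End Module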

Related

Read text file using Fortran, where the first 18 lines are strings and the rest of the lines are real numbers in a 1000*8 array?

I have a text file where the first few lines are text and the rest of the lines contain data in the form of real numbers. I only need the real-number array to be stored in a new file. I can read the total number of lines in the file correctly, but I cannot work out how to read the real numbers starting from a particular line number.
Below is part of the file. I also have many files like this to read.
AptitudeQT paperI: 12233105
Latitude : 30.00 S
Longitude: 46.45 E
Attemptone Time: 2017-03-30-09-03
End Time: 2017-03-30-14-55
Height(agl): m
Pressure: hPa
Temperature: deg C
Humidity: %RH
Uvelocity: cm/s
Vvelocity: cm/s
WindSpeed: cm/s
WindDirection: deg
---------------------------------------
10 1008.383 27.655 62.200 -718.801 -45.665 720.250 266.500
20 1007.175 27.407 62.950 -792.284 -18.481 792.500 268.800
There are many examples of how to skip or read lines like this.
To sum it up, option A is to skip the header and read only the data:
! Skip first 17 lines
do i = 1, 17
   read (unit,*,IOSTAT=stat) ! Dummy read
   if ( stat /= 0 ) stop "error"
end do

! Read data
do i = 1, 1000
   read (unit,*,IOSTAT=stat) data(:,i)
   if ( stat /= 0 ) stop "error"
end do
If you have many files like this, I suggest wrapping this in a subroutine or function (see the sketch after option B).
Option B is to use the Unix tail utility to discard the header:
tail -n +18 file.txt
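As for wrapping option A in a subroutine, here is a minimal sketch under the assumptions above (17 header lines, 8 reals per row); the subroutine name and the Fortran 2008 newunit= specifier are my own choices, and the routine needs an explicit interface (e.g. put it in a module) because of the assumed-shape dummy argument:
! Read one file: skip the header, then fill data column by column
subroutine read_data_file(filename, data, nrows)
   implicit none
   character(len=*), intent(in)  :: filename
   real,             intent(out) :: data(:,:)   ! e.g. dimensioned (8, 1000) by the caller
   integer,          intent(out) :: nrows
   integer :: unit, i, stat

   open (newunit=unit, file=filename, status='old', action='read')

   ! Skip the 17 header lines
   do i = 1, 17
      read (unit,*,iostat=stat)
      if ( stat /= 0 ) stop "error reading header"
   end do

   ! Read rows until the file ends or the array is full
   nrows = 0
   do i = 1, size(data, 2)
      read (unit,*,iostat=stat) data(:,i)
      if ( stat /= 0 ) exit
      nrows = nrows + 1
   end do

   close (unit)
end subroutine read_data_file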

How to split large text files into smaller text files using VBA?

I have a database text file.
It is a large text file, about 387,480 KB. This file contains table names, table headers and values. I need to split it into multiple files, each containing the table creation and insertion statements, with the table name as the file name.
Can anyone please help me?
I don't see how Excel will open a ~378 MB file. You can try to load it into Access and do the split using VBA. However, the process of importing a file that large may fragment enough to blow Access up to its 2 GB limit, and then it's all over. SQL Server would handle this kind of job. Alternatively, you could use Python or R to do the work for you.
### Python:
import pandas as pd
for i, chunk in enumerate(pd.read_csv('C:/your_path/main.csv', chunksize=3)):
    chunk.to_csv('chunk{}.csv'.format(i))
### R
setwd("C:/your_path/")
mydata = read.csv("annualsinglefile.csv")
# If you want 5 different chunks with the same number of lines, let's say 30:
# Chunks = split(mydata, sample(rep(1:5, 30)))  ## 5 chunks of 30 lines each
# If you want a chunk of 100000 rows, take any range within the number of rows:
First_chunk <- mydata[1:100000, ]  ## this would contain the first 100000 rows
# Or you can take any other range of rows:
# Second_chunk <- mydata[100:70, ]  ## this would contain the last 30 rows in reverse order if your data had 100 rows
# If you want to write these chunks out to a csv file:
write.csv(First_chunk, file = "First_chunk.csv", quote = F, row.names = F)
# write.csv(Second_chunk, file = "Second_chunk.csv", quote = F, row.names = F)

QlikView does not load what I ask for

I have this simple script to load filenames:
Files:
LOAD
Distinct
FileName() as File
FROM [C:\Matias\Capacity Tracker\AllFiles\*];
As a result, while running the script, the following happens:
Files << Analyst Time Sheet - Adam W - 0730-0805 0 lines fetched
Files << Analyst Time Sheet - Adam W - 0806-0812 0 lines fetched
Files << Analyst Time Sheet - Agnieszka J - 0702-0708 2 lines fetched
Files << Analyst Time Sheet - Agnieszka J - 0709-0715 3 lines fetched
Files << Analyst Time Sheet - Agnieszka J - 0716-0722 4 lines fetched
And so on...
The strange thing is that for the files from "Adam W" it doesn't load anything (no lines fetched), so I end up with the list of files except those ones. I find it very strange, because since I'm only asking for the filename, it can't be a formatting issue (I think).
Any idea what could be happening and how I could solve it?
Thank you in advance
Matias
Although QlikView offers that * option on the filename of the LOAD statement, the results are sometimes a little bit random. I would recommend that you try a different approach and see if it works.
For Each FILE in FileList('C:\Matias\Capacity Tracker\AllFiles\*')
    Files:
    LOAD
        Distinct FileName() as File
    FROM [$(FILE)];
Next FILE
Hope this helps.
Thanks for your idea. I tried that and unfortunately had the same problem. Finally, I solved it like this:
Files:
LOAD
Distinct
FileName() as File
FROM [C:\Matias\Capacity Tracker\AllFiles\*];
SET ErrorMode=0;
Files:
LOAD
Distinct
FileName() as File
FROM [C:\Matias\Capacity Tracker\AllFiles\*]
(ooxml, no labels, table is [Task Log])
Where Not Exists(File,FileName());
IF ScriptError <> 0 THEN
Files:
LOAD
FileName() as File
FROM [C:\Matias\Capacity Tracker\AllFiles\*]
(biff, no labels, table is [Task Log$])
Where Not Exists(File,FileName());
ENDIF
Although they are all .xls files, there seem to be formatting differences between them. The ones not loaded at first were loaded by the first statement afterwards (ooxml), or, if that failed, by the second one (biff). Quite strange.
Maybe this is not the best or most proper solution, but it was the only one that worked to load all the filenames from the folder.

Create a 350000 column csv file by merging smaller csv files

I have about 350000 one-column csv files, which are essentially 200 - 2000 numbers printed one under another. The numbers are formatted like this: "-1.32%" (no quotes). I want to merge the files to create a monster of a csv file where each file is a separate column. The merged file will have 2000 rows maximum (each column may have a different length) and 350000 columns.
I thought of doing it with MySQL but there is a 30000 column limit. An awk or sed script could do the job but I don't know them all that well and I am afraid it will take a very long time. I could use a server if the solution requires it. Any suggestions?
This python script will do what you want:
#!/usr/bin/env python2
import sys
import codecs

# Open every file named on the command line
fhs = []
for filename in sys.argv[1:]:
    fhs.append(codecs.open(filename, 'r', 'utf-8'))

# Emit one CSV row per input line; files that run out of lines
# contribute empty cells until every file is exhausted
while True:
    lines = [fh.readline() for fh in fhs]
    if not any(lines):
        break
    sys.stdout.write(','.join(line.rstrip() for line in lines))
    sys.stdout.write('\n')

for fh in fhs:
    fh.close()
Call it with all the CSV files you want to merge and it will print a new file to stdout.
Note that you can't merge all the files at once: for one thing, you can't pass 350,000 file names as arguments to a process, and secondly, a process can by default only open 1024 files at once.
So you'll have to do it in several passes. I.e. merge files 1-1000, then 1001-2000, etc. Then you should be able to merge the 350 resulting intermediate files at once.
Or you could write a wrapper script which uses os.listdir() to get the names of all files and calls this script several times.
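For example, a rough sketch of such a wrapper, assuming the merge script above is saved as merge_csv.py and the one-column files live in a columns directory (both names are placeholders); it merges in batches of 1000 and then merges the intermediate results, ignoring the edge case of batches whose longest columns differ in length:
import os
import subprocess

input_dir = 'columns'        # placeholder directory holding the 350,000 one-column files
batch_size = 1000            # small enough for the argument-count and open-file limits

files = sorted(os.path.join(input_dir, f) for f in os.listdir(input_dir))

# First pass: merge the files in batches, one intermediate CSV per batch
intermediates = []
for i in range(0, len(files), batch_size):
    out_name = 'intermediate_{}.csv'.format(i // batch_size)
    with open(out_name, 'w') as out:
        subprocess.check_call(['python', 'merge_csv.py'] + files[i:i + batch_size], stdout=out)
    intermediates.append(out_name)

# Second pass: merge the ~350 intermediate files into the final wide CSV
with open('merged.csv', 'w') as out:
    subprocess.check_call(['python', 'merge_csv.py'] + intermediates, stdout=out)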

Connection reset by peer error in MongoDb on bulk insert

I am trying to insert 500 documents by doing a bulk insert in pymongo and I get this error:
File "/usr/lib64/python2.6/site-packages/pymongo/collection.py", line 306, in insert
continue_on_error, self.__uuid_subtype), safe)
File "/usr/lib64/python2.6/site-packages/pymongo/connection.py", line 748, in _send_message
raise AutoReconnect(str(e))
pymongo.errors.AutoReconnect: [Errno 104] Connection reset by peer
I looked around and found that this happens because the size of the inserted documents exceeds 16 MB, so according to that the size of the 500 documents should be over 16 MB. So I checked the size of the 500 documents (Python dictionaries) like this:
size = 0
for dict in dicts:
    size += dict.__sizeof__()
print size
This gives me 502920, which is about 500 KB - way less than 16 MB. Then why do I get this error?
I know I am calculating the size of Python dictionaries and not BSON documents, and MongoDB takes in BSON documents, but that can't turn 500 KB into 16+ MB. Moreover, I don't know how to convert a Python dict into a BSON document.
My MongoDB version is 2.0.6 and my pymongo version is 2.2.1.
EDIT
I can do a bulk insert with 150 documents and that's fine, but with over 150 documents this error appears.
This Bulk Inserts bug has been resolved, but you may need to update your pymongo version:
pip install --upgrade pymongo
The error occurs because the bulk-inserted documents have an overall size greater than 16 MB.
My method of calculating the size of the dictionaries was wrong. When I manually inspected each key of the dictionary, I found that one key had a value of about 300 KB, so that did make the overall size of the documents in the bulk insert more than 16 MB (500 * 300+ KB > 16 MB). But I still don't know how to calculate the size of a dictionary without manually inspecting it. Can someone please suggest a way?
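One way (a sketch, not from the thread) is to encode each dictionary with the bson module that ships with pymongo and sum the encoded lengths - that reflects the actual BSON size MongoDB sees:
import bson

# dicts is assumed to be the list of dictionaries from the question
total = sum(len(bson.BSON.encode(doc)) for doc in dicts)
print total  # total encoded size in bytes; compare against the 16 MB limit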
Just had the same error and got around it by creating my own smaller batches, like this:
region_list = []
region_counter = 0
write_buffer = 1000

# loop through regions
for region in source_db.region.find({}, region_column):
    region_counter += 1  # increment the counter
    region_list.append(region)
    # save the batch once we reach the write buffer size
    if region_counter == write_buffer:
        result = user_db.region.insert(region_list)
        region_list = []
        region_counter = 0

# if there is a remainder, save that as well
if region_counter > 0:
    result = user_db.region.insert(region_list)
Hope this helps
NB: a small update: from pymongo 2.6 on, PyMongo will auto-split lists based on the max transfer size: "The insert() method automatically splits large batches of documents into multiple insert messages based on max_message_size".