What would be the best way to parse this file? - vb.net

I was just wondering if anyone knew of a good way that I could parse the file at the bottom of the post.
I have a database setup with the correct tables for each section eg Refferal Table,Caller Table,Location Table. Each table has the same columns that are show in the file below
I would really like something that is fairly genetic so if the file layout changes it won't mess me around to much. At the moment I am just reading the file in a line at a time and just using a case statement to check which section i'm in.
Is anyone able to help me with this?
PS. I am using VB but C# or anything else will be fine, also the x's in the document are just personal info I have blanked
Thanks,
Nathan
File:--->
DIAL BEFORE YOU DIG
Call 1100, Fax 1300 652 077
PO Box 7710 MELBOURNE, VIC 8004
Utilities are requested to respond within 2 working days and reference the Sequence number.
[REFFERAL DETAILS]
FROM= Dial Before You Dig - Web
TO= Technical Services
UTILITY ID= xxxxxx
COMPANY= {Company Name}
ENQUIRY DATE= 02/10/2008 13:53
COMMENCEMENT DATE= 06/10/2008
SEQUENCE NO= xxxxxxxxx
PLANNING= No
[CALLER DETAILS]
CUSTOMER ID= 403552
CONTACT NAME= {Name of Contact}
CONTACT HOURS= 0
COMPANY= Underground Utility Locating
ADDRESS= {Address}
SUBURB= {Suburb}
STATE= {State}
POSTCODE= 4350
TELEPHONE= xxxxxxxxxx
MOBILE= xxxxxxxxxx
FAX TYPE= Private
FAX NUMBER= xxxxxxxxxx
PUBLIC ADDRESS= xxxxxxxxxx
PUBLIC TELEPHONE=
EMAIL ADDRESS= {Email Address}
[LOCATION DETAILS]
ADDRESS= {Location Address}
SUBURB= {Location Suburb}
STATE= xxx
POSTCODE= xxx
DEPOSITED PLAN NO= 0
SECTION & HUNDRED NO= 0
PROPERTY PHONE NO=
SIDE OF STREET= B
INTERSECTION= xxxxxx
DISTANCE= 0-200m B
ACTIVITY CODE= 15
ACTIVITY DESCRIPTION= xxxxxxxxxxxxxxxxxx
MAP TYPE= StateGrid
MAP REF= Q851_63
MAP PAGE=
MAP GRID 1=
MAP GRID 2=
MAP GRID 3=
MAP GRID 4=
MAP GRID 5=
GPS X COORD=
GPS Y COORD=
PRIVATE/ROAD/BOTH= B
TRAFFIC AFFECTED= No
NOTIFICATION NO= 3082321
MESSAGE= entire intersection of Allora-Clifton rd , Hillside
rd and merivale st
MOCSMESSAGE= Digsafe generated referral
Notice: Please DO NOT REPLY TO THIS EMAIL as it has been automatically generated and replies are not monitored. Should you wish to advise Dial Before You Dig of any issues with this enquiry, please Call 1100
(See attached file: 3082321_LLGDA94.GML)

Google has the answers, once you know that the file-format is called '.ini'
Edit: That is, it's an .ini plus some extra leading/trailing gunk.

You could read each line of the file sequentially. Each line is essentially a name value pair. Place each value in a map (hash table) keyed by name. Use a map for each section. When done parsing the file you'll have maps containing all the name value pairs. Iterate over each map and populate your database tables.

I would head to Python for any type of string parsing like this. I'm not sure how much of this information you want to retain, but I would perhaps use Python's split() function to split on = to get rid of the equals sign, then strip the whitespace out of the second piece of the pie.
First, I would mask out the header/footer info I know I don't need, then do something akin to the following:
Let's take a chunk and save it in test1.txt:
ADDRESS= {Location Address}
SUBURB= {Location Suburb}
STATE= xxx
POSTCODE= xxx
DEPOSITED PLAN NO= 0
SECTION & HUNDRED NO= 0
PROPERTY PHONE NO=
Here's a small python snippet:
>>> f = open("test1.txt", "r")
>>> l = f.readlines()
>>> l = [line.split('=') for line in l]
>>> for line in l:
print line
['ADDRESS', '{Location Address}']
['SUBURB', '{Location Suburb}']
['STATE', 'xxx']
['POSTCODE', 'xxx']
['DEPOSITED PLAN NO', '0']
['SECTION & HUNDRED NO', '0']
['PROPERTY PHONE NO', '']
This would essentially give you a [Column, Value] tuple you could use to insert the data into your database (after escaping all strings, etc etc, SQL Injection warning).
This is assuming the email input and your DB will have the same column names, but if they didn't, it'd be fairly trivial to set up a column mapping using a dictionary. On the flip side, if the email and columns are in sync, you don't need to know the names of the columns to get the parsing down.
You could iterate through the pseudo-dictionary and print out each key-value pair in the right spot in your parameterized sql string.
Hope this helps!
Edit: While this is in Python, C#/VB.net should have the same/similar abilities.

Using f As StreamReader = File.OpenText("sample.txt")
Dim g As String = "undefined"
Do
Dim s As String = f.ReadLine
If s Is Nothing Then Exit Do
s = s.Replace(Chr(9), " ")
If s.StartsWith("[") And s.EndsWith("]") Then
g = s.Substring("[".Length, s.Length - "[]".Length)
Else
Dim ss() As String = s.Split(New Char() {"="c}, 2)
If ss.Length = 2 Then
Console.WriteLine("{0}.{1}={2}", g, Trim(ss(0)), Trim(ss(1)))
End If
End If
Loop
End Using

Related

Convert a spool into text format

I want to send the spool generated by a Smart Form, by email as attachment in TXT format.
The issue is to get the spool in a TXT format, without technical stuff, just the characters in the form.
I have used the function module RSPO_RETURN_SPOOLJOB for getting it, but it returns a technical format like this:
//XHPLJIIID 0700 00000+00000+
IN01ES_CA930_DEMO_3 FIRST
OPINCH12 P 144 240 1728020160000010000100001
IN02MAIN
MT0100808400
CP11000000E
FCCOURIER 120 00144 SF001SF001110000144E
UL +0000000000000
ST0201614Dear Customer,
MT0214209000
ST0864060We would like to take this opportunity to confirm the flight
MT0100809360
ST0763253reservations listed below. Thank you for your custom.
...
I want something as follows, without the technical stuff:
Dear Customer,
We would like to take this opportunity to confirm the flight
reservations listed below. Thank you for your custom.
...
This is the code I have used :
PARAMETERS spoolnum type TSP01-RQIDENT.
DATA spool_contents type soli_tab.
CALL FUNCTION 'RSPO_RETURN_SPOOLJOB'
exporting
rqident = spoolnum
tables
buffer = spool_contents
exceptions
others = 1.
If the parameter DESIRED_TYPE is not passed or has the value 'OTF', and the spool is of type SAPscript/Smart Form, the function module returns the technical format you have experienced.
Instead, you should use the parameter DESIRED_TYPE = 'RAW' so that all the technical stuff is interpreted and the form is returned as text, the way you request, as follows :
CALL FUNCTION 'RSPO_RETURN_SPOOLJOB'
exporting
rqident = spoolnum
desired_type = 'RAW'
tables
buffer = spool_contents
exceptions
others = 1.

pseudo randomization in loop PsychoPy

I know other people have asked similar questions in past but I am still stuck on how to solve the problem and was hoping someone could offer some help. Using PsychoPy, I would like to present different images, specifically 16 emotional trials, 16 neutral trials and 16 face trials. I would like to pseudo randomize the loop such that there would not be more than 2 consecutive emotional trials. I created the experiment in Builder but compiled a script after reading through previous posts on pseudo randomization.
I have read the previous posts that suggest creating randomized excel files and using those, but considering how many trials I have, I think that would be too many and was hoping for some help with coding. I have tried to implement and tweak some of the code that has been posted for my experiment, but to no avail.
Does anyone have any advice for my situation?
Thank you,
Rae
Here's an approach that will always converge very quickly, given that you have 16 of each type and only reject runs of more than two emotion trials. #brittUWaterloo's suggestion to generate trials offline is very good--this what I do myself typically. (I like to have a small number of random orders, do them forward for some subjects and backwards for others, and prescreen them to make sure there are no weird or unintended juxtapositions.) But the algorithm below is certainly safe enough to do within an experiment if you prefer.
This first example assumes that you can represent a given trial using a string, such as 'e' for an emotion trial, 'n' neutral, 'f' face. This would work with 'emo', 'neut', 'face' as well, not just single letters, just change eee to emoemoemo in the code:
import random
trials = ['e'] * 16 + ['n'] * 16 + ['f'] * 16
while 'eee' in ''.join(trials):
random.shuffle(trials)
print trials
Here's a more general way of doing it, where the trial codes are not restricted to be strings (although they are strings here for illustration):
import random
def run_of_3(trials, obj):
# detect if there's a run of at least 3 objects 'obj'
for i in range(2, len(trials)):
if trials[i-2: i+1] == [obj] * 3:
return True
return False
tr = ['e'] * 16 + ['n'] * 16 + ['f'] * 16
while run_of_3(tr, 'e'):
random.shuffle(tr)
print tr
Edit: To create a PsychoPy-style conditions file from the trial list, just write the values into a file like this:
with open('emo_neu_face.csv', 'wb') as f:
f.write('stim\n') # this is a 'header' row
f.write('\n'.join(tr)) # these are the values
Then you can use that as a conditions file in a Builder loop in the regular way. You could also open this in Excel, and so on.
This is not quite right, but hopefully will give you some ideas. I think you could occassionally get caught in an infinite cycle in the elif statement if the last three items ended up the same, but you could add some sort of a counter there. In any case this shows a strategy you could adapt. Rather than put this in the experimental code, I would generate the trial sequence separately at the command line, and then save a successful output as a list in the experimental code to show to all participants, and know things wouldn't crash during an actual run.
import random as r
#making some dummy data
abc = ['f']*10 + ['e']*10 + ['d']*10
def f (l1,l2):
#just looking at the output to see how it works; can delete
print "l1 = " + str(l1)
print l2
if not l2:
#checks if second list is empty, if so, we are done
out = list(l1)
elif (l1[-1] == l1[-2] and l1[-1] == l2[0]):
#shuffling changes list in place, have to copy it to use it
r.shuffle(l2)
t = list(l2)
f (l1,t)
else:
print "i am here"
l1.append(l2.pop(0))
f(l1,l2)
return l1
You would then run it with something like newlist = f(abc[0:2],abc[2:-1])

How to handle Zim Error "The ZIM tree pool has overflowed"

I have the following code:
set output spoole
select * from displays where displayname='dsp020a'
select * from forms where formname in (select formname from displayforms where displayname='dsp020a')
select * from formfields where formname in (select formname from displayforms where displayname='dsp020a')
The third select is crashing ZIM with the following error:
*** ZIM System Error *** The Zim tree pool has overflowed. Type BYE to exit from Zim.
What am I doing wrong and how can I fix it?
When trying to use SQL in ZIM, I also saw problems and unexpected errors. You should always try to use the native ZIM 4GL commands to access data as well as object definitions.
ZIM Data Dictionary contains some predefined internal relations which can help analyse the data model. For example, you could say:
list all Displays DispDispForms DisplayForms where Displays.DisplayName = "dsp020a"
to find out which forms are contained in the given display. Likewise you could do:
list all Forms FormFormFields FormFields where Forms.FormName in ("f020a", "f020b", "f020c")
to list all form fields which belong to the given forms. Unfortunately, there is no relation between DisplayForms and Forms, so you cannot achieve directly what you tried in your example using SQL.
OR (added after your comment):
You can achieve that using a small program. For this example, it would be:
set output output_file
find Displays DispDispForms DisplayForms where Displays.DisplayName = "dsp020a"
while $setcount > 0
let vStr = DisplayForms.FormName
list all Forms FormFormFields FormFields where Forms.FormName = vStr
let $setcount = $setcount - 1
next
endwhile
set output terminal
Now you have all form fields which belong to all forms of the given display listed in the output_file.
There probably already is a relationship between dfs and forms in the db schema but the documentation is poor. However, you can create your own. Note that dfs is a role for displayforms.
add 1 rels let relname = 'dfsforms' relcondition = 'dfs.formname = forms.formname' reltype = 'ZIM' dirname = 'ZIM'
create rel dfsforms
Now you can find forms related to displays using:
find all displays dispdispforms dfs dfsforms forms formformformfields wh displays.displayname = 'dsp020a'

Hadoop Pig - Replace strings in a relation with their corresponding values in a map

I have a relation called conversations_grouped made up of bags of tuples of varying sizes, like so:
DUMP conversations_grouped:
...
({(L194),(L195),(L196),(L197)})
({(L198),(L199)})
({(L200),(L201),(L202),(L203)})
({(L204),(L205),(L206)})
({(L207),(L208)})
({(L271),(L272),(L273),(L274),(L275)})
({(L276),(L277)})
({(L280),(L281)})
({(L363),(L364)})
({(L365),(L366)})
({(L666256),(L666257)})
({(L666369),(L666370),(L666371),(L666372)})
({(L666520),(L666521),(L666522)})
Each L[0-9]+ is a tag corresponding to a string. For example, L194 might be "Hello, how are you doing?" and L195 might be "fine, how are you?". This correspondence is maintained by a map called line_map. Here's a sample:
DUMP line_map;
...
([L666324#Do you think she might be interested in someone?])
([L666264#Well that's typical of Her Majesty's army. Appoint an engineer to do a soldier's work.])
([L666263#Um. There are rumours that my Lord Chelmsford intends to make Durnford Second in Command.])
([L666262#Lighting COGHILL' 5 cigar: Our good Colonel Dumford scored quite a coup with the Sikali Horse.])
([L666522#So far only their scouts. But we have had reports of a small Impi farther north, over there. ])
([L666521#And I assure you, you do not In fact I'd be obliged for your best advice. What have your scouts seen?])
([L666520#Well I assure you, Sir, I have no desire to create difficulties. 45])
([L666372#I think Chelmsford wants a good man on the border Why he fears a flanking attack and requires a steady Commander in reserve.])
([L666371#Lord Chelmsford seems to want me to stay back with my Basutos.])
([L666370#I'm to take the Sikali with the main column to the river])
([L666369#Your orders, Mr Vereker?])
([L666257#Good ones, yes, Mr Vereker. Gentlemen who can ride and shoot])
([L666256#Colonel Durnford... William Vereker. I hear you 've been seeking Officers?])
What I'm trying to do now is parse through each line and replace the L[0-9]+ tags with their corresponding text from line_map. Is it possible to make references to line_map from within a Pig FOREACH statement, or is there something else I have to do?
The first issue with this is that in a map the key must be a quoted string. So you can't use a schema value to access the map. E.G. This will not work.
C: {foo: chararray, M: [value:chararray]}
D = FOREACH C GENERATE M#foo ;
The solution that comes to mind is to FLATTEN conversations_grouped. Then do a join between conversations_grouped and line_map on the L[0-9]+ tag. You'll probably want to project out some of the extra fields (like the L[0-9]+ tag after the join) to make the next step faster. After that you'll have to regroup the data, and massage it into the correct format.
This won't work unless each bag has it's own unique ID for the regrouping, but if each of the L[0-9]+ tags appear in only one bag (conversation) you can use this to create a unique id.
-- A is dumped conversations_grouped
B = FOREACH A {
-- Pulls out an element from the bag to use as the id
id = LIMIT tags 1 ;
-- Flattens B into id, tag form. Each group of tags will have the same id.
GENERATE FLATTEN(id), FLATTEN(tags) ;
}
The schema and output for B is:
B: {id: chararray,tags::tag: chararray}
(L194,L194)
(L194,L195)
(L194,L196)
(L194,L197)
(L198,L198)
(L198,L199)
(L200,L200)
(L200,L201)
(L200,L202)
(L200,L203)
(L204,L204)
(L204,L205)
(L204,L206)
(L207,L207)
(L207,L208)
(L271,L271)
(L271,L272)
(L271,L273)
(L271,L274)
(L271,L275)
(L276,L276)
(L276,L277)
(L280,L280)
(L280,L281)
(L363,L363)
(L363,L364)
(L365,L365)
(L365,L366)
(L666256,L666256)
(L666256,L666257)
(L666369,L666369)
(L666369,L666370)
(L666369,L666371)
(L666369,L666372)
(L666520,L666520)
(L666520,L666521)
(L666520,L666522)
Assuming that the tags are unique, the rest is done like:
-- A2 is line_map, loaded in tag/message pairs instead of a map
-- Joins conversations_grouped and line_map on tag
C = FOREACH (JOIN B by tags::tag, A2 by tag)
-- This generate removes the tag
GENERATE id, message ;
-- Regroups C on the id created in B
D = FOREACH (GROUP C BY id)
-- This step limits the output to just messages
GENERATE C.(message) AS messages ;
Schema and output from D:
D: {messages: {(A2::message: chararray)}}
({(Colonel Durnford... William Vereker. I hear you 've been seeking Officers?),(Good ones, yes, Mr Vereker. Gentlemen who can ride and shoot)})
({(Your orders, Mr Vereker?),(I'm to take the Sikali with the main column to the river),(Lord Chelmsford seems to want me to stay back with my Basutos.),(I think Chelmsford wants a good man on the border Why he fears a flanking attack and requires a steady Commander in reserve.)})
({(Well I assure you, Sir, I have no desire to create difficulties. 45),(And I assure you, you do not In fact I'd be obliged for your best advice. What have your scouts seen?),(So far only their scouts. But we have had reports of a small Impi farther north, over there. )})
NOTE: If at worst, (the L[0-9]+ tags aren't unique) you can give each line of the input file(s) a sequential, integer id before you load it into pig.
UPDATE: If you are using pig 0.11, then you can also use the RANK operator.

Importing timeseries datasets to MATLAB (all values are displayed as NaN)

I am stuck trying to run an economic model using MATLAB - at the data importing part. For most of my code I'm using a freeware toolbox called IRIS.
I have quarterly dataset with 14 variables and 160 datapoints. Essentially the dataset is a 15X161 matrix- including the dates(col1) and variable names(B1:O1).
The command used for uploading data on IRIS is
d = dbload('filename.csv')
but this isn't working. Although MATLAB is creating a 1X1 array called d and creating fields under it (one for each variable). All cells display NaN - not a number.
Why is this happening?
I checked the tutorials on the IRIS toolbox website and tried running and loading a sample dataset from there using this command, but it leads to the same problem. Everywhere I checked- including MATLAB help, this seems to be the correct command to use when using IRIS, but somehow it isn't working.
I also tried uploading the data directly using MATLAB functions and not IRIS. The command I'm using is:
d = dataset('XLSFile','filename.xls','ReadVarNames', true).
Although this is working, and I can see all the variable names, but MATLAB can't read the dates. I tried xlsread and importdata as well, but they don't read the variable names. Is there any way for me to upload the entire Excel sheet with the variable names and dates?
It would be best if I could get the IRIS command to work, since the rest of my code would be compatible with that.
The dataset looks somewhat like this..
HO_GDP HO_CPI HO_CPI HO_RS HO_ER HO_POIL....
4/1/1970 82.33 85.01 55.00 99.87 08.77
7/1/1970 54.22 8.98 25.22 95.11 91.77
10/1/1970 85.41 85.00 85.22 95.34 55.00
1/1/1971 85.99 899 8.89 85.1
You can use the TEXTSCAN function to read the CSV file in MATLAB:
%# some options
numCols = 15; %# number of columns
opts = {'Delimiter',',', 'MultipleDelimsAsOne',true, 'CollectOutput',true};
%# open file for reading
fid = fopen('filename.csv','rt');
%# read header line
headers = textscan(fid, repmat('%s',1,numCols), 1, opts{:});
%# read rest of data rows
%# 1st column as string, the other 14 as floating point
data = textscan(fid, ['%s' repmat('%f',1,numCols-1)], opts{:});
%# close file
fclose(fid);
%# collect data
headers = headers{1};
data = [datenum(data{1},'mm/dd/yyyy') data{2}];
The result for the above sample you posted (assuming values are comma-separated):
>> headers
headers =
'HO_GDP' 'HO_CPI' 'HO_CPI' 'HO_RS' 'HO_ER' 'HO_POIL'
>> data
data =
7.1962e+05 82.33 85.01 55 99.87 8.77
7.1971e+05 54.22 8.98 25.22 95.11 91.77
7.198e+05 85.41 85 85.22 95.34 55
7.1989e+05 85.99 899 8.89 85.1 0
Note how in the last line of the code we convert the date column to serial date number, so that we can store the entire data in one numeric matrix. You can always go back to string representation of dates using DATESTR function:
>> datestr(data(:,1))
ans =
01-Apr-1970
01-Jul-1970
01-Oct-1970
01-Jan-1971