Formatting lists in erlang - formatting

I am very new to Erlang and am trying to print a list to the console.
This is what I have managed so far, and now I am stuck.
I'm trying this out in the erl shell.
>List = [{"a",20},{"b", 30}].
[{"a",20},{"b",30}]
>lists:foreach( fun(H) -> io:format("~p~n", [H]) end, List).
{"a",20}
{"b",30}
I would like to format each element of the list so that the output has the form
"a" - 20
"b" - 30
I don't know how to access the tuples inside the list and format them the way I want. Any help would be greatly appreciated.

Each element of the list is a two-element tuple, so we can pattern match on the structure of the fun's argument and bind both parts:
1> List = [{"a",20},{"b", 30}].
[{"a",20},{"b",30}]
2> lists:foreach(fun({A, B}) -> io:format("~p - ~p~n", [A, B]) end, List).
"a" - 20
"b" - 30
ok

Related

How to count each of the occurrences of a term PER DOCUMENT in apache SOLR?

Hello! I need to get the number of occurrences of a search term for each document stored in Solr (I have indexed PDF documents).
That is, if you have the following information:
document A: 3 times the word "house" appears.
document B: the word "house" appears 4 times.
document C: the word "house" appears 1 time.
So, if I search for "house", I need to find out that it appears 3 times in document A, 4 times in B and 1 time in C, for a total of 8 times (3 + 4 + 1). How can I do this with an HTTP query, that is, "http://localhost:8983/solr/......."?
Thank you very much, Regards.
I am assuming you indexed the whole document into one field (e.g. text). In that case, you can use the termfreq function query to return the number of times your term appears in that field.
There are several ways to use a function query, including using it as a pseudo-field by just putting it in the fl field list:
http://localhost:8983/solr/corename/select?fl=*,termfreq(text,"house")&q=house
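As a rough sketch of how a client might use this, the Python below builds the same kind of URL and adds up the per-document counts to get the overall total. The alias "freq" in the fl parameter, the helper names, and the sample documents are assumptions made for illustration; the sample values are just the counts from the question, not real Solr output. Note that only the documents actually returned (see the rows parameter) would be included in the sum.

```python
from urllib.parse import urlencode

def build_termfreq_url(base, field, term):
    """Build a select URL that returns termfreq as a pseudo-field aliased to 'freq'."""
    params = {
        "q": term,
        # 'freq:' is a field alias chosen for this sketch
        "fl": '*,freq:termfreq({},"{}")'.format(field, term),
        "wt": "json",
    }
    return base + "?" + urlencode(params)

def total_occurrences(docs):
    """Sum the per-document term frequencies found under the 'freq' key."""
    return sum(doc.get("freq", 0) for doc in docs)

url = build_termfreq_url("http://localhost:8983/solr/corename/select", "text", "house")

# Hypothetical documents, shaped like the 'docs' list of a JSON response,
# using the counts from the question (A=3, B=4, C=1).
sample_docs = [
    {"id": "A", "freq": 3},
    {"id": "B", "freq": 4},
    {"id": "C", "freq": 1},
]
total = total_occurrences(sample_docs)
print(url)
print(total)
```

Summing on the client keeps the query itself simple; whether that is acceptable depends on how many documents match.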

pseudo randomization in loop PsychoPy

I know other people have asked similar questions in the past, but I am still stuck on how to solve the problem and was hoping someone could offer some help. Using PsychoPy, I would like to present different images, specifically 16 emotional trials, 16 neutral trials and 16 face trials. I would like to pseudo-randomize the loop such that there are never more than 2 consecutive emotional trials. I created the experiment in Builder but compiled a script after reading through previous posts on pseudo-randomization.
I have read the previous posts that suggest creating randomized Excel files and using those, but considering how many trials I have, I think that would be too many files, and I was hoping for some help with coding. I have tried to implement and tweak some of the code that has been posted for my experiment, but to no avail.
Does anyone have any advice for my situation?
Thank you,
Rae
Here's an approach that will always converge very quickly, given that you have 16 of each type and only reject runs of more than two emotion trials. @brittUWaterloo's suggestion to generate trials offline is very good; this is what I typically do myself. (I like to have a small number of random orders, do them forward for some subjects and backwards for others, and prescreen them to make sure there are no weird or unintended juxtapositions.) But the algorithm below is certainly safe enough to do within an experiment if you prefer.
This first example assumes that you can represent a given trial using a string, such as 'e' for an emotion trial, 'n' for neutral, 'f' for face. This would work with 'emo', 'neut', 'face' as well, not just single letters; just change 'eee' to 'emoemoemo' in the code:
import random

trials = ['e'] * 16 + ['n'] * 16 + ['f'] * 16
# the unshuffled list starts with a run of 'e', so the loop runs at least once
while 'eee' in ''.join(trials):
    random.shuffle(trials)
print(trials)
Here's a more general way of doing it, where the trial codes are not restricted to be strings (although they are strings here for illustration):
import random

def run_of_3(trials, obj):
    # detect if there's a run of at least 3 objects 'obj'
    for i in range(2, len(trials)):
        if trials[i-2: i+1] == [obj] * 3:
            return True
    return False

tr = ['e'] * 16 + ['n'] * 16 + ['f'] * 16
# reshuffle until no three consecutive 'e' trials remain
while run_of_3(tr, 'e'):
    random.shuffle(tr)
print(tr)
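If it helps, the same rejection idea can be wrapped in a reusable function with a configurable run length, an optional seed for reproducible orders, and a cap on the number of attempts. This is only a sketch: the names longest_run and constrained_shuffle, and the max_run, seed and max_tries parameters, are made up here and are not part of PsychoPy.

```python
import random
from itertools import groupby

def longest_run(trials, label):
    """Length of the longest run of consecutive `label` items (0 if none)."""
    return max((len(list(g)) for k, g in groupby(trials) if k == label), default=0)

def constrained_shuffle(trials, label, max_run=2, seed=None, max_tries=10000):
    """Shuffle a copy of `trials` until `label` never occurs more than
    `max_run` times in a row; give up after `max_tries` attempts."""
    rng = random.Random(seed)  # a private generator, so a seed gives a repeatable order
    order = list(trials)
    for _ in range(max_tries):
        rng.shuffle(order)
        if longest_run(order, label) <= max_run:
            return order
    raise RuntimeError("no valid order found in %d tries" % max_tries)

order = constrained_shuffle(['e'] * 16 + ['n'] * 16 + ['f'] * 16, 'e', max_run=2, seed=1)
print(order)
```

The attempt cap only matters for much tighter constraints than this one; with 16 trials of each type, a valid order should turn up within a handful of shuffles.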
Edit: To create a PsychoPy-style conditions file from the trial list, just write the values into a file like this:
# text mode ('w'), since we are writing strings rather than bytes
with open('emo_neu_face.csv', 'w') as f:
    f.write('stim\n')  # this is a 'header' row
    f.write('\n'.join(tr))  # these are the values
Then you can use that as a conditions file in a Builder loop in the regular way. You could also open this in Excel, and so on.
This is not quite right, but hopefully it will give you some ideas. I think you could occasionally get caught in an infinite cycle in the elif branch if the last three items ended up the same, but you could add some sort of counter there. In any case, this shows a strategy you could adapt. Rather than put this in the experimental code, I would generate the trial sequence separately at the command line, and then save a successful output as a list in the experimental code to show to all participants, so that you know things won't crash during an actual run.
import random as r

# making some dummy data
abc = ['f']*10 + ['e']*10 + ['d']*10

def f(l1, l2):
    # just looking at the output to see how it works; can delete
    print("l1 = " + str(l1))
    print(l2)
    if not l2:
        # second list is empty, so we are done
        pass
    elif l1[-1] == l1[-2] and l1[-1] == l2[0]:
        # the next item would make a run of three: reshuffle the remainder and retry
        # shuffling changes the list in place, so copy it before recursing
        r.shuffle(l2)
        t = list(l2)
        f(l1, t)
    else:
        print("i am here")
        l1.append(l2.pop(0))
        f(l1, l2)
    # l1 is extended in place by the recursive calls
    return l1
You would then run it with something like newlist = f(abc[0:2], abc[2:])

Pig - comparing two similar statement : one working, the other not

I am getting really annoyed with Pig: the language does not seem stable, the documentation is poor, there are not many examples on the internet, and any small change in the code can make a radical difference, from failure to the expected result. Here is another example of that last point:
grunt> describe actions_by_unite;
actions_by_unite: {
    group: chararray,
    nb_actions_by_unite_and_action: {
        (
            unite: chararray,
            lib_type_action: chararray,
            double
        )
    }
}
-- works:
z = foreach actions_by_unite {
    generate group, SUM(nb_actions_by_unite_and_action.$2);};
-- doesn't work:
z = foreach actions_by_unite {
    x = SUM(nb_actions_by_unite_and_action.$2);
    generate group, x;};
-- error :
2015-05-08 14:43:44,712 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<line 107, column 16> Invalid scalar projection: x : A column needs to be projected from a relation for it to be used as a scalar
Details at logfile: /private/tmp/pig-err.log
And likewise:
-- doesn't work either:
z = foreach actions_by_unite { x = SUM(nb_actions_by_unite_and_action.$2);
    generate group, x.$0;};
-- error:
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (AC,EMAIL,1.1186133550060547E-4), 2nd :(AC,VISITE,6.25755280560356E-4)
at org.apache.pig.impl.builtin.ReadScalars.exec(ReadScalars.java:120)
Does anyone know why?
Do you have any good blogs or resources with examples to recommend for mastering this language?
I have the O'Reilly book, but it seems a bit old; I also have 'Agile Data Science' and 'Hadoop: The Definitive Guide', which have some examples in them. I found this page really interesting: https://shrikantbang.wordpress.com/2014/01/14/apache-pig-group-by-nested-foreach-join-example/
Are there any good videos on Coursera, or other material? Do you also have problems with this language, or am I simply dumb?
This particular behavior is not due to Pig being unstable: what you are doing in the first approach is correct, but the other two are wrong.
When you do a GROUP BY, each group has a bag containing its tuples. Inside a nested foreach, each iteration works on one group and its bag, which means that a SUM in there yields a scalar value: the sum of the bag you are currently working with. Apache Pig does not work with scalars, it works with relations, so you cannot assign a scalar value to an alias, which is exactly what you are doing in the second and third approaches.
Therefore, the error comes from attempting something like:
A = foreach B {
    x = SUM(bag.$0);
}
However, if you want to emit a scalar for each of the groups, you can certainly do so as long as you never assign the scalar to an alias. That is why it works if you do the sum in the generate at the end of the foreach: you are returning, for each group, a tuple with two values, the group and the sum.
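To make the shape of the working version concrete, here is a plain Python analogy (not Pig) of what "generate group, SUM(bag.$2)" computes. The two rows are taken from the error message in the question; treating them as belonging to the same group "AC" is an assumption, and the variable names are made up for this sketch.

```python
from collections import defaultdict

# (unite, lib_type_action, value) rows, copied from the error message in the question
rows = [
    ("AC", "EMAIL", 1.1186133550060547e-4),
    ("AC", "VISITE", 6.25755280560356e-4),
]

# GROUP BY unite: each group key maps to a "bag" (here a list) of tuples
bags = defaultdict(list)
for row in rows:
    bags[row[0]].append(row)

# foreach ... generate group, SUM(bag.$2): the sum is computed and emitted
# once per group, as part of the output tuple (group, sum)
z = [(group, sum(t[2] for t in bag)) for group, bag in bags.items()]
print(z)
```

The point of the analogy is only that the sum exists as a field of each output tuple, one per group, rather than as a standalone named value.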

Using FILTER in a nested FOREACH in Pig

I have two Pig relations. The first one, count_pairs, holds pairs of words and how many times each pair was seen, e.g. ((car,tire), 4). The second is word_counts, which keeps track of how many times each word was seen, e.g. (car, 20). I would like to find the ratio of how many times each pair was seen to how many times just the first word was seen. In our case I would want ((car,tire), 4/20). I tried to write a nested foreach to solve this problem:
> percent_count_pairs = FOREACH count_pairs {
> denom = FILTER word_counts BY ($0 ==count_pairs.pair.word1);
> GENERATE pair, count2/(double)denom.$1;}
I keep getting this error:
'Pig script failed to parse:
<file src/cluster.pig, line 27, column 15> expression is not a project expression: (Name: ScalarExpression) Type: null Uid: null)'
This points to the line with the FILTER; googling this error did not lead me to anything helpful. Please help!
(P.S. This does work if I take the line with the FILTER out of the foreach...)
After more googling I came to realize that this is a bug in Pig that will not allow this:
https://issues.apache.org/jira/browse/PIG-1798. I ended up writing my own UDF to filter.
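For reference, the computation the question is after amounts to looking up each pair's first word in the word counts, which in Pig terms would correspond to joining the two relations on that word rather than filtering inside the nested FOREACH. A minimal Python sketch of that logic (not Pig), using the example values from the question and hypothetical dictionary names:

```python
# pair -> number of times the pair was seen (example from the question)
count_pairs = {("car", "tire"): 4}
# word -> number of times the word was seen (example from the question)
word_counts = {"car": 20}

# For each pair, divide its count by the count of its first word.
# Pairs whose first word has no entry in word_counts are skipped.
percent_count_pairs = {
    pair: count / float(word_counts[pair[0]])
    for pair, count in count_pairs.items()
    if pair[0] in word_counts
}
print(percent_count_pairs)
```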

What would be the best way to parse this file?

I was just wondering if anyone knew of a good way that I could parse the file at the bottom of the post.
I have a database set up with the correct tables for each section, e.g. Refferal table, Caller table, Location table. Each table has the same columns that are shown in the file below.
I would really like something that is fairly generic, so that if the file layout changes it won't mess me around too much. At the moment I am just reading the file in a line at a time and using a case statement to check which section I'm in.
Is anyone able to help me with this?
P.S. I am using VB, but C# or anything else will be fine. Also, the x's in the document are just personal info I have blanked out.
Thanks,
Nathan
File:--->
DIAL BEFORE YOU DIG
Call 1100, Fax 1300 652 077
PO Box 7710 MELBOURNE, VIC 8004
Utilities are requested to respond within 2 working days and reference the Sequence number.
[REFFERAL DETAILS]
FROM= Dial Before You Dig - Web
TO= Technical Services
UTILITY ID= xxxxxx
COMPANY= {Company Name}
ENQUIRY DATE= 02/10/2008 13:53
COMMENCEMENT DATE= 06/10/2008
SEQUENCE NO= xxxxxxxxx
PLANNING= No
[CALLER DETAILS]
CUSTOMER ID= 403552
CONTACT NAME= {Name of Contact}
CONTACT HOURS= 0
COMPANY= Underground Utility Locating
ADDRESS= {Address}
SUBURB= {Suburb}
STATE= {State}
POSTCODE= 4350
TELEPHONE= xxxxxxxxxx
MOBILE= xxxxxxxxxx
FAX TYPE= Private
FAX NUMBER= xxxxxxxxxx
PUBLIC ADDRESS= xxxxxxxxxx
PUBLIC TELEPHONE=
EMAIL ADDRESS= {Email Address}
[LOCATION DETAILS]
ADDRESS= {Location Address}
SUBURB= {Location Suburb}
STATE= xxx
POSTCODE= xxx
DEPOSITED PLAN NO= 0
SECTION & HUNDRED NO= 0
PROPERTY PHONE NO=
SIDE OF STREET= B
INTERSECTION= xxxxxx
DISTANCE= 0-200m B
ACTIVITY CODE= 15
ACTIVITY DESCRIPTION= xxxxxxxxxxxxxxxxxx
MAP TYPE= StateGrid
MAP REF= Q851_63
MAP PAGE=
MAP GRID 1=
MAP GRID 2=
MAP GRID 3=
MAP GRID 4=
MAP GRID 5=
GPS X COORD=
GPS Y COORD=
PRIVATE/ROAD/BOTH= B
TRAFFIC AFFECTED= No
NOTIFICATION NO= 3082321
MESSAGE= entire intersection of Allora-Clifton rd , Hillside
rd and merivale st
MOCSMESSAGE= Digsafe generated referral
Notice: Please DO NOT REPLY TO THIS EMAIL as it has been automatically generated and replies are not monitored. Should you wish to advise Dial Before You Dig of any issues with this enquiry, please Call 1100
(See attached file: 3082321_LLGDA94.GML)
Google has the answers, once you know that the file-format is called '.ini'
Edit: That is, it's an .ini plus some extra leading/trailing gunk.
You could read each line of the file sequentially. Each line is essentially a name value pair. Place each value in a map (hash table) keyed by name. Use a map for each section. When done parsing the file you'll have maps containing all the name value pairs. Iterate over each map and populate your database tables.
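A minimal Python sketch of that idea, with one dictionary per section. The function name parse_sections and the shortened sample are made up for illustration. It assumes that a line without "=" inside a section continues the previous value (as in the MESSAGE field), so trailing footer text such as the "Notice:" paragraph would need to be trimmed off first, and a continuation line that happens to contain "=" would be misread as a new key.

```python
def parse_sections(lines):
    """Return {section name: {key: value}} for an .ini-like list of lines."""
    sections = {}
    current = None   # dict for the section being read, None before the first one
    last_key = None  # last key seen, used for wrapped (continuation) lines
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # section header such as [CALLER DETAILS]
            current = sections.setdefault(line[1:-1], {})
            last_key = None
        elif current is None:
            continue  # header text before the first section is ignored
        elif "=" in line:
            key, value = line.split("=", 1)  # split on the first "=" only
            last_key = key.strip()
            current[last_key] = value.strip()
        elif last_key is not None:
            # no "=": treat as the continuation of the previous value
            current[last_key] = (current[last_key] + " " + line).strip()
    return sections

sample = """DIAL BEFORE YOU DIG
[CALLER DETAILS]
CUSTOMER ID= 403552
PUBLIC TELEPHONE=
[LOCATION DETAILS]
POSTCODE= xxx
MESSAGE= entire intersection of Allora-Clifton rd , Hillside
rd and merivale st
MOCSMESSAGE= Digsafe generated referral
""".splitlines()

parsed = parse_sections(sample)
for section, values in parsed.items():
    print(section, values)
```

Because the section and key names come from the file itself, a new key or section shows up in the result without any change to the parser; only the step that maps them onto database tables needs to know the layout.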
I would head to Python for any type of string parsing like this. I'm not sure how much of this information you want to retain, but I would perhaps use Python's split() function to split on = to get rid of the equals sign, then strip the whitespace out of the second piece of the pie.
First, I would mask out the header/footer info I know I don't need, then do something akin to the following:
Let's take a chunk and save it in test1.txt:
ADDRESS= {Location Address}
SUBURB= {Location Suburb}
STATE= xxx
POSTCODE= xxx
DEPOSITED PLAN NO= 0
SECTION & HUNDRED NO= 0
PROPERTY PHONE NO=
Here's a small python snippet:
>>> f = open("test1.txt", "r")
>>> l = f.readlines()
>>> l = [[part.strip() for part in line.split('=', 1)] for line in l]
>>> for line in l:
...     print(line)
...
['ADDRESS', '{Location Address}']
['SUBURB', '{Location Suburb}']
['STATE', 'xxx']
['POSTCODE', 'xxx']
['DEPOSITED PLAN NO', '0']
['SECTION & HUNDRED NO', '0']
['PROPERTY PHONE NO', '']
This essentially gives you a list of [Column, Value] pairs you could use to insert the data into your database (after escaping all strings, or better, using parameterized queries; SQL injection warning).
This is assuming the email input and your DB will have the same column names, but if they didn't, it'd be fairly trivial to set up a column mapping using a dictionary. On the flip side, if the email and columns are in sync, you don't need to know the names of the columns to get the parsing down.
You could iterate through the pseudo-dictionary and print out each key-value pair in the right spot in your parameterized sql string.
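A sketch of that last step, using Python's built-in sqlite3 module purely as a stand-in for the real database. The table name "location", the column names, and the COLUMN_MAP dictionary are all hypothetical. Values are passed as query parameters; column names cannot be parameterized, so they are taken only from the fixed mapping and never from the file.

```python
import sqlite3

# Hypothetical mapping from keys in the file to database column names.
COLUMN_MAP = {"ADDRESS": "address", "SUBURB": "suburb", "POSTCODE": "postcode"}

def insert_pairs(conn, table, pairs):
    """Insert one row built from [key, value] pairs; unmapped keys are skipped."""
    known = [(COLUMN_MAP[k], v) for k, v in pairs if k in COLUMN_MAP]
    columns = ", ".join(col for col, _ in known)
    placeholders = ", ".join("?" for _ in known)
    # `table` and the column names come from our own code, not from the file
    sql = "INSERT INTO {} ({}) VALUES ({})".format(table, columns, placeholders)
    conn.execute(sql, [val for _, val in known])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE location (address TEXT, suburb TEXT, postcode TEXT)")

pairs = [
    ["ADDRESS", "{Location Address}"],
    ["SUBURB", "{Location Suburb}"],
    ["STATE", "xxx"],       # not in COLUMN_MAP, so it is ignored here
    ["POSTCODE", "xxx"],
]
insert_pairs(conn, "location", pairs)
row = conn.execute("SELECT address, suburb, postcode FROM location").fetchone()
print(row)
```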
Hope this helps!
Edit: While this is in Python, C#/VB.net should have the same/similar abilities.
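Imports System.IO

' Assumes the code below runs inside a Sub (e.g. Sub Main) of a console application.
Using f As StreamReader = File.OpenText("sample.txt")
    Dim g As String = "undefined" ' name of the current section
    Do
        Dim s As String = f.ReadLine()
        If s Is Nothing Then Exit Do ' end of file
        s = s.Replace(Chr(9), " ") ' tabs to spaces
        If s.StartsWith("[") AndAlso s.EndsWith("]") Then
            ' section header such as [CALLER DETAILS]: strip the brackets
            g = s.Substring(1, s.Length - 2)
        Else
            ' split on the first "=" only, so values may contain "="
            Dim ss() As String = s.Split(New Char() {"="c}, 2)
            If ss.Length = 2 Then
                Console.WriteLine("{0}.{1}={2}", g, Trim(ss(0)), Trim(ss(1)))
            End If
        End If
    Loop
End Using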