Associative arrays for grandmothers using awk - awk

I have a really hard time wrapping my head around arrays and associative arrays in awk.
Say you want to compare two different columns in two different files using associative arrays; how would you do it? Let's say column 1 in file 1 with column 2 in file 2, then print the matching, corresponding values of file 1 in a new column in file 2. Please explain each step really simply, as if talking to your grandmother, I mean super thoroughly and super simply.
Cheers

Simple explanation of associative arrays (aka maps), not specifically for awk:
Unlike a normal array, where each element has a numeric index, an associative array uses a "key" instead of an index. You can think of it as being like a simple flat-file database, where each record has a key and a value. So if you have, e.g. some salary data:
Fred 10000
John 12000
Sara 11000
you could store it in an associative array, a, like this:
a["Fred"] = 10000
a["John"] = 12000
a["Sara"] = 11000
and then when you wanted to retrieve a salary for a person you would just look it up using their name as the key, e.g.
salary = a[person]
You can of course modify the values too, so if you wanted to give Fred a 10% pay rise you could do it like this:
a["Fred"] = a["Fred"] * 1.1
And if you wanted to set Sara's salary to be the same as John's you could write:
a["Sara"] = a["John"]
So an associative array is just an array that maps keys to values. Note that the keys do not need to be strings, and the values do not need to be numeric, but the basic concept is the same regardless of the data types. Note also that one obvious constraint is that keys need to be unique.
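For comparison, here is a minimal sketch of the same salary operations using a Python dictionary, which is Python's associative array. The variable names are chosen for illustration only, and the 10% rise is done with integer arithmetic to avoid floating-point rounding.

```python
# An associative array mapping each person's name (key) to a salary (value).
salaries = {"Fred": 10000, "John": 12000, "Sara": 11000}

# Look up a salary using the person's name as the key.
person = "John"
salary = salaries[person]

# Give Fred a 10% pay rise (integer arithmetic keeps the value exact).
salaries["Fred"] = salaries["Fred"] * 11 // 10

# Set Sara's salary to be the same as John's.
salaries["Sara"] = salaries["John"]

print(salary)
print(salaries)
```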

Grandma - let's say you want to make jam out of strawberries, raspberries, and blueberries, one jar of each.
You have a shelf on your wall with room/openings for 3 jars on it. That shelf is an associative array: shelf[]
Stick a label named "strawberry" beneath any one of the 3 openings. That is the index of an associative array: shelf["strawberry"]
Now place a jar of strawberry jam in the opening above that label. That is the contents of the associative array indexed by the word "strawberry": shelf["strawberry"] = "the jar of strawberry jam"
Repeat for raspberry and blueberry.
When you feel like making yourself a delicious sandwich, go to your shelf (array), look for the label (index) named "strawberry" and pick up the jar sitting above it (contents/value), open and apply liberally to bread (preferably Mothers Pride end slices).
Now - if a wolf comes to the door, do not open it in case he steals your sandwich or worse!
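The two-file comparison from the question follows the same shelf idea: first label the shelf using file 1, then walk through file 2 and pick up the matching jar. Below is a rough sketch in Python rather than awk; the file contents are made-up sample data held in lists, and it assumes whitespace-separated columns.

```python
# Made-up stand-ins for the two files (one string per line).
file1_lines = ["apple red", "banana yellow", "cherry dark-red"]
file2_lines = ["1 banana", "2 apple", "3 kiwi"]

# Step 1: read file 1 and remember column 2 under the key found in column 1.
lookup = {}
for line in file1_lines:
    fields = line.split()
    lookup[fields[0]] = fields[1]

# Step 2: read file 2; if its column 2 is a known key, append the
# remembered value from file 1 as a new column.
joined = []
for line in file2_lines:
    fields = line.split()
    key = fields[1]
    if key in lookup:
        joined.append(line + " " + lookup[key])

for line in joined:
    print(line)
```

Lines of file 2 whose key never appeared in file 1 (here "kiwi") are simply skipped; printing them unchanged instead would be a one-line change.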

Related

Livecode: How do I program a button to create unique variables?

I apologize if this has been asked before (I couldn't find anything).
I'm an extreme noob in Livecode, and I want to know if there is a way of programming a button to create many new, unique variables and assign a value to them. I apologize if this is a dumb question.
Usually you use an array for that. An array is basically a list of things, where each thing is associated with an "index". An index can be any word, so you can use an array like a dictionary, where you'd e.g. have French words as the index, and English words as the value, like:
put "cow" into myDictionary["vache"]
But you can also just use numbers as the keys and make them a numbered list:
put "cow" into allMyAnimals[1]
put "duck" into allMyAnimals[2]
In effect, you create one variable and put several things in it. For example, if you had a loop that calculated something (in this example a number + 100) and you wanted to keep all those results, each stored under an index 100 less than its value, you'd do something like:
repeat with x = 1 to 250
put x +100 into twoHundredFiftyNumbersFrom101[x]
end repeat
And to read the first one:
answer "the first number is" && twoHundredFiftyNumbersFrom101[1]
Or all of them:
repeat with x = 1 to 250
answer twoHundredFiftyNumbersFrom101[x]
end repeat
Or whatever. You could also use 'do' to build the lines of code as a string, but then you have to make sure your variable names are generated so that they are valid identifiers (e.g. no spaces, no special characters). An array key, by contrast, can be any valid string; the compiler can optimize array access, and you can treat the array as a whole and pass it between handlers.
Or you can do this "in the clear" with a "do" construction:
on mouseUp
repeat with y = 1 to 10
get random(100)
do "put it into onTheFlyVariable" & y
end repeat
end mouseUp
Step through this handler and watch the variables assemble themselves.
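The same trade-off shows up in other languages. As a hedged illustration in Python (not LiveCode), a single dictionary can stand in for the ten generated variables; the key prefix "onTheFlyVariable" is borrowed from the handler above, and everything else is an assumption made for the sketch.

```python
import random

# One dictionary replaces ten separately named variables.
on_the_fly = {}
for y in range(1, 11):
    # The key plays the role of the generated variable name.
    on_the_fly["onTheFlyVariable" + str(y)] = random.randint(1, 100)

# The whole collection can be inspected or passed around as one value.
for name, value in on_the_fly.items():
    print(name, value)
```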

Xcode - Objective C- How to make a dictionary of persons?

I can't get my head around my Objective-C code. I want to make a dictionary of persons that I save in the app. The dictionary is empty at first, and then we need to add to it from the text fields I created (see image textfieldWelkom).
Every person that is added has to be shown in a list (see image lijstTabblad) in the table view controller. (The list has to show the whole name of the person.)
I don't get how to make a dictionary of multiple persons where every person has his own elements, or how I can get the elements back out of the dictionary to make the list, etc.
(Not like the one example, but with multiple values per person.)
I would be so thankful if you could help me!
Greetings,
Kevin
You first need to create a dictionary for a single person. Its keys
are things like name, surname, and age; set the value for each respective key.
Then add this dictionary to an array or to another dictionary (the containing array/dictionary must be mutable).
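The shape of that data structure is easier to see in code. This is a sketch in Python rather than Objective-C, so it only shows the structure (a mutable list of per-person dictionaries), not the Cocoa classes; the helper add_person and the sample people are hypothetical.

```python
# The mutable collection that holds every person; it starts out empty.
people = []

def add_person(people, name, surname, age):
    """Build one dictionary for a single person and append it to the list."""
    person = {"name": name, "surname": surname, "age": age}
    people.append(person)

# In the app these values would come from the text fields.
add_person(people, "Kevin", "Example", 21)
add_person(people, "Sara", "Sample", 30)

# Build the full names that the table view would display, one per row.
full_names = [p["name"] + " " + p["surname"] for p in people]
print(full_names)
```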

R - find name in string that matches a lookup field using regex

I have a data frame of ad listings for pets:
ID Ad_title
1 1 year old ball python
2 Young red Blood python. - For Sale
3 1 Year Old Male Bearded Dragon - For Sale
I would like to take the common name in the Ad_title (e.g. ball python) and create a new field with the Latin name for the species. To assist, I have another data frame that has the Latin names and common names:
ID Latin_name Common_name
1 Python regius E: Ball Python, Royal Python G: Königspython
2 Python brongersmai E: Red Blood Python, Malaysian Blood Python
3 Pogona barbata E: Eastern Bearded Dragon, Bearded Dragon
How can I go about doing this? The tricky part is that the common names are hidden in between text both in the ad listing and in the Common_name. If that were not the case I could just use %in%. If there was a way/function to use regex I think that would be helpful.
The other answer does a good job outlining the general logic, so here's a few thoughts on a simple (though not optimized!!) way to do this:
First, you'll want to make a big table with two columns: every 'common name' (each name gets its own row) alongside its Latin name. You could also make a dictionary here, but I like tables.
reference_table <- data.frame(common = c("cat", "kitty", "dog"), technical = c("feline", "feline", "canine"))
common technical
1 cat feline
2 kitty feline
3 dog canine
From here, just loop through every element of "ad_title" (use apply() or a for loop, depending on your preference). Now use something like this:
apply(reference_table, 1, function(X) {
  # apply() hands each row over as a character vector, so index it by name
  if (length(grep(X["common"], ad_title)) > 0) { # If the common name was found in the ad_title
    # [code to replace the string]
  }
})
For inserting the new string, play with your regular regex tools. Alternatively, play with strsplit(ad_title, X["common"]). You'll be able to rebuild the ad_title using paste() and the parts returned by strsplit().
Again, this is NOT the best way to do this, but hopefully the logic is simple.
Well, I tried to create a workable solution for your requirement. There could be better ways to execute it, though, probably using packages such as data.table and/or stringr. Anyway, this snippet could be a working starting point. Oh, and I modified the Ad_title data a bit so that the species names are in titlecase.
# Re-create data
Ad_title <- c("1 year old Ball Python", "Young Red Blood Python. - For Sale",
"1 Year Old Male Bearded Dragon - For Sale")
df2 <- data.frame(Latin_name = c("Python regius", "Python brongersmai", "Pogona barbata"),
Common_name = c("E: Ball Python, Royal Python G: Königspython",
"E: Red Blood Python, Malaysian Blood Python",
"E: Eastern Bearded Dragon, Bearded Dragon"),
stringsAsFactors = F)
# Aggregate common names
Common_name <- paste(df2$Common_name, collapse = ", ")
Common_name <- unlist(strsplit(Common_name, "(E: )|( G: )|(, )"))
Common_name <- Common_name[Common_name != ""]
# Data frame latin names vs common names
df3 <- data.frame(Common_name, Latin_name = sapply(Common_name, grep, df2$Common_name),
row.names = NULL, stringsAsFactors = F)
df3$Latin_name <- df2$Latin_name[df3$Latin_name]
# Data frame Ad vs common names
Ad_Common_name <- unlist(sapply(Common_name, grep, Ad_title))
df4 <- data.frame(Ad_title, Common_name = sapply(1:3, function(i) names(Ad_Common_name[Ad_Common_name==i])),
stringsAsFactors = F)
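To make the overall logic concrete (unpack the compound Common_name field, then look for each single name in the ad title), here is a sketch in Python rather than R. It reuses the sample data from the question; the splitting pattern, the case-insensitive matching, and the rule of preferring the longest matching name are assumptions of this sketch, not part of the answers here.

```python
import re

# Sample data from the question: Latin name -> packed common names.
latin_to_common = {
    "Python regius": "E: Ball Python, Royal Python G: Königspython",
    "Python brongersmai": "E: Red Blood Python, Malaysian Blood Python",
    "Pogona barbata": "E: Eastern Bearded Dragon, Bearded Dragon",
}
ad_titles = [
    "1 year old ball python",
    "Young red Blood python. - For Sale",
    "1 Year Old Male Bearded Dragon - For Sale",
]

# Unpack the compound field: split on language tags such as "E: " and on commas.
common_to_latin = {}
for latin, packed in latin_to_common.items():
    for name in re.split(r"\s*[A-Z]: |,\s*", packed):
        if name:
            common_to_latin[name.lower()] = latin

def latin_name_for(title):
    """Return the Latin name for the longest common name found in the title."""
    title = title.lower()
    matches = [common for common in common_to_latin if common in title]
    if not matches:
        return None
    return common_to_latin[max(matches, key=len)]

latin_names = [latin_name_for(title) for title in ad_titles]
print(latin_names)
```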
Obviously you need a loop over your common name lookup table and another loop that splits the compound field on commas before doing a simple regex match. There's no sane regex that will do it all.
In future, avoid using packed/compound structures that require packing and unpacking. They look fine for human consumption, but semantically, and for consumption by a program, you have multiple data values packed into a single field; i.e. what you have there is not a "common name" but "common names" delimited by commas.
Sorry that I haven't provided an R-specific answer. I'm a technology veteran and use many languages/technologies depending on the problem and available resources. You will need to iterate over every record of your Latin names lookup table, and within each record iterate over the comma-delimited packed field of "common names", so that you're working with one common name at a time. With that single common name you search/replace, using regex or whatever means are available to you, over the whole input file. You need to start from that end, i.e. the lookup table, and loop through it. Iteration/looping should be familiar to you, as it's a basic building block of any program or script. This kind of procedural logic is not part of the capability (or desired functionality) of regex itself. I assume you know how to create an iterative construct in R or whatever you're using for this.

File operation is slower is there a faster look up method in Python?

I am storing the values of the form given below into a file:
143 800 'Ask'
213 457 'Comment'
424 800 'Hi'
The first column contains unique elements here.
However, lookup on the values of the first column is quite inefficient when I store them in a file. Is there a more efficient way in Python for faster lookup?
I am aware of dictionaries in Python for accomplishing this, but I am looking for some other method, since the data I have consists of trillions of records and I therefore cannot keep them in a dictionary in RAM.
Also, with each program execution the rows are going to be inserted again in the case of databases; how do I overcome that? An example of what I am getting confused about with databases is given below:
143 800 'Ask'
213 457 'Comment'
424 800 'Hi'
143 800 'Ask'
213 457 'Comment'
424 800 'Hi'
Here's a full code example using sqlite3, showing how to initialise the database, put data into it, and get a single row of data out.
import sqlite3
conn = sqlite3.connect(':memory:')
conn.execute("""CREATE TABLE Widget (id INTEGER PRIMARY KEY,
serial_number INTEGER,
description TEXT);""")
my_data = [ [143, 800, 'Ask'],
[213, 457, 'Comment'],
[424, 800, 'Hi'] ]
for row in my_data:
conn.execute("INSERT INTO Widget (id, serial_number, description) VALUES (?,?,?);" , row )
conn.commit() # save changes
res = conn.execute("SELECT * FROM Widget WHERE id=143")
row = res.fetchone()  # Python 3: cursors have no .next() method
print(row)  # prints (143, 800, 'Ask')
Note the use of the special filename :memory: to open a temporary database.
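On the worry about rows being inserted again on every run: because id is the primary key, SQLite can be told to skip rows whose key already exists. The following is a sketch under that assumption, using SQLite's INSERT OR IGNORE; the helper name load is hypothetical, and calling it twice simulates two program runs against the same database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# IF NOT EXISTS lets the same start-up code run against an existing database.
conn.execute("""CREATE TABLE IF NOT EXISTS Widget (id INTEGER PRIMARY KEY,
                                                   serial_number INTEGER,
                                                   description TEXT)""")

my_data = [(143, 800, "Ask"),
           (213, 457, "Comment"),
           (424, 800, "Hi")]

def load(conn, rows):
    """Insert rows, silently skipping any whose primary key is already stored."""
    conn.executemany(
        "INSERT OR IGNORE INTO Widget (id, serial_number, description) "
        "VALUES (?, ?, ?)", rows)
    conn.commit()

load(conn, my_data)  # first "run"
load(conn, my_data)  # second "run": nothing is duplicated

row_count = conn.execute("SELECT COUNT(*) FROM Widget").fetchone()[0]
print(row_count)
```

With a file path in place of :memory: the table persists between runs, and the primary key also gives the indexed lookup on the first column that the question asks for.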
What you're asking for is probably called a "Database table" and an "Index". The classic approach is to have a supplementary file (index) which maps the keys of the data tuples in the table to absolute positions of the tuples in the file.
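A toy version of that classic approach might look like the sketch below. It assumes one record per line with the unique key in the first whitespace-separated column; the function names build_index and lookup are made up for this example, and the index is kept in a Python dictionary purely for brevity. For data too large for RAM the index would itself have to live on disk, which is what a database does for you.

```python
import os
import tempfile

def build_index(path):
    """Scan the file once, mapping each key to the byte offset of its line."""
    index = {}
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            key = int(line.split(maxsplit=1)[0])
            index[key] = offset
    return index

def lookup(path, index, key):
    """Jump straight to the stored offset and read just that one record."""
    with open(path, "rb") as f:
        f.seek(index[key])
        return f.readline().decode().rstrip()

# Demo on a small temporary data file holding the sample rows.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"143 800 'Ask'\n213 457 'Comment'\n424 800 'Hi'\n")
    path = tmp.name

index = build_index(path)
record = lookup(path, index, 213)
os.remove(path)
print(record)
```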
I don't understand: do you want to be able to search faster in the file itself, or in the file content once it is loaded in Python? In the latter case, use a dictionary with the unique elements as keys.
values = {143: [800, 'Ask'], 213: [457, 'Comment'], 424: [800, 'Hi']}
If you need to look things up in a persistent store, use a database. One example is sqlite, which is built-in.
Also with each program execution the rows are going to be inserted
If you want to keep the storage in a file, the way you do it now, then the simple solution to prevent duplicate entries from appearing on the next execution would be to truncate the file first. You can do this by opening it with the w flag:
f = open('filename', 'w')
# ...
f.close()
However it sounds as if you just want to store some data while the program is executed, i.e. you want to keep data around without making it persistent. If that’s the case, then I am wondering why you actually store the contents in a file.
The more obvious way, which is also pythonic (although it’s not special to Python), would be to keep it in a dictionary during the program execution. A dictionary is mutable, so you can change its content all the time: You can add new entries, or even update entries if you later get more information on them.
I knew about this form of storing in a dictionary, but at times I don't have values for values[143][1], i.e. the string 'None' is stored in its place
That’s not a problem at all. You can easily store an entry with 143 as the key and None as its value, or even an array of None values:
>>> values[143] = [ None, None ]
That way, the dictionary will still remember that you entered the key, so a check if the key is in the dictionary will return true:
>>> 143 in values
True
Is there any way other than dictionaries in Python for accomplishing the same? I was aware of dictionaries... I am just searching for some other way.
No, there usually is only one way to do something right in Python, as also told by the Zen of Python: “There should be one-- and preferably only one --obvious way to do it.”
As such, no, there is probably not an appropriate way to use dictionaries without dictionaries. But then again, why are you searching for some other way? It does not sound to me as if you have a good reason to do so, and if you do, you need to explain why certain ways are undesirable for you to use.

Collection naming convention

How should I name an array that holds the widths of columns? I would use:
int[] columnsWidths;
but I saw in many places names like:
columnWidths
or
colWidths
Of course "widths of columns" is only an example.
Moreover, I think there is another case, where two words are not two separate words but together form a name, e.g.
class TableView
What should the variable's name look like in this case?
TableView[] tableViews;
or
TableView[] tablesViews;
Read Code Complete. But there is no right or wrong answer -- choose what suits you and your colleagues and be consistent.