Sum of setof in Prolog

I have this predicate to get the sum of the lengths of all borders of a country. I could solve it with findall, but I have to use setof. My facts look like this:
borders(sweden,finland,586).
borders(norway,sweden,1619).
My code:
circumference(C, Country) :-
    findall(X, ( borders(Country, _, X) ; borders(_, Country, X) ), Kms),
    sum_list(Kms, C).

You cannot find the sum using bagof directly; all you can do is make a list and then sum that list (but you knew that already). In SWI-Prolog there is library(aggregate) that does the bagging and the summing for you. With the facts you have, you would write:
?- aggregate(sum(X), Y^( borders(Y, sweden, X) ; borders(sweden, Y, X) ), Circumference).
Circumference = 2205.
If you instead must obey the whims of your instructor and type "bagof" yourself, or if you are not allowed to use a modern, open source, comprehensive Prolog implementation, you can use the same approach with bagof and manually build the list before summing it:
?- bagof(X, Y^( borders(Y, sweden, X) ; borders(sweden, Y, X) ), Borders).
Borders = [1619, 586].
For reasons that are lost in the mists of time, the Var^Goal construct that you see in both aggregate and bagof is called "existentially quantifying the variables in Goal". You might also read that "^ prevents binding Var in Goal". In practice it means that bagof (and setof) will not backtrack over the different bindings of Y: instead of producing one list per neighbouring country, it collects the solutions for all values of Y into a single list.
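To see the practical difference with the two facts above (a hypothetical session; the order in which the groups are reported may vary between systems):

?- bagof(X, ( borders(Y, sweden, X) ; borders(sweden, Y, X) ), Borders).
Y = finland,
Borders = [586] ;
Y = norway,
Borders = [1619].

With the Y^ prefix, as in the queries above, the two groups are merged into the single list [1619, 586].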

I ended up using this (note that, unlike findall, setof sorts the resulting list and removes duplicates, so two borders with exactly the same length would be counted only once):
circumference(Z, Country) :-
    setof(X, Q^( borders(Q, Country, X) ; borders(Country, Q, X) ), Border),
    sum_list(Border, Z).
% Adds the numbers in a list.
sum_list([], 0).
sum_list([H|T], Sum) :-
    sum_list(T, Rest),
    Sum is H + Rest.
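With the two facts from the question, this should give the same total as the aggregate example above:

?- circumference(C, sweden).
C = 2205.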

Related

Slick: Pass in column to update

Let's say we have a FoodTable with the following columns: Name, Calories, Carbs, Protein. I have an entry for Name = Chocolate, Calories = 100, Carbs = "10g", and Protein = "2g".
I'm wondering if there's a way to pass in a column name and a new value to update with. For example, I want a method that's like
def updateFood(food, columnName, value):
  table.filter(_.name === food).map(x => x.columnName).update(value)
It seems like dynamic columns are not possible with Slick? I want to avoid writing a SQL query because that could lead to security flaws or bugs in the code. Is there really no way to do this?
I also don't want to have to pass in the entire object to update, since ideally, it should be:
I want to update column X to value Y. I should only need to pass in the id of the object, the column, and the value to update to.
I'm wondering if there's a way to pass in a column name and a new value to update with
This depends a little bit on what you want the "column name" to be. To maintain safety, what I'd suggest is having the "column name" be a function that can select a column in your table.
At a high level that would look like this:
// Won't compile, but we'll fix that in a moment
def updateFood[V](food: Food, column: FoodTable => Rep[V], value: V): DBIO[Int] =
  foods.filter(_.name === food.name).map(column).update(value)
...which we'd call like this:
updateFood(choc, _.calories, 99)
Notice how the "column name" is a function from FoodTable to a column of some value V. Then you provide a value for the V and we do a normal update.
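For completeness, the snippets in this answer assume a mapping roughly along these lines (a sketch only; substitute your actual Food case class, FoodTable, and profile import):

import slick.jdbc.H2Profile.api._   // or your database's profile

case class Food(name: String, calories: Int, carbs: String, protein: String)

class FoodTable(tag: Tag) extends Table[Food](tag, "food") {
  def name     = column[String]("NAME", O.PrimaryKey)
  def calories = column[Int]("CALORIES")
  def carbs    = column[String]("CARBS")
  def protein  = column[String]("PROTEIN")
  def *        = (name, calories, carbs, protein) <> (Food.tupled, Food.unapply)
}

val foods = TableQuery[FoodTable]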
The problem is that Slick knows how to map certain types of values (String, Int, etc) into SQL, but not any kind of value. And the code above won't compile because V is unconstrained.
We can sort of fix that by adding a constraint on V, and it mostly will work:
// Will compile, will work for basic types
def updateFood[V : slick.ast.BaseTypedType](food: Food, column: FoodTable => Rep[V], value: V): DBIO[Int] =
  foods.filter(_.name === food.name).map(column).update(value)
However, if you have custom column mappings, they won't match the constraint. We need to go one step further and have an implicit Shape in scope:
def updateFood[V](food: Food, column: FoodTable => Rep[V], value: V)(implicit shape: Shape[_ <: FlatShapeLevel, Rep[V], V, _]): DBIO[Int] =
  foods.filter(_.name === food.name).map(column).update(value)
I think of Shape as an extra level of abstraction in Slick, above Rep[V]. The mechanisms of the "shape levels" and other details are not something I can explain because I don't understand them yet! (There is a talk that goes into the design of Slick called "Polymorphic Record Types in a Lifted Embedding" which you can find at http://slick.lightbend.com/docs/)
A final note: if you really want the column name to be a String or something like that, I'd suggest pattern matching the string (or validating it in some way) into a FoodTable => Rep function and using that in your query. That's going to be tricky, because your value V is going to have to match the type of the column you want to update.
Off the top of my head, that could look something like this:
def tryUpdateFood(food: Food, columnName: String, value: String): DBIO[Int] =
  columnName match {
    case "calories" => updateFood(food, _.calories, value.toInt)
    case "carbs"    => updateFood(food, _.carbs, value)
    // etc...
    case unknown    => DBIO.failed(new Exception(s"Don't know how to update $unknown columns"))
  }
I can imagine better error handling, safer or smarter parsing of the value, but in outline the above could work.
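For example, the numeric parsing could be made total, so that bad input becomes a failed action rather than a thrown exception (a sketch reusing the hypothetical updateFood above; names are illustrative):

import scala.util.Try

def tryUpdateFoodSafely(food: Food, columnName: String, value: String): DBIO[Int] =
  columnName match {
    case "calories" =>
      Try(value.toInt).fold(
        _    => DBIO.failed(new Exception(s"calories must be a number, got: $value")),
        kcal => updateFood(food, _.calories, kcal)
      )
    case "carbs"   => updateFood(food, _.carbs, value)
    case "protein" => updateFood(food, _.protein, value)
    case unknown   => DBIO.failed(new Exception(s"Don't know how to update $unknown columns"))
  }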
For hints at other ways to approach dynamic problems, take a look at the talk "Patterns for Slick database applications" (also listed at: http://slick.lightbend.com/docs/), and towards the end of the presentation there's a section on "Dynamic sorting".

SUM function in PIG

Starting to learn Pig latin scripting and stuck on below issue. I have gone through similar questions on the same topic without any luck! Want to find SUM of all the age fields.
DUMP X;
(22)
(19)
grunt> DESCRIBE X;
X: {age: int}
I tried several options, such as:
Y = FOREACH ( group X all ) GENERATE SUM(X.age);
But I am getting the exception below.
Invalid field projection. Projected field [age] does not exist in schema: group:chararray,X:bag{:tuple(age:int)}.
Thanks for your time and help.
I think the Y projection should work as you wrote it. Here is my little example code for the same, and it works just fine for me.
X = LOAD 'SO/sum_age.txt' USING PigStorage('\t') AS (age:int);
DESCRIBE X;
Y = FOREACH ( group X all ) GENERATE
    SUM(X.age);
DESCRIBE Y;
DUMP Y;
So your problem looks strange. I used the following input data:
-bash-4.1$ cat sum_age.txt
22
19
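For what it's worth, with those two rows the total is 22 + 19, so the final DUMP should print:
(41)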
Can you try it on the same data with the script I inserted here?

Multiple Wildcards/Filters for SQL

I am attempting to only retrieve Business Unit specific information from a large datamart, and would like to structure my query to eliminate unrelated DepartmentIDs.
In plainspeak, the end goal is to filter on ALL DepartmentIDs starting with "AN" and ending in 0, P, A, N, R, V, C, L, W, E, or Y.
Currently the query starts with:
FROM bbms_tpirc.dbo.LaborDetailByName LaborDetailByName
WHERE (LaborDetailByName.post_year='2016') AND
(LaborDetailByName.center_id='APEEN') AND
(LaborDetailByName.loan Like 'AN%')
but I am struggling with the next section.
Using another AND (LaborDetailByName.loan Like '%0', '%P') etc doesn't return anything in the dataset. Might I be overfiltering, or simply forgetting an argument?
You can replace your filter clause
(LaborDetailByName.loan Like 'AN%')
with
(LaborDetailByName.loan Like 'AN%[0PANRVCLWEY]')
The [...] class matches a single character from the listed set, so this pattern keeps values that start with "AN" and end in 0, P, A, N, R, V, C, L, W, E, or Y.
In SQL Server or MySQL, you can use RIGHT():
AND RIGHT(LaborDetailByName.loan,1) in ('0','P','A','N','R','V','C','L','W','E','Y')
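Plugged into the query from the question, the first variant would look roughly like this (note that the [...] character class in LIKE is a SQL Server extension, so check your dialect):

SELECT *
FROM bbms_tpirc.dbo.LaborDetailByName LaborDetailByName
WHERE (LaborDetailByName.post_year = '2016')
  AND (LaborDetailByName.center_id = 'APEEN')
  AND (LaborDetailByName.loan LIKE 'AN%[0PANRVCLWEY]')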

How do I call log?

I got this error:
Could not infer the matching function for org.apache.pig.builtin.LOG
as multiple or none of them fit. Please use an explicit cast
From this code:
> describe A;
A: {p: long,k: chararray,count: double}
> foreach (group A by p) generate SUM(A.count * LOG(A.count));
What am I doing wrong?
LOG works on a double, not on a bag of doubles. In your expression you are handing it a bag (A.count), just as in SUM(A.count), but SUM, unlike LOG, is an aggregate that is meant to take a bag.
Try preparing your data before the bag aggregation, something like:
computed = foreach A generate p, (count * LOG(count)) as multiplied;
summed = foreach (group computed by p) generate SUM(computed.multiplied);

Using SQLDF to select specific values from a column

SQLDF newbie here.
I have a data frame which has about 15,000 rows and 1 column.
The data looks like:
cars
autocar
carsinfo
whatisthat
donnadrive
car
telephone
...
I wanted to use the package sqldf to loop through the column and
pick all values which contain "car" anywhere in their value.
However, the following code generates an error.
> sqldf("SELECT Keyword FROM dat WHERE Keyword="car")
Error: unexpected symbol in "sqldf("SELECT Keyword FROM dat WHERE Keyword="car"
There is no unexpected symbol, so I'm not sure what's wrong.
So first, I want to know all the values which contain 'car'.
Then I want to know only those values which are just 'car' by itself.
Can anyone help?
EDIT:
Alright, there was an unexpected symbol after all, but the corrected query only gives me 'car' itself and not every row which contains 'car'.
> sqldf("SELECT Keyword FROM dat WHERE Keyword='car'")
Keyword
1 car
Using = will only return exact matches.
You should probably use the LIKE operator combined with the wildcards % or _. The % wildcard matches any number of characters, while _ matches exactly one character.
Something like the following will find all instances of car, e.g. "cars", "motorcar", etc:
sqldf("SELECT Keyword FROM dat WHERE Keyword like '%car%'")
And the following will match four-character values such as "cars", but not "car" itself, since _ stands for exactly one character:
sqldf("SELECT Keyword FROM dat WHERE Keyword like 'car_'")
This has nothing to do with sqldf; your SQL statement is the problem. You need:
dat <- data.frame(Keyword = c("cars", "autocar", "carsinfo",
                              "whatisthat", "donnadrive", "car", "telephone"))
sqldf("SELECT Keyword FROM dat WHERE Keyword like '%car%'")
# Keyword
# 1 cars
# 2 autocar
# 3 carsinfo
# 4 car
You can also use regular expressions to do this sort of filtering. grepl returns a logical vector (TRUE / FALSE) stating whether or not there was a match. You can get very sophisticated to match specific items, but a basic query will work in this case:
# Using @Joshua's dat data.frame
subset(dat, grepl("car", Keyword, ignore.case = TRUE))
Keyword
1 cars
2 autocar
3 carsinfo
6 car
Very similar to the solution provided by @Chase. Because we do not use subset, we do not need a logical vector and can use either grep or grepl:
df <- data.frame(keyword = c("cars", "autocar", "carsinfo", "whatisthat", "donnadrive", "car", "telephone"))
df[grep("car", df$keyword), , drop = FALSE] # or
df[grepl("car", df$keyword), , drop = FALSE]
keyword
1 cars
2 autocar
3 carsinfo
6 car
I took the idea from Selecting rows where a column has a string like 'hsa..' (partial string match)