Correct syntax for SQL query with XML

My XML data looks like this:
<TestResults>
  <MethodResult>
    X
    X
    <StepResult name="BluetoothERROR">
      X
      X
      X
      X
      X
      X
      X
    </StepResult>
    <StepResult name="FLOWERROR1">
      <ActualValue>
        <Number value="-100" />
      </ActualValue>
    </StepResult>
    X
    X
    <StepResult name="PowerOffError">
      X
      X
      X
    </StepResult>
  </MethodResult>
</TestResults>
Here, X stands for other StepResult elements with different names, such as BluetoothERROR or PowerOffError. Assume that the other StepResult elements can have the same kind of output as "FLOWERROR1".
I am particularly interested in the StepResult named "FLOWERROR1", and I would like to return its Number value of -100.
I tried the following expression, but it did not work and only returns NULLs:
f.ResultXML.value('(/TestResults/MethodResult/StepResult/ActualValue/Number/@Value)[1]', 'varchar(max)') As "Actual Value"
What should I have done instead?

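You can filter the StepResult nodes by their name attribute and return the value attribute of the first matching Number node. Note that XML names are case-sensitive: the attribute is value, not Value, so a path ending in @Value finds nothing and returns NULL.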
select @data.value('(TestResults/MethodResult/StepResult[@name="FLOWERROR1"]/ActualValue/Number)[1]/@value', 'int')
See Introduction to Using XPath Queries and XQuery Language Reference.
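To see why the filtered path works and the original one returns NULL, here is a minimal sketch of the same lookup using Python's standard xml.etree.ElementTree, which supports a subset of XPath including [@name='...'] predicates. This is only an analogy for SQL Server's value() method, not a replacement for it; the simplified sample document (the X placeholders are dropped) and the helper name actual_value are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified version of the question's document; the "X" placeholders are left out.
SAMPLE = """
<TestResults>
  <MethodResult>
    <StepResult name="BluetoothERROR" />
    <StepResult name="FLOWERROR1">
      <ActualValue>
        <Number value="-100" />
      </ActualValue>
    </StepResult>
    <StepResult name="PowerOffError" />
  </MethodResult>
</TestResults>
"""


def actual_value(xml_text, step_name, attr="value"):
    """Return attribute `attr` of the first Number under the StepResult named
    `step_name`, or None when the step or the attribute is missing."""
    root = ET.fromstring(xml_text)  # root element is <TestResults>
    # ElementTree understands the [@name='...'] predicate, like the XPath above.
    number = root.find(
        f"./MethodResult/StepResult[@name='{step_name}']/ActualValue/Number"
    )
    if number is None:
        return None
    raw = number.get(attr)  # attribute lookup is case-sensitive
    return int(raw) if raw is not None else None


print(actual_value(SAMPLE, "FLOWERROR1"))           # -100
print(actual_value(SAMPLE, "FLOWERROR1", "Value"))  # None, like the NULLs in the question
```

Asking for "Value" instead of "value" behaves like the query in the question: the element is found, but the attribute is not.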

Related

Spark SQL: Can't UNNEST lambda variables

I am encountering strange behaviour: I can't access a lambda variable inside UNNEST in my Spark code:
FILTER(boxes.clicks, x -> EXISTS (SELECT 1 FROM UNNEST(x) AS clicks WHERE clicks.href IS NOT NULL))
This fails with an error saying that x does not exist: cannot resolve 'x' given input columns: []
However, without UNNEST, x can be accessed without any problems. For example, this will work just fine:
FILTER(boxes.clicks, x -> size(x) > 1)
Is it possible to use lambda variables in combination with UNNEST?
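The check being attempted does not really need a subquery: it asks whether any element of the inner array x has a non-null href. Spark SQL (2.4 and later, as far as I know) also has a higher-order exists(array, lambda) function, so an expression along the lines of FILTER(boxes.clicks, x -> exists(x, c -> c.href IS NOT NULL)) may sidestep the UNNEST problem; this is untested against your schema. The Python sketch below only models the intended semantics on plain lists. The data shape (a list of click lists, each click a dict with an optional "href") and the function name are assumptions.

```python
# Assumed shape: boxes.clicks is a list of click lists, and each click is a dict
# whose "href" key may be missing or None.
def keep_groups_with_href(clicks_groups):
    """Plain-Python model of
    FILTER(boxes.clicks, x -> exists(x, c -> c.href IS NOT NULL)):
    keep each inner list x that has at least one click with a non-null href."""
    return [
        group
        for group in clicks_groups
        if any(click.get("href") is not None for click in group)
    ]


clicks = [
    [{"href": None}],                  # dropped: no href set
    [{"href": "/a"}, {"href": None}],  # kept: one click has an href
    [],                                # dropped: nothing to test
]
print(keep_groups_with_href(clicks))  # [[{'href': '/a'}, {'href': None}]]
```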

Sum of setof in Prolog

I have this predicate to get the sum of the lengths of all borders of a country. I could solve it with findall, but I have to use setof. My facts look like this:
borders(sweden,finland,586).
borders(norway,sweden,1619).
My code:
circumference(C, Country) :-
findall(X, ( borders(Country, _, X) ; borders(_, Country, X)), Kms),
sum_list(Kms, C).
You cannot compute the sum with bagof directly; all you can do is collect a list and then sum that list (but you knew that already). SWI-Prolog has library(aggregate), which does the collecting and the summing for you. With the facts you have, you would write:
?- aggregate(sum(X), Y^( borders(Y, sweden, X) ; borders(sweden, Y, X) ), Circumference).
Circumference = 2205.
If you instead must obey the whims of your instructor and type "bagof" yourself, or if you are not allowed to use a modern, open source, comprehensive Prolog implementation, you can use the same approach with bagof and manually build the list before summing it:
?- bagof(X, Y^( borders(Y, sweden, X) ; borders(sweden, Y, X) ), Borders).
Borders = [1619, 586].
For reasons that are lost in the mists of time, the funny thing with the Var^Goal that you see in both aggregate and bagof is called "existentially quantifying the variables in Goal". You might also read that "^ prevents binding Var in Goal". In practice, it means that bagof collects all solutions into one list instead of returning a separate list for each binding of Var.
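A rough Python analogy of that grouping behaviour, using the two facts from the question, may help. It is not how a Prolog system implements bagof; the function names are made up for illustration. With Y^ every length lands in one list, while leaving Y free gives one list per neighbouring country (which Prolog would hand out one at a time on backtracking).

```python
from collections import defaultdict

# The two facts from the question, as (A, B, Km) for borders(A, B, Km).
BORDERS = [("sweden", "finland", 586), ("norway", "sweden", 1619)]


def solutions(country):
    """Yield (Y, X) in the order the goal
    borders(Y, Country, X) ; borders(Country, Y, X) finds them."""
    for a, b, km in BORDERS:  # first disjunct: Country in second position
        if b == country:
            yield a, km
    for a, b, km in BORDERS:  # second disjunct: Country in first position
        if a == country:
            yield b, km


def bagof_with_caret(country):
    """Analogy for bagof(X, Y^Goal, L): Y is ignored, so there is one list."""
    return [km for _, km in solutions(country)]


def bagof_without_caret(country):
    """Analogy for bagof(X, Goal, L) with Y free: one list per binding of Y."""
    groups = defaultdict(list)
    for y, km in solutions(country):
        groups[y].append(km)
    return dict(groups)


print(bagof_with_caret("sweden"))       # [1619, 586]
print(sum(bagof_with_caret("sweden")))  # 2205
print(bagof_without_caret("sweden"))    # {'norway': [1619], 'finland': [586]}
```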
I ended up using this:
circumference(Z, Country) :- setof(X, Q^(borders(Q,Country,X);borders(Country,Q,X)),Border),
sum_list(Border,Z).
% Adds the numbers in a list.
sum_list([], 0).
sum_list([H|T], Sum) :-
sum_list(T, Rest),
Sum is H + Rest.
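One caveat with the setof version: setof returns a sorted list with duplicates removed, so if a country has two borders of exactly the same length, one of them silently drops out of the sum (bagof and findall keep duplicates). If setof is mandatory, one way around this is to collect Q-X pairs instead of bare lengths, so that equal lengths to different neighbours stay distinct, and then sum the X parts. A Python sketch of the effect, using made-up facts where a, b, c and d are hypothetical countries:

```python
# Hypothetical facts (not from the question): country "a" has two borders of 100 km.
FACTS = [("a", "b", 100), ("c", "a", 100), ("a", "d", 50)]


def lengths(country):
    """Every border length touching `country`, duplicates kept (bagof-style)."""
    return [km for x, y, km in FACTS if country in (x, y)]


def bagof_sum(country):
    return sum(lengths(country))


def setof_sum(country):
    # setof sorts and removes duplicates before the list is summed.
    return sum(sorted(set(lengths(country))))


print(bagof_sum("a"))  # 250
print(setof_sum("a"))  # 150, one of the 100 km borders was dropped
```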

SUM function in Pig

I'm starting to learn Pig Latin scripting and am stuck on the issue below. I have gone through similar questions on the same topic without any luck. I want to find the SUM of all the age fields.
DUMP X;
(22)
(19)
grunt> DESCRIBE X;
X: {age: int}
I tried several options, such as:
Y = FOREACH ( group X all ) GENERATE SUM(X.age);
But I get the following exception:
Invalid field projection. Projected field [age] does not exist in schema: group:chararray,X:bag{:tuple(age:int)}.
Thanks for your time and help.
I think the Y projection should work as you wrote it. Here is my small example for the same thing, and it works fine for me:
X = LOAD 'SO/sum_age.txt' USING PigStorage('\t') AS (age:int);
DESCRIBE X;
Y = FOREACH ( group X all ) GENERATE
SUM(X.age);
DESCRIBE Y;
DUMP Y;
So your problem looks strange. I used the following input data:
-bash-4.1$ cat sum_age.txt
22
19
Could you try the script above on the same data?
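For reference, this is what GROUP X ALL followed by SUM(X.age) computes: GROUP ... ALL yields a single tuple (group, X) whose group field is the chararray 'all' (matching the schema in the error message) and whose X field is a bag of every input tuple; X.age then projects age out of that bag. The Python below is only a model of those semantics with made-up helper names, not Pig code.

```python
def group_all(relation):
    """Model of GROUP X ALL: one output tuple (group, bag) holding every input tuple."""
    return [("all", list(relation))]


def pig_sum(values):
    """Model of Pig's SUM, which skips null values (None here)."""
    return sum(v for v in values if v is not None)


X = [(22,), (19,)]  # the tuples shown by DUMP X; age is field 0

# FOREACH (GROUP X ALL) GENERATE SUM(X.age):
# X.age projects the age field out of each tuple in the grouped bag.
Y = [(pig_sum(t[0] for t in bag),) for _, bag in group_all(X)]
print(Y)  # [(41,)]
```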

Using Multi-Value Field in MS Access Query

I have a multi-value field (MVF) (I am fully aware that this is not best practice), and I need to create a query whose result looks like this:
PersonName MVF_Opt_1 MVF_Opt_2 MVF_Opt_3
Tim X X X
John X
Jake X X
For each query column I need, I tried using an expression like this:
MVF_Opt_1: IIf([Options].[Value] = 1,"X","")
However, this only seems to work if the option in the expression happens to be the first value in the MVF.
I also have about 20 other options that do not need their own columns and can be disregarded.
Any ideas?
This seems to be working for me:
SELECT
mvfTest.PersonName,
IIf(DCount("*","mvfTest","PersonName=""" & [PersonName] & """ And Options.Value=""1""")=0,"","X") AS MVF_Opt_1,
IIf(DCount("*","mvfTest","PersonName=""" & [PersonName] & """ And Options.Value=""2""")=0,"","X") AS MVF_Opt_2
FROM mvfTest;
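Conceptually, each DCount column above is a pivot: for every person it asks whether any value of the multi-value field equals that option, rather than testing one value at a time. A Python sketch of the same pivot, where the rows, the option numbers and the extra option 17 are all hypothetical, and the one-row-per-value layout is only roughly how Access expands Options.Value in a query:

```python
# Hypothetical data: one (PersonName, option) row per selected value, roughly how
# Access expands a multi-value field when a query uses Options.Value.
ROWS = [
    ("Tim", 1), ("Tim", 2), ("Tim", 3),
    ("John", 2),
    ("Jake", 1), ("Jake", 3), ("Jake", 17),  # 17 stands in for an ignored option
]
WANTED = [1, 2, 3]  # only these options get their own column


def pivot(rows, wanted):
    """For each person, mark "X" in every wanted column whose option they selected."""
    selected = {}
    for person, option in rows:
        selected.setdefault(person, set()).add(option)
    return [
        [person] + ["X" if opt in options else "" for opt in wanted]
        for person, options in selected.items()
    ]


for row in pivot(ROWS, WANTED):
    print(row)
```

Options outside WANTED are simply never looked at, which matches the requirement to disregard the other options.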

Using SQLDF to select specific values from a column

SQLDF newbie here.
I have a data frame which has about 15,000 rows and 1 column.
The data looks like:
cars
autocar
carsinfo
whatisthat
donnadrive
car
telephone
...
I wanted to use the sqldf package to go through the column and pick all values that contain "car" anywhere in them.
However, the following code generates an error.
> sqldf("SELECT Keyword FROM dat WHERE Keyword="car")
Error: unexpected symbol in "sqldf("SELECT Keyword FROM dat WHERE Keyword="car"
There is no unexpected symbol, so I'm not sure what's wrong.
So, first, I want to find all the values that contain 'car'.
Then I want to find only the values that are exactly 'car' by itself.
Can anyone help?
EDIT:
All right, there was an unexpected symbol, but now it only gives me car itself and not every row that contains 'car'.
> sqldf("SELECT Keyword FROM dat WHERE Keyword='car'")
Keyword
1 car
Using = will only return exact matches.
You should probably use the like operator combined with the wildcards % or _. The % wildcard matches any sequence of zero or more characters, while _ matches exactly one character.
Something like the following will find all values containing car, e.g. "cars", "motorcar", etc.:
sqldf("SELECT Keyword FROM dat WHERE Keyword like '%car%'")
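And the following will match "cars" (or any other four-character value starting with "car"), but not "car" itself, because _ must match exactly one character: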
sqldf("SELECT Keyword FROM dat WHERE Keyword like 'car_'")
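The wildcard rules can be checked outside the database with a small LIKE-to-regex translation. This is a simplified sketch, not how SQLite implements LIKE: it is case-sensitive and ignores ESCAPE, whereas SQLite (the default sqldf backend) compares ASCII letters case-insensitively in LIKE. The helper name like is hypothetical.

```python
import re


def like(value, pattern):
    """Simplified SQL LIKE: % matches any run of characters (including none),
    _ matches exactly one character, everything else matches literally.
    Case-sensitive and without ESCAPE handling."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


keywords = ["cars", "autocar", "carsinfo", "whatisthat", "donnadrive", "car", "telephone"]
print([k for k in keywords if like(k, "%car%")])  # ['cars', 'autocar', 'carsinfo', 'car']
print([k for k in keywords if like(k, "car_")])   # ['cars']
print([k for k in keywords if like(k, "car")])    # ['car']
```

A pattern without wildcards behaves like =, which covers the second part of the question (only "car" itself).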
This has nothing to do with sqldf; your SQL statement is the problem. You need:
dat <- data.frame(Keyword=c("cars","autocar","carsinfo",
"whatisthat","donnadrive","car","telephone"))
sqldf("SELECT Keyword FROM dat WHERE Keyword like '%car%'")
# Keyword
# 1 cars
# 2 autocar
# 3 carsinfo
# 4 car
You can also use regular expressions to do this sort of filtering. grepl returns a logical vector (TRUE/FALSE) stating whether or not each element matched. You can write quite sophisticated patterns to match specific items, but a basic one works in this case:
# Using @Joshua's dat data.frame
subset(dat, grepl("car", Keyword, ignore.case = TRUE))
Keyword
1 cars
2 autocar
3 carsinfo
6 car
Very similar to the solution provided by @Chase. Because we do not use subset, we do not need a logical vector and can use either grep or grepl:
df <- data.frame(keyword = c("cars", "autocar", "carsinfo", "whatisthat", "donnadrive", "car", "telephone"))
df[grep("car", df$keyword), , drop = FALSE] # or
df[grepl("car", df$keyword), , drop = FALSE]
keyword
1 cars
2 autocar
3 carsinfo
6 car
I took the idea from the question "Selecting rows where a column has a string like 'hsa..' (partial string match)".
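To make the grep/grepl distinction concrete: grepl gives one logical per element (what subset needs), while grep gives the 1-based positions of the matches, and both work as row indices. The Python functions below only mimic those two return shapes; they are an analogy, not R's implementation. The anchored pattern ^car$ also answers the second part of the question, returning only the row that is exactly "car".

```python
import re

keyword = ["cars", "autocar", "carsinfo", "whatisthat", "donnadrive", "car", "telephone"]


def grepl(pattern, values, ignore_case=False):
    """Like R's grepl: one True/False per element, True where the regex matches."""
    flags = re.IGNORECASE if ignore_case else 0
    return [re.search(pattern, v, flags) is not None for v in values]


def grep(pattern, values, ignore_case=False):
    """Like R's grep: the 1-based positions of the matching elements."""
    return [i + 1 for i, hit in enumerate(grepl(pattern, values, ignore_case)) if hit]


print(grepl("car", keyword))   # [True, True, True, False, False, True, False]
print(grep("car", keyword))    # [1, 2, 3, 6]
print(grep("^car$", keyword))  # [6], an anchored pattern matches only "car" itself
```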