How to escape characters in SQL code in an R Markdown chunk?

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(odbc)
library(DBI)
library(dbplyr)
```
```{sql, connection=con, output.var="df"}
SELECT DB_Fruit.Pear, Store.Name, Cal.Year, Sales.Qty FROM DB_Fruit
```
#> Error: unexpected symbol in "SELECT DB_Fruit.Pear"
I'm attempting to run SQL code in an R Markdown chunk as shown above, but I get the "unexpected symbol" error. My best guess is that I need to escape the underscore with something such as \_ or \\_, but neither of those makes the error go away.
If I instead query using DBI (shown below) I do not get any errors:
df <- dbGetQuery(con,'
SELECT DB_Fruit.Pear, Store.Name, Cal.Year, Sales.Qty
FROM DB_Fruit
')
Maybe the dbGetQuery function is able to interpret things such as underscores (_) correctly, whereas the regular R Markdown parser can't? Or maybe there are blank spaces that were copied and pasted as some odd Unicode characters, which dbGetQuery can interpret but the R Markdown parser can't?
What's the likely culprit and what do I do about it?

Your chunk header probably should be
{SQL, connection=con, output.var="df"}
instead of
{r SQL, connection=con, output.var="df"}

You have to use "Chunk Output Inline" in the Rmarkdown document
---
editor_options:
  chunk_output_type: inline
---

Related

How to use Latex within an f-string expression in Matplotlib; no variables in equation

I'm trying to make a title for a plot, but the title includes a variable, which I'm inserting using an f-string, but it also includes a Latex expression. I either get an error that f-string expressions do not take \ character, or else it's trying to read what's inside the equation as variables and complaining that it's not defined.
The code I'm trying looks something like this:
test = 'TEST'
plt.plot(1234,5678)
plt.title(f"This is a {test}: ${\sqrt{b/a}}$")
plt.show()
This code will give me the error: "f-string expression part cannot include a backslash", and when I try this (note the extra brackets):
test = 'TEST'
plt.plot(1234,5678)
plt.title(f"This is a {test}: ${{\sqrt{b/a}}}$")
plt.show()
I get this error: "name 'b' is not defined"
I want it to just show the square root of b/a, where b and a are just the letters, not variables, but I can't seem to make it work with an f-string variable also in the title.
It has to be done like this:
# The r prefix keeps \sqrt as a literal backslash sequence (avoids an invalid-escape warning).
plt.title(rf"This is a {test}: ${{\sqrt{{b/a}}}}$")
Since you need to use the { and } characters, they need to be doubled up so that they are interpreted as literal characters. This will prevent it from interpreting the contents between the brackets as a Python expression. Thus, it will no longer complain about the backslash or undefined variables.
Alternatively, it can be put in a separate string to avoid doubling up the brackets. This is more readable in my opinion.
tex = r"${\sqrt{b/a}}$"  # raw string, so \s is not treated as an escape sequence
plt.title(f"This is a {test}: {tex}")
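
To check the two approaches without drawing a plot, a minimal sketch can build both title strings and compare them. It assumes only the `test` variable from the question; the names `doubled` and `separate` are made up for illustration, and the `r` prefix is an addition to keep the backslash literal.

```python
test = "TEST"

# Doubled braces become literal braces; the r prefix keeps \sqrt's backslash literal.
doubled = rf"This is a {test}: ${{\sqrt{{b/a}}}}$"

# Alternative: keep the LaTeX in its own raw string and interpolate it.
tex = r"${\sqrt{b/a}}$"
separate = f"This is a {test}: {tex}"

print(doubled)
print(doubled == separate)
```

Either string can then be passed to plt.title as in the answer above.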

'unicodeescape' codec can't decode bytes in position 0-1: malformed \N character escape

I am trying to push data from Databricks into SQL. However, I get the following error:
I noticed in the file that I am processing that one of the columns has the following as a value:
I have tried to filter out the records by using the following:
df = df.filter(df.COLUMN != "\N")
However, when the above runs, I get the error message identified above. Is there some way to filter out values that have an escape character in them?
I would really appreciate any help. Thank you.
As the error suggests, you need to escape the backslash (\):
df.filter(df.value != "\\N")
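
The escaping itself can be checked in plain Python, without Spark. The sketch below assumes nothing beyond the standard library; the variable names are hypothetical. It shows that an escaped backslash and a raw string produce the same two-character value, and that a bare \N inside a normal string literal is rejected when the source is compiled.

```python
# Two equivalent ways to write a backslash followed by N.
escaped = "\\N"
raw = r"\N"
print(escaped == raw, len(escaped))

# The text being compiled here is: value = "\N"
# A bare \N starts a named Unicode escape such as \N{DEGREE SIGN},
# so the compiler rejects it as malformed.
try:
    compile('value = "\\N"', "<demo>", "exec")
    raised = False
except SyntaxError as exc:
    raised = True
    print(exc)
```

Under that reading, writing the filter value as r"\N" should behave the same as "\\N".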

Remove stopwords using open refine

Following this example https://github.com/OpenRefine/OpenRefine/wiki/Recipes#removeextract-words-contained-in-a-file
I'm trying to remove stopwords listed in a file using OpenRefine.
Example: you want to remove from a text all stopwords contained in a file on your desktop. In this case, use Jython.
with open(r"C:\Users\ettor\Desktop\stopwords.txt",'r') as f :
    stopwords = [name.rstrip() for name in f]
return " ".join([x for x in value.split(' ') if x not in stopwords])
Unfortunately, I got an "Internal error".
Yes, this script works as you can see in this screencast.
I changed it a bit to ignore the letter case.
with open(r"~\Desktop\stopwords.txt",'r') as f :
    stopwords = [name.rstrip().lower() for name in f]
return " ".join([x for x in value.split(' ') if x.lower() not in stopwords])
In an OpenRefine Python script, "Internal error" often means a syntax error, such as a forgotten parenthesis or bad indentation.
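
One way to rule out logic errors before pasting the script into OpenRefine is to test the same filtering in a regular Python interpreter. This is only a sketch: the function name `remove_stopwords` and the sample data are hypothetical, and the in-memory list stands in for the lines of stopwords.txt.

```python
def remove_stopwords(value, stopwords):
    """Drop each space-separated word of `value` that appears in `stopwords`, ignoring case."""
    lowered = {word.lower() for word in stopwords}  # set gives fast membership checks
    return " ".join(word for word in value.split(" ") if word.lower() not in lowered)


# Stand-in for the lines read from the stopwords file (each ends with a newline).
stopwords_file_lines = ["the\n", "of\n", "and\n"]
stopwords = [line.rstrip() for line in stopwords_file_lines]

cleaned = remove_stopwords("The lord of the rings", stopwords)
print(cleaned)
```

If this behaves as expected, any remaining "Internal error" in OpenRefine is more likely to come from the indentation or the file path than from the filtering logic.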

SWI-Prolog predicate for reading in lines from input file

I'm trying to write a predicate to accept a line from an input file. Every time it's used, it should give the next line, until it reaches the end of the file, at which point it should return false. Something like this:
database :-
    see('blah.txt'),
    loop,
    seen.
loop :-
    accept_line(Line),
    write('I found a line.\n'),
    loop.
accept_line([Char | Rest]) :-
    get0(Char),
    C =\= "\n",
    !,
    accept_line(Rest).
accept_line([]).
Obviously this doesn't work. It works for the first line of the input file and then loops endlessly. I can see that I need to have some line like "C =\= -1" in there somewhere to check for the end of the file, but I can't see where it'd go.
So an example input and output could be...
INPUT
this is
an example
OUTPUT
I found a line.
I found a line.
Or am I doing this completely wrong? Maybe there's a built in rule that does this simply?
In SWI-Prolog, the most elegant way to do this is to first use a DCG to describe what a "line" means, and then use library(pio) to apply the DCG to a file.
An important advantage of this is that you can then easily apply the same DCG also on queries on the toplevel with phrase/2 and do not need to create a file to test the predicate.
There is a DCG tutorial that explains this approach, and you can easily adapt it to your use case.
For example:
:- use_module(library(pio)).
:- set_prolog_flag(double_quotes, codes).
lines --> call(eos), !.
lines --> line, { writeln('I found a line.') }, lines.
line --> ( "\n" ; call(eos) ), !.
line --> [_], line.
eos([], []).
Example usage:
?- phrase_from_file(lines, 'blah.txt').
I found a line.
I found a line.
true.
Example usage, using the same DCG to parse directly from character codes without using a file:
?- phrase(lines, "test1\ntest2").
I found a line.
I found a line.
true.
This approach can be very easily extended to parse more complex file contents as well.
If you want to read into code lists, see library(readutil), in particular read_line_to_codes/2 which does exactly what you need.
You can of course use the character I/O primitives, but at least use the ISO predicates. "Edinburgh-style" I/O is deprecated, at least for SWI-Prolog. Then:
get_line(L) :-
    get_code(C),
    get_line_1(C, L).
get_line_1(-1, []) :- !. % EOF
get_line_1(0'\n, []) :- !. % EOL
get_line_1(C, [C|Cs]) :-
    get_code(C1),
    get_line_1(C1, Cs).
This is of course a lot of unnecessary code; just use read_line_to_codes/2 and the other predicates in library(readutil).
Since strings were introduced to Prolog, there are some new nifty ways of reading. For example, to read all input and split it to lines, you can do:
read_string(user_input, _, S),
split_string(S, "\n", "", Lines)
See the examples in read_string/5 for reading linewise.
PS. Drop the see and seen etc. Instead:
setup_call_cleanup(open(Filename, read, In),
read_string(In, N, S), % or whatever reading you need to do
close(In))

extract only certain tags in xml file using pig latin

I want to extract only the states from the below xml file.
<Table>
<State>Florida</State>
<id>123</id>
</Table>
<Table>
<State>Texas</State>
<id>456</id>
</Table>
Expected output :
(Florida)
(Texas)
But with the Pig statements below, I get this output instead:
()
()
A = LOAD 'hdfs:/user.xml' USING org.apache.pig.piggybank.storage.XMLLoader('Table')
AS (x:chararray);
B = FOREACH A GENERATE FLATTEN (REGEX_EXTRACT_ALL(x,
'<Table>\\n\\s*<State>(.*)</State>\\n\\s*\\n\\s*</Table>'))
as (state:chararray);
Please help me understand where I have gone wrong or how do I eliminate a certain tag line?
That looks like a buggy regex: after the closing </State> you are using \\n\\s*\\n\\s*</Table>, which does not account for the <id>...</id> element. Have you looked at using an XML parsing library in a UDF? It might be easier than trying to build a bunch of regexes by hand.
EDIT: One other suggestion. Are you sure that the line separators in your file are just \n? You may have \r\n as the separator, in which case [\r\n]+ should help.
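
Both points can be illustrated outside Pig. Pig uses Java regular expressions, so the Python sketch below only demonstrates the pattern logic, not Pig's behavior. The sample records and the `fixed` pattern are assumptions made for illustration: the pattern allows any whitespace (including \r\n) between tags and matches the <id> element explicitly.

```python
import re

# Sample records with Unix (\n) and Windows (\r\n) line endings.
record_lf = "<Table>\n  <State>Florida</State>\n  <id>123</id>\n</Table>"
record_crlf = record_lf.replace("\n", "\r\n")

# The pattern from the question: nothing in it can consume <id>...</id>.
original = r"<Table>\n\s*<State>(.*)</State>\n\s*\n\s*</Table>"

# Hypothetical fix: \s* covers \n and \r\n alike, and the id element is matched explicitly.
fixed = r"<Table>\s*<State>(.*?)</State>\s*<id>.*?</id>\s*</Table>"

original_match = re.fullmatch(original, record_lf)
fixed_lf = re.fullmatch(fixed, record_lf)
fixed_crlf = re.fullmatch(fixed, record_crlf)

print(original_match)
print(fixed_lf.group(1), fixed_crlf.group(1))
```

In a Pig script the backslashes would still need doubling inside the quoted pattern, as in the original statement.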