In R, is it possible to run a MySQL query with an IN clause whose values come from a data frame column?
EXAMPLE of what I'm trying to do:
Directory = read.csv("worlddirectory.csv", header = TRUE, sep = ",",stringsAsFactors=FALSE)
Active_Customers = Directory[(Directory$Status == "Active"),]
PhoneNumbers = dbGetQuery(DBConnection,
"
Select
db.phonenumbers,
db.names
from
database db
where
db.country IN
(
Active_Customers$Country
);"
As we can see here, the expected statement looks like:
WHERE column_name IN (value1, value2, ...);
We can use paste0 with the argument collapse=", " to obtain the desired format. Because the country codes are strings, each value also has to be wrapped in single quotes; otherwise the database reads NL, BE and US as column names. This should work:
PhoneNumbers = dbGetQuery(DBConnection,
paste0("SELECT db.phonenumbers, db.names ",
"FROM database db ",
"WHERE db.country IN (",
paste0("'", Active_Customers$Country, "'", collapse=", "),");"))
Example:
Active_Customers <- data.frame(Country=c("NL","BE","US"))
paste0("SELECT db.phonenumbers, db.names ",
"FROM database db ",
"WHERE db.country IN (",
paste0("'", Active_Customers$Country, "'", collapse=", "),");")
Output:
[1] "SELECT db.phonenumbers, db.names FROM database db WHERE db.country IN ('NL', 'BE', 'US');"
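Pasting values into the SQL text means every value has to be quoted correctly by hand. For comparison, here is a minimal Python sketch of the same IN query using bound parameters instead. It assumes an in-memory SQLite database and a hypothetical table named directory standing in for the question's database table; the rows are made up for illustration.

```python
import sqlite3

# Hypothetical stand-in for the `database` table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE directory (phonenumbers TEXT, names TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO directory VALUES (?, ?, ?)",
    [("111", "Anna", "NL"), ("222", "Bram", "BE"), ("333", "Carl", "DE")],
)

# Values that would come from the data frame column.
countries = ["NL", "BE", "US"]

# One "?" per value; the driver takes care of quoting each bound value.
placeholders = ", ".join("?" for _ in countries)
sql = (
    "SELECT phonenumbers, names FROM directory "
    f"WHERE country IN ({placeholders}) ORDER BY names"
)
rows = conn.execute(sql, countries).fetchall()
print(sql)
print(rows)
```

Only the placeholder list is built dynamically, so the values themselves never become part of the SQL text.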
Hope this helps!
I am trying to execute the value of the variable, but I can't find documentation about it in Google BigQuery.
DECLARE SQL STRING;
SELECT
SQL =
CONCAT(
"CREATE TABLE IF NOT EXISTS first.rdds_",
REPLACE(CAST(T.actime AS STRING), " 00:00:00+00", ""),
" PARTITION BY actime ",
" CLUSTER BY id ",
" OPTIONS( ",
" partition_expiration_days=365 ",
" ) ",
" AS ",
"SELECT * ",
"FROM first.rdds AS rd ",
"WHERE rd.actime = ",
"'", CAST(T.actime AS STRING), "'",
" AND ",
"EXISTS ( ",
"SELECT 1 ",
"FROM first.rdds_load AS rd_load ",
"WHERE rd_load.id= rd.id ",
")"
) AS SQ
FROM (
SELECT DISTINCT actime
FROM first.rdds AS rd
WHERE EXISTS (
SELECT 1
FROM first.rdds_load AS rd_load
WHERE rd_load.id= rd.id
)
) T;
My variable will hold many rows, each containing a CREATE TABLE script, and I need to execute its contents.
In SQL Server, a variable is executed with:
EXEC(@variable);
How do I execute a SQL variable in Google BigQuery?
EDIT:
I ran a new test with the beta version:
Using an array, with all rows in one result (ARRAY_AGG):
DECLARE SQL ARRAY<STRING>;
SET SQL = (
SELECT
CONCAT(
"CREATE TABLE IF NOT EXISTS first.rdds_",
REPLACE(CAST(T.actime AS STRING), " 00:00:00+00", ""),
" PARTITION BY actime ",
" CLUSTER BY id ",
" OPTIONS( ",
" partition_expiration_days=365 ",
" ) ",
" AS ",
"SELECT * ",
"FROM first.rdds AS rd ",
"WHERE rd.actime = ",
"'", CAST(T.actime AS STRING), "'",
" AND ",
"EXISTS ( ",
"SELECT 1 ",
"FROM first.rdds_load AS rd_load ",
"WHERE rd_load.id= rd.id ",
")"
)
) AS SQ
FROM (
SELECT DISTINCT actime
FROM first.rdds AS rd
WHERE EXISTS (
SELECT 1
FROM first.rdds_load AS rd_load
WHERE rd_load.id= rd.id
)
) T
);
My result:
One row containing all the instructions, but I can't run them all.
Update: as of 5/20/2020, BigQuery has released a dynamic SQL feature that achieves this goal.
Dynamic SQL is now available as a beta release in all BigQuery regions. Dynamic SQL lets you generate and execute SQL statements dynamically at runtime. For more information, see EXECUTE IMMEDIATE.
================
BigQuery does not support this (Dynamic SQL) in pure SQL, but you can implement this in any client of your choice
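As a rough illustration of the client-side approach, the statements can be assembled in the client and then submitted one at a time. The Python sketch below only builds the strings, following the CONCAT logic from the question; build_create_statements is a hypothetical helper, the example dates are made up, and the actual submission is left to whichever client library you use.

```python
def build_create_statements(actimes):
    """Return one CREATE TABLE statement per actime string (hypothetical helper)."""
    statements = []
    for actime in actimes:
        # Same suffix logic as the REPLACE(...) call in the question.
        suffix = actime.replace(" 00:00:00+00", "")
        statements.append(
            f"CREATE TABLE IF NOT EXISTS first.rdds_{suffix} "
            "PARTITION BY actime CLUSTER BY id "
            "OPTIONS(partition_expiration_days=365) AS "
            "SELECT * FROM first.rdds AS rd "
            f"WHERE rd.actime = '{actime}' "
            "AND EXISTS (SELECT 1 FROM first.rdds_load AS rd_load "
            "WHERE rd_load.id = rd.id)"
        )
    return statements


# Example input; in practice these would come from the SELECT DISTINCT query.
statements = build_create_statements(
    ["2019-10-01 00:00:00+00", "2019-10-02 00:00:00+00"]
)
for statement in statements:
    # Submit `statement` with the client of your choice here.
    print(statement)
```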
While Mikhail is correct that this historically hasn't been supported in BigQuery, the very new beta release of BigQuery Scripting should let you accomplish similar results:
https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting
In this case, you would need to use SET to assign the value of the variable, and there isn't an EXEC statement at this time, but there is support for conditionals, loops, variables, etc.
To recreate your example, you could store the results of a query against your first.rdds and first.rdds_load tables in an array, then use WHILE to loop over those results. Within that loop, you can run a normal CREATE TABLE if it doesn't already exist. I'm thinking something along these lines based on your example:
DECLARE results ARRAY<STRING>;
DECLARE i INT64 DEFAULT 1;
DECLARE cnt INT64 DEFAULT 0;
SET results = ARRAY(
SELECT
DISTINCT AS VALUE
CAST(actime AS STRING)
FROM
first.rdds AS rd
WHERE
EXISTS (
SELECT
1
FROM
first.rdds_load AS rd_load
WHERE
rd_load.id = rd.id
)
);
SET cnt = ARRAY_LENGTH(results);
WHILE i <= cnt DO
/* Body of CREATE TABLE goes here; you can access the rows from the query above using results[ORDINAL(i)] as you loop through */
SET i = i + 1; /* advance the counter, otherwise the loop never ends */
END WHILE;
There's also support for stored procedures, which can be executed via CALL with passed arguments, which may work in your case as well (if you need to abstract the creation logic used by many scripts).
(I would argue that this scripting support is superior to building and executing strings, since you'll still get SQL validation and such for your query.)
As always with beta features, use with caution in production—but for what it's worth, thus far my experience has been incredibly stable.
I found a way to do it like this:
import sqlite3
conn = sqlite3.connect(
'avg_prices.db',
detect_types=sqlite3.PARSE_DECLTYPES | sqlite3.PARSE_COLNAMES)
cursorObj = conn.cursor()
# Take the last table listed in sqlite_master.
cursorObj.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
listed_tables = cursorObj.fetchall()
last_table = listed_tables[-1][0]
rate = "VES/COP"
# A table name cannot be bound as a parameter, so it is quoted as an identifier; the rate is bound.
sql = 'SELECT * FROM "' + last_table + '" WHERE Rates = ?'
cursorObj.execute(sql, (rate,))
result = cursorObj.fetchone()
result then holds the values I wanted.
I checked this: MySQL - query for last created table
And it seems there is an easier way, but I didn't manage to get it to work on python.
SQLite does not store the creation time of tables. That means that you cannot do what you want.
The sqlite_master table has no relevant column. If you assume that its rows are returned in table creation order, you do so at your own risk.
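If you control the code that creates the tables, one possible workaround is to record the creation order yourself. The sketch below is only an illustration under that assumption: table_log, create_logged_table, last_created_table and the Price column are hypothetical names, not anything SQLite provides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical bookkeeping table: one row per table we create.
conn.execute(
    "CREATE TABLE table_log ("
    "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
    "name TEXT NOT NULL, "
    "created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
)


def create_logged_table(conn, name):
    """Create a price table and record its name in table_log."""
    # Identifiers cannot be bound as parameters, so quote the name explicitly.
    quoted = '"' + name.replace('"', '""') + '"'
    conn.execute(f"CREATE TABLE {quoted} (Rates TEXT, Price REAL)")
    conn.execute("INSERT INTO table_log (name) VALUES (?)", (name,))


def last_created_table(conn):
    """Return the most recently logged table name, or None if nothing is logged."""
    row = conn.execute(
        "SELECT name FROM table_log ORDER BY seq DESC LIMIT 1"
    ).fetchone()
    return row[0] if row else None


create_logged_table(conn, "prices_monday")
create_logged_table(conn, "prices_tuesday")
print(last_created_table(conn))
```

Ordering by the autoincrement column rather than by created_at avoids ties when several tables are created within the same second.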
I know that there is a similar question already posed. Since the answers are more than 6 years old, I started a new thread.
I have an Access DB and a copy of that DB. The plan is to write data to the copy and then automatically copy the new data to the original. So basically both DBs are the same.
I found an answer here: How Do I Copy a table from one Access DB to another Access DB. Now I want to adapt it to my purpose, but I can't get it to work.
Here is the SQL string:
strSQL = "INSERT INTO [maintblKeyFinancials].* " & _
"IN '" & destination_DB & "' " & _
"SELECT * FROM [maintblKeyFinancials] " & _
" WHERE [Company_ID] = " & identifier & _
" AND [Reference_year] = " & Chr$(34) & Year & Chr$(34) & ";"
Yes, [Reference_year] is a string. Here is the output:
INSERT INTO [maintblKeyFinancials].* IN 'C:\destination.accdb'
SELECT * FROM [maintblKeyFinancials] IN 'C:\source.accdb'
WHERE [Company_ID] = 899 AND [Reference_year] = "2015";
When I execute the string, I get "syntax error in query. incomplete query clause". And I don't know what to correct. Hope you can help me. Thx!
INSERT INTO [maintblKeyFinancials].*
Remove the .* at the end, this gives the syntax error. It's either
INSERT INTO [maintblKeyFinancials] (column1, column2)
SELECT column1, column2 FROM ...
or if the columns are completely identical
INSERT INTO [maintblKeyFinancials]
SELECT * FROM ...
Write [YourServer].[YourSchema].[YourTable]:
SELECT * FROM [YourServer].[YourSchema].[maintblKeyFinancials]
I am trying out some dynamic SQL queries using R and the postgres package to connect to my DB.
Unfortunately I get an empty data frame if I execute the following statement:
x <- "Mean"
query1 <- dbGetQuery(con, statement = paste(
"SELECT *",
"FROM name",
"WHERE statistic = '",x,"'"))
I believe that there is a syntax error somewhere in the last line. I already changed the commas and quotation marks in every possible way, but nothing seems to work.
Does anyone have an idea how I can construct this SQL query with a dynamic WHERE clause using an R variable?
You should use paste0 instead of paste, which inserts a space between its arguments and so produces the wrong string here, or paste(..., sep='') which is slightly less efficient (see ?paste0 or docs here).
You should also consider building your SQL statement in a separate variable. That way you can always easily check what SQL is being produced.
I would use this (and I am using this all the time):
x <- "Mean"
sql <- paste0("select * from name where statistic='", x, "'")
# print(sql)
query1 <- dbGetQuery(con, sql)
If the SQL is inside a function, I always add a debug parameter so I can see what SQL is used:
get_statistic <- function(x=NA, debug=FALSE) {
  sql <- paste0("select * from name where statistic='", x, "'")
  if(debug) print(sql)
  query1 <- dbGetQuery(con, sql)
  query1
}
Then I can simply use get_statistic('Mean', debug=TRUE) and I will see immediately if generated SQL is really what I expected.
The Problem
The problem may be that you have spaces around Mean:
x <- "Mean"
s <- paste(
"SELECT *",
"FROM name",
"WHERE statistic = '",x,"'")
giving:
> s
[1] "SELECT * FROM name WHERE statistic = ' Mean '"
Corrected Version
Instead try:
s <- sprintf("select * from name where statistic = '%s'", x)
giving:
> s
[1] "select * from name where statistic = 'Mean'"
gsubfn
You could also try this:
library(gsubfn)
fn$dbGetQuery(con, "SELECT *
FROM name
WHERE statistic = '$x'")
Try this:
require(stringi)
stri_paste("SELECT * ",
"FROM name ",
"WHERE statistic = '",x,"'",collapse="")
## [1] "SELECT * FROM name WHERE statistic = 'Mean'"
or use the concatenation operator %+%
"SELECT * FROM name WHERE statistic ='" %+% x %+% "'"
## [1] "SELECT * FROM name WHERE statistic ='Mean'"
A newer way to do this is with the glue package, part of the tidyverse. It is described as "An implementation of interpreted string literals, inspired by Python's Literal String Interpolation."
Using glue, you would do:
library(glue)
library(DBI)
x <- "Mean"
query1 <- glue_sql("
SELECT *
FROM name
WHERE statistic = ({x})
", .con = con)
dbGetQuery(con, query1)
It's a great package due to its flexibility. For example, let's say you wanted to import mean, median and mode statistics. Then you would switch the comparison to IN and add an asterisk to the placeholder, which expands the vector into a comma-separated list:
x <- c("Mean", "Median", "Mode")
query2 <- glue_sql("
SELECT *
FROM name
WHERE statistic IN ({x*})
", .con = con)
dbGetQuery(con, query2)
I have the following table in a SQLite3 database:
CREATE TABLE overlap_results (
neighbors_of_annotation varchar(20),
other_annotation varchar(20),
set1_size INTEGER,
set2_size INTEGER,
jaccard REAL,
p_value REAL,
bh_corrected_p_value REAL,
PRIMARY KEY (neighbors_of_annotation, other_annotation)
);
I would like to perform the following query:
SELECT * FROM overlap_results WHERE
(neighbors_of_annotation, other_annotation)
IN (('16070', '8150'), ('16070', '44697'));
That is, I have a couple of tuples of annotation IDs, and I'd like to fetch
records for each of those tuples. The sqlite3 prompt gives me the following
error:
SQL error: near ",": syntax error
How do I properly express this as a SQL statement?
EDIT I realize I did not explain well what I am really after. Let me try another crack at this.
If a person gives me an arbitrary list of terms in neighbors_of_annotation that they're interested in, I can write a SQL statement like the following:
SELECT * FROM overlap_results WHERE
neighbors_of_annotation
IN (TERM_1, TERM_2, ..., TERM_N);
But now suppose that person wants to give me pairs of terms of the form (TERM_1,1, TERM_1,2), (TERM_2,1, TERM_2,2), ..., (TERM_N,1, TERM_N,2), where TERM_i,1 is in neighbors_of_annotation and TERM_i,2 is in other_annotation. Does the SQL language provide an equally elegant way to formulate the query for pairs (tuples) of interest?
The simplest solution seems to be to create a new table, just for these pairs,
and then join that table with the table to be queried, and select only the
rows where the first terms and the second terms match. Creating tons of AND /
OR statements looks scary and error prone.
I've never seen SQL like that. If it exists, I would suspect it's a non-standard extension. Try:
SELECT * FROM overlap_results
WHERE neighbors_of_annotation = '16070'
AND other_annotation = '8150'
UNION ALL SELECT * FROM overlap_results
WHERE neighbors_of_annotation = '16070'
AND other_annotation = '44697';
In other words, build the dynamic query from your tuples but as a series of unions instead, or as a series of ANDs within ORs:
SELECT * FROM overlap_results
WHERE (neighbors_of_annotation = '16070' AND other_annotation = '8150')
OR (neighbors_of_annotation = '16070' AND other_annotation = '44697');
So, instead of code (pseudo-code, tested only in my head so debugging is your responsibility) such as:
query = "SELECT * FROM overlap_results"
query += " WHERE (neighbors_of_annotation, other_annotation) IN ("
sep = ""
for element in list:
    query += sep + "('" + element.noa + "','" + element.oa + "')"
    sep = ","
query += ");"
you would instead have something like:
query = "SELECT * FROM overlap_results "
sep = "WHERE "
for element in list:
    query += sep + "(neighbors_of_annotation = '" + element.noa + "'"
    query += " AND other_annotation = '" + element.oa + "')"
    sep = " OR "
query += ";"
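The pseudo-code above concatenates the values directly into the SQL text. A runnable variant of the same OR-of-ANDs idea with bound parameters might look like the following Python sketch. It assumes an in-memory SQLite database with a cut-down version of the overlap_results table; the sample rows and jaccard values are made up, and the list of pairs is assumed to be non-empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE overlap_results ("
    "neighbors_of_annotation varchar(20), "
    "other_annotation varchar(20), "
    "jaccard REAL, "
    "PRIMARY KEY (neighbors_of_annotation, other_annotation))"
)
conn.executemany(
    "INSERT INTO overlap_results VALUES (?, ?, ?)",
    [("16070", "8150", 0.5), ("16070", "44697", 0.25), ("16070", "9987", 0.1)],
)

# Pairs of interest (must not be empty, or the WHERE clause would be invalid).
pairs = [("16070", "8150"), ("16070", "44697")]

# One "(a = ? AND b = ?)" group per pair, joined with OR.
condition = " OR ".join(
    "(neighbors_of_annotation = ? AND other_annotation = ?)" for _ in pairs
)
# Flatten the pairs so the values line up with the placeholders.
params = [value for pair in pairs for value in pair]

rows = conn.execute(
    f"SELECT * FROM overlap_results WHERE {condition} "
    "ORDER BY other_annotation",
    params,
).fetchall()
print(rows)
```

Generating the condition from the list keeps the AND/OR structure out of hand-written code, which addresses the concern about long error-prone clauses.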
I'm not aware of any SQL dialects that support tuples inside IN clauses. I think you're stuck with:
SELECT * FROM overlap_results WHERE (neighbors_of_annotation = '16070' and other_annotation = '8150') or (neighbors_of_annotation = '16070' and other_annotation = '44697')
Of course, this particular query can be simplified to something like:
SELECT * FROM overlap_results WHERE neighbors_of_annotation = '16070' and (other_annotation = '8150' or other_annotation = '44697')
Generally SQL WHERE-clause predicates only allow filtering on a single-column.