Parameters in Datalab SQL modules - google-bigquery

The parameterization example in the "SQL Parameters" IPython notebook in the datalab github repo (under datalab/tutorials/BigQuery/) shows how to change the value being tested for in a WHERE clause. Is it possible to use a parameter to change the name of a field being SELECT'd on?
eg:
SELECT COUNT(DISTINCT $a) AS n
FROM [...]
After I received the answer below, here is what I have done (with a dummy table name and field name, obviously):
%%sql --module test01
DEFINE QUERY get_counts
SELECT $a AS a, COUNT(*) AS n
FROM [project_id.dataset_id.table_id]
GROUP BY a
ORDER BY n DESC
import gcp.bigquery as bq
table = bq.Table('project_id.dataset_id.table_id')
field = table.schema['field_name']
bq.Query(test01.get_counts,a=field).sql
bq.Query(test01.get_counts,a=field).results()

You can use a field from a Schema object (e.g. given a table, get a specific field via table.schema[field_name]).
Or implement a custom object with a _repr_sql_ method. See: https://github.com/GoogleCloudPlatform/datalab/blob/master/sources/lib/api/gcp/bigquery/_schema.py#L49
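For the second approach, here is a minimal sketch of such a custom object, assuming (as the linked Schema field class suggests) that Datalab calls _repr_sql_() on the argument and splices the returned text into the query; FieldName is just an illustrative name:
class FieldName(object):
    def __init__(self, name):
        self._name = name
    def _repr_sql_(self):
        # Return the bare column name so it is injected unquoted into the SELECT list.
        return self._name

bq.Query(test01.get_counts, a=FieldName('field_name')).results()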

Related

R: Summary of SQL Tables

I am working with the R programming language.
Normally, when I want to get the summary of a table, I can use something like the "str()" function or the "summary()" function:
str(my_table)
summary(my_table)
However, now I am trying to do this with tables on a server.
For instance, I am trying to get the summaries of variable types for a specific table (e.g. "my_table") on a server. I found a very indirect way to do this:
#load libraries
library(odbc)
library(RODBC)
library(DBI)
#establish a connection and name it as "dbhandle"
rs <- dbSendQuery(dbhandle, 'select * from my_table limit 1')
dbColumnInfo(rs)
My Question: Is there a more "direct" way to do this? For example, can I get information about each column (e.g. whether the column is integer, character, date, etc.) in a table without first sending the query and then requesting the information? Can I do this directly?
Thanks!
You could try using fetch() from "RMySQL" to turn your SQL query result into an R object (e.g. a data frame):
library(RMySQL)
rs <- dbSendQuery(dbhandle, 'select * from my_table limit 1')
# Get the results from MySQL into R
my_table = fetch(rs, n=-1)
# clear result
dbClearResult(rs)
rm(rs)
Then use the functions you describe.
str(my_table)
summary(my_table)
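If dbhandle is a DBI connection, dbGetQuery() combines the send/fetch/clear steps into a single call, so the same summary can be had a little more directly (a sketch, assuming my_table exists on the connection):
library(DBI)
# Fetch one row just to materialise the column types, then summarise as usual
my_table <- dbGetQuery(dbhandle, "select * from my_table limit 1")
str(my_table)
summary(my_table)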

Metabase - Field Filter as text

I have this SQL query:
select concept, count(*)
from annotation
where exists (select 1
from annotation a2
where a2.comment_commentid = annotation.comment_commentid and a2.concept = 'Fatigue'
)
group by concept;
And I want to replace 'Fatigue' with {{word}} to create a filter widget, mapping it to the column in the database.
I have the following error:
ERROR: syntax error at or near "=" Position: 307
What do I need to change to apply the filter so that it offers the available words from that column?
With the variable type set to Text it works, but it doesn't display all the available options in the filter the way the Field Filter variable type does...
Thanks!
The outer annotation table needs an alias too. When in doubt, the inner scope always prevails when resolving names, and the inner exists(...) query has an annotation name in scope, too.
[And the cause of your error is probably that the middleware gets confused.]
select concept, count(*)
from annotation a1 -- <<-- HERE!
where exists (select 1
from annotation a2
where a2.comment_commentid = a1.comment_commentid and a2.concept = 'Fatigue'
)
group by concept;
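Note that a Field Filter variable is substituted as a complete condition rather than as a bare value, so with that variable type the comparison is written as just {{word}} instead of a2.concept = {{word}}. A sketch, assuming the variable is mapped to the annotation.concept column and leaving the inner table unaliased so the condition Metabase generates can still reference it by name:
select concept, count(*)
from annotation a1
where exists (select 1
              from annotation
              where annotation.comment_commentid = a1.comment_commentid
                and {{word}})
group by concept;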

Like Clause over an 'Element' - ORACLE APEX

I'm encountering some problems with APEX that I don't understand... Well, let's be specific.
I've got a select list item (P11_URL) retrieving the top 50 most-liked URLs. It is populated from a table view, Top_Domains.
I created an item called "Context" that has to display all the texts containing the URL the user selected in the select list. Those texts come from another table, let's say "twitter_post".
I created a dynamic action (show only) with this SQL statement:
Select TXT, NB_RT, RANK
from myschema.twitter_post
where TXT like '%:P11_URL%'
group by TXT, NB_RT, RANK
.... and it doesn't work... I think APEX doesn't like the LIKE clause... but I don't know what else to do. Keep in mind that a URL could have been shared by multiple tweets; that's why this "Context" element is important to me.
I tried to bypass the problem by building a static region (in French, "Statique") and a dynamic action that refreshes it, but that doesn't work either...
TriX
Right-click on 'P11_URL' and create a DA. Event: Change, Item: P11_URL. As the true action of the DA, select 'Set Value'. Write your query in the SQL statement area. In 'Page Items to Submit', select 'P11_URL'. In 'Affected Items', select 'Context'.
Query should be :
Select TXT, NB_RT, RANK
from myschema.twitter_post
where TXT like '%' || :P11_URL || '%'
group by TXT, NB_RT, RANK
So, thanks to #Madona... their example made me realise my mistake. I'm writing the answer here for further reference in case somebody encounters the same problem.
A select list element takes as arguments a display value (the one you want shown on screen) and a return value (which, I think, is what linked dynamic actions receive). So to solve my problem I had to shape my SQL statement as:
select hashtags d, hashtags r
from my_table
order by 1
[Let's say that in APEX it is now an item called P1_HASHTAGS.]
That was the first step in solving the problem.
In fact, using the ranking as the second (return) value in my SQL statement was breaking my 'WHERE ... LIKE' search... well... newbie that I am!
The second step was to correctly format the SQL statement that receives the data from my select LOV (P1_HASHTAGS) in my interactive report, as shown here:
Select Id, hashtags
from my_table
where txt like '%'||:P1_HASHTAGS||'%'
And it works!
Thank you Madona, your example helped me figure out my mistakes!

How to store the output of a query in a variable in HIVE

I want to store current_day - 1 in a variable in Hive. I know there are already previous threads on this topic, but the solutions provided there first recommend defining the variable outside Hive in a shell environment and then using that variable inside Hive.
Storing result of query in hive variable
I first got the current_date - 1 using
select date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd'),1);
Then I tried two approaches:
1. set date1 = ( select date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd'),1);
and
2. set hivevar:date1 = ( select date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd'),1);
Both approaches throw an error:
"ParseException line 1:82 cannot recognize input near 'select' 'date_sub' '(' in expression specification"
When I print the variable from approach (1), the literal select query (rather than yesterday's date) is what gets saved in the variable. Approach (2) throws "{hivevar:dt_chk} is undefined".
I am new to Hive, would appreciate any help. Thanks.
Hive doesn't provide a straightforward way to store a query result in a variable. You have to use the shell along with hiveconf.
date1=$(hive -e "set hive.cli.print.header=false; select date_sub(from_unixtime(unix_timestamp(),'yyyy-MM-dd'),1);")
hive -hiveconf "date1"="$date1" -f hive_script.hql
Then in your script you can reference the newly created variable date1:
select '${hiveconf:date1}'
After lots of research, this is probably the best way to set a variable from the output of a SQL query:
INSERT OVERWRITE LOCAL DIRECTORY '<home path>/config/date1'
select CONCAT('set hivevar:date1=',date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd'),1)) from <some table> limit 1;
source <home path>/config/date1/000000_0;
You will then be able to use ${date1} in your subsequent SQLs.
Here we had to use <some table> limit 1 because Hive has a bug with INSERT OVERWRITE when no table name is specified.
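For example, to confirm the substitution after sourcing the file, a subsequent statement in the same session can simply echo the variable back:
select '${date1}';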

pass a column name correctly as a function argument for SQL query in R

I have a data frame sales_history. I want to query it in the following way:
my_df <- sqldf("SELECT *
FROM sales_history
WHERE Business_Unit=='RETAIL'")
Now I want to write a function whose arguments are the data frame and the column name, to do the above job. So something like:
pick_column <- function(df, column_name){
  my_df <- sqldf("SELECT *
                  FROM df
                  WHERE Business_Unit==column_name")
  return(my_df)
}
Ideally, after running the above function definition, I should then be able to run
pick_column(sales_history, 'RETAIL'). But when I do this, the second argument 'RETAIL' is not substituted into the query correctly. What's the correct way to do this?
I know that for this example, there are other ways to do this other than using "sqldf" for SQL query. But the point of my question here is how to pass the column_name correctly as a function argument.
The sqldf package uses gsubfn's fn$ prefix to let you splice the values of R variables into your SQL commands by prefixing them with the "$" character. So you can write:
library(sqldf)  # attaches gsubfn, which provides the fn$ prefix

sales_history <- data.frame(
  price = c(12, 10),
  Business_Unit = c("RETAIL", "BUSINESS"),
  stringsAsFactors = FALSE
)

pick_column <- function(df, columnname) {
  fn$sqldf("SELECT * FROM $df WHERE Business_Unit = '$columnname'")
}
pick_column("sales_history","RETAIL")