Google Query Language: filter by date - sql

I'm trying to add a filter by date in a Google Visualization API query, but I'm doing something wrong with the syntax...
This is the code without the date filter:
query.setQuery('SELECT A, B, C, D, E, F, G where upper(A) like upper("keyword") or upper(F) like upper("keyword") order by B DESC');
I want to add an AND condition requiring that the date in column B be on or after 1 Aug 2016.
So I tried with:
query.setQuery('SELECT A, B, C, D, E, F, G where upper(A) like upper("keyword") or upper(F) like upper("keyword") AND upper(B) >= date "2016-08-01" order by B DESC');
But the syntax is probably wrong as the query gets interrupted.

If B is a date, the error you get is:
Unable to parse query string for Function QUERY parameter 2: upper takes a text parameter
To solve it, just remove the upper function around B.
If B is just a string, automatic type casting is done and the query should run without problems.
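As a minimal sketch of the corrected query string (assuming B is a date column), the helper below drops upper() around B and wraps the two like conditions in parentheses so that the date condition applies to both of them. The parentheses are an addition not mentioned in the answer above; they make the grouping explicit instead of relying on how the parser ranks and/or. The function name build_query and its parameters are hypothetical, and the resulting string is what you would pass to query.setQuery(...).

```python
def build_query(keyword: str, since_iso: str) -> str:
    """Build a query string with a keyword match on A or F and a date floor on B.

    Hypothetical helper: assumes keyword contains no double quotes and
    since_iso is a date in yyyy-MM-dd form.
    """
    # Parentheses keep the two text conditions together as one unit.
    text_match = (
        f'(upper(A) like upper("{keyword}") '
        f'or upper(F) like upper("{keyword}"))'
    )
    # B is compared directly as a date; upper() only accepts text.
    date_filter = f'B >= date "{since_iso}"'
    return (
        "select A, B, C, D, E, F, G "
        f"where {text_match} and {date_filter} order by B desc"
    )


query_string = build_query("keyword", "2016-08-01")
print(query_string)
```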

Related

Adding a "calculated column" to BigQuery query without repeating the calculations

I want to reuse the values of calculated columns in a new third column.
For example, this query works:
select
countif(cond1) as A,
countif(cond2) as B,
countif(cond1)/countif(cond2) as prct_pass
From
Where
Group By
But when I try to use A,B instead of repeating the countif, it doesn't work because A and B are invalid:
select
countif(cond1) as A,
countif(cond2) as B,
A/B as prct_pass
From
Where
Group By
Can I somehow make the more readable second version work?
Is the first one inefficient?
You should construct a subquery (i.e. a double select) like
SELECT A, B, A/B as prct_pass
FROM
(
SELECT countif(cond1) as A,
countif(cond2) as B
FROM <yourtable>
)
The same amount of data will be processed in both queries.
With the subquery you evaluate only 2 countif() calls instead of 4, so if that step takes a long time the subquery version should indeed be more efficient.
Looking at an example using BigQuery public datasets:
SELECT
countif(homeFinalRuns>3) as A,
countif(awayFinalRuns>3) as B,
countif(homeFinalRuns>3)/countif(awayFinalRuns>3) as division
FROM `bigquery-public-data.baseball.games_post_wide`
or
SELECT A, B, A/B as division FROM
(
SELECT countif(homeFinalRuns>3) as A,
countif(awayFinalRuns>3) as B
FROM `bigquery-public-data.baseball.games_post_wide`
)
we can see that doing it all in one query (without a subquery) is actually slightly faster. (I ran the queries 6 times for different values of the inequality; the single query was faster 5 times and slower once.)
In any case, the efficiency will depend on how taxing it is to compute the condition in your particular dataset.
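To check that the two forms return the same numbers, here is a small sketch that uses Python's sqlite3 module as a stand-in for BigQuery. SQLite has no countif(), so SUM(condition) plays that role, and the multiplication by 1.0 avoids SQLite's integer division. The games table and its rows are made up for illustration and are not taken from the public dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (home_runs INTEGER, away_runs INTEGER)")
# Hypothetical rows, only for illustration.
conn.executemany(
    "INSERT INTO games VALUES (?, ?)",
    [(5, 2), (4, 6), (1, 7), (6, 1), (2, 1)],
)

# Version 1: the aggregate expressions are repeated in the third column.
repeated = conn.execute("""
    SELECT SUM(home_runs > 3) AS A,
           SUM(away_runs > 3) AS B,
           SUM(home_runs > 3) * 1.0 / SUM(away_runs > 3) AS prct_pass
    FROM games
""").fetchone()

# Version 2: the aliases A and B are defined in a subquery and reused outside.
nested = conn.execute("""
    SELECT A, B, A * 1.0 / B AS prct_pass
    FROM (
        SELECT SUM(home_runs > 3) AS A,
               SUM(away_runs > 3) AS B
        FROM games
    )
""").fetchone()

print(repeated, nested)
```

Both queries return the same row here; which one runs faster on a real engine is something to measure rather than assume.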

Access - SQL Query Date wise with selection of column summarized value

Below is my source data.
Using the query below, I can get summarized data for '17-09-2016'.
SQL query:
SELECT key_val.A, key_val.B, key_val.C, key_val.D, Sum(IIf(key_val.Store_date=#9/17/2016#,key_val.Val,0)) AS [17-09-2016]
FROM key_val
GROUP BY key_val.A, key_val.B, key_val.C, key_val.D;
But I want the output to look like this.
Specifically, I need summarized data for columns A, B, C and for the date '17-09-2016'.
In Excel we would apply a SUMIFS formula to get the desired output, but in Access SQL I can't work out how to form the query to get the same data.
Can anyone help me achieve the above result with an Access query?
I'm not sure where you get the 34 figure from - the sum of the first two rows even though the values in A, B, C & D are different (so the grouping won't work)?
Making an assumption that you want the values summed where all the other fields are equal (A, B, C, D & Store_Date):
This query will give you the totals, but not in the format you're after:
SELECT A, B, C, D, SUM(val) As Total, Store_Date
FROM key_val
WHERE Store_date = #9/17/2016#
GROUP BY A,B,C,D, Store_Date
This SQL will give you the same, but for all dates (just remove the WHERE clause).
SELECT A, B, C, D, SUM(val) As Total, Store_Date
FROM key_val
GROUP BY A,B,C,D, Store_Date
ORDER BY Store_Date
This will give the exact table shown in your example:
TRANSFORM Sum(val) AS SumOfValue
SELECT A, B, C, D
FROM key_val
WHERE Store_date = #9/17/2016#
GROUP BY A,B,C,D,val
PIVOT Store_Date
Again, just remove the WHERE clause to list all dates in the table.
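The difference between the conditional sum in the question and the WHERE-filtered GROUP BY above can be sketched with Python's sqlite3 module standing in for Access. SQLite has no TRANSFORM/PIVOT and stores dates as text here, so CASE WHEN replaces IIf and the date is written in ISO form. The rows are hypothetical, not the asker's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE key_val (A TEXT, B TEXT, C TEXT, D TEXT, Store_date TEXT, Val INTEGER)"
)
# Hypothetical rows, only for illustration.
conn.executemany(
    "INSERT INTO key_val VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("x", "p", "m", "k", "2016-09-17", 10),
        ("x", "p", "m", "k", "2016-09-17", 20),
        ("x", "p", "m", "k", "2016-09-18", 5),
        ("y", "q", "n", "l", "2016-09-18", 7),
    ],
)

# Conditional sum: every group is kept; groups without the date get 0.
conditional = conn.execute("""
    SELECT A, B, C, D,
           SUM(CASE WHEN Store_date = '2016-09-17' THEN Val ELSE 0 END) AS total
    FROM key_val
    GROUP BY A, B, C, D
    ORDER BY A
""").fetchall()

# WHERE filter: groups without the date disappear from the result.
filtered = conn.execute("""
    SELECT A, B, C, D, SUM(Val) AS total
    FROM key_val
    WHERE Store_date = '2016-09-17'
    GROUP BY A, B, C, D
    ORDER BY A
""").fetchall()

print(conditional)
print(filtered)
```

Which behaviour is wanted depends on whether groups with no rows on that date should still appear in the output.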

Using previous table in pig group syntax after filter

Suppose I have a table in Pig with 3 columns, a, b, c. Now suppose I want to filter the table by b == 4 and then group it by a. I believe that would look something like this.
t1 = my_table; -- the table contains three columns a, b, c
t1_filtered = FILTER t1 by (
b == 4
);
t1_grouped = GROUP t1_filtered by my_table.a;
My question is why can't it look like this:
t1 = my_table; -- the table contains three columns a, b, c
t1_filtered = FILTER t1 by (
b == 4
);
t1_grouped = GROUP t1_filtered by t1_filtered.a;
Why do you have to reference the table from before the filter? I'm trying to learn Pig and I find myself making this mistake a lot. It seems to me that t1_filtered should be a table that is just the filtered version of t1, so a simple group should make sense, but I've been told you need to reference the earlier table. Does anyone know what's going on behind the scenes and why this makes sense? Help naming this question is also appreciated.
The way you have dereferenced (.) is also not correct. This is how it should be:
A = LOAD '/filepath/to/tabledata' using PigStorage(',') as (a:int,b:int,c:int);
B = FILTER A BY a==1;
C = GROUP B BY a;
But your way of dereferencing (.) will also work in some cases. You can only use the dot (.) when you are referencing a complex data type such as a tuple or bag (maps are dereferenced with # instead). If the dot operator is used to access a normal field, Pig expects a scalar output. If there is more than one row in the output, you will get an error like this:
java.lang.Exception: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (1,2,3), 2nd :(2,2,2)
Your way of using the dot operator works only if the referenced relation produces a single row; otherwise you end up with this error. Relation B is not a complex data type, which is why no dereferencing operator is used in the GROUP BY clause.
Hope this answers your question.

Where clause not working

=QUERY('Sheet8'!A:Z;"select A, B where 'Year' = 2016")
I have this query and I wanted to make the year part dynamic so I tried this
=QUERY('Sheet8'!A:Z;"select A, B where 'Year' = year(now())+4")
But that didn't work, so I even tried saving the values in cells and then referencing them in the where clause, like where 'Year' = Sheet!B1, but that didn't work either. How can I create a where statement that performs an operation before doing the comparison?
The year() function in the QUERY select clause is a scalar function that must reference one of the columns. So if the column of dates is in column A, you would use where year(A) = etc.
To make it dynamic, it is best (IMO) to concatenate a value generated from spreadsheet functions (rather than the select clause, which doesn't seem to generate according to the spreadsheet time zone). So something like:
=QUERY('Sheet8'!A:Z;"select A, B where year(A) = "&(YEAR(NOW())+4))

How to "default" a column in a SELECT query

Say I have a database table T with 4 fields, A, B, C, and D. A, B, and C are the primary key. For any combination of [A, B], there is always a row where C == spaces. There may or may not be other rows where C != spaces. I have a query that gets all rows where [A, B] == [in_a, in_b], and also where C == in_c if such a row exists, or C == spaces if the in_c row doesn't exist. So, if there is a row that matches the particular C value, I want that one, otherwise I want the spaces one. It is very important that if there is a matching C row, that I not be returned the spaces one along with it.
I have a working query, but it's not very fast. This is executing on DB2 for z/OS. I have full control over these tables, so I can define new indexes if needed. The only index on the table right now is [A, B, C], the primary key. This SQL is kind of messy, and I feel there's a better way to accomplish this task. What can I do to make this query faster?
The query I have now is:
SELECT A, B, C, D FROM T
WHERE A = :IN_A AND B = :IN_B AND
(C = :IN_C
OR (NOT EXISTS(
SELECT B FROM T WHERE
A = :IN_A AND B = :IN_B AND C = :IN_C))
AND C = " ");
Caveat emptor, as I am not familiar with DB2 SQL...
You could try using an ORDER BY clause to sort the matching rows such that a row with c = spaces is last in the sorted set, then retrieve just the first row of the set. Something like:
select first
A, B, C, D
from T
where A = :IN_A
and B = :IN_B
order by C desc;
This assumes that the FIRST and ORDER BY DESC clauses do what I expect them to.
This will work on DB2 LUW, not sure if the order by clause works on DB2 Z:
select
a, b, c, d
from t
where a = :IN_A
and b = :IN_B
and c in (:IN_C,' ')
order by
case c when ' ' then 2 else 1 end
fetch first 1 row only
Make sure that the ' ' value matches the actual value of the column.
Good luck.
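The ORDER BY CASE approach above can be tried out with Python's sqlite3 module as a stand-in for DB2. This is only a sketch: SQLite spells FETCH FIRST 1 ROW ONLY as LIMIT 1, the table contents are hypothetical, and the helper name lookup is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (a TEXT, b TEXT, c TEXT, d TEXT, PRIMARY KEY (a, b, c))"
)
# Hypothetical rows: every (a, b) pair has a row where c is a single space.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("a1", "b1", " ", "default for a1/b1"),
        ("a1", "b1", "x", "specific x"),
        ("a1", "b2", " ", "default for a1/b2"),
    ],
)


def lookup(in_a, in_b, in_c):
    """Return the row matching in_c if it exists, else the blank-c row."""
    return conn.execute(
        """
        SELECT a, b, c, d FROM t
        WHERE a = ? AND b = ? AND c IN (?, ' ')
        ORDER BY CASE c WHEN ' ' THEN 2 ELSE 1 END  -- specific row sorts first
        LIMIT 1
        """,
        (in_a, in_b, in_c),
    ).fetchone()


specific = lookup("a1", "b1", "x")   # a row with c = 'x' exists
fallback = lookup("a1", "b2", "x")   # only the blank-c row exists
print(specific, fallback)
```

Because at most two rows can match the WHERE clause, the sort is cheap, and the blank row is never returned alongside a specific one.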
Why not start up the index advisor and read its advice? (Or is this only on DB2 for i/OS?)
We use the advisor for our very big production environment and it gives great advice. That said, it's always good to start with a good statement.