Translate age() function from PostgreSQL to Google BigQuery

I'm aware that age() doesn't exist in Google BigQuery. There's DATE_DIFF, but it doesn't give me an accurate difference between two dates.
For example, if I run this query in PostgreSQL
select AGE('2020-10-10','2000-10-11')
This will give me the result: "19 years 11 mons 30 days"
However, if I run this query in Google Bigquery
SELECT DATE_DIFF(safe.parse_date('%Y%m%d', safe_cast(20201010 as string)),safe.parse_date('%Y%m%d', safe_cast(20001011 as string)), YEAR)
This will give me result: 20
How can I get the same result as in PostgreSQL, that is, the difference in years, months, and days between two dates, in Google BigQuery?
Thanks.

BigQuery has no built-in function that returns the output in this format, so you can use a JavaScript user-defined function (UDF), which lets you call JavaScript code from a SQL query in BigQuery. For reference, you can use the sample query below.
Query:
CREATE TEMP FUNCTION
DiffinDate(days int64)
RETURNS STRING
LANGUAGE js AS r"""
function DateDiff(days) {
var years = Math.floor(days / 365);
var months = Math.floor(days % 365 / 30);
var days = Math.floor(days % 365 % 30);
var yearsDisplay = years + (years == 1 ? " year " : " years ");
var monthsDisplay = months + (months == 1 ? " month " : " months ");
var daysDisplay = days + (days == 1 ? " day" : " days");
return yearsDisplay + monthsDisplay + daysDisplay
}
return DateDiff(days);
""";
WITH
INPUT AS (
SELECT
DATE_DIFF(DATE '2010-07-07', DATE '2008-12-25', DAY) day )
SELECT
DiffinDate(day) AS diff_in_date
FROM
INPUT
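This logic approximates a year as 365 days and a month as 30 days, so the result can differ from PostgreSQL's age() by a day or more. You can change the calculation logic as needed.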
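If the 365/30 approximation is not accurate enough, a calendar-aware variant can be sketched as follows. This is a minimal Python illustration, not BigQuery code: the function names age and format_age are hypothetical, and the borrow rule (when the day difference is negative, add the length of the earlier date's month) is my reading of how PostgreSQL arrives at the result quoted in the question. The same arithmetic could be ported into the UDF body.

```python
from calendar import monthrange
from datetime import date


def age(end: date, start: date) -> tuple:
    """Return (years, months, days) from start to end; requires end >= start."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    years = end.year - start.year
    months = end.month - start.month
    days = end.day - start.day
    if days < 0:
        # Assumed borrow rule: use the length of the earlier date's month.
        days += monthrange(start.year, start.month)[1]
        months -= 1
    if months < 0:
        months += 12
        years -= 1
    return years, months, days


def format_age(parts: tuple) -> str:
    """Render the tuple in a PostgreSQL-like style (plural forms only)."""
    years, months, days = parts
    return f"{years} years {months} mons {days} days"


print(format_age(age(date(2020, 10, 10), date(2000, 10, 11))))
```

For the dates in the question this prints "19 years 11 mons 30 days", which matches the PostgreSQL output quoted there.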

Related

I want to know which partitions are getting hit when I run a hive query?

I have a table having year, month and day as partition. I am trying to find an optimized way to read the data for last n days using parameters. The only way I can do this at the moment is by specifying each of the combination of year, month, and day individually which is very problematic if we have to read a lot of data, say for 1 month.
Below is a sample example.
select count(*) from table
where (year = 2021 and month = 7 and day = 5)
or (year = 2021 and month = 7 and day = 4)
or (year = 2021 and month = 7 and day = 3)
I am interested in knowing the following.
Can I use case when in where clause without impacting the performance? For example, will the below query read the same amount of data as the above query?
select count(*) from table
where year = 2021 and month = 7 and (case when day between 4 and 7 then 1 else 0 end) = 1
How does partitioning work behind the scenes? I believe that the query gets converted into a MapReduce job before execution. Will both queries above be converted to the same MapReduce job?
Can I use functions like case when freely with partitioned columns in where clause and will the hive query engine be able to interpret the function and scan the appropriate partitions?
Is there any built in function in hive to know which partitions are getting hit by the query? If not, is there any workaround? Is there any way to know the same in presto?

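Partition pruning works fine with queries like the one below, which uses the same logic as your CASE expression. Note that the string comparison assumes month and day are stored zero-padded (for example '07'); if they are not, wrap them in lpad(..., 2, '0') first: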
where concat(year, '-', month, '-', day) >= '2021-07-04'
and
concat(year, '-', month, '-', day) <= '2021-07-07'
See this answer.
How to check how partition pruning works: use EXPLAIN DEPENDENCY or EXPLAIN EXTENDED. See this answer.
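For the "last n days using parameters" part of the question, one option is to generate the per-day predicate on the client instead of typing it by hand. The sketch below is a hypothetical helper (last_n_days_filter is not a Hive feature) that assumes integer year, month, and day partition columns as in the question; it only builds the WHERE text and does not verify how Hive prunes it.

```python
from datetime import date, timedelta


def last_n_days_filter(end_day: date, n: int) -> str:
    """Build an OR-ed predicate covering end_day and the n - 1 days before it."""
    if n < 1:
        raise ValueError("n must be at least 1")
    clauses = []
    for offset in range(n):
        d = end_day - timedelta(days=offset)
        # Values come from a date object, so they are always plain integers.
        clauses.append(
            f"(year = {d.year} and month = {d.month} and day = {d.day})"
        )
    return "\n   or ".join(clauses)


# Crossing a month boundary is handled by the date arithmetic.
print("select count(*) from table\nwhere " + last_n_days_filter(date(2021, 7, 2), 3))
```

The generated predicate has the same shape as the hand-written query in the question, so EXPLAIN DEPENDENCY can be used on it to confirm which partitions are read.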

How to assign a value to a date depending on it in PostgreSQL?

I'm using PostgreSQL and want to create a function and use it in the same query. The function that I want to create should do the same as the following python function:
def get_season(dia):
    season = (abs(dia.year) % 100) + (1 if dia.strftime('%m-%d') >= '10-01' else 0)
    return season
Given a datetime, the function returns the last two digits of its year, plus 1 if the date is on or after October 1, for example:
input = '2017-3-5' -> output = 17
input = '2019-11-1' -> output = 20
The problem is that I don't know which functions to use for this in PostgreSQL.
Currently, I'm using the following code, but it throws errors:
CREATE FUNCTION get_season(dia, datetime) RETURNS integer AS $$
BEGIN
MOD(EXTRACT(year FROM dia), 100) +
CASE WHEN EXTRACT(MONTH FROM dia) >= 10 THEN 1 ELSE 0 END;
END $$
LANGUAGE PLPGSQL;
SELECT date_column, get_season(date_column)
FROM my_table
The error thrown is: (psycopg2.errors.UndefinedObject) dia type doesn't exist

The following should do this:
create or replace function get_season(p_input date)
returns integer
as
$$
select case
when extract(month from p_input) > 9 then extract(year from p_input)::int + 1
else extract(year from p_input)::int
end % 100
$$
language sql
immutable;
The condition extract(month from p_input) > 9 checks if the date is in October or later and returns the next year's value.
However, I would strongly recommend not to use two-digit years.

The PostgreSQL EXTRACT function pulls one field out of a date, so you could write something like:
MOD(EXTRACT(year FROM yourdate), 100) +
CASE WHEN EXTRACT(month FROM yourdate) > 9 THEN 1 ELSE 0 END
The first expression gets the year as two digits; the second gets 0 or 1 to add, depending on the month.
P.S. I'm not sure I would call such a function "get_season", since there are four seasons in a year; perhaps "get_academic_year", though this depends entirely on context. It's just a note on what an unfamiliar third party assumed on seeing the word "season".
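The two formulas in this thread agree on the examples from the question but differ at a century boundary. The following Python sketch puts them side by side; the function names are hypothetical, and the code only mirrors the arithmetic, not the PostgreSQL functions themselves.

```python
from datetime import date


def season_mod_then_add(d: date) -> int:
    """Question's formula: two-digit year, plus 1 from October 1 onward."""
    return d.year % 100 + (1 if d.month >= 10 else 0)


def season_add_then_mod(d: date) -> int:
    """SQL answer's formula: bump the year first, then take it modulo 100."""
    year = d.year + 1 if d.month > 9 else d.year
    return year % 100


for sample in (date(2017, 3, 5), date(2019, 11, 1), date(2099, 11, 1)):
    print(sample, season_mod_then_add(sample), season_add_then_mod(sample))
```

For 2099-11-01 the first variant yields 100 and the second yields 0, which is one more reason to be cautious with two-digit years.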

YQL - What is syntax to query data for the past 3 months in YQL?

I have this query and I want to change it so that it gets data for the past 3 months (or 90 days from now).
select * from yahoo.finance.historicaldata where symbol = "YHOO" and startDate = "2009-09-11" and endDate = "2010-03-10"
Is there any "lookback" function available in YQL?

If you want to use YQL, I would suggest playing around with the YQL developer console (with community tables enabled)
Here: https://developer.yahoo.com/yql/console/?q=show%20tables&env=store://datatables.org/alltableswithkeys
The following YQL statement would get the historical data for Yahoo from the start of the year to yesterday:
select * from yahoo.finance.historicaldata where symbol = "YHOO" and startDate = "2016-01-01" and endDate = "2016-03-24"
The following URL returns the result of that statement as JSON:
https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22YHOO%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-03-24%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=
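The answer above uses fixed dates. If no lookback function is available, the date range can be computed on the client before the statement is sent. The sketch below is an assumption about how one might do that in Python; build_history_query is a hypothetical helper, and it only produces the YQL text shown in this thread without contacting any service.

```python
from datetime import date, timedelta


def build_history_query(symbol: str, end_day: date, lookback_days: int = 90) -> str:
    """Return a YQL statement covering the lookback_days before end_day."""
    start_day = end_day - timedelta(days=lookback_days)
    return (
        "select * from yahoo.finance.historicaldata "
        f'where symbol = "{symbol}" '
        f'and startDate = "{start_day.isoformat()}" '
        f'and endDate = "{end_day.isoformat()}"'
    )


# Pass date.today() as end_day to get a rolling 90-day window.
print(build_history_query("YHOO", date(2016, 3, 24)))
```

With an end date of 2016-03-24 the computed start date is 2015-12-25.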

PHP SQL Select between 4 columns

I'm looking for a solution where I can select the entries between two dates. My table looks like this:
ID | YEAR | MONTH | ....
Now I want to SELECT all entries between
MONTH 9 | YEAR 2015
MONTH 1 | YEAR 2016
I don't get any entries, because the second month is lower than the first month. Here is my query:
SELECT *
FROM table
WHERE YEAR >= '$year'
AND MONTH >= '$month'
AND YEAR <= '$year2'
AND MONTH <= '$month2'
I can't change the columns of the table, because the CSV import delivers them like this. Can anyone help me with this?

The years aren't disconnected from the months, so you can't test them separately.
Try something like
$date1 = $year*100+$month; // will be 201509
$date2 = $year2*100+$month2; // will be 201601
...
SELECT * FROM table WHERE (YEAR*100)+MONTH >= '$date1' AND (YEAR*100)+MONTH <= '$date2'
Make sure you protect against SQL injection though.
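The combined year*100+month key and the injection concern can be illustrated together with bound parameters. This is a sketch in Python with an in-memory SQLite database standing in for the real one; the table name entries and the helper select_between are hypothetical. In PHP the equivalent would be a prepared statement rather than string interpolation.

```python
import sqlite3


def select_between(conn, year_from, month_from, year_to, month_to):
    """Select rows whose (year, month) lies in the inclusive range."""
    low = year_from * 100 + month_from   # e.g. 201509
    high = year_to * 100 + month_to      # e.g. 201601
    sql = (
        "SELECT id, year, month FROM entries "
        "WHERE year * 100 + month BETWEEN ? AND ? "
        "ORDER BY year, month"
    )
    # Values are passed as bound parameters, never pasted into the SQL text.
    return conn.execute(sql, (low, high)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (id INTEGER PRIMARY KEY, year INTEGER, month INTEGER)"
)
conn.executemany(
    "INSERT INTO entries (year, month) VALUES (?, ?)",
    [(2015, 8), (2015, 9), (2015, 12), (2016, 1), (2016, 2)],
)
rows = select_between(conn, 2015, 9, 2016, 1)
print(rows)
```

Only the rows for 2015-09, 2015-12, and 2016-01 are returned; 2015-08 and 2016-02 fall outside the range.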

SELECT
*
FROM
`my_table`
WHERE
((`YEAR` * 12) + `MONTH`) >= (($year * 12) + $month)
AND ((`YEAR` * 12) + `MONTH`) <= (($year2 * 12) + $month2)
Since they aren't date fields, you need to convert to numbers that can be compared against. Multiplying the year by 12 and adding the month will give you a unique number specific to that month of the year. Then you can compare on that.

There are a couple of good answers, but assuming that you don't or can't change the date format, something you can do is
WHERE ((YEAR > '$year') OR
(YEAR = '$year' AND MONTH >= '$month'))
AND ((YEAR < '$year2') OR
(YEAR = '$year2' AND MONTH <= '$month2'))
I would suggest the workarounds though (like alphabetically comparing in YYYYMM[DD] format).
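Comparing in YYYYMM format only orders correctly when the month is zero-padded to two digits. The short Python sketch below shows why; the helper names are hypothetical and exist only for this illustration.

```python
def unpadded_key(year: int, month: int) -> int:
    """Naive concatenation: 2016 and 2 become 20162."""
    return int(f"{year}{month}")


def padded_key(year: int, month: int) -> str:
    """Zero-padded concatenation: 2016 and 2 become '201602'."""
    return f"{year}{month:02d}"


# February 2016 is later than December 2015, but the unpadded key says otherwise.
print(unpadded_key(2016, 2) > unpadded_key(2015, 12))
print(padded_key(2016, 2) > padded_key(2015, 12))
```

The first comparison prints False because 20162 is smaller than 201512; the second prints True because '201602' sorts after '201512'.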

You need to pad the month to make sure it starts with a zero. Otherwise 20162 will be lower than 201512, for example.
$date1 = $year . str_pad($month, 2, "0", STR_PAD_LEFT);
$date2 = $year2 . str_pad($month2, 2, "0", STR_PAD_LEFT);
"SELECT * FROM dates WHERE concat(`year`, LPAD(`month`, 2, '0')) >= '$date1' AND concat(`year`, LPAD(`month`, 2, '0')) <= '$date2'"

There are many ways to solve this problem, but the best is to convert these values into a proper date type in the MySQL query using STR_TO_DATE, which plays a role similar to PHP's strtotime. Your new query should look like this:
SELECT
d.*
from
dates as d
where
STR_TO_DATE( concat('1,',d.month,',',d.year) ,'%d,%m,%Y') > STR_TO_DATE('1,5,2015','%d,%m,%Y')
and
STR_TO_DATE( concat('1,',d.month,',',d.year) ,'%d,%m,%Y') < STR_TO_DATE('1,4,2016','%d,%m,%Y')
Using this technique you can easily compare dates and do much more and not worry about other complexities of calendars.
Source: MySQL date and time functions

HQL count() with group by not returning zero/0 records

String query = "select hour(la.dateLastUpdated) as hour, "
+ "coalesce(count(la), 0) from LoginActivity la "
+ "where la.dateLastUpdated > :date "
+ "group by hour(la.dateLastUpdated) "
+ "order by hour(la.dateLastUpdated)";
Date date = new Date(System.currentTimeMillis() - 12*60*60*1000);
Result I'm getting is like
Hour Count
---- -----
12 1
13 3
15 4
17 11
But I want result like
Hour Count
---- -----
12 1
13 3
14 0
15 4
16 0
17 11
That means the zero counts. Tried coalesce but it's not working. Any probable hql query to get expected result? Native query also will do.
*I'm using PostgreSql database

If the record you want (for example, hour = 14) does not exist in your LoginActivity table, it cannot show up in your result set.
I assume that you want to list every hour of the day and get record counts for each; if so:
You need a lookup structure that contains every hour of the day to begin with.
You need a left outer join from this structure to your result set, joining on the hour field.

Try this one:
"Select hour(la.dateLastUpdated) as hour, count(coalesce(la, 0))
from LoginActivity la
where la.dateLastUpdated > :date
group by hour(la.dateLastUpdated)
order by hour(la.dateLastUpdated);"

In PostgreSQL, you can get the result with a native query like this, which left-joins a generated series of hours to the counts so that hours without rows show a count of 0:
SELECT s.hourdigit AS hour, coalesce(resultdata.cnt, 0) AS count
FROM generate_series(12, 23) AS s(hourdigit)
LEFT JOIN (
SELECT
extract(hour FROM la.dateLastUpdated) AS hour,
count(*) AS cnt
FROM LoginActivity la
WHERE la.dateLastUpdated > '2014-05-05'
GROUP BY 1
) AS resultdata ON resultdata.hour = s.hourdigit
ORDER BY s.hourdigit
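The same left-join idea can be tried out locally. The sketch below uses Python with an in-memory SQLite database instead of PostgreSQL, so a recursive CTE stands in for generate_series and strftime for the hour extraction; the table login_activity, its columns, and the helper hourly_counts are hypothetical names chosen for the illustration.

```python
import sqlite3

SQL = """
WITH RECURSIVE hours(h) AS (
    SELECT :first
    UNION ALL
    SELECT h + 1 FROM hours WHERE h < :last
)
SELECT hours.h, COUNT(la.id)
FROM hours
LEFT JOIN login_activity AS la
       ON CAST(strftime('%H', la.date_last_updated) AS INTEGER) = hours.h
GROUP BY hours.h
ORDER BY hours.h
"""


def hourly_counts(conn, first_hour, last_hour):
    """Return (hour, count) for every hour in the range, including zeros."""
    return conn.execute(SQL, {"first": first_hour, "last": last_hour}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE login_activity (id INTEGER PRIMARY KEY, date_last_updated TEXT)"
)
stamps = ["2014-05-05 12:10:00"] + ["2014-05-05 13:%02d:00" % m for m in (1, 2, 3)]
stamps += ["2014-05-05 15:%02d:00" % m for m in (5, 6, 7, 8)]
conn.executemany(
    "INSERT INTO login_activity (date_last_updated) VALUES (?)",
    [(s,) for s in stamps],
)
print(hourly_counts(conn, 12, 15))
```

COUNT(la.id) counts only matched rows, so hour 14, which has no logins in the sample data, comes back with a count of 0 instead of being dropped.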
