How do I generate an IF statement in PyPika?
I am trying to generate a BigQuery query that pivots a row to a column. I found that if I use the following in a query (where date_range is from a WITH statement):
IF (date_range.kind = 'year', date_range.name, NULL) as year
then this will work. However, I haven't found a way to generate this SQL fragment in PyPika.
For completeness, this is an example of a query I need to run in BigQuery:
WITH date_range AS (
SELECT
CAST(EXTRACT(year FROM year) as string) name,
'year' kind,
year start_date,
DATE_ADD(year, INTERVAL 1 year) end_date
FROM UNNEST(GENERATE_DATE_ARRAY('2010-01-01','2020-06-01',INTERVAL 1 year)) year
UNION ALL
SELECT
FORMAT_DATE('%B', month)||' '||EXTRACT(year FROM month) name,
'month' kind,
month start_date,
DATE_ADD(month,INTERVAL 1 month) end_date
FROM
UNNEST(GENERATE_DATE_ARRAY('2010-01-01','2020-06-01',INTERVAL 1 month)) month
)
SELECT
IF(date_range.kind='year', date_range.name, null) as year,
IF(date_range.kind='month', date_range.name, null) as month,
SUM(sales.sales_value) sales_value,
FROM sales
JOIN date_range ON sales.start_date>=date_range.start_date AND sales.end_date<date_range.end_date
GROUP BY year, month
ORDER BY year, month
The more general question I have is: is there a way to pass literal strings to PyPika so that they are included verbatim in the resulting query string? There are several SQL fragments that PyPika does not generate (such as GENERATE_DATE_ARRAY and UNNEST, at least as far as I can find), and passing the actual SQL fragment to PyPika would solve the problem.
Thanks!
Not sure if it applies but be sure to also check whether the CASE statement can help you.
Other than that, you can either subclass PyPika's Function class and override get_sql, or (ab)use the CustomFunction and PseudoColumn utility classes like this:
from pypika import CustomFunction, Query, Table
from pypika.pseudocolumns import PseudoColumn
sales_table = Table('sales')
MyIf = CustomFunction('IF', ['condition', 'if', 'else'])
q = Query.from_(sales_table).select(
MyIf(PseudoColumn("date_range.kind = 'year'"), PseudoColumn("date_range.name"), None, alias="year")
)
However, I'd probably recommend opening a ticket on the PyPika GitHub.
Note: I wasn't able to test this.
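To illustrate the CASE suggestion above, here is a minimal sketch of the same pivot written with CASE instead of IF. It runs against an in-memory SQLite database purely for demonstration (an assumption; the original question targets BigQuery), and the two sample rows are made up.

```python
import sqlite3

# In-memory database with a tiny stand-in for the date_range CTE (sample data is hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE date_range (name TEXT, kind TEXT);
    INSERT INTO date_range VALUES ('2019', 'year'), ('March 2019', 'month');
""")

# CASE without an ELSE branch yields NULL when no WHEN matches,
# which mirrors IF(condition, value, NULL).
rows = conn.execute("""
    SELECT
        CASE WHEN kind = 'year'  THEN name END AS year,
        CASE WHEN kind = 'month' THEN name END AS month
    FROM date_range
    ORDER BY rowid
""").fetchall()

print(rows)
```

Since CASE is standard SQL, a query builder is more likely to support it directly than a dialect-specific IF function.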
I have a table with update requests for disk quotas. Each record has a request time, a path, and a quota size.
I need to aggregate all requests for a certain path in a certain month, plus the last request of the previous month. Something like
sum(quota) over (partition by )
I would appreciate some ideas or thoughts about how to do that in the most elegant way. Of course, it may be (and probably is) a multi-phase process.
Your question is short on details. For a complete answer, you would need to supply the table definition (DDL), test data (or better, a fiddle), and the expected results for that data. However, retrieving this month's data and the latest request from the prior month becomes something like:
select sq.path, sum(sq.quota) over (partition by sq.path) total_quota
from ( select s.path, s.quota
from some_table s
where s.path = 'certain path'
and date_trunc('month', s.request_time) = date_trunc('month', current_date)
union all
select sc.path, sc.quota
from some_table sc
where sc.path = 'certain path'
and (sc.path,sc.request_time) =
(select sl.path, max(sl.request_time)
from some_table sl
where sl.path = sc.path
and date_trunc('month', sl.request_time) = date_trunc('month', current_date - interval '1 month')
group by sl.path
)
) sq;
If the month of interest does not contain current_date you could pass a parameter value for the month of interest.
Note: this is not tested.
I want to filter my activities based on the multicombobox values, which cover the last 4 years. I am able to query the database if I provide just one year.
SELECT * FROM public.events WHERE (EXTRACT(YEAR from start_date)) = 2019
However, I am not sure how to write the query if I want to filter on multiple years:
SELECT * FROM public.events WHERE (EXTRACT(YEAR from start_date)) IN [2019, 2020]
is not working. How can I change my query?
The expression you meant to write:
WHERE EXTRACT(YEAR from start_date) IN (2019, 2020)
That is, IN expects a list within parentheses, not square brackets.
But I would actually suggest using explicit range comparison instead:
where start_date >= '2019-01-01'::date and start_date < '2021-01-01'::date
The advantage of this approach is that it is sargable, meaning it can take advantage of an index on column start_date (while the original expression needs to extract() the year from each and every row before being able to actually filter).
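If the selected years come from a UI control and are not necessarily contiguous, the range comparison can be generated per year. The following is a minimal sketch; the helper name year_range_filter is hypothetical, and the %s placeholder style is an assumption (it matches psycopg-style drivers, so adjust it for your driver).

```python
from datetime import date


def year_range_filter(years, column="start_date"):
    """Return (sql_fragment, params) matching rows whose date falls in any of the given years.

    `column` is interpolated into the SQL text, so it must be a trusted identifier.
    """
    clauses = []
    params = []
    for year in sorted(set(years)):
        # One half-open range per year keeps the predicate index-friendly.
        clauses.append(f"({column} >= %s AND {column} < %s)")
        params.extend([date(year, 1, 1), date(year + 1, 1, 1)])
    if not clauses:
        # No years selected: match nothing.
        return "FALSE", []
    return " OR ".join(clauses), params


fragment, params = year_range_filter([2019, 2020])
query = f"SELECT * FROM public.events WHERE {fragment}"
print(query)
print(params)
```

The date values are passed as bind parameters rather than formatted into the SQL string, which avoids quoting problems.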
SELECT SUM(Total_A ) FROM Materials_List
This is the snippet of code that I have.
I need it to calculate by month and display by month using SQL.
I would also like code that works for any month in the year, not just one month at a time.
You seem to be looking for simple aggregation:
select
year(materials_datetime) yr,
month(materials_datetime) mn,
sum(total_a) sum_total_a
from materials_list
group by
year(materials_datetime),
month(materials_datetime)
order by yr, mn
This assumes that column materials_datetime contains the date/time that you want to use to aggregate the data.
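As a quick way to try the idea, here is a minimal sketch of the same aggregation against an in-memory SQLite database. SQLite has no year() or month() functions, so strftime() stands in for them here (an assumption about the target engine); the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE materials_list (materials_datetime TEXT, total_a REAL);
    INSERT INTO materials_list VALUES
        ('2020-01-05 10:00:00', 10),
        ('2020-01-20 12:30:00', 5),
        ('2020-02-01 09:00:00', 7);
""")

# One output row per (year, month) pair, with the monthly sum of total_a.
rows = conn.execute("""
    SELECT strftime('%Y', materials_datetime) AS yr,
           strftime('%m', materials_datetime) AS mn,
           SUM(total_a) AS sum_total_a
    FROM materials_list
    GROUP BY yr, mn
    ORDER BY yr, mn
""").fetchall()

for yr, mn, total in rows:
    print(yr, mn, total)
```

Grouping by both year and month keeps, for example, January 2020 separate from January 2021.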
I need to form a report which provides some information per each date within dates interval.
I need to have it within a single query (can't create any functions or supporting tables).
How can I achieve that in PrestoDB?
Note: There are lots of vendor-specific solutions here, here and even here. But none of them satisfies my need, as they either don't work in Presto or use tables/functions.
To be more precise here is an example of query:
WITH dates AS ( query to select all dates between 2017.01.01 and 2018.01.01 )
SELECT
date date,
count(*) number_of_orders
FROM dates dates
LEFT JOIN order order
ON order.created_at = dates.date
You can use the Presto SEQUENCE() function to generate a sequence of days as an array, and then use UNNEST to explode that array as a result set.
Something like this should work for you:
SELECT date_array AS DAY
FROM UNNEST(
SEQUENCE(
cast('2017-01-01' AS date),
cast('2018-01-01' AS date),
INTERVAL '1' DAY
)
) AS t1(date_array)
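To show how a generated date series plugs into the report from the question, here is a minimal sketch of the full pattern: a date series LEFT JOINed to the orders. It uses SQLite's recursive CTE as a stand-in for SEQUENCE() and UNNEST (an assumption, chosen only so the example runs without a Presto cluster); the orders table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (created_at TEXT);
    INSERT INTO orders VALUES ('2017-01-01'), ('2017-01-01'), ('2017-01-03');
""")

rows = conn.execute("""
    WITH RECURSIVE dates(day) AS (
        -- SQLite stand-in for UNNEST(SEQUENCE(start, stop, INTERVAL '1' DAY))
        SELECT '2017-01-01'
        UNION ALL
        SELECT date(day, '+1 day') FROM dates WHERE day < '2017-01-04'
    )
    SELECT dates.day,
           -- Count a column from the joined table so days without orders give 0;
           -- COUNT(*) would count the unmatched row and give 1.
           COUNT(orders.created_at) AS number_of_orders
    FROM dates
    LEFT JOIN orders ON orders.created_at = dates.day
    GROUP BY dates.day
    ORDER BY dates.day
""").fetchall()

for day, number_of_orders in rows:
    print(day, number_of_orders)
```

In Presto, the dates CTE would instead wrap the UNNEST(SEQUENCE(...)) query, and the rest of the statement would keep the same shape.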
I want to display the amount of data by month and year. This is an example of displaying data by date:
select count(*) from db.trx where trxdate = to_date('2018-04-23','yyyy-mm-dd')
When I try to display the amount of data by month and year, no query results appear. Is there something wrong with the query?
The query:
select count(*) from db.trx where trxdate = to_date('2018-04','yyyy-mm')
You need to apply the function to trxdate. Using your logic:
SELECT Count(*)
FROM olap.trxh2hpdam
WHERE To_char(trxdate, 'YYYY-MM') = '2018-04';
However, I strongly recommend that you use direct date comparisons:
WHERE trxdate >= date '2018-04-01'
AND
trxdate < date '2018-05-01'
This will allow the database to use an index on trxdate.
There are a couple of ways of accomplishing what you're trying to do. Which one works for you will depend on your database design (for example, the indexes you've created). One way might be this:
SELECT COUNT(*) FROM olap.trxh2hpdam
WHERE TRUNC(trxdate, 'MONTH') = DATE'2018-04-01';
This will round the date down to the first of the month (and, of course, remove any time portion). Then you simply compare it to the first of the month for which you want the data. However, unless you have an index on TRUNC(trxdate, 'MONTH'), this may not be the best course of action; if trxdate is indexed, you'll want to use:
SELECT COUNT(*) FROM olap.trxh2hpdam
WHERE trxdate >= DATE'2018-04-01'
AND trxdate < DATE'2018-05-01';
There are a number of functions at your disposal in Oracle (e.g. ADD_MONTHS()) in the event that the date you use in your query is supposed to be dynamic rather than static.
Just FYI, there is no reason not to use ANSI date literals when trying to retrieve data by day as well. I'm not sure your original query is a good example of getting data for a particular day, since the Oracle DATE datatype does at least potentially include a time:
SELECT COUNT(*) FROM olap.trxh2hpdam
WHERE trxdate >= DATE'2018-04-23'
AND trxdate < DATE'2018-04-24';
or:
SELECT COUNT(*) FROM olap.trxh2hpdam
WHERE TRUNC(trxdate) = DATE'2018-04-23';
EDIT
In case the month and year are dynamic, I would build a date from them (e.g., TO_DATE('<year>-<month>-01', 'YYYY-MM-DD')) and then use the following query:
SELECT COUNT(*) FROM olap.trxh2hpdam
WHERE trxdate >= TO_DATE('<year>-<month>-01', 'YYYY-MM-DD')
AND trxdate < ADD_MONTHS( TO_DATE('<year>-<month>-01', 'YYYY-MM-DD'), 1 );
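If the query is issued from application code, the month boundaries can also be computed on the client and passed as bind parameters. The following is a minimal sketch; the helper name month_bounds is hypothetical, and the :name bind style is an assumption about the Oracle driver in use.

```python
from datetime import date


def month_bounds(year, month):
    """Return (first day of the month, first day of the following month)."""
    start = date(year, month, 1)
    if month == 12:
        # December rolls over into January of the next year.
        end = date(year + 1, 1, 1)
    else:
        end = date(year, month + 1, 1)
    return start, end


start, end = month_bounds(2018, 4)

# Half-open range, same shape as the query above, with named bind parameters.
query = (
    "SELECT COUNT(*) FROM olap.trxh2hpdam "
    "WHERE trxdate >= :start_date AND trxdate < :end_date"
)
params = {"start_date": start, "end_date": end}
print(query, params)
```

Because the column itself is left untouched in the WHERE clause, an index on trxdate remains usable.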
Hope this helps.