Extract year complex query - PostgreSQL - sql

I want to filter my activities by the years selected in a multi-combobox, covering the last 4 years. I can already query the database when I provide a single year:
SELECT * FROM public.events WHERE (EXTRACT(YEAR from start_date)) = 2019
However, I am not sure how to write the query when I want to filter on multiple years:
SELECT * FROM public.events WHERE (EXTRACT(YEAR from start_date)) IN [2019, 2020]
is not working. How can I change my query?

The expression you meant to write is:
WHERE EXTRACT(YEAR from start_date) IN (2019, 2020)
That is, IN expects a list within parentheses, not square brackets.
But I would actually suggest using explicit range comparison instead:
where start_date >= '2019-01-01'::date and start_date < '2021-01-01'::date
The advantage of this approach is that it is sargable, meaning it can take advantage of an index on column start_date (whereas the original expression needs to extract() the year from each and every row before it can actually filter).
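Because a multi-combobox can produce non-contiguous selections (say 2017, 2019 and 2020), the single range above generalizes to one half-open range per run of consecutive years. The sketch below is a hypothetical client-side helper (`year_ranges` and `where_clause` are illustrative names, not part of any library); it assumes a driver that uses `%s` placeholders, as psycopg2 does, and a trusted column name, since that name is interpolated into the SQL text. An empty selection yields an empty fragment, which the caller must handle.

```python
from datetime import date

def year_ranges(years):
    """Collapse selected years into half-open [start, end) date ranges.

    Consecutive years are merged, so {2019, 2020} becomes a single range.
    """
    ranges = []
    for year in sorted(set(years)):
        start, end = date(year, 1, 1), date(year + 1, 1, 1)
        if ranges and ranges[-1][1] == start:
            # This year begins exactly where the previous range ends: extend it.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges

def where_clause(years, column="start_date"):
    """Build an index-friendly WHERE fragment plus its parameters.

    `column` is interpolated directly, so it must never come from user input.
    """
    ranges = year_ranges(years)
    parts = [f"({column} >= %s AND {column} < %s)" for _ in ranges]
    params = [bound for pair in ranges for bound in pair]
    return " OR ".join(parts), params

sql, params = where_clause([2020, 2017, 2019])
print(sql)
print(params)
```

Each range compares the bare column, so an index on start_date stays usable; merging neighbouring years just keeps the OR list short.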

Related

Optimization on large tables

I have the following query that joins two large tables. I am trying to join on patient_id and records that are not older than 30 days.
select * from
chairs c
join data id
on c.patient_id = id.patient_id
and to_date(c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') >= 0
and to_date (c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') < 30
Currently, this query takes 2 hours to run. What indexes can I create on these tables to make this query run faster?
I will take a shot in the dark, because, as others said, it depends on the table structure, the indexes, and the output of the planner.
The most obvious thing here is that, whenever possible, you want to represent dates with a date data type instead of strings. That is the first and most important change you should make here. No index can save you if you transform the strings in every row, because very likely the problem is not patient_id but your date calculation.
Other than that, forcing hash joins on patient_id and then doing the filtering could help if for some reason the planner decided to use nested loops for that condition. But that is only for after you have fixed your date representation AND you still have a problem AND you see that the planner uses nested loops on that attribute.
Some observations if you are stuck with string fields for the dates:
YYYYMMDD date strings sort in date order, so they can be compared with <, > and =.
Building strings from the data in chairs and using them to JOIN on data will make good use of an index on data (patient_id, from_date).
So my suggestion would be to write expressions that build the date strings you want to use in the JOIN. In other words: do not transform the child table's data from a string to something else.
Example expression that takes 30 days off a string date and returns a string date:
select to_char(to_date('20200112', 'YYYYMMDD') - INTERVAL '30 DAYS','YYYYMMDD')
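For readers who want to precompute such bounds in application code, this is a minimal Python equivalent of the expression above; `minus_days` is a hypothetical helper name. It parses the YYYYMMDD string, subtracts a number of days, and formats the result back as YYYYMMDD, so leap years and month lengths are handled by the `datetime` module rather than by hand.

```python
from datetime import datetime, timedelta

def minus_days(value, days):
    """Return the 'YYYYMMDD' string that lies `days` days before `value`."""
    shifted = datetime.strptime(value, "%Y%m%d") - timedelta(days=days)
    return shifted.strftime("%Y%m%d")

print(minus_days("20200112", 30))
```

Because the output has the same fixed-width format as the stored values, it can be compared with them as a plain string.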
Untested:
select * from
chairs c
join data id
on c.patient_id = id.patient_id
and id.from_date > to_char(to_date(c.from_date, 'YYYYMMDD') - INTERVAL '30 DAYS', 'YYYYMMDD')
and id.from_date <= c.from_date;
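The original condition keeps rows whose from_date is 0 to 29 days older than the chairs row. A quick way to check a string-bound rewrite against it is to run both on a few edge cases. The sketch below uses SQLite from the Python standard library as a stand-in for PostgreSQL (the helpers `minus_days` and `day_diff` are hypothetical and registered with `create_function`). It shows that a strict lower bound on the "30 days earlier" string reproduces the original `< 30` exactly, while an inclusive BETWEEN also admits rows exactly 30 days old.

```python
import sqlite3
from datetime import datetime, timedelta

def minus_days(value, days):
    """'YYYYMMDD' string minus `days` days, returned as a 'YYYYMMDD' string."""
    shifted = datetime.strptime(value, "%Y%m%d") - timedelta(days=days)
    return shifted.strftime("%Y%m%d")

def day_diff(a, b):
    """Whole days between two 'YYYYMMDD' strings, like to_date(a) - to_date(b)."""
    return (datetime.strptime(a, "%Y%m%d") - datetime.strptime(b, "%Y%m%d")).days

conn = sqlite3.connect(":memory:")
conn.create_function("minus_days", 2, minus_days)
conn.create_function("day_diff", 2, day_diff)
conn.executescript("""
    CREATE TABLE chairs (patient_id INTEGER, from_date TEXT);
    CREATE TABLE data (patient_id INTEGER, from_date TEXT);
    CREATE INDEX data_patient_date ON data (patient_id, from_date);
""")
conn.executemany("INSERT INTO chairs VALUES (?, ?)", [(1, "20200112"), (2, "20200301")])
conn.executemany("INSERT INTO data VALUES (?, ?)", [
    (1, "20200112"),  # 0 days older: kept
    (1, "20191214"),  # 29 days older: kept
    (1, "20191213"),  # 30 days older: excluded by "< 30"
    (1, "20200113"),  # newer than the chairs row: excluded
    (2, "20200201"),  # 29 days older (2020 is a leap year): kept
    (2, "20200131"),  # 30 days older: excluded
])

base_query = ("SELECT c.patient_id, id.from_date FROM chairs c "
              "JOIN data id ON c.patient_id = id.patient_id ")

# Original predicate: both columns are converted on every row.
original = conn.execute(base_query + """
    AND day_diff(c.from_date, id.from_date) >= 0
    AND day_diff(c.from_date, id.from_date) < 30
    ORDER BY 1, 2""").fetchall()

# Bounds built only from chairs; data.from_date is compared untouched.
half_open = conn.execute(base_query + """
    AND id.from_date > minus_days(c.from_date, 30)
    AND id.from_date <= c.from_date
    ORDER BY 1, 2""").fetchall()

# Inclusive lower bound: also admits rows exactly 30 days older.
inclusive = conn.execute(base_query + """
    AND id.from_date BETWEEN minus_days(c.from_date, 30) AND c.from_date
    ORDER BY 1, 2""").fetchall()

print(original)
print(half_open == original, len(inclusive))
```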
For this query:
select *
from chairs c join data id
on c.patient_id = id.patient_id and
to_date(c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') >= 0 and
to_date (c.from_date, 'YYYYMMDD') - to_date(id.from_date, 'YYYYMMDD') < 30;
You should start with indexes on (patient_id, from_date) -- you can put them in both tables.
The date comparisons are problematic. Storing the values as actual dates can help. But it is not a 100% solution because comparison operations are still needed.
Depending on what you are actually trying to accomplish there might be other ways of writing the query. I might encourage you to ask a new question, providing sample data, desired results, and a clear explanation of what you really want. For instance, this query is likely to return a lot of rows. And that just takes time as well.
Your query has a non-sargable predicate because it applies functions to the columns, and those functions must be evaluated for every row. You need to remove such functions and compare the columns directly. For example, assuming from_date is stored as a date:
SELECT *
FROM chairs AS c
JOIN data AS id
ON c.patient_id = id.patient_id
AND c.from_date >= id.from_date
AND c.from_date < id.from_date + INTERVAL '30 days'
This will run faster with these two indexes:
CREATE INDEX X_SQLpro_001 ON chairs (patient_id, from_date);
CREATE INDEX X_SQLpro_002 ON data (patient_id, from_date);
Also, try to avoid SELECT * and list only the columns you need.
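Whether a composite (patient_id, from_date) index actually serves the join can be checked in the query plan (EXPLAIN in PostgreSQL). The runnable sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in, so the plan text is SQLite's, not PostgreSQL's. It assumes ISO 'YYYY-MM-DD' text dates (so SQLite's `date()` can shift them and string order matches date order) and expresses the 30-day window from the data side; the table and index names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE chairs (patient_id INTEGER, from_date TEXT, note TEXT);
    CREATE TABLE data (patient_id INTEGER, from_date TEXT, value TEXT);
    CREATE INDEX x_data_patient_date ON data (patient_id, from_date);
""")

# The date column is compared directly; only the chairs side is transformed.
query = """
    SELECT c.patient_id, c.from_date, id.value
    FROM chairs AS c
    JOIN data AS id
      ON c.patient_id = id.patient_id
     AND id.from_date > date(c.from_date, '-30 days')
     AND id.from_date <= c.from_date
"""

# The last column of each EXPLAIN QUERY PLAN row is the readable description.
steps = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in steps:
    print(step)
```

The search step on `data` should name the composite index, with an equality on patient_id and a range on from_date.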

SQL Query to Retrieve Monthly Data

I'm working with the following SQL query in Redash; it retrieves monthly data from a table.
SELECT *
FROM Table
WHERE
"Date" between '2021-04-01T00:00:00.669976+00:00' and '2021-04-30T23:59:59.669976+00:00'
I'd like to know if there's a way to update the WHERE clause automatically, rather than typing it out manually at the end of each month.
This worked well for me:
WHERE
EXTRACT(MONTH FROM "Date") = EXTRACT(MONTH FROM CURRENT_DATE) AND EXTRACT(YEAR FROM "Date") = EXTRACT(YEAR FROM CURRENT_DATE)
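The EXTRACT version recomputes the month and year for every row. An index-friendly alternative is to compare "Date" against the first day of the current month and the first day of the next month. Assuming the Redash data source is PostgreSQL, those bounds can be written in SQL as date_trunc('month', CURRENT_DATE) and date_trunc('month', CURRENT_DATE) + INTERVAL '1 month'. If the bounds are computed in application code instead, a minimal sketch looks like this (`month_bounds` is a hypothetical helper; `today` is a parameter so the function is testable):

```python
from datetime import date

def month_bounds(today):
    """Return (first day of today's month, first day of the next month)."""
    start = today.replace(day=1)
    if start.month == 12:
        end = date(start.year + 1, 1, 1)
    else:
        end = date(start.year, start.month + 1, 1)
    return start, end

start, end = month_bounds(date.today())
# Intended for: WHERE "Date" >= start AND "Date" < end
print(start, end)
```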
In your case, I suggest you avoid any solution that applies a CONVERT or any other type conversion to the GRP_Date field. Doing that prevents SQL Server from using an index on GRP_Date (if there is one), which can noticeably hurt performance.
And of course between is not ideal in this specific case for the reasons already mentioned in your question.
I suggest the following condition for the best performance (good use of the indexes) and to avoid problems with the time of day:
where GRP.GRP_date >= @since
and GRP.GRP_date < dateadd(day, 1, @until) -- @until + 1 day
For example, with:
@since = 2016-11-01
@until = 2016-11-14
the condition becomes:
where GRP.GRP_Fecha >= '2016-11-01'
and GRP.GRP_Fecha < '2016-11-15'
Because the upper bound is strictly earlier than 2016-11-15, every row from 2016-11-14 is included regardless of its time of day.
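To see the difference concretely, the sketch below runs both forms against SQLite from the Python standard library (a stand-in for SQL Server). It assumes timestamps stored as fixed-width 'YYYY-MM-DD HH:MM:SS' text, so string order matches time order, and treats a bare date as midnight at the start of that day; the table name `grp` is illustrative. BETWEEN misses the afternoon row on 2016-11-14, while the half-open range with until + 1 day keeps it.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grp (grp_date TEXT)")
conn.executemany("INSERT INTO grp VALUES (?)", [
    ("2016-10-31 23:59:59",),
    ("2016-11-01 00:00:00",),
    ("2016-11-14 00:00:00",),
    ("2016-11-14 14:30:00",),  # afternoon of the last requested day
    ("2016-11-15 00:00:00",),
])

since = datetime(2016, 11, 1)
until = datetime(2016, 11, 14)  # a bare date means midnight at the start of that day

def fmt(dt):
    # Same fixed-width text format as the stored values.
    return dt.isoformat(sep=" ")

between_rows = conn.execute(
    "SELECT grp_date FROM grp WHERE grp_date BETWEEN ? AND ? ORDER BY grp_date",
    (fmt(since), fmt(until)),
).fetchall()

half_open_rows = conn.execute(
    "SELECT grp_date FROM grp WHERE grp_date >= ? AND grp_date < ? ORDER BY grp_date",
    (fmt(since), fmt(until + timedelta(days=1))),
).fetchall()

print(between_rows)
print(half_open_rows)
```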

PyPika how to generate IF statement

How do I generate an IF statement in PyPika?
I am trying to generate a BigQuery query that pivots a row to a column. I found that if I use the following in a query (where date_range is from a WITH statement):
IF (date_range.kind = 'year', date_range.name, NULL) as year
then this will work. However, I haven't found a way to generate this SQL fragment in PyPika.
For completeness, this is an example of a query I need to run in BigQuery:
WITH date_range AS (
SELECT
CAST(EXTRACT(year FROM year) as string) name,
'year' kind,
year start_date,
DATE_ADD(year, INTERVAL 1 year) end_date
FROM UNNEST(GENERATE_DATE_ARRAY('2010-01-01','2020-06-01',INTERVAL 1 year)) year
UNION ALL
SELECT
FORMAT_DATE('%B', month)||' '||EXTRACT(year FROM month) name,
'month' kind,
month start_date,
DATE_ADD(month,INTERVAL 1 month) end_date
FROM
UNNEST(GENERATE_DATE_ARRAY('2010-01-01','2020-06-01',INTERVAL 1 month)) month
)
SELECT
IF(date_range.kind='year', date_range.name, null) as year,
IF(date_range.kind='month', date_range.name, null) as month,
SUM(sales.sales_value) sales_value,
FROM sales
JOIN date_range ON sales.start_date>=date_range.start_date AND sales.end_date<date_range.end_date
GROUP BY year, month
ORDER BY year, month
The more general question I have is: is there a way to pass literal strings to PyPika so that they are included in the resulting query string? There are several SQL fragments that PyPika does not generate (such as GENERATE_DATE_ARRAY and UNNEST, at least as far as I can find), and passing the actual SQL fragment to PyPika would solve the problem.
Thanks!
Not sure if it applies, but be sure to also check whether a CASE expression can help you.
Other than that, you can either subclass PyPika's Function class and override get_sql, or (ab)use the CustomFunction and PseudoColumn utility classes like this:
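from pypika import Query, Table, CustomFunction
from pypika.terms import PseudoColumn
sales_table = Table('sales')
# IF(condition, value_if_true, value_if_false), emitted as a plain function call
MyIf = CustomFunction('IF', ['condition', 'if', 'else'])
# PseudoColumn renders its text verbatim, so it can carry raw SQL fragments
q = Query.from_(sales_table).select(
    MyIf(PseudoColumn("date_range.kind = 'year'"), PseudoColumn("date_range.name"), None, alias="year")
)
print(q.get_sql())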
However, I'd probably recommend opening an issue on the PyPika GitHub repository.
Note: I wasn't able to test this.

Teradata Current year and year-1

How do I get the years dynamically in the WHERE condition? I need to fetch data for 2017, 2018 and 2019. Currently I am hard-coding them (where FSC_YR in (2017,2018,2019)), but I need to do it dynamically. How can I do this in Teradata?
I tried extract(year from current_date)-2,extract(year from current_date)-1,extract(year from current_date)-3) and I get a "too many expressions" error.
Since you're looking for a range of year numbers, why not just use a BETWEEN?
SELECT *
FROM data
WHERE fsc_yr BETWEEN EXTRACT(year FROM current_date - interval '2' year) AND EXTRACT(year FROM current_date)
But as @dnoeth pointed out in the comments, using INTERVAL might not be the safest method: it raises an error when run on Feb. 29.
Just subtracting from the year number works fine, though:
SELECT *
FROM data
WHERE fsc_yr BETWEEN EXTRACT(year FROM current_date)-2 AND EXTRACT(year FROM current_date)
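The reason the year-number version is safer can be illustrated outside Teradata. In this Python sketch (not Teradata code; `year_window` and `same_day_years_ago` are hypothetical helpers), plain integer arithmetic on the year always succeeds, whereas shifting a full date by whole years fails when the date is Feb 29 and the target year is not a leap year, which is the same class of problem as the INTERVAL error described above.

```python
from datetime import date

def year_window(today, n):
    """(first_year, last_year) for the current year and the n - 1 years before it."""
    return today.year - (n - 1), today.year

def same_day_years_ago(d, years):
    """Shift a date back by whole years.

    Raises ValueError when d is Feb 29 and the target year is not a leap year.
    """
    return d.replace(year=d.year - years)

first, last = year_window(date(2019, 6, 1), 3)
print(f"WHERE fsc_yr BETWEEN {first} AND {last}")  # integers only, no date arithmetic

try:
    same_day_years_ago(date(2020, 2, 29), 2)
except ValueError as exc:
    print("shifting the date failed:", exc)
```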
Also note that this error can come from selecting more than one column in a subquery used with IN.
For example, this would fail:
SELECT * FROM Table1
WHERE Col1 IN (SELECT Col1, Col2 FROM Table2)
So if you used SELECT * in such a subquery on a multi-column table, you would still get that error.
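The same mistake can be reproduced with SQLite from the Python standard library. SQLite's error message differs from Teradata's, so this only illustrates the class of error: an IN subquery must return exactly one column when the left-hand side is a single value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Col1 INTEGER);
    CREATE TABLE Table2 (Col1 INTEGER, Col2 INTEGER);
    INSERT INTO Table1 VALUES (1), (2);
    INSERT INTO Table2 VALUES (1, 10);
""")

# Two columns in the IN subquery: rejected when the statement is prepared.
try:
    conn.execute("SELECT * FROM Table1 WHERE Col1 IN (SELECT Col1, Col2 FROM Table2)")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

# Selecting a single column in the subquery fixes it.
rows = conn.execute("SELECT * FROM Table1 WHERE Col1 IN (SELECT Col1 FROM Table2)").fetchall()

print(error)
print(rows)
```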

Postgresql query between date ranges

I am trying to query my PostgreSQL db to return results where a date is in a certain month and year. In other words, I would like all the values for a month-year.
The only way I've been able to do it so far is like this:
SELECT user_id
FROM user_logs
WHERE login_date BETWEEN '2014-02-01' AND '2014-02-28'
Problem with this is that I have to calculate the first date and last date before querying the table. Is there a simpler way to do this?
Thanks
With dates (and times) many things become simpler if you use >= start AND < end.
For example:
SELECT
user_id
FROM
user_logs
WHERE
login_date >= '2014-02-01'
AND login_date < '2014-03-01'
In this case you still need to calculate the start date of the month you need, but that should be straightforward in any number of ways.
The end date is also simplified: just add exactly one month. No messing about with the 28th, 30th, 31st, etc.
This structure also has the advantage of being able to maintain use of indexes.
Many people may suggest a form such as the following, but it cannot use indexes:
WHERE
DATE_PART('year', login_date) = 2014
AND DATE_PART('month', login_date) = 2
This involves evaluating the conditions for every single row in the table (a scan) instead of using an index to find the range of rows that will match (a range seek).
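The scan-versus-seek difference shows up directly in a query plan. This runnable sketch uses SQLite's EXPLAIN QUERY PLAN as a stand-in for PostgreSQL's EXPLAIN, with SQLite's `strftime` playing the role of DATE_PART; it assumes ISO 'YYYY-MM-DD' text dates, and the index name is illustrative. Wrapping the column in a function forces a full scan, while the plain range lets the planner seek within the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_logs (user_id INTEGER, login_date TEXT);
    CREATE INDEX user_logs_login_date ON user_logs (login_date);
""")

def plan(sql):
    """Join the readable descriptions (last column) of EXPLAIN QUERY PLAN rows."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Function applied to the column: every row has to be evaluated.
by_parts = plan("""
    SELECT user_id FROM user_logs
    WHERE strftime('%Y', login_date) = '2014' AND strftime('%m', login_date) = '02'
""")

# Plain range on the column: the index can seek straight to February.
by_range = plan("""
    SELECT user_id FROM user_logs
    WHERE login_date >= '2014-02-01' AND login_date < '2014-03-01'
""")

print(by_parts)
print(by_range)
```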
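From PostgreSQL 9.2, range types are supported, so you can write this as:
SELECT user_id
FROM user_logs
WHERE '[2014-02-01, 2014-03-01)'::daterange @> login_date
This expresses the same half-open range in a single condition, although a containment test like this may not be able to use a plain B-tree index on login_date the way explicit >= / < comparisons can.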
Just in case somebody lands here: since PostgreSQL 8.1 you can simply use:
SELECT user_id
FROM user_logs
WHERE login_date BETWEEN SYMMETRIC '2014-02-01' AND '2014-02-28'
From the docs:
BETWEEN SYMMETRIC is the same as BETWEEN except there is no requirement that the argument to the left of AND be less than or equal to the argument on the right. If it is not, those two arguments are automatically swapped, so that a nonempty range is always implied.
SELECT user_id
FROM user_logs
WHERE login_date BETWEEN '2014-02-01' AND '2014-03-01'
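The BETWEEN keyword works well for dates: it treats a date literal as 00:00:00 (midnight). Keep in mind that both ends are inclusive, so this also matches 2014-03-01 (on a timestamp column, only rows stamped exactly at midnight that day).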
Read the documentation.
http://www.postgresql.org/docs/9.1/static/functions-datetime.html
I used a query like this:
WHERE
(
date_trunc('day',table1.date_eval) = '2015-02-09'
)
or
WHERE (date_trunc('day', table1.date_eval) >= '2015-02-09' AND date_trunc('day', table1.date_eval) < '2015-02-10')