I have this query
SELECT DISTINCT fc,cohort, value
FROM test.input_test
WHERE value IS NULL AND date BETWEEN '2019-07-01' AND '2019-09-30';
I would like to select all rows where the value is null across the whole date range (not only on some specific dates).
One method is window functions:
SELECT DISTINCT fc, cohort, value
FROM (SELECT it.*,
SUM(CASE WHEN it.value IS NOT NULL THEN 1 ELSE 0 END) OVER () as non_null_cnt
FROM test.input_test it
WHERE it.date >= '2019-07-01' AND it.date < '2019-10-01'
) it
WHERE non_null_cnt = 0;
Note that I changed the date comparisons to avoid BETWEEN. Here is a good blog explaining why BETWEEN is a bad idea with dates. Most of the points are relevant even if you do not use SQL Server.
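To make the BETWEEN pitfall concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name events and its rows are invented for illustration, and SQLite compares these timestamps as text, so other engines may differ in detail. The point is only that a timestamp later in the day on 2019-09-30 falls outside BETWEEN ... AND '2019-09-30', while the half-open range keeps it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; timestamps are stored as ISO-8601 text.
conn.execute("CREATE TABLE events (id INTEGER, ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2019-07-01 09:00:00"),
        (2, "2019-08-15 12:00:00"),
        (3, "2019-09-30 15:30:00"),  # afternoon of the last day in range
        (4, "2019-10-01 00:00:00"),  # first instant outside the range
    ],
)

# Closed range: the upper bound is effectively midnight on 2019-09-30.
between_ids = [r[0] for r in conn.execute(
    "SELECT id FROM events "
    "WHERE ts BETWEEN '2019-07-01' AND '2019-09-30' ORDER BY id"
)]

# Half-open range: everything before the first instant of October.
half_open_ids = [r[0] for r in conn.execute(
    "SELECT id FROM events "
    "WHERE ts >= '2019-07-01' AND ts < '2019-10-01' ORDER BY id"
)]

print(between_ids)    # row 3 is missing here
print(half_open_ids)  # row 3 is included here
```

If the column holds pure dates with no time part, both forms return the same rows; the half-open form simply stays correct if a time part is ever added.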
How about using an exists clause:
SELECT DISTINCT fc, cohort, value
FROM test.input_test
WHERE
date BETWEEN '2019-07-01' AND '2019-09-30' AND
NOT EXISTS (SELECT 1 FROM test.input_test
WHERE date BETWEEN '2019-07-01' AND '2019-09-30' AND
value IS NOT NULL);
As @Gordon mentioned, if you want to include the entire months of July, August, and September 2019, then the date range you should use is:
WHERE date >= '2019-07-01' AND date < '2019-10-01'
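If "null in the whole date range" is meant per fc/cohort pair rather than for the table as a whole (an assumption, since the question does not say), the subquery can be correlated on those two columns. A minimal sketch with Python's sqlite3 module follows; the sample rows are invented, and the table is created without the test schema prefix because SQLite has no such schema by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE input_test (fc TEXT, cohort TEXT, date TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO input_test VALUES (?, ?, ?, ?)",
    [
        ("A", "c1", "2019-07-05", None),
        ("A", "c1", "2019-08-05", None),  # A/c1: NULL on every date in range
        ("B", "c1", "2019-07-05", None),
        ("B", "c1", "2019-08-05", 3.0),   # B/c1: NULL only on some dates
        ("C", "c2", "2019-07-05", None),
        ("C", "c2", "2019-11-01", 9.0),   # non-NULL, but outside the range
    ],
)

query = """
SELECT DISTINCT t.fc, t.cohort
FROM input_test AS t
WHERE t.date >= '2019-07-01' AND t.date < '2019-10-01'
  AND NOT EXISTS (
        SELECT 1
        FROM input_test AS u
        WHERE u.fc = t.fc              -- correlate on the group columns
          AND u.cohort = t.cohort
          AND u.date >= '2019-07-01' AND u.date < '2019-10-01'
          AND u.value IS NOT NULL)
ORDER BY t.fc, t.cohort
"""
all_null_groups = conn.execute(query).fetchall()
print(all_null_groups)
```

Without the two correlation predicates, a single non-NULL value anywhere in the range would make the query return nothing at all.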
You could also try a LEFT JOIN that excludes every fc/cohort pair having a non-null value in the range:
SELECT DISTINCT IT.fc, IT.cohort, IT.value
FROM test.input_test AS IT
LEFT JOIN
(
SELECT DISTINCT fc, cohort
FROM test.input_test
WHERE value IS NOT NULL AND date >= '2019-07-01' AND date < '2019-10-01'
) AS QUERY_2 ON IT.fc = QUERY_2.fc AND IT.cohort = QUERY_2.cohort
WHERE QUERY_2.fc IS NULL
AND IT.date >= '2019-07-01' AND IT.date < '2019-10-01'
I am currently trying to compare aggregated numbers from today and exactly 7 days ago (not between today and 7 days ago, but instead simply comparing these two discrete dates).
I already have a way of doing it using a lot of subqueries, but the performance is bad, and I am now trying to optimize.
This is what I have come up with so far (sample query, not with real table names and columns due to confidentiality):
Select current_date, previous_date, current_sum, previous_sum, percentage
From (Select date as current_date, sum(numbers) as current_sum,
lag (sum(numbers)) over (partition by date order by date) as previous_sum,
(Select max(date)-7 From t1 ) as previous_date,
(current_sum - previous_sum)*100/current_sum as percentage
From t1 where date>=sysdate-7 group by date,previous_date)
But I am definitely doing something wrong since in the output the previous_sum appears null, and naturally the percentage too.
Any ideas on what I am doing wrong? I haven't used LAG before so it must be something there.
Thanks!
Using Join of pre-aggregated subqueries.
with agg as (
select sum(numbers) as sum_numbers, date from t1 group by date
)
select curr.sum_numbers as current_sum,
prev.sum_numbers as prev_sum,
curr.date as curr_date,
prev.date as prev_date
from agg curr
left join agg prev on curr.date-7=prev.date
Using lag:
with agg as (
select sum(numbers) as sum_numbers, date from t1 group by date
)
select sum_numbers as current_sum,
lag(sum_numbers, 7) over(order by date) as prev_sum,
a.date as curr_date,
lag(a.date,7) over(order by date) as prev_date
from agg a
If you want exactly 2 dates only (today and today-7) then it can be done much simpler using conditional aggregation and filter:
select sum(case when date = trunc(sysdate) then numbers else null end) as current_sum,
sum(case when date = trunc(sysdate-7) then numbers else null end) as previous_sum,
trunc(sysdate) as curr_date,
trunc(sysdate-7) as prev_date,
(current_sum - previous_sum)*100/current_sum as percentage
from t1 where date = trunc(sysdate) or date = trunc(sysdate-7)
You can do this with window (analytic) functions, which should be the fastest method. Your actual aggregation query is a bit unclear, but I think it is:
select date as current_date, sum(numbers) as current_sum
from t1
group by date;
If you have values for all dates, then use:
select date as current_date, sum(numbers) as current_sum,
lag(sum(numbers), 7) over (order by date) as prev_7_sum
from t1
group by date;
If you don't have data for all days, then use a window frame:
select date as current_date, sum(numbers) as current_sum,
max(sum(numbers)) over (order by date range between interval '7' day preceding and interval '7' day preceding) as prev_7_sum
from t1
group by date;
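The difference between LAG with an offset of 7 and a lookup by calendar date only shows up when days are missing. The sketch below, using Python's sqlite3 module, puts both side by side on invented data where 2021-03-03 has no rows. It assumes a SQLite build with window functions (3.25 or later), and it uses SQLite's date() function for the date arithmetic, so the syntax differs from Oracle or Redshift.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (date TEXT, numbers INTEGER)")

# One row per day for 2021-03-01 .. 2021-03-10; 2021-03-03 is left out.
days = [d for d in range(1, 11) if d != 3]
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [(f"2021-03-{d:02d}", d * 10) for d in days],
)

query = """
WITH agg AS (
    SELECT date, SUM(numbers) AS sum_numbers
    FROM t1
    GROUP BY date
)
SELECT curr.date,
       curr.sum_numbers AS current_sum,
       prev.sum_numbers AS join_prev_sum,   -- matched by calendar date
       LAG(curr.sum_numbers, 7) OVER (ORDER BY curr.date) AS lag_prev_sum
FROM agg AS curr
LEFT JOIN agg AS prev
       ON prev.date = date(curr.date, '-7 days')
ORDER BY curr.date
"""
by_date = {row[0]: row[1:] for row in conn.execute(query)}

for day in ("2021-03-08", "2021-03-09", "2021-03-10"):
    print(day, by_date[day])
```

LAG counts rows, not days: for 2021-03-09 it returns the sum of 2021-03-01 (seven rows back), while the join returns the sum of 2021-03-02 (seven days back). For 2021-03-10 the join correctly yields NULL because 2021-03-03 has no data.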
I have a table like this, I hope to count the number of ids by month. I used the following code but it does not work.
id,date_time
1390880502018723840,2021-05-08
1390881127930372100,2021-05-08
1390881498270736386,2021-05-08
SELECT twitter.tweets.id
WHERE Month(twitter.tweets.date_time)=01 AND Year(twitter.tweets.date_time)=2021 ;
You have to use the count() function, and to_char to get the year and month part of the date in one column:
SELECT count(twitter.tweets.id)
FROM twitter.tweets
WHERE to_char(twitter.tweets.date_time,'YYYY-MM') = '2021-01';
You can generalize it to every month/year by using group by:
SELECT to_char(twitter.tweets.date_time,'YYYY-MM'), count(twitter.tweets.id)
FROM twitter.tweets
group by to_char(twitter.tweets.date_time,'YYYY-MM');
To get counts for all months since Jan 2021:
SELECT date_trunc('month', date_time), count(*)
FROM twitter.tweets
WHERE date_time >= '2021-01-01'
GROUP BY 1;
If id can be NULL (which should be disallowed for an id column), use the slightly more expensive count(id) instead.
Count of distinct IDs:
SELECT date_trunc('month', date_time), count(DISTINCT id)
FROM twitter.tweets
WHERE date_time >= '2021-01-01'
GROUP BY 1;
For only Jan 2021:
SELECT count(DISTINCT id)
FROM twitter.tweets
WHERE date_time >= '2021-01-01'
AND date_time < '2021-02-01';
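For a quick local check of the per-month grouping, here is a minimal sketch with Python's sqlite3 module. SQLite has no date_trunc, so strftime('%Y-%m', ...) stands in for it; the table is created without the twitter schema prefix, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER, date_time TEXT)")
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?)",
    [
        (1, "2020-12-31"),  # before the cutoff, filtered out
        (2, "2021-01-15"),
        (3, "2021-01-20"),
        (4, "2021-05-08"),
        (5, "2021-05-08"),
        (6, "2021-05-08"),
    ],
)

# strftime('%Y-%m', ...) is the SQLite stand-in for date_trunc('month', ...).
monthly_counts = conn.execute(
    """
    SELECT strftime('%Y-%m', date_time) AS month, COUNT(*) AS n
    FROM tweets
    WHERE date_time >= '2021-01-01'
    GROUP BY month
    ORDER BY month
    """
).fetchall()
print(monthly_counts)
```

Months with no tweets do not appear in the result, since there is no row to group.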
With the query, I basically want to compare avg_clicks at different time periods and set a filter according to the avg_clicks.
The below query gives us avg_clicks for each shop in January 2020. But I want to see the avg_clicks that is higher than 0 in January 2020.
Question 1: When I add the where avg_clicks > 0 in the query, I am getting the following error: Column 'avg_clicks' cannot be resolved. Where to put the filter?
SELECT AVG(a.clicks) AS avg_clicks,
a.shop_id,
b.shop_name
FROM
(SELECT SUM(clicks_on) AS clicks,
shop_id,
date
FROM X
WHERE site = 'com'
AND date >= CAST('2020-01-01' AS date)
AND date <= CAST('2020-01-31' AS date)
GROUP BY shop_id, date) as a
JOIN Y as b
ON a.shop_id = b.shop_id
GROUP BY a.shop_id, b.shop_name
Question 2: As I wrote, I want to compare two different times. And now, I want to see avg_clicks that is 0 in February 2020.
As a result, the desired output will show me the list of shops that had more than 0 clicks in January, but 0 clicks in February.
Hope I could explain my question. Thanks in advance.
For your Question 1, use a HAVING clause. WHERE is evaluated before the rows are grouped and before the SELECT aliases exist, which is why avg_clicks cannot be resolved there; reading up on the execution order of a SQL statement will make this clearer.
SELECT AVG(a.clicks) AS avg_clicks,
a.shop_id,
b.shop_name
FROM
(SELECT SUM(clicks_on) AS clicks,
shop_id,
date
FROM X
WHERE site = 'com'
AND date >= '2020-01-01'
AND date <= '2020-01-31'
GROUP BY shop_id, date) as a
JOIN Y as b
ON a.shop_id = b.shop_id
GROUP BY a.shop_id, b.shop_name
HAVING AVG(a.clicks) > 0
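The HAVING version can be tried on a small data set as follows. This is a sketch using Python's sqlite3 module; the tables X and Y keep the names from the question, but their column types and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE X (shop_id INTEGER, site TEXT, date TEXT, clicks_on INTEGER)"
)
conn.execute("CREATE TABLE Y (shop_id INTEGER, shop_name TEXT)")
conn.executemany(
    "INSERT INTO X VALUES (?, ?, ?, ?)",
    [
        (1, "com", "2020-01-10", 4),
        (1, "com", "2020-01-10", 6),   # same day: daily total is 10
        (1, "com", "2020-01-11", 20),
        (2, "com", "2020-01-10", 0),
        (2, "com", "2020-01-12", 0),   # shop 2 never gets a click
    ],
)
conn.executemany("INSERT INTO Y VALUES (?, ?)", [(1, "alpha"), (2, "beta")])

query = """
SELECT AVG(a.clicks) AS avg_clicks, a.shop_id, b.shop_name
FROM (SELECT SUM(clicks_on) AS clicks, shop_id, date
      FROM X
      WHERE site = 'com' AND date >= '2020-01-01' AND date < '2020-02-01'
      GROUP BY shop_id, date) AS a
JOIN Y AS b ON a.shop_id = b.shop_id
GROUP BY a.shop_id, b.shop_name
HAVING AVG(a.clicks) > 0          -- filter applied after aggregation
"""
shops = conn.execute(query).fetchall()
print(shops)
```

Shop 1 has daily totals of 10 and 20, so its average is 15; shop 2 averages 0 and is removed by HAVING.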
For your Question 2, you can do something like this
SELECT
jan.shop_id,
b.shop_name,
jan_avg_clicks,
feb_avg_clicks
FROM
(
SELECT
AVG(clicks) AS jan_avg_clicks,
shop_id
FROM
(
SELECT
SUM(clicks_on) AS clicks,
shop_id,
date
FROM X
WHERE site = 'com'
AND date >= '2020-01-01'
AND date <= '2020-01-31'
GROUP BY
shop_id,
date
) as a
GROUP BY
shop_id
HAVING AVG(clicks) > 0
) jan
join
(
SELECT
AVG(clicks) AS feb_avg_clicks,
shop_id
FROM
(
SELECT
SUM(clicks_on) AS clicks,
shop_id,
date
FROM X
WHERE site = 'com'
AND date >= '2020-02-01'
AND date < '2020-03-01'
GROUP BY
shop_id,
date
) as a
GROUP BY
shop_id
HAVING AVG(clicks) = 0
) feb
on jan.shop_id = feb.shop_id
join Y as b
on jan.shop_id = b.shop_id
Start with conditional aggregation:
SELECT shop_id,
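SUM(CASE WHEN DATE_TRUNC('month', date) = '2020-01-01' THEN clicks_on END) / COUNT(DISTINCT CASE WHEN DATE_TRUNC('month', date) = '2020-01-01' THEN date END) as avg_clicks_jan,
SUM(CASE WHEN DATE_TRUNC('month', date) = '2020-02-01' THEN clicks_on END) / COUNT(DISTINCT CASE WHEN DATE_TRUNC('month', date) = '2020-02-01' THEN date END) as avg_clicks_feb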
FROM X
WHERE site = 'com' AND
date >= '2020-01-01' AND
date < '2020-03-01'
GROUP BY shop_id;
I'm not sure what comparison you want to make. But if you want to filter based on the aggregated values, use a HAVING clause.
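One possible reading of the comparison is "shops with clicks in January but none in February". The sketch below, using Python's sqlite3 module, expresses that with conditional sums in a HAVING clause. It rests on two assumptions that are not in the question: clicks are never negative (so a total above 0 is equivalent to an average above 0), and a shop with no February rows at all counts as having 0 clicks. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE X (shop_id INTEGER, site TEXT, date TEXT, clicks_on INTEGER)"
)
conn.executemany(
    "INSERT INTO X VALUES (?, ?, ?, ?)",
    [
        (1, "com", "2020-01-05", 5), (1, "com", "2020-02-05", 0),
        (2, "com", "2020-01-05", 3), (2, "com", "2020-02-06", 2),
        (3, "com", "2020-01-07", 0), (3, "com", "2020-02-07", 0),
        (4, "com", "2020-01-09", 7),  # shop 4 has no February rows
    ],
)

query = """
SELECT shop_id,
       SUM(CASE WHEN date <  '2020-02-01' THEN clicks_on ELSE 0 END) AS jan_clicks,
       SUM(CASE WHEN date >= '2020-02-01' THEN clicks_on ELSE 0 END) AS feb_clicks
FROM X
WHERE site = 'com' AND date >= '2020-01-01' AND date < '2020-03-01'
GROUP BY shop_id
HAVING SUM(CASE WHEN date <  '2020-02-01' THEN clicks_on ELSE 0 END) > 0
   AND SUM(CASE WHEN date >= '2020-02-01' THEN clicks_on ELSE 0 END) = 0
ORDER BY shop_id
"""
dropped_off = conn.execute(query).fetchall()
print(dropped_off)
```

Because both months are read in a single pass over X, there is no join between a January and a February subquery, and shops without February rows are not lost.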
Is there another way to rewrite/improve this query? I am trying to make it with less typo and, if possible, improve performance:
Select
(Select Sum(value) from table1
where code = 'B2'
and date between DATE '2017-01-01'
and DATE '2017-03-31')
+
(Select Sum(value) from table2
where code = 'B2'
and date between DATE '2017-04-01'
and DATE '2017-04-30')
I also tried with union all but this still is not what I need:
Select Sum(value)
from (Select code, value from table1
Where date between DATE '2017-01-01'
and DATE '2017-03-31')
union all
(Select code, value from table1
Where date between DATE '2017-04-01'
and DATE '2017-04-30')
where code = 'B2'
Thanks
Your first query is fine . . . assuming you have a from dual at the end.
For performance, you want indexes on table1(code, date, value) and table2(code, date, value). Note that the order of the columns in the indexes is important.
If by typo you mean that you have the criterion code = 'B2' twice in your query, you can move it into the FROM clause. Either way, be aware that a subquery can return NULL, and NULL plus a number is NULL. Use NVL (or COALESCE) to deal with this.
select
nvl((select sum(value) from table1
where code = x.code and date between date '2017-01-01' and date '2017-03-31'), 0)
+
nvl((select sum(value) from table2
where code = x.code and date between date '2017-04-01' and date '2017-04-30'), 0)
from (select 'B2' as code from dual) x;
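The effect of a NULL subquery result is easy to reproduce. The following sketch uses Python's sqlite3 module with invented rows; SQLite has no NVL or dual, so it uses the standard COALESCE and a SELECT without FROM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (code TEXT, date TEXT, value INTEGER)")
conn.execute("CREATE TABLE table2 (code TEXT, date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [("B2", "2017-01-15", 10), ("B2", "2017-03-01", 5)],
)
# table2 has no 'B2' rows in April, so SUM over it yields NULL.
conn.execute("INSERT INTO table2 VALUES ('A1', '2017-04-10', 99)")

q1 = ("(SELECT SUM(value) FROM table1 WHERE code = 'B2' "
      "AND date BETWEEN '2017-01-01' AND '2017-03-31')")
q2 = ("(SELECT SUM(value) FROM table2 WHERE code = 'B2' "
      "AND date BETWEEN '2017-04-01' AND '2017-04-30')")

# NULL propagates through the addition.
unguarded = conn.execute(f"SELECT {q1} + {q2}").fetchone()[0]

# COALESCE replaces each NULL sum with 0 before adding.
guarded = conn.execute(
    f"SELECT COALESCE({q1}, 0) + COALESCE({q2}, 0)"
).fetchone()[0]

print(unguarded, guarded)
```

The unguarded sum comes back as None in Python (SQL NULL), while the guarded one returns the January to March total on its own.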
How do I get a maximum daily value of a numerical field over a year in MS-SQL?
This would query the daily maximum of value over 2008:
select
datepart(dayofyear,datecolumn)
, max(value)
from yourtable
where '2008-01-01' <= datecolumn and datecolumn < '2009-01-01'
group by datepart(dayofyear,datecolumn)
Or the daily maximum over each year:
select
datepart(year,datecolumn)
, datepart(dayofyear,datecolumn)
, max(value)
from yourtable
group by datepart(year,datecolumn), datepart(dayofyear,datecolumn)
Or the day(s) with the highest value in a year:
select
Year = datepart(year,datecolumn)
, DayOfYear = datepart(dayofyear,datecolumn)
, MaxValue = max(sub.MaxValue)
from yourtable
inner join (
select
Year = datepart(year,datecolumn)
, MaxValue = max(value)
from yourtable
group by datepart(year,datecolumn)
) sub on
sub.Year = datepart(year,yourtable.datecolumn)
and sub.MaxValue = yourtable.value
group by
datepart(year,datecolumn),
datepart(dayofyear,datecolumn)
You didn't mention which RDBMS or SQL dialect you're using. The following will work with T-SQL (MS SQL Server). It may require some modifications for other dialects since date functions tend to change a lot between them.
SELECT
DATEPART(dy, my_date),
MAX(my_number)
FROM
My_Table
WHERE
my_date >= '2008-01-01' AND
my_date < '2009-01-01'
GROUP BY
DATEPART(dy, my_date)
The DATEPART(dy, ...) expression could be replaced by any function or combination of functions that returns the day in the format you need.
Also, if there are days with no rows at all then they will not be returned. If you need those days as well with a NULL or the highest value from the previous day then the query would need to be altered a bit.
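One common way to get a row for every day, including days without data, is to generate a calendar and LEFT JOIN the data onto it. The sketch below shows the idea with Python's sqlite3 module and a recursive CTE; it is not T-SQL, the three-day range is shortened for illustration, and the sample rows are invented. In SQL Server the same shape would need its own date functions or a calendar table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE My_Table (my_date TEXT, my_number INTEGER)")
conn.executemany(
    "INSERT INTO My_Table VALUES (?, ?)",
    [
        ("2008-01-01 08:00:00", 3),
        ("2008-01-01 17:30:00", 9),
        ("2008-01-03 12:00:00", 4),  # nothing recorded on 2008-01-02
    ],
)

query = """
WITH RECURSIVE calendar(day) AS (
    SELECT '2008-01-01'
    UNION ALL
    SELECT date(day, '+1 day') FROM calendar WHERE day < '2008-01-03'
)
SELECT c.day, MAX(t.my_number) AS daily_max
FROM calendar AS c
LEFT JOIN My_Table AS t
       ON date(t.my_date) = c.day   -- strip the time part before matching
GROUP BY c.day
ORDER BY c.day
"""
daily_max = conn.execute(query).fetchall()
print(daily_max)
```

The day without rows comes back with a NULL maximum (None in Python) instead of being dropped.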
Something like
SELECT dateadd(dd,0, datediff(dd,0,datetime)) as day, MAX(value)
FROM table
WHERE datetime >= '2008-01-01' AND datetime < '2009-01-01'
GROUP BY dateadd(dd,0, datediff(dd,0,datetime))
Assuming datetime is your date column, dateadd(dd,0, datediff(dd,0,datetime)) will extract only the date part, and then you can group by that value to get a maximum daily value. There might be a prettier way to get only the date part though.
You can also use the between construct to avoid the less than and greater than.
Group on the date, use the max aggregate to get the highest value for each date, sort on the value, and get the first record.
Example:
select top 1 theDate, max(theValue)
from TheTable
group by theDate
order by max(theValue) desc
(The date field needs to only contain a date for this grouping to work, i.e. the time component has to be zero.)
If you need to limit the query to a specific year, use a starting and ending date in a where clause:
select top 1 theDate, max(theValue)
from TheTable
where theDate between '2008-01-01' and '2008-12-31'
group by theDate
order by max(theValue) desc