Get last date with data for each calendar date - sql

I have some data: a sales amount for each day. Sometimes data is missing, so there is no record (for example on weekends, but not only then). For these dates, I want to replace the null value with the last known value. I created a reference table with all calendar dates and a boolean that tells me whether I have data for that day.
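For example, with this reference table:
+------------+-----------------+
| Date       | is_data_present |
+------------+-----------------+
| 27/10/2022 | 1               |
| 28/10/2022 | 1               |
| 29/10/2022 | 0               |
| 30/10/2022 | 0               |
+------------+-----------------+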
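I want this outcome:
+------------+-----------------+-------------+
| Date       | is_data_present | date_to_use |
+------------+-----------------+-------------+
| 27/10/2022 | 1               | 27/10/2022  |
| 28/10/2022 | 1               | 28/10/2022  |
| 29/10/2022 | 0               | 28/10/2022  |
| 30/10/2022 | 0               | 28/10/2022  |
+------------+-----------------+-------------+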
I tried things with LEAD but I don't know how to add a condition like 'where is_data_present = 1'

Basically, you don't need a window function for this.
The COALESCE handles the case where the first row has is_data_present = 0 and so has no earlier value to fall back on.
SELECT
"Date", "is_data_present",
COALESCE(
  (SELECT "Date" FROM table1
   WHERE "Date" <= tab1."Date" AND "is_data_present" = 1
   ORDER BY "Date" DESC LIMIT 1),
  "Date") AS date_to_use
FROM table1 tab1
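To try the correlated-subquery idea without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The table name calendar, the column name day, and the extra leading row for 26/10/2022 are assumptions added for illustration. Dates are stored as ISO strings (YYYY-MM-DD) so that string comparison matches date order; with a real date column that is not needed.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calendar (day TEXT PRIMARY KEY, is_data_present INTEGER)")
conn.executemany(
    "INSERT INTO calendar VALUES (?, ?)",
    [
        ("2022-10-26", 0),  # extra row: no earlier day with data exists
        ("2022-10-27", 1),
        ("2022-10-28", 1),
        ("2022-10-29", 0),
        ("2022-10-30", 0),
    ],
)

# For each day, look up the latest day on or before it that has data.
# COALESCE falls back to the day itself when no such day exists.
last_known_query = """
SELECT c.day,
       c.is_data_present,
       COALESCE(
           (SELECT p.day
              FROM calendar AS p
             WHERE p.day <= c.day AND p.is_data_present = 1
             ORDER BY p.day DESC
             LIMIT 1),
           c.day
       ) AS date_to_use
  FROM calendar AS c
 ORDER BY c.day
"""
last_known_rows = conn.execute(last_known_query).fetchall()
for row in last_known_rows:
    print(row)
```

The first row shows the fallback: 2022-10-26 has no earlier day with data, so it maps to itself.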

I tried things with LEAD but I don't know how to add a condition like 'where is_data_present = 1'
In addition to @nbk's approach, you might consider FIRST_VALUE or LAST_VALUE if you want to use a window function, since LEAD and LAG don't support IGNORE NULLS.
WITH sample_table AS (
  SELECT '27/10/2022' date, 1 is_data_present UNION ALL
  SELECT '28/10/2022' date, 1 is_data_present UNION ALL
  SELECT '29/10/2022' date, 0 is_data_present UNION ALL
  SELECT '30/10/2022' date, 0 is_data_present
)
SELECT *,
  LAST_VALUE(IF(is_data_present = 1, date, NULL) IGNORE NULLS) OVER (ORDER BY date) date_to_use
FROM sample_table;
+------------+-----------------+-------------+
| date | is_data_present | date_to_use |
+------------+-----------------+-------------+
| 27/10/2022 | 1 | 27/10/2022 |
| 28/10/2022 | 1 | 28/10/2022 |
| 29/10/2022 | 0 | 28/10/2022 |
| 30/10/2022 | 0 | 28/10/2022 |
+------------+-----------------+-------------+
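If your engine has window functions but no IGNORE NULLS, a running MAX over the same conditional expression may give the same result, because MAX skips NULLs and the latest date with data is also the largest one seen so far. The sketch below is a swapped-in technique, not the query above. It runs on SQLite through Python's sqlite3 module, assumes SQLite 3.25 or newer for window functions, and stores dates as ISO strings so they sort correctly. The column name day is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_table (day TEXT, is_data_present INTEGER)")
conn.executemany(
    "INSERT INTO sample_table VALUES (?, ?)",
    [("2022-10-27", 1), ("2022-10-28", 1), ("2022-10-29", 0), ("2022-10-30", 0)],
)

# The CASE yields the day only when data is present, otherwise NULL.
# With ORDER BY, the default window frame runs from the first row to the
# current row, so MAX returns the latest day with data seen so far.
running_max_query = """
SELECT day,
       is_data_present,
       MAX(CASE WHEN is_data_present = 1 THEN day END)
           OVER (ORDER BY day) AS date_to_use
  FROM sample_table
 ORDER BY day
"""
running_max_rows = conn.execute(running_max_query).fetchall()
for row in running_max_rows:
    print(row)
```

Unlike the COALESCE version, leading rows without any earlier data would get NULL here.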

Related

SQL: Calculate number of days since last success

The following table represents the results of a given test.
Every result for the same test is either pass (error_id = 0) or fail (error_id <> 0).
I need help writing a query that returns the number of runs since the last good run (error_id = 0) and the date of that run.
| Date       | test_id | error_id |
-----------------------------------
| 2019-12-20 | 123     | 23       |
| 2019-12-19 | 123     | 23       |
| 2019-12-17 | 123     | 22       |
| 2019-12-18 | 123     | 0        |
| 2019-12-16 | 123     | 11       |
| 2019-12-15 | 123     | 11       |
| 2019-12-13 | 123     | 11       |
| 2019-12-12 | 123     | 0        |
So the result for this example should be:
| 2019-12-18 | 123 | 4
as the test 123 was PASS on 2019-12-18 and this happened 4 runs ago.
I have a query that determines whether a given run is an error or not, but I have trouble applying an appropriate window function to it to get the wanted result:
select test_id, Date, error_id, (CASE WHEN error_id <> 0 THEN 1 ELSE 0 END) as is_error
from testresults
You can generate a row number, in reverse order from the sorting of the query itself:
SELECT test_date, test_id, error_code,
(row_number() OVER (ORDER BY test_date asc) - 1) as runs_since_last_pass
FROM tests
WHERE test_date >= (SELECT MAX(test_date) FROM tests WHERE error_code=0)
ORDER BY test_date DESC
LIMIT 1;
Note that this will run into issues if test_date is not unique. Better use a timestamp (precise to the millisecond) instead of a date.
Here's a DBFiddle: https://www.db-fiddle.com/f/8gSHVcXMztuRiFcL8zLeEx/0
If there's more than one test_id, you'll want to add a PARTITION BY clause to the row number function, and the subquery would become a bit more complex. It may be more efficient to come up with a way to do this by a JOIN instead of a subquery, but it would be more cognitively complex.
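As a rough sketch of the JOIN variant for several test_ids: compute the last pass per test in a grouped subquery, then count the later runs of the same test. It runs on SQLite via Python's sqlite3 module. The rows for tests 456 and 789 are made up for illustration, and the query counts runs strictly after the last pass, so change > to >= if the passing run itself should be counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tests (test_date TEXT, test_id INTEGER, error_code INTEGER)")
conn.executemany(
    "INSERT INTO tests VALUES (?, ?, ?)",
    [
        ("2019-12-20", 123, 23),
        ("2019-12-19", 123, 23),
        ("2019-12-18", 123, 0),
        ("2019-12-17", 123, 22),
        ("2019-12-12", 123, 0),
        ("2019-12-20", 456, 7),   # hypothetical second test
        ("2019-12-19", 456, 0),
        ("2019-12-20", 789, 0),   # hypothetical test whose latest run passed
    ],
)

# lp holds one row per test with its most recent passing date.
# The LEFT JOIN keeps tests that have no runs after that date (count 0).
runs_query = """
SELECT lp.test_id,
       lp.last_pass_date,
       COUNT(t.test_date) AS runs_since_last_pass
  FROM (SELECT test_id, MAX(test_date) AS last_pass_date
          FROM tests
         WHERE error_code = 0
         GROUP BY test_id) AS lp
  LEFT JOIN tests AS t
    ON t.test_id = lp.test_id
   AND t.test_date > lp.last_pass_date
 GROUP BY lp.test_id, lp.last_pass_date
 ORDER BY lp.test_id
"""
runs_rows = conn.execute(runs_query).fetchall()
for row in runs_rows:
    print(row)
```

Tests that never passed do not appear in the result at all, since the subquery has no row for them.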
I think you just want aggregation and some filtering:
select id, count(*) as runs_since_last_success,
(select max(t2.date) from t t2 where t2.error_id = 0) as last_success_date
from t
where date > (select max(t2.date) from t t2 where t2.error_id = 0)
group by id;
You have to use the Maximum date of the good runs for every test_id in your query. You can try this query:
select tr2.date_error, tr.test_id, count(tr.error_id) from
testresults tr inner join (select max(date_error) as date_error, test_id
from testresults where error_id=0 group by test_id) tr2 on
tr.test_id=tr2.test_id and tr.date_error >=tr2.date_error
group by tr.test_id, tr2.date_error
This should do the trick:
select count(*) from table t,
(select max(date) date from table where error_id = 0) good
where t.date >= good.date
Basically you are counting the rows that have a date >= the date of the last success.
Please note: if you need the number of days, it is a completely different query:
select now()::date - max(test_date) last_valid from tests
where error_code = 0;

How to aggregate based on various conditions

Let's say I have a table which stores ItemID, Date and Total_shipped over a period of time:
ItemID | Date | Total_shipped
__________________________________
1 | 1/20/2000 | 2
2 | 1/20/2000 | 3
1 | 1/21/2000 | 5
2 | 1/21/2000 | 4
1 | 1/22/2000 | 1
2 | 1/22/2000 | 7
1 | 1/23/2000 | 5
2 | 1/23/2000 | 6
Now I want to aggregate based on several periods of time. For example, I want to know how many of each item were shipped every two days and in total. So the desired output should look something like:
ItemID | Jan20-Jan21 | Jan22-Jan23 | Jan20-Jan23
_____________________________________________
1 | 7 | 6 | 13
2 | 7 | 13 | 20
How do I do that in the most efficient way?
I know I can write three different subqueries, but I think there should be a better way. My real data is large and there are several different time periods to consider, i.e. in my real problem I want the shipped items for current_week, last_week, two_weeks_ago, three_weeks_ago, last_month, two_months_ago, three_months_ago, so I do not think writing 7 different subqueries would be a good idea.
Here is the general idea of what I can already run, but it is very expensive for the database:
WITH
sq1 as (
SELECT ItemID, sum(Total_shipped) sum1
FROM table
WHERE Date BETWEEN '1/20/2000' and '1/21/2000'
GROUP BY ItemID),
sq2 as (
SELECT ItemID, sum(Total_Shipped) sum2
FROM table
WHERE Date BETWEEN '1/22/2000' and '1/23/2000'
GROUP BY ItemID),
sq3 as(
SELECT ItemID, sum(Total_Shipped) sum3
FROM Table
GROUP BY ItemID)
SELECT ItemID, sq1.sum1, sq2.sum2, sq3.sum3
FROM Table
JOIN sq1 on Table.ItemID = sq1.ItemID
JOIN sq2 on Table.ItemID = sq2.ItemID
JOIN sq3 on Table.ItemID = sq3.ItemID
I don't know why you have tagged this question with multiple databases.
Anyway, you can use conditional aggregation as follows in Oracle:
select
item_id,
sum(case when "date" between date'2000-01-20' and date'2000-01-21' then total_shipped end) as "Jan20-Jan21",
sum(case when "date" between date'2000-01-22' and date'2000-01-23' then total_shipped end) as "Jan22-Jan23",
sum(case when "date" between date'2000-01-20' and date'2000-01-23' then total_shipped end) as "Jan20-Jan23"
from my_table
group by item_id
Cheers!!
Use FILTER:
select
item_id,
sum(total_shipped) filter (where date between '2000-01-20' and '2000-01-21') as "Jan20-Jan21",
sum(total_shipped) filter (where date between '2000-01-22' and '2000-01-23') as "Jan22-Jan23",
sum(total_shipped) filter (where date between '2000-01-20' and '2000-01-23') as "Jan20-Jan23"
from my_table
group by 1
item_id | Jan20-Jan21 | Jan22-Jan23 | Jan20-Jan23
---------+-------------+-------------+-------------
1 | 7 | 6 | 13
2 | 7 | 13 | 20
(2 rows)
Db<>fiddle.
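Since the real problem has seven periods, it may help to generate the conditional-aggregation columns from a list of periods instead of writing each one by hand. The following is only a sketch of that idea, run on SQLite through Python's sqlite3 module. The table name shipments, the column ship_date, and the periods dictionary are assumptions, and the CASE form is used rather than FILTER because it works on more engines. The period labels are interpolated into the SQL text, so they must come from your own code, never from user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shipments (item_id INTEGER, ship_date TEXT, total_shipped INTEGER)"
)
conn.executemany(
    "INSERT INTO shipments VALUES (?, ?, ?)",
    [
        (1, "2000-01-20", 2), (2, "2000-01-20", 3),
        (1, "2000-01-21", 5), (2, "2000-01-21", 4),
        (1, "2000-01-22", 1), (2, "2000-01-22", 7),
        (1, "2000-01-23", 5), (2, "2000-01-23", 6),
    ],
)

# Hypothetical period definitions: label -> (first day, last day), inclusive.
periods = {
    "Jan20-Jan21": ("2000-01-20", "2000-01-21"),
    "Jan22-Jan23": ("2000-01-22", "2000-01-23"),
    "Jan20-Jan23": ("2000-01-20", "2000-01-23"),
}

# One SUM(CASE ...) column per period; the date bounds are bound parameters.
columns = ",\n       ".join(
    f'SUM(CASE WHEN ship_date BETWEEN ? AND ? THEN total_shipped ELSE 0 END) AS "{label}"'
    for label in periods
)
# Flatten the bounds in the same order as the generated columns.
params = [bound for bounds in periods.values() for bound in bounds]

period_query = (
    f"SELECT item_id,\n       {columns}\n"
    "  FROM shipments\n GROUP BY item_id\n ORDER BY item_id"
)
period_totals = conn.execute(period_query, params).fetchall()
for row in period_totals:
    print(row)
```

The table is scanned once regardless of how many periods are listed, which is the main advantage over one subquery per period.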

How to merge two rows in SQL Server

I have a table similar to this one:
Date | Cond | Time
---------+--------+------
18/03/19 | 1 | 13:07
18/03/19 | 0 | 16:07
I want a select statement that produces something similar to the following, using a join, a union, or any sort of condition:
Date | Time1 | Time2
----------+-------+------
18/03/19 | 13:07 | 16:07
Best regards
You can use conditional aggregation:
select date, max(case when cond = 1 then time end) as time_1,
max(case when cond = 0 then time end) as time_0
from t
group by date
order by date;
Use aggregate functions:
select date,min(time),max(time)
from table group by date

Multiple select statement in hive

I am fairly new to Hive.
I have a table called stats as shown below
from_date | to_date | customer_name | callcount
-------------------------------------------------------
2016_01_01 | 2016_01_02 | ABC | 25
2016_01_02 | 2016_01_03 | ABC | 53
2016_01_03 | 2016_01_04 | ABC | 44
2016_01_04 | 2016_01_05 | ABC | 55
I want to build a Hive query that will accept:
a) current time range (from and to time)
b) previous time range (from and to time)
c) customer name
For example, the inputs will be:
current time range can be from time(2016_01_03) and to time(2016_01_05)
previous time range can be from time(2016_01_01) and to time(2016_01_02)
customer name can be ABC
And the result i want to display is:
current_call_count(sum of call counts for current time range), previous_call_count(sum of call counts for previous time range)
and difference between current_call_count & previous_call_count
like this:
customer | current_call_count | previous_call_count | Diff
---------------------------------------------------------
ABC | 99 | 25 | 74
The query which i built was:
select * from
(
select sum(callCount) as current_count from stats
where customer_name='ABC' and from_date>='2016-04-03' and to_date<='2016-04-05'
UNION ALL
select sum(callCount) as current_count from stats
where customer_name='ABC' and from_date>='2016-04-01' and to_date<='2016-04-02'
) FINAL
I am not able to get the calculation and also not able to display result as columns. Please help
Try conditional aggregation:
select
sum(case when from_date >= '2016_01_04' and to_date <= '2016_01_05' then callcount else 0 end)
as current_call_count,
sum(case when from_date >= '2016_01_02' and to_date <= '2016_01_03' then callcount else 0 end)
as previous_call_count,
sum(case when from_date >= '2016_01_04' and to_date <= '2016_01_05' then callcount else 0 end)
- sum(case when from_date >= '2016_01_02' and to_date <= '2016_01_03' then callcount else 0 end)
as difference
from stats
where customer_name = 'ABC'
Note that your example data uses _ instead of - (which is used in your query) as a date separator.
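To check the conditional-aggregation approach against the expected output in the question (99, 25, 74), here is a small sketch that uses the question's own time ranges as named parameters. It runs on SQLite through Python's sqlite3 module, not on Hive, so treat it as an illustration of the query shape only. The parameter names are assumptions. The date strings compare correctly as text because they are zero-padded and use one separator consistently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stats (from_date TEXT, to_date TEXT, customer_name TEXT, callcount INTEGER)"
)
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?, ?)",
    [
        ("2016_01_01", "2016_01_02", "ABC", 25),
        ("2016_01_02", "2016_01_03", "ABC", 53),
        ("2016_01_03", "2016_01_04", "ABC", 44),
        ("2016_01_04", "2016_01_05", "ABC", 55),
    ],
)

# The inner query sums each range once; the outer query derives the difference.
call_count_query = """
SELECT customer_name,
       current_call_count,
       previous_call_count,
       current_call_count - previous_call_count AS diff
  FROM (SELECT customer_name,
               SUM(CASE WHEN from_date >= :cur_from AND to_date <= :cur_to
                        THEN callcount ELSE 0 END) AS current_call_count,
               SUM(CASE WHEN from_date >= :prev_from AND to_date <= :prev_to
                        THEN callcount ELSE 0 END) AS previous_call_count
          FROM stats
         WHERE customer_name = :customer
         GROUP BY customer_name) AS totals
"""
call_count_params = {
    "cur_from": "2016_01_03", "cur_to": "2016_01_05",
    "prev_from": "2016_01_01", "prev_to": "2016_01_02",
    "customer": "ABC",
}
call_count_rows = conn.execute(call_count_query, call_count_params).fetchall()
print(call_count_rows)
```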

SQL Query Compare values in per 15 minutes and display the result per hour

I have a table with 2 columns. UTCTime and Values.
The UTCTime is in 15-minute increments. I want a query that compares each value to the previous value within a one-hour span and displays a number between 0 and 4, depending on how many values stayed constant. In other words, there is an entry for every 15-minute increment and the value can be constant, so I just need to check each value against the previous one, per hour.
For example
+---------+-------+
| UTCTime | Value |
+---------+-------+
| 12:00   | 18.2  |
| 12:15   | 87.3  |
| 12:30   | 55.91 |
| 12:45   | 55.91 |
| 1:00    | 37.3  |
| 1:15    | 47.3  |
| 1:30    | 47.3  |
| 1:45    | 47.3  |
| 2:00    | 37.3  |
+---------+-------+
In this case, I just want a query that compares the 12:45 value to the 12:30 value, the 12:30 value to the 12:15 value, and so on. Since we compare within a one-hour span only, the number of constant values must be between 0 and 4 (0 means there are no constant values, 1 means there is one, as in the 12:00 hour of the example above).
The query should display:
+---------+----------------+
| UTCTime | ConstantValues |
+---------+----------------+
| 12:00   | 1              |
| 1:00    | 2              |
+---------+----------------+
I just wanted to mention that I am new to SQL programming.
Thank you.
See SQL fiddle here
Below is the query you need and a working solution. Note: I changed the timeframe to 24 hrs.
;with SourceData(HourTime, Value, RowNum)
as
(
select
datepart(hh, UTCTime) HourTime,
Value,
row_number() over (partition by datepart(hh, UTCTime) order by UTCTime) RowNum
from foo
union
select
datepart(hh, UTCTime) - 1 HourTime,
Value,
5
from foo
where datepart(mi, UTCTime) = 0
)
select cast(A.HourTime as varchar) + ':00' UTCTime, sum(case when A.Value = B.Value then 1 else 0 end) ConstantValues
from SourceData A
inner join SourceData B on A.HourTime = B.HourTime and
(B.RowNum = (A.RowNum - 1))
group by cast(A.HourTime as varchar) + ':00'
select SUBSTRING_INDEX(UTCTime,':',1) as time,value, count(*)-1 as total
from foo group by value,time having total >= 1;
fiddle
Mine isn't much different from Vasanth's: same idea, different approach.
The idea is that you join the result set to itself to carry it out simply. You could also use the LEAD() function to look at rows ahead of your current row, but in this case that would require a big case statement to cover every outcome.
;WITH T
AS (
SELECT a.UTCTime,b.VALUE,ROW_NUMBER() OVER(PARTITION BY a.UTCTime ORDER BY b.UTCTime DESC)'RowRank'
FROM (SELECT *
FROM #Table1
WHERE DATEPART(MINUTE,UTCTime) = 0
)a
JOIN #Table1 b
ON b.UTCTIME BETWEEN a.UTCTIME AND DATEADD(hour,1,a.UTCTIME)
)
SELECT T.UTCTime, SUM(CASE WHEN T.Value = T2.Value THEN 1 ELSE 0 END)
FROM T
JOIN T T2
ON T.UTCTime = T2.UTCTime
AND T.RowRank = T2.RowRank -1
GROUP BY T.UTCTime
If you run the portion inside the ;WITH T AS ( ), you'll see that it returns the hour we're looking at and the values ordered by time. The final query then joins T to itself and compares each row with the adjacent row (hence the RowRank - 1 in the JOIN).
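For comparison, here is a sketch of a window-function variant using LAG rather than a self-join. It runs on SQLite through Python's sqlite3 module and assumes SQLite 3.25 or newer. The table name readings, the full timestamps, and the 24-hour reading of the sample times (1:00 taken as 13:00) are assumptions. Each comparison is attributed to the hour of the earlier reading, so the :00 value of the next hour is compared with the :45 value, which allows counts from 0 to 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (utc_time TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("2020-01-01 12:00", 18.2),
        ("2020-01-01 12:15", 87.3),
        ("2020-01-01 12:30", 55.91),
        ("2020-01-01 12:45", 55.91),
        ("2020-01-01 13:00", 37.3),
        ("2020-01-01 13:15", 47.3),
        ("2020-01-01 13:30", 47.3),
        ("2020-01-01 13:45", 47.3),
        ("2020-01-01 14:00", 37.3),
    ],
)

# LAG fetches the previous reading's value and timestamp in time order.
# substr(previous_time, 1, 13) is 'YYYY-MM-DD HH', the hour of that reading.
constant_query = """
WITH compared AS (
    SELECT value,
           LAG(value)    OVER (ORDER BY utc_time) AS previous_value,
           LAG(utc_time) OVER (ORDER BY utc_time) AS previous_time
      FROM readings
)
SELECT substr(previous_time, 1, 13) || ':00' AS hour_start,
       SUM(CASE WHEN value = previous_value THEN 1 ELSE 0 END) AS constant_values
  FROM compared
 WHERE previous_time IS NOT NULL
 GROUP BY substr(previous_time, 1, 13)
 ORDER BY hour_start
"""
constant_rows = conn.execute(constant_query).fetchall()
for row in constant_rows:
    print(row)
```

This sketch assumes there are no gaps in the 15-minute series; with missing readings, the previous row may be more than 15 minutes earlier.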