Generating missing dates in a Hive query - sql

I need to look back up to 1000 days for a column and use those 1000 previous dates in my next steps, but not all of those dates are present for that column in the table. I still need the missing dates in the output of the query.
When I run the query below, it does not return the 1000 previous dates counted back from the current date.
Example: let's say only 2 dates are available in the date column:
date
2019-01-16
2019-01-19
I came up with a query to go back 1000 days, but it returns only the nearest date because all the earlier dates are missing:
SELECT date FROM table1 t
WHERE
date >= date_sub(current_date,1000) and date<current_date ORDER BY date LIMIT 1
Running the query above returns 2019-01-16: because the dates from the previous 1000 days are not present, it gives the nearest available date. What I need as output is every date, including the missing ones, from 2016-04-23 (the 1000th day before the current date) up to the day before the current date (2019-01-18).

You can generate the dates for the required range in a subquery (see the date_range subquery in the example below) and left join it with your table. If your table has no record for some date, the value will be null, while the dates themselves are returned from the date_range subquery without gaps. Set the start_date and end_date parameters to the required range:
set hivevar:start_date=2016-04-23; --replace with your start_date
set hivevar:end_date=current_date; --replace with your end_date
set hive.exec.parallel=true;
set hive.auto.convert.join=true; --this enables map-join
set hive.mapjoin.smalltable.filesize=25000000; --size of table to fit in memory
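with date_range as
(--this query generates the date range, check its output
--end_date is substituted unquoted so that current_date is evaluated as an expression; quote a literal end date in the set command
select date_add ('${hivevar:start_date}',s.i) as dt
from ( select posexplode(split(space(datediff(${hivevar:end_date},'${hivevar:start_date}')),' ')) as (i,x) ) s
)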
select d.dt as date,
t.your_col --some value from your table on date
from date_range d
left join table1 t on d.dt=t.date
order by d.dt --order by dates if necessary
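The same idea can be sketched outside Hive to make the logic easier to check. This is a minimal Python illustration, not the answer's code: `date_range`, `left_join_dates` and the sample values in `table1` are hypothetical names and data. It generates offsets 0..datediff (as posexplode does over the split string) and then looks each date up, returning None where the table has no row.

```python
from datetime import date, timedelta

def date_range(start, end):
    """Return every date from start to end inclusive, like the date_range subquery."""
    days = (end - start).days  # plays the role of datediff(end_date, start_date)
    return [start + timedelta(days=i) for i in range(days + 1)]

def left_join_dates(dates, rows):
    """Pair each date with its value from rows (a dict keyed by date), or None."""
    return [(d, rows.get(d)) for d in dates]

# Hypothetical sample data: the two dates from the question with made-up values.
table1 = {date(2019, 1, 16): "a", date(2019, 1, 19): "b"}

result = left_join_dates(date_range(date(2019, 1, 15), date(2019, 1, 19)), table1)
for d, value in result:
    print(d, value)
```

Every date in the range appears exactly once, and dates missing from the table carry None, which corresponds to the null produced by the left join.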

Related

Get the number of the weekday from each date in a date list

DB-Fiddle
CREATE TABLE dates (
date_list DATE
);
INSERT INTO dates
(date_list)
VALUES
('2020-01-29'),
('2020-01-30'),
('2020-01-31'),
('2020-02-01'),
('2020-02-02');
Expected Results:
Weekday
2
3
4
5
6
I want to get the number of the weekday for each date in the table dates.
Therefore, I tried to go with the solution from this question but could not make it work:
SELECT
EXTRACT(DOW FROM DATE d.date_list))
FROM dates d
How do I need to modify the query to get the expected result?
Get rid of the date keyword; it is only needed to introduce a DATE constant. If you already have a DATE value (which your column is), it's not needed:
select extract(dow from d.date_list)
from dates d
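One thing worth checking is the numbering convention. Assuming the database is PostgreSQL (the syntax suggests it), DOW counts Sunday as 0 through Saturday as 6, while ISODOW counts Monday as 1 through Sunday as 7. The expected result in the question (2 to 6) matches a Monday = 0 numbering, which would correspond to `extract(isodow from d.date_list) - 1`. The Python sketch below only illustrates the two conventions on the question's dates; it does not run against the database.

```python
from datetime import date

dates = [date(2020, 1, 29), date(2020, 1, 30), date(2020, 1, 31),
         date(2020, 2, 1), date(2020, 2, 2)]

# Sunday = 0 ... Saturday = 6, the convention PostgreSQL documents for DOW.
sunday_zero = [d.isoweekday() % 7 for d in dates]

# Monday = 0 ... Sunday = 6, which is what Python's date.weekday() returns.
monday_zero = [d.weekday() for d in dates]

print(sunday_zero)  # [3, 4, 5, 6, 0]
print(monday_zero)  # [2, 3, 4, 5, 6]
```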

To populate all dates in a month in SQL and SSRS

We need to create a report of the hours logged by each employee; say Kumar logged 3 hours.
For example,
a person logged hours in the system for 1-1-2021 but did not log any for 2-1-2021.
We have a join query to get the date and worked hours for each employee.
Our present report is like below (this is the screenshot from SSRS report)
But we need the missing dates also in the report. Abhishek has logged only from 4th Jan in the system. But we need rows above 4th Jan also in the report.
First name and last name should be in those columns in the new row. The hours worked should be 0 and other column values should be N/A or Null.
But here we need a query to get all dates in a month in the report, whether the employee logged hours or not (0 for days with no logged hours).
This is the report that we want in SSRS (I created this in Excel).
How can I do that?
You are not able to force the missing dates with SSRS. To make this work correctly, you'll need to update your query to get all the dates in your date range and then LEFT JOIN your current data to the dates.
Add an INTO clause to your current report query to put the data in a #TEMP_TABLE.
SELECT <CURRENT FIELDS>
INTO #TEMP_TABLE
FROM BLAH BLAH BLAH
Then create a table of dates for your date range using a recursive CTE.
DECLARE @START_DATE DATE = '01/01/2020' --THESE DATES SHOULD BE CHANGED TO USE PARAMETERS OF YOUR DATE RANGE
DECLARE @END_DATE DATE = '03/31/2021' --OR MIN/MAX FROM THE #TEMP_TABLE
;WITH GETDATES AS
(
SELECT @START_DATE AS THEDATE
UNION ALL
SELECT DATEADD(DAY,1, THEDATE) FROM GETDATES
WHERE THEDATE < @END_DATE
)
SELECT D.THEDATE, T.*
FROM GETDATES D
LEFT JOIN #TEMP_TABLE T ON D.THEDATE = T.DATE_WORKED
OPTION (maxrecursion 0)
This will return every date in the THEDATE field with your current data in the other fields.
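The question also asks for the employee's name on the filled-in rows and 0 for the hours. One possible way, sketched here under assumptions, is to cross join the generated dates with the distinct employees before the left join. The sketch runs on SQLite through Python's sqlite3 module so that it is self-contained, not on SQL Server; the table `logged`, its columns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logged (first_name TEXT, last_name TEXT, date_worked TEXT, hours REAL);
    -- Made-up sample rows: one employee who only logged from 4 January.
    INSERT INTO logged VALUES
        ('Abhishek', 'Kumar', '2021-01-04', 3),
        ('Abhishek', 'Kumar', '2021-01-05', 8);
""")

query = """
WITH RECURSIVE getdates(thedate) AS (
    SELECT :start_date
    UNION ALL
    SELECT date(thedate, '+1 day') FROM getdates WHERE thedate < :end_date
),
employees AS (
    SELECT DISTINCT first_name, last_name FROM logged
)
SELECT d.thedate, e.first_name, e.last_name, COALESCE(l.hours, 0) AS hours
FROM getdates d
CROSS JOIN employees e              -- one row per employee per date
LEFT JOIN logged l                  -- attach logged hours where they exist
       ON l.date_worked = d.thedate
      AND l.first_name = e.first_name
      AND l.last_name = e.last_name
ORDER BY e.last_name, e.first_name, d.thedate
"""
rows = conn.execute(
    query, {"start_date": "2021-01-01", "end_date": "2021-01-07"}
).fetchall()
for row in rows:
    print(row)
```

Each employee gets one row per date in the range; COALESCE turns the missing hours into 0, and the name columns come from the employee list rather than from the logged rows.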

Pull data from table based on the date of the first record

I am querying some data from a SQL table based on dates entered by the user, as below:
dt = as.Date(some_date)
# Manipulate dates
end_date = as.Date(dt)
begin_date = as.character(as.Date(end_date) - 364)
What happens after this is that all records in the table where the date field falls between the begin and end date are pulled.
qry <- paste0("select * from table
where date >= '", begin_date, "' and date <= '", end_date, "'")
But sometimes it might happen that I do not have 1 year of data but only 10 or 9 or 8 months.
So I want to be able to change the 364 value as per the first date in the table.
So is there any way in R by which I can pull the records starting with the begin date as end_date - 364 and if that date does not exist in the table change the begin date to the first available date and run the query again.
I understand that this will require two passes of the dates and the query but I want to be able to do it iteratively without manually checking for the dates.
Your query will give you one year of data or all the data available in the table, which seems to be your requirement. However, if you need to know whether there is more than one year of data before selecting the data, then you can use
SELECT MIN(date) FROM table
to get the earliest date available.
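Once the earliest date is known, choosing the begin date is a single comparison. The sketch below is in Python rather than R, and `effective_begin_date` and the sample dates are hypothetical; it only shows the clamping step, assuming the earliest date has already been fetched with the MIN query.

```python
from datetime import date, timedelta

def effective_begin_date(end_date, earliest_available, lookback_days=364):
    """Return end_date - lookback_days, but never earlier than the first date in the table."""
    wanted = end_date - timedelta(days=lookback_days)
    return max(wanted, earliest_available)

# earliest_available would come from "SELECT MIN(date) FROM table"; these values are made up.
print(effective_begin_date(date(2021, 6, 30), date(2020, 11, 1)))  # 2020-11-01
print(effective_begin_date(date(2021, 6, 30), date(2019, 1, 1)))   # 2020-07-01
```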

Access query to pull previous month's data

I have a table in Access 2013 (Table1) that contains the following columns:
ID (pk), ReportDate, Amount
The most current data is 30-50 days old. For example, today (6/22/16) the most recent data would be the 5/1/16 row, as the 6/1/16 data won't be entered until mid-July. (All dates in the ReportDate column are the 1st of the month, i.e.: 4/1/16, 5/1/16, etc.)
I need to write a query that will do a 6-month lookback, but exclude the most current month's data.
So, for example, if I ran the query today (6/22/16), I would only get the rows that correspond to the following months:
12/1/2015
1/1/2016
2/1/2016
3/1/2016
4/1/2016
The data for 5/1/16 should be excluded, as it's the most recent month.
I can pull the previous 6 months' worth of data by setting the criteria (in QBE) for ReportDate to >=DateAdd("m",-6,Date()), but I can't seem to figure out how to exclude the most recent month.
This should give you the start date of the most recent month in your table:
SELECT Max(ReportDate) AS MaxOfReportDate
FROM Table1;
If that is the month you want to exclude, use that query as a subquery which you cross join back to the table. Then you can use a WHERE clause with a BETWEEN condition whose end points are determined by DateAdd() expressions based on MaxOfReportDate:
SELECT t.ID, t.ReportDate, t.Amount
FROM
Table1 AS t,
(
SELECT Max(ReportDate) AS MaxOfReportDate
FROM Table1
) AS sub
WHERE
t.ReportDate BETWEEN DateAdd('m', -6, sub.MaxOfReportDate)
AND DateAdd('m', -1, sub.MaxOfReportDate);
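To see which months the BETWEEN condition covers, the bounds can be worked out by hand. The Python sketch below mimics the month arithmetic for first-of-month dates; `add_months` and `lookback_window` are hypothetical helpers, not Access functions.

```python
from datetime import date

def add_months(d, months):
    """Shift a first-of-month date by a number of months (may be negative)."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)

def lookback_window(max_report_date):
    """Bounds used by the BETWEEN condition: 6 months back through 1 month back."""
    return add_months(max_report_date, -6), add_months(max_report_date, -1)

start, end = lookback_window(date(2016, 5, 1))
print(start, end)  # 2015-11-01 2016-04-01
```

With 5/1/2016 as the most recent month, the window runs from 11/1/2015 through 4/1/2016, which is six months. If only the five months listed in the question are wanted, the lower offset would presumably be -5 instead of -6.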

How can I cross join the following query results with a table of dates

I am looking for a query which gives me the daily playing time. The start date (first_date) and end date (last_update) are given as shown in the table. The following query gives me the sum of playing time on a given date. How can I extend it to produce a table covering the first day to the last day, fill in the query data, and show 0 on dates when no game is played?
SELECT startTime, SUM(duration) as sum
FROM myTable
WHERE startTime = endTime
GROUP BY startTime
To show dates when no one played, you will need to create a table days with a date field day so you can do a left join (100 years is only 36500 rows).
You can populate it using the select from Generate days from date range.
This uses a stored procedure in MSQL.
I will assume that if a play passes midnight, a new record begins, so I can simplify my code and remove the time from the datetime field.
SELECT d.day, COALESCE(SUM(m.duration), 0) as sum
FROM
days d
left join myTable m
on CONVERT(date, m.starttime) = d.day
GROUP BY d.day
If I understand correctly, you could try:
SELECT SUM(duration) AS duration, date
FROM myTable
WHERE date <= 20140430
AND date >= 20140401
GROUP BY date
This would get the total time played for each date between April 1 and April 30.
As far as showing 0 for dates not in the table, I don't know.
Also, the table you posted doesn't show a duration column, but the query you posted does, so I went ahead and used it.