Having issues joining a table with a recursive function in SQLite - sql

I'm building a complex query, but I have a problem.
Practically, I retrieve a range of dates from a recursive query in SQLite:
WITH RECURSIVE dates(d)
AS (VALUES('2014-05-01')
UNION ALL
SELECT date(d, '+1 day')
FROM dates
WHERE d < '2014-05-05')
SELECT d AS date FROM dates
This is the result:
2014-05-01
2014-05-02
2014-05-03
2014-05-04
2014-05-05
I would like to join this query with another query, something like this:
select date_column, column1, column2 from table
This is the result:
2014-05-03 column_value1 column_value2
Finally, I would like the output to look like this (the first query joined to date_column of the second query):
2014-05-01 | | |
2014-05-02 | | |
2014-05-03 | column_value1 | column_value2 |
2014-05-04 | | |
2014-05-05 | | |
How can I obtain this result?
Thanks!!!

Why don't you simply do something like this?
WITH RECURSIVE dates(d)
AS (VALUES('2014-05-01')
UNION ALL
SELECT date(d, '+1 day')
FROM dates
WHERE d < '2014-05-05')
SELECT
dates.d AS date
,table.column1
,table.column2
FROM dates
LEFT JOIN table
-- join on the CTE column d; the alias "date" exists only in the SELECT list
ON strftime('%Y-%m-%d', table.date_column) = dates.d
You may need to convert date_column so that it matches the YYYY-MM-DD text that the CTE produces.
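As a minimal sketch, the same LEFT JOIN can be tried from Python's sqlite3 module against an in-memory database. The table name events and its single sample row are assumptions standing in for the asker's table (the word "table" itself is reserved in SQLite and would need quoting), and recursive CTEs need SQLite 3.8.3 or later.

```python
import sqlite3

# Hypothetical stand-in for the asker's table; "events" is an assumed name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (date_column TEXT, column1 TEXT, column2 TEXT)")
conn.execute(
    "INSERT INTO events VALUES ('2014-05-03', 'column_value1', 'column_value2')"
)

QUERY = """
WITH RECURSIVE dates(d) AS (
    VALUES('2014-05-01')
    UNION ALL
    SELECT date(d, '+1 day') FROM dates WHERE d < '2014-05-05'
)
SELECT dates.d AS date, events.column1, events.column2
FROM dates
LEFT JOIN events ON date(events.date_column) = dates.d
ORDER BY dates.d
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)  # dates without a matching row carry None in both value columns
```

Every generated date appears once; only the date that exists in the joined table carries values.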

Related

How to write a SQL statement to sum data using group by the same day of every two neighboring months

I have a data table like this:
datetime data
-----------------------
...
2017/8/24 6.0
2017/8/25 5.0
...
2017/9/24 6.0
2017/9/25 6.2
...
2017/10/24 8.1
2017/10/25 8.2
I want to write a SQL statement that sums the data grouped by the 24th of every two neighboring months within a certain time range, for example from 2017/7/20 to 2017/10/25 as above.
How to write this SQL statement? I'm using SQL Server 2008 R2.
The expected results table is like this:
datetime_range data_sum
------------------------------------
...
2017/8/24~2017/9/24 100.9
2017/9/24~2017/10/24 120.2
...
One conceptual way to proceed here is to redefine a "month" as ending on the 24th of each normal month. Using the SQL Server month function, we will assign any date occurring after the 24th as belonging to the next month. Then we can aggregate by the year along with this shifted month to obtain the sum of data.
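WITH cte AS (
    SELECT
        data,
        -- A date after the 24th belongs to the period that ends on the 24th
        -- of the next month; after December 24th that is January of next year.
        CASE WHEN DAY(datetime) > 24 AND MONTH(datetime) = 12
             THEN YEAR(datetime) + 1 ELSE YEAR(datetime) END AS year,
        CASE WHEN DAY(datetime) > 24
             THEN MONTH(datetime) % 12 + 1 ELSE MONTH(datetime) END AS month
    FROM yourTable
)
SELECT
    -- (year, month) names the period by the month in which it ends,
    -- so the range starts on the 25th of the previous month.
    CONVERT(varchar(4), CASE WHEN month = 1 THEN year - 1 ELSE year END) + '/' +
    CONVERT(varchar(2), CASE WHEN month = 1 THEN 12 ELSE month - 1 END) +
    '/25~' +
    CONVERT(varchar(4), year) + '/' + CONVERT(varchar(2), month) +
    '/24' AS datetime_range,
    SUM(data) AS data_sum
FROM cte
GROUP BY
    year, month;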
Note that your suggested ranges seem to include the 24th on both ends, which does not make sense from an accounting point of view. I assume that the month includes and ends on the 24th (i.e. the 25th is the first day of the next accounting period).
I would suggest dynamically building some date range rows so that you can then join your data to those for aggregation, like this example:
+----+---------------------+---------------------+----------------+
| | period_start_dt | period_end_dt | your_data_here |
+----+---------------------+---------------------+----------------+
| 1 | 24.04.2017 00:00:00 | 24.05.2017 00:00:00 | 1 |
| 2 | 24.05.2017 00:00:00 | 24.06.2017 00:00:00 | 1 |
| 3 | 24.06.2017 00:00:00 | 24.07.2017 00:00:00 | 1 |
| 4 | 24.07.2017 00:00:00 | 24.08.2017 00:00:00 | 1 |
| 5 | 24.08.2017 00:00:00 | 24.09.2017 00:00:00 | 1 |
| 6 | 24.09.2017 00:00:00 | 24.10.2017 00:00:00 | 1 |
| 7 | 24.10.2017 00:00:00 | 24.11.2017 00:00:00 | 1 |
| 8 | 24.11.2017 00:00:00 | 24.12.2017 00:00:00 | 1 |
| 9 | 24.12.2017 00:00:00 | 24.01.2018 00:00:00 | 1 |
| 10 | 24.01.2018 00:00:00 | 24.02.2018 00:00:00 | 1 |
| 11 | 24.02.2018 00:00:00 | 24.03.2018 00:00:00 | 1 |
| 12 | 24.03.2018 00:00:00 | 24.04.2018 00:00:00 | 1 |
+----+---------------------+---------------------+----------------+
declare @start_dt date;
set @start_dt = '20170424';
select
period_start_dt, period_end_dt, sum(1) as your_data_here
from (
select
dateadd(month,m.n,start_dt) period_start_dt
, dateadd(month,m.n+1,start_dt) period_end_dt
from (
select @start_dt start_dt ) seed
cross join (
select 0 n union all
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10 union all
select 11
) m
) r
-- LEFT JOIN YOUR DATA
-- ON yourdata.date >= r.period_start_dt and yourdata.date < r.period_end_dt
group by
period_start_dt, period_end_dt
Please don't be tempted to use "between" when joining to your data. Follow the note above and use yourdata.date >= r.period_start_dt and yourdata.date < r.period_end_dt; otherwise you could double count information, as between is inclusive of both the lower and upper boundaries.
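The double counting is easy to reproduce. The sketch below uses Python's sqlite3 module rather than SQL Server, and the periods and yourdata tables with their three sample rows are made up for illustration; one row falls exactly on the boundary shared by two periods.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical period and data tables; names and values are illustrative only.
conn.executescript("""
CREATE TABLE periods (period_start_dt TEXT, period_end_dt TEXT);
INSERT INTO periods VALUES ('2017-08-24', '2017-09-24'), ('2017-09-24', '2017-10-24');
CREATE TABLE yourdata (date TEXT, data REAL);
INSERT INTO yourdata VALUES ('2017-08-25', 5.0), ('2017-09-24', 6.0), ('2017-09-25', 6.2);
""")

# Half-open range: the start is included, the end is excluded.
HALF_OPEN = """
SELECT COUNT(*) FROM periods r
JOIN yourdata ON yourdata.date >= r.period_start_dt AND yourdata.date < r.period_end_dt
"""
# BETWEEN includes both boundaries, so a row on a shared boundary matches twice.
INCLUSIVE = """
SELECT COUNT(*) FROM periods r
JOIN yourdata ON yourdata.date BETWEEN r.period_start_dt AND r.period_end_dt
"""

half_open_matches = conn.execute(HALF_OPEN).fetchone()[0]
inclusive_matches = conn.execute(INCLUSIVE).fetchone()[0]
print(half_open_matches, inclusive_matches)
```

With the half-open comparison each data row matches exactly one period; with BETWEEN the row dated 2017-09-24 is matched by both periods.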
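I think the simplest way is to subtract 24 days and aggregate by the month. Assuming each period runs from the 25th through the following 24th, this shift maps the whole period onto a single calendar month:
select year(dateadd(day, -24, datetime)) as yr,
       month(dateadd(day, -24, datetime)) as mon,
       sum(data)
from t
group by year(dateadd(day, -24, datetime)),
         month(dateadd(day, -24, datetime));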
You can format yr and mon to get the dates for the specific ranges, but this does the aggregation (and the yr/mon columns might be sufficient).
Step 0: Build a calendar table. Every database needs a calendar table eventually to simplify this sort of calculation.
In this table you may have columns such as:
Date (primary key)
Day
Month
Year
Quarter
Half-year (e.g. 1 or 2)
Day of year (1 to 366)
Day of week (numeric or text)
Is weekend (seems redundant now, but is a huge time saver later on)
Fiscal quarter/year (if your company's fiscal year doesn't start on Jan. 1)
Is Holiday
etc.
If your company starts its month on the 24th, then you can add a "Fiscal Month" column that represents that.
Step 1: Join on the calendar table
Step 2: Group by the columns in the calendar table.
Calendar tables sound weird at first, but once you realize that they are in fact tiny even if they span a couple hundred years they quickly become a major asset.
Don't try to cheap out on disk space by using computed columns. You want real columns because they are much faster and can be indexed if necessary. (Though honestly, usually just the PK index is enough for even wide calendar tables.)
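The three steps can be sketched with Python's sqlite3 module. Everything named here is an assumption for illustration: the calendar and t tables, the fiscal_month column, and the rule that a fiscal month ends on the 24th (so the 25th onward belongs to the next one). The fiscal month is stored as a real column, as recommended above.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE calendar (
    date TEXT PRIMARY KEY,
    year INTEGER, month INTEGER, day INTEGER,
    is_weekend INTEGER,
    fiscal_month TEXT   -- assumed convention: period ending on the 24th, as 'YYYY-MM'
)""")

def fiscal_month(d):
    # Assumed rule: the 25th onward belongs to the next month's period,
    # and December rolls over into January of the following year.
    year, month = d.year, d.month
    if d.day > 24:
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return f"{year:04d}-{month:02d}"

# Step 0: build the calendar table (one year is enough for the sketch).
day = date(2017, 1, 1)
while day <= date(2017, 12, 31):
    conn.execute(
        "INSERT INTO calendar VALUES (?, ?, ?, ?, ?, ?)",
        (day.isoformat(), day.year, day.month, day.day,
         int(day.weekday() >= 5), fiscal_month(day)),
    )
    day += timedelta(days=1)

conn.execute("CREATE TABLE t (datetime TEXT, data REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    ("2017-08-24", 6.0), ("2017-08-25", 5.0), ("2017-09-24", 6.0), ("2017-09-25", 6.2),
])

# Step 1: join on the calendar table. Step 2: group by its columns.
totals = dict(conn.execute("""
    SELECT c.fiscal_month, SUM(t.data)
    FROM t JOIN calendar c ON c.date = t.datetime
    GROUP BY c.fiscal_month
""").fetchall())
print(totals)
```

The query itself contains no date arithmetic; the period rule lives entirely in the calendar table.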

In a table containing rows of date ranges, from each row, generate one row per day containing hours of utilization

Given a table with rows like:
+----+-------------------------+------------------------+
| ID | StartDate | EndDate |
+----+-------------------------+------------------------+
| 1 | 2016-02-05 20:00:00.000 | 2016-02-07 5:00:00.000 |
+----+-------------------------+------------------------+
I want to produce a table like this:
+----+------------+----------+
| ID | Date | Duration |
+----+------------+----------+
| 1 | 2016-02-05 | 4 |
| 1 | 2016-02-06 | 24 |
| 1 | 2016-02-07 | 5 |
+----+------------+----------+
This is an interview-style question. I am wondering how I can go about tackling this. Is it possible to do this with just standard SQL query syntax? Or is a procedural language like pl/pgSQL required to do a query like this?
The basic idea is this:
SELECT date_trunc('day', dayhour) as dd,count(*)
FROM (VALUES (1, '2016-02-05 20:00:00.000'::timestamp, '2016-02-07 5:00:00.000'::timestamp)
) v(ID, StartDate, EndDate), lateral
generate_series(StartDate, EndDate, interval '1 hour') g(dayhour)
GROUP BY dd
ORDER BY dd;
That adds an extra hour, so this is more accurate:
SELECT date_trunc('day', dayhour) as dd,count(*)
FROM (VALUES (1, '2016-02-05 20:00:00.000'::timestamp, '2016-02-07 5:00:00.000'::timestamp)
) v(ID, StartDate, EndDate), lateral
generate_series(StartDate, EndDate - interval '1 hour', interval '1 hour') g(dayhour)
GROUP BY dd
ORDER BY dd;
Technically, the lateral is not needed (and in that case, I would replace the comma with cross join). However, this is an example of a lateral join, so being explicit is good.
I should also note that the above is the simplest method. However, the group by does slow down the query. There are other methods that don't require generating a series for every hour.
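One such method, sketched here in Python rather than SQL, walks from midnight to midnight and computes each day's hours by subtraction, so it creates one row per day instead of one per hour. The function name hours_per_day is hypothetical, and the end timestamp is treated as exclusive.

```python
from datetime import datetime, timedelta

def hours_per_day(start, end):
    """Split [start, end) at midnight boundaries; return (date, hours) pairs."""
    result = []
    cursor = start
    while cursor < end:
        # Midnight at the start of the following day.
        next_midnight = (
            datetime.combine(cursor.date(), datetime.min.time()) + timedelta(days=1)
        )
        segment_end = min(next_midnight, end)
        hours = (segment_end - cursor).total_seconds() / 3600
        result.append((cursor.date().isoformat(), hours))
        cursor = segment_end
    return result

usage = hours_per_day(datetime(2016, 2, 5, 20), datetime(2016, 2, 7, 5))
for day, hours in usage:
    print(day, hours)
```

For the sample row this yields 4, 24 and 5 hours for the three days, matching the table in the question.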

How to get the difference between two rows in a table and place in a new column through SQL

select * from TABLE1
where ENTRY_DATE >=trunc(sysdate-365)
ORDER BY ENTRY_TIME
This gives me the following result:
NUMBER_ID | ENTRY_DATE | ENTRY_TIME
----------+------------+------------
1 | 11/21/2014 | 11/21/2014 08:05:00 AM
2 | 11/21/2014 | 11/21/2014 08:08:46 AM
3 | 11/21/2014 | 11/21/2014 08:09:51 AM
4 | 11/21/2014 | 11/21/2014 08:10:05 AM
5 | 11/21/2014 | 11/21/2014 08:10:05 AM
6 | 11/21/2014 | 11/21/2014 08:10:59 AM
7 | 11/21/2014 | 11/21/2014 08:14:34 AM
However, I would like to be able to display a "Difference" column through SQL, holding the difference in time between each entry and the previous one.
Can anyone help with adding this to my SQL code? Thanks
You have specified multiple RDBMSs. For Oracle, a straightforward query would be
SELECT e_id
, e_d - NVL(LAG ( e_d ) OVER ( ORDER BY e_d ), e_d) diff
FROM events
;
assuming a base table events created by
CREATE TABLE events ( e_id NUMBER PRIMARY KEY, e_d DATE );
The difference will be presented in the unit 'days'.
An alternative query does not use the LAG function and, while still being formulated in Oracle syntax, should be portable (it assumes that the e_id values are consecutive):
SELECT e.e_id
, NVL ( e.e_d - elagged.e_d, 0 ) diff
FROM events e
LEFT JOIN events elagged ON ( elagged.e_id = e.e_id - 1 )
ORDER BY e.e_id
;
This sqlfiddle contains the complete example.
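To see the self-join variant run outside Oracle, here is a sketch using Python's sqlite3 module with the timestamps from the question. It swaps NVL for COALESCE and, as an assumption of this sketch, reports the difference in seconds (via SQLite's strftime('%s', ...)) rather than in days.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (e_id INTEGER PRIMARY KEY, e_d TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [
    (1, "2014-11-21 08:05:00"),
    (2, "2014-11-21 08:08:46"),
    (3, "2014-11-21 08:09:51"),
    (4, "2014-11-21 08:10:05"),
    (5, "2014-11-21 08:10:05"),
    (6, "2014-11-21 08:10:59"),
    (7, "2014-11-21 08:14:34"),
])

# Self-join each row to the row with the previous id (assumes consecutive ids).
# strftime('%s', ...) yields Unix seconds, so diff_seconds is in seconds.
rows = conn.execute("""
    SELECT e.e_id,
           COALESCE(CAST(strftime('%s', e.e_d) AS INTEGER)
                  - CAST(strftime('%s', elagged.e_d) AS INTEGER), 0) AS diff_seconds
    FROM events e
    LEFT JOIN events elagged ON elagged.e_id = e.e_id - 1
    ORDER BY e.e_id
""").fetchall()
for row in rows:
    print(row)
```

The first row has no predecessor, so its difference falls back to 0, just as in the NVL version.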

How to sort PostgreSQL data by weeks over month period

In simplest terms, I want to pull aggregate data from a table over a 4 week period but group by each week. It is safe to assume we can "force" a specific date or time (although it would be nice to allow any date entered and have the query run based on the date entered).
For example, the resulting data from a query would look like this:
start_date | end_date | count_of_sales
---------------------------------------------------------------
2014-03-03 04:00:00 | 2014-03-10 03:59:59 | 375
2014-03-10 04:00:00 | 2014-03-17 03:59:59 | 375
2014-03-17 04:00:00 | 2014-03-24 03:59:59 | 375
2014-03-24 04:00:00 | 2014-03-31 04:00:00 | 200
This would stem from unaggregated data that simply had a date (and of course other data but that is irrelevant):
saleDate | repID | productID
---------------------------------------------------------------
2014-03-04 12:36:33 | 1235 | 443
2014-03-09 07:08:12 | 1235 | 493
2014-03-09 10:12:44 | 3948 | 472
2014-03-21 23:33:01 | 2957 | 479
In my head the query would look something (although not accurately) like this:
SELECT start_date, end_date, COUNT(*) FROM table WHERE date < '2014-03-31 04:00:00' GROUP BY date
I understand, however, that the query above does not know how far back to look. Ideally the customer enters the final date and perhaps how many weeks of prior data they want to pull, which is why I left out a date BETWEEN clause (they may not know the exact 'start' date).
Sorry if this is confusing, but hopefully the sample SQL (albeit wrong) and the desired results will give a clearer picture.
If I understood your question correctly, the following code should help.
For clarification: the code below is for SQL Server.
With CTE as
(
Select 1 as pID,'2014-03-03 04:00:00' as startDate,'2014-03-10 03:59:59' as endDate
Union All
Select 2,'2014-03-10 04:00:00','2014-03-17 03:59:59'
Union All
Select 3,'2014-03-17 04:00:00','2014-03-24 03:59:59'
Union All
Select 4,'2014-03-24 04:00:00','2014-03-31 04:00:00'
)
select a.pID,a.startDate,a.endDate,count(*) from CTE as a
inner join MyTable on myDateCol between a.startDate and a.endDate
group by a.pID,a.startDate,a.endDate
for demo SQL Fiddle
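The hard-coded week rows can also be generated from the two inputs the question mentions, a final date and a number of weeks. The sketch below is an assumption-laden illustration using SQLite through Python's sqlite3 module (not PostgreSQL or SQL Server); the sales table name and the :final and :num_weeks parameters are made up, and end_date is an exclusive upper bound so that no sale is counted in two weeks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (saleDate TEXT, repID INTEGER, productID INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2014-03-04 12:36:33", 1235, 443),
    ("2014-03-09 07:08:12", 1235, 493),
    ("2014-03-09 10:12:44", 3948, 472),
    ("2014-03-21 23:33:01", 2957, 479),
])

QUERY = """
WITH RECURSIVE weeks(n, start_date, end_date) AS (
    -- Most recent week first, then step back seven days at a time.
    SELECT 1, datetime(:final, '-7 days'), datetime(:final)
    UNION ALL
    SELECT n + 1, datetime(start_date, '-7 days'), start_date
    FROM weeks WHERE n < :num_weeks
)
SELECT w.start_date, w.end_date, COUNT(s.saleDate) AS count_of_sales
FROM weeks w
LEFT JOIN sales s ON s.saleDate >= w.start_date AND s.saleDate < w.end_date
GROUP BY w.start_date, w.end_date
ORDER BY w.start_date
"""

params = {"final": "2014-03-31 04:00:00", "num_weeks": 4}
rows = conn.execute(QUERY, params).fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN keeps weeks without any sales in the result with a count of 0, which an inner join would drop.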

Query date range columns with list of date parameters

I have an Oracle 10g table containing 2 date columns, DATE_VALID_FROM and DATE_VALID_TO.
MY_TABLE:
DATE_VALID_FROM | DATE_VALID_TO | VALUE
15-FEB-13 | 17-FEB-13 | 1.833
14-FEB-13 | 14-FEB-13 | 1.836
13-FEB-13 | 13-FEB-13 | 1.824
12-FEB-13 | 12-FEB-13 | 1.82
11-FEB-13 | 11-FEB-13 | 1.822
08-FEB-13 | 10-FEB-13 | 1.826
07-FEB-13 | 07-FEB-13 | 1.814
06-FEB-13 | 06-FEB-13 | 1.806
05-FEB-13 | 05-FEB-13 | 1.804
04-FEB-13 | 04-FEB-13 | 1.796
01-FEB-13 | 03-FEB-13 | 1.801
The range on the date columns isn’t always one day (weekends).
I can retrieve the value for a single date like this,
select DATE_VALID_FROM, DATE_VALID_TO, VALUE
from MY_TABLE
where DATE_VALID_FROM <= TO_DATE('16-FEB-13', 'dd-MON-yy')
and DATE_VALID_TO >= TO_DATE('16-FEB-13', 'dd-MON-yy')
Is it possible to retrieve the values for multiple random dates in a single query?
e.g. Values for the 1st, 5th, 6th, 11th and 16th Feb.
Producing this result set:
DATE_VALID_FROM | DATE_VALID_TO | VALUE
15-FEB-13 | 17-FEB-13 | 1.833
11-FEB-13 | 11-FEB-13 | 1.822
06-FEB-13 | 06-FEB-13 | 1.806
05-FEB-13 | 05-FEB-13 | 1.804
01-FEB-13 | 03-FEB-13 | 1.801
Try:
select DATE_VALID_FROM, DATE_VALID_TO, VALUE
from MY_TABLE M
JOIN (SELECT TO_DATE('01-FEB-2013', 'DD-MON-YYYY') DATE_PARAM FROM DUAL UNION ALL
SELECT TO_DATE('05-FEB-2013', 'DD-MON-YYYY') DATE_PARAM FROM DUAL UNION ALL
SELECT TO_DATE('06-FEB-2013', 'DD-MON-YYYY') DATE_PARAM FROM DUAL UNION ALL
SELECT TO_DATE('11-FEB-2013', 'DD-MON-YYYY') DATE_PARAM FROM DUAL UNION ALL
SELECT TO_DATE('16-FEB-2013', 'DD-MON-YYYY') DATE_PARAM FROM DUAL) D
ON M.DATE_VALID_FROM <= D.DATE_PARAM and M.DATE_VALID_TO >= D.DATE_PARAM
SQLFiddle here
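The same idea, joining the ranges to a small derived table of dates, can be sketched from application code with bound parameters. This example uses SQLite through Python's sqlite3 module rather than Oracle, stores dates as ISO text, and the lower-case table and column names and the params CTE are assumptions of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (date_valid_from TEXT, date_valid_to TEXT, value REAL)"
)
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    ("2013-02-15", "2013-02-17", 1.833),
    ("2013-02-14", "2013-02-14", 1.836),
    ("2013-02-13", "2013-02-13", 1.824),
    ("2013-02-12", "2013-02-12", 1.82),
    ("2013-02-11", "2013-02-11", 1.822),
    ("2013-02-08", "2013-02-10", 1.826),
    ("2013-02-07", "2013-02-07", 1.814),
    ("2013-02-06", "2013-02-06", 1.806),
    ("2013-02-05", "2013-02-05", 1.804),
    ("2013-02-04", "2013-02-04", 1.796),
    ("2013-02-01", "2013-02-03", 1.801),
])

wanted = ["2013-02-01", "2013-02-05", "2013-02-06", "2013-02-11", "2013-02-16"]
# One "(?)" row per date, so the list can be joined like a table.
placeholders = ", ".join("(?)" for _ in wanted)
rows = conn.execute(f"""
    WITH params(date_param) AS (VALUES {placeholders})
    SELECT m.date_valid_from, m.date_valid_to, m.value
    FROM my_table m
    JOIN params d
      ON m.date_valid_from <= d.date_param AND m.date_valid_to >= d.date_param
    ORDER BY m.date_valid_from DESC
""", wanted).fetchall()
for row in rows:
    print(row)
```

Only the placeholders are interpolated into the SQL text; the date values themselves are passed as bound parameters.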
You can use a collection for this:
SQL> create type mydatetab as table of date;
2 /
Type created.
SQL> with dates as (select /*+ cardinality(t, 5) */ t.column_value thedate
2 from table(mydatetab(TO_DATE('16-FEB-13', 'dd-mon-rr'),
3 TO_DATE('13-FEB-13', 'dd-mon-rr'))) t)
4 select DATE_VALID_FROM, DATE_VALID_TO, VALUE
5 from MY_TABLE, dates
6 where dates.thedate between DATE_VALID_FROM and DATE_VALID_TO;
DATE_VALI DATE_VALI VALUE
--------- --------- ----------
13-FEB-13 13-FEB-13 1.824
15-FEB-13 17-FEB-13 1.833
If you don't have privileges to create one (i.e. this is just an ad hoc thing), there may be some public ones you can use. Check select * from all_coll_types where elem_type_name = 'DATE' for these.
P.S. You should always specify the format when you convert strings to dates, i.e. don't do:
TO_DATE('16-FEB-13')
but rather:
TO_DATE('16-FEB-13', 'dd-MON-rr')