Flatten BigQuery Rows - sql

Please help me with this task. I have a Google BigQuery table like this:
| name | startDate | endDate |
| Bob | 2018-01-01 | 2018-01-01 |
| Nick | 2017-12-29 | 2017-12-31 |
and as a result I need to get something like this:
| name | date |
| Bob | 2018-01-01 |
| Nick | 2017-12-29 |
| Nick | 2017-12-30 |
| Nick | 2017-12-31 |
Is it possible? Thank you in advance.

WITH CTE as (
SELECT 'bob' name, date('2018-01-01') startDate, date('2018-01-01') endDate
UNION ALL SELECT 'Nick', date '2017-12-29' startDate, date('2017-12-31') endDate
),
CTE2 AS (
SELECT name, GENERATE_DATE_ARRAY(startDate, endDate, INTERVAL 1 DAY) AS date
FROM CTE
)
SELECT name, date
FROM CTE2,
UNNEST(date) as date

Or just simply
#standardSQL
SELECT name, date
FROM `project.dataset.table`,
UNNEST(GENERATE_DATE_ARRAY(startDate, endDate)) date

You may make use of a calendar table here:
WITH dates AS (
SELECT DATE '2017-12-29' AS date_val UNION ALL
SELECT DATE '2017-12-30' UNION ALL
SELECT DATE '2017-12-31' UNION ALL
SELECT DATE '2018-01-01'
-- and maybe other dates
)
SELECT
t2.name,
t1.date_val
FROM dates t1
INNER JOIN yourTable t2
ON t1.date_val BETWEEN t2.startDate AND t2.endDate
ORDER BY
t2.name,
t1.date_val;
If your version of BigQuery does not support CTEs, you can inline the CTE as a subquery; that is, replace dates with the body of the CTE itself.
In practice, you might want to generate a date series, or possibly maintain a dedicated calendar table in your database. The above just shows what the query itself might look like.
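For a quick local check of the expected output, the range expansion can be sketched in Python with the built-in sqlite3 module. This is only an illustration, not BigQuery: SQLite has no GENERATE_DATE_ARRAY, so a recursive CTE stands in for it, and the table name ranges and the function expand_ranges are made up for the example.

```python
import sqlite3

def expand_ranges(rows):
    """Return one (name, date) row per day in each [startDate, endDate] range."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table; dates are stored as ISO-8601 text so they sort correctly.
    con.execute("CREATE TABLE ranges (name TEXT, startDate TEXT, endDate TEXT)")
    con.executemany("INSERT INTO ranges VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        WITH RECURSIVE expanded(name, day, endDate) AS (
            SELECT name, startDate, endDate FROM ranges
            UNION ALL
            -- add one day at a time until the end of the range is reached
            SELECT name, date(day, '+1 day'), endDate
            FROM expanded
            WHERE day < endDate
        )
        SELECT name, day FROM expanded ORDER BY name, day
        """
    ).fetchall()
    con.close()
    return result

rows = [("Bob", "2018-01-01", "2018-01-01"), ("Nick", "2017-12-29", "2017-12-31")]
for name, day in expand_ranges(rows):
    print(name, day)
```

This should print one line for Bob and three for Nick, matching the desired result in the question.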

Related

Create all months list from a date column in ORACLE SQL

CREATE TABLE dates(
alldates date);
INSERT INTO dates (alldates) VALUES ('1-May-2017');
INSERT INTO dates (alldates) VALUES ('1-Mar-2018');
I want to generate all months beginning between these two dates. I am very new to Oracle SQL. My solution is below, but it is not working properly.
WITH t1(test) AS (
SELECT MIN(alldates) as test
FROM dates
UNION ALL
SELECT ADD_MONTHS(test,1) as test
FROM t1
WHERE t1.test<= (SELECT MAX(alldates) FROM date)
)
SELECT * FROM t1
The result I want should look like
Test
2017-02-01
2017-03-01
...
2017-12-01
2018-01-01
2018-02-01
2018-03-01
You made a typo and wrote date instead of dates. You also need to make a second change: use ADD_MONTHS in the recursive query's WHERE clause, or you will generate one row too many.
WITH t1(test) AS (
SELECT MIN(alldates)
FROM dates
UNION ALL
SELECT ADD_MONTHS(test,1)
FROM t1
WHERE ADD_MONTHS(test,1) <= (SELECT MAX(alldates) FROM dates)
)
SELECT * FROM t1
Which outputs:
| TEST |
| :-------- |
| 01-MAY-17 |
| 01-JUN-17 |
| 01-JUL-17 |
| 01-AUG-17 |
| 01-SEP-17 |
| 01-OCT-17 |
| 01-NOV-17 |
| 01-DEC-17 |
| 01-JAN-18 |
| 01-FEB-18 |
| 01-MAR-18 |
However, a more efficient query would be to get the minimum and maximum values in the same query and then iterate using these pre-found bounds:
WITH t1(min_date, max_date) AS (
SELECT MIN(alldates),
MAX(alldates)
FROM dates
UNION ALL
SELECT ADD_MONTHS(min_date,1),
max_date
FROM t1
WHERE ADD_MONTHS(min_date,1) <= max_date
)
SELECT min_date AS month
FROM t1
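To see why the WHERE clause has to test the next month rather than the current one, here is a small sketch that runs both variants against SQLite through Python's sqlite3 module. It is not Oracle: date(test, '+1 month') stands in for ADD_MONTHS, dates are stored as ISO text, and the helper name months is invented for the example.

```python
import sqlite3

def months(where_clause):
    """Generate month starts from the dates table using the given recursive WHERE clause."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE dates (alldates TEXT)")
    con.executemany("INSERT INTO dates VALUES (?)", [("2017-05-01",), ("2018-03-01",)])
    sql = f"""
        WITH RECURSIVE t1(test) AS (
            SELECT MIN(alldates) FROM dates
            UNION ALL
            SELECT date(test, '+1 month') FROM t1
            WHERE {where_clause}
        )
        SELECT test FROM t1
    """
    result = [row[0] for row in con.execute(sql)]
    con.close()
    return result

# Testing the current row lets one extra month through.
too_many = months("test <= (SELECT MAX(alldates) FROM dates)")
# Testing the next row stops exactly at the maximum.
exact = months("date(test, '+1 month') <= (SELECT MAX(alldates) FROM dates)")
print(len(too_many), too_many[-1])
print(len(exact), exact[-1])
```

The first variant ends one month after the maximum date, the second ends on it.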
Update
Oracle 11gR2 has bugs handling recursive date queries; this is fixed in later Oracle versions but if you want to use SQL Fiddle and Oracle 11gR2 then you need to iterate over a numeric value and not a date. Something like this:
Oracle 11g R2 Schema Setup:
CREATE TABLE dates(
alldates date);
INSERT INTO dates (alldates) VALUES ('1-May-2017');
INSERT INTO dates (alldates) VALUES ('1-Mar-2018');
Query 1:
WITH t1(min_date, month, total_months) AS (
SELECT MIN(alldates),
0,
MONTHS_BETWEEN(MAX(alldates),MIN(alldates))
FROM dates
UNION ALL
SELECT min_date,
month+1,
total_months
FROM t1
WHERE month+1<=total_months
)
SELECT ADD_MONTHS(min_date,month) AS month
FROM t1
Results:
| MONTH |
|----------------------|
| 2017-05-01T00:00:00Z |
| 2017-06-01T00:00:00Z |
| 2017-07-01T00:00:00Z |
| 2017-08-01T00:00:00Z |
| 2017-09-01T00:00:00Z |
| 2017-10-01T00:00:00Z |
| 2017-11-01T00:00:00Z |
| 2017-12-01T00:00:00Z |
| 2018-01-01T00:00:00Z |
| 2018-02-01T00:00:00Z |
| 2018-03-01T00:00:00Z |
You seem to want a recursive CTE. That syntax would be:
WITH CTE(min_date, max_date) as (
SELECT MIN(alldates) as min_date, MAX(alldates) as max_date
FROM dates
UNION ALL
SELECT add_months(min_date, 1), max_date
FROM CTE
WHERE min_date < max_date
)
SELECT min_date
FROM CTE;
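You made a typo: date instead of dates. Note that comparing the current row with <= would also emit one month past the maximum, so the comparison below uses < instead:
WITH t1(test) AS (
SELECT MIN(alldates) as test
FROM dates
UNION ALL
SELECT ADD_MONTHS(test,1) as test
FROM t1
WHERE t1.test < (SELECT MAX(alldates) FROM dates) -- fixed here
)
SELECT * FROM t1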

Count distinct dates within timestamp by Month SQL

I would like to count the number of distinct dates within each month.
I have a data set that looks like this:
TIMESTAMP
------------------
2017-10-25 14:39:51
2017-10-25 15:00:51
2017-11-10 02:39:42
2018-09-24 14:39:55
2018-09-25 13:25:01
2019-02-12 12:23:44
...
So my expected output would be:
year_month | count
2017-10 | 2
2018-09 | 2
2019-02 | 1
I have tried the following code so far, but it is returning incorrect results:
WITH F AS(
SELECT concat(YEAR(TIMESTAMP), MONTH(TIMESTAMP)) AS year_month
FROM tbl
WHERE TYPE = 'Site'
AND TO_SITE = 'location'
)
SELECT count(year_month), year_month
FROM F
GROUP BY year_month
I do not need to worry about the time of day. I just want to count the distinct days in each month. Thank you in advance for your help.
In SQL Server, I would recommend one of these approaches:
select year(timestamp), month(timestamp), count(distinct convert(date, timestamp))
from t
group by year(timestamp), month(timestamp);
Or:
select format(timestamp, 'yyyy-MM'), count(distinct convert(date, timestamp))
from t
group by format(timestamp, 'yyyy-MM');
I see no need for subqueries or CTEs.
From your code I assume you are using SQL Server, in which case you can do something like this:
with cte as
(
select
left(convert(varchar, myCol,112),6) as yyyy_mm,
convert(date, myCol) as date
from myTable
)
select
yyyy_mm,
count(distinct date) as count
from cte
group by
yyyy_mm
output:
| yyyy_mm | count |
*-----------------*
| 201710 | 1 |
| 201711 | 1 |
| 201809 | 2 |
| 201902 | 1 |
Using date_trunc() in PostgreSQL it can simply be:
SELECT date_trunc('month', timestamp)
, count(DISTINCT date_trunc('day', timestamp))
FROM tbl
GROUP BY 1;
Various performance optimizations are possible, depending on the details of the setup.
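The same idea (group by month, count distinct calendar days) can be tried locally with Python's sqlite3 module. This is a sketch under assumptions: it uses SQLite's strftime and date functions rather than the SQL Server or PostgreSQL functions shown above, the column is named ts here, and the TYPE and TO_SITE filters from the question are left out.

```python
import sqlite3

# Sample timestamps taken from the question.
timestamps = [
    "2017-10-25 14:39:51",
    "2017-10-25 15:00:51",
    "2017-11-10 02:39:42",
    "2018-09-24 14:39:55",
    "2018-09-25 13:25:01",
    "2019-02-12 12:23:44",
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (ts TEXT)")
con.executemany("INSERT INTO tbl VALUES (?)", [(t,) for t in timestamps])
counts = con.execute(
    """
    SELECT strftime('%Y-%m', ts) AS year_month,
           COUNT(DISTINCT date(ts)) AS day_count  -- date() drops the time of day
    FROM tbl
    GROUP BY year_month
    ORDER BY year_month
    """
).fetchall()
con.close()

for year_month, day_count in counts:
    print(year_month, day_count)
```

The two entries on 2017-10-25 fall on the same day, so October 2017 counts as one distinct day.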

Get Max And Min dates for consecutive values in T-SQL

I have a log table like the one below and want to simplify it by getting the min start date and max end date for consecutive Status values for each Id. I tried many window function combinations, but no luck.
This is what I have:
This is what I want to see:
This is a typical gaps-and-islands problem. You want to aggregate groups of consecutive records that have the same Id and Status.
No need for recursion, here is one way to solve it using window functions:
select
Id,
Status,
min(StartDate) StartDate,
max(EndDate) EndDate
from (
select
t.*,
row_number() over(partition by id order by StartDate) rn1,
row_number() over(partition by id, status order by StartDate) rn2
from mytable t
) t
group by
Id,
Status,
rn1 - rn2
order by Id, min(StartDate)
The query works by ranking records over two different partitions (by Id, and by Id and Status). The difference between the ranks gives you the group each record belongs to. You can run the subquery independently to see what it returns and understand the logic.
Demo on DB Fiddle:
Id | Status | StartDate | EndDate
-: | :----- | :------------------ | :------------------
1 | B | 07/02/2019 00:00:00 | 18/02/2019 00:00:00
1 | C | 18/02/2019 00:00:00 | 10/03/2019 00:00:00
1 | B | 10/03/2019 00:00:00 | 01/04/2019 00:00:00
2 | A | 05/02/2019 00:00:00 | 22/04/2019 00:00:00
2 | D | 22/04/2019 00:00:00 | 05/05/2019 00:00:00
2 | A | 05/05/2019 00:00:00 | 30/06/2019 00:00:00
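To make the row-number difference less magical, here is a plain Python sketch that computes the same two counters by hand and groups on their difference. The sample rows are hypothetical (the original table from the question is not available), and the function name islands is invented for the example.

```python
from collections import defaultdict

# Hypothetical sample rows: (id, status, start, end) with ISO dates.
rows = [
    (1, "B", "2019-02-07", "2019-02-10"),
    (1, "B", "2019-02-10", "2019-02-18"),
    (1, "C", "2019-02-18", "2019-03-10"),
    (1, "B", "2019-03-10", "2019-04-01"),
]

def islands(rows):
    """Merge consecutive rows that share the same id and status."""
    rn1 = defaultdict(int)  # row number per id
    rn2 = defaultdict(int)  # row number per (id, status)
    groups = {}             # (id, status, rn1 - rn2) -> [min start, max end]
    for id_, status, start, end in sorted(rows, key=lambda r: (r[0], r[2])):
        rn1[id_] += 1
        rn2[(id_, status)] += 1
        # The difference stays constant while the status does not change.
        key = (id_, status, rn1[id_] - rn2[(id_, status)])
        if key in groups:
            groups[key][0] = min(groups[key][0], start)
            groups[key][1] = max(groups[key][1], end)
        else:
            groups[key] = [start, end]
    merged = [(k[0], k[1], v[0], v[1]) for k, v in groups.items()]
    return sorted(merged, key=lambda r: (r[0], r[2]))

for row in islands(rows):
    print(row)
```

The two leading B rows collapse into one range, while the later B row stays separate because the C row in between changes the difference.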
Try the following query. First order the data by StartDate and generate a sequence (rid). Then use the recursive CTE to get the first row (rid=1) for each group (id, status), and recursively get the next row and compare the start/end dates.
;WITH cte_r(id,[Status],StartDate,EndDate,rid)
AS
(
SELECT id,[Status],StartDate,EndDate, ROW_NUMBER() OVER(PARTITION BY Id,[Status] ORDER BY StartDate) AS rid
FROM log_table
),
cte_range(id,[Status],StartDate,EndDate,rid)
AS
(
SELECT id,[Status],StartDate,EndDate,rid
FROM cte_r
WHERE rid=1
UNION ALL
SELECT p.id, p.[Status], CASE WHEN c.StartDate<p.EndDate THEN p.StartDate ELSE c.StartDate END AS StartDate, c.EndDate,c.rid
FROM cte_range p
INNER JOIN cte_r c
ON p.id=c.id
AND p.[Status]=c.[Status]
AND p.rid+1=c.rid
)
SELECT id,[Status],StartDate,MAX(EndDate) AS EndDate FROM cte_range GROUP BY id,[Status],StartDate ;

SQL Find Last Entry Closest to a Date

I am trying to filter the last entry in a table closest to a defined date and I am having difficulties. Any input is greatly appreciated. Thanks! I am running Microsoft SQL Server 2008.
Table:
code | account | date | amount
1 | 1234 | 2016-02-28 | 500
2 | 1234 | 2016-03-01 | 650
3 | 1234 | 2016-03-05 | 842
4 | 7890 | 2016-02-28 | 500
5 | 7890 | 2016-03-30 | 550
I want to select only entries with a date closest to March 31 ('2016-03-31'). In this example, the entry closest to 2016-03-31 for account 1234 is entry #3 and the entry closest to 2016-03-31 for account 7890 is entry #5. In other words, I want the last entry for all accounts equal to or before a date.
3 | 1234 | 2016-03-05 | 842
5 | 7890 | 2016-03-30 | 550
Most DBMSes (including MS SQL Server) support Analytical Functions:
select *
from
(
select *,
row_number() -- create a ranking
over (partition by account -- for each account
order by date desc) as rn -- based on descending dates
from tab
where date <= '2016-03-31' -- Standard SQL would write date '2016-03-31', which SQL Server does not accept
) dt
where rn = 1 -- return the row with the "closest" date
Since no DBMS is specified, here's a kind of hacky way to do this in SQL Server. It grabs the record just before and just after the specified date:
select * from (
select top(1) * FROM mytable
where date >= '2016-03-31' order by date asc
) t1
union
select * from (
select top(1) * FROM mytable
where date <= '2016-03-31' order by date desc
) t2
This should do what you want, and it should be simple enough not to need further explanation:
select t.*
from your_table t
join (
select account, max(date) as date
from your_table
where date <= '2016-03-31'
group by account
) as subquery on t.account = subquery.account and t.date = subquery.date
Edit: for SQL Server it might be better to use an analytical function (like row_number)
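As a quick sanity check of the join-on-MAX(date) approach, the sample data from the question can be loaded into SQLite via Python's sqlite3 module. This is a sketch, not SQL Server: dates are stored as ISO text, and the function name latest_on_or_before is made up for the example.

```python
import sqlite3

# Sample rows from the question: (code, account, date, amount).
entries = [
    (1, 1234, "2016-02-28", 500),
    (2, 1234, "2016-03-01", 650),
    (3, 1234, "2016-03-05", 842),
    (4, 7890, "2016-02-28", 500),
    (5, 7890, "2016-03-30", 550),
]

def latest_on_or_before(cutoff):
    """Return the last entry per account dated on or before the cutoff."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE your_table (code INTEGER, account INTEGER, date TEXT, amount INTEGER)"
    )
    con.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", entries)
    result = con.execute(
        """
        SELECT t.code, t.account, t.date, t.amount
        FROM your_table t
        JOIN (
            SELECT account, MAX(date) AS date
            FROM your_table
            WHERE date <= ?
            GROUP BY account
        ) AS latest ON t.account = latest.account AND t.date = latest.date
        ORDER BY t.account
        """,
        (cutoff,),
    ).fetchall()
    con.close()
    return result

for row in latest_on_or_before("2016-03-31"):
    print(row)
```

With the cutoff 2016-03-31 this returns entries 3 and 5, as the question expects. Note that if an account has two rows on the same latest date, this join returns both.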

How can I join a table to get values similar to the Excel VLOOKUP-Range-Function?

I'm using HP Vertica to write my queries. I want to select some data which should look like Excel would do it when you use the VLOOKUP function with the range flag enabled [ VLOOKUP(A1;B1:C4;2;1) ].
I give you one simple example for better understanding. I have a table showing historic warehouse movements.
stock_history
-------------
|product|location|time_stamp |
|-------|--------|------------|
| A | Loc A | 2015-01-13 |
| A | Loc B | 2015-03-13 |
Product A was moved to location A in January
(and stayed there in February)
and was moved to location B in March
Now I want to see the location of A for every month (let's say only one movement is allowed per month, to make it easier)
It should look like this
|product|location|month |
|-------|--------|---------|
| A | Loc A | 2015-01 |
| A | Loc A | 2015-02 |
| A | Loc B | 2015-03 |
I've generated a table which shows all months:
all_months
----------
|month |
|---------|
| 2015-01 |
| 2015-02 |
| 2015-03 |
Here is a statement I tried
select his.product
, his.location
, mon.month
from stock_history as his
left outer join all_months as mon
on mon.month = to_char( time_stamp, 'YYYY-MM' )
|product |location|month |
|--------|--------|---------|
| A | Loc A | 2015-01 |
| (null) | (null) | 2015-02 |
| A | Loc B | 2015-03 |
How do I get product A in the February line as well, since it was still in location A in February?
Thanks for reading my question. I'm looking forward to your answers ;)
Regards,
Felix
Here you go!
I have also added an example with extra months. It makes use of a recursive CTE.
I tested this with Oracle; it should work with Vertica also.
CREATE TABLE A
(PRODUCT CHAR(1),LOCATION VARCHAR(10),MONTHS VARCHAR(10));
INSERT INTO A (PRODUCT,LOCATION,MONTHS)
SELECT 'A','LOC A','2015-01' FROM DUAL
UNION
SELECT 'A','LOC B','2015-03' FROM DUAL;
CREATE TABLE MONTHS
(MON VARCHAR(10));
INSERT INTO MONTHS(MON)
SELECT '2015-01' FROM DUAL
UNION
SELECT '2015-02' FROM DUAL
UNION
SELECT '2015-03' FROM DUAL
UNION
SELECT '2015-04' FROM DUAL
UNION
SELECT '2015-05' FROM DUAL
UNION
SELECT '2015-06' FROM DUAL;
COMMIT;
WITH CTE (I,PRODUCT,LOCATION,MON) AS
(
SELECT 1 I,BASE.PRODUCT,A.LOCATION,M.MON
FROM
(SELECT DISTINCT PRODUCT FROM A)BASE
CROSS JOIN
MONTHS M
LEFT JOIN A
ON A.MONTHS=M.MON
UNION ALL
SELECT I+1,PRODUCT,COALESCE(LOCATION,LAG(LOCATION)OVER(PARTITION BY PRODUCT ORDER BY MON)) AS LOC,MON
FROM
CTE WHERE I<12
)
SELECT DISTINCT PRODUCT,LOCATION,MON FROM CTE WHERE LOCATION IS NOT NULL
ORDER BY MON
You can generate all the month/product combinations using a cross join. Then use a correlated subquery to get the location from the most recent or current month:
select mon.month, p.product,
(select sh.location
from stock_history sh
where to_char(sh.time_stamp, 'YYYY-MM') <= mon.month and p.product = sh.product
order by sh.time_stamp desc
limit 1
) as location
from (select distinct product from stock_history) p cross join
all_months mon;
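Here is a sketch of the cross join plus correlated subquery idea, run against SQLite through Python's sqlite3 module using the sample data from the question. It is not Vertica: strftime('%Y-%m', ...) stands in for to_char, and the subquery picks the most recent movement at or before each month.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stock_history (product TEXT, location TEXT, time_stamp TEXT)")
con.execute("CREATE TABLE all_months (month TEXT)")
con.executemany(
    "INSERT INTO stock_history VALUES (?, ?, ?)",
    [("A", "Loc A", "2015-01-13"), ("A", "Loc B", "2015-03-13")],
)
con.executemany(
    "INSERT INTO all_months VALUES (?)",
    [("2015-01",), ("2015-02",), ("2015-03",)],
)

locations = con.execute(
    """
    SELECT p.product,
           (SELECT sh.location
            FROM stock_history sh
            WHERE sh.product = p.product
              AND strftime('%Y-%m', sh.time_stamp) <= mon.month
            ORDER BY sh.time_stamp DESC  -- latest movement up to this month
            LIMIT 1) AS location,
           mon.month
    FROM (SELECT DISTINCT product FROM stock_history) p
    CROSS JOIN all_months mon
    ORDER BY p.product, mon.month
    """
).fetchall()
con.close()

for product, location, month in locations:
    print(product, location, month)
```

February has no movement of its own, so it picks up the January location, which is the range-lookup behaviour the question asks for.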