GBQ - Merge cells of a column across rows - sql

I have a data table that looks like this
start_date | end_date | string
date x | date y | apple
date x | date y | orange
date z | date y | grape
I want to merge the string column when start_date and end_date are the same across rows. The output would look like this:
start_date | end_date | string
date x | date y | apple/orange
date z | date y | grape
I am using Google BigQuery SQL. Any help would be greatly appreciated.
Thank you.

Below is for BigQuery Standard SQL
#standardSQL
SELECT start_date, end_date, STRING_AGG(str, '/') str
FROM `project.dataset.table`
GROUP BY 1, 2

You want GROUP_CONCAT (BigQuery legacy SQL), with '/' passed as the separator:
select start_date, end_date, GROUP_CONCAT(string, '/') as string
from table t
group by start_date, end_date;
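For anyone who wants to try the aggregation without a BigQuery project, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. SQLite is only a stand-in (an assumption, not what the question uses): its group_concat(value, separator) plays the role of STRING_AGG, and the table name fruit is made up for the example.

```python
import sqlite3

# In-memory SQLite database, used purely as a stand-in for BigQuery.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruit (start_date TEXT, end_date TEXT, str TEXT)")
conn.executemany(
    "INSERT INTO fruit VALUES (?, ?, ?)",
    [
        ("date x", "date y", "apple"),
        ("date x", "date y", "orange"),
        ("date z", "date y", "grape"),
    ],
)

# group_concat(value, separator) is SQLite's counterpart of STRING_AGG.
rows = conn.execute(
    """
    SELECT start_date, end_date, group_concat(str, '/') AS str
    FROM fruit
    GROUP BY start_date, end_date
    ORDER BY start_date
    """
).fetchall()

for row in rows:
    print(row)
```

The order of the pieces inside each merged string is not guaranteed here; if it matters, the aggregation needs an explicit ordering.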

Related

Creating daterange in SQL

Given the date range:
'20180504' and '20180425'
I want to write a query that would return the following dates
'20180504'
'20180503'
'20180502'
'20180501'
'20180430'
'20180429'
'20180428'
'20180427'
'20180426'
'20180425'
Could anyone suggest what would be the best way to generate dates like these? The date format should be same as above, because I would use it to extract data from another table. Thanks!
You can use a hierarchical query:
Query 1:
SELECT TO_CHAR( DATE '2018-04-25' + LEVEL - 1, 'YYYYMMDD' ) AS value
FROM DUAL
CONNECT BY DATE '2018-04-25' + LEVEL - 1 <= DATE '2018-05-04'
Results:
| VALUE |
|----------|
| 20180425 |
| 20180426 |
| 20180427 |
| 20180428 |
| 20180429 |
| 20180430 |
| 20180501 |
| 20180502 |
| 20180503 |
| 20180504 |
You seem to want a string output, so you can generate the dates and then convert to strings:
with dates as (
select date '2018-04-25' + level - 1 as dte
from dual
connect by date '2018-04-25' + level - 1 <= date '2018-05-04'
)
select to_char(dte, 'YYYYMMDD')
from dates;
Here is a rextester.
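CONNECT BY is Oracle syntax. As a hedged sketch of the same idea on an engine without it, the snippet below uses a recursive CTE in SQLite (an assumption, chosen only because Python ships with it) and returns the strings in descending order, as listed in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Recursive CTE: start at the lower bound and keep adding one day
# until the upper bound has been produced.
query = """
WITH RECURSIVE dates(d) AS (
    SELECT date('2018-04-25')
    UNION ALL
    SELECT date(d, '+1 day') FROM dates WHERE d < date('2018-05-04')
)
SELECT strftime('%Y%m%d', d) FROM dates ORDER BY d DESC
"""

values = [row[0] for row in conn.execute(query)]
print(values)
```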

How do I apply a function to each subgroup of a table in SQL

I want to find the minimum value of a column in a certain date range of a table.
So let's say I have a table like the following:
Date | Value
---------------
01-26 | 2
01-26 | 1
01-27 | 2
01-27 | 4
01-28 | 3
01-28 | 5
How can I apply the MIN() function to the subgroup of the Value column so that the result might be
Date | MIN(Value)
---------------
01-26 | 1
01-27 | 2
01-28 | 3
I thought about GROUP BY or something similar, but couldn't figure out how to get the results into a table.
Using UNION and JOIN doesn't scale well, because the query could cover a date range of a month.
Group by should work:
Select date, min( value )
From table1
Group by date
Maybe too simple, but seems like this would work
Select Min(col1), datecol from yourtable group by datecol;
HTH
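To see the GROUP BY approach produce exactly the table from the question, here is a small runnable sketch using SQLite from Python. The engine is an assumption (the question does not name one), and the WHERE clause is added only to illustrate restricting the date range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        ("01-26", 2), ("01-26", 1),
        ("01-27", 2), ("01-27", 4),
        ("01-28", 3), ("01-28", 5),
    ],
)

# One output row per date; MIN() is evaluated separately inside each group.
result = conn.execute(
    """
    SELECT date, MIN(value)
    FROM table1
    WHERE date BETWEEN '01-26' AND '01-28'
    GROUP BY date
    ORDER BY date
    """
).fetchall()

print(result)
```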

Converting date into integer (1 to 365)

I have no idea if there is a function in postgres to do that, but how can I convert a date (yyyy-mm-dd) into the numeric correspondent in SQL?
E.g. table input
id | date
------+-------------
1 | 2013-01-01
2 | 2013-01-02
3 | 2013-02-01
Output
id | date
------+-------------
1 | 1
2 | 2
3 | 32
You are looking for the extract() function with the doy ("day of year") argument, not day (which returns the day of the month):
select id, extract(doy from "date")
from the_table;
According to this documentation, another option is to do something like:
SELECT id, DATE_PART('day', date - date_trunc('year', date)) + 1 as date
from table_name;
Here you can see a sql-fiddle.
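As a quick way to check the expected numbers outside Postgres, the sketch below computes the day of year twice: once in SQLite with strftime('%j', ...) and once with Python's datetime. SQLite is only a stand-in here (an assumption), and the column alias doy is made up for the example.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?)",
    [(1, "2013-01-01"), (2, "2013-01-02"), (3, "2013-02-01")],
)

# '%j' yields the zero-padded day of year ('001'..'366'); cast it to a number.
rows = conn.execute(
    "SELECT id, CAST(strftime('%j', date) AS INTEGER) AS doy "
    "FROM the_table ORDER BY id"
).fetchall()

# Cross-check against Python's own day-of-year calculation.
check = [
    date.fromisoformat(s).timetuple().tm_yday
    for s in ("2013-01-01", "2013-01-02", "2013-02-01")
]

print(rows, check)
```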

Populating a table with all dates in a given range in Google BigQuery

Is there any convenient way to populate a table with all dates in a given range in Google BigQuery? What I need are all dates from 2015-06-01 till CURRENT_DATE(), so something like this:
+------------+
| date |
+------------+
| 2015-06-01 |
| 2015-06-02 |
| 2015-06-03 |
| ... |
| 2016-07-11 |
+------------+
Optimally, the next step would be to also get all weeks between the two dates, i.e.:
+---------+
| week |
+---------+
| 2015-23 |
| 2015-24 |
| 2015-25 |
| ... |
| 2016-28 |
+---------+
I've been fiddling around with the following answers I found, but I can't get them to work, mostly because core functions aren't supported and I can't find proper ways to replace them.
Easiest way to populate a temp table with dates between and including 2 date parameters
Generate Dates between date ranges
Your help is very much appreciated!
Best,
Max
Mikhail's answer works perfectly for BigQuery's legacy SQL syntax. This solution is slightly easier if you're using the standard SQL syntax.
BigQuery standard SQL has a built-in function, GENERATE_DATE_ARRAY, for creating an array from a date range. It takes a start date, an end date, and an INTERVAL. For example:
SELECT day
FROM UNNEST(
GENERATE_DATE_ARRAY(DATE('2015-06-01'), CURRENT_DATE(), INTERVAL 1 DAY)
) AS day
If you wanted the week and year you could use
SELECT EXTRACT(YEAR FROM day), EXTRACT(WEEK FROM day)
FROM UNNEST(
GENERATE_DATE_ARRAY(DATE('2015-06-01'), CURRENT_DATE(), INTERVAL 1 WEEK)
) AS day
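To sanity-check what the two queries above should return, here is a rough Python sketch of the same logic. It assumes ISO week numbering, because that is what the sample output in the question matches (2015-23 for 2015-06-01), and it uses the fixed end date 2016-07-11 from the question instead of the current date. The helper name date_range is made up for the example.

```python
from datetime import date, timedelta


def date_range(start, end):
    """Return every date from start to end, both inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


days = date_range(date(2015, 6, 1), date(2016, 7, 11))

# Collapse the daily list into distinct 'year-week' labels (ISO numbering).
weeks = []
for d in days:
    iso_year, iso_week, _ = d.isocalendar()
    label = f"{iso_year}-{iso_week:02d}"
    if not weeks or weeks[-1] != label:
        weeks.append(label)

print(days[0], days[-1])
print(weeks[0], weeks[-1])
```

If the week numbers coming back from the SQL differ by one from these labels, the week-numbering convention (ISO versus Sunday-based) is the first thing to check.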
all dates from 2015-06-01 till CURRENT_DATE()
SELECT DATE(DATE_ADD(TIMESTAMP("2015-06-01"), pos - 1, "DAY")) AS DAY
FROM (
SELECT ROW_NUMBER() OVER() AS pos, *
FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP("2015-06-01")), '.'),'') AS h
FROM (SELECT NULL)),h
)))
all weeks between the two dates
SELECT YEAR(DAY) AS y, WEEK(DAY) AS w
FROM (
SELECT DATE(DATE_ADD(TIMESTAMP("2015-06-01"), pos - 1, "DAY")) AS DAY
FROM (
SELECT ROW_NUMBER() OVER() AS pos, *
FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP("2015-06-01")), '.'),'') AS h
FROM (SELECT NULL)),h
)))
)
GROUP BY y, w

SQL : Getting data as well as count from a single table for a month

I am working on a SQL query where I have a rather huge data-set. I have the table data as mentioned below.
Existing table :
+---------+------+------------+
| id(!PK) | name | Date       |
+---------+------+------------+
| 1       | abc  | 21.03.2015 |
| 1       | def  | 22.04.2015 |
| 1       | ajk  | 22.03.2015 |
| 3       | ghi  | 23.03.2015 |
+---------+------+------------+
What I am looking for is an insert query into an empty table. The condition is like this :
Insert in an empty table where id is common, count of names common to an id for march.
Output for above table would be like
+---------+-------+------------+
| some_id | count | Date       |
+---------+-------+------------+
| 1       | 2     | 21.03.2015 |
| 3       | 1     | 23.03.2015 |
+---------+-------+------------+
All I have is :
insert into empty_table values (some_id,count,date)
select id,count(*),date from existing_table where id=1;
Unfortunately above basic query doesn't suit this complex requirement.
Any suggestions or ideas? Thank you.
Updated query
insert into empty_table
select id,count(*),min(date)
from existing_table where
date >= '2015-03-01' and
date < '2015-04-01'
group by id;
Seems you want the number of unique names per id:
insert into empty_table
select id
,count(distinct name)
,min(date)
from existing_table
where date >= DATE '2015-03-01'
and date < DATE '2015-04-01'
group by id;
If I understand correctly, you just need a date condition:
insert into empty_table(some_id, count, date)
select id, count(*), min(date)
from existing_table
where id = 1 and
date >= date '2015-03-01' and
date < date '2015-04-01'
group by id;
Note: the list after the table name contains the columns being inserted. There is no values keyword when using insert . . . select.
insert into empty_table
select id, count(*) as mycnt, min(date) as mydate
from existing_table
group by id, year_month(date);
Here year_month is a placeholder: replace it with whatever function your RDBMS provides for extracting the year and month from a date. You did not say which RDBMS you use, and date handling varies widely between them.
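As one concrete illustration of the year_month placeholder, the sketch below uses SQLite, where strftime('%Y-%m', date) returns the year and month. The engine, the ISO-formatted date strings, and the column name cnt (used instead of count) are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE existing_table (id INTEGER, name TEXT, date TEXT)")
conn.execute("CREATE TABLE empty_table (some_id INTEGER, cnt INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO existing_table VALUES (?, ?, ?)",
    [
        (1, "abc", "2015-03-21"),
        (1, "def", "2015-04-22"),
        (1, "ajk", "2015-03-22"),
        (3, "ghi", "2015-03-23"),
    ],
)

# INSERT ... SELECT takes no VALUES keyword; the column list names the targets.
conn.execute(
    """
    INSERT INTO empty_table (some_id, cnt, date)
    SELECT id, COUNT(*), MIN(date)
    FROM existing_table
    WHERE date >= '2015-03-01' AND date < '2015-04-01'
    GROUP BY id, strftime('%Y-%m', date)
    """
)

march = conn.execute(
    "SELECT some_id, cnt, date FROM empty_table ORDER BY some_id"
).fetchall()
print(march)
```

Dropping the WHERE clause would give one row per id per month instead of March only.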