Converting date into integer (1 to 365) - sql

I have no idea if there is a function in postgres to do that, but how can I convert a date (yyyy-mm-dd) into the numeric correspondent in SQL?
E.g. table input
id | date
------+-------------
1 | 2013-01-01
2 | 2013-01-02
3 | 2013-02-01
Output
id | date
------+-------------
1 | 1
2 | 2
3 | 32

You are looking for the extract() function with the doy("Day Of Year") argument, not day ("day of the week"):
select id, extract(doy from "date")
from the_table;

Acording to this documentation an other option is to do something like:
SELECT id, DATE_PART('day', date - date_trunc('year', date)) + 1 as date
from table_name;
Here you can see a sql-fiddle.

Related

how to get month difference in 2 columns with date value in SQL developer?

I want to get the full month difference between DT_1 and DT_2. I tried the datediff, datepart and month, but it didn't work... I use SQL developer, and it shows:
ORA-00904: "DATEPART": invalid identifier
00904. 00000 - "%s: invalid identifier"
Data :
+---------+-----------+---------------+
| ID | DT_1 | DT_2 |
+---------+-----------+---------------+
| C111111 | 2018/1/1 | 2018/1/1 |
| C111112 | 2017/9/30 | 2018/10/25 |
| C111113 | 2018/10/1 | 2018/10/31 |
| C111114 | 2018/10/6 | 2018/12/14 |
+---------+-----------+---------------+
Expected results :
for C111111, the month difference should be 0 (1-1)
for C111112, the month difference should be 13 (22-9)
for C111113, the month difference should be 0 (10-10)
for C111114, the month difference should be 2 (12-10)
Can anyone help? Thanks.
You may need to see whether DT_1 and DT_2 are date type and then use MONTHS_BETWEEN function to get the date difference in months between two dates. Like Below:
SELECT MONTHS_BETWEEN (TO_DATE('2018/1/1','YYYY/MM/DD'), TO_DATE('2018/1/1','YYYY/MM/DD') ) "Months" FROM DUAL;
OR,
SELECT MONTHS_BETWEEN (TO_DATE(DT_1,'YYYY/MM/DD'), TO_DATE(DT_2,'YYYY/MM/DD') ) "Months" FROM TableName;
You want to count the number of month boundaries between two values. That is not what months_between() does. Instead, convert the values to number of months since some point in time and take the difference.
An easy way to do that uses arithmetic:
select t.*,
( (extract(year from dt_2) * 12 + extract(month from dt_2)) -
(extract(year from dt_1) * 12 + extract(month from dt_1))
) as month_difference
from t;
Alternatively, you can use trunc() with months_between():
select t.*,
months_between(trunc(dt_2, 'MON'), trunc(dt_1, 'MON'))

GBQ - Merge cells of a column across rows

I have a data table that looks like this
start_date | end_date | string
date x | date y | apple
date x | date y | orange
date z | date y | grape
I want to merge the string column if the start_date and end_date are the same across rows. So out put would look like this
start_date | end_date | string
date x | date y | apple/orange
date z | date y | grape
I am using Google big query SQL. Any help would be greatly appreciated.
Thank you.
Below is for BigQuery Standard SQL
#standardSQL
SELECT start_date, end_date, STRING_AGG(str, '/') str
FROM `project.dataset.table`
GROUP BY 1, 2
You want GROUP_CONCAT :
select start_date, end_date, GROUP_CONCAT(string) as string
from table t
group by start_date, end_date;

How to write a SQL statement to sum data using group by the same day of every two neighboring months

I have a data table like this:
datetime data
-----------------------
...
2017/8/24 6.0
2017/8/25 5.0
...
2017/9/24 6.0
2017/9/25 6.2
...
2017/10/24 8.1
2017/10/25 8.2
I want to write a SQL statement to sum the data using group by the 24th of every two neighboring months in certain range of time such as : from 2017/7/20 to 2017/10/25 as above.
How to write this SQL statement? I'm using SQL Server 2008 R2.
The expected results table is like this:
datetime_range data_sum
------------------------------------
...
2017/8/24~2017/9/24 100.9
2017/9/24~2017/10/24 120.2
...
One conceptual way to proceed here is to redefine a "month" as ending on the 24th of each normal month. Using the SQL Server month function, we will assign any date occurring after the 24th as belonging to the next month. Then we can aggregate by the year along with this shifted month to obtain the sum of data.
WITH cte AS (
SELECT
data,
YEAR(datetime) AS year,
CASE WHEN DAY(datetime) > 24
THEN MONTH(datetime) + 1 ELSE MONTH(datetime) END AS month
FROM yourTable
)
SELECT
CONVERT(varchar(4), year) + '/' + CONVERT(varchar(2), month) +
'/25~' +
CONVERT(varchar(4), year) + '/' + CONVERT(varchar(2), (month + 1)) +
'/24' AS datetime_range,
SUM(data) AS data_sum
FROM cte
GROUP BY
year, month;
Note that your suggested ranges seem to include the 24th on both ends, which does not make sense from an accounting point of view. I assume that the month includes and ends on the 24th (i.e. the 25th is the first day of the next accounting period.
Demo
I would suggest dynamically building some date range rows so that you can then join you data to those for aggregation, like this example:
+----+---------------------+---------------------+----------------+
| | period_start_dt | period_end_dt | your_data_here |
+----+---------------------+---------------------+----------------+
| 1 | 24.04.2017 00:00:00 | 24.05.2017 00:00:00 | 1 |
| 2 | 24.05.2017 00:00:00 | 24.06.2017 00:00:00 | 1 |
| 3 | 24.06.2017 00:00:00 | 24.07.2017 00:00:00 | 1 |
| 4 | 24.07.2017 00:00:00 | 24.08.2017 00:00:00 | 1 |
| 5 | 24.08.2017 00:00:00 | 24.09.2017 00:00:00 | 1 |
| 6 | 24.09.2017 00:00:00 | 24.10.2017 00:00:00 | 1 |
| 7 | 24.10.2017 00:00:00 | 24.11.2017 00:00:00 | 1 |
| 8 | 24.11.2017 00:00:00 | 24.12.2017 00:00:00 | 1 |
| 9 | 24.12.2017 00:00:00 | 24.01.2018 00:00:00 | 1 |
| 10 | 24.01.2018 00:00:00 | 24.02.2018 00:00:00 | 1 |
| 11 | 24.02.2018 00:00:00 | 24.03.2018 00:00:00 | 1 |
| 12 | 24.03.2018 00:00:00 | 24.04.2018 00:00:00 | 1 |
+----+---------------------+---------------------+----------------+
DEMO
declare #start_dt date;
set #start_dt = '20170424';
select
period_start_dt, period_end_dt, sum(1) as your_data_here
from (
select
dateadd(month,m.n,start_dt) period_start_dt
, dateadd(month,m.n+1,start_dt) period_end_dt
from (
select #start_dt start_dt ) seed
cross join (
select 0 n union all
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10 union all
select 11
) m
) r
-- LEFT JOIN YOUR DATA
-- ON yourdata.date >= r.period_start_dt and data.date < r.period_end_dt
group by
period_start_dt, period_end_dt
Please don't be tempted to use "between" when it comes to joining to your data. Follow the note above and use yourdata.date >= r.period_start_dt and data.date < r.period_end_dt otherwise you could double count information as between is inclusive of both lower and upper boundaries.
I think the simplest way is to subtract 25 days and aggregate by the month:
select year(dateadd(day, -25, datetime)) as yr,
month(dateadd(day, -25, datetime)) as mon,
sum(data)
from t
group by dateadd(day, -25, datetime);
You can format yr and mon to get the dates for the specific ranges, but this does the aggregation (and the yr/mon columns might be sufficient).
Step 0: Build a calendar table. Every database needs a calendar table eventually to simplify this sort of calculation.
In this table you may have columns such as:
Date (primary key)
Day
Month
Year
Quarter
Half-year (e.g. 1 or 2)
Day of year (1 to 366)
Day of week (numeric or text)
Is weekend (seems redundant now, but is a huge time saver later on)
Fiscal quarter/year (if your company's fiscal year doesn't start on Jan. 1)
Is Holiday
etc.
If your company starts its month on the 24th, then you can add a "Fiscal Month" column that represents that.
Step 1: Join on the calendar table
Step 2: Group by the columns in the calendar table.
Calendar tables sound weird at first, but once you realize that they are in fact tiny even if they span a couple hundred years they quickly become a major asset.
Don't try to cheap out on disk space by using computed columns. You want real columns because they are much faster and can be indexed if necessary. (Though honestly, usually just the PK index is enough for even wide calendar tables.)

Is it possible to group by day, month or year with timestamp values?

I have a table ORDERS(idOrder, idProduct, Qty, OrderDate) where OrderDate is a varchar column with timestamp values, is it possible to get the Qty of each day, week, month or year ?
The table looks like this :
---------------------------------------
|idOrder | idProduct | Qty | OrderDate|
---------------------------------------
| 1 | 5 | 20 | 1504011790 |
| 2 | 5 | 50 | 1504015790 |
| 3 | 5 | 60 | 1504611790 |
| 4 | 5 | 90 | 1504911790 |
-----------------------------------------
and i want something like this
------------------------------
| idProduct | Qty | OrderDate|
-------------------------------
| 5 | 70 | 08/29/2017|
| 5 | 60 | 09/05/2017|
| 5 | 90 | 09/08/2017|
-------------------------------
looks like you want to do 2 things here: first group by your idProduct and OrderDate
select idProduct, sum(Qty), OrderDate from [yourtable] group by idProduct, OrderDate
This will get you the sums that you want. Next, you want to convert time formats. I assume that your stamps are in Epoch time (number of seconds from Jan 1, 1970) so converting them takes the form:
dateadd(s,[your time field],'19700101')
It also looks like you wanted your dates formatted as mm/dd/yyyy.
convert(NVARCHAR, [date],101) is the format for accomplishing that
Together:
select idProduct, sum(Qty), convert(NVARCHAR,dateadd(s,OrderDate,'19700101'), 101)
from [yourtable]
group by idProduct, OrderDate
Unfortunately, the TSQL TIMESTAMP data type isn't really a date. According to this SO question they're even changing the name because it's such a misnomer. You're much better off creating a DATETIME field with a DEFAULT = GETDATE() to keep an accurate record of when a line was created.
That being said, the most performant way I've seen to track dates down to the day/week/month/quarter/etc. is to use a date dimension table that just lists every date and has fields like WeekOfMonth and DayOfYearand. Once you join your new DateCreated field to it you can get all sorts of information about that date. You can google scripts that will create a date dimension table for you.
Yes its very simple:
TRUNC ( date [, format ] )
Format can be:
TRUNC(TO_DATE('22-AUG-03'), 'YEAR')
Result: '01-JAN-03'
TRUNC(TO_DATE('22-AUG-03'), 'MONTH')
Result: '01-AUG-03'
TRUNC(TO_DATE('22-AUG-03'), 'DDD')
Result: '22-AUG-03'
TRUNC(TO_DATE('22-AUG-03'), 'DAY')
Result: '17-AUG-03'

SQL: earliest date from set of date fields

I have a series of dates associated with a unique identifier in a table. For example:
1 | 1999-04-01 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 2008-12-01 |
2 | 1999-04-06 | 2000-04-01 | 0000-00-00 | 0000-00-00 | 2010-04-03 |
3 | 1999-01-09 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 2007-09-03 |
4 | 1999-01-01 | 0000-00-00 | 1997-01-01 | 0000-00-00 | 2002-01-04 |
Is there a way, to select the earliest date from the predefined list of DATE fields using a straightforward SQL command?
So the expected output would be:
1 | 1999-04-01
2 | 1999-04-06
3 | 1998-01-09
4 | 1997-01-01
I am guessing this is not possible but I wanted to ask and make sure. My current solution in mind involves putting all the dates in a temporary table and then using that to get the MIN()
thanks
Edit: The problem with using LEAST() as stated is that the new behaviour is to return NULL if any of the columns in NULL. In a series of dates like the dataset in question, any date might be NULL. I would like to obtain the earliest actual date from the set of dates.
SOLUTION: Used a combination of LEAST() and IF() in order to filter out NULL dates.
SELECT LEAST( IF(date1=0,NOW(),date1), IF(date2=0,NOW(),date2), [...] );
Lessons learnt a) COALESCE does not treat '0000-00-00' as a NULL date, b) LEAST will return '0000-00-00' as the smallest value - I would guess this is due to internal integer comparison(?)
select id, least(date_col_a, date_col_b, date_col_c) from table
upd
select id, least (
case when date_col_a = '0000-00-00' then now() + interval 100 year else date_col_a end,
case when date_col_b = '0000-00-00' then now() + interval 100 year else date_col_b end) from table
Actually you can do it like bellow or using a large case structure... or with least(date1, date2, dateN) but with that null could be the minimum value...
select rowid, min(date)
from
( select rowid, date1 from table
union all
select rowid, date2 from table
union all
select rowid, date3 from table
/* and so on */
)
group by rowid;
HTH
select
id,
least(coalesce(date1, '9999-12-31'), ....)
from
table