ORDER BY datediff() reversed with unnamed column in MariaDB - sql

Example queries are below; can you tell me why they return different results? Specifically, why is the order reversed?
There's only one difference between the two: in the second query, the datediff in the SELECT clause is named and the name is re-used in the ORDER BY, while in the first one it is not named.
This is with MariaDB 10.1.18 as well as 10.2.12.
First query:
select Start_Date, min(End_Date), datediff(min(End_Date), Start_Date)
from (
    select Start_Date
    from Projects
    where Start_Date not in (select End_Date from Projects)
) a,
(
    select End_Date
    from Projects
    where End_Date not in (select Start_Date from Projects)
) b
where Start_Date < End_Date
group by Start_Date
order by datediff(min(End_Date), Start_Date);
+------------+---------------+-------------------------------------+
| Start_Date | min(End_Date) | datediff(min(End_Date), Start_Date) |
+------------+---------------+-------------------------------------+
| 2015-10-01 | 2015-10-04    |                                   3 |
| 2015-10-13 | 2015-10-15    |                                   2 |
| 2015-10-28 | 2015-10-29    |                                   1 |
| 2015-10-30 | 2015-10-31    |                                   1 |
+------------+---------------+-------------------------------------+
Second query:
select Start_Date, min(End_Date), datediff(min(End_Date), Start_Date) as 'test_diff'
from (
    select Start_Date
    from Projects
    where Start_Date not in (select End_Date from Projects)
) a,
(
    select End_Date
    from Projects
    where End_Date not in (select Start_Date from Projects)
) b
where Start_Date < End_Date
group by Start_Date
order by test_diff;
+------------+---------------+-----------+
| Start_Date | min(End_Date) | test_diff |
+------------+---------------+-----------+
| 2015-10-28 | 2015-10-29    |         1 |
| 2015-10-30 | 2015-10-31    |         1 |
| 2015-10-13 | 2015-10-15    |         2 |
| 2015-10-01 | 2015-10-04    |         3 |
+------------+---------------+-----------+

Your second query has
order by test_diff
and your first does not; if you add this line to the first, it will show the same order as the second.
If you change the ORDER BY in the second query to
order by test_diff DESC
it will look like the first, putting the result in descending order.

Sounds like a bug. Please file a bug report.
Meanwhile, the problem can probably be worked around by making a subquery of most of the query, then doing the ORDER BY in the outside query.
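A minimal sketch of that workaround, reusing the second query from above; the wrapper alias q and the inner column aliases are just for illustration:
select q.Start_Date, q.min_end_date, q.test_diff
from (
    select Start_Date, min(End_Date) as min_end_date, datediff(min(End_Date), Start_Date) as test_diff
    from (
        select Start_Date
        from Projects
        where Start_Date not in (select End_Date from Projects)
    ) a,
    (
        select End_Date
        from Projects
        where End_Date not in (select Start_Date from Projects)
    ) b
    where Start_Date < End_Date
    group by Start_Date
) q
order by q.test_diff;  -- the ORDER BY now only sees a plain column of the derived table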

Related

Get max dates for each customer

Let's say I have a customer table like so:
id | start_date | created_at
-----------------------------
1 | 2020-1-15 | 2020-1-15
1 | 2020-1-16 | 2020-1-15
1 | 2020-1-16 | 2020-1-16
2 | 2020-1-15 | 2020-1-15
2 | 2020-1-16 | 2020-1-15
I want to get 1 row per customer id that has the max(start_date), and if two rows have the same date, use the max(created_at) as the tie-breaker.
Result should look like this:
id | start_date | created_at
-----------------------------
1 | 2020-1-16 | 2020-1-16
2 | 2020-1-16 | 2020-1-15
I'm having a hard time with window functions: I thought a partition by id would work, but I have 2 dates to order by.
Maybe I should use a group by?
Please try this one; you can order by two columns inside ROW_NUMBER():
SELECT * FROM (
    SELECT id, start_date, Created_At,
        ROW_NUMBER() OVER (PARTITION BY id ORDER BY start_date DESC, Created_At DESC) AS R
    FROM #date
) A
WHERE A.R = 1
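Since you mentioned GROUP BY: a join-based sketch under the same assumption that the table is called #date, which first finds the latest start_date per id and then the latest created_at for that date:
SELECT d.id, d.start_date, MAX(d.created_at) AS created_at
FROM #date d
JOIN (
    -- latest start_date per customer
    SELECT id, MAX(start_date) AS start_date
    FROM #date
    GROUP BY id
) m ON m.id = d.id AND m.start_date = d.start_date
GROUP BY d.id, d.start_date;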

How to add records for each user based on another existing row in BigQuery?

Posting here in case someone with more knowledge than me may be able to help me with some direction.
I have a table like this:
| Row | date | user id | score |
-----------------------------------
| 1 | 20201120 | 1 | 26 |
-----------------------------------
| 2 | 20201121 | 1 | 14 |
-----------------------------------
| 3 | 20201125 | 1 | 0 |
-----------------------------------
| 4 | 20201114 | 2 | 32 |
-----------------------------------
| 5 | 20201116 | 2 | 0 |
-----------------------------------
| 6 | 20201120 | 2 | 23 |
-----------------------------------
However, from this I need a record for each user for each day; if a day is missing for a user, the last recorded score should be carried forward. Then I would have something like this:
| Row | date | user id | score |
-----------------------------------
| 1 | 20201120 | 1 | 26 |
-----------------------------------
| 2 | 20201121 | 1 | 14 |
-----------------------------------
| 3 | 20201122 | 1 | 14 |
-----------------------------------
| 4 | 20201123 | 1 | 14 |
-----------------------------------
| 5 | 20201124 | 1 | 14 |
-----------------------------------
| 6 | 20201125 | 1 | 0 |
-----------------------------------
| 7 | 20201114 | 2 | 32 |
-----------------------------------
| 8 | 20201115 | 2 | 32 |
-----------------------------------
| 9 | 20201116 | 2 | 0 |
-----------------------------------
| 10 | 20201117 | 2 | 0 |
-----------------------------------
| 11 | 20201118 | 2 | 0 |
-----------------------------------
| 12 | 20201119 | 2 | 0 |
-----------------------------------
| 13 | 20201120 | 2 | 23 |
-----------------------------------
I'm trying to do this in BigQuery using Standard SQL. I have an idea of how to keep the same score across the following empty dates, but I really don't know how to add new rows for missing dates for each user. Also, just to keep in mind, this example only has 2 users, but in my data I have more than 1500.
My end goal would be to show something like the average score per day. For background, because of our logic, if the score wasn't recorded on a specific day it means the user is still at the last recorded score, which is why I need a score for every user every day.
I'd really appreciate any help I could get! I've been trying different options without success.
Below is for BigQuery Standard SQL
#standardSQL
select date, user_id,
    last_value(score ignore nulls) over(partition by user_id order by date) as score
from (
    select user_id, format_date('%Y%m%d', day) date
    from (
        select user_id, min(parse_date('%Y%m%d', date)) min_date, max(parse_date('%Y%m%d', date)) max_date
        from `project.dataset.table`
        group by user_id
    ) a, unnest(generate_date_array(min_date, max_date)) day
)
left join `project.dataset.table` b
using(date, user_id)
-- order by user_id, date
If applied to the sample data from your question, the output matches the expected result shown above.
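If the end goal is the daily average mentioned in the question, a sketch that simply wraps the same query (same assumed `project.dataset.table`):
#standardSQL
select date, avg(score) as avg_score  -- average of the carried-forward scores per day
from (
    select date, user_id,
        last_value(score ignore nulls) over(partition by user_id order by date) as score
    from (
        select user_id, format_date('%Y%m%d', day) date
        from (
            select user_id, min(parse_date('%Y%m%d', date)) min_date, max(parse_date('%Y%m%d', date)) max_date
            from `project.dataset.table`
            group by user_id
        ) a, unnest(generate_date_array(min_date, max_date)) day
    )
    left join `project.dataset.table` b
    using(date, user_id)
)
group by date
-- note: each user contributes only from their first recorded day onward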
One option uses generate_date_array() to create the series of dates for each user, then brings the table back in with a left join:
select d.date, d.user_id,
    last_value(t.score ignore nulls) over(partition by d.user_id order by d.date) as score
from (
    select user_id, date
    from (
        select user_id, min(date) as min_date, max(date) as max_date
        from mytable
        group by user_id
    ), unnest(generate_date_array(min_date, max_date, interval 1 day)) as date
) d
left join mytable t on t.user_id = d.user_id and t.date = d.date
I think the most efficient method is to use generate_date_array() but in a very particular way:
with t as (
    select m.*,
        date_add(lead(m.date) over (partition by m.user_id order by m.date), interval -1 day) as next_date
    from mytable m
)
select row_number() over (order by t.user_id, dte) as id,
    t.user_id, dte, t.score
from t cross join
unnest(generate_date_array(t.date, coalesce(t.next_date, t.date), interval 1 day)) dte;

Previous Month MTD

Is there any way I can find the previous month's MTD?
My data is at the day level and I need to find the MTD and the previous month's MTD.
D_date product TOTAL_UNIT
01/AUG/2020 A 10
01/AUG/2020 B 20
02/AUG/2020 A 15
02/AUG/2020 B 25
29/JUL/2020 A 5
29/JUL/2020 B 0
30/JUL/2020 A 2
31/JUL/2020 B 30
I can get the current month's MTD using the SQL below (Oracle):
SUM(TOTAL_UNIT) OVER (PARTITION BY PRODUCT, TRUNC(D_DATE,'MM') ORDER BY D_DATE) MTD
However, when I use add_months -1 to get the PMTD, it still shows the current MTD.
I tried doing
SUM(TOTAL_UNIT) OVER (PARTITION BY PRODUCT, TRUNC(ADD_MONTHS(D_DATE,-1),'MM') ORDER BY D_DATE) MTD
Another way I can do this is with a self-join, but I would like to avoid that for performance reasons.
Changing your partition-by clause to TRUNC(ADD_MONTHS(D_DATE,-1),'MM') - or ADD_MONTHS(TRUNC(D_DATE,'MM'),-1) - gives you a different value for that partitioning expression but exactly the same groups as plain TRUNC(D_DATE,'MM').
If you want to get the last MTD before the current month you can put your existing query in a subquery and use last_value() over a window that stops just before the current month:
select d_date, product, total_unit, m_date, mtd,
    last_value(mtd) over (partition by product order by m_date
        range between unbounded preceding and 1 preceding) as prev_mtd
from (
    select d_date, product, total_unit,
        TRUNC(D_DATE,'MM') m_date,
        SUM(TOTAL_UNIT) OVER (PARTITION BY PRODUCT, TRUNC(D_DATE,'MM') ORDER BY D_DATE) MTD
    from your_table
)
order by product, d_date;
D_DATE | PRODUCT | TOTAL_UNIT | M_DATE | MTD | PREV_MTD
:-------- | :------ | ---------: | :-------- | --: | -------:
29-JUL-20 | A | 5 | 01-JUL-20 | 5 | null
30-JUL-20 | A | 2 | 01-JUL-20 | 7 | null
01-AUG-20 | A | 10 | 01-AUG-20 | 10 | 7
02-AUG-20 | A | 15 | 01-AUG-20 | 25 | 7
29-JUL-20 | B | 0 | 01-JUL-20 | 0 | null
31-JUL-20 | B | 30 | 01-JUL-20 | 30 | null
01-AUG-20 | B | 20 | 01-AUG-20 | 20 | 30
02-AUG-20 | B | 25 | 01-AUG-20 | 45 | 30
db<>fiddle
That is because you're using add_months inside an analytic function. So what you are doing is calculating the sum, partitioned by the month, as if it were the previous month. While that last part doesn't make much sense to me, it does give you the correct SUM; you just didn't tell SQL to show the previous month as well.
If you want the D_DATE to be shown as the previous month, add another column: trunc( ADD_MONTHS( d_date, -1), 'MM') as pmtd
select trunc( ADD_MONTHS( sysdate, -1), 'MM') from dual;
This will always get you the first date of the previous month.
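A minimal sketch of that suggestion, reusing your_table from the earlier answer; the extra column alias pm_start is just illustrative:
select d_date, product, total_unit,
    trunc(add_months(d_date, -1), 'MM') as pm_start,  -- first day of the previous month
    sum(total_unit) over (partition by product, trunc(d_date, 'MM') order by d_date) as mtd
from your_table
order by product, d_date;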

SQL - Insert multiple rows based on one record with conditions

I need to insert multiple rows into Table B based on one record in Table A. The query needs to grab each day between the start and end dates in Table A and check whether it is a working day. Non-working days (weekends) are not inserted into Table B.
Scenario as below:
Table A:
+ LID + Start_Date + End_Date + Working_Day + Total_Days
------------------------------------------------------------
| 101 | 1-Jan-18 | 5-Jan-2018 | Yes | 5 |
Table B (Expected Result):
+ LID + Start_Date + End_Date +
---------------------------------
| 101 | 1-Jan-18 | 1-Jan-2018 |
| 101 | 2-Jan-18 | 2-Jan-2018 |
| 101 | 3-Jan-18 | 3-Jan-2018 |
| 101 | 4-Jan-18 | 4-Jan-2018 |
| 101 | 5-Jan-18 | 5-Jan-2018 |
If I understand correctly, you can expand the data using a recursive CTE and then filter out the weekend days:
with cte as (
    select lid, start_date, end_date
    from a
    union all
    select lid, dateadd(day, 1, start_date), end_date
    from cte
    where start_date < end_date
)
select lid, start_date, start_date as end_date  -- each expanded row covers a single day
from cte
where datename(weekday, start_date) not in ('Saturday', 'Sunday');
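To actually load Table B, a sketch assuming SQL Server (as the datename() call suggests) and a target table named TableB:
with cte as (
    select lid, start_date, end_date
    from a
    union all
    select lid, dateadd(day, 1, start_date), end_date
    from cte
    where start_date < end_date
)
insert into TableB (lid, start_date, end_date)
select lid, start_date, start_date
from cte
where datename(weekday, start_date) not in ('Saturday', 'Sunday')
option (maxrecursion 0);  -- lift the default 100-level recursion limit for long date ranges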

SQL/Postgres datetime division / normalizing

I have this activity table
+--------------+------------------+
| Field | Type |
+--------------+------------------+
| id | int(11) unsigned |
| start_date | timestamp |
| end_date | timestamp |
| ... | |
+--------------+------------------+
I need a view which groups these activities by start_date by DAY, but in such a way that, if the end_date is not on the same day as the start_date, the view contains the entry again with the start_date set to 00:00 of the next day (and so on, repeated as many times as needed until the start_date is on the same day as the end_date).
As an example:
if the activity table contains:
+--------------+----------------------------+----------------------------+
| id | start_date | end_date |
+--------------+----------------------------+----------------------------+
| 1 | 2014-12-02 14:12:00+00 | 2014-12-03 06:45:00+00 |
| 2 | 2014-12-05 15:25:00+00 | 2014-12-05 07:29:00+00 |
+--------------+----------------------------+----------------------------+
The view should contain:
+--------------+----------------------------+----------------------------+
| activity_id | start_date | end_date |
+--------------+----------------------------+----------------------------+
| 1 | 2014-12-02 14:12:00+00 | 2014-12-02 23:59:59+00 |
| 1 | 2014-12-03 00:00:00+00 | 2014-12-03 06:45:00+00 |
| 2 | 2014-12-05 15:25:00+00 | 2014-12-05 07:29:00+00 |
+--------------+----------------------------+----------------------------+
Any help would be greatly appreciated!
PS: I'm using postgresql
To get the needed rows, start by using a set-returning function along with a lateral join. From there, use CASE statements and date arithmetic to pull out the relevant values.
Here's an example to get you started:
with data as (
    select id, start_date, end_date
    from (values
        (1, '2014-12-02 14:12:00+00'::timestamptz, '2014-12-03 06:45:00+00'::timestamptz),
        (2, '2014-12-05 15:25:00+00'::timestamptz, '2014-12-05 07:29:00+00'::timestamptz)
    ) as rows (id, start_date, end_date)
)
select data.id,
    case days.d = date_trunc('day', data.start_date)
        when true then data.start_date
        else days.d
    end as start_date,
    case days.d = date_trunc('day', data.end_date)
        when true then data.end_date
        else days.d + interval '1 day' - interval '1 sec'
    end as end_date
from data
join generate_series(
        date_trunc('day', data.start_date),
        date_trunc('day', data.end_date),
        '1 day'
    ) as days (d)
    on days.d >= date_trunc('day', data.start_date)
    and days.d <= date_trunc('day', data.end_date)
id | start_date | end_date
----+------------------------+------------------------
1 | 2014-12-02 15:12:00+01 | 2014-12-02 23:59:59+01
1 | 2014-12-03 00:00:00+01 | 2014-12-03 07:45:00+01
2 | 2014-12-05 16:25:00+01 | 2014-12-05 08:29:00+01
(3 rows)
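Since the question asks for a view, a sketch of the same logic against the real table; the names activity and activity_per_day are assumptions based on the question:
create view activity_per_day as
select a.id as activity_id,
    case days.d = date_trunc('day', a.start_date)
        when true then a.start_date
        else days.d
    end as start_date,
    case days.d = date_trunc('day', a.end_date)
        when true then a.end_date
        else days.d + interval '1 day' - interval '1 sec'
    end as end_date
from activity a
join generate_series(
        date_trunc('day', a.start_date),
        date_trunc('day', a.end_date),
        '1 day'
    ) as days (d) on true;  -- the series is already bounded, so the join condition is trivial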
As an aside, depending on what you're doing, it might make more sense for you to use a timestamp range:
with data as (
    select id, start_date, end_date
    from (values
        (1, '2014-12-02 14:12:00+00'::timestamptz, '2014-12-03 06:45:00+00'::timestamptz),
        (2, '2014-12-05 07:25:00+00'::timestamptz, '2014-12-05 15:29:00+00'::timestamptz)
    ) as rows (id, start_date, end_date)
)
select data.id,
    tstzrange(data.start_date, data.end_date)
from data;
id | tstzrange
----+-----------------------------------------------------
1 | ["2014-12-02 15:12:00+01","2014-12-03 07:45:00+01")
2 | ["2014-12-05 08:25:00+01","2014-12-05 16:29:00+01")
(2 rows)
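For example, with ranges a "which activities touch this day" check becomes a single overlap test using the && operator (the activity table name and the literal date below are just illustrative):
select a.id
from activity a
where tstzrange(a.start_date, a.end_date)
    && tstzrange('2014-12-03 00:00:00+00', '2014-12-04 00:00:00+00');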