UPDATE a table with the smallest date between 2 tables - SQL

I have 2 Tables:
#Tempdate1
+------+------------+---------------+--------+
| Year | Entry_Date | DeliveryMonth | Symbol |
+------+------------+---------------+--------+
| 2016 | 2016-01-07 | June | ABC |
| 2015 | 2015-01-06 | June | ABC |
| 2014 | 2014-01-05 | June | ABC |
| 2016 | 2016-03-05 | Sep | CDE |
| 2015 | 2015-03-04 | Sep | CDE |
| 2014 | 2014-03-03 | Sep | CDE |
+------+------------+---------------+--------+
and AllProducts
+-----------------+---------------+--------+
| Date | DeliveryMonth | Symbol |
+-----------------+---------------+--------+
| 2016-01-07 | June | ABC |
| 2016-01-08 | June | ABC |
| 2016-01-09 | June | ABC |
| 2016-01-10 | June | ABC |
| 2015-01-01 | June | ABC |
| 2015-01-02 | June | ABC |
| 2015-01-03 | June | ABC |
| 2014-01-05 | June | ABC |
+-----------------+---------------+--------+
The result I am looking for in the updated table #Tempdate1:
+------+------------+---------------+--------+
| Year | Entry_Date | DeliveryMonth | Symbol |
+------+------------+---------------+--------+
| 2016 | 2016-01-07 | June | ABC |
| 2015 | 2015-01-01 | June | ABC |
| 2014 | 2014-01-05 | June | ABC |
| 2016 | 2016-03-05 | Sep | CDE |
| 2015 | 2015-03-04 | Sep | CDE |
| 2014 | 2014-03-03 | Sep | CDE |
+------+------------+---------------+--------+
I have this query to find the smallest (earliest) date for a given year and product. Using this query, how can I update #Tempdate1 with the earliest date whenever it doesn't already hold the earliest date?
SELECT
    Year,
    CASE
        WHEN MIN([Date]) < Entry_Date THEN MIN([Date])
        ELSE Entry_Date
    END AS MDate
FROM #Tempdate1 a
INNER JOIN AllProducts b
    ON a.DeliveryMonth = b.DeliveryMonth
    AND a.Symbol = b.Symbol
GROUP BY Year, Entry_Date

It seems you made a typo in the expected results, or maybe it was me.
UPDATE a
SET Entry_Date = CASE WHEN a.Entry_Date > b.[Date] THEN b.[Date] ELSE a.Entry_Date END
FROM #Tempdate1 a
INNER JOIN AllProducts b
    ON b.Symbol = a.Symbol
    AND b.DeliveryMonth = a.DeliveryMonth
    AND YEAR(b.[Date]) = a.Year
http://rextester.com/AQXR21093
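The same "take the earlier date" logic can be sketched with sqlite3, which ships with Python. This is a minimal illustration using mirrored table names and the question's June/ABC rows; since older SQLite versions lack UPDATE ... FROM, the join condition is expressed as a correlated subquery instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical mirrors of the two tables from the question.
cur.execute("CREATE TABLE Tempdate1 (Year INT, Entry_Date TEXT, DeliveryMonth TEXT, Symbol TEXT)")
cur.execute("CREATE TABLE AllProducts (Date TEXT, DeliveryMonth TEXT, Symbol TEXT)")
cur.executemany("INSERT INTO Tempdate1 VALUES (?,?,?,?)", [
    (2016, "2016-01-07", "June", "ABC"),
    (2015, "2015-01-06", "June", "ABC"),
    (2014, "2014-01-05", "June", "ABC"),
])
cur.executemany("INSERT INTO AllProducts VALUES (?,?,?)", [
    ("2016-01-07", "June", "ABC"),
    ("2015-01-01", "June", "ABC"),
    ("2015-01-02", "June", "ABC"),
    ("2014-01-05", "June", "ABC"),
])

# Take the earlier of the current Entry_Date and the minimum matching
# AllProducts.Date for the same symbol, delivery month, and year.
# COALESCE keeps the row unchanged when there is no match at all.
cur.execute("""
    UPDATE Tempdate1
    SET Entry_Date = MIN(Entry_Date, COALESCE((
        SELECT MIN(b.Date)
        FROM AllProducts b
        WHERE b.Symbol = Tempdate1.Symbol
          AND b.DeliveryMonth = Tempdate1.DeliveryMonth
          AND CAST(strftime('%Y', b.Date) AS INTEGER) = Tempdate1.Year), Entry_Date))
""")
rows = cur.execute("SELECT Year, Entry_Date FROM Tempdate1 ORDER BY Year DESC").fetchall()
print(rows)  # the 2015 row is pulled back to 2015-01-01; the others are unchanged
```

ISO-formatted date strings compare correctly as text, which is what makes the scalar MIN() safe here.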

Related

How to Do Data-Grouping in BigQuery?

I have a dataset that needs to be grouped. I've done this successfully in R, but now I have to do it in BigQuery. The data is shown in the following table:
| category | sub_category | date | day | timestamp | type | cpc | gmv |
|---------- |-------------- |----------- |----- |------------- |------ |------ |--------- |
| ABC | ABC-1 | 2/17/2020 | Mon | 11:37:36 PM | BI | 1.94 | 252,293 |
| ABC | ABC-1 | 2/17/2020 | Mon | 11:37:39 PM | RT | 1.94 | 252,293 |
| ABC | ABC-1 | 2/17/2020 | Mon | 11:38:29 PM | RT | 1.58 | 205,041 |
| ABC | ABC-1 | 2/18/2020 | Tue | 12:05:14 AM | BI | 1.6 | 208,397 |
| ABC | ABC-1 | 2/18/2020 | Tue | 12:05:18 AM | RT | 1.6 | 208,397 |
| ABC | ABC-1 | 2/18/2020 | Tue | 12:05:52 AM | RT | 1.6 | 208,397 |
| ABC | ABC-1 | 2/18/2020 | Tue | 12:06:33 AM | BI | 1.55 | 201,354 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 11:55:47 PM | PP | 1 | 129,282 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 11:56:23 PM | PP | 0.98 | 126,928 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 11:57:19 PM | PP | 0.98 | 126,928 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 11:57:34 PM | PP | 0.98 | 126,928 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 11:58:46 PM | PP | 0.89 | 116,168 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 11:59:27 PM | PP | 0.89 | 116,168 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 11:59:51 PM | RT | 0.89 | 116,168 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 12:00:57 AM | BI | 0.89 | 116,168 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 12:01:11 AM | PP | 0.89 | 116,168 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 12:03:01 AM | PP | 0.89 | 116,168 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 12:12:42 AM | RT | 1.19 | 154,886 |
I want to group the rows: a row whose timestamp is within 8 minutes of the next row should be merged into the same group, producing output like this:
| category | sub_category | date | day | time | start_timestamp | end_timestamp | type | cpc | gmv |
|---------- |-------------- |----------------------- |--------- |---------- |--------------------- |--------------------- |---------- |------ |--------- |
| ABC | ABC-1 | 2/17/2020 | Mon | 23:37:36 | (02/17/20 23:37:36) | (02/17/20 23:38:29) | BI|RT | 1.82 | 236,542 |
| ABC | ABC-1 | 2/18/2020 | Tue | 0:05:14 | (02/18/20 00:05:14) | (02/18/20 00:06:33) | BI|RT | 1.59 | 206,636 |
| XYZ | XYZ-1 | 02/17/2020|02/18/2020 | Mon|Tue | 0:06:21 | (02/17/20 23:55:47) | (02/18/20 00:12:42) | PP|RT|BI | 0.95 | 123,815 |
Some new fields are generated, as defined below:
| fields | definition |
|----------------- |-------------------------------------------------------- |
| day | Day of the row (combination if there's different days) |
| time | Start of timestamp |
| start_timestamp | Start timestamp of the first row in group |
| end_timestamp | Start timestamp of the last row in group |
| type | Type of Row (combination if there's different types) |
| cpc | Average CPC of the Group |
| gmv | Average GMV of the Group |
Could anyone help me write a query for the above requirements?
Thank you
This is a gaps-and-islands problem. Here is a solution that uses lag() and a cumulative sum() to define groups of adjacent records that are less than 8 minutes apart; the rest is aggregation.
select
    category,
    sub_category,
    string_agg(distinct day, '|' order by dt) day,
    min(dt) start_dt,
    max(dt) end_dt,
    string_agg(distinct type, '|' order by dt) type,
    avg(cpc) cpc,
    avg(gmv) gmv
from (
    select
        t.*,
        sum(case when dt <= datetime_add(lag_dt, interval 8 minute) then 0 else 1 end)
            over(partition by category, sub_category order by dt) grp
    from (
        select
            t.*,
            lag(dt) over(partition by category, sub_category order by dt) lag_dt
        from (
            select t.*, datetime(date, timestamp) dt
            from mytable t
        ) t
    ) t
) t
group by category, sub_category, grp
Note that you should not store the date and time parts of your timestamps in separate columns: it makes the logic more complicated when you need to combine them (I added another level of nesting to avoid repeating the conversion, which would have obfuscated the code).
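The lag-plus-cumulative-sum trick is easy to see outside SQL. This is a minimal plain-Python sketch, with invented timestamps for a single (category, sub_category) partition: flag a new island whenever the gap from the previous row exceeds 8 minutes, then take the running total of those flags as the group id.

```python
from datetime import datetime, timedelta

# Invented sample rows (type, timestamp) for one partition, ordered by time.
rows = [
    ("BI", datetime(2020, 2, 17, 23, 37, 36)),
    ("RT", datetime(2020, 2, 17, 23, 37, 39)),
    ("RT", datetime(2020, 2, 17, 23, 38, 29)),
    ("BI", datetime(2020, 2, 18, 0, 5, 14)),
    ("RT", datetime(2020, 2, 18, 0, 5, 18)),
]

# Step 1 (lag): compare each row to its predecessor.
# Step 2 (cumulative sum): increment the group id when the gap exceeds 8 minutes.
grp, groups = 0, []
for i, (typ, dt) in enumerate(rows):
    if i == 0 or dt - rows[i - 1][1] > timedelta(minutes=8):
        grp += 1
    groups.append(grp)

print(groups)  # [1, 1, 1, 2, 2]
```

The first three rows are seconds apart and land in group 1; the ~27-minute gap to the fourth row starts group 2, exactly as the SQL's sum(case ...) over(...) does.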

Return 0 for count when no data is present for that month

I am trying to get back 12 rows, one for each month, counting back from this month.
I am using this to populate a graph, but since there is no data for early last year, the graph shows July's data as January's (as it is the first row of data).
Image of the current graph
As you can see in this image, the blue line ends in September. That is actually meant to be February's data, and March (the first data point) is actually July's data.
Because of this, I need to be able to return COUNT as 0 if there is no data for that month.
I have created a ref_months table which holds 12 records, each one a month Jan-Dec.
I have the following query:
SELECT
    appointment.time AS appointmentdatetime,
    ref_months.text AS ref_month_text,
    ref_months.month AS ref_month_int,
    YEAR(appointment.time) AS appointmentyear,
    COUNT(appointment.id) AS COUNT
FROM appointment
RIGHT OUTER JOIN ref_months
    ON ref_months.month = MONTH(appointment.time)
WHERE appointment.time >= DATE_ADD(NOW(), INTERVAL -12 MONTH)
GROUP BY ref_months.month
ORDER BY appointmentyear ASC, ref_month_int ASC
This currently returns:
+---------------------+----------------+------------------+--------------------+-------+--+
| appointmentdatetime | ref_month_text | ref_month_int | appointmentyear | COUNT | |
+---------------------+----------------+------------------+--------------------+-------+--+
| 2019-07-27 13:00:00 | July | 7 | 2019 | 1 | |
| 2019-08-26 13:00:00 | August | 8 | 2019 | 2 | |
| 2019-09-06 13:00:00 | September | 9 | 2019 | 8 | |
| 2019-10-22 12:00:00 | October | 10 | 2019 | 9 | |
| 2019-11-21 12:00:00 | November | 11 | 2019 | 15 | |
| 2019-12-27 11:00:00 | December | 12 | 2019 | 2 | |
| 2020-01-22 15:00:00 | January | 1 | 2020 | 4 | |
| 2020-02-12 09:00:00 | February | 2 | 2020 | 1 | |
+---------------------+----------------+------------------+--------------------+-------+--+
What I need to return is this (last 12 months, if no data, show count as 0):
+---------------------+----------------+------------------+--------------------+-------+--+
| appointmentdatetime | ref_month_text | ref_month_int | appointmentyear | COUNT | |
+---------------------+----------------+------------------+--------------------+-------+--+
| NULL | March | 3 | NULL | 0 | |
| NULL | April | 4 | NULL | 0 | |
| NULL | May | 5 | NULL | 0 | |
| NULL | June | 6 | NULL | 0 | |
| 2019-07-27 15:00:00 | July | 7 | 2019 | 1 | |
| 2019-08-26 13:00:00 | August | 8 | 2019 | 2 | |
| 2019-09-06 13:00:00 | September | 9 | 2019 | 8 | |
| 2019-10-22 12:00:00 | October | 10 | 2019 | 9 | |
| 2019-11-21 12:00:00 | November | 11 | 2019 | 15 | |
| 2019-12-27 11:00:00 | December | 12 | 2019 | 2 | |
| 2020-01-22 15:00:00 | January | 1 | 2020 | 4 | |
| 2020-02-12 09:00:00 | February | 2 | 2020 | 1 | |
+---------------------+----------------+------------------+--------------------+-------+--+
I have tried every variation of LEFT, RIGHT, and INNER joins, and am still not getting back the empty rows.
I strongly recommend LEFT JOIN and starting with the table where you want to keep everything. Then, be very careful about what goes into the WHERE clause.
So, I think you want:
SELECT a.time AS appointmentdatetime,
       m.text AS ref_month_text,
       m.month AS ref_month_int,
       YEAR(a.time) AS appointmentyear,
       COUNT(a.id) AS COUNT
FROM ref_months m LEFT JOIN
     appointment a
     ON m.month = MONTH(a.time) AND
        a.time >= DATE_ADD(NOW(), INTERVAL -12 MONTH)
GROUP BY m.month
ORDER BY appointmentyear ASC, ref_month_int ASC;
Your WHERE clause is undoing the outer join. There are some other things I would note:
Your GROUP BY does not match the unaggregated columns in the SELECT. This will generate a syntax error in most databases, including the more recent versions of MySQL.
The current month will be partially populated by this year and last year. That seems strange to me.
It is totally unclear why you want an arbitrary value of a.time in the result set. This, in particular, is screaming for MIN(), MAX(), or GROUP_CONCAT().
appointmentyear will be NULL on the rows where there is no appointment. That seems a bit weird.
If you want to address these, I would suggest asking a NEW question, with appropriate sample data, desired results, and explanation.
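The ON-versus-WHERE point can be demonstrated with sqlite3 and invented data: putting the date filter in the ON clause keeps unmatched months alive with a count of 0, whereas the same predicate in WHERE would discard them (NULLs never pass the comparison). The table names mirror the question; the fixed cutoff stands in for DATE_ADD(NOW(), INTERVAL -12 MONTH).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE ref_months (month INT)")
cur.executemany("INSERT INTO ref_months VALUES (?)", [(m,) for m in range(1, 13)])
cur.execute("CREATE TABLE appointment (id INT, time TEXT)")
cur.executemany("INSERT INTO appointment VALUES (?, ?)",
                [(1, "2019-07-27"), (2, "2019-08-26"), (3, "2019-08-30")])

# Filter in ON: months with no appointments survive the LEFT JOIN,
# and COUNT(a.id) counts only non-NULL ids, so they show 0.
counts = cur.execute("""
    SELECT m.month, COUNT(a.id)
    FROM ref_months m
    LEFT JOIN appointment a
      ON m.month = CAST(strftime('%m', a.time) AS INTEGER)
     AND a.time >= '2019-01-01'
    GROUP BY m.month
    ORDER BY m.month
""").fetchall()
print(counts)  # all 12 months; July has 1, August has 2, the rest 0
```

Moving `a.time >= '2019-01-01'` into a WHERE clause would reproduce the original bug: the 10 appointment-free months would vanish from the result.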

Is there a way to COUNT the number of non-NULL rows between the NULLs in a column?

I have a query that pulls financial figures and flags whether they have hit a target or not. I have a column that is populated with a 1 if the target is hit, and NULL if it isn't. This is a simple CASE statement.
I need to be able count how many consecutive rows in that column are populated with a 1, and then stop counting when a NULL is hit, and then start counting again from the next non-null.
I have tried every combination of "COUNT(*) OVER" I can possibly think of, all not quite giving me the result I need.
I'll post the entire query, as it's not too long:
SELECT
    *,
    CASE
        WHEN zzz.Flag_hit_target IS NOT NULL THEN
            COUNT(*) OVER (PARTITION BY zzz.Flag_hit_target ORDER BY CAST(zzz.Close_month AS DATE) DESC)
        ELSE NULL
    END AS Counter
FROM
(
    SELECT
        zz.Close_month,
        SUM(MRP) AS Total_MRP,
        zz.Target,
        CASE
            WHEN SUM(MRP) >= zz.Target THEN 1
            ELSE NULL
        END AS Flag_hit_target
    FROM
    (
        SELECT
            Opp.id,
            Opp.MRP__c AS MRP,
            1500 AS Target,
            CONCAT(DATENAME(month, Closedate), ' ', DATEPART(year, Closedate)) AS Close_month
        FROM Table1 AS Opp WITH (NOLOCK)
        WHERE Opp_type__c = 'Opp Type 1'
            AND Appointment_setter1__c = 'Person 1'
            AND Stagename = 'Closed (Won)'
    ) AS zz
    GROUP BY zz.Close_month, zz.Target
) AS zzz
ORDER BY CAST(zzz.Close_month AS DATE) DESC
With this I get the following results -
+----------------+-----------------+---------+
| Close_month | Flag_hit_target | Counter |
+----------------+-----------------+---------+
| June 2019 | NULL | NULL |
| April 2019 | NULL | NULL |
| March 2019 | 1 | 1 |
| February 2019 | NULL | NULL |
| January 2019 | 1 | 2 |
| November 2018 | NULL | NULL |
| October 2018 | NULL | NULL |
| September 2018 | NULL | NULL |
| July 2018 | NULL | NULL |
| June 2018 | 1 | 3 |
| May 2018 | NULL | NULL |
| April 2018 | 1 | 4 |
| March 2018 | NULL | NULL |
| February 2018 | 1 | 5 |
| January 2018 | 1 | 6 |
| December 2017 | 1 | 7 |
| October 2017 | NULL | NULL |
| September 2017 | 1 | 8 |
| August 2017 | 1 | 9 |
| July 2017 | 1 | 10 |
| June 2017 | 1 | 11 |
| May 2017 | NULL | NULL |
| April 2017 | 1 | 12 |
| March 2017 | NULL | NULL |
| February 2017 | 1 | 13 |
| January 2017 | 1 | 14 |
+----------------+-----------------+---------+
The results I am after are as follows (notice the last column):
+----------------+-----------------+---------+
| Close_month | Flag_hit_target | Counter |
+----------------+-----------------+---------+
| June 2019 | NULL | NULL |
| April 2019 | NULL | NULL |
| March 2019 | 1 | 1 |
| February 2019 | NULL | NULL |
| January 2019 | 1 | 1 |
| November 2018 | NULL | NULL |
| October 2018 | NULL | NULL |
| September 2018 | NULL | NULL |
| July 2018 | NULL | NULL |
| June 2018 | 1 | 1 |
| May 2018 | NULL | NULL |
| April 2018 | 1 | 1 |
| March 2018 | NULL | NULL |
| February 2018 | 1 | 3 |
| January 2018 | 1 | 2 |
| December 2017 | 1 | 1 |
| October 2017 | NULL | NULL |
| September 2017 | 1 | 4 |
| August 2017 | 1 | 3 |
| July 2017 | 1 | 2 |
| June 2017 | 1 | 1 |
| May 2017 | NULL | NULL |
| April 2017 | 1 | 1 |
| March 2017 | NULL | NULL |
| February 2017 | 1 | 2 |
| January 2017 | 1 | 1 |
+----------------+-----------------+---------+
Thank you!
A solution is to use a ROW_NUMBER for all records, and subtract the ROW_NUMBER value of the most recent NULL record before each date.
Setup:
IF OBJECT_ID('tempdb..#Test') IS NOT NULL
DROP TABLE #Test
CREATE TABLE #Test (
Date DATE,
Flag BIT)
INSERT INTO #Test (
Date,
Flag)
VALUES
('2019-09-01', NULL),
('2019-08-01', NULL),
('2019-07-01', 1),
('2019-06-01', NULL),
('2019-05-01', 1),
('2019-04-01', NULL),
('2019-03-01', NULL),
('2019-02-01', NULL),
('2019-01-01', 1),
('2018-12-01', NULL),
('2018-11-01', 1),
('2018-10-01', NULL),
('2018-09-01', 1),
('2018-08-01', 1),
('2018-07-01', 1),
('2018-06-01', NULL),
('2018-05-01', 1),
('2018-04-01', 1),
('2018-03-01', 1),
('2018-02-01', 1),
('2018-01-01', NULL)
Solution:
;WITH DataWithRowNumber AS
(
    SELECT
        T.*,
        RowNumber = -1 + ROW_NUMBER() OVER (ORDER BY T.Date)
    FROM
        #Test AS T
)
SELECT
    D.Date,
    D.Flag,
    D.RowNumber,
    M.MaxPreviousNullRowNumber,
    RowNumberRest = D.RowNumber - M.MaxPreviousNullRowNumber,
    Counter = CASE WHEN D.Flag IS NOT NULL THEN D.RowNumber - M.MaxPreviousNullRowNumber END
FROM
    DataWithRowNumber AS D
    OUTER APPLY (
        SELECT
            MaxPreviousNullRowNumber = MAX(R.RowNumber)
        FROM
            DataWithRowNumber AS R
        WHERE
            R.Date < D.Date AND
            R.Flag IS NULL) AS M
ORDER BY
    D.RowNumber DESC
Result:
+------------+------+-----------+--------------------------+---------------+---------+
| Date | Flag | RowNumber | MaxPreviousNullRowNumber | RowNumberRest | Counter |
+------------+------+-----------+--------------------------+---------------+---------+
| 2019-09-01 | NULL | 20 | 19 | 1 | NULL |
| 2019-08-01 | NULL | 19 | 17 | 2 | NULL |
| 2019-07-01 | 1 | 18 | 17 | 1 | 1 |
| 2019-06-01 | NULL | 17 | 15 | 2 | NULL |
| 2019-05-01 | 1 | 16 | 15 | 1 | 1 |
| 2019-04-01 | NULL | 15 | 14 | 1 | NULL |
| 2019-03-01 | NULL | 14 | 13 | 1 | NULL |
| 2019-02-01 | NULL | 13 | 11 | 2 | NULL |
| 2019-01-01 | 1 | 12 | 11 | 1 | 1 |
| 2018-12-01 | NULL | 11 | 9 | 2 | NULL |
| 2018-11-01 | 1 | 10 | 9 | 1 | 1 |
| 2018-10-01 | NULL | 9 | 5 | 4 | NULL |
| 2018-09-01 | 1 | 8 | 5 | 3 | 3 |
| 2018-08-01 | 1 | 7 | 5 | 2 | 2 |
| 2018-07-01 | 1 | 6 | 5 | 1 | 1 |
| 2018-06-01 | NULL | 5 | 0 | 5 | NULL |
| 2018-05-01 | 1 | 4 | 0 | 4 | 4 |
| 2018-04-01 | 1 | 3 | 0 | 3 | 3 |
| 2018-03-01 | 1 | 2 | 0 | 2 | 2 |
| 2018-02-01 | 1 | 1 | 0 | 1 | 1 |
| 2018-01-01 | NULL | 0 | NULL | NULL | NULL |
+------------+------+-----------+--------------------------+---------------+---------+
Ryan, you need to implement a SQL running total here; please check this link:
https://codingsight.com/calculating-running-total-with-over-clause-and-partition-by-clause-in-sql-server/
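The accepted "subtract the last NULL's row number" idea can be sketched in a few lines of plain Python (with an invented flag sequence, oldest row first), which makes the reset behaviour easy to verify:

```python
# Invented flag column, oldest row first: 1 = target hit, None = missed.
flags = [None, None, 1, None, 1, None, 1, 1, 1]

last_null = -1          # row number of the most recent NULL seen so far
counters = []
for rn, flag in enumerate(flags):
    if flag is None:
        last_null = rn  # a NULL resets the streak
        counters.append(None)
    else:
        # distance from the last NULL = position within the current streak
        counters.append(rn - last_null)

print(counters)  # [None, None, 1, None, 1, None, 1, 2, 3]
```

Each run of 1s counts up from 1 and every NULL resets the counter, which is exactly what the ROW_NUMBER / OUTER APPLY query computes set-wise.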

Grouping by a column to compare values between similar rows

I'm trying to turn this
+----+---------+-------------------+-----------+
| id | year | desc | amount |
+----+---------+-------------------+-----------+
| 1 | 2017 | car | 500 |
| 2 | 2017 | car | 550 |
| 1 | 2018 | car | 490 |
| 2 | 2018 | car | 550 |
| 1 | 2017 | house | 200 |
| 2 | 2017 | house | 300 |
| 1 | 2018 | house | 210 |
| 2 | 2018 | house | 320 |
| 1 | 2019 | house | 290 |
| 2 | 2019 | house | 325 |
+----+---------+-------------------+-----------+
Into something like this
+----+---------+---------+-------------------+-----------+-----------+
| id | year_0 | year_1 | desc | amount_0 | amount_1 |
+----+---------+---------+-------------------+-----------+-----------+
| 1 | 2017 | 2018 | car | 500 | 490 |
| 2 | 2017 | 2018 | car | 550 | 550 |
| 1 | 2017 | 2018 | house | 200 | 210 |
| 2 | 2017 | 2018 | house | 300 | 320 |
+----+---------+---------+-------------------+-----------+-----------+
But I'm having difficulty getting the two years and two amounts grouped by description.
You can achieve the result with a self-join:
SELECT A.id, A.year AS year_0, B.year AS year_1, A.[desc], A.amount AS amount_0, B.amount AS amount_1
FROM
    (SELECT * FROM YourTable WHERE year = DATEPART(year, GETDATE()) - 1) AS A
INNER JOIN
    (SELECT * FROM YourTable WHERE year = DATEPART(year, GETDATE())) AS B
    ON A.id = B.id AND A.[desc] = B.[desc]
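Here is a sqlite3 sketch of the same self-join pivot on the question's data, with the two years hard-coded instead of derived from GETDATE() (which SQLite does not have), and the `desc` column renamed `desc_` to avoid the reserved word:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INT, year INT, desc_ TEXT, amount INT)")
cur.executemany("INSERT INTO t VALUES (?,?,?,?)", [
    (1, 2017, "car", 500), (2, 2017, "car", 550),
    (1, 2018, "car", 490), (2, 2018, "car", 550),
    (1, 2017, "house", 200), (1, 2018, "house", 210),
])

# Join each (id, desc) row of the earlier year to its later-year twin,
# so the two amounts land side by side on one row.
rows = cur.execute("""
    SELECT a.id, a.year, b.year, a.desc_, a.amount, b.amount
    FROM (SELECT * FROM t WHERE year = 2017) a
    JOIN (SELECT * FROM t WHERE year = 2018) b
      ON a.id = b.id AND a.desc_ = b.desc_
    ORDER BY a.desc_, a.id
""").fetchall()
print(rows)
```

Rows that exist in only one of the two years (like the 2019 house rows in the question) simply fall out of the inner join, matching the expected output.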

Filtering after a GROUP BY produces a different outcome than in MySQL

I have the following table, from which I am trying to extract all cust_id values that bought an item for the first time in January.
I found a way in MySQL, but I'm working with Hive and it doesn't work there.
Consider this table:
| cust_id | created | year | month | item |
|---------|---------------------|------|-------|------|
| 100 | 2017-01-01 19:20:00 | 2017 | 01 | ABC |
| 100 | 2017-01-01 19:20:00 | 2017 | 01 | DEF |
| 100 | 2017-01-08 22:45:00 | 2017 | 01 | GHI |
| 100 | 2017-08-03 08:01:00 | 2017 | 08 | JKL |
| 100 | 2017-01-01 21:23:00 | 2017 | 01 | MNO |
| 130 | 2016-12-06 06:42:00 | 2016 | 12 | PQR |
| 140 | 2017-01-21 15:01:00 | 2017 | 01 | STU |
| 130 | 2017-01-29 13:20:00 | 2017 | 01 | VWX |
| 140 | 2017-04-10 09:15:00 | 2017 | 04 | YZZ |
With the following query, it works:
SELECT
    cust_id,
    year,
    month,
    MIN(STR_TO_DATE(created, '%Y-%m-%d %H:%i:%s')) AS min_date
FROM t1
GROUP BY cust_id
HAVING year = '2017'
   AND month = '01'
And it returns this table:
| cust_id | year | month | min_date |
|---------|------|-------|---------------------|
| 100 | 2017 | 01 | 2017-01-01 19:20:00 |
| 140 | 2017 | 01 | 2017-01-21 15:01:00 |
But in Hive, I cannot filter the fields year and month with HAVING if they have not been grouped by previously. In other words, the previous query fails.
Instead, the following runs but doesn't produce the expected result:
SELECT
    cust_id,
    year,
    month,
    MIN(unix_timestamp(created, 'yyyy-MM-dd HH:mm:ss')) AS min_date
FROM t1
GROUP BY cust_id, year, month
HAVING year = '2017'
   AND month = '01'
cust_id 130 shows up even though its first purchase happened in December 2016:
| cust_id | year | month | min_date |
|---------|------|-------|---------------------|
| 100 | 2017 | 01 | 2017-01-01 19:20:00 |
| 130 | 2017 | 01 | 2017-01-29 13:20:00 |
| 140 | 2017 | 01 | 2017-01-21 15:01:00 |
Here is the fiddle: SQL fiddle
Thank you
Your MySQL query doesn't really work, even if it runs. Never have "bare" columns in the GROUP BY, HAVING, or ORDER BY of an aggregation query: every column that is not in the GROUP BY should be an argument to an aggregation function. In your case, year and month fall into this category.
What you appear to want in either database is something like this:
SELECT cust_id
FROM t1
GROUP BY cust_id
HAVING MIN(created) >= '2017-01-01' AND
MIN(created) < '2017-02-01';
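This filter-on-the-aggregate pattern is portable; here is a sqlite3 sketch with a subset of the question's rows showing that cust_id 130 (first purchase December 2016) is correctly excluded:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t1 (cust_id INT, created TEXT)")
cur.executemany("INSERT INTO t1 VALUES (?, ?)", [
    (100, "2017-01-01 19:20:00"), (100, "2017-08-03 08:01:00"),
    (130, "2016-12-06 06:42:00"), (130, "2017-01-29 13:20:00"),
    (140, "2017-01-21 15:01:00"),
])

# Filter on the aggregated minimum, not on bare year/month columns:
# a customer qualifies only if their overall first purchase is in Jan 2017.
first_timers = cur.execute("""
    SELECT cust_id
    FROM t1
    GROUP BY cust_id
    HAVING MIN(created) >= '2017-01-01' AND MIN(created) < '2017-02-01'
    ORDER BY cust_id
""").fetchall()
print(first_timers)  # [(100,), (140,)] -- 130 is excluded
```

Because MIN(created) is computed over all of a customer's rows before the HAVING filter is applied, a pre-2017 purchase disqualifies the customer, which is what the per-(year, month) GROUP BY version got wrong.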