Merge rows in SQL based on values in 2 tables

I have a table with travel details. Details are getting saved in a distributed manner, and I need to merge the rows based on Source and Destination. If my Source is A and my final Destination is D, I need to merge all 3 rows into 1 with the sum of time and distance. Here is an example.
Table #1: trip details
CarID Source Destination Distance Time Date
-------------------------------------------------
1     A      P           10       1    1 Jan 2022
1     P      R           20       2    1 Jan 2022
1     R      D           30       3    1 Jan 2022
2     S      A           20       1    1 Jan 2022
2     A      F           10       2    1 Jan 2022
2     F      G           30       3    1 Jan 2022
2     S      A           10       1    2 Jan 2022
Table #2: TravelPlan
CarID Source Destination Date
------------------------------------
1     A      D           1 Jan 2022
2     S      G           1 Jan 2022
2     S      A           2 Jan 2022
Output needed:
CarID Source Destination Distance Time Date
-------------------------------------------------
1     A      D           60       6    1 Jan 2022
2     S      G           60       6    1 Jan 2022
2     S      A           60       6    2 Jan 2022
I tried using concatenation, but I was not able to apply conditions. I am not sure how to combine rows of one table based on the values of another. BETWEEN is also not giving me the desired output.

Using your example data to construct DDL and DML (which is really useful for questions like this):
DECLARE @TripDetails TABLE (CarID INT, Source NVARCHAR(20), Destination NVARCHAR(20), Distance DECIMAL(5,2), Time DECIMAL(5,2), Date DATE);
INSERT INTO @TripDetails (CarID, Source, Destination, Distance, Time, Date) VALUES
(1, 'A', 'P', 10, 1, '1 Jan 2022'),
(1, 'P', 'R', 20, 2, '1 Jan 2022'),
(1, 'R', 'D', 30, 3, '1 Jan 2022'),
(2, 'S', 'A', 20, 1, '1 Jan 2022'),
(2, 'A', 'F', 10, 2, '1 Jan 2022'),
(2, 'F', 'G', 30, 3, '1 Jan 2022'),
(2, 'S', 'A', 10, 1, '2 Jan 2022');
DECLARE @TripPlan TABLE (CarID INT, Source NVARCHAR(20), Destination NVARCHAR(20), Date DATE);
INSERT INTO @TripPlan (CarID, Source, Destination, Date) VALUES
(1, 'A', 'D', '1 Jan 2022'),
(2, 'S', 'G', '1 Jan 2022'),
(2, 'S', 'A', '2 Jan 2022');
This then becomes a fairly straightforward JOIN and GROUP BY operation.
SELECT tp.CarID, tp.Source, tp.Destination, tp.Date, SUM(t.Distance) AS Distance, SUM(t.Time) AS Time
FROM @TripPlan tp
INNER JOIN @TripDetails t
    ON tp.CarID = t.CarID
    AND tp.Date = t.Date
GROUP BY tp.CarID, tp.Source, tp.Destination, tp.Date
CarID Source Destination Date Distance Time
--------------------------------------------------------
1 A D 2022-01-01 60.00 6.00
2 S A 2022-01-02 10.00 1.00
2 S G 2022-01-01 60.00 6.00
To deviate from the question a little:
I changed from the obvious data types for both Distance and Time, as I could see both values needing to be expressed as decimals. There is no indication in the example data as to what the units for these columns are.
Detailing the units in your column names is a good idea; it's pretty much self-documenting that way. If we're recording Time in minutes, say so in the column name: TimeMinutes; if we're recording Distance in kilometres: DistanceKM.
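As a cross-check, the same JOIN + GROUP BY can be run end-to-end with Python's built-in sqlite3 module. This is only a sketch of the logic, not the answer's T-SQL: SQLite has no table variables, and ISO date strings are assumed in place of the `1 Jan 2022` literals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TripDetails (CarID INT, Source TEXT, Destination TEXT,
                              Distance REAL, Time REAL, Date TEXT);
    CREATE TABLE TripPlan (CarID INT, Source TEXT, Destination TEXT, Date TEXT);
""")
conn.executemany(
    "INSERT INTO TripDetails VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 'A', 'P', 10, 1, '2022-01-01'), (1, 'P', 'R', 20, 2, '2022-01-01'),
     (1, 'R', 'D', 30, 3, '2022-01-01'), (2, 'S', 'A', 20, 1, '2022-01-01'),
     (2, 'A', 'F', 10, 2, '2022-01-01'), (2, 'F', 'G', 30, 3, '2022-01-01'),
     (2, 'S', 'A', 10, 1, '2022-01-02')])
conn.executemany(
    "INSERT INTO TripPlan VALUES (?, ?, ?, ?)",
    [(1, 'A', 'D', '2022-01-01'), (2, 'S', 'G', '2022-01-01'),
     (2, 'S', 'A', '2022-01-02')])

# Every leg of a planned trip shares the plan's CarID and Date, so the
# join fans out to all legs and SUM() rolls them back up to one row.
rows = conn.execute("""
    SELECT tp.CarID, tp.Source, tp.Destination, tp.Date,
           SUM(t.Distance) AS Distance, SUM(t.Time) AS Time
    FROM TripPlan tp
    INNER JOIN TripDetails t ON tp.CarID = t.CarID AND tp.Date = t.Date
    GROUP BY tp.CarID, tp.Source, tp.Destination, tp.Date
    ORDER BY tp.CarID, tp.Date
""").fetchall()
for row in rows:
    print(row)
```

Note that the 2 Jan row comes out as 10/1, matching the answer's result set rather than the question's expected 60/6, since only one trip row exists for that date.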

Related

How to transform data into daily snapshot given the two date columns?

I have product data in my table which looks similar to this:
product_id user_id sales_start sales_end  quantity
---------- ------- ----------- ---------- --------
1          12      2022-01-01  2022-02-01 15
2          234     2022-11-01  2022-12-31 123
I want to transform the table into a daily snapshot so that it would look something like this:
product_id user_id quantity date
---------- ------- -------- ----------
1          12      15       2022-01-01
1          12      15       2022-01-02
1          12      15       2022-01-03
...        ...     ...      ...
2          234     123      2022-12-31
I know how to do a similar thing in Pandas, but I need to do it within AWS Athena.
I thought of getting the date interval and unnesting it, but I am struggling with mapping the generated dates back to the rows properly.
Any ideas on how to transform data?
This will generate the sequence you need:
SELECT product_id, user_id, quantity, date(date) AS date
FROM (
    VALUES
        (1, 12, DATE '2022-01-01', DATE '2022-02-01', 15),
        (2, 234, DATE '2022-11-01', DATE '2022-12-31', 123)
) AS src (product_id, user_id, sales_start, sales_end, quantity),
UNNEST(sequence(sales_start, sales_end, interval '1' day)) AS t(date)
You can use sequence to generate the date range and then unnest it:
-- sample data
with dataset(product_id, user_id, sales_start, sales_end, quantity) as (
values (1, 12 , date '2022-01-01', date '2022-01-05', 15), -- short date ranges
(2, 234, date '2022-11-01', date '2022-11-03', 123) -- short date ranges
)
-- query
select product_id, user_id, quantity, date
from dataset,
unnest(sequence(sales_start, sales_end, interval '1' day)) as t(date);
Output:
product_id user_id quantity date
---------- ------- -------- ----------
1          12      15       2022-01-01
1          12      15       2022-01-02
1          12      15       2022-01-03
1          12      15       2022-01-04
1          12      15       2022-01-05
2          234     123      2022-11-01
2          234     123      2022-11-02
2          234     123      2022-11-03
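For comparison, the same inclusive date-range expansion can be sketched in plain Python. This only illustrates the sequence() + UNNEST logic; it is not Athena code, and the short sample ranges from the second answer are reused.

```python
from datetime import date, timedelta

rows = [(1, 12, date(2022, 1, 1), date(2022, 1, 5), 15),
        (2, 234, date(2022, 11, 1), date(2022, 11, 3), 123)]

snapshot = []
for product_id, user_id, start, end, qty in rows:
    day = start
    while day <= end:  # inclusive, like sequence(sales_start, sales_end, 1 day)
        snapshot.append((product_id, user_id, qty, day.isoformat()))
        day += timedelta(days=1)

for row in snapshot:
    print(row)
```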

How can I build a date from 3 columns of float datatype?

I am using SQL Server 2014 and I have a table (t1) which has the following 3 columns:
DAY MONTH YEAR
2 11 2021
1 10 2021
12 10 2021
22 09 2021
All 3 columns have a float datatype.
What would be the T-SQL code to build a date from these 3 columns in the following format (YYYY-MM-DD)?
CREATEDDATE
2021-11-02
2021-10-01
2021-10-12
2021-09-22
I have tried looking around for other questions similar to mine but I could not find a solution.
Any help would be appreciated.
The DATEFROMPARTS() function is an option (... returns a date value that maps to the specified year, month, and day values):
SELECT DATEFROMPARTS([YEAR], [MONTH], [DAY]) AS CREATEDATE
FROM (VALUES
(2, 11, 2021),
(1, 10, 2021),
(12, 10, 2021),
(22, 09, 2021)
) t ([DAY], [MONTH], [YEAR])
Or with the actual table:
SELECT DATEFROMPARTS([YEAR], [MONTH], [DAY]) AS CREATEDATE
FROM t1
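Outside SQL Server, the same construction can be sketched in Python, where int() plays the role of the float-to-int conversion DATEFROMPARTS needs; the float sample values are assumed from the question's table.

```python
from datetime import date

# float day/month/year values, as in the question's table
rows = [(2.0, 11.0, 2021.0), (1.0, 10.0, 2021.0),
        (12.0, 10.0, 2021.0), (22.0, 9.0, 2021.0)]

# int() truncates each float, then date() assembles and formats the value
created = [date(int(y), int(m), int(d)).isoformat() for d, m, y in rows]
print(created)
```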

Select hours as columns from Oracle table

I am working with an Oracle database table that is structured like this:
TRANS_DATE TRANS_HOUR_ENDING TRANS_HOUR_SUFFIX READING
1/1/2021 1 1 100
1/1/2021 2 1 105
... ... ... ...
1/1/2021 24 1 115
The TRANS_HOUR_SUFFIX is only used to track hourly readings on days when daylight savings time ends (when there could be 2 hours with the same TRANS_HOUR value). This column is the bane of this database's design; however, I'm trying to select this data in a certain way. We need a report that columnizes this data based on the hour. It would therefore be structured like this (the last row shows a day on which DST ends):
TRANS_DATE HOUR_1 HOUR_2_1 HOUR_2_2 ... HOUR_24
1/1/2021 100 105 0 ... 115
1/2/2021 112 108 0 ... 135
... ... ... ... ... ...
11/7/2021 117 108 107 ... 121
I have done something like this before with a PIVOT; however, in this case I'm having trouble determining how to account for the suffix. When DST ends, we have to account for that extra hour. I know we could select each hourly value individually with DECODE or CASE statements, but that is messy code. Is there a cleaner way to do this?
You can include multiple source columns in the pivot for() and in() clauses, so you could do:
select *
from (
select trans_date,
trans_hour_ending,
trans_hour_suffix,
reading
from your_table
)
pivot (max(reading) for (trans_hour_ending, trans_hour_suffix)
in ((1, 1) as hour_1, (2, 1) as hour_2_1, (2, 2) as hour_2_2, (3, 1) as hour_3,
-- snip
(23, 1) as hour_23, (24, 1) as hour_24))
order by trans_date;
where every hour has a (24, 1) tuple, and the DST-relevant hour has an extra (2, 2) tuple.
If you don't have rows for every hour - which you don't appear to have from the very brief sample data, at least for suffix 2 on non-DST days - then you will get null results for those, but you can replace them with zeros:
select trans_date,
coalesce(hour_1, 0) as hour_1,
coalesce(hour_2_1, 0) as hour_2_1,
coalesce(hour_2_2, 0) as hour_2_2,
coalesce(hour_3, 0) as hour_3,
-- snip
coalesce(hour_23, 0) as hour_23,
coalesce(hour_24, 0) as hour_24
from (
select trans_date,
trans_hour_ending,
trans_hour_suffix,
reading
from your_table
)
pivot (max(reading) for (trans_hour_ending, trans_hour_suffix)
in ((1, 1) as hour_1, (2, 1) as hour_2_1, (2, 2) as hour_2_2, (3, 1) as hour_3,
-- snip
(23, 1) as hour_23, (24, 1) as hour_24))
order by trans_date;
which with slightly expanded sample data gets:
TRANS_DATE HOUR_1 HOUR_2_1 HOUR_2_2 HOUR_3 HOUR_23 HOUR_24
---------- ---------- ---------- ---------- ---------- ---------- ----------
2021-01-01 100 105 0 0 0 115
2021-01-02 112 108 0 0 0 135
2021-11-07 117 108 107 0 0 121
This is a bit long-winded when you have to include all 25 columns everywhere, but to avoid that you'd have to do a dynamic pivot.
Like I said in my comment, if you can format it with an additional row, I would recommend just having a row for the extra hour. Every other day would look normal. The query to do it would look like this:
CREATE TABLE READINGS
(
TRANS_DATE DATE,
TRANS_HOUR INTEGER,
TRANS_SUFFIX INTEGER,
READING INTEGER
);
INSERT INTO readings
SELECT TO_DATE('01/01/2021', 'MM/DD/YYYY'), 1, 1, 100 FROM DUAL UNION ALL
SELECT TO_DATE('01/01/2021', 'MM/DD/YYYY'), 2, 1, 100 FROM DUAL UNION ALL
SELECT TO_DATE('11/07/2021', 'MM/DD/YYYY'), 1, 1, 200 FROM DUAL UNION ALL
SELECT TO_DATE('11/07/2021', 'MM/DD/YYYY'), 1, 2, 300 FROM DUAL UNION ALL
SELECT TO_DATE('11/07/2021', 'MM/DD/YYYY'), 2, 1, 500 FROM DUAL UNION ALL
SELECT TO_DATE('11/07/2021', 'MM/DD/YYYY'), 2, 2, 350 FROM DUAL;
SELECT TRANS_DATE||DECODE(MAX(TRANS_SUFFIX) OVER (PARTITION BY TRANS_DATE), 1, NULL, 2, ' - '||TRANS_SUFFIX) AS TRANS_DATE,
HOUR_1, HOUR_2, /*...*/ HOUR_24
FROM readings
PIVOT (MAX(READING) FOR TRANS_HOUR IN (1 AS HOUR_1, 2 AS HOUR_2, /*...*/ 24 AS HOUR_24));
This would give the following results (sorry, I can't get dbfiddle to work):
TRANS_DATE    HOUR_1 HOUR_2 HOUR_24
------------- ------ ------ -------
01-JAN-21     100    100    -
07-NOV-21 - 1 200    500    -
07-NOV-21 - 2 300    350    -
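The shape of the (hour, suffix) pivot can also be sketched language-agnostically in Python: readings keyed on the (hour, suffix) tuple, with missing slots defaulting to 0 as the COALESCE version does. This is abbreviated to three hour columns and uses made-up sample readings.

```python
# Readings keyed on (TRANS_HOUR_ENDING, TRANS_HOUR_SUFFIX); (2, 2) is the
# DST repeat of hour 2. Only three of the 25 output columns are shown.
readings = [("2021-01-01", 1, 1, 100), ("2021-01-01", 2, 1, 105),
            ("2021-11-07", 2, 1, 108), ("2021-11-07", 2, 2, 107)]

pivoted = {}
for trans_date, hour, suffix, reading in readings:
    pivoted.setdefault(trans_date, {})[(hour, suffix)] = reading

# .get(..., 0) fills absent slots with zero, like the COALESCE wrapper
wide = {d: [cols.get((1, 1), 0), cols.get((2, 1), 0), cols.get((2, 2), 0)]
        for d, cols in pivoted.items()}
print(wide)
```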

Creating ranged mapping records from yearly entries

I have a table that maps an ID to an Associated ID (AssocID) over time, and the database is built having one record per year. I would like to roll up the table to have one record for each period of association.
Current Example:
ID AssocID Start End
1 a 2000 2001
1 a 2001 2002
1 b 2002 2003
1 b 2003 2004
1 a 2004 2005
...
1 a 2017 2018
2 c 2000 2001
2 c 2001 2002
2 d 2002 2003
...
2 d 2017 2018
and I am trying to make it look more like this:
ID AssocID Start End
1 a 2000 2002
1 b 2002 2004
1 a 2004 2018
2 c 2000 2002
2 d 2002 2018
My main problem is that ID '1' goes back to AssocID 'a' after some time, so using DISTINCT (ID, AssocID) and MIN(Start) misses the second period in which ID '1' maps to AssocID 'a'.
Any help appreciated :)
You can use this.
-- Sample Data
DECLARE @MyTable TABLE (ID INT, AssocID VARCHAR(10), Start INT, [End] INT)
INSERT INTO @MyTable VALUES
(1, 'a', 2000, 2001),
(1, 'a', 2001, 2002),
(1, 'b', 2002, 2003),
(1, 'b', 2003, 2004),
(1, 'a', 2004, 2005),
(1, 'a', 2017, 2018),
(2, 'c', 2000, 2001),
(2, 'c', 2001, 2002),
(2, 'd', 2002, 2003),
(2, 'd', 2017, 2018)
-- Query
SELECT ID, AssocID, MIN(Start) [Start], MAX([End]) [End] FROM
( SELECT *,
GRP = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Start) - ROW_NUMBER() OVER(PARTITION BY ID, AssocID ORDER BY Start)
FROM @MyTable ) T
GROUP BY ID, AssocID, GRP
ORDER BY ID, [Start]
Result:
ID AssocID Start End
----------- ---------- ----------- -----------
1 a 2000 2002
1 b 2002 2004
1 a 2004 2018
2 c 2000 2002
2 d 2002 2018
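Since SQLite also supports window functions (version 3.25 and later), the ROW_NUMBER-difference trick above can be verified from Python's sqlite3; the [End] column is renamed End_ here just to avoid quoting a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# [End] renamed End_ so the identifier needs no quoting in SQLite
conn.execute("CREATE TABLE MyTable (ID INT, AssocID TEXT, Start INT, End_ INT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [(1, 'a', 2000, 2001), (1, 'a', 2001, 2002), (1, 'b', 2002, 2003),
     (1, 'b', 2003, 2004), (1, 'a', 2004, 2005), (1, 'a', 2017, 2018),
     (2, 'c', 2000, 2001), (2, 'c', 2001, 2002), (2, 'd', 2002, 2003),
     (2, 'd', 2017, 2018)])

# Rows of the same island share a constant difference between the two
# row numbers, so grouping on that difference collapses each island.
islands = conn.execute("""
    SELECT ID, AssocID, MIN(Start), MAX(End_) FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Start)
             - ROW_NUMBER() OVER (PARTITION BY ID, AssocID ORDER BY Start) AS grp
        FROM MyTable) t
    GROUP BY ID, AssocID, grp
    ORDER BY ID, MIN(Start)
""").fetchall()
for row in islands:
    print(row)
```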
This is an example of a gaps and islands problem. You need to first identify the start of each group (grp_start) and then group by each grp to find the min/max:
declare @T table (ID int, AssocID varchar(3), Start int, [End] int)
insert into @T (ID, AssocID, Start, [End]) values
(1, 'a', 2000, 2001), (1, 'a', 2001, 2002), (1, 'b', 2002, 2003), (1, 'b', 2003, 2004),
(1, 'a', 2004, 2005), (1, 'a', 2005, 2006), (1, 'a', 2006, 2007), (1, 'a', 2007, 2008),
(1, 'a', 2008, 2009), (1, 'a', 2009, 2010), (1, 'a', 2010, 2011), (1, 'a', 2011, 2012),
(1, 'a', 2012, 2013), (1, 'a', 2013, 2014), (1, 'a', 2014, 2015), (1, 'a', 2015, 2016),
(1, 'a', 2016, 2017), (1, 'a', 2017, 2018), (2, 'c', 2000, 2001), (2, 'c', 2001, 2002),
(2, 'd', 2002, 2003), (2, 'd', 2017, 2018)
select
ID,
AssocID,
min(Start),
max([End])
from
(
select *,
sum([grp_start]) over (partition by ID, AssocID order by [End]) as grp
from
(
select *,
case
when
lag([End]) over (partition by ID, AssocID order by [End]) <> [Start]
then 1 else 0
end as [grp_start]
from @T
) as T
)as T
group by ID, AssocID, grp
order by ID, min(Start), max([End])

How to retrieve data using SQL from below table (not using PL/SQL)

Source table, using DB2 or Oracle; plain SQL only, no procedural language.
Name Date Amt Reason
----- ----- ------- ---------
A 10 Nov 200 Overdue
A 20 Nov 500 EMT
B 6 Dec 300 Overdue
B 3 Dec 100 EMT
The result should have one row per unique Name, with the minimum Date and the maximum Amt. If 'Overdue' appears in any of that Name's rows, Reason must be 'Overdue'; otherwise print the other value. See the result table shown below.
Name Date Amt Reason
----- ------ ----- --------
A 10 Nov 500 Overdue
B 3 Dec 300 Overdue
with input as (
select 'A' name , '10 Nov' "date", 200 amt, 'Overdue' reason from dual
union all select 'A', '20 Nov', 500, 'EMT' from dual
union all select 'B', '6 Dec', 300, 'Overdue' from dual
union all select 'B', '3 Dec', 100, 'EMT' from dual)
select name, min("date") "date", max(amt),
substr(max(decode(reason, 'Overdue', '2'||reason, '1'||reason) ), 2) reason
from input
group by name
For minimum date you can use:
min(to_date(decode(length("date"), 5, '0', '')||"date", 'dd MON'))
instead of min("date")
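Both tricks, tagging Reason with a sortable prefix so that the maximum prefers 'Overdue', and parsing single-digit days so dates compare correctly, can be sketched in Python rather than DB2/Oracle SQL:

```python
from datetime import datetime

rows = [('A', '10 Nov', 200, 'Overdue'), ('A', '20 Nov', 500, 'EMT'),
        ('B', '6 Dec', 300, 'Overdue'), ('B', '3 Dec', 100, 'EMT')]

out = {}
for name, d, amt, reason in rows:
    parsed = datetime.strptime(d, '%d %b')   # handles '6 Dec' and '06 Dec'
    tagged = ('2' if reason == 'Overdue' else '1') + reason  # '2' sorts above '1'
    if name not in out:
        out[name] = [d, parsed, amt, tagged]
    else:
        rec = out[name]
        if parsed < rec[1]:                  # min("date")
            rec[0], rec[1] = d, parsed
        rec[2] = max(rec[2], amt)            # max(amt)
        rec[3] = max(rec[3], tagged)         # max of the tagged reason

# strip the priority prefix off the winning reason, like substr(..., 2)
result = {name: (rec[0], rec[2], rec[3][1:]) for name, rec in out.items()}
print(result)
```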