Merge table rows based on date and condition - SQL

I have the following table:
DROP TABLE IF EXISTS t
CREATE TABLE t
(
id INT IDENTITY PRIMARY KEY,
dt datetime,
type int,
grp int,
typecol1 varchar(10),
typecol2 varchar(10),
typecol3 varchar(10),
typecol4 varchar(10)
)
INSERT INTO t (dt,type,grp,typecol1,typecol2,typecol3,typecol4)
VALUES
('2019-01-15',1,1,'A',null,null,null),
('2019-01-15',2,2,null,'B',null,null),
('2019-01-15',3,3,null,null,'C',null),
('2019-01-15',4,4,null,null,null,'D'),
('2019-02-15',1,1,'AA',null,null,null),
('2019-02-15',4,2,null,null,null,'DD'),
('2019-03-15',3,1,null,null,'CCC',null),
('2019-04-15',2,1,null,'BBBB',null,NULL);
In this table, type will be 1, 2, 3, 4. The dt and type columns together form a composite key.
I need to merge rows that share the same date into a single row, based only on the condition below:
if the date is the same and
type = 1, then merge into typecol1
type = 2, then merge into typecol2
type = 3, then merge into typecol3
type = 4, then merge into typecol4
and the grp column is based on a running count of the date.

Try GROUP BY
FIDDLE DEMO
SELECT dt, MAX(typecol1) typecol1, MAX(typecol2) typecol2, MAX(typecol3) typecol3,
MAX(typecol4) typecol4
FROM t
GROUP BY dt
Output
dt                  | typecol1 | typecol2 | typecol3 | typecol4
--------------------+----------+----------+----------+---------
15/01/2019 00:00:00 | A        | B        | C        | D
15/02/2019 00:00:00 | AA       |          |          | DD
15/03/2019 00:00:00 |          |          | CCC      |
15/04/2019 00:00:00 |          | BBBB     |          |

You just need to group by dt with MAX() aggregation for the rest of the columns:
SELECT dt,MAX(typecol1) as typecol1,
MAX(typecol2) as typecol2,
MAX(typecol3) as typecol3,
MAX(typecol4) as typecol4
FROM t
GROUP BY dt
Demo
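The question also asks for grp as a running count of the date. Assuming grp should simply number the merged dates in ascending order (an assumption, not part of either answer above), a window function can be layered on top of the grouped query:
SELECT dt,
       ROW_NUMBER() OVER (ORDER BY dt) AS grp,  -- running count of the merged dates (assumed meaning)
       MAX(typecol1) AS typecol1,
       MAX(typecol2) AS typecol2,
       MAX(typecol3) AS typecol3,
       MAX(typecol4) AS typecol4
FROM t
GROUP BY dt;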


Compare a single-column row-set with another single-column row set in Oracle SQL

Is there any Oracle SQL operator or function that compares two result sets and tells whether they are exactly the same? Currently my idea is to use the MINUS operator in both directions, but I am looking for a better-performing solution. One result set is fixed (see below); the other depends on the records.
Very important: I am not allowed to change the schema or structure, so CREATE TABLE, CREATE TYPE, etc. are not allowed here. Also important: the solution must work on Oracle 11g.
The schema for SQL Fiddle is:
CREATE TABLE DETAILS (ID INT, MAIN_ID INT, VALUE INT);
INSERT INTO DETAILS VALUES (1,1,1);
INSERT INTO DETAILS VALUES (2,1,2);
INSERT INTO DETAILS VALUES (3,1,3);
INSERT INTO DETAILS VALUES (4,1,4);
INSERT INTO DETAILS VALUES (5,2,1);
INSERT INTO DETAILS VALUES (6,2,2);
INSERT INTO DETAILS VALUES (7,3,1);
INSERT INTO DETAILS VALUES (7,3,2);
This is my SQL query that does the job (it selects the MAIN_IDs of rows whose VALUEs are exactly the same as the given list's):
SELECT DISTINCT D.MAIN_ID FROM DETAILS D WHERE NOT EXISTS
(SELECT VALUE FROM DETAILS WHERE MAIN_ID=D.MAIN_ID
MINUS
SELECT * FROM TABLE(SYS.ODCINUMBERLIST(1, 2)))
AND NOT EXISTS
(SELECT * FROM TABLE(SYS.ODCINUMBERLIST(1, 2))
MINUS
SELECT VALUE FROM DETAILS WHERE MAIN_ID=D.MAIN_ID)
The SQL Fiddle link: http://sqlfiddle.com/#!4/25dde/7/0
If you use a collection (rather than a VARRAY) then you can aggregate the values into a collection and directly compare two collections:
CREATE TYPE int_list AS TABLE OF INT;
Then:
SELECT main_id
FROM details
GROUP BY main_id
HAVING CAST( COLLECT( value ) AS int_list ) = int_list( 1, 2 );
Outputs:
| MAIN_ID |
| ------: |
| 2 |
| 3 |
db<>fiddle here
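Since the question says CREATE TYPE is not allowed, a count-based check is another way to express the same set equality without creating any type. This is a sketch (not from the original answer) with the fixed list (1, 2) and its length 2 hard-coded:
SELECT main_id
FROM details
GROUP BY main_id
HAVING COUNT(CASE WHEN value IN (1, 2) THEN 1 END) = COUNT(*)                  -- no values outside the list
   AND COUNT(DISTINCT CASE WHEN value IN (1, 2) THEN value END) = 2;           -- every listed value present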
Update
Based on your expanded fiddle in comments, you can use:
SELECT B.ID
FROM   BUSINESS_DATA B
       INNER JOIN BUSINESS_NAME N
       ON ( B.NAME_ID = N.ID )
WHERE  N.NAME = 'B1'
AND    EXISTS (
         SELECT business_id
         FROM   ORDERS O
                LEFT OUTER JOIN TABLE(
                  SYS.ODCIDATELIST( DATE '2021-01-03', DATE '2020-04-07', DATE '2020-05-07' )
                ) d
                ON ( o.orderdate = d.COLUMN_VALUE )
         WHERE  O.BUSINESS_ID = B.ID
         GROUP BY business_id
         HAVING COUNT( CASE WHEN d.COLUMN_VALUE IS NULL THEN 1 END ) = 0
         AND    COUNT( DISTINCT o.orderdate )
                = ( SELECT COUNT( DISTINCT COLUMN_VALUE )
                    FROM TABLE( SYS.ODCIDATELIST( DATE '2021-01-03', DATE '2020-04-07', DATE '2020-05-07' ) ) )
       )
(Note: Do not implicitly create dates from strings; it will cause the query to fail, without there being any changes to the query text, if a user changes their NLS_DATE_FORMAT session parameter. Instead use TO_DATE with an appropriate format model or a DATE literal.)
db<>fiddle here
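For illustration (not from the original answer), here is a minimal example of the two safe forms the note refers to; both produce a DATE independent of NLS settings:
SELECT TO_DATE('2021-01-03', 'YYYY-MM-DD') AS d1,   -- explicit format model
       DATE '2021-01-03'                   AS d2    -- ANSI date literal
FROM dual;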

Select min date record from duplicates in table

Let's say I have this table "contract", which has duplicated records in the "End" column for the same ID.
ID | Begin      | End
---+------------+-----------
20 | 2016-01-01 | 9999-12-31
20 | 2020-01-01 | 9999-12-31
30 | 2018-01-01 | 2019-02-28
30 | 2019-03-01 | 9999-12-31
30 | 2020-02-01 | 9999-12-31
10 | 2019-01-01 | 2019-06-30
10 | 2019-07-01 | 2020-02-29
10 | 2020-03-01 | 9999-12-31
I want to get the oldest date in the "Begin" column for all the IDs that have duplicated records in the "End" column with the date "9999-12-31". So for this example I expect to get:
ID | Begin
---+-----------
20 | 2016-01-01
30 | 2019-03-01
I made an SQL script, but there should be a better way.
select ID, MIN([Begin]) from
(
    select * from contract m where exists
    (
        select 1 from contract v where v.[End] = '9999-12-31' and v.ID = m.ID
        having count(v.ID) = 2
    )
    and [End] = '9999-12-31'
) a
group by ID
If it is a big table, you really want to use EXISTS for finding duplicates because it will short-circuit. Here are two ways to use EXISTS that might help with what you are trying to do.
DROP TABLE IF EXISTS #Test;
CREATE TABLE #Test
(
ID INT NOT NULL
,[Begin] DATE NOT NULL
,[End] DATE NOT NULL
)
;
INSERT INTO #Test
VALUES
(20,'2016-01-01','9999-12-31')
,(20,'2020-01-01','9999-12-31')
,(30,'2018-01-01','2019-02-28')
,(30,'2019-03-01','9999-12-31')
,(30,'2020-02-01','9999-12-31')
,(10,'2019-01-01','2019-06-30')
,(10,'2019-07-01','2020-02-29')
,(10,'2020-03-01','9999-12-31')
;
--See all duplicates with OldestBegin for context
SELECT
TST.ID
,TST.[Begin]
,TST.[End]
,OldestBegin = MIN([Begin]) OVER (PARTITION BY TST.ID,TST.[End])
FROM #Test AS TST
WHERE EXISTS
(
SELECT 1
FROM #Test AS _TST
WHERE TST.ID = _TST.ID
AND TST.[End] = _TST.[End]
AND TST.[Begin] <> _TST.[Begin]
)
;
--Get only oldest duplicate
SELECT
TST.ID
,TST.[End]
,[Begin] = MIN([Begin])
FROM #Test AS TST
WHERE EXISTS
(
SELECT 1
FROM #Test AS _TST
WHERE TST.ID = _TST.ID
AND TST.[End] = _TST.[End]
AND TST.[Begin] <> _TST.[Begin]
)
GROUP BY
TST.ID
,TST.[End]
;
Perhaps this will help:
DECLARE #Tab TABLE(ID INT,[Begin] DATE,[End] DATE)
INSERT @Tab
VALUES
(20,'2016-01-01','9999-12-31')
,(20,'2020-01-01','9999-12-31')
,(30,'2018-01-01','2019-02-28')
,(30,'2019-03-01','9999-12-31')
,(30,'2020-02-01','9999-12-31')
,(10,'2019-01-01','2019-06-30')
,(10,'2019-07-01','2020-02-29')
,(10,'2020-03-01','9999-12-31')
;WITH cte AS(
SELECT *
FROM @Tab
WHERE [End] = '9999-12-31'
)
SELECT ID, MIN([Begin]) AS [Begin]
FROM cte
GROUP BY ID
HAVING COUNT(*) > 1
Try this:
WITH test AS (
    SELECT COUNT(*) AS cnt, MIN([Begin]) AS [Begin], ID
    FROM contract
    WHERE [End] = '9999-12-31'
    GROUP BY ID
    HAVING COUNT(*) > 1
)
SELECT ID, [Begin] FROM test
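A window-function variant of the same idea (a sketch, assuming SQL Server 2008 or later; not one of the original answers) reads the qualifying rows once and keeps the earliest Begin per ID only when more than one open-ended row exists:
SELECT ID, [Begin]
FROM (
    SELECT ID, [Begin],
           COUNT(*)     OVER (PARTITION BY ID)                    AS open_rows,  -- rows ending 9999-12-31 per ID
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [Begin])   AS rn          -- 1 = oldest Begin
    FROM contract
    WHERE [End] = '9999-12-31'
) AS x
WHERE open_rows > 1 AND rn = 1;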

How to add missing dates to the result table?

I have a table:
accdate (DATETIME) | value (INT)
-------------------+------------
The accdate column holds data at hour granularity; that is, the datetimes have the format YYYY-mm-dd HH:00:00. If I run SELECT * FROM mytable ORDER BY accdate ASC I get the table ordered by accdate. But mytable does not contain every possible date and hour between the first row and the last (hours are missing for the times my program is not running). I want default values for every date+hour combination between the first and the last row.
I know this can be solved with a LEFT JOIN against another table that contains all possible dates in that range. But how do I construct such a table in a SQL statement? It does not seem sensible to populate a table with dummy data if I can solve the problem within the query.
Example:
accdate (DATETIME) | value (INT)
---------------------+------------
2011-11-11 19:00:00 | 50
2011-11-11 20:00:00 | 53
2011-11-11 22:00:00 | 16
2011-11-12 06:00:00 | 15
2011-11-12 07:00:00 | 150
The hour 2011-11-11 21:00:00 and the range between 23:00 and 05:00 are missing. For these hours there should be a row in the result table (containing a 0 in the value column).
I hope you understand my problem. If something is unclear, please comment. Thank you.
With SQLite 3.8.3 or later, you can use a common table expression to generate values out of nothing:
WITH RECURSIVE AllHours(accdate)
AS (VALUES('2011-11-11 00:00:00')
    UNION ALL
    SELECT datetime(accdate, '+1 hour')
    FROM AllHours
    WHERE accdate < '2011-11-12 10:00:00')
SELECT AllHours.accdate,
       IFNULL(MyTable.value, 0) AS value  -- 0 for the missing hours, as requested
FROM AllHours
LEFT JOIN MyTable USING (accdate)
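If the start and end hours should come from the table itself rather than being hard-coded, the recursive CTE can be seeded from MIN and MAX. A sketch under that assumption, still SQLite 3.8.3 or later:
WITH RECURSIVE AllHours(accdate) AS (
    SELECT MIN(accdate) FROM MyTable                      -- first hour present in the data
    UNION ALL
    SELECT datetime(accdate, '+1 hour')
    FROM AllHours
    WHERE accdate < (SELECT MAX(accdate) FROM MyTable)    -- stop at the last hour present
)
SELECT AllHours.accdate,
       IFNULL(MyTable.value, 0) AS value
FROM AllHours
LEFT JOIN MyTable USING (accdate);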
The only way I can think of is a left join of the table with itself, adding 1 to the desired field in the join condition, and then a union of that result with the actual rows to complete the set:
Example setup:
CREATE TABLE tmp (
id INT IDENTITY,
number INT
);
-- insert some incomplete sequenced values
INSERT INTO tmp (number) VALUES(1);
INSERT INTO tmp (number) VALUES(3);
INSERT INTO tmp (number) VALUES(4);
Example query:
-- select your actual data
SELECT number
FROM tmp
UNION
-- select the missing data
SELECT a.number + 1
FROM tmp a
LEFT JOIN tmp b ON a.number + 1 = b.number
WHERE b.id IS NULL
-- order the complete set
ORDER BY number ASC;
This will not work if more than one value is missing between results (e.g., 1 and 4), but if your data only ever misses single hours between rows, this works like a charm.

Greatest Date group by TCP address

What I want: I'm having problems with a greatest-n-per-group problem. My group is a set of TCP addresses and the n is the date at which the table row was inserted into the database.
The problem: I'm currently getting all rows with TCP addresses that match my WHERE clause, rather than the one with the largest date per TCP address.
I'm trying to follow this example and failing: SQL Select only rows with Max Value on a Column.
Here's what my table looks like.
CREATE TABLE IF NOT EXISTS `xactions` (
`id` int(15) NOT NULL AUTO_INCREMENT,
`tcpAddress` varchar(40) NOT NULL,
-- a whole lot of other columns in between
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=150 ;
Example rows are
ID | tcpAddress | ... | date
1 | 192.168.1.161 | ... | 2012-09-12 14:19:39
2 | 192.168.1.162 | ... | 2012-09-12 14:19:40
3 | 192.168.1.162 | ... | 2012-09-12 14:19:41
4 | 192.168.1.162 | ... | 2012-09-12 14:19:42
SQL statement I'm trying to use
select yt.id, yt.tcpAddress, yt.analog, yt.discrete, yt.counter, yt.date
from xactions yt
inner join(
select id, tcpAddress, analog, discrete, counter, max(date) date
from xactions
WHERE tcpAddress='192.168.1.161' OR tcpAddress='192.168.1.162'
group by date
) ss on yt.id = ss.id and yt.date= ss.date
You need to group by the tcpAddress, not by the date.
And join by the tcpAddress, not the id.
select yt.id, yt.tcpAddress, yt.analog, yt.discrete, yt.counter, yt.date
from xactions yt
inner join (
select tcpAddress, max(date) date
from xactions
where tcpAddress in ('192.168.1.161', '192.168.1.162')
group by tcpAddress
) ss using (tcpAddress, date);
Also, you don't need to select any extra columns in the derived table -- only the tcpAddress and the max(date).
You can also use the option with EXISTS(). Inside EXISTS(), find MAX(date) for each tcpAddress group and compare it with the outer row's date:
SELECT id, tcpAddress, analog, discrete, counter, date
FROM xactions x1
WHERE EXISTS (
        SELECT 1
        FROM xactions x2
        WHERE x1.tcpAddress = x2.tcpAddress
        HAVING MAX(x2.date) = x1.date
      )
  AND (tcpAddress = '192.168.1.161' OR tcpAddress = '192.168.1.162')
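On MySQL 8.0 or later (only an assumption here; the question's table uses MyISAM on an older version), the same greatest-n-per-group result can also be written with ROW_NUMBER():
SELECT id, tcpAddress, analog, discrete, counter, `date`
FROM (
    SELECT x.*,
           ROW_NUMBER() OVER (PARTITION BY tcpAddress ORDER BY `date` DESC) AS rn  -- 1 = newest per address
    FROM xactions x
    WHERE tcpAddress IN ('192.168.1.161', '192.168.1.162')
) t
WHERE rn = 1;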

Help With SQL - Combining Two Rows Into One Row

I have an interesting SQL problem that I need help with.
Here is the sample dataset:
Warehouse DateStamp TimeStamp ItemNumber ID
A 8/1/2009 10001 abc 1
B 8/1/2009 10002 abc 1
A 8/3/2009 12144 qrs 5
C 8/3/2009 12143 qrs 5
D 8/5/2009 6754 xyz 6
B 8/5/2009 6755 xyz 6
This dataset represents inventory transfers between two warehouses. There are two records that represent each transfer, and these two transfer records always have the same ItemNumber, DateStamp, and ID. The TimeStamp values for the two transfer records always have a difference of 1, where the smaller TimeStamp represents the source warehouse record and the larger TimeStamp represents the destination warehouse record.
Using the sample dataset above, here is the query result set that I need:
Warehouse_Source Warehouse_Destination ItemNumber DateStamp
A B abc 8/1/2009
C A qrs 8/3/2009
D B xyz 8/5/2009
I can write code to produce the desired result set, but I was wondering if this record combination was possible through SQL. I am using SQL Server 2005 as my underlying database. I also need to add a WHERE clause to the SQL, so that for example, I could search on Warehouse_Source = A. And no, I can't change the data model ;).
Any advice is greatly appreciated!
Regards,
Mark
SELECT source.Warehouse as Warehouse_Source
     , dest.Warehouse as Warehouse_Destination
     , source.ItemNumber
     , source.DateStamp
FROM table source
JOIN table dest ON source.ID = dest.ID
    AND source.ItemNumber = dest.ItemNumber
    AND source.DateStamp = dest.DateStamp
    AND source.TimeStamp + 1 = dest.TimeStamp  -- the source row has the smaller TimeStamp
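The question also asks for a WHERE clause on the source warehouse; with the self-join above, the filter can simply reference the source alias. A sketch only, with `transfers` as a hypothetical table name standing in for the real one:
SELECT source.Warehouse AS Warehouse_Source
     , dest.Warehouse   AS Warehouse_Destination
     , source.ItemNumber
     , source.DateStamp
FROM transfers source                 -- "transfers" is a placeholder table name
JOIN transfers dest
  ON  source.ID             = dest.ID
  AND source.ItemNumber     = dest.ItemNumber
  AND source.DateStamp      = dest.DateStamp
  AND source.TimeStamp + 1  = dest.TimeStamp
WHERE source.Warehouse = 'A';         -- e.g. only transfers leaving warehouse A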
Mark,
Here is how you can do this with row_number and PIVOT. With a clustered index or primary key on the columns as I suggest, it will use a straight-line query plan with no Sort operation, and will thus be particularly efficient.
create table T(
Warehouse char,
DateStamp datetime,
TimeStamp int,
ItemNumber varchar(10),
ID int,
primary key(ItemNumber,DateStamp,ID,TimeStamp)
);
insert into T values ('A','20090801','10001','abc','1');
insert into T values ('B','20090801','10002','abc','1');
insert into T values ('A','20090803','12144','qrs','5');
insert into T values ('C','20090803','12143','qrs','5');
insert into T values ('D','20090805','6754','xyz','6');
insert into T values ('B','20090805','6755','xyz','6');
with Tpaired(Warehouse,DateStamp,TimeStamp,ItemNumber,ID,rk) as (
select
Warehouse,DateStamp,TimeStamp,ItemNumber,ID,
row_number() over (
partition by ItemNumber,DateStamp,ID
order by TimeStamp
)
from T
)
select
max([1]) as Warehouse_Source,
max([2]) as Warehouse_Destination,
ItemNumber,
DateStamp
from Tpaired
pivot (
max(Warehouse) for rk in ([1],[2])
) as P
group by ItemNumber, DateStamp, ID;
go
drop table T;