I have to deal with data that is being dumped to a "log" table within SQL Server. Unfortunately, I can't change that process. A job runs daily and dumps items into a table, some of which are duplicates.
Table 1:
import_id: guid
import_at: datetime
Table 2:
item_id: guid
import_id: guid (foreign key)
item_url: varchar(1000)
item_name: varchar(50)
item_description: varchar(1000)
Sometimes Table 2 will have a duplicate item_url. I only want to get the list of item_id and item_url from the newest import.
The query below will return one row per item_url, the one with the latest import_at value:
WITH all_items AS (
SELECT
t1.import_id
, t1.import_at
, t2.item_id
, t2.item_url
, t2.item_name
, t2.item_description
, ROW_NUMBER() OVER(PARTITION BY item_url ORDER BY t1.import_at DESC) AS item_url_rank
FROM dbo.table1 AS t1
JOIN dbo.table2 AS t2 ON
t2.import_id = t1.import_id
)
SELECT
import_id
, import_at
, item_id
, item_url
, item_name
, item_description
FROM all_items
WHERE
item_url_rank = 1;
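A minimal, self-contained sketch of the same ROW_NUMBER() pattern, for trying it out without the real tables. The table variables @imports and @items, the INT keys (the real columns are guids) and the sample rows are all assumptions made for illustration.

```sql
-- Hypothetical sample data standing in for Table 1 and Table 2.
DECLARE @imports TABLE (import_id INT, import_at DATETIME);
DECLARE @items TABLE (item_id INT, import_id INT, item_url VARCHAR(1000));

INSERT INTO @imports VALUES (1, '20190101'), (2, '20190102');
INSERT INTO @items VALUES
  (10, 1, 'http://example.com/a')
, (11, 1, 'http://example.com/b')
, (12, 2, 'http://example.com/a'); -- duplicate URL from the newer import

WITH ranked AS (
    SELECT
      i.item_id
    , i.item_url
    -- Rank 1 is the row from the most recent import for each URL.
    , ROW_NUMBER() OVER (PARTITION BY i.item_url ORDER BY im.import_at DESC) AS item_url_rank
    FROM @items AS i
    JOIN @imports AS im ON im.import_id = i.import_id
)
SELECT item_id, item_url
FROM ranked
WHERE item_url_rank = 1;
-- Keeps item 12 for /a (newer import) and item 11 for /b.
```

If two imports could share the same import_at value, adding a tie-breaker column to the ORDER BY would make the choice deterministic.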
My brain may not be working today, but I'm trying to get a dataset arranged in a particular way. It's easier to show what I mean.
I have a dataset like this:
CREATE TABLE #EXAMPLE (
ID CHAR(11)
, ORDER_ID INT
, PARENT_ORDER_ID INT
);
INSERT INTO #EXAMPLE VALUES
('27KJKR8K3TP', 19517, 0)
, ('27KJKR8K3TP', 10615, 0)
, ('27KJKR8K3TP', 83364, 19517)
, ('27KJKR8K3TP', 96671, 10615)
, ('TXCMK9757JT', 92645, 0)
, ('TXCMK9757JT', 60924, 92645);
SELECT * FROM #EXAMPLE;
DROP TABLE #EXAMPLE;
The PARENT_ORDER_ID field refers back to other orders on the given ID. E.g. ID TXCMK9757JT has order 60924 which is a child order of 92645, which is a separate order on the ID. The way I need this dataset to be arranged is like this:
CREATE TABLE #EXAMPLE (
ID CHAR(11)
, ORDER_ID INT
, CHILD_ORDER_ID INT
);
INSERT INTO #EXAMPLE VALUES
('27KJKR8K3TP', 19517, 19517)
, ('27KJKR8K3TP', 19517, 83364)
, ('27KJKR8K3TP', 10615, 10615)
, ('27KJKR8K3TP', 10615, 96671)
--, ('27KJKR8K3TP', 83364, 83364)
--, ('27KJKR8K3TP', 96671, 96671)
, ('TXCMK9757JT', 92645, 92645)
, ('TXCMK9757JT', 92645, 60924)
--, ('TXCMK9757JT', 60924, 60924)
;
SELECT * FROM #EXAMPLE;
DROP TABLE #EXAMPLE;
In this arrangement of the data set, instead of PARENT_ORDER_ID field there is CHILD_ORDER_ID, which basically lists every single ORDER_ID falling under a given ORDER_ID, including itself. I ultimately would like to have the CHILD_ORDER_ID field be the key for the data set, having only unique values (so that's why I've commented out the CHILD_ORDER_IDs that would only contain themselves, because they have a parent order ID which already contains them).
Any advice on how to achieve the described transformation of the data set would be greatly appreciated! I've tried recursive CTEs and different join statements but I'm not quite getting what I want. Thank you!
You can use a recursive CTE first to walk the hierarchy for each ID, then use a CASE expression to decide which value goes into ORDER_ID: the row's own ORDER_ID for a top-level order, otherwise its PARENT_ORDER_ID.
;WITH CTE AS (
SELECT ID,ORDER_ID,PARENT_ORDER_ID
FROM #EXAMPLE
WHERE PARENT_ORDER_ID = 0
UNION ALL
SELECT c.Id,e.ORDER_ID,e.PARENT_ORDER_ID
FROM CTE c
INNER JOIN #EXAMPLE e
ON c.ORDER_ID = e.PARENT_ORDER_ID AND c.Id = e.Id
)
SELECT ID,
(CASE WHEN PARENT_ORDER_ID = 0 THEN ORDER_ID ELSE PARENT_ORDER_ID END) ORDER_ID,
ORDER_ID CHILD_ORDER_ID
FROM CTE
ORDER BY ID
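With the query above, an order nested more than one level deep would be listed under its immediate parent rather than under the top-level order. If the top-level order is wanted instead, one possible variation is to carry the root ORDER_ID through the recursion. This is only a sketch: the table name #EXAMPLE_DEEP, the column alias ROOT_ORDER_ID and the grandchild row 55555 are made up for illustration and are not in the question.

```sql
-- Subset of the question's sample data plus one hypothetical grandchild row.
CREATE TABLE #EXAMPLE_DEEP (
  ID CHAR(11)
, ORDER_ID INT
, PARENT_ORDER_ID INT
);
INSERT INTO #EXAMPLE_DEEP VALUES
  ('27KJKR8K3TP', 19517, 0)
, ('27KJKR8K3TP', 83364, 19517)
, ('27KJKR8K3TP', 55555, 83364) -- hypothetical grandchild, not in the question
, ('TXCMK9757JT', 92645, 0)
, ('TXCMK9757JT', 60924, 92645);

WITH CTE AS (
    -- Anchor: a top-level order is its own root.
    SELECT ID, ORDER_ID AS ROOT_ORDER_ID, ORDER_ID AS CHILD_ORDER_ID
    FROM #EXAMPLE_DEEP
    WHERE PARENT_ORDER_ID = 0
    UNION ALL
    -- Recursive step: each child inherits the root of its parent.
    SELECT c.ID, c.ROOT_ORDER_ID, e.ORDER_ID
    FROM CTE c
    INNER JOIN #EXAMPLE_DEEP e
        ON e.PARENT_ORDER_ID = c.CHILD_ORDER_ID AND e.ID = c.ID
)
SELECT ID, ROOT_ORDER_ID AS ORDER_ID, CHILD_ORDER_ID
FROM CTE
ORDER BY ID, ORDER_ID, CHILD_ORDER_ID;

DROP TABLE #EXAMPLE_DEEP;
```

Here order 55555 is reported under 19517, the top of its chain, and every CHILD_ORDER_ID still appears exactly once.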
I have tables with a many-to-many relationship that I need to turn into a 1-to-1 without modifying the schema. Here is the pseudocode:
Reports {
Id INT,
Description NVARCHAR(256),
ReportFields...
}
ScheduledReports {
ScheduledReportId INT
ReportId INT (FK)
Frequency INT
}
When I run this query:
SELECT [ReportID], COUNT(*) as NumberOfReports
FROM [ScheduledReports]
GROUP BY ReportId
HAVING COUNT(*) > 1
I get back all the reports that have duplicates:
ReportId, NumberOfReports
1, 2
2, 4
For each additional report (i.e. NumberOfReports - 1), I need to create a duplicate row in the Reports table. However, I'm having trouble figuring out how to turn the count into a join (since I don't want to use cursors).
Here is my query:
INSERT INTO Reports (Description)
SELECT Description
FROM Reports
WHERE Id IN (SELECT [ReportID]
FROM [ScheduledReports]
GROUP BY ReportId
HAVING COUNT(*) > 1)
How do I Join the ReportRow on itself for Count(*) -1 times?
The query below numbers the schedules within each report. You can then use Sequencing > 1 to determine which rows need to be inserted into your Reports table. The output of this SELECT should probably be cached, since it indicates which rows need to be added to Reports (by their current ID) and can later be used to update the referenced ReportId in your schedules table.
SELECT *
FROM (
SELECT Reports.Id
,ScheduledReportId
,ROW_NUMBER() OVER (
PARTITION BY ReportId
ORDER BY ScheduledReportId
) AS [Sequencing]
FROM Reports
INNER JOIN ScheduledReports on ScheduledReports.ReportId = Reports.Id
WHERE ReportId IN (SELECT [ReportID]
FROM [ScheduledReports]
GROUP BY ReportId
HAVING COUNT(*) > 1)) AS SequencedReportAndSchedules
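As a rough illustration of how the Sequencing > 1 filter turns the count into a join, the sketch below inserts one copy of a report per extra schedule. It assumes Reports.Id is an IDENTITY column, which the question does not state. The table variables and sample rows are made up for the example.

```sql
-- Hypothetical stand-ins for the Reports and ScheduledReports tables.
DECLARE @Reports TABLE (Id INT IDENTITY(1, 1), Description NVARCHAR(256));
DECLARE @ScheduledReports TABLE (ScheduledReportId INT, ReportId INT, Frequency INT);

INSERT INTO @Reports (Description) VALUES (N'Sales'), (N'Stock');
INSERT INTO @ScheduledReports VALUES
  (1, 1, 7), (2, 1, 30)                         -- report 1 has 2 schedules
, (3, 2, 1), (4, 2, 7), (5, 2, 30), (6, 2, 90); -- report 2 has 4 schedules

WITH Sequenced AS (
    SELECT ReportId
         , ROW_NUMBER() OVER (PARTITION BY ReportId ORDER BY ScheduledReportId) AS Sequencing
    FROM @ScheduledReports
)
INSERT INTO @Reports (Description)
SELECT r.Description
FROM @Reports AS r
INNER JOIN Sequenced AS s ON s.ReportId = r.Id
WHERE s.Sequencing > 1; -- one copy per extra schedule, i.e. COUNT(*) - 1 rows per report

-- 6 rows in total: the 2 originals plus 1 + 3 copies.
SELECT Id, Description FROM @Reports ORDER BY Id;
```

This only creates the duplicate rows. Pointing each extra schedule at its own new report still needs a mapping from the new Id back to ScheduledReportId, which is why caching the sequenced output is useful.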
Given the following table containing the example rows, I’m looking for a query to give me the aggregate results of changes made to the same record. All changes are made against a base record in another table (results table), so the contents of the results table are not cumulative.
Base Records (from which all changes are made)
Edited Columns highlighted
I’m looking for a query that would give me the cumulative changes (in order by date). This would be the resulting rows:
Any help appreciated!
UPDATE---------------
Let me offer some clarification. The records being edited exist in one table, let's call that [dbo].[Base]. When a person updates a record from [dbo].[Base], his updates go into [dbo].[Updates]. Therefore, a person is always editing from the base table.
At some point, let's say once a day, we need to calculate the sum of changes with the following rule:
For any given record, determine the latest change for each column and take the latest change. If no change was made to a column, take the value from [dbo].[Base]. So, one way of looking at the [dbo].[Updates] table would be to see only the changed columns.
Please let's not discuss the merits of this approach, I realize it's strange. I just need to figure out how to determine the final state of each record.
Thanks!
This is dirty, but you can give this a shot (test here: https://rextester.com/MKSBU15593)
I use one CTE to do an initial CROSS JOIN of the Base and Updates tables and then a second CTE to filter it down to the rows where the IDs match. From there I use FIRST_VALUE() for each column, partitioned by the ID value and ordered first by a CASE expression (1 if the Updates column value matches the Base column value, else 0) and then by the DATEMODIFIED column descending, so that the most recent changed value of each column comes first.
Run against the sample data below, it returns one row per ID with the final state of each column.
CREATE TABLE Base
(
ID INT
,FNAME VARCHAR(100)
,LNAME VARCHAR(100)
,ADDRESS VARCHAR(100)
,RATING INT
,[TYPE] VARCHAR(5)
,SUBTYPE VARCHAR(5)
);
INSERT INTO dbo.Base
VALUES
( 100,'John','Doe','123 First',3,'Emp','W2'),
( 200,'Jane','Smith','Wacker Dr.',2,'Emp','W2');
CREATE TABLE Updates
(
ID INT
,DATEMODIFIED DATE
,FNAME VARCHAR(100)
,LNAME VARCHAR(100)
,ADDRESS VARCHAR(100)
,RATING INT
,[TYPE] VARCHAR(5)
,SUBTYPE VARCHAR(5)
);
INSERT INTO dbo.Updates
VALUES
( 100,'1/15/2019','John','Doe','123 First St.',3,'Emp','W2'),
( 200,'1/15/2019','Jane','Smyth','Wacker Dr.',2,'Emp','W2'),
( 100,'1/17/2019','Johnny','Doe','123 First',3,'Emp','W2'),
( 200,'1/19/2019','Jane','Smith','2 Wacker Dr.',2,'Emp','W2'),
( 100,'1/20/2019','Jon','Doe','123 First',3,'Cont','W2');
WITH merged AS
(
SELECT b.ID AS IDOrigin
,'1/1/1900' AS DATEMODIFIEDOrigin
,b.FNAME AS FNAMEOrigin
,b.LNAME AS LNAMEOrigin
,b.ADDRESS AS ADDRESSOrigin
,b.RATING AS RATINGOrigin
,b.[TYPE] AS TYPEOrigin
,b.SUBTYPE AS SUBTYPEOrigin
,u.*
FROM base b
CROSS JOIN
dbo.Updates u
), filtered AS
(
SELECT *
FROM merged
WHERE IDOrigin = ID
)
SELECT distinct
ID
,FNAME = FIRST_VALUE(FNAME) OVER (PARTITION BY ID ORDER BY CASE WHEN FNAME = FNAMEOrigin THEN 1 ELSE 0 end, datemodified desc)
,LNAME = FIRST_VALUE(LNAME) OVER (PARTITION BY ID ORDER BY CASE WHEN LNAME = LNAMEOrigin THEN 1 ELSE 0 end, datemodified desc)
,ADDRESS = FIRST_VALUE(ADDRESS) OVER (PARTITION BY ID ORDER BY CASE WHEN ADDRESS = ADDRESSOrigin THEN 1 ELSE 0 end, datemodified desc)
,RATING = FIRST_VALUE(RATING) OVER (PARTITION BY ID ORDER BY CASE WHEN RATING = RATINGOrigin THEN 1 ELSE 0 end, datemodified desc)
,[TYPE] = FIRST_VALUE([TYPE]) OVER (PARTITION BY ID ORDER BY CASE WHEN [TYPE] = TYPEOrigin THEN 1 ELSE 0 end, datemodified desc)
,SUBTYPE = FIRST_VALUE(SUBTYPE) OVER (PARTITION BY ID ORDER BY CASE WHEN SUBTYPE = SUBTYPEOrigin THEN 1 ELSE 0 end, datemodified desc)
FROM filtered
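The CROSS JOIN followed by the ID filter gives the same rows as an INNER JOIN on ID, so the idea can be sketched more compactly. The version below is a cut-down illustration: only two of the columns, one base record, and table variables named @Base and @Updates, all chosen for brevity rather than taken from the original script.

```sql
-- Hypothetical two-column versions of the Base and Updates tables.
DECLARE @Base TABLE (ID INT, FNAME VARCHAR(100), ADDRESS VARCHAR(100));
DECLARE @Updates TABLE (ID INT, DATEMODIFIED DATE, FNAME VARCHAR(100), ADDRESS VARCHAR(100));

INSERT INTO @Base VALUES (100, 'John', '123 First');
INSERT INTO @Updates VALUES
  (100, '20190115', 'John', '123 First St.')
, (100, '20190117', 'Johnny', '123 First')
, (100, '20190120', 'Jon', '123 First');

SELECT DISTINCT
    u.ID
    -- Changed values sort first (0), then the newest change wins.
  , FNAME = FIRST_VALUE(u.FNAME) OVER (PARTITION BY u.ID
        ORDER BY CASE WHEN u.FNAME = b.FNAME THEN 1 ELSE 0 END, u.DATEMODIFIED DESC)
  , ADDRESS = FIRST_VALUE(u.ADDRESS) OVER (PARTITION BY u.ID
        ORDER BY CASE WHEN u.ADDRESS = b.ADDRESS THEN 1 ELSE 0 END, u.DATEMODIFIED DESC)
FROM @Base AS b
INNER JOIN @Updates AS u ON u.ID = b.ID;
-- Returns one row: 100, 'Jon', '123 First St.'
```

As in the original, a base record with no rows in the updates table drops out of the result, so a LEFT JOIN with a fallback to the base values would be needed to keep those.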
Don't you just want the last record?
select e.*
from edited e
where e.datemodified = (select max(e2.datemodified)
from edited e2
where e2.id = e.id
);
I wrote a stored procedure that inserts bulk data into a table using the MERGE statement.
The problem is that when I insert Item_ID values 1024, 1000, 1012, 1025 in this order, SQL Server automatically changes the order to 1000, 1012, 1024, 1025.
I want the data to be inserted in the order I actually pass it.
Here is sample code. This will parse XML string into table object:
DECLARE @tblPurchase TABLE
(
Purchase_Detail_ID INT ,
Purchase_ID INT ,
Head_ID INT ,
Item_ID INT
);
INSERT INTO @tblPurchase (Purchase_Detail_ID, Purchase_ID, Head_ID, Item_ID)
SELECT
Tbl.Col.value('Purchase_Detail_ID[1]', 'INT') AS Purchase_Detail_ID,
Tbl.Col.value('Purchase_ID[1]', 'INT') AS Purchase_ID,
Tbl.Col.value('Head_ID[1]', 'INT') AS Head_ID,
Tbl.Col.value('Item_ID[1]', 'INT') AS Item_ID
FROM
@PurchaseDetailsXML.nodes('/documentelement/TRN_Purchase_Details') Tbl(Col)
This will insert bulk data into the TRN_Purchase_Details table:
MERGE TRN_Purchase_Details MTD
USING (SELECT
Purchase_Detail_ID,
Id AS Purchase_ID,
Head_ID, Item_ID
FROM
@tblPurchase
LEFT JOIN
#ChangeResult ON 1 = 1) AS TMTD ON MTD.Purchase_Detail_ID = TMTD.Purchase_Detail_ID
AND MTD.Purchase_ID = TMTD.Purchase_ID
WHEN MATCHED THEN
UPDATE SET MTD.Head_ID = TMTD.Head_ID,
MTD.Item_ID = TMTD.Item_ID
WHEN NOT MATCHED BY TARGET THEN
INSERT (Purchase_ID, Head_ID, Item_ID)
VALUES (Purchase_ID, Head_ID, Item_ID)
WHEN NOT MATCHED BY SOURCE AND
MTD.Purchase_ID = (SELECT TOP 1 Id
FROM #ChangeResult
WHERE Id > 0) THEN
DELETE;
Rows in a SQL table don't have any order. They come back in indeterminate order unless you specify an order by.
Try adding an identity column to your temporary table?
DECLARE @tblPurchase TABLE
(
ID int identity,
Purchase_Detail_ID INT ,
Purchase_ID INT ,
Head_ID INT ,
Item_ID INT
);
The identity column might capture the order of the XML elements.
If that doesn't work, you can calculate the position of the elements in the XML and store that position in the temporary table.
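One way to calculate the position is to count, for each element, how many sibling elements come before it in document order. The sketch below assumes XML shaped like the question's input. The sample document, the Position column and the cut-down table variable are made up for illustration.

```sql
-- Hypothetical XML shaped like the question's input.
DECLARE @PurchaseDetailsXML XML = N'
<documentelement>
  <TRN_Purchase_Details><Item_ID>1024</Item_ID></TRN_Purchase_Details>
  <TRN_Purchase_Details><Item_ID>1000</Item_ID></TRN_Purchase_Details>
  <TRN_Purchase_Details><Item_ID>1012</Item_ID></TRN_Purchase_Details>
  <TRN_Purchase_Details><Item_ID>1025</Item_ID></TRN_Purchase_Details>
</documentelement>';

DECLARE @tblPurchase TABLE (Position INT, Item_ID INT);

INSERT INTO @tblPurchase (Position, Item_ID)
SELECT
    -- Number of sibling elements that precede this node, plus one.
    Tbl.Col.value('for $i in . return count(../*[. << $i]) + 1', 'INT') AS Position,
    Tbl.Col.value('Item_ID[1]', 'INT') AS Item_ID
FROM @PurchaseDetailsXML.nodes('/documentelement/TRN_Purchase_Details') Tbl(Col);

-- The order is only guaranteed when it is asked for explicitly.
SELECT Position, Item_ID
FROM @tblPurchase
ORDER BY Position;
```

With the stored position, any later query can ask for the items in the order they were passed (1024, 1000, 1012, 1025 here) by sorting on Position.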
As mentioned elsewhere, data in a table is stored as an unordered set. If you need to be able to go back to your table after data is inserted and determine the order that it was inserted, you'll have to add a column to the table schema to record that information.
It could be something as simple as adding an IDENTITY column, which will increment on each row addition, or perhaps a column with a DATETIME data type and a GETDATE() default value so you not only know the order rows were added, but exactly when that happened.
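A small sketch of both suggestions together, using a hypothetical table variable @PurchaseLog with made-up column names RowSeq and InsertedAt:

```sql
-- Hypothetical table that records insertion order and time.
DECLARE @PurchaseLog TABLE (
    RowSeq INT IDENTITY(1, 1),                      -- increments on each row added
    InsertedAt DATETIME NOT NULL DEFAULT GETDATE(), -- filled in automatically
    Item_ID INT
);

-- Separate statements, so the identity values follow the statement order.
INSERT INTO @PurchaseLog (Item_ID) VALUES (1024);
INSERT INTO @PurchaseLog (Item_ID) VALUES (1000);
INSERT INTO @PurchaseLog (Item_ID) VALUES (1012);

-- Without the ORDER BY the rows may come back in any order.
SELECT RowSeq, InsertedAt, Item_ID
FROM @PurchaseLog
ORDER BY RowSeq;
```

The extra column only records the order. Reading the rows back in that order still requires the ORDER BY.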
I'm trying to write a query which will return those records:
select *
from [CloneConfiguration]
where InstrumentId = 2
and insert them into the same table while changing the following columns:
Id - the new record needs a unique id number (it is the primary key, but it is not defined as auto-increment)
InstrumentId - change the instrument id to another number (3 for example)
I tried the following query which doesn't work.
INSERT INTO [CloneConfiguration]
SELECT
MAX(Id) + 1, 3,
[SourceCCy1Id], [SourceCCy2Id], [SourceProviderId],
[TargetCCy1Id], [TargetCCy2Id], [TargetProviderId], [Remark]
FROM
[CloneConfiguration]
WHERE
InstrumentId =2
Error:
Column 'CloneConfiguration.SourceCCy1Id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
You can do what you want in a single query by doing:
INSERT INTO [CloneConfiguration]
SELECT COALESCE(m.maxid + 1, 1), 3, [SourceCCy1Id], [SourceCCy2Id],
[SourceProviderId], [TargetCCy1Id], [TargetCCy2Id],
[TargetProviderId], [Remark]
FROM [CloneConfiguration] CROSS JOIN
(SELECT max(id) as maxid FROM CloneConfiguration) m
WHERE InstrumentId = 2 ;
If you are inserting multiple rows, then use row_number() as well:
INSERT INTO [CloneConfiguration]
SELECT COALESCE(m.maxid, 0) + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
3, [SourceCCy1Id], [SourceCCy2Id],
[SourceProviderId], [TargetCCy1Id], [TargetCCy2Id],
[TargetProviderId], [Remark]
FROM [CloneConfiguration] CROSS JOIN
(SELECT max(id) as maxid FROM CloneConfiguration) m
WHERE InstrumentId = 2 ;
That said, the correct solution is to define the id as an identity column. Then the database takes care of assigning a unique id, and your queries will not have race conditions. The queries above work if there is only one user, but can fail if multiple users run them at the same time.
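For comparison, here is a sketch of what the clone looks like once Id is an identity column. The table variable is a hypothetical, trimmed-down stand-in for CloneConfiguration (only InstrumentId and Remark are kept), and the sample rows are made up.

```sql
-- Hypothetical, trimmed-down version of the table with an IDENTITY key.
DECLARE @CloneConfiguration TABLE (
    Id INT IDENTITY(1, 1) PRIMARY KEY,
    InstrumentId INT,
    Remark VARCHAR(100)
);

INSERT INTO @CloneConfiguration (InstrumentId, Remark)
VALUES (2, 'first'), (2, 'second'), (5, 'other');

-- Clone the InstrumentId = 2 rows as InstrumentId = 3.
-- Id is left out on purpose: the database assigns it.
INSERT INTO @CloneConfiguration (InstrumentId, Remark)
SELECT 3, Remark
FROM @CloneConfiguration
WHERE InstrumentId = 2;

-- 5 rows: the two clones receive Ids 4 and 5.
SELECT Id, InstrumentId, Remark
FROM @CloneConfiguration
ORDER BY Id;
```

No MAX(Id) lookup is needed, so two sessions cloning at the same time cannot compute the same new id.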
This is assuming sql-server, but I guess you get the point anyway:
DECLARE @MAXID INT = (SELECT MAX(Id) FROM [CloneConfiguration]); -- You probably want to number from the highest Id regardless of InstrumentId
INSERT INTO [CloneConfiguration]
SELECT @MAXID + ROW_NUMBER() OVER(ORDER BY Id)
, 3
, [SourceCCy1Id]
, [SourceCCy2Id]
, [SourceProviderId]
, [TargetCCy1Id]
, [TargetCCy2Id]
, [TargetProviderId]
, [Remark]
FROM [CloneConfiguration]
WHERE InstrumentId=2
The idea is to first get the MAX(Id) currently in the table, and add a ROW_NUMBER based on the selected Id's.
By the way, it's also a good idea to name the columns you want to insert into:
INSERT INTO [CloneConfiguration] (Id, InstrumentId...)
...