SQL Create a duplicate row for each additional count - sql

I have a table with a many to many relationship, in which I need to make a 1 to 1 without modifying the schema. Here is the pseudo code:
Reports {
Id INT,
Description NVARCHAR(256),
ReportFields...
}
ScheduledReports {
ScheduledReportId INT
ReportId INT (FK)
Frequency INT
}
When I run this query:
SELECT [ReportID], COUNT(*) as NumberOfReports
FROM [ScheduledReports]
GROUP BY ReportId
HAVING COUNT(*) > 1
I get back all the reports that have duplicates:
ReportId, NumberOfReports
1, 2
2, 4
For each additional report (i.e. NumberOfReports - 1), I need to create a duplicate row in the Reports table. However, I'm having trouble figuring out how to turn the count into a join (since I don't want to use cursors).
Here is my query:
INSERT INTO Reports (Description)
SELECT Description
FROM Reports
WHERE ReportId IN (SELECT [ReportID]
FROM [ScheduledReports]
GROUP BY ReportId
HAVING COUNT(*) > 1)
How do I Join the ReportRow on itself for Count(*) -1 times?

The query below should give you a sequencing of the schedules per unique report. You can then use Sequencing > 1 to determine which rows need to be inserted into your Reports table. The output of this select should probably be cached, since it both indicates which rows need to be added to Reports (by their current Id) and can be used later to update the referenced ReportId in your schedules table.
SELECT *
FROM (
SELECT Reports.Id
,ScheduledReportId
,ROW_NUMBER() OVER (
PARTITION BY ReportId
ORDER BY ScheduledReportId
) AS [Sequencing]
FROM Reports
INNER JOIN ScheduledReports on ScheduledReports.ReportId = Reports.Id
WHERE ReportId IN (SELECT [ReportID]
FROM [ScheduledReports]
GROUP BY ReportId
HAVING COUNT(*) > 1)) AS SequencedReportAndSchedules
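To show how the sequencing can drive the insert itself, here is a minimal sketch. It assumes Reports.Id is an identity column and uses hypothetical temp tables (#Reports, #ScheduledReports) with made-up rows so it can be run in isolation. It does not cover re-pointing the schedules at the new rows.

```sql
-- Hypothetical stand-ins for the question's tables (assumes Id is an identity column).
CREATE TABLE #Reports (Id INT IDENTITY(1,1) PRIMARY KEY, Description NVARCHAR(256));
CREATE TABLE #ScheduledReports (ScheduledReportId INT IDENTITY(1,1) PRIMARY KEY, ReportId INT, Frequency INT);

-- Two separate inserts so the reports get Id 1 and Id 2 in a known order.
INSERT INTO #Reports (Description) VALUES (N'Report A');
INSERT INTO #Reports (Description) VALUES (N'Report B');

-- Report 1 has 2 schedules, report 2 has 4 (same counts as in the question).
INSERT INTO #ScheduledReports (ReportId, Frequency)
VALUES (1, 1), (1, 7), (2, 1), (2, 7), (2, 30), (2, 90);

-- One new Reports row per extra schedule: every schedule with Sequencing > 1
-- contributes one copy of its report's Description.
INSERT INTO #Reports (Description)
SELECT r.Description
FROM #Reports r
INNER JOIN (
    SELECT ReportId,
           ROW_NUMBER() OVER (PARTITION BY ReportId ORDER BY ScheduledReportId) AS Sequencing
    FROM #ScheduledReports
) s ON s.ReportId = r.Id
WHERE s.Sequencing > 1;

-- 2 original rows + 1 copy of report 1 + 3 copies of report 2 = 6 rows.
SELECT COUNT(*) AS ReportCount FROM #Reports;
```

The join does the "Count(*) - 1 times" repetition: each report row is matched once per schedule, and filtering out Sequencing = 1 drops exactly one match per report.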

Related

Find entryno set from multiple set of records

I have two SQL temp tables #Temp1 and #Temp2.
I want to get the entryNo values from #Temp1 whose rows contain a complete set from #Temp2.
For example: #Temp2 has 8 records. I want to find which entries in #Temp1 contain a set of records from #Temp2.
CREATE TABLE #Temp1 (entryNo INT, setid INT, measurid INT,measurvalueid int)
CREATE TABLE #Temp2(setid INT, measurid INT,measurvalueid int)
INSERT INTO #Temp1 (entryNo,setid,measurid,measurvalueid )
VALUES (1,400001,1,1),
(1,400001,2,110),
(1,400001,3,1001),
(1,400001,4,1100),
(2,400002,5,100),
(2,400002,6,102),
(2,400002,7,1003),
(2,400002,8,10004),
(3,400001,1,1),
(3,400001,2,110),
(3,400001,3,1001),
(3,400001,4,1200)
INSERT INTO #Temp2 (setid,measurid,measurvalueid )
VALUES (400001,1,1),
(400001,2,110),
(400001,3,1001),
(400001,4,1100),
(400002,5,100),
(400002,6,102),
(400002,7,1003),
(400002,8,10004)
I want output
EntryNo
1
2
It contains two sets.
One is:
(400001,1,1),
(400001,2,110),
(400001,3,1001),
(400001,4,1100)
The second is:
(400002,5,100),
(400002,6,102),
(400002,7,1003),
(400002,8,10004)
Try this:
WITH DataSourceInialData AS
(
SELECT *
,COUNT(*) OVER (PARTITION BY [entryNo], [setid]) AS [GroupCount]
FROM #Temp1
), DataSourceFilteringData AS
(
SELECT *
,COUNT(*) OVER (PARTITION BY [setid]) AS [GroupCount]
FROM #Temp2
)
SELECT A.[entryNo]
FROM DataSourceInialData A
INNER JOIN DataSourceFilteringData B
ON A.[setid] = B.[setid]
AND A.[measurid] = B.[measurid]
AND A.[measurvalueid] = B.[measurvalueid]
-- we are interested in groups which are passed completely by the filtering groups
AND A.[GroupCount] = B.[GroupCount]
GROUP BY A.[entryNo]
-- after joining the rows, the filtered rows must match the filtering rows
HAVING COUNT(A.[setid]) = MAX(B.[GroupCount]);
The algorithm is simple:
we count how many rows exist per data group
we count how many rows exist per filtering group
we join the initial data and the filtering data
after the join, we count how many rows are left in the initial data and check that this count equals the filtering count for the given group
For the sample data, the result is entryNo 1 and 2.
Note that I am checking for an exact match. For example, if your sample data had one more row for entryNo = 1, that entry would not be included in the result. To change this behavior, comment out this condition:
-- we are interested in groups which are passed completely by the filtering groups
AND A.[GroupCount] = B.[GroupCount]
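To make the difference concrete, here is a small sketch of the relaxed variant, with the GroupCount condition left out. The #Temp1Demo and #Temp2Demo tables and their rows are hypothetical, cut down from the question's data, with one extra row added for entryNo = 1.

```sql
-- Hypothetical cut-down copies of the question's tables.
CREATE TABLE #Temp1Demo (entryNo INT, setid INT, measurid INT, measurvalueid INT);
CREATE TABLE #Temp2Demo (setid INT, measurid INT, measurvalueid INT);

-- entryNo 1 holds the full filtering set plus one extra row (measurid 9).
INSERT INTO #Temp1Demo (entryNo, setid, measurid, measurvalueid)
VALUES (1, 400001, 1, 1),
       (1, 400001, 2, 110),
       (1, 400001, 9, 9);

INSERT INTO #Temp2Demo (setid, measurid, measurvalueid)
VALUES (400001, 1, 1),
       (400001, 2, 110);

WITH FilteringData AS
(
    SELECT *
          ,COUNT(*) OVER (PARTITION BY [setid]) AS [GroupCount]
    FROM #Temp2Demo
)
SELECT A.[entryNo]
FROM #Temp1Demo A
INNER JOIN FilteringData B
    ON A.[setid] = B.[setid]
    AND A.[measurid] = B.[measurid]
    AND A.[measurvalueid] = B.[measurvalueid]
    -- no GroupCount comparison here, so extra rows in the entry are tolerated
GROUP BY A.[entryNo]
HAVING COUNT(A.[setid]) = MAX(B.[GroupCount]);
```

Here two rows of entryNo 1 match the two filtering rows, so entryNo 1 is returned. With the GroupCount condition kept, the entry's count (3) would differ from the filtering count (2), no rows would join, and nothing would be returned.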

Duplicate SQL records to the same table with manually auto increment id

I'm trying to write a query which will return those records:
select *
from [CloneConfiguration]
where InstrumentId = 2
and insert them into the same table with changing the following columns:
Id - the new record will need a unique id number (because Id is the primary key, but it is not defined as auto increment)
Instrument id - change the instrument id to another number (3 for example)
I tried the following query which doesn't work.
INSERT INTO [CloneConfiguration]
SELECT
MAX(Id) + 1, 3,
[SourceCCy1Id], [SourceCCy2Id], [SourceProviderId],
[TargetCCy1Id], [TargetCCy2Id], [TargetProviderId], [Remark]
FROM
[CloneConfiguration]
WHERE
InstrumentId =2
Error:
Column 'CloneConfiguration.SourceCCy1Id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
You can do what you want in a single query by doing:
INSERT INTO [CloneConfiguration]
SELECT COALESCE(m.maxid + 1, 1), 3, [SourceCCy1Id], [SourceCCy2Id],
[SourceProviderId], [TargetCCy1Id], [TargetCCy2Id],
[TargetProviderId], [Remark]
FROM [CloneConfiguration] CROSS JOIN
(SELECT max(id) as maxid FROM CloneConfiguration) m
WHERE InstrumentId = 2 ;
If you are inserting multiple rows, then use row_number() as well:
INSERT INTO [CloneConfiguration]
SELECT COALESCE(m.maxid, 0) + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
3, [SourceCCy1Id], [SourceCCy2Id],
[SourceProviderId], [TargetCCy1Id], [TargetCCy2Id],
[TargetProviderId], [Remark]
FROM [CloneConfiguration] CROSS JOIN
(SELECT max(id) as maxid FROM CloneConfiguration) m
WHERE InstrumentId = 2 ;
That said, the correct solution is to define the id to be an identity column. Then the database takes care of assigning a unique id, and your queries will not have race conditions. The queries above work if there is only one user, but can fail if there are multiple users.
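For illustration, a minimal sketch of what the identity approach looks like. The temp table #CloneConfigurationDemo and its reduced column set are hypothetical stand-ins, not the real CloneConfiguration definition.

```sql
-- Hypothetical stand-in for CloneConfiguration with Id as an identity column.
CREATE TABLE #CloneConfigurationDemo
(
    Id INT IDENTITY(1,1) PRIMARY KEY,
    InstrumentId INT NOT NULL,
    Remark VARCHAR(100) NULL
);

INSERT INTO #CloneConfigurationDemo (InstrumentId, Remark)
VALUES (2, 'first'), (2, 'second');

-- Clone the InstrumentId = 2 rows as InstrumentId = 3. Id is left out of the
-- column list, so the database assigns the new values itself.
INSERT INTO #CloneConfigurationDemo (InstrumentId, Remark)
SELECT 3, Remark
FROM #CloneConfigurationDemo
WHERE InstrumentId = 2;

SELECT Id, InstrumentId, Remark
FROM #CloneConfigurationDemo
ORDER BY Id;
```

Because no MAX(Id) is read before the insert, two sessions running this at the same time cannot compute the same new Id.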
This is assuming sql-server, but I guess you get the point anyway:
DECLARE @MAXID INT = (SELECT MAX(Id) FROM [CloneConfiguration]) -- You probably want to number from the highest Id regardless of InstrumentId
INSERT INTO [CloneConfiguration]
SELECT @MAXID + ROW_NUMBER() OVER(ORDER BY Id)
, 3
, [SourceCCy1Id]
, [SourceCCy2Id]
, [SourceProviderId]
, [TargetCCy1Id]
, [TargetCCy2Id]
, [TargetProviderId]
, [Remark]
FROM [CloneConfiguration]
WHERE InstrumentId=2
The idea is to first get the MAX(Id) currently in the table, and add a ROW_NUMBER based on the selected Id's.
By the way, it's also a good idea to name the columns you want to insert into:
INSERT INTO [CloneConfiguration] (Id, InstrumentId...)
...

How do I filter a history table for items that had a specific status but exclude that item if it also had another specific status

In our SQL Server 2008 database, we have a one-to-many relationship between the "Jobs" table and the "History" table. Each time the status of the job is changed, we log the Status, date-time and Job ID into the History table.
I need to get all the Job ID’s where the history has a status = “Issued” unless the job history also has a status = “Revoked”.
If the history for a Job has a status = “Revoked”, but then at a later date has a status = “Issued” , it must be included in the results.
So based on the following mock up table and the rules above, the results that should be returned are FK_JobID’s 2 and 3.
DECLARE @History TABLE
(
HistoryID INT IDENTITY(1,1),
MaterStatus VARCHAR(300),
Updated DATETIME,
FK_JobID INT
)
INSERT INTO @History
VALUES ('Issued','2015-09-09',1),
('Revoked','2015-09-09',1),
('Issued','2015-09-09',2),
('Archived','2015-09-09',2),
('Issued','2015-09-09',3),
('Revoked','2015-09-09',3),
('Issued','2015-09-10',3),
('other','2015-09-09',4);
How do I write this query to do that?
Here's a method of pulling the results using a CTE to get the most recent times a status was issued, and using a NOT EXISTS clause to filter out those that were revoked.
;With RecentStatuses As
(
Select FK_JobId, MaterStatus, Max(Updated) Recent
From @History
Group by FK_JobID, MaterStatus
)
Select H1.FK_JobID
From RecentStatuses H1
Where H1.MaterStatus = 'Issued'
And Not Exists
(
Select *
From RecentStatuses H2
Where H2.FK_JobID = H1.FK_JobID
And H2.MaterStatus = 'Revoked'
And H2.Recent >= H1.Recent
)
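As a cross-check, the same rule can also be written with conditional aggregation. This is only a sketch of an alternative formulation, not the method above. It redeclares the question's mock-up data as a table variable so it runs on its own, and the '19000101' fallback is an assumed stand-in for "never revoked".

```sql
DECLARE @History TABLE
(
    HistoryID INT IDENTITY(1,1),
    MaterStatus VARCHAR(300),
    Updated DATETIME,
    FK_JobID INT
);

-- Same rows as the question's mock-up, with unambiguous yyyymmdd date literals.
INSERT INTO @History (MaterStatus, Updated, FK_JobID)
VALUES ('Issued',   '20150909', 1),
       ('Revoked',  '20150909', 1),
       ('Issued',   '20150909', 2),
       ('Archived', '20150909', 2),
       ('Issued',   '20150909', 3),
       ('Revoked',  '20150909', 3),
       ('Issued',   '20150910', 3),
       ('other',    '20150909', 4);

-- Keep a job only if its latest 'Issued' is strictly later than its latest
-- 'Revoked'. Jobs never revoked compare against the assumed fallback date;
-- jobs never issued yield NULL on the left side and are filtered out.
SELECT FK_JobID
FROM @History
GROUP BY FK_JobID
HAVING MAX(CASE WHEN MaterStatus = 'Issued' THEN Updated END)
     > COALESCE(MAX(CASE WHEN MaterStatus = 'Revoked' THEN Updated END), '19000101');
```

For the mock-up data this returns FK_JobID 2 and 3. Like the NOT EXISTS version, it treats a revoke with the same timestamp as the issue as still revoked, which is why job 1 is excluded.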

Database design - Multiple 1 to many relationships

What would be the best way to model 1 table with multiple 1 to many relationships?
With the schema in the example code below, suppose Report contains 1 row, Grant 2 rows and Donation 12. When I join the three together I end up with a Cartesian product and a result set of 24 rows. Report joins to Grant and creates 2 rows, then Donation joins on that to make 24 rows.
Is there a better way to model this to avoid the Cartesian product?
example code
DECLARE @Report
TABLE (
ReportID INT,
Name VARCHAR(50)
)
INSERT
INTO @Report
(
ReportID,
Name
)
SELECT 1,'Report1'
DECLARE @Grant
TABLE (
GrantID INT IDENTITY(1,1) PRIMARY KEY(GrantID),
GrantMaker VARCHAR(50),
Amount DECIMAL(10,2),
ReportID INT
)
INSERT
INTO @Grant
(
GrantMaker,
Amount,
ReportID
)
SELECT 'Grantmaker1',10,1
UNION ALL
SELECT 'Grantmaker2',999,1
DECLARE @Donation
TABLE (
DonationID INT IDENTITY(1,1) PRIMARY KEY(DonationID),
DonationMaker VARCHAR(50),
Amount DECIMAL(10,2),
ReportID INT
)
INSERT
INTO @Donation
(
DonationMaker,
Amount,
ReportID
)
SELECT 'Grantmaker1',10,1
UNION ALL
SELECT 'Grantmaker2',3434,1
UNION ALL
SELECT 'Grantmaker3',45645,1
UNION ALL
SELECT 'Grantmaker4',3,1
UNION ALL
SELECT 'Grantmaker5',34,1
UNION ALL
SELECT 'Grantmaker6',23,1
UNION ALL
SELECT 'Grantmaker7',67,1
UNION ALL
SELECT 'Grantmaker8',78,1
UNION ALL
SELECT 'Grantmaker9',98,1
UNION ALL
SELECT 'Grantmaker10',43,1
UNION ALL
SELECT 'Grantmaker11',107,1
UNION ALL
SELECT 'Grantmaker12',111,1
SELECT *
FROM @Report r
INNER JOIN
@Grant g
ON r.ReportID = g.ReportID
INNER JOIN
@Donation d
ON r.ReportID = d.ReportID
Update 1 2011-03-07 15:20
Cheers for the feedback so far, to add to this scenario there are also 15 other 1 to many relationships coming from the one report table. These tables can't for various business reasons be grouped together.
Is there any relationship at all between Grants and Donations? If there isn't, does it make sense to pull back a query that shows a pseudo relationship between them?
I'd do one query for grants:
SELECT r.*, g.*
FROM @Report r
JOIN @Grant g ON r.ReportID = g.ReportID
And another for donations:
SELECT r.*, d.*
FROM @Report r
JOIN @Donation d ON r.ReportID = d.ReportID
Then let your application show the appropriate data.
However, if Grants and Donations are similar, then just make a more generic table such as Contributions.
Contributions
-------------
ContributionID (PK)
Maker
Amount
Type
ReportID (FK)
Now your query is:
SELECT r.*, c.*
FROM @Report r
JOIN @Contribution c ON r.ReportID = c.ReportID
WHERE c.Type = 'Grant' -- or Donation, depending on the application
If you're going to join on ReportID, then no, you can't avoid a lot of rows. When you omit the table "Report", and just join "Donation" to "Grant" on ReportId, you still get 24 rows.
SELECT *
FROM Grant g
INNER JOIN
Donation d
ON g.ReportID = d.ReportID
But the essential point is that it doesn't make sense in the real world to match up donations and grants. They're completely independent things that essentially have nothing to do with each other.
In the database, the statement immediately above will join each row in Grants to every matching row in Donation. The resulting 24 rows really shouldn't surprise you.
When you need to present independent things to the user, you should use a report writer or web application (for example) that selects the independent things, well, independently. Select donations and put them into one section of a report or web page, then select grants and put them into another section of the report or web page, and so on.
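If what you actually need is one row per report with totals from each child table, a related technique (not something proposed above, just a sketch) is to aggregate each child table on its own before joining. Each derived table then has at most one row per ReportID, so the join cannot multiply rows. The table variables and rows below are trimmed, hypothetical versions of the question's example.

```sql
DECLARE @Report TABLE (ReportID INT, Name VARCHAR(50));
DECLARE @Grant TABLE (GrantID INT IDENTITY(1,1), Amount DECIMAL(10,2), ReportID INT);
DECLARE @Donation TABLE (DonationID INT IDENTITY(1,1), Amount DECIMAL(10,2), ReportID INT);

INSERT INTO @Report (ReportID, Name) VALUES (1, 'Report1');
INSERT INTO @Grant (Amount, ReportID) VALUES (10, 1), (999, 1);
INSERT INTO @Donation (Amount, ReportID) VALUES (10, 1), (3434, 1), (45645, 1);

-- Each child table is collapsed to one row per ReportID before the join,
-- so the result has one row per report instead of grants x donations rows.
SELECT r.ReportID, r.Name,
       g.GrantCount, g.GrantTotal,
       d.DonationCount, d.DonationTotal
FROM @Report r
LEFT JOIN (
    SELECT ReportID, COUNT(*) AS GrantCount, SUM(Amount) AS GrantTotal
    FROM @Grant
    GROUP BY ReportID
) g ON g.ReportID = r.ReportID
LEFT JOIN (
    SELECT ReportID, COUNT(*) AS DonationCount, SUM(Amount) AS DonationTotal
    FROM @Donation
    GROUP BY ReportID
) d ON d.ReportID = r.ReportID;
```

This only helps for summaries. For listing the individual grants and donations, separate queries per child table remain the simpler option.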
If the table "Report" is supposed to help you record which sections go into a particular report, then you need a structure more like this:
create table reports (
reportid integer primary key,
report_name varchar(35) not null unique
);
create table report_sections (
reportid integer not null references reports (reportid),
section_name varchar(35), -- Might want to reference a table of section names
section_order integer not null,
primary key (reportid, section_name)
);
The donation and grant tables look almost identical. You could make them one table and add a column such as DonationType, which would reduce complexity by one table. If donations and grants are completely different and have different subtables associated with them, then keeping them separate and joining to only one at a time would be ideal.

SQL Server: row present in one query, missing in another

Ok so I think I must be misunderstanding something about SQL queries. This is a pretty wordy question, so thanks for taking the time to read it (my problem is right at the end, everything else is just context).
I am writing an accounting system that works on the double-entry principle -- money always moves between accounts, a transaction is 2 or more TransactionParts rows decrementing one account and incrementing another.
Some TransactionParts rows may be flagged as tax related so that the system can produce a report of total VAT sales/purchases etc, so it is possible that a single Transaction may have two TransactionParts referencing the same Account -- one VAT related, and the other not. To simplify presentation to the user, I have a view to combine multiple rows for the same account and transaction:
create view Accounting.CondensedEntryView as
select p.[Transaction], p.Account, sum(p.Amount) as Amount
from Accounting.TransactionParts p
group by p.[Transaction], p.Account
I then have a view to calculate the running balance column, as follows:
create view Accounting.TransactionBalanceView as
with cte as
(
select ROW_NUMBER() over (order by t.[Date]) AS RowNumber,
t.ID as [Transaction], p.Amount, p.Account
from Accounting.Transactions t
inner join Accounting.CondensedEntryView p on p.[Transaction]=t.ID
)
select b.RowNumber, b.[Transaction], a.Account,
coalesce(sum(a.Amount), 0) as Balance
from cte a, cte b
where a.RowNumber <= b.RowNumber AND a.Account=b.Account
group by b.RowNumber, b.[Transaction], a.Account
For reasons I haven't yet worked out, a certain transaction (ID=30) doesn't appear on an account statement for the user. I confirmed this by running
select * from Accounting.TransactionBalanceView where [Transaction]=30
This gave me the following result:
RowNumber Transaction Account Balance
-------------------- ----------- ------- ---------------------
72 30 23 143.80
As I said before, there should be at least two TransactionParts for each Transaction, so one of them isn't being presented in my view. I assumed there must be an issue with the way I've written my view, and run a query to see if there's anything else missing:
select [Transaction], count(*)
from Accounting.TransactionBalanceView
group by [Transaction]
having count(*) < 2
This query returns no results -- not even for Transaction 30! Thinking I must be an idiot I run the following query:
select [Transaction]
from Accounting.TransactionBalanceView
where [Transaction]=30
It returns two rows! So select * returns only one row and select [Transaction] returns both. After much head-scratching and re-running the last two queries, I concluded I don't have the faintest idea what's happening. Any ideas?
Thanks a lot if you've stuck with me this far!
Edit:
Here are the execution plans for select * and for select [Transaction]. They are 1000 lines each, hence finding somewhere else to host.
Edit 2:
For completeness, here are the tables I used:
create table Accounting.Accounts
(
ID smallint identity primary key,
[Name] varchar(50) not null
constraint UQ_AccountName unique,
[Type] tinyint not null
constraint FK_AccountType foreign key references Accounting.AccountTypes
);
create table Accounting.Transactions
(
ID int identity primary key,
[Date] date not null default getdate(),
[Description] varchar(50) not null,
Reference varchar(20) not null default '',
Memo varchar(1000) not null
);
create table Accounting.TransactionParts
(
ID int identity primary key,
[Transaction] int not null
constraint FK_TransactionPart foreign key references Accounting.Transactions,
Account smallint not null
constraint FK_TransactionAccount foreign key references Accounting.Accounts,
Amount money not null,
VatRelated bit not null default 0
);
Demonstration of possible explanation.
Create table Script
SELECT *
INTO #T
FROM master.dbo.spt_values
CREATE NONCLUSTERED INDEX [IX_T] ON #T ([name] DESC,[number] DESC);
Query one (Returns 35 results)
WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
FROM #T
)
SELECT c1.number,c1.[type]
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number
Query Two (Same as before but adding c2.[type] to the select list makes it return 0 results)
;
WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY NAME) AS rn
FROM #T
)
SELECT c1.number,c1.[type] ,c2.[type]
FROM cte c1
JOIN cte c2 ON c1.rn=c2.rn AND c1.number <> c2.number
Why?
The order in which row_number() numbers rows with duplicate NAMEs isn't specified, so the optimizer just uses whichever order fits the best execution plan for the required output columns. In the second query the numbering is the same for both cte invocations; in the first one it chooses a different access path for one of them, with a resultant different row numbering.
Suggested Solution
You are self joining the CTE on ROW_NUMBER() over (order by t.[Date])
Contrary to what may have been expected, the CTE will likely not be materialised, which would have ensured consistency for the self join. You are therefore assuming a correlation between ROW_NUMBER() on both sides that may well not exist for records where a duplicate [Date] exists in the data.
What if you try ROW_NUMBER() over (order by t.[Date], t.[id]) to ensure that in the event of tied dates the row_numbering is in a guaranteed consistent order. (Or some other column/combination of columns that can differentiate records if id won't do it)
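A sketch of the view with a deterministic ordering follows. One assumption to flag: since CondensedEntryView returns one row per [Transaction] and Account, t.ID alone still ties between the accounts of a single transaction, so p.Account is added as a further tie-breaker. It uses alter view on the assumption that the view already exists.

```sql
-- Assumption: (Date, transaction ID, Account) is unique per row of
-- CondensedEntryView, because that view groups by [Transaction], Account.
alter view Accounting.TransactionBalanceView as
with cte as
(
    select ROW_NUMBER() over (order by t.[Date], t.ID, p.Account) as RowNumber,
           t.ID as [Transaction], p.Amount, p.Account
    from Accounting.Transactions t
    inner join Accounting.CondensedEntryView p on p.[Transaction] = t.ID
)
select b.RowNumber, b.[Transaction], a.Account,
       coalesce(sum(a.Amount), 0) as Balance
from cte a
inner join cte b on a.RowNumber <= b.RowNumber and a.Account = b.Account
group by b.RowNumber, b.[Transaction], a.Account;
```

With a unique ordering, both references to the cte must produce the same numbering regardless of which access path the optimizer picks for each, so the self join lines up.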
If the purpose of this part of the view is just to make sure that the same row isn't joined to itself
where a.RowNumber <= b.RowNumber
then how does changing this part to
where a.RowNumber <> b.RowNumber
affect the results?
It seems you are reading dirty entries (someone else is deleting or inserting data).
Try SET TRANSACTION ISOLATION LEVEL READ COMMITTED.
I've tried this code (it seems equivalent to yours)
IF object_id('tempdb..#t') IS NOT NULL DROP TABLE #t
CREATE TABLE #t(i INT, val INT, acc int)
INSERT #t
SELECT 1, 2, 70
UNION ALL SELECT 2, 3, 70
;with cte as
(
select ROW_NUMBER() over (order by t.i) AS RowNumber,
t.val as [Transaction], t.acc Account
from #t t
)
select b.RowNumber, b.[Transaction], a.Account
from cte a, cte b
where a.RowNumber <= b.RowNumber AND a.Account=b.Account
group by b.RowNumber, b.[Transaction], a.Account
and got two rows
RowNumber Transaction Account
1 2 70
2 3 70