SQL Query - solving duplicate data - sql

Here are the 2 tables I am using currently with some example data:
Finances
FinanceID   Variation   Total
1           0           £1,000.00
5           1           £250.00
24          2           £500.00
A Project can contain multiple Finances. Variation 0 is the Original Order, and each Finance line thereafter has a Variation number 1 greater than the previous one.
Application
ApplicationID   ApplicationNumber   PercentageComplete   Value
5               1                   20%                  £200.00
17              2                   50%                  £300.00
35              3                   75%                  £250.00
91              4                   90%                  £150.00
The Application table above references the Original Finance Line, NOT the Variations.
Here is an example of my problem, which I will explain in more detail afterwards:
Application 5              PercentageComplete   Value
  Contains no Variations
Application 17             PercentageComplete   Value
  Contains Variation 1     40%                  £100.00
Application 35             PercentageComplete   Value
  Contains Variation 1     100%                 £150.00
  Contains Variation 2     25%                  £125.00
Application 91             PercentageComplete   Value
  Contains Variation 1     100%                 £0.00
  Contains Variation 2     60%                  £175.00
An Application can contain multiple Variations
Once an Application contains a Variation, it needs to be automatically added to the next Application that is created.
So, using the above example, the user would manually add Variation 1 to Application 17 and enter its percentage complete, and the value would be calculated automatically.
MY PROBLEM:
Now, for Application 35, I want the variation line from the previous Application added to this one AUTOMATICALLY. However, when the information is edited (now at 100%), I do not want this to affect Application 17.
Is my only option to keep duplicating the data for each Variation Line, or is there a more efficient method someone could help me with? I have also tried writing a query to do this, which was a lot more difficult than I anticipated, so if this is the only method, some pointers or an example would be a great help.
For the Query, I created this table to try it:
VariationLine   ApplicationID   VariationID   PreviousPercentage   NewPercentage   Value
1               17              1             0%                   40%             £100.00
2               35              1             40%                  100%            £150.00
3               35              2             0%                   25%             £125.00
4               91              1             100%                 100%            £0.00
5               91              2             25%                  60%             £175.00
If I was to add a 5th Application, then I would need to insert the 2 previous VariationLines for the new Project (4 and 5)
Short Version of my problem:
I have 4 Applications. The 1st has no Variations and just its own cost. For the 2nd Application, the user manually adds Variation 1 with a percentage complete of 40% (£100.00). When the user creates the 3rd Application, I would like SQL Server to automatically add the 1st Variation to this Project as a new Variation line, so that I can amend the percentage complete without affecting the previous Application.
Entity Relationship Diagram

I have created a view that first determines the previous VariationLines based upon a changedate column. When a new Application is added, the view below selects all of the Variations from the previous Application. If the Application is the 1st, nothing is added initially; a user has the option to add Variations to the Application at any point in time.
CREATE VIEW NextApplicationVariations AS
SELECT variationid,
       (SELECT TOP (1) applicationid
        FROM   dbo.applications
        ORDER  BY changedate DESC) AS ApplicationID,
       (SELECT TOP (1) applicationnumber
        FROM   dbo.applications
        ORDER  BY changedate DESC) AS ApplicationCounter,
       variationpercentage,
       0                           AS VariationValue,
       variationpercentage         AS PreviousPercentage,
       projectreference,
       totalvariationvalue,
       0                           AS VariationRetention
FROM   dbo.variationline
WHERE  ( projectreference = (SELECT TOP (1) projectreference
                             FROM   dbo.projectcosts
                             ORDER  BY changedate DESC) )
       AND ( applicationcounter = (SELECT TOP (1) applicationnumber
                                   FROM   dbo.applications
                                   ORDER  BY changedate DESC) - 1 )
Now all that's left to do is insert the new lines based upon this view. Because this query is run against a view, I can use NOT EXISTS in the query to determine whether the latest Application has already had its VariationLines added automatically.
--INSERT PREVIOUS VARIATIONS INTO NEW APPLICATIONS
INSERT INTO variationline
            (variationid,
             applicationid,
             applicationcounter,
             variationpercentage,
             variationvalue,
             previouspercentage,
             projectreference,
             totalvariationvalue,
             variationretention)
SELECT nextvariationline.variationid,
       nextvariationline.applicationid,
       nextvariationline.applicationcounter,
       nextvariationline.variationpercentage,
       nextvariationline.variationvalue,
       nextvariationline.previouspercentage,
       nextvariationline.projectreference,
       nextvariationline.totalvariationvalue,
       nextvariationline.variationretention
-- the view above is named NextApplicationVariations; alias it so the prefixes resolve
FROM   NextApplicationVariations AS nextvariationline
WHERE  NOT EXISTS (SELECT *
                   FROM   variationline
                   WHERE  nextvariationline.applicationid =
                          variationline.applicationid)
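
For anyone who wants to try the carry-forward idea end to end, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table layout follows the trial VariationLine table from the question. The names make_db and carry_forward and the new application id 120 are hypothetical, and percentages are stored as plain integers to keep things short.

```python
import sqlite3


def make_db():
    """Build an in-memory copy of the trial VariationLine table from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE variationline (
               variationline      INTEGER PRIMARY KEY,
               applicationid      INTEGER NOT NULL,
               variationid        INTEGER NOT NULL,
               previouspercentage INTEGER NOT NULL,
               newpercentage      INTEGER NOT NULL,
               value              REAL    NOT NULL)"""
    )
    conn.executemany(
        "INSERT INTO variationline VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 17, 1, 0, 40, 100.0),
            (2, 35, 1, 40, 100, 150.0),
            (3, 35, 2, 0, 25, 125.0),
            (4, 91, 1, 100, 100, 0.0),
            (5, 91, 2, 25, 60, 175.0),
        ],
    )
    return conn


def carry_forward(conn, prev_app_id, new_app_id):
    """Copy the previous application's variation lines to the new application.

    Each copy starts where the previous application finished (previous and new
    percentage both equal the old new percentage) with a value of 0. The
    NOT EXISTS guard makes a second call for the same application a no-op.
    """
    conn.execute(
        """INSERT INTO variationline
               (applicationid, variationid, previouspercentage, newpercentage, value)
           SELECT ?, v.variationid, v.newpercentage, v.newpercentage, 0
           FROM   variationline AS v
           WHERE  v.applicationid = ?
             AND  NOT EXISTS (SELECT 1
                              FROM   variationline AS done
                              WHERE  done.applicationid = ?)""",
        (new_app_id, prev_app_id, new_app_id),
    )
```

Calling carry_forward(conn, 91, 120) on this data adds two new lines for application 120 (the copies of lines 4 and 5). Because they are separate rows, editing them later leaves the lines of application 91 unchanged.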

Related

Decrement all values in a column after insert at top SQL

Before Inserting
Id   Priority
1    1
2    2
3    3
After Inserting Id: 4, Priority 2
Id   Priority
1    1
4    2
2    3
3    4
I'm fairly new to Postgres, and I have a table with a column named priority. This column should have unique values. If you attempt to give a row a priority that already exists, it should be inserted with that priority, and every existing row at that priority or later should be moved down one place to accommodate it (as in the example above, where priorities 2 and 3 become 3 and 4).
Is there a term for this sort of behavior? I know it will involve a column with unique values, but are there any model constraints I can introduce to enable this sort of behavior? Or do I need to manually code an algorithm to do this and account for all edge cases?
I wouldn't store priority as its own field. Create the table as ID, priority, Date_entered. Then use:
Select ID, rank() over (order by priority, date_entered) as priority
...
I suspect that, since the rank can change so frequently, calculating it on the fly like this would be preferable to attempting to store the rank and keep it updated.
edit:
There is a logical flaw to this that I can spot already...if record 4 was inserted as priority 2 (so the database contains 2 priority 2 records), there really wouldn't be an easy way to inject ID 5 between ID 4 and 2 without manipulating the date_entered field.
second edit:
Allowing the priority column to be decimal (priority 2 entered, then priority 2.5 entered, and so on), then using the rank() function to resolve that to an integer would get around that. There isn't a pretty answer here that I can find
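
To make the decimal-priority idea concrete, here is a small sketch using Python's built-in sqlite3 module instead of Postgres. It assumes an SQLite build with window-function support (3.25 or later). The table name task, the function ranked_ids and the dates are all made up for illustration; the new row gets priority 1.5 so that it lands ahead of the existing priority 2.

```python
import sqlite3


def ranked_ids(conn):
    """Return (id, position) pairs, with the position computed at query time."""
    return conn.execute(
        """SELECT id,
                  RANK() OVER (ORDER BY priority, date_entered) AS pos
           FROM   task
           ORDER  BY pos"""
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE task (
           id           INTEGER PRIMARY KEY,
           priority     REAL NOT NULL,   -- decimal, so rows can be slotted in between
           date_entered TEXT NOT NULL)"""
)
conn.executemany(
    "INSERT INTO task VALUES (?, ?, ?)",
    [(1, 1.0, "2024-01-01"), (2, 2.0, "2024-01-02"), (3, 3.0, "2024-01-03")],
)
before = ranked_ids(conn)

# Slot the new row between the existing priorities 1 and 2.
conn.execute("INSERT INTO task VALUES (4, 1.5, '2024-01-04')")
after = ranked_ids(conn)
```

Here before holds ids 1, 2, 3 at positions 1 to 3, and after holds ids 1, 4, 2, 3 at positions 1 to 4, which matches the "After Inserting" table in the question without updating any existing row.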

SQL to return records that do not have a complete set according to a second table

I have two tables. I want to find the erroneous records in the first table, based on the fact that they do not form a complete set as determined by the second table. E.g.:
custID   service   transID
1        20        1
1        20        2
1        50        2
2        49        1
2        138       1
3        80        1
3        140       1

comboID   combinations
1         Y00020Y00050
2         Y00049Y00138
3         Y00020Y00049
4         Y00020Y00080Y00140
So in this example I would want a query to return the first row of the first table because it does not have a matching 49 or 50 or (80 and 140), and the last two rows as well (because there is no 20). The second transaction is fine, and the second customer is fine.
I couldn't figure this out with a query, so I wound up writing a program that loads the services per customer and transid into an array, iterates over them, and ensures that there is at least one matching combination record where all the services in the combination are present in the initially loaded array. Even that came off as hamfisted, but it was less of a nightmare than the awkward outer joining of multiple joins I was trying to accomplish with SQL.
Taking a step back, I think I need to restructure the combinations table into something more accommodating, but I still can't think of what the approach would be.
I do not have DB2, so I have tested on Oracle. However, the listagg function should be available there as well. The table service is the first table and comb is the second one. I assume the service numbers within each combinations value are sorted, as they are in the example.
select service.*
from service
join
(
    select S.custid, S.transid
    from
    (
        -- zero-pad each service to five digits so that 138 becomes Y00138,
        -- matching the format of the combinations column
        select custid, transid,
               listagg('Y' || lpad(service, 5, '0')) within group (order by service) as agg
        from service
        group by custid, transid
    ) S
    where not exists
    (
        select *
        from comb
        where S.agg = comb.combinations
    )
) NOT_F on NOT_F.custid = service.custid and NOT_F.transid = service.transid
I dare to say that your database design does not conform to the first normal form since the combinations column is not atomic. Think about it.
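
As a rough illustration of what an atomic representation buys you, the sketch below (plain Python, not DB2 or Oracle) splits each combinations string into a set of service numbers and then flags every customer/transaction pair that does not contain all services of at least one combination, which is the rule described in the question. The function names are hypothetical, and the Y-plus-five-digits format is assumed from the sample data.

```python
import re
from collections import defaultdict


def parse_combination(text):
    """Split a string such as 'Y00020Y00050' into the set {20, 50}."""
    return {int(code) for code in re.findall(r"Y(\d{5})", text)}


def incomplete_transactions(service_rows, combinations):
    """Return the sorted (custID, transID) pairs that satisfy no combination.

    service_rows: iterable of (custID, service, transID) tuples.
    combinations: iterable of strings in the 'Y00020Y00050' format.
    """
    combos = [parse_combination(text) for text in combinations]
    present = defaultdict(set)
    for cust_id, service, trans_id in service_rows:
        present[(cust_id, trans_id)].add(service)
    # A transaction is fine if some combination is a subset of its services.
    return sorted(
        key
        for key, services in present.items()
        if not any(combo <= services for combo in combos)
    )


SERVICE_ROWS = [
    (1, 20, 1), (1, 20, 2), (1, 50, 2),
    (2, 49, 1), (2, 138, 1),
    (3, 80, 1), (3, 140, 1),
]
COMBINATIONS = ["Y00020Y00050", "Y00049Y00138", "Y00020Y00049", "Y00020Y00080Y00140"]
bad = incomplete_transactions(SERVICE_ROWS, COMBINATIONS)
```

With the sample data, bad contains customer 1 transaction 1 and customer 3 transaction 1, the same rows the question expects. Replacing the subset test `combo <= services` with `combo == services` gives the exact-match behaviour of the listagg query instead.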

How to display top 10 records by number of accesses

I am building a report to display the Top 10 Applications by number of accesses in the past month. So far I've written this code:
Select
    Top 10 r.Displayname, Count(Distinct(ur.EntryDtm)) as AccessCount
from Table1 ur, Table2 r
where ur.EntryDtm between
        (DATEADD(MONTH, DATEDIFF(MONTH,0,GETDATE())-1,0))
    and
        (DATEADD(MONTH,-1,DATEADD(mm,DATEDIFF(m,0,GETDATE())+1,0)))
    and r.AppID = ur.AppID
group by r.Displayname
order by 2 desc
This displays only 1 record (example output below):
Displayname|AccessCount
-----------------------
App 1 | 26
What am I missing to return the top 10 Apps, as opposed to just the most accessed app? When I view the records manually in MS SQL Server, there are other applications there, but they do not appear when I run my query.
Thank you in advance for any help!

Multicriteria Insert/Update

I'm trying to create a query that will insert new records to a table or update already existing records, but I'm getting stuck on the filtering and grouping for the criteria I want.
I have two tables: tbl_PartInfo, and dbo_CUST_BOOK_LINE.
I want to select from dbo_CUST_BOOK_LINE based upon the combination of CUST_ORDER_ID, CUST_ORDER_LINE_NO, and REVISION_ID. Each customer order can have multiple lines, and each line can have multiple revisions. I'm trying to select the unique combinations of each order and its connected lines, but take the connected information from the row with the highest value in the revision column.
I want to insert/update from dbo_CUST_BOOK_LINE the following columns:
CUST_ORDER_ID
PART_ID
USER_ORDER_QTY
UNIT_PRICE
I want to insert/update them into tbl_PartInfo as the following columns respectively:
JobID
DrawingNumber
Quantity
UnitPrice
So if I have the following rows in dbo_CUST_BOOK_LINE (PART_ID omitted for example)
CUST_ORDER_ID   CUST_ORDER_LINE_NO   REVISION_ID   USER_ORDER_QTY   UNIT_PRICE
SCabc           1                    1             0                100
SCabc           1                    2             4                150
SCabc           1                    3             4                125
SCabc           2                    3             2                200
SCxyz           1                    1             0                0
SCxyz           1                    2             3                50
It would return
CUST_ORDER_ID   CUST_ORDER_LINE_NO   (REVISION_ID)   USER_ORDER_QTY   UNIT_PRICE
SCabc           1                    3               4                125
SCabc           2                    3               2                200
SCxyz           1                    2               3                50
but with PART_ID included and without REVISION_ID
So far, my code is just for the insert portion, as I was trying to get the correct records selected, but I keep getting duplicates of CUST_ORDER_ID and CUST_ORDER_LINE_NO.
INSERT INTO tbl_PartInfo ( JobID, DrawingNumber, Quantity, UnitPrice, ProductFamily, ProductCategory )
SELECT dbo_CUST_BOOK_LINE.CUST_ORDER_ID, dbo_CUST_BOOK_LINE.PART_ID, dbo_CUST_BOOK_LINE.USER_ORDER_QTY, dbo_CUST_BOOK_LINE.UNIT_PRICE, dbo_CUST_BOOK_LINE.CUST_ORDER_LINE_NO, Max(dbo_CUST_BOOK_LINE.REVISION_ID) AS MaxOfREVISION_ID
FROM dbo_CUST_BOOK_LINE, tbl_PartInfo
GROUP BY dbo_CUST_BOOK_LINE.CUST_ORDER_ID, dbo_CUST_BOOK_LINE.PART_ID, dbo_CUST_BOOK_LINE.USER_ORDER_QTY, dbo_CUST_BOOK_LINE.UNIT_PRICE, dbo_CUST_BOOK_LINE.CUST_ORDER_LINE_NO;
This has been far more complicated than anything I've done so far, so any help would be greatly appreciated. Sorry about the long column names; I didn't get to choose them.
I did some research and think I found a way to make it work, but I'm still testing it. Right now I'm using three queries, but it should be easily simplified into two when complete.
The first is an append query that takes the two columns I want distinct combinations of, selects them using "group by", and also selects the max of the revision column. It appends the results to another table called tbl_TempDrop. This table is only being used right now to reduce the number of results before the next part.
The second is an update query that updates tbl_TempDrop to include all the other columns I wanted by setting the criteria equal to the three selected columns from the first query. This took an EXTREMELY long time to complete when I had 700,000 records to work with, hence the use of the tbl_TempDrop.
The third query is a basic append query that appends the rows of tbl_TempDrop to the end destination, tbl_PartInfo.
All that's left is to run all three in a row.
I didn't want to include the full details of any tables or queries yet until I ensure that it works as desired, and because some of the names are vague since I will be using this method for multiple query searches.
This website helped me a little to make sure I had the basic idea down. http://www.techonthenet.com/access/queries/max_query2_2007.php
Let me know if you see any flaws in the approach!
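
For comparison, the group-by-then-match-back steps described above can be collapsed into a single statement. The sketch below uses Python's built-in sqlite3 module rather than Access, so the syntax is only indicative. It covers the insert half only, leaves out PART_ID and DrawingNumber because the sample rows omit them, and the names make_db and load_part_info are hypothetical.

```python
import sqlite3

# Latest revision per (order, line): find MAX(REVISION_ID) per group, then
# join back to the source table to pick up the other columns of that row.
LATEST_REVISION_SQL = """
SELECT b.CUST_ORDER_ID, b.USER_ORDER_QTY, b.UNIT_PRICE
FROM   dbo_CUST_BOOK_LINE AS b
JOIN  (SELECT CUST_ORDER_ID, CUST_ORDER_LINE_NO, MAX(REVISION_ID) AS MAX_REVISION
       FROM   dbo_CUST_BOOK_LINE
       GROUP  BY CUST_ORDER_ID, CUST_ORDER_LINE_NO) AS m
  ON   m.CUST_ORDER_ID      = b.CUST_ORDER_ID
 AND   m.CUST_ORDER_LINE_NO = b.CUST_ORDER_LINE_NO
 AND   m.MAX_REVISION       = b.REVISION_ID
ORDER  BY b.CUST_ORDER_ID, b.CUST_ORDER_LINE_NO
"""


def make_db():
    """Create both tables and load the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE dbo_CUST_BOOK_LINE (
               CUST_ORDER_ID TEXT, CUST_ORDER_LINE_NO INTEGER,
               REVISION_ID INTEGER, USER_ORDER_QTY INTEGER, UNIT_PRICE INTEGER)"""
    )
    conn.execute(
        "CREATE TABLE tbl_PartInfo (JobID TEXT, Quantity INTEGER, UnitPrice INTEGER)"
    )
    conn.executemany(
        "INSERT INTO dbo_CUST_BOOK_LINE VALUES (?, ?, ?, ?, ?)",
        [
            ("SCabc", 1, 1, 0, 100), ("SCabc", 1, 2, 4, 150), ("SCabc", 1, 3, 4, 125),
            ("SCabc", 2, 3, 2, 200), ("SCxyz", 1, 1, 0, 0), ("SCxyz", 1, 2, 3, 50),
        ],
    )
    return conn


def load_part_info(conn):
    """Append the latest-revision rows to tbl_PartInfo and return its contents."""
    conn.execute(
        "INSERT INTO tbl_PartInfo (JobID, Quantity, UnitPrice) " + LATEST_REVISION_SQL
    )
    return conn.execute(
        "SELECT JobID, Quantity, UnitPrice FROM tbl_PartInfo ORDER BY rowid"
    ).fetchall()
```

On the sample rows this yields one row per order line, each taken from its highest revision, with no temporary table in between. Whether that performs acceptably on 700,000 records in Access is something I have not tested.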

MDX query to count number of rows that match a certain condition (newest row for each question, client group)

I have the following fact table:
response_history_id   client_id   question_id   answer
1                     1           2             24
2                     1           2             27
3                     1           3             12
4                     1           2             43
5                     2           2             39
It holds history of client answers to some questions. The largest response_history_id for each client_id,question_id combination is the latest answer for that question and client.
What I want to do is to count the number of clients whose latest answer falls within a specific range
I have some dimensions:
question associated with question_id
client associated with client_id
response_history_id associated with response_history_id
range associated with answer: 0-20 is low, 20-40 is medium, >40 is high
and some measures:
max_history_id as max(response_history_id)
clients_count as distinct count(client_id)
Now, I want to group only the latest answers by range:
select
[ranges].members on 0,
{[Measures].[clients_count]} on 1
from (select [question].[All].[2] on 1 from [Cube])
What I get is:
Measures        All   low   medium   high
clients_count   2     0     2        1
But what I wanted (and I can't get) is the calculation based on the latest answer:
Measures        All   low   medium   high
clients_count   2     0     1        1
I understand why my query doesn't give me the desired result, it's more for demonstration purpose. But I have tried a lot of more complex MDX queries and still couldn't get any good result.
Also, I can't generate a static view from my fact table, because later on I would like to limit the search by another column in the fact table, a timestamp. My queries must eventually be able to get the number of clients whose latest answer to a question before a given timestamp falls within a specific range.
Can anyone help me with this please?
I can define other dimensions and measures and I am using iccube.
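
To pin down the calculation I am after, here it is in relational terms, as a sketch using Python's built-in sqlite3 module. This is not MDX and not icCube; it only shows the expected numbers. The function name is made up, and the bucket edges are an assumption, since the ranges above overlap at 20 and 40: below 20 is low, 20 to 40 inclusive is medium, above 40 is high.

```python
import sqlite3


def latest_answer_counts(rows, question_id):
    """Count clients per range, using only each client's latest answer.

    rows: (response_history_id, client_id, question_id, answer) tuples.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE response_history (
               response_history_id INTEGER PRIMARY KEY,
               client_id INTEGER, question_id INTEGER, answer INTEGER)"""
    )
    conn.executemany("INSERT INTO response_history VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT CASE WHEN r.answer < 20  THEN 'low'
                    WHEN r.answer <= 40 THEN 'medium'
                    ELSE 'high' END AS bucket,
               COUNT(DISTINCT r.client_id)
        FROM   response_history AS r
        WHERE  r.question_id = ?
          AND  r.response_history_id = (
                   -- the latest row for this client and question
                   SELECT MAX(x.response_history_id)
                   FROM   response_history AS x
                   WHERE  x.client_id = r.client_id
                     AND  x.question_id = r.question_id)
        GROUP  BY bucket
    """
    counts = {"low": 0, "medium": 0, "high": 0}
    counts.update(dict(conn.execute(query, (question_id,)).fetchall()))
    conn.close()
    return counts


FACT_ROWS = [(1, 1, 2, 24), (2, 1, 2, 27), (3, 1, 3, 12), (4, 1, 2, 43), (5, 2, 2, 39)]
wanted = latest_answer_counts(FACT_ROWS, 2)
```

For question 2 this gives low 0, medium 1, high 1, which is the second result table above. The timestamp restriction would be one more condition inside the MAX subquery.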