SQL - Getting the max effective date less than a date in another table

I'm currently working on a conversion script to transfer a bunch of old data out of a SQL Server 2000 database and onto a SQL Server 2008. One of the things I'm trying to accomplish during this conversion is to eliminate all of the composite keys and replace them with a "proper" primary key. Obviously, when I transfer the data I need to inject the foreign key values into the new table structures.
I'm currently stuck with one data set though and I can't seem to get my head around it in a set-based fashion. The two tables with which I am working are called Charge and Law. They have a 1:1 relationship and "link" on three columns. The first two are an equal link on the LawSource and LawStatute columns, but the third column is causing me problems. The ChargeDate column should link to the LawDate column where LawDate <= ChargeDate.
My current query is returning more than one row (in some cases) for a given Charge because the Law may have more than one LawDate that is less than or equal to the ChargeDate.
Here's what I currently have:
select LawId
from Law a
join Charge b on b.LawSource = a.LawSource
and b.LawStatute = a.LawStatute
and b.ChargeDate >= a.LawDate
Any way I can rewrite this to get the most recent entry in the Law table that has the same (or an earlier) date as the ChargeDate?

This would be easier in SQL 2008 with the ranking (window) functions, so it should be easier in the future for you.
The usual caveats of "I don't have your schema, so this isn't tested" apply, but I think it should do what you need.
select
    l.LawId
from
    Law l
    join (
        select
            a.LawSource,
            a.LawStatute,
            max(a.LawDate) LawDate
        from
            Law a
            join Charge b on b.LawSource = a.LawSource
                and b.LawStatute = a.LawStatute
                and b.ChargeDate >= a.LawDate
        group by
            a.LawSource, a.LawStatute
    ) d on l.LawSource = d.LawSource and l.LawStatute = d.LawStatute and l.LawDate = d.LawDate
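Once you're on SQL Server 2008, the ranking functions mentioned above make this more direct. A sketch that keeps the single latest Law per Charge, assuming Charge has a key column ChargeId (not shown in the question):
;with ranked as (
    select
        b.ChargeId,   -- hypothetical key column on Charge
        a.LawId,
        row_number() over (partition by b.ChargeId
                           order by a.LawDate desc) as rn
    from Charge b
    join Law a
        on a.LawSource = b.LawSource
        and a.LawStatute = b.LawStatute
        and a.LawDate <= b.ChargeDate
)
select ChargeId, LawId
from ranked
where rn = 1   -- rn = 1 is the most recent qualifying Law per Charge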

If performance is not an issue, cross apply provides a very readable way:
select *
from Law l
cross apply
(
select top 1 *
from Charge
where LawSource = l.LawSource
and LawStatute = l.LawStatute
and ChargeDate >= l.LawDate
order by
ChargeDate
) c
For each Law row, this looks up the row in the Charge table with the smallest qualifying ChargeDate.
To include rows from Law without a matching Charge, change cross apply to outer apply.
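Note that the question actually asks the other way around: the latest Law per Charge. The same apply pattern works in that direction too, just ordered descending. A sketch, assuming the column names from the question:
select b.*, c.LawId
from Charge b
cross apply
(
    select top 1 LawId
    from Law
    where LawSource = b.LawSource
    and LawStatute = b.LawStatute
    and LawDate <= b.ChargeDate
    order by LawDate desc   -- most recent Law on or before the ChargeDate
) c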

Related

Can I divide an amount across multiple parties and round to the 'primary' party in a single SQL query?

I am working on an Oracle PL/SQL process which divides a single monetary amount across multiple involved parties in a particular group. Assuming 'pGroupRef' is an input parameter, the current implementation first designates a 'primary' involved party, and then it splits the amount across all the secondaries as follows:
INSERT INTO ActualValue
SELECT
...
pGroupRef AS GroupRef,
ROUND(Am.Amount * P.SplitPercentage / 100, 2) AS Amount,
...
FROM
Amount Am,
Party P
WHERE
Am.GroupRef = pGroupRef
AND P.GroupRef = Am.GroupRef
...
P.PrimaryInd = 0;
Finally, it runs a second procedure to insert whatever amount is left over to the primary party, i.e.:
INSERT INTO ActualValue
SELECT
...
pGroupRef AS GroupRef,
Am.Amount - S.SecondaryAmounts AS Amount,
...
FROM
Amount Am,
Party P,
(SELECT SUM(Amount) AS SecondaryAmounts FROM ActualValue WHERE GroupRef = pGroupRef) S
WHERE
Am.GroupRef = pGroupRef
AND P.GroupRef = Am.GroupRef
...
P.PrimaryInd = 1;
However, the full query here is very large, and I am making this area more complex by adding subgroups, each of which will have its own primary member, plus the possibility of overrides - so if I continued to use this implementation it would mean a lot of duplicated SQL.
I suppose I could always calculate the correct amounts into an array before running a single unified insert - but I feel like there has to be an elegant mathematical way to capture this logic in a single SQL query.
You can use analytic functions to get what you are looking for. As I don't know your exact structure, this is only an example:
SELECT s.party_id, s.member_id,
       s.portion + DECODE(s.prime, 1, s.total - SUM(s.portion) OVER (PARTITION BY s.party_id), 0)
FROM (SELECT p.party_id, p.member_id,
             ROUND(a.amt * (p.split / 100), 2) AS PORTION,
             a.amt AS TOTAL, p.prime
      FROM party p
      INNER JOIN amount a ON p.party_id = a.party_id) s
So in the query you have a subquery that gathers the required information, then the outer query puts everything together, only applying the remainder to the record marked as prime.
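For example (made-up numbers): if an amount of 100 is split across three members at 33.33% each, every portion rounds to 33.33 and the SUM(...) OVER the group is 99.99, so the prime member receives 33.33 + (100 - 99.99) = 33.34 and the group total reconciles exactly.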
Here is a DBFiddle showing how this works (LINK)
N.B.: Interestingly in the example in the DBFiddle, there is a 0.01 overpayment, so the primary actually pays less.

My Joins in query not pulling through correctly

Good evening. Could someone please help me with the following? I am trying to join two tables. The first is wbr_global.gl_ap_details, which stores historic GL information. The second table, sandbox.utr_fixed_mapping, is where account mapping is stored. For example, account number 60820 is mapped as 'Employee relation'. The first table needs the mapping from the second table linked on the account number. The output I am getting is not right and way too big. Any help would be appreciated!
select sandbox.utr_fixed_mapping_na.new_mapping_1,sum(wbr_global.gl_ap_details.amount)
from wbr_global.gl_ap_details
LEFT JOIN sandbox.utr_fixed_mapping_na ON wbr_global.gl_ap_details.account_number = sandbox.utr_fixed_mapping_na.account_number
Where gl_ap_details.cost_center = '1172'
and gl_ap_details.period_name = 'JUL-21'
and gl_ap_details.ledger_name = 'Amazon.com, Inc.'
Group by 1;
I tried adding the cast function but after 5000 seconds of the query running I canceled it.
The query itself looks OK, but a few minor changes will help. Learn to use table aliases, so that you don't have to keep typing the long database.table.column names all over; SQL is easier to read that way anyhow.
Notice the aliases "gl" and "fm" declared after the table names; these aliases are then used to qualify the columns. Easier to read, would you agree?
I have also added the GL account number, as described below the query.
select
gl.account_number,
fm.new_mapping_1,
sum(gl.amount)
from
wbr_global.gl_ap_details gl
LEFT JOIN sandbox.utr_fixed_mapping_na fm
ON gl.account_number = fm.account_number
Where
gl.cost_center = '1172'
and gl.period_name = 'JUL-21'
and gl.ledger_name = 'Amazon.com, Inc.'
Group by
gl.account_number,
fm.new_mapping_1
Now, as for your query and getting null: this just means that there are records in the gl_ap_details table whose account number is not found in the utr_fixed_mapping_na table. So, to see WHICH gl account numbers do NOT exist there, I have added the account number to the query. It's possible there are MULTIPLE records in gl_ap_details that are not found in the mapping table. So, you may get:
GLAccount    Description   SumOfAmount
glaccount1   null          $someAmount
glaccount37  null          $someAmount
glaccount49  null          $someAmount
glaccount72  Depreciation  $someAmount
glaccount87  Real Estate   $someAmount
glaccount92  Building      $someAmount
glaccount99  Salaries      $someAmount
I obviously made up the glaccount values just to show the purpose. You may have multiple missing accounts, where the null row's total amount is actually masking how many different gl account numbers were NOT found.
Once you find which are missing, you can check / confirm they SHOULD be in the mapping table.
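If you just want the unmapped account numbers on their own, an anti-join over the same tables works; a minimal sketch using the same filters as above:
select distinct
    gl.account_number
from
    wbr_global.gl_ap_details gl
    LEFT JOIN sandbox.utr_fixed_mapping_na fm
        ON gl.account_number = fm.account_number
Where
    gl.cost_center = '1172'
    and gl.period_name = 'JUL-21'
    and gl.ledger_name = 'Amazon.com, Inc.'
    and fm.account_number is null   -- keep only rows with no match in the mapping table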
FEEDBACK.
Since you are now aware of the missing numbers, let's consider a Cartesian result. If there are multiple entries in the mapping table for the same G/L account number, you will get a Cartesian result, thus bloating your numbers. To clarify, let's say your mapping table has:
Mapping file:
GL  Descr1    NewMapping
1   test      Salaries
1   testView  Buildings
1   Another   Depreciation
And your GL_AP_Details has:
GL  Amount
1   $100
Your total for the query would result in $300 because the query is trying to join the AP Details GL #1 to EACH of the entries in the mapping file thus bloating the amount. You could also add a COUNT(*) as NumberOfEntries to the query to see how many transactions it THINKS it is processing. Is there some "unique ID" in the GL_AP_Details table? If so, then you could also do a count of DISTINCT ID values. If they are different (distinct is lower than # of entries), I think THAT is your culprit.
select
fm.new_mapping_1,
sum(gl.amount),
count(*) as NumberOfEntries,
count( distinct gl.UniqueIdField ) as DistinctTransactions
from
wbr_global.gl_ap_details gl
LEFT JOIN sandbox.utr_fixed_mapping_na fm
ON gl.account_number = fm.account_number
Where
gl.cost_center = '1172'
and gl.period_name = 'JUL-21'
and gl.ledger_name = 'Amazon.com, Inc.'
Group by
fm.new_mapping_1
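You can also check the mapping table directly for duplicated account numbers; a quick sketch:
select
    account_number,
    count(*) as MappingRows
from
    sandbox.utr_fixed_mapping_na
group by
    account_number
having
    count(*) > 1   -- any row returned means that account will fan out in the join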
Might you also need to limit the mapping table for a specific prophecy or mec view?
If you "think" that the result of an aggregate is wrong, then the easiest way to verify this is to select the individual rows that correlate to 1 record in the aggregate output and inspect the records, looking for duplications.
For instance, pick 'Building Management':
SELECT fixed.new_mapping_1, details.amount, *
FROM wbr_global.gl_ap_details details
LEFT JOIN sandbox.utr_fixed_mapping_na fixed ON details.account_number = fixed.account_number
WHERE details.cost_center = '1172'
AND details.period_name = 'JUL-21'
AND details.ledger_name = 'Amazon.com, Inc.'
AND fixed.new_mapping_1 = 'Building Management'
Notice that we tack a ,* onto the end of the projection; this will show you everything that the query has access to. Look for repeating sections of data that you were not expecting; then, depending on which table they originate from, you might add additional criteria to the JOIN or to the WHERE, or you might need to group by additional columns.
This type of issue is really hard to comment on in a forum like this because it is highly specific to your schema, and the data contained within it, making solutions highly subjective to criteria you are not likely to publish online.
Generally, if you think a calculation is wrong, you need to manually compute it to verify it. The advice above helps you inspect the data your query is using; either construct your own query or use other tools to build the data set that lets you manually compute the correct values, then work them back into, or replace, your original query.
The speed issues are out of scope here; we could comment on the poor schema design, but I suspect you don't have a choice. In the utr_fixed_mapping_na table you should make account_number the same column type as the source data, or add a new column that holds the data in the original type; then you can set up indexes on the columns to improve the speed of the join.
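As a sketch of that last suggestion (hypothetical column and index names; ALTER TABLE and CREATE INDEX syntax varies by platform, and some warehouse engines do not support secondary indexes at all):
-- add a copy of the account number in the same type as gl_ap_details uses
ALTER TABLE sandbox.utr_fixed_mapping_na
    ADD account_number_typed BIGINT;

UPDATE sandbox.utr_fixed_mapping_na
    SET account_number_typed = CAST(account_number AS BIGINT);

-- index the join column so the LEFT JOIN no longer has to scan
CREATE INDEX ix_utr_fixed_mapping_na_account
    ON sandbox.utr_fixed_mapping_na (account_number_typed);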

Splitting one table based on criteria and comparing

I'm not quite sure of the best way to phrase this particular query, so I hope the title is adequate; however, I will attempt to describe what it is I need to understand how to do. Just to clarify, this is for Oracle SQL.
We have a table called assessments. There are different kinds of assessments within this table, however, some assessments should follow others in a logical order and within set time frames. The problems come in when a client has multiple assessments of the same type, as we have to use a fairly inefficient array formula in excel to identify which 'full' assessment corresponds with the 'initial' assessment.
I have an earlier query that was resolved on this site (Returning relevant date from multiple tables including additional table info) which I believe includes a lot of the logic for what is required (particularly in identifying a corresponding event which has occurred within a specified timeframe). However, whilst that query pulls data from 3 separate tables (assessments, events, responsibilities), I now need to create a query that generates a similar outcome but pulls from 1 main table and a 2nd table to return worker information. I thought the most logical way would be to create a query that looks at the assessment table with one type of assessment, and then joins to the assessment table again (possibly a temporary table?) with the assessment type that would follow the initial one.
For example:
Table 1 (Assessments):
Client  ID  Assessment Type  Start       End
P1      1   Initial          01/01/2012  05/01/2012
Table 2 (Assessments temp?):
Client  ID  Assessment Type  Start       End
P1      2   Full             12/01/2012
Table 3:
ID  Worker  Team
1   Bob     Team1
2   Lyn     Team2
Result:
Client  ID  Initial Start  Initial End  Initial Worker  Full Start  Full End
P1      1   01/01/2012     05/01/2012   Bob             12/01/2012
So table 1 and table 2 draw from the same table, except they bring back different assessment types. Ideally, there would be a check to make sure that the 'full' assessment started within X days of the end of the 'initial' assessment (similar to the 'likely' check in the previous query mentioned earlier). If this can be achieved, it's probably worth mentioning that I'd also be interested in expanding this to look at multiple assessment types, as a client could roughly be expected to have 4 or 5 different types of assessment in a cycle. Any pointers would be appreciated; I've already had a great deal of help from this community, which is very valuable.
Edit:
Edited to include the solution, following MB's advice.
Select *
From (
    Select
        I.ASM_SUBJECT_ID as PNo,
        I.ASM_ID as IAID,
        I.ASM_QSA_ID as IAType,
        I.ASM_START_DATE as IAStart,
        I.ASM_END_DATE as IAEnd,
        nvl(olm_bo.get_ref_desc(I.ASM_OUTCOME, 'ASM_OUTCOME'), '') as IAOutcome,
        C.ASM_ID as CAID,
        C.ASM_QSA_ID as CAType,
        C.ASM_START_DATE as CAStart,
        C.ASM_END_DATE as CAEnd,
        nvl(olm_bo.get_ref_desc(C.ASM_OUTCOME, 'ASM_OUTCOME'), '') as CAOutcome,
        ROUND(C.ASM_START_DATE - I.ASM_START_DATE, 0) as "Likely",
        row_number() over (PARTITION BY I.ASM_ID
                           ORDER BY abs(I.ASM_START_DATE - C.ASM_START_DATE)) as "Row Number"
    FROM O_ASSESSMENTS I
    left join O_ASSESSMENTS C
        on I.ASM_SUBJECT_ID = C.ASM_SUBJECT_ID
        and C.ASM_QSA_ID IN ('AA523', 'AA1326')
        and ROUND(C.ASM_START_DATE - I.ASM_START_DATE, 0) >= -2
        and ROUND(C.ASM_START_DATE - I.ASM_START_DATE, 0) <= 25
        and C.ASM_OUTCOME <> 'ABANDON'
    Where I.ASM_QSA_ID IN ('AA501', 'AA1323')
        AND I.ASM_OUTCOME <> 'ABANDON'
        AND I.ASM_END_DATE >= '01-04-2011'
)
WHERE "Row Number" = 1
You can access the same table multiple times in a given query in SQL, simply by using table aliases. So one way of doing this would be:
select i.client,
i.id initial_id,
i.start initial_start,
i.end initial_end,
w.worker initial_worker,
f.id full_id,
f.start full_start,
f.end full_end
from assessments i
join workers w on i.id = w.id
left join assessments f
on i.client = f.client and
f.assessment_type = 'Full' and
f.start between i.end and i.end + X
/* replace X with appropriate number of days */
where i.assessment_type = 'Initial'
Note: column names such as end (that are reserved words in Oracle SQL) should normally be double-quoted, but from the previous question it looks as though these are simplified versions of the actual column names.
From your post, I assume that you're using Oracle here (as I see "Oracle" in the question).
In terms of "temp" tables, Views come right to mind. An Oracle View can give you different looks of a table which is what it sounds like you're looking for with different kinds of assessments.
Don Burleson is a good source for anything Oracle related and he gives some tips on Oracle Views at http://www.dba-oracle.com/concepts/views.htm
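As a sketch of that idea against the simplified table from the example (hypothetical view names):
-- one view per assessment type, so each "look" of the table has a name
CREATE OR REPLACE VIEW initial_assessments AS
    SELECT * FROM assessments WHERE assessment_type = 'Initial';

CREATE OR REPLACE VIEW full_assessments AS
    SELECT * FROM assessments WHERE assessment_type = 'Full';
The self-join in the answer above could then select from initial_assessments and full_assessments instead of aliasing assessments twice.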

Querying for records with parameters depending upon previous dates

My table is of the form:
Id, Date, Open, High, Low, Close, Volume, OI
I am using MS Access, and I need to query like this:
Select those dates (D) where Close on D-2 > Close on D-3 and Close on D-1 > Close on D-2.
So, how do I form a query for this? In general, you can think of it as a query whose parameters depend on previous records.
Soham
SELECT
[Today].*
FROM
(
(
MyTable AS [Today]
INNER JOIN
MyTable AS [TodayMinus1]
ON [TodayMinus1].Date = DATEADD("d", -1, [Today].Date)
AND [TodayMinus1].ID = [Today].ID
)
INNER JOIN
MyTable AS [TodayMinus2]
ON [TodayMinus2].Date = DATEADD("d", -2, [Today].Date)
AND [TodayMinus2].ID = [Today].ID
)
INNER JOIN
MyTable AS [TodayMinus3]
ON [TodayMinus3].Date = DATEADD("d", -3, [Today].Date)
AND [TodayMinus3].ID = [Today].ID
WHERE
[TodayMinus1].Close > [TodayMinus2].Close
AND [TodayMinus2].Close > [TodayMinus3].Close
EDIT: A note to elaborate on the use of three joins.
Systems like SAS operate as explicit loops where you are able to base a calculation on the values or results obtained from previous iterations of the loop.
SQL, however, is expressed in terms of sets rather than loops, and the optimiser then estimates the most algorithmically efficient way to accomplish that logic. This set-based expression traditionally means that you can't say "three records ago", as the set doesn't have an explicit order or an order it is processed in (parallelism may mean it's processed in chunks, indexes may mean it's processed in different orders, etc.).
This means that you need a set-based mechanism for obtaining the records you want to compare. In this case, if you want to compare "today" with "yesterday", each one of those is a set, which you join together before comparing. You have a total of 4 different days, so 4 different sets to join together for comparison. In a harsh sense, that's just how a relational database's set-based expression works...
ANSI SQL does now, however, include windowing functions such as LAG that allow a set-based notation for what you want. They are not yet widely implemented, for a variety of reasons, and as Access is a lightweight database (compared with MySQL, SQL Server, Oracle, etc.) I wouldn't expect leading-edge functionality.
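For reference, in a database that does support them, the LAG version reads roughly like this (a sketch using the columns from the question; note that LAG looks at the previous rows in date order, whereas the join version above requires strictly consecutive calendar days):
SELECT *
FROM (
    SELECT
        Id,
        "Date",
        "Close",
        -- Date and Close are reserved words, hence the quoting
        LAG("Close", 1) OVER (PARTITION BY Id ORDER BY "Date") AS CloseMinus1,
        LAG("Close", 2) OVER (PARTITION BY Id ORDER BY "Date") AS CloseMinus2,
        LAG("Close", 3) OVER (PARTITION BY Id ORDER BY "Date") AS CloseMinus3
    FROM MyTable
) t
WHERE CloseMinus1 > CloseMinus2
  AND CloseMinus2 > CloseMinus3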

SQL Server 2008: Recursive query where hierarchy isn't strict

I'm dealing with a large multi-national corp. I have a table (oldtir) that shows ownership of subsidiaries. The fields for this problem are:
cID - PK for this table
dpm_sub - FK for the subsidiary company
dpm_pco - FK for the parent company
year - the year in which this is the relationship (because they change over time)
There are other fields, but not relevant to this problem. (Note that there are no records to specifically indicate the top-level companies, so we have to figure out which they are by having them not appear as subsidiaries.)
I've written the query below:
with CompanyHierarchy([year], dpm_pco, dpm_sub, cID)
as (select distinct oldtir.[year], cast(' ' as nvarchar(5)) as dpm_pco, oldtir.dpm_pco as dpm_sub, cast(0 as float) as cID
from oldtir
where oldtir.dpm_pco not in
(select dpm_sub from oldtir oldtir2
where oldtir.[year] = oldtir2.[year]
and oldtir2.dpm_sub <> oldtir2.dpm_pco)
and oldtir.[year] = 2011
union all
select oldtir.[year], oldtir.dpm_pco, oldtir.dpm_sub, oldtir.cID
from oldtir
join CompanyHierarchy
on CompanyHierarchy.dpm_sub = oldtir.dpm_pco
and CompanyHierarchy.[year] = oldtir.[year]
where oldtir.[year] = 2011
)
select distinct CompanyHierarchy.[Year],
CompanyHierarchy.[dpm_pco],
CompanyHierarchy.dpm_sub
from CompanyHierarchy
order by 1, 2, 3
It fails with msg 530: "The maximum recursion 100 has been exhausted before statement completion."
I believe the problem is that the relationships in the table aren't strictly hierarchical. Specifically, one subsidiary can be owned by more than one company, and you can even have the situation where A owns B and part of C, and B also owns part of C. (One of the other fields indicates percent of ownership.)
For the time being, I've solved the problem by adding a field to track level, and arbitrarily stopping after a few levels. But this feels kludgy to me, since I can't be sure of the maximum number of levels.
Any ideas how to do this generically?
Thanks,
Tamar
Thanks to the commenters. They made me go back and look more closely at the data. There were, in fact, errors in the data, which led to infinite recursion. Fixed the data and the query worked just fine.
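For anyone who hits the same error with data that genuinely contains cycles, a common workaround is to carry the visited path along in the recursive CTE and stop descending when a key repeats. A minimal sketch, assuming dpm_sub and dpm_pco are character columns (add CASTs if they are numeric):
with CompanyHierarchy (dpm_pco, dpm_sub, visited)
as (select o.dpm_pco, o.dpm_sub,
           cast('/' + o.dpm_sub + '/' as nvarchar(max)) as visited
    from oldtir o
    where o.[year] = 2011
      and o.dpm_pco not in (select o2.dpm_sub from oldtir o2
                            where o2.[year] = 2011
                              and o2.dpm_sub <> o2.dpm_pco)
    union all
    select o.dpm_pco, o.dpm_sub,
           ch.visited + o.dpm_sub + '/'
    from oldtir o
    join CompanyHierarchy ch
      on ch.dpm_sub = o.dpm_pco
    where o.[year] = 2011
      -- stop when this subsidiary is already on the path (a cycle)
      and charindex('/' + o.dpm_sub + '/', ch.visited) = 0
)
select distinct dpm_pco, dpm_sub
from CompanyHierarchy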
Add an OPTION clause and see if it makes a difference. MAXRECURSION 0 removes the default limit of 100 recursion levels entirely (you can also specify any fixed value up to 32,767):
select distinct CompanyHierarchy.[Year],
CompanyHierarchy.[dpm_pco],
CompanyHierarchy.dpm_sub
from CompanyHierarchy
order by 1, 2, 3
option (maxrecursion 0)
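Note that OPTION (MAXRECURSION ...) is a query hint, so it goes on the outermost statement that references the CTE, as shown above; it cannot be placed inside the CTE definition itself.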