Appropriate idea or SQL to obtain the result set

Fig 1
TxnId   |TxnTypeId |BranchId |TxnNumber |LocalAmount  |ItemName
--------|----------|---------|----------|-------------|--------
1777486 |101       |1099     |1804908   |65.20000000  |A
1777486 |101       |1099     |1804908   |324.50000000 |B
1777486 |101       |1099     |1804908   |97.20000000  |C
1777486 |101       |1099     |1804908   |310.00000000 |D
1777486 |101       |1099     |1804908   |48.90000000  |E
Fig 2
TxnId |TxnTypeId |BankId |Number |Check |Bank |Cash |Wallet
--------|-----------|-------|--------|-------|------|------|------
1777486 |101 |1099 |1804908 | 48.9 | 310 |389.7 |97.2
Fig 3 (Expected Output)
TxnId |BankId |ItemName |Amount |Wallet |Bank |Check |Cash
--------|-------|-----------|-------|-------|-------|-------|-------
1777486 |1099 |A |65.2 |0 |0 |0 |65.2
1777486 |1099 |B |324.5 |0 |0 |0 |324.5
1777486 |1099 |C |97.2 |97.2 |0 |0 |0
1777486 |1099 |D |48.9 |0 |0 |48.9 |0
1777486 |1099 |E |310 |0 |310 |0 |0
I have two different result sets, Fig 1 and Fig 2, each obtained from a different query.
The result I want is shown in Fig 3.
Currently I do not have a flag that identifies the payment mode used for each item in a transaction; I only have the payment modes for the transaction as a whole.
Fig 4
IndividualTxnPaymentDetailId| IndividualTxnId |PaymentAmount |PaymentMode
---------------------------:|:-----------------:|:-------------:|:--------------
2106163 | 1777486 |389.70000000 | Cash
2106164 | 1777486 |97.20000000 | Wallet
2106165 | 1777486 |310.00000000 | Bank
2106166 | 1777486 |48.90000000 | Check
This means that if two or more items are purchased using one payment mode, I have no proper way of identifying the payment made for each item.
Items A and B are purchased using Cash, with amounts 65.2 and 324.5; the total Cash paid is 389.7.
Item C is purchased using Wallet, with amount 97.2; the total Wallet amount is 97.2.
Fig 5
TxnId |LocalAmount |ItemName
--------|--------------:|:------------
1777486 |65.20000000 | A
1777486 |324.50000000 | B
1777486 |97.20000000 | C
1777486 |310.00000000 | D
1777486 |48.90000000 | E
Queries used to generate the results in Fig 4 and Fig 5:
select IndividualTxnPaymentDetailId, IndividualTxnId, PaymentAmount, cc.choicecode as PaymentMode
from dbo.IndividualTxnPaymentDetail it
inner join configchoice cc on cc.configchoiceid= it.configpaymentmodeid
where IndividualTxnId = 1777486
select IndividualTxnId as TxnId, LocalAmount, CurrencyName from dbo.IndividualTxnFCYDetail where IndividualTxnId = 1777486
This is the expression I wrote to identify the amount of each item paid through Bank. I wanted to do the same for every payment mode, but could not get the amounts right.
CASE
WHEN tpm.Bank - SUM(txn.LocalAmount) OVER (PARTITION BY txn.BranchId, txn.TxnNumber ORDER BY CAST(txn.ItemName AS varchar(300))) + txn.LocalAmount < 0 THEN 0
WHEN tpm.Bank - SUM(txn.LocalAmount) OVER (PARTITION BY txn.BranchId, txn.TxnNumber ORDER BY CAST(txn.ItemName AS varchar(300))) + txn.LocalAmount > txn.LocalAmount THEN txn.LocalAmount
WHEN tpm.Bank - SUM(txn.LocalAmount) OVER (PARTITION BY txn.BranchId, txn.TxnNumber ORDER BY CAST(txn.ItemName AS varchar(300))) + txn.LocalAmount > tpm.Bank THEN tpm.Bank
ELSE tpm.Bank - SUM(txn.LocalAmount) OVER (PARTITION BY txn.BranchId, txn.TxnNumber ORDER BY CAST(txn.ItemName AS varchar(300))) + txn.LocalAmount
END AS Bank,
Can you help me with an idea or some SQL to get the result set shown in Fig 3?
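One way to generalize the running-total CASE expression above to every payment mode at once is to place items and payments on the same running-total axis and take the overlap of each item's range with each payment's range; the overlap is the part of that payment applied to that item. The sketch below is a hypothetical illustration run in SQLite through Python's sqlite3 module: the table and column names (items, payments, detail_id, amount) are assumptions, amounts are stored as integer cents to avoid floating-point comparisons, and the running totals use correlated subqueries (in SQL Server, SUM(...) OVER (PARTITION BY ... ORDER BY ...) as in the question does the same job, and the two-argument MIN/MAX would become CASE expressions or GREATEST/LEAST where available). The outcome depends entirely on the arbitrary order chosen for items (by name) and payments (by detail id); it only reproduces the intended allocation here because those orders happen to line up.

```python
import sqlite3

# Sample data from Fig 4 and Fig 5, with amounts converted to integer cents.
ITEMS = [(1777486, "A", 6520), (1777486, "B", 32450), (1777486, "C", 9720),
         (1777486, "D", 31000), (1777486, "E", 4890)]
PAYMENTS = [(2106163, 1777486, "Cash", 38970), (2106164, 1777486, "Wallet", 9720),
            (2106165, 1777486, "Bank", 31000), (2106166, 1777486, "Check", 4890)]

QUERY = """
WITH item_ranges AS (      -- item covers (hi - amount, hi] on the running-total axis
    SELECT i.txn_id, i.item_name, i.amount,
           (SELECT SUM(x.amount) FROM items x
             WHERE x.txn_id = i.txn_id AND x.item_name <= i.item_name) AS hi
    FROM items i
),
pay_ranges AS (            -- payment covers (hi - amount, hi] on the same axis
    SELECT p.txn_id, p.mode, p.amount,
           (SELECT SUM(x.amount) FROM payments x
             WHERE x.txn_id = p.txn_id AND x.detail_id <= p.detail_id) AS hi
    FROM payments p
),
alloc AS (                 -- overlap length = part of the payment applied to the item
    SELECT ir.txn_id, ir.item_name, ir.amount, pr.mode,
           MAX(0, MIN(ir.hi, pr.hi) - MAX(ir.hi - ir.amount, pr.hi - pr.amount)) AS applied
    FROM item_ranges ir
    JOIN pay_ranges pr ON pr.txn_id = ir.txn_id
)
SELECT txn_id, item_name, amount,
       SUM(CASE WHEN mode = 'Wallet' THEN applied ELSE 0 END) AS wallet,
       SUM(CASE WHEN mode = 'Bank'   THEN applied ELSE 0 END) AS bank,
       SUM(CASE WHEN mode = 'Check'  THEN applied ELSE 0 END) AS "check",
       SUM(CASE WHEN mode = 'Cash'   THEN applied ELSE 0 END) AS cash
FROM alloc
GROUP BY txn_id, item_name, amount
ORDER BY item_name
"""


def allocate():
    """Return one row per item: (txn_id, item, amount, wallet, bank, check, cash), in cents."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE items (txn_id INTEGER, item_name TEXT, amount INTEGER)")
    con.execute("CREATE TABLE payments (detail_id INTEGER, txn_id INTEGER, mode TEXT, amount INTEGER)")
    con.executemany("INSERT INTO items VALUES (?, ?, ?)", ITEMS)
    con.executemany("INSERT INTO payments VALUES (?, ?, ?, ?)", PAYMENTS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in allocate():
    print(row)
```

Dividing the amounts by 100 gives values in the same form as Fig 3. Note that it uses the item amounts from Fig 1 and Fig 5, where D is 310 and E is 48.9.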

Updated Question - Updated Response
I read your updated question and I'm afraid the problem still stands. Neither of those queries is summing the data; they are just pulling the same already-summed numbers. You would either need to get at the numbers before the aggregation happens, or have some column in your IndividualTxnPaymentDetail table that ties each row to its counterpart rows in the other table (presumably through a cross table, as in Row 1 : ItemName A, Row 1 : ItemName B, Row 2 : ItemName C, etc.).
If these are simply impossible, then perhaps you're approaching this the wrong way, or, to put it better, perhaps you are being asked to do something that doesn't make sense - provably so. If there is no direct relationship between these activities in the data, there's not much you can be expected to do. What's more, it may indicate that your organization doesn't 'think' about them that way.
These two tables seem to be payments and liabilities. Perhaps consider an approach where each payment in Fig 4 goes toward whatever the oldest outstanding balance is, and is matched to the items that way. Add a column to the details table to store the payment applied toward each item. Rather than a simple Paid/Unpaid Boolean, I would store the amount of payment that has been applied toward each item, or the amount still owed on each item; that way you can handle partially applied payments. As payments come in, apply them. You would likely want a similar column in the payments table to track how much of each payment has been applied; that way you can handle over-payments and know the status of things such as pending receipts when payments aren't applied immediately.
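A minimal sketch of the "apply each payment to the oldest outstanding balance" idea, written as application code for clarity. It assumes items and payments are already ordered oldest first, which the source data does not guarantee, and the function name and return shape are hypothetical. The two dictionaries it returns play the role of the suggested extra columns: the amount still owed per item and the amount of each payment not yet applied (an over-payment or pending receipt).

```python
from decimal import Decimal


def apply_payments(items, payments):
    """Apply each payment, in order, to the oldest item that still has a balance.

    items:    list of (item_name, amount owed), oldest first
    payments: list of (payment_mode, amount paid), in the order received
    Returns (applications, owed, unapplied):
      applications: list of (item_name, payment_mode, amount applied)
      owed:         {item_name: amount still owed}
      unapplied:    {payment_mode: amount of that payment not applied to any item}
    """
    owed = {name: amount for name, amount in items}
    order = [name for name, _ in items]
    applications = []
    unapplied = {}
    pos = 0  # index of the oldest item with an outstanding balance
    for mode, amount in payments:
        left = amount
        while left > 0 and pos < len(order):
            name = order[pos]
            applied = min(left, owed[name])
            if applied > 0:
                applications.append((name, mode, applied))
                owed[name] -= applied
                left -= applied
            if owed[name] == 0:
                pos += 1  # this item is fully paid; move to the next one
        unapplied[mode] = unapplied.get(mode, Decimal(0)) + left
    return applications, owed, unapplied


items = [("A", Decimal("65.20")), ("B", Decimal("324.50")), ("C", Decimal("97.20")),
         ("D", Decimal("310.00")), ("E", Decimal("48.90"))]
payments = [("Cash", Decimal("389.70")), ("Wallet", Decimal("97.20")),
            ("Bank", Decimal("310.00")), ("Check", Decimal("48.90"))]
for application in apply_payments(items, payments)[0]:
    print(application)
```

Decimal is used instead of float so that an item paid off exactly really reaches a balance of zero.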
I hope this helps.

Fundamental Flaw
Your question is looking to take aggregated data (in your example, the Fig 2 Cash total of 389.7) and tease out which numbers were totaled to get the sum. You can do it here since 3 of the 4 numbers in Fig 2 are unique, one-to-one matches with numbers in Fig 1, meaning the remaining ones have to belong to each other. But imagine hundreds of numbers, many or most of them sums (i.e. not one-to-one matches like most of these). Or imagine an example as simple as yours except that the numbers aren't so unique (e.g. Fig 1 = (10, 10, 10, 10, 20) and Fig 2 = (10, 20, 20, 10): it is not possible to say which ones are which). There only need to be two possible combinations that could produce a particular sum for the results to become ambiguous.
The weakness is in Fig 2. Do you have any control over that data source? Can you grab the numbers upstream, before they are totaled?
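The ambiguity argument can be checked by brute force: count every way of assigning each item amount to one payment total so that all totals are matched exactly. This is an illustrative sketch (the function name is hypothetical); its running time grows as (number of totals) to the power (number of items), so it is only usable on tiny inputs like these.

```python
from itertools import product


def count_assignments(items, totals):
    """Count the ways to assign every item to one payment total so that
    each total equals the sum of the items assigned to it (brute force)."""
    count = 0
    for choice in product(range(len(totals)), repeat=len(items)):
        sums = [0] * len(totals)
        for amount, t in zip(items, choice):
            sums[t] += amount
        if sums == list(totals):
            count += 1
    return count


# The question's data in integer cents: exactly one consistent assignment.
print(count_assignments([6520, 32450, 9720, 31000, 4890], [38970, 9720, 31000, 4890]))  # 1
# The example above: many consistent assignments, so the answer is ambiguous.
print(count_assignments([10, 10, 10, 10, 20], [10, 20, 20, 10]))  # 24
```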
Sorry for the negative conclusion but...
I hope this helps.

The Continuing Saga
Comment: [A version of this] report has already been made ...[but] I cannot contact the person who actually wrote that thing.
Perhaps he was also asked to do something that didn't make sense but did it anyway. The math simply doesn't work. He may have written something that finds as many one-to-one matches as it can and then sort of rolls the dice on the rest of it. He may have done something like the following:
Find and eliminate all the one-to-one matches.
Take any total and subtract any item amount from it to see if the difference matches any remaining item amount; if so, arbitrarily pick one and eliminate all three numbers.
Repeat this until all combinations have been tested.
But you are still potentially left with unmatched numbers, so you next need to test for sums of three numbers by:
Arbitrarily subtract any two item amounts from any of the remaining totals.
and so on and so on, followed by testing for sums of four items and so on.
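The procedure above can be sketched as a greedy search that tries single items first (the one-to-one matches), then pairs, then triples, and so on, arbitrarily keeping the first combination that fits a total. This is a hypothetical reconstruction of what such a report might do, not the original author's code; as discussed, the combination it picks may simply be the wrong one when several fit.

```python
from itertools import combinations


def greedy_match(items, totals):
    """Match payment totals to combinations of item amounts, smallest combinations first.

    items:  list of (item_name, amount)
    totals: list of (payment_mode, amount)
    Returns (matches, unmatched_items, unmatched_totals).
    """
    items, totals = list(items), list(totals)  # work on copies
    matches = []
    for size in range(1, len(items) + 1):  # 1 = one-to-one, 2 = pairs, ...
        for mode, total in list(totals):
            for combo in combinations(items, size):
                if sum(amount for _, amount in combo) == total:
                    matches.append((mode, [name for name, _ in combo]))
                    for entry in combo:
                        items.remove(entry)
                    totals.remove((mode, total))
                    break  # arbitrary choice: keep the first combination found
    return matches, items, totals


items = [("A", 6520), ("B", 32450), ("C", 9720), ("D", 31000), ("E", 4890)]
totals = [("Cash", 38970), ("Wallet", 9720), ("Bank", 31000), ("Check", 4890)]
print(greedy_match(items, totals))
```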

I think part of what you're looking for is buried in here:
http://www.itprotoday.com/software-development/algorithms-still-matter
It calls this 'order fulfillment': you go through transactions, combining them until you reach a given total.
I think the solution will be in multiple parts, including cursors etc.
I'm not convinced you would be able to understand or implement any solution posted. Also, I maintain that there are cases where there are ambiguous solutions.
Lastly I see you have asked 16 questions and not marked a single one as answered.

Related

How to round decimals smaller than .5 to the following number in SQL?

I have a large database with more than 1,000 products.
Some of them have prices like 12.3, 20.7, 55.1 for example.
| Name | Price |
| -------- | -------------- |
| Product 1| 12.3 |
| Product 2| 20.7 |
| Product 3| 55.1 |
(and so on)...
What I've tried is: update prices set price = ROUND(price, 0.1)
The output for this will be:
| Name | Price | After update |
| -------- | -------------- | -------------- |
| Product 1| 12.3 | 12.0 |
| Product 2| 20.7 | 21.0 |
| Product 3| 55.1 | 55.0 |
Prices whose decimal part is below .5 keep the same whole number (they are rounded down rather than up), and I'm out of ideas.
I'll appreciate any help.
Note: I need to update all rows. I'm trying to learn about CEILING(), but I've only seen it used with SELECT. Any idea how to perform an UPDATE with CEILING or something similar?
It's not entirely clear what you're asking, but I can tell you the function call as shown makes no sense.
The second argument to the ROUND() function is the number of decimal places, not the size of the value you wish to round to. Additionally, the function only accepts integral types for that argument. Therefore, if you pass the value 0.1 to the function, the value is first cast to an integer, and the result of casting 0.1 to an integer is 0.
We see then, that calling ROUND(price, 0.1) is the same as calling ROUND(price, 0).
If you want to round to the nearest 0.1, that's one decimal place and the correct value for the ROUND() function is 1.
ROUND(price, 1)
Compare results here:
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=7878c275f0f9ea86f07770e107bc1274
Note the trailing 0's remain, because the fundamental type of the value is unchanged. If you also want to remove the trailing 0's, then you're really moving into the realm of strings, and for that you should wait and let the client code, application, or reporting tool handle the conversion.
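For what the question actually wants (12.3 becoming 13), CEILING rather than ROUND is the relevant function, and in SQL Server it can be used directly in an UPDATE, e.g. UPDATE prices SET price = CEILING(price); (table and column names taken from the question). The difference can be checked outside the database: the sketch below uses Python's decimal module to mimic ROUND(price, 0), ROUND(price, 1) and CEILING(price) on the sample prices. It is an illustration of the semantics for positive values like these, not SQL Server itself, and the helper names are hypothetical.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_HALF_UP

PRICES = [Decimal("12.3"), Decimal("20.7"), Decimal("55.1")]


def sql_round(value: Decimal, places: int) -> Decimal:
    """Mimic ROUND(value, places): nearest value with that many decimals, ties away from zero."""
    return value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)


def sql_ceiling(value: Decimal) -> Decimal:
    """Mimic CEILING(value): the smallest integer greater than or equal to value."""
    return value.to_integral_value(rounding=ROUND_CEILING)


for price in PRICES:
    # price, ROUND(price, 0), ROUND(price, 1), CEILING(price)
    print(price, sql_round(price, 0), sql_round(price, 1), sql_ceiling(price))
```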

Data Modeling - Slowly Changing Dimension type 2: How to deal with a schema change (column added)?

What is the best practice for dealing with schema changes when building a Slowly Changing Dimension table?
For example, a column was added:
First state:
+----------+---------------------+-------------------+
|customerId|address |updated_at |
+----------+---------------------+-------------------+
|1 |current address for 1|2018-02-01 00:00:00|
+----------+---------------------+-------------------+
New state with the new column, while every other tracked column stays constant:
+----------+---------------------+-------------------+------+
|customerId|address |updated_at |newCol|
+----------+---------------------+-------------------+------+
|1 |current address for 1|2018-03-03 00:00:00|1000 |
+----------+---------------------+-------------------+------+
My first thought is that a schema change means the row has changed, so I would add a new row to my SCD table:
+----------+---------------------+-------------------+------+-------------+-------------------+-------------------+
|customerId|address |updated_at |newCol|active_status|active_status_start|active_status_end |
+----------+---------------------+-------------------+------+-------------+-------------------+-------------------+
|1 |current address for 1|2018-02-01 00:00:00|null |false |2018-02-01 00:00:00|2018-03-03 00:00:00|
|1 |current address for 1|2018-03-03 00:00:00|1000 |true |2018-03-03 00:00:00|null |
+----------+---------------------+-------------------+------+-------------+-------------------+-------------------+
But what if a column was added, but for some specific row its value is null? For example, for the row with customerId = 2, it is null:
+----------+---------------------+-------------------+------+
|customerId|address |updated_at |newCol|
+----------+---------------------+-------------------+------+
|2 |current address for 2|2018-03-03 00:00:00|null |
+----------+---------------------+-------------------+------+
In this case, I can take two approaches:
Consider every schema change as a row change, even for null rows (much easier to implement, but costlier from a storage perspective). It would result in:
+----------+---------------------+-------------------+-------------+-------------------+-------------------+------+
|customerId|address |updated_at |active_status|active_status_end |active_status_start|newCol|
+----------+---------------------+-------------------+-------------+-------------------+-------------------+------+
|1 |current address for 1|2018-02-01 00:00:00|false |2018-03-03 00:00:00|2018-02-01 00:00:00|null |
|1 |current address for 1|2018-03-03 00:00:00|true |null |2018-03-03 00:00:00|1000 |
|2 |current address for 2|2018-02-01 00:00:00|false |2018-03-03 00:00:00|2018-02-01 00:00:00|null |
|2 |current address for 2|2018-03-03 00:00:00|true |null |2018-03-03 00:00:00|null |
+----------+---------------------+-------------------+-------------+-------------------+-------------------+------+
Check every row, and if it has an actual value for the new column, add a new version; otherwise, leave the row as it is (I haven't come up with an implementation for this yet, but it is much more complicated and likely to be error-prone). The result in the SCD table for customer 2 would be 'row has not changed':
+----------+---------------------+-------------------+-------------+-------------------+-------------------+------+
|customerId|address |updated_at |active_status|active_status_end |active_status_start|newCol|
+----------+---------------------+-------------------+-------------+-------------------+-------------------+------+
|1 |current address for 1|2018-02-01 00:00:00|false |2018-03-03 00:00:00|2018-02-01 00:00:00|null |
|1 |current address for 1|2018-03-03 00:00:00|true |null |2018-03-03 00:00:00|1000 |
|2 |current address for 2|2018-02-01 00:00:00|true |null |2018-02-01 00:00:00|null |
+----------+---------------------+-------------------+-------------+-------------------+-------------------+------+
The second approach seems more "correct", but am I right? Also, approach 1 is much simpler to implement. Approach 2 would be more complicated and has other trade-offs, for example:
a) What if, instead of a column being added, a column was dropped?
b) From a query perspective it is much costlier.
I have done research on the subject and didn't find this kind of situation being addressed.
What is the standard approach to it? Trade-offs? Is there another approach I am missing here?
Thank you all.
Thanks to @MarmiteBomber and @MatBailie for their comments. Based on them, I ended up implementing the second option, because (summarizing your thoughts):
The second approach is the only one meaningful.
Implementation is a consequence of business logic, not necessarily a standard practice. In our case, we didn't need to differentiate types of nulls, so the right approach was to encapsulate known non-existing values as null, as well as unknown values, etc.
Be explicit.
The second approach also required adding a check at write time (is the new column present in the row?), but it saves complexity at query time, as well as storage. Since an SCD changes slowly and this case is rare (schema changes happen, but not every day), adding the check at write time is better than at query time.
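A minimal sketch of that write-time check, assuming the SCD table is held in memory as a list of dicts using the question's column names; the function names are hypothetical, and leaving updated_at out of the compared columns (it changes on every load) is an assumption. Reading a missing column as None is what keeps customer 2 from getting a new version. It also means that if a dropped column stays in the tracked list (case a), rows that had a non-null value for it would get a new version, which may or may not be what the business logic wants.

```python
def row_changed(current: dict, incoming: dict, tracked: list) -> bool:
    """Compare tracked columns; a column missing on either side reads as None (null)."""
    return any(current.get(col) != incoming.get(col) for col in tracked)


def apply_scd2(history: list, incoming: dict, key: str, tracked: list, ts: str) -> bool:
    """Close the active version and append a new one only when a tracked value
    actually changed. Returns True if a new version was written."""
    active = next((row for row in history
                   if row[key] == incoming[key] and row["active_status"]), None)
    if active is not None and not row_changed(active, incoming, tracked):
        return False  # e.g. a new column that is null for this row: no new version
    if active is not None:
        active["active_status"] = False
        active["active_status_end"] = ts
    new_row = {col: incoming.get(col) for col in [key, *tracked]}
    new_row.update(active_status=True, active_status_start=ts, active_status_end=None)
    history.append(new_row)
    return True


# First state: no newCol yet.
history = [
    {"customerId": 1, "address": "current address for 1", "active_status": True,
     "active_status_start": "2018-02-01 00:00:00", "active_status_end": None},
    {"customerId": 2, "address": "current address for 2", "active_status": True,
     "active_status_start": "2018-02-01 00:00:00", "active_status_end": None},
]
tracked = ["address", "newCol"]
apply_scd2(history, {"customerId": 1, "address": "current address for 1", "newCol": 1000},
           "customerId", tracked, "2018-03-03 00:00:00")
apply_scd2(history, {"customerId": 2, "address": "current address for 2", "newCol": None},
           "customerId", tracked, "2018-03-03 00:00:00")
for row in history:
    print(row)
```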

Getting start and end indices of string in Pandas

I have a df that looks like this:
|Index|Value|Anomaly|
---------------------
|0 |4 | |
|1 |2 |Anomaly|
|2 |1 |Anomaly|
|3 |2 | |
|4 |6 |Anomaly|
I want to get the start and end indices of each run of consecutive anomalies, so in this case the result would be [[1,2],[4]].
I understand I have to use .shift and .cumsum but I am lost and I hope someone would be able to enlighten me.
Get consecutive groups by taking the cumsum of the Boolean Series that checks where the value is not 'Anomaly'. Use where so that we only take the 'Anomaly' rows. Then we can loop over the groups and grab the indices.
m = df['Anomaly'].ne('Anomaly')
[[idx[0], idx[-1]] if len(idx) > 1 else [idx[0]]
for idx in df.groupby(m.cumsum().where(~m)).groups.values()]
#[[1, 2], [4]]
Or, if you want to use a much longer groupby, you can get the first and last index, then drop duplicates (to deal with streaks of only 1) and get it into a list of lists. This is much slower, though.
(df.reset_index().groupby(m.cumsum().where(~m))['index'].agg(['first', 'last'])
.stack()
.drop_duplicates()
.groupby(level=0).agg(list)
.tolist())
#[[1, 2], [4]]
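Since the question mentions .shift and .cumsum, here is a variant that uses them explicitly: a run starts on an anomaly row whose previous row is not an anomaly, and the cumulative count of run starts gives every row of a run the same id. It is a sketch assuming the blank cells hold empty strings or NaN (eq('Anomaly') treats both as False); the function name anomaly_runs is hypothetical.

```python
import pandas as pd


def anomaly_runs(df: pd.DataFrame, col: str = "Anomaly") -> list:
    """Return [first, last] index labels of each run of consecutive 'Anomaly'
    rows, or [label] for a run of length one."""
    is_anom = df[col].eq("Anomaly")
    # A run starts on an anomaly row whose previous row is not an anomaly.
    starts = is_anom & ~is_anom.shift(fill_value=False)
    run_id = starts.cumsum()  # every row of one run shares the same id
    runs = []
    for _, grp in df[is_anom].groupby(run_id[is_anom]):
        labels = grp.index.tolist()
        runs.append([labels[0], labels[-1]] if len(labels) > 1 else [labels[0]])
    return runs


df = pd.DataFrame({"Value": [4, 2, 1, 2, 6],
                   "Anomaly": ["", "Anomaly", "Anomaly", "", "Anomaly"]})
print(anomaly_runs(df))
```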

SQL Output Rows as columns

I have a table that tests an item and stores any failures, similar to:
Item|Test|FailureValue
1 |1a |"ZZZZZZ"
1 |1b | 123456
2 |1a |"MMMMMM"
2 |1c | 111111
1 |1d |"AAAAAA"
Is there a way in SQL to essentially pivot these and have the failure values output to individual columns? I know that I can already use STUFF to achieve what I want for the Test field, but I would like the results as individual columns if possible.
I'm hoping to achieve something like:
Item|Tests |FailureValue1|FailureValue2|FailureValue3|Failure......
1 |1a,1b |"ZZZZZZ" |123456 |NULL |NULL ......
2 |1a,1b |"MMMMMM" |111111 |"AAAAAA" |NULL ......
Kind regards
Matt
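A common pattern for this is to number each item's failures and then use conditional aggregation, with one MAX(CASE ...) per output column. The sketch below runs the idea in SQLite through Python's sqlite3 module. The table and column names (failures, item, test, failure_value) are assumptions, tests are assumed unique per item, the row number comes from a correlated subquery (in SQL Server, ROW_NUMBER() OVER (PARTITION BY Item ORDER BY Test) does the same), and the number of output columns is fixed at three; a truly variable number of columns would need dynamic SQL. It uses the sample rows exactly as posted, so "AAAAAA" lands on item 1.

```python
import sqlite3

ROWS = [(1, "1a", "ZZZZZZ"), (1, "1b", "123456"), (2, "1a", "MMMMMM"),
        (2, "1c", "111111"), (1, "1d", "AAAAAA")]

PIVOT = """
WITH numbered AS (
    SELECT f.item, f.failure_value,
           (SELECT COUNT(*) FROM failures x
             WHERE x.item = f.item AND x.test <= f.test) AS rn  -- 1, 2, 3 ... per item
    FROM failures f
)
SELECT item,
       MAX(CASE WHEN rn = 1 THEN failure_value END) AS failure_value1,
       MAX(CASE WHEN rn = 2 THEN failure_value END) AS failure_value2,
       MAX(CASE WHEN rn = 3 THEN failure_value END) AS failure_value3
FROM numbered
GROUP BY item
ORDER BY item
"""


def pivot_failures(rows):
    """Return one row per item with up to three failure values as columns."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE failures (item INTEGER, test TEXT, failure_value TEXT)")
    con.executemany("INSERT INTO failures VALUES (?, ?, ?)", rows)
    result = con.execute(PIVOT).fetchall()
    con.close()
    return result


print(pivot_failures(ROWS))
```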

Get the begin of a union of intervals

Disclaimer
While searching for an answer, I found this question, but I couldn't find a way to express the solution in SQL:
Union of intervals
Background
I'm trying to calculate how long the people in the company I work for have been employed. In the database I have (which has been in use at the company for years and [sadly] cannot be changed), each contract is stored as one line. Each line has a lot of information about the employee and the contract, including a contract creation date, a contract rescission date (or infinity, if still active) and the current contract status ("active" or "deactivated"). There are, however, two problems that prevent me from simply doing what might seem obvious:
People can be "multicontractual", so the same person can have multiple active lines at the same time.
Sometimes there are transfers that deactivate one of a person's contracts and create a new contract line. These transfers must not be counted as a break (i.e., I should take both timelines into account). There is, however, no explicit flag for transfers in the database, so it was defined that "it is a transfer if any contract was rescinded within the 60 days before a new contract is created".
When trying to account for the multiple cases that could arise from this scenario (e.g., if the same person had many contracts over time, then no contract for more than 60 days, and then some other contracts, I'd want to start counting from after the more-than-60-days gap), I found that two rules solve the problem. I need:
The last contract creation at which no other contract was already active (this solves problem 1)
and before which no other contract had been active within the previous 60 days.
To the DB
To solve the problem, I decided to rearrange the rules: I wanted to take all contracts for which no other contract was active within the 60 days before their creation, and then take the MAX() of their creation dates. So, for example, for the following person, I would say she has been active since 1973:
+----------+-----+-----------+-----------+---------------+-----------------+
| CONTRACT | ... | PERSON_ID | STATUS | CREATION_DATE | RESCISSION_DATE |
+----------+-----+-----------+-----------+---------------+-----------------+
| 1 | ... | 1 | deactived | 1973/10/01 | 1999/07/01 |
| 2 | ... | 1 | deactived | 1978/06/01 | 2000/07/01 |
| 3 | ... | 1 | deactived | 2000/08/01 | 2008/06/01 |
| 4 | ... | 1 | active | 2000/08/01 | infinity |
| 5 | ... | 1 | active | 2000/08/01 | infinity |
+----------+-----+-----------+-----------+---------------+-----------------+
I am treating the dates as if they were integers (in fact, they are integers in the real database). My question is: how could I write a query that returns "1973/10/01"? That is, how could I get all the creation dates that are at least 60 days later than the other lines' dates and that do not fall within the intervals described by the other lines?
[And anyway, does this seem like the best way to solve the problem? (I don't think so.)]
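The rearranged rule maps naturally onto a NOT EXISTS query: keep a creation date only if no other contract of the same person had started strictly earlier and ended less than 60 days before it (or not at all), then take the MAX per person. The sketch below runs this in SQLite through Python's sqlite3 module and has not been tried against the real schema. Dates are stored as ISO strings and compared with SQLite's julianday(), and '9999-12-31' stands in for infinity. If the real integer columns count days, subtracting 60 directly would replace julianday(); that is an assumption about their encoding. In SQL Server, DATEADD(day, -60, c.CREATION_DATE) would be the equivalent. Person 2 and the same-day case in the test are hypothetical data added to exercise the 60-day gap and simultaneous contracts.

```python
import sqlite3

# (contract, person_id, creation_date, rescission_date); person 1 is the
# question's example, person 2 is made-up data with a gap of more than 60 days.
CONTRACTS = [
    (1, 1, "1973-10-01", "1999-07-01"),
    (2, 1, "1978-06-01", "2000-07-01"),
    (3, 1, "2000-08-01", "2008-06-01"),
    (4, 1, "2000-08-01", "9999-12-31"),
    (5, 1, "2000-08-01", "9999-12-31"),
    (6, 2, "2001-01-01", "2005-01-01"),
    (7, 2, "2006-01-01", "9999-12-31"),
]

QUERY = """
SELECT c.person_id, MAX(c.creation_date) AS employed_since
FROM contracts c
WHERE NOT EXISTS (
    SELECT 1
    FROM contracts o
    WHERE o.person_id = c.person_id
      AND o.creation_date < c.creation_date   -- o started strictly before c
      AND julianday(o.rescission_date) >= julianday(c.creation_date) - 60
                                              -- and had not ended more than 60 days before c
)
GROUP BY c.person_id
ORDER BY c.person_id
"""


def employed_since(rows):
    """Return (person_id, creation date employment is counted from) per person."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE contracts (contract INTEGER, person_id INTEGER, "
                "creation_date TEXT, rescission_date TEXT)")
    con.executemany("INSERT INTO contracts VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(employed_since(CONTRACTS))
```

Using a strict "<" on the creation dates means contracts created on the same day do not rule each other out, so a person whose first contracts all start together still gets a result.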