I have a table that looks something like this:
sender | receiver | amount
1 | 2 | 10
2 | 1 | 20
3 | 2 | 20
1 | 3 | 30
The desired output should be:
user | Trans_Change
1 | -20
2 | 10
3 | 10
I can't find a way to write a query for this in SQL.
The logic behind the desired output is: user 1 sends user 2 an amount of 10, so user 1 is now at -10 and user 2 is at +10, and so on...
Best guess given the known info:
Assign every sender a negative transaction amount, UNION ALL every receiver with a positive amount, then group the data by user and sum the transactions.
WITH CTE AS (
  SELECT sender AS aUser, -1 * amount AS Trans_Change -- senders lose money
  FROM transactions -- "table" is a reserved word; use your actual table name
  UNION ALL
  SELECT receiver AS aUser, amount AS Trans_Change -- receivers gain money
  FROM transactions
)
SELECT aUser, SUM(Trans_Change) AS Trans_Change -- aggregate transaction totals by user
FROM CTE
GROUP BY aUser;
Part of solving this is recognizing that each amount is used twice: once for the sender as a negative and once for the receiver as a positive (or as a debit and a credit, if you prefer). Realizing this, I knew I needed that value on two separate rows, and selecting the data twice does exactly that. Two SELECTs combined with UNION ALL produce each value twice, and after that it's a simple aggregation.
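A minimal runnable sketch of the same UNION ALL + GROUP BY idea, using Python's built-in sqlite3 module and the sample rows from the question. The table name `transactions` is an assumption (the question never names the table), and the receiver column is spelled `receiver`.

```python
import sqlite3

def net_changes(rows):
    """Return {user: net change} using the UNION ALL + GROUP BY approach."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (sender INTEGER, receiver INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    query = """
        WITH cte AS (
            SELECT sender AS aUser, -1 * amount AS Trans_Change  -- senders lose money
            FROM transactions
            UNION ALL
            SELECT receiver AS aUser, amount AS Trans_Change     -- receivers gain money
            FROM transactions
        )
        SELECT aUser, SUM(Trans_Change) AS Trans_Change
        FROM cte
        GROUP BY aUser
        ORDER BY aUser
    """
    result = dict(conn.execute(query).fetchall())
    conn.close()
    return result

sample = [(1, 2, 10), (2, 1, 20), (3, 2, 20), (1, 3, 30)]
print(net_changes(sample))  # {1: -20, 2: 10, 3: 10}
```

Because every amount appears once as a debit and once as a credit, the net changes across all users always sum to zero, which is a handy sanity check on the output.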
I would like to create a query that works like a cash register. Imagine a cash register full of coins of different sizes. I would like to retrieve a total value of coins in the fewest number of coins possible.
Given this table:
id | value
1 | 100
2 | 100
3 | 500
4 | 500
5 | 1000
How would I query for a list of rows that:
- has a total value of AT LEAST a given threshold,
- with the minimum excess value (value above the threshold),
- in the fewest possible rows?
For example, if my threshold is 1050, this would be the expected result:
id | value
1 | 100
5 | 1000
I'm working with Postgres and Elixir/Ecto. If it can be done in a single query, great; if it requires a sequence of multiple queries, no problem.
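Before tuning a query, it can help to pin down exactly what "best" means. Below is a brute-force reference sketch in Python (not SQL, and not taken from the question or answer) that tries every subset and ranks the qualifying ones by excess first and row count second, which is one reading of the three criteria above. It is exponential in the number of rows, so it only suits small tables or checking another query's output.

```python
from itertools import combinations

def best_subset(rows, threshold):
    """rows: list of (id, value). Return the subset whose total is >= threshold,
    minimising excess first and row count second; None if nothing qualifies."""
    best_key, best_combo = None, None
    for size in range(1, len(rows) + 1):
        for combo in combinations(rows, size):
            total = sum(value for _, value in combo)
            if total < threshold:
                continue
            key = (total - threshold, size)  # smallest excess, then fewest rows
            if best_key is None or key < best_key:
                best_key, best_combo = key, combo
    return None if best_combo is None else list(best_combo)

coins = [(1, 100), (2, 100), (3, 500), (4, 500), (5, 1000)]
print(best_subset(coins, 1050))  # [(1, 100), (5, 1000)]
```

Ties (ids 1 and 2 are interchangeable here) go to whichever subset `combinations` yields first, so the result is deterministic for a given row order.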
I had a go at this myself, using answers from previous questions:
Using ABS() to order by the closest value to the threshold
Select rows until a sum reduction of a single column reaches a threshold
Based on @TheImpaler's comment above, this prioritises the minimum number of rows over the minimum excess. It's not 100% what I was looking for, so I'm open to improvements, but if none come up I think this is going to be good enough:
-- inner subquery adds a running total column;
-- the window orders rows by how close each value is to the threshold (id breaks ties)
-- outer query keeps rows until the running total first reaches the threshold
SELECT
  *
FROM (
  SELECT
    i.*,
    SUM(i.value) OVER (
      ORDER BY
        ABS(i.value - $THRESHOLD),
        i.id
    ) AS total
  FROM
    inputs i
) t
WHERE
  t.total - t.value < $THRESHOLD;
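To see the trade-off described above, here is the same query run through Python's sqlite3 module, with the `$THRESHOLD` placeholder bound as a `?` parameter. This assumes SQLite 3.25 or later, since the query needs window functions. For a threshold of 1050 it picks ids 5 and 3 (1500 in total) rather than the ideal ids 1 and 5: both answers use two rows, but ordering by closeness to the threshold does not minimise the excess.

```python
import sqlite3

QUERY = """
SELECT t.id, t.value
FROM (
    SELECT i.*,
           SUM(i.value) OVER (
               ORDER BY ABS(i.value - ?), i.id   -- closest to the threshold first
           ) AS total
    FROM inputs i
) t
WHERE t.total - t.value < ?                      -- running total before this row
ORDER BY t.total
"""

def greedy_pick(rows, threshold):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inputs (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.executemany("INSERT INTO inputs VALUES (?, ?)", rows)
    picked = conn.execute(QUERY, (threshold, threshold)).fetchall()
    conn.close()
    return picked

coins = [(1, 100), (2, 100), (3, 500), (4, 500), (5, 1000)]
print(greedy_pick(coins, 1050))  # [(5, 1000), (3, 500)]
```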
I am looking at transactional data such as my credit card statement. I want to ensure that I am not getting my card swiped twice. The fields that I have are card number (I have multiple), amount of transaction, transaction date, merchant code, merchant name, and transaction code.
To know if it is a true duplicate transaction, I want to know if the merchant code, merchant name, and transaction amount appear more than once. I also want to make sure the transactions were within 5 days of each other if everything else matches.
I am doing the work in SAS code, but I can also do it in PROC SQL. So far in SAS I've sorted the data and then pulled a table that only holds duplicates, but because the date is part of the sort key, it only calls something a duplicate if the dates are exactly the same, instead of applying the 5-day rule.
I did a simple PROC SORT:
PROC SORT DATA=WORK.TRANSACTIONS
OUT=WORK.TRANSACTIONS1
DUPOUT=WORK.SORTSORTEDDUPS
NODUPKEY;
BY CARD_NUMBER TRANSACTION_AMOUNT TRANSACTION_DATE MERCHANT_CODE MERCHANT_NAME TRANSACTION_CODE;
RUN;
What do I need to incorporate to add my rule of transaction within 5 days?
You can do it with an additional pass, retaining (and comparing to) the last transaction date as per the below. Note the change in the sort BY statement (you'll need to update the proc sort also).
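data duplicates;
  set work.transactions1; /* must be sorted by the BY variables below */
  by CARD_NUMBER TRANSACTION_AMOUNT MERCHANT_CODE MERCHANT_NAME TRANSACTION_CODE TRANSACTION_DATE;
  retain datecheck 0; /* carries the previous transaction date across rows */
  if first.TRANSACTION_CODE then datecheck=0; /* new key group: nothing to compare against */
  else if TRANSACTION_DATE-datecheck le 5 then output; /* within 5 days of the previous one */
  datecheck=TRANSACTION_DATE;
run;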
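The retain-and-compare logic in that data step can also be sketched outside SAS. This Python version is an illustration, not a SAS replacement: it sorts by the key fields plus the date, then within each key group flags a transaction dated within 5 days of the previous one, with the group reset mirroring `first.`. The dict field names and the sample transactions are hypothetical.

```python
from datetime import date
from itertools import groupby

# Hypothetical field names standing in for the SAS BY variables (minus the date).
KEY = ("card", "amount", "merchant_code", "merchant_name", "transaction_code")

def group_key(t):
    return tuple(t[k] for k in KEY)

def within_days(transactions, window_days=5):
    """Return transactions dated within `window_days` of the previous
    transaction with the same key (what the data step OUTPUTs)."""
    ordered = sorted(transactions, key=lambda t: group_key(t) + (t["date"],))
    flagged = []
    for _, group in groupby(ordered, key=group_key):
        previous = None  # reset at the start of each group, like first.
        for t in group:
            if previous is not None and (t["date"] - previous).days <= window_days:
                flagged.append(t)
            previous = t["date"]  # like datecheck=TRANSACTION_DATE
    return flagged

def txn(d):
    """Hypothetical sample: same card, amount and merchant, different dates."""
    return {"card": 1, "amount": 100, "merchant_code": 1,
            "merchant_name": "AMAZON", "transaction_code": "SALE", "date": d}

sample = [txn(date(1990, 1, 20)), txn(date(1990, 1, 1)), txn(date(1990, 1, 4))]
print([t["date"].isoformat() for t in within_days(sample)])  # ['1990-01-04']
```

As in the data step, only the later transaction of a close pair is output; the first one in each group has nothing to compare against.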
Let's create our practice data source:
DATA MY_CREDIT_CARDS;
INPUT
C_NUMBER
TRANC_AMOUNT
TRANSC_DATE :DATE10.
TRANSC_CODE
MERCH_CODE
MERCH_NAME $10.;
FORMAT TRANSC_DATE DDMMYY10.;
CARDS;
1 100 17JAN1990 1 1 AMAZON
2 200 01JAN1990 2 8 WALLMART
4 100 04JAN1990 3 5 CRUSTYKRAB
2 200 07JAN1990 4 7 NETFLIX
1 300 01JAN1990 5 2 GOOGLEPLAY
3 200 17JAN1990 6 8 WALLMART
5 100 18JAN1990 7 2 GOOG.PLAY
5 300 19JAN1990 8 2 GOOGLEPLAY
2 200 22JAN1990 9 8 WALLMART
4 200 20JAN1990 10 2 GOOGLEPLAY
1 100 03JAN1990 11 2 GOOG.PLAY
1 100 17JAN1990 12 1 AMZN
;
RUN;
First of all, I recommend not using descriptive fields such as names (the merchant name in this case) as keys, because descriptive fields can vary a lot: someone can register AMAZON as AMZN or AMAZN, or any other variant you can imagine, as the merchant name. Use ID fields instead. So, assuming the merchant code is a unique ID, I think that is enough to identify the merchant.
With that in mind, in PROC SQL you could do something like this to find duplicates based on the rule you describe (without needing any extra step):
PROC SQL;
/* The following assumes each record is unique
   (identified by 'transaction code' in this case);
   otherwise you must handle duplicate records properly. */
SELECT
DISTINCT A.*,
CASE WHEN
B.TRANSC_CODE IS NOT NULL
THEN 1 ELSE 0 END AS DUPLICATED
FROM MY_CREDIT_CARDS AS A
LEFT JOIN MY_CREDIT_CARDS AS B
ON
A.MERCH_CODE = B.MERCH_CODE AND
A.TRANC_AMOUNT = B.TRANC_AMOUNT AND
A.TRANSC_CODE ^= B.TRANSC_CODE AND
A.TRANSC_DATE >= INTNX('day',B.TRANSC_DATE,-5) AND
A.TRANSC_DATE <= INTNX('day',B.TRANSC_DATE,5)
;
/* You could use an ORDER BY clause to sort the
   results as you want. */
QUIT;
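A runnable approximation of this self-join, using Python's sqlite3 module and the practice data above. SQLite has no INTNX, so the ±5-day window is written with `julianday()` on ISO date strings; that substitution and the lowercase names are assumptions. Note that the join, as written, does not compare card numbers, so transactions 6 and 9 (cards 3 and 2) are flagged as a pair.

```python
import sqlite3

ROWS = [  # (c_number, tranc_amount, transc_date, transc_code, merch_code, merch_name)
    (1, 100, "1990-01-17", 1, 1, "AMAZON"),
    (2, 200, "1990-01-01", 2, 8, "WALLMART"),
    (4, 100, "1990-01-04", 3, 5, "CRUSTYKRAB"),
    (2, 200, "1990-01-07", 4, 7, "NETFLIX"),
    (1, 300, "1990-01-01", 5, 2, "GOOGLEPLAY"),
    (3, 200, "1990-01-17", 6, 8, "WALLMART"),
    (5, 100, "1990-01-18", 7, 2, "GOOG.PLAY"),
    (5, 300, "1990-01-19", 8, 2, "GOOGLEPLAY"),
    (2, 200, "1990-01-22", 9, 8, "WALLMART"),
    (4, 200, "1990-01-20", 10, 2, "GOOGLEPLAY"),
    (1, 100, "1990-01-03", 11, 2, "GOOG.PLAY"),
    (1, 100, "1990-01-17", 12, 1, "AMZN"),
]

QUERY = """
SELECT DISTINCT a.transc_code,
       CASE WHEN b.transc_code IS NOT NULL THEN 1 ELSE 0 END AS duplicated
FROM my_credit_cards AS a
LEFT JOIN my_credit_cards AS b
  ON a.merch_code = b.merch_code
 AND a.tranc_amount = b.tranc_amount
 AND a.transc_code <> b.transc_code
 AND julianday(a.transc_date) BETWEEN julianday(b.transc_date) - 5
                                  AND julianday(b.transc_date) + 5
ORDER BY a.transc_code
"""

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE my_credit_cards (c_number INTEGER, tranc_amount INTEGER,
    transc_date TEXT, transc_code INTEGER, merch_code INTEGER, merch_name TEXT)""")
conn.executemany("INSERT INTO my_credit_cards VALUES (?, ?, ?, ?, ?, ?)", ROWS)
flags = dict(conn.execute(QUERY).fetchall())  # {transc_code: duplicated}
print([code for code, dup in flags.items() if dup])  # [1, 6, 9, 12]
```

Adding `a.c_number = b.c_number` to the join condition would restrict matches to the same card, if that is the intended rule.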
The result has a new column named "DUPLICATED", which is 1 if the record was found to be duplicated and 0 if not.
Hope it helps.
I have two tables. I want to find the erroneous records in the first table, based on the fact that they don't form a complete set as defined by the second table. For example:
custID service transID
1 20 1
1 20 2
1 50 2
2 49 1
2 138 1
3 80 1
3 140 1
comboID combinations
1 Y00020Y00050
2 Y00049Y00138
3 Y00020Y00049
4 Y00020Y00080Y00140
So in this example I would want a query to return the first row of the first table because it does not have a matching 49 or 50 or (80 and 140), and the last two rows as well (because there is no 20). The second transaction is fine, and the second customer is fine.
I couldn't figure this out with a query, so I wound up writing a program that loads the services per customer and transID into an array, iterates over them, and checks that there is at least one combination record whose services are all present in the loaded array. Even that came off as ham-fisted, but it was less of a nightmare than the awkward outer joins across multiple joins I was trying to build in SQL.
Taking a step back, I think I need to restructure the combinations table into something more accommodating, but I still can't think of what the approach would be.
I don't have DB2, so I tested on Oracle; however, the LISTAGG function should be available there as well. The table service is the first table and comb is the second one. I assume the service numbers are sorted in the same order as in the combinations column.
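select service.*
from service
join
(
  -- transactions whose service list matches no combination
  select S.custid, S.transid
  from
  (
    -- one string per transaction, e.g. Y00049Y00138; services are
    -- zero-padded to 5 digits to match the combinations column
    select custid, transid,
           listagg('Y' || lpad(service, 5, '0')) within group (order by service) as agg
    from service
    group by custid, transid
  ) S
  where not exists
  (
    select *
    from comb
    where S.agg = comb.combinations
  )
) NOT_F on NOT_F.custid = service.custid and NOT_F.transid = service.transid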
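The LISTAGG step builds one string per (custID, transID) by sorting the services and concatenating a "Y" plus the zero-padded service number, then looks for an exact match in the combinations table. Below is a small Python sketch of that logic with the sample data; the 5-digit padding is inferred from values such as Y00138 in the combinations column, and the function name is hypothetical.

```python
from collections import defaultdict

SERVICE = [  # (custID, service, transID)
    (1, 20, 1), (1, 20, 2), (1, 50, 2),
    (2, 49, 1), (2, 138, 1),
    (3, 80, 1), (3, 140, 1),
]
COMBINATIONS = {"Y00020Y00050", "Y00049Y00138", "Y00020Y00049", "Y00020Y00080Y00140"}

def unmatched_rows(service_rows, combinations):
    """Return rows whose (custID, transID) group matches no combination exactly."""
    groups = defaultdict(list)
    for cust, svc, trans in service_rows:
        groups[(cust, trans)].append(svc)
    # Like LISTAGG ... WITHIN GROUP (ORDER BY service): sort, pad to 5 digits, concatenate.
    keys = {k: "".join(f"Y{svc:05d}" for svc in sorted(svcs)) for k, svcs in groups.items()}
    return [row for row in service_rows if keys[(row[0], row[2])] not in combinations]

print(unmatched_rows(SERVICE, COMBINATIONS))  # [(1, 20, 1), (3, 80, 1), (3, 140, 1)]
```

Without the padding, service 138 would become Y000138 and customer 2 would be reported as erroneous even though Y00049Y00138 is a valid combination.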
I dare say your database design does not conform to first normal form, since the combinations column is not atomic. Think about it.
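One way to act on that: store each combination as one row per service in a hypothetical `combo_item(comboID, service)` table, then flag a transaction when no combination has all of its services present (relational division with nested NOT EXISTS). This checks "every service of some combination is present", the rule from the question's program, rather than the exact-string match above. It is a sketch in SQLite via Python; the table names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE service (custID INTEGER, service INTEGER, transID INTEGER);
CREATE TABLE combo_item (comboID INTEGER, service INTEGER);
INSERT INTO service VALUES (1,20,1),(1,20,2),(1,50,2),(2,49,1),(2,138,1),(3,80,1),(3,140,1);
INSERT INTO combo_item VALUES (1,20),(1,50),(2,49),(2,138),(3,20),(3,49),(4,20),(4,80),(4,140);
""")

QUERY = """
SELECT s.custID, s.service, s.transID
FROM service s
WHERE NOT EXISTS (                -- there is no combination k ...
    SELECT 1 FROM combo_item k
    WHERE NOT EXISTS (            -- ... for which none of k's services ...
        SELECT 1 FROM combo_item i
        WHERE i.comboID = k.comboID
          AND NOT EXISTS (        -- ... is missing from this transaction
              SELECT 1 FROM service s2
              WHERE s2.custID = s.custID
                AND s2.transID = s.transID
                AND s2.service = i.service)))
ORDER BY s.custID, s.transID, s.service
"""
bad_rows = conn.execute(QUERY).fetchall()
print(bad_rows)  # [(1, 20, 1), (3, 80, 1), (3, 140, 1)]
```

Adding a new combination then means inserting rows into `combo_item`, with no string formatting to keep in sync.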
Situation: I have three tables of parts: Raw Material, Individual Parts, and Assembled Parts. I have created a union query to list all the part numbers along with their minimum and opening inventory levels. I also have an inventory table that uses all the part numbers. I then used the union query in another query to find the current inventory and current balance. When I attempt to open this query, I get an input box asking for CurrentInventory.
Question: How do I get the input box to stop appearing?
Code:
Tables:
Raw Material, Individual Parts, and Assembled Parts all have similar formats that begin with the following fields:
PartNum | Min | Open
1 50 100
Inventory:
PartNum | Year | Week | In | Out
1 2015 31 20 10
Queries
Union Query:
SELECT PartNum, Open, Min
FROM [Raw Material]
UNION
SELECT PartNum, Open, Min
FROM [Individual Parts]
UNION
SELECT PartNum, Open, Min
FROM [Assembled Parts];
Which results in:
PartNum | Min | Open
1 50 100
etc.
Current Inventory:
SELECT AllParts.PartNum, AllParts.Open, Sum(Inventory.[In]) AS SumOfIn,
Sum(Inventory.Out) AS SumOfOut,
[Open]+[SumOfIn]-[SumOfOut] AS CurrentInventory,
AllParts.Min, [CurrentInventory]-[Min] AS CurrentBalance
FROM AllParts
INNER JOIN Inventory ON AllParts.PartNum = Inventory.PartNum
GROUP BY AllParts.PartNum, AllParts.Open, AllParts.Min,
[CurrentInventory]-[Min], [Open]+[In]-[Out];
I get the input box for CurrentInventory when I run this. If I don't enter anything, it doesn't affect the results. However, when I run the report I generated from this query, the column shows whatever I entered rather than the actual value.
Even though you alias a calculated result as "CurrentInventory", you can't reference that calculation by its alias elsewhere in the same query.
Every time "CurrentInventory" appears (except right after the "AS"), replace it with [Open]+[SumOfIn]-[SumOfOut].
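A sketch of the corrected calculation, run in SQLite through Python rather than in Access, so the expression is written out in full with SUM() instead of relying on any alias. Table and column names follow the question; square brackets quote names with spaces and words such as In and Open. Only the single sample row from the question is loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [Raw Material]     (PartNum INTEGER, [Min] INTEGER, [Open] INTEGER);
CREATE TABLE [Individual Parts] (PartNum INTEGER, [Min] INTEGER, [Open] INTEGER);
CREATE TABLE [Assembled Parts]  (PartNum INTEGER, [Min] INTEGER, [Open] INTEGER);
CREATE TABLE Inventory (PartNum INTEGER, [Year] INTEGER, [Week] INTEGER, [In] INTEGER, [Out] INTEGER);
INSERT INTO [Raw Material] VALUES (1, 50, 100);
INSERT INTO Inventory VALUES (1, 2015, 31, 20, 10);
CREATE VIEW AllParts AS
    SELECT PartNum, [Open], [Min] FROM [Raw Material]
    UNION SELECT PartNum, [Open], [Min] FROM [Individual Parts]
    UNION SELECT PartNum, [Open], [Min] FROM [Assembled Parts];
""")

QUERY = """
SELECT p.PartNum,
       p.[Open] + SUM(i.[In]) - SUM(i.[Out])           AS CurrentInventory,
       p.[Open] + SUM(i.[In]) - SUM(i.[Out]) - p.[Min] AS CurrentBalance
FROM AllParts AS p
INNER JOIN Inventory AS i ON p.PartNum = i.PartNum
GROUP BY p.PartNum, p.[Open], p.[Min]
"""
result = conn.execute(QUERY).fetchall()
print(result)  # [(1, 110, 60)]
```

The GROUP BY lists only plain columns; the calculated expressions stay in the SELECT list, where the aggregates are valid.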
I have a table like this:
id activity pay parent
1 pay all - null
2 pay tax 10 $ 1
3 pay water bills - 1
4 fix house - null
5 fix roof 1 $ 4
6 pay drinking water 1 $ 3
I want to get a table like this:
id activity pay parent matriks
1 pay all {11 $} null 1 (pay tax + pay water bills)
2 pay tax 10 $ 1 1-2
3 pay water bills {1 $} 1 1-3 (pay drinking water)
4 fix house {1 $} null 4 (fix roof)
5 fix roof 1 $ 4 4-5
6 pay drinking water 1 $ 3 1-3-6
The amounts should be counted from child to parent.
The problem is that pay water bills is not being counted from pay drinking water, and pay all can't be counted if pay tax or pay water bills has no pay value of its own.
I tried this on our Postgres DB (version 8.4.22), since the fiddle was a bit slow for my taste, but the SQL can be pasted in there and it works on Postgres.
Still, here is the fiddle demo; it takes about 20 seconds the first time but is faster after that.
Here's what produces the calculated results for me. (I didn't format it according to your requirements, because in my mind the main exercise was the calculation.) This assumes your table is called activity:
with recursive rekmatriks as (
  -- top-down: start at the roots and append each child's id to its parent's path
  select id, activity, pay, parent, id::text as matriks, 0 as lev
  from activity
  where parent is null
  union all
  select activity.id, activity.activity, activity.pay, activity.parent,
         rekmatriks.matriks || '-' || activity.id::text as matriks,
         rekmatriks.lev + 1 as lev
  from activity inner join rekmatriks on activity.parent = rekmatriks.id
)
, reksum as (
  -- bottom-up: start at the leaves and carry each subtotal up to the parent
  select id, activity, pay, parent, matriks, lev, coalesce(pay, 0) as subsum
  from rekmatriks
  where not exists (select id from rekmatriks rmi where rmi.parent = rekmatriks.id)
  union all
  select rekmatriks.*, reksum.subsum + coalesce(rekmatriks.pay, 0) as subsum
  from rekmatriks inner join reksum on rekmatriks.id = reksum.parent
)
-- a parent receives one subtotal per child path, so add them up
select id, activity, pay, parent, matriks, sum(subsum) as amount, lev
from reksum
group by id, activity, pay, parent, matriks, lev
order by id
As a bonus, this delivers the nesting depth of an id. 0 for a parent, 1 for first sublevel etc. This uses two recursive WITH queries to achieve what you want. The calculated value you need is in the amount column.
The first one (rekmatriks) processes the IDs in the table top to bottom, starting with any ids that have a parent of NULL. The recursive part simply takes the parent's matriks path and appends its own id to it, producing your matriks tree representation field.
The second one (reksum) works bottom to top and starts with all rows that have no child elements. The recursive part of this query selects a parent row for each child row selected in the non-recursive part, and computes the sum of pay and subsum for each line. This produces multiple rows per id, since one parent can have multiple children.
All that's left now is the final select statement. It uses GROUP BY and SUM to aggregate the multiple possible child sum values into one row.
This does work for your particular example. It may fail for cases not shown in the sample data, for example if an item that has children carries a value that needs to be added.
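The same two recursive CTEs can be tried in SQLite (3.8.3 or later, which added WITH RECURSIVE) from Python. The only changes needed are replacing Postgres's `id::text` casts with `CAST(id AS TEXT)` and, for brevity, table aliases. The pay column is assumed to be numeric (10, 1, or NULL), since a value like "10 $" can't be summed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (id INTEGER, activity TEXT, pay INTEGER, parent INTEGER)")
conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?)", [
    (1, "pay all", None, None), (2, "pay tax", 10, 1), (3, "pay water bills", None, 1),
    (4, "fix house", None, None), (5, "fix roof", 1, 4), (6, "pay drinking water", 1, 3),
])

QUERY = """
WITH RECURSIVE rekmatriks AS (            -- top-down: build the 1-3-6 style path
    SELECT id, activity, pay, parent, CAST(id AS TEXT) AS matriks, 0 AS lev
    FROM activity WHERE parent IS NULL
    UNION ALL
    SELECT a.id, a.activity, a.pay, a.parent,
           r.matriks || '-' || CAST(a.id AS TEXT), r.lev + 1
    FROM activity a JOIN rekmatriks r ON a.parent = r.id
), reksum AS (                            -- bottom-up: carry leaf sums to the roots
    SELECT id, activity, pay, parent, matriks, lev, COALESCE(pay, 0) AS subsum
    FROM rekmatriks
    WHERE NOT EXISTS (SELECT 1 FROM rekmatriks c WHERE c.parent = rekmatriks.id)
    UNION ALL
    SELECT r.id, r.activity, r.pay, r.parent, r.matriks, r.lev,
           s.subsum + COALESCE(r.pay, 0)
    FROM rekmatriks r JOIN reksum s ON r.id = s.parent
)
SELECT id, matriks, SUM(subsum) AS amount, lev
FROM reksum
GROUP BY id, activity, pay, parent, matriks, lev
ORDER BY id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)  # e.g. (1, '1', 11, 0) for "pay all"
```

Row 1 ends up with 11 because it receives two subtotals, 10 via pay tax and 1 via pay water bills (which itself got 1 from pay drinking water), and the final GROUP BY adds them.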