I have a Transaction table that has id, postDate, account, debit, and credit columns.
In SQL Server I'm trying to write a union or join query that will highlight imbalances in this table. An imbalance is defined as any credit for an account that does not have a corresponding debit for another account. This table has millions of rows, and some transactions in it are duplicates that need to be removed on a case-by-case basis, but first I need a query that will highlight them.
I started with a union. The finer points of how to highlight imbalances are what I need help with. I basically want an indicator on the row if the record is a duplicate. I'd also like to add a running balance column to the query. If the approach I am taking is naive, it is because I am a novice at advanced joins. I have stubbed out the indicator column, but with hardcoded string data in it.
SELECT id,
       b.account,
       b.credit,
       b.debit,
       b.duplicate
FROM (SELECT t.id,
             t.account,
             t.credit,
             t.debit,
             'false' as duplicate
      FROM TransactionRegister t
      UNION ALL
      SELECT t.id,
             t.account,
             t.debit,
             t.credit,
             'false' as duplicate
      FROM TransactionRegister t) AS b
order by id desc
Here is a sample of the table showing two transactions. Typically a debit transaction for one account will always have a matching credit to another account. I just want to show credits without a matching debit, and vice versa, and let the user decide what to do or whether it is legitimate. In this system it also appears that the matching transaction is always the next number in the sequence, but I cannot be sure that is 100% the case. Out of a million records, if the user sees 100 that are not balancing, they can easily digest that.
ID|Date|Account|Credit|Debit
1|01/22/2018|22222|13500|0
2|01/22/2018|11111|0|13500
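One possible way to express "a credit without a matching debit, and vice versa" is a NOT EXISTS check against the same table. The following is only a sketch: it runs the SQL through Python's built-in sqlite3 module rather than SQL Server, it assumes that a match means same postDate, same amount, and a different account, and row 3 is a hypothetical unmatched credit added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TransactionRegister (
    id INTEGER PRIMARY KEY, postDate TEXT, account TEXT,
    credit INTEGER, debit INTEGER);
INSERT INTO TransactionRegister VALUES
    (1, '2018-01-22', '22222', 13500, 0),
    (2, '2018-01-22', '11111', 0, 13500),
    (3, '2018-01-23', '22222', 500, 0);  -- hypothetical credit with no debit
""")

# A row is reported when no row on another account, with the same date,
# carries the mirrored amounts (its debit equals our credit and vice versa).
unmatched = conn.execute("""
SELECT t.id, t.account, t.credit, t.debit
FROM TransactionRegister t
WHERE NOT EXISTS (
    SELECT 1
    FROM TransactionRegister m
    WHERE m.account <> t.account
      AND m.postDate = t.postDate
      AND m.debit = t.credit
      AND m.credit = t.debit)
ORDER BY t.id
""").fetchall()

print(unmatched)  # [(3, '22222', 500, 0)]
```

Note that this only finds rows with no counterpart at all. A duplicated credit that shares one debit with its twin would still look matched, so catching those would need a comparison of counts per date and amount.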
Related
I am trying to select all records in a time-variant Account table for each account with a change in an associated value (e.g. the maturity date). A change in the value will result in the most recent record for an account being end-dated and a new record (containing a new effective date of the following day) being created. The most recent records for accounts in this table have an end-date of 12/31/9000.
For instance, in the below illustration, account 44444444 would not be included in my query result set since it hasn't had a change in the value (and thus also has no additional records aside from the original); however, the other accounts have multiple changes in values (and multiple records), so I would want to see those returned.
I've tried using the row_num function, as well as a reflexive join, but for some reason I'm not getting the expected results. What are some ways to obtain the results I need?
Note: The primary key for this table includes the acct_id and eff_dt. Also, I'm using PostgreSQL in a Greenplum environment.
Here are two types of queries I tried to use but which produced problematic results:
Query 1
Query 2
If you want only the accounts, use aggregation:
select acct_id
from t
group by acct_id
having min(value) <> max(value);
Based on your description, you could also use having count(*) > 1.
If you want the original records, you can use window functions:
select t.*
from (select t.*, count(*) over (partition by acct_id) as cnt
from t
) t
where cnt > 1;
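To see both variants side by side, here is a small sketch run through Python's sqlite3 module (not Greenplum; window functions need SQLite 3.25 or later). The table layout is assumed from the question: the end_dt column, the dates, the values, and account 11111111 are made up for illustration, while 44444444 is the unchanged account mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (acct_id TEXT, eff_dt TEXT, end_dt TEXT, value TEXT,
                PRIMARY KEY (acct_id, eff_dt));
INSERT INTO t VALUES
    ('11111111', '2017-01-01', '2017-06-30', '2020-01-01'),
    ('11111111', '2017-07-01', '9000-12-31', '2021-01-01'),
    ('44444444', '2017-01-01', '9000-12-31', '2020-01-01');
""")

# Accounts only: any account with more than one record has had a change.
changed_accounts = conn.execute("""
SELECT acct_id FROM t GROUP BY acct_id HAVING COUNT(*) > 1
""").fetchall()

# Original records: attach the per-account row count, then filter on it.
changed_rows = conn.execute("""
SELECT acct_id, eff_dt
FROM (SELECT t.*, COUNT(*) OVER (PARTITION BY acct_id) AS cnt FROM t) AS x
WHERE cnt > 1
ORDER BY acct_id, eff_dt
""").fetchall()

print(changed_accounts)  # [('11111111',)]
print(changed_rows)      # [('11111111', '2017-01-01'), ('11111111', '2017-07-01')]
```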
I have a database of a service that helps people sell things. If they fail a delivery of a sale, they get penalised. I am trying to extract the number of active listings each user had when a particular penalty was applied.
I have the equivalent of the following tables (and relevant fields):
user (id)
listing (id, user_id, status)
transaction (listing_id, seller_id)
listing_history (id, listing_status, date_created)
penalty (id, transaction_id, user_id, date_created)
The listing_history table saves an entry every time a listing is modified, saving a record of what the new state of the listing is.
My goal is to end up with a result table with two fields: penalty_id, and the number of active listings the penalised user had when the penalty was applied.
So far I have the following:
SELECT s1.penalty_id,
       COUNT(s1.record_id) 'active_listings'
FROM (
    SELECT penalty.id AS 'penalty_id',
           listing_history.id AS 'record_id'
    FROM user
    JOIN penalty ON penalty.user_id = user.id
    JOIN transaction ON transaction.id = penalty.transaction_id
    JOIN listing_history ON listing_history.listing_id = listing.id
    WHERE listing_history.date_created < penalty.date_created
      AND listing_history.status = 0
) s1
GROUP BY s1.penalty_id
Status = 0 means that the listing is active (or that the listing was active at the time the record was created). I got results similar to what I expected, but I fear I may be missing something or may be doing the JOINs wrong. Would this have your approval? (apart from the obvious non-use of aliases, for clarity problems).
UPDATE - As the comments on this answer indicate that changing the table structure isn't an option, here are more details on some queries you could use with the existing structure.
Note that I made a couple changes to the query before even modifying the logic.
As viki888 pointed out, there was a problematic reference to listing.id, a table that is never joined; I've replaced it.
There was no real need for a subquery in the original query; I've simplified it out.
So the original query is rewritten as
SELECT penalty.id AS 'penalty_id'
     , COUNT(listing_history.id) 'active_listings'
  FROM user
       JOIN penalty
         ON penalty.user_id = user.id
       JOIN transaction
         ON transaction.id = penalty.transaction_id
       JOIN listing_history
         ON listing_history.listing_id = transaction.listing_id
 WHERE listing_history.date_created < penalty.date_created
   AND listing_history.status = 0
 GROUP BY penalty.id
Now the most natural way, in my opinion, to write the corrected timeline constraint is with a NOT EXISTS condition that filters out all but the most recent listing_history record for a given id. This does require thinking about some edge cases:
Could two listing history records have the same create date? If so, how do you decide which happened first?
If a listing history record is created on the same day as the penalty, which is treated as happening first?
If date_created is really a timestamp, then this may not matter much (if at all); if it's really a date, it might be a bigger issue. Since your original query required that the listing history be created before the penalty, I'll continue in that style; but it's still ambiguous how to handle the case where two history records with matching status have the same date. You may need to adjust the date comparisons to get the desired behavior.
SELECT penalty.id AS 'penalty_id'
       -- count each listing once, even if it has several qualifying history rows
     , COUNT(DISTINCT listing_history.listing_id) 'active_listings'
  FROM user
       JOIN penalty
         ON penalty.user_id = user.id
       JOIN transaction
         ON transaction.id = penalty.transaction_id
       JOIN listing_history
         ON listing_history.listing_id = transaction.listing_id
 WHERE listing_history.date_created < penalty.date_created
   AND listing_history.status = 0
   AND NOT EXISTS (SELECT 1
                     FROM listing_history h2
                    WHERE listing_history.date_created < h2.date_created
                      AND h2.date_created < penalty.date_created
                      -- correlate on the listing, not on the history row's own id
                      AND h2.listing_id = listing_history.listing_id)
 GROUP BY penalty.id
Note that I switched from COUNT(...) to COUNT(DISTINCT ...); this helps with some edge cases where two active records for the same listing might be counted.
If you change the date comparisons to use <= instead of < - or, equivalently, if you use BETWEEN to combine the date comparisons - then you'd want to add AND h2.status != 0 (or AND h2.status <> 0, depending on your database) to the subquery so that two concurrent ACTIVE records don't cancel each other out.
There are several equivalent ways to write this, and unfortunately it's the kind of query that doesn't always cooperate with a database query optimizer, so some trial and error may be necessary to make it run well with large data volumes. Hopefully that gives enough insight into the intended logic that you could work out some equivalents if need be. You could consider using NOT IN instead of NOT EXISTS; or you could use an outer join to a second instance of LISTING_HISTORY... There are probably others I'm not thinking of offhand.
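As an illustration of the outer-join variant, here is a rough sketch run through Python's sqlite3 module. It is not tuned for any particular database. The table txn stands in for transaction (a reserved word in SQLite), the user table is left out because it does not affect the count, and all rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE penalty (id INTEGER, transaction_id INTEGER, date_created TEXT);
CREATE TABLE txn (id INTEGER, listing_id INTEGER);
CREATE TABLE listing_history (id INTEGER, listing_id INTEGER,
                              status INTEGER, date_created TEXT);
INSERT INTO penalty VALUES (1, 10, '2016-02-01'), (2, 11, '2016-02-01');
INSERT INTO txn VALUES (10, 100), (11, 101);
INSERT INTO listing_history VALUES
    (1, 100, 0, '2016-01-01'),   -- listing 100 becomes active
    (2, 100, 1, '2016-01-15'),   -- ...and changes status before the penalty
    (3, 101, 0, '2016-01-10');   -- listing 101 is still active at penalty time
""")

# h2 looks for a later history row of the same listing before the penalty.
# Keeping only rows where no such h2 exists (h2.id IS NULL) is the anti-join
# equivalent of the NOT EXISTS condition.
rows = conn.execute("""
SELECT p.id AS penalty_id, COUNT(DISTINCT h.listing_id) AS active_listings
FROM penalty p
JOIN txn t ON t.id = p.transaction_id
JOIN listing_history h
  ON h.listing_id = t.listing_id
 AND h.date_created < p.date_created
 AND h.status = 0
LEFT JOIN listing_history h2
  ON h2.listing_id = h.listing_id
 AND h2.date_created > h.date_created
 AND h2.date_created < p.date_created
WHERE h2.id IS NULL
GROUP BY p.id
ORDER BY p.id
""").fetchall()

print(rows)  # [(2, 1)]
```

Penalty 1 drops out of the result entirely rather than showing a count of 0, because every join before the anti-join is an inner join.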
I don't know that we're in a position to sign off on a general statement that the query is, or is not, "correct". If there's a specific question about whether a query will include/exclude a record in a specific situation (or why it does/doesn't, or how to modify it so it won't/will), those might get more complete answers.
I can say that there are a couple likely issues:
The only glaring logic issue has to do with timeline management, which is something that causes a lot of trouble with SQL. The issue is, while your query demonstrates that the listing was active at some point before the penalty creation date, it doesn't demonstrate that the listing was still active on the penalty creation date. Consider
PENALTY
id  transaction  date
1   10           2016-02-01
TRANSACTION
id  listing_id
10  100
LISTING_HISTORY
listing_id  status  date
100         0       2016-01-01
100         1       2016-01-15
The joins would create a single record, and the count for penalty 1 would include listing 100 even though its status had changed to something other than 0 before the penalty was created.
This is hard - but not impossible - to fix with your existing table structure. You could add a NOT EXISTS condition looking for another LISTING_HISTORY record matching the ID with a date between the first LISTING_HISTORY date and the PENALTY date, for one.
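The sample rows above can be loaded into an in-memory SQLite database to see the difference. This is only a sketch using Python's sqlite3 module: txn stands in for the transaction table (a reserved word in SQLite), and the column names are assumed from the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE penalty (id INTEGER, transaction_id INTEGER, date_created TEXT);
CREATE TABLE txn (id INTEGER, listing_id INTEGER);
CREATE TABLE listing_history (listing_id INTEGER, status INTEGER,
                              date_created TEXT);
INSERT INTO penalty VALUES (1, 10, '2016-02-01');
INSERT INTO txn VALUES (10, 100);
INSERT INTO listing_history VALUES
    (100, 0, '2016-01-01'),
    (100, 1, '2016-01-15');
""")

# Shared part: listings that were active at some point before the penalty.
base = """
SELECT p.id, COUNT(DISTINCT h.listing_id)
FROM penalty p
JOIN txn t ON t.id = p.transaction_id
JOIN listing_history h ON h.listing_id = t.listing_id
WHERE h.date_created < p.date_created
  AND h.status = 0
"""
# Extra condition: no later history row for the same listing before the penalty.
not_exists = """
  AND NOT EXISTS (SELECT 1 FROM listing_history h2
                  WHERE h2.listing_id = h.listing_id
                    AND h2.date_created > h.date_created
                    AND h2.date_created < p.date_created)
"""

naive = conn.execute(base + "GROUP BY p.id").fetchall()
corrected = conn.execute(base + not_exists + "GROUP BY p.id").fetchall()

print(naive)      # [(1, 1)]  listing 100 is counted although it was no longer active
print(corrected)  # []        nothing was active when the penalty was created
```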
It would be more efficient to add an end date to the LISTING_HISTORY table, but that may not be so easy depending on how the data is maintained.
The second potential issue is the COUNT(RECORD_ID). This may not do what you mean: what COUNT(x) intuitively seems like it should do is what COUNT(DISTINCT x) actually does. As written, if the join produces two matches for the same listing - i.e. the listing became active at two different times before the penalty - the listing would be counted twice.
This is my first question on here, so please forgive me if I break any rules.
Here is what I need to know:
How do I create an Oracle SQL query that will display a unique count of something even if there are duplicates in the results?
Example: a customer table has a list of purchases made by various customers. The table lists the customer ID, name, category of purchase (i.e. Hardware, Tools, Seasonal), etc. The outcome of the query needs to show each customer ID, customer name and the category of the purchase, and a count of the individual customer. So customer ID 1 for John Smith has made a purchase in each department. If I do a count of the customer, he will appear three times as he has made three purchases, but I also need a column to count the customer only once. The count in the other rows returned for the other departments should show a 0 or NULL.
I normally achieve this by pulling everything and exporting to Excel. I add a column that uses an IF formula on the ID to only show a 1 on the first occurrence of the customer, i.e. IF(A3=A2,0,1) (if A3 is the same as A2, show a 0; if it's not the same as A2, show a 1). This gives me a unique count of customers for one part of the report and still shows me how many purchases the customer made in another part of the report.
I want to do this directly in the SQL query as I have a large set of data this needs to be done on, and adding any formulas in Excel will make the sheet huge. This will also make it easier to host the query results in Access so Excel can pull it from there.
I have tried to find a solution to this for a while, but any search on Google usually returns results on how to remove duplicates from a table or how to count the duplicates in a table.
I am sorry if this is a long question, but I wanted to be thorough so I do not waste anyone's time on back-and-forth comments (I have seen this many times on here and elsewhere, when the OP asks a very cryptic question and expects everyone to understand it without further explanation).
DISTINCT can be used inside COUNT to count only the unique values of a field.
SELECT
cust.customer_id, cust.customer_name, p.category,
count(distinct p.department_id) as total_departments,
count(p.customer_id) as total_purchases  -- unlike count(*), gives 0 for customers with no purchases
FROM customers cust
LEFT JOIN purchase_table p on (cust.customer_id = p.customer_id)
GROUP BY cust.customer_id, cust.customer_name, p.category
ORDER BY cust.customer_id;
This method is not limited to Oracle.
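For a quick feel of the difference, here is a sketch run through Python's sqlite3 module rather than Oracle. The purchases table and the Jane Doe row are made up for illustration. The second query is not part of the answer above: it is one possible way to reproduce the Excel-style "1 on the first occurrence" flag from the question with ROW_NUMBER (needs SQLite 3.25 or later; Oracle supports the same window function).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (customer_id INTEGER, customer_name TEXT, category TEXT);
INSERT INTO purchases VALUES
    (1, 'John Smith', 'Hardware'),
    (1, 'John Smith', 'Tools'),
    (1, 'John Smith', 'Seasonal'),
    (2, 'Jane Doe',   'Tools');
""")

# COUNT(*) counts purchase rows; COUNT(DISTINCT ...) counts each customer once.
totals = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT customer_id) FROM purchases"
).fetchone()

# Flag the first row of each customer with 1 and every other row with 0.
flagged = conn.execute("""
SELECT customer_id, category,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY customer_id
                                    ORDER BY category) = 1
            THEN 1 ELSE 0 END AS first_occurrence
FROM purchases
ORDER BY customer_id, category
""").fetchall()

print(totals)   # (4, 2)
print(flagged)  # [(1, 'Hardware', 1), (1, 'Seasonal', 0), (1, 'Tools', 0), (2, 'Tools', 1)]
```

Summing the first_occurrence column gives the unique customer count, while the row count still gives the number of purchases.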
I have a table that stores transaction information. Each transaction has a unique (auto-incremented) id column, a column with the customer's id number, a column called bill_paid which indicates with a yes or no whether the transaction has been paid for by the customer, and a few other columns which hold other information not relevant to my question.
I want to select all customer ids from the transaction table for which the bill has not been paid, but if the customer has had multiple transactions where the bill has not been paid I DO NOT want to select them more than once. This way I can generate that customer one bill with all the transactions they owe for instead of a separate bill for each transaction. How would I build a query that did that for me?
Returns exactly one customer_id for each customer with bill_paid equal to 'no':
SELECT
t.customer_id
FROM
transactions t
WHERE
t.bill_paid = 'no'
GROUP BY
t.customer_id
Edit:
GROUP BY summarises your resultset.
Caveat: Every column selected must be either 'grouped by' or aggregated in some fashion. As shown by nikic you could use SUM to get the total amount owed, e.g.:
SELECT
t.customer_id
, SUM(t.amount) AS TOTAL_OWED
FROM
transactions AS t
WHERE
t.bill_paid = 'no'
GROUP BY
t.customer_id
t is simply an alias.
So instead of typing transactions everywhere you can now simply type t. The alias is not necessary here since you query only one table, but I find them invaluable for larger queries. You can optionally type AS to make it more clear that you're using an alias.
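A small runnable sketch of the two queries above, using Python's sqlite3 module with made-up rows (the amount column is assumed to hold what is owed for each transaction):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (id INTEGER PRIMARY KEY, customer_id INTEGER,
                           bill_paid TEXT, amount INTEGER);
INSERT INTO transactions (customer_id, bill_paid, amount) VALUES
    (7, 'no', 20),
    (7, 'no', 30),
    (8, 'yes', 15),
    (9, 'no', 5);
""")

# One row per customer with at least one unpaid transaction.
unpaid_customers = conn.execute("""
SELECT t.customer_id
FROM transactions t
WHERE t.bill_paid = 'no'
GROUP BY t.customer_id
ORDER BY t.customer_id
""").fetchall()

# Same grouping, with the total owed per customer.
totals_owed = conn.execute("""
SELECT t.customer_id, SUM(t.amount) AS TOTAL_OWED
FROM transactions AS t
WHERE t.bill_paid = 'no'
GROUP BY t.customer_id
ORDER BY t.customer_id
""").fetchall()

print(unpaid_customers)  # [(7,), (9,)]
print(totals_owed)       # [(7, 50), (9, 5)]
```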
You might try the GROUP BY clause, e.g. group by the customer.
SELECT customer, SUM(toPay) FROM .. GROUP BY customer
I didn't design this table, and I would redesign it if I could, but that's not an option for me.
I have this table:
Transactions
Index --PK, auto increment
Tenant --this is a fk to another table
AmountCharged
AmountPaid
Balance
Other Data
The software that is used calculates the balance each time from the previous balance like this:
previousBalance - (AmountPaid - AmountCharged)
Balance is how much the tenant really owes.
However, the program uses Access and concurrent users, and messes up. Big time.
For example: I have a tenant that looks like this:
Amount Charged | Amount Paid | Balance
350            | 0           | 350
440            | 0           | 790
0              | 350         | -350 !
0              | 440         | -790
I want to go through and reset all the balances to what they should be, so I'd have some sort of running total. I don't know if Access can use variables like SPs or not.
I don't even know how to start on this. I'd assume it'd be a query with a subquery to sum all the charges/payments before its index, but I don't know how to write it.
How can I do this?
Edit:
I am using Access 97
Assuming Index is incremental, and higher values mean later transaction dates, you can use a self-join with a >= condition in the join clause and sum charges minus payments over each row and all earlier rows for the same tenant (per the formula in the question, payments reduce the balance), something like this:
select
    a.[Index],
    max(a.[Tenant]) as [Tenant],
    max(a.[AmountCharged]) as [AmountCharged],
    max(a.[AmountPaid]) as [AmountPaid],
    sum(
        iif(isnull(b.[AmountCharged]),0,b.[AmountCharged]) -
        iif(isnull(b.[AmountPaid]),0,b.[AmountPaid])
    ) as [Balance]
from
    [Transactions] as a
left outer join
    [Transactions] as b on
        a.[Tenant] = b.[Tenant] and
        a.[Index] >= b.[Index]
group by
    a.[Index];
Access SQL is fiddly; there may be some syntax errors above, but that's the general idea. To create this query in the query designer, add the Transactions table twice, join them on Tenant and Index, and then edit the join (if possible).
You could do the same with a subquery, something like:
select
    [Index],
    [Tenant],
    [AmountCharged],
    [AmountPaid],
    (
        select
            sum(
                iif(isnull(b.[AmountCharged]),0,b.[AmountCharged]) -
                iif(isnull(b.[AmountPaid]),0,b.[AmountPaid])
            )
        from
            [Transactions] as b
        where
            [Transactions].[Tenant] = b.[Tenant] and
            [Transactions].[Index] >= b.[Index]
    ) as [Balance]
from
    [Transactions];
Once you have calculated the proper balances, use an update query to update the table, by joining the Transactions table to the select query defined above on Index. You could probably combine it into one update query, but that would make it more difficult to test.
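To sanity-check the running-balance logic against the sample tenant from the question, here is a sketch using Python's sqlite3 module instead of Access. It assumes the balance starts at 0, that all four rows belong to one tenant (numbered 1 here), and it uses COALESCE where the Access queries use iif(isnull(...)).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transactions ([Index] INTEGER PRIMARY KEY, Tenant INTEGER,
                           AmountCharged INTEGER, AmountPaid INTEGER);
INSERT INTO Transactions VALUES
    (1, 1, 350, 0),
    (2, 1, 440, 0),
    (3, 1, 0, 350),
    (4, 1, 0, 440);
""")

# For each row, sum charges minus payments over this row and every earlier
# row of the same tenant.
rows = conn.execute("""
SELECT a.[Index], a.AmountCharged, a.AmountPaid,
       (SELECT SUM(COALESCE(b.AmountCharged, 0) - COALESCE(b.AmountPaid, 0))
        FROM Transactions b
        WHERE b.Tenant = a.Tenant
          AND b.[Index] <= a.[Index]) AS Balance
FROM Transactions a
ORDER BY a.[Index]
""").fetchall()

for row in rows:
    print(row)
# (1, 350, 0, 350)
# (2, 440, 0, 790)
# (3, 0, 350, 440)
# (4, 0, 440, 0)
```

The third and fourth balances come out as 440 and 0 rather than the -350 and -790 stored by the application.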
If all the records have a sequencing number (with no gaps in between), you can try the following: create a query where you join the table to itself. In the join, you specify that you want to link the tables with Id = Id - 1. That way, you link each record to its previous record.
If you do not have a column that can be used for this, try adding an autonumber column.
Another option is to write a few simple lines of VBA to loop over the records and update the values. If it is a one-off operation, I think that will be the easiest approach if you are not very experienced with SQL.