SQLite selecting transactions that do / do not meet a particular criteria - sql

I am trying to extract data from a GnuCash SQLite database. Relevant tables include accounts, transactions, and splits. Simplistically, accounts contain transactions which contain splits, and each split points back to an account.
The transactions need to be processed differently depending on whether each one does or does not include a particular kind of transaction fee—in this case whether or not the transaction contains a split linked to account 8190-000.
I've set up two queries, one that handles transactions with the transaction fee, and one that handles transactions without the transaction fee. The queries work, but they are awkward and wordy, and I'm sure there is a better way to do this. I did see not exists in this answer, but could not figure out how to make it work in this situation.
My current queries look like this:
-- Find all transactions containing a split with account code 8190-000
select tx_guid from transactions
inner join
(select tx_guid from
(splits inner join accounts on splits.account_guid = accounts.guid)
where accounts.code = "8190-000") fee_transactions
on fee_transactions.tx_guid = transactions.guid;
-- Find all transactions not containing a split with account code 8190-000
select guid from transactions
except
select tx_guid from transactions
inner join
(select tx_guid from
(splits inner join accounts on splits.account_guid = accounts.guid)
where accounts.code = "8190-000") fee_transactions
on fee_transactions.tx_guid = transactions.guid;
Given that I need to use these results in other queries, what is a simpler and more succinct way to obtain these lists of transactions?

You can use EXISTS for your 1st query like this:
SELECT t.*
FROM transactions t
WHERE EXISTS (
SELECT 1
FROM splits s INNER JOIN accounts a
ON s.account_guid = a.guid
WHERE a.code = '8190-000' AND ?.tx_guid = t.guid
);
Change ? to s or a, depending on which table contains the column tx_guid (splits or accounts), since it is not clear in your question.
Also, change to NOT EXISTS for your 2nd query.

Related

How to join 4 tables in SQL?

I just started using SQL and I need some help. I have 4 tables in a database. All four are connected with each other. I need to find the amount of unique transactions but can't seem to find it.
Transactions
transaction_id pk
name
Partyinvolved
transaction.id pk
partyinvolved.id
type (buyer, seller)
PartyCompany
partyinvolved.id
Partycompany.id
Companies
PartyCompany.id pk
sector
pk = primary key
The transaction is unique if the conditions are met.
I only need a certain sector out of Companies, this is condition1. Condition2 is a condition inside table Partyinvolved but we first need to execute condition1. I know the conditions but do not know where to put them.
SELECT *
FROM group
INNER JOIN groupB ON groupB.group_id = group.id
INNER JOIN companies ON companies.id = groupB.company_id
WHERE condition1 AND condition2 ;
I want to output the amount of unique transactions with the name.
It is a bit unclear what you are asking as your table definitions look like your hinting at column meanings more than names such as partycompany.id you are probably meaning the column that stores the relationship to PartyCompany column Id......
Anyway, If I follow that logic and I look at your questions about wanting to know where to limit the recordsets during the join. You could do it in Where clause because you are using an Inner Join and it wont mess you your results, but the same would not be true if you were to use an outer join. Plus for optimization it is typically best to add the limiter to the ON condition of the join.
I am also a bit lost as to what exactly you want e.g. a count of transactions or the actual transactions associated with a particular sector for instance. Anyway, either should be able to be derived from a basic query structure like:
SELECT
t.*
FROM
Companies co
INNER JOIN PartyCompancy pco
ON co.PartyCompanyId = pco.PartyCompanyId
INNER JOIN PartyInvolved pinv
ON pco.PartyInvolvedId = pinv.PartyInvolvedId
AND pinv.[type] = 'buyer'
INNER JOIN Transactions t
ON ping.TransactionId = t.TransactionId
WHERE
co.sector = 'some sector'

What's the most efficient way to exclude possible results from an SQL query?

I have a subscription database containing Customers, Subscriptions and Publications tables.
The Subscriptions table contains ALL subscription records and each record has three flags to mark the status: isActive, isExpire and isPending. These are Booleans and only one flag can be True - this is handled by the application.
I need to identify all customers who have not renewed any magazines to which they have previously subscribed and I'm not sure that I've written the most efficient SQL query. If I find a lapsed subscription I need to ignore it if they already have an active or pending subscription for that particular magazine.
Here's what I have:
SELECT DISTINCT Customers.id, Subscriptions.publicationName
FROM Subscriptions
LEFT JOIN Customers
ON Subscriptions.id_Customer = Customers.id
LEFT JOIN Publications
ON Subscriptions.id_Publication = Publications.id
WHERE Subscriptions.isExpired = 1
AND NOT EXISTS
( SELECT * FROM Subscriptions s2
WHERE s2.id_Publication = Subscriptions.id_Publication
AND s2.id_Customer = Subscriptions.id_Customer
AND s2.isPending = 1 )
AND NOT EXISTS
( SELECT * FROM Subscriptions s3
WHERE s3.id_Publication = Subscriptions.id_Publication
AND s3.id_Customer = Subscriptions.id_Customer
AND s3.isActive = 1 )
I have just over 50,000 subscription records and this query takes almost an hour to run which tells me that there's a lot of looping or something going on where for each record the SQL engine is having to search again to find any 'isPending' and 'isActive' records.
This is my first post so please be gentle if I've missed out any information in my question :) Thanks.
I don't have your complete database structure, so I can't test the following query but it may contain some optimization. I will leave it to you to test, but will explain why I have changed, what I have changed.
select Distinct Customers.id, Subscriptions.publicationName
from Subscriptions
join Customers on Subscriptions.id_Customer = Customer.id
join Publications
ON Subscriptions.id_Publication = Publications.id
Where Subscriptions.isExpired = 1
And Not Exists
(select * from Subscriptions s2
join Customers on s2.id_Customer = Customer.id
join Publications
ON s2.id_Publication = Publications.id
where s2.id_Customer = s2.id_customer and
(s2.isPending = 1 or s2.isActive = 1))
If you have no resulting data in Customer or Publications DB, then the Subscription information isn't useful, so I eliminated the LEFT join in favor of simply join. Combine the two Exists subqueries. These are pretty intensive if I recall so the fewer the better. Last thing which I did not list above but may be worth looking into is, can you run a subquery with specific data fields returned and use it in an Exists clause? The use of Select * will return all data fields which slows down processing. I'm not sure if you can limit your result unfortunately, because I don't have an equivalent DB available to me that I can test on (the google probably knows).
I suspect there are further optimizations that could be made on this query. Eliminating the Exists clause in favor of an 'IN' clause may help, but I can't think of a way right now, seeing how you've got to match two unique fields (customer id and the relevant subscription). Let me know if this helps at all.
With a table of 50k rows, you should be able to run a query like this in seconds.

Querying and adding rows

OK, this is a second attempt to resolve my issue, for those who will read this a second time, i hope its clear enough to understand a problem.
I am developing a query for a report, the thing is that while retrieving data from database this report should populate some rows, which do not exist. For illustrating purpose lets say i have these tables :
Table 1 - Companies
Table 2 - Transactions.
Table 3 - Transaction types.
Important detail that most of the companies do not have transactions of all transaction types. Although the report logic requires to dysplay a company with all of them : "real" ones with real money values and other, not existed ones with just $0. The problem starts here because transaction types are combined in logical groups, so lets say if a company has only 1 real transaction of type_1, the report should contain "$0" records of other types associated with type_1, like type_2, type_3 and type_4. If company has transactions of type_1 and type_2, report should be populated with some other tran types from different transaction type group etc.
The problem here is that the environment where it should be executed must be a pure sql (being a java programmer i understand how easy is to query database, load data into array[][] and add missing transaction types) - but the query should be ran on UNIX inside plsql batch so it should be single (or joined) select.
Thanks in advance. Any help or ideas would be very appreciated!
It sounds like you just need some sort of outer join. I'm guessing at how your tables relate to each other but it appears that you want something like
SELECT c_typ_cross_join.company_name,
c_typ_cross_join.transaction_type,
nvl( sum( t.transaction_amount ), 0 ) total_amt
FROM (SELECT c.company_name,
typ.transaction_type
FROM companies c
FULL OUTER JOIN transaction_type typ) c_typ_cross_join
LEFT OUTER JOIN transactions t ON ( c_typ_cross_join.company_id = t.company_id
AND c_typ_cross_join.transaction_type = t.transaction_typ)
GROUP BY c_typ_cross_join.company_name,
c_typ_cross_join.transaction_type
This should produce one row for every company for every transaction type and the sum of the related transactions (or 0 if there are no transactions for the combination of companies and transaction types).
You could use two sub-queries one to find all transactions per company based on the existing types the company has, second to find the totals.
SELECT companies.id, all_transactions.transaction, COALESCE(sums.total_amount, 0)
FROM companies
JOIN (SELECT ct.companyid, t.transaction
FROM transactions ct
JOIN transactions t ON t.transactiontype = ct.transactiontype
GROUP BY ct.companyid, t.transaction) all_transactions ON all_transactions.companyid = companies.companyid
LEFT JOIN (SELECT ct.companyid, SUM(t.amount) as total_amount
FROM transactions ct
GROUP BY ct.companyid) sums ON sums.companyid = companies.companyid

Uses of unequal joins

Of all the thousands of queries I've written, I can probably count on one hand the number of times I've used a non-equijoin. e.g.:
SELECT * FROM tbl1 INNER JOIN tbl2 ON tbl1.date > tbl2.date
And most of those instances were probably better solved using another method. Are there any good/clever real-world uses for non-equijoins that you've come across?
Bitmasks come to mind. In one of my jobs, we had permissions for a particular user or group on an "object" (usually corresponding to a form or class in the code) stored in the database. Rather than including a row or column for each particular permission (read, write, read others, write others, etc.), we would typically assign a bit value to each one. From there, we could then join using bitwise operators to get objects with a particular permission.
How about for checking for overlaps?
select ...
from employee_assignments ea1
, employee_assignments ea2
where ea1.emp_id = ea2.emp_id
and ea1.end_date >= ea2.start_date
and ea1.start_date <= ea1.start_date
Whole-day inetervals in date_time fields:
date_time_field >= begin_date and date_time_field < end_date_plus_1
Just found another interesting use of an unequal join on the MCTS 70-433 (SQL Server 2008 Database Development) Training Kit book. Verbatim below.
By combining derived tables with unequal joins, you can calculate a variety of cumulative aggregates. The following query returns a running aggregate of orders for each salesperson (my note - with reference to the ubiquitous AdventureWorks sample db):
select
SH3.SalesPersonID,
SH3.OrderDate,
SH3.DailyTotal,
SUM(SH4.DailyTotal) RunningTotal
from
(select SH1.SalesPersonID, SH1.OrderDate, SUM(SH1.TotalDue) DailyTotal
from Sales.SalesOrderHeader SH1
where SH1.SalesPersonID IS NOT NULL
group by SH1.SalesPersonID, SH1.OrderDate) SH3
join
(select SH1.SalesPersonID, SH1.OrderDate, SUM(SH1.TotalDue) DailyTotal
from Sales.SalesOrderHeader SH1
where SH1.SalesPersonID IS NOT NULL
group by SH1.SalesPersonID, SH1.OrderDate) SH4
on SH3.SalesPersonID = SH4.SalesPersonID AND SH3.OrderDate >= SH4.OrderDate
group by SH3.SalesPersonID, SH3.OrderDate, SH3.DailyTotal
order by SH3.SalesPersonID, SH3.OrderDate
The derived tables are used to combine all orders for salespeople who have more than one order on a single day. The join on SalesPersonID ensures that you are accumulating rows for only a single salesperson. The unequal join allows the aggregate to consider only the rows for a salesperson where the order date is earlier than the order date currently being considered within the result set.
In this particular example, the unequal join is creating a "sliding window" kind of sum on the daily total column in SH4.
Dublicates;
SELECT
*
FROM
table a, (
SELECT
id,
min(rowid)
FROM
table
GROUP BY
id
) b
WHERE
a.id = b.id
and a.rowid > b.rowid;
If you wanted to get all of the products to offer to a customer and don't want to offer them products that they already have:
SELECT
C.customer_id,
P.product_id
FROM
Customers C
INNER JOIN Products P ON
P.product_id NOT IN
(
SELECT
O.product_id
FROM
Orders O
WHERE
O.customer_id = C.customer_id
)
Most often though, when I use a non-equijoin it's because I'm doing some kind of manual fix to data. For example, the business tells me that a person in a user table should be given all access roles that they don't already have, etc.
If you want to do a dirty join of two not really related tables, you can join with a <>.
For example, you could have a Product table and a Customer table. Hypothetically, if you want to show a list of every product with every customer, you could do somthing like this:
SELECT *
FROM Product p
JOIN Customer c on p.SKU <> c.SSN
It can be useful. Be careful, though, because it can create ginormous result sets.

Select based on the number of appearances of an id in another table

I have a table B with cids and cities. I also have a table C that has these cids with extra information. I want to list all the cids in table C that are associated with ALL appearances of a given city in Table B.
My current solution relies on counting the number of times the given city appears in Table B and selecting only the cids that appear that many times. I don't know all the SQL syntax yet, but is there a way to select for this kind of pattern?
My current solution:
SELECT Agents.aid
FROM Agents, Customers, Orders
WHERE (Customers.city='Duluth')
AND (Agents.aid = Orders.aid)
AND (Customers.cid = Orders.cid)
GROUP BY Agents.aid
HAVING count(Agents.aid) > 1
It only works because I know right now with the HAVING statement.
Thanks for the help. I wasn't sure how to google this problem, since it's pretty specific.
EDIT: I'm pinpointing my problem a bit. I need to know how to determine if EVERY row in a table has a certain value for a field. Declaring a variable and counting the rows in a sub-selection and filtering out my results by IDs that appear that many times works, but It's really ugly.
There HAS to be a way to do this without explicitly count()ing rows. I hope.
Not an answer to your question, but a general improvement.
I'd recommend using JOIN syntax to join your tables together.
This would change your query to be:
SELECT Agents.aid
FROM Agents
INNER JOIN Orders
ON Agents.aid = Orders.aid
INNER JOIN Customers
ON Customers.cid = Orders.cid
WHERE Customers.city='Duluth'
GROUP BY Agents.aid
HAVING count(Agents.aid) > 1
What variant of SQL are you using?
To start with, you can (and should) use JOIN instead of doing it in the WHERE clause, e.g.,
select Agents.aid
from Agents
join Orders on Agents.aid = Orders.aid
join Customers on Customers.cid = Orders.cid
where Customers.city = 'Duluth'
group by Agents.aid
having count(Agents.aid) > 1
After that, I'm afraid I might be a little lost. Using the table names in your example query, what (in English, not pseudocode) are you trying to retrieve? For example, I think your sample query is retrieving the PK for all Agents that have been involved in at least 2 Orders involving Customers in Duluth.
Also, some table definitions for Agents, Orders, and Customers might help (then again, they might be irrelevant).
I'm not sure if I understood you problem, but I think the following query is what you want:
SELECT *
FROM customers b
INNER JOIN orders c USING (cid)
WHERE b.city = 'Duluth'
AND NOT EXISTS (SELECT 1
FROM customers b2
WHERE b2.city = b.city
AND b2.cid <> cid);
Probably you will need some indexes on these columns.