Comparing aggregated columns to non aggregated columns to remove matches - sql

I have two tables from two different databases, and I am running a matching check between them.
If the values match, I want them removed from the result set. The first table (A) has multiple rows with the same symbol, and those symbols match the symbol column in the second table (B).
The values in table B, when added up per symbol, should ideally equal the value of one of the matching rows in A.
The tables look like this when queried separately.
Below the tables is my current query. I thought that if I grouped by symbol I could compare the SUM of B's values with the value from A and drop the rows that match. However, because I am summing B and not A, A's value does not count as an aggregated column, so it has to be included in the GROUP BY, and then the sum is no longer calculated the way I want.
How can I write this query so that the values in B are summed per symbol and, when the sum matches the value of any row in A for that symbol, those rows are excluded from the result set?
Table A
| Symbol | Value |
|--------|-------|
| A | 1000 |
| A | 1000 |
| B | 1440 |
| B | 1440 |
| C | 1235 |
Table B
| Symbol | Value |
|--------|-------|
| A | 750 |
| A | 250 |
| B | 24 |
| B | 1416 |
| C | 1874 |
SELECT DBA.A, DBB.B
FROM DatabaseA DBA
INNER JOIN DatabaseB DBB on DBA.Symbol = DBB.Symbol
and DBA.Value != DBB.Value
group by DBA.Symbol, DBB.Symbol, DBB.Value
having SUM(DBB.Value) != DBA.Value
order by Symbol, Value
Edited to add ideal results
Table C
| SymbolB| ValueB| SymbolA | ValueA |
|--------|-------|---------|--------|
| C | 1874 | C | 1235 |
Wherever the B values add up to the A value, remove both. If they don't add up, keep the rows in the result set.

I would use a common table expression (CTE) to total table B per symbol, then join table A and table B on symbol and keep only the rows where A's value differs from the total for the same symbol.
WITH tDBB as (
SELECT DBB.Symbol, SUM(DBB.Value) as total
FROM tableB as DBB
GROUP BY DBB.Symbol
)
SELECT distinct DBB.Symbol as SymbolB, DBB.Value as ValueB, DBA.Symbol as SymbolA, DBA.Value as ValueA
FROM tableA as DBA
INNER JOIN tableB as DBB on DBA.Symbol = DBB.Symbol
INNER JOIN tDBB on tDBB.Symbol = DBA.Symbol
WHERE DBA.Value <> tDBB.total
Result:
|symbolB |valueB |SymbolA |ValueA |
|--------|-------|--------|-------|
| C | 1874 | C | 1235 |
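As a quick check of the sum-then-compare idea, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module. The table names tableA and tableB follow the answer; the CTE name totals and the variable names are just illustrative, and the comparison is done per symbol.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (Symbol TEXT, Value INTEGER);
    CREATE TABLE tableB (Symbol TEXT, Value INTEGER);
    INSERT INTO tableA VALUES ('A',1000),('A',1000),('B',1440),('B',1440),('C',1235);
    INSERT INTO tableB VALUES ('A',750),('A',250),('B',24),('B',1416),('C',1874);
""")

# Total B per symbol, then keep only A/B rows whose A value differs
# from the total of B for the same symbol.
query = """
    WITH totals AS (
        SELECT Symbol, SUM(Value) AS total FROM tableB GROUP BY Symbol
    )
    SELECT DISTINCT b.Symbol, b.Value, a.Symbol, a.Value
    FROM tableA AS a
    JOIN tableB AS b ON b.Symbol = a.Symbol
    JOIN totals AS t ON t.Symbol = a.Symbol
    WHERE a.Value <> t.total
    ORDER BY a.Symbol, b.Value
"""
mismatches = conn.execute(query).fetchall()
print(mismatches)  # [('C', 1874, 'C', 1235)]
```

Symbols A and B drop out because 750 + 250 = 1000 and 24 + 1416 = 1440, which leaves only the C row.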

with t3 as (
select symbol
,sum(value) as value
from t2
group by symbol
)
select *
from t3 join t on t.symbol = t3.symbol and t.value != t3.value
| symbol | value | Symbol | Value |
|--------|-------|--------|-------|
| C | 1874 | C | 1235 |

Related

SQL to search with a subsets of records

Is there any way to select all subsets from table A for which a corresponding subset exists in table B? Each subset in table A must contain at least all the entries of the corresponding subset in table B.
In this link it's called "Division with a Remainder" but my problem is more complex because I've got many to many relation.
Table A
UserName text | File text | AccessLevel int
Table B
AppName text | File text | AccessLevel int
Sample data
Table A
User1 | aaa.txt | 1
User1 | bbb.txt | 3
User1 | ccc.txt | 1
User2 | aaa.txt | 3
Table B
Appl1 | aaa.txt | 1
Appl1 | bbb.txt | 1
Appl2 | aaa.txt | 1
Appl3 | bbb.txt | 5
Appl4 | aaa.txt | 1
Appl4 | bbb.txt | 1
Appl4 | ccc.txt | 1
Appl4 | ddd.txt | 1
Expected results:
User1 | Appl1
User1 | Appl2
User2 | Appl2
User1 has "complete" access to applications Appl1 and Appl2 because he has necessary access to ALL files used by these applications. He doesn't have access to application Appl3 because access level is not high enough. He doesn't have access to application Appl4 because he doesn't have access to file ddd.txt.
Basically I need to compare subsets of records and return all cases where the subset in table A covers the subset in table B, that is, every file of the application is present with an equal or greater access level. Is there any way to do it in SQL?
Any help appreciated.
One method is a join followed by comparing the number of matching records to the number needed for the application.
Assuming no duplicates:
select a.username, b.appname
from a join
(select b.*, count(*) over (partition by b.appname) as cnt
from b
) b
on a.file = b.file and
a.accesslevel >= b.accesslevel
group by a.username, b.appname, b.cnt
having count(*) = b.cnt
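Another option is a double NOT EXISTS: keep a user/application pair only if the application has no file that the user lacks at the required access level.
SELECT distinct A1.USERNAME, B1.APPLICATIONAME
FROM TABLEUSER A1
INNER JOIN TABLEAPPLI B1 on B1.filename=A1.filename and B1.accesslevel<=A1.accesslevel
where not exists
(
select *
from TABLEAPPLI B2
where B2.APPLICATIONAME = B1.APPLICATIONAME
and not exists
(
select *
from TABLEUSER A2
where A2.USERNAME = A1.USERNAME and A2.filename = B2.filename and A2.accesslevel >= B2.accesslevel
)
)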
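To see the counting approach on the sample data, here is a small sketch using Python's sqlite3 module with an in-memory database. The table names a and b and the column name file follow the question; the correlated COUNT in the HAVING clause stands in for the window function so that the sketch does not depend on window-function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (username TEXT, file TEXT, accesslevel INTEGER);
    CREATE TABLE b (appname TEXT, file TEXT, accesslevel INTEGER);
    INSERT INTO a VALUES ('User1','aaa.txt',1),('User1','bbb.txt',3),
                         ('User1','ccc.txt',1),('User2','aaa.txt',3);
    INSERT INTO b VALUES ('Appl1','aaa.txt',1),('Appl1','bbb.txt',1),
                         ('Appl2','aaa.txt',1),('Appl3','bbb.txt',5),
                         ('Appl4','aaa.txt',1),('Appl4','bbb.txt',1),
                         ('Appl4','ccc.txt',1),('Appl4','ddd.txt',1);
""")

# A user qualifies for an application when the number of files they can
# access at a sufficient level equals the number of files the application needs.
query = """
    SELECT a.username, b.appname
    FROM a
    JOIN b ON b.file = a.file AND a.accesslevel >= b.accesslevel
    GROUP BY a.username, b.appname
    HAVING COUNT(*) = (SELECT COUNT(*) FROM b AS need WHERE need.appname = b.appname)
    ORDER BY a.username, b.appname
"""
grants = conn.execute(query).fetchall()
print(grants)  # [('User1', 'Appl1'), ('User1', 'Appl2'), ('User2', 'Appl2')]
```

Like the answer above, this assumes there are no duplicate (user, file) or (application, file) rows; duplicates would inflate the counts.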

Join two tables juxtaposing columns with same name sql

I have two sqlite3 tables with same column names and I want to compare them. To do that, I need to join the tables and juxtapose the columns with same name.
The tables share an identical column which I want to put as the first column.
Let's imagine I have table t1 and table t2
Table t1:
SharedColumn | Height | Weight
A | 2 | 70
B | 10 | 100
Table t2:
SharedColumn | Height | Weight
A | 5 | 25
B | 32 | 30
What I want get as a result of my query is :
SharedColumn | Height_1 | Height_2 | Weight_1 | Weight_2
A | 2 | 5 | 70 | 25
B | 10 | 32 | 100 | 30
In my real case I have a lot of columns, so I would like to avoid writing each column name twice to specify the order.
Renaming the columns is not my main concern, what interests me the most is the juxtaposition of columns with same name.
There is no way to do that directly in SQL, especially because you also want to rename the columns to identify their source. You would have to use dynamic SQL, and honestly? Don't!
Simply write the column names. Most SQL tools provide a way to generate the select, so just copy the names and put them in the correct places:
SELECT t1.sharedColumn,t1.height as height_1,t2.height as height_2 ...
FROM t1
JOIN t2 ON(t1.sharedColumn = t2.sharedColumn)
Try the following query to get the desired result:
SELECT t1.sharedColumn AS SharedColumn,
t1.Height AS Height_1, t2.Height AS Height_2,
t1.Weight AS Weight_1, t2.Weight AS Weight_2
FROM t1 INNER JOIN t2
ON t1.sharedColumn = t2.sharedColumn
ORDER By t1.sharedColumn ASC
After that, you can fetch the result with the following lines:
$result['SharedColumn'];
$result['Height_1'];
$result['Height_2'];
$result['Weight_1'];
$result['Weight_2'];
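If typing every column twice really is impractical, the "dynamic SQL" route can be kept small by generating the column list in the host language. The following is only a sketch, assuming Python's sqlite3 module and that both tables have the same columns; PRAGMA table_info is SQLite's way of listing a table's columns, and the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (SharedColumn TEXT, Height INTEGER, Weight INTEGER);
    CREATE TABLE t2 (SharedColumn TEXT, Height INTEGER, Weight INTEGER);
    INSERT INTO t1 VALUES ('A', 2, 70), ('B', 10, 100);
    INSERT INTO t2 VALUES ('A', 5, 25), ('B', 32, 30);
""")

key = "SharedColumn"
# PRAGMA table_info returns one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(t1)") if row[1] != key]

# Interleave t1.col and t2.col so same-named columns sit next to each other.
pairs = ", ".join(f't1."{c}" AS "{c}_1", t2."{c}" AS "{c}_2"' for c in columns)
query = (
    f'SELECT t1."{key}" AS "{key}", {pairs} '
    f'FROM t1 JOIN t2 ON t1."{key}" = t2."{key}" ORDER BY t1."{key}"'
)

cursor = conn.execute(query)
header = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(header)  # ['SharedColumn', 'Height_1', 'Height_2', 'Weight_1', 'Weight_2']
print(rows)    # [('A', 2, 5, 70, 25), ('B', 10, 32, 100, 30)]
```

The column names come from the database's own schema rather than from user input, which is what makes building the query string this way tolerable here.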

SQL/PostgreSQL: How to select limited amount of rows of different types based on limits stored in a different table?

I have a table (table 1) where the first column is the key and the second column contains elements of different types. In table 1 there are three types (A, B, C), but the actual database has many more.
Table.1. A minimal example.
_________________
| | |
|_KEY| attribute |
|____|___________|
|k1 | A |
|k2 | A |
|k3 | B |
|k4 | C |
|k5 | C |
|____|___________|
From table 1, I want to retrieve only a limited number of elements of each type. The limit for a given type is given by table 2, where the element type is the key of the table (_element).
To clarify: in this minimal example, the number of type A elements to retrieve from table 1 is 1. Likewise, for type B it is 2 and for type C it is 1.
Table 2. Limits of item to obtain for each type in table 1.
____________________
| _Element | Limit |
|----------|-------|
| A | 1 |
| B | 2 |
| C | 1 |
|__________|_______|
Finally, the elements should be retrieved from table 1 from top to bottom.
Thanks for any help and/or pointers / gus.
P.S.
For the above minimal example, the expected output would be
___________________
| Key| Attribute |
|____|____________|
| k1 | A |
| k3 | B |
| K4 | C |
|____|____________|
This is because only one element of type B exists in this minimal example, even though its limit is 2. Note that if there were, say, 5 elements of type C and the limit for C were 2, the following table would be obtained instead:
___________________
| Key| Attribute |
|____|____________|
| k1 | A |
| k3 | B |
| K4 | C |
|_k5 | C |
|____|____________|
You can always do it with a union (TOP is SQL Server syntax; PostgreSQL uses LIMIT instead).
select top (SELECT Limit FROM Table2 WHERE _Element='A') * from Table1
WHERE attribute = 'A'
UNION ALL
select top (SELECT Limit FROM Table2 WHERE _Element='B') * from Table1
WHERE attribute = 'B'
UNION ALL
select top (SELECT Limit FROM Table2 WHERE _Element='C') * from Table1
WHERE attribute = 'C'
Or using row_number:
with cte as (SELECT _Key,
attribute,
ROW_NUMBER() OVER (Partition by attribute Order by _Key ASC) as rowno
From Table1)
SELECT cte._Key, cte.attribute FROM cte
INNER JOIN Table2 on Table2._Element = cte.attribute
WHERE cte.rowno <= Table2.Limit
I truly like the power of PostgreSQL arrays. So
select
table2._element,
unnest((array_agg(table1._key order by table1._key desc))[1:table2.limit]) as _key
from
table1 join table2 on (table1.attribute = table2._element)
group by
table2._element, table2.limit
where, in the second field of the query:
array_agg(table1._key order by table1._key desc) - collects values into an array in the specified order (note that order by table1._key desc is just an example; you might skip it or specify another ordering),
(...)[1:table2.limit] - returns array elements from 1 to table2.limit,
unnest(...) - unwraps the previous result into rows.
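The row-numbering idea can be tried on the minimal example with the sketch below. It uses Python's sqlite3 module rather than PostgreSQL, so it assumes an SQLite build with window functions (3.25 or later); the CTE name ranked is illustrative, and the limit column is quoted because LIMIT is a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (_key TEXT PRIMARY KEY, attribute TEXT);
    CREATE TABLE table2 (_element TEXT PRIMARY KEY, "limit" INTEGER);
    INSERT INTO table1 VALUES ('k1','A'),('k2','A'),('k3','B'),('k4','C'),('k5','C');
    INSERT INTO table2 VALUES ('A',1),('B',2),('C',1);
""")

# Number the rows inside each attribute group from top to bottom,
# then keep only the rows whose number is within that group's limit.
query = """
    WITH ranked AS (
        SELECT _key, attribute,
               ROW_NUMBER() OVER (PARTITION BY attribute ORDER BY _key) AS rowno
        FROM table1
    )
    SELECT ranked._key, ranked.attribute
    FROM ranked
    JOIN table2 ON table2._element = ranked.attribute
    WHERE ranked.rowno <= table2."limit"
    ORDER BY ranked._key
"""
limited = conn.execute(query).fetchall()
print(limited)  # [('k1', 'A'), ('k3', 'B'), ('k4', 'C')]
```

Type B returns a single row even though its limit is 2, because only k3 exists for that type.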

How to join table A to B to find all the entries in A whose intervals contain entries in B?

I am comparing two tables A and B of log entry intervals, i.e. each record in a table is the first log entry and then another, for a given user, like this:
+--------+----------+--------+-----------+--------+
| userID | date1 | logID1 | date2 | logID2 |
+--------+----------+--------+-----------+--------+
| 235 | 1/3/2013 | 45 | 1/7/2013 | 48 |
| 235 | 4/6/2013 | 64 | 4/12/2013 | 73 |
| 462 | 1/4/2013 | 40 | 1/16/2013 | 50 |
+--------+----------+--------+-----------+--------+
I want to build a join query that links every record in A to all the records in B with the same userID, where the A interval contains the B interval by either dates:
a.date1<=b.date1, a.date2>=b.date2
or IDs:
a.logID1<=b.logID1, a.logID2>=b.logID2
I want to return all records in A, regardless of whether or not there is a contained interval in B.
At first glance it seems like this would work:
select * from a
left join b
on
a.userID=b.userID
where
(a.date1<=b.date1 or a.logID1<=b.logID1)
and
(a.date2>=b.date2 or a.logID2>=b.logID2)
or
b.userID is null
But the problem is that if there is a record in A that has a matching userID in B but the A record does not contain the B record, the join will occur but the record will be filtered out by the WHERE condition, so the A record will not appear in the results.
If I try to resolve this by moving the WHERE conditions to the JOIN clause as follows:
select * from a
left join b
on
a.userID=b.userID
and
(a.date1<=b.date1 or a.logID1<=b.logID1)
and
(a.date2>=b.date2 or a.logID2>=b.logID2)
then I get this error message:
JOIN expression not supported.
I assume this means Access can't have nested OR conditions in the JOIN criteria.
What can I do to return a list of A, joined to their contained B where applicable?
If you want all records in a, then put the condition in the on clause. where has strange effects. So try:
select *
from a left join
b
on a.userID = b.userID and
(a.date1<=b.date1 or a.logID1<=b.logID1) and
(a.date2>=b.date2 or a.logID2>=b.logID2);
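The difference between filtering in WHERE and filtering in ON can be seen in the sketch below. It uses Python's sqlite3 module, not Access, so it only illustrates the join semantics and says nothing about which expressions Access accepts. The rows of a follow the sample table (with dates rewritten as ISO strings so they compare correctly); the rows of b are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (userID INTEGER, date1 TEXT, logID1 INTEGER, date2 TEXT, logID2 INTEGER);
    CREATE TABLE b (userID INTEGER, date1 TEXT, logID1 INTEGER, date2 TEXT, logID2 INTEGER);
    INSERT INTO a VALUES (235,'2013-01-03',45,'2013-01-07',48),
                         (235,'2013-04-06',64,'2013-04-12',73),
                         (462,'2013-01-04',40,'2013-01-16',50);
    -- Hypothetical b rows: the first is contained in a's first interval,
    -- the second is not contained in any interval of user 462.
    INSERT INTO b VALUES (235,'2013-01-04',46,'2013-01-06',47),
                         (462,'2013-02-01',60,'2013-02-05',65);
""")

contains = """(a.date1 <= b.date1 OR a.logID1 <= b.logID1)
          AND (a.date2 >= b.date2 OR a.logID2 >= b.logID2)"""

# Containment test inside ON: unmatched rows of a survive with NULLs for b.
on_rows = conn.execute(f"""
    SELECT a.userID, a.logID1, b.logID1
    FROM a LEFT JOIN b ON a.userID = b.userID AND {contains}
    ORDER BY a.userID, a.logID1
""").fetchall()

# Containment test inside WHERE: rows of a that joined on userID but do not
# contain the b interval are filtered out.
where_rows = conn.execute(f"""
    SELECT a.userID, a.logID1, b.logID1
    FROM a LEFT JOIN b ON a.userID = b.userID
    WHERE ({contains}) OR b.userID IS NULL
    ORDER BY a.userID, a.logID1
""").fetchall()

print(on_rows)     # [(235, 45, 46), (235, 64, None), (462, 40, None)]
print(where_rows)  # [(235, 45, 46)]
```

With the test in ON, all three rows of a are returned; with the test in WHERE, two of them disappear, which is the behaviour described in the question.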

Select multiple (non-aggregate function) columns with GROUP BY

I am trying to select the max value from one column, while grouping by another non-unique id column which has multiple duplicate values. The original database looks something like:
mukey | comppct_r | name | type
65789 | 20 | a | 7n
65789 | 15 | b | 8m
65789 | 1 | c | 1o
65790 | 10 | a | 7n
65790 | 26 | b | 8m
65790 | 5 | c | 1o
...
This works just fine using:
SELECT c.mukey, Max(c.comppct_r) AS ComponentPercent
FROM c
GROUP BY c.mukey;
Which returns a table like:
mukey | ComponentPercent
65789 | 20
65790 | 26
65791 | 50
65792 | 90
I want to be able to add other columns in without affecting the GROUP BY function, to include columns like name and type into the output table like:
mukey | comppct_r | name | type
65789 | 20 | a | 7n
65790 | 26 | b | 8m
65791 | 50 | c | 7n
65792 | 90 | d | 7n
but it always outputs an error saying I need to use an aggregate function with select statement. How should I go about doing this?
You have yourself a greatest-n-per-group problem. This is one of the possible solutions:
select c.mukey, c.comppct_r, c.name, c.type
from c
inner join(
select c.mukey, max(c.comppct_r) comppct_r
from c
group by c.mukey
) ss on c.mukey = ss.mukey and c.comppct_r= ss.comppct_r
Another possible approach, same output:
select c1.*
from c c1
left outer join c c2
on (c1.mukey = c2.mukey and c1.comppct_r < c2.comppct_r)
where c2.mukey is null;
There's a comprehensive and explanatory answer on the topic here: SQL Select only rows with Max Value on a Column
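Both approaches can be checked against the sample rows from the question. This is a minimal sketch using Python's sqlite3 module with an in-memory table c; the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE c (mukey INTEGER, comppct_r INTEGER, name TEXT, type TEXT);
    INSERT INTO c VALUES (65789,20,'a','7n'),(65789,15,'b','8m'),(65789,1,'c','1o'),
                         (65790,10,'a','7n'),(65790,26,'b','8m'),(65790,5,'c','1o');
""")

# Approach 1: join each row to the per-mukey maximum and keep the matches.
join_query = """
    SELECT c.mukey, c.comppct_r, c.name, c.type
    FROM c
    JOIN (SELECT mukey, MAX(comppct_r) AS comppct_r FROM c GROUP BY mukey) AS ss
      ON c.mukey = ss.mukey AND c.comppct_r = ss.comppct_r
    ORDER BY c.mukey
"""

# Approach 2: anti-join, keeping rows for which no larger comppct_r exists.
anti_join_query = """
    SELECT c1.mukey, c1.comppct_r, c1.name, c1.type
    FROM c AS c1
    LEFT JOIN c AS c2 ON c1.mukey = c2.mukey AND c1.comppct_r < c2.comppct_r
    WHERE c2.mukey IS NULL
    ORDER BY c1.mukey
"""

by_join = conn.execute(join_query).fetchall()
by_anti_join = conn.execute(anti_join_query).fetchall()
print(by_join)       # [(65789, 20, 'a', '7n'), (65790, 26, 'b', '8m')]
print(by_anti_join)  # [(65789, 20, 'a', '7n'), (65790, 26, 'b', '8m')]
```

Both queries return every row that ties for the maximum within a mukey, so a group with two rows at the same top percentage would produce two rows.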
Any non-aggregate column has to appear in the GROUP BY clause. Why?
t1
x1 y1 z1
1 2 5
2 2 7
Now you are trying to write a query like:
select x1,y1,max(z1) from t1 group by y1;
This query returns only one row, but what should the value of x1 be? That is undefined, so SQL rejects the query.
Coming to the point, you can either choose an aggregate function for x1 or add x1 to the GROUP BY. Which one is right depends on your requirement.
If you want all rows, with the aggregation on z1 grouped by y1, you can use a subquery approach.
Select x1,y1,(select max(z1) from t1 where tt.y1=y1 group by y1)
from t1 tt;
This will produce a result like:
t1
x1 y1 max(z1)
1 2 7
2 2 7
Try using a virtual table as follows:
SELECT vt.*,c.name FROM(
SELECT c.mukey, Max(c.comppct_r) AS ComponentPercent
FROM c
GROUP BY c.mukey
) as VT, c
WHERE VT.mukey = c.mukey
AND VT.ComponentPercent = c.comppct_r
You can't just add additional columns without adding them to the GROUP BY or applying an aggregate function. The reason for that is, that the values of a column can be different inside one group. For example, you could have two rows:
mukey | comppct_r | name | type
65789 | 20 | a | 7n
65789 | 20 | b | 9f
What should the aggregated group look like for the columns name and type?
If name and type are always the same inside a group, just add them to the select list and the GROUP BY clause:
SELECT c.mukey, c.name, c.type, Max(c.comppct_r) AS ComponentPercent
FROM c
GROUP BY c.mukey, c.name, c.type;
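Use a correlated subquery in the WHERE clause (a HAVING clause cannot compare the non-grouped column comppct_r with its group maximum):
SELECT *
FROM c
WHERE c.comppct_r = (SELECT Max(c2.comppct_r) FROM c c2 WHERE c2.mukey = c.mukey);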