Selecting records from Foreign Key table where multiple items match the condition - sql

I receive data from a third party as an MS Access 2000 file containing two tables, Papers and PaperTags.
Below is sample data from these tables.
Papers table and sample data
+----+----------+
| ID | PaperID |
+----+----------+
| 1 | 658 |
| 2 | 659 |
| 3 | 660 |
| 4 | 661 |
| 5 | 662 |
| 6 | 663 |
| 7 | 664 |
+----+----------+
PaperTags table and sample data
+----+----------+----------------------------------------+
| ID | PaperID | TagID |
+----+----------+----------------------------------------+
| 1 | 663 | 3 |
| 2 | 663 | 15 --Y |
| 3 | 663 | 17 |
| 4 | 663 | 18 --Y |
| 5 | 664 | 14 |
| 62 | 658 | 9 |
| 63 | 658 | 14 |
| 64 | 658 | 17 |
| 65 | 659 | 15 --Y |
| 66 | 659 | 17 |
| 67 | 659 | 18 --Y |
| 68 | 660 | 17 |
| 69 | 660 | 18 --N as it has only 18 and not 15 |
| 70 | 661 | 10 |
| 71 | 661 | 17 |
| 72 | 661 | 18 --N as it has only 18 and not 15 |
| 73 | 662 | 18 --N as it has only 18 and not 15 |
| 74 | 662 | 14 |
| 75 | 662 | 17 |
| 76 | 662 | 18 --N as it has only 18 and not 15 |
+----+----------+----------------------------------------+
My end user will pass one or more TagIDs, for example 15 and 18. My goal is to find all the PaperIDs that have all of these TagIDs. In this example I need to return 663 and 659.
I have tried the query below, but it breaks if there is any glitch in the data. For example, PaperID 662 appears twice in the table with the same TagID (18), so count(PaperID) = 2 evaluates to true and it ends up in my result.
select Count(PaperID), PaperID from PaperTags
group by TagID, PaperID
having TagID = 15 or TagID = 18
and count(PaperID) = 2
The other query that I tried is
select * from Papers
where Papers.PaperID
in
(
select PaperTags.PaperID from PaperTags
where (PaperTags.Tagid = 15 or PaperTags.Tagid = 18)
and PaperTags.PaperID = Papers.PaperID
)
I have gone through the article below, but as I am using MS Access I can't use that approach.
Select records from a table where all other records with same foreign key have a certain value
I think there has to be a better way to filter. Any help is much appreciated.

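Something like this (the inner query collapses duplicate PaperID/TagID rows, so the outer count only sees distinct tags; the 2 in the HAVING clause is the number of TagIDs passed in):
SELECT G.PaperID
FROM (
SELECT PT.PaperID, PT.TagID
FROM PaperTags AS PT
WHERE PT.TagID IN (15, 18)
GROUP BY PT.PaperID, PT.TagID
) AS G
GROUP BY G.PaperID
HAVING Count(*) = 2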
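To check the dedupe-then-count idea against the sample data from the question, here is a small sketch that runs the same query shape in SQLite through Python's sqlite3 module. SQLite is only a stand-in for Access here, and the helper name papers_with_all_tags is made up for illustration; the count in the HAVING clause is derived from the number of distinct TagIDs passed in.

```python
import sqlite3

# Rows are (ID, PaperID, TagID), copied from the question's sample data.
PAPER_TAGS = [
    (1, 663, 3), (2, 663, 15), (3, 663, 17), (4, 663, 18),
    (5, 664, 14),
    (62, 658, 9), (63, 658, 14), (64, 658, 17),
    (65, 659, 15), (66, 659, 17), (67, 659, 18),
    (68, 660, 17), (69, 660, 18),
    (70, 661, 10), (71, 661, 17), (72, 661, 18),
    (73, 662, 18), (74, 662, 14), (75, 662, 17), (76, 662, 18),
]


def papers_with_all_tags(conn, tag_ids):
    """Return PaperIDs that carry every tag in tag_ids, ignoring duplicate rows."""
    wanted = sorted(set(tag_ids))
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT G.PaperID
        FROM (
            -- Collapse duplicate (PaperID, TagID) pairs first.
            SELECT PT.PaperID, PT.TagID
            FROM PaperTags AS PT
            WHERE PT.TagID IN ({placeholders})
            GROUP BY PT.PaperID, PT.TagID
        ) AS G
        GROUP BY G.PaperID
        HAVING COUNT(*) = ?
        ORDER BY G.PaperID
    """
    return [row[0] for row in conn.execute(sql, [*wanted, len(wanted)])]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PaperTags (ID INTEGER, PaperID INTEGER, TagID INTEGER)")
conn.executemany("INSERT INTO PaperTags VALUES (?, ?, ?)", PAPER_TAGS)

matching = papers_with_all_tags(conn, [15, 18])
print(matching)  # [659, 663]
```

PaperID 662 is not returned even though it has two rows for TagID 18, because the inner GROUP BY reduces them to one.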


How to reshape a table having multiple records for the same id into a table with one record per id without losing information?

Basically, I want to transform this(Initial) into this(Final). In other words, I want to
"squash" the initial table so that it will have only one record per id
"dilate" the initial table so that I won't lose any information: create a different column for every possible combination of source and column from the initial table (create c1_A, c1_B, ...).
I can work with the initial table as a csv in Python (maybe Pandas) and manually hardcode the mapping between the Initial and the Final table. However, I don't find this solution elegant at all and I'm much more interested in a sql / sas solution. Is there any way of doing that?
Edit: I want to change
+----+--------+------+-----+------+
| ID | source | c1 | c2 | c3 |
+----+--------+------+-----+------+
| 1 | A | 432 | 56 | 1 |
| 1 | B | 53 | 3 | 73 |
| 1 | C | 7 | 342 | 83 |
| 1 | D | 543 | 43 | 73 |
| 2 | A | 8 | 882 | 39 |
| 2 | B | 5 | 54 | 46 |
| 2 | C | 8 | 3 | 2226 |
| 2 | D | 87 | 2 | 45 |
| 3 | A | 93 | 143 | 45 |
| 3 | B | 1023 | 72 | 8 |
| 3 | C | 3 | 3 | 704 |
| 4 | A | 2 | 5 | 0 |
| 4 | B | 78 | 888 | 2 |
| 4 | C | 87 | 23 | 34 |
| 4 | D | 112 | 7 | 712 |
+----+--------+------+-----+------+
into
+----+------+------+------+------+------+------+------+------+------+------+------+------+
| ID | c1_A | c1_B | c1_C | c1_D | c2_A | c2_B | c2_C | c2_D | c3_A | c3_B | c3_C | c3_D |
+----+------+------+------+------+------+------+------+------+------+------+------+------+
| 1 | 432 | 53 | 7 | 543 | 56 | 3 | 342 | 43 | 1 | 73 | 83 | 73 |
| 2 | 8 | 5 | 8 | 87 | 882 | 54 | 3 | 2 | 39 | 46 | 2226 | 45 |
| 3 | 93 | 1023 | 3 | | 143 | 72 | 3 | | 45 | 8 | 704 | |
| 4 | 2 | 78 | 87 | 112 | 5 | 888 | 23 | 7 | 0 | 2 | 34 | 712 |
+----+------+------+------+------+------+------+------+------+------+------+------+------+
Abandon hope ... ?
data want;
input ID source $ c1 c2 c3;
datalines;
1 A 432 56 1
1 B 53 3 73
1 C 7 342 83
1 D 543 43 73
2 A 8 882 39
2 B 5 54 46
2 C 8 3 2226
2 D 87 2 45
3 A 93 143 45
3 B 1023 72 8
3 C 3 3 704
4 A 2 5 0
4 B 78 888 2
4 C 87 23 34
4 D 112 7 712
;
* one to grow you oh data;
proc transpose data=want out=stage1;
by id source;
var c1-c3;
run;
* and one to shrink;
proc transpose data=stage1 out=want(drop=_name_) delim=_;
by id;
id _name_ source;
run;
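Since the question also asked about a SQL route, here is a rough sketch of the same reshaping with conditional aggregation, run in SQLite through Python's sqlite3 module. This is a different technique from the double PROC TRANSPOSE above, not a translation of it. The table name initial is an assumption, only IDs 1 and 3 from the sample are loaded to keep it short, and the source values are assumed to be plain alphanumeric labels because they are spliced into column names.

```python
import sqlite3

# (ID, source, c1, c2, c3): IDs 1 and 3 from the question's sample; ID 3 has no source D.
ROWS = [
    (1, "A", 432, 56, 1), (1, "B", 53, 3, 73),
    (1, "C", 7, 342, 83), (1, "D", 543, 43, 73),
    (3, "A", 93, 143, 45), (3, "B", 1023, 72, 8), (3, "C", 3, 3, 704),
]
VALUE_COLUMNS = ["c1", "c2", "c3"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE initial (ID INTEGER, source TEXT, c1 INTEGER, c2 INTEGER, c3 INTEGER)"
)
conn.executemany("INSERT INTO initial VALUES (?, ?, ?, ?, ?)", ROWS)

# Discover the source labels instead of hardcoding the mapping.
sources = [r[0] for r in conn.execute("SELECT DISTINCT source FROM initial ORDER BY source")]
if not all(src.isalnum() for src in sources):
    raise ValueError("source labels must be alphanumeric to be used in column names")

# One MAX(CASE ...) per (column, source) pair; a missing pair comes out as NULL.
select_parts = [
    f"MAX(CASE WHEN source = '{src}' THEN {col} END) AS {col}_{src}"
    for col in VALUE_COLUMNS
    for src in sources
]
sql = "SELECT ID, " + ", ".join(select_parts) + " FROM initial GROUP BY ID ORDER BY ID"

cursor = conn.execute(sql)
header = [d[0] for d in cursor.description]
wide = cursor.fetchall()

print(header)
for row in wide:
    print(row)
```

The generated statement has one output column per combination, in the order c1_A, c1_B, ..., c3_D, and the cells for ID 3 and source D come back as NULL (None in Python).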

Get RMSE score while fetching data from the table directly (write a query for that)

I have a table in the database with many features. Each feature has its own actual and predicted value columns, and there are two more columns, Id_partner and Id_accounts. My main goal is to get the RMSE score for each feature, for each account in each partner. I have done that with a for loop in PySpark, but it takes a very long time to complete. Is there an efficient way to do this directly in the query that reads the data, so that I get the RMSE score for each account in each partner?
My table is something like this:
Actual_Feature_1 = Act_F_1
Predicted_Feature_1 = Pred_F_1
Actual_Feature_2 = Act_F_2
Predicted_Feature_2 = Pred_F_2
Table 1:
ID_PARTNER | ID_ACCOUNT | Act_F_1 | Pred_F_1 | Act_F_2 | Pred_F_2 |
4 | 24 | 10 | 12 | 22 | 20 |
4 | 24 | 11 | 13 | 23 | 21 |
4 | 24 | 11 | 12 | 24 | 23 |
4 | 25 | 13 | 15 | 22 | 20 |
4 | 25 | 15 | 12 | 21 | 20 |
4 | 25 | 15 | 14 | 21 | 21 |
4 | 27 | 13 | 12 | 35 | 32 |
4 | 27 | 12 | 16 | 34 | 31 |
4 | 27 | 17 | 14 | 36 | 34 |
5 | 301 | 19 | 17 | 56 | 54 |
5 | 301 | 21 | 20 | 58 | 54 |
5 | 301 | 22 | 19 | 59 | 57 |
5 | 301 | 24 | 22 | 46 | 50 |
5 | 301 | 25 | 22 | 49 | 54 |
5 | 350 | 12 | 10 | 67 | 66 |
5 | 350 | 12 | 11 | 65 | 64 |
5 | 350 | 14 | 13 | 68 | 67 |
5 | 350 | 15 | 12 | 61 | 61 |
5 | 350 | 12 | 10 | 63 | 60 |
7 | 420 | 51 | 49 | 30 | 29 |
7 | 420 | 51 | 48 | 32 | 30 |
7 | 410 | 49 | 45 | 81 | 79 |
7 | 410 | 48 | 44 | 83 | 80 |
7 | 410 | 45 | 43 | 84 | 81 |
I need the RMSE score for each account in each partner, in this format:
Resulted Table :
ID_PARTNER | ID_ACCOUNT | FEATURE_1 | FEATURE_2 |
4 | 24 | rmse_score | rmse_score |
4 | 25 | rmse_score | rmse_score |
4 | 27 | rmse_score | rmse_score |
5 | 301 | rmse_score | rmse_score |
5 | 350 | rmse_score | rmse_score |
7 | 420 | rmse_score | rmse_score |
7 | 410 | rmse_score | rmse_score |
Note: the grouping has to consider both id_partner and id_account. In the table above, id_account alone would be enough to compute the RMSE, but in general different partners can have the same account IDs.
I need an SQL query that provides the resulted table directly while reading the table from the database.
Yes, you can calculate the root-mean-square error in SQL, with one expression per feature:
SELECT ID_PARTNER, ID_ACCOUNT
, SQRT(AVG( POWER(Act_F_1 - Pred_F_1 , 2) ) ) as feature_1_rmse
, SQRT(AVG( POWER(Act_F_2 - Pred_F_2 , 2) ) ) as feature_2_rmse
FROM ...
GROUP BY ID_PARTNER, ID_ACCOUNT
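As a sanity check of the grouped RMSE idea, here is a small sketch using Python's sqlite3 module with the first six rows of the sample table. The table name predictions and the FEATURES list are assumptions for illustration. Because not every SQLite build ships SQRT and POWER, this version lets SQL compute the mean squared error per group and takes the square root in Python; on a database that has those functions the whole expression can stay in the query as shown above.

```python
import math
import sqlite3

# (ID_PARTNER, ID_ACCOUNT, Act_F_1, Pred_F_1, Act_F_2, Pred_F_2): first six sample rows.
ROWS = [
    (4, 24, 10, 12, 22, 20),
    (4, 24, 11, 13, 23, 21),
    (4, 24, 11, 12, 24, 23),
    (4, 25, 13, 15, 22, 20),
    (4, 25, 15, 12, 21, 20),
    (4, 25, 15, 14, 21, 21),
]
FEATURES = [1, 2]  # feature numbers used in the Act_F_n / Pred_F_n column names

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predictions ("
    "ID_PARTNER INTEGER, ID_ACCOUNT INTEGER, "
    "Act_F_1 INTEGER, Pred_F_1 INTEGER, Act_F_2 INTEGER, Pred_F_2 INTEGER)"
)
conn.executemany("INSERT INTO predictions VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Build one mean-squared-error expression per feature instead of looping over groups.
mse_columns = ", ".join(
    f"AVG((Act_F_{n} - Pred_F_{n}) * (Act_F_{n} - Pred_F_{n})) AS mse_{n}"
    for n in FEATURES
)
sql = (
    f"SELECT ID_PARTNER, ID_ACCOUNT, {mse_columns} "
    "FROM predictions "
    "GROUP BY ID_PARTNER, ID_ACCOUNT "
    "ORDER BY ID_PARTNER, ID_ACCOUNT"
)

# Square root of each per-group mean gives the RMSE.
rmse_rows = [
    (partner, account, *(math.sqrt(mse) for mse in mses))
    for partner, account, *mses in conn.execute(sql)
]

for row in rmse_rows:
    print(row)
```

For partner 4, account 24 the squared errors of feature 1 are 4, 4 and 1, so the RMSE is the square root of 3.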

CTE - recursive query doing too much

I have the current table of data...
| LoanRollupID | NewLoanID | PreviousLoanID |
|--------------|-----------|----------------|
| 11 | 76 | 44 |
| 12 | 80 | 75 |
| 13 | 83 | 82 |
| 14 | 84 | 83 |
| 15 | 86 | 85 |
| 16 | 87 | 54 |
| 17 | 88 | 87 |
| 18 | 90 | 48 |
| 19 | 91 | 34 |
| 20 | 93 | 41 |
| 21 | 94 | 76 |
| 22 | 95 | 90 |
| 23 | 96 | 94 |
| 24 | 100 | 92 |
| 25 | 101 | 99 |
| 26 | 102 | 98 |
| 27 | 103 | 101 |
| 28 | 104 | 81 |
| 29 | 105 | 80 |
| 30 | 107 | 52 |
| 31 | 110 | 108 |
| 1029 | 1105 | 103 |
| 1030 | 1106 | 104 |
| 1031 | 1108 | 1106 |
| 1032 | 1109 | 73 |
I'm trying to jump in at NewLoanID 1108 and see how it has evolved from previous Loans. e.g 1108 came from 1106, which came from 104, which came from 81, etc.
When I run this query:
WITH OldLoans (PreviousLoanID, NewLoanID, start)
AS
(
---- Anchor member definition
SELECT l.NewLoanID, l.PreviousLoanID, 0 as start
FROM dscs_public.LoanRollup l
Where NewLoanID = 1108
UNION ALL
-- Recursive member definition
SELECT l.NewLoanID, l.PreviousLoanID, start + 1
FROM dscs_public.LoanRollup l
INNER JOIN OldLoans AS o
ON o.NewLoanID = l.PreviousLoanID
)
---- Statement that executes the CTE
SELECT PreviousLoanID, NewLoanID, start
FROM OldLoans
It fails with this error:
The statement terminated. The maximum recursion 100 has been exhausted
before statement completion.
Can anyone spot my mistake please?
Thanks.
The aliases in the CTE definition are in the wrong order:
-- Instead of (PreviousLoanID, NewLoanID, start)
WITH OldLoans (NewLoanID, PreviousLoanID, start)
AS
(
---- Anchor member definition
SELECT l.NewLoanID, l.PreviousLoanID, 0 as start
FROM mytable l --LoanRollup l
Where NewLoanID = 1108
UNION ALL
-- Recursive member definition
SELECT l.NewLoanID, l.PreviousLoanID, start + 1
FROM mytable l --dscs_public.LoanRollup l
INNER JOIN OldLoans AS o
-- Instead of o.NewLoanID = l.PreviousLoanID
ON l.NewLoanID = o.PreviousLoanID
)
---- Statement that executes the CTE
SELECT PreviousLoanID, NewLoanID, start
FROM OldLoans
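The same thing holds for the ON clause in the recursive member definition. With the original column list and join condition, the anchor row (1108, 1106) joins back to itself on every step, so the recursion never terminates and hits the limit of 100.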
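To see the corrected join direction terminate, here is a short sketch that runs the fixed CTE in SQLite through Python's sqlite3 module on a handful of rows from the question. SQLite needs the RECURSIVE keyword, the table is created without the dscs_public schema, and the counter column is called depth instead of start; those three points are assumptions of this sketch, not part of the original query.

```python
import sqlite3

# (LoanRollupID, NewLoanID, PreviousLoanID): a subset of the question's rows.
LOAN_ROLLUP = [
    (11, 76, 44),
    (21, 94, 76),
    (28, 104, 81),
    (1030, 1106, 104),
    (1031, 1108, 1106),
    (1032, 1109, 73),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LoanRollup (LoanRollupID INTEGER, NewLoanID INTEGER, PreviousLoanID INTEGER)"
)
conn.executemany("INSERT INTO LoanRollup VALUES (?, ?, ?)", LOAN_ROLLUP)

sql = """
WITH RECURSIVE OldLoans (NewLoanID, PreviousLoanID, depth) AS (
    -- Anchor member: the loan we start from.
    SELECT l.NewLoanID, l.PreviousLoanID, 0
    FROM LoanRollup AS l
    WHERE l.NewLoanID = ?
    UNION ALL
    -- Recursive member: step to the row whose NewLoanID is our PreviousLoanID.
    SELECT l.NewLoanID, l.PreviousLoanID, o.depth + 1
    FROM LoanRollup AS l
    INNER JOIN OldLoans AS o
        ON l.NewLoanID = o.PreviousLoanID
)
SELECT NewLoanID, PreviousLoanID, depth
FROM OldLoans
ORDER BY depth
"""

chain = conn.execute(sql, (1108,)).fetchall()
for row in chain:
    print(row)
```

The walk stops at loan 81 because no row has 81 as its NewLoanID, which matches the chain described in the question (1108 from 1106, from 104, from 81).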

Select rows with greatest value

I have a MS Access query called qryA380 that uses multiple INNER JOIN to join a couple of tables.
Running the query will show the results like this:
+----+-----------+----------+------------+
| ID | Aircraft | Route.ID | Passengers |
+----+-----------+----------+------------+
| 23 | A-380 | 1 | 556 |
| 2 | A-380 | 2 | 652 |
| 54 | A-380 | 2 | 489 |
| 16 | A-380 | 1 | 598 |
| 39 | A-380 | 1 | 627 |
| 45 | A-380 | 3 | 392 |
| 74 | A-380 | 3 | 726 |
+----+-----------+----------+------------+
My plan is to select the smallest Route.ID (in this case it's 1) and the final result should be:
+----+-----------+----------+------------+
| ID | Aircraft | MinRoute | Passengers |
+----+-----------+----------+------------+
| 23 | A-380 | 1 | 556 |
| 16 | A-380 | 1 | 598 |
| 39 | A-380 | 1 | 627 |
+----+-----------+----------+------------+
I thought this would be straightforward and simple. To save some time, I created a second query to do this work:
SELECT [qryA380].ID, [qryA380].Aircraft, MIN([qryA380].Route.ID) AS MinRoute, [qryA380].Passengers
FROM [qryA380]
GROUP BY [qryA380].ID, [qryA380].Aircraft, [qryA380].Passengers
But I kept getting a table identical to the one generated by qryA380, with every Route.ID in the results.
Grouping by ID and Passengers is the problem: their values are unique, so every row becomes its own group. By using a subquery instead, I'm now able to generate the desired results:
SELECT [qryA380].*
FROM (
SELECT MIN([qryA380].Route.ID) AS MinRoute
FROM [qryA380]
) tblMinRoute
INNER JOIN [qryA380]
ON [qryA380].Route.ID = tblMinRoute.MinRoute
Try this
SELECT [qryA380].*
FROM [qryA380]
WHERE [qryA380].Route.ID = (
SELECT min(Route.ID)
FROM [qryA380]
)
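For anyone who wants to try the scalar-subquery filter outside Access, here is a minimal sketch in SQLite through Python's sqlite3 module, using the rows shown in the question. The query qryA380 is modelled as a plain table, and the column is named RouteID rather than Route.ID to avoid the dotted name; both are assumptions of this sketch.

```python
import sqlite3

# (ID, Aircraft, RouteID, Passengers), copied from the question's query output.
ROWS = [
    (23, "A-380", 1, 556),
    (2, "A-380", 2, 652),
    (54, "A-380", 2, 489),
    (16, "A-380", 1, 598),
    (39, "A-380", 1, 627),
    (45, "A-380", 3, 392),
    (74, "A-380", 3, 726),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE qryA380 (ID INTEGER, Aircraft TEXT, RouteID INTEGER, Passengers INTEGER)"
)
conn.executemany("INSERT INTO qryA380 VALUES (?, ?, ?, ?)", ROWS)

# The subquery returns a single value (the smallest RouteID); the outer query
# keeps every row that has that value, without any GROUP BY.
sql = """
SELECT ID, Aircraft, RouteID AS MinRoute, Passengers
FROM qryA380
WHERE RouteID = (SELECT MIN(RouteID) FROM qryA380)
ORDER BY ID
"""

min_route_rows = conn.execute(sql).fetchall()
for row in min_route_rows:
    print(row)
```

This returns the three rows on route 1 (IDs 16, 23 and 39), the same set as the join against the derived table in the question.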

How to SUM from MySQL for every n record

I have the following result from a query:
+---------------+------+------+------+------+------+------+------+-------+
| order_main_id | S36 | S37 | S38 | S39 | S40 | S41 | S42 | total |
+---------------+------+------+------+------+------+------+------+-------+
| 26 | 127 | 247 | 335 | 333 | 223 | 111 | 18 | 1394 |
| 26 | 323 | 606 | 772 | 765 | 573 | 312 | 154 | 3505 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| 39 | 65 | 86 | 86 | 42 | 21 | NULL | NULL | 300 |
| 39 | 42 | 58 | 58 | 28 | 14 | NULL | NULL | 200 |
| 35 | 11 | 20 | 21 | 18 | 9 | 2 | NULL | 81 |
| 35 | 10 | 25 | 30 | 23 | 12 | 1 | NULL | 101 |
+---------------+------+------+------+------+------+------+------+-------+
I would like to insert a SUM row after each group of rows with the same order_main_id, so the result would look like this:
+---------------+------+------+------+------+------+------+------+-------+
| order_main_id | S36 | S37 | S38 | S39 | S40 | S41 | S42 | total |
+---------------+------+------+------+------+------+------+------+-------+
| 26 | 127 | 247 | 335 | 333 | 223 | 111 | 18 | 1394 |
| 26 | 323 | 606 | 772 | 765 | 573 | 312 | 154 | 3505 |
| | 450 | 853 | 1107 | 1098 | 796 | 423 | 172 | 4899 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 |
| | 50 | 70 | 70 | 70 | 40 | NULL | NULL | 300 |
| 39 | 65 | 86 | 86 | 42 | 21 | NULL | NULL | 300 |
| 39 | 42 | 58 | 58 | 28 | 14 | NULL | NULL | 200 |
| | 107 | 144 | 144 | 70 | 35 | NULL | NULL | 500 |
| 35 | 11 | 20 | 21 | 18 | 9 | 2 | NULL | 81 |
| 35 | 10 | 25 | 30 | 23 | 12 | 1 | NULL | 101 |
| | 21 | 45 | 51 | 41 | 21 | 3 | NULL | 182 |
+---------------+------+------+------+------+------+------+------+-------+
How can I do this?
You'll need to write a second query that makes use of GROUP BY order_main_id.
Something like:
SELECT sum(S41+...) FROM yourTable GROUP BY order_main_id
You can actually do this in one query, but with a union all (really two queries, but the result sets are combined to make one awesome result set):
select
order_main_id,
S36,
S37,
S38,
S39,
S40,
S41,
S42,
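-- COALESCE is needed because NULL + n is NULL, and S41/S42 contain NULLs
COALESCE(S36, 0) + COALESCE(S37, 0) + COALESCE(S38, 0) + COALESCE(S39, 0) + COALESCE(S40, 0) + COALESCE(S41, 0) + COALESCE(S42, 0) as total,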
'Detail' as rowtype
from
tblA
union all
select
order_main_id,
sum(S36),
sum(S37),
sum(S38),
sum(S39),
sum(S40),
sum(S41),
sum(S42),
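-- COALESCE is needed because NULL + n is NULL, and S41/S42 contain NULLs
sum(COALESCE(S36, 0) + COALESCE(S37, 0) + COALESCE(S38, 0) + COALESCE(S39, 0) + COALESCE(S40, 0) + COALESCE(S41, 0) + COALESCE(S42, 0)),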
'Summary' as rowtype
from
tblA
group by
order_main_id
order by
order_main_id, RowType
Remember that the order by affects the entirety of the union all, not just the last query. So, your resultset would look like this:
+---------------+------+------+------+------+------+------+------+-------+---------+
| order_main_id | S36 | S37 | S38 | S39 | S40 | S41 | S42 | total | rowtype |
+---------------+------+------+------+------+------+------+------+-------+---------+
| 26 | 127 | 247 | 335 | 333 | 223 | 111 | 18 | 1394 | Detail |
| 26 | 323 | 606 | 772 | 765 | 573 | 312 | 154 | 3505 | Detail |
| 26 | 450 | 853 | 1107 | 1098 | 796 | 423 | 172 | 4899 | Summary |
| 35 | 11 | 20 | 21 | 18 | 9 | 2 | NULL | 81 | Detail |
| 35 | 10 | 25 | 30 | 23 | 12 | 1 | NULL | 101 | Detail |
| 35 | 21 | 45 | 51 | 41 | 21 | 3 | NULL | 182 | Summary |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 | Detail |
| 38 | 25 | 35 | 35 | 35 | 20 | NULL | NULL | 150 | Detail |
| 38 | 50 | 70 | 70 | 70 | 40 | NULL | NULL | 300 | Summary |
| 39 | 65 | 86 | 86 | 42 | 21 | NULL | NULL | 300 | Detail |
| 39 | 42 | 58 | 58 | 28 | 14 | NULL | NULL | 200 | Detail |
| 39 | 107 | 144 | 144 | 70 | 35 | NULL | NULL | 500 | Summary |
+---------------+------+------+------+------+------+------+------+-------+---------+
This way, you know what is and what isn't a detail or summary row, and the order_main_id that it's for. You could always (and probably should) hide this column in your presentation layer.
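Here is a cut-down sketch of the detail-plus-summary UNION ALL, run in SQLite through Python's sqlite3 module rather than MySQL. Only three of the size columns (S36, S37, S41) and two orders are loaded, and the row totals are wrapped in COALESCE so that a NULL size does not turn the whole total into NULL; the table name tblA follows the answer, the reduced column set is an assumption to keep the example short.

```python
import sqlite3

# (order_main_id, S36, S37, S41): a reduced version of the question's data.
ROWS = [
    (26, 127, 247, 111),
    (26, 323, 606, 312),
    (38, 25, 35, None),
    (38, 25, 35, None),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblA (order_main_id INTEGER, S36 INTEGER, S37 INTEGER, S41 INTEGER)")
conn.executemany("INSERT INTO tblA VALUES (?, ?, ?, ?)", ROWS)

sql = """
SELECT order_main_id, S36, S37, S41,
       COALESCE(S36, 0) + COALESCE(S37, 0) + COALESCE(S41, 0) AS total,
       'Detail' AS rowtype
FROM tblA
UNION ALL
SELECT order_main_id, SUM(S36), SUM(S37), SUM(S41),
       SUM(COALESCE(S36, 0) + COALESCE(S37, 0) + COALESCE(S41, 0)),
       'Summary'
FROM tblA
GROUP BY order_main_id
-- The ORDER BY applies to the combined result; 'Detail' sorts before 'Summary'.
ORDER BY order_main_id, rowtype
"""

report = conn.execute(sql).fetchall()
for row in report:
    print(row)
```

Each order's detail rows are followed by one Summary row. SUM skips NULLs, so the S41 summary for order 38 stays NULL (None) while its total is still 120.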
For things like this, I think you should use a reporting library (such as Crystal Reports); it will save you a lot of trouble. Check JasperReports and similar projects on osalt.