Relational Algebra operations : assignment and grader - sql

Assume I have 4 assignments with assignmentIDs A1, A2, A3, A4 in an assignment table, plus the following table:
GroupID GraderName assignmentID
1 TA1 A1
2 TA2 A2
3 TA1 A4
4 TA1 A3
5 TA1 A1
6 TA2 A4
7 TA3 A3
8 TA3 A2
9 TA3 A1
10 TA2 A1
11 TA1 A2
Report the names of the graders who marked at least one group for every assignment.
From my table, it should report TA1.
TA2 didn't mark any A3 and TA3 didn't mark any A4, so they should be ignored.
Can someone suggest an approach using relational algebra operators such as cross join, natural join, self join, etc.?

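From the relational algebra perspective, I would suggest finding all assignments first: π_assignmentID(assignment). Then, to answer your question, use division, ÷, on the table that records who graded what (called grading here, since the question does not name it), i.e., the query should be
π_{GraderName,assignmentID}(grading) ÷ π_assignmentID(assignment)
If for whatever reason you do not like using division, you can always replace it with a double difference: with R = π_{GraderName,assignmentID}(grading) and S = π_assignmentID(assignment), R ÷ S = π_GraderName(R) − π_GraderName((π_GraderName(R) × S) − R).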
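As a rough illustration of the division-by-difference idea, here is a minimal Python sketch using sets. The function name `divide` and the variable names are assumptions made for this example; the data is copied from the table in the question.

```python
# Relational division on the sample data: graders who marked every assignment.
# Names such as `graded` and `divide` are illustrative, not from the question.
assignments = {"A1", "A2", "A3", "A4"}  # pi_assignmentID(assignment)

# (GraderName, assignmentID) pairs; the duplicate (TA1, A1) collapses in a set.
graded = {
    ("TA1", "A1"), ("TA2", "A2"), ("TA1", "A4"), ("TA1", "A3"),
    ("TA2", "A4"), ("TA3", "A3"), ("TA3", "A2"), ("TA3", "A1"),
    ("TA2", "A1"), ("TA1", "A2"),
}


def divide(pairs, divisor):
    """Compute R / S as pi_G(R) - pi_G((pi_G(R) x S) - R)."""
    graders = {g for g, _ in pairs}
    # Every (grader, assignment) combination that would have to exist.
    all_combinations = {(g, a) for g in graders for a in divisor}
    # Combinations that are absent from the actual grading data.
    missing = all_combinations - pairs
    # Keep only graders with nothing missing.
    return graders - {g for g, _ in missing}


result = divide(graded, assignments)
print(result)
```

The intermediate set `missing` plays the role of the inner difference: any grader that appears in it has skipped at least one assignment.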

Related

Compare two tables and display selected value from 2nd table

I'm trying to match the 3rd column of the first table against the 2nd column of the second table. In the example below, I need to get the PROGRAM from the first table and append it to each row of the second table using AWK. The column common to both tables is TESTER.
Below is my code, which is not working. Please help me fix it.
awk -F, 'NR==FNR{a[$1]=$8;next;}{print $0,a[$3]?a[$2]:"N/A"}' OFS=, table1 table2
Table1:
Date Time TESTER Niche SMS_NO TEST_AREA SCREEN_TYPE PROGRAM
4/23/2019 8:40:42 A1 Nxx S11 TA1 ST1 PGM1
4/23/2019 7:34:08 B1 Nx1 S21 TA2 ST2 PGM2
4/23/2019 3:16:24 C1 Nx2 S31 TA3 ST3 PGM3
4/23/2019 6:22:04 D1 Nx3 S41 TA4 ST4 PGM4
4/23/2019 8:55:19 E1 Nx4 S51 TA5 ST5 PGM5
7/22/2018 17:30:37 F1 Nx5 S61 TA6 ST6 PGM6
Table2:
FEATURE TESTER LICENSE_USED
FEA1 A1 4
FEA2 B1 16
FEA3 C1 16
FEA4 D1 16
FEA5 E1 16
FEA6 F1 16
FEA7 G1 16
FEA8 G2 16
Expected output:
FEATURE TESTER LICENSE_USED PROGRAM
FEA1 A1 4 PGM1
FEA2 B1 16 PGM2
FEA3 C1 16 PGM3
FEA4 D1 16 PGM4
FEA5 E1 16 PGM5
FEA6 F1 16 PGM6
FEA7 G1 16 N/A
FEA8 G2 16 N/A
Please check this:
awk 'NR==FNR {a[$3]=$8; next} {print $0 FS (a[$2]?a[$2]:"N/A")}' file1.txt file2.txt
File1.txt
Date Time TESTER Niche SMS_NO TEST_AREA SCREEN_TYPE PROGRAM
4/23/2019 8:40:42 A1 Nxx S11 TA1 ST1 PGM1
4/23/2019 7:34:08 B1 Nx1 S21 TA2 ST2 PGM2
4/23/2019 3:16:24 C1 Nx2 S31 TA3 ST3 PGM3
4/23/2019 6:22:04 D1 Nx3 S41 TA4 ST4 PGM4
4/23/2019 8:55:19 E1 Nx4 S51 TA5 ST5 PGM5
7/22/2018 17:30:37 F1 Nx5 S61 TA6 ST6 PGM6
File2.txt
FEATURE TESTER LICENSE_USED
FEA1 A1 4
FEA2 B1 16
FEA3 C1 16
FEA4 D1 16
FEA5 E1 16
FEA6 F1 16
FEA7 G1 16
FEA8 G2 16
Output:
FEATURE TESTER LICENSE_USED PROGRAM
FEA1 A1 4 PGM1
FEA2 B1 16 PGM2
FEA3 C1 16 PGM3
FEA4 D1 16 PGM4
FEA5 E1 16 PGM5
FEA6 F1 16 PGM6
FEA7 G1 16 N/A
FEA8 G2 16 N/A
Another option, tested on GNU awk:
awk 'NR==FNR{a[$3]=$8;next} {$4=a[$2];if($4=="") $4="N/A";print}' Table1 Table2
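For comparison, the same two-pass lookup can be sketched in Python. This is only an illustration of the technique (build a map from the first table, then look up each row of the second); the function name `add_program` and the shortened inline data are assumptions made for the example.

```python
# Two-pass lookup join, mirroring the awk NR==FNR idiom.
# `add_program` and the trimmed sample data are illustrative only.
table1 = """\
Date Time TESTER Niche SMS_NO TEST_AREA SCREEN_TYPE PROGRAM
4/23/2019 8:40:42 A1 Nxx S11 TA1 ST1 PGM1
4/23/2019 7:34:08 B1 Nx1 S21 TA2 ST2 PGM2
"""
table2 = """\
FEATURE TESTER LICENSE_USED
FEA1 A1 4
FEA2 B1 16
FEA7 G1 16
"""


def add_program(first, second, default="N/A"):
    # Pass 1: map TESTER (3rd field) to PROGRAM (8th field).
    # The header row maps "TESTER" -> "PROGRAM", which labels the new column.
    program_by_tester = {}
    for line in first.splitlines():
        fields = line.split()
        if len(fields) >= 8:
            program_by_tester[fields[2]] = fields[7]
    # Pass 2: append the looked-up PROGRAM, or the default, to each row.
    rows = []
    for line in second.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            fields.append(program_by_tester.get(fields[1], default))
            rows.append(" ".join(fields))
    return rows


rows = add_program(table1, table2)
print("\n".join(rows))
```

Using `dict.get` with a default avoids the truthiness test on the stored value, so a PROGRAM that happened to be `0` or empty would not be confused with a missing tester.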

Adding Two Columns when a Third Column is Equal

I am doing a left join query to compare two tables and find rows where the f9 and sumoff6 columns differ, but ONLY where the f1 columns are the same. If the values are different, I would like to subtract them.
The results I am getting show values that are already equal in both tables. I need the f1 columns to match in both tables, and if the sumoff6 and f9 values do not match, the row should be displayed along with the difference. F1 is unique in both tables. Keep in mind, though, that one table may have more rows than the other. I am using MS Access SQL view.
Query
SELECT statement.f1, statement.f9
FROM statement
LEFT JOIN allocation_final ON statement.[f1] = allocation_final[f1]
WHERE [allocation_final].sumoff6 <> statement.f9
Statement table:
f1 f9
-----------------
1 135.58
2 166.30
3 40.22
4 86.46
5 170.33
6 96.40
allocation_final:
f1 SumOff6
--------------
1 135.58
2 166.30
3 40.00
4 86.46
5 170.33
6 40.22
7 22.40
8 70.00
9 96.40
10 50.00
Results
f1 f9
--------------
1 135.58
2 166.3
4 86.46
5 170.33
Update:
The result I want is: if f1 = f3 and f9 <> sumoff6, then display the row. As you can see below, it still returns rows where the values are the same. Look at the first row, which should not be there because f9 = sumoff6.
query:
SELECT statement.f1, statement.f9, allocation_2.[f3], allocation_2.sumoff6
FROM allocation_2 LEFT JOIN statement ON allocation_2.[f3]=statement.f1
WHERE statement.f9 <> allocation_2.sumoff6
GROUP BY statement.f1, statement.f9, allocation_2.[f3], allocation_2.sumoff6
ORDER BY statement.f1;
Output:
f1 f9 f3 sumoff6
--------------------------------------
123456789 135.58 123456789 135.58
111111111 166.3 111111111 66.3
222222222 86.46 222222222 86.46
333333333 170.33 333333333 170.33
444444444 135.58 444444444 35.58
555555555 125.74 555555555 125.74
666666666 73.49 666666666 23.49
777777777 187.99 777777777 87.99
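I think your first query is basically correct. The problem is probably that the numbers look the same but are stored as slightly different floating-point values, so <> treats them as unequal. Does comparing against a small tolerance fix the problem? The threshold below is half a cent, which assumes the values are amounts with two decimal places:
SELECT statement.f1, statement.f9
FROM statement INNER JOIN
allocation_final
ON statement.[f1] = allocation_final.[f1]
WHERE ABS([allocation_final].sumoff6 - statement.f9) > 0.005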
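To illustrate why a tolerance helps, here is a small Python sketch of the same comparison on the sample data from the question. The function name `mismatches` and the half-cent tolerance are assumptions chosen for this example, not something taken from the original query.

```python
# Compare two keyed sets of amounts with a tolerance instead of exact inequality.
# `mismatches` and the 0.005 tolerance are illustrative assumptions.
statement = {1: 135.58, 2: 166.30, 3: 40.22, 4: 86.46, 5: 170.33, 6: 96.40}
allocation_final = {
    1: 135.58, 2: 166.30, 3: 40.00, 4: 86.46, 5: 170.33,
    6: 40.22, 7: 22.40, 8: 70.00, 9: 96.40, 10: 50.00,
}

# Binary floating point cannot represent most decimal fractions exactly,
# so values that print the same can still compare as unequal.
print(0.1 + 0.2 == 0.3)  # False


def mismatches(left, right, tolerance=0.005):
    """Return (key, left, right, difference) for keys present in both sides
    whose values differ by more than the tolerance."""
    rows = []
    for key in sorted(left.keys() & right.keys()):  # inner join on the key
        diff = left[key] - right[key]
        if abs(diff) > tolerance:
            rows.append((key, left[key], right[key], round(diff, 2)))
    return rows


result = mismatches(statement, allocation_final)
for row in result:
    print(row)
```

Only keys 3 and 6 differ in the sample data, so those are the rows this sketch reports, together with the subtracted amounts.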

Why is T-SQL Returning Duplicate Rows, Join Issue I Believe

The code as written below returns the appropriate customers, lockers, units and balance. However, when I add in the commented-out code it reiterates each customer's data for each club even though each customer can only be a member of one club.
USE db1
GO
SELECT [db1].[dbo].[Customer].[CustomerNumber] AS 'Customer No.'
    -- ,A.ClubID AS 'Club ID No.'
    ,(SELECT CONCAT(SI.Locker, '-', SI.Frequency)) AS Locker
    ,SI.Unit AS Unit
    --,[db2].[dbo].[vueClub].Club_aka AS Club
    ,[db1].[dbo].[Customer_Balance].[CurrentBalance]
FROM [db1].[dbo].[Customer_Balance]
JOIN [db1].[dbo].[Customer]
    ON [db1].[dbo].[Customer_Balance].POSCusNo = Customer.CustomerNumber
JOIN [SQLSrv01].[db3].[dbo].[md_Table_1] AS D
    ON D.Contract_no = [db1].[dbo].[Customer_Balance].POSCusNo
JOIN [SQLSrv01].[db2].[dbo].[vueSoldLockers] AS SI
    ON SI.CustomerID = [db1].[dbo].[Customer].CustomerID
--JOIN [db2].[dbo].[vueClub] AS A
--    ON [db1].[dbo].[Customer].SiteID = A.SiteID
WHERE [db1].[dbo].[Customer_Balance].StatusCode = '1234'
ORDER BY Customer.CustomerNumber ASC
So if I run it as is I get:
Customer No. Locker Unit Current Balance
1 315 A1 456.00
2 316 A3 1204.70
3 317 B2 335.60
4 318 B4 1500.30
But if I include the commented-out code I get:
Customer No. Club ID No Locker Unit Club Current Balance
1 4 315 A1 Tigers 456.00
1 3 315 A1 Lions 456.00
2 4 316 A3 Tigers 1204.70
2 3 316 A3 Lions 1204.70
3 4 317 B2 Tigers 335.60
3 3 317 B2 Lions 335.60
4 4 318 B4 Tigers 1500.30
4 3 318 B4 Lions 1500.30
Is it because I don't have the JOIN set up properly?
Customer No. Club ID No Locker Unit Club Current Balance
1 4 315 A1 Tigers 456.00
1 3 315 A1 Lions 456.00
You are joining Customer to vueClub on SiteID. It looks like the site that customer 1 belongs to has two clubs (3 and 4), so every customer at that site matches both club rows.
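The row multiplication can be reproduced with a tiny Python sketch of an inner join. The SiteID value 10 and the helper name `inner_join` are made up for this illustration; only the club IDs and names come from the output in the question.

```python
# Joining on a key that is not unique on the right side multiplies rows.
# SiteID 10 and `inner_join` are hypothetical, chosen only for illustration.
customers = [
    {"CustomerNumber": 1, "SiteID": 10},
    {"CustomerNumber": 2, "SiteID": 10},
]
clubs = [
    {"ClubID": 4, "Club_aka": "Tigers", "SiteID": 10},
    {"ClubID": 3, "Club_aka": "Lions", "SiteID": 10},
]


def inner_join(left, right, key):
    """Pair every left row with every right row that shares the key value."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


joined = inner_join(customers, clubs, "SiteID")
for row in joined:
    print(row["CustomerNumber"], row["ClubID"], row["Club_aka"])
```

Two customers joined to two clubs at the same site give four rows. Getting one row per customer would require joining on something that identifies the customer's own club; whether such a column exists in these tables is not stated in the question.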

Join on second table if value not found in first table

I would like to join on a second table only if the results of the first join are blank. Below is a subsection of Table A data:
ID Metro Submarket
1 NYC Manhattan
2 NYC Brooklyn
3 NYC Queens
4 NYC Bronx
5 NYC Newark
The tables I'm using for the joins are:
Table B
Metro Submarket A.Price B.Price C.Price
NYC Manhattan 54 32 48
NYC Queens 35 39 59
NYC Brooklyn 20 49 58
NYC Bronx 49 30 20
NYC Newark 49 50 -
Table C
Metro A.Price B.Price C.Price
NYC 50 49 69
Philly 49 48 37
Chicago 20 48 36
I'm adding the Price columns from Table B to Table A based on a Metro and Submarket match. However, Table B doesn't have all the prices. If I can't find a match in Table B then I want to look into Table C for a match only on Metro.
For ID 5, we can find the A and B prices in Table B. However, the C price is blank. In that case, I want it to retrieve the C price from Table C (69 is what it would choose).
I'm using SAS 9.4. SQL, macros, or anything else SAS can handle is welcome!
You can left join both tables to the main table and simply use COALESCE(). This will give you the value if present in Table B, otherwise it will give you the value in Table C:
PROC SQL;
    CREATE TABLE Output AS
    SELECT
        ta.ID,
        ta.Metro,
        ta.Submarket,
        COALESCE(tb.A_Price, tc.A_Price) AS A_Price,
        COALESCE(tb.B_Price, tc.B_Price) AS B_Price,
        COALESCE(tb.C_Price, tc.C_Price) AS C_Price
    FROM
        tablea ta
    LEFT JOIN
        tableb tb
        ON (tb.Metro = ta.Metro)
        AND (tb.Submarket = ta.Submarket)
    LEFT JOIN
        tablec tc
        ON (tc.Metro = ta.Metro);
QUIT;
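The fallback logic behind COALESCE() can be sketched in a few lines of Python. The dictionaries stand in for Table B and Table C, a missing price is modelled as `None`, and the names `coalesce` and `prices` as well as the "Staten Island" lookup are assumptions made for this illustration.

```python
# Per-column fallback: take Table B's value if present, else Table C's.
# `coalesce`, `prices`, and the "Staten Island" lookup are illustrative only.
table_b = {
    ("NYC", "Manhattan"): {"A_Price": 54, "B_Price": 32, "C_Price": 48},
    ("NYC", "Newark"): {"A_Price": 49, "B_Price": 50, "C_Price": None},
}
table_c = {
    "NYC": {"A_Price": 50, "B_Price": 49, "C_Price": 69},
}
COLUMNS = ("A_Price", "B_Price", "C_Price")


def coalesce(*values):
    """Return the first value that is not None, or None if all are missing."""
    for value in values:
        if value is not None:
            return value
    return None


def prices(metro, submarket):
    # A failed lookup behaves like an unmatched LEFT JOIN: all columns missing.
    b_row = table_b.get((metro, submarket), {})
    c_row = table_c.get(metro, {})
    return {col: coalesce(b_row.get(col), c_row.get(col)) for col in COLUMNS}


newark = prices("NYC", "Newark")
unmatched = prices("NYC", "Staten Island")
print(newark)
print(unmatched)
```

For Newark, the A and B prices come from Table B and only the missing C price falls back to Table C. A submarket with no row in Table B takes all three prices from Table C.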

joining two narrow format tables

I have a scenario where I have tables (in a proprietary datastore) with thousands of columns. Before being exported for querying, the tables are transformed to narrow format (http://en.wikipedia.org/wiki/Wide_and_Narrow_Data).
I am developing a query executor. The input to this query executor is the narrow tables, not the original tables. I want to perform joins on two similar narrow tables, but cannot figure out the general logic behind it.
For example, let's say we have two tables R and S in the original (wide) format:
Table R
C1 C2 C3 R1 R2 R3
5 6 7 1234 4552 12532
5 6 8 4512 21523 434
15 16 17 1254 1212 3576
Table S
C1 C2 C3 S1 S2 S3
5 6 7 5412 35112 3512
5 6 8 125393 1523 6749
15 16 17 74397 4311 1153
C1, C2, C3 are the common columns between the tables.
The narrow table for table R is
C1 C2 C3 Key Value
5 6 7 R1 1234
5 6 7 R2 4552
5 6 7 R3 12532
5 6 8 R1 4512
5 6 8 R2 21523
5 6 8 R3 434
15 16 17 R1 1254
15 16 17 R2 1212
15 16 17 R3 3576
The narrow table for table S is
C1 C2 C3 Key Value
5 6 7 S1 5412
5 6 7 S2 35112
5 6 7 S3 3512
5 6 8 S1 125393
5 6 8 S2 1523
5 6 8 S3 6749
15 16 17 S1 74397
15 16 17 S2 4311
15 16 17 S3 1153
Now when I join the original tables R and S (on C1, C2 and C3), I get the result
C1 C2 C3 R1 R2 R3 S1 S2 S3
5 6 7 1234 4552 12532 5412 35112 3512
5 6 8 4512 21523 434 125393 1523 6749
15 16 17 1254 1212 3576 74397 4311 1153
Whose narrow format is
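C1 C2 C3 Key Value
5 6 7 R1 1234
5 6 7 R2 4552
5 6 7 R3 12532
5 6 7 S1 5412
5 6 7 S2 35112
5 6 7 S3 3512
5 6 8 R1 4512
5 6 8 R2 21523
5 6 8 R3 434
5 6 8 S1 125393
5 6 8 S2 1523
5 6 8 S3 6749
15 16 17 R1 1254
15 16 17 R2 1212
15 16 17 R3 3576
15 16 17 S1 74397
15 16 17 S2 4311
15 16 17 S3 1153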
How can I get the above table by just joining the narrow tables (on the common columns) that I receive as input?
If you use a normal tabular join (natural join, outer join, etc.) between the two narrow tables, you get an exploded table, because each key in table R gets multiplied with all the keys in table S.
I am not using SQL, Postgres or any other database system. I am looking for an answer in terms of algorithms or relational algebra expressions.
You're looking for the set union operator: A∪B is defined as the set of all tuples that appear in A, B or both, supposing the two relations have the same schema. The narrow tables all have the same schema (id, key, value), so they're perfectly union compatible.
And here is a proof:
Suppose we have relations A(id, val1, val2 ... val_n) and B(id, val_n+1 ... val_n+m). We will also need a relation holding our variable names V(variable) = {('val1'), ('val2') ... ('val_n+m')}. The narrow-format equivalent of A is A'(id, variable, value), which we can construct like this:
That is, for each value we project A to (id, val_i), rename val_i to "value", put the variable name in the table (by taking the cross product with a single tuple in V); then we take the union of all these relations. Let us also construct B'(id, variable, value) in a similar fashion.
The natural join can be defined using only primitives:
Therefore we can construct (A ⋈ B)' like this (having combined the projections):
Let's apply the projection earlier:
But a val_i can only appear in A or B, not both, making one term of the cross product zero half of the time so this can be reduced and re-ordered into
which is exactly A' ∪ B'.
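The steps above can be written out roughly as follows. This is a sketch under two stated assumptions, not necessarily the formulas the answer originally displayed: the join is on id only, and A and B contain the same set of ids (as in the question's example), so that projecting the join back onto the columns of A or B loses no rows. Here ρ renames val_i to value, and σ selects the single tuple of V that holds the name val_i.

```latex
\begin{align*}
A' &= \bigcup_{i=1}^{n}
      \rho_{\mathit{value}/\mathit{val}_i}\bigl(\pi_{\mathit{id},\,\mathit{val}_i}(A)\bigr)
      \times \sigma_{\mathit{variable}=\mathit{val}_i}(V) \\
B' &= \bigcup_{i=n+1}^{n+m}
      \rho_{\mathit{value}/\mathit{val}_i}\bigl(\pi_{\mathit{id},\,\mathit{val}_i}(B)\bigr)
      \times \sigma_{\mathit{variable}=\mathit{val}_i}(V) \\
A \bowtie B &= \pi_{A.\mathit{id},\,\mathit{val}_1,\dots,\mathit{val}_{n+m}}
      \bigl(\sigma_{A.\mathit{id}=B.\mathit{id}}(A \times B)\bigr) \\
(A \bowtie B)' &= \bigcup_{i=1}^{n+m}
      \rho_{\mathit{value}/\mathit{val}_i}\bigl(\pi_{\mathit{id},\,\mathit{val}_i}(A \bowtie B)\bigr)
      \times \sigma_{\mathit{variable}=\mathit{val}_i}(V) \\
\pi_{\mathit{id},\,\mathit{val}_i}(A \bowtie B) &=
      \begin{cases}
        \pi_{\mathit{id},\,\mathit{val}_i}(A) & 1 \le i \le n \\
        \pi_{\mathit{id},\,\mathit{val}_i}(B) & n < i \le n+m
      \end{cases}
      \qquad \text{assuming } \pi_{\mathit{id}}(A) = \pi_{\mathit{id}}(B) \\
(A \bowtie B)' &= A' \cup B'
\end{align*}
```

Substituting the case split into the fourth line separates the union into the terms for i ≤ n, which together form A', and the terms for i > n, which together form B'.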
So, we have shown that (A ⋈ B)' = A' ∪ B', that is, the narrow format of the joined tables is the union of the narrow format tables.
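The claim can be checked on a cut-down version of the question's data with a short Python sketch. The helper names `to_narrow` and `natural_join` are assumptions made for this example, and the check relies on both tables containing the same (C1, C2, C3) combinations, as they do in the question.

```python
# Check that narrow(R join S) equals narrow(R) union narrow(S) on sample data.
# `to_narrow` and `natural_join` are illustrative helper names.
KEYS = ("C1", "C2", "C3")

R = [
    {"C1": 5, "C2": 6, "C3": 7, "R1": 1234, "R2": 4552, "R3": 12532},
    {"C1": 5, "C2": 6, "C3": 8, "R1": 4512, "R2": 21523, "R3": 434},
]
S = [
    {"C1": 5, "C2": 6, "C3": 7, "S1": 5412, "S2": 35112, "S3": 3512},
    {"C1": 5, "C2": 6, "C3": 8, "S1": 125393, "S2": 1523, "S3": 6749},
]


def to_narrow(rows, keys=KEYS):
    """Turn wide rows into a set of (C1, C2, C3, Key, Value) tuples."""
    return {
        tuple(row[k] for k in keys) + (name, value)
        for row in rows
        for name, value in row.items()
        if name not in keys
    }


def natural_join(left, right, keys=KEYS):
    """Wide-format natural join on the shared key columns."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[k] == r[k] for k in keys)
    ]


narrow_of_join = to_narrow(natural_join(R, S))
union_of_narrow = to_narrow(R) | to_narrow(S)
print(narrow_of_join == union_of_narrow)
```

If one table had a key combination that the other lacks, an inner join would drop it while the union would keep it, so in that case the union corresponds to a full outer join of the wide tables instead.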