OPENEDGE (ODBC) Joining without Duplicates on ID - sql

I'm trying to query three tables without getting duplicates on the Object_ID.
If that is not possible, ordering by both Object_IDs, so that the duplicates of each Object_ID are grouped together, would also do the trick for me. That is not really working, though, since I'm only able to order by one Object_ID, so only that column's duplicates end up grouped together.
Tables:
S_Anl
Ktr   Anl
4711  1234
4711  5678
4711  9000
AB_Erg
Anl   AB_Erg_Obj  Value
1234  c9d91f      1000
1234  696bfc      2000
1234  8c9915      3000
5678  141a65      4000
E_BP
Anl   E_BP_Obj  Value
1234  99f75ab   500
1234  720e573   100
9000  830614c   50
9000  958ac28   200
Query
SELECT B.AB_Erg_Obj, C.E_BP_Obj, A.Anl, B.Value, C.Value
FROM PUB.S_Anl AS A
LEFT JOIN PUB.AB_Erg AS B ON A.Anl = B.Anl
LEFT JOIN PUB.E_BP AS C ON A.Anl = C.Anl
WHERE A.Ktr = '4711'
ORDER BY A.Anl, B.AB_Erg_Obj, C.E_BP_Obj
with (nolock)
Expected Result
Anl   AB_Erg_Obj  E_BP_Obj  Value  Value
1234  c9d91f      99f75ab   1000   500
1234  696bfc      720e573   2000   100
1234  8c9915      NULL      3000   NULL
5678  141a65      NULL      4000   NULL
9000  830614c     830614c   NULL   50
9000  958ac28     958ac28   NULL   200
Or ordering by AB_Erg_Obj and E_BP_Obj so that each column's duplicates are grouped together.
Is either of these possible?
//EDIT:
I know that ordering wouldn't remove duplicates from the result set, but it would make them easier to remove afterwards.
Also, it's not necessary that the data is matched exactly at row level. I just need the overall sum of Value from E_BP and of Value from AB_Erg in the first place, so exact row-level matching is not needed, just no duplicates at the Object_ID level.
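Since the edit says only the per-Anl sums are needed, one possible approach is to aggregate each child table on its own before joining, so the two one-to-many joins can no longer multiply each other's rows. The following is a minimal sketch, assuming SQLite (via Python's sqlite3) as a stand-in for OpenEdge; the PUB schema prefix and the nolock hint are dropped, and the column aliases AB_Sum and BP_Sum are hypothetical names introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S_Anl  (Ktr TEXT, Anl INTEGER);
CREATE TABLE AB_Erg (Anl INTEGER, AB_Erg_Obj TEXT, Value INTEGER);
CREATE TABLE E_BP   (Anl INTEGER, E_BP_Obj TEXT, Value INTEGER);
INSERT INTO S_Anl  VALUES ('4711', 1234), ('4711', 5678), ('4711', 9000);
INSERT INTO AB_Erg VALUES (1234, 'c9d91f', 1000), (1234, '696bfc', 2000),
                          (1234, '8c9915', 3000), (5678, '141a65', 4000);
INSERT INTO E_BP   VALUES (1234, '99f75ab', 500), (1234, '720e573', 100),
                          (9000, '830614c', 50),  (9000, '958ac28', 200);
""")

# Each derived table returns at most one row per Anl, so joining both
# to S_Anl cannot produce the cross-product duplicates of the original query.
query = """
SELECT A.Anl, B.AB_Sum, C.BP_Sum
FROM S_Anl AS A
LEFT JOIN (SELECT Anl, SUM(Value) AS AB_Sum FROM AB_Erg GROUP BY Anl) AS B
       ON A.Anl = B.Anl
LEFT JOIN (SELECT Anl, SUM(Value) AS BP_Sum FROM E_BP GROUP BY Anl) AS C
       ON A.Anl = C.Anl
WHERE A.Ktr = '4711'
ORDER BY A.Anl
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Whether OpenEdge SQL accepts derived tables in exactly this form over ODBC is not confirmed by the question, so the query would need to be checked against the actual database.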

Related

hive - Duplicate counts check associated from one to another column

I have a table and am trying to fetch counts of distinct values in one column by comparing them against another column. The data runs from millions to billions of rows for each value of the TMKEY partition column.
ID      TNUM  TMKEY
23455   ABCD  1001
23456   ABCD  1001
23455   ABCD  1001
112233  BCDE  1001
113322  BCDE  1001
9009    DDEE  1001
9009    DDEE  1001
1009    FFGG  1001
Looking for desired output:
total_distinct_tNUM_count  count_of_TNUM_which_has_more_than_distinct_ID  TMKEY
4                          2                                              1001
Here, when TNUM is DDEE, the ID is 9009 on both rows; such duplicates shouldn't be picked up when counting the TNUMs that have more than one distinct ID. All I'm looking for is the grouped counts. Any suggestions, please? As I have between 3 and 4 billion rows, my current approach, shown below, is completely different and I'm stuck.
select a.tnum, a.group_id, a.time_week
from (
    SELECT time_week, tnum, count(*) as num_of_rows,
           concat_ws('|', collect_set(id)) as group_id
    from source_table_test1
    where time_week = 1001
    group by tnum, time_week
) as a
where length(a.group_id) > 16 and num_of_rows > 1
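One way to get the desired output without concatenating IDs is a two-level aggregation: first count the distinct IDs per TNUM, then count the TNUMs and how many of them have more than one distinct ID. The following is a minimal sketch, assuming SQLite (via Python's sqlite3) as a stand-in for Hive; the column name tmkey follows the sample table rather than the time_week name used in the query, and n_ids is a hypothetical alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_table_test1 (id INTEGER, tnum TEXT, tmkey INTEGER);
INSERT INTO source_table_test1 VALUES
    (23455, 'ABCD', 1001), (23456, 'ABCD', 1001), (23455, 'ABCD', 1001),
    (112233, 'BCDE', 1001), (113322, 'BCDE', 1001),
    (9009, 'DDEE', 1001), (9009, 'DDEE', 1001),
    (1009, 'FFGG', 1001);
""")

# Inner query: one row per (tmkey, tnum) with its number of distinct IDs.
# Outer query: total TNUMs, and TNUMs with more than one distinct ID.
query = """
SELECT COUNT(*) AS total_distinct_tnum_count,
       SUM(CASE WHEN n_ids > 1 THEN 1 ELSE 0 END) AS tnum_with_multiple_ids,
       tmkey
FROM (
    SELECT tmkey, tnum, COUNT(DISTINCT id) AS n_ids
    FROM source_table_test1
    WHERE tmkey = 1001
    GROUP BY tmkey, tnum
) AS per_tnum
GROUP BY tmkey
"""
result = conn.execute(query).fetchall()
print(result)
```

DDEE contributes only one distinct ID (9009), so it is not counted in the second column. How this performs on billions of rows in Hive is not something this sketch can show.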

Presto left join SQL statement to return first match only

I know this question has been asked many times before, but for the life of me I am unable to figure this out.
I'm essentially trying to do a Left Join statement that is matching a Purchase Order number in table 1 with a Purchase order number in table 2. The issue is, since PO data is updated daily, it will return numerous rows from table 2 because there are different scenarios of a PO not being paid, canceled, etc.
Where I am running into issues is, my left join statement is returning multiple rows when joining against table 2.
Table 1 = T1
POnum  Description
12345  I need help
54321  I need help
78910  I need help
Table 2 = T2
POnum  Date    Vendor
12345  1/2/21  ABC
12345  1/2/21  ABC
12345  1/2/21  ABC
54321  1/1/21  CBD
54321  1/1/21  CBD
54321  1/1/21  CBD
78910  1/5/21  GED
78910  1/5/21  GED
78910  1/5/21  GED
Here is the code that I am using:
Select
t1.POnum, T2.Vendor
From
Table 1 as T1
Left Join
Table 2 as T2 On T1.POnum = T2.POnum
As you will note, in Table 2 the PO number 12345 has 3 rows with the same exact date.
The ending result should essentially look like the following:
T1.POnum  T2.Vendor
12345     ABC
54321     CBD
78910     GED
Where in the query are you using the date from table T2?
You also have not used a WHERE clause in the query, which means no condition is required.
The result you need can be achieved directly, without any join:
SELECT DISTINCT PONUM, VENDOR FROM Table2;
Let me know in case anything is missing here.
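If the rows from Table 1 still need to drive the result (for example, to keep POs that have no match in Table 2), a possible variation is to left join against a deduplicated derived table instead of the raw Table 2. The following is a minimal sketch, assuming SQLite (via Python's sqlite3) as a stand-in for Presto, with the tables named T1 and T2 directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (POnum INTEGER, Description TEXT);
CREATE TABLE T2 (POnum INTEGER, "Date" TEXT, Vendor TEXT);
INSERT INTO T1 VALUES (12345, 'I need help'), (54321, 'I need help'),
                      (78910, 'I need help');
INSERT INTO T2 VALUES
    (12345, '1/2/21', 'ABC'), (12345, '1/2/21', 'ABC'), (12345, '1/2/21', 'ABC'),
    (54321, '1/1/21', 'CBD'), (54321, '1/1/21', 'CBD'), (54321, '1/1/21', 'CBD'),
    (78910, '1/5/21', 'GED'), (78910, '1/5/21', 'GED'), (78910, '1/5/21', 'GED');
""")

# Collapse T2 to one row per (POnum, Vendor) before joining,
# so each PO in T1 matches a single row.
query = """
SELECT T1.POnum, D.Vendor
FROM T1
LEFT JOIN (SELECT DISTINCT POnum, Vendor FROM T2) AS D
       ON T1.POnum = D.POnum
ORDER BY T1.POnum
"""
rows = conn.execute(query).fetchall()
print(rows)
```

This only yields one row per PO when every row for that PO carries the same vendor, as in the sample data; a PO with several different vendors would still produce several rows.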

SQL Prevent Multiple Counting

I found some similar SO questions regarding my issue and they fixed it by using sub-queries but I can't seem to apply it on my situation.
Goal
My goal is to count the animals that have been a breeder at least once in their lifespan.
I have 2 tables to keep track of when an animal became a breeder. Here's a simple look of how the tables are structured:
Animals (a)
id    name
-------------------
100   Mouse
101   Cow
102   Pig
103   Dog
Breeding History (bh)
id    animal_id   code   date
--------------------------------------------
500   100         B      2016-01-12
501   100         A      2016-01-25
502   101         B      2016-01-28
503   102         B      2016-02-02
504   100         B      2016-02-05
505   100         A      2016-02-08
In this scenario, my current query for counting works fine for both 101 | Cow and 102 | Pig since they only became a breeder (Code: B) once. The count for an animal who never became a breeder is also correct but it's not really a problem here. For an animal that became a breeder more than once in its lifespan e.g. 100 | Mouse it would be counted by the number of times it became a breeder.
Query
SELECT
    a.name,
    COUNT(CASE WHEN bh.code IN ('B') THEN 1 ELSE NULL END) AS breeder_count
FROM animals a
LEFT OUTER JOIN breeding_history bh
    ON a.id = bh.animal_id
GROUP BY a.name
Result
name    breeder_count
--------------------------
Mouse   2
Cow     1
Pig     1
Dog     0
The result shows that there are 2 mice that became a breeder when actually it was the same animal and should only be counted once.
You can use the DISTINCT keyword, so as to count a 'B' just once:
SELECT
    a.name,
    COUNT(DISTINCT CASE WHEN bh.code IN ('B') THEN 1 END) AS breeder_count
FROM animals a
LEFT OUTER JOIN breeding_history bh
    ON a.id = bh.animal_id
GROUP BY a.name
As a side note, ELSE NULL is redundant and has been removed from the CASE expression.
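To check the behaviour against the sample data, here is a minimal sketch, assuming SQLite (via Python's sqlite3) as a stand-in for the asker's database. It runs the DISTINCT version per animal and, as an additional assumption about the stated goal, a single overall count of animals that have a 'B' row. Grouping by a.id as well as a.name and the ORDER BY are added here only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE animals (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE breeding_history (id INTEGER PRIMARY KEY, animal_id INTEGER,
                               code TEXT, date TEXT);
INSERT INTO animals VALUES (100, 'Mouse'), (101, 'Cow'), (102, 'Pig'), (103, 'Dog');
INSERT INTO breeding_history VALUES
    (500, 100, 'B', '2016-01-12'), (501, 100, 'A', '2016-01-25'),
    (502, 101, 'B', '2016-01-28'), (503, 102, 'B', '2016-02-02'),
    (504, 100, 'B', '2016-02-05'), (505, 100, 'A', '2016-02-08');
""")

# The CASE yields 1 for every 'B' row and NULL otherwise, so
# COUNT(DISTINCT ...) is 1 if the animal has any 'B' row, else 0.
per_animal = conn.execute("""
SELECT a.name,
       COUNT(DISTINCT CASE WHEN bh.code = 'B' THEN 1 END) AS breeder_count
FROM animals a
LEFT OUTER JOIN breeding_history bh ON a.id = bh.animal_id
GROUP BY a.id, a.name
ORDER BY a.id
""").fetchall()

# Overall number of animals that were a breeder at least once.
total_breeders = conn.execute("""
SELECT COUNT(DISTINCT animal_id) FROM breeding_history WHERE code = 'B'
""").fetchone()[0]

print(per_animal, total_breeders)
```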

How Do I Select All Parents and the Top Previous Child Record Based on Dates in SQL Server 2008

I'm using a vendor provided database running on SQL Server 2008. There are two tables that track tests. For every record in Table A there may be zero, one or multiple records in Table B. There can also be multiple tests in Table A for the same user. The relationship is TableA.UserID = TableB.UserID. Tests taken in Table B can occur before or after Table A.
I need to select all of the records in Table A and, if test(s) from Table B have been taken by the same user before the test in Table A, data from Table B but only from the last previous child record. Both tables are structured similarly:
TABLE A
TestID INTEGER PRIMARY KEY,
UserID INTEGER,
TestDate DATE,
Score INTEGER
TABLE B
TestID INTEGER PRIMARY KEY,
UserID INTEGER,
TestDate Date,
Score INTEGER
Sample Data
TABLE A
TestID  UserID  TestDate    Score
1       100     2014-02-15  80
2       101     2014-02-20  100
3       102     2014-02-22  90
4       102     2014-03-10  70
TABLE B
TestID  UserID  TestDate    Score
1000    100     2014-02-01  55
1007    100     2014-02-05  85
1012    100     2014-02-20  95
1034    102     2014-02-12  65
1205    102     2014-03-05  75
1986    101     2014-03-10  45
What I'd like returned would be:
UserID  TestA_ID  TestADate   TestAScore  TestB_ID  TestBDate   TestBScore
100     1         2014-02-15  80          1007      2014-02-05  85
101     2         2014-02-20  100         NULL      NULL        NULL
102     3         2014-02-22  90          1034      2014-02-12  65
102     4         2014-03-10  70          1205      2014-03-05  75
I know how to get all of the previous Table B rows joined to the Table A rows by using a LEFT OUTER JOIN and filtering by date in the WHERE clause, and I know how to get the top row from Table B, but I haven't been able to work out how to get the top child record that occurs before the date of the record in Table A. Any help would be appreciated. Thanks.
You can do this using OUTER APPLY in T-SQL.
For each record in TableA, we're looking for a record in TableB for the same user but with a test date prior to the test date in TableA and we're also ordering the test in TableB to ensure we're getting the most recent test from TableB (but still prior to the test date from TableA).
SELECT
    A.[UserID],
    A.[TestID] [TestA_ID],
    A.[TestDate] [TestADate],
    A.[Score] [TestAScore],
    B.[TestB_ID],
    B.[TestBDate],
    B.[TestBScore]
FROM [TableA] A
OUTER APPLY
(
    SELECT TOP 1
        B1.[TestID] [TestB_ID],
        B1.[TestDate] [TestBDate],
        B1.[Score] [TestBScore]
    FROM [TableB] B1
    WHERE A.[UserID] = B1.[UserID]
        AND A.[TestDate] > B1.[TestDate]
    ORDER BY
        B1.[TestDate] DESC
) B
Or another option might be to use the ROW_NUMBER() window function to find the record from TableB. I have a hunch this one wouldn't perform as well because it needs to hit TableA twice, but can't be sure without running tests.
SELECT
    A.[UserID],
    A.[TestID] [TestA_ID],
    A.[TestDate] [TestADate],
    A.[Score] [TestAScore],
    B.[TestB_ID],
    B.[TestBDate],
    B.[TestBScore]
FROM [TableA] A
LEFT JOIN
(
    SELECT
        ROW_NUMBER() OVER (PARTITION BY A.[UserID], A.[TestID] ORDER BY B.[TestDate] DESC) [rn],
        A.[UserID],
        A.[TestID] [TestA_ID],
        B.[TestID] [TestB_ID],
        B.[TestDate] [TestBDate],
        B.[Score] [TestBScore]
    FROM [TableA] A
    INNER JOIN [TableB] B
        ON A.[UserID] = B.[UserID]
        AND A.[TestDate] > B.[TestDate]
) B
    ON A.[UserID] = B.[UserID]
    AND A.[TestID] = B.[TestA_ID]
    AND B.[rn] = 1
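The "latest earlier child row" logic can be checked against the sample data with a small script. The following is a minimal sketch, assuming SQLite (via Python's sqlite3) as a stand-in for SQL Server; SQLite has no OUTER APPLY or TOP, so the same idea is expressed here as a LEFT JOIN on a correlated subquery with ORDER BY ... LIMIT 1. This is a swapped-in technique for illustration, not the T-SQL above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (TestID INTEGER PRIMARY KEY, UserID INTEGER,
                     TestDate TEXT, Score INTEGER);
CREATE TABLE TableB (TestID INTEGER PRIMARY KEY, UserID INTEGER,
                     TestDate TEXT, Score INTEGER);
INSERT INTO TableA VALUES (1, 100, '2014-02-15', 80), (2, 101, '2014-02-20', 100),
                          (3, 102, '2014-02-22', 90), (4, 102, '2014-03-10', 70);
INSERT INTO TableB VALUES (1000, 100, '2014-02-01', 55), (1007, 100, '2014-02-05', 85),
                          (1012, 100, '2014-02-20', 95), (1034, 102, '2014-02-12', 65),
                          (1205, 102, '2014-03-05', 75), (1986, 101, '2014-03-10', 45);
""")

# For each TableA row, the subquery picks the TestID of the most recent
# TableB row for the same user dated strictly before the TableA test.
# ISO-formatted date strings compare correctly as text.
query = """
SELECT a.UserID, a.TestID, a.TestDate, a.Score,
       b.TestID, b.TestDate, b.Score
FROM TableA AS a
LEFT JOIN TableB AS b
       ON b.TestID = (SELECT b1.TestID
                      FROM TableB AS b1
                      WHERE b1.UserID = a.UserID
                        AND b1.TestDate < a.TestDate
                      ORDER BY b1.TestDate DESC
                      LIMIT 1)
ORDER BY a.TestID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If two Table B tests for the same user share the same date, the pick between them is arbitrary in this sketch; adding TestID as a tie-breaker in the ORDER BY would make it deterministic.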

oracle sql query to get data from two tables of similar type

I have two tables, ACTUAL and ESTIMATE, each unique on the columns (sal_id, gal_id, amount, tax).
In ACTUAL table I have
actual_id  sal_id  gal_id  process_flag  amount  tax
1          111     222     N             100     1
2          110     223     N             200     2
In ESTIMATE table I have
estimate_id  sal_id  gal_id  process_flag  amount  tax
3            111     222     N             50      1
4            123     250     N             150     2
5            212     312     Y             10      1
Now I want a final table that holds each record from the ACTUAL table and, if no record exists for a sal_id + gal_id combination in ACTUAL but one exists in ESTIMATE, the ESTIMATE record instead (with total being the sum of amount and tax).
In FINAL table
id sal_id, gal_id, actual_id, estimate_id, total
1 111 222 1 null 101 (since record exist in actual table for 111 222)
2 110 223 2 null 202 (since record exist in actual table for 110 223)
3 123 250 null 4 152 (since record not exist in actual table but estimate exist for 123 250)
(for 212 312 combination in estimate, since record already processed, no need to process again).
I am using Oracle 11g. Please help me write this logic in a single SQL query.
Thanks.
There are several ways to write this query. One way is to use join and coalesce:
select coalesce(a.sal_id, e.sal_id) as sal_id,
       coalesce(a.gal_id, e.gal_id) as gal_id,
       coalesce(a.amount + a.tax, e.amount + e.tax) as total
from actual a full outer join
     estimate e
     on a.sal_id = e.sal_id and
        a.gal_id = e.gal_id
This assumes that sal_id/gal_id provides a unique match between the tables.
Since you are using Oracle, here is perhaps a clearer way of doing it:
select sal_id, gal_id, amount + tax as total
from (select t.*,
             max(isactual) over (partition by sal_id, gal_id) as hasactual
      from (select 1 as isactual, a.*
            from actual a
            union all
            select 0 as isactual, e.*
            from estimate e
           ) t
     ) t2
where isactual = 1 or hasactual = 0
This query uses a window function to determine whether there is an actual record with the matching sal_id/gal_id. The logic is to take all actuals and then all records that have no match in the actuals.
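The "all actuals, plus estimates with no matching actual" logic can also be written with UNION ALL and NOT EXISTS, which makes it easy to fill the actual_id and estimate_id columns of the requested FINAL layout. The following is a minimal sketch, assuming SQLite (via Python's sqlite3) as a stand-in for Oracle 11g. The process_flag = 'N' filter on ESTIMATE is an assumption drawn from the question's remark that already processed records need no further processing; it is not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actual   (actual_id INTEGER, sal_id INTEGER, gal_id INTEGER,
                       process_flag TEXT, amount INTEGER, tax INTEGER);
CREATE TABLE estimate (estimate_id INTEGER, sal_id INTEGER, gal_id INTEGER,
                       process_flag TEXT, amount INTEGER, tax INTEGER);
INSERT INTO actual   VALUES (1, 111, 222, 'N', 100, 1), (2, 110, 223, 'N', 200, 2);
INSERT INTO estimate VALUES (3, 111, 222, 'N', 50, 1), (4, 123, 250, 'N', 150, 2),
                            (5, 212, 312, 'Y', 10, 1);
""")

# First branch: every actual row. Second branch: unprocessed estimate rows
# whose sal_id/gal_id pair has no counterpart in actual.
query = """
SELECT * FROM (
    SELECT a.sal_id, a.gal_id, a.actual_id, NULL AS estimate_id,
           a.amount + a.tax AS total
    FROM actual a
    UNION ALL
    SELECT e.sal_id, e.gal_id, NULL, e.estimate_id,
           e.amount + e.tax
    FROM estimate e
    WHERE e.process_flag = 'N'
      AND NOT EXISTS (SELECT 1 FROM actual a
                      WHERE a.sal_id = e.sal_id AND a.gal_id = e.gal_id)
) AS final
ORDER BY actual_id IS NULL, actual_id, estimate_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample data, the 123/250 estimate yields a total of 150 + 2 = 152, and the 212/312 estimate is skipped because its process_flag is Y.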