SQLite JOIN two tables with duplicated keys - sql

I need to join two tables on two different fields. I have table 1 like this:
key productid customer
1 100 jhon
2 109 paul
3 100 john
Table 2 has the same fields plus additional data that I must relate to the first table:
key productid customer status date ...
1 109 phil ok 04/01
2 109 paul nok 04/03
3 100 jhon nok 04/06
4 100 jhon ok 04/06
Both "key" fields are autoincrement. The problem is that my relationship fields are repeated several times, and I need to generate a one-to-one relationship, such that one row from table 2 is related ONLY ONCE to a row in table 1.
I did a left join on (customer=customer and productid=productid), but the relationship came out duplicated: a row from table 2 was related many times to rows of table 1.
To clarify things...
I have to cross-check both tables. Table 1 is loaded from an XLS report; table 2 is data from a database that reflects customer transactions with many status fields. I have to check whether a row from the XLS exists in the database and then load the additional status data. I must also produce a report of the XLS rows that have no corresponding data in the database.
How can I accomplish this JOIN? Is this possible with SQL alone?

You can accomplish this in MS SQL using the sql below. Not sure if SQLite supports this.
select a.*, c.*
from table2 a,
     ( select min(key) key, productid, customer
       from table1
       group by productid, customer
     ) b,
     table1 c
where a.productid = b.productid
  and a.customer = b.customer
  and b.key = c.key
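On the SQLite question: a minimal sketch, assuming the sample rows from the question and Python's built-in sqlite3 module as the driver. It rewrites the same idea with explicit JOINs; the alias first_key is only an illustrative name. Each table 2 row is tied to the lowest-key table 1 row that shares its productid and customer, so no table 2 row is repeated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (key INTEGER PRIMARY KEY AUTOINCREMENT,
                     productid INTEGER, customer TEXT);
CREATE TABLE table2 (key INTEGER PRIMARY KEY AUTOINCREMENT,
                     productid INTEGER, customer TEXT,
                     status TEXT, date TEXT);
INSERT INTO table1 (productid, customer)
VALUES (100, 'jhon'), (109, 'paul'), (100, 'john');
INSERT INTO table2 (productid, customer, status, date)
VALUES (109, 'phil', 'ok', '04/01'), (109, 'paul', 'nok', '04/03'),
       (100, 'jhon', 'nok', '04/06'), (100, 'jhon', 'ok', '04/06');
""")

# b keeps one representative table1 row (the lowest key) per
# (productid, customer) pair, so a table2 row can match at most once.
rows = conn.execute("""
SELECT a.key, a.status, c.key
FROM table2 AS a
JOIN (SELECT MIN(key) AS first_key, productid, customer
      FROM table1
      GROUP BY productid, customer) AS b
  ON a.productid = b.productid AND a.customer = b.customer
JOIN table1 AS c ON c.key = b.first_key
ORDER BY a.key
""").fetchall()

for row in rows:
    print(row)  # (table2 key, status, matched table1 key)
```

Note that this still lets several table 2 rows point at the same table 1 row; it only guarantees that a table 2 row is not duplicated.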

One way to understand this would be to figure out what each table represents exactly. Both tables seem to represent the same thing, with a row representing what you might call a purchase. Why are there two separate tables, then? Perhaps the second table goes into more depth about each purchase? Like jhon bought product 100, and it was 'nok' first and then 'ok'? If so, then the key (what makes a row unique) for the second table would be all three fields.
You still join on only the two fields that match, but you can't expect uniqueness if there are two rows with the same unique keys.
It helps sometimes to create additional indexes on a table, to see what is truly unique.
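If a strict one-to-one pairing is what is needed, one possible approach (not taken from the answers above) is to number the duplicates within each (productid, customer) group and join on that number as well. The sketch below assumes the sample rows from the question, Python's sqlite3 module, and that "pair the n-th duplicate with the n-th duplicate, ordered by key" is an acceptable business rule; the names r1, r2 and occ are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (key INTEGER PRIMARY KEY, productid INTEGER, customer TEXT);
CREATE TABLE table2 (key INTEGER PRIMARY KEY, productid INTEGER, customer TEXT,
                     status TEXT);
INSERT INTO table1 VALUES (1, 100, 'jhon'), (2, 109, 'paul'), (3, 100, 'john');
INSERT INTO table2 VALUES (1, 109, 'phil', 'ok'), (2, 109, 'paul', 'nok'),
                          (3, 100, 'jhon', 'nok'), (4, 100, 'jhon', 'ok');
""")

# occ = position of the row among rows with the same (productid, customer),
# counted with a correlated subquery so no window functions are required.
rows = conn.execute("""
WITH r1 AS (
  SELECT t.key, t.productid, t.customer,
         (SELECT COUNT(*) FROM table1 x
          WHERE x.productid = t.productid AND x.customer = t.customer
            AND x.key <= t.key) AS occ
  FROM table1 t),
r2 AS (
  SELECT t.key, t.productid, t.customer, t.status,
         (SELECT COUNT(*) FROM table2 x
          WHERE x.productid = t.productid AND x.customer = t.customer
            AND x.key <= t.key) AS occ
  FROM table2 t)
SELECT r1.key, r1.productid, r1.customer, r2.key, r2.status
FROM r1
LEFT JOIN r2 ON r2.productid = r1.productid
            AND r2.customer = r1.customer
            AND r2.occ = r1.occ
ORDER BY r1.key
""").fetchall()

# Rows of table1 (the XLS data) with no partner in table2.
unmatched = [r for r in rows if r[3] is None]
print(rows)
print(unmatched)
```

Because the join includes occ, a table 2 row can be used by at most one table 1 row, and the LEFT JOIN keeps the table 1 rows that have no partner so they can go into the report.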

Related

Joining the rows of one table to the columns of another in SQL

I have the following two tables in SQLite:
transactions:
ID | Department
1 | IT
2 | Customer Service
3 | Cleaning
standards:
IT | Customer Service | Cleaning
9.12 | 17.8 | 24.86
I want to join these two tables so the numbers in the second table become rows in my first table based on the matching value in the second table. The resulting table would look like this:
ID | Department | Standard
1 | IT | 9.12
2 | Customer Service | 17.8
3 | Cleaning | 24.86
How can I join these, as I'm using the rows of one table and the columns of another?
You need a CROSS join of the tables and a CASE expression to pick the appropriate standard:
SELECT t.*,
       CASE t.Department
         WHEN 'IT' THEN s.IT
         WHEN 'Customer Service' THEN s.`Customer Service`
         WHEN 'Cleaning' THEN s.Cleaning
       END AS Standard
FROM transactions t CROSS JOIN standards s;
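A runnable sketch of the CROSS JOIN plus CASE approach, assuming the sample data from the question and Python's sqlite3 module. The column types are guesses, and the standard double-quote form is used for the column name that contains a space.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (ID INTEGER, Department TEXT);
INSERT INTO transactions VALUES (1, 'IT'), (2, 'Customer Service'), (3, 'Cleaning');
CREATE TABLE standards (IT REAL, "Customer Service" REAL, Cleaning REAL);
""")
conn.execute("INSERT INTO standards VALUES (?, ?, ?)", (9.12, 17.8, 24.86))

# standards has a single row, so the cross join simply attaches that row
# to every transaction; CASE then picks the column named by Department.
rows = conn.execute("""
SELECT t.ID, t.Department,
       CASE t.Department
         WHEN 'IT' THEN s.IT
         WHEN 'Customer Service' THEN s."Customer Service"
         WHEN 'Cleaning' THEN s.Cleaning
       END AS Standard
FROM transactions t CROSS JOIN standards s
ORDER BY t.ID
""").fetchall()

for row in rows:
    print(row)
```

The CASE branches have to be kept in sync with the columns of standards by hand; a department without a branch gets NULL as its Standard.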

Left Join misbehaving - VBA SQL

Table 1: Customer Info Data (has ~50K records)
Table 2: Sales by customer ID (has ~25K records)
When I perform a left join of the Sales data from Table 2 onto Table 1 based on the Customer ID, the output has a handful of records (~500) whose sales quantity is increased by one unit. For example, if the sales quantity for customer #1 is 200, the sales quantity I get in my joined output is 201. Note again that this affects only a handful of records; the majority of the data is joined correctly.
The SQL query is a pretty standard one:
SELECT [Table1$].[ID], Name, Volume, Amount
FROM [Table1$] LEFT JOIN [Table2$] ON [Table1$].[ID] = [Table2$].[ID]
What is weird is that this error is only for a few records and it is changing only by a unit for all of them. What do you think could be the potential reason here? Note that I run this SQL query from VBA.
Edit:
Maybe if I add an image, it would help, I have masked my data:
Table1, blue colored column is the joined column, I have filtered for the ID where there is a problem:
Table2, source of the joined column, as you can see it is 20, but the value that has been joined with table 1 is 21. Interestingly, there is no ID with value 21 in table2
There is certainly a space or some other invisible character that makes seemingly identical IDs different. To make sure, do a SELECT DISTINCT on the ID column of the sales table.
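A minimal sketch of that kind of check. It runs against SQLite through Python's sqlite3 module rather than the Excel/VBA setup from the question, and it assumes the IDs are stored as text; the table name sales and the sample values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (ID TEXT, Volume INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("A1", 20), ("A1 ", 1), ("B2", 5)],  # note the trailing space in 'A1 '
)

# IDs that look the same once whitespace is trimmed but are stored
# as different strings show up with more than one variant.
suspects = conn.execute("""
SELECT TRIM(ID) AS clean_id, COUNT(DISTINCT ID) AS variants
FROM sales
GROUP BY TRIM(ID)
HAVING COUNT(DISTINCT ID) > 1
""").fetchall()

print(suspects)
```

Any ID listed here has near-duplicate spellings in the sales table, which is worth ruling out before looking for other causes.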

Query to identify the parent/child relationship between two big tables

I have two tables. The first one contains laboratory result header records, one for each order. It has about 10 million rows in it that contain one of about 6,000 unique ProcedureIDs...
OrderID
ResultID
ProcedureID
ProcedureName
OrderDate
ResultDate
PatientID
ProviderID
The second table contains the detailed result record(s) for each order in the first table. It has about 80 million rows and contains about 28,000 child components that are associated with the 6,000 procedure IDs from the first table.
ResultComponentID
ResultID (foreign key to first table)
ComponentID
ComponentName
ResultValueType
ResultValue
ResultUnits
ResultingLab
I have a subset (n=135) of procedure IDs for which I need a list of associated child component IDs. Here is a simple example...
Table 1
1000|1|CBC|Complete Blood Count|8/1/2019 08:00:00|8/2/2019 09:27:00|9999|8888
1001|2|CA|Calcium|8/1/2019 08:01:00|8/2/2019 09:28:00|9999|8888
Table 2
2543|1|RBC|Red Blood Cell Count|NM|60|Million/uL|OurLab
2544|1|PLT|Platelet Count|NM|60|Thou/cmm|OurLab
2545|2|RBC|Red Blood Cell Count|NM|60|Million/uL|OurLab
2546|1|CA|Calcium|NM|40|g/dl|OurLab
In this example, if CBC was in my subset and CA wasn't, I would expect two rows back...
CBC|Complete Blood Count|RBC|Red Blood Cell Count
CBC|Complete Blood Count|PLT|Platelet Count
Even if I had two million CBCs in the DB, I only need one set of CBC parent/child rows.
If I were using a scripting tool, I would use a for each loop to iterate through the subset and grab the top 1 of each ProcedureID and use it to get the associated component children.
If I really wanted to go crazy with this, I would not assume that CBC only had two components, as some labs might send us two and some might send us seven.
Any advice on how to get the list of parent/child associations?
For the simple query, sometimes there is no way around just writing out all 135 ids if you can't find a neat way to get that subset out of a query or store it in a temp table.
For the uniqueness requirement, just add a 'group by':
Select t1.ProcedureId, t2.ComponentId
from Table1 t1
join Table2 t2 on t2.ResultId = t1.ResultId
where t1.ProcedureId in (
    'CBC',
    'etc' -- 135 times...
)
group by t1.ProcedureId, t2.ComponentId
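A sketch of the same idea with the subset stored in a small table instead of a long IN list, and with the names included in the output. It uses SQLite through Python's sqlite3 module and simplified, made-up sample rows; the table names orders, components and wanted are hypothetical stand-ins for the two tables in the question and the 135-ID subset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (OrderID INTEGER, ResultID INTEGER,
                     ProcedureID TEXT, ProcedureName TEXT);
CREATE TABLE components (ResultID INTEGER, ComponentID TEXT, ComponentName TEXT);
CREATE TABLE wanted (ProcedureID TEXT PRIMARY KEY);  -- the subset of interest

INSERT INTO orders VALUES
  (1000, 1, 'CBC', 'Complete Blood Count'),
  (1001, 2, 'CA',  'Calcium'),
  (1002, 3, 'CBC', 'Complete Blood Count');
INSERT INTO components VALUES
  (1, 'RBC', 'Red Blood Cell Count'), (1, 'PLT', 'Platelet Count'),
  (2, 'CA',  'Calcium'),
  (3, 'RBC', 'Red Blood Cell Count'), (3, 'PLT', 'Platelet Count');
INSERT INTO wanted VALUES ('CBC');
""")

# DISTINCT collapses the repeated CBC orders into one set of
# parent/child pairs, however many components each lab sends.
pairs = conn.execute("""
SELECT DISTINCT o.ProcedureID, o.ProcedureName, c.ComponentID, c.ComponentName
FROM wanted w
JOIN orders o ON o.ProcedureID = w.ProcedureID
JOIN components c ON c.ResultID = o.ResultID
ORDER BY o.ProcedureID, c.ComponentID
""").fetchall()

for pair in pairs:
    print(pair)
```

On tables with tens of millions of rows, whether this performs acceptably depends on the indexes on ProcedureID and ResultID, which the sketch does not address.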

SQL Query to fetch information based on one or more condition. Getting combinations instead of exact number

I have two tables. Table 1 has about 750,000 rows and table 2 has 4 million rows. Table two has an extra ID field in which I am interested, so I want to write a query that will check if the 750,000 table 1 records exist in table 2. For all those rows in table 1 that exist in table 2, I want the respective ID based on same SSN. I tried the following query:
SELECT distinct b.UID, a.*
FROM [Analysis].[dbo].[Table1] A, [Proteus_8_2].dbo.Table2 B
where a.ssn = b.ssn
Instead of getting 750,000 rows in the output, I am getting 5.4 million records. Where am I going wrong?
If b.UID is unique in table two, DISTINCT removes nothing, so your select returns every matching row.
Also, if SSN is not unique in table one, you can get a higher row count than the total row count of table 2.
You need to consider what you want from table 2 again.
EDIT
You can try this to return distinct combinations of ssn and uid when ssn is found in table 2 provided that ssn and uid have a cardinality of 1:1, i.e., every unique ssn has a single unique uid.
select distinct a.ssn, b.[UID]
from [Analysis].[dbo].[Table1] a
cross apply
    ( select top 1 [uid]
      from [Proteus_8_2].[dbo].[Table2]
      where ssn = a.ssn ) b
where b.[UID] is not null
Try with LEFT JOIN
SELECT distinct b.UID, a.*
FROM [Analysis].[dbo].[Table1] A LEFT JOIN [Proteus_8_2].dbo.Table2 B
on a.ssn = b.ssn
Since the order detail table is in a one-to-many relationship with the order table, that is the expected result of any join. If you want something different, you need to define the business rule that tells us how to select only one record from the order detail table. You cannot effectively write SQL without understanding the business rules behind what you are trying to achieve. You should never just willy-nilly select one record out of the many; you need to understand which one you want.
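To make the row multiplication concrete, here is a small sketch using SQLite through Python's sqlite3 module with made-up data. The "lowest UID per SSN" rule in the second query is purely an assumed business rule for illustration, not something the question specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ssn TEXT, name TEXT);
CREATE TABLE Table2 (ssn TEXT, UID TEXT);
INSERT INTO Table1 VALUES ('111', 'ann'), ('222', 'bob'), ('333', 'cy');
INSERT INTO Table2 VALUES ('111', 'U1'), ('111', 'U2'), ('111', 'U4'),
                          ('222', 'U3');
""")

# Plain join: every Table2 row with a matching ssn produces an output row,
# so ssn '111' appears three times.
fanned_out = conn.execute("""
SELECT b.UID, a.* FROM Table1 a JOIN Table2 b ON a.ssn = b.ssn
""").fetchall()

# Reduce Table2 to one row per ssn first (assumed rule: lowest UID),
# then join; each Table1 row now appears at most once.
one_per_ssn = conn.execute("""
SELECT a.ssn, a.name, b.UID
FROM Table1 a
JOIN (SELECT ssn, MIN(UID) AS UID FROM Table2 GROUP BY ssn) b
  ON b.ssn = a.ssn
ORDER BY a.ssn
""").fetchall()

print(len(fanned_out))
print(one_per_ssn)
```

If a different row should win (the newest, the active one, and so on), only the derived table needs to change.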

SQL queries with different results

I have two tables that I try to join over one field and it gives me different results in two queries that should give same results. Queries are:
SELECT * FROM tblCustomer tca
WHERE tca.PhoneNumber IN(
SELECT ts.SubscriptionNumber FROM sub.tblSubscription ts
WHERE ts.ServiceTypeID=4
AND ts.SourceID=-1
)
and
SELECT tca.*
FROM sub.tblSubscription ts
inner JOIN tblCustomer tca
ON ts.SubscriptionNumber = tca.PhoneNumber
WHERE ts.ServiceTypeID = 4
AND ts.SourceID = -1
How is this possible?
I'm assuming a customer can have multiple subscriptions, right? Let's assume you have 5 customers, each with 2 subscriptions...
When doing a SELECT ... FROM Customer WHERE IN (Subscription), you will receive 5 customer records, because each of those 5 customers is in fact in the subscription table, even though the subscription table has 10 records. You are asking the database for the data from one table where the value of one of its fields exists in another table. So it returns each matching row of the FROM table once, irrespective of the amount of data in the WHERE IN table.
On the other hand, INNER JOINing the Customer table with the subscription table will return 5 customers x 2 subscriptions each = 10 records. By JOINing the tables you are asking the database for ALL the data in each table, where the data is matched up against specific fields.
So yes, the 2 queries will definitely give you different results.
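The difference is easy to reproduce. The sketch below uses SQLite through Python's sqlite3 module with made-up rows; the table and column names follow the question, except that the sub schema prefix is dropped and CustomerID is an assumed column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCustomer (CustomerID INTEGER, PhoneNumber TEXT);
CREATE TABLE tblSubscription (SubscriptionNumber TEXT,
                              ServiceTypeID INTEGER, SourceID INTEGER);
INSERT INTO tblCustomer VALUES (1, '555-0001'), (2, '555-0002');
-- two qualifying subscriptions per customer
INSERT INTO tblSubscription VALUES
  ('555-0001', 4, -1), ('555-0001', 4, -1),
  ('555-0002', 4, -1), ('555-0002', 4, -1);
""")

# IN only tests membership, so each customer row is returned once.
in_rows = conn.execute("""
SELECT * FROM tblCustomer tca
WHERE tca.PhoneNumber IN (
  SELECT ts.SubscriptionNumber FROM tblSubscription ts
  WHERE ts.ServiceTypeID = 4 AND ts.SourceID = -1)
""").fetchall()

# The join returns one row per (customer, subscription) match.
join_rows = conn.execute("""
SELECT tca.*
FROM tblSubscription ts
INNER JOIN tblCustomer tca ON ts.SubscriptionNumber = tca.PhoneNumber
WHERE ts.ServiceTypeID = 4 AND ts.SourceID = -1
""").fetchall()

print(len(in_rows), len(join_rows))
```

Adding DISTINCT to the join query would bring its result back in line with the IN version for this data.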