Query to identify the parent/child relationship between two big tables - sql

I have two tables. The first one contains laboratory result header records, one for each order. It has about 10 million rows, each containing one of about 6,000 unique ProcedureIDs:
OrderID
ResultID
ProcedureID
ProcedureName
OrderDate
ResultDate
PatientID
ProviderID
The second table contains the detailed result record(s) for each order in the first table. It has about 80 million rows and contains about 28,000 child components that are associated with the 6,000 procedure IDs from the first table.
ResultComponentID
ResultID (foreign key to first table)
ComponentID
ComponentName
ResultValueType
ResultValue
ResultUnits
ResultingLab
I have a subset (n=135) of procedure IDs for which I need a list of associated child component IDs. Here is a simple example:
Table 1
1000|1|CBC|Complete Blood Count|8/1/2019 08:00:00|8/2/2019 09:27:00|9999|8888
1001|2|CA|Calcium|8/1/2019 08:01:00|8/2/2019 09:28:00|9999|8888
Table 2
2543|1|RBC|Red Blood Cell Count|NM|60|Million/uL|OurLab
2544|1|PLT|Platelet Count|NM|60|Thou/cmm|OurLab
2545|2|RBC|Red Blood Cell Count|NM|60|Million/uL|OurLab
2546|1|CA|Calcium|NM|40|g/dl|OurLab
In this example, if CBC was in my subset and CA wasn't, I would expect two rows back...
CBC|Complete Blood Count|RBC|Red Blood Cell Count
CBC|Complete Blood Count|PLT|Platelet Count
Even if I had two million CBCs in the DB, I only need one set of CBC parent/child rows.
If I were using a scripting tool, I would use a for-each loop to iterate through the subset, grab the top 1 row for each ProcedureID, and use it to get the associated child components.
If I really wanted to go crazy with this, I would not assume that CBC only had two components, as some labs might send us two and some might send us seven.
Any advice on how to get the list of parent/child associations?

For the simple query, sometimes there is no way around writing out all 135 IDs, unless you can derive that subset from a query or load it into a temp table.
For the uniqueness requirement, just add a GROUP BY:
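SELECT t1.ProcedureID, t1.ProcedureName, t2.ComponentID, t2.ComponentName
FROM Table1 t1
JOIN Table2 t2 ON t2.ResultID = t1.ResultID
WHERE t1.ProcedureID IN (
    'CBC',
    'etc' -- ...all 135 IDs, comma-separated (no comma after the last one)
)
GROUP BY t1.ProcedureID, t1.ProcedureName, t2.ComponentID, t2.ComponentName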
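Here is a rough, self-contained check of the GROUP BY approach using Python's built-in sqlite3 module. It assumes trimmed-down versions of Table1 and Table2 that keep only the columns the query needs. The sample rows are hypothetical and only loosely modeled on the example above, with a second CBC order added. The subset goes into a temp table (the name `subset` is hypothetical) instead of a long IN list. SQL Server spells temp tables differently (`#subset`), so treat this as an illustration of the query shape, not a drop-in script.

```python
import sqlite3


def build_sample():
    """In-memory database with hypothetical rows shaped like Table1/Table2."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Table1 (OrderID INTEGER, ResultID INTEGER,
                             ProcedureID TEXT, ProcedureName TEXT);
        CREATE TABLE Table2 (ResultComponentID INTEGER, ResultID INTEGER,
                             ComponentID TEXT, ComponentName TEXT);
        INSERT INTO Table1 VALUES
            (1000, 1, 'CBC', 'Complete Blood Count'),
            (1001, 2, 'CA',  'Calcium'),
            (1002, 3, 'CBC', 'Complete Blood Count');  -- a second CBC order
        INSERT INTO Table2 VALUES
            (2543, 1, 'RBC', 'Red Blood Cell Count'),
            (2544, 1, 'PLT', 'Platelet Count'),
            (2545, 2, 'CA',  'Calcium'),
            (2546, 3, 'RBC', 'Red Blood Cell Count'),
            (2547, 3, 'PLT', 'Platelet Count');
    """)
    return conn


def component_map(conn, procedure_ids):
    """Distinct parent/child rows for the requested ProcedureIDs."""
    # Put the subset in a temp table and join to it instead of a long IN list.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS subset (ProcedureID TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM subset")
    conn.executemany("INSERT INTO subset VALUES (?)", [(p,) for p in procedure_ids])
    # GROUP BY collapses repeated orders of the same procedure into one set of rows.
    return conn.execute("""
        SELECT t1.ProcedureID, t1.ProcedureName, t2.ComponentID, t2.ComponentName
        FROM Table1 t1
        JOIN subset s ON s.ProcedureID = t1.ProcedureID
        JOIN Table2 t2 ON t2.ResultID = t1.ResultID
        GROUP BY t1.ProcedureID, t1.ProcedureName, t2.ComponentID, t2.ComponentName
        ORDER BY t1.ProcedureID, t2.ComponentID
    """).fetchall()


if __name__ == "__main__":
    for row in component_map(build_sample(), ["CBC"]):
        print("|".join(row))
```

CBC appears on two orders, but the result still has one row per component, matching the expected output in the question (sorted by ComponentID here).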

Related

SQLITE select random N rows

I've got two tables which both have hundreds of millions of rows.
One is PAPER. Each row is unique with a column called "paper_id" as its key.
The other is PFOS. Each row has two columns, "paper_id" and "field_id".
One paper may belong to several fields.
I need to select N random rows from each field_id group in PFOS, then get the corresponding papers from PAPER by the selected paper_id values.
This is my SQL:
SELECT paper_id FROM PFOS WHERE field_id IN (xxx) ORDER BY random() LIMIT N -- or WHERE field_id = xxx
Questions
How could I make it faster?
When I use LIMIT, I get fewer than N rows. Did I make a mistake in my SQL?
PAPER
paper_id*,title...
PFOS
paper_id,field_id
I would appreciate any suggestions.
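A single LIMIT returns N rows overall, not N per group. One common way to get N random rows per group is a window function: number the rows within each field_id in random order, then keep those numbered N or less. The sketch below uses Python's sqlite3 module with tiny hypothetical PAPER and PFOS tables, and it needs SQLite 3.25 or later for ROW_NUMBER(). It also shows one way to end up with fewer than N rows: a group with fewer than N members simply returns all of them. On hundreds of millions of rows this still sorts every group. Restricting PFOS to the field_ids you need inside the CTE and indexing PFOS(field_id) may help, but treat those as things to test rather than guaranteed fixes.

```python
import sqlite3
from collections import Counter


def build_sample():
    """Hypothetical PAPER/PFOS tables; paper 4 belongs to two fields."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE PAPER (paper_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE PFOS (paper_id INTEGER, field_id INTEGER);
        CREATE INDEX pfos_field ON PFOS(field_id);
        INSERT INTO PAPER VALUES (1, 'p1'), (2, 'p2'), (3, 'p3'), (4, 'p4'), (5, 'p5');
        INSERT INTO PFOS VALUES (1, 10), (2, 10), (3, 10), (4, 10), (4, 20), (5, 20);
    """)
    return conn


def random_papers_per_field(conn, n):
    """Up to n randomly chosen papers for every field_id (needs SQLite >= 3.25)."""
    return conn.execute("""
        WITH ranked AS (
            SELECT paper_id, field_id,
                   ROW_NUMBER() OVER (PARTITION BY field_id ORDER BY random()) AS rn
            FROM PFOS
        )
        SELECT r.field_id, p.paper_id, p.title
        FROM ranked AS r
        JOIN PAPER AS p ON p.paper_id = r.paper_id
        WHERE r.rn <= ?
        ORDER BY r.field_id
    """, (n,)).fetchall()


if __name__ == "__main__":
    rows = random_papers_per_field(build_sample(), 3)
    # Field 10 has four papers, so three are picked; field 20 only has two.
    print(Counter(field for field, _, _ in rows))
```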

How to do three left joins in SQL

I know how to do a left join, but when I add two more to the query, it gets a bit weird. Here is the task. The original table, list20192, has 93 columns and 85,353 rows. However, the end user is not OK with it having no descriptive fields such as titles or descriptions. One field, naics, is a six-digit code, and they want an industry title to go along with it. To get the title that corresponds to naics code 551114, one has to go to the indcodes table, which has the following structure:
state char(2)
codetype char(2)
code char(6)
codetitle varchar (115)
GEOG Table
state char (2)
areatype char(2)
area (6)
areaname varchar (254)
areadesc varchar (254)
I am using the following query to attach the descriptive counterparts of the fields in list20192 to the end. The end result should be 96 columns and 85,353 rows. This query works but produces 6,800,277 rows, which is far too many.
Select list20192.*, geog.area, sizeclas.sizedesc
from dbo.list20192
left join dbo.indcodes on list20192.naics = indcodes.code
left join dbo.sizeclas on list20192.sizeclass = sizeclas.sizeclass
left join dbo.geog on list20192.area = geog.area
where list20192.year = '2019' and list20192.qtr = '2'
The end result should look something like this:
naics | codetitle | sizeclass | sizedesc | area | areaname
541114 | Management | 22 | 400 to 499 employees | 000025 | Pershing
Any ideas on how to adjust this query so it doesn't return so many rows? Looking at the results, it appears to return a row for every matching area value in geog, with no regard to state. I am in state 32, and my original table list20192 only deals with state 32. Across states, many area values are identical. For instance, area 000003 in Nevada is Clark, while 000003 in South Dakota is Beadle County.
The proliferation of rows is undoubtedly being caused by having multiple rows in the various joined tables which match the requested join-key value. A query will produce an output row for every matching combination: for example, a join of a table with 2 matching keys on one side and 5 on the other will produce 2*5 = 10 result rows.
One easy way to find the culprit is to add the JOIN clauses one at a time and watch how the row count changes.
Without knowing, of course, what your data looks like, I'd probably finger that geog table. (The other two just look like lookup tables to me.)
The DISTINCT clause can filter out duplicates, but note that this can be expensive. Maybe you need to make the join conditions for one or more of these tables more specific . . .
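This small sketch uses Python's sqlite3 module to show the row multiplication and the fix side by side. It uses hypothetical, trimmed-down list20192 and geog tables and assumes list20192 carries a state column. If it doesn't, putting geog.state = '32' in the ON clause (not the WHERE clause, so the LEFT JOIN is preserved) should have the same effect. Area codes repeat across states, so joining on area alone matches one geog row per state that uses that code. Adding state to the join condition brings it back to a single match.

```python
import sqlite3


def build_sample():
    """Hypothetical rows: area 000003 exists in two states, as in the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE list20192 (state TEXT, area TEXT, naics TEXT);
        CREATE TABLE geog (state TEXT, area TEXT, areaname TEXT);
        INSERT INTO list20192 VALUES ('32', '000003', '541114'),
                                     ('32', '000025', '541114');
        INSERT INTO geog VALUES ('32', '000003', 'Clark'),
                                ('46', '000003', 'Beadle County'),
                                ('32', '000025', 'Pershing');
    """)
    return conn


# Joining on area alone: every state's copy of the area code matches.
AREA_ONLY = """
    SELECT l.*, g.areaname
    FROM list20192 AS l
    LEFT JOIN geog AS g ON l.area = g.area
"""

# Joining on state and area: at most one geog row per list20192 row here.
STATE_AND_AREA = """
    SELECT l.*, g.areaname
    FROM list20192 AS l
    LEFT JOIN geog AS g ON l.area = g.area AND l.state = g.state
"""

if __name__ == "__main__":
    conn = build_sample()
    print(len(conn.execute(AREA_ONLY).fetchall()))       # 3 rows for 2 source rows
    print(len(conn.execute(STATE_AND_AREA).fetchall()))  # 2 rows
```

The same reasoning may apply to indcodes, which also has state and codetype columns. If it holds more than one row per code, that join needs the extra conditions too.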

Return query results where two fields are different (Access 2010)

I'm working in a large Access database (Access 2010) and am trying to return records where two locations are different.
In my case, I have a large number of birds that have been observed on multiple dates and potentially on different islands. Each bird has a unique BirdID and also an actual physical identifier (which, unfortunately, may have changed over time). [I'm going to try addressing the changing physical identifier issue later.] I currently want to query individual birds where one or more of their observations is different from "IslandAlpha" (the first island where they were observed). Something along the lines of a criterion for BirdID: WHERE IslandID [doesn't equal] IslandAlpha.
I then need a separate query to find birds whose observations ALL equal the island where they were first observed, i.e., where IslandID = IslandAlpha.
I'm new to Access, so let me know if you need more information on how my tables/relationships are set up! Thanks in advance.
Assuming the following tables:
Birds table in which all individual birds have records with a unique BirdID and IslandAlpha.
Sightings table in which individual sightings are recorded, including IslandID.
Your first query would look something like this:
SELECT *
FROM Birds
INNER JOIN Sightings ON Birds.BirdID=Sightings.BirdID
WHERE Sightings.IslandID <> Birds.IslandAlpha
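Your second query would be the same but with = instead of <> in the WHERE clause. Note that this returns every sighting that matches IslandAlpha. On its own, it does not exclude birds that also have a mismatching sighting.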
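If the second query should list only birds whose sightings all match IslandAlpha, one option is a NOT EXISTS subquery that rules out any mismatching sighting. The sketch below runs both queries through Python's sqlite3 module against the Birds/Sightings layout assumed above, using hypothetical sample rows. The SQL is written for SQLite. Access SQL also has EXISTS subqueries, but check this exact text in Access before relying on it.

```python
import sqlite3


def build_sample():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Birds (BirdID INTEGER PRIMARY KEY, IslandAlpha TEXT);
        CREATE TABLE Sightings (SightingID INTEGER PRIMARY KEY,
                                BirdID INTEGER, IslandID TEXT);
        INSERT INTO Birds VALUES (1, 'A'), (2, 'B'), (3, 'A');
        -- Bird 1 stays on A; bird 2 is also seen on C; bird 3 has no sightings.
        INSERT INTO Sightings (BirdID, IslandID) VALUES (1, 'A'), (1, 'A'),
                                                        (2, 'B'), (2, 'C');
    """)
    return conn


def birds_that_moved(conn):
    """Birds with at least one sighting away from IslandAlpha."""
    return [r[0] for r in conn.execute("""
        SELECT DISTINCT b.BirdID
        FROM Birds AS b
        JOIN Sightings AS s ON s.BirdID = b.BirdID
        WHERE s.IslandID <> b.IslandAlpha
        ORDER BY b.BirdID
    """)]


def birds_that_stayed(conn):
    """Birds that have sightings, all of them on IslandAlpha."""
    return [r[0] for r in conn.execute("""
        SELECT b.BirdID
        FROM Birds AS b
        WHERE EXISTS (SELECT 1 FROM Sightings AS s WHERE s.BirdID = b.BirdID)
          AND NOT EXISTS (SELECT 1 FROM Sightings AS s
                          WHERE s.BirdID = b.BirdID
                            AND s.IslandID <> b.IslandAlpha)
        ORDER BY b.BirdID
    """)]


if __name__ == "__main__":
    conn = build_sample()
    print(birds_that_moved(conn), birds_that_stayed(conn))  # [2] [1]
```

The EXISTS check keeps birds with no sightings at all (bird 3 here) out of the "stayed" list. Drop it if those birds should count as having stayed.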
Please provide us information about the tables and columns you are using.
I will presume you are asking this question because a simple join of tables filtering on IslandAlpha <> ObsLoc is not possible, since IslandAlpha is derived from the first observation record for each bird. Pulling the first observation record for each bird requires a nested query, which in turn needs a unique record identifier in Observations (an AutoNumber field should serve). Assuming there is an observation date/time field, consider:
SELECT * FROM Observations WHERE ObsID IN
(SELECT TOP 1 ObsID FROM Observations AS Dupe
WHERE Dupe.ObsBirdID = Observations.ObsBirdID ORDER BY Dupe.ObsDateTime, Dupe.ObsID);
Save that query (as Query1 here) and use it in subsequent queries:
SELECT * FROM Observations
INNER JOIN Query1 ON Observations.ObsBirdID = Query1.ObsBirdID
WHERE Observations.ObsLocID <> Query1.ObsLocID;
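For readers outside Access, here is the same first-observation idea sketched with Python's sqlite3 module. SQLite has no TOP, so the correlated subquery uses ORDER BY ... LIMIT 1 instead. Query1 becomes a CTE named first_obs (a hypothetical name). The Observations columns follow the answer above, and the rows are made up.

```python
import sqlite3


def build_sample():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Observations (ObsID INTEGER PRIMARY KEY, ObsBirdID INTEGER,
                                   ObsDateTime TEXT, ObsLocID TEXT);
        INSERT INTO Observations VALUES
            (1, 10, '2019-01-01 08:00', 'A'),
            (2, 10, '2019-02-01 08:00', 'B'),
            (3, 11, '2019-01-05 08:00', 'C'),
            (4, 11, '2019-03-01 08:00', 'C'),
            (5, 10, '2019-03-01 08:00', 'A');
    """)
    return conn


# Equivalent of Query1: each bird's earliest observation (ObsID breaks ties).
FIRST_OBS = """
    SELECT * FROM Observations AS o
    WHERE o.ObsID = (SELECT d.ObsID FROM Observations AS d
                     WHERE d.ObsBirdID = o.ObsBirdID
                     ORDER BY d.ObsDateTime, d.ObsID
                     LIMIT 1)
"""

# Observations made somewhere other than the bird's first location.
MOVED = f"""
    WITH first_obs AS ({FIRST_OBS})
    SELECT o.* FROM Observations AS o
    JOIN first_obs AS f ON o.ObsBirdID = f.ObsBirdID
    WHERE o.ObsLocID <> f.ObsLocID
    ORDER BY o.ObsID
"""

if __name__ == "__main__":
    conn = build_sample()
    print([row[0] for row in conn.execute(FIRST_OBS + " ORDER BY o.ObsID")])  # [1, 3]
    print([row[0] for row in conn.execute(MOVED)])                            # [2]
```

The date/time values are stored as ISO-style text so that sorting them as strings matches chronological order. A real date/time column would not need that trick.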

Create groups of items from separate tables

I have multiple models (events, chores, bills, and lists), which each have their own table. I want to be able to group any of these instances together, for example group an event with a list of items to buy for it, and a bill for the cost.
I was thinking each table could have a group_id, and I could get the other items in a group by merging records from each table where the group_id equals the item's group_id:
group = Events.where(group_id: self.group_id).to_a + Bills.where(group_id: self.group_id).to_a ...
But that seems like a bad way to do it.
Another way I thought to do it was to use a polymorphic relation between two of the items
tag
item_1_id | item_1_type | item_2_id | item_2_type
----------+-------------+-----------+------------
But the example above (a group of three different items) would require six records, two for each pair, for each item to know about all the other items in the group.
Is there a way to do this with joins, or should I redesign some of the tables?
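One common redesign, sketched here rather than prescribed, is a single membership table: one group row, plus one (group_id, item_type, item_id) row per item. A group of three items then needs three membership rows instead of six pairwise ones, and finding an item's group-mates is a self-join. The sketch uses Python's sqlite3 module. The table names item_groups and group_items, the type strings, and the sample rows are all hypothetical. As with any polymorphic reference, the database cannot enforce a foreign key on item_id, so the application has to keep it consistent. In Rails this roughly maps onto a polymorphic `belongs_to` on the membership model.

```python
import sqlite3

SCHEMA = """
CREATE TABLE item_groups (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE lists  (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE bills  (id INTEGER PRIMARY KEY, amount REAL);
-- One row per item per group; item_type says which table item_id points into.
CREATE TABLE group_items (
    group_id  INTEGER NOT NULL REFERENCES item_groups(id),
    item_type TEXT    NOT NULL,
    item_id   INTEGER NOT NULL,
    PRIMARY KEY (group_id, item_type, item_id)
);
"""


def build_sample():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executescript("""
        INSERT INTO item_groups VALUES (1, 'Example group');
        INSERT INTO events VALUES (1, 'Example event');
        INSERT INTO lists  VALUES (1, 'Example list');
        INSERT INTO bills  VALUES (1, 120.0);
        -- Three membership rows for a three-item group.
        INSERT INTO group_items VALUES (1, 'Event', 1), (1, 'List', 1), (1, 'Bill', 1);
    """)
    return conn


def group_mates(conn, item_type, item_id):
    """Every other item that shares a group with (item_type, item_id)."""
    return conn.execute("""
        SELECT other.item_type, other.item_id
        FROM group_items AS me
        JOIN group_items AS other ON other.group_id = me.group_id
        WHERE me.item_type = ? AND me.item_id = ?
          AND NOT (other.item_type = me.item_type AND other.item_id = me.item_id)
        ORDER BY other.item_type, other.item_id
    """, (item_type, item_id)).fetchall()


if __name__ == "__main__":
    print(group_mates(build_sample(), "Event", 1))  # [('Bill', 1), ('List', 1)]
```

Loading the actual Event, List, and Bill records is then one lookup per item_type, using the ids returned by the join.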

SQLite JOIN two tables with duplicated keys

I need to join two tables on two different fields. I have table 1 like this:
key productid customer
1 100 jhon
2 109 paul
3 100 john
Table 2 has the same fields plus additional data that I must relate to the first table:
key productid customer status date ...
1 109 phil ok 04/01
2 109 paul nok 04/03
3 100 jhon nok 04/06
4 100 jhon ok 04/06
Both "key" fields are autoincrement. The problem is that my relationship fields are repeated several times across the results, and I need a one-to-one relationship, in such a manner that each row from table 2 is related ONLY ONCE to a row in table 1.
I did a left join on (customer = customer and productid = productid), but the relationship came out duplicated: a row from table 2 was related to many rows of table 1.
To clarify things...
I have to cross-check both tables: table 1 is loaded from an XLS report, and table 2 is data from a database that reflects customer transactions with a lot of status data. I have to check whether a row from the XLS exists in the database and then load the additional status data. I must also produce a report of the XLS rows that have no corresponding data in the database.
How can I accomplish this JOIN? Is it possible with only SQL?
You can accomplish this in MS SQL using the SQL below. It uses only a derived table with GROUP BY, which SQLite also supports, so it should work there too.
select a.*, c.*
from table2 a
join ( select min("key") as "key", productid, customer -- first table1 row per pair
       from table1
       group by productid, customer
     ) b on a.productid = b.productid
        and a.customer = b.customer
join table1 c on c."key" = b."key" -- "key" is quoted: KEY is reserved in some dialects
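A quick way to check the query above is to run it against the sample rows from the question. The sketch below does that with Python's sqlite3 module, which also exercises it in SQLite. It keeps only the columns shown in the question and quotes "key" and "date" as identifiers. It also adds a LEFT JOIN ... IS NULL query for the reporting requirement, listing table 1 (XLS) rows with no matching table 2 row. That second query is an extra sketch, not part of the answer above.

```python
import sqlite3


def build_sample():
    """Rows copied from the question (date values kept as text)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 ("key" INTEGER PRIMARY KEY, productid INTEGER, customer TEXT);
        CREATE TABLE table2 ("key" INTEGER PRIMARY KEY, productid INTEGER, customer TEXT,
                             status TEXT, "date" TEXT);
        INSERT INTO table1 VALUES (1, 100, 'jhon'), (2, 109, 'paul'), (3, 100, 'john');
        INSERT INTO table2 VALUES (1, 109, 'phil', 'ok',  '04/01'),
                                  (2, 109, 'paul', 'nok', '04/03'),
                                  (3, 100, 'jhon', 'nok', '04/06'),
                                  (4, 100, 'jhon', 'ok',  '04/06');
    """)
    return conn


# The answer's query: each table2 row joins to at most one table1 row,
# the one with the smallest "key" for that (productid, customer) pair.
MATCHED = """
    SELECT a."key" AS t2_key, c."key" AS t1_key, a.status
    FROM table2 AS a
    JOIN (SELECT min("key") AS "key", productid, customer
          FROM table1
          GROUP BY productid, customer) AS b
      ON a.productid = b.productid AND a.customer = b.customer
    JOIN table1 AS c ON c."key" = b."key"
    ORDER BY a."key"
"""

# XLS (table1) rows with no corresponding database (table2) row.
UNMATCHED = """
    SELECT t1."key", t1.productid, t1.customer
    FROM table1 AS t1
    LEFT JOIN table2 AS t2
      ON t2.productid = t1.productid AND t2.customer = t1.customer
    WHERE t2."key" IS NULL
"""

if __name__ == "__main__":
    conn = build_sample()
    print(conn.execute(MATCHED).fetchall())    # [(2, 2, 'nok'), (3, 1, 'nok'), (4, 1, 'ok')]
    print(conn.execute(UNMATCHED).fetchall())  # [(3, 100, 'john')]
```

Table 1 row 1 (100/jhon) is still matched by two table 2 rows (3 and 4). The query only guarantees uniqueness from table 2 to table 1, which is the direction the question asked about.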
One way to understand this would be to figure out exactly what each table represents. Both tables seem to represent the same thing, with a row representing what you might call a purchase. Why are there two separate tables, then? Perhaps the second table goes into more depth about each purchase? Like jhon bought product 100, and it was 'nok' first and then 'ok'? If so, then the key (what makes the table unique) for the second table would be all three fields.
You still join on only the two fields that match, but you can't expect uniqueness if two rows have the same values in those key fields.
It helps sometimes to create additional indexes on a table, to see what is truly unique.