SQLite select random N rows - sql

I've got two tables which both have hundreds of millions of rows.
One is PAPER. Each row is unique with a column called "paper_id" as its key.
The other is PFOS. Each row has two columns, "paper_id" and "field_id".
One paper may belong to several fields.
I need to select N random rows for each field_id group in PFOS, then fetch the matching papers from PAPER by the selected paper_id.
This is my SQL (the filter is either field_id = xxx or field_id in (...)):
select paper_id from PFOS where field_id = xxx order by random() limit N
Questions
How could I make it faster?
When I use LIMIT, the rows I get are fewer than N. Did I make a mistake in my SQL?
PAPER
paper_id*,title...
PFOS
paper_id,field_id
I would appreciate any suggestions.
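The following is not an answer from the original thread, only a minimal sketch of one way to approach both questions. It assumes Python's built-in sqlite3 module and a tiny made-up dataset; the index name idx_pfos_field and the helper sample_papers are hypothetical. It illustrates two points: random() has to be written as a function call, and LIMIT N is an upper bound, so a field with fewer than N papers returns fewer than N rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PAPER (paper_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE PFOS  (paper_id INTEGER, field_id INTEGER);
    -- Hypothetical index: lets SQLite locate one field's rows without a full scan.
    CREATE INDEX idx_pfos_field ON PFOS (field_id, paper_id);
""")
conn.executemany("INSERT INTO PAPER VALUES (?, ?)",
                 [(i, f"paper {i}") for i in range(1, 11)])
# Made-up data: field 1 has 8 papers, field 2 has only 2.
conn.executemany("INSERT INTO PFOS VALUES (?, ?)",
                 [(i, 1) for i in range(1, 9)] + [(9, 2), (10, 2)])


def sample_papers(conn, field_id, n):
    """Return up to n randomly chosen (paper_id, title) rows for one field."""
    return conn.execute(
        """
        SELECT p.paper_id, p.title
        FROM (
            SELECT paper_id
            FROM PFOS
            WHERE field_id = ?
            ORDER BY random()   -- random() is a function call in SQLite
            LIMIT ?             -- an upper bound, not a guarantee
        ) AS s
        JOIN PAPER AS p ON p.paper_id = s.paper_id
        """,
        (field_id, n),
    ).fetchall()


print(len(sample_papers(conn, 1, 3)))  # 3
print(len(sample_papers(conn, 2, 3)))  # 2, because field 2 has only 2 papers
```

Note that ORDER BY random() still has to sort every row of the chosen field, so on very large fields the index only helps with finding the rows, not with the shuffle itself.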

Related

(Hive) SQL retrieving data from a column that has 1 to N relationship in another column

How can I retrieve rows where BID comes up multiple times in AID
You can see the sample below: the AID and BID columns sit under PrimaryID, and BIDs sit under AIDs. I want an output that only includes records where a BID has a one-to-many relationship with records in the AID column. Example output below.
I provided a small sample of data; in reality I am retrieving 20+ columns and joining 4 tables. I have unique PrimaryIDs, and under those I have multiple unique AIDs. However, under these AIDs I can have multiple non-unique BIDs that can come up repeatedly under different AIDs.
Hive supports window functions. A window function can associate every row in a group with an attribute of the group, and count() is one of the supported functions. In your case you can use it and select the rows for which that count > 1.
In the partition by clause you specify which columns define the group, the same way that you would in the more familiar group by clause.
Something like this:
select * from
(
Select *,
count(*) over (partition by primaryID,AID) counts
from mytable
) x
Where counts>1
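To see what the windowed count does, here is a small sketch that runs the same query shape through Python's built-in sqlite3 module instead of Hive (an assumption made only so the example is runnable; window functions need SQLite 3.25 or later). The rows in mytable are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (primaryID INTEGER, AID TEXT, BID TEXT)")
# Made-up rows: only (primaryID=1, AID='A1') has more than one row.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "A1", "B1"), (1, "A1", "B2"), (1, "A2", "B3"), (2, "A3", "B4")],
)

rows = conn.execute(
    """
    SELECT primaryID, AID, BID, counts
    FROM (
        SELECT *,
               COUNT(*) OVER (PARTITION BY primaryID, AID) AS counts
        FROM mytable
    ) AS x
    WHERE counts > 1
    ORDER BY BID
    """
).fetchall()

print(rows)  # [(1, 'A1', 'B1', 2), (1, 'A1', 'B2', 2)]
```

Every row keeps its own columns and additionally carries the size of its (primaryID, AID) group, which is what makes the outer filter possible without a separate group by and join.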

Query to identify the parent/child relationship between two big tables

I have two tables. The first one contains laboratory result header records, one for each order. It has about 10 million rows in it that contain one of about 6,000 unique ProcedureIDs...
OrderID
ResultID
ProcedureID
ProcedureName
OrderDate
ResultDate
PatientID
ProviderID
The second table contains the detailed result record(s) for each order in the first table. It has about 80 million rows and contains about 28,000 child components that are associated with the 6,000 procedure IDs from the first table.
ResultComponentID
ResultID (foreign key to first table)
ComponentID
ComponentName
ResultValueType
ResultValue
ResultUnits
ResultingLab
I have a subset (n=135) of procedure IDs for which I need a list of associated child component IDs. Here is a simple example...
Table 1
1000|1|CBC|Complete Blood Count|8/1/2019 08:00:00|8/2/2019 09:27:00|9999|8888
1001|2|CA|Calcium|8/1/2019 08:01:00|8/2/2019 09:28:00|9999|8888
Table 2
2543|1|RBC|Red Blood Cell Count|NM|60|Million/uL|OurLab
2544|1|PLT|Platelet Count|NM|60|Thou/cmm|OurLab
2545|2|RBC|Red Blood Cell Count|NM|60|Million/uL|OurLab
2546|1|CA|Calcium|NM|40|g/dl|OurLab
In this example, if CBC was in my subset and CA wasn't, I would expect two rows back...
CBC|Complete Blood Count|RBC|Red Blood Cell Count
CBC|Complete Blood Count|PLT|Platelet Count
Even if I had two million CBCs in the DB, I only need one set of CBC parent/child rows.
If I were using a scripting tool, I would use a for each loop to iterate through the subset and grab the top 1 of each ProcedureID and use it to get the associated component children.
If I really wanted to go crazy with this, I would not assume that CBC only had two components, as some labs might send us two and some might send us seven.
Any advice on how to get the list of parent/child associations?
For the simple query, sometimes there is no way around just writing out all 135 ids if you can't find a neat way to get that subset out of a query or store it in a temp table.
For the uniqueness requirement, just add a 'group by'
Select t1.ProcedureId, t2.ComponentId
from Table1 t1
join Table2 t2 on t2.ResultId = t1.ResultId
where t1.ProcedureId in (
'CBC',
'etc' -- ...and so on for all 135 ids (no comma after the last one)
)
group by t1.ProcedureId, t2.ComponentId
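As a rough illustration of the join plus group by, the sketch below runs it with Python's built-in sqlite3 module on a few made-up rows (the original database engine is not stated, so SQLite is an assumption). The subset list is a hypothetical stand-in for the 135 procedure IDs; passing it as bound parameters avoids typing the ids into the SQL text by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (OrderID INTEGER, ResultID INTEGER, ProcedureID TEXT);
    CREATE TABLE Table2 (ResultComponentID INTEGER, ResultID INTEGER, ComponentID TEXT);
    -- Made-up rows: two CBC orders and one CA order.
    INSERT INTO Table1 VALUES (1000, 1, 'CBC'), (1001, 2, 'CA'), (1002, 3, 'CBC');
    INSERT INTO Table2 VALUES (2543, 1, 'RBC'), (2544, 1, 'PLT'),
                              (2545, 3, 'RBC'), (2546, 2, 'CA');
""")

subset = ["CBC"]  # hypothetical stand-in for the 135 procedure IDs
placeholders = ", ".join("?" for _ in subset)  # one "?" per id

pairs = conn.execute(
    f"""
    SELECT t1.ProcedureID, t2.ComponentID
    FROM Table1 AS t1
    JOIN Table2 AS t2 ON t2.ResultID = t1.ResultID
    WHERE t1.ProcedureID IN ({placeholders})
    GROUP BY t1.ProcedureID, t2.ComponentID
    ORDER BY t1.ProcedureID, t2.ComponentID
    """,
    subset,
).fetchall()

print(pairs)  # [('CBC', 'PLT'), ('CBC', 'RBC')]
```

RBC appears under both CBC orders but is returned once, and the result lists every component seen under any CBC order, so labs that send different numbers of components are covered as well.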

How to group rows after another grouping in oracle?

I have a table called correctObjects. In this table there are a lot of groups (GRUP), each with a different number of records. One example is given below: grup 544 has 5 rows in the table. First, I need to group all records by the GRUP column; then, within each group, I must match records by the CAP column. In grup 544 there are three different CAP values, so I must assign an inner group number to these records. How can I do this two-level grouping? The GRUP column is already filled in. The Inner Grup column is null in every record.
After the inner grouping, it should look like this:
I am using Oracle 11g R2 and PL/SQL Developer
Your question lacks certain details, so I'll just give you a starting point, and you can tweak it to suit your needs.
It's not entirely clear, but the way I understand it, you want to rank the different rows by cap. And I think the ranking is independent for every distinct grup value.
What's not clear to me is why 125 mm is ranked 1, and 62 mm is ranked 2. Is it based on the value? Is it based on which row is the first one, and if so, how are the rows ordered? Or maybe you don't really care which one is first or second, as long as they are grouped correctly. I'll have to assume the latter.
In any case, it sounds like you want to use the dense_rank() analytic function in some form:
select mip, startmi, cap, grup,
dense_rank() over (partition by grup order by cap) as inner_grup
from tbl
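A small sketch of how dense_rank() numbers the distinct cap values inside each grup. It uses Python's built-in sqlite3 module rather than Oracle (an assumption, so the example can run anywhere; SQLite 3.25 or later is needed for window functions), and the rows and the numeric cap column are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (mip INTEGER, startmi INTEGER, cap INTEGER, grup INTEGER)")
# Made-up rows: grup 544 has three distinct cap values, grup 545 has one.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [(1, 0, 125, 544), (2, 10, 125, 544), (3, 20, 62, 544),
     (4, 30, 62, 544), (5, 40, 90, 544), (6, 50, 62, 545)],
)

ranked = conn.execute(
    """
    SELECT mip, cap, grup,
           DENSE_RANK() OVER (PARTITION BY grup ORDER BY cap) AS inner_grup
    FROM tbl
    ORDER BY mip
    """
).fetchall()

for row in ranked:
    print(row)
# (1, 125, 544, 3)
# (2, 125, 544, 3)
# (3, 62, 544, 1)
# (4, 62, 544, 1)
# (5, 90, 544, 2)
# (6, 62, 545, 1)
```

Rows with the same cap inside a grup share one inner_grup number, and the numbering restarts at 1 for each grup. Which cap gets number 1 depends entirely on the order by inside the over clause.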

Access query/SQL - duplicates in one field with distinct multiple 2nd field

I am working on a database with products and lot numbers. Each entry in the Lots table has a Lot Number and a Product description.
Sometimes there are multiple records of the same lot number, for example when an item is repacked a new record is created, but with the same Lot Number and same product description - this is fine. But other times there are problem cases, namely when two different products share the same Lot Number. I am trying to find those.
In other words, there are 3 possibilities:
Lot numbers for which there is only one record in the table.
Lot numbers for which there are multiple records, but the Product description is the same for all of them
Lot numbers for which there are multiple records, and the product descriptions are not all the same.
I need to return only #3, with a separate record for each instance of that Lot Number and product description.
Any help would be greatly appreciated.
Thanks Juan for the sample data. Using this example, I want to return the data contained in Id 2-8, but not 1, 9, 10, 11.
This wasn't easy because I haven't used Access in a long time.
First, select the unique (lotnumber, product) pairs using distinct.
Then count how many different products appear for each lotnumber using group by.
Last, join both results and show only the lots with more than one description (where total > 1).
SELECT id, Product.lotnumber, Product.Product, total
FROM
Product Inner join
(
SELECT lotnumber, count(*) as total
FROM
(SELECT distinct lotnumber, product
FROM Product)
GROUP BY lotnumber
) SubT On Product.lotnumber = SubT.lotnumber
WHERE total > 1
ORDER BY id
As you can see:
lot 2 has two products (yy and zz)
lot 3 has three products (aa, bb, cc)
I include my product table:
Sorry for the Spanish. The field types are AutoNumber, Short Text, and Number.
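For readers without Access at hand, the same query shape can be tried with Python's built-in sqlite3 module. This is only a sketch: SQLite stands in for Access, and the rows below are made up to resemble the case described (ids 2-8 belong to lots with mixed products, ids 1 and 9-11 do not).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product (id INTEGER PRIMARY KEY, lotnumber INTEGER, product TEXT)"
)
# Made-up rows: lots 2 and 3 mix products; lots 1, 4 and 5 do not.
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [(1, 1, "xx"),
     (2, 2, "yy"), (3, 2, "zz"), (4, 2, "yy"),
     (5, 3, "aa"), (6, 3, "bb"), (7, 3, "cc"), (8, 3, "aa"),
     (9, 4, "dd"), (10, 4, "dd"),
     (11, 5, "ee")],
)

problem_rows = conn.execute(
    """
    SELECT id, Product.lotnumber, Product.product, total
    FROM Product
    INNER JOIN (
        -- total = number of distinct products per lot
        SELECT lotnumber, COUNT(*) AS total
        FROM (SELECT DISTINCT lotnumber, product FROM Product)
        GROUP BY lotnumber
    ) AS SubT ON Product.lotnumber = SubT.lotnumber
    WHERE total > 1
    ORDER BY id
    """
).fetchall()

print([row[0] for row in problem_rows])  # [2, 3, 4, 5, 6, 7, 8]
```

The inner distinct is what keeps a repacked lot (same lot number, same product, several records) from being counted as a problem case.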

Filtering Database Results to Top n Records for Each Value in a Lookup Column

Let's say I have two tables in my database.
TABLE:Categories
ID|CategoryName
01|CategoryA
02|CategoryB
03|CategoryC
and a table that references the Categories and also has a column storing some random number.
TABLE:CategoriesAndNumbers
CategoryType|Number
CategoryA|24
CategoryA|22
CategoryC|105
.....(20,000 records)
CategoryB|3
Now, how do I filter out this data? So, I want to know what the 3 smallest numbers are out of each category and delete the rest. The end result would be like this:
TABLE:CategoriesAndNumbers
CategoryType|Number
CategoryA|2
CategoryA|5
CategoryA|18
CategoryB|3
CategoryB|500
CategoryB|1601
CategoryC|1
CategoryC|4
CategoryC|62
Right now, I can get the smallest numbers between all the categories, but I would like each category to be compared individually.
EDIT: I'm using Access and here's my code so far
SELECT TOP 10 cdt1.sourceCounty, cdt1.destCounty, cdt1.distMiles
FROM countyDistanceTable as cdt1, countyTable
WHERE cdt1.sourceCounty = countyTable.countyID
ORDER BY cdt1.sourceCounty, cdt1.distMiles, cdt1.destCounty
EDIT2: Thanks to Remou, here would be the working query that solved my problem. Thank you!
DELETE
FROM CategoriesAndNumbers a
WHERE a.Number NOT IN (
SELECT Top 3 [Number]
FROM CategoriesAndNumbers b
WHERE b.CategoryType=a.CategoryType
ORDER BY [Number])
You could use something like:
SELECT a.CategoryType, a.Number
FROM CategoriesAndNumbers a
WHERE a.Number IN (
SELECT Top 3 [Number]
FROM CategoriesAndNumbers b
WHERE b.CategoryType=a.CategoryType
ORDER BY [Number])
ORDER BY a.CategoryType
The difficulty with this is that Jet/ACE TOP returns tied values where they exist, so you will not necessarily get three values but more if there are ties. The problem can often be solved with a key field, if one exists:
WHERE a.Number IN (
SELECT Top 3 [Number]
FROM CategoriesAndNumbers b
WHERE b.CategoryType=a.CategoryType
ORDER BY [Number], [KeyField])
However, I do not think it will help in this instance, because the outer table will include ties.
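The correlated subquery can be tried outside Access as well. The sketch below assumes Python's built-in sqlite3 module, where LIMIT 3 plays the role of Top 3 (an assumption about an equivalent, not Access syntax), and uses made-up numbers without ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CategoriesAndNumbers (CategoryType TEXT, Number INTEGER)")
# Made-up data without ties.
data = {
    "CategoryA": [24, 22, 2, 5, 18],
    "CategoryB": [3, 500, 1601, 9000],
    "CategoryC": [105, 1, 4, 62],
}
conn.executemany(
    "INSERT INTO CategoriesAndNumbers VALUES (?, ?)",
    [(cat, n) for cat, nums in data.items() for n in nums],
)

smallest = conn.execute(
    """
    SELECT a.CategoryType, a.Number
    FROM CategoriesAndNumbers AS a
    WHERE a.Number IN (
        SELECT b.Number
        FROM CategoriesAndNumbers AS b
        WHERE b.CategoryType = a.CategoryType  -- compare within the same category
        ORDER BY b.Number
        LIMIT 3                                -- stands in for Access's Top 3
    )
    ORDER BY a.CategoryType, a.Number
    """
).fetchall()

print(smallest)
# [('CategoryA', 2), ('CategoryA', 5), ('CategoryA', 18),
#  ('CategoryB', 3), ('CategoryB', 500), ('CategoryB', 1601),
#  ('CategoryC', 1), ('CategoryC', 4), ('CategoryC', 62)]
```

With tied values the outer query would still return every row whose Number matches one of the three selected values, so more than three rows per category remain possible here too.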
Order by Number and take 3, find the largest of those three numbers, and then remove the rows where Number is greater than that value.
I imagine it would need to be two separate queries, as your business tier would hold the largest of the 3 results and dynamically build the query to delete the rest.