SQL query for selecting data out of a table - sql

I'm working on a project that examines colorectal cancer, and I have some data that I want to filter with an SQL query. The problem is that some of the experiments failed, so some of the samples are in another folder marked with _2 or _3, depending on whether it was the second or third time the experiment was tried.
My data looks as follows:
I want to check whether a Sentrix_ID exists more than once; if so, the query should take the latest version (_2, _3, and so on). Is there a query that will do that for me?
And the raw data:
ID    Sample_Name   Sample_Code   Sample     Sentrix_ID  NorTum  Pool_ID        Sample_Group  Sentrix_Position  Folderdate  opmerkingen
1835  99-02872T2    99-02872T2    99-02872   1495455     T2      GS0006564-OPA  HNPCC_UV      R001_C010         Exp060501   MLH1-UV
1836  97-5332T1     97-5332T1     97-5332    1495455     T1      GS0006564-OPA  MUTYH         R001_C011         Exp060501   1105delC_G382D
1827  R02-81709N    R02-81709N    R02-81709  1495455     N       GS0006563-OPA  HNPCC_UV      R001_C002         Exp060501   MSH2
492   t03-32120 N   t03-32120 N   t03-32120  1495455_2   N       GS0006563-OPA  TEM_TME_LOH   R004_C005         Exp060920
484   t02-27628 N   t02-27628 N   t02-27628  1495455_2   N       GS0006563-OPA  TEM_TME_LOH   R004_C004         Exp060920
478   t03-06297 B2  t03-06297 B2  t03-06297  1495455_2   B2      GS0006563-OPA  TEM_TME_LOH   R006_C003         Exp060920
479   t03-06297 B3  t03-06297 B3  t03-06297  1495455_2   B3      GS0006563-OPA  TEM_TME_LOH   R007_C003         Exp060920
Thanks in advance

So you want to parse the value in your sentrix_id column into a prefix and a suffix? You can use the instr() function to find the position of the underscore, if any, unless you know the length of the prefix is always the same.
If the prefix is always 7 characters long, as in your sample data (1495455 and 1495455_2), try something like
select *,
left(sentrix_id,7) as prefix,
iif(len(sentrix_id) = 7 ,"",mid(sentrix_id,9)) as suffix
from table
If you create a new query with equivalent SQL, you should see two additional columns in your results, one for the ID prefix and one for the version suffix.
Then, using those query results in a second new query, you can group by prefix and return the max(suffix).
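For what it's worth, here is a minimal sketch of that two-step idea (split into prefix and version, then keep the highest version per prefix). It runs against SQLite through Python's sqlite3 module rather than Access, so it uses instr()/substr() instead of left()/mid(). The table name samples and the row with ID 900 are made up for illustration, and rows without a suffix are assumed to be version 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER, sentrix_id TEXT)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [
        (1835, "1495455"), (1836, "1495455"), (1827, "1495455"),
        (492, "1495455_2"), (484, "1495455_2"),
        (900, "1500000"),  # hypothetical chip that was never rerun
    ],
)

LATEST_SQL = """
WITH parsed AS (
    SELECT id,
           sentrix_id,
           -- everything before the underscore, or the whole value
           CASE WHEN instr(sentrix_id, '_') > 0
                THEN substr(sentrix_id, 1, instr(sentrix_id, '_') - 1)
                ELSE sentrix_id END AS prefix,
           -- number after the underscore; no suffix is treated as version 1
           CASE WHEN instr(sentrix_id, '_') > 0
                THEN CAST(substr(sentrix_id, instr(sentrix_id, '_') + 1) AS INTEGER)
                ELSE 1 END AS version
    FROM samples
)
SELECT p.id, p.sentrix_id
FROM parsed AS p
JOIN (SELECT prefix, MAX(version) AS latest
      FROM parsed
      GROUP BY prefix) AS m
  ON m.prefix = p.prefix AND m.latest = p.version
ORDER BY p.id
"""

latest_rows = conn.execute(LATEST_SQL).fetchall()
print(latest_rows)
```

Casting the suffix to an integer keeps _10 ahead of _9, which a plain string max() would get wrong.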

SELECT TOP 1 * FROM [YourTable] WHERE [Sentrix_ID] LIKE '<value>%' ORDER BY [Sentrix_ID] DESC
Would that work for your problem?

SELECT *
FROM (
SELECT Sentrix_ID, count(Sentrix_ID) AS senId
FROM tableName
GROUP BY Sentrix_ID
) AS m
WHERE senId = 1;
This query returns the Sentrix_ID values that occur exactly once; it does not by itself pick the latest version of a repeated ID. For the data in the question it returns an empty result, because both 1495455 and 1495455_2 occur more than once. If the table contained a single row with a _3 suffix, that row's Sentrix_ID would be returned.

Related

How can I use a row value to dynamically select a column name in Oracle SQL 11g?

I have two tables, one with a single row for each "batch_number" and another with defect details for each batch. The first table has a "defect_of_interest" column which I would like to link to one of the columns in the second table. I am trying to write a query that would then pick the maximum value in that dynamically linked column for any "unit_number" in the "batch_number".
Here is the SQLFiddle with example data for each table: http://sqlfiddle.com/#!9/a1c27d
For example, the maximum value in the DEFECT_DETAILS.SCRATCHES column for BATCH_NUMBER = A1 is 12.
Here is my desired output:
BATCH_NUMBER DEFECT_OF_INTEREST MAXIMUM_DEFECT_COUNT
------------ ------------------ --------------------
A1 SCRATCHES 12
B3 BUMPS 4
C2 STAINS 9
I have tried using the PIVOT function, but I can't get it to work. Not sure if it works in cases like this. Any help would be much appreciated.
If the number of columns is fixed (it seems to be) you can use CASE to select the specific value according to the related table. Then aggregating is simple.
For example:
select
  batch_number,
  max(defect_of_interest) as defect_of_interest,
  max(defect_count) as maximum_defect_count
from (
  select
    d.batch_number,
    b.defect_of_interest,
    case when b.defect_of_interest = 'SCRATCHES' then d.scratches
         when b.defect_of_interest = 'BUMPS' then d.bumps
         when b.defect_of_interest = 'STAINS' then d.stains
    end as defect_count
  from defect_details d
  join batches b on b.batch_number = d.batch_number
) x
group by batch_number
order by batch_number;
See Oracle example in db<>fiddle.
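If you want to try the CASE approach without an Oracle instance, the sketch below runs the same query against SQLite via Python's sqlite3 module. The rows are invented for illustration (they are not the SQLFiddle data); batch A1 deliberately has a large BUMPS value to show that only the column named in defect_of_interest is considered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE batches (batch_number TEXT, defect_of_interest TEXT);
CREATE TABLE defect_details (batch_number TEXT, unit_number INTEGER,
                             scratches INTEGER, bumps INTEGER, stains INTEGER);
-- hypothetical sample rows
INSERT INTO batches VALUES ('A1', 'SCRATCHES'), ('B3', 'BUMPS');
INSERT INTO defect_details VALUES
    ('A1', 1, 12, 40, 0),
    ('A1', 2,  3,  1, 7),
    ('B3', 1, 99,  4, 5),
    ('B3', 2,  0,  2, 8);
""")

MAX_DEFECT_SQL = """
SELECT batch_number,
       MAX(defect_of_interest) AS defect_of_interest,
       MAX(defect_count)       AS maximum_defect_count
FROM (
    SELECT d.batch_number,
           b.defect_of_interest,
           -- pick the value from the column named in defect_of_interest
           CASE b.defect_of_interest
                WHEN 'SCRATCHES' THEN d.scratches
                WHEN 'BUMPS'     THEN d.bumps
                WHEN 'STAINS'    THEN d.stains
           END AS defect_count
    FROM defect_details d
    JOIN batches b ON b.batch_number = d.batch_number
) x
GROUP BY batch_number
ORDER BY batch_number
"""

max_defects = conn.execute(MAX_DEFECT_SQL).fetchall()
print(max_defects)
```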

Why aren't these two sql statements returning same output?

I'm just getting started with sql and have the objective to transform this:
select X.persnr
from Pruefung X
where X.persnr in (
select Y.persnr
from pruefung Y
where X.matrikelnr <> Y.matrikelnr)
output:
into the same output but using some form of join. I tried it the way below, but I can't seem to get rid of the cartesian product, as far as I can see. Or maybe I misunderstood what the above statement is actually supposed to do. To me, the above says "for each unique matrikelnr display all corresponding persnr".
select X.persnr
from Pruefung X
join pruefung y on x.persnr=y.persnr
where x.matrikelnr<>y.matrikelnr
output: A long list (I don't want to fill the entire question with it) - I am guessing it is the cartesian product from the join
This is the relation I am using.
Edit: DISTINCT (unless I am using it in the wrong place) won't work, because then each persnr is only displayed once, and that's not the objective.
Your initial query actually does the following:
select persnr from Pruefung if the same persnr exists for a different matrikelnr.
"for each unique matrikelnr display all corresponding persnr"
This is achieved using aggregation.
Depending on the DBMS you are using, you could use something like the following (SQL Server uses STRING_AGG, MySQL uses GROUP_CONCAT):
SELECT matrikelnr, STRING_AGG(persnr, ',')
FROM Pruefung
GROUP BY matrikelnr
You cannot easily achieve what you got from a correlated query (your first attempt) by using a join.
Edit:
A join does not result in a "Cartesian product" except when there is no join condition (CROSS JOIN).
A join matches two sets based on a join condition. The reason you get more entries is that the join looks at the join key (persnr) and pairs every row with every row that has the same key.
For example, for 101 you have 3 entries. That means you will get 3x3 = 9 results.
You then keep only the pairs where X.matrikelnr <> Y.matrikelnr. If we assume matrikelnr is unique, the only pairs removed are those where a row matched with itself, so you lose 3 results and end up with 3x3 - 3 = 6.
If you want to achieve something in SQL, you must first define what result you are expecting and then use the appropriate tools (in this case correlated queries, not joins).
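The 3x3 - 3 = 6 arithmetic can be checked with a small sketch. It uses SQLite through Python's sqlite3 module; the table holds only the three rows for persnr 101, and two of the matrikelnr values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pruefung (persnr INTEGER, matrikelnr INTEGER)")
conn.executemany(
    "INSERT INTO pruefung VALUES (?, ?)",
    # 8532478 appears in the thread; the other two values are hypothetical
    [(101, 8532478), (101, 1111111), (101, 2222222)],
)

# Correlated IN query from the question: one output row per table row.
in_rows = conn.execute("""
    SELECT x.persnr
    FROM pruefung x
    WHERE x.persnr IN (SELECT y.persnr
                       FROM pruefung y
                       WHERE x.matrikelnr <> y.matrikelnr)
""").fetchall()

# Self-join on persnr only: every row pairs with every row, 3 x 3.
all_pairs = conn.execute("""
    SELECT COUNT(*)
    FROM pruefung x
    JOIN pruefung y ON x.persnr = y.persnr
""").fetchone()[0]

# Same join with the filter: the 3 self-pairs are dropped.
join_rows = conn.execute("""
    SELECT x.persnr
    FROM pruefung x
    JOIN pruefung y ON x.persnr = y.persnr
    WHERE x.matrikelnr <> y.matrikelnr
""").fetchall()

print(len(in_rows), all_pairs, len(join_rows))
```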
You can write your 1st query with EXISTS instead of IN like:
select X.persnr
from Pruefung X
where exists (
select 1
from pruefung Y
where X.persnr = Y.persnr and X.matrikelnr <> Y.matrikelnr
)
This way it's obvious that this query means:
return all the persnrs of the table for which there exists another
row with the same persnr but different matrikelnr
For your sample data the result is all the persnrs of the table.
Your 2nd query though, does something different.
It links every row of the table with all the rows of the same table with the same persnr but different matrikelnr.
So for every row of the table you will get as many rows as there are rows with the same persnr but a different matrikelnr.
For example for the 1st row with persnr = 101 and matrikelnr = 8532478 you will get 2 rows because there are 2 rows in the table with persnr = 101 and matrikelnr <> 8532478.
You are right. It's the cartesian product's fault. Suppose you have persnr 1,1,1,2,2,2 in the first table and persnr 1,1,1,2,2 in the second. How many lines are you expecting to be returned?
In pseudo-code it would go like this
Select
...
WHERE persnr in (second table)
-- 6 lines
Select persnr
FROM ...
JOIN ... ON a.persnr = b.persnr
-- 3X3 + 3X2 = 15 lines.
SELECT DISTINCT persnr
FROM ...
JOIN ... ON a.persnr = b.persnr
-- 2 lines (1 and 2)
Take your pick
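Those three counts can be reproduced with a quick sketch, assuming two throwaway tables named a and b (hypothetical names) holding the persnr values from the example, in SQLite via Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (persnr INTEGER)")
conn.execute("CREATE TABLE b (persnr INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(p,) for p in (1, 1, 1, 2, 2, 2)])
conn.executemany("INSERT INTO b VALUES (?)", [(p,) for p in (1, 1, 1, 2, 2)])


def row_count(sql):
    """Run a query and return how many rows it produced."""
    return len(conn.execute(sql).fetchall())


# IN only tests membership, so each row of a appears at most once.
in_count = row_count(
    "SELECT persnr FROM a WHERE persnr IN (SELECT persnr FROM b)")

# JOIN emits one row per matching pair: 3*3 for persnr 1 plus 3*2 for persnr 2.
join_count = row_count(
    "SELECT a.persnr FROM a JOIN b ON a.persnr = b.persnr")

# DISTINCT collapses the pairs down to the distinct persnr values.
distinct_count = row_count(
    "SELECT DISTINCT a.persnr FROM a JOIN b ON a.persnr = b.persnr")

print(in_count, join_count, distinct_count)
```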

SQL - Select the longest substrings

I have data like this.
AB
ABC
ABCD
ABCDE
EF
EFG
IJ
IJK
IJKL
and I just want to get ABCDE, EFG, IJKL. How can I do that in Oracle SQL?
The strings are at least 2 characters long but don't have a fixed length; they can be from 2 to 100 characters.
In the event that you mean "longest string for each sequence of strings", the answer is a little different -- you are not guaranteed that all have a length of 4. Instead, you want to find the strings where adding a letter isn't another string.
select t.str
from table t
where not exists (select 1
from table t2
where substr(t2.str, 1, length(t.str)) = t.str and
length(t2.str) = length(t.str) + 1
);
Do note that performance of this query will not be great if you have even a moderate number of rows.
Select all rows where the string is not a substring of any other row. It's not clear if this is what you want, though.
select t.str
from table t
where not exists (
select 1
from table t2
where instr(t2.str, t.str) > 0
and t2.str <> t.str
);
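Here is a minimal sketch of both ideas (the "no one-letter extension exists" test and the "not contained in any other string" test) against the sample data. It uses SQLite through Python's sqlite3 module rather than Oracle; substr(), length() and instr() take their arguments in the same order in both. The table name strs is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE strs (str TEXT)")
conn.executemany(
    "INSERT INTO strs VALUES (?)",
    [(s,) for s in ("AB", "ABC", "ABCD", "ABCDE", "EF", "EFG", "IJ", "IJK", "IJKL")],
)

# Keep a string when no other string is the same text plus exactly one letter.
no_extension = [row[0] for row in conn.execute("""
    SELECT t.str
    FROM strs t
    WHERE NOT EXISTS (SELECT 1
                      FROM strs t2
                      WHERE substr(t2.str, 1, length(t.str)) = t.str
                        AND length(t2.str) = length(t.str) + 1)
    ORDER BY t.str
""")]

# Keep a string when it does not occur inside any *other* string.
not_contained = [row[0] for row in conn.execute("""
    SELECT t.str
    FROM strs t
    WHERE NOT EXISTS (SELECT 1
                      FROM strs t2
                      WHERE instr(t2.str, t.str) > 0
                        AND t2.str <> t.str)
    ORDER BY t.str
""")]

print(no_extension, not_contained)
```

The two tests agree on this data but can differ elsewhere: a string such as BCD would pass the first test (no row is BCD plus one letter) and fail the second (it occurs inside ABCD).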

Add counter column to table for every n rows

I am looking to add a column like CustID that would essentially be a counter whose group size is set by a variable, #nrows. In this case #nrows is 3: the counter simply goes down the table in DateAdded order and assigns the same value to every 3 rows before incrementing.
CustID --- DateAdded ---
1 2012-02-09
1 2012-02-09
1 2012-02-08
2 2012-02-07
2 2012-02-07
2 2012-02-07
3 2012-02-06
3 2012-02-06
If someone could tell me how to do that in MSSQL, it would be greatly appreciated.
This can be done in Excel with two formulas. The first one counts rows and compares the count to #nrows.
Location A3 in screen shot
=IF(B3=B2,(A2+1),1)
Second places the ID, location B4 in the screen shot
=IF(A3=$B$1,B3+1,B3)
The value in B1 is the variable "#nrows"
The value in B3 is the starter ID, so you can start at any value you want.
What about
=MAX(1,ROUNDUP(ROW()/#NROWS,0))
which I believe produces the result you want.
One reason it might not work is the "#NROWS" variable, which OP indicated he wanted to use. I confess that in my testing I used
=MAX(1,ROUNDUP(ROW()/3,0))
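I don't know how to do it in Excel, but you can first load the data into SQL Server; then the following syntax may help:
select NTILE(#NRows) over (order by DateAdded desc), DateAdded from tablename
Note that NTILE(n) divides the rows into n groups of roughly equal size, so the argument is the number of groups rather than the number of rows per group.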
Apply the ROW_NUMBER() function to the row set. It will produce sequential numbers starting from 1. Modify those by adding #nrows - 1 to them and dividing the results by #nrows:
SELECT
CustID = (ROW_NUMBER() OVER (ORDER BY DateAdded) + #nrows - 1) / #nrows,
DateAdded
FROM atable
;
See a demo at SQL Fiddle.
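The integer-division trick is easy to check outside the database. The sketch below mirrors (ROW_NUMBER() + nrows - 1) / nrows in plain Python and, for comparison, models how NTILE hands out group numbers (n groups, larger groups first). Both function names are made up for this illustration.

```python
def counter_ids(row_count, nrows):
    """Mirror (ROW_NUMBER() + nrows - 1) / nrows using integer division."""
    return [(rn + nrows - 1) // nrows for rn in range(1, row_count + 1)]


def ntile_ids(row_count, groups):
    """Model NTILE(groups): split the rows into `groups` buckets, larger ones first."""
    base, extra = divmod(row_count, groups)
    ids = []
    for g in range(1, groups + 1):
        size = base + (1 if g <= extra else 0)
        ids.extend([g] * size)
    return ids


print(counter_ids(8, 3))   # 8 rows, 3 rows per counter value
print(counter_ids(12, 3))  # 12 rows in groups of 3 -> 4 counter values
print(ntile_ids(12, 3))    # 12 rows in 3 groups -> 4 rows per group
```

With 8 rows the two approaches happen to agree, but with 12 rows the counter gives four groups of 3 while NTILE(3) gives three groups of 4.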

Missing gaps in recurring series within a group

We have a table with following data
Id,ItemId,SeqNumber,DateTimeTrx
1,100,254,2011-12-01 09:00:00
2,100,1,2011-12-01 09:10:00
3,200,7,2011-12-02 11:00:00
4,200,5,2011-12-02 10:00:00
5,100,255,2011-12-01 09:05:00
6,200,3,2011-12-02 09:00:00
7,300,0,2011-12-03 10:00:00
8,300,255,2011-12-03 11:00:00
9,300,1,2011-12-03 10:30:00
Id is an identity column.
The sequence for an ItemId starts from 0 and goes till 255 and then resets to 0. All this information is stored in a table called Item. The order of sequence number is determined by the DateTimeTrx but such data can enter any time into the system. The expected output is as shown below-
ItemId,PrevorNext,SeqNumber,DateTimeTrx,MissingNumber
100,Previous,255,2011-12-01 09:05:00,0
100,Next,1,2011-12-01 09:10:00,0
200,Previous,3,2011-12-02 09:00:00,4
200,Next,5,2011-12-02 10:00:00,4
200,Previous,5,2011-12-02 10:00:00,6
200,Next,7,2011-12-02 11:00:00,6
300,Previous,1,2011-12-03 10:30:00,2
300,Next,255,2011-12-03 16:30:00,2
We need to get those rows one before and one after the missing sequence. In the above example for ItemId 300 - the record with sequence 1 has entered first (2011-12-03 10:30:00) and then 255(2011-12-03 16:30:00), hence the missing number here is 2. So 1 is previous and 255 is next and 2 is the first missing number. Coming to ItemId 100, the record with sequence 255 has entered first (2011-12-02 09:05:00) and then 1 (2011-12-02 09:10:00), hence 255 is previous and then 1, hence 0 is the first missing number.
In the above expected result, the MissingNumber column is the first occurring missing number, just to illustrate the example.
We will not have a case where a complete series resets at one time, i.e. it can be either a series rundown from 255 to 0, as for ItemId 100, or 0 to 255, as for ItemId 300. Hence we need to identify missing sequence numbers both in ascending order (0,1,...255) and in descending order (254,254,0,2), etc.
How can we accomplish this in T-SQL?
Could work like this:
;WITH b AS (
SELECT *
,row_number() OVER (ORDER BY ItemId, DateTimeTrx, SeqNumber) AS rn
FROM tbl
), x AS (
SELECT
b.Id
,b.ItemId AS prev_Itm
,b.SeqNumber AS prev_Seq
,c.ItemId AS next_Itm
,c.SeqNumber AS next_Seq
FROM b
JOIN b c ON c.rn = b.rn + 1 -- next row
WHERE c.ItemId = b.ItemId -- only with same ItemId
AND c.SeqNumber <> (b.SeqNumber + 1)%256 -- Seq cycles modulo 256
)
SELECT Id, prev_Itm, 'Previous' AS PrevNext, prev_Seq
FROM x
UNION ALL
SELECT Id, next_Itm ,'Next', next_Seq
FROM x
ORDER BY Id, PrevNext DESC
Produces exactly the requested result.
See a complete working demo on data.SE.
This solution takes gaps in the Id column into consideration, as there is no mention of a gapless sequence of Ids in the question.
Edit2: Answer to updated question:
I updated the CTE in the query above to match your latest version - or so I think.
Use those columns that define the sequence of rows. Add as many columns to your ORDER BY clause as necessary to break ties.
The explanation to your latest update is not entirely clear to me, but I think you only need to squeeze in DateTimeTrx to achieve what you want. I have SeqNumber in the ORDER BY additionally to break ties left by identical DateTimeTrx. I edited the query above.
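To make the logic of the CTE easier to follow, here is a rough Python sketch of the same idea, not a translation of the T-SQL: order the rows by ItemId, DateTimeTrx and SeqNumber, pair each row with the next one, and report a gap whenever the next SeqNumber is not (previous + 1) modulo 256. The function name find_gaps and the tuple layout are assumptions; the data is the sample from the question.

```python
# (Id, ItemId, SeqNumber, DateTimeTrx) rows from the question
rows = [
    (1, 100, 254, "2011-12-01 09:00:00"),
    (2, 100, 1, "2011-12-01 09:10:00"),
    (3, 200, 7, "2011-12-02 11:00:00"),
    (4, 200, 5, "2011-12-02 10:00:00"),
    (5, 100, 255, "2011-12-01 09:05:00"),
    (6, 200, 3, "2011-12-02 09:00:00"),
    (7, 300, 0, "2011-12-03 10:00:00"),
    (8, 300, 255, "2011-12-03 11:00:00"),
    (9, 300, 1, "2011-12-03 10:30:00"),
]


def find_gaps(rows):
    """Return (ItemId, previous SeqNumber, next SeqNumber, first missing number)."""
    # Same ordering as the row_number() in the CTE; ISO timestamps sort as text.
    ordered = sorted(rows, key=lambda r: (r[1], r[3], r[2]))
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        expected = (prev[2] + 1) % 256  # sequence wraps from 255 back to 0
        if prev[1] == nxt[1] and nxt[2] != expected:
            gaps.append((prev[1], prev[2], nxt[2], expected))
    return gaps


gaps = find_gaps(rows)
for gap in gaps:
    print(gap)
```

Each reported tuple corresponds to one Previous/Next pair in the expected output, with the last element playing the role of MissingNumber.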