SQL subqueries with 'exists' clause and multiple tables - sql

I have three SQL tables:
x1: {'a':1,2,3,4,5},
x2: {'c':1,1,1,2,2,3,3, 'd':1,3,5,1,3,1,1},
x3: {'b':1,3,5}
The query is:
select a from x1
where not exists (
select * from x3
where not exists (
select *
from x2
where x1.a = x2.c and x3.b=x2.d
)
)
The result of this query is 1, but I can't understand which steps lead to that result.
What does each subquery return?

I will try to explain. Your query fetches the rows of table x1 for which the result set of the middle subquery (select * from x3 where not exists (...)) is empty; that is what the outer not exists asserts.
Consider the data in the tables.
The values in a are 1, 2, 3, 4, 5.
Let's check a=1.
For a=1 there are three records in table x2 with c=1, and their d values are 1, 3 and 5.
Every value of b in table x3 (1, 3, 5) appears among those d values. This means the middle subquery returns an empty result for a=1, so a=1 is present in the final output.
For a=2, the values of d are 1 and 3; since table x3 also contains 5, the middle subquery returns a non-empty result (b=5).
Similarly, for a=3 the only value of d is 1, so the middle subquery returns 3 and 5 from table x3.
For a=4 and a=5, a doesn't exist in table x2, so all of the records of table x3 are returned: 1, 3, 5.
Therefore the only a that gives an empty result set is a=1, which is the answer.
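The walkthrough above can be checked with a small script. This is only a sketch: it assumes SQLite (through Python's sqlite3 module) as a stand-in for whatever database the question used, and the helper name unmatched_b is made up for illustration. It runs the middle subquery separately for each value of a, then runs the full query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x1 (a INTEGER);
    CREATE TABLE x2 (c INTEGER, d INTEGER);
    CREATE TABLE x3 (b INTEGER);
    INSERT INTO x1 VALUES (1),(2),(3),(4),(5);
    INSERT INTO x2 VALUES (1,1),(1,3),(1,5),(2,1),(2,3),(3,1),(3,1);
    INSERT INTO x3 VALUES (1),(3),(5);
""")

def unmatched_b(a):
    # Hypothetical helper: the middle subquery evaluated for one fixed value of a.
    # It lists the b values in x3 that have no x2 row with c = a and d = b.
    rows = conn.execute(
        "SELECT b FROM x3 WHERE NOT EXISTS ("
        "  SELECT 1 FROM x2 WHERE x2.c = ? AND x2.d = x3.b) ORDER BY b",
        (a,),
    )
    return [r[0] for r in rows]

# What the middle subquery returns for each a
missing = {a: unmatched_b(a) for a in range(1, 6)}

# The full query from the question: keep a only when nothing is missing
result = [r[0] for r in conn.execute("""
    SELECT a FROM x1 WHERE NOT EXISTS (
        SELECT * FROM x3 WHERE NOT EXISTS (
            SELECT * FROM x2 WHERE x1.a = x2.c AND x3.b = x2.d))
""")]

print(missing)
print(result)
```

Only a=1 has an empty list in missing, which is why it is the single row in result.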

Related

SQL - How to filter out responses with no variation in a survey collection to do multi-linear regression?

I'm new to SQL and I am trying to filter out responses with no variation in a survey collection (invalid responses) before doing multi-linear regression. Note that the table actually has more than 100 records; I have simplified it for illustration.
Database: MySQL 8.0.30 : TLSv1.2 (TablePlus)
ID is the respondent number.
Variables: x1, x2, x3 are the independent variables.
Values: the survey responses.
For example, this is the current table I have:
ID Variables Values
1 x1 1
1 x2 1
1 x3 1
2 x1 2
2 x2 3
2 x3 4
3 x1 5
3 x2 5
3 x3 5
Scripts used:
SELECT ID, Variables, Values
FROM TableA
GROUP BY ID
I am trying to achieve the following table, where I only want to keep the records which have a variation in the responses:
ID Variables Values
2 x1 2
2 x2 3
2 x3 4
I have tried WHERE, DISTINCT, WHERE NOT and HAVING, but I can't get the results I need; most of the time the result is blank (like the table below). Any help would be appreciated.
ID Variables Values
Thank you very much!
Your problem has two parts, so you are going to need a subquery.
First, you want to know which respondents have variation in their responses. Group the responses by id and check whether the responses sharing an id all have the same value. To do that, select only the ids having more than one distinct value:
select `id`
from results
group by `id`
having count(distinct `values`) > 1
Based on that, you can wrap it in an outer select to get all the fields you want, ungrouped:
select *
from results
where `id` in (
select `id`
from results
group by `id`
having count(distinct `values`) > 1
)
This is MySQL syntax, but it shouldn't differ much in the other main databases.
SQL Fiddle: http://sqlfiddle.com/#!9/a266f806/4/0
Hope that helps
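A minimal sketch of the two-step approach, run against the sample data from the question. It assumes SQLite through Python's sqlite3 module rather than MySQL, and it renames the columns to variable and value (an assumption made here to avoid quoting the reserved word values).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE results (id INTEGER, variable TEXT, value INTEGER);
    INSERT INTO results VALUES
        (1,'x1',1),(1,'x2',1),(1,'x3',1),
        (2,'x1',2),(2,'x2',3),(2,'x3',4),
        (3,'x1',5),(3,'x2',5),(3,'x3',5);
""")

# Step 1: ids whose responses contain more than one distinct value
varied_ids = [r[0] for r in conn.execute(
    "SELECT id FROM results GROUP BY id HAVING COUNT(DISTINCT value) > 1")]

# Step 2: all ungrouped rows belonging to those ids
kept_rows = conn.execute("""
    SELECT id, variable, value FROM results
    WHERE id IN (
        SELECT id FROM results
        GROUP BY id
        HAVING COUNT(DISTINCT value) > 1)
    ORDER BY id, variable
""").fetchall()

print(varied_ids)
print(kept_rows)
```

Respondents 1 and 3 gave the same answer to every question, so only the three rows of respondent 2 remain.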
Try the following:
WITH ids_with_variations as
(
SELECT ID
,COUNT(DISTINCT [Values]) as unique_value_count
FROM TableA
GROUP BY ID
HAVING COUNT(DISTINCT [Values]) = 3 -- this assumes that you expect each ID to have exactly three responses
)
SELECT *
FROM TableA
WHERE ID IN (SELECT ID FROM ids_with_variations)
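This is the T-SQL dialect. Note that the HAVING clause keeps only IDs with exactly three distinct values in the Values column; an ID with responses such as 1, 1, 2 would be dropped. Use > 1 instead of = 3 to keep every ID that shows any variation.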

Why aren't these two sql statements returning same output?

I'm just getting started with SQL and my objective is to transform this:
select X.persnr
from Pruefung X
where X.persnr in (
select Y.persnr
from pruefung Y
where X.matrikelnr <> Y.matrikelnr)
output:
into a query with the same output but using some form of join. I tried it the way shown below, but I can't seem to get rid of the cartesian product as far as I can see. Or maybe I misunderstood what the above statement is actually supposed to do. To me it says "for each unique matrikelnr, display all corresponding persnr".
select X.persnr
from Pruefung X
join pruefung y on x.persnr=y.persnr
where x.matrikelnr<>y.matrikelnr
output: a long list (I don't want to fill the entire question with it); I am guessing it is the cartesian product from the join.
This is the relation I am using.
Edit: DISTINCT (unless I am using it in the wrong place) won't work, because then each persnr is displayed only once, and that's not the objective.
Your initial query actually does this:
select persnr from Pruefung if the same persnr exists for a different matrikelnr.
"for each unique matrikelnr display all corresponding persnr"
This is achieved using aggregation.
Depending on the DBMS you are using, you could use something like the following (SQL Server uses STRING_AGG, while MySQL uses GROUP_CONCAT):
SELECT matrikelnr, STRING_AGG(persnr, ',')
FROM Pruefung
GROUP BY matrikelnr
You cannot easily achieve what you got from a correlated query (your first attempt) by using a join.
Edit:
A join does not result in a "Cartesian product" except when there is no join condition (CROSS JOIN).
A join matches two sets based on a join condition. The reason why you get more entries is that the join looks at the join key (persnr) and pairs every row with every row that has the same key.
For example, for 101 you have 3 entries. That means you will get 3x3 results.
You then keep only the pairs where X.matrikelnr <> Y.matrikelnr. If we assume matrikelnr is unique, the pairs removed are the ones where a row matched itself, so you lose 3 results and end up with 3x3 - 3 = 6.
If you want to achieve something in SQL, you must first define what result you are expecting and then use the appropriate tools (in this case correlated queries, not joins).
You can write your 1st query with EXISTS instead of IN like:
select X.persnr
from Pruefung X
where exists (
select 1
from pruefung Y
where X.persnr = Y.persnr and X.matrikelnr <> Y.matrikelnr
)
This way it's obvious that this query means:
return all the persnrs of the table for which there exists another
row with the same persnr but different matrikelnr
For your sample data the result is all the persnrs of the table.
Your 2nd query though, does something different.
It links every row of the table with all the rows of the same table with the same persnr but different matrikelnr.
So for every row of the table you will get as many rows as there are other rows with the same persnr but a different matrikelnr.
For example for the 1st row with persnr = 101 and matrikelnr = 8532478 you will get 2 rows because there are 2 rows in the table with persnr = 101 and matrikelnr <> 8532478.
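The difference between the two queries can be seen on a tiny table. This is a sketch under assumptions: the rows below are invented for illustration (they are not the asker's data), and SQLite through Python's sqlite3 module stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Pruefung (persnr INTEGER, matrikelnr INTEGER);
    -- Hypothetical rows: 101 appears three times, 102 twice, 103 once
    INSERT INTO Pruefung VALUES
        (101, 1), (101, 2), (101, 3),
        (102, 4), (102, 5),
        (103, 6);
""")

# EXISTS: each row of X is returned at most once
exists_rows = [r[0] for r in conn.execute("""
    SELECT X.persnr FROM Pruefung X
    WHERE EXISTS (
        SELECT 1 FROM Pruefung Y
        WHERE X.persnr = Y.persnr AND X.matrikelnr <> Y.matrikelnr)
    ORDER BY X.persnr
""")]

# JOIN: each row of X is returned once per matching row of Y
join_rows = [r[0] for r in conn.execute("""
    SELECT X.persnr FROM Pruefung X
    JOIN Pruefung Y ON X.persnr = Y.persnr
    WHERE X.matrikelnr <> Y.matrikelnr
    ORDER BY X.persnr
""")]

print(len(exists_rows), exists_rows)
print(len(join_rows), join_rows)
```

With this data the EXISTS form returns 5 rows (three for 101, two for 102), while the join returns 8 (3x2 for 101 plus 2x1 for 102). The persnr 103 has no second row, so it appears in neither result.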
You are right. It's the cartesian product's fault. Suppose you have persnr 1,1,1,2,2,2 in the first table and persnr 1,1,1,2,2 in the second. How many lines are you expecting to be returned?
In pseudo-code it would go like this:
Select
...
WHERE persnr in (second table)
-- 6 lines
Select persnr
FROM ...
JOIN ... ON a.persnr = b.persnr
-- 3X3 + 3X2 = 15 lines.
SELECT DISTINCT persnr
FROM ...
JOIN ... ON a.persnr = b.persnr
-- 2 lines (1 and 2)
Take your pick
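The three row counts in the pseudo-code can be reproduced with the persnr values given in that answer. This is a sketch only: the table names first_table and second_table are placeholders chosen here, and SQLite through Python's sqlite3 module is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_table (persnr INTEGER);
    CREATE TABLE second_table (persnr INTEGER);
    INSERT INTO first_table VALUES (1),(1),(1),(2),(2),(2);
    INSERT INTO second_table VALUES (1),(1),(1),(2),(2);
""")

def count(sql):
    # Run a query that returns a single COUNT(*) value
    return conn.execute(sql).fetchone()[0]

# IN: one output row per row of the first table that has a match
in_count = count("""
    SELECT COUNT(*) FROM first_table a
    WHERE a.persnr IN (SELECT persnr FROM second_table)
""")

# JOIN: one output row per matching pair of rows
join_count = count("""
    SELECT COUNT(*) FROM first_table a
    JOIN second_table b ON a.persnr = b.persnr
""")

# DISTINCT on top of the join: one output row per distinct persnr
distinct_count = count("""
    SELECT COUNT(*) FROM (
        SELECT DISTINCT a.persnr FROM first_table a
        JOIN second_table b ON a.persnr = b.persnr)
""")

print(in_count, join_count, distinct_count)
```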

Compare two unrelated tables sql

We're dealing with geographic data in our Oracle database.
There's a function called ST_Intersects(x,y) which returns true if record x intersects y.
What we're trying to do is compare each record of table A with all records of table B and check three conditions:
condition 1: A.TIMEZONE = 1 (the TIMEZONE field is not unique)
condition 2: B.TIMEZONE = 1
condition 3: ST_Intersects(A.SHAPE, B.SHAPE) (the SHAPE field is where the geographical information is stored)
The result we're looking for is records ONLY from table A that satisfy all 3 conditions above.
We tried this in a single select statement, but it doesn't seem to make much sense logically.
Pseudo-code that demonstrates a cross join:
select A.*
from
tbl1 A, tbl2 B
where
A.TIMEZONE = 1 and
B.TIMEZONE = 1 and
ST_Intersects(A.SHAPE, B.SHAPE)
If you get duplicates, you can add DISTINCT and select only the A.XXX columns.
With a cross join, rows are matched like this:
a.row1 - b.row1
a.row1 - b.row2
a.row1 - b.row3
a.row2 - b.row1
a.row2 - b.row2
a.row2 - b.row3
So if a row of A evaluates to true against multiple rows of B, just add a DISTINCT on a.Column1, etc.
If you want to use the return value from your function in an Oracle SQL statement, you will need to change the function to return 0 or 1 (or 'T'/'F' - some data type supported by Oracle Database, which does NOT support the Boolean data type).
Then you probably want something like
select <columns from A>
from A
where A.timezone = 1
and exists ( select *
from B
where B.timezone = 1
and ST_intersects(A.shape, B.shape) = 1
)
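The difference between the cross-join form and the EXISTS form can be sketched without Oracle. Everything below is an assumption made for illustration: SQLite (through Python's sqlite3 module) replaces Oracle, shapes are simplified to "lo,hi" intervals, and st_intersects is a hypothetical stand-in registered from Python that returns 1 or 0, not the real spatial function.

```python
import sqlite3

def st_intersects(a, b):
    # Hypothetical stand-in: shapes are "lo,hi" intervals; 1 if they overlap, else 0
    a_lo, a_hi = map(int, a.split(","))
    b_lo, b_hi = map(int, b.split(","))
    return 1 if a_lo <= b_hi and b_lo <= a_hi else 0

conn = sqlite3.connect(":memory:")
conn.create_function("st_intersects", 2, st_intersects)
conn.executescript("""
    CREATE TABLE A (id INTEGER, timezone INTEGER, shape TEXT);
    CREATE TABLE B (id INTEGER, timezone INTEGER, shape TEXT);
    INSERT INTO A VALUES (1, 1, '0,10'), (2, 1, '50,60'), (3, 2, '0,10');
    INSERT INTO B VALUES (1, 1, '5,8'), (2, 1, '9,20'), (3, 2, '55,58');
""")

# Cross join: a row of A appears once per intersecting row of B
cross_ids = [r[0] for r in conn.execute("""
    SELECT A.id FROM A, B
    WHERE A.timezone = 1 AND B.timezone = 1
      AND st_intersects(A.shape, B.shape) = 1
    ORDER BY A.id
""")]

# EXISTS: a row of A appears once if at least one row of B intersects it
exists_ids = [r[0] for r in conn.execute("""
    SELECT A.id FROM A
    WHERE A.timezone = 1
      AND EXISTS (
          SELECT * FROM B
          WHERE B.timezone = 1
            AND st_intersects(A.shape, B.shape) = 1)
    ORDER BY A.id
""")]

print(cross_ids)
print(exists_ids)
```

Row 1 of A overlaps two rows of B, so the cross join lists it twice while the EXISTS form lists it once, with no DISTINCT needed.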

group by ID and delete the values

I am modifying my description so that it makes more sense.
I want to select all the IDs which don't have value = Z2. How can I do that with a SQL Server query?
Below is the data example:
PK ID Value
1 1 x1
2 1 x2
3 1 x3
4 1 X4
5 2 X1
6 3 z2
7 2 Z2
8 4 X1
9 4 X2
EDIT: Since you clarified your question to just wanting to select the IDs that don't have z2 as a value, the query gets simpler. Just select all ids and remove those that have a z2 value using EXCEPT:
SELECT id FROM mytable
EXCEPT
SELECT id FROM mytable WHERE value='z2';
An SQLfiddle to test with.
(I'm assuming you want case insensitivity in this query)
DELETE FROM table where value <> 'z2';
You said in your question that you just want to select all the IDs which don't have value = Z2. So why would you want a DELETE query? A simple SELECT query like the following should give you the expected output:
SELECT ID
FROM Table
WHERE Value NOT IN ('Z2')
In the query, I have used the SQL NOT IN condition, which excludes rows with the listed values in the given column. Note that it filters rows, not IDs: an ID such as 2, which has both an X1 row and a Z2 row, is still returned through its X1 row. You don't need a DELETE query to view those IDs.
For better understanding, please refer to this link: SQL: NOT Condition
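The EXCEPT query and the row-level NOT IN filter give different answers on the sample data, which a short script can show. This is a sketch assuming SQLite through Python's sqlite3 module instead of SQL Server; because SQLite compares text case-sensitively by default, lower() is used here to get the case-insensitive match the first answer assumes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (pk INTEGER, id INTEGER, value TEXT);
    INSERT INTO mytable VALUES
        (1, 1, 'x1'), (2, 1, 'x2'), (3, 1, 'x3'), (4, 1, 'X4'),
        (5, 2, 'X1'), (6, 3, 'z2'), (7, 2, 'Z2'),
        (8, 4, 'X1'), (9, 4, 'X2');
""")

# EXCEPT: all ids minus the ids that have at least one z2 row
except_ids = [r[0] for r in conn.execute("""
    SELECT id FROM mytable
    EXCEPT
    SELECT id FROM mytable WHERE lower(value) = 'z2'
    ORDER BY id
""")]

# Row-level filter: ids of every row whose own value is not z2
not_in_ids = [r[0] for r in conn.execute("""
    SELECT DISTINCT id FROM mytable
    WHERE lower(value) NOT IN ('z2')
    ORDER BY id
""")]

print(except_ids)
print(not_in_ids)
```

The EXCEPT form drops ID 2 because one of its rows is Z2, while the row-level filter still returns ID 2 through its X1 row.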

Select query with multiple tables

I need a query to select the common records from four tables based on a single condition on one of the tables.
I used a query which returns 240 records, but the condition alone returns only 2 records.
The reference numbers in all the given tables are the same.
Select b.cdr_data
,a.cdr_data
,c.cdr_data
from itaukei_data_store b
,itaukei_data_store_key a
,ITAUKEI_BANK_ACCOUNT c
,payment_data_store d
where a.reference_no = b.reference_no
and a.reference_no=c.ITK_REFNO
and b.INDIVIDUAL_REFNO=d.INDIV_REF_NO
and d.remarks='Below 18 years';
But
select * from payment_data_store where remarks='Below 18 years';
returns only 2 records.
Try it like this:
Select b.cdr_data,a.cdr_data,c.cdr_data,d.cdr_data
from itaukei_data_store b,itaukei_data_store_key a,
ITAUKEI_BANK_ACCOUNT c,payment_data_store d
where a.reference_no = b.reference_no
and b.reference_no=c.ITK_REFNO
and b.INDIVIDUAL_REFNO=d.INDIV_REF_NO
and d.remarks='Below 18 years';