Different conditions (similar to IF) + unique values - SQL

I need to filter data using different conditions. One is that I need to check whether the values in one column (d) are unique IF the values in another column (c) are greater than 1.
Let's assume:
Columns a, b, c, d
So I don't want any entries where c is greater than 1 while d has non-unique values.
Select TOP 100 * From table
Where (a = 'Max' AND b = '2019') -- just an additional filter, which always applies
AND (c = 1 -- if c is one, that is fine
OR (c > 1 AND -- here I want to check if c is bigger than 1 AND if d is unique; but that's the part I need help with
);
Thank you very much in advance!

Create a CTE where you count the distinct values of column d and use it in the WHERE clause:
with cte as (
select count(distinct d) counter from tablename
)
...........................................
Where ....(c > 1 AND (select counter from cte) = 1)
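
Note that the CTE above counts the distinct values of d over the whole table. If "unique" is instead meant per value (a row with c > 1 is kept only when its d occurs exactly once among the rows passing the a/b filter), a correlated subquery is one way to express it. The following is a minimal sketch under that assumption, run through Python's sqlite3 module; the table name t and the sample rows are hypothetical, and SQLite would use LIMIT rather than TOP.

```python
import sqlite3

# Hypothetical sample rows: (a, b, c, d)
SAMPLE = [
    ("Max", "2019", 1, "x"),     # kept: c = 1
    ("Max", "2019", 2, "x"),     # dropped: c > 1 and d = 'x' occurs twice
    ("Max", "2019", 3, "y"),     # kept: c > 1 and d = 'y' occurs once
    ("Max", "2019", 2, "z"),     # dropped: d = 'z' occurs twice
    ("Max", "2019", 5, "z"),     # dropped: d = 'z' occurs twice
    ("Moritz", "2019", 2, "w"),  # dropped by the a/b filter
]

def filter_rows(rows):
    """Return (c, d) for rows with c = 1, or c > 1 and a d that occurs once."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (a TEXT, b TEXT, c INTEGER, d TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT c, d
        FROM t
        WHERE a = 'Max' AND b = '2019'
          AND (c = 1
               OR (c > 1
                   -- assumption: d must be unique among rows with the same a and b
                   AND (SELECT COUNT(*)
                        FROM t AS t2
                        WHERE t2.a = t.a AND t2.b = t.b AND t2.d = t.d) = 1))
        ORDER BY c, d
    """
    result = con.execute(query).fetchall()
    con.close()
    return result
```

Calling filter_rows(SAMPLE) keeps only the c = 1 row and the row whose d value is not shared with any other row.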

SQL select ordered randomly, but certain rows must be grouped by a 'next' id column

I have a table with two columns, id and next. Both would be the SHA256 of a file, id being the primary key, and next being nullable, referencing another row's id.
What I'm trying to do is select the rows from a table ordered randomly, but at the same time: if a row contains a value in next, the next row's id/pk MUST be the value of next, from the previous row. It would essentially be a random query, but keeping certain rows that depend on each other in a sequence.
The random part would be easy, just something like SELECT * FROM table ORDER BY rand(), but I didn't find anything about ordering based on a previous row's value. Another option would be sorting manually in the client after the query, but that might be too costly depending on the table's size.
Example:
id  next
--  ----
a   null
b   c
c   d
d   null
e   null
f   null
g   null
h   e
i   null
Expected result:
id  next
--  ----
f   null
i   null
h   e
e   null
b   c
c   d
d   null
g   null
a   null
(Note that the results are shuffled, but h is followed by e, b is followed by c, which is followed by d)
Is it possible to do such a query in SQLite?
This is a graph walking problem. I assume that your structure is a set of linked lists:
No cycles.
next is unique
These assumptions are based on your naming.
With this assumption, you can use a relatively simple recursive CTE to construct the path to each id and then order by that path:
with recursive cte as (
select id, next, cast(id as text) as path
from t
where not exists (select 1 from t t2 where t2.next = t.id)
union all
select t.id, t.next, (cte.path || coalesce('->' || t.id, ''))
from cte join
t
on cte.next = t.id
)
select id, next
from cte
order by path;
Here is a db<>fiddle.
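
Ordering by path keeps each chain together but always returns the chains in the same order. One possible way to shuffle the chains, sketched here under the same assumptions (no cycles, next is unique), is to draw a random key for each chain head in the anchor query and carry it down the recursion. The column names head, sort_key and depth are introduced only for this illustration.

```python
import sqlite3

# The example table from the question: (id, next)
EXAMPLE = [
    ("a", None), ("b", "c"), ("c", "d"), ("d", None), ("e", None),
    ("f", None), ("g", None), ("h", "e"), ("i", None),
]

def shuffled_chains(rows):
    """Return (id, next) rows with chains in random order but kept contiguous."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id TEXT PRIMARY KEY, next TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        WITH RECURSIVE cte AS (
            -- chain heads: rows that no other row points to
            SELECT id, next, id AS head, random() AS sort_key, 0 AS depth
            FROM t
            WHERE NOT EXISTS (SELECT 1 FROM t AS t2 WHERE t2.next = t.id)
            UNION ALL
            -- followers inherit the head's random key and go one level deeper
            SELECT t.id, t.next, cte.head, cte.sort_key, cte.depth + 1
            FROM cte JOIN t ON cte.next = t.id
        )
        SELECT id, next
        FROM cte
        ORDER BY sort_key, head, depth  -- head breaks ties between equal keys
    """
    result = con.execute(query).fetchall()
    con.close()
    return result
```

Each call to shuffled_chains(EXAMPLE) may return the chains in a different order, while h is still followed by e and b by c and then d.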

Finding a pair of row in SQL

I am not sure how to state the problem, but let's say the table History is as shown below, and I want to find the rows that have a pair.
I define a pair as two rows where columns a and b have the same values, c is False, and d differs between the two rows.
If I were using Java, I would have set column c of row 3 to true when I hit a pair, or saved rows 1 and 3 into a separate list, so that row 2 can be excluded. But I don't know how to do the same in SQL.
Table - History
a  b   c (Boolean)  d
-  --  -----------  -
1  bb  F            d
1  bb  F            d
1  bb  F            c
Query ? ----
Result - rows 1 and 3.
Assuming the table is called test:
SELECT
*
FROM
test
WHERE id IN (
SELECT
MIN(id)
FROM
test
WHERE
!c
AND a = b
AND d != a
GROUP BY a, d
)
We get the smallest id of every group matching your conditions. Furthermore, we group the results by a, d, which means we get only unique pairs of "a and d". Then we use these ids to select the rows we want.
Working example.
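
If "a and b have the same value" is read as a comparison between two rows rather than between the two columns of one row, the same MIN(id) idea can be sketched as follows. This is only an illustration under that reading, run through Python's sqlite3 module; booleans are stored as 0/1 here, and the function name is hypothetical.

```python
import sqlite3

# Sample rows from the question: (a, b, c, d), with c = 0 meaning False
HISTORY = [(1, "bb", 0, "d"), (1, "bb", 0, "d"), (1, "bb", 0, "c")]

def first_id_per_paired_d(rows):
    """Return the smallest id for each distinct d that has a pair partner."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE test (id INTEGER PRIMARY KEY, a INTEGER, b TEXT, "
        "c INTEGER, d TEXT)"
    )
    con.executemany("INSERT INTO test (a, b, c, d) VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT MIN(x.id) AS first_id
        FROM test AS x
        WHERE x.c = 0
          -- a partner row: same a and b, c also false, different d
          AND EXISTS (SELECT 1 FROM test AS y
                      WHERE y.a = x.a AND y.b = x.b
                        AND y.c = 0 AND y.d <> x.d)
        GROUP BY x.a, x.b, x.d
        ORDER BY first_id
    """
    ids = [row[0] for row in con.execute(query)]
    con.close()
    return ids
```

For the sample data this yields the ids of rows 1 and 3, since row 2 repeats the d value of row 1.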
Update: without existing id
# add PK afterwards
ALTER TABLE test ADD COLUMN id INT PRIMARY KEY AUTO_INCREMENT FIRST;
Working example.
All the rows match the condition you specified. A "pair" happens when:
column a and b will have same value, and
c should have False, and
d should be different for both rows.
Rows 1 and 3 match, and so do rows 2 and 3. In the other direction, rows 3 and 1 match, and so do rows 3 and 2. There are four solutions.
You don't say which database, so I'll assume PostgreSQL. The query that can search using your criteria is:
select *
from t x
where exists (
select null from t y
where y.a = x.a
and y.b = x.b
and not y.c
and y.d <> x.d
);
Result:
a b c d
-- --- ------ -
1 bb false d
1 bb false d
1 bb false c
That is... the whole table.
See running example at DB Fiddle.

(SQL) How to compare each row to all other rows in Presto

I'm working with AWS Athena which uses Presto. Let's say I have a SQL table with columns A, B, C, and D. Assume table is sorted by column C, ascending.
I need to compare each row to all the other rows and check if current row's D value is the maximum value out of all rows whose C values are less than current row's C value. Then append a boolean value in column F. Code in Python would look something like:
D_val_list = []
for index, row in df.iterrows():
    max_val_D = df[:index]['D'].max()  # df is sorted on column C
    if row['D'] < max_val_D:
        D_val_list.append(False)
    else:
        D_val_list.append(True)
df['F'] = D_val_list
Using the provisional Jupyter notebook in Athena times out (the dataset has millions of rows), and I figure connecting to AWS via a local Jupyter instance would have similar issues.
In SQL, you would use window functions -- something like this:
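select t.*,
       -- 1 when d is at least the maximum d of all earlier rows (ordered by c);
       -- the first row has no earlier rows, so it is compared with itself
       (case when d >= coalesce(max(d) over (order by c
                                             rows between unbounded preceding and 1 preceding),
                                d)
             then 1 else 0
        end) as flag
from t;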
This logic would work assuming that c is unique. That said, there might be alternatives depending on the exact nature of the data.
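
The window-function idea can be tried locally before running it on Athena. The sketch below uses Python's sqlite3 module (window functions need SQLite 3.25 or later) and follows the loop from the question: a row is flagged 1 when its d is not below the maximum d of the rows with smaller c. The table name t, the column name f and the sample rows are hypothetical, and Presto may differ from SQLite in details.

```python
import sqlite3

def flag_running_max(rows):
    """Return (c, d, f) with f = 1 when d >= max(d) over rows with smaller c."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (c INTEGER, d INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        SELECT c, d,
               d >= COALESCE(
                        MAX(d) OVER (ORDER BY c
                                     ROWS BETWEEN UNBOUNDED PRECEDING
                                              AND 1 PRECEDING),
                        d) AS f   -- first row has no predecessors, so f = 1
        FROM t
        ORDER BY c
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# Rows are inserted out of order on purpose; the window's ORDER BY c sorts them.
ROWS = [(3, 4), (1, 3), (5, 2), (2, 1), (4, 4)]
```

Because the ordering lives inside the OVER clause, the result does not depend on the physical order of the rows.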
You have to order your rows on c explicitly in Athena because of its distributed nature. You can then use window functions on top of the ordered set to achieve your desired results:
SELECT
a,
b,
c,
d,
CASE WHEN d>lag(max_so_far) OVER () THEN true ELSE false END as f
FROM (
SELECT a,
b,
c,
d,
max(d) OVER (rows BETWEEN unbounded preceding AND current row) AS max_so_far
FROM (
-- sorted ON c
SELECT
a,
b,
c,
d
FROM dataset.table
ORDER BY c
)
)

SQL - HAVING MIN() vs WHERE

Are these queries exactly the same, or is it possible to get different results depending on the data?
SELECT A, B, C, D
FROM Table_A
GROUP BY A, B, C, D , E
HAVING A in (1,2) AND E = 1 AND MIN(status) = 100
SELECT A, B, C, D
FROM Table_A
WHERE A IN (1,2) AND E = 1 AND status = 100
GROUP BY A, B, C, D , E
They're not equal.
When you consider the following block
create table Table_A(A int, B int, C int, D int, E int, status int);
insert into Table_A values(1,1,1,1,1,100);
insert into Table_A values(1,1,1,1,1,10);
insert into Table_A values(2,1,1,1,1,10);
SELECT A, B, C, D, 'First Query' as query
FROM Table_A
GROUP BY A, B, C, D , E
HAVING A in (1,2) AND E = 1 AND MIN(status) = 100;
SELECT A, B, C, D, 'Second Query' as query
FROM Table_A
WHERE A IN (1,2) AND E = 1 AND status = 100
GROUP BY A, B, C, D , E
you get
A B C D query
- - - - -------------
1 1 1 1 Second Query
as a result (only the second query returns a row),
since min(status) = 10 for both of the groupings 1,1,1,1,1 and 2,1,1,1,1.
For this reason the condition min(status) = 100 is never met, and the first query returns no rows.
Rextester Demo
A couple of things:
HAVING MIN(status) = 100
and
WHERE status = 100
are different. The WHERE condition discards every row whose status is not 100 before any grouping happens, so those rows never reach the aggregate. The HAVING clause is evaluated only after every record has been read, and it looks at the result of the aggregate function (min) for the specified grouping.
Also, a more subtle difference: for conditions that do not involve aggregates, the WHERE clause is preferable because it can make use of any index on the table and, equally important, it keeps the filtered records from being joined and grouped at all.
For example
having E = 1
and
where E = 1
functionally do the same thing. The difference is you need to collect, group and sort a bunch of records only to discard them using "having," whereas the "where" option removes them before any grouping ever occurs. Also, in this example, with the "where" option, you can remove E from the grouping criteria since it is always 1.
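
The equivalence for a non-aggregate condition can be checked on a small data set. This is a minimal sketch using Python's sqlite3 module; the fourth sample row (with E = 2) is added here only for illustration and is not part of the data above.

```python
import sqlite3

# (A, B, C, D, E, status); the last row is a hypothetical addition with E = 2
ROWS = [
    (1, 1, 1, 1, 1, 100),
    (1, 1, 1, 1, 1, 10),
    (2, 1, 1, 1, 1, 10),
    (2, 1, 1, 1, 2, 100),
]

def compare_having_and_where(rows):
    """Run the HAVING E = 1 and WHERE E = 1 variants and return both results."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Table_A (A INT, B INT, C INT, D INT, E INT, status INT)"
    )
    con.executemany("INSERT INTO Table_A VALUES (?, ?, ?, ?, ?, ?)", rows)
    # Groups every row first, then throws away the groups with E <> 1.
    having_query = """
        SELECT A, B, C, D FROM Table_A
        GROUP BY A, B, C, D, E
        HAVING E = 1
        ORDER BY A
    """
    # Filters first; E is always 1 afterwards, so it can leave the GROUP BY.
    where_query = """
        SELECT A, B, C, D FROM Table_A
        WHERE E = 1
        GROUP BY A, B, C, D
        ORDER BY A
    """
    result = (con.execute(having_query).fetchall(),
              con.execute(where_query).fetchall())
    con.close()
    return result
```

Both queries return the same groups; only the amount of work done before the filter differs.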
At a high level:
The where clause specifies search conditions for the rows returned by the query and limits rows to a meaningful set.
The having clause works as a filter on top of grouped rows.

Find Groups of Rows Conditionally, "with Opposite Value,"

I am seeking a way to SELECT rows conditionally, having only the compound key A, B to group on (refer to the picture).
Furthermore, I need to select rows where both a negative value and a positive value of column C are present, skipping 0. An A, B group may contain any number of rows; the minimum is 2, where C has one negative and one positive row.
The data found below is already queried.
Note: I was able to add another column D, because we can't use actual values for C:
D = CASE WHEN C < 0 THEN 1 ELSE 2 end
So the logic could be SELECT * WHERE SUM(D) >= 3.
I am fully able to complete this task with another language such as C#, but I have to get this done using only SQL.
I would also like to avoid temporary tables. Column D is not required.
Would this work?
Select tblA.*
FROM tblA
INNER JOIN
(select A,B
from tblA
Group By A,B
HAVING
SUM(case when C<0 then 1 else 2 end) >=3
)X
on X.A=tblA.A and X.B=tblA.B
SQLFiddle
http://sqlfiddle.com/#!9/2078f/2
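
One caveat with the SUM-based test is that a group with two positive rows also reaches a sum of 4. A possible alternative, sketched here with Python's sqlite3 module, is to require MIN(C) < 0 and MAX(C) > 0 per group. The sample rows are hypothetical, and leaving the zero rows out of the output is an assumption about what "skipping 0" means.

```python
import sqlite3

# Hypothetical sample rows: (A, B, C)
ROWS = [
    (1, "x", -5), (1, "x", 5), (1, "x", 0),  # has both signs: qualifies
    (2, "y", 3), (2, "y", 4),                # positives only: SUM(D) would be 4
    (3, "z", -1), (3, "z", 0),               # negative and zero only
]

def groups_with_both_signs(rows):
    """Return rows of the A, B groups that hold a negative and a positive C."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tblA (A INTEGER, B TEXT, C INTEGER)")
    con.executemany("INSERT INTO tblA VALUES (?, ?, ?)", rows)
    query = """
        SELECT tblA.A, tblA.B, tblA.C
        FROM tblA
        INNER JOIN (SELECT A, B
                    FROM tblA
                    GROUP BY A, B
                    HAVING MIN(C) < 0 AND MAX(C) > 0) AS X
                ON X.A = tblA.A AND X.B = tblA.B
        WHERE tblA.C <> 0   -- assumption: zero rows are skipped in the output
        ORDER BY tblA.A, tblA.B, tblA.C
    """
    result = con.execute(query).fetchall()
    con.close()
    return result
```

With the sample rows, only the group (1, x) is returned, because it is the only one that contains both a negative and a positive value.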