I am looking for a way to SELECT rows conditionally, having only the compound key A, B (refer to the picture).
Specifically, I need to select the rows of every A, B group in which column C has both a negative and a positive value, skipping 0. A group may contain any number of rows; the minimum is 2, one where C is negative and one where C is positive.
The data found below is already queried.
Note: I was able to add another column D, because we can't use actual values for C:
D = CASE WHEN C < 0 THEN 1 ELSE 2 end
So the logic could be SELECT * WHERE SUM(D) >= 3.
I am fully able to complete this task with another language such as C#, but I have to get this done using only SQL.
I would also like to avoid temporary tables. Column D is not required.
Would this work?
SELECT tblA.*
FROM tblA
INNER JOIN (
    SELECT A, B
    FROM tblA
    GROUP BY A, B
    HAVING SUM(CASE WHEN C < 0 THEN 1 ELSE 2 END) >= 3
) X
    ON X.A = tblA.A AND X.B = tblA.B
SQLFiddle
http://sqlfiddle.com/#!9/2078f/2
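One thing to watch with the SUM(...) >= 3 test: a group with two non-negative rows scores 2 + 2 = 4 and passes even though it contains no negative value. Below is a minimal sketch of a stricter variant that counts negatives and positives separately. It is run through Python's sqlite3 module only to keep it self-contained; the sample rows are made up for illustration, and only the names tblA, A, B and C come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblA (A INTEGER, B INTEGER, C INTEGER)")
conn.executemany(
    "INSERT INTO tblA VALUES (?, ?, ?)",
    [
        (1, 1, -5), (1, 1, 7),             # negative and positive: keep
        (1, 2, 3), (1, 2, 4),              # positives only: skip
        (2, 1, -1), (2, 1, 0),             # negative and zero only: skip
        (2, 2, -2), (2, 2, 0), (2, 2, 9),  # negative, zero, positive: keep
    ],
)

QUERY = """
SELECT t.*
FROM tblA AS t
INNER JOIN (
    SELECT A, B
    FROM tblA
    GROUP BY A, B
    -- the group qualifies only if it has at least one value on each side of zero
    HAVING SUM(CASE WHEN C < 0 THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN C > 0 THEN 1 ELSE 0 END) > 0
) AS x ON x.A = t.A AND x.B = t.B
ORDER BY t.A, t.B, t.C
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

If the rows where C = 0 should also be left out of the result, adding WHERE t.C <> 0 to the outer query would do that.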
Related
I have two tables that share IDs in a PostgreSQL database.
I would like to select certain rows from table A, based on condition Y (in table A) AND based on condition Z in a different table (B).
For example:
Table A          Table B
ID | type        ID | date
0  | E           1  | 01.01.2022
1  | F           2  | 01.01.2022
2  | E           3  | 01.01.2010
3  | F
IDs MUST be unique - the same ID can appear only once in each table, and if the same ID is in both tables it means that both are referring to the same object.
Using an SQL query, I would like to find all cases where:
1 - the same ID exists in both tables
2 - type is F
3 - date is after 31.12.2021
And again, only rows from table A will be returned.
So the only returned row should be: 1 F
It is a bit hard to understand what problem you are actually facing, as this is very basic SQL.
Use EXISTS:
select *
from a
where type = 'F'
and exists (select null from b where b.id = a.id and dt >= date '2022-01-01');
Or IN:
select *
from a
where type = 'F'
and id in (select id from b where dt >= date '2022-01-01');
Or, as the IDs are unique in both tables, join:
select a.*
from a
join b on b.id = a.id
where a.type = 'F'
and b.dt >= date '2022-01-01';
My favorite here is the IN clause, because you want to select data from table A where conditions are met. So no join needed, just a where clause, and IN is easier to read than EXISTS.
SELECT *
FROM A
WHERE type = 'F'
  AND id IN (
      SELECT id
      FROM B
      WHERE date >= '2022-01-01' -- '2022' imo should be enough, need to check
  );
I don't think joining is necessary.
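To check that the EXISTS, IN and JOIN forms agree on the sample data from the question, here is a small sketch using Python's sqlite3 module. It assumes the date column is called dt, as in the first answer, and stores it as ISO-8601 text because SQLite has no DATE type; in PostgreSQL the comparison would be against a real date value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE b (id INTEGER PRIMARY KEY, dt TEXT);  -- ISO-8601 text stands in for a DATE column
INSERT INTO a VALUES (0, 'E'), (1, 'F'), (2, 'E'), (3, 'F');
INSERT INTO b VALUES (1, '2022-01-01'), (2, '2022-01-01'), (3, '2010-01-01');
""")

queries = {
    "exists": """
        SELECT * FROM a
        WHERE type = 'F'
          AND EXISTS (SELECT 1 FROM b WHERE b.id = a.id AND b.dt >= '2022-01-01')
    """,
    "in": """
        SELECT * FROM a
        WHERE type = 'F'
          AND id IN (SELECT id FROM b WHERE dt >= '2022-01-01')
    """,
    "join": """
        SELECT a.* FROM a
        JOIN b ON b.id = a.id
        WHERE a.type = 'F' AND b.dt >= '2022-01-01'
    """,
}

# Run each variant and collect the rows it returns from table a.
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

The join form only matches the other two because id is unique in b; with duplicate ids in b it would return a row of a once per matching row of b.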
I'm working with AWS Athena which uses Presto. Let's say I have a SQL table with columns A, B, C, and D. Assume table is sorted by column C, ascending.
I need to compare each row to all the other rows and check if current row's D value is the maximum value out of all rows whose C values are less than current row's C value. Then append a boolean value in column F. Code in Python would look something like:
D_val_list = []
for index, row in df.iterrows():
    max_val_D = df[:index]['D'].max()  # df is sorted on column C and has a default 0..n-1 index
    if row['D'] < max_val_D:
        D_val_list.append(False)
    else:
        D_val_list.append(True)
df['F'] = D_val_list
Running this in the Jupyter notebook provided in Athena times out (the dataset is millions of rows long), and I figure connecting to AWS from a local Jupyter instance would have similar issues.
In SQL, you would use window functions -- something like this:
select t.*,
       (case when d >= coalesce(max(d) over (order by c
                                             rows between unbounded preceding and 1 preceding),
                                d)  -- no earlier rows: compare d with itself
             then 1 else 0
        end) as flag
from t;
This logic would work assuming that c is unique. That said, there might be alternatives depending on the exact nature of the data.
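To make the window frame concrete, here is a rough pure-Python sketch of what "maximum of d over all preceding rows" computes, done in a single pass instead of re-scanning the prefix for every row. The function name flag_running_max and the sample data are made up for illustration. It follows the semantics of the loop in the question (True when d is not below the earlier maximum, and True for the first row) and assumes c is unique.

```python
from itertools import accumulate


def flag_running_max(rows):
    """rows is a list of (c, d) tuples.

    Returns (c, d, f) tuples sorted by c, where f is True when d is at least
    the maximum d of all earlier rows (and True for the first row).
    """
    ordered = sorted(rows, key=lambda r: r[0])
    d_values = [d for _, d in ordered]
    # running[i] is max(d_values[:i + 1]); shifting it by one position gives
    # the maximum over strictly earlier rows, like "... and 1 preceding".
    running = list(accumulate(d_values, max))
    previous_max = [None] + running[:-1]
    return [
        (c, d, prev is None or d >= prev)
        for (c, d), prev in zip(ordered, previous_max)
    ]


sample = [(1, 5), (2, 3), (3, 7), (4, 7), (5, 6)]
flags = flag_running_max(sample)
for row in flags:
    print(row)
```

The point of the sketch is the cost: one sort plus one linear pass, which is roughly the work the window function does, versus the quadratic prefix scan in the original loop.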
You have to order your rows on c explicitly in Athena because of its distributed nature. An ORDER BY in a subquery is not guaranteed to carry over to the outer query, so put the ordering inside the window definitions and use window functions on top of that ordered set to achieve your desired results:
SELECT
    a,
    b,
    c,
    d,
    -- lag() is NULL on the first row, which has no predecessors, so that row is flagged true
    CASE WHEN d >= coalesce(lag(max_so_far) OVER (ORDER BY c), d) THEN true ELSE false END AS f
FROM (
    SELECT
        a,
        b,
        c,
        d,
        max(d) OVER (ORDER BY c ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS max_so_far
    FROM dataset.table
)
I need to filter data using different conditions. One is that I need to check if the values in one column (column d) are unique IF the values in another column (c) are greater than 1.
Let's assume:
Columns a, b, c, d
So I don't want any entries where c is greater than 1 while d has non-unique values.
Select TOP 100 * From table
Where (a = 'Max' AND b = '2019') -- just an additional filter, which always applies
AND (c = 1 -- if c is one, that is fine
OR (c > 1 AND -- here I want to check if c is bigger than 1 AND if d is unique; but thats the part I need help with
);
Thank you very much in advance!
Create a CTE where you count the distinct values of column d and use it in the WHERE clause:
with cte as (
select count(distinct d) counter from tablename
)
...........................................
Where ....(c > 1 AND (select counter from cte) = 1)
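If "d is unique" is read per row, meaning that no other row in the table shares the same d value, a correlated count is one way to express it. The sketch below takes that reading, which is an assumption and differs from the table-wide COUNT(DISTINCT d) above. It runs on SQLite through Python's sqlite3 module, so LIMIT 100 stands in for TOP 100, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (a TEXT, b TEXT, c INTEGER, d TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [
        ("Max", "2019", 1, "x"),
        ("Max", "2019", 1, "x"),  # duplicate d is fine because c = 1
        ("Max", "2019", 2, "y"),  # c > 1 and 'y' occurs once: keep
        ("Max", "2019", 3, "z"),
        ("Max", "2019", 4, "z"),  # c > 1 and 'z' occurs twice: drop both
        ("Eve", "2019", 2, "w"),  # fails the a/b filter
    ],
)

QUERY = """
SELECT t.a, t.b, t.c, t.d
FROM tablename AS t
WHERE t.a = 'Max' AND t.b = '2019'
  AND (t.c = 1
       OR (t.c > 1
           -- d counts as unique when exactly one row in the table carries it
           AND (SELECT COUNT(*) FROM tablename AS u WHERE u.d = t.d) = 1))
ORDER BY t.c, t.d
LIMIT 100
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Whether the count should look at the whole table or only at the rows that pass the a/b filter depends on what "unique" is meant to cover; the subquery's WHERE clause is the place to change that.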
Imagine a table with only one column.
+------+
| v |
+------+
|0.1234|
|0.8923|
|0.5221|
+------+
I want to do the following for row K:
Take row K=1 value: 0.1234
Count how many values in the rest of the table are less than or equal to value in row 1.
Iterate through all rows
Output should be:
+------+-------+
| v |output |
+------+-------+
|0.1234| 0 |
|0.8923| 2 |
|0.5221| 1 |
+------+-------+
Quick update: I was using this approach to compute a statistic at every value of v in the above table. The cross join approach was way too slow for the size of data I was dealing with. So instead I computed my stat for a grid of v values and then matched them to the values of v in the original data. v_table is the data table from before and stat_comp is the statistics table.
AS SELECT t1.*
,CASE WHEN v<=1.000000 THEN pr_1
WHEN v<=2.000000 AND v>1.000000 THEN pr_2
FROM v_table AS t1
LEFT OUTER JOIN stat_comp AS t2
Window functions were added to ANSI/ISO SQL in 1999 and to Hive in version 0.11, which was released on 15 May 2013.
What you are looking for is a variation on rank with ties high, which in ANSI/ISO SQL:2011 would look like this:
rank () over (order by v with ties high) - 1
Hive currently does not support with ties ... but the logic can be implemented using count(*) over (...)
select v
,count(*) over (order by v) - 1 as rank_with_ties_high_implicit
from mytable
;
or
select v
,count(*) over
(
order by v
range between unbounded preceding and current row
) - 1 as rank_with_ties_high_explicit
from mytable
;
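As a rough illustration of why the windowed count avoids the cost of a cross join, the same number (how many other values are less than or equal to v) can be obtained from one sort plus a binary search per row. This is a pure-Python sketch, not Hive code; the function name count_others_le is made up.

```python
from bisect import bisect_right


def count_others_le(values):
    """For each v, count the other entries that are <= v (ties included)."""
    ordered = sorted(values)
    # bisect_right returns how many sorted values are <= v;
    # subtract 1 so the row does not count itself.
    return [(v, bisect_right(ordered, v) - 1) for v in values]


result = count_others_le([0.1234, 0.8923, 0.5221])
for v, output in result:
    print(v, output)
```

Tied values all get the same count, which matches the "ties high" behaviour of count(*) over (order by v) with its default range frame.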
Generate sample data
select 0.1234 as v into #t
union all
select 0.8923
union all
select 0.5221
This is the query
;with ct as (
select ROW_NUMBER() over (order by v) rn
, v
from #t ot
)
select distinct v, a.cnt
from ct ot
outer apply (select count(*) cnt from ct where ct.rn <> ot.rn and v <= ot.v) a
After seeing your edits, it really does look like you could use a Cartesian product, i.e. CROSS JOIN, here. I called your table foo and cross joined it to itself as bar:
SELECT foo.v, COUNT(foo.v) - 1 AS output
FROM foo
CROSS JOIN foo bar
WHERE foo.v >= bar.v
GROUP BY foo.v;
Here's a fiddle.
This query cross joins the table with itself so that every pairing of the column's values is returned (you can see this yourself by removing the COUNT and GROUP BY clauses, and adding bar.v to the SELECT). It then counts the pairs where foo.v >= bar.v and subtracts 1 for the pairing of each value with itself, yielding the final result.
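For anyone who wants to try the foo/bar query without a fiddle, here is a small sketch that runs it on the three sample values using Python's sqlite3 module. An ORDER BY is added only to make the output order predictable, and the values of v are assumed to be distinct; duplicates would be merged by the GROUP BY and inflate the counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (v REAL)")
conn.executemany("INSERT INTO foo VALUES (?)", [(0.1234,), (0.8923,), (0.5221,)])

QUERY = """
SELECT foo.v, COUNT(foo.v) - 1 AS output
FROM foo
CROSS JOIN foo AS bar
WHERE foo.v >= bar.v   -- keep the pairs where bar's value does not exceed foo's
GROUP BY foo.v
ORDER BY foo.v
"""

rows = conn.execute(QUERY).fetchall()
for v, output in rows:
    print(v, output)
```

The intermediate result has one row per pair of values, so the work grows with the square of the table size.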
You can take the full Cartesian product of the table with itself and sum a case statement:
select a.x
, sum(case when b.x < a.x then 1 else 0 end) as count_less_than_x
from (select distinct x from T) a
, T b
group by a.x
This will give you one row per unique value in the table with the count of non-unique rows whose value is less than this value.
Notice that there is neither a join nor a where clause. In this case, we actually want that. For each row of a we get a full copy aliased as b. We can then check each one to see whether or not it's less than a.x. If it is, we add 1 to the count. If not, we just add 0.
In Oracle SQL Developer, how do I compare three tables where A + B = C table? I have to validate if all the data of A and B is converted into C. Also table A is in a different database from B and C, which are in the same database.
Let me assume that the different tables have one column, an id. You could use full outer join for this, assuming the id is never NULL. However, this is probably easier using union all and aggregation.
You can get a list of ids that differ using the following query:
select id, sum(inab) as inab, sum(inc) as inc
from ((select id, 1 as inab, 0 as inc
from a
) union all
(select id, 1 as inab, 0 as inc
from b
) union all
(select id, 0 as inab, 1 as inc
from c
)
) c
group by id
having sum(inab) <> 1 or sum(inc) <> 1;
In practice, you would probably have multiple columns. Note: if there are duplicates in A+B or C, this just guarantees that the duplicate appears in both (rather than in both with the same count).
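Here is a minimal sketch of the union all and aggregation check on made-up ids, run with Python's sqlite3 module. It puts all three tables into one in-memory database purely for illustration; in Oracle, table A living in another database would first have to be reachable from the query (for example through a database link), and that part is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER);
CREATE TABLE b (id INTEGER);
CREATE TABLE c (id INTEGER);
INSERT INTO a VALUES (1), (2);
INSERT INTO b VALUES (3), (4);
INSERT INTO c VALUES (1), (2), (3), (5);  -- 4 was not converted, 5 has no source
""")

QUERY = """
SELECT id, SUM(inab) AS inab, SUM(inc) AS inc
FROM (
    SELECT id, 1 AS inab, 0 AS inc FROM a
    UNION ALL
    SELECT id, 1 AS inab, 0 AS inc FROM b
    UNION ALL
    SELECT id, 0 AS inab, 1 AS inc FROM c
) AS t
GROUP BY id
-- a clean id appears exactly once in A+B and exactly once in C
HAVING SUM(inab) <> 1 OR SUM(inc) <> 1
ORDER BY id
"""

mismatches = conn.execute(QUERY).fetchall()
for row in mismatches:
    print(row)
```

Each returned row shows how often the id occurs on each side, so (4, 1, 0) reads as "present in A or B, missing from C" and (5, 0, 1) as the reverse.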