Get rows which don't satisfy any of below condition - hive

I have a table with id and different values. I want to have my output something which would looks like this
id value
----------
1 t
1 f
2 t
3 f
4 f
4 f
Expected output
id value
---------
3 f
4 f
If we look at the output, my condition here to check if my id has all f as value then return f, if it has all t then don't, and if any one of the id has t also don't include row in output.
How to achieve this ?

create subquery and exclude the values accordingly. i think hiveql supports where clause subqueries.
select id, value
from your_data_source
where id not in
(select id from your_data_source where value='t' group by id)
group by id, value

Related

Is there a way to display the first two results of each unique id?

I work in healthcare. In a Postgres database, we have a table member IDs and dates. I'm trying to pull the latest two dates for each member ID.
Simplified sample data:
A 1
B 1
B 2
C 1
C 5
C 7
D 1
D 2
D 3
D 4
Desired result:
A 1
B 1
B 2
C 1
C 5
D 1
D 2
I get a strong feeling this is for a homework assignment and would recommend that you look into partitioning and specifically rank() function by yourself first before looking at my solution.
Moreover, you have not specified how you received the initial result you provided, so I'll have to assume you just did select letter_column, number_column from my_table; to achieve the result.
So, what you actually want here is partition the initial query result into groups by the letter_column and select the first two rows in each. rank() function lets you assign each row a number, counting within groups:
select letter_column,
number_column,
rank() over (partition by letter_column order by number_column) as rank
from my_table;
Since it's a function, you can't use it in a predicate in the same query, so you'll have to build another query around this one, this time filtering the results where rank is over 2:
with ranked_results as (select letter_column,
number_column,
rank() over (partition by letter_column order by number_column asc) as rank
from my_table mt)
select letter_column,
number_column
from ranked_results
where rank < 3;
Here's an SQLFiddle to play around: http://sqlfiddle.com/#!15/e90744/1/0
Hope this helps!

duplication of rows in table

I have a table which has many rows which are same, except for the id column. How can I show only one row for other duplicate row?
id name roll_number
1 a 1
2 b 2
3 a 1
4 b 2
5 c 3
6 d 4
7 d 4
show output like this
id name roll_number
1 a 1
2 b 2
5 c 3
6 d 4
We can use DISTINCT ON here:
SELECT DISTINCT ON (name) id, name, roll_number
FROM yourTable
ORDER BY name, id;
This query is selecting one record with the lowest id from each group of records having the same name.
Simple aggregation using min
select Min(id), name,roll_number
from t
group by name, roll_number
You could use the numpy.unique(filt, trim='fb') function:
>>> import numpy as np
>>> np.unique(array)
This problem requires to "filter out" tuples during the projection based on groups. The solution is to use distinct on.
SELECT DISTINCT ON (name, roll_number) id, name, roll_number
FROM table
ORDER BY name, id;
it basically creates groups by the attributes within the "DISTINCT_ON" and non-deterministically chooses one tuple, which it outputs.

Finding a pair of row in SQL

I am very confused how to define the problem statement but Let's say below is table History i want to find those rows which have a pair.
Pair I will defined like column a and b will have same value and c should have False and d should be different for both row.
If I am using Java i would have set row 3, C column as true when i hit a pair or would have saved both row 1 and row 3 into different list. So that row 2 can be excluded. But i don't know how to do the same functionality in SQL.
Table - History
col a, b, c(Boolean ), d
1 bb F d
1 bb F d
1 bb F c
Query ? ----
Result - rows 1 and 3.
Assuming the table is called test:
SELECT
*
FROM
test
WHERE id IN (
SELECT
MIN(id)
FROM
test
WHERE
!c
AND a = b
AND d != a
GROUP BY a, d
)
We get the smallest id of every where matching your conditions. Furthermore we group the results by a, d which means we get only unique pairs of "a and d". Then we use this ids to select the rows we want.
Working example.
Update: without existing id
# add PK afterwards
ALTER TABLE test ADD COLUMN id INT PRIMARY KEY AUTO_INCREMENT FIRST;
Working example.
All the rows match the conditioin you specified. A "pair" happens when:
column a and b will have same value, and
c should have False, and
d should be different for both rows.
1 and 3 will match that as well as 2 and 3. Also, 3 and 1 will match as well as 3 and 2. There are four solutions.
You don't say which database, so I'll assume PostgreSQL. The query that can search using your criteria is:
select *
from t x
where exists (
select null from t y
where y.a = x.a
and y.b = x.b
and not y.c
and y.d <> x.d
);
Result:
a b c d
-- --- ------ -
1 bb false d
1 bb false d
1 bb false c
That is... the whole table.
See running example at DB Fiddle.

Need to find out if all columns in a SQL Server table have the same value

I have the task to find out if all columns in a SQL Server table have exact the same value. The table content is created by a stored procedure and can vary in the number of columns. The first column is an ID, the second and the following columns must be compared if the all columns have exact the same value.
At the moment I do not have a clue how to achieve this.
The best solution would be to display only the rows, which have different values in one or multiple columns except the first column with ID.
Thank you so much for your help!!
--> Edit: The table looks this:
ID Instance1 Instance2 Instance3 Instance4 Instance5
=====================================================
A 1 1 1 1 1
B 1 1 0 1 1
C 55 55 55 55 55
D Driver Driver Driver Co-driver Driver
E 90 0 90 0 50
F On On On On On
The result should look like this, only the rows with one or multiple different column values should be display.
ID Instance1 Instance2 Instance3 Instance4 Instance5
=====================================================
B 1 1 0 1 1
D Driver Driver Driver Co-driver Driver
E 90 0 90 0 50
My table has more than 1000 rows and 40 columns
you can achieve this by using row_number()
Try the following code
With c as(
Select id
,field_1
,field_2
,field_3
,field_n
,row_number() over(partition by field_1,field_2,field_3,field_n order by id asc) as rn
From Table
)
Select *
From c
Where rn = 1
row_number with partition is going to show you if the field is repeated by assigning a number to a row based on field_1,field_2,field_3,field_n, for example if you have 2 rows with same field values the inner query is going to show you
rn field_1 field_2 field_3 field_n id
1 x y z a 5
2 x y z a 9
After that on the outer part of the query pick rn = 1 and you are going to obtain a query without repetitions based on fields.
Also if you want to delete repeated numbers from your table you can apply
With c as(
Select id
,field_1
,field_2
,field_3
,field_n
,row_number() over(partition by field_1,field_2,field_3,field_n order by id asc) as rn
From Table
)
delete
From c
Where rn > 1
The best solution would be to display only the rows, which have different values in one or multiple columns except the first column with ID.
You may be looking for a the following simple query, whose WHERE clause filters out rows where all fields have the same value (I assumed 5 fields - id not included).
SELECT *
FROM mytable t
WHERE NOT (
field1 = field2
AND field1 = field3
AND field1 = field4
AND field1 = field5
);

Select a row from the group of row on a particular column value

I want to write a query to achieve the following. I have a table xyz in which there are multiple row with same column value(1) in say column a.
I want to find in column b doesn't have a particular value for the set of rows with value 1 in column a.
Table xyz
---------
a b
1 te
1 we
1 re
2 te
2 re
3 ge
4 re
So basically I want to find if the column b does not have the value 'te' for a set of values from column a
when i do
Select a from xyz where b <> 'te'
group by a
I will get 1,2,3 and 4 both for the result.
But I want the result should only contain 1 and 2. Please help.
Select a from xyz where (b<>'te') and ((a=1) or (a=2))
or as variant
select a from xyz where (b<>'te') and (a in (1, 2))
select a from xyz
where b! = 'tz' and
a in (select a from xyz where b = 'tz')
Is this what you are looking for?
Try this for you:
Select a from xyz where b = 'te'
group by a
I just realized I didn't and still don't understand what you are asking. Could you try and restate it? The only non-trivial interpretation I can come up with that would return 1 and 2 based on this data would be:
What are the values of a such that there is a row with both a and
'te' and a row with both a and a value other than 'te'
in which case a query would be:
SELECT DISTINCT q1.a FROM (SELECT a FROM xyz WHERE b='te') q1
JOIN (SELECT a FROM xyz WHERE b!='te') q2 ON
q1.a=q2.a
The interpretation which corresponds with returning 3 and 4 in your example or with returning 1 and 2 in your geo example would be:
What are the values of a for which a te row does not exist?
in which case a query would be:
SELECT DISTINCT a FROM xyz WHERE a NOT IN (SELECT a FROM xyx WHERE b='te')
as shown here (sqlfiddle is acting up, so I used ideone)