This may be really trivial, but I am new to Hive and don't know how to do this there.
I have a sample dataset that looks like this:
column_A column_B column_C
1 1 0
1 1 0
1 0 1
Now, I need to find the sum of each column and then compare the sums to get the highest.
For example:
column_A column_B column_C
3 2 1
Output should be:
column_A
3
The query that I wrote is unable to compute the sum of each column and compare the sums to find the greatest among them.
SELECT (sum(column_A) as A,sum(column_B) as B,sum(column_C) as C) as xyz
from table_name where A IN (SELECT GREATEST(A,B,C) from xyz) ;
You can use greatest() after the aggregation:
SELECT greatest(sum(column_A), sum(column_B), sum(column_C))
from table_name;
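To sanity-check the answer's logic outside Hive, here is a small sketch using Python's sqlite3 module (the harness is mine, not from the answer); SQLite's multi-argument scalar max() stands in for Hive's greatest(), and the table and column names are taken from the question.

```python
import sqlite3

# Build the question's sample data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (column_A INT, column_B INT, column_C INT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [(1, 1, 0), (1, 1, 0), (1, 0, 1)],
)

# Aggregate first, then take the greatest of the three sums
# (SQLite's max(a, b, c) behaves like Hive's greatest(a, b, c)).
row = conn.execute(
    "SELECT max(sum(column_A), sum(column_B), sum(column_C)) FROM table_name"
).fetchone()
print(row[0])  # -> 3
```

Note that this returns only the greatest value (3), not the column name; if you also need the name, you would have to compare the sums explicitly.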
So let's say I have two columns:
A B
1 300
1 299
2 300
2 300
3 299
3 299
I want to look for distinct values of A such that there is never a combination of A and B where B equals 300.
In my example, I would want to return the columnA value 3.
Result
A
3
How do I accomplish this with SQL?
What you are looking for is called conditional aggregation. You want to aggregate by A (i.e. show A values in your result) and apply a check to particular B values within each group. For instance:
select a
from mytable
group by a
having count(case when b = 300 then 1 end) = 0;
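As a quick check of the logic, here is the same query run against the sample data via Python's sqlite3 module (the harness is mine; the table name mytable comes from the answer). The CASE expression yields NULL for rows where b is not 300, and count() ignores NULLs, so the count is exactly the number of b = 300 rows per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INT, b INT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 300), (1, 299), (2, 300), (2, 300), (3, 299), (3, 299)],
)

# Keep only groups where no row has b = 300.
rows = conn.execute(
    """
    SELECT a
    FROM mytable
    GROUP BY a
    HAVING count(CASE WHEN b = 300 THEN 1 END) = 0
    """
).fetchall()
print(rows)  # -> [(3,)]
```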
A simple subquery will give the results.
Exclude the rows using the NOT IN keyword:
SELECT DISTINCT ColumnA
FROM TABLE
WHERE ColumnA NOT IN(
SELECT ColumnA FROM Table WHERE ColumnB=300)
You can also use NOT EXISTS
SELECT DISTINCT A
FROM tbl t1
WHERE NOT EXISTS(
SELECT 1 FROM tbl t2 WHERE t2.A = t1.A AND t2.B = 300)
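A quick sqlite3 check of the NOT EXISTS approach (the Python harness is mine; note that the correlation must compare t2.A with t1.A, not t1.B):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (A INT, B INT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, 300), (1, 299), (2, 300), (2, 300), (3, 299), (3, 299)],
)

# Anti-join: keep A values for which no row with the same A has B = 300.
rows = conn.execute(
    """
    SELECT DISTINCT A
    FROM tbl t1
    WHERE NOT EXISTS (
        SELECT 1 FROM tbl t2 WHERE t2.A = t1.A AND t2.B = 300)
    """
).fetchall()
print(rows)  # -> [(3,)]
```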
I've come across a simple SQL query that should return a single row, but instead returns results like a GROUP BY statement.
Here is the query:
select
column_a,
column_b
from table_1 A1
where column_b = (
select MIN(column_b)
from table_1 A2
where A1.column_a = A2.column_a
)
;
And here is a table_1, the only table the query uses:
column_a column_b
a 3
a 2
a 1
b 5
b 4
c 6
The strange thing is that I expected the query to return only a single row, column_a = "a" and column_b = "1", because I assumed the subquery would evaluate to "1".
But the actual result is the minimum for each letter in column_a. So:
a 1
b 4
c 6
Can anyone help me understand why the query is behaving like this?
I've setup a SQL Fiddle page with the example here: http://sqlfiddle.com/#!9/4b4d6f/1
This is a correlation clause:
where column_b = (select MIN(column_b)
from table_1 A2
where A1.column_a = A2.column_a
------------------------^
)
It connects the subquery to the outer query. You can think of this as looping through the table in the outer query and keeping a row only when its column_b is the minimum column_b value for that row's column_a value. Note that the database may not actually execute the query as such a nested loop!
If you wanted the overall minimum, you would leave out the correlation clause:
where column_b = (select MIN(column_b)
from table_1 A2
)
That is a correlated subquery: it filters using the value of A1.column_a. That means that for each different value of column_a you will get a (potentially) different result.
This is why it looks like it is grouping by column_a.
You should also note that, because you are not grouping, you could get duplicate rows where there are ties for the minimum column_b for a given value of column_a.
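The per-group behavior is easy to reproduce; here is a sketch using Python's sqlite3 module (the harness is mine; table and column names are from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (column_a TEXT, column_b INT)")
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?)",
    [("a", 3), ("a", 2), ("a", 1), ("b", 5), ("b", 4), ("c", 6)],
)

# The subquery is re-evaluated for each outer row because of the
# A1.column_a = A2.column_a correlation, so each letter keeps the row
# holding its own minimum, not the global minimum.
rows = conn.execute(
    """
    SELECT column_a, column_b
    FROM table_1 A1
    WHERE column_b = (SELECT MIN(column_b)
                      FROM table_1 A2
                      WHERE A1.column_a = A2.column_a)
    """
).fetchall()
print(sorted(rows))  # one row per letter: ('a', 1), ('b', 4), ('c', 6)
```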
I can't wrap my mind around the following task:
I have a table with a key column, which represents a kind of group id. I would like to GROUP BY this key and, in the resulting table, show some columns from the table depending on their values:
If all the values in a group share the same value in col1 (the same number or text), then show that exact value; if at least one of them differs, show something like "Others".
Example
key col1
1 1
1 1
1 1
2 4
2 5
Resulting table:
key col1
1 1
2 Others
Postgres 9.4, if this matters.
You can use aggregation and case:
select key,
(case when min(col1) = max(col1) then min(col1)::text
else 'others'
end) as col1
from t
group by key;
(In Postgres, the cast to text is needed when col1 is numeric, because both branches of a case expression must have the same type.)
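Here is the idea checked with Python's sqlite3 module (the harness is mine; SQLite is dynamically typed, so it tolerates mixing a number and the 'Others' text in one result column, unlike Postgres, where min(col1) would need a cast to text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (key INT, col1 INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 1), (2, 4), (2, 5)],
)

# If every col1 in the group is equal, min and max agree, so the group's
# single value is shown; otherwise the group collapses to 'Others'.
rows = conn.execute(
    """
    SELECT key,
           CASE WHEN min(col1) = max(col1) THEN min(col1)
                ELSE 'Others'
           END AS col1
    FROM t
    GROUP BY key
    """
).fetchall()
print(sorted(rows))  # -> [(1, 1), (2, 'Others')]
```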
I have a table with 125k records. Each day, I insert ~20 records and generate 1,000 notifications based on the top 1000 records in the table ordered by insert time. Once notifications are generated, they are marked and no longer considered for future notification delivery. This has worked fine for a long time, except that a large insert of 100k which was ordered in a weird manner causes some issues.
There are 4 types of records; two columns, each with two possible values, determine which of the 4 types a record is. Based on the file sorting, one of the types fills the first 80k records and dominates the daily notifications.
I am working to fix this by creating a trigger on insert that will reorder the table in a manner that there is more evenly dispersed notifications each day.
My question: Is there a built in SQL Sorting Function that can proportionally sort results based on data in a column?
i.e. Can I get an 80/20 split based on column A and, underneath that, an 80/20 split based on column B? Given the options below, that would mean 640 records of (1,1), 160 records of (1,2), 160 records of (2,1), and 40 records of (2,2), without a hard-coded SELECT TOP X statement, since there are times I won't have 640 (1,1) records but still want 1,000 total notifications generated.
Column A Column B
1 1
1 2
2 1
2 2
You should be able to use ROW_NUMBER to achieve what you want. You didn't provide table structures, so some of this is guesswork:
;WITH CTE_NumberedRows AS
(
SELECT
id,
column_a,
column_b,
some_date,
ROW_NUMBER() OVER (PARTITION BY column_a, column_b ORDER BY some_date) AS row_num
FROM
My_Table
)
SELECT TOP 1000
id,
column_a,
column_b
FROM
CTE_NumberedRows
ORDER BY
CASE WHEN column_a = 1 AND column_b = 1 AND row_num <= 640 THEN 0 ELSE 1 END,
CASE WHEN column_a = 1 AND column_b = 2 AND row_num <= 160 THEN 0 ELSE 1 END,
CASE WHEN column_a = 2 AND column_b = 1 AND row_num <= 160 THEN 0 ELSE 1 END,
CASE WHEN column_a = 2 AND column_b = 2 AND row_num <= 40 THEN 0 ELSE 1 END,
some_date
Whether you need to order the date ascending or descending isn't clear (or whether you're even ordering on a date), but hopefully the general gist of how it can be done comes through. You're prioritizing (via the ORDER BY and CASE expressions) the first "x" rows from each type, and after that just ordering by the date.
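To illustrate the prioritization, here is a scaled-down sketch using Python's sqlite3 module (the harness, table name, and quotas of 4/2/1/1 out of 8 instead of 640/160/160/40 out of 1,000 are mine; LIMIT replaces SQL Server's TOP, and SQLite 3.25+ is assumed for ROW_NUMBER support):

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notifications (a INT, b INT, d INT)")
data = [(1, 1, day) for day in range(6)]   # six (1,1) records
data += [(1, 2, day) for day in range(3)]  # three (1,2) records
data += [(2, 1, day) for day in range(2)]  # two (2,1) records
data += [(2, 2, 0)]                        # one (2,2) record
conn.executemany("INSERT INTO notifications VALUES (?, ?, ?)", data)

# Number rows within each (a, b) type by date, then pull each type's
# quota first; any leftover slots fall through to plain date order.
rows = conn.execute(
    """
    WITH numbered AS (
        SELECT a, b, d,
               ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY d) AS row_num
        FROM notifications
    )
    SELECT a, b
    FROM numbered
    ORDER BY
        CASE WHEN a = 1 AND b = 1 AND row_num <= 4 THEN 0 ELSE 1 END,
        CASE WHEN a = 1 AND b = 2 AND row_num <= 2 THEN 0 ELSE 1 END,
        CASE WHEN a = 2 AND b = 1 AND row_num <= 1 THEN 0 ELSE 1 END,
        CASE WHEN a = 2 AND b = 2 AND row_num <= 1 THEN 0 ELSE 1 END,
        d
    LIMIT 8
    """
).fetchall()
print(Counter(rows))  # 4 of (1,1), 2 of (1,2), 1 of (2,1), 1 of (2,2)
```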
I'm trying to solve the following: the data is organized in the table with Column X as the foreign key for the information (it's the ID which identifies a set of rows in this table as belonging together in a bundle, owned by a particular entity in another table). So each distinct value of X has multiple rows associated with it here. I would like to filter out all distinct values of X that have a row associated with them containing value "ABC" in Column Q.
i.e.
data looks like this:
Column X Column Q
-------- ---------
123 ABC
123 AAA
123 ANQ
456 ANQ
456 PKR
579 AAA
579 XYZ
886 ABC
the query should return "456" and "579" because those two distinct values of X have no rows containing the value "ABC" in Column Q.
I was thinking of doing this with a MINUS (select distinct X minus (select distinct X where Q = "ABC")), as all I want are the distinct values of X. But I was wondering if there is a more efficient way to do this that avoids a subquery? If, for example, I could partition the table over X and throw out each partition that has a row with the value "ABC" in Q?
I prefer to answer questions like this (i.e. about groups within groups) using aggregation and the having clause. Here is the solution in this case:
select colx
from data d
group by colx
having max(case when colq = 'ABC' then 1 else 0 end) = 0
If any row for a colx value has 'ABC', then the max() expression returns 1, which does not match 0.
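Checking this against the question's data with Python's sqlite3 module (the harness is mine; table and column names follow the answer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (colx INT, colq TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(123, "ABC"), (123, "AAA"), (123, "ANQ"), (456, "ANQ"),
     (456, "PKR"), (579, "AAA"), (579, "XYZ"), (886, "ABC")],
)

# max() over the per-row flag is 1 as soon as any row in the group
# is 'ABC', so "= 0" keeps only groups with no 'ABC' row.
rows = conn.execute(
    """
    SELECT colx
    FROM data
    GROUP BY colx
    HAVING max(CASE WHEN colq = 'ABC' THEN 1 ELSE 0 END) = 0
    """
).fetchall()
print(sorted(rows))  # -> [(456,), (579,)]
```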
This should work:
SELECT DISTINCT t.ColX
FROM mytable t
LEFT JOIN mytable t2 on t.colx = t2.colx and t2.colq = 'ABC'
WHERE t2.colx IS NULL
And here is the SQL Fiddle.
Good luck.
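The LEFT JOIN variant can be verified the same way with Python's sqlite3 module (the harness is mine):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (colx INT, colq TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(123, "ABC"), (123, "AAA"), (123, "ANQ"), (456, "ANQ"),
     (456, "PKR"), (579, "AAA"), (579, "XYZ"), (886, "ABC")],
)

# Anti-join: the LEFT JOIN only finds a match for colx values that have
# an 'ABC' row, so "t2.colx IS NULL" keeps the remaining values.
rows = conn.execute(
    """
    SELECT DISTINCT t.colx
    FROM mytable t
    LEFT JOIN mytable t2 ON t.colx = t2.colx AND t2.colq = 'ABC'
    WHERE t2.colx IS NULL
    """
).fetchall()
print(sorted(rows))  # -> [(456,), (579,)]
```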
How about this, using NOT IN?
SQLFIDDLE DEMO
select distinct colx from
demo
where colx not in (
SELECT COLX from demo
where colq = 'ABC')
;
| COLX |
--------
| 456 |
| 579 |
Try this:
select DISTINCT colx
from demo
where colq not like '%A%'
AND colq not like '%B%'
AND colq not like '%C%'
SQL Fiddle