Hive: OVER (PARTITION BY) vs. HAVING clause

I have the following table:

column0   column1  column2  column3  column4
sunny     hot      high     false    no
sunny     hot      high     true     no
overcast  hot      high     false    yes
rainy     mild     high     false    yes
rainy     cool     normal   false    yes
rainy     cool     normal   true     no
I want to run a query that combines a HAVING clause with OVER (PARTITION BY). My query is:

SELECT column0, column1, column2,
       COUNT(column0) OVER (PARTITION BY column2) AS cnt
FROM FF_weatherdatase1
WHERE column0 = "sunny"
HAVING cnt > 8

But Hive reports:
HAVING specified without GROUP BY
Is there a way to filter on the windowed count without adding a GROUP BY? Thanks in advance.

HAVING can only be used together with GROUP BY. For your query, wrap the windowed SELECT in a derived table and filter with a WHERE clause outside the parentheses:
SELECT *
FROM (SELECT column0,
             column1,
             column2,
             COUNT(column0) OVER (PARTITION BY column2) AS cnt
      FROM ff_weatherdatase1
      WHERE column0 = 'sunny') w
WHERE cnt > 8
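A minimal sketch of the same wrap-and-filter pattern outside Hive, using Python's sqlite3 module (SQLite 3.25+ ships window functions). The sample data only produces a windowed count of 2 for the 'sunny' rows, so the filter here uses cnt > 1 rather than the question's cnt > 8:

```python
import sqlite3

# Build the question's sample table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ff_weatherdatase1
  (column0 TEXT, column1 TEXT, column2 TEXT, column3 TEXT, column4 TEXT);
INSERT INTO ff_weatherdatase1 VALUES
  ('sunny','hot','high','false','no'),
  ('sunny','hot','high','true','no'),
  ('overcast','hot','high','false','yes'),
  ('rainy','mild','high','false','yes'),
  ('rainy','cool','normal','false','yes'),
  ('rainy','cool','normal','true','no');
""")

# The windowed count is computed in a derived table; the filter on it
# is an ordinary WHERE clause outside the parentheses.
rows = conn.execute("""
SELECT *
FROM (SELECT column0, column1, column2,
             COUNT(column0) OVER (PARTITION BY column2) AS cnt
      FROM ff_weatherdatase1
      WHERE column0 = 'sunny') w
WHERE cnt > 1
""").fetchall()
print(rows)  # both 'sunny' rows share column2 = 'high', so cnt = 2
```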

Related

Decode in oracle, with multiple condition

SELECT DECODE(a.type, 1, 'ONE', 2, 'TWO', NULL, 'OTHER') ||
       DECODE(b.active, 1, 'Yes', 0, 'NO', NULL, 'NO'),
       COUNT(*)
FROM TypeTable a,
     Status b
WHERE a.id = b.id
  AND a.type IN (12, 3, 34, 45, 66);
Now the question is: I want to count the rows that have type ONE, and all the others, like:

Column1        | Column2
---------------+--------
ONE            | 10
Other than ONE | 20
I'm not sure what you want. Maybe you are looking for something similar to my example.
SELECT DECODE(object_type, 'SYNONYM', 'SYNONYM', 'OTHER THAN SYNONYM') column1,
       COUNT(*) column2
FROM user_objects
GROUP BY DECODE(object_type, 'SYNONYM', 'SYNONYM', 'OTHER THAN SYNONYM');
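DECODE is Oracle-specific; the portable equivalent is a CASE expression. A sketch of the same bucket-and-count with Python's sqlite3, on invented object types (the user_objects rows here are made up, not Oracle's real dictionary view):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_objects (object_type TEXT)")
conn.executemany("INSERT INTO user_objects VALUES (?)",
                 [("SYNONYM",), ("TABLE",), ("SYNONYM",), ("VIEW",)])

# CASE plays the role of DECODE; grouping by the alias is accepted
# by SQLite (and PostgreSQL), so the expression is written only once.
rows = conn.execute("""
SELECT CASE WHEN object_type = 'SYNONYM'
            THEN 'SYNONYM' ELSE 'OTHER THAN SYNONYM' END AS column1,
       COUNT(*) AS column2
FROM user_objects
GROUP BY column1
ORDER BY column1
""").fetchall()
print(rows)  # [('OTHER THAN SYNONYM', 2), ('SYNONYM', 2)]
```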

Replace set of values with single value and sum

I have a table as follows:

id  group1  group2  is_bar  amount
1   a       bar1    true    100
2   a       bar2    true    200
3   a       baz     false   150
4   a       baz     false   250
5   b       bar1    true    50
Every time is_bar is true, I'd like to replace the value in group2 with 'bar' and sum over amount, resulting in:

group1  group2  amount
a       bar     300
a       baz     400
b       bar     50
I currently do this using a subquery and then grouping by every other column in the table! But this seems a bit noob to me:
SELECT group1, group2, SUM(amount)
FROM (
    SELECT group1,
           CASE WHEN is_bar THEN 'bar' ELSE group2 END AS group2,
           amount
    FROM foo
) new_foo
GROUP BY group1, group2
ORDER BY group1, group2;
Is there a smart-person solution to this?
I believe this should work:
SELECT
group1
, CASE WHEN is_bar THEN 'bar' ELSE group2 END as group2
, SUM(AMOUNT)
FROM foo
GROUP BY group1, CASE WHEN is_bar THEN 'bar' ELSE group2 END
As @Patrick has mentioned in the comments, you can replace the long expressions in the GROUP BY with GROUP BY 1, 2.
The numbers refer to the first and second columns of the SELECT list, and the query produces the same output. But if you later add a different column as the first one, you will have to make sure the GROUP BY still works as intended by changing or adding the positional reference for that column.
It turns out that in Postgresql you can GROUP BY an aliased column, as stated in this part of the docs:
In strict SQL, GROUP BY can only group by columns of the source table
but PostgreSQL extends this to also allow GROUP BY to group by columns
in the select list. Grouping by value expressions instead of simple
column names is also allowed.
This SQL Fiddle shows that in action:
SELECT
group1,
CASE WHEN is_bar THEN 'bar' ELSE group2 END group2_grouped,
SUM(AMOUNT)
FROM foo
GROUP BY group1, group2_grouped
The problem arises when you alias the CASE expression to the same name as the original column: PostgreSQL will then GROUP BY the original column, not the alias. This is mentioned in the docs:
In case of ambiguity, a GROUP BY name will be interpreted as an
input-column name rather than an output column name.
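To see the grouped-CASE version produce the sums from the question, here is a small sketch with Python's sqlite3 (SQLite stores the booleans as 0/1 and, like PostgreSQL, accepts the alias in GROUP BY):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo (id INT, group1 TEXT, group2 TEXT, is_bar INT, amount INT);
INSERT INTO foo VALUES
  (1,'a','bar1',1,100),
  (2,'a','bar2',1,200),
  (3,'a','baz', 0,150),
  (4,'a','baz', 0,250),
  (5,'b','bar1',1,50);
""")

# Group directly on the CASE expression via its alias: no subquery needed.
rows = conn.execute("""
SELECT group1,
       CASE WHEN is_bar THEN 'bar' ELSE group2 END AS group2_grouped,
       SUM(amount)
FROM foo
GROUP BY group1, group2_grouped
ORDER BY group1, group2_grouped
""").fetchall()
print(rows)  # [('a', 'bar', 300), ('a', 'baz', 400), ('b', 'bar', 50)]
```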

SQL query for count of predicate applied on rows of subquery

I have a somewhat complicated SQL query to perform, and I'm not sure what the right strategy is.
Consider the model:
event
  foreignId Int
  time      UTCTime
  success   Bool
and suppose I have a predicate, which we can call trailingSuccess, that is True if the last n events were successful. I want to test for this predicate. That is, I want to run a query on event that returns a count of foreignId's for which the event was a success each of the last n times (or more) that the event was logged.
I am using Postgres, if it matters, but I'd rather stay in the ANSI fragment if possible.
What is a sensible strategy for computing this query?
So far, I have code like:
SELECT count (*)
FROM (SELECT e.foreignId
FROM event e
...
ORDER BY e.time ASC
LIMIT n)
Obviously, I didn't get very far. I'm not sure how to express a predicate that quantifies over multiple rows.
For hypothetical usage, n = 4 is fine.
Example data:
foreign_id  time   success
1           14:00  True
1           15:00  True
1           16:00  True
1           17:00  True
2           14:00  False
2           15:00  True
2           16:00  True
2           17:00  True
3           14:00  True
3           15:00  True
3           16:00  True
For the sample data, the query should return 1, because there are n = 4 successful events with foreign_id = 1. foreign_id 2 does not count because there is a False one in the last 4. foreign_id 3 does not count because there aren't enough events with foreign_id = 3.
Try finding the latest unsuccessful entry for each foreignId using a simple GROUP BY clause. With this in a subquery, you can join it back to the table and count, for each foreignId, how many records match that foreignId and have a newer time.
Something like:
SELECT lastn.foreignId, COUNT(*)
FROM (SELECT foreignId, MAX(time) AS lasttime
      FROM event
      WHERE NOT success
      GROUP BY foreignId
     ) AS lastn
JOIN event AS e
  ON e.foreignId = lastn.foreignId
 AND e.time > lastn.lasttime
GROUP BY lastn.foreignId;
And you can experiment with left joins and the like to tweak it to your needs.
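One caveat with the inner join: a foreignId with no unsuccessful entry at all never appears in lastn, so ids 1 and 3 from the sample data drop out entirely. A left-join variant that keeps them, sketched with Python's sqlite3 (success stored as 0/1, n fixed at 4):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event (foreignId INT, time TEXT, success INT);
INSERT INTO event VALUES
  (1,'14:00',1),(1,'15:00',1),(1,'16:00',1),(1,'17:00',1),
  (2,'14:00',0),(2,'15:00',1),(2,'16:00',1),(2,'17:00',1),
  (3,'14:00',1),(3,'15:00',1),(3,'16:00',1);
""")

rows = conn.execute("""
SELECT e.foreignId, COUNT(*) AS trailing_successes
FROM event AS e
LEFT JOIN (SELECT foreignId, MAX(time) AS lasttime
           FROM event
           WHERE success = 0
           GROUP BY foreignId) AS lastn
       ON e.foreignId = lastn.foreignId
-- keep events newer than the last failure, or all events if there was none
WHERE lastn.lasttime IS NULL OR e.time > lastn.lasttime
GROUP BY e.foreignId
HAVING COUNT(*) >= 4
""").fetchall()
print(rows)  # only foreignId 1 has 4 or more trailing successes
```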
select count(*)
from (
    select foreignId
    from (
        select
            foreignId,
            row_number() over (partition by foreignId order by "time" desc) as rn,
            success
        from event
    ) s
    where rn <= n
    group by foreignId
    having count(*) >= n and bool_and(success)
) qualified
The first derived table selects all foreignIds that have at least n events. The lateral subquery then checks whether the last n events for each such foreignId were all successful (PostgreSQL syntax):
SELECT COUNT(*)
FROM (
    SELECT foreignId
    FROM event
    GROUP BY foreignId
    HAVING COUNT(*) >= n
) t1
CROSS JOIN LATERAL (
    SELECT bool_and(success) AS all_ok
    FROM (SELECT success
          FROM event
          WHERE foreignId = t1.foreignId
          ORDER BY time DESC
          LIMIT n) last_n
) x
WHERE x.all_ok
I ended up messing around on SQL Fiddle for a while, until I arrived at this:
select count(*)
from (select count(last.foreignId) as cnt
      from (select foreignId
            from event
            where success = True
            order by time desc
           ) as last
      group by last.foreignId) as correct
where correct.cnt >= 4
I guess the insight I'm adding is that every layer of "selecting" can be thought of as a filter on the inner selections.
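As a sanity check of the window-function approach against the sample data, here is a sketch with Python's sqlite3; MIN over the 0/1 success flag stands in for PostgreSQL's bool_and, and HAVING COUNT(*) >= n rules out ids with fewer than n events:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event (foreignId INT, time TEXT, success INT);
INSERT INTO event VALUES
  (1,'14:00',1),(1,'15:00',1),(1,'16:00',1),(1,'17:00',1),
  (2,'14:00',0),(2,'15:00',1),(2,'16:00',1),(2,'17:00',1),
  (3,'14:00',1),(3,'15:00',1),(3,'16:00',1);
""")

n = 4
(count,) = conn.execute("""
SELECT COUNT(*)
FROM (
    SELECT foreignId
    FROM (SELECT foreignId,
                 ROW_NUMBER() OVER (PARTITION BY foreignId
                                    ORDER BY time DESC) AS rn,
                 success
          FROM event) ranked
    WHERE rn <= ?
    GROUP BY foreignId
    HAVING COUNT(*) >= ? AND MIN(success) = 1
) qualified
""", (n, n)).fetchone()
print(count)  # 1: only foreignId 1 has n trailing successes
```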

Sort data row in SQL

Please help me. I have columns from more than one table, and the data type of all these columns is integer.
I want to sort the data within each row (not sort the columns, and not ORDER BY), except for the primary key column.
For example:
column1(pk)  column2  column3  column4  column5
1            6        5        3        1
2            10       2        3        1
3            1        2        4        3
How do I get this result?
column1(pk)  column2  column3  column4  column5
1            1        3        5        6
2            1        2        3        10
3            1        2        3        4
Is it possible or impossible? If it is impossible, how could I get a similar result, regardless of the sort?
What database are you using? The capabilities of the database are important. Second, this suggests a data-structure issue: things that need to be sorted would normally be separate entities, that is, separate rows in a table. The rest of this post answers the question as asked.
If the database supports pivot/unpivot, you can do the following:
(1) Unpivot the data to get it into the format (id, column name, value).
(2) Use row_number() to assign a sequence number, based on the ordering of the values.
(3) Use the row_number() to create a new varchar column name.
(4) Pivot the data again using the new column names.
You can do something similar if this functionality is not available.
First, change the data to rows:
(select id, 'col1', col1 as val from t) union all
(select id, 'col2', col2 from t) union all
. . .
Call this byrow. The following query appends a row number:
select br.*, row_number() over (partition by id order by val) as seqnum
from byrow br
Put this into a subquery to unpivot. The final solution looks like:
with byrow as (<the big union all query>)
select id,
max(case when seqnum = 1 then val end) as col1,
max(case when seqnum = 2 then val end) as col2,
...
from (select br.*, row_number() over (partition by id order by val) as seqnum
from byrow br
) br
group by id
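The three steps can be run end-to-end with Python's sqlite3 on the question's sample rows (window functions need SQLite 3.25+); the colN names below are made up for the pivoted output:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (column1 INT PRIMARY KEY, column2 INT, column3 INT,
                column4 INT, column5 INT);
INSERT INTO t VALUES (1,6,5,3,1), (2,10,2,3,1), (3,1,2,4,3);
""")

rows = conn.execute("""
WITH byrow AS (                 -- step 1: unpivot to (id, value)
    SELECT column1 AS id, column2 AS val FROM t
    UNION ALL SELECT column1, column3 FROM t
    UNION ALL SELECT column1, column4 FROM t
    UNION ALL SELECT column1, column5 FROM t
), ranked AS (                  -- step 2: number the values per row
    SELECT id, val,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY val) AS seqnum
    FROM byrow
)
SELECT id,                      -- steps 3 and 4: pivot back, now sorted
       MAX(CASE WHEN seqnum = 1 THEN val END) AS col1,
       MAX(CASE WHEN seqnum = 2 THEN val END) AS col2,
       MAX(CASE WHEN seqnum = 3 THEN val END) AS col3,
       MAX(CASE WHEN seqnum = 4 THEN val END) AS col4
FROM ranked
GROUP BY id
ORDER BY id
""").fetchall()
print(rows)  # [(1, 1, 3, 5, 6), (2, 1, 2, 3, 10), (3, 1, 2, 3, 4)]
```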
You can use SQL Server's PIVOT function to convert the rows into columns, apply the sorting there, and then convert the columns back into rows using UNPIVOT.
Here is a good example using PIVOT, you should be able to adapt this to meet your needs
http://blogs.msdn.com/b/spike/archive/2009/03/03/pivot-tables-in-sql-server-a-simple-sample.aspx

How to count number of occurrences for all different values in database column?

I have a Postgres database with, say, 10 columns. The fifth column is called column5. There are 100 rows in the table, and the possible values of column5 are c5value1, c5value2, c5value3, ..., c5value29, c5value30. I would like to print out a table that shows how many times each value occurs.
So the table would look like this:
Value (of column5)  Number of occurrences
c5value1            1
c5value2            5
c5value3            3
c5value4            9
c5value5            1
c5value6            1
...                 ...
What is the command that does that?
Group by the column you are interested in and then use count to get the number of rows in each group:
SELECT column5, COUNT(*)
FROM table1
GROUP BY column5
Use the GROUP BY clause and the COUNT() aggregate function:
SELECT column5, COUNT(column5) AS Occurrences
FROM myTable
GROUP BY column5
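For completeness, the same GROUP BY/COUNT run end-to-end with Python's sqlite3 on a few invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (column5 TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?)",
                 [("c5value1",), ("c5value2",), ("c5value2",), ("c5value3",)])

# One row per distinct value of column5, with its frequency.
rows = conn.execute("""
SELECT column5, COUNT(column5) AS occurrences
FROM myTable
GROUP BY column5
ORDER BY column5
""").fetchall()
print(rows)  # [('c5value1', 1), ('c5value2', 2), ('c5value3', 1)]
```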