GROUP BY on specific columns in hive

I have a Hive query with 38 columns, only one of which uses an aggregate function. I need to group by columns 1 and 2 only, instead of by all the non-aggregated columns. How can this be accomplished?
For example, what I need is:
SELECT
1
,2
,3
,4
,5
,MAX(6)
FROM
table_x
GROUP BY
1,2

Select all the columns you want and, instead of GROUP BY, use an analytic (window) function partitioned by columns 1 and 2. Note that this keeps every row and attaches the group maximum to each one:
select col1,col2.....col38,
max(col6) over(partition by col1,col2) as max_val
from tablename
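
To see what the window form returns, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database through Python's sqlite3 module (SQLite 3.25 or later is assumed for window functions). The table layout and sample rows are made up for illustration, and SQLite stands in for Hive, so treat it as an approximation rather than Hive-specific behaviour.

```python
import sqlite3

# Hypothetical miniature of table_x: two grouping columns, one extra column,
# and the column being aggregated.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (col1 TEXT, col2 TEXT, col3 TEXT, col6 INTEGER)")
conn.executemany(
    "INSERT INTO table_x VALUES (?, ?, ?, ?)",
    [("a", "x", "p", 1), ("a", "x", "q", 5), ("b", "y", "r", 3)],
)

# Every input row is kept; max_val repeats the maximum of its (col1, col2) group.
window_rows = conn.execute(
    """
    SELECT col1, col2, col3, col6,
           MAX(col6) OVER (PARTITION BY col1, col2) AS max_val
    FROM table_x
    ORDER BY col1, col2, col3
    """
).fetchall()

for row in window_rows:
    print(row)
```

All three input rows come back, which is the main difference from a GROUP BY that would return one row per group.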

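Use the row_number() function to keep, for each combination of columns 1 and 2, only the row with the largest value of column 6 (here 1 to 6 stand for the real column names, as in the question):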
select * from
(
SELECT 1,2,3,4,5,6,row_number() over(partition by 1,2 order by 6 desc) as rn
FROM table_x
)A where rn=1
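
The row_number() approach can be sketched the same way. This is an illustrative run on SQLite via Python's sqlite3 module (SQLite 3.25+ assumed), not on Hive; the column names c1, c2, c3, c6 and the sample rows are hypothetical stand-ins for the question's columns 1, 2, 3 and 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (c1 TEXT, c2 TEXT, c3 TEXT, c6 INTEGER)")
conn.executemany(
    "INSERT INTO table_x VALUES (?, ?, ?, ?)",
    [("a", "x", "p", 1), ("a", "x", "q", 5), ("b", "y", "r", 3)],
)

# Number the rows inside each (c1, c2) group from the largest c6 downwards,
# then keep only the first row of each group.
top_rows = conn.execute(
    """
    SELECT c1, c2, c3, c6
    FROM (
        SELECT c1, c2, c3, c6,
               ROW_NUMBER() OVER (PARTITION BY c1, c2 ORDER BY c6 DESC) AS rn
        FROM table_x
    ) a
    WHERE rn = 1
    ORDER BY c1, c2
    """
).fetchall()

print(top_rows)
```

Unlike the MAX() OVER form, this returns one row per group, and the non-grouped columns come from the same row that holds the maximum.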

The query in the question does not comply with the definition of GROUP BY. When you group by some columns, every other selected column must be aggregated so that each group collapses to a single row.

Related

concatenate column values from multiple rows in Oracle without duplicates

I can concatenate column values from multiple rows in Oracle using LISTAGG, but I want to avoid duplicates.
Currently it returns duplicates:
select LISTAGG( t.id,',') WITHIN GROUP (ORDER BY t.id) from table t;
For example, for the data
ID
10
10
20
30
30
40
it returns 10,10,20,30,30,40
instead of 10,20,30,40.
And I can't use DISTINCT inside LISTAGG:
select LISTAGG( distinct t.id,',') WITHIN GROUP (ORDER BY t.id) from table t;
Error
ORA-30482: DISTINCT option not allowed for this function

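One option is regexp_replace(), which collapses each run of identical adjacent values in the sorted list. The pattern requires every value to be followed by a comma or the end of the string, so that a value such as 1 is not merged into a following 11:
select regexp_replace(
listagg( t.id,',') within group (order by t.id)
, '([^,]+)(,\1)*(,|$)', '\1\3') as "Result"
from t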
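
The behaviour of this kind of back-reference pattern is easy to check outside the database. The sketch below uses Python's re module as a stand-in for Oracle's regexp_replace (an assumption: the two engines agree on these simple patterns) and compares an unanchored pattern, '([^,]+)(,\1)+', with a variant that requires each value to be followed by a comma or the end of the string. The function names are made up for the example.

```python
import re


def dedupe_unanchored(csv_text: str) -> str:
    """Collapse repeats with the unanchored pattern; can merge '1' into '11'."""
    return re.sub(r"([^,]+)(,\1)+", r"\1", csv_text)


def dedupe_anchored(csv_text: str) -> str:
    """Collapse repeats, requiring each value to end at a comma or end of string."""
    return re.sub(r"([^,]+)(,\1)*(,|$)", r"\1\3", csv_text)


print(dedupe_unanchored("10,10,20,30,30,40"))  # 10,20,30,40
print(dedupe_anchored("10,10,20,30,30,40"))    # 10,20,30,40
print(dedupe_unanchored("1,11"))               # 11   (values wrongly merged)
print(dedupe_anchored("1,11"))                 # 1,11
```

Both forms depend on the list being sorted, so that equal values are adjacent.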

You can put the DISTINCT in a subquery, so that LISTAGG only ever sees unique values:
select LISTAGG( t.id,',') WITHIN GROUP (ORDER BY t.id) from (SELECT DISTINCT id FROM table) t
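
As a rough illustration of the subquery idea, the sketch below runs the equivalent shape on SQLite through Python's sqlite3 module. SQLite has no LISTAGG, so group_concat is used as a stand-in (an assumption of this example, not Oracle syntax), and because group_concat does not formally guarantee concatenation order, the check sorts the values in Python instead of relying on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?)", [(10,), (10,), (20,), (30,), (30,), (40,)]
)

# De-duplicate in the inner query, concatenate in the outer one.
(joined,) = conn.execute(
    """
    SELECT group_concat(id, ',')
    FROM (SELECT DISTINCT id FROM t ORDER BY id) d
    """
).fetchone()

# Sort in Python so the check does not depend on concatenation order.
ids = sorted(int(value) for value in joined.split(","))
print(ids)
```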

sql - single line per distinct values in a given column

Is there a way using SQL, in BigQuery more specifically, to get one row per unique value in a given column?
I know this is possible using a sequence of UNION queries, with as many unions as there are distinct values in the column of interest, but I'm wondering if there is a better way to do it.

You can use row_number():
select t.* except (seqnum)
from (select t.*, row_number() over (partition by col order by col) as seqnum
from t
) t
where seqnum = 1;
This returns an arbitrary row. You can control which row by adjusting the order by.
Another fun solution in BigQuery uses structs:
select array_agg(t limit 1)[ordinal(1)].*
from t
group by col;
You can add an order by (order by X limit 1) if you want a particular row.
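
To make the effect of the ORDER BY concrete, here is a small sketch on SQLite via Python's sqlite3 module (SQLite 3.25+ assumed). It is not BigQuery, so `select * except` is replaced by an explicit column list, and the table t with columns id and col is sample data chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, col INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3)],
)


def one_row_per_col(direction: str) -> list:
    """Return one (id, col) row per distinct col, picking by id in the given direction."""
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be 'ASC' or 'DESC'")
    query = f"""
        SELECT id, col
        FROM (
            SELECT id, col,
                   ROW_NUMBER() OVER (PARTITION BY col ORDER BY id {direction}) AS seqnum
            FROM t
        ) s
        WHERE seqnum = 1
        ORDER BY col
    """
    return conn.execute(query).fetchall()


first_rows = one_row_per_col("ASC")   # lowest id in each col group
last_rows = one_row_per_col("DESC")   # highest id in each col group
print(first_rows)
print(last_rows)
```

Ordering by the partition column itself, as in the query above, leaves the choice of row arbitrary; ordering by another column makes it deterministic.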

Here is the same query in a more readable layout:
select tab.* except(seqnum)
from (
select *, row_number() over (partition by column_x order by column_x) as seqnum
from `project.dataset.table`
) as tab
where seqnum = 1

Below is for BigQuery Standard SQL
#standardSQL
SELECT AS VALUE ANY_VALUE(t)
FROM `project.dataset.table` t
GROUP BY col
You can test and play with the above using dummy data, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 1 col UNION ALL
SELECT 2, 1 UNION ALL
SELECT 3, 1 UNION ALL
SELECT 4, 2 UNION ALL
SELECT 5, 2 UNION ALL
SELECT 6, 3
)
SELECT AS VALUE ANY_VALUE(t)
FROM `project.dataset.table` t
GROUP BY col
with the result:
Row id col
1 1 1
2 4 2
3 6 3

How to select all columns for rows where I check if just 1 or 2 columns contain duplicate values

I'm having difficulty with what I figure should be an easy problem. I want to select all the columns in a table for which one particular column has duplicate values.
I've been trying to use aggregate functions, but that's constraining me as I want to just match on one column and display all values. Using aggregates seems to require that I 'group by' all columns I'm going to want to display.

If I understood you correctly, this should do:
SELECT *
FROM YourTable A
WHERE EXISTS(SELECT 1
FROM YourTable
WHERE Col1 = A.Col1
GROUP BY Col1
HAVING COUNT(*) > 1)

You can join on a derived table where you aggregate and determine "col" values which are duplicated:
SELECT a.*
FROM Table1 a
INNER JOIN
(
SELECT col
FROM Table1
GROUP BY col
HAVING COUNT(1) > 1
) b ON a.col = b.col
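
A quick way to check the derived-table approach is to run it on a few rows. The sketch below does so on SQLite via Python's sqlite3 module; the id column and the sample data are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER, col TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b")],
)

# The derived table b lists the col values that occur more than once;
# the join then returns every full row carrying one of those values.
dup_rows = conn.execute(
    """
    SELECT a.id, a.col
    FROM Table1 a
    INNER JOIN (
        SELECT col
        FROM Table1
        GROUP BY col
        HAVING COUNT(*) > 1
    ) b ON a.col = b.col
    ORDER BY a.id
    """
).fetchall()

print(dup_rows)
```

The row with col = 'c' is dropped because that value occurs only once; all rows of the duplicated values are returned with all their columns.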

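This query numbers the rows within each colb value, ordered by cola, and returns every row after the first, that is, the extra duplicates rather than all rows of a duplicated group. Change the ORDER BY on cola to ascending or descending to control which row counts as the first.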
with cl
as
(
select *, ROW_NUMBER() OVER(partition by colb order by cola ) as rn
from tbl)
select *
from cl
where rn > 1
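
For comparison, here is a sketch of the ROW_NUMBER() variant on SQLite via Python's sqlite3 module (SQLite 3.25+ assumed). The table tbl and its rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (cola INTEGER, colb TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b")],
)

# rn = 1 marks the first row of each colb group (lowest cola);
# rn > 1 keeps only the later rows, i.e. the surplus duplicates.
extra_rows = conn.execute(
    """
    WITH cl AS (
        SELECT cola, colb,
               ROW_NUMBER() OVER (PARTITION BY colb ORDER BY cola) AS rn
        FROM tbl
    )
    SELECT cola, colb
    FROM cl
    WHERE rn > 1
    ORDER BY cola
    """
).fetchall()

print(extra_rows)
```

Only the second occurrences come back here, which suits de-duplication tasks; to list every row of a duplicated group, the EXISTS or join forms are a closer fit.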

How to find the most frequently occurring value in a table using SQL?

I have a table A with two columns named B and C, as follows:
('W1','F2')
('W1','F7')
('W2','F1')
('W2','F6')
('W2','F8')
('W4','F7')
('W6','F2')
('W6','F15')
('W7','F1')
('W7','F4')
('W7','F17')
('W8','F13')
How can I find which value in column B appears the most times, using SQL in Oracle? (In this case, it's W2 and W7.) Thank you!

Use a subquery to count the number of items in column C for each value in column B, and rank() the results of the subquery by that count. Then, in your main select, return just the values of column B whose rank is 1:
SELECT ColB
FROM (
SELECT ColB,
Count(ColC),
rank() over (ORDER BY Count(ColC) DESC) AS rnk
FROM yourTable
GROUP BY ColB)
WHERE rnk = 1
Here's a sql fiddle: http://sqlfiddle.com/#!4/fa6bd/2
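
The ranking idea can also be tried locally. The sketch below loads the question's sample rows into SQLite through Python's sqlite3 module (SQLite 3.25+ assumed, not Oracle) and splits the counting and ranking into two CTEs, which is a restructuring chosen for this example rather than the exact query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (B TEXT, C TEXT)")
conn.executemany(
    "INSERT INTO A VALUES (?, ?)",
    [
        ("W1", "F2"), ("W1", "F7"),
        ("W2", "F1"), ("W2", "F6"), ("W2", "F8"),
        ("W4", "F7"),
        ("W6", "F2"), ("W6", "F15"),
        ("W7", "F1"), ("W7", "F4"), ("W7", "F17"),
        ("W8", "F13"),
    ],
)

# Count per B value, rank the counts, keep rank 1. RANK() gives tied
# counts the same rank, so every value sharing the top count is returned.
most_frequent = conn.execute(
    """
    WITH counts AS (
        SELECT B, COUNT(C) AS cnt
        FROM A
        GROUP BY B
    ),
    ranked AS (
        SELECT B, cnt, RANK() OVER (ORDER BY cnt DESC) AS rnk
        FROM counts
    )
    SELECT B
    FROM ranked
    WHERE rnk = 1
    ORDER BY B
    """
).fetchall()

print(most_frequent)
```

Using ROW_NUMBER() instead of RANK() would return only one of the tied values.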

/*
C2 REFERS TO THE COLUMN B
T1 Refers to an alias
*/
WITH T1 AS
(
SELECT C2,COUNT(*) AS COUNT
FROM YOURTABLE
GROUP BY C2
)
SELECT C2,COUNT FROM T1 WHERE COUNT=(SELECT MAX(COUNT) FROM T1 )
;

Select ColB, Count(*)
FROM yourTable
GROUP BY ColB
ORDER BY count(*) desc

How can I get the n-th row in the Query results?

How can I get the n-th row of a TSQL query results?
Let's say, I want to get the 2nd row of this SELECT:
SELECT * FROM table
ORDER BY 2 ASC

What version of SQL Server are you targeting? If 2005 or greater, you can use ROW_NUMBER to generate a row number and select using that number. http://msdn.microsoft.com/en-us/library/ms186734.aspx
WITH orderedtable AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY <your order here>) AS 'RowNumber'
FROM table
)
SELECT *
FROM orderedtable
WHERE RowNumber = 2;
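
Here is a minimal sketch of the same ROW_NUMBER pattern, run on SQLite via Python's sqlite3 module rather than SQL Server (SQLite 3.25+ assumed). The items table, its columns and the helper function are hypothetical names introduced for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("pen", 3), ("ink", 7), ("pad", 5), ("cap", 1)],
)


def nth_row_by_price(n: int):
    """Return the n-th (1-based) row when items are ordered by ascending price."""
    return conn.execute(
        """
        WITH orderedtable AS (
            SELECT name, price,
                   ROW_NUMBER() OVER (ORDER BY price ASC) AS RowNumber
            FROM items
        )
        SELECT name, price
        FROM orderedtable
        WHERE RowNumber = ?
        """,
        (n,),
    ).fetchone()


second_row = nth_row_by_price(2)
print(second_row)
```

If the ordering column contains ties, the n-th row is not uniquely defined unless a tie-breaking column is added to the ORDER BY.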

You can use a trick combining TOP with ORDER BY ASC/DESC to achieve an effect similar to MySQL's LIMIT:
SELECT TOP 2 * INTO #temptable FROM table
ORDER BY 2 ASC
SELECT TOP 1 * FROM #temptable
ORDER BY 2 DESC
or without temptable, but nested statements:
SELECT TOP 1 * FROM
(
SELECT TOP 2 * FROM table
ORDER BY 2 ASC
) sub
ORDER BY 2 DESC
The inner query selects all rows up to and including the one you want; the outer query reverses the order and keeps only the first row, which is exactly the n-th row of the original ordering.
Source: http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=850&lngWId=5
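
The nested "take n, reverse, take 1" idea can be sketched on SQLite, where LIMIT plays the role of TOP (an assumption of this example, since SQLite does not support TOP). It is run through Python's sqlite3 module, and the items table is hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("pen", 3), ("ink", 7), ("pad", 5), ("cap", 1)],
)

# Inner query: the first 2 rows by ascending price.
# Outer query: reverse that order and keep the first row, i.e. the 2nd row overall.
second_row = conn.execute(
    """
    SELECT name, price
    FROM (
        SELECT name, price
        FROM items
        ORDER BY price ASC
        LIMIT 2
    ) sub
    ORDER BY price DESC
    LIMIT 1
    """
).fetchone()

print(second_row)
```

If the table holds fewer than n rows, this form still returns the last available row, whereas the ROW_NUMBER form returns nothing.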

One way:
;with T(rownumber, col1, colN) as (
select
row_number() over (order by ACOLUMN) as rownumber,
col1,
colN
from
atable
)
select * from T where rownumber = 2
