Grouping sets size cannot be greater than 64 : Hive

I have 70 columns in my Hive table and I want to fetch all the rows in which all 70 columns match exactly, i.e. if two rows contain the same data in every column, then I need to find that row and count it as '2'. I'm writing the query below:
SELECT (all 70 columns),COUNT(*) AS CountOf FROM tablename GROUP BY (all 70 columns)
HAVING COUNT(*)>1;
but it's showing
Error: Error while compiling statement: FAILED: SemanticException [Error 10411]:
Grouping sets size cannot be
greater than 64 (state=42000,code=10411)
Is there any way to find the exact duplicate rows' count from a Hive table?

This is a bug, HIVE-21135, in Hive 3.1.0. It is fixed in Hive 4.0.0 (see HIVE-21018) and has not been backported.
As a workaround, try concatenating the columns with a delimiter in a subquery before aggregating; I'm not sure whether it will help or not.
Like this, using concat(), concat_ws(), or the || operator:
select concat_ws('~', col1, col2, col3, col4)
...
group by concat_ws('~', col1, col2, col3, col4)
or
col1||'~'||col2||'~'||...||colN
NULLs should also be taken care of: replace them with empty strings before concatenation using the NVL function.
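Put together, the workaround for the duplicate count might look like this (a sketch with four columns standing in for all 70; the '~' delimiter is an assumption and must not occur in the data):

```sql
-- Count exact duplicate rows by concatenating all columns into one key.
-- NVL replaces NULLs with empty strings so they concatenate safely;
-- non-string columns may need a cast(... as string) first.
SELECT row_key, COUNT(*) AS CountOf
FROM (
    SELECT concat_ws('~',
                     nvl(col1, ''),
                     nvl(col2, ''),
                     nvl(col3, ''),
                     nvl(col4, '')) AS row_key
    FROM tablename
) t
GROUP BY row_key
HAVING COUNT(*) > 1;
```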

Related

How can I automatically remove columns that are all null in a SQL query?

Consider the basic SQL query
SELECT *
FROM tablename
For a variety of reasons, this database contains entire columns of null values. How can I change this query so that it automatically returns only the columns that are not entirely null? This will be performed in Oracle SQL Developer.
Thank you!
To do that you'll have to specify the columns explicitly, like this:
SELECT col1, col2, col5 FROM tablename
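One way to work out which columns to list is to exploit the fact that COUNT(col) ignores NULLs: a column whose count is 0 is entirely null (a sketch; the column names are placeholders):

```sql
-- Each COUNT(colN) counts only non-null values, so a result of 0
-- means that column is entirely null and can be dropped from the SELECT.
SELECT COUNT(col1) AS col1_nonnull,
       COUNT(col2) AS col2_nonnull,
       COUNT(col5) AS col5_nonnull
FROM tablename;
```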

How to represent the result in Hive

I have two fields, as shown in the image above.
The three rows above should be represented as a single row, as shown in the same image.
Can someone let me know how to produce that result in Hive without using a UDF?
You can use concat_ws:
select
concat_ws(',', collect_list(concat_ws(':', col1, col2))) as output
from mytable
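This collapses the whole table into one row. If the pairs should instead be collected per key, a GROUP BY can be added (a sketch; the key column some_key is an assumption):

```sql
-- One output row per key, with its col1:col2 pairs joined by commas.
SELECT some_key,
       concat_ws(',', collect_list(concat_ws(':', col1, col2))) AS output
FROM mytable
GROUP BY some_key;
```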

UPDATING a table which is selected using SELECT query in SQL

I want to use the UPDATE keyword with SELECT, something like:
UPDATE(select col1,col2,col3 from UNPIVOTED_TABLE)
SET col1=0
WHERE col1 IS NULL
SET col2=0
WHERE col2 is NULL
SET col3=0
WHERE col3 is NULL
I know my syntax is not right, but this is basically what I am trying to achieve.
I am selecting 3 columns, and there are some null values which I want to update and set to 0.
Also, I cannot update the table itself, since the original table was UNPIVOTED and I am PIVOTING it in the SELECT statement, and I need the pivoted result (that is, the columns I have selected: col1, col2, col3).
Also, I am using Amazon Athena, if that is relevant.
If I followed you correctly, you just want coalesce():
select
coalesce(col1, 0) col1,
coalesce(col2, 0) col2,
coalesce(col3, 0) col3
from unpivoted_table
coalesce() checks whether the first argument is null: if it isn't, it returns the original value as-is; otherwise it returns the value given as the second argument instead.
Since you are using Athena, I can assume you have read-only access and cannot really update the original data.
In your case, if you'd like to represent the nulls as 0, you can use IFNULL(column, 0).
For more information about IFNULL you can read here.
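Because Athena cannot modify data in place, one common pattern is to materialize the cleaned result as a new table with CREATE TABLE AS SELECT (a sketch; the name cleaned_table is an assumption):

```sql
-- CTAS: write the null-free result out as a new Athena table.
CREATE TABLE cleaned_table AS
SELECT coalesce(col1, 0) AS col1,
       coalesce(col2, 0) AS col2,
       coalesce(col3, 0) AS col3
FROM unpivoted_table;
```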

postgres varchar needs casting

I have 2 tables in 2 different schemas, scha and schb (e.g.), and in scha I have several tables that are all made of varchar, as I had to format some data [it was part of the task].
Now I have the same tables but with different types in schb.
The problem is this: wherever I have a type which involves numbers (money, numeric, date), it gives me an error asking me to CAST.
Is there a way I can CAST without copying one column after another (i.e. copying it all in one go)?
for example
INSERT INTO schb.customer
SELECT "col1", "col2", "col3" /* needs casting */ ....
FROM scha.customer
Thanks
A SELECT clause is not a list of columns, it is a list of expressions (which usually involve columns). A type cast is an expression so you can put them right into your SELECT. PostgreSQL supports two casting syntaxes:
CAST ( expression AS type )
expression::type
The first is standard SQL, the :: form is PostgreSQL-specific. If your schb.customer.col3 is (for example) numeric(5,2), then you'd say:
INSERT INTO schb.customer (col1, col2, col3)
SELECT col1, col2, cast(col3 as numeric(5,2))
FROM scha.customer
-- or
INSERT INTO schb.customer (col1, col2, col3)
SELECT col1, col2, col3::numeric(5,2)
FROM scha.customer
Note that I've included the column list in the INSERT as well. You don't have to do that, but it is a good idea: you don't have to worry about the column order, and it makes it easy to skip columns (or let columns assume their default values without explicitly telling them to).

Select (column-name) as subquery [duplicate]

This question already has answers here:
SQL column names and comparing them to row records in another table in PostgreSQL
(3 answers)
Closed 9 years ago.
I am trying to have a SQL statement where the column names in the SELECT are a subquery. The basic format is:
SELECT (<subquery for columns>) FROM Table;
My subquery returns 4 rows of field names, so I need to make them a single row. I used:
SELECT array_to_string(array_agg(column_names::text),',') FROM Fieldnames;
And then I get a returned format of col1, col2, col3, col4 for my 4 returned rows as a string. If I paste in the raw text of my query, it works fine as:
SELECT (col1, col2, col3, col4) FROM Table;
The issue arises when I put the two together. I get an odd response from psql:
 ?column?
------------------------
 col1, col2, col3, col4
with no rows returned for:
SELECT(SELECT array_to_string(array_agg(column_names::text),',') FROM Fieldnames) FROM Table;
Conceptually, I think there are two ways I can address this. I need to get my subquery SELECT back in a format that I can use as the column-name argument of the outer SELECT statement; but because it returns multiple rows (each a single varchar value holding a column name), I thought I could just paste them together, but I cannot. I am using psql, so I do not have the "#" list trick.
Any advice would be appreciated.
Solution:
Here is why the question is not a duplicate, and how I solved it. In trying to simplify the question to be manageable, it lost its muster. I ended up writing a function, because I couldn't use # to pass a list to SELECT in PostgreSQL. When you want to select only a subset of rows, you cannot pass a nested (SELECT), even with an AS, although this works in Oracle. As a result, I wrote a function that effectively created a string and then passed it as the SELECT. There seems to be something fundamentally different in how the SQL parser in PostgreSQL handles the arguments to SELECT compared to Oracle, but every DB is different.
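The function-based approach described above can be sketched with dynamic SQL in PL/pgSQL (a sketch; the names Fieldnames, column_names, and mytable are assumptions, and since the function returns SETOF record the caller must supply a column definition list):

```sql
-- Build the column list from Fieldnames, then execute it dynamically.
CREATE OR REPLACE FUNCTION select_listed_columns()
RETURNS SETOF record AS $$
DECLARE
    cols text;
BEGIN
    SELECT string_agg(column_names::text, ', ') INTO cols
    FROM Fieldnames;
    RETURN QUERY EXECUTE format('SELECT %s FROM mytable', cols);
END;
$$ LANGUAGE plpgsql;

-- Usage: the column definition list is given at call time, e.g.
-- SELECT * FROM select_listed_columns() AS t(col1 int, col2 text, col3 text, col4 text);
```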
If you enclose several column names in parentheses like you do:
SELECT (col1, col2, col3, col4) FROM tbl;
.. you effectively create an ad-hoc row type from the enclosed columns, which has no name, because you did not provide an alias. Postgres will choose a fallback like ?column?. In later Postgres versions the default name is row since, internally, the above is short syntax for:
SELECT ROW(col1, col2, col3, col4) FROM tbl;
Provide your own name (alias):
SELECT (col1, col2, col3, col4) AS my_row_type FROM tbl;
But you probably just want individual columns. Drop the parentheses:
SELECT col1, col2, col3, col4 FROM tbl;
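The difference is easy to see directly in psql (a minimal demonstration with literals instead of columns):

```sql
SELECT (1, 'a');      -- one column of an anonymous row type
SELECT ROW(1, 'a');   -- equivalent explicit form
SELECT 1, 'a';        -- two separate columns
```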