How to convert columns into rows in Hive

I have a table in Hive which looks like this:
cust_1,month_1, f1,f2,f3
cust_1,month_2, f2,f3,f4
cust_2,month_1, f1,f5,f4
I would like to convert it to the following format:
cust_1,month_1, f1
cust_1,month_1, f2
cust_1,month_1, f3
....
How is that possible in Hive?

You can use lateral view explode for this:
select col1, col2, value
from orig_table lateral view explode(array(col3, col4, col5)) orig_table_alias as value;
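For readers unfamiliar with lateral view explode, its effect is a wide-to-long reshape: each input row fans out into one output row per exploded value. A plain-Python sketch of that semantics, using the sample data from the question (table and column names are illustrative):

```python
# Rows as in the question: (cust, month, f-a, f-b, f-c)
rows = [
    ("cust_1", "month_1", "f1", "f2", "f3"),
    ("cust_1", "month_2", "f2", "f3", "f4"),
    ("cust_2", "month_1", "f1", "f5", "f4"),
]

def explode_rows(table):
    # Mirrors: select col1, col2, value
    #          from t lateral view explode(array(col3, col4, col5)) t as value
    return [(c1, c2, v) for (c1, c2, *values) in table for v in values]

result = explode_rows(rows)
# The first input row becomes three output rows:
# ("cust_1", "month_1", "f1"), ("cust_1", "month_1", "f2"), ("cust_1", "month_1", "f3")
```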

Related

aws athena convert array<array<string>> to table

I have an S3 bucket with a lot of files containing content like this:
{time:123456, state:{{1,2,3,4},{4,5,6,7}...}
After I read it with Athena, the result is a dataset with two columns: the first is time (int) and the second is array<array<string>>. Is it possible, with Athena SQL, to convert this table to:
select time, col1,col2,col3,col4
from table
where col1...col4 are the names of the columns in the array?
Use CROSS JOIN UNNEST to unnest the outer array:
select a.time,
state_array[1] as col1,
state_array[2] as col2,
state_array[3] as col3,
state_array[4] as col4
from my_table a
CROSS JOIN UNNEST(state) as t(state_array)
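A plain-Python sketch of what that CROSS JOIN UNNEST does: each inner array becomes its own output row alongside the original time value (the data here is illustrative):

```python
# One table row: a time plus a nested array-of-arrays, as read by Athena.
table = [
    (123456, [["1", "2", "3", "4"], ["4", "5", "6", "7"]]),
]

def unnest_state(rows):
    out = []
    for time, state in rows:
        for state_array in state:  # CROSS JOIN UNNEST: one output row per inner array
            # Athena/Presto arrays are 1-indexed; Python lists are 0-indexed,
            # so state_array[1] in the SQL corresponds to state_array[0] here.
            out.append((time, state_array[0], state_array[1],
                        state_array[2], state_array[3]))
    return out

result = unnest_state(table)
# result == [(123456, "1", "2", "3", "4"), (123456, "4", "5", "6", "7")]
```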

Convert ARRAY_UPPER() to Hive query

I have to convert a Postgres query to a Hive query:
Select
SUBSTRING(col_name[ARRAY_UPPER(col_name)][2],1,8)
from tab
As Hive does not have anything called ARRAY_UPPER, how can we convert it?
In my Hive table the corresponding col_name is of string type, so I tried the following, but I am not getting the desired output:
select substring(col_name,-10,8) from hive_tab
Use size() instead of ARRAY_UPPER. Example input and output:
with data_example as(
select
array(1476290200, 35525707293822) as a
)
select substr(a[size(a)-1],-7)
from data_example
Result:
7293822
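The indexing in that query can be confusing: Hive arrays are 0-indexed, so a[size(a)-1] is the last element, and substr(x, -7) keeps the last seven characters. A Python sketch of the same steps:

```python
# The array from the worked example above.
a = [1476290200, 35525707293822]

last = a[len(a) - 1]      # a[size(a)-1] in Hive: the last element
result = str(last)[-7:]   # substr(..., -7): the last 7 characters
# result == "7293822", matching the Hive output above
```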

Extract elements from an array inside a jsonb field in Postgresql

In my Postgresql table, I have a jsonb field called data, which contains data in the following format:
{
  "list": [1, 2, 3, 4, 5]
}
I use the query:
select data->'list' from "Table" where id=1
This gives me the array [1,2,3,4,5]
The problem is that I want to use this result in another select query within the IN clause. It's not accepting the array.
IN ([1,2,3,4,5]) fails
It wants:
IN (1,2,3,4,5)
So, in my original query I don't know how to convert [1,2,3,4,5] to just 1,2,3,4,5.
My current query is:
select * from "Table2" where "items" in (select data->'list' from "Table" where id=1)
Please help
You can use the jsonb containment operator (<@) rather than IN if you cast the search value to jsonb. For example:
SELECT *
FROM "Table2"
WHERE items::jsonb <@ (SELECT data->'list' FROM "Table" WHERE id=1)
Note that if items is an int you will need to cast it to text before casting to jsonb:
SELECT *
FROM "Table2"
WHERE items::text::jsonb <@ (SELECT data->'list' FROM "Table" WHERE id=1)
Alternatively, use jsonb_array_elements_text() to turn the elements into rows:
select t2.*
from table_2 t2
where t2.items in (select jsonb_array_elements_text(t1.data -> 'list')::int
from table_1 t1
where t1.id = 1);
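What that subquery does, sketched in Python with the json module (the stored document is the one from the question; table contents are illustrative):

```python
import json

# The jsonb column as stored: {"list": [1, 2, 3, 4, 5]}
data = json.loads('{"list": [1, 2, 3, 4, 5]}')

# jsonb_array_elements_text(data -> 'list') yields one text value per element;
# the ::int cast then makes them comparable to an integer items column.
elements = [int(text) for text in (str(e) for e in data["list"])]

# "items IN (subquery)" is then an ordinary membership test:
items = 3
found = items in elements
```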
This assumes that items is defined as text or varchar and contains a single value - however the name (plural!) seems to indicate yet another de-normalized column.

Hive- how to get the derive column names and use it in the same query?

I am trying to run the query below:
select [every_column], count(*) from <table> group by [every_column] having count(*) > 1
But the column names should be derived in the same query. I believe show columns in <table> would list the column names separated by newlines, but I need to use it in one query to retrieve the result.
I would appreciate any help in this regard.
You can use shell sed to replace the newlines (\n) with commas (,), assign the comma-separated column names to a Hive variable, and then use that variable in your Hive query.
Have you thought of using subqueries or even a CTE? Maybe this helps you find your answer (note that outer is a reserved keyword in Hive, so the subquery alias here is t):
select t.col1,
t.col2,
t.col3,
t.col4,
t.col5,
t.col6, count(*) as cnt
from (
select <some logic> as col1,
<some logic> as col2,
<some logic> as col3,
<some logic> as col4,
<some logic> as col5,
<some logic> as col6
from innerTable
) t
group by t.col1,
t.col2,
t.col3,
t.col4,
t.col5,
t.col6
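The group-by-everything / having count(*) > 1 pattern is just duplicate detection over whole rows. A Python sketch of the same logic (sample rows are illustrative):

```python
from collections import Counter

# Each tuple stands for one full table row (i.e. "every column").
rows = [("a", 1), ("b", 2), ("a", 1)]

# group by every column ...
counts = Counter(rows)

# ... having count(*) > 1
duplicates = [(row, n) for row, n in counts.items() if n > 1]
# duplicates == [(("a", 1), 2)]: the row ("a", 1) appears twice
```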

How do I add a null row in Postgres in a generic manner

I would like to do
select col1, col2 from foo union values (null, null)
but null is given the default type of TEXT, so I get the error "UNION types [e.g.] integer and text cannot be matched". In specific cases I can provide the types of the columns of foo, but I am constructing SQL statements programmatically and it would be preferable if I didn't have to carry around the column type information with me.
Is there a workaround for this?
You can query INFORMATION_SCHEMA table COLUMNS using query like this:
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'mytable'
or you can use PostgreSQL specific form:
SELECT attname, atttypid::regtype
FROM pg_attribute
WHERE attrelid = 'public.mytable'::regclass
AND attnum > 0
This will give you the data types for the columns of interest in your table. Having this, your automated framework can generate the UNION string that adds an empty row by casting the NULLs to the required data types, like this:
SELECT col1, col2 FROM foo
UNION ALL VALUES (NULL::VARCHAR, NULL::INTEGER)
Probably the more important question is why you want an empty row. Perhaps you can get by without this synthetic empty row in the first place?
Just abuse an outer join like so:
select col1, col2 from foo
full join (select) as dummy on false
If col1 is of type bar and col2 is of type baz then
select col1, col2 from foo union values (null::bar, null::baz)
will work.
Actually, you can cast NULL to int; you just can't cast an empty string to int. Assuming you want NULL in the new column when data1 contains an empty string or NULL, you can do something like this:
UPDATE table SET data2 = cast(nullif(data1, '') AS int);
or
UPDATE table SET data2 = nullif(data1, '')::int;
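The nullif-then-cast pattern, sketched in Python with None standing in for SQL NULL:

```python
def nullif(value, match):
    # nullif(a, b): NULL when a = b, otherwise a
    return None if value == match else value

def cast_int(value):
    # Casting NULL to int yields NULL; non-null text is parsed as an integer.
    return None if value is None else int(value)

assert cast_int(nullif("", "")) is None    # empty string becomes NULL, no error
assert cast_int(nullif("42", "")) == 42    # a normal value casts through
assert cast_int(nullif(None, "")) is None  # NULL stays NULL
```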