Query on result of Hive's Describe - hive

In Hue/Hive,
Describe mytablename;
gives the list of columns, their types and comments. Is there any way to query this in Hive, treating the result of describe as a table?
For example, I want to count the columns of a given type (numeric, character, or some specific type), filter column names, and get the total number of columns (which currently requires scrolling down 100 rows at a time, a hassle with 1000+ columns).
Queries such as
select count(*) from (Describe mytablename);
select count(*) from (select * from describe mytablename);
are of course invalid.
Any ideas?

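You can create a SQL file, hive.sql, containing "describe dbname.tablename", and run it with the output redirected to a local file:
hive -f hive.sql > /path/file.txt
Then create a table to hold that output (desc is a reserved word in Hive, so it is backtick-quoted here):
create table dbname.`desc`
(
name String,
type String,
`desc` String
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
Finally, load the file into the table:
load data local inpath '/path/file.txt' into table dbname.`desc`;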
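If loading the output back into Hive is more than you need, the saved file can also be summarized outside Hive. The following is a minimal Python sketch, assuming the file holds tab-separated name/type/comment rows as above; the function name `parse_describe` and the sample rows are hypothetical, and real DESCRIBE output may contain extra padding or header lines, which the sketch tries to tolerate.

```python
from collections import Counter


def parse_describe(text):
    """Parse tab-separated DESCRIBE output into (name, type, comment) tuples."""
    rows = []
    for line in text.splitlines():
        # Skip blank lines and '#' header lines that some Hive versions print.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        # Strip padding around each field.
        parts = [p.strip() for p in line.split("\t")]
        parts += [""] * (3 - len(parts))  # pad a missing type or comment
        rows.append(tuple(parts[:3]))
    return rows


# Hypothetical sample standing in for the contents of /path/file.txt.
sample = (
    "id\tint\tprimary key\n"
    "name\tstring\t\n"
    "price\tdouble\tunit price\n"
    "notes\tstring\t\n"
)

columns = parse_describe(sample)
type_counts = Counter(col_type for _, col_type, _ in columns)

print(len(columns))           # total number of columns
print(type_counts["string"])  # number of string columns
```

For a real table, replace `sample` with the text read from the file written by `hive -f`.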

dynamically cast() values to string and unpivot in BigQuery

I have tables (of different schema) that consist of numerous rows (millions) with a unique id and at least 100-200 columns of various data types (INT64, String, Datetime, Float...etc). I need to unpivot the columns to rows dynamically and display pertaining values (including null values) in the next column. I need this only for data related to a selected id.
Here is an example of what I need.
An idea of how tables look and final result:
I wrote this code but I am getting the following error:
"Query error: The datatype of column does not match with other datatypes in the IN clause. Expected STRING, Found INT64 at [4:74]"
code I wrote:
declare myup string;
set myup=(
select concat('(',string_agg(column_name,','),')'),
from (select distinct column_name from `abc-def-bigquery-ghi.dataset_info.INFORMATION_SCHEMA.COLUMNS`
where table_name='table_1'
and column_name not in ("id")
)
);
execute immediate format("""
select*from `abc-def-bigquery-ghi.dataset_info.table_1`
unpivot
(values for column_name in %s)""",myup);
It is not possible to explicitly cast each column by name into string since some tables have up to 200 columns.
Null values also need to be displayed in final result since this needs to then be visualized on Google Data Studio.
Any ideas on how to solve this are highly appreciated.

Teradata: Results with duplicate values converted into comma delimited strings

I have a typical table where each row represents a customer - product holding. If a customer has multiple products, there will be multiple rows with the same customer Id. I'm trying to roll this up so that each customer is represented by a single row, with all product codes concatenated together in a single comma delimited string. The diagram below illustrates this
After googling this, I managed to get it to work using the XMLAGG function, but only on a small sample of data. When scaled up, Teradata complained about running out of 'spool space', so I figure it's not very efficient.
Does anyone know how to efficiently achieve this?
Newer versions of Teradata support NPath, which can be used for this. You have to get used to the syntax, it's a Table Operator :-)
E.g. this returns the column list for each table in your system:
SELECT *
FROM
NPath(ON(SELECT databasename, tablename, columnname, columnid
FROM dbc.columnsV
) AS dt -- input data
PARTITION BY databasename, tablename -- group by columns
ORDER BY columnid -- order within list
USING
MODE (NonOverlapping) -- required syntax
Symbols (True AS F) -- every row
Pattern ('F*') -- is returned
RESULT(First (databasename OF F) AS DatabaseName, -- group by column
First (tablename OF F) AS TableName, -- group by column
Count (* OF F) AS Cnt,
Accumulate(Translate(columnname USING unicode_to_latin) OF ANY (F)) AS ListAgg
)
);
Should be waaaaaay better than XMLAgg.

Bigquery : get the name of the table as column value

I have a dataset that contains several tables that have suffixes in their name:
table_src1_serie1
table_src1_serie2
table_src2_opt1
table_src2_opt2
table_src3_type1_v1
table_src3_type2_v1
table_src3_type2_v2
I know that I can use this type of query in BQ:
select * from `project.dataset.table_*`
to get all the rows from these different tables.
What I am trying to achieve is a column that contains, for instance, the type of source (src1, src2, src3).
Assuming the schema of all tables is the same, you can add the expression below to your select list (for BigQuery Standard SQL):
SPLIT(_TABLE_SUFFIX, '_')[SAFE_OFFSET(0)] AS src
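To see what that expression yields for the table names in the question, here is a small Python sketch that mimics it. It assumes the wildcard prefix is `table_`, so `_TABLE_SUFFIX` is whatever follows that prefix; the helper names `safe_offset` and `source_of` are hypothetical and are not BigQuery functions.

```python
PREFIX = "table_"  # the part before * in `project.dataset.table_*`

TABLES = [
    "table_src1_serie1",
    "table_src1_serie2",
    "table_src2_opt1",
    "table_src2_opt2",
    "table_src3_type1_v1",
    "table_src3_type2_v1",
    "table_src3_type2_v2",
]


def safe_offset(items, index):
    """Return items[index], or None when the index is out of range."""
    return items[index] if 0 <= index < len(items) else None


def source_of(table_name):
    """Mimic SPLIT(_TABLE_SUFFIX, '_')[SAFE_OFFSET(0)] for one table name."""
    table_suffix = table_name[len(PREFIX):]  # what the wildcard matched
    return safe_offset(table_suffix.split("_"), 0)


sources = [source_of(name) for name in TABLES]
for name, src in zip(TABLES, sources):
    print(name, "->", src)
```

Every table maps to one of src1, src2 or src3, which is the value the added column would hold.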

Query Sql Like String

I need help with a SQL LIKE query.
The column values in the database look like this:
record 1 : "3,13,15,20"
record 2 : "13,23,14,19"
record 3 : "3,14,15,19,20"......
I want to get only the records that contain the value 3.
This is my query:
SELECT * FROM accounts where type like '%3%'
This query matches every record that contains the digit 3 anywhere, e.g. in 13 or 23, so it does not solve my problem.
Try this:
SELECT *
FROM accounts
WHERE CONCAT(',', type, ',') LIKE '%,3,%';
This trick places a comma at each end of the type CSV string, so that all we have to do is check for ,3, anywhere in that string.
By the way, it is generally not desirable to store CSV data like this in your SQL tables. Instead, consider normalizing your data and storing those CSV values across separate rows.
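The difference between the two patterns can be checked against the three sample records. This is a minimal sketch using Python's built-in sqlite3 module; SQLite is an assumption made only so the example runs anywhere, the `id` column is hypothetical, and the string concatenation is written with `||` instead of CONCAT because that is the portable form in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, type TEXT)")
conn.executemany(
    "INSERT INTO accounts (id, type) VALUES (?, ?)",
    [(1, "3,13,15,20"), (2, "13,23,14,19"), (3, "3,14,15,19,20")],
)

# Naive pattern: matches any record containing the digit 3.
naive = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM accounts WHERE type LIKE '%3%' ORDER BY id"
    )
]

# Comma-wrapped pattern: matches only records with 3 as a whole list item.
exact = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM accounts "
        "WHERE ',' || type || ',' LIKE '%,3,%' ORDER BY id"
    )
]

print(naive)  # ids matched by the naive pattern
print(exact)  # ids matched by the comma-wrapped pattern
conn.close()
```

Record 2 matches the naive pattern only because of 13 and 23, and drops out once the commas are added.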

How do I select a value in a key:value pair within a list in a column using SQL?

In a table called payouts, there is a column stripeResponseData where the data is in the following structure:
{"id":"tr_1BlSHbGQXLV7RqqnHJffUVO0","object":"transfer","amount":39415,"amount_reversed":0,"balance_transaction":"txn_1BlSHbGQXfV7AqqnGi2o7UiY","created":1516239215,"currency":"usd","description":null,"destination":"acct_1BWWAmAzms5xPfV9","destination_payment":"py_1BlSHbAzms5xkfV91RHAOrno","livemode":true,"metadata":{},"reversals":{"object":"list","data":[],"has_more":false,"total_count":0,"url":"/v1/transfers/tr_1BlSHbYQXLV7AqqnHJffUVO0/reversals"},"reversed":false,"source_transaction":null,"source_type":"card","transfer_group":null}
Within my SQL SELECT statement, I want to return only the value of the key "destination". How do I write my SQL query?
My desired result of the query:
SELECT "stripeResponseData" FROM payouts [...]
(where I don't know how to write [...]) should look like the following (assume we have 3 rows with different values on "destination"):
acct_1BWWAmAzms5xPfV9
acct_1AY0phDc9pCDpLR8
acct_1AwG3VL7DXxftOaS
How do I extract that value from the list within the stripeResponseData column?
In the examples below, the table is named stripeResponseData and the JSON column is named data. This query fetches the id from stripeResponseData where the id is a specific id (probably not very useful, but it shows how to select and filter), using the jsonb containment operator @> and assuming data is a jsonb column:
SELECT data->>'id' FROM stripeResponseData WHERE data @> '{"id":"tr_1BlSHbGQXLV7RqqnHJffUVO0"}';
Because you mentioned your data is a string, you need to do type conversions to query it correctly:
SELECT data::jsonb->>'id' FROM stripeResponseData WHERE data::jsonb @> '{"id":"tr_1BlSHbGQXLV7RqqnHJffUVO0"}';
Per your edit, you can query destination in almost exactly the same way. This gets all the ids from stripeResponseData where destination = acct_1BWWAmAzms5xPfV9:
SELECT data::jsonb->>'id' FROM stripeResponseData WHERE data::jsonb @> '{"destination":"acct_1BWWAmAzms5xPfV9"}';
To return the destination value itself, select that key instead:
SELECT data::jsonb->>'destination' FROM stripeResponseData;
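If the rows are already in application code, the same extraction can be done after fetching the raw strings. This is a rough Python sketch, assuming each row holds a JSON string shaped like the one in the question; the records are shortened to a few keys, and the ids `tr_example2` and `tr_example3` are hypothetical placeholders.

```python
import json

# Shortened stand-ins for three stripeResponseData values.
rows = [
    '{"id": "tr_1BlSHbGQXLV7RqqnHJffUVO0", "object": "transfer", '
    '"destination": "acct_1BWWAmAzms5xPfV9"}',
    '{"id": "tr_example2", "object": "transfer", '
    '"destination": "acct_1AY0phDc9pCDpLR8"}',
    '{"id": "tr_example3", "object": "transfer", '
    '"destination": "acct_1AwG3VL7DXxftOaS"}',
]

# Parse each string and pull out the "destination" key (None if absent).
destinations = [json.loads(raw).get("destination") for raw in rows]

for dest in destinations:
    print(dest)
```

Doing the extraction in SQL is usually preferable when only that one value is needed, since it avoids transferring the whole JSON document.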