Is it possible to concat a string field after group by in Hive


I am evaluating Hive and need to concatenate a string field after a GROUP BY. I found a function named "concat_ws", but it looks like I have to explicitly list all the values to be concatenated. I am wondering whether I can do something like the following with concat_ws in Hive. Say I have a table named "my_table" with two fields, country and city. I want only one record per country, and each record should have two fields, country and cities:
select country, concat_ws("|", city) as cities
from my_table
group by country
Is this possible in Hive? I am currently using Hive 0.11 from CDH5.

In database management an aggregate function is a function where the values of multiple rows are grouped together as input on certain criteria to form a single value of more significant meaning or measurement such as a set, a bag or a list.
Source: Aggregate function - Wikipedia
Hive's built-in aggregate functions are listed on the following web page:
Built-in Aggregate Functions (UDAF - user defined aggregation function)
So the only built-in option for Hive 0.11 (Hive 0.13 and above also have collect_list) is:
array collect_set(col)
This one answers your request if there are no duplicate city records per country, since it returns a set of objects with duplicate elements eliminated; otherwise, create your own UDAF or aggregate outside of Hive. To turn the resulting array into a single string, pass it to concat_ws, which also accepts an array<string> after the separator.
References for writing UDAF:
Writing GenericUDAFs: A Tutorial
HivePlugins
Create/Drop Function
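As a rough illustration of the collect_set approach, the Hive query would be: select country, concat_ws('|', collect_set(city)) as cities from my_table group by country. The Python sketch below (not Hive code) mimics what that query computes on hypothetical sample rows. Hive does not guarantee the order of elements inside collect_set. The sketch keeps first-seen order only so that its output is deterministic.

```python
from collections import defaultdict


def concat_ws_collect_set(rows, sep="|"):
    """Mimic the HiveQL query
        SELECT country, concat_ws(sep, collect_set(city)) FROM my_table GROUP BY country
    on a list of (country, city) tuples.
    """
    groups = defaultdict(list)
    for country, city in rows:
        # collect_set semantics: keep each city at most once per country
        if city not in groups[country]:
            groups[country].append(city)
    # concat_ws(sep, array<string>) joins the collected array with sep
    return {country: sep.join(cities) for country, cities in groups.items()}


city_rows = [  # hypothetical contents of my_table
    ("US", "Boston"),
    ("US", "Denver"),
    ("US", "Boston"),  # duplicate: dropped, as collect_set would do
    ("FR", "Paris"),
]
print(concat_ws_collect_set(city_rows))  # {'US': 'Boston|Denver', 'FR': 'Paris'}
```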

Related

How can I create multiple rows based on the value of one column in SQL?

I have a string column in my table in which multiple values are separated by the pipe character, for example:
Value1|Value2|Value3
I want a query that returns three rows for this one row, similar to the concept of explode in DataFrames.
Note that I am using Spark SQL, and I want to achieve this in SQL, not with DataFrames.

I got it working by using the following query.
select t.*, explode(split(values, "\\|")) as value
from table t
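Here, \\| can also be replaced by [|]. Specifying just | does not work, because split treats its pattern as a regular expression, in which an unescaped | is the alternation operator.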
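The Python sketch below (not Spark code) mimics what explode(split(...)) does to a hypothetical table: each element of the split array becomes its own row, with the original columns repeated. It also shows why the pipe must be escaped. Like Spark's split, Python's re.split takes a regular expression, and both \| and [|] match a literal pipe.

```python
import re


def explode_split(rows, column, pattern=r"\|"):
    """Mimic SELECT t.*, explode(split(column, pattern)) AS value FROM t.

    The pattern is a regular expression, so a literal pipe has to be
    escaped (\\|) or wrapped in a character class ([|]).
    """
    exploded = []
    for row in rows:
        for part in re.split(pattern, row[column]):
            # every array element becomes its own output row, with the
            # original columns repeated (the t.* part of the query)
            exploded.append({**row, "value": part})
    return exploded


table = [{"id": 1, "values": "Value1|Value2|Value3"}]  # hypothetical row
for out_row in explode_split(table, "values"):
    print(out_row)

# An unescaped "|" is an empty alternation that matches the empty string,
# so the pipe is not treated as a delimiter at all:
print(re.split("|", "a|b"))
```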

How to use Kusto to return a max() row from a table, while showing other columns not used in the max grouping

Given the following Log Analytics KQL query:
SigninLogs
| where ResultType == 0
| summarize max(TimeGenerated) by UserPrincipalName
I need to display other columns from those selected rows in the SigninLogs table. I've tried different approaches without success. Joining back to the same table seems infeasible, since joins appear to be available only on a single column. Other approaches using in failed because the needed columns weren't available in the source query above.

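You can use the arg_max() aggregation function, which returns the row holding the maximum value of an expression together with other columns of that row: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/arg-max-aggfunction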
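Applied to the question, the query becomes: SigninLogs | where ResultType == 0 | summarize arg_max(TimeGenerated, *) by UserPrincipalName. The * asks for all columns of the winning row. The Python sketch below (not KQL) mimics that grouping on hypothetical rows; the AppDisplayName column is only an example. Ties are resolved by keeping the first row seen. That is a choice made for this sketch, not documented KQL behavior.

```python
def arg_max_by(rows, value_key, group_key):
    """Mimic KQL: summarize arg_max(value_key, *) by group_key.

    Returns one full row (all columns) per group: the row whose value_key
    is largest. On ties the first row encountered is kept.
    """
    best = {}
    for row in rows:
        key = row[group_key]
        if key not in best or row[value_key] > best[key][value_key]:
            best[key] = row
    return list(best.values())


# hypothetical sign-in rows; ISO-8601 strings compare correctly as text
signins = [
    {"UserPrincipalName": "alice", "TimeGenerated": "2024-01-01T08:00", "AppDisplayName": "Portal"},
    {"UserPrincipalName": "alice", "TimeGenerated": "2024-01-02T09:30", "AppDisplayName": "Teams"},
    {"UserPrincipalName": "bob", "TimeGenerated": "2024-01-01T12:00", "AppDisplayName": "Outlook"},
]
for row in arg_max_by(signins, "TimeGenerated", "UserPrincipalName"):
    print(row)
```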

Partitioning BigQuery table based on nested column

I am trying to partition a BigQuery table on a timestamp, but the column I want to use for partitioning is a nested column with a parent record, for instance transaction.timestamp.
I would like to pass the column name as a String to a Java method. How can I specify this column name as a String in Java when I pass it as a parameter?
I have previously partitioned on non-nested columns and it worked fine. The following code does not recognize the column name and results in an error:
String columnName = "transaction.timestamp";
I would appreciate your help in figuring this problem out.

For both partitioning and clustering, you need to unnest the column and store it as a top-level column.
From the docs:
The partitioning column must be a top-level field. You cannot use a leaf field from a RECORD (STRUCT) as the partitioning column.
https://cloud.google.com/bigquery/docs/creating-column-partitions
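One way to follow that advice is to copy the nested value into a new top-level field before loading, then partition on that field. This assumes rows are prepared in code. The Python sketch below does this for dict-shaped rows. The helper name and the transaction_timestamp field name are hypothetical, and the table schema would also need a matching top-level TIMESTAMP column. The top-level name (no dot) is what you would then pass as the partitioning column.

```python
import copy


def promote_nested_field(row, path, new_name):
    """Return a copy of row with the value at the dotted path (for example
    "transaction.timestamp") copied into a new top-level field new_name.
    """
    value = row
    for part in path.split("."):
        value = value[part]  # walk down the nested RECORD/STRUCT
    flat = copy.deepcopy(row)  # leave the caller's row untouched
    flat[new_name] = value
    return flat


txn_row = {"id": 7, "transaction": {"timestamp": "2024-03-01T10:15:00Z", "amount": 12.5}}
flat = promote_nested_field(txn_row, "transaction.timestamp", "transaction_timestamp")
print(flat["transaction_timestamp"])  # 2024-03-01T10:15:00Z
```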

Equivalent of GROUP_CONCAT for multiple columns

Do you have any idea how to display the values obtained by GROUP_CONCAT in multiple columns, instead of in a single column containing all the values separated by commas?
Thanks in advance :-)

You cannot do this in SQL.
One of the fixed rules of SQL is that the columns in your select-list must be set at the time you prepare your query. The select-list does not expand dynamically to match the values it finds as it examines the data.
This comes from the origins of SQL in the relational model. A relation (not a relationship, lots of people get this wrong) is a data structure with a fixed set of columns, a header defining the names and data types of the columns, and then a set of rows, where every row has the same set of columns as the header.
The select-list of an SQL SELECT statement effectively defines the header of the relation returned as the result-set of that query. The number and names of the columns are defined by the query, not by the data in the result.
A commenter above asks if you want to do a pivot, but a pivot also requires that you name the columns in the select-list. There is no such thing as a SQL pivot query that grows its select-list according to the data in the result.

Get row values as column names in t-sql

I have a requirement to display row values as column names in a data grid view. I want to get the store names into columns using an SQL SELECT statement (please refer to the attached image). I want the user to enter some values under each column, so STORE 1, STORE 2 and STORE 3 should be displayed as columns in the data grid view. Can anyone help me get this working?
While googling I found that this can be done using PIVOT in SQL, but this table doesn't have any aggregate columns. Any help, please?
The result should be something like:

You may know that your data only contains a single row for each pivoting column, but SQL Server has to construct a plan that could accommodate multiple rows.
So, use PIVOT with an aggregate that, when passed a single value, returns that same value. MIN() and MAX() fit that description (as does SUM() if you're working with numeric data).
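For example, assume hypothetical source columns ItemName, StoreName and Qty. The T-SQL would then look like: SELECT * FROM src PIVOT (MAX(Qty) FOR StoreName IN ([STORE 1], [STORE 2], [STORE 3])) AS p. The Python sketch below (not T-SQL) shows why MAX is safe here. Each (item, store) cell receives at most one value, and the maximum of a single value is that value. Cells with no source row come out as None, the counterpart of NULL.

```python
def pivot_max(rows, row_key, col_key, value_key, columns):
    """Mimic PIVOT (MAX(value_key) FOR col_key IN (columns)), grouping
    the output by row_key."""
    result = {}
    for row in rows:
        # one output row per distinct row_key, one slot per pivot column
        slots = result.setdefault(row[row_key], {c: [] for c in columns})
        if row[col_key] in slots:
            slots[row[col_key]].append(row[value_key])
    # MAX over each cell; an empty cell becomes None (NULL in SQL)
    return {
        key: {c: (max(vals) if vals else None) for c, vals in slots.items()}
        for key, slots in result.items()
    }


stores = ["STORE 1", "STORE 2", "STORE 3"]
store_rows = [  # hypothetical data: at most one value per (item, store)
    {"ItemName": "Apples", "StoreName": "STORE 1", "Qty": 5},
    {"ItemName": "Apples", "StoreName": "STORE 3", "Qty": 2},
    {"ItemName": "Pears", "StoreName": "STORE 2", "Qty": 8},
]
print(pivot_max(store_rows, "ItemName", "StoreName", "Qty", stores))
```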

You can use a dynamic pivot function and pass it your query with the item count column.
The link below provides such a function and can easily show you the expected output:
http://forums.asp.net/t/1772644.aspx/1
Procedure name:
[dbo].[dynamic_pivot]
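A dynamic pivot works around the fixed select-list rule discussed in the previous question by generating a new statement. It first reads the distinct values of the pivot column, then builds the PIVOT statement text with those values as the column list, and runs it as dynamic SQL (in T-SQL, with EXEC or sp_executesql). The Python sketch below shows only the string-building step, under the assumption of hypothetical table and column names. It illustrates the general technique, not the code behind the linked procedure. quote_name imitates T-SQL's QUOTENAME by wrapping a name in brackets and doubling any closing bracket.

```python
def quote_name(name):
    """Bracket-quote an identifier like T-SQL's QUOTENAME does by default:
    wrap it in [ ] and double any ] inside it."""
    return "[" + name.replace("]", "]]") + "]"


def build_pivot_sql(source, row_col, pivot_col, value_col, pivot_values):
    """Build a T-SQL PIVOT statement whose column list comes from data.

    pivot_values would normally be fetched first, e.g. with
    SELECT DISTINCT <pivot_col> FROM <source>. All names are hypothetical.
    """
    cols = ", ".join(quote_name(v) for v in pivot_values)
    return (
        f"SELECT {quote_name(row_col)}, {cols} "
        f"FROM (SELECT {quote_name(row_col)}, {quote_name(pivot_col)}, "
        f"{quote_name(value_col)} FROM {source}) AS src "
        f"PIVOT (MAX({quote_name(value_col)}) FOR {quote_name(pivot_col)} "
        f"IN ({cols})) AS p"
    )


sql = build_pivot_sql("dbo.StoreItems", "ItemName", "StoreName", "Qty",
                      ["STORE 1", "STORE 2", "STORE 3"])
print(sql)
```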
