How can I get result format JSON from Athena in AWS? - hive

I want to get result value format JSON from Athena in AWS.
When I select from the Athena then the result format like this.
{test.value={report_1=test, report_2=normal, report_3=hard}}
Is there any way to get JSON format result without replacing "=" to ":" ?
The column format is
map<string,map<string,string>>

select mycol
from mytable
;
+--------------------------------------------------------------+
| mycol |
+--------------------------------------------------------------+
| {test.value={report_3=hard, report_2=normal, report_1=test}} |
+--------------------------------------------------------------+
select cast (mycol as json) as json
from mytable
;
+--------------------------------------------------------------------------+
| json |
+--------------------------------------------------------------------------+
| {"test.value":{"report_1":"test","report_2":"normal","report_3":"hard"}} |
+--------------------------------------------------------------------------+

If your input format is json (i.e. your whole row is JSON) you can create a new table that holds athena results in whatever format you specify out of several possible options like parquet, json, orc etc. This would ultimately end up storing all athena results of your query, in an s3 bucket with the desired format.
hope this helps
here's the documentation for reference: https://docs.aws.amazon.com/athena/latest/ug/ctas-examples.html

Related

Postgres \u representation to readable representation

I've a column which has data in following format
\d my_table
Column | Type | Modifiers
-------------+--------------------------+---------------
message | character varying |
message has strings such as
select message from my_table;
message
-----------------------
\u6771
How can I print its human readable counterpart 東? I don't know how to use E while selecting columns. I don't have database write access so I can't create functions.

single vs double quotes in WHERE clause returning different results

It seemed that Athena was including CSV column headers in my query results. I recreated the tables with the DDL included below using TBLPROPERTIES ("skip.header.line.count"="1") to remove the headers.
I'm running the following queries to validate that the CREATE TABLE DDL worked. The only difference between the queries below is the use of single vs double quotes in the WHERE clause. The issue is that I'm getting different result when running them.
Query 1:
SELECT
file_name
FROM table
WHERE file_name = "file_name"
The query above returns the actual data (see sample table below), rather than only rows where the file_name field is "file_name".
+-------+--------------------+
| Row # | file_name |
+-------+--------------------+
| 1 | |
| 2 | 1586786323.8194735 |
| 3 | |
| 4 | 1586858857.3117666 |
| 5 | 1586858857.3117666 |
| 6 | 1586858857.3117666 |
| ... | |
+-------+--------------------+
Query 2:
SELECT
file_name
FROM table
WHERE file_name = 'file_name'
The query above returns no results, as expected if the CSV column headers are not being included in the results.
I'm quite confused by the first query returning any results at all. I've scoured the AWS documentation at this point and doesn't seem I did anything wrong with the DDL and SQL should not care whether I use single vs. double quotes. What am I missing here?
DDL:
CREATE EXTERNAL TABLE `table` (
`file_name` string,
`ticker` string,
...
)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'escapeChar'='\\',
'separatorChar'=',')
LOCATION
's3://{bucket_name}/{folder}/'
TBLPROPERTIES (
"skip.header.line.count"="1")
Single quotes are the SQL standard for delimiting strings.
Double quotes are used for escaping delimiters. So "file_name" refers to the column of that name. Some databases also accept double quotes for strings. That is just confusing. Don't do that.
In your original tags, for instance, Hive uses backticks to escape identifiers and double quotes for strings. Presto uses double quotes (which is the standard) to delimit identifiers.
Just to expand on Gordon's answer a little. Your first query:
SELECT
file_name
FROM table
WHERE file_name = "file_name"
In this case, the double quotes are causing the query engine to treat "file_name" as a column identifier, not a value, so that query is functionally the same as:
SELECT
file_name
FROM table
WHERE file_name = file_name
Obviously (when written that way) the condition is always true, so the full table is returned.

Format a number to NOT have commas (1,000,000 -> 1000000) in Google BigQuery

In Bigquery: How do we format a number that will be part of the result set that should be not having commas: like 1,000,000 to 1000000 ?
I am assuming that your data type is string here.
You can use the REGEXP_REPLACE function to remove certain symbols from strings.
SELECT REGEXP_REPLACE("1,000,000", r',', '') AS Output
Returns:
+-----+---------+
| Row | Output |
+-----+---------+
| 1 | 1000000 |
+-----+---------+
If your data contains strings with and without commas, this function will return the ones without as they are so you don't need to worry about filtering the input.
Documentation for this function can be found here.

Extract particular character using StandardSQL

I would like to extract particular character from strings using StandardSQL.
I would like to extract the character after limit=.
For instance, from below strings I would like to extract 10, 3 and null. For everything that has null I also would like to make all null = 1.
partner=&limit=10
partner=aex&limit=3&filters%5Bpartner%5D
partner=aex&limit=&filters%5Bpartner%5D
I only know how to use substring function but the problem here is the positions of limit= are not always the same.
You can use REGEXP_EXTRACT. For example:
SELECT REGEXP_EXTRACT('partner=aex&limit=3&filters%5Bpartner%5D', 'limit=(\\d+)');
+-------+
| $col1 |
+-------+
| 3 |
+-------+

Concat multiple rows with a delimiter in Hive

I need to concat string values row wise with '~' as delimiter.
I have the following data:
I need to concat 'Comment' column for each 'id' in the ascending order of 'row_id' with '~' as delimiter.
Expected output is as below:
GROUP_CONCAT is not an option since its not recognized in my Hive version.
I can use collect_set or collect_list, but I won't be able to insert delimiter in between.
Is there any workaround?
collect_list returns array, not string.
Array can be converted to delimited string using concat_ws.
This will work, with no specific order of comments.
select id
,concat_ws('~',collect_list(comment)) as comments
from mytable
group by id
;
+----+-------------+
| id | comments |
+----+-------------+
| 1 | ABC~PRQ~XYZ |
| 2 | LMN~OPQ |
+----+-------------+