Does Tableau support complex data types in Hive columns and flatten them? - hive

I am trying to create a dashboard from data present in Hive. The catch is that the column I want to visualize is a nested JSON type. So will Tableau be able to parse and flatten the JSON column and list out all possible attributes? Thanks!

Unfortunately Tableau will not automatically flatten the JSON structure of the field for you, but you can manually do so.
Here is an article that explains the use of Regex within Tableau to extract pertinent information from your JSON field.
I realize this may not be the answer you were looking for, but hopefully it gets you started down the right path.
(In case it helps, Tableau does have a JSON connector, in the event you are able to connect directly to your JSON as a data source rather than having it embedded in your Hive connection as a complex field type.)
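As a rough illustration of the regex approach, a Tableau calculated field along these lines can pull a single attribute out of the raw JSON string (the [payload] field name and the "src" key here are hypothetical placeholders):

// Hypothetical calculated field: extract the value of the "src" key
// from a JSON string stored in the [payload] column.
REGEXP_EXTRACT([payload], '"src":"([^"]*)"')

REGEXP_EXTRACT returns the portion of the string matched by the capturing group, so you would create one such calculated field per attribute you want to surface.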

Related

dbt macro for Databricks to flatten a column with arbitrarily nested JSON content

My goal is to write a dbt macro that will allow me to flatten a table column with arbitrarily nested JSON content.
I have already found a wonderful tutorial for this for Snowflake; however, I would like to implement this for Databricks (Delta Lake) using SQL.
Ultimately, I am looking for the Databricks equivalent of the LATERAL FLATTEN function in Snowflake.
An example source table should ultimately be transformed, using SQL, into a flattened target state (the Source and Target examples were shown as screenshots and are not reproduced here).
I have already looked at several projects, for example json-denormalize; however, I would like to implement this completely in SQL.
I have also seen the Databricks functions json_object_keys, lateral view, and explode, but I can't work out how exactly I should approach the problem with them.
Can someone steer me in the right direction?
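For reference, the functions mentioned above combine roughly as in this minimal sketch, which flattens a single level of nesting; the raw_events table and its payload string column are hypothetical, and arbitrary depth would require repeating (or macro-generating) the pattern per level:

-- Minimal sketch (Databricks SQL): parse the JSON string as a map,
-- then explode it into one row per key/value pair.
SELECT
  r.id,
  kv.key   AS attribute,
  kv.value AS value
FROM raw_events r
LATERAL VIEW explode(from_json(r.payload, 'map<string,string>')) kv AS key, value;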

How to read specific columns from a Parquet file in Java

I am using a WriteSupport that knows how to write my custom object 'T' into Parquet. I am interested in reading only 2 or 3 specific columns out of the 100 columns of my custom object that are written into the Parquet file.
Most examples online extend ReadSupport and read the entire record. I want to accomplish this without using things like Spark, Hive, Avro, Thrift, etc.
Is there an example in Java that reads selected columns of a custom object from a Parquet file?
This post may help: Read specific column from Parquet without using Spark
If you just want to read specific columns, then you need to set a read schema on the configuration that the ParquetReader builder accepts. (This is also known as a projection).
In your case you should be able to call .withConf(conf) on the AvroParquetReader builder class, and on the conf you pass in, invoke conf.set(ReadSupport.PARQUET_READ_SCHEMA, schema), where schema is an Avro schema in String form.
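Put together, that looks roughly like the sketch below. This is a sketch under assumptions: the projection schema, file path, and field names are hypothetical, and parquet-avro plus hadoop-client are assumed to be on the classpath.

import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.api.ReadSupport;

public class ProjectedParquetRead {
    public static void main(String[] args) throws Exception {
        // Hypothetical projection schema listing only the columns we want.
        String projection = "{\"type\":\"record\",\"name\":\"T\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"long\"},"
                + "{\"name\":\"name\",\"type\":\"string\"}]}";

        Configuration conf = new Configuration();
        // Tell the read support to materialize only the projected columns.
        conf.set(ReadSupport.PARQUET_READ_SCHEMA, projection);

        try (ParquetReader<GenericRecord> reader =
                AvroParquetReader.<GenericRecord>builder(new Path("data.parquet"))
                        .withConf(conf)
                        .build()) {
            GenericRecord record;
            while ((record = reader.read()) != null) {
                System.out.println(record.get("id") + " " + record.get("name"));
            }
        }
    }
}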

coldfusion and reading JSON data from web service into a cf query

I am using the following code to obtain data from a web service, and it returns the data in JSON format:
<cfhttp url="http://api.sensis.com.au/v1/test/search?key=czsjp3f8xhd835vg6xfw8ber&query=vetinary%20and%20clinic&radius=1&location=-37.7833,144.9667">
<cfdump var="#cfhttp.FileContent#">
I want to be able to output the data into a table. For that reason, I need to be able to bring the data into a query object so I can cfloop or cfoutput over the query to display each row of data for the selected fields I choose.
However, I have not been successful in achieving the above. I would appreciate some assistance with the code to achieve the result mentioned.
I believe the answers by @Leigh and @J.T to this question will help you understand the structure of JSON and how to handle it in ColdFusion.
As already commented, you don't need to convert the result to a query object in order to present the data you want in a tabular format. That is the beauty and simplicity of ColdFusion: you can easily loop over an array, a collection (or struct), or a complex arrangement such as an array of structures. Learn from here.
The JSON result of the HTTP call in your question has a 'results' object containing more nested objects. Begin by deserializing the HTTP result using DeserializeJSON(), get the results object and dump it, analyze the data structure within, and finally form your solution.
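A minimal sketch of that flow (the keys referenced below, such as results and name, are assumptions about the API's response shape; verify them against your own cfdump output):

<!--- Deserialize the JSON text returned by cfhttp into native CF structures --->
<cfset data = DeserializeJSON(cfhttp.FileContent)>

<!--- Loop over the (assumed) results array and print selected fields --->
<cfoutput>
<table>
<cfloop array="#data.results#" index="entry">
    <tr>
        <td>#entry.name#</td>
    </tr>
</cfloop>
</table>
</cfoutput>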
I recommend you start by building up your understanding of the subject.

How to query a specific collection from a RavenDB?

In one database, I am storing two separate document types - CumulativeSprintData and Features. I'm trying to query them from JavaScript. Right now I'm just using the default:
http://servername:8080/databases/sprintprogress/indexes/dynamic?
The problem is that this default query pulls in documents of both types. How do I specify which document type I want to pull down?
Thanks!
You can use:
http://servername:8080/databases/sprintprogress/indexes/dynamic/Features
http://servername:8080/databases/sprintprogress/indexes/dynamic/CumulativeSprintDatas
(RavenDB's default convention pluralizes the document type name to form the collection name, which is why the second URL uses CumulativeSprintDatas.)

Native JSON support for BigQuery?

Is there any plan for Google BigQuery to implement native JSON support?
I am considering migrating Hive data (~20 TB) to Google BigQuery,
but the table definitions in Hive contain a map type, which is not supported in BigQuery.
For example, the HiveQL below:
select gid, payload['src'] from data_repository;
It can, however, be worked around by using regular expressions.
As of 1 Oct 2012, BigQuery supports newline separated JSON for import and export.
Blog post: http://googledevelopers.blogspot.com/2012/10/got-big-json-bigquery-expands-data.html
Documentation on data formats: https://developers.google.com/bigquery/docs/import#dataformats
Your best bet is to coerce all of your types into CSV before importing, and if you have complex fields, decompose them via a regular expression in the query (as you suggested).
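For instance, if the map were serialized into the CSV as a JSON-like string column named payload (a hypothetical layout), the extraction could look something like:

-- Hypothetical: payload holds the serialized map as a JSON-like string.
select gid, REGEXP_EXTRACT(payload, '"src":"([^"]*)"') as src
from data_repository;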
That said, we are actively investigating support for new input formats, and are interested in feedback as to which formats would be most useful. The underlying query engine (Dremel) supports types similar to the Hive map type; however, BigQuery does not currently expose a mechanism for ingesting nested records.