Hierarchical SQL query in Athena

I'm trying to create a query in Athena that solves this problem:
I have records that look like this:
{'id': 'a', 'children': ['b','c']}
which form a hierarchical structure: a tree in which each node can have any number of children.
I have more than one root, that is, more than one element that is not a child of any other element.
I want to get the complete structure below one of them. How can I do that with a SQL query? I've seen that recursive queries are not supported in Athena.

What you want to achieve is called "recursive queries" or "recursive CTEs" (common table expressions). Presto 340 added experimental support for them, but Athena is based on Presto 0.172 and does not have the feature. Unfortunately, there is no general replacement for it.
Without support for the feature in the query engine, you need to pull the parent/child relationships and calculate the result within your app.
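A minimal sketch of the app-side approach, assuming the records have already been pulled out of Athena into a list of dicts shaped like the one in the question (how you fetch them is up to you). It finds the roots and rebuilds the nested structure for one of them. `find_roots` and `build_tree` are hypothetical helper names, and the code assumes the relationships contain no cycles.

```python
from typing import Any, Dict, List


def find_roots(records: List[Dict[str, Any]]) -> List[str]:
    """Ids that never appear in any record's children list."""
    child_ids = {c for r in records for c in r.get("children", [])}
    return [r["id"] for r in records if r["id"] not in child_ids]


def build_tree(records: List[Dict[str, Any]], root_id: str) -> Dict[str, Any]:
    """Nested {'id', 'children'} structure for everything below root_id."""
    # Index children by parent id so each lookup is O(1).
    children_of = {r["id"]: r.get("children", []) for r in records}

    def expand(node_id: str) -> Dict[str, Any]:
        # Ids that have no record of their own are treated as leaves.
        return {
            "id": node_id,
            "children": [expand(c) for c in children_of.get(node_id, [])],
        }

    return expand(root_id)


# Sample data in the question's shape, with two roots ('a' and 'x').
records = [
    {"id": "a", "children": ["b", "c"]},
    {"id": "b", "children": ["d"]},
    {"id": "c", "children": []},
    {"id": "d", "children": []},
    {"id": "x", "children": ["y"]},
    {"id": "y", "children": []},
]
roots = find_roots(records)      # ['a', 'x']
tree = build_tree(records, "a")  # only the tree under 'a'
```

Recursion keeps the sketch short; for very deep trees an explicit stack would avoid Python's recursion limit.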

Related

How do I query across many tables in redshift without UNION ALL?

I'm looking for a dumb way to write the same select query across all tables. For example, in Google BigQuery I can query like this using wildcards:
select COMPLICATED QUERY HERE from `myproject:mytable_2017_1_*`;
How can I do the equivalent in redshift?
The wildcard syntax is not available for Amazon Redshift. Each query must specifically reference the table(s) it wishes to use.
You could create a VIEW that does the UNION ALL for you, and then you could just query the view.
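A sketch of the view-based workaround, assuming the tables share the same columns. It emulates the wildcard by listing matching table names from the catalog and generating the UNION ALL view from them. SQLite stands in for Redshift only so the example runs anywhere; on Redshift you would list the tables from a catalog view such as information_schema.tables instead of sqlite_master. The table and view names and `union_all_view_sql` are hypothetical.

```python
import sqlite3
from typing import Iterable


def union_all_view_sql(view_name: str, tables: Iterable[str], columns: str = "*") -> str:
    """Build a CREATE VIEW statement that stitches tables together with UNION ALL.

    Names are interpolated directly, so they must come from a trusted source
    such as the database catalog, never from user input."""
    selects = [f"SELECT {columns} FROM {t}" for t in tables]
    if not selects:
        raise ValueError("no tables to combine")
    return f"CREATE VIEW {view_name} AS\n" + "\nUNION ALL\n".join(selects)


# In-memory database with three hypothetical monthly tables.
conn = sqlite3.connect(":memory:")
for month in (1, 2, 3):
    conn.execute(f"CREATE TABLE mytable_2017_{month} (id INTEGER, amount INTEGER)")
    conn.execute(f"INSERT INTO mytable_2017_{month} VALUES (?, ?)", (month, month * 10))

# "Wildcard": pick the table names by prefix from the catalog.
names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name LIKE 'mytable_2017_%' ORDER BY name"
    )
]
conn.execute(union_all_view_sql("mytable_2017_all", names))

# The complicated query now only has to reference the view.
total = conn.execute("SELECT SUM(amount) FROM mytable_2017_all").fetchone()[0]
print(total)  # 60
```

Because the view is a fixed list of tables, it has to be recreated whenever a new table that should be included appears.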

What are Apache Kylin Use Cases?

I recently came across Apache Kylin, and was curious what its use cases are. From what I can tell, it seems to be a tool designed to solve very specific problems related to upwards of 10+ billion rows: aggregating, caching and querying data from other sources (HBase, Hadoop, Hive). Am I correct in this assumption?
Apache Kylin's use case is interactive big data analysis on Hadoop. It lets you query big Hive tables at sub-second latency in 3 simple steps.
Identify a set of Hive tables in star schema.
Build a cube from the Hive tables in an offline batch process.
Query the Hive tables using SQL and get results in sub-seconds, via REST API, ODBC, or JDBC.
The use case is quite general: Kylin can quickly query any Hive tables, as long as you can define a star schema and model cubes from the tables. Check out the Kylin terminology if you are not sure what a star schema or a cube is.
Kylin provides an ANSI SQL interface, so you can query the Hive tables pretty much the same way you used to. One limitation, however, is that Kylin provides only aggregated results; in other words, the SQL should contain a GROUP BY clause to yield correct results. This is usually fine because big data analysis focuses more on aggregated results than on individual records.
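The "build a cube" step can be pictured as precomputing a GROUP BY for every combination of dimensions (each combination is one cuboid), so that a later aggregate query becomes a lookup. The sketch below is only a toy, in-memory illustration of that idea, not how Kylin builds or stores cubes; the region/year/sales columns and `build_cube` are hypothetical. It also hints at why only aggregated queries are supported: the individual rows are not kept in the cube.

```python
from collections import defaultdict
from itertools import combinations
from typing import Any, Dict, List, Tuple


def build_cube(
    rows: List[Dict[str, Any]], dimensions: List[str], measure: str
) -> Dict[Tuple[str, ...], Dict[tuple, int]]:
    """Pre-aggregate `measure` for every subset of `dimensions`.

    Each subset is one cuboid, i.e. the result of one GROUP BY query."""
    cube: Dict[Tuple[str, ...], Dict[tuple, int]] = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            totals: Dict[tuple, int] = defaultdict(int)
            for row in rows:
                totals[tuple(row[d] for d in dims)] += row[measure]
            cube[dims] = dict(totals)
    return cube


# Hypothetical fact rows: sales by region and year.
rows = [
    {"region": "EU", "year": 2016, "sales": 5},
    {"region": "EU", "year": 2017, "sales": 7},
    {"region": "US", "year": 2017, "sales": 4},
]
cube = build_cube(rows, ["region", "year"], "sales")

# "SELECT region, SUM(sales) ... GROUP BY region" is now a dictionary lookup.
print(cube[("region",)])  # {('EU',): 12, ('US',): 4}
print(cube[()])           # {(): 16}
```

With n dimensions this toy materializes all 2^n cuboids, which is why the offline build step is the expensive part.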

store hierarchy information in a MS Access DB for faster queries

I'm trying to figure out how I can store hierarchical information in an MS Access DB so that queries will be faster. A use case example might make more sense.
I have a table that has two fields:
a name
a hierarchy
A hierarchy is a folder-like path with any number (X) of levels:
\a\b\c\d
\a\b\c\d\e
\a\b\c\d\f\g
\a\b\h
\a\b\i\j
you get the idea
the table will be filled with 300,000 rows
each row will have a name and a hierarchy
At this point:
if I want to find all the names that are in a hierarchy, including sub-hierarchies, I can run a LIKE query: where [hierarchy] like '\a\b\*'
I can even do wildcard joins, even though MS Access's query design GUI doesn't handle them and I have to use the SQL view: join on [hierarchy] like '\a\b\*'.
But it can be very slow, especially when my joins get complex.
So I thought maybe there is a way to create another table that would hold all the hierarchies and maintain their parent/child relationships, with each row of the first table referencing a row in it. Then, somehow, I could use it to find rows in the first table that match hierarchies and sub-hierarchies in the second table.
However, I have no clue if this is even possible and how I would go about it. Any advice is appreciated.
In Oracle we use a hierarchical structure where each row has a reference to its parent. Then, with the CONNECT BY clause, you can connect these rows to each other.
You should take a look here: simulation of connect-by in sql-server
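A minimal sketch of the second-table idea from the question, using the parent-reference (adjacency list) layout described above: each folder is stored once with its parent's id, each item points at a folder, and a recursive common table expression (the standard-SQL counterpart of CONNECT BY) collects a folder together with all of its sub-folders. SQLite is used only so the example runs anywhere; this is not Access SQL. The table names `folders` and `items` and the helper functions are hypothetical.

```python
import sqlite3
from typing import List, Optional

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE folders (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES folders(id),  -- NULL for a top-level folder
        name TEXT NOT NULL
    );
    CREATE INDEX folders_parent ON folders(parent_id);
    CREATE TABLE items (
        name TEXT NOT NULL,
        folder_id INTEGER NOT NULL REFERENCES folders(id)
    );
    """
)


def folder_id_for(path: str, create: bool = True) -> Optional[int]:
    """Id of the folder for a backslash-separated path, optionally creating missing levels."""
    parent = None
    for part in path.strip("\\").split("\\"):
        # "IS" matches NULL as well as ordinary values in SQLite.
        row = conn.execute(
            "SELECT id FROM folders WHERE name = ? AND parent_id IS ?", (part, parent)
        ).fetchone()
        if row is not None:
            parent = row[0]
        elif create:
            parent = conn.execute(
                "INSERT INTO folders (parent_id, name) VALUES (?, ?)", (parent, part)
            ).lastrowid
        else:
            return None
    return parent


SUBTREE_NAMES = """
WITH RECURSIVE subtree(id) AS (
    SELECT ?                                  -- start at the chosen folder
    UNION ALL
    SELECT f.id FROM folders AS f
    JOIN subtree AS s ON f.parent_id = s.id   -- then walk down one level at a time
)
SELECT i.name FROM items AS i
JOIN subtree AS s ON i.folder_id = s.id
ORDER BY i.name
"""


def names_under(path: str) -> List[str]:
    """Names stored in the folder at `path` or in any of its sub-folders."""
    root = folder_id_for(path, create=False)
    if root is None:
        return []
    return [row[0] for row in conn.execute(SUBTREE_NAMES, (root,))]


# The paths from the question, each holding one hypothetical name.
for name, path in [
    ("n1", r"\a\b\c\d"),
    ("n2", r"\a\b\c\d\e"),
    ("n3", r"\a\b\c\d\f\g"),
    ("n4", r"\a\b\h"),
    ("n5", r"\a\b\i\j"),
]:
    conn.execute("INSERT INTO items VALUES (?, ?)", (name, folder_id_for(path)))

print(names_under(r"\a\b\c"))  # ['n1', 'n2', 'n3']
```

Compared with LIKE '\a\b\*', the joins here are on integer ids backed by an index on parent_id; whether that is actually faster depends on the engine and the data.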

Suitability of MongoDB for equivalent of XPath

I am very interested in using MongoDB for a variety of reasons. It suits many of my needs well.
However, I also need to perform the equivalent of an XPath query. I have a complex hierarchical document. I need to be able to extract specific nodes (and their children) based on parameter matching. Something like:
Give me the document structure starting at node x where the attribute "level" is null or 1.
Can MongoDB do this and if so, how can I go about it? Or should I stick to PostgreSQL / SQL Server for this type of work?
Wrong tool. Use a database that explicitly supports hierarchical data, such as a graph database, or an RDBMS with XML support (if you are using XML). MongoDB is not suited for this purpose.

A query to summarize data in sub-tree?

My data fits a tree form naturally. Therefore, I have a simple SQL table to store the data: {id, parentid, data1, ..., dataN}
I want to be able to "zoom in" on the data and produce a report which summarizes the data found below the current branch.
That is, when standing at the root, I want the totals of all the data. When I have traveled down a certain branch of the tree, I want only the sum of the data for that node and its descendants.
How do I write such a query in SQL?
Thanks in advance!
/John
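SQLite does not support CONNECT BY, and versions before 3.8.3 also lack recursive common table expressions (WITH RECURSIVE), so on those versions you will not be able to perform this calculation in a single query unless you use nested sets or materialized paths for your data.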
Alternatively, do it "the hard way" and traverse your tree recursively, one query for each child node starting at the parent-of-interest.
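A sketch of "the hard way", assuming the question's table is named `tree` with columns id, parentid and data1 (only one data column is summed here, and it is assumed to be non-NULL). It walks down from the node of interest with an explicit stack instead of recursion and issues one query per visited node to fetch that node's children. `subtree_total` is a hypothetical name.

```python
import sqlite3


def subtree_total(conn: sqlite3.Connection, node_id: int) -> int:
    """Sum data1 over node_id and all of its descendants."""
    row = conn.execute("SELECT data1 FROM tree WHERE id = ?", (node_id,)).fetchone()
    if row is None:
        raise KeyError(node_id)
    total = row[0]
    pending = [node_id]
    while pending:
        parent = pending.pop()
        # One query per node: fetch its direct children and their values.
        for child_id, value in conn.execute(
            "SELECT id, data1 FROM tree WHERE parentid = ?", (parent,)
        ).fetchall():
            total += value
            pending.append(child_id)
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (id INTEGER PRIMARY KEY, parentid INTEGER, data1 INTEGER)")
conn.executemany(
    "INSERT INTO tree VALUES (?, ?, ?)",
    [(1, None, 10), (2, 1, 5), (3, 1, 7), (4, 2, 1), (5, 2, 2)],
)
print(subtree_total(conn, 1))  # 25: the whole tree
print(subtree_total(conn, 2))  # 8: nodes 2, 4 and 5
```

The number of queries grows with the size of the sub-tree, which is the cost the answer is warning about.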
Also see:
Managing Hierarchical Data in MySQL
Recursive Hierarchies: The Relational Taboo!
Vlad's reference on nested sets looks pretty good. If you want something that covers trees and hierarchies in more detail then you can also check out Joe Celko's book.
The "ID, ParentID" adjacency list model is really an "old time" way of looking at hierarchies in a relational database model.
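To compare with the adjacency list, here is a minimal sketch of the nested-set model mentioned above: one depth-first walk stores lft/rgt numbers on every row, after which the sub-tree summary from the question is a single query with no recursion. The `tree` table, the lft and rgt column names and `assign_nested_sets` are assumptions, and the sketch only numbers an existing tree; inserting or moving nodes means renumbering, which is the usual trade-off of nested sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tree (id INTEGER PRIMARY KEY, parentid INTEGER, data1 INTEGER,"
    " lft INTEGER, rgt INTEGER)"
)
conn.executemany(
    "INSERT INTO tree (id, parentid, data1) VALUES (?, ?, ?)",
    [(1, None, 10), (2, 1, 5), (3, 1, 7), (4, 2, 1), (5, 2, 2)],
)


def assign_nested_sets(conn: sqlite3.Connection) -> None:
    """Number nodes depth-first so every descendant's lft lies in [lft, rgt]."""
    children, roots = {}, []
    for node_id, parent_id in conn.execute("SELECT id, parentid FROM tree ORDER BY id"):
        if parent_id is None:
            roots.append(node_id)
        else:
            children.setdefault(parent_id, []).append(node_id)

    counter, lft, rgt = 0, {}, {}
    for root in roots:
        stack = [(root, False)]
        while stack:
            node, finished = stack.pop()
            counter += 1
            if finished:
                rgt[node] = counter  # after all descendants were numbered
            else:
                lft[node] = counter  # before any descendant is numbered
                stack.append((node, True))
                stack.extend((c, False) for c in reversed(children.get(node, [])))
    conn.executemany(
        "UPDATE tree SET lft = ?, rgt = ? WHERE id = ?",
        [(lft[n], rgt[n], n) for n in lft],
    )


# All rows whose lft falls inside the chosen node's [lft, rgt] range.
SUBTREE_SUM = """
SELECT SUM(c.data1)
FROM tree AS p
JOIN tree AS c ON c.lft BETWEEN p.lft AND p.rgt
WHERE p.id = ?
"""

assign_nested_sets(conn)
print(conn.execute(SUBTREE_SUM, (1,)).fetchone()[0])  # 25
print(conn.execute(SUBTREE_SUM, (2,)).fetchone()[0])  # 8
```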