Create hierarchy table/dataframe in Spark or SQL


I have the following dataframe, which contains all the data I need.
The catch is that each child can also appear in the parent column, with children of its own:
1000584 is the top level
4003773 is level 1
1252665 is level 2
1321212 is level 3
What I want to achieve is this:

Related

How to calculate percentiles for every level in a game using PostgreSQL 9.2

I have a table of game logs. Like this:
Level Shuffle_Count
1 3
2 1
2 2
2 1
3 0
3 4
That means whenever a user plays a level, a row is added to the table. Each row records which level was played and a shuffle_count showing how many shuffles happened during that level.
I want to know how often shuffling occurs in each level, by calculating the median of shuffle_count per level. With the code below I can find the median of level 2 on its own. First, I create a temporary table (a CTE) that orders the shuffle_counts and divides them into 4 equal groups with ntile. Then I select the minimum shuffle_count among the rows whose value in the new quartile column is 3.
with ranked_test as (
    SELECT shuffle_count,
           ntile(4) OVER (ORDER BY shuffle_count) AS quartile
    FROM ch.public.game_log
    WHERE level = 2
)
SELECT min(shuffle_count)
FROM ranked_test
WHERE quartile = 3
GROUP BY quartile;
This is the table produced by the CTE, before selecting the minimum shuffle_count where quartile = 3 (which is approximately the median):
Shuffle_Count quartile
0 1
0 1
2 2
3 2
4 3
8 3
12 4
19 4
So far so good. But I have over 1000 levels and can't do this manually for each one. I need the median shuffle_count for every level from 1 to 1000. I know this can be done in one line in PostgreSQL 9.4, but unfortunately I don't have that option right now.
I couldn't make this work with a simple GROUP BY. I guess I need a more complex query, with a FOR loop or something.
Do you have any ideas? Thanks in advance.

I think that this should do it for your use case:
with ranked_test as (
    select
        level,
        shuffle_count,
        ntile(4) over(partition by level order by shuffle_count) quartile
    from ch.public.game_log
)
select level, quartile, min(shuffle_count)
from ranked_test
where quartile = 3
group by level, quartile;
This is basically an extended version of your working query:
in the CTE, we remove the filter on level and add level to the partition by of the window function instead, so the quartiles are computed separately for each level
in the outer query, we add level to the select and group by clauses
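To see what partition by level changes, the same logic can be sketched outside the database. The Python below is only an illustration, not PostgreSQL's implementation: the helper names ntile and third_quartile_min are made up here, and the rows are the six sample rows from the question.

```python
from collections import defaultdict


def ntile(row_count, buckets):
    """Return the 1-based bucket number for each of row_count ordered rows.

    Mirrors SQL ntile: when the rows do not divide evenly, the first
    (row_count % buckets) buckets each receive one extra row.
    """
    base, extra = divmod(row_count, buckets)
    tiles = []
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        tiles.extend([bucket] * size)
    return tiles


def third_quartile_min(rows):
    """rows: iterable of (level, shuffle_count) pairs.

    Returns {level: smallest shuffle_count that falls in quartile 3}.
    """
    by_level = defaultdict(list)
    for level, shuffle_count in rows:
        by_level[level].append(shuffle_count)      # partition by level

    result = {}
    for level, counts in by_level.items():
        counts.sort()                              # order by shuffle_count
        tiles = ntile(len(counts), 4)
        in_q3 = [c for c, t in zip(counts, tiles) if t == 3]   # where quartile = 3
        if in_q3:                                  # no quartile-3 row: level drops out
            result[level] = min(in_q3)
    return result


# Sample rows from the question: (level, shuffle_count)
game_log = [(1, 3), (2, 1), (2, 2), (2, 1), (3, 0), (3, 4)]
print(third_quartile_min(game_log))   # {2: 2}
```

Note that levels with fewer than three rows have no row in quartile 3, so they disappear from the result, just as the where quartile = 3 filter removes them in the SQL version.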

Getting parenthood of tree structure

I am trying to retrieve a tree structure, and I want specific levels of 'parenthood'. My table has a depth level, a pathIndex and a mapping. My first approach was to take substrings so that I could look up each value via the mapping, but I am getting multiple errors when converting strings. One requirement is that if I query an item that is not at the lowest level, it should return null for the levels it is missing.
For example, suppose I query for the row marked with asterisks in this table:
Id depth pathindex ItemNumber
4CF91F7F-832E-468D-B44A-E14DC66E710A 0 0 0.0
D34784A3-2134-4D09-828E-0EDA0C275C43 1 1 1
38158804-3EBC-4841-B1AF-1B86AD153010 2 1 1.1
8E25D494-322F-45F9-8A91-2A385F561C71 3 1 1.1.1
**64EB6C43-FF9C-0FF9-133F-01F4F21DA14F** 4 1 1.1.1.1
13AFA35C-80F8-405A-8980-33C3F7733EE2 2 2 1.2
3F1332E9-4D42-4BD8-9423-598430E94CB5 3 1 1.2.1
B3CC1306-A122-46F6-8F67-30FBABA3B590 4 1 1.2.1.1
C3F27C8E-F96B-4498-A85F-E4FC8EA90ED7 4 2 1.2.1.2
This is how it should look up the information. The static strings are the part I don't know how to generate so that I get nulls when asking for a level the item does not reach.
Select top 1 VehicleGroupId as Region
from GroupHierarchy where GroupHierarchy.numericalmapping = '1'
Select top 1 VehicleGroupId as gz
from GroupHierarchy where GroupHierarchy.numericalmapping = '1.1'
Select top 1 VehicleGroupId as cedis
from GroupHierarchy where GroupHierarchy.numericalmapping = '1.1.1'

Instead of putting dots between your hierarchy members, use forward slashes (1/2/3). Then you can use Microsoft SQL Server's hierarchyid data type and its functions to easily join on and retain the structure:
https://www.sqlshack.com/use-hierarchyid-sql-server/
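Independently of hierarchyid, the prefix lookups from the question can be sketched in a few lines. This is a minimal Python illustration, not SQL Server code: the function names ancestor_paths and ancestor_ids and the ITEM_TO_ID dictionary are hypothetical, and the dictionary simply copies the Id and ItemNumber columns from the question's table.

```python
def ancestor_paths(item_number, levels):
    """Split a dotted mapping such as '1.1.1.1' into its first `levels` prefixes.

    Levels deeper than the item itself come back as None.
    """
    parts = item_number.split(".")
    return [
        ".".join(parts[:n]) if n <= len(parts) else None
        for n in range(1, levels + 1)
    ]


# ItemNumber -> Id, copied from the table in the question
ITEM_TO_ID = {
    "0.0": "4CF91F7F-832E-468D-B44A-E14DC66E710A",
    "1": "D34784A3-2134-4D09-828E-0EDA0C275C43",
    "1.1": "38158804-3EBC-4841-B1AF-1B86AD153010",
    "1.1.1": "8E25D494-322F-45F9-8A91-2A385F561C71",
    "1.1.1.1": "64EB6C43-FF9C-0FF9-133F-01F4F21DA14F",
    "1.2": "13AFA35C-80F8-405A-8980-33C3F7733EE2",
    "1.2.1": "3F1332E9-4D42-4BD8-9423-598430E94CB5",
    "1.2.1.1": "B3CC1306-A122-46F6-8F67-30FBABA3B590",
    "1.2.1.2": "C3F27C8E-F96B-4498-A85F-E4FC8EA90ED7",
}


def ancestor_ids(item_number, levels=3):
    """Return the Id found at each of the first `levels` levels, or None."""
    return [
        ITEM_TO_ID.get(path) if path is not None else None
        for path in ancestor_paths(item_number, levels)
    ]


print(ancestor_paths("1.1.1.1", 3))   # ['1', '1.1', '1.1.1']
print(ancestor_ids("1.2"))            # third entry is None: '1.2' has no level 3
```

The three prefixes correspond to the static strings '1', '1.1' and '1.1.1' in the question's queries; an item that stops at level 2 yields None for the third level.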

Get list of dependent objects via SQL query or function

I have two tables. One is for tasks and the second is a dependency table for the tasks.
I want a query that gives me all the tasks (recursively) that depend on a particular id.
The first table holds the tasks:
ID TASK
1 Abc
2 Def
3 Ghi
4 Jkl
5 Mno
6 Pqr
The second one is for getting dependent tasks
ID DEPENDENT_ON
2 1
3 1
4 2
4 6
5 2
6 5
Is it possible to write a SQL query to get a list of all the tasks (recursively) which are dependent on a particular task?
Example.
I want to check all tasks dependent on ID=1.
Expected output (which is 2 and 3):
2.Def
3.Ghi
Furthermore, the query should also return the tasks that depend on these two tasks, and so on.
Final output should be:
2.Def -- level one
3.Ghi -- level one
4.Jkl -- Dependent on task 2
5.Mno -- Dependent on task 2
6.Pqr -- Dependent on task 5
Formatting is not important; only the output is required.

I need to join two tables and then do a recursive search.
You must OUTER JOIN the second table (which you didn't name, so I have called it TASK_TREE) through DEPENDENT_ON to the parent ID. Outer join because task 1 is the top of the tree and depends on no task. Then use Oracle's hierarchical query syntax to walk the tree:
select t.id, t.task, tt.dependent_on, level
from tasks t
left outer join task_tree tt on tt.id = t.id
connect by prior t.id = tt.dependent_on
start with t.id = 1
/
I have included the level so you can see how the tree unfurls. The Oracle SQL documentation covers hierarchical queries in depth. If you don't want to use Oracle's proprietary hierarchical syntax, Oracle has supported the recursive WITH clause since 11gR2.
Incidentally, your posted data contains an error: task 4 depends on both 2 and 6. In a hierarchy, each child node must depend on a single parent node; otherwise you'll get all sorts of weird results.
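The recursive WITH alternative can be sketched as follows. This is a hedged illustration rather than Oracle code: it uses Python's built-in sqlite3 module as a stand-in database, keeps the TASK_TREE name assumed above, and the function name dependents is invented for the example. It uses UNION rather than UNION ALL so that task 4, which is reachable through both 2 and 6, is listed only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (id INTEGER PRIMARY KEY, task TEXT);
CREATE TABLE task_tree (id INTEGER, dependent_on INTEGER);
INSERT INTO tasks VALUES
    (1, 'Abc'), (2, 'Def'), (3, 'Ghi'), (4, 'Jkl'), (5, 'Mno'), (6, 'Pqr');
INSERT INTO task_tree VALUES
    (2, 1), (3, 1), (4, 2), (4, 6), (5, 2), (6, 5);
""")

DEPENDENTS_SQL = """
WITH RECURSIVE deps (id) AS (
    -- anchor: tasks that depend directly on the starting task
    SELECT id FROM task_tree WHERE dependent_on = ?
    UNION
    -- recursive step: tasks that depend on something already found
    SELECT tt.id
    FROM task_tree tt
    JOIN deps d ON tt.dependent_on = d.id
)
SELECT t.id, t.task
FROM deps
JOIN tasks t ON t.id = deps.id
ORDER BY t.id
"""


def dependents(task_id):
    """Return (id, task) for every task that depends, directly or not, on task_id."""
    return conn.execute(DEPENDENTS_SQL, (task_id,)).fetchall()


for task_id, name in dependents(1):
    print(f"{task_id}.{name}")
```

For ID=1 this lists tasks 2 through 6, matching the final output requested in the question. The starting task itself is not included, because the anchor selects only rows that depend on it.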

How to fill in the nulls for the missing nodes. SQL

I have data with a hierarchy of levels. If one leg of the hierarchy is shorter than the others, the last value must be carried down to the last level in my output.
Example: I have at most 3 levels of data
      AAB
     /   \
   AA     B
  /
 A
I want output in which 'AA' and 'B' are on the same level,
but my output puts 'A' and 'B' on the same level.
Something like this
A-AA-AAB
B-AAB-NULL
But I want
A-AA-AAB
NULL-B-AAB
How to fill in the nulls for the missing nodes?
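One way to read the desired output is that shorter paths should be padded with nulls at the front, so that every path ends in the same column. The sketch below shows only that padding step in Python, assuming the paths have already been extracted as leaf-to-root lists; the function name right_align is hypothetical, and how the paths are produced in SQL is left open.

```python
def right_align(paths):
    """Left-pad each path with None so that all paths have the same length."""
    width = max(len(path) for path in paths)
    return [[None] * (width - len(path)) + list(path) for path in paths]


# Leaf-to-root paths from the example tree in the question
paths = [["A", "AA", "AAB"], ["B", "AAB"]]

for row in right_align(paths):
    print("-".join("NULL" if value is None else value for value in row))
```

With the example tree this prints A-AA-AAB and NULL-B-AAB, which puts 'AA' and 'B' in the same column.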

How to implement high performance tree view in SQL Server 2005

What is the best way to build the table that will represent the tree?
I want to implement select, insert, update and delete operations that work well with large amounts of data.
The select, for example, will have to support "Expand All": getting all the children (and their children) of a given node.

Use CTEs.
Given the tree-like table structure:
id parent name
1 0 Electronics
2 1 TV
3 1 Hi-Fi
4 2 LCD
5 2 Plasma
6 3 Amplifiers
7 3 Speakers
, this query will return id, parent and depth level, ordered as a tree:
WITH v (id, parent, level) AS
(
    SELECT id, parent, 1
    FROM table
    WHERE parent = 0
    UNION ALL
    SELECT t.id, t.parent, v.level + 1
    FROM v
    JOIN table t
    ON t.parent = v.id
)
SELECT *
FROM v
id parent name
1 0 Electronics
2 1 TV
4 2 LCD
5 2 Plasma
3 1 Hi-Fi
6 3 Amplifiers
7 3 Speakers
Replace parent = 0 with parent = @parent to get only a branch of the tree.
Provided there's an index on table (parent), this query will work efficiently on a very large table, since it will recursively use an index lookup to find all children for each parent.
To update a certain branch, issue:
WITH v (id, parent, level) AS
(
    SELECT id, parent, 1
    FROM table
    WHERE parent = @parent
    UNION ALL
    SELECT t.id, t.parent, v.level + 1
    FROM v
    JOIN table t
    ON t.parent = v.id
)
UPDATE table
SET column = newvalue
WHERE id IN
(
    SELECT id
    FROM v
)
where @parent is the root of the branch.
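The recursive CTE above can be tried out with the sample rows. The sketch below is an assumption-heavy stand-in, not SQL Server code: it runs on Python's built-in sqlite3 module, names the table tree (table is a reserved word), and adds a zero-padded path column, which is not part of the answer, so that the tree ordering is produced by an explicit ORDER BY instead of relying on the order the engine happens to return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tree (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT);
CREATE INDEX tree_parent ON tree (parent);
INSERT INTO tree VALUES
    (1, 0, 'Electronics'), (2, 1, 'TV'), (3, 1, 'Hi-Fi'), (4, 2, 'LCD'),
    (5, 2, 'Plasma'), (6, 3, 'Amplifiers'), (7, 3, 'Speakers');
""")

BRANCH_SQL = """
WITH RECURSIVE v (id, parent, name, level, path) AS (
    -- anchor: direct children of the requested parent
    SELECT id, parent, name, 1, printf('%04d', id)
    FROM tree
    WHERE parent = ?
    UNION ALL
    -- recursive step: children of rows already found, one level deeper
    SELECT t.id, t.parent, t.name, v.level + 1,
           v.path || '/' || printf('%04d', t.id)
    FROM v
    JOIN tree t ON t.parent = v.id
)
SELECT id, parent, name, level
FROM v
ORDER BY path
"""


def branch(parent_id):
    """Return (id, parent, name, level) for everything below parent_id, in tree order."""
    return conn.execute(BRANCH_SQL, (parent_id,)).fetchall()


for node_id, parent, name, level in branch(0):
    print("  " * (level - 1) + f"{node_id} {name}")
```

Calling branch(0) walks the whole tree in the order 1, 2, 4, 5, 3, 6, 7, the same order as the listing above; branch(1) returns only the nodes below Electronics.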

You have to ask yourself these questions first:
1) What is the ratio of modifications to reads? (Is the tree mostly static, or changing constantly?)
2) How deep and how large do you expect the tree to grow?
Nested sets are great for mostly static trees where you need operations on whole branches. They handle deep trees without problems.
Materialized path works well for dynamic (changing) trees with constrained or predictable depth.
Recursive CTEs are ideal for very small trees, but branch operations ("get all children in this branch...") get very costly with deep or large trees.

Check out Joe Celko's book on trees and hierarchies for multiple ways to tackle the hierarchy problem. The model that you choose will depend on how you weight lookups vs. updates vs. complexity. You can make the lookups pretty fast (especially for getting all children in a node) using the adjacency list model, but updates to the tree are slower.

If you have many updates and selects, the best option seems to be the Path Enumeration Model, which is briefly described here:
http://www.sqlteam.com/article/more-trees-hierarchies-in-sql

I'm surprised no one has mentioned going with a Closure Table. Very efficient for reads and pretty simple to write.
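The closure table idea can be sketched as follows. This is an illustration under stated assumptions, not anything taken from the answers: the table names node and closure, the depth column, and the helper functions add_node and descendants are invented here, the rows reuse the Electronics example from the CTE answer, and Python's sqlite3 module stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT);
-- one row per (ancestor, descendant) pair, including each node paired with itself
CREATE TABLE closure (
    ancestor   INTEGER,
    descendant INTEGER,
    depth      INTEGER,
    PRIMARY KEY (ancestor, descendant)
);
""")


def add_node(node_id, parent_id, name):
    """Insert a node and record its link to itself and to every ancestor."""
    conn.execute("INSERT INTO node VALUES (?, ?, ?)", (node_id, parent_id, name))
    conn.execute("INSERT INTO closure VALUES (?, ?, 0)", (node_id, node_id))
    # Every ancestor of the parent (and the parent itself) is an ancestor of the new node.
    conn.execute(
        "INSERT INTO closure "
        "SELECT ancestor, ?, depth + 1 FROM closure WHERE descendant = ?",
        (node_id, parent_id),
    )


def descendants(node_id):
    """Names of all nodes below node_id, nearest levels first. No recursion needed."""
    rows = conn.execute(
        "SELECT n.name FROM closure c "
        "JOIN node n ON n.id = c.descendant "
        "WHERE c.ancestor = ? AND c.depth > 0 "
        "ORDER BY c.depth, n.id",
        (node_id,),
    )
    return [name for (name,) in rows]


for row in [(1, 0, "Electronics"), (2, 1, "TV"), (3, 1, "Hi-Fi"), (4, 2, "LCD"),
            (5, 2, "Plasma"), (6, 3, "Amplifiers"), (7, 3, "Speakers")]:
    add_node(*row)

print(descendants(1))
print(descendants(3))
```

Reading a whole branch is a single indexed join with no recursion, which is where the read efficiency comes from. The cost is the extra rows: each node stores one closure row per ancestor, and moving a subtree means rewriting those rows.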
