Querying number of children nodes for each node in Hive


I am trying to come up with the best HiveQL query to get a list of rows in which one column holds the number of direct children each node has. The table is hierarchical, so it looks like this:
| ID | Some other column | ParentID |
+-----------------------------------+
| 1 | XXXXXXXXXX x X X | NULL |
| 2 | XXXXXXXXXX x X X | 1 |
| 3 | XXXXXXXXXX x X X | 2 |
| 4 | XXXXXXXXXX x X X | 1 |
And I am attempting to query it to output something like this:
| ID | Some other column | child count |
+--------------------------------------+
| 1 | XXXXXXXXXX x X X | 2 |
| 2 | XXXXXXXXXX x X X | 1 |
| 3 | XXXXXXXXXX x X X | 0 |
| 4 | XXXXXXXXXX x X X | 0 |

Try a LEFT JOIN against a subquery that counts the rows per parent. COALESCE turns the missing count for leaf nodes into 0.
SELECT a.id,
       COALESCE(b.child_count, 0) AS child_count
FROM mytable a
LEFT JOIN (SELECT parentid,
                  COUNT(*) AS child_count
           FROM mytable
           GROUP BY parentid) b
  ON a.id = b.parentid;
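
To check the join logic on the sample rows from the question, here is a minimal sketch that runs the same query shape through Python's built-in sqlite3 module. SQLite stands in for Hive here purely for convenience, and the "Some other column" field is left out; both are assumptions of this sketch, not part of the original answer.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the Hive table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, parentid INTEGER);
    INSERT INTO mytable VALUES (1, NULL), (2, 1), (3, 2), (4, 1);
""")

# Count rows per parent, then LEFT JOIN so nodes without children survive.
child_counts = conn.execute("""
    SELECT a.id, COALESCE(b.child_count, 0) AS child_count
    FROM mytable a
    LEFT JOIN (SELECT parentid, COUNT(*) AS child_count
               FROM mytable
               GROUP BY parentid) b
      ON a.id = b.parentid
    ORDER BY a.id
""").fetchall()

print(child_counts)  # [(1, 2), (2, 1), (3, 0), (4, 0)]
```

The LEFT JOIN matters: an inner join would drop nodes 3 and 4, because no row names them as a parent.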

Related

How to select distinct records based on a given condition?

I have the following table in the MySQL database:
| id | col | val |
| -- | --- | --- |
| 1 | 1 | y |
| 2 | 1 | y |
| 3 | 1 | y |
| 4 | 1 | n |
| 5 | 2 | n |
| 6 | 3 | n |
| 7 | 3 | n |
| 8 | 4 | y |
| 9 | 5 | y |
| 10 | 5 | y |
Now I want to select the distinct col values for which every row with that col has val equal to y. I tried both of the following queries:
SELECT DISTINCT `col` FROM `tbl` WHERE `val` = 'y'
SELECT `col` FROM `tbl` GROUP BY `col` HAVING (`val` = 'y')
But neither works as I expect. I want the result to look like this:
| col |
| --- |
| 4 |
| 5 |
But 1 is also being included in the results with my queries. Can anybody help me build the correct query? As far as I understand, I may need to create a derived table, but I can't quite figure out the right path.

You are close with the second query. Compare the minimum and maximum values instead: if they are equal, every row in the group has the same val, and that value must be 'y':
SELECT `col`
FROM `tbl`
GROUP BY `col`
HAVING MIN(val) = MAX(val) AND MIN(`val`) = 'y';

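Alternatively, check that 'y' is the minimum value. Since 'n' sorts before 'y', a group whose minimum is 'y' cannot contain an 'n' (this assumes val only ever holds 'y' or 'n'):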
HAVING MIN(val) = 'y'
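
As a quick check of the HAVING approach against the sample data, the following sketch loads the question's rows into an in-memory SQLite database (an assumption made for convenience; the question uses MySQL) and compares the original DISTINCT attempt with the MIN/MAX version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, col INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, 1, "y"), (2, 1, "y"), (3, 1, "y"), (4, 1, "n"), (5, 2, "n"),
     (6, 3, "n"), (7, 3, "n"), (8, 4, "y"), (9, 5, "y"), (10, 5, "y")],
)

# The first attempt from the question: keeps col 1 because some rows are 'y'.
any_y = [row[0] for row in conn.execute(
    "SELECT DISTINCT col FROM tbl WHERE val = 'y' ORDER BY col")]

# The aggregate version: keeps a col only if every row in its group is 'y'.
all_y = [row[0] for row in conn.execute("""
    SELECT col
    FROM tbl
    GROUP BY col
    HAVING MIN(val) = MAX(val) AND MIN(val) = 'y'
    ORDER BY col
""")]

print(any_y)  # [1, 4, 5]
print(all_y)  # [4, 5]
```

The WHERE clause filters individual rows before grouping, which is why it cannot express "all rows in the group"; the HAVING clause tests the group as a whole.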

Oracle SQL unpivot and keep rows with null values [duplicate]

This question already has an answer here:
oracle - querying NULL values in unpivot query
(1 answer)
Closed 2 years ago.
I'm currently doing an unpivot on an Oracle data source (v.12.2) like this:
SELECT *
FROM some_table
UNPIVOT (
(X,Y,Val)
FOR SITE
IN (
(SITE1_X, SITE1_Y, SITE1_VAL) AS '1',
(SITE2_X, SITE2_Y, SITE2_VAL) AS '2',
(SITE3_X, SITE3_Y, SITE3_VAL) AS '3'
))
This works fine so far, with one exception: I have another column, say extend_info. If this column has the value y, there is only one row for that record and all of its site columns are null. Nevertheless, I would like to keep this row rather than drop it.
I'm not really sure how to do this or what would be a nice way to do this. Any recommendations?
Example:
Original Table:
ID | SITE1_X | SITE1_Y |SITE1_VAL | SITE2_X | SITE2_Y | SITE2_VAL | ... | extend_info
-------
1 | 0 | 0 | 5 | 1 | 1 | 10 | ... | n
2 | 0 | 0 | 3 | null | null | null | ... | n
3 | null | null | null | null | null | null | ... | y
current output:
ID | SITE | X | Y | VAL | extend_info
-------
1 | 1 | 0 | 0 | 5 | n
2 | 1 | 1 | 1 | 10 | n
3 | 2 | 0 | 0 | 3 | n
desired output:
ID | SITE | X | Y | VAL | extend_info
-------
1 | 1 | 0 | 0 | 5 | n
2 | 1 | 1 | 1 | 10 | n
3 | 2 | 0 | 0 | 3 | n
4 | | | | | y
I don't really care what is in SITE|X|Y|VAL in that case, can be 0 for everything or null.
Bonus question:
If extend_info is y I would like to join another table with this ID. The other table looks like this:
ID | F_ID | X | Y | VAL
-----
1 | 4 | 1 | 1 | 8
2 | 4 | 2 | 2 | 9
and in that case my final output table should look like:
ID | SITE | X | Y | VAL | X_OTHER_TABLE | Y_OTHER_TABLE
-------
1 | 1 | 0 | 0 | 5 |
2 | 1 | 1 | 1 | 10 |
3 | 2 | 0 | 0 | 3 |
4 | | | | 8 | 1 | 1
5 | | | | 9 | 2 | 2
I know... the database structure is super ugly but that is what a vendor provides us and we are trying to create a View to make it easier to perform some data analysis tasks on it.
It doesn't have to match my final example 1:1, but I hope my intention is clear: I want one single table/view with all the information in a single format.
Thanks for any help!

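I would recommend a lateral join. UNPIVOT by default drops rows whose unpivoted values are all NULL, whereas the lateral join below returns a row for every site of every record: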
SELECT s.id, u.*
FROM some_table s CROSS JOIN LATERAL
(SELECT s.SITE1_X as SITE_X, s.SITE1_Y as SITE_Y, s.SITE1_VAL as SITE_VAL FROM DUAL UNION ALL
SELECT s.SITE2_X, s.SITE2_Y, s.SITE2_VAL FROM DUAL UNION ALL
SELECT s.SITE3_X, s.SITE3_Y, s.SITE3_VAL FROM DUAL
) u;
You can just join additional tables to this as you like.
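
The row-preserving behaviour can be tried outside Oracle with a plain UNION ALL, which is a different and more portable technique than the lateral join above (SQLite has no LATERAL). The sketch below is only an illustration under assumptions: it uses SQLite through Python's sqlite3 module, covers just the two sites that are fully shown in the question's example, and adds a literal site number column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE some_table (
        id INTEGER,
        site1_x INTEGER, site1_y INTEGER, site1_val INTEGER,
        site2_x INTEGER, site2_y INTEGER, site2_val INTEGER,
        extend_info TEXT
    );
    INSERT INTO some_table VALUES
        (1, 0, 0, 5, 1, 1, 10, 'n'),
        (2, 0, 0, 3, NULL, NULL, NULL, 'n'),
        (3, NULL, NULL, NULL, NULL, NULL, NULL, 'y');
""")

# One SELECT per site; UNION ALL stacks them without filtering NULL rows.
rows = conn.execute("""
    SELECT id, 1 AS site, site1_x AS x, site1_y AS y, site1_val AS val, extend_info
    FROM some_table
    UNION ALL
    SELECT id, 2, site2_x, site2_y, site2_val, extend_info
    FROM some_table
    ORDER BY id, site
""").fetchall()

for row in rows:
    print(row)
```

Every source row yields one output row per site, so the extend_info = 'y' record is kept with NULL site values. If the all-NULL rows of ordinary records are unwanted, they could be filtered with a WHERE clause that still lets extend_info = 'y' through.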

Cte within Cte in SQL

I have run into a situation where I need to apply a WHERE condition with a GROUP BY subquery to the result of a CTE, inside another CTE.
Table 1 as follows
+---+---+---+---+
| x | y | z | w |
+---+---+---+---+
| 1 | 2 | 3 | 1 |
| 2 | 3 | 4 | 2 |
| 3 | 2 | 5 | 3 |
| 1 | 2 | 6 | 2 |
+---+---+---+---+
Table 2 as follows
+---+---+-----+---+
| a | b | c | d |
+---+---+-----+---+
| 1 | m | 100 | 1 |
| 2 | n | 23 | 2 |
| 4 | o | 34 | 4 |
| 1 | m | 23 | 2 |
+---+---+-----+---+
Assuming I have the data of following sql query in a table called TAB
with cte as (
select x,y,z,w from table1),
cte1 as (select a,b,c,d from table2)
select cte.x,cte.y,cte.z,cte.w,cte1.b,cte1.c from cte left join cte1 on cte.x=cte1.a and cte.w=cte1.d
Result of above CTE would be as follows
+---+---+---+---+---+-----+
| x | y | z | w | b | c |
+---+---+---+---+---+-----+
| 1 | 2 | 3 | 1 | m | 100 |
| 2 | 3 | 4 | 2 | n | 23 |
| 1 | 2 | 6 | 2 | m | 23 |
+---+---+---+---+---+-----+
I would like to query the following from the table TAB
select * from TAB where (X||b) in (select (X||b) from TAB group by (X||Y) having sum(c)=123)
I'm trying to formulate the SQL query as follows, but it does not work as I expected:
select * from (
with cte as (
select x,y,z from table1),
cte1 as (select a,b,c from table2)
select cte.x,cte1.y,cte1.z,cte2.b,cte2.c from cte left join cte1 on cte.x=cte.a) as TAB
where ((X||b) in (select (X||b) from TAB group by (X||Y) having sum(c)=123))
The final result must be as follows
+---+---+---+---+---+-----+
| x | y | z | w | b | c |
+---+---+---+---+---+-----+
| 1 | 2 | 3 | 1 | m | 100 |
| 1 | 2 | 6 | 2 | m | 23 |
+---+---+---+---+---+-----+

I don't think DB2 allows CTEs in subqueries or to be nested. Why not just write this using another CTE?
with cte as (
      select x, y, z, w
      from table1
     ),
     cte1 as (
      select a, b, c, d
      from table2
     ),
     tab as (
      select cte.x, cte.y, cte.z, cte.w, cte1.b, cte1.c
      from cte left join
           cte1
           on cte.x = cte1.a and cte.w = cte1.d
     )
select *
from tab
where (x || b) in (select (x || b) from tab group by x, b having sum(c) = 123);
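
To see the chained-CTE pattern run end to end, here is a small sketch using SQLite through Python's sqlite3 module rather than DB2 (an assumption made so the example is self-contained). It fixes the aliases so each column comes from the CTE that defines it and groups the subquery by x and b; that grouping is one reading of the question's intent, chosen because it reproduces the expected result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (x INTEGER, y INTEGER, z INTEGER, w INTEGER);
    INSERT INTO table1 VALUES (1, 2, 3, 1), (2, 3, 4, 2), (3, 2, 5, 3), (1, 2, 6, 2);
    CREATE TABLE table2 (a INTEGER, b TEXT, c INTEGER, d INTEGER);
    INSERT INTO table2 VALUES (1, 'm', 100, 1), (2, 'n', 23, 2),
                              (4, 'o', 34, 4), (1, 'm', 23, 2);
""")

# Three CTEs in one WITH clause; "tab" builds on the first two and is then
# referenced twice in the final SELECT (outer query and IN subquery).
result = conn.execute("""
    WITH cte AS (SELECT x, y, z, w FROM table1),
         cte1 AS (SELECT a, b, c, d FROM table2),
         tab AS (
             SELECT cte.x, cte.y, cte.z, cte.w, cte1.b, cte1.c
             FROM cte
             LEFT JOIN cte1 ON cte.x = cte1.a AND cte.w = cte1.d
         )
    SELECT *
    FROM tab
    WHERE (x || b) IN (SELECT x || b FROM tab GROUP BY x, b HAVING SUM(c) = 123)
    ORDER BY z
""").fetchall()

print(result)  # [(1, 2, 3, 1, 'm', 100), (1, 2, 6, 2, 'm', 23)]
```

Listing the CTEs side by side, separated by commas, avoids nesting a WITH clause inside a derived table, which is the part the original attempt relied on.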

Transform data in rows to columns

I'm trying to transform data of the following form:
| ID | X | Y |
--------------
| 1 | a | m |
| 1 | b | n |
| 1 | c | o |
| 2 | d | p |
| 2 | e | q |
| 3 | f | r |
| 3 | g | s |
| 3 | h | |
To this form:
| ID | X1 | X2 | X3 | Y1 | Y2 | Y3 |
------------------------------------
| 1 | a | b | c | m | n | o |
| 2 | d | e | | p | q | |
| 3 | f | g | h | r | s | |
What is the best way to accomplish this in SQL Server 2017? Is there a better way to do transformations like this using another tool?

I don't think you can solve this on the database side alone; I would do some backend programming. PIVOT would help if you only needed to turn row values into columns, but here the values have to be grouped by repeated IDs.
I would start by finding the largest number of rows that share an ID, using the query below. For example, ID 1 appears 3 times, so the backend code needs a data table with 3x2+1=7 columns (the 1 is for the ID column). After that, fill the table by going through the rows for each ID.
WITH Temp (id, cnt)
AS
(
    SELECT id, COUNT(*)
    FROM MyTable
    GROUP BY id
    HAVING COUNT(*) > 1
)
SELECT MAX(cnt) FROM Temp;
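
The backend step described above could look roughly like the following Python sketch. The function name `widen` and the in-memory row list are hypothetical; in practice the rows would come from a query against the table, and missing values are represented here as None.

```python
from collections import defaultdict

# Sample rows from the question as (ID, X, Y); None marks the empty cell.
rows = [
    (1, "a", "m"), (1, "b", "n"), (1, "c", "o"),
    (2, "d", "p"), (2, "e", "q"),
    (3, "f", "r"), (3, "g", "s"), (3, "h", None),
]

def widen(rows):
    """Group rows by ID and spread X and Y values into numbered columns."""
    grouped = defaultdict(list)
    for row_id, x, y in rows:
        grouped[row_id].append((x, y))

    # Widest group decides the column count: width * 2 value columns + ID.
    width = max((len(pairs) for pairs in grouped.values()), default=0)
    header = (["ID"]
              + [f"X{i}" for i in range(1, width + 1)]
              + [f"Y{i}" for i in range(1, width + 1)])

    table = []
    for row_id, pairs in grouped.items():
        padding = [None] * (width - len(pairs))
        xs = [x for x, _ in pairs] + padding
        ys = [y for _, y in pairs] + padding
        table.append([row_id] + xs + ys)
    return header, table

header, table = widen(rows)
print(header)
for line in table:
    print(line)
```

With the sample data the header has 7 entries, matching the 3x2+1 calculation, and shorter groups such as ID 2 are padded with None.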

MS-Access: Merge two tables "below" each other

I have two tables in my Access-database. They look something like this:
Table1
+--------------+----------+----------+----------+
| Kabelnummer | Column1 | Column2 | Column3 |
+--------------+----------+----------+----------+
| 1 | x | x | x |
+--------------+----------+----------+----------+
| 2 | x | x | x |
+--------------+----------+----------+----------+
| 3 | x | x | x |
+--------------+----------+----------+----------+
| 4 | x | x | x |
+--------------+----------+----------+----------+
table2
+--------------+----------+----------+----------+
| Kabelnummer | Column1 | Column2 | Column3 |
+--------------+----------+----------+----------+
| 1 | x | x | x |
+--------------+----------+----------+----------+
| 2 | x | x | x |
+--------------+----------+----------+----------+
| 3 | x | x | x |
+--------------+----------+----------+----------+
| 4 | x | x | x |
+--------------+----------+----------+----------+
I need a query that gives me 1 table with the data from table1 added to the data from table2:
TableTotal
+--------------+----------+----------+----------+
| Kabelnummer | Column1 | Column2 | Column3 |
+--------------+----------+----------+----------+
| 1 | x | x | x |
+--------------+----------+----------+----------+
| 2 | x | x | x |
+--------------+----------+----------+----------+
| 3 | x | x | x |
+--------------+----------+----------+----------+
| 4 | x | x | x |
+--------------+----------+----------+----------+
| 1 | x | x | x |
+--------------+----------+----------+----------+
| 2 | x | x | x |
+--------------+----------+----------+----------+
| 3 | x | x | x |
+--------------+----------+----------+----------+
| 4 | x | x | x |
+--------------+----------+----------+----------+
The names "Column1", "Column2" and "Column3" are the same in both tables

SELECT *
FROM Table1
UNION
SELECT *
FROM table2;

The question asks for non-distinct values, while the other answers return distinct values only. UNION ALL keeps the duplicates:
SELECT *
FROM Table1
UNION ALL
SELECT *
FROM table2;
UNION ALL is also often more efficient than UNION, particularly with large data sets, because it does not have to compute the distinct rows.
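
The difference between the two operators is easy to see on the sample tables, where every row of table2 also exists in Table1. The sketch below uses SQLite through Python's sqlite3 module instead of Access (an assumption made so it can run anywhere); the UNION and UNION ALL semantics shown are standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Kabelnummer INTEGER, Column1 TEXT, Column2 TEXT, Column3 TEXT);
    CREATE TABLE table2 (Kabelnummer INTEGER, Column1 TEXT, Column2 TEXT, Column3 TEXT);
    INSERT INTO Table1 VALUES (1, 'x', 'x', 'x'), (2, 'x', 'x', 'x'),
                              (3, 'x', 'x', 'x'), (4, 'x', 'x', 'x');
    INSERT INTO table2 VALUES (1, 'x', 'x', 'x'), (2, 'x', 'x', 'x'),
                              (3, 'x', 'x', 'x'), (4, 'x', 'x', 'x');
""")

# UNION removes duplicate rows across both inputs.
union_rows = conn.execute(
    "SELECT * FROM Table1 UNION SELECT * FROM table2").fetchall()

# UNION ALL simply stacks the second result below the first.
union_all_rows = conn.execute(
    "SELECT * FROM Table1 UNION ALL SELECT * FROM table2").fetchall()

print(len(union_rows))      # 4
print(len(union_all_rows))  # 8
```

Only the UNION ALL result has the eight rows shown in the TableTotal example.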

If your goal is to append the rows of the second table to the first one, you can do it this way:
INSERT INTO TABLE1 SELECT * FROM TABLE2;
The caveat with the other queries is that, while they do the job, they return the combined data as a separate result and leave Table1 itself unchanged.
