Hive SQL: Select all rows before event

In Hive, I have the following data:
sess person type number
a mary I 1
a mary I 2
a mary V 3
a mary V 4
b mary I 1
b mary V 2
b mary C 3
a john I 1
a john I 2
a john V 3
a john V 4
b john I 1
b john V 2
b john C 3
How do I select all rows for each person and session, up to and including the first row with type = V? The output should look like this:
sess person type number
a mary I 1
a mary I 2
a mary V 3
b mary I 1
b mary V 2
a john I 1
a john I 2
a john V 3
b john I 1
b john V 2

You can use window functions:
select t.*
from (select t.*,
             min(case when type = 'V' then number end)
                 over (partition by sess, person order by number) as min_aid
      from t
     ) t
where min_aid is null or number <= min_aid;
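To check the query on the sample rows, the sketch below loads them into an in-memory SQLite database through Python's standard sqlite3 module and runs the same window-function logic. SQLite is only a stand-in for Hive here (an assumption; window functions need SQLite 3.25 or newer), the table name t comes from the answer, rows_up_to_first_v is a hypothetical helper, and the final order by exists only to reproduce the expected output's order.

```python
import sqlite3

# Sample rows from the question: (sess, person, type, number)
ROWS = [
    ("a", "mary", "I", 1), ("a", "mary", "I", 2), ("a", "mary", "V", 3), ("a", "mary", "V", 4),
    ("b", "mary", "I", 1), ("b", "mary", "V", 2), ("b", "mary", "C", 3),
    ("a", "john", "I", 1), ("a", "john", "I", 2), ("a", "john", "V", 3), ("a", "john", "V", 4),
    ("b", "john", "I", 1), ("b", "john", "V", 2), ("b", "john", "C", 3),
]

# Same logic as the answer; the outer ORDER BY only reproduces the expected output order
QUERY = """
select sess, person, type, number
from (select t.*,
             min(case when type = 'V' then number end)
                 over (partition by sess, person order by number) as min_aid
      from t) x
where min_aid is null or number <= min_aid
order by person desc, sess, number
"""


def rows_up_to_first_v(rows):
    """Load rows into an in-memory table t and return the filtered result."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (sess text, person text, type text, number integer)")
    con.executemany("insert into t values (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in rows_up_to_first_v(ROWS):
    print(*row)
```

Because the window has an order by, min_aid is a running minimum: it stays null until the first V in each sess/person partition, which is why the rows before that V pass the `min_aid is null` test, and rows after it fail `number <= min_aid`.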

Related

Count children from hierarchy path

I have a table like this:
id name path
1 John /1
2 Mark /2
3 Kevin /1/3
4 Sarah /1/3/4
5 Andy /2/5
... ... ...
So I can say that Sarah is Kevin's child, and Kevin is John's child.
I would like to have this:
id name path number_of_children
1 John /1 2
2 Mark /2 1
3 Kevin /1/3 1
4 Sarah /1/3/4 0
5 Andy /2/5 0
... ... ... ...
TASK NUMBER 2:
Let's say that I also have this table:
id income user_id
1 200 1
2 120 1
3 340 2
4 500 3
5 600 5
6 80 5
I can say that John has a total income of $320, but if I also count John's children, it is $820 (because id = 3, Kevin, is John's child and has an income of $500). So I would also like a query that sums all the hierarchical incomes.
You can do:
select
    t.*,
    (select count(*) from t c where c.path like t.path || '/%') as c_count,
    i.income + (
        select coalesce(sum(i.income), 0)
        from t c join i on i.user_id = c.id
        where c.path like t.path || '/%'
    ) as c_income
from t
left join (
    select user_id, sum(income) as income from i group by user_id
) i on i.user_id = t.id
Result:
id name path c_count c_income
--- ------ ------- -------- --------
1 John /1 2 820
2 Mark /2 1 1020
3 Kevin /1/3 1 500
4 Sarah /1/3/4 0 null
5 Andy /2/5 0 680
See example at DB Fiddle.
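The sketch below runs the same approach on the sample data in an in-memory SQLite database via Python's sqlite3 module (SQLite is a stand-in; the fiddle's engine is not stated). Two changes are assumptions of this sketch, not part of the answer: the aggregated income subquery is aliased own instead of i, so it no longer shares a name with the income table i, and coalesce(own.income, 0) makes Sarah's total 0 instead of null. hierarchy_totals is a hypothetical helper.

```python
import sqlite3

PEOPLE = [(1, "John", "/1"), (2, "Mark", "/2"), (3, "Kevin", "/1/3"),
          (4, "Sarah", "/1/3/4"), (5, "Andy", "/2/5")]
INCOMES = [(1, 200, 1), (2, 120, 1), (3, 340, 2), (4, 500, 3), (5, 600, 5), (6, 80, 5)]

QUERY = """
select t.id, t.name,
       -- descendants are rows whose path starts with this row's path plus '/'
       (select count(*) from t c where c.path like t.path || '/%') as c_count,
       coalesce(own.income, 0) + (
           select coalesce(sum(i.income), 0)
           from t c join i on i.user_id = c.id
           where c.path like t.path || '/%'
       ) as c_income
from t
left join (select user_id, sum(income) as income from i group by user_id) own
       on own.user_id = t.id
order by t.id
"""


def hierarchy_totals():
    """Return (id, name, descendant count, own + descendant income) per person."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (id integer, name text, path text)")
    con.execute("create table i (id integer, income integer, user_id integer)")
    con.executemany("insert into t values (?, ?, ?)", PEOPLE)
    con.executemany("insert into i values (?, ?, ?)", INCOMES)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in hierarchy_totals():
    print(*row)
```

The pattern t.path || '/%' requires a slash right after the parent's path, so /1 matches /1/3 and /1/3/4 but would not match a hypothetical /10.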

How to do a count over 2 columns

I have the following table:
cus_id gov_id name
1 aa Bob
1 bb Bill
1 aa James
2 cc Sam
3 aa Sarah
1 aa Joe
2 cc Nathan
As you can see, when cus_id=1 and gov_id=aa there are 3 matching rows, so the count is 3. For each row, I want to count how many rows have the same cus_id and gov_id as that row.
When cus_id=2 and gov_id=cc there are 2 such rows. I want output like this:
cus_id gov_id name count
1 aa Bob 3
1 bb Bill 1
1 aa James 3
2 cc Sam 2
3 aa Sarah 1
1 aa Joe 3
2 cc Nathan 2
I tried:
SELECT cus_id, gov_id, name, count(*) as count
FROM test_table;
You can use an analytic (window) function:
select t.*,
count(*) over (partition by cus_id, gov_id) as cnt
from t;
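A quick way to confirm the window-function answer is to run it on the question's sample rows. The sketch assumes Python's built-in sqlite3 module as a stand-in for the asker's database (window functions need SQLite 3.25+), uses the table name t from the answer, and counts_per_pair is a hypothetical helper that maps each name to its count.

```python
import sqlite3

ROWS = [(1, "aa", "Bob"), (1, "bb", "Bill"), (1, "aa", "James"), (2, "cc", "Sam"),
        (3, "aa", "Sarah"), (1, "aa", "Joe"), (2, "cc", "Nathan")]


def counts_per_pair():
    """Attach to every row the number of rows sharing its (cus_id, gov_id)."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (cus_id integer, gov_id text, name text)")
    con.executemany("insert into t values (?, ?, ?)", ROWS)
    rows = con.execute(
        "select cus_id, gov_id, name, "
        "count(*) over (partition by cus_id, gov_id) as cnt "
        "from t"
    ).fetchall()
    con.close()
    return {name: cnt for _, _, name, cnt in rows}


print(counts_per_pair())
```

Unlike GROUP BY cus_id, gov_id, which would collapse each pair into a single row, the window function keeps every row and attaches the group count to it. In the asker's attempt, count(*) with no GROUP BY or OVER clause treats the whole table as one group, which is why it cannot produce a per-row count.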

Using CTE to determine a specific Hierarchical ID for Family Members

I'm trying to figure out how to attach an incrementing ID to my result set while using a CTE.
My table has data like so:
PersonId ParentLinkId Relation Name
1 NULL F John Doe
2 1 S Jane Doe
3 1 C Jack Doe
4 1 C Jill Doe
I want to add a column called RelationId. Basically, the "F" person always gets 1, the "S" relation always gets 2, and each subsequent "C" relation gets 3, 4, 5, etc.
Rows are linked through ParentLinkId, which holds the parent's PersonId (ParentLinkId = PersonId).
I tried to use a recursive CTE to increment this value, but I keep getting stuck in an infinite loop.
I tried:
WITH FinalData( ParentId, ParentLinkId, Name, Relationship, RelationshipId) AS
(
    SELECT ParentId
          ,ParentLinkId
          ,Name
          ,Relationship
          ,1
    FROM FamTable
    WHERE ParentLinkId IS NULL
    UNION ALL
    SELECT FT.ParentId
          ,ParentLinkId
          ,Name
          ,Relationship
          ,RelationshipId + 1
    FROM FamTable FT
    INNER JOIN FinalData ON FT.ParentLinkId = FinalData.ParentId
)
SELECT * FROM
FinalData
This is the result I keep on getting:
PersonId ParentLinkId Relation Name RelationshipId
1 NULL F John Doe 1
2 1 S Jane Doe 2
3 1 C Jack Doe 2
4 1 C Jill Doe 2
It should be:
PersonId ParentLinkId Relation Name RelationshipId
1 NULL F John Doe 1
2 1 S Jane Doe 2
3 1 C Jack Doe 3
4 1 C Jill Doe 4
I think I'm getting close with the CTE, but any help or a prod in the right direction would be greatly appreciated!
This sounds like a simple row_number():
select f.*,
       row_number() over (partition by coalesce(ParentLinkId, PersonId)
                          order by (case when relation = 'F' then 1
                                         when relation = 'S' then 2
                                         when relation = 'C' then 3
                                    end), PersonId
                         ) as relationId
from famtable f;
Here is a db<>fiddle.
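The row_number() answer can be exercised against the question's four rows using Python's sqlite3 module as a stand-in database (an assumption; the question appears to target SQL Server, and SQLite needs 3.25+ for window functions). relation_ids is a hypothetical helper name.

```python
import sqlite3

# (PersonId, ParentLinkId, Relation, Name); None becomes SQL NULL
FAMILY = [(1, None, "F", "John Doe"), (2, 1, "S", "Jane Doe"),
          (3, 1, "C", "Jack Doe"), (4, 1, "C", "Jill Doe")]

QUERY = """
select f.PersonId, f.Name,
       row_number() over (
           partition by coalesce(f.ParentLinkId, f.PersonId)
           order by case when f.Relation = 'F' then 1
                         when f.Relation = 'S' then 2
                         when f.Relation = 'C' then 3
                    end,
                    f.PersonId
       ) as relationId
from famtable f
order by f.PersonId
"""


def relation_ids():
    """Return (PersonId, Name, relationId) for every family member."""
    con = sqlite3.connect(":memory:")
    con.execute("create table famtable (PersonId integer, ParentLinkId integer, "
                "Relation text, Name text)")
    con.executemany("insert into famtable values (?, ?, ?, ?)", FAMILY)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in relation_ids():
    print(*row)
```

Partitioning by coalesce(ParentLinkId, PersonId) puts the F row (whose ParentLinkId is NULL) in the same partition as the rows that point to it, so no recursion is needed for this one-level structure.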

Finding fields to update based on combinations

I need my results to show which records need updates. I created a temp table that looks like the one below. The rule is that each ID cannot have more than one record with MASTER = 1. That record must have FULLTIME = 1, and all other records for the ID must have FULLTIME = 0 and PARTTIME = 1. This is quite difficult because you have to compare across multiple records with the same ID.
I've tried combinations of MAX, COUNT(DISTINCT ...), subqueries, etc., with no luck. I've even tried some manipulation in Excel, but it's totally confusing to me.
select distinct
    x.ID,
    COUNT(x.ID) AS ID_Count
from #FT0PT1M1Version2 as x
join (
    select
        ID, NAME, MASTER, FULLTIME, PARTTIME
    from #FT0PT1M1Version2
    WHERE E = 'P'
    GROUP BY
        ID, NAME, MASTER, FULLTIME, PARTTIME
    HAVING COUNT(ID) = '1'
) as y
    on x.ID = y.ID
WHERE
    x.PARTTIME = '1' and
    x.MASTER = '1'
group by x.ID
HAVING COUNT(x.ID) = '1'
order by 1
Temp Table
ID NAME MASTER FULLTIME PARTTIME
1 JAMES JONES 0 1 0
1 JAMES JONES 1 0 1
1 JAMES JONES 0 0 1
2 MICHEAL JORDAN 1 1 0
2 MICHEAL JORDAN 0 0 1
2 MICHEAL JORDAN 0 0 1
3 JOHN DOE 1 1 0
3 JOHN DOE 0 0 1
Expected Results
ID NAME MASTER FULLTIME PARTTIME UPDATE
1 JAMES JONES 0 1 0 Y
1 JAMES JONES 1 0 1 Y
1 JAMES JONES 0 0 1 N
2 MICHEAL JORDAN 1 1 0 N
2 MICHEAL JORDAN 0 0 1 N
2 MICHEAL JORDAN 0 0 1 N
3 JOHN DOE 1 1 0 N
3 JOHN DOE 1 0 1 Y
You could try the query below, though I would prefer to put a check constraint on the columns (something like: if MASTER = 1, then UPDATE = 'Y').
SELECT ID, NAME, MASTER, FULLTIME, PARTTIME,
    CASE
        -- A row is valid if it is the full-time master record
        -- or a part-time, non-master record; anything else needs an update
        WHEN (MASTER = 1 AND FULLTIME = 1 AND PARTTIME = 0)
          OR (MASTER = 0 AND FULLTIME = 0 AND PARTTIME = 1)
        THEN 'N'
        ELSE 'Y'
    END AS [Update]
FROM #FT0PT1M1Version2
ORDER BY ID;
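The per-row CASE logic can be checked against the question's expected flags. This sketch assumes Python's sqlite3 module as a stand-in for SQL Server and a hypothetical table name staff, since #-prefixed temp tables are SQL Server specific; update_flags is likewise a hypothetical helper. It loads the MASTER/FULLTIME/PARTTIME values from the Expected Results table, whose second JOHN DOE row (1 0 1) differs from the temp table listing (0 0 1).

```python
import sqlite3

# (ID, NAME, MASTER, FULLTIME, PARTTIME) as listed in the Expected Results table
ROWS = [
    (1, "JAMES JONES", 0, 1, 0), (1, "JAMES JONES", 1, 0, 1), (1, "JAMES JONES", 0, 0, 1),
    (2, "MICHEAL JORDAN", 1, 1, 0), (2, "MICHEAL JORDAN", 0, 0, 1),
    (2, "MICHEAL JORDAN", 0, 0, 1),
    (3, "JOHN DOE", 1, 1, 0), (3, "JOHN DOE", 1, 0, 1),
]

QUERY = """
select ID, NAME, MASTER, FULLTIME, PARTTIME,
       case
           when (MASTER = 1 and FULLTIME = 1 and PARTTIME = 0)
             or (MASTER = 0 and FULLTIME = 0 and PARTTIME = 1)
           then 'N'
           else 'Y'
       end as "Update"
from staff
order by rowid
"""


def update_flags():
    """Return each row with its 'Y'/'N' update flag, in insertion order."""
    con = sqlite3.connect(":memory:")
    con.execute("create table staff (ID integer, NAME text, MASTER integer, "
                "FULLTIME integer, PARTTIME integer)")
    con.executemany("insert into staff values (?, ?, ?, ?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in update_flags():
    print(*row)
```

Note that this per-row check does not by itself detect an ID with two valid-looking master rows (both 1 1 0); that case would need an extra per-ID count.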

SQL Server: how to get this result from this table (example inside)

How would you write a query that shows the data in this table:
week name total
==== ====== =====
1 jon 15.2
1 jon 10
1 susan 10
1 howard 9
1 ben 10
3 ben 30
3 susan 10
3 mary 10
5 jon 10
6 howard 12
7 tony 25.1
8 tony 7
8 howard 10
9 susan 6.2
9 howard 9
9 ben 10
11 howard 10
11 howard 10
like this:
week name total
==== ====== =====
1 ben 10
1 howard 9
1 jon 25.2
1 mary 0
1 susan 10
1 tony 0
3 ben 30
3 howard 0
3 jon 0
3 mary 10
3 susan 10
3 tony 0
5 ben 0
5 howard 0
5 jon 10
5 mary 0
5 susan 0
5 tony 0
6 ben 0
6 howard 12
6 jon 0
6 mary 0
6 susan 0
6 tony 0
7 ben 0
7 howard 0
7 jon 0
7 mary 0
7 susan 0
7 tony 25.1
8 ben 0
8 howard 10
8 jon 0
8 mary 0
8 susan 0
8 tony 7
9 ben 10
9 howard 9
9 jon 0
9 mary 0
9 susan 6.2
9 tony 0
11 ben 0
11 howard 20
11 jon 0
11 mary 0
11 susan 0
11 tony 0
I tried something like:
select t1.week_id ,
t2.name ,
sum(t1.total)
from xpto as t1 ,
xpto as t2
where t1.week_id = t2.week_id
group by t1.week_id, t2.name
order by t1.week_id, t2.name
But I'm failing to understand the "sum" part and I can't figure out why...
Any help would be much appreciated. Thanks in advance, and sorry for my English.
You might try something like the following:
select week      = w.week ,
       name      = n.name ,
       sum_total = coalesce( sum( d.total ) , 0 )
from ( select distinct week from my_table ) w
cross join ( select distinct name from my_table ) n
left join my_table d on d.week = w.week
                    and d.name = n.name
group by w.week ,
         n.name
order by 1,2
The cross join of the first two derived tables constructs their Cartesian product: every week paired with every name in the table, regardless of whether that particular week/name combination exists.
We then left join that against the actual data rows and aggregate, using coalesce() to turn any null result from the aggregate function sum() into 0.
As I understand it, you want to show every week and every name in the table, regardless of whether a name was entered for that week. To do so, first build a list of all names and a list of all weeks, cross join them, and then left join the result to the per-week totals, like this:
;with names as (select distinct name from xpto),
      weeks  as (select distinct week from xpto),
      totals as (select week, name, sum(total) as total
                 from xpto group by week, name)
select w.week, n.name, coalesce(total, 0) as total
from names n cross join weeks w
left join totals t on t.name = n.name and w.week = t.week
order by 1,2
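Here is the names/weeks/totals approach run end to end on the question's data, using Python's sqlite3 module as a stand-in for SQL Server (an assumption; the leading ; is a T-SQL habit and is dropped here). weekly_grid is a hypothetical helper.

```python
import sqlite3

# (week, name, total) rows from the question
XPTO = [
    (1, "jon", 15.2), (1, "jon", 10), (1, "susan", 10), (1, "howard", 9), (1, "ben", 10),
    (3, "ben", 30), (3, "susan", 10), (3, "mary", 10), (5, "jon", 10), (6, "howard", 12),
    (7, "tony", 25.1), (8, "tony", 7), (8, "howard", 10), (9, "susan", 6.2),
    (9, "howard", 9), (9, "ben", 10), (11, "howard", 10), (11, "howard", 10),
]

QUERY = """
with names as (select distinct name from xpto),
     weeks as (select distinct week from xpto),
     totals as (select week, name, sum(total) as total
                from xpto group by week, name)
select w.week, n.name, coalesce(t.total, 0) as total
from names n
cross join weeks w                       -- every week paired with every name
left join totals t on t.name = n.name and t.week = w.week
order by w.week, n.name
"""


def weekly_grid():
    """Return (week, name, total) for every week/name pair, 0 where absent."""
    con = sqlite3.connect(":memory:")
    con.execute("create table xpto (week integer, name text, total real)")
    con.executemany("insert into xpto values (?, ?, ?)", XPTO)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in weekly_grid():
    print(*row)
```

The result has one row per week/name pair (8 distinct weeks times 6 distinct names gives 48 rows), with coalesce() supplying 0 wherever a name has no entry for that week.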
I've edited my answer because I now understand what you want to do a bit better.
I prefer doing things in several steps rather than trying to do several transformations of data with a single join or subquery. So I would approach this like this:
;
with Weeks as (
    select distinct Week_id
    from xpto
)
, Names as (
    select distinct Name
    from xpto
)
, Scores as (
    select week_id
         , name
         , total = sum(t1.total)
    from xpto t1
    group by
        t1.week_id
      , t1.name
)
, WeeksAndNames as (
    select week_id
         , name
    from Weeks
    cross join Names
)
-- The final query!
select wan.week_id
     , wan.name
     , total = COALESCE(s.total,0)
from WeeksAndNames wan
left join Scores s
    on wan.week_id = s.week_id
   and wan.name = s.name
order by
    wan.week_id
  , wan.name
Lengthy, I'll grant you, and you can probably condense it. But this shows each step you need to go through to transform your data into the list you want.