GROUP BY with multiple inner joins in Postgres

I have 3 tables. The first table, "A", is the master table:
id_grp|group_name |created_on |status|
------+--------------+-----------------------+------+
17|Teller |2022-09-09 16:00:44.842| 1|
18|Combined Group|2022-09-09 10:16:42.473| 1|
16|admnistrator |2022-09-08 10:11:14.313| 1|
Then I have another table, "B":
id_config|id_grp|id_utilis|
---------+------+---------+
159| 16| 1|
161| 16| 54|
164| 17| 55|
438| 17| 88|
166| 18| 39|
167| 18| 20|
439| 16| 89|
198| 18| 51|
Then I have the last table, "C":
id_config|id_grp|id_pol|
---------+------+------+
46| 16| 7|
48| 17| 8|
51| 18| 8|
52| 18| 7|
84| 18| 9|
113| 17| 9|
But when I use GROUP BY with multiple joins, as follows:
SELECT
a.id_grp,
a.group_name,
a.created_on,
a.status,
count(b.id_utilis) AS users,
count(c.id_pol) AS policy
FROM a
inner JOIN b on a.id_grp = b.id_grp
inner JOIN c on a.id_grp = c.id_grp
GROUP BY a.id_grp, a.group_name, a.created_on, a.status;
I get the wrong result: the two counts multiply each other, as if the joins were building a matrix (each row of b is paired with each row of c that has the same id_grp):
id_grp|group_name |created_on |status|users|policy|
------+--------------+-----------------------+------+-----+------+
17|Teller |2022-09-09 16:00:44.842| 1| 10| 10|
16|admnistrator |2022-09-08 10:11:14.313| 1| 3| 3|
18|Combined Group|2022-09-09 10:16:42.473| 1| 18| 18|

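Aggregate each child table in a derived table before joining, so every id_grp joins to exactly one row of counts:
select *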
from a
join (select id_grp, count(*) as users from b group by id_grp) b using(id_grp)
join (select id_grp, count(*) as policy from c group by id_grp) c using(id_grp)
id_grp|group_name    |created_on         |status|users|policy|
------+--------------+-------------------+------+-----+------+
    17|Teller        |2022-09-09 16:00:44|     1|    2|     2|
    18|Combined Group|2022-09-09 10:16:42|     1|    3|     3|
    16|admnistrator  |2022-09-08 10:11:14|     1|    3|     1|
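A minimal sketch to reproduce the problem and the fix locally, assuming Python's built-in sqlite3 as a stand-in for Postgres (both queries are plain SQL, so the counts should come out the same). It loads the sample rows for A, B and C from the question, runs the original double join, and then runs the pre-aggregated version from the answer:

```python
import sqlite3

# In-memory SQLite database standing in for Postgres; rows copied from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id_grp INTEGER PRIMARY KEY, group_name TEXT, created_on TEXT, status INTEGER);
CREATE TABLE b (id_config INTEGER, id_grp INTEGER, id_utilis INTEGER);
CREATE TABLE c (id_config INTEGER, id_grp INTEGER, id_pol INTEGER);
INSERT INTO a VALUES
  (17, 'Teller', '2022-09-09 16:00:44.842', 1),
  (18, 'Combined Group', '2022-09-09 10:16:42.473', 1),
  (16, 'admnistrator', '2022-09-08 10:11:14.313', 1);
INSERT INTO b VALUES
  (159, 16, 1), (161, 16, 54), (164, 17, 55), (438, 17, 88),
  (166, 18, 39), (167, 18, 20), (439, 16, 89), (198, 18, 51);
INSERT INTO c VALUES
  (46, 16, 7), (48, 17, 8), (51, 18, 8), (52, 18, 7), (84, 18, 9), (113, 17, 9);
""")

# Original approach: both joins fan out on id_grp, so each b row is paired
# with each c row of the same group before counting.
naive = conn.execute("""
SELECT a.id_grp, count(b.id_utilis) AS users, count(c.id_pol) AS policy
FROM a
JOIN b ON a.id_grp = b.id_grp
JOIN c ON a.id_grp = c.id_grp
GROUP BY a.id_grp, a.group_name, a.created_on, a.status
ORDER BY a.id_grp
""").fetchall()

# Answer's approach: count each child table per id_grp first, then join
# exactly one row of counts per group.
fixed = conn.execute("""
SELECT a.id_grp, a.group_name, bu.users, cp.policy
FROM a
JOIN (SELECT id_grp, count(*) AS users FROM b GROUP BY id_grp) bu ON bu.id_grp = a.id_grp
JOIN (SELECT id_grp, count(*) AS policy FROM c GROUP BY id_grp) cp ON cp.id_grp = a.id_grp
ORDER BY a.id_grp
""").fetchall()

print(naive)  # [(16, 3, 3), (17, 4, 4), (18, 9, 9)]
print(fixed)  # [(16, 'admnistrator', 3, 1), (17, 'Teller', 2, 2), (18, 'Combined Group', 3, 3)]
```

On the sample data the naive join reports the product of the per-group row counts (3×1, 2×2, 3×3) for both columns, while the derived tables keep the two counts independent.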

Related

How to find values between date ranges in Power BI?

I need help understanding something.
I have a table that uses a slowly changing dimension, with start and end dates and an indicator of whether each row is active:
Type|start     |end       |value|active|
----+----------+----------+-----+------+
A   |0001-01-01|9999-12-31|   10|     1|
B   |2015-03-18|2016-06-25|    4|     0|
B   |2016-06-25|9999-12-31|    7|     1|
C   |2017-05-07|9999-12-31|    8|     1|
I need to connect this table to a report in Power BI and show the corresponding value in a line graph, by month.
Something like this:
My report connects to SSAS through DirectQuery. I can create a view with the new structure to connect to my cube.
How can I get this result from a table with this structure?
Thanks for the help!
I thought about creating a table with a value for each month, but as you can see, I have dates ranging from 0001-01-01 to 9999-12-31. (To be honest, I don't really know how to do that either).

Based on the chart, assume the date range is 2016-04 to 2017-12. Then:
1. Generate a month dimension for that date range.
2. Cross join the month dimension with the given slowly changing dimension slo_dim and keep the value that is effective in each month.
3. Draw a line chart based on the result of step 2.
with cte_month (year_month, n) as (
select cast('2016-04-01' as date), 1
union all
select dateadd(month, 1, year_month), n+1
from cte_month
where n < 21)
select d.type,
m.year_month,
d.value
from cte_month m, slo_dim d
where m.year_month between d.start_dt and d.end_dt
order by d.type, m.year_month;
Result:
type|year_month|value|
----+----------+-----+
A |2016-04-01| 10|
A |2016-05-01| 10|
A |2016-06-01| 10|
A |2016-07-01| 10|
A |2016-08-01| 10|
A |2016-09-01| 10|
A |2016-10-01| 10|
A |2016-11-01| 10|
A |2016-12-01| 10|
A |2017-01-01| 10|
A |2017-02-01| 10|
A |2017-03-01| 10|
A |2017-04-01| 10|
A |2017-05-01| 10|
A |2017-06-01| 10|
A |2017-07-01| 10|
A |2017-08-01| 10|
A |2017-09-01| 10|
A |2017-10-01| 10|
A |2017-11-01| 10|
A |2017-12-01| 10|
B |2016-04-01| 4|
B |2016-05-01| 4|
B |2016-06-01| 4|
B |2016-07-01| 7|
B |2016-08-01| 7|
B |2016-09-01| 7|
B |2016-10-01| 7|
B |2016-11-01| 7|
B |2016-12-01| 7|
B |2017-01-01| 7|
B |2017-02-01| 7|
B |2017-03-01| 7|
B |2017-04-01| 7|
B |2017-05-01| 7|
B |2017-06-01| 7|
B |2017-07-01| 7|
B |2017-08-01| 7|
B |2017-09-01| 7|
B |2017-10-01| 7|
B |2017-11-01| 7|
B |2017-12-01| 7|
C |2017-06-01| 8|
C |2017-07-01| 8|
C |2017-08-01| 8|
C |2017-09-01| 8|
C |2017-10-01| 8|
C |2017-11-01| 8|
C |2017-12-01| 8|
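The answer's query is T-SQL (DATEADD, and a recursive CTE written without the RECURSIVE keyword). As a hedged sketch for trying the same three steps outside SQL Server, here is the equivalent in SQLite through Python's sqlite3. It assumes ISO yyyy-mm-dd date strings, so that BETWEEN compares them correctly as text, and uses SQLite's date(x, '+1 month') in place of DATEADD; the table and column names slo_dim, start_dt and end_dt are taken from the answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE slo_dim (type TEXT, start_dt TEXT, end_dt TEXT, value INTEGER, active INTEGER);
INSERT INTO slo_dim VALUES
  ('A', '0001-01-01', '9999-12-31', 10, 1),
  ('B', '2015-03-18', '2016-06-25', 4, 0),
  ('B', '2016-06-25', '9999-12-31', 7, 1),
  ('C', '2017-05-07', '9999-12-31', 8, 1);
""")

rows = conn.execute("""
-- Step 1: one row per month, 2016-04-01 .. 2017-12-01 (21 months)
WITH RECURSIVE cte_month(year_month, n) AS (
  SELECT '2016-04-01', 1
  UNION ALL
  SELECT date(year_month, '+1 month'), n + 1 FROM cte_month WHERE n < 21
)
-- Step 2: keep each dimension row for the months inside its validity range
SELECT d.type, m.year_month, d.value
FROM cte_month m
JOIN slo_dim d ON m.year_month BETWEEN d.start_dt AND d.end_dt
ORDER BY d.type, m.year_month
""").fetchall()

for row in rows:
    print(row)
```

With the sample table this should yield the same 49 rows as the result above (21 for A, 21 for B, 7 for C), which is the shape a line chart by month needs.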

SQL query to find an output table

I have three dimension tables and a fact table, and I need to write a query that joins the dimension columns with the fact table to find the top 10 ATMs with the most transactions in the 'inactive' state. I tried the query with a Cartesian join, but I don't know if this is the right way to join the tables.
select a.atm_number,a.atm_manufacturer,b.location,count(c.trans_id) as total_transaction_count,count(c.atm_status) as inactive_count
from dimen_atm a,dimen_location b,fact_atm_trans c
where a.atm_id = c.atm_id and b.location = c.location
order by inactive_count desc limit 10;
dimen_card_type
+------------+---------+
|card_type_id|card_type|
+------------+---------+
| 1| CIRRUS|
| 2| Dankort|
+------------+---------+
dimen_atm
+------+----------+----------------+---------------+
|atm_id|atm_number|atm_manufacturer|atm_location_id|
+------+----------+----------------+---------------+
| 1| 1| NCR| 16|
| 2| 2| NCR| 64|
+------+----------+----------------+---------------+
dimen_location
+-----------+--------------------+----------------+-------------+-------+------+------+
|location_id| location| streetname|street_number|zipcode| lat| lon|
+-----------+--------------------+----------------+-------------+-------+------+------+
| 1|Intern København|Rådhuspladsen| 75| 1550|55.676|12.571|
| 2| København| Regnbuepladsen| 5| 1550|55.676|12.571|
+-----------+--------------------+----------------+-------------+-------+------+------+
fact_atm_trans
+--------+------+--------------+-------+------------+----------+--------+----------+------------------+------------+------------+-------+----------+----------+------------+-------------------+
|trans_id|atm_id|weather_loc_id|date_id|card_type_id|atm_status|currency| service|transaction_amount|message_code|message_text|rain_3h|clouds_all|weather_id|weather_main|weather_description|
+--------+------+--------------+-------+------------+----------+--------+----------+------------------+------------+------------+-------+----------+----------+------------+-------------------+
| 1| 1| 16| 5229| 3| Active| DKK|Withdrawal| 5980| null| null| 0.0| 80| 803| Clouds| broken cloudsr|
| 2| 1| 16| 4090| 10| Active| DKK|Withdrawal| 3992| null| null| 0.0| 32| 802| Clouds| scattered cloudsr|
+--------+------+--------------+-------+------------+----------+--------+----------+------------------+------------+------------+-------+----------+----------+------------+-------------------+
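One hedged way to write this with explicit joins instead of a comma (Cartesian) join: join the fact table to dimen_atm on atm_id, reach dimen_location through dimen_atm.atm_location_id, GROUP BY the ATM columns, and count inactive transactions with a conditional SUM, because count(c.atm_status) counts every non-null status. The location link is an assumption (fact_atm_trans has no location column, only weather_loc_id), as is the status spelling 'Inactive'. The sketch runs in Python's sqlite3 on trimmed tables with hypothetical sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trimmed versions of the question's tables; the rows below are hypothetical.
CREATE TABLE dimen_location (location_id INTEGER PRIMARY KEY, location TEXT);
CREATE TABLE dimen_atm (atm_id INTEGER PRIMARY KEY, atm_number INTEGER,
                        atm_manufacturer TEXT, atm_location_id INTEGER);
CREATE TABLE fact_atm_trans (trans_id INTEGER PRIMARY KEY, atm_id INTEGER, atm_status TEXT);
INSERT INTO dimen_location VALUES (1, 'Intern København'), (2, 'København');
INSERT INTO dimen_atm VALUES (1, 1, 'NCR', 1), (2, 2, 'NCR', 2);
INSERT INTO fact_atm_trans VALUES
  (1, 1, 'Active'), (2, 1, 'Inactive'),
  (3, 2, 'Inactive'), (4, 2, 'Inactive'), (5, 2, 'Active');
""")

top = conn.execute("""
SELECT a.atm_number, a.atm_manufacturer, l.location,
       COUNT(f.trans_id) AS total_transaction_count,
       -- count only inactive rows; COUNT(f.atm_status) would count all of them
       SUM(CASE WHEN f.atm_status = 'Inactive' THEN 1 ELSE 0 END) AS inactive_count
FROM fact_atm_trans f
JOIN dimen_atm a ON a.atm_id = f.atm_id
JOIN dimen_location l ON l.location_id = a.atm_location_id  -- assumed link
GROUP BY a.atm_number, a.atm_manufacturer, l.location
ORDER BY inactive_count DESC
LIMIT 10
""").fetchall()

print(top)  # [(2, 'NCR', 'København', 3, 2), (1, 'NCR', 'Intern København', 2, 1)]
```

The GROUP BY is what the original query is missing: without it, the aggregates collapse the whole join into a single row.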

Joining tables and finding difference

I have a table which contains the following schema:
Table1
+----------+----------+----+------------+--------+---------------+-----+-------------+
|student_id|project_id|name|project_name|approved|evaluation_type|grade|cohort_number|
I have another table with the following schema:
Table2
+-------------+----------+
|cohort_number|project_id|
My problem is this: for each student_id, I want to get the projects that the student has not completed (those with no rows). I know which projects a student should have done by checking the cohort_number. Basically, I need the "difference" between the two tables: I want to fill table 1 with the missing entries by comparing it with the table 2 project_id values for that cohort_number.
I am not sure if I was clear.
I tried using LEFT JOIN, but I only get the records that match (I need the opposite).
Example:
Table1
|student_id|project_id|name|         project_name|approved|evaluation_type|grade|cohort_number|
+----------+----------+----+---------------------+--------+---------------+-----+-------------+
|        13|        18|Name|project/sd-03-bloc...|    true|       standard|  1.0|            3|
|        13|         7|Name|project/sd-03-bloc...|    true|       standard|  1.0|            3|
|        13|        27|Name|project/sd-03-bloc...|    true|       standard|  1.0|            3|
Table2
+-------------+----------+
|cohort_number|project_id|
+-------------+----------+
| 3| 18|
| 3| 27|
| 4| 15|
| 3| 7|
| 3| 35|
+-------------+----------+
I want:
|student_id|project_id|name|         project_name|approved|evaluation_type|grade|cohort_number|
+----------+----------+----+---------------------+--------+---------------+-----+-------------+
|        13|        18|Name|project/sd-03-bloc...|    true|       standard|  1.0|            3|
|        13|         7|Name|project/sd-03-bloc...|    true|       standard|  1.0|            3|
|        13|        27|Name|project/sd-03-bloc...|    true|       standard|  1.0|            3|
|        13|        35|Name|project/sd-03-bloc...|   false|       standard|    0|            3|
Thanks in advance

If I followed you correctly, you can get all distinct (student_id, cohort_number, name) tuples from table1, and then bring all corresponding rows from table2. This basically gives you one row for each project that a student should have completed.
You can then bring in table1 with a left join. "Missing" projects are identified by null values in the columns project_name, approved, evaluation_type, and grade.
select
s.student_id,
t2.project_id,
s.name,
t1.project_name,
t1.approved,
t1.evaluation_type,
t1.grade,
s.cohort_number
from (select distinct student_id, cohort_number, name from table1) s
inner join table2 t2
on t2.cohort_number = s.cohort_number
left join table1 t1
on t1.student_id = s.student_id
and t1.project_id = t2.project_id
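To check the approach, a sketch that loads the example Table1 and Table2 rows into SQLite through Python's sqlite3 (the column types are assumptions) and runs the query above, joining table1 on t2.project_id; the ORDER BY is added only to make the output order predictable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (student_id INTEGER, project_id INTEGER, name TEXT, project_name TEXT,
                     approved TEXT, evaluation_type TEXT, grade REAL, cohort_number INTEGER);
CREATE TABLE table2 (cohort_number INTEGER, project_id INTEGER);
INSERT INTO table1 VALUES
  (13, 18, 'Name', 'project/sd-03-bloc...', 'true', 'standard', 1.0, 3),
  (13, 7,  'Name', 'project/sd-03-bloc...', 'true', 'standard', 1.0, 3),
  (13, 27, 'Name', 'project/sd-03-bloc...', 'true', 'standard', 1.0, 3);
INSERT INTO table2 VALUES (3, 18), (3, 27), (4, 15), (3, 7), (3, 35);
""")

rows = conn.execute("""
SELECT s.student_id, t2.project_id, s.name, t1.project_name, t1.approved,
       t1.evaluation_type, t1.grade, s.cohort_number
FROM (SELECT DISTINCT student_id, cohort_number, name FROM table1) s
-- every project the student's cohort is expected to complete
JOIN table2 t2 ON t2.cohort_number = s.cohort_number
-- the student's actual row for that project, if there is one
LEFT JOIN table1 t1 ON t1.student_id = s.student_id AND t1.project_id = t2.project_id
ORDER BY t2.project_id
""").fetchall()

for row in rows:
    print(row)
# last row: (13, 35, 'Name', None, None, None, None, 3)
```

Project 35 comes back with nulls in the table1 columns. If you want false and 0 as in the desired output, wrapping those columns in COALESCE, e.g. COALESCE(t1.approved, 'false'), is one option.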

Why does my SQL code work but the similar Scala code doesn't? (left join with several date conditions)

I have SQL code that works perfectly:
val sql ="""
select a.*,
b.fOOS,
b.prevD
from dataFrame as a
left join dataNoPromoFOOS as b on
a.shopId = b.shopId and a.skuId = b.skuId and
a.Date > b.date and a.date <= b.prevD
"""
result:
+------+------+----------+-----+-----+------------------+---+----------+------------------+----------+
|shopId| skuId| date|stock|sales| salesRub| st|totalPromo| fOOS| prevD|
+------+------+----------+-----+-----+------------------+---+----------+------------------+----------+
| 200|154057|2017-03-31|101.0| 49.0| 629.66| 1| 0|58.618803952304724|2017-03-31|
| 200|154057|2017-09-11|116.0| 76.0| 970.67| 1| 0| 63.3344597217295|2017-09-11|
| 200|154057|2017-11-10| 72.0| 94.0| 982.4599999999999| 1| 0|59.019226118850405|2017-11-10|
| 200|154057|2018-10-08|126.0| 34.0| 414.44| 1| 0| 55.16878756270067|2018-10-08|
| 200|154057|2016-08-03|210.0| 27.0| 307.43| 1| 0|23.530049844711286|2016-08-03|
| 200|154057|2016-09-03| 47.0| 20.0| 246.23| 1| 0|24.656378380329674|2016-09-03|
| 200|154057|2016-12-31| 66.0| 30.0| 386.5| 1| 1| 26.0423103074891|2017-01-09|
| 200|154057|2017-02-28| 22.0| 61.0| 743.2899999999998| 1| 0| 54.86808157636879|2017-02-28|
| 200|154057|2017-03-16| 79.0| 41.0|505.40999999999997| 1| 0| 49.79449369431623|2017-03-16|
When I use Scala, this code doesn't work:
dataFrame.join(dataNoPromoFOOS,
dataFrame("shopId") === dataNoPromoFOOS("shopId") &&
dataFrame("skuId") === dataNoPromoFOOS("skuId") &&
(dataFrame("date").lt(dataNoPromoFOOS("date"))) &&
(dataFrame("date").geq(dataNoPromoFOOS("prevD"))) ,
"left"
).select(dataFrame("*"),dataNoPromoFOOS("fOOS"),dataNoPromoFOOS("prevD"))
result:
+------+------+----------+-----+-----+------------------+---+----------+----+-----+
|shopId| skuId| date|stock|sales| salesRub| st|totalPromo|fOOS|prevD|
+------+------+----------+-----+-----+------------------+---+----------+----+-----+
| 200|154057|2016-09-24|288.0| 34.0| 398.66| 1| 0|null| null|
| 200|154057|2017-06-11| 40.0| 38.0| 455.32| 1| 1|null| null|
| 200|154057|2017-08-18| 83.0| 20.0|226.92000000000002| 1| 1|null| null|
| 200|154057|2018-07-19|849.0| 58.0| 713.12| 1| 0|null| null|
| 200|154057|2018-08-11|203.0| 52.0| 625.74| 1| 0|null| null|
| 200|154057|2016-09-01|120.0| 24.0| 300.0| 1| 1|null| null|
| 200|154057|2016-12-22| 62.0| 30.0| 378.54| 1| 0|null| null|
| 200|154057|2017-05-11|105.0| 49.0| 597.12| 1| 0|null| null|
| 200|154057|2016-12-28| 3.0| 36.0| 433.11| 1| 1|null| null|
Does anybody know why the SQL code works but the Scala code finds no matches in the left join (fOOS and prevD are all null)?
I think it's the date column, but I don't understand how to find my error.
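A likely cause worth checking: the two date comparisons in the Scala version are mirrored relative to the SQL. The SQL has a.Date > b.date and a.date <= b.prevD, but the Scala uses lt (<) and geq (>=). Written to match the SQL, the condition would be dataFrame("date").gt(dataNoPromoFOOS("date")) && dataFrame("date").leq(dataNoPromoFOOS("prevD")). The plain-Python sketch below (not Spark code) evaluates both predicates. a.date and prevD come from the 2016-12-31 row of the SQL result; b.date is a hypothetical value, because the question does not show it:

```python
from datetime import date, timedelta

def sql_condition(a_date, b_date, b_prev_d):
    # a.Date > b.date AND a.date <= b.prevD  (the working SQL)
    return a_date > b_date and a_date <= b_prev_d

def scala_condition(a_date, b_date, b_prev_d):
    # dataFrame("date").lt(b("date")) && dataFrame("date").geq(b("prevD"))
    return a_date < b_date and a_date >= b_prev_d

# b_date is hypothetical; the other two dates come from the SQL output.
a_date, b_date, b_prev_d = date(2016, 12, 31), date(2016, 12, 20), date(2017, 1, 9)
print(sql_condition(a_date, b_date, b_prev_d))    # True
print(scala_condition(a_date, b_date, b_prev_d))  # False

# Whenever b.date <= b.prevD, the Scala predicate requires
# prevD <= a.date < b.date, which is impossible: no pair ever matches.
days = [date(2017, 1, 1) + timedelta(days=i) for i in range(10)]
never_matches = not any(
    scala_condition(a, b, p) for a in days for b in days for p in days if b <= p
)
print(never_matches)  # True
```

If every row of dataNoPromoFOOS has date <= prevD, the mirrored predicate never holds, so the left join keeps every left row with null fOOS and prevD, which matches the output shown.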

OR conditions in a join result in a cross join

I am trying to join two datasets in Spark (version 2.1):
SELECT *
FROM Tb1
INNER JOIN Tb2
ON Tb1.key1=Tb2.key1
OR Tb1.key2=Tb2.Key2
But it results in a cross join. How can I join the two tables and get only the matching records?
I have also tried a left outer join, but it also forces me to switch to a cross join. Why?

Try this method
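from pyspark import SparkContext
from pyspark.sql import SQLContext as SQC
# Reuse the active SparkContext (e.g. the sc created by the pyspark shell)
sc = SparkContext.getOrCreate()
sqc = SQC(sc)
x = [(1,2,3), (4,5,6), (7,8,9), (10,11,12), (13,14,15)]
y = [(1,4,5), (4,5,6), (10,11,16), (34,23,31), (56,14,89)]
x_df = sqc.createDataFrame(x, ["x","y","z"])
y_df = sqc.createDataFrame(y, ["x","y","z"])
# A single join condition that ORs the two equality predicates
cond = [(x_df.x == y_df.x) | (x_df.y == y_df.y)]
x_df.join(y_df, cond, "inner").show()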
output
+---+---+---+---+---+---+
| x| y| z| x| y| z|
+---+---+---+---+---+---+
| 1| 2| 3| 1| 4| 5|
| 4| 5| 6| 4| 5| 6|
| 10| 11| 12| 10| 11| 16|
| 13| 14| 15| 56| 14| 89|
+---+---+---+---+---+---+

By joining it twice:
select *
from Tb1
inner join Tb2
on Tb1.key1=Tb2.key1
inner join Tb2 as Tb22
on Tb1.key2=Tb22.Key2
Or left join both:
select *
from Tb1
left join Tb2
on Tb1.key1=Tb2.key1
left join Tb2 as Tb22
on Tb1.key2=Tb22.Key2

We Keep Coding
