I have three tables. The first, "A", is the master table:
id_grp|group_name |created_on |status|
------+--------------+-----------------------+------+
17|Teller |2022-09-09 16:00:44.842| 1|
18|Combined Group|2022-09-09 10:16:42.473| 1|
16|admnistrator |2022-09-08 10:11:14.313| 1|
Then I have a second table, "b":
id_config|id_grp|id_utilis|
---------+------+---------+
159| 16| 1|
161| 16| 54|
164| 17| 55|
438| 17| 88|
166| 18| 39|
167| 18| 20|
439| 16| 89|
198| 18| 51|
And the last table, "C":
id_config|id_grp|id_pol|
---------+------+------+
46| 16| 7|
48| 17| 8|
51| 18| 8|
52| 18| 7|
84| 18| 9|
113| 17| 9|
But when I use GROUP BY with multiple joins, as follows:
SELECT
a.id_grp,
a.group_name,
a.created_on,
a.status,
count(b.id_utilis) AS users,
count(c.id_pol) AS policy
FROM a
inner JOIN b on a.id_grp = b.id_grp
inner JOIN c on a.id_grp = c.id_grp
GROUP BY a.id_grp, a.group_name, a.created_on, a.status
I get the wrong result: the two counts multiply each other, as if the joins formed a matrix of rows:
id_grp|group_name |created_on |status|users|policy|
------+--------------+-----------------------+------+-----+------+
17|Teller |2022-09-09 16:00:44.842| 1| 10| 10|
16|admnistrator |2022-09-08 10:11:14.313| 1| 3| 3|
18|Combined Group|2022-09-09 10:16:42.473| 1| 18| 18|
select *
from a
join (select id_grp, count(*) as users from b group by id_grp) b using(id_grp)
join (select id_grp, count(*) as policy from c group by id_grp) c using(id_grp)
id_grp|group_name    |created_on         |status|users|policy|
------+--------------+-------------------+------+-----+------+
    17|Teller        |2022-09-09 16:00:44|     1|    2|     2|
    18|Combined Group|2022-09-09 10:16:42|     1|    3|     3|
    16|admnistrator  |2022-09-08 10:11:14|     1|    3|     1|
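To see why pre-aggregating helps, here is a minimal sketch that replays both queries on the sample rows from the question. It uses Python's sqlite3 module as a stand-in engine (an assumption, since the original database is not stated) and keeps only the columns needed for the counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id_grp INTEGER, group_name TEXT);
    CREATE TABLE b (id_config INTEGER, id_grp INTEGER, id_utilis INTEGER);
    CREATE TABLE c (id_config INTEGER, id_grp INTEGER, id_pol INTEGER);
    INSERT INTO a VALUES (17, 'Teller'), (18, 'Combined Group'), (16, 'admnistrator');
    INSERT INTO b VALUES (159,16,1), (161,16,54), (164,17,55), (438,17,88),
                         (166,18,39), (167,18,20), (439,16,89), (198,18,51);
    INSERT INTO c VALUES (46,16,7), (48,17,8), (51,18,8), (52,18,7), (84,18,9), (113,17,9);
""")

# Joining b and c directly pairs every user row with every policy row of the
# same group, so both counts become users * policies.
naive = conn.execute("""
    SELECT a.id_grp, count(b.id_utilis) AS users, count(c.id_pol) AS policy
    FROM a
    JOIN b ON a.id_grp = b.id_grp
    JOIN c ON a.id_grp = c.id_grp
    GROUP BY a.id_grp
    ORDER BY a.id_grp
""").fetchall()

# Aggregating each child table first leaves one row per group on each side,
# so the joins can no longer multiply rows.
pre_aggregated = conn.execute("""
    SELECT a.id_grp, b.users, c.policy
    FROM a
    JOIN (SELECT id_grp, count(*) AS users  FROM b GROUP BY id_grp) b USING (id_grp)
    JOIN (SELECT id_grp, count(*) AS policy FROM c GROUP BY id_grp) c USING (id_grp)
    ORDER BY a.id_grp
""").fetchall()

print(naive)           # [(16, 3, 3), (17, 4, 4), (18, 9, 9)]
print(pre_aggregated)  # [(16, 3, 1), (17, 2, 2), (18, 3, 3)]
```

On these sample rows the direct join reports the product of the two per-group counts, while the pre-aggregated version reports each count separately.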
I need help to understand a thing.
I have a table that uses a slowly changing dimension, with start and end dates and an indicator of whether the row is active:
Type|start     |end       |value|active|
----+----------+----------+-----+------+
A   |0001/01/01|9999/12/31|   10|     1|
B   |2015/03/18|2016-06-25|    4|     0|
B   |2016-06-25|9999/12/31|    7|     1|
C   |2017-05-07|9999/12/31|    8|     1|
I need to connect this table to a report in Power BI and plot the respective value in a line chart, month by month.
Something like this:
I am using a report connected to SSAS through DirectQuery. I am able to create a view with a new structure to connect to my cube.
How can I get this result using a table with this structure?
Thanks for the help!
I thought about creating a table with a value for each month, but as you can see, I have dates ranging from 0001-01-01 to 9999-12-31. (To be honest, I don't really know how to do that either).
Based on the chart, assume the date range is 2016-04 to 2017-12. Then:
1. Generate a month dimension for that date range.
2. Cross join the month dimension with the given slowly changing dimension slo_dim and keep the value whose effective period covers each month.
3. Draw a line chart from the result of step 2.
with cte_month (year_month, n) as (
select cast('2016-04-01' as date), 1
union all
select dateadd(month, 1, year_month), n+1
from cte_month
where n < 21)
select d.type,
m.year_month,
d.value
from cte_month m, slo_dim d
where m.year_month between d.start_dt and d.end_dt
order by d.type, m.year_month;
Result:
type|year_month|value|
----+----------+-----+
A |2016-04-01| 10|
A |2016-05-01| 10|
A |2016-06-01| 10|
A |2016-07-01| 10|
A |2016-08-01| 10|
A |2016-09-01| 10|
A |2016-10-01| 10|
A |2016-11-01| 10|
A |2016-12-01| 10|
A |2017-01-01| 10|
A |2017-02-01| 10|
A |2017-03-01| 10|
A |2017-04-01| 10|
A |2017-05-01| 10|
A |2017-06-01| 10|
A |2017-07-01| 10|
A |2017-08-01| 10|
A |2017-09-01| 10|
A |2017-10-01| 10|
A |2017-11-01| 10|
A |2017-12-01| 10|
B |2016-04-01| 4|
B |2016-05-01| 4|
B |2016-06-01| 4|
B |2016-07-01| 7|
B |2016-08-01| 7|
B |2016-09-01| 7|
B |2016-10-01| 7|
B |2016-11-01| 7|
B |2016-12-01| 7|
B |2017-01-01| 7|
B |2017-02-01| 7|
B |2017-03-01| 7|
B |2017-04-01| 7|
B |2017-05-01| 7|
B |2017-06-01| 7|
B |2017-07-01| 7|
B |2017-08-01| 7|
B |2017-09-01| 7|
B |2017-10-01| 7|
B |2017-11-01| 7|
B |2017-12-01| 7|
C |2017-06-01| 8|
C |2017-07-01| 8|
C |2017-08-01| 8|
C |2017-09-01| 8|
C |2017-10-01| 8|
C |2017-11-01| 8|
C |2017-12-01| 8|
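The same month-dimension approach can be tried locally before building the view. The following is a sketch only: it uses Python's sqlite3 module instead of SQL Server, so `dateadd` becomes SQLite's `date(..., '+1 month')`, and it assumes the dates are stored as ISO `YYYY-MM-DD` strings in columns named `start_dt` and `end_dt`, as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE slo_dim (type TEXT, start_dt TEXT, end_dt TEXT, value INTEGER, active INTEGER);
    INSERT INTO slo_dim VALUES
        ('A', '0001-01-01', '9999-12-31', 10, 1),
        ('B', '2015-03-18', '2016-06-25',  4, 0),
        ('B', '2016-06-25', '9999-12-31',  7, 1),
        ('C', '2017-05-07', '9999-12-31',  8, 1);
""")

rows = conn.execute("""
    WITH RECURSIVE cte_month (year_month, n) AS (
        SELECT '2016-04-01', 1                       -- first month of the chart
        UNION ALL
        SELECT date(year_month, '+1 month'), n + 1   -- step one month at a time
        FROM cte_month
        WHERE n < 21                                 -- 21 months: 2016-04 .. 2017-12
    )
    SELECT d.type, m.year_month, d.value
    FROM cte_month m
    JOIN slo_dim d
      ON m.year_month BETWEEN d.start_dt AND d.end_dt  -- ISO strings compare in date order
    ORDER BY d.type, m.year_month
""").fetchall()

for row in rows:
    print(row)
```

Because only the months inside the chosen range are generated, the 0001-01-01 and 9999-12-31 sentinel dates never need to be expanded into rows.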
I have three dimension tables and a fact table, and I need to write a query that joins the dimension columns to the fact table to find the top 10 ATMs with the most transactions in the 'inactive' state. I tried the query below with a Cartesian join, but I don't know if this is the right way to join the tables.
select a.atm_number,a.atm_manufacturer,b.location,count(c.trans_id) as total_transaction_count,count(c.atm_status) as inactive_count
from dimen_atm a,dimen_location b,fact_atm_trans c
where a.atm_id = c.atm_id and b.location = c.location
order by inactive_count desc limit 10;
dimen_card_type
+------------+---------+
|card_type_id|card_type|
+------------+---------+
| 1| CIRRUS|
| 2| Dankort|
dimen_atm
+------+----------+----------------+---------------+
|atm_id|atm_number|atm_manufacturer|atm_location_id|
+------+----------+----------------+---------------+
| 1| 1| NCR| 16|
| 2| 2| NCR| 64|
+------+----------+----------------+---------------+
dimen_location
+-----------+--------------------+----------------+-------------+-------+------+------+
|location_id| location| streetname|street_number|zipcode| lat| lon|
+-----------+--------------------+----------------+-------------+-------+------+------+
| 1|Intern København|Rådhuspladsen| 75| 1550|55.676|12.571|
| 2| København| Regnbuepladsen| 5| 1550|55.676|12.571|
+-----------+--------------------+----------------+-------------+-------+------+------+
fact_atm_trans
+--------+------+--------------+-------+------------+----------+--------+----------+------------------+------------+------------+-------+----------+----------+------------+-------------------+
|trans_id|atm_id|weather_loc_id|date_id|card_type_id|atm_status|currency| service|transaction_amount|message_code|message_text|rain_3h|clouds_all|weather_id|weather_main|weather_description|
+--------+------+--------------+-------+------------+----------+--------+----------+------------------+------------+------------+-------+----------+----------+------------+-------------------+
| 1| 1| 16| 5229| 3| Active| DKK|Withdrawal| 5980| null| null| 0.0| 80| 803| Clouds| broken cloudsr|
| 2| 1| 16| 4090| 10| Active| DKK|Withdrawal| 3992| null| null| 0.0| 32| 802| Clouds| scattered cloudsr|
+--------+------+--------------+-------+------------+----------+--------+----------+------------------+------------+-----------
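One possible shape for such a query, as a sketch under stated assumptions: explicit joins, a GROUP BY on the ATM columns, and a conditional sum so that only inactive transactions are counted. It assumes that `dimen_atm.atm_location_id` references `dimen_location.location_id` and that the inactive state is stored as the literal `'Inactive'`; neither is confirmed by the schema above. The rows and location names are hypothetical, and Python's sqlite3 module is used only to make the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dimen_location (location_id INTEGER, location TEXT);
    CREATE TABLE dimen_atm (atm_id INTEGER, atm_number TEXT,
                            atm_manufacturer TEXT, atm_location_id INTEGER);
    CREATE TABLE fact_atm_trans (trans_id INTEGER, atm_id INTEGER, atm_status TEXT);
    -- hypothetical sample rows
    INSERT INTO dimen_location VALUES (16, 'Location 16'), (64, 'Location 64');
    INSERT INTO dimen_atm VALUES (1, '1', 'NCR', 16), (2, '2', 'NCR', 64);
    INSERT INTO fact_atm_trans VALUES
        (1, 1, 'Active'), (2, 1, 'Active'), (3, 1, 'Inactive'),
        (4, 2, 'Inactive'), (5, 2, 'Inactive');
""")

top_inactive = conn.execute("""
    SELECT a.atm_number,
           a.atm_manufacturer,
           l.location,
           count(f.trans_id) AS total_transaction_count,
           -- count only the inactive transactions, not every non-null status
           sum(CASE WHEN f.atm_status = 'Inactive' THEN 1 ELSE 0 END) AS inactive_count
    FROM fact_atm_trans f
    JOIN dimen_atm a      ON a.atm_id = f.atm_id
    JOIN dimen_location l ON l.location_id = a.atm_location_id  -- assumed key
    GROUP BY a.atm_number, a.atm_manufacturer, l.location
    ORDER BY inactive_count DESC
    LIMIT 10
""").fetchall()

print(top_inactive)
```

Note that `count(c.atm_status)` counts every non-null status, so a filter or a conditional aggregate is needed to isolate the inactive rows.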
I have a table which contains the following schema:
Table1
+------------------+--------------------+-------------------+-------------+-------------+
|student_id|project_id|name|project_name|approved|evaluation_type|grade| cohort_number|
I have another table with the following:
Table2
+-------------+----------+
|cohort_number|project_id|
My problem is: for each student_id, I want to get the projects that the student has not completed (no rows in Table1). I know which projects a student should have done from the cohort_number. Basically I need the "difference" between the two tables: I want to fill Table1 with the missing entries by comparing against the Table2 project_id values for that cohort_number.
I am not sure if I was clear.
I tried using LEFT JOIN, but I only get records where it matches. (I need the opposite)
Example:
Table1
|student_id|project_id|name| project_name| approved|evaluation_type| grade|cohort_number|
+----------+----------+--------------------+------+--------------------+--------+---------------+------------------
| 13| 18|Name| project/sd-03-bloc...| true| standard| 1.0| 3|
| 13| 7|Name| project/sd-03-bloc...| true| standard| 1.0| 3|
| 13| 27|Name| project/sd-03-bloc...| true| standard| 1.0| 3|
Table2
+-------------+----------+
|cohort_number|project_id|
+-------------+----------+
| 3| 18|
| 3| 27|
| 4| 15|
| 3| 7|
| 3| 35|
I want:
|student_id|project_id|name| project_name| approved|evaluation_type| grade|cohort_number|
+----------+----------+--------------------+------+--------------------+--------+---------------+------------------
| 13| 18|Name| project/sd-03-bloc...| true| standard| 1.0| 3|
| 13| 7|Name| project/sd-03-bloc...| true| standard| 1.0| 3|
| 13| 27|Name| project/sd-03-bloc...| true| standard| 1.0| 3|
| 13| 35|Name| project/sd-03-bloc...| false| standard| 0| 3|
Thanks in advance
If I followed you correctly, you can get all distinct (student_id, cohort_number, name) tuples from table1, and then bring all corresponding rows from table2. This basically gives you one row for each project that a student should have completed.
You can then bring table1 with a left join. "Missing" projects are identified by null values in columns project_name, approved, evaluation_type, grade.
select
s.student_id,
t2.project_id,
s.name,
t1.project_name,
t1.approved,
t1.evaluation_type,
t1.grade,
s.cohort_number
from (select distinct student_id, cohort_number, name from table1) s
inner join table2 t2
on t2.cohort_number = s.cohort_number
left join table1 t1
on t1.student_id = s.student_id
and t1.project_id = t2.project_id
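A quick way to check this approach is to run it on the example rows. The sketch below uses Python's sqlite3 module as a stand-in engine. The `project_name` values are hypothetical placeholders (the real names are truncated in the question), `approved` is stored as 0/1, and the left join is written against `t2.project_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (student_id INTEGER, project_id INTEGER, name TEXT,
                         project_name TEXT, approved INTEGER,
                         evaluation_type TEXT, grade REAL, cohort_number INTEGER);
    CREATE TABLE table2 (cohort_number INTEGER, project_id INTEGER);
    -- project names below are placeholders, not the real values
    INSERT INTO table1 VALUES
        (13, 18, 'Name', 'project-18', 1, 'standard', 1.0, 3),
        (13,  7, 'Name', 'project-7',  1, 'standard', 1.0, 3),
        (13, 27, 'Name', 'project-27', 1, 'standard', 1.0, 3);
    INSERT INTO table2 VALUES (3, 18), (3, 27), (4, 15), (3, 7), (3, 35);
""")

rows = conn.execute("""
    SELECT s.student_id, t2.project_id, s.name,
           t1.project_name, t1.approved, t1.evaluation_type, t1.grade,
           s.cohort_number
    FROM (SELECT DISTINCT student_id, cohort_number, name FROM table1) s
    INNER JOIN table2 t2
        ON t2.cohort_number = s.cohort_number      -- every project the cohort requires
    LEFT JOIN table1 t1
        ON t1.student_id = s.student_id
       AND t1.project_id = t2.project_id           -- NULLs here mean "not completed"
    ORDER BY t2.project_id
""").fetchall()

missing = [r for r in rows if r[3] is None]
print(missing)  # [(13, 35, 'Name', None, None, None, None, 3)]
```

If default values are wanted instead of nulls, wrapping the `t1` columns in `coalesce(...)` is one option.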
I have sql code which works perfectly:
val sql ="""
select a.*,
b.fOOS,
b.prevD
from dataFrame as a
left join dataNoPromoFOOS as b on
a.shopId = b.shopId and a.skuId = b.skuId and
a.Date > b.date and a.date <= b.prevD
"""
result:
+------+------+----------+-----+-----+------------------+---+----------+------------------+----------+
|shopId| skuId| date|stock|sales| salesRub| st|totalPromo| fOOS| prevD|
+------+------+----------+-----+-----+------------------+---+----------+------------------+----------+
| 200|154057|2017-03-31|101.0| 49.0| 629.66| 1| 0|58.618803952304724|2017-03-31|
| 200|154057|2017-09-11|116.0| 76.0| 970.67| 1| 0| 63.3344597217295|2017-09-11|
| 200|154057|2017-11-10| 72.0| 94.0| 982.4599999999999| 1| 0|59.019226118850405|2017-11-10|
| 200|154057|2018-10-08|126.0| 34.0| 414.44| 1| 0| 55.16878756270067|2018-10-08|
| 200|154057|2016-08-03|210.0| 27.0| 307.43| 1| 0|23.530049844711286|2016-08-03|
| 200|154057|2016-09-03| 47.0| 20.0| 246.23| 1| 0|24.656378380329674|2016-09-03|
| 200|154057|2016-12-31| 66.0| 30.0| 386.5| 1| 1| 26.0423103074891|2017-01-09|
| 200|154057|2017-02-28| 22.0| 61.0| 743.2899999999998| 1| 0| 54.86808157636879|2017-02-28|
| 200|154057|2017-03-16| 79.0| 41.0|505.40999999999997| 1| 0| 49.79449369431623|2017-03-16|
When I write the same join in Scala, it doesn't work:
dataFrame.join(dataNoPromoFOOS,
dataFrame("shopId") === dataNoPromoFOOS("shopId") &&
dataFrame("skuId") === dataNoPromoFOOS("skuId") &&
(dataFrame("date").lt(dataNoPromoFOOS("date"))) &&
(dataFrame("date").geq(dataNoPromoFOOS("prevD"))) ,
"left"
).select(dataFrame("*"),dataNoPromoFOOS("fOOS"),dataNoPromoFOOS("prevD"))
result:
+------+------+----------+-----+-----+------------------+---+----------+----+-----+
|shopId| skuId| date|stock|sales| salesRub| st|totalPromo|fOOS|prevD|
+------+------+----------+-----+-----+------------------+---+----------+----+-----+
| 200|154057|2016-09-24|288.0| 34.0| 398.66| 1| 0|null| null|
| 200|154057|2017-06-11| 40.0| 38.0| 455.32| 1| 1|null| null|
| 200|154057|2017-08-18| 83.0| 20.0|226.92000000000002| 1| 1|null| null|
| 200|154057|2018-07-19|849.0| 58.0| 713.12| 1| 0|null| null|
| 200|154057|2018-08-11|203.0| 52.0| 625.74| 1| 0|null| null|
| 200|154057|2016-09-01|120.0| 24.0| 300.0| 1| 1|null| null|
| 200|154057|2016-12-22| 62.0| 30.0| 378.54| 1| 0|null| null|
| 200|154057|2017-05-11|105.0| 49.0| 597.12| 1| 0|null| null|
| 200|154057|2016-12-28| 3.0| 36.0| 433.11| 1| 1|null| null|
Does somebody know why the SQL code works but the Scala code doesn't join the tables?
I think it's the date column, but I don't understand how to find my error.
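One thing worth checking is the direction of the comparisons. In Spark's Column API, `lt` means `<` and `geq` means `>=`, so the Scala condition reads `a.date < b.date AND a.date >= b.prevD`, while the SQL reads `a.date > b.date AND a.date <= b.prevD` (which would be `gt` and `leq`). The sketch below replays both conditions in plain SQL through Python's sqlite3 module. The single row in each table is hypothetical and chosen so that `b.date < a.date <= b.prevD`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dataFrame (shopId INTEGER, skuId INTEGER, date TEXT);
    CREATE TABLE dataNoPromoFOOS (shopId INTEGER, skuId INTEGER, date TEXT,
                                  prevD TEXT, fOOS REAL);
    -- hypothetical rows
    INSERT INTO dataFrame VALUES (200, 154057, '2017-01-05');
    INSERT INTO dataNoPromoFOOS VALUES (200, 154057, '2016-12-31', '2017-01-09', 26.04);
""")

query = """
    SELECT a.date, b.fOOS, b.prevD
    FROM dataFrame AS a
    LEFT JOIN dataNoPromoFOOS AS b
        ON a.shopId = b.shopId AND a.skuId = b.skuId AND {date_condition}
"""

# Condition as written in the SQL version.
sql_like = conn.execute(
    query.format(date_condition="a.date > b.date AND a.date <= b.prevD")
).fetchall()

# Condition as written in the Scala version: lt is <, geq is >=.
scala_like = conn.execute(
    query.format(date_condition="a.date < b.date AND a.date >= b.prevD")
).fetchall()

print(sql_like)    # [('2017-01-05', 26.04, '2017-01-09')]
print(scala_like)  # [('2017-01-05', None, None)]
```

With the comparisons reversed, the left join still returns every left row, but nothing on the right side matches, so `fOOS` and `prevD` come back null.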
I am trying to join two datasets in Spark (version 2.1):
SELECT *
FROM Tb1
INNER JOIN Tb2
ON Tb1.key1=Tb2.key1
OR Tb1.key2=Tb2.Key2
But it results in a cross join. How can I join the two tables and get only matching records?
I also tried a left outer join, but it also forces me to change to a cross join.
Try this method
from pyspark.sql import SQLContext as SQC
# `sc` is an existing SparkContext (predefined in the pyspark shell)
sqc = SQC(sc)
x = [(1,2,3), (4,5,6), (7,8,9), (10,11,12), (13,14,15)]
y = [(1,4,5), (4,5,6), (10,11,16),(34,23,31), (56,14,89)]
x_df = sqc.createDataFrame(x,["x","y","z"])
y_df = sqc.createDataFrame(y,["x","y","z"])
cond = [(x_df.x == y_df.x) | ( x_df.y == y_df.y)]
x_df.join(y_df,cond, "inner").show()
output
+---+---+---+---+---+---+
| x| y| z| x| y| z|
+---+---+---+---+---+---+
| 1| 2| 3| 1| 4| 5|
| 4| 5| 6| 4| 5| 6|
| 10| 11| 12| 10| 11| 16|
| 13| 14| 15| 56| 14| 89|
+---+---+---+---+---+---+
By joining it twice:
select *
from Tb1
inner join Tb2
on Tb1.key1=Tb2.key1
inner join Tb2 as Tb22
on Tb1.key2=Tb22.Key2
Or Left joining both:
select *
from Tb1
left join Tb2
on Tb1.key1=Tb2.key1
left join Tb2 as Tb22
on Tb1.key2=Tb22.Key2
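Before swapping the OR condition for another form, it may help to compare the results on a small sample. The sketch below reuses the rows from the PySpark example above, with `x` and `y` renamed to `key1` and `key2`, and runs three variants through Python's sqlite3 module (a stand-in engine, not Spark): the OR join, a UNION of two single-key joins, and the double inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tb1 (key1 INTEGER, key2 INTEGER, z INTEGER);
    CREATE TABLE Tb2 (key1 INTEGER, key2 INTEGER, z INTEGER);
    INSERT INTO Tb1 VALUES (1,2,3), (4,5,6), (7,8,9), (10,11,12), (13,14,15);
    INSERT INTO Tb2 VALUES (1,4,5), (4,5,6), (10,11,16), (34,23,31), (56,14,89);
""")

# Reference result: match on either key.
or_join = conn.execute("""
    SELECT Tb1.key1, Tb1.key2, Tb2.key1, Tb2.key2
    FROM Tb1 JOIN Tb2
      ON Tb1.key1 = Tb2.key1 OR Tb1.key2 = Tb2.key2
    ORDER BY 1
""").fetchall()

# Two equi-joins combined with UNION, which also removes pairs found by both.
union_join = conn.execute("""
    SELECT Tb1.key1, Tb1.key2, Tb2.key1, Tb2.key2
    FROM Tb1 JOIN Tb2 ON Tb1.key1 = Tb2.key1
    UNION
    SELECT Tb1.key1, Tb1.key2, Tb2.key1, Tb2.key2
    FROM Tb1 JOIN Tb2 ON Tb1.key2 = Tb2.key2
    ORDER BY 1
""").fetchall()

# Joining Tb2 twice with INNER JOIN keeps only Tb1 rows that match on both keys.
double_inner = conn.execute("""
    SELECT Tb1.key1, Tb1.key2
    FROM Tb1
    JOIN Tb2          ON Tb1.key1 = Tb2.key1
    JOIN Tb2 AS Tb22  ON Tb1.key2 = Tb22.key2
    ORDER BY 1
""").fetchall()

print(or_join)       # [(1, 2, 1, 4), (4, 5, 4, 5), (10, 11, 10, 11), (13, 14, 56, 14)]
print(union_join)    # same four rows
print(double_inner)  # [(4, 5), (10, 11)]
```

On this sample the UNION form returns the same rows as the OR join, while the double inner join drops rows that match on only one key. UNION removes duplicates, so the two forms can differ when the inputs contain fully duplicated rows.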