RANKX issues using DAX - sql

I was working on the conversion of a sql request into a dax one :
>;WITH cte
>AS
>(
> SELECT uc.idU_Email
> ,uc.id_Type
> ,uc.dRecueil
> ,valeur
> ,ROW_NUMBER() OVER(PARTITION BY idU_Email,id_Type ORDER BY dRecueil DESC, valeur ASC) AS rang
> FROM vUE uc
> WHERE uc.Id_Type=1 AND uc.dRecueil<=(DATEADD(month, -1, GETDATE()))
>)
>SELECT COUNT(idU_Email)
> FROM cte
> WHERE rang = 1
> and valeur = 1
If i understood well, the equivalent of the rownumber function in sql in the RANKX Function in DAX.
So basically, to achieve the conversion i created a new calculated colum with the following expression :
Ranking on partition
To resume it, it makes a ranking on every parition of idU_Email order by dRecueil and then by valeur.
Thus, the only thing left to do was to add a condition on the date (like in my SQL request), but i guess i don't really know how (or where?), and an error occurred when i tried, which i haven't achieved to solved yet :
Ranking on partition with filter on date
I hope someone would find a way to solve that issue (or even to propose me a better way to have the equivalent of my SQL request).
Thanks by advance ! Smiley Happy
[UPDATE]
I've achieved to partition as I wanted to do and to apply a date filter like that :
RANK_OnDate = IF(vUE[dRecueil] <= DATE(YEAR(TODAY()); MONTH(TODAY())-1; DAY(TODAY()))
;RANKX(FILTER(vUE; vUE[idU_Email] = EARLIER(vUE[idU_Email]) && vUE[id_Type] = EARLIER(vUE[id_Type]))
;RANKX(ALL(vUE); vUE[dRecueil]; ;ASC)
+
DIVIDE(
RANKX(ALL(vUE); vUE[valeur]; ; DESC; Skip)
;COUNTROWS(ALL(vUE)) + 1
)
)
)
But, actually, i would like it to filter dynamically... Thus i don't know if it would be better to use that directly in a measure (according that i want to count distinctly the lower/higher rank for each idU_email).
Additionally my filter just apply blank when the date isn't ok with the filter but the ranking stay the same...
I tried to do directly the distinct count in a measure but can't save the different issues encountered... Have you got some ideas ?
(and thanks for your answers) :)

I think you're pretty close, try this
Rank = IF(vUE[dRecueil] <= DATE(YEAR(TODAY()), MONTH(TODAY())-1, DAY(TODAY())),
RANKX(FILTER(vUE, vUE[idU_Email] = EARLIER(vUE[idU_Email])),
RANKX(ALL(vUE), vUE[dRecueil], ,ASC) +
DIVIDE(
RANKX(ALL(vUE), vUE[valeur], , DESC, Skip),
COUNTROWS(ALL(vUE)) + 1)))

Related

How to return distinct rows while keeping the ordering in a query (SQL Alchemy)

I've been stuck on this for a few days now. An event can have multiple dates, and I want the query to only return the date closest to today (the next date). I have considered querying for Events and then adding a hybrid property to Event that returns the next Event Date but I believe this won't work out (such as if I want to query EventDates in a certain range).
I'm having a problem with distinct() not working as I would expect. Keep in mind I'm not a SQL expert. Also, I'm using postgres.
My query starts like this:
distance_expression = func.ST_Distance(
cast(EventLocation.geo, Geography(srid=4326)),
cast("SRID=4326;POINT(%f %f)" % (lng, lat), Geography(srid=4326)),
)
query = (
db.session.query(EventDate)
.populate_existing()
.options(
with_expression(
EventDate.distance,
distance_expression,
)
)
.join(Event, EventDate.event_id == Event.id)
.join(EventLocation, EventDate.location_id == EventLocation.id)
)
And then I have multiple filters (just showing a few for as an example)
query = query.filter(EventDate.start >= datetime.utcnow)
if kwargs.get("locality_id", None) is not None:
query = query.filter(EventLocation.locality_id == kwargs.pop("locality_id"))
if kwargs.get("region_id", None) is not None:
query = query.filter(EventLocation.region_id == kwargs.pop("region_id"))
if kwargs.get("country_id", None) is not None:
query = query.filter(EventLocation.country_id == kwargs.pop("country_id"))
Then I want to order by date and distance (using my query expression)
query = query.order_by(
EventDate.start.asc(),
distance_expression.asc(),
)
And finally I want to get distinct rows, and only return the next EventDate of an event, according to the ordering in the code block above.
query = query.distinct(Event.id)
The problem is that this doesn't work and I get a database error. This is what the generated SQL looks like:
SELECT DISTINCT ON (events.id) ST_Distance(CAST(event_locations.geo AS geography(GEOMETRY,4326)), CAST(ST_GeogFromText(%(param_1)s) AS geography(GEOMETRY,4326))) AS "ST_Distance_1", event_dates.id AS event_dates_id, event_dates.created_at AS event_dates_created_at, event_dates.event_id AS event_dates_event_id, event_dates.tz AS event_dates_tz, event_dates.start AS event_dates_start, event_dates."end" AS event_dates_end, event_dates.start_naive AS event_dates_start_naive, event_dates.end_naive AS event_dates_end_naive, event_dates.location_id AS event_dates_location_id, event_dates.description AS event_dates_description, event_dates.description_attribute AS event_dates_description_attribute, event_dates.url AS event_dates_url, event_dates.ticket_url AS event_dates_ticket_url, event_dates.cancelled AS event_dates_cancelled, event_dates.size AS event_dates_size
FROM event_dates JOIN events ON event_dates.event_id = events.id JOIN event_locations ON event_dates.location_id = event_locations.id
WHERE events.hidden = false AND event_dates.start >= %(start_1)s AND (event_locations.lat BETWEEN %(lat_1)s AND %(lat_2)s OR false) AND (event_locations.lng BETWEEN %(lng_1)s AND %(lng_2)s OR false) AND ST_DWithin(CAST(event_locations.geo AS geography(GEOMETRY,4326)), CAST(ST_GeogFromText(%(param_2)s) AS geography(GEOMETRY,4326)), %(ST_DWithin_1)s) ORDER BY event_dates.start ASC, ST_Distance(CAST(event_locations.geo AS geography(GEOMETRY,4326)), CAST(ST_GeogFromText(%(param_3)s) AS geography(GEOMETRY,4326))) ASC
I've tried a lot of different things and orderings but I can't work this out. I've also tried to create a subquery at the end using from_self() but it doesn't keep the ordering.
Any help would be much appreciated!
EDIT:
On further experimentation it seems that I can't use order_by will only work if it's ordering the same field that I'm using for distinct(). So
query = query.order_by(EventDate.event_id).distinct(EventDate.event_id)
will work, but
query.order_by(EventDate.start).distinct(EventDate.event_id)
will not :/
I solved this by using adding a row_number column and then filtering by the first row numbers like in this answer:
filter by row_number in sqlalchemy

Getting error "SQL compilation error: Unsupported subquery type cannot be evaluated"

Am new to Snowflake programming though I had much experience in Oracle DB.
When am running the below query in Snowflake am getting the error as
SQL compilation error: Unsupported subquery type cannot be evaluated
SELECT organization_id,
inventory_item_id,
revision,
effectivity_date,
revision_label,
revision_id
FROM cg1_mtl_item_revisions_b mir
WHERE effectivity_date IN
(SELECT FIRST_VALUE (ir2.effectivity_date)
OVER (ORDER BY ir2.effectivity_date DESC)
effectivity_date
FROM cg1_mtl_item_revisions_b ir2
WHERE ir2.inventory_item_id = mir.inventory_item_id
AND ir2.organization_id = mir.organization_id
AND ir2.effectivity_date <= CURRENT_DATE
AND ir2.implementation_date IS NOT NULL)
AND mir.revision IN
(SELECT FIRST_VALUE (ir3.revision)
OVER (ORDER BY ir3.revision DESC)
revision
FROM cg1_mtl_item_revisions_b ir3
WHERE ir3.inventory_item_id = mir.inventory_item_id
AND ir3.organization_id = mir.organization_id
AND ir3.implementation_date IS NOT NULL
AND ir3.effectivity_date = mir.effectivity_date);
Am I missing something here??
Can someone plz help me here.
Thanks in Advance,
Sudarshan
The Snowflake database doesn't support correlated subqueries as extensively as Oracle does.
You have to find a way to rewrite, eg. using
WITH <common table expressions ...>
SELECT ...
JOIN ...
You seem to want the latest revision from the latest effective date. Window functions are probably a better approach in any database:
SELECT mir.* -- whatever columns you want
FROM (SELECT mir.*,
ROW_NUMBER() OVER (PARTITION BY mir.inventory_item_id, mir.organization_id
ORDER BY mir.effectivity_date DESC, mir.revision DESC) as seqnum
FROM cg1_mtl_item_revisions_b mir
) mir
WHERE seqnum = 1;
I have figured out that the third condition of the join in the last sub-query is throwing this error. If we comment out the following line then the code returns data:
{ AND ir3.effectivity_date = mir.effectivity_date }
So it seems three join conditions in a sub-query are not getting supported.
We need to work out on finding an alternate piece of code to satisfy the above condition so that we get the correct result set.
Another approach may be to use lateral joins - https://docs.snowflake.com/en/sql-reference/constructs/join-lateral.html
One important thing about lateral joins is it will only return matched records.

Collapse Data in Sql without stored precedure or function if a value is the same as the value from row above

I got a problem regarding grouping if a value is the same as in the row above.
Our statement looks like this:
SELECT pat_id,
treatData.treatmentdate AS Date,
treatMeth.name AS TreatDataTableInfo,
treatData.treatmentid AS TreatID
FROM dialysistreatmentdata treatData
LEFT JOIN hdtreatmentmethods treatMeth
ON treatMeth.id = treatData.hdtreatmentmethodid
WHERE treatData.hdtreatmentmethodid IS NOT NULL
AND Year(treatData.treatmentdate) >= 2013
AND ekeyid = 12
ORDER BY treatData.ekeyid,
treatmentdate DESC,
treatdatatableinfo;
The output looks like this:
The desired output should be grouped if the value is the same as in the row/rows before and ther should be a ToDate as you can see in the screenshot which is the date of the next row -1 day.
The desired output should look like this:
I hope someone has a solution regarding this matter!
Or maybe someone has an idea how to solve this problem within qlikview.
Looking forward for solutions
Michael
You want to collapse episodes of treatment into single rows. This is a "gaps-and-islands" problem. I like the difference of row numbers approach:
select patid, min(date) as fromdate, max(date) as todate, TreatDataTableInfo,
min(treatid)
from (select td.Pat_ID, td.TreatmentDate As Date, tm.Name As TreatDataTableInfo,
td.TreatmentID As TreatID,
row_number() over (partition by td.pat_id order by td.treatmentdate) as seqnum_p,
row_number() over (partition by td.pat_id, tm.name order by td.treatment_date) as seqnum_pn
from DialysisTreatmentData td Left join
HDTreatmentMethods tm
On tm.ID = td.HDTreatmentMethodID
where td.HDTreatmentMethodID Is Not Null And
td.TreatmentDate) >= '2013-01-01' and
EKeyID = 12
) t
group by patid, TreatDataTableInfo, (seqnum_p - seqnum_pn)
order by patid, TreatmentDate Desc, TreatDataTableInfo;
Note: This uses the ANSI standard window function row_number(), which is available in most databases.
Below is a possible Qlikview solution. I've put some comments in the script. If it's not clear just let me know. The result picture is below the script.
RawData:
Load * Inline [
Pat_ID,Date,TreatDataTableInfo,TreatId
PatNum_12,08.07.2016,HDF Pradilution,1
PatNum_12,07.07.2016,HDF Predilution,2
PatNum_12,23.03.2016,HD,3
PatNum_12,24.11.2015,HD,4
PatNum_12,22.11.2015,HD,5
PatNum_12,04.09.2015,HD,6
PatNum_12,01.09.2015,HD,7
PatNum_12,30.07.2015,HD,8
PatNum_12,12.01.2015,HD,9
PatNum_12,09.01.2015,HD,10
PatNum_12,26.08.2014,Hemodialysis,11
PatNum_12,08.07.2014,Hemodialysis,12
PatNum_12,23.05.2014,Hemodialysis,13
PatNum_12,19.03.2014,Hemodialysis,14
PatNum_12,29.01.2014,Hemodialysis,15
PatNum_12,14.12.2013,Hemodialysis,16
PatNum_12,26.10.2013,Hemodialysis,17
PatNum_12,05.10.2013,Hemodialysis,18
PatNum_12,03.10.2013,HD,19
PatNum_12,24.06.2013,Hemodialysis,20
PatNum_12,03.06.2013,Hemodialysis,21
PatNum_12,14.05.2013,Hemodialysis,22
PatNum_12,26.02.2013,HDF Postdilution,23
PatNum_12,23.02.2013,HDF Pradilution,24
PatNum_12,21.02.2013,HDF Postdilution,25
PatNum_12,07.02.2013,HD,26
PatNum_12,25.01.2013,HDF Pradilution,27
PatNum_12,18.01.2013,HDF Pradilution,28
];
GroupedData:
Load
*,
// assign new GroupId for all rows where the TreatDataTableInfo is equal
if( RowNo() = 1, 1,
if( TreatDataTableInfo <> peek('TreatDataTableInfo'),
peek('GroupId') + 1, peek('GroupId'))) as GroupId,
// assign new GroupSubId (incremental int) for all the records in each group
if( TreatDataTableInfo <> peek('TreatDataTableInfo'),
1, peek('GroupSubId') + 1) as GroupSubId,
// pick the first Date field value and spread it acccross the group
if( TreatDataTableInfo <> peek('TreatDataTableInfo'), TreatId, peek('TreatId_Temp')) as TreatId_Temp
Resident
RawData
;
Drop Table RawData;
right join (GroupedData)
// get the max GroupSubId for each group and right join it to
// the GroupedData table to remove the records we dont need
MaxByGroup:
Load
max(GroupSubId) as GroupSubId,
GroupId
Resident
GroupedData
Group By
GroupId
;
// these are not needed anymore
Drop Fields GroupId, GroupSubId, TreatId;
// replace the old TreatId with the new TreatId_Temp field
// which contains the first TreatId for each group
Rename Field TreatId_Temp to TreatId;

Selecting the last set of records in SQL Server that match certain criteria

I've got 10 records that match the criteria I'm searching for. The problem is there are two sets of 10 records, one at 1pm and one at 3pm. I only want the sets at 3pm. Here's part of my SQL:
select shp_rev.ShpNum, shp_rev.RevTime
from shp_rev
where shp_rev.RevDate = '10/1/2015'
and shp_rev.ValAfter = 'O'
and shp_rev.ShpNum = 732809
(I've added the shp_rev.ShpNum to the where just to narrow down the data to a dataset that has this problem. Normally I wouldn't have that in the where. And there are other fields that would be included with this select.)
This produces:
732809 13:14:45
732809 13:14:45
...
732809 15:23:33
732809 15:23:33
...
I only want the records at 15:23:33. I know one way I can do it is to concatenate all of the fields I want into one string with ShpNum and RevTime at the beginning then using MAX() to get just the 3pm records like this:
select max(cast(shp_rev.ShpNum as varchar) + '~' + shp_rev.RevTime)
from shp_rev
where shp_rev.RevDate = '10/1/2015'
and shp_rev.ValAfter = 'O'
and shp_rev.ShpNum = 732809
order by max(cast(shp_rev.ShpNum as varchar) + '~' + shp_rev.RevTime)
But then I have to parse back that string to get everything. It seems there must be a better way. I've tried using MAX() on ShpNum and RevTime separately but that doesn't work. Any thoughts? Oh, I'm working in SQL Server 2012. Thanks!
If I understand correctly, you can use dense_rank():
select s.*
from (select shp_rev.ShpNum, shp_rev.RevTime,
dense_rank() over (partition by revdate, valafter, shipnum order by revtime desc) as seqnum
from shp_rev
where shp_rev.RevDate = '2015-10-01' and
shp_rev.ValAfter = 'O' and
shp_rev.ShpNum = 732809
) s
where seqnum = 1;
This assumes that the time stamps are all exactly the same.

SubQuery Aggregates in ActiveRecord

I'm trying to avoid using straight up SQL in my Rails app, but need to do a quite large version of this:
SELECT ds.product_id,
( SELECT SUM(units) FROM daily_sales WHERE (date BETWEEN '2015-01-01' AND '2015-01-08') AND service_type = 1 ) as wk1,
( SELECT SUM(units) FROM daily_sales WHERE (date BETWEEN '2015-01-09' AND '2015-01-16') AND service_type = 1 ) as wk2
FROM daily_sales as ds group by ds.product_id
I'm sure it can be done, but i'm struggling to write this as an active record statement. Can anyone help?
If you must do this in a single query, you'll need to write some SQL for the CASE statements. The following is what you need:
ranges = [ # ordered array of all your date-ranges
Date.new(2015, 1, 1)..Date.new(2015, 1, 8),
Date.new(2015, 1, 9)..Date.new(2015, 1, 16)
]
overall_range = (ranges.first.min)..(ranges.last.max)
grouping_sub_str = \
ranges.map.with_index do |range, i|
"WHEN (date BETWEEN '#{range.min}' AND '#{range.max}') THEN 'week#{i}'"
end.join(' ')
grouping_condition = "CASE #{grouping_sub_str} END"
grouping_columns = ['product_id', grouping_condition]
DailySale.where(date: overall_range).group(grouping_columns).sum(:units)
That will produce a hash with array keys and numeric values. A key will be of the form [product_id, 'week1'] and the value will be the corresponding sum of units for that week.
Simplify your SQL to the following and try converting it..
SELECT ds.product_id,
, SUM(CASE WHEN date BETWEEN '2015-01-01' AND '2015-01-08' AND service_type = 1
THEN units
END) WK1
, SUM(CASE WHEN date BETWEEN '2015-01-09' AND '2015-01-16' AND service_type = 1
THEN units
END) WK2
FROM daily_sales as ds
group by ds.product_id
Every rail developer sooner or later hits his/her head against the walls of Active Record query interface just to find the solution in Arel.
Arel gives you the flexibility that you need in creating your query without using loops, etc. I am not going to give runnable code rather some hints how to do it yourself:
We are going to use arel_tables to create our query. For a model called for example Product, getting the Arel table is as easy as products = Product.arel_table
Getting sum of a column is like daily_sales.project(daily_sales[:units].count).where(daily_sales[:date].gt(BEGIN_DATE).where(daily_sales[:date].lt(END_DATE). You can chain as many wheres as you want and it will be translated into SQL ANDs.
Since we need to have multiple sums in our end result you need to make use of Common Table Expressions(CTE). Take a look at docs and this answer for more info on this.
You can use those CTEs from step 3 in combination with group and you are done!