How to return distinct rows while keeping the ordering in a query (SQLAlchemy)

I've been stuck on this for a few days now. An event can have multiple dates, and I want the query to only return the date closest to today (the next date). I have considered querying for Events and then adding a hybrid property to Event that returns the next Event Date but I believe this won't work out (such as if I want to query EventDates in a certain range).
I'm having a problem with distinct() not working as I would expect. Keep in mind I'm not a SQL expert. Also, I'm using PostgreSQL.
My query starts like this:
distance_expression = func.ST_Distance(
cast(EventLocation.geo, Geography(srid=4326)),
cast("SRID=4326;POINT(%f %f)" % (lng, lat), Geography(srid=4326)),
)
query = (
db.session.query(EventDate)
.populate_existing()
.options(
with_expression(
EventDate.distance,
distance_expression,
)
)
.join(Event, EventDate.event_id == Event.id)
.join(EventLocation, EventDate.location_id == EventLocation.id)
)
And then I have multiple filters (just showing a few as an example):
query = query.filter(EventDate.start >= datetime.utcnow())
if kwargs.get("locality_id", None) is not None:
query = query.filter(EventLocation.locality_id == kwargs.pop("locality_id"))
if kwargs.get("region_id", None) is not None:
query = query.filter(EventLocation.region_id == kwargs.pop("region_id"))
if kwargs.get("country_id", None) is not None:
query = query.filter(EventLocation.country_id == kwargs.pop("country_id"))
Then I want to order by date and distance (using my query expression)
query = query.order_by(
EventDate.start.asc(),
distance_expression.asc(),
)
And finally I want to get distinct rows, and only return the next EventDate of an event, according to the ordering in the code block above.
query = query.distinct(Event.id)
The problem is that this doesn't work and I get a database error. This is what the generated SQL looks like:
SELECT DISTINCT ON (events.id) ST_Distance(CAST(event_locations.geo AS geography(GEOMETRY,4326)), CAST(ST_GeogFromText(%(param_1)s) AS geography(GEOMETRY,4326))) AS "ST_Distance_1", event_dates.id AS event_dates_id, event_dates.created_at AS event_dates_created_at, event_dates.event_id AS event_dates_event_id, event_dates.tz AS event_dates_tz, event_dates.start AS event_dates_start, event_dates."end" AS event_dates_end, event_dates.start_naive AS event_dates_start_naive, event_dates.end_naive AS event_dates_end_naive, event_dates.location_id AS event_dates_location_id, event_dates.description AS event_dates_description, event_dates.description_attribute AS event_dates_description_attribute, event_dates.url AS event_dates_url, event_dates.ticket_url AS event_dates_ticket_url, event_dates.cancelled AS event_dates_cancelled, event_dates.size AS event_dates_size
FROM event_dates JOIN events ON event_dates.event_id = events.id JOIN event_locations ON event_dates.location_id = event_locations.id
WHERE events.hidden = false AND event_dates.start >= %(start_1)s AND (event_locations.lat BETWEEN %(lat_1)s AND %(lat_2)s OR false) AND (event_locations.lng BETWEEN %(lng_1)s AND %(lng_2)s OR false) AND ST_DWithin(CAST(event_locations.geo AS geography(GEOMETRY,4326)), CAST(ST_GeogFromText(%(param_2)s) AS geography(GEOMETRY,4326)), %(ST_DWithin_1)s) ORDER BY event_dates.start ASC, ST_Distance(CAST(event_locations.geo AS geography(GEOMETRY,4326)), CAST(ST_GeogFromText(%(param_3)s) AS geography(GEOMETRY,4326))) ASC
I've tried a lot of different things and orderings but I can't work this out. I've also tried to create a subquery at the end using from_self() but it doesn't keep the ordering.
Any help would be much appreciated!
EDIT:
On further experimentation, it seems that order_by will only work if it orders by the same field that I'm using for distinct(). So
query = query.order_by(EventDate.event_id).distinct(EventDate.event_id)
will work, but
query.order_by(EventDate.start).distinct(EventDate.event_id)
will not :/

I solved this by adding a row_number column and then filtering on the first row number, as in this answer:
filter by row_number in sqlalchemy
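For context, PostgreSQL requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions, which is why ordering by start while using DISTINCT ON (events.id) is rejected. Below is a minimal sketch of the row_number approach, run against an in-memory SQLite database (3.25 or newer) instead of PostgreSQL so it is self-contained. The table layout and the precomputed distance column are simplified assumptions, not the real schema.

```python
import sqlite3

# Hypothetical, trimmed-down schema: only the columns the illustration needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event_dates (
        id INTEGER PRIMARY KEY,
        event_id INTEGER,
        start TEXT,
        distance REAL  -- stands in for the ST_Distance expression
    );
    INSERT INTO event_dates (id, event_id, start, distance) VALUES
        (1, 10, '2030-03-01', 500.0),
        (2, 10, '2030-01-15', 500.0),
        (3, 20, '2030-02-01', 120.0),
        (4, 20, '2030-04-01', 120.0),
        (5, 30, '2030-01-10', 900.0);
""")

# Number the dates of each event by the desired ordering, keep only the
# first one per event, then apply the same ordering to the final result.
next_dates = conn.execute("""
    SELECT id, event_id, start
    FROM (
        SELECT ed.*,
               ROW_NUMBER() OVER (
                   PARTITION BY event_id
                   ORDER BY start ASC, distance ASC
               ) AS row_num
        FROM event_dates AS ed
    ) AS ranked
    WHERE row_num = 1
    ORDER BY start ASC, distance ASC
""").fetchall()

print(next_dates)
```

In SQLAlchemy the window column can be built with func.row_number().over(partition_by=..., order_by=...), with the row_num filter applied on a subquery, since a window function cannot be filtered in the same SELECT that defines it.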

Related

Aggregating a subquery

I am trying to find what I missed in the code below, which should retrieve the value of "Last_Maintenance" from a table called "Interventions".
I have tried to understand the ordering rules of SQL and the particularities of subqueries, without success.
Did I miss something basic or an important step?
---Interventions with PkState "Schedule_Visit" with the Last_Maintenance aggregation
SELECT Interventions.ID AS Nro_Inter,
--Interventions.PlacesList AS Nro_Place,
MaintenanceContracts.Num AS Nro_Contract,
Interventions.TentativeDate AS Schedule_Visit,
--MaintenanceContracts.NumberOfVisits AS Number_Visits_Contracts,
--Interventions.VisitNumber AS Visit_Number,
(SELECT MAX(Interventions.AssignmentDate)
FROM Interventions
WHERE PkState = 'AE4B42CF-0003-4796-89F2-2881527DFB26' AND PkMaintenanceContract IS NOT NULL) AS Last_Maintenance --PkState "Maintenance Executed"
FROM Interventions
INNER JOIN MaintenanceContracts ON MaintenanceContracts.Pk = Interventions.PkMaintenanceContract
WHERE PkState = 'AE4B42CF-0000-4796-89F2-2881527ABC26' AND PkMaintenanceContract IS NOT NULL --PkState "Schedule_Visit"
GROUP BY Interventions.AssignmentDate,
Interventions.ID,
Interventions.PlacesList,
MaintenanceContracts.Num,
Interventions.TentativeDate,
MaintenanceContracts.NumberOfVisits,
Interventions.VisitNumber
ORDER BY Nro_Contract
I tried to use GROUP BY and a HAVING clause in a subquery, but I did not succeed. Clearly I am lacking some understanding.
Output
The output of "Last_Maintenance" is the latest date across all contracts in the DB, which is not the desired output. The desired output is the last date the maintenance was executed for each row, meaning for each "Nro_Contract". Somehow I need to aggregate like I did below.
In contrast, I did succeed with another table, Contracts, as you can see:
SELECT
MaintenanceContracts.Num AS Nro_Contract,
MAX(Interventions.AssignmentDate) AS Last_Maintenance
--MaintenanceContracts.Name AS Place
--MaintenanceContracts.StartDate,
--MaintenanceContracts.EndDate
FROM MaintenanceContracts
INNER JOIN Interventions ON Interventions.PkMaintenanceContract = MaintenanceContracts.Pk
WHERE MaintenanceContracts.ActiveContract = 2 OR MaintenanceContracts.ActiveContract = 1 --// 2 = Inactive; 1 = Active
GROUP BY MaintenanceContracts.Num, MaintenanceContracts.Name,
MaintenanceContracts.StartDate,
MaintenanceContracts.EndDate
ORDER BY Nro_Contract
I am struggling to understand how nested queries work and how I can use them in a simple manner.
I think you're mixed up about how aggregation works. Without a GROUP BY or a link to the outer row, the MAX function returns a single value over everything its query selects, which is why you get the same date on every row. What you're trying to do is get a MAX for each contract. For that, you can use derived tables, correlated subqueries or window functions. I'm a fan of using the ROW_NUMBER() function to assign a sequence number. Done correctly, that row number lets you pick just the most recent record from each group. From your description, it sounds like you always want the contract plus some max values for that contract. If that is the case, then your second query is closer to what you need. Using window functions in derived tables has the added benefit of not having to worry about the GROUP BY clause. Try this:
SELECT
MaintenanceContracts.Num AS Nro_Contract,
--MaintenanceContracts.Name AS Place
--MaintenanceContracts.StartDate,
--MaintenanceContracts.EndDate
i.AssignmentDate as Last_Maintenance
FROM MaintenanceContracts
INNER JOIN (
SELECT *
--This function will order the records for each maintenance contract.
--The most recent intervention will have a row_num = 1
, ROW_NUMBER() OVER(PARTITION BY PkMaintenanceContract ORDER BY AssignmentDate DESC) as row_num
FROM Interventions
) as i
ON i.PkMaintenanceContract = MaintenanceContracts.Pk
AND i.row_num = 1 --Used to get the most recent intervention.
WHERE MaintenanceContracts.ActiveContract = 2
OR MaintenanceContracts.ActiveContract = 1 --// 2 = Inactive; 1 = Active
ORDER BY Nro_Contract
;
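As an alternative to the window function, the scalar subquery from the question can be correlated with the outer row so that MAX is computed per contract. Here is a minimal sketch on an in-memory SQLite database. The table and column names and the 'executed'/'scheduled' states are simplified stand-ins for the real schema and its GUID state keys.

```python
import sqlite3

# Hypothetical, simplified stand-in for the Interventions table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE interventions (
        id INTEGER PRIMARY KEY,
        contract_pk INTEGER,
        state TEXT,
        assignment_date TEXT
    );
    INSERT INTO interventions VALUES
        (1, 1, 'executed',  '2023-01-10'),
        (2, 1, 'executed',  '2023-06-05'),
        (3, 1, 'scheduled', NULL),
        (4, 2, 'executed',  '2022-11-20'),
        (5, 2, 'scheduled', NULL);
""")

# The inner query refers to s.contract_pk from the outer row, so it is
# evaluated once per scheduled visit instead of once for the whole table.
rows = conn.execute("""
    SELECT s.id,
           s.contract_pk,
           (SELECT MAX(e.assignment_date)
            FROM interventions AS e
            WHERE e.state = 'executed'
              AND e.contract_pk = s.contract_pk) AS last_maintenance
    FROM interventions AS s
    WHERE s.state = 'scheduled'
    ORDER BY s.contract_pk
""").fetchall()

print(rows)
```

Without the e.contract_pk = s.contract_pk condition, both rows would show '2023-06-05', which is the behaviour described in the question.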

Self joining columns from the same table with calculation on one column not displaying column name

I am fairly new to SQL and am having trouble solving the simple issue below. I am trying to self-join a dataset, using (b.calendar_year_number -1) as one of the join columns. I applied the -1 calculation to match values from the previous year. However, it is not working: the resulting column shows up as (No column name), as in the screenshot attached below. How do I change the alias to b.calendar_year_number after the calculation?
Code:
SELECT a.day_within_fiscal_period,
a.calendar_month_name,
a.cost_period_rolling_three_month_start_date,
a.calendar_year_number,
b.day_within_fiscal_period,
b.calendar_month_name,
b.cost_period_rolling_three_month_start_date,
(b.calendar_year_number -1)
FROM [data_mart].[v_dim_date_consumer_complaints] AS a
JOIN [data_mart].[v_dim_date_consumer_complaints] AS b
ON b.day_within_fiscal_period = a.day_within_fiscal_period AND
b.calendar_month_name = a.calendar_month_name AND
b.calendar_year_number = a.calendar_year_number
I am using (b.calendar_year_number -1) as one of the columns to join.
Nope, you're not. Look at your join statement and you'll see the third condition is:
b.calendar_year_number = a.calendar_year_number
So just change that to include the calculation. As for the 'no column name' issue, you can use either the colname = somelogic syntax (specific to SQL Server) or somelogic AS colname. Below, I used the former syntax.
select a.day_within_fiscal_period,
a.calendar_month_name,
a.cost_period_rolling_three_month_start_date,
a.calendar_year_number,
b.day_within_fiscal_period,
b.calendar_month_name,
b.cost_period_rolling_three_month_start_date,
bCalYearNum = b.calendar_year_number
from [data_mart].[v_dim_date_consumer_complaints] a
left join [data_mart].[v_dim_date_consumer_complaints] b
on b.day_within_fiscal_period = a.day_within_fiscal_period
and b.calendar_month_name = a.calendar_month_name
and b.calendar_year_number - 1 = a.calendar_year_number;
You could use the analytic function LAG (or LEAD) to get the required result, with no self-join necessary:
select a.day_within_fiscal_period,
a.calendar_month_name,
a.cost_period_rolling_three_month_start_date,
a.calendar_year_number,
old_cost_period_rolling_three_month_start_date =
LAG(cost_period_rolling_three_month_start_date) OVER
(PARTITION BY calendar_month_name, day_within_fiscal_period
ORDER BY calendar_year_number),
old_CalYearNum = LAG(calendar_year_number) OVER
(PARTITION BY calendar_month_name, day_within_fiscal_period
ORDER BY calendar_year_number)
from [data_mart].[v_dim_date_consumer_complaints] a
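To see what LAG returns here, a small sketch on an in-memory SQLite database (3.25 or newer) follows. The table is a cut-down, hypothetical version of the date dimension with made-up rows. Note that LAG returns the previous row within the partition, which is the previous year only when no year is missing.

```python
import sqlite3

# Hypothetical, cut-down date dimension with one row per year.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (
        calendar_year_number INTEGER,
        calendar_month_name TEXT,
        day_within_fiscal_period INTEGER,
        start_date TEXT
    );
    INSERT INTO dim_date VALUES
        (2021, 'January', 1, '2020-10-01'),
        (2022, 'January', 1, '2021-10-01'),
        (2023, 'January', 1, '2022-10-01');
""")

# LAG looks one row back inside each (month, day) partition, ordered by year.
rows = conn.execute("""
    SELECT calendar_year_number,
           LAG(calendar_year_number) OVER (
               PARTITION BY calendar_month_name, day_within_fiscal_period
               ORDER BY calendar_year_number
           ) AS old_cal_year_num,
           LAG(start_date) OVER (
               PARTITION BY calendar_month_name, day_within_fiscal_period
               ORDER BY calendar_year_number
           ) AS old_start_date
    FROM dim_date
    ORDER BY calendar_year_number
""").fetchall()

print(rows)
```

The first year in each partition has no predecessor, so its lagged columns come back as NULL (None in Python).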

SQL GROUP BY function returning incorrect SUM amount

I've been working on this problem, researching what I could be doing wrong but I can't seem to find an answer or fault in the code that I've written. I'm currently extracting data from a MS SQL Server database, with a WHERE clause successfully filtering the results to what I want. I get roughly 4 rows per employee, and want to add together a value column. The moment I add the GROUP BY clause against the employee ID, and put a SUM against the value, I'm getting a number that is completely wrong. I suspect the SQL code is ignoring my WHERE clause.
Below is a small selection of data:
hr_empl_code hr_doll_paid
1 20.5
1 51.25
1 102.49
1 560
I expect that a GROUP BY and SUM would give me the value of 734.24. The value I'm given is 211461.12. While troubleshooting, I added a COUNT(*) column to my query to work out how many rows it's running against, and it gives a result of 1152, which further reinforces my belief that it's ignoring my WHERE clause.
My SQL code is as below. Most of it has been generated by the front-end application that I'm running it from, so there is some additional code in there that I believe does assist the query.
SELECT DISTINCT
T000.hr_empl_code,
SUM(T175.hr_doll_paid)
FROM
hrtempnm T000,
qmvempms T001,
hrtmspay T166,
hrtpaytp T175,
hrtptype T177
WHERE 1 = 1
AND T000.hr_empl_code = T001.hr_empl_code
AND T001.hr_empl_code = T166.hr_empl_code
AND T001.hr_empl_code = T175.hr_empl_code
AND T001.hr_ploy_ment = T166.hr_ploy_ment
AND T001.hr_ploy_ment = T175.hr_ploy_ment
AND T175.hr_paym_code = T177.hr_paym_code
AND T166.hr_pyrl_code = 'f' AND T166.hr_paid_dati = 20180404
AND (T175.hr_paym_type = 'd' OR T175.hr_paym_type = 't')
GROUP BY T000.hr_empl_code
ORDER BY hr_empl_code
I'm really lost as to where it could be going wrong. I have stripped out the additional WHERE/AND conditions and brought it down to just T166.hr_empl_code = T175.hr_empl_code, but it doesn't make a difference.
By no means am I an expert in SQL Server and queries, but I have a decent grasp of the technology. Any help would be much appreciated!
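GROUP BY is not wrong; how you are using it is wrong. Aggregate the payments in their own derived table first, then use the other tables only to decide which employees qualify: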
SELECT
T000.hr_empl_code,
T.totpaid
FROM
hrtempnm T000
inner join (SELECT
hr_empl_code,
SUM(hr_doll_paid) as totPaid
FROM
hrtpaytp T175
where hr_paym_type = 'd' OR hr_paym_type = 't'
GROUP BY hr_empl_code
) T on t.hr_empl_code = T000.hr_empl_code
where exists
(select * from qmvempms T001,
hrtmspay T166,
hrtpaytp T175,
hrtptype T177
WHERE T000.hr_empl_code = T001.hr_empl_code
AND T001.hr_empl_code = T166.hr_empl_code
AND T001.hr_empl_code = T175.hr_empl_code
AND T001.hr_ploy_ment = T166.hr_ploy_ment
AND T001.hr_ploy_ment = T175.hr_ploy_ment
AND T175.hr_paym_code = T177.hr_paym_code
AND T166.hr_pyrl_code = 'f' AND T166.hr_paid_dati = 20180404
)
ORDER BY hr_empl_code
Note: It would be clearer if you used explicit joins instead of old-style joins in the WHERE clause.
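The figures in the question are consistent with join fan-out rather than an ignored WHERE clause: 1152 rows / 4 payments = 288, and 734.24 × 288 = 211461.12, so each payment row appears to be counted 288 times. A minimal sketch of the effect on an in-memory SQLite database follows. The two tables are hypothetical stand-ins, with pay_runs playing the role of whichever joined table matches several rows per employee.

```python
import sqlite3

# Hypothetical tables: four payments and three matching rows in another table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payments (empl_code INTEGER, paid REAL);
    CREATE TABLE pay_runs (empl_code INTEGER, run_label TEXT);
    INSERT INTO payments VALUES (1, 20.5), (1, 51.25), (1, 102.49), (1, 560);
    INSERT INTO pay_runs VALUES (1, 'a'), (1, 'b'), (1, 'c');
""")

# Joining before aggregating repeats every payment once per matching pay_runs row.
inflated = conn.execute("""
    SELECT p.empl_code, COUNT(*), SUM(p.paid)
    FROM payments AS p
    JOIN pay_runs AS r ON r.empl_code = p.empl_code
    GROUP BY p.empl_code
""").fetchone()

# Aggregating first and using EXISTS as a pure filter keeps each payment once.
correct = conn.execute("""
    SELECT t.empl_code, t.tot_paid
    FROM (
        SELECT empl_code, SUM(paid) AS tot_paid
        FROM payments
        GROUP BY empl_code
    ) AS t
    WHERE EXISTS (SELECT 1 FROM pay_runs AS r WHERE r.empl_code = t.empl_code)
""").fetchone()

print(inflated)  # row count is 12 (4 payments x 3 matches), sum is tripled
print(correct)   # sum is roughly 734.24
```

Adding COUNT(*) next to the SUM, as the question did, is a quick way to spot this kind of multiplication.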

SAP Query IMRG Measure documents

I'm learning SAP queries.
I want to get all the measurement documents for a piece of equipment.
To do that, I use 3 tables:
EQUI, IMPTT, IMRG
The query works, but it returns all documents, whereas I only want the most recent one by date. I'm sure I have to add a custom field, but none of the ones I have tried work.
For example, my latest code:
select min( IMRG~INVTS ) IMRG~RECDV
from IMRG inner join IMPTT on
IMRG~POINT = IMPTT~POINT into (INVTS, IMRGVAL)
where IMRG~POINT = IMPTT-POINT AND
IMPTT~MPOBJ = EQUI-OBJNR
and IMRG~CANCL = '' group by IMRG~MDOCM IMRG~RECDV.
ENDSELECT.
Thanks for your help.
You need to get the date from IMRG. Because INVTS is an inverted timestamp, its MIN() corresponds to the most recent measurement, so that part looks correct.
However, your GROUP BY looks wrong. You should be grouping on the IMPTT~POINT field so that you get one record per measurement point. Note that one measurement point (IMPTT) can have many measurements (IMRG), so something like this:
SELECT EQUI-OBJNR, IMPTT~POINT, MIN(IMRG~IMRC_INVTS)
...
GROUP BY EQUI-OBJNR, IMPTT~POINT
If I understood you correctly, you are trying to get the most recent measurement of the equipment regardless of measurement point. In that case you can try this query, which is not pretty but works.
SELECT objnr COUNT(*) MIN( invts )
FROM equi AS eq
JOIN imptt AS tt
ON tt~mpobj = eq~objnr
JOIN imrg AS ig
ON ig~point = tt~point
INTO (wa_objnr, count, wa_invts)
WHERE ig~cancl = ''
GROUP BY objnr.
SELECT SINGLE recdv FROM imrg JOIN imptt ON imptt~point = imrg~point INTO wa_imrgval WHERE invts = wa_invts AND imptt~mpobj = wa_objnr.
WRITE: / wa_objnr, count, wa_invts, wa_imrgval.
ENDSELECT.

Selecting the last set of records in SQL Server that match certain criteria

I've got 10 records that match the criteria I'm searching for. The problem is there are two sets of 10 records, one at 1pm and one at 3pm. I only want the sets at 3pm. Here's part of my SQL:
select shp_rev.ShpNum, shp_rev.RevTime
from shp_rev
where shp_rev.RevDate = '10/1/2015'
and shp_rev.ValAfter = 'O'
and shp_rev.ShpNum = 732809
(I've added the shp_rev.ShpNum to the where just to narrow down the data to a dataset that has this problem. Normally I wouldn't have that in the where. And there are other fields that would be included with this select.)
This produces:
732809 13:14:45
732809 13:14:45
...
732809 15:23:33
732809 15:23:33
...
I only want the records at 15:23:33. I know one way I can do it is to concatenate all of the fields I want into one string with ShpNum and RevTime at the beginning then using MAX() to get just the 3pm records like this:
select max(cast(shp_rev.ShpNum as varchar) + '~' + shp_rev.RevTime)
from shp_rev
where shp_rev.RevDate = '10/1/2015'
and shp_rev.ValAfter = 'O'
and shp_rev.ShpNum = 732809
order by max(cast(shp_rev.ShpNum as varchar) + '~' + shp_rev.RevTime)
But then I have to parse back that string to get everything. It seems there must be a better way. I've tried using MAX() on ShpNum and RevTime separately but that doesn't work. Any thoughts? Oh, I'm working in SQL Server 2012. Thanks!
If I understand correctly, you can use dense_rank():
select s.*
from (select shp_rev.ShpNum, shp_rev.RevTime,
dense_rank() over (partition by RevDate, ValAfter, ShpNum order by RevTime desc) as seqnum
from shp_rev
where shp_rev.RevDate = '2015-10-01' and
shp_rev.ValAfter = 'O' and
shp_rev.ShpNum = 732809
) s
where seqnum = 1;
This assumes that the timestamps within a set are all exactly the same.
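A small sketch of why dense_rank() fits here, run on an in-memory SQLite database (3.25 or newer) rather than SQL Server: rows that share the latest RevTime all get rank 1, so the whole 3pm set is kept, whereas row_number() would keep only one of them. The sample rows, including the second shipment, are made up for illustration.

```python
import sqlite3

# Made-up sample data shaped like the shp_rev rows in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shp_rev (ShpNum INTEGER, RevDate TEXT, RevTime TEXT, ValAfter TEXT);
    INSERT INTO shp_rev VALUES
        (732809, '2015-10-01', '13:14:45', 'O'),
        (732809, '2015-10-01', '13:14:45', 'O'),
        (732809, '2015-10-01', '15:23:33', 'O'),
        (732809, '2015-10-01', '15:23:33', 'O'),
        (732810, '2015-10-01', '09:00:00', 'O');
""")

# Every row tied for the latest RevTime within a shipment gets seqnum = 1.
rows = conn.execute("""
    SELECT ShpNum, RevTime
    FROM (
        SELECT ShpNum, RevTime,
               DENSE_RANK() OVER (
                   PARTITION BY RevDate, ValAfter, ShpNum
                   ORDER BY RevTime DESC
               ) AS seqnum
        FROM shp_rev
        WHERE RevDate = '2015-10-01' AND ValAfter = 'O'
    ) AS s
    WHERE seqnum = 1
    ORDER BY ShpNum
""").fetchall()

print(rows)
```

Because the partition includes ShpNum, the query also works without the ShpNum filter and returns the latest set for each shipment.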