Join on id or null and get first result - sql

I have created the query below:
select * from store str
left join(
select * from schedule sdl
where day = 3
order by
case when sdl.store_id is null then (
case when sdl.strong is true then 0 else 2 end
) else 1 end, sdl.schedule_id desc
) ovr on (ovr.store_id = str.store_id OR ovr.store_id IS NULL)
Sample data:
STORE
[store_id] [title]
20010 Shoes-Shop
20330 Candy-Shop
[SCHEDULE]
[schedule_id] [store_id] [day] [strong] [some_other_data]
1 20330 3 f 10% Discount
2 NULL 3 t 0% Discount
What I want to get from the LEFT JOIN is either data for NULL store_id (global schedule entry - affects all store entries) OR the actual data for the given store_id.
Joining the query like this, returns results with the correct order, but for both NULL and store_id matches. It makes sense using the OR statement on join clause.
Expected results:
[store_id] [title] [some_other_data]
20010 Shoes-Shop 0% Discount
20330 Candy-Shop 0% Discount
Current Results:
[store_id] [title] [some_other_data]
20010 Shoes-Shop 0% Discount
20330 Candy-Shop 0% Discount
20330 Candy-Shop 10% Discount
If there is a more elegant approach on the subject I would be glad to follow it.

DISTINCT ON should work just fine, as soon as you get ORDER BY right. Basically, matches with strong = TRUE in schedule have priority, then matches with store_id IS NOT NULL:
SELECT DISTINCT ON (st.store_id)
st.store_id, st.title, sl.some_other_data
FROM store st
LEFT JOIN schedule sl ON sl.day = 3
AND (sl.store_id = st.store_id OR sl.store_id IS NULL)
ORDER BY NOT strong, store_id IS NULL;
This works because:
Sorting null values after all others, except special
Basics for DISTINCT ON:
Select first row in each GROUP BY group?
Alternative with a LATERAL join (Postgres 9.3+):
SELECT *
FROM store st
LEFT JOIN LATERAL (
SELECT some_other_data
FROM schedule
WHERE day = 3
AND (store_id = st.store_id OR store_id IS NULL)
ORDER BY NOT strong
, store_id IS NULL
LIMIT 1
) sl ON true;
About LATERAL joins:
What is the difference between LATERAL and a subquery in PostgreSQL?

I think the easiest way to do what you want is to use distinct on. The question is then how you order it:
select distinct on (str.store_id) *
from store str left join
schedule sdl
on (sdl.store_id = str.store_id or sdl.store_id is null) and dl.day = 3
order by str.store_id,
(case when sdl.store_id is null then 2 else 1 end)
This will return the store record if available, otherwise the schedule record that has a value of NULL. Note: your query has this notion of strength, but the question doesn't explain how to use it. This can be readily modified to include multiple levels of priorities.

Related

Group table by custom column

I have a table transaction_transaction with columns:
id, status, total_amount, date_made, transaction_type
The status can be: Active, Paid, Trashed, Renewed, Void
So what i want to do is filter by date and status, but since sometimes there are no records with Renewed or Trashed, i get inconsistent data it returns only Active and Paid when grouping by status ( notice Renewed and Trashed is missing ). I want it allways to return smth like:
-----------------------------------
Active | 121 | 2017-08-09
Paid | 122 | 2017-08-19
Trashed | 123 | 2017-08-20
Renewed | 123 | 2017-08-20
The sql query i use:
SELECT
ST.type,
COALESCE(SUM(TR.total_amount), 0) AS amount
FROM sms_admin_status ST
LEFT JOIN transaction_transaction TR ON TR.status = ST.type
WHERE TR.store_id = 21 AND TR.transaction_type = 'Layaway' AND TR.status != 'Void'
AND TR.date_made >= '2018-02-01' AND TR.date_made <= '2018-02-26'
GROUP BY ST.type
Edit: I created a table sms_admin_status since you said its bad not having a table and in the future i might have new statuses, and i also changed the query to fit my needs.
Use a VALUES list in a subquery to LEFT JOIN your transaction table. You may need to NULLIF your sums to have them return 0.
https://www.postgresql.org/docs/10/static/queries-values.html
One possible solution (not very nice one) is the following
select statuses.s, date_made, coalesce(SUM(amount), 0)
from (values('active'),('inactive'),('deleted')) statuses(s)
left join transactions t on statuses.s = t.status and
date_made >= '2017-08-08'
group by statuses.s, date_made
I assume that you forgot to add date_made to the group by. therefore, I added it there. As you can see the possible values are hardcoded in the SQL. Some other solution (much more cleaner) is to create a table with possible values of status and replace my statuses.
Use SELECT ... FROM (VALUES) with restriction from the transaction table:
select * from (values('active', 0),('inactive', 0),('deleted', 0)) as statuses
where column1 not in (select status from transactions)
union select status, sum(amount) from transactions group by status
Add the date column as need be, I assume it's a static value
The multiple where statements will limit the rows selected unless they are in a sub-query. May I suggest something like the following?
SELECT ST.type, ISNULL(SELECT SUM(TR.total_amount)
FROM transaction_transaction TR
WHERE TR.status = ST.type AND TR.store_id = 21 AND TR.transaction_type = 'Layaway' AND TR.status != 'Void'
AND TR.date_made >= '2018-02-01' AND TR.date_made <= '2018-02-26'),0) AS amount
FROM sms_admin_status ST
GROUP BY ST.type

Grouping records on consecutive dates

If I have following table in Postgres:
order_dtls
Order_id Order_date Customer_name
-------------------------------------
1 11/09/17 Xyz
2 15/09/17 Lmn
3 12/09/17 Xyz
4 18/09/17 Abc
5 15/09/17 Xyz
6 25/09/17 Lmn
7 19/09/17 Abc
I want to retrieve such customer who has placed orders on 2 consecutive days.
In above case Xyz and Abc customers should be returned by query as result.
There are many ways to do this. Use an EXISTS semi-join followed by DISTINCT or GROUP BY, should be among the fastest.
Postgres syntax:
SELECT DISTINCT customer_name
FROM order_dtls o
WHERE EXISTS (
SELEST 1 FROM order_dtls
WHERE customer_name = o.customer_name
AND order_date = o.order_date + 1 -- simple syntax for data type "date" in Postgres!
);
If the table is big, be sure to have an index on (customer_name, order_date) to make it fast - index items in this order.
To clarify, since Oto happened to post almost the same solution a bit faster:
DISTINCT is an SQL construct, a syntax element, not a function. Do not use parentheses like DISTINCT (customer_name). Would be short for DISTINCT ROW(customer_name) - a row constructor unrelated to DISTINCT - and just noise for the simple case with a single expression, because Postgres removes the pointless row wrapper for a single element automatically. But if you wrap more than one expression like that, you get an actual row type - an anonymous record actually, since no row type is given. Most certainly not what you want.
What is a row constructor used for?
Also, don't confuse DISTINCT with DISTINCT ON (expr, ...). See:
Select first row in each GROUP BY group?
Try something like...
SELECT `order_dtls`.*
FROM `order_dtls`
INNER JOIN `order_dtls` AS mirror
ON `order_dtls`.`Order_id` <> `mirror`.`Order_id`
AND `order_dtls`.`Customer_name` = `mirror`.`Customer_name`
AND DATEDIFF(`order_dtls`.`Order_date`, `mirror`.`Order_date`) = 1
The way I would think of it doing it would be to join the table the date part with itselft on the next date and joining it with the Customer_name too.
This way you can ensure that the same customer_name done an order on 2 consecutive days.
For MySQL:
SELECT distinct *
FROM order_dtls t1
INNER JOIN order_dtls t2 on
t1.Order_date = DATE_ADD(t2.Order_date, INTERVAL 1 DAY) and
t1.Customer_name = t2.Customer_name
The result you should also select it with the Distinct keyword to ensure the same customer is not displayed more than 1 time.
For postgresql:
select distinct(Customer_name) from your_table
where exists
(select 1 from your_table t1
where
Customer_name = your_table.Customer_name and Order_date = your_table.Order_date+1 )
Same for MySQL, just instead of your_table.Order_date+1 use: DATE_ADD(your_table.Order_date , INTERVAL 1 DAY)
This should work:
SELECT A.customer_name
FROM order_dtls A
INNER JOIN (SELECT customer_name, order_date FROM order_dtls) as B
ON(A.customer_name = B.customer_name and Datediff(B.Order_date, A.Order_date) =1)
group by A.customer_name

ORA-00904 "invalid identifier" but identifier exists in query

I'm working in a fault-reporting Oracle database, trying to get fault information out of it.
The main table I'm querying is Incident, which includes incident information. Each record in Incident may have any number of records in the WorkOrder table (or none) and each record in WorkOrder may have any number of records in the WorkLog table (or none).
What I am trying to do at this point is, for each record in Incident, find the WorkLog with the minimum value in the field MXRONSITE, and, for that worklog, return the MXRONSITE time and the REPORTDATE from the work order. I accomplished this using a MIN subquery, but it turned out that several worklogs could have the same MXRONSITE time, so I was pulling back more records than I wanted. I tried to create a subsubquery for it, but it now says I have an invalid identifier (ORA-00904) for WOL1.WONUM in the WHERE line, even though that identifier is in use elsewhere.
Any help is appreciated. Note that there is other stuff in the query, but the rest of the query works in isolation, and this but doesn't work in the full query or on its own.
SELECT
WL1.MXRONSITE as "Date_First_Onsite",
WOL1.REPORTDATE as "Date_First_Onsite_Notified"
FROM Maximo.Incident
LEFT JOIN (Maximo.WorkOrder WOL1
LEFT JOIN Maximo.Worklog WL1
ON WL1.RECORDKEY = WOL1.WONUM)
ON WOL1.ORIGRECORDID = Incident.TICKETID
AND WOL1.ORIGRECORDCLASS = 'INCIDENT'
WHERE (WL1.WORKLOGID IN
(SELECT MIN(WL3.WORKLOGID)
FROM (SELECT MIN(WL3.MXRONSITE), WL3.WORKLOGID
FROM Maximo.Worklog WL3 WHERE WOL1.WONUM = WL3.RECORDKEY))
or WL1.WORKLOGID is null)
To clarify, what I want is:
For each fault in Incident,
the earliest MXRONSITE from the Worklog table (if such a value exists),
For that worklog, information from the associated record from the WorkOrder table.
This is complicated by Incident records having multiple work orders, and work orders having multiple work logs, which may have the same MXRONSITE time.
After some trials, I have found an (almost) working solution:
WITH WLONSITE as (
SELECT
MIN(WLW.MXRONSITE) as "ONSITE",
WLWOW.ORIGRECORDID as "TICKETID",
WLWOW.WONUM as "WONUM"
FROM
MAXIMO.WORKLOG WLW
INNER JOIN
MAXIMO.WORKORDER WLWOW
ON
WLW.RECORDKEY = WLWOW.WONUM
WHERE
WLWOW.ORIGRECORDCLASS = 'INCIDENT'
GROUP BY
WLWOW.ORIGRECORDID, WLWOW.WONUM
)
select
incident.ticketid,
wlonsite.onsite,
wlonsite.wonum
from
maximo.incident
LEFT JOIN WLONSITE
ON WLONSITE.TICKETID = Incident.TICKETID
WHERE
(WLONSITE.ONSITE is null or WLONSITE.ONSITE = (SELECT MIN(WLONSITE.ONSITE) FROM WLONSITE WHERE WLONSITE.TICKETID = Incident.TICKETID AND ROWNUM=1))
AND Incident.AFFECTEDDATE >= TO_DATE ('01/12/2015', 'DD/MM/YYYY')
This however is significantly slower, and also still not quite right, as it turns out a single Incident can have multiple Work Orders with the same ONSITE time (aaargh!).
As requested, here is a sample input, and what I want to get from it (apologies for the formatting). Note that while TICKETID and WONUM are primary keys, they are strings rather than integers. WORKLOGID is an integer.
Incident table:
TICKETID / Description / FieldX
1 / WORD1 / S
2 / WORD2 / P
3 / WORDX /
4 / / Q
Work order table:
WONUM / ORIGRECORDID / REPORTDATE
11 / 1 / 2015-01-01
12 / 2 / 2015-01-01
13 / 2 / 2015-02-04
14 / 3 / 2015-04-05
Worklog table:
WORKLOGID / RECORDKEY / MXRONSITE
101 / 11 / 2015-01-05
102 / 12 / 2015-01-04
103 / 12 /
104 / 12 / 2015-02-05
105 / 13 /
Output:
TICKETID / WONUM / WORKLOGID
1 / 11 / 101
2 / 12 / 102
3 / /
4 / /
(Worklog 101 linked to TICKETID 1, has non-null MXRONSITE, and is from work order 11)
(Worklogs 102-105 linked to TICKETID 2, of which 102 has lowest MXRONSITE, and is work order 12)
(No work logs associated with faults 103 or 104, so work order and worklog fields are null)
Post Christmas attack!
I have found a solution which works:
The method I found was to use multiple WITH queries, as follows:
WLMINL AS (
SELECT
RECORDKEY, MXRONSITE, MIN(WORKLOGID) AS "WORKLOG"
FROM MAXIMO.WORKLOG
WHERE WORKLOG.CLASS = 'WORKORDER'
GROUP BY RECORDKEY, MXRONSITE
),
WLMIND AS (
SELECT
RECORDKEY, MIN(MXRONSITE) AS "MXRONSITE"
FROM MAXIMO.WORKLOG
WHERE WORKLOG.CLASS = 'WORKORDER'
GROUP BY RECORDKEY
),
WLMIN AS (
SELECT
WLMIND.RECORDKEY AS "WONUM", WLMIND.MXRONSITE AS "ONSITE", WLMINL.WORKLOG AS "WORKLOGID"
FROM
WLMIND
INNER JOIN
WLMINL
ON
WLMIND.RECORDKEY = WLMINL.RECORDKEY AND WLMIND.MXRONSITE = WLMINL.MXRONSITE
)
Thus for each work order finding the first date, then for each work order and date finding the lowest worklogid, then joining the two tables. This is then repeated at a higher level to find the data by incident.
However this method does not work in a reasonable time, so while it may be suitable for smaller databases it's no good for the behemoths I'm working with.
I would do this with row_number function:
SQLFiddle
select ticketid, case when worklogid is not null then reportdate end d1, mxronsite d2
from (
select i.ticketid, wo.reportdate, wl.mxronsite, wo.wonum, wl.worklogid,
row_number() over (partition by i.ticketid
order by wl.mxronsite, wo.reportdate) rn
from incident i
left join workorder wo on wo.origrecordid = i.ticketid
and wo.origrecordclass = 'INCIDENT'
left join worklog wl on wl.recordkey = wo.wonum )
where rn = 1 order by ticketid
When you nest subqueries, you cannot access columns that belong two or more levels higher; in your statement, WL1 is not accessible in the innermost subquery. (There is also a group-by clause missing, btw)
This might work (not exactly sure what output you expect, but try it):
SELECT
WL1.MXRONSITE as "Date_First_Onsite",
WOL1.REPORTDATE as "Date_First_Onsite_Notified"
FROM Maximo.Incident
LEFT JOIN (
Maximo.WorkOrder WOL1
LEFT JOIN Maximo.Worklog WL1
ON WL1.RECORDKEY = WOL1.WONUM
) ON WOL1.ORIGRECORDID = Incident.TICKETID
AND WOL1.ORIGRECORDCLASS = 'INCIDENT'
WHERE WL1.WORKLOGID =
( SELECT MIN(WL3.WORKLOGID)
FROM Maximo.WorkOrder WOL3
LEFT JOIN Maximo.Worklog WL3
ON WL3.RECORDKEY = WOL3.WONUM
WHERE WOL3.ORIGRECORDID = WOL1.ORIGRECORDID
AND WL3.MXRONSITE IS NOT NULL
)
OR WL1.WORKLOGID IS NULL AND NOT EXISTS
( SELECT MIN(WL4.WORKLOGID)
FROM Maximo.WorkOrder WOL4
LEFT JOIN Maximo.Worklog WL4
ON WL4.RECORDKEY = WOL4.WONUM
WHERE WOL4.ORIGRECORDID = WOL1.ORIGRECORDID
AND WL4.MXRONSITE IS NOT NULL )
I may not have the details right on what you're trying to do... if you have some sample input and desired output, that would be a big help.
That said, I think an analytic function would help a lot, not only in getting the output but in organizing the code. Here is an example of how the max analytic function in a subquery could be used.
Again, the details on the join may be off -- if you can furnish some sample input and output, I'll bet someone can get to where you're trying to go:
with wo as (
select
wonum, origrecordclass, origrecordid, reportdate,
max (reportdate) over (partition by origrecordid) as max_date
from Maximo.workorder
where origrecordclass = 'INCIDENT'
),
logs as (
select
worklogid, mxronsite, recordkey,
max (mxronsite) over (partition by recordkey) as max_mx
from Maximo.worklog
)
select
i.ticketid,
l.mxronsite as "Date_First_Onsite",
wo.reportdate as "Date_First_Onsite_Notified"
from
Maximo.incident i
left join wo on
wo.origrecordid = i.ticketid and
wo.reportdate = wo.max_date
left join logs l on
wo.wonum = l.recordkey and
l.mxronsite = l.max_mx
-- edit --
Based on your sample input and desired output, this appears to give the desired result. It does do somewhat of an explosion in the subquery, but hopefully the efficiency of the analytic functions will dampen that. They are typically much faster, compared to using group by:
with wo_logs as (
select
wo.wonum, wo.origrecordclass, wo.origrecordid, wo.reportdate,
l.worklogid, l.mxronsite, l.recordkey,
max (reportdate) over (partition by origrecordid) as max_date,
min (mxronsite) over (partition by recordkey) as min_mx
from
Maximo.workorder wo
left join Maximo.worklog l on wo.wonum = l.recordkey
where wo.origrecordclass = 'INCIDENT'
)
select
i.ticketid, wl.wonum, wl.worklogid,
wl.mxronsite as "Date_First_Onsite",
wl.reportdate as "Date_First_Onsite_Notified"
from
Maximo.incident i
left join wo_logs wl on
i.ticketid = wl.origrecordid and
wl.mxronsite = wl.min_mx
order by 1

SQL percentage of the total

Hi how can I get the percentage of each record over the total?
Lets imagine I have one table with the following
ID code Points
1 101 2
2 201 3
3 233 4
4 123 1
The percentage for ID 1 is 20% for 2 is 30% and so one
how do I get it?
There's a couple approaches to getting that result.
You essentially need the "total" points from the whole table (or whatever subset), and get that repeated on each row. Getting the percentage is a simple matter of arithmetic, the expression you use for that depends on the datatypes, and how you want that formatted.
Here's one way (out a couple possible ways) to get the specified result:
SELECT t.id
, t.code
, t.points
-- , s.tot_points
, ROUND(t.points * 100.0 / s.tot_points,1) AS percentage
FROM onetable t
CROSS
JOIN ( SELECT SUM(r.points) AS tot_points
FROM onetable r
) s
ORDER BY t.id
The view query s is run first, that gives a single row. The join operation matches that row with every row from t. And that gives us the values we need to calculate a percentage.
Another way to get this result, without using a join operation, is to use a subquery in the SELECT list to return the total.
Note that the join approach can be extended to get percentage for each "group" of records.
id type points %type
-- ---- ------ -----
1 sold 11 22%
2 sold 4 8%
3 sold 25 50%
4 bought 1 50%
5 bought 1 50%
6 sold 10 20%
To get that result, we can use the same query, but a a view query for s that returns total GROUP BY r.type, and then the join operation isn't a CROSS join, but a match based on type:
SELECT t.id
, t.type
, t.points
-- , s.tot_points_by_type
, ROUND(t.points * 100.0 / s.tot_points_by_type,1) AS `%type`
FROM onetable t
JOIN ( SELECT r.type
, SUM(r.points) AS tot_points
FROM onetable r
GROUP BY r.type
) s
ON s.type = t.type
ORDER BY t.id
To do that same result with the subquery, that's going to be a correlated subquery, and that subquery is likely to get executed for every row in t.
This is why it's more natural for me to use a join operation, rather than a subquery in the SELECT list... even when a subquery works the same. (The patterns we use for more complex queries, like assigning aliases to tables, qualifying all column references, and formatting the SQL... those patterns just work their way back into simple queries. The rationale for these patterns is kind of lost in simple queries.)
try like this
select id,code,points,(points * 100)/(select sum(points) from tabel1) from table1
To add to a good list of responses, this should be fast performance-wise, and rather easy to understand:
DECLARE #T TABLE (ID INT, code VARCHAR(256), Points INT)
INSERT INTO #T VALUES (1,'101',2), (2,'201',3),(3,'233',4), (4,'123',1)
;WITH CTE AS
(SELECT * FROM #T)
SELECT C.*, CAST(ROUND((C.Points/B.TOTAL)*100, 2) AS DEC(32,2)) [%_of_TOTAL]
FROM CTE C
JOIN (SELECT CAST(SUM(Points) AS DEC(32,2)) TOTAL FROM CTE) B ON 1=1
Just replace the table variable with your actual table inside the CTE.

SQL Column results with NULL Values

I am having the following issue with my query.
I am trying to import data from multiple tables (Fact_Contact, Quali_Seg, etc…) into one table (Fact_Forecast). This is to predict how many individuals are eligible for a specific offer. The problem I am having is that for some reason, the column Date_ID, which is been pulled from Fact_Contact, when importing has NULL values. I don’t know where these NULL values are coming from as the table Fact_Contact don’t have any NULL values in the column DATE_ID.
This is the section of the query that has the problem,
DECLARE #lastDateID int
SELECT TOP 1 #lastDateID = date_id
FROM Fact_Contact
ORDER BY CREATE_DATE DESC
SELECT date_id, Offers.Segmentation_id, Offers.Offer_Code, Offers.Wave_no,
Offers.cadencevalue,
CASE
WHEN dailydata.activity_count IS NOT NULL THEN dailydata.activity_count
ELSE 0
END as "activity_count"
FROM (
SELECT s.Segmentation_id, s.Offer_Code, s.Wave_no, o.cadencevalue,
o.campaign_id, o.offer_desc
FROM Forecast_Model.dbo.Quali_Segment s
LEFT JOIN Forecast_Model.dbo.Dim_Offers o
ON s.offer_code = o.offer_code
) Offers
LEFT JOIN (
SELECT date_id, Offer_Code_1 Offer_Code,
segmentation_group_id, Count(indv_role_id) Activity_count
FROM Forecast_Model.dbo.Fact_Contact
WHERE date_id = #lastDateID
GROUP BY offer_code_1,segmentation_group_id,date_id
) DailyData
ON DailyData.offer_code = Offers.offer_code
AND Offers.Segmentation_id = dailydata.segmentation_group_id
ORDER BY Segmentation_id,Wave_no
The column Date_ID as I mentiones gets only 2 dates which is the same as the #LastDateID which is 2014-05-20 but the rest are NULL.
Thank you,
Omar
date_id will be NULL whenever you have records in Offers (join Quali_Segment) but no matching records in Fact_Contact