SQL Left Outer Join doesn't give full table


I have searched for similar issues, but I found nothing...
I have a problem with joining 2 tables in SQL. The first table is created using the following procedure:
DECLARE @Sequence TABLE (n DATETIME NOT NULL)
DECLARE @Index INT
SET @Index = 1
WHILE @Index <= 96 BEGIN
INSERT @Sequence (n) VALUES (DATEADD(Minute, @Index * 5, DATEADD(Hour, -8, '06-25-2010 00:00:00')))
SET @Index = @Index + 1
END
When I run a plain query like this:
SELECT
Sequence.n
FROM
@Sequence as Sequence
I get what I expect: 96 rows with DATETIME values spaced at 5-minute intervals, ending at 06-25-2010 00:00:00. (This value will later be replaced by a parameter in an SSRS report, which is why specifying the end of the range and using a double DATEADD might look odd.)
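To make the loop's output concrete, here is a small sketch that mirrors it in Python, with `datetime`/`timedelta` standing in for T-SQL's DATEADD (an assumption for illustration only; `five_minute_sequence` is a hypothetical helper). It starts 8 hours before the end time and adds index × 5 minutes for index 1 to 96, so the first value is 16:05 on the previous day and the last one is the end time itself.

```python
from datetime import datetime, timedelta


def five_minute_sequence(end, hours=8, step_minutes=5):
    """Mirror the WHILE loop: start = end - hours, then add index * step for index 1..N."""
    start = end - timedelta(hours=hours)
    count = hours * 60 // step_minutes  # 8 * 60 / 5 = 96 rows
    return [start + timedelta(minutes=i * step_minutes) for i in range(1, count + 1)]


seq = five_minute_sequence(datetime(2010, 6, 25, 0, 0))
print(len(seq), seq[0], seq[-1])
# 96 2010-06-24 16:05:00 2010-06-25 00:00:00
```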
The second table contains values registered by PLMC controllers, which also register one record every 5 minutes (per PointID, but to keep things simple, let's assume there is only one PointID).
The problem is that sometimes the controller goes offline, and then no value at all is registered for a given point, not even a 0. That's why I came up with this sequence table: I want to get a full picture of the last 8 hours for any given point. So, if I take the values from the other table in the same time range using this query:
SELECT
DataTime, DataValue
FROM
PointValue
WHERE
PointValue.DataTime > DATEADD(Hour, -8, '06-22-2010 00:00:00')
AND PointValue.DataTime < '06-22-2010 00:00:00'
AND PointID = '5284'
I will get only 56 rows. This is because after 20:30 on that day the controller went offline and there are no records registered.
So here is the problem. I run this query to get one value for every 5 minutes, hoping to still see all 96 rows (8 hours in 5-minute intervals), with NULL values after 20:30:
SELECT
Sequence.n,
PointValue.DataValue
FROM
@Sequence as Sequence LEFT OUTER JOIN PointValue
ON Sequence.n = PointValue.DataTime
WHERE
PointID = '5280'
Unfortunately, the result is still 56 rows, with the same timestamps as the last query. I have tried every possible join and cannot get the NULL values for the last 3.5 hours of the day. I'm sure I'm making a very simple mistake, but I have been trying different ways to solve it for hours now and I seriously need help.
SOLVED:
Thanks for your comments. I started modifying the query after reading the first comment to appear, by Blorgbeard. All I had to do was build a Cartesian product of my time sequence and all relevant PointIDs (since I don't need all of them), so my FROM clause now looks as follows:
(SELECT Sequence.n, dbo.LABELS.theIndex FROM @TimeSequence as Sequence, dbo.LABELS
WHERE LEFT(dbo.LABELS.theLabel,2)='VA') as BaseTable
LEFT OUTER JOIN PointValue ON BaseTable.n = PointValue.DataTime AND BaseTable.theIndex = PointValue.PointID
Again, thank you for your help!
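A minimal sketch of this fix, assuming SQLite (via Python's sqlite3 module) as a stand-in for SQL Server and small hypothetical tables: integer slots replace the DATETIME values, `substr` replaces `LEFT`, and the labels and readings are made up. Every (time slot, point) pair from the Cartesian product survives the LEFT OUTER JOIN, with NULL where no reading exists.

```python
import sqlite3

# Hypothetical data; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE time_sequence (n INTEGER NOT NULL);
    CREATE TABLE labels (the_index TEXT, the_label TEXT);
    CREATE TABLE point_value (data_time INTEGER, point_id TEXT, data_value REAL);
    INSERT INTO time_sequence (n) VALUES (1), (2), (3);
    INSERT INTO labels VALUES ('5280', 'VA-pump'), ('5284', 'VA-valve'), ('9999', 'XX-other');
    INSERT INTO point_value VALUES (1, '5280', 10.0), (2, '5284', 20.0);
""")

rows = conn.execute("""
    SELECT base.n, base.the_index, pv.data_value
    FROM (
        -- every time slot paired with every relevant point (Cartesian product)
        SELECT s.n, l.the_index
        FROM time_sequence AS s CROSS JOIN labels AS l
        WHERE substr(l.the_label, 1, 2) = 'VA'
    ) AS base
    LEFT OUTER JOIN point_value AS pv
      ON base.n = pv.data_time AND base.the_index = pv.point_id
    ORDER BY base.the_index, base.n
""").fetchall()

for row in rows:
    print(row)  # 6 rows: 3 slots x 2 'VA' points, NULL (None) where nothing was logged
```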

You need to move the PointID comparison into the ON clause - with it in the WHERE clause, you're forcing the LEFT JOIN to become an INNER JOIN:
ON Sequence.n = PointValue.DataTime AND
PointValue.PointID = '5280'
Conditions in the WHERE clause have to be met by every row in the result set, and for the unmatched rows PointValue.PointID is NULL, so those rows are filtered out.
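A small sketch of the difference, assuming SQLite as a stand-in for SQL Server (the LEFT JOIN semantics are the same) and hypothetical data where integer slots 1 to 4 replace the timestamps and point 5280 only has readings for slots 1 and 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE seq (n INTEGER NOT NULL);  -- time slots 1..4
    CREATE TABLE point_value (data_time INTEGER, point_id TEXT, data_value REAL);
    INSERT INTO seq (n) VALUES (1), (2), (3), (4);
    -- point 5280 only has readings for slots 1 and 2 (controller went offline)
    INSERT INTO point_value VALUES (1, '5280', 10.0), (2, '5280', 11.0);
""")

# Filter in WHERE: unmatched rows have point_id = NULL, so they are discarded.
where_rows = conn.execute("""
    SELECT s.n, pv.data_value
    FROM seq AS s LEFT OUTER JOIN point_value AS pv ON s.n = pv.data_time
    WHERE pv.point_id = '5280'
    ORDER BY s.n
""").fetchall()

# Filter in ON: it only decides which rows match; every slot survives.
on_rows = conn.execute("""
    SELECT s.n, pv.data_value
    FROM seq AS s LEFT OUTER JOIN point_value AS pv
      ON s.n = pv.data_time AND pv.point_id = '5280'
    ORDER BY s.n
""").fetchall()

print(where_rows)  # [(1, 10.0), (2, 11.0)]
print(on_rows)     # [(1, 10.0), (2, 11.0), (3, None), (4, None)]
```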

I think the problem is:
WHERE
PointID = '5280'
PointID is in the PointValue table, so it will be NULL for the missing rows, and NULL does not equal '5280'.
I think you could change it like this to make it work:
WHERE
(PointID is null) or (PointID = '5280')
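This works for the single-PointID case in the question, but a sketch with a second, hypothetical point (5284) suggests a caveat: if another point has a reading in a slot where 5280 has none, the join matches that other row, the WHERE clause then rejects it, and the slot disappears instead of showing NULL. Keeping the condition in ON does not have this problem. SQLite stands in for SQL Server here, and the data is made up.

```python
import sqlite3

# Hypothetical data: point 5280 has slots 1-2, another point (5284) has slot 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE seq (n INTEGER NOT NULL);
    CREATE TABLE point_value (data_time INTEGER, point_id TEXT, data_value REAL);
    INSERT INTO seq (n) VALUES (1), (2), (3), (4);
    INSERT INTO point_value VALUES
        (1, '5280', 10.0), (2, '5280', 11.0), (3, '5284', 99.0);
""")

# IS NULL OR ...: slot 3 joins to point 5284's row, which the WHERE then drops.
null_or_rows = conn.execute("""
    SELECT s.n, pv.data_value
    FROM seq AS s LEFT OUTER JOIN point_value AS pv ON s.n = pv.data_time
    WHERE pv.point_id IS NULL OR pv.point_id = '5280'
    ORDER BY s.n
""").fetchall()

# Condition in ON: slot 3 simply has no 5280 match and comes back as NULL.
on_rows = conn.execute("""
    SELECT s.n, pv.data_value
    FROM seq AS s LEFT OUTER JOIN point_value AS pv
      ON s.n = pv.data_time AND pv.point_id = '5280'
    ORDER BY s.n
""").fetchall()

print(null_or_rows)  # [(1, 10.0), (2, 11.0), (4, None)]
print(on_rows)       # [(1, 10.0), (2, 11.0), (3, None), (4, None)]
```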

This is because, in your NULL rows, PointID is not '5280'.
Try adding that to your JOIN clause...
ON Sequence.n = PointValue.DataTime AND PointID = '5280'

Related

How to pull rows from a SQL table until quotas for multiple columns are met?

I've been able to find a few examples of questions similar to this one, but most only involve a single column being checked.
SQL Select until Quantity Met
Select rows until condition met
I have a large table representing facilities, with a column for each type of resource holding the number of those resources available at that facility. I want this stored procedure to take integer values as multiple parameters (one per resource column) plus a Lat/Lon. It should then iterate over the table sorted by distance and return rows (facilities) until the required quantity of each resource (specified by the parameters) is met.
Data source example:
Id  Lat     Long  Resource1  Resource2  ...
1   50.123  4.23  5          12         ...
2   61.234  5.34  0          9          ...
3   50.634  4.67  21         18         ...
Result wanted:
@latQuery = 50.634
@LongQuery = 4.67
@res1Query = 10
@res2Query = 20
Id  Lat     Long  Resource1  Resource2  ...
3   50.634  4.67  21         18         ...
1   50.123  4.23  5          12         ...
The result includes all the rows needed to meet each quota individually, and it is sorted by distance to the requested lat/lon.
I'm able to sort the results by distance, and sum the total running values as suggested in other threads, but I'm having some trouble with the logic comparing the running values with the quota provided in the params.
First, I have some CTEs to get the most recent edits, order by distance, and then sum the running totals:
WITH cte1 AS (SELECT
@origin.STDistance(geography::Point(Facility.Lat, Facility.Long, 4326)) AS distance,
Facility.Resource1 as res1,
Facility.Resource2 as res2
-- ...etc
FROM Facility
),
cte2 AS (SELECT
distance,
res1,
SUM(res1) OVER (ORDER BY distance) AS totRes1,
res2,
SUM(res2) OVER (ORDER BY distance) AS totRes2
-- ...etc, there's 15-20 columns here
FROM cte1
)
Next, with the results of that CTE, I need to pull rows until all quotas are met. This is where I'm having issues: it works for one row, but my logic with all the ANDs isn't quite right.
SELECT * FROM cte2 WHERE (
(totRes1 <= @res1Query OR (totRes1 > @res1Query AND totRes1- res1 <= @totRes1)) AND
(totRes2 <= @res2Query OR (totRes2 > @res2Query AND totRes2- res2 <= @totRes2)) AND
-- ... I also feel like this method of pulling the next row once it's over may be convoluted as well?
)
As-is right now, it's mostly returning nothing, and I'm guessing it's because it's too strict? Essentially, I want to be able to let the total values go past the required values until they are all past the required values, and then return that list.
Has anyone come across a better method of searching using separate quotas for multiple columns?
See my update in the answers/comments

I think you are massively over-complicating this. This does not need any joins, just some running sum calculations, and the right OR logic.
The key to solving this is that you need every row where, for at least one requirement, the running sum up to the previous row is still less than that requirement. This means you include all rows before a requirement has been met, plus the first row at which it is met or exceeded.
To do this you can subtract the current row's value from the running sum.
You could instead use a frame specification of ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING, but then you need to deal with the NULL it produces on the first row.
In any event, even a regular running sum should always use ROWS UNBOUNDED PRECEDING, because the default is RANGE UNBOUNDED PRECEDING, which is subtly different: rows with tied ORDER BY values are treated as peers and all get the same total. That can cause incorrect results, as well as being slower.
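To illustrate the tie behaviour, here is a sketch in SQLite, whose default frame with ORDER BY is also RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, using a hypothetical table where two rows share the same distance. The default frame gives both tied rows the same total, so subtracting the current row's value would not give the sum of the previous rows; ROWS with a unique tie-breaker (`id`, an assumption here) does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, distance REAL, res INTEGER);
    -- rows 1 and 2 are tied on distance
    INSERT INTO t VALUES (1, 1.0, 5), (2, 1.0, 7), (3, 2.0, 1);
""")

rows = conn.execute("""
    SELECT id,
           SUM(res) OVER (ORDER BY distance) AS default_range_sum,
           SUM(res) OVER (ORDER BY distance, id ROWS UNBOUNDED PRECEDING) AS rows_sum
    FROM t
    ORDER BY distance, id
""").fetchall()

for row in rows:
    print(row)
# (1, 12, 5)
# (2, 12, 12)
# (3, 13, 13)
```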
You can also factor out the distance calculation into a CROSS APPLY (VALUES (...)), avoiding the need for lots of CTEs or derived tables. You now only need one level of derivation.
DECLARE @origin geography = geography::Point(@latQuery, @LongQuery, 4326);
SELECT
f.Id,
f.Lat,
f.Long,
f.Resource1,
f.Resource2
FROM (
SELECT f.*,
SumRes1 = SUM(f.Resource1) OVER (ORDER BY v1.Distance ROWS UNBOUNDED PRECEDING) - f.Resource1,
SumRes2 = SUM(f.Resource2) OVER (ORDER BY v1.Distance ROWS UNBOUNDED PRECEDING) - f.Resource2
FROM Facility f
CROSS APPLY (VALUES(
@origin.STDistance(geography::Point(f.Lat, f.Long, 4326))
)) v1(Distance)
) f
WHERE (
f.SumRes1 < @res1Query
OR f.SumRes2 < @res2Query
);
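A runnable sketch of the same logic, assuming SQLite 3.25 or later (for window functions) instead of SQL Server, the question's three sample facilities, and a squared planar distance as a crude stand-in for `geography::STDistance` (good enough to order these nearby points, but not a real geodesic distance). The `id` tie-breaker in the window's ORDER BY and the helper `facilities_until_quotas` are hypothetical additions. Note that a quota of 0 never keeps a row alive, because a running total of non-negative resources is never less than 0, which is how this form sidesteps the zero-parameter problem the asker ran into.

```python
import sqlite3

# Sample data from the question; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facility (id INTEGER, lat REAL, lon REAL, resource1 INTEGER, resource2 INTEGER);
    INSERT INTO facility VALUES
        (1, 50.123, 4.23, 5, 12),
        (2, 61.234, 5.34, 0, 9),
        (3, 50.634, 4.67, 21, 18);
""")

QUERY = """
    SELECT id
    FROM (
        SELECT f.id, f.distance,
               -- running sum minus the current row = total of the rows before it
               SUM(f.resource1) OVER (ORDER BY f.distance, f.id ROWS UNBOUNDED PRECEDING) - f.resource1 AS sum_res1,
               SUM(f.resource2) OVER (ORDER BY f.distance, f.id ROWS UNBOUNDED PRECEDING) - f.resource2 AS sum_res2
        FROM (
            -- squared planar distance: a stand-in for geography::STDistance
            SELECT *, (lat - :lat) * (lat - :lat) + (lon - :lon) * (lon - :lon) AS distance
            FROM facility
        ) AS f
    )
    -- keep a row while at least one quota was still unmet before it
    WHERE sum_res1 < :res1 OR sum_res2 < :res2
    ORDER BY distance, id
"""


def facilities_until_quotas(conn, lat, lon, res1, res2):
    params = {"lat": lat, "lon": lon, "res1": res1, "res2": res2}
    return [row[0] for row in conn.execute(QUERY, params)]


print(facilities_until_quotas(conn, 50.634, 4.67, 10, 20))  # [3, 1]
```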

Was able to figure out the problem on my own here. The primary issue I was running into was that I was comparing 25 different columns' running totals versus the 25 stored proc parameters (quotas of resources required by the search).
Changing lines such as this one
(totRes1 <= @res1Query OR (totRes1 > @res1Query AND totRes1- res1 <= @totRes1)) AND --...
to
(totRes1 <= @res1Query OR (totRes1 > @res1Query AND totRes1- res1 <= @totRes1) OR @res1Query = 0) AND --...
(adding in the OR @res1Query = 0) solved my issue.
In other words, the search is often only for one or two columns (types of resources), leaving the others as zero. The way my logic was set up caused it to skip over lots of rows because it instantly marked them as having met the quota (value less than or equal to the quota). Like @A Neon Tetra suggested, I was pretty close already.
Update:
My first attempt didn't fully fix the issue. Here is a stripped-down version of the code that is now working for me.
DECLARE @Lat AS DECIMAL(12,6)
DECLARE @Lon AS DECIMAL(12,6)
DECLARE @res1Query AS INT
DECLARE @res2Query AS INT
-- repeat for Resource 3 through 25, etc...
DECLARE @origin geography = geography::Point(@Lat, @Lon, 4326);
-- first CTE, to be able to expose distance
WITH cte AS (SELECT TOP(99999) -- --> this is hacky, it won't let me order by distance unless I'm selecting TOP(x) or some other fn?
dbo.Facility.FacilityID,
dbo.Facility.Lat,
dbo.Facility.Lon,
@origin.STDistance(geography::Point(dbo.Facility.Lat, dbo.Facility.Lon, 4326))
AS distance,
dbo.Facility.Resource1 AS res1,
dbo.Facility.Resource2 AS res2
-- , ... repeat for Resource 3 through 25, etc...
FROM dbo.Facility
ORDER BY distance),
-- second CTE - has access to distance so we can keep track of a running total ordered by distance
---> have to separate into two since you can't reference the same alias (distance) again within the same SELECT
fullCTE AS (SELECT
FacilityID,
Lat,
Lon,
distance,
res1,
SUM(res1) OVER (ORDER BY distance) AS totRes1,
res2,
SUM(res2) OVER (ORDER BY distance) AS totRes2
-- , ... repeat for Resource 3 through 25, etc...
FROM cte)
SELECT * -- Customize what you're pulling here for your output as needed
FROM dbo.Facility INNER JOIN fullCTE ON (fullCTE.FacilityID = dbo.Facility.FacilityID)
WHERE EXISTS
(SELECT
FacilityID
FROM fullCTE WHERE (
FacilityID = dbo.Facility.FacilityID AND
-- Keep pulling rows until all conditions are met, as opposed to pulling rows while they're under the quota
NOT (
((totRes1 - res1 >= @res1Query AND @res1Query <> 0) OR (@res1Query = 0))
AND ((totRes2 - res2 >= @res2Query AND @res2Query <> 0) OR (@res2Query = 0))
-- AND ... repeat for Resource 3 through 25, etc...
)
)
)

Group a sub-set of a result set by time interval

I have an audit table where specific actions are being recorded (such as 'access', 'create', 'update' and so on). I am selecting these records so that they can be displayed in a table to the administrative user.
This works fine when I select all the records for a particular entity. However, because I am using the Post-Redirect-Get pattern, the 'access' records are being logged on every page view. In a typical session an end user may hit the same page 6 or 7 times in the same 5 minute window. As a consequence, the administrative user is having to scroll through quite a few redundant access records and this is understandably distracting from the user experience.
To solve this problem, I have written two queries. The first looks for all records that are not access records. The second looks for access records and groups them into ten-minute intervals. I then UNION these two queries and order by the datetime.
-- Select non 'access' records
SELECT
[ORIGIN_ID]
,[ORIGIN_ID_TYPE]
,[REFERENCE_ID]
,[REFERENCE_ID_TYPE]
,[ACTION_TYPE_ID]
,CAST([ORIGINAL_VALUE] AS VARCHAR(8000)) AS ORIGINAL_VALUE
,CAST([CHANGED_VALUE] AS VARCHAR(8000)) AS CHANGED_VALUE
,[CREATED_BY]
,[CREATED_ON]
FROM [HISTORY]
WHERE [ORIGIN_ID] = 500 AND [ORIGIN_ID_TYPE] = 4 AND [ACTION_TYPE_ID] != 1
UNION
-- Select 'access' records and group them into 10 minute intervals by ts
SELECT
[ORIGIN_ID]
,[ORIGIN_ID_TYPE]
,[REFERENCE_ID]
,[REFERENCE_ID_TYPE]
,[ACTION_TYPE_ID]
,CAST([ORIGINAL_VALUE] AS VARCHAR(255)) AS ORIGINAL_VALUE
,CAST([CHANGED_VALUE] AS VARCHAR(255)) AS CHANGED_VALUE
,[CREATED_BY]
,DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [CREATED_ON]) / 10 * 10, 0) AS CREATED_ON
FROM [HISTORY]
WHERE [ACTION_TYPE_ID] = 1 AND [ORIGIN_ID] = 500 AND [ORIGIN_ID_TYPE] = 4
GROUP BY
[ORIGIN_ID]
,[ORIGIN_ID_TYPE]
,[REFERENCE_ID]
,[REFERENCE_ID_TYPE]
,[ACTION_TYPE_ID]
,CAST([ORIGINAL_VALUE] AS VARCHAR(255))
,CAST([CHANGED_VALUE] AS VARCHAR(255))
,[CREATED_BY]
,DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [CREATED_ON]) / 10 * 10, 0)
ORDER BY [CREATED_ON] DESC
SQLFiddle (I had a limited amount of data SQLFiddle would allow me to upload)
I feel like there may be a better way to do this that does not require me to use UNION. In order to do it this way I had to cast my TEXT columns to VARCHAR columns and I feel like there could be a better alternative. Any suggestions as to how this query can be improved?

Eliminate the UNION by using these two expressions in the GROUP BY. The second one also becomes the new expression for the combined created_on column. The first can also be used to control sorting and is otherwise discarded. (Don't forget to remove the filter on action_type_id too.):
case when action_type_id <> 1 then 1 else 2 end,
case when action_type_id <> 1
then created_on
else DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [CREATED_ON]) / 10 * 10, 0)
end
This will cause the query to treat the two types of actions as distinct for purposes of aggregation. Since you do want to keep every row with a non-1 action, you don't collapse those into ten-minute blocks at all.
Note that this wouldn't quite work as is if it's possible to have two such rows recorded with the same timestamp. You'd need to group on another ID (or just row_number()) to get around that but I suspect that'll be unnecessary.
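A sketch of the single-query version, assuming SQLite as a stand-in for SQL Server, a trimmed-down hypothetical HISTORY table, and CREATED_ON stored as integer seconds so that `created_on / 600 * 600` plays the role of the DATEADD/DATEDIFF expression that floors a timestamp to its 10-minute block. The three 'access' rows at 0, 120 and 540 seconds collapse into one row, while the 'create' and 'update' rows are kept individually. The output column is named `bucket_on` to avoid any clash with the source column inside the GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history (origin_id INTEGER, action_type_id INTEGER,
                          created_by TEXT, created_on INTEGER);
    INSERT INTO history VALUES
        (500, 1, 'alice', 0),    -- access
        (500, 1, 'alice', 120),  -- access, same 10-minute block
        (500, 1, 'alice', 540),  -- access, same block
        (500, 1, 'alice', 660),  -- access, next block
        (500, 2, 'alice', 300),  -- create: kept as-is
        (500, 3, 'alice', 360);  -- update: kept as-is
""")

rows = conn.execute("""
    SELECT action_type_id, created_by,
           CASE WHEN action_type_id <> 1 THEN created_on
                ELSE created_on / 600 * 600 END AS bucket_on
    FROM history
    WHERE origin_id = 500            -- no filter on action_type_id any more
    GROUP BY action_type_id, created_by,
             CASE WHEN action_type_id <> 1 THEN 1 ELSE 2 END,
             CASE WHEN action_type_id <> 1 THEN created_on
                  ELSE created_on / 600 * 600 END
    ORDER BY bucket_on DESC
""").fetchall()

for row in rows:
    print(row)  # 4 rows instead of 6: access rows collapsed per 10-minute block
```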

Why does this query not give me only the specified accounts?

Oracle SQL Developer
I expect to see:
In the subquery, I restrict ROWNUM to at most 2. When I run that subquery separately, it gives me 2 accounts. However, when I run the entire query, the list of account numbers just goes on! What's happening here?
SELECT m.acctno, i.intervalstartdate, d.name, i.intervalvalue
FROM endpoints E
JOIN meters m on m.acctid = e.acctid
LEFT JOIN intervaldata I ON I.acctid = M.acctid
LEFT JOIN endpointmodels EM ON EM.endpointmodelid=E.hwmodelid
LEFT JOIN datadefinitions D ON D.datadefinitionid = I.datadefinitionid
WHERE 1=1
AND E.statuscodeid = 8
AND m.FORM = 2
and exists
(
SELECT m2.acctno
from acct m2
where m2.acctno is not null
--and m2.acctno=m2.acctno
and rownum <= 2
)
AND D.datadefinitionid =7077
AND I.intervalstartdate BETWEEN '24-SEP-2017 00:00' and '25-SEP-2017 00:00'
--TRUNC(sysdate - 1) + interval '1' hour AND TRUNC(sysdate - 1) + interval '24' hour
ORDER BY M.acctno, I.intervalstartdate, I.datadefinitionid
This query is supposed to give me 97 rows for each account. The data I'm reading, the interval values, is the data we report for each customer in 96 intervals, so for 2 accounts, for example, I'm expecting to get 194 rows. I want to test with 2 accounts now, but later I want to run it for 50,000, and even with 2 it's not working: it gives me millions of rows for two accounts. Basically, I think my ROWNUM line is being ignored. I can't use an IN clause because I can't pass 50,000 accounts into it, so I used the EXISTS operator.
Let me know!

I think the error is in trying to use the AND EXISTS (...) clause. The EXISTS predicate returns true if the subquery returns any rows at all. Because your subquery does not reference the outer query, the result of EXISTS will always be true, unless the table is empty. This means it has no effect whatsoever on the outer query. You need to use something like
inner join (SELECT m2.acctno
from acct m2
where m2.acctno is not null
--and m2.acctno=m2.acctno
and rownum <= 2) sub1
on sub1.acctno = m.acctno
to get what you want instead of and exists (...).
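A sketch of why the EXISTS version filters nothing, assuming SQLite as a stand-in for Oracle, hypothetical four-account tables, and `LIMIT 2` (with an ORDER BY for a deterministic pick) in place of `ROWNUM <= 2`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE acct (acctno INTEGER);
    CREATE TABLE meters (acctno INTEGER, reading REAL);
    INSERT INTO acct VALUES (1), (2), (3), (4);
    INSERT INTO meters VALUES (1, 0.5), (2, 0.7), (3, 0.9), (4, 1.1);
""")

# Uncorrelated EXISTS: the subquery never mentions m, so it is either true for
# every outer row or for none. Here it is true, so nothing is filtered.
exists_rows = conn.execute("""
    SELECT m.acctno FROM meters AS m
    WHERE EXISTS (SELECT a.acctno FROM acct AS a WHERE a.acctno IS NOT NULL LIMIT 2)
    ORDER BY m.acctno
""").fetchall()

# Joining to the limited subquery restricts the outer rows to those accounts.
join_rows = conn.execute("""
    SELECT m.acctno FROM meters AS m
    INNER JOIN (SELECT a.acctno FROM acct AS a WHERE a.acctno IS NOT NULL
                ORDER BY a.acctno LIMIT 2) AS sub1
      ON sub1.acctno = m.acctno
    ORDER BY m.acctno
""").fetchall()

print(exists_rows)  # [(1,), (2,), (3,), (4,)]
print(join_rows)    # [(1,), (2,)]
```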

One obvious mistake is the date condition, where you require a date to be between two STRINGS. If you keep dates in string format, you will run into thousands of problems and bugs and you won't be able to fix them.
Do you understand that '25-APR-2008 00:00:00' is between '24-SEP-2017 00:00:00' and '25-SEP-2017 00:00:00', if you compare them alphabetically (as strings)?
The solution is to make sure the date column is in DATE (or TIMESTAMP) data type, and then to compare to dates, not to strings.
As an aside - this will not cause any errors, but it is still bad code - in the EXISTS condition you have a condition for ROWNUM <= 2. What do you think that will do? The subquery either returns at least one row (the first one will automatically have ROWNUM = 1) or it doesn't. The condition on ROWNUM in that subquery in the EXISTS condition is just garbage.
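The alphabetical-comparison point can be checked in a few lines of Python, where comparing two `str` values is also done character by character; parsing with `datetime.strptime` (the format string is an assumption chosen to match these literals) gives the chronological answer instead.

```python
from datetime import datetime

FMT = "%d-%b-%Y %H:%M:%S"  # matches literals like 24-SEP-2017 00:00:00
low, value, high = "24-SEP-2017 00:00:00", "25-APR-2008 00:00:00", "25-SEP-2017 00:00:00"

# String comparison: '24' < '25', then 'A' < 'S', so 2008 lands "between" two 2017 dates.
as_strings = low <= value <= high

# Date comparison: April 2008 is correctly outside the September 2017 range.
as_dates = (datetime.strptime(low, FMT)
            <= datetime.strptime(value, FMT)
            <= datetime.strptime(high, FMT))

print(as_strings, as_dates)  # True False
```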

Update table with combined field from other tables

I have a table with actions that are due in the future. I have a second table that holds all the cases, including each case's due date, and a third table that holds numbers.
The problem is as follows. Our system automatically populates our table with future actions. For some clients, however, we need to change these dates. I wanted to create an update query for this and have it run through our scheduler, but I am kind of stuck at the moment.
What I have in code so far is this:
UPDATE proxima_gestion p
SET fecha = (SELECT To_char(d.f_ult_vencim + c.hrem01, 'yyyyMMdd')
FROM deuda d,
c4u_activity_dates c,
proxima_gestion p
WHERE d.codigo_cliente = c.codigo_cliente
AND p.n_expediente = d.n_expediente
AND d.saldo > 1000
AND p.tipo_gestion_id = 914
AND p.codigo_oficina = 33
AND d.f_ult_vencim > sysdate)
WHERE EXISTS (SELECT *
FROM proxima_gestion p,
deuda d
WHERE p.n_expediente = d.n_expediente
AND d.saldo > 1000
AND p.tipo_gestion_id = 914
AND p.codigo_oficina = 33
AND d.f_ult_vencim > sysdate)
The field fecha is the current action date. Unfortunately, this is saved as a char instead of date. That is why I need to convert the date back to a char. F_ult_vencim is the due date, and hrem01 is the number of days the actions should be placed away from the due date. (for example, this could be 10, making the new date 10 days after the due date)
Apart from that, there are a few more criteria when we need to change the date (certain creditors, certain departments, only for future cases and starting from a certain amount, only for a certain action type.)
However, when I try and run this query, I get the error message
ORA-01427: single-row subquery returns more than one row
If I run both subqueries separately, I get 2 results from each. What I am trying to accomplish is to connect these 2 queries and update the field to the new value. This value will be different for every case, as every due date will be different.
Is this even possible? And if so, how?

You're getting the error because the first SELECT is returning more than one row for each row in the table being updated.
The first thing I see is that the alias for the table in UPDATE is the same as the alias in both SELECTs (p). So all of the references to p in the subqueries point to the proxima_gestion in the subquery rather than to the one in the outer query. That is, the subquery does not depend on the row being updated, and for this kind of UPDATE it needs to.
Try removing "proxima_gestion p" from FROM in both subqueries. The references to p, then, will be to the outer UPDATE query.
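A sketch of the correlated form, assuming SQLite as a stand-in for Oracle and hypothetical rows: the subquery in SET refers to the row being updated through the outer table's name instead of re-declaring proxima_gestion, so it returns one value per row. SQLite's `strftime` with a '+N days' modifier plays the role of `To_char(d.f_ult_vencim + c.hrem01, 'yyyyMMdd')`, `date('now')` replaces `sysdate`, and the office and action-type filters move to the outer WHERE clause.

```python
import sqlite3

# Hypothetical rows; SQLite stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deuda (n_expediente INTEGER, codigo_cliente INTEGER,
                        saldo REAL, f_ult_vencim TEXT);
    CREATE TABLE c4u_activity_dates (codigo_cliente INTEGER, hrem01 INTEGER);
    CREATE TABLE proxima_gestion (n_expediente INTEGER, tipo_gestion_id INTEGER,
                                  codigo_oficina INTEGER, fecha TEXT);
    INSERT INTO deuda VALUES (1, 10, 5000, '2099-01-01'), (2, 20, 8000, '2099-03-15');
    INSERT INTO c4u_activity_dates VALUES (10, 10), (20, 5);
    INSERT INTO proxima_gestion VALUES (1, 914, 33, '20981201'), (2, 914, 33, '20981201');
""")

conn.execute("""
    UPDATE proxima_gestion
    SET fecha = (
        -- correlated: proxima_gestion.n_expediente is the row being updated,
        -- so this returns exactly one value per updated row
        SELECT strftime('%Y%m%d', d.f_ult_vencim, '+' || c.hrem01 || ' days')
        FROM deuda AS d
        JOIN c4u_activity_dates AS c ON d.codigo_cliente = c.codigo_cliente
        WHERE d.n_expediente = proxima_gestion.n_expediente
          AND d.saldo > 1000
          AND d.f_ult_vencim > date('now')
    )
    WHERE tipo_gestion_id = 914
      AND codigo_oficina = 33
      AND EXISTS (
        SELECT 1 FROM deuda AS d
        WHERE d.n_expediente = proxima_gestion.n_expediente
          AND d.saldo > 1000
          AND d.f_ult_vencim > date('now')
      )
""")

rows = conn.execute(
    "SELECT n_expediente, fecha FROM proxima_gestion ORDER BY n_expediente"
).fetchall()
print(rows)  # [(1, '20990111'), (2, '20990320')]
```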

Oracle SQL query generating some unexpected results

I have a table named ATTN for attendance, with a column S_TIME that stores both the in time and the out time, and a flag (AC_CHECKTYPE) set to 'I' for in and 'O' for out.
Now I am trying to get both the in time and the out time using the flag. I have written a query, but its results don't make sense.
Here is my query
SELECT X.AC_NO, X.IN_TIME, Y.OUT_TIME, X.S_DTE, X.CHK_IN, Y.CHK_OUT
FROM (SELECT A.AC_NO, A.S_TIME IN_TIME, A.AC_CHECKTYPE CHK_IN, A.S_DTE
FROM ATTN A
WHERE A.AC_CHECKTYPE = 'I' AND A.S_DTE = :P_DTE) X,
(SELECT B.AC_NO, B.S_TIME OUT_TIME, B.AC_CHECKTYPE CHK_OUT
FROM ATTN B
WHERE B.AC_CHECKTYPE = 'O' AND B.S_DTE = :P_DTE) Y
WHERE X.AC_NO = Y.AC_NO(+)
With this query I always get one record for every employee who checked in, but only 2 of those records have an out time.
Yet the table has 234 employees who checked in and 256 who checked out, so the results don't match the data.
Can anyone help me find the problem in my query?

SUGGESTION:
Run the subselects and see if you're getting all the rows you expect, with and without the "A.S_DTE = :P_DTE" date filter:
SELECT A.AC_NO, A.S_TIME IN_TIME, A.AC_CHECKTYPE CHK_IN, A.S_DTE
FROM ATTN A
WHERE A.AC_CHECKTYPE = 'I' AND A.S_DTE = :P_DTE
My first guess is that maybe you have a datetime, and unless the hours/minutes/seconds match exactly, you won't get the row. You can use to_char() to have just the "date" portion in your "from":
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:204714700346328550
In any case, seeing what rows you do (and don't) get in the subqueries should show you what's wrong with the complete query.
Please post back what you find!
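A sketch of the time-portion pitfall, assuming SQLite as a stand-in for Oracle, a hypothetical ATTN table where S_DTE holds full timestamps, and SQLite's `date()` in place of Oracle's TRUNC or TO_CHAR: an equality test against a midnight value only matches rows stored exactly at midnight, while comparing the date parts matches all of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attn (ac_no INTEGER, s_dte TEXT, ac_checktype TEXT);
    INSERT INTO attn VALUES
        (1, '2017-09-24 00:00:00', 'I'),  -- stored at midnight: equality matches
        (1, '2017-09-24 17:05:00', 'O'),  -- time portion present: equality misses it
        (2, '2017-09-24 08:58:00', 'I'),
        (2, '2017-09-24 17:30:00', 'O');
""")

params = {"p_dte": "2017-09-24 00:00:00"}

# Exact comparison, like A.S_DTE = :P_DTE on a column that carries a time.
exact = conn.execute(
    "SELECT COUNT(*) FROM attn WHERE s_dte = :p_dte", params
).fetchone()[0]

# Compare only the date part of both sides.
by_day = conn.execute(
    "SELECT COUNT(*) FROM attn WHERE date(s_dte) = date(:p_dte)", params
).fetchone()[0]

print(exact, by_day)  # 1 4
```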
