Consider the database entries below. The distinguishing identifier is the "short" item number, 89721.
Data:
F3003
IRKIT IRKITL IRMMCU IRMCU IREFFF IREFFT IRAN8 IRTRT IRUOM
89721 7N74N050046 B20 NXT 12/06/2015 12/31/2040 200038 M SY
89721 7N74N050046 B70 NXT 07/28/2021 12/31/2040 200038 M SY
F0101
ABAN8 ABALPH ABAC01
200038 Company XYZ CON
F4101
IMITM IMDSC1
89721 TWIN RIB N05 ONYX HS-SY 46.5"
F41021
LIITM LIPBIN LIPQOH
89721 S 256
...
[a total of 99 "S" entries where LIPQOH sums to 16554]
F4211
SDITM SDNXTR SDDCTO SDUORG
89721 540 SO 4700.00
SQL:
SELECT F3003.IRAN8 AS CUST_NO,
F3003.IRKIT AS SHORT_ITEM,
F3003.IRKITL AS ITEM_NO,
F3003.IRUOM AS UOM,
F3003.IRMCU AS WC,
F0101.ABALPH AS CUST_NAME,
F4101.IMDSC1 AS ITEM_DESC,
SUM(F41021.LIPQOH / 100) AS ON_HAND
FROM PROD2DTA.F3003 AS F3003
INNER JOIN PROD2DTA.F0101 AS F0101
ON F3003.IRAN8 = F0101.ABAN8
INNER JOIN PROD2DTA.F4101 AS F4101
ON F3003.IRKIT = F4101.IMITM
INNER JOIN PROD2DTA.F41021 AS F41021
ON F3003.IRKIT = F41021.LIITM
WHERE F3003.IRMCU LIKE '%NXT'
AND F3003.IRTRT = 'M'
AND F0101.ABAC01 = 'CON'
AND F41021.LIPBIN = 'S'
AND CURRENT_DATE BETWEEN DATE(CONCAT(CAST(F3003.IREFFF / 1000 AS INT) + 1900, RIGHT(MOD(F3003.IREFFF, 1000) + 1000, 3))) AND DATE(CONCAT(CAST(F3003.IREFFT / 1000 AS INT) + 1900, RIGHT(MOD(F3003.IREFFT, 1000) + 1000, 3)))
GROUP BY F3003.IRAN8,
F3003.IRKIT,
F3003.IRKITL,
F3003.IRUOM,
F3003.IRMCU,
F3003.IRDSC1,
F0101.ABALPH,
F4101.IMDSC1
HAVING SUM(F41021.LIPQOH / 100) > 0
When I run the query above, I get the following results:
CUST NO CUST NAME ITEM NO ITEM DESC UOM ON HAND
200038 Company XYZ 7N74N050046 TWIN RIB N05 ONYX HS-SY 46.5" SY 16,554.00
This is correct based on comparison with JD Edwards applications.
Note that in the F3003 table above, there are two entries that meet the filtering criteria in the query. I have provided the IRMMCU column to show a source of multiple results, even though I'm not selecting it in the SQL. This represents a plant location where this item may be manufactured. The dates may also contribute to multiple results, because both rows meet the filter criteria but the date columns are not selected in the SQL.
Now, consider the following query, which uses the one above as a subquery:
SELECT CUST_SPEC_ON_HAND.CUST_NO AS CUST_NO,
CUST_SPEC_ON_HAND.SHORT_ITEM AS SHORT_ITEM,
CUST_SPEC_ON_HAND.ITEM_NO AS ITEM_NO,
CUST_SPEC_ON_HAND.UOM AS UOM,
CUST_SPEC_ON_HAND.WC AS WC,
CUST_SPEC_ON_HAND.CUST_NAME AS CUST_NAME,
CUST_SPEC_ON_HAND.ITEM_DESC AS ITEM_DESC,
CUST_SPEC_ON_HAND.ON_HAND AS ON_HAND,
SUM(F4211.SDUORG / 100) AS OPEN_ORDER
FROM (SELECT F3003.IRAN8 AS CUST_NO,
F3003.IRKIT AS SHORT_ITEM,
F3003.IRKITL AS ITEM_NO,
F3003.IRUOM AS UOM,
F3003.IRMCU AS WC,
F0101.ABALPH AS CUST_NAME,
F4101.IMDSC1 AS ITEM_DESC,
SUM(F41021.LIPQOH / 100) AS ON_HAND
FROM PROD2DTA.F3003 AS F3003
INNER JOIN PROD2DTA.F0101 AS F0101
ON F3003.IRAN8 = F0101.ABAN8
INNER JOIN PROD2DTA.F4101 AS F4101
ON F3003.IRKIT = F4101.IMITM
INNER JOIN PROD2DTA.F41021 AS F41021
ON F3003.IRKIT = F41021.LIITM
WHERE F3003.IRMCU LIKE '%NXT'
AND F3003.IRTRT = 'M'
AND F0101.ABAC01 = 'CON'
AND F41021.LIPBIN = 'S'
AND CURRENT_DATE BETWEEN DATE(CONCAT(CAST(F3003.IREFFF / 1000 AS INT) + 1900, RIGHT(MOD(F3003.IREFFF, 1000) + 1000, 3))) AND DATE(CONCAT(CAST(F3003.IREFFT / 1000 AS INT) + 1900, RIGHT(MOD(F3003.IREFFT, 1000) + 1000, 3)))
GROUP BY F3003.IRAN8,
F3003.IRKIT,
F3003.IRKITL,
F3003.IRUOM,
F3003.IRMCU,
F0101.ABALPH,
F4101.IMDSC1
HAVING SUM(F41021.LIPQOH / 100) > 0) CUST_SPEC_ON_HAND
LEFT JOIN PROD2DTA.F4211 AS F4211
ON CUST_SPEC_ON_HAND.SHORT_ITEM = F4211.SDITM
AND F4211.SDDCTO IN ( 'SO', 'SM', 'S2' )
AND F4211.SDNXTR < 999
GROUP BY CUST_SPEC_ON_HAND.CUST_NO,
CUST_SPEC_ON_HAND.SHORT_ITEM,
CUST_SPEC_ON_HAND.ITEM_NO,
CUST_SPEC_ON_HAND.UOM,
CUST_SPEC_ON_HAND.WC,
CUST_SPEC_ON_HAND.CUST_NAME,
CUST_SPEC_ON_HAND.ITEM_DESC,
CUST_SPEC_ON_HAND.ON_HAND
When I run this query for the same "short" item number 89721, I get the following:
CUST # CUSTOMER NAME ITEM # ITEM DESCRIPTION UOM ON HAND OPEN ORDERS
200038 Company XYZ 7N74N050046 TWIN RIB N05 ONYX HS-SY 46.5" SY 33,108.00 4,700.00
Notice that the on-hand quantity is now twice what it was from the first query.
I can't figure out why this is happening. It does not happen for all data in the result set. From what I can tell, the items with an inflated on-hand quantity are those items in the F3003 table with entries for multiple manufacturing plants (F3003.IRMMCU). But why would it work correctly for the simpler query, and not for the second query that is using the first query?
Below are CREATE TABLE statements to create the tables shown above. Note: The actual JDE tables contain many more columns than what is shown here.
You should also note that JDE stores dates as an integer in a JDE "Julian" date format. The format is as follows: CYYDDD, where C is the century, YY is the year and DDD is the day of the year. For example, April 28, 2022 is represented as 122118.
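To make the CYYDDD arithmetic concrete, here is a minimal sketch of the conversion in Python. The function names are made up for illustration and are not part of JDE; the logic mirrors what the date expression in the queries above does (integer-divide by 1000 and add 1900 for the year, take the remainder as the day of the year).

```python
from datetime import date, timedelta


def jde_julian_to_date(jde: int) -> date:
    """Convert a JDE CYYDDD integer (e.g. 122118) to a calendar date."""
    year = 1900 + jde // 1000   # CYY is the number of years since 1900
    day_of_year = jde % 1000    # DDD, where 1 means January 1
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


def date_to_jde_julian(d: date) -> int:
    """Convert a calendar date back to the CYYDDD integer form."""
    return (d.year - 1900) * 1000 + d.timetuple().tm_yday


print(jde_julian_to_date(122118))            # 2022-04-28
print(date_to_jde_julian(date(2021, 7, 28)))  # 121209
```

The second call reproduces the IREFFF value used for the B70 routing row in the sample data.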
CREATE TABLE F3003
(
IRTRT NCHAR(3) NULL, -- Type of Routing
IRKIT INT NULL, -- Parent (short) Item Number
IRKITL NCHAR(25) NULL, -- Kit - 2nd Item Number
IRMMCU NCHAR(12) NULL, -- Branch
IRMCU NCHAR(12) NULL, -- Business Unit
IREFFF NUMERIC(6) NULL, -- Effective - From Date
IREFFT NUMERIC(6) NULL, -- Effective - Thru Date
IRUOM NCHAR(2) NULL, -- Unit of Measure as Input
IRAN8 INT NULL -- Address Number
);
CREATE TABLE F0101
(
ABAN8 INT NULL, -- Address Number
ABALPH NCHAR(40) NULL, -- Name - Alpha
ABAC01 NCHAR(3) NULL -- Category Code - Address Book 01
);
CREATE TABLE F4101
(
IMITM INT NULL, -- Item Number - Short
IMDSC1 NCHAR(30) NULL -- Description
);
CREATE TABLE F41021
(
LIITM INT NULL, -- Item Number - Short
LIPBIN NCHAR(1) NULL, -- Primary Location (P/S)
LIPQOH INT NULL -- Quantity on Hand - Primary units
);
CREATE TABLE F4211
(
SDDCTO NCHAR(2) NULL, -- Order Type
SDITM INT NULL, -- Item Number - Short
SDNXTR NCHAR(3) NULL, -- Status Code - Next
SDUORG INT NULL -- Units - Order/Transaction Quantity
);
If you really want some information overload, you can find all sorts of information about the JDE data ecosystem by going to http://www.jdetables.com/.
INSERT statements:
Note that the dates are in the JDE Julian date format.
INSERT INTO F3003 (IRKIT,IRKITL,IRMMCU,IRMCU,IREFFF,IREFFT,IRAN8,IRTRT,IRUOM) VALUES (89721,'7N74N050046','B20','NXT',115340,140366,200038,'M','SY');
INSERT INTO F3003 (IRKIT,IRKITL,IRMMCU,IRMCU,IREFFF,IREFFT,IRAN8,IRTRT,IRUOM) VALUES (89721,'7N74N050046','B70','NXT',121209,140366,200038,'M','SY');
INSERT INTO F0101 (ABAN8,ABALPH,ABAC01) VALUES (200038,'Company XYZ','CON');
INSERT INTO F4101 (IMITM,IMDSC1) VALUES (89721,'TWIN RIB N05 ONYX HS-SY 46.5"');
Note that the on-hand quantity (LIPQOH) is stored with 2 places after the decimal point. To get the actual value, it must be divided by 100. Also, I have just one insert statement to replicate the on-hand quantity, rather than inserting 99 lines.
INSERT INTO F41021 (LIITM,LIPBIN,LIPQOH) VALUES (89721,'S',1655400);
Note that the ordered quantity (SDUORG) is stored with 2 places after the decimal point. To get the actual value, it must be divided by 100.
INSERT INTO F4211 (SDITM,SDNXTR,SDDCTO,SDUORG) VALUES (89721,540,'SO',470000);
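One way to test the hypothesis about multiple F3003 rows is to reproduce the join on the sample rows. The sketch below is only an approximation: it uses SQLite through Python instead of DB2, keeps just the columns needed for the join, and leaves out the date filter (both sample rows pass it anyway). With these rows, the two routing entries each match the single stock row, so the sum counts the stock twice; collapsing F3003 to the selected columns first brings it back to 16,554.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE F3003 (IRKIT INT, IRMMCU TEXT, IRMCU TEXT, IRAN8 INT);
CREATE TABLE F41021 (LIITM INT, LIPBIN TEXT, LIPQOH INT);
INSERT INTO F3003 VALUES (89721, 'B20', 'NXT', 200038),
                         (89721, 'B70', 'NXT', 200038);
INSERT INTO F41021 VALUES (89721, 'S', 1655400);
""")

# Both routing rows join to the same stock row, so the stock is summed twice.
fanned_out = con.execute("""
    SELECT SUM(l.LIPQOH / 100.0)
    FROM F3003 r
    JOIN F41021 l ON r.IRKIT = l.LIITM
    WHERE l.LIPBIN = 'S'
    GROUP BY r.IRAN8, r.IRKIT, r.IRMCU
""").fetchone()[0]

# Reducing F3003 to the selected columns before joining removes the duplicate.
deduplicated = con.execute("""
    SELECT SUM(l.LIPQOH / 100.0)
    FROM (SELECT DISTINCT IRAN8, IRKIT, IRMCU FROM F3003) r
    JOIN F41021 l ON r.IRKIT = l.LIITM
    WHERE l.LIPBIN = 'S'
    GROUP BY r.IRAN8, r.IRKIT, r.IRMCU
""").fetchone()[0]

print(fanned_out, deduplicated)  # 33108.0 16554.0
```

This does not show why the standalone query returned 16,554 on the real data. One difference worth checking is that the standalone query also groups by F3003.IRDSC1, while the subquery version does not.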
I have a database with tables for
equipment we service (table e, field e_id)
contracts on the equipment (table c, fields c_id, e_id, c_start, c_end)
maintenance we have performed in the past (table m, fields e_id, m_id, m_date)
I am trying to build a query that will show me all equipment records, if it is currently in contract with the start/end date, and a count of any maintenance performed since the start date of the contract.
I have a subquery to get the current contract (this table is large and has a new line for each contract revision), but I can't work out how to use the result of the contract subquery to return the maintenance visits since that date without returning multiple lines.
select
e.e_id,
c2.c_id,
c2.c_start,
c2.c_end,
m2.count
from e
left join (
select
c_id,
c_start,
c_end,
e_id
...other things and filtering by joining the table to itself
from c
) as c2 on c2.e_id = e.e_id
I would also like to be able to add this
m-subquery v1
left join (
select
count(*),
e_id
from m
where m.m_date >= c2.c_start
) as m2 on m2.e_id = e.e_id
But I'm unable to access c2.c_start from within the second subquery.
I am able to return this table by joining outside the subquery, but this returns multiple lines.
m-subquery v2
left join (
select
e_id,
m_date
from m
) as m2 on m2.e_id = e.e_id and m2.m_date >= c2.c_start
Is there a way to:
Get the subquery field c2.c_start into the m-subquery v1?
Aggregate the result of the m-subquery v2 without using group by (there are a lot of columns in the main select query)?
Do this differently?
I've seen LATERAL, which I think might be what I need, but I have tried the keyword in front of each subquery individually and in front of both together, and it didn't let me use c2.c_start inside either one.
I am a little averse to using group by, mainly as the BI analyst at work says "slap a group by on it" when there are duplicates in reports rather than trying to understand the business process/database properly. I feel like having a group by on the main query shouldn't be needed when I know for certain that the e table has one record per e_id, and the mess that having probably 59 out of 60 columns named in the group by would cause might make the query less maintainable.
Thanks,
Sam
Since not all RDBMS support LATERAL, I would like to present the following general solution. You can use CTEs (WITH queries) to structure the query and reuse partial results. For example, in the following code you can think of current_contracts as a kind of virtual table that exists only during query execution.
Part 1: DDLs and test data
DROP TABLE IF EXISTS e;
CREATE TABLE e
(
e_id INTEGER
);
DROP TABLE IF EXISTS c;
CREATE TABLE c
(
c_id INTEGER,
e_id INTEGER,
c_start DATE,
c_end DATE
);
DROP TABLE IF EXISTS m;
CREATE TABLE m
(
e_id INTEGER,
m_id INTEGER,
m_date DATE
);
INSERT INTO e VALUES (101),(102),(103);
INSERT INTO c VALUES (201, 101, DATE '2021-01-01', DATE '2021-12-31'), (202, 102, DATE '2021-03-01', DATE '2021-04-15'), (203, 102, DATE '2021-04-16', DATE '2021-04-30'), (204, 103, DATE '2003-01-01', DATE '2003-12-31'), (205, 103, DATE '2021-04-01', DATE '2021-04-30');
INSERT INTO m VALUES (101, 301, DATE '2021-01-01'), (101, 302, DATE '2021-02-01'), (101, 303, DATE '2021-03-01'), (102, 304, DATE '2021-04-02'), (102, 305, DATE '2021-04-03'), (103, 306, DATE '2021-04-03');
Part 2: the actual query
WITH
-- find currently active contracts per equipment:
-- we assume there is 0 or 1 contract active per equipment at any time
current_contracts AS
(
SELECT *
FROM c
WHERE c.c_start <= CURRENT_DATE -- only active contracts
AND c.c_end >= CURRENT_DATE -- only active contracts
),
-- count maintenance visits during the (single) active contract per equipment, if any:
current_maintenance AS
(
SELECT m.e_id, COUNT(*) AS count_m_per_e -- a count of maintenance visits per equipment
FROM m
INNER JOIN current_contracts cc
ON cc.e_id = m.e_id -- match maintenance to current contracts via equipment
AND cc.c_start <= m.m_date -- only maintenance that was done during the current contract
GROUP BY m.e_id
)
-- bring the parts together for our result:
-- we start with equipment and use LEFT JOINs to assure we retain all equipment
SELECT
e.*,
cc.c_start, cc.c_end,
CASE WHEN cc.e_id IS NOT NULL THEN 'yes' ELSE 'no' END AS has_contract,
COALESCE(cm.count_m_per_e, 0) AS maintenance_count -- 0 when there is no active contract or no maintenance since its start
FROM e
LEFT JOIN current_contracts cc
ON cc.e_id = e.e_id
LEFT JOIN current_maintenance cm
ON cm.e_id = e.e_id
ORDER BY e.e_id;
Please note that your real pre-processing logic for contracts and maintenance visits may be more complex, e.g. due to overlapping periods of active contracts per equipment.
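If the goal is to keep GROUP BY out of the picture entirely, a correlated scalar subquery in the SELECT list is another option. The sketch below is a rough illustration, not a drop-in replacement: it runs on SQLite through Python, stores dates as ISO text, and uses a fixed reference date (the :today parameter, an assumption made here so that the 2021 test data has active contracts) instead of CURRENT_DATE. The alias visits is also made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE e (e_id INTEGER);
CREATE TABLE c (c_id INTEGER, e_id INTEGER, c_start TEXT, c_end TEXT);
CREATE TABLE m (e_id INTEGER, m_id INTEGER, m_date TEXT);
INSERT INTO e VALUES (101), (102), (103);
INSERT INTO c VALUES
  (201, 101, '2021-01-01', '2021-12-31'),
  (202, 102, '2021-03-01', '2021-04-15'),
  (203, 102, '2021-04-16', '2021-04-30'),
  (204, 103, '2003-01-01', '2003-12-31'),
  (205, 103, '2021-04-01', '2021-04-30');
INSERT INTO m VALUES
  (101, 301, '2021-01-01'), (101, 302, '2021-02-01'), (101, 303, '2021-03-01'),
  (102, 304, '2021-04-02'), (102, 305, '2021-04-03'), (103, 306, '2021-04-03');
""")

# The scalar subquery is evaluated once per output row and may reference
# both e and cc, so the outer query needs no GROUP BY.
rows = con.execute("""
    SELECT e.e_id, cc.c_start, cc.c_end,
           (SELECT COUNT(*)
              FROM m
             WHERE m.e_id = e.e_id
               AND m.m_date >= cc.c_start) AS visits
    FROM e
    LEFT JOIN (SELECT * FROM c
                WHERE c_start <= :today AND c_end >= :today) AS cc
           ON cc.e_id = e.e_id
    ORDER BY e.e_id
""", {"today": "2021-04-20"}).fetchall()

for row in rows:
    print(row)
```

As with the CTE version, this still yields one row per equipment only if at most one contract is active per equipment on the reference date.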
I would like to generate a list of all days where every sailor booked a boat in that particular day.
The table scheme is as follows:
CREATE TABLE SAILOR(
SID INTEGER NOT NULL,
NAME VARCHAR(50) NOT NULL,
RATING INTEGER NOT NULL,
AGE FLOAT NOT NULL,
PRIMARY KEY(SID)
);
CREATE TABLE BOAT(
BID INTEGER NOT NULL,
NAME VARCHAR(50) NOT NULL,
COLOR VARCHAR(50) NOT NULL,
PRIMARY KEY(BID)
);
CREATE TABLE RESERVE (
SID INTEGER NOT NULL REFERENCES SAILOR(SID),
BID INTEGER NOT NULL REFERENCES BOAT(BID),
DAY DATE NOT NULL,
PRIMARY KEY(SID, BID, DAY));
The data is as follows:
INSERT INTO SAILOR(SID, NAME, RATING, AGE)
VALUES
(64, 'Horatio', 7, 35.0),
(74, 'Horatio', 9, 35.0);
INSERT INTO BOAT(BID, NAME, COLOR)
VALUES
(101, 'Interlake', 'blue'),
(102, 'Interlake', 'red'),
(103, 'Clipper', 'green'),
(104, 'Marine', 'red');
INSERT INTO RESERVE(SID, BID, DAY)
VALUES
(64, 101, '09/05/98'),
(64, 102, '09/08/98'),
(74, 103, '09/08/98');
I have tried using this code:
SELECT DAY
FROM RESERVE R
WHERE NOT EXISTS (
SELECT SID
FROM SAILOR S
EXCEPT
SELECT S.SID
FROM SAILOR S, RESERVE R
WHERE S.SID = R.SID)
GROUP BY DAY;
but it returns a list of all days, no exception. The only day that it should return is "09/08/98". How do I solve this?
I would phrase your query as:
SELECT r.DAY
FROM RESERVE r
GROUP BY r.DAY
HAVING COUNT(DISTINCT r.SID) = (SELECT COUNT(*) FROM SAILOR);
The above query says to return any day in the RESERVE table whose distinct SID sailor count matches the count of every sailor.
This assumes that SID entries in the RESERVE table are only ever made for sailors that actually appear in the SAILOR table. This seems reasonable, and the REFERENCES SAILOR(SID) foreign key in your schema already enforces it.
Taking a slightly different approach of just counting unique sailors per day:
SELECT day FROM (
SELECT COUNT(DISTINCT sid) AS sailor_count, day FROM reserve GROUP BY day
) AS sailors_per_day
WHERE sailor_count = (SELECT COUNT(*) FROM sailor);
+------------+
| day        |
|------------|
| 1998-09-08 |
+------------+
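For comparison with the original attempt: that query returns every day because its NOT EXISTS subquery never refers to the DAY of the outer row, so it only asks whether any sailor has no reservation at all. A sketch of a correlated version follows, run on SQLite through Python with the columns trimmed to what the query needs; it reads as "days for which there is no sailor without a reservation on that day".

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sailor (sid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE reserve (sid INTEGER, bid INTEGER, day TEXT);
INSERT INTO sailor VALUES (64, 'Horatio'), (74, 'Horatio');
INSERT INTO reserve VALUES
  (64, 101, '09/05/98'),
  (64, 102, '09/08/98'),
  (74, 103, '09/08/98');
""")

# Double negation: keep a day only if no sailor lacks a booking on it.
# The innermost subquery is tied to the outer row through r2.day = r.day.
days = [row[0] for row in con.execute("""
    SELECT DISTINCT r.day
    FROM reserve r
    WHERE NOT EXISTS (
        SELECT 1
        FROM sailor s
        WHERE NOT EXISTS (
            SELECT 1
            FROM reserve r2
            WHERE r2.sid = s.sid
              AND r2.day = r.day))
""")]

print(days)  # ['09/08/98']
```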
Hello guys, I need a little help writing a SQL statement that points out the rows that don't have a corresponding negative matching amount, based on my_id and report_id.
Here is the table declaration for better explanation.
CREATE TABLE "TEST"
( "REPORT_ID" VARCHAR2(100 BYTE),
"AMOUNT" NUMBER(17,2),
"MY_ID" VARCHAR2(30 BYTE),
"FUND" VARCHAR2(20 BYTE),
"ORG" VARCHAR2(20 BYTE)
)
here is some sample data
Insert into TEST (REPORT_ID,AMOUNT,MY_ID,FUND,ORG) values ('1',50,'910','100000','67120');
Insert into TEST (REPORT_ID,AMOUNT,MY_ID,FUND,ORG) values ('1',-50,'910','100000','67130');
Insert into TEST (REPORT_ID,AMOUNT,MY_ID,FUND,ORG) values ('1',100,'910','100000','67150');
Insert into TEST (REPORT_ID,AMOUNT,MY_ID,FUND,ORG) values ('2',200,'910','100000','67130');
Insert into TEST (REPORT_ID,AMOUNT,MY_ID,FUND,ORG) values ('2',-200,'910','100000','67120');
Insert into TEST (REPORT_ID,AMOUNT,MY_ID,FUND,ORG) values ('1',40.17,'910','100000','67150');
Insert into TEST (REPORT_ID,AMOUNT,MY_ID,FUND,ORG) values ('1',-40.17,'910','100000','67150');
Insert into TEST (REPORT_ID,AMOUNT,MY_ID,FUND,ORG) values ('1',40.17,'910','100000','67150');
If you create the table and look closely, you'll notice that, by report_id and my_id, most positive amounts have a directly offsetting negative amount. I need to identify those positive amounts that do not have a corresponding negative amount for the same my_id and report_id.
expected result should look like this
"REPORT_ID" "FUND" "MY_ID" "ORG" "AMOUNT"
"1" "100000" "910" "67150" "40.17"
"1" "100000" "910" "67150" "100"
Any ideas how I can accomplish this?
EDIT:
Posted the wrong output result. Just to be clear, the fund and org don't matter until after the match. For example, if I were writing this in PL/SQL, I would count how many minuses and how many pluses I have, compare each plus amount to each minus amount, and delete the matched pairs; I would then be left with whatever plus amounts did not have matching negative amounts.
I apologize for the confusion. I hope this makes it clearer now. Once I have all my matches, I should end up with only the positive amounts that are left behind.
EDIT:
additional inserts
Insert into TEST (REPORT_ID,AMOUNT,MY_ID,FUND,ORG) values ('5',71,'911','100000','67150');
Insert into TEST (REPORT_ID,AMOUNT,MY_ID,FUND,ORG) values ('5',71,'911','100000','67120');
Insert into TEST (REPORT_ID,AMOUNT,MY_ID,FUND,ORG) values ('5',71,'911','100000','67140');
Insert into TEST (REPORT_ID,AMOUNT,MY_ID,FUND,ORG) values ('5',71,'911','100000','67130');
Insert into TEST (REPORT_ID,AMOUNT,MY_ID,FUND,ORG) values ('5',71,'911','100000','67130');
Insert into TEST (REPORT_ID,AMOUNT,MY_ID,FUND,ORG) values ('5',71,'911','100000','67130');
Insert into TEST (REPORT_ID,AMOUNT,MY_ID,FUND,ORG) values ('5',-71,'911','100000','67150');
Insert into TEST (REPORT_ID,AMOUNT,MY_ID,FUND,ORG) values ('5',-71,'911','100000','67150');
Insert into TEST (REPORT_ID,AMOUNT,MY_ID,FUND,ORG) values ('5',-71,'911','100000','67150');
New Version
This should return just the rows that you want. If you are not concerned with org or fund then you can just use the query that is aliased x:
select distinct t1.report_id, t1.fund, t1.my_id, t1.org, t1.amount
from test t1,
(select distinct t.report_id, t.my_id, abs(amount) as amount
from test t
group by t.report_id, t.my_id, abs(amount)
having sum(t.amount) > 0) x
where t1.report_id = x.report_id
and t1.my_id = x.my_id
and t1.amount = x.amount;
Previous Version
select *
from test t
minus
select t1.*
from test t1,
test t2
where t1.amount = -1*t2.amount
and t1.report_id = t2.report_id
and t1.my_id = t2.my_id;
This just gives one row of output, for the row with amount 100. I have asked you to clarify in the comments why any row with 200 should be included (if it should). I am also not sure whether you want one of the 40.17 values to be included. The difficulty is that the two positive values are identical in the example data you provided; is this correct?
A modified version of the query that works for me with your scenario. I used SQL Server.
select *
from test t
EXCEPT
select t1.*
from test t1,
test t2
where t1.amount = -1*t2.amount
and t1.report_id = t2.report_id
and t1.my_id = t2.my_id;
Result with your Data
REPORT_ID AMOUNT MY_ID FUND ORG
1 100 910 100000 67150
the result after running this update query update test set AMOUNT=-500 where AMOUNT=-200
REPORT_ID AMOUNT MY_ID FUND ORG
1 100 910 100000 67150
2 -500 910 100000 67120
2 200 910 100000 67130
EDIT:
Here's an updated query based on the feedback and additional sample data provided. This query has the advantage of querying the TEST table just once, and it returns the expected results (3 rows of amount 71, one row of amount 100, and one row of amount 40.17).
SELECT
report_id, MAX(fund) fund, my_id, MAX(org) org, SUM(amount) amount
FROM (
SELECT
report_id, fund, my_id, org, amount
, ROW_NUMBER() OVER( PARTITION BY report_id, my_id, amount ) rn
FROM
test
) t
GROUP BY
rn, ABS(amount), report_id, my_id
HAVING
SUM(amount) > 0;
Results:
report_id fund my_id org amount
5 100000 911 67120 71.00
5 100000 911 67140 71.00
5 100000 911 67150 71.00
1 100000 910 67150 40.17
1 100000 910 67150 100.00
INITIAL ANSWER:
The below query should provide what you're looking for. I'm not sure what should be done if org and/or fund are different since you're not grouping on those values - I decided to use a MAX aggregate function on fund and org to select a single value without affecting the grouping. Maybe those columns should just be left out?
SELECT
report_id, MAX(fund) fund, my_id, MAX(org) org, SUM(amount) amount
FROM
test
GROUP BY
report_id, my_id, ABS(amount)
HAVING
SUM(amount) > 0;
Results:
report_id fund my_id org amount
1 100000 910 67150 40.17
1 100000 910 67150 100.00
Note that based on the sample data you provided, the expected result should not show 200 because there's a corresponding -200 for the same report_id (2) and my_id (910).
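The pairing rule described in the question's edit (count the pluses and minuses per report_id, my_id and amount, cancel them against each other, and keep what is left) can also be sketched procedurally. This is a plain Python illustration of the rule, not a replacement for the SQL; the function name is made up, fund and org are left out because they do not take part in the match, and zero amounts are not handled.

```python
from collections import Counter
from decimal import Decimal

# (report_id, amount, my_id) taken from the sample inserts in the question.
rows = [
    ("1", Decimal("50"), "910"), ("1", Decimal("-50"), "910"),
    ("1", Decimal("100"), "910"),
    ("2", Decimal("200"), "910"), ("2", Decimal("-200"), "910"),
    ("1", Decimal("40.17"), "910"), ("1", Decimal("-40.17"), "910"),
    ("1", Decimal("40.17"), "910"),
] + [("5", Decimal("71"), "911")] * 6 + [("5", Decimal("-71"), "911")] * 3


def unmatched_positives(rows):
    """Return {(report_id, my_id, amount): count} of positives left unpaired."""
    balance = Counter()
    for report_id, amount, my_id in rows:
        key = (report_id, my_id, abs(amount))
        # Each positive adds one to the group, each negative cancels one.
        balance[key] += 1 if amount > 0 else -1
    return {key: n for key, n in balance.items() if n > 0}


leftover = unmatched_positives(rows)
for (report_id, my_id, amount), n in sorted(leftover.items()):
    print(report_id, my_id, amount, "x", n)
```

On the sample data this leaves one 100, one 40.17 and three 71s, which matches the row counts quoted for the ROW_NUMBER query.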
A colleague of mine has a problem with a sql query:-
Take the following as an example, two temp tables:-
select 'John' as name,10 as value into #names
UNION ALL SELECT 'Abid',20
UNION ALL SELECT 'Alyn',30
UNION ALL SELECT 'Dave',15;
select 'John' as name,'SQL Expert' as job into #jobs
UNION ALL SELECT 'Alyn','Driver'
UNION ALL SELECT 'Abid','Case Statement';
We run the following query on the tables to give us a joined resultset:-
select #names.name, #names.value, #jobs.job
FROM #names left outer join #jobs
on #names.name = #jobs.name
name value job
John 10 SQL Expert
Abid 20 Case Statement
Alyn 30 Driver
Dave 15 NULL
As 'Dave' does not exist in the #jobs table, he is given a NULL value as expected.
My colleague wants to modify the query so each NULL value is given the same value as the previous entry.
So the above would be:-
name value job
John 10 SQL Expert
Abid 20 Case Statement
Alyn 30 Driver
Dave 15 Driver
Note that Dave is now a 'Driver'
There may be more than one NULL value in sequence,
name value job
John 10 SQL Expert
Abid 20 Case Statement
Alyn 30 Driver
Dave 15 NULL
Joe 15 NULL
Pete 15 NULL
In this case Dave, Joe and Pete should all be 'Driver', as 'Driver' is the last non null entry.
There are probably better ways to do this. Here is one way to achieve the result: use a Common Table Expression (CTE), then OUTER APPLY against its output to find the previous person's job. The query uses id to sort the records and then determines what the previous person's job was. You need at least one criterion to sort the records by, because data in tables is considered an unordered set.
Also, the assumption is that the first person in the sequence should have a job. If the first person doesn't have a job, then there is no value to pick from.
Script:
CREATE TABLE names
(
id INT NOT NULL IDENTITY
, name VARCHAR(20) NOT NULL
, value INT NOT NULL
);
CREATE TABLE jobs
(
id INT NOT NULL
, job VARCHAR(20) NOT NULL
);
INSERT INTO names (name, value) VALUES
('John', 10),
('Abid', 20),
('Alyn', 30),
('Dave', 40),
('Jill', 50),
('Jane', 60),
('Steve', 70);
INSERT INTO jobs (id, job) VALUES
(1, 'SQL Expert'),
(2, 'Driver' ),
(5, 'Engineer'),
(6, 'Barrista');
;WITH empjobs AS
(
SELECT
TOP 100 PERCENT n.id
, n.name
, n.value
, job
FROM names n
LEFT OUTER JOIN jobs j
on j.id = n.id
ORDER BY n.id
)
SELECT e1.id
, e1.name
, e1.value
, COALESCE(e1.job , e2.job) job FROM empjobs e1
OUTER APPLY (
SELECT
TOP 1 job
FROM empjobs e2
WHERE e2.id < e1.id
AND e2.job IS NOT NULL
ORDER BY e2.id DESC
) e2;
Output:
ID NAME VALUE JOB
--- ------ ----- -------------
1 John 10 SQL Expert
2 Abid 20 Driver
3 Alyn 30 Driver
4 Dave 40 Driver
5 Jill 50 Engineer
6 Jane 60 Barrista
7 Steve 70 Barrista
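For engines without OUTER APPLY, the same "carry the last known job forward" idea can be sketched with a correlated subquery that picks the newest jobs row at or before the current id. The example below is an approximation run on SQLite through Python, with INTEGER PRIMARY KEY standing in for IDENTITY and LIMIT 1 standing in for TOP 1; the data is the same as in the script above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT NOT NULL, value INT NOT NULL);
CREATE TABLE jobs (id INT NOT NULL, job TEXT NOT NULL);
INSERT INTO names (name, value) VALUES
  ('John', 10), ('Abid', 20), ('Alyn', 30), ('Dave', 40),
  ('Jill', 50), ('Jane', 60), ('Steve', 70);
INSERT INTO jobs (id, job) VALUES
  (1, 'SQL Expert'), (2, 'Driver'), (5, 'Engineer'), (6, 'Barrista');
""")

# For each person, take the job with the highest id not exceeding their own.
# A person with their own jobs row gets it; otherwise the previous one is used.
filled = con.execute("""
    SELECT n.id, n.name, n.value,
           (SELECT j.job
              FROM jobs j
             WHERE j.id <= n.id
             ORDER BY j.id DESC
             LIMIT 1) AS job
    FROM names n
    ORDER BY n.id
""").fetchall()

for row in filled:
    print(row)
```

As in the OUTER APPLY version, a person who comes before the first jobs row would still get NULL.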
What do you mean by "last" non-null entry? You need a well-defined ordering for "last" to have a consistent meaning. Here's a query with data definitions that uses the "value" column to define last, and that might be close to what you want.
CREATE TABLE #names
(
id INT NOT NULL IDENTITY
, name VARCHAR(20) NOT NULL
, value INT NOT NULL PRIMARY KEY
);
CREATE TABLE #jobs
(
name VARCHAR(20) NOT NULL
, job VARCHAR(20) NOT NULL
);
INSERT INTO #names (name, value) VALUES
('John', 10),
('Abid', 20),
('Alyn', 30),
('Dave', 40),
('Jill', 50),
('Jane', 60),
('Steve', 70);
INSERT INTO #jobs (name, job) VALUES
('John', 'SQL Expert'),
('Abid', 'Driver' ),
('Alyn', 'Engineer'),
('Dave', 'Barrista');
with Partial as (
select
#names.name,
#names.value,
#jobs.job as job
FROM #names left outer join #jobs
on #names.name = #jobs.name
)
select
name,
value,
(
select top 1 job
from Partial as P
where job is not null
and P.value <= Partial.value
order by value desc
) as job
from Partial;
It might be more efficient to insert the data, then update.