Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
I'm looking to break a partition based on a NULL value, as seen below in the 'GroupNumber' column. Within my window function statements there isn't another identifier in my dataset that could break the groups apart, so the point is to create this "GroupNumber" column myself. Is there a way to break/reset the partition when a NULL value exists (ordered by date DESC)? Note: there can be multiple NULL instances for each partition. Any help is appreciated.
METHODOLOGY:
Create a bit flag column to represent NULL values.
Use a rolling sum of that flag (sorted by date DESC) to create the groups. This works well because the group field increments at each observed NULL value, so it can then be used as a partition for aggregate calculations.
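The two methodology steps amount to a running total of a NULL flag. Here is a minimal Python sketch of that logic (not T-SQL), assuming the rows are already sorted by Date DESC; `group_numbers` is a hypothetical helper name, and the `+ 1` only makes the numbering start at 1.

```python
from typing import List, Optional, Sequence


def group_numbers(values: Sequence[Optional[int]]) -> List[int]:
    """Rolling sum of a NULL bit flag, plus 1 so numbering starts at 1.

    `values` must already be in the window's sort order (Date DESC here).
    None plays the role of SQL NULL.
    """
    groups: List[int] = []
    running = 0
    for value in values:
        flag = 1 if value is None else 0  # step 1: bit flag for NULL
        running += flag                   # step 2: rolling sum over preceding rows
        groups.append(running + 1)
    return groups


# Number column for ID 1001, ordered by Date DESC (2018-08-12 down to 2018-08-03)
numbers = [35, 27, 7, 18, None, 3, 42, 16, 9, None]
print(group_numbers(numbers))  # [1, 1, 1, 1, 2, 2, 2, 2, 2, 3]
```

With this rule each NULL opens a new group, so the trailing NULL on 2018-08-03 ends up alone in group 3. If each NULL should instead close its group, flag the row after the NULL rather than the NULL itself.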
EXAMPLE SETUP:
IF OBJECT_ID('tempdb..#GroupNULL', 'U') IS NOT NULL
DROP TABLE #GroupNULL
CREATE TABLE #GroupNULL
([ID] INT NOT NULL,
[Date] date NULL,
[Number] INT NULL)
INSERT INTO #GroupNULL (ID, Date, Number) VALUES (1001, '8/12/2018', 35)
INSERT INTO #GroupNULL (ID, Date, Number) VALUES (1001, '8/11/2018', 27)
INSERT INTO #GroupNULL (ID, Date, Number) VALUES (1001, '8/10/2018', 7)
INSERT INTO #GroupNULL (ID, Date, Number) VALUES (1001, '8/9/2018', 18)
INSERT INTO #GroupNULL (ID, Date, Number) VALUES (1001, '8/8/2018', NULL)
INSERT INTO #GroupNULL (ID, Date, Number) VALUES (1001, '8/7/2018', 3)
INSERT INTO #GroupNULL (ID, Date, Number) VALUES (1001, '8/6/2018', 42)
INSERT INTO #GroupNULL (ID, Date, Number) VALUES (1001, '8/5/2018', 16)
INSERT INTO #GroupNULL (ID, Date, Number) VALUES (1001, '8/4/2018', 9)
INSERT INTO #GroupNULL (ID, Date, Number) VALUES (1001, '8/3/2018', NULL)
FURTHER CONTEXT: I would like to partition this dataset into 2 groups, with the first NULL value (ordered by date DESC) to be the first value of the group.
Here's an example that should get you pretty close. It uses a windowed aggregate to count the NULLs seen so far, in a given order of the rows returned by the query. Running aggregates such as SUM() OVER (ORDER BY ... ROWS UNBOUNDED PRECEDING) require SQL Server 2012 or later (they also work on Azure SQL Database).
IF OBJECT_ID('t1', 'U') IS NOT NULL DROP TABLE t1
create table t1 (col1 int, col2 int)
insert into t1(col1, col2) values (1, 1)
insert into t1(col1, col2) values (1, 10)
insert into t1(col1, col2) values (2, NULL)
insert into t1(col1, col2) values (2, 10)
insert into t1(col1, col2) values (3, 2)
insert into t1(col1, col2) values (3, NULL)
SELECT
col1,
col2,
IsBoundary,
SUM(IsBoundary) OVER(ORDER BY col1, col2 ROWS UNBOUNDED PRECEDING) + 1 as GroupNumber
FROM
(
SELECT
col1,
col2,
CASE WHEN col2 is NULL then 1 ELSE 0 END as IsBoundary
FROM
t1
) A
ORDER BY col1, col2
col1 col2 IsBoundary GroupNumber
----------- ----------- ----------- -----------
1 1 0 1
1 10 0 1
2 NULL 1 2
2 10 0 2
3 NULL 1 3
3 2 0 3
SOLUTION
SELECT x.*,
SUM(Flagged) OVER(ORDER BY ID, Date DESC ROWS UNBOUNDED PRECEDING) AS [GroupNumber]
FROM
(SELECT *,
CASE WHEN LAG(Number) OVER(PARTITION BY ID ORDER BY Date DESC) IS NULL
THEN 1
ELSE 0
END AS [Flagged]
FROM #GroupNULL) x
ID Date Number Flagged GroupNumber
----------- ---------- ----------- ----------- -----------
1001 2018-08-12 35 1 1
1001 2018-08-11 27 0 1
1001 2018-08-10 7 0 1
1001 2018-08-09 18 0 1
1001 2018-08-08 NULL 0 1
1001 2018-08-07 3 1 2
1001 2018-08-06 42 0 2
1001 2018-08-05 16 0 2
1001 2018-08-04 9 0 2
1001 2018-08-03 NULL 0 2
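The SOLUTION query flags a row when the previous row (by Date DESC) is NULL, or when there is no previous row at all, so each NULL closes its group instead of opening one. A Python sketch of the same logic, assuming a single ID and rows already sorted by Date DESC (`group_numbers_lag` is a hypothetical name):

```python
from typing import List, Optional, Sequence


def group_numbers_lag(values: Sequence[Optional[int]]) -> List[int]:
    """Running sum of Flagged, where Flagged = 1 when LAG(Number) IS NULL.

    LAG() is NULL both on the first row and on the row right after a NULL,
    so both cases start a new group. None plays the role of SQL NULL.
    """
    groups: List[int] = []
    running = 0
    previous: Optional[int] = None  # what LAG() returns on the first row
    for value in values:
        if previous is None:  # Flagged = 1
            running += 1
        groups.append(running)
        previous = value
    return groups


numbers = [35, 27, 7, 18, None, 3, 42, 16, 9, None]
print(group_numbers_lag(numbers))  # [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
```

In the SQL, the SUM() OVER is ordered by ID, Date DESC but not partitioned by ID, so group numbers keep increasing across IDs instead of restarting; adding PARTITION BY ID to that window would restart them per ID.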
I have 3 tables with the following schema
create table main (
main_id int PRIMARY KEY,
secondary_id int NOT NULL
);
create table secondary (
secondary_id int NOT NULL,
tags varchar(100)
);
create table bad_words (
words varchar(100) NOT NULL
);
insert into main values (1, 1001);
insert into main values (2, 1002);
insert into main values (3, 1003);
insert into main values (4, 1004);
insert into secondary values (1001, 'good word');
insert into secondary values (1002, 'bad word');
insert into secondary values (1002, 'good word');
insert into secondary values (1002, 'other word');
insert into secondary values (1003, 'ugly');
insert into secondary values (1003, 'bad word');
insert into secondary values (1004, 'pleasant');
insert into secondary values (1004, 'nice');
insert into bad_words values ('bad word');
insert into bad_words values ('ugly');
insert into bad_words values ('worst');
expected output
----------------
1, 1001, good word, 0
2, 1002, bad word,good word,other word, 1
3, 1003, ugly,bad word, 1
4, 1004, pleasant,nice, 0
(The last column is a boolean flag indicating whether the tags contain any one of the words from the bad_words table.)
I am trying to use CASE to select 1 or 0 for the last column and a join to combine the main and secondary tables, but I'm getting confused and stuck. Can someone please help me with a query? These tables are stored in Redshift, and I want a query compatible with Redshift.
You can use the above schema to try your query in SQL Fiddle.
EDIT: I have updated the schema and expected output by removing the PRIMARY KEY from the secondary table so that it is easier to join with the bad_words table.
You can use EXISTS and a regex comparison with \m and \M (markers for beginning and end of a word, respectively):
with
main(main_id, secondary_id) as (values (1, 1000), (2, 1001), (3, 1002), (4, 1003)),
secondary(secondary_id, tags) as (values (1000, 'very good words'), (1001, 'good and bad words'), (1002, 'ugly'),(1003, 'pleasant')),
bad_words(words) as (values ('bad'), ('ugly'), ('worst'))
select *, exists (select 1 from bad_words where s.tags ~* ('\m'||words||'\M'))::int as flag
from main m
join secondary s using (secondary_id)
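To see why the \m and \M markers matter, here is a small Python sketch (not Redshift or PostgreSQL) contrasting a whole-word match with a plain substring test. Python's re module has no \m/\M, so \b stands in for both; that is equivalent here only because every bad word begins and ends with a word character. `has_bad_word` is a hypothetical helper, and re.escape makes each bad word literal, whereas the SQL version interprets the words as regex patterns.

```python
import re
from typing import Iterable


def has_bad_word(tags: str, bad_words: Iterable[str]) -> bool:
    """Case-insensitive whole-word test, similar in spirit to
    tags ~* ('\\m' || words || '\\M'). Python's re has no \\m / \\M,
    so \\b (word boundary) stands in for both."""
    return any(
        re.search(r"\b" + re.escape(word) + r"\b", tags, flags=re.IGNORECASE)
        for word in bad_words
    )


bad = ["bad", "ugly", "worst"]
print(has_bad_word("good and bad words", bad))  # True
print(has_bad_word("badge", bad))               # False: "bad" is not a whole word
print("bad" in "badge")                         # True: plain substring test
```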
select main_id, a.secondary_id, tags, case when c.words is not null then 1 else 0 end
from main a
join secondary b on b.secondary_id = a.secondary_id
left outer join bad_words c on c.words like b.tags
SELECT m.main_id, m.secondary_id, t.tags, t.is_bad_word
FROM srini.main m
JOIN (
SELECT st.secondary_id, st.tags, exists (select 1 from srini.bad_words b where st.tags like '%'+b.words+'%') is_bad_word
FROM
( SELECT secondary_id, LISTAGG(tags, ',') as tags
FROM srini.secondary
GROUP BY secondary_id ) st
) t on t.secondary_id = m.secondary_id;
This worked for me in Redshift and produced the following output with the schema above:
1 1001 good word false
3 1003 ugly,bad word true
2 1002 good word,other word,bad word true
4 1004 pleasant,nice false
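The aggregate-then-flag approach can be sketched in plain Python to make its two steps explicit. This assumes the updated schema from the question; `flag_tags` is a hypothetical helper. LISTAGG without WITHIN GROUP (ORDER BY ...) does not guarantee the order of the concatenated tags (the output above lists them in a different order), so the insertion order used here is only one possible result.

```python
from typing import Dict, List, Tuple

main = [(1, 1001), (2, 1002), (3, 1003), (4, 1004)]
secondary = [
    (1001, "good word"), (1002, "bad word"), (1002, "good word"),
    (1002, "other word"), (1003, "ugly"), (1003, "bad word"),
    (1004, "pleasant"), (1004, "nice"),
]
bad_words = ["bad word", "ugly", "worst"]


def flag_tags(
    main: List[Tuple[int, int]],
    secondary: List[Tuple[int, str]],
    bad_words: List[str],
) -> List[Tuple[int, int, str, bool]]:
    # LISTAGG(tags, ',') ... GROUP BY secondary_id (insertion order kept)
    grouped: Dict[int, List[str]] = {}
    for secondary_id, tag in secondary:
        grouped.setdefault(secondary_id, []).append(tag)

    result = []
    for main_id, secondary_id in main:
        if secondary_id not in grouped:  # inner JOIN drops unmatched rows
            continue
        tags = ",".join(grouped[secondary_id])
        # EXISTS (... WHERE tags LIKE '%' + words + '%'): a substring test
        is_bad = any(word in tags for word in bad_words)
        result.append((main_id, secondary_id, tags, is_bad))
    return result


for row in flag_tags(main, secondary, bad_words):
    print(row)
```

Because the test is a substring match on the concatenated string, a bad word also matches when it appears inside a longer tag (for example, "bad word" inside "bad words").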
I need to fetch data from multiple tables, but with some special conditions. Here are test scripts to re-create the problem:
create table sub_test (sub_id number);
create table sub_svc_test (sub_id number, sub_svc_id number);
create table sub_svc_parm_test (sub_svc_id number, parm_id number, val varchar2(20) );
insert into sub_test values (100);
insert into sub_test values (101);
insert into sub_test values (102);
insert into sub_svc_test values (100,1001);
insert into sub_svc_test values (100,1002);
insert into sub_svc_test values (101,1005);
insert into sub_svc_test values (101,1006);
insert into sub_svc_test values (101,1007);
insert into sub_svc_test values (102,1009);
insert into sub_svc_test values (102,1010);
insert into sub_svc_parm_test values (1001, 51, 'test_id');
insert into sub_svc_parm_test values (1001, 53, 'no');
insert into sub_svc_parm_test values (1002, 54, 'max');
insert into sub_svc_parm_test values (1005, 51, 'test_id');
insert into sub_svc_parm_test values (1007, 51, 'test_id');
insert into sub_svc_parm_test values (1007, 54, 'min');
I need to fetch values of the VAL column from the sub_svc_parm_test table for a particular parm_id, like this:
select * from sub_svc_test ss, sub_svc_parm_test ssp
where ss.sub_svc_id = ssp.sub_svc_id and parm_id = 51;
This query gives me the VAL for parm_id 51. Now I need to create a view that shows the VAL for parm_id 51 and parm_id 54, but in different columns, like this:
select ssp.val, ssp1.val
from sub_svc_test ss, sub_svc_parm_test ssp, sub_svc_test ss1,
sub_svc_parm_test ssp1
where ss.sub_svc_id = ssp.sub_svc_id
and ssp.parm_id = 51
and ssp1.parm_id = 54
and ss1.sub_svc_id = ssp1.sub_svc_id;
This query gives me output, but it also performs a cross join because I have not joined sub_svc_test ss with sub_svc_test ss1, so it returns 6 rows (3 x 2).
The requirement is that it should return as many rows as the longest column (in our case the first column, 3 rows), and the rows that have no data for the other column can contain any string or simply NULL, like this:
VAL VAL_1
-------------- -------------
test_id max
test_id min
test_id null
I am using:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
Please ask for any clarification
Thanks
So you actually want to perform two separate selects and display them next to each other.
Maybe something like this could work:
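-- number each list in sub_svc_id order so the pairing is deterministic
select a.val, b.val1
from (select row_number() over (order by sub_svc_id) as r, val
from sub_svc_parm_test where parm_id = 51) a
full outer join
(select row_number() over (order by sub_svc_id) as r1, val as val1
from sub_svc_parm_test where parm_id = 54) b
on a.r = b.r1
order by coalesce(a.r, b.r1);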
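Pairing two independent lists by position is essentially a full outer join on a row number. A Python sketch of that idea, assuming rows are numbered in sub_svc_id order (the question does not say which order should decide the pairing); `side_by_side` is a hypothetical name:

```python
from itertools import zip_longest
from typing import List, Optional, Tuple

# (sub_svc_id, parm_id, val) rows from the question's sub_svc_parm_test inserts
parms = [
    (1001, 51, "test_id"), (1001, 53, "no"), (1002, 54, "max"),
    (1005, 51, "test_id"), (1007, 51, "test_id"), (1007, 54, "min"),
]


def side_by_side(
    rows: List[Tuple[int, int, str]], left_parm: int, right_parm: int
) -> List[Tuple[Optional[str], Optional[str]]]:
    ordered = sorted(rows)  # by sub_svc_id (then parm_id), like ROW_NUMBER() OVER (ORDER BY sub_svc_id)
    left = [val for _, parm_id, val in ordered if parm_id == left_parm]
    right = [val for _, parm_id, val in ordered if parm_id == right_parm]
    # FULL OUTER JOIN on row position: the shorter list is padded with None (NULL)
    return list(zip_longest(left, right))


print(side_by_side(parms, 51, 54))
# [('test_id', 'max'), ('test_id', 'min'), ('test_id', None)]
```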
I am writing a query against two different tables, Table A and Table B.
Here is the query:
select
A.OUT_NUM,
A.TIMESTAMP,
A.LAST_name,
A.event_type,
A.comments,
B.name
from TABLEA A
left outer join TABLEB B ON A.feed_id = B.id
where A.OUT_NUM = '12345'
and A.event_type In ('cause','status')
B.NAME is not null when event_type = 'XYZ'; otherwise it is null.
I only want to see rows where event_type is in ('CAUSE','STATUS'), but I also want the name field to be populated rather than empty.
The second table is what I am trying to achieve.
Thanks
Making some assumptions about your data, as discussed in the comments (particularly about how to match and pick a substitute name value), and with some dummy data that I think matches yours:
create table tablea(out_num number,
equip_name varchar2(5),
event_type varchar2(10),
comments varchar2(10),
timestamp date, feed_id number);
create table tableb(id number, name varchar2(10));
alter session set nls_date_format = 'MM/DD/YYYY HH24:MI';
insert into tablea values (12345, null, 'abcd', null, to_date('02/11/2013 11:12'), 1);
insert into tablea values (12345, null, 'abcd', null, to_date('02/11/2013 11:11'), 1);
insert into tablea values (12345, null, 'abcd', null, to_date('02/11/2013 11:06'), 1);
insert into tablea values (12345, null, 'abcd', null, to_date('02/11/2013 11:06'), 1);
insert into tablea values (12345, null, 'SUB', null, to_date('02/11/2013 11:11'), 2);
insert into tablea values (12345, null, 'SUB', null, to_date('02/11/2013 11:12'), 2);
insert into tablea values (12345, null, 'XYZ', null, to_date('02/11/2013 11:13'), 3);
insert into tablea values (12345, null, 'XYZ', null, to_date('02/11/2013 11:13'), 3);
insert into tablea values (12345, null, 'XYZ', null, to_date('02/11/2013 11:13'), 3);
insert into tablea values (12345, null, 'XYZ', null, to_date('02/11/2013 11:13'), 3);
insert into tablea values (12345, null, 'XYZ', null, to_date('02/11/2013 11:13'), 3);
insert into tablea values (12345, null, 'XYZ', null, to_date('02/11/2013 11:03'), 3);
insert into tablea values (12345, null, 'CAUSE', 'APPLE', to_date('02/11/2013 11:13'), 4);
insert into tablea values (12345, null, 'CAUSE', 'APPLE', to_date('02/11/2013 11:13'), 4);
insert into tablea values (12345, null, 'CAUSE', 'APPLE', to_date('02/11/2013 11:13'), 4);
insert into tablea values (12345, null, 'STATUS', 'BOOKS', to_date('02/11/2013 11:13'), 5);
insert into tablea values (12345, null, 'STATUS', 'BOOKS', to_date('02/11/2013 11:13'), 5);
insert into tablea values (12345, null, 'STATUS', 'BOOKS', to_date('02/11/2013 11:03'), 5);
insert into tableb values(3, 'LION');
This gets your result:
select * from (
select a.out_num,
a.timestamp,
a.equip_name,
a.event_type,
a.comments,
coalesce(b.name,
first_value(b.name)
over (partition by a.out_num
order by b.name nulls last)) as name
from tablea a
left outer join tableb b on a.feed_id = b.id
where a.out_num = '12345'
and a.event_type in ('CAUSE', 'STATUS', 'XYZ')
)
where event_type in ('CAUSE', 'STATUS');
OUT_NUM TIMESTAMP EQUIP_NAME EVENT_TYPE COMMENTS NAME
---------- ------------------ ---------- ---------- ---------- ----------
12345 02/11/2013 11:03 STATUS BOOKS LION
12345 02/11/2013 11:13 STATUS BOOKS LION
12345 02/11/2013 11:13 STATUS BOOKS LION
12345 02/11/2013 11:13 CAUSE APPLE LION
12345 02/11/2013 11:13 CAUSE APPLE LION
12345 02/11/2013 11:13 CAUSE APPLE LION
The inner query includes XYZ and uses the analytic first_value() function to pick a name if the directly matched value is null - the coalesce may not be necessary if there really will never be a direct match. (You might also need to adjust the partition by or order by clauses if the assumptions are wrong). The outer query just strips out the XYZ records since you don't want those.
If you want to get a name value from any matching record then just remove the filter in the inner query.
But now you're perhaps more likely to have more than one non-null record; this will give you one that matches a.feed_id if it exists, or the 'first' one (alphabetically, ish) for that out_num if it doesn't. You could order by b.id instead, or any other column in tableb; ordering by anything in tablea would need a different solution. If you'll only have one possible match anyway then it doesn't really matter and you can leave out the order by, though it's better to have it anyway.
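The COALESCE/FIRST_VALUE combination can be traced step by step in Python. This is a sketch only, assuming the rows passed in are what the inner query's WHERE clause leaves (so dropping that filter means passing more rows) and using Python string ordering as a stand-in for Oracle's sort order; `fill_names` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (out_num, event_type, name from the LEFT JOIN; None when tableb had no match)
Row = Tuple[int, str, Optional[str]]


def fill_names(rows: Iterable[Row], keep: Tuple[str, ...] = ("CAUSE", "STATUS")) -> List[Row]:
    materialized = list(rows)
    # FIRST_VALUE(b.name) OVER (PARTITION BY out_num ORDER BY b.name NULLS LAST)
    # is the smallest non-null name per out_num (absent if every name is null).
    first: Dict[int, str] = {}
    for out_num, _, name in materialized:
        if name is not None and (out_num not in first or name < first[out_num]):
            first[out_num] = name
    # COALESCE(b.name, <first value>), then the outer query's event_type filter
    return [
        (out_num, event_type, name if name is not None else first.get(out_num))
        for out_num, event_type, name in materialized
        if event_type in keep
    ]


inner = [(12345, "XYZ", "LION"), (12345, "CAUSE", None), (12345, "STATUS", None)]
print(fill_names(inner))
# [(12345, 'CAUSE', 'LION'), (12345, 'STATUS', 'LION')]
```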
If I add some more data for a different out_num:
insert into tablea values (12346, null, 'abcd', null, to_date('02/11/2013 11:11'), 1);
insert into tablea values (12346, null, 'SUB', null, to_date('02/11/2013 11:12'), 2);
insert into tablea values (12346, null, 'XYZ', null, to_date('02/11/2013 11:13'), 6);
insert into tablea values (12346, null, 'CAUSE', 'APPLE', to_date('02/11/2013 11:14'), 4);
insert into tablea values (12346, null, 'STATUS', 'BOOKS', to_date('02/11/2013 11:15'), 5);
insert into tableb values(1, 'TIGER');
...then this - which just has the filter dropped, and I've left out the coalesce this time - gives the same answer for 12345, and this for 12346:
select * from (
select a.out_num,
a.timestamp,
a.equip_name,
a.event_type,
a.comments,
first_value(b.name)
over (partition by a.out_num
order by b.name nulls last) as name
from tablea a
left outer join tableb b on a.feed_id = b.id
)
where out_num = '12346'
and event_type in ('CAUSE', 'STATUS');
OUT_NUM TIMESTAMP EQUIP_NAME EVENT_TYPE COMMENTS NAME
---------- ------------------ ---------- ---------- ---------- ----------
12346 02/11/2013 11:14 CAUSE APPLE TIGER
12346 02/11/2013 11:15 STATUS BOOKS TIGER
... where TIGER is linked to abcd, not XYZ.
Use NVL() and LAG() functions.
A general example using my sample data. This query fills in the blank rows with data; see the FIRST_EXAM_DATE and LAST_EXAM_DATE columns:
SELECT id, name, proc_date, proc_type, first_exam_date
, NVL(prev_exam_date, LAG(prev_exam_date) OVER (ORDER BY name, proc_date)) last_exam_date
FROM
(
SELECT id, name, proc_date, proc_type, first_exam_date
, NVL(first_exam_date, LAG(first_exam_date) OVER (ORDER BY name, proc_date) ) prev_exam_date
FROM
(
SELECT id
, name
, proc_date
, proc_type
, (SELECT MIN(proc_date) OVER (PARTITION BY name, proc_date)
FROM stack_test WHERE proc_type LIKE 'Exam%' AND a.id = id
) first_exam_date
FROM stack_test a
));
ID NAME PROC_DATE PROC_TYPE FIRST_EXAM_DATE LAST_EXAM_DATE
--------------------------------------------------------------------------
1 George 1/1/2013 ExamA 1/1/2013 1/1/2013
2 George 1/3/2013 TreatmentA 1/1/2013
3 George 1/5/2013 TreatmentB 1/1/2013
4 George 2/1/2013 ExamB 2/1/2013 2/1/2013
5 George 2/5/2013 TreatmentA 2/1/2013
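The nested NVL(..., LAG(...)) calls look back at most two rows, so a third consecutive non-exam row after an exam would still get a NULL LAST_EXAM_DATE; and because LAG has no PARTITION BY name, a date can also carry over from one patient to the next. A Python sketch of the intended forward fill, assuming rows arrive ordered by name and proc_date (`last_exam_dates` is a hypothetical name). In Oracle, an analytic function with IGNORE NULLS, such as LAST_VALUE(first_exam_date IGNORE NULLS) OVER (PARTITION BY name ORDER BY proc_date), should express the same idea.

```python
from typing import List, Optional, Sequence, Tuple

_NO_NAME = object()  # sentinel: no patient seen yet


def last_exam_dates(rows: Sequence[Tuple[str, Optional[str]]]) -> List[Optional[str]]:
    """rows: (name, first_exam_date or None), already ordered by name, proc_date.

    Carries the most recent exam date forward through any number of
    non-exam rows, restarting for each name."""
    out: List[Optional[str]] = []
    current_name: object = _NO_NAME
    last: Optional[str] = None
    for name, exam_date in rows:
        if name != current_name:  # new patient: don't carry dates across people
            current_name, last = name, None
        if exam_date is not None:
            last = exam_date
        out.append(last)
    return out


george = [
    ("George", "2013-01-01"), ("George", None), ("George", None),
    ("George", "2013-02-01"), ("George", None),
]
print(last_exam_dates(george))
```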
Database: MS SQL 2005
Table:
EmployeeNumber | EntryDate | Status
Sample Data:
200 | 3/1/2009 | P
200 | 3/2/2009 | A
200 | 3/3/2009 | A
201 | 3/1/2009 | A
201 | 3/2/2009 | P
Where P is present, A is absent.
I have tried ROW_NUMBER() over a partition, but it does not generate the sequence I expect.
For the above data the sequence I expect is
1
1
2
1
1
SELECT EmployeeNumber, EntryDate, Status,
ROW_NUMBER() OVER (
PARTITION BY EmployeeNumber, Status
ORDER BY EmployeeNumber,EntryDate ) AS 'RowNumber'
FROM [Attendance]
I'm not sure I follow what you want with the 1 1 2 1 1 sequence, but simply adding an ORDER BY to your original query produces that sequence:
SELECT EmployeeNumber,
EntryDate,
Status,
ROW_NUMBER() OVER (PARTITION BY EmployeeNumber, Status ORDER BY EmployeeNumber, EntryDate) AS 'RowNumber'
FROM Attendance
ORDER BY EmployeeNumber, EntryDate
/*
EmployeeNumber EntryDate Status RowNumber
-------------- ----------------------- ------ --------------------
200 2009-03-01 00:00:00 P 1
200 2009-03-02 00:00:00 A 1
200 2009-03-03 00:00:00 A 2
201 2009-03-01 00:00:00 A 1
201 2009-03-02 00:00:00 P 1
(5 row(s) affected)
*/
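The final ORDER BY only changes how the rows are displayed; the numbering itself comes from PARTITION BY EmployeeNumber, Status, which counts every earlier row with the same status, not just the consecutive ones. The sample data happens to give 1 1 2 1 1 either way. A Python sketch contrasting the two, assuming rows are already sorted by EmployeeNumber and EntryDate; the function names are hypothetical, and `run_number` follows the common gaps-and-islands trick of subtracting the per-status row number from the per-employee row number. Neither function looks at gaps in the dates themselves.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, str]  # (EmployeeNumber, Status), ordered by EmployeeNumber, EntryDate


def row_number_by_status(rows: Sequence[Row]) -> List[int]:
    """ROW_NUMBER() OVER (PARTITION BY EmployeeNumber, Status ORDER BY EntryDate)."""
    counters: Dict[Row, int] = {}
    out: List[int] = []
    for row in rows:
        counters[row] = counters.get(row, 0) + 1
        out.append(counters[row])
    return out


def run_number(rows: Sequence[Row]) -> List[int]:
    """Position within each run of consecutive same-status rows, using the
    difference of two row numbers as a run key (gaps-and-islands)."""
    rn_emp: Dict[int, int] = {}
    rn_emp_status: Dict[Row, int] = {}
    in_run: Dict[Tuple[int, str, int], int] = {}
    out: List[int] = []
    for emp, status in rows:
        rn_emp[emp] = rn_emp.get(emp, 0) + 1
        rn_emp_status[(emp, status)] = rn_emp_status.get((emp, status), 0) + 1
        # constant within a run, strictly larger for each later run of the same status
        run_key = (emp, status, rn_emp[emp] - rn_emp_status[(emp, status)])
        in_run[run_key] = in_run.get(run_key, 0) + 1
        out.append(in_run[run_key])
    return out


sample = [(200, "P"), (200, "A"), (200, "A"), (201, "A"), (201, "P")]
print(row_number_by_status(sample), run_number(sample))  # both [1, 1, 2, 1, 1]

alternating = [(200, "A"), (200, "P"), (200, "A")]
print(row_number_by_status(alternating))  # [1, 1, 2]
print(run_number(alternating))            # [1, 1, 1]
```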
You should be able to do this with a recursive CTE in SQL 2005. Stealing Lieven's data:
CREATE TABLE #Attendance (EmployeeNumber INTEGER, EntryDate DATETIME, Status VARCHAR(1))
INSERT INTO #Attendance VALUES (200, '03/01/2009', 'P')
INSERT INTO #Attendance VALUES (200, '03/02/2009', 'A')
INSERT INTO #Attendance VALUES (200, '03/03/2009', 'A')
INSERT INTO #Attendance VALUES (200, '03/04/2009', 'A')
INSERT INTO #Attendance VALUES (200, '04/04/2009', 'A')
INSERT INTO #Attendance VALUES (200, '04/05/2009', 'A')
INSERT INTO #Attendance VALUES (201, '03/01/2009', 'A')
INSERT INTO #Attendance VALUES (201, '03/02/2009', 'A')
INSERT INTO #Attendance VALUES (201, '03/03/2009', 'P');
Then use this CTE to extract the sequence:
WITH Dates
(
EntryDate,
EmployeeNumber,
Status,
Days
)
AS
(
SELECT
a.EntryDate,
a.EmployeeNumber,
a.Status,
1
FROM
#Attendance a
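WHERE
-- anchor: first day of each run of consecutive dates for an employee
NOT EXISTS (SELECT 1 FROM #Attendance p
WHERE p.EmployeeNumber = a.EmployeeNumber
AND p.EntryDate = DATEADD(day, -1, a.EntryDate))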
-- RECURSIVE
UNION ALL
SELECT
a.EntryDate,
a.EmployeeNumber,
a.Status,
CASE WHEN (a.Status = Parent.Status) THEN Parent.Days + 1 ELSE 1 END
FROM
#Attendance a
INNER JOIN
Dates parent
ON
datediff(day, a.EntryDate, DateAdd(day, 1, parent.EntryDate)) = 0
AND
a.EmployeeNumber = parent.EmployeeNumber
)
SELECT * FROM Dates ORDER BY EmployeeNumber, EntryDate
OPTION (MAXRECURSION 0);
As a final note, the sequence does seem strange to me; depending on your requirements, there may be a better way of aggregating the data. Nevertheless, this will produce the sequence you require.
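The Days column in the recursive CTE follows a simple rule per employee. A Python sketch of that rule over Lieven's sample data, assuming rows are sorted by EmployeeNumber and EntryDate and that a gap in dates (here 2009-03-04 to 2009-04-04) should restart the count; `day_streaks` is a hypothetical name.

```python
from datetime import date, timedelta
from typing import List, Optional, Sequence, Tuple

Entry = Tuple[int, date, str]  # (EmployeeNumber, EntryDate, Status)


def day_streaks(rows: Sequence[Entry]) -> List[int]:
    """Days counter: +1 when the previous row is the same employee, exactly one
    day earlier, with the same status; otherwise restart at 1.
    Rows must be sorted by EmployeeNumber, EntryDate."""
    out: List[int] = []
    prev: Optional[Entry] = None
    days = 0
    for emp, day, status in rows:
        if (prev is not None and prev[0] == emp
                and prev[1] + timedelta(days=1) == day and prev[2] == status):
            days += 1
        else:
            days = 1
        out.append(days)
        prev = (emp, day, status)
    return out


attendance = [
    (200, date(2009, 3, 1), "P"), (200, date(2009, 3, 2), "A"),
    (200, date(2009, 3, 3), "A"), (200, date(2009, 3, 4), "A"),
    (200, date(2009, 4, 4), "A"), (200, date(2009, 4, 5), "A"),
    (201, date(2009, 3, 1), "A"), (201, date(2009, 3, 2), "A"),
    (201, date(2009, 3, 3), "P"),
]
print(day_streaks(attendance))  # [1, 1, 2, 3, 1, 2, 1, 2, 1]
```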
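Does this help you?
It doesn't produce the sequence you asked for (I have no idea how to do that), but it gives a rough measure of consecutive absence: for each employee, it counts pairs of adjacent days with the same status and adds one. That equals the length of an absence streak only when the employee has a single streak and no consecutive 'P' days.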
CREATE TABLE #Attendance (EmployeeNumber INTEGER, EntryDate DATETIME, Status VARCHAR(1))
INSERT INTO #Attendance VALUES (200, '03/01/2009', 'P')
INSERT INTO #Attendance VALUES (200, '03/02/2009', 'A')
INSERT INTO #Attendance VALUES (200, '03/03/2009', 'A')
INSERT INTO #Attendance VALUES (200, '03/04/2009', 'A')
INSERT INTO #Attendance VALUES (200, '04/04/2009', 'A')
INSERT INTO #Attendance VALUES (200, '04/05/2009', 'A')
INSERT INTO #Attendance VALUES (201, '03/01/2009', 'A')
INSERT INTO #Attendance VALUES (201, '03/02/2009', 'A')
INSERT INTO #Attendance VALUES (201, '03/03/2009', 'P')
SELECT a1.EmployeeNumber, [Absent] = COUNT(*) + 1
FROM #Attendance a1
INNER JOIN #Attendance a2 ON a1.EntryDate = a2.EntryDate - 1
AND a1.EmployeeNumber = a2.EmployeeNumber
AND a1.Status = a2.Status
GROUP BY a1.EmployeeNumber
You could use recursion, similar to what I have done here. It seems, though, that your problem is a little simpler, and since SQL Server limits recursion to 100 levels by default (OPTION (MAXRECURSION n) changes the limit, and 0 removes it), this might not work for people who are absent a lot unless you add that hint. Let me think about this a few minutes.
If you have a row for every single day, go with Lieven's join.