TSQL, Get top N unique rows across ordered groups - sql

I have the following table of values, sorted by arbitrary segment id specified by the user. ( I know how to do that query and below are the results )
SegmentID SequenceID
3 100
3 200
3 400
3 430
1 100
1 200
1 300
1 410
2 100
2 200
2 300
2 420
I need a SQL query ( Sql Server 2012 ) that returns top N Records in order of Precedence where SequenceID is not repeated.
Example: user wants 7 sequences in order of segment preference: 3, 1,2.
The correct answer is
SegmentID SequenceID
3 100
3 200
3 400
3 430
1 300
1 410
2 420
in a nutshell, i need to traverse recordset from top to bottom, grab unique sequences as i go and add to the list.
How can I do that in a TSql statement?

create table #data (SegmentID int,SequenceID int);
insert into #data values
(3,100),
(3,200),
(3,400),
(3,430),
(1,100),
(1,200),
(1,300),
(1,410),
(2,100),
(2,200),
(2,300),
(2,420);
This table declares the ordering preference:
create table #prefs (Preference int, SegmentID int);
insert into #prefs values(1,3),(2,1),(3,2);
with cte as
(
select #data.SegmentID,
#data.SequenceID,
Preference,
row_number() over (partition by SequenceID order by Preference) rn
from #data
inner join #prefs on #data.SegmentID = #prefs.SegmentID
)
select SegmentId,
SequenceID
from cte
where rn = 1
order by Preference, SequenceID;

DEMO:
http://rextester.com/JKNKD15000
With cte (SegmentID, SequenceID) as
(SELECT 3, 100 UNION ALL
SELECT 3, 200 UNION ALL
SELECT 3, 400 UNION ALL
SELECT 3, 430 UNION ALL
SELECT 1, 100 UNION ALL
SELECT 1, 200 UNION ALL
SELECT 1, 300 UNION ALL
SELECT 1, 410 UNION ALL
SELECT 2, 100 UNION ALL
SELECT 2, 200 UNION ALL
SELECT 2, 300 UNION ALL
SELECT 2, 420),
userOrder (SegmentID, orderID) as (
SELECT 3, 1 UNION ALL
SELECT 1, 2 UNION ALL
SELECT 2, 3),
Results (SegmentID, SequenceID, RN, orderID) as (
Select A.*
, Row_number() over (Partition by A.SequenceID order by B.orderID) RN
, B.orderID
from cte A
INNER JOIN userOrder B
on A.SegmentID = B.SegmentID)
Select Top 7 *
from results where RN = 1
order by OrderID, SequenceID

Related

how to select set of records is ID is present in one of them

Here is the table where ORGID/USERID makes unique combination:
ORGID USERID
1 1
1 2
1 3
1 4
2 1
2 5
2 6
2 7
3 9
3 10
3 11
I need to select all records (organizations and users) wherever USERID 1 is present. So USERID=1 is present in ORGID 1 and 2 so then select all users for these organizations including user 1 itself, i.e.
ORGID USERID
1 1
1 2
1 3
1 4
2 1
2 5
2 6
2 7
Is it possible to do it with one SQL query rather than SELECT *.. WHERE USERID IN (SELECT...
You could use exists:
select *
from mytable t
where exists (select 1 from mytable t1 where t1.orgid = t.orgid and t1.userid = 1)
Another option is window functions. In Postgres:
select *
from (
select t.*,
bool_or(userid = 1) over(partition by orgid) has_user_1
from mytable t
) t
where has_user_1
Or a more generic approach, that uses portable expressions:
select *
from (
select t.*,
max(case when userid = 1 then 1 else 0 end) over(partition by orgid) has_user_1
from mytable t
) t
where has_user_1 = 1
Yes, you can do it with a single select statement - no in or exists conditions, no analytic or aggregate functions in a subquery, etc. Why you want to do it that way is not clear; in any case, it is possible that the solution below is also more efficient than the alternatives. You will have to test on your real-life data to see if that is true.
The solution below has two potential disadvantages: it only works in Oracle (it uses a proprietary extension of SQL, the match_recognize clause); and it only works in Oracle 12.1 or higher.
with
my_table(orgid, userid) as (
select 1, 1 from dual union all
select 1, 2 from dual union all
select 1, 3 from dual union all
select 1, 4 from dual union all
select 2, 1 from dual union all
select 2, 5 from dual union all
select 2, 6 from dual union all
select 2, 7 from dual union all
select 3, 9 from dual union all
select 3, 10 from dual union all
select 3, 11 from dual
)
-- End of SIMULATED data (for testing), not part of the solution.
-- In real life you don't need the WITH clause; reference your actual table.
select *
from my_table
match_recognize(
partition by orgid
all rows per match
pattern (x* a x*)
define a as userid = 1
);
Output:
ORGID USERID
---------- ----------
1 1
1 2
1 3
1 4
2 1
2 5
2 7
2 6
You can use exists:
select ou.*
from orguser ou
where exists (select 1
from orguser ou ou2
where ou2.orgid = ou.orgid and ou2.userid = 1
);
Apart from Exists and windows function, you can use IN as follows:
select *
from your_table t
where t.orgid in (select t1.orgid from your_table t1 where t1.userid = 1)

Create table from loop output Oracle SQL

I need to pull a random sample from a table of ~5 million observations based on 175 demographic options. The demographic table is something like this form:
1 40 4%
2 30 3%
3 30 3%
- -
174 2 .02%
175 1 .01%
Basically I need this same demographic breakdown randomly sampled from the 5M row table. For each demographic I need a sample of the same one from the larger table but with 5x the number of observations (example: for demographic 1 I want a random sample of 200).
SELECT *
FROM (
SELECT *
FROM my_table
ORDER BY
dbms_random.value
)
WHERE rownum <= 100;
I've used this syntax before to get a random sample but is there any way I can modify this as a loop and substitute variable names from existing tables? I'll try to encapsulate the logic I need in pseudocode:
for (each demographic_COLUMN in TABLE1)
select random(5*num_obs_COLUMN in TABLE1) from ID_COLUMN in TABLE2
/*somehow join the results of each step in the loop into one giant column of IDs */
You could join your tables (assuming the 1-175 demographic value exists in both, or there is an equivalent column to join on), something like:
select id
from (
select d.demographic, d.percentage, t.id,
row_number() over (partition by d.demographic order by dbms_random.value) as rn
from demographics d
join my_table t on t.demographic = d.demographic
)
where rn <= 5 * percentage
Each row in the main table is given a random pseudo-row-number within its demographic (via the analytic row_number()). The outer query then uses the relevant percentage to select how many of those randomly-ordered rows for each demographic to return.
I'm not sure I've understood how you're actually picking exactly how many of each you want, so that probably needs to be adjusted.
Demo with a smaller sample in a CTE, and matching smaller match condition:
-- CTEs for sample data
with my_table (id, demographic) as (
select level, mod(level, 175) + 1 from dual connect by level <= 175000
),
demographics (demographic, percentage, str) as (
select 1, 40, '4%' from dual
union all select 2, 30, '3%' from dual
union all select 3, 30, '3%' from dual
-- ...
union all select 174, 2, '.02%' from dual
union all select 175, 1, '.01%' from dual
)
-- actual query
select demographic, percentage, id, rn
from (
select d.demographic, d.percentage, t.id,
row_number() over (partition by d.demographic order by dbms_random.value) as rn
from demographics d
join my_table t on t.demographic = d.demographic
)
where rn <= 5 * percentage;
DEMOGRAPHIC PERCENTAGE ID RN
----------- ---------- ---------- ----------
1 40 94150 1
1 40 36925 2
1 40 154000 3
1 40 82425 4
...
1 40 154350 199
1 40 126175 200
2 30 36051 1
2 30 1051 2
2 30 100451 3
2 30 18026 149
2 30 151726 150
3 30 125302 1
3 30 152252 2
3 30 114452 3
...
3 30 104652 149
3 30 70527 150
174 2 35698 1
174 2 67548 2
174 2 114798 3
...
174 2 70698 9
174 2 30973 10
175 1 139649 1
175 1 156974 2
175 1 145774 3
175 1 97124 4
175 1 40074 5
(you only need the ID, but I'm including the other columns for context); or more succinctly:
with my_table (id, demographic) as (
select level, mod(level, 175) + 1 from dual connect by level <= 175000
),
demographics (demographic, percentage, str) as (
select 1, 40, '4%' from dual
union all select 2, 30, '3%' from dual
union all select 3, 30, '3%' from dual
-- ...
union all select 174, 2, '.02%' from dual
union all select 175, 1, '.01%' from dual
)
select demographic, percentage, count(id) as ids, min(id) as min_id, max(id) as max_id
from (
select d.demographic, d.percentage, t.id,
row_number() over (partition by d.demographic order by dbms_random.value) as rn
from demographics d
join my_table t on t.demographic = d.demographic
)
where rn <= 5 * percentage
group by demographic, percentage
order by demographic;
DEMOGRAPHIC PERCENTAGE IDS MIN_ID MAX_ID
----------- ---------- ---------- ---------- ----------
1 40 200 175 174825
2 30 150 1 174126
3 30 150 2452 174477
174 2 10 23448 146648
175 1 5 19074 118649
db<>fiddle

SQL Grouping by first digit from sets of record

I need your help in SQL
I have a set of records of Cost center ID below.
what I want to do is to segregate/group them by inserting column to distinguish the category.
as you can see all digits start in 7 is belong to the bold digits.
my expected out is on below image also.
You can as the below:
DECLARE #Tbl TABLE (ID INT)
INSERT INTO #Tbl
VALUES
(735121201),
(735120001),
(5442244),
(735141094),
(735141097),
(4008060),
(735117603),
(40100000),
(735142902),
(735151199),
(4010070)
;WITH TableWithRowId
AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) RowId,
ID
FROM
#Tbl
), TempTable
AS
(
SELECT T.RowId + 1 AS RowId FROM TableWithRowId T
WHERE
LEFT(T.ID, 1) != 7
), ResultTable
AS
(
SELECT
T.RowId ,
T.ID,
DENSE_RANK() OVER (ORDER BY (SELECT TOP 1 A.RowId FROM TempTable A WHERE A.RowId > T.RowId ORDER BY A.RowId)) AS Flag
FROM TableWithRowId T
)
SELECT * FROM ResultTable
Result:
RowId ID Flag
----------- ----------- ----------
1 735121201 1
2 735120001 1
3 5442244 1
4 735141094 2
5 735141097 2
6 4008060 2
7 735117603 3
8 40100000 3
9 735142902 4
10 735151199 4
11 4010070 4
The following query is similer with NEER's
;WITH test_table(CenterID)AS(
SELECT '735121201' UNION ALL
SELECT '735120001' UNION ALL
SELECT '5442244' UNION ALL
SELECT '735141094' UNION ALL
SELECT '735141097' UNION ALL
SELECT '4008060' UNION ALL
SELECT '735117603' UNION ALL
SELECT '40100000' UNION ALL
SELECT '735142902' UNION ALL
SELECT '735151199' UNION ALL
SELECT '4010070'
),t1 AS (
SELECT *,ROW_NUMBER()OVER(ORDER BY(SELECT 1)) AS rn,CASE WHEN LEFT(t.CenterID,1)='7' THEN 1 ELSE 0 END AS isSeven
FROM test_table AS t
),t2 AS(
SELECT t1.*,ROW_NUMBER()OVER(ORDER BY t1.rn) AS toFilter
FROM t1 LEFT JOIN t1 AS pt ON pt.rn=t1.rn-1
WHERE pt.CenterID IS NULL OR (t1.isSeven=1 AND pt.isSeven=0)
)
SELECT t1.CenterID,x.toFilter FROM t1
CROSS APPLY(SELECT TOP 1 t2.toFilter FROM t2 WHERE t2.rn<=t1.rn ORDER BY rn desc) x
CenterID toFilter
--------- --------------------
735121201 1
735120001 1
5442244 1
735141094 2
735141097 2
4008060 2
735117603 3
40100000 3
735142902 4
735151199 4
4010070 4

Oracle sql group sum

I have table With ID,Sub_ID and value coloumns
ID SUB_ID Value
100 1 100
100 2 150
101 1 100
101 2 150
101 3 200
102 1 100
SUB ID can vary from 1..maxvalue( In this example it is 3). I need Sum of values for each Sub_ID. If SUB_ID is less than MAXVALUE for a particlaur ID then it should take MAX(SUB_ID) of each ID As shown below ( In this example for ID=100 for SUB_ID 3 it should take 150 i.e 2<3 so value=150))
SUB_ID SUM(values) Remarks
1 300 (100+100+100)
2 400 (150+150+100)
3 450 (150+200+100)
This can be easily done in PL/SQL . Can we use SQL for the same using Model Clause or any other options
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE TableA ( ID, SUB_ID, Value ) AS
SELECT 100, 1, 100 FROM DUAL
UNION ALL SELECT 100, 2, 150 FROM DUAL
UNION ALL SELECT 101, 1, 100 FROM DUAL
UNION ALL SELECT 101, 2, 150 FROM DUAL
UNION ALL SELECT 101, 3, 200 FROM DUAL
UNION ALL SELECT 102, 1, 100 FROM DUAL
Query 1:
WITH sub_ids AS (
SELECT LEVEL AS sub_id
FROM DUAL
CONNECT BY LEVEL <= ( SELECT MAX( SUB_ID ) FROM TableA )
),
max_values AS (
SELECT ID,
MAX( VALUE ) AS max_value
FROM TableA
GROUP BY ID
)
SELECT s.SUB_ID,
SUM( COALESCE( a.VALUE, m.max_value ) ) AS total_value
FROM sub_ids s
CROSS JOIN
max_values m
LEFT OUTER JOIN
TableA a
ON ( s.SUB_ID = a.SUB_ID AND m.ID = a.ID )
GROUP BY
s.SUB_ID
Results:
| SUB_ID | TOTAL_VALUE |
|--------|-------------|
| 1 | 300 |
| 2 | 400 |
| 3 | 450 |
Try this
SELECT SUB_ID,SUM(values),
(SELECT DISTINCT SUBSTRING(
(
SELECT '+'+ CAST(values AS VARCHAR)
FROM table_Name AS T2
WHERE T2.SUB_ID = d.SUB_ID
FOR XML PATH ('')
),2,100000)[values]) as values
FROm table_Name d
GROUP BY SUB_ID
How about something like this:
select max_vals.sub_id, sum(nvl(table_vals.value,max_vals.max_value)) as sum_values
from (
select all_subs.sub_id, t1.id, max(t1.value) as max_value
from your_table t1
cross join (select sub_id from your_table) all_subs
group by all_subs.sub_id, t1.id
) max_vals
left outer join your_table table_vals
on max_vals.id = table_vals.id
and max_vals.sub_id = table_vals.sub_id
group by max_vals.sub_id;
The inner query gets you a list of all sub_id/id combinations and their fall-back values. The out query uses an nvl to use the table value if it exists and the fall-back value if it doesn't.

Count occurrence of id across multiple columns

Ok, what am I doing wrong here. This should be simple...
I have a table that isn't normalized. I want to get a count of the IDs that appear in three columns of the table.
1 100 200 300
2 200 700 800
3 200 300 400
4 100 200 300
result:
2 100
4 200
3 300
1 400
1 700
1 800
Here is my attempt. The the union works. It is my attempt to sum and group them that fails:
select sum(cnt), ICDCodeID from
(
select count(*) cnt, ICDCodeID1 ICDCodeID from encounter
where (ICDCodeID1 is not null) group by ICDCodeID1
UNION ALL
select count(*) cnt, ICDCodeID2 ICDCodeID from encounter
where (ICDCodeID2 is not null) group by ICDCodeID2
UNION ALL
select count(*) cnt, ICDCodeID3 ICDCodeID from encounter
where (ICDCodeID3 is not null) group by ICDCodeID3
) group by cnt, ICDCodeID
Or better way?
Here is the error I am getting: "Incorrect syntax near the keyword 'GROUP'."
Maybe this helps:
SELECT D.ICDCode,
COUNT(*) as Cnt
FROM(SELECT ICDCodeID1 AS ICDCode
FROM encounter
UNION
ALL
SELECT ICDCodeID2 AS ICDCode
FROM encounter
UNION
ALL
SELECT ICDCodeID3 AS ICDCode
FROM encounter
)D
GROUP
BY D.ICDCode;
Try this:
-- build sample data
create table temp(
id int,
col1 int,
col2 int,
col3 int
)
insert into temp
select 1, 100, 200, 300 union all
select 2, 200, 700, 800 union all
select 3, 200, 300, 400 union all
select 4, 100, 200, 300
-- start
;with cte(id, col_value) as(
select id, col1 from temp union all
select id, col2 from temp union all
select id, col3 from temp
)
select
col_value,
count(*)
from cte
group by col_value
-- end
-- clean sample data
drop table temp