I have the table below, where all the columns are the same except for the group column. I want to calculate count(distinct group) and blocks in the same table:
Input:
id  time  CODE   group  value  total_blocks
1   22    32206  mn2    1      200
1   22    32206  mn4    1      200
Output:
id  time  CODE   group  value  count(distinct group)  blocks
1   22    32206  mn2    1      2                      100
1   22    32206  mn4    1      2                      100
count(distinct group) is simply the number of distinct group values (mn2 and mn4), i.e. 2. The total blocks for the code (32206) is 200, and I am splitting that evenly across the two rows (100 each).
The final output should look exactly like this, without removing any columns.
I tried using count(distinct) but it didn't work.
Oracle:
Try it like here:
WITH -- S a m p l e d a t a
tbl (ID, A_TIME, CODE, A_GROUP, A_VALUE, TOTAL_BLOCKS) AS
(
Select 1, 22, 32206, 'mn2', 1, 200 From Dual Union All
Select 1, 22, 32206, 'mn4', 1, 200 From Dual
)
-- S Q L --
Select
ID, A_TIME, CODE, A_GROUP, A_VALUE,
Count(DISTINCT A_GROUP) OVER(Partition By CODE) "COUNT_DIST_GROUP", -- Count distinct groups per CODE
-- Count(DISTINCT A_GROUP) OVER() "COUNT_DIST_GROUP", --Count distinct groups over the whole table
TOTAL_BLOCKS / Count(*) OVER(Partition By CODE) "BLOCKS" -- TOTAL_BLOCKS divided by number of rows per CODE
-- TOTAL_BLOCKS / Count(*) OVER() "BLOCKS" -- TOTAL_BLOCKS divided by number of rows in the whole table
From
tbl
R e s u l t :
ID A_TIME CODE A_GROUP A_VALUE COUNT_DIST_GROUP BLOCKS
---------- ---------- ---------- ------- ---------- ---------------- ----------
1 22 32206 mn2 1 2 100
1 22 32206 mn4 1 2 100
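The comments in the code explain the basic use of the Count() Over() analytic function. Oracle's documentation on analytic functions covers them in more detail.
Alternatively, using just the ROW_NUMBER() analytic function and the Max() aggregate function (note that this counts rows per CODE, so it matches the distinct count only when A_GROUP values are unique within a CODE):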
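To sanity-check the arithmetic outside the database, here is a minimal Python sketch of what the two analytic expressions compute per CODE partition. The function name `split_blocks` and the tuple layout are assumptions made for illustration only; they are not part of the original answer.

```python
from collections import defaultdict

# Sample rows mirroring the question: (id, time, code, group, value, total_blocks)
rows = [
    (1, 22, 32206, "mn2", 1, 200),
    (1, 22, 32206, "mn4", 1, 200),
]

def split_blocks(rows):
    """Append (distinct group count, blocks per row) to every row, partitioned by code."""
    groups_per_code = defaultdict(set)   # plays the role of Count(DISTINCT A_GROUP) OVER(Partition By CODE)
    rows_per_code = defaultdict(int)     # plays the role of Count(*) OVER(Partition By CODE)
    for _id, _time, code, group, _value, _total in rows:
        groups_per_code[code].add(group)
        rows_per_code[code] += 1
    result = []
    for row in rows:
        code, total = row[2], row[5]
        # Keep the first five columns, then add the two derived ones.
        result.append(row[:5] + (len(groups_per_code[code]), total / rows_per_code[code]))
    return result

for line in split_blocks(rows):
    print(line)
```

Like the SQL, the sketch divides by the number of rows per code, not by the number of distinct groups; the two only agree when each group appears once per code.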
-- S Q L --
Select
r.ID, r.A_TIME, r.CODE, t.A_GROUP, r.A_VALUE, MAX_RN "COUNT_DIST_GROUP", (TOTAL_BLOCKS / MAX_RN) "BLOCKS"
From
( SELECT ID, A_TIME, CODE, A_VALUE, Max(RN) "MAX_RN"
FROM (Select ID, A_TIME, CODE, A_VALUE, Row_Number() OVER(Partition By CODE Order By CODE, A_GROUP) "RN"
From tbl )
GROUP BY ID, A_TIME, CODE, A_VALUE ) r
Inner Join tbl t ON(t.CODE = r.CODE)
ID A_TIME CODE A_GROUP A_VALUE COUNT_DIST_GROUP BLOCKS
---------- ---------- ---------- ------- ---------- ---------------- ----------
1 22 32206 mn2 1 2 100
1 22 32206 mn4 1 2 100
... and it works with another group of similar data:
WITH -- S a m p l e d a t a
tbl (ID, A_TIME, CODE, A_GROUP, A_VALUE, TOTAL_BLOCKS) AS
(
Select 1, 22, 32206, 'mn2', 1, 200 From Dual Union All
Select 1, 22, 32206, 'mn4', 1, 200 From Dual Union All
--
Select 1, 22, 32207, 'mn6', 1, 450 From Dual Union All
Select 1, 22, 32207, 'mn7', 1, 450 From Dual Union All
Select 1, 22, 32207, 'mn8', 1, 450 From Dual
)
-- S Q L --
Select
r.ID, r.A_TIME, r.CODE, t.A_GROUP, r.A_VALUE, MAX_RN "COUNT_DIST_GROUP", (TOTAL_BLOCKS / MAX_RN) "BLOCKS"
From
( SELECT ID, A_TIME, CODE, A_VALUE, Max(RN) "MAX_RN"
FROM (Select ID, A_TIME, CODE, A_VALUE, Row_Number() OVER(Partition By CODE Order By CODE, A_GROUP) "RN"
From tbl )
GROUP BY ID, A_TIME, CODE, A_VALUE ) r
Inner Join tbl t ON(t.CODE = r.CODE)
ID A_TIME CODE A_GROUP A_VALUE COUNT_DIST_GROUP BLOCKS
---------- ---------- ---------- ------- ---------- ---------------- ----------
1 22 32206 mn2 1 2 100
1 22 32206 mn4 1 2 100
1 22 32207 mn6 1 3 150
1 22 32207 mn7 1 3 150
1 22 32207 mn8 1 3 150
I need to pull a random sample from a table of ~5 million observations based on 175 demographic options. The demographic table is something like this form:
1 40 4%
2 30 3%
3 30 3%
- -
174 2 .02%
175 1 .01%
Basically I need this same demographic breakdown randomly sampled from the 5M row table. For each demographic I need a sample of the same one from the larger table but with 5x the number of observations (example: for demographic 1 I want a random sample of 200).
SELECT *
FROM (
SELECT *
FROM my_table
ORDER BY
dbms_random.value
)
WHERE rownum <= 100;
I've used this syntax before to get a random sample but is there any way I can modify this as a loop and substitute variable names from existing tables? I'll try to encapsulate the logic I need in pseudocode:
for (each demographic_COLUMN in TABLE1)
select random(5*num_obs_COLUMN in TABLE1) from ID_COLUMN in TABLE2
/*somehow join the results of each step in the loop into one giant column of IDs */
You could join your tables (assuming the 1-175 demographic value exists in both, or there is an equivalent column to join on), something like:
select id
from (
select d.demographic, d.percentage, t.id,
row_number() over (partition by d.demographic order by dbms_random.value) as rn
from demographics d
join my_table t on t.demographic = d.demographic
)
where rn <= 5 * percentage
Each row in the main table is given a random pseudo-row-number within its demographic (via the analytic row_number()). The outer query then uses the relevant percentage to select how many of those randomly-ordered rows for each demographic to return.
I'm not sure I've understood how you're actually picking exactly how many of each you want, so that probably needs to be adjusted.
Demo with a smaller sample in a CTE, and matching smaller match condition:
-- CTEs for sample data
with my_table (id, demographic) as (
select level, mod(level, 175) + 1 from dual connect by level <= 175000
),
demographics (demographic, percentage, str) as (
select 1, 40, '4%' from dual
union all select 2, 30, '3%' from dual
union all select 3, 30, '3%' from dual
-- ...
union all select 174, 2, '.02%' from dual
union all select 175, 1, '.01%' from dual
)
-- actual query
select demographic, percentage, id, rn
from (
select d.demographic, d.percentage, t.id,
row_number() over (partition by d.demographic order by dbms_random.value) as rn
from demographics d
join my_table t on t.demographic = d.demographic
)
where rn <= 5 * percentage;
DEMOGRAPHIC PERCENTAGE ID RN
----------- ---------- ---------- ----------
1 40 94150 1
1 40 36925 2
1 40 154000 3
1 40 82425 4
...
1 40 154350 199
1 40 126175 200
2 30 36051 1
2 30 1051 2
2 30 100451 3
...
2 30 18026 149
2 30 151726 150
3 30 125302 1
3 30 152252 2
3 30 114452 3
...
3 30 104652 149
3 30 70527 150
174 2 35698 1
174 2 67548 2
174 2 114798 3
...
174 2 70698 9
174 2 30973 10
175 1 139649 1
175 1 156974 2
175 1 145774 3
175 1 97124 4
175 1 40074 5
(you only need the ID, but I'm including the other columns for context); or more succinctly:
with my_table (id, demographic) as (
select level, mod(level, 175) + 1 from dual connect by level <= 175000
),
demographics (demographic, percentage, str) as (
select 1, 40, '4%' from dual
union all select 2, 30, '3%' from dual
union all select 3, 30, '3%' from dual
-- ...
union all select 174, 2, '.02%' from dual
union all select 175, 1, '.01%' from dual
)
select demographic, percentage, count(id) as ids, min(id) as min_id, max(id) as max_id
from (
select d.demographic, d.percentage, t.id,
row_number() over (partition by d.demographic order by dbms_random.value) as rn
from demographics d
join my_table t on t.demographic = d.demographic
)
where rn <= 5 * percentage
group by demographic, percentage
order by demographic;
DEMOGRAPHIC PERCENTAGE IDS MIN_ID MAX_ID
----------- ---------- ---------- ---------- ----------
1 40 200 175 174825
2 30 150 1 174126
3 30 150 2452 174477
174 2 10 23448 146648
175 1 5 19074 118649
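For comparison, the same stratified-sampling idea can be sketched in plain Python: drawing n random ids from a demographic's pool is equivalent to numbering that pool in random order and keeping rn <= n. The function name `stratified_sample`, the `seed` parameter, and the in-memory data layout are assumptions for illustration; only the five demographic rows visible in the question are used.

```python
import random
from collections import defaultdict

def stratified_sample(ids_by_demo, wanted, seed=None):
    """Return {demographic: [ids]} with up to wanted[demo] random ids per demographic."""
    rng = random.Random(seed)
    picked = {}
    for demo, n in wanted.items():
        pool = ids_by_demo.get(demo, [])
        # Sampling without replacement; capped at the pool size if the pool is too small.
        picked[demo] = rng.sample(pool, min(n, len(pool)))
    return picked

# Hypothetical data shaped like the demo CTE: 175000 ids spread evenly over 175 demographics.
ids_by_demo = defaultdict(list)
for i in range(1, 175001):
    ids_by_demo[i % 175 + 1].append(i)

# 5x the observation counts from the demographic table (only the rows shown in the question).
counts = {1: 40, 2: 30, 3: 30, 174: 2, 175: 1}
wanted = {demo: 5 * n for demo, n in counts.items()}

sample = stratified_sample(ids_by_demo, wanted, seed=42)
print({demo: len(ids) for demo, ids in sample.items()})
```

The printed sizes should line up with the IDS column of the summary query above; the ids themselves differ from run to run unless a seed is fixed.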
I have a table like this:
id name value
1 elec 10
1 water 20
2 elec 15
2 water 45
Now I need to dynamically add some rows to the result of select query:
id name value
1 elec 10
1 water 20
1 ratio 0.5
2 elec 15
2 water 45
2 ratio 0.33
How can I add these two ratio rows dynamically?
It would make a lot more sense to "present" the results with ELEC, WATER and RATIO columns - one row per ID. The solution below shows how you can do that efficiently (reading the base table only one time).
with
inputs ( id, name, value ) as (
select 1, 'elec' , 10 from dual union all
select 1, 'water', 20 from dual union all
select 2, 'elec' , 15 from dual union all
select 2, 'water', 45 from dual
)
-- End of simulated inputs (not part of the solution).
-- SQL query begins BELOW THIS LINE. Use your actual table and column names.
select id, elec, water, round(elec/water, 2) as ratio
from inputs
pivot ( min(value) for name in ('elec' as elec, 'water' as water ) )
;
ID ELEC WATER RATIO
---------- ---------- ---------- ----------
1 10 20 .5
2 15 45 .33
If instead you need the results in the format you showed in your original post, you can unpivot like so (still reading the base table only once):
with
inputs ( id, name, value ) as (
select 1, 'elec' , 10 from dual union all
select 1, 'water', 20 from dual union all
select 2, 'elec' , 15 from dual union all
select 2, 'water', 45 from dual
)
-- End of simulated inputs (not part of the solution).
-- SQL query begins BELOW THIS LINE. Use your actual table and column names.
select id, name, value
from (
select id, elec, water, round(elec/water, 2) as ratio
from inputs
pivot ( min(value) for name in ('elec' as elec, 'water' as water ) )
)
unpivot ( value for name in (elec as 'elec', water as 'water', ratio as 'ratio') )
;
ID NAME VALUE
---------- ----- ----------
1 elec 10
1 water 20
1 ratio .5
2 elec 15
2 water 45
2 ratio .33
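The pivot, compute, unpivot sequence can also be pictured as a small Python sketch: collect each id's values into one record, derive the ratio, then emit three rows per id. The function name `add_ratio_rows` is hypothetical, and the sketch assumes every id has both an elec and a water row with a non-zero water value.

```python
def add_ratio_rows(rows):
    """rows: list of (id, name, value). Returns the rows plus one 'ratio' row per id."""
    by_id = {}
    for id_, name, value in rows:
        # "Pivot": one dict per id, keyed by name.
        by_id.setdefault(id_, {})[name] = value
    out = []
    for id_ in sorted(by_id):
        vals = by_id[id_]
        # "Unpivot": emit elec, water, then the derived ratio rounded to 2 decimals.
        out.append((id_, "elec", vals["elec"]))
        out.append((id_, "water", vals["water"]))
        out.append((id_, "ratio", round(vals["elec"] / vals["water"], 2)))
    return out

inputs = [(1, "elec", 10), (1, "water", 20), (2, "elec", 15), (2, "water", 45)]
for row in add_ratio_rows(inputs):
    print(row)
```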
Here is one method:
with t as (
<your query here>
)
select id, name, value
from ((select t.*, 1 as ord
from t
) union all
(select id, 'ratio',
max(case when name = 'elec' then value end) / max(case when name = 'water' then value end),
2 as ord
from t
group by id
)
) tt
order by id, ord;
If you are fine with a slight change in ordering, try this:
SELECT id,name,value FROM yourtable
UNION ALL
SELECT
a.id ,
'ratio' name,
a.value/b.value value
FROM
yourtable a
JOIN yourtable b on a.id = b.id
WHERE a.name = 'elec'
and b.name = 'water'
ORDER BY
id ,
VALUE DESC;
If you need to add the rows to the table itself, use:
INSERT INTO yourtable
SELECT
a.id ,
'ratio' name,
a.value/b.value value
FROM
yourtable a
JOIN yourtable b on a.id = b.id
WHERE a.name ='elec'
and b.name ='water';
My idea is to do something like this:
INPUT:
ID CURRENCY AMOUNT
1 RUS 14,55
1 USD 22,22
1 PLN 444,44
2 PLN 22
Then I want to group by ID and get output:
ID CUR_1 AMOUNT_1 CUR_2 AMOUNT_2 CUR_3 AMOUNT_3
1 RUS 14,55 USD 22,22 PLN 444,44
2 PLN 22
It is important to pair each amount with the right currency. The maximum number of pairs is 3, as for ID=1; it may vary from 1 to 3.
I tried using LISTAGG, but that causes problems with further processing of the data.
select *
from (select t.*, row_number() over (partition by id order by null) rn
from t)
pivot (max(currency) cur, sum(amount) amt for rn in (1, 2, 3))
Test:
with t(id, currency, amount) as (
select 1, 'RUS', 14.55 from dual union all
select 1, 'USD', 22.22 from dual union all
select 1, 'PLN', 444.44 from dual union all
select 2, 'PLN', 22 from dual )
select *
from (select t.*, row_number() over (partition by id order by null) rn
from t)
pivot (max(currency) cur, sum(amount) amt for rn in (1, 2, 3))
Output:
ID 1_CUR 1_AMT 2_CUR 2_AMT 3_CUR 3_AMT
---------- ----- ---------- ----- ---------- ----- ----------
1 RUS 14,55 USD 22,22 PLN 444,44
2 PLN 22
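As a rough illustration of what the row_number-plus-pivot query does, the following Python sketch numbers each id's (currency, amount) pairs and lays them out side by side, padding missing pairs with None. The function name `pivot_pairs` and the `max_pairs` parameter are assumptions for illustration. One difference to keep in mind: the SQL uses `order by null`, so the slot a pair lands in is arbitrary, whereas this sketch keeps input order.

```python
def pivot_pairs(rows, max_pairs=3):
    """rows: list of (id, currency, amount).
    Returns {id: [cur_1, amt_1, ..., cur_n, amt_n]}, padded with None up to max_pairs pairs."""
    per_id = {}
    for id_, currency, amount in rows:
        per_id.setdefault(id_, []).append((currency, amount))
    result = {}
    for id_, pairs in per_id.items():
        if len(pairs) > max_pairs:
            raise ValueError(f"id {id_} has more than {max_pairs} currency/amount pairs")
        # Flatten the pairs so each currency stays next to its own amount.
        flat = [item for pair in pairs for item in pair]
        flat += [None] * (2 * max_pairs - len(flat))
        result[id_] = flat
    return result

t = [(1, "RUS", 14.55), (1, "USD", 22.22), (1, "PLN", 444.44), (2, "PLN", 22)]
for id_, columns in pivot_pairs(t).items():
    print(id_, columns)
```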
You can create a virtual table for each row using a subquery, then join the virtual tables by ID into a single row.
My table looks like this:
A B
1 100
1 102
1 105
2 100
2 105
3 100
3 102
I want output like this:
A Count(B)
1 3
1,2 2
1,2,3 3
2 2
3 2
2,3 2
How can I do this?
I tried to use listagg but it didn't work.
I suspect that you want to count the number of sets of A that are in the data -- and that your sample results are messed up.
If so:
select grp, count(*)
from (select listagg(a, ',') within group (order by a) as grp
from t
group by b
) b
group by grp;
This gives you the counts for the full combinations present in the data. The results would be:
1,2,3 1
1,3 1
1,2 1
You can get the original number of rows by doing:
select grp, sum(cnt)
from (select listagg(a, ',') within group (order by a) as grp, count(*) as cnt
from t
group by b
) b
group by grp;
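A minimal Python sketch of the two listagg queries, assuming the sample table from the question: build the sorted, comma-joined set of A values for each B, then count either the sets or the underlying rows. The function name `count_a_sets` is hypothetical.

```python
from collections import Counter, defaultdict

def count_a_sets(rows):
    """rows: list of (a, b). Returns (sets_per_grp, rows_per_grp), both keyed by the joined A values."""
    a_by_b = defaultdict(list)
    for a, b in rows:
        a_by_b[b].append(a)
    sets_per_grp = Counter()   # like count(*) over the listagg groups
    rows_per_grp = Counter()   # like sum(cnt) over the listagg groups
    for a_values in a_by_b.values():
        grp = ",".join(str(a) for a in sorted(a_values))
        sets_per_grp[grp] += 1
        rows_per_grp[grp] += len(a_values)
    return dict(sets_per_grp), dict(rows_per_grp)

table = [(1, 100), (1, 102), (1, 105), (2, 100), (2, 105), (3, 100), (3, 102)]
sets_per_grp, rows_per_grp = count_a_sets(table)
print(sets_per_grp)
print(rows_per_grp)
```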
Oracle Setup:
CREATE TABLE table_name ( A, B ) AS
SELECT 1, 100 FROM DUAL UNION ALL
SELECT 1, 102 FROM DUAL UNION ALL
SELECT 1, 105 FROM DUAL UNION ALL
SELECT 2, 100 FROM DUAL UNION ALL
SELECT 2, 105 FROM DUAL UNION ALL
SELECT 3, 100 FROM DUAL UNION ALL
SELECT 3, 102 FROM DUAL;
Query:
SELECT A,
COUNT(B)
FROM (
SELECT SUBSTR( SYS_CONNECT_BY_PATH( A, ',' ), 2 ) AS A,
B
FROM table_name
CONNECT BY PRIOR B = B
AND PRIOR A + 1 = A
)
GROUP BY A
ORDER BY A;
Output:
A COUNT(B)
----- ----------
1 3
1,2 2
1,2,3 1
2 2
2,3 1
3 2
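The hierarchical query has no START WITH, so every row is a root, and each path grows by one as long as the next A value (A + 1) exists for the same B. A small Python sketch of that enumeration, assuming no duplicate (A, B) rows, reproduces the counts; the function name `count_runs` is made up for illustration.

```python
from collections import Counter, defaultdict

def count_runs(rows):
    """rows: list of (a, b). For every run of consecutive A values sharing a B,
    count how many times that run occurs across all B values."""
    a_by_b = defaultdict(set)
    for a, b in rows:
        a_by_b[b].add(a)
    counts = Counter()
    for a_values in a_by_b.values():
        for start in a_values:
            # Every row starts a path of its own, like a root in CONNECT BY.
            path = [start]
            counts[",".join(map(str, path))] += 1
            nxt = start + 1
            # Extend the path while the next consecutive A exists for this B.
            while nxt in a_values:
                path.append(nxt)
                counts[",".join(map(str, path))] += 1
                nxt += 1
    return dict(counts)

table = [(1, 100), (1, 102), (1, 105), (2, 100), (2, 105), (3, 100), (3, 102)]
for run, n in sorted(count_runs(table).items()):
    print(run, n)
```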
Emplid  FunctionId  Count
1       2
1       3           1
1       4           2
1       4
1       5           3
1       6           4
1       3           5
2       3
2       3
2       1           1
2       2           2
H&R is looking for a measure of flexibility/mobility within the company.
When an employee changes job/function, a FunctionID is stored in the DWH empl dim. The example above shows what this table looks like (6000 employee records, with lots of mutations as well).
So I need to count only when an employee moves to something else (another function).
The example above shows what the count output should look like.
How can I do this with T-SQL or in an SSIS package (Foreach Loop?)
Disclaimer
For the given test script, SQL Server honors the order of inserts in the Employee CTE, but beware that there is no guarantee it will do so every time.
It is imperative that you provide an additional sort column besides EmplId in the ROW_NUMBER statement.
SQL Statement
;WITH EmployeeRank AS (
SELECT EmplId
, FunctionID
, rn = ROW_NUMBER() OVER (ORDER BY EmplId)
FROM Employee
)
SELECT er1.EmplID, COUNT(*)
FROM EmployeeRank er1
INNER JOIN EmployeeRank er2
ON er2.Emplid = er1.Emplid
AND er2.rn = er1.rn + 1
AND er2.FunctionId <> er1.FunctionId
GROUP BY
er1.EmplID
Test script
;WITH Employee(Emplid, FunctionId) AS (
SELECT 1, 2
UNION ALL SELECT 1, 3 --1
UNION ALL SELECT 1, 4 --2
UNION ALL SELECT 1, 4
UNION ALL SELECT 1, 5 --3
UNION ALL SELECT 1, 6 --4
UNION ALL SELECT 1, 3 --5
UNION ALL SELECT 2, 3
UNION ALL SELECT 2, 3
UNION ALL SELECT 2, 1 --1
UNION ALL SELECT 2, 2 --2
)
, EmployeeRank AS (
SELECT EmplId
, FunctionID
, rn = ROW_NUMBER() OVER (ORDER BY EmplId)
FROM Employee
)
SELECT er1.EmplID, COUNT(*)
FROM EmployeeRank er1
INNER JOIN EmployeeRank er2
ON er2.Emplid = er1.Emplid
AND er2.rn = er1.rn + 1
AND er2.FunctionId <> er1.FunctionId
GROUP BY
er1.EmplID
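The query above returns one total per employee. If the running count from the question's example is what is needed, the logic can be sketched in Python as a single pass that compares each row with the previous row of the same employee. This is only an illustration under the assumption that the rows arrive in a reliable chronological order (the same caveat as in the disclaimer); the function name `running_change_count` is hypothetical.

```python
def running_change_count(rows):
    """rows: list of (emplid, function_id) in chronological order per employee.
    Returns (emplid, function_id, count) where count is the running number of
    function changes, or None on rows where no change happened."""
    out = []
    prev_emp = prev_func = None
    changes = 0
    for emp, func in rows:
        if emp != prev_emp:
            # First row of a new employee: reset the counter, nothing to compare with.
            changes = 0
            out.append((emp, func, None))
        elif func != prev_func:
            changes += 1
            out.append((emp, func, changes))
        else:
            out.append((emp, func, None))
        prev_emp, prev_func = emp, func
    return out

employee = [(1, 2), (1, 3), (1, 4), (1, 4), (1, 5), (1, 6), (1, 3),
            (2, 3), (2, 3), (2, 1), (2, 2)]
for row in running_change_count(employee):
    print(row)
```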