I already have a working example which does exactly what I need.
Now the problem is, that I'm not really a fan of subqueries and I think there could be a better solution to this problem.
So here is my (already) working example:
with t as
(
select 'Group1' as maingroup,'Name 1' as subgroup, 'random' as random, 500 as subgroupbudget from dual
union all
select 'Group1','Name 1','random2',500 from dual
union all
select 'Group1','Name 2','random3', 500 from dual
union all
select 'Group2','Name 3','random4', 500 from dual
union all
select 'Group2','Name 4','random5',500 from dual
union all
select 'Group2','Name 5', 'random6',500 from dual
)
select
maingroup,
subgroup,
random,
(select distinct sum(subgroupbudget) over(partition by maingroup) from t b where a.maingroup=b.maingroup group by maingroup,subgroup,subgroupbudget) groupbudget
from t a
group by maingroup, subgroup ,subgroupbudget, random
order by maingroup, subgroup
As you can see, the with-clause shows a simplified table with data. Now the problem is that the last column is the budget of the subgroup. In the result I need the budget of the maingroup. That means I have to sum all values within the maingroup, but only if the subgroups are different (Here I need some kind of distinct).
Unfortunately a simple
sum(distinct subgroupbudget) over(partition by maingroup)
won't work because the numbers (subgroupbudget) can be the same (like in the example)
Assuming that for a maingroup/subgroup, the subgroupbudget is always the same (or you only take the highest value for the subgroup), this should work:
with t as (select 'Group1' as maingroup,'Name 1' as subgroup, 'random' as random, 500 as subgroupbudget from dual
union all
select 'Group1','Name 1','random2',500 from dual
union all
select 'Group1','Name 2','random3', 500 from dual
union all
select 'Group2','Name 3','random4', 500 from dual
union all
select 'Group2','Name 4','random5',500 from dual
union all
select 'Group2','Name 5', 'random6',500 from dual),
t1 as (select maingroup,
subgroup,
random,
case when row_number() over (partition by maingroup, subgroup order by subgroupbudget desc) = 1 then subgroupbudget
end subgroupbudget
from t)
select maingroup,
subgroup,
random,
sum(subgroupbudget) over (partition by maingroup) groupbudget
from t1;
MAINGROUP SUBGROUP RANDOM GROUPBUDGET
--------- -------- ------- -----------
Group1 Name 1 random 1000
Group1 Name 1 random2 1000
Group1 Name 2 random3 1000
Group2 Name 3 random4 1500
Group2 Name 4 random5 1500
Group2 Name 5 random6 1500
It is effectively saying that for a maingroup/subgroup, you only want to use one of the values (the highest) of the rows in that subgroup in the sum.
Whether it's "better" (i.e. more performant) than your original query is something that you would have to test. Sub-queries are not necessarily a bad thing; they are a tool, and sometimes they're the right tool to use.
Related
I have a query with union all functionality each giving me count(*) return from respective queries and another count query like below. I want an outer query that gives the total.
1st query
select count(*) from a
union all
select count(*) from b;
Sample result for 1st query:
COUNT
10
40
2nd query
select count(*) from xy;
Sample result for 2nd query:
COUNT
20
I want output like this in 2 rows:
TABLES
COUNT
xy
20
ab
50
something like above. How can I achieve this in oracle? please suggest the best way to do this.
I wrote a select and union all but not sure how to proceed further.
One option is to sum counts returned by the 1st query and then union it with the 2nd; also, add constants which show the source:
select 'ab' what, (select count(*) from a) + (select count(*) from b) cnt from dual
union all
select 'xy', count(*) from xy;
You can use:
SELECT 'ab' AS type,
COUNT(*) AS total
FROM ( SELECT 1 FROM a UNION ALL
SELECT 1 from b );
UNION ALL
SELECT 'xy', COUNT(*)
FROM xy;
You can sum counts from your three unioned Select statements and group the result by combination of sources:
WITH
a AS
( Select LEVEL "A_ID", 'some column a' "COL_A" From Dual Connect By LEVEL <= 30 ),
b AS
( Select LEVEL "B_ID", 'some column b' "COL_B" From Dual Connect By LEVEL <= 20 ),
xy AS
( Select LEVEL "XY_ID", 'some column xy' "COL_XY" From Dual Connect By LEVEL <= 20 )
with above sample data it is like here:
SELECT
CASE WHEN SOURCE IN('a', 'b') THEN 'ab' ELSE SOURCE END "SOURCE",
Sum(CNT) "CNT"
FROM
( Select 'a' "SOURCE", Count(*) "CNT" From a Union All
Select 'b', Count(*) From b Union All
Select 'xy', Count(*) From xy
)
GROUP BY
CASE WHEN SOURCE IN('a', 'b') THEN 'ab' ELSE SOURCE END
--
-- R e s u l t :
-- SOURCE CNT
-- ------ ----------
-- ab 50
-- xy 20
Assuming that your real queries can be a lot more complex, I take it as a given that we shall not try to change them and somehow merge or split them.
Your first query returns two rows. You want to get their sum, so you must aggregate the result and use SUM.
Below query uses CTEs (subqueries in the WITH clause) for your two queries, and then a query that gets this sum. It then uses these CTEs for the final UNION ALL query.
with query1 (cnt) as (select count(*) from a union all select count(*) from b)
, query2 (cnt) as (select count(*) from xy)
, sumquery1 (total) as (select sum(cnt) from query1)
select 'ab' as tables, total from sumquery1
union all
select 'xy' as tables, cnt from query2
order by tables desc;
In my database that represents a car service station, I am trying to figure out a SQL query that would give me a total average of how much does the customer pays for a single service but instead of getting AVG() of the price on all existing Invoices, I want to group the invoices by the same reservation_id. After that, I would like to get the total average of all of those grouped results.
I am using the two tables listed in the picture below. I want to get the value of a total average price by applying AVG() on all averages that are made by grouping prices by the same FK Reservation_reservation_id.
I tried to make this into a single query but I failed so I came looking for help from more experienced users. Also, I need to select (get) only the result of the total average. This result should give me an overview of how much each customer pays on average for one reservation.
Thanks for your time
You appear to want to aggregate twice:
SELECT AVG( avg_price ) avg_avg_price
FROM (
SELECT AVG( price ) AS avg_price
FROM invoice
GROUP BY reservation_reservation_id
)
Which, for the sample data:
CREATE TABLE invoice ( reservation_reservation_id, price ) AS
SELECT 1, 10 FROM DUAL UNION ALL
SELECT 1, 12 FROM DUAL UNION ALL
SELECT 1, 14 FROM DUAL UNION ALL
SELECT 1, 16 FROM DUAL UNION ALL
SELECT 2, 10 FROM DUAL UNION ALL
SELECT 2, 11 FROM DUAL UNION ALL
SELECT 2, 12 FROM DUAL;
Outputs:
AVG_AVG_PRICE
12
db<>fiddle here
If you want this per customer:
SELECT customer_customer_id, AVG(avg_reservation_price)
FROM (SELECT i.customer_customer_id, i.reservation_reservation_id,
AVG(i.price) as avg_reservation_price
FROM invoice i
GROUP BY i.customer_customer_id, i.reservation_reservation_id
) ir
GROUP BY customer_customer_id;
If you want this for a particular "checkout reason" -- which is the closest that I imagine that "service" means -- then join in the reservations table and filter:
SELECT customer_customer_id, AVG(avg_reservation_price)
FROM (SELECT i.customer_customer_id, i.reservation_reservation_id,
AVG(i.price) as avg_reservation_price
FROM invoice i JOIN
reservation r
ON i.reservation_reservation_id = r.reservation_id
WHERE r.checkup_type = ?
GROUP BY i.customer_customer_id, i.reservation_reservation_id
) ir
GROUP BY customer_customer_id;
You might want to try the below:
with aux (gr, subgr, val) as (
select 'a', 'a1', 1 from dual union all
select 'a', 'a2', 2 from dual union all
select 'a', 'a3', 3 from dual union all
select 'a', 'a4', 4 from dual union all
select 'b', 'b1', 5 from dual union all
select 'b', 'b2', 6 from dual union all
select 'b', 'b3', 7 from dual union all
select 'b', 'b4', 8 from dual)
SELECT
gr,
avg(val) average_gr,
avg(avg(val)) over () average_total
FROM
aux
group by gr;
Which, applied to your table, would result in:
SELECT
reservation_id,
avg(price) average_rn,
avg(avg(price)) over () average_total
FROM
invoices
group by reservation_id;
i'm working with oracle, plSql, i need to query a table and select the max id where a key is matched, now i have this query
select t.* from (
select distinct (TO_CHAR(I.DATE, 'YYMMDD') || I.AUTH_CODE || I.AMOUNT || I.CARD_NUMBER) as kies, I.SID as ids
from transactions I) t group by kies, ids order by ids desc;
It's displaying this data
If i remove the ID from the query, it displays the distinct keys (in the query i use the alias KIES because keys was in blue, so i thought it might be a reserved word)
How can i display the max id (last one inserted) for every different key without displaying all the data like in the first image??
greetings.
Do you just want aggregation?
select thekey, max(sid)
from (select t.*,
(TO_CHAR(t.DATE, 'YYMMDD') || t.AUTH_CODE || t.AMOUNT || t.CARD_NUMBER) as thekey,
t.SID
from transactions t
) t
group by thekey
order by max(ids) desc;
Since you haven't provided data in text format, its difficult to type such long numbers and recreated the data.
However I think you can simply use the MAX analytical function to achieve your results.
with data as (
select 1111 keys,1 id from dual
union
select 2222, 1 from dual
union
select 1111, 2 from dual
union
select 2222,3 from dual
union
select 9999, 1 from dual
union
select 1111, 5 from dual
)
select distinct keys, max(id) over( partition by (keys)) from data
This query returns -
KEYS MAX(ID)OVER(PARTITIONBY(KEYS))
1111 5
9999 1
2222 3
I'm trying to join a set of county names from one table with county names in another table. The issue here is that, the county names in both tables are not normalized. They are not same in count; also, they may not be appearing in similar pattern always. For instance, the county 'SAINT JOHNS' in "Table A" may be represented as 'ST JOHNS' in "Table B". We cannot predict a common pattern for them.
That means , we cannot use "equal to" (=) condition while joining. So, I'm trying to join them using the JARO_WINKLER_SIMILARITY function in oracle.
My Left Outer Join condition would be like:
Table_A.State = Table_B.State
AND UTL_MATCH.JARO_WINKLER_SIMILARITY(Table_A.County_Name,Table_B.County_Name)>=80
I've given the measure 80 after some testing of the results and it seemed to be optimal.
Here, the issue is that I'm getting set of "false Positives" when joining. For instance, if there are some counties with similarity in names under the same state ("BARRY'and "BAY" for example), they will be matched if the measure is >=80.
This creates inaccurate set of joined data.
Can anyone please suggest some work around?
Thanks,
DAV
Can you plz help me to build a query that will lookup Table_A for each record in Table B/C/D, and match against the county name in A with highest ranked similarity that is >=80
Oracle Setup:
CREATE TABLE official_words ( word ) AS
SELECT 'SAINT JOHNS' FROM DUAL UNION ALL
SELECT 'MONTGOMERY' FROM DUAL UNION ALL
SELECT 'MONROE' FROM DUAL UNION ALL
SELECT 'SAINT JAMES' FROM DUAL UNION ALL
SELECT 'BOTANY BAY' FROM DUAL;
CREATE TABLE words_to_match ( word ) AS
SELECT 'SAINT JOHN' FROM DUAL UNION ALL
SELECT 'ST JAMES' FROM DUAL UNION ALL
SELECT 'MONTGOMERY BAY' FROM DUAL UNION ALL
SELECT 'MONROE ST' FROM DUAL;
Query:
SELECT *
FROM (
SELECT wtm.word,
ow.word AS official_word,
UTL_MATCH.JARO_WINKLER_SIMILARITY( wtm.word, ow.word ) AS similarity,
ROW_NUMBER() OVER ( PARTITION BY wtm.word ORDER BY UTL_MATCH.JARO_WINKLER_SIMILARITY( wtm.word, ow.word ) DESC ) AS rn
FROM words_to_match wtm
INNER JOIN
official_words ow
ON ( UTL_MATCH.JARO_WINKLER_SIMILARITY( wtm.word, ow.word )>=80 )
)
WHERE rn = 1;
Output:
WORD OFFICIAL_WO SIMILARITY RN
-------------- ----------- ---------- ----------
MONROE ST MONROE 93 1
MONTGOMERY BAY MONTGOMERY 94 1
SAINT JOHN SAINT JOHNS 98 1
ST JAMES SAINT JAMES 80 1
Using some made up test data inline (you would use your own TABLE_A and TABLE_B in place of the first two with clauses, and begin at with matches as ...):
with table_a (state, county_name) as
( select 'A', 'ST JOHNS' from dual union all
select 'A', 'BARRY' from dual union all
select 'B', 'CHEESECAKE' from dual union all
select 'B', 'WAFFLES' from dual union all
select 'C', 'UMBRELLAS' from dual )
, table_b (state, county_name) as
( select 'A', 'SAINT JOHNS' from dual union all
select 'A', 'SAINT JOANS' from dual union all
select 'A', 'BARRY' from dual union all
select 'A', 'BARRIERS' from dual union all
select 'A', 'BANANA' from dual union all
select 'A', 'BANOFFEE' from dual union all
select 'B', 'CHEESE' from dual union all
select 'B', 'CHIPS' from dual union all
select 'B', 'CHICKENS' from dual union all
select 'B', 'WAFFLING' from dual union all
select 'B', 'KITTENS' from dual union all
select 'C', 'PUPPIES' from dual union all
select 'C', 'UMBRIA' from dual union all
select 'C', 'UMBRELLAS' from dual )
, matches as
( select a.state, a.county_name, b.county_name as matched_name
, utl_match.jaro_winkler_similarity(a.county_name,b.county_name) as score
from table_a a
join table_b b on b.state = a.state )
, ranked_matches as
( select m.*
, rank() over (partition by m.state, m.county_name order by m.score desc) as ranking
from matches m
where score > 50 )
select rm.state, rm.county_name, rm. matched_name, rm.score
from ranked_matches rm
where ranking = 1
order by 1,2;
Results:
STATE COUNTY_NAME MATCHED_NAME SCORE
----- ----------- ------------ ----------
A BARRY BARRY 100
A ST JOHNS SAINT JOHNS 80
B CHEESECAKE CHEESE 92
B WAFFLES WAFFLING 86
C UMBRELLAS UMBRELLAS 100
The idea is matches computes all scores, ranked_matches assigns them a sequence within (state, county_name), and the final query picks all the top scorers (i.e. filters on ranking = 1).
You may still get some duplicates as there is nothing to stop two different fuzzy matches scoring the same.
Say, I have a table like
Name Pets
-------------------------
Anna Cats,Dogs,Hamsters
John Cats
Jake Dogs,Cats
Jill Parrots
And I want to count, how many people have different types of pets. The output would be something like
Pets Owners
---------------
Cats 3
Dogs 2
Hamsters 1
Parrots 1
Limitations:
Reworking of DB scheme is impractical. If I could do it, I wouldn't be here.
All logic must be done in one SQL query.
I can't take result table and deduce owner count later in code.
Using built-in Oracle functions is OK, but writing custom functions is discouraged.
Oracle version – 11 and up.
It's a terrible design - as you mentioned - so I don't envy you having to work with it!
It's possible to do what you're after, although I wouldn't like to say that it would be performant for larger datasets!
Assuming the name column is the primary key (or at least unique):
with t1 as (select 'Anna' name, 'Cats,Dogs,Hamsters' pets from dual union all
select 'John' name, 'Cats' pets from dual union all
select 'Jake' name, 'Dogs,Cats' pets from dual union all
select 'Jill' name, 'Parrots' pets from dual)
select pet pets,
count(*) owners
from (select name,
regexp_substr(pets, '(.*?)(,|$)', 1, level, null, 1) pet
from t1
connect by prior name = name
and prior sys_guid() is not null
and level <= regexp_count(pets, ',') + 1)
group by pet
order by owners desc, pet;
PETS OWNERS
---------- ----------
Cats 3
Dogs 2
Hamsters 1
Parrots 1
It is a bad design to store comma-separated values in a single column. You should consider normalizing the data. Having such a design will always push you to have an overhead of manipulating delimited-strings.
Anyway, as a workaround, you could use REGEXP_SUBSTR and CONNECT BY to split the comma-delimited string into multiple rows and then count the pets.
There are other ways of doing it too, like XMLTABLE, MODEL clause. Have a look at split the comma-delimited string into multiple rows.
SQL> WITH sample_data AS(
2 SELECT 'Anna' NAME, 'Cats,Dogs,Hamsters' pets FROM dual UNION ALL
3 SELECT 'John' NAME, 'Cats' pets FROM dual UNION ALL
4 SELECT 'Jake' NAME, 'Dogs,Cats' pets FROM dual UNION ALL
5 SELECT 'Jill' NAME, 'Parrots' pets FROM dual
6 )
7 -- end of sample_data mimicking a real table
8 SELECT pets,
9 COUNT(*) cnt
10 FROM
11 (SELECT trim(regexp_substr(t.pets, '[^,]+', 1, lines.COLUMN_VALUE)) pets
12 FROM sample_data t,
13 TABLE (CAST (MULTISET
14 (SELECT LEVEL FROM dual CONNECT BY LEVEL <= regexp_count(t.pets, ',')+1
15 ) AS sys.odciNumberList ) ) lines
16 ORDER BY NAME,
17 lines.COLUMN_VALUE
18 )
19 GROUP BY pets
20 ORDER BY cnt DESC;
PETS CNT
------------------ ----------
Cats 3
Dogs 2
Hamsters 1
Parrots 1
SQL>
My try, only with substr and instr :)
with a as (
select 'Anna' as name, 'Cats,Dogs,Hamsters' as pets from dual union all
select 'John', 'Cats' from dual union all
select 'Jake', 'Dogs,Cats' from dual union all
select 'Jill', 'Parrots' from dual
),
b as(
select name, pets, substr(pets, starting_pos, ending_pos - starting_pos) pet
from (
select name, pets,
decode(lvl, 1, 0, instr(a.pets,',',1,lvl-1))+1 starting_pos,
instr(a.pets,',',1,lvl) ending_pos
from (select name, pets||',' pets from a
)a
join (select level lvl from dual connect by level < 10)
on instr(a.pets,',', 1, lvl) > 0
)
)
--select * from b
select pet, count(*) from b group by pet;
select x pets ,count(x) Owners from (
select extractvalue(value(x), '/b') x
from (select yourcolumn as str from yourtable) t,
table(
xmlsequence(
xmltype('<a><b>' || replace(str, ',', '</b><b>') || '</b></a>' ).extract('/*/*')
)
) x)
group by x;
/replace yourcolumn with your column name (pets) and yourtable with your table name./