Oracle SQL - select distinct columns based on few columns

I have a select query like:
select a, b, c, d, e, f, g, h, i, j from sample_table
I need a distinct set of records from this table, so I wrote:
select distinct a, b, c, d, e, f, g, h, i, j from sample_table
But duplicate rows still appear in the result set, because i and j differ by minor variations such as result, result1, RESULT. I want to ignore these minor variations when removing duplicates, but still have i and j in the result set.
How do I select distinct values of a, b, c, d, e, f, g, h and also have i, j in the result set?

You can do this using analytic functions:
select a, b, c, d, e, f, g, h, i, j
from (select st.*,
row_number() over (partition by a, b, c, d, e, f, g, h order by a) as seqnum
from sample_table st
) st
where seqnum = 1;
This ensures that the values of i and j come from the same row.
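As a rough, runnable illustration of the row_number() approach, here is a sketch that uses SQLite through Python's sqlite3 module as a stand-in for Oracle. It assumes the bundled SQLite is 3.25 or newer (window function support). The table is cut down to four columns, and the sample data and the ORDER BY i, j tie-breaker are assumptions made for the demo.

```python
import sqlite3

def first_row_per_group(rows):
    """Keep one complete row per distinct (a, b) pair using ROW_NUMBER()."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample_table (a, b, i, j)")
    conn.executemany("INSERT INTO sample_table VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT a, b, i, j
        FROM (SELECT st.*,
                     ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY i, j) AS seqnum
              FROM sample_table st) st
        WHERE seqnum = 1
        ORDER BY a, b
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Hypothetical data: three rows that differ only in i and j, plus one other row.
rows = [
    (1, 2, "result", "x"),
    (1, 2, "result1", "y"),
    (1, 2, "RESULT", "z"),
    (3, 4, "other", "w"),
]
deduped = first_row_per_group(rows)
```

Each surviving row is one of the original rows, so i and j always belong together.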

SELECT DISTINCT removes duplicate rows.
If you consider certain values to be "the same" either within a column or between two columns, then before rows containing them can be seen as duplicates by the DBMS you have to make them actually the same.
Within a column you can convert each possible variation to one particular variation. This is called converting to a canonical or normal form.
select distinct ...,
case i when 'result1' then 'result'
when 'RESULT' then 'result'
when 'result' then 'result'
when 'dOg' then 'dog'
...
end as i,
convert_to_upper_case(j) as j,
correct_spelling(k) as k
from sample_table
If you want to consider values to be the same across columns then you can convert them in that way and compare the canonical forms. Or you can write an expression that compares them and outputs a single value for both columns. Such a comparison is called an equivalence relation.
select distinct ...,g,h,i, i as j
from sample_table
where ...
AND my_canonical_form(g) = my_canonical_form(h)
AND equivalent_according_to_me(i,j)
That can be used in generating sample_table if j wasn't really supposed to be different from i there:
select distinct ..., t.i, t.i as j -- no u.j
from t,u where ... and close_enough(t.i,u.j)
The idea is that canonical_form(x) = canonical_form(y) exactly when equivalent(x,y).
You can either keep both i and j columns or drop one if you want.
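The canonical-form idea can be tried out with a small sketch, again using SQLite via Python's sqlite3 as a stand-in. The canonical_form function and its rule (ignore case and trailing digits) are hypothetical, chosen only to match the result/result1/RESULT example; your own rule will differ.

```python
import sqlite3

def canonical_form(value):
    """Hypothetical canonical form: lower-case, trailing digits removed."""
    return value.lower().rstrip("0123456789")

conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the same name.
conn.create_function("canonical_form", 1, canonical_form)
conn.execute("CREATE TABLE sample_table (a, i)")
conn.executemany(
    "INSERT INTO sample_table VALUES (?, ?)",
    [(1, "result"), (1, "result1"), (1, "RESULT"), (2, "dOg")],
)
# DISTINCT now sees identical values, so the variants collapse to one row.
canonical_rows = conn.execute(
    "SELECT DISTINCT a, canonical_form(i) AS i FROM sample_table ORDER BY a"
).fetchall()
conn.close()
```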

Maybe you can try:
select a, b, c, d, e, f, g, h, min(upper(i)) i, min(upper(j)) j from sample_table
group by a, b, c, d, e, f, g, h;
You can consider using min or max combined with substring, upper, lower or whichever suits your requirement.
As alex poole has pointed out, you can also consider having a column with timestamp, so that the latest or the earliest record can be displayed in the result-set.
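One thing to keep in mind with the aggregate approach is that min is evaluated per column, so i and j in the output may come from different source rows. A small sketch (SQLite via Python's sqlite3, reduced table, made-up data) shows this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_table (a, i, j)")
conn.executemany(
    "INSERT INTO sample_table VALUES (?, ?, ?)",
    [(1, "result", "zeta"), (1, "Result1", "alpha"), (2, "dog", "beta")],
)
# One output row per value of a; MIN is computed independently for i and j.
grouped = conn.execute(
    "SELECT a, MIN(UPPER(i)) AS i, MIN(UPPER(j)) AS j "
    "FROM sample_table GROUP BY a ORDER BY a"
).fetchall()
conn.close()
```

For a = 1 the i value comes from the first inserted row and the j value from the second. If the pair must come from a single row, the row_number() approach is the safer choice.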

In addition to Nishanthi Grashia's answer: if you have to show all the values of the differing columns, you can use listagg as the aggregate function:
select same1,same2,same3,same4,same5,same6,same7,same8,
listagg(diff1, ',') within group (order by diff1)
, listagg(diff2, ',') within group (order by diff2) from (
select 1 as same1, 2 as same2 ,3 as same3, 4 as same4,5 as same5,6 as same6,7 as same7, 8 as same8,'m' as diff1,'n' as diff2 from dual
union
select 1,2,3,4,5,6,7,8,'n','o' from dual
union
select 2,3,4,5,6,7,8,9,'p','q' from dual
union
select 2,3,4,5,6,7,8,9,'p','x' from dual
) qry1 group by same1,same2,same3,same4,same5,same6,same7,same8

Related

How to choose a table based on a parameterized database name?

My code takes in a parameter ${ID}$ (string), and based on what ID evaluates to I want to choose a different table to use. I guess I can't use a case inside a FROM clause. Some example code looks like:
select *
from ${ID}$_charges.transaction_charge
where execution_date = '2011-03-22'
So if ID is 'N' then I want to use the transaction_charge table so the statement resolves to N_charges.transaction_charge
However if ID is 'B' or 'P' then I want to use a different table called conformity_charge and the statement would evaluate to B_charges.conformity_charge or P_charges.conformity_charge
How can I write this statement?
If you have a small number of possible tables to target, the closest you can get, apart from dynamic SQL, is the following.
NOTE: Depending on the capabilities of your database engine and the size of your tables, there might be performance penalties that may or may not matter.
SELECT a, b, c
FROM (
SELECT 'N' as TableName, a, b, c
FROM N_charges.transaction_charge
UNION ALL
SELECT 'P' as TableName, a, b, c
FROM P_charges.conformity_charge
UNION ALL
SELECT 'B' as TableName, a, b, c
FROM B_charges.conformity_charge
) t
WHERE TableName = '${ID}$'
Another variation:
SELECT a, b, c
FROM N_charges.transaction_charge
WHERE 'N' = '${ID}$'
UNION ALL
SELECT a, b, c
FROM P_charges.conformity_charge
WHERE 'P' = '${ID}$'
UNION ALL
SELECT a, b, c
FROM B_charges.conformity_charge
WHERE 'B' = '${ID}$'
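To see the UNION ALL variation in action, here is a sketch that runs it against SQLite through Python's sqlite3. Several things here are assumptions made for the demo: the three schemas are simulated with attached in-memory databases, each table has a single hypothetical amount column, and the ID is passed as a bind parameter (:id) rather than through the ${ID}$ text substitution.

```python
import sqlite3

def charges_for(conn, table_id):
    """Return rows from whichever branch matches table_id ('N', 'B' or 'P')."""
    query = """
        SELECT amount FROM N_charges.transaction_charge WHERE 'N' = :id
        UNION ALL
        SELECT amount FROM B_charges.conformity_charge WHERE 'B' = :id
        UNION ALL
        SELECT amount FROM P_charges.conformity_charge WHERE 'P' = :id
    """
    return [row[0] for row in conn.execute(query, {"id": table_id})]

# Autocommit mode, because SQLite cannot ATTACH inside an open transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
demo_tables = [
    ("N_charges", "transaction_charge", 10),
    ("B_charges", "conformity_charge", 20),
    ("P_charges", "conformity_charge", 30),
]
for schema, table, amount in demo_tables:
    # Names come from the fixed list above, never from user input.
    conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
    conn.execute(f"CREATE TABLE {schema}.{table} (amount INTEGER)")
    conn.execute(f"INSERT INTO {schema}.{table} VALUES (?)", (amount,))

n_rows = charges_for(conn, "N")
b_rows = charges_for(conn, "B")
unknown_rows = charges_for(conn, "X")
```

Only the branch whose literal equals the parameter contributes rows; an unknown ID simply yields an empty result.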

Find missing value in table from given set

Assume there is a table called "allvalues" with a column named "column".
This column contains the values "A" to "J" while missing the "H".
I am given a set of values from "G" to "J".
How can I query the table to see which value of my set is missing in the column?
The following does not work:
select * from allvalues where column not in ('G', 'H', 'I', 'J')
This query would result in A, B, C, D, E, F, H which also contains values not included in the given set.
Obviously in such a small data pool the missing value is noticeable by eye, but imagine more entries in the table and a bigger set.
You need to start with a (derived) table with the values you are checking. One explicit method is:
with testvalues as (
select 'G' as val from dual union all
select 'H' as val from dual union all
select 'I' as val from dual union all
select 'J' as val from dual
)
select tv.val
from testvalues tv
where not exists (select 1 from allvalues av where av.column = tv.val);
Often, the values originate through a query or a table. So explicitly declaring them is unnecessary -- you can replace that part with a subquery.
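The same anti-join can be exercised with a small sketch using SQLite via Python's sqlite3. Two details are assumptions for the demo: the column is named val instead of column (a reserved word that would need quoting), and the candidate set is supplied through a VALUES list with bind parameters instead of select ... from dual.

```python
import sqlite3

def missing_values(conn, candidates):
    """Return the candidates that do not occur in allvalues.val.

    Assumes candidates is a non-empty list.
    """
    placeholders = ", ".join("(?)" for _ in candidates)
    query = f"""
        WITH testvalues(val) AS (VALUES {placeholders})
        SELECT tv.val
        FROM testvalues tv
        WHERE NOT EXISTS (SELECT 1 FROM allvalues av WHERE av.val = tv.val)
        ORDER BY tv.val
    """
    return [row[0] for row in conn.execute(query, candidates)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE allvalues (val TEXT)")
# The table holds A to J with H missing, as in the question.
conn.executemany("INSERT INTO allvalues VALUES (?)", [(ch,) for ch in "ABCDEFGIJ"])
missing = missing_values(conn, ["G", "H", "I", "J"])
```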
It depends on which SQL syntax you can use, but basically you want to check the given values against the table allvalues.
e.g.:
SELECT v.val
FROM (
SELECT 'G' AS val FROM dual UNION ALL
SELECT 'H' FROM dual UNION ALL
SELECT 'I' FROM dual UNION ALL
SELECT 'J' FROM dual
) v
WHERE v.val NOT IN (SELECT s.column FROM allvalues s)
This will work, assuming table1 holds the given set of values:
select * from table1;
G
H
I
J
select * from table1
minus
(select * from table1
intersect
select column from allvalues
)
sample input:
select * from ns_table10;
G
H
I
J
SELECT * FROM ns_table11;
A
B
C
D
E
F
G
J
I
select * from ns_table10
minus
(select * from ns_table10
intersect
select * from ns_table11
);
output:
H

How to update a column for all rows after each time one row is processed by a UDF in BigQuery?

I'm trying to update a column for all rows after each time one row is processed by a UDF.
The example has 3 rows with 6 columns. Column "A" has the same value across all 3 rows; columns "A" and "B" together identify each row; column "C" holds arrays containing any of the letters a, b, c, d, e; column "D" is the target array to be filled in; column "E" holds integers; column "abcde" is an array of 5 integers giving the count for each of the letters a, b, c, d, e.
Each row will be passed into a UDF to update column "D" and column "abcde" according to column "C" and column "E". The rule is: randomly select from "C" the number of items specified by "E" and put them into "D"; after each selection is done for a row, the column "abcde" is updated across all rows.
For example, to process the first row, we randomly select one item from ('a','b','c') to put into "D". Let's say the system picked the 'c' in the column "C", so the value in "D" for this row becomes ['c'] and 'abcde' gets updated to [1,3,1,1,1] (before was [1,3,2,1,1]) for all three rows.
Example data:
#StandardSQL in BigQuery
#code to generate the example table
with sample as (
select 'y1' as A, 'x1' as B, ['a','b','c'] as C, [] as D, 1 as E, [1,3,2,1,1] as abcde union all
select 'y1','x2',['a','b'],[],2,[1,3,2,1,1] union all
select 'y1','x3',['c','d','e'],[],3,[1,3,2,1,1])
select * from sample order by B
After the first row is processed:
with sample as (
select 'y1' as A, 'x1' as B, ['a','b','c'] as C, ['c'] as D, 1 as E, [1,3,1,1,1] as abcde union all
select 'y1','x2',['a','b'],[],2,[1,3,1,1,1] union all
select 'y1','x3',['c','d','e'],[],3,[1,3,1,1,1])
select * from sample order by B
After the second row is processed:
with sample as (
select 'y1' as A, 'x1' as B, ['a','b','c'] as C, ['c'] as D, 1 as E, [0,2,1,1,1] as abcde union all
select 'y1','x2',['a','b'],['a','b'],2,[0,2,1,1,1] union all
select 'y1','x3',['c','d','e'],[],3,[0,2,1,1,1])
select * from sample order by B
After the third row is processed:
with sample as (
select 'y1' as A, 'x1' as B, ['a','b','c'] as C, ['c'] as D, 1 as E, [0,2,0,0,0] as abcde union all
select 'y1','x2',['a','b'],['a','b'],2,[0,2,0,0,0] union all
select 'y1','x3',['c','d','e'],['c','d','e'],3,[0,2,0,0,0])
select * from sample order by B
Don't worry about how the UDF will do the random selection. I'm just wondering, if it's possible in BigQuery to do the task to update the column 'abcde' in the way I want?
I've tried using UDFs, but I'm struggling to get it working because my understanding of a UDF is that it can only take one row in and produce multiple rows out. So, I can't update the other rows. Is it possible just using SQL?
Additional information:
create temporary function selection(A string, B string, C ARRAY<STRING>, D ARRAY<STRING>, E INT64, abcde ARRAY<INT64>)
returns STRUCT< A stRING, B string, C array<string>, D array<string>, E int64, abcde array<int64>>
language js AS """
/*
for the row i in the data:
select the number i.E of items (randomly) from i.C where the number associated with the item in i.abcde is bigger than 0 (i.e. only the items with numbers in abcde bigger than 0 can be candidates for the random selection);
put the selected items in i.D and deduct the amount of selected items from the number for the corresponding item in the column 'abcde' FOR ALL ROWS;
proceed to the next row i+1 until every row is processed;
*/
return {A,B,C,D,E,abcde}
""";
with sample as (
select 'y1' as A, 'x1' as B, ['a','b','c'] as C, CAST([] AS ARRAY<STRING>) as D, 1 as E, [1,3,2,1,1] as abcde union all
select 'y1','x2',['a','b'],[],2,[1,3,2,1,1] union all
select 'y1','x3',['c','d','e'],[],2,[1,3,2,1,1])
select selection(A,B,C,D,E,abcde) from sample order by B
Below is for BigQuery Standard SQL
#StandardSQL
WITH sample AS (
SELECT 'y1' AS A, 'x1' AS B, ['a','b','c'] AS C, ['c'] AS D, 1 AS E, [1,3,2,1,1] AS abcde UNION ALL
SELECT 'y1','x2',['a','b'],['a','b'],2,[1,3,2,1,1] UNION ALL
SELECT 'y1','x3',['c','d','e'],['c','d','e'],3,[1,3,2,1,1] UNION ALL
SELECT 'y2' AS A, 'x1' AS B, ['a','b','c'] AS C, ['a','b'] AS D, 2 AS E, [1,3,2,1,1] AS abcde UNION ALL
SELECT 'y2','x2',['a','b'],['b'],1,[1,3,2,1,1] UNION ALL
SELECT 'y2','x3',['c','d','e'],['d','e'],2,[1,3,2,1,1]
),
counts AS (
SELECT A AS AA, dd, COUNT(1) AS cnt
FROM sample, UNNEST(D) AS dd
GROUP BY AA, dd
),
processed AS (
SELECT A, B, ARRAY_AGG(aa - IFNULL(cnt, 0) ORDER BY pos) AS abcde
FROM sample, UNNEST(abcde) AS aa WITH OFFSET AS pos
LEFT JOIN counts ON A = counts.AA
AND CASE dd
WHEN 'a' THEN 0
WHEN 'b' THEN 1
WHEN 'c' THEN 2
WHEN 'd' THEN 3
WHEN 'e' THEN 4
END = pos
GROUP BY A, B
)
SELECT s.A, s.B, s.C, s.D, s.E, p.abcde
FROM sample AS s
JOIN processed AS p
USING (A, B)
-- ORDER BY A, B
You wrote "Don't worry about how the UDF will do the random selection", so, as you can see, I just put "random" values into the sample data to mimic D.
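The logic of the query above (count how often each letter was selected within a group A, then subtract those counts from abcde on every row of that group) can be sketched in plain Python. This is only an illustration of the arithmetic, not BigQuery code; the dictionary layout and function name are assumptions.

```python
from collections import Counter

LETTERS = "abcde"  # position in abcde corresponds to these letters

def remaining_counts(rows):
    """Map (A, B) to abcde minus the letters selected anywhere in group A."""
    used = {}
    for row in rows:
        # Tally every selected letter per group A (the "counts" CTE).
        used.setdefault(row["A"], Counter()).update(row["D"])
    return {
        (row["A"], row["B"]): [
            n - used[row["A"]][letter] for letter, n in zip(LETTERS, row["abcde"])
        ]
        for row in rows
    }

# Same sample data as in the query above.
sample = [
    {"A": "y1", "B": "x1", "D": ["c"], "abcde": [1, 3, 2, 1, 1]},
    {"A": "y1", "B": "x2", "D": ["a", "b"], "abcde": [1, 3, 2, 1, 1]},
    {"A": "y1", "B": "x3", "D": ["c", "d", "e"], "abcde": [1, 3, 2, 1, 1]},
    {"A": "y2", "B": "x1", "D": ["a", "b"], "abcde": [1, 3, 2, 1, 1]},
    {"A": "y2", "B": "x2", "D": ["b"], "abcde": [1, 3, 2, 1, 1]},
    {"A": "y2", "B": "x3", "D": ["d", "e"], "abcde": [1, 3, 2, 1, 1]},
]
updated = remaining_counts(sample)
```

For group y1 this gives [0, 2, 0, 0, 0] on every row, which matches the final state described in the question.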

Over partition by 2 columns SQL Server

I took one of my previous exercises and added some complexity to it.
You can find my original problem under: select case with "over partition by"
The scenario: (using SQL Server 2012)
create table #testing (b varchar (20), a date, c int, e int)
insert into #testing (b,a,c,e)
values
('xf_1m','2015-03-02','1','3'),
('xf_3m','2015-03-02','2','5'),
('xf_5y','2015-03-02','4','2'),
('xf_10y','2015-03-02','3','6'),
('ay_10y','2015-03-02','7','2'),
('adfe_1m','2015-03-02','2','5'),
('xm_1m','2013-02-01','7','2'),
('xf_15y','2013-02-01','1','8'),
('xf_20y','2013-02-01','10','1')
After using this query:
select
b, a, c, e,
substring (b, 1, CHARINDEX ('_', b) - 1) rnc,
substring(b, CHARINDEX('_', b) + 1, LEN (b)) rnb,
case
when b like 'xf%' then --
(sum(c * e) over (partition by a )) end as sumProduct
into #testing2
from #testing
select
*,
case
when b like 'xf%' then --
(sum(c * e) over (partition by a )) end as sumProduct
into #testing3
from #testing2
select *
from #testing3
I am getting this:
Only that now I want to calculate the sumProduct partitioned by rnc and date (column a).
How can I do this? I tried with group by, but I'm having trouble because the number of columns in the select doesn't match the number of columns I'm grouping by.
So, I'd like to re-write somehow like this:
(sum(c * e) over (partition by a and partition by rnc )) as sumProduct
I'm not sure why you are using temporary tables, but you can partition by multiple columns just by including them in the partition by list:
select *,
(case when b like 'xf%'
then sum(c * e) over (partition by a, rnc )
end) as sumProduct
into #testing3
from #testing2;
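A quick way to check the two-column partition is to run it on the sample data. The sketch below uses SQLite via Python's sqlite3 as a stand-in for SQL Server, so it assumes SQLite 3.25 or newer, uses substr/instr instead of substring/CHARINDEX, and replaces the temp tables with a CTE named split (a name chosen for the demo).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testing (b TEXT, a TEXT, c INTEGER, e INTEGER)")
conn.executemany(
    "INSERT INTO testing VALUES (?, ?, ?, ?)",
    [
        ("xf_1m", "2015-03-02", 1, 3),
        ("xf_3m", "2015-03-02", 2, 5),
        ("xf_5y", "2015-03-02", 4, 2),
        ("xf_10y", "2015-03-02", 3, 6),
        ("ay_10y", "2015-03-02", 7, 2),
        ("adfe_1m", "2015-03-02", 2, 5),
        ("xm_1m", "2013-02-01", 7, 2),
        ("xf_15y", "2013-02-01", 1, 8),
        ("xf_20y", "2013-02-01", 10, 1),
    ],
)
query = """
    WITH split AS (
        -- rnc is the part of b before the underscore
        SELECT b, a, c, e, substr(b, 1, instr(b, '_') - 1) AS rnc
        FROM testing
    )
    SELECT b,
           CASE WHEN b LIKE 'xf%'
                THEN SUM(c * e) OVER (PARTITION BY a, rnc)
           END AS sumProduct
    FROM split
"""
# Map each b to its sumProduct (None where b does not start with 'xf').
sum_product = dict(conn.execute(query).fetchall())
conn.close()
```

The xf rows dated 2015-03-02 share one sum and the xf rows dated 2013-02-01 share another, while non-xf rows get NULL from the CASE.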

Can I add multiple columns to Totals

Using MS SQL 2012
I want to do something like
select a, b, c, a+b+c d
However, a, b, c are complex computed columns. Let's take a simple example:
select case when x > 4 then 4 else x end a,
( select count(*) somethingElse) b,
a + b c
order by c
I hope that makes sense
You can use a nested query or a common table expression (CTE) for that. The CTE syntax is slightly cleaner - here it is:
WITH CTE (a, b)
AS
(
select
case when x > 4 then 4 else x end a,
(select count(*) somethingElse) b
from my_table
)
SELECT
a, b, (a+b) as c
FROM CTE
ORDER BY c
I would probably do this:
SELECT
sub.a,
sub.b,
(sub.a + sub.b) as c
FROM
(
select
case when x > 4 then 4 else x end a,
(select count(*) somethingElse) b
FROM MyTable
) sub
ORDER BY c
The easiest way is to do this:
select a,b,c,a+b+c d
from (select <whatever your calcs are for a,b,c>) x
order by c
That just creates a derived table consisting of your calculations for a, b, and c, and allows you to easily reference and sum them up!
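As a concrete version of the derived-table pattern, here is a sketch run against SQLite through Python's sqlite3 rather than SQL Server. The table my_table, its data, and the choice of b as a row count over the same table are assumptions that stand in for your real calculations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (x INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?)", [(2,), (7,), (5,)])
query = """
    SELECT a, b, a + b AS c
    FROM (
        -- The inner query computes the complex columns once and names them.
        SELECT CASE WHEN x > 4 THEN 4 ELSE x END AS a,
               (SELECT COUNT(*) FROM my_table) AS b
        FROM my_table
    ) sub
    ORDER BY c
"""
totals = conn.execute(query).fetchall()
conn.close()
```

The outer query can refer to a and b by name because they are ordinary columns of the derived table by the time it runs.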