SQL Downselect query results based on field duplication - sql

I'm looking for a little help with a SQL query. (I am using Oracle.)
I have a query that is a union of 2 differing select statments. The resulting data looks like the following:
Col1 Col2 Col3
XXX ValA Val1
XXX ValB Val2
YYY ValA Val1
YYY ValA Val2
In this setup the Col1 = XXX are default values and Col1 = YYY are real values. Real values (YYY) should take precidence over default values. The actual values are defined via columns 2 and 3.
I'm looking to downselect those results into the following:
Col1 Col2 Col3
XXX ValB Val2
YYY ValA Val1
YYY ValA Val2
Notice that the first row was removed ... that's because there is a real value (YYY in row 3) took precidence over the default value (XXX).
Any thoughts on how to approach this?

You want to filter out all the rows where col2 and col3 appear with XXX and with another value.
You can implement this filter by doing appropriate counts in a subquery using the analytic functions:
select col1, col2, col3
from (select t.*,
count(*) over (partition by col2, col3) as numcombos,
sum(case when col1 = 'XXX' then 1 else 0 end) over (partition by col2, col3) as numxs
from t
) t
where numcombos = numxs or (col1 <> 'xxx')

My instinct is to use an analytic function:
select distinct
first_value(col1)
over (partition by col2, col3
order by case col1
when 'XXX' then 1
else 0 end asc) as col1,
col2,
col3
from table1
However, if the table is large and indexed, it might be better to solve this with a full outer join (which is possible because there are only two possible values):
select coalesce(rl.col1, dflt.col1) as col1,
coalesce(rl.col2, dflt.col2) as col2,
coalesce(rl.col3, dflt.col3) as col3
from (select * from table1 where col1 = 'XXX') dflt
full outer join (select * from table1 where col1 <> 'XXX') rl
on dflt.col2 = rl.col2 and dflt.col3 = rl.col3;
[Solution in SQLFiddle]

I think you could use a trick like this:
select
case when
max(case when col1<>'XXX' then col1 end) is null then 'XXX' else
max(case when col1<>'XXX' then col1 end) end as col1,
col2,
col3
from
your_table
group by col2, col3
I transform default value to null, then i group by col3. The maximum value between null and a value is the value you are looking for. This works on your example data, but it might not be exactly what you are looking for, it depends on how your real data is.

Related

How to use collec_set in hive to generate 0 for null

We have requirement to collect the values for all the different transection and display in "|" delimited and display 0 for the not available merchant.
table_1
col1
col2
col3
129867
paytm
4
18945
paytm
5
129867
payzap
6
18945
payzap
4
456312
paytm
3
we need to read the table1 and transform it into tabl2 as given below:
table_2
col1
col2
129857
4l6
18945
5l4
456312
3l0
suppose we have two merchant i.e paytm and payzap, how to achieve this in hive.
I have tried like:
SELECT col1,
Nvl(Concat_ws('|', Collect_set(col3)), 0) AS col2
FROM table_1
GROUP BY col1;
but I am not getting desired result.
If you using sql server then use string_agg
SELECT col1,
CASE
WHEN col2 NOT LIKE '%|%' THEN Concat(col2, '|0')
ELSE col2
END AS col2
FROM (SELECT col1,
String_agg(col3, '|') col2
FROM table
GROUP BY col1) B
In Hivesql I have provided you with syntax
SELECT col1,
CASE
WHEN col2 NOT LIKE '%|%' THEN Concat_ws(col2, '|0')
ELSE col2
END AS col2
FROM (SELECT col1,
Concat_ws('|', Collect_set(col3)) AS col2
FROM table_1
GROUP BY col1) A

How to get min value from multiple columns for a row in SQL

I need to get to first (min) date from a set of 4 (or more) columns.
I tried
select min (col1, col2, col3) from tbl
which is obviouslly wrong.
let's say I have these 4 columns
col1 | col2 | col3 | col4
1/1/17 | 2/2/17 | | 3/3/17
... in this case what I want to get is the value in col1 (1/1/17). and Yes, these columns can include NULLs.
I am running this in dashDB
the columns are Date data type,
there is no ID nor Primary key column in this table,
and I need to do this for ALL rows in my query,
the columns are NOT in order. meaning that col1 does NOT have to be before col2 or it has to be null AND col2 does NOT have to be before col3 or it has to be NULL .. and so on
If your DB support least function, it is the best approach
select
least
(
nvl(col1,TO_DATE('2901-01-01','YYYY-MM-DD')),
nvl(col2,TO_DATE('2901-01-01','YYYY-MM-DD')),
nvl(col3,TO_DATE('2901-01-01','YYYY-MM-DD')),
nvl(col4,TO_DATE('2901-01-01','YYYY-MM-DD'))
)
from tbl
Edit: If all col(s) are null, then you can hardcode the output as null. The below query should work. I couldn't test it but this should work.
select
case when
least
(
nvl(col1,TO_DATE('2901-01-01','YYYY-MM-DD')),
nvl(col2,TO_DATE('2901-01-01','YYYY-MM-DD')),
nvl(col3,TO_DATE('2901-01-01','YYYY-MM-DD')),
nvl(col4,TO_DATE('2901-01-01','YYYY-MM-DD'))
)
= TO_DATE('2901-01-01','YYYY-MM-DD')
then null
else
least
(
nvl(col1,TO_DATE('2901-01-01','YYYY-MM-DD')),
nvl(col2,TO_DATE('2901-01-01','YYYY-MM-DD')),
nvl(col3,TO_DATE('2901-01-01','YYYY-MM-DD')),
nvl(col4,TO_DATE('2901-01-01','YYYY-MM-DD'))
)
end
as min_date
from tbl
If a id column in your table. Then
Query
select t.id, min(t.col) as min_col_value from(
select id, col1 as col from your_table
union all
select id, col2 as col from your_table
union all
select id, col3 as col from your_table
union all
select id, col4 as col from your_table
)t
group by t.id;
If you want the first date, then use coalesce():
select coalesce(col1, col2, col3, col4)
from t;
This returns the first non-NULL value (which is one way that I interpret the question). This will be the minimum date, if the dates are in order.
Select Id, CaseWhen (Col1 <= Col2 OR Col2 is null) And (Col1 <= Col3 OR Col3 is null) Then Col1 When (Col2 <= Col1 OR Col1 is null) And (Col2 <= Col3 OR Col3 is null) Then Col2 Else Col3 End As Min From YourTable
This is for 3 Column, Same way you can write for 4 - or more column.

How to use and "in" clause in "having" in HIVE?

I have my data in sometable like this:
col1 col2 col3
A B 3
A B 1
A B 2
C B 1
And I want to get all of the unique groups of col1 and col2 that contain certain rows of col3. Like, all groups of col1 and col2 that contain a "2".
I wanted to do something like this:
select col1, col2 from sometable
group by col1, col2
having col3=1 and col3=2
But I want it to only return groups that have an instance of both 1 and 2 in col3. so, the result after the query should return this:
col1 col2
A B
How do I express this in HIVE? THANK YOU.
I don't know why others deleted answers that where correct and then almost correct but I will put their's back up.
SELECT col1, col2, COUNT(DISTINCT col3)
FROM
sometable
WHERE
col3 IN (1,2)
GROUP BY col1, col2
HAVING
COUNT(DISTINCT col3) > 1
If you actually want to return all of the records that meet your criteria you need to do a sub select and join back to the main table to get them.
SELECT s.*
FROM
sometable s
INNER JOIN (
SELECT col1, col2, COUNT(DISTINCT col3)
FROM
sometable
WHERE
col3 IN (1,2)
GROUP BY col1, col2
HAVING
COUNT(DISTINCT col3) > 1
) t
ON s.Col1 = t.Col1
AND s.Col2 = t.Col2
AND s.col3 IN (1,2)
The gist of this is narrow/filter your rowset to the rows that you want to test col3 IN (1,2) then count the DISTINCT values of col3 to make sure both 1 and 2 exist and not just 1 & 1 or 2 & 2.
I think below mentioned query will be useful for your question.
select col1,col2
from Abc
group by col1,col2
having count(col1) >1 AND COUNT(COL2)>2

Dynamic Group By in a Query

Is there a way to apply or not a group by into a query? for example, I have this:
Col1 Col2 Col3
A 10 X
A 10 NULL
B 12 NULL
B 12 NULL
I have to group by Col1 and Col2 only when I have a value in Col3, if Col3 is null, I don't need to group it. The result should be:
Col1 Col2
A 20
B 12
B 12
Maybe is not an elegant example, but this is the idea.
Thank you.
Here's a SQL Fiddle that does what you want:
http://sqlfiddle.com/#!3/b7f07/2
Here's the SQL itself:
SELECT col1, sum(col2) as col2 FROM dataTable WHERE
col1 in (SELECT col1 from dataTable WHERE col3 IS NOT NULL)
GROUP BY col1
UNION ALL
SELECT col1, col2 FROM dataTable WHERE
(col1 not in
(SELECT col1 from dataTable WHERE col3 IS NOT NULL and col1 is not null))
It sounds like you want all unique values of col1 when col3 is not null. Otherwise, you want all values of col1.
Assuming you have a SQL engine that supports window functions, you can do this as:
select col1, sum(col2)
from (select t.*,
count(col3) over (partition by col1) as NumCol3Values,
row_number() over (partition by col1 order by col1) as seqnum
from t
) t
group by col1,
(case when NumCol3Values > 1 then NULL else seqnum end)
The logic is pretty much as you state it. If there is any non-NULL value, then the second clause of the group by always evaluates to NULL -- everything goes in the same group. If things are all NULL, then the clause evaluates to a sequence number, which puts each values on a separate row.
This is a bit more difficult without window functions. If I assume that the minimum value of column 3 (when not NULL) is unique, then the following would work:
select t.col1,
(case when minCol3 is null then tsum.col2 else t.col2 end) as col2
from t left outer join
(select col1, sum(col2) as col2,
min(col3) as minCol3
from t
) tsum
on t.col1 = tsum.col1
where minCol3 is NULL or t.col3 = MinCol3
re: Is there a way to apply or not a group by into a query?
Not directly, but you can break it down by groupings and then UNION the results together.
Does this work?
Select col1, sum(col2)
from table
group by col1, col2
having max(col3) is not null
union all
select col1, col2
from table t left outer join
(Select col1, col2
from table
group by col1, col2
having max(col3) is not null) g
where g.col1 is null

select all columns with one column has different value

In my table,some records have all column values are the same, except one. I need write a query to get those records. what's the best way to do it? the table is like this:
colA colB colC
a b c
a b d
a b e
What's the best way to get all records with all the columns? Thanks for everyone's help.
Assuming you know that column3 will always be different, to get the rows that have more than one value:
SELECT Col1, Col2
FROM Table t
GROUP BY Col1, Col2
HAVING COUNT(distinct col3) > 1
If you need all the values in the three columns, then you can join this back to the original table:
SELECT t.*
FROM table t join
(SELECT Col1, Col2
FROM Table t
GROUP BY Col1, Col2
HAVING COUNT(distinct col3) > 1
) cols
on t.col1 = cols.col1 and t.col2 = cols.col2
Just select those rows that have the different values:
SELECT col1, col2
FROM myTable
WHERE colWanted != knownValue
If this is not what you are looking for, please post examples of the data in the table and the wanted output.
How about something like
SELECT Col1, Col2
FROM Table
GROUP BY Col1, Col2
HAVING COUNT(*) = 1
This will give you Col1, Col2 that have unique data.
Assuming col3 has the difs
SELECT Col1, Col2
FROM Table
GROUP BY Col1, Col2
HAVING COUNT(*) > 1
OR TO SHOW ALL 3 COLS
SELECT Col1, Col2, Col3
FROM Table1
GROUP BY Col1, Col2, Col3
HAVING COUNT(Col3) > 1