Trying to replicate a '=countifs' function from excel to SQL - sql

I have an excel file with a table that looks like this:
A B C
Registry ID Parent Reg ID Focus Account (Y/N)
100000033 100000036778 Y
100000343 1000 Y
1000343223 100000036778 N
And the formula is on the column D (Focus Parent): =IF(COUNTIFS(C:C,"Y",B:B,B)>=1,"Y","N")
So on the column D the formula returns 'Y' for each row.
I've tried to replicate this in SQL with the following code:
SELECT
REGISTRY_ID,
PARENT_REG_ID,
FOCUS_ACCOUNT,
SCORE_DETAILS,
(CASE
WHEN FOCUS_ACCOUNT = 'Y' THEN
(CASE
WHEN COUNT(PARENT_REG_ID) >= 1 THEN 'Y'
ELSE 'N'
END)
ELSE 'N'
END) AS Focus_Parent
FROM MA_ACCOUNTS
But this query returns this error:
ORA-00937: not a single-group group function
Can you please advise?
Later edit:
Let me clarify this: I have a list with unique Registry_IDs that contain a Parent_Registry_ID. A Parent_Registry_ID can have multiple Registry_ID but if a Registry_ID is marked as ‘Y’ in the Column Focus_Account then that Parent_Registry_ID should have ‘Y’ in the column Focus_Parent.
Registry ID Parent Reg ID Focus Account (Y/N)
1 A N
2 B N
3 A Y
4 C Y
5 A N
6 B Y
7 A N
8 D Y
9 E N
10 E N
Expected outcome:
Registry ID Parent Reg ID Focus Account (Y/N) Focus Parent (Y/N)
1 A N Y
2 B N Y
3 A Y Y
4 C Y Y
5 A N Y
6 B Y Y
7 A N Y
8 D Y Y
9 E N N
10 E N N

You are using an aggregated count() so Oracle is expecting a GROUP BY clause. However, that would not fit the shape of your result set. Seems like an analytic function would be better?
You have posted a clarification which I think defines this rule:
if any registry_id has focus_account='Y' then set focus_parent = 'Y' for all instances of its parent_reg_id.
If my interpretation is correct you can implement it quite simply with an analytic max():
select
registry_id,
parent_reg_id,
focus_account,
max( focus_account ) over (partition by parent_reg_id) as focus_parent
from ma_accounts
This works because focus_account is a Y/N flag. Certainly the above query produces your revised result set from the posted input data.
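To check the analytic max() against the posted data, here is a minimal sketch that runs the same query shape from Python. It assumes an in-memory SQLite database (3.25 or later, for window functions) as a stand-in for Oracle; the table and column names follow the question.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here (an assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ma_accounts ("
    "registry_id INTEGER, parent_reg_id TEXT, focus_account TEXT)"
)
rows = [
    (1, "A", "N"), (2, "B", "N"), (3, "A", "Y"), (4, "C", "Y"), (5, "A", "N"),
    (6, "B", "Y"), (7, "A", "N"), (8, "D", "Y"), (9, "E", "N"), (10, "E", "N"),
]
conn.executemany("INSERT INTO ma_accounts VALUES (?, ?, ?)", rows)

# 'Y' sorts after 'N', so the per-parent maximum becomes 'Y'
# as soon as one row of that parent is flagged 'Y'.
query = """
    SELECT registry_id,
           parent_reg_id,
           focus_account,
           MAX(focus_account) OVER (PARTITION BY parent_reg_id) AS focus_parent
    FROM ma_accounts
    ORDER BY registry_id
"""
focus_parent = {reg_id: flag for reg_id, _, _, flag in conn.execute(query)}
print(focus_parent)
```

Only parent E, whose rows are all 'N', ends up with focus_parent = 'N'.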

You're using an aggregate function in the SELECT list, but you're not grouping by the other selected columns at the end.
Try:
SELECT
REGISTRY_ID,
PARENT_REG_ID,
FOCUS_ACCOUNT,
SCORE_DETAILS,
CASE WHEN COUNT(PARENT_REG_ID) >= 1 AND FOCUS_ACCOUNT = 'Y' THEN 'Y'
ELSE 'N' END AS Focus_Parent
FROM MA_ACCOUNTS
GROUP BY REGISTRY_ID,
PARENT_REG_ID,
FOCUS_ACCOUNT,
SCORE_DETAILS

Related

Multicolumn order by a tuple

Let's suppose I have a table A like:
bisac1 bisac2 bisac3 desire
x y z 10
y z x 8
z y x 6
x y p 20
r y z 13
x s z 1
a y l 12
a x k 2
x p w 1
I would like to be able to count the number of times any of these elements (x,y,z) appears in the cols (bisac1,bisac2,bisac3).
So, the expected result should be 3 for the first 3 rows, 2 for the next 3 and 1 for the last 3.
Seems the following should do what you require?
select
case when bisac1 in ('x','y','z') then 1 else 0 end +
case when bisac2 in ('x','y','z') then 1 else 0 end +
case when bisac3 in ('x','y','z') then 1 else 0 end
from t;
You can also use one case per letter instead of one case per column (Stu's approach). The result will be the same for your sample data:
SELECT
CASE WHEN 'x' IN (bisac1, bisac2, bisac3) THEN 1 ELSE 0 END +
CASE WHEN 'y' IN (bisac1, bisac2, bisac3) THEN 1 ELSE 0 END +
CASE WHEN 'z' IN (bisac1, bisac2, bisac3) THEN 1 ELSE 0 END
FROM yourtable;
The result will not be the same if the same letter occurs in different columns. For example, if your row looks like this:
bisac1 bisac2 bisac3
x y y
Then Stu's query will produce 3 as result, my query here 2. From your description, it is unclear to me if your sample data can contain such rows at all or if the two queries will always create the same result for your data.
And even if your data can include such rows, it's still unclear to me whether you want to get 3 or 2 as result.
So, summarized, it's up to you what exactly you want to use here.
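The difference between the two counting styles can be seen with a small experiment. This is only a sketch: it uses an in-memory SQLite database in place of the original DBMS, the table name t from the first answer, and three hand-picked rows (the middle one is the duplicate-letter case discussed above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (bisac1 TEXT, bisac2 TEXT, bisac3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("x", "y", "z"), ("x", "y", "y"), ("x", "p", "w")],
)

# One CASE per column: counts how many columns hold any of x, y, z.
per_column_sql = """
    SELECT CASE WHEN bisac1 IN ('x','y','z') THEN 1 ELSE 0 END
         + CASE WHEN bisac2 IN ('x','y','z') THEN 1 ELSE 0 END
         + CASE WHEN bisac3 IN ('x','y','z') THEN 1 ELSE 0 END
    FROM t ORDER BY rowid
"""
# One CASE per letter: counts how many distinct letters of x, y, z appear.
per_letter_sql = """
    SELECT CASE WHEN 'x' IN (bisac1, bisac2, bisac3) THEN 1 ELSE 0 END
         + CASE WHEN 'y' IN (bisac1, bisac2, bisac3) THEN 1 ELSE 0 END
         + CASE WHEN 'z' IN (bisac1, bisac2, bisac3) THEN 1 ELSE 0 END
    FROM t ORDER BY rowid
"""
per_column = [n for (n,) in conn.execute(per_column_sql)]
per_letter = [n for (n,) in conn.execute(per_letter_sql)]
print(per_column)  # [3, 3, 1]
print(per_letter)  # [3, 2, 1]
```

The two queries agree unless a letter is repeated within a row.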

Setting value based on values in previous columns of the same table ? (Oracle)

I'm creating a new table and carrying over several columns from a previous table. One of the new fields that I need to create is a flag that will have values 0 or 1 and value needs to be determined based on 6 previous fields in the table.
The 6 previous columns have preexisting values of 'Y' (yes) or 'N' (no) stored for each one. This new field needs to check whether any of the 6 columns have Y and if so set the flag to 0. If there is N in all 6 fields then set itself to 1.
I was told to try the GREATEST function, but it doesn't seem to solve the problem that I have:
select greatest('N','Y','N','Y') gr from dual;
Indeed GREATEST() does the trick.
For example:
select t.*,
case when greatest(f1, f2, f3, f4, f5, f6) = 'Y' then 0 else 1 end as x
from t
Result:
F1 F2 F3 F4 F5 F6 X
--- --- --- --- --- --- -
N N N N N N 1
N N N Y N N 0
Y Y Y Y Y Y 0
See running example at db<>fiddle.
Maybe try DECODE. Here are some hints:
decode( col1, 'N', 0, 1 )
This will give you a 0 if the column is 'N', otherwise a 1. Then add these up:
decode( col1, 'N', 0, 1 ) + decode( col2, 'N', 0, 1 ) + ...
If the sum is > 0, you know at least one column was 'Y'.
Rather than comparing the columns with values, you can compare the value with the columns. Since the flag must be 0 as soon as any column is 'Y', test for 'Y':
SELECT T.*, CASE WHEN 'Y' IN (A, B, C, D, E, F) THEN 0 ELSE 1 END AS NEW_FIELD
FROM T;
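As a quick sanity check of the rule from the question (any 'Y' gives 0, all 'N' gives 1), the sketch below evaluates two formulations side by side. It assumes an in-memory SQLite database rather than Oracle, so the multi-argument scalar MAX() plays the role of GREATEST(); the column names f1 to f6 are taken from the GREATEST answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (f1 TEXT, f2 TEXT, f3 TEXT, f4 TEXT, f5 TEXT, f6 TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)",
    [tuple("NNNNNN"), tuple("NNNYNN"), tuple("YYYYYY")],
)

# flag_in:  0 when the literal 'Y' appears in any of the six columns.
# flag_max: 0 when the largest of the six values is 'Y'
#           (SQLite's scalar MAX(a, b, ...) stands in for Oracle's GREATEST).
query = """
    SELECT CASE WHEN 'Y' IN (f1, f2, f3, f4, f5, f6) THEN 0 ELSE 1 END AS flag_in,
           CASE WHEN MAX(f1, f2, f3, f4, f5, f6) = 'Y' THEN 0 ELSE 1 END AS flag_max
    FROM t
    ORDER BY rowid
"""
flags = conn.execute(query).fetchall()
print(flags)  # [(1, 1), (0, 0), (0, 0)]
```

Both expressions give the same flag for these rows; they would differ only if the columns could hold NULL or values other than 'Y' and 'N'.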

Hive Query to select lines that meet multiple criteria

I have a table that looks something like this (Column 1 is a URL, column 2 is an action ID, and column 3 is a user ID):
1 2 3
===========
d x a
d q a
e y a
f z a
f z b
d i b
e x b
d i c
g q c
o q c
f q c
I'm trying to check and see if there are any rows where col1 = 'f'.
If col1 = 'f', I need to get the userID from col3 then check all rows where col3 = userID to see if there are any rows where col2 = 'x'.
If there are any userIDs that have a row where col1 = 'f' and a row where col2 = 'x', return all rows that have userID in col3
I'm a hive/sql noob, but here is some python code that i think would accomplish what I'm trying to do...
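import pandas as pd

df = pd.DataFrame(table)  # table: the rows shown above, columns named '1', '2', '3'
f_users = set(df.loc[df['1'] == 'f', '3'])  # users with a row where col1 = 'f'
x_users = set(df.loc[df['2'] == 'x', '3'])  # users with a row where col2 = 'x'
df = df[df['3'].isin(f_users & x_users)]  # keep all rows of users in both sets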
The result of my desired query would return
1 2 3
===========
d x a
d q a
e y a
f z a
f z b
d i b
e x b
So far the closest I've gotten is this:
SELECT * FROM log AS a
WHERE a.3 in
(
SELECT DISTINCT 3
FROM log
WHERE ((to_date(log_date)) >= (date_sub(current_date, 1)))
AND 1 = 'f'
)
This gets me half way there, but it's not filtering on col2 and takes an extraordinarily long time to run, which can cause it to fail in my environment.
Is there a way to accomplish this using only Hive / Spark? I really don't want to have to download this file and run a python script on it, as it is several GB and my office wifi is slow :(
Get all userids where url = 'f'. This will give you (a,b,c).
Use that to check userid for actionid = 'x'. This will give you (a,b).
Finally get all rows with userid from the above.
select * from log where userid in
(
select distinct userid from log
where
actionid ='x' and
userid in (select distinct userid from log where URL='f')
)
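The nested query can be tried on the sample data without a Hive cluster. The following sketch assumes an in-memory SQLite database as a stand-in, and the column names url, actionid and userid used in the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (url TEXT, actionid TEXT, userid TEXT)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?)",
    [
        ("d", "x", "a"), ("d", "q", "a"), ("e", "y", "a"), ("f", "z", "a"),
        ("f", "z", "b"), ("d", "i", "b"), ("e", "x", "b"),
        ("d", "i", "c"), ("g", "q", "c"), ("o", "q", "c"), ("f", "q", "c"),
    ],
)

# Innermost subquery: users that visited url 'f'        -> a, b, c
# Middle subquery:    of those, users with action 'x'   -> a, b
# Outer query:        every row belonging to those users
query = """
    SELECT url, actionid, userid
    FROM log
    WHERE userid IN (
        SELECT DISTINCT userid
        FROM log
        WHERE actionid = 'x'
          AND userid IN (SELECT DISTINCT userid FROM log WHERE url = 'f')
    )
    ORDER BY rowid
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

User c has a row with url 'f' but no row with action 'x', so none of c's rows are returned.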

SQL - subtract value from same column

I have a table as follows
fab_id x y z m
12 14 10 3 5
12 10 10 3 4
Here I'm using a GROUP BY clause on fab_id. Now I want to subtract the column values of the rows that share the same id.
e.g. group by on id (12). Now to subtract (14-10)X, (10-10)Y, (3-3)z, (5-4)m
I know there is an aggregate function SUM for addition, but is there any function I can use to subtract these values?
Or is there any other method to achieve the result?
Note: there is a chance that the result may be negative. Can any function handle this?
one more example - (order by correction_date desc so result will show recent correction first)
fab_id x y z m correction_date
14 20 12 4 4 2014-05-05 09:03
14 24 12 4 3 2014-05-05 08:05
14 26 12 4 6 2014-05-05 07:12
so result to achieve group by on id (14). Now to subtract (26-20)X, (12-12)Y, (4-4)z, (6-4)m
Now, that you have given more information on how to deal with more records and that you revealed that there is a time column involved, here is a possible solution. The query selects the first and last record per fab_id and subtracts the values:
select
fab_info.fab_id,
earliest_fab.x - latest_fab.x,
earliest_fab.y - latest_fab.y,
earliest_fab.z - latest_fab.z,
earliest_fab.m - latest_fab.m
from
(
select
fab_id,
min(correction_date) as min_correction_date,
max(correction_date) as max_correction_date
from fab
group by fab_id
) as fab_info
inner join fab as earliest_fab on
earliest_fab.fab_id = fab_info.fab_id and
earliest_fab.correction_date = fab_info.min_correction_date
inner join fab as latest_fab on
latest_fab.fab_id = fab_info.fab_id and
latest_fab.correction_date = fab_info.max_correction_date;
Provided you always want to subtract the least value from the greatest value:
select
fab_id,
max(x) - min(x),
max(y) - min(y),
max(z) - min(z),
max(m) - min(m)
from fab
group by fab_id;
Seeing as you say there will always be two rows, you can simply do a 'self join' and subtract the values from each other:
SELECT t1.fab_id, t1.x - t2.x as diffx, t1.y - t2.y as diffy, <remainder columns here>
from <table> t1
inner join <table> t2 on t1.fab_id = t2.fab_id and t1.correctiondate > t2.correctiondate
If you have more than two rows, then you'll need to make subqueries or use window ranking functions to figure out the largest and smallest correctiondate for each fab_id and then you can do the very same as above by joining those two subqueries together instead of
Unfortunately, it's SQL Server 2012 that has the handy FIRST_VALUE()/LAST_VALUE() OLAP functions, so in the case of more than 2 rows we have to do something a little different:
SELECT fab_id, SUM(CASE WHEN latest = 1 THEN -x ELSE x END) AS x,
SUM(CASE WHEN latest = 1 THEN -y ELSE y END) AS y,
SUM(CASE WHEN latest = 1 THEN -z ELSE z END) AS z,
SUM(CASE WHEN latest = 1 THEN -m ELSE m END) AS m
FROM (SELECT fab_id, x, y, z, m,
ROW_NUMBER() OVER(PARTITION BY fab_id
ORDER BY correction_date ASC) AS earliest,
ROW_NUMBER() OVER(PARTITION BY fab_id
ORDER BY correction_date DESC) AS latest
FROM myTable) fab
WHERE earliest = 1
OR latest = 1
GROUP BY fab_id
HAVING COUNT(*) >= 2
(and working fiddle. Thanks to #AK47 for the initial setup.)
Which yields the expected:
FAB_ID X Y Z M
12 4 0 0 1
14 6 0 0 2
Note that HAVING COUNT(*) >= 2 is so that only rows with changes are considered (you'd get some null result columns otherwise).
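For anyone who wants to reproduce that output locally, here is a rough sketch of the ROW_NUMBER() approach. It assumes an in-memory SQLite database (3.25 or later) in place of the original DBMS, a table named fab, and output aliases dx, dy, dz, dm chosen for this demo. The question gives no dates for fab_id 12, so the two dates used for it are hypothetical. The sketch also runs MAX(m) - MIN(m) for comparison, because the two approaches can disagree when there are more than two rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fab (fab_id INTEGER, x INTEGER, y INTEGER, z INTEGER, "
    "m INTEGER, correction_date TEXT)"
)
conn.executemany(
    "INSERT INTO fab VALUES (?, ?, ?, ?, ?, ?)",
    [
        (12, 14, 10, 3, 5, "2014-05-04 08:00"),  # hypothetical date
        (12, 10, 10, 3, 4, "2014-05-04 09:00"),  # hypothetical date
        (14, 20, 12, 4, 4, "2014-05-05 09:03"),
        (14, 24, 12, 4, 3, "2014-05-05 08:05"),
        (14, 26, 12, 4, 6, "2014-05-05 07:12"),
    ],
)

# Keep only the earliest and latest row per fab_id, then add the earliest
# values and subtract the latest ones.
earliest_minus_latest = conn.execute("""
    SELECT fab_id,
           SUM(CASE WHEN latest = 1 THEN -x ELSE x END) AS dx,
           SUM(CASE WHEN latest = 1 THEN -y ELSE y END) AS dy,
           SUM(CASE WHEN latest = 1 THEN -z ELSE z END) AS dz,
           SUM(CASE WHEN latest = 1 THEN -m ELSE m END) AS dm
    FROM (SELECT fab_id, x, y, z, m,
                 ROW_NUMBER() OVER (PARTITION BY fab_id
                                    ORDER BY correction_date ASC) AS earliest,
                 ROW_NUMBER() OVER (PARTITION BY fab_id
                                    ORDER BY correction_date DESC) AS latest
          FROM fab) ranked
    WHERE earliest = 1 OR latest = 1
    GROUP BY fab_id
    HAVING COUNT(*) >= 2
    ORDER BY fab_id
""").fetchall()

# For comparison: greatest minus least value of m, ignoring dates.
max_minus_min_m = conn.execute(
    "SELECT fab_id, MAX(m) - MIN(m) FROM fab GROUP BY fab_id ORDER BY fab_id"
).fetchall()

print(earliest_minus_latest)  # [(12, 4, 0, 0, 1), (14, 6, 0, 0, 2)]
print(max_minus_min_m)        # [(12, 1), (14, 3)]
```

For fab_id 14 the middle row holds the smallest m (3), so MAX - MIN yields 3 while earliest minus latest yields 2. Which of the two is wanted depends on the requirement.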
;with Ordered as
(
select
fab_id,x,y,z,m,date,
row_Number() over (partition by fab_id order by date desc) as Latest,
row_Number() over (partition by fab_id order by date) as Oldest
from fab
)
select
O1.fab_id,
O1.x-O2.x,
O1.y-O2.y,
O1.z-O2.z,
O1.m-O2.m
from Ordered O1
join Ordered O2 on
O1.fab_id = O2.fab_id
where O1.latest = 1 and O2.oldest = 1
I think if you have a consistent set of two rows, then the following code should work for you.
select fab_id ,max(x) - min(x) as x
,max(y) - min(y) as y
,max(z) - min(z) as z
,max(m) - min(m) as m
from Mytable
group by fab_id
It will work even if you get more than 2 rows in a group, but then the min value is subtracted from the max value regardless of row order. Hope it helps you.
EDIT : SQL Fiddle DEMO
A CTE could help:
WITH cte AS (
SELECT
-- Get the row numbers per fab_id ordered by the correction date
ROW_NUMBER() OVER (PARTITION BY fab_id ORDER BY correction_date ASC) AS rid
, fab_id, x, y, z, m
FROM
YourTable
)
SELECT
fab_id
-- If the row number is 1 then, this is our base value
-- If the row number is not 1 then, we want to subtract it (or add the negative value)
, SUM(CASE WHEN rid = 1 THEN x ELSE x * -1 END) AS x
, SUM(CASE WHEN rid = 1 THEN y ELSE y * -1 END) AS y
, SUM(CASE WHEN rid = 1 THEN z ELSE z * -1 END) AS z
, SUM(CASE WHEN rid = 1 THEN m ELSE m * -1 END) AS m
FROM
cte
GROUP BY
fab_id
Remember, 40-10-20 equals to 40 + (-10) + (-20)

SQL Counting the number of occurence based on a subject

I find it hard to word what I am trying to achieve. I have a table that looks like this:
user char
---------
a | x
a | y
a | z
b | x
b | x
b | y
c | y
c | y
c | z
How do I write a query that would return me the following result?
user x y z
-------
a |1|1|1|
b |2|1|0|
c |0|2|1|
The numbers represent the number of occurrences of each char in the original table.
EDIT:
The chars values are unknown hence the solution cannot be restricted to these values. Sorry for not mentioning it sooner. I am using Oracle DB but planning to use JPQL to construct the query.
select user,
sum(case when char='x' then 1 else 0 end) as x,
sum(case when char='y' then 1 else 0 end) as y,
sum(case when char='z' then 1 else 0 end) as z
from thetable
group by user
Or, if you don't mind stacking vertically, this query works even with unknown sets of characters:
select user, char, count(*) as count
from thetable
group by user, char
This will give you:
user char count
a x 1
a y 1
a z 1
b x 2
If you want to string an unknown set of values out horizontally (as in your demo output), you're going to need to get into dynamic queries... the SQL standard is not designed to generate output with an unknown number of columns... Hope this is helpful!
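One way to picture the dynamic-query route is to build the column list in application code. The sketch below is an illustration only: it uses Python with an in-memory SQLite database rather than Oracle or JPQL, and the names usr, val and quote_ident are hypothetical (chosen to avoid the reserved words user and char).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thetable (usr TEXT, val TEXT)")
conn.executemany(
    "INSERT INTO thetable VALUES (?, ?)",
    [("a", "x"), ("a", "y"), ("a", "z"), ("b", "x"), ("b", "x"),
     ("b", "y"), ("c", "y"), ("c", "y"), ("c", "z")],
)

def quote_ident(name):
    """Quote a string so it can be used safely as a column alias."""
    return '"' + name.replace('"', '""') + '"'

# Step 1: discover the distinct values that will become columns.
values = [v for (v,) in conn.execute(
    "SELECT DISTINCT val FROM thetable ORDER BY val")]

# Step 2: generate one SUM(CASE ...) per value. The compared value is bound
# as a parameter; only the quoted alias is spliced into the SQL text.
columns = ", ".join(
    f"SUM(CASE WHEN val = ? THEN 1 ELSE 0 END) AS {quote_ident(v)}"
    for v in values
)
query = f"SELECT usr, {columns} FROM thetable GROUP BY usr ORDER BY usr"

# Step 3: run the generated query.
pivot = conn.execute(query, values).fetchall()
print(values)  # ['x', 'y', 'z']
print(pivot)   # [('a', 1, 1, 1), ('b', 2, 1, 0), ('c', 0, 2, 1)]
```

The number of result columns is only known after step 1, which is why this cannot be written as a single static SQL statement.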
Another option, using T-SQL PIVOT (SQL SERVER 2005+)
select *
from userchar as t
pivot
(
count([char]) for [char] in ([x],[y],[z])
) as p
Result:
user x y z
----------- ----------- ----------- -----------
a 1 1 1
b 2 1 0
c 0 2 1
(3 row(s) affected)
Edit ORACLE:
You can build a similar PIVOT table using ORACLE.
The tricky part is that you need the right column names in the IN ([x],[y],[z],...) statement. It shouldn't be too hard to construct the SQL query in code, getting a (SELECT DISTINCT [char] from table) and appending it to your base query.
Pivoting rows into columns dynamically in Oracle
If you don't know the exact values on which to PIVOT, you'll either need to do something procedural or mess with dynamic sql (inside an anonymous block), or use XML (in 11g).
If you want the XML approach, it would be something like:
with x as (
select 'a' as usr, 'x' as val from dual
union all
select 'a' as usr, 'y' as val from dual
union all
select 'b' as usr, 'x' as val from dual
union all
select 'b' as usr, 'x' as val from dual
union all
select 'c' as usr, 'z' as val from dual
)
select * from x
pivot XML (count(val) as val_cnt for val in (ANY))
;
Output:
USR VAL_XML
a <PivotSet><item><column name = "VAL">x</column><column name = "VAL_CNT">1</column></item><item><column name = "VAL">y</column><column name = "VAL_CNT">1</column></item></PivotSet>
b <PivotSet><item><column name = "VAL">x</column><column name = "VAL_CNT">2</column></item></PivotSet>
c <PivotSet><item><column name = "VAL">z</column><column name = "VAL_CNT">1</column></item></PivotSet>
Hope that helps