One SQL (Oracle) query for getting unique IDs that have two different (NULL and non-NULL) values per column

Table foobar is, for clarity, structured and has data as follows:
id, action_dt, status_id
1, '02-JUL-10', 'x'
1, '02-JUL-10', '2'
1, '02-JUL-10', NULL
2, '02-JUL-10', 'a'
2, '02-JUL-10', 'b'
3, '02-JUL-10', 'k'
3, '02-JUL-10', NULL
3, '03-JUL-10', 'k'
3, '03-JUL-10', NULL
I need a query that returns the IDs for which, on a given day, both a NULL and a non-NULL status_id exist. So, for the example data above, the query needs to return:
'02-JUL-10', 1
'02-JUL-10', 3
'03-JUL-10', 3
Yes, it can be done using something like:
SELECT
nulls.action_dt
, nulls.id
FROM (SELECT
action_dt
, id
FROM foobar
WHERE status_id IS NULL
GROUP BY action_dt) nulls
INNER JOIN (SELECT
action_dt
, id
FROM foobar
WHERE status_id IS NOT NULL
GROUP BY action_dt) non_nulls ON nulls.action_dt = non_nulls.action_dt
AND nulls.id = non_nulls.id
but as you can see, among other things, it needs two subqueries and another pass over the data for the join...
The query I've been working on and have hopes for is of the form:
SELECT
action_dt
, id
FROM
foobar
GROUP BY
action_dt
, id
, CASE WHEN status_id IS NOT NULL THEN 1 ELSE 0 END
HAVING
COUNT(status_id) > 1
but it doesn't quite return what I need (as you know, the HAVING clause applies to the underlying data that is being queried). Any ideas?
After all this, it seems a solution would be to have the above query in a subquery and filter it down that way, such as:
SELECT
action_dt
, id
FROM (SELECT
action_dt
, id
FROM
foobar
GROUP BY
action_dt
, id
, CASE WHEN status_id IS NOT NULL THEN 1 ELSE 0 END
) repeat_ids_per_day
GROUP BY
action_dt
, id
HAVING
COUNT(id) > 1
but I feel it can be better...

Your idea is sound: in such a case you don't need a subquery; an aggregate is sufficient and should be more efficient. This should work:
SQL> SELECT action_dt, id
2 FROM foobar
3 GROUP BY action_dt, ID
4 HAVING COUNT(DISTINCT CASE WHEN status_id IS NULL THEN 1 ELSE 0 END) > 1;
ACTION_DT ID
--------- ----------
02-JUL-10 1
02-JUL-10 3
03-JUL-10 3
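If COUNT(DISTINCT ...) ever becomes a concern, an equivalent way to express the same condition is to require at least one NULL and at least one non-NULL status_id per group. Just an alternative sketch; the query above already does the job:
SELECT action_dt, id
FROM foobar
GROUP BY action_dt, id
HAVING COUNT(CASE WHEN status_id IS NULL THEN 1 END) > 0
   AND COUNT(status_id) > 0;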

I think you need to make some minor changes in your first posted query, as below:
SELECT
u.action_dt, u.id
FROM
(SELECT
action_dt
, id
FROM foobar
WHERE status_id IS NULL
GROUP BY action_dt, id
UNION ALL
SELECT
action_dt
, id
FROM foobar
WHERE status_id IS NOT NULL
GROUP BY action_dt, id) u
GROUP BY u.action_dt, u.id
HAVING COUNT(*) > 1
Also, what you posted there is not quite correct: in an Oracle database you can't select a column that is not in the GROUP BY clause. Please check that; it could be your mistake and may be the cause of the problem.

Related

Grouping by id and looking at another column in a particular order to see if the id group satisfies a particular condition

customer_id    transaction_success
1              Failed
2              Complete
1              Failed
1              Complete
3              Failed
2              Failed
3              Complete
3              Failed
3              Failed
3              Complete
Essentially I want to write a statement to identify whether the customer has had a Complete transaction after having had a Failed transaction sometime before. So in this example, customers 1 and 3 would satisfy this. Assume that there is an added timestamp column next to transaction success.
The resulting table should look like this:
customer_id    returning_success
1              True
2              False
3              True
Assuming it is not important whether the Complete was after or before the Failed, you can LEFT JOIN the table with a subquery that only takes the Completes. If the joined value is NULL, the customer has no Complete row; otherwise the result is true.
As you don't provide your DBMS (please read: Why should I "tag my RDBMS"?), the example below uses a plain CASE expression for the NULL check; DBMS-specific helpers such as IFNULL or ISNULL could be used instead: https://www.w3schools.com/sql/sql_isnull.asp
SELECT DISTINCT
yt.customer_id,
CASE WHEN completes.customer_id IS NULL THEN 'False' ELSE 'True' END AS returning_success
FROM
yourtable yt
LEFT JOIN
(
SELECT DISTINCT
customer_id
FROM
yourTable
WHERE transaction_success = 'Complete') completes
ON completes.customer_id = yt.customer_id
If you just need customers that have had both successful and failed transactions, you can use this:
select customer_id,
       case when sum(case when transaction = 'Failed' then 1 else 0 end) > 0
             and sum(case when transaction = 'Complete' then 1 else 0 end) > 0
            then 'True'
            else 'False'
       end as returning_success
from table_
group by customer_id
 If you actually do have some timestamp column:
select nvl(c.customer_id, f.customer_id) as customer_id,
       case when last_complete_time is null
              or first_fail_time is null
              or first_fail_time > last_complete_time
            then 'False'
            else 'True'
       end as returning_success
from (
    select customer_id, max(time_) as last_complete_time
    from table_
    where transaction = 'Complete'
    group by customer_id
) c
full join (
    select customer_id, min(time_) as first_fail_time
    from table_
    where transaction = 'Failed'
    group by customer_id
) f on c.customer_id = f.customer_id
You can also use this query to filter all the True cases and then just union or join the rest:
select f.customer_id, 'True' as returning_success
from (
    select customer_id, max(time_) as last_complete_time
    from table_
    where transaction = 'Complete'
    group by customer_id
) c
join (
    select customer_id, min(time_) as first_fail_time
    from table_
    where transaction = 'Failed'
    group by customer_id
) f on c.customer_id = f.customer_id
where first_fail_time < last_complete_time
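As another option, the whole check can be written as a single conditional aggregation with no joins at all. This is only a sketch using the same table_, transaction and time_ names assumed above:
select customer_id,
       -- if either aggregate is null (no 'Complete' or no 'Failed' row for the
       -- customer), the comparison is null and the else branch yields 'False'
       case when max(case when transaction = 'Complete' then time_ end)
               > min(case when transaction = 'Failed' then time_ end)
            then 'True'
            else 'False'
       end as returning_success
from table_
group by customer_id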

how to avoid duplicates in hive query

I have two tables:
table1
the_date, my_id
02/03/2021,123
02/03/2021, 1234
02/03/2021, 12345
table2
the_date, my_id, seq, txt
02/03/2021, 1234, 1 , 'OK'
02/03/2021, 12345, 1, 'OK'
02/03/2021, 12345, 2, 'HELLO HI THERE'
02/03/2021, 123456, 1, 'Ok'
Here is my code:
WITH AB AS (
SELECT A1.the_date, A1.my_id
FROM DB1.table1 A1, DB1.MSG_REC A2
WHERE A1.my_id = A2.my_id
),
BC AS (
SELECT AB.the_date,
COUNT(DISTINCT (CASE WHEN (TXT like '%OK%') THEN AB.my_id ELSE NULL END)) AS CASE1,
COUNT(DISTINCT (CASE WHEN (TXT like '%HELLO HI THERE%') THEN AB.my_id ELSE NULL END)) AS CASE2
FROM AB LEFT JOIN DB1.my_id BC ON AB.my_id = BC.my_id
The issue with the above is that I am counting the value '12345' twice because it satisfies both of the CASE statements.
That causes duplicate data when capturing the count metrics. Is there a way to evaluate the first case and then evaluate the second case while excluding any of the my_id records already picked up by the first case?
So for example, when it is time to run the above script and the first case executes, it will pick up the below records and the count would be 3
02/03/2021, 1234, 1 , 'OK'
02/03/2021, 12345, 1, 'OK'
02/03/2021, 123456, 1, 'Ok'
The second case should only be looping through the below records and the count would be only 1
02/03/2021, 12345, 2, 'HELLO HI THERE'
CASE1 would be 4 and CASE2 would be 2 if I don't create a condition to circumvent this issue. Any tips or suggestions?
Assign a single case to each ID before the DISTINCT aggregation. Then do the distinct aggregation; that way you eliminate the same IDs being counted in different cases. See the comments in the code:
select --do final distinct aggregation
count(distinct (case when assigned_case = 'CASE1' then my_id else null end)) as CASE1,
count(distinct (case when assigned_case = 'CASE2' then my_id else null end)) as CASE2
from
(
select my_id,
--assign a single CASE to all rows with the same id based on some priority logic:
case when case1_flag = 1 then 'CASE1'
     when case2_flag = 1 then 'CASE2'
     else NULL
end as assigned_case
from
(--calculate all CASE flags for each ID
select AB.my_id,
max(CASE WHEN (TXT like '%OK%') THEN 1 ELSE NULL END) over (partition by AB.my_id) as case1_flag,
max(CASE WHEN (TXT like '%HELLO HI THERE%') THEN 1 ELSE NULL END) over (partition by AB.my_id) as case2_flag
from ...
) s
) s
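For what it's worth, a rough equivalent sketch without window functions: classify each id once in a grouped subquery, then count per class. It assumes the joined rows from the question are available as a derived table named joined with columns my_id and txt; the names are illustrative only.
select count(case when assigned_case = 'CASE1' then 1 end) as CASE1,
       count(case when assigned_case = 'CASE2' then 1 end) as CASE2
from (
    select my_id,
           -- CASE1 takes priority, so an id matching both patterns is counted only once
           case when max(case when txt like '%OK%' then 1 else 0 end) = 1 then 'CASE1'
                when max(case when txt like '%HELLO HI THERE%' then 1 else 0 end) = 1 then 'CASE2'
           end as assigned_case
    from joined
    group by my_id
) s;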

How to Count Distinct on Case When?

I have been building up a query today and I have got stuck. I have two unique IDs that identify whether an order is Internal or Web. I have been able to split this out so it counts how many times they appear, but unfortunately it is not providing me with the intended result. From research, I have tried creating a COUNT DISTINCT CASE WHEN statement to provide me with the results.
Please see below where I have broken down what it is doing and how I expect it to be.
Original data looks like:
Company Name Order Date Order Items Orders Value REF
-------------------------------------------------------------------------------
CompanyA 03/01/2019 Item1 Order1 170 INT1
CompanyA 03/01/2019 Item2 Order1 0 INT1
CompanyA 03/01/2019 Item3 Order2 160 WEB2
CompanyA 03/01/2019 Item4 Order2 0 WEB2
How I expect it to be:
Company Name Order Date Order Items Orders Value WEB INT
-----------------------------------------------------------------------------------------
CompanyA 03/01/2019 4 2 330 1 1
What currently comes out
Company Name Order Date Order Items Orders Value WEB INT
-----------------------------------------------------------------------------------------
CompanyA 03/01/2019 4 2 330 2 2
As you can see from my current result, it is counting every line even though it is the same reference. It is not a hard and fast rule that it is always doubled up, which is why I think I need a COUNT DISTINCT CASE WHEN. Below is the query I am currently using. This pulls from a Progress V10 ODBC source that I connect to through Excel. Unfortunately I do not have SSMS, and Microsoft Query is just useless.
My Current SQL:
SELECT
Company_0.CoaCompanyName
, SopOrder_0.SooOrderDate
, Count(DISTINCT SopOrder_0.SooOrderNumber) AS 'Orders'
, SUM(CASE WHEN SopOrder_0.SooOrderNumber IS NOT NULL THEN 1 ELSE 0 END) AS 'Order Items'
, SUM(SopOrderItem_0.SoiValue) AS 'Order Value'
, SUM(CASE WHEN SopOrder_0.SooParentOrderReference LIKE 'INT%' THEN 1 ELSE 0 END) AS 'INT'
, SUM(CASE WHEN SopOrder_0.SooParentOrderReference LIKE 'WEB%' THEN 1 ELSE 0 END) AS 'WEB'
FROM
SBS.PUB.Company Company_0
, SBS.PUB.SopOrder SopOrder_0
, SBS.PUB.SopOrderItem SopOrderItem_0
WHERE
SopOrder_0.SopOrderID = SopOrderItem_0.SopOrderID
AND Company_0.CompanyID = SopOrder_0.CompanyID
AND SopOrder_0.SooOrderDate > '2019-01-01'
GROUP BY
Company_0.CoaCompanyName
, SopOrder_0.SooOrderDate
I have tried using the following line but it errors on me when importing:
, Count(DISTINCT CASE WHEN SopOrder_0.SooParentOrderReference LIKE 'INT%' THEN SopOrder_0.SooParentOrderReference ELSE 0 END) AS 'INT'
Just so you know, the error I get when importing at the moment is: syntax error at or about "CASE WHEN sopOrder_0.SooParentOrderRefer" (10713)
Try removing the ELSE:
COUNT(DISTINCT CASE WHEN SopOrder_0.SooParentOrderReference LIKE 'INT%' THEN SopOrder_0.SooParentOrderReference END) AS num_int
You don't specify the error, but the problem is probably that the THEN is returning a string and the ELSE a number -- so there is an attempt to convert the string values to a number.
Also, learn to use proper, explicit, standard JOIN syntax. Simple rule: Never use commas in the FROM clause.
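For illustration, here is a sketch of the question's query skeleton with the comma joins replaced by explicit JOINs; the tables, join conditions and date filter are taken from the question, and the SELECT list is shortened here:
SELECT
Company_0.CoaCompanyName
, SopOrder_0.SooOrderDate
, COUNT(DISTINCT SopOrder_0.SooOrderNumber) AS Orders
FROM SBS.PUB.Company Company_0
JOIN SBS.PUB.SopOrder SopOrder_0
    ON SopOrder_0.CompanyID = Company_0.CompanyID
JOIN SBS.PUB.SopOrderItem SopOrderItem_0
    ON SopOrderItem_0.SopOrderID = SopOrder_0.SopOrderID
WHERE SopOrder_0.SooOrderDate > '2019-01-01'
GROUP BY Company_0.CoaCompanyName, SopOrder_0.SooOrderDate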
COUNT DISTINCT on the SooOrderNumber or the SooParentOrderReference, whichever makes more sense for you.
If you are COUNTing, you need to make the thing you are not counting NULL. I prefer to include an ELSE in the CASE because it is more consistent and complete.
, Count(DISTINCT CASE WHEN SopOrder_0.SooParentOrderReference LIKE 'INT%' THEN SopOrder_0.SooParentOrderReference ELSE null END) AS 'INT'
Gordon Linoff is correct regarding the source of your error, i.e. a datatype mismatch between the CASE's THEN value and ELSE value. NULL removes (should remove) this ambiguity; I'd need to double check.
Editing my earlier answer...
Even though it looks, as you say, like count distinct is not supported in Pervasive PSQL, CTEs are supported. So you can do something like...
This is what you are trying to do but it is not supported...
with
dups as
(
select 1 as id, 'A' as col1 union all select 1, 'A' union all select 1, 'B' union all select 2, 'B'
)
select id
,count(distinct col1) as col_count
from dups
group by id;
Stick another CTE in the query to de-duplicate the data first. Then count as normal. That should work...
with
dups as
(
select 1 as id, 'A' as col1 union all select 1, 'A' union all select 1, 'B' union all select 2, 'B'
)
,de_dup as
(
select id
,col1
from dups
group by id
,col1
)
select id
,count(col1) as col_count
from de_dup
group by id;
These 2 versions should give the same result set.
There is always a way!!
I cannot explain the error you are getting. You are mistakenly using single quotes for alias names, but I don't actually think this is causing the error.
Anyway, I suggest you aggregate your order items per order first and only join then:
SELECT
c.coacompanyname
, so.sooorderdate
, COUNT(*) AS orders
, SUM(soi.itemcount) AS order_items
, SUM(soi.ordervalue) AS order_value
, COUNT(CASE WHEN so.sooparentorderreference LIKE 'INT%' THEN 1 END) AS int
, COUNT(CASE WHEN so.sooparentorderreference LIKE 'WEB%' THEN 1 END) AS web
FROM sbs.pub.company c
JOIN sbs.pub.soporder so ON so.companyid = c.companyid
JOIN
(
SELECT soporderid, COUNT(*) AS itemcount, SUM(soivalue) AS ordervalue
FROM sbs.pub.soporderitem
GROUP BY soporderid
) soi ON soi.soporderid = so.soporderid
GROUP BY c.coacompanyname, so.sooorderdate
ORDER BY c.coacompanyname, so.sooorderdate;

How to add new column to table with the value corresponding to the same table?

There is a status column in my table with int values. How can I map a text value to each int, or do I have to create a new column in the table?
I have tried ALTER TABLE, but what is the best method?
select id, status from table1;
If I run the above query we get -
id status
1 1
2 2
3 1
4 5
I want to get output -
id status
1 Accepted
2 Completed
3 Accepted
4 Declined
Use a CASE expression (Postgres):
select status,
case
when status=1 then 'Accepted'
when status=2 then 'Completed'
when status=3 then 'Accepted'
when status=4 then 'Declined'
end mystatus
from table1;
You can use CASE; refer to this SO question:
PostgreSQL CASE ... END with multiple conditions. The query will look something like this:
SELECT
id,
CASE
WHEN (status = 1) THEN 'Accepted'
WHEN status=2 then 'Completed'
WHEN status=3 then 'Accepted'
WHEN status=4 then 'Declined'
END AS status
FROM table1 ;
The correct case expression would be:
select id,
(case status
when 1 then 'Accepted'
when 2 then 'Completed'
when 5 then 'Declined'
end) as status
from table1;
You can also do this with a join to a derived table:
select t1.id, v.status
from table1 t1 left join
(values (1, 'Accepted'), (2, 'Completed'), (5, 'Declined')
) v(status_int, status)
on t1.status = v.status_int;
I mention this because you should probably have a reference table for the status values. In this case, the reference table is created on the fly in the query. But it should probably be a real table in the database.
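For instance, a minimal sketch of such a permanent reference table (the table and column names here are only illustrative):
-- hypothetical reference table for the status codes
CREATE TABLE status_ref (
    status_id   int PRIMARY KEY,
    status_name text NOT NULL
);

INSERT INTO status_ref (status_id, status_name)
VALUES (1, 'Accepted'), (2, 'Completed'), (5, 'Declined');

-- the lookup then becomes a plain join
SELECT t1.id, s.status_name AS status
FROM table1 t1
LEFT JOIN status_ref s ON s.status_id = t1.status;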

Replace NULL with values

Here is my challenge:
I have a log table to which a new record is added every time a record is changed, but it stores a NULL value for every field that did not change. In other words, only the changed value is set; the remaining unchanged fields in each row are simply NULL.
Now I would like to replace each NULL value with the closest non-NULL value above it, like below:
Source table: Task_log
ID Owner Status Flag
1 Bob Registrar T
2 Sue NULL NULL
3 NULL NULL F
4 Frank Admission T
5 NULL NULL F
6 NULL NULL T
Desired output table: Task_log
ID Owner Status Flag
1 Bob Registrar T
2 Sue Registrar T
3 Sue Registrar F
4 Frank Admission T
5 Frank Admission F
6 Frank Admission T
How do I write a query which will generate the desired output table?
One of the new windowed functions in SQL Server 2012 is FIRST_VALUE, which has a rather self-explanatory name and can be partitioned through the OVER clause. Before using it, it is necessary to divide every column into blocks of data; a new block for a column begins whenever a non-NULL value is found.
With Block As (
Select ID
, Owner
, OBlockID = SUM(Case When Owner Is Null Then 0 Else 1 End)
OVER (ORDER BY ID)
, Status
, SBlockID = SUM(Case When Status Is Null Then 0 Else 1 End)
OVER (ORDER BY ID)
, Flag
, FBlockID = SUM(Case When Flag Is Null Then 0 Else 1 End)
OVER (ORDER BY ID)
From Task_log
)
Select ID
, Owner = FIRST_VALUE(Owner) OVER (PARTITION BY OBlockID ORDER BY ID)
, Status = FIRST_VALUE(Status) OVER (PARTITION BY SBlockID ORDER BY ID)
, Flag = FIRST_VALUE(Flag) OVER (PARTITION BY FBlockID ORDER BY ID)
FROM Block
SQLFiddle demo
The UPDATE query is easily derived
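For reference, a sketch of one way that UPDATE could be derived (T-SQL, reusing the Block logic above and joining the filled values back by ID; not verified against the fiddle):
With Block As (
    Select ID, Owner, Status, Flag
         , OBlockID = SUM(Case When Owner  Is Null Then 0 Else 1 End) OVER (ORDER BY ID)
         , SBlockID = SUM(Case When Status Is Null Then 0 Else 1 End) OVER (ORDER BY ID)
         , FBlockID = SUM(Case When Flag   Is Null Then 0 Else 1 End) OVER (ORDER BY ID)
    From Task_log
), Filled As (
    Select ID
         , Owner  = FIRST_VALUE(Owner)  OVER (PARTITION BY OBlockID ORDER BY ID)
         , Status = FIRST_VALUE(Status) OVER (PARTITION BY SBlockID ORDER BY ID)
         , Flag   = FIRST_VALUE(Flag)   OVER (PARTITION BY FBlockID ORDER BY ID)
    From Block
)
Update t
Set t.Owner  = f.Owner
  , t.Status = f.Status
  , t.Flag   = f.Flag
From Task_log t
Join Filled f On f.ID = t.ID
Where t.Owner Is Null Or t.Status Is Null Or t.Flag Is Null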
As I mentioned in my comment, I would try to fix the process that is creating the records rather than fixing the junk data. If that is not an option, the code below should get you pointed in the right direction.
UPDATE t1
SET t1.owner = COALESCE(t1.owner, t2.owner),
t1.Status = COALESCE(t1.status, t2.status),
t1.Flag = COALESCE(t1.flag, t2.flag)
FROM Task_log as t1
INNER JOIN Task_log as t2
ON t1.id = t2.id + 1
WHERE t1.owner is null
OR t1.status is null
OR t1.flag is null
I can think of several approaches.
You could use a combination of COALESCE with an array aggregate function. Unfortunately it doesn't look like SQL Server supports array_agg natively (although some nice people have developed some workarounds).
You could also use a subselect for each column.
SELECT id,
(SELECT TOP 1 owner FROM ... WHERE id <= outer_id AND owner IS NOT NULL ORDER BY id DESC) AS owner,
-- other columns
You could probably do something with window functions, too.
A vanilla solution would be:
select id
, owner
, coalesce(owner, ( select owner from t t2
where id = (select max(id) from t t3
where id < t1.id and owner is not null))
) as new_owner
, flag
, coalesce(flag, ( select flag from t t2
where id = (select max(id) from t t3
where id < t1.id and flag is not null))
) as new_flag
from t t1
Rather inefficient, but it should work on most DBMSs.