Replace NULL with the values

We have a requirement as shown below.
There are 6 item classes (A, B, C, D, A1A, A1B) that should all appear in the result.
Records:
Item_class | Rev_id
A | 1
B | 1
C | 2
D | 1
NULL | 1
NULL | 2
We need to display the records as in the table below:
item_class | rev_id
A | 1
B | 1
C | 2
D | 1
A1A | 1
A1B | 2
So all item classes (A, B, C, D, A1A, A1B) should be displayed where item_class is NULL. The query should also check whether item classes A, B, C are already present; if so, for a different rev_id it should display the remaining ones (i.e. D, A1A, A1B), so that every item class is covered.
How can we write a query to fetch records like these?

The issue was resolved by using an outer join:
select b.item_class,
a.rev_id
from cct_test a
right outer join cct_item_details b
on a.item_class = b.item_class and a.rev_id = 1;
Output:
item_class | rev_id
A | 1
C | 1
A1B | null
A1A | null
D | null
B | null
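The key detail in that answer is that `a.rev_id = 1` sits in the ON clause rather than in WHERE. Below is a minimal sketch of the difference. It uses Python's built-in sqlite3 module as a stand-in engine and loads the sample rows from the question; the table definitions are assumptions. SQLite only added RIGHT JOIN in version 3.39, so the sketch writes the same join as a LEFT JOIN with the two tables swapped:

```python
import sqlite3

# Hypothetical schema, filled with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cct_item_details (item_class TEXT PRIMARY KEY);
    CREATE TABLE cct_test (item_class TEXT, rev_id INTEGER);
    INSERT INTO cct_item_details VALUES ('A'), ('B'), ('C'), ('D'), ('A1A'), ('A1B');
    INSERT INTO cct_test VALUES ('A', 1), ('B', 1), ('C', 2), ('D', 1), (NULL, 1), (NULL, 2);
""")

# Filter in ON: every item class survives; unmatched ones get a NULL rev_id.
FILTER_IN_ON = """
    SELECT b.item_class, a.rev_id
    FROM cct_item_details b
    LEFT JOIN cct_test a
      ON a.item_class = b.item_class AND a.rev_id = ?
    ORDER BY b.item_class
"""

# Filter in WHERE: the NULL-extended rows fail a.rev_id = ? and are discarded.
FILTER_IN_WHERE = """
    SELECT b.item_class, a.rev_id
    FROM cct_item_details b
    LEFT JOIN cct_test a
      ON a.item_class = b.item_class
    WHERE a.rev_id = ?
    ORDER BY b.item_class
"""

print(conn.execute(FILTER_IN_ON, (1,)).fetchall())
# [('A', 1), ('A1A', None), ('A1B', None), ('B', 1), ('C', None), ('D', 1)]
print(conn.execute(FILTER_IN_WHERE, (1,)).fetchall())
# [('A', 1), ('B', 1), ('D', 1)]
```

Moving the rev_id test into WHERE turns the outer join back into an inner join in effect. That is why the condition belongs in ON when every item class must appear.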

Related

How to deal with SQL IN Operator returning multiple values when only one is needed?

I am working on a multiple parent-child hierarchy that looks something like this. This is a subset of a table with 105k records.
NodeKey | ParentKey | Child | Parent
1 | NULL | A | NULL
2 | NULL | B | NULL
3 | NULL | C | A
4 | NULL | C | B
5 | NULL | D | C
I need to fill the ParentKey column according to the following conditions:
If the value in Parent is NULL, set ParentKey to NULL as well.
If the value in Parent is NOT NULL and also appears in Child, select the corresponding NodeKey and use it as the value of ParentKey (see the second table).
I can do that, but there is a problem when a value from the Parent column appears more than once in the Child column.
In the 5th row it doesn't matter whether 3 or 4 is chosen; it can be either one.
NodeKey | ParentKey | Child | Parent
1 | NULL | A | NULL
2 | NULL | B | NULL
3 | 1 | C | A
4 | 2 | C | B
5 | 3 or 4 (doesn't matter) | D | C
SELECT (CASE WHEN Parent IS NULL THEN NULL
ELSE
(SELECT NodeKey from table WHERE Parent IN (SELECT Child from table)) END) as ParentKey
When executing this code, SQL Server reports "Subquery returned more than 1 value", which makes sense. But no matter where I put MAX() or MIN(), it doesn't work.
When I put MAX() in front of NodeKey, it just returns a column full of NULL and 105314. 105314 is the number of rows in the table.
I am using SQL Server Management Studio 17.

If it does not matter which ParentKey is used, you can use the MIN (or MAX) function:
SELECT
TBL.NodeKey,
PK AS ParentKey,
TBL.Child,
TBL.Parent
FROM TBL
LEFT JOIN (
SELECT Child, MIN(NodeKey) PK FROM TBL GROUP BY Child
) P ON P.Child = TBL.Parent;
Or another version:
SELECT
TBL.NodeKey,
MIN(P.NodeKey) AS ParentKey,
TBL.Child,
TBL.Parent
FROM TBL
LEFT JOIN TBL P ON P.Child = TBL.Parent
GROUP BY TBL.NodeKey, TBL.Child, TBL.Parent;
Result:
+=========+===========+=======+========+
| NodeKey | ParentKey | Child | Parent |
+=========+===========+=======+========+
| 1 | (null) | A | (null) |
+---------+-----------+-------+--------+
| 2 | (null) | B | (null) |
+---------+-----------+-------+--------+
| 3 | 1 | C | A |
+---------+-----------+-------+--------+
| 4 | 2 | C | B |
+---------+-----------+-------+--------+
| 5 | 3 | D | C |
+---------+-----------+-------+--------+
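Below is a minimal sketch that runs both queries from this answer against the five sample rows. It uses Python's sqlite3 module as a stand-in for SQL Server; the table name TBL comes from the answer, and the column types are assumptions.

It also adds a third, hypothetical variant that repairs the question's original approach. In that subquery, `Parent` resolves to the inner table, so the subquery is not correlated with the outer row and returns the same values for every row. That is likely why MAX() produced one constant value. Correlating the subquery with the outer row and wrapping it in MIN() yields one value per row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TBL (NodeKey INTEGER PRIMARY KEY, Child TEXT, Parent TEXT);
    INSERT INTO TBL VALUES (1, 'A', NULL), (2, 'B', NULL),
                           (3, 'C', 'A'), (4, 'C', 'B'), (5, 'D', 'C');
""")

# Version 1: join to a derived table holding one NodeKey per Child.
DERIVED = """
    SELECT TBL.NodeKey, P.PK AS ParentKey, TBL.Child, TBL.Parent
    FROM TBL
    LEFT JOIN (SELECT Child, MIN(NodeKey) AS PK FROM TBL GROUP BY Child) P
      ON P.Child = TBL.Parent
    ORDER BY TBL.NodeKey
"""

# Version 2: self-join, then collapse duplicate matches with MIN().
GROUPED = """
    SELECT TBL.NodeKey, MIN(P.NodeKey) AS ParentKey, TBL.Child, TBL.Parent
    FROM TBL
    LEFT JOIN TBL P ON P.Child = TBL.Parent
    GROUP BY TBL.NodeKey, TBL.Child, TBL.Parent
    ORDER BY TBL.NodeKey
"""

# Hypothetical variant: a correlated scalar subquery.
# A NULL Parent matches nothing, so MIN() over the empty set yields NULL.
CORRELATED = """
    SELECT t.NodeKey,
           (SELECT MIN(p.NodeKey) FROM TBL p WHERE p.Child = t.Parent) AS ParentKey,
           t.Child, t.Parent
    FROM TBL t
    ORDER BY t.NodeKey
"""

for query in (DERIVED, GROUPED, CORRELATED):
    print(conn.execute(query).fetchall())
# each prints [(1, None, 'A', None), (2, None, 'B', None), (3, 1, 'C', 'A'), (4, 2, 'C', 'B'), (5, 3, 'D', 'C')]
```

The correlated form needs no CASE WHEN Parent IS NULL guard, because the comparison `p.Child = NULL` is never true.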

How to find count differences for IDs in two large tables

I have two large tables, each containing around 17M rows. They should have exactly the same number of rows, but I am finding that the counts differ by 343. I want to find out where the counts differ. The tables look like this:
Table A
ID | color
---| ---------
1 | red
1 | green
1 | blue
2 | white
3 | black
3 | red
Table B
ID | sale_dates
---| ----------
1 | 2020-10-01
1 | 2020-01-10
2 | 2018-01-09
3 | 2017-08-08
Based on the above, I would like output like this:
ID | Table A | Table B | Difference
---| --------| --------| ----------
1 | 3 | 2 | 1
2 | 1 | 1 | 0
3 | 2 | 1 | 1
Or, even better, only the IDs where the difference is not 0.

If the two tables will always have the same set of ID values, you can just JOIN two derived tables of COUNT(*) values to get your desired output:
SELECT A.ID,
"Table A",
"Table B",
"Table A" - "Table B" AS Difference
FROM (
SELECT ID, COUNT(*) AS "Table A"
FROM A
GROUP BY ID
) A
JOIN (
SELECT ID, COUNT(*) AS "Table B"
FROM B
GROUP BY ID
) B ON A.ID = B.ID
ORDER BY A.ID
Output:
id | Table A | Table B | difference
1 | 3 | 2 | 1
2 | 1 | 1 | 0
3 | 2 | 1 | 1
If you only want the ID values that have a non-zero difference, add
WHERE "Table A" - "Table B" <> 0
before the ORDER BY clause.

This is a tweak on Nick's answer. I think a full join is very important in this type of situation, because it is possible that some ids are missing from one table or the other:
SELECT ID, a.cnt, b.cnt,
(COALESCE(a.cnt, 0) - COALESCE(b.cnt, 0)) as difference
FROM (SELECT UPPER(ID) as id, COUNT(*) AS cnt
FROM A
GROUP BY UPPER(ID)
) A FULL JOIN
(SELECT UPPER(ID) as id, COUNT(*) AS cnt
FROM B
GROUP BY UPPER(ID)
) B
USING (ID)
ORDER BY difference DESC;
Add
WHERE COALESCE(a.cnt, 0) <> COALESCE(b.cnt, 0)
before the ORDER BY clause if you only want ids where the counts are not the same.
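FULL JOIN is not available everywhere; SQLite, for instance, only added it in version 3.39. In that case the same result can be obtained by stacking both tables with UNION ALL and counting per source. This is a different technique from the full join above, shown here as a minimal sketch with Python's sqlite3 module.

The table names `table_a` and `table_b` are assumptions. The row with ID 4 in `table_b` is a hypothetical extra row, added only to show an ID that is missing from one side:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, color TEXT);
    CREATE TABLE table_b (id INTEGER, sale_date TEXT);
    INSERT INTO table_a VALUES (1, 'red'), (1, 'green'), (1, 'blue'),
                               (2, 'white'), (3, 'black'), (3, 'red');
    INSERT INTO table_b VALUES (1, '2020-10-01'), (1, '2020-01-10'),
                               (2, '2018-01-09'), (3, '2017-08-08'),
                               (4, '2019-05-05');  -- hypothetical: ID only in B
""")

# Tag every row with its source table, then count per id in a single pass.
# IDs present in only one table still get a row (with a 0 for the other side).
BASE = """
    SELECT id,
           SUM(CASE WHEN src = 'A' THEN 1 ELSE 0 END) AS table_a,
           SUM(CASE WHEN src = 'B' THEN 1 ELSE 0 END) AS table_b,
           SUM(CASE WHEN src = 'A' THEN 1 ELSE -1 END) AS difference
    FROM (SELECT id, 'A' AS src FROM table_a
          UNION ALL
          SELECT id, 'B' AS src FROM table_b) AS u
    GROUP BY id
"""

def count_differences(only_nonzero=False):
    sql = BASE
    if only_nonzero:
        # HAVING filters whole groups after aggregation.
        sql += " HAVING SUM(CASE WHEN src = 'A' THEN 1 ELSE -1 END) <> 0"
    return conn.execute(sql + " ORDER BY id").fetchall()

print(count_differences())
# [(1, 3, 2, 1), (2, 1, 1, 0), (3, 2, 1, 1), (4, 0, 1, -1)]
print(count_differences(only_nonzero=True))
# [(1, 3, 2, 1), (3, 2, 1, 1), (4, 0, 1, -1)]
```

Each table is scanned once and no join is needed, which may also matter at 17M rows. That is a plausible benefit, not a measured one.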

Comparing different columns in SQL for each row

After some transformation, I have the result of a cross join (of tables a and b) on which I want to do some analysis. The table looks like this:
+-----+------+------+------+------+-----+------+------+------+------+
| id | 10_1 | 10_2 | 11_1 | 11_2 | id | 10_1 | 10_2 | 11_1 | 11_2 |
+-----+------+------+------+------+-----+------+------+------+------+
| 111 | 1 | 0 | 1 | 0 | 222 | 1 | 0 | 1 | 0 |
| 111 | 1 | 0 | 1 | 0 | 333 | 0 | 0 | 0 | 0 |
| 111 | 1 | 0 | 1 | 0 | 444 | 1 | 0 | 1 | 1 |
| 112 | 0 | 1 | 1 | 0 | 222 | 1 | 0 | 1 | 0 |
+-----+------+------+------+------+-----+------+------+------+------+
The ids in the first column are different from the ids in the sixth column.
Each row always pairs two different IDs. The other columns always have either 0 or 1 as a value.
I am now trying to find out how many values two IDs have in common on average (meaning both have "1" in 10_1, 10_2, etc.), but I don't really know how to do so.
I was trying something like this as a start:
SELECT SUM(CASE WHEN a.10_1 = 1 AND b.10_1 = 1 then 1 end)
But this would obviously only count how often two ids have 10_1 in common. I could write something like this covering different columns:
SELECT SUM(CASE WHEN (a.10_1 = 1 AND b.10_1 = 1)
OR (a.10_2 = 1 AND b.10_2 = 1) OR [...] then 1 end)
This would count in general how often two IDs have one thing in common, but it would of course also count pairs that have two or more things in common. Plus, I would also like to know how often two IDs have two things, three things, etc. in common.
One "problem" in my case is also that I have around 30 columns I want to look at, so I can hardly write down every possible combination by hand.
Does anyone know how I can approach my problem in a better way?
Thanks in advance.
Edit:
A possible result could look like this:
+-----------+---------+
| in_common | count |
+-----------+---------+
| 0 | 100 |
| 1 | 500 |
| 2 | 1500 |
| 3 | 5000 |
| 4 | 3000 |
+-----------+---------+

With the codes as column names, you're going to have to write some code that explicitly references each column name. To keep that to a minimum, you could write those references in a single union statement that normalizes the data, such as:
select id, '10_1' as code from tbl where "10_1" = 1 -- tbl: your wide table (placeholder name)
union
select id, '10_2' from tbl where "10_2" = 1
union
select id, '11_1' from tbl where "11_1" = 1
union
select id, '11_2' from tbl where "11_2" = 1;
This needs to be modified to include whatever additional columns you need to link up different IDs. For the purpose of this illustration, I assume the following data model
create table p (
id integer not null primary key,
sex character(1) not null,
age integer not null
);
create table t1 (
id integer not null,
code character varying(4) not null,
constraint pk_t1 primary key (id, code)
);
Though your data evidently does not currently resemble this structure, normalizing your data into a form like this would allow you to apply the following solution to summarize your data in the desired form.
select
in_common,
count(*) as count
from (
select
count(*) as in_common
from (
select
a.id as a_id, a.code,
b.id as b_id, b.code
from
(select p.*, t1.code
from p left join t1 on p.id=t1.id
) as a
inner join (select p.*, t1.code
from p left join t1 on p.id=t1.id
) as b on b.sex <> a.sex and b.age between a.age-10 and a.age+10
where
a.id < b.id
and a.code = b.code
) as c
group by
a_id, b_id
) as summ
group by
in_common;
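The normalize-then-join idea can be sketched end to end with Python's sqlite3 module. All table names in this sketch are assumptions:

- `wide` holds the 0/1 rows from the question's cross join, stored once per ID.
- `pairs` holds the matched ID pairs.
- `codes` is the normalized table, produced by a UNION ALL that is generated from a list of column names. Adding the remaining ~30 columns only means extending that list.

Unlike the query above, this sketch uses LEFT JOINs, so pairs with nothing in common still show up with `in_common = 0`:

```python
import sqlite3

# Hypothetical list of code columns; extend it to all ~30 columns.
CODE_COLUMNS = ["10_1", "10_2", "11_1", "11_2"]

conn = sqlite3.connect(":memory:")
col_defs = ", ".join(f'"{c}" INTEGER' for c in CODE_COLUMNS)
conn.execute(f"CREATE TABLE wide (id INTEGER PRIMARY KEY, {col_defs})")
placeholders = ", ".join("?" * (len(CODE_COLUMNS) + 1))
conn.executemany(
    f"INSERT INTO wide VALUES ({placeholders})",
    [(111, 1, 0, 1, 0), (112, 0, 1, 1, 0), (222, 1, 0, 1, 0),
     (333, 0, 0, 0, 0), (444, 1, 0, 1, 1)],
)
conn.execute("CREATE TABLE pairs (a_id INTEGER, b_id INTEGER)")
conn.executemany("INSERT INTO pairs VALUES (?, ?)",
                 [(111, 222), (111, 333), (111, 444), (112, 222)])

# Normalize: one (id, code) row for every column that holds a 1.
union_sql = " UNION ALL ".join(
    f"SELECT id, '{c}' AS code FROM wide WHERE \"{c}\" = 1" for c in CODE_COLUMNS
)
conn.execute(f"CREATE TABLE codes AS {union_sql}")

# Count shared codes per pair (0 when none), then count pairs per total.
HISTOGRAM = """
    SELECT in_common, COUNT(*) AS count
    FROM (SELECT p.a_id, p.b_id, COUNT(cb.code) AS in_common
          FROM pairs p
          LEFT JOIN codes ca ON ca.id = p.a_id
          LEFT JOIN codes cb ON cb.id = p.b_id AND cb.code = ca.code
          GROUP BY p.a_id, p.b_id) AS per_pair
    GROUP BY in_common
    ORDER BY in_common
"""
print(conn.execute(HISTOGRAM).fetchall())
# [(0, 1), (1, 1), (2, 2)]
```

`COUNT(cb.code)` skips the NULLs produced when a code has no partner, so each pair is credited only with codes where both IDs have a 1.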

The proposed solution first takes a step back from the cross-join table, as the identical column names are super annoying. Instead, we take the ids from the two tables and put them in a CTE (idtable). The following query gets the result wanted in the question. It assumes table_a and table_b from the question are the same table, called tbl, but this assumption is not needed: tbl can be replaced by table_a and table_b in the two sub-SELECT queries. It looks complicated and uses a JSON trick (PostgreSQL's row_to_json and json_each) to flatten the columns, but it works here:
WITH idtable AS (
SELECT a.id as id_1, b.id as id_2 FROM
-- put cross join of table a and table b here
)
SELECT in_common,
count(*)
FROM
(SELECT idtable.*,
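sum(CASE
-- counts columns where both values are equal, including 0 = 0;
-- to count only shared 1s, test meltedL.value::text = '1' AND meltedR.value::text = '1'
WHEN meltedR.value::text=meltedL.value::text THEN 1
ELSE 0
END) AS in_common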
FROM idtable
JOIN
(SELECT tbl.id,
b.*
FROM tbl, -- change here to table_a
json_each(row_to_json(tbl)) b -- and here too
WHERE KEY<>'id' ) meltedL ON (idtable.id_1 = meltedL.id)
JOIN
(SELECT tbl.id,
b.*
FROM tbl, -- change here to table_b
json_each(row_to_json(tbl)) b -- and here too
WHERE KEY<>'id' ) meltedR ON (idtable.id_2 = meltedR.id
AND meltedL.key = meltedR.key)
GROUP BY idtable.id_1,
idtable.id_2) tt
GROUP BY in_common ORDER BY in_common;
The output here looks like this:
in_common | count
-----------+-------
2 | 2
3 | 1
4 | 1
(3 rows)

Why does the IN operator return distinct rows when passed duplicate values (value1, value1, ...)?

Using SQL Server 2008
Why does the IN operator return distinct values when selecting duplicate values?
Table #temp
x | 1 | 2 | 3
--+------------+-------------+------------
1 | first 1 | first 2 | first 3
2 | Second 1 | second 2 | second 3
When I execute this query
SELECT * FROM #temp WHERE x IN (1,1)
it will return
x | 1 | 2 | 3
--+------------+-------------+------------
1 | first 1 | first 2 | first 3
How can I make it so it returns this instead:
x | 1 | 2 | 3
--+------------+-------------+------------
1 | first 1 | first 2 | first 3
1 | first 1 | first 2 | first 3
What is the alternative of IN in this case?

If you want to return duplicates, you need to phrase the query as a join. IN simply tests a condition on each row. Whether the condition is met once or twice doesn't matter: the row either stays in or gets filtered out.
with xes as (
select 1 as x union all
select 1 as x
)
SELECT *
FROM #temp t join
xes
on t.x = xes.x;
EDIT:
If you have a subquery, then it is even simpler:
select *
from #temp t join
(<subquery>) s
on t.x = s.x
This would be a "normal" use of a join.
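Here is a minimal sketch of both behaviours using Python's sqlite3 module. A table named `temp_t` stands in for SQL Server's `#temp`, because SQLite has no `#` temp-table syntax; the name is an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temp_t (x INTEGER, "1" TEXT, "2" TEXT, "3" TEXT);
    INSERT INTO temp_t VALUES (1, 'first 1', 'first 2', 'first 3'),
                              (2, 'Second 1', 'second 2', 'second 3');
""")

# IN is a per-row filter: row x = 1 passes once, however often 1 is listed.
IN_QUERY = "SELECT * FROM temp_t WHERE x IN (1, 1)"

# A join emits one output row per matching pair, so duplicates are kept.
JOIN_QUERY = """
    WITH xes AS (SELECT 1 AS x UNION ALL SELECT 1 AS x)
    SELECT t.* FROM temp_t t JOIN xes ON t.x = xes.x
"""

print(conn.execute(IN_QUERY).fetchall())
# [(1, 'first 1', 'first 2', 'first 3')]
print(conn.execute(JOIN_QUERY).fetchall())
# [(1, 'first 1', 'first 2', 'first 3'), (1, 'first 1', 'first 2', 'first 3')]
```

The CTE must use UNION ALL. A plain UNION would remove the duplicate 1 before the join ever sees it.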

Return count(*) even if 0

I have the following query:
select bb.Name, COUNT(*) as Num from BOutcome bo
JOIN BOffers bb ON bo.ID = bb.BOutcomeID
WHERE bo.EventID = 123 AND bo.OfferTypeID = 321 AND bb.NumA > bb.NumB
GROUP BY bb.Name
The table looks like:
Name | Num A | Num B
A | 10 | 3
B | 2 | 3
C | 10 | 3
A | 9 | 3
B | 2 | 3
C | 9 | 3
The expected output should be:
Name | Count
A | 2
B | 0
C | 2
This is because for Names A and C, NumA is greater than NumB in both records, while for Name B, NumA is lower than NumB in both records.
My current output is:
Name | Count
A | 2
C | 2
Because B's count is 0, I am not getting it back from my query.
What is wrong with my query? How can I get it back?

Here is my guess. I think this is a much simpler approach than all of the left/right join hoops people have been spinning their wheels on. Since the output of the query relies only on columns from BOffers (bb), there is no need for an explicit join at all:
SELECT
bb.Name,
[Count] = SUM(CASE WHEN bb.NumA > bb.NumB THEN 1 ELSE 0 END)
-- just FYI, the above could also be written as:
-- [Count] = COUNT(CASE WHEN bb.NumA > bb.NumB THEN 1 END)
FROM dbo.BOffers AS bb
WHERE EXISTS
(
SELECT 1 FROM dbo.BOutcome
WHERE ID = bb.BOutcomeID
AND EventID = 123
AND OfferTypeID = 321
)
GROUP BY bb.Name;
Of course, we're not really sure that Name and NumA/NumB are all in BOffers, since the OP talks about two tables but only shows one table in the sample data. My guess is based on the query he says is "working" but is missing rows because of the explicit join.
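Below is a minimal sketch comparing the original query with this answer's conditional aggregation, using Python's sqlite3 module. It assumes that Name, NumA and NumB live in BOffers. The single BOutcome row (ID 1, EventID 123, OfferTypeID 321) is hypothetical. The T-SQL `[Count] = ...` alias and the `dbo.` prefix are written in standard form so that SQLite accepts them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BOutcome (ID INTEGER PRIMARY KEY, EventID INTEGER, OfferTypeID INTEGER);
    CREATE TABLE BOffers (BOutcomeID INTEGER, Name TEXT, NumA INTEGER, NumB INTEGER);
    INSERT INTO BOutcome VALUES (1, 123, 321);  -- hypothetical outcome row
    INSERT INTO BOffers VALUES (1, 'A', 10, 3), (1, 'B', 2, 3), (1, 'C', 10, 3),
                               (1, 'A', 9, 3), (1, 'B', 2, 3), (1, 'C', 9, 3);
""")

# Original query: NumA > NumB in WHERE removes every B row before grouping.
ORIGINAL = """
    SELECT bb.Name, COUNT(*) AS Num
    FROM BOutcome bo JOIN BOffers bb ON bo.ID = bb.BOutcomeID
    WHERE bo.EventID = 123 AND bo.OfferTypeID = 321 AND bb.NumA > bb.NumB
    GROUP BY bb.Name
    ORDER BY bb.Name
"""

# Answer's approach: keep all rows, move the comparison into the aggregate.
CONDITIONAL = """
    SELECT bb.Name, SUM(CASE WHEN bb.NumA > bb.NumB THEN 1 ELSE 0 END) AS Num
    FROM BOffers AS bb
    WHERE EXISTS (SELECT 1 FROM BOutcome
                  WHERE ID = bb.BOutcomeID AND EventID = 123 AND OfferTypeID = 321)
    GROUP BY bb.Name
    ORDER BY bb.Name
"""

print(conn.execute(ORIGINAL).fetchall())     # [('A', 2), ('C', 2)]
print(conn.execute(CONDITIONAL).fetchall())  # [('A', 2), ('B', 0), ('C', 2)]
```

A group can only appear in the output if at least one of its rows survives WHERE. Testing NumA > NumB inside the aggregate keeps B's rows alive long enough to be counted as 0.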

Another wild guess. Feel free to downvote:
SELECT ba.Name, COUNT(bb.BOutcomeID) as Num
FROM
( SELECT DISTINCT ba.Name
FROM
BOutcome AS b
JOIN
BOffers AS ba
ON ba.BOutcomeID = b.ID
WHERE b.EventID = 123
AND b.OfferTypeID = 321
) AS ba
LEFT JOIN
BOffers AS bb
ON bb.Name = ba.Name -- note: bb is not filtered by EventID/OfferTypeID here
AND bb.NumA > bb.NumB
GROUP BY ba.Name ;
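Below is a minimal sketch of this LEFT JOIN approach in Python's sqlite3 module. It uses the sample rows plus one hypothetical BOutcome row (ID 1, EventID 123, OfferTypeID 321). The syntax error after ON is removed, and the inner BOffers alias is renamed to `o` for readability. As in the answer, `bb` is not filtered by EventID/OfferTypeID, so with offers from other events the counts could differ:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BOutcome (ID INTEGER PRIMARY KEY, EventID INTEGER, OfferTypeID INTEGER);
    CREATE TABLE BOffers (BOutcomeID INTEGER, Name TEXT, NumA INTEGER, NumB INTEGER);
    INSERT INTO BOutcome VALUES (1, 123, 321);  -- hypothetical outcome row
    INSERT INTO BOffers VALUES (1, 'A', 10, 3), (1, 'B', 2, 3), (1, 'C', 10, 3),
                               (1, 'A', 9, 3), (1, 'B', 2, 3), (1, 'C', 9, 3);
""")

# The distinct names in scope form the preserved side of the LEFT JOIN.
# COUNT(bb.BOutcomeID) ignores the NULLs produced for names with no matching row.
LEFT_JOIN_COUNT = """
    SELECT ba.Name, COUNT(bb.BOutcomeID) AS Num
    FROM (SELECT DISTINCT o.Name
          FROM BOutcome AS b
          JOIN BOffers AS o ON o.BOutcomeID = b.ID
          WHERE b.EventID = 123 AND b.OfferTypeID = 321) AS ba
    LEFT JOIN BOffers AS bb
      ON bb.Name = ba.Name AND bb.NumA > bb.NumB
    GROUP BY ba.Name
    ORDER BY ba.Name
"""

print(conn.execute(LEFT_JOIN_COUNT).fetchall())  # [('A', 2), ('B', 0), ('C', 2)]
```

Counting a column from the optional side, rather than using COUNT(*), is what turns "no match" into 0 instead of 1.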
