Understanding SQL IN clause with NULL values [duplicate] - sql

Assume the following tables:
CREATE TABLE X (x_name VARCHAR(100));
CREATE TABLE Y (y_name VARCHAR(100));
INSERT INTO X VALUES ('blue');
INSERT INTO X VALUES ('red');
INSERT INTO Y VALUES ('blue');
Resulting in:
+---------+ +---------+
| Table X | | Table Y |
+---------+ +---------+
| x_name | | y_name |
+---------+ +---------+
| 'blue' | | 'blue' |
| 'red' | +---------+
+---------+
The results of the following queries are as expected:
SELECT * FROM X WHERE x_name IN (SELECT y_name FROM Y); will return one row | 'blue' |.
SELECT * FROM X WHERE x_name NOT IN (SELECT y_name FROM Y); will return one row | 'red' |.
Let's insert NULL into table Y:
INSERT INTO Y VALUES (NULL);
The first query will return the same result (blue). However, the second query from above will return no rows. Why is this?

Don't use not in with subqueries. Period. Use not exists; it does what you want:
select x.*
from x
where not exists (select 1 from y where y.y_name = x.x_name);
The problem is this. When you have:
x_name in ('a', 'b', null)
When x_name matches neither 'a' nor 'b', SQL actually returns NULL, not false. However, NULL is treated the same as false in where clauses (and when clauses, but not in check constraints). So, the row gets filtered out.
When you negate this, either as:
not x_name in ('a', 'b', null)
x_name not in ('a', 'b', null)
For a row with no match, the result is NOT NULL, which is also NULL; for a row with a match, it is false. Either way, everything gets filtered out.
Alas. The simplest solution in my opinion is to get in the habit of using not exists.
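To see the difference in action, here is a minimal sketch that replays the question's tables in an in-memory SQLite database through Python's sqlite3 module. SQLite is only an assumption made so the example is runnable; the NULL handling shown is standard SQL and should look the same in other engines.

```python
import sqlite3

# In-memory SQLite database; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (x_name VARCHAR(100));
    CREATE TABLE Y (y_name VARCHAR(100));
    INSERT INTO X VALUES ('blue'), ('red');
    INSERT INTO Y VALUES ('blue'), (NULL);
""")

# NOT IN: 'red' <> NULL is unknown, so the predicate is never true.
not_in_rows = conn.execute(
    "SELECT x_name FROM X WHERE x_name NOT IN (SELECT y_name FROM Y)"
).fetchall()

# NOT EXISTS: the correlated subquery simply finds no row equal to 'red'.
not_exists_rows = conn.execute(
    "SELECT x_name FROM X "
    "WHERE NOT EXISTS (SELECT 1 FROM Y WHERE Y.y_name = X.x_name)"
).fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [('red',)]
```

The NULL in Y silently empties the NOT IN result, while NOT EXISTS still returns 'red'.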

Related

HQL, insert two rows if a condition is met

I have the following table called table_persons in Hive:
+--------+------+------------+
| people | type | date |
+--------+------+------------+
| lisa | bot | 19-04-2022 |
| wayne | per | 19-04-2022 |
+--------+------+------------+
If type is "bot", I have to add two rows to the table d1_info; if type is "per", I only have to add one row, so the result is the following:
+---------+------+------------+
| db_type | info | date |
+---------+------+------------+
| x_bot | x | 19-04-2022 |
| x_bnt | x | 19-04-2022 |
| x_per | b | 19-04-2022 |
+---------+------+------------+
How can I add two rows if this condition is met?
with a Case When maybe?
You may try using a UNION ALL to duplicate the rows with bot. The following example unions a first query, which selects all records, with a second query that selects only those with bot.
Edit
In response to the edited question, I have added an additional flag column named original (storing 1 or 0) to differentiate each original row from its duplicate:
SELECT
p1.*,
1 as original
FROM
table_persons p1
UNION ALL
SELECT
p1.*,
0 as original
FROM
table_persons p1
WHERE p1.type='bot'
You may then insert this into your other table d1_info, using the above query as a subquery or CTE and applying the desired transformations with CASE expressions, e.g.
INSERT INTO d1_info
(`db_type`, `info`, `date`)
WITH merged_data AS (
SELECT
p1.*,
1 as original
FROM
table_persons p1
UNION ALL
SELECT
p1.*,
0 as original
FROM
table_persons p1
WHERE p1.type='bot'
)
SELECT
CONCAT('x_',CASE
WHEN m1.type='per' THEN m1.type
WHEN m1.original=1 AND m1.type='bot' THEN m1.type
ELSE 'bnt'
END) as db_type,
CASE
WHEN m1.type='per' THEN 'b'
ELSE 'x'
END as info,
m1.date
FROM
merged_data m1
ORDER BY m1.people,m1.date;
See working demo db fiddle here
I think what you want is to create a new table that captures your logic. This would simplify your query and make it so you could easily add new types without having to edit logic of a case statement. It may also make it cleaner to view your logic later.
CREATE TABLE table_persons (
`people` VARCHAR(5),
`type` VARCHAR(3),
`date` VARCHAR(10)
);
INSERT INTO table_persons
VALUES
('lisa', 'bot', '19-04-2022'),
('wayne', 'per', '19-04-2022');
CREATE TABLE info (
`type` VARCHAR(5),
`db_type` VARCHAR(5),
`info` VARCHAR(1)
);
insert into info
values
('bot', 'x_bot', 'x'),
('bot', 'x_bnt', 'x'),
('per','x_per','b');
and then you can easily do a join:
select
info.db_type,
info.info,
persons.date date
from
table_persons persons inner join info
on
info.type = persons.type
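As a quick check of the lookup-table idea, the sketch below loads the same rows into an in-memory SQLite database via Python's sqlite3 module and runs the join. SQLite stands in for Hive here purely as an assumption to keep the example runnable, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_persons (people TEXT, type TEXT, date TEXT);
    INSERT INTO table_persons VALUES
        ('lisa', 'bot', '19-04-2022'),
        ('wayne', 'per', '19-04-2022');

    -- Lookup table: one row per output row wanted for each type.
    CREATE TABLE info (type TEXT, db_type TEXT, info TEXT);
    INSERT INTO info VALUES
        ('bot', 'x_bot', 'x'),
        ('bot', 'x_bnt', 'x'),
        ('per', 'x_per', 'b');
""")

# 'bot' matches two lookup rows, 'per' matches one.
rows = conn.execute("""
    SELECT info.db_type, info.info, persons.date
    FROM table_persons AS persons
    INNER JOIN info ON info.type = persons.type
    ORDER BY info.db_type
""").fetchall()

for row in rows:
    print(row)
# ('x_bnt', 'x', '19-04-2022')
# ('x_bot', 'x', '19-04-2022')
# ('x_per', 'b', '19-04-2022')
```

Adding a new type then only means inserting rows into info; the query itself stays unchanged.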

SQL check if exists and insert [duplicate]

I'm using a table 'Customer' with the following schema
id INTEGER NOT NULL UNIQUE,
name TEXT NOT NULL,
auth BOOLEAN DEFAULT FALSE
Now, I want to add a record if it does not exist. I can do the following:
IF NOT EXISTS (SELECT name from Customer where id=220)
BEGIN
INSERT into Customer (name,id) values ('Jon', 220)
END;
But at the same time, I also want to know if the id really did not exist along with the insertion i.e. True/False result of the select query. I can split it into two queries, from the first I can know if it exists and if id did not then I can insert it. But how can I do this in a single query?
You need to use INSERT with the RETURNING clause (PostgreSQL INSERT).
'on conflict' clause used with INSERT can be customized to serve your purpose.
INSERT INTO <table_name>(<column_name_list>) values(<column_values>) ON CONFLICT(<constraint_column>) DO NOTHING;
ref: https://www.postgresqltutorial.com/postgresql-upsert/
Set up
Step 1: Create the table:
create table test
(
id INTEGER NOT NULL UNIQUE,
name TEXT NOT NULL,
auth BOOLEAN DEFAULT FALSE
);
Step 2: Load the table with some sample rows:
insert into test(id,name) values(1,'vincent'),(2,'gabriel'),(3,'sebastian');
Step 3: Test with an INSERT of a row with an existing id, i.e. 1. The insert does not go through because the id already exists:
INSERT INTO test(id,name) values(1,'xavier') ON CONFLICT(id) DO NOTHING;
Step 4: Now test with a row whose id does not exist, i.e. 4. It goes through.
INSERT INTO test(id,name) values(4,'xavier') ON CONFLICT(id) DO NOTHING;
Demo:
postgres=# select * from test;
id | name | auth
----+-----------+------
1 | vincent | f
2 | gabriel | f
3 | sebastian | f
(3 rows)
postgres=# INSERT INTO test(id,name) values(1,'xavier') ON CONFLICT(id) DO NOTHING;
INSERT 0 0
postgres=#
postgres=# select * from test;
id | name | auth
----+-----------+------
1 | vincent | f
2 | gabriel | f
3 | sebastian | f
(3 rows)
--- NOTE: no row inserted as ID 1 already exists.
postgres=# INSERT INTO test(id,name) values(4,'xavier') ON CONFLICT(id) DO NOTHING;
INSERT 0 1
postgres=# select * from test;
id | name | auth
----+-----------+------
1 | vincent | f
2 | gabriel | f
3 | sebastian | f
4 | xavier | f -------> new row inserted.
(4 rows)
You can use the following:
INSERT into Customer (name, id) SELECT 'Jon', 220
Where Not EXISTS (SELECT 1
from Customer
where id=220);
Select Cast(@@ROWCOUNT as bit);
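The "insert and learn whether it happened" idea can also be sketched from application code. The example below assumes SQLite (3.24 or later, for ON CONFLICT ... DO NOTHING) through Python's sqlite3 module; the helper name insert_if_absent is hypothetical and not part of any library. It relies on the cursor's rowcount, which plays the same role as the affected-row count in the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer ("
    " id INTEGER NOT NULL UNIQUE,"
    " name TEXT NOT NULL,"
    " auth BOOLEAN DEFAULT FALSE)"
)

def insert_if_absent(conn, customer_id, name):
    """Hypothetical helper: insert unless the id exists.

    Returns True if a row was inserted, False if the id was already there.
    """
    cur = conn.execute(
        "INSERT INTO Customer (id, name) VALUES (?, ?) "
        "ON CONFLICT (id) DO NOTHING",
        (customer_id, name),
    )
    # rowcount is the number of rows the statement actually inserted.
    return cur.rowcount == 1

first = insert_if_absent(conn, 220, "Jon")   # id 220 is new
second = insert_if_absent(conn, 220, "Jon")  # id 220 now exists
print(first, second)  # True False
```

Because the existence check and the insert are a single statement, there is no window between a separate SELECT and INSERT in which another session could add the same id.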

Use IN to compare Array of Values against a table of data

I want to compare an array of values against the rows of a table and return only the rows in which the data are different.
Suppose I have myTable:
| ItemCode | ItemName | FrgnName |
|----------|----------|----------|
| CD1 | Apple | Mela |
| CD2 | Mirror | Specchio |
| CD3 | Bag | Borsa |
Now using the SQL instruction IN I would like to compare the rows above against an array of values pasted from an excel file and so in theory I would have to write something like:
WHERE NOT IN (
ARRAY[CD1, Apple, Mella],
ARRAY[CD2, Miror, Specchio],
ARRAY[CD3, Bag, Borsa]
)
The query should return rows 1 and 2, because "Mella" and "Miror" are in fact typos.
You could use a VALUES expression to emulate a table of those arrays, like so:
... myTable AS t
LEFT JOIN (
VALUES (1, 'CD1','Apple','Mella')
, (1, 'CD2', 'Miror', 'Specchio')
, (1, 'CD3', 'Bag', 'Borsa')
) AS v(rowPresence, a, b, c)
ON t.ItemCode = v.a AND t.ItemName = v.b AND t.FrgnName = v.c
WHERE v.rowPresence IS NULL
Technically, in your scenario, you can do without the "rowPresence" field I added: since none of the values in your arrays are NULL, any of them would do. I basically added it to point to a more general case.
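A runnable sketch of this anti-join, assuming SQLite through Python's sqlite3 module. SQLite does not accept a column list after a derived-table alias, so the VALUES list is moved into a CTE named v here; that rewrite is an adaptation for this example, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (ItemCode TEXT, ItemName TEXT, FrgnName TEXT);
    INSERT INTO myTable VALUES
        ('CD1', 'Apple', 'Mela'),
        ('CD2', 'Mirror', 'Specchio'),
        ('CD3', 'Bag', 'Borsa');
""")

# Rows of myTable with no exact counterpart in the pasted values.
mismatched = conn.execute("""
    WITH v(rowPresence, a, b, c) AS (
        VALUES (1, 'CD1', 'Apple', 'Mella'),
               (1, 'CD2', 'Miror', 'Specchio'),
               (1, 'CD3', 'Bag', 'Borsa')
    )
    SELECT t.ItemCode, t.ItemName, t.FrgnName
    FROM myTable AS t
    LEFT JOIN v
      ON t.ItemCode = v.a AND t.ItemName = v.b AND t.FrgnName = v.c
    WHERE v.rowPresence IS NULL
    ORDER BY t.ItemCode
""").fetchall()

print(mismatched)
# [('CD1', 'Apple', 'Mela'), ('CD2', 'Mirror', 'Specchio')]
```

Only CD3 matches on all three columns, so rows CD1 and CD2 come back as the ones that differ.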

Why SQL comparison with <> doesn't return the row with NULL value [duplicate]

Assume a table X like so:
A | B
----------------
2 pqr
3 xyz
*NULL* abc
When I execute a query like:
SELECT *
FROM X
WHERE A <> 2
I expect a result set like this:
A | B
----------------
3 xyz
*NULL* abc
But to my surprise, I get a result set like this :
A | B
----------------
3 xyz
Why does the row with NULL value not appear in the result set?
Can somebody explain this behavior ?
The ANSI-92 SQL Standard states that if one of the operands is NULL, the result of the comparison is "UNKNOWN" - not true or false.
For a good look at how NULLs work in SQL, see 4 Simple Rules for Handling SQL NULLs
SQL Fiddle
MS SQL Server 2008 Schema Setup:
CREATE TABLE X
([A] int, [B] varchar(3))
;
INSERT INTO X
([A], [B])
VALUES
(2, 'pqr'),
(3, 'xyz'),
(NULL, 'abc')
;
Query 1:
SELECT *
FROM X
WHERE A IS NULL OR A <> 2
Results:
| A | B |
|--------|-----|
| 3 | xyz |
| (null) | abc |
Because null <> 2 returns unknown in three-valued logic, and a predicate treats unknown as false.
NULL is not compared using = or <>. Check this NULL COMPARISON article. NULLS are compared as IS NULL or IS NOT NULL.
You need to handle NULL explicitly: if any operand is NULL, the result of a comparison is NULL.
If you try something like this:
SELECT * FROM X WHERE isnull(A,'NULL') <> 2
you may get the results you are expecting (if A is a varchar field) ...
If A is a number field, you could try this:
SELECT * FROM X WHERE isnull(A,0) <> 2
If you are a visual learner, run this query:
SELECT
a,
b,
CASE
WHEN a <> b THEN 'a <> b'
WHEN a = b THEN 'a = b'
ELSE 'neither'
END
FROM
(VALUES (0),(1),(NULL)) a(a),
(VALUES (0),(1),(NULL)) b(b)
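The same behaviour can be reproduced outside SQL Server. The sketch below assumes an in-memory SQLite database driven from Python's sqlite3 module and uses the question's table X; it shows the raw result of the comparison and the effect of adding an explicit IS NULL test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (A INTEGER, B TEXT);
    INSERT INTO X VALUES (2, 'pqr'), (3, 'xyz'), (NULL, 'abc');
""")

# The comparison itself yields NULL (None in Python), not true or false.
comparison = conn.execute("SELECT NULL <> 2").fetchone()[0]

# WHERE keeps only rows whose predicate is true, so the NULL row is dropped.
plain = conn.execute(
    "SELECT A, B FROM X WHERE A <> 2 ORDER BY B"
).fetchall()

# Testing for NULL explicitly brings the row back.
with_null = conn.execute(
    "SELECT A, B FROM X WHERE A IS NULL OR A <> 2 ORDER BY B"
).fetchall()

print(comparison)  # None
print(plain)       # [(3, 'xyz')]
print(with_null)   # [(None, 'abc'), (3, 'xyz')]
```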

In SQL, what's the difference between count(column) and count(*)?

I have the following query:
select column_name, count(column_name)
from table
group by column_name
having count(column_name) > 1;
What would be the difference if I replaced all calls to count(column_name) with count(*)?
This question was inspired by How do I find duplicate values in a table in Oracle?.
To clarify the accepted answer (and maybe my question), replacing count(column_name) with count(*) would return an extra row in the result that contains a null and the count of null values in the column.
count(*) counts NULLs and count(column) does not
[edit] added this code so that people can run it
create table #bla(id int,id2 int)
insert #bla values(null,null)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,1)
insert #bla values(1,null)
insert #bla values(null,null)
select count(*),count(id),count(id2)
from #bla
results
7 3 2
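For readers without SQL Server at hand, here is the same experiment as a sketch against an in-memory SQLite database via Python's sqlite3 module. The engine and the plain table name bla (instead of the temporary table #bla) are assumptions made for portability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bla (id INTEGER, id2 INTEGER);
    INSERT INTO bla VALUES
        (NULL, NULL), (1, NULL), (NULL, 1), (1, NULL),
        (NULL, 1), (1, NULL), (NULL, NULL);
""")

# COUNT(*) counts rows; COUNT(col) counts rows where col is not NULL.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(id), COUNT(id2) FROM bla"
).fetchone()

print(counts)  # (7, 3, 2)
```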
Another minor difference, between using * and a specific column, is that in the column case you can add the keyword DISTINCT, and restrict the count to distinct values:
select column_a, count(distinct column_b)
from table
group by column_a
having count(distinct column_b) > 1;
A further and perhaps subtle difference is that in some database implementations the count(*) is computed by looking at the indexes on the table in question rather than the actual data rows. Since no specific column is specified, there is no need to bother with the actual rows and their values (as there would be if you counted a specific column). Allowing the database to use the index data can be significantly faster than making it count "real" rows.
The explanation in the docs helps to explain this:
COUNT(*) returns the number of items in a group, including NULL values and duplicates.
COUNT(expression) evaluates expression for each row in a group and returns the number of nonnull values.
So count(*) includes nulls, the other method doesn't.
We can use the Stack Exchange Data Explorer to illustrate the difference with a simple query. The Users table in Stack Overflow's database has columns that are often left blank, like the user's Website URL.
-- count(column_name) vs. count(*)
-- Illustrates the difference between counting a column
-- that can hold null values, a 'not null' column, and count(*)
select count(WebsiteUrl), count(Id), count(*) from Users
If you run the query above in the Data Explorer, you'll see that the count is the same for count(Id) and count(*) because the Id column doesn't allow null values. The WebsiteUrl count is much lower, though, because that column allows null.
COUNT(*) tells SQL Server to count all the rows in a table, including rows that contain NULLs.
COUNT(column_name) counts only the rows that have a non-null value in that column.
Please see following code for test executions SQL Server 2008:
-- Table variable
DECLARE @Table TABLE
(
CustomerId int NULL
, Name nvarchar(50) NULL
)
-- Insert some records for tests
INSERT INTO @Table VALUES( NULL, 'Pedro')
INSERT INTO @Table VALUES( 1, 'Juan')
INSERT INTO @Table VALUES( 2, 'Pablo')
INSERT INTO @Table VALUES( 3, 'Marcelo')
INSERT INTO @Table VALUES( NULL, 'Leonardo')
INSERT INTO @Table VALUES( 4, 'Ignacio')
-- Count all the rows by indicating *
SELECT COUNT(*) AS 'AllRowsCount'
FROM @Table
-- Count only rows with a CustomerId (excludes NULLs)
SELECT COUNT(CustomerId) AS 'OnlyNotNullCounts'
FROM @Table
COUNT(*) – Returns the total number of records in a table (Including NULL valued records).
COUNT(Column Name) – Returns the total number of Non-NULL records. It means that, it ignores counting NULL valued records in that particular column.
Basically, COUNT(*) counts all the rows in a table whereas COUNT(COLUMN_NAME) does not; it excludes null values, as everyone else here has also answered.
The more interesting part concerns performance: prefer COUNT(*) over COUNT(COLUMN_NAME) unless you are doing multiple counts or a complex query. Counting a specific column can noticeably slow things down when dealing with a huge amount of data.
Further elaborating upon the answers given by @SQLMenace and @Brannon, here is an example that uses the GROUP BY clause, which the OP mentioned but which does not appear in the answer by @SQLMenace:
CREATE TABLE table1 (
id INT
);
INSERT INTO table1 VALUES
(1),
(2),
(NULL),
(2),
(NULL),
(3),
(1),
(4),
(NULL),
(2);
SELECT * FROM table1;
+------+
| id |
+------+
| 1 |
| 2 |
| NULL |
| 2 |
| NULL |
| 3 |
| 1 |
| 4 |
| NULL |
| 2 |
+------+
10 rows in set (0.00 sec)
SELECT id, COUNT(*) FROM table1 GROUP BY id;
+------+----------+
| id | COUNT(*) |
+------+----------+
| 1 | 2 |
| 2 | 3 |
| NULL | 3 |
| 3 | 1 |
| 4 | 1 |
+------+----------+
5 rows in set (0.00 sec)
Here, COUNT(*) counts the number of occurrences of each type of id including NULL
SELECT id, COUNT(id) FROM table1 GROUP BY id;
+------+-----------+
| id | COUNT(id) |
+------+-----------+
| 1 | 2 |
| 2 | 3 |
| NULL | 0 |
| 3 | 1 |
| 4 | 1 |
+------+-----------+
5 rows in set (0.00 sec)
Here, COUNT(id) counts the number of occurrences of each type of id but does not count the number of occurrences of NULL
SELECT id, COUNT(DISTINCT id) FROM table1 GROUP BY id;
+------+--------------------+
| id | COUNT(DISTINCT id) |
+------+--------------------+
| NULL | 0 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
+------+--------------------+
5 rows in set (0.00 sec)
Here, COUNT(DISTINCT id) counts the number of occurrences of each type of id only once (does not count duplicates) and also does not count the number of occurrences of NULL
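Tying this back to the original query with HAVING, the sketch below runs both variants against the same table1 data. It assumes an in-memory SQLite database via Python's sqlite3 module; the ORDER BY is added only so the output order is deterministic (SQLite sorts NULL first).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INT);
    INSERT INTO table1 VALUES
        (1), (2), (NULL), (2), (NULL), (3), (1), (4), (NULL), (2);
""")

# COUNT(*) counts the three NULL rows, so the NULL group passes HAVING.
with_star = conn.execute(
    "SELECT id, COUNT(*) FROM table1 "
    "GROUP BY id HAVING COUNT(*) > 1 ORDER BY id"
).fetchall()

# COUNT(id) is 0 for the NULL group, so that group is filtered out.
with_col = conn.execute(
    "SELECT id, COUNT(id) FROM table1 "
    "GROUP BY id HAVING COUNT(id) > 1 ORDER BY id"
).fetchall()

print(with_star)  # [(None, 3), (1, 2), (2, 3)]
print(with_col)   # [(1, 2), (2, 3)]
```

This is the extra NULL row the question's clarification describes when count(column_name) is replaced with count(*).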
Another option is
Count(1) in place of the column name or *
to count the number of rows in a table. It is often claimed to be faster because the database never has to check whether a column name exists in the table, but the major engines generally treat COUNT(1) the same as COUNT(*).
If you want to count the values in a particular column rather than the rows, you have to name that column.
As mentioned in the previous answers, Count(*) counts a row even when the column is NULL, whereas count(Columnname) counts a row only if the column has a value.
It's always best practice to avoid * (Select *, count *, …)