Assume I have a Hive table that looks like this:
|ID |CODE |AMT |NEW AMT|
|---|---------------|-----|-------|
|1 |['a','b',,,] |10 | 50 |
|2 |[,,,'a','b'] |20 | 70 |
|3 |[,'c','d','e',]|30 | 20 |
|4 |['p','q',,,] |40 | 20 |
The code column is of an array datatype. It can hold 5 values, which are populated by an ETL job. These values are comma separated.
I need to find the aggregated value of the AMT column, applying the following conditions:
If the code has the values 'a' and 'b', the amount for that ID should be zero.
If the code has the values 'c', 'd' and 'e', the amount should be replaced with the value in NEW AMT.
If it matches neither of the above conditions, the amount should stay the same as in AMT.
After this, the sum of the amounts can be taken. So with the table given above, sum(amt) should be 60.
I have been struggling with this as I am new to HQL/SQL.
I have tried summing with a CASE statement but failed.
Thank you for any input you may have!
"The code column is of an array datatype."
Use the array_contains() function with CASE expressions:
select t.id, t.code,
case when array_contains(t.code, 'a') and array_contains(t.code, 'b') then 0
when array_contains(t.code, 'c') and array_contains(t.code, 'd') and array_contains(t.code, 'e') then t.new_amt
else t.amt
end AMT
from table_name t
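The question asks for a total, so the CASE expression can be wrapped in sum(...) to return a single value. As a quick sanity check of the branch order outside Hive, here is a minimal Python sketch; it assumes Python lists stand in for Hive arrays, the `in` operator stands in for array_contains(), and the empty array slots are simply dropped. The helper name `effective_amt` is hypothetical.

```python
# Sketch of the CASE logic; lists stand in for Hive arrays (an assumption),
# and `x in code` stands in for array_contains(code, x).
rows = [
    # (id, code, amt, new_amt) from the question, empty slots dropped
    (1, ["a", "b"], 10, 50),
    (2, ["a", "b"], 20, 70),
    (3, ["c", "d", "e"], 30, 20),
    (4, ["p", "q"], 40, 20),
]


def effective_amt(code, amt, new_amt):
    """Mirror the CASE expression: the first matching branch wins."""
    if "a" in code and "b" in code:
        return 0
    if "c" in code and "d" in code and "e" in code:
        return new_amt
    return amt


per_row = {rid: effective_amt(code, amt, new) for rid, code, amt, new in rows}
total = sum(per_row.values())
print(per_row)  # {1: 0, 2: 0, 3: 20, 4: 40}
print(total)    # 60
```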
Just use IF/ELSE or CASE WHEN.
Let's create a table (MySQL) with the sample data you provided:
CREATE TABLE `table1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`code` longtext NOT NULL,
`amt` int(11) NOT NULL,
`new_amt` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `table1` (`id`, `code`, `amt`, `new_amt`) VALUES
(1, '[\'a\',\'b\',,,]', 10, 50),
(2, '[,,,\'a\',\'b\']', 20, 70),
(3, '[,\'c\',\'d\',\'e\',]', 30, 20),
(4, '[\'p\',\'q\',,,]', 40, 20);
Here is what the table looks like (SELECT * FROM table1):
|id |code           |amt|new_amt|
|---|---------------|---|-------|
|1  |['a','b',,,]   |10 |50     |
|2  |[,,,'a','b']   |20 |70     |
|3  |[,'c','d','e',]|30 |20     |
|4  |['p','q',,,]   |40 |20     |
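Now use IF() to decide the value. Rows that match neither pattern keep their original amt:
SELECT
`id`,
`code`,
IF(
`code` LIKE "%a','b%",
0,
IF(
`code` LIKE "%c','d','e%",
`new_amt`,
`amt`
)
) AS price
FROM
`table1`;
Result:
|id |code           |price|
|---|---------------|-----|
|1  |['a','b',,,]   |0    |
|2  |[,,,'a','b']   |0    |
|3  |[,'c','d','e',]|20   |
|4  |['p','q',,,]   |40   |
Wrap the outer IF() in SUM() (SELECT SUM(IF(...)) AS total FROM table1) to get the total, which is 60 for this data.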
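The IF() logic can be replayed with Python's built-in sqlite3 module. This is a sketch in SQLite, not MySQL: a standard CASE expression is used in place of MySQL's IF() for portability, while the LIKE patterns are the same. Rows matching neither pattern keep amt, which is what the question asks for.

```python
import sqlite3

# SQLite stand-in for the MySQL table (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, code TEXT, amt INT, new_amt INT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, "['a','b',,,]", 10, 50),
        (2, "[,,,'a','b']", 20, 70),
        (3, "[,'c','d','e',]", 30, 20),
        (4, "['p','q',,,]", 40, 20),
    ],
)

# Patterns are bound as parameters to avoid escaping the embedded quotes.
query = """
    SELECT id,
           CASE WHEN code LIKE ? THEN 0
                WHEN code LIKE ? THEN new_amt
                ELSE amt
           END AS price
    FROM table1
    ORDER BY id
"""
patterns = ("%a','b%", "%c','d','e%")
prices = conn.execute(query, patterns).fetchall()
total = conn.execute(f"SELECT SUM(price) FROM ({query})", patterns).fetchone()[0]
print(prices)  # [(1, 0), (2, 0), (3, 20), (4, 40)]
print(total)   # 60
```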
Assume the following tables:
CREATE TABLE X (x_name VARCHAR(100));
CREATE TABLE Y (y_name VARCHAR(100));
INSERT INTO X VALUES ('blue');
INSERT INTO X VALUES ('red');
INSERT INTO Y VALUES ('blue');
Resulting in:
+---------+ +---------+
| Table X | | Table Y |
+---------+ +---------+
| x_name | | y_name |
+---------+ +---------+
| 'blue' | | 'blue' |
| 'red' | +---------+
+---------+
The results of the following queries are as expected:
SELECT * FROM X WHERE x_name IN (SELECT y_name FROM Y); will return one row | 'blue' |.
SELECT * FROM X WHERE x_name NOT IN (SELECT y_name FROM Y); will return one row | 'red' |.
Let's insert NULL into table Y:
INSERT INTO Y VALUES (NULL);
The first query will return the same result (blue). However, the second query from above will return no rows. Why is this?
Don't use not in with subqueries. Period. Use not exists; it does what you want:
select x.*
from x
where not exists (select 1 from y where y.y_name = x.x_name);
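The difference is easy to reproduce with Python's built-in sqlite3 module, used here only as a convenient stand-in engine; SQLite applies the same three-valued logic to NOT IN when the subquery returns a NULL:

```python
import sqlite3

# Recreate the question's tables, including the NULL row in y.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (x_name VARCHAR(100))")
conn.execute("CREATE TABLE y (y_name VARCHAR(100))")
conn.executemany("INSERT INTO x VALUES (?)", [("blue",), ("red",)])
conn.execute("INSERT INTO y VALUES ('blue')")
conn.execute("INSERT INTO y VALUES (NULL)")  # the row that breaks NOT IN

not_in = conn.execute(
    "SELECT x_name FROM x WHERE x_name NOT IN (SELECT y_name FROM y)"
).fetchall()
not_exists = conn.execute(
    "SELECT x_name FROM x WHERE NOT EXISTS "
    "(SELECT 1 FROM y WHERE y.y_name = x.x_name)"
).fetchall()
print(not_in)      # []
print(not_exists)  # [('red',)]
```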
The problem is this. When you have:
x_name in ('a', 'b', null)
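SQL actually returns NULL, not false, when x_name matches none of the non-NULL values, because comparing x_name with NULL yields unknown (NULL). NULL is treated the same as false in WHERE clauses (and in WHEN clauses, but not in CHECK constraints). So, the row gets filtered out.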
When you negate this, either as:
not x_name in ('a', 'b', null)
x_name not in ('a', 'b', null)
The result is NOT NULL, which is also NULL, so everything gets filtered out.
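The same three-valued logic can be observed on literal lists. This sketch evaluates each expression in an in-memory SQLite database (an assumption for convenience; SQLite reports TRUE as 1 and NULL comes back as Python's None), and the helper `evaluate` is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(expr):
    """Return the value of a scalar SQL expression; NULL comes back as None."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]


print(evaluate("'a' IN ('a', 'b', NULL)"))        # 1    (match found -> TRUE)
print(evaluate("'x' IN ('a', 'b', NULL)"))        # None (no match, NULL present -> NULL)
print(evaluate("'x' NOT IN ('a', 'b', NULL)"))    # None (NOT NULL is still NULL)
print(evaluate("NOT ('x' IN ('a', 'b', NULL))"))  # None
print(evaluate("'x' NOT IN ('a', 'b')"))          # 1    (no NULL in the list -> TRUE)
```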
Alas. The simplest solution in my opinion is to get in the habit of using not exists.
I wonder why the BINARY_CHECKSUM function returns a different result for the same input:
SELECT *, BINARY_CHECKSUM(a,b) AS bc
FROM (VALUES(1, NULL, 100),
(2, NULL, NULL),
(3, 1, 2)) s(id,a,b);
SELECT *, BINARY_CHECKSUM(a,b) AS bc
FROM (VALUES(1, NULL, 100),
(2, NULL, NULL)) s(id,a,b);
Output:
+-----+----+------+-------------+
| id | a | b | bc |
+-----+----+------+-------------+
| 1 | | 100 | -109 |
| 2 | | | -2147483640 |
| 3 | 1 | 2 | 18 |
+-----+----+------+-------------+
-- -109 vs 100
+-----+----+------+------------+
| id | a | b | bc |
+-----+----+------+------------+
| 1 | | 100 | 100 |
| 2 | | | 2147483647 |
+-----+----+------+------------+
For a second sample I get what I would expect:
SELECT *, BINARY_CHECKSUM(a,b) AS bc
FROM (VALUES(1, 1, 100),
(2, 3, 4),
(3,1,1)) s(id,a,b);
SELECT *, BINARY_CHECKSUM(a,b) AS bc
FROM (VALUES(1, 1, 100),
(2, 3, 4)) s(id,a,b);
Output for the first two rows (identical for both queries):
+-----+----+------+-----+
| id | a | b | bc |
+-----+----+------+-----+
| 1 | 1 | 100 | 116 |
| 2 | 3 | 4 | 52 |
+-----+----+------+-----+
db<>fiddle demo
It has strange consequences when I want to compare two tables/queries:
WITH t AS (
SELECT 1 AS id, NULL AS a, 100 b
UNION ALL SELECT 2, NULL, NULL
UNION ALL SELECT 3, 1, 2 -- comment this out
), s AS (
SELECT 1 AS id ,100 AS a, NULL as b
UNION ALL SELECT 2, NULL, NULL
UNION ALL SELECT 3, 2, 1 -- comment this out
)
SELECT t.*,s.*
,BINARY_CHECKSUM(t.a, t.b) AS bc_t, BINARY_CHECKSUM(s.a, s.b) AS bc_s
FROM t
JOIN s
ON s.id = t.id
WHERE BINARY_CHECKSUM(t.a, t.b) = BINARY_CHECKSUM(s.a, s.b);
db<>fiddle demo2
With 3 rows I get a single result:
+-----+----+----+-----+----+----+--------------+-------------+
| id | a | b | id | a | b | bc_t | bc_s |
+-----+----+----+-----+----+----+--------------+-------------+
| 2 | | | 2 | | | -2147483640 | -2147483640 |
+-----+----+----+-----+----+----+--------------+-------------+
but with 2 rows I also get id = 1:
+-----+----+------+-----+------+----+-------------+------------+
| id | a | b | id | a | b | bc_t | bc_s |
+-----+----+------+-----+------+----+-------------+------------+
| 1 | | 100 | 1 | 100 | | 100 | 100 |
| 2 | | | 2 | | | 2147483647 | 2147483647 |
+-----+----+------+-----+------+----+-------------+------------+
Remarks:
I am not looking for alternatives like HASHBYTES/MD5/CHECKSUM.
I am aware that BINARY_CHECKSUM can lead to collisions (two different inputs producing the same output), but the scenario here is a bit different.
The documentation says: "For this definition, we say that null values, of a specified type, compare as equal values. If at least one of the values in the expression list changes, the expression checksum can also change. However, this is not guaranteed. Therefore, to detect whether values have changed, we recommend use of BINARY_CHECKSUM only if your application can tolerate an occasional missed change."
It is strange for me that hash function returns different result for the same input arguments.
Is this behaviour by design or it is some kind of glitch?
EDIT:
As @scsimon points out, it works for materialized tables but not for the CTE.
db<>fiddle actual table
Metadata for cte:
SELECT name, system_type_name
FROM sys.dm_exec_describe_first_result_set('
SELECT *
FROM (VALUES(1, NULL, 100),
(2, NULL, NULL),
(3, 1, 2)) s(id,a,b)', NULL,0);
SELECT name, system_type_name
FROM sys.dm_exec_describe_first_result_set('
SELECT *
FROM (VALUES(1, NULL, 100),
(2, NULL, NULL)) s(id,a,b)', NULL,0)
-- working workaround
SELECT name, system_type_name
FROM sys.dm_exec_describe_first_result_set('
SELECT *
FROM (VALUES(1, cast(NULL as int), 100),
(2, NULL, NULL)) s(id,a,b)', NULL,0)
For all cases all columns are INT but with explicit CAST it behaves as it should.
db<>fiddle metadata
This has nothing to do with the number of rows. It is because the values in one of the columns of the 2-row version are always NULL. The default type of NULL is int and the default type of a numeric constant (of this length) is int, so these should be comparable. But from a values() derived table, these are (apparently) not exactly the same type.
In particular, a column with only typeless NULLs from a derived table is not comparable, so it is excluded from the binary checksum calculation. This does not occur in a real table, because all columns have types.
The rest of the answer illustrates what is happening.
The code behaves as expected with type conversions:
SELECT *, BINARY_CHECKSUM(a, b) AS bc
FROM (VALUES(1, cast(NULL as int), 100),
(2, NULL, NULL)
) s(id,a,b);
Here is a db<>fiddle.
Actually creating tables with the values suggests that columns with only NULL values have exactly the same type as columns with explicit numbers. That suggests that the original code should work. But an explicit cast also fixes the problem. Very strange.
This is really, really strange. Consider the following:
select v.*, checksum(a, b), checksum(c,b)
FROM (VALUES(1, NULL, 100, NULL),
(2, 1, 2, 1.0)
) v(id, a, b, c);
The change in type for "c" affects the checksum for the second row, but not for the first.
This is my conclusion. When all the values in a column are NULL, then binary_checksum() is aware of this and the column is in the category of "noncomparable data type". The checksum is then based on the remaining columns.
You can validate this by seeing the error when you run:
select v.*, binary_checksum(a)
FROM (VALUES(1, NULL, 100, NULL),
(2, NULL, 2, 1.0)
) v( id,a, b, c);
It complains:
Argument data type NULL is invalid for argument 1 of checksum function.
Ironically, this is not a problem if you save the results into a table and use binary_checksum(). The issue appears to be some interaction with values() and data types -- but something that is not immediately obvious in the information_schema.columns table.
The happyish news is that the code should work on tables, even if it does not work on values() generated derived tables -- as this SQL Fiddle demonstrates.
I also learned that a column filled with NULLs really is typeless. The assignment of the int data type in a select into seems to happen when the table is being defined. The "typeless" type is converted to an int.
For the literal NULL without the CAST (and without any typed values in the column) it entirely ignores it and just gives you the same result as BINARY_CHECKSUM(b).
This seems to happen very early on. The initial tree representation output from
SELECT *, BINARY_CHECKSUM(a,b) AS bc
FROM (VALUES(1, NULL, 100),
(2, NULL, NULL)) s(id,a,b)
OPTION (RECOMPILE, QUERYTRACEON 8605, QUERYTRACEON 3604);
already shows that it has decided to use just one column as input to the function:
ScaOp_Intrinsic binary_checksum
ScaOp_Identifier COL: Union1008
Compare this with the following output for your first query:
ScaOp_Intrinsic binary_checksum
ScaOp_Identifier COL: Union1011
ScaOp_Identifier COL: Union1010
If you try to get the BINARY_CHECKSUM with
SELECT *, BINARY_CHECKSUM(a) AS bc
FROM (VALUES(1, NULL, 100)) s(id,a,b)
It gives the error
Msg 8184, Level 16, State 1, Line 8 Error in binarychecksum. There are no comparable columns in the binarychecksum input.
This is not the only place where an untyped NULL constant is treated differently from an explicitly typed one.
Another case is
SELECT COALESCE(CAST(NULL AS INT),CAST(NULL AS INT))
vs
SELECT COALESCE(NULL,NULL)
I'd err on the side of "glitch" in this case rather than "by design", though, as the columns from the derived table are supposed to be int before they reach the checksum function.
SELECT COALESCE(a,b)
FROM (VALUES(NULL, NULL)) s(a,b)
Does work as expected without this glitch.
I have 2 tables and I need to make one view of them as if they were a single table.
Table1 DEVICE
+-----+-------+-----------+
|DevID|DevName|DevIP |
+-----+-------+-----------+
|1 |HH1 |192.168.1.1|
+-----+-------+-----------+
|2 |HH2 |192.168.1.2|
+-----+-------+-----------+
Table2 DEVICECUSTOMDATA
+-----+------------+--------+
|DevID|Name |Value |
+-----+------------+--------+
|1 |Model |CN70 |
+-----+------------+--------+
|1 |BuildVersion|1.2 |
+-----+------------+--------+
|1 |BuildDate |20140113|
+-----+------------+--------+
|2 |Model |MC55 |
+-----+------------+--------+
|2 |BuildVersion|1.2 |
+-----+------------+--------+
|2 |BuildDate |20140110|
+-----+------------+--------+
The resulting table should be:
+-----+-------+-----------+-----+------------+---------+
|DevID|DevName|DevIP |Model|BuildVersion|BuildDate|
+-----+-------+-----------+-----+------------+---------+
|1 |HH1 |192.168.1.1|CN70 |1.2 |20140113 |
+-----+-------+-----------+-----+------------+---------+
|2 |HH2 |192.168.1.2|MC55 |1.2 |20140110 |
+-----+-------+-----------+-----+------------+---------+
I would appreciate any help to do this. Thanks
SQL SERVER:
See SqlFiddle:
SELECT d.DevId, d.DevName, d.DevIp, p.Model, p.BuildVersion, p.BuildDate
FROM DEVICE d
JOIN (
SELECT *
FROM DEVICECUSTOMDATA
PIVOT (MAX(Value) FOR Name IN ([Model], [BuildVersion], [BuildDate])) as Something) p
on d.DevId = p.DevId
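SQLite, like several other engines, has no PIVOT operator. A portable alternative that is commonly used instead is conditional aggregation with one MAX(CASE ...) per attribute. The sketch below runs it through Python's sqlite3 module; it is a swapped-in technique rather than the PIVOT answer above, and it assumes each (DevID, Name) pair appears at most once, as in the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEVICE (DevID INT, DevName TEXT, DevIP TEXT);
    CREATE TABLE DEVICECUSTOMDATA (DevID INT, Name TEXT, Value TEXT);
    INSERT INTO DEVICE VALUES (1, 'HH1', '192.168.1.1'), (2, 'HH2', '192.168.1.2');
    INSERT INTO DEVICECUSTOMDATA VALUES
        (1, 'Model', 'CN70'), (1, 'BuildVersion', '1.2'), (1, 'BuildDate', '20140113'),
        (2, 'Model', 'MC55'), (2, 'BuildVersion', '1.2'), (2, 'BuildDate', '20140110');
""")

# Conditional aggregation: each MAX(CASE ...) picks the single Value whose
# Name matches; all other rows in the group contribute NULL, which MAX ignores.
rows = conn.execute("""
    SELECT d.DevID, d.DevName, d.DevIP,
           MAX(CASE WHEN c.Name = 'Model'        THEN c.Value END) AS Model,
           MAX(CASE WHEN c.Name = 'BuildVersion' THEN c.Value END) AS BuildVersion,
           MAX(CASE WHEN c.Name = 'BuildDate'    THEN c.Value END) AS BuildDate
    FROM DEVICE d
    LEFT JOIN DEVICECUSTOMDATA c ON c.DevID = d.DevID
    GROUP BY d.DevID, d.DevName, d.DevIP
    ORDER BY d.DevID
""").fetchall()
for row in rows:
    print(row)
# (1, 'HH1', '192.168.1.1', 'CN70', '1.2', '20140113')
# (2, 'HH2', '192.168.1.2', 'MC55', '1.2', '20140110')
```

The LEFT JOIN keeps devices that have no custom data (their attribute columns come back NULL); the SELECT can be wrapped in CREATE VIEW to get the single view the question asks for.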
Working Online Example (SQL Server Syntax): SQL Fiddle
SQL Script:
DECLARE @Device TABLE (
DevID int not null,
DevName varchar(max) not null,
DevIP varchar(max) not null
)
insert into @Device values (1, 'HH1','192.168.1.1')
insert into @Device values (2, 'HH2','192.168.1.2')
DECLARE @DeviceCustomData TABLE (
CDevID int not null,
Name varchar(max) not null,
Value varchar(max) not null
)
insert into @DeviceCustomData
values (1,'Model','CN70')
insert into @DeviceCustomData
values (1,'BuildVersion','1.2')
insert into @DeviceCustomData
values (1,'BuildDate','20140113')
insert into @DeviceCustomData
values (2,'Model','MC55')
insert into @DeviceCustomData
values (2,'BuildVersion','1.2')
insert into @DeviceCustomData
values (2,'BuildDate','20140110')
-- table variables (@name) are batch-scoped: run everything in one batch
SELECT *
FROM
(SELECT d.DevID, d.DevName, d.DevIP, c.Value, c.Name
FROM @Device d
inner join @DeviceCustomData c on d.DevID = c.CDevID) AS SourceTable
PIVOT(
MIN(Value)
FOR Name in ([Model],[BuildVersion],[BuildDate])
) as PivotTable
Reference: http://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx
Someone I know went to an interview and was given the following problem to solve. I've thought about it for a few hours and believe that it's not possible to do without using some database-specific extensions or features from recent standards that don't have wide support yet.
I don't remember the story behind what is being represented, but it's not relevant. In simple terms, you're trying to represent chains of unique numbers:
chain 1: 1 -> 2 -> 3
chain 2: 42 -> 78
chain 3: 4
chain 4: 7 -> 8 -> 9
...
This information is already stored for you in the following table structure:
id | parent
---+-------
1 | NULL
2 | 1
3 | 2
42 | NULL
78 | 42
4 | NULL
7 | NULL
8 | 7
9 | 8
There could be millions of such chains and each chain can have an unlimited number of entries. The goal is to create a second table that would contain the exact same information, but with a third column that contains the starting point of the chain:
id | parent | start
---+--------+------
1 | NULL | 1
2 | 1 | 1
3 | 2 | 1
42 | NULL | 42
78 | 42 | 42
4 | NULL | 4
7 | NULL | 7
8 | 7 | 7
9 | 8 | 7
The claim (made by the interviewers) is that this can be achieved with just 2 SQL queries. The hint they provide is to first populate the destination table (I'll call it dst) with the root elements, like so:
INSERT INTO dst SELECT id, parent, id FROM src WHERE parent IS NULL
This will give you the following content:
id | parent | start
---+--------+------
1 | NULL | 1
42 | NULL | 42
4 | NULL | 4
7 | NULL | 7
They say that you can now execute just one more query to get to the goal shown above.
In my opinion, you can do one of two things: use recursion on the source table to get to the front of each chain, or repeatedly execute some version of SELECT start FROM dst WHERE dst.id = src.parent after each update to dst (i.e. the results can't be cached).
I don't think either of these approaches is supported by common databases like MySQL, PostgreSQL, SQLite, etc. I do know that in PostgreSQL 8.4 you can achieve recursion using a WITH RECURSIVE query, and in Oracle you have the START WITH and CONNECT BY clauses. The point is that these features are specific to the database type and version.
Is there any way to achieve the desired result using regular SQL92 in just one query? The best I could do is fill-in the start column for the first child with the following (can also use a LEFT JOIN to achieve the same result):
INSERT INTO dst
SELECT s.id, s.parent,
(SELECT start FROM dst AS d WHERE d.id = s.parent) AS start
FROM src AS s
WHERE s.parent IS NOT NULL
If there was some way to re-execute the inner select statement after each insert into dst, then the problem would be solved.
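The "re-execute after each insert" idea can at least be emulated by repeating a variant of the second query until it adds no rows. This sketch uses Python's sqlite3 module; note that it runs the second query several times (once per chain level), so it is not the two-query solution the interviewers described. The JOIN against dst and the NOT EXISTS filter are assumptions added so that each pass only inserts rows whose parent already has a start:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, parent INTEGER)")
conn.execute("CREATE TABLE dst (id INTEGER PRIMARY KEY, parent INTEGER, start INTEGER)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?)",
    [(1, None), (2, 1), (3, 2), (42, None), (78, 42), (4, None), (7, None), (8, 7), (9, 8)],
)

# Query 1 (the interviewers' hint): copy the chain roots.
conn.execute("INSERT INTO dst SELECT id, parent, id FROM src WHERE parent IS NULL")

# Query 2, repeated: each pass copies rows whose parent is already in dst,
# inheriting the parent's start. Stop when a pass inserts nothing.
while True:
    cur = conn.execute("""
        INSERT INTO dst (id, parent, start)
        SELECT s.id, s.parent, d.start
        FROM src AS s
        JOIN dst AS d ON d.id = s.parent
        WHERE NOT EXISTS (SELECT 1 FROM dst AS e WHERE e.id = s.id)
    """)
    if cur.rowcount == 0:
        break

result = conn.execute("SELECT id, parent, start FROM dst ORDER BY id").fetchall()
print(result)
```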
It cannot be implemented in any static SQL that follows ANSI SQL-92.
But as you said, it can easily be implemented with Oracle's CONNECT BY:
SELECT id,
parent,
CONNECT_BY_ROOT id
FROM src
START WITH parent IS NULL
CONNECT BY PRIOR id = parent
In SQL Server you would use a Common Table Expression (CTE).
To replicate the stored data I've created a temporary table
-- Create a temporary table
CREATE TABLE #SourceData
(
ID INT
, Parent INT
)
-- Setup data (ID, Parent)
INSERT INTO #SourceData VALUES (1, NULL);
INSERT INTO #SourceData VALUES (2, 1);
INSERT INTO #SourceData VALUES (3, 2);
INSERT INTO #SourceData VALUES (42, NULL);
INSERT INTO #SourceData VALUES (78, 42);
INSERT INTO #SourceData VALUES (4, NULL);
INSERT INTO #SourceData VALUES (7, NULL);
INSERT INTO #SourceData VALUES (8, 7);
INSERT INTO #SourceData VALUES (9, 8);
Then I create the CTE to compile the data result:
-- Perform CTE
WITH RecursiveData (ID, Parent, Start) AS
(
-- Base query
SELECT ID, Parent, ID AS Start
FROM #SourceData
WHERE Parent IS NULL
UNION ALL
-- Recursive query
SELECT s.ID, s.Parent, rd.Start
FROM #SourceData AS s
INNER JOIN RecursiveData AS rd ON s.Parent = rd.ID
)
SELECT * FROM RecursiveData ORDER BY ID
Which will output the following:
id | parent | start
---+--------+------
1  | NULL   | 1
2  | 1      | 1
3  | 2      | 1
4  | NULL   | 4
7  | NULL   | 7
8  | 7      | 7
9  | 8      | 7
42 | NULL   | 42
78 | 42     | 42
Then I clean up :)
-- Clean up
DROP TABLE #SourceData
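The same recursive CTE also runs in SQLite 3.8.3 and later (which spells it WITH RECURSIVE), so it can be tried from Python's built-in sqlite3 module. This sketch returns every row rather than only the roots, ordered by ID for readability:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SourceData (ID INTEGER, Parent INTEGER)")
conn.executemany(
    "INSERT INTO SourceData VALUES (?, ?)",
    [(1, None), (2, 1), (3, 2), (42, None), (78, 42), (4, None), (7, None), (8, 7), (9, 8)],
)

rows = conn.execute("""
    WITH RECURSIVE RecursiveData (ID, Parent, Start) AS (
        -- anchor: every chain root is its own start
        SELECT ID, Parent, ID FROM SourceData WHERE Parent IS NULL
        UNION ALL
        -- recursive step: a child inherits its parent's Start
        SELECT s.ID, s.Parent, rd.Start
        FROM SourceData AS s
        JOIN RecursiveData AS rd ON s.Parent = rd.ID
    )
    SELECT ID, Parent, Start FROM RecursiveData ORDER BY ID
""").fetchall()
for row in rows:
    print(row)
```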
There is no recursive query support in ANSI-92, because it was added in ANSI-99. Oracle has had its own recursive query syntax (CONNECT BY) since v2. While Oracle supported the WITH clause since 9i, SQL Server is the first I knew of to support the recursive WITH/CTE syntax; Oracle didn't start until 11gR2. PostgreSQL added support in 8.4+. MySQL has had a request in for WITH support since 2006, and I highly doubt you'll see it in SQLite.
The example you gave is only two levels deep, so you could use:
INSERT INTO dst
SELECT a.id,
a.parent,
COALESCE(c.id, b.id) AS start
FROM SRC a
LEFT JOIN SRC b ON b.id = a.parent
LEFT JOIN SRC c ON c.id = b.parent
WHERE a.parent IS NOT NULL
You'd have to add a LEFT JOIN for the number of levels deep, and add them in proper sequence to the COALESCE function.
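The fixed-depth approach can be checked against the sample data with Python's sqlite3 module, used here as a stand-in engine; the two INSERT statements below are the interviewers' root query and the self-join query from this answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, parent INTEGER)")
conn.execute("CREATE TABLE dst (id INTEGER, parent INTEGER, start INTEGER)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?)",
    [(1, None), (2, 1), (3, 2), (42, None), (78, 42), (4, None), (7, None), (8, 7), (9, 8)],
)

# Query 1: the roots.
conn.execute("INSERT INTO dst SELECT id, parent, id FROM src WHERE parent IS NULL")

# Query 2: two self-joins cover chains of up to three nodes; the deepest
# ancestor found (c before b) becomes the start.
conn.execute("""
    INSERT INTO dst
    SELECT a.id, a.parent, COALESCE(c.id, b.id) AS start
    FROM src a
    LEFT JOIN src b ON b.id = a.parent
    LEFT JOIN src c ON c.id = b.parent
    WHERE a.parent IS NOT NULL
""")
rows = conn.execute("SELECT id, parent, start FROM dst ORDER BY id").fetchall()
print(rows)
```

With the sample data every chain has at most three nodes, so two joins suffice. Adding a row (10, 9) would make COALESCE(c.id, b.id) pick 8 rather than 7 for id 10, which is why one more LEFT JOIN is needed per extra level.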