I have thousands of rows of this structure:
sum(case when sp_flag='Y' then 1 else 0 end) SP_Fl
How can I force the unknowns to be 0 instead of NULL and have the column be numeric?
Use the coalesce() (or NVL()) function to replace NULL values with 0.
hive> select coalesce(int(null),0); --replacing integer null value with 0
+------+--+
| _c0 |
+------+--+
| 0 |
+------+--+
hive> select nvl(int(null),0); --replacing integer null value with 0
+------+--+
| _c0 |
+------+--+
| 0 |
+------+--+
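Applied to the aggregate from the question, a minimal sketch (assuming a hypothetical table name src; sp_flag is the column from the question) is to wrap the whole SUM so a NULL result comes back as a numeric 0:
hive> select coalesce(sum(case when sp_flag='Y' then 1 else 0 end), 0) as SP_Fl  -- NULL sum (e.g. empty group) becomes 0
      from src;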
For more details on replacing empty strings/NULL values, refer to this link.
UPDATE:
1. Replacing NULL values in strings:
hive> select coalesce(string(NULL),0);
+------+--+
| _c0 |
+------+--+
| 0 |
+------+--+
2. Replacing "NULL" strings:
We cannot replace "NULL" strings with coalesce, because here the literal string "NULL" (length 4) is present in the data, and coalesce only replaces actual null values.
hive> select coalesce(string("NULL"),0),length(string("NULL"));
+-------+------+--+
| _c0 | _c1 |
+-------+------+--+
| NULL | 4 |
+-------+------+--+
If you have this kind of data, write a case statement (or use the regexp_replace/replace functions) to replace "NULL" strings with 0.
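For example, a sketch of both approaches against the same "NULL" literal used above:
hive> select regexp_replace(string("NULL"), '^NULL$', '0');  -- the literal string "NULL" becomes 0
hive> select case when string("NULL") = 'NULL' then '0' else string("NULL") end;  -- case-statement alternative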
I'm having trouble summing a column which has both numeric and nvarchar values, where the numerics should be summed (and grouped) but the strings left as is.
I.e.:
from:
| ID | Value |
+----+-------+
| a | 4 |
| b | 3 |
| c | hello |
| a | 8 |
+----+-------+
to:
| ID | Value |
+----+-------+
| a | 12 |
| b | 3 |
| c | hello |
+----+-------+
So far I have:
SELECT
    [ID],
    CASE
        WHEN ISNUMERIC([Value]) = 1 THEN SUM(CAST([Value] AS INT))
        ELSE [Value]
    END AS Value
FROM db
GROUP BY [ID]
But I get an error saying "the column 'Value' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause".
Use try_convert()/try_cast() instead:
SELECT [ID], SUM(TRY_CAST([Value] as int))
FROM db
GROUP BY [ID];
Incidentally, the error that you are getting is because your CASE wraps the SUM() rather than sitting inside it, so the ELSE [Value] branch references a column that is neither aggregated nor in the GROUP BY. Even if that syntax error went away, you would still get a run-time error when CAST() hit a non-numeric value.
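If the non-numeric values also need to be preserved in the output (as in the desired result above), one sketch, assuming the same table name db, is to fall back to the raw text whenever the whole group fails to cast:
SELECT [ID],
       COALESCE(CAST(SUM(TRY_CAST([Value] AS int)) AS varchar(50)),  -- numeric groups: summed, then rendered as text
                MAX([Value])) AS [Value]                             -- non-numeric groups: keep the original string
FROM db
GROUP BY [ID];
The resulting column is a varchar, since a single column cannot hold both ints and strings.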
I am upgrading Postgres from 9.1 to 11.5, but a SELECT on a sequence returns different output in 11.5 compared to 9.1: not all columns are shown.
Output in 11.5:
SELECT * FROM session_SEQ;
 last_value | log_cnt | is_called
------------+---------+-----------
          1 |       0 | f
(1 row)
Output in 9.1:
SELECT * FROM session_SEQ;
sequence_name | last_value | start_value | increment_by | max_value | min_value | cache_value | log_cnt | is_cycled | is_called
---------------+------------+-------------+--------------+-----------+-----------+-------------+---------+-----------+-----------
session_seq | 1 | 1 | 1 | 99999999 | 1 | 1 | 0 | f | f
How can we display all the columns in 11.5? Is there any workaround?
You can join the pg_sequence catalog with pg_class to find this information, like:
select relname, pg_sequence.*
from pg_sequence
inner join pg_class on pg_class.oid = pg_sequence.seqrelid;
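Alternatively, PostgreSQL 10 and later also ship the pg_sequences view (note the plural), which exposes one row per sequence; a sketch for this particular sequence:
select * from pg_sequences where sequencename = 'session_seq';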
I wrote a pretty big SQL query that joins (outer join) two similar queries. Each of them returns a table in this format:
date | value1(q1)
-----------+-----------
05-06-2010 | 10
05-07-2017 | 12
The same goes for the second subquery. After I join them, I get the following table:
date | value1(q1) | date | value(q2)
-----------+------------+------------+--------
05-06-2010 | 10 | NULL | NULL
05-07-2017 | 12 | NULL | NULL
NULL | NULL | 05-07-2010 | 15
NULL | NULL | 01-02-2008 | 17
I tried wrapping everything in a CONCAT, but it doesn't work.
How can I get the result in this form:
date | value1(q1) | value(q2)
-----------+------------+-----------
05-06-2010 | 10 | 0
05-07-2017 | 12 | 10
07-08-2018 | 14 | 17
Try the script below:
SELECT [date],
       SUM([value1(q1)]) AS 'value1(q1)',
       SUM([value(q2)]) AS 'value(q2)'
FROM
(
    SELECT [date],
           [value1(q1)] AS 'value1(q1)',
           0 AS 'value(q2)'
    FROM your_table_1
    UNION ALL
    SELECT [date],
           0 AS 'value1(q1)',
           [value(q2)] AS 'value(q2)'
    FROM your_table_2
) A
GROUP BY [date]
I think you want a full join:
select coalesce(q1.date, q2.date) as date,
       coalesce(q1.value, 0) as value1,
       coalesce(q2.value, 0) as value2
from q1 full join
     q2
     on q1.date = q2.date;
I have an SQL table with some data like this, sorted by date:
+----------+------+
| Date | Col2 |
+----------+------+
| 12:00:01 | a |
| 12:00:02 | a |
| 12:00:03 | b |
| 12:00:04 | b |
| 12:00:05 | c |
| 12:00:06 | c |
| 12:00:07 | a |
| 12:00:08 | a |
+----------+------+
So, I want my select result to be the following:
+----------+------+
| Date | Col2 |
+----------+------+
| 12:00:01 | a |
| 12:00:03 | b |
| 12:00:05 | c |
| 12:00:07 | a |
+----------+------+
I have used the DISTINCT clause, but it removes the last two rows with Col2 = 'a'.
You can use lag (SQL Server 2012+) to get the value from the previous row and compare it with the current row's value. If they are equal, assign the row to one group (1 here), otherwise to another (0 here). Finally, select only the required rows.
select dt, col2
from (
    select dt, col2,
           case when lag(col2, 1, 0) over (order by dt) = col2 then 1 else 0 end as somecol
    from t
) x
where somecol = 0
If you are using Microsoft SQL Server 2012 or later, you can do this:
select date, col2
from (
    select date, col2,
           case when isnull(lag(col2) over (order by date, col2), '') = col2 then 1 else 0 end as ignore
    from (yourtable)
) x
where ignore = 0
This should work as long as col2 cannot contain nulls and the empty string ('') is not a valid value for col2. The query will need some work if either assumption does not hold.
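A sketch of one way to drop both assumptions (still SQL Server 2012+, using the same table and column names as above) is to keep the previous value and the row number and compare them explicitly, treating two NULLs as equal:
select date, col2
from (
    select date, col2,
           lag(col2) over (order by date) as prev,
           row_number() over (order by date) as rn
    from (yourtable)
) x
where rn = 1                               -- always keep the first row
   or (col2 is null and prev is not null)  -- NULL following a non-NULL value
   or (col2 is not null and prev is null)  -- non-NULL following a NULL value
   or col2 <> prev;                        -- plain change in value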
Same as the accepted answer (+1), just moving the conditions; this assumes col2 is not null.
select dt, col2
from (
    select dt, col2,
           lag(col2, 1) over (order by dt) as lagCol2
    from t
) x
where x.lagCol2 is null or x.lagCol2 <> x.col2
I have the following table in PostgreSQL: myapp_mymodel_id_seq
Column | Type | Value
---------------+---------+----------------------------
sequence_name | name | myapp_mymodel_id_seq
last_value | bigint | 3
start_value | bigint | 1
increment_by | bigint | 1
max_value | bigint | 9223372036854775807
min_value | bigint | 1
cache_value | bigint | 1
log_cnt | bigint | 32
is_cycled | boolean | f
is_called | boolean | t
How do I change 3 under Value and last_value to 40?
I tried updating last_value but it won't recognize the column.
UPDATE myapp_mymodel_id_seq SET Value=40 WHERE Value=3;
ERROR: column "value" does not exist
select setval('myapp_mymodel_id_seq', 40);
See the manual for more details: http://www.postgresql.org/docs/current/static/functions-sequence.html
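If you also want the next nextval() to return exactly 40 rather than 41, setval's optional third argument sets the is_called flag:
select setval('myapp_mymodel_id_seq', 40, false);  -- nextval() will return 40
select setval('myapp_mymodel_id_seq', 40, true);   -- the default; nextval() will return 41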
UPDATE myapp_mymodel_id_seq SET last_value = 40 WHERE last_value = 3;