How to handle multiple occurrences of a single digit in Hive

I want to format a date coming from a source file. I am handling junk values because this column may contain zeros (it could be 0, 00, 000, etc.).
select case when capture_date in ('','00','000') then NULL else from_unixtime(unix_timestamp(capture_date,'yyyyMMdd'),'yyyy-MM-dd') end as capt_dt from test_table;
Instead of adding more junk values to the list, I want to handle this generically: if we receive any number of 0's, the result should be NULL.
Any solution?

There seems to be no point in special-casing illegal date literals, since they yield NULL anyway (unless the column might contain 7 or more zeros):
hive> with test_table as (select stack(5,'','0','00','000','20170831') as capture_date)
> select from_unixtime(unix_timestamp(capture_date,'yyyyMMdd'),'yyyy-MM-dd') as capt_dt
> from test_table
> ;
OK
capt_dt
NULL
NULL
NULL
NULL
2017-08-31
If 7 or more zeros are possible, strip all-zero strings first:
hive> with test_table as (select stack(5,'','0','00','000000000000','20170831') as capture_date)
> select from_unixtime(unix_timestamp(regexp_replace(capture_date,'^0+$',''),'yyyyMMdd'),'yyyy-MM-dd') as capt_dt
> from test_table
> ;
OK
capt_dt
NULL
NULL
NULL
NULL
2017-08-31
or
hive> with test_table as (select stack(5,'','0','00','000000000000','20170831') as capture_date)
> select case
> when capture_date not rlike '^0+$'
> then from_unixtime(unix_timestamp(capture_date,'yyyyMMdd'),'yyyy-MM-dd')
> end as capt_dt
>
> from test_table
> ;
OK
capt_dt
NULL
NULL
NULL
NULL
2017-08-31
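For anyone who wants to try the '^0+$' idea outside Hive, here is a minimal Python sketch of the same rule: a string made only of zeros (or an empty string) becomes NULL/None, anything else is parsed as yyyyMMdd. The function name format_capture_date is made up for illustration, and Python's strptime is stricter than Hive's date parsing, so this only mirrors the intent, not Hive's exact behaviour.

```python
import re
from datetime import datetime

# One or more zeros and nothing else (the Python equivalent of '^0+$').
ALL_ZEROS = re.compile(r"0+")


def format_capture_date(raw):
    """Return 'yyyy-MM-dd' for a valid 'yyyyMMdd' string, else None."""
    if raw is None or raw == "" or ALL_ZEROS.fullmatch(raw):
        return None
    try:
        return datetime.strptime(raw, "%Y%m%d").strftime("%Y-%m-%d")
    except ValueError:
        # Anything that is not a real date is also treated as junk.
        return None


samples = ["", "0", "00", "000000000000", "20170831"]
results = [format_capture_date(s) for s in samples]
print(results)
```

Only the last sample is a real date, so it is the only one that comes back formatted.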

Related

SQL Server, Remove null values in a select case query

The code below generates around 300 rows, but only a small fraction of them have a value in column "Unit=3". The rest are NULL, so many duplicate values exist in column "ekod".
Does anyone know how to remove all rows with a NULL value in column "Unit=3"?
Best regards!
Result:
ekod unit=3
0004 NULL
0114 15
0114 NULL
0114 NULL
0120 NULL
0120 NULL
0120 46
0120 NULL
Code:
select
A.ekod
,case when A.unit='3' then count(*) end AS [Unit=3]
from [Stat_unitdata].[dbo].[XXX_YYY] A
group by a.ekod, a.unit
order by ekod
You can use sum.
select
A.ekod
,sum(case when a.unit='3' then 1 else 0 end) AS [Unit=3]
from [Stat_unitdata].[dbo].[XXX_YYY] A
group by a.ekod
order by ekod
As a note, if you don't care about ekods with zero units:
select a.ekod, count(*) as [Unit=3]
from [Stat_unitdata].[dbo].[XXX_YYY] a
where a.unit = '3'
group by a.ekod
order by a.ekod;
This returns only ekod values that have at least one unit = '3'.
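To see why SUM(CASE ...) collapses the duplicates, here is a small sketch that runs the same query shape against an in-memory SQLite database from Python. The table name unitdata and the sample rows are invented for the demo; SQL Server specifics such as the [Unit=3] alias are replaced by a plain alias.

```python
import sqlite3


def count_unit3(rows):
    """Count unit = '3' rows per ekod, returning one row per ekod."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE unitdata (ekod TEXT, unit TEXT)")
    conn.executemany("INSERT INTO unitdata VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT ekod,
               SUM(CASE WHEN unit = '3' THEN 1 ELSE 0 END) AS unit3
        FROM unitdata
        GROUP BY ekod          -- group by ekod only, not by unit
        ORDER BY ekod
        """
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample data: (ekod, unit)
sample = [("0004", "1"), ("0114", "3"), ("0114", "3"),
          ("0114", "2"), ("0120", "3")]
print(count_unit3(sample))
```

Because the grouping is on ekod alone, each ekod appears once, with 0 where no row has unit = '3'.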

Can I issue a 'select exists' that checks for ALL specified fields?

Say that I have a database:
TITLE | RUNTIME | EPISODES
-------------------------------------------
The X-Files 42 202
Fringe NULL 100
Seinfeld 21 NULL
I want to issue a statement like SELECT EXISTS(SELECT title,runtime,episodes FROM shows); that will return 1 if all three of those fields are present (as for The X-Files) but 0 if any of them are empty/null (as with Fringe and Seinfeld).
Is this possible using SQL alone?
I would suggest just doing:
select t.*,
(case when title is not null and runtime is not null and episodes is not null
then 1 else 0 end) as HasAllThree
from shows t;
The EXISTS function checks if rows exist, not columns. You can add a WHERE clause to meet your business objectives with the EXISTS and a CASE.
SELECT
CASE WHEN EXISTS(
SELECT * FROM shows
WHERE title IS NOT NULL
AND runtime IS NOT NULL
AND episodes IS NOT NULL
) THEN 1 ELSE 0 END
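The two answers do different things: the first flags each row, the second tells you whether at least one complete row exists. A quick sketch with SQLite from Python shows both on the sample data from the question (the helper name load_shows is just for this demo).

```python
import sqlite3


def load_shows():
    """Build an in-memory copy of the shows table from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shows (title TEXT, runtime INTEGER, episodes INTEGER)"
    )
    conn.executemany(
        "INSERT INTO shows VALUES (?, ?, ?)",
        [("The X-Files", 42, 202), ("Fringe", None, 100), ("Seinfeld", 21, None)],
    )
    return conn


conn = load_shows()

# Per-row flag: 1 only when all three columns are non-NULL.
per_row = conn.execute(
    """
    SELECT title,
           CASE WHEN title IS NOT NULL
                 AND runtime IS NOT NULL
                 AND episodes IS NOT NULL
                THEN 1 ELSE 0 END AS has_all_three
    FROM shows
    ORDER BY title
    """
).fetchall()

# Table-level check: does any fully populated row exist?
any_complete = conn.execute(
    """
    SELECT EXISTS(
        SELECT 1 FROM shows
        WHERE title IS NOT NULL
          AND runtime IS NOT NULL
          AND episodes IS NOT NULL
    )
    """
).fetchone()[0]

conn.close()
print(per_row, any_complete)
```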

SQL PIVOT Rows Into Matrix With Empty Column Results

I needed to get my data into a format that works smoothly with a Kendo grid (example). The example is actually a hack for coloring individual chart bars, which isn't available by default. If you format your data correctly, you can stack the bars in groups and color each group.
My data needed to be structured like this, with the values of the first column also used as column headers:
names CompletedAllCourses HasExpiredCourses HasNotTakenCourses HasDueCourses
-------------------------------------------------------------------------------------------
CompletedAllCourses 12 NULL NULL NULL
HasDueCourses NULL NULL NULL 4
HasExpiredCourses NULL 8 NULL NULL
HasNotTakenCourses NULL NULL 24 NULL
This is what I had to start with, the GroupedStats table (columns cat and cnt):
CompletedAllCourses 12
HasDueCourses 4
HasExpiredCourses 8
HasNotTakenCourses 24
I tried the following query from an example I found online.
SELECT * FROM GroupedStats
PIVOT
(
MAX(cnt) FOR cat IN (CompletedAllCourses,
HasExpiredCourses, HasNotTakenCourses, HasDueCourses)
) p
This was the result.
CompletedAllCourses HasExpiredCourses HasNotTakenCourses HasDueCourses
------------------- ----------------- ------------------ -------------
12 8 24 4
I figured out one way and posted it as the answer.
This will give you the required result but it requires hardcoded literals in CASE:
SELECT cat,
CASE cat WHEN 'CompletedAllCourses' THEN CompletedAllCourses ELSE NULL END AS CompletedAllCourses,
CASE cat WHEN 'HasExpiredCourses' THEN HasExpiredCourses ELSE NULL END AS HasExpiredCourses,
CASE cat WHEN 'HasNotTakenCourses' THEN HasNotTakenCourses ELSE NULL END AS HasNotTakenCourses,
CASE cat WHEN 'HasDueCourses' THEN HasDueCourses ELSE NULL END AS HasDueCourses
FROM GroupedStats
JOIN
(
SELECT * FROM GroupedStats
PIVOT
(
MAX(cnt) FOR cat IN (CompletedAllCourses,
HasExpiredCourses, HasNotTakenCourses, HasDueCourses)
) p
) X
ON 1 = 1
PIVOT Example to save the day
I moved this part of this post into the answer because it was ultimately what worked for me.
The source query needs an extra column (here cat selected a second time as names); without it, PIVOT has nothing left to group by and everything collapses into one row.
select * from
(select cat as names, cnt, cat FROM GroupedStats) x
PIVOT
(
MAX(cnt) FOR cat IN (CompletedAllCourses, HasExpiredCourses,
HasNotTakenCourses, HasDueCourses)
) p
SQL Fiddle: http://sqlfiddle.com/#!6/8b706/3
And I get the format I wanted. Problem solved! Please comment if you can add to the explanation.
names CompletedAllCourses HasExpiredCourses HasNotTakenCourses HasDueCourses
---------------------------------------------------------------------------------------
CompletedAllCourses 12 NULL NULL NULL
HasDueCourses NULL NULL NULL 4
HasExpiredCourses NULL 8 NULL NULL
HasNotTakenCourses NULL NULL 24 NULL
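As a rough illustration of why the extra names column matters, the same matrix can be produced without PIVOT by using MAX(CASE ...) and grouping on the category. This is a different formulation from the PIVOT query above, not the author's method; it is sketched here with SQLite from Python because SQLite has no PIVOT, and the function name pivot_matrix is invented.

```python
import sqlite3

CATEGORIES = ["CompletedAllCourses", "HasExpiredCourses",
              "HasNotTakenCourses", "HasDueCourses"]


def pivot_matrix(stats):
    """Return one row per category with its count on the diagonal."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE GroupedStats (cat TEXT, cnt INTEGER)")
    conn.executemany("INSERT INTO GroupedStats VALUES (?, ?)", stats)
    # One MAX(CASE ...) column per category; CASE without ELSE yields NULL.
    columns = ",\n".join(
        f"MAX(CASE WHEN cat = '{c}' THEN cnt END) AS {c}" for c in CATEGORIES
    )
    rows = conn.execute(
        f"SELECT cat AS names,\n{columns}\n"
        "FROM GroupedStats GROUP BY cat ORDER BY cat"
    ).fetchall()
    conn.close()
    return rows


stats = [("CompletedAllCourses", 12), ("HasDueCourses", 4),
         ("HasExpiredCourses", 8), ("HasNotTakenCourses", 24)]
matrix = pivot_matrix(stats)
for row in matrix:
    print(row)
```

Dropping the GROUP BY cat (the counterpart of the extra names column) leaves a single aggregated row, which matches the collapsed result from the first PIVOT attempt.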

Coalesce function not selecting data value from series when it exists

My code is as follows:
Insert Into dbo.database (Period, Amount)
Select coalesce (date_1, date_2, date_3), Amount FROM Source.dbo.[10]
I'm 100% sure a value exists in one of the 3 columns date_1, date_2, date_3 (all strings, varchar(100)), yet I am still getting blanks in Period.
Any help?
COALESCE returns the first non-NULL argument from the list, or NULL if all arguments are NULL. See http://msdn.microsoft.com/en-us/library/ms190349.aspx for full details.
I would guess that you have blank values (' ') in one of the columns instead of NULL values. If you are trying to find the first not null non-blank column you can use a case statement.
select
case
when len(rtrim(ltrim(date_1))) > 0 then date_1
when len(rtrim(ltrim(date_2))) > 0 then date_2
when len(rtrim(ltrim(date_3))) > 0 then date_3
else null
end,
Amount
from Source.dbo.[10]
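A more compact alternative to the CASE above (a different technique, offered as a sketch) is to turn blanks into NULL with NULLIF before calling COALESCE. The demo below uses SQLite from Python with an invented table name src and made-up rows; TRIM stands in for the rtrim(ltrim(...)) pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE src (date_1 TEXT, date_2 TEXT, date_3 TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO src VALUES (?, ?, ?, ?)",
    [(" ", "2014-01", None, 10),      # date_1 is blank, not NULL
     (None, "", "2014-03", 20)],      # date_2 is an empty string
)

# Plain COALESCE stops at the first non-NULL value, even if it is blank.
plain = conn.execute(
    "SELECT COALESCE(date_1, date_2, date_3), amount FROM src ORDER BY amount"
).fetchall()

# NULLIF(TRIM(x), '') maps blank or empty strings to NULL first.
cleaned = conn.execute(
    """
    SELECT COALESCE(NULLIF(TRIM(date_1), ''),
                    NULLIF(TRIM(date_2), ''),
                    NULLIF(TRIM(date_3), '')) AS period,
           amount
    FROM src
    ORDER BY amount
    """
).fetchall()

conn.close()
print(plain)
print(cleaned)
```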

select result set row to columns transformation

I have a table remarks with columns id, story_id, and like; like can be +1 or -1.
I want my select query to return the columns story_id, total, n_like, n_dislike, where total = n_like + n_dislike, without subqueries.
I am currently doing a GROUP BY on like and selecting like as like_t, count(like) as total, which gives me output like this:
-- like_t --+ --- total --
-1 | 2
1 | 6
That returns two rows in the result set. What I want is a single row where n_like is 6, n_dislike is 2, and total is 8.
First, LIKE is a reserved word in PostgreSQL, so you have to double-quote it. Maybe a better name should be picked for this column.
CREATE TABLE testbed (id int4, story_id int4, "like" int2);
INSERT INTO testbed VALUES
(1,1,'+1'),(1,1,'+1'),(1,1,'+1'),
(1,1,'+1'),(1,1,'+1'),(1,1,'+1'),
(1,1,'-1'),(1,1,'-1');
SELECT
story_id,
sum(CASE WHEN "like" > 0 THEN abs("like") ELSE 0 END) AS n_like,
sum(CASE WHEN "like" < 0 THEN abs("like") ELSE 0 END) AS n_dislike,
count(story_id) AS total
-- for cases +2 / -3 in the "like" field, use following construct instead
-- sum(abs("like")) AS total
FROM testbed
GROUP BY story_id;
I used abs("like") for cases when you'll have +2 or -3 in your "like" column.
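The query can be checked quickly against the sample data with SQLite from Python. This is only a sketch: SQLite stands in for PostgreSQL here, the function name like_summary is made up, and the int4/int2 types are replaced by INTEGER.

```python
import sqlite3


def like_summary(rows):
    """Return (story_id, n_like, n_dislike, total) per story."""
    conn = sqlite3.connect(":memory:")
    # "like" must be double-quoted because LIKE is a keyword.
    conn.execute(
        'CREATE TABLE testbed (id INTEGER, story_id INTEGER, "like" INTEGER)'
    )
    conn.executemany("INSERT INTO testbed VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT story_id,
               SUM(CASE WHEN "like" > 0 THEN ABS("like") ELSE 0 END) AS n_like,
               SUM(CASE WHEN "like" < 0 THEN ABS("like") ELSE 0 END) AS n_dislike,
               COUNT(story_id) AS total
        FROM testbed
        GROUP BY story_id
        """
    ).fetchall()
    conn.close()
    return result


# Six likes and two dislikes for story 1, as in the INSERT above.
votes = [(1, 1, 1)] * 6 + [(1, 1, -1)] * 2
print(like_summary(votes))
```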