CAST string to BIGINT in hive not giving expected result

I cannot understand the output of this query run on Hive:
select count(*),
count(col1) as count,
count( distinct col1) as distinct,
SUM (case when (cast(col1 as BIGINT) is null or cast(col1 as BIGINT) is not null )then 1 else 0 end) as total_count,
SUM (case when cast(col1 as BIGINT) is null then 1 else 0 end) as non_int_count,
SUM (case when cast(col1 as BIGINT) is not null then 1 else 0 end) as int_count,
SUM (case when cast(col1 as BIGINT) is null then 0 else 1 end) as int_count2
FROM TABLE
where conditions
Result for this query in Hive is
count(*) count distinct total_count non_int_count int_count int_count2
23030525 23030525 1631400 23030525 2 258898 258898
Shouldn't total_count = count(*) = (non_int_count + int_count)?
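To make the expectation in the question concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Hive. The `to_bigint` function is a hypothetical stand-in for a cast that yields NULL on failure (it is not Hive's CAST and may differ from it on edge cases), and the table `t` and its rows are invented for illustration. Under those assumptions, every row is either NULL or not NULL after the cast, so the two conditional sums add up to count(*).

```python
import sqlite3


def to_bigint(value):
    """Hypothetical stand-in for a cast that returns NULL when parsing fails."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


conn = sqlite3.connect(":memory:")
# Register the Python function so it can be called from SQL.
conn.create_function("to_bigint", 1, to_bigint)
conn.execute("CREATE TABLE t (col1 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("12",), ("abc",), ("7",), ("",), ("x9",)],  # invented sample data
)

total, non_int_count, int_count = conn.execute(
    """
    SELECT COUNT(*),
           SUM(CASE WHEN to_bigint(col1) IS NULL THEN 1 ELSE 0 END),
           SUM(CASE WHEN to_bigint(col1) IS NOT NULL THEN 1 ELSE 0 END)
    FROM t
    """
).fetchone()

print(total, non_int_count, int_count)
```

If an engine reports counts that do not add up this way, as in the result above, the discrepancy comes from the engine or the data rather than from the logic of the CASE expressions.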

Related

01476. 00000 - "divisor is equal to zero" Oracle

I have the following code as part of a Script:
ROUND
(
(
(COUNT(DISTINCT CASE WHEN ONLINE_SALES > 0 THEN CONTACT_KEY ELSE NULL END))
/
(COUNT(DISTINCT CASE WHEN ONLINE_SALES > 0 OR OFFLINE_SALES > 0 THEN CONTACT_KEY ELSE NULL END))
),3
) AS UNIQ_ONLINE_SHOP_RATE
When I run the script I get the 'divisor is equal to zero' error.
I ran the numerator and the denominator separately and both equal zero, so I understand the error.
I have tried NULLIF(,0) like so:
ROUND
(
(
(
COUNT(DISTINCT CASE WHEN ONLINE_SALES > 0 THEN CONTACT_KEY ELSE NULL END) /
nullif((COUNT(DISTINCT CASE WHEN ONLINE_SALES > 0 OR OFFLINE_SALES > 0 THEN CONTACT_KEY ELSE NULL END)),0)
),3
) AS UNIQ_ONLINE_SHOP_RATE
but then I get a 'FROM keyword not found where expected' error.

Is this your real query? I assume ONLINE_SALES > 0 or OFFLINE_SALES > 0 is the default and ONLINE_SALES = 0 or OFFLINE_SALES = 0 is the exception.
Then in most cases your query would result in
COUNT(DISTINCT CONTACT_KEY) / COUNT(DISTINCT CONTACT_KEY)
which is not so "exciting", i.e. always 1
As a best guess, I would do:
NULLIF(
ROUND(
COUNT(DISTINCT CONTACT_KEY)
/
NULLIF(COUNT(DISTINCT ONLINE_SALES), 0)
, 3)
, 0) AS UNIQ_ONLINE_SHOP_RATE

Here are 3 options (CASE, NULLIF and DECODE), visually simple to understand - use a CTE to calculate both values, while the main query divides them, taking care about 0 as the denominator:
with temp as
(select
COUNT(DISTINCT CASE WHEN ONLINE_SALES > 0 THEN CONTACT_KEY ELSE NULL END) val1,
COUNT(DISTINCT CASE WHEN ONLINE_SALES > 0 OR OFFLINE_SALES > 0 THEN CONTACT_KEY ELSE NULL END) val2
from your_table
)
select
-- using CASE
val1 / case when val2 = 0 then null else val2 end as result_1,
-- using NULLIF
val1 / nullif(val2, 0) as result_2,
-- using DECODE
val1 / decode(val2, 0, null, val2) as result_3
from temp;
Shouldn't be difficult to round the result at the end.
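As a runnable sketch of the NULLIF variant with the rounding applied at the end, the following uses Python's built-in sqlite3 module instead of Oracle. The function name `online_shop_rate` and the sample rows are invented for illustration. Two caveats: SQLite returns NULL on division by zero rather than raising ORA-01476, so this shows the shape of the query rather than the Oracle error, and SQLite does integer division on integers, hence the multiplication by 1.0.

```python
import sqlite3


def online_shop_rate(rows):
    """Return the share of shopping contacts who shopped online, or None if nobody shopped."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE your_table "
        "(contact_key INTEGER, online_sales REAL, offline_sales REAL)"
    )
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)
    (rate,) = conn.execute(
        """
        WITH temp AS (
            SELECT COUNT(DISTINCT CASE WHEN online_sales > 0
                                       THEN contact_key END) AS val1,
                   COUNT(DISTINCT CASE WHEN online_sales > 0 OR offline_sales > 0
                                       THEN contact_key END) AS val2
            FROM your_table
        )
        -- NULLIF turns a zero divisor into NULL, so the whole ratio becomes NULL
        SELECT ROUND(val1 * 1.0 / NULLIF(val2, 0), 3) FROM temp
        """
    ).fetchone()
    conn.close()
    return rate


# Contact 1 shopped online (twice), contact 2 only offline, contact 3 not at all.
mixed = [(1, 10.0, 0.0), (1, 3.0, 0.0), (2, 0.0, 5.0), (3, 0.0, 0.0)]
print(online_shop_rate(mixed))
# Nobody shopped: the denominator is 0, NULLIF makes it NULL.
print(online_shop_rate([(3, 0.0, 0.0)]))
```

Computing both counts in the CTE keeps the division in the outer query short, which also makes unbalanced parentheses easier to spot.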

Validation for columns work very slow (SQL Server)

I want to perform data profiling on the columns of a table. In this particular case - what percentage of data is date/integer/numeric/bit. The query that I am using:
SELECT
CAST(SUM(CASE WHEN TRY_CAST([column1] AS date) IS NOT NULL AND TRY_CAST(TRY_CAST([column1] AS VARCHAR(8000)) AS date) between '1950-01-01' AND '2049-12-31' AND LEN(RTRIM(CAST([column1] AS VARCHAR(MAX)))) > 0 THEN 1 ELSE 0 END) AS NUMERIC(25,2))/(CAST(SUM(CASE WHEN [column1] IS NOT NULL AND LOWER(LTRIM(RTRIM(CAST([column1] AS VARCHAR(MAX))))) not IN ('null', 'n/a') AND LEN(LTRIM(RTRIM(CAST([column1] AS VARCHAR(MAX))))) > 0 THEN 1 ELSE 0 END) AS NUMERIC(25,2))+0.00000001) AS PercentDate,
CAST(SUM(CASE WHEN TRY_CAST([column1] AS FLOAT) IS NOT NULL AND LEN(RTRIM(CAST([column1] AS VARCHAR(MAX)))) > 0 THEN 1 ELSE 0 END) AS NUMERIC(25,2))/(CAST(SUM(CASE WHEN [column1] IS NOT NULL AND LOWER(LTRIM(RTRIM(CAST([column1] AS VARCHAR(MAX))))) not IN ('null', 'n/a') AND LEN(LTRIM(RTRIM(CAST([column1] AS VARCHAR(MAX))))) > 0 THEN 1 ELSE 0 END) AS NUMERIC(25,2))+0.00000001) AS PercentNumeric,
CAST(SUM(CASE WHEN TRY_CAST([column1] AS BIGINT) IS NOT NULL AND LEN(RTRIM(CAST([column1] AS VARCHAR(MAX)))) > 0 THEN 1 ELSE 0 END) AS NUMERIC(25,2))/(CAST(SUM(CASE WHEN [column1] IS NOT NULL AND LOWER(LTRIM(RTRIM(CAST([column1] AS VARCHAR(MAX))))) not IN ('null', 'n/a') AND LEN(LTRIM(RTRIM(CAST([column1] AS VARCHAR(MAX))))) > 0 THEN 1 ELSE 0 END) AS NUMERIC(25,2))+0.00000001) AS PercentInteger,
CAST(SUM(CASE WHEN LOWER(TRY_CAST([column1] AS VARCHAR(MAX))) IN ('1', '0', 't', 'f', 'y', 'n', 'true', 'false', 'yes', 'no') THEN 1 ELSE 0 END) AS NUMERIC(25,2))/(CAST(SUM(CASE WHEN [column1] IS NOT NULL AND LOWER(LTRIM(RTRIM(CAST([column1] AS VARCHAR(MAX))))) not IN ('null', 'n/a') AND LEN(LTRIM(RTRIM(CAST([column1] AS VARCHAR(MAX))))) > 0 THEN 1 ELSE 0 END) AS NUMERIC(25,2))+0.00000001) AS PercentBit
FROM tbl
This query runs really slowly, even if I select only the top 1 row. In fact, I cannot get any result at all, or at least not within a time I am able to wait.
The column that I am checking is of type decimal, if that matters.
The table has 37,431,866 records. This is why I select only the top 1000, for example, but it still returns no result after more than 40 minutes.

If you want this to run faster, then you don't want to limit the rows in the query you are using. After all, an aggregation query with no GROUP BY only returns one row.
Instead use a subquery:
SELECT . . .
FROM (SELECT TOP (1000) t.*
FROM tbl t
) t
Note that this is not a random sample. And if you attempt ORDER BY newid() you will kill performance. One alternative to get an approximate n% sample is to use logic like this:
SELECT . . .
FROM (SELECT TOP (1000) t.*
FROM tbl t
WHERE RAND(CHECKSUM(NEWID())) < 0.001
) t
The 0.001 would be about a 0.1% sample.
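The point about where the row limit goes can be checked with a small sketch. This uses Python's built-in sqlite3 module, with LIMIT standing in for SQL Server's TOP (an assumption of this example, as are the table contents). A limit on the aggregate query applies to its single output row, so all rows are still aggregated; a limit inside the subquery restricts the rows before aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (column1 TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?)",
    [(str(i),) for i in range(5000)],  # invented sample data
)

# Limit applied to the aggregate's output: still aggregates every row.
outer_limit = conn.execute("SELECT COUNT(*) FROM tbl LIMIT 1000").fetchone()[0]

# Limit applied inside a subquery: only 1000 rows reach the aggregate.
inner_limit = conn.execute(
    "SELECT COUNT(*) FROM (SELECT * FROM tbl LIMIT 1000) AS t"
).fetchone()[0]

print(outer_limit, inner_limit)
```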

Your query can be simplified. The part:
CAST(SUM(CASE WHEN TRY_CAST([column1] AS date) IS NOT NULL AND TRY_CAST(TRY_CAST([column1] AS VARCHAR(8000)) AS date) between '1950-01-01' AND '2049-12-31' AND LEN(RTRIM(CAST([column1] AS VARCHAR(MAX)))) > 0 THEN 1 ELSE 0 END) AS NUMERIC(25,2))
can also be written as:
CAST(SUM(CASE WHEN TRY_CAST(TRY_CAST([column1] AS VARCHAR(8000)) AS date) between '1950-01-01' AND '2049-12-31' THEN 1 ELSE 0 END) AS NUMERIC(25,2))
The second one is quicker than the first one, with the same result (AFAIK): a NULL never satisfies BETWEEN, so the separate IS NOT NULL check is redundant.
This probably can also be applied to the other parts in the query.
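The reason the extra IS NOT NULL test can be dropped is that NULL BETWEEN a AND b evaluates to NULL, which a CASE WHEN treats as not true. A minimal sketch with Python's built-in sqlite3 module, using an invented table `t` that stores ISO date strings (SQLite has no TRY_CAST to date, so that part of the original expression is not reproduced here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (d TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("1999-05-01",), (None,), ("1900-01-01",), ("2020-12-31",)],  # invented data
)

verbose_count, short_count = conn.execute(
    """
    SELECT SUM(CASE WHEN d IS NOT NULL
                     AND d BETWEEN '1950-01-01' AND '2049-12-31'
                    THEN 1 ELSE 0 END),
           -- Same test without the explicit NULL check
           SUM(CASE WHEN d BETWEEN '1950-01-01' AND '2049-12-31'
                    THEN 1 ELSE 0 END)
    FROM t
    """
).fetchone()

print(verbose_count, short_count)
```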

SQL server Calculate with COUNT [duplicate]

This question already has an answer here:
How to use an Alias in a Calculation for Another Field
(1 answer)
Closed 4 years ago.
select
category, count(category) as 'TotalCounts',
COUNT(case kind when 'avail'then 1 else null end) as 'avail',
Count(case kind when 'offers' then 1 else null end) as 'offers',
COUNT(CASE contactMethod WHEN 'SMS' then 1 else null END) as 'SMS',
COUNT(case contactMethod when 'call' then 1 else null end) as 'call',
CONVERT(varchar(254),COUNT (case when max_biz_status='A' OR
max_biz_status ='B' then 1 else null end) * 100 / count(category)) +'%'
as 'Percetange'
from reports
group by category
order by TotalCounts
Instead of calculating the counts again inside CONVERT, I want to use avail * 100 / TotalCounts, the same way I used TotalCounts in the ORDER BY.
I tried:
CONVERT(varchar(254),avail * 100 / TotalCounts) +'%' as 'Percetange'
but I get 'invalid column name' for avail and TotalCounts.

You can't do that because the TotalCounts alias is only defined in the result set, so it cannot be referenced elsewhere in the same SELECT list.
You can wrap the query in a subquery and do the calculation in the outer query.
If your SQL Server version supports the CONCAT function, you can use it to make the SQL clearer.
SELECT t1.*,CONCAT((max_biz_statusCnt * 100 /TotalCounts),'%')as 'Percetange'
FROM
(
select
category,
count(*) as 'TotalCounts',
COUNT(case kind when 'avail'then 1 else null end) as 'avail',
Count(case kind when 'offers' then 1 else null end) as 'offers',
COUNT(CASE contactMethod WHEN 'SMS' then 1 else null END) as 'SMS',
COUNT(case contactMethod when 'call' then 1 else null end) as 'call',
COUNT (case when max_biz_status='A' OR max_biz_status ='B' then 1 else null end) 'max_biz_statusCnt'
from reports
group by category
) t1
order by TotalCounts

You can't use avail or TotalCounts in the same SELECT list that creates them, because they aren't in scope yet. Using a common table expression is one way to fix this:
WITH cte AS (
SELECT
category,
COUNT(category) AS TotalCounts,
COUNT(case kind WHEN 'avail' THEN 1 ELSE NULL END) AS avail,
COUNT(case kind WHEN 'offers' THEN 1 ELSE NULL END) AS offers,
COUNT(CASE contactMethod WHEN 'SMS' THEN 1 ELSE NULL END) AS SMS,
COUNT(case contactMethod WHEN 'call' THEN 1 ELSE NULL END) AS [call]
FROM
reports
GROUP BY
category)
SELECT
*,
CONVERT(varchar(254),avail * 100 / TotalCounts) +'%' AS Percetange --(sic)
FROM
cte
ORDER BY
TotalCounts;
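Here is a trimmed, runnable sketch of the CTE approach, using Python's built-in sqlite3 module in place of SQL Server. The sample rows are invented, only the avail count is kept, and SQLite's `||` operator stands in for CONVERT(...) + '%'. Like SQL Server, SQLite divides integers as integers, so the percentage is truncated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (category TEXT, kind TEXT)")
conn.executemany(
    "INSERT INTO reports VALUES (?, ?)",
    [  # invented sample data
        ("x", "avail"), ("x", "avail"), ("x", "offers"), ("x", "avail"),
        ("y", "offers"), ("y", "avail"),
    ],
)

rows = conn.execute(
    """
    WITH cte AS (
        SELECT category,
               COUNT(category) AS TotalCounts,
               COUNT(CASE kind WHEN 'avail' THEN 1 END) AS avail
        FROM reports
        GROUP BY category
    )
    -- The aliases defined inside the CTE are ordinary columns out here.
    SELECT category, TotalCounts, avail,
           (avail * 100 / TotalCounts) || '%' AS Percetange
    FROM cte
    ORDER BY TotalCounts
    """
).fetchall()

for row in rows:
    print(row)
```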

count categories of values

I have int values v >= 0 in a nullable column, and I would like to count the number of occurrences of NULL, 0, 1 and 2+ in the column. How can I do this efficiently?

One method is group by:
select (case when col is null then 'NULL'
when col in (0, 1) then cast(col as varchar(255))
else '2+'
end) as grp, count(*)
from t
group by (case when col is null then 'NULL'
when col in (0, 1) then cast(col as varchar(255))
else '2+'
end)
order by min(col);
The exact syntax for the cast() might depend on the database, as does where the NULL group sorts. This also assumes all values are non-negative.
You can put the counts in different columns as well:
select sum(case when col is null then 1 else 0 end) as cnt_null,
sum(case when col = 0 then 1 else 0 end) as cnt_0,
sum(case when col = 1 then 1 else 0 end) as cnt_1,
sum(case when col >= 2 then 1 else 0 end) as cnt_2pl
from t;
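A small runnable sketch of the one-row, several-columns variant, using Python's built-in sqlite3 module. The table contents are invented, and a cnt_null column is included so that all four buckets from the question (NULL, 0, 1, 2+) are covered in a single pass over the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [(None,), (0,), (0,), (1,), (2,), (5,), (None,)],  # invented sample data
)

cnt_null, cnt_0, cnt_1, cnt_2pl = conn.execute(
    """
    SELECT SUM(CASE WHEN col IS NULL THEN 1 ELSE 0 END),
           SUM(CASE WHEN col = 0 THEN 1 ELSE 0 END),
           SUM(CASE WHEN col = 1 THEN 1 ELSE 0 END),
           SUM(CASE WHEN col >= 2 THEN 1 ELSE 0 END)
    FROM t
    """
).fetchone()

print(cnt_null, cnt_0, cnt_1, cnt_2pl)
```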

Same part of a subquery in multiple select

I have a table like this
TABLEMAIN
Q1 Name Group Zone Month Type
1 'N1' 'G1' 'Z1' 12 'T1'
4 'N1' 'G3' 'Z2' 12 'T6'
6 'N1' 'G1' 'Z5' 12 'T2'
3 'N2' 'G4' 'Z5' 12 'T4'
.
.
.
And I have something like this to get certain results:
Query1:
select
(SUM(CASE WHEN Q1>=2 and Q1<=4 THEN 1 ELSE 0 END)) TOTAL,
(CASE WHEN Type = 'T1' THEN SUM(CASE WHEN Q1=4 THEN 1 ELSE 0 END) END) T1TYPE,
(CASE WHEN Type = 'T1' THEN SUM(CASE WHEN Q1=4 THEN 1 ELSE 0 END) END) T2TYPE,
Type,
Zone,
Month
from
TABLEMAIN
GROUP BY Type, Zone, Month;
Query2:
select
(SUM(CASE WHEN Q1>=2 and Q1<=4 THEN 1 ELSE 0 END)) TOTAL,
(CASE WHEN Type = 'T1' THEN SUM(CASE WHEN Q1=4 THEN 1 ELSE 0 END) END) T1TYPE,
(CASE WHEN Type = 'T1' THEN SUM(CASE WHEN Q1=4 THEN 1 ELSE 0 END) END) T2TYPE,
Type,
Group,
Month
from
TABLEMAIN
GROUP BY Type, Group, Month;
As you can see, I group this table many times in many ways (by Zone in Query1, by Group in Query2), but this part is the same in every query:
select
(SUM(CASE WHEN Q1>=2 and Q1<=4 THEN 1 ELSE 0 END)) TOTAL,
(CASE WHEN Type = 'T1' THEN SUM(CASE WHEN Q1=4 THEN 1 ELSE 0 END) END) T1TYPE,
(CASE WHEN Type = 'T1' THEN SUM(CASE WHEN Q1=4 THEN 1 ELSE 0 END) END) T2TYPE,
Is there a better way to do this? I'm not sure if I can use a materialized view for this.

Perhaps. You can do it all in one query, if you like, by using grouping sets:
select SUM(CASE WHEN Q1>=2 and Q1<=4 THEN 1 ELSE 0 END) as TOTAL,
(CASE WHEN Type = 'T1' THEN SUM(CASE WHEN Q1=4 THEN 1 ELSE 0 END) END) as T1TYPE,
(CASE WHEN Type = 'T1' THEN SUM(CASE WHEN Q1=4 THEN 1 ELSE 0 END) END) as T2TYPE,
Type, Zone, Group, Month
from TABLEMAIN
GROUP BY GROUPING SETS((Type, Zone, Month), (Type, Group, Month));
This puts all the results in a single table.
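To show what the combined result looks like, here is a sketch using Python's built-in sqlite3 module. SQLite has no GROUPING SETS, so this swaps in a UNION ALL of the two GROUP BY queries, which yields the same rows as the two grouping sets; the column that is not part of a given grouping comes back as NULL. The lowercase column names (with `grp` instead of the reserved word Group) are assumptions of this example, and only the TOTAL column is computed. The four rows are the ones shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablemain "
    "(q1 INTEGER, name TEXT, grp TEXT, zone TEXT, month INTEGER, type TEXT)"
)
conn.executemany(
    "INSERT INTO tablemain VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "N1", "G1", "Z1", 12, "T1"),
        (4, "N1", "G3", "Z2", 12, "T6"),
        (6, "N1", "G1", "Z5", 12, "T2"),
        (3, "N2", "G4", "Z5", 12, "T4"),
    ],
)

rows = conn.execute(
    """
    -- Grouping 1: (type, zone, month); grp is not part of it, so it is NULL
    SELECT type, zone, NULL AS grp, month,
           SUM(CASE WHEN q1 >= 2 AND q1 <= 4 THEN 1 ELSE 0 END) AS total
    FROM tablemain
    GROUP BY type, zone, month
    UNION ALL
    -- Grouping 2: (type, grp, month); zone is not part of it, so it is NULL
    SELECT type, NULL AS zone, grp, month,
           SUM(CASE WHEN q1 >= 2 AND q1 <= 4 THEN 1 ELSE 0 END) AS total
    FROM tablemain
    GROUP BY type, grp, month
    """
).fetchall()

by_zone = [r for r in rows if r[1] is not None]
by_grp = [r for r in rows if r[2] is not None]
print(len(by_zone), len(by_grp))
```

On a database that supports GROUPING SETS, the single-query form reads the table once instead of once per grouping, which is the main reason to prefer it over the UNION ALL used here.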

I second @GolezTrol's comment and would like to explain further.
Subquery factoring is what you need. The WITH clause, or subquery factoring clause, is part of the SQL-99 standard and was added to the Oracle SQL syntax in Oracle 9.2. The WITH clause may be processed as an inline view or resolved as a temporary table. The advantage of the latter is that repeated references to the subquery may be more efficient, as the data is easily retrieved from the temporary table rather than being requeried by each reference.
WITH data AS(
<your subquery>
)
SELECT * FROM data
bla bla bla...
