Why is this query returning an error? I am trying to load the table code as a constant string, the data flag (again a constant string), the time of insertion, and the row count for a table. I thought I would try running the select before writing the inserts.
But for some reason it fails, listing the column names of the table I am trying to get a count from. All I need is two constant values, one date, and one count. I tried removing the group by as well; that throws another error.
hive -e "select "WEB" as src_cd, "1Hr" as Load_Flag, from_unixtime((unix_timestamp(substr(sysDate, 0, 11), 'dd/MMM/yyyy')), 'MM/dd/yyyy') as time, count(*)
from weblog
where year=2015 and month=04 and day=17
group by src_cd, load_flag, time
;
"
OK
Time taken: 1.446 seconds
FAILED: SemanticException [Error 10004]: Line 4:9 Invalid table alias or column reference 'src_cd': (possible column names are: clientip, authuser, sysdate, clfrequest.........(and so on) year, month, day)
The double quotes around the literals are the problem: the whole script is already wrapped in shell double quotes, so the inner double quotes are consumed by the shell before Hive ever sees them. Here is a simpler version that I tested successfully:
hive -e "select 'WEB', '1Hr', from_unixtime((unix_timestamp(substr(sysDate, 0, 11), 'dd/MMM/yyyy')), 'MM/dd/yyyy') as time, count(*)
from weblog
where year=2015 and month=04 and day=17
group by 1, 2, from_unixtime((unix_timestamp(substr(sysDate, 0, 11), 'dd/MMM/yyyy')), 'MM/dd/yyyy');"
Just leave the constants out of the group by; they aren't doing anything:
select "WEB" as src_cd, "1Hr" as Load_Flag,
from_unixtime((unix_timestamp(substr(sysDate, 0, 11), 'dd/MMM/yyyy')), 'MM/dd/yyyy') as time, count(*)
from weblog
where year = 2015 and month = 04 and day = 17
group by from_unixtime((unix_timestamp(substr(sysDate, 0, 11), 'dd/MMM/yyyy')), 'MM/dd/yyyy')
I don't think Hive allows column aliases in the group by, so you need to put in the entire expression or use a subquery/CTE.
There are two things here.
1. Because the script is wrapped in shell double quotes, the shell strips the inner double quotes, so Hive sees WEB and 1Hr as bare column references. Use single quotes for string literals inside the double-quoted -e script (backquotes in Hive are for identifiers, not literals).
2. In the group by clause, use either the column position specifier or the full expression, not the alias.
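One way to see why the double-quoted literals go wrong: the outer shell double quotes consume the inner ones before Hive receives the script. A sketch with Python's `shlex`, which follows POSIX shell tokenization (the query text is abbreviated for illustration):

```python
import shlex

# With inner double quotes, the shell concatenates the pieces and the
# quotes vanish, so Hive receives WEB as a bare identifier.
print(shlex.split('hive -e "select "WEB" as src_cd from weblog"'))
# ['hive', '-e', 'select WEB as src_cd from weblog']

# Single quotes inside the double-quoted -e script survive intact,
# so Hive receives a proper string literal.
print(shlex.split("hive -e \"select 'WEB' as src_cd from weblog\""))
# ['hive', '-e', "select 'WEB' as src_cd from weblog"]
```

The first token list is exactly what produced the "Invalid table alias or column reference" behavior: Hive never saw a quoted literal at all.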
I have the query:
WITH STAN_IND
AS (
SELECT ro.kod_stanow, ro.ind_wyrob||' - '||ro.LP_OPER INDEKS_OPERACJA, count(*) ILE_POWT
FROM M_REJ_OPERACJI ro
JOIN M_TABST st ON st.SYMBOL = ro.kod_stanow
WHERE (st.KOD_GRST starting with 'F' or (st.KOD_GRST starting with 'T') ) AND ro.DATA_WYKON>'NOW'-100
GROUP BY 1,2)
SELECT S.kod_stanow, count(*) ILE_INDEKS, SUM(ILE_POWT-1) POWTORZEN
from STAN_IND S
GROUP BY S.kod_stanow
ORDER BY ILE_INDEKS
That should be working, but I get an error:
SQL Error [335544606] [42000]: Dynamic SQL Error; expression evaluation not supported; Strings cannot be added or subtracted in dialect 3 [SQLState:42000, ISC error code:335544606]
I tried casting it to a bigger varchar, but still no success. What is wrong here? The database is Firebird 2.1.
Your problem is 'NOW'-100. The literal 'NOW' is not a date/timestamp by itself but a CHAR(3) literal. Only when it is compared to (or assigned to) a date or timestamp column will it be converted, and here the subtraction happens before that point. The subtraction fails because subtracting a number from a string literal is not defined.
Use CAST('NOW' as TIMESTAMP) - 100 or CURRENT_TIMESTAMP - 100 (or cast to DATE or use CURRENT_DATE if the column DATA_WYKON is a DATE).
I have dumped a file into a temp table and am trying to pull records where the score is <= 100. The column has values such as 3, 4.50, 10.02, 99.88, 99, 99, 100, 100.2, 100, 116, 116.44, etc. I need only 3, 4.50, 10.02, 99.88, 99, 99, 100. When I tried the query below, it did not give the proper result. Could anyone advise?
SELECT CAST(sum AS numeric ),* FROM temp WHERE sum >= '100' ;
There are two issues here. First, you're using the >= operator instead of the <= operator. Second, sum appears to be a string column, so the comparison is performed lexicographically. Apply the same cast you used in the select list and compare the value to the number literal 100, not the string literal '100':
SELECT CAST(sum AS numeric), * FROM temp WHERE CAST(sum AS numeric) <= 100;
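A minimal sketch of the lexicographic pitfall, using SQLite in place of the actual database (the engine, table name, and sample values here are illustrative; text comparison behaves the same way):

```python
import sqlite3

# Minimal reproduction: a text "score" column compared against a string
# literal sorts character by character, not numerically.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE temp (sum TEXT)")
con.executemany("INSERT INTO temp VALUES (?)",
                [("3",), ("4.50",), ("99.88",), ("100",), ("100.2",), ("116.44",)])

# Lexicographic: '3', '4.50', '99.88' all start with a character > '1',
# so they compare greater than '100' and get filtered out.
print([r[0] for r in con.execute("SELECT sum FROM temp WHERE sum <= '100'")])
# ['100']

# Casting first gives the intended numeric comparison.
print([r[0] for r in con.execute(
    "SELECT sum FROM temp WHERE CAST(sum AS numeric) <= 100")])
# ['3', '4.50', '99.88', '100']
```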
I would like to run the gapfill function of TimescaleDB in a way where the start and end dates are generated automatically. For example, I would like to run the gapfill function between the smallest and the largest timestamps in the table.
Given dataset playground:
CREATE TABLE public.playground (
value1 numeric,
"timestamp" bigint,
name "char"
);
INSERT INTO playground(name, value1, timestamp)
VALUES ('test', 100, 1599100000000000000);
INSERT INTO playground(name, value1, timestamp)
VALUES ('test', 100, 1599100001000000000);
INSERT INTO playground(name, value1, timestamp)
VALUES ('test', 100, 1599300000000000000);
I have tried getting the data as such:
SELECT time_bucket_gapfill(300E9::BIGINT, timestamp) as bucket
FROM playground
WHERE
timestamp >= (SELECT COALESCE(MIN(timestamp), 0) FROM playground)
AND
timestamp < (SELECT COALESCE(MAX(timestamp), 0) FROM playground)
GROUP BY bucket
I get an error:
ERROR: missing time_bucket_gapfill argument: could not infer start from WHERE clause
If I try the query with hard coded timestamps, the query runs just fine.
For example:
SELECT time_bucket_gapfill(300E9::BIGINT, timestamp) as bucket
FROM playground
WHERE timestamp >= 0 AND timestamp < 15900000000000000
GROUP BY bucket
Another approach, providing the start and end dates as arguments to the gapfill function, fails as well.
WITH bounds AS (
SELECT COALESCE(MIN(timestamp), 0) as min, COALESCE(MAX(timestamp), 0) as max
FROM playground
WHERE timestamp >= 0 AND timestamp < 15900000000000000
),
gapfill as(
SELECT time_bucket_gapfill(300E9::BIGINT, timestamp, bounds.min, bounds.max) as bucket
FROM playground, bounds
GROUP BY bucket
)
select * from gapfill
ERROR: invalid time_bucket_gapfill argument: start must be a simple expression
time_bucket_gapfill only accepts start and finish values that can be evaluated to constants at query planning time. So providing expressions built from constants and now() works, but accessing a table in those expressions does not.
While this limitation on time_bucket_gapfill is in place, it is not possible to achieve the desired behaviour in a single query. The workaround is to calculate the values for start and finish separately and then provide them to the query with time_bucket_gapfill, which can be done in a stored procedure or in the application.
A side note: if a PREPARE statement is used in PostgreSQL 12, it is important to explicitly disable generic plans for the same reason.
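A minimal sketch of the application-side workaround: fetch the bounds with one query, then build the gapfill query with the bounds inlined as constants. The helper name and query strings are illustrative, and actually running them requires a PostgreSQL driver of your choice; inlining is safe here only because the values are bigints, not arbitrary strings.

```python
# Step 1 (run against the database first): fetch the bounds as plain values.
BOUNDS_SQL = ("SELECT COALESCE(MIN(timestamp), 0), COALESCE(MAX(timestamp), 0) "
              "FROM playground")

# Step 2: build the gapfill query with the bounds inlined as integer
# constants, which satisfies time_bucket_gapfill's planning-time rule.
def build_gapfill_query(min_ts: int, max_ts: int) -> str:
    return (
        "SELECT time_bucket_gapfill(300E9::BIGINT, timestamp, "
        f"{min_ts}, {max_ts}) AS bucket "
        "FROM playground "
        f"WHERE timestamp >= {min_ts} AND timestamp < {max_ts} "
        "GROUP BY bucket"
    )

print(build_gapfill_query(1599100000000000000, 1599300000000000000))
```

The string returned by step 2 is then executed as a second round trip; the same two-step shape works inside a stored procedure.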
For inferring start and stop from the WHERE clause, only direct column references are supported; see https://github.com/timescale/timescaledb/issues/1345.
So something like the following might work (I have no TimescaleDB access to test), but try this:
SELECT
time_bucket_gapfill(300E9::BIGINT, time_range.min , time_range.max ) AS bucket
FROM
(
SELECT
COALESCE(MIN(timestamp), 0) AS min
, COALESCE(MAX(timestamp), 0) AS max
FROM
playground
) AS time_range
, playground
WHERE
timestamp >= time_range.min
AND timestamp < time_range.max
GROUP BY
bucket;
In the following query, is the to_date function executed multiple times or just once? This query is running long and I'm trying to find a way around it without asking for an index to be added.
select edi_stat_rsn_cd
, TSET_ID
, count(*) as count
from comt_po_msg
where TSET_ID in ('PSH','ORD','850','870')
and trunc(crt_ts) = to_date('03-06-2017','mm-dd-yyyy') --here
and LOC_TYP_CD in ('BOSS','STH')
group by edi_stat_rsn_cd, TSET_ID;
Try the query below; this avoids calling TRUNC() for every row:
SELECT EDI_STAT_RSN_CD
, TSET_ID
, COUNT(*) AS COUNT
FROM COMT_PO_MSG
WHERE TSET_ID IN ('PSH','ORD','850','870')
--AND TRUNC(CRT_TS) = TO_DATE('03-06-2017','MM-DD-YYYY') --HERE
AND TO_CHAR(CRT_TS, 'MM-DD-YYYY') = '03-06-2017'
AND LOC_TYP_CD IN ('BOSS','STH')
GROUP BY EDI_STAT_RSN_CD
, TSET_ID;
A reasonable optimizer should only call deterministic functions once on constants. But why bother? Instead, use the date keyword to include a date constant:
where TSET_ID in ('PSH', 'ORD', '850', '870') and
trunc(crt_ts) = date '2017-03-06' and
LOC_TYP_CD in ('BOSS', 'STH')
This has the additional advantage of allowing standard date formats.
The use of trunc(crt_ts) can prevent the optimizer from using an index -- which is a much bigger consideration than calling to_date(). I would recommend:
where TSET_ID in ('PSH', 'ORD', '850', '870') and
crt_ts >= date '2017-03-06' and
crt_ts < date '2017-03-07' and
LOC_TYP_CD in ('BOSS', 'STH')
As already commented, it should be treated as static: to_date('03-06-2017','mm-dd-yyyy') is a constant expression, so the to_date() call does not depend on any column value and is not re-evaluated per record.
I want to subtract a date returned from another query from the system time in Oracle SQL. So far I have been able to use the result of the other query, but when I try to subtract it from systimestamp I get the following error:
ORA-01722: invalid number
'01722. 00000 - "invalid number"
*Cause: The specified number was invalid.
*Action: Specify a valid number.
Below is my query
select round(to_number(systimestamp - e.last_time) * 24) as lag
from (
select ATTR_VALUE as last_time
from CONFIG
where ATTR_NAME='last_time'
and PROCESS_TYPE='new'
) e;
I have also tried this
select to_char(sys_extract_utc(systimestamp)-e.last_time,'YYYY-MM-DD HH24:MI:SS') as lag
from (
select ATTR_VALUE as last_time
from CONFIG
where ATTR_NAME='last_time'
and PROCESS_TYPE='new'
) e;
I want the difference between the time intervals to be in hours.
Thank you for any help in advance.
P.S. The datatype of ATTR_VALUE is VARCHAR2(150). A sample result of e.last_time is 2016-09-05 22:43:81796
"its VARCHAR2(150). That means I need to convert that to date"
ATTR_VALUE is a string, so yes, you need to convert it to the correct type before attempting to compare it with another datatype. Given your sample data, the correct type would be timestamp, in which case your subquery should be:
(
select to_timestamp(ATTR_VALUE, 'yyyy-mm-dd hh24:mi:ss.ff5') as last_time
from CONFIG
where ATTR_NAME='last_time'
and PROCESS_TYPE='new'
)
The assumption is that your sample is representative of all the values in your CONFIG table for the given keys. If you have values in different formats, your query will break in some other way: that's the danger of using this approach.
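The same fragility can be sketched with Python's `strptime`, which, like a fixed Oracle format mask, assumes every value matches one format (the format string and sample values here are illustrative):

```python
from datetime import datetime

# A fixed format mask converts only values that match it exactly.
fmt = "%Y-%m-%d %H:%M:%S"
print(datetime.strptime("2016-09-05 22:43:16", fmt))  # parses fine

# A row stored in a different format raises instead of converting,
# analogous to the query breaking on mixed-format ATTR_VALUE data.
try:
    datetime.strptime("05/09/2016 22:43", fmt)
except ValueError as exc:
    print("parse failed:", exc)
```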
So finally, after a lot of trial and error, I got it working.
1. It turns out the initial error was because the datatype of e.last_time was VARCHAR2(150).
To find out the datatype of a given column in the table I used
desc <table_name>
which in my case was desc CONFIG
2. To convert the VARCHAR2 to a date/timestamp there are two options, to_timestamp and to_date. If I use to_timestamp, like
select round((systimestamp - to_timestamp(e.last_time,'YYYY-MM-DD HH24:MI:SSSSS')) * 24, 2) as lag
from (
select ATTR_VALUE as last_time
from CONFIG
where ATTR_NAME='last_time'
and PROCESS_TYPE='new'
) e;
I get an error that round() expects a NUMBER but got an INTERVAL DAY TO SECOND, since the difference of two timestamps comes out like +41 13:55:20.663990. Converting that into hours would require more complex logic.
The alternative is to use to_date, which I preferred and used as:
select round((sysdate - to_date(e.last_time,'YYYY-MM-DD HH24:MI:SSSSS')) * 24, 2) as lag
from (
select ATTR_VALUE as last_time
from CONFIG
where ATTR_NAME='last_time'
and PROCESS_TYPE='new'
) e;
This returns the desired result, i.e. the difference in hours rounded to 2 decimal places.
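The arithmetic behind (sysdate - to_date(...)) * 24 can be checked outside the database. A sketch with Python's datetime (the two timestamps are made up for illustration; Oracle's DATE subtraction yields days, so multiplying by 24 yields hours):

```python
from datetime import datetime

# Stand-ins for to_date(e.last_time, ...) and sysdate.
last_time = datetime(2016, 9, 5, 22, 43, 16)
now = datetime(2016, 9, 7, 10, 13, 16)

# Difference in days, as Oracle date subtraction returns, then * 24 for
# hours, rounded to 2 decimal places as in the query above.
diff_days = (now - last_time).total_seconds() / 86400
lag_hours = round(diff_days * 24, 2)
print(lag_hours)  # 35.5  (1 day 11h30m)
```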