Between Clause For dates not working in Impala - impala

My table EMPLOYEE contains loaddate column having string datatype.
In this column dates are stored like
'2019-12-8',
'2019-12-9',
'2019-12-10',
'2019-12-11',
'2019-12-12',
'2019-12-13',
'2019-12-14',
'2019-12-15',
when I am running below query
SELECT * FROM employee where loaddate between '2019-12-8' and '2019-12-14';
I am getting 0 records.
But when I am running below query getting records but only from 2019-12-10 to 2019-12-14' though table contains records for 8 & 9
SELECT * FROM employee where loaddate between '2019-12-08' and '2019-12-14';
The difference between above two queries is 2019-12-8 and 2019-12-08.
So don't understand why first query is not giving any record while second query is giving records but not all.

The answer lies in understanding of below parts.
1. Between operator functionality.
Refer : impala between operator
BETWEEN Operator : expression BETWEEN lower_bound AND upper_bound
a. In a WHERE clause, compares an expression to both a lower_bound and upper_bound.
b. The comparison is successful if the expression is greater than or equal to the lower_bound, and less than or equal to the upper_bound.
c. If the bound values are switched, so the lower_bound is greater than the upper_bound, does not match any values.
2. datatype used is string but between operator is typically used with numeric data types as per the documentation.
3. order by of the loaddate
Clearly the value '2019-12-8' is greater than '2019-12-15' in the ascending order. Insert two more values '2019-12-08' and '2019-12-09' they will come at the top when same order by query is run since they will be less than '2019-12-10'.
SELECT loaddate FROM employee order by loaddate;
'2019-12-10'
'2019-12-11'
'2019-12-12'
'2019-12-13'
'2019-12-14'
'2019-12-15'
'2019-12-8'
'2019-12-9'
SELECT * FROM employee where loaddate between '2019-12-8' and '2019-12-14'; ---zero results because lower_bound('2019-12-8') is the larger value than upper_bound('2019-12-14'). (case c)
SELECT * FROM employee where loaddate between '2019-12-08' and '2019-12-14'; ---getting records from 2019-12-10 to 2019-12-14 because 08 and 09 don't exist.
SELECT * FROM employee where loaddate between '2019-12-08' and '2019-12-9'; ---this will fetch all the data in the sample considering 10 is greater than 08(assuming 08 and 09 are not inserted yet).

Related

Firebird select distinct with count

In Firebird 2.5 I have a table of hardware device events; each row contains a timestamp, a device ID and an integer status of the event. I need to retrieve a rowset of the subset of IDs with non-0 statuses and the number of instances of the non-0 events for each ID, within a specified date range. I can get the subset of IDs with non-0 statuses in the specified date range, but I can't figure out how to get the count of non-0-status rows associated with each ID in the same rowset. I'd prefer to do this in a query rather than a stored proc, if possible.
The table is:
RPR_HISTORY
TSTAMP timestamp
RPRID integer
PARID integer
LASTRES integer
LASTCUR float
The rowset I want is like
RPRID ERRORCOUNT
-------------------
18 4
19 2
66 7
The query
select distinct RPRID from RPR_HISTORY
where (LASTRES <> 0)
and (TSTAMP >= :STARTSTAMP);
gives me the IDs I'm looking for, but obviously not the count of non-0-status rows for each ID. I've tried a bunch of combinations of nested queries derived from the above; all generate errors, usually on grouping or aggregation errors. It seems like a straightforward thing to do but is just escaping me.
Got it! The query
select rh.RPRID, count(rh.RPRID) from RPR_HISTORY rh
where (rh.LASTRES <> 0)
and (rh.TSTAMP >= :STARTSTAMP)
and rh.RPRID in
(select distinct rd.RPRID from RPR_HISTORY rd where rd.LASTRES <> 0)
group by rh.RPRID;
returns the rowset I need.

SQL JOIN with CASE statement result

Is there any way of joining the result of a case statement with a reference table without creating a CTE, ect.
Result AFTER CASE statement:
ID Name Bonus Level (this is the result of a CASE statement)
01 John A
02 Jim B
01 John B
03 Jake C
Reference table
A 10%
B 20%
C 30%
I want to then get the % next to each employee, then the max %age using the MAX function and grouping by ID, then link it back again to the reference so that each employee has the single correct (highest) bonus level next to their name. (This is a totally fictitious scenario, but very similar to what I am looking for).
Just need help with joining the result of the CASE statement with the reference table.
Thanks in advance.
In place of a temporary value as the result of the case statement, you could use a select statement from the reference table.
So if your case statement looks like:
case when variable1=value then bonuslevel =A
Then, replacing it like this might help
case when variable1=value then (select percentage from ReferenceTable where variable2InReferenceTable=A)
Don't know if I am overly simplifying, but based on the results of your case result query, why not just join that to the reference table, and do a max grouped by ID/Name. Since the ID and persons name wont change anyhow since they are the same person, you are just getting the max you want. To complete the Bonus level, rejoin just that portion after the max percentage determined for the person.
select
lvl1.ID,
lvl1.Name,
lvl1.FinalBonus,
rt2.BonusLvl
from
( select
PQ.ID,
PQ.Name,
max( rt.PcntBonus ) as FinalBonus
from
(however you
got your
data query ) PQ
JOIN RefTbl rt
on PQ.BonusLvl = rt.BonusLvl
) lvl1
JOIN RefTbl rt2
on lvl1.FinalBonus = rt2.PcntBonus
Since the Bonus levels (A,B,C) do not guarantee corresponding % levels (10,20,30), I did it this way... OTHERWISE, you could have just used max() on both the bonus level and percent. But what if your bonus levels were listed as something like
Limited 10%
Aggressive 20%
Ace 30%
You could see that a max of the level above would have "Limited", but the max % = 30 is associated with an "Ace" sales rep... Get the 30% first, then see what the label that matched that is.

How can I retrieve data from a Hive table from two columns with non null values and top 500 records in one query?

I have a Hive table (my_table) which is in ORC format and has 30 columns. Two of the columns (col_us, col_ds) store numeric values which can be 0 or null or some integer. The table is partitioned on the bases of day and hourly.
The table has approx. 8 Million x 96 records in a days partition and I am referring to 15 daily partitions
Currently I am running separate queries to retrieve top 500 records with value greater than 0 using a rank function. One query to retrieve col_us and other for col_ds
It is possible that clo_US may have a numeric value while col_DS is 0 or null
Question:
I want to retrieve top 500 non null and non 0 records from each of these columns from one query.
My Query:
From(
SELECT D.COL_US, D.DATESTAMP,
ROW_NUMBER() OVER (PARTITION BY D.ID,D.SUB_ID ORDER BY CONCAT (D.DATESTAMP,D.HOURSTAMP,D.TIMESTAMP) DESC) AS RNK
FROM ${wf_table_name} D
WHERE DATESTAMP >= '${datestamp_15}' AND DATESTAMP < '${datestamp}'
AND COL_US > 0)T
INSERT OVERWRITE TABLE ${wf_us_table}
SELECT T.COL_US, T.DATESTAMP, T.RNK WHERE T.RNK < 500;
As per your query I can guess that you are trying to get top 500 rows from your table based on date/time that means latest 500 rows where col_us, col_ds both have a value which is >0 but not top 500 from each of these columns.
As per your question your table may have 2 type of value. for example.
col_us
0
NULL
10
5
col_ds
5
10
0
NULL
or both column may have >0 value.
So instead of 'AND COL_US > 0' under WHERE clause use 'AND (COL_US > 0 and col_ds > 0)'
But with this condition you will not get any value from above stated 4 rows.
So if you want to get 10,5 from col_us along with 5,10 col_ds then I should say it's not possible using a single query.
Again, as per your question stated "I want to retrieve top 500 non null and non 0 records from each of these columns from one query." ,
I can guess that you want to get top 500 records from col_us, col_ds depends on the value of col_us/col_ds then you must have to use these columns within rank clause instead of date/time.
What you want to retrieve you may get by UPDATE query depending on other available columns but before that I want to request you to share exactly what you want (top 500 based on col_us/col_ds or latest 500) along with your base and target table structure.

Selecting filtered values from Oracle using ROWNUM

I have a requirement wherein i need to find the record number of the records that are returned from the resultset. I know that i can use ROWNUM to get the record number from the resultset but my issue is slightly different. below are the details
Table : ProcessSummary
Columns:
PS_PK ProcessId StepId AsscoiateId ProcessName AssetAmount
145 25 50 Process1 3,500.00
267 26 45 Process2 4,400.00
356 27 70 Process3 2,400.00
456 28 80 90 Process4 780.00
556 29 56 67 Process5 4,500.00
656 45 70 Process6 6,000.00
789 31 75 Process7 8,000.00
Now what i need to do is fetch all the records from the ProcessSummary Table when either of ProcessId OR StepId OR AssociateId is NULL. I wrote the below query
select * from ProcessSummary where ProcessId IS NULL OR StepId IS NULL OR AsscoiateId IS NULL
As expected i got 1st, 2nd, 3rd, 6th and 7th records in the resultset that got returned.
Now what i need is to get the records numbers 1,2,3,6,7. I tried to use the ROWNUM as below but i got the values of 1,2,3,4,5 and not 1,2,3,6,7.
select ROWNUM from ProcessSummary where ProcessId IS NULL OR StepId IS NULL OR AsscoiateId IS NULL
Is it possible to get the ROWNUM values in the sequence that i want and if yes then can you please let me know how can i do this. Also if ROWNUM cannot be used then what would be the other option that i can use to get the result in the form that i want.
Any help would be greately appericiated as i could not find much on the net or SO regarding this sort of requirement.
Thanks
Vikeng21
rownum is an internal numbering that gives you a row number based on the current query results only, so that numbering is not tied to a specific record, and it will change when you change the data or the query.
But the numbering you ask for is already in your table. It looks like you just need to SELECT PS_PK .. instead. PS_PK is the field in your table that contains the actual number you want.
You can generate a numbering using an analytical function, and then filter that query. You need some fields to order by, though. In this case I've chosen PS_PK, but it can be another field, like ProcessName or a combination of other fields as well.
select
*
from
(select
dense_rank() over (order by PS_PK) as RANKING,
p.*
from
ProcessSummary p)
where
ProcessId IS NULL OR StepId IS NULL OR AsscoiateId IS NULL
So, in this query, first a numbering is calculated for each row that is returned from the inner query. The numbering is returned as the field RANKING. And then the other query filters further, but still will return the field RANKING with the original numbering.
Instead of dense_rank there is also rank and row_number. The differences are subtle, but you can just experiment and read some docs here and here to learn about the differences and see which one fits you best.
Note that this might slow down your query, because the inner query first generates a number for each row in the table (there is no filtering on that level now).

How to get three count values from same column using SQL in Access?

I have a table that has an integer column from which I am trying to get a few counts from. Basically I need four separate counts from the same column. The first value I need returned is the count of how many records have an integer value stored in this column between two values such as 213 and 9999, including the min and max values. The other three count values I need returned are just the count of records between different values of this column. I've tried doing queries like...
SELECT (SELECT Count(ID) FROM view1 WHERE ((MyIntColumn BETWEEN 213 AND 9999));)
AS Value1, (SELECT Count(ID) FROM FROM view1 WHERE ((MyIntColumn BETWEEN 500 AND 600));) AS Value2 FROM view1;
So there are for example, ten records with this column value between 213 and 9999. The result returned from this query gives me 10, but it gives me the same value of 10, 618 times which is the number of total records in the table. How would it be possible for me to only have it return one record of 10 instead?
Use the Iif() function instead of CASE WHEN
select Condition1: iif( ), condition2: iif( ), etc
P.S. : What I used to do when working with Access was have the iif() resolve to 1 or 0 and then do a SUM() to get the counts. Roundabout but it worked better with aggregation since it avoided nulls.
SELECT
COUNT(CASE
WHEN MyIntColumn >= 213 AND MyIntColumn <= 9999
THEN MyIntColumn
ELSE NULL
END) AS FirstValue
, ??? AS SecondValue
, ??? AS ThirdValue
, ??? AS FourthValue
FROM Table
This doesn't need nesting or CTE or anything. Just define via CASE your condition within COUNTs argument.
I dont really understand what You want in the second, third an fourth column. Sounds to me, its very similar to the first one.
Reformatted, your query looks like:
SELECT (
SELECT Count(ID)
FROM view1
WHERE MyIntColumn BETWEEN 213 AND 9999
) AS Value1
FROM view1;
So you are selecting a subquery expression that is not related to the outer query. For each row in view1, you calculate the number of rows in view1.
Instead, try to do the calculation once. You just have to remove your outer query:
SELECT Count(ID)
FROM view1
WHERE MyIntColumn BETWEEN 213 AND 9999;
OLEDB Connection in MS Access does not support key words CASE and WHEN .
You can only use iif() function to count two, three.. values in same columns
SELECT Attendance.StudentName, Count(IIf([Attendance]![Yes_No]='Yes',1,Null)) AS Yes, Count(IIf([Attendance]![Yes_No]='No',1,Null)) AS [No], Count(IIf([Attendance]![Yes_No]='Not',1,Null)) AS [Not], Count(IIf([Attendance]![Yes_No],1,Null)) AS Total
FROM Attendance
GROUP BY Attendance.StudentName;