Multiple select statement in hive

I am fairly new to Hive.
I have a table called stats as shown below
from_date | to_date | customer_name | callcount
-------------------------------------------------------
2016_01_01 | 2016_01_02 | ABC | 25
2016_01_02 | 2016_01_03 | ABC | 53
2016_01_03 | 2016_01_04 | ABC | 44
2016_01_04 | 2016_01_05 | ABC | 55
I want to build a Hive query that will accept:
a) current time range (from and to time)
b) previous time range (from and to time)
c) customer name
For e.g.: the inputs will be:
current time range can be from time(2016_01_03) and to time(2016_01_05)
previous time range can be from time(2016_01_01) and to time(2016_01_02)
customer name can be ABC
And the result I want to display is:
current_call_count(sum of call counts for current time range), previous_call_count(sum of call counts for previous time range)
and difference between current_call_count & previous_call_count
like this:
customer | current_call_count | previous_call_count | Diff
---------------------------------------------------------
ABC | 99 | 25 | 74
The query I built was:
select * from
(
select sum(callCount) as current_count from stats
where customer_name='ABC' and from_date>='2016-04-03' and to_date<='2016-04-05'
UNION ALL
select sum(callCount) as current_count from stats
where customer_name='ABC' and from_date>='2016-04-01' and to_date<='2016-04-02'
) FINAL
I am not able to get the calculation, and I am also not able to display the result as columns. Please help.

Try conditional aggregation:
select
sum(case when from_date >= '2016_01_03' and to_date <= '2016_01_05' then callcount else 0 end)
as current_call_count,
sum(case when from_date >= '2016_01_01' and to_date <= '2016_01_02' then callcount else 0 end)
as previous_call_count,
sum(case when from_date >= '2016_01_03' and to_date <= '2016_01_05' then callcount else 0 end)
- sum(case when from_date >= '2016_01_01' and to_date <= '2016_01_02' then callcount else 0 end)
as difference
from stats
where customer_name = 'ABC'
Note that your example data uses _ instead of - (which is used in your query) as a date separator.
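The conditional aggregation idea can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption made only so the example runs anywhere), loads the sample rows from the question, and passes the two ranges and the customer name as named parameters. The parameter names (cur_from, cur_to, prev_from, prev_to, customer) are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for Hive here (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stats (from_date TEXT, to_date TEXT, customer_name TEXT, callcount INTEGER)"
)
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?, ?)",
    [
        ("2016_01_01", "2016_01_02", "ABC", 25),
        ("2016_01_02", "2016_01_03", "ABC", 53),
        ("2016_01_03", "2016_01_04", "ABC", 44),
        ("2016_01_04", "2016_01_05", "ABC", 55),
    ],
)

# One pass over the table: each SUM only counts rows inside its own range.
query = """
SELECT customer_name,
       SUM(CASE WHEN from_date >= :cur_from AND to_date <= :cur_to
                THEN callcount ELSE 0 END) AS current_call_count,
       SUM(CASE WHEN from_date >= :prev_from AND to_date <= :prev_to
                THEN callcount ELSE 0 END) AS previous_call_count
FROM stats
WHERE customer_name = :customer
GROUP BY customer_name
"""
params = {
    "cur_from": "2016_01_03", "cur_to": "2016_01_05",
    "prev_from": "2016_01_01", "prev_to": "2016_01_02",
    "customer": "ABC",
}
customer, current_count, previous_count = conn.execute(query, params).fetchone()
diff = current_count - previous_count
print(customer, current_count, previous_count, diff)  # ABC 99 25 74
```

The string comparison works here only because the dates are stored in a year_month_day layout that sorts lexically; the literals passed in must use the same separator as the stored data.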

Related

Get last date with data for each calendar date

I have some data: the sales amount for each day. Sometimes data is missing, so there is no record (for example on weekends, but not only then). For these dates, I want to replace the null value with the last known value. I created a reference table with all calendar dates and a boolean that tells me whether I have data for that day.
For example, with this reference table:
Date | is_data_present
-----------------------------
27/10/2022 | 1
28/10/2022 | 1
29/10/2022 | 0
30/10/2022 | 0
I want this outcome:
Date | is_data_present | date_to_use
-------------------------------------------
27/10/2022 | 1 | 27/10/2022
28/10/2022 | 1 | 28/10/2022
29/10/2022 | 0 | 28/10/2022
30/10/2022 | 0 | 28/10/2022
I tried things with LEAD but I don't know how to add a condition like 'where is_data_present = 1'

Basically, you don't need a window function for this.
The COALESCE covers the case where the first row has is_data_present = 0 and so has no earlier value to fall back on.
SELECT
"Date", "is_data_present",
COALESCE((SELECT "Date" FROM table1 WHERE "Date" <= Tab1."Date" AND "is_data_present" = 1 ORDER BY "Date" DESC LIMIT 1 ),"Date") date_to_use
FROM table1 tab1
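A runnable sketch of the correlated-subquery approach, using Python's built-in sqlite3 module. Two things here are assumptions and not part of the answer above: the table and column names (calendar, day) are hypothetical, and the dates are stored as ISO yyyy-mm-dd text so that string comparison matches calendar order (dd/mm/yyyy text would not sort correctly across months). MAX is used in place of ORDER BY ... DESC LIMIT 1, which picks the same row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical reference table; dates are ISO text so that <= follows calendar order.
conn.execute("CREATE TABLE calendar (day TEXT, is_data_present INTEGER)")
conn.executemany(
    "INSERT INTO calendar VALUES (?, ?)",
    [("2022-10-27", 1), ("2022-10-28", 1), ("2022-10-29", 0), ("2022-10-30", 0)],
)

# For every calendar day, look up the latest day at or before it that has data.
# COALESCE falls back to the day itself when no earlier day has data.
query = """
SELECT c.day,
       c.is_data_present,
       COALESCE(
           (SELECT MAX(p.day)
            FROM calendar p
            WHERE p.day <= c.day AND p.is_data_present = 1),
           c.day
       ) AS date_to_use
FROM calendar c
ORDER BY c.day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```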

I tried things with LEAD but I don't know how to add a condition like 'where is_data_present = 1'
In addition to @nbk's approach, if you want to use a window function you might consider FIRST_VALUE or LAST_VALUE, since LEAD and LAG don't support IGNORE NULLS.
WITH sample_table AS (
SELECT '27/10/2022' date, 1 is_data_present UNION ALL
SELECT '28/10/2022' date, 1 is_data_present UNION ALL
SELECT '29/10/2022' date, 0 is_data_present UNION ALL
SELECT '30/10/2022' date, 0 is_data_present
)
SELECT *,
LAST_VALUE(IF(is_data_present = 1, date, NULL) IGNORE NULLS) OVER (ORDER BY date) date_to_use,
FROM sample_table;
+------------+-----------------+-------------+
| date | is_data_present | date_to_use |
+------------+-----------------+-------------+
| 27/10/2022 | 1 | 27/10/2022 |
| 28/10/2022 | 1 | 28/10/2022 |
| 29/10/2022 | 0 | 28/10/2022 |
| 30/10/2022 | 0 | 28/10/2022 |
+------------+-----------------+-------------+

SQL: Calculate number of days since last success

The following table represents the results of a given test.
Every result for the same test is either a pass (error_id = 0) or a fail (error_id <> 0).
I need help writing a query that returns the number of runs since the last good run (error_id = 0) and its date.
| Date | test_id | error_id |
-----------------------------------
| 2019-12-20 | 123 | 23 |
| 2019-12-19 | 123 | 23 |
| 2019-12-17 | 123 | 22 |
| 2019-12-18 | 123 | 0 |
| 2019-12-16 | 123 | 11 |
| 2019-12-15 | 123 | 11 |
| 2019-12-13 | 123 | 11 |
| 2019-12-12 | 123 | 0 |
So the result for this example should be:
| 2019-12-18 | 123 | 4
as the test 123 was PASS on 2019-12-18 and this happened 4 runs ago.
I have a query to determine whether a given run is an error or not, but I have trouble applying an appropriate window function to it to get the wanted result:
select test_id, Date, error_id, (CASE WHEN error_id <> 0 THEN 1 ELSE 0 END) as is_error
from testresults

You can generate a row number, in reverse order from the sorting of the query itself:
SELECT test_date, test_id, error_code,
(row_number() OVER (ORDER BY test_date asc) - 1) as runs_since_last_pass
FROM tests
WHERE test_date >= (SELECT MAX(test_date) FROM tests WHERE error_code=0)
ORDER BY test_date DESC
LIMIT 1;
Note that this will run into issues if test_date is not unique. Better use a timestamp (precise to the millisecond) instead of a date.
Here's a DBFiddle: https://www.db-fiddle.com/f/8gSHVcXMztuRiFcL8zLeEx/0
If there's more than one test_id, you'll want to add a PARTITION BY clause to the row number function, and the subquery would become a bit more complex. It may be more efficient to come up with a way to do this by a JOIN instead of a subquery, but it would be more cognitively complex.

I think you just want aggregation and some filtering:
-- count(*) includes the last successful run itself
select id, count(*),
max(date) filter (where error_id = 0) as last_success_date
from t
where date >= (select max(t2.date) from t t2 where t2.error_id = 0)
group by id;

You have to use the Maximum date of the good runs for every test_id in your query. You can try this query:
select tr2.date_error, tr.test_id, count(tr.error_id) from
testresults tr inner join (select max(date_error) as date_error, test_id
from testresults where error_id=0 group by test_id) tr2 on
tr.test_id=tr2.test_id and tr.date_error >=tr2.date_error
group by tr.test_id, tr2.date_error
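The per-test join can be sketched with Python's built-in sqlite3 module. The sample rows, the run_date column name, and the two test ids are hypothetical and chosen only for illustration. The join keeps every run at or after each test's latest pass, and the count subtracts one so that the passing run itself is not counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: two tests, each with one passing run (error_id = 0).
conn.execute("CREATE TABLE testresults (run_date TEXT, test_id INTEGER, error_id INTEGER)")
conn.executemany(
    "INSERT INTO testresults VALUES (?, ?, ?)",
    [
        ("2019-12-12", 1, 0), ("2019-12-13", 1, 11), ("2019-12-15", 1, 11),
        ("2019-12-12", 2, 7), ("2019-12-13", 2, 0), ("2019-12-15", 2, 3),
    ],
)

# The derived table finds the latest pass per test; the outer query counts
# the runs from that date onward and subtracts the passing run itself.
query = """
SELECT tr.test_id,
       good.last_pass,
       COUNT(*) - 1 AS runs_since_last_pass
FROM testresults tr
JOIN (SELECT test_id, MAX(run_date) AS last_pass
      FROM testresults
      WHERE error_id = 0
      GROUP BY test_id) good
  ON tr.test_id = good.test_id
 AND tr.run_date >= good.last_pass
GROUP BY tr.test_id, good.last_pass
ORDER BY tr.test_id
"""
summary = conn.execute(query).fetchall()
print(summary)
```

A test that has never passed produces no row in this sketch, because the inner join finds no last_pass for it.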

This should do the trick:
select count(*) from table t,
(select max(date) date from table where error_id = 0) good
where t.date >= good.date
Basically you are counting the rows that have a date >= the date of the last success.
Please note: if you need the number of days, it is a completely different query:
select now()::date - max(test_date) last_valid from tests
where error_code = 0;

Calculating satisfaction scores when some months aren't reported

Suppose I have this table (Postgres 9.5) composed of an interaction id, a satisfaction value (1 for satisfied, 0 for not satisfied), and the date of the interaction, truncated to the first day of the month in which it took place. Assume that the layout of this table cannot be changed.
interaction | satisfaction | surveyed_on
------------+---------------+-------------
325524 | 1 | 2016-01-01
325999 | 1 | 2016-01-01
332642 | 0 | 2016-03-01
333152 | 1 | 2016-02-01
326765 | 0 | 2016-01-01
How would I calculate the satisfaction percentage on a monthly basis, accounting for the fact that some months may not receive any positive or negative interactions? Ideally, the results would look something like this:
month | positive_scr | negative_scr | satisfaction_pct
------------+---------------+--------------+-----------------
2016-01-01 | 100 | 1 | 99
2016-02-01 | 10 | 5 | 50
2016-03-01 | 50 | 10 | 80
2016-04-01 | 35 | 35 | 100
Thanks!

I'd approach this using a couple of steps:
Generate a date series (in the attached example I have used the Postgres generate_series function). This allows you to calculate scores for months where limited data is available
Join your data on to this date series
Aggregate your data, transposing your positive/negative scores into their own columns
I've had a go in the attached SQLfiddle:
select dt,
sum( case when satisfaction = 1 then 1 else 0 end ) as positive_scr,
sum( case when satisfaction = 0 then 1 else 0 end ) as negative_scr,
sum( case when satisfaction = 1 then 1 else 0 end ) * 100 / count(*) as satisfaction_pct
from (
/* If not using Postgres you will need to use your database specific function here */
select generate_series( '2016-01-01', '2016-04-01', interval '1 month' ) as dt
) as a
left join
(
select satisfaction, surveyed_on
from scores
) as b
on a.dt = b.surveyed_on
group by dt
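The three steps above can be sketched with Python's built-in sqlite3 module. SQLite has no generate_series by default, so this sketch fills a hypothetical months table from Python instead; that substitution and the table name are assumptions. It also divides by the number of actual responses, wrapped in NULLIF, so a month with no data yields NULL for the percentage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (interaction INTEGER, satisfaction INTEGER, surveyed_on TEXT)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        (325524, 1, "2016-01-01"),
        (325999, 1, "2016-01-01"),
        (332642, 0, "2016-03-01"),
        (333152, 1, "2016-02-01"),
        (326765, 0, "2016-01-01"),
    ],
)

# Step 1: a date series. A hypothetical months table replaces generate_series.
conn.execute("CREATE TABLE months (month TEXT)")
conn.executemany(
    "INSERT INTO months VALUES (?)",
    [(f"2016-{m:02d}-01",) for m in range(1, 5)],
)

# Steps 2 and 3: left join the data onto the series, then pivot with CASE.
# COUNT(s.satisfaction) ignores the NULL row produced for an empty month.
query = """
SELECT m.month,
       SUM(CASE WHEN s.satisfaction = 1 THEN 1 ELSE 0 END) AS positive_scr,
       SUM(CASE WHEN s.satisfaction = 0 THEN 1 ELSE 0 END) AS negative_scr,
       SUM(CASE WHEN s.satisfaction = 1 THEN 1 ELSE 0 END) * 100
           / NULLIF(COUNT(s.satisfaction), 0) AS satisfaction_pct
FROM months m
LEFT JOIN scores s ON s.surveyed_on = m.month
GROUP BY m.month
ORDER BY m.month
"""
monthly = conn.execute(query).fetchall()
for row in monthly:
    print(row)
```

The percentage uses integer division, so January (2 positive out of 3) comes out as 66; multiply by 100.0 instead if fractional percentages are wanted.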

You can do it with this kind of query, which will work on almost all popular RDBMS.
Note: You have to handle divide by 0 condition while handling percentage, but that would be database specific, so I left it for you to figure out.
The formula I am using to calculate satisfaction percentage is 100 - negative_score*(100/positive_score). If you want to change it, just put negative_score and positive score on your custom formula.
Rextester Demo
select
surveyed_on,
sum(case when satisfaction=1 then 1 else 0 end) as positive ,
sum(case when satisfaction=0 then 1 else 0 end) as negative ,
(100 - (sum(case when satisfaction=0 then 1 else 0 end))
*(100/(sum(case when satisfaction=1 then 1 else 0 end)))
) as satisfaction_percent
from tbl234
group by surveyed_on;

SQL Query Compare values in per 15 minutes and display the result per hour

I have a table with 2 columns. UTCTime and Values.
The UTCTime values are in 15-minute increments. I want a query that compares each value to the previous value within a one-hour span and displays a number between 0 and 4, depending on how many values are constant. In other words, there is an entry for every 15-minute increment and the value can stay constant, so I just need to compare each value to the previous one within each hour.
For example
+---------|-------+
| UTCTime | Value |
------------------|
| 12:00 | 18.2 |
| 12:15 | 87.3 |
| 12:30 | 55.91 |
| 12:45 | 55.91 |
| 1:00 | 37.3 |
| 1:15 | 47.3 |
| 1:30 | 47.3 |
| 1:45 | 47.3 |
| 2:00 | 37.3 |
+---------|-------+
In this case, I just want a query that compares the 12:45 value to the 12:30 value, the 12:30 value to the 12:15 value, and so on. Since we are comparing within a one-hour span only, the number of constant values must be between 0 and 4 (0 means there are no constant values, 1 means there is one, as in the 12:00 hour in the example above).
The query should display:
+----------+----------------+
| UTCTime | ConstantValues |
----------------------------|
| 12:00 | 1 |
| 1:00 | 2 |
+----------|----------------+
I just wanted to mention that I am new to SQL programming.
Thank you.
See SQL fiddle here

Below is the query you need and a working solution. Note: I changed the timeframe to 24 hrs.
;with SourceData(HourTime, Value, RowNum)
as
(
select
datepart(hh, UTCTime) HourTime,
Value,
row_number() over (partition by datepart(hh, UTCTime) order by UTCTime) RowNum
from foo
union
select
datepart(hh, UTCTime) - 1 HourTime,
Value,
5
from foo
where datepart(mi, UTCTime) = 0
)
select cast(A.HourTime as varchar) + ':00' UTCTime, sum(case when A.Value = B.Value then 1 else 0 end) ConstantValues
from SourceData A
inner join SourceData B on A.HourTime = B.HourTime and
(B.RowNum = (A.RowNum - 1))
group by cast(A.HourTime as varchar) + ':00'

select SUBSTRING_INDEX(UTCTime,':',1) as time,value, count(*)-1 as total
from foo group by value,time having total >= 1;
fiddle

Mine isn't much different from Vasanth's, same idea different approach.
The idea is that you need recursion to carry it out simply. You could also use the LEAD() function to look at rows ahead of your current row, but in this case that would require a big case statement to cover every outcome.
;WITH T
AS (
SELECT a.UTCTime,b.VALUE,ROW_NUMBER() OVER(PARTITION BY a.UTCTime ORDER BY b.UTCTime DESC)'RowRank'
FROM (SELECT *
FROM #Table1
WHERE DATEPART(MINUTE,UTCTime) = 0
)a
JOIN #Table1 b
ON b.UTCTIME BETWEEN a.UTCTIME AND DATEADD(hour,1,a.UTCTIME)
)
SELECT T.UTCTime, SUM(CASE WHEN T.Value = T2.Value THEN 1 ELSE 0 END)
FROM T
JOIN T T2
ON T.UTCTime = T2.UTCTime
AND T.RowRank = T2.RowRank -1
GROUP BY T.UTCTime
If you run the portion inside the ;WITH T AS ( ) you'll see that gets us the hour we're looking at and the values in order by time. That is used in the recursive portion below by joining to itself and evaluating each row compared to the next row (hence the RowRank - 1) on the JOIN.
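The compare-to-previous-row idea from the answers above can also be sketched without ROW_NUMBER, by joining each reading to the one taken 15 minutes earlier. This is a simplified variant, not the queries above: it uses Python's built-in sqlite3 module, a hypothetical readings table, and full 24-hour timestamps (1:00 becomes 13:00) with an arbitrary date, all of which are assumptions. Each comparison is attributed to the hour of the later reading.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; timestamps use the 'YYYY-MM-DD HH:MM:SS' text layout
# that SQLite's datetime() function returns, so the join can match exactly.
conn.execute("CREATE TABLE readings (ts TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("2020-01-01 12:00:00", 18.2),
        ("2020-01-01 12:15:00", 87.3),
        ("2020-01-01 12:30:00", 55.91),
        ("2020-01-01 12:45:00", 55.91),
        ("2020-01-01 13:00:00", 37.3),
        ("2020-01-01 13:15:00", 47.3),
        ("2020-01-01 13:30:00", 47.3),
        ("2020-01-01 13:45:00", 47.3),
        ("2020-01-01 14:00:00", 37.3),
    ],
)

# Pair every reading with the reading 15 minutes before it, then count,
# per hour of the later reading, how many pairs hold the same value.
query = """
SELECT strftime('%H:00', cur.ts) AS hour,
       SUM(CASE WHEN cur.value = prev.value THEN 1 ELSE 0 END) AS constant_values
FROM readings cur
JOIN readings prev ON prev.ts = datetime(cur.ts, '-15 minutes')
GROUP BY hour
ORDER BY hour
"""
per_hour = conn.execute(query).fetchall()
print(per_hour)
```

Unlike the expected output in the question, this sketch also returns a row for the 14:00 hour, because its single reading still has a predecessor to compare against.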

COUNT records in RANGE GROUP BY date

I have a sales table that shows the date and time of each sale.
For example:
saleid | saledate | saletime
1 | 20110327 | 101
2 | 20110327 | 102
3 | 20110328 | 201
(So sale 2 occurred on 20110327 at 102)
I need to construct a single SQL statement that:
Groups the sales by date (each row is a different date) and then
counts the sales for each time range. (With each time range being a separate column)
The table should look something like this:
saledate | 101-159 | 200-259 |
20110327 | 2 | 0 |
20110328 | 0 | 1 |
It needs to be a single statement and saledate and saletime need to remain in numeric format.
(I am pulling from a database table with several million rows)
I am using MS Access
Any advice is greatly appreciated.
Thank you so much!

SELECT saledate,
SUM(IIF(saletime >= 101 and saletime <= 159, 1, 0)) as [101To159],
SUM(IIF(saletime >= 200 and saletime <= 259, 1, 0)) as [200To259]
FROM myTable
GROUP BY saledate
Note: I haven't run this query. However, this is how it could be.

SELECT saledate,
SUM(case when saletime between 101 and 159 then 1 else 0 end ) as R101_159,
SUM(case when saletime between 200 and 259 then 1 else 0 end ) as R200_259
FROM myTable
GROUP BY saledate
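To check the pivot logic on the sample rows from the question, here is a small sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for MS Access here (an assumption), so it uses the CASE form; the table name sales and the column aliases are hypothetical. saledate and saletime stay numeric, as the question requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# saledate and saletime are kept as integers, as in the question.
conn.execute("CREATE TABLE sales (saleid INTEGER, saledate INTEGER, saletime INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 20110327, 101), (2, 20110327, 102), (3, 20110328, 201)],
)

# One row per date; each SUM counts only the sales inside its time range.
query = """
SELECT saledate,
       SUM(CASE WHEN saletime BETWEEN 101 AND 159 THEN 1 ELSE 0 END) AS r101_159,
       SUM(CASE WHEN saletime BETWEEN 200 AND 259 THEN 1 ELSE 0 END) AS r200_259
FROM sales
GROUP BY saledate
ORDER BY saledate
"""
counts = conn.execute(query).fetchall()
print(counts)  # [(20110327, 2, 0), (20110328, 0, 1)]
```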
