Hive query to get count from previous day

Table A
Year (String), Month (String), and Day (String) are partition columns.
I want to get the count for the previous day.
Note: no column holds a full date such as 2022-06-01.
I tried the query below.
Select count(*) FROM Table A where Day='03' and Month='06' and Year='2022' GROUP BY city;
But I don't want to hard-code the values.

I think I found the solution:
select count(*) from Table A where cast(concat_ws('-',year,month,day) as date)=date_sub(current_date, 1);
This gives me the correct results.
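The same idea (build a date string from the three partition columns and compare it with "today minus one day") can be tried out locally. This is only a sketch: it uses SQLite through Python's sqlite3 module as a stand-in for Hive, so `||` and `date(..., '-1 day')` replace `concat_ws` and `date_sub`. The table name `table_a`, the sample rows and both function names are made up for illustration. It assumes, as in the question, that month and day are stored zero-padded ('06', '03'); otherwise the string comparison would not match.

```python
import sqlite3
from datetime import date


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory table with hypothetical year/month/day string columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_a (year TEXT, month TEXT, day TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO table_a VALUES (?, ?, ?, ?)",
        [
            ("2022", "06", "02", "Pune"),
            ("2022", "06", "03", "Pune"),
            ("2022", "06", "03", "Delhi"),
            ("2022", "05", "31", "Delhi"),
        ],
    )
    return conn


def count_previous_day(conn: sqlite3.Connection, today: date) -> int:
    """Count rows whose year-month-day string equals the day before `today`."""
    sql = """
        SELECT COUNT(*)
        FROM table_a
        WHERE year || '-' || month || '-' || day = date(:today, '-1 day')
    """
    return conn.execute(sql, {"today": today.isoformat()}).fetchone()[0]
```

Passing `today` as a parameter instead of reading the system clock keeps the sketch repeatable; in Hive, `current_date` plays that role.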

Related

Returning single records per month

I have a query with a CASE expression that needs to return only a single row for each month end.
I tried using SELECT DISTINCT, but it shows multiple records for the same month end:
SELECT DISTINCT CASE
WHEN eff_interest_balance < 0.01 THEN trial_balance_date
WHEN date_paid < trial_balance_date THEN date_paid
END as A
, period
FROM dbo.Intpayments
WHERE loan_number = 60023
ORDER BY period ASC
The query should return a single date for each month.

DISTINCT returns unique rows; it does not group them. You are looking to aggregate rows, which means using some combination of aggregate functions and GROUP BY.
What your current query is missing is logic for aggregating the rows that fall in the same period. Do you want to compare the sum of these values? The min, the max?
In any case, the basic idea of aggregating and grouping looks like this. I don't think summing is what you want, but the query shows the pattern:
SELECT
period
, SUM(eff_interest_balance) AS SumOfBalance
FROM dbo.Intpayments
WHERE loan_number = 60023
GROUP BY period
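If the goal is one date per period, the CASE expression from the question can itself be wrapped in an aggregate. The sketch below assumes that the latest qualifying date per period is the one wanted (MAX); MIN would work the same way. It runs against SQLite through Python's sqlite3 module, so the `dbo.` schema prefix is dropped, and the sample rows and function names are invented for illustration.

```python
import sqlite3

ONE_ROW_PER_PERIOD = """
    SELECT period,
           MAX(CASE
                   WHEN eff_interest_balance < 0.01 THEN trial_balance_date
                   WHEN date_paid < trial_balance_date THEN date_paid
               END) AS A
    FROM Intpayments
    WHERE loan_number = 60023
    GROUP BY period
    ORDER BY period
"""


def make_demo_db() -> sqlite3.Connection:
    """Create hypothetical payment rows; two of them share period '2020-01'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Intpayments (loan_number INTEGER, period TEXT, "
        "eff_interest_balance REAL, trial_balance_date TEXT, date_paid TEXT)"
    )
    conn.executemany(
        "INSERT INTO Intpayments VALUES (?, ?, ?, ?, ?)",
        [
            (60023, "2020-01", 0.0, "2020-01-31", "2020-01-15"),
            (60023, "2020-01", 5.0, "2020-01-31", "2020-01-20"),
            (60023, "2020-02", 5.0, "2020-02-29", "2020-02-10"),
            (99999, "2020-01", 0.0, "2020-01-31", "2020-01-01"),
        ],
    )
    return conn


def one_row_per_period(conn: sqlite3.Connection) -> list:
    """Return (period, A) with exactly one row per period for loan 60023."""
    return conn.execute(ONE_ROW_PER_PERIOD).fetchall()
```

With these rows, SELECT DISTINCT would still return two rows for '2020-01', because the two CASE results differ; GROUP BY period collapses them into one.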

SELECT MIN from a subset of data obtained through GROUP BY

There is a database in place with hourly timeseries data, where every row in the DB represents one hour. Example:
TIMESERIES TABLE
id  date_and_time     entry_category
1   2017/01/20 12:00  type_1
2   2017/01/20 13:00  type_1
3   2017/01/20 12:00  type_2
4   2017/01/20 12:00  type_3
First I used the GROUP BY statement to find the latest date and time for each type of entry category:
SELECT MAX(date_and_time), entry_category
FROM timeseries_table
GROUP BY entry_category;
However, I now want to find the LEAST RECENT of the datetimes returned by the query above. I will need to use SELECT MIN(date_and_time) somehow, but how do I tell SQL to treat the output of my previous query as a "new table" to run a new SELECT query on? The output of my total query should be a single value; for the sample above, date_and_time = 2017/01/20 12:00.
I've tried using aliases, but they don't seem to do the trick: they only rename existing columns or tables (or I'm misusing them). There are many questions out there that list the MAX or MIN for a particular group (e.g. https://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/ or Select max value of each group), which is what I have already achieved, but I now want to work on this list of obtained datetimes. My database structure is very simple, but I lack the knowledge to string these queries together.
Thanks, cheers!

You can use your first query as a sub-query; this is exactly what you describe as using the first query's output as the input for the second query. It returns the single-row output of the min date, as required.
SELECT MIN(date_and_time)
FROM (SELECT MAX(date_and_time) as date_and_time, entry_category
FROM timeseries_table
GROUP BY entry_category)a;
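The sub-query approach can be checked against the sample rows from the question. This is a minimal sketch that uses SQLite through Python's sqlite3 module; the function names and the `latest` and `per_category` aliases are illustrative. It assumes the datetimes are stored as fixed-width text, so that string comparison matches chronological order.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Load the four sample rows from the question into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE timeseries_table "
        "(id INTEGER PRIMARY KEY, date_and_time TEXT, entry_category TEXT)"
    )
    conn.executemany(
        "INSERT INTO timeseries_table VALUES (?, ?, ?)",
        [
            (1, "2017/01/20 12:00", "type_1"),
            (2, "2017/01/20 13:00", "type_1"),
            (3, "2017/01/20 12:00", "type_2"),
            (4, "2017/01/20 12:00", "type_3"),
        ],
    )
    return conn


def least_recent_of_latest(conn: sqlite3.Connection) -> str:
    """Take the latest datetime per category, then the minimum of those."""
    sql = """
        SELECT MIN(latest)
        FROM (SELECT MAX(date_and_time) AS latest, entry_category
              FROM timeseries_table
              GROUP BY entry_category) AS per_category
    """
    return conn.execute(sql).fetchone()[0]
```

The inner query yields 13:00 for type_1 and 12:00 for type_2 and type_3, so the outer MIN returns 2017/01/20 12:00, the value the question expects.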

Is this what you want?
SELECT TOP 1 MAX(date_and_time), entry_category
FROM timeseries_table
GROUP BY entry_category
ORDER BY MAX(date_and_time) ASC;
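With a plain TOP 1 (SQL Server syntax), only one row is returned, and if several categories tie for the earliest value, which of them you get is arbitrary. To make the choice deterministic, include an additional sort key: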
SELECT TOP 1 MAX(date_and_time), entry_category
FROM timeseries_table
GROUP BY entry_category
ORDER BY MAX(date_and_time) ASC, entry_category;

Sql count and and/or condition in single statement

I am trying to build a WHERE clause on a table with the columns "Id", "itemNumber" (which can be either 1 or 2 for any row), and "date".
My goal is to write the WHERE clause so that I only get the Ids where "itemNumber" is 2, and then, if the count is greater than some value, filter the rows to dates between today and today+1, otherwise between today and today+2.
I tried:
Select Id
from table
where itemNumber=2 And ((count(itemNumber)>2 and date between 'today' and 'today+1') OR (count(itemNumber)<=2 and date between 'today' and 'today+2'))
I got an error saying I need to use SQL "HAVING". Am I doing it wrong?

Try it like this:
SELECT id
FROM t
WHERE itemNumber = 2
GROUP BY id
HAVING (COUNT(itemNumber) > 2 AND date BETWEEN 'today' and 'today+1')
OR (COUNT(itemNumber) <= 2 AND date BETWEEN 'today' and 'today+2')
Think of HAVING as a WHERE clause that is applied after you have grouped your data, which you have to do if you want to count something by group (here, by id).
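One caveat: engines that enforce strict grouping reject a bare `date` column in HAVING unless it is grouped or aggregated. A variant that avoids this computes the count per id in a sub-query and applies the date window to the individual rows. The sketch below is one reading of the question (the count is taken per id over the rows with itemNumber = 2) and may not be the only valid one. It uses SQLite through Python's sqlite3 module; the threshold parameter, sample rows and function names are assumptions for illustration.

```python
import sqlite3
from datetime import date, timedelta


def make_demo_db() -> sqlite3.Connection:
    """Create table t with hypothetical rows for two ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, itemNumber INTEGER, date TEXT)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [
            (1, 2, "2024-01-10"),
            (1, 2, "2024-01-11"),
            (1, 2, "2024-01-12"),  # id 1 has three itemNumber=2 rows
            (2, 2, "2024-01-10"),
            (2, 2, "2024-01-12"),  # id 2 has two itemNumber=2 rows
            (2, 1, "2024-01-10"),  # ignored: itemNumber is not 2
        ],
    )
    return conn


def rows_in_window(conn: sqlite3.Connection, today: date, threshold: int = 2) -> list:
    """Keep itemNumber=2 rows; the date window depends on the per-id count."""
    sql = """
        SELECT t.id, t.date
        FROM t
        JOIN (SELECT id, COUNT(*) AS n
              FROM t
              WHERE itemNumber = 2
              GROUP BY id) AS c ON c.id = t.id
        WHERE t.itemNumber = 2
          AND ((c.n >  :threshold AND t.date BETWEEN :today AND :plus1)
            OR (c.n <= :threshold AND t.date BETWEEN :today AND :plus2))
        ORDER BY t.id, t.date
    """
    params = {
        "threshold": threshold,
        "today": today.isoformat(),
        "plus1": (today + timedelta(days=1)).isoformat(),
        "plus2": (today + timedelta(days=2)).isoformat(),
    }
    return conn.execute(sql, params).fetchall()
```

Real date values are passed as parameters here; the literals 'today' and 'today+1' in the queries above are placeholders and would be compared as plain strings.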

Teradata filter query to pull all data for a month

I have a query below which fetches the data for the last day of a month. In this query, ME_DT is defined as a datetime type, so taking the max of ME_DT gives me the data for the last day of the month. I think I need to convert the datetime to an integer YYYYMM in a Teradata filter condition, so that I get the data for the entire month and not just its last day. How should I modify my existing query to get my desired result?
PADW.PL_CURR_DEFN_LOSS_FRCST_ME.ME_DT =
(select max (PADW.PL_CURR_DEFN_LOSS_FRCST_ME.ME_DT) from PADW.PL_CURR_DEFN_LOSS_FRCST_ME)

You should try to avoid calculations on a column in the WHERE-condition to get better estimates and possible index/partition-access:
with cte (dt) as
(
select max (PADW.PL_CURR_DEFN_LOSS_FRCST_ME.ME_DT)
from PADW.PL_CURR_DEFN_LOSS_FRCST_ME
)
select ....
where PADW.PL_CURR_DEFN_LOSS_FRCST_ME.ME_DT
between TRUNC(dt, 'mon')
and last_day(dt)
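The range-filter idea (leave the column untouched and compute the month boundaries from the max date instead) can be sketched end to end. This example uses SQLite through Python's sqlite3 module as a stand-in for Teradata, so `date(dt, 'start of month')` and `date(dt, 'start of month', '+1 month', '-1 day')` take the place of `TRUNC(dt, 'mon')` and `last_day(dt)`. The table name `frcst`, its rows and the function names are hypothetical, and the dates are stored as plain ISO text rather than a datetime type.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a hypothetical forecast table with an me_dt date column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE frcst (me_dt TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO frcst VALUES (?, ?)",
        [
            ("2023-02-28", 1.0),
            ("2023-03-01", 2.0),
            ("2023-03-15", 3.0),
            ("2023-03-31", 4.0),
        ],
    )
    return conn


def rows_in_latest_month(conn: sqlite3.Connection) -> list:
    """Return all dates that fall in the month of the maximum me_dt."""
    sql = """
        WITH cte (dt) AS (SELECT MAX(me_dt) FROM frcst)
        SELECT f.me_dt
        FROM frcst AS f
        CROSS JOIN cte
        WHERE f.me_dt BETWEEN date(cte.dt, 'start of month')
                          AND date(cte.dt, 'start of month', '+1 month', '-1 day')
        ORDER BY f.me_dt
    """
    return [row[0] for row in conn.execute(sql)]
```

Because the functions are applied to the single value from the CTE and not to the column, the filter stays a plain range comparison on me_dt.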

I have to use a filter on the table because it has millions of records.
I did this, but I am still verifying whether I got what I want:
(PADW.PL_CURR_DEFN_LOSS_FRCST_ME.ME_DT =
(select max (cast(PADW.PL_CURR_DEFN_LOSS_FRCST_ME.ME_DT as date format 'YYYYMM')) from PADW.PL_CURR_DEFN_LOSS_FRCST_ME))

Aggregate function -Avg is not working in my sql query

In my query I need to display date and average age:
SELECT (SYSDATE-rownum) AS DATE,
avg((SYSDATE - rownum)- create_time) as average_Age
FROM items
group by (SYSDATE-rownum)
But my output for average age is not correct. It simply displays the result of (SYSDATE - rownum)- create_time for each row and does not calculate their average, even though I use avg((SYSDATE - rownum)- create_time).
Can someone tell me why the aggregate function AVG is not working in my query and what a possible solution might be?

In the select clause you are using both a non-aggregate expression and an aggregate expression. By dropping the (SYSDATE-rownum) AS DATE expression, the avg is calculated over the whole data set and not per retrieved record.
You can then drop the group by too, so in the end you keep only the avg expression:
SELECT avg((SYSDATE - rownum)- create_time) as average_Age
FROM items

First you need to decide over which rows or groups you need the avg; that column goes in the group by clause. As a simple example, if there are 4 rows with ages of 20, 10, 20, 30, then the avg will be 80/4 = 20. So I think you first need to calculate the age from (sysdate - create_time),
e.g. select months_between(sysdate,create_date)/12 cal3 from your_table
and then wrap that in an outer query that takes the avg per group.
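The two-step shape described here (an inner query computes each row's age, an outer query averages it) can be sketched as follows. In the question's query, ROWNUM differs for every row, so grouping by SYSDATE - ROWNUM most likely puts each row in its own group, which would explain why AVG echoes the single value. The sketch uses SQLite through Python's sqlite3 module in place of Oracle: `julianday` differences stand in for Oracle date subtraction, and the reference date is passed in as a parameter instead of SYSDATE. The sample rows and function names are made up.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an items table with three hypothetical creation dates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (create_time TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?)",
        [("2024-01-21",), ("2024-01-11",), ("2024-01-01",)],
    )
    return conn


def average_age_days(conn: sqlite3.Connection, now: str) -> float:
    """Compute each item's age in days, then average over all items."""
    sql = """
        SELECT AVG(age_days)
        FROM (SELECT julianday(:now) - julianday(create_time) AS age_days
              FROM items) AS ages
    """
    return conn.execute(sql, {"now": now}).fetchone()[0]
```

With a reference date of 2024-01-31 the three ages are 10, 20 and 30 days, so the average is 20 days. To get an average per group, add the grouping column to the inner query and a GROUP BY on it to the outer one.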
