Count on particular partitions for particular month in hive - sql

I have a Hive table that is partitioned on table_date. I want to get the count for each individual partition (each day) within a particular month and year.
When I run the following query I get a single count for the entire month, but I want one count per day.
select count(*) from table where month(table_date)=1 and year(table_date)=2016

If it is partitioned on date, I would expect this to work:
select table_date, count(*)
from table
where table_date >= '2016-01-01' and table_date < '2016-02-01'
group by table_date;

select table_date, count(*)
from table
where table_date between '2016-01-01' and '2016-01-31'
group by table_date;
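
Because BETWEEN includes both endpoints, the choice of upper bound decides whether the first day of the next month is counted. The following is a minimal sketch of that difference, not Hive itself: it assumes an in-memory SQLite database as a stand-in, a hypothetical table named events, and table_date stored as a 'YYYY-MM-DD' string.

```python
import sqlite3

# Assumption: SQLite stands in for Hive; "events" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (table_date TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [("2016-01-01",), ("2016-01-01",), ("2016-01-31",), ("2016-02-01",)],
)

# Half-open range: everything in January, nothing from February.
half_open = conn.execute(
    """
    SELECT table_date, COUNT(*)
    FROM events
    WHERE table_date >= '2016-01-01' AND table_date < '2016-02-01'
    GROUP BY table_date
    ORDER BY table_date
    """
).fetchall()

# Inclusive range ending on February 1: that day is counted as well.
inclusive = conn.execute(
    """
    SELECT table_date, COUNT(*)
    FROM events
    WHERE table_date BETWEEN '2016-01-01' AND '2016-02-01'
    GROUP BY table_date
    ORDER BY table_date
    """
).fetchall()

print(half_open)
print(inclusive)
```

The half-open form avoids having to know the last day of the month, which is why it is often preferred for month ranges.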

Related

Frequency distinct values grouped by date

I am trying to get the frequency of unique ID values for each month of the last year. However, I don't get the expected outcome; instead I get the error message "SELECT list expression references column user_id which is neither grouped nor aggregated".
How can I get the count of unique IDs in each month and then group them by month?
What I tried:
SELECT
user_id,
EXTRACT(MONTH FROM date) as month
FROM
TABLE
WHERE
date >= '2020-09-01'
GROUP BY
month
I want something like this:
month|count of unique user_id
1|300
2|200
...|...
12|250
You would use GROUP BY and COUNT(DISTINCT):
SELECT EXTRACT(MONTH FROM date) as month, COUNT(DISTINCT user_id)
FROM TABLE
WHERE date >= '2020-09-01'
GROUP BY 1;
I would advise you to include the year in the query. In BigQuery, this is simplest using DATE_TRUNC():
SELECT DATE_TRUNC(date, MONTH) as month, COUNT(DISTINCT user_id)
FROM TABLE
WHERE date >= '2020-09-01'
GROUP BY 1;
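
The advice about including the year can be illustrated with a small sketch. It is not BigQuery: it assumes an in-memory SQLite database, a hypothetical table logins with a hypothetical column login_date, and SQLite's strftime() in place of EXTRACT and DATE_TRUNC.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (user_id INTEGER, login_date TEXT)")
conn.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [
        (1, "2020-09-03"),
        (1, "2020-09-20"),  # same user, same month: counted once
        (2, "2020-09-05"),
        (1, "2021-09-01"),  # same calendar month, one year later
        (3, "2021-09-02"),
    ],
)

# Grouping by month number alone merges September 2020 with September 2021.
by_month_only = conn.execute(
    """
    SELECT strftime('%m', login_date) AS month, COUNT(DISTINCT user_id)
    FROM logins
    GROUP BY strftime('%m', login_date)
    ORDER BY month
    """
).fetchall()

# Grouping by year and month keeps the two Septembers apart.
by_year_month = conn.execute(
    """
    SELECT strftime('%Y-%m', login_date) AS month, COUNT(DISTINCT user_id)
    FROM logins
    GROUP BY strftime('%Y-%m', login_date)
    ORDER BY month
    """
).fetchall()

print(by_month_only)
print(by_year_month)
```

With a date filter that spans more than twelve months, the month-only grouping silently combines different years, which is the situation the year-aware grouping avoids.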

Get 30 days prior data for each row of query

I have a query that returns a list of ~20k users who logged on to our site during a specific week of the month.
What I need to get, for each of these users, over the 30 days before that:
1. whether they logged on: defined by any rows recorded in the same table
2. their max event in the 30-day window prior to the date in the current where clause
This is the current code snippet that helps me narrow to the ~20k users for a given week to begin with:
select
user_id,
max(timestamp)
from table
where timestamp between '2019-02-01' and '2019-02-05'
group by 1;
Expected result set/columns:
user_id,
max(timestamp),
logged_on, [if they have any # of rows in the same table within 30 days prior to their max(timestamp) date]
previous_timestamp, [the 2nd most recent login date within 30 days prior to their max(timestamp) date]
I think this is what you're looking for. I'm not sure it's the most efficient method, though; window functions may perform better, but as bob-mccormick mentioned, the tricky bit would be filling in dates where the user (the partition key) was not active so that the range query works correctly.
Example data setup (Snowflake syntax)
-- Create sample table
create temporary table user_logins (userid number, date_logged_on timestamp);
-- Insert some random sample data
insert overwrite into user_logins
select
uniform(1,10,random()) userid,
dateadd('minutes', uniform(1,86400,random()) * -1,current_timestamp::timestamp_ntz) date_logged_on
from table(generator(rowcount => 100))
;
Select statement
-- Run select
with user_last_logins as (
select
userid,
max(date_logged_on) last_login
from user_logins
where
date_logged_on between '2019-01-01' and '2019-05-08'
group by userid
)
select
user_last_logins.userid,
max(user_last_logins.last_login) last_logged_on,
count(prior_30_each_user.userid) num_logins_prior_30,
max(prior_30_each_user.date_logged_on) previous_timestamp
from user_last_logins
left join user_logins prior_30_each_user
on user_last_logins.userid = prior_30_each_user.userid
-- only logins strictly within the 30 days before the user's last login
and prior_30_each_user.date_logged_on > dateadd('day', -30, user_last_logins.last_login)
and prior_30_each_user.date_logged_on < user_last_logins.last_login
group by user_last_logins.userid
;
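
To check the self-join logic on a deterministic data set rather than random rows, here is a minimal sketch. It is not Snowflake: it assumes an in-memory SQLite database, timestamps stored as 'YYYY-MM-DD HH:MM:SS' strings, and SQLite's datetime(x, '-30 days') in place of dateadd. The sample rows and the alias previous_login are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for Snowflake; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_logins (userid INTEGER, date_logged_on TEXT)")
conn.executemany(
    "INSERT INTO user_logins VALUES (?, ?)",
    [
        (1, "2019-02-04 10:00:00"),  # last login inside the selected week
        (1, "2019-01-20 09:00:00"),  # within the 30 days before it
        (1, "2019-01-10 08:00:00"),  # within the 30 days before it
        (1, "2018-12-01 08:00:00"),  # older than 30 days: ignored
        (2, "2019-02-02 12:00:00"),  # only login: no prior activity
    ],
)

rows = conn.execute(
    """
    WITH user_last_logins AS (
        SELECT userid, MAX(date_logged_on) AS last_login
        FROM user_logins
        WHERE date_logged_on >= '2019-02-01' AND date_logged_on < '2019-02-06'
        GROUP BY userid
    )
    SELECT l.userid,
           l.last_login,
           COUNT(p.userid)       AS num_logins_prior_30,
           MAX(p.date_logged_on) AS previous_login
    FROM user_last_logins AS l
    LEFT JOIN user_logins AS p
           ON p.userid = l.userid
          AND p.date_logged_on > datetime(l.last_login, '-30 days')
          AND p.date_logged_on < l.last_login
    GROUP BY l.userid, l.last_login
    ORDER BY l.userid
    """
).fetchall()

for row in rows:
    print(row)
```

The LEFT JOIN keeps users with no prior activity: their count is 0 and their previous login is NULL (None in Python), which can be read as logged_on = false.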

SQL Server data search with date range

I have a table with the following columns:
Date,
Skills,
Customer ID
I want to find, for each Date(x), the customers and the count of customers between Date(x) and Date(x)+6.
Can somebody guide me on how to write this query, or can I create a function for this in SQL Server?
If I understand you correctly, you want something like this:
(Take care: the syntax may be off, because I only work with Oracle, but I think it should work.)
select date, customer_id, COUNT(*)
from your_table -- replace with your table name
where date between getdate() and DATEADD(day, 6, getdate())
-- between the current database system date and 6 days later
group by date, customer_id
order by COUNT(*) desc -- optional: order the result, ASC or DESC
If you have data on each date, then perhaps this is what you want:
select date, count(*),
sum(count(*)) over (order by date rows between 6 preceding and current row) as week_count
from t
group by date;
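
The window version counts rows, so it relies on every date being present. As a hedged alternative that does not need that, each distinct date can be joined to all rows falling in its own 7-day range. This sketch is not SQL Server: it assumes an in-memory SQLite database, a hypothetical table visits with columns visit_date and customer_id, SQLite's date(x, '+6 days') in place of DATEADD, and that "count of customers" means distinct customers.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; table, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (visit_date TEXT, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        ("2021-03-01", 1),
        ("2021-03-01", 2),
        ("2021-03-03", 1),
        ("2021-03-07", 3),
        ("2021-03-08", 2),  # outside the window that starts on 2021-03-01
    ],
)

# For each date x that occurs in the table, count the distinct customers
# seen from x through x + 6 days, regardless of gaps between dates.
rows = conn.execute(
    """
    SELECT d.visit_date,
           COUNT(DISTINCT v.customer_id) AS customers_next_7_days
    FROM (SELECT DISTINCT visit_date FROM visits) AS d
    JOIN visits AS v
      ON v.visit_date BETWEEN d.visit_date AND date(d.visit_date, '+6 days')
    GROUP BY d.visit_date
    ORDER BY d.visit_date
    """
).fetchall()

for row in rows:
    print(row)
```

A range join like this can be slow on large tables; the window form is usually cheaper when the data really does have one row group per calendar day.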

Number of Unique Visits in SQL

I am using Postgres 9.1 to count the number of unique patient visits over a given period of time, using the invoices entered for each patient.
I have two columns, transactions.ptnumber and transactions.dateofservice, and can calculate the patient visits in the following manner:
select count(*)
from transactions
where transactions.dateofservice between '2012-01-01' and '2013-12-31'
The problem is that sometimes one patient might get two invoices for the same day, but that should be counted as only one patient visit.
If I use SELECT DISTINCT or GROUP BY on the column transactions.ptnumber, that would count the number of patients who were seen (but not the number of times they were seen).
If I use SELECT DISTINCT or GROUP BY on the column transactions.dateofservice, that would count the number of days that had an invoice.
Not sure how to approach this.
This will return unique patients per day.
select count(distinct transactions.ptnumber) as cnt
from transactions
where transactions.dateofservice between '2012-01-01' and '2013-12-31'
group by transactions.dateofservice
You can sum them up to get the number of unique visits (one per patient per day) for the whole period. Note that Postgres requires an alias on the subquery:
select sum(cnt) from (
select count(distinct transactions.ptnumber) as cnt
from transactions
where transactions.dateofservice between '2012-01-01' and '2013-12-31'
group by transactions.dateofservice
) as per_day
You might use a subselect. Consider:
Select count(*)
from (select ptnumber, dateofservice
from transactions
where dateofservice between '2012-01-01' and '2013-12-31'
group by ptnumber, dateofservice
) as visits
You may also want to make this a stored procedure so you can pass in the date range.
There are multiple ways to achieve this, but you could use a WITH clause (a common table expression) to build a temporary result set containing the unique visits, then count its rows:
WITH UniqueVisits AS
(SELECT DISTINCT transactions.ptnumber, transactions.dateofservice
FROM transactions
WHERE transactions.dateofservice between '2012-01-01' and '2013-12-31')
SELECT COUNT(*) FROM UniqueVisits
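
The difference between counting invoices, visits, and patients is easy to see on a handful of rows. This is a minimal sketch, not Postgres 9.1: it assumes an in-memory SQLite database, dates stored as 'YYYY-MM-DD' strings, and invented sample rows.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (ptnumber INTEGER, dateofservice TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        (101, "2012-03-01"),
        (101, "2012-03-01"),  # second invoice on the same day: same visit
        (101, "2012-04-15"),
        (202, "2012-03-01"),
        (202, "2014-01-02"),  # outside the date range
    ],
)

in_range = "dateofservice BETWEEN '2012-01-01' AND '2013-12-31'"

# Every invoice row in the period.
invoices = conn.execute(
    f"SELECT COUNT(*) FROM transactions WHERE {in_range}"
).fetchone()[0]

# One visit per distinct (patient, day) pair.
visits = conn.execute(
    f"""
    SELECT COUNT(*)
    FROM (SELECT DISTINCT ptnumber, dateofservice
          FROM transactions
          WHERE {in_range}) AS unique_visits
    """
).fetchone()[0]

# Patients seen at least once in the period.
patients = conn.execute(
    f"SELECT COUNT(DISTINCT ptnumber) FROM transactions WHERE {in_range}"
).fetchone()[0]

print(invoices, visits, patients)
```

Only the distinct (ptnumber, dateofservice) count treats two invoices on one day as a single visit while still counting repeat visits on different days.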

hsqldb count for each day

Given a table with these columns:
id|date|somefield
I need to get the count of entries for each day of the year.
select EXTRACT (DAY_OF_YEAR FROM date) as day, id from table
works fine
but when I try
select EXTRACT (DAY_OF_YEAR FROM date) as day, count(*) from table
fails
select count(*) from table group by EXTRACT (DAY_OF_YEAR FROM date)
fails as well
You need to add a group by expression. Here is some pseudo code, I will work up a SqlFiddle in a moment.
select EXTRACT (DAY_OF_YEAR FROM date) as day,
count(*)
from table
group by EXTRACT (DAY_OF_YEAR FROM date)
SQLFIDDLE (Using MYSQL) http://sqlfiddle.com/#!2/42c9e/10