How to find user_ids with filter conditions - data-visualization

I have a table like this:
order_id | user_id | createdAt | transaction_amount
order_id is the ID of the transaction, user_id is the user, createdAt is the date, and transaction_amount is the amount of each order.
In Tableau I want to find users in the date range '2020-01-01' to '2020-01-31' who meet two conditions:
the user made more than one transaction on or before the last date in the range ('2020-01-31'), and
the user made at least one transaction within the date range ('2020-01-01' to '2020-01-31').
In MySQL these conditions can be expressed with this query:
HAVING SUM(createdAt <= '2020-01-31') > 1
AND SUM(createdAt BETWEEN '2020-01-01' AND '2020-01-31')
In Tableau I did this:
On the first filter (createdAt) I set the date range ('2020-01-01' to '2020-01-31').
On the second filter (createdAt copy) I set the range to everything before the last date ( < '2020-01-31').
On the CNTD(user_id) filter I set the distinct count to at least 1.
This shows 2223 users, but when I check in MySQL I get 1801 users. I trust the MySQL result, since I am used to MySQL and new to Tableau. What did I miss here?

Edit: Take this example:
user1 made 2 transactions, 1 each in Dec-19 and Jan-20
user2 made 1 transaction in Jan-20
user3 made 2 transactions in Dec-19
user4 made 2 transactions in Feb-20
user5 made 2 transactions in Jan-20
Say your date range is Jan-20. If you want users with at least two transactions on or before the end of the range (31 Jan 2020), at least one of which falls in Jan-2020, then user1 and user5 satisfy the condition.
In this case, proceed like this:
Step 1: Create two date-type parameters (Parameter 1 for the start and Parameter 2 for the end of the date range).
Step 2: Create a calculated field named condition with this calculation:
{FIXED [User]: SUM(
    IF [Trans Date] <= [Parameter 2] THEN 1 ELSE 0 END)} >= 2
AND
{FIXED [User]: SUM(
    IF [Trans Date] <= [Parameter 2] AND
       [Trans Date] >= [Parameter 1] THEN 1 ELSE 0 END)} >= 1
This evaluates to TRUE whenever your condition is satisfied.
Here, Trans Date is your createdAt and User is your user_id.

Related

Calculating Datediff of two days based on when the sum of a column hits a number cap

Tried to see if this was asked anywhere else but doesn't seem like it. Trying to create a sql query to give me the date difference in days between '2022-10-01' and the date when our impression sum hits our cap of 5.
For context, we may see duplicate dates because someone revisits our website that day, so we get a different session number to pair with that count. Here's an example table of one individual and how many impressions were logged.
My goal is to get the number of days it takes to hit an impression cap of 5. So for this individual, they would hit the cap on '2022-10-07' and the days between '2022-10-01' and '2022-10-07' is 6. I am also calculating the difference before/after '2023-01-01' since I need this count for Q4 of '22 and Q1 of '23 but will not include in the example table. I have other individuals to include but for the purpose of asking here, I kept it to one.
Current Query:
select
click_date,
case
when date(click_date) < date('2023-01-01') and sum(impression_cnt = 5) then datediff('day', '2022-10-01', click_date)
when date(click_date) >= date('2023-01-01') and sum(impression_cnt = 5) then datediff('day', '2023-01-01', click_date)
else 0
end days_to_capped
from table
group by customer, click_date, impression_cnt
customer | click_date | impression_cnt
123456 | 2022-10-05 | 2
123456 | 2022-10-05 | 1
123456 | 2022-10-06 | 1
123456 | 2022-10-07 | 1
123456 | 2022-10-11 | 1
123456 | 2022-10-11 | 3
Result Table
customer | days_to_cap
123456 | 6
I'm currently only getting 0 days and then 81 days once it hits 2022-12-21 (the last date) for this individual, so I know I need to fix my query. Any help would be appreciated!
Edited: This is in Snowflake!
So, the issue with your query is that the sum is calculated at the level you are grouping by, which is every field, so it will always just be the value of the impression_cnt field for that row.
What you need is a running sum, which is a SUM() OVER (PARTITION BY ... ORDER BY ...) expression, and then to qualify the results of that.
First, just to get the data that you have:
with x as (
    select *
    from values
        (123456, '2022-10-05'::date, 2),
        (123456, '2022-10-05'::date, 1),
        (123456, '2022-10-06'::date, 1),
        (123456, '2022-10-07'::date, 1),
        (123456, '2022-10-11'::date, 1),
        (123456, '2022-10-11'::date, 3) x (customer, click_date, impression_cnt)
)
Then, I query the CTE to do the running sum, with a QUALIFY clause to choose the record that actually has the value I'm looking for:
select
    customer,
    case
        when click_date < '2023-01-01'::date and sum(impression_cnt) over (partition by customer order by click_date) = 5 then datediff('day', '2022-10-01', click_date)
        when click_date >= '2023-01-01'::date and sum(impression_cnt) over (partition by customer order by click_date) = 5 then datediff('day', '2023-01-01', click_date)
        else 0
    end days_to_capped
from x
qualify days_to_capped > 0;
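The QUALIFY clause filters the results down to just the record you care about. Note that the = 5 comparison only matches when the running total lands exactly on 5, as it does in this sample; if a single row could push the total from below 5 to above it, compare with >= 5 and keep the earliest qualifying row instead.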
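As a cross-check of the running-sum idea, here is a small Python sketch over the same sample rows. It is not Snowflake code. The function name is hypothetical, and it stops at the first date where the running total reaches or exceeds the cap, which is an assumption about the intended behaviour when the total skips past 5.

```python
from datetime import date

# Sample rows from the question: (customer, click_date, impression_cnt)
rows = [
    (123456, date(2022, 10, 5), 2),
    (123456, date(2022, 10, 5), 1),
    (123456, date(2022, 10, 6), 1),
    (123456, date(2022, 10, 7), 1),
    (123456, date(2022, 10, 11), 1),
    (123456, date(2022, 10, 11), 3),
]


def days_to_cap(rows, customer, start, cap):
    """Days between `start` and the first click_date on which the
    customer's running impression total reaches `cap` (None if never)."""
    running = 0
    for cust, click_date, cnt in sorted(rows, key=lambda r: r[1]):
        if cust != customer:
            continue
        running += cnt
        if running >= cap:
            return (click_date - start).days
    return None


result = days_to_cap(rows, 123456, date(2022, 10, 1), 5)
print(result)  # 6
```

The running total is 3 after 2022-10-05, 4 after 2022-10-06 and 5 on 2022-10-07, which gives the 6 days expected in the question.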

Get open cases counts for particular user in specific date range

I'm creating an SSRS report and I want to get the open cases for a particular user in a specific date range, as described below.
I have a table called User, from which I get the user info (User1, User2, User3).
I have open cases in the table management under description table.
I have a c_date column in the class table.
I have 3 parameters: user, startdate and enddate.
I need to filter c_date between startdate and enddate.
If the user enters startdate as 2019-01-01 and enddate as 2019-31-01, then I want to display User1's open count for 0-5 days and for 6-11 days, and the same for User2.
Expected output:
User 0-5days 6-11days
---- ------- -------
User1 2 1
User2 1 4
User3 5 0
Explanation: User1 has 2 open cases in 0-5 days. That is, for the date range 2019-01-01 to 2019-31-01, there are 2 open cases in the first 0-5 days (2019-01-01 to 2019-05-01) and 1 open case in the next 6-11 days (2019-06-01 to 2019-11-01), and so on.
Can I get result like this?
You should probably do this in the dataset query if possible. Use CASE and DATEDIFF to group your data something like
SELECT
    [User],
    [AnyOtherColumns],
    CASE
        WHEN DATEDIFF(d, @startdate, c_date) BETWEEN 0 AND 5 THEN '0-5'
        WHEN DATEDIFF(d, @startdate, c_date) BETWEEN 6 AND 11 THEN '6-11'
        ELSE 'older'
    END AS [Age]
FROM myTable
WHERE [User] = @user
    AND c_date BETWEEN @startdate AND @enddate
(done from memory so may not be perfect)
In your report you can use [User] on your row group, [Age] as your column group and then simply count any of the columns to give you the actual count of records.
You could do the counting in SQL too but I'm not sure if you need the detail for something else.
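To illustrate the bucketing that the CASE/DATEDIFF expression performs, here is a rough Python sketch. The case rows are invented so that the counts for User1 and User2 match the expected output in the question; dates are written as yyyy-mm-dd, and the function names are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical open cases: (user, c_date). Not taken from a real table.
cases = [
    ("User1", date(2019, 1, 2)),
    ("User1", date(2019, 1, 4)),
    ("User1", date(2019, 1, 8)),
    ("User2", date(2019, 1, 1)),
    ("User2", date(2019, 1, 7)),
    ("User2", date(2019, 1, 9)),
    ("User2", date(2019, 1, 10)),
    ("User2", date(2019, 1, 12)),
    ("User3", date(2019, 1, 20)),
]


def age_bucket(start, c_date):
    """Same buckets as the CASE expression: day offset from the start date."""
    offset = (c_date - start).days
    if 0 <= offset <= 5:
        return "0-5"
    if 6 <= offset <= 11:
        return "6-11"
    return "older"


def bucket_counts(cases, start, end):
    """Count cases per (user, bucket) for c_date within [start, end]."""
    counts = Counter()
    for user, c_date in cases:
        if start <= c_date <= end:
            counts[(user, age_bucket(start, c_date))] += 1
    return counts


counts = bucket_counts(cases, date(2019, 1, 1), date(2019, 1, 31))
print(counts[("User1", "0-5")], counts[("User1", "6-11")])  # 2 1
print(counts[("User2", "0-5")], counts[("User2", "6-11")])  # 1 4
```

In the report, the (user, bucket) pairs correspond to the row group and column group, and the count is what the tablix cell would show.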
Considering you have two columns, my approach would be:
Have 3 parameters: one for the user, and two for the from and to dates.
Add these parameters to your dataset query as filters.
Note that you can also apply the filter on the SSRS dataset, but I prefer filtering at the query level so that only the required data is loaded.
Then you can apply summing and grouping by user and work with the SSRS tablix to get the desired results.
https://www.mssqltips.com/sqlservertip/3453/sql-server-reporting-services-reports-with-optional-query-parameters/
https://reportsyouneed.com/ssrs-tip-put-parameters-in-your-query-not-your-filter/

SQL - multiple date time aggregations within single query

I have a SQL database of customer actions, a customer is defined by a UniqueId and an action is given a date and time of action timestamp. A user can have more than one action on any one day as so:
UniqueID | actionDate | actionTime
1 | 17-01-18 | 13:01
1 | 17-01-18 | 13:15
2 | 17-01-18 | 13:15
1 | 18-01-18 | 12:56
I want to understand multiple things from the database ideally in a single query.
The first is how many times each UniqueID has performed an action over a given time period (day, week, month). For the example above there would be a count of 2 for id 1 on 17-01-18, a count of 1 on 18-01-18 and, assuming those are the only actions that week, a count of 3 for id 1 for that week.
On days that have more than one action (17-01-18 in the above example) I want to understand the distribution of actions across the day and, more importantly, the number of actions that occurred within each one-hour window. In this case I'd want to see that 2 actions occurred between 13:00 and 14:00 for id 1, and that the other 23 hours had 0 actions.
The end goal would be to have a time series that looks back over three months and be able to view monthly, weekly and importantly daily / intra-daily counts of actions for each unique ID.
Desired result may look something like this:
ID | M1W1D1H1 | M1W1D1H2 | -> | M1W1D1H13 | -> | M1W1D2H12
1 | 0 | 0 | -> | 2 | -> | 1
2 | 0 | 0 | -> | 1 | -> | 0
M = Month, W = Week, D = Day, H = Hour. AC = ActionCount
So the above shows that in month 1, week 1, day 1, hour 1, id 1 had no actions. The first action was in M1W1D1H13, during which they had two actions. Their next action was on D2 of W1, M1. I could then aggregate up to get the respective daily, weekly and monthly action counts. The result will be fairly sparse, with many 0 counts.
Any help and guidance appreciated.
If I am understanding your question you have an id with a date and time details in a normalized data structure. You, however, want to denormalize this data so that you have only one line per id aggregated at the conditions you desire.
To do this you could use a simple group by and nest your aggregations into case statements that qualify them for the column range you want. If you cannot hard-code your time slices and need them to be dynamic, that may be possible, but I would need more information about your requirements. You can also nest case statements inside case statements and use derived tables to enable more complex rules.
So, using your example...
SELECT
    UniqueID
    , sum(
        case when actionDate between <someDate> and <someDate> then 1
        end) as evnt_cnt_in_range01
    , count(distinct(
        case when actionDate between <someDate> and <someDate> then actionDate
        end)) as uniq_dates_in_range01
    , min(
        case when actionDate between <someDate> and <someDate> then actionTime
        end) as earliest_action_in_range01
    , max(
        case when actionDate between <someDate> and <someDate> then actionTime
        end) as latest_action_in_range01
    , max(
        case when actionDate between <someDate> and <someDate> then
            CASE WHEN actionTime > '12:00' THEN 1 ELSE 0 END -- I flip caps to keep nests straight
        end) as cnt_after_noon_action_range1 -- max() yields a 1/0 flag; use sum() here for an actual count
FROM <sometable>
group by 1
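If the time slices cannot be hard-coded, one alternative is to count per (id, period) key and pivot afterwards. The Python sketch below shows this for the four sample rows. It assumes the dates are dd-mm-yy and uses ISO week numbers for the weekly key; both are assumptions, and the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime

# Sample rows from the question: (UniqueID, actionDate, actionTime)
actions = [
    (1, "17-01-18", "13:01"),
    (1, "17-01-18", "13:15"),
    (2, "17-01-18", "13:15"),
    (1, "18-01-18", "12:56"),
]


def count_actions(actions):
    """Return hourly, daily and weekly action counts keyed by id and period."""
    hourly, daily, weekly = Counter(), Counter(), Counter()
    for uid, action_date, action_time in actions:
        # Assumed format: dd-mm-yy for the date, HH:MM for the time.
        ts = datetime.strptime(f"{action_date} {action_time}", "%d-%m-%y %H:%M")
        iso = ts.isocalendar()  # (ISO year, ISO week, ISO weekday)
        hourly[(uid, ts.date(), ts.hour)] += 1
        daily[(uid, ts.date())] += 1
        weekly[(uid, iso[0], iso[1])] += 1
    return hourly, daily, weekly


hourly, daily, weekly = count_actions(actions)
day1 = datetime(2018, 1, 17).date()
print(hourly[(1, day1, 13)])  # 2 actions for id 1 between 13:00 and 14:00
print(daily[(1, day1)])       # 2 actions for id 1 on that day
print(weekly[(1, 2018, 3)])   # 3 actions for id 1 in that ISO week
```

Hours with no actions simply have no key, so a Counter lookup returns 0 for them, which suits the sparse result described in the question.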

SQL Find latest record only if COMPLETE field is 0

I have a table with multiple records submitted by a user. In each record is a field called COMPLETE to indicate if a record is fully completed or not.
I need a way to get the latest records of the user where COMPLETE is 0, LOCATION, DATE are the same and no additional record exist where COMPLETE is 1. In each record there are additional fields such as Type, AMOUNT, Total, etc. These can be different, even though the USER, LOCATION, and DATE are the same.
There is a SUB_DATE field and ID field that denote the day the submission was made and auto incremented ID number. Here is the table:
ID | NAME | LOCATION | DATE | COMPLETE | SUB_DATE | TYPE1 | AMOUNT1 | TYPE2 | AMOUNT2 | TOTAL
1 | user1 | loc1 | 2017-09-15 | 1 | 2017-09-10 | Food | 12.25 | Hotel | 65.54 | 77.79
2 | user1 | loc1 | 2017-09-15 | 0 | 2017-09-11 | Food | 12.25 | NULL | 0 | 12.25
3 | user1 | loc2 | 2017-08-13 | 0 | 2017-09-05 | Flight | 140 | Food | 5 | 145.00
4 | user1 | loc2 | 2017-08-13 | 0 | 2017-09-10 | Flight | 140 | NULL | 0 | 140
5 | user1 | loc3 | 2017-07-14 | 0 | 2017-07-15 | Taxi | 25 | NULL | 0 | 25
6 | user1 | loc3 | 2017-08-25 | 1 | 2017-08-26 | Food | 45 | NULL | 0 | 45
The result I would like to retrieve is ID 4, because its SUB_DATE is later than that of ID 3, which has the same Name, Location, and Date, and there is no record in that group with COMPLETE set to 1.
I would also like to retrieve ID 5, since it is the latest record for its User, Location, and Date, and COMPLETE is 0.
I would also appreciate it if you could explain your answer to help me understand what is happening in the solution.
Not sure if I fully understood but try this
SELECT *
FROM (
SELECT *,
MAX(CONVERT(INT,COMPLETE)) OVER (PARTITION BY NAME,LOCATION,DATE) AS CompleteForNameLocationAndDate,
MAX(SUB_DATE) OVER (PARTITION BY NAME, LOCATION, DATE) AS LastSubDate
FROM your_table t
) a
WHERE CompleteForNameLocationAndDate = 0 AND
SUB_DATE = LastSubDate
So what we have done here:
First, if you run just the inner query in Management Studio, you will see what that does:
The first MAX function partitions the data in the table by each unique Name, Location, Date set.
In the case of your data, IDs 1 and 2 are the first partition, 3 and 4 are the second partition, 5 is the third partition and 6 is the fourth partition.
For each of these partitions it gets the max value in the COMPLETE column. Therefore any partition with 1 as its max value has been completed.
Note also the CONVERT function. This is because COMPLETE is of datatype BIT (1 or 0) and the MAX function does not work with that datatype, so we convert to INT. If your COMPLETE column is of type INT, you can take the convert out.
The second MAX function partitions by unique Name, Location and Date again, but this time we get the max SUB_DATE, which gives us the date of the latest record for that Name, Location, Date.
We then wrap that query in a derived table, which for simplicity we call a. We need to do this because SQL Server doesn't allow windowed functions in the WHERE clause of a query. A windowed function is one that uses the OVER keyword, as we have done. In an ideal world, SQL would let us do
SELECT *,
MAX(CONVERT(INT,COMPLETE)) OVER (PARTITION BY NAME,LOCATION,DATE) AS CompleteForNameLocationAndDate,
MAX(SUB_DATE) OVER (PARTITION BY NAME, LOCATION, DATE) AS LastSubDate
FROM your_table t
WHERE MAX(CONVERT(INT,COMPLETE)) OVER (PARTITION BY NAME,LOCATION,DATE) = 0 AND
SUB_DATE = MAX(SUB_DATE) OVER (PARTITION BY NAME, LOCATION, DATE)
But it doesn't allow it so we have to use the derived table.
So then we basically SELECT everything from our derived table Where
CompleteForNameLocationAndDate = 0
Which are Name,Location, Date partitions which do not have a record marked as complete.
Then we filter further asking for only the latest record for each partition
SUB_DATE = LastSubDate
Hope that makes sense, not sure what level of detail you need?
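If it helps to see the two window functions spelled out step by step, here is a small Python sketch of the same logic on the sample rows (only the columns the query uses). It is an illustration rather than a replacement for the SQL, and the function name is hypothetical.

```python
from collections import defaultdict

# (ID, NAME, LOCATION, DATE, COMPLETE, SUB_DATE) from the question's table
records = [
    (1, "user1", "loc1", "2017-09-15", 1, "2017-09-10"),
    (2, "user1", "loc1", "2017-09-15", 0, "2017-09-11"),
    (3, "user1", "loc2", "2017-08-13", 0, "2017-09-05"),
    (4, "user1", "loc2", "2017-08-13", 0, "2017-09-10"),
    (5, "user1", "loc3", "2017-07-14", 0, "2017-07-15"),
    (6, "user1", "loc3", "2017-08-25", 1, "2017-08-26"),
]


def latest_incomplete(records):
    """IDs of the latest record in each (NAME, LOCATION, DATE) partition
    that contains no record with COMPLETE = 1."""
    partitions = defaultdict(list)
    for rec in records:
        partitions[rec[1:4]].append(rec)  # PARTITION BY NAME, LOCATION, DATE

    ids = []
    for recs in partitions.values():
        complete_for_partition = max(r[4] for r in recs)  # MAX(COMPLETE) OVER (...)
        last_sub_date = max(r[5] for r in recs)           # MAX(SUB_DATE) OVER (...)
        if complete_for_partition == 0:
            # ISO-formatted date strings compare correctly as plain strings.
            ids.extend(r[0] for r in recs if r[5] == last_sub_date)
    return sorted(ids)


result = latest_incomplete(records)
print(result)  # [4, 5]
```

The partition containing IDs 1 and 2 is dropped because one of its records is complete, even though ID 2 itself has COMPLETE = 0.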
As an aside, I would look at restructuring your tables (unless of course you have simplified them to better explain this problem) as follows:
(Assuming the table in your example is called Booking)
tblBooking: BookingID, PersonID, LocationID, Date, Complete, SubDate
tblPerson: PersonID, PersonName
tblLocation: LocationID, LocationName
tblType: TypeID, TypeName
tblBookingType: BookingTypeID, BookingID, TypeID, Amount
This way, if you ever want to add Type3 or Type4 to your booking information, you don't need to alter your table layout.

MDX last order date and last order value

I've googled but I cannot work it out.
I have a fact table like this one:
fact_order
id | id_date | amount | id_supplier
1 | 1 | 100 | 4
2 | 3 | 200 | 4
where id_date is the primary key of a dimension that has
id | date | month
1 | 01/01/2011 | january
2 | 02/01/2011 | january
3
I would like to write a calculated member that gives me the last date and the last amount for the same supplier.
By last date and last amount, do you mean the maximum values for this supplier?
If yes, you can create two measures with the "max" aggregation on the fields id_date and amount.
Then convert the max id_date to a readable date as follows:
CREATE MEMBER CURRENTCUBE.[Measures].[Max Date]
AS
IIF([Measures].[Max Date Key] is NULL,NULL,
STRTOMEMBER("[Date].[Calendar].[Date].&["+STR([Measures].[Max Date Key])+"]").name),
VISIBLE = 1 ;
This works if the latest dates in your dimension have the largest IDs. In my opinion, you should use date IDs like 20110101, 20110102, etc., rather than 1, 2, 3...
If you don't want the max values, please provide more details and a small example.
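One caveat worth illustrating: the amount on the last order date is not necessarily the maximum amount. The Python sketch below (not MDX) picks, per supplier, the row with the largest date key and reads the amount from that row. It relies on the same assumption stated above, that larger id_date values mean later dates, and the function name is hypothetical.

```python
# Sample rows from fact_order: (id, id_date, amount, id_supplier)
orders = [
    (1, 1, 100, 4),
    (2, 3, 200, 4),
]


def last_order_per_supplier(orders):
    """Map each supplier to (last id_date, amount on that date),
    assuming a larger id_date means a later date."""
    latest = {}
    for _order_id, id_date, amount, supplier in orders:
        if supplier not in latest or id_date > latest[supplier][0]:
            latest[supplier] = (id_date, amount)
    return latest


result = last_order_per_supplier(orders)
print(result)  # {4: (3, 200)}
```

With the sample data the last amount and the max amount coincide (200), but if the later order had the smaller amount, two independent max measures would report a date and an amount from different orders.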