Merge two date fields in Google Data Studio - data-visualization

I have the following set-up in a spreadsheet:
Order | Open date | Completion date | Order status
--------------------------------------------------------------
Ord1 | 01/01/2020 | 01/02/2020 | Success
Ord2 | 01/01/2020 | 01/01/2020 | Rejected by the client
In this scenario I have:
2 orders opened in Jan
1 successful order in Feb
1 order rejected in Jan
If I want to put this in a line graph and compare the total number of orders vs. Success vs. Rejected, I would have to somehow merge the two date fields into a single date field, correct?
I say this because I have to use a date field for filtering, but if I filter based on Open date I don't get the correct date for the two statuses that link to Completion date, and the same applies the other way around.
Any ideas how to do this comparison in Google Data Studio?

For this "I want to set-up a chart to show me the open postions in January, and the completed positions by status (Success, Rejected). "
First create a filter on Open date. Then draw the line chart as below:
Dimension: Completion date
Breakdown Dimension: Order status
Metric: Record Count
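As a hedged sketch of the "single date field" idea from the question: if the data could be reshaped (for example in BigQuery behind Data Studio, or in a helper sheet), the two date columns can be unpivoted into one date plus an event type, so a single date field drives both the chart and its filter. The table and column names below (orders, order_id, open_date, completion_date, order_status) are hypothetical stand-ins for the spreadsheet columns:
-- Hypothetical sketch: unpivot Open date / Completion date into one date column.
SELECT order_id, open_date AS event_date, 'Opened' AS event_type, order_status
FROM orders
UNION ALL
SELECT order_id, completion_date, 'Completed', order_status
FROM orders
WHERE completion_date IS NOT NULL;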

Related

Significance test for AB testing of email marketing campaign - data in MySQL

I have 50,000 email IDs divided equally into Group A and Group B.
An email is sent to both groups, each with a different subject line.
The Open Rate is as follows:
Group | Emails_Opened | Total Emails | Open Rate
A | 24332 | 34471 | 70.5869%
B | 24020 | 33761 | 71.1427%
Which significance test should I run to make sure the results are statistically significant, and how?
The data is in a SQL database.
Build a 2x2 contingency table of observed counts (opened vs. not opened for each group), then fill in the column totals and row totals. Each expected value is
<corresponding column total> * <corresponding row total> / <grand total>
e.g., in a spreadsheet:
=B$4*$D2/$D$4
Then compare observed to expected with
=CHISQ.TEST(B2:C3,F2:G3)
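Since the question notes the data is in a SQL database, here is a minimal sketch of the same 2x2 chi-square statistic computed directly in SQL. The CTE simply hard-codes the observed counts from the table above; it should run on MySQL 8+ or PostgreSQL:
-- Chi-square statistic for a 2x2 table, computed with the closed-form formula
-- chi2 = N * (a*d - b*c)^2 / ((a+b)*(c+d)*(a+c)*(b+d)).
-- a/b = opened / not opened for Group A, c/d = opened / not opened for Group B.
WITH counts AS (
    SELECT 24332.0 AS a, (34471 - 24332.0) AS b,
           24020.0 AS c, (33761 - 24020.0) AS d
)
SELECT (a + b + c + d) * POWER(a * d - b * c, 2)
       / ((a + b) * (c + d) * (a + c) * (b + d)) AS chi_square
FROM counts;
-- Compare chi_square against the critical value for 1 degree of freedom
-- (3.841 at the 5% level), or convert it to a p-value outside SQL.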

Is there any good way to store records of the change in status together with number of days after status change

I have a table to record the current product status, as follows:
product_id | Date       | status
-----------+------------+-------------------
A1         | 20/10/2021 | manufacturing
A1         | 22/10/2021 | packaging
A1         | 24/10/2021 | pending to deliver
But I want to store the number of days after each status change in my database too, for tracking and analysis purposes. Similar to this:
product_id | Date       | status             | days
-----------+------------+--------------------+--------------------------------
A1         | 20/10/2021 | manufacturing      | 2
A1         | 22/10/2021 | packaging          | 2
A1         | 24/10/2021 | pending to deliver | #no. of days until today's date
I want the no. of days until today's date to update every day and stop once a new status is posted.
product_id | Date       | status             | days
-----------+------------+--------------------+------
A1         | 20/10/2021 | manufacturing      | 2
A1         | 22/10/2021 | packaging          | 2
A1         | 24/10/2021 | pending to deliver | 3
A1         | 27/10/2021 | delivered          | null
And when the status is 'delivered', days is null.
I am completely new to PostgreSQL and there is still a lot for me to discover, so I am not sure whether my way of thinking is correct. Is there a good way to keep track of the time needed for products to move from one status to another?
I hope I can get some sense of how to solve it.
What you need is a SELECT with either LAG() or LEAD() to calculate the days on demand - storing them isn't necessary and is not recommended, as you'd need to change that column every time the date column gets updated. The following example checks the status of each record: if the status is pending to deliver and there is no later record, it calculates the interval in days from the date column to the current date; if there is a later record, or the status is anything other than delivered, it calculates the interval from the current record to the next status change (1 FOLLOWING); for delivered it returns NULL.
SELECT
    *,
    CASE
        WHEN status = 'pending to deliver' AND LEAD(dt) OVER w IS NULL
            THEN CURRENT_DATE - dt
        WHEN status = 'pending to deliver' AND LEAD(dt) OVER w IS NOT NULL
            THEN LEAD(dt) OVER w - dt
        WHEN status = 'delivered'
            THEN NULL
        ELSE LEAD(dt) OVER w - dt
    END AS days
FROM
    t
WINDOW w AS (PARTITION BY product_id ORDER BY dt ASC
             ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING);
product_id | dt | status | days
------------+------------+--------------------+------
A1 | 2021-10-20 | manufacturing | 2
A1 | 2021-10-22 | packaging | 2
A1 | 2021-10-24 | pending to deliver | 3
A1 | 2021-10-27 | delivered |
A2 | 2021-10-22 | packaging | 2
A2 | 2021-10-24 | pending to deliver |
Demo: db<>fiddle
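For reference, a minimal setup to try the query above in PostgreSQL. The table name t and the columns product_id, dt, status come from the query itself; the column types are assumptions, and the rows mirror the demo output:
-- Minimal table and sample data matching the query and demo output above.
CREATE TABLE t (
    product_id text,
    dt         date,
    status     text
);

INSERT INTO t (product_id, dt, status) VALUES
    ('A1', '2021-10-20', 'manufacturing'),
    ('A1', '2021-10-22', 'packaging'),
    ('A1', '2021-10-24', 'pending to deliver'),
    ('A1', '2021-10-27', 'delivered'),
    ('A2', '2021-10-22', 'packaging'),
    ('A2', '2021-10-24', 'pending to deliver');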

Structuring Month-Based Data in SQL

I'm curious about the best way to structure data in a SQL database when I need to keep track of certain fields and how they differ from month to month.
For example, if I had a users table in which I was trying to store 3 different values: name, email, and how many times they've logged in each month. Would it be best practice to create a new column for each month and store the number of times they logged in that month under that column? Or would it be better to create a new row/table for each month?
My instinct says creating new columns is the best way to reduce redundancy; however, I can see it getting a little unwieldy as the number of columns in the table grows over time. (I was also thinking that if I were to do it by column, it would warrant having a total_column that keeps track of all months at a time.)
Thanks!
In my opinion, the best approach is to store each login for each user.
Use a query to summarize the data the way you need it when you query it.
You should only be thinking about other structures if summarizing the detail doesn't meet performance requirements -- which for a monthly report don't seem so onerous.
Whatever you do, storing counts in separate columns is not the right thing to do. Every month, you would need to add another column to the table.
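A minimal sketch of that approach, assuming hypothetical users and logins tables (one row per login), with the monthly counts computed by a query rather than stored:
-- Hypothetical sketch: store one row per login and summarize on demand.
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    name    VARCHAR(100),
    email   VARCHAR(255)
);

CREATE TABLE logins (
    user_id    INT REFERENCES users (user_id),
    login_time TIMESTAMP
);

-- Logins per user per month, computed only when needed.
SELECT u.user_id,
       u.name,
       EXTRACT(YEAR  FROM l.login_time) AS yr,
       EXTRACT(MONTH FROM l.login_time) AS mon,
       COUNT(*) AS login_count
FROM users u
JOIN logins l ON l.user_id = u.user_id
GROUP BY u.user_id, u.name,
         EXTRACT(YEAR FROM l.login_time), EXTRACT(MONTH FROM l.login_time);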
I'm not an expert but in my opinion, it is best to store data in a separate table (in your case). That way you can manipulate the data easily and you don't have to modify the table design in the future.
PK: UserID & Date or New Column (Ex: RowNo with auto increment)
+--------+------------+-----------+
| UserID | Date | NoOfTimes |
+--------+------------+-----------+
| 01 | 2018.01.01 | 1 |
| 01 | 2018.01.02 | 3 |
| 01 | 2018.01.03 | 5 |
| .. | | |
| 02 | 2018.01.01 | 2 |
| 02 | 2018.01.02 | 6 |
+--------+------------+-----------+
Or
PK: UserID, Year & Month or New Column (Ex: RowNo with auto increment)
+--------+------+-------+-----------+
| UserID | Year | Month | NoOfTimes |
+--------+------+-------+-----------+
| 01 | 2018 | Jan | 10 |
| 01 | 2018 | Feb | 13 |
+--------+------+-------+-----------+
Before you create the table, please take a look at the database normalization. Especially 1st (1NF), 2nd (2NF) and 3rd (3NF) normalization forms.
https://www.tutorialspoint.com/dbms/database_normalization.htm
https://www.lifewire.com/database-normalization-basics-1019735
https://www.essentialsql.com/get-ready-to-learn-sql-database-normalization-explained-in-simple-english/
https://www.studytonight.com/dbms/database-normalization.php
https://medium.com/omarelgabrys-blog/database-normalization-part-7-ef7225150c7f
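A hedged sketch of the second layout above (one row per user per year and month) with the composite primary key mentioned; the table and column names are illustrative, and the month is stored as a number rather than a name:
-- Illustrative DDL for the (UserID, Year, Month) layout with a composite primary key.
CREATE TABLE user_monthly_logins (
    UserID    INT      NOT NULL,
    Yr        SMALLINT NOT NULL,   -- calendar year, e.g. 2018
    Mnth      SMALLINT NOT NULL,   -- 1..12 instead of a month name
    NoOfTimes INT      NOT NULL DEFAULT 0,
    PRIMARY KEY (UserID, Yr, Mnth)
);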
Either approach is valid, depending on query patterns and join requirements.
One row for each month
For a user, the row containing login count for the month will be inserted when data is available for the month. There will be 1 row per month per user. This design will make it easier to do joins by month column. However, multiple rows will need to be accessed to get data for a user for the year.
-- column list
name
email
month
login_count
-- example entries
'user1', 'user1@email.com', 'jan', 100
'user2', 'user2@email.com', 'jan', 65
'user1', 'user1@email.com', 'feb', 90
'user2', 'user2@email.com', 'feb', 75
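If you later need one row per user under this layout, a hedged sketch using conditional aggregation over the columns listed above (the table name monthly_logins is hypothetical):
-- Pivot monthly rows into one row per user with conditional aggregation.
SELECT name,
       email,
       SUM(CASE WHEN month = 'jan' THEN login_count ELSE 0 END) AS jan_login_count,
       SUM(CASE WHEN month = 'feb' THEN login_count ELSE 0 END) AS feb_login_count,
       SUM(login_count) AS total_login_count
FROM monthly_logins
GROUP BY name, email;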
One row for all months
You do not need to dynamically add columns, since the number of months is known in advance. The table can be created up front to accommodate all months. By default, all month_login_count columns would be initialized to 0. Then, the row would be updated as the login count for the month is populated. There will be 1 row per user. This design is not the best for doing joins by month. However, only one row will need to be accessed to get data for a user for the year.
-- column list
name
email
jan_login_count
feb_login_count
mar_login_count
apr_login_count
may_login_count
jun_login_count
jul_login_count
aug_login_count
sep_login_count
oct_login_count
nov_login_count
dec_login_count
-- example entries
'user1','user1@email.com',100,90,0,0,0,0,0,0,0,0,0,0
'user2','user2@email.com',65,75,0,0,0,0,0,0,0,0,0,0

Dynamically determine and categorize duplicates in Tableau

I have a set that has the following structure:
ID | Date | DollarAmount
1 | Jan | 50
1 | Jan | 20
2 | Jan | 10
1 | Feb | 20
2 | Feb | 10
I am trying to dynamically determine whether, for a particular period in time, there is a duplicate based on the ID column.
For example, based on the data above, I would ideally have
I have tried to filter based on Number of Records, but it filters out based on the TOTAL observations across the dataset, not within date ranges.
Any help is much appreciated
Thanks!
Apparently you define duplicate records as those that have the same value for the ID and Date fields, where Date is really a string containing the month name abbreviation.
In that case, define a (Boolean valued) LOD calculated field called [Duplicates] as {FIXED [ID], [Date] : Count(1) > 1}
Place [Duplicates] on the Color shelf, SUM([DollarAmount]) on Rows, and [Date] on Columns.
You will see the values True and False in the Color Legend. You can edit the aliases for those values if you want to display clearer labels such as Duplicates and Non-Duplicates.
If you have a true date valued field instead of a string, you may want to use DateTrunc() to define your duplicate test at the level of granularity that matches your problem.

How can I see if a date is on a weekend?

I have a table:
ID | Name | TDate
1 | John | 1 May 2013, 8:67AM
2 | Jack | 2 May 2013, 6:43AM
3 | Adam | 3 May 2013, 9:53AM
4 | Max | 4 May 2013, 2:13AM
5 | Leny | 5 May 2013, 5:33AM
I need a query that will return all the items where TDate is a weekend. How would I write such a query?
WHAT I HAVE SO FAR
select
    table.*,
    EXTRACT(DAY FROM table.tdate)
from table
I did a select using EXTRACT just to see if I can get the right values. However, EXTRACT with the parameter DAY returns the day of the month. If I instead use WEEKDAY, as per the documentation here, then I get an error:
ERROR: timestamp units "weekday" not recognized
SQL state: 22023
EDIT
TDate has a data type of datetime (timestamp). I just wrote it like that for easy reading. But regardless of the type, I could easily cast between types if need be.
I know the dates 4 May and 5 May fall on a weekend (a Saturday and a Sunday). Does Firebird allow for a way to write a query that returns rows whose dates fall on a weekend?
try this:
SELECT ID, Name, TDate
FROM your_table
WHERE EXTRACT(WEEKDAY FROM TDate) IN (6,0)
UPDATE
In Firebird, EXTRACT(WEEKDAY FROM ...) returns 0 for Sunday through 6 for Saturday, so the weekend condition must be (0,6), not (0,1).