QuestDB: Partition by week - documentation

Is there a way to partition tables by week in QuestDB? The QuestDB documentation shows NONE, YEAR, MONTH, DAY, and HOUR.

There is currently no way to partition a table by week. I'm sure it will be available in a future version, so if you need it you can raise a feature request on QuestDB's GitHub repo.

Related

Partitioning public dataset does not return any values before an x date

I looked at different articles, the GCP documentation and online tutorials to see where the error on my side might be, to no avail.
This script in the BigQuery editor UI does not partition dates before May/June 2022.
CREATE OR REPLACE TABLE <TABLE>
PARTITION BY
DATE(trip_start_timestamp)
AS (
SELECT
trip_start_timestamp, trip_seconds
FROM
`bigquery-public-data.chicago_taxi_trips.taxi_trips`
);
After the creation of the table, I checked the results and there are no partitions before May 2022.
I have tested this with a client library as well (Python), with different CAST by date or timestamp before the load, with different qualifying filters (WHERE clause).
No dates before May are picked up after the creation of the partition table.
Am I missing something obvious that prevents the expected results from being returned?
The expected output is a partitioned table with all the dates returned by the DDL statement, not just dates or timestamps between May and June 2022.
Edited September 5th:
For the interested reader, I've filed a bug on the issue tracker to investigate this further.

Organizing per-week and daily data in SQL

Problem overview
I'm working on a simple app for reminding the user of weekly goals. Let's say the goal is to do 30 minutes of exercise on specific days of the week.
Sample goal: do exercise on Mon, Wed, Fri.
The app also needs to track past record, i.e. dates when the user did exercise. It could be just dates, e.g.: 2019-09-02, 2019-09-05, 2019-09-11 means the user did exercise on these days and did not on the others (doesn't need to be on "exercise goal" days of the week).
The goal can change in time. Let's say today is 2019-09-11 and the goal for this week ([2019-09-09, 2019-09-15]) is Mon, Wed, Fri but from 2019-08-05 to 2019-09-08 it was Mon, Thu (repeatedly for all these weeks).
I need to store these week-oriented goals and the historical exercise data, and be able to retrieve the following:
The goal days for the current week (or any week, let's say I can compute start and end day for any week given a date).
Exercise history for a larger range of days together with goal days for that range (e.g. to show when the user was supposed to exercise and when they actually did in the last month).
Question
How to best store this data in SQL.
This is a little bit academic because I'm working on a small Android app and the data is just for a single user. So there will be little data and I can successfully use any approach, even a very clumsy one will be efficient enough.
However, I'd like to explore the topic and maybe learn a thing or two.
Possible solutions
Here are two approaches that come to my mind.
In both cases I would store exercise history as a table of dates. If there is an entry for that date it means the user did exercise on that day.
It's the goal storage that is interesting.
Approach 1
Store the goals per-week (it's SQLite so dates are stored as strings - all dates are just 'YEAR-MONTH-DAY'):
CREATE TABLE goals (
start_date TEXT,
exercise_days TEXT);
"start_date" is the first day of the week,
"exercise_days" is a comma-separated list of weekdays (let's say numbers 1-7).
So for the example above we might have two rows:
'2019-08-05', '1,4'
'2019-09-09', '1,3,5'
meaning that since 2019-08-05 the goal is Mon, Thu for all weeks until 2019-09-09, when the goal becomes Mon, Wed, Fri. So there is a gap in the data. I wouldn't want to generate data for weeks starting on 2019-08-12, 2019-08-19, 2019-08-26.
With this approach it is easy to work with the data week-wise. The current goal is the row with MAX(start_date). The goal for the week containing a given date is the row with MAX(start_date) WHERE start_date <= :date.
However it gets cumbersome when I want to get data for the last 3 months and show the user their progress.
Or maybe I want to show the user the percentage of actual exercise days to what they set as their goal in a year.
In this case it seems the best approach is to fetch the data separately and merge it in the application (or maybe write some complex queries), processing week by week. This is ok performance-wise because the amount of data is small and I rarely need more than a handful of weeks.
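For reference, the week lookup described in Approach 1 could be sketched as follows with Python's built-in sqlite3 module. The table and column names are the ones from the question; the helper name goal_for_date and the 1 = Monday numbering are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE goals (start_date TEXT PRIMARY KEY, exercise_days TEXT)")
conn.executemany(
    "INSERT INTO goals VALUES (?, ?)",
    [("2019-08-05", "1,4"), ("2019-09-09", "1,3,5")],
)

def goal_for_date(conn, date):
    """Return the weekday numbers (assumed 1=Mon .. 7=Sun) of the goal in effect on `date`."""
    # ISO 'YYYY-MM-DD' strings sort chronologically, so plain text comparison works.
    row = conn.execute(
        "SELECT exercise_days FROM goals "
        "WHERE start_date <= ? ORDER BY start_date DESC LIMIT 1",
        (date,),
    ).fetchone()
    return [int(d) for d in row[0].split(",")] if row else []

print(goal_for_date(conn, "2019-08-20"))  # [1, 4]
print(goal_for_date(conn, "2019-09-11"))  # [1, 3, 5]
```

Because only the rows where the goal changes are stored, the "gap" weeks never need to be materialized for this lookup.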
Approach 2
Store goals in such a way that each goal day is a record:
CREATE TABLE goals (
day TEXT);
"day" is a day when the user should exercise. So for the week starting 2019-09-09 (Mon, Wed, Fri) we would have:
'2019-09-09'
'2019-09-11'
'2019-09-13'
and for the week starting 2019-08-05 (Mon, Thu) we would have:
'2019-08-05'
'2019-08-08'
but what about the weeks in between?
If my app could fill all the weeks in-between then it would be easy to merge this data with the exercise history and display days when the user was supposed to exercise and when they actually did. Extracting the goal for any given week would also be easy.
The problem is: this requires the app to generate data for the "gap" weeks even if the user doesn't tweak the goal. This can be implemented as a transaction that is run each time the app process starts. In some cases it could take noticeable time for occasional users of the app (think progress bar for a second).
Maybe there is a smart way to generate the in-between data when making a SELECT query?
I don't like the fact that it requires generating data. I do like the fact that I can just join the tables and then process that (e.g. compute how many exercise days there were supposed to be in August and how many days the user did actually exercise and then show them percentage like "you did 85% of your goal" - in fact I can do this without joining the tables).
Also, it seems this approach gives me more flexibility for analysis in the future.
But is there a third way? Or maybe I am overthinking this? :)
(I am asking mostly for the way of organizing the data, there's no need for exact SQL queries)
Perhaps I'm over-thinking this, but if a goal can have multiple components and can change over time, I'd have a goal header record with the ID, name and other data about the goal as a whole, and a separate linked table holding the time-boxed components of that goal. For example:
CREATE TABLE goal_days (
goal_day_ID INT,
goal_ID INT,
day_ID INT,
target_minutes INT,
start_date TEXT,
end_date TEXT);
I'd have thought that allows you to easily check against the history to map against each day of the goal - e.g. they got 100% of the Mondays, but kept missing Thursday - however when the goal was changed to Friday instead they got better.
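A minimal sketch of how such time-boxed rows could be compared against the exercise history, using Python's built-in sqlite3 module. It also shows one possible way to generate the in-between days at query time, with a recursive common table expression, instead of storing them. The exercise_history table, the function name goal_vs_actual, the day_ID encoding (1 = Mon .. 7 = Sun) and the inclusive end_date are all assumptions, and the table is reduced to the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE goal_days (
    goal_ID INT,
    day_ID INT,        -- assumed encoding: 1=Mon .. 7=Sun
    start_date TEXT,
    end_date TEXT      -- assumed inclusive
);
CREATE TABLE exercise_history (day TEXT PRIMARY KEY);
INSERT INTO goal_days VALUES
    (1, 1, '2019-08-05', '2019-09-08'),
    (1, 4, '2019-08-05', '2019-09-08'),
    (1, 1, '2019-09-09', '2019-09-15'),
    (1, 3, '2019-09-09', '2019-09-15'),
    (1, 5, '2019-09-09', '2019-09-15');
INSERT INTO exercise_history VALUES ('2019-09-02'), ('2019-09-05'), ('2019-09-11');
""")

GOAL_VS_ACTUAL = """
WITH RECURSIVE calendar(day) AS (
    SELECT :first
    UNION ALL
    SELECT date(day, '+1 day') FROM calendar WHERE day < :last
)
SELECT c.day, h.day IS NOT NULL AS done
FROM calendar AS c
JOIN goal_days AS g
  ON c.day BETWEEN g.start_date AND g.end_date
 -- strftime('%w') yields 0=Sun .. 6=Sat; remap to 1=Mon .. 7=Sun
 AND g.day_ID = ((CAST(strftime('%w', c.day) AS INT) + 6) % 7) + 1
LEFT JOIN exercise_history AS h ON h.day = c.day
ORDER BY c.day
"""

def goal_vs_actual(conn, first, last):
    """Return (goal_day, done) pairs for every goal day in [first, last]."""
    return conn.execute(GOAL_VS_ACTUAL, {"first": first, "last": last}).fetchall()

rows = goal_vs_actual(conn, "2019-09-02", "2019-09-15")
print(rows)
# [('2019-09-02', 1), ('2019-09-05', 1), ('2019-09-09', 0), ('2019-09-11', 1), ('2019-09-13', 0)]
```

From these rows a figure such as "you did 60% of your goal" follows by dividing the number of done days by the number of goal days.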

Add column of customer's past purchase total at time of current purchase and find rate of purchases that are from a returning customer - SQL

I am working with a table containing the purchase history for a shop. There is a purchase id, a date column and a customer id. I am trying (without much success so far) to do two things:
Add a column which for each purchase tells how many purchases the customer made before this (in the last month). I started by joining the table on itself but haven't got much further. I know I'll need to somehow filter the date so it only counts purchases before this date and not more than a month ago. Any suggestions on a simple way to tackle this?
The second thing I would like to see is what the weekly rate of returning customer transactions is. That is, what proportion of the purchases are by someone who purchased recently (in the last month). Ideally I would be able to graph this so from my sql queries I would like to end up with a date, weekly total (the 7 days up to the date) and weekly rate. I have been reading up on rolling windows and to be honest am having a bit of trouble getting my head around it. My SQL level is still quite low unfortunately. Any tips on a relatively simple way to do this would be much appreciated.
Thanks
I would need to see the data structure of your table(s) to answer your question better, but off the top of my head it seems like you just need a simple SELECT COUNT.
So something like this would return all transactions from a single customer made in the past month:
SELECT COUNT(purchase_id)
FROM purchases
WHERE customer_id='some_customer_id'
AND date >= DATEADD(m, -1, GETDATE());
As for your second question, you would probably want to set up a job (Jenkins, etc.) that runs a query every month, and then plot the results. Check out https://oss.oetiker.ch/rrdtool/ for graphing.
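The per-purchase count and the weekly returning rate asked about in the question could also be sketched along these lines. This is an illustration only, run against SQLite through Python's sqlite3 module rather than the DATEADD/GETDATE dialect above; the sample rows, the date-only values, and the choice to ignore same-day purchases are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (purchase_id INTEGER PRIMARY KEY, customer_id TEXT, date TEXT);
INSERT INTO purchases VALUES
    (1, 'a', '2023-01-02'),
    (2, 'b', '2023-01-03'),
    (3, 'a', '2023-01-20'),
    (4, 'a', '2023-01-25'),
    (5, 'b', '2023-03-01');
""")

# For each purchase, count the same customer's purchases in the preceding month.
PRIOR_SQL = """
SELECT p.purchase_id,
       (SELECT COUNT(*) FROM purchases AS q
         WHERE q.customer_id = p.customer_id
           AND q.date < p.date
           AND q.date >= date(p.date, '-1 month')) AS prior_purchases
FROM purchases AS p
ORDER BY p.purchase_id
"""

# For the 7 days ending on :end, the purchase total and the share made by
# customers who had also purchased within the month before that purchase.
WEEKLY_SQL = """
WITH flagged AS (
    SELECT p.date AS purchase_date,
           EXISTS (SELECT 1 FROM purchases AS q
                    WHERE q.customer_id = p.customer_id
                      AND q.date < p.date
                      AND q.date >= date(p.date, '-1 month')) AS is_returning
    FROM purchases AS p
)
SELECT COUNT(*), AVG(is_returning)
FROM flagged
WHERE purchase_date BETWEEN date(:end, '-6 days') AND :end
"""

prior = conn.execute(PRIOR_SQL).fetchall()
weekly = conn.execute(WEEKLY_SQL, {"end": "2023-01-25"}).fetchone()
print(prior)   # [(1, 0), (2, 0), (3, 1), (4, 2), (5, 0)]
print(weekly)  # (2, 1.0)
```

Running the second query once per end date gives the (date, weekly total, weekly rate) series to plot.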

Display the most recent entry in a given week

I have got multiple entries in each week. I would like to see the most recent entry in the current week and the most recent entry the previous week.
Any help is much appreciated.
Your question is very vague.
Query #1:
For the most recent entry this week (including today), try max([DateFieldGoesHere]).
Query #2:
For the most recent entry in the previous week, it depends on how the week is defined.
If your week starts "7 days ago", then you can use a WHERE clause with a datediff of 7 days.
If your week starts on Sunday, then your WHERE clause will have to take that into account.
The datediff functions available to you will depend on the technologies you are using.
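As one possible concrete reading of the two queries, here is a sketch using Python's built-in sqlite3 module. It assumes a table named entries with an ISO-formatted entry_date column and weeks that start on Monday; none of these details come from the question.

```python
import sqlite3
from datetime import date, timedelta

def week_bounds(day):
    """Return (start, end) ISO strings for the Monday-based week containing `day`; end is exclusive."""
    start = day - timedelta(days=day.weekday())  # weekday(): Monday == 0
    return start.isoformat(), (start + timedelta(days=7)).isoformat()

def latest_in_week(conn, day):
    """Return the most recent entry_date in the week containing `day`, or None."""
    start, end = week_bounds(day)
    return conn.execute(
        "SELECT MAX(entry_date) FROM entries WHERE entry_date >= ? AND entry_date < ?",
        (start, end),
    ).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (entry_date TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?)",
    [("2019-09-02",), ("2019-09-05",), ("2019-09-09",), ("2019-09-11",)],
)

today = date(2019, 9, 11)
print(latest_in_week(conn, today))                      # 2019-09-11
print(latest_in_week(conn, today - timedelta(days=7)))  # 2019-09-05
```

Changing the week definition only means changing week_bounds; the query itself stays the same.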

Core Data or NSDictionary with multiple date entries (about 800+ each year)? What would be the most easy to implement?

I'm trying to figure the best approach to solve this problem.
--
I have a "History" table that,
lists ALL years that have data.
If a user clicks a given Year, it segues to a new Table and,
lists ALL months that have data.
Clicking a given month, shows a new table that,
lists ALL days that have data.
Clicking a specific day, shows a list of one or multiple Time Stamps.
--
What is the best approach to solve this?
If user creates a Time Stamp. I need to insert it with today's date.
I also need to have the ability that if a user,
Deletes a given year. Everything in that year is deleted.
That same way,
Deleting a month, deletes everything in that month, for its particular year.
And so on, to the point where the user should be able to delete Individual Time Stamps.
--
I thought I would Use a Dictionary with key for the "year". 2012, 2013, ...
And each retrieving another Dictionary with key for the "month", 1, 2, 3, 4, ...
And so on ... and so on ...
I also thought I could make a model using Core Data.
A Class Year representing the "Year" entity, having a relation to many possible Months, and each month, having a relation to many possible days, and days to Time Stamps.
And last,
I thought of creating a model with only two Entities.
Entries, with only one attribute "Date", that has a to-many relationship to "Time Stamps", receiving All the possible Time Stamps for that given day.
I am new to iOS programming. So this is all theory for me. But I did follow some Core Data tutorials and others working with NSDictionaries, protocols delegates and so on.
The "Dig In" approach, as I go through it, seems more elegant. Especially because I think I could delete a particular given object in a cascade manner?
Do any of these make sense? Or is there a more obvious easy way to go about it? Also, please consider in the answer what would be easier to implement if a user chooses to delete a given entry in the "tree"
Any help is most appreciated.
Thank you in advance!
Nuno
If you are going to rely on Core Data or any database engine, the best way to solve this is to use the database itself.
I see two possible solutions (there are more, of course). The first and simplest:
Entity
- timestamp
- year
- month
- day
- all_the_stuff_you_need
Make year, month and day readonly, updated along timestamp. Indexes: year, year+month, year+month+day. Easy call.
That way, you can very simply query the database, asking it to return the entities you need and only the entities you need.
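To make the idea concrete outside Core Data, here is a rough sketch of the same flat layout using Python's built-in sqlite3 module as a stand-in database engine. The table name entity and the helper functions are hypothetical. In SQL a single composite index on (year, month, day) can also serve queries on year alone and on year plus month, so one index is used here.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (
    timestamp TEXT NOT NULL,
    year  INT NOT NULL,
    month INT NOT NULL,
    day   INT NOT NULL
);
CREATE INDEX entity_ymd ON entity (year, month, day);
""")

def add_timestamp(conn, ts):
    """Insert a time stamp; year, month and day are always derived from it."""
    conn.execute(
        "INSERT INTO entity VALUES (?, ?, ?, ?)",
        (ts.isoformat(), ts.year, ts.month, ts.day),
    )

def months_with_data(conn, year):
    """List the months of `year` that contain at least one time stamp."""
    rows = conn.execute(
        "SELECT DISTINCT month FROM entity WHERE year = ? ORDER BY month", (year,)
    )
    return [r[0] for r in rows]

def delete_month(conn, year, month):
    """Delete every time stamp in the given month of the given year."""
    conn.execute("DELETE FROM entity WHERE year = ? AND month = ?", (year, month))

for ts in (datetime(2012, 3, 1, 10), datetime(2012, 3, 1, 12),
           datetime(2012, 4, 2, 9), datetime(2013, 1, 5, 8)):
    add_timestamp(conn, ts)

print(months_with_data(conn, 2012))  # [3, 4]
delete_month(conn, 2012, 3)
print(months_with_data(conn, 2012))  # [4]
```

Deleting a year or a single day follows the same pattern with a shorter or longer WHERE clause.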
A more complex setup would be:
Entity
- timestamp
- all_the_stuff_you_need
- year -> Year
- month -> Month
- day -> Day
Year
- year
- entities ->> Entity
Month
- month
- entities ->> Entity
Day
- day
- entities ->> Entity
So basically, 3 data domains for the years, months and days, months and days being immutable.
That structure is more complex, but it gives a better view of your data. You have a direct access to more information on your data as the data domains are explicit and well defined.
A third solution would be to create a date entity with year, month and day, with one entry per day. A middle ground between the two solutions above. Less interesting I think, but hey, it may suit your needs anyway.