I need some help with a SQL problem (specifically in SQLite). I have a table 'vacation':
CREATE TABLE vacation(
name TEXT,
to_date TEXT,
from_date TEXT
);
where I store the dates (YYYY-MM-DD) on which somebody leaves for vacation and comes back again. Now I would like to get a distinct list of all dates on which somebody is on vacation. Let's assume my table looks like this:
+------------+-------------+------------+
| name | to_date | from_date |
+------------+-------------+------------+
| Peter | 2013-07-01 | 2013-07-10 |
| Paul | 2013-06-30 | 2013-07-05 |
| Simon | 2013-05-10 | 2013-05-15 |
+------------+-------------+------------+
The result from the query should look like:
+------------------------------+
| dates_people_are_on_vacation |
+------------------------------+
| 2013-05-10 |
| 2013-05-11 |
| 2013-05-12 |
| 2013-05-13 |
| 2013-05-14 |
| 2013-05-15 |
| 2013-06-30 |
| 2013-07-01 |
| 2013-07-02 |
| 2013-07-03 |
| 2013-07-04 |
| 2013-07-05 |
| 2013-07-06 |
| 2013-07-07 |
| 2013-07-08 |
| 2013-07-09 |
| 2013-07-10 |
+------------------------------+
I thought about using a date table 'all_dates'
CREATE TABLE all_dates(
date_entry TEXT
);
which covers a 20-year time span (2010-01-01 to 2030-01-01) and the following query:
SELECT date_entry FROM all_dates WHERE date_entry BETWEEN (SELECT from_date FROM vacation) AND (SELECT to_date FROM vacation);
However, if I apply this query to the above dataset, I only get a fraction of my desired result:
+------------------------------+
| dates_people_are_on_vacation |
+------------------------------+
| 2013-07-01 |
| 2013-07-02 |
| 2013-07-03 |
| 2013-07-04 |
| 2013-07-05 |
| 2013-07-06 |
| 2013-07-07 |
| 2013-07-08 |
| 2013-07-09 |
| 2013-07-10 |
+------------------------------+
Can it be done in SQLite? Or is it better if I just return the 'to_date' and 'from_date' columns and fill in the gaps between these dates in my Python application?
Any help is appreciated!
You can try this:
SELECT date_entry
FROM vacation
JOIN all_dates ON date_entry BETWEEN to_date AND from_date
GROUP BY date_entry
ORDER BY date_entry
Your original query only covers one vacation because each scalar subquery (SELECT from_date FROM vacation) returns the value from a single row. Joining the two tables instead evaluates the BETWEEN condition against every vacation row. Note also that with your column naming the departure date (to_date) is the lower bound and the return date (from_date) is the upper bound.
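If you would rather not maintain a 20-year all_dates table, a recursive CTE can generate the date range on the fly (SQLite 3.8.3 or later). Here is a runnable sketch using Python's sqlite3 module with the sample data from the question; the CTE bounds and join condition follow the question's column naming, where to_date is the departure and from_date the return:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vacation(name TEXT, to_date TEXT, from_date TEXT);
INSERT INTO vacation VALUES
  ('Peter', '2013-07-01', '2013-07-10'),
  ('Paul',  '2013-06-30', '2013-07-05'),
  ('Simon', '2013-05-10', '2013-05-15');
""")

# The recursive CTE plays the role of the all_dates table: it generates
# every date from the earliest departure to the latest return, and the
# join keeps only the dates covered by at least one vacation row.
rows = conn.execute("""
WITH RECURSIVE all_dates(date_entry) AS (
  SELECT MIN(to_date) FROM vacation
  UNION ALL
  SELECT date(date_entry, '+1 day')
  FROM all_dates
  WHERE date_entry < (SELECT MAX(from_date) FROM vacation)
)
SELECT DISTINCT date_entry
FROM all_dates
JOIN vacation ON date_entry BETWEEN to_date AND from_date
ORDER BY date_entry
""").fetchall()

dates = [r[0] for r in rows]
```

With the sample data this yields the 17 distinct dates from 2013-05-10 through 2013-05-15 and from 2013-06-30 through 2013-07-10.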
First of all, I am very very new to SQL.
I would like to convert a table in the following format:
--------------------------------
| ID | create_date |
--------------------------------
| 1 | 2020-03-01 |
| 1 | 2020-04-01 |
| 2 | 2019-03-15 |
| 2 | 2020-04-20 |
| 2 | 2021-05-30 |
| 3 | 2022-04-01 |
--------------------------------
into this:
-----------------------------------------------
| ID | create_date | to_date |
-----------------------------------------------
| 1 | 2020-03-01 | 2020-03-31 |
| 1 | 2020-04-01 | 9999-12-31 |
| 2 | 2019-03-15 | 2020-04-19 |
| 2 | 2020-04-20 | 2021-05-29 |
| 2 | 2021-05-30 | 9999-12-31 |
| 3 | 2022-04-01 | 9999-12-31 |
-----------------------------------------------
using Oracle SQL.
As you can see, I have records for the same customers (id) but with different dates (create_date).
I want to create a new column (let's call it to_date) in which I will have the appropriate value:
1. if the first `id` is the same as the second `id`, put the same date as in the second row but minus one day
(so the first row gets '2020-03-31', because the second row has '2020-04-01')
2. if the first `id` is NOT the same as the second `id`, put the date `9999-12-31`
(in other words, put '9999-12-31' for the row with the largest create_date of each unique id)
In Python it would look more or less like this:
for i in range(len(df) - 1):
    if df.ID[i] == df.ID[i + 1]:
        to_date[i] = create_date[i + 1] - 1  # one day before the next row's date
    else:
        to_date[i] = '9999-12-31'
to_date[len(df) - 1] = '9999-12-31'  # the last row is always open-ended
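The same row-by-row logic can be sketched as self-contained, runnable Python on plain lists (the sample values are hard-coded from the question, with rows sorted by ID and then create_date):

```python
from datetime import date, timedelta

# Sample rows from the question, sorted by ID then create_date
ids = [1, 1, 2, 2, 2, 3]
create_dates = [date(2020, 3, 1), date(2020, 4, 1),
                date(2019, 3, 15), date(2020, 4, 20),
                date(2021, 5, 30), date(2022, 4, 1)]

OPEN_END = date(9999, 12, 31)
to_dates = []
for i in range(len(ids)):
    if i + 1 < len(ids) and ids[i] == ids[i + 1]:
        # the next row for the same ID closes this interval one day earlier
        to_dates.append(create_dates[i + 1] - timedelta(days=1))
    else:
        # last row for this ID: the interval is open-ended
        to_dates.append(OPEN_END)
```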
How can I achieve that in Oracle SQL?
If you want to do it all in SQL, you can use LEAD to get the next date, using "ID" as the partition:
SELECT
  "ID", "create_date",
  COALESCE(
    LEAD("create_date") OVER (PARTITION BY "ID" ORDER BY "create_date") - INTERVAL '1' DAY,
    DATE '9999-12-31'
  ) AS "created_at"
FROM tab1
ID | create_date | created_at
-: | :---------- | :---------
1 | 01-MAR-20 | 31-MAR-20
1 | 01-APR-20 | 31-DEC-99
2 | 15-MAR-19 | 19-APR-20
2 | 20-APR-20 | 29-MAY-21
2 | 30-MAY-21 | 31-DEC-99
3 | 01-APR-22 | 31-DEC-99
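The same LEAD idea can be checked outside Oracle with SQLite (which supports window functions from version 3.25) through Python's sqlite3 module; note that SQLite spells the date arithmetic date(..., '-1 day') instead of Oracle's - INTERVAL '1' DAY:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab1(ID INTEGER, create_date TEXT);
INSERT INTO tab1 VALUES
  (1, '2020-03-01'), (1, '2020-04-01'),
  (2, '2019-03-15'), (2, '2020-04-20'), (2, '2021-05-30'),
  (3, '2022-04-01');
""")

# LEAD looks at the next create_date within the same ID; COALESCE fills
# the open-ended last interval of each ID with 9999-12-31.
rows = conn.execute("""
SELECT ID, create_date,
       COALESCE(
         date(LEAD(create_date) OVER (PARTITION BY ID ORDER BY create_date),
              '-1 day'),
         '9999-12-31') AS created_at
FROM tab1
ORDER BY ID, create_date
""").fetchall()
```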
Below is my data, where I am looking to generate the sum of revenue per month using the columns event_time and price.
+--------------------------+----------------------+----------------------+-----------------------+-------------------------+-----------------+-----------------+-------------------+---------------------------------------+
| oct_data.event_time | oct_data.event_type | oct_data.product_id | oct_data.category_id | oct_data.category_code | oct_data.brand | oct_data.price | oct_data.user_id | oct_data.user_session |
+--------------------------+----------------------+----------------------+-----------------------+-------------------------+-----------------+-----------------+-------------------+---------------------------------------+
| 2019-10-01 00:00:00 UTC | cart | 5773203 | 1487580005134238553 | | runail | 2.62 | 463240011 | 26dd6e6e-4dac-4778-8d2c-92e149dab885 |
| 2019-10-01 00:00:03 UTC | cart | 5773353 | 1487580005134238553 | | runail | 2.62 | 463240011 | 26dd6e6e-4dac-4778-8d2c-92e149dab885 |
| 2019-10-01 00:00:07 UTC | cart | 5881589 | 2151191071051219817 | | lovely | 13.48 | 429681830 | 49e8d843-adf3-428b-a2c3-fe8bc6a307c9 |
| 2019-10-01 00:00:07 UTC | cart | 5723490 | 1487580005134238553 | | runail | 2.62 | 463240011 | 26dd6e6e-4dac-4778-8d2c-92e149dab885 |
| 2019-10-01 00:00:15 UTC | cart | 5881449 | 1487580013522845895 | | lovely | 0.56 | 429681830 | 49e8d843-adf3-428b-a2c3-fe8bc6a307c9 |
| 2019-10-01 00:00:16 UTC | cart | 5857269 | 1487580005134238553 | | runail | 2.62 | 430174032 | 73dea1e7-664e-43f4-8b30-d32b9d5af04f |
| 2019-10-01 00:00:19 UTC | cart | 5739055 | 1487580008246412266 | | kapous | 4.75 | 377667011 | 81326ac6-daa4-4f0a-b488-fd0956a78733 |
| 2019-10-01 00:00:24 UTC | cart | 5825598 | 1487580009445982239 | | | 0.56 | 467916806 | 2f5b5546-b8cb-9ee7-7ecd-84276f8ef486 |
| 2019-10-01 00:00:25 UTC | cart | 5698989 | 1487580006317032337 | | | 1.27 | 385985999 | d30965e8-1101-44ab-b45d-cc1bb9fae694 |
| 2019-10-01 00:00:26 UTC | view | 5875317 | 2029082628195353599 | | | 1.59 | 474232307 | 445f2b74-5e4c-427e-b7fa-6e0a28b156fe |
+--------------------------+----------------------+----------------------+-----------------------+-------------------------+-----------------+-----------------+-------------------+---------------------------------------+
I have used the query below, but the sum does not come out as expected. Please suggest the best approach to generate the desired output.
select date_format(event_time,'MM') as Month,
sum(price) as Monthly_Revenue
from oct_data_new
group by date_format(event_time,'MM')
order by Month;
Note: event_time field is in TIMESTAMP format.
First convert the timestamp to date and then apply date_format():
select date_format(cast(event_time as date),'MM') as Month,
sum(price) as Monthly_Revenue
from oct_data_new
group by date_format(cast(event_time as date),'MM')
order by Month;
This will work if all the dates are in the same year.
If not, you should also group by the year.
Your code should work -- unless you are using an old version of Hive. date_format() has accepted a timestamp argument since 1.1.2 -- released in early 2016. That said, I would strongly suggest that you include the year:
select date_format(event_time, 'yyyy-MM') as Month,
sum(price) as Monthly_Revenue
from oct_data_new
group by date_format(event_time, 'yyyy-MM')
order by Month;
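For a quick check of the year-plus-month grouping outside Hive, here is an SQLite sketch via Python's sqlite3, where strftime('%Y-%m', ...) plays the role of date_format(..., 'yyyy-MM'); the three sample rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE oct_data_new(event_time TEXT, price REAL);
INSERT INTO oct_data_new VALUES
  ('2019-10-01 00:00:00', 2.62),
  ('2019-10-01 00:00:03', 2.62),
  ('2019-11-05 12:30:00', 13.48);
""")

# Grouping by year-month keeps October 2019 and October 2020 apart,
# which a month-only format string would merge.
rows = conn.execute("""
SELECT strftime('%Y-%m', event_time) AS month,
       ROUND(SUM(price), 2) AS monthly_revenue
FROM oct_data_new
GROUP BY month
ORDER BY month
""").fetchall()
```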
I have a set of data that tells me the owner for something for each date, sample data below. There are some breaks in the date column.
| owner | date |
|-------------+-------------+
| Samantha | 2010-01-02 |
| Max | 2010-01-03 |
| Max | 2010-01-04 |
| Max | 2010-01-06 |
| Max | 2010-01-07 |
| Conor | 2010-01-08 |
| Conor | 2010-01-09 |
| Conor | 2010-01-10 |
| Conor | 2010-01-11 |
| Abigail | 2010-01-12 |
| Abigail | 2010-01-13 |
| Abigail | 2010-01-14 |
| Abigail | 2010-01-15 |
| Max | 2010-01-17 |
| Max | 2010-01-18 |
| Abigail | 2010-01-20 |
| Conor | 2010-01-21 |
I am trying to write a query that can capture the date range for each owner's interval, such as:
| owner | start | end |
|-------------+------------+------------+
| Samantha | 2010-01-02 | 2010-01-02 |
| Max | 2010-01-03 | 2010-01-04 |
| Max | 2010-01-06 | 2010-01-07 |
| Conor | 2010-01-08 | 2010-01-11 |
| Abigail | 2010-01-12 | 2010-01-15 |
| Max | 2010-01-17 | 2010-01-18 |
| Abigail | 2010-01-20 | 2010-01-20 |
| Conor | 2010-01-21 | 2010-01-21 |
I tried to approach this using min() and max() but I am stuck. I feel like I need to use lead() and lag(), but I'm not sure how to use them to get the output I want. Any ideas? Thanks in advance!
This is a typical gaps-and-islands problem. Here is one way to solve it using row_number():
select owner, min(date) as "start", max(date) as "end"
from (
    select
        owner,
        date,
        row_number() over(order by date) rn1,
        row_number() over(partition by owner order by date) rn2
    from mytable
) t
group by owner, rn1 - rn2
This works by ranking records by date over two different partitions (within the whole table and within groups having the same owner). The difference between the ranks gives you the group each record belongs to. You can run the inner query and look at the results to understand the logic. One caveat: rn1 - rn2 only starts a new island when a row for another owner sits in between; since your data also skips dates entirely (2010-01-05, for example), subtracting the row number from the date itself, as in the other answer, is more robust.
This is a gaps-and-islands problem. You want to solve it by subtracting a sequential value from the date and aggregating:
select owner, min(date), max(date)
from (select t.*,
row_number() over (partition by owner order by date) as seqnum
from t
) t
group by owner, (date - seqnum * interval '1 day')
order by min(date);
The magic is that the value subtracted from the date stays constant exactly as long as the dates increment one day at a time, so each island gets its own group.
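Since no DBMS was specified in the question, here is the date-minus-sequence trick verified in SQLite (3.25+ for window functions) through Python's sqlite3; julianday() converts each date to a number so the subtraction works, and the rows are a subset of the question's sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ownership(owner TEXT, day TEXT);
INSERT INTO ownership VALUES
  ('Samantha', '2010-01-02'),
  ('Max', '2010-01-03'), ('Max', '2010-01-04'),
  ('Max', '2010-01-06'), ('Max', '2010-01-07'),
  ('Conor', '2010-01-08'), ('Conor', '2010-01-09'),
  ('Conor', '2010-01-10'), ('Conor', '2010-01-11');
""")

# day minus the per-owner row number is constant within a run of
# consecutive dates, so it identifies each island of ownership.
rows = conn.execute("""
SELECT owner, MIN(day) AS start_day, MAX(day) AS end_day
FROM (
  SELECT owner, day,
         julianday(day)
           - ROW_NUMBER() OVER (PARTITION BY owner ORDER BY day) AS grp
  FROM ownership
)
GROUP BY owner, grp
ORDER BY start_day
""").fetchall()
```

Note how Max's run is split in two because 2010-01-05 is missing from the data.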
I have a database table in PostgreSQL named "time":
| Name   | Date1      | AttendHour1 | Date2      | AttendHour2 |
|--------|------------|-------------|------------|-------------|
| Zakir1 | 2018-10-01 | 8.00        | 2018-10-02 | 8.00        |
| Zakir2 | 2018-10-01 | 9.00        | 2018-10-02 | 9.00        |
| Zakir3 | 2018-10-01 | 7.00        | 2018-10-02 | 7.00        |
From this table I want the result to look like:
| Name   | 2018-10-01 | 2018-10-02 |
|--------|------------|------------|
| Zakir1 | 8.00       | 8.00       |
| Zakir2 | 9.00       | 9.00       |
| Zakir3 | 7.00       | 7.00       |
What is the PostgreSQL query for this?
As it stands, you don't even need a crosstab() query for this. Just:
SELECT name, AttendHour1 AS "2018-10-01", AttendHour2 AS "2018-10-02"
FROM time;
If your desire is to assign column names dynamically from column values: that's not possible. SQL does not allow dynamic column names. You need a two-step workflow:
1. Create the query string dynamically.
To generate above query:
SELECT format('SELECT name, AttendHour1 AS %I, AttendHour2 AS %I FROM time'
, date1, date2)
FROM time
LIMIT 1;
2. Execute the query.
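The two-step workflow can also be sketched client-side. Here it is in Python with SQLite standing in for PostgreSQL, with the same idea: read the date values first, build the query string, then execute it. The table is named attendance here (an assumption, to avoid quoting the reserved-sounding name "time"), with the question's sample contents:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attendance(Name TEXT, Date1 TEXT, AttendHour1 REAL,
                        Date2 TEXT, AttendHour2 REAL);
INSERT INTO attendance VALUES
  ('Zakir1', '2018-10-01', 8.0, '2018-10-02', 8.0),
  ('Zakir2', '2018-10-01', 9.0, '2018-10-02', 9.0),
  ('Zakir3', '2018-10-01', 7.0, '2018-10-02', 7.0);
""")

# Step 1: build the query string, turning the date values into column aliases
d1, d2 = conn.execute("SELECT Date1, Date2 FROM attendance LIMIT 1").fetchone()
sql = f'SELECT Name, AttendHour1 AS "{d1}", AttendHour2 AS "{d2}" FROM attendance'

# Step 2: execute the generated query
cur = conn.execute(sql)
cols = [c[0] for c in cur.description]
rows = cur.fetchall()
```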
There is a table like:
+-----------+---------+------------+
| uid | user_id | month |
+-----------+---------+------------+
| d23fsdfsa | 101 | 2017-01-02 |
| 43gdasc | 102 | 2017-05-06 |
| b65hrfd | 101 | 2017-08-11 |
| 1wseda | 103 | 2017-09-13 |
| vdfhryd | 101 | 2017-08-06 |
| b6thd3d | 105 | 2017-05-03 |
| ve32h65 | 102 | 2017-01-02 |
| 43gdasc | 102 | 2017-09-06 |
+-----------+---------+------------+
How can one count each user_id so that if it appears more than once in the same month, it is counted only once?
The final table should look like the one below (because '101' has two uids in the same month, it counts only once there):
+---------+-----------+
| user_id | count_num |
+---------+-----------+
| 101 | 2 |
| 102 | 3 |
| 103 | 1 |
| 105 | 1 |
+---------+-----------+
If I understand correctly, you want the number of distinct months for each user. If so:
select user_id, count(distinct trunc(month, 'MONTH')) as count_num
from t
group by user_id;
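The answer above uses Oracle's trunc(month, 'MONTH') to normalize each date to its month; the same idea can be verified in SQLite through Python's sqlite3, where strftime('%Y-%m', ...) does the normalizing, using the question's sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t(uid TEXT, user_id INTEGER, month TEXT);
INSERT INTO t VALUES
  ('d23fsdfsa', 101, '2017-01-02'),
  ('43gdasc',   102, '2017-05-06'),
  ('b65hrfd',   101, '2017-08-11'),
  ('1wseda',    103, '2017-09-13'),
  ('vdfhryd',   101, '2017-08-06'),
  ('b6thd3d',   105, '2017-05-03'),
  ('ve32h65',   102, '2017-01-02'),
  ('43gdasc',   102, '2017-09-06');
""")

# COUNT(DISTINCT ...) over the normalized month collapses the two
# August rows of user 101 into a single count.
rows = conn.execute("""
SELECT user_id, COUNT(DISTINCT strftime('%Y-%m', month)) AS count_num
FROM t
GROUP BY user_id
ORDER BY user_id
""").fetchall()
```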