Nested sorting - SQL

I have a table like this:
---------------------------------------------------------
| ArticleID |ReleaseTime | PurchaseTime |
--------------------------------------------------------
| 7 | 7/24/20 3:00 PM | NULL |
--------------------------------------------------------
| 5 | 7/16/20 1:00 PM | NULL |
---------------------------------------------------------
| 4 | 7/24/20 2:00 PM | NULL |
--------------------------------------------------------
| 1 | NULL | 7/25/20 5:45 PM |
--------------------------------------------------------
| 3 | NULL | 7/26/20 9:00 AM |
--------------------------------------------------------
| 3 | 7/25/20 8:30 AM | NULL |
---------------------------------------------------------
| 1 | 7/24/20 5:00 PM | NULL |
--------------------------------------------------------
| 1 | NULL | 7/25/20 6:00 PM |
---------------------------------------------------------
| 6 | 7/24/20 3:30 PM | NULL |
It needs to be sorted by ReleaseTime ASC, with each article's purchase rows placed directly after its release row and ordered by PurchaseTime ASC. The result should look like this:
---------------------------------------------------------
| ArticleID |ReleaseTime | PurchaseTime |
--------------------------------------------------------
| 5 | 7/16/20 1:00 PM | NULL |
--------------------------------------------------------
| 4 | 7/24/20 2:00 PM | NULL |
---------------------------------------------------------
| 7 | 7/24/20 3:00 PM | NULL |
--------------------------------------------------------
| 6 | 7/24/20 3:30 PM | NULL |
--------------------------------------------------------
| 1 | 7/24/20 5:00 PM | NULL |
--------------------------------------------------------
| 1 | NULL | 7/25/20 5:45 PM |
--------------------------------------------------------
| 1 | NULL | 7/25/20 6:00 PM |
--------------------------------------------------------
| 3 | 7/25/20 8:30 AM | NULL |
---------------------------------------------------------
| 3 | NULL | 7/26/20 9:00 AM |
Any suggestion? Thanks in advance!

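If I understand correctly, you want to sort by the minimum release time for each ArticleID and then by purchase time within that ID.
You can use a window function in the ORDER BY:
order by min(ReleaseTime) over (partition by ArticleID),
    ArticleID,
    PurchaseTime asc
Note that this relies on a NULL PurchaseTime sorting before non-NULL values, so that each release row comes ahead of its purchases. That is the default in SQL Server and MySQL, but PostgreSQL and Oracle sort NULLs last in ascending order unless you add NULLS FIRST.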
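The ordering above can be tried out with a small sketch. This one wraps the SQL in Python's built-in sqlite3 module purely so that it runs standalone; the table name articles and the ISO-formatted timestamps are assumptions made for the demo, not part of the question. An extra sort key (PurchaseTime IS NOT NULL) is added so the release row comes first regardless of how the database orders NULLs.

```python
import sqlite3

# Sample rows from the question, with timestamps rewritten in ISO format
# so that plain text comparison sorts them chronologically (an assumption).
rows = [
    (7, "2020-07-24 15:00", None),
    (5, "2020-07-16 13:00", None),
    (4, "2020-07-24 14:00", None),
    (1, None, "2020-07-25 17:45"),
    (3, None, "2020-07-26 09:00"),
    (3, "2020-07-25 08:30", None),
    (1, "2020-07-24 17:00", None),
    (1, None, "2020-07-25 18:00"),
    (6, "2020-07-24 15:30", None),
]

conn = sqlite3.connect(":memory:")
# "articles" is a hypothetical table name used only for this sketch.
conn.execute(
    "CREATE TABLE articles (ArticleID INTEGER, ReleaseTime TEXT, PurchaseTime TEXT)"
)
conn.executemany("INSERT INTO articles VALUES (?, ?, ?)", rows)

query = """
SELECT ArticleID, ReleaseTime, PurchaseTime
FROM articles
ORDER BY MIN(ReleaseTime) OVER (PARTITION BY ArticleID),  -- earliest release per article
         ArticleID,                                       -- keep each article's rows together
         PurchaseTime IS NOT NULL,                        -- release row (NULL purchase) first
         PurchaseTime                                     -- then purchases, oldest first
"""
sorted_rows = conn.execute(query).fetchall()

for row in sorted_rows:
    print(row)
```

The window function needs SQLite 3.25 or later, which recent Python builds ship with.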

Related

How to return only one record when there are 2 or more repetitive time data?

In PostgreSQL I have 2 tables with fields:
Working_date: id (autonumeric), employee_code (varchar (6)), work_date (date), work_start_time (time), attendance_start_time (time), start_lunch (time), attendance_start_lunch (time), end_lunch (time), attendance_end_lunch (time), work_end_time (time), attendance_end_time (time)
Attendance: id (autonumeric), employee_code (varchar (6)), attendance_date (date), attendance_time (time)
Example of table:
Working_date
ID | employee_code | work_date | work_start_time | attendance_start_time | start_lunch | attendance_start_lunch | end_lunch | attendance_end_lunch | work_end_time | attendance_end_time
1 | 12345 | 2021-04-19 | 08:00 | | 13:00 | | 15:00 | | 18:00 |
2 | 12345 | 2021-04-20 | 08:00 | | 13:00 | | 15:00 | | 18:00 |
3 | 12345 | 2021-04-21 | 08:00 | | 13:00 | | 15:00 | | 18:00 |
4 | 12345 | 2021-04-22 | 08:00 | | 13:00 | | 15:00 | | 18:00 |
5 | 12345 | 2021-04-23 | 08:00 | | 13:00 | | 15:00 | | 18:00 |
Attendance
ID | employee_code | attendance_date | attendance_time
1 | 12345 | 2021-04-19 | 07:57:23
2 | 12345 | 2021-04-19 | 07:57:29
3 | 12345 | 2021-04-19 | 13:00:42
4 | 12345 | 2021-04-19 | 14:55:12
5 | 12345 | 2021-04-19 | 18:05:21
6 | 12345 | 2021-04-19 | 18:12:01
7 | 12345 | 2021-04-20 | 07:50:45
8 | 12345 | 2021-04-20 | 12:59:56
9 | 12345 | 2021-04-20 | 13:00:03
10 | 12345 | 2021-04-20 | 14:58:10
11 | 12345 | 2021-04-20 | 18:02:06
12 | 12345 | 2021-04-21 | 07:58:15
13 | 12345 | 2021-04-21 | 13:02:01
14 | 12345 | 2021-04-21 | 14:52:08
15 | 12345 | 2021-04-21 | 14:52:12
16 | 12345 | 2021-04-21 | 18:05:22
17 | 12345 | 2021-04-21 | 18:05:27
18 | 12345 | 2021-04-22 | 07:44:25
19 | 12345 | 2021-04-22 | 13:05:08
20 | 12345 | 2021-04-22 | 14:57:11
21 | 12345 | 2021-04-22 | 18:10:27
22 | 12345 | 2021-04-23 | 07:51:16
23 | 12345 | 2021-04-23 | 13:11:02
24 | 12345 | 2021-04-23 | 14:58:59
25 | 12345 | 2021-04-23 | 18:01:17
In the "Attendance" table there are some repeated rows because the employee recorded attendance more than once. For example, taking the data of the "Working_date" table as a reference:
- On 2021-04-19 there are 2 records (row 1 and 2) for "attendance_time (attendance_start_time)" (07:57:23, 07:57:29)
- On 2021-04-19 there are 2 records (row 5 and 6) for "attendance_time (attendance_end_time)" (18:05:21, 18:12:01)
- On 2021-04-20 there are 2 records (row 8 and 9) for "attendance_time (attendance_start_lunch)" (12:59:56, 13:00:03)
- On 2021-04-21 there are 2 records (row 14 and 15) for "attendance_time (attendance_end_lunch)" (14:52:08, 14:52:12)
- On 2021-04-21 there are 2 records (row 16 and 17) for "attendance_time (attendance_end_time)" (18:05:22, 18:05:27)
How can I pick the minimum or maximum time so that I get only one time for each set of 2 or more (sometimes 3) repeated records?
Only 4 records are needed: "attendance_start_time", "attendance_start_lunch", "attendance_end_lunch" and "attendance_end_time". For rows 8 and 9 (Attendance table), the maximum time is needed because it is later than "start_lunch"; for rows 1 and 2, it can be either the minimum or the maximum because both are earlier than "work_start_time".
Is there a way to get only these 4 records and then insert them into their respective fields in the "Working_date" table with pure SQL queries?

How to Do Data-Grouping in BigQuery?

I have a dataset whose rows need to be grouped. I've done this successfully in R, but now I have to do it in BigQuery. The data is shown in the following table:
| category | sub_category | date | day | timestamp | type | cpc | gmv |
|---------- |-------------- |----------- |----- |------------- |------ |------ |--------- |
| ABC | ABC-1 | 2/17/2020 | Mon | 11:37:36 PM | BI | 1.94 | 252,293 |
| ABC | ABC-1 | 2/17/2020 | Mon | 11:37:39 PM | RT | 1.94 | 252,293 |
| ABC | ABC-1 | 2/17/2020 | Mon | 11:38:29 PM | RT | 1.58 | 205,041 |
| ABC | ABC-1 | 2/18/2020 | Tue | 12:05:14 AM | BI | 1.6 | 208,397 |
| ABC | ABC-1 | 2/18/2020 | Tue | 12:05:18 AM | RT | 1.6 | 208,397 |
| ABC | ABC-1 | 2/18/2020 | Tue | 12:05:52 AM | RT | 1.6 | 208,397 |
| ABC | ABC-1 | 2/18/2020 | Tue | 12:06:33 AM | BI | 1.55 | 201,354 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 11:55:47 PM | PP | 1 | 129,282 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 11:56:23 PM | PP | 0.98 | 126,928 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 11:57:19 PM | PP | 0.98 | 126,928 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 11:57:34 PM | PP | 0.98 | 126,928 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 11:58:46 PM | PP | 0.89 | 116,168 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 11:59:27 PM | PP | 0.89 | 116,168 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 11:59:51 PM | RT | 0.89 | 116,168 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 12:00:57 AM | BI | 0.89 | 116,168 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 12:01:11 AM | PP | 0.89 | 116,168 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 12:03:01 AM | PP | 0.89 | 116,168 |
| XYZ | XYZ-1 | 2/17/2020 | Mon | 12:12:42 AM | RT | 1.19 | 154,886 |
I want to group the rows: a row whose timestamp is within 8 minutes of the next row's should be merged with it into a single row, giving output like this:
| category | sub_category | date | day | time | start_timestamp | end_timestamp | type | cpc | gmv |
|---------- |-------------- |----------------------- |--------- |---------- |--------------------- |--------------------- |---------- |------ |--------- |
| ABC | ABC-1 | 2/17/2020 | Mon | 23:37:36 | (02/17/20 23:37:36) | (02/17/20 23:38:29) | BI|RT | 1.82 | 236,542 |
| ABC | ABC-1 | 2/18/2020 | Tue | 0:05:14 | (02/18/20 00:05:14) | (02/18/20 00:06:33) | BI|RT | 1.59 | 206,636 |
| XYZ | XYZ-1 | 02/17/2020|02/18/2020 | Mon|Tue | 0:06:21 | (02/17/20 23:55:47) | (02/18/20 00:12:42) | PP|RT|BI | 0.95 | 123,815 |
The newly generated fields are defined below:
| fields | definition |
|----------------- |-------------------------------------------------------- |
| day | Day of the row (combination if there's different days) |
| time | Start of timestamp |
| start_timestamp | Start timestamp of the first row in group |
| end_timestamp | Start timestamp of the last row in group |
| type | Type of Row (combination if there's different types) |
| cpc | Average CPC of the Group |
| gmv | Average GMV of the Group |
Could anyone help me to make the query as per above requirements?
Thank you
This is a gaps-and-islands problem. Here is a solution that uses lag() and a cumulative sum() to define groups of adjacent records with gaps of at most 8 minutes; the rest is aggregation.
select
    category,
    sub_category,
    string_agg(distinct day, '|' order by dt) day,
    min(dt) start_dt,
    max(dt) end_dt,
    string_agg(distinct type, '|' order by dt) type,
    avg(cpc) cpc,
    avg(gmv) gmv
from (
    select
        t.*,
        sum(case when dt <= datetime_add(lag_dt, interval 8 minute) then 0 else 1 end)
            over(partition by category, sub_category order by dt) grp
    from (
        select
            t.*,
            lag(dt) over(partition by category, sub_category order by dt) lag_dt
        from (
            select t.*, datetime(date, timestamp) dt
            from mytable t
        ) t
    ) t
) t
group by category, sub_category, grp
Note that you should not store the date and time parts of your timestamps in separate columns: this makes the logic more complicated when you need to combine them (I added another level of nesting to avoid repeated conversions, which would have obscured the code).
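The lag() plus cumulative sum() idea can be checked on a few rows with a small sketch. It runs the same logic in SQLite through Python's sqlite3 module rather than in BigQuery, so the date arithmetic uses strftime('%s', ...) instead of datetime_add. The table name events, the single combined dt column, and the reduced column set are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "events" and its combined "dt" column are hypothetical names for this sketch.
conn.execute("CREATE TABLE events (category TEXT, dt TEXT, type TEXT, cpc REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("ABC", "2020-02-17 23:37:36", "BI", 1.94),
        ("ABC", "2020-02-17 23:37:39", "RT", 1.94),
        ("ABC", "2020-02-17 23:38:29", "RT", 1.58),
        ("ABC", "2020-02-18 00:05:14", "BI", 1.60),
        ("ABC", "2020-02-18 00:05:18", "RT", 1.60),
        ("ABC", "2020-02-18 00:05:52", "RT", 1.60),
        ("ABC", "2020-02-18 00:06:33", "BI", 1.55),
    ],
)

query = """
WITH lagged AS (
    -- Step 1: attach the previous row's timestamp within each category.
    SELECT *, LAG(dt) OVER (PARTITION BY category ORDER BY dt) AS lag_dt
    FROM events
),
grouped AS (
    -- Step 2: emit 1 whenever a new island starts (first row, or gap > 480 s),
    -- and take a running total so every row of an island shares one number.
    SELECT *,
           SUM(CASE
                   WHEN lag_dt IS NOT NULL
                        AND CAST(strftime('%s', dt) AS INTEGER)
                          - CAST(strftime('%s', lag_dt) AS INTEGER) <= 480
                   THEN 0 ELSE 1
               END) OVER (PARTITION BY category ORDER BY dt) AS grp
    FROM lagged
)
-- Step 3: plain aggregation per island.
SELECT category, MIN(dt), MAX(dt), COUNT(*), AVG(cpc)
FROM grouped
GROUP BY category, grp
ORDER BY category, grp
"""
islands = conn.execute(query).fetchall()

for island in islands:
    print(island)
```

With the ABC rows from the question this yields two islands, one per burst of activity, which matches the first two rows of the expected output.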

Return in only one table info from two tables into rows when info is in columns

In PostgreSQL I have 2 tables with fields:
Working_date: id (autonumeric), employee_code (varchar (6)), working_date (date), working_hour (time),
Attendance: id (autonumeric), employee_code (varchar (6)), attendance_date (date), attendance_hour (time),
Data example:
Working_date
ID | employee_code | working_date | working_hour
1 | 12345 | 2015-07-09 | 08:00
2 | 12345 | 2015-07-09 | 13:00
3 | 12345 | 2015-07-09 | 14:00
4 | 12345 | 2015-07-09 | 17:00
5 | 12345 | 2015-07-10 | 08:00
6 | 12345 | 2015-07-10 | 13:00
7 | 12345 | 2015-07-10 | 14:00
8 | 12345 | 2015-07-10 | 17:00
9 | 12345 | 2015-07-11 | 08:00
10 | 12345 | 2015-07-11 | 13:00
11 | 12345 | 2015-07-11 | 14:00
12 | 12345 | 2015-07-11 | 17:00
13 | 12345 | 2015-07-12 | 08:00
14 | 12345 | 2015-07-12 | 13:00
15 | 12345 | 2015-07-12 | 14:00
16 | 12345 | 2015-07-12 | 17:00
17 | 12345 | 2015-07-13 | 08:00
18 | 12345 | 2015-07-13 | 13:00
19 | 12345 | 2015-07-13 | 14:00
20 | 12345 | 2015-07-13 | 17:00
Attendance
ID | employee_code | attendance_date | attendance_hour
1 | 12345 | 2015-07-09 | 07:56:53
2 | 12345 | 2015-07-09 | 10:33:31
3 | 12345 | 2015-07-09 | 13:00:42
4 | 12345 | 2015-07-09 | 13:00:47
5 | 12345 | 2015-07-09 | 13:30:21
6 | 12345 | 2015-07-09 | 17:00:01
7 | 12345 | 2015-07-10 | 07:48:35
8 | 12345 | 2015-07-10 | 12:15:20
9 | 12345 | 2015-07-10 | 13:58:42
10 | 12345 | 2015-07-10 | 17:02:00
11 | 12345 | 2015-07-11 | 08:06:46
12 | 12345 | 2015-07-11 | 12:00:01
13 | 12345 | 2015-07-11 | 13:52:01
14 | 12345 | 2015-07-11 | 17:05:08
15 | 12345 | 2015-07-12 | 07:55:02
16 | 12345 | 2015-07-12 | 12:03:22
17 | 12345 | 2015-07-12 | 13:37:40
18 | 12345 | 2015-07-12 | 17:05:01
19 | 12345 | 2015-07-13 | 07:54:25
20 | 12345 | 2015-07-13 | 10:44:15
21 | 12345 | 2015-07-13 | 13:59:21
22 | 12345 | 2015-07-13 | 17:01:17
In the "Attendance" table there are some repeated rows because the employee recorded attendance more than once. For example, on 2015-07-09 there are 2 attendance times (13:00:42, 13:00:47) at the time to go out for lunch. In this case, I should get only one of the two records.
The other case is 10:33:31 on 2015-07-09. It was recorded when the employee asked permission to leave work; the return was then recorded at 13:00:42 / 13:00:47.
Is there a way to get working_date and working_hour with the respective attendance_hour in one table with only pure SQL queries (maybe some type of subquery)?
Example:
ID | employee_code | working_date | working_hour1 | attendance_time_1 | working_hour2 | attendance_time_2 | working_hour3 | attendance_time_3 | working_hour4 | attendance_time_4
1 | 12345 | 2015-07-09 | 08:00 | 07:56:53 | 13:00:00 | 13:00:42 or 13:00:47 | 14:00 | 13:30:21 | 17:00 | 17:00:01
2 | 12345 | 2015-07-10 | 08:00 | 07:48:35 | 13:00:00 | 12:15:20 | 14:00 | 13:58:42 | 17:00 | 17:02:00
3 | 12345 | 2015-07-11 | 08:00 | 08:06:46 | 13:00:00 | 12:00:01 | 14:00 | 13:52:01 | 17:00 | 17:05:08
4 | 12345 | 2015-07-12 | 08:00 | 07:55:02 | 13:00:00 | 12:03:22 | 14:00 | 13:37:40 | 17:00 | 17:05:01
5 | 12345 | 2015-07-13 | 08:00 | 07:54:25 | 13:00:00 | 10:44:15 | 14:00 | 13:59:21 | 17:00 | 17:01:17
If it is not possible with pure SQL queries, how can it be achieved with, for example, PL/pgSQL?
Currently I do it in PHP like this:
I query the employee_code and working_date fields from the working_date table. This query runs between 2 dates: from_date and to_date.
Inside a "for" loop, for every working_date row I look up all of its working_hour rows: working_hour1, working_hour2, working_hour3, working_hour4. One SQL query is run per row, taking employee_code and working_date as parameters.
Inside a nested "for" loop, for every working_hour I run a query against the "Attendance" table with the parameters employee_code, working_date and working_hour. It returns the attendance_hour for that working_hour.
This approach (issuing SELECTs from PHP in nested "for" loops) is too slow for fetching and displaying the data. While it runs, the process uses 100% of the CPU.
You can join those tables on their dates and employee codes, and aggregate the attendance times into an array by grouping by date and employee_code, somewhat like this:
SELECT
    w.employee_code,
    w.working_date,
    array_agg(DISTINCT w.working_hour) AS working_hours,
    array_agg(DISTINCT a.attendance_hour) AS attendance_hours
FROM Working_date w
LEFT JOIN Attendance a
    ON w.working_date = a.attendance_date
    AND w.employee_code = a.employee_code
GROUP BY w.working_date, w.employee_code
ORDER BY w.working_date
You could use PostgreSQL's unnest() function to expand those arrays, but it puts the elements into new rows, not columns. Putting them into separate columns is difficult because the arrays probably won't all be the same length, and every row has to have the same columns.
Here's a fiddle:
http://sqlfiddle.com/#!15/2a75c/7/0
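For a quick local check of the join-and-aggregate idea, here is a minimal sketch using Python's sqlite3 module. SQLite has no array_agg, so group_concat(DISTINCT ...) stands in for it and returns a comma-separated string; that substitution, the reduced two-day sample, and the join on employee_code are assumptions of this sketch rather than the PostgreSQL query itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE working_date (employee_code TEXT, working_date TEXT, working_hour TEXT);
CREATE TABLE attendance (employee_code TEXT, attendance_date TEXT, attendance_hour TEXT);
INSERT INTO working_date VALUES
    ('12345', '2015-07-09', '08:00'), ('12345', '2015-07-09', '13:00'),
    ('12345', '2015-07-09', '14:00'), ('12345', '2015-07-09', '17:00'),
    ('12345', '2015-07-10', '08:00'), ('12345', '2015-07-10', '13:00'),
    ('12345', '2015-07-10', '14:00'), ('12345', '2015-07-10', '17:00');
INSERT INTO attendance VALUES
    ('12345', '2015-07-09', '07:56:53'), ('12345', '2015-07-09', '10:33:31'),
    ('12345', '2015-07-09', '13:00:42'), ('12345', '2015-07-09', '13:00:47'),
    ('12345', '2015-07-09', '13:30:21'), ('12345', '2015-07-09', '17:00:01'),
    ('12345', '2015-07-10', '07:48:35'), ('12345', '2015-07-10', '12:15:20'),
    ('12345', '2015-07-10', '13:58:42'), ('12345', '2015-07-10', '17:02:00');
""")

query = """
SELECT w.employee_code,
       w.working_date,
       group_concat(DISTINCT w.working_hour)    AS working_hours,
       group_concat(DISTINCT a.attendance_hour) AS attendance_hours
FROM working_date w
LEFT JOIN attendance a
       ON a.attendance_date = w.working_date
      AND a.employee_code = w.employee_code
GROUP BY w.employee_code, w.working_date
ORDER BY w.working_date
"""

# group_concat does not guarantee element order, so sort after splitting.
summary = {
    (code, day): (sorted(hours.split(",")), sorted(punches.split(",")))
    for code, day, hours, punches in conn.execute(query)
}

for key, value in summary.items():
    print(key, value)
```

The result has one row per employee and day, which removes the need for the nested PHP loops, although picking one punch per scheduled hour still has to be done on top of it.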

UPDATE a Table with the smallest date between 2 tables

I have 2 Tables:
#Temdate1
+------+------------+---------------+--------+
| Year | Entry_Date | DeliveryMonth | Symbol |
+------+------------+---------------+--------+
| 2016 | 2016-01-07 | June | ABC |
| 2015 | 2015-01-06 | June | ABC |
| 2014 | 2014-01-05 | June | ABC |
| 2016 | 2016-03-05 | Sep | CDE |
| 2015 | 2015-03-04 | Sep | CDE |
| 2014 | 2014-03-03 | Sep | CDE |
+------+------------+---------------+--------+
and AllProducts
+-----------------+---------------+--------+
| Date | DeliveryMonth | Symbol |
+-----------------+---------------+--------+
| 2016-01-07 | June | ABC |
| 2016-01-08 | June | ABC |
| 2016-01-09 | June | ABC |
| 2016-01-10 | June | ABC |
| 2015-01-01 | June | ABC |
| 2015-01-02 | June | ABC |
| 2015-01-03 | June | ABC |
| 2014-01-05 | June | ABC |
+-----------------+---------------+--------+
The result I am looking for is the updated table #Temdate1:
+------+------------+---------------+--------+
| Year | Entry_Date | DeliveryMonth | Symbol |
+------+------------+---------------+--------+
| 2016 | 2016-01-07 | June | ABC |
| 2015 | 2015-01-01 | June | ABC |
| 2014 | 2014-01-05 | June | ABC |
| 2016 | 2016-03-05 | Sep | CDE |
| 2015 | 2015-03-04 | Sep | CDE |
| 2014 | 2014-03-03 | Sep | CDE |
+------+------------+---------------+--------+
I have this query to find the smallest (earliest) date for a given Year and a given product. Building on this query, how can I update #TempDate1 with the earliest date whenever it doesn't already have it?
SELECT
Year
,CASE
WHEN MIN([Date])<entry_date THEN MIN([Date])
ELSE entry_date
END AS MDate
FROM #TempDate1 a
INNER JOIN AllProducts b on a.DeliveryMonth =b.DeliveryMonth AND a.Symbol = b.Symbol
GROUP BY Year,entry_date
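It seems you made a typo in the expected results, or maybe it was me.
Since several AllProducts rows can match one #TempDate1 row, aggregate to the earliest date per Symbol, DeliveryMonth and year first; otherwise it is not guaranteed which matching row the UPDATE uses:
update a
set Entry_Date = b.MinDate
from #TempDate1 a
inner join (
    select Symbol, DeliveryMonth, year([Date]) as Yr, min([Date]) as MinDate
    from AllProducts
    group by Symbol, DeliveryMonth, year([Date])
) b
    on b.Symbol = a.Symbol
    and b.DeliveryMonth = a.DeliveryMonth
    and b.Yr = a.Year
where b.MinDate < a.Entry_Date
http://rextester.com/AQXR21093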
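The same update can be sketched in a form that runs anywhere, here with Python's sqlite3 module. SQLite does not have SQL Server's temp-table syntax or year(), so this version uses a correlated subquery and strftime('%Y', ...); the table names temp_dates and all_products are stand-ins chosen for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# temp_dates / all_products are hypothetical stand-ins for #TempDate1 / AllProducts.
conn.executescript("""
CREATE TABLE temp_dates (Year INTEGER, Entry_Date TEXT, DeliveryMonth TEXT, Symbol TEXT);
CREATE TABLE all_products (Date TEXT, DeliveryMonth TEXT, Symbol TEXT);
INSERT INTO temp_dates VALUES
    (2016, '2016-01-07', 'June', 'ABC'),
    (2015, '2015-01-06', 'June', 'ABC'),
    (2014, '2014-01-05', 'June', 'ABC'),
    (2016, '2016-03-05', 'Sep', 'CDE'),
    (2015, '2015-03-04', 'Sep', 'CDE'),
    (2014, '2014-03-03', 'Sep', 'CDE');
INSERT INTO all_products VALUES
    ('2016-01-07', 'June', 'ABC'), ('2016-01-08', 'June', 'ABC'),
    ('2016-01-09', 'June', 'ABC'), ('2016-01-10', 'June', 'ABC'),
    ('2015-01-01', 'June', 'ABC'), ('2015-01-02', 'June', 'ABC'),
    ('2015-01-03', 'June', 'ABC'), ('2014-01-05', 'June', 'ABC');
""")

# Only touch rows for which an earlier product date exists in the same year;
# those rows receive the earliest matching date.
conn.execute("""
UPDATE temp_dates
SET Entry_Date = (
    SELECT MIN(p.Date)
    FROM all_products p
    WHERE p.Symbol = temp_dates.Symbol
      AND p.DeliveryMonth = temp_dates.DeliveryMonth
      AND CAST(strftime('%Y', p.Date) AS INTEGER) = temp_dates.Year
)
WHERE EXISTS (
    SELECT 1
    FROM all_products p
    WHERE p.Symbol = temp_dates.Symbol
      AND p.DeliveryMonth = temp_dates.DeliveryMonth
      AND CAST(strftime('%Y', p.Date) AS INTEGER) = temp_dates.Year
      AND p.Date < temp_dates.Entry_Date
)
""")

updated = conn.execute(
    "SELECT Year, Entry_Date, Symbol FROM temp_dates ORDER BY Symbol, Year DESC"
).fetchall()

for row in updated:
    print(row)
```

Only the 2015 ABC row changes (to 2015-01-01); the CDE rows have no match in the product table and are left alone.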

How to join 2 tables with some of transpose row to columns

For example, I have 2 tables:
info_table
id | Title | description
1 | title1 | dec1
2 | title2 | dec2
3 | title3 | dec3
Instance_Table
e_id | name | string
1 | date | 2015/01/19
2 | time | 10:00
3 | value | 10
1 | date | 2015/01/20
2 | time | 11:00
3 | value | 12
1 | date | 2015/01/21
2 | time | 12:00
3 | value | 13
Expected result:
id | Title | date | Time | value | Description
1 | title1 | 2015/01/19 | 10:00 | 10 | Des1
2 | title2 | 2015/01/20 | 11:00 | 11 | Des2
3 | title3 | 2015/01/21 | 12:00 | 13 | Des3
You should add a foreign key to Instance_Table that references info_table, and e_id should be a unique primary key.
info_table
id | Title | description
1 | title1 | dec1
2 | title2 | dec2
3 | title3 | dec3
Instance_Table
e_id | name | string | FK_InfoTable
1 | date | 2015/01/19 | 1
2 | time | 10:00 | 1
3 | value | 10 | 1
4 | date | 2015/01/20 | 2
5 | time | 11:00 | 2
6 | value | 12 | 2
7 | date | 2015/01/21 | 3
8 | time | 12:00 | 3
9 | value | 13 | 3
With that in place, a SQL statement like this joins each info_table row to its Instance_Table rows:
SELECT * FROM info_table INNER JOIN Instance_Table ON info_table.id = Instance_Table.FK_InfoTable
You can read more about relational databases on Wikipedia.
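The join above returns one row per Instance_Table entry, so turning the date, time and value rows into columns still needs a pivot step. One common way to do that is conditional aggregation, sketched here with Python's sqlite3 module. The FK_InfoTable column follows the restructured table above; the output aliases entry_date, entry_time and entry_value are names invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE info_table (id INTEGER PRIMARY KEY, Title TEXT, description TEXT);
CREATE TABLE Instance_Table (
    e_id INTEGER PRIMARY KEY,
    name TEXT,
    string TEXT,
    FK_InfoTable INTEGER REFERENCES info_table(id)
);
INSERT INTO info_table VALUES (1, 'title1', 'dec1'), (2, 'title2', 'dec2'), (3, 'title3', 'dec3');
INSERT INTO Instance_Table VALUES
    (1, 'date', '2015/01/19', 1), (2, 'time', '10:00', 1), (3, 'value', '10', 1),
    (4, 'date', '2015/01/20', 2), (5, 'time', '11:00', 2), (6, 'value', '12', 2),
    (7, 'date', '2015/01/21', 3), (8, 'time', '12:00', 3), (9, 'value', '13', 3);
""")

query = """
SELECT i.id,
       i.Title,
       -- Each CASE keeps only the rows of one attribute; MAX collapses them
       -- to the single non-NULL value per info_table row.
       MAX(CASE WHEN t.name = 'date'  THEN t.string END) AS entry_date,
       MAX(CASE WHEN t.name = 'time'  THEN t.string END) AS entry_time,
       MAX(CASE WHEN t.name = 'value' THEN t.string END) AS entry_value,
       i.description
FROM info_table i
INNER JOIN Instance_Table t ON t.FK_InfoTable = i.id
GROUP BY i.id, i.Title, i.description
ORDER BY i.id
"""
pivoted = conn.execute(query).fetchall()

for row in pivoted:
    print(row)
```

This assumes each info_table row has at most one date, one time and one value entry; with several entries per attribute, MAX would silently keep only the largest.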