I am attempting to query a Postgres database to retrieve records for a specific customer that are from dates that are adjacent to the current day.
So for example:
If today is Wednesday, it will pull the records from Monday and Tuesday if they exist.
If today is Wednesday, it will not pull a record from Monday if Tuesday does not have a record.
If pulling a record from last Wednesday, it will pull Tuesday and Thursday if they exist.
The database is currently structured with a Dates table that contains:
Id----------|------DateTime--------
And the Id has a foreign key relationship with the Customers table containing:
CustomerId-------|-----DateId(fk)-----|----other columns.....
I have thought to simply query the database if the date exists before and after the queried date (and pull that record if exists), and then loop such that a success will query the next date before or after..... but that seems horribly inefficient.
Is there a better method to obtain these records?
EDIT:
Example Dates table:
Id | DateTime
1 | '2018-01-23 16:19:17.600305'
2 | '2018-01-24 00:03:11.213492'
3 | '2018-01-25 03:10:14.911771'
Sample Customers table:
Id | DateId | CustomerId | Location1 | Location2 | Transport
66 | 2 | 2 | Market | Library | Car
67 | 2 | 3 | Bookstore | Library | Car
68 | 3 | 3 | Pool | Town | Car
69 | 3 | 3 | Pool | Town | Bus
Join the tables and use the lag and lead window functions with a window clause like OVER (ORDER BY dates.datetime).
Related
Im badly in need of a way on how to merge almost identical records.
These set of records consist of customerID ItemID Location ItemName DeliveryTime.
Ex:
CName |cID |ItemID| Location| ItemName | DeliveryTime | Delivery time2 |
Artemys|123456| 321 | US | Mango | March 31,2022|
Artemys|123456| 321 | US | Mango | April 30 2022|
as you can see the only difference is the delivery time in the second record below. What i need is, put the delivery time of the second record join in the first record as delivery time 2 using SQL Automation.
Example record
I have a Production Table and a Standing Data table. The relationship of Production to Standing Data is actually Many-To-Many which is different to how this relationship is usually represented (Many-to-One).
The standing data table holds a list of tasks and the score each task is worth. Tasks can appear multiple times with different "ValidFrom" dates for changing the score at different points in time. What I am trying to do is query the Production Table so that the TaskID is looked up in the table and uses the date it was logged to check what score it should return.
Here's an example of how I want the data to look:
Production Table:
+----------+------------+-------+-----------+--------+-------+
| RecordID | Date | EmpID | Reference | TaskID | Score |
+----------+------------+-------+-----------+--------+-------+
| 1 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 2 | 27/02/2020 | 1 | 123 | 1 | 1.5 |
| 3 | 30/02/2020 | 1 | 123 | 1 | 2 |
| 4 | 31/02/2020 | 1 | 123 | 1 | 2 |
+----------+------------+-------+-----------+--------+-------+
Standing Data
+----------+--------+----------------+-------+
| RecordID | TaskID | DateActiveFrom | Score |
+----------+--------+----------------+-------+
| 1 | 1 | 01/02/2020 | 1.5 |
| 2 | 1 | 28/02/2020 | 2 |
+----------+--------+----------------+-------+
I have tried the below code but unfortunately due to multiple records meeting the criteria, the production data duplicates with two different scores per record:
SELECT p.[RecordID],
p.[Date],
p.[EmpID],
p.[Reference],
p.[TaskID],
s.[Score]
FROM ProductionTable as p
LEFT JOIN StandingDataTable as s
ON s.[TaskID] = p.[TaskID]
AND s.[DateActiveFrom] <= p.[Date];
What is the correct way to return the correct and singular/scalar Score value for this record based on the date?
You can use apply :
SELECT p.[RecordID], p.[Date], p.[EmpID], p.[Reference], p.[TaskID], s.[Score]
FROM ProductionTable as p OUTER APPLY
( SELECT TOP (1) s.[Score]
FROM StandingDataTable AS s
WHERE s.[TaskID] = p.[TaskID] AND
s.[DateActiveFrom] <= p.[Date]
ORDER BY S.DateActiveFrom DESC
) s;
You might want score basis on Record Level if so, change the where clause in apply.
I have prescription drug data that has a prescription date and the number of days supplied for that prescription. I am trying estimate actually drug intake dates which can be different then prescription date if people (1) refill their prescription before their current prescription is done or (2) they lost their current prescription and so need a refill.
Below is sample data for 1 patient:
| patient_id | rx_start_date | days_supply |
|------------|---------------|-------------|
| 1 | 1/10/2013 | 3 |
| 1 | 1/11/2013 | 3 |
| 1 | 1/14/2013 | 3 |
Without adjusting for stockpiling the end dates are calculated as rx_start_date + days_supply - 1 see:
| patient_id | rx_start_date | days_supply | rx_end_date |
|------------|---------------|-------------|-------------|
| 1 | 1/10/2013 | 3 | 1/12/2013 |
| 1 | 1/11/2013 | 3 | 1/13/2013 |
| 1 | 1/14/2013 | 3 | 1/16/2013 |
As you can see the start date for the 2nd prescription is overlapped by the first prescription. If we assume that they filled their prescription early then the actual intake date for the 2nd prescription should start on 1/13/2013. But moving the end date of the 2nd prescription causes an overlap over the 3rd prescription and so that must be moved as well. See the expected resulting table below:
| patient_id | rx_start_date | days_supply | rx_end_date |
|------------|---------------|-------------|-------------|
| 1 | 1/10/2013 | 3 | 1/12/2013 |
| 1 | 1/13/2013 | 3 | 1/15/2013 |
| 1 | 1/16/2013 | 3 | 1/18/2013 |
The other case is we might say if the current prescription overlaps the next one by more than 50% than we assume they lost their prescription and the 2nd prescription start date is the actual intake date. This means though that we need to truncate the current prescription to end when the 2nd one starts.
The algorithm is relatively simple using a non-sql iterative solution but I'm having trouble with a generic sql solution since adjusting dates at time X could potentially cause a cascading effect that adjust many other dates. I'm using Impala SQL so recursive CTE's are not an option and I'd like this to work on other databases so database specific functions are not ideal either.
The following should give you what you are looking for, so long as there are no gaps in the treatment regime:
with aggs as (select d1.patient_id, d1.rx_start_dt, sum(ds.days_supply) days_supply, min(ds.rx_start_dt) + sum(ds.days_supply) - 1 end_dt
from drugs d1
inner join drugs ds
on ds.patient_id = d1.patient_id and ds.rx_start_dt <= d1.rx_start_dt
group by d1.patient_id, d1.rx_start_dt)
select patient_id, coalesce(lag(end_dt+1) over (partition by patient_id order by rx_start_dt),rx_start_dt) start_dt, end_dt
from aggs;
Using the given sample data, this gives as output:
ID Start End
1 2013-01-10 2013-01-12
1 2013-01-13 2013-01-15
1 2013-01-16 2013-01-18
This was tested on Oracle, but all functions used appear to also be available in impala so should work there too.
I'm working with a SQLite database that receives large data dumps on a regular basis from several sources. Unfortunately, those sources aren't intelligent about what they dump, and I end up with a lot of repeated records from one time to the next. I'm looking for a way to remove these repeated records without affecting the records that have legitimately changed from the past dump to this one.
Here's the general structure of the data (_id is the primary key):
| _id | _dateUpdated | _dateEffective | _dateExpired | name | status | location |
|-----|--------------|----------------|--------------|------|--------|----------|
| 1 | 2016-05-01 | 2016-05-01 | NULL | Fred | Online | USA |
| 2 | 2016-05-01 | 2016-05-01 | NULL | Jim | Online | USA |
| 3 | 2016-05-08 | 2016-05-08 | NULL | Fred | Offline| USA |
| 4 | 2016-05-08 | 2016-05-08 | NULL | Jim | Online | USA |
| 5 | 2016-05-15 | 2016-05-15 | NULL | Fred | Offline| USA |
| 6 | 2016-05-15 | 2016-05-15 | NULL | Jim | Online | USA |
I'd like to be able to reduce this data to something like this:
| _id | _dateUpdated | _dateEffective | _dateExpired | name | status | location |
|-----|--------------|----------------|--------------|------|--------|----------|
| 1 | 2016-05-01 | 2016-05-01 | 2016-05-07 | Fred | Online | USA |
| 2 | 2016-05-15 | 2016-05-01 | NULL | Jim | Online | USA |
| 3 | 2016-05-15 | 2016-05-08 | NULL | Fred | Offline| USA |
The idea here is that rows 4, 5, and 6 exactly duplicate rows 2 and 3 except for the timestamps (I'd need to compare by all three fields - name, status, location). However, row 3 does not duplicate row 1 (status changed from Online to Offline), so the _dateExpired field is set in row 1, and row 3 becomes the most recent record.
I'm querying this table with something like this:
SELECT * FROM Data WHERE
date(_dateEffective) <= date("now")
AND (_dateExpired IS NULL OR date(_dateExpired) > date("now"))
Is this sort of reduction possible in SQLite?
I am still a beginner to SQL and database design in general, so it's possible that I haven't structured the database in the best way. I'm open to suggestions there as well...I'm going for the ability to query data at a given point in time - for example, "what was Jim's status around 2016-05-06?"
Thanks in advance!
Consider using a staging table where the dump file goes into a DumpTable (regularly cleaned out before each dump) and then an INSERT...SELECT query migrates to your final table.
Now the SELECT portion maintains a correlated subquery (to calculate new [_dateExpired] for needed rows) and derived table subquery (to filter out non-dups according to your criteria). Finally, the LEFT JOIN...NULL with FinalTable is to ensure no duplicate records are appended, assuming [_id] is a unique identifier. Below is the routine:
Clean Out DumpTable
DELETE FROM DumpTable;
Run Dump Routine to be appended into DumpTable
Append Records to FinalTable
INSERT INTO FinalTable ([_id], [_dateUpdated], [_dateEffective], [_dateExpired],
[name], status, location)
SELECT d.[_id], d.[_dateUpdated], d.[_dateEffective],
(SELECT Min(date(sub.[_dateEffective], '-1 day'))
FROM DumpTable sub
WHERE sub.[name] = DumpTable.[name]
AND sub.[_dateEffective] > DumpTable.[_dateEffective]
AND sub.status <> DumpTable.status) As calcExpired
d.name, d.status, d.location
FROM DumpTable d
INNER JOIN
(SELECT Min(DumpTable.[_id]) AS min_id,
DumpTable.name, DumpTable.status
FROM DumpTable
GROUP BY DumpTable.name, DumpTable.status) AS c
ON (c.name = d.name)
AND (c.min_id = d.[_id])
AND (c.status = d.status)
LEFT JOIN FinalTable f
ON d.[_id] = f.[_id]
WHERE f.[_id] IS NULL;
-- INSERTED RECORDS:
-- _id _dateUpdated _dateEffective _dateExpired name status location
-- 1 2016-05-01 2016-05-01 2016-05-07 Fred Online USA
-- 2 2016-05-01 2016-05-01 Jim Online USA
-- 3 2016-05-08 2016-05-08 Fred Offline USA
Is this sort of reduction possible in SQLite?
The answer to any "reduction" question in SQL is always Yes. The trick is to find what axes you're reducing along.
Here's a partial solution to illustrate; it gives the first Online date for each name & location.
select min(_dateEffective) as start_date
, name
, location
from Data
where status = 'Online'
group by
name
, location
With an outer join back to the table (on name & location) where the status is 'Offline' and the _dateEffective is greater than start_date, you get your _dateExpired.
_id is the primary key
There is a commonly held misunderstanding that every table needs some kind of sequential "ID" number as a primary key. The key you really care about is known as a natural key, 1 or more columns in the data that uniquely identify the data. In your case, it looks to me like that's _dateEffective, name, status, and location. At the very least, declare them unique to prevent accidental duplication.
I have a table called transactions that has the ledger from a storefront. Let's say it looks like this, for simplicity:
trans_id | cust | date | num_items | cost
---------+------+------+-----------+------
1 | Joe | 4/18 | 6 | 14.83
2 | Sue | 4/19 | 3 | 8.30
3 | Ann | 4/19 | 1 | 2.28
4 | Joe | 4/19 | 4 | 17.32
5 | Sue | 4/19 | 3 | 8.30
6 | Lee | 4/19 | 2 | 9.55
7 | Ann | 4/20 | 1 | 2.28
For the credit card purchases, I subsequently get an electronic ledger that has the full timestamp. So I have a table called cctrans with date, time, cust, cost, and some other info. I want to add a column trans_id to the cctrans table, that references the transactions table.
The update statement for this is simple enough, except for one hitch: I have an 11 AM transaction from Sue on 4/19 for $8.30 and a 3 PM transaction from Sue on 4/19 for $8.30 that are the same in the transactions table except for the trans_id field. I don't really care which record of the cctrans table gets linked to trans_id 2 and which one gets linked to trans_id 5, but they can't both be assigned the same trans_id.
The question here is: How do I accomplish that (ideally in a way that also works when a customer makes the same purchase three or four times in a day)?
The best I have so far is to do:
UPDATE cctrans AS cc
SET trans_id = t.trans_id
WHERE cc.cust = t.cust AND cc.date = t.date AND cc.cost = t.cost
And then fix them one-by-one via manual inspection. But obviously that's not my preferred solution.
Thanks for any help you can provide.