Return All Historical Records for Accounts with Change in Specific Associated Value - sql

I am trying to select all records in a time-variant Account table for each account with a change in an associated value (e.g. the maturity date). A change in the value will result in the most recent record for an account being end-dated and a new record (containing a new effective date of the following day) being created. The most recent records for accounts in this table have an end-date of 12/31/9000.
For instance, in the below illustration, account 44444444 would not be included in my query result set since it hasn't had a change in the value (and thus also has no additional records aside from the original); however, the other accounts have multiple changes in values (and multiple records), so I would want to see those returned.
Also, the table has a number of other fields (columns) not included below but for which changes in the values for these fields can trigger a new record being created; however, I only want to retrieve all records for those accounts where the figure in the “value” column has changed. What are some ways to obtain the results I need?
Note: The primary key for this table includes the acct_id and eff_dt, and I'm using PostgreSQL within a Greenplum environment.
Here are two types of queries I tried to use but which produced problematic results:
Query 1
Query 2

I think you want window functions that compare the minimum and maximum value within each account:
select t.*
from (select t.*,
min(t.value) over (partition by t.acct_id) as min_value,
max(t.value) over (partition by t.acct_id) as max_value
from t
) t
where min_value <> max_value;
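Below is a minimal, runnable sketch of the window-function answer above. It uses Python's built-in sqlite3 module instead of PostgreSQL/Greenplum, which is an assumption made only so the query can be executed (SQLite 3.25 or newer is needed for window functions). The table name t and the column value follow the answer. The sample rows and the extra rate column are hypothetical; rate stands in for the other columns that can also trigger a new record. Account 22222222 has two records because only rate changed, so it is excluded. Account 44444444 has a single record, so it is excluded too.

```python
import sqlite3

# Hypothetical sample rows shaped like the question's table:
# (acct_id, eff_dt, end_dt, value, rate). "rate" is a made-up stand-in
# for the other columns whose changes also create new records.
ROWS = [
    (11111111, "2019-01-01", "2019-06-30", "2030-01-01", 1.0),
    (11111111, "2019-07-01", "9000-12-31", "2031-01-01", 1.0),  # value changed
    (22222222, "2019-01-01", "2019-03-31", "2030-01-01", 1.0),
    (22222222, "2019-04-01", "9000-12-31", "2030-01-01", 2.0),  # only rate changed
    (44444444, "2019-01-01", "9000-12-31", "2030-01-01", 1.0),  # single record
]

# The answer's query, with an explicit column list and ORDER BY added.
QUERY = """
SELECT acct_id, eff_dt, end_dt, value
FROM (SELECT t.*,
             MIN(t.value) OVER (PARTITION BY t.acct_id) AS min_value,
             MAX(t.value) OVER (PARTITION BY t.acct_id) AS max_value
      FROM t) t
WHERE min_value <> max_value
ORDER BY acct_id, eff_dt
"""


def changed_accounts(rows=ROWS):
    """Return every historical row for accounts whose value ever differed."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (acct_id INTEGER, eff_dt TEXT, end_dt TEXT,"
        " value TEXT, rate REAL, PRIMARY KEY (acct_id, eff_dt))"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in changed_accounts():
    print(row)
```

One caveat of the MIN/MAX comparison is that both aggregates ignore NULLs. An account whose value went from NULL to a date (or the reverse) would therefore not be returned.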

Related

Querying a table from a parameter in a BigQuery UDF

I am trying to create a UDF that will find the maximum value of a field called 'DatePartition' for each table that is passed through to the UDF as a parameter. The UDF I have created looks like this:
CREATE TEMP FUNCTION maxDatePartition(x STRING) AS ((
SELECT MAX(DatePartition) FROM x WHERE DatePartition >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(),INTERVAL 7 DAY)
));
but I am getting the following error: "Table name "x" missing dataset while no default dataset is set in the request."
The table names will get passed to the UDF in the format:
my-project.my-dataset.my-table
EDIT: Adding more context: I have multiple tables that are meant to update every morning with yesterday's data. Sometimes the tables are updated later than expected so I am creating a view which will allow users to quickly see the most recent data in each table. To do this I need to calculate MAX(DatePartition) for all of these tables in one statement. The list of tables will be stored in another table but it will change from time to time so I can't hardcode them in.
I have tried to do this in a single statement, but found I needed a common table expression as a sorting mechanism; I haven't had success using the MAX() function on TIMESTAMPs. Here is the most concise method I've found, and it has worked best for me. No UDF is needed. Try something like this:
WITH
DATA AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY your_group_by_fields ORDER BY DatePartition DESC) AS _row,
*
FROM
`my-project.my-dataset.my-table`
WHERE
DatePartition >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
)
SELECT
* EXCEPT(_row)
FROM
DATA
WHERE
_row = 1;
This creates a new field containing a row number within each partition of whatever grouping fields have multiple records with different timestamps. Within each group, the records are ordered by DatePartition with the most recent first, so the most recent record gets the row number "1" because we sorted DatePartition DESC.
The outer query then takes that common table expression of sorted rows and returns every column in your table EXCEPT the row number "_row" you assigned. It keeps only the rows where "_row = 1", which are your most recent records.
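Here is a sketch of the same ROW_NUMBER pattern. It runs in SQLite through Python's sqlite3 module instead of BigQuery, an assumption made so it can execute locally. There are two differences from the BigQuery version. SQLite has no SELECT * EXCEPT(...), so the outer query lists its columns. The seven-day TIMESTAMP_SUB filter is also omitted so the output is deterministic. The table my_table, the grouping column grp, and the payload column are hypothetical.

```python
import sqlite3

# Hypothetical rows: (grp, DatePartition, payload). "grp" stands in for
# your_group_by_fields; ISO-formatted timestamps sort chronologically as text.
ROWS = [
    ("a", "2020-01-01 00:00:00", "old"),
    ("a", "2020-01-03 00:00:00", "new"),
    ("b", "2020-01-02 00:00:00", "only"),
]

# Same shape as the BigQuery query, minus the 7-day filter. SQLite has no
# "* EXCEPT(_row)", so the outer SELECT names the columns to keep.
QUERY = """
WITH DATA AS (
  SELECT ROW_NUMBER() OVER (PARTITION BY grp ORDER BY DatePartition DESC) AS _row,
         *
  FROM my_table
)
SELECT grp, DatePartition, payload
FROM DATA
WHERE _row = 1
ORDER BY grp
"""


def latest_per_group(rows=ROWS):
    """Return the most recent row (by DatePartition) for each grp."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (grp TEXT, DatePartition TEXT, payload TEXT)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(latest_per_group())
```

Note that this returns the latest row per group within one table. It does not by itself accept a table name as a parameter, which was the limitation behind the original UDF error.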

Remove all non-duplicate rows based on a single column in a SQL Server table where there conditions on multiple columns for the select statement

I am attempting to pull data from a SQL Server table that meets certain criteria. Part of those criteria is that there can be multiple rows with the same value in one column, and I need all of those rows returned; what I do not want returned are rows whose value appears only once.
I want to find sessions that fall in a specific date range, have one of two types of activity, and are multiple, meaning there are two or more rows for the session.
Example SQL query:
SELECT activity, message
FROM myTable
WHERE (date BETWEEN '1/1/2020' and '1/31/2020')
AND activity IN ('trace', 'info')
Can you advise how I can return only the rows that are in the correct date range and have the correct activity, and that also occur multiple times? I want no data that fails any of those three criteria.
Update to Body:
In creating the example query in my initial post, I neglected to include the label column, so the SELECT should read "SELECT activity, label, message FROM myTable WHERE (date BETWEEN '1/1/2020' and '1/31/2020') AND activity IN ('trace','info')". Based on sample data, I would expect the following return:
activity  message  label
--------  -------  -----
trace     logged   1234
info      written  1234
Label '1234' is the only value that meets all criteria: it falls in the date range, matches the activity values, and has multiple rows.
With the limited information, I can only make a wild guess; see if something like the query below works for you.
SELECT activity,
message,
COUNT(*) AS Count
FROM myTable
WHERE date BETWEEN '1/1/2020' and '1/31/2020'
AND activity IN ('trace', 'info')
AND message IN ('logged', 'written')
GROUP BY activity,
message
HAVING COUNT(*) > 1
ORDER BY Count DESC
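An alternative that returns the rows themselves, rather than one count per activity/message pair, is a windowed COUNT(*) OVER (PARTITION BY label), which SQL Server also supports. Window functions are evaluated after WHERE, so the count only includes rows that already passed the date and activity filters. This sketch runs the query in SQLite via Python's sqlite3. It assumes label is what identifies a session. The sample rows are hypothetical, and dates are stored as ISO strings so that BETWEEN compares them correctly in SQLite.

```python
import sqlite3

# Hypothetical rows: (date, activity, message, label).
ROWS = [
    ("2020-01-05", "trace", "logged", 1234),
    ("2020-01-06", "info", "written", 1234),
    ("2020-01-10", "trace", "logged", 5678),
    ("2020-01-15", "debug", "noise", 5678),   # filtered out: wrong activity
    ("2020-01-12", "info", "written", 9999),
    ("2020-02-03", "info", "written", 9999),  # filtered out: outside date range
]

# Count rows per label *after* filtering, then keep labels seen more than once.
QUERY = """
SELECT activity, message, label
FROM (SELECT date, activity, message, label,
             COUNT(*) OVER (PARTITION BY label) AS rows_per_label
      FROM myTable
      WHERE date BETWEEN '2020-01-01' AND '2020-01-31'
        AND activity IN ('trace', 'info')) filtered
WHERE rows_per_label > 1
ORDER BY date
"""


def multi_row_sessions(rows=ROWS):
    """Return filtered rows whose label occurs at least twice after filtering."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (date TEXT, activity TEXT, message TEXT, label INTEGER)")
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(multi_row_sessions())
```

Labels 5678 and 9999 each have two rows in the table, but only one survives the filters, so they are not returned.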

Excluding results that are within same month in SQL

I have two tables -- one, a history table that contains a log of some kind of entries, and another (let's call it flags) that contains columns about flags (for a certain account). Both tables contain account IDs.
I want to write a query that only extracts rows from the flag table if the account ID does not already have an entry for that month in the history table (e.g., in the flag table, an entry was entered on April 2, 2019 and in the history table, the account already had an entry recorded on April 1, 2019. The result is, the April 2nd entry should not be pulled up).
I have a query right now that basically looks like this:
SELECT *multiple column names*
FROM flags
WHERE NOT EXISTS (SELECT acc_id FROM history WHERE ...)
This is where I am stuck. In the subquery, I want the history rows whose date falls in the same month and year as the flag's date, and with WHERE NOT EXISTS, I want to exclude the flag rows that have such a match (essentially, I only want flag entries whose month is not already logged in the history table).
The most important columns are:
the account ID (to correctly associate each log entry to the right account)
date (to only get rows where the month recorded is not already logged in the history table)
I initially used MONTH(), but that only extracts the month of the date. I need it to match both the month and the year because the history table contains a few years of data.
Any help would be greatly appreciated! Thank you in advance!
SELECT *multiple column names*
FROM flags
WHERE NOT EXISTS (
SELECT 1
FROM history
WHERE history.acc_id=flags.acc_id
AND date_trunc('month', history.date) =
date_trunc('month', flags.date)
)
The date_trunc function will work for Postgres, which was one of the tags originally. If you aren't using Postgres, there may be a similar function in your database, or you could format the dates as just year-month and compare the resulting strings.
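Here is a sketch of the second suggestion, comparing dates formatted as year-month strings. It uses SQLite's strftime('%Y-%m', ...) through Python's sqlite3, since SQLite has no date_trunc. The table and column names follow the question, and the sample rows are hypothetical. The 2018-04-15 flag is kept, which shows why the year must be part of the comparison.

```python
import sqlite3

# Hypothetical rows: (acc_id, date) in ISO format.
HISTORY = [(1, "2019-04-01")]
FLAGS = [
    (1, "2019-04-02"),  # same account, same month and year -> excluded
    (1, "2019-05-10"),  # different month -> kept
    (1, "2018-04-15"),  # same month, different year -> kept
    (2, "2019-04-02"),  # no history for account 2 -> kept
]

# NOT EXISTS with a year-month string comparison in place of date_trunc.
QUERY = """
SELECT acc_id, date
FROM flags
WHERE NOT EXISTS (
  SELECT 1
  FROM history
  WHERE history.acc_id = flags.acc_id
    AND strftime('%Y-%m', history.date) = strftime('%Y-%m', flags.date)
)
ORDER BY acc_id, date
"""


def flags_without_history_month():
    """Return flag rows whose account has no history entry in the same month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE history (acc_id INTEGER, date TEXT)")
    conn.execute("CREATE TABLE flags (acc_id INTEGER, date TEXT)")
    conn.executemany("INSERT INTO history VALUES (?, ?)", HISTORY)
    conn.executemany("INSERT INTO flags VALUES (?, ?)", FLAGS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(flags_without_history_month())
```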

Get latest data for all people in a table and then filter based on some criteria

I am attempting to return the row with the highest timestamp (an integer) for each person (who has multiple entries) in a table. Additionally, I am only interested in rows whose field contains ABCD, but this filter should be applied after selecting the latest (max timestamp) entry for each person.
SELECT table."person", max(table."timestamp")
FROM table
WHERE table."type" = 1
HAVING table."field" LIKE '%ABCD%'
GROUP BY table."person"
For some reason, I am not receiving the data I expect. The returned table is nearly twice the expected size. Is there some step here that I am not getting right?
You can first build a table containing each person's max(timestamp), then join it back to the original rows in another SELECT statement and apply the ABCD filter there. The following query does that:
SELECT t."person", t."timestamp"
FROM "table" t
JOIN (SELECT "person", MAX("timestamp") AS max_ts
      FROM "table" WHERE "type" = 1 GROUP BY "person") m
  ON m."person" = t."person" AND m.max_ts = t."timestamp"
WHERE t."type" = 1 AND t."field" LIKE '%ABCD%';
Direct answer: as I understand your end goal, just move the HAVING clause to the WHERE section:
SELECT
table."person", MAX(table."timestamp")
FROM table
WHERE
table."type" = 1
AND table."field" LIKE '%ABCD%'
GROUP BY table."person";
This should return no more than 1 row per table."person", with their associated maximum timestamp.
As an aside, I'm surprised your query worked at all: your HAVING clause referenced a column that was neither grouped nor aggregated (and HAVING came before GROUP BY). From the documentation (and my experience):
The fundamental difference between WHERE and HAVING is this: WHERE selects input rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas HAVING selects group rows after groups and aggregates are computed.
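To see why the placement of the ABCD filter matters, this sketch runs two queries in SQLite via Python's sqlite3. The first filters in WHERE before aggregating, as in the direct answer. The second picks each person's latest type-1 row first and filters it afterwards, which is the order the question describes. The table name events and the sample rows are hypothetical. "timestamp" and "type" are quoted as in the question.

```python
import sqlite3

# Hypothetical rows: (person, timestamp, type, field).
ROWS = [
    ("alice", 1, 1, "xxABCDxx"),
    ("alice", 2, 1, "zzz"),    # alice's latest row lacks ABCD
    ("bob", 3, 1, "ABCD"),
    ("bob", 5, 1, "ABCD"),     # bob's latest row has ABCD
    ("carol", 7, 2, "ABCD"),   # wrong type
]

# WHERE runs before grouping: the latest *matching* row per person.
FILTER_BEFORE = """
SELECT person, MAX("timestamp")
FROM events
WHERE "type" = 1 AND field LIKE '%ABCD%'
GROUP BY person
ORDER BY person
"""

# Pick the latest type-1 row per person first, then require ABCD on it.
FILTER_AFTER = """
SELECT e.person, e."timestamp"
FROM events e
JOIN (SELECT person, MAX("timestamp") AS max_ts
      FROM events WHERE "type" = 1 GROUP BY person) m
  ON m.person = e.person AND m.max_ts = e."timestamp"
WHERE e."type" = 1 AND e.field LIKE '%ABCD%'
ORDER BY e.person
"""


def run(query, rows=ROWS):
    """Load the sample rows into an in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE events (person TEXT, "timestamp" INTEGER, "type" INTEGER, field TEXT)')
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print("filter before:", run(FILTER_BEFORE))
print("filter after: ", run(FILTER_AFTER))
```

Filtering first still returns alice, using her older ABCD row. Filtering after selecting the latest row drops her, because her most recent entry does not contain ABCD.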

Create Select distinct query with criteria of having the latest date

I have been struggling with creating a query in Access to select a distinct field with the criteria of having the newest entry in the database.
Here's a brief summary of what my table consists of. I have a table with surveying data collected from 2007 to the present. It has a field with each survey mark's name and the corresponding adjustment data, and the adjustment data includes a field with the adjustment date. Many of the marks have been occupied multiple times, and I only want to retrieve the most recent occupation information.
Roughly, I want to do this:
SELECT DISTINCT STATUS_POINT_DESIGNATION
FROM __ALL_ADJUSTMENTS
WHERE [__ALL_ADJUSTMENTS]![ADJ_DATE]=MAX(ADJ_DATE)
I seem to be confused about how to combine selecting a distinct value with a constraint. Any suggestions?
DH
It seems you could achieve your aim of getting the latest observation for each survey point with an aggregate (GROUP BY) query:
SELECT STATUS_POINT_DESIGNATION, Max(ADJ_DATE) AS LatestDate, Count(STATUS_POINT_DESIGNATION) AS Observations
FROM __ALL_ADJUSTMENTS
GROUP BY STATUS_POINT_DESIGNATION;
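The query above returns each mark's latest date but not the rest of that occupation's data. One common way to get the full latest row is to join the GROUP BY result back to the table on the mark name and date. This sketch runs that join in SQLite via Python's sqlite3 rather than Access. The ADJ_VALUE column and the sample rows are hypothetical, and dates are stored as ISO strings so MAX picks the latest one.

```python
import sqlite3

# Hypothetical rows: (STATUS_POINT_DESIGNATION, ADJ_DATE, ADJ_VALUE).
ROWS = [
    ("A1", "2007-05-01", 10.0),
    ("A1", "2015-06-01", 10.2),  # A1's most recent occupation
    ("B2", "2010-01-01", 5.5),
]

# Latest date per mark, joined back to fetch the whole row for that date.
QUERY = """
SELECT a.STATUS_POINT_DESIGNATION, a.ADJ_DATE, a.ADJ_VALUE
FROM __ALL_ADJUSTMENTS AS a
INNER JOIN (SELECT STATUS_POINT_DESIGNATION, MAX(ADJ_DATE) AS LatestDate
            FROM __ALL_ADJUSTMENTS
            GROUP BY STATUS_POINT_DESIGNATION) AS latest
  ON a.STATUS_POINT_DESIGNATION = latest.STATUS_POINT_DESIGNATION
 AND a.ADJ_DATE = latest.LatestDate
ORDER BY a.STATUS_POINT_DESIGNATION
"""


def latest_adjustments(rows=ROWS):
    """Return the full most-recent adjustment row for each survey mark."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE __ALL_ADJUSTMENTS "
        "(STATUS_POINT_DESIGNATION TEXT, ADJ_DATE TEXT, ADJ_VALUE REAL)"
    )
    conn.executemany("INSERT INTO __ALL_ADJUSTMENTS VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(latest_adjustments())
```

If a mark has two rows on the same latest date, both are returned. A tie-breaking column would be needed to keep just one.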