I have two tables: a history table that contains a log of entries of some kind, and another table (let's call it flags) that contains flag columns for each account. Both tables contain account IDs.
I want to write a query that only extracts rows from the flags table if the account ID does not already have an entry for that month in the history table. For example, if an entry was entered in the flags table on April 2, 2019, and the account already had an entry recorded in the history table on April 1, 2019, the April 2 entry should not be returned.
I have a query right now that basically looks like this:
SELECT *multiple column names*
FROM flags
WHERE NOT EXISTS (SELECT acc_id FROM history WHERE ...)
This is where I am stuck. With the subquery, I want to get the matches where the dates from both tables fall in the same month and year; with WHERE NOT EXISTS, I then exclude the rows from flags that are found in the subquery (essentially, I only want rows whose entry date is not in a month that already has a history entry).
The most important columns are:
- the account ID (to correctly associate each log entry with the right account)
- date (to only get rows where the month recorded is not already logged in the history table)
I initially used MONTH(), but that only extracts the month of the date. I need it to match both the month and the year because the history table contains a few years of data.
Any help would be greatly appreciated! Thank you in advance!
SELECT *multiple column names*
FROM flags
WHERE NOT EXISTS (
SELECT 1
FROM history
WHERE history.acc_id=flags.acc_id
AND date_trunc('month', history.date) =
date_trunc('month', flags.date)
)
The date_trunc function works in Postgres, which was one of the tags originally. If you aren't using Postgres, there may be a similar function in your database, or you could format the dates as just year-month and compare the resulting strings.
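A minimal sketch of the string-comparison fallback, using Python's built-in sqlite3 module as a stand-in database. SQLite has no date_trunc, so strftime('%Y-%m', ...) reduces each date to a year-month string instead. The tables and rows are hypothetical sample data, and the flag column is an assumed stand-in for the real flag columns.

```python
import sqlite3

# In-memory database with hypothetical sample data (ISO 'YYYY-MM-DD' date strings).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE history (acc_id INTEGER, date TEXT);
CREATE TABLE flags   (acc_id INTEGER, date TEXT, flag TEXT);

INSERT INTO history VALUES (1, '2019-04-01'), (2, '2018-04-15');
INSERT INTO flags VALUES
    (1, '2019-04-02', 'a'),  -- same account, same month and year: excluded
    (2, '2019-04-02', 'b'),  -- same month, different year: kept
    (3, '2019-04-02', 'c');  -- no history at all: kept
""")

# strftime('%Y-%m', ...) keeps only year and month, so both must match
# for a flags row to be excluded.
query = """
SELECT f.acc_id, f.date, f.flag
FROM flags AS f
WHERE NOT EXISTS (
    SELECT 1
    FROM history AS h
    WHERE h.acc_id = f.acc_id
      AND strftime('%Y-%m', h.date) = strftime('%Y-%m', f.date)
)
ORDER BY f.acc_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(2, '2019-04-02', 'b'), (3, '2019-04-02', 'c')]
```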
Related
I get the feeling this is easy for SQL, but I'm new to it. I want to query table CUST_TRAN and find out the ID for value $1 which is in column INVOICE.
I then want to use the newly discovered ID to update table TRANSACTS to change the DATE and DATE_TIME (two values in two columns) to value $2 using the column IDFROMTRAN to match the ID against. There are very often two rows with the same IDFROMTRAN that need to be updated, because the customer signs for the goods and that creates its own row.
Does any of this make sense?
Basically, to explain what I'm doing: I'm changing the date on transactions to move them into last month, so that they are invoiced correctly and reflect when the customer was actually in store.
TLDR: How do I find out an alternative ID for a transaction then use that to change the date in two rows in another table?
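No answer was included for this question. One common approach is a single UPDATE whose WHERE clause looks the ID up with a subquery, so every row sharing that IDFROMTRAN (including the signature row) is changed at once. The sketch below uses Python's sqlite3 module with hypothetical sample rows, `?` placeholders in place of $1 and $2, and a hypothetical helper move_transaction. Keeping each row's original time of day in DATE_TIME is an assumption, not something the question specifies.

```python
import sqlite3

# Hypothetical sample data; "DATE" is quoted because DATE is a keyword in many SQL dialects.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUST_TRAN (ID INTEGER, INVOICE TEXT);
CREATE TABLE TRANSACTS (IDFROMTRAN INTEGER, "DATE" TEXT, DATE_TIME TEXT);

INSERT INTO CUST_TRAN VALUES (10, 'INV-001'), (11, 'INV-002');
-- Transaction 10 has two rows: the sale and the customer signature.
INSERT INTO TRANSACTS VALUES
    (10, '2019-03-02', '2019-03-02 10:15:00'),
    (10, '2019-03-02', '2019-03-02 10:16:00'),
    (11, '2019-03-02', '2019-03-02 11:00:00');
""")

def move_transaction(conn, invoice, new_date):
    """Move every TRANSACTS row linked to `invoice` to `new_date`.

    The subquery finds the ID in CUST_TRAN, so no separate lookup step is
    needed. Assumption: DATE_TIME keeps its original time of day
    (characters 11 onward of an ISO 'YYYY-MM-DD HH:MM:SS' string).
    """
    cur = conn.execute(
        """
        UPDATE TRANSACTS
        SET "DATE" = ?,
            DATE_TIME = ? || substr(DATE_TIME, 11)
        WHERE IDFROMTRAN IN (SELECT ID FROM CUST_TRAN WHERE INVOICE = ?)
        """,
        (new_date, new_date, invoice),
    )
    conn.commit()
    return cur.rowcount  # number of rows updated

changed = move_transaction(conn, "INV-001", "2019-02-28")
print(changed)  # 2
```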
Probably somebody has created this before, but I need to create a dummy table to be able to union to my other transaction table.
The use case is: I have daily transactional data with fields like "customer", "date", and "sales".
Now I want dummy data with the same fields, populated with zero values at day level.
The customer information should be read from the customer master, but the date field should be pre-generated at day level for at least the current year and the previous 3 years.
The idea is that I want to union this dummy table to my original transaction table so that there are no null transactions at day level, and days with no sales are populated with zero.
This requirement is for BI analytics projects.
To put it simply, I don't want my table to have any missing transactions at day level.
Here is the sample screenshot
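A sketch of the dummy-table idea in SQLite via Python's sqlite3 module: a recursive CTE generates one row per day, a CROSS JOIN with the customer master turns that into one zero-sales row per customer per day, and UNION ALL plus SUM folds in the real transactions. The table names customer_master and transactions and the five-day range are hypothetical; widen START and END to cover the current and previous three years. Other databases have their own date-series helpers, so the recursive CTE is only one way to build the calendar.

```python
import sqlite3

# Hypothetical date range; the question asks for the current year plus the previous 3 years.
START, END = "2024-01-01", "2024-01-05"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_master (customer TEXT);
CREATE TABLE transactions (customer TEXT, date TEXT, sales REAL);

INSERT INTO customer_master VALUES ('A'), ('B');
INSERT INTO transactions VALUES ('A', '2024-01-02', 100.0), ('B', '2024-01-04', 50.0);
""")

query = """
WITH RECURSIVE calendar(cal_date) AS (
    SELECT ?                                    -- first day of the range
    UNION ALL
    SELECT date(cal_date, '+1 day') FROM calendar
    WHERE cal_date < ?                          -- stop at the last day
),
dummy AS (
    -- the dummy table: one zero-sales row per customer per day
    SELECT c.customer, cal.cal_date AS date, 0.0 AS sales
    FROM customer_master AS c
    CROSS JOIN calendar AS cal
)
SELECT customer, date, SUM(sales) AS sales
FROM (
    SELECT customer, date, sales FROM dummy
    UNION ALL
    SELECT customer, date, sales FROM transactions
)
GROUP BY customer, date
ORDER BY customer, date
"""
rows = conn.execute(query, (START, END)).fetchall()
for row in rows:
    print(row)
```

A LEFT JOIN from the customer-by-day grid to the transactions (aggregated per customer and day) with COALESCE(sales, 0) gives the same result without a union; the UNION ALL form is used here because it matches the approach described in the question.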
I'm working on BigQuery and have created a view using multiple tables. Each day, the data needs to be synced with multiple platforms. I need to insert a date or some other field via SQL through which I can identify which rows were added to the view each day or which rows got updated, so that I can take only that data forward each day instead of syncing everything every day. The best way I can think of is to somehow add the current date whenever an update to a row happens, but that date needs to stay constant until a further update happens for that record.
Ex:
Sample data
Say we get the view T1 on 1st September and T2 on 2nd September. I need to spot only ID:2 for 1st September and ID:3,4,5 for 2nd September. Note: no such date column exists. I need help creating such a column, or any other approach to identify which rows are updated or added daily.
You can create a BigQuery scheduled query with a daily (24-hour) frequency using the INSERT statement below:
INSERT INTO dataset.T1
SELECT
*
FROM
dataset.T2
WHERE
date > (SELECT MAX(date) FROM dataset.T1);
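The INSERT above filters on a date column, which the question says the view does not have. An alternative that needs no date column is to keep yesterday's output of the view as a table and compare it with today's: new or changed rows are exactly the rows of today's view that have no identical row in yesterday's copy. A minimal sketch in SQLite through Python's sqlite3 module; the T1/T2 columns and values are hypothetical, and BigQuery spells this set operator EXCEPT DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical data: T1 is the saved copy of the view from 1st September,
-- T2 is the view as it looks on 2nd September.
CREATE TABLE T1 (id INTEGER, name TEXT, amount INTEGER);
CREATE TABLE T2 (id INTEGER, name TEXT, amount INTEGER);

INSERT INTO T1 VALUES (1, 'x', 10), (2, 'y', 20), (3, 'z', 30);
INSERT INTO T2 VALUES
    (1, 'x', 10), (2, 'y', 20), (3, 'z', 35), (4, 'w', 40), (5, 'v', 50);
""")

# EXCEPT compares whole rows: a row comes back if it is new (ids 4 and 5)
# or if any of its columns changed since the previous snapshot (id 3).
changed = conn.execute("""
    SELECT id, name, amount FROM T2
    EXCEPT
    SELECT id, name, amount FROM T1
    ORDER BY id
""").fetchall()
print(changed)  # [(3, 'z', 35), (4, 'w', 40), (5, 'v', 50)]
```

After each sync, the saved copy would be replaced with today's output so that the next day's comparison starts from the current state.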
The table your data is streamed to (in your case: sample data) needs to be configured as a partitioned table. Choose "Partition by ingestion time" so that you don't need to handle the date yourself.
Configuration in BQ
After you have recreated the table, append your existing data to the new table using the query options in BQ (choose append), then click RUN.
Then you create a view based on that table with:
SELECT * EXCEPT (rank)
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY invoice_id ORDER BY _PARTITIONTIME DESC) AS rank
FROM `your_dataset.your_sample_data_table`
)
WHERE rank = 1
From then on, always query the view.
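The sketch below reproduces the view's ROW_NUMBER logic in SQLite via Python's sqlite3 module, with a hypothetical ingested_at column standing in for BigQuery's _PARTITIONTIME pseudo-column and hypothetical data. SQLite has no SELECT * EXCEPT, so the view lists its columns explicitly. Filtering the view on the latest ingestion day then shows which invoices were added or updated that day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- ingested_at stands in for BigQuery's _PARTITIONTIME pseudo-column.
CREATE TABLE sample_data (invoice_id INTEGER, amount INTEGER, ingested_at TEXT);

INSERT INTO sample_data VALUES
    (1, 10, '2019-09-01'),
    (2, 20, '2019-09-01'),
    (2, 25, '2019-09-02'),   -- newer version of invoice 2
    (3, 30, '2019-09-02');

-- Keep only the most recently ingested row for each invoice_id.
CREATE VIEW latest AS
SELECT invoice_id, amount, ingested_at
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY invoice_id
                              ORDER BY ingested_at DESC) AS rn
    FROM sample_data
)
WHERE rn = 1;
""")

rows = conn.execute("SELECT * FROM latest ORDER BY invoice_id").fetchall()
print(rows)  # [(1, 10, '2019-09-01'), (2, 25, '2019-09-02'), (3, 30, '2019-09-02')]

# Rows added or changed on a given day are those whose latest version
# was ingested on that day.
today = conn.execute(
    "SELECT invoice_id FROM latest WHERE ingested_at = ? ORDER BY invoice_id",
    ("2019-09-02",),
).fetchall()
print(today)  # [(2,), (3,)]
```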
I am trying to select all records in a time-variant Account table for each account with a change in an associated value (e.g. the maturity date). A change in the value will result in the most recent record for an account being end-dated and a new record (containing a new effective date of the following day) being created. The most recent records for accounts in this table have an end-date of 12/31/9000.
For instance, in the below illustration, account 44444444 would not be included in my query result set since it hasn't had a change in the value (and thus also has no additional records aside from the original); however, the other accounts have multiple changes in values (and multiple records), so I would want to see those returned.
Also, the table has a number of other columns, not shown below, whose changes can also trigger a new record being created; however, I only want to retrieve all records for those accounts where the figure in the “value” column has changed. What are some ways to obtain the results I need?
Note: The primary key for this table includes the acct_id and eff_dt, and I'm using PostgreSQL within a Greenplum environment.
Here are two types of queries I tried to use but which produced problematic results:
Query 1
Query 2
I think you want window functions to compare the value:
select t.*
from (select t.*,
min(t.value) over (partition by t.acct_id) as min_value,
max(t.value) over (partition by t.acct_id) as max_value
from t
) t
where min_value <> max_value;
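A small check of the window-function query, run in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The rows are hypothetical: account 11111111 changes its value, 22222222 only changes another column, and 44444444 has a single record, so only the 11111111 rows should come back. Comparing MIN and MAX also returns accounts whose value later changed back to an earlier figure, which still counts as a change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (acct_id INTEGER, eff_dt TEXT, end_dt TEXT, value TEXT, other_col TEXT);
INSERT INTO t VALUES
    -- value changes: all rows for this account are returned
    (11111111, '2020-01-01', '2020-06-30', '2025-01-01', 'a'),
    (11111111, '2020-07-01', '9000-12-31', '2026-01-01', 'a'),
    -- only another column changed: account excluded
    (22222222, '2020-01-01', '2020-03-31', '2025-01-01', 'a'),
    (22222222, '2020-04-01', '9000-12-31', '2025-01-01', 'b'),
    -- single original record: account excluded
    (44444444, '2020-01-01', '9000-12-31', '2025-01-01', 'a');
""")

# Same shape as the answer's query: per-account MIN and MAX of value,
# keeping only accounts where they differ.
rows = conn.execute("""
select acct_id, eff_dt, value
from (select t.*,
             min(t.value) over (partition by t.acct_id) as min_value,
             max(t.value) over (partition by t.acct_id) as max_value
      from t
     ) x
where min_value <> max_value
order by acct_id, eff_dt
""").fetchall()
print(rows)
```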
I'm trying to generate monthly records in one table based on instructions in another table. Software - MS Access 2007, though I'm looking for an SQL solution here. To greatly simplify the matter, let's say the following describes the tables:
TaskManager:
- DayDue
- TaskName
Task:
- DateDue
- TaskName
So, for example, there may be an entry in TaskManager {15, "Accounts due"}, which should lead to an "Accounts due" record in the Task table with a due date of the 15th of each month. I'd want it to create records for the last few months and the next year.
What I think I need to do is first create a SELECT query that returns x records for each record in the TaskManager table, one per month with the corresponding date. After that, I run an INSERT query that inserts the records from that SELECT query into the Task table if they do not already EXIST there.
I think I can manage the INSERT query, though I'm having trouble figuring out how to do the SELECT query. Could someone give me a pointer?
You could use a calendar table.
INSERT INTO Task ( DateDue, TaskName )
SELECT calendar.CalDate, TaskManager.TaskName
FROM calendar, TaskManager
WHERE (((Day([CalDate]))=TaskManager.DayDue)
AND ((calendar.CalDate)<#7/1/2013#));
The calendar table would simply contain all dates, plus any other relevant fields such as a work-day flag (Yes/No). Calendar tables are generally quite useful.
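A sketch of the calendar-table approach in SQLite through Python's sqlite3 module. A recursive CTE fills a hypothetical 2013 calendar, strftime('%d', ...) stands in for Access's Day(), and an ISO string comparison replaces the #7/1/2013# date literal. The TaskManager row is the {15, "Accounts due"} example from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar (CalDate TEXT PRIMARY KEY);
CREATE TABLE TaskManager (DayDue INTEGER, TaskName TEXT);
CREATE TABLE Task (DateDue TEXT, TaskName TEXT);

INSERT INTO TaskManager VALUES (15, 'Accounts due');

-- Fill the calendar with every date in a hypothetical range.
WITH RECURSIVE d(x) AS (
    SELECT '2013-01-01'
    UNION ALL
    SELECT date(x, '+1 day') FROM d WHERE x < '2013-12-31'
)
INSERT INTO calendar SELECT x FROM d;

-- Cross join, keeping the calendar days that match each task's DayDue.
INSERT INTO Task (DateDue, TaskName)
SELECT calendar.CalDate, TaskManager.TaskName
FROM calendar, TaskManager
WHERE CAST(strftime('%d', calendar.CalDate) AS INTEGER) = TaskManager.DayDue
  AND calendar.CalDate < '2013-07-01';
""")

due = [r[0] for r in conn.execute("SELECT DateDue FROM Task ORDER BY DateDue")]
print(due)
```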
Here is the solution I developed using Remou's Calendar table idea.
First create a Calendar table, which simply contains all dates for a desired range. It's easy to just make the dates in Excel and paste them into the table. This is also a very reliable way of doing it, as Excel handles leap years correctly for the modern range of dates.
After building this table, there are three queries to run. The first is a SELECT, which selects every possible task generated by the TaskManager based on the date and frequency. This query is called TaskManagerQryAllOptions, and has the following code:
SELECT TaskManager.ID, Calendar.CalendarDate
FROM TaskManager INNER JOIN Calendar ON
TaskManager.DateDay = Day(Calendar.CalendarDate)
WHERE (TaskManager.Frequency = "Monthly")
OR (TaskManager.Frequency = "Yearly" AND
TaskManager.DateMonth = Month(Calendar.CalendarDate))
OR (TaskManager.Frequency = "Quarterly" AND
(((Month(Calendar.CalendarDate)- TaskManager.DateMonth) Mod 3) = 0));
Most of the WHERE clause handles the different frequencies; for quarterly tasks, the Mod 3 test matches every month that is a multiple of three months away from DateMonth. The next step is another SELECT query, which selects the records from TaskManagerQryAllOptions whose date is within the required range. This query is called TaskManagerQrySelect.
SELECT TaskManagerQryAllOptions.ID, TaskManager.TaskName,
TaskManagerQryAllOptions.CalendarDate
FROM TaskManagerQryAllOptions INNER JOIN TaskManager
ON TaskManagerQryAllOptions.ID = TaskManager.ID
WHERE (TaskManagerQryAllOptions.CalendarDate > Date()-60)
AND (TaskManagerQryAllOptions.CalendarDate < Date()+370)
AND (TaskManagerQryAllOptions.CalendarDate >= TaskManager.Start)
AND ((TaskManagerQryAllOptions.CalendarDate <= TaskManager.Finish)
OR (TaskManager.Finish Is Null))
ORDER BY TaskManagerQryAllOptions.CalendarDate;
The final query is an INSERT. As we will be using this query frequently, we don't want it to generate duplicates, so we need to filter out already created records.
INSERT INTO Task ( TaskName, TaskDate )
SELECT TaskManagerQrySelect.TaskName, TaskManagerQrySelect.CalendarDate
FROM TaskManagerQrySelect
WHERE Not Exists(
SELECT *
FROM Task
WHERE Task.TaskName = TaskManagerQrySelect.TaskName
AND Task.TaskDate = TaskManagerQrySelect.CalendarDate);
One limitation of this method is that if the date of repetition (e.g. the 15th of each month) is changed, the future records with the wrong day will remain. A solution to this would be to update all the future records with the adjusted date, then run the insert.
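For illustration, the queries can be collapsed into a single statement. This SQLite sketch (via Python's sqlite3 module) combines the frequency matching of TaskManagerQryAllOptions with the NOT EXISTS guard of the final INSERT, using strftime and % in place of Access's Day(), Month() and Mod. The TaskManager rows are hypothetical, and the date-range filter of TaskManagerQrySelect is left out for brevity. Running the INSERT a second time adds nothing, which is the duplicate protection the final query is meant to provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Calendar (CalendarDate TEXT PRIMARY KEY);
CREATE TABLE TaskManager (ID INTEGER PRIMARY KEY, TaskName TEXT, Frequency TEXT,
                          DateDay INTEGER, DateMonth INTEGER);
CREATE TABLE Task (TaskName TEXT, TaskDate TEXT);

WITH RECURSIVE d(x) AS (
    SELECT '2013-01-01'
    UNION ALL
    SELECT date(x, '+1 day') FROM d WHERE x < '2013-12-31'
)
INSERT INTO Calendar SELECT x FROM d;

INSERT INTO TaskManager VALUES
    (1, 'Accounts due', 'Monthly',   15, NULL),
    (2, 'VAT return',   'Quarterly',  1, 2),    -- Feb, May, Aug, Nov
    (3, 'Renewal',      'Yearly',    10, 6);
""")

INSERT_TASKS = """
INSERT INTO Task (TaskName, TaskDate)
SELECT tm.TaskName, c.CalendarDate
FROM TaskManager AS tm
JOIN Calendar AS c
  ON tm.DateDay = CAST(strftime('%d', c.CalendarDate) AS INTEGER)
WHERE (tm.Frequency = 'Monthly'
   OR (tm.Frequency = 'Yearly'
       AND tm.DateMonth = CAST(strftime('%m', c.CalendarDate) AS INTEGER))
   OR (tm.Frequency = 'Quarterly'
       AND (CAST(strftime('%m', c.CalendarDate) AS INTEGER) - tm.DateMonth) % 3 = 0))
  -- skip tasks that were already generated by an earlier run
  AND NOT EXISTS (
      SELECT 1 FROM Task AS t
      WHERE t.TaskName = tm.TaskName AND t.TaskDate = c.CalendarDate)
"""

first = conn.execute(INSERT_TASKS).rowcount   # 12 monthly + 4 quarterly + 1 yearly
second = conn.execute(INSERT_TASKS).rowcount  # rerun: nothing new to insert
print(first, second)  # 17 0
```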
One possibility could be to create a table of Months, and a table of Years (prior year, current, and next one). I could run a SELECT query which takes the Day from the TaskManager table, the Month from the Month table, and the Year from the Year table - I imagine that this could somehow create my desired multiple records for a single TaskManager record. Though I'm not sure what the exact SQL would be.