Making new column by subtracting date column from current date after joining tables in SQL - sql

I have two tables that I have joined and I want to add to the result of that join another column. This column will be the result of the current date minus the column date called start_date. I would like to call the new column days because the difference would be in days.
Here is how my join looks like:
SELECT t.kind,
t.start_date,
t.user,
t.user_email,
t.completed,
g.name
FROM d_trans AS t
JOIN d_gar AS g
ON t.rest_id = g.id
WHERE t.completed = false
AND t.kind = 'RESTING';
My SQL coding skills are limited. I usually just work with R after importing the data. This would be simple in R but I am not sure how to do this in SQL. I looked online for similar questions and I saw things like ALTER TABLE ADD to add a new column and the function GETDATE() to get the current date but everything I tried did not work. I am working in Athena from AWS I don't know if my problems are due to the limitations of Athena.

Use date_diff('day', current_date, date_col) function:
SELECT t.kind,
t.start_date,
t.user,
t.user_email,
t.completed,
g.name,
date_diff('day', cast(t.start_date as timestamp), current_timestamp) as days
FROM d_trans AS t
JOIN d_gar AS g
ON t.rest_id = g.id
WHERE t.completed = false
AND t.kind = 'RESTING';

Related

How can I create a query in SQL Server, using as base table a date function and linking it to another table?

I am trying to create a query using a function of dates and a table of shifts, which can show me the shifts of workers each day, when I have a shift or rest depending on the day,
What do I have: I have a date function that gives me a range of dates that I add, I attach an example:
I have a table of shifts, with only the days that a person has a shift, if a day has a break, the date or the row does not appear, I attach an example:
It can be seen that in the shift table there are only records when a person has a shift.
Problem: when I perform the join between the function and the shift table through the date field, the result is that it only shows me the record when it has a shift and no, it does not put the date when it has a break, I attach an example:
Desired result:
The idea is that when the worker has a break, the row will be blank, only showing the date and his ID, or saying the word break.
I hope you can help me. Thank you so much.
Use LEFT JOIN for avoiding few date missing which has transaction in table.
Use two subquery here for getting appropriate result. In first subquery function CROSS JOIN with transaction table where retrieving distinct id_trabajador for specified date range. If it doesn't do then id will blank in result where no transaction exists for specific id in a certain date. In second subquery retrieve all rows for given date range.
-- SQL Server
SELECT tmp.fecha, tmp.id_trabajador
, tmd.inicio, tmd.termino
, COALESCE(CAST(tmd.jornada AS varchar(20)), 'DESCANSO') jornada
FROM (SELECT * FROM shift_cmr..fnRangoFechas('01-sep-2021', '31-dec-2021') t
CROSS JOIN (SELECT id_trabajador
FROM shift_cmr..trabajadores_turnos_planificados
WHERE fecha BETWEEN '2021-09-01' AND '2021-12-31'
GROUP BY id_trabajador) tt
) tmp
LEFT JOIN (SELECT *
FROM shift_cmr..trabajadores_turnos_planificados
WHERE fecha BETWEEN '2021-09-01' AND '2021-12-31') tmd
ON tmp.fecha = tmd.fecha
AND tmp.id_trabajador = tmd.id_trabajador
You need to start with the date table and LEFT JOIN everything else
SELECT
dates.fecha,
sh.id_trabajador,
sh.inicio,
sh.termino,
jornada = ISNULL(CAST(sh.jornada AS varchar(10)), 'DESCANSO')
FROM shift_cmr..fnRangoFechas('01-sep-2021', '31-dec-2021') dates
LEFT JOIN shift_cmr..trabajadores_turnos_planificados sh
ON sh.fecha = dates.fecha
This only gives you one blank row per date. If you need a blank row for every id_trabajador then you need to cross join that
SELECT
dates.fecha,
t.id_trabajador,
sh.inicio,
sh.termino,
jornada = ISNULL(CAST(sh.jornada AS varchar(10)), 'DESCANSO')
FROM shift_cmr..fnRangoFechas('01-sep-2021', '31-dec-2021') dates
CROSS JOIN shift_cmr..trabajadores t -- guessing the table name
LEFT JOIN shift_cmr..trabajadores_turnos_planificados sh
ON sh.fecha = dates.fecha AND t.id_trabajador = sh.id_trabajador

SQL Query to pull date range based on dates in a table

I have some SQL code which is contained in an SSRS report and when run pulls a list of student detentions for a set period such as a week or month but I have been asked to get the report to run automatically from the start of the current school term to the date the report has been run. Is this possible? We have 3 terms per year and the dates change each year. The report has multiple subscriptions which will run weekly and filter to students in particular day houses and years so we ideally need the report to update itself.
We have a table in our database titled TblSchoolManagementTermDates which includes txtStartDate and txtFinishDate columns for each term.
The date of the detention is stored in the column detPpl.dDetentionDate
The full SQL code I am currently using is:
SELECT ppl.txtSchoolID AS SchoolID,
detPpl.TblDisciplineManagerDetentionsPupilsID AS DetentionID,
ppl.txtSurname AS Surname,
ppl.txtForename AS Forename,
ppl.txtPrename AS PreferredName,
ppl.intNCYear AS Year,
ppl.txtAcademicHouse AS House,
schTermDates.intSchoolYear AS AcademicYear,
schTerms.txtName AS TermName,
CAST(schTermDates.intSchoolYear AS CHAR(4)) + '/' +
RIGHT(CAST(schTermDates.intSchoolYear + 1 AS CHAR(4)), 2) AS AcademicYearName,
detPpl.dDetentionDate AS DetentionDate,
detSessions.txtSessionName AS DetentionName,
detPpl.txtOffenceDescription AS OffenceDescription,
LEFT(Staff.Firstname, 1) + '. ' + Staff.Surname AS PutInBy,
detPpl.intPresent AS AttendedDetention
FROM dbo.TblPupilManagementPupils AS ppl
INNER JOIN
dbo.TblDisciplineManagerDetentionsPupils AS detPpl
ON detPpl.txtSchoolID = ppl.txtSchoolID
INNER JOIN
dbo.TblDisciplineManagerDetentionsSessions AS detSessions
ON detPpl.intDetentionSessionID = detSessions.TblDisciplineManagerDetentionsSessionsID
INNER JOIN
dbo.TblStaff AS Staff
ON Staff.User_Code = detPpl.txtSubmittedBy
INNER JOIN
dbo.TblSchoolManagementTermDates AS schTermDates
ON detPpl.dDetentionDate BETWEEN schTermDates.txtStartDate AND schTermDates.txtFinishDate
INNER JOIN
dbo.TblSchoolManagementTermNames AS schTerms
ON schTermDates.intTerm = schTerms.TblSchoolManagementTermNamesID
LEFT OUTER JOIN
dbo.TblDisciplineManagerDetentionsCancellations AS Cancelled
ON Cancelled.intSessionID = detPpl.intDetentionSessionID
AND Cancelled.dDetDate = detPpl.dDetentionDate
WHERE (ppl.txtAcademicHouse = 'Challoner') AND (Cancelled.TblDisciplineManagerDetentionsCancellationsID IS NULL) AND (CAST(detPpl.dDetentionDate AS DATE) >= CAST (GETDATE()-28 AS DATE))
ORDER BY ppl.txtSurname, ppl.txtForename, detPpl.dDetentionDate
What you need is to assign a couple of parameters to this code.
lets call the parameters
#term_start
and
#term_end
In your where clause you simply need to remove this piece
AND (CAST(detPpl.dDetentionDate AS DATE) >= CAST (GETDATE()-28 AS DATE))
and add this piece in
AND (CAST(detPpl.dDetentionDate AS DATE) between #term_start and #term_end
Now create another dataset based on your term dates - lets call the dataset term_dates
something like this (I'm making up these fields as I don't know what columns are available or have no sample data) Use the idea below to adapt to your requirements
select
min(term_start_date) as start_date
,max(term_end_date) as end_date
from TblSchoolManagementTermNames
where convert(date,getdate()) between term_start_date and term_end_date
Now your report should have 2 parameters.. You simply need to set the default value for the parameters.
Set the default value for #term_start as the start_date and #term_end as the end_date from your term_dates dataset
Run your report.. You should have the data between the term dates.
This should work.. unless I've misunderstood the requirement

Can someone help me with this join

I need it to give me me a total of 0 for week 33 - 39, but I'm really bad with joining 3 tables and I cant figure it out
Right now it only gives me an answer for dates that there are actual records in the tracker_weld_table.
SELECT SUM(tracker_parts_archive.weight),
WEEK(mycal.dt) as week
FROM
tracker_parts_archive, tracker_weld_archive
RIGHT JOIN
(SELECT dt FROM calendar_table WHERE dt >= '2018-7-1' AND dt <= '2018-10-1') as mycal
ON
weld_worker = '133'AND date(weld_dateandtime) = mycal.dt
WHERE
tracker_weld_archive.tracker_partsID = tracker_parts_archive.id
GROUP BY week
I think you are trying for something like this:
SELECT WEEK(c.dt) as week, COALESCE(SUM(tpa.weight), 0)
FROM calendar_table c left join
tracker_weld_archive tw
on date(tw.weld_dateandtime) = c.dt left join
tracker_parts_archive tp
on tw.tracker_partsID = tp.id and tp.weld_worker = 133
WHERE c.dt >= '2018-07-01' AND c.dt <= '2018-10-01'
GROUP BY week
ORDER BY week;
Notes:
You want to keep all (matching) rows in the calendar table, so it should be first.
All subsequent joins should be LEFT JOINs.
Never use commas in the FROM clause. Always use proper, explicit, standard JOIN syntax.
Write out the full proper date constant -- YYYY-MM-DD. This is an ISO-standard format.
I am guessing that weld_worker is a number, so single quotes are not needed for the comparison.
First, lets start with understanding what you want.. You want totals per week. This means there will be a "GROUP BY" clause (also for any MIN(), MAX(), AVG(), SUM(), COUNT(), etc. aggregates). What is the group BY basis. In this scenario, you want per week. Leading to the next part that you want for a specific date range qualified per your calendar table.
I would start in order what WHAT filtering criteria first. Also, ALWAYS TRY to identify all table( or alias).column in your queries so anyone after you knows where the columns are coming from, especially when multiple tables. In this case "ct" is the ALIAS for "Calendar_Table"
SELECT
ct.dt
from
calendar_table ct
where
ct.dt >= '2018-07-01'
AND ct.dt <= '2018-10-01'
Now, the above date looks to be INCLUSIVE of October 1 and looks like you are trying to generate a quarterly sum from July, Aug, Sept. I would change to LESS than Oct 1.
Now, your calendar has many days and you want it grouped by week, so the WEEK() function gets you that distinct reference without explicitly checking every date. Also, try NOT to use reserved keywords as final column names... makes for confusion later on sometimes.
I have aliased the column name as "WeekBasis". Here, I did a COUNT(*) just to show the total days and the group by showing it in context.
SELECT
WEEK( ct.dt ) WeekBasis,
MIN( ct.dt ) as FirstDayOfThisWeek,
MAX( ct.dt ) as LastDayOfThisWeek,
COUNT(*) as DaysInThisWeek
from
calendar_table ct
where
ct.dt >= '2018-07-01'
AND ct.dt <= '2018-10-01'
group by
WEEK( ct.dt )
So, at this point, we have 1 record per week within the date period you are concerned,
but I also grabbed the earliest and latest dates just to show other components too.
Now, lets get back to your extra tables. We know the dates in question, now need to
get the details from the other tables (which is lacking in the post. You should post
critical components such as how tables are related via common / joined column basis.
How is tracker_part_archive related to tracker_weld_archive??
To simplify your query, you dont even NEED your calendar table as the welding
table HAS a date field and you know your range. Just query against that directly.
IF your worker's ID is numeric, don't add quotes around it, just leave as a number.
SELECT
WEEK( twa.Weld_DateAndTime ) WeekBasis,
COUNT(*) WeldingEntriesDone,
SUM(tpa.weight) TotalWeight
from
tracker_weld_archive twa
JOIN tracker_parts_archive tpa
-- GUESSING on therelationship here.
-- may also be on a given date too???
-- all pieces welded by a person on a given date
ON twa.weld_worker = tpa.weld_worker
AND twa.Weld_DateAndTime = tpa.Weld_DateAndTime
where
twa.Weld_Worker = 133
AND twa.Weld_DateAndTime >= '2018-07-01'
AND twa.Weld_DateAndTime <= '2018-10-01'
group by
WEEK( twa.Weld_DateAndTime )
IF you provide the table structures AND sample data, this can be refined a bit more for you.

SQL WITH AS statements in Ecto Subquery

I have an SQL query that using the PostgreSQL WITH AS to act as an XOR or "Not" Left Join. The goal is to return what is in unique between the two queries.
In this instance, I want to know what users have transactions within a certain time period AND do not have transactions in another time period. The SQL Query does this by using WITH to select all the transactions for a certain date range in new_transactions, then select all transactions for another date range in older_transactions. From those, we will select from new_transactions what is NOT in older_transactions.
My Query in SQL is :
/* New Customers */
WITH new_transactions AS (
select * from transactions
where merchant_id = 1 and inserted_at > date '2017-11-01'
), older_transactions AS (
select * from transactions
where merchant_id = 1 and inserted_at < date '2017-11-01'
)
SELECT * from new_transactions
WHERE user_id NOT IN (select user_id from older_transactions);
I'm trying to replicate this in Ecto via Subquery. I know I can't do a subquery in the where: statement, which leaves me with a left_join. How do I replicate that in Elixir/Ecto?
What I've replicated in Elixir/Ecto throws an (Protocol.UndefinedError) protocol Ecto.Queryable not implemented for [%Transaction....
Elixir/Ecto Code:
def new_merchant_transactions_query(merchant_id, date) do
from t in MyRewards.Transaction,
where: t.merchant_id == ^merchant_id and fragment("?::date", t.inserted_at) >= ^date
end
def older_merchant_transactions_query(merchant_id, date) do
from t in MyRewards.Transaction,
where: t.merchant_id == ^merchant_id and fragment("?::date", t.inserted_at) <= ^date
end
def new_customers(merchant_id, date) do
from t in subquery(new_merchant_transactions_query(merchant_id, date)),
left_join: ot in subquery(older_merchant_transactions_query(merchant_id, date)),
on: t.user_id == ot.user_id,
where: t.user_id != ot.user_id,
select: t.id
end
Update:
I tried changing it to where: is_nil(ot.user_id), but get the same error.
This maybe should be a comment instead of an answer, but it's too long and needs too much formatting so I went ahead and posted this as an answer. With that out of the way, here we go.
What I would do is re-write the query to avoid the Common Table Expression (or CTE; this is what a WITH AS is really called) and the IN() expression, and instead I'd do an actual JOIN, like this:
SELECT n.*
FROM transactions n
LEFT JOIN transactions o ON o.user_id = n.user_id and o.merchant_id = 1 and o.inserted_at < date '2017-11-01'
WHERE n.merchant_id = 1 and n.inserted_at > date '2017-11-01'
AND o.inserted_at IS NULL
You might also choose to do a NOT EXISTS(), which on Sql Server at least will often produce a better execution plan.
This is probably a better way to handle the query anyway, but once you do that you may also find this solves your problem by making it much easier to translate to ecto.

SQL: Average value per day

I have a database called ‘tweets’. The database 'tweets' includes (amongst others) the rows 'tweet_id', 'created at' (dd/mm/yyyy hh/mm/ss), ‘classified’ and 'processed text'. Within the ‘processed text’ row there are certain strings such as {TICKER|IBM}', to which I will refer as ticker-strings.
My target is to get the average value of ‘classified’ per ticker-string per day. The row ‘classified’ includes the numerical values -1, 0 and 1.
At this moment, I have a working SQL query for the average value of ‘classified’ for one ticker-string per day. See the script below.
SELECT Date( `created_at` ) , AVG( `classified` ) AS Classified
FROM `tweets`
WHERE `processed_text` LIKE '%{TICKER|IBM}%'
GROUP BY Date( `created_at` )
There are however two problems with this script:
It does not include days on which there were zero ‘processed_text’s like {TICKER|IBM}. I would however like it to spit out the value zero in this case.
I have 100+ different ticker-strings and would thus like to have a script which can process multiple strings at the same time. I can also do them manually, one by one, but this would cost me a terrible lot of time.
When I had a similar question for counting the ‘tweet_id’s per ticker-string, somebody else suggested using the following:
SELECT d.date, coalesce(IBM, 0) as IBM, coalesce(GOOG, 0) as GOOG,
coalesce(BAC, 0) AS BAC
FROM dates d LEFT JOIN
(SELECT DATE(created_at) AS date,
COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|IBM}%' then tweet_id
END) as IBM,
COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|GOOG}%' then tweet_id
END) as GOOG,
COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|BAC}%' then tweet_id
END) as BAC
FROM tweets
GROUP BY date
) t
ON d.date = t.date;
This script worked perfectly for counting the tweet_ids per ticker-string. As I however stated, I am not looking to find the average classified scores per ticker-string. My question is therefore: Could someone show me how to adjust this script in such a way that I can calculate the average classified scores per ticker-string per day?
SELECT d.date, t.ticker, COALESCE(COUNT(DISTINCT tweet_id), 0) AS tweets
FROM dates d
LEFT JOIN
(SELECT DATE(created_at) AS date,
SUBSTR(processed_text,
LOCATE('{TICKER|', processed_text) + 8,
LOCATE('}', processed_text, LOCATE('{TICKER|', processed_text))
- LOCATE('{TICKER|', processed_text) - 8)) t
ON d.date = t.date
GROUP BY d.date, t.ticker
This will put each ticker on its own row, not a column. If you want them moved to columns, you have to pivot the result. How you do this depends on the DBMS. Some have built-in features for creating pivot tables. Others (e.g. MySQL) do not and you have to write tricky code to do it; if you know all the possible values ahead of time, it's not too hard, but if they can change you have to write dynamic SQL in a stored procedure.
See MySQL pivot table for how to do it in MySQL.