SQL WITH AS statements in Ecto Subquery - sql

I have an SQL query that using the PostgreSQL WITH AS to act as an XOR or "Not" Left Join. The goal is to return what is in unique between the two queries.
In this instance, I want to know what users have transactions within a certain time period AND do not have transactions in another time period. The SQL Query does this by using WITH to select all the transactions for a certain date range in new_transactions, then select all transactions for another date range in older_transactions. From those, we will select from new_transactions what is NOT in older_transactions.
My Query in SQL is :
/* New Customers */
WITH new_transactions AS (
select * from transactions
where merchant_id = 1 and inserted_at > date '2017-11-01'
), older_transactions AS (
select * from transactions
where merchant_id = 1 and inserted_at < date '2017-11-01'
)
SELECT * from new_transactions
WHERE user_id NOT IN (select user_id from older_transactions);
I'm trying to replicate this in Ecto via Subquery. I know I can't do a subquery in the where: statement, which leaves me with a left_join. How do I replicate that in Elixir/Ecto?
What I've replicated in Elixir/Ecto throws an (Protocol.UndefinedError) protocol Ecto.Queryable not implemented for [%Transaction....
Elixir/Ecto Code:
def new_merchant_transactions_query(merchant_id, date) do
from t in MyRewards.Transaction,
where: t.merchant_id == ^merchant_id and fragment("?::date", t.inserted_at) >= ^date
end
def older_merchant_transactions_query(merchant_id, date) do
from t in MyRewards.Transaction,
where: t.merchant_id == ^merchant_id and fragment("?::date", t.inserted_at) <= ^date
end
def new_customers(merchant_id, date) do
from t in subquery(new_merchant_transactions_query(merchant_id, date)),
left_join: ot in subquery(older_merchant_transactions_query(merchant_id, date)),
on: t.user_id == ot.user_id,
where: t.user_id != ot.user_id,
select: t.id
end
Update:
I tried changing it to where: is_nil(ot.user_id), but get the same error.

This maybe should be a comment instead of an answer, but it's too long and needs too much formatting so I went ahead and posted this as an answer. With that out of the way, here we go.
What I would do is re-write the query to avoid the Common Table Expression (or CTE; this is what a WITH AS is really called) and the IN() expression, and instead I'd do an actual JOIN, like this:
SELECT n.*
FROM transactions n
LEFT JOIN transactions o ON o.user_id = n.user_id and o.merchant_id = 1 and o.inserted_at < date '2017-11-01'
WHERE n.merchant_id = 1 and n.inserted_at > date '2017-11-01'
AND o.inserted_at IS NULL
You might also choose to do a NOT EXISTS(), which on Sql Server at least will often produce a better execution plan.
This is probably a better way to handle the query anyway, but once you do that you may also find this solves your problem by making it much easier to translate to ecto.

Related

Executing a Aggregate function within a case without Group by

I am trying to assign a specific code to a client based on the number of gifts that they have given in the past 6 months using a CASE. I am unable to use WITH (screenshot) due to the limitations of the software that I am creating the query in. It only allows for select functions. I am unsure how to get a distinct count from another table (transaction data) and use that as parameters in the CASE I have currently built (based on my client information table). Does anyone know of any workarounds for this? I am unable to GROUP BY clientID at the end of my query because not all of my columns are aggregate, and I only need to GROUP BY clientID for this particular WHEN statement in the CASE. I have looked into the OVER() clause, but I am needing my date range that I am evaluating to be dynamic (counting transactions over the last six months), and the amount of rows that I would be including is variable, as the transaction count month to month varies. Also, the software that I am building this in does not recognize the PARTITIONED BY parameter of the over clause.
Any help would be great!
EDIT:
it is not letting me attach an image... -____- I have added the two sections of code that I am looking for assistance with!
WITH "6MonthGIftCount" (
"ConstituentID"
,"GiftCount"
)
AS (
SELECT COUNT(DISTINCT "GiftView"."GiftID" FROM "GiftView" WHERE MONTHS_BETWEEN("GiftView"."GiftDate", getdate()) <= 6 GROUP BY "GiftView"."ConstituentID")
SELECT...CASE
WHEN "6MonthGiftCount"."GiftCount" >= 4
THEN 'A010'
)
Perform your grouping/COUNT(1) in a subquery to obtain the total # of donations by ConstituentID, then JOIN this total into your main query that uses this new column to perform its CASE statement.
select
hist.*,
case when timesDonated > 5 then 'gracious donor'
when timesDonated > 3 then 'repeated donor'
when timesDonated >= 1 then 'donor'
else null end as donorCode
from gifthistory hist
left join ( /* your grouping subquery here, pretending to be a new table */
select
personID,
count(1) as timesDonated
from gifthistory i
WHERE abs(months_between(giftDate, sysdate)) <= 6
group by personid ) grp on hist.personid = grp.personID
order by 1;
*Naturally, syntax changes will vary by DB; you didn't specify which it was based on, but you should be able to use this template with whichever you utilize. This works in both Oracle and SQL Server after tweaking the month calculation appropriately.

Can someone help me with this join

I need it to give me me a total of 0 for week 33 - 39, but I'm really bad with joining 3 tables and I cant figure it out
Right now it only gives me an answer for dates that there are actual records in the tracker_weld_table.
SELECT SUM(tracker_parts_archive.weight),
WEEK(mycal.dt) as week
FROM
tracker_parts_archive, tracker_weld_archive
RIGHT JOIN
(SELECT dt FROM calendar_table WHERE dt >= '2018-7-1' AND dt <= '2018-10-1') as mycal
ON
weld_worker = '133'AND date(weld_dateandtime) = mycal.dt
WHERE
tracker_weld_archive.tracker_partsID = tracker_parts_archive.id
GROUP BY week
I think you are trying for something like this:
SELECT WEEK(c.dt) as week, COALESCE(SUM(tpa.weight), 0)
FROM calendar_table c left join
tracker_weld_archive tw
on date(tw.weld_dateandtime) = c.dt left join
tracker_parts_archive tp
on tw.tracker_partsID = tp.id and tp.weld_worker = 133
WHERE c.dt >= '2018-07-01' AND c.dt <= '2018-10-01'
GROUP BY week
ORDER BY week;
Notes:
You want to keep all (matching) rows in the calendar table, so it should be first.
All subsequent joins should be LEFT JOINs.
Never use commas in the FROM clause. Always use proper, explicit, standard JOIN syntax.
Write out the full proper date constant -- YYYY-MM-DD. This is an ISO-standard format.
I am guessing that weld_worker is a number, so single quotes are not needed for the comparison.
First, lets start with understanding what you want.. You want totals per week. This means there will be a "GROUP BY" clause (also for any MIN(), MAX(), AVG(), SUM(), COUNT(), etc. aggregates). What is the group BY basis. In this scenario, you want per week. Leading to the next part that you want for a specific date range qualified per your calendar table.
I would start in order what WHAT filtering criteria first. Also, ALWAYS TRY to identify all table( or alias).column in your queries so anyone after you knows where the columns are coming from, especially when multiple tables. In this case "ct" is the ALIAS for "Calendar_Table"
SELECT
ct.dt
from
calendar_table ct
where
ct.dt >= '2018-07-01'
AND ct.dt <= '2018-10-01'
Now, the above date looks to be INCLUSIVE of October 1 and looks like you are trying to generate a quarterly sum from July, Aug, Sept. I would change to LESS than Oct 1.
Now, your calendar has many days and you want it grouped by week, so the WEEK() function gets you that distinct reference without explicitly checking every date. Also, try NOT to use reserved keywords as final column names... makes for confusion later on sometimes.
I have aliased the column name as "WeekBasis". Here, I did a COUNT(*) just to show the total days and the group by showing it in context.
SELECT
WEEK( ct.dt ) WeekBasis,
MIN( ct.dt ) as FirstDayOfThisWeek,
MAX( ct.dt ) as LastDayOfThisWeek,
COUNT(*) as DaysInThisWeek
from
calendar_table ct
where
ct.dt >= '2018-07-01'
AND ct.dt <= '2018-10-01'
group by
WEEK( ct.dt )
So, at this point, we have 1 record per week within the date period you are concerned,
but I also grabbed the earliest and latest dates just to show other components too.
Now, lets get back to your extra tables. We know the dates in question, now need to
get the details from the other tables (which is lacking in the post. You should post
critical components such as how tables are related via common / joined column basis.
How is tracker_part_archive related to tracker_weld_archive??
To simplify your query, you dont even NEED your calendar table as the welding
table HAS a date field and you know your range. Just query against that directly.
IF your worker's ID is numeric, don't add quotes around it, just leave as a number.
SELECT
WEEK( twa.Weld_DateAndTime ) WeekBasis,
COUNT(*) WeldingEntriesDone,
SUM(tpa.weight) TotalWeight
from
tracker_weld_archive twa
JOIN tracker_parts_archive tpa
-- GUESSING on therelationship here.
-- may also be on a given date too???
-- all pieces welded by a person on a given date
ON twa.weld_worker = tpa.weld_worker
AND twa.Weld_DateAndTime = tpa.Weld_DateAndTime
where
twa.Weld_Worker = 133
AND twa.Weld_DateAndTime >= '2018-07-01'
AND twa.Weld_DateAndTime <= '2018-10-01'
group by
WEEK( twa.Weld_DateAndTime )
IF you provide the table structures AND sample data, this can be refined a bit more for you.

SQL merging result sets on a unique column value

I have 2 similar queries which both work on the same table, and I essentially want to combine their results such that the second query supplies default values for what the first query doesn't return. I've simplified the problem as much as possible here. I'm using Oracle btw.
The table has account information in it for a number of accounts, and there are multiple entries for each account with a commit_date to tell when the account information was inserted. I need get the account info which was current for a certain date.
The queries take a list of account ids and a date.
Here is the query:
-- Select the row which was current for the accounts for the given date. (won't return anything for an account which didn't exist for the given date)
SELECT actr.*
FROM Account_Information actr
WHERE actr.account_id in (30000316, 30000350, 30000351)
AND actr.commit_date <= to_date( '2010-DEC-30','YYYY-MON-DD ')
AND actr.commit_date =
(
SELECT MAX(actrInner.commit_date)
FROM Account_Information actrInner
WHERE actrInner.account_id = actr.account_id
AND actrInner.commit_date <= to_date( '2010-DEC-30','YYYY-MON-DD ')
)
This looks a little ugly, but it returns a single row for each account which was current for the given date. The problem is that it doesn't return anything if the account didn't exist until after the given date.
Selecting the earliest account info for each account is trival - I don't need to supply a date for this one:
-- Select the earliest row for the accounts.
SELECT actr.*
FROM Account_Information actr
WHERE actr.account_id in (30000316, 30000350, 30000351)
AND actr.commit_date =
(
SELECT MAX(actrInner .commit_date)
FROM Account_Information actrInner
WHERE actrInner .account_id = actr.account_id
)
But I want to merge the result sets in such a way that:
For each account, if there is account info for it in the first result set - use that.
Otherwise, use the account info from the second result set.
I've researched all of the joins I can use without success. Unions almost do it but they will only merge for unique rows. I want to merge based on the account id in each row.
Sql Merging two result sets - my case is obviously more complicated than that
SQL to return a merged set of results - I might be able to adapt that technique? I'm a programmer being forced to write SQL and I can't quite follow that example well enough to see how I could modify it for what I need.
The standard way to do this is with a left outer join and coalesce. That is, your overall query will look like this:
SELECT ...
FROM defaultQuery
LEFT OUTER JOIN currentQuery ON ...
If you did a SELECT *, each row would correspond to the current account data plus your defaults. With me so far?
Now, instead of SELECT *, for each column you want to return, you do a COALESCE() on matched pairs of columns:
SELECT COALESCE(currentQuery.columnA, defaultQuery.columnA) ...
This will choose the current account data if present, otherwise it will choose the default data.
You can do this more directly using analytic functions:
select *
from (SELECT actr.*, max(commit_date) over (partition by account_id) as maxCommitDate,
max(case when commit_date <= to_date( '2010-DEC-30','YYYY-MON-DD ') then commit_date end) over
(partition by account_id) as MaxCommitDate2
FROM Account_Information actr
WHERE actr.account_id in (30000316, 30000350, 30000351)
) t
where (MaxCommitDate2 is not null and Commit_date = MaxCommitDate2) or
(MaxCommitDate2 is null and Commit_Date = MaxCommitDate)
The subquery calculates two values, the two possibilities of commit dates. The where clause then chooses the appropriate row, using the logic that you want.
I've combined the other answers. Tried it out at apex.oracle.com. Here's some explanation.
MAX(CASE WHEN commit_date <= to_date('2010-DEC-30', 'YYYY-MON-DD')) will give us the latest date not before Dec 30th, or NULL if there isn't one. Combining that with a COALESCE, we get
COALESCE(MAX(CASE WHEN commit_date <= to_date('2010-DEC-30', 'YYYY-MON-DD') THEN commit_date END), MAX(commit_date)).
Now we take the account id and commit date we have and join them with the original table to get all the other fields. Here's the whole query that I came up with:
SELECT *
FROM Account_Information
JOIN (SELECT account_id,
COALESCE(MAX(CASE WHEN commit_date <=
to_date('2010-DEC-30', 'YYYY-MON-DD')
THEN commit_date END),
MAX(commit_date)) AS commit_date
FROM Account_Information
WHERE account_id in (30000316, 30000350, 30000351)
GROUP BY account_id)
USING (account_id, commit_date);
Note that if you do use USING, you have to use * instead of acrt.*.

SQL Query - Pull User Achievement

First, I am sorry as I could not come up with better title for this question.
I have a badge/achievement system in my website, community users are rewarded specific badges according to their activity in the website, below sql example I use to pull the number of users who made at at least 100 forum posts (I am using informix db version 10)
SELECT tjm.userid::INTEGER AS user_id,
EXTEND(DBINFO("UTC_TO_DATETIME",tjm.creationdate/1000), year to fraction)
AS earned_date
FROM TABLE(
MULTISET(
SELECT jm.userid, jm.creationdate, (
SELECT COUNT(*) from TABLE(
MULTISET(
SELECT userid, creationdate
FROM jive:jivemessage
)
) AS i
WHERE i.userid = jm.userid AND i.creationdate < jm.creationdate
) + 1 AS row_num
FROM jive:jivemessage jm
)
) AS tjm
WHERE tjm.row_num=100
This sql takes around more than 30 minutes to execute, we have a very large community and there are millions of forum posts.
I would like to know if there is a solution to improve the query performance? I am trying to reduce the execution time because I have 40 sql queries similar to this one but for different tables and activities.
I don't now Informix DB, but the query below should do what you ask and it's ANSI SQL (except for the EXTEND part, which I copied from your original query).
SELECT
jm.userid
,EXTEND(DBINFO("UTC_TO_DATETIME",tjm.creationdate/1000), year to fraction) AS earned_date
FROM
(
-- This sub-query will return all Users who have 100 messages or more
SELECT
jm.userid
,count(jm.userid) as totalmessages
FROM
jive:jivemessage jm
GROUP BY
jm.userid
HAVING
count(jm.userid) >= 100) AS MessageCount
The above could probably be done without having to use a sub-query. The only reason why I used it is to have the DateEarned, as per original query, in the result set. Adding it to the sub-query would have required adding it to the GROUP BY, with unpredictable results if the query runs across two days (e.g. at 23:59:59).
Update 2012/08/14 - Rewritten query following new requirements
As I stated before, I don't know Informix at all, therefore the following query may or may not run.
SELECT
UsersWithBadge.userid
,MAX(UsersWithBadge.creationdate) as dateearned
FROM
(
SELECT FIRST 100
jm.userid
,jm.creationdate
FROM
jive:jivemessage jm
JOIN
(-- This sub-query will return all Users who have 100 messages or more
SELECT
jm.userid
,count(jm.userid) as totalmessages
FROM
jive:jivemessage jm
GROUP BY
jm.userid
HAVING
count(jm.userid) >= 100)
AS MessageCount ON
(MessageCount.userid = jm.userid)
) AS UsersWithBadge
GROUP BY
UsersWithBadge.userid

counting date and time for historical reporting

I am currently working on a query that will be used in junction with share-point to run reports. I have a query that I know will work with Oracle, but the company I am working for is running SQL Server 2005.
What the report will do is give the person the ability to select any date and time, and give the count for that specific operation. The problem is that there are large gaps in the time stamps (because it takes a little while for the product to get to the next operation). The date type is varchar, so i used substrings to parse out the year, month, day, and time. I have sample data available.
The people looking at the reports want the ability to say at this time and day how many units went through this operation.
I know this is is confusing, let me know if you need any clarification.
Here is the oracle syntax
SELECT T3.PAYMENT_DATE, T3."Hr", T3."Min",
(SELECT COUNT(*)
FROM INVOICE_ARCHIVE T4
WHERE TO_NUMBER(TO_CHAR(T4.PAYMENT_DATE, 'MM')) <= T3."Hr"
AND TO_NUMBER(TO_CHAR(T4.PAYMENT_DATE, 'DD')) <= T3."Min") AS "NUM"
FROM(SELECT T1.PAYMENT_DATE, T2."Hr", T2."Min"
FROM (SELECT (FLOOR((LEVEL + 359)/60)) AS "Hr",
MOD((LEVEL + 359), 60) AS "Min"
FROM dual CONNECT BY LEVEL <= 961) T2, INVOICE_ARCHIVE T1
ORDER BY T1.PAYMENT_DATE, T2."Hr", T2."Min") T3
The answer to your question is the datepart() function in SQL Server. This will allow you to extract minutes and hours from dates.
The harder part is the "connect by level" portion. How is this being used? You might need to use recursive CTEs to handle this.
With the little hint from spencer, the following may suffice for your query:
SELECT T3.PAYMENT_DATE, T3."Hr", T3."Min",
(SELECT COUNT(*)
FROM INVOICE_ARCHIVE T4
WHERE datepart(month, T4.PAYMENT_DATE) <= T3."Hr" AND
datepart(day, T4.PAYMENT_DATE, 'DD') <= T3."Min"
) AS "NUM"
FROM (SELECT T1.PAYMENT_DATE, T2."Hr", T2."Min"
FROM (SELECT top 961 (FLOOR((LEVEL + 359)/60)) AS "Hr",
MOD((LEVEL + 359), 60) AS "Min"
FROM (select top 961 row_number() over (order by (select NULL)) as level
from invoice_archive
) t
) T2 cross join
INVOICE_ARCHIVE T1
) T3
ORDER BY T3.PAYMENT_DATE, T3."Hr", T3."Min"
I made the following changes:
Changed the date arithmetic to use datepart() instead of to_char() .
Replaced the method for getting a list of numbers, by using row_number() instead of connect by level
Made the cross join explicit
Moved the order by to the outer query, since neither SQL Server nor Oracle guarantee the results of an order by in a subquery (and SQL Server does not allow it unless you have a "TOP" query)