SQL question, query is not updating account_id's fields: income, customerid, customergroup? - sql

I am executing this query through a Databricks notebook to join data from a stage table to a target table on the shared join keys account_id and stmt_end_dt. The stage table has 2 billion rows and the target table has 3 billion rows.
Here is the main query:
"UPDATE TARGET_TBL SET INCOME = S.INCOME, CUSTOMERGROUPID = S.CUSTOMERGROUPID, CUSTOMERID = S.CUSTOMERID
FROM STAGE_TBL AS S
WHERE CAST(S.ACCT_ID AS NUMBER(18,0)) = TARGET_TBL.ACCT_ID
AND CAST(S.STMT_END_DT AS DATE) = TARGET_TBL.STMT_END_DT"
What I want to do is copy the "income", "customerid", and "customergroup" values from the stage table into the target table rows that match on "account_id" and "stmt_end_dt". The target table already has fields for "income", "customerid", and "customergroup" (this is fine because they were created through an earlier query).
After my query has run and I open the target table, I see rows where account_id is blank while "income", "customerid", and "customergroup" all have data filled in. When I run SELECT * FROM TARGET_TBL WHERE INCOME IS NOT NULL; I get back 80000 rows, which seems low considering the stage table has 2 billion rows. In those rows "income", "customerid", and "customergroup" are all populated, but account_id is NULL.
It is as if this data is just being appended as new rows instead of updating each account_id's fields with the matching data. This is how I imagine it should look:
account_id | income | customerid | customergroupid
4321 | 60000 | 6345 | 3
5432 | 55000 | 4345 | 5
But instead it looks like this:
account_id | income | customerid | customergroupid
| 60000 | 6345 | 3
| 55000 | 4345 | 5
Or when I run SELECT * FROM TARGET_TBL WHERE INCOME IS NOT NULL:
account_id | income | customerid | customergroupid
NULL | 60000 | 6345 | 3
NULL | 55000 | 4345 | 5
And if I simply open the target table it looks like this:
account_id | income | customerid | customergroupid
4321 | | |
5432 | | |
In the rows returned by that query, every field other than the last 3 shown is also NULL.
Perhaps the data types coming from the stage table aren't compatible with the target table?
What could be causing this strange behavior?

You can't compare values with NULL: if a field is NULL there is nothing to compare, so an equality test against it never matches. I believe this is your problem.
If you have NULL fields and you want to compare them, you can usually use "IS NULL" or "NVL". Look up the syntax for these; it is very helpful.
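To make the point about NULL keys concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the Databricks tables. The table layout and the sample rows are assumptions made up for illustration, and the explicit IS NULL test is just one portable way to write a NULL-tolerant match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical miniature versions of the target and stage tables.
    CREATE TABLE target_tbl (acct_id INTEGER, stmt_end_dt TEXT, income INTEGER);
    CREATE TABLE stage_tbl  (acct_id INTEGER, stmt_end_dt TEXT, income INTEGER);
    INSERT INTO target_tbl VALUES (4321, '2020-01-31', NULL), (NULL, '2020-01-31', NULL);
    INSERT INTO stage_tbl  VALUES (4321, '2020-01-31', 60000), (NULL, '2020-01-31', 55000);
""")

# Plain equality: NULL = NULL is not true, so the NULL-keyed rows never pair up.
eq_matches = conn.execute("""
    SELECT COUNT(*)
      FROM target_tbl t
      JOIN stage_tbl s
        ON s.acct_id = t.acct_id AND s.stmt_end_dt = t.stmt_end_dt
""").fetchone()[0]

# Explicit NULL handling: treat two NULL keys as a match.
null_safe_matches = conn.execute("""
    SELECT COUNT(*)
      FROM target_tbl t
      JOIN stage_tbl s
        ON (s.acct_id = t.acct_id OR (s.acct_id IS NULL AND t.acct_id IS NULL))
       AND s.stmt_end_dt = t.stmt_end_dt
""").fetchone()[0]

# Diagnostic: how many stage rows have no usable key at all?
null_keys = conn.execute(
    "SELECT COUNT(*) FROM stage_tbl WHERE acct_id IS NULL"
).fetchone()[0]

print(eq_matches, null_safe_matches, null_keys)
```

Running the same kind of count against the real stage table (rows where the cast key is NULL) may show how many rows can never match under a plain equality join.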

Related

Get the row with latest start date from multiple tables using sub select

I have data from 3 tables, copied below. I am not using joins to get the data because I don't know how to use joins across multiple tables. When more than 2 rows are returned for a particular user, I need to update the rows with the OLD start dates (eff_start_ts) to sysdate in one of the tables.
subscription_id |Client_id
----------------------------
20685413 |37455837
reward_account_id|subscription_id |CURRENCY_BAL_AMT |CREATE_TS |
----------------------------------------------------------------------
439111697 | 20685413 | -40 |1-09-10 |
REWARD_ACCT_DETAIL_ID|REWARD_ACCOUNT_ID |EFF_START_TS |EFF_STOP_TS |
----------------------------------------------------------------------
230900968 | 439111697 | 14-06-11 | 15-01-19
47193932 | 439111697 | 19-02-14 | 19-12-21
243642632 | 439111697 | 18-03-23 | 99-12-31
247192972 | 439111697 | 17-11-01 | 17-11-01
The SQL should update the EFF_STOP_TS of the last table for every row except the second one (47193932), because that row has the latest EFF_START_TS.
The expected result is that the EFF_STOP_TS column of 230900968, 243642632 and 247192972 is updated to sysdate.
As I understand it, you need to do the update per REWARD_ACCOUNT_ID, so you can try the code below:
UPDATE REWARD_ACCT_DETAIL RAD
SET EFF_STOP_TS = SYSDATE
WHERE EFF_START_TS NOT IN (SELECT MAX(EFF_START_TS)
FROM REWARD_ACCT_DETAIL RAD1
WHERE RAD.REWARD_ACCOUNT_ID = RAD1.REWARD_ACCOUNT_ID)
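The correlated NOT IN update above can be tried out on the sample rows with a small sketch like the following. It uses Python's sqlite3 module rather than Oracle, reads the two-digit dates as YY-MM-DD and rewrites them in ISO form (an assumption, including the century), and substitutes a fixed hypothetical date for SYSDATE so the outcome is repeatable.

```python
import sqlite3

STOP_DATE = "2024-01-01"  # hypothetical stand-in for SYSDATE

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reward_acct_detail (
        reward_acct_detail_id INTEGER PRIMARY KEY,
        reward_account_id     INTEGER,
        eff_start_ts          TEXT,  -- ISO dates so MAX() orders them correctly
        eff_stop_ts           TEXT
    );
    -- Sample rows from the question, dates assumed to be YY-MM-DD.
    INSERT INTO reward_acct_detail VALUES
        (230900968, 439111697, '2014-06-11', '2015-01-19'),
        (47193932,  439111697, '2019-02-14', '2019-12-21'),
        (243642632, 439111697, '2018-03-23', '2099-12-31'),
        (247192972, 439111697, '2017-11-01', '2017-11-01');
""")

# Close every row except the one with the latest start date per account.
conn.execute("""
    UPDATE reward_acct_detail
       SET eff_stop_ts = ?
     WHERE eff_start_ts NOT IN (
           SELECT MAX(d.eff_start_ts)
             FROM reward_acct_detail AS d
            WHERE d.reward_account_id = reward_acct_detail.reward_account_id)
""", (STOP_DATE,))

stops = dict(conn.execute(
    "SELECT reward_acct_detail_id, eff_stop_ts FROM reward_acct_detail"
))
print(stops)
```

Only row 47193932 keeps its original stop date here; the other three receive the stand-in date.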

MS ACCESS - cannot subtract returned values from a query from values in another table

I have two tables: Project and Invoice
Project Table:
ID | UR_No | Budget_Total | Budget_To_Date
1 | 329000 | 150000.00 |
2 | 403952-C | 33000 |
Invoice Table:
ID | URID | InvAmount
1 | 329000 | 157.00
2 | 329000 | 32.00
3 | 403952-C| 193.00
Invoice table has amounts charged to a project. A project has a unique UR number (UR_No) and invoices have duplicate UR numbers (URID), meaning the same project gets billed monthly and has different invoice numbers.
What I would like to achieve is:
ID | UR_No | Budget_Total | Budget_To_Date
1 | 329000 | 150000.00 | 149811.00
2 | 403952-C | 33000 | 32807
First, an aggregate query is done on the Invoice table to get the running total of money charged to the project:
SELECT Invoice.URID, Sum(Invoice.InvAmount) AS total
FROM Invoice
GROUP BY Invoice.URID;
This returns the following:
URID | total
329000 | 189.00
403952-C| 193.00
This is then exported to a table in the DB named Invoice_Totals
I then want to join the Invoice_Totals table to the Project table using UR_No & URID and calculate an empty existing field "Budget_to_Date" in the Project table by subtracting Invoice_Totals.total in the query table from a field named Budget_total in the project table. Before attempting that, I would just like the query to return the values:
SELECT Project.Budget_Total - Invoice_Totals.total
FROM Project INNER JOIN Invoice_Totals ON Project.UR_No = Invoice_Totals.URID;
This returns the error:
Cannot join on Memo, OLE, or hyperlink object (Project.UR_No=Invoice_Totals.URID)
I looked up an SO post and tried using left 255:
SELECT Project.Budget_Total - Invoice_Totals.total
FROM Project INNER JOIN Invoice_Totals ON left(Project.UR_No,255) = left(Invoice_Totals.URID, 255);
This returns nothing. How can I subtract the aggregate field from Budget_Total in the Project table, storing the result either in the Budget_To_Date field or in a new field?
Your comment states that the linking fields are LongText, which is synonymous with the Memo data type, so the error message clearly identifies the cause. Change the field type to ShortText.
However, you really should use the ID field in the Project table as the primary key and save that, instead of UR_No, in the Invoice table. Numbers are more efficient keys.
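Once the linking fields are a joinable text type, the subtraction itself can be sketched as below. This uses Python's sqlite3 module with the sample rows from the question, so it is an illustration of the logic rather than Access syntax; the correlated subquery in place of the exported Invoice_Totals table is a choice made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Key columns are short text here, mirroring the ShortText suggestion.
    CREATE TABLE Project (ID INTEGER PRIMARY KEY, UR_No TEXT,
                          Budget_Total REAL, Budget_To_Date REAL);
    CREATE TABLE Invoice (ID INTEGER PRIMARY KEY, URID TEXT, InvAmount REAL);
    INSERT INTO Project (ID, UR_No, Budget_Total) VALUES
        (1, '329000', 150000.00), (2, '403952-C', 33000);
    INSERT INTO Invoice VALUES
        (1, '329000', 157.00), (2, '329000', 32.00), (3, '403952-C', 193.00);
""")

# Budget_To_Date = Budget_Total minus the sum of that project's invoices.
# COALESCE keeps the full budget for projects with no invoices yet.
conn.execute("""
    UPDATE Project
       SET Budget_To_Date = Budget_Total - COALESCE(
           (SELECT SUM(i.InvAmount) FROM Invoice AS i
             WHERE i.URID = Project.UR_No), 0)
""")

remaining = dict(conn.execute("SELECT UR_No, Budget_To_Date FROM Project"))
print(remaining)
```

The two values match the expected Budget_To_Date figures listed in the question.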

Access query combine two tables with criteria

The code below references two tables. The tables are identical in structure; the only difference is the "PRICE" and "PRICE_DATE" values, because one of them is the same table as it was created one year ago. All I want to do is create a new table that takes the latest price for each fund from each table. In addition, I also want another column that calculates the growth.
The code below works for this purpose.
SELECT [2015_11_Fund_Prices].FUND_CODE, [2015_11_Fund_Prices].PRICE AS [PRICE_#_112015], [2016_11_Fund_Prices].PRICE AS [PRICE_#_112016],
([2016_11_Fund_Prices].[PRICE]/[2015_11_Fund_Prices].[PRICE]-1) AS Growth INTO 2016_11_Monthly_Fund_Prices
FROM 2016_11_Fund_Prices INNER JOIN 2015_11_Fund_Prices ON [2016_11_Fund_Prices].FUND_CODE = [2015_11_Fund_Prices].FUND_CODE
GROUP BY [2015_11_Fund_Prices].FUND_CODE, [2015_11_Fund_Prices].PRICE_DATE, [2015_11_Fund_Prices].PRICE, [2016_11_Fund_Prices].PRICE, [2016_11_Fund_Prices].PRICE_DATE, ([2016_11_Fund_Prices].[PRICE]/[2015_11_Fund_Prices].[PRICE]-1)
HAVING ((([2015_11_Fund_Prices].PRICE_DATE)=#24/11/2015#) AND (([2016_11_Fund_Prices].PRICE_DATE)=#24/11/2016#));
However, this code assumes that the latest price is 24/11 in both tables. I want to replace this with a max function that will result in the query referencing only the price in the row with the highest date value.
Can anyone help?
The tables used are:
+-----------+------------+-------+
| Fund_Code | PRICE_DATE | PRICE |
+-----------+------------+-------+
| 1 | 12/12/12 | 1 |
| 1 | 13/12/12 | 1.2 |
| 1 | 14/12/12 | 1.1 |
| 2 | 12/12/12 | 1.12 |
| 2 | 13/12/12 | 1.13 |
| 2 | 14/12/12 | 1.11 |
The second table is exactly the same, but with dates from the following year.
All I want is a table with:
Fund_Code Price1 Price2 Growth
Thanks
You need a sub-query like this:
SELECT FUND_CODE, MAX(PRICE_DATE) AS MaxPriceDate FROM 2016_11_Fund_Prices GROUP BY FUND_CODE
If you add this sub-query to the above and link it to the 2016_11_Fund_Prices table on FUND_CODE and PRICE_DATE=MaxPriceDate it should do what you need.
SELECT 2016_11_Fund_Prices.FUND_CODE, PRICE, PRICE_DATE
FROM 2016_11_Fund_Prices
INNER JOIN (SELECT FUND_CODE, MAX(PRICE_DATE) AS MaxPriceDate FROM 2016_11_Fund_Prices GROUP BY FUND_CODE) mp
ON 2016_11_Fund_Prices.FUND_CODE=mp.FUND_CODE AND 2016_11_Fund_Prices.PRICE_DATE=mp.MaxPriceDate
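Applying the same MAX(PRICE_DATE) sub-query to both tables and joining the two results gives the Fund_Code, Price1, Price2, Growth layout asked for. The sketch below uses Python's sqlite3 module, not Access. The table names prices_old and prices_new, the ISO date format, and all rows of the second table are hypothetical; the first table's rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prices_old (fund_code INTEGER, price_date TEXT, price REAL);
    CREATE TABLE prices_new (fund_code INTEGER, price_date TEXT, price REAL);
    -- Rows from the question, with dates rewritten as ISO text.
    INSERT INTO prices_old VALUES
        (1, '2012-12-12', 1.0),  (1, '2012-12-13', 1.2),  (1, '2012-12-14', 1.1),
        (2, '2012-12-12', 1.12), (2, '2012-12-13', 1.13), (2, '2012-12-14', 1.11);
    -- Hypothetical rows for the following year.
    INSERT INTO prices_new VALUES
        (1, '2013-12-12', 1.5), (1, '2013-12-13', 1.4),  (1, '2013-12-14', 1.32),
        (2, '2013-12-12', 1.2), (2, '2013-12-13', 1.25), (2, '2013-12-14', 1.221);
""")

def latest_prices(table):
    """Return a sub-query picking each fund's row with the highest date.

    `table` is one of the two fixed names above, never user input.
    """
    return f"""
        (SELECT p.fund_code, p.price
           FROM {table} AS p
           JOIN (SELECT fund_code, MAX(price_date) AS max_date
                   FROM {table} GROUP BY fund_code) AS mp
             ON p.fund_code = mp.fund_code AND p.price_date = mp.max_date)
    """

rows = conn.execute(f"""
    SELECT o.fund_code, o.price AS price1, n.price AS price2,
           n.price / o.price - 1 AS growth
      FROM {latest_prices('prices_old')} AS o
      JOIN {latest_prices('prices_new')} AS n ON n.fund_code = o.fund_code
     ORDER BY o.fund_code
""").fetchall()

for row in rows:
    print(row)
```

Note that the latest row is not the highest-priced row in either table, so the result depends on the date and not on the price.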

Select rows with same value in one column but different value in another column

I've been trying to build this query but am new to SQL so I'd really appreciate some help.
In the table example below, I have a Customer Code, a linked Customer Code (which is used to link a child customer to a parent customer), a salesperson, and other irrelevant columns. The goal is to have one Salesperson for each parent customer and its children. So in the example, CustCode #100 is the parent of itself, #200, #500, and #800. All of these accounts have the same Salesperson (JASON), which is perfect. CustCode #300 is the parent of itself, #400, and #600, but there isn't one salesperson assigned: it's both JIM and SUZY. I want to build a query that shows all accounts like the ones in this example, that is, accounts where the Salesperson field isn't the same value for the parent and all of its child customers.
I tried a WHERE clause with Salesperson <> Salesperson, but it's not showing the right results.
+-----------+-----------------+------------+----------------------+
| CustCode | Linked CustCode | Salesperson| additional columns...|
+-----------+-----------------+------------+----------------------+
| 100 | 100 | JASON | ... |
| 200 | 100 | JASON | ... |
| 300 | 300 | JIM | ... |
| 400 | 300 | JIM | ... |
| 500 | 100 | JASON | ... |
| 600 | 300 | SUZY | ... |
| 700 | NULL | JIM | ... |
| 800 | 100 | JASON | ... |
+-----------+-----------------+------------+----------------------+
Thanks so much for your help!
You can do a self join on the table:
select distinct r2.*
from table r1
join table r2
on r1.linkedcustcode = r2.linkedcustcode and r1.salesperson <> r2.salesperson
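As a quick check of the self join against the sample data, here is a sketch using Python's sqlite3 module. The table name customers and the space-free column names are assumptions for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and column names; rows are from the question.
    CREATE TABLE customers (custcode INTEGER, linkedcustcode INTEGER, salesperson TEXT);
    INSERT INTO customers VALUES
        (100, 100, 'JASON'), (200, 100, 'JASON'), (300, 300, 'JIM'),
        (400, 300, 'JIM'),   (500, 100, 'JASON'), (600, 300, 'SUZY'),
        (700, NULL, 'JIM'),  (800, 100, 'JASON');
""")

# A row is returned when some other row in the same linked group
# has a different salesperson.
rows = conn.execute("""
    SELECT DISTINCT r2.custcode, r2.linkedcustcode, r2.salesperson
      FROM customers r1
      JOIN customers r2
        ON r1.linkedcustcode = r2.linkedcustcode
       AND r1.salesperson <> r2.salesperson
     ORDER BY r2.custcode
""").fetchall()

for row in rows:
    print(row)
```

Only the three accounts under parent 300 come back. Row 700 has a NULL linked code, so it never joins to anything.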
This solution uses a recursive CTE first to build the hierarchy and find the leading code for each row, even if a linked code points to a row which is pointing to an upper row itself.
The final query shows the count of different Salespersons:
DECLARE @tbl TABLE(CustCode INT,[Linked CustCode] INT,Salesperson VARCHAR(100));
INSERT INTO @tbl VALUES
(100,100,'JASON')
,(200,100,'JASON')
,(300,300,'JIM')
,(400,300,'JIM')
,(500,100,'JASON')
,(600,300,'SUZY')
,(700,NULL,'JIM')
,(800,100,'JASON');
--The query
WITH CleanUp AS
(
SELECT CustCode
,CASE WHEN [Linked CustCode]=CustCode THEN NULL ELSE [Linked CustCode] END AS [Linked CustCode]
,Salesperson
FROM @tbl
)
)
,recCTE AS
(
SELECT CustCode AS LeadingCode,CustCode,[Linked CustCode],Salesperson
FROM CleanUp
WHERE [Linked CustCode] IS NULL
UNION ALL
SELECT recCTE.LeadingCode,t.CustCode,t.[Linked CustCode],t.Salesperson
FROM recCTE
INNER JOIN CleanUp AS t ON t.[Linked CustCode]=recCTE.CustCode
)
SELECT LeadingCode,COUNT(DISTINCT Salesperson) AS CountSalesperson
FROM recCTE
GROUP BY LeadingCode
The result
LeadingCode CountSalesperson
100 1
300 2
700 1
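The recursive approach can also be reproduced outside SQL Server. The following is a sketch in Python's sqlite3 module, which spells the construct WITH RECURSIVE; the table name customers and the column names are assumptions, and the rows are the same sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (custcode INTEGER, linkedcustcode INTEGER, salesperson TEXT);
    INSERT INTO customers VALUES
        (100, 100, 'JASON'), (200, 100, 'JASON'), (300, 300, 'JIM'),
        (400, 300, 'JIM'),   (500, 100, 'JASON'), (600, 300, 'SUZY'),
        (700, NULL, 'JIM'),  (800, 100, 'JASON');
""")

rows = conn.execute("""
    WITH RECURSIVE cleanup AS (
        -- A row linked to itself is a root, so blank out its parent.
        SELECT custcode,
               CASE WHEN linkedcustcode = custcode THEN NULL
                    ELSE linkedcustcode END AS parent,
               salesperson
          FROM customers
    ),
    rec AS (
        -- Anchor: every root leads its own group.
        SELECT custcode AS leadingcode, custcode, salesperson
          FROM cleanup
         WHERE parent IS NULL
        UNION ALL
        -- Step: attach children to the group of their parent.
        SELECT rec.leadingcode, c.custcode, c.salesperson
          FROM rec
          JOIN cleanup AS c ON c.parent = rec.custcode
    )
    SELECT leadingcode, COUNT(DISTINCT salesperson) AS count_salesperson
      FROM rec
     GROUP BY leadingcode
     ORDER BY leadingcode
""").fetchall()

for row in rows:
    print(row)
```

Filtering on a count greater than 1 would leave only the groups with conflicting salespersons.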

SQL Query to return a distinct count of one column while allowing a full summation of a second column, grouped by a third

I'm writing a query in Access 2010 and I can't use COUNT(DISTINCT ...), so I'm running into a bit of trouble with the query below.
An example of my table is as follows
Provider | Member ID | Dollars | Status
FacilityA | 1001 | 50 | Pended
FacilityA | 1001 | 100 | Paid
FacilityA | 1002 | 200 | Paid
FacilityB | 1005 | 30 | Pended
FacilityB | 1009 | 90 | Pended
FacilityC | 1001 | 100 | Paid
FacilityC | 1008 | 500 | Paid
I want to return the total # of unique members that have visited each facility, but I also want to get the total dollar amount that is Pended, so for this example the ideal output would be
Provider | # members | Total Pended charges
FacilityA | 2 | 50
FacilityB | 2 | 120
FacilityC | 2 | 0
I tried using some code I found here: Count Distinct in a Group By aggregate function in Access 2007 SQL
and here:
SQL: Count distinct values from one column based on multiple criteria in other columns
Copying the code from the first link provided by gzaxx:
SELECT cd.DiagCode, Count(cd.CustomerID)
FROM (select distinct DiagCode, CustomerID from CustomerTable) as cd
Group By cd.DiagCode;
I can make this work for counting the members:
SELECT cd.Provider_Number, Count(cd.Member_ID)
FROM (select distinct Provider_Number, Member_ID from Claims_Table) as cd
Group By cd.Provider_Number;
However, no matter what I try I can't get a second portion dealing with the dollars to work without causing an error or messing up the calculation on the member count.
SELECT cd.Provider_Number,
-- claims_table.Member_ID, claims_table.Dollars
SUM(IIF ( Claims_Table.Status = 'Pended' , Claims_Table.Dollars , 0 )) as Dollars_Pending,
Count(cd.Member_ID) as Uniq_Members,
Sum(Dollars) as Dollar_Wrong
FROM (select distinct Provider_Number, Member_ID from Claims_Table) as cd inner join Claims_Table
ON claims_table.Provider_Number=cd.Provider_Number and claims_table.Member_ID = cd.Member_ID
Group By cd.Provider_Number;
This should work fine based only on the table you described (named Tabelle1):
SELECT Provider, count(MemberID) as [# Members],
NZ(SUM(SWITCH([Status]='Pended', Dollars)),0) as [Total pending charges]
FROM Tabelle1
GROUP BY Provider;
Explanation
I think the first and second column are self-explanatory.
The third column is where most things are done. The SWITCH([Status]='Pended', Dollars) returns the Dollars only if the status is pending. This then gets summed up by SUM. The NZ(..,0) will set the column to 0 if the SUM returns a NULL.
EDIT: This was tested on Access 2016
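One way to keep the distinct member count and the pended total from interfering with each other is to compute them in two separate sub-queries and join the results on the provider. The sketch below uses Python's sqlite3 module with the sample rows from the question; the table name claims and the column names are assumptions, and CASE stands in for Access's IIF or SWITCH. It also runs a plain COUNT over the raw rows for comparison, which counts member 1001 twice for FacilityA.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table name; rows are from the question.
    CREATE TABLE claims (provider TEXT, member_id INTEGER, dollars INTEGER, status TEXT);
    INSERT INTO claims VALUES
        ('FacilityA', 1001,  50, 'Pended'), ('FacilityA', 1001, 100, 'Paid'),
        ('FacilityA', 1002, 200, 'Paid'),   ('FacilityB', 1005,  30, 'Pended'),
        ('FacilityB', 1009,  90, 'Pended'), ('FacilityC', 1001, 100, 'Paid'),
        ('FacilityC', 1008, 500, 'Paid');
""")

rows = conn.execute("""
    SELECT m.provider, m.members, p.pended
      FROM (SELECT provider, COUNT(*) AS members
              -- De-duplicate first, then count: no COUNT(DISTINCT) needed.
              FROM (SELECT DISTINCT provider, member_id FROM claims) AS d
             GROUP BY provider) AS m
      JOIN (SELECT provider,
                   SUM(CASE WHEN status = 'Pended' THEN dollars ELSE 0 END) AS pended
              FROM claims
             GROUP BY provider) AS p
        ON p.provider = m.provider
     ORDER BY m.provider
""").fetchall()

# For comparison: counting raw rows instead of distinct members.
naive = dict(conn.execute(
    "SELECT provider, COUNT(member_id) FROM claims GROUP BY provider"
))

print(rows)
print(naive)
```

The joined result matches the ideal output in the question, while the raw count reports 3 for FacilityA.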