Troubleshooting an inner join that's not finding expected matches - sql

Okay...I'm not a coder and my code has lots of steps that would make it difficult to post and get an outright answer. So I am looking for general steps you would follow if an inner join does not seem to be working correctly. Here is my general situation:
Problem with inner join:
I start with two tables that I am basically appending to each other; they share most fields, including "id". One table contains households that received an email, and the other contains households that did not receive an email ("controls"). So I append them into a single table. Keep in mind that they come from different sources, with different processes creating them.
Then I match the id against another table that contains only customers and get a custnum for some of those households that are indeed customers.
Next is to use the custnum variable to join to a sales table. At least some controls, and likely a greater number of the mailed households should be customers and have sales - the point of the email was to obviously bring about sales.
My problem is that NO control households are showing up with any sales. That is impossible, given that there are hundreds of thousands of households. I'm getting a reasonable number of matches to the emailed households.
In trying to troubleshoot this all I can figure is that somehow there is a format issue of the id or the custnum fields between the mailed and control households - perhaps because they did come from different sources and I had to append them together at the start.
Is this possible? Should both the format and informat be the same for each key? What else could be the problem?

It is much easier to append data using a DATA STEP than using SQL statements.
data both ;
set email noemail;
run;
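If you want to do the same thing in SQL, use a set operator instead of any type of join. Note that a plain UNION removes duplicate rows and lines columns up by position; OUTER UNION CORR keeps every row and matches columns by name, which is closest to what the DATA step does.
proc sql ;
create table both as
select * from email
outer union corr
select * from noemail
;
quit;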
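On the actual question (a key that matches for one source but not the other), a rough way to test the "format issue" theory is to count matches per group, list the keys that found no partner, and then retry with a normalised key. The sketch below uses Python's sqlite3 rather than SAS, and the table layout, the sample ids and the zero-padding are all made-up assumptions for illustration, not your data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical data: both keys are stored as text, but the control
    -- ids arrive zero-padded because they came from a different source.
    CREATE TABLE households (id TEXT, grp TEXT);
    CREATE TABLE customers  (id TEXT, custnum INTEGER);
    INSERT INTO households VALUES ('101', 'mailed'), ('102', 'mailed'),
                                  ('00103', 'control'), ('00104', 'control');
    INSERT INTO customers VALUES ('101', 9001), ('102', 9002), ('103', 9003);
""")

# Step 1: count inner-join matches per group on the raw key.
raw = dict(con.execute("""
    SELECT h.grp, COUNT(*)
    FROM households h
    JOIN customers c ON c.id = h.id
    GROUP BY h.grp
""").fetchall())

# Step 2: list the keys that found no partner, so they can be inspected.
unmatched = [row[0] for row in con.execute("""
    SELECT h.id
    FROM households h
    LEFT JOIN customers c ON c.id = h.id
    WHERE c.id IS NULL
    ORDER BY h.id
""")]

# Step 3: retry the join with both keys converted to the same type.
fixed = dict(con.execute("""
    SELECT h.grp, COUNT(*)
    FROM households h
    JOIN customers c ON CAST(c.id AS INTEGER) = CAST(h.id AS INTEGER)
    GROUP BY h.grp
""").fetchall())

print(raw, unmatched, fixed)
```

If the control group only starts matching after step 3, the keys differ in type, padding or length between the two sources, and the fix belongs at the point where the tables are appended.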

Related

MS Access - Query - Required Forms for Each Employee

I have 3 tables, all SharePoint lists. I am trying to create a query that will show me all of the required DQ_File Forms that do not have an attachment in the DQ_File.
DQ_File_Lookup is a lookup table for the description field in the DQ_File. It also has the "DQRequired" flag I am looking for to see all of the required fields that do not have an attachment.
I have included a screen shot showing the table layouts and relations.
Any help would be appreciated, I am sure I am just overlooking something obvious.
An example would be as follows:
Employee Name | Document Name
You would have employee Joe and he has forms A,B,D out of a possible forms A,B,C,D,E,F so he would be missing forms C,E and F.
So the employee name would come from the employee table, and the document name needs to get passed through the DQ_File table from DQ_File_Lookup.
The way I thought to do it was to get it to show all documents from the DQ_File table that are missing, and that I can do. But that only shows the information that has an entry. There are certain forms that are required for every employee, and I want to be able to see whether an employee is missing any of those forms.
Using what @June7 posted below I got it to work, and it now shows me all 15 documents that are required for every driver. But when I add the attachment field from DQ_File it shows them all as zero attachments, when I know some of them do indeed have attachments already.
Here is a screen cap showing this.
Williams in particular should only have about 5 documents that should be on this list, but instead it is showing like all 15 are missing.
Here is the SQL from the combined query:
SELECT [qryEmployees+DQFileLookup].Last, [qryEmployees+DQFileLookup].Description, DQ_File.Attachment
FROM DQ_File RIGHT JOIN [qryEmployees+DQFileLookup] ON DQ_File.EmployeeNo = [qryEmployees+DQFileLookup].EmployeeCode
WHERE (((DQ_File.Attachment.FileURL) Is Null) AND (([qryEmployees+DQFileLookup].CURRENT)=True) AND (([qryEmployees+DQFileLookup].DRIVER)=True) AND (([qryEmployees+DQFileLookup].DQRequired)=True));
If you want to know which required docs employees do not have, then you need a dataset of all possible combinations of employees/docs. Then match that dataset with DQ_File to see what is missing. The all-combinations dataset can be generated with a Cartesian query (a query without a JOIN clause): every record of each table will associate with every record of the other table.
SELECT Employees.*, DQ_File_Lookup.* FROM Employees, DQ_File_Lookup;
Then join that query with DQ_File.
SELECT Query1.EmployeeID, Query1.First, Query1.Last, Query1.ID, Query1.Title, Query1.DQRequired, DQ_File.Description, DQ_File.EmployeeNo
FROM DQ_File RIGHT JOIN Query1 ON (DQ_File.EmployeeNo = Query1.EmployeeID) AND (DQ_File.Description = Query1.ID)
WHERE (((Query1.DQRequired)=True) AND ((DQ_File.EmployeeNo) Is Null));
Advise not to use exact same field names in multiple tables. For instance, Title in DQ_File_Lookup could be DocTitle and Title in Employees could be JobTitle. And there will be less confusion if ID is not used as name in all tables.
It seems unnecessary to repeat Title and [Compliance Asset ID] in all 3 tables.
Strongly advise not to use spaces in naming convention. Title case is better than all upper case.
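To make the Cartesian-then-match idea concrete, here is a small sketch run through Python's sqlite3 instead of Access. The table and column names are borrowed from the queries above, but the rows are invented from the Joe example in the question. The join to DQ_File matches on both the employee and the document; matching on the employee alone cannot tell which document a row belongs to.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Invented sample rows: Joe has filed forms A, B and D out of A-F.
    CREATE TABLE Employees (EmployeeID INTEGER, Last TEXT);
    CREATE TABLE DQ_File_Lookup (ID TEXT, DQRequired INTEGER);
    CREATE TABLE DQ_File (EmployeeNo INTEGER, Description TEXT);
    INSERT INTO Employees VALUES (1, 'Joe');
    INSERT INTO DQ_File_Lookup VALUES ('A', 1), ('B', 1), ('C', 1),
                                      ('D', 1), ('E', 1), ('F', 1);
    INSERT INTO DQ_File VALUES (1, 'A'), (1, 'B'), (1, 'D');
""")

missing = con.execute("""
    SELECT e.Last, l.ID
    FROM Employees e
    CROSS JOIN DQ_File_Lookup l          -- every employee x every document
    LEFT JOIN DQ_File f
           ON f.EmployeeNo = e.EmployeeID
          AND f.Description = l.ID       -- match on employee AND document
    WHERE l.DQRequired = 1
      AND f.EmployeeNo IS NULL           -- keep only combinations with no file
    ORDER BY e.Last, l.ID
""").fetchall()

print(missing)
```

With this sample data the result lists forms C, E and F for Joe, which is the outcome described in the question.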

I'm having trouble solving an internship PostgreSQL exercise using two different tables

I've applied for an internship that uses PostgreSQL. I had never had any contact with a programming language before university, which started just a couple of months ago. The employer sent me an email with some exercises that I have to do before Monday. I have three days to learn the language and solve the exercises. I've been studying the whole day, about 14 hours (for real). I'm getting used to PostgreSQL, but I'm struggling with one thing. Since I'm very new to programming and don't have enough time to do a very specific search, I have no other option but to ask you guys.
Here's the problem. I have the same columns 'id_cliente' on both tables. I need to show a table where it shows all persons names, ids and how many movies each one of them borrowed from the rental.
I tried two different codes and none of them works.
select en_cliente.id_cliente, nome, count(en_aluguel.id_cliente) as alugueis
from en_cliente, en_aluguel
where en_cliente.id_cliente=en_aluguel.id_cliente
group by en_cliente.id_cliente;
Which makes Maria go missing (because her ID doesn't show up in the first table). She's supposed to show a zero.
Also:
select en_cliente.id_cliente, nome, count(en_aluguel.id_cliente) as alugueis
from en_cliente, en_aluguel
group by en_cliente.id_cliente;
Which makes every value of the last column (id as alugueis) to be a '7'
Two things:
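Any time you list two tables in the FROM line separated by a comma and then match them in the WHERE clause, you're effectively doing an INNER JOIN, which requires matching rows in both tables. (Without that WHERE condition you get every combination of rows, which is why your second query shows the same count for everyone.)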
Any time you have criteria in the WHERE clause, only rows matching that will be returned, which again will limit you to records in both tables with the clause you have.
You need to LEFT JOIN, which allows you to go from records that exist, to records that may or may not exist.
Give this a try. It will start at your en_cliente table and will join to your records in the en_aluguel table even if there is not a match in en_aluguel.
select en_cliente.id_cliente, nome, count(en_aluguel.id_cliente) as alugueis
from en_cliente
left join en_aluguel on (en_cliente.id_cliente=en_aluguel.id_cliente)
group by en_cliente.id_cliente;
Note: if you change the word “left” in my example to “inner”, you’ll end up with a query equivalent to your first example (just a different syntax).
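If you want to see the LEFT JOIN behaviour on a tiny data set, here is a sketch run through Python's sqlite3 rather than PostgreSQL. The column id_aluguel and the sample names and rows are assumptions, since the original tables were posted as images. COUNT is applied to a column from en_aluguel so that clients with no rentals count as 0 instead of 1.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Invented rows: Ana has two rentals, Maria has none.
    CREATE TABLE en_cliente (id_cliente INTEGER PRIMARY KEY, nome TEXT);
    CREATE TABLE en_aluguel (id_aluguel INTEGER PRIMARY KEY, id_cliente INTEGER);
    INSERT INTO en_cliente VALUES (1, 'Ana'), (2, 'Maria');
    INSERT INTO en_aluguel VALUES (10, 1), (11, 1);
""")

rows = con.execute("""
    SELECT c.id_cliente, c.nome, COUNT(a.id_cliente) AS alugueis
    FROM en_cliente c
    LEFT JOIN en_aluguel a ON a.id_cliente = c.id_cliente
    GROUP BY c.id_cliente, c.nome
    ORDER BY c.id_cliente
""").fetchall()

print(rows)
```

Maria stays in the result with a count of 0 because COUNT(column) skips the NULLs that the LEFT JOIN produces for her.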

SSRS - Display fields based on criteria from two different datasets

I'm working on a SSRS report where I am pulling claim data from two different datasets (creatively named DataSet1 and DataSet2) and it is creating two separate tables and information here:
You'll see the fields pretty easily spelled out, what I am looking to do is create another table with data that displays ONLY data that is not matching in both, so in the example given, it would display the claim no, trans date, and amount of everything other than CLAIM987654321 (which is the only unique identifier, as with the way things are processed the dates may be different.)
I know how to display only based on a query, but am unsure how to make the multi-dataset comparison happen.
Sadly there is no possible way to have the data combine to my knowledge, there may be, but I am unsure how to do this. Below are the queries I am using within SSMS.
The servers are already linked per previous joins, but if there is a way to manipulate the data into a single pull, I am unfamiliar with that.
NEW UPDATE: I threw together a really ugly linked server pull, but it is only pulling the data that exists in both, and I would want the data that is NOT as well.
You'll need to use FULL JOIN
So if you want to see only data that does not appear in both tables then I would do something like this. (I've not used your full qualifiers for clarity but you'll get the idea)
SELECT
COALESCE(c.ClaimNo, r.CHK_claim_number) AS [Claim Number] -- COALESCE returns the first non-null value
, COALESCE(d.OtherPayer1Paid, r.CHK_payable_cost) AS [Amount]
, COALESCE(c.TransactionDate, r.CHK_paid_date) AS [Transaction Date]
FROM EDI_Claims c -- FULL JOIN keeps all records; the missing side comes back as NULL
JOIN EDI_ClaimDetails d ON c.id = d.claimid
FULL JOIN PaidClaims_by_CheckRun r ON r.CHK_claim_number = c.ClaimNo
WHERE (d.OtherPayer1Paid != 0 OR d.claimid IS NULL) -- keep rows that exist only on the check-run side
AND (r.CHK_claim_number IS NULL OR c.ClaimNo IS NULL) -- only show rows where one side of the join fails
ORDER BY [Transaction Date]
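Here is a small runnable sketch of the same "only rows missing from one side" idea, using Python's sqlite3. Older SQLite builds have no FULL JOIN, so it is emulated with two LEFT JOIN anti-joins glued together by UNION ALL. The table names edi and paid, and every claim except CLAIM987654321, are made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical stand-ins for the two datasets.
    CREATE TABLE edi  (claim_no TEXT, amount REAL);
    CREATE TABLE paid (claim_no TEXT, amount REAL);
    INSERT INTO edi  VALUES ('CLAIM987654321', 50.0), ('CLAIM111', 20.0);
    INSERT INTO paid VALUES ('CLAIM987654321', 50.0), ('CLAIM222', 75.0);
""")

only_one_side = con.execute("""
    -- Claims present in edi but not in paid
    SELECT e.claim_no, e.amount, 'edi only' AS source
    FROM edi e
    LEFT JOIN paid p ON p.claim_no = e.claim_no
    WHERE p.claim_no IS NULL
    UNION ALL
    -- Claims present in paid but not in edi
    SELECT p.claim_no, p.amount, 'paid only'
    FROM paid p
    LEFT JOIN edi e ON e.claim_no = p.claim_no
    WHERE e.claim_no IS NULL
    ORDER BY 1
""").fetchall()

print(only_one_side)
```

CLAIM987654321 exists on both sides, so it drops out, and each remaining row is labelled with the side it came from.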

MS Access - Log daily totals of query in new table

I have an ODBC database that I've linked to an Access table. I've been using Access to generate some custom queries/reports.
However, this ODBC database changes frequently and I'm trying to discover where the discrepancy is coming from. (hundreds of thousands of records to go through, but I can easily filter it down into what I'm concerned about)
Right now I've been manually pulling the data each day, exporting to Excel, counting the totals for each category I want to track, and logging in another Excel file.
I'd rather automate this in Access if possible, but haven't been able to get my head around it yet.
I've already linked the ODBC databases I'm concerned with, and can generate the query I want to generate.
What I'm struggling with is how to capture this daily and then log that total so I can trend it over a given time period.
If the data was constant, this would be easy for me to understand/do. However, the data can change daily.
EX: This is a database of work orders. Work orders(which are basically my primary key) are assigned to different departments. A single work order can belong to many different departments and have multiple tasks/holds/actions tied to it.
Work Order 0237153-03 could be assigned to Department A today, but then could be reassigned to Department B tomorrow.
These work orders also have "ranking codes" such as Priority A, B, C. These too can be changed at any given time. Today Work Order 0237153-03 could be priority A, but tomorrow someone may decide that it should actually be Priority B.
This is why I want to capture all available data each day (The new work orders that have come in overnight, and all the old work orders that may have had changes made to them), count the totals of the different fields I'm concerned about, then log this data.
Then repeat this every day.
The question you ask is very vague, so here is a general answer.
You are counting the items you get from a database table.
It may be that you don't need to actually count them every day, but if the table in the database stores all the data for every day, you simply need to create a query to count the items that are in the table for every day that is stored in the table.
You are right that this would be best done in Access.
You might not have to log the counts in another table, though.
It seems you are quite new to Access, so you might benefit from these links: videos numbered 61, 70 here and also video 7 here
These will help or buy a book / use web resources.
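If the source only ever holds the current state, logging the counts yourself is one option. Here is a sketch of that using Python's sqlite3 in place of Access. The tables work_orders and daily_totals, the helper log_totals, the dates and all work orders other than 0237153-03 are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical current-state table plus a log table for the daily counts.
    CREATE TABLE work_orders (wo TEXT, department TEXT, priority TEXT);
    CREATE TABLE daily_totals (log_date TEXT, department TEXT,
                               priority TEXT, n INTEGER);
    INSERT INTO work_orders VALUES ('0237153-03', 'A', 'A'),
                                   ('0237153-04', 'A', 'B'),
                                   ('0237153-05', 'B', 'A');
""")

def log_totals(con, log_date):
    """Append one row per department/priority with today's count."""
    con.execute("""
        INSERT INTO daily_totals
        SELECT ?, department, priority, COUNT(*)
        FROM work_orders
        GROUP BY department, priority
    """, (log_date,))

log_totals(con, "2024-01-01")
# Overnight, work order 0237153-03 is reassigned from department A to B.
con.execute("UPDATE work_orders SET department = 'B' WHERE wo = '0237153-03'")
log_totals(con, "2024-01-02")

trend = con.execute("""
    SELECT log_date, department, SUM(n)
    FROM daily_totals
    GROUP BY log_date, department
    ORDER BY log_date, department
""").fetchall()

print(trend)
```

In Access the INSERT ... SELECT part would be an append query that you run once a day, and the trend query then reads only from the log table.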
PART2.
If you have to bodge it because you can't get the ODBC database to use triggers/data macros to log a history you could store a history yourself like this.... BUT you have to do it EVERY day.
0. On day 1 take a full copy of the ODBC data as YOURTABLE. Add a field "dump number" and set it all to 1.
1. Link to the ODBC data every day.
2. Join from YOURTABLE to the ODBC table and find any records that have changed (i.e. test just the fields you want to monitor and see if any of them have changed...).
3. Append these changed records to YOURTABLE with a new value for "dump number" of 2. This MUST always increment!
You can now write SQL to get the most recent record for each primary key.
SELECT Mytable.*
FROM Mytable
INNER JOIN
(
SELECT PrimaryKeyFields, MAX(DumpNumber) AS MAXDumpNumber
FROM Mytable
GROUP BY PrimaryKeyFields
) AS T1
ON T1.PrimaryKeyFields = Mytable.PrimaryKeyFields
AND T1.MAXDumpNumber = Mytable.DumpNumber
You can compare the most recent records with any previous records.
i.e. to get the previous dump.
Note that this will NOT work in the above SQL (unless you always keep every record!)
AND t1.MAXDumpNumber-1 = Mytable.DumpNumber
Use something like this to get the previous row:
SELECT Mytable.*
FROM Mytable
INNER JOIN
(
SELECT Older.PrimaryKeyFields
, MAX(Older.DumpNumber) AS MAXDumpNumber
FROM Mytable AS Older
INNER JOIN
(
SELECT PrimaryKeyFields
, MAX(DumpNumber) AS MAXDumpNumber
FROM Mytable
GROUP BY PrimaryKeyFields
) AS TabLatest
ON TabLatest.PrimaryKeyFields = Older.PrimaryKeyFields
AND
TabLatest.MAXDumpNumber <> Older.DumpNumber
-- Note that the <> is VERY important: it excludes the latest row
GROUP BY Older.PrimaryKeyFields
) AS T1
ON T1.PrimaryKeyFields = Mytable.PrimaryKeyFields
AND T1.MAXDumpNumber = Mytable.DumpNumber
Create 4 and 5 (the latest-row and previous-row queries above) as MS Access named queries (or SQL Server views) and then treat them like tables to do the comparison.
Make sure you have indexes created on the PK fields and the DumpNumber, and they should be unique. This will speed things up.
Finish it in time for Christmas... and flag this as an answer!
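Here is a rough end-to-end check of the latest-row and previous-row queries on a tiny history table, run through Python's sqlite3 rather than Access. The column wo stands in for PrimaryKeyFields, and the rows are invented. WO-1 changes in dumps 2 and 4 but not 3, which is the case where subtracting 1 from the latest dump number would find nothing.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Invented history: only changed rows are appended with each new dump.
    CREATE TABLE Mytable (wo TEXT, department TEXT, DumpNumber INTEGER,
                          UNIQUE (wo, DumpNumber));
    INSERT INTO Mytable VALUES ('WO-1', 'A', 1), ('WO-2', 'A', 1),
                               ('WO-1', 'B', 2),
                               ('WO-1', 'C', 4);
""")

# Most recent row per key.
latest = sorted(con.execute("""
    SELECT m.wo, m.department, m.DumpNumber
    FROM Mytable m
    JOIN (SELECT wo, MAX(DumpNumber) AS MaxDump
          FROM Mytable GROUP BY wo) t
      ON t.wo = m.wo AND t.MaxDump = m.DumpNumber
""").fetchall())

# Previous row per key: the highest dump number that is NOT the latest one.
previous = sorted(con.execute("""
    SELECT m.wo, m.department, m.DumpNumber
    FROM Mytable m
    JOIN (SELECT older.wo, MAX(older.DumpNumber) AS MaxDump
          FROM Mytable older
          JOIN (SELECT wo, MAX(DumpNumber) AS MaxDump
                FROM Mytable GROUP BY wo) newest
            ON newest.wo = older.wo
           AND newest.MaxDump <> older.DumpNumber
          GROUP BY older.wo) t
      ON t.wo = m.wo AND t.MaxDump = m.DumpNumber
""").fetchall())

print(latest, previous)
```

WO-2 never changed after the first dump, so it has a latest row but no previous row.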

Linking a table to two columns in a second table

I have an issue where I think my major problem is figuring out how to phrase it to get an acceptable answer from Google.
The situation:
Table A is 'Invoices'. It has a column that links to Table B, 'Jobs', in two places: it links either to our 'Job Number' column or to the 'Client Number' column. The major issue is that 'Client Number' and 'Job Number' can be the same number if we set the job up instead of the client setting it up.
What I'm getting is that every time there is the same number in either column the results are duplicated.
Now this is extremely simplifying the situation to try and make it a bit more understandable, but I am essentially looking for a statement that looks at Table A, gets the value, then compares it against Column B1; if that doesn't match, it compares it against B2; and if that doesn't match either, it excludes the row from the results. The key would be that if it matches when it compares against B1, it doesn't go on to compare it against B2.
Any help with this would be greatly appreciated, even if it is just a point in the direction of the very obvious operator or function that does this. It's hitting the end of a very long day.
Thank you.
Edit:
A further description:
Invoice Table
---------------------------------
PK, INVOICE_NUMBER, LINK_TO_JOB
Job Table
---------------------------------
PK, JOB_NUMBER, CLIENT_JOB_NUMBER
Now the crux of the matter is that both PKs are database-generated sequential numbers, so there is no overlap there. The invoice number and the job number are both application-generated sequential numbers with no overlap. The link to job is application generated and, when an invoice is raised, links to one of two fields in the jobs table based on rules. For simplicity, let's say those rules are: if there is a Client Job Number, link to that; if not, link to the job number.
Now the Client Job Number is a field that is written into by people, so lots of mistakes can and do happen, but lots of crap gets put in this field as well. Stuff like 'Email' and 'Fax' are very common answers. So when there is crap in there like 'Email', it links to a series of other fields holding the same 'Email' tag.
So that's problem one.
Problem two Where Statement:
SELECT INVOICE_NUMBER,
LINK_TO_JOB,
JOB_NUMBER,
CLIENT_JOB_NUMBER
FROM JOBS_TABLE a,
INVOICE_TABLE b
How do I set up the where statement to get the desire result, I've tried:
WHERE (LINK_TO_JOB = JOB_NUMBER OR LINK_TO_JOB = CLIENT_JOB_NUMBER)
This returns lots of multiples, such as when the job number and client job number are identical and when there are multiple identical written in answers 'email' etc. Now this might be unavoidable and I will end up using a Distinct with this where statement to do the best I can with what I have. However what I want to do is:
WHERE (LINK_TO_JOB = JOB_NUMBER (+) OR LINK_TO_JOB = CLIENT_JOB_NUMBER (+))
Which comes back with an error, as you cannot use outer joins with an OR operator.
If nothing comes from this I might just have to go with the OR connection and then throw in the Select Distinct and then build redundancy into Invoicing process so that when the database misses links a manual process catches them.
Although I'm all ears for any ideas.
One way of doing this would be to use a set operation. UNION will give you a distinct set of values. You haven't given much detail so I'm guessing at the specifics: you'll need to amend them for your needs.
with j as ( select * from jobs )
select j.*, inv.*
from invoices inv
join j on ( inv.job_no = j.job_no)
union
select j.*, inv.*
from invoices inv
join j on ( inv.job_no = j.client_no)
The underlying reason for your difficulties is that the data model is half-cooked. In a proper design INVOICES.JOB_NO would have a foreign key relationship referencing JOBS.JOB_NO. Whereas JOBS.CLIENT_NO would be an additional piece of information, a business key, but would not be referenced by INVOICES. Of course it can be displayed on an actual invoice, that's why Nature gave us joins.
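The "try B1 first, only fall back to B2 if B1 found nothing" rule from the question can also be written as two branches, with the second one guarded by NOT EXISTS. This is an alternative sketch, not the query above. It runs through Python's sqlite3 rather than Oracle, and the column names follow the question's table description, but the rows are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Invented rows. Job 1 has the same value in both columns.
    CREATE TABLE jobs (pk INTEGER, job_number TEXT, client_job_number TEXT);
    CREATE TABLE invoices (pk INTEGER, invoice_number TEXT, link_to_job TEXT);
    INSERT INTO jobs VALUES (1, 'J100', 'J100'),
                            (2, 'J200', 'C777'),
                            (3, 'J300', 'Email');
    INSERT INTO invoices VALUES (1, 'I1', 'J100'),
                                (2, 'I2', 'C777'),
                                (3, 'I3', 'X999');
""")

rows = con.execute("""
    -- Branch 1: the link matches a job number.
    SELECT i.invoice_number, j.job_number, 'job_number' AS matched_on
    FROM invoices i
    JOIN jobs j ON i.link_to_job = j.job_number
    UNION ALL
    -- Branch 2: the link matches a client job number, but only for
    -- invoices that found no job number match in branch 1.
    SELECT i.invoice_number, j.job_number, 'client_job_number'
    FROM invoices i
    JOIN jobs j ON i.link_to_job = j.client_job_number
    WHERE NOT EXISTS (SELECT 1 FROM jobs j1
                      WHERE j1.job_number = i.link_to_job)
    ORDER BY 1
""").fetchall()

print(rows)
```

Invoice I1 comes back once even though both columns of job 1 hold J100, and I3 is excluded because it matches neither column. Free-text values such as 'Email' repeated across many jobs would still produce multiple rows in the second branch, so this does not fix the data quality problem.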
Use SELECT DISTINCT to remove the duplicates from your results set.
OK, well, group effort here. I used the union join as suggested by APC and modified it to fit my data and all of its eccentricities (read: the French couldn't data model their way out of a paper bag). Then I surrounded everything in a distinct statement, as suggested by user1871207 and Hikaru-Shindo.
But negative marks go to me. The reasons my question was so unclear were several, but the big piece of information that was difficult for me to grasp and explain was that invoices are not always for jobs, coupled with the fact that invoices can be consolidated (which just went and screwed everything up). This is just a big mess that I've, with your help, managed to put a very small piece of two-year-old scotch tape on.
My only hope for a continued career here is to use the exceptions that come up (and they will come at me like a spider monkey!) to hopefully amend the entire invoice process so that we can report some basic profit and loss numbers.
Cheers for all your help.