Linking a table to two columns in a second table - sql

I have an issue where I think my major problem is figuring out how to phrase it to get an acceptable answer from Google.
The situation:
Table A is 'Invoices'. It has a column that links to Table B, 'Jobs', in one of two places: either our 'Job Number' column or the 'Client Number' column. The major issue is that 'Client Number' and 'Job Number' can hold the same number if we set the job up instead of the client setting it up.
What I'm getting is that every time the same number appears in either column, the results are duplicated.
This is an extreme simplification to make the situation a bit more understandable, but I am essentially looking for a statement that takes the value from Table A and compares it against column B1; if that doesn't match, compares it against B2; and if that doesn't match either, excludes the row from the results. The key is that if it matches against B1, it doesn't go on to compare against B2.
Any help with this would be greatly appreciated, even if it is just a point in the direction of the very obvious operator or function that does this. It's hitting the end of a very long day.
Thank you.
Edit:
A further description:
Invoice Table
---------------------------------
PK, INVOICE_NUMBER, LINK_TO_JOB
Job Table
---------------------------------
PK, JOB_NUMBER, CLIENT_JOB_NUMBER
Now the crux of the matter is that both PKs are database-generated sequential numbers, so there is no overlap there. The invoice number and the job number are both application-generated sequential numbers with no overlap. The link to job is application generated and, when an invoice is raised, links to one of two fields in the jobs table based on rules. For simplicity, let's say those rules are: if there is a Client Job Number, link to that; if not, link to the job number.
Now the Client Job Number is a field that is written into by people, so lots of mistakes can and do happen, and lots of crap gets put in this field as well. Entries like 'Email' and 'Fax' are very common. So when there is crap in there like 'Email', it links to a series of other rows holding the same 'Email' tag.
So that's problem one.
Problem two, the WHERE statement:
SELECT INVOICE_NUMBER,
LINK_TO_JOB,
JOB_NUMBER,
CLIENT_JOB_NUMBER
FROM JOBS_TABLE a,
INVOICE_TABLE b
How do I set up the WHERE statement to get the desired result? I've tried:
WHERE (LINK_TO_JOB = JOB_NUMBER OR LINK_TO_JOB = CLIENT_JOB_NUMBER)
This returns lots of multiples, such as when the job number and client job number are identical, and when several rows hold the same hand-written entry ('Email' etc.). Now this might be unavoidable, and I will end up using a DISTINCT with this WHERE statement to do the best I can with what I have. However, what I want to do is:
WHERE (LINK_TO_JOB = JOB_NUMBER (+) OR LINK_TO_JOB = CLIENT_JOB_NUMBER (+))
Which comes back with an error, as you can't use outer joins with an OR operator.
If nothing comes from this I might just have to go with the OR connection, throw in a SELECT DISTINCT, and then build redundancy into the invoicing process so that when the database misses links a manual process catches them.
Although I'm all ears for any ideas.

One way of doing this would be to use a set operation. UNION will give you a distinct set of values. You haven't given much detail so I'm guessing at the specifics: you'll need to amend them for your needs.
with j as ( select * from jobs )
select j.*, inv.*
from invoices inv
join j on ( inv.job_no = j.job_no)
union
select j.*, inv.*
from invoices inv
join j on ( inv.job_no = j.client_no)
The underlying reason for your difficulties is that the data model is half-cooked. In a proper design INVOICES.JOB_NO would have a foreign key relationship referencing JOBS.JOB_NO. Whereas JOBS.CLIENT_NO would be an additional piece of information, a business key, but would not be referenced by INVOICES. Of course it can be displayed on an actual invoice, that's why Nature gave us joins.
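The fallback rule described in the question (match on the job number first, and only try the client job number when no job number matches) can also be written directly. The following is a minimal sketch, not the approach from the answer above: it runs against SQLite through Python's sqlite3 module rather than Oracle, and the table names, column names and sample rows are assumptions made up for illustration. The second branch of the UNION ALL is guarded by a NOT EXISTS so it only fires for invoices that found no job number.

```python
import sqlite3

# Hypothetical schema and data, loosely modelled on the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (pk INTEGER PRIMARY KEY, job_number TEXT, client_job_number TEXT);
    CREATE TABLE invoices (pk INTEGER PRIMARY KEY, invoice_number TEXT, link_to_job TEXT);
    INSERT INTO jobs VALUES
        (1, 'J100', 'J100'),   -- we set the job up: both numbers identical
        (2, 'J101', 'C77'),    -- client supplied its own number
        (3, 'J102', 'J100');   -- another job whose client number collides with J100
    INSERT INTO invoices VALUES
        (1, 'INV1', 'J100'),
        (2, 'INV2', 'C77'),
        (3, 'INV3', 'X999');   -- links to nothing, should be excluded
""")

# The plain OR join: INV1 matches job 1 and also job 3.
OR_JOIN = """
    SELECT i.invoice_number, j.job_number
    FROM invoices i
    JOIN jobs j
      ON i.link_to_job = j.job_number
      OR i.link_to_job = j.client_job_number
    ORDER BY 1, 2
"""

# Prioritised match: job number first, client job number only as a fallback.
PRIORITY_MATCH = """
    SELECT i.invoice_number, j.job_number
    FROM invoices i
    JOIN jobs j ON i.link_to_job = j.job_number
    UNION ALL
    SELECT i.invoice_number, j.job_number
    FROM invoices i
    JOIN jobs j ON i.link_to_job = j.client_job_number
    WHERE NOT EXISTS (
        SELECT 1 FROM jobs j2 WHERE j2.job_number = i.link_to_job
    )
    ORDER BY 1, 2
"""

or_rows = conn.execute(OR_JOIN).fetchall()
priority_rows = conn.execute(PRIORITY_MATCH).fetchall()
print(or_rows)
print(priority_rows)
```

Note that this does nothing about free-text entries such as 'Email' that appear on many jobs; an invoice linking to such a value would still match every one of those rows in the fallback branch.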

Use SELECT DISTINCT to remove the duplicates from your results set.

OK, well, group effort here. I used the UNION as suggested by APC and modified it to fit my data and all of its eccentricities (read: the French couldn't data model their way out of a paper bag). Then I surrounded everything in a DISTINCT statement, as suggested by user1871207 and Hikaru-Shindo.
But negative marks go to me. My question was unclear for several reasons, but the big piece of information that was difficult for me to grasp and explain was that invoices are not always for jobs, coupled with the fact that invoices can be consolidated (which just went and screwed everything up). This is just a big mess that I have, with your help, managed to put a very small piece of two-year-old scotch tape on.
My only hope for a continued career here is to use the exceptions that come up (and they will come at me like a spider monkey!) to hopefully amend the entire invoice process so that we can report some basic profit and loss numbers.
Cheers for all your help.

Related

SQL Query is creating way too many repeated rows

I have an issue with a SQL query and how its output is displayed. I have 3 tables that have at least one field in common. When I join 2 of the tables, the information I need is displayed properly, but when I join the third, the output goes insane and duplicates the results far too much. I need to figure out why this is happening. Below I'll show all the tables and the relations between them.
This is how the tables are related to each other.
This is how the first table (dbo_predios) is made; the first three fields are the only relevant ones in this case.
This is how the second table (dbo_permisos_obras_mayores) is made; the first three fields are the only relevant ones here as well, and the second and third can match the first table (dbo_predios).
And here is how the third table (dbo_recepciones_obras_mayores) is made; the fourth field is the only relevant one in this case, and it can relate to the field of the same name in the second table (dbo_permisos_obras_mayores).
That covers the structure. The query I'm executing is the following:
SELECT
dbo_predios.codigo_unico_predio,
dbo_permisos_obras_mayores.numero_permiso_edificacion,
dbo_permisos_obras_mayores.fecha_permiso_edificacion
FROM dbo_predios
INNER JOIN dbo_permisos_obras_mayores ON dbo_predios.codigo_manzana_predio = dbo_permisos_obras_mayores.codigo_manzana_predio AND dbo_predios.codigo_lote_predio = dbo_permisos_obras_mayores.codigo_lote_predio
INNER JOIN dbo_recepciones_obras_mayores ON dbo_permisos_obras_mayores.numero_recepcion_permiso = dbo_recepciones_obras_mayores.numero_recepcion_permiso
WHERE dbo_permisos_obras_mayores.codigo_manzana_predio = 9402 AND dbo_permisos_obras_mayores.codigo_lote_predio = 30
And the result of executing the query in that way is this:
Later on I did some trial and error and removed the second INNER JOIN line, and the result surprised me. Here is what happened:
Conclusion: in brief, the third table is causing the Cartesian product. Why? I wish I knew. What do you think of this particular case? I'd appreciate any help you could give me. Thanks in advance.
Here's the solution - since you are saying that the numero_recepcion_permiso is blank, just add the condition to the inner join, to exclude empty ones:
SELECT
dbo_predios.codigo_unico_predio,
dbo_permisos_obras_mayores.numero_permiso_edificacion,
dbo_permisos_obras_mayores.fecha_permiso_edificacion
FROM dbo_predios
INNER JOIN dbo_permisos_obras_mayores ON dbo_predios.codigo_manzana_predio = dbo_permisos_obras_mayores.codigo_manzana_predio AND dbo_predios.codigo_lote_predio = dbo_permisos_obras_mayores.codigo_lote_predio
INNER JOIN dbo_recepciones_obras_mayores ON dbo_permisos_obras_mayores.numero_recepcion_permiso = dbo_recepciones_obras_mayores.numero_recepcion_permiso
AND dbo_recepciones_obras_mayores.numero_recepcion_permiso <>''
WHERE dbo_permisos_obras_mayores.codigo_manzana_predio = 9402 AND dbo_permisos_obras_mayores.codigo_lote_predio = 30
With that said, should that field be allowed to be blank or NULL? Perhaps you need to add a constraint to your table to prevent that scenario. Another suggestion: why did you choose NUMERIC(18,0) as the data type of the primary key for those tables? I would prefer a simple INT or BIGINT, and maybe let the database generate the sequence for me.
Okay, I did what Icarus told me and figured out something useful. I made a big mistake: the number combination I was trying out didn't have a numero_recepcion_permiso, so the output column is completely blank. When there is an actual numero_recepcion_permiso it shows correctly. Anyway, I still need the query not to output so many repeated rows. How can I fix that? Thank y'all for your help so far.
First of all, make sure that the values exist in both fields and actually match, or else the join can generate that many repeated rows. How many rows get repeated is something I can't tell, since I don't know what your actual data is, but that may clear the issue up a little.
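To see why a blank join key multiplies rows, it can help to reproduce it on a tiny data set. The sketch below uses SQLite through Python's sqlite3 module rather than SQL Server, with two made-up tables (permisos and recepciones) that stand in for the real ones; only the column name numero_recepcion echoes the question, everything else is an assumption. Every blank key on one side pairs with every blank key on the other, so 2 blanks times 3 blanks gives 6 extra rows on top of the single genuine match.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the permit and reception tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE permisos (numero_permiso INTEGER, numero_recepcion TEXT);
    CREATE TABLE recepciones (id INTEGER, numero_recepcion TEXT);
    INSERT INTO permisos VALUES (1, ''), (2, ''), (3, 'R1');
    INSERT INTO recepciones VALUES (10, ''), (11, ''), (12, ''), (13, 'R1');
""")


def count_rows(extra_condition=""):
    """Count the rows produced by the join, optionally with an extra ON condition."""
    sql = f"""
        SELECT COUNT(*)
        FROM permisos p
        JOIN recepciones r
          ON p.numero_recepcion = r.numero_recepcion {extra_condition}
    """
    return conn.execute(sql).fetchone()[0]


# Blank keys match each other: 2 x 3 blank pairs plus the one real match.
unfiltered = count_rows()
# Excluding blank keys leaves only the real match.
filtered = count_rows("AND r.numero_recepcion <> ''")
print(unfiltered, filtered)
```

NULL keys behave differently from empty strings: NULL never equals NULL in a join condition, so only genuinely blank (empty string) values cause this blow-up.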

How to get data based on two columns from same table in SQL

I want to retrieve some data from a table based on two columns; see the table structure below.
Update
I want the output based on two conditions:
1. The code value is 'Web' or 'Offline'.
2. The Memo column holds the same data as the Pre_memo column.
The output should be as shown below.
So far I have got the output by using the same table twice, but I want to get the result by using the table only once, to avoid performance issues, as this table holds a huge amount of data.
select distinct OrderTable.Memo,
max(OrderTable.Memo_Date) as Date1,
max(ot.Pre_Memo_Date) as Date2
from OrderTable,
OrderTable ot
where OrderTable.code in ('Web')
and ot.code in ('Offline')
and OrderTable.Memo = ot.Pre_Memo
group by OrderTable.Memo
Can anyone help with this? Is there a way to use OrderTable only once in the query and filter on the Memo and Pre_memo columns, given that they hold the same data?
You can use union all and do the conditional aggregation :
select Memo, max(case when code = 'Offline' then Date end) as Memo_date,
max(case when code = 'Web' then Date end) as Per_Memo_date
from (select Date, 'Web' as code, Pre_memo as Memo
from OrderTable o
where code = 'Web'
union all
select Date, 'Offline', Memo
from OrderTable o
where code = 'Offline'
) t
group by Memo;
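As a rough check that conditional aggregation can reproduce the two-reference query from the question, here is a small sketch. It runs on SQLite through Python's sqlite3 module, and because the real table structure is not shown, the column names (code, memo, pre_memo, memo_date, pre_memo_date) and the sample rows are assumptions. It follows the matching direction of the question's own query, where the 'Web' row's Memo equals the 'Offline' row's Pre_Memo. Whether the single-reference form is actually faster depends on the engine and the indexes, so that part has to be measured.

```python
import sqlite3

# Hypothetical table and rows; the real structure is not shown in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_table (
        code TEXT, memo TEXT, pre_memo TEXT, memo_date TEXT, pre_memo_date TEXT
    );
    INSERT INTO order_table VALUES
        ('Web',     'M1', NULL, '2020-01-01', NULL),
        ('Web',     'M1', NULL, '2020-01-05', NULL),
        ('Offline', 'M9', 'M1', '2020-02-01', '2020-01-03'),
        ('Web',     'M2', NULL, '2020-03-01', NULL),
        ('Offline', 'M8', 'M3', '2020-04-01', '2020-03-15'),
        ('Phone',   'M1', 'M1', '2020-05-01', '2020-05-01');
""")

# The question's approach: the table is referenced twice.
SELF_JOIN = """
    SELECT w.memo, MAX(w.memo_date), MAX(o.pre_memo_date)
    FROM order_table w
    JOIN order_table o ON w.memo = o.pre_memo
    WHERE w.code = 'Web' AND o.code = 'Offline'
    GROUP BY w.memo
    ORDER BY w.memo
"""

# Conditional aggregation: one reference, grouped on the shared key.
SINGLE_SCAN = """
    SELECT memo_key,
           MAX(CASE WHEN code = 'Web' THEN memo_date END),
           MAX(CASE WHEN code = 'Offline' THEN pre_memo_date END)
    FROM (
        SELECT code, memo_date, pre_memo_date,
               CASE WHEN code = 'Web' THEN memo ELSE pre_memo END AS memo_key
        FROM order_table
        WHERE code IN ('Web', 'Offline')
    ) AS t
    WHERE memo_key IS NOT NULL
    GROUP BY memo_key
    -- keep only keys that have both a Web row and an Offline row
    HAVING COUNT(CASE WHEN code = 'Web' THEN 1 END) > 0
       AND COUNT(CASE WHEN code = 'Offline' THEN 1 END) > 0
    ORDER BY memo_key
"""

self_join_rows = conn.execute(SELF_JOIN).fetchall()
single_scan_rows = conn.execute(SINGLE_SCAN).fetchall()
print(self_join_rows)
print(single_scan_rows)
```

The HAVING clause is what stands in for the inner join: a key that appears only on 'Web' rows or only on 'Offline' rows is dropped, just as it would find no partner in the self-join.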
"I wanted to retrieve some data from a table based on two columns see the below table structure"
Providing a sample is sufficient to illustrate the problem (and it is desirable to do so on SO) but it is not sufficient and thus not a replacement for defining the problem, which you have failed to do.
Absent such definition of the problem, we can only guess what you're trying to achieve. E.G.
from the subset of tuples that have 'Offline' for 'code' value, take the MAX() 'Date' value per appearing value of 'Memo'.
Match that (using some matching condition) to the subset of tuples that have 'Web' for 'code' value and retain the 'Date' value from those as 'Memo_date' in the result set.
matching condition being that 'Memo' value of [a tuple in] the former is equal to 'Pre_memo' value in [the matching tuple in] the latter.
If all that is correct, then that explains why it is impossible to do this in SQL without having at least two references. You cannot avoid doing some kind of matching, and matching by definition takes two distinct things to match (even if the two distinct things are distinct subsets of one and the same thing). In fact it is almost certainly a fundamental design mistake for you to have those two distinct things in one single table, probably under the totally misguided belief that "having everything in one table makes things easier".
"So far i got the output by using same table two times but i wanted to get the output result by using the table only 1 time to avoid performance related issues as this table is having huge data"
From the way you have presented the question, I suspect that you were hoping for some means to exploit the fact that those 'Offline' tuples are "the next" after a 'Web' tuple, and that you could write the SQL in such a way that the engine could then derive a sort of "single pass" algorithm (which you probably assume would go faster).
It does not work like that. SQL tables have no inherent ordering and as a consequence there simply ain't no such thing as "the next" in a table.

SQL to Spotfire query filtering issue with multiple tables

I am trying to calculate hours flowing in and out of a cost center. When the cost center lends out an employee for an hour it's +1 and when they borrow an employee for an hour it's -1.
Right now I'm using a query that says
select
columns
from dbo.table
where EmployeeCostCenter <> ProjectCostCenter
So when ProjectCostCenter = ID_CostCenter, it returns +HoursQuantity.
Then I update ID_CostCenter to EmployeeCostCenter and, where ID_CostCenter = EmployeeCostCenter, take -HoursQuantity.
That works fine. The problem is when I import it to Spotfire I can't filter on the main table even after I added the table relations. Can anyone explain why?
I can upload the actual code if needed, but I use 4 queries and a couple of them are quite lengthy. The main table, a temp table to calculate incoming hours, and a temp table to calculate outgoing hours are the only ones involved in this problem I think.
(moved to answer to avoid lengthy discussion)
Essentially, data relations are used to propagate filtering and marking between different data sets. Just like in an RDBMS, the relation is what Spotfire uses as the link between data sets; it's the same as the column or columns you join on. Thus, any column that you wish to filter in TableA and have the result set limited in TableB (or vice versa) must be a relation.
Column matches aren't related columns, but are associated for aggregations, category axes, etc. within each visualization. So if TableA has "amount" and TableB has "amount debit" and you wanted to use both of these in an expression, say Sum([TableA].[amount],[TableB].[amount debit]), they would need to be matched in order not to produce erroneous results.
Lastly, once you set up your relations, you should check your filter panel to set up how you want the filtering to work. You can have the rows included, excluded, or ignored altogether. Here is a link explaining that.

Troubleshooting an inner join that's not finding expected matches

Okay...I'm not a coder and my code has lots of steps that would make it difficult to post and get an outright answer. So I am looking for general steps you would follow if an inner join does not seem to be working correctly. Here is my general situation:
Problem with inner join:
I start with two tables that I am basically appending to each other; they share most fields, including "id". One table contains households who received an email, and the other contains households who did not receive an email ("controls"). So I append them into a single table. Keep in mind they come from different sources, with different processes creating them.
Then I match the id against another table that contains only customers and get a custnum for some of those households that are indeed customers.
Next is to use the custnum variable to join to a sales table. At least some controls, and likely a greater number of the mailed households should be customers and have sales - the point of the email was to obviously bring about sales.
My problem is that NO control households are showing up with any sales. That is impossible, given that there are hundreds of thousands of households. I'm getting a reasonable number of matches to the emailed households.
In trying to troubleshoot this all I can figure is that somehow there is a format issue of the id or the custnum fields between the mailed and control households - perhaps because they did come from different sources and I had to append them together at the start.
Is this possible? Should both the format and informat be the same for each key? What else could be the problem?
It is much easier to append data using a DATA STEP than using SQL statements.
data both ;
set email noemail;
run;
If you want to do the same thing in SQL then use UNION instead of any type of join.
proc sql ;
create table both as
select * from email
union
select * from noemail
;
quit;
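For the general troubleshooting question, one step that often helps is to count matches per source with a LEFT JOIN before doing the inner join, and then repeat the count with the keys normalised. The sketch below is not SAS: it uses SQLite through Python's sqlite3 module, and the tables, the zero-padded control ids and the CAST-based fix are all assumptions chosen to illustrate a key-format mismatch. The same idea should carry over to PROC SQL, where the details of formats and character versus numeric keys differ.

```python
import sqlite3

# Hypothetical data: control ids arrive zero-padded, emailed ids do not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE households (id TEXT, source TEXT);
    CREATE TABLE customers (id TEXT, custnum INTEGER);
    INSERT INTO households VALUES
        ('1001', 'email'), ('1002', 'email'),
        ('0001003', 'control'), ('0001004', 'control');
    INSERT INTO customers VALUES ('1001', 501), ('1003', 503);
""")

# Per source: how many households there are and how many found a customer.
MATCHES_BY_SOURCE = """
    SELECT h.source, COUNT(*) AS households, COUNT(c.custnum) AS matched
    FROM households h
    LEFT JOIN customers c ON {condition}
    GROUP BY h.source
    ORDER BY h.source
"""

# Raw comparison: the padded control ids never equal the customer ids.
raw = conn.execute(
    MATCHES_BY_SOURCE.format(condition="h.id = c.id")
).fetchall()

# Normalised comparison: compare both keys as integers.
normalised = conn.execute(
    MATCHES_BY_SOURCE.format(
        condition="CAST(h.id AS INTEGER) = CAST(c.id AS INTEGER)"
    )
).fetchall()

print(raw)
print(normalised)
```

A source that shows zero matches on the raw comparison but some matches after normalisation points at a key-format problem rather than a genuine absence of customers.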

SQL - Complex query using foreign keys

So, I am totally new to SQL, the book I have from the course I am taking is useless, and I am trying to do a project for said course. The Internet did not help me all that much (I do not know exactly where to start), so I want to ask both for links to good tutorials and for help with a very specific query.
If anything I say is not clear enough, please ask me to explain! :)
Suppose two tables sale and p_sale in a database called jewel_store.
sale contains two columns: sale_CODE and sale_date
p_sale contains sale_CODE, which references the above sale_CODE, plus p_ID, p_sl_quantity and p_sl_value
sale_CODE is the primary key of sale and sale_CODE,p_ID is the primary key of p_sale
For the time being p_ID is not of much use so just ignore it for the most part.
p_sl_quantity is int and p_sl_value is double(8,2). The first is the quantity of the product bought and the second is the value PER UNIT of the product.
As is probably obvious, a sale_CODE can be linked to multiple entries in the p_sale table (for example, for sale_CODE 1 I have 2 entries in p_sale).
All this is based on what I was given from the task and is correctly implemented and has some example values in.
What I now have to do is find the total income from sales in a specific month. My initial approach was to build everything up step by step, so I have come to a point that looks as follows:
SELECT
SUM(p_sl_value * p_sl_quantity) AS sales_monthly_income,
p_sale.sale_CODE
FROM jewel_store.p_sale
GROUP BY p_sale.sale_CODE
This is probably half way through as I can get the total money a sale generated for the store. So my next step was to use this query and SELECT from it. I messed it up a couple of times already and I am scratching my head now. What I did was like this:
SELECT
SUM(sales_monthly_income),
sales_monthly_income,
EXTRACT(MONTH FROM jewel_store.sale.sale_date) AS sales_month
FROM (
SELECT
SUM(p_sl_value * p_sl_quantity) AS sales_monthly_income,
sale_CODE
FROM jewel_store.p_sale
GROUP BY sale_CODE
) as code_income, jewel_store.sale
GROUP BY sales_month
First off, I only need to print the total monthly income and the month columns in my final form, but I included the extra column to show that everything went wrong in there. I think I need to somehow use the foreign key that references the other table, but my book is totally useless in helping me out. I would like someone to explain why this is wrong and what the right query would be, and please point me to a good PDF, site or anything else to learn how to do this kind of stuff. (I have checked W3SCHOOLS; it is good for the basics, but not for more advanced stuff.)
Thanks in advance!
Off the top of my head, this could be it: group by month and sum value times quantity.
SELECT
SUM(p.p_sl_value * p.p_sl_quantity) AS sales_monthly_income,
month(s.sale_date)
FROM p_sale p
inner join sale s on s.sale_code = p.sale_code
GROUP BY MONTH(s.sale_date)
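To show why the join condition matters here, the sketch below runs the monthly total twice on a tiny data set: once joined on sale_code, and once with no join condition, which is effectively what the comma-separated FROM in the question does. It uses SQLite through Python's sqlite3 module rather than MySQL, so strftime('%m', ...) stands in for MONTH(); the sample rows are made up for illustration.

```python
import sqlite3

# Schema follows the question; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (sale_code INTEGER PRIMARY KEY, sale_date TEXT);
    CREATE TABLE p_sale (
        sale_code INTEGER REFERENCES sale(sale_code),
        p_id INTEGER,
        p_sl_quantity INTEGER,
        p_sl_value REAL,
        PRIMARY KEY (sale_code, p_id)
    );
    INSERT INTO sale VALUES (1, '2020-01-10'), (2, '2020-02-05');
    INSERT INTO p_sale VALUES (1, 1, 2, 10.0), (1, 2, 1, 5.0), (2, 1, 3, 10.0);
""")

MONTHLY = """
    SELECT strftime('%m', s.sale_date) AS sales_month,
           SUM(p.p_sl_value * p.p_sl_quantity) AS sales_monthly_income
    FROM p_sale p
    {join}
    GROUP BY sales_month
    ORDER BY sales_month
"""

# Each p_sale row is paired only with its own sale.
joined = conn.execute(
    MONTHLY.format(join="JOIN sale s ON s.sale_code = p.sale_code")
).fetchall()

# No join condition: every p_sale row is paired with every sale,
# so each month receives the income of all sales.
crossed = conn.execute(MONTHLY.format(join="CROSS JOIN sale s")).fetchall()

print(joined)
print(crossed)
```

If the data spans more than one year, grouping on the month alone merges January of different years; grouping on year and month together avoids that.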