SQL - Complex query using foreign keys

So, I am totally new to SQL, but the book I have from the courses I take is useless and I am trying to do a project for said course. Internet did not help me all that much (I do not know where to start exactly), so I want to ask for both links to good tutorials to check out as well as help with a very specific piece of query.
If anything I say is not clear enough, please ask me to explain! :)
Suppose there are two tables, sale and p_sale, in a database called jewel_store.
sale contains two columns: sale_CODE and sale_date
p_sale contains sale_CODE (which references sale_CODE in sale above), p_ID, p_sl_quantity and p_sl_value
sale_CODE is the primary key of sale, and (sale_CODE, p_ID) is the primary key of p_sale
For the time being p_ID is not of much use, so just ignore it for the most part.
p_sl_quantity is int and p_sl_value is double(8,2). The first is the quantity of the product bought and the second is the value PER UNIT of the product.
As is probably obvious, one sale_CODE can be linked to multiple entries in the p_sale table (for example, sale_CODE 1 has 2 entries in p_sale).
All this is based on what I was given in the task; it is correctly implemented and has some example values in it.
What I now have to do is find the total income from sales in a specific month. My initial approach was to build everything up step by step, so I have come to a point that looks as follows:
SELECT
SUM(p_sl_value * p_sl_quantity) AS sales_monthly_income,
p_sale.sale_CODE
FROM jewel_store.p_sale
GROUP BY p_sale.sale_CODE
This is probably half way through as I can get the total money a sale generated for the store. So my next step was to use this query and SELECT from it. I messed it up a couple of times already and I am scratching my head now. What I did was like this:
SELECT
SUM(sales_monthly_income),
sales_monthly_income,
EXTRACT(MONTH FROM jewel_store.sale.sale_date) AS sales_month
FROM (
SELECT
SUM(p_sl_value * p_sl_quantity) AS sales_monthly_income,
sale_CODE
FROM jewel_store.p_sale
GROUP BY sale_CODE
) as code_income, jewel_store.sale
GROUP BY sales_month
First off, I only need to print the total monthly income and the month columns in the final result; I kept the extra column to show that everything went wrong in there. I think I need to somehow use the foreign key that references the other table, but my book is totally useless in helping me out. I would like someone to explain why this is wrong and what the right query would be, and please point me to a good PDF, site or anything else to learn how to do this kind of stuff. (I have checked W3Schools; it is good for the basics, but not for more advanced stuff.)
Thanks in advance!

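Off the top of my head, this could be it: join the two tables on sale_code, then group by month and sum value times quantity. Your attempt lists the derived table and sale in the FROM clause without any join condition, so every sale row gets paired with every per-sale total (a cross join); the foreign key has to appear in the join condition to tie each p_sale row to its own sale.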
SELECT
SUM(p.p_sl_value * p.p_sl_quantity) AS sales_monthly_income,
month(s.sale_date)
FROM p_sale p
inner join sale s on s.sale_code = p.sale_code
GROUP BY MONTH(s.sale_date)
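To see the join and the grouping at work, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. The sample rows are made up, and since SQLite has no MONTH() function the sketch uses strftime('%m', ...) instead; on MySQL you would keep MONTH() or EXTRACT as in the queries above.

```python
import sqlite3

# In-memory database with the two tables from the question.
# The sample rows are invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sale (sale_CODE INTEGER PRIMARY KEY, sale_date TEXT);
CREATE TABLE p_sale (
    sale_CODE INTEGER REFERENCES sale(sale_CODE),
    p_ID INTEGER,
    p_sl_quantity INTEGER,
    p_sl_value REAL,
    PRIMARY KEY (sale_CODE, p_ID)
);
INSERT INTO sale VALUES (1, '2013-01-10'), (2, '2013-01-25'), (3, '2013-02-03');
INSERT INTO p_sale VALUES
    (1, 10, 2, 50.0), (1, 11, 1, 20.0), (2, 10, 1, 50.0), (3, 12, 3, 10.0);
""")

# SQLite has no MONTH(); strftime('%m', ...) plays the same role here.
monthly_income = conn.execute("""
    SELECT strftime('%m', s.sale_date) AS sales_month,
           SUM(p.p_sl_value * p.p_sl_quantity) AS sales_monthly_income
    FROM p_sale p
    INNER JOIN sale s ON s.sale_CODE = p.sale_CODE
    GROUP BY sales_month
    ORDER BY sales_month
""").fetchall()

for month, income in monthly_income:
    print(month, income)
```

Note that grouping on the month alone merges the same month of different years; if the data spans more than one year, grouping on year and month together is probably what you want.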

Related

How to create a relationship where all columns have many details

I am working on a small personal project about capital expenses. There is one part I can't figure out.
The tables I have are the following:
capex_form
capex_cashflow
When I create a capex_form I am able to request money and divide it however I want across 13 months, including the current month (to show how I will pay it over the next year). This is reflected in capex_cashflow, which has 13 columns holding either an amount or 0.
The problem comes here:
I need to be able to add many descriptions for each payment. For example:
In July 2019 I will spend 200 (this part is done). I need to enter a breakdown of these 200 dollars with descriptions: 50 dollars on one thing and 150 on another.
I added 3 columns per month, which works, but then it only lets me add one description per month.
I was thinking I might create another table for descriptions, but how would it relate to a specific column (month)? As far as I understand, you relate one table to another table, not to a column.
I was also thinking of creating 13 tables for the 13 months, but I think I am missing something that would avoid creating 13 unnecessary tables.
I appreciate any kind of help or guidance.
This is pretty straightforward and a common thing.
Put an index column in the "header" table. The header table is a summary of the information, so in your case you might create a table that just holds the capex_income.
CAPEX_FORM
Capex_id
Capex_Amount
Then create a payment table, the payment table can have a month column (only 1) and a capex_Id column, along with a description or whatever else you need
CAPEX_PAYMENT
Capex_payment_id
Capex_id
Payment_Amount
Month (1-13)
Description
Now because you have the Capex_id in this table, it will be related to the CAPEX table and you will be able to query all the payments that are associated like so
select payment_amount, month, description from capex_payment p join capex_form f on p.capex_id = f.capex_id
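As a rough sketch of that header/detail layout, the snippet below builds both tables in an in-memory SQLite database from Python and splits one month's 200 into two described payments. The column types, the CHECK constraint and the sample descriptions are assumptions made for the example, not part of the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Header table: one row per capex request.
CREATE TABLE capex_form (capex_id INTEGER PRIMARY KEY, capex_amount REAL);

-- Detail table: any number of payments per request and month.
-- Column types and the CHECK constraint are assumptions for this sketch.
CREATE TABLE capex_payment (
    capex_payment_id INTEGER PRIMARY KEY,
    capex_id INTEGER NOT NULL REFERENCES capex_form(capex_id),
    payment_amount REAL NOT NULL,
    month INTEGER NOT NULL CHECK (month BETWEEN 1 AND 13),
    description TEXT
);

INSERT INTO capex_form VALUES (1, 200.0);
INSERT INTO capex_payment (capex_id, payment_amount, month, description) VALUES
    (1, 50.0, 1, 'cables'),
    (1, 150.0, 1, 'monitor');
""")

# Every described payment belonging to request 1.
breakdown = conn.execute("""
    SELECT p.month, p.payment_amount, p.description
    FROM capex_payment p
    JOIN capex_form f ON p.capex_id = f.capex_id
    WHERE f.capex_id = 1
    ORDER BY p.capex_payment_id
""").fetchall()

# The old 13-column cashflow view is now just an aggregate per month.
month_totals = conn.execute("""
    SELECT month, SUM(payment_amount)
    FROM capex_payment
    WHERE capex_id = 1
    GROUP BY month
""").fetchall()

print(breakdown)
print(month_totals)
```

The point of the design is that a month becomes a value in a row rather than a column, so the number of descriptions per month is no longer fixed by the schema.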

Database structures - populating tables via foreign keys and summation

Good afternoon,
I've been trying to brainstorm ways to do this, but I've been unable to think of something solid so far. As a last resort I've decided to ask the brilliant minds on Stack Overflow for some advice on how I should do this. Some preface before I get started: I'm using SSMS 18 and Microsoft SQL Server 2017.
First of all, I'll show you the backbone database that I want to achieve with this:
Should I structure the database another way to achieve what I want to achieve? (A way to log meals with a summation of all the calories, etc.)
Is it possible to use foreign keys to populate a column in a foreign table? E.g. using X as primary key, populating Z column(in another table) with Y as a foreign key
How do I go about wrapping this up and using SSMS to create the foreign keys to automatically do this?
Is it possible to do summation in MSSQL? I want to get the sum of calories in the final meal table so that I can later use that data in Tableau to draw comparisons and create representations of that data. Should I let Tableau do that, or is it possible to do so with MSSQL?
I'm all over the place right now, I know my questions seem daunting, but I'm happy to provide much more information if you're missing anything.
I've got the flow of data planned and designed - but I do not know how exactly to implement this architecture in MSSQL.
Thank you in advance.
Well, I guess this is not really an answer to your question...
I would expect some data model like this:
ingredient (ingredient_id, ingredient_name, calories, fat, ...)
meal (meal_id, meal_name)
meal_ingredient (meal_id, ingredient_id, amount)
Per meal you would then calculate:
select mi.meal_id, sum(i.calories * mi.amount), sum(i.fat * mi.amount)
from ingredient i
join meal_ingredient mi on mi.ingredient_id = i.ingredient_id
group by mi.meal_id;
For persons eating the meals:
person (person_id, name)
person_meal (person_id, meal_id, consume_date, consume_amount)
And a query for the persons' daily consumption:
select
pm.person_id, pm.consume_date,
sum(i.calories * mi.amount * pm.consume_amount),
sum(i.fat * mi.amount * pm.consume_amount)
from ingredient i
join meal_ingredient mi on mi.ingredient_id = i.ingredient_id
join person_meal pm on pm.meal_id = mi.meal_id
group by pm.person_id, pm.consume_date
order by pm.person_id, pm.consume_date;
We would not store results redundantly, as we can always write queries to get them from the tables. After all, this is what SQL is about.
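For what it's worth, here is a small runnable sketch of the per-meal calculation using Python's sqlite3 module with an in-memory database. The column types and the nutritional figures are invented for the example; the same SELECT with SUM and GROUP BY should carry over to SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Column types and sample values are assumptions for this sketch.
CREATE TABLE ingredient (
    ingredient_id INTEGER PRIMARY KEY,
    ingredient_name TEXT,
    calories REAL,
    fat REAL
);
CREATE TABLE meal (meal_id INTEGER PRIMARY KEY, meal_name TEXT);
CREATE TABLE meal_ingredient (
    meal_id INTEGER REFERENCES meal(meal_id),
    ingredient_id INTEGER REFERENCES ingredient(ingredient_id),
    amount REAL,
    PRIMARY KEY (meal_id, ingredient_id)
);

INSERT INTO ingredient VALUES (1, 'rice', 130, 0.5), (2, 'chicken', 165, 4.0);
INSERT INTO meal VALUES (1, 'chicken and rice');
INSERT INTO meal_ingredient VALUES (1, 1, 2), (1, 2, 1);
""")

# Totals are computed at query time instead of being stored in the meal table.
meal_totals = conn.execute("""
    SELECT mi.meal_id,
           SUM(i.calories * mi.amount) AS total_calories,
           SUM(i.fat * mi.amount) AS total_fat
    FROM ingredient i
    JOIN meal_ingredient mi ON mi.ingredient_id = i.ingredient_id
    GROUP BY mi.meal_id
""").fetchall()

print(meal_totals)
```

A reporting tool can then read the query (or a view wrapping it) directly, so nothing has to be "populated" through the foreign keys.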

Linking a table to two columns in a second table

I have an issue where I think my major problem is figuring out how to phrase it to get an acceptable answer from Google.
The situation:
Table A is 'Invoices'. It has a column that links to Table B, 'Jobs', in one of two places: either our 'Job Number' column or the 'Client Number' column. The major issue is that 'Client Number' and 'Job Number' can be the same number if we set the job up instead of the client setting the job up.
What I'm getting is that every time the same number appears in both columns, the results are duplicated.
Now this is extremely simplifying the situation to try and make it a bit more understandable, but I am essentially looking for a statement that takes the value from Table A and compares it against column B1; if that doesn't match, compares it against B2; and if that doesn't match either, excludes the row from the results. The key is that if it matches against B1, it doesn't go on to compare against B2.
Any help with this would be greatly appreciated, even if it is just a point in the direction of the very obvious operator or function that does this. It's hitting the end of a very long day.
Thank you.
Edit:
A further description:
Invoice Table
---------------------------------
PK, INVOICE_NUMBER, LINK_TO_JOB
Job Table
---------------------------------
PK, JOB_NUMBER, CLIENT_JOB_NUMBER
Now the crux of the matter is that both PKs are database-generated sequential numbers, so there is no overlap there. The invoice number and the job number are both application-generated sequential numbers with no overlap. The link to job is application generated, and when an invoice is raised it links to one of two fields in the jobs table based on rules. For simplicity, let's say the rules are: if there is a Client Job Number, link to that; if not, link to the job number.
Now the Client Job Number is a field that is written into by people, so lots of mistakes can and do happen, and lots of crap gets put in this field as well. Stuff like 'Email' and 'Fax' are very common entries. So when there is crap in there like 'Email', it links to a series of other rows holding the same 'Email' tag.
So that's problem one.
Problem two, the WHERE statement:
SELECT INVOICE_NUMBER,
LINK_TO_JOB,
JOB_NUMBER,
CLIENT_JOB_NUMBER
FROM JOBS_TABLE a,
INVOICE_TABLE b
How do I set up the WHERE statement to get the desired result? I've tried:
WHERE (LINK_TO_JOB = JOB_NUMBER OR LINK_TO_JOB = CLIENT_JOB_NUMBER)
This returns lots of multiples, such as when the job number and client job number are identical and when there are multiple identical written-in answers ('email' etc.). Now this might be unavoidable, and I may end up using a DISTINCT with this WHERE statement to do the best I can with what I have. However, what I want to do is:
WHERE (LINK_TO_JOB = JOB_NUMBER (+) OR LINK_TO_JOB = CLIENT_JOB_NUMBER (+))
Which comes back with an error, as you cannot use outer joins with an OR operator.
If nothing comes from this I might just have to go with the OR condition, throw in the SELECT DISTINCT, and then build redundancy into the invoicing process so that when the database misses links a manual process catches them.
Although I'm all ears for any ideas.
One way of doing this would be to use a set operation. UNION will give you a distinct set of values. You haven't given much detail so I'm guessing at the specifics: you'll need to amend them for your needs.
with j as ( select * from jobs )
select j.*, inv.*
from invoices inv
join j on ( inv.job_no = j.job_no)
union
select j.*, inv.*
from invoices inv
join j on ( inv.job_no = j.client_no)
The underlying reason for your difficulties is that the data model is half-cooked. In a proper design INVOICES.JOB_NO would have a foreign key relationship referencing JOBS.JOB_NO. Whereas JOBS.CLIENT_NO would be an additional piece of information, a business key, but would not be referenced by INVOICES. Of course it can be displayed on an actual invoice, that's why Nature gave us joins.
Use SELECT DISTINCT to remove the duplicates from your results set.
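If the rule really is "try the job number first and only fall back to the client job number when that finds nothing", one possible way to express it is a UNION ALL whose second branch is guarded by NOT EXISTS. The sketch below tries this on an in-memory SQLite database from Python; the table and column names follow the question loosely and the rows are invented, so treat it as an illustration rather than a drop-in fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, hypothetical versions of the two tables; the data is made up.
CREATE TABLE jobs (pk INTEGER PRIMARY KEY, job_number TEXT, client_job_number TEXT);
CREATE TABLE invoices (pk INTEGER PRIMARY KEY, invoice_number TEXT, link_to_job TEXT);

INSERT INTO jobs VALUES (1, '100', '100'), (2, '101', 'C-7'), (3, '102', '100');
INSERT INTO invoices VALUES (1, 'INV-1', '100'), (2, 'INV-2', 'C-7'), (3, 'INV-3', 'Email');
""")

matches = conn.execute("""
    -- First choice: the link matches a job number.
    SELECT i.invoice_number, j.job_number
    FROM invoices i
    JOIN jobs j ON i.link_to_job = j.job_number
    UNION ALL
    -- Fallback: match on the client job number, but only for invoices
    -- whose link did not match any job number at all.
    SELECT i.invoice_number, j.job_number
    FROM invoices i
    JOIN jobs j ON i.link_to_job = j.client_job_number
    WHERE NOT EXISTS (
        SELECT 1 FROM jobs j2 WHERE j2.job_number = i.link_to_job
    )
    ORDER BY 1, 2
""").fetchall()

print(matches)
```

Here INV-1 is matched once through the job number even though '100' also appears as a client job number on two jobs, INV-2 falls back to the client job number, and INV-3 ('Email') is excluded. Free-text values that repeat in the client column would still produce several fallback rows, so this does not replace cleaning up that field.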
OK, well, group effort here. I used the union join as suggested by APC and modified it to fit my data and all of its eccentricities (read: the French couldn't data model their way out of a paper bag). Then I wrapped everything in a distinct statement, as suggested by user1871207 and Hikaru-Shindo.
But negative marks go to me. My question was unclear for several reasons, but the big piece of information that was difficult for me to grasp and explain was that invoices are not always for jobs, coupled with the fact that invoices can be consolidated (which just went and screwed everything up). This is just a big mess that I have, with your help, managed to put a very small piece of two-year-old scotch tape on.
My only hope for a continued career here is to use the exceptions that come up (and they will come at me like a spider monkey!) to hopefully amend the entire invoice process so that we can report some basic profit and loss numbers.
Cheers for all your help.

How do I optimise my voting application to produce monthly charts?

I'd appreciate any help you can offer - I'm currently trying to decide on a schema for a voting app I'm building with PHP / MySQL, but I'm completely stuck on how to optimise it. The key elements are to allow only one vote per user per item, and be able to build a chart detailing the top items of the month – based on votes received that month.
So far the initial schema is:
Items_table
item_id
total_points
(lots of other fields unrelated to voting)
Voting_table
voting_id
item_id
user_id
vote (1 = up; 0 = down)
month_cast
year_cast
So I'm wondering if it's going to be a case of selecting all information from voting table where month = currentMonth & year = currentYear, somehow running a count and grouping by item_id; if so, how would I go about doing so? Or would I be better off creating a separate table for monthly charts which is updated with each vote, but then should I be concerned with the requirement to update 3 database tables per vote?
I'm not particularly competent – if it shows – so would really love any help / guidance someone could provide.
Thanks,
_just_me
I wouldn't add separate tables for monthly charts; to prevent users from casting more than one vote per item, you could use a unique key on voting_table(item_id, user_id).
As for the summary, you should be able to use a simple query like
select item_id, vote, count(*), month_cast, year_cast
from voting_table
group by item_id, vote, month_cast, year_cast
I would use a voting table similar to this:
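create table votes(
item_id int not null -- column types are assumptions; adjust to your schema
,user_id int not null
,vote_dtm datetime not null
,vote tinyint not null
,primary key(item_id, user_id)
,foreign key(item_id) references item(item_id)
,foreign key(user_id) references users(user_id)
)Engine=InnoDB;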
Using a composite primary key on an InnoDB table will cluster the data around the items, making it much faster to find the votes related to an item. I added a column vote_dtm, which holds the timestamp of when the user voted.
Then I would create one or several views, used for reporting purposes.
create view votes_monthly as
select item_id
,year(vote_dtm) as year
,month(vote_dtm) as month
,sum(vote) as score
,count(*) as num_votes
from votes
group
by item_id
,year(vote_dtm)
,month(vote_dtm);
If you start having performance issues, you can replace the view with a table containing pre-computed values without even touching the reporting code.
Note that I used both count(*) and sum(vote). The count(*) returns the number of votes cast, whereas the sum returns the number of up-votes. However, if you changed the vote column to use +1 for upvotes and -1 for downvotes, sum(vote) would return a score much like the way votes on Stack Overflow are calculated.
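Here is a minimal sketch of both requirements (one vote per user per item, plus a monthly chart), run against an in-memory SQLite database from Python. It assumes the +1/-1 encoding mentioned above, uses strftime in place of MySQL's year() and month(), and the sample votes are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The composite primary key enforces one vote per user per item.
-- Types and the +1/-1 CHECK are assumptions for this sketch.
CREATE TABLE votes (
    item_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    vote_dtm TEXT NOT NULL,
    vote INTEGER NOT NULL CHECK (vote IN (-1, 1)),
    PRIMARY KEY (item_id, user_id)
);
""")
conn.executemany("INSERT INTO votes VALUES (?, ?, ?, ?)", [
    (1, 1, '2011-03-01 10:00:00', 1),
    (1, 2, '2011-03-02 11:00:00', 1),
    (1, 3, '2011-03-05 09:30:00', -1),
    (2, 1, '2011-03-07 12:00:00', 1),
    (2, 3, '2011-03-08 12:00:00', 1),
    (2, 2, '2011-04-01 08:00:00', 1),
])

# A second vote by the same user on the same item violates the primary key.
try:
    conn.execute("INSERT INTO votes VALUES (1, 1, '2011-03-09 10:00:00', -1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Chart for March 2011: score is upvotes minus downvotes, num_votes is all votes.
march_chart = conn.execute("""
    SELECT item_id, SUM(vote) AS score, COUNT(*) AS num_votes
    FROM votes
    WHERE strftime('%Y-%m', vote_dtm) = '2011-03'
    GROUP BY item_id
    ORDER BY score DESC, item_id
""").fetchall()

print(duplicate_rejected, march_chart)
```

Item 2 ranks first with fewer votes because its score is higher, which is the difference between sum(vote) and count(*) described above.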

What is a fast way of joining two tables and using the first table column to "filter" the second table?

I am trying to develop a SQL Server 2005 query, but I have been unsuccessful so far. I have tried every approach I know, like derived tables, sub-queries, CTEs, etc., but I couldn't solve the problem. I won't post the queries I tried here because they involve many other columns and tables, but I will try to explain the problem with a simpler example:
There are two tables: PARTS_SOLD and PARTS_PURCHASED. The first contains products that were sold to customers, and the second contains products that were purchased from suppliers. Both tables contain a foreign key associated with the movement itself, which holds the dates, etc.
Here is the simplified schema:
Table PARTS_SOLD:
part_id
date
other columns
Table PARTS_PURCHASED
part_id
date
other columns
What I need is to join every row in PARTS_SOLD with a unique row from PARTS_PURCHASED, chosen by part_id and the maximum purchase "date" that is equal to or before the "date" column from PARTS_SOLD. In other words, for every sale of an item I need to collect some information from the last purchase of that item.
The problem itself is that I didn't find a way of joining the PARTS_PURCHASED table with PARTS_SOLD table using the column "date" from PARTS_SOLD to limit the MAX(date) of the PARTS_PURCHASED table.
I could have done this with a cursor to solve the problem with the tools I know, but every table has millions of rows, and perhaps using cursors or sub-queries that evaluate a query for every row would make the process very slow.
You aren't going to like my answer. Your database is designed incorrectly which is why you can't get the data back out the way you want. Even using a cursor, you would not get good data from this. Assume that you purchased 5 of part 1 on May 31, 2010. Assume on June 1, you sold ten of part 1. Matching just on date, you would match all ten to the May 31 purchase even though that is clearly not correct, some parts might have been purchased on May 23 and some may have been purchased on July 19, 2008.
If you want to know which purchased part relates to which sold part, your database design should include the PartPurchasedID as part of the PartsSold record and this should be populated at the time of the purchase, not later for reporting when you have 1,000,000 records to sort through.
Perhaps the following would help:
SELECT S.*
FROM PARTS_SOLD S
INNER JOIN (SELECT PART_ID, MAX(DATE) AS MAX_DATE
FROM PARTS_PURCHASED
GROUP BY PART_ID) D
ON (D.PART_ID = S.PART_ID)
WHERE D.MAX_DATE <= S.DATE
Share and enjoy.
I'll toss this out there, but it's likely to contain all kinds of mistakes... both because I'm not sure I understand your question and because my SQL is... weak at best. That being said, my thought would be to try something like:
SELECT * FROM PARTS_SOLD
INNER JOIN (SELECT part_id, max(date) AS max_date
FROM PARTS_PURCHASED
GROUP BY part_id) AS subtable
ON PARTS_SOLD.part_id = subtable.part_id
AND subtable.max_date <= PARTS_SOLD.date
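One caveat with the derived tables above is that they compute a single maximum date per part, so they cannot pick a different purchase for each sale. A possible alternative is a correlated subquery that looks for the latest purchase on or before each sale's own date. The sketch below tries that on an in-memory SQLite database from Python; the supplier and customer columns and all rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified tables; supplier/customer stand in for the "other columns".
CREATE TABLE parts_purchased (part_id INTEGER, date TEXT, supplier TEXT);
CREATE TABLE parts_sold (part_id INTEGER, date TEXT, customer TEXT);

INSERT INTO parts_purchased VALUES
    (1, '2010-05-23', 'supplier A'),
    (1, '2010-05-31', 'supplier B'),
    (1, '2010-07-01', 'supplier C');
INSERT INTO parts_sold VALUES
    (1, '2010-05-25', 'customer X'),
    (1, '2010-06-01', 'customer Y');
""")

# For each sale, join the purchase whose date is the latest one
# that is on or before that sale's date.
last_purchases = conn.execute("""
    SELECT s.customer, s.date, p.date, p.supplier
    FROM parts_sold s
    JOIN parts_purchased p
      ON p.part_id = s.part_id
     AND p.date = (SELECT MAX(p2.date)
                   FROM parts_purchased p2
                   WHERE p2.part_id = s.part_id
                     AND p2.date <= s.date)
    ORDER BY s.date
""").fetchall()

for row in last_purchases:
    print(row)
```

The subquery is evaluated per sale, so on tables with millions of rows an index on PARTS_PURCHASED (part_id, date) is likely to matter a lot. On SQL Server 2005 the same idea is often written with CROSS APPLY and TOP 1 ordered by date descending. If two purchases of a part share the same date, this sketch returns both rows, so a tie-breaker would be needed.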