I am currently having trouble learning SQL, and I am unable to get a table to join to another one when two or more of the columns in both tables are the same.
For example, I have 2 tables:
(I'm not sure how to post the code, so I've just posted links; I hope that this is OK.)
This is table 1; it shows how long each stage of each project will take:
http://puu.sh/gt92M/3dfe0063f0.png
This is table 2; it shows how long each stage of each project has been worked on:
http://puu.sh/gt9HO/2fd5090c9a.png
So far I have been able to put them into the same table, but I can't get the hours taken into their own column; currently they are mixed into the hours-needed column.
SELECT ID, Stage, SUM(Hours_Taken)
FROM Work
GROUP BY ID, Stage
UNION
SELECT ID, Stage, Hours
FROM Budget_Allocation
GROUP BY ID, Stage
As you can see, each project has stages, and each stage needs a different number of work hours. I want to be able to display a 4-column table:
ID
Stage
Hours
Hours_Taken
You are asking for a result whose columns include some derived from one table and others derived from a different table. That means you need to perform some kind of JOIN. The UNION operator does not join tables; it just collates multiple row sets into a single row set, eliminating duplicates.
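To see the difference concretely, here is a minimal sketch using Python's built-in sqlite3 module with made-up rows (the values are hypothetical; only the table and column names come from the question). It runs the UNION approach and shows that the summed hours and the budgeted hours end up stacked in the same third column:

```python
import sqlite3

# Hypothetical toy data: project 1 with two stages.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Budget_Allocation (ID INTEGER, Stage TEXT, Hours INTEGER);
CREATE TABLE Work (ID INTEGER, Stage TEXT, Hours_Taken INTEGER);
INSERT INTO Budget_Allocation VALUES (1, 'Design', 10), (1, 'Build', 40);
INSERT INTO Work VALUES (1, 'Design', 3), (1, 'Design', 4), (1, 'Build', 12);
""")

# UNION appends the second row set below the first: the result still has
# three columns, and hours taken and hours budgeted share the third one.
union_rows = conn.execute("""
    SELECT ID, Stage, SUM(Hours_Taken) FROM Work GROUP BY ID, Stage
    UNION
    SELECT ID, Stage, Hours FROM Budget_Allocation
    ORDER BY 2, 3
""").fetchall()
print(union_rows)
# [(1, 'Build', 12), (1, 'Build', 40), (1, 'Design', 7), (1, 'Design', 10)]
```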
One of the rowsets you want to select from is not a base table, however, but rather the result of an aggregate query. This calls for a subquery, the results of which you join to the other base table as needed:
SELECT
tw.ID AS ID,
tw.Stage AS Stage,
ba.Hours AS Hours,
tw.Hours_Taken AS Hours_Taken
FROM
Budget_Allocation ba
-- JOIN operator --
JOIN (
-- here's the subquery --
SELECT ID, Stage, SUM(Hours_Taken) AS Hours_Taken
FROM Work
GROUP BY ID, Stage
) tw
-- predicate for the preceding JOIN operator --
ON ba.ID = tw.ID AND ba.Stage = tw.Stage
Note that in this case you do not want to join the base tables first and then aggregate the rows of the joined result, because you are selecting values from one column (Budget_Allocation.Hours) that is neither a grouping column nor a function of the groups. There are workarounds and implementation-specific exceptions to that limitation, but in this case it's easy to do the right thing straight off by aggregating before joining.
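A minimal sketch of the aggregate-then-join query, run against SQLite through Python's sqlite3 module with made-up rows (hypothetical values). SQLite accepts the query as written; the only addition is an ORDER BY so that the output order is predictable:

```python
import sqlite3

# Hypothetical toy data, repeated so this script is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Budget_Allocation (ID INTEGER, Stage TEXT, Hours INTEGER);
CREATE TABLE Work (ID INTEGER, Stage TEXT, Hours_Taken INTEGER);
INSERT INTO Budget_Allocation VALUES (1, 'Design', 10), (1, 'Build', 40);
INSERT INTO Work VALUES (1, 'Design', 3), (1, 'Design', 4), (1, 'Build', 12);
""")

# Aggregate Work per (ID, Stage) in a subquery, then join the totals to
# the budget rows on both key columns.
rows = conn.execute("""
    SELECT tw.ID AS ID,
           tw.Stage AS Stage,
           ba.Hours AS Hours,
           tw.Hours_Taken AS Hours_Taken
    FROM Budget_Allocation ba
    JOIN (
        SELECT ID, Stage, SUM(Hours_Taken) AS Hours_Taken
        FROM Work
        GROUP BY ID, Stage
    ) tw
      ON ba.ID = tw.ID AND ba.Stage = tw.Stage
    ORDER BY tw.Stage
""").fetchall()
print(rows)
# [(1, 'Build', 40, 12), (1, 'Design', 10, 7)]
```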
You are doing a UNION instead of a JOIN.
select w.id,w.stage,w.hours_taken, b.hours
from work w, budget_allocation b
where w.id = b.id and
w.stage = b.stage;
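Now you have everything you need in one row and can do what you want with it. Note that this returns one row per work entry, so to total the hours per stage you still need SUM(w.hours_taken) together with GROUP BY w.id, w.stage, b.hours.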
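The plain join returns one row per Work entry rather than one row per stage. Here is a minimal sketch in SQLite (via Python's sqlite3, hypothetical rows) of totalling per stage after joining, which works by adding Hours to the GROUP BY. It assumes Budget_Allocation has exactly one row per (ID, Stage); with duplicate budget rows, the work hours would be repeated per budget row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Budget_Allocation (ID INTEGER, Stage TEXT, Hours INTEGER);
CREATE TABLE Work (ID INTEGER, Stage TEXT, Hours_Taken INTEGER);
INSERT INTO Budget_Allocation VALUES (1, 'Design', 10), (1, 'Build', 40);
INSERT INTO Work VALUES (1, 'Design', 3), (1, 'Design', 4), (1, 'Build', 12);
""")

# One row per Work entry (3 here), with the budget repeated on each.
per_entry = conn.execute("""
    SELECT w.ID, w.Stage, w.Hours_Taken, b.Hours
    FROM Work w
    JOIN Budget_Allocation b ON w.ID = b.ID AND w.Stage = b.Stage
""").fetchall()

# Totals per stage: Hours is added to GROUP BY so it may be selected.
totals = conn.execute("""
    SELECT w.ID, w.Stage, b.Hours, SUM(w.Hours_Taken) AS Hours_Taken
    FROM Work w
    JOIN Budget_Allocation b ON w.ID = b.ID AND w.Stage = b.Stage
    GROUP BY w.ID, w.Stage, b.Hours
    ORDER BY w.Stage
""").fetchall()
print(len(per_entry), totals)
# 3 [(1, 'Build', 40, 12), (1, 'Design', 10, 7)]
```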
I need to run a query every hour against a table that joins and aggregates data from another table with millions of rows.
select f.master_con,
s.containers
from
(
select master_con
from shipped
where start_time >= a and start_time <= a+1
) f,
(
select master_con,
count(distinct container) as containers
from picked
group by master_con
) s
where f.master_con = s.master_con
This query above sort of works; the exact syntax may not be correct because I wrote it from memory.
In the subquery 's' I only want to count containers for each master_con in the 'f' query. I think my query runs for a long time because I'm counting containers for every master_con and then joining only to the master_con values from 'f'.
Is there a better, more efficient way to write this type of query?
(In the end, I'll sum(containers) from this query above to get total containers shipped during that hour)
Most likely, there is. Can you provide some simplified sample table structures? Additionally, the implicit comma-separated join syntax you are using is widely discouraged in favor of explicit JOIN syntax, so you should declare your joins explicitly. The query below should be an improvement. A left outer join is used so that you get all of the shipped records that meet your criteria and keep them even if they aren't in the picked table. Change it to an inner join if you want them gone.
SELECT shipped.master_con,
COUNT(DISTINCT picked.container) AS containers
FROM shipped LEFT OUTER JOIN
picked ON picked.master_con = shipped.master_con
WHERE shipped.start_time BETWEEN a AND a+1
GROUP BY shipped.master_con
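A minimal sketch of the explicit-join version in SQLite through Python's sqlite3. The table contents are made up, and start_time is modelled as an integer hour with a hypothetical window start a = 5 rather than a real timestamp. Because the WHERE clause restricts shipped before grouping, containers are only counted for the master_con values shipped in the window:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipped (master_con TEXT, start_time INTEGER);
CREATE TABLE picked (master_con TEXT, container TEXT);
INSERT INTO shipped VALUES ('M1', 5), ('M2', 5), ('M3', 9);
INSERT INTO picked VALUES ('M1', 'C1'), ('M1', 'C2'), ('M1', 'C2'), ('M3', 'C9');
""")

a = 5  # hypothetical window start, as an integer "hour" instead of a timestamp

# LEFT JOIN keeps M2 (shipped, nothing picked) with a count of 0;
# M3 is outside the window and never counted.
per_con = conn.execute("""
    SELECT shipped.master_con,
           COUNT(DISTINCT picked.container) AS containers
    FROM shipped
    LEFT OUTER JOIN picked ON picked.master_con = shipped.master_con
    WHERE shipped.start_time BETWEEN :a AND :a + 1
    GROUP BY shipped.master_con
    ORDER BY shipped.master_con
""", {"a": a}).fetchall()

total = sum(count for _, count in per_con)  # the final sum(containers)
print(per_con, total)
# [('M1', 2), ('M2', 0)] 2
```

BETWEEN is inclusive at both ends, so it matches the original >= and <= comparisons.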
I have an Oracle DB and use the query below to fetch records for a requirement. It selects five columns from three tables with a WHERE condition.
select un.name, he.emp_no, he.lname, hr.in_unit, hr.out_unit
from hr_employee he
inner join hr_roster hr on he.eid = hr.eid
inner join units un on he.unit = un.unit_code
where hr.unit_date = to_date( '24-JUL-20','dd-MON-yy')
Later on I realized that if I write it as below, without JOINs, it is slightly faster.
select un.name, he.emp_no, he.lname, hr.in_unit, hr.out_unit
from hr_employee he, hr_roster hr, units un
where hr.unit_date = to_date( '24-JUL-20','dd-MON-yy')
But I noticed that the two queries above fetch different rows.
When I took a row count of both queries, the one using JOINs returned 1012 and the other one kept fetching without ever returning a count.
I am a bit confused and do not know which query is the most suitable to use.
The second query is treated as a CROSS JOIN, since there are no join conditions between the tables' columns, only a restriction on a certain date, while the first one uses standard inner joins with regular INNER JOIN conditions.
The second query is basically incorrect, as it has no join conditions at all; its only restriction is the date filter on hr_roster. So it produces a Cartesian product: ALL rows of hr_employee, times the hr_roster rows for that date, times ALL rows of units.
The first query, which is the correct one, returns only the hr_roster rows for that date, each matched to its employee via he.eid = hr.eid and to that employee's unit via he.unit = un.unit_code.
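A small SQLite sketch (Python's sqlite3, hypothetical rows, and an ISO text date in place of Oracle's to_date) that counts the rows each form returns. With 3 employees, 2 roster rows on the chosen date and 2 units, the comma-join version yields 3 x 2 x 2 = 12 rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hr_employee (eid INTEGER, emp_no TEXT, lname TEXT, unit TEXT);
CREATE TABLE hr_roster (eid INTEGER, unit_date TEXT, in_unit TEXT, out_unit TEXT);
CREATE TABLE units (unit_code TEXT, name TEXT);
INSERT INTO hr_employee VALUES (1, 'E1', 'Smith', 'U1'), (2, 'E2', 'Jones', 'U2'),
                               (3, 'E3', 'Brown', 'U1');
INSERT INTO hr_roster VALUES (1, '2020-07-24', '08:00', '16:00'),
                             (2, '2020-07-24', '09:00', '17:00'),
                             (3, '2020-07-23', '08:00', '16:00');
INSERT INTO units VALUES ('U1', 'Ward A'), ('U2', 'Ward B');
""")

day = "2020-07-24"  # ISO text date stands in for Oracle's to_date(...)

joined = conn.execute("""
    SELECT COUNT(*)
    FROM hr_employee he
    INNER JOIN hr_roster hr ON he.eid = hr.eid
    INNER JOIN units un ON he.unit = un.unit_code
    WHERE hr.unit_date = ?
""", (day,)).fetchone()[0]

# No join conditions: every employee x every roster row for the day x every unit.
crossed = conn.execute("""
    SELECT COUNT(*)
    FROM hr_employee he, hr_roster hr, units un
    WHERE hr.unit_date = ?
""", (day,)).fetchone()[0]

print(joined, crossed)  # 2 12
```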
I'm sorry if this is a dumb question, but I have a particular case I can't figure out how to handle. I need a query that returns all date values between two date values stored in another table. Right now this is my query:
SELECT h.hour_gkey, h.hour_time
FROM Hours as h
INNER JOIN ServiceHours sh ON h.hour_gkey BETWEEN sh.openhour_hour_gkey AND sh.closehour_hour_gkey;
To explain a bit further: the ServiceHours table has two integer fields, openhour_hour_gkey and closehour_hour_gkey. These two fields are foreign keys to the Hours table, so they correspond to time values. hour_gkey (integer) is the primary key of the Hours table, and I need to show only the values of hour_time (a date-type field) that are between the hour_time values that correspond to those two fields. How could I do that?
I'm currently using SQL Server 2014.
I'm interpreting your question to be "How do I select all of the rows of Hours whose hour_time values are between those related to ServiceHours.openhour_hour_gkey and ServiceHours.closehour_hour_gkey?"
I furthermore suppose that it is intentional that you are neither selecting any columns from ServiceHours nor filtering to narrow the results to those associated with a single ServiceHours row. Thus, if there are multiple ServiceHours rows you will get a set of Hours rows for each one, with these sets not necessarily being disjoint, and with no indication of which goes with which ServiceHours row.
In any case, you need to perform a join for each relationship you want to traverse. Here that means one join to look up the opening time and another to look up the closing time, in addition to the Hours rows you actually want to return. That might look like this:
SELECT h.hour_gkey, h.hour_time
FROM
Hours h
CROSS JOIN ServiceHours sh
INNER JOIN Hours sho
ON sh.openhour_hour_gkey = sho.hour_gkey
INNER JOIN Hours shc
ON sh.closehour_hour_gkey = shc.hour_gkey
WHERE h.hour_time BETWEEN sho.hour_time AND shc.hour_time;
I have written the BETWEEN condition as a filter predicate instead of a join predicate because that seems a better characterization. For inner joins, however, the two alternatives are equivalent. Note also that this query is semantically equivalent to @DaveCosta's.
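A minimal SQLite sketch (Python's sqlite3) contrasting the question's key-based BETWEEN with the answer's time-based one. The rows are hypothetical: hour_time is stored as zero-padded 'HH:MM' text so it compares correctly as a string, and the keys are deliberately not in time order, which is exactly when comparing keys goes wrong:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Hours (hour_gkey INTEGER PRIMARY KEY, hour_time TEXT);
CREATE TABLE ServiceHours (openhour_hour_gkey INTEGER, closehour_hour_gkey INTEGER);
-- Keys deliberately not in time order.
INSERT INTO Hours VALUES (1, '10:00'), (2, '08:00'), (3, '09:00'), (4, '12:00'), (5, '11:00');
INSERT INTO ServiceHours VALUES (2, 5);  -- open 08:00, close 11:00
""")

# The question's query compares keys, not times.
by_key = conn.execute("""
    SELECT h.hour_gkey, h.hour_time
    FROM Hours AS h
    INNER JOIN ServiceHours sh
        ON h.hour_gkey BETWEEN sh.openhour_hour_gkey AND sh.closehour_hour_gkey
    ORDER BY h.hour_gkey
""").fetchall()

# The answer's query: look up both times, then filter on hour_time.
by_time = conn.execute("""
    SELECT h.hour_gkey, h.hour_time
    FROM Hours h
    CROSS JOIN ServiceHours sh
    INNER JOIN Hours sho ON sh.openhour_hour_gkey = sho.hour_gkey
    INNER JOIN Hours shc ON sh.closehour_hour_gkey = shc.hour_gkey
    WHERE h.hour_time BETWEEN sho.hour_time AND shc.hour_time
    ORDER BY h.hour_time
""").fetchall()

print(by_key)   # [(2, '08:00'), (3, '09:00'), (4, '12:00'), (5, '11:00')]
print(by_time)  # [(2, '08:00'), (3, '09:00'), (1, '10:00'), (5, '11:00')]
```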
Here is one way I think it could be done. I am not certain whether the syntax is exactly right for SQL Server, but the basic idea is that you need to join from ServiceHours to Hours to get the actual open/close hour values, then select the rows from Hours with values in that range.
WITH min_max AS (
SELECT h1.hour_time min_hour_time, h2.hour_time max_hour_time
FROM ServiceHours sh
JOIN Hours h1 ON h1.hour_gkey = sh.openhour_hour_gkey
JOIN Hours h2 ON h2.hour_gkey = sh.closehour_hour_gkey
)
SELECT h.hour_time
FROM Hours h
JOIN min_max ON h.hour_time BETWEEN min_max.min_hour_time AND min_max.max_hour_time
(Note I'm assuming that ServiceHours has only one row. If it doesn't, there is probably some other field you want to include in both the subquery and the main query to indicate which row in ServiceHours each resulting row relates to.)
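If ServiceHours has several rows, one way to keep them apart is to carry an identifier through the CTE. A minimal SQLite sketch (Python's sqlite3) with made-up rows; service_id is a hypothetical column name standing in for whatever field identifies a ServiceHours row, and hour_time is zero-padded 'HH:MM' text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Hours (hour_gkey INTEGER PRIMARY KEY, hour_time TEXT);
CREATE TABLE ServiceHours (service_id INTEGER PRIMARY KEY,  -- hypothetical id column
                           openhour_hour_gkey INTEGER, closehour_hour_gkey INTEGER);
INSERT INTO Hours VALUES (1, '10:00'), (2, '08:00'), (3, '09:00'), (4, '12:00'), (5, '11:00');
INSERT INTO ServiceHours VALUES (1, 2, 3),   -- 08:00 to 09:00
                                (2, 1, 4);   -- 10:00 to 12:00
""")

# The CTE resolves each ServiceHours row to its open/close times and keeps
# service_id, so every result row says which ServiceHours row it belongs to.
rows = conn.execute("""
    WITH min_max AS (
        SELECT sh.service_id,
               h1.hour_time AS min_hour_time,
               h2.hour_time AS max_hour_time
        FROM ServiceHours sh
        JOIN Hours h1 ON h1.hour_gkey = sh.openhour_hour_gkey
        JOIN Hours h2 ON h2.hour_gkey = sh.closehour_hour_gkey
    )
    SELECT min_max.service_id, h.hour_time
    FROM Hours h
    JOIN min_max
      ON h.hour_time BETWEEN min_max.min_hour_time AND min_max.max_hour_time
    ORDER BY min_max.service_id, h.hour_time
""").fetchall()
print(rows)
# [(1, '08:00'), (1, '09:00'), (2, '10:00'), (2, '11:00'), (2, '12:00')]
```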
Here is a simplified description of 2 tables:
CREATE TABLE jobs(id PRIMARY KEY, description);
CREATE TABLE dates(id PRIMARY KEY, job REFERENCES jobs(id), date);
There may be one or more dates per job.
I would like to create a query which generates the following (in pidgin):
jobs.id, jobs.description, min(dates.date) as start, max(dates.date) as finish
I have tried something like this:
SELECT id, description,
(SELECT min(date) as start FROM dates d WHERE d.job=j.id),
(SELECT max(date) as finish FROM dates d WHERE d.job=j.id)
FROM jobs j;
which works, but looks very inefficient.
I have tried an INNER JOIN, but can’t see how to join jobs with a suitable aggregate query on dates.
Can anybody suggest a clean efficient way to do this?
While retrieving all rows: aggregate first, join later:
SELECT id, j.description, d.start, d.finish
FROM jobs j
LEFT JOIN (
SELECT job AS id, min(date) AS start, max(date) AS finish
FROM dates
GROUP BY job
) d USING (id);
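A minimal sketch running the aggregate-first query in SQLite via Python's sqlite3, using the question's untyped table definitions (which SQLite accepts as-is) and made-up rows. The select list qualifies j.id; everything else follows the answer. It also runs the question's correlated-subquery version to confirm that both return the same rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs(id PRIMARY KEY, description);
CREATE TABLE dates(id PRIMARY KEY, job REFERENCES jobs(id), date);
INSERT INTO jobs VALUES (1, 'Paint'), (2, 'Wire'), (3, 'Plan');
INSERT INTO dates VALUES (1, 1, '2024-01-05'), (2, 1, '2024-01-02'),
                         (3, 1, '2024-01-09'), (4, 2, '2024-02-01');
""")

# Aggregate dates once per job, then LEFT JOIN so job 3 (no dates) is kept.
aggregated = conn.execute("""
    SELECT j.id, j.description, d.start, d.finish
    FROM jobs j
    LEFT JOIN (
        SELECT job AS id, min(date) AS start, max(date) AS finish
        FROM dates
        GROUP BY job
    ) d USING (id)
    ORDER BY j.id
""").fetchall()

# The question's correlated-subquery version, for comparison.
correlated = conn.execute("""
    SELECT id, description,
           (SELECT min(date) FROM dates d WHERE d.job = j.id),
           (SELECT max(date) FROM dates d WHERE d.job = j.id)
    FROM jobs j
    ORDER BY id
""").fetchall()

print(aggregated)
print(aggregated == correlated)  # True
```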
Related:
SQL: How to save order in sql query?
About JOIN .. USING
It's not a "different type of join". USING (col) is a standard SQL (!) syntax shortcut for ON a.col = b.col. More precisely, quoting the manual:
The USING clause is a shorthand that allows you to take advantage of the specific situation where both sides of the join use the same name for the joining column(s). It takes a comma-separated list of the shared column names and forms a join condition that includes an equality comparison for each one. For example, joining T1 and T2 with USING (a, b) produces the join condition ON T1.a = T2.a AND T1.b = T2.b.
Furthermore, the output of JOIN USING suppresses redundant columns: there is no need to print both of the matched columns, since they must have equal values. While JOIN ON produces all columns from T1 followed by all columns from T2, JOIN USING produces one output column for each of the listed column pairs (in the listed order), followed by any remaining columns from T1, followed by any remaining columns from T2.
It's particularly convenient that you can write SELECT * FROM ... and joining columns are only listed once.
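SQLite also omits the right-hand copies of the USING columns from SELECT * (its column order follows the left table, which here coincides with the order described above). A short sketch with two hypothetical tables, t1 and t2, that share columns a and b, reading the output column names from the cursor:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (a INTEGER, b INTEGER, x TEXT);
CREATE TABLE t2 (a INTEGER, b INTEGER, y TEXT);
INSERT INTO t1 VALUES (1, 1, 'left');
INSERT INTO t2 VALUES (1, 1, 'right');
""")

def column_names(sql):
    # cursor.description holds one 7-tuple per output column; [0] is the name.
    return [col[0] for col in conn.execute(sql).description]

using_cols = column_names("SELECT * FROM t1 JOIN t2 USING (a, b)")
on_cols = column_names("SELECT * FROM t1 JOIN t2 ON t1.a = t2.a AND t1.b = t2.b")
print(using_cols)  # ['a', 'b', 'x', 'y']
print(on_cols)     # ['a', 'b', 'x', 'a', 'b', 'y']
```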
In addition to Erwin's solution, you can also use a window clause:
SELECT DISTINCT j.id, j.description,
first_value(d.date) OVER w AS start,
last_value(d.date) OVER w AS finish
FROM jobs j
JOIN dates d ON d.job = j.id
WINDOW w AS (PARTITION BY j.id ORDER BY d.date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
Window functions effectively group by one or more columns (the PARTITION BY clause) and/or ORDER BY some other columns and then you can apply some window function to it, or even a regular aggregate function, without affecting grouping or ordering of any other columns (description in your case). It requires a somewhat different way of constructing queries, but once you get the idea it is pretty brilliant.
In your case you need the first value of a partition, which is easy because it is inside the window frame by default. You also need the last value in the partition, which lies beyond the default window frame (that frame ends at the current row), so you need the ROWS clause to extend the frame to the whole partition. Since you produce two columns using the same window definition, the WINDOW clause is used here; if it applied to a single column, you could instead write the window definition inline after OVER, e.g. OVER (PARTITION BY ...), without naming it in a WINDOW w AS (...) clause.
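One thing to watch with this query: the window functions are computed for every joined row, so without DISTINCT a job with three dates comes back three times. A minimal SQLite sketch (Python's sqlite3, made-up rows; it assumes a SQLite build of 3.25 or newer, the release that added window functions) comparing the results with and without DISTINCT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs(id PRIMARY KEY, description);
CREATE TABLE dates(id PRIMARY KEY, job REFERENCES jobs(id), date);
INSERT INTO jobs VALUES (1, 'Paint'), (2, 'Wire'), (3, 'Plan');
INSERT INTO dates VALUES (1, 1, '2024-01-05'), (2, 1, '2024-01-02'),
                         (3, 1, '2024-01-09'), (4, 2, '2024-02-01');
""")

# {distinct} is filled with either "" or "DISTINCT" below.
WINDOW_QUERY = """
    SELECT {distinct} j.id, j.description,
           first_value(d.date) OVER w AS start,
           last_value(d.date) OVER w AS finish
    FROM jobs j
    JOIN dates d ON d.job = j.id
    WINDOW w AS (PARTITION BY j.id ORDER BY d.date
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    ORDER BY j.id
"""

all_rows = conn.execute(WINDOW_QUERY.format(distinct="")).fetchall()
distinct_rows = conn.execute(WINDOW_QUERY.format(distinct="DISTINCT")).fetchall()
print(len(all_rows))   # 4: one per dates row
print(distinct_rows)
# [(1, 'Paint', '2024-01-02', '2024-01-09'), (2, 'Wire', '2024-02-01', '2024-02-01')]
```

Because of the inner JOIN, job 3 (which has no dates) does not appear; Erwin's LEFT JOIN version keeps it.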
I have combined two different tables together; one side is named DynDom and the other is CATH. I am trying to remove duplicates from the combined table, such as the ones below:
However, if I select distinct DynDom pdbcode values from the table, it returns distinct values of that pdbcode, and the same happens for CATH.
Based on the pictures above, I commented out the DynDom or CATH columns and ran the query separately for each, and it returned the values I need. I was wondering whether it's possible to use two DISTINCT statements to return distinct rows of the entire table based on the pdbcode.
Here's my code:
select DISTINCT
cath_dyndom_table_2."DYNDOM_DOMAINID",
cath_dyndom_table_2."DYNDOM_DSTART",
cath_dyndom_table_2."DYNDOM_DEND",
cath_dyndom_table_2."DYNDOM_CONFORMERID",
cath_dyndom_table_2.pdbcode,
cath_dyndom_table_2."DYNDOM_ChainID",
cath_dyndom_table_2.cath_pdbcode,
cath_dyndom_table_2."CATH_BEGIN",
cath_dyndom_table_2."CATH_END"
from
cath_dyndom_table_2
where
pdbcode = '2hun'
order by
cath_dyndom_table_2."DYNDOM_DOMAINID",
cath_dyndom_table_2."DYNDOM_DSTART",
cath_dyndom_table_2."DYNDOM_DEND",
cath_dyndom_table_2.pdbcode,
cath_dyndom_table_2.cath_pdbcode,
cath_dyndom_table_2."CATH_BEGIN",
cath_dyndom_table_2."CATH_END";
In the end, I would like to search domains from DynDom and CATH based on the pdbcode and return the rows without duplicate values.
Thank you.
UPDATE:
This is the view that I created:
CREATE VIEW cath_dyndom_table AS
SELECT
r.domainid AS "DYNDOM_DOMAINID",
r.DomainStart AS "DYNDOM_DSTART",
r.Domain_End AS "DYNDOM_DEND",
r.ddid AS "DYN_DDID",
r.confid AS "DYNDOM_CONFORMERID",
r.pdbcode,
r.chainid AS "DYNDOM_ChainID",
d.cath_pdbcode,
d.cathbegin AS "CATH_BEGIN",
d.cathend AS "CATH_END"
FROM dyndom_domain_table r
FULL OUTER JOIN cath_domains d ON d.cath_pdbcode::character(4) = r.pdbcode
ORDER BY confid ASC;
What you are getting is a Cartesian product: for each pdbcode, every matching DynDom row is paired with every matching CATH row.
To get one line per row without duplicates, you need a 1-to-1 relation between the two tables.
You can see HERE what Cartesian joins are and HERE how to avoid them.
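A tiny illustration, run in SQLite via Python's sqlite3, with hypothetical simplified tables (only the code column plus one other column each): three DynDom rows and two CATH rows for the same pdbcode produce six joined rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dyndom_domain_table (pdbcode TEXT, domainid INTEGER);
CREATE TABLE cath_domains (cath_pdbcode TEXT, cathbegin INTEGER);
INSERT INTO dyndom_domain_table VALUES ('2hun', 1), ('2hun', 2), ('2hun', 3);
INSERT INTO cath_domains VALUES ('2hun', 1), ('2hun', 150);
""")

# Joining on pdbcode alone pairs every DynDom row with every CATH row
# that shares the code: 3 x 2 = 6 rows for '2hun'.
pairs = conn.execute("""
    SELECT r.domainid, d.cathbegin
    FROM dyndom_domain_table r
    JOIN cath_domains d ON d.cath_pdbcode = r.pdbcode
""").fetchall()
print(len(pairs))  # 6
```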
It sounds as though you want a UNION of domain name and ranges from each table - this can be achieved like so:
SELECT DYNDOM_DOMAINID, DYNDOM_DSTART, DYNDOM_DEND
FROM DynDom
UNION
SELECT RTRIM(cath_pdbcode), CATH_BEGIN, CATH_END
FROM CATH
This should eliminate exact duplicates (i.e., where the domain name, start and end are all identical) but will not eliminate duplicate domain names with different ranges. If these exist, you will need to decide how to handle them (retain them as separate entries, combine them with the lowest start and highest end, or whatever other option is preferred).
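A minimal sketch of this UNION in SQLite (Python's sqlite3) with made-up rows. The CATH codes carry trailing spaces to mimic a padded character(n) column, which is what the RTRIM strips. The exact duplicate collapses, while the two ranges starting at 101 both survive:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DynDom (DYNDOM_DOMAINID TEXT, DYNDOM_DSTART INTEGER, DYNDOM_DEND INTEGER);
CREATE TABLE CATH (cath_pdbcode TEXT, CATH_BEGIN INTEGER, CATH_END INTEGER);
INSERT INTO DynDom VALUES ('2hunA', 1, 100), ('2hunA', 101, 200);
-- Trailing spaces mimic a padded character(n) column.
INSERT INTO CATH VALUES ('2hunA  ', 1, 100), ('2hunA  ', 101, 180);
""")

rows = conn.execute("""
    SELECT DYNDOM_DOMAINID, DYNDOM_DSTART, DYNDOM_DEND FROM DynDom
    UNION
    SELECT RTRIM(cath_pdbcode), CATH_BEGIN, CATH_END FROM CATH
    ORDER BY 1, 2, 3
""").fetchall()
print(rows)
# [('2hunA', 1, 100), ('2hunA', 101, 180), ('2hunA', 101, 200)]
```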
EDIT: Actually, I believe you can get the desired results simply by changing the JOIN ON condition in your view to be:
FULL OUTER JOIN cath_domains d
ON d.cath_pdbcode::character(5) = r.pdbcode || r.chainid AND
r.DomainStart <= d.cathbegin AND
r.Domain_End >= d.cathend
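A minimal sketch of the effect of the extra range conditions, in SQLite via Python's sqlite3 with made-up rows. It is a simplification under stated assumptions: it uses an inner JOIN instead of FULL OUTER JOIN (SQLite only supports the latter from version 3.39), and it drops the ::character(5) cast, which is PostgreSQL syntax and unnecessary here because the toy codes are not padded:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dyndom_domain_table (pdbcode TEXT, chainid TEXT, domainid INTEGER,
                                  DomainStart INTEGER, Domain_End INTEGER);
CREATE TABLE cath_domains (cath_pdbcode TEXT, cathbegin INTEGER, cathend INTEGER);
INSERT INTO dyndom_domain_table VALUES ('2hun', 'A', 1, 1, 120), ('2hun', 'A', 2, 121, 250);
INSERT INTO cath_domains VALUES ('2hunA', 1, 110), ('2hunA', 125, 240);
""")

def join_rows(condition):
    # condition is a fixed SQL fragment defined below, not user input.
    return conn.execute(f"""
        SELECT r.domainid, d.cathbegin, d.cathend
        FROM dyndom_domain_table r
        JOIN cath_domains d ON {condition}
        ORDER BY r.domainid, d.cathbegin
    """).fetchall()

# Code + chain only: each DynDom domain pairs with both CATH domains.
code_only = join_rows("d.cath_pdbcode = r.pdbcode || r.chainid")
# Adding containment of the CATH range keeps only the matching pairs.
with_ranges = join_rows(
    "d.cath_pdbcode = r.pdbcode || r.chainid"
    " AND r.DomainStart <= d.cathbegin AND r.Domain_End >= d.cathend"
)
print(len(code_only))  # 4
print(with_ranges)     # [(1, 1, 110), (2, 125, 240)]
```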