Join with time comparison - sql

I'm having trouble getting some code to work with CodeIgniter (CI), specifically because I need to embed a complex condition in the join.
I want to extract and count information for every row in A, including rows that do not satisfy the condition, which should show a count of 0.
The first case works fine for the count:
$this->db->distinct();
$this->db->select('A.id, count(B.id) AS C');
$this->db->from('A');
$this->db->join('B','B.id=A.id','left outer');
$this->db->group_by('A.id');
Now I want to add a condition on a datetime column, so that only information AFTER a given date is counted. As before, rows that do not satisfy the condition should be returned with a count of 0:
$this->db->distinct();
$this->db->select('A.id, count(B.id) AS C');
$this->db->from('A');
$this->db->join('B','B.id=A.id','left outer');
$this->db->where('B.date >',$date);
$this->db->group_by('A.id');
This code runs, but it returns only the rows that satisfy the condition, not the other rows with a count of 0.
Can someone tell me what is wrong with the where clause?
Thanks

The where clause is undoing the left outer join. Can you do this?
$this->db->join('B', 'B.id = A.id AND B.date > ' . $this->db->escape($date), 'left outer');
In SQL, you would handle this by putting both conditions in the on clause:
from A left outer join
B
on B.id = A.id and B.date > $date;
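To see the difference on a small dataset, here is a minimal sketch using Python's built-in sqlite3 module rather than CodeIgniter. The table names A and B come from the question; the sample rows and the cutoff date are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY);
    CREATE TABLE B (id INTEGER, date TEXT);
    INSERT INTO A (id) VALUES (1), (2), (3);
    -- id 1 has one old and one recent row, id 2 only an old row, id 3 none
    INSERT INTO B (id, date) VALUES
        (1, '2020-01-01'), (1, '2021-06-01'), (2, '2020-03-01');
""")
cutoff = "2021-01-01"  # hypothetical cutoff date

# Date filter in WHERE: rows of A without a matching recent B row are dropped.
where_rows = conn.execute("""
    SELECT A.id, COUNT(B.id) AS C
    FROM A LEFT OUTER JOIN B ON B.id = A.id
    WHERE B.date > ?
    GROUP BY A.id
    ORDER BY A.id
""", (cutoff,)).fetchall()

# Date filter in ON: every row of A is kept, with 0 where nothing matches.
on_rows = conn.execute("""
    SELECT A.id, COUNT(B.id) AS C
    FROM A LEFT OUTER JOIN B ON B.id = A.id AND B.date > ?
    GROUP BY A.id
    ORDER BY A.id
""", (cutoff,)).fetchall()

print(where_rows)  # [(1, 1)]
print(on_rows)     # [(1, 1), (2, 0), (3, 0)]
```

The unmatched rows have NULL in B.date, and NULL > cutoff is not true, which is why the where version throws them away.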


What are the possible ways to optimize the below PostgreSQL code?

I have written this SQL query to fetch data from a Greenplum data lake. The primary table has only about 800,000 rows, and I am joining it with other tables. The query below takes a very long time to return a result. What might be the reason for the long query time, and how can I resolve it?
select
a.pole,
t.country_name,
a.service_area,
a.park_name,
t.turbine_platform_name,
a.turbine_subtype,
a.pad as "turbine_name",
t.system_number as "turbine_id",
a.customer,
a.service_contract,
a.component,
c.vendor_mfg as "component_manufacturer",
a.case_number,
a.description as "case_description",
a.rmd_diagnosis as "case_rmd_diagnostic_description",
a.priority as "case_priority",
a.status as "case_status",
a.actual_rootcause as "case_actual_rootcause",
a.site_trends_feedback as "case_site_feedback",
a.added as "date_case_added",
a.start as "date_case_started",
a.last_flagged as "date_case_flagged_by_algorithm_latest",
a.communicated as "date_case_communicated_to_field",
a.field_visible_date as "date_case_field_visbile_date",
a.fixed as "date_anamoly_fixed",
a.expected_clse as "date_expected_closure",
a.request_closure_date as "date_case_request_closure",
a.validation_date as "date_case_closure",
a.production_related,
a.estimated_value as "estimated_cost_avoidance",
a.cms,
a.anomaly_category,
a.additional_information as "case_additional_information",
a.model,
a.full_model,
a.sent_to_field as "case_sent_to_field"
from app_pul.anomaly_stage a
left join ge_cfg.turbine_detail t on a.scada_number = t.system_number and a.added > '2017-12-31'
left join tbwgr_v.pmt_wmf_tur_component_master_t c on a.component = c.component_name
Your query is basically:
select . . .
from app_pul.anomaly_stage a left join
ge_cfg.turbine_detail t
on a.scada_number = t.system_number and
a.added > '2017-12-31' left join
tbwgr_v.pmt_wmf_tur_component_master_t c
on a.component = c.component_name
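First, the condition on a does not filter anything: a is the first table in the left join and the condition is in the on clause, so every row of a is returned regardless, and the condition only controls which rows of t are matched. I assume you actually intend for it to filter, so write the query as: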
select . . .
from app_pul.anomaly_stage a left join
ge_cfg.turbine_detail t
on a.scada_number = t.system_number left join
tbwgr_v.pmt_wmf_tur_component_master_t c
on a.component = c.component_name
where a.added > '2017-12-31'
That might help with performance. Then in Postgres, you would want indexes on turbine_detail(system_number) and pmt_wmf_tur_component_master_t(component_name). It is doubtful that an index would help on the first table, because you are already selecting a large amount of data.
I'm not sure if indexes would be appropriate in Greenplum.
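The behaviour of a left-table condition in the on clause can be checked on toy data. This is only a sketch using Python's built-in sqlite3 module; the tables anomaly and turbine are simplified stand-ins for the real ones, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE anomaly (case_number INTEGER, scada_number INTEGER, added TEXT);
    CREATE TABLE turbine (system_number INTEGER, country_name TEXT);
    INSERT INTO anomaly VALUES (1, 10, '2017-06-01'), (2, 10, '2018-02-01');
    INSERT INTO turbine VALUES (10, 'Spain');
""")

# Condition on the left table inside ON: both anomaly rows are returned;
# the old one simply gets NULL for the turbine columns.
in_on = conn.execute("""
    SELECT a.case_number, t.country_name
    FROM anomaly a
    LEFT JOIN turbine t
      ON a.scada_number = t.system_number AND a.added > '2017-12-31'
    ORDER BY a.case_number
""").fetchall()

# Same condition in WHERE: the old anomaly row is filtered out.
in_where = conn.execute("""
    SELECT a.case_number, t.country_name
    FROM anomaly a
    LEFT JOIN turbine t ON a.scada_number = t.system_number
    WHERE a.added > '2017-12-31'
    ORDER BY a.case_number
""").fetchall()

print(in_on)     # [(1, None), (2, 'Spain')]
print(in_where)  # [(2, 'Spain')]
```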
Verify that the joins are using the respective primary and foreign keys.
Try executing the query while removing one left join after the other, so you can see where the problem is.
Try looking at the execution plan.

Write an additional column to query result with different values every time

I've been searching for quite a while now and I haven't been able to find an answer to what I was looking for. I have the following query:
SELECT DISTINCT o.titulo, o.fecha_estreno
FROM Obra o
WHERE (o.titulo LIKE '%Barcelona%' AND EXISTS(SELECT p.id_obra FROM Pelicula p WHERE p.id_obra = o.id_obra)) OR EXISTS(SELECT DISTINCT pa.id_obra
FROM Participa pa
WHERE pa.id_obra = o.id_obra AND EXISTS(SELECT DISTINCT l.nombre FROM Lugar l
WHERE l.nombre LIKE '%Barcelona%' AND EXISTS(SELECT DISTINCT tl.id_lugar FROM TieneLugar tl
WHERE tl.id_lugar = l.id_lugar AND tl.id_profesional = pa.id_profesional))) OR EXISTS(SELECT DISTINCT er.id_obra
FROM EstaRelacionado er
WHERE er.id_obra = o.id_obra AND EXISTS(SELECT DISTINCT k.keyword
FROM Keywords k
WHERE k.id_keyword = er.id_keyword AND k.keyword LIKE '%Barcelona%'));
What it basically does is it searches for every movie in my database which is related in some way to the city it gets. I wanted to have a third column showing for every result, with the reason the row is showing as a result (for example: TITLE CONTAINS IT, or ACTOR FROM THE MOVIE BORN THERE, etc.)
Thank you for your patience and help!
EDIT: As suggested, here are some examples of output. The column should show just the first cause related to the movie:
TITULO FECHA_ESTRENO CAUSE
---------- ---------------- ----------
Barcelona mia 1967 TITLE
https://www.postgresql.org/docs/7.4/static/functions-conditional.html
The SQL CASE expression is a generic conditional expression, similar
to if/else statements in other languages:
CASE WHEN condition THEN result
[WHEN ...]
[ELSE result]
END
CASE clauses can be used wherever an expression is valid. condition is an expression that returns a boolean result. If
the result is true then the value of the CASE expression is the result
that follows the condition. If the result is false any subsequent WHEN
clauses are searched in the same manner. If no WHEN condition is true
then the value of the case expression is the result in the ELSE
clause. If the ELSE clause is omitted and no condition matches, the
result is null.
Example for your case:
SELECT (CASE WHEN EXISTS(... l.nombre LIKE '%Barcelona%') THEN 'TITLE CONTAINS IT' WHEN <condition for actor> THEN 'ACTOR WAS BORN THERE' WHEN ... END) as reason
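A runnable version of the CASE idea, as a sketch only: it uses Python's built-in sqlite3 module, keeps just three of the question's tables, and the sample rows and the reason labels 'TITLE' and 'KEYWORD' are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Obra (id_obra INTEGER PRIMARY KEY, titulo TEXT);
    CREATE TABLE Keywords (id_keyword INTEGER PRIMARY KEY, keyword TEXT);
    CREATE TABLE EstaRelacionado (id_obra INTEGER, id_keyword INTEGER);
    INSERT INTO Obra VALUES
        (1, 'Barcelona mia'), (2, 'Otra historia'), (3, 'Sin relacion');
    INSERT INTO Keywords VALUES (1, 'Barcelona');
    -- works 1 and 2 both carry the keyword; work 1 also matches by title
    INSERT INTO EstaRelacionado VALUES (1, 1), (2, 1);
""")

rows = conn.execute("""
    SELECT titulo, reason
    FROM (
        SELECT o.id_obra, o.titulo,
               CASE
                   WHEN o.titulo LIKE '%Barcelona%' THEN 'TITLE'
                   WHEN EXISTS (SELECT 1
                                FROM EstaRelacionado er
                                JOIN Keywords k ON k.id_keyword = er.id_keyword
                                WHERE er.id_obra = o.id_obra
                                  AND k.keyword LIKE '%Barcelona%') THEN 'KEYWORD'
               END AS reason
        FROM Obra o
    )
    WHERE reason IS NOT NULL   -- no ELSE branch, so unmatched works are NULL
    ORDER BY id_obra
""").fetchall()

print(rows)  # [('Barcelona mia', 'TITLE'), ('Otra historia', 'KEYWORD')]
```

Because CASE stops at the first WHEN that is true, work 1 reports only 'TITLE' even though it also has the keyword, which matches the "first cause" requirement in the question.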
Here is one solution.
Create a subquery for each search condition.
Include the reason in the subqueries' projections.
Outer join the subqueries so it doesn't matter which one hits.
Filter to make sure that at least one of your subqueries has a positive result.
Use coalesce() to get one reason.
I haven't done all your conditions, and I've probably mangled your logic but this is the general idea:
SELECT o.titulo
, o.fecha_estreno
, coalesce(t1.reason, t2.reason) as reason
FROM Obra o
left outer join ( select id_obra, 'title contains it' as reason
from Obra
where titulo LIKE '%Barcelona%' ) t1
on t1.id_obra = o.id_obra
left outer join ( select distinct pa.id_obra , 'takes place there' as reason
from Participa pa
join TieneLugar tl
on tl.id_profesional = pa.id_profesional
join Lugar l
on tl.id_lugar = l.id_lugar
where l.nombre LIKE '%Barcelona%' ) t2
on t2.id_obra = o.id_obra
WHERE t1.id_obra is not null
or t2.id_obra is not null
/
coalesce() just returns the first non-null value, which means you won't see multiple reasons if you get more than one hit. So order the arguments to put the most powerful reasons first.
Also, you should consider using Oracle Text. It's the smartest way to wrangle this sort of keyword searching.

Access SQL query without duplicate results

I made a query and wanted it to return no duplicates, but some flights appear 3 times, and when I used DISTINCT or DISTINCTROW they still appeared 2 times.
SELECT f.flight_code,
f.status,
a.airport_name,
a1.airport_name,
f.departing_date+f.departing_time AS SupposedDepartingTime,
f.landing_date+f.landing_time AS SupposedLandingTime,
de.actual_takeoff_date+de.actual_takeoff_time AS ActualDepartingTime,
SupposedLandingTime+(ActualDepartingTime-SupposedDepartingTime) AS ActualLandingTime
FROM
(((Flights AS f
LEFT JOIN Aireports AS a
ON a.airport_code = f.depart_ap)
LEFT JOIN Aireports AS a1
ON f.target_ap = a1.airport_code)
LEFT JOIN Irregular_Events AS ie
ON f.flight_code = ie.flight_code)
LEFT JOIN Delay_Event AS de
ON ie.IE_code = de.delay_code;
I had to use LEFT JOIN because with INNER JOIN I missed some of the rows I wanted to show: I want to see all the flights, not only the flights that were delayed or canceled.
These are the results when I used INNER JOIN: you can see only the flights that have the status "ביטול" (cancellation) or "עיכוב" (delay), and that is not what I wanted.
[the results with LEFT JOIN][2]
[2]: https://i.stack.imgur.com/cgE2G.png
When I used DISTINCT, the rows with the number 6 in the first column appear only two times.
IMPORTANT!
I just checked my query and all the tables I use in it, and I found my problem, but I don't know how to fix it!
In the table Irregular_Events I have more than one event for flights 3, 6 and 8, and that is why I see extra rows when I use LEFT JOIN even though I use DISTINCT. Please give me some help!
Not entirely sure without seeing the table structure, but this might work:
SELECT f.flight_code,
f.status,
a.airport_name,
a1.airport_name,
f.departing_date+f.departing_time AS SupposedDepartingTime,
f.landing_date+f.landing_time AS SupposedLandingTime,
de.actual_takeoff_date+de.actual_takeoff_time AS ActualDepartingTime,
SupposedLandingTime+(ActualDepartingTime-SupposedDepartingTime) AS ActualLandingTime
FROM
((Flights AS f
LEFT JOIN Aireports AS a
ON a.airport_code = f.depart_ap)
LEFT JOIN Aireports AS a1
ON f.target_ap = a1.airport_code)
LEFT JOIN
(
SELECT
ie.flight_code,
de1.actual_takeoff_date,
de1.actual_takeoff_time
FROM
Irregular_Events ie
INNER JOIN Delay_Event AS de1
ON ie.IE_code = de1.delay_code
) AS de
ON f.flight_code = de.flight_code
It is hard to tell what the problem with your query is without a sample of the output and without a description of the structure of your tables.
But your problem is that you are querying from the flights table, which [I assume] can be linked to multiple irregular_events, which can possibly also be linked to multiple delay_event rows.
If you want to get only one row per flight, you need to make sure your joins return only one row per flight too. Maybe you can do it by adding one more condition to the join, or by adding a condition in a sub-query.
EDIT
You could try to add a GROUP BY to the query (the remaining selected columns would then need to be wrapped in an aggregate such as Max()):
GROUP BY
f.flight_code,
f.status,
a.airport_name,
a1.airport_name;
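One way to make the event join return a single row per flight is to collapse the events in a sub-query before joining. The following is a sketch on invented data using Python's built-in sqlite3 module, not Access; the single actual_takeoff column and the choice of MAX to pick the latest takeoff are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Flights (flight_code INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE Irregular_Events (IE_code INTEGER PRIMARY KEY, flight_code INTEGER);
    CREATE TABLE Delay_Event (delay_code INTEGER PRIMARY KEY, actual_takeoff TEXT);
    INSERT INTO Flights VALUES (5, 'on time'), (6, 'delayed');
    -- flight 6 has two irregular events, each with a delay record
    INSERT INTO Irregular_Events VALUES (100, 6), (101, 6);
    INSERT INTO Delay_Event VALUES
        (100, '2017-01-01 10:30'), (101, '2017-01-01 11:15');
""")

# Joining the event tables directly repeats flight 6 once per event.
direct = conn.execute("""
    SELECT f.flight_code, de.actual_takeoff
    FROM Flights f
    LEFT JOIN Irregular_Events ie ON f.flight_code = ie.flight_code
    LEFT JOIN Delay_Event de ON ie.IE_code = de.delay_code
    ORDER BY f.flight_code, de.actual_takeoff
""").fetchall()

# Collapsing the events to one row per flight first keeps one row per flight.
# Picking the latest takeoff with MAX is an assumption made for this sketch.
per_flight = conn.execute("""
    SELECT f.flight_code, d.actual_takeoff
    FROM Flights f
    LEFT JOIN (
        SELECT ie.flight_code, MAX(de.actual_takeoff) AS actual_takeoff
        FROM Irregular_Events ie
        JOIN Delay_Event de ON ie.IE_code = de.delay_code
        GROUP BY ie.flight_code
    ) d ON f.flight_code = d.flight_code
    ORDER BY f.flight_code
""").fetchall()

print(direct)      # [(5, None), (6, '2017-01-01 10:30'), (6, '2017-01-01 11:15')]
print(per_flight)  # [(5, None), (6, '2017-01-01 11:15')]
```

DISTINCT cannot remove the repeated flight in the direct join, because the two rows differ in the takeoff column.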

COUNT is outputting more than one row

I am having a problem with my SQL query using the count function.
When I don't have an inner join, it counts 55 rows. When I add the inner join into my query, it adds a lot to it. It suddenly became 102 rows.
Here is my SQL Query:
SELECT COUNT([fmsStage].[dbo].[File].[FILENUMBER])
FROM [fmsStage].[dbo].[File]
INNER JOIN [fmsStage].[dbo].[Container]
ON [fmsStage].[dbo].[File].[FILENUMBER] = [fmsStage].[dbo].[Container].[FILENUMBER]
WHERE [fmsStage].[dbo].[File].[RELATIONCODE] = 'SHIP02'
AND [fmsStage].[dbo].[Container].DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08'
GROUP BY [fmsStage].[dbo].[File].[FILENUMBER]
Also, I have to do TOP 1 at the SELECT statement because it returns 51 rows with random numbers inside of them. (They are probably not random, but I can't figure out what they are.)
What do I have to do to make it just count the rows from [fmsStage].[dbo].[file].[FILENUMBER]?
First, your query would be much clearer like this:
SELECT COUNT(f.[FILENUMBER])
FROM [fmsStage].[dbo].[File] f INNER JOIN
[fmsStage].[dbo].[Container] c
ON f.[FILENUMBER] = c.[FILENUMBER]
WHERE f.[RELATIONCODE] = 'SHIP02' AND
c.DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08';
No GROUP BY is necessary. Otherwise you'll just get one row per file number, which doesn't seem as useful as the overall count.
Note: You might want COUNT(DISTINCT f.[FILENUMBER]). Your question doesn't provide enough information to make a judgement.
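The three variants can be compared on a toy dataset. This is a sketch using Python's built-in sqlite3 module instead of SQL Server, with simplified table names and invented rows; the helper run() is hypothetical and only exists to avoid repeating the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE File (FILENUMBER INTEGER PRIMARY KEY, RELATIONCODE TEXT);
    CREATE TABLE Container (FILENUMBER INTEGER, DELIVERYDATE TEXT);
    INSERT INTO File VALUES (1, 'SHIP02'), (2, 'SHIP02');
    -- file 1 has two containers in the date range, file 2 has one
    INSERT INTO Container VALUES
        (1, '2016-10-06'), (1, '2016-10-07'), (2, '2016-10-08');
""")

def run(select_list, tail=""):
    """Run the question's join with a given select list and optional tail."""
    sql = f"""
        SELECT {select_list}
        FROM File f
        INNER JOIN Container c ON f.FILENUMBER = c.FILENUMBER
        WHERE f.RELATIONCODE = 'SHIP02'
          AND c.DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08'
        {tail}
    """
    return conn.execute(sql).fetchall()

joined_rows = run("COUNT(f.FILENUMBER)")              # one per file/container pair
distinct_files = run("COUNT(DISTINCT f.FILENUMBER)")  # one per file
per_file = run("COUNT(f.FILENUMBER)",
               "GROUP BY f.FILENUMBER ORDER BY f.FILENUMBER")

print(joined_rows)     # [(3,)]
print(distinct_files)  # [(2,)]
print(per_file)        # [(2,), (1,)]
```

With GROUP BY, each output row is the number of matching containers for one file, which would explain the seemingly random numbers described in the question.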
Just remove the GROUP BY clause:
SELECT COUNT([fmsStage].[dbo].[File].[FILENUMBER])
FROM [fmsStage].[dbo].[File]
INNER JOIN [fmsStage].[dbo].[Container]
ON [fmsStage].[dbo].[File].[FILENUMBER] = [fmsStage].[dbo].[Container].[FILENUMBER]
WHERE [fmsStage].[dbo].[File].[RELATIONCODE] = 'SHIP02'
AND [fmsStage].[dbo].[Container].DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08'

left join excludes nulls and takes left value

SELECT p.ticket AS posted,
e.ticket AS settled,
Sum(e.amount)
FROM post AS p
LEFT JOIN settle AS e
ON p.ticket = e.ticket
WHERE p.date = '2016-05-10 00:00:00.000'
GROUP BY p.ticket,
e.ticket
ORDER BY posted
I understand that the grouping or the where clause is the culprit, but I've tried many variations. The rows of the two tables relate as follows:
(Table1=Table2)
(total = item + tax)
So the second table has 2 rows that I sum. I need the date filter because the table has too much data. I've tried "is null" on the dates and in other places, but I can't get this right. Instead of null, it shows the value of the left table as if they match.
So I figured it out! I love this site and wanted to leave this for whoever runs into the same issue. Please correct me if my information is not accurate. From what I understand, and from SQL Fundamentals, the unmatched rows (the ones shown with nulls) are considered outer rows, and therefore I need a "Left Outer Join":
SELECT p.ticket AS posted,
e.ticket AS settled,
Sum(e.amount)
FROM post AS p
LEFT JOIN settle AS e
ON p.ticket = e.ticket
WHERE p.date = '2016-05-10 00:00:00.000'
GROUP BY p.ticket,
e.ticket
ORDER BY posted
A good rule that stood out for me and helps me check my joins with nulls: the where clause should only have conditions on the preserved table (the one whose rows are all kept), so that you keep the outer rows. The exception is "is null", which you can use with a column from the non-preserved table and still get the outer rows.
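Here is a small check of that rule, as a sketch only: it uses Python's built-in sqlite3 module, the post and settle tables are reduced to the columns in the query, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (ticket INTEGER, date TEXT);
    CREATE TABLE settle (ticket INTEGER, amount REAL);
    INSERT INTO post VALUES (1, '2016-05-10'), (2, '2016-05-10');
    -- ticket 1 is settled in two rows (item and tax); ticket 2 is not settled
    INSERT INTO settle VALUES (1, 10.0), (1, 2.5);
""")

# WHERE only references the preserved table (post), so unsettled tickets survive.
all_posted = conn.execute("""
    SELECT p.ticket, e.ticket, SUM(e.amount)
    FROM post AS p
    LEFT JOIN settle AS e ON p.ticket = e.ticket
    WHERE p.date = '2016-05-10'
    GROUP BY p.ticket, e.ticket
    ORDER BY p.ticket
""").fetchall()

# The IS NULL exception: testing the non-preserved table keeps only the outer rows.
unsettled = conn.execute("""
    SELECT p.ticket
    FROM post AS p
    LEFT JOIN settle AS e ON p.ticket = e.ticket
    WHERE p.date = '2016-05-10' AND e.ticket IS NULL
""").fetchall()

print(all_posted)  # [(1, 1, 12.5), (2, None, None)]
print(unsettled)   # [(2,)]
```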