Postgres Table Join Added Additional Rows

I noticed that the number of rows increased after running the query below.
The table data.consolidated_billing_journal has about 15,883,000 rows, but the query returned 15,900,000 rows.
I made sure that both joined tables, cluster_profit_center (profit_center) and cname_fixed_match (cname), are joined on their primary keys, so I expected no duplicates.
SELECT l.*, cl0, cl1, cl2, cl_tag, std_cname
FROM data.consolidated_billing_journal l
LEFT JOIN ref.cluster_profit_center cpc
ON l.cluster_profit_center = cpc.profit_center
LEFT JOIN cname.cname_fixed_match cfm
ON l.cust_acc_name = cfm.cname OR l.billing_acc_name = cfm.cname
What could cause the additional rows? I expected this to behave like Excel's VLOOKUP, where the fact table keeps a fixed number of rows.
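One possible cause, offered as a guess since the data is not shown: the OR in the second join condition lets one journal row match two different cname rows, one through cust_acc_name and one through billing_acc_name, even though cname is a primary key. The sketch below reproduces this in an in-memory SQLite database through Python's sqlite3 module. The table `journal` and all sample values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical miniature versions of the tables in the question.
    CREATE TABLE journal (id INTEGER PRIMARY KEY, cust_acc_name TEXT, billing_acc_name TEXT);
    CREATE TABLE cname_fixed_match (cname TEXT PRIMARY KEY, std_cname TEXT);
    INSERT INTO journal VALUES
        (1, 'Acme Ltd', 'Acme Ltd'),
        (2, 'Acme Ltd', 'Acme Holdings'),
        (3, 'Unknown',  'Nobody');
    INSERT INTO cname_fixed_match VALUES
        ('Acme Ltd', 'ACME'),
        ('Acme Holdings', 'ACME');
""")

# Same shape as the second LEFT JOIN in the question.
rows = conn.execute("""
    SELECT l.id, cfm.cname
    FROM journal l
    LEFT JOIN cname_fixed_match cfm
      ON l.cust_acc_name = cfm.cname OR l.billing_acc_name = cfm.cname
    ORDER BY l.id, cfm.cname
""").fetchall()

# Diagnostic: which journal rows match more than one cname row?
dupes = conn.execute("""
    SELECT l.id, COUNT(*) AS matches
    FROM journal l
    JOIN cname_fixed_match cfm
      ON l.cust_acc_name = cfm.cname OR l.billing_acc_name = cfm.cname
    GROUP BY l.id
    HAVING COUNT(*) > 1
""").fetchall()

print(len(rows))  # 3 journal rows produce 4 result rows
print(dupes)      # journal row 2 matched two cname rows
```

Running the same GROUP BY ... HAVING diagnostic against the real tables would show whether this is what produces the extra rows.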

Related

Outputting the data from several sql tables without having a common value

I have a SELECT query that combines several tables. PRODUCTION_ORDER_RESULTS, PRODUCTION_ORDERS and SERVICE_GUARANTY_NEW share common values, but the STOCKS table does not.
SELECT PR_ORDERS.ARRIVED_CITY,
PR_ORDERS.MONTAJ_DATE,
PR_ORDER_RESULT.TRANSFER_DATE,
PR_ORDERS.P_ORDER_ID,
PR_ORDER_RESULT.P_ORDER_ID,
SG.SALE_CONSUMER_ID,
SG.IS_SERI_SONU,
S.BRAND_ID,
S.PROPERTY
FROM workcube_test_1.PRODUCTION_ORDER_RESULTS PR_ORDER_RESULT,
workcube_test_1.PRODUCTION_ORDERS PR_ORDERS,
workcube_test_1.SERVICE_GUARANTY_NEW SG,
workcube_test_1.STOCKS S
WHERE PR_ORDER_RESULT.P_ORDER_ID = PR_ORDERS.P_ORDER_ID
AND PR_ORDER_RESULT.PR_ORDER_ID = SG.PROCESS_ID
When I run the query, it shows the output as below.
The problem is that four rows are returned from PRODUCTION_ORDER_RESULTS, PRODUCTION_ORDERS and SERVICE_GUARANTY_NEW. Once I added the STOCKS table, the arrived_city, montaj_date and transfer_date columns appear side by side with the STOCKS rows, but those column values should be NULL, not filled with data.
I tried a UNION with the STOCKS table, but the unioned values are ignored and I cannot use them in HTML blocks.
The query needs at least one more join condition, one that covers the STOCKS table. I think a column such as STOCK_ID might exist in a table such as PRODUCTION_ORDER_RESULTS and could be used to join to STOCKS. The missing condition should be the reason for the multiplied rows: without a common column, the current query performs a CROSS JOIN against STOCKS, so every row from the other tables is repeated once for each record in STOCKS. So rearrange your query as
SELECT PR_ORDERS.ARRIVED_CITY,
PR_ORDERS.MONTAJ_DATE,
PR_ORDER_RESULT.TRANSFER_DATE,
PR_ORDERS.P_ORDER_ID,
PR_ORDER_RESULT.P_ORDER_ID,
SG.SALE_CONSUMER_ID,
SG.IS_SERI_SONU,
S.BRAND_ID,
S.PROPERTY
FROM workcube_test_1.PRODUCTION_ORDER_RESULTS PR_ORDER_RESULT
JOIN workcube_test_1.PRODUCTION_ORDERS PR_ORDERS
ON PR_ORDER_RESULT.P_ORDER_ID = PR_ORDERS.P_ORDER_ID
JOIN workcube_test_1.SERVICE_GUARANTY_NEW SG
ON PR_ORDER_RESULT.PR_ORDER_ID = SG.PROCESS_ID
JOIN workcube_test_1.STOCKS S
ON PR_ORDER_RESULT.STOCK_ID = S.ID
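To see the effect of the missing condition in isolation, here is a minimal sketch using Python's sqlite3 module with an in-memory database. The tables `order_results` and `stocks`, their columns, and the sample values are hypothetical stand-ins; as in the answer above, a stock_id column is assumed to exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for PRODUCTION_ORDER_RESULTS and STOCKS.
    CREATE TABLE order_results (pr_order_id INTEGER PRIMARY KEY, stock_id INTEGER);
    CREATE TABLE stocks (id INTEGER PRIMARY KEY, brand_id INTEGER);
    INSERT INTO order_results VALUES (1, 10), (2, 20), (3, 10), (4, 30);
    INSERT INTO stocks VALUES (10, 100), (20, 200), (30, 300);
""")

# Comma-separated tables with no condition between them: a cross join.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM order_results r, stocks s"
).fetchone()[0]

# The same tables with an explicit join condition.
joined_count = conn.execute(
    "SELECT COUNT(*) FROM order_results r JOIN stocks s ON r.stock_id = s.id"
).fetchone()[0]

print(cross_count)   # 4 order rows x 3 stock rows
print(joined_count)  # one stock row per order row
```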

Joining 3 tables in Sqlite and not receiving expected output

I understand similar questions have been posted; however, my issue isn't an error but rather the lack of the desired result. I'm trying to join 3 tables, each with 10,000 observations, and combine them into one table. However, when I use INNER JOIN the observations reduce to a little over 4,000. I understand that INNER JOIN is essentially an intersection, but I'm expecting 10,000 observations and, based on my code, I don't see how the reduction is occurring. Here is my code:
SELECT *
FROM Characteristics
INNER JOIN Prices ON Prices.pid = Characteristics.pid
INNER JOIN Locations ON Locations.tid = Characteristics.tid
;
CHARACTERISTICS
||Property_Id|| ||Beds|| ||Baths|| ||Type_ID||
PRICES
||Price|| ||Year|| ||Property_ID||
LOCATIONS
||Type_ID|| ||X coord|| ||Y coord||
Those are representative of the tables. I didn't include numbers because of formatting issues, but as you can imagine, the numbers contained in Property_Id and Type_ID are the same for all columns regardless of table. What I would like is one table with each of the respective columns containing 10,000 rows. I've checked for NA values in R and the columns are all of the same length.
If you want to keep all characteristics -- even when there are no matches in the other tables -- then use left join:
SELECT *
FROM Characteristics c LEFT JOIN
Prices p
ON p.pid = c.pid LEFT JOIN
Locations l
ON l.tid = c.tid;
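The difference between the two join types can be checked on a small scale. The sketch below uses Python's sqlite3 module with made-up rows in tables named after the ones in the question. It also shows an anti-join that lists the Characteristics rows an INNER JOIN would drop, which may help locate the mismatched keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up sample rows; only the key columns are modelled.
    CREATE TABLE Characteristics (pid INTEGER, tid INTEGER);
    CREATE TABLE Prices (pid INTEGER, price INTEGER);
    CREATE TABLE Locations (tid INTEGER, x REAL, y REAL);
    INSERT INTO Characteristics VALUES (1, 1), (2, 2), (3, 9), (4, 1);
    INSERT INTO Prices VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO Locations VALUES (1, 0.0, 0.0), (2, 1.0, 1.0);
""")

inner_count = conn.execute("""
    SELECT COUNT(*)
    FROM Characteristics c
    INNER JOIN Prices p ON p.pid = c.pid
    INNER JOIN Locations l ON l.tid = c.tid
""").fetchone()[0]

left_count = conn.execute("""
    SELECT COUNT(*)
    FROM Characteristics c
    LEFT JOIN Prices p ON p.pid = c.pid
    LEFT JOIN Locations l ON l.tid = c.tid
""").fetchone()[0]

# Rows that have no partner in at least one of the other tables.
unmatched = [row[0] for row in conn.execute("""
    SELECT c.pid
    FROM Characteristics c
    LEFT JOIN Prices p ON p.pid = c.pid
    LEFT JOIN Locations l ON l.tid = c.tid
    WHERE p.pid IS NULL OR l.tid IS NULL
    ORDER BY c.pid
""")]

print(inner_count, left_count, unmatched)
```

Here pid 3 has no matching location and pid 4 has no matching price, so the INNER JOIN keeps 2 of the 4 rows while the LEFT JOIN keeps all 4.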

Joining two tables to update results in third table using like

I am trying to join two tables and update the results in the third table.
So table A is the results table and it has the columns customer number and score.
Table B has customer number and ind_code and table C has ind_code and ind_score.
So the output of the query should be such that the ind_code in table B and C should join together based on the first two digits and ind_score should be updated in Table A in the score column. Table A and Table B should be joined on the basis of customer number.
Could anyone please help? I tried multiple queries but nothing seems to work. I am using Oracle SQL Developer.
Generally, a JOIN shouldn't have to cut a field's value apart, so this structure does not look correct to me. But if that is your structure, and if I understand correctly:
UPDATE TableA
SET score =
(SELECT MAX(C.ind_score)
FROM TableC C
JOIN TableB B
ON C.ind_code = SUBSTR(B.ind_code, 1, 2) -- Oracle spells this SUBSTR, not SUBSTRING
WHERE B.customernumber = TableA.customernumber)
I use the MAX aggregate function in the subquery because I don't know whether the truncated ind_code from TableB is unique per customer (for example, you might have ind_code 5555 and 5554).
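A small dry run of this kind of correlated UPDATE, sketched with Python's sqlite3 module rather than Oracle. The sample rows are invented, and the sketch assumes, as the query above does, that TableC stores two-character codes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented sample data; TableC is assumed to hold two-character codes.
    CREATE TABLE TableA (customernumber INTEGER PRIMARY KEY, score INTEGER);
    CREATE TABLE TableB (customernumber INTEGER, ind_code TEXT);
    CREATE TABLE TableC (ind_code TEXT, ind_score INTEGER);
    INSERT INTO TableA VALUES (1, NULL), (2, NULL), (3, NULL);
    INSERT INTO TableB VALUES (1, '5555'), (1, '5554'), (2, '4210');
    INSERT INTO TableC VALUES ('55', 7), ('42', 3);

    UPDATE TableA
    SET score =
        (SELECT MAX(C.ind_score)
         FROM TableC C
         JOIN TableB B
           ON C.ind_code = SUBSTR(B.ind_code, 1, 2)
         WHERE B.customernumber = TableA.customernumber);
""")

scores = conn.execute(
    "SELECT customernumber, score FROM TableA ORDER BY customernumber"
).fetchall()
print(scores)
```

Customer 1 has two codes sharing the prefix 55, and MAX collapses them to a single value. Customer 3 has no row in TableB, so its score is set to NULL; since the UPDATE has no WHERE clause, every row of TableA is overwritten.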

Full outer join joining together every record multiple times

Query below:
select
cu.course_id as 'bb_course_id',
cu.user_id as 'bb_user_id',
cu.role as 'bb_role',
cu.available_ind as 'bb_available_ind',
CASE cu.row_status WHEN 0 THEN 'ENABLED' ELSE 'DISABLED' END AS 'bb_row_status',
eff.course_id as 'registrar_course_id',
eff.user_id as 'registrar_user_id',
eff.role as 'registrar_role',
eff.available_ind as 'registrar_available_ind',
CASE eff.row_status WHEN 'DISABLE' THEN 'DISABLED' END as 'registrar_row_status'
into enrollments_comparison_temp
from narrowed_users_enrollments cu
full outer join enrollments_feed_file eff on cu.course_id = eff.course_id
Quick background: I'm taking the data from a replicated table and selecting it into narrowed_users_enrollments based on some criteria. In a script I'm taking a text feed file, with enrollment data, and inserting it into enrollments_feed_file. The purpose is to compare the most recent enrollment data with enrollments already in the database.
However the issue is that joining these tables results in about 160,000 rows when I'm really only expecting about 22,000. The point of doing this comparison is so that I can look for nulled values on either side of the join. For example, if the table on the right contains a null, then disable the enrollment record. If the table on the left contains a null, then add this student's enrollment.
I know it's a little off because I'm not using PKs or FKs. This is what is selected into the table:
Here's a screenshot showing a select * from the enrollments table on the left and a feed file on the right.
http://i.imgur.com/0ZPZ9HS.png
Here's a screenshot showing the newly created table from the full outer join.
http://i.imgur.com/89ssAkS.png
As you can see, even though there's only one matching enrollment (the matching jmartinez12 columns), there are 4 extra rows created for the same record on the left, one for each of the enrollments on the right. What I'm trying to get is 5 rows, with the first being how it is in the screenshot (matching pre-existing enrollment and enrollment in the feed file), but in the next 4 rows the bb_* columns should be NULL up to the registrar_course_id.
Am I overlooking something simple here? I've tried a SELECT DISTINCT, and I've added a WHERE clause specifying that the course_ids are equal; however, that ensures that I won't get the NULL rows, which I need. I have also joined the tables on the user_id, but the results are still the same.
One quick suggestion is to add the DISTINCT clause. If the records you are getting are complete duplicates, that may cut the result down to what you are expecting.
The fix was to also join on:
ON cu.course_id = eff.course_id AND cu.user_id = eff.user_id
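Why the second key matters can be shown with a few rows. The sketch below uses Python's sqlite3 module and invented sample enrollments. Because FULL OUTER JOIN is only available in newer SQLite versions, it is emulated here with a LEFT JOIN plus a UNION ALL of the unmatched right-hand rows; that emulation is a choice made for this sketch, not part of the original fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented sample enrollments for a single course.
    CREATE TABLE narrowed_users_enrollments (course_id TEXT, user_id TEXT);
    CREATE TABLE enrollments_feed_file (course_id TEXT, user_id TEXT);
    INSERT INTO narrowed_users_enrollments VALUES ('C1', 'u1'), ('C1', 'u2');
    INSERT INTO enrollments_feed_file VALUES ('C1', 'u1'), ('C1', 'u3'), ('C1', 'u4');
""")

# Joining on course_id alone pairs every enrollment with every feed row
# of the same course.
course_only = conn.execute("""
    SELECT COUNT(*)
    FROM narrowed_users_enrollments cu
    JOIN enrollments_feed_file eff ON cu.course_id = eff.course_id
""").fetchone()[0]

# Emulated full outer join on both keys.
both_keys = conn.execute("""
    SELECT cu.user_id AS bb_user_id, eff.user_id AS registrar_user_id
    FROM narrowed_users_enrollments cu
    LEFT JOIN enrollments_feed_file eff
      ON cu.course_id = eff.course_id AND cu.user_id = eff.user_id
    UNION ALL
    SELECT cu.user_id, eff.user_id
    FROM enrollments_feed_file eff
    LEFT JOIN narrowed_users_enrollments cu
      ON cu.course_id = eff.course_id AND cu.user_id = eff.user_id
    WHERE cu.user_id IS NULL
""").fetchall()

print(course_only)  # 2 x 3 pairings
print(both_keys)    # one matched row, one left-only, two right-only
```

With both keys in the condition, u1 appears once with both sides filled, u2 has a NULL on the registrar side, and u3 and u4 have a NULL on the bb side, which is the layout the comparison needs.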

SQL Inner Outer Join confusion

I'm having a little confusion with a certain example.
I am supposed to list all orders and their corresponding details.
This is what I'm doing:
SELECT *
FROM Orders
LEFT OUTER JOIN [Order Details]
ON Orders.OrderID=[Order Details].OrderID;
This gives the number of rows = 2155.
Now the problem is, the number of rows in the Orders table is 830. How can a left outer join create more rows?
By definition of left outer join, all the rows from the left table are taken and matching records from the second table are added.
I checked the number of rows in the Order Details table; that is 2155.
Why is the left outer join using all rows from the Order Details table?
LEFT JOIN takes all the rows from the table you define on the left side of the join and matches them against records from the right table.
If there's no match, all columns of the right table have NULL values.
If there's a match, all matching records from the right table are returned. If your relationship is 1-to-many (as in your case), it means that there may be more than one record returned from the right table for each record in the left table.
LEFT OUTER JOIN will match all records in the right table, just as an INNER JOIN will. The difference is a LEFT JOIN will preserve records from the left table with no match in the right table.
In this scenario, all records in [ORDER DETAILS] have a corresponding entry in ORDERS, which is why the total number of records matches the number of rows in ORDER DETAILS.
Based on the table descriptions, this is exactly what you want. Having an ORDER DETAIL without an ORDER would be a much more serious issue.
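The row arithmetic described above can be reproduced with a handful of rows. This is a minimal sketch using Python's sqlite3 module; the tables `orders` and `order_details` and their contents are invented. One order without details is added to show the only case where LEFT JOIN and INNER JOIN differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented sample data: order 1 has 2 detail rows, order 2 has 3,
    -- order 3 has none.
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
    CREATE TABLE order_details (order_id INTEGER, product TEXT);
    INSERT INTO orders VALUES (1), (2), (3);
    INSERT INTO order_details VALUES
        (1, 'a'), (1, 'b'),
        (2, 'c'), (2, 'd'), (2, 'e');
""")

def count(sql):
    """Run a query that returns a single count and return it as an int."""
    return conn.execute(sql).fetchone()[0]

detail_rows = count("SELECT COUNT(*) FROM order_details")
inner_rows = count("""
    SELECT COUNT(*) FROM orders o
    INNER JOIN order_details d ON o.order_id = d.order_id
""")
left_rows = count("""
    SELECT COUNT(*) FROM orders o
    LEFT OUTER JOIN order_details d ON o.order_id = d.order_id
""")

print(detail_rows, inner_rows, left_rows)
```

The inner join returns one row per detail row (5). The left join returns those same 5 rows plus one NULL-padded row for order 3, giving 6 rows from only 3 orders.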