Left join statement returning unexpected null values for rows/columns that have data - SQL

I have two tables that I'm writing a query against. Some of the columns can be found in one of the tables, while some of the columns are calculated.
For clarity, I will copy my query below:
select field_a,
    cast(field_b as int),
    field_c,
    field_d,
    Year,
    coalesce(cast(field_e as float), 0) as America_spend,
    coalesce(cast(field_e as float) / sum(cast(field_e as float)) over (partition by Year), 0) as total_spend
from table_a
left join table_b on
    table_a.field_a = table_b.field_a1 and
    table_a.field_b = table_b.field_b1 and
    table_a.Year = table_b.Year
group by field_a,
    field_b,
    Year
I have tables that look like this:
table a:
| field_a | field_b | field_c | field_d | Year | field_f | field_g | field_h |
|---------|---------|---------|---------|------|---------|---------|---------|
| data    | 1       | data    | data    | 2014 | data    | data    | data    |
| data    | 1       | data    | data    | 2014 | null    | data    | data    |
| 0       | 1       | data    | data    | 2014 | data    | data    | data    |
| data    | 1       | data    | data    | 2014 | null    | data    | data    |
| 0       | 1       | data    | data    | 2014 | data    | data    | data    |
table b:
| field_a1 | field_b1 | Year | field_c1 | field_j |
|----------|----------|------|----------|---------|
| null     | 1        | 2014 | data     | data    |
| data     | 1        | 2015 | data     | data    |
| null     | 0        | 2014 | data     | data    |
| data     | 1        | 2015 | data     | data    |
| null     | 0        | 2014 | data     | data    |
The problem I'm having is that some of the values in the total_spend column get assigned a value of null. Total spend is calculated per year, and this field should never be null. Likewise, the Year column doesn't contain a null value in either of the tables, but for some reason when I run the query, some of the rows come back with a null value in the Year column. This should never happen. Most of the results conform to what I would expect, but some do not.
I'm guessing that it has something to do with the fact that some of the rows in field_b are null and get converted to 0, but why does this matter?
I updated the tables and the queries to more accurately reflect the structure of the database.
Yes the query runs and I have no naming conflicts.

@SeanLange's comment is most likely the issue with your expected results: NULL does NOT equal NULL (NULL <> NULL). NULL is an "unknown" value in SQL, and two unknowns are never equal.
But you can eliminate your NULLs if you want to match them together as the same case. Simply use COALESCE() or ISNULL() to provide the same default value on both sides of your ON condition, and make sure the default is not represented within your dataset or you will get undesired results.
DECLARE @TableA AS TABLE (FieldA VARCHAR(5), FieldB INT, Yr INT)
DECLARE @TableB AS TABLE (FieldA VARCHAR(5), FieldC INT, Yr INT)

INSERT INTO @TableA (FieldA, FieldB, Yr)
VALUES (NULL,1,2014),('data',1,2015),(NULL,1,2014),('data',1,2015),(NULL,1,2014)

INSERT INTO @TableB (FieldA, FieldC, Yr)
VALUES (NULL,1,2014),('data',1,2015),(NULL,1,2014),('data',1,2015),(NULL,1,2014)

SELECT *
FROM @TableA a
LEFT JOIN @TableB b
    ON COALESCE(a.FieldA,'NULLVALUE') = COALESCE(b.FieldA,'NULLVALUE')
    AND a.Yr = b.Yr
The particular example dataset you provided repeats the FieldA/Yr combinations, so the results fan out a little, but the approach still works.
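If you would rather not pick a sentinel value at all, an alternative worth knowing (my addition, assuming SQL Server 2008+; not part of the original answer) is the EXISTS/INTERSECT trick: INTERSECT compares rows with NULL-safe semantics, so two NULLs count as a match without any default value:

SELECT *
FROM @TableA a
LEFT JOIN @TableB b
    -- INTERSECT treats NULL and NULL as equal, so no sentinel value is needed
    ON EXISTS (SELECT a.FieldA INTERSECT SELECT b.FieldA)
    AND a.Yr = b.Yr

On engines that support it (including SQL Server 2022+), the standard predicate a.FieldA IS NOT DISTINCT FROM b.FieldA expresses the same thing directly.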

Related

HQL, insert two rows if a condition is met

I have the following table called table_persons in Hive:
+--------+------+------------+
| people | type | date |
+--------+------+------------+
| lisa | bot | 19-04-2022 |
| wayne | per | 19-04-2022 |
+--------+------+------------+
If type is "bot", I have to add two rows to the table d1_info; if type is "per", I only have to add one row, so the result is the following:
+---------+------+------------+
| db_type | info | date |
+---------+------+------------+
| x_bot | x | 19-04-2022 |
| x_bnt | x | 19-04-2022 |
| x_per | b | 19-04-2022 |
+---------+------+------------+
How can I add two rows if this condition is met?
With a CASE WHEN, maybe?
You may try using a union to duplicate the rows with type bot. In the following example, the first query selects all records and the second selects only those with bot.
Edit
In response to the edited question, I have added an additional parity column (storing 1 or 0) named original to differentiate the duplicated entries.
SELECT
p1.*,
1 as original
FROM
table_persons p1
UNION ALL
SELECT
p1.*,
0 as original
FROM
table_persons p1
WHERE p1.type='bot'
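For the sample data above, this union yields three rows; lisa is duplicated because her type is bot:
+--------+------+------------+----------+
| people | type | date       | original |
+--------+------+------------+----------+
| lisa   | bot  | 19-04-2022 | 1        |
| wayne  | per  | 19-04-2022 | 1        |
| lisa   | bot  | 19-04-2022 | 0        |
+--------+------+------------+----------+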
You may then insert this into your other table d1_info, using the above query as a subquery or CTE with the desired CASE-expression transformations, e.g.
INSERT INTO d1_info
(`db_type`, `info`, `date`)
WITH merged_data AS (
SELECT
p1.*,
1 as original
FROM
table_persons p1
UNION ALL
SELECT
p1.*,
0 as original
FROM
table_persons p1
WHERE p1.type='bot'
)
SELECT
CONCAT('x_',CASE
WHEN m1.type='per' THEN m1.type
WHEN m1.original=1 AND m1.type='bot' THEN m1.type
ELSE 'bnt'
END) as db_type,
CASE
WHEN m1.type='per' THEN 'b'
ELSE 'x'
END as info,
m1.date
FROM
merged_data m1
ORDER BY m1.people,m1.date;
I think what you want is to create a new table that captures your logic. This would simplify your query and let you easily add new types without having to edit the logic of a CASE expression. It may also make your logic cleaner to review later.
CREATE TABLE table_persons (
`people` VARCHAR(5),
`type` VARCHAR(3),
`date` VARCHAR(10)
);
INSERT INTO table_persons
VALUES
('lisa', 'bot', '19-04-2022'),
('wayne', 'per', '19-04-2022');
CREATE TABLE info (
`type` VARCHAR(5),
`db_type` VARCHAR(5),
`info` VARCHAR(1)
);
insert into info
values
('bot', 'x_bot', 'x'),
('bot', 'x_bnt', 'x'),
('per','x_per','b');
and then you can easily do a join:
select
info.db_type,
info.info,
persons.date date
from
table_persons persons inner join info
on
info.type = persons.type
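A side benefit of the mapping table: supporting a new type later is purely a data change, with no query edits. For example, with a hypothetical 'adm' type (made up for illustration; it is not in the original data):

-- 'adm' and its mappings are hypothetical, added only to show the extension path
INSERT INTO info
VALUES ('adm', 'x_adm', 'a');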

SQLite Create Table and Insert & Update Value

While studying a Udemy SQL course, I got stuck on the problem below. Could you help me solve it? It is very difficult for me.
Here's the question:
The goal of this exercise is to find out whether each one of the 4 numeric attributes in TemporalData is
positively or negatively correlated with sales.
Requirements:
Each example in the sample consists of a pair (Store, WeekDate). This means that Sales data must be summed across all departments before the correlation is computed.
Return a relation with schema (AttributeName VARCHAR(20), CorrelationSign Integer).
The values of AttributeName can be hardcoded string literals, but the values
of CorrelationSign must be computed automatically using SQL queries over
the given database instance.
You can use multiple SQL statements to compute the result. It might be of help to create and update your own tables.
Here's the sales table:
| store | dept | weekdate   | weeklysales |
|-------|------|------------|-------------|
| 1     | 1    | 2010-07-01 | 2400.0      |
| 2     | 1    | 2010-07-02 | 40.0        |
Here's the temporaldata table:
| store | weekdate   | temperature | fuelprice | cpi      | unemploymentrate |
|-------|------------|-------------|-----------|----------|------------------|
| 1     | 2010-07-01 | 100.14      | 2.981     | 125.1222 | 9.402            |
| 2     | 2010-07-02 | 99.13       | 3.823     | 129.2912 | 14.81            |
So I wrote the code below. Could you show me how to solve this question?
DROP TABLE IF EXISTS CorrelationGroup;
CREATE TABLE CorrelationGroup(
    AttributeName VARCHAR(20),
    CorrelationSign INTEGER
);
SELECT SUM(weeklysales)
FROM temporaldata t, sales s
WHERE s.weekdate = t.weekdate AND s.store = t.store;
SELECT AVG(t.temperature) AS 'temp', AVG(t.fuelprice) AS 'fuel', AVG(t.cpi) AS 'cpi', AVG(t.unemploymentrate) AS 'unemploymentrate'
FROM sales s, temporaldata t
WHERE s.store = t.store AND s.weekdate = t.weekdate;
INSERT INTO CorrelationGroup VALUES ('Temp', 'The Value Of CorrelationSign');
-- 3 more
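For what it's worth, here is one way the exercise could be finished (a sketch, assuming SQLite and the table/column names above; not the course's official solution). The sign of the Pearson correlation equals the sign of the covariance, because the standard deviations in the denominator are always positive, and the covariance sign can be computed as AVG(x*y) - AVG(x)*AVG(y):

-- Sketch for the temperature attribute only
WITH store_sales AS (
    -- sum sales across departments for each (store, weekdate) pair
    SELECT store, weekdate, SUM(weeklysales) AS total_sales
    FROM sales
    GROUP BY store, weekdate
)
INSERT INTO CorrelationGroup (AttributeName, CorrelationSign)
SELECT 'Temperature',
       CASE WHEN AVG(t.temperature * s.total_sales)
                 - AVG(t.temperature) * AVG(s.total_sales) >= 0
            THEN 1 ELSE -1 END
FROM store_sales s
JOIN temporaldata t ON t.store = s.store AND t.weekdate = s.weekdate;
-- Repeat with fuelprice, cpi, and unemploymentrate for the other three rows.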

Use IN to compare Array of Values against a table of data

I want to compare an array of values against the rows of a table and return only the rows in which the data differ.
Suppose I have myTable:
| ItemCode | ItemName | FrgnName |
|----------|----------|----------|
| CD1 | Apple | Mela |
| CD2 | Mirror | Specchio |
| CD3 | Bag | Borsa |
Now, using the SQL operator IN, I would like to compare the rows above against an array of values pasted from an Excel file, so in theory I would have to write something like:
WHERE NOT IN (
ARRAY[CD1, Apple, Mella],
ARRAY[CD2, Miror, Specchio],
ARRAY[CD3, Bag, Borsa]
)
The query should return rows 1 and 2, since "MELLA" and "MIROR" are in fact typos.
You could use a VALUES expression to emulate a table of those arrays, like so:
... myTable AS t
LEFT JOIN (
VALUES (1, 'CD1','Apple','Mella')
, (1, 'CD2', 'Miror', 'Specchio')
, (1, 'CD3', 'Bag', 'Borsa')
) AS v(rowPresence, a, b, c)
ON t.ItemCode = v.a AND t.ItemName = v.b AND t.FrgnName = v.c
WHERE v.rowPresence IS NULL
Technically, in your scenario, you can do without the rowPresence field I added: since none of the values in your arrays are NULL, any of the columns would do for the IS NULL test. I basically added it to point to the more general case.
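Since the question started from NOT IN, an equivalent NOT EXISTS formulation may read more naturally (a sketch using the standard VALUES derived-table syntax; engine support varies):

SELECT *
FROM myTable t
WHERE NOT EXISTS (
    SELECT 1
    FROM (VALUES ('CD1', 'Apple', 'Mella')
               , ('CD2', 'Miror', 'Specchio')
               , ('CD3', 'Bag',   'Borsa')
         ) AS v(code, name, frgn)
    -- a row survives only if no array entry matches it exactly
    WHERE v.code = t.ItemCode
      AND v.name = t.ItemName
      AND v.frgn = t.FrgnName
);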

Join two tables - One common column with different values

I have been searching around for how to do this for days - unfortunately I don't have much experience with SQL Queries, so it's been a bit of trial and error.
Basically, I have created two tables - both with one DateTime column and a different column with values in.
The DateTime column has different values in each table.
So...
ACOQ1 (Table 1)
===============
| DateTime | ACOQ1_Pump_Running |
|----------+--------------------|
| 7:14:12 | 1 |
| 8:09:03 | 1 |
ACOQ2 (Table 2)
===============
| DateTime | ACOQ2_Pump_Running |
|----------+--------------------|
| 3:54:20 | 1 |
| 7:32:57 | 1 |
I want to combine these two tables to look like this:
| DateTime | ACOQ1_Pump_Running | ACOQ2_Pump_Running |
|----------+--------------------+--------------------|
| 3:54:20 | 0 OR NULL | 1 |
| 7:14:12 | 1 | 0 OR NULL |
| 7:32:57 | 0 OR NULL | 1 |
| 8:09:03 | 1 | 0 OR NULL |
I have achieved this by creating a third table that UNIONs the DateTime columns from both tables and then uses that third table's DateTime column for the new table, but I was wondering if there is a way to skip this step.
(Eventually I will be adding more and more columns from different tables, and don't really want to add yet more processing time by creating a joint DateTime table that may not be necessary.)
My working code at the moment:
CREATE TABLE JointDateTime
(
    DateTime CHAR(50),
    CONSTRAINT [pk_Key3] PRIMARY KEY (DateTime)
);
INSERT INTO JointDateTime (DateTime)
SELECT ACOQ1.DateTime FROM ACOQ1
UNION
SELECT ACOQ2.DateTime FROM ACOQ2;
SELECT JointDateTime.DateTime, ACOQ1.ACOQ1_NO_1_PUMP_RUNNING, ACOQ2.ACOQ2_NO_1_PUMP_RUNNING
FROM (SELECT ACOQ1.DateTime FROM ACOQ1
UNION
SELECT ACOQ2.DateTime FROM ACOQ2) JointDateTime
LEFT OUTER JOIN ACOQ1
ON JointDateTime.DateTime = ACOQ1.DateTime
LEFT OUTER JOIN ACOQ2
ON JointDateTime.DateTime = ACOQ2.DateTime
You need a plain old FULL OUTER JOIN like this.
SELECT COALESCE(A1.DateTime,A2.DateTime) DateTime,ACOQ1_Pump_Running, ACOQ2_Pump_Running
FROM ACOQ1 A1
FULL OUTER JOIN ACOQ2 A2
ON A1.DateTime = A2.DateTime
This will give you NULL for ACOQ1_Pump_Running, ACOQ2_Pump_Running for rows which do not match the date in the corresponding table. If you need 0 just use COALESCE or ISNULL.
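For example, a zero-filled version of the same query might look like this (T-SQL assumed, to match ISNULL):

SELECT COALESCE(A1.DateTime, A2.DateTime) AS DateTime,
       ISNULL(A1.ACOQ1_Pump_Running, 0) AS ACOQ1_Pump_Running,
       ISNULL(A2.ACOQ2_Pump_Running, 0) AS ACOQ2_Pump_Running
FROM ACOQ1 A1
FULL OUTER JOIN ACOQ2 A2
    ON A1.DateTime = A2.DateTime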
Side note: in your script, I can see you are using DateTime CHAR(50). Please use appropriate types.

Append a zero to value if necessary in SQL statement DB2

I have a complex SQL statement in which I need to match up two tables based on a join. The initial part of the complex query has a location number that is stored in one table as a SMALLINT, and the second table has the store number stored as a CHAR(4). I have been able to cast the SMALLINT to a CHAR(4) like this:
CAST(STR_NBR AS CHAR(4)) = LOCN_NBR
The issue is that because the SMALLINT suppresses the leading '0', the join returns null values from the right-hand side of the LEFT OUTER JOIN.
Example:
Table set A (SMALLINT)              Table set B (CHAR(4))
| 96  |                             | 096 |
| 97  |                             | 097 |
| 99  |                             | 099 |
| 100 |   <- These return ->        | 100 |
| 101 |   <- These return ->        | 101 |
| 102 |   <- These return ->        | 102 |
I need to make it so that they all return, but since it is in a join statement, how do you append a zero to the beginning in certain conditions and not in others?
SELECT RIGHT('0000' || STR_NBR, 4)
FROM TABLE_A
Casting Table B's CHAR to SMALLINT would work as well (DB2 has no TINYINT):
SELECT ...
FROM TABLE_A A
JOIN TABLE_B B
ON A.num = CAST(B.txt AS SMALLINT)
Try the LPAD function:
LPAD(col, 3, '0')
I was able to successfully match them and obtain a 3-digit location number at all times by doing the following:
STR_NBR was originally defined as a SMALLINT(2)
LOCN_NO was originally defined as a CHAR(4)
SELECT ...
FROM TABLE_A AS A
JOIN TABLE_B AS B
ON CAST(SUBSTR(DIGITS(A.STR_NBR), 3, 3) AS CHAR(4)) = B.LOCN_NO
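To see why this works (DB2): DIGITS() renders a SMALLINT as a fixed-width, zero-padded CHAR(5), so SUBSTR(..., 3, 3) keeps the last three digits:

-- DIGITS(96)  -> '00096' -> SUBSTR(..., 3, 3) -> '096'
-- DIGITS(100) -> '00100' -> SUBSTR(..., 3, 3) -> '100'
SELECT DIGITS(CAST(96 AS SMALLINT)) AS padded
FROM SYSIBM.SYSDUMMY1;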