SQL query malfunction

So I'm trying to use INNER JOIN in my SQL command because I want to replace the foreign key ID numbers with the text value of each column. However, when I use INNER JOIN, the "Standard" column always gives me the same value. The following is what I started with:
SELECT Grade_Id, Cluster_Eng_Id, Domain_Math_Eng_Id, Standard
FROM `math_standards_eng`
WHERE 1
and it returns this (which is good). Notice that the Standard values are different:
Grade_Id Cluster_Eng_Id Domain_Math_Eng_Id Standard
103 131 107 Explain equivalence of fractions in special cases...
104 143 105 Know relative sizes of measurement units within o...
When I try to use INNER JOIN, the values for Grade_Id, Cluster_Eng_Id, and Domain_Math_Eng_Id are changed from numbers to actual text. The Standard column, however, seems to return the same value. Here is my code:
SELECT
grades_eng.Grade, domain_math_eng.Domain, cluster_eng.Cluster,
math_standards_eng.Standard
FROM
math_standards_eng
INNER JOIN
grades_eng ON math_standards_eng.Grade_Id = grades_eng.Id
INNER JOIN
domain_math_eng ON math_standards_eng.Domain_Math_Eng_Id
INNER JOIN
cluster_eng ON math_standards_eng.Cluster_Eng_Id
This is what I get when I run the query:
Grade Domain Cluster Standard
3rd Counting and cardinality Know number names and the count sequence Explain equivalence of fractions in special cases...
3rd Expressions and Equations Know number names and the count sequence Explain equivalence of fractions in special cases...
3rd Functions Know number names and the count sequence Explain equivalence of fractions in special cases.
4th Counting and cardinality Know number names and the count sequence Know relative sizes of measurement units within o...
4th Expressions and Equations Know number names and the count sequence Know relative sizes of measurement units within o...
The text value for Standard keeps showing the same value per grade and I do not know why. 3rd will keep showing the same thing, then the next grade will change to a new value, and the pattern repeats. Lastly, each table has a 1:M relationship with standards, since each value appears multiple times in the standards table. Any advice would be greatly appreciated.

You are missing the = comparison in your INNER JOIN conditions on domain_math_eng and cluster_eng. As written, each ON clause is just a nonzero ID value, which MySQL treats as TRUE, so every row matches every row (effectively a cross join); that is why the same Standard value repeats. I would expect something like:
SELECT grades_eng.Grade, domain_math_eng.Domain, cluster_eng.Cluster, math_standards_eng.Standard FROM math_standards_eng
INNER JOIN grades_eng ON math_standards_eng.Grade_Id = grades_eng.Id
INNER JOIN domain_math_eng ON math_standards_eng.Domain_Math_Eng_Id = domain_math_eng.Id
INNER JOIN cluster_eng ON math_standards_eng.Cluster_Eng_Id = cluster_eng.Id

Related

SQL Join on Not Exact Numbers

I am trying to match up a table based on two 'unique' identifiers. The first one is fine: it is a string that doesn't change. There are multiple rows for each value of this first variable, which is why I need a second variable to match on. The issue I have is that variable B, which is a decimal number, can change very slightly. So 90% of them will match exactly, but there are instances where I am trying to match 1.97 to 1.96, for example, which leaves me with missing values. Any ideas for a workaround?
For an approximate join on numeric values you can use something like the following query:
select *
from a
join b on (a.val/b.val) between 0.99 and 1.01;
See a live test at https://sqlize.online/sql/psql15/db4d0e6bcc5b44e8bfc3b2bc252d567d/
The above query joins numbers with ±1% accuracy :)
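A ratio test can misbehave when values are near zero (and it divides by b.val, so a zero there would error out). If an absolute drift bound fits the data better, here is a minimal sketch under the same table names:
select *
from a
join b on abs(a.val - b.val) <= 0.01;
With 1.97 versus 1.96 the difference is exactly 0.01, so that pair matches; tune the threshold to the drift you actually see.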

Creating a view that contains all records from one table that match the comma-separated field content in another table

I have two tables, au_postcodes and groups.
Table groups contains a field called PostCodeFootPrint that contains the postcode set making up the footprint.
Table au_postcodes contains a field called poa_code that contains a single postcode.
The records in groups.PostCodeFootPrint look like:
PostCodeFootPrint
2529,2530,2533,2534,2535,2536,2537,2538,2539,2540,2541,2575,2576,2577,2580
2640
3844
2063, 2064, 2065, 2066, 2067, 2068, 2069, 2070, 2071, 2072, 2073, 2074, 2075, 2076, 2077, 2079, 2080, 2081, 2082, 2083, 2119, 2120, 2126, 2158, 2159
2848, 2849, 2850, 2852
Some records have only one postcode, some have multiple separated by a "," or ", " (comma and space).
The records in au_postcode.poa_code look like:
poa_code
2090
2092
2093
829
830
836
2080
2081
Single postcode (always).
The objective is to get all records from au_postcodes where the poa_code appears in groups.PostCodeFootPrint, into a view.
I tried:
SELECT
au_postcodes.poa_code,
groups."NameOfGroup"
FROM
groups,
au_postcodes
WHERE
groups."PostcodeFootprint" LIKE '%au_postcodes.poa_code%'
But no luck
You can use regex for this. Take a look at this fiddle:
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=739592ef262231722d783670b46bd7fa
There I form a regex from the poa_code and the word-boundary marker \y (to avoid partial matches) and compare that to the PostCodeFootPrint:
select p.poa_code, g.PostCodeFootPrint
from groups g
join au_postcode p
on g.PostCodeFootPrint ~ concat('\y', p.poa_code, '\y')
Depending on your data, this may be performant enough. Postgres also has an array data type, so it might be better to store the postcode lists as arrays.
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=ae24683952cb2b0f3832113375fbb55b
Here I stored the post code lists as arrays, then used ANY to join with.
select p.poa_code, g.PostCodeFootPrint
from groups g
join au_postcode p
on p.poa_code = any(g.PostCodeFootPrint);
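If the column currently holds comma-separated text, it can be converted in place. A minimal sketch, assuming PostCodeFootPrint is a text column whose separators are commas with optional spaces:
-- strip spaces, then split on commas into a text[] column
alter table groups
alter column PostCodeFootPrint type text[]
using string_to_array(replace(PostCodeFootPrint, ' ', ''), ',');
After the conversion, the = any(...) join above works unchanged.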
In these two fiddles I use explain to show the cost of the queries, and while the array solution is more expensive, I imagine it might be easier to maintain.
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=7f16676825e10625b90eb62e8018d78e
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=e96e0fc463f46a7c467421b47683f42f
I changed the underlying data type to integer in this fiddle, expecting it to reduce the cost, but it didn't, which seems strange to me.
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=521d6a7d0eb4c45471263214186e537e
It is possible to reduce the query cost with the # operator from the intarray extension (see the last query here: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=edc9b07e9b22ee72f856e9234dbec4ba):
select p.poa_code, g.PostCodeFootPrint
from groups g
join au_postcode p
on (g.PostCodeFootPrint # p.poa_code) > 0;
but it is still more expensive than the regex. However, I think you can probably rearrange the way the tables are set up and radically change performance. See the first and second queries in the fiddle, where I take each post code in the footprint and insert it as a row in a table, along with an identifier for the group it was in:
select p.poa_code, g.which
from groups2 g
join au_postcode p
on g.footprint = p.poa_code;
The explain plan for this indicates that query cost drops significantly (from 60752.50 to 517.20, or two orders of magnitude) and the execution times go from 0.487 to 0.070. So it might be worth looking into changing the table structure.
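If restructuring is an option, the normalized layout can be generated from the existing data. A minimal sketch, assuming groups carries a group identifier column named which, as in the fiddle:
-- one row per (group, postcode) pair
create table groups2 as
select g.which,
       trim(unnest(string_to_array(g.PostCodeFootPrint, ','))) as footprint
from groups g;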
Since the values of PostCodeFootPrint are separated by a common character, you can easily create an array out of it. From there, use unnest to convert the array elements to records, and then join them with au_postcode:
SELECT * FROM au_postcode au
JOIN (SELECT trim(unnest(string_to_array(PostCodeFootPrint,',')))
FROM groups) fp (PostCodeFootPrint) ON fp.PostCodeFootPrint = au.poa_code;
Demo: db<>fiddle
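Since the stated objective is a view, any of these selects can be wrapped in CREATE VIEW. A minimal sketch using the regex variant from the first answer (the view name is illustrative):
create view postcode_groups as
select p.poa_code, g.PostCodeFootPrint
from groups g
join au_postcode p
  on g.PostCodeFootPrint ~ concat('\y', p.poa_code, '\y');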

Query using COUNT returns records where the count is positive only

Good day everyone.
Consider this portion of a relational SQLite database:
floors(number) - rooms(number, #floorNumber)
I aim to query for the number of rooms per floor. This is my attempt:
select floors.number, count(rooms.floornumber)
from floors, rooms where floors.number=rooms.floornumber
group by floors.number, rooms.floornumber;
Example:
1|5
2|7
3|5
4|3
The issue is that I also would like the query to return records where the floor contains 0 rooms (for example floor number 5 exists in the "floors" table but isn't shown in the query result).
Your assistance is appreciated. Thank you.
Never use commas in the FROM clause; always use proper, explicit JOIN syntax. You need a LEFT JOIN, but you cannot even see that because of the way your query is written:
select f.number, count(r.floornumber)
from floors f left join
rooms r
on f.number = r.floornumber
group by f.number;
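The LEFT JOIN keeps every floor, and counting the column from rooms rather than using count(*) is what makes the empty floors show 0, because count ignores NULLs. A minimal sketch contrasting the two on the same schema:
select f.number,
       count(*) as rows_per_floor,        -- counts the NULL-extended row, so an empty floor shows 1
       count(r.floornumber) as room_count -- counts only real matches, so an empty floor shows 0
from floors f
left join rooms r on f.number = r.floornumber
group by f.number;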

sum 'distinct' rows with same values

I have a database which has a feeder that may have several distributors, each of which may have several transformers, each of which may have several clients and a certain kVA (power that gets to the clients).
And I have the following code:
SELECT f.feeder,
d.distributor,
count(DISTINCT t.transformer) AS total_transformers,
sum(t.Kvan) AS Total_KVA,
count(c.client) AS Clients
FROM feeders f
LEFT JOIN distributors d
ON (d.feeder = f.feeder)
LEFT JOIN transformers t
ON (t.transformer = d.transformer)
LEFT JOIN clients c
ON (c.transformer = t.transformer)
WHERE d.transformer IS NOT NULL
GROUP BY f.feeder,
d.distributor,
f.feeder
ORDER BY f.feeder,
d.distributor
The sum is supposed to be the sum of the different kVA values the transformers have. Each transformer has a certain kVA. The problem is that one transformer has one kVA value for all the clients connected to it, but the query sums it as if it were one kVA value per client.
I need to group on the feeder and distributor (I want to see how much kVA the distributor has and how many clients in total).
So what should be "feeder1|dist1|2|600|374" comes out as "feeder1|dist1|2|130000|374" (one transformer has 200 kVA and the other one 400, but the query sums these values 374 times instead of computing 400+200).
Your data model seems a little messy, in that you've specified a distributor can have many transformers (and logic suggests that a transformer is only on a single distributor) yet your query implies that the transformer ID is on the distributor record, which normally implies the opposite relationship ...
So if that's right, it must mean that you have multiple records in the distributors table for the same distributor - i.e. distributor can't then be a unique key in distributors table, which makes the query quite hard to reason accurately about. (e.g. What happens if the records for a distributor don't all have the same feeder ID on them? I'm guessing you wouldn't like the answer so much... Presumably you mean for that to be impossible, but if the model is as described it's not impossible. And worse I'm now second-guessing whether the apparent keys on the other tables are in fact unique... But I digress...)
Or maybe something else is broken. The point is that the info you've given may be inconsistent or incomplete. Since I'm inferring an abnormal data model, I can't guarantee the following is bug-free (though if you provide more detail so I can make fewer guesses, I may be able to refine the answer)...
So, as you know, the trouble is that by the time you're ready to do the aggregation, the transformer data is embedded in a larger row that isn't keyed just on the identity of the transformer. There are a couple of ways you could fix it, all centered on changing how you look at the aggregation of values. Here's one option:
select f.feeder
, dtc.distributor
-- next values work because transformer is already unique per group
, count(dtc.transformer) total_transformers
, sum(dtc.kvan) total_kvan
, sum(dtc.clients) clients
from feeders f
join (select d.distributor
, d.feeder
, t.transformer
, max(t.kvan) as kvan -- or min, doesn't matter
, count(distinct c.client) clients
from distributors d
left join transformers t
on d.transformer = t.transformer
left join clients c
on c.transformer = t.transformer
where d.transformer is not null
group by d.distributor, d.feeder, t.transformer
) dtc
on dtc.feeder = f.feeder
group by f.feeder, dtc.distributor
A few notes:
I changed the outer query join to an inner join, because any null rows from the original left join from feeder would be eliminated by the original where clause.
I kept the where clause anyway; having it alongside the distributor-to-transformer left join is a little weird, but it is different from either an inner join or an outer join without the where clause (since the where clause acts on the left table's value). I'm avoiding changing the semantics of your original query, but it's weird enough that you might want to take another look at it.
What the subquery buys us is this: the inner query returns one row per feeder/distributor/transformer, i.e. for each feeder/distributor it returns one row per transformer. That row is itself an aggregate so that we can count clients, but since all rows in that aggregation come from the same transformer record, we can use max() to get that single record's kvan value onto the aggregation.
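An alternative with the same intent is to pre-aggregate the clients per transformer, so the kvan values are never multiplied by client rows in the first place. A minimal sketch, under the assumption that each transformer appears at most once per distributor in distributors (the inner joins make the IS NOT NULL filter redundant here):
select f.feeder,
       d.distributor,
       count(distinct t.transformer) as total_transformers,
       sum(t.kvan)                   as total_kva,
       sum(coalesce(cc.clients, 0))  as clients
from feeders f
join distributors d on d.feeder = f.feeder
join transformers t on t.transformer = d.transformer
left join (select transformer, count(*) as clients
           from clients
           group by transformer) cc
  on cc.transformer = t.transformer
group by f.feeder, d.distributor;
If a transformer can repeat within a distributor, sum(t.kvan) would double-count again, so the subquery approach above is the safer of the two.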

Why does changing the where clause on this criteria reduce the execution time so drastically?

I ran across a problem with a SQL statement today that I was able to fix by adding additional criteria; however, I really want to know why my change fixed the problem.
The problem query:
SELECT *
FROM
(SELECT ah.*,
        com.location,
        ha.customer_number,
        d.name applicance_NAME,
        house.name house_NAME,
        tr.name RULE_NAME
 FROM actionhistory ah
 INNER JOIN community com
    ON (ah.city_id = com.city_id)
 INNER JOIN house_address ha
    ON (ah.applicance_id = ha.applicance_id
        AND ha.status_cd = 'ACTIVE')
 INNER JOIN applicance d
    ON (ah.applicance_id = d.applicance_id)
 INNER JOIN house house
    ON (house.house_id = ah.house_id)
 LEFT JOIN the_rule tr
    ON (tr.the_rule_id = ah.the_rule_id)
 WHERE actionhistory_id >= 'ACT100010000'
 ORDER BY actionhistory_id
)
WHERE rownum <= 30000;
The "fix"
SELECT *
FROM
(SELECT ah.*,
        com.location,
        ha.customer_number,
        d.name applicance_NAME,
        house.name house_NAME,
        tr.name RULE_NAME
 FROM actionhistory ah
 INNER JOIN community com
    ON (ah.city_id = com.city_id)
 INNER JOIN house_address ha
    ON (ah.applicance_id = ha.applicance_id
        AND ha.status_cd = 'ACTIVE')
 INNER JOIN applicance d
    ON (ah.applicance_id = d.applicance_id)
 INNER JOIN house house
    ON (house.house_id = ah.house_id)
 LEFT JOIN the_rule tr
    ON (tr.the_rule_id = ah.the_rule_id)
 WHERE actionhistory_id >= 'ACT100010000' AND actionhistory_id <= 'ACT100030000'
 ORDER BY actionhistory_id
)
All of the _id columns are indexed sequences.
The first query's explain plan had a cost of 372 and the second was 14. This is running on an Oracle 11g database.
Additionally, if actionhistory_id in the where clause is anything less than ACT100000000, the original query returns instantly.
This is because of the index on the actionhistory_id column.
For the first query, Oracle has to read all the index blocks containing entries for records that come after 'ACT100010000', match each entry back to the table to fetch the records, and then pull the first 30000 records from the result set.
For the second query, Oracle only has to read the index blocks covering records between 'ACT100010000' and 'ACT100030000', and then fetch from the table only the records represented in those index blocks. That is a lot less work per record fetched than in the first query.
Regarding your last line about ids less than ACT100000000: it sounds to me like those records may all be in the same memory block (or in a contiguous set of blocks).
EDIT: Please also consider what is said by Justin below. I was talking about actual performance, but he points out that the id being a varchar greatly increases the range of potential values (as opposed to a number), and that the estimated plan may reflect a greater cost than reality because the optimizer doesn't know the full range until execution. To further optimize, taking his point into consideration, you could put a function-based index on the id column, or you could make it a composite key, with the varchar portion in one column and the numeric portion in another.
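A minimal sketch of the function-based index idea, assuming every id really is the constant 3-character prefix ACT followed by digits (the index name is illustrative):
-- index the numeric tail of the id so range predicates get numeric statistics
CREATE INDEX actionhistory_id_num_idx
    ON actionhistory (TO_NUMBER(SUBSTR(actionhistory_id, 4)));

-- queries must filter on the same expression for the index to be used, e.g.:
SELECT *
  FROM actionhistory
 WHERE TO_NUMBER(SUBSTR(actionhistory_id, 4)) BETWEEN 100010000 AND 100030000;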
What are the plans for both queries?
Are the statistics on your tables up to date?
Do the two queries return the same set of rows? It's not obvious that they do, but perhaps ACT100030000 is the largest actionhistory_id in the system.
Are you measuring the time required to fetch the first row? Or the time required to fetch the last row? What are those elapsed times?
My guess without that information is that the fact that you appear to be using the wrong data type for your actionhistory_id column is affecting the Oracle optimizer's ability to generate appropriate cardinality estimates, which is likely causing it to underestimate the selectivity of your predicates and to generate poorly performing plans. A human may be able to guess that actionhistory_id is the constant prefix ACT followed by a sequential 9-digit number, but the optimizer is not that smart. It sees a 12-character string and isn't able to figure out that the last 9 characters are always going to be digits, so there are only 10 possible values per position rather than 256 (assuming 8-bit characters), and that the first 3 characters are always going to be the same constant value. If, on the other hand, actionhistory_id was defined as a NUMBER and had values between 1 and 30000, it would be dramatically easier for the optimizer to make reasonable estimates about the selectivity of various predicates.