Creating a view that contains all records from one table, that match the comma separated field content in another table - sql

I have two tables au_postcodes and groups.
Table groups contains a field called PostCodeFootPrint
that contains the postcode set making up the footprint.
Table au_postcodes contains a field called poa_code that
contains a single postcode.
The records in groups.PostCodeFootPrint look like:
PostCodeFootPrint
2529,2530,2533,2534,2535,2536,2537,2538,2539,2540,2541,2575,2576,2577,2580
2640
3844
2063, 2064, 2065, 2066, 2067, 2068, 2069, 2070, 2071, 2072, 2073, 2074, 2075, 2076, 2077, 2079, 2080, 2081, 2082, 2083, 2119, 2120, 2126, 2158, 2159
2848, 2849, 2850, 2852
Some records have only one postcode, some have multiple separated by a "," or ", " (comma and space).
The records in au_postcode.poa_code look like:
poa_code
2090
2092
2093
829
830
836
2080
2081
Single postcode (always).
The objective is to:
Get all records from au_postcode, where the poa_code appears in groups.*PostCodeFootPrint into a view.
I tried:
SELECT
au_postcodes.poa_code,
groups."NameOfGroup"
FROM
groups,
au_postcodes
WHERE
groups."PostcodeFootprint" LIKE '%au_postcodes.poa_code%'
But no luck

You can use regex for this. Take a look at this fiddle:
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=739592ef262231722d783670b46bd7fa
Where I form a regex from the poa_code and the word boundary (to avoid partial matches) and compare that to the PostCodeFootPrint.
select p.poa_code, g.PostCodeFootPrint
from groups g
join au_postcode p
on g.PostCodeFootPrint ~ concat('\y', p.poa_code, '\y')
Depending on your data, this may be performant enough. I also believe that in postGres you have access to the array data type, and so it might be better to store the post code lists as arrays.
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=ae24683952cb2b0f3832113375fbb55b
Here I stored the post code lists as arrays, then used ANY to join with.
select p.poa_code, g.PostCodeFootPrint
from groups g
join au_postcode p
on p.poa_code = any(g.PostCodeFootPrint);
In these two fiddles I use explain to show the cost of the queries, and while the array solution is more expensive, I imagine it might be easier to maintain.
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=7f16676825e10625b90eb62e8018d78e
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=e96e0fc463f46a7c467421b47683f42f
I changed the underlying data type to integer in this fiddle, expecting it to reduce the cost, but it didn't, which seems strange to me.
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=521d6a7d0eb4c45471263214186e537e
It is possible to reduce the query cost with the # operator (see the last query here: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=edc9b07e9b22ee72f856e9234dbec4ba):
select p.poa_code, g.PostCodeFootPrint
from groups g
join au_postcode p
on (g.PostCodeFootPrint # p.poa_code) > 0;
but it is still more expensive than the regex. However, I think you can probably rearrange the way the tables are set up and radically change performance. See the first and second queries in the fiddle, where I take each post code in the footprint and insert it as a row in a table, along with an identifier for the group it was in:
select p.poa_code, g.which
from groups2 g
join au_postcode p
on g.footprint = p.poa_code;
The explain plan for this indicates that query cost drops significantly (from 60752.50 to 517.20, or two orders of magnitude) and the execution times go from 0.487 to 0.070. So it might be worth looking into changing the table structure.

Since the values of PostCodeFootPrint are separated by a common character, you can easily create an array out of it. From there use unnest to convert the array elements to records, and then join then with au_postcode:
SELECT * FROM au_postcode au
JOIN (SELECT trim(unnest(string_to_array(PostCodeFootPrint,',')))
FROM groups) fp (PostCodeFootPrint) ON fp.PostCodeFootPrint = au.poa_code;
Demo: db<>fiddle

Related

My Joins in query not pulling through correctly

Good evening. Could someone please help me with the following. I am trying to join two tables.The first id wbr_global.gl_ap_details. This stores historic GL information. The second table sandbox.utr_fixed_mapping is where account mapping is stored. For example, ana ccount number 60820 is mapped as Employee relation. The first table needs the mapping from the second table linked on the account number. The output I am getting is not right and way to bug. Any help would be appreciated!
Output
select sandbox.utr_fixed_mapping_na.new_mapping_1,sum(wbr_global.gl_ap_details.amount)
from wbr_global.gl_ap_details
LEFT JOIN sandbox.utr_fixed_mapping_na ON wbr_global.gl_ap_details.account_number = sandbox.utr_fixed_mapping_na.account_number
Where gl_ap_details.cost_center = '1172'
and gl_ap_details.period_name = 'JUL-21'
and gl_ap_details.ledger_name = 'Amazon.com, Inc.'
Group by 1;
I tried adding the cast function but after 5000 seconds of the query running I canceled it.
The query itself appears ok, but minor changes. Learn to use table "aliases". This way you don't have to keep typing long database.table.column all over. Additionally, SQL is easier to read doing it that way anyhow.
Notice the aliases "gl" and "fm" after the tables are declared, then these aliases are used to represent the columns.. Easier to read, would you agree.
Added GL Account number as described below the query.
select
gl.account_number,
fm.new_mapping_1,
sum(gl.amount)
from
wbr_global.gl_ap_details gl
LEFT JOIN sandbox.utr_fixed_mapping_na fm
ON gl.account_number = fm.account_number
Where
gl.cost_center = '1172'
and gl.period_name = 'JUL-21'
and gl.ledger_name = 'Amazon.com, Inc.'
Group by
gl.account_number,
fm.new_mapping_1
Now, as for your query and getting null. This just means that there are records within the gl_ap_details table with an account number that is not found in the utr_fixed_mapping_na table. So, to see WHAT gl account number does NOT exist, I have added it to the query. Its possible there are MULTIPLE records in the gl_ap_details that are not found in the mapping table. So, you may get
GLAccount Description SumOfAmount
glaccount1 null $someAmount
glaccount37 null $someAmount
glaccount49 null $someAmount
glaccount72 Depreciation $someAmount
glaccount87 Real Estate $someAmount
glaccount92 Building $someAmount
glaccount99 Salaries $someAmount
I obviously made-up glaccounts just to show the purpose. You may have multiple where the null's total amount is actually masking how many different gl account numbers were NOT found.
Once you find which are missing, you can check / confirm they SHOULD be in the mapping table.
FEEDBACK.
Since you do realize the missing numbers, lets consider a Cartesian result. If there are multiple entries in the mapping table for the same G/L account number, you will get a Cartesian result thus bloating your numbers. To clarify, lets say your mapping table has
Mapping file.
GL Descr1 NewMapping
1 test Salaries
1 testView Buildings
1 Another Depreciation
And your GL_AP_Details has
GL Amount
1 $100
Your total for the query would result in $300 because the query is trying to join the AP Details GL #1 to EACH of the entries in the mapping file thus bloating the amount. You could also add a COUNT(*) as NumberOfEntries to the query to see how many transactions it THINKS it is processing. Is there some "unique ID" in the GL_AP_Details table? If so, then you could also do a count of DISTINCT ID values. If they are different (distinct is lower than # of entries), I think THAT is your culprit.
select
fm.new_mapping_1,
sum(gl.amount),
count(*) as NumberOfEntries,
count( distinct gl.UniqueIdField ) as DistinctTransactions
from
wbr_global.gl_ap_details gl
LEFT JOIN sandbox.utr_fixed_mapping_na fm
ON gl.account_number = fm.account_number
Where
gl.cost_center = '1172'
and gl.period_name = 'JUL-21'
and gl.ledger_name = 'Amazon.com, Inc.'
Group by
fm.new_mapping_1
Might you also need to limit the mapping table for a specific prophecy or mec view?
If you "think" that the result of an aggregate is wrong, then the easiest way to verify this is to select the individual rows that correlate to 1 record in the aggregate output and inspect the records, looking for duplications.
For instance, pick 'Building Management':
SELECT fixed.new_mapping_1,details.amount,*
FROM wbr_global.gl_ap_details details
LEFT JOIN sandbox.utr_fixed_mapping_na fixed ON details.account_number = fixed.account_number
WHERE details.cost_center = '1172'
AND details.period_name = 'JUL-21'
AND details.ledger_name = 'Amazon.com, Inc.'
AND details.account_number = 'Building Management'
Notice that we tack on a ,* to the end of the projection, this will show you everything that the query has access to, you should look for repeating sections of data that you were not expecting, then depending on which table they originate from your might add additional criteria to the JOIN, or to the WHERE or you might need to group by additional columns.
This type of issue is really hard to comment on in a forum like this because it is highly specific to your schema, and the data contained within it, making solutions highly subjective to criteria you are not likely to publish online.
Generally if you think a calculation is wrong, you need to manually compute it to verify, this above advice helps you to inspect the data your query is using, you should either construct your own query or use other tools to build the data set that helps you to manually compute the correct values, then work them back into or replace your original query.
The speed issues are out of scope here, we can comment on the poor schema design but I suspect you don't have a choice. In the utr_fixed_mapping_na table you should make the account_number have the same column type as the source data, or add a new column that has the data in the original type, then you can setup indexes on the columns to improve the speed of the join.

SQL query malfunction

So Im trying to use INNER JOIN in my sql command because I am trying to replace the Foreign keys ID numbers with the text value of each column. However, when I use INNER JOIN, the column for "Standards" always gives me the same value. The following is what I started with
SELECT Grade_Id, Cluster_Eng_Id, Domain_Math_Eng_Id, Standard
FROM `math_standards_eng`
WHERE 1
and returns this (which is good). Notice the value of Standard values are different
Grade_Id Cluster_Eng_Id Domain_Math_Eng_Id Standard
103 131 107 Explain equivalence of fractions in special cases...
104 143 105 Know relative sizes of measurement units within o...
When I try to use Inner Join, the values for Grade_Id, Cluster_Eng_Id, and Domain_Math_Eng_Id are changed from numbers to actual text. Standard column values, however, seems to return the same value. Here is my code:
SELECT
grades_eng.Grade, domain_math_eng.Domain, cluster_eng.Cluster,
math_standards_eng.Standard
FROM
math_standards_eng
INNER JOIN
grades_eng ON math_standards_eng.Grade_Id = grades_eng.Id
INNER JOIN
domain_math_eng ON math_standards_eng.Domain_Math_Eng_Id
INNER JOIN
cluster_eng ON math_standards_eng.Cluster_Eng_Id
This is what I get when I run the query:
Grade Domain Cluster Standard
3rd Counting and cardinality Know number names and the count sequence Explain equivalence of fractions in special cases...
3rd Expressions and Equations Know number names and the count sequence Explain equivalence of fractions in special cases...
3rd Functions Know number names and the count sequence Explain equivalence of fractions in special cases.
4th Counting and cardinality Know number names and the count sequence Know relative sizes of measurement units within o...
4th Expressions and Equations Know number names and the count sequence Know relative sizes of measurement units within o...
The text value for Standard keeps on showing the same value per grade and I do not know why. 3rd Will keep showing the same thing, and then the next grade will change to a new value and repeat over and over. Lastly, each table has a 1:M relationship with standard as they each appear multiple times in the standard Table. Any advice would be greatly appreciated.
You are missing the = part of your INNER JOIN on domain_math_eng and cluster_eng. I would expect something like:
SELECT grades_eng.Grade, domain_math_eng.Domain, cluster_eng.Cluster, math_standards_eng.Standard FROM math_standards_eng
INNER JOIN grades_eng ON math_standards_eng.Grade_Id = grades_eng.Id
INNER JOIN domain_math_eng ON math_standards_eng.Domain_Math_Eng_Id = domain_math_eng.Id
INNER JOIN cluster_eng ON math_standards_eng.Cluster_Eng_Id = cluster_eng.Id

SQL - LEFT JOIN and WHERE statement to show just first row

I read many threads but didn't get the right solution to my problem. It's comparable to this Thread
I have a query, which gathers data and writes it per shell script into a csv file:
SELECT
'"Dose History ID"' = d.dhs_id,
'"TxFieldPoint ID"' = tp.tfp_id,
'"TxFieldPointHistory ID"' = tph.tph_id,
...
FROM txfield t
LEFT JOIN txfielpoint tp ON t.fld_id = tp.fld_id
LEFT JOIN txfieldpoint_hst tph ON fh.fhs_id = tph.fhs_id
...
WHERE d.dhs_id NOT IN ('1000', '10000')
AND ...
ORDER BY d.datetime,...;
This is based on an very big database with lots of tables and machine values. I picked my columns of interest and linked them by their built-in table IDs. Now I have to reduce my result where I get many rows with same values and just the IDs are changed. I just need one(first) row of "tph.tph_id" with the mechanics like
WHERE "Rownumber" is 1
or something like this. So far i couldn't implement a proper subquery or use the ROW_NUMBER() SQL function. Your help would be very appreciated. The Result looks like this and, based on the last ID, I just need one row for every og this numbers (all IDs are not strictly consecutive).
A01";261511;2843119;714255;3634457;
A01";261511;2843113;714256;3634457;
A01";261511;2843113;714257;3634457;
A02";261512;2843120;714258;3634464;
A02";261512;2843114;714259;3634464;
....
I think "GROUP BY" may suit your needs.
You can group rows with the same values for a set of columns into a single row

Get once time a duplicate record (SQL)

SELECT DISTINCT A.LeaseID,
C.SerialNumber,
B.LeasedProjectNumber As 'ProjectNumber',
A.LeaseComment As 'LeaseContractComments'
FROM aLease A
LEFT OUTER JOIN aLeasedAsset B
ON a.LeaseID = B.LeaseID
LEFT OUTER JOIN aAsset C
ON B.LeasedProjectNumber = C.ProjectNumber AND B.PartID = C.aPartid
WHERE A.LeaseComment IS NOT NULL
I got this result from a query statement. But I don't want to get repeated the last column(Comments) for the 3 records in the second column.
I want for the values on the second column write once the repeated comment. Like a Group By
Alright, I'll take a stab at this. It's pretty unclear what exactly you're hoping for, but reading your comments, it sounds like you're looking to build a hierarchy of sorts in your table.
Something like this:
"Lease Terminated Jan 29, 2013 due to the event of..."
216 24914 87
216 724992 87
216 724993 87
"Other potential column"
217 2132 86
...
...
Unfortuantely, I don't believe that that's possible. SQL Server is pretty strict about returning a table, which is two-dimensional by definition. There's no good way to describe a hierarchy such as this in SQL. There is the hierarchyid type, but that's not really relevant here.
Given that, you only really have two options:
My preference 99% of the time, just accept the duplicates. Handle them in your procedural code later on, which probably does have support for these trees. Unless you're dealing with performance-critical situations, or if you're pulling back a lot of data (or really long comments), that should be totally fine.
If you're hoping to print this result directly to the user, or if network performance is a big issue, aggregate your columns into a single record for each comment. It's well-known that you can't have multiple values in the same column, for the very same reason as the above-listed result isn't possible. But what you could do, data and your own preferences permitting, is write an aggregate function to concatenate the grouped results into a single, comma-delimited column.
You'd likely then have to parse those commas out, though, so unless network traffic is your biggest concern, I'd really just do it procedural-side.
SELECT STUFF((SELECT DISTINCT ', ' + SerialNumber
FROM [vLeasedAsset]
WHERE A.LeaseID = LeaseID AND A.ProjectNumber = ProjectNumber
FOR XML PATH (''))
, 1, 1, '') AS SerialNumber, [ProjectNumber],
MAX(ContractComment) 'LeaseContractComment'
FROM [vLeasedAsset] A
WHERE ContractComment != ''
GROUP BY [ProjectNumber], LeaseID
Output:
SerialNumber
24914, 724993
23401, 720356
ProjectNumber
87
91

SQL query to find records with specific prefix

I'm writing SQL queries and getting tripped up by wanting to solve everything with loops instead of set operations. For example, here's two tables (lists, really - one column each); idPrefix is a subset of idFull. I want to select every full ID that has a prefix I'm interested in; that is, every row in idFull which has a corresponding entry in idPrefix.
idPrefix.ID idFull.ID
---------- ----------
12 8
15 12
300 12-1-1
12-1-2
15
15-1
300
Desired result would be everything in idFull except the value 8. Super-easy with a for each loop, but I'm just not conceptualizing it as a set operation. I've tried a few variations on the below; everything seems to return all of one table. I'm not sure if my issue is with how I'm doing joins, or how I'm using LIKE.
SELECT f.ID
FROM idPrefix AS p
JOIN idFull AS f
ON f.ID LIKE (p.ID + '%')
Details:
Values are varchars, prefixes can be any length but do not contain the delimiter '-'.
This question seems similar, but more complex; this one only uses one table.
Answer doesn't need to be fast/optimized/whatever.
Using SQL Server 2008, but am more interested in conceptual understanding than a flavor-specific query.
Aaaaand I'm coming back to both real coding & SO after ~3 years, so sorry if I'm rusty on any etiquette.
Thanks!
You can join the full table to the prefix table with a LIKE
SELECT idFull.ID
FROM idFull full
INNER JOIN idPrefix pre ON full.ID LIKE pre.ID + '%'