How to extract from Text and match from another query - sql

I have a table with airport sequences (AIRPRT_SERIES) like SFO/ATL/GRU or PIT/ATL/GIG/VIX. I would like to find all matches in this field (in this case ATL/GRU and ATL/GIG) that match another table where I have those flights (Hub: ATL Spoke:GIG). Problem is that I don't know how to pass or join my tables to make this happen.
This is a query that does half of what I want. The problem is that there are no fields in either table that match (other than when I extract them) so I don't know how to do the join.
select
*
from LEG_OD leg
inner join myMarkets mkts
on leg.nondir=mkts.nd_arp -- Current condition but not what I want/need
WHERE
REGEXP_SIMILAR(AIRPRT_SERIES , '[A-Z]{3}/('||mkts.SPOKE||'/'||mkts.Hub||'/|'||mkts.Hub||'/'||mkts.Spoke||'/)[A-Z]{3}' )=1
AND
leg.year_month BETWEEN '20160101' AND '20160112'
LEG_OD Fields: AIRPRT_SERIES, Passengers, nondir
myMarkets: Hub, Spoke, Distance,nd_arp
I would like to keep the REGEXP_SIMILAR condition as this is part of a larger query.

You'll need to use STRTOK_SPLIT_TO_TABLE. Taking your LEG_OD table, we can split out the individual elements of AIRPRT_SERIES.
select
d.* from TABLE (strtok_split_to_table (LEG_OD.<some sort of id column>, LEG_OD.AIRPRT_SERIES, '/')
returns (outkey integer, tokennum integer, toke varchar(3) character set unicode)) as d
order by 1,2
That will get you one row for each element in the AIRPRT_SERIES column, keyed by the id column and the token position.
If I understand your data correctly, you could then join on those values.
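To make the "split, then join" idea concrete, here is a minimal Python sketch of the same logic outside the database. It is only an illustration under assumptions: the function names are hypothetical, markets are plain (hub, spoke) tuples, and it does not reproduce the REGEXP_SIMILAR condition from the question.

```python
def adjacent_pairs(series):
    """Split 'PIT/ATL/GIG/VIX' into consecutive airport pairs."""
    stops = series.split("/")
    return list(zip(stops, stops[1:]))


def matching_markets(series, markets):
    """Return the (hub, spoke) markets flown in either direction in the series."""
    pairs = set(adjacent_pairs(series))
    return [(hub, spoke) for hub, spoke in markets
            if (hub, spoke) in pairs or (spoke, hub) in pairs]


# Hypothetical stand-in for the myMarkets table (Hub, Spoke).
markets = [("ATL", "GIG"), ("ATL", "GRU")]

print(matching_markets("PIT/ATL/GIG/VIX", markets))  # [('ATL', 'GIG')]
print(matching_markets("SFO/ATL/GRU", markets))      # [('ATL', 'GRU')]
```

In SQL terms, the pairs correspond to joining the split rows to themselves on the same key with tokennum and tokennum + 1, and then joining that result to myMarkets.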

Related

How can I identify all matching columns for two rows of data?

Using Standard ANSI SQL, how does one return a list of columns which are matching for two specific rows of data? We don't know the names of the columns, only the table name and the ID (or other primary key) to pick out the two specific rows we wish to compare?
Let's say we have a table with a large number of columns for real estate listings. If I choose two specific rows like so:
SELECT *
FROM listing_data
WHERE mls_number IN ('111111', '222222')
How can I identify the names of all other columns which happen to match between these two particular rows?
For example, perhaps there is a column called 'school_district' and they both are in the same district. Or perhaps the two listings share the same street name, or the same listing agent, or all three of these.
To get the column names you can select from information_schema.columns; however, that view contains only metadata (the column names) and none of the row data. If you are trying to compare every column generically with nested queries of the form select * from tablex where ... columnname = 'value', then, if it works at all, it may take hours to complete unless your tables are small. If you know your column names, forming a query is simple. Do some research and practice queries on your tables and you should gain some insight. You are unlikely to have address data in a name column, so once you are familiar with your data you should be able to craft a simple query.
You need to explicitly do the comparison for each column. One method is:
SELECT (CASE WHEN ld1.col1 IS NOT DISTINCT FROM ld2.col1 THEN 'col1;' ELSE '' END ||
CASE WHEN ld1.col2 IS NOT DISTINCT FROM ld2.col2 THEN 'col2;' ELSE '' END ||
. . .
) as matches
FROM listing_data ld1 JOIN
listing_data ld2
ON ld1.mls_number = '111111' AND
ld2.mls_number = '222222'
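If the column names are not known in advance, one option is to do the comparison on the client, where the driver reports the column names with the result. This is not ANSI SQL; it is a rough Python sketch using the standard-library sqlite3 module, with a made-up listing_data table and made-up values purely for illustration. Python's None == None is true, which mirrors the NULL-safe IS NOT DISTINCT FROM comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows expose column names via keys()

# Hypothetical schema and data, only for this demonstration.
conn.execute(
    "CREATE TABLE listing_data "
    "(mls_number TEXT, school_district TEXT, street TEXT, agent TEXT)"
)
conn.executemany(
    "INSERT INTO listing_data VALUES (?, ?, ?, ?)",
    [("111111", "North", "Elm St", None),
     ("222222", "North", "Oak Ave", None)],
)


def matching_columns(conn, id_a, id_b):
    """Return the names of all columns whose values are equal in both rows."""
    rows = {
        r["mls_number"]: r
        for r in conn.execute(
            "SELECT * FROM listing_data WHERE mls_number IN (?, ?)",
            (id_a, id_b),
        )
    }
    a, b = rows[id_a], rows[id_b]
    return [col for col in a.keys() if a[col] == b[col]]


print(matching_columns(conn, "111111", "222222"))  # ['school_district', 'agent']
```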

JOIN of 4 tables, how to restrict SELECT columns to one table only?

I am working on ABAP program - user input is to query column ANLAGE and output is to get all records from table EADZ (and only fields of EADZ) based on ANLAGE.
Statement and joins should work like this:
Input ANLAGE, find in table EASTL, gets LOGIKNR
Input LOGIKNR, find in table EGERR, gets EQUNR
Input EQUNR, find in table ETDZ, gets LOGIKZW
Input LOGIKZW, find in table EADZ, gets all records (this is the final output)
Here is the code I tried:
DATA: gt_cas_rezy TYPE STANDARD TABLE OF eadz,
lv_dummy_eanl LIKE eanl-anlage.
SELECT-OPTIONS: so_anl FOR lv_dummy_eanl NO INTERVALS NO-EXTENSION.
SELECT * FROM eadz
INNER JOIN etdz ON eadz~logikzw EQ etdz~logikzw
INNER JOIN egerr ON etdz~equnr EQ egerr~equnr
INNER JOIN eastl ON egerr~logiknr EQ eastl~logiknr
INTO CORRESPONDING FIELDS OF TABLE @gt_cas_rezy
WHERE eastl~anlage IN @so_anl.
I get the records from table EADZ, except that the date fields are empty (even though they are filled in the database table). I assume the problem is with the JOINs: in a statement like this, all fields of all 4 tables are joined into one "record", which is then moved into the corresponding fields of the internal table.
How to get the values of date fields?
You can find the answer in the documentation.
If a column name appears multiple times and no alternative column name was granted, the last column listed is assigned.
In your case, at least two tables share the same column name. Therefore the values from the last mentioned table are used in the join.
You can solve this by listing the columns explicitly (or eadz~* in your case), giving an alias if required.
SELECT EADZ~* FROM EADZ INNER JOIN ETDZ ON EADZ~LOGIKZW = ETDZ~LOGIKZW
INNER JOIN EGERR ON ETDZ~EQUNR = EGERR~EQUNR
INNER JOIN EASTL ON EGERR~LOGIKNR = EASTL~LOGIKNR
INTO CORRESPONDING FIELDS OF TABLE @gt_cas_rezy
WHERE EASTL~ANLAGE IN @SO_ANL.
If you require additional fields, you can add them explicitly with e.g. EADZ~*, EASTL~A.
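The "last column listed is assigned" rule quoted above can be mimicked in a few lines of Python. This is only an analogy, not ABAP: the helper name is invented, and some_date is a hypothetical field standing in for whichever date column exists under the same name in more than one of the joined tables.

```python
def into_corresponding_fields(select_list):
    """Assign (name, value) pairs by name; a later duplicate overwrites an earlier one."""
    target = {}
    for name, value in select_list:
        target[name] = value
    return target


# SELECT * over the join: EADZ columns first, then another table's columns.
joined_row = [
    ("logikzw", "Z1"), ("some_date", "20160101"),  # from EADZ
    ("equnr", "E1"), ("some_date", ""),            # same name, other table
]
print(into_corresponding_fields(joined_row)["some_date"])  # prints an empty string

# SELECT eadz~*: only the EADZ columns are in the select list.
eadz_only = [("logikzw", "Z1"), ("some_date", "20160101")]
print(into_corresponding_fields(eadz_only)["some_date"])   # 20160101
```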

Replace values in array column with related values from another table

In my database I have a table relations with a column relation_ids containing the IDs of users (user_id). This takes the form of an array with many IDs possible, e.g.:
{111,112,156,4465}
I have another table names containing information on users such as user_id, first_name, last_name etc.
I would like to create an SQL query to return all rows from relations with all columns, but append the array column relation_ids with first_name from the names table substituted for IDs.
Is it possible as some kind of subquery?
Assuming you want to preserve the order in the array - first names listed in the same order as IDs in the original relation_ids.
I suggest an ARRAY constructor over a correlated subquery with unnest() and WITH ORDINALITY, joined to the names table, like:
SELECT r.*
, (ARRAY (
SELECT n.first_name
FROM unnest(r.relation_ids) WITH ORDINALITY AS a(user_id, ord)
JOIN names n ON n.user_id = a.user_id
ORDER BY a.ord
)
) AS first_names
FROM relations r;
This query preserves all rows from relations in any case.
Corner cases to note:
1. A NULL value in relation_ids (for the whole column) is translated to an empty array. (Same as empty array in the source.)
2. NULL elements are silently dropped from the array.
You might want to define desired behavior if those corner cases are possible ...
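To make the corner cases tangible, here is a small Python sketch of what the query computes for a single row. It is an illustration only: the function name is hypothetical, the names table is modeled as a dict from user_id to first_name, and user_id is assumed to be unique. Like the inner join in the query, it also drops IDs that have no row in names.

```python
def first_names_for(relation_ids, names):
    """Map an array of user IDs to first names, keeping the original order."""
    if relation_ids is None:
        # A NULL array yields an empty array, same as an empty source array.
        return []
    # None elements and IDs without a match are silently dropped.
    return [names[uid] for uid in relation_ids
            if uid is not None and uid in names]


names = {111: "Ann", 112: "Bob"}  # hypothetical sample data

print(first_names_for([112, None, 111, 999], names))  # ['Bob', 'Ann']
print(first_names_for(None, names))                   # []
```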
Related:
LEFT OUTER JOIN on array column with multiple values
PostgreSQL unnest() with element number
Consider a normalized db design:
Can PostgreSQL array be optimized for join?
This will get you all the columns and rows from Relations with first_name appended from the Names table.
Select Relations.relation_ids, Names.user_id, Names.first_name From Relations
Inner Join Names On Relations.user_id=Names.user_id

Subquery that matches column with several ranges defined in table

I've got a pretty common setup for an address database: a person is tied to a company with a join table, the company can have an address and so forth.
All pretty normalized and easy to use. But for search performance, I'm creating a materialized, rather denormalized view. I only need a very limited set of information and quick queries. Most of everything that's usually done via a join table is now in an array. Depending on the query, I can either search it directly or join it via unnest.
As a complement to my zipcodes column (varchar[]), I'd like to add a states column that has the (German federal) states already precomputed, so that I don't have to transform a query to include all kinds of range comparisons.
My mapping data is in a table like this:
CREATE TABLE zip2state (
state TEXT NOT NULL,
range_start CHARACTER VARYING(5) NOT NULL,
range_end CHARACTER VARYING(5) NOT NULL
)
Each state has several ranges, and ranges can overlap (one zip code can be for two different states). Some ranges have range_start = range_end.
Now I'm a bit at wit's end on how to get that into a materialized view all at once. Normally, I'd feel tempted to just do it iteratively (via trigger or on the application level).
Or as we're just talking about 5 digits, I could create a big table mapping zip to state directly instead of doing it via a range (my current favorite, yet something ugly enough that it prompted me to ask whether there's a better way)
Any way to do that in SQL, with a table like the above (or something similar)? I'm at postgres 9.3, all features allowed...
For completeness' sake, here's the subquery for the zip codes:
(select array_agg(distinct address.zipcode)
from affiliation
join company
on affiliation.ins_id = company.id
join address
on address.com_id = company.id
where affiliation.per_id = person.id) AS zipcodes,
I suggest a LATERAL join instead of the correlated subquery to conveniently compute both columns at once. Could look like this:
SELECT p.*, z.*
FROM person p
LEFT JOIN LATERAL (
SELECT array_agg(DISTINCT d.zipcode) AS zipcodes
, array_agg(DISTINCT z.state) AS states
FROM affiliation a
-- JOIN company c ON a.ins_id = c.id -- suspect you don't need this
JOIN address d ON d.com_id = a.ins_id -- c.id
LEFT JOIN zip2state z ON d.zipcode BETWEEN z.range_start AND z.range_end
WHERE a.per_id = p.id
) z ON true;
If referential integrity is guaranteed, you don't need to join to the table company at all. I took the shortcut.
Be aware that varchar or text behaves differently than expected for numbers. For example: '333' > '0999'. If all zip codes have 5 digits you are fine.
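The string-comparison caveat is easy to check outside the database. The Python sketch below uses made-up state codes and ranges, and a hypothetical helper, to show both the overlapping-range lookup and how a zip code without its leading zero falls outside a range it numerically belongs to. Python compares strings character by character, which for plain digit strings behaves like the comparison described above.

```python
def states_for(zipcode, ranges):
    """Return the sorted, distinct states whose [start, end] range contains zipcode."""
    return sorted({state for state, start, end in ranges
                   if start <= zipcode <= end})


# Hypothetical stand-in for zip2state (state, range_start, range_end).
ranges = [
    ("A", "01000", "01999"),
    ("B", "01900", "02500"),  # overlaps with A
    ("C", "03000", "03000"),  # range_start = range_end
]

print(states_for("01950", ranges))  # ['A', 'B']
print(states_for("03000", ranges))  # ['C']

# Strings compare character by character, not numerically.
print("333" > "0999")                             # True
print(states_for("333", [("X", "0100", "0999")]))  # []
```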
Related:
What is the difference between LATERAL and a subquery in PostgreSQL?

SELECT DISTINCT. Please explain?

Wondering if someone could please explain the difference between these two queries and advise why one works and the other doesn't.
This one works. Gives me two records of the distinct GantryRtn value and their corresponding SSD value.
SELECT DISTINCT GantryRtn as Gantry, ROUND(Field.SSD,1) as SSD
FROM Field, PlanSetup, Course, Patient, Radiation
WHERE Field.RadiationSer=Radiation.RadiationSer
AND Radiation.PlanSetupSer=PlanSetup.PlanSetupSer
AND PlanSetup.CourseSer=Course.CourseSer
AND Course.PatientSer=Patient.PatientSer
AND Patient.PatientId='ZZZ456'
AND PlanSetup.PlanSetupId='F T1 R CHEST'
However there is a foreign key in the Field table that links to the primary key of another table that contains a plain text name for each field. I'd also like to extract that name (in a separate query if I have to) by pulling out this foreign key RadiationSer. But as soon as I put RadiationSer into the query, I lose my DISTINCT result.
SELECT DISTINCT GantryRtn as Gantry, ROUND(Field.SSD,1) as SSD, Field.RadiationSer
FROM Field, PlanSetup, Course, Patient, Radiation
WHERE Field.RadiationSer=Radiation.RadiationSer
AND Radiation.PlanSetupSer=PlanSetup.PlanSetupSer
AND PlanSetup.CourseSer=Course.CourseSer
AND Course.PatientSer=Patient.PatientSer
AND Patient.PatientId='ZZZ456'
AND PlanSetup.PlanSetupId='F T1 R CHEST'
This second query gives me 7 records with non-distinct GantryRtn values.
Why does this happen??
I have investigated using GROUP BY but this slows the query down and appears to pull ALL GantryRtn's out of the database (100s of records).
Thanks
Greg
The DISTINCT keyword applies to the whole result row (all selected fields), not just to the first field.
In your case:
SELECT DISTINCT GantryRtn as Gantry, ROUND(Field.SSD,1) as SSD, Field.RadiationSer
will return every row whose combination of Gantry, SSD, and RadiationSer is distinct (not the same as any other row).
So you may get 7 records in which the same Gantry value repeats with different values for RadiationSer.
If you'd like to first filter by distinct Gantry values you can accomplish that with a sub-query and an inner join but somehow you must settle on which RadiationSer value to use.
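The effect can be reproduced with plain tuples. In this Python sketch the rows are invented values in the shape (Gantry, SSD, RadiationSer); a set of tuples plays the role of DISTINCT, and taking the smallest RadiationSer per group is just one arbitrary way to "settle" on a value.

```python
# Hypothetical rows: (Gantry, SSD, RadiationSer)
rows = [
    (0.0, 90.5, 101),
    (0.0, 90.5, 102),
    (180.0, 88.0, 103),
    (180.0, 88.0, 104),
]

# DISTINCT over two columns: duplicates collapse.
distinct_two = sorted({(gantry, ssd) for gantry, ssd, _ in rows})
print(len(distinct_two))    # 2

# DISTINCT over three columns: RadiationSer makes every row unique.
distinct_three = sorted(set(rows))
print(len(distinct_three))  # 4

# Settle on one RadiationSer per (Gantry, SSD), here the smallest.
one_per_pair = {}
for gantry, ssd, ser in rows:
    key = (gantry, ssd)
    one_per_pair[key] = min(ser, one_per_pair.get(key, ser))
print(one_per_pair)  # {(0.0, 90.5): 101, (180.0, 88.0): 103}
```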