PostGIS ST_Distance between zip codes on PostgreSQL 9.x - sql

this is more of a SQL question than a PostGIS question, but I'm getting stuck again :(
I have a table called referred with id numbers in the "from" and "to" columns.
I want to calculate the distance between ALL these id numbers based on their zip code.
There is a separate reference table called doc, which contains the id number in column "NPI" and the zip code in column "Provider Business Mailing Address Postal Code", and a separate geo table called zctas, which has a zip code column zcta and a geom column.
For example, this query works fine:
SELECT z.zcta As zip1,
z2.zcta As zip2,
ST_Distance(z.geom,z2.geom) As thedistance
FROM zctas z,
zctas z2
WHERE z2.zcta = '60611'
AND z.zcta='19611';
One catch is that only the first five characters of the postal code should be used, i.e. the match should be on left("Provider Business Mailing Address Postal Code", 5).
I'm getting stuck on JOIN-ing the 2 zip codes from the reference table in this one query.
Sample table:
referred table:
from | to | count
------------+------------+-------
1174589766 | 1538109665 | 108
1285653204 | 1982604013 | 31
desired output:
from | to | count | distance
------------+------------+-------+----------
1174589766 | 1538109665 |   108 |    53434
1285653204 | 1982604013 |    31 |    34234
\d+
Table "public.zctas"
Column | Type | Modifiers | Storage | Stats target | Description
------------------+------------------------+-----------+----------+--------------+-------------
state | character(2) | | extended | |
zcta | character(5) | | extended | |
junk | character varying(100) | | extended | |
population_tot | bigint | | plain | |
housing_tot | bigint | | plain | |
water_area_meter | double precision | | plain | |
land_area_meter | double precision | | plain | |
water_area_mile | double precision | | plain | |
land_area_mile | double precision | | plain | |
latitude | double precision | | plain | |
longitude | double precision | | plain | |
thepoint_lonlat | geometry(Point,4269) | | main | |
thepoint_meter | geometry(Point,32661) | not null | main | |
geom | geometry(Point,32661) | | main | |
Indexes:
"idx_zctas_thepoint_lonlat" gist (thepoint_lonlat)
"idx_zctas_thepoint_meter" gist (thepoint_meter) CLUSTER
Table "public.referred"
Column | Type | Modifiers | Storage | Stats target | Description
--------+-----------------------+-----------+----------+--------------+-------------
from | character varying(25) | | extended | |
to | character varying(25) | | extended | |
count | integer | | plain | |
Has OIDs: no
Table "public.doc"
Column | Type | Modifiers | Storage | Stats target | Description
--------------------------------------------------------------+------------------------+-----------+----------+--------------+-------------
NPI | character varying(255) | | extended | |
Entity Type Code | character varying(255) | | extended | |
Replacement NPI | character varying(255) | | extended | |
Employer Identification Number (EIN) | character varying(255) | | extended | |
Provider Organization Name (Legal Business Name) | character varying(255) | | extended | |
Provider Last Name (Legal Name) | character varying(255) | | extended | |
Provider First Name | character varying(255) | | extended | |
Provider Middle Name | character varying(255) | | extended | |
Provider Name Prefix Text | character varying(255) | | extended | |
Provider Name Suffix Text | character varying(255) | | extended | |
Provider Credential Text | character varying(255) | | extended | |
Provider Other Organization Name | character varying(255) | | extended | |
Provider Other Organization Name Type Code | character varying(255) | | extended | |
Provider Other Last Name | character varying(255) | | extended | |
Provider Other First Name | character varying(255) | | extended | |
Provider Other Middle Name | character varying(255) | | extended | |
Provider Other Name Prefix Text | character varying(255) | | extended | |
Provider Other Name Suffix Text | character varying(255) | | extended | |
Provider Other Credential Text | character varying(255) | | extended | |
Provider Other Last Name Type Code | character varying(255) | | extended | |
Provider First Line Business Mailing Address | character varying(255) | | extended | |
Provider Second Line Business Mailing Address | character varying(255) | | extended | |
Provider Business Mailing Address City Name | character varying(255) | | extended | |
Provider Business Mailing Address State Name | character varying(255) | | extended | |
Provider Business Mailing Address Postal Code | character varying(255) | | extended | . . . . other columns not really needed.
Thanks!!!!

This should be relatively straightforward.
Assuming the NPIs are actually all the same length in doc and referred, you can join those tables quite easily:
SELECT ad."Provider Business Mailing Address Postal Code" as a_zip,
bd."Provider Business Mailing Address Postal Code" as b_zip,
r."count"
FROM referred r
LEFT JOIN doc ad ON r."from" = ad."NPI"
LEFT JOIN doc bd ON r."to" = bd."NPI";
Obviously, adjust this join based on careful analysis of the NPI and from/to fields in your data. Add trim() or left() calls within the join conditions if necessary -- the most important thing is that the JOIN condition compares comparable data.
Now, going from this to your original query to find a distance is trivial:
SELECT ad."Provider Business Mailing Address Postal Code" as a_zip,
bd."Provider Business Mailing Address Postal Code" as b_zip,
r."count",
ST_Distance(az.geom,bz.geom) As thedistance
FROM referred r
LEFT JOIN doc ad ON r."from" = ad."NPI"
LEFT JOIN doc bd ON r."to" = bd."NPI"
LEFT JOIN zctas az
ON az.zcta = left(ad."Provider Business Mailing Address Postal Code",5)
LEFT JOIN zctas bz
ON bz.zcta = left(bd."Provider Business Mailing Address Postal Code",5);
This is just one construction that should work, many others are possible. This particular construction will ensure that every entry in referred is represented, even if the NPI doesn't match to an entry in the doc table, or a zipcode can't be matched against the zctas table.
On the flip side, if there exists more than one entry for an NPI in the doc table, any referred entry that mentions this duplicated NPI will also be duplicated.
Similarly, if there is more than one entry in zctas for a particular zip code (zcta), you would see duplicates of referred rows.
That's how LEFT JOIN works, but I figured the warning was worth including: provider data is typically full of duplicates against NPI, and zip code lookup lists often contain duplicate zip codes, since some zip codes cross state lines.
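If those duplicates turn out to be a problem in practice, one guard (a sketch, assuming any single address row per NPI is acceptable) is to deduplicate doc with Postgres's DISTINCT ON before joining:

```sql
-- Sketch: keep one arbitrary address row per NPI before joining,
-- so rows in referred are not multiplied by duplicate NPIs.
WITH doc_dedup AS (
    SELECT DISTINCT ON ("NPI")
           "NPI",
           left("Provider Business Mailing Address Postal Code", 5) AS zip5
    FROM doc
    ORDER BY "NPI"
)
SELECT r."from", r."to", r."count",
       ST_Distance(az.geom, bz.geom) AS thedistance
FROM referred r
LEFT JOIN doc_dedup ad ON r."from" = ad."NPI"
LEFT JOIN doc_dedup bd ON r."to" = bd."NPI"
LEFT JOIN zctas az ON az.zcta = ad.zip5
LEFT JOIN zctas bz ON bz.zcta = bd.zip5;
```

Adding an ORDER BY tiebreaker column after "NPI" would make the kept row deterministic rather than arbitrary.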

Related

Translation database schema for software (exemplified)

I have a device with a GUI (menus etc) that I need translated. The translations are managed using a database.
I have looked at different answers to same issue:
Best practice for multi-language
Schema for a multilanguage database
As most of the examples explain mostly schemas, I have tried to make a small example, using SQLite:
Database shemas:
-- language table to hold individual languages
--
CREATE TABLE "languages" (
    "languageID" TEXT NOT NULL,
    "langNativeName" TEXT,
    "langEnglish" TEXT,
    "langISOCode" TEXT,
    PRIMARY KEY("languageID")
)
-- words table to hold the strings to be translated
-- each string has an identifier used in software, such as "mainMenu", "label123" etc
CREATE TABLE "words" (
    "wordID" INTEGER NOT NULL,
    "wordKey" TEXT,
    "wordDefault" TEXT,
    PRIMARY KEY("wordID")
)
-- translations
-- combining languages and words
CREATE TABLE "translations" (
    "keyID" INTEGER NOT NULL,
    "langID" INTEGER NOT NULL,
    "translation" TEXT,
    PRIMARY KEY("keyID","langID")
)
With some sample data:
Languages:
+------------+----------------+-------------+-------------+
| languageID | langNativeName | langEnglish | langISOCode |
+------------+----------------+-------------+-------------+
| 1 | English | English | en |
| 2 | Francois | French | fr |
| 3 | Deutsch | German | de |
+------------+----------------+-------------+-------------+
Words: (strings to translate). The wordKey will always be unique.
+--------+----------------+-------------+
| wordID | wordKey | wordDefault |
+--------+----------------+-------------+
| 1 | tileProduction | Start |
| 2 | tileJobs | Job |
| 3 | tileGoto | Go To |
+--------+----------------+-------------+
Translations:
word (1) is not translated for English
word (3) is not translated at all
+-------+--------+--------------------+
| keyID | langID | translation |
+-------+--------+--------------------+
| 1 | 2 | Produccion |
| 1 | 3 | Produktion |
| 2 | 2 | Seleccion de tarea |
| 2 | 3 | Jobauswahl |
+-------+--------+--------------------+
My proposed SQL to get a translation
As I have understood from examples, the following SQL should be used to get a translation. An extra field has been added to get a star on untranslated items.
select
    wordKey,
    coalesce(translation, wordDefault) as displaytext,
    case coalesce(translation, wordDefault)
        when translation then ''
        else '*'
    end as remark
from words
left join translations
    on words.wordID = translations.keyID
    and translations.langID = 3
left join languages
    on languages.languageID = translations.langID
Which will give me:
+----------------+-------------+--------+
| wordKey        | displaytext | remark |
+----------------+-------------+--------+
| tileProduction | Produktion  |        |
| tileJobs       | Jobauswahl  |        |
| tileGoto       | Go To       | *      |
+----------------+-------------+--------+
This works, but before I start inserting larger amounts of real data I am in doubt about two things:
Is this model correct, and if not, what may I have missed? There will be some metadata for each language, which I have left out for clarity.
Is the query to get a translation correct, or can it be done better?
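One thing the schemas above do not declare is referential integrity between the three tables. As a suggestion, not part of the original design (note SQLite only enforces this with PRAGMA foreign_keys = ON, and that languages.languageID is declared TEXT while translations.langID is INTEGER, which SQLite tolerates but would be cleaner aligned), a sketch of translations with foreign keys:

```sql
-- Sketch: the same "translations" table with foreign keys, so a
-- translation can only reference an existing word and language.
CREATE TABLE "translations" (
    "keyID" INTEGER NOT NULL,
    "langID" INTEGER NOT NULL,
    "translation" TEXT,
    PRIMARY KEY("keyID","langID"),
    FOREIGN KEY("keyID") REFERENCES "words"("wordID"),
    FOREIGN KEY("langID") REFERENCES "languages"("languageID")
)
```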

Replacing for loop by sql

I have SQL for example
show tables from mydb;
It shows the list of tables:
|table1|
|table2|
|table3|
Then I use an SQL statement for each table, such as "show full columns from table1;":
+----------+--------+-----------+------+-----+---------+----------------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+----------+--------+-----------+------+-----+---------+----------------+---------------------------------+---------+
| id | bigint | NULL | NO | PRI | NULL | auto_increment | select,insert,update,references | |
| user_id | bigint | NULL | NO | MUL | NULL | | select,insert,update,references | |
| group_id | int | NULL | NO | MUL | NULL | | select,insert,update,references | |
+----------+--------+-----------+------+-----+---------+----------------+---------------------------------+---------+
So in this case I can use a programming language like this (not correct code, just showing the flow):
tables = "show tables from mydb;"
for t in tables:
cmd.execute("show full columns from {t} ;")
However is it possible to do this in sql only?
If you are using MySQL you can use the system view INFORMATION_SCHEMA.COLUMNS.
It contains the table name and column name (and other details). No loop is required, and you can easily filter by other information, too.
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
If you are using Microsoft SQL Server, the same view is available and the query above works there as well.
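To reproduce the loop in a single statement, the view can be filtered by schema name (mydb, from the question). DATA_TYPE and IS_NULLABLE are standard INFORMATION_SCHEMA columns; COLUMN_KEY (PRI/MUL as in the show output) is MySQL-specific:

```sql
-- One query instead of a loop: every column of every table in mydb.
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE, COLUMN_KEY
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'mydb'
ORDER BY TABLE_NAME, ORDINAL_POSITION;
```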

Cast VARCHAR columns to int, bigint, time, etc (PL/pgSQL)

Problem
(This is for an open source, analytics library.)
Here's our query results from events_view:
id | visit_id | name | prop0 | prop1 | url
------+----------+--------+----------------------------+-------+------------
2004 | 4 | Magnus | 2021-10-26 02:25:55.790999 | 142 | cnn.com
2007 | 4 | Hartis | 2021-10-26 02:26:37.773999 | 25 | fox.com
Currently all columns are VARCHAR.
Column | Type | Collation | Nullable | Default
----------+-------------------+-----------+----------+---------
id | bigint | | |
visit_id | character varying | | |
name | character varying | | |
prop0 | character varying | | |
prop1 | character varying | | |
url | character varying | | |
They should be something like
Column | Type | Collation | Nullable | Default
----------+------------------------+-----------+----------+---------
id | bigint | | |
visit_id | bigint | | |
name | character varying | | |
prop0 | time without time zone | | |
prop1 | bigint | | |
url | character varying | | |
Desired result
Hardcoding these castings as in SELECT visit_id::bigint, name::varchar, prop0::time, prop1::bigint, url::varchar FROM tbl won't do; the column names are known only at run time.
To simplify things we could cast each column into only three types: boolean, numeric, or varchar. Use the regexes below for matching types:
boolean: ^(true|false|t|f)$
numeric: ^-?[0-9]+(\.[0-9]+)?$
varchar: every result that does not match boolean or numeric above
What should the SQL look like that discovers what type each column is and casts the columns dynamically?
These are a few ideas rather than a true solution for this tricky job. A slow but very reliable function can be used instead of regular expressions.
create or replace function can_cast(s text, vtype text)
returns boolean language plpgsql immutable as
$body$
begin
    -- attempt the cast; if it raises an error, s is not castable to vtype
    execute format('select %L::%s', s, vtype);
    return true;
exception when others then
    return false;
end;
$body$;
Data may be presented like this (partial list of columns from your example)
create or replace temporary view tv(id, visit_id, prop0, prop1) as
values
(
    2004::bigint,
    4::bigint,
    -- the first row determines the column types
    case when can_cast('2021-10-26 02:25:55.790999', 'time') then '2021-10-26 02:25:55.790999'::time end,
    case when can_cast('142', 'bigint') then '142'::bigint end
),
(2007, 4, '2021-10-26 02:26:37.773999', 25);
-- the rest of the data here
I believe that it is possible to generate the temporary view DDL dynamically as a select from events_view too.
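Building on can_cast, an aggregate can decide the best type per column. A sketch for a single column (bool_and is a built-in Postgres aggregate; candidate types are tested most specific first, falling back to varchar):

```sql
-- Sketch: classify prop1 by testing whether every non-null value
-- survives a cast to each candidate type.
SELECT CASE
           WHEN bool_and(can_cast(prop1, 'boolean')) THEN 'boolean'
           WHEN bool_and(can_cast(prop1, 'bigint'))  THEN 'bigint'
           WHEN bool_and(can_cast(prop1, 'numeric')) THEN 'numeric'
           ELSE 'varchar'
       END AS prop1_type
FROM events_view
WHERE prop1 IS NOT NULL;
```

The resulting type name could then be spliced into dynamically generated view DDL, e.g. via format() in a plpgsql DO block.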

Error in condition in where clause in timescale db while visualising in grafana

I am trying to visualise in Grafana from timescale db with the following query
SELECT $__timeGroup(timestamp,'30m'), sum(error) as Error
FROM userCounts
WHERE serviceid IN ($Service) AND ciclusterid IN ($CiClusterId)
AND environment IN ($environment) AND filterid IN ($filterId)
AND $__timeFilter("timestamp")
GROUP BY timestamp;
However, it gives an error and no data shows when I add the filterid IN ($filterId) part.
I have checked the variable names many times, but I am not sure what the error is. Logically, if the filters for the other variables work in the other conditions, this one should work here too. Not sure what is going wrong. Can anyone give input?
Edit:
The schema is like
timestamp   | timestamp without time zone | not null
measurement | character varying(150)     |
filterid    | character varying(150)     |
environment | character varying(150)     |
iscanary    | boolean                    |
servicename | character varying(150)     |
serviceid   | character varying(150)     |
ciclusterid | character varying(150)     |
--more--
In Grafana, it gives the error
pq: column "in_orgs_that_have_had_an_operational_connector" does not exist
when filterId = IN_ORGS_THAT_HAVE_HAD_AN_OPERATIONAL_CONNECTOR is selected. That is a value, not a column, so I am not sure why the error treats it as one; the error also shows it in lower case while the value is in upper case.
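That symptom (the value folded to lower case and parsed as a column name) typically means the variable reaches Postgres unquoted, since unquoted identifiers are down-cased. A sketch of one way to force quoting, assuming Grafana's advanced variable formatting is available (the :singlequote format wraps each selected value in single quotes):

```sql
-- Sketch: interpolate $filterId as quoted string literals so Postgres
-- sees values, not identifiers; GROUP BY 1 groups on the 30m bucket.
SELECT $__timeGroup("timestamp", '30m'), sum(error) AS error
FROM userCounts
WHERE serviceid IN ($Service)
  AND filterid IN (${filterId:singlequote})
  AND $__timeFilter("timestamp")
GROUP BY 1;
```

Alternatively, the variable itself can be defined so its values already carry quotes, but the format specifier keeps the query self-describing.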

Building a Database Model for Role Based Access Control

I'm trying to build a role based access control system, but I get stuck when I approach the database part of it.
Should I make two models, Role and Permission, and then a many-to-many relationship between them, or what?
My User Model looks something like this:
Column | Type | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+--------------------
id | uuid | | not null | uuid_generate_v4()
name | character varying(50) | | not null |
email | character varying(320) | | not null |
avatar | text | | |
password | text | | not null |
phone | character varying(30) | | |
created_at | timestamp without time zone | | not null | now()
updated_at | timestamp without time zone | | | now()
companyId | uuid | | |
roleId | uuid | | |
So I just have a 1-to-many relationship between user and role.
Mainly it's done with a single role and multiple users associated with it.
Roles can be assigned privileges, and users can be assigned roles.
You can refer to the following for the same:
Role-based access control in Postgres/mongo
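For the many-to-many variant asked about, a minimal sketch (all table and column names below are illustrative, not taken from the question; uuid_generate_v4() matches the default already used in the users table):

```sql
-- Sketch: roles and permissions linked by a many-to-many join table.
CREATE TABLE roles (
    id   uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    name varchar(50) NOT NULL UNIQUE
);

CREATE TABLE permissions (
    id   uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    name varchar(100) NOT NULL UNIQUE  -- e.g. 'invoice.read'
);

CREATE TABLE role_permissions (
    role_id       uuid REFERENCES roles(id),
    permission_id uuid REFERENCES permissions(id),
    PRIMARY KEY (role_id, permission_id)
);

-- users."roleId" would then reference roles(id), keeping the
-- 1-to-many user-to-role design from the question.
```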