Homoiconicity and SQL

I'm currently using Emacs sql-mode as my SQL shell; a (simplified) query response is below:
my_db=# select * from visit limit 4;
 num |          visit_key          |          created           |   expiry
-----+-----------------------------+----------------------------+------------
   1 | 0f6fb8603f4dfe026d88998d81a | 2008-03-02 15:17:56.899817 | 2008-03-02
   2 | 7c389163ff611155f97af692426 | 2008-02-14 12:46:11.02434  | 2008-02-14
   3 | 3ecba0cfb4e4e0fdd6a8be87b35 | 2008-02-14 16:33:34.797517 | 2008-02-14
   4 | 89285112ef2d753bd6f5e51056f | 2008-02-21 14:37:47.368657 | 2008-02-21
(4 rows)
If I want to then formulate another query based on that data, e.g.
my_db=# select visit_key, created from visit where expiry = '2008-03-02'
and num > 10;
You'll see that I have to add a comma between visit_key and created, and surround the expiry value with quotes.
Is there a SQL DB shell that shows its content more homoiconically, so that I could minimise this sort of editing? e.g.
num, visit_key, created, expiry
(1, '0f6fb8603f4dfe026d88998d81a', '2008-03-02 15:17:56.899817', '2008-03-02')
or
(num=1, visit_key='0f6fb8603f4dfe026d88998d81a',
created='2008-03-02 15:17:56.899817', expiry='2008-03-02')
I'm using postgresql btw.

Here's one idea, which is similar to what I do sometimes, though I'm not sure that it's exactly what you're asking for:
Run a Lisp compiler (like SBCL) in SLIME. Then load CLSQL. It has a "Functional Data Manipulation Language" (see the SELECT documentation) which might help you do something like what you want, perhaps in conjunction with SLIME's autocompletion capabilities. If not, it's easy to define Lisp functions and macros (assuming you know Lisp, but you're already an Emacser!).
Out of the box, it doesn't give the nicely formatted tables that most SQL interfaces have, but even that isn't too hard to add. And Lisp is certainly powerful enough to let you easily come up with ways to make your common operations easier.

I've found the following changes in psql go some way to giving me homoiconicity:
=# select remote_ip, referer, http_method, time from hit limit 1;
    remote_ip    | referer | http_method |           time
-----------------+---------+-------------+---------------------------
 213.233.132.148 |         | GET         | 2013-08-27 08:01:42.38808
(1 row)
=# \a
Output format is unaligned.
=# \f ''', '''
Field separator is "', '".
=# \t
Showing only tuples.
=# select remote_ip, referer, http_method, time from hit limit 1;
213.233.132.148', '', 'GET', '2013-08-27 08:01:42.38808
Caveats: everything is treated as a string, and the output is missing the leading and trailing quotes.
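Alternatively, the quoting can be done in SQL itself with PostgreSQL's format() function, whose %L specifier renders a value as a properly quoted literal (a minimal sketch against the visit table from the first question; %s leaves num unquoted):
select format('(%s, %L, %L, %L)', num, visit_key, created, expiry)
from visit limit 4;
which prints rows like:
(1, '0f6fb8603f4dfe026d88998d81a', '2008-03-02 15:17:56.899817', '2008-03-02')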


Open sums in SQL / dynamic selection of tables

Much ink has been spilled on the topic of sum types in SQL. The standard solutions are called absorption, separation, and partition; see, e.g.: https://www.inf.unibz.it/~montali/teaching/1415/dpm/slides/4.relational-mapping.pdf .
I want to ask about how to encode open sums. Normal sums allow a field to be one of a fixed set of several different types; with open sums, this set is not fixed.
The basic setup in our program: There is a list of "triggers," where each trigger can be one of many different things. Plugins can be written defining new trigger types, although the set of trigger types can be assumed to be known at compile time.
We want a table of all triggers.
Our current best idea:
1. Dynamically create a materialized view of the following form:
id | id_in_plugin_table | thing_in_main_program_it_refs | plugin_name
---+--------------------+-------------------------------+-----------------------------------
 1 | 27                 | 8                             | RegexTrigger
 2 | 27                 | 12                            | RidiculouslyUnsafeCustomJSTrigger
This relation is automatically generated from the various plugin tables, each of which has its own id and a thing_in_main_program_it_refs field.
For illustration, here's what the referenced tables may look like.
RegexTrigger table:
id | thing_in_main_program_it_refs | regex
---+-------------------------------+-------
27 | 8                             | hel*o
RidiculouslyUnsafeCustomJSTrigger
id | thing_in_main_program_it_refs | custom_js
---+-------------------------------+----------------------------
27 | 12                            | (x) => isPrime(x.length())
2. Either use two round trips (look up the plugin table, then query it), or combine them into a single SQL program that uses EXEC.
I'm happy with part 1, but not with part 2. Neither option sounds efficient, and the latter option uses EXEC.
So, we're looking for either (a) a better way to dynamically select a table in a query, or (b) a different approach to open sums.
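For what it's worth, the view in part 1 can be generated as a plain UNION ALL over the plugin tables. A sketch in PostgreSQL, where the plugin table names regex_trigger and custom_js_trigger are guesses standing in for whatever each plugin registers:
CREATE MATERIALIZED VIEW all_triggers AS
WITH unioned AS (
    -- one branch per plugin table; regex_trigger / custom_js_trigger are guessed names
    SELECT id AS id_in_plugin_table,
           thing_in_main_program_it_refs,
           'RegexTrigger' AS plugin_name
    FROM regex_trigger
    UNION ALL
    SELECT id,
           thing_in_main_program_it_refs,
           'RidiculouslyUnsafeCustomJSTrigger'
    FROM custom_js_trigger
)
SELECT row_number() OVER (ORDER BY plugin_name, id_in_plugin_table) AS id,  -- fresh surrogate key
       id_in_plugin_table,
       thing_in_main_program_it_refs,
       plugin_name
FROM unioned;
Each new plugin contributes one more UNION ALL branch, which is why the view has to be (re)generated dynamically rather than written once.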

Combine query to get all the matching search text in right order

I have the following table:
postgres=# \d so_rum;
Table "public.so_rum"
Column | Type | Collation | Nullable | Default
-----------+-------------------------+-----------+----------+---------
id | integer | | |
title | character varying(1000) | | |
posts | text | | |
body | tsvector | | |
parent_id | integer | | |
Indexes:
"so_rum_body_idx" rum (body)
I wanted to run a phrase search query, so I came up with the query below, for example:
select id from so_rum
where body ## phraseto_tsquery('english','Is it possible to toggle the visibility');
This gives me results, but only ones that match the exact phrase. However, there are documents where the distance between lexemes is greater, and the above query doesn't give those back. For example: 'it is something possible to do toggle between the. . . visibility' doesn't get returned. I know I can get it returned with a <2> (for example) distance operator by writing it into to_tsquery manually.
But I want to understand how to do this in the SQL statement itself, so that I get results at a distance of 1 first, then 2, and so on (maybe up to 6-7). Finally, append results that simply contain all of the search words, like the following query:
select count(id) from so_rum
where body ## to_tsquery('english','string & string . . . ')
Is it possible to do in a single query with good performance?
I don't see a canned solution to this. It sounds like you need to use plainto_tsquery to get all the results with all the lexemes, and then implement your own custom ranking function to rank them by distance between the lexemes, and maybe filter out ones with the wrong order.
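One way to approximate that ordering without a full custom ranker is to retry the phrase query at increasing distances and keep the smallest distance at which each document matches. An untested sketch (it textually rewrites the distance operators that phraseto_tsquery emits, so exact stop-word gaps are lost, and it uses the core @@ match operator):
WITH d AS (SELECT generate_series(1, 7) AS dist)
SELECT r.id, min(d.dist) AS best_dist
FROM so_rum r
JOIN d ON r.body @@ to_tsquery('english',
        regexp_replace(
            phraseto_tsquery('english', 'Is it possible to toggle the visibility')::text,
            '<[-0-9]+>',                 -- rewrite every gap, <-> and <N> alike
            format('<%s>', d.dist),
            'g'))
GROUP BY r.id
ORDER BY best_dist;
The plain all-lexemes matches from the count query could then be appended with a UNION ALL at the end.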

How to compare value with multiple modified values from another table in BigQuery?

I am using Google BigQuery and I got the following issue:
I have a table (A) like this:
|          time           |     request     |
|-------------------------|-----------------|
| 2019-09-24 11:10:00 UTC | fakewebsite.com |
| 2019-09-24 11:10:00 UTC | realwebsite.com |
| ...                     | ...             |
| 2019-09-24 11:10:00 UTC | foobwebsite.com |
| 2019-09-24 11:10:00 UTC | barrwebsite.com |
And another table (B) like this:
| blacklist |
|-----------|
| foo.com   |
| ...       |
| bar.com   |
I want to make a query that will grab a modified version of the values inside the blacklist field of table B as follows:
SPLIT(NET.REG_DOMAIN(blacklist), CONCAT('.',NET.PUBLIC_SUFFIX(blacklist)))[OFFSET(0)] AS to_exclude --this will return only "foo" from "foo.com"
and then return all values from the request field of table A where none of the to_exclude values is found.
I know how to do this for one value but I don't know how to do this for multiple. I am looking for something like the following:
#standardSQL
WITH tmp_blacklist AS (
  SELECT
    SPLIT(NET.REG_DOMAIN(blacklist), CONCAT('.', NET.PUBLIC_SUFFIX(blacklist)))[OFFSET(0)] AS to_exclude
  FROM
    mydataset.B)
SELECT
  request
FROM
  mydataset.A
WHERE
  request NOT LIKE ("%value1%", "%value2%", ..., "%valuen%") -- I can't use OR along with the NOT LIKE since the values are too many and they will change.
The n values are the values of the tmp_blacklist table.
Also, if I don't define the table with the WITH and instead put the subquery after the NOT LIKE, I get the following error: Scalar subquery produced more than one element, which makes sense if LIKE expects only one element. But even if that were fixed, it would only be half the job, since I want the "%value%" pattern and not just the value from the table.
I searched online for a way to do this and found people saying it can't be done, plus some workarounds combining LIKE and IN, which people said would be very slow if one of the tables grows to hold tons of data (my case).
What is the best way to do this?
One method uses NOT EXISTS:
SELECT a.request
FROM mydataset.A a
WHERE NOT EXISTS (SELECT 1
                  FROM tmp_blacklist bl
                  WHERE a.request LIKE CONCAT('%', bl.to_exclude, '%')
                 );
Note that this can be expensive. You might want to test constructing the exclusion string as:
'value1|value2|value3'
and then using regular expressions.
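A sketch of that regex variant in BigQuery Standard SQL, reusing the tmp_blacklist CTE from the question (blacklist entries would need escaping if they can contain regex metacharacters, and an empty blacklist yields a NULL pattern):
#standardSQL
WITH tmp_blacklist AS (
  SELECT SPLIT(NET.REG_DOMAIN(blacklist),
               CONCAT('.', NET.PUBLIC_SUFFIX(blacklist)))[OFFSET(0)] AS to_exclude
  FROM mydataset.B)
SELECT request
FROM mydataset.A
WHERE NOT REGEXP_CONTAINS(
    request,
    -- build 'value1|value2|...|valuen' once, then test each request against it
    (SELECT STRING_AGG(to_exclude, '|') FROM tmp_blacklist));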

Postgres matching against an array of regular expressions

My client wants the ability to match a set of data against an array of regular expressions, meaning:
table:
name   | officeId (foreignkey)
-------+-----------------------
bob    | 1
alice  | 1
alicia | 2
walter | 2
and he wants to do something along these lines:
get me all records of offices (officeId) where there is a member with
ANY name ~ ANY[.*ob, ali.*]
meaning
ANY of[alicia, walter] ~ ANY of [.*ob, ali.*] results in true
I could not figure it out by myself sadly :/.
Edit
The real problem was missing from the original description:
I cannot use select distinct officeId .. where name ~ ANY[.*ob, ali.*], because:
This application stores data in Postgres XML columns, which means that (after evaluating (xpath('/data/clients/name/text()'))::text[]) I in fact have:
table:
name              | officeId (foreignkey)
------------------+-----------------------
[bob, alice]      | 1
[anthony, walter] | 2
[alicia, walter]  | 3
There is the problem. And "you don't do that, that is horrible, why would you do it like this, store it the way it is meant to be stored in a relational database, use a NoSQL database for document-based storage, use JSON" are not options.
I am stuck with this data model.
This looks pretty horrific, but the only way I can think of doing such a thing would be a hybrid of a cross join and a semi-join. On small data sets this would probably work pretty well. On large data sets, I imagine the cross-join component could hit you pretty hard.
Check it out and let me know if it works against your real data:
with patterns as (
    select unnest(array['.*ob', 'ali.*']) as pattern
)
select
    o.name, o.officeid
from
    office o
where exists (
    select null
    from patterns p
    where o.name ~ p.pattern
)
The semi-join protects you from cases where a name like "alicia nob" meets multiple search patterns; without it, such a row would come back once for every pattern it matches.
You could cast the array to text.
SELECT * FROM workers WHERE (xpath('/data/clients/name/text()', xml_field))::text ~ ANY(ARRAY['wal','ant']);
When casting a string array to text, strings containing special characters or consisting of keywords are enclosed in double quotes, so that {jimmy,"walter, james"} is two entries. Also, ~ matches against any part of the string, unlike LIKE, which matches against the whole string.
Here is what I did in my test database:
test=# select id, (xpath('/data/clients/name/text()', name))::text[] as xss, officeid from workers WHERE (xpath('/data/clients/name/text()', name))::text ~ ANY(ARRAY['wal','ant']);
 id |           xss           | officeid
----+-------------------------+----------
  2 | {anthony,walter}        |        2
  3 | {alicia,walter}         |        3
  4 | {"walter, james"}       |        5
  5 | {jimmy,"walter, james"} |        4
(4 rows)
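If the quoting quirks of the whole-array cast are a concern, an alternative sketch (untested) is to unnest the extracted array and match each name individually:
SELECT DISTINCT w.officeid
FROM workers w
CROSS JOIN LATERAL unnest(
    (xpath('/data/clients/name/text()', w.name))::text[]
) AS n(client_name)                      -- one row per extracted name
WHERE n.client_name ~ ANY (ARRAY['wal', 'ant']);
DISTINCT collapses offices that match on more than one name.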

How to select everything that is NOT part of this string in database field?

First: I'm using Access 2010.
What I need to do is pull out everything in a field that is NOT a certain string. Say, for example, you have this:
00123457A8V
The last 3 characters, A8V, are just an example; that portion can be any combination of numbers/letters and 2-4 characters long. The 00123457 portion will always be the same. So what I would need to have returned by my query in the example above is "A8V".
I have a vague idea of how to do this, which involved using the Right function, with (field length - the last position in that string). So what I had was
SELECT Right(Facility.ID, (Len([ID) - InstrRev([ID], "00123457")))
FROM Facility;
Logically, in my mind, it would work; however, Access 2010 complains that I am using the Right function incorrectly. Can someone here help me figure this out?
Many thanks!
Why not use a replace function?
REPLACE(Facility.ID, "00123457", "")
You are missing a closing square bracket in here Len([ID)
You also need to reverse this "00123457" in InStrRev(), but you don't need InStrRev(), just InStr().
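For reference, a corrected version of the original attempt might look like this (a sketch; it assumes the 00123457 prefix is always present in ID):
SELECT Mid(Facility.ID, InStr(Facility.ID, "00123457") + Len("00123457")) AS Suffix
FROM Facility;
Mid with no length argument returns everything from the given position to the end of the string, so this works for 2-4 character suffixes without knowing their length.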
If I understand correctly, you want the last three characters of the string.
The simple syntax: Right([string],3) will yield the results you desire.
(http://msdn.microsoft.com/en-us/library/ms177532.aspx)
For example:
(TABLE1)
| ID |    STRING    |
---------------------
| 1  | 001234567A8V |
| 2  | 008765432A8V |
| 3  | 005671234A8V |
So then you'd run this query:
SELECT Right([Table1].[STRING], 3) AS Result FROM Table1;
And the Query returns:
(QUERY)
| RESULT |
----------
| A8V    |
| A8V    |
| A8V    |
EDIT:
After seeing the need for the end string to be 2-4 characters while the original, left portion of the string is 00123457 (8 characters), try this:
SELECT Right([Table1].[string], Len([Table1].[string]) - 8) AS Result
FROM Table1;