What can/cannot a SQLite column name be?

Are there any rules for SQLite column names?
Can a column name have characters like '/'?
Can it be UTF-8?

Can it have characters like '/'?
All examples are from SQLite 3.5.9 running on Linux.
If you surround the column name in double quotes, you can:
> CREATE TABLE test_forward ( /test_column INTEGER );
SQL error: near "/": syntax error
> CREATE TABLE test_forward ("/test_column" INTEGER );
> INSERT INTO test_forward("/test_column") VALUES (1);
> SELECT test_forward."/test_column" from test_forward;
1
That said, you probably shouldn't do this.
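SQLite also accepts square brackets and backticks for quoting identifiers (compatibility features), so these are equivalent ways to reference the same column (a quick sketch against the table above):
> SELECT "/test_column" FROM test_forward;
> SELECT [/test_column] FROM test_forward;
> SELECT `/test_column` FROM test_forward;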

The following answer is based on the SQLite source code, mostly relying on the file parse.y (input for the lemon parser).
TL;DR:
The allowed names for columns and tables in CREATE TABLE statements are (see the CREATE TABLE sketch right after this list):
- '-escaped strings of any kind (even keywords)
- identifiers, which means
  - `- and "-escaped strings of any kind (even keywords)
  - a series of characters that either have the high bit set (MSB = 1) or are 7-bit ASCII characters marked as identifier characters in tokenize.c, as long as the series does not form a keyword, with two exceptions:
    - the keyword INDEXED, because it is non-standard
    - the JOIN_KW keywords (the join-type keywords LEFT, RIGHT, FULL, INNER, CROSS, NATURAL and OUTER), for a reason that is unknown to me
The allowed names for result columns in a SELECT statement are
- either a string or an identifier as described above
- all of the above if used as a column alias written after AS
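A few CREATE TABLE statements that these rules permit (a quick sketch with made-up table names, not exhaustive):
CREATE TABLE t1 ('select' INTEGER);   -- '-escaped string, even a keyword
CREATE TABLE t2 ("group" INTEGER);    -- "-escaped identifier
CREATE TABLE t3 (`order` INTEGER);    -- `-escaped identifier
CREATE TABLE t4 (indexed INTEGER);    -- the non-standard keyword INDEXED
CREATE TABLE t5 (natural INTEGER);    -- one of the JOIN_KW keywords
CREATE TABLE t6 (äöü INTEGER);        -- characters with the high bit set (e.g. UTF-8), unquoted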
Now to the exploration process itself.
Let's look at the syntax for column names in CREATE TABLE:
// The name of a column or table can be any of the following:
//
%type nm {Token}
nm(A) ::= id(X). {A = X;}
nm(A) ::= STRING(X). {A = X;}
nm(A) ::= JOIN_KW(X). {A = X;}
Digging deeper, we find that:
// An IDENTIFIER can be a generic identifier, or one of several
// keywords. Any non-standard keyword can also be an identifier.
//
%type id {Token}
id(A) ::= ID(X). {A = X;}
id(A) ::= INDEXED(X). {A = X;}
"Generic identifier" sounds unfamiliar. A quick look into tokenize.c however brings forth the definition
/*
** The sqlite3KeywordCode function looks up an identifier to determine if
** it is a keyword. If it is a keyword, the token code of that keyword is
** returned. If the input is not a keyword, TK_ID is returned.
*/
/*
** If X is a character that can be used in an identifier then
** IdChar(X) will be true. Otherwise it is false.
**
** For ASCII, any character with the high-order bit set is
** allowed in an identifier. For 7-bit characters,
** sqlite3IsIdChar[X] must be 1.
**
** Ticket #1066. the SQL standard does not allow '$' in the
** middle of identfiers. But many SQL implementations do.
** SQLite will allow '$' in identifiers for compatibility.
** But the feature is undocumented.
*/
For a full map of identifier characters, please consult tokenize.c.
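As a small illustration of the '$' compatibility mentioned in that comment (undocumented behaviour, so treat this as a sketch with made-up names rather than something to rely on):
> CREATE TABLE price$list (amount$usd INTEGER);
> INSERT INTO price$list VALUES (42);
> SELECT amount$usd FROM price$list;
42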
It is still unclear what names are available for a result column (i.e. the column name or alias assigned in the SELECT statement). parse.y is again helpful here:
// An option "AS <id>" phrase that can follow one of the expressions that
// define the result set, or one of the tables in the FROM clause.
//
%type as {Token}
as(X) ::= AS nm(Y). {X = Y;}
as(X) ::= ids(Y). {X = Y;}
as(X) ::= . {X.n = 0;}
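In other words, a result column can be aliased by a plain identifier, by a string, or (after AS) by anything nm accepts, including the keyword exceptions; a quick sketch of what this permits:
> SELECT 1 AS "quoted alias", 2 AS 'string alias', 3 AS indexed, 4 plain_alias;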

Besides placing "illegal" identifier names between double quotes ("identifier#1"), wrapping them in square brackets ([identifier#2]) works as well.
Example:
sqlite> create table a0.tt ([id#1] integer primary key, [id#2] text) without rowid;
sqlite> insert into tt values (1,'test for [x] id''s');
sqlite> select * from tt
...> ;
id#1|id#2
1|test for [x] id's

Valid field names are subject to the same rules as valid table names. I checked this with SQLite Administrator.
Only alphanumeric characters and underscores are allowed.
The field name must begin with an alphabetic character or an underscore.
Stick to these and no escaping is needed, and it may avoid future problems.
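For example, names like these follow those rules and never need quoting (a minimal sketch):
CREATE TABLE customer_orders (order_id INTEGER PRIMARY KEY, customer_name TEXT, _created_at TEXT);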

This isn't a full answer, but it may help: these are the keywords from https://www.sqlite.org/lang_keywords.html converted to an array.
["ABORT", "ACTION", "ADD", "AFTER", "ALL", "ALTER", "ALWAYS", "ANALYZE", "AND", "AS",
"ASC", "ATTACH", "AUTOINCREMENT", "BEFORE", "BEGIN", "BETWEEN", "BY", "CASCADE", "CASE", "CAST",
"CHECK", "COLLATE", "COLUMN", "COMMIT", "CONFLICT", "CONSTRAINT", "CREATE", "CROSS", "CURRENT", "CURRENT_DATE",
"CURRENT_TIME", "CURRENT_TIMESTAMP", "DATABASE", "DEFAULT", "DEFERRABLE", "DEFERRED", "DELETE", "DESC", "DETACH", "DISTINCT",
"DO", "DROP", "EACH", "ELSE", "END", "ESCAPE", "EXCEPT", "EXCLUDE", "EXCLUSIVE", "EXISTS",
"EXPLAIN", "FAIL", "FILTER", "FIRST", "FOLLOWING", "FOR", "FOREIGN", "FROM", "FULL", "GENERATED",
"GLOB", "GROUP", "GROUPS", "HAVING", "IF", "IGNORE", "IMMEDIATE", "IN", "INDEX", "INDEXED",
"INITIALLY", "INNER", "INSERT", "INSTEAD", "INTERSECT", "INTO", "IS", "ISNULL", "JOIN", "KEY",
"LAST", "LEFT", "LIKE", "LIMIT", "MATCH", "MATERIALIZED", "NATURAL", "NO", "NOT", "NOTHING",
"NOTNULL", "NULL", "NULLS", "OF", "OFFSET", "ON", "OR", "ORDER", "OTHERS", "OUTER",
"OVER", "PARTITION", "PLAN", "PRAGMA", "PRECEDING", "PRIMARY", "QUERY", "RAISE", "RANGE", "RECURSIVE",
"REFERENCES", "REGEXP", "REINDEX", "RELEASE", "RENAME", "REPLACE", "RESTRICT", "RETURNING", "RIGHT", "ROLLBACK",
"ROW", "ROWS", "SAVEPOINT", "SELECT", "SET", "TABLE", "TEMP", "TEMPORARY", "THEN", "TIES",
"TO", "TRANSACTION", "TRIGGER", "UNBOUNDED", "UNION", "UNIQUE", "UPDATE", "USING", "VACUUM", "VALUES",
"VIEW", "VIRTUAL", "WHEN", "WHERE", "WINDOW", "WITH", "WITHOUT"]

Related

SQL for json array in column

I have a SQL table where one of the columns has the jsonb datatype. Below is a JSON entry:
{
    "size": -1,
    "regions": [
        {
            "shape_attributes": {
                "name": "polygon",
                "X": [
                    2703,
                    2801,
                    2884
                ]
            },
            "region_attributes": {
                "Material Type": "wood",
                "Color": "red"
            }
        },
        {
            "shape_attributes": {
                "name": "polygon",
                "X": [
                    2397,
                    2504,
                    2767
                ]
            },
            "region_attributes": {
                "Material Type": "metal",
                "Color": "blue"
            }
        }
    ],
    "filename": "filenam_1"
}
I am using PostgreSQL.
Given a search_string, how can I use SQL to select rows for these two cases:
Key is known
Key is not known, i.e. the string could be anywhere in the JSON
I have tried this:
select *
from TABLE_Name
WHERE 'wood' IN ( SELECT value FROM OPENJSON(COL_NAME,'$.Material Type'))
---
Error occurred during SQL query execution
Reason:
SQL Error [42883]: ERROR: function openjson(jsonb, unknown) does not exist
Hint: No function matches the given name and argument types. You might need to add explicit type casts.
SELECT *
FROM TABLE_Name
CROSS APPLY OPENJSON(COL_NAME,'$.Material Type')
WHERE value ='wood'
---
Error occurred during SQL query execution
Reason:
SQL Error [42601]: ERROR: syntax error at or near "APPLY"
To find a key/value pair with a known key, you can use several different methods; using the contains operator is one of them:
select *
from table_name
where the_jsonb_column @> '{"regions": [{"region_attributes": {"Material Type": "wood"}}]}'
The equivalent of the mentioned openjson function (from SQL Server) would be jsonb_each() but that (just like openjson) will only expand the top-level key/value pairs. It doesn't do this recursively.
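For illustration, a sketch of what jsonb_each() does here; it expands only the top-level keys (size, regions, filename), so it never reaches the nested "Material Type" values:
select t.*, e.key, e.value
from table_name t
cross join jsonb_each(t.the_jsonb_column) as e(key, value);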
If you at least know the key is somewhere in the regions array, you can use a JSON path expression that iterates over all elements (recursively):
select *
from table_name t
where (t.the_jsonb_column -> 'regions') @@ '$[*].** == "wood"'
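And for the case where the key is not known at all (the string could be anywhere in the JSON), the same recursive wildcard can be applied to the whole document; a sketch assuming PostgreSQL 12 or later for jsonpath support:
select *
from table_name
where the_jsonb_column @@ '$.** == "wood"'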
I don't think what you are trying to do is possible in SQL alone, unless I am missing something. You could instead use a programming language like Python or C# and execute the SQL queries from the program. It is much easier.

how to use trino/presto to query redis

I have a simple string and a hash stored in Redis:
get test
"1"
hget htest first
"first hash"
I'm able to see the "table" test, but there are no columns
trino> show columns from redis.default.test;
Column | Type | Extra | Comment
--------+------+-------+---------
(0 rows)
and obviously I can't get result from select
trino> select * from redis.default.test;
Query 20210918_174414_00006_dmp3x failed: line 1:8: SELECT * not allowed from relation
that has no columns
I see in the documentation that I might need to create a table definition file, but I wasn't able to create one that will work.
I tried a few variations of this; here is one of them, for example:
{
    "tableName": "test",
    "schemaName": "default",
    "value": {
        "dataFormat": "json",
        "fields": [
            {
                "name": "number",
                "mapping": 0,
                "type": "INT"
            }
        ]
    }
}
Any idea what I am doing wrong?
I focused on the string since it's simpler, but I also need to query the hash.

How do I INSERT columns with nested name syntax (i.e. "item.description")?

I'm trying to merge two databases with the same schema on Google BigQuery.
I'm following the merge samples here: https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_statement
However, my tables have nested columns, e.g. "service.id" or "service.description".
My code so far is:
MERGE combined_table
USING table1
ON table1.id = combined_table.id
WHEN NOT MATCHED THEN
INSERT(id, service.id, service.description)
VALUES(id, service.id, service.description)
However, I get the error message: Syntax error: Expected ")" or "," but got ".", and a red squiggly underline under .id on the INSERT(...) line.
Here is a view of part of my table's schema:
[
    {
        "name": "id",
        "type": "STRING"
    },
    {
        "name": "service",
        "type": "RECORD",
        "fields": [
            {
                "name": "id",
                "type": "STRING"
            },
            {
                "name": "description",
                "type": "STRING"
            }
        ]
    },
    {
        "name": "cost",
        "type": "FLOAT"
    }
    ...
]
How do I properly structure this INSERT(...) statement so that I can include the nested columns?
Syntax error: Expected ")" or "," but got "."
Looks like you are going in the right direction. Note in the documentation how you need to insert values into a REPEATED column.
You need to define the structure to tell BigQuery what to expect, for example:
STRUCT<created DATE, comment STRING>
This is the full example from the documentation
MERGE dataset.DetailedInventory T
USING dataset.Inventory S
ON T.product = S.product
WHEN NOT MATCHED AND quantity < 20 THEN
INSERT(product, quantity, supply_constrained, comments)
-- insert values like this
VALUES(product, quantity, true, ARRAY<STRUCT<created DATE, comment STRING>>[(DATE('2016-01-01'), 'comment1')])
WHEN NOT MATCHED THEN
INSERT(product, quantity, supply_constrained)
VALUES(product, quantity, false)
I've found the answer.
It turns out that when referencing the top level of a STRUCT, BigQuery references all of the nested columns as well. So if I want to INSERT service and all of its sub-columns (service.id and service.description), I only have to include service in the INSERT(...) statement.
The following code worked:
...
WHEN NOT MATCHED THEN
INSERT(id, service)
VALUES(id, service)
This merges all sub-columns, including service.id and service.description.
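Putting it together, the full statement from the question would look something like this (a sketch using the question's table and column names):
MERGE combined_table
USING table1
ON table1.id = combined_table.id
WHEN NOT MATCHED THEN
INSERT(id, service)
VALUES(id, service)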

How to setup a field mapping for ElasticSearch that allows both exact and full text searching?

Here is my problem:
I have a field called product_id that is in a format similar to:
A+B-12321412
If I use the standard analyzer, it splits the value into tokens like so:
/_analyze/?analyzer=standard&pretty=true" -d '
A+B-1232412
'
{
    "tokens" : [ {
        "token" : "a",
        "start_offset" : 1,
        "end_offset" : 2,
        "type" : "<ALPHANUM>",
        "position" : 1
    }, {
        "token" : "b",
        "start_offset" : 3,
        "end_offset" : 4,
        "type" : "<ALPHANUM>",
        "position" : 2
    }, {
        "token" : "1232412",
        "start_offset" : 5,
        "end_offset" : 12,
        "type" : "<NUM>",
        "position" : 3
    } ]
}
Ideally, I would like to sometimes search for an exact product id and other times use a substring, or just query for part of the product id.
My understanding of mappings and analyzers is that I can only specify one analyzer per field.
Is there a way to store a field as both analyzed and exact match?
Yes, you can use the fields parameter. In your case:
"product_id": {
"type": "string",
"fields": {
"raw": { "type": "string", "index": "not_analyzed" }
}
}
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html
This allows you to index the same data twice, using two different definitions. In this case it will be indexed via both the default analyzer and not_analyzed which will only pick up exact matches. This is also useful for sorting return results:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/multi-fields.html
However, you will need to spend some time thinking about how you want to search. In particular, given part numbers with a mix of alpha, numeric and punctuation or special characters you may need to get creative to tune your queries and matches.
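For example (a sketch; product_id.raw refers to the sub-field defined above), an exact lookup can target the raw sub-field while ordinary full-text queries keep using product_id itself:
{
    "query": {
        "term": { "product_id.raw": "A+B-12321412" }
    }
}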

Search multiple fields with "and" operator (but use fields' own analyzers)

ElasticSearch Version: 0.90.2
Here's the problem: I want to find documents in the index so that they:
match all query tokens across multiple fields
are matched using the fields' own analyzers
So if there are 4 documents:
{ "_id" : 1, "name" : "Joe Doe", "mark" : "1", "message" : "Message First" }
{ "_id" : 2, "name" : "Ann", "mark" : "3", "message" : "Yesterday Joe Doe got 1 for the message First"}
{ "_id" : 3, "name" : "Joe Doe", "mark" : "2", "message" : "Message Second" }
{ "_id" : 4, "name" : "Dan Spencer", "mark" : "2", "message" : "Message Third" }
And if the query is "Joe First 1", it should find ids 1 and 2. I.e., it should find documents which contain all the tokens from the search query, no matter which fields they are in (maybe all tokens are in one field, or maybe each token is in its own field).
One solution would be to use elasticsearch "_all" field functionality: that way it will merge all the fields I need (name, mark, message) into one and I'll be able to query it with something like
"match": {
"_all": {
"query": "Joe First 1",
"operator": "and"
}
}
But this way I can specify an analyzer for the "_all" field only, and I need the "name" and "message" fields to have different sets of tokenizers/token filters (let's say name will have a phonetic analyzer and message will have some stemming token filter).
Is there a way to do this?
Thanks to the guys at the elasticsearch group, here's the solution... pretty simple, needless to say :)
All I needed to do was use the query_string query http://www.elasticsearch.org/guide/reference/query-dsl/query-string-query/ with default_operator = AND, and it does the trick:
{
    "query": {
        "query_string": {
            "fields": [
                "name",
                "mark",
                "message"
            ],
            "query": "Joe First 1",
            "default_operator": "AND"
        }
    }
}
I think using a multi_match query makes sense here. Something like:
"multi_match": {
    "query": "Joe First 1",
    "operator": "and",
    "fields": [ "name", "message", "mark" ]
}
As you say, you can set the analyzer (or search_analyzer/index_analyzer) to be used on the _all field. It seems to me that should indeed be your first step to achieve the query results you're looking for.
From http://jontai.me/blog/2012/10/lucene-scoring-and-elasticsearch-_all-field/, we have this tasty quote:
... the _all field copies the text from the other fields and analyzes
them again; it doesn’t copy the pre-analyzed tokens. You can set a
separate analyzer for the _all field.
Which I interpret to mean that you should set your _all analyzer(s) as well as the individual field analyzer(s). The _all field won't reuse the tokens produced by the individual fields' analyzers; it grabs the original field contents and analyzes them with its own analyzer.
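A sketch of what such a mapping could look like; the type name and the custom analyzer names here are made up, and the analyzers themselves would have to be defined in the index settings:
{
    "mappings": {
        "message_doc": {
            "_all": { "analyzer": "standard" },
            "properties": {
                "name": { "type": "string", "analyzer": "my_phonetic_analyzer" },
                "mark": { "type": "string" },
                "message": { "type": "string", "analyzer": "my_stemming_analyzer" }
            }
        }
    }
}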