SQL Query in StarcounterDB - sql

I have an SQL query problem in my Starcounter database where I want to write a query that retrieves html pages (Content) containing a specific set of keywords based on the following models.
class Keyword {
int Id;
string Text;
}
Example data:
Id Text
1 google
2 advertising
3 twitter
class Content {
int Id;
string Text;
}
Example data:
Id Text
4 "google advertising is nice for marketers"
5 "twitter is not a good way to get exposure"
class HasWord : Relation {
Keyword Keyword;
Content Content;
}
Example data:
Keyword Content
1 4
2 4
3 5
How would I write a Query that retrieves all Content containing 2 specific words, in this case "google" and "advertising"? Is it possible without nested queries?
Input:
"google" & "advertising"
Desired output:
"google advertising is nice for marketers"

If you need to find an instance of Content , which Text contains both words, then you can use operator LIKE with regular expression, which is described here. For example:
SELECT Text
FROM Content
WHERE Text LIKE '%google%' AND Text LIKE '%advertising%'
If you need to find an instance of Content, which is in relationship with both keywords, then you can use self-join. For example
SELECT r1.Content.Text
FROM HasWord r1, HasWord r2
WHERE r1.Content = r2.Content AND
r1.Keyword.Text = 'google' AND r2.Keyword.Text = 'advertising'
Note that the result of the query will contain duplicates.

Related

Filter based on 2 JSON properties after using json_extract

Imported database tables :
id | JSON
-------------|---------
Signed 32int | Raw JSON
It is easier to search via the properties of the JSON data than by id of the row itself. Each piece of JSON data contains (for this demo):
json: {
displayProperties: {},
hash: "foo"
itemType: "bar"
}
When I select I would like to matching hash, and then filter those results by a matching itemType.
My query :
SELECT json_extract(ItemDefinition.json, '$')
FROM ItemDefinition, json_tree(ItemDefinition.json, '$')
WHERE json_tree.key = 'hash' AND json_tree.value IN ${hashList}
However this returns every item that has a matching hash value. From here, I would like to also filter by key: itemType and value: "19". So I tried :
SELECT json_extract(ItemDefinition.json, '$')
FROM ItemDefinition, json_tree(ItemDefinition.json, '$')
WHERE json_tree.key = 'hash' AND json_tree.value IN ${hashList}
AND WHERE json_tree.key = 'itemType' AND json_tree.value = 19
But this isn't syntactically correct, let alone output what I am looking for. Error:
SQLITE_ERROR: near "WHERE": syntax error
The title of the question turned out to not be accurate to what I was looking for. I miss-understood what json_tree actually did. json_tree actually builds a new object with values that are filled in by the database.
What I was actually looking for was to filter by a specific value in the json column, which can be achieved by json_extract. json_extract('{column}', $.{filterValue}) will pull the raw json object out of the json column
This is the query that is working for me now:
SELECT json_extract(ItemDefinition.json, '$')
FROM ItemDefinition, json_tree(ItemDefinition.json, '$')
WHERE json_tree.key = 'hash'
AND json_tree.value IN ${hashList}
AND json_extract(ItemDefinition.json, '$.itemType') = 19
This selects the json column from ItemDefinition
Creates a json_tree from the json column
Filters results by json tree key and value
Finally filters by the property itemType from the raw json column

How to retrieve the list of dynamic nested keys of BigQuery nested records

My ELT tools imports my data in bigquery and generates/extends automatically the schema for dynamic nested keys (in the schema below, under properties)
It looks like this
How can I get the list of nested keys of a repeated record ? so for example I can group by properties when those items have said property non-null ?
I have tried
select column_name
from my_schema.INFORMATION_SCHEMA.COLUMNS
where
table_name = 'my_table
But it will only list first level keys
From the picture above, I want, as a first step, a SQL query that returns
message
user_id
seeker
liker_id
rateable_id
rateable_type
from_organization
likeable_type
company
existing_attempt
...
My real goal through, is to group/count my data based on a non-null value of a 2nd level nested properties properties.filters.[filter_type]
The schema may evolve when our application adds more filters, so this need to be dynamically generated, I can't just hard-code the list of nested keys.
Note: this is very similar to this question How to extract all the keys in a JSON object with BigQuery but in my case my data is already in a shcema and it's not a JSON object
EDIT:
Suppose I have a list of such records with nested properties, how do I write a SQL query that adds a field "enabled_filters" which aggregates, for each item, the list of properties for wihch said property is not null ?
Example input (properties.x are dynamic and not known by the programmer)
search_id
properties.filters.school
properties.filters.type
1
MIT
master
2
Princetown
null
3
null
master
Example output
search_id
enabled_filters
1
["school", "type"]
2
["school"]
3
["type"]
Have you looked at COLUMN_FIELD_PATHS? It should give you the paths for all columns.
select field_path from my_schema.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS where table_name = '<table>'
[https://cloud.google.com/bigquery/docs/information-schema-column-field-paths]
The field properties is not nested by array only by structures. Then a UDF in JavaScript to parse thise field should work fast enough.
CREATE TEMP FUNCTION jsonObjectKeys(input STRING, shownull BOOL,fullname Bool)
RETURNS Array<String>
LANGUAGE js AS """
function test(input,old){
var out=[]
for(let x in input){
let te=input[x];
out=out.concat(te==null ? (shownull?[x+'==null']:[]) : typeof te=='object' ? test(te,old+x+'.') : [fullname ? old+x : x] );
}
return out;
Object.keys(JSON.parse(input));
}
return test(JSON.parse(input),"");
""";
with tbl as (select struct(1 as alpha,struct(2 as x, 3 as y,[1,2,3] as z ) as B) A from unnest(generate_array(1,10*1))
union all select struct(null,struct(null,1,[999])) )
select *,
TO_JSON_STRING (A ) as string_output,
jsonObjectKeys(TO_JSON_STRING (A),true,false) as output1,
jsonObjectKeys(TO_JSON_STRING (A),false,true) as output2,
concat('["', array_to_string(jsonObjectKeys(TO_JSON_STRING (A),false,true),'","' ) ,'"]') as output_sring,
jsonObjectKeys(TO_JSON_STRING (A.B),false,true) as outpu
from tbl

Query to check if we have any row which start with input String

lets say have column with values
"Html", "Java", "JQuery", "Sqlite"
and if user enters
"Java is my favorite programming language"
then result should be
1 row selected with value "Java"
because entered sentence starts with "Java"
but if user enters
"My application database is in Sqlite"
then query should return empty.
0 row selected
because entered sentence does not start with "Sqlite".
I am trying following query:
SELECT * FROM demo where 'Java' like demo.name; // work
but if enter long text then it fails
SELECT * FROM demo where 'Java is my favorite programming language' like demo.name; // Fails
I think you want something like this :
SELECT * FROM demo WHERE yourText LIKE demo.name || '%' ;
Just put the column name on the left side of the like operator, and concatenate a wildcard at the right side of your string keywork:
select * from demo where name like 'Java%' ; -- or like 'Sqlite%' etc
You can make it more generic like:
select * from demo where name like ? || %' ;
where ? is a bind parameter that represents the value provided by the user (you may also concatenate with the wildcard in your application before passing the parameter).
Concatenate your column with a wildcard:
SELECT * FROM demo where 'Java is my favorite programming language' like demo.name || '%';
You can make your query like this
Select * from SearchPojo where name GLOB '*' || :namestring|| '*'"
Not sure if I got the problem but ...
You can query these values (Html, Java, JQuery, Sqlite) put them in a collection, like List, then grab user's input and check it:
List<String> values ... // Html, Java, JQuery, Sqlite
for (String value : values) {
if (userInput.contains(value) {
// print
System.out.println("1 row selected with value \"Java\");
}
}
It supports cases where user type more then one option.

Groovy SQL named list parameter

I want to use a keyset of a Map as a list parameter in a SQL query:
query = "select contentid from content where spaceid = :spaceid and title in (:title)"
sql.eachRow(query, [spaceid: 1234, title: map.keySet().join(',')]) {
rs ->
println rs.contentid
}
I can use single values but no Sets or Lists.
This is what I've tried so far:
map.keySet().join(',')
map.keySet().toListString()
map.keySet().toList()
map.keySet().toString()
The map uses Strings as key
Map<String, String> map = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
Also, I don't get an error. I just get nothing printed like have an empty result set.
You appoach will not give the expected result.
Logically you are using a predicate such as
title = 'value1,value2,value3'
This is the reason why you get no exception but also no data.
Quick search gives a little evidence, that a mapping of a collections to IN list is possible in Groovy SQL.
Please check here and here
So very probably you'll have to define the IN list in a proper length and assign the values from your array.
title in (:key1, :key2, :key3)
Anyway something like this works fine:
Data
create table content as
select 1 contentid, 1 spaceid, 'AAA' title from dual union all
select 2 contentid, 1 spaceid, 'BBB' title from dual union all
select 3 contentid, 2 spaceid, 'AAA' title from dual;
Groovy Script
map['key1'] = 'AAA'
map['key2'] = 'BBB'
query = "select contentid from content where spaceid = :spaceid and title in (${map.keySet().collect{":$it"}.join(',')})"
println query
map['spaceid'] = 1
sql.eachRow(query, map) {
rs ->
println rs.contentid
}
Result
select contentid from content where spaceid = :spaceid and title in (:key1,:key2)
1
2
The key step is to dynamicall prepare the IN list with proper names of the bind variable using the experssion map.keySet().collect{":$it"}.join(',')
Note
You may also want to check the size if the map and handle the case where it is greater than 1000, which is an Oracle limitation of a single IN list.
It has worked for me with a little adaptation, I've added the map as a second argument.
def sql = Sql.newInstance("jdbc:mysql://localhost/databaseName", "userid", "pass")
Map<String,Long> mapProduitEnDelta = new HashMap<>()
mapProduitEnDelta['key1'] = 1
mapProduitEnDelta['key2'] = 2
mapProduitEnDelta['key3'] = 3
produits : sql.rows("""select id, reference from Produit where id IN (${mapProduitEnDelta.keySet().collect{":$it"}.join(',')})""",mapProduitEnDelta),
Display the 3 products (colums + values from the produit table) of id 1, 2, 3

Semantic Similarity Result interpretation

I'm performing a semantic similarity using a tool here,
I'm getting the following results, but cannot properly interprete them:
apple#n#1,banana#n#1 0.04809463683080774
apple#n#1,banana#n#2 0.13293629283742603
apple#n#2,banana#n#1 0.0
apple#n#2,banana#n#2 0.0
here is the code:
URL url = new URL ( "file" , null , "dictionary/3.0/dict" );
IDictionary dict = new Dictionary ( url ) ;
dict.open () ;
// look up first sense of the word " dog "
IIndexWord idxWord = dict . getIndexWord ( "dog" , POS.NOUN ) ;
IWordID wordID = idxWord . getWordIDs () . get (0) ; // 1 st meaning
List <IWordID> wordIDs = idxWord.getWordIDs();
JWS ws= new JWS ("dictionary", "3.0");
TreeMap <String,Double> scores1 = ws.getJiangAndConrath().jcn("apple", "banana", "n");
for (String s:scores1.keySet())
System.out.println(s+"\t"+scores1.get(s));
From the NLTK Documentation:
The Jiang Conrath similarity returns a score denoting how similar two
word senses are, based on the Information Content (IC) of the Least
Common Subsumer (most specific ancestor node) and that of the two
input Synsets. The relationship is given by the equation 1 / (IC(s1) +
IC(s2) - 2 * IC(lcs)).
A result of 0 means that the two concepts are not related at all.
A result near 1 would mean a very close relationship.
can you put me code source written in JAVA responsible for the execution of LeacockAndChodorow algorithm because I do have some problems with Url variable?