SPARQL column headings with different name

I am new to SANSA-Stack and am using a SPARQL query to perform some operations on a triples RDD. My SELECT uses specific variable names, but when the query completes, the column names in the result are changed to seemingly random values.
val query = s""" PREFIX ns0: <https://www.example.com/discovery/catalog/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?ColumnRef
WHERE
{
{<https://www.example.com/db/h2/fred/2020/table/FRED.FRED.US_REGIONS}> ns0:column ?ColumnRef .}
}
"""
val result : sql.DataFrame = triples.sparql(query)
result.show()
In the output of result.show(), the column name has been changed (o instead of ColumnRef):
+--------------------+
| o|
+--------------------+
|https://www.examp...|
|https://www.examp...|
|https://www.examp...|
|https://www.examp...|
+--------------------+
I am new to this technology stack, so please let me know what I am doing wrong.

Here is a temporary solution that works for my purposes: it returns a DataFrame with the expected column names. By decomposing rdd.sparql, the rewrite object can be used to obtain the column mappings: https://gist.github.com/JNKHunter/c16caa882993facb31a273ec274cb8e3
Warning: a SANSA query often returns more than one column for each selected SPARQL variable. In those cases this code simply concatenates the columns, since the columns after the first tend to contain empty strings. Concatenation may not be what you want; however, I have yet to discover what those empty-string DataFrame columns represent.

Related

Elasticsearch, Elasticsearch SQL, SHOW COLUMNS or DESCRIBE - is there a possibility to filter the output

I have simple elastic SQL query like this:
GET /_sql?format=txt
{
"query" :"""
DESCRIBE "index_name"
"""
}
and it works, and the output is like this:
column | type | mapping
-----------------------------------------------------------
column_name1 | STRUCT | object
column_name1.Id | VARCHAR | text
column_name1.Id.keyword | VARCHAR | keyword
Is it possible to write the above query with a filter or a WHERE clause, for example something like this:
GET /_sql?format=txt
{
"query":"""
DESCRIBE "index_name"
""",
"filter": {"terms": {"type.keyword": ["STRUCT"]}}
}
or
GET /_sql?format=txt
{
"query":"""
DESCRIBE "index_name"
WHERE "type" = 'STRUCT'
"""
}
That is not possible, no.
While the DESCRIBE SQL command seems to return tabular data, it is not a query: it does not support WHERE clauses, nor can it be used within a SELECT statement. That is not specific to Elasticsearch; the same applies in RDBMSs.
The same apparently is true for the Elasticsearch filter clause. It works with SELECT statements, but with DESCRIBE or SHOW COLUMNS it produces no error and simply has no effect on the results.
In "real" SQL, you could work around this by querying information_schema.COLUMNS, but that is not an option in Elasticsearch.
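Since neither WHERE nor filter applies to DESCRIBE, one workaround is to filter the response on the client side. Below is a minimal Python sketch, assuming the format=txt layout shown in the question (pipe-separated fields with a dashed rule under the header). The function name filter_describe and the sample string are illustrative only; they are not part of any Elasticsearch client.

```python
def filter_describe(txt, wanted_type):
    """Return the column names whose 'type' field equals wanted_type."""
    names = []
    for line in txt.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Skip the dashed rule, blank lines, and anything that is not a 3-field row.
        if len(parts) != 3:
            continue
        column, col_type, _mapping = parts
        if column == "column" and col_type == "type":
            continue  # header row
        if col_type == wanted_type:
            names.append(column)
    return names


# Sample text copied from the DESCRIBE output in the question.
sample = """\
column | type | mapping
-----------------------------------------------------------
column_name1 | STRUCT | object
column_name1.Id | VARCHAR | text
column_name1.Id.keyword | VARCHAR | keyword
"""

structs = filter_describe(sample, "STRUCT")
print(structs)
```

In practice the text would come from the body of the GET /_sql?format=txt response; requesting format=json and filtering the returned rows would likely be more robust than parsing text.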

SPARQL same predicates with different values

I am using SPARQL to query my data from the triplestore.
In the triplestore I have a form containing a table, the table has different table entries. Each entry has a number of predicates. I want to use the predicates index and cost. My query needs to catch the first 2 rows, indices 0 and 1 and save the cost under a specific variable for each.
So I have tried the following (the ext prefix is omitted):
SELECT DISTINCT ?entry ?priceOne ?priceTwo
WHERE {
?form ext:table ?table.
?table ext:entry ?entry.
?entry ext:index 0;
ext:cost ?priceOne.
?entry ext:index 1;
ext:cost ?priceTwo.
}
This, however, does not return any values. If I remove the second part (with index 1), then I do get ?priceOne. How can I get both values?
Your current query only matches an ?entry that has both ext:index values, 0 and 1, at the same time. Presumably each entry has a single index, so nothing matches. You could avoid this by using separate variables such as ?entry0 and ?entry1, essentially duplicating your triple patterns.
But typically, you would match alternatives with UNION:
SELECT DISTINCT ?entry ?priceOne ?priceTwo
WHERE {
# a shorter way to specify this, if you don’t need the ?table variable
?form ext:table/ext:entry ?entry .
{
?entry ext:index 0 ;
ext:cost ?priceOne .
}
UNION
{
?entry ext:index 1 ;
ext:cost ?priceTwo .
}
}

Can I combine these two JOOQ queries into one?

I have two queries which look at separate database tables and find items from a JSONB column in each table that are in the format ["tag1","tag2","tag3"] etc. The purpose of the queries is to populate a list for a predictive dropdown, i.e. if the list contains "dog" and the user types "d", "dog" should be returned. Each of these queries works individually. Can I easily combine them into a single jOOQ query?
final Field<String> value = field(name("A", "value"), String.class);
final Result<Record1<String>> res1 = sql.dsl()
.selectDistinct(value)
.from(CAMPAIGN,lateral(table("jsonb_array_elements_text({0})", CAMPAIGN.TAGS)).as("A"))
.where(CAMPAIGN.STORE_KEY.equal(campaign.getStoreKey()))
.and(CAMPAIGN.CAMPAIGN_KEY.notEqual(campaignKey))
.and(value.like(search + "%%"))
.fetch();
final Result<Record1<String>> res2 = sql.dsl()
.selectDistinct(value)
.from(STOREFRONT, lateral(table("jsonb_array_elements_text({0})", STOREFRONT.TAGS)).as("A"))
.where(STOREFRONT.STORE_KEY.equal(campaign.getStoreKey()))
.and(value.like(search + "%%")).fetch();
Sure! In SQL, "combining" two queries is mostly implemented using UNION [ ALL ] (where ALL indicates that you want to maintain duplicates). In your case, write the following:
final Result<Record1<String>> result =
sql.dsl()
.select(value)
.from(
CAMPAIGN,
lateral(table("jsonb_array_elements_text({0})", CAMPAIGN.TAGS)).as("A"))
.where(CAMPAIGN.STORE_KEY.equal(campaign.getStoreKey()))
.and(CAMPAIGN.CAMPAIGN_KEY.notEqual(campaignKey))
.and(value.like(search + "%%"))
.union(
select(value)
.from(
STOREFRONT,
lateral(table("jsonb_array_elements_text({0})", STOREFRONT.TAGS)).as("A"))
.where(STOREFRONT.STORE_KEY.equal(campaign.getStoreKey()))
.and(value.like(search + "%%")))
.fetch();
Note that I have replaced selectDistinct() by select(), because the UNION operation already removes duplicates, so there's no need to remove duplicates in each individual union subquery.
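The duplicate-removing behaviour of UNION is easy to check outside jOOQ. The following is a small sketch using Python's built-in sqlite3 module as a stand-in for the real database; the tables campaign_tags and storefront_tags and their rows are made up for illustration and only mimic the unnested tag values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE campaign_tags (tag TEXT);    -- hypothetical, stands in for CAMPAIGN tags
    CREATE TABLE storefront_tags (tag TEXT);  -- hypothetical, stands in for STOREFRONT tags
    INSERT INTO campaign_tags VALUES ('dog'), ('dog'), ('duck');
    INSERT INTO storefront_tags VALUES ('dog'), ('deer');
""")

# UNION removes duplicates across (and within) both subqueries.
union_rows = conn.execute("""
    SELECT tag FROM campaign_tags WHERE tag LIKE 'd%'
    UNION
    SELECT tag FROM storefront_tags WHERE tag LIKE 'd%'
    ORDER BY tag
""").fetchall()

# UNION ALL keeps every row, including repeats.
union_all_rows = conn.execute("""
    SELECT tag FROM campaign_tags WHERE tag LIKE 'd%'
    UNION ALL
    SELECT tag FROM storefront_tags WHERE tag LIKE 'd%'
    ORDER BY tag
""").fetchall()

union_tags = [r[0] for r in union_rows]
union_all_tags = [r[0] for r in union_all_rows]
print(union_tags)
print(union_all_tags)
```

With UNION each tag appears once even though no subquery uses DISTINCT, which is why select() is sufficient in the combined query.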

Using SQLDF to select specific values from a column

SQLDF newbie here.
I have a data frame which has about 15,000 rows and 1 column.
The data looks like:
cars
autocar
carsinfo
whatisthat
donnadrive
car
telephone
...
I wanted to use the package sqldf to loop through the column and
pick all values which contain "car" anywhere in their value.
However, the following code generates an error.
> sqldf("SELECT Keyword FROM dat WHERE Keyword="car")
Error: unexpected symbol in "sqldf("SELECT Keyword FROM dat WHERE Keyword="car"
There is no unexpected symbol, so I'm not sure what's wrong.
First, I want to know all the values which contain 'car'.
Then I want to know only those values which contain just 'car' by itself.
Can anyone help?
EDIT:
All right, there was an unexpected symbol, but the corrected query only gives me just car and not every
row which contains 'car'.
> sqldf("SELECT Keyword FROM dat WHERE Keyword='car'")
Keyword
1 car
Using = will only return exact matches.
You should probably use the like operator combined with the wildcards % or _. The % wildcard will match multiple characters, while _ matches a single character.
Something like the following will find all instances of car, e.g. "cars", "motorcar", etc:
sqldf("SELECT Keyword FROM dat WHERE Keyword like '%car%'")
And the following will match "cars" but not "car", because _ must match exactly one character:
sqldf("SELECT Keyword FROM dat WHERE Keyword like 'car_'")
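To see how the three predicates differ, here is a quick sketch that runs them against the sample data. It uses Python's built-in sqlite3 module rather than R (sqldf uses SQLite as its default backend, so the LIKE semantics should carry over); the helper name keywords is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dat (Keyword TEXT)")
conn.executemany(
    "INSERT INTO dat VALUES (?)",
    [(k,) for k in ["cars", "autocar", "carsinfo", "whatisthat",
                    "donnadrive", "car", "telephone"]],
)


def keywords(where):
    """Run the SELECT with the given WHERE clause, keeping insertion order."""
    rows = conn.execute(f"SELECT Keyword FROM dat WHERE {where} ORDER BY rowid")
    return [r[0] for r in rows]


contains_car = keywords("Keyword LIKE '%car%'")  # 'car' anywhere in the value
car_plus_one = keywords("Keyword LIKE 'car_'")   # 'car' plus exactly one character
exactly_car = keywords("Keyword = 'car'")        # exact match only

print(contains_car)
print(car_plus_one)
print(exactly_car)
```

Note that 'car_' returns only cars here: the underscore needs exactly one character, so plain car does not match it.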
This has nothing to do with sqldf; your SQL statement is the problem. You need:
dat <- data.frame(Keyword=c("cars","autocar","carsinfo",
"whatisthat","donnadrive","car","telephone"))
sqldf("SELECT Keyword FROM dat WHERE Keyword like '%car%'")
# Keyword
# 1 cars
# 2 autocar
# 3 carsinfo
# 4 car
You can also use regular expressions to do this sort of filtering. grepl returns a logical vector (TRUE / FALSE) stating whether or not there was a match. You can get very sophisticated to match specific items, but a basic query will work in this case:
# Using @Joshua's dat data.frame
subset(dat, grepl("car", Keyword, ignore.case = TRUE))
Keyword
1 cars
2 autocar
3 carsinfo
6 car
Very similar to the solution provided by @Chase. Because we do not use subset, we do not need a logical vector and can use either grep or grepl:
df <- data.frame(keyword = c("cars", "autocar", "carsinfo", "whatisthat", "donnadrive", "car", "telephone"))
df[grep("car", df$keyword), , drop = FALSE] # or
df[grepl("car", df$keyword), , drop = FALSE]
keyword
1 cars
2 autocar
3 carsinfo
6 car
I took the idea from Selecting rows where a column has a string like 'hsa..' (partial string match)

Merging result from 2 columns with same name and not over-writing one

I have a simple MySQL query like:
SELECT *
FROM `content_category` CC , `content_item` CI
WHERE CI.content_id = '" . (int)$contentId . "'
AND CI.category_id = CC.category_id
AND CI.active = 1
Both tables have a column called configuration, one of which gets overwritten in the query, i.e. only content_item.configuration is returned in the result.
Short of explicitly naming and aliasing the columns like
SELECT CC.configuration as `category_configuration`,
CC.category_id as `.....
is there a way of selecting ALL data, i.e. * from both tables, and resolving the duplicate column names in a non-destructive way?
You don't need to alias ALL the columns, just the conflicting ones:
SELECT *,CC.configuration as cc_conf, CI.configuration as ci_conf FROM `content_category` CC , `content_item` CI WHERE
CI.content_id = '" . (int)$contentId . "'
AND CI.category_id = CC.category_id
AND CI.active = 1
This demonstrates one of the many reasons why using the * wildcard is not always good practice. All the columns are returned in the result set, but if you access them via an associative array or via object properties in your host language (e.g. PHP or Ruby), you can naturally only have one of the columns associated with each key or object property.
Solutions:
Fetch them all and reference the columns by ordinal position.
Stop using the wildcard for one table or the other, and give column aliases.
Rename your columns to be distinct.
Define a VIEW with the column aliasing spelled out, and query from the view.
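The collision and two of the fixes above (ordinal access and aliasing) can be reproduced in a few lines. This is only a sketch: it uses Python's built-in sqlite3 module in place of MySQL and PHP, and the table contents are invented. The key-collision effect comes from building a name-keyed dictionary, which is analogous to fetching into an associative array.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content_category (category_id INTEGER, configuration TEXT);
    CREATE TABLE content_item (content_id INTEGER, category_id INTEGER,
                               configuration TEXT, active INTEGER);
    INSERT INTO content_category VALUES (10, 'category-config');  -- made-up data
    INSERT INTO content_item VALUES (1, 10, 'item-config', 1);    -- made-up data
""")


def fetch_one(sql):
    """Return (column names, first row) for a query."""
    cur = conn.execute(sql)
    names = [d[0] for d in cur.description]
    return names, cur.fetchone()


join = """
    FROM content_category CC, content_item CI
    WHERE CI.content_id = 1
      AND CI.category_id = CC.category_id
      AND CI.active = 1
"""

# Plain SELECT *: both 'configuration' columns are in the row,
# but a name-keyed dict can hold only one of them (the later one wins).
names, row = fetch_one("SELECT *" + join)
by_name = dict(zip(names, row))
by_position = row[1]  # ordinal access still reaches CC.configuration

# Aliasing just the conflicting columns makes both reachable by name.
names2, row2 = fetch_one(
    "SELECT *, CC.configuration AS cc_conf, CI.configuration AS ci_conf" + join
)
aliased = dict(zip(names2, row2))

print(by_name["configuration"], by_position)
print(aliased["cc_conf"], aliased["ci_conf"])
```

The wildcard plus two aliases keeps every other column as-is, which matches the "just alias the conflicting ones" suggestion.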