I have a Oracle SQL Query as below :
select id from docs where CONTAINS (text,
'<query>
<textquery lang="ENGLISH" grammar="CONTEXT"> Informix
<progression>
<seq><rewrite>transform((TOKENS, "{", "}", "AND"))</rewrite></seq>
</progression>
</textquery>
</query>')>0;
The Query Works as expected. But I want to search for word Inform / Infor / Info. So I altered the query to below :
select id from docs where CONTAINS (text,
'<query>
<textquery lang="ENGLISH" grammar="CONTEXT"> Informix
<progression>
<seq><rewrite>transform((TOKENS, "?{", "}", "AND"))</rewrite></seq>
</progression>
</textquery>
</query>')>0;
By adding extra "?" in transform function. But this looks for informix / informi / inform / infor / info / inf / in. I want to restrict the search to a specific characters 4. Say till info. How can the same be achieved?
Thanks.
To find all documents that contain at least one occurrence of any of the terms between informix and info use the OR operator
and list all you allowerd terms in the template
<query>
<textquery lang="ENGLISH" grammar="CONTEXT"> informix informi inform infor info
<progression>
<seq><rewrite>transform((TOKENS, "{", "}", "OR"))</rewrite></seq>
</progression>
</textquery>
</query>
But the usage of template is not realy meaninfull here.
The same result you get with a direct query
select score(1), id from docs
where contains(text,'informix OR informi OR inform OR infor OR info',1) > 0
order by 1 desc;
The advantage of this case is that you can controll the score by prefering the documents with longer string with higher weights
select score(1), id from docs
where contains(text,'informix*5 OR informi*4 OR inform*3 OR infor*2 OR info',1) > 0
order by 1 desc;
Btw the ? (fuzzy) operator is used IMO to find misspelled words, not the exact prefixes of a term.
UPDATE
The concatenation of the prefixes you may assembly in PL/SQL or if necessary in SQL such as follows:
with txt as (
select 'informix' text from dual),
txt2 as (
select
substr(text,1,length(text) -rownum+1) text
from txt connect by level <= length(text) -3
)
select
LISTAGG( text, ', ') WITHIN GROUP (ORDER BY text desc)
from txt2
.
informix, informi, inform, infor, info
Related
I've got a model method that conditionally concatenates the user's username ("login") and real name, if they've saved a real name - otherwise it just shows the username. I'd like to rewrite the query in ActiveRecord or Arel.
It looks like I should use an Arel::Nodes::NamedFunction. But i don't understand how to do the conditional concatenation with a named function. (Does Arel know about "if"? I can't find any reference in the docs.)
def primer_values
connection.select_values(%(
SELECT CONCAT(users.login,
IF(users.name = "", "", CONCAT(" <", users.name, ">")))
FROM users
ORDER BY IF(last_login > CURRENT_TIMESTAMP - INTERVAL 1 MONTH,
last_login, NULL) DESC,
contribution DESC
LIMIT 1000
)).uniq.sort
end
There's also similarly a conditional in ORDER BY.
While generally I abhor Raw SQL in rails given this usage I'd leave it as is. Although I might change it to something a bit more idiomatic like.
User
.order(
Arel.sql("IF(last_login > CURRENT_TIMESTAMP - INTERVAL 1 MONTH,last_login, NULL)").desc,
User.arel_table[:contribution].desc)
.limit(1000)
.pluck(Arel.sql(
'CONCAT(users.login,
IF(users.name = "", "",
CONCAT(" <", users.name, ">")))'))
.uniq.sort
Converting this to Arel without abstracting it into an object of its own will damage the readability significantly.
That being said just to give you an idea; the first part would be 3 NamedFunctions
CONCAT
IF
CONCAT
Arel::Nodes::NamedFuction.new(
"CONCAT",
[User.arel_table[:name],
Arel::Nodes::NamedFuction.new(
"IF",
[User.arel_table[:name].eq(''),
Arel.sql("''"),
Arel::Nodes::NamedFuction.new(
"CONCAT",
[Arel.sql("' <'"),
User.arel_table[:name],
Arel.sql("'>'")]
)]
)]
)
A NamedFunction is a constructor for FUNCTION_NAME(ARG1,ARG2,ARG3) so any SQL that uses this syntax can be created using NamedFunction including empty functions like NOW() or other syntaxes like LATERAL(query).
I'm working with Big Query's Hacker News dataset, and was looking at which urls have the most news stories. I'd also like to strip the domain names out, and see which of those have the most news stories. I'm working in R, and am having a bit of trouble getting the follow query to work.
sql_domain <- "SELECT url,
REPLACE(CASE WHEN REGEXP_CONTAINS(url, '//')
THEN url ELSE 'http://' + url END, '&', '?') AS domain_name,
COUNT(domain_name) as story_number
FROM `bigquery-public-data.hacker_news.full`
WHERE type = 'story'
GROUP BY domain_name
ORDER BY story_number DESC
LIMIT 10"
I've been getting the following error: "Error: No matching signature for operator + for argument types: STRING, STRING. Supported signatures: INT64 + INT64; FLOAT64 + FLOAT64; NUMERIC + NUMERIC"
Can't for the life of me figure out a replacement for the "+" operator. Your help is much appreciated!
Can't for the life of me figure out a replacement for the "+" operator
In BigQuery - instead of 'http://' + url you should use CONCAT('http://', url)
For your goals (top domains submitting to Hacker News):
#standardSQL
SELECT NET.REG_DOMAIN(url) domain, COUNT(*) c
, ROUND(EXP(AVG(LOG(IF(score<=0,0.1,score)))),2) avg_score
FROM `bigquery-public-data.hacker_news.full`
WHERE type = 'story'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 100
Note how much easier is to call NET.REG_DOMAIN() to get the domain.
I have an xml like the below.
<LPNDetail>
<ItemName>5054807025389</ItemName>
<DistroNbr/>
<DistributionNbr>TR001000002514</DistributionNbr>
<OrderLine>2</OrderLine>
<RefField2/>
<RefField3>OU01180705</RefField3>
<RefField4>0002</RefField4>
<RefField5>Retail</RefField5>
<Qty>4</Qty>
<QtyUom>Unit</QtyUom>
</LPNDetail>
<LPNDetail>
<ItemName>5054807025563</ItemName>
<DistroNbr/>
<DistributionNbr>TR001000002514</DistributionNbr>
<OrderLine>4</OrderLine>
<RefField2/>
<RefField3>OU01180705</RefField3>
<RefField4>0004</RefField4>
<RefField5>Retail</RefField5>
<Qty>2</Qty>
<QtyUom>Unit</QtyUom>
</LPNDetail>
I have extracted the xml field using extract.xmltype and now i am getting the below result.
42
But i need to sum the quantity values i.e i need to get result as 6 (4+2).
Any help will be appreciated.
Thanks,
Shihaj
It is not clear what you mean by "an xml". If it's supposed to be an XML document, you are missing the outermost tags, perhaps something like <Document> ..... </Document>
If your text value is EXACTLY as you have shown it (which would be pretty bad), you can wrap within such outermost tags manually, and then use standard Oracle XML tools. For the illustration below I assume you simply have a string (VARCHAR2 or CLOB), not converted to XML type; in that case, I concatenate the beginning and end tags, and then convert to XMLtype, in the query.
with t ( str ) as (
select '<LPNDetail>
<ItemName>5054807025389</ItemName>
<DistroNbr/>
<DistributionNbr>TR001000002514</DistributionNbr>
<OrderLine>2</OrderLine>
<RefField2/>
<RefField3>OU01180705</RefField3>
<RefField4>0002</RefField4>
<RefField5>Retail</RefField5>
<Qty>4</Qty>
<QtyUom>Unit</QtyUom>
</LPNDetail>
<LPNDetail>
<ItemName>5054807025563</ItemName>
<DistroNbr/>
<DistributionNbr>TR001000002514</DistributionNbr>
<OrderLine>4</OrderLine>
<RefField2/>
<RefField3>OU01180705</RefField3>
<RefField4>0004</RefField4>
<RefField5>Retail</RefField5>
<Qty>2</Qty>
<QtyUom>Unit</QtyUom>
</LPNDetail>'
from dual
)
-- End of SIMULATED table (for testing purposes only, not part of the solution)
-- Query begins below this line
select sum(x.qty) as total_quantity
from t,
xmltable('/Document/LPNDetail'
passing xmltype('<Document>' || t.str || '</Document>')
columns qty number path 'Qty') x
;
Output:
TOTAL_QUANTITY
--------------
6
I am getting into Hive and learning hive. I have customer table in teradata , used sqoop to extract complete table in hive which worked fine.
See below customer table both in Teradata and HIVE.
In Teradata :
select TOP 4 id,name,'"'||status||'"' from customer;
3172460 Customer#003172460 "BUILDING "
3017726 Customer#003017726 "BUILDING "
2817987 Customer#002817987 "COMPLETE "
2817984 Customer#002817984 "BUILDING "
In HIVE :
select id,name,CONCAT ('"' , status , '"') from customer LIMIT 4;
3172460 Customer#003172460 "BUILDING "
3017726 Customer#003017726 "BUILDING "
2817987 Customer#002817987 "COMPLETE "
2817984 Customer#002817984 "BUILDING "
When I tried to fetch records from table customer with column matching which is of String type. I am getting different result for same query in different environment.
See below query results..
In Teradata :
select TOP 2 id,name,'"'||status||'"' from customer WHERE status = 'BUILDING';
3172460 Customer#003172460 "BUILDING "
3017726 Customer#003017726 "BUILDING "
In HIVE :
select id,name,CONCAT ('"' , status , '"') from customer WHERE status = 'BUILDING' LIMIT 2;
**<<No Result>>**
It seems that teradata is doing trimming short of thing before actually comparing stating values. But Hive is matching strings as it is.
Not sure, It is expected behaviour or bug or can be raised as enhancement.
I see below possible solution:
* Convert into like operator expression with wildcard charater before and after
Looking forward for your response on this. How can it be handled/achieved in hive.
You could use rtrim function, i.e:
select id,name,CONCAT ('"' , status , '"') from customer WHERE rtrim(status) = 'BUILDING' LIMIT 2;
But question here arise what standard in string comparision Hive uses? According to ANSI/ISO SQL-92 'BUILDING' == 'BUILDING ', Here is a link for an article about it.
Answering to my own question ...got it from hive-user mailing list
Hive is only PRO SQL compliance,
In hive the string comparisons work just like they would work in java
so in hive
"BUILDING" = "BUILDING"
"BUILDING " != "BUILDING" (extra space added)
I am new to SQL programming and I am trying to figure out how to get a report to show a mismatch in System Names & DNS Names. Both of the columns are in a table called nodes.
System Name router-1-dc and the DNS would be router-1-dc.domain I am trying to find Nodes that don't match to the "." prior to the domain example for this would be
System Name "router-1-datacenter" and DNS Name "router-1-dc.domain" I would want this example to show on the report page.
The tricky part is that some of the system names have the ".domain" and some don't.
Here is the SQL Query I built however it does not appear to be working as I need it too.
SELECT N. NodeID, N.Caption, N.SysName, N.DNS, N.IP_Address, N.Device_Type
FROM (
SELECT Nodes.NodeID, Nodes.Caption, Nodes.SysName, Nodes.DNS, Nodes.Device_Type, Nodes.IP_Address
FROM Nodes
WHERE CHARINDEX('.',Nodes.SysName)>0 AND CHARINDEX('.',Nodes.DNS)>0
) N
WHERE SUBSTRING(N.SysName, 1, CHARINDEX('.',N.SysName)-1) <> SUBSTRING(N.DNS, 1, CHARINDEX('.',N.DNS)-1)
AND N.Device_Type = 'UPS'
ORDER BY 5 ASC, 2 ASC
Thanks in advance for the help
Try this, or something like it (I've no data to test it against):
SELECT N.NodeID, N.Caption, N.SysName, N.DNS, N.IP_Address, N.Device_Type
from Nodes N
where left(n.sysname, charindex('.', n.sysname + '.') - 1 )
<> left(n.dns, charindex('.', n.dns + '.') - 1)
order by N.IP_Address, N.Caption
The trick is to add a "." to the end of each string for evaluation purposes. If there already is a period in the string, this has no effect, otherwist you get the whole string.